About the Author Jay Gao, Ph.D., attended Wuhan Technical University of Surveying and Mapping in China. He received his Bachelor of Engineering degree in Photogrammetry and Remote Sensing in 1984. He continued his education at the University of Toronto, and obtained his Master of Science degree from the Geography Department in 1988, majoring in remote sensing. Four years later, he earned his Ph.D. from the University of Georgia in the field of remote sensing and geographic information systems. He then joined the Geography Department at the University of Auckland in New Zealand as a lecturer. His teaching interests include remote sensing, digital image processing, geographic information systems, and spatial analysis. Over his academic career Dr. Gao has done extensive research in digital image analysis and its applications to resources management and hazards monitoring. His numerous papers have appeared in a wide range of journals and conference proceedings.
Digital Analysis of Remotely Sensed Imagery Jay Gao, Ph.D. School of Geography, Geology and Environmental Science The University of Auckland Auckland, New Zealand
New York Chicago San Francisco Lisbon London Madrid Mexico City Milan New Delhi San Juan Seoul Singapore Sydney Toronto
Preface

Digital image analysis is a field that crosses the boundaries of several disciplines. Digital analysis of remotely sensed data for managing the environment of the Earth and its natural resources, however, differs from medical image processing and image processing in electrical engineering in three ways. First, the object of study is different. Satellite images are snapshots of the Earth’s surface, which lies in a state of constant change. These changes need to be monitored from multitemporal images to identify the longitudinal trends. Second, the data used are captured over a much longer wavelength range, extending into the thermal infrared and microwave spectrum, usually recorded in the multispectral domain. Their enhancement and classification require an understanding of the interaction between solar radiation and the Earth’s surface. Finally, the objective of image analysis is different. A very significant component of digital analysis of remotely sensed data is to convert them into useful information on land cover/land use over the Earth’s surface. The derived information is usually presented in graphic (map) format at a certain scale. In order to make the map conform to certain cartographic standards, image geometry and the accuracy issue must be addressed and featured prominently in digital image analysis. This book aims at providing exhaustive coverage of the entire process of analyzing remotely sensed data for the purpose of producing an accurate and faithful representation of the Earth’s resources in the thematic map format. Recent years have witnessed phenomenal development in sensor technology and the emergence of a wide variety of remote sensing satellites. Now it is possible to acquire satellite images with a spatial resolution as fine as submeters or comparable to that of airborne photographs. The wide and easy availability of remote sensing data from various sensors in the digital format creates an ideal opportunity to process and analyze them automatically. Satellite data are routinely analyzed using various image processing systems to fulfill different application needs. The functionality and sophistication of image analysis have evolved considerably over the last two decades, thanks to the incessant advances in computing technology.
Preface Parallel to the advances in sensing technology is progress in pertinent geocomputational fields such as positioning systems and geographic information systems. The geospatial data from these sources not only enrich the sources of data in digital image analysis, but also broaden the avenue to which digitally processed results are exported. Increasingly, the final products of digital image analysis are not an end in themselves, but a part of a much larger database. The prerequisite for integrating these data from diverse sources is compatibility in accuracy. This demands that results derived from digital analysis of remotely sensed data be assessed for their thematic accuracy. Reliable and efficient processing of these data faces challenges that can no longer be met by the traditional, well-established perpixel classifiers, owing to increased spatial heterogeneity observable in the imagery. In response to these challenges, efforts have gone to developing new image processing techniques, making use of additional image elements, and incorporating nonremote sensing data into image analysis in an attempt to improve the accuracy and reliability of the obtained results. In the meantime, image analysis has evolved from one-time to long-term dynamic monitoring via analysis of multitemporal satellite data. A new book is required to introduce these recent innovative image classification methods designed to overcome the limitations of per-pixel classifiers, and to capture these new trends in image analysis. Contained in this book is a comprehensive and systematic examination of topics in digital image analysis, ranging from data input to data output and result presentation, under a few themes. The first is how to generate geometrically reliable imagery (Chap. 5). The second theme is how to produce thematically reliable maps (Chaps. 6 to 11). The third theme of the book centers around the provision of accuracy indicators for the results produced (Chap. 12). The last theme is about integration of digital image analysis with pertinent geospatial techniques such as global positioning system and geographic information system (GIS) (Chaps. 13 and 14). This book differs from existing books of a similar topic in three areas. First, unlike those books written by engineers for engineering students, this book does not lean heavily toward image processing algorithms. Wherever necessary, mathematical formulas behind certain processing are provided to ensure a solid theoretical understanding. Nevertheless, the reader is left with the discretion to decide on the level of comprehension. Those who are mathematically challenged may wish to skip the mathematical equations. Instead, they can concentrate on the examples provided and on the interpretation of processed output. In this way the fundamental concepts in image analysis are not lost. Second, the book features the geometric component of digital image analysis, a topic that is treated rather superficially or in a fragmented manner by authors with little background in geography,
Preface prominently. Finally, this book captures the most recent developments in image analysis comprehensively. Tremendous progress has been made in analyzing remotely sensed data more accurately and with greater ease. This book about digital image analysis is a timely reflection of these recent changes and trends in the field. This book is best used as a textbook for a course in digital image analysis. The targeted readership of the book is upper-level undergraduate students and lower-level postgraduate students. Ideally, they should have had a fundamental remote sensing course in their curriculum already. Otherwise, they need to spend more time than other students in the class to familiarize themselves with the content of the first few chapters. No assumption is made about their mathematical background, even though it is a great advantage to understand matrix operations in order to comprehend certain analyses better. Besides, this book is a great reference for those practitioners, resources managers, and consultants engaged in analysis of geospatial data, especially those who need to derive information about the Earth from airborne or spaceborne remote sensing materials. Jay Gao, Ph.D.
Acknowledgments
It is very hard to say when the writing of this book started. I initially became interested in digital image analysis when I was doing my master’s thesis research at the University of Waterloo and later at the Canadian Centre for Remote Sensing. At the beginning of my teaching career, Peng Gong generously shared his lecturing material with me. His material on the topic of image segmentation has been revised and incorporated in this book. After I started my teaching career in the School of Geography at the University of Auckland, I further acquired new skills in image analysis and undertook several projects of digital image analysis. This book could not have been published without the assistance of Somya Rustagi. She put up with my impatience and sluggish response to her queries, and chased me for follow-up matters that I had forgotten. Taisuke Soda at McGraw-Hill deserves special mention for his persistence in seeing this project through. Stephen Smith offered insightful advice on how to format the manuscript. Many publishers and organizations have generously granted me the use of their copyrighted materials in this book. It is not possible for me to express my gratitude to all of them as there are too many. However, I would like to mention ESA for Figs. 2.10 and 2.12; DigitalGlobe for Figs. 2.9 and 13.1; Clark Labs at Clark University for Fig. 4.1; ITT Visual for Fig. 4.3; Definiens for Fig. 4.6; ERDAS for Figs. 4.2, 4.4, and 11.10; Visual Learning Systems for Fig. 10.10; ASPRS for Figs. 10.4B and 11.4; Elsevier for Figs. 9.7 and 10.8 and Table 9.3; Taylor and Francis for Figs. 9.9 and 13.9; Wiley for Figs. 5.5, 6.26, and 7.7; Trimble for Fig. 14.19; and Springer-Verlag for Fig. 6.24. Tim Noland drew Fig. 5.20. Igor Dracki offered valuable tips on how to use CorelDRAW competently and on preparation of graphics. Last but not least, I would like to thank my parents for their support over the years. Without their generosity I could not have gone to university and received such a good education. I am especially indebted to my father, who was always proud of my achievements and who surely would have been very pleased to see the publication of this book.
Digital Analysis of Remotely Sensed Imagery
CHAPTER 1
Overview
Digital processing of satellite imagery refers to computer-based operations that aim to restore, enhance, and classify remotely sensed data. It may involve a single band or multiple bands in the input, depending on the nature and purpose of the processing. The output image is a single band in most cases. Digital analysis of remotely sensed data has a relatively short history. It did not come into existence until the early 1970s with the launch of the first Earth Resources Technology Satellite (subsequently renamed Landsat), when remote sensing images in the digital format became available for the first time in history. The availability of a huge quantity of data necessitated their timely and efficient processing. In response to this demand, digital analysis of remote sensing data experienced exponential development. In its early years, digital image analysis was very cumbersome to undertake because the computer had limited functions and capability. Undertaking digital image analysis was made more difficult by minicomputers running the user-unfriendly UNIX operating system. Over the years digital image analysis has become increasingly easy to perform, thanks to advances in computing technology and in image analysis systems. Now more processing functions can be achieved at a much faster pace than ever before. This chapter introduces the main characteristics and components of a digital image processing system. The nature of digital analysis of remote sensing images is summarized comparatively with that of the familiar visual image interpretation. Following this comparison is a comprehensive review of the entire process of digital image analysis from data input to results presentation. Presented next in this chapter is an introduction to the preliminary knowledge of digital image analysis that serves to lay a solid foundation for discussion in the subsequent chapters. Featured prominently in this section are pixels, the building blocks of satellite imagery. Lastly, this chapter introduces the important properties of satellite data, such as their spatial and spectral resolutions, in detail. This chapter ends with an overview of the content of the remaining chapters in this book.
1.1
Image Analysis System In order to function smoothly, a digital image analysis system must encompass a few essential components in hardware, software, the operating system, and peripheral devices (Fig. 1.1). Featured prominently among various hardware components is the computer, which, among other things, is made up of a central processing unit, a monitor, and a keyboard. As the heart of the system, the central processing unit determines the speed of computation. The keyboard/ mouse is the device through which the user interacts with the machine. The monitor fulfils the function of displaying the image processing software and visualizing image data, as well as any intermediate and final results in tabular and graphic forms. The operating system controls the operation of a computer’s activities and manages the entry, flow, and display of remote sensing data within the computer. These components are common to all computers. Unique to image analysis is the software that executes computer commands to achieve desired image processing functions. These computer programs are written in a language comprehensible to the computer. During image analysis these commands are issued by the image analyst by clicking on certain buttons or icons of the image processing system. So far, a number of image analysis systems have been developed for processing a wide range of satellite data and for their integrated analysis with non-remote sensing data. An image analysis system is incomplete without peripheral devices that input data into the system and output the results from
the system. Common input devices include scanners that are able to convert analog images into a digital format quickly and drives that allow data stored in the external media to be read into the computer. Standard output devices include printers and plotters. Printers can print results, usually small in size, in black and white, or color. A plotter is able to print a large map of classified results. Other peripheral devices include a few ports and drives that can read data stored in special media. Disk drives and special drives for CD read-only memory (CD-ROM) and memory sticks are so universal to all desktop and laptop computers that they can hardly be regarded as peripheral devices any more.

FIGURE 1.1 Configuration of a typical digital image processing system: data input/import, data analysis and display, and results output/export.
1.2
Features of Digital Image Analysis Analysis of remotely sensed data in the digital environment differs drastically from the familiar visual interpretation of satellite images. The main features of digital image analysis are summarized in Table 1.1, comparatively with visual interpretation. The most critical difference lies in the use of cues in the input data. In the digital environment only the value of pixels in the input data is taken advantage of. During image classification these pixels are treated mostly in isolation without regard to their spatial relationship. Another distinctive feature of digital analysis is its abstractness. Both the raw data and the final processed results are invisible to the analyst unless they are visualized on the computer monitor. The analyst’s prior knowledge or experience plays no role in the decision making behind a classification. The analyst is only able to exert an influence prior to the decision-making process, such as during selection of input fed into the computer. In this way
the results are much more objective than visual ones that are strongly influenced by the interpreter’s knowledge and expertise in the subject area concerned, as well as personal bias. The results, nevertheless, are quantitative and can be exported to other systems for further analysis without much additional work. This ease of portability is achieved at the expense of purchasing and maintaining expensive and sophisticated computer hardware and software.

Features | Digital | Visual
Evidence of decision making | Pixel value in multiple bands treated in isolation | All seven elements in one image treated in a spatial context
Process of decision making | Fast, abstract, invisible | Slow, concrete, visible
Role of prior knowledge | Limited | Critical
Nature of result | Quantitative and objective | Qualitative and subjective
Facilities required | Complex and expensive | Simple and inexpensive

TABLE 1.1 Main Features of Digital Image Analysis in Comparison with Visual Interpretation
1.2.1 Advantages Digital image processing has a number of advantages over the conventional visual interpretation of remote sensing imagery, such as increased efficiency and reliability, and marked decrease in costs.
Efficiency Owing to the improvement in computing capability, a huge amount of data can be processed quickly and efficiently. A task that used to take days or even months for a human interpreter to complete can be finished by the machine in a matter of seconds. This process is sped up if the processing is routinely set up. Computer-based processing is even more advantageous than visual interpretation for multiple bands of satellite data. Human interpreters can handle at most three bands simultaneously by examining their color composite. However, there is no limit as to the number of bands that can be processed in image classification. Moreover, the input of many spectral bands will not noticeably slow the processing.
Flexibility Digital analysis of images offers high flexibility. The same processing can be carried out repeatedly using different parameters to explore the effect of alternative settings. If a classification is not satisfactory, it can be repeated with different algorithms or with updated inputs in a new trial. This process can continue until the results are satisfactory. Such flexibility makes it possible to produce results not only from satellite data that are recorded at one time only, but also from data that are obtained at multiple times or even from different sensors. In this way the advantages of different remote sensing data can be fully exploited. Even non-remote sensing data can be incorporated into the processing to enhance the accuracy of the obtained results.
Reliability Unlike the human interpreter, the computer’s performance in an image analysis is not affected by the working conditions and the duration of analysis. In contrast, the results obtained by a human interpreter are likely to deteriorate owing to mental fatigue after the user has been working for a long time, as the interpretation process is highly demanding mentally. The results are also likely to be different, sometimes even drastically, if obtained by different interpreters,
because of their subjectivity and personal bias. By comparison, the computer can produce the same results with the same input no matter who is performing the analysis. The only exception is the selection of training samples, which could be subjective. However, the extent of such human intervention is considerably reduced in the digital environment.
Portability As digital data are widely used in the geoinformatics community, the results obtained from digital analysis of remote sensing data are seldom an end product in themselves. Instead, they are likely to become a component in a vast database. Digital analysis means that all processed results are available in the digital format. Digital results can be shared readily with other users who are working in a different, but related, project. These results are fully compatible with other existent data that have been acquired and stored in the digital format already. This has profound repercussions for certain analyses that were not possible to undertake before. For instance, the results of digital analysis can be easily exported to a geographic information system (GIS) for further analysis, such as spatial modeling, land cover change detection, and studying the relationship between land cover change and socioeconomic factors (e.g., population growth).
1.2.2 Disadvantages
Digital image analysis has four major disadvantages, the critical ones being the initial high costs in setting up the system and limited classification accuracy.
High Setup Costs The most expensive component of digital image analysis is the high initial cost associated with setting up the analysis system, such as purchase of hardware and software. These days the power of computers has advanced drastically, while their prices have tumbled. Desktop computers can now perform jobs that used to require a minicomputer. The same machine can be shared with others for many other purposes in addition to image analysis, such as GIS spatial analysis and modeling. Nevertheless, they depreciate very fast and have a short life cycle. Hardware has to be replaced periodically. Similar to hardware, the initial cost of purchasing software is also high. Unlike hardware, software is never meant to be a one-off cost. Software licensing policy usually needs to be renewed annually. Additional costs may include subscription of ongoing user support service so that assistance is available whenever the system runs into problem. The third cost is related to the purchase of data. Compared with printed materials, satellite data are much more expensive. Although the price of medium-resolution data has dropped considerably, it is
still expensive to buy the most recent, very high spatial resolution satellite data. High costs are also related to maintenance personnel. A system administrator is needed to update the image processing system periodically and to back up system data and temporary results regularly.
Limited Accuracy The second major limitation of digital image analysis is the lower-than-expected classification accuracy. Classification accuracy varies with the detail level and the number of ground covers mapped. In general, it hovers around 60 to 80 percent. A higher accuracy is not so easy to achieve because the computer is able to take advantage of only a small portion of the information inherent in the input image, while a large portion of it is disregarded. Understandably, the accuracy is rather limited for ground covers whose spectral response bears a high resemblance to that of other covers.
Complexity A digital image system is complex in that the user requires special training before being able to use it with confidence. Skillful operation of the system requires many hours of training and practice. As the system becomes increasingly sophisticated, it becomes more difficult to navigate to a specific function or to make full use of the system’s capability.
Limited Choices All image processing systems are tailored for a certain set of routine applications. In practice it may be necessary to undertake special analyses different from what these prescribed functions can offer. Solutions are difficult to find among the functions available in a given package. Although this situation has improved with the availability of a special scripting language in some image analysis systems, it is still not easy to tackle this scripting job if the user does not have a background in computer programming.
1.3
Components of Image Analysis The process of image analysis starts from preparation of remotely sensed data readable in a given system and feeding them into the computer to generate the final results in either graphic or numeric form (Fig. 1.2). Additional preliminary steps, such as scanning, may also be required, depending on the format in which data are stored. There is no common agreement as to what kind of postclassification processing should be contained in the process. In this book, three postclassification processings are considered: accuracy assessment, change detection, and integration with non-remote sensing data. The logical sequence of these processing steps is chronologically presented
in a flowchart in Fig. 1.2. Not all topics shown in the diagram are equally complex and significant. Some of them need a paragraph to explain while others require a chapter to cover adequately. The important topics are identified below.

FIGURE 1.2 Flowchart of a comprehensive image analysis procedure. Some of the steps in the chart could be absent in certain applications while other steps can be carried out in a sequence different from that shown in the chart. The major blocks in the chart will be covered in separate chapters in this book.
1.3.1 Data Preparation
Core to data preparation is image preprocessing. Its objective is to correct geometrically distorted and radiometrically degraded images to create a more faithful representation of the original scene. Preprocessing tasks include image restoration, geometric rectification, radiometric correction, and noise removal or suppression. Some of these tasks may have been performed at a ground-receiving station when the data are initially received from the satellite. More preprocessing specific to the needs of a particular project or a particular geographic area may still be performed by the image analyst.
1.3.2 Image Enhancement
Image enhancement refers to computer operations aimed specifically at increasing the spectral visibility of ground features of interest through manipulation of their pixel values in the original image. On the enhanced image it is very easy to perceive these objects thanks to their enhanced distinctiveness. Image enhancement may serve as a preparatory step for subsequent machine analysis such as for the selection of training samples in supervised classification, or be an end in itself (e.g., for visual interpretation). The quality or appearance of an image can be enhanced via many processing techniques, the most common ones being contrast enhancement, image transformation, and multiple band manipulation.
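As a concrete illustration of contrast enhancement, the minimal NumPy sketch below applies a percentile-based linear stretch to a single band. The synthetic band, the 2 and 98 percent cut-offs, and the 8-bit output range are illustrative assumptions only, not settings prescribed in this book.

```python
import numpy as np

def linear_stretch(band, low_pct=2, high_pct=98):
    """Stretch a single band to the full 8-bit range using percentile cut-offs."""
    lo, hi = np.percentile(band, (low_pct, high_pct))
    scaled = (band.astype(np.float64) - lo) / (hi - lo)   # rescale to 0..1
    scaled = np.clip(scaled, 0.0, 1.0)                    # clip the saturated tails
    return (scaled * 255).astype(np.uint8)                # back to 8-bit DNs

# toy band with a narrow DN range, i.e., poor contrast
band = np.random.randint(60, 100, size=(100, 100), dtype=np.uint8)
print(band.min(), band.max(), linear_stretch(band).min(), linear_stretch(band).max())
```

After the stretch the pixel values span the full 0 to 255 range, which is the numerical counterpart of the improved visual distinctiveness described above.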
1.3.3 Image Classification
Image classification is a process during which pixels in an image are categorized into several classes of ground cover based on the application of statistical decision rules in the multispectral domain or logical decision rules in the spatial domain. Image classification in the spectral domain is known as pattern recognition in which the decision rules are based solely on the spectral values of the remote sensing data. In spatial pattern recognition, the decision rules are based on the geometric shape, size, texture, and patterns of pixels or objects derived from them over a prescribed neighborhood. This book is devoted heavily to image classification in the multispectral domain. Use of additional image elements in performing image classification in the spatial domain is covered extensively, as well, together with image classification based on machine learning.
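The sketch below illustrates the idea of per-pixel classification in the spectral domain with the simplest possible decision rule, minimum distance to class means. The class means and the toy image are made-up values; the classifiers examined later in the book use far more elaborate decision rules.

```python
import numpy as np

# made-up mean spectral signatures (rows: classes, columns: bands)
class_means = np.array([[30.0, 25.0, 20.0, 90.0],   # vegetation
                        [70.0, 80.0, 85.0, 60.0],   # bare soil
                        [15.0, 12.0, 10.0,  5.0]])  # water

def min_distance_classify(image):
    """image: (rows, cols, bands) array -> (rows, cols) array of class indices."""
    pixels = image.reshape(-1, image.shape[-1]).astype(np.float64)
    # Euclidean distance from every pixel to every class mean
    d = np.linalg.norm(pixels[:, None, :] - class_means[None, :, :], axis=2)
    return d.argmin(axis=1).reshape(image.shape[:2])

toy = np.random.randint(0, 100, size=(5, 5, 4))   # tiny synthetic 4-band image
print(min_distance_classify(toy))
```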
1.3.4 Accuracy Assessment The product of image classification is land cover maps. Their accuracy needs to be assessed so that the ultimate user is made aware of the potential problems associated with their use. Accuracy assessment is a quality assurance step in which classification results are compared with what is there on ground at the time of imaging or something that
can be regarded as its acceptable substitute, commonly known as the ground reference. Evaluation of the accuracy of a classification may be undertaken for each of the categories identified and its confusion with other covers, as well as for all the categories. The outcome of accuracy assessment is usually presented in a table that reveals accuracy for each cover category and for all categories as a whole.
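A minimal sketch of such a table is given below: reference and classified labels are cross-tabulated into an error (confusion) matrix, from which overall and per-class accuracies are read. The label arrays are invented purely for illustration.

```python
import numpy as np

reference  = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 0])  # ground reference labels
classified = np.array([0, 1, 1, 1, 2, 2, 2, 2, 0, 0])  # labels from the classifier

n_classes = 3
confusion = np.zeros((n_classes, n_classes), dtype=int)
for ref, cls in zip(reference, classified):
    confusion[ref, cls] += 1              # rows: reference, columns: classified

overall = np.trace(confusion) / confusion.sum()
producers = confusion.diagonal() / confusion.sum(axis=1)  # accuracy by reference class
users     = confusion.diagonal() / confusion.sum(axis=0)  # accuracy by map class
print(confusion, overall, producers, users)
```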
1.3.5 Change Detection
Change detection takes remote sensing to the next stage, during which results from respective analysis of remotely sensed data are compared with each other, either spatially or nonspatially. This is commonly known as multitemporal remote sensing that attempts to identify what has changed on the ground. Change may be detected from multitemporal remotely sensed data using different methods, all of which are covered in this book. A number of issues relating to change detection (e.g., operating environment, accuracy, and ease of operation) and their impact on the accuracy of detected results are examined in depth, as well.
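As one illustration of these methods, the hedged sketch below compares two already classified maps of the same area pixel by pixel and cross-tabulates the from-to transitions. The two maps are tiny made-up arrays, and post-classification comparison is only one of the change detection methods covered in Chap. 13.

```python
import numpy as np

date1 = np.array([[0, 0, 1], [1, 2, 2], [2, 2, 2]])  # classified map, time 1
date2 = np.array([[0, 1, 1], [1, 2, 2], [0, 2, 2]])  # classified map, time 2

changed = date1 != date2                  # per-pixel change mask
n = max(date1.max(), date2.max()) + 1
change_matrix = np.zeros((n, n), dtype=int)
for a, b in zip(date1.ravel(), date2.ravel()):
    change_matrix[a, b] += 1              # rows: class at time 1, columns: time 2

print(changed.sum(), "pixels changed")
print(change_matrix)
```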
1.3.6 Integrated Analysis
In addition to satellite imagery data, non-remote sensing data have been increasingly incorporated into digital image analysis to overcome one of the limitations identified above, namely, to make use of more image elements in the decision making so that classification results can be more accurate. Many kinds of ancillary data, such as topographic, cadastral, and environmental, have found use in image analysis. Different methods have been developed to integrate them with remotely sensed data for a wide range of purposes, such as development of more accurate databases and more efficient means of data acquisition. This book explores the various methods by which different sources of data may be integrated to fulfill specific image analysis objectives.
1.4 Preliminary Knowledge

1.4.1 Pixel
Formed from the combination of picture and element, pixel is the fundamental building block of a digital image. An image is composed of a regularly spaced array of pixels (Fig. 1.3). All pixels have a common shape of square, even though triangle and hexagon are also possible. When a pixel is stored in a computer, it is represented as an integer. In this sense, a pixel does not have any size. Nevertheless, a pixel still has a physical size. Also known as cell size, it refers to the ground area from which the reflected or emitted electromagnetic radiation is integrated and recorded as a single value in the image
during sampling of the Earth’s surface. Thus, pixel size is synonymous with the ground sampling interval. Theoretically, the pixel size of a satellite image cannot be made finer once the image is scanned, though it is possible to reduce this size to a smaller dimension (e.g., from 10 to 5 m) through resampling during image processing. However, the detail of the image cannot be improved by simply splitting a pixel into fractions. Similarly, through resampling the pixel size of an image can be enlarged by amalgamating spatially adjoining pixels. As more adjoining pixels are merged, the image increasingly loses its detail level. Pixels fall into two broad categories, pure pixels and mixed pixels, in terms of the composition of their corresponding covers on the ground. Pure pixels are defined as those that are scanned over a homogeneous ground cover. These pixels have a pure identity relating to a unique type of ground feature. By comparison, mixed pixels contain the electromagnetic radiation originating from at least two types of cover features on the ground. The formation and quantity of mixed pixels in an image are affected by the following three factors. (1) Spatial resolution or pixel size: Given the same scene on the ground, an image of a coarser spatial resolution contains more mixed pixels. (2) Homogeneity of the scene: A highly heterogeneous scene is conducive to formation of more mixed pixels (these pixels are usually located at the interface of differing ground covers). (3) Shape and orientation of these different cover parcels in relation to the direction of scanning: Highly irregularly shaped cover parcels tend to have more mixed pixels along their borders.

FIGURE 1.3 An image is composed of a two-dimensional array of pixel values. Down: row. Across: column.
Since mixed pixels do not have a singular identity, it is impossible to correctly classify them into any one component cover at the pixel level. Their precise labeling has to take place at the subpixel level with a probability attached to each component feature. No matter whether a pixel is pure or mixed, it always has two crucial properties, its value or digital number (DN), and its location in a two-dimensional space.
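The loss of detail that accompanies merging adjoining pixels can be illustrated with the short NumPy sketch below, which coarsens a band by averaging non-overlapping blocks of pixels. The synthetic 8 × 8 band and the aggregation factor of 2 are assumptions made only for demonstration.

```python
import numpy as np

def coarsen(band, factor):
    """Merge factor x factor blocks of pixels into one by averaging (coarser pixel size)."""
    rows, cols = band.shape
    rows, cols = rows - rows % factor, cols - cols % factor   # trim any ragged edge
    blocks = band[:rows, :cols].reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))

fine = np.arange(64, dtype=float).reshape(8, 8)   # synthetic 8 x 8 band
print(coarsen(fine, 2).shape)   # (4, 4): each new pixel integrates a 2 x 2 ground area
```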
1.4.2 Digital Number (DN)
The DN of a pixel in a spectral band represents the amount of radiation received at the sensor, which is determined primarily by the capability of the ground object in reflecting and emitting energy. The amount of energy reaching the sensor is a function of the wavelength of the radiation. Thus, pixel value varies from band to band. The actual DN value of a pixel in an image is affected by many other external factors, such as atmospheric radiation, the sensor’s sensitivity, and more importantly, the ground sampling interval of the sensing system. In spite of these external interferences, theoretically, the same target should have the same or similar DN value in the same band; and different targets should have dissimilar DN values in the same band. However, this relationship is not always maintained because of the similar appearance of some ground objects. No matter how many bands the received energy is split into spectrally, it is always recorded as positive integers (Fig. 1.3). The theoretical range of pixel values in an image is determined by the number of bits used to record the energy, or the quantization level. A commonly adopted quantization level is 8 bits. So the number of potential DN values amounts to 2^8 or 256, ranging from 0 to 255. A DN value of 0 implies that no radiative energy is received from the target on the ground. A value of 255 indicates a huge amount of radiation has reached the sensor in space. Because of the atmospheric impact or limitations in the sensing system, not all of the potential levels of DN are fully taken advantage of during data recording, a situation that can be remedied through image enhancement. Recent advances in sensing technology have made it possible to reach a quantization level as high as 11 bits. As illustrated in Fig. 1.3, at a quantization level of 9 bits, pixel values vary from 0 to 2^9 − 1 (511). In the binary system of encoding the amount of received energy, pixel values are not allowed to have any decimal points. They are recorded as 8-bit, unsigned integers. Floating point pixel values are not commonly associated with raw satellite data. With the use of more bits in a computer, it is possible to have floating point data for some processed results (e.g., ratioed band).
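The relationship between quantization level and DN range can be sketched as follows. The normalized radiance values are synthetic, and the mapping shown is a simplified stand-in for the onboard analog-to-digital conversion.

```python
import numpy as np

def quantize(radiance, n_bits):
    """Map continuous radiance (assumed scaled to 0..1 here) onto n-bit digital numbers."""
    levels = 2 ** n_bits                          # e.g. 8 bits -> 256 levels (0..255)
    return np.floor(radiance * (levels - 1)).astype(np.uint16)

radiance = np.linspace(0.0, 1.0, 10)              # synthetic, normalized radiance samples
for bits in (8, 9, 11):
    dn = quantize(radiance, bits)
    print(bits, "bits ->", 2 ** bits, "levels, maximum DN =", dn.max())
```

Running the sketch confirms the ranges quoted above: 0 to 255 at 8 bits, 0 to 511 at 9 bits, and 0 to 2047 at 11 bits.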
1.4.3 Image Reference System
There are many coordinate systems in use, such as the latitude-longitude system and the cartesian coordinate system. The latter is a plane system suitable for representing two-dimensional digital
imagery (Fig. 1.4a). This system consists of two axes: abscissa that increases in value eastward and ordinate that increases in value northward. Hence, the space is partitioned into four quadrants. Coordinates in different quadrants have different signs. Only in the first quadrant are both abscissa and ordinate positive. Due to the presence of negative coordinates, this system is not suitable for referencing pixels in an image. In spite of the three-dimensional Earth’s surface in reality, its rendition in a digital image has one fewer dimension. This reduction is permissible given that the sensor is usually located hundreds of kilometers above the Earth’s surface that has a negligible relief by comparison. Since the third dimension (height) of ground objects is not a concern in natural resource applications of remote sensing, it is acceptable to approximate this surface as a flat one represented by a two-dimensional array of pixels. Thus, a pair of coordinates in the form of row and column (also known as line and pixel) is required to uniquely locate a pixel in this array. Both have an increment of 1. These coordinates depict the central location of a grid cell. Since an image always starts with the first pixel and then the next sequentially, an image coordinate system differs from the commonly known cartesian coordinate system. Here, its origin is located in the upper left corner (Fig. 1.4b). Known as line, row increases vertically downward. Column refers to the position of a pixel in a row. It increases across from left to right. The total number of rows and columns of an image defines its physical size. Pixel P in Fig. 1.4b has a coordinate of (3, 10), in which 3
is known as row or line, and 10 as column, position, or pixel. This convention of representation is not universally adhered to, so it can vary with the image processing system. Of particular note is that the first row and last row are counted in determining the number of pixels/columns of an image. Also, the first row or column can start from 0 as well as from 1. As with all raster data, the coordinates of pixels in an image are not explicitly stored in the computer except for a few strategic ones (i.e., the four corner pixels). Instead, all pixels are recorded sequentially by column first and by row next as a long list. Their geographic location is implicitly defined by their relative position in the list or their distance from the origin (i.e., the first pixel). This relative position can be converted into a pair of absolute coordinates expressed as row and column from this distance as well as the physical dimension (e.g., number of rows by number of columns) of the image. These coordinates may be further converted into the metric expression by multiplying them by the spatial resolution of the image.

FIGURE 1.4 Comparison of the cartesian coordinate system (a) with the image coordinate system (b). In the cartesian coordinate system, the space is divided into four quadrants, so coordinates can be positive or negative, dependent upon in which quadrant a point is located. In the image coordinate system, all coordinates are positive, as the origin is located in the upper left corner. Both systems require a pair of coordinates to reference a location uniquely.
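A minimal sketch of these conversions is given below, assuming row-major storage, an origin in the upper left corner, and a hypothetical image width and pixel size; it reproduces the pixel P example of Fig. 1.4b.

```python
def index_to_row_col(index, n_cols):
    """Row-major storage: within each row, pixels are listed column by column."""
    return index // n_cols, index % n_cols

def row_col_to_map(row, col, pixel_size, upper_left_x=0.0, upper_left_y=0.0):
    """Rows increase downward, so the northing decreases as the row number grows."""
    x = upper_left_x + col * pixel_size
    y = upper_left_y - row * pixel_size
    return x, y

n_cols = 500                                 # hypothetical image width in pixels
row, col = index_to_row_col(1510, n_cols)    # pixel at position 1510 in the flat list (from 0)
print(row, col)                              # -> (3, 10), like pixel P in Fig. 1.4b
print(row_col_to_map(row, col, pixel_size=30.0))
```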
1.4.4 Histogram
A histogram is a diagram displaying the frequency distribution of pixels in an image with respect to their DNs (Fig. 1.5). It can be presented either graphically or numerically. A graphic histogram contains two axes. The horizontal axis is reserved for the pixel’s DN. It is an integer with an increment of 1 or other larger integers specified by the analyst. Thus, the histogram is not smooth but discrete. The vertical axis represents the frequency, in either relative terms (percentage) or absolute terms (actual number of pixels). A graphic histogram is an effective means of visualizing the quality of a single spectral band directly. For instance, a broad histogram curve signifies a reasonable contrast while its position relative to the horizontal axis is indicative of the overall tone of the band (Fig. 1.5a). A position toward the left suggests that the image tends to have an overall dark tone, a phenomenon equivalent to underexposure in an analog aerial photograph (Fig. 1.5b). On the other hand, a position toward the right shows that the image has a bright tone throughout, with an appearance similar to an overexposed aerial photograph. Unlike a graphic histogram, a numeric histogram displays the exact number of pixels at every given DN level. In order to reduce the number of DN levels, a few DNs may be amalgamated. In this case, the frequency refers to the combined pixels over the indicated range of DNs. Both forms of histogram are essential in contrast manipulation of spectral bands. A preview of a graphic histogram enables the analyst to prescribe the kind of enhancement method most appropriate for the image. A numeric histogram provides important clues in deciding critical thresholds needed in performing certain kinds of image contrast stretching.
FIGURE 1.5 Examples of two graphic histograms illustrating different qualities of the spectral bands they correspond to. The first histogram (a) has a larger range, but most pixels have a small DN value, causing the image to have a darkish overall tone. The spike in the histogram represents water pixels. The skinny and narrow histogram (b) shows a limited contrast as not all available DNs are taken advantage of during data recording.
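A numeric histogram of an 8-bit band can be computed directly, as in the short sketch below. The band is randomly generated, so the counts are meaningful only as an illustration of the procedure, not of any real scene.

```python
import numpy as np

band = np.random.randint(0, 256, size=(200, 200), dtype=np.uint8)  # synthetic 8-bit band

counts, bin_edges = np.histogram(band, bins=256, range=(0, 256))   # one bin per DN level
print("darkest occupied DN:", np.flatnonzero(counts)[0])
print("brightest occupied DN:", np.flatnonzero(counts)[-1])
print("most frequent DN:", counts.argmax(), "with", counts.max(), "pixels")
```

The occupied DN range reported here is exactly the information used to choose thresholds for the contrast stretching methods discussed in Chap. 6.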
1.4.5 Scatterplot
A scatterplot is an extension of a one-band graphic histogram into a two-band situation. This diagram illustrates the distribution of pixel values in the two spectral band domain (Fig. 1.6). Either band can serve as the horizontal or vertical axis in a scatterplot. The variable in both axes is the pixel DN of the usual range of 0 to 255. What this diagram is able to reveal depends on where the pixels originate from. If they come from the entire image, then a scatterplot is able to reveal whether the content of the two bands is correlated with each other. If all pixels fall into a linear trend neatly, then the content of both bands exhibits a high degree of resemblance, or there is severe data redundancy between them. Since a scatterplot is best at showing the distribution of pixel values over two bands, multiple scatterplots have to be constructed to illustrate the correlation extent between any two spectral bands in case of more than two multispectral bands. If the pixels are selected from a subarea related to specific land covers, the scatterplot can be used to identify whether the covers represented by these pixels are spectrally separable. Such a plot is very useful in revealing the feasibility of mapping these covers prior to the
classification. They can also foretell the accuracy of mapping these covers on the basis of the spectral distance between these pixels and pixels from other covers. If the pixels from one type of land cover feature are distributed in close proximity to those from another type of land cover feature, then there is a low spectral separability between the two concerned land covers in these two spectral bands.

FIGURE 1.6 A scatterplot of two spectral bands. It illustrates the correlation between the information content of spectral band A versus band B. In the diagram the variable in both axes is DN, which ranges from 0 to 255. Dashed lines are histograms of respective bands.
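The redundancy that a scatterplot reveals visually can also be summarized numerically with a correlation coefficient, as in the hedged sketch below. Both bands are synthetic, with band B deliberately constructed to be partly redundant with band A.

```python
import numpy as np

band_a = np.random.randint(0, 256, size=(100, 100)).astype(float)
band_b = 0.8 * band_a + np.random.normal(0, 10, band_a.shape)   # partly redundant band

r = np.corrcoef(band_a.ravel(), band_b.ravel())[0, 1]
print(f"correlation between bands: {r:.2f}")  # close to 1 -> pixels hug a line in the scatterplot
```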
1.5
Properties of Remotely Sensed Data The property of remotely sensed data most critical to their utility is their resolution. It refers to an imaging system’s capability of resolving two adjacent features or phenomena. There are four types of resolution for remote sensing imagery: spatial, spectral, radiometric, and temporal.
1.5.1 Spatial Resolution
Also called ground sampling distance, spatial resolution of imagery refers to its ability to distinguish two spatially adjacent objects on the ground. Spatial resolution is the equivalent of the spatial dimension of scanning on the ground during image acquisition. For raster images, spatial resolution is synonymous with the pixel size of the remotely sensed data. Ground sampling distance is jointly governed by the instantaneous field-of-view (IFOV) (α) of the sensing system and the altitude of the platform (H) that carries the sensor (Fig. 1.7), or
Pixel size = α × H    (1.1)
FIGURE 1.7 Relationship among spatial resolution of satellite imagery, satellite altitude (H), and IFOV (α) of the scanner.
where α is expressed as a radian angle. According to this equation, at the same altitude a smaller IFOV translates into a smaller pixel size, and vice versa. At the same IFOV, a lower altitude leads to an image of a finer spatial resolution, and vice versa. Spatial resolution denotes the theoretical dimension of ground features that can be identified from a given remote sensing image. The finer the spatial resolution, the more detailed the image is. As the pixel size increases, less detail about the target is preserved in the data (Fig. 1.8). A small cell size is desirable in those local-scale applications that demand great details about the target. A fine spatial resolution reduces the number of mixed pixels, especially if the landscape is highly fragmented and land cover parcels have an irregular shape. The downside effect of having a fine spatial resolution is a large image file size. This file size is going to double or triple if two or three spectral bands are needed. As it is a common practice to record satellite data in the multispectral mode, an image file size can reach a few megabits easily. Such a large file is going to slow down all subsequent analyses. It is thus important to select data with a spatial resolution appropriate for the needs of an application. If the digital remote sensing data are obtained through scanning of existing aerial photographs, their spatial resolution is determined by
both the scanning interval and the scale of the photographs used. If an analog satellite image is scanned, then the scanned image’s spatial resolution may not bear any relationship with that of the original digital image. This discrepancy needs to be taken into consideration when data scanned from analog materials are analyzed digitally.

FIGURE 1.8 Appearance of an image represented at four spatial resolutions of 4 m (a), 8 m (b), 20 m (c), and 40 m (d). As pixel size increases, ground features become less defined. See also color insert.
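The quantities discussed in this subsection can be worked through with a few lines of arithmetic, as sketched below. The IFOV and altitude are values similar to those quoted for AVHRR in Chap. 2, while the scene dimensions, scan resolution, and photo scale are purely illustrative assumptions.

```python
# Eq. (1.1): pixel size = IFOV (in radians) x platform altitude
ifov = 1.3e-3           # 1.3 milliradians (illustrative value)
altitude = 833_000.0    # metres
print("pixel size:", ifov * altitude, "m")   # roughly 1083 m at nadir

# uncompressed file size: rows x columns x bands x bytes per pixel
rows, cols, bands, bytes_per_pixel = 6000, 6000, 4, 1
print("scene size:", rows * cols * bands * bytes_per_pixel / 1e6, "MB")

# scanned aerial photograph: ground sampling distance from scan interval and photo scale
scan_dpi = 1200                          # scanner setting, dots per inch
scan_interval_m = 0.0254 / scan_dpi      # metres per sample on the print
photo_scale = 20_000                     # a 1:20,000 photograph
print("scanned GSD:", scan_interval_m * photo_scale, "m")   # about 0.42 m on the ground
```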
1.5.2 Spectral Resolution
Spectral resolution refers to the ability of a remote sensing system to differentiate the subtle difference in reflectance of the same ground object at different wavelengths. Spectral resolution is determined by the number of spectral bands used to record spectrally split radiative energy received from the target. It is related to the wavelength range of each spectral band, as well as the wavelength range of all bands. It must be noted that not all spectral bands have the same wavelength range (Fig. 1.9). Nor is the wavelength range of all bands continuous. Because of atmospheric scattering and absorption, electromagnetic radiation over some wavelengths cannot be used for spaceborne remote sensing, causing discontinuity in the wavelength of spectral bands. Spectral bands in the visible and near infrared spectrum tend
to have a narrower wavelength range than those in the middle and far infrared spectrum because of the stronger reflectance of most targets at these shorter wavelengths. Since the reflectance curves of most ground objects vary with wavelength (Fig. 1.9), in general, the finer the spectral resolution, the more information about the target is captured. This generalization is valid to a certain degree. The issue of data redundancy arises if the spectrum is sliced into too many spectral bands thinly, as is the case with hyperspectral remote sensing data. Spectral resolution is an important image property to consider in certain applications as it determines the success or failure of computer-assisted per-pixel image classification of satellite imagery data based exclusively on pixel values. The use of more spectral bands in a classification is conducive to the achievement of higher classification accuracy to a certain degree. In general, spaceborne remotely sensed data have a higher spectral resolution than panchromatic aerial photographs that are taken with a frame camera of a single lens. Such data recorded in the multispectral domain represent an effort of increasing spectral resolution to compensate for the inability to use other image elements than pixel values.

FIGURE 1.9 Spectral resolution of imagery. It is defined as the width of a spectral band. As illustrated in this figure, band 6 has the coarsest spectral resolution against bands 1 and 2. Spectral resolution affects the spectral separability of covers.
1.5.3 Radiometric Resolution
Radiometric resolution refers to the ability of a remote sensing system to distinguish the subtle disparity in the intensity of the radiant energy from a target at the sensor. It is determined by the level of quantizing the electrical signal converted from the radiant energy (Fig. 1.10). Radiometric resolution controls the range of pixel values of an image, and affects its overall contrast. Recently, the common
8-bit quantization level has evolved into a level as high as 11 bits, thanks to advances in sensing technology. With the use of more bits in recording remotely sensed data, the radiative energy received at the sensor is sliced into more levels radiometrically (Fig. 1.11), which makes it possible to differentiate subtle variations in the condition of targets. A fine radiometric resolution is critical in studying targets that have only a subtle variation in their reflectance, such as detection of different kinds of minerals in the soil and varying levels of vegetation stress caused by drought and diseases. Also, remotely sensed data of a fine radiometric resolution are especially critical in quantitative applications in which a ground parameter (e.g., sea surface temperature and concentration level of suspended solids in a water body) is retrieved from pixel values directly. Data of a higher quantization level enable the retrieval to be achieved more accurately, while a coarse radiometric resolution causes the pixels to look similar to one another.

FIGURE 1.10 The multispectral concept in obtaining remotely sensed data. It is a common practice to obtain multispectral data in spaceborne remote sensing in which the low spatial resolution is compensated for by a finer spectral resolution.

FIGURE 1.11 Quantization of energy reflected from a ground target is converted into an electrical signal whose intensity is proportional to its reflectance. The interval of sampling the signal intensity determines the radiometric resolution of the satellite imagery, or its ability to discriminate subtle variation in reflectance.
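The effect of radiometric resolution pictured in Fig. 1.12 can be mimicked with the short sketch below, which requantizes a synthetic 8-bit band to 6 and 3 bits by discarding its low-order bits. The band and the bit depths are assumptions made only for illustration.

```python
import numpy as np

band8 = np.random.randint(0, 256, size=(50, 50), dtype=np.uint8)   # synthetic 8-bit band

def requantize(band, n_bits):
    """Keep only the n_bits most significant bits, giving 2**n_bits gray levels."""
    shift = 8 - n_bits
    return (band >> shift) << shift     # e.g. 3 bits -> only 8 distinct gray levels remain

for bits in (8, 6, 3):
    print(bits, "bits:", len(np.unique(requantize(band8, bits))), "distinct gray levels")
```

As the number of retained bits drops, pixels that used to differ slightly collapse onto the same gray level, which is the numerical counterpart of the increasingly similar-looking pixels described above.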
1.5.4 Temporal Resolution Also known as revisit period, temporal resolution refers to the temporal frequency at which the same ground area is sensed consecutively by the same sensing system. Since remote sensing satellites are revolving around the Earth 24 hours a day and 365 days a year, the temporal resolution is directly related to the satellite orbital period. A short period means more revolutions per day and is equivalent to a high temporal resolution. Temporal resolution of the same satellite varies with latitude of the geographic area being sensed. At higher latitude there is more spatial overlap among images acquired over adjoining orbits. The same ground is sensed more frequently, or at a finer temporal resolution, than at a lower latitude. One method of refining temporal resolution of satellite data is to tilt the scanning mirror of the sensing system. In this way the same scene is able to be scanned repeatedly at a close temporal interval from the adjoining orbits either to the left or to the right of the current path. Temporal resolution is very significant in applications in which the object or phenomenon of study is temporally dynamic or in a state of constant change, such as weather conditions, floods, and fires. In general, satellite remote sensing data have a higher temporal resolution than airborne remote sensing data. Among the four image resolutions, temporal resolution bears no direct relationship to the other three resolutions. Although spectral and spatial resolutions are independent of each other, both are tied closely to radiometric resolution. In recording images at a finer spectral or spatial resolution, the returned energy emitted from or reflected by the ground is sliced into numerous units either spectrally or spatially. The successful detection of such a tiny quantity of energy over a unit imposes a stringent demand on the radiometric sensitivity of the detectors (Fig. 1.12). Consequently, a fine radiometric resolution is achievable by compromising either spectral or spatial or both resolutions in order to accumulate sufficient energy from the target to be accurately identifiable. Conversely, a low radiometric resolution may be adopted in order to achieve a finer spectral or spatial
resolution. Since the amount of energy emitted by targets is much smaller than what is reflected, it is more difficult to achieve the same spatial or radiometric resolution for images acquired over the thermal infrared portion of the spectrum than over visible and near infrared wavelengths.

FIGURE 1.12 Appearance of the same image represented at three radiometric levels: (a) 8 bits (256 gray levels), (b) 6 bits (64 gray levels), and (c) 3 bits (8 gray levels).
1.6
Organization of the Book This book is divided into 14 chapters. Chapter 2 comprehensively surveys the main characteristics of existent remote sensed data available for digital analysis. Also included in this chapter is how to convert existing analog remote sensing materials into digital format via scanning. Chapter 3 presents various media for storing remote sensing data, and the common image formats for saving remote sensing imagery and processed results. Also covered in this chapter are methods of data compression, both lossy and error free. Contained in Chap. 4 is a critical overview and assessment of main digital image analysis systems, their major features and functions. A few of the lead players are described in great depth, with the strengths and limitations of each system critically assessed. How to prepare remote sensing data for digital analysis geometrically forms the content of Chap. 5. After the fundamentals of image geometric rectification are introduced, several issues related to image rectification are addressed through practical examples. Also featured in this chapter are the most recent developments in image georeferencing, such as image orthorectification and real-time georeferencing. Chapter 6 is devoted to image enhancement methods, ranging from simple contrast manipulation to sophisticated image transformation. Most of the discussion centers around processing in the spectral domain while image enhancement in the spatial domain is covered briefly. Covered in Chaps. 7 to 11 are five unique approaches toward image classification. Chapter 7 on spectral image classification begins with a discussion on the requirements and procedure of image classification. The conventional per-pixel-based parametric and nonparametric methods, namely, unsupervised and supervised methods, are presented next. Three supervised image classification algorithms are introduced and compared with one another in terms of their requirements and performance. This is followed by more advanced classification methods, including subpixel and fuzzy image classification. This chapter ends with a brief discussion on postclassification processing. With the advances in machine learning, new methods have been attempted to perform image classification in the hope of achieving higher accuracies. Two attempts of neural network classification and decision tree classification form the focus of Chaps. 8 and 9, respectively. After the various types of neural network structures are introduced in Chap. 8, the discussion then shifts to network configuration and
training, both being critical issues to the success of neural network image classification. The potential of this endeavor is evaluated toward the end of the chapter. Chapter 9 on decision tree classification begins with an introduction to major decision trees that have found applications in image classification, followed by a discussion on how to construct a tree. The potential of this classification method is assessed toward the end of this chapter. The focus of Chap. 10 is on spatial image classification in which the spatial relationship among pixels is taken advantage of. Two topics, use of texture and object-based image classification, are featured prominently in this chapter. In addition, image segmentation, which is a vital preparatory step for object-oriented image classification, is also covered extensively. Recently, image classification has evolved to a level where external knowledge has been incorporated into the decision making. How to represent knowledge and incorporate it into image classification forms the content of Chap. 11. After presenting various types of knowledge that have found applications in intelligent image classification, this chapter concentrates on how to acquire knowledge from various sources and represent it. A case study is supplied to illustrate how knowledge can be implemented in knowledge-based image classification and in knowledge-based postclassification processing. The performance of intelligent image classification relative to per-pixel classifiers is assessed in terms of the classification accuracy achievable. The next logical step of processing following image classification is to provide a quality assurance. Assessment of the classification results for their accuracy forms the content of Chap. 12. Addressed in this chapter are sources of classification inaccuracy, procedure of accuracy assessment, and proper reporting of accuracies. Chapter 13 extends digital analysis of remote sensing data to the multitemporal domain, commonly known as change detection. The results derived from respective remote sensing data are compared with each other either spatially or nonspatially. Many issues related to change detection are identified, in conjunction with innovative methods of change detection. Suggestions are made about how to assess and effectively visualize change detection results. The last chapter of this book focuses on integrated image analysis with GIS and global positioning system (GPS). After models of integrating these geoinformatic technologies are presented, this chapter identifies the barriers to full integration and potential areas to which the integrated analysis approach may bring out the most benefits.
CHAPTER 2
Overview of Remotely Sensed Data
In the late 1960s, meteorological satellite data with a coarse spatial resolution from instruments such as the Advanced Very High Resolution Radiometer (AVHRR) of the National Oceanic and Atmospheric Administration (NOAA) came into existence for the first time in history. These data, initially designed chiefly for the purpose of studying weather conditions, were not accompanied by wide practice of digital image analysis in the remote sensing community, probably owing to the fledgling state of computing technology at the time. In the early 1970s, the Landsat program was initiated to acquire satellite data for the exclusive purpose of natural resources monitoring and mapping. Since then tremendous progress has been made in remote sensing data acquisition, with tens of satellites launched. The advance in our data acquisition capacity is attributed largely to the progress in rocket technology and sensor design. Consequently, a wide range of satellite data has become available at a drastically reduced price. Over the years the spatial and spectral resolutions of these data have been improved. Satellite data of a finer spatial resolution have opened up new fields of application that were not possible with data of a poorer spatial or spectral resolution. In addition to multispectral data, it is now possible to obtain satellite data in hundreds of spectral bands. These remotely sensed data with improved viewing capabilities and improved resolution have not only opened up new areas of successful application, but have also created specific fields in digital image analysis. In this chapter, these satellite data are comprehensively reviewed in terms of their critical properties and main areas of application. All satellite data, including meteorological, oceanographic, natural resources, and even radar data, will be covered in this overview. Both multispectral and hyperspectral data are included in this review. In addition, this chapter also identifies recently emerged trends in satellite data acquisition, including acquisition from airborne platforms. This identification is followed by a discussion on how to convert existing
analog materials into the digital format. Finally, this chapter concentrates on the proper selection of remotely sensed data for a given application.
2.1 Meteorological Satellite Data

Among all remote sensing satellites, meteorological satellites have the longest history. Of the existing meteorological satellite data, the most widely known and used are from the AVHRR sensors aboard the NOAA series of satellites, the most recently launched being the 18th. These satellites orbit the Earth at an altitude of 833 km with an average period of approximately 102 minutes (Table 2.1). Designed primarily for meteorological applications, the NOAA series of satellites are capable of obtaining data of a fine temporal resolution via at least two satellites working in a sun-synchronous orbit. Some missions have a daylight (e.g., 7:30 a.m.) north-to-south equatorial crossing time while other missions have a nighttime (e.g., 2:30 a.m.) equatorial crossing time. As a result, any location on the surface of the Earth can be sensed twice a day, once in the morning and again in the afternoon.

Satellite Number | Launch Date | Ascending Node | Descending Node
14 | 12/30/94 | 1340 | 0140
15 | 05/13/98 | 0730 | 1930
16 | 09/21/00 | 1400 | 0200
17 | 06/24/02 | 2200 | 1000
18 | 05/20/05 | 1400 | 0200
These satellites had an altitude of 833 km, a period of 102 min, a revisit period of 12 h, and an inclination of 98.9°.
TABLE 2.1 Characteristics of Recent NOAA AVHRR Satellites

The AVHRR sensor captures radiation over the visible light, near infrared (NIR), and thermal infrared (TIR) portions of the spectrum in five spectral bands (Table 2.2). This radiometer has a nominal swath width of 2400 km and an instantaneous field-of-view (IFOV) of 1.3 milliradians at nadir. AVHRR data are available in three forms: high resolution picture transmission (HRPT), global area coverage (GAC), and local area coverage (LAC). Both HRPT and LAC data have a full ground resolution of approximately 1.1 × 1.1 km². The pixel size increases to about 5 km at the largest off-nadir viewing angle near the edges of the 3000-km wide imaging swath. GAC data are generated by sampling four out of every five pixels along the scan line, and every third scan line, of the LAC data. Such processed data have a spatial resolution of 4 × 4 km².

Band | Wavelength, μm | Typical Use | LAC Resolution, km | GAC Resolution, km
1 | 0.58–0.68 | Daytime cloud/surface and vegetation mapping | 1.1 | 4
2 | 0.725–1.10 | Surface water delineation, ice, and snow melt | 1.1 | 4
3A | 1.58–1.64 | Snow/ice discrimination | 1.1 | 4
3B | 3.55–3.93 | Night cloud mapping, SST | 1.1 | 4
4 | 10.30–11.30 | Night cloud mapping, SST | 1.1 | 4
5 | 11.50–12.50 | SST (sea surface temperature) | 1.1 | 4
TABLE 2.2 Characteristics of AVHRR Bands and Their Uses

AVHRR data are available at two levels. Level 1B data are raw data that have not been radiometrically calibrated, even though radiometric calibration coefficients are appended to the data, together with Earth location data. They are supplied either as a single scene or as a mosaic of multiple scenes. A single scene image has a dimension of 2400 × 6400 km². A mosaic consists of multiple images from the same orbit that have been stitched together. Their availability is limited to certain dates only. Georegistered level 1B data have been radiometrically and geometrically corrected in accordance with the parameters specified by the user, including projection, resampling method, and pixel size. These data are supplied in single scenes only, in the binary format of 8 or 10 bits.

Because of the broad geographic area that can be covered by one scene and their low cost, AVHRR data have found applications in global and regional monitoring of forest, tundra, and grassland ecosystems. Other applications include agricultural assessment, land cover mapping, soil moisture analysis at the regional scale, tracking of regional and continental snow cover, and prediction of runoff from snow melting. The thermal bands of AVHRR data are also useful in retrieving various geophysical parameters such as SST (sea surface temperature) and energy budget. Because AVHRR data offer fairly continuous global coverage since June 1979, they are well suited to long-term longitudinal studies. Multiple results can be averaged to show the long-term patterns of global biomass and chlorophyll concentration (Fig. 2.1). Their extremely high temporal resolution also makes them well suited to monitoring dynamic and ephemeral processes, such as flooding and fires, on a broad scale. In geology, AVHRR images can be used to monitor volcanic eruptions, and to study regional drainage and physiographic features.
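As a quick illustration of how the nominal figures above fit together, the short Python sketch below recomputes the nadir ground resolution from the quoted altitude and IFOV, and the number of orbits completed per day from the orbital period. It is illustrative arithmetic only, using the rounded values given in the text.

# Nominal AVHRR viewing geometry taken from the text above (assumed round values).
ALTITUDE_KM = 833      # orbital altitude
IFOV_MRAD = 1.3        # instantaneous field-of-view at nadir
PERIOD_MIN = 102       # orbital period

# Ground sample distance at nadir: altitude x IFOV (small-angle approximation).
gsd_km = ALTITUDE_KM * IFOV_MRAD / 1000.0
print(f"Nadir ground resolution: {gsd_km:.2f} km")    # about 1.08 km, i.e., the nominal 1.1 km

# Orbits completed per day, derived from the orbital period.
print(f"Orbits per day: {24 * 60 / PERIOD_MIN:.1f}")  # about 14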
FIGURE 2.1 Global distribution of vegetation expressed as normalized difference vegetation index (NDVI) and chlorophyll averaged from multitemporal AVHRR data between June and August 1998. (Source: Goddard Space Flight Center.) See also color insert.
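The vegetation index mapped in Fig. 2.1, the NDVI, is computed per pixel from the red and NIR bands (AVHRR bands 1 and 2). A minimal sketch is given below; it assumes the two bands have already been calibrated to reflectance and is not tied to any particular processing system.

import numpy as np

def ndvi(red, nir, eps=1e-10):
    """Normalized difference vegetation index from the red and NIR bands
    (AVHRR bands 1 and 2). Inputs are arrays of reflectance; eps guards
    against division by zero over dark targets."""
    red = np.asarray(red, dtype="float64")
    nir = np.asarray(nir, dtype="float64")
    return (nir - red) / (nir + red + eps)

# Illustrative values only: healthy vegetation is dark in red and bright in NIR.
print(ndvi([0.05, 0.30], [0.45, 0.32]))   # roughly [0.80, 0.03]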
2.2 Oceanographic Satellite Data

The Sea-viewing Wide Field-of-view Sensor (SeaWiFS) satellite was launched on August 1, 1997 into a 705-km sun-synchronous orbit inclined at 98.2° (Table 2.3). This satellite serves as the successor to the Coastal Zone Color Scanner sensor that ceased operation in 1986, in a mission to acquire quantitative data on ocean bio-optical and biogeochemical properties on the global scale. The satellite has a period of 98.9 minutes and a return period of only 1 day. The nadir resolution of SeaWiFS imagery is 1.13 km (LAC) and 4.5 km (GAC). All data are quantized to 10 bits. The ground swath varies from 1500 km at the scanning angle of 45° (GAC) to 2800 km at 58.3° (LAC). There are 1285 (LAC) and 248 (GAC) pixels along a scan line.

Height | 705 km
Inclination | 98.217°
Period | 98.9 min
Orbit type | Sun synchronous
Speed | 6.47 km/s
Repeat cycle | 1 day
Spatial resolution (km) | 1.13 (LAC), 4.5 (GAC)
Swath width | 2,801 km LAC/HRPT (58.3°); 1,502 km GAC (45°)
Quantization | 10 bits
Source: Feldman.
TABLE 2.3 Orbit Characteristics of SeaWiFS Satellite

The eight spectral bands of SeaWiFS imagery cover the wavelength range of 0.402 to 0.885 μm over the visible light and NIR spectrum (Table 2.4). Such a narrow range of spectral sensitivity is justified because ocean color is mostly observable in visible light.

Band | Wavelength, μm (color) | Primary Use
1 | 0.402–0.422 (violet) | Dissolved organic matter (incl. Gelbstoffe)
2 | 0.443–0.453 (blue) | Chlorophyll absorption
3 | 0.480–0.500 (blue-green) | Pigment absorption (Case 2), K(490)
4 | 0.500–0.520 (blue-green) | Chlorophyll absorption
5 | 0.545–0.565 (green) | Pigments, optical properties, sediments
6 | 0.660–0.680 (red) | Atmospheric correction and sediments
7 | 0.745–0.785 (NIR) | Atmospheric correction, aerosol radiance
8 | 0.845–0.885 (NIR) | Atmospheric correction, aerosol radiance
TABLE 2.4 Spectral Bands of SeaWiFS Data and Their Major Uses

These data are processed to three levels. Level 1A data are raw radiance values; their calibration and navigation information is stored in a separate file in the hierarchical data format (HDF). Level 2 (GAC) data are processed products of 11 geophysical parameters, such as normalized water-leaving radiances at 412, 443, 490, 510, 555, and 670 nm. Other derived products are chlorophyll a concentration, Epsilon of aerosol correction at 765 and 865 nm, and aerosol optical thickness at 865 nm. Data processed to level 3 include five normalized water-leaving radiances that have been corrected for atmospheric scattering and sun angles differing from nadir, and seven geophysical parameters. Free access to these data and the results processed from them is granted to approved users only.

SeaWiFS data have a narrow and focused application area, namely, the study of ocean color on the global scale, which is critical to studying the concentration of microscopic marine plants (e.g., phytoplankton) and ocean biogeochemical properties. In conjunction with ancillary data, SeaWiFS data enable retrieval of meaningful biological parameters such as photosynthesis rates.
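Because SeaWiFS level 1A products and their calibration information are distributed in HDF, a brief sketch of inspecting such a file is given below. It assumes the third-party pyhdf package; the file name and dataset name are hypothetical placeholders, as the actual names depend on the product.

from pyhdf.SD import SD, SDC

# Open a SeaWiFS level 1A file stored in HDF4 for read-only inspection.
hdf = SD("S1998160.L1A_GAC.hdf", SDC.READ)     # hypothetical file name

for name, info in hdf.datasets().items():      # list the scientific datasets
    print(name, info[1])                       # dataset name and its shape

counts = hdf.select("l1a_data")[:]             # hypothetical dataset name
print(counts.shape)
hdf.end()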
2.3 Earth Resources Satellite Data

There are several satellites in this category, all of which share the same characteristics of capturing radiation in the visible light and NIR spectrum at a medium spatial resolution and a return period of around 20 days. Introduced in this section are six of the lead satellites/sensors: Landsat, Le Système Pour l'Observation de la Terre (SPOT, or Earth Observation System), Indian Remote Sensing (IRS), Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER), Moderate Resolution Imaging Spectroradiometer (MODIS), and Advanced Land Observing Satellite (ALOS).
2.3.1 Landsat Data
The launch of the first Landsat satellite by the National Aeronautics and Space Administration (NASA) on June 23, 1972, ushered remote sensing into the space era and aroused an enormous interest in digital image processing. During the course of the NASA space program, Landsat images have evolved toward a higher spatial resolution and a finer spectral resolution. Although some early satellites in this series are no longer in service, they did collect a tremendous quantity of data that are indispensable in long-term monitoring applications. These historic data are also essential in studying changes in land cover and the environment. Initially called the Earth Resources Technology Satellite and later renamed Landsat, the Landsat program represents the first unmanned satellite program designed specifically to acquire medium resolution, multispectral data about the Earth on a systematic and repetitive basis. The satellite has a circular, sun-synchronous orbit inclined at 99° (Table 2.5). At a height of 915 km and an orbital period of 103 minutes, the satellite is able to complete 14 revolutions around the globe each day. Between two consecutive orbits on the same day, the ground track shifts by 2,760 km at the equator. The same orbit shifts by about 160 km from one day to the next, resulting in a maximum overlap of only 14 percent between images recorded in adjacent orbits at the equator. Thus, it is impossible to establish three-dimensional (3D) viewing for most of the Earth's surface from Landsat imagery. Eighteen days later the orbit returns to where it started. These parameters were kept unchanged for the first three satellites to maintain consistency in the data acquired.
Height | 915 km (880–940)
Inclination | 99°
Period | 103 min
Revolution | 14 per day
Speed | 6.47 km/s
Distance between successive tracks at the equator | 2,760 km
Distance between orbits | 159.38 km
Repeat cycle | 18 days
Overlap at the equator | 14%
Time of equatorial crossing | 9:42 a.m.
Total IFOV | 11.56°
Orbit type | Circular, sun-synchronous
TABLE 2.5 Orbital Characteristics of Landsats 1, 2, and 3
Aboard Landsats 1 to 3 are two sensors, the Return Beam Vidicon (RBV) and the Multispectral Scanner (MSS). The RBV consists of three television-like cameras. These detectors, with a central perspective projection, were intended to obtain images of a high geometric fidelity in three spectral bands for mapping purposes. However, the sensor malfunctioned soon after launch. Consequently, a highly limited number of images were obtained during the mission. The MSS operates in four spectral bands spanning 0.5 to 1.1 μm (Table 2.6). Each band is equipped with six detectors. Thus, six lines of imagery are obtained simultaneously during cross-track scanning that is perpendicular to the direction of satellite motion. During scanning, a swath width of 185 km is covered on the ground as the scanning mirror rotates within a field-of-view (FOV) of 11.56°. At each scanning position, a ground area of 57 × 79 m² is scanned. One image comprises 2340 scan lines of about 3240 pixels each (Fig. 2.2). Data are transmitted to the ground receiving stations electronically, where all images are resampled to 79 × 79 m² before they are released to the general public. Data are recorded in the CCT (computer-compatible tape) form and can be downloaded from the U.S. Geological Survey website at http://glovis.usgs.gov/. Launched on July 16, 1982, and March 1, 1984, respectively, Landsat 4 and Landsat 5 retained most of the orbital characteristics of their predecessors (Table 2.7). While the satellite altitude was lowered by about 200 km, the total FOV increased to 14.92° so that the same swath width of 185 km on the ground could be maintained. Associated with the lower altitude is the shorter return period of 16 days. Landsat 4 and Landsat 5 are considered the second generation in the series in
Sensor | Spectral Bands (μm) | Spatial Resolution, m | Swath Width, km | Quantization Level, bits
MSS | Four bands, 0.5–1.1 | 79 | 185 | 7
TM | Seven bands, 0.45–12.5 | 30 (120 for band 6) | 185 | 8
ETM+ | PAN: 0.52–0.90; 6: 10.4–12.5 (the remaining bands are the same as TM's) | 15 (PAN), 60 (band 6), 30 (others) | 185 | 8
TABLE 2.6 Characteristics of Landsat MSS and TM Imagery
that their images have several improved qualities over imagery from Landsats 1 to 3. Since the RBV sensor was not successful, it was dropped from these two satellites. The MSS sensor had exactly the same properties as before. Added to these satellites was a new sensor called the Thematic Mapper (TM). TM imagery is recorded in seven spectral bands at a spatial resolution of 30 m, except band 6, which has a spatial resolution of 120 m. The wavelength range of these bands and their primary uses are provided in Table 2.8. The newest satellite in the series is Landsat 7, launched on April 15, 1999 (Landsat 6 failed soon after launch). Carried on board is a new sensor called the Enhanced TM Plus (ETM+). It has a few improvements over its predecessors, such as a panchromatic band (band 8) at a spatial resolution of 15 m. Besides, the spatial resolution of the TIR band (band 6) was refined from 120 to 60 m. All seven multispectral bands have maintained the same wavelengths (Table 2.8). The ground area covered per scene still stays at 185 × 185 km². Landsat 7 ETM+ data are available to the general public at two levels, 0Rp and 1G. Level 0Rp data are raw data that have not been corrected for radiometric and geometric distortions except that scan lines are reversed and nominally aligned. Level 1G data have been corrected for systematic distortions, such as radiometric calibration and geometric transformation to a user-specified projection. Such geometrically corrected images have a typical accuracy of <250 m in low-relief areas.
FIGURE 2.2 The cross-track scanning concept of the Landsat MSS sensor. The image has a format of 185 × 185 km², made up of 2340 lines by about 3240 pixels per scan line. The scanning angle subtended by each ground cell is the IFOV (too small to mark in the diagram). The scanning direction is perpendicular to the satellite motion.
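The MSS figures quoted above are mutually consistent, as the short sketch below shows; it simply recomputes the swath width from the number of samples per line and the overlap between adjacent swaths from the values in Table 2.5 (illustrative arithmetic only).

# Consistency check of the Landsat MSS scanning geometry quoted above,
# using only the figures given in the text and in Tables 2.5 and 2.6.
PIXELS_PER_LINE = 3240
SAMPLE_SPACING_M = 57       # across-track sample spacing before resampling
SWATH_KM = 185
TRACK_SPACING_KM = 159.38   # distance between adjacent orbits at the equator

swath_km = PIXELS_PER_LINE * SAMPLE_SPACING_M / 1000.0
sidelap = (SWATH_KM - TRACK_SPACING_KM) / SWATH_KM

print(f"Swath reconstructed from the pixels: {swath_km:.0f} km")   # about 185 km
print(f"Overlap between adjacent swaths:     {sidelap:.0%}")       # about 14%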
Altitude | 705 km
Period | 98.9 min
Total FOV | 14.92°
Inclination | 98.2°
Repeat cycle | 16 days (233 revolutions)
Orbit type | Sun-synchronous polar
Equatorial crossing time | 10:00 a.m.
TABLE 2.7 Orbital Characteristics of Landsats 4, 5, and 7
Band | Wavelength, μm | Primary Use
1 | 0.45–0.52 (blue-green) | Water body penetration, useful for coastal water mapping, and for differentiation of soil from vegetation, and deciduous from coniferous flora
2 | 0.52–0.60 (green) | For measuring the visible green reflectance peak of vegetation for vigor assessment
3 | 0.63–0.69 (red) | A chlorophyll absorption band important for vegetation discrimination
4 | 0.76–0.90 (NIR) | Useful for determining biomass content and for delineation of water bodies
5 | 1.55–1.75 (mid-IR) | Indicative of vegetation moisture content and soil moisture. Also useful for differentiation of snow from clouds
6 | 10.40–12.50 (TIR) | Vegetation stress analysis, soil moisture discrimination, and thermal mapping
7 | 2.08–2.35 (mid-IR) | For discrimination of rock types and for hydrothermal mapping
8* | 0.52–0.90 | General purpose land cover/land use mapping
*Available in Landsat 7 ETM+ only.
TABLE 2.8 Landsat TM and ETM+ Spectral Bands and Their Major Applications
Both level 0Rp and 1G image data are provided in rescaled 8-bit unsigned integer values. More accurately corrected images using ground control points and digital elevation models (DEMs), at levels 1R and 1T, are available to U.S. Geological Survey–approved researchers only. Landsat imagery may be purchased as single or multiple scenes. The cost starts from as low as $200 per scene. The price is higher for processed data. Data delivery has evolved from exclusively CCT-based to Internet downloading in the Tagged Image File Format (TIFF) or GeoTIFF format. Delivery via hardcopy media takes 1 to 3 weeks. By comparison, about 1 to 3 days are required to process a request to download the data via the Internet. Data can be purchased from a few vendors, one of which is Space Imaging. As its name suggests, Landsat TM data are suited primarily for producing thematic maps in a wide range of areas, from water to land. Thanks to their fine spectral resolution and a moderately high spatial resolution, Landsat TM/ETM+ data have found wide applications in mapping land use, managing natural resources such as forestry and
water resources, monitoring flooding, and in agriculture, geology, and oceanography. These applications are made more versatile and robust by a large body of data spanning over three decades.
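Since Landsat scenes are now commonly delivered as GeoTIFF files of rescaled 8-bit DNs, a brief sketch of reading one band and applying a linear DN-to-radiance rescaling is given below. It is illustrative only: the file name and the gain/offset values are hypothetical placeholders, and it assumes the third-party rasterio package rather than any particular vendor's tooling.

import numpy as np
import rasterio

# The real rescaling coefficients are supplied with each scene's metadata;
# the values below are hypothetical, band-specific placeholders.
GAIN, OFFSET = 0.7757, -6.2

with rasterio.open("LE07_example_B3.TIF") as src:   # hypothetical file name
    dn = src.read(1).astype("float64")              # 8-bit DNs of one band
    print(src.crs, src.res)                         # map projection and pixel size

radiance = GAIN * dn + OFFSET                       # W m^-2 sr^-1 um^-1
print(float(radiance.max()))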
2.3.2 SPOT Data
Designed by the French Centre National d'Etudes Spatiales (CNES), SPOT was first launched into an orbit of 832 km on February 21, 1986. This first commercial Earth observation satellite provides complete world coverage with stereoscopic viewing capability at a high spatial resolution. A number of satellites have been launched in this series, all having an identical set of orbital parameters, such as a circular, near-polar, sun-synchronous orbit with an inclination of 98.7° (Table 2.9). The payload of the first three SPOT satellites encompasses two identical high resolution visible (HRV) sensors that operate in either of two modes, panchromatic or multispectral. In the first mode, one panchromatic band is acquired over the wavelength range of 0.51 to 0.73 μm at a spatial resolution of 10 m. In the second mode, three multispectral bands are obtained at a spatial resolution of 20 m (Table 2.10). Unlike Landsat TM, SPOT uses the pushbroom scanning technology to minimize scanning duration (Fig. 2.3). Owing to the use of a linear array of 6000 detectors, 3000 pixels of imagery in the multispectral mode or 6000 pixels in the panchromatic mode are obtained simultaneously along each scan line. Consequently, SPOT imagery is much more geometrically reliable than its Landsat counterpart obtained via cross-track scanning. Furthermore, the scanning mirror can be tilted in steps of 0.6° by up to 27° in either direction away from the nadir, reaching a maximum swath width of 80 km (Fig. 2.4). This off-nadir viewing capability offers two advantages in image acquisition. First, it can curtail the revisit period of the satellite from the nominal 26 days to a few days. If the nadir area is under cloud cover, it can still be sensed during the next orbit of the same day by steering the scanning mirror sideways toward the missed track on the ground.
Height | 832 km
Inclination | 98.7°
Repeat cycle | 26 days
FOV | 4.14°
Off-nadir viewing | Up to 27° in 45 steps of 0.6°
Orbit type | Near polar, sun synchronous
Equatorial crossing | 10:30 a.m.
TABLE 2.9 Orbital Parameters and Sensor Characteristics of SPOT Satellites
Image Properties | Multispectral | Panchromatic
Wavelength (μm) | 0.50–0.59; 0.61–0.68; 0.79–0.89; SWIR (shortwave infrared)*: 1.58–1.75 | 0.51–0.73
Spatial resolution at nadir (m) | 20 | 10 (2.5 or 5)†
Swath width at nadir (km) | 60 | 60
Number of pixels per line | 3,000 | 6,000
Quantization level (bits) | 8 | 6
*Available in SPOT 4 and SPOT 5 only. †Figure in the bracket applies to SPOT 5 only.
TABLE 2.10 Characteristics of SPOT Satellite Data
FIGURE 2.3 The “pushbroom” concept of ground scanning adopted by SPOT satellites. In this mode the scanning direction is in agreement with the satellite motion direction, hence shortening scanning duration to achieve a higher geometric fidelity at the expense of using more detectors.
FIGURE 2.4 The off-nadir viewing capability of SPOT satellite achieved by tilting the scanning mirror by up to 27° in both directions. This capability makes it possible to obtain 3D views of the ground and to shorten the revisit period.
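The swath and tilt figures quoted for the HRV instrument can be approximated with simple flat-Earth geometry, as sketched below. The sketch uses only the altitude, field of view, and tilt quoted in the text, and it slightly underestimates the off-nadir swath because it ignores Earth curvature.

import math

# Flat-Earth sketch of the SPOT HRV viewing geometry described above.
H_KM = 832         # orbital altitude
FOV_DEG = 4.14     # total field of view of one HRV instrument
TILT_DEG = 27.0    # maximum mirror tilt on either side of nadir

def swath_km(tilt_deg):
    """Ground width seen by the instrument when the mirror is tilted."""
    half = FOV_DEG / 2.0
    near = math.tan(math.radians(tilt_deg - half))
    far = math.tan(math.radians(tilt_deg + half))
    return H_KM * (far - near)

print(f"Swath at nadir:       {swath_km(0):.0f} km")         # about 60 km
print(f"Swath at 27° tilt:    {swath_km(TILT_DEG):.0f} km")  # about 76 km (quoted as roughly 80 km)
print(f"Ground offset at 27°: {H_KM * math.tan(math.radians(TILT_DEG)):.0f} km")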
Because of the tilt, the ground swath becomes larger than the nominal 60 km. Second, stereoscopic viewing of the ground is feasible from stereo images acquired in two neighboring orbits. Starting from SPOT 4, launched on March 23, 1998, a new spectral band called shortwave infrared (SWIR), covering a wavelength range of 1.58 to 1.75 μm, was added to the sensor in order to overcome the deficiency of SPOT data in automatically mapping vegetation. Vegetation, which has a unique and distinctive spectral response in the shortwave infrared spectrum, is best distinguished from other ground covers over this wavelength range. Also, an additional sensor called the vegetation monitoring instrument (VMI) was added to SPOT 4 and to SPOT 5, the latter launched on the night of May 3 to 4, 2002, to improve their capability in monitoring vegetation and in discriminating minerals. This sensor has a wide FOV of 101°, with a nadir spatial resolution of 1 km. In SPOT 5 the spatial resolution of the panchromatic band was halved from 10 to 5 m, and it can be resampled to 2.5 m using the Supermode processing.
SPOT data are processed to various levels for distribution. Level 1A data are essentially raw data without geometric correction except normalization of the charge coupled device (CCD) detectors. Level 1B data have been corrected for such geometric distortions as view angle, Earth rotation, and curvature, in addition to the processing done at level 1A. The extra processing done to level 2A data includes projection to a ground coordinate system. Such corrected images have a global coordinate system at an absolute positional accuracy of around 500 m. However, the optimal location accuracy can be as high as approximately 20 m, the most accurate among all processed SPOT data. SPOT data have a number of strengths and weaknesses in comparison with Landsat TM imagery (Table 2.11). The strengths of SPOT data lie in their finer spatial resolution and higher geometric fidelity, thanks to the use of many more detectors and along-track scanning. Stereoscopic viewing of the scene is possible by scanning the same ground from adjacent tracks via tilting of the scanning mirror. However, the data are disadvantageous in that each scene covers only a fraction of that of Landsat TM imagery. Thus, it is more expensive to buy SPOT data for a large study area. Unlike TM imagery, which comes in seven bands (eight if the panchromatic band is counted), SPOT data are available in three multispectral bands, or four if the SWIR band is counted. The total number of spectral bands (a maximum of five) is thus still smaller than TM's eight. Due to its limited number of spectral bands in the near and middle infrared spectrum, SPOT imagery most likely produces less accurate results than Landsat TM if classified automatically by the computer, especially if the object of study is related to vegetation.
Image Properties | Landsat TM | SPOT
Spatial resolution (m) | 30 (15) | 20 (10)
Spectral bands* | 7 + 1 bands | 3 (4) + 1 bands
Revisit period (days) | 18/16 | 26
Image format (km²) | 185 × 185 | 60 × 60
Manner of scanning | Cross track | Along track
Geometric fidelity | Lower | Higher
Off-nadir viewing | No | Yes
3D imaging capability | No | Yes
*In recent satellites a panchromatic band was added to Landsat TM, and a SWIR band was added to SPOT 4 and SPOT 5.
TABLE 2.11 Comparison of Landsat TM and SPOT Imagery
SPOT multispectral data, with a spatial resolution slightly better than that of Landsat TM images, have found applications in many similar areas. They range from agriculture, forest management, and natural disaster management to water resources management. Thanks to the availability of the finer spatial resolution panchromatic data, they have also been applied to coastal studies and oceanography, as well as urban planning, areas that are difficult to study using TM imagery. The 3D viewing ability of SPOT data enables the production of small-scale topographic maps in areas where usable stereoscopic aerial photographs are difficult to obtain because of frequent cloud blockage or where such photographs are nonexistent.
2.3.3 IRS Data
The third major player in remotely sensing the Earth's surface for resources mapping is India's national space program. Since the late 1980s, the IRS program has witnessed the launch of a series of satellites with a revisit period of 22 days. The first one, IRS-1A, was launched on March 17, 1988, followed by IRS-1B on August 29, 1991. The orbital parameters of both satellites (Table 2.12) closely resemble those of the Landsat satellites. For instance, their orbital height, inclination, orbital period, and return period are almost identical to those of Landsats 1 to 3. Different is the sensor aboard the IRS satellites, which is called the Linear Imaging Self-Scanning Sensor (LISS). There is a series of LISS sensors. LISS-I has a spatial
Satellite Parameter | IRS-1A and 1B | IRS-1C, -1D | Resourcesat-1
Height (km) | 905 | 817 | 817
Inclination | 99° | 98.6° | 98.59°
Period (min) | 103 | 101.35 | 101.35
Revolution | 14 per day | About 14 per day | 14 per day
Orbit type | Sun synchronous, near polar | Sun synchronous | Sun synchronous
Repeat cycle (days) | 22 | 5–24 | 5–25
Time of equatorial crossing | 9:40 a.m. | 10:30 a.m. | 10:30 a.m.
Sensors aboard | LISS-I, LISS-IIA, LISS-IIB | LISS-III, PAN, WiFS | LISS-III, LISS-IV, AWiFS
TABLE 2.12 Orbital Characteristics of IRS Satellites
resolution of 72 m and a swath width of 146 km. LISS-II has two separate imaging sensors, LISS-IIA and LISS-IIB, at a spatial resolution of 36.25 m each. Jointly, they provide a composite swath width of 146.98 km. This first generation of sensors contains four spectral bands, three visible and one NIR (Table 2.13). Launched on December 28, 1995 and September 28, 1997, respectively, IRS-1C and IRS-1D represent the second generation of the IRS series of satellites. They were designed as follow-ons to the first generation of IRS-1A and IRS-1B satellites, with an enhanced resolution and capability. IRS-1C has a near-polar, sun-synchronous orbit at an altitude of 817 km with a local north-to-south equatorial crossing time of 10:30 a.m. (Table 2.12). The satellite completes about 14 revolutions around the Earth per day (NRSA, 1995). Each successive orbit is shifted westward by 2820 km at the equator. There is a distance of 117.5 km between any two adjacent paths at the equator. It takes 24 days for the satellite to cover the Earth completely. The payload of IRS-1C and -1D consists of three sensors: one panchromatic (PAN) camera, one LISS-III sensor, and one Wide Field Sensor (WiFS) (Table 2.13). All three sensors use pushbroom scanning to ensure a high geometric fidelity. The PAN camera can be rotated by up to 26°, reducing the revisit period to only 5 days. Through this rotation it is also possible to acquire stereoscopic images. Recently, the IRS program was expanded to include Resourcesat-1, launched on October 17, 2003. This satellite followed the ground track of the IRS-1C satellite. Its payload is similar to that of the IRS-1C and IRS-1D satellites, namely, LISS-III, LISS-IV, and Advanced WiFS (AWiFS). LISS-III is identical to that in the previous satellites. The high resolution LISS-IV sensor operates in either of two modes: multispectral or monospectral. In the multispectral mode, all three visible and near-infrared (VNIR) spectral bands cover a swath width of 23 km within a total swath of 70 km. In the monospectral or panchromatic mode, a single band covers a full swath of 70 km at a spatial resolution of 5.8 m. AWiFS operates in four spectral bands (Table 2.13). Its twin cameras are tilted by 11.94° from each other, each covering a swath of 370 km on the ground. Specifically designed for agricultural and Earth resources applications, both cameras produce images with a nadir resolution of 56 m and a 5-day revisit period. Closely following the design of the Landsat sensors, early IRS satellite data share a striking similarity in their properties to Landsat MSS and TM. The designation of bandwidth of IRS-1A and IRS-1B bears a remarkable resemblance to that of Landsat MSS imagery. Besides, the spatial resolution (73 m) and swath width (146 km) are both very similar to, but slightly inferior to, those of MSS. Although the spatial resolution was halved to 36 m later, it is still coarser than the 30 m of Landsat TM. Because of these uncompetitive features, LISS-I and LISS-II data have found limited applications outside India. This fact is due partly to the absence of ground stations to receive IRS data
Satellite | Sensor | Spectral Band (μm) | Spatial Resolution, m | Swath Width, km | Quantization Level, bits
IRS-1A, IRS-1B | LISS-I, II | 0.45–0.52; 0.52–0.59; 0.62–0.68; 0.77–0.86 | 73 (I), 36 (II) | 146 | 7
IRS-1C, IRS-1D | LISS-III | 0.52–0.59; 0.62–0.68; 0.77–0.86; 1.55–1.70 | 23.5 (70.5 for the 1.55–1.70 band) | 141 (148 for the 1.55–1.70 band) | 7
IRS-1C, IRS-1D | PAN | 0.5–0.75 | 5.8 | 70 | 6
IRS-1C, IRS-1D | WiFS | 0.62–0.68; 0.77–0.86 | 189 | 810 | 7
Resourcesat-1 | LISS-III | B2: 0.52–0.59; B3: 0.62–0.68; B4: 0.77–0.86; B5: 1.55–1.70 | 23.5 | 141 | 7
Resourcesat-1 | LISS-IV* (PAN mode) | B2: 0.52–0.59; B3: 0.62–0.68; B4: 0.77–0.86 | 5.8 | 23.9 (70 in PAN mode) | 10 (7)
Resourcesat-1 | AWiFS | B2: 0.52–0.59; B3: 0.62–0.68; B4: 0.77–0.86; B5: 1.55–1.70 | 56 (at nadir) to 70 | 370 × 2 | 10
*The same sensor can work either in the LISS-IV or the PAN mode.
TABLE 2.13 Characteristics of Sensors aboard the IRS Satellites
outside India, apart from the one in Norman, Oklahoma (Bakker, 1998). No established data vendors were selling IRS data. Starting from the second generation of satellites, the IRS data underwent significant improvements in their quality, with stereo viewing capability added and the revisit period shortened. The type of data acquired has been diversified to include very coarse resolution imagery for regional studies, and the spatial resolution of the multispectral bands has been refined to less than 10 m. These moderate resolution data are able to fulfill the traditional applications (mainly natural resources mapping) similar to those of Landsat MSS and TM data. LISS-III data with a 23.7 m spatial resolution can complement data from the aging Landsat 5 TM sensor. However, high resolution IRS data will stimulate new applications. In particular, the 5.8-m resolution PAN imagery with stereo capabilities is ideal for applications that require spatial detail, and the coarse WiFS data are suited to monitoring vegetation over broad areas. Not only has IRS data quality been enhanced, but the means of data distribution and delivery has also been diversified. It is now possible to purchase IRS data from lead data supply agencies, such as Space Imaging in the United States via its Web site.
2.3.4 ASTER Data

The Earth Observing System (EOS) program jointly initiated by the United States and other countries heralded a new era in spaceborne remote sensing. It triggered one of the three trends in remote sensing data acquisition, namely, multiple sensors aboard a single platform, each of which is designed to obtain data intended for specialized applications, and all of which complement each other. In this largest and most ambitious mission ever undertaken, the flagship satellite is the Terra spacecraft launched on December 18, 1999. It has a sun-synchronous circular orbit, crossing the equator at 10:30 a.m. local time (descending node). Terra has a polar orbit 705 km in altitude. Its orbital path follows closely that of Landsat 7 to ensure data continuity. Terra was later complemented by another satellite, Aqua, launched on May 4, 2002, which crosses the equator at 1:30 p.m. The payload of the Terra satellite includes five state-of-the-art sensors, each designed for a specific domain of applications (Table 2.14). These sensors are called ASTER, MODIS, Clouds and the Earth's Radiant Energy System (CERES), Multiangle Imaging Spectroradiometer (MISR), and Measurements of Pollution in the Troposphere (MOPITT). The diverse data collected by these sensors are excellent for studying the Earth's radiation balance, including the effect of heavier cloud cover on the amount of solar radiation absorbed by the planet, human-induced land cover and land use changes, glacier volume, properties of the mid to upper atmosphere, and the effects of volcanic activity on the atmosphere (JPL, 2004).
Sensor | General Characteristics | Primary Applications
ASTER | Three scanners | Land surface, land cover mapping (including vegetation conditions), hazard monitoring, geology, and hydrology
MODIS | 36-channel imaging spectrometer | Monitoring large-scale changes in the biosphere (e.g., global carbon cycle)
CERES | Two broadband scanners | Assessing clouds' roles in radiative fluxes from the surface to the top of the atmosphere
MISR | Four-channel CCD arrays | Differentiation of different types of clouds, aerosol particles, and surfaces
MOPITT | Three NIR scanners | Studying distribution, transport, sources, and sinks of carbon monoxide and methane in the troposphere
TABLE 2.14 Summary of Sensors aboard the Terra Spacecraft
Of the five sensors, ASTER is designed to study Earth resources. ASTER data are collected in 14 spectral bands from the visible light to the TIR wavelengths (Table 2.15). The ASTER sensing system is made up of three subsystems covering the visible and near infrared (VNIR), SWIR, and TIR, respectively (Fig. 2.5). All three telescopes can be rotated by up to ±24° in the cross-track direction by tilting the entire telescope assembly. The VNIR subsystem encompasses two telescopes, one nadir-looking and the other backward-looking. Recording images over an identical wavelength range, they facilitate stereoscopic viewing of the target area. All four VNIR bands have a spatial resolution of 15 m, finer than that of SPOT multispectral bands. The six SWIR bands have a 30-m resolution. The TIR subsystem operates in five bands with a resolution of 90 m. Each band uses 10 detectors in a staggered array, with optical bandpass filters over each detector element. ASTER data are available at several levels. Level 1A data are raw image data (e.g., radiance at sensor) in addition to the radiometric and geometric coefficients. They are stored in the HDF, separated by telescope. Level 1B data are those 1A data that have been radiometrically and geometrically corrected using the supplied coefficients. Data at higher levels may also be available, but they are produced only upon request. Essentially, ASTER is an on-demand instrument. Thus, data are not routinely recorded unless a special
Characteristics | VNIR | SWIR | TIR
Spectral band and range (μm) | 1: 0.52–0.60; 2: 0.63–0.69; 3: 0.76–0.86* | 4: 1.600–1.700; 5: 2.145–2.185; 6: 2.185–2.225; 7: 2.235–2.285; 8: 2.295–2.365; 9: 2.360–2.430 | 10: 8.125–8.475; 11: 8.475–8.825; 12: 8.925–9.275; 13: 10.25–10.95; 14: 10.95–11.65
Ground resolution at nadir (m) | 15 | 30 | 90
Swath width (km) | 60 | 60 | 60
Data rate (Mbps) | 62 | 23 | 4.2
Cross-track pointing (°) | ±24 | ±8.55 | ±8.55
Cross-track pointing (km) | ±318 | ±116 | ±116
Quantization (bits) | 8 | 8 | 12
*Two identical bands are acquired, one nadir looking and one backward looking.
TABLE 2.15 Characteristics of ASTER Spectral Bands
FIGURE 2.5 Sensors aboard the Terra satellite. (Source: Goddard Space Flight Center.)
request is received. All archived ASTER 1A and 1B data may be searched and ordered via the Earth Observing System Data Gateway (EDG) at http://lpdaac.usgs.gov/. ASTER data are rather inexpensive. Level 1 data of the U.S. continent and territories can be downloaded from the above site for free. Data for the rest of the world can be purchased at $85 ($80 plus a $5 handling fee) per scene. Data delivery via the Internet is rather efficient and quick if the user has access to broadband. At a relatively high spatial resolution with over 10 spectral bands, ASTER data serve as a continuation of Landsat TM data. Therefore, their application areas are highly similar to those of TM data. They include, but are not restricted to, natural resources mapping and monitoring. In particular, ASTER VNIR imagery is invaluable in applications where it is important to monitor sparsely populated vegetation, such as land desertification and soil salinization (Fig. 2.6). In such applications the SWIR data are not as useful as those of Landsat ETM+ because they overlap with each other excessively in their information content (Gao and Liu, 2008). The combination of the 3D viewing capability of ASTER
FIGURE 2.6 A subscene (1001 rows by 1101 columns) ASTER image of Northeast China recorded on September 11, 2004. This color composite is formed by VNIR bands 1 (b), 2 (g), and 3 (r). Its spatial resolution of 15 m is ideal in studying natural hazards such as land salinization, which appears as white patches in this composite. See also color insert.
VNIR imagery with its high resolution allows the production of small-scale topographic maps and DEMs, especially in mountainous areas. In addition, ASTER TIR bands are suited to the creation of detailed maps of land surface temperature, emissivity, and reflectance.
2.3.5 MODIS Data
The MODIS instrument is another key sensor aboard the Terra (EOS AM) and Aqua (EOS PM) satellites. This sensor is able to cover the entire Earth every 1 to 2 days. Its total FOV of ±55°, combined with a satellite altitude of 705 km, enables a swath width of 2330 km to be scanned using cross-track scanning. The MODIS multispectral radiometer captures radiation over the wavelength range of 0.405 to 14.385 μm in 36 spectral bands at three spatial resolutions ranging from 250 m (bands 1 and 2) to 1000 m (Table 2.16). Five bands have a spatial resolution of 500 m and the remaining 29 spectral bands have a spatial resolution of 1 km, all quantized to 12 bits. MODIS data are processed to several levels, ranging from level 0 to level 1B. In total, 44 standard MODIS data products are available to approved users at minimal or no cost. Thanks to its wide sweeping swath and larger number of spectral bands (36), MODIS represents a significant improvement over AVHRR imagery in sensing a wide array of terrestrial processes. MODIS data will
Swath dimension | 2,330 × 10 km (at nadir)
Spatial resolution | 250 m (bands 1–2); 500 m (bands 3–7); 1,000 m (bands 8–36)
TABLE 2.16 Characteristics of MODIS Data (Source: NASA, 2008)
enhance our understanding of the dynamics and processes of the Earth's surface on the global scale, such as global change in oceanography, biology, and the atmosphere. In particular, MODIS bands 1 to 7 can be used to differentiate land from cloud and to study the boundaries and properties of aerosols. Bands 8 to 16 are the most useful in studying ocean color, phytoplankton, and biogeochemistry. The remaining bands are suited to determining atmospheric water vapor, studying surface and cloud temperatures, ozone, and clouds, as well as the temperature of the atmosphere. Therefore, MODIS data are suited not only to the study of the Earth's surface, but also of the atmosphere.
2.3.6 ALOS Data

Designed for precise land observation over the optical and microwave portions of the spectrum, the Advanced Land Observing Satellite (ALOS) was launched into a sun-synchronous orbit on January 24, 2006. Its payload comprises three sensors, the Panchromatic Remote Sensing Instrument for Stereo Mapping (PRISM), the Advanced Visible and Near Infrared Radiometer type 2 (AVNIR-2), and the Phased Array type L-band Synthetic Aperture Radar (PALSAR) (Table 2.17). PRISM is made up of three independent optical systems for forward, nadir, and backward observations in the along-track direction, sensing radiation over the 0.52 to 0.77 μm spectral range. Two of the telescopes are tilted by 24° to achieve off-nadir viewing.
Sensor | PRISM | AVNIR-2 | PALSAR (High Resolution) | PALSAR (ScanSAR)
Wavelength (μm)/Frequency (GHz) | 0.52–0.77 | 0.42–0.50; 0.52–0.60; 0.61–0.69; 0.76–0.89 | 1.27 GHz (L band) | 1.27 GHz (L band)
Spatial resolution (m) | 2.5 | 10 | 10 | 100
Swath width (km) | 35–70 | 70 | 70 | 250–350
Point angle (°) | +/−24 | +/−44 | 10–51 | 10–51
Number of looks | | | 2 | Flexible
Polarization | | | HH, VV, HH & HV, VV & VH | HH, VV
Data transmission rate (Mbps) | 960 | 160 | 240 | 240
Source: JAXA, 2004.
TABLE 2.17 Major Characteristics of ALOS Imagery
Panchromatic images are acquired along the satellite track with a base-to-height ratio of 1.0. In this forward-nadir-backward stereo mode, a swath width of 35 km on the ground is covered at a spatial resolution of 2.5 m (JAXA, 2004). This width rises to 70 km in the nadir-only viewing mode. By pointing the telescope sideways, the normal revisit period of 46 days can be reduced to as little as 2 days. Stereoscopic PRISM data can be used to construct highly accurate DEMs and to produce topographic maps of the world at a scale of 1:25,000. AVNIR-2 is a visible and NIR imaging radiometer with four multispectral bands at a spatial resolution of 10 m at nadir (Table 2.17). In this position a strip of 70 km is scanned along track. The sensor can be tilted cross track by up to 44° away from the nadir position either way for quickly monitoring natural disasters, such as earthquakes, fires, volcanic eruptions, and oil spills. However, the primary applications of AVNIR-2 data are the mapping of land covers and environmental monitoring at the regional scale, very similar to Landsat TM and SPOT data. PALSAR is an L-band (1.27-GHz) sensor designed to succeed the Synthetic Aperture Radar (SAR) sensor aboard the Japanese Earth Resources Satellite-1 (JERS-1) (refer to Sec. 2.6.1 for more details) with improved functionality. This sensor is able to operate in either the high resolution mode or the ScanSAR mode. The former mode generates images at a spatial resolution of 10 m over a swath width of 70 km. This conventional mode is intended for detailed regional observations and repeat-pass interferometry. In the second mode a swath width of about 250 to 350 km, depending upon the number of scans, is covered. This mode extends the swath width of conventional SAR images by three- to fivefold, a feature particularly useful for monitoring sea-ice extent and rainforests.
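As a rough indication of why a base-to-height ratio of about 1.0 supports accurate DEM generation, the sketch below applies the standard photogrammetric rule of thumb relating parallax measurement error to height error. The one-pixel matching accuracy assumed here is illustrative, not a PRISM specification.

# Rule-of-thumb relation between measured stereo parallax and height for
# along-track stereo such as the PRISM pair: delta_h ~ delta_p x (H / B),
# where B/H is the base-to-height ratio. A sketch only; rigorous DEM
# generation relies on a full sensor model and bundle adjustment.
BASE_TO_HEIGHT = 1.0    # fore/aft PRISM pair, as quoted above
GSD_M = 2.5             # panchromatic ground sample distance

# If image matching resolves parallax to roughly one pixel:
parallax_error_m = 1.0 * GSD_M
height_error_m = parallax_error_m / BASE_TO_HEIGHT
print(f"Indicative height uncertainty: about {height_error_m:.1f} m")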
2.4 Very High Spatial Resolution Data

About a decade ago spaceborne remote sensing experienced another trend, namely, the advent of very high spatial resolution satellite imagery. The spatial resolution of these images is so fine that it is comparable to that of airborne photographs. A few successful satellite programs have recorded images with a spatial resolution on the order of meters and submeters. So far more than twelve of them have become available for digital analysis (Table 2.18). The majority of these sensors record only a panchromatic band that is intended mostly for cartographic mapping, even though environmental monitoring is possible with the data. Acquisition of multispectral bands, usually four in number, for the purpose of feature detection is not the primary objective of these satellites. Of these systems, some have been in existence for nearly a decade, and others are still in the pipeline, to
Satellite | Agency | Altitude, km | Viewing | Nadir Pan Resolution, m | Nadir XSL(*) Resolution, m | Swath Width, km
IKONOS 2/3 | GeoEye (Space Imaging) | 680 | Flexible | 0.82 | 3.28 (4) | 11.3
EROS A | ImageSat Int., Israel | 600 | Flexible | 1.9 | | 14
EROS B | ImageSat Int., Israel | 500 | | 0.7 | | 7
EROS C | ImageSat Int., Israel | 500 | | 0.7 | 2.8 | 11
QuickBird 2/3 | DigitalGlobe | 450 | Flexible | 0.61 | 2.44 (4) | 16.4
WorldView 1/2 | DigitalGlobe | 496 | | 0.55 | | 17.6
OrbView 3 | OrbImage | 470 | | 1 | 4 (4) | 8
TESS | ISRO, India | 565 | Flexible | 1.0 | | 12
Cartosat-1 | ISRO, India | 617 | +26° | 2.5 | | 30
Cartosat-2 | ISRO, India | 630 | Flexible | 1 | | 10
ALOS | NASDA, Japan | 691 | ±24° | 2.5 | | 35/70
Resurs DK1-3 | Russia | 360–604 | | 1 | 2–3 (3) | 28.2
Kompsat-2 | Korea | 675–701 | | 1 | 4 | 15
COSMO-SkyMed | ASI/MOD | 622–627 | Side looking | 1 to 100† | | 10–100
Pleiades-1/2 | CNES, France | 694 | Flexible | 0.7 | 2.8 | 20
(*) Number of spectral bands. ASI: Italy Space Agency/Ministry of Defense. †Resolution depends on the mode of operation.
TABLE 2.18 A Comparison of Major Characteristics of Very High Resolution Satellite Imagery
FIGURE 2.7 Very high resolution (e.g., <4 m) satellites that have been launched recently or are to be launched soon. ∗ Launched earlier. Data available since January 2001.
be launched either this year or over the next few years (Fig. 2.7). Apart from COSMO-SkyMed (Constellation of Small Satellites for the Mediterranean basin Observation), all systems record optical data in the visible light and NIR portion of the spectrum. This section focuses on six major types: IKONOS, QuickBird, OrbView-3, Cartosat, GeoEye, and WorldView. Other satellite data will be introduced less extensively.
2.4.1 IKONOS
Launched on September 24, 1999, the IKONOS-2 satellite (IKONOS-1 was launched on April 23, 1999, but failed) ushered spaceborne remote sensing into the hyperspatial resolution era. It made very high spatial resolution satellite data commercially available for the first time in history. Weighing about 720 kg, the IKONOS-2 satellite spins around the Earth at an altitude of 681 km in a sun-synchronous orbit (Table 2.19). Its light weight means that it is relatively easy and less costly to launch into a predefined orbit than a heavy satellite. Because of its light weight, the life expectancy of IKONOS-2 is anticipated to be between 5 and 7 years. With an orbital period of 98 minutes, the satellite is able to revolve around the Earth 14 times a day. Data can be collected over a total area of 20,000 km² in a single pass. The payload of the satellite comprises a digital camera that records data in two modes, multispectral and panchromatic. In the multispectral mode, four spectral bands of blue to NIR wavelengths are captured at a spatial resolution of 4 m (Table 2.19). In the panchromatic mode, only one band is recorded over the wavelength range of 0.45 to 0.9 μm at 1 m resolution. Both types of data are quantized to 11 bits. The sensor can be tilted to acquire images up to
∗Figure in the bracket applies to IKONOS-3. Source: GeoEye, 2007a.
TABLE 2.19 Main Features of the IKONOS-2 Satellite and Imagery
700 km on either side of the track in the off-nadir viewing position, reducing the revisit period from the normal 35 days to 1.5 days. Data can be collected in the off-nadir direction up to 52°. The obtained imagery maintains a <1-m resolution at scanning angles up to 27.1°. Sold by the square kilometer, IKONOS data are rather expensive in comparison with aerial photographs and other satellite data of a coarser spatial resolution. A minimum order of 100 km² is imposed for data collected upon the user's request and 49 km² for archived georectified data. Panchromatic and multispectral bands of the same geographic area are sold as two separate products. Georectified IKONOS data are supplied at three processing levels: Geo, Pro, and PrecisionPlus (GeoEye, 2007a). Being the most accurate, PrecisionPlus data have a horizontal accuracy of 0.9 m root-mean-square (RMS). In general, the more processing is done to the data, the more expensive they become. A delicate balance must therefore be struck between cost and accuracy. One way of bringing down the cost is to purchase the data under a multiorganization licensing policy within a large institution (see Sec. 4.8).
The fine spatial resolution of IKONOS images makes them ideal candidates in applications that require detailed and highly accurate data, such as cadastral and infrastructure mapping, as well as detailed urban analysis. Panchromatic data of 1-m resolution are useful for mapping at a scale up to 1:5000. Stereoscopic pairs of PAN imagery allow construction of DEMs from them. Other applications of IKONOS data include planning, agriculture, and even insurance. In planning, IKONOS data are suited to planning housing development. In urban areas IKONOS imagery can be used to update street networks, plan urban transport, and manage utilities (Fig. 2.8). Pipelines and power lines, rubbish dumps, and dams can all be positioned using the fine resolution data. IKONOS images also play an important role in precision farming, such as pinpointing the spots in need of special treatment. For insurance purposes, the imagery can be used for assessing damage to properties and crops caused by natural disasters such as flooding, drought, and tornadoes.
2.4.2 QuickBird
There are three satellites in this series. The first, called EarlyBird, was launched by DigitalGlobe in late 1997, but it malfunctioned shortly after being sent into orbit. The second, QuickBird-1, failed to reach
FIGURE 2.8 A subscene (474 rows by 581 columns) color composite of IKONOS multispectral bands 2 (blue), 3 (green), and 4 (red) over a densely populated suburb of Auckland, New Zealand. It was recorded on April 8, 2001. The fine detail exhibited in IKONOS imagery is ideal in studying housing, street networks, and many environmental problems. (Copyright Space Imaging Inc.) See also color insert.
its orbit when it was launched on November 20, 2000. QuickBird-2 was successfully launched into a sun-synchronous, circular orbit 450 km in altitude on October 21, 2001. It is the first commercial satellite capable of gathering submeter data. Its orbit has an inclination of 97.2° and a period of 93.45 minutes (Table 2.20). The telescope of QuickBird-2 provides an FOV of 2.12° that can be further extended through a pointing capability of ±30° off nadir in the along-track and cross-track directions. At this off-nadir position, the revisit period is reduced to 1–3.5 days, the exact period varying with latitude. The sensor acquires high-resolution, coincident panchromatic and multispectral images simultaneously using pushbroom scanning. The CCD detector comprises an array of 27,568 pixels in the cross-track
Weight | 931 kg
Altitude | 450 km
Inclination | 97.2°
Orbital period (min) | 93.5
Orbital type | Circular, sun synchronous
Revisit cycle (days) | 1–3.5, depending upon latitude
FOV | 2.12°
Off-nadir viewing capability | ±30° (norm), up to 45°
Pointing accuracy | <0.5 mrad absolute per axis
Positional accuracy | <15 m after ground processing
Data transfer rate | 320 Mbps X band
Swath (nadir) | 16.5 km
Quantization level | 11 bits
Wavelength range (μm) | XLS: 0.45–0.52 (blue); 0.52–0.60 (green); 0.63–0.69 (red); 0.76–0.90 (NIR) | PAN: 0.445–0.90
Spatial resolution at nadir | XLS: 2.4 m | PAN: 0.61 m
Image dimension | XLS: 6,888 × 6,856 pixels | PAN: 27,552 × 27,424 pixels
Source: DigitalGlobe, 2007.
TABLE 2.20 Main Features of the QuickBird Satellite and Imagery
direction for the panchromatic band, and 6892 pixels for each of the four multispectral bands. The spectral wavelength of both the multispectral and panchromatic bands extends from 0.45 to 0.90 μm. The wavelength of the multispectral bands corresponds to the first four Landsat 7 ETM+ bands, and the PAN band is also identical to its ETM+ counterpart. QuickBird imagery closely resembles that of IKONOS in its band designation. For instance, both sensors contain four multispectral bands and one panchromatic band that are quantized to 11 bits. However, QuickBird images have two improvements:
• First, the spatial resolution has been improved from 1 m and 4 m to 0.61 m in the panchromatic mode and to 2.4 m in the multispectral mode (Table 2.20), respectively. These resolutions, however, do vary with the viewing direction. For instance, they degrade to 0.72 m and 2.8 m, respectively, at the off-nadir angle of 25°.
• Second, the swath width is increased from 13 to 16.5 km.
QuickBird images can be PAN-sharpened by merging the fine resolution panchromatic band with the multispectral ones to take advantage of the strengths of both images (a simple illustration is sketched below). QuickBird data are offered to the public at three accuracy levels: basic, standard, and georectified. Basic images are radiometrically corrected and sensor corrected, but not geometrically corrected. Standard images have been corrected for radiometric, geometric, and sensor distortions. Georectified data have been projected to a ground coordinate system. There is a minimum area-of-order restriction, its actual value varying with imagery type and the nature of the order. For basic images the unit of purchase is the scene. A minimum order of 25 km² is imposed for standard archived images, 64 km² for new collection, and 100 km² for orthorectified ones. These figures, nevertheless, do vary with the urgency of the data order. Data can be delivered in the GeoTIFF format electronically. QuickBird imagery is an excellent source of environmental data. It is useful for detecting changes in land use, agriculture, forest, and climate. Thanks to its high spatial resolution, QuickBird imagery is able to identify adequacy of irrigation and soil erosion quickly. It is also possible to closely monitor and even optimize the use of pesticides, fertilizer, and other agricultural treatments using QuickBird data. In forestry, QuickBird satellite imagery can be used to monitor logging, and to assess damage caused by forest fires. Environmental impacts of logging, such as stream sedimentation associated with road construction, clear-cut harvesting, and slash-and-burn activities, can be clearly detected from the imagery. In addition, QuickBird satellite data can be potentially useful in environmental and hazard assessment (e.g., damage caused by tsunami in the coastal area). It
FIGURE 2.9 An exemplary QuickBird image of Three Gorges Dam, China, recorded on October 4, 2003. The ground resolution of 0.6 m of QuickBird imagery makes it ideal in applications that require a great deal of details. (Copyright DigitalGlobe.) See also color insert.
can be used to assess risk of flooding, and in planning emergency response and evacuation (Fig. 2.9). It is also possible to map habitats, assess wetlands, and prospect minerals and potential mining sites using QuickBird imagery.
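As noted above, pan-sharpening merges the panchromatic and multispectral bands. Commercial products are generated with proprietary algorithms; purely as an illustration of the general idea, a minimal sketch of the classical Brovey transform is given below, assuming the multispectral bands have already been resampled to the panchromatic grid.

import numpy as np

def brovey_pansharpen(red, green, blue, pan, eps=1e-6):
    """Brovey-transform pan-sharpening: each multispectral band is rescaled
    by the ratio of the panchromatic band to the sum of the three bands.
    All inputs are float arrays already resampled to the panchromatic grid."""
    total = red + green + blue + eps
    ratio = pan / total
    return red * ratio, green * ratio, blue * ratio

# Tiny synthetic example (2 x 2 pixels) just to show the call signature.
r = np.full((2, 2), 0.2); g = np.full((2, 2), 0.3); b = np.full((2, 2), 0.1)
p = np.array([[0.55, 0.65], [0.60, 0.70]])
print(brovey_pansharpen(r, g, b, p)[0])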
2.4.3 OrbView-3
OrbView-3 exemplifies another commercial satellite designed to obtain hyperspatial resolution remote sensing data of the Earth. Launched on June 26, 2003, OrbView-3 has a circular, sun-synchronous orbit with an inclination of 97.1° and an altitude of 450 km. Although its orbital period is 93.6 minutes, the revisit cycle can be shortened to less than 3 days owing to the satellite's ability to point its telescope sideways by up to 45°; depending on latitude, the revisit cycle ranges from 1 to 5 days. Sensors aboard the satellite are able to record one panchromatic band at a spatial resolution of 1 m,
TABLE 2.21 Characteristics of OrbView-3 Imagery

Nominal swath width at nadir | 8 km
Quantization level | 11 bits
Sensors aboard the satellite record one panchromatic band at a spatial resolution of 1 m and four multispectral bands at 4 m, covering a swath width of 8 km on the ground (Table 2.21). All data are quantized to 11 bits. Similar to IKONOS data, OrbView-3 data are also sold in units of square kilometers. Archived basic standard imagery, either panchromatic or multispectral, without much processing done to it (e.g., BASIC enhanced), is sold at $5 per square kilometer to international customers. The minimum size of order is one scene, or 64 km2. The price for programmed data (i.e., user-initiated data recording) doubles to $10 per square kilometer for either panchromatic or multispectral bands. The minimum size of order for user-requested data rises to three consecutive scenes, or 192 km2 (NPA Group, 2008).
Similar to other hyperspatial resolution data, OrbView-3 data are best applied to fields that require fine detail and high geometric reliability, such as telecommunications, utilities, oil and gas exploration, mapping and surveying, agriculture, forestry, and national security. The panchromatic band is best at producing highly accurate maps and 3D fly-through scenes, such as topographic maps at the scale of 1:10,000 (Topan et al., 2007). OrbView-3 color infrared bands are of particular value in studying vegetation, monitoring the environment, forestry, and agriculture, as well as in characterizing urban, rural, and undeveloped land. However, it is impossible to produce PAN-sharpened imagery from OrbView-3 bands because the panchromatic and multispectral bands cannot be recorded concurrently.
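As a rough illustration of the ordering arithmetic quoted above, the hypothetical helper below applies the per-square-kilometer rates and minimum order sizes cited for OrbView-3 (NPA Group, 2008); actual invoices depend on the vendor's current price list and order conditions.

```python
# Illustrative calculator for the OrbView-3 pricing rules quoted in the text:
# archived data at $5/km2 with a 64-km2 minimum, programmed data at $10/km2 with
# a 192-km2 minimum. The function name and structure are hypothetical.
def orbview3_order_cost(area_km2, programmed=False):
    rate = 10.0 if programmed else 5.0        # US$ per square kilometer
    minimum = 192.0 if programmed else 64.0   # minimum billable area (km2)
    billable = max(area_km2, minimum)         # small orders are billed at the minimum
    return billable * rate

print(orbview3_order_cost(50))                     # 320.0: billed at the 64-km2 minimum
print(orbview3_order_cost(250, programmed=True))   # 2500.0
```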
2.4.4 Cartosat
Two satellites in this series have been launched, Cartosat-1 and Cartosat-2. Cartosat-1 was launched into a sun-synchronous orbit at an altitude of 618 km on May 5, 2005. This polar orbit has an inclination of 97.87° and a period of 97 minutes. The satellite crosses the equator at 10:30 a.m. local time (Table 2.22).
TABLE 2.22 Orbital Characteristics of Cartosat

Orbital Parameter | Specification
Height | 618 km
Inclination | 97.87°
Period | 97 min
Equatorial crossing | 10:30 a.m.
Nominal repeat cycle | 116 days
Revisit cycle | 5 days
Off-nadir viewing | −5° to +26°
Orbit type | Near polar, sun synchronous
Cartosat-1 has a nominal wait time of 11 days to acquire imagery of an adjacent path, which can be reduced to 5 days (the maximum wait time for revisit) via its off-nadir viewing capability. The nominal swath width is 30 km. Aboard Cartosat-1 are two state-of-the-art panchromatic cameras, one tilted fore by 26° and the other aft by −5°. They are positioned so that the same ground area is imaged almost simultaneously along track from two different perspectives, yielding stereoscopic images of 2.5-m resolution over the 0.5 to 0.85 μm wavelength range, from which highly accurate 3D maps can be generated. The cameras cover a 30-km swath at nadir viewing. Data are quantized to 10 bits and compressed before being transmitted to ground-receiving stations.
A wide variety of products are derived from Cartosat-1 data at several levels of processing, such as level 0B RAD, level 1 SYS, and level 2 GCP. Level 0B RAD data have been radiometrically but not geometrically corrected. Level 1 SYS data have been geometrically corrected using the system parameters, in addition to radiometric correction, to a mapping accuracy adequate over flat terrain. Level 2 GCP data have been geometrically corrected using ground control points. They are more accurate than level 1 SYS data, but not as accurate as level 2 DEMA data, which are virtually orthoimages (Krishnaswamy and Kalyanaraman, 2002). As its name implies, Cartosat-1 was launched primarily for cartographic applications, such as generating large-scale topographic maps and DEMs, and updating topographic maps. Its data also serve as enhanced inputs in large-scale mapping and stimulate new applications in land and water resources management, disaster assessment, relief planning and management, and environmental impact assessment.
Following Cartosat-1, Cartosat-2 was successfully sent into a sun-synchronous polar orbit (orbital inclination = 97.9°) 630 km in altitude on January 10, 2007. The payload of this satellite consists of a
single panchromatic camera that collects data over 0.45 to 0.85 μm at a spatial resolution of 0.8 m. The camera is able to operate in one of three modes: spot, paint brush, and multiview. In the first mode, strips of imagery can be obtained on either side of the satellite track in a north-south orientation. In the second mode, the camera is tilted to increase the standard swath width of 9.6 km (NRSA, 2007). In the last mode the same ground area is imaged three times at different viewing angles to acquire stereoscopic coverage. Lacking multispectral bands, Cartosat-2 data are better suited to mapping than to monitoring and detection applications.
2.4.5 WorldView
This program consists of two commercial satellites, WorldView-1 and WorldView-2. The first satellite was launched into a sun-synchronous orbit 496 km in altitude on September 18, 2007. It has a period of 94.6 minutes with a 10:30 a.m. descending node. The payload of this satellite is a panchromatic camera only. It captures data at a 0.50-m nadir spatial resolution and 11 bits of quantization, covering a swath width of 16 km (Table 2.23). This resolution degrades to 0.59 m at a viewing angle of 25°. In total, 500,000 km2 of ground area can be sensed by the satellite in a single day. Similar to Cartosat-2, WorldView-1 does not record multispectral data.
What distinguishes WorldView-1 from other satellites is its agility in tilting the scanning mirror. It can be tilted rapidly over a wide range of angles to retarget the ground in order to image it stereoscopically. Another improvement is its geolocation capability, which is accurate to 7.6 m CE90 at nadir without ground control; this accuracy improves further to <3.5 m with ground control. The revisit period can be shortened to 1.7 days at 1-m ground sampling distance (GSD), or 5.9 days at 25° off nadir. These features make WorldView-1 imagery suitable for cartographic applications, defense, and national security rather than general land use mapping and environmental monitoring.
The second satellite in this series, WorldView-2, is anticipated to be launched into a sun-synchronous orbit at an altitude of 770 km in late 2008. Apart from the same panchromatic band as WorldView-1, it will capture four multispectral bands at 1.8 m. Furthermore, it will also contain four new bands of coastal, yellow, red edge, and NIR-2 (Table 2.23) (Krebs, 2007). All data are quantized to 11 bits. The ground swath is 16.5 km at a geolocation accuracy (CE90) of 10 to 13 m without ground control; this accuracy improves to 5.5 m with ground control. Revisit frequency (at 40° latitude) varies from 1.1 days at 1.0-m GSD to 4.2 days at 0.52-m GSD. Major applications of WorldView-2 data include mapping and map updating at scales up to 1:2000, production of high accuracy DEMs, and other applications common to very high resolution satellite data, such as agriculture and forestry, oil and gas pipeline monitoring and exploration, and municipal and land planning.
TABLE 2.23 Main Features of the WorldView-2 Satellite
2.4.6 GeoEye-1
Scheduled for launch in mid-2008, the GeoEye-1 satellite will acquire the highest resolution commercial images of the Earth, owing to the use of the most advanced sensing technology ever flown in a commercial system. The satellite will revolve around the Earth in a polar orbit inclined at 98° at an altitude of 684 km (Table 2.24). At a velocity of approximately 7.5 km/s and a period of 98 minutes, it will complete 12 to 13 revolutions around the Earth each day. Its sun-synchronous orbit passes over a given area at around 10:30 a.m. local time. Images will be collected in two modes, panchromatic and multispectral, both simultaneously and independently. Panchromatic images have a spatial resolution of 0.41 m over the spectral range of 0.45 to 0.90 μm. The resolution changes to 1.64 m in the multispectral mode.
TABLE 2.24 Characteristics of GeoEye-1 Satellite and Imagery

Spatial resolution at nadir | 0.41 m (panchromatic); 1.64 m (multispectral)
Nominal swath width at nadir | 15.2 km
Quantization level | 11 bits
Off-nadir viewing | Up to 60°
Revisit period | <3 days
Altitude | 684 km
Inclination | 98°
Source: GeoEye, 2007b.
Thanks to the ability to steer the camera away from nadir by up to 35°, GeoEye-1 is able to sense ground areas from side to side and front to back, shortening its revisit period to 3 days or less for anywhere on the Earth's surface. Objects on the ground can be located to an accuracy of within 3 m. Data will be supplied to the public at different levels of processing, such as basic, geo, ortho, and stereo (GeoEye, 2007b). It is also possible to derive elevational information, such as DEMs and digital surface models, from the imagery. GeoEye-2 will follow the same general configuration as GeoEye-1, except that it is planned to have a spatial resolution as fine as 0.25 m; the final resolution will depend on feedback from users. Under the licensing agreement, only U.S. government agencies are authorized to access GeoEye-2 images at the finest resolution. Civilian users may receive data resampled to a coarser resolution.
2.4.7 Other Satellite Programs
Unlike the satellites above, which, with the exception of Cartosat, were launched by private companies or consortiums for profit, several other satellites have been launched mostly by governments, including those of Israel, Russia, South Korea, Taiwan, and Italy. These satellites include EROS, Resurs DK, COSMO-SkyMed, Formosat, and Kompsat.
EROS (Earth Resources Observation Satellite)-A was launched on December 5, 2000, into a sun-synchronous polar orbit 480 km in altitude by a consortium with close ties to the Israeli government. Weighing only 250 kg, the satellite began collecting data commercially in January 2001. Aboard the satellite is a high resolution panchromatic camera
whose CCD focal plane records images over 0.5 to 0.9 μm at a nominal spatial resolution of 1.9 m with a nadir swath width of 14 km (IODM, 2006). EROS B was launched on April 25, 2006. Aboard it is a slightly larger camera that produces a resolution of 0.7 m in the panchromatic mode, superior to that of EROS A, though the ground swath width is reduced to only 7 km. With this pointable sensor, it is possible to acquire stereo pairs of imagery at a temporal resolution as short as 3 to 4 days. Scheduled for launch in 2009, EROS C will have an altitude of 480 to 600 km. The payload will be a camera with CCD/TDI (time delay integration) that is able to provide 20,000 pixels per line at a resolution of 0.7 m in the panchromatic mode and 2.8 m in the multispectral mode.
Resurs DK-1 is a commercial Earth observation satellite launched on June 15, 2006. Weighing 6.6 tons, it has an elliptical orbit of 360 to 604 km with a standard revisit cycle of 5 to 7 days (nadir). Its inclination of only 69.9°, the lowest among all remote sensing satellites, is designed to sense the polar region primarily. It has a period of 94 minutes and a repeat period of 6 days at nadir. The cameras aboard the satellite capture data over 0.58 to 0.8 μm in one panchromatic band and three multispectral bands at spatial resolutions of 1.0 m and 2 to 3 m, respectively, both covering a ground swath of 28.2 km (Anshakov and Skirmunt, 2000). All data are quantized to 10 bits. An additional payload is the Russian-Italian spectrometer Pamela, designed for astronomical applications. The Resurs DK-1 satellite is designed mainly to collect data for studying natural resources, ecology, sea surface status, ice conditions, and meteorological conditions in the polar region. Its ground receiving station is located in Moscow, and no international distribution channels have been established yet, so it is uncertain whether the data will become available to users outside Russia.
Formosat-2 was successfully launched into a sun-synchronous orbit 891 km in height with an approximate period of 103 minutes on May 21, 2004, passing over Taiwan twice a day. Images are captured in two modes, panchromatic (2-m resolution) and multispectral (8 m). Both are able to view the terrain stereoscopically by tilting the scanning mirror by up to 45°. A ground swath of 24 km is covered during each scan (SIC, 2008). Formosat data can be purchased from SPOT Image at 2,500 euros per scene for either the 2-m panchromatic or the 8-m multispectral bands. The price is higher if special programming is requested or both types of data are purchased, and higher still if more processing is done to the data, reaching a maximum of 5,000 euros per scene.
Kompsat (Korean Multi-Purpose Satellite)-2 was launched on July 28, 2006. Aboard the satellite is a multispectral camera that records 1-m panchromatic images and four multispectral bands at a spatial resolution of 4 m, both of which can be acquired simultaneously. A 15 × 15 km2 ground area is covered per scene. All data have a
dynamic range of 10 bits and are preprocessed to three levels: 1A (radiometric correction), 2A (radiometric and geometric correction), and ortho (SPOT Image, 2004). Similar to Formosat data, Kompsat data may be purchased from SPOT Image at 10 euros/km2 for either panchromatic or multispectral bands at the 1A or 2A level. A minimum order of 225 km2 is imposed for level 1A data; this drops to 100 km2 for level 2A data. Prices are higher for data acquired via special programming.
COSMO-SkyMed is an Earth observation remote sensing system designed for both military and civilian applications. Of its four satellites, the first (COSMO-1) was launched on June 7, 2007, followed by COSMO-2 on December 9, 2007; COSMO-3 will be launched in 2008. All satellites share the same orbital plane, though not the same bands (i.e., C, L, and P bands are used, respectively). Unlike other very high spatial resolution imagery, COSMO-1 imagery is microwave SAR at 9.6 GHz (X band). An area of interest can be sensed several times a day. There are two constellation configurations: stand-alone and interferometric. In the second configuration, a pair of 3D SAR images is created by joining two radar images recorded at slightly different incidence angles (Wade, 2007). Image resolution varies from 1 m in the Spotlight/frame mode to 3 to 15 m in the Himage mode, in which strip images covering a swath width of 40 km are acquired. It degrades further to 30 m and 100 m in the WideRegion and HugeRegion modes, respectively. Swath width doubles from 100 km to 200 km as the mode switches from WideRegion to HugeRegion. Major civilian applications of COSMO data include seismic hazard analysis, monitoring of environmental disasters such as landslides and floods, monitoring of coastlines and sea waters, and agricultural mapping (e.g., harvest planning and management of treatment cycles).
Unlike the above satellites, Pleiades-1 is still at the planning stage, scheduled for launch into a sun-synchronous orbit at a height of 695 km in 2009. It is intended as a follow-up to the SPOT satellite program with improved resolution over a field 20 km wide. Similar to SPOT, it also retains the stereoscopic viewing capability. It will record one panchromatic band at 0.7-m resolution and four multispectral bands at 2.8-m resolution, identical to those of Kompsat-3 and EROS C imagery.
2.5 Hyperspectral Data
The third trend in data acquisition, which emerged about a decade ago, is termed hyperspectral remote sensing because it involves sensing the target in hundreds of spectral bands simultaneously, in sharp contrast to the tens of spectral bands that have been the norm in multispectral remote sensing. These hyperspectral bands cover roughly the same wavelength range as multispectral bands. Hence, each
hyperspectral band is much narrower in its bandwidth. In contrast to all the satellite programs covered in this chapter so far, hyperspectral sensors can be either spaceborne or airborne. In this section a spaceborne example, Hyperion, is introduced first, followed by two airborne examples, the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) and the Compact Airborne Spectrographic Imager (CASI).
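To make the contrast with multispectral sensing concrete, the short sketch below shows how sampling the 0.4 to 2.5 μm region at a nominal 10-nm interval yields on the order of two hundred contiguous narrow bands. The figures are illustrative only; the exact band count and centers of an operational sensor such as Hyperion depend on its detector layout and calibration.

```python
# Nominal band centres obtained by sampling a spectral range at a fixed interval;
# a 10-nm interval over 0.4-2.5 um gives roughly 210 contiguous narrow bands,
# compared with the handful of broad bands of a multispectral sensor.
import numpy as np

def band_centres(start_um=0.4, end_um=2.5, interval_nm=10.0):
    step = interval_nm / 1000.0               # convert nm to um
    return np.arange(start_um, end_um, step)  # nominal band-centre wavelengths (um)

centres = band_centres()
print(len(centres))                # about 210 nominal bands
print(centres[:5].round(3))        # [0.4  0.41 0.42 0.43 0.44]
```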
2.5.1 Hyperion Satellite Data
Hyperion is one of the three primary instruments aboard the Earth Observing-1 (EO-1) spacecraft, which was launched on November 21, 2000. EO-1 revolves around the Earth in a circular, sun-synchronous orbit. Inclined at 98.7°, this orbit has an altitude of 705 km. This orbital setting matches the Landsat 7 orbit to within 1 minute, so Hyperion images cover a ground area nearly identical to TM's for the purpose of comparison. The Hyperion telescope can be rolled by up to 22° to view a Landsat swath next to the ground track swath. This side-looking capability enables a given ground target to be imaged up to 5 times during the 16-day revisit period.
As a high spatial resolution hyperspectral sensor, the Hyperion payload consists of a single telescope, one VNIR spectrometer, and one SWIR spectrometer. The VNIR spectrometer captures radiation over the 0.4 to 1 μm range (Pearlman et al., 2000). The SWIR spectrometer has an array of 160 (spectral) by 250 (spatial) channels. Jointly, the spectral range of the instrument extends from 0.4 to 2.5 μm with a spectral resolution of 10 nm (Table 2.25). The sensor is capable of recording 220 contiguous spectral bands at a spatial resolution of 30 m on the ground, covering a ground area of 7.5 × 100 km2 at high radiometric accuracy. All data are quantized to 12 bits.
TABLE 2.25 Features of Hyperion Satellite Imagery

Spatial resolution | 30 m
Swath width | 7.75 km
Image format | 20 × 7.5 km
IFOV | 0.624°
Return period | 16 days
Spectral channels | 220 channels (70 VNIR channels from 356 to 1,058 nm; 172 SWIR channels from 852 to 2,577 nm)
Spectral interval | 10 nm (nominal)
Quantization | 12 bits
Hyperion data are processed to various levels before they are released to the public. Level 0 processing includes removal of transmission artifacts and reordering of data formats; VNIR and SWIR data are merged into a single raw image file, together with flight information and ancillary data. Level 1R data have been radiometrically calibrated using coefficients derived from both laboratory and on-orbit calibration, but not corrected for geometric distortions. Level 1Gst data have been terrain corrected and are available in 16 bits. Standard datasets, which include the image as well as metadata and ancillary information, may be purchased in units of 20 × 7.5 km2 in the HDF format. Hyperion data are priced from $250 per scene for archived level 1R data, rising to $500 for level 1Gst data. An additional cost of $750 is charged for data acquired upon the user's request.
Hyperion data are ideally applied in fields in which subtle spectral variations among ground targets at certain wavelengths need to be identified and differentiated, something that is almost impossible with standard multispectral data. Examples include mapping of soil salinity, accurate mineral exploration, better prediction and assessment of crop yield, and better contaminant mapping.
2.5.2 AVIRIS
AVIRIS is a further development of its prototype, the Airborne Imaging Spectrometer (AIS). It was designed and constructed by the Jet Propulsion Laboratory in Pasadena, California, under contract to NASA. This unique optical sensor captures solar radiant energy over the wavelength range of 0.4 to 2.5 μm in 224 contiguous spectral bands. The wavelength range of each spectral band is programmable. Quantized to 12 bits, AVIRIS data are radiometrically calibrated to within 10 percent absolute accuracy. AVIRIS uses "whiskbroom" scanning to sweep the ground back and forth, producing 614 pixels per scan for each of the 224 detectors. The FOV and IFOV of AVIRIS imagery are fixed at 30° and 1 milliradian, respectively (JPL, 2007). These two parameters translate into a varying swath width and pixel size, depending upon the flying height of the aircraft carrying the sensor. If flown at approximately 20 km above sea level, each AVIRIS pixel corresponds to a ground area of approximately 20 × 20 m2 (with some overlap between pixels), yielding a ground swath width of about 11 km (Table 2.26). At a flight height of 4 km above the ground, each pixel covers a ground area of 4 × 4 m2 over a swath width of 2 km (Vane and Goetz, 1993). Since AVIRIS is airborne, not all areas of the Earth's surface have been sensed yet. The areas flown so far were decided by NASA on the basis of their scientific merits. As with other commercial satellite data, archived data for the flown areas are not routinely available to the public.
TABLE 2.26 Main Properties of AVIRIS Data

Aircraft altitude | 20 km above sea level
Spectral bands | 224 bands over 0.4–2.5 μm, with programmable wavelength ranges
Spectral interval | 10 nm
FOV | 30°
IFOV | 1 mrad
Spatial resolution | 20 m (17 m at center)
Swath width | 10.5 km (614 pixels) by 1,000 km/flight
Quantization level | 12 bits
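The dependence of AVIRIS pixel size and swath width on flying height follows directly from the fixed IFOV and FOV. The sketch below uses the small-angle approximation for the IFOV and a flat-Earth swath formula, so the results are only nominal; real pixel geometry also varies with scan angle and terrain.

```python
# Nominal whiskbroom imaging geometry: ground pixel size from the IFOV and
# swath width from the total FOV, both scaling with flying height.
import math

def whiskbroom_geometry(height_m, fov_deg=30.0, ifov_mrad=1.0):
    pixel_m = height_m * ifov_mrad / 1000.0                          # ground IFOV at nadir
    swath_m = 2.0 * height_m * math.tan(math.radians(fov_deg) / 2.0) # flat-Earth swath
    return pixel_m, swath_m

print(whiskbroom_geometry(20000))   # ~20-m pixels over a ~10.7-km swath (high-altitude case)
print(whiskbroom_geometry(4000))    # ~4-m pixels over a ~2.1-km swath (low-altitude case)
```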
AVIRIS data are suited to identifying, measuring, and monitoring constituents of the Earth's surface and atmosphere, thanks to a spatial resolution finer than that of their spaceborne Hyperion counterpart. Most of these applications are related to understanding the processes of global environmental and climate change, including oceanography, environmental science, snow hydrology, geology, volcanology, soil and land management, atmospheric science, agriculture, and limnology. Assessment and monitoring of environmental hazards such as toxic waste, oil spills, and land/air/water pollution are other exemplary applications of AVIRIS data, for which the imagery may have to be properly calibrated and corrected for atmospheric effects.
2.5.3 CASI
CASI is an airborne hyperspectral sensor that senses the ground directly below the platform in a fixed direction using a pushbroom imaging spectrograph. While a scene is being imaged, the radiation gathered from the target over the wavelength range of 0.4 to 0.95 μm is recorded in hundreds of bands at an increment of 1.8 nm. An image of 512 by 288 pixels is formed as the platform moves forward. The output of the two-dimensional (2D) CCD sensor is digitized to 12 bits via a programmable electronics system. The CASI sensor operates in one of three modes: spatial, spectral, and full-frame. In the spatial (imaging) mode, up to 15 nonoverlapping bands are recorded, each comprising 512 spatial pixels across the 35° FOV (Table 2.27). Both the central band wavelength and the number of spectral bands can be specified by the user. In the spectral mode, 288 spectral bands are sampled in up to 39 view directions. In the full-frame mode, the sensor samples all 288 spectral rows for all 512 spatial columns. Requiring long image recording times, this mode produces the best results only when the sensor is stationary or aboard a slowly moving platform (Babey and Anger, 1989).
TABLE 2.27 Main Properties of CASI Data

Property | CASI | CASI-2
FOV | 35° (15°–60°) | 54.4°
Spectral mode: spectral bands | 288 over 0.4–0.95 μm (programmable ranges) | 288 over 0.405–0.95 μm
Spectral mode: spectral interval | 1.8 nm | 1.8 nm
Spectral mode: spatial resolution | 1.23 m (578 pixels) | 512 pixels
Spectral mode: image dimension | 489 lines | 512 pixels
Spatial mode: spectral bands | Up to 15 | Up to 18
Spatial mode: image dimension | 512 pixels | 512-pixel swath
Full-frame mode | 512 spatial columns by 288 spectral bands | 512 pixels × 288 spectral pixels
Quantization level | 12 bits | 12 bits
During flight, the aircraft can be equipped with a global positioning system (GPS)/inertial navigation system (INS) so that all CASI data can be georeferenced in real time during image acquisition (see Sec. 5.9).
CASI was later superseded by CASI-2, which retained most of the CASI features (Table 2.27), except that up to 18 spectral bands can be recorded in the spatial mode (NERC ARSF). Another feature of CASI-2 is an enhanced spectral mode that records the full spectrum (288 channels) in a block of 101 adjacent spatial pixels. The former tape medium of data recording was replaced by a removable 9-GB hard disk. At present the most advanced sensor in this series is the CASI 1500 by ITRES. In terms of spectral bands (288) and bandwidth (2.2 nm), it is very similar to CASI. Its two distinctive improvements are a higher dynamic range (14 bits) and a finer spatial resolution of 25 cm owing to the use of 1,500 pixels across the FOV (ITRES Research, 2007). However, it senses radiation over a combined wavelength range of only 0.65 μm within the 0.38 to 1.05 μm region, leaving a spectral gap in the acquired data. CASI imagery has found a variety of applications ranging from forest cover mapping to pollution monitoring, such as analysis of water quality and pollution in coastal areas (e.g., total and fecal coliforms, aeromonads, turbidity, salinity, and chlorophyll); monitoring of natural disasters such as fires, floods, and volcanoes; inventory of natural resources; and precision farming.
2.6 Radar Data
Because of the strong penetration capability of microwave radiation, radar remote sensing is operational over the entire Earth's surface regardless of the frequency and volatility of cloud cover. In fact, it is the only operational remote sensing system for mapping persistently cloudy regions. Radar data are hence a useful supplement to optical remote sensing data. In this section, four major spaceborne radar remote sensing programs (JERS, ERS, Radarsat, and EnviSat) are briefly surveyed, with the emphasis placed on system parameters and image properties.
2.6.1 JERS Data
JERS-1 was launched into a sun-synchronous orbit on February 11, 1992. It has an orbital inclination of 97.7° and a period of 94 minutes. At an altitude of 568 km, the satellite completely covers the Earth in 44 days. The payload of JERS-1 encompasses optical sensors and a SAR sensor. The former collect data in eight spectral bands covering the wavelength range 0.52 to 2.4 μm. The SAR sensor, operating in the L band (1.3 GHz, 23 cm), acquires imagery with a swath width of 75 km and a spatial resolution of 18 × 18 m2. JERS SAR imagery is best at monitoring land use, glacier extent, snow cover, surface topography, and ocean currents and waves. Other potential applications include national land surveys, agriculture, forestry, fishery, environmental protection, and coastal monitoring.
2.6.2 ERS Data
The first European Remote Sensing satellite (ERS-1) was launched into a near-polar orbit of about 780 km on July 17, 1991, by the European Space Agency, followed by ERS-2 on April 20, 1995. Both satellites were designed to acquire data about the Earth's oceans, ice, and land resources. ERS-1 has an orbital inclination of 98.52° and a period of 100 minutes. Its return period can be adjusted from 3 days to 168 days to meet different data acquisition requirements at different geographic locations (Table 2.28). Accordingly, other parameters such as altitude and orbital inclination
TABLE 2.28 Orbital Parameters of the ERS-1 Satellite at Three Repeat Cycles

Orbital Parameter | 3 days | 35 days | 168 days
Mean altitude (km) | 785 | 782 | 770
Orbital inclination (°) | 98.516 | 98.543 | 98.491
Orbits per cycle | 43 | 501 | 2,411
Semimajor axis (km) | 7,153.138 | 7,159.496 | 7,147.191
have to change as well. It takes up to 2 weeks for the newly configured orbital parameters to stabilize to within 1 km of the nominal ground track, although the orbit stabilizes to within 5 km after 24 hours. The payload of ERS-1 comprises both active and passive microwave sensors (Fig. 2.10). The C-band SAR sensor operates at a frequency of 5.3 GHz (bandwidth: 15.55 MHz). Using VV (vertical transmission and vertical reception) polarization, it senses the target in either the image mode or the wave mode. In the image mode the SAR sensor scans the ground 250 km off nadir at an incidence angle of 23° (midswath), covering a strip of ground about 100 km wide (Fig. 2.11). The acquired images have a spatial resolution of 26 m in range (across track) and 6 to 30 m in azimuth (along track). Data are acquired for a maximum duration of approximately 10 minutes per orbit. In the wave mode the SAR sensor samples the ground within the image swath at a regular interval, forming images of the 2D spectra of ocean surface waves. ERS radar data have found increasingly wide applications in many areas, such as near real time surveillance, exploration of offshore oil reserves, and monitoring of iceberg movements for shipping routing. They are especially useful in the derivation of elevational information.
FIGURE 2.11 Geometry of ERS-1 satellite during scanning in the image mode. A strip of about 100 km wide is scanned at a distance of 250 km from the nadir track.
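The 26-m range resolution quoted for the ERS-1 image mode can be approximated from the system parameters given above: the slant-range resolution of a pulse-compressed SAR is roughly c/(2B), and dividing by the sine of the incidence angle converts it to ground range. The sketch below is a simplified check, not the agency's processing chain; window weighting and processing choices modify the exact figure.

```python
# Approximate ground-range resolution of a pulse-compressed SAR from its chirp
# bandwidth and local incidence angle.
import math

def ground_range_resolution(bandwidth_hz, incidence_deg):
    c = 3.0e8                                   # speed of light (m/s)
    slant_res = c / (2.0 * bandwidth_hz)        # slant-range resolution (m)
    return slant_res / math.sin(math.radians(incidence_deg))

# ERS-1 C-band SAR: 15.55-MHz bandwidth, 23 degree mid-swath incidence angle
print(round(ground_range_resolution(15.55e6, 23.0), 1))   # ~24.7 m, close to the quoted 26 m
```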
2.6.3 Radarsat Data
Two satellites have been launched in this series, Radarsat-1 and Radarsat-2. Radarsat-1 is a sophisticated commercial Earth observation satellite launched by a Canadian consortium on November 4, 1995. The satellite has a sun-synchronous polar orbit at a height of 798 km and an orbital inclination of 98.6°. With an orbital period of 100.7 minutes, Radarsat-1 circles the Earth 14 times a day. Its standard return period of 24 days can be shortened to daily coverage for the Arctic, 3 days for certain parts of the world, and 6 days at equatorial latitudes using the nonstandard wide swath (i.e., different beam positions). This is achieved through a multimode imaging capability in which the radar beam is steered over a 500-km range, a unique feature absent from other radar systems. The SAR sensor acquires C-band images at a wavelength of 5.6 cm (5.3 GHz) in the HH polarization mode using an antenna of 15 × 1.5 m2 in dimension. In total, there are seven beam modes in Radarsat-1 (fine, standard, wide, ScanSAR narrow, ScanSAR wide, extended high, and extended low), each offering a unique image resolution and swath width (Table 2.29). Associated with these modes are variations in image swath from 45 to 500 km in width, in resolution from 8 to 100 m, and in incidence angle from 10° to 58°. All sensed areas on the ground lie to the right of the satellite path, as Radarsat-1 is a right-looking satellite.
TABLE 2.29 Characteristics of Radarsat-1 Imagery in Seven Modes of Scanning

Mode | Nominal Resolution (m) | No. of Positions/Beams | Swath Width (km) | Incident Angles (deg)
Fine | 8 | 15 | 45 | 37–47
Standard | 30 | 7 | 100 | 20–49
Wide | 30 | 3 | 150 | 20–45
ScanSAR narrow | 50 | 2 | 300 | 20–49
ScanSAR wide | 100 | 2 | 500 | 20–49
Extended high | 18–27 | 3 | 75 | 52–58
Extended low | 30 | 1 | 170 | 10–22
Source: CSA, 2005.
Radarsat data are rather expensive. An archived image acquired in the standard mode (100 × 100 km2) costs US$2,750 at level 0 (i.e., raw data). The price rises to as high as US$3,750 for more highly processed path-oriented and map-oriented data. An additional programming fee is charged for data recorded upon user request; the exact amount varies with the urgency of the request. Data recorded prior to January 1, 1999, are sold at a heavily discounted price. Even so, they are still much more expensive than other optical satellite data, such as ASTER. Radarsat-1 data may be delivered on storage media via courier or electronically via the Internet in a wide range of spatial resolutions and swath widths.
Succeeding Radarsat-1, Radarsat-2 was successfully launched into the same orbit as Radarsat-1 on December 14, 2007, safeguarding the continuity of data supply. Radarsat-2 makes several improvements over its predecessor, such as spatial resolution enhanced to as fine as 3 m, flexible looking directions (both left- and right-looking imaging is possible), a revisit period reduced to 2 to 3 days (based on a 500-km swath width) at the equator, and onboard geolocation via GPS to an accuracy of ±60 m in real time. It is possible to acquire multichannel images at bandwidths of 11.6, 17.3, 30, 50, and 100 MHz. The sensor is fully flexible in selecting polarization, including HH, HV, VH, and VV (Table 2.30). Together with the different beam modes, the image resolution ranges from 3 to 100 m, and the swath width varies from 20 to 500 km. These improvements are made possible by the deployment of a state-of-the-art phased array antenna comprising hundreds of miniature transmit-receive modules. Controlled by a computer, the antenna is able to steer over the full range of swaths and alternate operation modes nearly instantaneously.
TABLE 2.30 Typical Spatial Resolutions of Radarsat-2 Imagery in Different Beam Modes and Polarizations

Beam Mode | Polarization | Nominal Swath Width (km) | Resolution, Range (m) | Resolution, Azimuth (m) | Incident Angle (°)
Ultra-fine | Selective single | 20 | 3 | 3 | 30–40
Multi-look fine | Selective single | 50 | 8 | 8 | 30–50
Fine | Quad | 25 | 12 | 8 | 20–41
Standard | Quad | 25 | 25 | 8 | 20–41
Fine | Selective | 50 | 8 | 8 | 30–50
Standard | Selective | 100 | 25 | 26 | 20–49
Wide | Selective | 150 | 30 | 26 | 20–45
ScanSAR narrow | Selective | 300 | 50 | 50 | 20–46
ScanSAR wide | Selective | 500 | 100 | 100 | 20–49
Extended high | Single | 75 | 18 | 26 | 49–60
Extended low | Single | 170 | 40 | 26 | 10–23
Source: MDA, 2008.
Similar to all other radar data, Radarsat images are best applied to areas that cannot be adequately imaged with optical sensors. In comparison with optical sensor data, Radarsat data are of particular value in identifying mesoscale ocean features, such as icebergs and sea ice, oil spills, and geological structures. It is also possible to detect underwater topography from Radarsat data through local changes in the surface roughness pattern. Other applications of Radarsat data include mapping of topographic relief and production of DEMs.
2.6.4 EnviSat Data
The Environmental Satellite (EnviSat) is an ambitious and innovative Earth observation satellite that was launched into a sun-synchronous polar orbit on March 1, 2002, by the European Space Agency. It has a return period of 35 days, though most of the Earth can be imaged within 1 to 3 days. EnviSat completes one revolution around the Earth in 100 minutes; its orbit, about 800 km high, has an inclination of 98.54°. The satellite was designed to facilitate the monitoring and study of the Earth's environment and climate change, and to manage and monitor the Earth's resources, the atmosphere, oceans, land, and ice. EnviSat maintains all the capabilities of its predecessors, the ERS-1 (mission ended on March 10, 2000) and ERS-2 satellites, in addition to many new capabilities. Its comprehensive payload is made up of 10 sophisticated optical and radar sensors, such as the Medium Resolution Imaging Spectrometer (MERIS) and the Advanced
TABLE 2.31 Summary of Selected Sensors aboard the EnviSat Spacecraft and Their Intended Applications

Sensor | General Characteristics | Primary Application Areas
ASAR | Advanced synthetic aperture radar | General purpose all-weather imaging
MERIS | Medium resolution imaging spectrometer | Ocean biology, marine water quality, vegetation on land, cloud and water vapor in 15 bands
AATSR | Advanced along track scanning radiometer | Sea-surface temperature
MIPAS | Michelson interferometer for passive atmospheric sounding | Chemical and physical processes in the stratosphere
Along Track Scanning Radiometer (AATSR) (Table 2.31). Of these sensors, the largest is the high resolution radar instrument named the Advanced Synthetic Aperture Radar (ASAR), operating in the C band (Fig. 2.12). The data-gathering capability of EnviSat has been drastically enhanced over the ERS satellites in terms of coverage (one of seven swaths, each up to 100 km wide), range of incidence angles (15° to 45°), polarization, and modes of operation. Five mutually exclusive modes of ASAR operation are created through various combinations of polarization and incidence angle (Table 2.32). This number rises to 37 if the spatial resolution of the acquired imagery is further differentiated into high, medium (wide-swath mode), and reduced categories. An area up to 400 km wide is imaged in the wide-swath mode, thus significantly reducing the return period; in this mode a strip of area up to 4,000 km long is swept. In the alternating-polarization mode, coarse resolution (150 m) images of both vertical and horizontal polarizations can be recorded simultaneously. In the image mode ASAR operates at one of seven selectable subswath widths from 55.5 to 100 km. In this mode HH- or VV-polarized images are acquired at a nominal resolution of 30 m as one of several product types, including single-look complex, phase-preserved, and slant-range images. Both high and medium resolution ASAR imaging data are recorded only when required to satisfy background mission scenarios and/or user requests. HH- or VV-polarized images with a reduced resolution of 1 km are obtainable over a 405-km swath in the global-monitoring mode. This mode of operation is activated mainly in response to user requests. In the wave mode the ASAR instrument generates vignettes with a minimum size of 5 × 5 km2, spaced 100 km along track, in HH or VV polarization.
FIGURE 2.12 Sensors aboard the EnviSat satellite. (Source: ESA, 2002.)
TABLE 2.32 Nominal Characteristics of ASAR Imagery in Different Modes

Operation Mode | Approximate Resolution (m) | Image Format | Polarization
Image | 30 (precision product) | 56 km (swath 7) to 100 km (swath 1) | VV or HH
Alternating polarization | 30 (precision product) | Two coregistered images per acquisition, any of seven selectable swaths | HH/VV, HH/HV, or VV/VH
Wide swath | 150 (nominal product) | 400 km × 400 km | VV or HH
Global monitoring | 1,000 | Up to a full orbit of coverage | HH or VV
Wave | | 10 km × 5 km to 5 km × 5 km | HH or VV
Source: ESA, 2002.
TABLE 2.33 Comparison of Major Characteristics among Common SAR Imagery

Characteristics | JERS-1 | ERS-1, 2 | Radarsat-1, 2 | EnviSat ASAR
Satellite altitude (km) | 568 | 780 | 798 | 800
Orbital inclination | 97.7° | 98.52° | 98.6° | 98.54°
Period (min) | 94 | 100 | 100.7 | 100
Return period (days) | 44 | 3–168 | 24 | 35
Frequency | 1.3 GHz | 5.3 GHz | 5.3 GHz | 5.3 GHz
Spectral band | L | C | C (5.6 cm) | C
Polarization | VV | VV | HH/quad | HH or VV
Swath width (km) | 75 | 102.5 | 100/20–500* | Up to 100
Spatial resolution (m) | 18 | 26/(6–30) | 30/3–100 | 30
*The figures for Radarsat-2 vary widely with beam mode and polarization.
In general, ASAR data can be used for site-specific investigations. They provide measurements of the atmosphere, ocean, land, and ice, supporting applications such as land use monitoring, potential forecasting of ocean circulation, and ultraviolet forecasting. It is also possible to monitor El Niño, the Gulf Stream, and the ozone layer above the Arctic in near real time using EnviSat data. Presented in Table 2.33 is a comparison of the radar images covered in this section. According to this table, these data are very similar to one another in frequency and swath width. As with all radar satellite data, their spatial resolution varies from along track to cross track and is also subject to the incidence angle. Thus, which image is selected for a particular application or geographic area matters less than whether suitable data are available for the study area at the right time.
2.7 Conversion from Analog Materials
Remote sensing materials (e.g., historic aerial photographs and satellite images) play a vital role in such image analyses as change detection. The difficulty with their use is that some of them may exist only in printed form. Before they can be analyzed digitally, they have to be converted into a digital format. A quick and efficient method of conversion is scanning. A vast range of scanners is available at various precision levels. The most reliable scanners are photogrammetric, which enable the conversion to be
accomplished with excellent locational accuracy. Data scanned using this type of scanner have an accuracy level comparable to that obtained with analog and analytical photogrammetric devices. The positional accuracy for photographs recorded on film can be as high as 5 μm or less RMS error (Bethel, 1995). Such scanners are essential in applications that require high geometric accuracy. By comparison, desktop scanners are much less reliable geometrically, and hence less expensive (Fig. 2.13). They find applications in which accuracy is not a paramount concern. Regardless of their accuracy, all scanners must have an active area of at least 9 × 9 inches, the physical size of standard aerial photographs, in order to scan an aerial photograph in full. Scanning of photographs is carried out in either the gray or the color mode, depending upon the nature of the photograph being scanned. If the original photograph is black and white, the gray mode should be adopted to minimize file size; the color mode is reserved for true color or color infrared photographs. Unlike scanning the ground surface from space during initial data acquisition, scanning aerial photographs allows the analyst to control the process of data
FIGURE 2.13 The HP ScanJet 8200 produces consistently enhanced colors and razor-sharp images. It is able to scan photographs at 4800 DPI in 48-bit color.
acquisition, such as the specification of scanning resolution. Commonly adopted scanning resolutions are 150, 300, and 600 dots per inch (DPI). Since the original data are recorded in graphic format, scanning at any resolution, no matter how fine, inevitably causes some loss of information, although this loss can be minimized by adopting a finer scanning resolution. A large DPI preserves a great deal of the detail in the original photograph, but also produces an enormous amount of data to maintain and process subsequently. This huge data volume can be problematic to handle. For instance, a black-and-white aerial photograph of 23 × 23 cm2 in dimension requires 7.29 MB (see Sec. 3.1) of storage space if scanned at 300 DPI. This figure rises to 29.16 MB if the scanning resolution doubles to 600 DPI, or to 87.48 MB if the photograph happens to be in color. Therefore, it is important to determine the optimal scanning resolution. The most appropriate scanning resolution depends on the smallest objects on the ground that need to be resolved on the scanned photograph, or the ground resolving distance (GRD). It is calculated using Eq. (2.1).
GRD (m) = 25.4 × SF/(1000 R)    (2.1)
where SF = the scale factor of the photograph being scanned, and R = the scanning resolution expressed in DPI. This equation can be inverted to determine the optimal scanning resolution R. Below is an example of how to determine the required scanning resolution using Eq. (2.1):
Example If the smallest ground feature to be preserved in the digital image scanned from a 1:12,500 aerial photograph is 0.5 m, what scanning resolution should be adopted?
Solution R = 25.4 SF/(1000 GRD) = 25.4 × 12,500/(1000 × 0.5) = 635 DPI
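The following sketch implements Eq. (2.1), its inverse, and the storage estimate discussed earlier. The file sizes assume one 8-bit value per pixel per band (one band for black-and-white scans, three for color), and the function names are illustrative only.

```python
# Eq. (2.1) and its inverse, plus an approximate file-size estimate for a scanned
# square photograph, assuming 8 bits per pixel per band.
def grd_from_dpi(scale_factor, dpi):
    """Ground resolving distance (m) of a scanned photograph, Eq. (2.1)."""
    return 25.4 * scale_factor / (1000.0 * dpi)

def dpi_from_grd(scale_factor, grd_m):
    """Scanning resolution (DPI) required to resolve a given ground distance."""
    return 25.4 * scale_factor / (1000.0 * grd_m)

def scanned_size_mb(side_cm, dpi, bands=1):
    """Approximate file size (MB) of a square photograph scanned at `dpi`."""
    pixels_per_side = side_cm / 2.54 * dpi
    return pixels_per_side ** 2 * bands / 1e6

print(round(dpi_from_grd(12500, 0.5)))            # 635 DPI, as in the worked example
print(round(scanned_size_mb(23, 300), 2))         # ~7.4 MB; the 7.29-MB figure in the text
                                                  # rounds the 23-cm side to 9 in
print(round(scanned_size_mb(23, 600, bands=3)))   # ~89 MB for a color scan at 600 DPI
```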
The following four points need to be borne in mind with the above calculation:
• First, the user may specify the scanning resolution calculated above prior to scanning. However, the extent to which the scanner can honor this resolution is affected by its quality (e.g., there may be no difference between 634.5 and 635.4 DPI).
• Second, the calculated resolution is only theoretical in that the spatial resolution of the source photograph is not taken into account. So long as the grain size of the photograph is sufficiently fine (e.g., a few micrometers), the photograph should contain abundant detail, and this detail level should not hinder discerning the smallest feature in the scanned image unless the scanning resolution is overly high (e.g., in excess of 1200 DPI, or about 21 μm).
• Finally, the desired detail to be resolved should be several times larger than the calculated spatial resolution of scanning, depending on the distinctiveness of the features and their contrast with the surroundings.
Compared with purchasing digital data, scanning analog print materials to obtain digital data is rather inexpensive; a print costs a small fraction of its digital counterpart. Besides, the scanned photographs can be produced at various spatial resolutions, though not finer than the detail level of the original print. Data of a finer resolution are obtainable with a larger DPI, whereas raw digital data can only be degraded from a fine resolution to a coarse one. In other words, their spatial resolution cannot be made finer than the original, no matter what resampling scheme is used. Scanning aerial photographs also takes advantage of the high geometric fidelity of frame photographs, which are exposed at the instant the camera's shutter opens. By comparison, satellite images may suffer from more geometric distortions during scanning, which takes at least tens of microseconds to complete. Furthermore, the scanned data have a spatially uniform resolution: pixel size hardly varies across the photograph, in sharp contrast to satellite scanning, in which pixel size can be severely compromised at a large off-nadir viewing angle.
In spite of the above advantages, there are three disadvantages associated with scanning aerial photographs:
• First, photographs have a limited spectral range. Since films capture visible light and NIR radiation over the wavelength range of 0.4 to 0.9 μm, it is impossible to obtain data over the midinfrared or TIR portion of the spectrum.
• Second, scanned photographs can be separated into only three layers: blue, green, and red. It is impossible to obtain more spectral bands than this number. Besides, the exact spectral range of each separated layer is not precisely known.
• Finally, artificial radiometric variations are inherent within one photograph and across multiple photographs. It may not be possible to eliminate the radiometric variation of the same ground object across multiple scanned photographs.
The issue of artificial variation in radiometry over a single aerial photograph is usually dealt with by tinting the camera's lens. The nonuniformity in illumination caused by the absorption of a concave lens is thereby reduced to such a level that it is no longer a primary concern. By comparison, the issue of varied radiometry across multiple photographs is much more severe and difficult to tackle (Fig. 2.14). Pseudoradiometric properties of the same ground feature across photographs arise from two processes:
FIGURE 2.14 Scanning of multiple photographs faces the problem of artificial variation in image radiometry caused by varying development in the darkroom. See also color insert.
• First, the solar radiation may have changed during photography. This can be minimized by reducing the number of flights and shortening the duration of photography, or by taking photographs when solar radiation is most stable (e.g., around noon).
• Second, not all photographs are submerged in the chemicals for the same duration during development and fixing in the darkroom.
How to minimize the tonal discrepancy and unify the radiometric properties of all photographs will be covered in Sec. 6.2. In addition to radiometric inconsistency, scanned photographs may also suffer from geometric problems. Printed on paper, aerial photographs may be stretched or worn after extensive use. As a result, their geometry may have degraded despite the high geometric fidelity of the original photographs. Care must be taken to remove such distortions, or to control them to within an acceptable level, during subsequent geometric processing.
2.8 Proper Selection of Data
Before deciding which type of remote sensing data is the most suitable for a particular project, the user needs to evaluate the project's requirements and constraints carefully by taking a number of factors into consideration. Some of the most important ones are user needs, seasonality, cost, and mode of data delivery.
2.8.1 Identification of User Needs
Different users purchase remote sensing data for different purposes and needs. Before deciding what type of data to purchase, the analyst needs to identify the special requirements by answering the following questions:
• First, what type of data is the most useful for studying the phenomenon in question? Is it optical or microwave? If optical, should the data be recorded in visible light, NIR, middle infrared, far infrared, or some combination? Optical data are best for studying in-water constituents, while near and middle infrared imagery is especially good for studying vegetation. TIR imagery is effective at revealing heat-related phenomena, but suffers from a lack of detail and coarse spectral resolution. Microwave imagery is excellent for detecting hidden features and for ocean applications; it is the only form of remote sensing that functions in tropical areas where the ground is frequently obscured by clouds. Radar imagery, however, suffers from a lack of spectral resolution and from radiometric noise. For most natural resources mapping and environmental monitoring, the best choice is multispectral data over the visible and NIR portions of the spectrum.
• Second, what detail level is required? The amount of detail that can be identified from remote sensing imagery is related directly to its spatial resolution and scale. Images of varying spatial resolutions enable different information to be derived from them at different accuracy levels. Images of a finer spatial resolution allow more details to be discerned, but each may cover only a small strip of the area under study; thus, a large number of images have to be acquired to cover the area completely. Other implications of using fine spatial resolution images are the long time and high cost needed to process them. Besides, data of a finer spatial resolution are more expensive than those of a coarser resolution, so it is important to select images at the right spatial resolution. In deciding which image is the best for the task, users need to base the decision on their special needs; detailed data may bring little benefit if detailed information is not required. Apart from the amount of detail visible, spatial resolution or pixel size also affects the reliability of the results derived from the data. Covering only a local area, images of a finer spatial resolution are suited to applications over small areas where mapping accuracy requirements are stringent; the expected accuracy standard can be met more easily with finer resolution images. Coarser resolution data are more appropriate for broad-scale applications. Geometric accuracy
is an especially important factor to consider if the remote sensing data are to be used to produce elevational information such as DEMs; it is a less important consideration in thematic mapping.
• Third, what does the study area look like geometrically? When deciding which type of remote sensing data is the most appropriate, the geometric properties of the study area need to be considered, including its size, shape, orientation, and type of terrain (e.g., the proportion of land, if land is the object of study). These factors govern how many scenes of imagery have to be purchased. More images may be needed if the area is not oriented parallel to the orbital path or has a highly irregular boundary. In the worst case a small portion of the area can spread into several neighboring scenes; a few half- or even quarter-scene images may then be purchased instead of a full scene.
• The last factors to consider are related to the quality, reliability, and currency of the data source. For instance, what are the revisit period and acquisition dependability of the data? These qualities are important in carrying out longitudinal studies of ephemeral phenomena such as flooding and fires. The Earth's surface is in a state of constant change, and some ground covers (e.g., forest) change faster than others (e.g., urban). When purchasing the data, the user needs to know the acceptable time frame of the data. If the features or phenomenon under study do not change quickly over time, then data recorded years back are still usable. If the temporal resolution is too coarse, the data cannot fulfill such applications as fire monitoring, and recent data have to be acquired at a higher cost.
2.8.2 Seasonal Factors
When considering which data are optimal, the user must be aware of the seasonal factor, even if the subject of study is not directly related to phenology. Imaging should take place at a time when the phenomenon under study is at its maximum and thus most easily distinguished from other features or phenomena. It is not a good idea to obtain winter images when the ground is likely to be buried under snow. Another reason for considering seasonality is that different seasons have different shadow lengths and different chances of cloud cover. In general, summer images have the shortest topographic shadow, whereas winter images have the longest. Topographic shadow is generally undesirable as it may obscure critical information on the phenomenon under study, especially in mountainous settings. Shadow also degrades the accuracy of mapping if the data are
automatically classified. However, shadow also facilitates the identification of geologic structures and the appreciation of terrain relief. The chance of heavy cloud cover is higher during winter. The user must know the maximum amount of cloud cover allowable in an image; this limit should be explicitly spelled out when signing contracts to purchase programmed data.
2.8.3 Cost of Data
The cost of remote sensing data varies enormously, from absolutely free to prohibitively expensive for academics. Although the price charged by a data supplier or its approved agent is not negotiable, it still pays to consider a few factors before an order is placed. In general, the older the data, the cheaper they become. Data suppliers usually sell data that are several years old at a heavily discounted price, so it may be worthwhile to consider buying archived data recorded years back. It is also cheaper to buy multitemporal, multiple-sensor, multiple-scene data from the same data supplier. If the required data are not archived, then an order must be accompanied by a special programming request; the supplier usually charges a programming fee in addition to the regular data cost. In this case the user can specify the maximum acceptable level of cloud cover. When inspecting quoted prices, the user needs to be aware of whether they include all processing and handling charges. Processed data, which tend to be more accurate, cost more; it is certainly cheaper to have the processing done in-house for users who have access to a competent image processing system (refer to Chap. 4).
2.8.4 Mode of Data Delivery
Once the needed data have been identified, the next step is to place an order. Prior to this, the analyst needs to be aware of the time it takes to process the order and how quickly the ordered data are delivered. If the requested data are to be recorded via special programming, it may take a while to obtain suitable imagery; by comparison, archived data can be delivered rather quickly. The urgency attached to an order depends upon the nature of the application. A higher priority should be attached to data for disaster-related applications (e.g., fire monitoring and emergency response), as they must be delivered almost instantaneously, even though this means a higher cost. For other, nonemergency applications, a delay of a few days probably will not make much difference. In placing an order, the user may have to decide on the appropriate mode of data delivery. Data delivered via courier take longer than direct downloading from the vendor's Web site, and courier delivery involves additional cost associated with data storage media and handling. If this method is chosen, the data should be stored on a medium that can be read into the user's computer properly. If access to broadband
is not a problem, then the direct downloading option is preferred. Downloading should take place during off-peak times (e.g., evenings or weekends) when the Internet speed is the fastest. This is especially important if the ordered data are measured in hundreds of megabytes. No matter whether the data are couriered or downloaded, they must be saved in a common format. This format may be image specific, for instance, HDF for ASTER data. If there is an option for specifying the image storage format, always use TIFF or GeoTIFF. Finally, the user needs to be aware of the data license policy, as well as any restrictions on the use of the data. For instance, can the data be used commercially? Is it possible to share the data with other researchers at the same institution? Are there any implications in publishing results generated from the data? Some data suppliers require the user to acknowledge them as the legitimate copyright holder of the raw data and of results derived from them in all publications. The user must honor these requests in order to avoid any potential litigation.
Chapter 3
Storage of Remotely Sensed Data
No matter whether they are directly downloaded from the Internet site of a data vendor or scanned from analog materials, remotely sensed data must be stored digitally in a certain format before they can be imported into an image analysis system for processing. Likewise, the analyzed results must be saved in a format appropriate for integrating them with data from other sources for further analysis. This chapter on storage of remotely sensed data consists of four sections. The first section is devoted to the space needed and the formats for storing multispectral remote sensing data in a computer. This discussion is followed by a survey of various data storage media, and of the generic graphic formats in which images are commonly stored. With the emergence of hyperspectral remote sensing data, a huge storage space is required to store both raw data and intermediate results during data processing. Thus, the image analyst faces the problem of how to compress these data to a manageable level prior to, during, and after image processing. Included in the last section is a brief review of common data-compression methods and their major characteristics.
3.1 Storage of Multispectral Images
3.1.1 Storage Space Needed
Storage space is measured in bytes (8 bits), kilobytes (kb) or 1000 bytes, megabytes (Mb) or 1 million bytes, gigabytes (Gb) or 1 billion bytes, and terabytes (Tb) or 1 trillion bytes. The space needed to store a multispectral image is affected by several variables, including the physical size of the image (e.g., the number of rows r and columns c), the number of spectral bands it has (b), and its radiometric resolution or quantization level q in bits. The total space D required, in bits, is calculated as below:
D = r × c × b × q    (3.1)
For instance, storage of a 512 by 512 image of 4 bands at a quantization level of 8 bits (256 gray levels) requires 8,388,608 bits, or 1,048,576 bytes (about 1 Mb). Once the quantization level exceeds 8 bits, each pixel value must be stored in 2 bytes, so the requirement doubles. Data recorded at 10 and 11 bits, which are rather common with the recently emerged hyperspatial resolution satellite data (see Chap. 2), therefore require more storage space than other multispectral data. The above calculation of storage space applies to images whose pixel values are integers, as with all raw data. However, some processed results (e.g., band ratioing) are more appropriately saved in floating point mode. Storage of such pixel values requires more space: instead of a single byte for one pixel value, a pixel value saved in floating-point format requires 4 bytes, thus quadrupling the storage space calculated using Eq. (3.1).
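The calculation in Eq. (3.1) is easy to script. The following minimal Python sketch (not from the book) reproduces the worked example above; the function name is mine, and the bit-packing assumption is noted in the comments.

```python
def storage_bytes(rows, cols, bands, bits_per_pixel, floating_point=False):
    """Storage required by Eq. (3.1), returned in bytes.

    If pixel values are saved as 4-byte floating point numbers, each pixel
    needs 4 bytes regardless of the quantization level.
    """
    if floating_point:
        return rows * cols * bands * 4
    # Assumes samples are bit-packed; in practice 10- and 11-bit samples
    # are usually stored in 2 bytes each.
    return rows * cols * bands * bits_per_pixel // 8

# The 512 x 512, 4-band, 8-bit example from the text:
print(storage_bytes(512, 512, 4, 8))                          # 1,048,576 bytes (about 1 Mb)
# The same image saved as floating point values after band ratioing:
print(storage_bytes(512, 512, 4, 8, floating_point=True))     # 4,194,304 bytes
```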
3.1.2 Data Storage Forms
If saved in the binary format, multispectral satellite images may be stored in one of three forms: band sequential (BSQ), band interleaved by line (BIL), and band interleaved by pixel (BIP). In the BSQ form, all the information related to one particular band, together with the header information, is stored in one file. Multiple bands are written sequentially into this file (Fig. 3.1). In the case of multispectral bands, the
FIGURE 3.1 The original satellite imagery of two spectral bands [refer to (a) and (b) here]. Storage of the two spectral bands in the BSQ format inside the computer (bottom drawing).
header information is followed by the image content of the first band. This sequence is repeated for each of the multispectral bands. This method of image storage is the most intuitive and practical, but it is awkward for handling subscene images. This storage form has an advantage in certain image analyses, such as the display of individual bands on the computer screen. In this case BSQ is preferable insofar as one does not have to read past the other bands in the image stack. On the other hand, this form is ill suited to analyses, such as image classification, in which the values of all bands at the same pixel location need to be examined simultaneously. Other forms of image storage are more efficient in this regard. In the BIL form multiple bands are stored line by line. For instance, the first line of the first band is recorded, followed by the first line of the second band, then the first line of the third band, and so on until the first line of the last band. Then the second line of the first band is stored, followed by the same line in the second, third, … , and the nth band (Fig. 3.2). The BIP form is very similar to BIL except that the values associated with a pixel in all bands are stored sequentially (Fig. 3.3). All band values for a given pixel location are thus stored in close proximity to one another. This kind of data storage is advantageous when these values are examined at the same time, such as during a classification. However, it is an inefficient form of storage if only one band is examined, as in image contrast enhancement, in which a single band forms the focus of analysis.
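The three interleaving schemes are easiest to see on a toy image. The sketch below (mine, not from the book, and assuming NumPy is available) serializes an invented 2-band, 2-row, 3-column image in each of the three orders:

```python
import numpy as np

# A hypothetical 2-band, 2-row, 3-column image; the pixel values are invented.
image = np.array([[[11, 12, 13],
                   [14, 15, 16]],      # band 1
                  [[21, 22, 23],
                   [24, 25, 26]]])     # band 2 -> array shape (bands, rows, cols)

bsq = image.ravel()                     # band 1 in full, then band 2
bil = image.transpose(1, 0, 2).ravel()  # line 1 of every band, then line 2
bip = image.transpose(1, 2, 0).ravel()  # both band values for each pixel in turn

print("BSQ:", bsq)   # 11 12 13 14 15 16 21 22 23 24 25 26
print("BIL:", bil)   # 11 12 13 21 22 23 14 15 16 24 25 26
print("BIP:", bip)   # 11 21 12 22 13 23 14 24 15 25 16 26
```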
3.2 Storage Media
Over the years tremendous advances have been made in data storage media. In the mid-1980s ½- and ¼-inch magnetic tapes were commonly used for data distribution and even as media for permanent storage. At 1600 bytes/inch, these media do not have a large capacity, and data access is quite slow. Besides, they are highly unstable, with a maximum shelf-life expectancy of about 10 years under normal storage conditions. In the early to mid-1990s, these bulky tapes were gradually replaced by more compact magnetic tapes such as 8-mm, Exabyte, and DAT tapes. These media had a much larger storage capacity (e.g., up to 1 Gb) than ½- or ¼-inch tapes, but shared one limitation: they required a specific tape drive to read the data. Such tape drives are increasingly difficult to find these days as their use has been gradually phased out. Replacing these tapes are much more advanced and reliable media, such as the compact disk (CD), the digital versatile disk (DVD), and memory sticks. These storage media not only are more powerful, with a larger capacity than magnetic tapes, but also offer improved reliability and flexibility.
3.2.1 CDs
At present there are three major types of CDs in use: CD read-only memory (CD-ROM), CD recordable (CD-R), and CD rewritable (CD-RW). CD-ROM is a read-only storage medium that is compact, indelible, and highly reliable. These disks have a life expectancy in excess of 30 years in a normal storage environment, much longer than the 10 years for ½-inch tapes. CD-ROM is an ideal medium for permanently storing digital image processing software and raw satellite data. One outstanding advantage of CD-ROM is that data stored on it cannot be erased accidentally, as they are protected. CD-ROM is the ideal choice for permanent storage, not only because of its reliability but also because of its huge storage capacity. As a type of optical disk, a CD-ROM is capable of storing up to 1 Gb of data, even though the most common size is 670 Mb. At this capacity one CD-ROM can hold 15 SPOT (Le Systeme Pour l'Observation de la Terre) multiple linear array scenes. Compared with other storage media, CD-ROM has the advantages of being inexpensive, compact, portable, and efficient in access. Since it is a random-access medium, data can be accessed very efficiently at a speed that is generally much higher than that of magnetic tapes. Most of all, CD-ROM is universally compatible, as all desktop computers are equipped with a CD drive. Since all CD-ROMs conform to a standard size and format, they can be read by all CD drives, an advantage not enjoyed by magnetic tapes. More recently, two recordable types of CD have been introduced, CD-R and CD-RW. CD-R disks are blank compact disks that allow data to be recorded to them. With a CD-write drive and proper software, remote sensing data and processed results of image analysis can be saved to a CD-R,
but only once. Once the CD is “burned,” its content becomes permanent and cannot be erased. Unlike CD-R, rewritable CDs (CD-RW) allow data to be written to them up to about a thousand times. Like CD-R, their content can be extended in later sessions, but a writing session must be closed before it can be read in a CD drive. Usually, the rewrite speed is slower than the write speed, both of which can reach hundreds of kilobytes per second; read speeds are much faster. The limitation of recordable CDs is that special software and a CD-R or CD-RW drive are essential for saving data to them. This process of data writing is cumbersome and lengthy because all the data to be stored on a recordable or rewritable CD have to be queued up first and then burned to the CD in one transaction.
3.2.2 Digital Versatile Disk (DVD)
Similar to a CD, a DVD is an optical storage medium that comes in read-only, recordable, and rewritable variants. It has the same physical dimensions as a CD, and DVD drives, at least the newest models, are backward compatible with current CD media. At a thickness of 1.2 mm, a DVD has four storage capacity levels, 4.7, 8.54, 9.4, and 17.08 Gb, depending on the disk structure: data recording can be single or double layered, and single or double sided. This storage capacity is at least seven times larger than that of a CD. DVDs also offer fast random access, as hard drives and CDs do. Thanks to all of its similarities to a CD, the DVD is considered a likely future replacement for the CD (eMag Solutions, 2006).
3.2.3 Memory Sticks
Universal Serial Bus (USB) memory sticks, also known as USB flash drives or pen drives, are a recent addition to the vast range of storage media. There are an increasing number of brands of USB memory sticks on the market. All of them tend to have a standardized physical size, typically 9 to 10 cm long by 2.5 cm wide by 1.2 cm thick (roughly 3.5 by 1 by 0.5 in) (Fig. 3.4). At such a compact size, as small as a thumb, a memory stick is even more portable and less subject to physical damage than a CD because it is encapsulated inside a plastic shield with no moving parts. This generation of storage media has a greatly expanded storage capacity that usually ranges from 512 Mb to 1 Gb. Larger capacities such as 2 and 4 Gb are also available, but are disproportionately more expensive. Like CDs, these media are also write-protectable; content protection can be turned on or off at will by switching a tiny latch on the side of the drive. Access speed is very fast, as high as a few megabytes per second for reading and up to 1 Mb/s for writing. Some USB memory sticks are shock resistant, and can retain data for more than 10 years. USB memory sticks are a convenient way of transferring a large amount of data among different users and between different machines.
FIGURE 3.4 The Kingston USB memory stick with 2 Gb of flash storage memory. It measures roughly 9 to 10 cm by 2.5 cm by 1.2 cm. It is also highly reliable as it has no moving parts.
Similar to a CD-ROM, this storage device is highly portable. All desktop computers sold since 1998 are equipped with USB sockets. USB memory sticks are very easy and convenient to use. Once plugged into a USB socket on a computer, a new drive is automatically detected by the host computer running Windows 2000 or a later version. Data stored on it can then be read, and data can be written to it if the write protection is turned off. USB memory sticks are preferable to CDs because they are smaller, more reliable, and much more flexible in data reading and recording. The same USB drive can be used for data reading and writing without any special software, unlike CD-R. Driver software is not required for Windows ME, 2000, XP, or modern Apple Macs; if running Windows 98, the computer must have the driver software installed.
3.2.4 Computer Hard Disk
The storage device with the longest history is the computer hard disk. It differs from other storage media in its
• Capacity—usually very huge
• Speed—very fast
• Mobility—fixed to a computer, not easily movable
• Cost—much more expensive
Tremendous progress has been made in the capacity, speed, reliability, and power usage of computer hard disks. Even though the
FIGURE 3.5 A 36Gb, 10,000-RPM, IBM SCSI server hard disk, with its top cover removed. Note the height of the drive and the 10 stacked platters (the IBM Ultrastar 36ZX). (Copyright Hitachi.)
price of computer hard disks has tumbled (e.g., from over US $100 to less than 1 cent per megabyte) over the last two decades, hard disks (Fig. 3.5) are still more expensive than mobile storage devices per unit of data. Thanks to these reductions in price, hard disks have become a popular alternative medium for data storage, especially for temporarily storing raw remote sensing data and intermediate results. In terms of accessibility there are two types of computer hard disks, local and networked. The former resides in a single desktop computer, so the data on it are accessible to one image analyst at a time. This is the preferred option only when a single user is engaged in analyzing the data. Networked hard disks reside in a server that offers a high degree of flexibility in data accessibility: all users logged into the network have access to the data simultaneously, if authorized. This storage medium is preferred if the data are needed by multiple users working on a large research project or in a classroom setting, or if the data have to be accessed from different terminals.
3.3 Format of Image Storage
A remote sensing image may be stored in one of many graphic formats. Which format is the most appropriate depends upon the image processing system being used. Each image analysis system likely has its own proprietary format. Due to commercial sensitivity, such image formats are not routinely disclosed to the public. Therefore, these special image formats unique to a particular image processing system (e.g., the IMG format in ERDAS Imagine) are beyond the
scope of this section. Instead, the discussion will center on four commonly used image formats: generic binary, Graphics Interchange Format (GIF), Joint Photographic Experts Group (JPEG), and Tagged Image File Format (TIFF). They are recognizable by most image processing systems, if not all. These formats play an essential role in transferring raw or processed image data among different image analysis systems, and between image analysis systems and GIS.
3.3.1 Generic Binary
In the generic binary format, all image pixel values are represented as binary data of 0s and 1s without any header information. Each pixel is represented as a byte; thus, an image of 512 rows by 512 columns requires 262,144 bytes per band to store. Ancillary image information, such as the number of rows and columns and the number of spectral bands, is kept in a separate header apart from the image data. The image analyst needs to specify these parameters when importing a generic binary image into an image processing system. If the header information is lumped together with the binary image file, the analyst also needs to specify the exact number of bytes used to store it, as well as the number of spectral bands, when importing the data. In this way the computer knows how many bytes to skip before it starts to read the image data. Importing a generic binary image cannot be successful unless all of the above information is supplied correctly and completely. Generic binary is an image format that enables image data to be stored faithfully without any loss of information. However, it is also cumbersome to read generic binary images; they have to be converted to other graphic image formats, as not many systems can read them directly. So this format is not widely used for storing remote sensing imagery. For exchanging image data between different systems, other generic image formats are more user-friendly than generic binary, and thus preferred.
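Because a generic binary file carries no embedded description of itself, the importing program must be told the image dimensions and where the data start. A minimal Python sketch (mine, not from the book; the file name, dimensions, and BSQ ordering are assumed for illustration, and NumPy is required):

```python
import numpy as np

# Parameters the analyst must supply for a headerless generic binary image.
rows, cols, bands = 512, 512, 4   # image size and number of spectral bands
header_bytes = 0                  # bytes to skip if a header precedes the pixels

raw = np.fromfile("scene.bsq", dtype=np.uint8, offset=header_bytes)
image = raw.reshape(bands, rows, cols)   # assumes the pixels are in BSQ order
print(image.shape, image.dtype)          # (4, 512, 512) uint8
```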
3.3.2 GIF
Initially developed by CompuServe in 1987, GIF is a standard defining a mechanism for storing and transmitting raster image data (Fulton, 2008). A GIF file is made up of several parts, including a signature, a screen descriptor, a global color map, an image descriptor, a local color map, and finally the raster data (CompuServe, 1987); the last three parts may be repeated many times. The screen descriptor contains ancillary information about the image, such as the number of bits used, the image width and height, and the background color. Though optional, the global color map is recommended for accurately rendering color images. If the global color map is present, 3 bytes are used to specify the relative intensities of red, green, and blue for each entry. The image descriptor defines the
placement and extents of the image to follow, and the pixel display sequence. As with all raster data, pixels in a row are stored left to right sequentially, and the entire image is stored row by row from top to bottom. GIF uses a palette of only 256 colors, which allows a single band of continuously varying tone (i.e., grayscale) to be represented adequately if it is recorded at 8 bits, as with Landsat Thematic Mapper (TM) imagery. However, a color composite of three spectral bands, with its tens of thousands of colors, would suffer considerably in quality if stored in the GIF format; such composites require 24 bits per pixel (8 bits for each band) to store faithfully, far more than a 256-color palette can convey. The GIF format incorporates lossless LZW compression (see Sec. 3.4.3) to reduce file size without degrading image quality, and it offers optimum compression (e.g., smallest files) for solid-color graphics. Designed for the easy exchange of graphics online, GIF is particularly suited to storing images that are captured using screen dumps and that are to be embedded into other applications such as Word and PowerPoint for crude visualization. In this sense, it is useful for capturing digitally processed remote sensing results for presentation at professional meetings. This format is not suited to store either raw or processed remote sensing data, or any graphic results derived from them.
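The palette limitation is easy to demonstrate. The sketch below (mine, not from the book) uses the Pillow and NumPy libraries as an example; the array contents are invented, and a real band would of course come from an image file:

```python
from PIL import Image
import numpy as np

# A hypothetical 8-bit grayscale band: 256 gray levels fit GIF's palette exactly.
band = np.random.default_rng(1).integers(0, 256, (200, 200)).astype(np.uint8)
Image.fromarray(band).save("band.gif")

# A 24-bit color composite must first be quantized to 256 colors, losing detail.
rgb = np.dstack([band, band[::-1, :], band.T]).astype(np.uint8)
Image.fromarray(rgb).quantize(colors=256).save("composite.gif")
```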
3.3.3 JPEG
Named after the group that originated it, the JPEG format is a popular and efficient graphic format for storing images, albeit not always faithfully. Continuous-tone frame images, whether binary, grayscale, or color, can be stored in JPEG. This format is particularly suited for images that must be reduced to a very small size through compression. There are three coding systems (Gonzalez and Woods, 2002):
• A lossy baseline coding system that is adequate for most compression needs
• An extended coding system for greater compression
• A lossless independent coding system for reversible compression
During data compression, blocks of pixel values are transformed mathematically so that the relationship between a pixel and its neighbors is captured by a small set of coefficients (see Sec. 3.4.5). Minor detail that contributes little to these coefficients is not retained in the compressed image; a high compression efficiency is thus achieved at the expense of some image quality. Prior to storing an image in the JPEG format, the analyst is given the option of specifying the amount of compression desired,
FIGURE 3.6 Comparison of a 734-by-454 image before and after JPEG compression (quality: 60, standard deviation: 2). (a) Original image; (b) JPEG-compressed image. See also color insert.
and the acceptable loss of quality in the compressed image. A quality setting of around 60 usually results in the optimum balance between the quality of the compressed image and the reduction in its size (e.g., the compression will not cause too much loss of quality while still achieving a reasonable reduction in file size) (Shannon, 1997). As for the quality of the compressed image, there is hardly any degradation noticeable to the naked eye. As shown in Fig. 3.6, the image saved in the JPEG format has a size of only 49.8 Kb instead of 1.14 Mb in the ERDAS IMG format, a compression ratio of roughly 23:1. With the current compression standard, such as the baseline coding system mentioned above, the JPEG format does not allow the original quality of an image to be fully recovered from the compressed file, because the coding is deliberately lossy (Fig. 3.6). For this reason, JPEG should be avoided in storing raw remote sensing data and all products derived from them.
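Specifying the quality setting is a one-line affair in most software. A minimal sketch (mine, not from the book) using the Pillow library as an example; the file names are hypothetical, and the resulting ratio will vary with scene content:

```python
from PIL import Image
import os

img = Image.open("scene_rgb.tif")                      # a hypothetical color composite
img.save("scene_rgb.jpg", format="JPEG", quality=60)   # quality setting of 60, as above

ratio = os.path.getsize("scene_rgb.tif") / os.path.getsize("scene_rgb.jpg")
print(f"compression ratio about {ratio:.0f}:1")
```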
3.3.4 TIFF and GeoTIFF
Developed by Aldus Corporation (1988), TIFF has become the industry standard for image interchange. It supersedes earlier graphics and image file formats by incorporating enough flexibility to eliminate the need for, or justification of, proprietary image formats. However, proprietary information can still be stored in a TIFF image without violating the intent of the format. TIFF is characterized by three distinctive features: it is extendable, portable, and revisable. New image types can be added without invalidating older types. Besides, the addition of new informational fields to the format will not affect the ability of older applications to read the images. This format, independent of platform and operating system, can be used as an internal format for image editing and swapping. This all-purpose format has a rich and variable structure that is more complex than many of the proprietary image formats it supersedes. A unique tag is used to identify individual fields, which can be present or absent. All TIFF images are made up of three components:
the header, the image file, and the tag (Davenport and Vellon, 1987). The 8-byte image header contains information vital for the correct interpretation of the remainder of the TIFF file; namely, it points to one or more image file directories. Also contained in the header are the TIFF version number and the byte offset of the first image file directory. An image file directory consists of a 2-byte count of the number of fields, followed by a sequence of 12-byte field entries, and a 4-byte offset of the next image file directory, if present. The image file directories contain information about the image, and pointers to the actual image data. Multiple image file directories are needed to store multispectral imagery, with each directory reserved for a single band. The flexibility of the TIFF format derives from its tags that describe the image. In total, there are 35 commonly used tags in the TIFF specification (version 5.0). Four classes of images can be contained in a TIFF file: B (bilevel or bitmap), G (gray level), P (palette or pseudocolor), and R [RGB (red-green-blue)]. B class images are stored with 1 bit per pixel; grayscale images require 2 to 8 bits per pixel, and color images need up to 24 bits per pixel. This added complexity slows access to the image files. Their size may be reduced through data compression, with the compression method stored in the “compression” tag. The GeoTIFF interchange standard is an extension of the popular TIFF format to support georeferenced remote sensing data (Ritter and Ruth, 1997). This standard unifies the various internally represented transformations between raster data and the reference coordinate frame. It guarantees accessibility to images stored in the conventional TIFF format, as well as to all additional data needed for georeferencing or geocoding, independent of the TIFF image data. The limited number of available TIFF tags is overcome with the addition of a new level of abstraction called GeoKeys (Hild and Fritsch, 1998). In this format geospatial tags are embedded within the TIFF file. With this metatag concept, only six TIFF tags suffice to carry all georeferencing information, namely, cartographic projection, geodetic datum, pixel size, image spatial coordinates, and any additional information such as projected coordinate systems, without destroying the data structure of files saved in the standard TIFF format. GeoTIFF is especially suitable for processed remote sensing data. In fact, it is the only generic image format that enables the geospatial information of a geometrically rectified image to be preserved. In general, this format is also platform independent, just like TIFF. Virtually any digital image analysis system for analyzing remotely sensed and GIS data is able to read GeoTIFF data correctly. When the image file is read, all the georeferencing information is automatically loaded into the computer. Generic graphics software packages that do not utilize spatial information, such as Photoshop or CorelDraw, will still be able to read GeoTIFF files as regular TIFF files (in some cases support for TIFF 6.0 is required); however, all the spatial information contained in the tags will be lost.
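To show what the embedded georeferencing looks like in practice, here is a minimal sketch (mine, not from the book) that uses the GDAL library as an example reader; the file name is hypothetical:

```python
from osgeo import gdal

ds = gdal.Open("rectified_scene.tif")           # a hypothetical GeoTIFF file
print("size:", ds.RasterXSize, "x", ds.RasterYSize, "bands:", ds.RasterCount)
print("geotransform:", ds.GetGeoTransform())    # image origin and pixel size
print("projection:", ds.GetProjection())        # projection and datum as WKT text
band1 = ds.GetRasterBand(1).ReadAsArray()       # pixel values of the first band
```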
3.4 Data Compression
Compression of remotely sensed data is becoming an increasingly important issue in digital image processing in light of the emergence of hyperspatial and hyperspectral resolution data that are easily measured in hundreds of megabytes and more. These bit-mapped images require an incredible amount of storage space. For instance, a very small 16-bit Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) scene of 512 by 512 pixels at 224 spectral bands requires over 117 Mb of space to store. Processing of this small scene requires correspondingly large swap and temporary space for intermediate results. On the other hand, data redundancy in the form of repeated occurrence of the same pixel values (e.g., extensive distribution of the same cover on the Earth's surface) is rife in some satellite images. Such redundancy is effectively exploited by reducing the data volume via data compression, defined as “the process of reducing the amount of data required to represent a given quantity of information” by Gonzalez and Woods (2002). Data compression not only reduces the amount of data that have to be stored and transferred, but also speeds up the processing, thus saving time and cost. Data compression techniques fall into two broad categories: those that do not result in any loss of information (i.e., error free) and those that result in partial loss of information. Error-free compression, also known as lossless compression, is essential when the compressed image data have to be restored to their original state without any loss of information. Typically, a compression ratio, defined as the ratio of the number of information-carrying units in the raw data to that in the compressed data, of 2 to 10 can be expected. There are a number of error-free compression techniques, including variable-length coding, run-length coding, and lossless predictive coding.
3.4.1 Variable-Length Coding
The simplest approach toward data reduction is to reduce coding redundancy. One way of achieving this is to assign the shortest codes to the most probable pixel values in the input data, or to the result of a gray-level mapping operation (e.g., pixel differences, run lengths, and so on), after a variable-length code is constructed. A good example of variable-length coding is Huffman coding. As the most popular such technique, Huffman coding yields the shortest possible average code length for a given source. It involves three steps:
• First, all possible pixel values in the input image are identified and their probabilities of occurrence calculated. These probabilities are then sorted in descending order. The two lowest probability values are combined recursively to form a “compound” value that replaces them in the next round of probability calculation. This process is iterated until only two probabilities are left.
• Second, each reduced source is recoded, working backward from the smallest reduced source toward the original one. The two remaining probabilities are arbitrarily assigned the simplest codes of 0 and 1 to distinguish them from each other. If 0 is assigned to the value of lower probability, it must remain unchanged in subsequent operations in which more code symbols are added to the front of the existing code(s).
• Third, the process is repeated until the original source is reached. The codes for the source values end up having varying lengths. The original pixel values are decoded by examining the string of codes from left to right in a lookup-table manner; this string of codes is uniquely decodable.
For 8-bit remote sensing imagery, there are 256 possible pixel values. The construction of the optimal binary Huffman code (i.e., 254 source reductions and 254 code assignments) is a daunting task. It is hence often necessary to simplify code construction at the expense of coding efficiency. In lossless predictive coding, interpixel redundancies of closely spaced pixels are eliminated through extraction and coding of only the difference between the actual and predicted value of each pixel. The degree of data compression achievable with this method is related directly to the entropy reduction.
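A compact way to carry out the source reductions and code assignments is with a binary heap. The Python sketch below (mine, not from the book) builds a Huffman code table for a handful of invented pixel values; a production encoder would work on the full 256-value histogram and pack the bit strings into bytes:

```python
import heapq
from collections import Counter

def huffman_code(pixels):
    """Build a Huffman code table mapping each pixel value to a bit string."""
    freq = Counter(pixels)
    if len(freq) == 1:                            # degenerate case: a single value
        return {next(iter(freq)): "0"}

    # Heap entries: (frequency, tie-breaker, {value: partial code}).
    heap = [(n, i, {v: ""}) for i, (v, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)

    while len(heap) > 1:
        n1, _, codes1 = heapq.heappop(heap)       # the two least probable sources
        n2, _, codes2 = heapq.heappop(heap)
        for v in codes1:
            codes1[v] = "0" + codes1[v]           # prefix 0 onto one branch
        for v in codes2:
            codes2[v] = "1" + codes2[v]           # prefix 1 onto the other
        codes1.update(codes2)                     # merge into a compound source
        heapq.heappush(heap, (n1 + n2, counter, codes1))
        counter += 1
    return heap[0][2]

pixels = [10, 10, 10, 10, 25, 25, 40, 90]         # invented 8-bit pixel values
table = huffman_code(pixels)
encoded = "".join(table[p] for p in pixels)
print(table)          # e.g. {10: '0', 25: '10', 40: '110', 90: '111'}
print(len(encoded))   # 14 bits instead of 8 x 8 = 64 bits uncompressed
```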
3.4.2 Run-Length Coding
It is quite common to encounter a long string of the same pixel value in a row, especially if the image is binary. (Binary images can be created through a process called bit-plane coding, in which an original grayscale or color image is decomposed into binary ones in the form of a base-2 polynomial.) For instance, the tone of an image changes little across the same hayfield. In such cases the run-length coding method can be adopted to reduce the image data. In this coding, each run of identical pixel values encountered in a row from left to right is coded by two values in the output file: the first is the pixel value of the run, and the second is the number of times it is repeated in successive positions (Fig. 3.7). Thus, a high efficiency of data compression is expected if a long contiguous string of pixels has the same value. The input data can be treated as individual bytes, or as groups of bytes that represent something more elaborate, such as floating point numbers. Although this method is an effective means of data compression, run-length
FIGURE 3.7 An example of run-length coding. Original data stream: 25 34 4 4 4 32 28 34 4. Run-length codes: (25, 1) (34, 1) (4, 3) (32, 1) (28, 1) (34, 1) (4, 1).
codes can themselves be coded using variable-length encoding to further reduce the data volume. Run-length encoding can be extended to two-dimensional (2D) binary images using various coding procedures, one of which is known as relative-address coding. In this coding the positions at which the binary value changes in each row are tracked. The simplest way of achieving this is to note the beginning and ending columns of the nonvoid pixels in each row. The binary image shown in Fig. 3.8 is coded as below:
Row 4: 4, 6
Row 5: 2, 6
Row 6: 1, 8
Row 7: 1, 10
Row 8: 3, 12
Row 9: 3, 12
Row 10: 5, 8
Row 11: 5, 6
Row 12: 5, 6
As shown in the above listing, a large compression ratio is achieved if the binary feature extends far along a row. Instead of recording all the pixels, only a pair of coordinates is needed to represent each row of the image. This compression ratio may be further improved by combining different coding methods (e.g., Huffman and run-length) in one compression.
FIGURE 3.8 Two-dimensional run-length coding is a useful way of reducing remote sensing data volume, even though its most common application is in compressing fax files.
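One-dimensional run-length coding is only a few lines of code. The Python sketch below (mine, not from the book) encodes and then losslessly decodes the data stream of Fig. 3.7:

```python
def run_length_encode(values):
    """Encode a sequence as (value, run length) pairs."""
    codes = []
    for v in values:
        if codes and codes[-1][0] == v:
            codes[-1][1] += 1              # extend the current run
        else:
            codes.append([v, 1])           # start a new run
    return [tuple(c) for c in codes]

def run_length_decode(codes):
    """Restore the original sequence; the coding is error free."""
    out = []
    for v, n in codes:
        out.extend([v] * n)
    return out

stream = [25, 34, 4, 4, 4, 32, 28, 34, 4]          # data stream of Fig. 3.7
codes = run_length_encode(stream)
print(codes)   # [(25, 1), (34, 1), (4, 3), (32, 1), (28, 1), (34, 1), (4, 1)]
assert run_length_decode(codes) == stream          # lossless round trip
```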
3.4.3 LZW Coding
Initially devised by Ziv and Lempel (1977), and later refined by Welch (1984), LZW compression is an error-free compression algorithm in which fixed-length codes are used to replace variable-length sequences of pixel values in the original data. Conceptually, this general-purpose compression method is simple and versatile. It does not require any a priori knowledge of the occurrence probabilities of the pixel values to be encoded. Instead, a code table of 4096 entries is constructed. All encoded codes are 12 bits long, each corresponding to one of the entries in the table. Codes 0 to 255 in the code table always represent single bytes from the input image, while codes 256 through 4095 represent sequences of bytes. For example, code 439 may represent a particular sequence of three bytes. Each time the compression algorithm encounters this sequence in the input file, code 439 is placed in the encoded file. During uncompression code 439 is translated back to the true 3-byte sequence via the code table. The longer the sequence assigned to a single code, and the more often the sequence is repeated, the higher the compression ratio. During compression the computer reads the data from the input stream and builds a code or translation table with the patterns as it encounters them. Initially, there are only 256 entries in this code table, the remainder of the table being blank. Thus, the first codes going into the compressed file are single bytes from the input file converted to 12 bits. As the encoding proceeds, a new pattern is added to the code table whenever it is encountered for the first time, and the index of its known prefix is added to the output stream. When a pattern already in the code table is encountered again, its index from the code table is put on the output stream, thus achieving data compression. When the number of patterns detected by the compressor in the input stream exceeds the number of patterns encodable with the current number of bits, the number of bits per LZW code is increased by one. The compression process is made up of four steps:
• Definition of the number of bits needed to represent the actual data. The first byte of the compressed data stream is a value indicating the minimum number of bits required to represent the set of actual pixel values. Normally, this will be the same as the number of color bits.
• Compression of the image pixels to a series of compression codes.
• Conversion of these compression codes into a string of 8-bit bytes.
• Packaging of sets of bytes into blocks, preceded by character counts, for output. Each new pattern is entered into the code table and its index is used to replace it in the compressed stream.
During uncompression the decoder reconstructs an identical code table directly from the compressed data as it decodes the encoded stream, without the code table having to be transmitted separately. The original characters are restored from the compressed file by taking one code at a time and translating it to the character(s) it represents in the code table.
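The sketch below (mine, not from the book) shows the core of LZW in Python on an invented image row; to keep it short it emits integer codes and leaves out the 12-bit packing, the 4096-entry table limit, and the block structure described above:

```python
def lzw_compress(data):
    """LZW-compress a byte string into a list of integer codes."""
    table = {bytes([i]): i for i in range(256)}   # codes 0-255: single bytes
    next_code, w, out = 256, b"", []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc                      # keep extending the current pattern
        else:
            out.append(table[w])        # emit the code of the longest known pattern
            table[wc] = next_code       # register the new pattern
            next_code += 1
            w = bytes([byte])
    if w:
        out.append(table[w])
    return out

def lzw_decompress(codes):
    """Rebuild the byte string; the code table is reconstructed on the fly."""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        entry = table[code] if code in table else w + w[:1]
        out.append(entry)
        table[next_code] = w + entry[:1]
        next_code += 1
        w = entry
    return b"".join(out)

row = bytes([7, 7, 7, 7, 7, 7, 12, 12, 7, 7, 7, 7])   # hypothetical image row
codes = lzw_compress(row)
print(len(row), "bytes ->", len(codes), "codes")      # 12 bytes -> 7 codes
assert lzw_decompress(codes) == row                   # error-free round trip
```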
3.4.4 Lossy Compression
Lossy compression is able to achieve a very high compression ratio at the expense of losing a certain amount of information in the original image. With some loss of information, the compression ratio can be raised well beyond that of lossless compression, by tens of fold and up to 100:1 for single-band imagery. Lossy compression is suitable for applications that can tolerate some loss of information that is perceptually insignificant. Occasionally the level of information loss can be specified prior to compression. Lossy compression differs from error-free compression in that it inserts a quantizer between the point at which the prediction error is formed and the symbol encoder. The input to a quantizer can be either a scalar or a vector; in the latter case, it is called a vector quantizer. Lossy compression may be implemented in one of three types: lossy predictive coding, transform coding, and wavelet coding. In lossy predictive coding the quantizer absorbs the nearest-integer function of the error-free encoder, and it establishes the relationship between the degree of compression and the distortion associated with lossy predictive coding. In transform coding the input image is first transformed linearly, in a reversible fashion, to decorrelate the pixel values of each subimage or to pack as much information as possible into the smallest set of transform coefficients. In a transform compression the coefficients resulting from passing the signal through the transform (e.g., the discrete Fourier or cosine transform) do not all carry the same amount of information. The transform coefficients are then quantized and coded. Those coefficients that carry the least information are quantized at the coarsest interval or truncated to zero. In this way a high compression ratio is achieved without causing too much distortion to the image. The encoding process consists of four steps: decomposition into subimages, transformation, quantization, and coding. In the decoding process these steps are performed in the reverse order. Of the various image transforms, the discrete cosine transform is better at packing information into few coefficients than others such as the discrete Fourier transform. The most popular subimage sizes are 8 by 8 and 16 by 16; a larger subimage size causes both the level of compression and the computational complexity to increase. In wavelet coding the pixel values of an input image are processed via the wavelet transform function to remove any correlation among them. Afterward the original image is decomposed into several
Storage of Remotely Sensed Data components, such as horizontal, vertical, and diagonal coefficients with zero mean and Laplachian-like distributions. Most of the important visual information is projected into a few coefficients. They are then quantized and coded using one of the lossless coding methods mentioned above. Those coefficients that carry little visual information are either quantized at a coarse level or discarded altogether. Consequently, the raw image cannot be restored via decoding that is accomplished by inverting the encoding process without the quantization step. Wavelet coding differs from transform coding in that the input image does not have to be divided into subimages because wavelet transforms are both computationally efficient and inherently local. The level of computation intensity is affected by the specific form of wavelets. There are several in use, the most common being the Daubecies wavelets and biorthogonal wavelets. The latter is more computationally intensive than the former, but can achieve a higher compression ratio. The level of computation intensity is also affected by the number of transform decomposition levels.
3.4.5 JPEG and JPEG 2000
The JPEG compression is usually implemented in several sequential steps (Smith, 2004):
• First, the image is divided into subimages of 8 × 8 pixels from left to right and from top to bottom, each to be compressed independently. A subimage initially represented with 64 bytes is reduced to far fewer bytes by first subtracting the quantity 2^(n−1), where n is the number of bits per pixel, and then transforming the difference with the discrete cosine transform (DCT). The DCT is the best among the various standards in terms of ease of implementation and achievable compression ratio, and block-based DCT techniques are characterized by lossy compression that has become the norm. Thus, the DCT is still widely used to store satellite data in several remote sensing systems at present, even though other more efficient and flexible compression techniques have become popular. Each of the 64 coefficients produced from the 8 × 8 subimage is the amplitude of a basis function. The set of coefficients is compressed by reducing the number of bits and eliminating some of the components in a step controlled by a quantization table.
• Next, the modified coefficients are converted from an 8 × 8 array into a linear sequence, at the end of which all of the high-frequency components are gathered. This groups the zeros from the eliminated components into long runs, which are compressed using run-length encoding.
• Finally, the compressed file is formed by encoding the sequence with either Huffman or arithmetic encoding.
Stored in the compressed image are only the transform coefficients rather than the pixel values of the image. During uncompression the JPEG decoder re-creates the normalized transform coefficients first; this can be achieved easily via a lookup table, as they are coded using the uniquely decodable Huffman coding. These coefficients are fed into the inverse transform to generate pixel values that best represent the original ones, creating an approximate version of the original 8 × 8 subarea. Tiling of all subimages restores the uncompressed image. JPEG compression has the drawback of introducing artifacts along the borders of subimages after they are mosaicked to form the full image, because each subimage is compressed separately. This problem disappears with JPEG 2000. JPEG 2000 is a newer image compression standard that is backward compatible with, and further extends, the current JPEG standard, with increased flexibility in image compression and in access to the compressed data. Unlike JPEG, which can handle at most three-band RGB imagery, JPEG 2000 is able to compress images of up to 256 bands. A very large compression ratio can be achieved with very little appreciable degradation in image quality. Unlike JPEG, which uses the block-based DCT described above, JPEG 2000 relies on wavelet coding to achieve more efficient compression. Based on wavelet technology, JPEG 2000 enables an image to be compressed with or without loss of information. Error-free compression is achieved using a biorthogonal 5/3 coefficient scaling and wavelet vector at an expected compression ratio of 2:1 (Le Gall and Tabatabai, 1988). Ordinary lossy compression, if implemented with a 9/7 coefficient scaling-wavelet vector, is able to achieve a compression ratio of up to 200:1 (Antonini et al., 1992). Such a large ratio is achieved because the “mother wave,” which best represents the wavelet signature generated from scanning an image, does not accompany the compressed image data. Instead, the JPEG 2000 decoder is equipped with a universal mother wave; whenever the decoder is supplied with a compressed image, it can detect the mother wave used. Furthermore, lossy compression can be embedded into lossless compression. Since an image can be regarded as composed of different regions of interest (e.g., images embedded into text), different compression schemes can be applied to different regions of interest in the image to help preserve the image quality of those regions. The shape of a region can be square or rectangular, but it can also be a circle, oval, triangle, or bloblike. In JPEG 2000, the original image is optionally divided into multiple, nonoverlapping subimages called tiles. In the case of three components (e.g., a color composite), all components (e.g., the red, green, and blue layers) are divided identically. Alternatively, these components can be linearly combined, either reversibly or irreversibly, to decorrelate them from each other and achieve a higher compression ratio. The dimension of each tile is always divisible by 2. Arbitrary tile sizes
are allowed, up to and including the entire image (i.e., no tiles). Each tile must be of the same size, except for border tiles. Treated as a separate image, each tile is processed independently using the discrete wavelet transform. It can be decomposed to different levels, to which different quantization coefficients are applied, and it can be accessed and referenced independently of the other tiles. Thus, it is possible to uncompress any portion of the compressed image. The JPEG 2000 image compression process is made up of several steps that may include data ordering, arithmetic entropy encoding, coefficient bit modeling, quantization, wavelet transformation of tiles, level shifting and component transformations, and coding of images with regions of interest. Once the entire image has been compressed, a postprocessing operation passes over all the compressed blocks and determines the extent to which each block's embedded bit stream should be truncated to achieve a desired bit rate, distortion bound, or other quality metric. A separate bit stream is generated for each tile. The bit stream is organized as a succession of layers, where each layer contains the additional contributions from each code tile (some contributions may be empty). The final bit stream is composed of a collection of such layers. Each layer has an interpretation in terms of overall image quality, indicating the discrete lengths to which the bit stream has been truncated. To decode the compressed image, the compression procedure is reversed: the tiles are read from the codestream, the entropy encoding and coefficient bit modeling are undone, and the inverse discrete wavelet transform is applied (Miljour). The information contained in the tile and marker headers tells the decoder how to reconstruct the original image. Since images no longer need to be divided into subimages of 8 × 8 pixels, the blocking artifacts of JPEG-compressed images are avoided. In addition to the above compression methods, there is another method called fractal compression. This method demands a huge amount of time to generate fractal formulae from an image; however, the reverse process is very simple and can be achieved relatively fast. Compression and decompression are thus highly asymmetric, and the quality of a fractal-decompressed image is independent of zooming, which makes it suitable for graphics applications. This technique is very suitable for applications where compression is done at a single location and decompression at a large number of places, such as the distribution of remotely sensed data. In a fractal transform, the entire image is represented in terms of parts of itself and encoded; fractal basics can be described as a notion of “futuristic photocopies.” The amount of information lost through the compression–uncompression process depends upon the time spent in deriving the fractal formulae. The more time spent in the encoding process, the
more details of the image are captured in the set of fractal formulae, thus reducing the loss of information. The fractal transform identifies the fractals that make up an image, and hence finds the fractal formulae that can re-create it. This technique is uniquely characterized by the incredibly large compression ratios achievable, on the order of 1000:1.
References
Aldus Corporation. 1988. An Aldus/Microsoft technical memorandum: 8/8/88 (TIFF version 4.0), http://www.dcs.ed.ac.uk/home/mxr/gfx/2d/TIFF-5.txt.
Antonini, M., M. Barlaud, P. Mathieu, and I. Daubechies. 1992. “Image coding using wavelet transform.” IEEE Transactions on Image Processing. 1(2):205–220.
CompuServe. 1987. Graphics Interchange Format: A standard defining a mechanism for the storage and transmission of raster-based graphics information. Columbus, OH, http://www.w3.org/Graphics/GIF/spec-gif87.txt.
Davenport, T., and M. Vellon. 1987. Tag Image File Format (Rev 4.0), http://www.martinreddy.net/gfx/2d/TIFF-4.txt.
eMag Solutions. 2006. DVD (Digital Versatile Disk), http://www.usbyte.com/common/dvd.htm.
Fulton, W. 2008. A few scanning tips, http://www.scantips.com.
Gonzalez, R. C., and R. E. Woods. 2002. Digital Image Processing (2nd ed.). Upper Saddle River, NJ: Prentice-Hall.
Hild, H., and D. Fritsch. 1998. “GeoTIFF—A standard for raster data exchange.” Geo Informations Systeme. 11(2):5–9.
Le Gall, D., and A. Tabatabai. 1988. “Sub-band coding of digital images using symmetric short kernel filters and arithmetic coding techniques.” IEEE International Conference on Acoustics, Speech, and Signal Processing. IEEE, New York, NY, pp. 761–765.
Miljour, S. An introduction to the JPEG 2000 image compression standard, http://www.gvsu.edu/math/wavelets/student_work/Miljour/, last accessed 17 June 2008.
Ritter, N., and M. Ruth. 1997. “The GeoTIFF data interchange standard for raster geographic images.” International Journal of Remote Sensing. 18(7):1637–1647.
Shannon, R. 1997. Image file formats, http://www.yourhtmlsource.com/images/fileformats.html.
Smith, S. W. 2004. Digital Signal Processing: A Practical Guide for Engineers and Scientists. Amsterdam: Elsevier.
Welch, T. A. 1984. “A technique for high-performance data compression.” IEEE Computer. 17(6):8–19.
Ziv, J., and A. Lempel. 1977. “A universal algorithm for sequential data compression.” IEEE Transactions on Information Theory. IT-23(3):337–343.
Chapter 4
Image Processing Systems
Since their advent, digital image analysis systems have undergone revolutionary changes. In the mid-1980s, when digital image processing was still in its infancy, satellite data were analyzed in microcomputer-based image analysis systems comprising a few separate but linked components. One monitor was reserved for user interaction with the system and another for displaying images. Needless to say, these command-driven systems, running in the VAX and later the UNIX environment, were user-unfriendly, cumbersome, and inefficient to operate. Since then digital image processing systems have greatly improved their performance and functionality, owing to advances in computing technology in direct response to the need to process a huge quantity of remote sensing data efficiently. The early generation of primitive image processing systems has evolved into sophisticated icon-driven desktop ones. Running in the Windows XP environment, these advanced systems are capable of performing multiple geospatial tasks, one of which is image analysis. Accompanying these advances are considerably improved and expanded analytical functionality for image processing, as well as greater ease and flexibility of operation. Many peripheral processing functions, such as postclassification filtering and raster-based spatial data analysis and modeling, can be performed in one system. Moreover, results from one system can be exchanged or integrated with those from another system at will. A number of mature image processing systems are available for the explicit purpose of analyzing and visualizing remotely sensed data. This chapter presents a comprehensive overview of the leading image analysis packages currently on the market. The criterion for inclusion in this overview is image analysis functionality. Although the ELT/5500 system has some image manipulation capabilities, such as contrast enhancement and geoprocessing (e.g., image registration and coregistration), it will not be covered because it lacks image classification capability. Since one system is able to perform many geoinformatic functions, some of which may not be related closely to image
processing, this overview concentrates mostly on the image analysis components of these systems. The strengths and limitations of each system in image analysis will be critically evaluated and compared with those of other systems wherever possible.
4.1 IDRISI
IDRISI is a sophisticated desktop raster geographic information and image processing system developed by the Graduate School of Geography at Clark University, Worcester, Massachusetts. The latest release, called IDRISI Andes (version 15), is 32-bit Windows NT–compatible. This affordable system comprises over 250 modules or stand-alone programs for the digital analysis and visualization of spatial data, including remotely sensed imagery, in a single package. These modules range from the basic to the highly advanced in their functionality, and are grouped into database query, spatial modeling, image enhancement, and classification. Those modules related specifically to GIS, such as database query and GIS modeling, will not be covered here. Instead, this section focuses on its image analysis functions.
4.1.1 Image Analysis Functions
The capacity of IDRISI for processing remotely sensed data falls into six areas: image restoration, enhancement, transformation, classification, change detection, and accuracy assessment. In image restoration, images are corrected both geometrically and radiometrically using the procedures in IDRISI. Radiometric correction may be undertaken to eliminate atmospheric effects and to destripe imagery. Images can be geometrically corrected using interactively selected ground control points (GCPs), and such images may be integrated with georeferenced data from other sources. Images may be enhanced via contrast adjustment and PAN sharpening (i.e., merging of the panchromatic band with the multispectral bands from the same sensor), and filtered using edge enhancement. The spectral quality of an image can be enhanced using modules such as noise removal through convolutional filters and Fourier analysis. IDRISI provides all major data preparatory tools, such as image subsetting, mosaicking, and vector generalization. Images may be transformed using an extensive range of procedures that include principal component analysis, canonical component analysis, color space transformations, and vegetation indexing. IDRISI offers an unparalleled suite of classifiers among the leading image analysis systems. Remote sensing data can be classified in either an unsupervised or a supervised manner. The unsupervised method is based on clustering analysis; the supervised classifiers include maximum likelihood, minimum distance to means, and parallelepiped.
Signatures essential in supervised classification may be developed from training samples or from laboratory spectral libraries. In addition, there are two novel classifiers in IDRISI Andes: the Fisher classifier, based on linear discriminant analysis, and the backpropagation neural network classifier. Apart from 13 such hard classifiers, there is an extensive set of soft classifiers, totaling 14, for analyzing multispectral data, such as those based on the Dempster-Shafer evidence theory, fuzzy logic, and the linear mixture model. It is possible to combine different classification procedures (e.g., Bayesian probability calculation with linear spectral unmixing) to form hybrid procedures that produce more reliable classifications. Also released in this version is the largest suite of machine learning/neural network classifiers, such as classification tree analysis, multilayer perceptron, self-organizing feature map, and fuzzy ARTMap. Six special modules are designed to analyze hyperspectral images. IDRISI Andes is also able to carry out change detection based on image differencing, image ratioing, time series Fourier analysis, spatial/temporal correlation, and image profiling over time. Image differencing may be implemented via change vector analysis and regression-based calibration. Derived using the temporal resonance module, the temporal index indicates the degree of correlation between every pair of pixels in multitemporal images. Special change analysis tools are available for assessing change quickly. The most celebrated addition to IDRISI Andes is the land change module for modeling ecological sustainability, developed specifically for the International Center for Biodiversity Conservation in the Andes. It is able to analyze land conversion, predict and model future change via Markov chain analysis or cellular automata, and assess the effect of the change on biodiversity (Hermann, 2006). The modeled results may be validated against categorical map data through a set of comparison tools. The transition in land cover, or the potential for change, can be explored from both static and dynamic explanatory variables using either logistic regression or a multilayer perceptron neural network.
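The idea behind image differencing, the simplest of the change detection routines named above, can be pictured with a short sketch. The NumPy fragment below is a minimal, generic illustration rather than IDRISI code; the simulated arrays and the two-standard-deviation threshold are assumptions made purely for the example.

```python
import numpy as np

def difference_change_map(band_t1, band_t2, k=2.0):
    """Simple image-differencing change detection.

    band_t1, band_t2: co-registered 2D arrays of the same band at two dates.
    Pixels whose difference departs from the mean difference by more than
    k standard deviations are flagged as change (a common thresholding rule).
    """
    diff = band_t2.astype(np.float64) - band_t1.astype(np.float64)
    mu, sigma = diff.mean(), diff.std()
    return np.abs(diff - mu) > k * sigma          # boolean change/no-change mask

# Hypothetical example with random data standing in for two image dates
rng = np.random.default_rng(0)
t1 = rng.integers(0, 255, size=(100, 100)).astype(np.float64)
t2 = t1 + rng.normal(0, 5, size=(100, 100))      # mostly unchanged pixels
t2[40:60, 40:60] += 80                           # a simulated patch of change
change = difference_change_map(t1, t2)
print(change.sum(), "pixels flagged as change")
```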
4.1.2 Display and Output
Raster images may be displayed in black and white or in true (24-bit) color, or as transparent. Color display is possible with any bands designated as the red, green, and blue layers of a RGB composite (Fig. 4.1). Each layer can be assigned a symbol file created with a special symbol/palette development tool. Images may be displayed as three-dimensional (3D) perspectives, contour plots, and analytical hillshading. The fly-through module provides real-time interactive animation over a digital elevation model (DEM). A 3D impression of stereoscopic images may be obtained with the assistance of a pair of anaglyphic glasses. Moreover, images may be displayed on screen as a map composition, into which nonimage layers such as hydrography, roads, and elevation
in the raster or vector format may be added. Annotative information, such as legend, scale bar, north arrow, and text, can all be easily inserted. Data may be classified before they are displayed using enhanced cartographic symbols. It is possible to query a map composition extensively to determine the identities or attributes of features in a layer that can be toggled on or off at will. The created composition can be saved, printed, or transferred to other packages.
FIGURE 4.1 The icon panel of IDRISI Andes. Shown in the picture is a 24-bit composite image. (Copyright: Clark Labs, Clark University.)
4.1.3 File Format
All files in IDRISI consist of two parts, an image file and a document (metadata) file, which are stored separately and distinguishable by their extensions. Raster data files are usually stored in the binary format. However, a wide range of other data types are also supported, such as unsigned 8-bit integer, integer, real, RGB8, and RGB24 (24-bit band-interleaved-by-pixel format in the order of blue, green, and red). The document file contains descriptions of the image file, such as its dimensions, cell size, and ground coordinate system. IDRISI has an excellent range of capabilities for importing, processing, and exporting raster imagery in many formats. In addition to common imagery formats like SPOT (Le Systeme Pour l'Observation de la Terre), Landsat, and Radarsat data, other data formats such as the hierarchical data format (HDF) from the Terra satellite are also supported. Data in all major formats can be imported from other image processing systems, including ESRI (Environmental Systems Research Institute) ArcRaster, ERDAS (Earth Resources Data Analysis System) Imagine, MapInfo vector, ENVI (Environment for Visualizing Images), and ER (Earth Resources) Mapper files, using a collection of tools. Imported images can be rubber-sheet resampled to a user-specified coordinate system, or projected to the desired geodetic datum. IDRISI 15 supplies more than 400 reference system parameter files and instructions on how to create a desired projection. Data can be converted from raster to vector or vice versa.
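This two-part arrangement, a flat binary data file paired with a plain-text document file, is easy to picture with a short sketch. The reader below is a hedged illustration only: the metadata keys and file names are hypothetical stand-ins for the kind of information a document file carries, not the actual IDRISI field names.

```python
import numpy as np

def read_raster(data_path, meta_path):
    """Read a flat binary raster described by a simple key:value metadata file.

    The keys used here (rows, columns, datatype) are hypothetical stand-ins
    for the descriptions an IDRISI-style document file holds.
    """
    meta = {}
    with open(meta_path) as f:
        for line in f:
            key, _, value = line.partition(":")
            meta[key.strip().lower()] = value.strip()

    dtype = {"byte": np.uint8, "integer": np.int16, "real": np.float32}[meta["datatype"]]
    rows, cols = int(meta["rows"]), int(meta["columns"])
    grid = np.fromfile(data_path, dtype=dtype, count=rows * cols)
    return grid.reshape(rows, cols)

# image = read_raster("scene.rst", "scene.rdc")   # hypothetical file names
```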
4.1.4 User Interface
In this system's graphical user interface (GUI), all modules are accessible via a main toolbar under seven categories: file, display, GIS analysis, modeling, image processing, reformat, and data entry. All image processing functions are further organized into more categories in the second tier of the program menu, such as restoration, enhancement, transformation, hard classifier, soft classifier, hyperspectral image analysis, and accuracy assessment. Common modules and functions are directly available through icons immediately below the toolbars. Once a module is activated, a new window is opened for further interaction with the system. In the new release an Explorer bar has been added to provide a heads-up reference that enhances file organization on the fly, a change most likely to affect the workflow of regular users.
Another way of interfacing with individual modules is via the Macro Modeler (Warner and Campagna, 2004). It enables the user to develop and link a sequence of image analysis routines to achieve higher efficiency and to enhance workflow. Complex and repetitive analysis is thus sped up. Only the datasets need to be modified if the same analysis is attempted for a different area. This is very significant because IDRISI requires a number of processing steps to accomplish a single analytical task. Through this modeling environment, a number of processing steps can be fulfilled by clicking one button. With such a graphic and intuitive macro modeler, the user does not need any programming background to build a model. However, not all image processing modules can be accessed via the Macro Modeler.
4.1.5 Documentation and Evaluation
Clark Labs provides a comprehensive online document titled “The Guide to GIS and Image Processing” in the PDF format. Included in this guide of more than 300 pages are both basic and advanced image processing topics. Beginners may find many of the advanced topics intimidating, and may wish to study them only after they have grasped the essence of image analysis and familiarized themselves with IDRISI. Other documents that are delivered with the purchase of an IDRISI license include a user's guide, a quick reference, and a book of tutorial exercises. From the reference book the user can learn what steps to undertake to perform a particular task. Containing descriptions of image analysis functions in IDRISI, the reference book may be useful for novice users who have not had much experience with IDRISI. However, experienced users may find this book redundant. In contrast, the tutorial exercises are recommended for both novice and experienced users to learn what IDRISI can offer and to familiarize themselves with running the system (Hermann, 2006). Totaling more than 40, these exercises range from introductory to advanced, arranged in order of increasing difficulty and complexity. Accompanying the tutorial is a large amount of sample data representing a wide variety of geographic areas and image types. In addition, IDRISI is equipped with an excellent online help document in the HTML format with an index function not available in the PDF file. Under each entry is a short description of the program, the general nature of the input and output, and additional notes on technical information, such as the algorithms used, if applicable. The last section of the help information is about how to run the program from the command line. Overall, IDRISI is an affordable, very user-friendly system that has a comprehensive range of image analysis functions with excellent documentation. It offers the most diverse range of image classifiers and their combined uses. Its open architecture and extensibility through its application programming interface enable developers to
integrate new modules or construct metamodules that control existing IDRISI modules via a scripting language such as Visual Basic, Delphi, or Visual C++. Even the menu system can be completely reconfigured. Over the years this system has undergone drastic expansion and improvement. It has evolved from a system excellent for teaching the principles of digital image analysis to a professional system suitable for practical production of image analysis results at the industry standard. Its wide use by government agencies, schools, research institutions, academia, and the private sector testifies to its popularity (Simonovic, 1997). Originating from a desktop system, IDRISI used to have a limited capacity to handle large image files. Now this limitation has largely disappeared with the improved functionality of personal computers. However, other limitations persist, such as the legacy of the highly modularized structure in which each module performs a narrowly defined step. This design philosophy may be ideal for teaching the concepts of image processing, but is ineffective for practical production because a number of modules are needed to perform a simple analytical task such as image classification. The second critical limitation of IDRISI is its lack of automation. Although the Macro Modeler is a kind of scripting language for speeding up the process, it is not applicable to all functions. Besides, processing speed is slow for certain functions such as on-screen zooming (Huber, 2000). Finally, IDRISI still lacks some data preparation functions; for instance, it is unable to orthorectify images or to georeference data using sensor-specific models. In spite of its comprehensive range of image classifiers, it is not possible to classify images based on pixel spatial properties or using external knowledge.
4.2 ERDAS Imagine
ERDAS is one of the oldest and leading geoinformatic software companies. Its major product, Imagine, contains a suite of comprehensive and sophisticated tools for the digital analysis of remotely sensed data. The latest release, version 9.1, offers something for everyone: 3D feature extraction, terrain modeling, hyperspectral data analysis, and publication of 3D interactive environments, in addition to the photogrammetric tools gained after the company was acquired by Leica. This version significantly improves the integration of remote sensing and photogrammetry with the addition of the Leica Photogrammetry Suite (LPS). Both Imagine and LPS can be configured to work closely with Oracle 10g Spatial and/or ESRI's ArcSDE (Cothren and Barnes, 2006). The revamped ERDAS Imagine is offered at three levels: Essentials, Advantage, and Professional. Imagine Essentials encompasses a set of powerful tools for manipulating geographic and imagery data, such as image georeferencing, visualization, and map output. Imagine Advantage extends the capabilities of Imagine
Essentials through the addition of several more functions, such as mosaicking, surface interpolation, advanced image interpretation, and orthorectification. Imagine Professional contains more classification, spectral analysis, and radar processing utilities than Imagine Essentials. In addition to all those modules included in both Essentials and Advantage, it also encompasses add-on tools for complex image analysis and modeling, radar data analysis, and advanced classification. The graphic spatial data modeling module in Imagine Professional allows the user to create and run models to analyze images efficiently and flexibly. Core to Imagine are several important modules for data display (Viewer), exchange (Import), data preparation, image enhancement (Interpreter), and classification. They are elaborated on below.
4.2.1 Image Display and Output
ERDAS Imagine has a versatile and flexible capability in image display (Fig. 4.2). Single and multiple (at most three) bands are displayed in grayscale or as a color composite. Multispectral bands can be displayed simultaneously in multiple viewers. The displayed image may be panned or zoomed at will through appropriate buttons in the toolbar. Special facilities are provided for locating a particular point on the displayed image via its coordinates, expressed as row/column or easting/northing, depending on the geometric properties of the image. In addition, multiple images can be displayed on top of each other in the same viewer. Special tools (e.g., swipe) are available for viewing a portion of the top image and the bottom image simultaneously. Vector layers may be displayed on top of the displayed raster imagery. The displayed image can be queried, as well. Processed results may be output using the Composer module that allows an image to be added to an empty map. Other cartographic elements such as legend, scale bar, and north arrow can all be added to it. The composed map may be saved in a number of formats. However, it is very difficult to produce a perfect map composition using ERDAS Composer because of its limited range of fonts.
4.2.2 Data Preparation
More than 130 image formats, even non-remote sensing DEM data, are recognizable by ERDAS Imagine. Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER), Indian Remote Sensing (IRS), Thematic Mapper (TM), and Advanced Very High Resolution Radiometer (AVHRR) data can be read directly into the system. In addition, images saved in all major graphic formats can be imported from and exported to other systems. Furthermore, ERDAS Imagine also supports some vector file formats such as ESRI shape files, and vector coverages. With ERDAS data preparation functions, it is possible to project an image to a specified new coordinate system, or geometrically
rectify it using polynomial, rubber sheeting, or a predefined model for IKONOS and SPOT imagery. Such rectified images may be mosaicked. In generating a controlled mosaic from overlapping aerial photographs, options (e.g., minimum, maximum, average) are available to specify how the output image should look radiometrically over the overlapped portion. It is also possible to import/export cut lines, to smooth images along cut lines, and to balance image color using dodging in Imagine.
FIGURE 4.2 The icon panel of ERDAS Imagine (version 9.1) with the classification icon clicked. Each module linked with a drop-down menu contains a varying number of further options. (Copyright: ERDAS.)
4.2.3 Image Enhancement
All kinds of image enhancement, be it radiometric, spatial, or spectral, are performed under Imagine Interpreter. Radiometric correction refers to the removal of atmospheric effects (e.g., haze) from the input image so that its pixel values correspond closely to the reflectance of targets on the Earth's surface. Radiometric enhancement includes contrast enhancement, haze and noise removal, inversion of brightness, and destriping (for TM imagery only). Contrast of an image may be enhanced through a look-up table and histogram-based manipulation (e.g., histogram equalization and histogram matching). Image enhancement in the spatial domain includes texture analysis, focal analysis, and image convolution. Spectral enhancements may be carried out via principal component analysis, Fourier analysis, image transformation from hue-intensity-saturation to red-green-blue or vice versa, and image indexing. With version 9.1, it is possible to pan-sharpen low resolution multispectral bands with a finer resolution panchromatic band. The quality of the sharpened image may be improved via a two-pass filtering option. Specific tools (e.g., band normalization, spectrum averaging, profiling, spectral library, and so on) have been developed to handle hyperspectral data. Also found in the Interpreter module are two analytical functions for both remote sensing and non-remote sensing data: GIS analysis and topographic analysis. Some GIS functions (e.g., clumping and sieving) are essential in performing postclassification processing such as spatial filtering and thematic generalization.
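As a concrete illustration of one of these histogram-based manipulations, the sketch below equalizes an 8-bit band through its cumulative histogram. It is a minimal, generic NumPy implementation for illustration only, not the routine used by Imagine; the input is assumed to be a non-constant 2D uint8 band.

```python
import numpy as np

def equalize(band):
    """Histogram equalization of an 8-bit band via its cumulative histogram."""
    hist = np.bincount(band.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[np.nonzero(cdf)][0]                 # first occupied bin
    lut = (cdf - cdf_min) / float(cdf[-1] - cdf_min) * 255.0
    lut = np.clip(np.round(lut), 0, 255).astype(np.uint8)   # look-up table
    return lut[band]                                  # remap every pixel

# equalized = equalize(raw_band)   # raw_band: a 2D uint8 array (hypothetical)
```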
4.2.4 Image Classification
ERDAS Imagine supports both unsupervised and supervised classification. The unsupervised classification algorithm is called the Iterative Self-Organizing Data Analysis Technique (ISODATA). Images may be classified using one of four supervised methods: parallelepiped, minimum distance, maximum likelihood, and Mahalanobis distance. Special tools are available for the selection of training samples and for analyzing their separability. Unique to ERDAS Imagine is its Knowledge Classifier, which allows multiple decision rules to be combined logically to deduce the likely identity of a pixel in question. These rules are contained in a knowledge base that is created via the Knowledge Engine. All classified results may be assessed for their accuracy using the
accuracy assessment routine. This tool enables evaluation pixels to be selected either randomly or via stratified random sampling. Once the true identity of all evaluation pixels is specified, indices such as overall accuracy and Kappa (see Sec. 12.4.4) are then generated automatically. The Subpixel Classifier module in Imagine is designed to classify images at the subpixel level. With this module it is possible to identify ground features that occupy a fraction of a pixel, or to discriminate materials with similar spectral characteristics. However, this add-on module is not usually available in the standard package of Imagine.
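Both indices mentioned above are simple to derive from an error (confusion) matrix. The sketch below is a generic NumPy computation shown for illustration with a small hypothetical three-class matrix, not output from Imagine's own routine.

```python
import numpy as np

def accuracy_indices(confusion):
    """Overall accuracy and Kappa coefficient from a confusion matrix.

    confusion[i, j] = number of evaluation pixels of reference class j
    that the classifier labeled as class i.
    """
    confusion = np.asarray(confusion, dtype=np.float64)
    n = confusion.sum()
    observed = np.trace(confusion) / n                               # overall accuracy
    expected = (confusion.sum(axis=0) * confusion.sum(axis=1)).sum() / n**2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa

# A small hypothetical three-class error matrix
cm = [[50,  3,  2],
      [ 4, 45,  6],
      [ 1,  7, 40]]
print(accuracy_indices(cm))   # overall accuracy ≈ 0.85, Kappa ≈ 0.78
```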
4.2.5 Spatial Modeler
The flexibility and efficiency of undertaking image analysis in ERDAS Imagine are considerably improved with the use of Spatial Modeler. In this environment, repetitive and complex image analysis operations are chained together sequentially to form a purpose-built model. This complex image analysis model may be constructed from a number of fundamental built-in functions in a pick-and-mix manner using an internal modeling language, without the user having to possess any programming knowledge. After a model is constructed, the analyst needs only to modify the names of the input and output files in the first and last building blocks, while the system sets the file names in all other blocks of the model automatically. Apart from high efficiency, another advantage of using Spatial Modeler is that it removes the need to save a huge amount of intermediate results.
4.2.6 Radar
Contained in the Radar toolbox is a suite of functions designed specifically to process radar imagery. Some of the key functions in this module are Interpreter, IFSAR, StereoSAR, and OrthoRadar, the last three being add-on modules. The Radar Interpreter module is able to perform many analyses on Synthetic Aperture Radar (SAR) images, such as speckle removal, texture analysis, image merging, slant range adjustment, and radiometric and geometric calibration. A correlating pair of SAR images can be used to generate DEMs based on the principles of radar interferometry in IFSAR; DEMs can also be created from stereoscopic Radarsat imagery in StereoSAR (ERDAS, 2003). In generating a DEM, the system is able to select the pair of images that have the best geometric configuration. OrthoRadar is a module designed for accurate correction of terrain-caused geometric distortions in SAR imagery. The output orthoimages are produced by modeling the SAR sensor from flight path parameters.
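Speckle removal of the kind performed by the Radar Interpreter is usually based on adaptive filters. The sketch below is one common formulation of the Lee filter, written with NumPy and SciPy purely for illustration; the window size and noise-variance estimate are assumptions, and this is not necessarily the algorithm implemented in ERDAS.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def lee_filter(img, size=7, noise_var=None):
    """One common formulation of the Lee adaptive speckle filter.

    Smooths homogeneous areas toward the local mean while preserving
    edges, where the local variance is high.
    """
    img = img.astype(np.float64)
    local_mean = uniform_filter(img, size)
    local_sq_mean = uniform_filter(img * img, size)
    local_var = local_sq_mean - local_mean ** 2
    if noise_var is None:                       # crude estimate if not supplied
        noise_var = np.mean(local_var)
    weight = local_var / (local_var + noise_var)
    return local_mean + weight * (img - local_mean)

# filtered = lee_filter(sar_band, size=5)   # sar_band: 2D SAR intensity array (assumed)
```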
4.2.7 Other Toolboxes
In addition to the aforementioned modules, ERDAS Imagine contains several more toolboxes (e.g., 3D, Vector, Stereo, VirtualGIS, AutoSync, and DeltaQue) for performing specialized analytical tasks. These
add-on modules expand the functionality of ERDAS to meet special image processing needs, such as radiometric correction, use of vector data, 3D analysis, and visualization. The Vector toolbox is designed chiefly to process vector data, which have found increasing use in digital image analysis. Vector data may be subset, mosaicked, copied, and transformed just as with raster data. They can also be cleaned after their topology has been built within an ESRI Arc structured format. Tools are available for creating, editing, and viewing vector data, and for reprojecting them. With the Enterprise Editor, users can manipulate vectors and topology in Oracle 10g. Vector layers may be imported to ERDAS Imagine and exported to ArcGIS. It is also possible to convert raster data to the vector format or vice versa in this module. Thanks to this module, image analysis and vector GIS are more closely integrated than ever before. Stereo Analyst is a toolbox designed specifically to manipulate 3D models, and to collect 3D features and attributes easily and accurately. Two-dimensional vector layers may be superimposed onto a digital stereo model so that they can be edited and reshaped to their genuine positions in reality. The 3D Measure Tool aids airphoto interpretation and quantitative analysis of geographic information. It enables direct delineation and measurement of the area boundary of a parcel of land. The 3D Position Tool facilitates determination of the accuracy associated with existing GIS vector datasets and digital stereo models. VirtualGIS is a powerful 3D analysis environment in which topographic data are dynamically visualized as fly-throughs along user-defined flight paths in an interactive, real-time fashion. Multiple types of geographic data can also be assimilated in VirtualGIS, for instance, blending and fading of two images. It is also possible to manipulate 3D models interactively, carry out intervisibility analysis, and create triangulated irregular network (TIN) meshes in this module. AutoSync is an add-on module for locating and measuring control points for georeferencing multiple images and for image triangulation. The automated point matching (APM) algorithm in LPS is used to find a large number of tie points among two or more overlapping images to georeference them quickly and robustly. AutoSync makes use of a wizard to provide an integrated and intuitive interface to the APM algorithm. A large number of points that tie a raw image to a similar, georeferenced image are generated automatically with one of four models: affine, polynomial, piecewise linear, and nonlinear transformations (Cothren and Barnes, 2006). Other models such as Rational Polynomial Coefficient and Direct Linear Transform are provided to Imagine Advantage license holders. They are useful for orthorectifying satellite imagery and aerial photographs. DeltaQue is a module designed specifically for identifying changes between raw, coregistered multitemporal images. A series of algorithms, procedures, and processing steps essential in change
detection are presented in a user-friendly manner. Change may be detected in two modes: a broad area search and site monitoring. In the first mode, change pixels are highlighted across the entire scene. In the second mode, change in user-specified areas is visualized and explored from the differencing image after significant changes have been distinguished from insignificant ones using spatial filtering or thresholding. The user is able to identify and focus on changes that are of special interest to a particular application. ERDAS Imagine also contains a Developers' Toolkit that allows the user to modify the commercial version of Imagine, or to develop entirely new applications as extensions, using a set of libraries and documentation. Several packages in the Toolkit support a number of routines, such as abstract object manipulation, file I/O and system access, GUI access, and 2D and 3D visualization. Recently, ERDAS has acquired ER Mapper, so it offers more modules than can be described here.
4.2.8 Documentation and Evaluation
ERDAS Imagine provides a wide variety of means for user support, including on-screen help, the Field Guide, and tutorials. The help button is ubiquitously displayed in every menu and in every operation window. A detailed explanation of the functions of each module in the menu pops up upon clicking the help button. If this button is located in an operation window, the displayed message explains how to interact with the system. Such information in the hyperlinked text format is very helpful for new users of the system. More information on a specific topic can be found by clicking the highlighted phrases in the message. Through the HTML help message, it may be possible to navigate to the online Imagine interface manual and the Field Guide. The manual illustrates how to interact with a tool window once it is activated (e.g., how to enter the appropriate parameters in the boxes in a function window). The ERDAS Field Guide presents more theoretical background on special processing methods or algorithms used in Imagine. Formerly written in HTML, the guide allowed searching for a special topic thanks to the use of indexing. Sadly, it is almost impossible to search through the document for a specific topic anymore, owing to the PDF format now adopted. Additionally, ERDAS provides the user with a 790-page Tour Guide organized in hierarchical form. Contained in this manual are step-by-step instructions on how to operate the system to perform a specific image analysis task, supplemented with screen-captured illustrations. All major modules are covered in this manual. The strength of ERDAS Imagine lies in its comprehensiveness and its ability to integrate digital image analysis with photogrammetry and GIS. Its collection of toolboxes, the largest in the industry, is able to perform nearly all kinds of image analyses, ranging from
simple image display and rectification to advanced knowledge-based image classification. Even processing of large image files does not degrade its performance. Over the years ERDAS Imagine has evolved from UNIX-based and command-driven to Windows-based and icon-driven. Its GUI makes this desktop package user-friendly. The ease of operation is further enhanced through the ubiquitous help button. Workflow and efficiency are enhanced through the modeling environment. ERDAS Imagine has a number of limitations, the most critical being the limited range of image classifiers it offers. It does not have a strong cohort of nonconventional classifiers, such as decision tree and fuzzy classifiers. Neither does it support neural network or machine learning classification, even though tools for analyzing hyperspectral data have been added to the latest release. It is also inadequate in spatial image classification based on texture. The only texture measures are variance and skewness; other more realistic texture measures, such as those derived from the gray level co-occurrence matrix, are not supported. Another area for improvement is more detailed documentation on the latest modules. For instance, it is difficult to learn how to use the Expert Classifier due to the lack of examples. Navigation through the Field Guide is also difficult. Finally, with many add-on modules, ERDAS Imagine could be quite expensive if all of them are acquired.
4.3 ENVI
Produced by Research Systems (now called ITT Visual Information Solutions), ENVI is a comprehensive icon-driven image analysis package designed especially for processing large multispectral and hyperspectral remote sensing datasets (ITT, 2007). The most recent release, version 4.4, has considerably expanded and refined its image analysis functionality, which includes spectral analysis, geometric correction, terrain analysis, and radar analysis, in addition to strong GIS capability. In this new version, it is much easier than before to integrate GIS and global positioning system (GPS) data, and even survey information, into image analysis. Some of the new modules in this release include feature extraction, DEM generation, and radiometric correction. At the same time, ENVI still maintains an easier and faster workflow than its predecessor. All the functions available in ENVI 4.4 are organized into 10 toolboxes. Some of the important ones are classification, transformation, spectral, map, vector, topographic, and radar. ENVI has many unique features that are presented in four categories below.
4.3.1 Data Preparation and Display
ENVI has flexible image display and browsing capabilities. The same raster image may be displayed at various zooming levels in multiple viewing windows, on top of which vector GIS layers and annotation
information may be overlaid. Comprehensive vector overlays with GIS attributes can be created, and map and pixel grids added to images. Vector data may be further queried, modified, and analyzed in ENVI. The processed results are presented in map form using map composition utilities. These results can be saved in several popular proprietary image formats (e.g., Imagine, PCI, and ER Mapper) or in generic GeoTIFF. They may also be converted into the vector format. Other forms of image display include 3D perspective viewing, surface shading, image draping, and animation (i.e., movies). ENVI can recognize a wide variety of satellite data, such as AVHRR, Landsat Multispectral Scanner (MSS) and TM data, multispectral and hyperspectral OrbView-3 and EnviSat images, and even radar data (e.g., from the Shuttle Radar Topography Mission). They may be subset, layer stacked, and segmented. Satellite data may be registered to a map or coregistered with another image based on ground control in the Map module. They can also be georeferenced to a user-specified coordinate system using a generic model or sensor-specific models for SPOT, Sea-viewing Wide Field-of-view Sensor (SeaWiFS), AVHRR, and Moderate Resolution Imaging Spectroradiometer (MODIS) data. This module also allows mosaicking of georeferenced images. Basic orthorectification may be carried out for SPOT, IKONOS, and QuickBird images in addition to aerial photographs. ENVI can import airborne light detection and ranging (LiDAR) data, and interoperate with GIS, GPS, and other spatial data obtainable through the Open Geospatial Consortium standards (Thurston, 2008). In addition, ENVI is capable of handling vector data; for example, it can import vector data from virtually any source and export them to ESRI shape files. Thus, it should not be viewed merely as a system with strong image analysis capabilities, but also as an integrated geospatial data management system.
4.3.2 Image Enhancement
The contrast of an image may be stretched linearly, with or without truncation, in ENVI. It is also possible to stretch the contrast using Gaussian or histogram equalization. An image must be displayed before it can be contrast enhanced because not all image enhancement functions are available as icons in the toolbar; some are accessible only via the image display. The histograms of two displayed grayscale or color images can be made similar to each other through histogram matching. The same image may be filtered (e.g., smoothed and median filtered) or sharpened. Grayscale images can be color coded through standard color tables (e.g., density slicing). ENVI also contains an extensive range of spatial enhancement and general-purpose transform functions, including principal components transform, band ratioing, hue-intensity-saturation transformation, decorrelation stretching, and vegetation indexing. Images can be spatially filtered using convolution kernels for low pass, high pass, median, directional filtering, and edge detection methods
(ITT, 2007). Other special filters such as Sobel, Roberts, dilation, and erosion are also available, along with adaptive filters such as Lee, Frost, Gamma, and Kuan. Image texture is described using data range, mean, variance, entropy, skewness, homogeneity, contrast, dissimilarity, and correlation. Filtering in the frequency domain (e.g., fast Fourier transformation or FFT) or inverse FFT can be easily carried out in ENVI, as well. Although ENVI does not have any generic radiometric correction functions, special tools are available for radiometric processing of Landsat data, such as destriping, atmospheric correction, and calibration to reflectance using prelaunch parameters. A specific set of tools is designed for displaying ephemeris data, radiometric calibration and geometric rectification, and calculation of sea surface temperature from AVHRR data. ENVI also encompasses tools for calibrating thermal infrared data to emissivity.
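The convolution filtering described above is easy to sketch. The fragment below applies a 3 × 3 low-pass kernel and a horizontal Sobel edge kernel with SciPy; it is a generic illustration using an assumed 2D band array, not ENVI code.

```python
import numpy as np
from scipy.ndimage import convolve

# 3 x 3 low-pass (mean) kernel and a horizontal Sobel edge-detection kernel
low_pass = np.full((3, 3), 1.0 / 9.0)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)

def filter_band(band, kernel):
    """Convolve a single band with a kernel, as in a generic filtering tool."""
    return convolve(band.astype(np.float64), kernel, mode="nearest")

# smoothed = filter_band(band, low_pass)   # band: 2D array (hypothetical)
# edges_x  = filter_band(band, sobel_x)
```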
4.3.3 Image Classification and Feature Extraction
There is a comprehensive set of image classifiers in the Classification toolbox of ENVI. Remotely sensed data may be classified unsupervised (K-means and ISODATA) or supervised (parallelepiped, minimum distance, maximum likelihood, and Mahalanobis distance) (Fig. 4.3). Moreover, ENVI offers three more nonconventional classifiers: binary encoding, neural network, and the Spectral Angle Mapper (SAM). In the binary decision tree classification, pixels are grouped into classes via a series of binary decisions in multiple stages. The neural network classifier makes use of standard backpropagation for supervised learning and a layered feed-forward network for classification. Images may be classified at the subpixel level using the Subpixel module. In the SAM classifier, the spectra of input pixels are compared to those of reference pixels, and their similarity is measured by the angle between them. All classified results may be spatially filtered during postclassification processing. Indices of classification accuracy, such as the confusion matrix and Kappa coefficient, can all be generated in ENVI. In addition to these classifiers, ENVI also contains image processing functions, including anomaly detection, feature extraction, pan sharpening, and vegetation suppression. Of particular note is the feature extraction tool, an object-oriented add-on module that is designed to quickly, easily, and accurately extract features from high resolution imagery. Its wizard makes use of both the spectral and spatial properties of pixels. Features are identified based on their spectral and physical characteristics such as structure and shape. With the use of such characteristic attributes, these features can be expected to be classified more accurately. Moreover, features as small as buildings and vehicles can be extracted from hyperspatial imagery using this tool. Libraries containing the features of interest can be built over time, making this tool quite valuable in automating the workflow.
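The angle measure underlying SAM can be written down in a few lines. The sketch below is a generic NumPy version of the idea (pixel spectra as rows, one reference spectrum per class); the 0.1-radian rejection threshold is an arbitrary illustrative choice, and this is not ENVI's implementation.

```python
import numpy as np

def spectral_angle(pixels, reference):
    """Spectral angle (radians) between pixel spectra and a reference spectrum.

    pixels: array of shape (n_pixels, n_bands); reference: shape (n_bands,).
    Smaller angles indicate greater spectral similarity, as in SAM classification.
    """
    pixels = np.asarray(pixels, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    dot = pixels @ reference
    norms = np.linalg.norm(pixels, axis=1) * np.linalg.norm(reference)
    return np.arccos(np.clip(dot / norms, -1.0, 1.0))

def sam_classify(pixels, references, max_angle=0.1):
    """Assign each pixel to the reference class with the smallest angle."""
    angles = np.stack([spectral_angle(pixels, r) for r in references], axis=1)
    labels = angles.argmin(axis=1)
    labels[angles.min(axis=1) > max_angle] = -1       # leave dissimilar pixels unclassified
    return labels
```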
FIGURE 4.3 The menu panel of ENVI (version 4.4) with all the options in supervised classification displayed. (Copyright: ITT Visual Information Solutions.)
Another way of achieving a high level of automation is through the SPEAR toolbox. Its tools are intended for automatic spatial and temporal change detection, pan-sharpening of images, and terrain categorization. Nevertheless, it must be pointed out that this set of tools is designed for defense and intelligence image analysts to perform both common and advanced image processing routines. They are not developed with general users in mind. Nor are they available for all kinds of analyses. The DEM generation tools in ENVI 4.4 are tailored for extracting elevational information from a pair of stereo aerial photographs or satellite images. Additional information on this pair, such as the rational polynomial coefficients for frame photographs and pushbroom sensor imagery, must be supplied to construct a 3D model. Such information is available for ASTER, Cartosat-1, IKONOS, OrbView-3, and QuickBird data. The accuracy of the extracted DEMs depends on the quality of ground control.
4.3.4 Processing of Hyperspectral and Radar Imagery
An extensive suite of functions designed specifically for processing hyperspectral data is found in the Spectral toolbox. Some of these functions are the Pixel Purity Index, the n-Dimensional Visualizer, and the Spectral Analyst. The Pixel Purity Index enables the identification of the spectrally purest pixels in an image, which serve as the spectral endmembers in subpixel image classification. The n-Dimensional Visualizer allows interactive animation of the n-dimensional scatterplot, through which the best endmember materials and their corresponding spectra can be selected. The Linear Spectral Unmixing function serves to determine the relative spectral abundances of endmembers depicted in multispectral and hyperspectral data. Spectral libraries may be built or viewed through ENVI routines, and compared to image spectra. This comparison with reference spectra is carried out at selected wavelengths, based on the least-squares principle. A root-mean-square error is produced for each reference spectrum. A wide range of radar imagery, such as EnviSat, ERS, JERS, Radarsat, and Topsar, can be processed in ENVI with a full range of generic or radar-specific methods. Some exemplary routines are antenna pattern correction, slant-to-ground range correction, and generation of incidence angle images. SAR-specific analysis functions include review and reading of header information from CEOS-format data. Other radar image analysis functions include adaptive and texture filters, creation of synthetic color images, and a broad range of polarimetric data analysis methods. ENVI 4.4 is unable to extract 3D information from stereoscopic radar imagery, though.
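The least-squares idea behind linear spectral unmixing can be illustrated briefly. The sketch below solves the unconstrained problem with NumPy and reports the root-mean-square residual; the four-band endmember matrix is invented for the example, and the sum-to-one and non-negativity constraints offered by operational tools are omitted.

```python
import numpy as np

def unmix(pixel_spectrum, endmembers):
    """Unconstrained linear spectral unmixing by least squares.

    endmembers: array of shape (n_bands, n_endmembers), one spectrum per
    column. Returns the estimated abundance of each endmember together
    with the root-mean-square residual of the fit.
    """
    abundances, _, _, _ = np.linalg.lstsq(endmembers, pixel_spectrum, rcond=None)
    residual = pixel_spectrum - endmembers @ abundances
    return abundances, np.sqrt(np.mean(residual ** 2))

# Hypothetical four-band example with two endmembers (vegetation, soil)
E = np.array([[0.05, 0.20],
              [0.08, 0.25],
              [0.45, 0.30],
              [0.50, 0.35]])
pixel = 0.6 * E[:, 0] + 0.4 * E[:, 1]            # a perfectly mixed pixel
print(unmix(pixel, E))                           # abundances ≈ [0.6, 0.4], rmse ≈ 0
```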
4.3.5 Documentation and Evaluation
ENVI provides a comprehensive online document. This hyperlinked text can be viewed sequentially or searched through an index. Keywords
in the searched text are highlighted. More information on these topics is displayed upon a click. ENVI 4.4 has drastically expanded its image processing capabilities and successfully overcome the limitations of previous releases with the addition of a few modules. Its simplified user interface is easy to operate with a high degree of automation, thanks to its logical layout (Thurston, 2008). The newly added tools allow common image processing tasks, such as working with vector layers, pan-sharpening images, and change detection, to be performed quickly and efficiently. In particular, the SPEAR tools facilitate the development of quick applications, and support existing libraries for feature extraction. ENVI 4.4 offers additional spectral processing and analysis functions—the core capabilities that make ENVI the choice of geospatial professionals around the world. As an important image analysis system, ENVI offers a wide range of traditional and nontraditional image classifiers. For instance, it is one of the earliest systems with the capability of image classification based on machine learning. At present ENVI is able to classify images using neural network and binary encoding, but not knowledge-based methods. Nor does it have a strong capability of integrating image analysis with GIS, even though this weakness has been minimized in the recent release of ENVI. Written in the powerful structured Interactive Data Language (IDL), which is required to run it, ENVI is highly flexible and dynamic. However, the operation of ENVI can be further improved via better organization of some functions, such as image display, which is spread across several places. Graphic icons may be added to accompany the functions in the toolbar for quick access.
4.4 ER Mapper
Founded by Stuart Nixon, ER Mapper is an Australian software company specializing in digital image analysis. Its major product of the same name is a powerful Windows-based image analysis package that offers a complete suite of image processing tools. Its latest release, version 7.0, is one of the industry's leaders in image analysis. ER Mapper is offered to the public at two levels: desktop and enterprise. The former contains such modules as Professional, Image Viewer, and Compressor, the first being the flagship product. The latter includes Image Web Server. These important modules are introduced below.
4.4.1 User Interface, Data Input/Output and Preparation
Similar to all other systems, ER Mapper has an icon-driven user interface. Once the system is activated, a panel of toolbars is displayed at the top of the screen (Fig. 4.4). Contained in the panel are modules for image display (View), editing (Edit), and processing (Process), apart from a few utility buttons such as file, toolbar, utilities, and help.
FIGURE 4.4 The icon panel of ER Mapper. Displayed in this screen shot are windows for image analysis algorithms and settings. (Copyright: ER Mapper.)
Underneath the panel are many icons that provide a shortcut to some functions directly, without going through the menu. ER Mapper supports an extensive range of image formats in which popular satellite data are saved, such as Landsat MSS, SPOT, and AVHRR. Major proprietary image formats such as ERDAS Imagine, vector data (i.e., ArcInfo coverage and MicroStation DGN), and non-remote sensing DEM data can all be imported into ER Mapper or exported to other systems (ER Mapper, 2007). Images can be directly read and saved in several common formats, such as Joint Photographic Experts Group (JPEG), JPEG 2000, and GeoTIFF. Processed results can be exported to these formats, as well. Furthermore, image formats that are not directly readable can be imported through generic import functions. In addition to remote sensing data, ER Mapper also has tools for analyzing DEM data, such as conversion of point and line data into a raster DEM from which topographic parameters (e.g., aspect, gradient, shaded relief, and 3D shading) may be produced. Specialized functions are also available for processing radar data, from which speckle noise may be removed and texture derived. Output of processed results is via the map composition and printing tools. All final results may be further embellished using the map composition functions. Both simple and complex maps can be composed using a library of predefined postscript map objects, such as legends, coordinate grids, scale bars, north arrows, color bars, and symbols. As a major data preparation step, image georeferencing may be based on one of seven models, including polynomial, triangulation, and map-to-map reprojection, using GCPs derived from other images or maps. Images can be georeferenced to a known ground coordinate system using the geocoding tool. Large-scale aerial photographs can be orthorectified using GCPs collected from georeferenced images or using the exterior orientation parameters of the camera. Georeferenced images may be reprojected from one datum to another or resampled to a different spatial resolution using the nearest neighbor, bilinear, or cubic convolution method. Up to 100 georeferenced images may be stitched together to form a seamless mosaic after their color or tone has been calibrated using the powerful wizards, or their contrast matched through histogram matching. The mosaicked image may be subset with a vector polygon to any political boundary or the boundary of the study area. Images from different sensors may be fused to create the best view of the area under study.
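The first-order polynomial case of the GCP-based georeferencing described above reduces to a small least-squares problem. The sketch below fits an affine transform from matched image and map coordinates with NumPy; it is a generic illustration, not ER Mapper code, and assumes at least three well-distributed, non-collinear GCPs.

```python
import numpy as np

def fit_affine(image_xy, map_xy):
    """Fit a first-order (affine) polynomial from image to map coordinates.

    image_xy, map_xy: arrays of shape (n_gcps, 2) holding matched ground
    control points; at least three non-collinear GCPs are required.
    """
    image_xy = np.asarray(image_xy, dtype=np.float64)
    map_xy = np.asarray(map_xy, dtype=np.float64)
    design = np.column_stack([image_xy, np.ones(len(image_xy))])   # [x, y, 1]
    coeffs, _, _, _ = np.linalg.lstsq(design, map_xy, rcond=None)
    return coeffs                        # shape (3, 2): maps [x, y, 1] -> [E, N]

def to_map(coeffs, image_xy):
    """Transform image coordinates to map coordinates with fitted coefficients."""
    design = np.column_stack([image_xy, np.ones(len(image_xy))])
    return design @ coeffs
```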
4.4.2 Image Display
Image display is via ER Viewer, a free and easy-to-use tool that allows interactive roaming and zooming of very large images. ER Mapper has a powerful toolset for many kinds of display. One image may be displayed with different options. For instance, a raster image may be displayed as transparent, or over a shaded DEM. The impact of solar
elevation on the display is shown immediately and interactively by moving the shading angle or enhancement saturation button. It is possible to display multiple layers of vector data from different sources and in different formats on top of raster imagery. The displayed image may be manipulated using the transform editor. Processed results can be viewed instantly using the display and mosaic wizard, which decides the best setting for the display after it detects the type of the data. An image displayed in a 2D view may be changed to a 3D perspective or 3D fly-through view if a DEM of the area is available. The viewing angle and zoom factor in 3D perspective can be interactively modified.
4.4.3 Image Enhancement and Classification
In ER Mapper the contrast of images may be adjusted linearly or piecewise linearly, or using Gaussian and histogram equalization. Images can also be manipulated using a full range of transforms, such as principal component analysis, Tasseled Cap transformation, RGB to HIS transformation, and fast Fourier transformation. Multiple bands may be ratioed. Images may also be spatially enhanced using convolution filtering for high pass, low pass, edge enhancement, adaptive median, morphological, and majority filtering. ER Mapper is able to perform both unsupervised and supervised classifications. The analyst has full control over the unsupervised classification by specifying certain parameters, such as the maximum standard deviation within a class, the separation value when splitting classes, and the minimum distance between class means. All common supervised classifiers, such as parallelepiped, minimum distance, maximum likelihood, and Mahalanobis distance, are supported in ER Mapper. Unlike its counterpart in other systems, the maximum likelihood classifier in ER Mapper incorporates contextual information gleaned from neighboring pixels into its decision making. Knowledge of the identity of the surrounding pixels is used to help smooth the classification. The classified results are evaluated against reference data or other classified images for their accuracy. The produced confusion matrix shows the producer's and user's accuracy, as well as the overall accuracy. The classified raster imagery may be converted automatically into a polygon coverage using the powerful vectorization tool.
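Of the supervised classifiers named here, minimum distance to means is the simplest to sketch. The NumPy fragment below assigns each pixel to the nearest class mean in spectral space; the array names are assumptions, and no contextual smoothing of the kind described for ER Mapper's maximum likelihood classifier is attempted.

```python
import numpy as np

def minimum_distance_classify(pixels, class_means):
    """Minimum-distance-to-means classification of multispectral pixels.

    pixels: (n_pixels, n_bands); class_means: (n_classes, n_bands) mean
    spectra derived from training samples. Each pixel is assigned to the
    class whose mean is closest in Euclidean spectral space.
    """
    pixels = np.asarray(pixels, dtype=np.float64)
    class_means = np.asarray(class_means, dtype=np.float64)
    # distance from every pixel to every class mean, shape (n_pixels, n_classes)
    d = np.linalg.norm(pixels[:, None, :] - class_means[None, :, :], axis=2)
    return d.argmin(axis=1)

# labels = minimum_distance_classify(image.reshape(-1, n_bands), means)  # hypothetical inputs
```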
4.4.4 Image Web Server
Perhaps the most distinct module of ER Mapper Enterprise is its Image Web Server. With the increasing popularity of the Internet, images are delivered to remote users electronically. How to share images among different researchers working on a larger project via the Internet and intranets is an issue that caught the attention of ER Mapper early on. Its response to this demand is the Image Web Server, designed for fast delivery of large raster image files over the Internet and intranets. One version of the server bundled with ER Mapper 7.0
is able to handle images as large as 500 MB. This limit disappears in the enterprise version. The web server requires an image compression plug-in installed on the client's machine before it can function properly. Prior to image delivery via the Internet, the server first compresses the image in one of two formats: ECW (enhanced compressed wavelet) or JPEG 2000. Large image data can be compressed or decompressed using the compression wizard. A compression ratio of 10:1 to 15:1 is achievable for grayscale imagery. The quality of color imagery can still be very high after being compressed at a ratio of 25:1 to 50:1. Once accessed, images can be browsed via web pages. Certain applications, such as ArcGIS, may require the installation of special software. In addition, a number of plug-ins freely available from the ER Mapper Web site allow users to interact with images on the Internet server in Microsoft Windows applications, ArcView, MapInfo, and so on (Qiu and Thrall, 2006). Unlike other similar systems, ER Mapper's web server does not require any processing on the client side, such as loading up image pyramids. Images may be panned and zoomed in and out freely, but scrolling with the mouse is not supported at present. Besides, errors result if the client has a display larger than 1600 by 200 pixels. This server allows seamless integration of various types of 2D and 3D raster images with such GIS web applications as ArcIMS, Tiger, and Map Server. Thus, the ER Mapper Image Web Server is fully compatible with all industry standard image-serving protocols. To be launched in the Image Web Server is the RightWebMap function, which is able to integrate multiple GIS and image services into a single application view.
4.4.5 Evaluation
ER Mapper has the friendliest user interface, thanks to the widespread use of interactive wizards that automate common and complex tasks, such as image mosaicking, image enhancement and compression, and batch conversion. These wizards also simplify complex image processing tasks. Consequently, even novice users can take full advantage of the functions offered by the system. ER Mapper possesses a comprehensive range of image processing functions. Its image display functions are especially powerful among major image processing packages. This system allows dynamic integration of data of various types and spatial resolutions without the need to unify cell size (Civco, 1996). Similar to ERDAS, this truly open and user-extensible system enjoys an unrivalled advantage in image processing speed. It is rather competent at handling large image files, and is able to mosaic a large number of images quickly. It used to have a limited capacity for integration with vector data. This drawback has mostly disappeared in the new release. Perhaps the most distinct strength of ER Mapper is its Image Web Server, which enables images to be delivered, shared, and increasingly manipulated over the Internet and
intranets quickly, an advantage unmatched by other image analysis systems. However, ER Mapper has a limited range of image manipulation functions. For instance, such simple image enhancement capabilities as spatial filtering are absent. Besides, there are also few GIS functions for performing postclassification filtering. Apart from the standard image classification methods, ER Mapper supports none of the innovative image classifiers based on machine learning. Thus, it is impossible to undertake neural network classification, decision tree classification, classification at the subpixel level, or classification of images using spatial information. There is no provision for undertaking change detection and exploring the nature of changes. No special modules are available for processing hyperspectral remote sensing data or for analyzing radar imagery.
4.5 PCI
Headquartered in Toronto, Canada, PCI is a leading software developer specializing in remote sensing, digital photogrammetry, and cartography. It markets a range of software products, the flagship of which is Geomatica, a comprehensive, image-centric computer system. This package of an extensive suite of geospatial tools offers the most complete geospatial solution via its many built-in capabilities. The system brings together remote sensing, GIS, cartography, and photogrammetry into an integrated environment. PCI Geomatica is able to perform image analysis, in addition to digital photogrammetry and spatial analysis. The most significant recent release is version 10, in which all functions are organized into seven toolbars: Focus, OrthoEngine, Modeler, Easy, Fly!, GeoRaster, and Metadata Mapper. In this release PCI appears to have endeavored to meet the needs of the geospatial enterprise adequately (Page, 2006). Geomatica is positioned as one of the industry leaders in undertaking truly integrated geospatial analysis. The most recent release, version 10.1.2, includes significant updates to support images from new sensors, such as Phased Array type L-band Synthetic Aperture Radar (PALSAR) data products, WorldView-1 images, EROS-B data, and data acquired from the China-Brazil Earth Resources satellite.
4.5.1 Image Input and Display
Of the numerous PCI modules, the freely available Geomatica FreeView is designed for image viewing. This fully georeferenced data viewer is able to read and assimilate raster and vector data saved in more than 100 formats. A wide variety of airborne and satellite data, as well as non-remote sensing data (e.g., DEM and digital line graphic data), is recognizable by PCI. This powerful viewer also features image enhancement and viewing capability of attribute tables (Fig. 4.5).
Geomatica 10 functions perfectly well with images saved in the PCI proprietary format, which is useful in bringing together multisource information about a study area and facilitates data manipulation, analysis, and organization. Images in the JPEG format are acceptable to Geomatica, but should be converted into the proprietary format for all data manipulations to achieve higher efficiency. PCI is able to display images in a range of manners. Both raster and vector data can be displayed and integrated, though good response times are essential in displaying high resolution imagery of a very large size. Three-dimensional views may be generated, and vector feature attributes built, edited, and queried, in the 3D stereo environment. In addition to images, raster DEM data may be displayed with elevation color coded. Interactive fly-throughs may be created from imagery data in Fly! for topographic visualization. With the assistance of this visualization tool, imagery and vector layers may be draped over a DEM to create perspective scenes in near real-time. The user is able to interactively control flight speed, viewing direction, elevation, and perspective parameters through the intuitive point-and-click user interface.
4.5.2 Major Modules
PCI encompasses many image analysis modules for various processing tasks, such as geometric correction, image classification, data visualization, and cartographic production. The two most important modules in Geomatica are Focus and OrthoEngine. The former offers a comprehensive range of tools for undertaking an image analysis task from beginning to end. The input data may be transformed to an extensive range of projections, which can be user defined. The transformed image can be subset or clipped through advanced spatial operations. The map and file trees of Focus keep the image analyst abreast of the workflow as the image is being edited and modeled. OrthoEngine is a complete photogrammetric suite of image geometric processing tools, such as geometric rectification, orthorectification, image mosaicking, DEM extraction, and 3D visualization. This powerful tool is designed to produce geometrically corrected images and mosaics efficiently. It allows mosaicking of even nongeoreferenced oblique images using the polynomial model. It is possible to sequentially step through matching points within the geocoded images. However, it is through manual manipulation that the analyst has more control over final output quality. In an enterprise setting, not all available functions in a comprehensive software package are used equally frequently. Instead, the image analyst may perform certain tasks repeatedly. High efficiency is achievable if such specific functions are initiated and accessed conveniently. PCI tailors this access through a particularly useful
feature called the Algorithm Librarian. Hundreds of analysis algorithms accessible in the Algorithm Librarian are grouped into data interchange, image processing, vector processing, data analysis, multilayer modeling, and image correction. Repetitive processing procedures can also be automated by using a scripting language in Geomatica 10. Productivity is increased with visually modeled workflows and command-line scripting. Users can customize workflows by creating batch processes for all types of data processing and analysis. For instance, production workflow efficiency is boosted for a set of geospatial tasks such as automated orthorectification, mosaicking, DEM extraction, feature extraction, and change detection, all of which can be embedded within existing processes either through visual modeling or command-line scripting (Page, 2006). This extremely flexible and convenient interface allows users to define their own algorithm categories, thereby giving them quick and easy access to frequently run analyses. Another important module is the Geomatica WebServer suite of tools, which combines three standard geospatial data services (web coverage service, web feature service, and web map service) into one module. This web server is designed to share and distribute geospatial data more efficiently through the Internet and intranets. It consists of three parts. The first is a visual environment for various data, such as imagery, vector layers, and bitmaps. The second is an applet that allows the user to roam, zoom, and pan images and to interact with the web server; it is automatically downloaded into the user's web browser, but is deleted after the user leaves the web site. The last part is a servlet that coordinates activities between the applet and the web server.
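The kind of scripted batch workflow described above can be pictured with a short sketch. The Python fragment below is purely illustrative: the orthorectify and mosaic functions are hypothetical placeholders, not PCI's actual scripting API, standing in for whatever routines a production system exposes.

```python
from pathlib import Path

def orthorectify(scene: Path, dem: Path, out_dir: Path) -> Path:
    # Placeholder: a real workflow would call the system's orthorectification
    # routine here and write the corrected scene to out_path.
    out_path = out_dir / f"ortho_{scene.name}"
    return out_path

def mosaic(scenes: list[Path], out_file: Path) -> None:
    # Placeholder: a real workflow would call the system's mosaicking routine here.
    print(f"mosaicking {len(scenes)} scenes into {out_file}")

def batch_ortho_and_mosaic(scene_dir: Path, dem: Path, out_dir: Path) -> None:
    """Orthorectify every scene in a folder, then mosaic the results."""
    out_dir.mkdir(parents=True, exist_ok=True)
    ortho_scenes = [orthorectify(s, dem, out_dir)
                    for s in sorted(scene_dir.glob("*.tif"))]
    mosaic(ortho_scenes, out_dir / "mosaic.tif")

# batch_ortho_and_mosaic(Path("scenes"), Path("dem.tif"), Path("ortho"))  # hypothetical paths
```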
4.5.3 User Interface
The user interface of PCI has evolved from the command-driven environment of earlier releases to a graphical interface running on Windows NT, 2000, and XP. Its GUI is easy to use, intuitive, and unobtrusive. The tabbed legend frame makes switching layers and views on and off very easy, and both layer characteristics and functions can be accessed with ease from this frame. Some users may prefer an enhancement to this interface through dockable panels for displaying algorithm or other libraries, image characteristics such as histograms, or frequently used complex tools (Page, 2006). Users may also wish to interact with the system via the command line when using the scripting language, in keeping with a growing trend toward highly flexible and customizable user interfaces.
4.5.4 Documentation and Evaluation
PCI supplies a user guide describing its functions and how to run some analyses. However, more documents are needed to support its users. For instance, a tutorial could be added to help novice users and to explain how the system should be run properly. More resources, such as freely available introductory training materials and analysis examples, should be provided to ensure that users can quickly learn to operate the system with confidence. With minimal effort PCI could upgrade and reorganize its e-mail discussion thread (Page, 2006) so that the geospatial community can exchange ideas with the software developers and learn from one another. If the discussion became a well-organized and moderated forum, users could support one another in using PCI products, for example by finding solutions to their problems quickly. PCI would also benefit from such a forum by receiving feedback from its users on how to improve its products in future releases.

The strength of PCI lies in its high degree of automation and customized workflows. However, this advantage is also a limitation because it applies to a narrow range of image analysis functions, such as image georeferencing, orthorectification, mosaicking, and DEM generation; it does not extend to thematic information extraction from multispectral imagery. Another obvious area for improvement is the componentization of functions, even though they may be linked together into automated workflows and run as batch processes. The flagship package contains only seven modules, mostly for the extraction of elevation information, while too many functions that are core components in other similar systems are relegated to add-on modules. These modules are tailored toward specific tasks. For instance, the Optical module is designed to analyze optical remote sensing data. It offers tools for spectral unmixing, neural network classification, geometric and atmospheric correction of AVHRR data, and surface temperature and vegetation index extraction from AVHRR imagery. However, a separate add-on module, containing a few programs for radar geometric correction and despeckle filtering, is required to analyze radar data. Similarly, a separate module is needed to analyze hyperspectral images (e.g., visualization and spectral libraries), another to compress hyperspectral data, yet another for atmospheric correction, and one more for pan sharpening. Thus, the organization of PCI functions is neither logical nor self-evident; the same function (e.g., radiometric correction) appears in more than one module and for different satellite images. In this regard PCI is better suited to industrial applications than to teaching the concepts of image analysis. Geared toward the extraction of geometric and elevation information, PCI Geomatica lacks the capability to analyze satellite data to produce accurate land cover maps.
4.6 eCognition
eCognition was developed by Definiens to overcome the weaknesses and limitations, inherent in current image analysis systems, of traditional spectral classification methods when applied to hyperspatial resolution satellite imagery. It represents an attempt to remedy the deficiency of conventional per-pixel image classifiers, which treat images as collections of individual pixels rather than meaningful objects. Based on the assumption that objects provide important semantic clues critical to properly labeling image pixels, eCognition adopts an object-oriented and multiscale approach toward image analysis. This approach provides a powerful means of analyzing images in an effort to achieve more reliable classifications.
4.6.1 General Overview
The flagship package of Definiens is eCognition Professional, an object-oriented image classifier that brings contextual information into the decision making behind classification (Definiens, 2008). In addition to tone, the sole clue in spectral classification, shape, texture, area, and contexture are all used in eCognition classification. As a consequence of this innovation, complex image data are classified more intelligently, more accurately, and more efficiently than with traditional methods, through a number of steps that include multisource data fusion, multiresolution image segmentation, and fuzzy classification. Data from a wide variety of sensors and platforms (e.g., Landsat and Radarsat) of different spatial and spectral resolutions (e.g., IKONOS and SPOT) may be merged. Image analysis starts with segmentation of remote sensing images into homogeneous objects. General knowledge of object features is applied to improve the accuracy of identification. Image classification is implemented through sample objects (training areas) or a knowledge base. Sample-based fuzzy classification is a very simple, rapid form of supervised classification, whereas knowledge-based fuzzy classification relies on knowledge about the relevant image content (e.g., contexture) stored in a knowledge base.

Apart from eCognition Professional, Definiens offers three other eCognition packages: Elements, Enterprise, and Forester. eCognition Elements offers a subset of the image processing tools in eCognition Professional for multisource data fusion, multiresolution image segmentation, and supervised fuzzy classification. It is hence less complex and easier to use than eCognition Professional. eCognition Enterprise is an expanded version of eCognition Professional with enhanced functionality for increasing the efficiency of image classification through batch processing. This modular client-server system is developed for server-based image classification to centralize enterprise image classification and data management. eCognition Forester is a stand-alone tool for automatically identifying individual conifer tree crowns from aerial photographs. Such information is needed by forest managers to monitor forest resources.
4.6.2 Main Features
eCognition is an icon-driven desktop system running on Windows NT, 2000, and XP. Best performance is achieved on machines with at least 512 MB of RAM and a Pentium III 500-MHz CPU. Once the system is activated, a list of functions is displayed at the very top of the screen (Fig. 4.6). Each of them contains a drop-down menu with more options displayed when expanded. Immediately underneath are toolbar icons that provide fast access to some commonly used functions. Image display is accomplished via Definiens Viewer, which is able to load images, display them, and visualize processed results in a wide range of formats, including TIF, Graphic Interchange Format (GIF), JPEG, and even the ERDAS Imagine format through a raster data translator library. Non-technical users may rely on Definiens Architect to configure and execute versatile image analysis workflows using a library of analysis actions created in Definiens Developer; the workflows are then calibrated for the image data being classified using training samples. It is very similar to Definiens Analyst, which executes workflows without requiring preconfiguration or user calibration before execution. Definiens Developer is a powerful integrated environment in which image analysts can develop and test new analysis applications. Incorporating all the functionality of Definiens Architect, Definiens Analyst, and Definiens Viewer, it contains an unparalleled array of image, vector, and tabular data import and export functions. Results may be interactively explored and visualized.
4.6.3 Documentation and Evaluation
Definiens supplies a detailed user guide that contains tutorial materials. The tutorial offers a few examples through which the user can become familiar with the concepts behind object-oriented image analysis and learn how to use the functions of this image processing system properly to generate the desired results. More user support, such as a user discussion forum, is needed to provide a means of communication between Definiens software developers and the user community, as well as for users to support one another.

Unlike other image analysis packages that have a long history, eCognition is a recent addition. As a primarily object-oriented image analysis system, it is able to produce more accurate results than per-pixel image classifiers, thanks to the use of contextual information at multiple resolutions. However, it does not have a comprehensive set of image processing functions, such as 3D data handling capabilities or tools for analyzing radar data. It also offers limited capabilities for integrating remote sensing data with non-remote sensing vector data. Spatial analysis functions that have become an integral part of standard image analysis are also absent from this package.

FIGURE 4.6 The icon panel of eCognition Professional. Many icons are available for performing specific tasks. They can be used to improve operational efficiency. (Copyright: Definiens, 2006.)
4.7 GRASS
The Geographic Resources Analysis Support System, commonly known as GRASS, is a computer system for image processing, spatial modeling, and visualization of many types of data. This free system was initially developed by the U.S. Army Construction Engineering Research Laboratories as a tool for land management and environmental planning by the military. It has since evolved into a powerful system with a wide range of applications in many different areas, both academic and commercial (Shepard, 2000). Unlike the other image processing systems reviewed here, which are commercially oriented, this open source system allows users to make their own contributions to improve it via a sophisticated program library.

Although primarily a raster GIS, GRASS does have an image processing component in addition to geospatial data management and analysis, graphics and map production, spatial modeling, and visualization. There are more than 350 programs and tools to render maps and images, process multispectral image data, and manipulate and manage spatial data. Of these programs, 26 are designed specially for image processing, covering data preparation (image rectification and mosaicking of up to four images), image transformation (principal component analysis, canonical component analysis, Tasseled Cap (Kauth-Thomas) transformation, fast Fourier transformation, and hue-intensity-saturation to red-green-blue color transformation and vice versa), image georeferencing, and image classification (clustering analysis, maximum likelihood, and contextual image classification). GRASS modules are organized into display, general file management, imagery (satellite data), photo, postscript (for map printing), raster, shell scripts (macros), sites (for point data), vector (for digitizing and making aesthetically pleasing maps), and 3D visualization, such as fly-through animations (Shepard, 2000).

GRASS can be run on a plethora of platforms and can interface with databases to develop new data and manage existing data. Existing data in a variety of formats, such as ArcGIS and MapInfo, can be read by GRASS. In addition, graphic formats and even ASCII text can be imported and exported between GRASS and other systems. In its latest release (version 6.3), more than 30 modules have been added to broaden its functionality, such as atmospheric correction and the import and export of attribute tables in various formats. In particular, its ability to handle vector data has improved considerably, for example in vector editing, generalization, and importing/exporting. Written in the C language and operational in the UNIX environment, GRASS used to be exclusively command driven. This mode of interface has been replaced with the introduction of the GIS
Manager, which offers a GUI for ease of operation. Other user interfaces include Quantum GIS and JAVAGRASS, a multi-platform, multi-session GRASS package designed for large numbers of GIS professionals and fully functional on UNIX/Linux and Mac OS platforms. In its latest major release (version 6.3), GRASS also runs under Microsoft Windows. The icon-driven interface is intuitive and user-friendly, with all the options shown on screen (Fig. 4.7); the user only needs to supply the necessary values in the appropriate interface boxes. Through this improved user interface, user-developed modules can be easily incorporated into the system for better interaction with map display and access to local and networked databases.

FIGURE 4.7 The main icons in the GRASS icon panel (version 6.3).

As a public domain system, GRASS has limited user support and documentation. A programmer's manual in PDF and HTML formats is publicly available for download. There is a frequently asked questions (FAQ) document in Wikipedia that answers common questions the user may have about running GRASS programs, and it is relatively easy to navigate through this document to find information on how to perform a specific image processing task. However, more effort is needed to compile a comprehensive user manual. Certain tutorial materials exist, but not all of them are written in English. These tutorials provide examples that illustrate how to perform certain analysis tasks, such as image overlay/clipping, vector map import/export, and network analysis. Nevertheless, not all available programs and scripts are fully documented.

The limited range of GRASS image processing capability has been improved with the addition of new programs for image enhancement, data fusion, spectral unmixing, and georeferencing. Other recently created functions include LiDAR data processing and DEM extraction. However, important image classification functions that an analyst takes for granted in other commercial systems, such as postclassification processing, accuracy assessment, and change detection, are absent from GRASS. For these reasons, it has not found wide application beyond certain government agencies in the United States.
4.8 Comparison
When the digital image processing systems reviewed above were initially developed, each targeted a particular area of application and had its own unique features. Over the years these systems have undergone tremendous improvement and expansion in their functionality and ease of operation, thanks to advances in computing technology. In response to requests from the user community, each system has perfected its functionality to such an extent that unique features are disappearing quickly. Most of them have now become powerful systems offering a comprehensive set of tools, with add-on modules for specialized applications. Increasingly, each system is becoming more and more versatile in its analytical functions; more analyses than ever before can be accomplished easily in the same system. Consequently, image analysis has become a smaller component of a sophisticated package. In terms of the data being analyzed, the systems have evolved from handling satellite imagery exclusively to including stereoscopic aerial photographs and radar imagery. These data are processed for image georeferencing, mosaicking, and the extraction of elevation information, thus forging a close integration of the geometric component of image analysis with digital photogrammetry. In terms of spectral information processing, all systems offer nearly the same suite of functions as their competitors; thus, they are losing their individuality. In terms of workflows, the processed results are integrated into other systems or, increasingly, with other GIS layers for further analysis in the same image analysis system. These systems are compared with each other in several categories, covering image display, data preparation and image enhancement, image classification, and user interface, in Table 4.1. eCognition and GRASS are excluded from the comparison because they are not generic image analysis systems.

Having originated on the personal computer, IDRISI is best suited to teaching digital image processing, owing to its modular structure and its comprehensive range of image classifiers and change detection analysis tools. It is still primarily a digital image analysis system with GIS functions. The system has a limited capacity to accurately georeference images using sensor-specific models, and it lacks the capability to analyze stereo images and produce DEMs. Processing speed is slow for large image files. It has reached the industry standard, but not for all types of image analysis.

In comparison with the other image processing packages, ERDAS Imagine is the most capable of handling vector data and integrating image analysis with GIS (e.g., ESRI ArcGIS). After being acquired by Leica, ERDAS expanded its functionality in analyzing stereoscopic images and extracting topographic information from remote sensing materials. Its range of image classification methods has broadened with the addition of the subpixel classifier and modules for classifying hyperspectral data. However, in the recent release it still lacks image classification capability based on machine learning or pixel spatial properties, despite being the only system able to undertake intelligent image analysis.

ENVI used to be the only system able to process hyperspectral data, but this advantage is disappearing quickly as other systems offer similar modules. Its functionality has now expanded to such a degree that the gap between it and ERDAS is closing very quickly; consequently, it shares a number of similarities with ERDAS Imagine. For instance, both are designed to perform all kinds of image analysis comprehensively. ENVI remains a slight leader in handling hyperspectral data. Its other strength is that it offers more image classifiers than ERDAS (though fewer than IDRISI), as well as a broad range of texture descriptors for spatial image analysis. With the release of the Feature Extraction add-on module, ENVI is the only mainstream system able to perform object-oriented image classification. However, it is disadvantaged by its limited capacity to process 3D data; the only non-remote sensing data ENVI can handle are topographic in nature. ENVI lags well behind ERDAS in processing vector data and in integrating image analysis with GIS and photogrammetry.
TABLE 4.1 Comparison of the Functions of Major Image Analysis Systems (IDRISI Andes, ERDAS Imagine 9.1, ENVI 4.4, ER Mapper 7.1, and PCI Geomatica 10). The systems are compared on display and output (including 3D capability), data preparation (orthoimage generation, image mosaicking, and image transforms such as PCA, RGB, Fourier, and Tasseled Cap), image classification (unsupervised/supervised, hyperspectral, neural network, knowledge-based, subpixel, decision tree, object oriented, change detection, and accuracy assessment), and user interface (GUI, modelers and wizards, ease of operation, online documentation, and batch/scripting capability).
The biggest advantage of ER Mapper is its image web server, which may usher image analysis into a new era of distributed processing via the Internet. It is also the easiest system to use, thanks to its wizards. ER Mapper used to be similar to ERDAS, but its functionality has not been sufficiently diversified in recent releases. Like ERDAS, it does not support innovative image classifiers based on machine learning, such as decision tree analysis and neural network classification. Subpixel image classification is not supported in ER Mapper, even though it is possible to carry out spatial image classification incorporating contexture using the maximum likelihood classifier.

PCI used to be similar to ERDAS in converting satellite imagery into land cover maps. However, PCI products have shifted their emphasis toward geometric processing of satellite images and stereoscopic aerial photographs. By comparison, image analysis functions are tucked away in add-on modules instead of being featured prominently in the core PCI products. The packaging of a few programs into an add-on module makes this system highly fragmented. Although a high level of production efficiency is achieved through special customization for certain tasks, it is rather expensive for an educational institution to acquire all the add-on modules needed to teach core image analysis functions properly. Thus, this package serves industry better than academic needs.

No matter which image analysis system is selected, the user must consider which modules are included in the standard package, which add-on modules to acquire, and whether user support should be subscribed to so that updated versions can be acquired at minimum cost when they are released. Another important consideration is the software licensing policy. When purchasing a license, the user must be aware of its restrictions (e.g., whether it is a commercial or an educational license). Another point to keep in mind is the number of licenses, that is, the number of users who may access the system concurrently. Multiple licensing or site licensing is essential for an education provider or a large institution in which a team of image analysts is involved in a project. Normally, the system checks the number of concurrent users and allows up to the licensed maximum at the same time. However, if a license key is used, the system can be run on the keyed machine only. Needless to say, this is highly inconvenient and should be avoided.
References
Civco, D. L. 1996. "Review: ER Mapper 5.1 image processing software." Photogrammetric Engineering and Remote Sensing. 62(3):269–274.
Clark Labs. IDRISI Andes: GIS and Image Processing Software, http://www.clarklabs.org/products/upload/Andes_Brochure.pdf, last accessed June 18, 2008.
Cothren, J., and A. Barnes. 2006. QuickTake Review: ERDAS IMAGINE and Leica Photogrammetry Suite 9.0, http://Geoplace.com.
Definiens. 2008. Definiens Enterprise Image Intelligence Suite, http://www.definiens-imaging.com/.
ER Mapper. 2007. ER Mapper Professional Datasheet, http://www.ermapper.com/frame_redirect.asp?url=/ermapper/prodinfo/index.htm.
ERDAS. 2003. ERDAS Imagine Field Guide. Atlanta, GA.
Hermann, J. T. 2006. "IDRISI Andes Edition enhances modules, adds land change modeler." Earth Imaging Journal, http://www.eijournal.com/idrisi.asp.
Huber, B. 2000. "A review of IDRISI 32." Directions Magazine, http://www.directionsmag.com/product_reviews.php?feature_id=40.
ITT. 2007. ENVI—get the information you need from imagery, http://www.ittvis.com/envi/.
Page, M. 2006. Product Review: PCI's Geomatica 10, http://www.directionsmag.com/article.php?article_id=2174&trv=1.
PCI. 2005. Geomatics overview, http://www.pcigeomatics.com/, last accessed June 18, 2008.
Qiu, Y., and G. I. Thrall. 2006. ER Mapper's image web server (IWS), http://www.geospatial-solutions.com/geospatialsolutions/article/articleDetail.jsp?id=337600.
Shepard, R. 2000. "GRASS—the Free GIS." Directions Magazine, http://www.directionsmag.com/features.php?feature_id=34.
Simonovic, S. P. 1997. "Software review: IDRISI for Windows, 2.0." Journal of Geographic Information and Decision Analysis. 1(2):151–157.
Thurston, J. 2008. Review: ENVI 4.4, http://vector1media.com/article/review/review:-envi-4.4/.
Warner, T., and D. Campagna. 2004. "IDRISI Kilimanjaro Review." Photogrammetric Engineering and Remote Sensing. 70(6):669–673, 684.
CHAPTER 5
Image Geometric Rectification
Remotely sensed images obtainable from Earth resources satellites form a vital information source for managers of the environment and natural resources. Thanks to their ease of acquisition and currency, satellite images, together with aerial photographs, are playing an increasingly significant role in numerous geographic information system (GIS) applications. Unlike existing maps, raw satellite images and scanned aerial photographs have only a local coordinate system: they lack a proper projection, a defined scale, and a proper orientation. During data acquisition the platform on which the sensor is mounted is in a state of constant motion, and any deviation of the sensor position and orientation from the norm will lead to geometric distortions in the resultant satellite images. It is important to geometrically rectify these images for three reasons:

• Firstly, the end products of a large number of remote sensing applications concerning the Earth's resources and environment take the form of thematic maps. These maps must conform to certain geometric mapping standards. Image rectification ensures that geometric distortions inherent in the remote sensing imagery are eliminated or reduced to an acceptable level.

• Secondly, in order to be analyzed with data from other sources in a GIS, raw satellite images and aerial photographs have to be projected to a common ground reference system, which enables images obtained at different times to be spatially registered with one another.

• Finally, if remotely sensed data are used to detect changes or update existing maps, they must be reprojected to a coordinate system with a known geometry identical to that of the digital maps to be revised.
In this chapter the sources of image geometric distortion are comprehensively reviewed under three categories. This review is followed by a discussion of ground coordinate systems; two systems, one global and one regional, are introduced in detail, in conjunction with the fundamentals of image transformation. Presented next are various image transformation models. Three special georeferencing methods (polynomial, direct, and orthorectification) are covered extensively. Also explored in this chapter are a number of issues involved in image rectification, such as the impact of ground control and image spatial resolution on the accuracy of image rectification. With the emergence of hyperspatial resolution satellite imagery, it is increasingly important to orthorectify it. How to perform image orthorectification forms the content of Sec. 5.7, followed by a discussion of image direct georeferencing in Sec. 5.8. The last section of this chapter deals with image subsetting and the mosaicking of georeferenced images.
5.1 Sources of Geometric Distortion
Many factors contribute to the geometric distortion of satellite imagery. They are related broadly to the target, the sensor, and the platform. Target-related factors include rotation of the Earth during scanning and Earth curvature. Sensor-related factors are scale distortion and scanning mirror inconsistency. Platform-related factors concern the platform's position and orientation in space.
5.1.1 Errors Associated with the Earth
Earth Rotation
The Earth spins at a constant angular velocity of 360° per day, equivalent to a linear velocity of about 463 m/s at the equator. While lines of imagery are being scanned, the Earth rotates some distance from west to east. The exact amount of linear rotation varies with the scan duration and the manner of scanning. Cross-track scanning, as with Landsat imagery, takes longer to obtain a frame of imagery than the along-track scanning commonly associated with SPOT (Le Systeme Pour l'Observation de la Terre). After a scan, the Earth shifts eastward by a certain distance, for instance, 9.26 m after a scan duration of 20 milliseconds. When the scanner returns to begin the next scan, the ground it previously covered has moved eastward by 9.26 m, and ground that lay the same distance to the west now occupies that position, resulting in a gradual westward shift of the ground swath being scanned. As the number of scan lines accumulates, the start position shifts cumulatively to the left from the first line to the last. Although the raw image is recorded as a square (Fig. 5.1a), the actual ground covered by the image is skewed toward the left (Fig. 5.1b) as a consequence of the rotation. Earth rotation causes a constant displacement in the start position of scan lines only; it affects neither the number of scan lines in an image nor the number of pixels in a scan line. Namely, it does not affect the number of pixels needed to represent an image. This kind of distortion is hence systematic and can be completely eliminated during image rectification.

FIGURE 5.1 Impact of Earth rotation on the geometry of satellite imagery. (a) Raw image. (b) The actual ground area covered by the image.
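The arithmetic behind this skew is simple and can be illustrated with a short calculation. The following Python sketch assumes the nominal equatorial surface speed of 463 m/s quoted above, a hypothetical 20-ms scan duration, and an arbitrary number of scan lines; none of these values are tied to a specific sensor.

```python
# Cumulative westward skew caused by Earth rotation (illustrative values only).
EARTH_SURFACE_SPEED = 463.0   # m/s, linear speed of the Earth's surface at the equator
SCAN_DURATION = 0.020         # s, assumed time to acquire one scan line (20 ms)
NUM_SCAN_LINES = 2340         # assumed number of scan lines in one image frame

shift_per_line = EARTH_SURFACE_SPEED * SCAN_DURATION   # about 9.26 m per scan line
total_skew = shift_per_line * NUM_SCAN_LINES           # skew accumulated over the whole frame

print(f"Eastward shift per scan line: {shift_per_line:.2f} m")
print(f"Cumulative skew over {NUM_SCAN_LINES} lines: {total_skew / 1000:.1f} km")
```

Because the shift per line is constant, the resulting skew grows linearly with the number of scan lines, which is why it can be removed completely by a systematic correction.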
Earth Curvature
The three-dimensional surface of the Earth is not flat but curved, with varying topographic relief. Recording this surface in satellite images is essentially a process of transforming a three-dimensional (3D) surface onto a two-dimensional (2D) medium, during which geometric distortions are inevitably introduced (Fig. 5.2). The severity of the geometric distortion caused by Earth curvature depends on the scanning angle θ and the swath width D, both of which are related to the sensor altitude H. Approximating the curved surface with a flat one may cause a negligible error if the field-of-view (FOV) of the sensing system is very small. However, for a sensing system such as the National Oceanographic and Atmospheric Administration (NOAA) Advanced Very High Resolution Radiometer (AVHRR), which covers a massive swath width of 2400 km per scene, the distortion is substantial. This systematic error can be calculated using Eq. (5.1):

L = 2R × sin⁻¹[H × tan(FOV/2)/R]        (5.1)

where L = actual length of the Earth's surface covered by the image after taking its curvature into consideration
R = radius of the Earth (6370 km)
H = height of the satellite, usually in the range of hundreds of kilometers; for AVHRR data it is 833 km (see Sec. 2.3)

The discrepancy between the flat and curved dimensions is

ΔL = L − D = L − 2H × tan(FOV/2)        (5.2)

Example: H = 833 km, R = 6370 km, D = 2400 km
φ = sin⁻¹(D/2/R) = sin⁻¹(2400/2/6370) = 10.858°
L = 2R × φ (in radians) = 6370 × 2 × (10.858 × 3.14159)/180 = 2414 km
ΔL = L − D = 2414 − 2400 = 14 km

Thus, there is a discrepancy of 14 km in the image swath width caused by Earth curvature. The distortion caused by Earth curvature is much more severe if the dimension of the ground area covered, D, markedly exceeds the altitude of the sensor, as is the case with aerial photography (Fig. 5.3).

FIGURE 5.2 Effect of Earth curvature on the ground dimension of an image.
FIGURE 5.3 Distortion caused by Earth curvature on aerial photographs. (Source: Modified from Zhou and Jezek, 2004.)
Calculation of this distortion requires measurement of the radial distance r of the point from the principal point of the photograph [Eq. (5.3)]. The distortion on the photograph, Δr, is calculated as

Δr = Hr³/(2Rf²)        (5.3)

where f is the focal length of the camera used to take the photograph, and H and R are as defined previously.
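The quantities in Eqs. (5.1) to (5.3) can be verified numerically. The short Python sketch below reproduces the worked AVHRR example and then evaluates Eq. (5.3) for an illustrative aerial photograph; the flying height, radial distance, and focal length in the second call are assumed values chosen only for demonstration.

```python
import math

R_EARTH_KM = 6370.0  # Earth radius used throughout the chapter

def curved_swath_length_km(h_km, fov_deg):
    """Ground length L covered on the curved Earth, Eq. (5.1)."""
    half_flat = h_km * math.tan(math.radians(fov_deg) / 2.0)      # D/2 on a flat Earth
    return 2.0 * R_EARTH_KM * math.asin(half_flat / R_EARTH_KM)   # arc length on the sphere

def swath_discrepancy_km(h_km, fov_deg):
    """Difference between the curved and flat swath widths, Eq. (5.2)."""
    flat = 2.0 * h_km * math.tan(math.radians(fov_deg) / 2.0)
    return curved_swath_length_km(h_km, fov_deg) - flat

def photo_displacement_m(h_m, r_m, f_m):
    """Radial displacement caused by Earth curvature on an aerial photograph, Eq. (5.3)."""
    return h_m * r_m ** 3 / (2.0 * R_EARTH_KM * 1000.0 * f_m ** 2)

# AVHRR-like case: H = 833 km and an FOV chosen so that the flat swath is 2400 km.
fov = 2.0 * math.degrees(math.atan(1200.0 / 833.0))
print(f"Swath discrepancy: {swath_discrepancy_km(833.0, fov):.0f} km")   # about 14 km

# Hypothetical aerial photo: H = 3000 m, r = 0.10 m on the photo, f = 0.152 m.
print(f"Photo displacement: {photo_displacement_m(3000.0, 0.10, 0.152) * 1000:.3f} mm")
```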
5.1.2 Sensor Distortions
Off-Nadir Scale Distortion
Scan-direction distortion is caused by the nonuniformity of the object distance while the image distance (e.g., the focal length) remains constant during scanning (Fig. 5.4). This type of distortion is especially pronounced in airborne thermal infrared and radar imagery acquired at a close range to the ground. During a scan, the scene directly beneath the scanner has the shortest object distance. As the scanning mirror rotates away from this nadir position, the object distance becomes increasingly longer. On the other hand, the scanning instantaneous field-of-view (IFOV) (Δθ) remains constant irrespective of the scan angle θ. Therefore, a larger area on the ground is covered by the same scanning IFOV farther away from the nadir. In the obtained imagery, this increased ground area is recorded at the same dimension as the nadir ground because of the fixed focal length, resulting in off-nadir image compression. The farther away an object is from the nadir position, the larger the compression. This scan-direction distortion is called tangential scale distortion.

FIGURE 5.4 Tangential scale distortion caused by changing object distance during scanning.

Tangential scale distortion is a systematic error that can be calculated using the following formulas. As shown in Fig. 5.4,
AC = Δθ · HA        (5.4)

ΔY = AC/cos θ        (5.5)

HA = H0/cos θ        (5.6)

AC = Δθ · H0/cos θ        (5.7)

Scale = Δy/ΔY = (Δθ · f)/(Δθ · HA/cos θ) = (Δθ · f)/(Δθ · H0/cos²θ) = (f/H0) cos²θ        (5.8)
where f/H0 represents the nadir scale. Scale distortion thus occurs at a rate of cos²θ in the cross-track direction. Of particular note is that tangential scale distortion occurs only in the direction perpendicular to the flight direction (Fig. 5.5); along the flight direction there is no such distortion. Consequently, regular shapes of grids, diamonds, and circles on the ground (Fig. 5.5a) are no longer regular in the acquired image (Fig. 5.5b). This kind of distortion is especially severe toward the end of the scan line, where θ is maximal.

FIGURE 5.5 The degree of compression in the flight direction and across the track. (Source: modified from Lillesand et al., 2003.)
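To make the cos²θ relationship concrete, the sketch below tabulates the relative lateral scale and the corresponding cross-track ground footprint of a single IFOV at several scan angles. The nadir altitude and IFOV are assumed values, not those of any particular scanner.

```python
import math

H0 = 3000.0    # m, assumed altitude of an airborne scanner above the terrain
IFOV = 0.0025  # rad, assumed instantaneous field-of-view

print("scan angle   relative scale   cross-track footprint (m)")
for theta_deg in (0, 15, 30, 45):
    theta = math.radians(theta_deg)
    relative_scale = math.cos(theta) ** 2            # Eq. (5.8): lateral scale falls off as cos^2
    footprint = IFOV * H0 / math.cos(theta) ** 2     # ground distance covered by one IFOV
    print(f"{theta_deg:10d}   {relative_scale:14.3f}   {footprint:25.2f}")
```

At 45° off nadir the relative scale drops to 0.5, so the same IFOV covers twice the ground distance it does at nadir, which is exactly the compression visible toward the ends of the scan lines in Fig. 5.5.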
Scanning Mirror Inconsistency
Two problems related to the scanning mirror emerge during scanning: velocity inconsistency and scan-skew. The mirror rotates nonlinearly across a scan, like a pendulum. Its velocity is maximal at the nadir and gradually decreases to zero at the extremes, always alternating between these two states. Consequently, the ground is not swept linearly, causing distortion along the scan direction. This systematic distortion can be eliminated completely through the application of an appropriate mathematical correction. Scan-skew refers to the motion of the spacecraft away from the planned direction along the ground track during the time required to complete a scan. It causes the scanned ground swath to be slightly skewed rather than normal to the ground track (cross-scan geometric distortion), producing random errors that are impossible to deal with.
5.1.3 Errors Associated with the Platform
Two types of parameters determine the status of the sensor in space: position and orientation, each associated with three components.
Position
Among the three parameters (X, easting; Y, northing; and Z, altitude) defining the position of the platform in space, altitude (Z) is the most critical, as it affects the scale of the obtained imagery (Table 5.1). A higher altitude than the nominal one causes a larger ground area to be covered, resulting in an image of smaller scale. Conversely, a lower altitude leads to a smaller ground area being sensed, so the image has a larger scale. The departure of a spacecraft or aircraft from its nominal altitude is thus translated into a scale variation. The other two coordinates (easting and northing) govern the geographic area covered. A change in this position causes a slightly different area from the planned one to be sensed, with no change in image scale.
Orientation
Sensor orientation is defined by three parameters: roll (ω), pitch (φ), and yaw (κ). Roll refers to rotation around the flight direction (the X-axis), with its positive direction pointing to the right. As illustrated in Table 5.1, this rotation occurs clockwise. It results in a change in scale in the direction perpendicular to the flight direction; the scale is either larger or smaller than the nominal, depending upon the location in relation to the X-axis, while the scale along all lines parallel to the flight direction is constant. Pitch is the rotation around the Y-axis, a direction perpendicular to the flight direction. Its effect on scale distortion is identical to that of roll except that the distortion occurs in the X-direction (the flight direction). Yaw is defined as the rotation around the Z-axis (the plumb direction). Unlike roll and pitch, yaw exerts no direct impact on the geometry of the obtained image (e.g., no change in scale); instead, a slightly different area from the planned one is covered in a yawed image. Thus, its effect is very similar to that of a change in (X, Y).

TABLE 5.1 Effect of Platform/Sensor Position and Orientation on Image Geometry

Parameter                                                  Effect
Z (sensor altitude)                                        Change in image scale
X, Y (sensor location)                                     A different area is sensed; no change in scale
Roll (ω): rotation about the flight direction (X-axis)     Change in scale along the Y-axis
Pitch (φ): rotation about the Y-axis                       Change in scale along the X-axis
Yaw (κ): rotation about the Z-axis                         A different area is sensed
Velocity
In order to obtain imagery of high geometric fidelity, the platform must move at a constant velocity during data acquisition. Any inconsistency in its velocity will lead to image distortion along the flight direction: if the velocity is faster than the norm, the image is stretched; otherwise it is compressed. This generalization also applies to across-track scanning. Such an inconsistency in platform velocity is random, and its impact on the acquired imagery cannot be eliminated.
5.1.4 Nature of Distortions
All distortions mentioned above fall into two broad categories in terms of their nature: systematic or random. Systematic errors behave in a predictable manner. Usually, they can be precisely described mathematically. In other words, they can be completely eliminated through image rectification. By comparison, random errors are nonsystematic and unpredictable. Their haphazard behavior in the imagery means that these errors cannot be completely removed, though it is possible to suppress them to an acceptable level through image rectification. Both Earth rotation and curvature cause systematic distortions that can be completely eradicated. So can the distortion caused by inconsistency in scanning mirror velocity. By comparison, most of the errors related to the orientation and position of the sensor, and the velocity of the platform are random. They exert a residual effect on geometrically rectified images.
5.2 Projection and Coordinate Systems
Projection refers to the systematic manner in which the approximately spherical surface of the Earth is consistently transformed onto 2D media according to predetermined mathematical equations. Many kinds of projection are in use, each having its own unique features and being applicable to different parts of the world. All projections share one commonality in that the distinctive global pattern of parallels and meridians is first transferred onto an easily flattenable surface such as a cylinder or a cone. This surface is then flattened to form the planar coordinate system. During the transfer, features on the curved surface may be distorted in shape, area, distance, or direction. In one projection it is impossible to preserve all of these properties; at most one or two properties are preserved along certain lines. The achievement of fidelity in these properties is termed, respectively, conformality, equivalence, equidistance, and true direction. Conformality refers to the preservation of the shape of an area during projection. If conformally projected, an area has a shape on the projected map identical to its 3D shape on the Earth's surface. Conformal projection is accomplished by the exact transformation of angles around points. Equivalence, equidistance, and true direction are less important than conformality in thematic mapping from satellite imagery and are not covered here.

There are many ground coordinate systems in existence. Some systems are suitable for one part of the globe while others are designed for the entire Earth's surface. Two coordinate systems, the UTM and the New Zealand Map Grid (NZMG), are introduced in this section.
5.2.1 UTM Projection
The Universal Transverse Mercator (UTM) projection is a common cylindrical projection with meridians as lines of tangency and parallels as lines of secancy. After the projection cylinder is rotated 90° from the vertical axis, the Earth's surface is placed in such a way that the cylinder intersects it at the desired central meridian. All parallels are projected onto the cylindrical surface mathematically. Graticular angles of 90° are obtained after the cylinder is "cut." Lines of tangency run north and south, along which there is no scale distortion. Lines of secancy represent scale variation, even though it is possible to preserve the scale along one or two parallels. In order to minimize the amount of distortion, the Earth's surface is divided into dozens of small zones. Each zone is projected individually but identically, with the central meridian varying from zone to zone. The smaller a zone is, the smaller the error in approximating its surface as a flat one.

The UTM coordinate system is an international plane system commonly adopted for medium- and large-scale maps. It extends from 84°N to 80°S in latitude, outside which there is significant geometric distortion. This large geographic area is divided into 60 nonoverlapping quadrangles in order to minimize distortion. These zones are numbered 1 to 60 eastward, beginning at 180°W (Fig. 5.6). Each zone covers 6° of longitude, and within each zone the central meridian, extending 3° to the east and 3° to the west, is used as the line of tangency. Each zone is also divided into horizontal bands spanning 8° of latitude. These bands are lettered, south to north, beginning at 80°S with the letter C and ending at 84°N with the letter X; the letters I and O are skipped to avoid confusion with the numbers one and zero. In total, there are 20 bands in the north-south direction; with the exception of band X, which covers 12° of latitude, all bands cover 8° of latitude. Each UTM zone is projected independently using formulas for a transverse version of the Mercator projection. This projection is conformal and displays true direction along straight lines. In order to reduce the distortion within each zone, the scale along the central meridian is reduced to 0.9996; consequently, two parallel lines approximately 180 km away from the central meridian have no distortion. Each zone is further divided into 100,000-m grid squares, within which coordinates are measured as northings and eastings in meters. The northing values start from zero at the equator and increase in a northerly direction, so the equator has a northing value of 0 m in the northern hemisphere. For northings in the southern hemisphere, the origin is defined as a point 10,000,000 m south of the equator in order to avoid negative coordinates. Similarly, the central meridian through the middle of each 6° zone is assigned an easting value of 500,000 m; grid values to the west of this central meridian are smaller than 500,000, and those to the east are larger than 500,000. Because the same coordinate system is applied to all zones and to both hemispheres, it is necessary to state the UTM longitudinal zone in front of the easting value and the hemisphere (e.g., N or S) after the coordinates to differentiate the location, for instance, (510,000E, 4,970,000N).

FIGURE 5.6 Distribution of grid zones in the UTM system around the globe. Each quadrilateral of 6° of longitude by 8° of latitude is identified by its column number and row letter. The tinted zone is 29M.
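The zone and band arithmetic described above is easy to script. The following sketch, written under the stated 6° zone and 8° band conventions, returns the zone number and band letter for a given longitude and latitude; the example point is approximate and used only for illustration.

```python
def utm_zone(lon_deg):
    """UTM longitudinal zone number (1-60); zones are 6 degrees wide, starting at 180 W."""
    return int((lon_deg + 180.0) // 6) % 60 + 1

def utm_band(lat_deg):
    """Latitude band letter (C-X, omitting I and O); 8-degree bands from 80 S to 84 N."""
    letters = "CDEFGHJKLMNPQRSTUVWX"
    if not -80.0 <= lat_deg <= 84.0:
        raise ValueError("latitude outside the UTM range")
    index = min(int((lat_deg + 80.0) // 8), len(letters) - 1)  # band X absorbs 72-84 N
    return letters[index]

# Approximate location of Auckland, New Zealand (174.8 E, 36.8 S): zone 60, band H.
print(utm_zone(174.8), utm_band(-36.8))
```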
5.2.2 NZMG
Similar to the UTM, the NZMG is also a conformal projection, devised specially for New Zealand because of its elongated shape (Fig. 5.7). Unlike other conformal projections, it involves a minimal amount of scale distortion, ranging from +0.023 to −0.022 percent over the land area of the country, at the expense of abandoning an orderly arrangement of scale curves. Its true origin is located at (173°E, 41°S). The meridian of longitude 173°E is oriented so that its tangent at the origin forms the north-south axis of the coordinate system (Stirling, 1974).

FIGURE 5.7 Distribution of scale distortion in the NZMG projection. Lines are indicative of constant scale and constant convergence. (Source: modified from Stirling, 1974.)
FIGURE 5.8 Coordinates in the NZMG coordinate system.
As with all other coordinate systems, the NZMG has a false origin that is assigned a pair of sufficiently large coordinates, (2,510,000E, 6,023,150N), to prevent negative coordinates. In this metric system all coordinates are expressed in seven digits. Easting coordinates are arbitrarily assigned a minimum value of 2,000,000 m and can go as high as 3,000,000 m (Fig. 5.8). Northing coordinates are assigned a minimum value of 5,000,000 m, much larger than the maximum easting, to avoid overlap with easting coordinates. Therefore, a northing is easily distinguishable from an easting no matter which one is stated first.
5.3 Fundamentals of Image Rectification

5.3.1 Common Terms
Image geometric projection has been described using various terms. The first term, image geometric rectification, refers to the process during which geometric distortions inherent in the input image (e.g., distortions caused by the sensor, the platform, and the scene itself) are removed from the output image. Through geometric modification the output image is made to have the desired projection, a uniform scale, and a proper orientation. Image geometric rectification thus emphasizes the elimination of distortions. In practice, this term is not without flaws, as the behavior of random distortions is not exactly known and these distortions cannot realistically be eradicated. For this reason the second term, georeferencing, is preferred by some image analysts. It simply means reprojection or transformation of an image from a local coordinate system, in which coordinates are expressed in rows and columns, into a global one in easting and northing. During image transformation, systematic distortions are eliminated while random ones are suppressed to an acceptable level.

Another term commonly associated with image rectification is coregistration, the process of registering the coordinate system of one image to that of another, even though both images may have only a local coordinate system. Image coregistration differs from geometric rectification and georeferencing in that only the image coordinate system is altered; no distortion is dealt with if both input images contain geometric distortions. If the master image is geometrically reliable, the geometric distortions in the slave image are removed or suppressed through coregistration; however, any geometric uncertainty in the master image also propagates into the coregistered image. Image coregistration is usually undertaken for multitemporal images of the same geographic area when no ground control exists, and it commonly finds application in change detection (see Chap. 13). Wherever ground control is available, image coregistration should be avoided to prevent the geometric inaccuracy of one input image from propagating into another.
5.3.2 Image Geometric Transformation
Two-dimensional remotely sensed imagery, whether spaceborne or airborne, is imaged at a height of up to hundreds of kilometers above the Earth, much higher than the topographic relief on the Earth's surface. This 3D surface can therefore be safely approximated as a flat one in applications in which geometric positioning is not a critical concern, such as thematic resources mapping. Consequently, only the horizontal position, measured as easting (E) and northing (N) in a ground coordinate system, needs consideration in image rectification. The pixel corresponding to this position has the image coordinates of row r and column c, respectively (Fig. 5.9a). The task of image georeferencing is to transfer this pair of local plane coordinates into global ones systematically. Through the established mathematical equations, the position of every pixel in the input image is associated uniquely with a location on the ground. Conceptually, this relationship is expressed mathematically as
E = f1(r, c)        (5.9)

N = f2(r, c)        (5.10)
where f1 and f2 = transformation functions whose specific form varies with the transformation model adopted
r and c = image coordinates, usually expressed as integers; they may be expressed in meters by multiplying by the spatial resolution of the imagery
E and N = coordinates in the desired ground coordinate system to which the input image is to be projected, such as the UTM

In the output image the spacing of pixels is no longer regular after the removal of geometric distortions (Fig. 5.9b), nor are the pixels aligned in a clearly defined orientation when distortions are severe. The spacing between any two neighboring pixels does not equal the spatial resolution of the imagery; instead, it varies with local image compression or stretching. Furthermore, the number of rows and columns of pixels needed to represent the image may also differ from that of the input image.

FIGURE 5.9 Spatial arrangements of pixels in the input image that contains geometric distortions (a) and their corresponding distribution on the ground (b). Pixels are not regularly spaced in the output image after the removal of geometric distortions (exaggerated).
FIGURE 5.10 Effects of different terms in the polynomial equation on the rectified image. (a) Input image; (b) output image shifted in its origin; (c) output image that has been scaled; (d) output image that has been rotated, sharing the same origin as the input image; (e) output image that has been linearly transformed (notice that the image still has a clear-cut edge); (f) output image that has been nonlinearly transformed (its edge is no longer linear).
The transformation from (r, c) to (E, N) may be fulfilled through a number of geometric operations, such as lateral shift, scaling, and rotation (Fig. 5.10). Lateral shift means that the origin of the coordinate system is moved from one set of values to another by adding a constant to the original values. It does not cause any change to the shape or size of the output image (Fig. 5.10b), so the transformed image exactly resembles the input image in geometry. Scaling involves a change in the physical dimension of the image, achieved by multiplying the original coordinates by a scalar (Fig. 5.10c). The output image has a physical size either larger or smaller than the original one; in either case, the shape of the image is preserved. Rotation is a change in the orientation of the output image around the origin (Fig. 5.10d). The output image is still square in shape and has a physical size identical to that of the input image, but it requires a larger number of rows and columns to represent it as a result of the rotated orientation. The shape of the rectified output image may no longer be regular (Fig. 5.10e and f) if f1 and f2 are nonlinear.
5.3.3 GCPs in Image Transformation
The establishment of f1 and f2 requires ground control points (GCPs). These are distinctive physical features on the ground that are readily identifiable on remote sensing images or topographic maps. The accuracy of locating these features is affected by the contrast between them and their surrounding environment, which is related indirectly to the spatial resolution of the image being rectified. These features must be sufficiently large to be recognizable on coarse resolution images (e.g., on the order of tens of meters). This dimension may be reduced for linear features or when the contrast with the surroundings is sufficiently distinct. This implies that only those features that are large enough to register as distinctive points, and are thus discernible on the image, can serve as GCP candidates. Apart from visibility, these points must allow their positions to be pinpointed precisely. The more distinctive these landmarks are, the more easily and accurately they can be located on the image; the more accurately their positions are pinpointed, the more reliably their image coordinates are determined, and the more accurately the image is transformed to the ground coordinate system. Quality GCP candidates are exemplified by road and street intersections, which can be located at a high accuracy level (Fig. 5.11). Another quality that all GCPs should possess is temporal stability. They should not have moved since imaging if their location is to be determined with a global positioning system (GPS) in the field, or since the day of photography if their position is to be determined from a topographic map compiled from aerial photographs. GCPs based on the land-water interface are usable only when the water level is temporarily stable.

FIGURE 5.11 Exemplary points (arrows) that can serve as reliable GCPs. These points are easy to identify on satellite imagery, and their positions can be pinpointed accurately.

Theoretically, GCPs are point features without any area. However, the minimum unit identifiable on remote sensing imagery is the pixel, and no matter how small a pixel is, it covers a ground area. The incongruity between point and area is reconciled by interpreting the point position as the center of the pixel. The ground coordinates (E, N) of GCPs can be determined from topographic maps or measured on the ground, while their image coordinates are usually determined from the image displayed on a screen. With these two sets of coordinates, it is possible to establish f1 and f2. Once established, the equations are applied to all pixels in the input image, and the distortion-affected image coordinates (r, c) are converted to largely error-free global ones.
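In practice f1 and f2 are usually estimated from the GCPs by least squares. As a minimal sketch of the idea, and assuming for simplicity that a first-order (affine) model is adequate, the code below fits six coefficients from a handful of hypothetical GCPs with NumPy and then applies them to an arbitrary pixel; all coordinates shown are invented for illustration.

```python
import numpy as np

# Hypothetical GCPs: image coordinates (row, col) and ground coordinates (E, N).
rows = np.array([120.0,  450.0,  800.0, 1020.0,  300.0])
cols = np.array([ 80.0,  600.0,  150.0,  900.0,  400.0])
east = np.array([302150., 317800., 304500., 327100., 311700.])
north = np.array([5815900., 5805400., 5795600., 5788300., 5809100.])

# First-order model: E = a0 + a1*col + a2*row, N = b0 + b1*col + b2*row.
A = np.column_stack([np.ones_like(rows), cols, rows])
a, *_ = np.linalg.lstsq(A, east, rcond=None)    # coefficients of f1
b, *_ = np.linalg.lstsq(A, north, rcond=None)   # coefficients of f2

def to_ground(row, col):
    """Apply the fitted transformation to one pixel position."""
    return (a[0] + a[1] * col + a[2] * row,
            b[0] + b[1] * col + b[2] * row)

residuals_e = A @ a - east                       # easting residuals at the GCPs
rmse_e = float(np.sqrt(np.mean(residuals_e ** 2)))
print("E, N of pixel (500, 500):", to_ground(500.0, 500.0))
print("Easting RMSE at the GCPs:", round(rmse_e, 2), "m")
```

Higher-order polynomial models follow the same pattern; only the design matrix A gains additional columns, which in turn requires more GCPs to keep the solution overdetermined.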
5.3.4 Sources of Ground Control
There are two sources for obtaining the ground coordinates (E, N) of GCPs, topographic maps and the global positioning system (GPS) (see Sec. 5.6). Topographic maps were the exclusive source of ground control prior to the advent of GPS technology. This source of ground control is relatively cheap and efficient. Coordinates are interpolated from analog topographic maps using a ruler within hours. Linear interpolation is essential for those GCPs whose position does not coincide with the kilometer grids on the analog topographic map. No expensive equipment is needed, nor is a trip to the field necessary. However, the accuracy of the obtained coordinates is subject to map scale and the care taken during reading the coordinates. Coordinates derived from topographic maps encompass two types of uncertainties, uncertainty inherent in the map itself and uncertainty in reading coordinates. According to the U.S. Geological Survey standards of topographic mapping, horizontal accuracy must be within 2.5 m at 1:25,000 and 5 m at 1:50,000. For this reason a map of a larger scale is always preferable to that of a smaller scale. Under ideal conditions the minimum resolving distance of a naked human eye is 0.1 mm in interpolating the coordinates. This uncertainty is translated into a ground distance of 5 m if the map has a scale of 1:50,000. In reality, of course, the uncertainty in estimating the coordinates is much larger than this theoretic minimum. This source of inaccuracy is irrelevant if the coordinates are obtained from a digital topographic map. Determination of coordinates from a digital map is usually much more accurate than from an analog counterpart since no interpolation is required. Coordinates can be read directly from the computer screen within minutes. The accuracy of the read coordinates can be made higher through zoom-in. However, the reliability of the obtained coordinates is subject to the accuracy of the digital map. Any inaccuracy of the map propagates into the acquired coordinates for both analog or digital maps alike. The combination of source inaccuracy with reading inaccuracy can easily create an uncertainty level around 10 m. Apart from the low reliability of coordinates, topographic maps are limited in that they are not readily accessible for certain parts of the world. Even if available, they may be obsolete or at an inappropriate scale. An alternative method of obtaining reliable ground control is to take advantage of GPS. The accuracy of GPS-derived coordinates is governed primarily by the performance of the GPS receiver. With the use of a competent GPS unit it is quite common to achieve an accuracy within 5 m under normal logging circumstances. This accuracy level may be further improved through more processing. With proper postprocessing, GPS-derived coordinates are much more accurate than those acquired from topographic maps. Another added advantage of using GPS is the ability to make use of distinctive ground features that have been generalized on the topographic map because of their perceived insignificance or small scale of the map. Thanks to
Thanks to the use of GPS technology, the choice of quality GCP candidates is broadened. Nevertheless, this superior performance of GPS technology is not without limitations. Since coordinates have to be logged in the field, travel to the sites can be time consuming, and extended field trips are needed to reach distant points. Besides, the technology is highly restricted by site accessibility. GCPs located in remote, isolated, or mountainous areas are not easily accessible if there is a lack of vehicle-navigable roads, as in rural environments in some developing countries. Accessibility is also an issue if GCPs are on private land, where prior access authorization must be gained from the respective landowners. This expensive method is nonetheless the only choice for areas where no up-to-date topographic map is available or where the ground has changed considerably since the topographic map was compiled.
5.4 Rectification Models

The transformation of a local image coordinate system to a global one requires a rigorous geometric model. Many geometric models have been devised for f1 and f2. Ranging from affine to projective transformation, these models are designed for processing images acquired from a variety of sensors and platforms, and for areas of varying topographic relief. Some of them are applicable only to images obtained from a particular sensor, whereas other, generic ones can be used for all types of imagery.
5.4.1 Affine Model

The affine model is a basic geometric correction model that allows three modifications to be made to the input image: scaling (change in pixel size), offset (lateral shift in image origin), and rotation (change in image orientation). Through offset an image is moved laterally by a user-specified number of pixels in both the easting and northing directions; this does not involve any change in image geometry (e.g., shape and dimension).
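A minimal sketch of such a scale-rotate-shift mapping from image (row, column) coordinates to ground (E, N) coordinates is given below; the pixel size, origin, and rotation angle are arbitrary illustrative values, not parameters of any particular image.

```python
import math

def affine_to_ground(row, col, pixel_size, e0, n0, rotation_deg):
    """Map image (row, column) coordinates to ground (E, N) coordinates using only
    scaling (pixel size), rotation, and an offset of the origin."""
    theta = math.radians(rotation_deg)
    x, y = col * pixel_size, -row * pixel_size      # rows increase downward on the image
    e = e0 + x * math.cos(theta) - y * math.sin(theta)
    n = n0 + x * math.sin(theta) + y * math.cos(theta)
    return e, n

# Illustrative values only: 30-m pixels, origin at (500000, 4000000), 5-degree rotation.
print(affine_to_ground(row=100, col=200, pixel_size=30.0,
                       e0=500_000.0, n0=4_000_000.0, rotation_deg=5.0))
```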
5.4.2 Sensor-Specific Models
Dissimilar to physical models that are based on the imaging process (Okeke, 2006), sensor-specific models depict the mathematic relationship between the image space and the object space. As a function of time over the imaging period, these models describe the position of the satellite at the instant of imaging. They incorporate the exterior orientation parameters of the sensor supplied with the data in the transformation, such as the parameters accompanying the Landsat series of satellite data. Landsat-specific models can be used to rectify Landsat Thematic Mapper (TM) and Multispectral Scanner (MSS) images
only. Other sensor-specific models include those designed to rectify Synthetic Aperture Radar (SAR) images and Indian Remote Sensing (IRS) 1C/1D data generated by the Indian National Remote Sensing Agency. These sensor-specific models are computationally and mathematically rigorous and complex, and produce more accurate geometric rectification than generic models. However, with the increasing availability of diverse sensors, such as the frame camera, panoramic camera, pushbroom scanner, whiskbroom sensor, and radar antenna, it is not always feasible to find all sensor-specific models in an image analysis system, and it may be difficult to add new models or to revise existing ones. Moreover, sensor parameters for commercial remote sensing satellites (e.g., IKONOS) are not routinely supplied to the user. Instead of sensor-specific models, the rectification of these images has to rely on generalized sensor models. One particular type of generalized sensor model is the Rational Function Model (RFM) (Tao and Hu, 2001).
5.4.3 RPC Model
Generalized rectification models are independent of sensor platforms and sensor types. They model the relationship between the image space and the Earth's surface as a general function instead of modeling the physical process of imaging, hence the name rational function model (RFM). This model is commonly realized through rational polynomial coefficients, also known as rapid positioning coordinates (RPC). The rational coefficients describe the geometric relationship between the sensor and the Earth's surface; they pertain to the interior and exterior orientation parameters of the sensor at the instant of imaging. With these coefficients it is possible to rectify images without using any GCPs, although it is desirable to refine RPC models with GCPs because this refinement can further improve the accuracy of image rectification. The RFM is well suited to rectifying images at a high accuracy level, particularly very high resolution satellite imagery such as IKONOS and QuickBird. The RPC model can be implemented in two modes, standard and fast. In the standard mode, the transformation equations are established pixelwise for every pixel. The calculation of each pixel's location using elevation and geoid information is a lengthy process. It can be sped up by using the fast mode, in which the equations are established for a grid of points spaced throughout the image. A faster speed is achieved at the expense of some accuracy, but usually not a substantial amount (Leica Geosystems, 2006).
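In its common form an RFM predicts each image coordinate as the ratio of two cubic polynomials in normalized ground coordinates. The sketch below evaluates one such ratio; the normalization convention follows typical RPC metadata, but the term ordering in poly3 and all coefficient values are illustrative assumptions rather than any vendor's specification.

```python
def normalize(value, offset, scale):
    """Normalize a coordinate as RPC metadata prescribes: (value - offset) / scale."""
    return (value - offset) / scale

def poly3(coeffs, P, L, H):
    """Cubic polynomial in normalized latitude P, longitude L, and height H;
    the term ordering here is illustrative, not a vendor specification."""
    terms = [1, L, P, H, L*P, L*H, P*H, L*L, P*P, H*H,
             P*L*H, L**3, L*P*P, L*H*H, L*L*P, P**3, P*H*H, L*L*H, P*P*H, H**3]
    return sum(c * t for c, t in zip(coeffs, terms))

def rfm_row(lat, lon, h, num_coeffs, den_coeffs, offsets, scales):
    """Image row predicted by one rational function; the image column uses a second
    pair of numerator/denominator polynomials in the same way."""
    P = normalize(lat, offsets["lat"], scales["lat"])
    L = normalize(lon, offsets["lon"], scales["lon"])
    H = normalize(h, offsets["h"], scales["h"])
    row_norm = poly3(num_coeffs, P, L, H) / poly3(den_coeffs, P, L, H)
    return row_norm * scales["row"] + offsets["row"]
```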
5.4.4 Projective Transformation
This transformation is applicable to frame aerial photographs that do not contain warping. The photograph plane is projected to a corresponding plane on the ground (Fig. 5.12) via the following equations:
FIGURE 5.12 The relationship between a photo plane and its corresponding plane on the ground in the projective transform. This relationship is uniquely determined by four control points.
r = (a1E + a2N + a3) / (c1E + c2N + 1)    (5.11)

c = (b1E + b2N + b3) / (c1E + c2N + 1)    (5.12)
where ai and bi (i = 1, 2, 3) and ci (i = 1, 2) are the eight projective parameters. They are uniquely determined with the assistance of four object points. There is no need to consider the elements of exterior and interior orientation of the camera, as they are implicit in these parameters (Novak, 1992). This model produces the best results when the area covered by the photograph has flat terrain. With some modification, this model is also applicable to the rectification of satellite images obtained via along-track scanning (e.g., SPOT). Since each scan line has its own unique geometry, the above equations have to be applied repeatedly, once for each scan line. The satellite imagery cannot cover an extensive ground area, though; otherwise it is subject to distortion caused by the Earth's curvature that cannot be adequately addressed with this model. Thus, it is best suited to hyperspatial resolution images such as IKONOS and QuickBird.
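Because Eqs. (5.11) and (5.12) become linear in the eight unknown parameters once the denominators are multiplied out, the parameters can be recovered from four control points. A minimal sketch with fabricated coordinates:

```python
import numpy as np

def solve_projective(gcps):
    """Recover the eight projective parameters from four (E, N, r, c) control points
    by rearranging Eqs. (5.11) and (5.12) into a linear system."""
    A, y = [], []
    for E, N, r, c in gcps:
        A.append([E, N, 1, 0, 0, 0, -E * r, -N * r]); y.append(r)
        A.append([0, 0, 0, E, N, 1, -E * c, -N * c]); y.append(c)
    return np.linalg.solve(np.array(A, float), np.array(y, float))

def apply_projective(p, E, N):
    a1, a2, a3, b1, b2, b3, c1, c2 = p
    denom = c1 * E + c2 * N + 1.0
    return (a1 * E + a2 * N + a3) / denom, (b1 * E + b2 * N + b3) / denom

# Fabricated control points: (E, N, row, column).
gcps = [(1000, 2000, 10, 12), (1800, 2050, 15, 510),
        (1750, 2900, 498, 505), (1050, 2850, 503, 8)]
params = solve_projective(gcps)
print(apply_projective(params, 1000, 2000))   # reproduces (10, 12) for the first point
```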
5.4.5 Direct Linear Transform Model
Proposed by El-Manadili and Novak (1996), the direct linear transform model is a rigorous method for the accurate geometric rectification of SPOT images. This model is a simplified version of the general collinearity equations for images obtained with pushbroom scanning. A slight variation of this model is the camera model that is designed to rectify frame images obtained with a camera. This 3D model
requires the provision of 3D GCP coordinates in the format of (E, N, H) with a balanced spatial distribution both horizontally and vertically. Elevation information is indispensable in building the model for removing relief displacement from the input image. Image coordinates are first geometrically corrected for Earth rotation and cell-size variations caused by oblique viewing. Direct linear transform models are solvable purely on the principle of least-squares adjustment through the use of a minimum of six GCPs to determine the 11 orientation parameters of a scene. In practice, however, more GCPs are needed to achieve a satisfactory solution. The least-squares adjustment also removes systematic residuals after the preliminary processing. With this model submeter accuracy can be achieved using as few as six GCPs for SPOT data.
5.4.6 Polynomial Model
The polynomial model is a popular generic model that is completely independent of the imaging sensor, and hence highly suited for rectifying satellite images whose geometry and distortions are difficult to model. This model is generic in that it can be applied to all sorts of images, even though some of them may be more accurately georeferenced with sensor-specific models. Its high flexibility allows customization of the geometric correction via polynomial equations that can have a varying number of terms. The complexity of the model can be adapted to suit the number of available GCPs and to meet different transformation precision requirements. First-order polynomials can project raw satellite imagery with satisfactory accuracy if there is not much distortion in it, whereas higher-order polynomials are preferred for images suffering from nonlinear geometric distortion. Irrespective of the model complexity, only the horizontal position of pixels is dealt with in all polynomial model–based image transformations; three-dimensional GCPs are not required in performing image rectification based on this model. All geometric distortions of the input image (e.g., sensor-caused distortion, relief displacement, Earth curvature, and so on) are addressed in one transformation. However, relief displacement cannot be adequately removed from the rectified image.
5.4.7 Rubber-Sheeting Model
The rubber-sheeting model is a piecewise polynomial model for geometrically correcting severely warped images in a number of steps. The first step is to form a triangulated irregular network (TIN) from all the available GCPs. The image area encompassed by each triangle in the network is then rectified using first- (linear) or fifth- (nonlinear) order polynomials. Because of geometric uncertainty, the areas outside the convex hull of the TIN (i.e., extrapolation) should not be rectified using this model. The rubber-sheeting model is appropriate for
rectifying highly distorted images when a large contingent of GCPs is available. For this reason it should not be the first choice if other geometric models are applicable (Leica Geosystems, 2006), because the output image may suffer discontinuities at the transition from the facet of one triangle to the next.

It is impossible to generalize which model is the best to use. The answer relies on many factors, the most important being the image to be rectified. Specific satellite images, such as QuickBird, Landsat, and SPOT data, are best rectified with sensor-specific models or with the polynomial coefficients provided by the data supplier. If ancillary information about the image (e.g., an RPC file) is available, then the specific models should be attempted first, supplemented with additional GCPs for higher accuracy. However, if the residual of the rectification is very large, other models such as the generic ones may be used, with more GCPs added if their number is not yet sufficiently large.
5.5 Polynomial-Based Image Rectification

Of the various image transform models, the polynomial method is the most flexible and versatile. It is the only generic model suitable for rectifying all sorts of satellite imagery, and hence warrants an in-depth discussion. Polynomial-based image rectification is implemented in several steps and requires a varying number of GCPs at different accuracy levels. All of these issues are discussed in this section.
5.5.1 Transform Equations

In polynomial-based image rectification, Eqs. (5.9) and (5.10) are rewritten specifically in the following polynomial form:
E = f1(r, c) = a0 + a1r + a2c + a3r² + a4rc + a5c² + …    (5.13)

N = f2(r, c) = b0 + b1r + b2c + b3r² + b4rc + b5c² + …    (5.14)
where ai and bi (i = 0, 1, 2, …) are the transformation coefficients. Polynomial equations as shown above do not recognize any internal relationship between the (r, c) of a pixel and its (E, N); instead, the two sets of coordinates are linked empirically. The highest power of r, c, or their combination in the above equations is known as the order of transformation. It exerts a profound impact on the nature of image rectification. In a zero-order case (i.e., no r or c terms at all), the rectification degenerates into a simple shift in the origin of the image coordinate system by a0 and b0 (see Fig. 5.10b). In the absence of shift (i.e., both a0 and b0 are 0), the modifications involved in a first-order transformation are only scaling and rotation in the easting and northing directions (Fig. 5.10b, c, d). This order permits a linear transformation
between (r, c) and (E, N). The original image can only be rotated and shifted; there is no change in the relative distance and position between pixels in the input image and the output image. The shape of the output image can be rectangular or skewed, depending upon the values of the transformation coefficients, but opposite sides remain parallel to each other. By comparison, a second-order transformation enables more nonlinear geometric distortions to be removed. In addition to rotation and a shift in the origin, it is also possible to remove local scaling (e.g., stretching and compression). Thus, the border of the output image no longer follows a straight line, as is the case with the first order (Fig. 5.10e). Theoretically, Eqs. (5.13) and (5.14) can continue indefinitely with many more terms. However, past experience has demonstrated that beyond the second order shown above, the amount of work and computation involved increases sharply while the accuracy of E and N hardly improves. Therefore, the second-order transformation appears to offer the optimal balance between accuracy and complexity. Since the terms r, c, r², rc, and c² are common to both Eqs. (5.13) and (5.14), they are rewritten more concisely in the matrix format below:
    ⎛ E ⎞   ⎛ a0  a1  a2  a3  a4  a5 ⎞   ⎛ 1  ⎞
    ⎜   ⎟ = ⎜                        ⎟ ⋅ ⎜ r  ⎟
    ⎝ N ⎠   ⎝ b0  b1  b2  b3  b4  b5 ⎠   ⎜ c  ⎟
                                         ⎜ r² ⎟
                                         ⎜ rc ⎟
                                         ⎝ c² ⎠        (5.15)

In the above equation, the matrices (E N)ᵀ and (1 r c r² rc c²)ᵀ are both known for GCPs. The only unknown term in the equation is the coefficient matrix

    ⎛ a0  a1  a2  a3  a4  a5 ⎞
    ⎝ b0  b1  b2  b3  b4  b5 ⎠

This can be determined through inversion of Eq. (5.15), or

    ⎛ a0  a1  a2  a3  a4  a5 ⎞   ⎛ E ⎞   ⎛ 1  ⎞ −1
    ⎜                        ⎟ = ⎜   ⎟ ⋅ ⎜ r  ⎟
    ⎝ b0  b1  b2  b3  b4  b5 ⎠   ⎝ N ⎠   ⎜ c  ⎟
                                         ⎜ r² ⎟
                                         ⎜ rc ⎟
                                         ⎝ c² ⎠        (5.16)
Solution of the above equation relies on GCPs. Once these coefficients are determined, Eqs. (5.13) and (5.14) are then applied to the entire image. After image transformation, all pixels in the output image are expressed in ground coordinates instead of image coordinates.
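When more GCPs are available than strictly needed (see Sec. 5.5.2), the coefficient matrix is normally estimated by least squares rather than by a direct inversion. A minimal sketch of fitting and applying the second-order coefficients of Eqs. (5.13) and (5.14) is given below; the GCP coordinates are fabricated purely for illustration.

```python
import numpy as np

def design_matrix(r, c):
    """Second-order polynomial terms (1, r, c, r^2, rc, c^2) of Eqs. (5.13) and (5.14)."""
    r, c = np.asarray(r, float), np.asarray(c, float)
    return np.column_stack([np.ones_like(r), r, c, r**2, r * c, c**2])

def fit_polynomial(r, c, E, N):
    """Least-squares estimate of the a- and b-coefficients from a set of GCPs."""
    X = design_matrix(r, c)
    a, *_ = np.linalg.lstsq(X, np.asarray(E, float), rcond=None)
    b, *_ = np.linalg.lstsq(X, np.asarray(N, float), rcond=None)
    return a, b

def apply_polynomial(a, b, r, c):
    X = design_matrix(r, c)
    return X @ a, X @ b

# Fabricated GCPs laid out on a simple 30-m grid for illustration.
rows = [10, 20, 400, 410, 200, 50, 350, 120]
cols = [15, 480, 20, 470, 250, 250, 100, 300]
E = [500_000 + 30 * c for c in cols]
N = [4_210_000 - 30 * r for r in rows]

a, b = fit_polynomial(rows, cols, E, N)
print(apply_polynomial(a, b, [10], [15]))   # approximately (500450, 4209700)
```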
5.5.2 Minimum Number of GCPs
The minimum number of GCPs required to carry out image georeferencing is governed by the order of transformation, which is related to the number of transformation coefficients to be resolved in Eq. (5.16). In a first-order transformation, there are six coefficients to be determined. Since each GCP enables two transformation equations (one for E and another for N) to be established, the minimum number of GCPs required is equal to three. Accordingly, twelve coefficients are needed in a second-order transformation. They are solvable with a minimum of six GCPs. The relationship between the minimum number of GCPs required Nmin and the order of polynomial equations t is generally expressed as
Nmin = (t + 1) × (t + 2)/2    (5.17)
The minimum number of GCPs needed to perform an image transformation up to the sixth order has been calculated according to the above relationship (Table 5.2); it increases rapidly with the transformation order. The transformation coefficients are uniquely determined if only the minimum number of GCPs is used. In reality, however, the actual number of GCPs selected usually exceeds the bare minimum shown in Table 5.2, even in areas that do not offer many quality GCP candidates. The extra GCPs can be treated as check points through which any errors or inaccuracies in identifying GCPs in the image, in reading their coordinates from topographic maps, and in entering the coordinates into the computer are revealed. It is quite common to misidentify GCPs on a topographic map or on an image, and pinpointing the points on the remote sensing imagery is subject to inaccuracy; the position can easily be missed by a few pixels on a coarse resolution image if the GCPs are not sufficiently distinct. The extra GCPs provide the flexibility and luxury of deleting those that have been poorly identified or inaccurately pinpointed.
Order of Transformation    Minimum No. of GCPs Required
          1                             3
          2                             6
          3                            10
          4                            15
          5                            21
          6                            28

TABLE 5.2 Order of Transformation and the Minimum Number of GCPs Required
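Equation (5.17) can be verified directly; the few lines below reproduce the values listed in Table 5.2.

```python
def min_gcps(order):
    """Minimum number of GCPs for a polynomial transformation of order t, Eq. (5.17)."""
    return (order + 1) * (order + 2) // 2

for t in range(1, 7):
    print(t, min_gcps(t))   # 3, 6, 10, 15, 21, 28, matching Table 5.2
```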
If more GCPs than are necessary are retained in the transformation equations, the coefficients of the resulting over-determined equations are solved using the principle of least-squares adjustment. According to this principle, the coefficients are estimated such that the sum of the squared residuals over all GCPs is minimized. Consequently, the residuals of all the retained GCPs are dependent on one another.

No matter how many GCPs are selected, they must always have a balanced spatial distribution to achieve a quality rectification. All selected GCPs must be dispersed widely throughout the entire image. If the area of interest makes up only a portion of the whole scene, then the GCPs should be distributed well beyond its border. In this way all portions of the rectified image have the same geometric reliability. Any areas that are not adequately covered by ground control are virtually extrapolated from the covered area, and an output image generated from extrapolation is much less reliable geometrically than one produced from interpolation. Any areas that lack GCPs thus have a lower geometric reliability than that indicated by the overall accuracy.
5.5.3 Accuracy of Image Transform

Application of Eqs. (5.13) and (5.14) results in one set of coordinates, called the estimated coordinates (E, N), for all retained GCPs. They also have another set of observed coordinates (Ê, N̂) that are obtained either from a topographic map or using a GPS unit. Owing to the aforementioned reasons (e.g., coordinate inaccuracy, inaccuracy in identifying the GCPs and in reading their coordinates, and the use of more GCPs than is necessary), these two sets of coordinates are unlikely to be identical. Their difference, termed the rectification residual, is rarely equal to zero. Instead, it varies from GCP to GCP. At a given point, the residual in easting may differ from that in northing. Statistical analysis of the residuals at all GCPs yields an accuracy indicator called the root-mean-square error (RMSE). It is derived using the following equations in easting and northing, respectively:

RMSE_E = √[(1/n) Σ δEi²] = √[(1/n) Σ (Êi − Ei)²]    (5.18)

and

RMSE_N = √[(1/n) Σ δNi²] = √[(1/n) Σ (N̂i − Ni)²]    (5.19)

where
    n = the total number of GCPs finally retained in a rectification
    Ei and Ni = easting and northing coordinates of the ith GCP calculated from the established functions f1 and f2
    Êi, N̂i = reference coordinates in easting and northing obtained from topographic maps or using a GPS
The overall accuracy of the transformation is evaluated by integrating the residuals in both the easting and northing directions at all the GCPs utilized. The final accuracy indicator RMSE_EN is calculated using the following formula:

RMSE_EN = √[(1/n) Σ (δNi² + δEi²)]    (5.20)
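The statistics of Eqs. (5.18) to (5.20) are simple to compute once the easting and northing residuals of the retained GCPs are known; the residual values below are fabricated for illustration.

```python
import math

def rmse(residuals):
    """residuals: list of (dE, dN) differences between reference and estimated coordinates."""
    n = len(residuals)
    rmse_e = math.sqrt(sum(de**2 for de, _ in residuals) / n)            # Eq. (5.18)
    rmse_n = math.sqrt(sum(dn**2 for _, dn in residuals) / n)            # Eq. (5.19)
    rmse_en = math.sqrt(sum(de**2 + dn**2 for de, dn in residuals) / n)  # Eq. (5.20)
    return rmse_e, rmse_n, rmse_en

# Hypothetical residuals (in pixels) at five GCPs.
print(rmse([(0.4, -0.2), (-0.6, 0.3), (0.1, 0.5), (-0.3, -0.4), (0.8, 0.1)]))
```

Note that RMSE_EN² = RMSE_E² + RMSE_N², so the overall indicator can also be obtained from the two directional values.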
The accuracy of image rectification is affected by the following factors:

• The quantity of GCPs and their reliability, which in turn is controlled by the quality of the GCP source.

• The precision in locating these GCPs in the image being rectified. It is much easier and more accurate to locate these points in imagery of a fine spatial resolution than in a coarse one. Similarly, image coordinates are more reliably determined from imagery of a fine spatial resolution. The accuracy is even higher if the points are more conspicuous (e.g., road intersections amid agricultural fields).

• Accuracy of the ground coordinates. They are more accurate if logged with a GPS unit than read from a topographic map.

• Finally, the order of transformation. A high order of transformation is usually associated with a more accurate rectification.

There is no theoretic guideline as to what RMSE value is the acceptable minimum. The conventional wisdom, or rule of thumb, is that the overall RMSE should not exceed one pixel size in value. For instance, it should be less than 20 m for SPOT multispectral bands or smaller than 30 m for Landsat TM data. This tolerance level is justified by the rationale that within a pixel it is impossible to locate the GCP precisely. If the actual RMSE of a rectification exceeds this limit, it can be reduced to within the one-pixel limit using the following two approaches:

• First, the GCPs with a relatively large residual are excluded from the transformation. Worse-than-average GCPs may be sequentially removed from the transformation until the RMSE falls within one pixel for the first time. It must be emphasized that removal of any selected GCPs reduces control over certain parts of the image being rectified. Deletion of more carefully selected GCPs from a rectification creates an increasingly imbalanced spatial distribution of GCPs. The geometric accuracy achieved is thus spatially more uneven. Geometric uncertainty rises in areas where GCPs have been removed.
• Second, the order of transformation is raised if a sufficient number of GCPs is available. Caution must be exercised in adopting a higher order in that this order of transformation must have a physical meaning. Furthermore, the increase in the transformation order has its limits. Beyond the second order the accuracy of transformation improves only marginally despite a sharp increase in the amount of work involved. Besides, higher-order polynomials produce unreliable results for satellite images with simple geometric conditions (e.g., near-vertical views or relatively flat areas), while a low-order polynomial is able to produce submeter rectifications (Rosenholm and Akerman, 1998). Thus, the second option is not as effective as the first one.

In either case, the transformation coefficients have to be recalculated with the revised georeferencing setting, and the RMSE updated, before they are used to create the output image.

Shown in Table 5.3 is an example of image rectification results using 17 GCPs. The first column represents the GCP sequential numbers. GCP image coordinates expressed in row and column are provided in the second and third columns, respectively; they can be entered into the computer by clicking on the points in the image directly. Their coordinates in the ground coordinate system to be projected are listed in the next two columns; they have to be entered into the computer manually. The nature (type) of each GCP is identified in the next column. It can have two possibilities, control or check. In this particular case, all GCPs were used as control points. Calculated using Eqs. (5.18) and (5.19), respectively, the residuals in easting and northing for each GCP are provided in columns seven and eight, expressed in the image unit (e.g., pixel size). The last column shows the RMSE at each point, calculated using Eq. (5.20) with n being 1. Those with the largest residual or largest contribution represent the worst GCPs. For instance, the unusually large residual at GCP 15 in easting (4.699) is indicative of a mistake that could have stemmed from incorrect entry of the coordinates into the computer or erroneous identification of the GCP on the image or on the ground. This GCP may be discarded by turning it into a check point.

Of particular note is the interdependence among all residuals. After the largest residual, associated with GCP 15, is removed from the calculation, the residuals at most other GCPs will become smaller accordingly. The Y residuals of some GCPs, however, will become larger because the Y residual of GCP 15 (0.157) is smaller than the mean. Also presented in the table are the overall residuals in easting (RMSEX = 1.4957) and northing (RMSEY = 0.4936), with the overall RMSEXY being 1.575. This outcome, larger than the one-pixel tolerance, can be reduced to within one pixel after GCP 15 is turned into a check point.
TABLE 5.3 Accuracy of Image Rectification Using 17 GCPs
5.5.4 Creation of the Output Image
The discussion so far has concentrated only on the geometric position of pixels in the rectified image; their radiometric values in the output image have not been resolved yet. This issue is dealt with by resampling the input image. As illustrated in Fig. 5.9, pixels in the output image do not have a regular interval after the removal of geometric distortions. This pixel spacing no longer reflects the spatial resolution of the original image. Furthermore, the ground coordinates calculated using Eqs. (5.13) and (5.14) rarely correspond to the ground positions where pixels were sampled, so the output pixels do not yet have any radiometric values associated with them. Therefore, the idea of transferring the values of pixels in the input image to the output image directly, as discussed previously, has to be abandoned. Instead, the process must be reversed for all pixels with the exception of the four corner ones. After their location in the output image is determined using transformation Eqs. (5.13) and (5.14), the output image is created in two steps:

• First, the position of all other pixels in the output image is determined by sampling the newly created empty image at a constant interval equivalent to the image's spatial resolution, both horizontally and vertically, starting from the corner pixels. In this way, it is guaranteed that the rectified image retains the same spatial resolution as the input image. Once the position of all pixels in the output image is determined, their corresponding position in the input image is estimated by inverting Eq. (5.15), or
r = f1⁻¹(E, N)    (5.21)

c = f2⁻¹(E, N)    (5.22)
• Second, the radiometry at the position calculated in step one is estimated in the input image via a process known as radiometric resampling, in which the radiometric value is estimated from the pixel values in the neighborhood. Resampling of radiometric values at these positions is necessitated by the fact that the inverted r and c from Eqs. (5.21) and (5.22) are likely to be floating-point numbers. At these calculated positions there are no pixels; radiometric values are available only at pixels that are spaced neatly in a regular grid in the original image. The radiometric values at the calculated r and c coordinates therefore have to be estimated from the values of pixels in the vicinity. The definition of the neighborhood size varies with the resampling method. There are three resampling methods for creating the output image.
FIGURE 5.13 Mechanism of sampling pixel values using various methods. (a) Nearest neighbor; (b) bilinear; (c) cubic convolution.
Nearest Neighbor

In this method, the output pixel is assigned the value of its closest adjoining pixel in the input image. For instance, suppose the position of an output pixel falls within four pixels that have radiometric values of 41, 51, 34, and 42, respectively (Fig. 5.13a). Since the calculated position (r = 0.2, c = 0.3) is closest to the upper left pixel, which has a value of 41, the output pixel receives a value of 41. This resampling method is simple to understand and implement, and it does not require complex computation. Most of all, it does not involve altering the input pixel values. However, the output value at the new location simply duplicates the nearest input pixel and does not change gradually with distance, even if the four surrounding pixels have different values. This problem is successfully overcome in the bilinear method.
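A minimal sketch of this rule, using the 2 × 2 window and fractional offsets of Fig. 5.13a:

```python
def nearest_neighbor(window, dr, dc):
    """window: 2x2 values [[top-left, top-right], [bottom-left, bottom-right]];
    dr, dc: fractional offsets of the calculated position from the top-left pixel."""
    return window[round(dr)][round(dc)]

print(nearest_neighbor([[41, 51], [34, 42]], dr=0.2, dc=0.3))   # 41, as in the example above
```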
Bilinear Interpolation

This method is underpinned by the assumption that pixel values vary linearly from one location to another. Thus, the pixel value at a location can be interpolated from its immediately neighboring pixels through distance-weighted averaging. Bilinear interpolation involves three linear interpolations of the four neighboring pixel values. Since the calculated position is surrounded by two pixels above and below it and two to its left and right, two interpolations are undertaken in one of these two directions first. After the radiometric values in that direction are interpolated, the final pixel value is linearly interpolated from the two interpolated values in the perpendicular direction. The sequence of executing the interpolations, horizontally first or vertically first, has no bearing on the final result. In all interpolations, the horizontal and vertical distances between the pixel under consideration and its
four neighbors are treated as weights. So the output pixel value is a proximity-weighted average of its four nearest pixel values. The calculation for the pixel shown in Fig. 5.13b is illustrated below. In this example, the interpolation is performed horizontally first and vertically next.
DN1 = 41 + (51 − 41) × 0.3/(0.3 + 0.7) = 44
DN2 = 34 + (42 − 34) × 0.3/(0.3 + 0.7) = 36.4
DN = 44 + (36.4 − 44) × 0.2/(0.2 + 0.8) = 42.48 ≈ 42

After the first round of interpolation the results can be kept to one decimal place. However, the value must be rounded to the nearest integer after the second round of interpolation because a pixel value is not allowed to have any decimal points if the image is saved as 8-bit unsigned. As shown in this example, there is little difference (only 1 in this case) between the results resampled using the nearest neighbor and bilinear interpolation methods, even though bilinear interpolation involves more computation.
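The same calculation can be written as a small function; it reproduces the worked example above.

```python
def bilinear(window, dr, dc):
    """Distance-weighted average of a 2x2 window [[tl, tr], [bl, br]] at fractional
    offsets dr (vertical) and dc (horizontal) from the top-left pixel."""
    (tl, tr), (bl, br) = window
    top = tl + (tr - tl) * dc         # interpolate along the top row
    bottom = bl + (br - bl) * dc      # interpolate along the bottom row
    return top + (bottom - top) * dr  # interpolate between the two rows

value = bilinear([[41, 51], [34, 42]], dr=0.2, dc=0.3)
print(value, round(value))            # 42.48 and 42, as in the worked example
```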
Cubic Convolution

In this method, 16 pixels surrounding the one under study are needed for the interpolation. The output pixel is assigned the radiance averaged from these 16 nearest neighbors after five interpolations. The algorithm for the computation (Moik, 1980) is provided below:
DN(i, Δj) = Δj{Δj[Δj(DN(i, j+3) − DN(i, j+2) + DN(i, j+1) − DN(i, j)) + DN(i, j+2) − DN(i, j+3) − 2DN(i, j+1) + 2DN(i, j)] + DN(i, j+2) − DN(i, j)} + DN(i, j+1)    (5.23)

where Δj = distance increment in column and DN(i, j+k) (k = 0, 1, 2, 3) = pixel value at the kth pixel to the right in row i.

The above estimation is repeated four times, each time for one of the four rows (or columns). After the four values are interpolated, the operation is repeated one more time in the perpendicular direction (e.g., vertically) using Eq. (5.24). This equation is identical in form to Eq. (5.23) except that the initial pixel values are not raw but estimated from the previous four interpolations:

DN(Δi, Δj) = Δi{Δi[Δi(DN(i+3, Δj) − DN(i+2, Δj) + DN(i+1, Δj) − DN(i, Δj)) + DN(i+2, Δj) − DN(i+3, Δj) − 2DN(i+1, Δj) + 2DN(i, Δj)] + DN(i+2, Δj) − DN(i, Δj)} + DN(i+1, Δj)    (5.24)

where Δi = distance increment in row, and DN(i+k, Δj) (k = 0, 1, 2, 3) = pixel value at the kth pixel down in column Δj.
Just as with bilinear interpolation, the final interpolated result of cubic convolution is not affected by the sequence of interpolation. Namely, the results are the same regardless of whether the interpolation is carried out horizontally first or vertically first. The interpolation for the pixel in Fig. 5.13c based on cubic convolution is illustrated below:
DN1: 0.3{0.3[0.3(56 − 53 + 46 − 38) + (53 − 56 − 2 × 46 + 2 × 38)] + (53 − 38)} + 46 = 48.727
DN2: 0.3{0.3[0.3(55 − 51 + 41 − 36) + (51 − 55 − 2 × 41 + 2 × 36)] + (51 − 36)} + 41 = 44.483
DN3: 0.3{0.3[0.3(48 − 42 + 34 − 32) + (42 − 48 − 2 × 34 + 2 × 32)] + (42 − 32)} + 34 = 36.316
DN4: 0.3{0.3[0.3(40 − 36 + 30 − 28) + (36 − 40 − 2 × 30 + 2 × 28)] + (36 − 28)} + 30 = 31.842
DN: 0.2{0.2[0.2(31.842 − 36.316 + 44.483 − 48.727) + (36.316 − 31.842 − 2 × 44.483 + 2 × 48.727)] + (36.316 − 48.727)} + 44.483 = 42.449536 ≈ 42

Coincidentally, the cubic convoluted pixel value is identical to that obtained using bilinear interpolation after the value is rounded to the nearest integer. Compared to the other two methods, cubic convolution is much more complex and computationally intensive because the output value is estimated from more neighboring pixels. This does not necessarily lead to more accurate interpolation, as demonstrated in the above example, so the method should be used with caution.

The actual implementation of image rectification in an image analysis system requires specification of a number of image projection parameters, such as the geometric correction model, order of transformation, map unit, spheroid, datum, and zone number. Not all of these options are applicable to a particular projection, depending upon the transform model selected and whether the transformation is 2D or 3D. All the entered projection information is stored in a single file, together with the rectified image data. The output image may have a larger physical dimension (e.g., more rows and columns than the original image) after rotation (Fig. 5.14), even though the ground area covered remains unchanged. Pixels outside the initially covered area are considered background; they are usually allocated a value of zero (black) and may be ignored in all subsequent processing steps.
FIGURE 5.14 An example of an output image that has been rectified to the NZMG coordinate system based on a first-order transformation. The edge of the image appears to be straight. The image’s dimension has increased to 605 rows by 647 columns from 512 rows by 512 columns because of image rotation. (Copyright CNS, 1994.) See also color insert.
5.6 Issues in Image Georeferencing

Several issues in image georeferencing remain unexplored. For instance, how does the source of ground control affect accuracy? What is the relationship between accuracy expressed in absolute values in meters and that expressed relatively in pixel size? How do the spatial distribution of GCPs and their ease of identification affect the accuracy of image georeferencing? In order to answer these questions, a comparative study was carried out in which a portable Trimble Geo-Explorer GPS receiver was deployed to repeatedly log the geographic positions of 20 GCPs in metropolitan Auckland, New Zealand (Fig. 5.15) and 25 GCPs in a rural area in northwestern China (Fig. 5.16). Working in the autonomous mode with selective availability, the GPS receiver has a planimetric accuracy of up to 100 m, or 2dRMS (approximately 95 percent of the positions fall within the specified value horizontally). The horizontal accuracy of a single point can be as high as 2 to 5 m by averaging differentially corrected positions logged within 3 minutes.

The urban area of about 16 km² contains ample road intersections that can serve as GCP candidates. Thus, it is relatively easy to select a sufficient number of widely dispersed quality GCPs. Their coordinates were also read from a topographic map for the purpose of comparison. The rural scene covering about 94 km² encompasses predominantly sand dunes and, to a lesser extent, cultivated fields. Few distinct features visible on the satellite imagery can serve as GCP candidates. Scarcity of these features led to the selection of unstable or inconspicuous water-related features as GCP candidates, such as sharp turning points of irrigation canals, roads, and waterways, and intersections of windbreaks with secondary roads.
FIGURE 5.15 Distribution of GCPs in an urban setting where many quality GCP candidates are available thanks to the presence of roads. (Source: Gao and Zha, 2006.)
FIGURE 5.16 Distribution of GCPs in a rural area. Lack of distinct landmarks makes the selection of quality GCPs impossible. Their spatial distribution is highly uneven as a result. (Source: Gao and Zha, 2006.) See also color insert.
This selection represents a compromise among spatial distribution, site accessibility, and ease of identification on the satellite imagery. Their ground coordinates were also read from a topographic map.

GPS data for the urban scene were logged between April 4 and 22, 1999. The GPS data logged on April 10 were differentially corrected using the Pathfinder Office (version 1.01) software. The 60 loggings contained in each GCP file were then grouped to obtain the true coordinates of that point. The total number of loggings retained in the merged GPS files varied from 30 to 180. The rural GPS data were collected on June 22 to 24, 1998. Not all preselected GCPs were accessible by vehicle owing to the absence of navigable roads; those inaccessible GCPs had to be abandoned. Most GCPs were logged once, with the number of fix points varying widely from 33 to 192. A uniform logging interval of 5 seconds was adopted throughout, and the sky was clear during data logging. The horizontal view at each location was unobstructed except by windbreaks at the intersections of irrigation canals, so the problem of multipath (see Sec. 14.2.2 for more details) was kept to a minimum.

The obtained planimetric coordinates were used to rectify Landsat TM, and SPOT panchromatic (PAN) and multispectral (XL) images based on first-order polynomials. Once the image and reference coordinates of all GCPs were entered into Earth Resources Data Analysis System (ERDAS) Imagine, rectification residuals were automatically calculated. If the RMSE exceeded one pixel size of the image being rectified, the GCPs with the largest residuals were sequentially excluded from the rectification until the overall residual fell within one pixel size for the first time. The reference coordinates were then replaced by map coordinates or by GPS coordinates averaged from a varying number of fix points. Again, the worst GCPs were sequentially removed until the overall residual became smaller than one pixel size.
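The culling procedure just described (refit, inspect the per-GCP residuals, drop the worst point, and repeat until the overall RMSE is within one pixel) is easy to emulate outside ERDAS Imagine. The sketch below uses a first-order fit and fabricated GCPs with one deliberate blunder; it illustrates the logic only and does not reproduce the study.

```python
import numpy as np

def fit_first_order(r, c, E, N):
    """First-order (linear) polynomial fit of ground against image coordinates."""
    X = np.column_stack([np.ones_like(r), r, c])
    a, *_ = np.linalg.lstsq(X, E, rcond=None)
    b, *_ = np.linalg.lstsq(X, N, rcond=None)
    return a, b

def rmse_pixels(r, c, E, N, pixel_size):
    """Overall RMSE and per-GCP residuals, both expressed in pixels."""
    a, b = fit_first_order(r, c, E, N)
    X = np.column_stack([np.ones_like(r), r, c])
    res = np.hypot(X @ a - E, X @ b - N) / pixel_size
    return np.sqrt(np.mean(res**2)), res

def cull_gcps(r, c, E, N, pixel_size, tol=1.0, min_gcps=4):
    """Drop the worst GCP and refit until the overall RMSE falls within tol pixels."""
    r, c, E, N = (np.asarray(v, float) for v in (r, c, E, N))
    keep = np.arange(len(r))
    while True:
        rmse, res = rmse_pixels(r[keep], c[keep], E[keep], N[keep], pixel_size)
        if rmse <= tol or len(keep) <= min_gcps:
            return keep, rmse
        keep = np.delete(keep, np.argmax(res))

# Fabricated GCPs on a 10-m image, with one 80-m blunder in easting.
rng = np.random.default_rng(0)
row, col = rng.uniform(0, 500, 12), rng.uniform(0, 500, 12)
E = 500_000 + 10 * col + rng.normal(0, 3, 12)
N = 4_000_000 - 10 * row + rng.normal(0, 3, 12)
E[3] += 80
keep, overall = cull_gcps(row, col, E, N, pixel_size=10.0)
print(len(keep), round(float(overall), 2))   # the blunder is culled, RMSE drops below 1
```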
5.6.1 Impact of the Number of GCPs
The use of all 25 rural GCPs in rectifying the SPOT PAN image resulted in an unusually poor rectification accuracy of 7.13 pixels (Table 5.4). Of these points, two were at the interface of water and sand dunes that could have moved since the map was produced. Their exclusion from the rectification led to a drastic reduction in the overall residual to 2.49 pixels. This accuracy is similar to the 2.65 pixels achieved for SPOT XL. The number of useable GCPs is reduced to 22 in rectifying the TM image because GCP 11 falls outside the image area. According to these results, there is an inverse relationship between rectification accuracy and the number of GCPs used, regardless of the image being rectified.
No. of GCPs Used    SPOT PAN    SPOT XL    TM
       25             7.13
       24             4.16
       23             2.49        2.65
       22             2.26        1.82     1.34
       21             2.10        1.51     1.10
       20             1.91        1.43     1.01
       19             1.67        1.37     0.93
       18             1.57        1.29
       17             1.46        1.23
       16             1.33        1.16
       15             1.20        1.10
       14             1.12        1.00
       13             1.01
       12             0.90
       11

TABLE 5.4 Accuracy of Image Rectification for the Rural Scene (unit: pixel size)∗
∗Coordinates are averaged from 60 sequential GPS loggings. Source: modified from Gao and Zha, 2006.
Rectification accuracy is not high for any of the images if all GCPs are retained, but the residuals drop drastically after the removal of the worst GCPs. Initially, rectification accuracy improves quickly as a few poor GCPs are excluded. As more and more GCPs are excluded, the pace of improvement slows. This is explained by the fact that not all GCP coordinates are equally reliable, even though they were averaged from the same number of fix points, because of the varying ease with which the points can be identified. The averaged GPS coordinates are markedly inaccurate for only a few GCPs, and their removal is responsible for a substantial decrease in the overall RMSE. After the worst GCPs are abandoned, the RMSEs stabilize, indicating that the remaining GCP coordinates are similarly accurate. If a sufficient number of unreliable GCPs are eliminated from a rectification, the image can be rectified at an acceptable accuracy level. The achievement of this accuracy requires the removal of 12 GCPs for the SPOT PAN image, rendering about half of the GCPs unuseable.
More GCPs are useable with the SPOT XL image owing to its coarser spatial resolution. If the spatial resolution degrades to 30 m, as for Landsat TM imagery, a large majority of the 22 selected GCPs are still useable.
5.6.2 Impact of Image Resolution
Accuracy of image rectification may be expressed in image pixel size or in meters. If expressed in pixel size, the accuracy achieved using nondifferential GPS loggings is correlated inversely with the spatial resolution of the satellite imagery (Fig. 5.17). The overall RMSE is always the largest for SPOT PAN (10-m resolution) and the smallest for Landsat TM (30-m resolution). Despite the greatest ease of pinpointing GCPs on the SPOT PAN image, its fine spatial resolution leaves little room for inaccuracy in locating a GCP within a pixel. On the other hand, the larger pixel size of TM imagery provides ample room to identify a GCP inside a given pixel. Nevertheless, the low spatial resolution of the TM imagery made it more time consuming to pinpoint the GCPs properly; more time was spent on locating the same set of GCPs on the TM image than on the SPOT images. It is comparatively easier to rectify images of a coarse resolution to within one pixel than images of a fine resolution when accuracy is measured against pixel size. In order to achieve a similar accuracy level, more GCPs have to be excluded from rectifying an image of a fine spatial resolution (Table 5.4).
FIGURE 5.17 Relationship between image rectification accuracy and the spatial resolution of the imagery being rectified (urban scene with 30 GPS loggings).
The SPOT PAN image, which has the finest spatial resolution among the three, is the most difficult to rectify because this fine spatial resolution is unable to accommodate the inaccuracy of the GCP coordinates. However, these same coordinates achieved more accurate rectification for coarser resolution imagery. This is especially so if the coordinates have an uncertainty level comparable to the pixel size of the imagery being rectified. The accuracies expressed in pixel size decrease from those shown in Fig. 5.17, the pace of decrease being larger for images of a finer resolution.

If translated into meters, the relationship identified above is mostly reversed. The highest accuracy is associated with SPOT PAN and the lowest with Landsat TM (Gao, 2001). SPOT PAN and XL have a very similar accuracy. The accuracy is noticeably lower for the 30-m Landsat TM because most GCPs at road intersections are difficult to locate precisely on the image; their ease of identification on the 30-m resolution image disappears. The uncertainty in locating the GCPs within a pixel is outstripped by the inaccuracy in identifying them. If the resolution continues to decrease, most of the GCPs will no longer be identifiable on the image at all. The lowest accuracy of TM is thus attributed to the difficulty in identifying the same set of GCPs on the imagery caused by its low spatial resolution.
5.6.3 Impact of GCP Quality
Ease of GCP identification varies with the nature of the scene. In the urban environment, intersections of narrow roads and streets can be precisely located even on coarse resolution satellite images. The rural scene dominated by sand dunes lacks distinct landmarks visible on satellite imagery. There is a limited choice in selecting quality GCPs, which are located less accurately on the topographic map and the satellite imagery than their urban counterparts. Unreliable GCPs had to be selected to make up a number comparable to that of the urban scene for the purpose of comparison. The impact of the nature of the scene on image rectification accuracy is illustrated in Fig. 5.18, in which the numbers of GCPs were standardized to percentages of all GCPs to compensate for the effect of the varying numbers of GCPs used for the urban and rural scenes.

The ease of identifying GCPs on the ground and on the satellite image exerts a direct impact on rectification accuracy for SPOT PAN (Fig. 5.18a). At any given level of GCPs the rural scene is rectified less accurately than its urban counterpart if the image has a fine spatial resolution. When all GCPs are used, the urban image is rectified much more accurately than its rural counterpart. However, the gap in rectification accuracy is gradually bridged as more and more poor GCPs are excluded from rectification. Eventually, the difference in accuracy between the two scenes nearly disappears when half of the GCPs are excluded.
FIGURE 5.18 Comparison of rectification accuracies of three images between rural and urban scenes using various proportions of GCPs: (a) SPOT PAN; (b) SPOT XL; (c) Landsat TM. Residuals are expressed in pixel size. (Source: Gao and Zha, 2006.)
On average, the difficulty of identifying GCPs on the ground degrades rectification accuracy by up to one pixel (the residual is 0.85 pixels larger for SPOT PAN).

The above finding still holds true for the SPOT XL image (Fig. 5.18b). For instance, the urban scene is more accurately rectified at all levels of GCPs, with the average residual being 1.09 pixels lower. Besides, a much higher proportion of the selected GCPs is useable in achieving an acceptable level of rectification accuracy for the urban scene; this percentage drops to 50 for the rural scene due to the difficulty in locating these points accurately. The larger uncertainty surrounding the locating of these points makes an accurate rectification harder to achieve.

As with the two types of SPOT imagery, the urban scene is also rectified more accurately for Landsat TM imagery when all GCPs are retained, even though neither accuracy is acceptable (Fig. 5.18c). However, the difference becomes almost nonexistent after a few GCPs are removed, and the relationship is reversed once the percentage of GCPs used drops to 90, in sharp contrast with the previous two images. Consequently, rectification accuracy is actually higher for the rural scene. The difficulty of identifying GCPs becomes less significant if the image being rectified has a coarse resolution; in this case the nature of the scene exerts little influence on rectification accuracy. Owing to the image's coarser resolution a larger portion of the selected GCPs is useable to achieve the required accuracy.
5.7 Image Orthorectification

The image georeferencing covered so far has concentrated on the horizontal position (E, N) of pixels in the output image, while their elevation (H) on the ground has not been given any consideration. This practice is acceptable for spaceborne satellite imagery of a coarse to medium spatial resolution, in which the topographic relief–induced shift in pixel position is negligible, or in applications in which precise geographic location is not a primary concern. In local applications (e.g., urban planning) involving very high spatial resolution satellite imagery, the geometric position of pixels needs to be determined accurately as well. Such image georeferencing must take into consideration the minor shift in pixel position caused by topographic relief. This brings up the issue of image orthorectification. Topics covered in this section include the differences between orthographic and perspective projections, and the methods and procedures of image orthorectification.
5.7.1 Perspective versus Orthographic Projection
There are two types of projection from a 3D surface to a 2D medium: central perspective and parallel (orthographic) (Fig. 5.19). In the former, the entire field of view is sensed from a single point in space, such as the geometric center of the camera lens. This kind of projection is commonly associated with vertical aerial photographs taken with a frame camera. If the topography has relief, the horizontal position of pixels on the photograph is no longer correct except at the nadir position (Fig. 5.19a). The magnitude of this positional shift, or relief displacement, is a function of the relief, the altitude of the sensor H, and the focal length of the camera f. Unlike aerial photographs, topographic maps have an orthographic projection in which every point on the Earth's surface is viewed from directly above, along parallel lines of sight (Fig. 5.19b). In this kind of projection there is no relief displacement, so all indicated positions are correct.

Orthorectification is the process of transforming a central perspective image into an orthogonal image by removing positional displacement caused by topographic relief from the input image, in addition to providing the ground coordinates for all pixels. The effects of other conditions during image acquisition, such as variation in viewing geometry and platform attitude, and Earth rotation, are also removed from the rectified image, just as in standard image georeferencing. Images that have been orthorectified are termed orthoimages; they have a uniform scale and no relief displacement. Generation of true orthoimages requires a digital surface model in which any objects (e.g., buildings and bridges) that cause relief displacement are described.
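For a vertical photograph, relief displacement is commonly approximated as d = r·h/H, where r is the radial distance of the image point from the nadir, h the height of the ground point above the datum, and H the flying height above the datum. This is a relation from standard photogrammetry rather than from the text above, and the numbers in the sketch are purely illustrative.

```python
def relief_displacement(radial_dist_mm, height_above_datum_m, flying_height_m):
    """Approximate relief displacement on a vertical photo: d = r * h / H."""
    return radial_dist_mm * height_above_datum_m / flying_height_m

# Illustrative values: a point imaged 80 mm from the nadir, 50 m above the datum,
# photographed from 3000 m above the datum.
print(f"{relief_displacement(80.0, 50.0, 3000.0):.2f} mm displacement on the photo")
```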
FIGURE 5.19 Comparison of central perspective projection, commonly associated with aerial photographs (a), with orthographic projection, commonly associated with orthophotos (b).
Usually carried out for large-scale, hyperspatial resolution images or airborne photographs, orthorectification is recommended for mountainous terrain and for remote sensing materials that are to be used to construct 3D models of the scene. Orthorectification, however, is not beneficial if the scene has a relatively flat terrain. It is not recommended for small-scale images obtained at an altitude much higher than the topographic relief, as the amount of topography-induced displacement in pixel position is negligible on such images.
5.7.2 Methods of Image Orthorectification
The precondition of image orthorectification is the establishment of a relationship between the image coordinates (r, c) and the ground coordinates (E, N, H). For spaceborne satellite imagery, the construction of this relationship relies on the exterior and interior orientation parameters (e.g., position and orientation) of the sensor, with the assistance of 3D GCPs. Image orthorectification may be implemented nonparametrically or parametrically (Hemmleb and Wiedemann, 1997). Nonparametric approaches, such as polynomial transformation and projective transformation, are very similar to the 2D polynomial-based image rectification covered in Sec. 5.5 except that the height of the GCPs is also considered; no information on the sensor is utilized in the rectification. This is in drastic contrast to parametric approaches, in which the image coordinates of all pixels are transformed to ground coordinates based on information on the interior and exterior orientation of the sensor. These approaches include differential rectification, sensor-specific model rectification, and RFM rectification. Differential rectification refers to the individual transformation of pixel values from the input image to an output image that has the right geometry (i.e., distortion free). Both camera distortions and relief displacement are removed from the rectified photographs and satellite images, which may be further refined using GCPs.

A sufficiently large number of 3D GCPs is essential in both parametric and nonparametric approaches. This number may be reduced through the deployment of mathematical models, such as the bundle-adjustment model for overlapping 3D aerial photographs. Through photogrammetric bundle adjustment, satellite images can be orthorectified from satellite orbital parameters. It is also possible to integrate the parametric methods with the nonparametric ones.

Image orthorectification may be based on a physical or sensor-specific model such as the rigorous collinearity equations. In this model the ground coordinates of a point A (EA, NA, HA) in the ground coordinate system are related to its image coordinates (xa, ya) (Fig. 5.20) directly through the collinearity equations [Eqs. (5.25) and (5.26)]. The establishment of these equations is based on the principle that the projection center O of a central perspective image (e.g., the center of the camera lens), an object point A on the ground, and its photographic image a lie on a straight line.
FIGURE 5.20 Relationship between the coordinates of point a (xa, ya) on the image and A (EA, NA, HA) on the ground.
xa − x0 = −f [r11(EA − EO) + r12(NA − NO) + r13(HA − HO)] / [r31(EA − EO) + r32(NA − NO) + r33(HA − HO)]    (5.25)

ya − y0 = −f [r21(EA − EO) + r22(NA − NO) + r23(HA − HO)] / [r31(EA − EO) + r32(NA − NO) + r33(HA − HO)]    (5.26)
where
    f = calibrated camera focal length
    (x0, y0, f) = interior orientation parameters of the input image
    (EO, NO, HO) = coordinates of the exposure center in the ground coordinate system
    rij (i = 1, 2, 3; j = 1, 2, 3) = elements of the rotation matrix R [Eq. (5.27)], calculated from the three rotation angles (ω, ϕ, κ) with respect to the geocentric coordinate system, or

        ⎛ r11  r12  r13 ⎞
    R = ⎜ r21  r22  r23 ⎟
        ⎝ r31  r32  r33 ⎠

        ⎛  cos ϕ cos κ    cos ω sin κ + sin ω sin ϕ cos κ    sin ω sin κ − cos ω sin ϕ cos κ ⎞
      = ⎜ −cos ϕ sin κ    cos ω cos κ − sin ω sin ϕ sin κ    sin ω cos κ + cos ω sin ϕ sin κ ⎟
        ⎝  sin ϕ          −sin ω cos ϕ                       cos ω cos ϕ                     ⎠    (5.27)
Two methods are available for solving the above collinearity equations:

• The first method is to define a uniform grid over the orthophoto plane (datum). For every grid cell (X, Y) in this plane, the corresponding height is interpolated from neighboring pixels. These coordinates are then plugged into the collinearity equations to calculate their coordinates in the image. The pixel value at this determined position is then resampled from its neighboring pixel values using the methods described in Sec. 5.5.4. This process is repeated for all other pixels in the orthophoto plane (a code sketch of this grid-based procedure is given at the end of this subsection).

• Alternatively, the equations are rearranged in a polynomial form. The transformation from the object to the image space polynomials can have the fourth order, with 14 and 15 terms for the basic and extended forms, respectively. The extended form enables finer influences to be modeled, such as quadratic terms of altitude change of the sensor. A higher order of transformation requires more GCPs. Starting from a regular digital elevation model (DEM), the nodes are transformed into pixel space and used as anchor points to bilinearly interpolate the pixel coordinates of the remaining orthophoto pixels (Vassilopoulou et al., 2002).

Designed for rectifying stereoscopic aerial photographs (e.g., analytical aerotriangulation), image orthorectification based on the collinearity equations is the most suited to rectifying frame images accurately, achieving an accuracy as high as a fraction of a pixel. Furthermore, it
can be extended to rectify a block of overlapping images or photographs via tie points (TPs), which are distinctive landmarks in the overlapping portion of stereoscopic images/photographs. These TPs should possess the same characteristics as GCPs except that they should be located at the corners and midway along the border of the overlapping zone. With the use of these TPs, the number of GCPs required on individual images can be drastically reduced, even to zero. As with standard image rectification, the ground coordinates are modeled as functions of the image coordinates using the principle of least-squares adjustment (Zhou and Jezek, 2004). With modification this method can be used to orthorectify SPOT images obtained through along-track scanning; in this case the equations apply to individual lines of an image instead of a whole frame.
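The grid-based solution of the collinearity equations mentioned above can be sketched as follows: for each orthoimage cell, take its height from the DEM, project the ground point into the image with Eqs. (5.25) and (5.26), and resample a pixel value there. The rotation matrix, exterior orientation, and the image_lookup resampler are placeholders to be supplied by the caller, and the principal point offsets x0, y0 are assumed to be zero.

```python
import numpy as np

def collinearity_image_coords(E, N, H, exposure_center, R, f):
    """Project a ground point into photo coordinates using Eqs. (5.25) and (5.26),
    assuming the principal point offsets x0 and y0 are zero."""
    EO, NO, HO = exposure_center
    u, v, w = R @ np.array([E - EO, N - NO, H - HO])
    return -f * u / w, -f * v / w

def orthorectify(dem, grid_origin, cell, exposure_center, R, f, image_lookup):
    """Fill an orthoimage grid the size of the DEM; image_lookup(x, y) must return
    a resampled pixel value (e.g., nearest neighbor) at photo coordinates (x, y)."""
    rows, cols = dem.shape
    ortho = np.zeros((rows, cols), dtype=float)
    E0, N0 = grid_origin                         # ground coordinates of the upper-left cell
    for i in range(rows):
        for j in range(cols):
            E, N = E0 + j * cell, N0 - i * cell  # northing decreases down the grid
            x, y = collinearity_image_coords(E, N, dem[i, j], exposure_center, R, f)
            ortho[i, j] = image_lookup(x, y)
    return ortho
```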
5.7.3 Procedure of Orthorectification
Image orthorectification must be preceded by a number of preliminary steps, including data input, selection of GCPs/TPs, specification of a rectification model, derivation of the DEM, and the actual rectification (Fig. 5.21). Selection of GCPs/TPs has been covered in Sec. 5.5.3, except that they should be representative of the whole planimetric and height range with a balanced distribution.
Collection of GCPs/TPs
Selection of projection model Import Existing DEM
Lines, contours, TINs
Spatial Interpolation
Stereo images
Photogrammetric method
Construction of DEM
Orthorectification
FIGURE 5.21 Major steps preceding image orthorectification.
This means that GCPs should be dispersed throughout the area covered, and TPs should be located widely within the overlapped images. Unlike GCPs, which must be identifiable on the ground or on topographic maps, TPs need to be identifiable only in the overlapped portion of adjoining images or photographs.

Some of the rectification models introduced in Sec. 5.4 are also usable for image orthorectification, including the simple affine transformation and projective transformation models. In particular, the traditional differential rectification model is able to remove topographic relief displacement. In addition, several sensor-specific models such as RPCs are also applicable to orthorectification. At present the RPC method applies to IKONOS Geo Ortho Kit images, QuickBird Ortho Ready Standard imagery, OrbView-3, and SPOT. Supplied with these images is an ancillary file containing the RPC parameters. With the increasing coupling of sensors with GPS units, the RPC model is likely to find more applications to other very high resolution satellite images in the near future.

A critical preliminary step in image orthorectification is the construction of a DEM. It should cover the same geographic area as the imagery being rectified. Preferably, it should have the same spatial resolution as the imagery. For instance, the DEM should have a cell size of 30 m if the imagery being orthorectified is from Landsat TM. DEMs can originate from a wide range of sources. Coarse resolution (e.g., 90 m) DEMs, such as those obtained from the Shuttle Radar Topography Mission, are freely available for the global terrestrial surface. DEMs at a spatial resolution finer than this have to be constructed from stereoscopic pairs of aerial photographs by means of digital photogrammetry. Alternatively, they can be created from existing digital contour data, TINs, or raster data via a process known as spatial interpolation (Fig. 5.21). With their increasing availability, light detection and ranging (LiDAR) data are another source of reliable elevational information. It is important to construct the DEM at the highest accuracy possible because the horizontal shift in the position of a pixel is calculated from its elevation in the DEM. The accuracy of elevation information governs the quality of image orthorectification.

Prior to the application of the constructed DEM in orthorectification, attention must be paid to the datum from which elevation is referenced, in particular, whether the DEM and the height of the sensor are referenced to the same datum. Usually called the orthometric height, elevation in most DEMs is referenced to the mean sea level. Globally, mean sea level is a broadly undulating surface known as the geoid (Fig. 5.22). In contrast, satellite height is referenced to an ideal earth-centered ellipsoid whose geometric shape is mathematically defined, such as the World Geodetic System (WGS) 1984. The vertical discrepancy between the geoid and ellipsoid surface at any location,
FIGURE 5.22 Precondition of orthorectification and geoid height in vertical section (exaggerated).
called the geoid height, ranges from −100 to +100 m (Smith, 2006). The elevation recorded in the DEM must be modified to take this discrepancy into consideration in order to achieve accurate orthorectification.

Once the DEM has been modified to the proper datum, the orthorectification process can proceed. After the four corner positions are determined, an empty orthoimage is created. This image is then sampled at a regular interval equivalent to the spatial resolution of the input image (Fig. 5.23). The height at each of the grid cells is then determined from the DEM and used to calculate the relief displacement, from which its position on the surface (E, N, H) is determined. These ground coordinates are then used to find the image coordinates of the pixel in the image space, using Eqs. (5.25) and (5.26). As with regular image georeferencing, the output orthoimage is created through resampling, such as nearest neighbor or bilinear interpolation. The equivalent location of every pixel in the input image is calculated from the height (H) and exterior orientation parameters. The rectified image needs to be further refined because the georeferencing information contained in the ancillary file refers to the position of the satellite in its orbit and an average elevation for the whole scene. These average conditions used to determine the four corner points may differ from the actual condition for a particular image.
FIGURE 5.23 Relationship among a ground point (1), its elevation and position on the ground and on the remote sensing imagery (3), and the sensor (2) used in orthorectification. Key: 1, pixel height in the DEM; 2, exterior orientation of the sensor; 3, image coordinate and pixel value; 4, equivalent position on the orthoimage.
Any discrepancy between them will translate into errors in the position of the orthorectified image. Further refinement with the use of GCPs is accomplished in the same manner as described in Sec. 5.5. In comparison to planimetric accuracy, a high vertical accuracy is more difficult to achieve in orthorectification because it is affected by more variables. In addition to the regular factors such as the quantity and quality of GCPs/TPs and the inaccuracy of their coordinates, the accuracy of image orthorectification is also significantly influenced by the quality of the DEM and topographic relief. A higher accuracy is achievable over gently rolling terrain than over a mountainous area of large relief. This accuracy can be as small as RMSE < 2.0 m (0.9 to 2.0 m) for IKONOS imagery except at the most heterogeneous site (2.6 m) (Wang and Ellis, 2005), an accuracy level meeting the U.S. National Map Accuracy Standards for 1:12,000 to 1:4800 maps. However, the accuracy level (e.g., RMSE at independent check points) degraded to ±5.1 to ±5.7 m for coarse resolution SPOT level 1B stereo images (Al-Rousan et al., 1997). The residuals in elevation ranged from ±4.4 to ±7.7 m even when all available GCPs were used in the absolute orientation.
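The per-cell procedure described in this section can be compressed into a short sketch. The fragment below assumes a project_to_image() function such as the ground_to_image() sketch given earlier, a resampling function sample(), and a geoid-undulation grid co-registered with the DEM; all of these are assumptions for illustration and are not prescribed by the text.

    import numpy as np

    def orthorectify(dem, geoid, e_origin, n_origin, cell, image, project_to_image, sample):
        # Backward resampling: for every orthoimage cell, correct its height to the
        # sensor datum, project it into the raw image, and resample a pixel value.
        rows, cols = dem.shape
        ortho = np.zeros((rows, cols), dtype=float)
        for i in range(rows):
            for j in range(cols):
                E = e_origin + j * cell            # ground easting of the cell
                N = n_origin - i * cell            # ground northing of the cell
                H = dem[i, j] + geoid[i, j]        # orthometric height plus geoid height
                x, y = project_to_image(E, N, H)   # Eqs. (5.25) and (5.26)
                ortho[i, j] = sample(image, x, y)  # nearest neighbor, bilinear, etc.
        return ortho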
5.8 Image Direct Georeferencing

With the advent of GPS technology, it is possible to couple it with an inertial navigation system (INS), also known as an inertial measurement unit (IMU), in the acquisition of remote sensing data. In addition to easing aircraft navigation, this integration considerably facilitates georeferencing of remote sensing imagery. For instance, during a flight to acquire aerial photographs there is no need for the pilot to follow a rigidly predefined flight path, thanks to the "in-flight alignment" afforded by GPS. The INS is able to provide a continuous high-bandwidth measurement of position and velocity after the noisy velocity from GPS outputs is smoothed (Skaloud, 2002). The information on sensor position and exterior orientation at the time of imaging generated by a GPS-aided INS makes it possible to directly georeference images without ground control (Schwarz et al., 1993). Image direct georeferencing is a process of restoring the image orientation from in-flight measured exterior orientation parameters of the sensor without reliance on ground control. The position of all pixels on this restored image can be translated into ground coordinates according to their internal mathematical relationship.

The concept behind GPS-aided INS for direct georeferencing dates back to the late 1980s and early 1990s, with the first system commercialized in 1996. Since then tremendous progress has been made, and the capabilities of this new technology have been fully exploited. Now, image direct georeferencing has been accepted as an augmentation to, and replacement of, aerial triangulation. With advances in computing and the wide use of digital cameras in aerial photography, direct image georeferencing is quickly becoming the de facto industry standard. It has evolved to such a degree that the traditional workflow of data acquisition, data processing, and map production can be accomplished in one step, thus revolutionizing our perspective of mapping science altogether. High quality mapping products are generated in a much simplified process. In this section the principle of direct georeferencing is presented first, followed by a comparison of its performance with the conventional GCP-based image rectification.
5.8.1 Transformation Equation

As illustrated in Fig. 5.20, the basic concept of direct georeferencing is expressed mathematically as

$$R_A = R_O + s_a\, R_{INS}\, R^{c}_{INS}\, r_a \qquad (5.28)$$
where RA is the coordinates of point A, (EA NA HA)^T, or its georeferenced position in the ground coordinate system, and RO is the 3D coordinates (EO NO HO)^T of the exposure center of the imaging sensor at the instant of exposure, determined from the INS/GPS. sa is the scale factor of the image; it is implicitly derived during the photogrammetric reconstruction of a 3D model from a pair of stereoscopic images, so it does not need special computation to determine. RINS is the rotation matrix involving the three orientation angles ω, φ, and κ derived from the INS/GPS integration. ra is the vector of coordinates of point a in the focal plane, measured from the principal point of the photograph. It is expressed as

$$r_a = \begin{pmatrix} e_a - e_{pp} & \dfrac{n_a - n_{pp}}{k} & -f \end{pmatrix}^{T} \qquad (5.29)$$

where
(ea, na) = image coordinates of point a corresponding to the ground point A
(epp, npp) = offsets of the principal point from the CCD center
k = factor accounting for the nonsquare shape of the CCD pixels
f = calibrated focal length of the camera lens

R^c_INS (boresight) is the transformation matrix that rotates the INS body-frame into the camera frame, or the INS/camera orientation offset (Mostafa and Schwarz, 2000). Similar to RINS and RO, R^c_INS is a function of time. Its value is based on time measurements from a spacecraft constellation. There are two ways of computing it: it is either measured with an extra sensor carried aboard or obtained via tight coupling of the INS with the camera during photography. In the second method, a tight bundling of the INS and the camera during photography is advantageous because it keeps the orientation offset constant. However, the camera model needs validation in a test flight over a permanent ground field or a part of the area to be photographed. Test flights are also needed to calibrate the bundled system and the INS boresight R^c_INS, a pivotal parameter in transforming the INS-measured attitude angles into photogrammetric angles with respect to a local mapping frame of reference. Since the INS and the camera can rotate independently of each other, the offset needs to be computed for each flight if the INS is detached from the camera between flights. If the two are permanently coupled, then the constant offset needs only occasional calibration.

The accuracy of direct georeferencing is affected by a number of factors, such as the GPS/INS system used, the reliability of the coefficients (e.g., how accurately the system is calibrated overall), and camera interior orientation parameters (Skaloud, 2002). Integration of GPS and INS allows the determination of the attitude (roll, pitch, heading) and position (E, N, H) of the camera at the time of exposure.
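Equations (5.28) and (5.29) translate into a one-line vector expression. The sketch below assumes that the rotation matrices and the scale factor have already been obtained from the GPS/INS solution and the photogrammetric model; the names are illustrative only and do not come from any published implementation.

    import numpy as np

    def direct_georeference(R_O, s_a, R_INS, R_c_INS, e_a, n_a, e_pp, n_pp, k, f):
        # Eq. (5.29): focal-plane coordinates of point a; k corrects for nonsquare pixels
        r_a = np.array([e_a - e_pp, (n_a - n_pp) / k, -f])
        # Eq. (5.28): ground coordinates of point A from the measured exterior orientation
        return R_O + s_a * (R_INS @ R_c_INS @ r_a)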
                  Northing    Easting    Height     ω          φ          κ
Average error     0.006       0.014      −0.005     −0.450     0.000      0.000
Standard error    0.121       0.061      0.224      19.581     16.783     32.146

Source: Kinn 2002.

TABLE 5.5 Difference between Camera Exterior Orientation Provided by GPS/IMU and Aerotriangulation (Position in Meters and Angles in Arc Seconds)
Their accuracy can be improved by reconciling the errors in these systems through a Kalman filter that provides error estimates for the camera's position and attitude. These estimates can be used to modify the attitude and position to achieve higher accuracy. Consequently, positions can be accurate to a decimeter, and angles to approximately 20 arc seconds in roll and pitch and to 30 arc seconds in heading (Table 5.5). Residual orientation errors are most likely due to misalignment of the inertial platform during flight. These accuracies vary with the camera/INS system. A better accuracy level has been achieved with the Vexcel UltraCam D digital imaging system with a total of 20 GCPs (Ip et al., 2006). As shown in Table 5.6, whether the photograph is panchromatic or color makes little difference in the orientation parameters, even though panchromatic images produced a slightly better TP ratio than color images (Kinn, 2002). Precise determination of the boresight requires a small block adjustment with some GCPs. If logged with a GPS device, the positioning accuracy of GCPs is affected by whether the GPS loggings are differentially corrected and by the capability of the GPS system in resolving ambiguities. Usually, GPS positioning accuracy falls within the range of 0.05 to 0.5 m, depending on the baseline length and atmospheric condition. The better accuracy in Table 5.6 is due to the use of GCPs.
                      Position, m                  Orientation, arc min
Photograph            E        N        H          ω        φ        κ
Panchromatic          0.03     0.03     0.07       0.11     0.12     0.29
Color                 0.03     0.04     0.06       0.11     0.12     0.32
Pan (boresight)                                    0.11     0.12     0.29
Color (boresight)                                  0.11     0.12     0.32

Source: Ip et al., 2006.

TABLE 5.6 RMS Residuals of Photocenter Position, Orientation, and Boresight Calibration
The accuracy of determining the position and orientation of the sensor using direct georeferencing lies typically between 10 and 20 cm (RMS) and 15 and 30 arc seconds (RMS), respectively (Lithopoulos et al., 1999). This level of accuracy has been confirmed by Kinn (2002). Evaluated against 24 GCPs, a ground positioning accuracy of around 1 m was achieved in both easting and northing (Mostafa and Schwarz, 2000). The accuracy for height is lower, ranging from 1.5 to about 3 m. Horizontal coordinates and height have a standard deviation of 0.9 and 1.8 m, respectively. The ground positioning accuracy was further improved to 0.5 m in planimetry (1σ) and 1.6 m in height from stereopairs at an average image scale of 1:12,000 (Mostafa and Schwarz, 2001). The highest ground positioning accuracy achieved is 0.2 m in planimetry and 0.3 m in height using GPS/INS-aided block triangulation of both nadir and oblique images. This accuracy level is sufficient for mapping at scales <1:5000.
5.8.2 Comparison with Polynomial Model
As shown in Eqs. (5.13) and (5.14), polynomial-based image transformation handles only the horizontal position of pixels. It is unable to deal with the third dimension of pixels (i.e., height). Therefore, any positional shift caused by topographic relief on the ground cannot be removed via the application of the two transformation models. The polynomial method is suitable for georeferencing satellite images obtained at a very small scale. If ground control is available, this method is preferable in transforming images from the local coordinate system to the global system. In this way it is feasible not only to coregister multiple images but also to correct geometric distortions inherent in the input image. Very easy to implement, polynomial image rectification is advantageous over direct georeferencing in that it does not require information on satellite orbit and sensor calibration (El-Manadili and Novak, 1996). However, it has the following three disadvantages:

• First, it requires a large number of well-distributed GCPs.

• Second, it lacks physical interpretation of the model beyond the second order.

• Third, it is unable to handle the positional shift caused by topographic relief displacement.

Of the two sources of geometric distortions in remote sensing images, those caused by orbital parameters exert a global impact on all pixels in an image. With the polynomial model, such distortions can be effectively dealt with. However, those caused by variation in topography are impossible to address. The influence of topographic relief is local and random in nature. This influence can be effectively tackled by a stochastic approach of
image rectification in which the spatial variation in terrain is taken into account through a DEM. This DEM-based approach is particularly favored for geometric rectification when GCPs are not easily obtainable, such as when it is prohibitively expensive to collect ground control in inaccessible areas or when it is difficult to identify GCPs on imagery (e.g., radar imagery) as a result of topographic shadow and radar layover.

By comparison, direct georeferencing makes use of the orbital parameters of the platform instead of relying on GCPs in rectifying images. It is the only method available in areas where the selection of quality GCPs is hampered by the absence of distinct landmarks (e.g., coastal areas). Even without the need for ground control, direct georeferencing enables the computation of 3D positions of pixels that appear in the FOV of the sensing system. Elimination of the necessity to establish ground control saves a large amount of labor and time, and minimizes the cost of georeferencing images. Besides, it is also possible to georeference images in real time. This capability is particularly useful in emergency situations in which ground control is almost impossible to obtain instantaneously, such as fire fighting, oil spills, or leaking pipelines. Image direct georeferencing is especially important in mobile mapping where the scene keeps changing constantly.

Nevertheless, image direct georeferencing is limited by its complexity in converting the GPS/INS orientation parameters to the parameters of the sensor. It is also computationally intensive, although this problem has become less severe with the advent of more powerful and faster computers. At present, image direct georeferencing is associated only with aerial mapping in which a block of airborne photographs is georeferenced simultaneously. Through incorporation of the exterior orientation parameters of overlapping photographs in the aerotriangulation adjustment, rectification is achieved much more accurately, quickly, and efficiently for many pairs of stereoscopic images than by rectifying individual images. Requiring the focal length of the camera, image direct georeferencing is suitable for frame photographs taken with an analog or digital camera. Direct georeferencing is also applicable to very high resolution spaceborne imagery if it is obtained by scanning the CCD plane. With such frame-based images, it is possible to achieve excellent results. However, accuracy will be much lower if the satellite images have a poly-central perspective projection, such as those obtained via pushbroom scanning. Besides, direct georeferencing brings fewer advantages for satellite imagery than for stereoscopic aerial photographs since satellite imagery contains minimal overlap. Each image has to be georeferenced individually rather than in a block of tens or even hundreds of images. This disadvantage explains why no direct georeferencing systems have been developed for satellite images yet.
So far no studies have been carried out to compare the accuracy levels of image direct georeferencing with GCP-based polynomial image georeferencing, so it remains unknown how the two methods differ from each other in their accuracy. Irrespective of their relative performance, a high level of georeferencing accuracy is certainly achievable if image direct georeferencing is combined with the GCP-based polynomial method.
5.9 Image Subsetting and Mosaicking

5.9.1 Image Subsetting
Image subsetting is the process of delimiting a small area from an input image that covers a ground area larger than is necessary (e.g., a full-scene image). It is a vital processing step in remote sensing applications in which the area under study makes up a small portion of the full-scene image. Through image subsetting the image size is kept small, which is conducive to expediting all subsequent image processing. Image subsetting can be accomplished using two sets of parameters: row/column numbers, or a boundary file.

The former method requires a pair of coordinates defining the two opposite corners (e.g., upper left and lower right) of a subimage (Fig. 5.24). One pair of the coordinates can be substituted by the physical dimension of the image to be subset. When subsetting an image, it is important to bear in mind that the first row (column) and the last row (column) of the subset image are both counted. So the subtraction of the first row (column) from the last row (column) will not produce the correct image dimension; instead, the difference should be incremented by 1. This row/column number method is applicable to raw images in a local coordinate system in which the exact boundary of the study area is
FIGURE 5.24 A 512 by 512 subscene subset from a full-scene SPOT image using a pair of row and column numbers. (Copyright CNES, 1994.) See also color insert.
unknown. The subset image is always square or rectangular in shape, covering a smaller area than that of the input image. The area falling outside the defined bound is discarded. There is no limit to the number of spectral bands in the input image. The number of output bands is usually kept the same as that of the input image.

If the satellite images have been georeferenced to a global system already, it is more appropriate to subset the area of interest using a polygon known as an area of interest (AOI) in ERDAS. This polygon file can be imported from an existing GIS database or digitized manually on screen. Its intersection with the input image enables a subimage to be cut out. Unlike the row/column number method, an irregularly shaped polygon allows the study area to be defined more precisely than a regular square or rectangle (Fig. 5.24). Images subset using an AOI are likely to have an irregular boundary (Fig. 5.25) that closely follows the study area. Image subsetting using an AOI has the advantage of a reduced file size and less processing in subsequent steps as the background, rendered as black in the figure, is automatically excluded from analysis. In terms of data storage, less space is needed as all background has the same value of zero. Image subsetting needs to be undertaken only once if the raw image has been
FIGURE 5.25 A subimage subset with the AOI tool in ERDAS Imagine. AOI allows an irregularly shaped study area to be defined. This has the advantage of limiting the area to reduce the file size and avoid misclassification.
georeferenced already. In this case the AOI method is the more appropriate choice. In the image processing flowchart (Fig. 1.2), image subsetting is presented ahead of image rectification. Apparently, this sequence applies to the raw image; in the case of georeferenced images, image subsetting takes place after image rectification. In practice, image subsetting may take place twice. The first time a much larger area than the study area is delimited. This processing is necessary as the rectified image may not be oriented properly, introducing void areas into the final image after rotation. A second subsetting is needed to make the final image have a regular shape or one conforming to the outline of the study area defined by the AOI.
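Both subsetting options reduce to simple array operations. The sketch below (NumPy, illustrative only) shows the row/column method, including the increment of 1 noted above, and an AOI-style mask in which background pixels are set to zero; the function names are hypothetical.

    import numpy as np

    def subset_by_rowcol(image, first_row, first_col, last_row, last_col):
        # Row/column subsetting; both the first and last rows (columns) are counted,
        # so the dimension is (last - first) + 1
        n_rows = last_row - first_row + 1
        n_cols = last_col - first_col + 1
        return image[first_row:first_row + n_rows, first_col:first_col + n_cols]

    def subset_by_aoi(image, aoi_mask):
        # AOI subsetting: keep pixels inside the polygon mask, set the rest to zero
        return np.where(aoi_mask, image, 0)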
5.9.2 Image Mosaicking
Opposite to image subsetting, image mosaicking is the process of stitching multiple images or digital photographs of the same area together to form a larger image. It is needed when the study area is covered by multiple images. Image mosaicking is quite common with aerial photographs and hyperspatial resolution satellite images because they cover a limited ground area per scene. Covering a sizeable study area requires multiple images, which must be mosaicked to form one image for the convenience of subsequent analyses. Image mosaics fall into three major categories: index, uncontrolled, and controlled. An index mosaic is created from analog photographs that have been properly aligned. Generated for reference purposes, it does not involve digital processing and is not discussed further. By comparison, uncontrolled mosaics are generated from raw images without geometric rectification, whereas controlled mosaics must be produced from georeferenced images.

Mosaicking of nongeoreferenced images relies on the spatial continuity of the same ground features in multiple overlapping images. Component images are stitched together through visually examining these features to ensure their uninterrupted continuity across the border of multiple images. It is usually undertaken in computer systems that are unable to preserve the geometric properties of images, such as Adobe Photoshop. In this environment a pair of images is mosaicked through manually shifting, rotating, and scaling one of them, usually the slave one. Once a mosaic is created, it is stitched with the third image in a similar manner. The only difference is that geometric change can be applied only to the new image but not to the mosaic. This process continues until all images have been inserted into the final mosaic. The accuracy of the mosaicked image is subject to the visual acuity of the analyst and the amount of distortion inherent in the component images. Geometric inaccuracy in any of the input images other than rotation cannot be eliminated during
stitching. It accumulates in the mosaicked image. For instance, the mosaicking of the third component image with the mosaic of the first two images is subject to the inaccuracy of either of them. Therefore, the accuracy of the generated mosaic degrades very quickly as the number of images added to the mosaic increases. The final mosaic is thus imprecise if the component images contain nonlinear distortions that cannot be removed through rotation and scaling. Therefore, this uncontrolled method is not recommended for mosaicking remotely sensed images.

By comparison, it is much easier to produce a controlled mosaic from georeferenced images in an image processing system that preserves the geometric information of an image, such as ERDAS Imagine. In this environment, an empty mosaic is created first. Afterward, all component images are loaded into it. Since they have been georeferenced to the same ground coordinate system, the machine recognizes their spatial position in the mosaic automatically according to their geographic coordinates. Thus, the component images do not need to overlap each other. If they do overlap, the overlapped portion is either trimmed or left untrimmed in the resultant mosaic. To produce an untrimmed mosaic, several options are available to specify the output pixel values in the overlapped portion, such as averaging, minimum, or maximum (a schematic sketch of these options follows at the end of this section). In the resultant controlled mosaic, geometric distortions are noncumulative. Instead, they are restricted to individual images. Thus, the accuracy of controlled mosaics is the same as the accuracy of the individual georeferenced component images. However, spatial discrepancy in the position of the same features in two adjacent images cannot be reconciled manually during mosaicking, no matter how large it is.

Both controlled and uncontrolled mosaicking face the same issue of radiometric inconsistency across multiple images. Prior to mosaicking it is possible to unify the radiometric properties of all component images through some kind of image processing. This task becomes much easier if the component images are black and white. Their radiometry can be matched closely by unifying the histograms of both images, or by making them have the same mean and standard deviation (see Sec. 6.1.6 for more information). However, the images will not resemble each other radiometrically (Fig. 5.26) because the ground features covered vary in their proportion. It is also rare that the mosaic will have a uniform tone. The task of unifying image radiometry is much more challenging with color images as color has three dimensions of hue, saturation, and brightness, as against the single tone of a black-and-white photograph. Unless the radiometry of all images can be unified to an acceptable level, it is recommended that the mosaicked image not be used for any quantitative analyses.
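For two georeferenced images already placed on a common grid, the treatment of the overlap in an untrimmed controlled mosaic reduces to a per-pixel rule. The fragment below is a schematic illustration of the averaging, minimum, and maximum options mentioned above, under the assumption that zero marks background (no data); packages such as ERDAS Imagine wrap this in their own interfaces.

    import numpy as np

    def blend_overlap(a, b, rule="average"):
        # Combine two co-registered images; zeros are treated as background (no data)
        both = (a > 0) & (b > 0)                   # overlapped pixels
        out = np.where(a > 0, a, b).astype(float)
        if rule == "average":
            out[both] = (a[both].astype(float) + b[both]) / 2.0
        elif rule == "minimum":
            out[both] = np.minimum(a[both], b[both])
        elif rule == "maximum":
            out[both] = np.maximum(a[both], b[both])
        return out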
FIGURE 5.26 A controlled trimmed mosaic produced in ERDAS Imagine. Tonal variation across photographs means that the mosaic is not suitable for being analyzed digitally with a computer.
References

Al-Rousan, N., P. Cheng, G. Petrie, T. Toutin, and M. J. V. Zoej. 1997. "Automated DEM extraction and orthoimage generation from SPOT level 1B imagery." Photogrammetric Engineering and Remote Sensing. 63(8):965–974.
El-Manadili, Y., and K. Novak. 1996. "Precision rectification of SPOT imagery using the direct linear transformation model." Photogrammetric Engineering and Remote Sensing. 62(1):67–72.
Gao, J. 2001. "Non-differential GPS as an alternative source of planimetric control for rectifying satellite imagery." Photogrammetric Engineering and Remote Sensing. 67(1):49–55.
Gao, J., and Y. Zha. 2006. "Integration of GPS and remote sensing into GIS: A case study of rectifying satellite imagery using uncorrected coordinates in different scenes." Geocarto International. 21(4):59–65.
Hemmleb, M., and A. Wiedemann. 1997. "Digital rectification and generation of orthoimages in architectural photogrammetry." CIPA International Symposium, IAPRS, XXXII, Part 5C1B:261–267, International Archives of Photogrammetry and Remote Sensing, Göteborg, Sweden.
Ip, A., W. Dillane, A. Giannelia, and M. Mostafa. 2006. "Georeferencing of the UltraCam D images—Boresight calibration results." Photogrammetric Engineering and Remote Sensing. 72(1):9.
Kinn, G. 2002. "Direct georeferencing in digital imaging practice." Photogrammetric Engineering and Remote Sensing. 68(5):399, 401–402.
Leica Geosystems. 2006. ERDAS Imagine Tour Guides. Leica Geosystems Geospatial, Norcross, Georgia, US.
Lillesand, T. M., R. W. Kiefer, and J. W. Chipman. 2003. Remote Sensing and Image Interpretation (5th ed.). New York: John Wiley & Sons.
Lithopoulos, E., B. Reid, and J. Hutton. 1999. "Automatic sensor orientation using integrated inertial/GPS: Direct georeferencing with minimal ground control." Geomatics Info Magazine. 13(6):58–61.
Moik, J. G. 1980. Digital Processing of Remotely Sensed Images. Washington, D.C.: NASA.
Mostafa, M. M. R., and K. P. Schwarz. 2000. "A multi-sensor system for airborne image capture and georeferencing." Photogrammetric Engineering and Remote Sensing. 66(12):1417–1423.
Mostafa, M. M. R., and K. P. Schwarz. 2001. "Digital image georeferencing from a multiple camera system by GPS/INS." ISPRS Journal of Photogrammetry and Remote Sensing. 56(1):1–12.
Novak, K. 1992. "Rectification of digital imagery." Photogrammetric Engineering and Remote Sensing. 58(3):339–344.
Okeke, F. 2006. "Review of digital image orthorectification techniques." GIS Development: Asia Pacific. 10(7):36–39.
Rosenholm, D., and D. Akerman. 1998. "Digital orthophotos from IRS—Production and utilization." GIS—Between Visions and Applications, International Archives of Photogrammetry and Remote Sensing, 32. Stuttgart, Germany.
Schwarz, K. P., M. A. Chapman, M. E. Cannon, and P. Gong. 1993. "An integrated INS/GPS approach to the georeferencing of remotely sensed data." Photogrammetric Engineering and Remote Sensing. 59(11):1667–1674.
Skaloud, J. 2002. "Direct georeferencing in aerial photogrammetric mapping." Photogrammetric Engineering and Remote Sensing. 68(3):207, 209–210.
Smith, R. B. 2006. Tutorials: Orthorectification Using Rational Polynomials. Lincoln, NE: MicroImages.
Stirling, I. F. 1974. "The new map projection." New Zealand Cartographic Journal. 4(1):3–9.
Tao, C. V., and Y. Hu. 2001. "A comprehensive study of the rational function model for photogrammetric processing." Photogrammetric Engineering and Remote Sensing. 67(12):1347–1357.
Vassilopoulou, S., L. Hurni, V. Dietrich, E. Baltsavias, M. Pateraki, E. Lagios, and I. Parcharidis. 2002. "Orthophoto generation using IKONOS imagery and high-resolution DEM: A case study on volcanic hazard monitoring of Nisyros Island (Greece)." ISPRS Journal of Photogrammetry and Remote Sensing. 57(1–2):24–38.
Wang, H., and E. C. Ellis. 2005. "Spatial accuracy of orthorectified IKONOS imagery and historical aerial photographs across five sites in China." International Journal of Remote Sensing. 26(9):1893–1911.
Zhou, G., and K. Jezek. 2004. "Satellite navigation parameter-assisted orthorectification for over 60°N latitude satellite imagery." Photogrammetric Engineering and Remote Sensing. 70(9):1021–1029.
CHAPTER 6
Image Enhancement
Image enhancement refers to data processing that aims to increase the overall visual quality of an image or to enhance the visibility and interpretability of certain features of interest in it. During acquisition of remotely sensed imagery, the potential range of pixel values may not be fully utilized in recording the data owing to the atmospheric effect and the limitations of the sensing system. Consequently, the obtained data may have a poor quality, such as a low contrast, an overly dark tone, or much radiometric noise. Eradication of such problems lies in image enhancement that may be carried out either nonspatially, based on histogram information, or spatially within an operating window. In nonspatial image enhancement, the output value of a pixel is based solely on its input value without taking its neighboring pixels into consideration. Namely, the value a pixel receives in the output image is not affected by the value of its neighboring pixels. In spatial image enhancement, the output value of a pixel is affected by that of surrounding pixels within the operating window.

Both spatial and nonspatial enhancements are undertaken either for a single band or for multiple bands. No matter how many spectral bands are involved, it is worthwhile to note that image enhancement does not create any new information in the output image. On the contrary, such processing is usually accompanied by a loss of information. Thus, the enhanced image may contain less information than the original image. As a matter of fact, it is the quality of the features of interest in the input image that is enhanced at the expense of losing information about features of no interest to the analyst. Whether the same pixel value is regarded as information or noise depends utterly on the purpose of enhancing the image.

This chapter on image enhancement consists of seven sections. Covered in the first section are nonspatial image enhancement techniques that include density slicing and contrast enhancement. This is followed by spatial enhancement, such as spatial filtering, and edge enhancement and detection. Afterward, the discussion shifts to multiple image manipulation in Sec. 6.5. The sixth part of this chapter is devoted to image transformation. An example of principal component analysis is provided to illustrate the undertaking of the
transformation in detail, its main use, and interpretation of the transformed results. Finally, this chapter ends with a section on image filtering in the frequency domain. In all discussions, mathematical equations and calculations are provided for those readers with the necessary background. Those readers who are not interested in the mathematical underpinning of image enhancement may choose to focus on the interpretation of the transformed results.
6.1 Contrast Stretching

Contrast stretching is a process of modifying or enlarging the range of pixel values in an input image in an attempt to improve its visual effectiveness or quality. In this process the digital number (DN) value of every pixel in the image is modified according to a predetermined function. It includes density slicing and contrast enhancement, both carried out for single bands. In this histogram-based operation, a pixel's DN is modified regardless of its neighboring pixels' values. Mathematically, contrast stretching is expressed as
$$DN_{out} = f(DN_{in}) \qquad (6.1)$$

where
DNout = output DN in the contrast-stretched image
DNin = DN of the same pixel in the raw image
f = transformation function through which contrast is manipulated; it can be either linear or nonlinear
6.1.1 Density Slicing
Also known as pixel-value thresholding, density slicing is virtually a process of discretizing the continuously varying pixel values in the input band. Pixel values within a certain gray level range are amalgamated into a single value in the output image. The entire range of pixel values in the input image is reduced to a few categories of values, each corresponding to a unique range of pixel values in the input image. Thus, the potential number of pixel values is considerably reduced in the sliced image. A unique color may then be assigned to each newly created pixel value, converting a gray level image into a pseudocolor one. Since the naked human eye is much more sensitive to variation in color than in gray tone, the subtle spatial pattern contained in the input image is much more easily perceived visually in the density-sliced output. In order to produce a meaningful pattern for the phenomenon under study (e.g., concentration levels of silt in nearshore water), the thresholds for each discrete category must be carefully selected (Fig. 6.1). Frequently, the histogram of a spectral band is relied upon to derive the critical slicing thresholds, which should not overlap across categories. The more appropriately these thresholds are selected, the more authentic the resultant spatial pattern is.
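In code, density slicing amounts to binning the pixel values with a set of thresholds. A minimal NumPy sketch, with arbitrary example thresholds chosen here only for illustration, is given below.

    import numpy as np

    def density_slice(band, thresholds):
        # Discretize a gray-level band into classes 0..len(thresholds) with np.digitize
        return np.digitize(band, bins=thresholds)

    # Seven classes from six (hypothetical) thresholds, as in Fig. 6.1
    band = np.random.randint(0, 256, (100, 100))
    sliced = density_slice(band, thresholds=[30, 60, 90, 130, 170, 210])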
FIGURE 6.1 Effect of density slicing. The raw image (a) in grayscale is color-coded (c) after the pixel values are discretized into seven levels using the thresholds (b) shown in the histogram.
6.1.2 Linear Enhancement
In linear enhancement the output value of a pixel is modified from its input value via a linear function (Fig. 6.2). The slope of the linear function dictates the degree of stretching. The tangent of its angle with the horizontal axis (θ) represents the coefficient through which contrast is altered. If tan θ > 1, then the enhanced image has a larger
FIGURE 6.2 Relationship between a pixel’s value in the input image DNin and its output value in the enhanced image DNout after it has been modified via the function f.
contrast than the original image. If tan θ < 1, then the contrast of the output image is suppressed. This enhancement function is mathematically expressed as

$$DN_{out} = \frac{DN_{in} - DN_{min}}{DN_{max} - DN_{min}} \times (2^{n} - 1) \qquad (6.2)$$
where
DNmax = the largest DN in the input image
DNmin = the smallest DN in the initial image
n = quantization level of the stretched image; it usually has a value of 8, even though a larger value is possible with the recent generation of satellite data

The degree of stretching depends on the DN range (DNmax − DNmin) of the initial image. It is measured by the stretching ratio that is defined as
$$\text{Stretching ratio} = \frac{2^{n} - 1}{DN_{max} - DN_{min}} \qquad (6.3)$$
Apparently, an input image with a narrow range of pixel values has a larger stretching ratio than that of a broader range. In linear stretching, the disparity between any two adjacent gray levels is enlarged proportionally irrespective of their actual value. In the output image, the disparity between any two adjacent gray levels is always constant (Fig. 6.3c).

Example: What output value should a pixel receive if it has a value of 112 in an input image in which DN ranges from 48 to 132 (assume the output image is recorded at 8 bits)?
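Substituting DNin = 112, DNmin = 48, and DNmax = 132 into Eq. (6.2) gives

$$DN_{out} = \frac{112 - 48}{132 - 48} \times (2^{8} - 1) = \frac{64}{84} \times 255 = 194.3 \approx 194$$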
Note: The value must be rounded up or down to the nearest integer as DN must be an integer.

In practice, linear contrast stretching may be implemented either nontruncated or truncated. Nontruncated linear stretching is called full stretching, in which DNmax and DNmin are the actual values from the input image. In full stretching all information in the input image is completely preserved in the output image (Fig. 6.3b). Full stretching is limited in that only a small degree of stretching is achievable, especially when the histogram has long and skinny tails (Fig. 6.3a). In this case the contrast of the same image can be stretched much more through truncated stretching. In truncated linear stretching, also called saturated linear stretching, DNmin and DNmax are special break points determined from the image's histogram (Fig. 6.3c). Abrupt changes in the histogram are proper break points whose DN is taken as DNmin or DNmax.
FIGURE 6.3 Rearrangement of pixel DN enumerated at 8 bits after linear contrast stretching. (a) Histogram of the raw band. The DNmin and DNmax are 40 and 162, respectively; (b) histogram of the same band after nontruncation linear stretching; (c) histogram of the same band after truncated linear stretching. All those pixels with a DN below 78 and above 150 have been amalgamated to achieve a larger stretching ratio.
In the absence of these critical points, DNmin and DNmax may be defined statistically, such as the DNs cutting off 5 percent of the total number of pixels on either side of the tail, or the mean DN ± one standard deviation. Defined in such a way, DNmin is larger than the actual minimum DN, while DNmax is smaller than the actual maximum DN. Both are less subject to random noise than the actual DNmin and DNmax. In truncated stretching, all the input pixel values smaller than this statistical DNmin are assigned the same minimum value, whereas all input pixel values above this statistical DNmax receive the same maximum value in the output image (Fig. 6.3c), resulting in a higher stretching ratio than full stretching. All the pixel values lying between these two extremes are stretched linearly just as in nontruncated stretching. Inevitably, truncated linear stretching involves loss of information at both tails of the distribution. Compared with the raw image (Fig. 6.4a), any subtle radiometric variations outside the DNmin − DNmax range are generalized in the output, which exhibits more detail thanks to its enhanced quality (Fig. 6.4b).
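A truncated (saturated) linear stretch with statistically defined break points can be sketched as follows. The 5 percent tails used here are one of the options mentioned above; the function name and defaults are illustrative only.

    import numpy as np

    def truncated_linear_stretch(band, lower_pct=5, upper_pct=95, bits=8):
        # Saturated linear stretch: clip at percentile break points, then apply Eq. (6.2)
        dn_min, dn_max = np.percentile(band, [lower_pct, upper_pct])
        clipped = np.clip(band, dn_min, dn_max)
        out = (clipped - dn_min) / (dn_max - dn_min) * (2 ** bits - 1)
        return np.rint(out).astype(np.uint8)   # cast assumes an 8-bit output image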
6.1.3 Piecewise Linear Enhancement
The contrast of the same input image may be linearly stretched differently for different pixel values in a piecewise manner (Fig. 6.5). Instead of a single stretching function f for all DNs, a few linear functions are used for the stretching. Each function segment has its own slope and is applicable to a specific range of digital numbers. This is known as piecewise linear enhancement. With the use of multiple enhancement functions, it is possible to stretch the contrast of an image differently at different pixel values. For instance, the contrast within a certain range of DNs is artificially enlarged just as in ordinary linear stretching, while the contrast over another DN range is suppressed. Suppression of contrast over a DN range that falls outside the scope of interest leaves more room to stretch the contrast over a wider range of DNs for features of interest (e.g., water turbidity). Through sacrificing the information of uninteresting features, features of interest are rendered more prominently in a piecewise linearly stretched image than with a single linear stretch.
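Piecewise linear stretching is conveniently expressed as interpolation between user-chosen break points. The break points in the sketch below are hypothetical; in practice they are chosen from the histogram of the feature of interest.

    import numpy as np

    def piecewise_linear_stretch(band, in_points, out_points):
        # Map input DNs to output DNs through a series of line segments
        return np.interp(band, in_points, out_points)

    # Stretch the 40-120 range (feature of interest), compress the two tails
    band = np.random.randint(0, 256, (100, 100))
    stretched = piecewise_linear_stretch(band, in_points=[0, 40, 120, 255],
                                         out_points=[0, 20, 235, 255])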
6.1.4 Look-Up Table
A look-up table is a method of adjusting the value of pixels in an input image based on a purposely defined scheme. Contained in this scheme is a series of arbitrarily but deliberately designed values corresponding to every potential value in the input band. A look-up table is an effective way of visualizing an image. If the image is black and white, it can be easily rendered as a gray image using only one series of numbers. However, three series of numbers are needed for its color rendition. In each series of numbers, there is a unique correspondence between an input value and the designated output value. A look-up table is thus a means of visualizing the content of a single image to maximize its effectiveness of communication. However, the generation of a meaningful and satisfactory visualization requires repetitive efforts in fine-tuning the output values for every given input value. This task is made more challenging if the image is color (Fig. 6.6). Illustrated in this example is a visualization of sea surface temperature derived from Advanced Very High Resolution Radiometer (AVHRR) data.

FIGURE 6.4 An example of linear nontruncating contrast stretching. (a) Raw image of IKONOS band 3. It is not so easy to interpret because of its dark tone; (b) image of the same band whose contrast has been linearly stretched. More details become visible after its tone is manipulated. Nontruncating means that the degree of stretching is limited.
FIGURE 6.5 The relationship between a pixel value in the input band and the output band in a piecewise linear contrast stretching. The transformation function is made up of a few line segments, each having its own slope.
The color look-up table shown in Fig. 6.6 assigns the following blue, green, and red components to each temperature-class DN:

DN    Blue    Green    Red
15    120     0        136
16    256     0        0
17    210     46       0
18    180     76       0
19    90      166      0
20    0       256      0
21    0       125      131
22    0       0        256
FIGURE 6.6 An example of visualizing sea surface temperature using a look-up table. (Source: Modified from Gao and Lythe, 1996.) See also color insert.
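Applying a look-up table of this kind is a single indexing operation. The sketch below uses the DN range 15 to 22 from the table above purely for illustration; values outside that range would need their own entries, and the input is assumed to be an integer class image.

    import numpy as np

    # Rows correspond to DN 15..22; columns are the blue, green, red components above
    lut = np.array([[120, 0, 136], [256, 0, 0], [210, 46, 0], [180, 76, 0],
                    [90, 166, 0], [0, 256, 0], [0, 125, 131], [0, 0, 256]])

    def apply_lut(dn_image, lut, dn_offset=15):
        # Convert a single-band image of class DNs into a three-band color rendition
        index = np.clip(dn_image - dn_offset, 0, len(lut) - 1)
        return lut[index]                   # shape: (rows, cols, 3)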
6.1.5 Nonlinear Stretching
The function f in Eq. (6.1) can be nonlinear. Similar to piecewise linear stretching, nonlinear stretching allows some part of the input image to have a stretched contrast while contrast in some other DN ranges is suppressed in the output image. Unlike linear contrast stretching in which the contrast of an image is either enlarged or reduced, both contrast stretching and contrast compression can be achieved in one nonlinear stretching. Whether the contrast is stretched or suppressed depends on the input DN value and the nonlinear function (Fig. 6.7).
FIGURE 6.7 Nonlinear functions for transforming an input image’s contrast. (a) Logarithmic function in which the contrast of small DNs is stretched but larger DNs are suppressed; (b) exponential function in which small DNs are suppressed but large DNs are stretched.
It is possible to stretch the contrast in one gray level range, and to reduce the contrast in another gray level range. There are a number of nonlinear functions for contrast enhancement. Two common examples are logarithmic and exponential functions. The logarithmic function takes the following form:
$$DN_{out} = \log_{10} DN_{in} \qquad (6.4)$$
In the above example, the logarithmic function has a base of 10. Other common bases are 2 and e. In all logarithmic stretching, the contrast is stretched for pixels of a small value, but suppressed for pixels of a large DN (Fig. 6.7a). The smaller the base, the more stretching at low values, the more suppression at high values, and vice versa. In exponential contrast stretching, pixel values in the output image are adjusted according to the following form:

$$DN_{out} = e^{DN_{in}} \qquad (6.5)$$
This exponential function has a base of e, but it can be any positive figure. The exponential function achieves an adjustment effect just opposite to that of the logarithmic function. Namely, those gray levels with a smaller value are suppressed, but those with a larger DN value are stretched (Fig. 6.7b). The larger the base, the more the stretching. This stretching is effective at suppressing dark-toned features (e.g., water) and stretching light-toned features such as urban residential and industrial.
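Because Eq. (6.5) grows explosively for 8-bit DNs, practical implementations normalize the input and rescale the result back to the display range; that rescaling is an implementation choice assumed here, not part of the equations above. A sketch:

    import numpy as np

    def nonlinear_stretch(band, mode="log", bits=8):
        # Logarithmic (Eq. 6.4) or exponential (Eq. 6.5) stretch, rescaled for display
        x = band.astype(float)
        if mode == "log":
            y = np.log10(x + 1.0)                # +1 avoids log(0)
        else:
            y = np.exp(x / (2 ** bits - 1))      # normalize first so exp() stays finite
        y = (y - y.min()) / (y.max() - y.min()) * (2 ** bits - 1)
        return np.rint(y).astype(np.uint8)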
6.1.6 Histogram Equalization
The histogram of most images rarely has an equal distribution. It is more likely to be bell shaped. This kind of pixel value distribution suggests that the large majority of pixels are confined to a small range, which is indicative of a low contrast. On the other hand, a few
extremely bright or dark objects are likely to occupy a wide range of DNs. Consequently, there is an imbalance in the number of pixels at a given DN (Fig. 6.8a). This imbalance represents an inefficient allocation of pixel DNs. Intuitively, the predominant pixels should be represented in a wider range of DNs so that the subtle spectral variations among them can be readily differentiated. This imbalance is ideally remedied through histogram equalization. This nonlinear contrast manipulation technique achieves an enhanced contrast at the expense of losing minor details in the input image. In the output image, the distribution of pixels is roughly equalized through aggregation of minority pixels of a similar gray level into one value. Therefore, the output image always contains fewer levels of DN than the raw image. The DN range vacated by the aggregation is used by pixels of a predominant quantity. In this way, the spectral distance or disparity between any two adjoining DNs is artificially broadened. Since the number of pixels at a given DN varies widely, the aggregated frequency is rarely equalized no matter how differently individual frequencies in the histogram are combined. Instead, the discrepancy between the maximum and minimum number of pixels at different DNs is reduced in comparison to the raw distribution (Fig. 6.8b).

The undertaking of histogram equalization requires a few specifications, including the number of gray levels in the output image. Then the probability for each pixel to occur in one of these levels is calculated. The calculation is illustrated using a hypothetical image of 7 × 8 pixels recorded at 4 bits (Fig. 6.8). Most of the 56 pixels have a DN centered at 5 and 13 (Fig. 6.8a). The first step in performing histogram equalization is to derive the cumulative frequency of pixels Σf(DNj) (Table 6.1, col. 3). This absolute frequency is then converted into the relative frequency c(k) by dividing by the total number of pixels N (56), or
$$c(k) = \frac{1}{N}\sum_{j=0}^{k} f(DN_j) = \frac{1}{N}\sum_{j=0}^{k} n_j \qquad (6.6)$$
where nj = number of pixels at gray level j and k = number of discrete gray levels. The results (Table 6.1, col. 4) are graphically illustrated in Fig. 6.8c. The next step is to calculate the equalized cumulative frequency expressed as probability. Since the 56 pixels are represented in 4 bits, or at 16 gray levels, the average probability of each level is
100%/16 = 6.25%

The constructed relative cumulative probability as shown in Fig. 6.8d is the decision rule for equalization. For every relative cumulative frequency in Fig. 6.8c, its corresponding value in the right
FIGURE 6.8 An example of histogram equalization of an image recorded at 4 bits. (a) Histogram of the raw image; (b) equalized histogram of the output image; (c) cumulative relative frequency (%) of the input image; and (d) equalized cumulative frequency expressed in percentage. It is the decision rule in histogram equalization (e.g., whether pixels at a given DN level should be amalgamated with those at an adjacent DN level).
DN    Absolute     Cumulative    Cumulative      Equalized Cumulative    Nearest DN        Amalgamated
      Frequency    Frequency     Frequency, %    Frequency               in the Output     Frequency
0     0            0             0.00            6.25 (1/16)             0
1     0            0             0.00            12.50 (2/16)            0
2     1            1             1.79            18.75 (3/16)            0
3     2            3             5.36            25.00 (4/16)            0                 3
4     7            10            17.86           31.25 (5/16)            2                 7
5     9            19            33.93           37.50 (6/16)            4                 9
6     8            27            48.21           43.75 (7/16)            7                 8
7     6            33            58.93           50.00 (8/16)            8                 6
8     2            35            62.50           56.25 (9/16)            9
9     1            36            64.29           62.50 (10/16)           9                 3
10    1            37            66.07           68.75 (11/16)           10
11    2            39            69.64           75.00 (12/16)           10                3
12    6            45            80.36           81.25 (13/16)           12                6
13    9            54            96.43           87.50 (14/16)           14                9
14    1            55            98.21           93.75 (15/16)           15
15    1            56            100.00          100.00 (16/16)          15                2

TABLE 6.1 An Example of Histogram Equalization for a 4-Bit Image
diagram (Fig. 6.8d) is searched. Since the output DN is discrete, caution must be exercised here to determine to which DN the observed probability is closer, if it falls between two DNs. For instance, the observed probability at DN 4 (17.86 percent) lies between 12.50 percent at DN 1 and 18.75 percent at DN 2. Since it bears much more resemblance to the second percentage than the first one, it is hence assigned the new DN of 2 in the equalized image. The calculation of the output cumulative distribution frequency cout(k) in col. 6 of Table 6.1 is mathematically expressed as

$$c_{out}(k) = \mathrm{round}\left[\frac{c_{in}(k) - c_{min}}{N - c_{min}} \times (2^{b} - 1)\right] \qquad (6.7)$$

where
cin(k) = input cumulative distribution frequency at level k
cmin = smallest cumulative frequency of the image
b = number of bits at which the output image is recorded

Thus,
$$c_{out}(11) = \mathrm{round}\left[\frac{c_{in}(11) - c_{min}}{N - c_{min}} \times (2^{4} - 1)\right] = \mathrm{round}\left[\frac{39 - 0}{56 - 0} \times 15\right] = \mathrm{round}[10.45] = 10$$
The newly assigned DNs for other levels are given in col. 6 in Table 6.1. After the nearest DN in the output image has been determined, it is time to calculate the number of pixels at these DNs. This process is a reversal of calculating the cumulative frequency, namely, to multiply the observed net frequency, defined as the cumulative frequency at the current DN level minus that at the previous DN level, by the total number of pixels in the input image. For instance, the cumulative frequency of DN 0 is observed at 5.36 percent. This translates into 3 pixels, or
56 × 5.36% = 3 (pixels)

Similarly, at DN 10 there are

56 × (69.64% − 64.29%) = 3 (pixels)

The scaled frequency at all other DNs is provided in the last column of Table 6.1. The equalized histogram of the output image is shown in Fig. 6.8b. There are three disparities between the distribution in this figure and that in Fig. 6.8a:

• First, the number of DNs at which there are pixels has been reduced from 14 to 10. This reduction is achieved through amalgamation of pixels at adjoining values, for instance
between DN 2 and DN 3, between DN 8 and DN 9, between DN 10 and DN 11, and between DN 14 and DN 15.

• Second, the disparity between the maximum frequency (9 pixels) and the minimum frequency (1 pixel) has been reduced. Now the frequency ranges from 2 to 9 pixels.

• Finally, the spectral distance between any two pixel-containing DNs has been enlarged. In the input image, this distance is invariably 1. However, it varies from 1 to 3 in the histogram-equalized image. Moreover, the higher the frequency at a DN, the broader the distance. Therefore, the most dominant pixels in the input image are enhanced more than minority pixels. They are more easily perceived in the histogram-equalized image.

Identical to truncated linear stretching, histogram equalization always involves loss of information. However, the manner of loss differs widely. Unlike truncated linear stretching in which the loss is always restricted to the darkest and brightest pixels, in histogram equalization the loss can occur at any DN level if there are few pixels at this level. In other words, the loss takes place at whatever gray level so long as it has a minority of pixels. Furthermore, the degree of stretching is proportional to the frequency. Those DN levels with a higher frequency are stretched more than those with a lower frequency. This explains why the interval between any two vertical bars in Fig. 6.8b is not uniform.

Histogram equalization is an effective means of contrast enhancement. A high degree of stretching is achieved at the expense of losing information for minority pixels. As illustrated in Fig. 6.9, both the raw and enhanced images have the same DN range from the minimum of 0 to the maximum of 255. However, the raw image has a low contrast with most of the pixels having a value confined to a narrow range of DNs (Fig. 6.9a). They lean toward the lower end of the DNs, causing the image to have a rather dark tone. In its histogram-equalized counterpart, the dominant pixel values have shifted to the midrange, and the DN range has been extended. Consequently, more details are visible. Since the histogram is a bell-shaped curve, the loss of information is restricted to the lowest and the highest DNs. Namely, the darkest and the brightest features become indistinct in the output image.
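The tabulated procedure can be compressed into a few NumPy operations. The sketch below implements Eq. (6.7) directly; it assumes the band holds integer DNs, and for the example image it yields cout(11) = 10, matching the calculation above.

    import numpy as np

    def histogram_equalize(band, bits=4):
        # Histogram equalization following Eq. (6.7); band must hold integer DNs
        levels = 2 ** bits
        hist, _ = np.histogram(band, bins=levels, range=(0, levels))
        c_in = np.cumsum(hist)          # cumulative frequency
        c_min = c_in.min()              # smallest cumulative frequency (0 in Table 6.1)
        mapping = np.rint((c_in - c_min) / (band.size - c_min) * (levels - 1)).astype(int)
        return mapping[band]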
6.2 Histogram Matching

The tonal inconsistency problem in creating a mosaic from multiple aerial photographs (Sec. 2.7) can be eliminated, or at least reduced, through histogram matching. The principle underlying histogram matching is rearrangement of the pixel values in a slave image in such a way as to achieve a distribution approximately identical to that of
FIGURE 6.9 Effect of histogram equalization on the contrast of an image. (a) Appearance of raw SPOT band 3 (256 rows by 256 columns) and its histogram; (b) the same image that has been contrast-enhanced with histogram equalization. In the histogram the lighter vertical bars represent the original frequency; the darker bars represent the frequency after histogram equalization, the same in the histograms to follow.
the master image. If both the master and slave images cover the same ground, the histogram can be created from the entire scene. Otherwise, it has to be established from a subset common to both of them. The matching is accomplished by adjusting the pixel value distribution of the slave image so that it mirrors that of the master image as closely as possible. Two steps are involved in achieving this adjustment:

• First, a cumulative histogram c(k) is constructed for both the master image and the image to be adjusted using Eq. (6.6), just as in histogram equalization.

• Second, a look-up table is constructed to determine the DN of pixels that should be reassigned to other DN levels in order to achieve the desired distribution.

Through this look-up table, the histogram of the slave image is virtually transformed to that of the master image. How to construct a meaningful histogram is most critical to the success of histogram matching. In order to produce an ideal match, the
histogram of both images should be very similar. The matching is conceptually illustrated in Fig. 6.10. The two photographs (Fig. 6.10a and b) to be histogram matched cover the same geographic area. They have a temporal separation of 20 years (from 1978 to 1997). Histogram matching bears a high resemblance to histogram equalization. The only difference is that the second cumulative histogram comes from the master image (Fig. 6.10b) instead of an equalized one. Because of the changed illumination conditions and environment, the histogram-matched image (Fig. 6.10c) highly resembles the master image but is not identical to it in radiometry.
FIGURE 6.10 An example of histogram matching. (a) Slave image and its histogram; (b) master image and its histogram; (c) the output image whose histogram has been matched to that of the master image (b).
It must be pointed out that the two images will not look alike even if their histograms have been perfectly matched, unless they are multitemporal images of the same geographic area that has not experienced much radiometric variation in the interim. This effectively guarantees that the proportion of different land covers in the area remains little changed. Since different objects have different spectral properties, water pixels tend to have a smaller value than their land counterparts. A perfect match is almost impossible to achieve in this case. Besides, even the second-best match still leaves noticeable disparities in radiometry between the two images. The above discussion applies to black-and-white imagery of a single band. It is more difficult to achieve a perfect match for color images as they involve more parameters: apart from brightness, hue and saturation need to be matched as well. So the matched color image will resemble the master image less closely than if it were black and white.
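The two-step look-up-table construction described above can be sketched with NumPy's cumulative-sum and searchsorted routines. This is an illustrative single-band sketch assuming 8-bit imagery; the function and variable names are not drawn from any particular image analysis system.

```python
import numpy as np

def match_histogram(slave, master, n_levels=256):
    """Reassign slave-image DNs so that the cumulative histogram of the
    output approximates that of the master image."""
    slave = np.asarray(slave, dtype=int)
    master = np.asarray(master, dtype=int)
    # Step 1: cumulative histograms of both images (as in Eq. (6.6))
    c_slave = np.cumsum(np.bincount(slave.ravel(), minlength=n_levels)) / slave.size
    c_master = np.cumsum(np.bincount(master.ravel(), minlength=n_levels)) / master.size
    # Step 2: look-up table -- map each slave level to the master level
    # with the closest cumulative frequency (inverse-CDF mapping)
    lut = np.searchsorted(c_master, c_slave).clip(0, n_levels - 1)
    return lut[slave]
```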
6.3 Spatial Filtering

Spatial filtering is a window-based image processing technique for altering the input pixel value based on its own value and the value of the pixels surrounding it. It requires the use of a spatial mask known as a spatial filter. Filtering is carried out to achieve several functions, such as image smoothing and feature enhancement within a neighborhood.

6.3.1 Neighborhood and Connectivity
In a raster image, neighborhood refers to a defined window inside which all the pixels surrounding the one in question are considered its neighbors. The appearance of a neighborhood depends upon the definition of connectivity. There are two types of connectivity, four and eight. In the former case, four neighboring pixels above, below, to the left and to the right of pixel (r, c) are regarded as the neighbors (Fig. 6.11a). In the eight-connection situation, all the pixels immediately adjoining the pixel under consideration at (r, c) are regarded as its neighbors, including those to the upper left, upper right, lower left, and lower right (Fig. 6.11b). This neighborhood is defined by a window size of 3 × 3 pixels. Other larger neighborhood sizes (e.g., 7 × 7 and 9 × 9) are also commonly used in spatial filtering.
6.3.2 Kernels and Convolution

Spatial filtering requires a convolution kernel, also called a "template" by some authors. A kernel is a matrix of values whose size governs the sphere of influence of neighboring pixels in spatial filtering. Common kernel sizes are odd numbers between 3 and 9. The larger the kernel size, the more computation is involved and the more the output pixel value is influenced by its neighboring pixels.
FIGURE 6.11 Definition of neighborhood with a window size of 3 × 3 pixels for pixel (r, c) (shaded). (a) Four-connectivity neighborhood; (b) eight-connectivity neighborhood.
Elements in the matrix, often called kernel coefficients, serve to weigh pixels in calculating the convoluted output. Different kernel values serve different filtering purposes. The convolution of the kernel with the two-dimensional (2D) input image is essentially an element-by-element multiplication and summation within the working window (Fig. 6.12). Since the kernel is square, the working window must also be square. The weights in the kernel dictate the influence of the pixels in the corresponding positions. This operation is mathematically expressed as

    DN_out = (1/W) Σ(i=1..d) Σ(j=1..d) w_ij DN(i, j)_in        (6.8)

    W = Σ(i=1..d) Σ(j=1..d) w_ij                                (6.9)
FIGURE 6.12 The spatial convolution concept in image spatial filtering. (a) A 3 × 3 kernel, or template, containing weights; (b) the array of pixel values in the input image (only partial) as shown in Fig. 1.3. The operation is based on a moving window: after the boldfaced pixel is convoluted, the operation moves on toward the right by one pixel at a time.
where DN_out = convoluted output pixel value
      DN(i, j)_in = pixel value in the input image at location (i, j)
      i (i = 1, 2, 3) = row index
      j (j = 1, 2, 3) = column index
      w_ij = value of the element at location (i, j) in the kernel
      W = sum of all kernel elements
      d = kernel size, usually an odd number ranging from 3 to 9

The kernel is applied to the image in a moving-window manner. The operation moves on to the next pixel in the same row after the current one has been convoluted. This is repeated until the next-to-last pixel in the row, and then continues with the first pixel in the following row. The output image has a lower dimension than the input image because the first and last rows/columns do not have a complete neighborhood if the kernel size is 3 × 3. This reduced dimension may be restored to that of the input image by duplicating the first and last rows/columns of the output image.
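Equation (6.8) maps almost directly onto a moving-window loop. The sketch below is deliberately brute force so that the weighting and the normalization by W remain visible; it assumes an odd-sized square kernel and simply copies the border pixels instead of duplicating rows and columns afterward.

```python
import numpy as np

def convolve(image, kernel):
    """Moving-window convolution of Eq. (6.8): each output pixel is the
    weighted sum of its d-by-d neighborhood divided by W.  Zero-sum
    kernels (edge detectors) are not rescaled."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    half = kernel.shape[0] // 2
    w = kernel.sum()
    norm = w if w != 0 else 1.0
    out = image.copy()                  # border pixels keep their input value
    rows, cols = image.shape
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            window = image[r - half:r + half + 1, c - half:c + half + 1]
            out[r, c] = np.sum(window * kernel) / norm
    return out

# Example: the 3-by-3 low-pass (mean) kernel of Sec. 6.3.3
# smoothed = convolve(band, np.ones((3, 3)))
```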
6.3.3 Image Smoothing

Also called low-pass filtering or low spatial frequency filtering (low spatial frequency being defined as infrequent grayscale changes that occur gradually over a relatively large number of pixels), image smoothing is a process of suppressing noise in the input image that may arise during image acquisition and transmission. Radiometric noise in an image is manifested as abnormally large or small pixel values relative to those in the neighborhood. Since the genuine pixel value is unknown, noise cannot be completely eliminated through image smoothing. Instead, it is suppressed to a certain degree by dividing it among all pixels within the kernel. There are several methods for suppressing the noise. A common method is to replace the noise-infected pixel with the mean of all pixel values inside the kernel. Essentially, the noise is shared by all pixels in the kernel. Another method is to filter the noise-infected pixel out by substituting it with a statistical parameter of all pixels (e.g., the median). Low-pass filtering makes use of the low-pass kernel, which is characterized by an equal weight for all elements:

    (1/9) | 1  1  1 |
          | 1  1  1 |
          | 1  1  1 |

During low-pass filtering, the pixel values are averaged among those in the kernel. High-frequency features (e.g., edges) are subdued in the smoothed image. Thus, every pixel in the window shares a portion of the abnormality that may exist in the value of the pixel under consideration.
FIGURE 6.13 Effect of various convolutions within a 3 × 3 kernel on an image. (a) Raw band; (b) low-pass filtered output; (c) high-pass filtered image; (d) median filtered image.
In comparison with the raw image (Fig. 6.13a), the smoothed image has reduced contrast and a blurred appearance (Fig. 6.13b). The degree of blurring is related to the kernel size. A large kernel causes more smoothing than a small one. It also results in a greater loss of high-frequency details. It is possible to treat pixels in the kernel as having differential significance by assigning nonuniform weights to the elements. Below are two examples of a 3 × 3 kernel with different weights:

    (1/6) | 1  1  1 |
          | 0  0  0 |
          | 1  1  1 |

and

    1/(6 + 2√2) | √2/2    1    √2/2 |
                |   1     2      1  |
                | √2/2    1    √2/2 |
The first filter is characterized by the same value for all elements in the same row, but different values in different rows. A common application of this filter is to remove from Landsat images the dropout lines caused by the malfunctioning of one of the detectors. This removal is accomplished by applying the filter along the dropout lines. A missing line is replaced by the average of the two scan lines immediately above and below it. The second kernel places less importance on the four corner pixels because they are farther away from the pixel under consideration than the four horizontal and vertical neighbors. No matter how widely the weight varies from one element to another in the kernel, all three smoothing operants presented above have one characteristic in common: the sum of all nine elements multiplied by the scalar in front of the matrix equals 1. In this way the image pixel values are not artificially scaled up or down after being smoothed. The last two kernels are the same as the first one in that they maintain the overall level of the input pixel values after the convolution. This is achieved by having a scalar equivalent to the inverse of the sum of all elements.

Image filtering using an operant whose weights differ sharply in value and sign is called high-pass filtering, during which the difference between adjacent pixels is artificially enlarged. Contrary to low-pass filtering, high-pass filtering attenuates low-frequency features (Gonzalez and Woods, 1992). As a result, high-frequency features, such as edges between homogeneous groups of pixels and other sharp details, stand out. High-frequency filtering produces an effect just opposite to low-frequency filtering: in high-pass filtered images, large pixel values become larger and the spatial frequency is increased. A high-frequency, or high-pass, kernel has the effect of enhancing features of a high spatial frequency. High spatial frequencies are those that represent frequent grayscale changes over a short distance. For instance, features that are separated by only a short distance are made more distinguishable in the output image. The net effect of high-pass filtering is the reduction of slowly varying features and a correspondingly apparent enhancement of edges and other sharp details (Fig. 6.13c). Unlike image smoothing, the operant has a strong component of orientation. Only those edges oriented along a certain direction can be sharpened during high-pass filtering. It enables edges to be highlighted, but does not necessarily eliminate other features (Leica, 2006). A typical high-pass kernel is

    | −1  −1  −1 |
    | −1   9  −1 |
    | −1  −1  −1 |

Unlike low-pass filtering operants, high-pass operants do not carry a scalar in front of the matrix, because their elements already sum to one, as in the example above.
FIGURE 6.14 Effect of median filtering. (a) Input window (5, 8, 10 / 7, 16, 9 / 6, 5, 11); (b) median-filtered output (5, 8, 10 / 7, 8, 9 / 6, 5, 11).

6.3.4 Median Filtering
As a spatial domain processing technique, median filtering is very similar to image smoothing in that it is carried out within a window. Unlike image smoothing, it does not require an operant, or convolution kernel. Instead, only the pixels within a neighborhood are examined, with the central pixel being the focal point. The median of these pixels is determined by sorting the nine DNs inside the window in either ascending or descending order and taking the middle value (the fifth number in the list). This median is used to replace the pixel value in question in the output image. For instance, suppose the input window contains the nine pixels 5, 8, 10, 7, 16, 9, 6, 5, and 11 (Fig. 6.14a). After sorting, their DNs are ordered in the ascending order of 5, 5, 6, 7, 8, 9, 10, 11, 16, with the median being 8. The central pixel value of 16 is replaced by the median 8 as the output DN for the pixel under study (Fig. 6.14b). This example illustrates that this filter requires only simple calculation and thus can be implemented very quickly. Nevertheless, it is effective in removing outliers, impulse-like noise, and the speckle commonly encountered in radar imagery. These noises usually occur as singular pixels. The principal advantage of this method is that it leaves edges intact and thus preserves the sharpness of an edge (Richards and Jia, 2006). As illustrated in Fig. 6.13d, the median-filtered image is also blurred in comparison with the raw image. Again, the degree of blurring is related to the template size. The processed image is very similar to the smoothed one (Fig. 6.13b). As a matter of fact, median filtering degenerates into low-pass smoothing if the median of the nine pixels is replaced with their mean.
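Median filtering needs no kernel at all, only a sort within the window. The sketch below pads the image by edge replication, which is one of several reasonable border conventions.

```python
import numpy as np

def median_filter(image, size=3):
    """Replace every pixel by the median of its size-by-size neighborhood."""
    image = np.asarray(image, dtype=float)
    half = size // 2
    padded = np.pad(image, half, mode='edge')
    out = np.empty_like(image)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            out[r, c] = np.median(padded[r:r + size, c:c + size])
    return out

window = np.array([[5, 8, 10], [7, 16, 9], [6, 5, 11]])
print(np.median(window))    # 8.0 -- the value that replaces the central 16
```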
6.4 Edge Enhancement and Detection

An edge or linear feature is manifested as an abrupt change in DN along a certain direction in an image. This direction is the orientation of that feature. The discontinuity appears as an extremum of the first-order derivative or a zero crossing of the second-order derivative. Edge detection can be based on such a discontinuity property by tracing the maximum along the boundary of an area. A few methods are available for implementing edge detection and enhancement. This section introduces two of them, self-subtraction and edge-detection templates.
FIGURE 6.15 Principle of edge detection through image self-subtraction. (a) Raw image of 4 by 5 pixels showing a horizontally oriented edge; (b) self-subtracted image showing the edge after a vertical shift by one pixel (e.g., l = 1, m = 0). The first row is a duplication of the second row.
6.4.1 Enhancement through Subtraction

Edge enhancement through image self-subtraction is underpinned by the fact that nonedge features have a spatially uniform value, in sharp contrast to edges that experience a drastic and usually abrupt change in pixel value along a certain direction (Fig. 6.15a). A new image is created by duplicating the existing one. If this newly created image is subtracted from the source one, then nothing remains in the resultant image. No edges are detectable through this subtraction. The subtraction can be improved by slightly shifting one of the images to the left, or to the right, or above or below, or even diagonally by one or two pixels. This operation is mathematically expressed as

    ΔDN(i, j) = DN(i, j) − DN(i + l, j + m) + b        (6.10)

where DN(i, j) = pixel value at location (i, j)
      DN(i + l, j + m) = pixel value at location (i + l, j + m) of the same image (l, m = 0, 1, 2, …, the distance of shift)
      b = bias to prevent the emergence of negative differences
The above subtraction essentially compares the pixel values of the same image at a spatial separation of (l, m). In the difference image, all nonedge pixels have a value of zero, in stark contrast to edge pixels, which have a nonzero value (Fig. 6.15b). Thus, nonedge features disappear from the difference image, leaving only linear features in the difference image. This newly derived layer can be added back to the original image to enhance edges. It must be noted that only those features perpendicular to the direction of shift can be detected in one subtraction. If linear features are oriented in multiple directions, several self-subtractions are essential to detect all of them. In each subtraction, the duplicated image is shifted in one of the four possible directions. All the separately detected edges are merged to form one composite image to show all the detected edges. As with spatial filtering, the output image may have a dimension smaller than that of the input image. This can be restored by
duplicating the border rows and columns. However, those linear features very close to the image border cannot be detected well using this method. The above discussion applies to ideal situations where there is no noise in the image, which is rarely true in reality. In order to reduce the random variation of pixels within the same feature and hence improve the reliability of edge detection, it may be necessary to smooth the image using the methods described in Sec. 6.3.3 before it is used in the detection. The detection quality can also be improved by imposing a threshold to test the validity of the detected edges in a postdetection session. For instance, only those differences exceeding a certain threshold are regarded as representing genuine edges. All other differences are treated as noise and removed. Another postdetection processing technique is to spatially filter the detected results. All isolated nonzero pixels that do not appear to be aligned with any linear segments in a meaningful direction are eliminated from the output image.
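Image self-subtraction amounts to slicing the array twice with a relative offset. The sketch below implements Eq. (6.10) for nonnegative shifts; restoring the output to the full input dimension (and merging the results of shifts in several directions) is left out for brevity.

```python
import numpy as np

def self_subtract(image, l=1, m=0, bias=0):
    """Edge detection by image self-subtraction, Eq. (6.10): a duplicate
    shifted by (l, m) pixels is subtracted from the original.  The result
    is trimmed to the overlapping region."""
    image = np.asarray(image, dtype=int)
    rows, cols = image.shape
    return image[:rows - l, :cols - m] - image[l:, m:] + bias

# A vertical shift (l=1, m=0) leaves nonzero differences only along
# horizontally oriented edges, as in Fig. 6.15b.
```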
6.4.2 Edge-Detection Templates

Several templates have been devised for edge detection (Fig. 6.16). All of them have one characteristic in common: the sum of all elements in a kernel is zero. These zero-sum kernels produce a near-zero output in areas of low spatial frequency (e.g., in the absence of any edge). In areas of high spatial frequency (e.g., the interface between homogeneous patches of pixels), a sharp contrast results: the disparity among pixel values is magnified as large values become larger while low values become even lower, owing to the unequal element values along the first/last row/column (Leica, 2006). Edges are enhanced in the filtered image, which frequently is made up of only edges and zeros. Two kinds of special edge-detection operants deserve more discussion here, Sobel filters (Fig. 6.16a, b, c, top) and Prewitt (1970) filters (Fig. 6.16a, b, c, bottom).
FIGURE 6.16 Examples of edge detection operants. (a) Operants designed to detect vertically oriented edges; (b) operants designed to detect horizontally oriented edges; and (c) operants designed to detect diagonally oriented edges.
FIGURE 6.17 The Laplacian templates based on second-order derivatives. (a) Unweighted line; (b) weighted line.
FIGURE 6.18 Output image that has been Laplacian edge-enhanced using the raw image in Fig. 6.13a. Conspicuous in the image are linear features that are oriented in a certain direction.
The nine elements can be arranged horizontally, vertically, or diagonally to detect edges oriented in a direction perpendicular to them. As with all edge-detection templates, the nine coefficients are designed so that they add up to zero; this zero sum must not be used as the normalizing divisor W in the convolution of Eq. (6.8). Laplacian templates are another class of edge-detection operants, based on second-order derivatives. They are effective at detecting lines or spots, as distinct from ramp edges (Leica, 2006). They have two forms, one used to detect unweighted lines (Fig. 6.17a) and one used to detect weighted lines (Fig. 6.17b). The processed images often show edges and zeros conspicuously (Fig. 6.18). In this enhanced image, large pixel values are much larger, and small pixel values much smaller, than those in the input image.
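As an illustration of applying directional templates, the sketch below convolves an image with the Sobel pair in its commonly published form (the kernel values are taken from general usage rather than transcribed from Fig. 6.16) and combines the two directional responses into a gradient magnitude.

```python
import numpy as np
from scipy.ndimage import convolve

# Sobel kernels in their commonly published form
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])    # responds to vertically oriented edges
SOBEL_Y = SOBEL_X.T                  # responds to horizontally oriented edges

def sobel_magnitude(image):
    """Gradient magnitude from the two directional Sobel responses.
    Being zero-sum, the kernels need no division by W."""
    img = np.asarray(image, dtype=float)
    gx = convolve(img, SOBEL_X)
    gy = convolve(img, SOBEL_Y)
    return np.hypot(gx, gy)
```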
6.5 Multiple-Image Manipulation

The aforementioned enhancement techniques involve only a single band in the input and in the output. In practice it is possible to generate a new image from multiple images of the same area. Usually, these images are multispectral bands from the same sensor obtained at the same time, covering an identical geographic area. If any of these conditions is not met (e.g., images from different sensors or obtained from the same sensor at different times), the images need to
be registered to the same coordinate system and resampled to the same spatial resolution using the methods introduced in the previous chapter before they can be manipulated. Multiple-image manipulation may be performed on individual pixels aspatially. The two input images can be manipulated using a wide range of arithmetic operations, such as addition, subtraction, multiplication, division, or their combination. The DN of every pixel in one image is added to, subtracted from, or multiplied/divided by the DN of the corresponding pixel in another image. Subtraction is commonly used to detect edges (see Sec. 6.4.1). It also finds applications in change detection, which will be discussed in detail in Chap. 13. By subtracting one image from another, it is possible to detect variations in the scene. Division is another way of comparing multitemporal images. It is better than image subtraction at detecting changes from images that are recorded in different seasons and at different times of the day. Pseudo-images caused by the change in shadow length resulting from these differences can be partially eliminated through ratioing. Furthermore, it is possible to combine image subtraction with image division to achieve even better results.
6.5.1 Band Ratioing

Band ratioing refers to division of one spectral band by another from the same sensor, preferably obtained at the same time. Prior to division, the two bands must be coregistered precisely if they come from separate sensors or cover a different ground area. After precise coregistration, a pixel in one image corresponds to its counterpart in the other image. In ratioing, the value of each pixel in one band is divided by the value of the pixel at the same location in the other band. After division, the resultant ratio values, which typically fall within a narrow range, may have to be rescaled to 0 to 255. Band ratioing is able to achieve several purposes, depending on the nature of the input bands. If the two bands are obtained at different times, band ratioing is effective at detecting changes that have taken place during the interval (change detection will be covered in detail in Chap. 13). If the two bands are from the same sensor, then this process is effective at eliminating radiometric variations caused by topography (Fig. 6.19). The sunlit slopes have a brighter tone than the shadowed ones in the same bands. However, after the two bands are ratioed, the same feature has the same or nearly the same values in the resultant image, while the spectral disparity between different features is enlarged. In addition, band ratioing is also effective at partially eliminating the impact of atmospheric radiance. For instance, if the atmospheric effect causes pixel DN values to be 3 higher across all pixels, then a division of (95 − 3)/(102 − 3) yields a ratio of 0.929. This value is extremely similar to the ratio of 0.93 derived from the division of the raw DNs containing the atmospheric effect.
    Digital Number
    Forest   Slope     Band A   Band B   Band A/Band B
    Oak      Sunlit      95      102         0.93
             Shadow      42       44         0.95
    Pine     Sunlit      66       89         0.74
             Shadow      26       35         0.74

FIGURE 6.19 Effect of band ratioing in eliminating terrain-cast shadow. (Source: Modified from Sabins, 1996.)
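Band ratioing itself is a one-line array operation; the only practical details are guarding against division by zero and, if desired, rescaling for display. The function below is an illustrative sketch, not the routine of any particular package.

```python
import numpy as np

def band_ratio(band_a, band_b, rescale=True):
    """Pixel-by-pixel ratio of two coregistered bands.  A tiny constant
    in the denominator avoids division by zero; optional linear rescaling
    stretches the result to 0-255 for display."""
    a = np.asarray(band_a, dtype=float)
    b = np.asarray(band_b, dtype=float)
    ratio = a / (b + 1e-6)
    if rescale:
        ratio = 255.0 * (ratio - ratio.min()) / (ratio.max() - ratio.min())
    return ratio

# From the shadow example: 95/102 = 0.93 (sunlit oak) and 42/44 = 0.95
# (shadowed oak) -- nearly identical once ratioed.
```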
6.5.2 Vegetation Index (Components)

A vegetation index (VI) is an arithmetic disparity between pixel values in two or more spectral bands of the same imagery. Originating from the same sensor, both bands are acquired at the same time. This effectively ensures that their spatial resolution is identical and that they cover the same ground area. Vegetation indexing is able to enhance the conspicuousness of vegetation through subtraction of one spectral band from another because of the differential reflectance of ground features over different wavelength ranges (Fig. 1.10). Vegetation has a minor reflectance peak around the green (0.5 μm) spectrum, but a much higher peak in the infrared spectrum. By comparison, the spectral reflectance of soil does not vary significantly with wavelength. Of the two bands involved in producing an indexing image, the band to be subtracted from must have a wavelength in the near-infrared spectrum (e.g., 1.1 μm), where the reflectance of vegetation peaks, in sharp contrast to soil and water, which have a much lower reflectance. The second band used in the subtraction should have a wavelength around the red spectrum (e.g., 0.6 μm), where the reflectance of vegetation is much lower than elsewhere. By comparison, soil and water have a reflectance very similar to that in the near-infrared
spectrum. Subtraction of the two bands therefore leads to a near-zero difference for them in the resultant image. This effectively exaggerates the spectral disparity between vegetation and nonvegetative covers, making vegetation more visible in the output band. In other words, the resultant image maximizes the vegetative signal and suppresses the visibility of soil and other background covers. VIs are closely related to the amount of vegetative cover present on the ground and to its greenness or biomass. Highly correlated with green-leaf density, they can be viewed as a proxy for aboveground biomass (Bannari et al., 1995; Rasmussen, 1998). Vegetation indexing using a multitude of spectral bands is an effective means of taking advantage of the rich spectral information of multispectral satellite data. These indices can serve as good surrogate measures of vegetative cover if calculated properly from the right combination of spectral bands. In the multispectral domain, which band should be the near-infrared and which should be the red band depends upon their wavelength range, which in turn is governed by the spectral reflectance curve. For Landsat Multispectral Scanner (MSS) data they are bands 7 and 5 [Eq. (6.11)], but channels 1 and 2 for AVHRR data [Eq. (6.12)]:
    VI = MSS7 − MSS5                                   (6.11)

    VI = ch2 − ch1                                     (6.12)

All VIs must be radiometrically standardized to account for the atmospheric effect, differences in solar elevation, and differences in instrument calibration from one image to another if they are to be compared with one another directly, as in a longitudinal study. Seasonal and geographic variations can be taken into account by standardizing the derived VIs through division by the sum of the same two bands [Eq. (6.13)]; the resultant index is the normalized difference vegetation index (NDVI). In this way, all indices obtained from images of different seasons across the globe are directly comparable to one another.

    NDVI = (R_NIR − R_red)/(R_NIR + R_red)             (6.13)

where R_NIR and R_red represent spectral reflectance at the near-infrared (0.73 − 1.10 μm) and red (0.58 − 0.68 μm) wavelengths, respectively (Holben, 1986). Again, the actual spectral bands corresponding to R_NIR and R_red vary with the sensor; for Landsat MSS data, for example,

    NDVI = (MSS7 − MSS5)/(MSS7 + MSS5)                 (6.14)
FIGURE 6.20 Distribution of grass cover density derived from an NDVI image that is produced from Landsat TM3 and TM4 in conjunction with 68 in situ density samples. (Source: Zha et al., 2003a.) See also color insert.
Relatively easy to derive from multispectral remote sensing data, NDVI has found wide applications in quantifying vegetative cover on the ground (Bryceson, 1989), monitoring land surfaces and vegetation canopies, and estimating leaf area index, grass cover, percentage grass cover, and vegetation biomass. Multitemporal NDVI data are routinely used to study vegetation health and seasonal variations. With a number of concurrently collected in situ samples, it is possible to convert radiometrically calibrated satellite data into absolute cover densities on the ground via this index (Zha et al., 2003a). Hence, NDVI is effective at detecting and visualizing the quantity of grassland biomass (Fig. 6.20), revealing the potential and current status of grassland, from which it is possible to quantify grassland degradation. If combined with the moving standard-deviation index, NDVI proves a powerful index for monitoring degradation patterns in a semiarid heterogeneous landscape (Tanser and Palmer, 1999).
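Computed band by band, NDVI reduces to Eq. (6.13) applied element-wise; the small constant added to the denominator below is only a guard against division by zero over water or no-data pixels.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized difference vegetation index, Eq. (6.13)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + 1e-6)

# For Landsat MSS the inputs would be bands 7 and 5 (Eq. (6.14));
# for AVHRR, channels 2 and 1.
```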
6.6 Image Transformation

Image transformation differs from all the image processing techniques covered previously in that the input image is made up of two or more bands. It is also possible to output multiple bands from image transformation, their exact number depending upon the specification by the analyst. These newly created bands represent the reorganized information content of the raw bands. Images are transformed
into new bands to fulfill different purposes, such as decorrelation of spectral bands, reduction of the amount of shared information and of the number of bands used to represent it, and enhancement of certain features. In this section, three image transformations will be covered: principal component analysis (PCA), hue-intensity-saturation (HIS) transformation, and the Tasseled Cap transformation.

6.6.1 PCA

In remote sensing, spectral radiation from the ground is captured in multispectral bands so as to facilitate the identification and detection of ground features. The recent trend of data acquisition is toward refinement of spectral resolution so that subtle variations in radiance from the objects of interest can be captured and detected from satellite data. For instance, the spectral resolution of Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) imagery has improved to 10 nm over 224 bands. Such a fine spectral resolution increases the spectral information content about the target on the one hand. On the other, the huge number of spectral bands undoubtedly leads to data redundancy, as the spectral reflectance of some ground objects scarcely varies over certain wavelength ranges. In other words, not all multispectral bands contribute the same amount of information toward the total. Chances are that most of the information contained in one band is also found in another whose wavelength is very close, so their information content is highly correlated. Such data redundancy increases the cost of data storage and needlessly prolongs image classification. It is very desirable, and even necessary, to reduce the number of multispectral bands without losing substantial information through image transformation.

Data redundancy among spectral bands is best visualized by plotting a scatter diagram of pixel values in these bands. Illustrated in Fig. 6.21 are two-band scatterplots. If the pixels are distributed along a line, then the information content of the two bands is highly correlated (Fig. 6.21a). If the pixels are widely scattered without following any trend, then the information content of the two bands is uncorrelated (Fig. 6.21b); there is little data redundancy between them. The issue of data redundancy is addressed by transforming the raw data into another domain using PCA. PCA is able to fulfill three objectives:

• First, it can be used to ascertain the information content of each multispectral band and to identify the most informative bands.

• Second, it is able to reduce the number of bands needed to represent most of the information contained in the original spectral bands.

• Finally, transformation of the information content onto orthogonal axes increases the spectral separability of certain spectrally adjacent classes that partially overlap each other in the original spectral domain.

The undertaking of PCA is illustrated with eight pixels in two spectral bands (Fig. 6.22). The values of these pixels are given in Table 6.2.
FIGURE 6.21 Scatterplot of pixel values in a two-band spectral domain. (a) The spectral content of the two bands is highly correlated as the pixels follow a linear distribution; (b) there is little correlation between the spectral content of the two bands since the pixels are widely dispersed in their distribution.
FIGURE 6.22 Pixel values in the raw bands before and after the PCA transformation. Essentially, the transformation is fulfilled by rotating the axes by an angle of θ to form the components. Boldfaced numbers: pixel number in Table 6.2 (col. 1).
The first step of PCA is to calculate the mean pixel value in each band, X̄ (cols. 2 and 3, Table 6.2), in preparation for the derivation of the variance-covariance matrix V, which is obtained by plugging all the relevant values into Eq. (6.15):

    V = [1/(N − 1)] Σ(i=1..N) (x_i − X̄)(x_i − X̄)^T        (6.15)
where N refers to the total number of pixels in the input image.
                                   x_i − X̄
    Pixel   Band 1   Band 2    Band 1   Band 2    Comp 1   Comp 2
    1         2        4         −3       −1       3.83     2.30
    2         4        5         −1        0       6.06     2.07
    3         3        6         −2        1       5.75     3.45
    4         4        3         −1       −2       4.99     0.38
    5         7        8          2        3      10.20     2.99
    6         7        6          2        1       9.13     1.30
    7         8        5          3        0       9.43    −0.08
    8         5        3          0       −2       5.83    −0.16
    X̄         5        5          0        0

TABLE 6.2 Coordinates of the Eight Pixels in the Two Spectral Bands in Fig. 6.22
After the variance-covariance matrix V is determined, the correlation matrix R is calculated using Eq. (6.16):

    R = | 1     r12   ...   r1n |
        | r21   1     ...   r2n |
        | ...   ...   ...   ... |
        | rn1   rn2   ...   1   |

    where r_ij = r_ji = V_ij / √(V_ii × V_jj)              (6.16)

In this example,

    r12 = V12 / √(V11 × V22) = 2.14 / √(6 × 4) = 0.44

    R = | 1.00   0.44 |
        | 0.44   1.00 |

In other words, 44 percent of the information is shared between bands 1 and 2, as illustrated in Fig. 6.22. The variance matrix after transformation must meet the following condition:

    |V − kI| = 0                                           (6.17)

where I = identity matrix and k = eigen value matrix, which has the following form:

    k = | λ1    0    ...   0  |
        | 0     λ2   ...   0  |
        | ...   ...  ...   0  |
        | 0     0    ...   λn |

Plugging the V and I matrices into Eq. (6.17) yields

    | 6 − λ    2.14  |
    | 2.14     4 − λ |  = 0

or (6 − λ)(4 − λ) − 2.14 × 2.14 = 0, so λ1 = 7.36 and λ2 = 2.64. The eigen value matrix k after the transformation is

    k = | 7.36   0.00 |
        | 0.00   2.64 |
The following three points should be noted from the variance-covariance matrix V and the eigen value matrix k:

• The total variance of the two spectral bands (6 + 4 = 10) before the transformation is exactly the same as the total of the eigen values (the main diagonal elements) (7.36 + 2.64 = 10) after the transformation. This clearly demonstrates that PCA does not
create any new information in the components. Instead, it simply redistributes the available information among the output components.

• The correlation coefficient between the two newly created bands is zero after the transformation. In other words, the information content of component (band) 1 does not overlap with that of component (band) 2 anymore. For this reason, the eigen value matrix is usually written as a one-dimensional (1D) matrix [7.36 2.64]^T.

• Prior to image transformation band 1 carries 6/(6 + 4), or 60 percent, of the total amount of information. After the transformation, this figure rises to 7.36/(7.36 + 2.64), or 73.6 percent. On the other hand, the information content of band 2 drops from 4/(6 + 4), or 40 percent, before the transformation to 2.64/(7.36 + 2.64), or 26.4 percent, after the transformation. Therefore, the first component is much more informative after the transformation than before. The opposite is true for the second component. In fact, the information content of the components decreases drastically from the first to the last.

In order to project the information content of the input bands X into new component images Y, it is necessary to determine the transformation matrix G. The purpose of image transformation is to construct a new feature space in which the covariance between any components equals zero.
    Y = GX                                                 (6.18)

G is a rotation matrix of the following form:

    G = | g11   g12   ...   g1n |
        | g21   g22   ...   g2n |
        | ...   ...   ...   ... |
        | gn1   gn2   ...   gnm |

In this particular case

    G = | g11   g12 |
        | g21   g22 |

G must meet the following condition:

    G^−1 = G^T                                             (6.19)
This is to say, the inverse of the matrix equals its transpose, so that pixel values are not artificially enlarged or reduced after transformation. Also called a loading matrix in statistics, matrix G essentially represents the angle of rotation from the former spectral domain to the new component domain (Fig. 6.22). In this particular case the angle of rotation (θ) equals 36°06′. The element values in matrix G govern how the pixel values in the raw bands should be rearranged among
the components. The output pixel value in a component after transformation is a linear combination of all the input pixel values [Eq. (6.23)]. The transformed coordinates of the eight pixels are given in the last two columns of Table 6.2.
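For an arbitrary band stack, the same sequence of steps (variance-covariance matrix, eigen values, loading matrix G, and component images Y = GX) can be obtained with standard linear-algebra routines. The sketch below is illustrative only; note that the signs of the eigen vectors, and hence of the component images, returned by a numerical routine may be flipped relative to those printed by a given software package.

```python
import numpy as np

def principal_components(bands, n_keep=None):
    """PCA of an (n_bands, rows, cols) image stack: returns the eigen
    values, their percentage shares, the loading matrix G, and the
    component images Y = GX (Eq. (6.18))."""
    stack = np.asarray(bands, dtype=float)
    n_bands = stack.shape[0]
    x = stack.reshape(n_bands, -1)              # one row of pixel values per band
    v = np.cov(x)                               # variance-covariance matrix, Eq. (6.15)
    eig_vals, eig_vecs = np.linalg.eigh(v)      # eigh, since V is symmetric
    order = np.argsort(eig_vals)[::-1]          # most informative component first
    eig_vals = eig_vals[order]
    g = eig_vecs[:, order].T                    # rows of G are the eigen vectors
    if n_keep is not None:
        g = g[:n_keep]                          # discard minor components
    components = (g @ x).reshape((-1,) + stack.shape[1:])
    share = 100.0 * eig_vals / eig_vals.sum()   # relative eigen values (percent)
    return eig_vals, share, g, components
```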
Presented in Fig. 6.23a is a subscene (250 × 250 pixels) from Landsat Thematic Mapper (TM) imagery of central Auckland, New Zealand.
FIGURE 6.23 PCA-transformed component images of a subscene from Landsat TM imagery of central Auckland (image size: 250 × 250 pixels). (a) Color composite of raw bands 4 (red), 3 (green), and 2 (blue). See also color insert. (b) Component 1 image, showing mostly variation of land. (c) Component 2 image. (d ) Component 3 image. (e) Component 4 image. (f ) Component 5 image. (g) Component 6 image.
Image Enhancement
(e)
(f)
(g)
FIGURE 6.23
(Continued)
It contains only six bands, with band 6 excluded because its spatial resolution (120 m) is incompatible with that of the other bands (30 m). The PCA results of the image are presented in Tables 6.3 to 6.5. Shown in Table 6.3 is the variance-covariance matrix, which is symmetric and square. Its dimension equals the number of spectral bands used in the PCA. The most revealing figures in the matrix are the main diagonal values. They illustrate the variance of the spectral bands. This variance is indicative of the diversity of pixel values, which is related indirectly to the ground covers in the scene. For a given scene, this variance is also indicative of the potential range of DNs. A band of small variance suggests that all pixels have a similar value; therefore, it is difficult to separate them spectrally. In this sense, variance can serve as an indicator of the information content of a given spectral band. Judged against this criterion, the most informative band is band 4, and the least informative is band 2.
    Band       1          2          3          4          5          7
    1       21.7564
    2        8.6635     5.1738
    3       11.3633     5.8163    12.1910
    4      −11.7377     0.0769     9.8601   132.3204
    5       −1.0364     3.1006    14.5452    83.5923    74.5963
    7        3.6701     2.8159     8.6601    26.9379    28.8602    15.0311

TABLE 6.3 Variance-Covariance Matrix of the Auckland TM Subscene Image in Fig. 6.23a
The small variance of band 2 means that most pixels have a similar value in this band, making it difficult to interpret the image. Similar to the variance-covariance matrix, the correlation matrix (Table 6.4) is also square and symmetric. Again, its dimension is the same as the number of spectral bands used in the analysis. Unlike the variance-covariance matrix, all main diagonal values have a value of 1, as the content of each band is perfectly correlated with itself. All the off-diagonal values vary between −1.0 and 1.0. These correlation coefficients effectively represent the degree of data redundancy between any two bands. For instance, the information content of bands 4 and 5 is highly correlated, at 84.14 percent. This means more than three quarters of the information is shared between the two bands. This high correlation, however, does not imply that band 4 is of little value, as its degree of correlation with band 2 is very low at only 0.30 percent. During PCA it is possible to output only a portion of all possible components (Fig. 6.23). The first three component images (Fig. 6.23b, c, and d) are still informative in the sense that the scene is illustrated quite well. However, starting from the fourth component (Fig. 6.23e), the quality of the component images deteriorates so drastically that little information about the scene is preserved. It is not absolutely necessary to output all the
    Band       1         2         3         4         5         7
    1        1
    2        0.8166    1
    3        0.6977    0.7323    1
    4       −0.2188    0.0030    0.2455    1
    5       −0.0257    0.1578    0.4823    0.8414    1
    7        0.2029    0.3193    0.6397    0.6040    0.8619    1

TABLE 6.4 Correlation Matrix of the Auckland TM Subscene Image in Fig. 6.23a
TABLE 6.5 Eigen Values and Cumulative Eigen Values for the Image in Fig. 6.23a
Note: relative eigen value (percentage) = eigen value × 100% / total eigen value.

components available. There is no universal specification as to how many components should be retained after the transformation. The rule of thumb is to discard all components that contribute far less than the average contribution of each component (e.g., 1/6 = 16.67 percent). According to this rule, only components 1 and 2 in Table 6.5 should be retained. In order to assess the amount of information loss caused by the abandonment of the remaining components, the absolute eigen values need to be converted into relative ones by dividing them by the total eigen value of 261.069 (Table 6.5). The cumulative percentage is calculated by adding the current percentage to the previous one. So (100 − 92.68) percent, or 7.32 percent, of the total information is lost if only the first two component images (33.33 percent of the components) are retained. In other words, a third of the components are able to preserve 92.68 percent of the total information. This represents a huge improvement in data representation efficiency.

Also known as the loading matrix, the eigen vector matrix (Table 6.6) is square but asymmetrical if all component images are retained from PCA. If not, the number of rows equals the number of input bands and the number of columns is the same as the number of retained output components. Similar to the correlation matrix, all elements in the table have a value between −1.0 and 1.0 because the figures represent the proportion of the information content of each band/component to be redistributed.
                                        Component
    Spectral Band     1         2         3         4         5         6
    1              0.0439    0.6694   −0.4555    0.4466   −0.2367   −0.2951
    2             −0.0131    0.2929   −0.2174    0.0228    0.1398    0.9202
    3             −0.0919    0.4658   −0.1049   −0.6953    0.4769   −0.2296
    4             −0.7865   −0.2983   −0.5244   −0.0798   −0.1027   −0.0225
    5             −0.5733    0.2751    0.5819    0.3951    0.3172   −0.0162
    7             −0.2053    0.2908    0.3473   −0.3926   −0.7654    0.1126

TABLE 6.6 Eigen Vectors Computed for the Variance-Covariance Matrix in Table 6.3
As indicated by Eq. (6.20), all the squared figures in a row sum to 1. So do all the squared figures in a column. The information content of a spectral band is split into many parts among the components. As shown in Table 6.6, the information content of band 1 is allocated mainly to components 2 and 4. On the other hand, the information content of the component 1 image originates mostly from band 4, albeit negatively. This means that the radiometric appearance of the component 1 image resembles mostly that of band 4, in a reverse manner: a bright tone in the spectral band shows up as a dark tone in the component image, and vice versa. Similarly, the tonal appearance of component 2 resembles that of spectral band 1 most among all the bands.
6.6.2 Tasseled Cap Transformation

Also called the Kauth-Thomas (1976) transformation, the Tasseled Cap transformation was developed specifically for transforming the four Landsat MSS spectral bands. Pixels in a triangle formed by the four-band feature space represent vegetation at various stages of growth (Fig. 6.24a).
FIGURE 6.24 The three new axes in the Kauth-Thomas transformation showing crop trajectories in Landsat MSS band 4, 5, 6 space. (Source: Richards and Jia, 2006.)
The Tasseled Cap transformation optimizes the viewing of the original satellite data in the feature space for particular purposes (e.g., maximization of vegetation differences). Through rearranging the content of the output bands, it is possible to highlight the subtle variations in crop types. Essentially, the Kauth-Thomas transformation is a rotation of axes so that the differences among the pixels are more distinguishable along the new axes (Fig. 6.24b). The four bands are rotated to a new space defined by four axes: "brightness," "greenness," "yellowness," and "None-such." Each axis represents a unique aspect of the object of study. The brightness axis reflects chiefly variations in soil reflectance. The greenness axis reflects the variation in vegetation vigor. The yellowness axis is indicative of vegetation that has reached maturity (Fig. 6.24c). The last axis, orthogonal to the previous three axes that are themselves mutually orthogonal, accounts for noise in the data not related to soil or vegetation conditions. So far brightness and greenness have found wide applications, while the other two axes have not been so useful in discriminating different types and statuses of vegetation. Pixel values in the original images, which may be obtained at different times, are transformed into a space of three or four dimensions in the Tasseled Cap transformation. Pixel values along each of the output axes are produced arithmetically from a linear combination of those in the raw bands. The transformation from the raw spectral bands to the four parameters is accomplished through the following equation:

    U = | Brightness |   |  0.433    0.632    0.586    0.264 |
        | Greenness  | = | −0.290   −0.562    0.600    0.491 | MSS + C        (6.24)
        | Yellowness |   | −0.829    0.522   −0.039    0.194 |
        | None-such  |   |  0.223    0.012   −0.543    0.810 |
where MSS is the matrix of pixel values in the raw bands; C is a constant matrix that offsets U to prevent the appearance of negative values. The coefficient matrix, obtained from Landsat MSS imagery of four spectral bands, is applicable to Landsat MSS data recorded in any season anywhere at a quantization level of 7 bits. Efforts have been made to extend the transformation to Landsat TM data, which have seven spectral bands (Crist and Kauth, 1986). In addition to brightness and greenness, they have identified two extra axes, wetness and haze [Eq. (6.25)]. Pixels in this data space respond to vegetation canopy composition and structure, from which the vegetation type and stage of growth are able to be studied. The haze parameter can be used to dehaze Landsat TM imagery.
Absent from Eq. (6.25) is TM6, which has a spatial resolution different from that of the remaining bands. That equation was established from Landsat TM4 satellite data; the transformation coefficients in the matrix would be slightly different if other TM satellite data were used.
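Whatever the sensor-specific coefficients, the transformation itself is a single matrix multiplication per pixel. The sketch below therefore takes the coefficient matrix as an argument (for example, the Kauth-Thomas matrix of Eq. (6.24)) rather than hard-coding any particular set of values; the constant C is reduced to a scalar offset for simplicity.

```python
import numpy as np

def tasseled_cap(bands, coefficients, offset=0.0):
    """Apply a Tasseled Cap style transformation U = R * MSS + C, where
    every output axis is a fixed linear combination of the input bands.
    `coefficients` has shape (n_axes, n_bands); `offset` stands in for
    the constant matrix C."""
    stack = np.asarray(bands, dtype=float)
    x = stack.reshape(stack.shape[0], -1)
    u = np.asarray(coefficients, dtype=float) @ x + offset
    return u.reshape((-1,) + stack.shape[1:])

# brightness, greenness, yellowness, nonesuch = tasseled_cap(mss, kt_coefficients)
```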
6.6.3 HIS Transformation

A false-color image of three multispectral bands is visualized on a computer screen by projecting the three bands to the monitor using three guns of blue, green, and red light. In this additive process, different proportions of these primary-color lights are mixed to produce a whole range of colors, including yellow, magenta, and cyan (Fig. 6.25a). This kind of image is commonly referred to as red-green-blue (RGB). If the intensities of blue, green, and red are set equal to each other, then their mixed color will become achromatic (e.g., gray). Color is defined precisely by three parameters: hue (H), saturation (S), and intensity (I). Hue refers to a specific color, like red, green, or blue. It is related to the wavelength of light; for instance, blue light has a rough wavelength range of 0.4 to 0.5 μm. Saturation is defined as the purity of a color, or the ratio of color pigment to gray. Intensity refers to the brightness of a color. As shown in Fig. 6.25a, it is proportional to the average of the three primary colors of red (R), green (G), and blue (B). Any color can be defined by these three parameters using the Commission Internationale de l'Eclairage (CIE) curve (Fig. 6.25b). The outer boundary of the CIE curve corresponds to hue.
FIGURE 6.25 Two versions of the color space. (a) The cube model for RGB rendition, in which color is formed by projecting three guns of red, green, and blue to the same screen; (b) the cone model for HIS rendition, in which each color is separated into three components of hue (variation around the perimeter of the circle), intensity (along the vertical axis), and saturation (variation from the center of the circle).
Saturation is measured by the distance at which a point is located relative to the outer boundary. Intensity (also called value) is the distance from the apex of the cone. The transformation of pixel values in the RGB space into values in the HIS space requires the establishment of a new reference system. In this system hue is defined as proportional to the degree of rotation about the achromatic point. Saturation is defined as the length of a vector from the achromatic point to the point (R, G, B). Intensity is the vector length from the origin. After the establishment of this system, RGB can be translated to HIS using the following algorithms (Carper et al., 1990):

    I = (B + G + R)/3
    H = tan^−1[(R − G)/(3I)]                              (6.26)
    S = 0.5 √(3I² + (R − G)²)

Quantification of color through the RGB to HIS transformation provides direct control over the accurate portrayal and representation of colors. This useful means of image enhancement is good at fusing data from multiple sensors. For instance, images of different resolutions (e.g., a 1-m panchromatic band and 4-m multispectral bands) can be fused through the RGB to HIS transformation in a procedure known as "pan sharpening" to take advantage of the fine spatial resolution imagery. It is also possible to differentially contrast enhance the saturation and intensity components before they are transformed back to RGB.
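A full HIS-based pan sharpening follows Eq. (6.26) and its inverse; the sketch below takes a deliberately simplified shortcut, substituting the panchromatic band for the mean intensity and rescaling the three multispectral bands accordingly. It is an illustrative stand-in for intensity substitution, not the exact procedure of Carper et al. (1990), and it assumes the panchromatic band has already been resampled to the multispectral grid.

```python
import numpy as np

def intensity_substitution_sharpen(rgb, pan):
    """Simplified pan sharpening: scale the three multispectral bands so
    that their mean intensity matches the panchromatic band.  `rgb` is a
    (3, rows, cols) array; hue and saturation are only approximately
    preserved."""
    rgb = np.asarray(rgb, dtype=float)
    pan = np.asarray(pan, dtype=float)
    intensity = rgb.mean(axis=0)            # I = (R + G + B) / 3
    ratio = pan / (intensity + 1e-6)        # replace I with the sharper pan band
    return rgb * ratio                      # broadcasting scales each band alike
```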
6.7 Image Filtering in the Frequency Domain

The PCA, Kauth-Thomas, and RGB transformations share one commonality in that they are all carried out in the spatial domain. Apart from this domain, image filtering can also be implemented in the frequency domain using the common method of Fourier transformation, which operates on a single band (e.g., a grayscale image). The fundamental premise underlying this transformation is that each row of an image f(x) can be approximated by a series of sinusoidal waves, each having its own amplitude, frequency, and coefficient (Fig. 6.26). The transformed image can be described by the frequency of each waveform fitted to the image and the proportion of information associated with each frequency component (Mather, 2004). For satellite imagery, this generalization needs to be extended in two ways. First, the image is discrete instead of continuous, so the transformation is termed the discrete Fourier transformation (DFT). A highly efficient version of the DFT called the fast Fourier transformation (FFT) has been developed
FIGURE 6.26 Decomposition of 1D image ƒ(x) into a series of sine and cosine curves of various frequencies, magnitudes, and coefficients. (a) Original function ƒ(x); (b) sine and cosine of ƒ(x); and (c) Fourier transform of ƒ(x). (Source: Mather, 2004)
to expedite the computation of the huge number of values for all the sine and cosine terms, along with the coefficient multiplications. The second extension is from 1D images to 2D images. A 2D image can be considered to comprise many 1D rows of pixels, and a 2D FFT can be devised by combining many 1D FFTs. Thus, the scale components are 2D waveforms, and each scale component has an orientational component in addition to the usual magnitude. These coefficients form 2D arrays of real numbers or a single 2D complex array. An image in the spatial domain is denoted mathematically as f(i, j), a three-dimensional (3D) intensity surface with the rows (i) and columns (j) being the two horizontal axes. The gray-level intensity value at each (i, j) forms the third dimension. It is this value that is transformed into a series of waveforms of increasing frequencies and with different orientations using the following equation:

    F(u, v) = Σ(x=0..M−1) Σ(y=0..N−1) f(x, y) e^[−2πk(ux/M + vy/N)]        (6.27)

where M = number of pixels in a row
      N = number of pixels in a column
      u, v = spatial frequency variables
      k = imaginary unit of a complex number (√−1)
Both M and N must be a power of 2. In case the input image does not meet this requirement either horizontally or vertically, the dimension is padded up to the next highest power of two (Leica, 2006). Alternatively, the image can be subset or resampled to the required dimension. Of particular note is that a position (u, v) in a Fourier image does not always represent the same frequency, because frequency is inversely related to the size of the input image: a large spatial-domain image contains components of lower frequency than a small one. The transform of a 2D grayscale image via the FFT involves two steps. First, the Fourier coefficients are computed for each row of the image and stored in separate 2D arrays, one for the cosine terms (a_i) and another for the sine terms (b_i). Second, the Fourier transforms of the columns of the two matrices composed of a_i and b_i are computed to yield the Fourier coefficients of the 2D image. This requires the transpose of the two coefficient matrices, which is very time consuming to implement. After filtering in the frequency domain, the filtered image is transformed back to the spatial domain using the following equation:

    f′(x, y) = [1/(M × N)] Σ(u=0..M−1) Σ(v=0..N−1) F(u, v) e^[2πk(ux/M + vy/N)]        (6.28)
The main advantage of the FFT approach is that image filtering takes place in the frequency domain instead of the spatial domain. As such, periodic noise, such as striping, that cannot be dealt with effectively in the spatial domain can be handled better in the frequency domain. The obvious disadvantage of the FFT is its mathematical and computational complexity.
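NumPy's FFT routines handle the forward and inverse transforms of Eqs. (6.27) and (6.28) directly (without the power-of-two restriction of older implementations). The circular cutoff mask below is one simple illustrative choice of frequency-domain filter; a striping filter would instead zero only the spectral peaks produced by the stripes.

```python
import numpy as np

def low_pass_fft(image, cutoff=0.2):
    """Frequency-domain low-pass filtering: forward 2D FFT, suppression
    of all frequencies above `cutoff` (a fraction of the Nyquist
    frequency), then inverse FFT back to the spatial domain."""
    img = np.asarray(image, dtype=float)
    spectrum = np.fft.fft2(img)                       # Eq. (6.27) via the FFT
    fy = np.fft.fftfreq(img.shape[0])[:, None]        # cycles per pixel, rows
    fx = np.fft.fftfreq(img.shape[1])[None, :]        # cycles per pixel, columns
    keep = np.hypot(fx, fy) <= cutoff * 0.5           # low-frequency mask
    return np.real(np.fft.ifft2(spectrum * keep))     # Eq. (6.28)
```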
References

Bannari, A., D. Morin, F. Bonn, and A. R. Huete. 1995. "A review of vegetation indices." Remote Sensing Reviews. 13(1–2):95–120.
Bryceson, K. P. 1989. "The use of Landsat MSS data to determine the distribution of locust eggbeds in the Riverina region of New South Wales, Australia." International Journal of Remote Sensing. 10:1749–1762.
Carper, W. J., T. W. Lilesand, and R. W. Kieffer. 1990. "The use of intensity–hue–saturation transformation for merging SPOT panchromatic and multispectral image data." Photogrammetric Engineering and Remote Sensing. 56(4):459–467.
Crist, E. P., and R. J. Kauth. 1986. "The Tasseled Cap de-mystified." Photogrammetric Engineering and Remote Sensing. 52(1):81–86.
Gao, J., and M. B. Lythe. 1996. "The maximum cross-correlation approach to detecting translational motions from sequential remote sensing images." Computers and Geosciences. 22(5):525–534.
Gonzalez, R. C., and R. E. Woods. 1992. Digital Image Processing. Reading, MA: Addison-Wesley.
Holben, B. N. 1986. "Characteristics of maximum-value composite images from temporal AVHRR data." International Journal of Remote Sensing. 7(11):1417–1434.
Kauth, R. J., and G. Thomas. 1976. "The tasseled cap—a graphic description of the spectral-temporal development of agricultural crops as seen by Landsat." In Proceedings of the Symposium on Machine Processing of Remotely-Sensed Data 1976, ed. A., 4B:41–51. West Lafayette, IN: Purdue University.
Leica. 2006. ERDAS Field Guide. Norcross, GA: Leica Geosystems Geospatial Imaging.
Mather, P. M. 2004. Computer Processing of Remotely-Sensed Images: An Introduction (3rd ed.). Chichester, England: John Wiley & Sons.
Prewitt, J. M. S. 1970. "Object enhancement and extraction." In Picture Processing and Psychopictorics, ed. B. S. Lipkin and A. Resenfeld, 75–149. New York: Academic Press.
Rasmussen, M. S. 1998. "Developing simple, operational, consistent NDVI-vegetation models by applying environmental and climatic information: Part I. Assessment of net primary production." International Journal of Remote Sensing. 19(1):97–117.
Richards, J. A., and X. Jia. 2006. Remote Sensing Digital Image Analysis: An Introduction (3rd ed.). Berlin: Springer.
Sabins, F. F., Jr. 1996. Remote Sensing: Principles and Interpretation (3rd ed.). New York: Freeman.
Tanser, F. C., and A. R. Palmer. 1999. "The application of a remotely-sensed diversity index to monitor degradation patterns in a semi-arid, heterogeneous, South African landscape." Journal of Arid Environments. 43(4):477–484.
Zha, Y., J. Gao, S. X. Ni, Y. Liu, J. Jiang, and Y. Wei. 2003a. "A spectral reflectance-based approach to quantification of grassland cover from Landsat TM imagery." Remote Sensing of Environment. 87(2–3):371–375.
CHAPTER 7
Spectral Image Analysis

Since the advent of spaceborne remotely sensed data with the launch of the first Earth Resources Technology Satellite in 1972, a huge quantity of multispectral and hyperspectral satellite data from a wide variety of sensors has accumulated. Although they have considerably expanded our ability to study the Earth's surface, timely processing of this plethora of data is a formidable challenge that can no longer be met by using the traditional manual interpretation method, which is tedious, subjective, and slow. The solution to overcoming this limitation lies in the automatic processing and analysis of remote sensing data in the digital environment. Thanks to the development of computing technology and the availability of powerful digital image analysis systems introduced in Chap. 4, these data are now routinely analyzed digitally to derive the desired information automatically. Computer-assisted image classification has considerably expedited the process of studying the Earth's surface from satellite data.

Spectral image classification, also called information extraction, is a process of converting satellite data into meaningful land cover information based on pixel values in an image. Such spectral classification is accomplished either parametrically or nonparametrically in the multispectral domain. Depending upon the classifier, the process can be very simple or very complex, involving a number of steps. This chapter on spectral image classification starts with several rudimentary concepts related to image classification. The next topic is spectral distance, the fundamental decision criterion behind spectral classification. This chapter then concentrates on image classification itself. The two broad categories of per-pixel image classification methods, unsupervised and supervised, are discussed under separate headings, and their main features are compared with each other in Sec. 7.6. Following this comparison are two sections devoted to image classification at the subpixel level and fuzzy image classification. Finally, this chapter ends with postclassification processing undertaken to embellish the classification results.
7.1 General Knowledge of Image Classification

7.1.1 Requirements
Although image classification is mostly performed automatically by the computer in the digital environment, human intervention, either prior to the classification or during postclassification, still plays an indispensable role in its success, even though this intervention is reduced markedly in comparison with manual interpretation. The successful completion of image classification is impossible to achieve without the analyst fulfilling the following three requirements. First and foremost, the analyst must be familiar with the subject area under investigation. For instance, vegetation mapping at the species level is best done by a botanist. If not feasible, at least the analyst should be knowledgeable in botany in order to produce a reasonable and convincing vegetation map. Secondly, the analyst must be familiar with the geographic area under study, including its physiographic settings and the background information related to the theme under investigation. Such a familiarity may be gained through selective reconnaissance trips augmented by a study of large-scale aerial photographs or other satellite images of a larger scale. Other useful ancillary materials include the most recent topographic maps and thematic maps. These collateral materials are essential to achieving reliable classification results. The more preparatory work done to become familiar with the study area and the subject, the easier the task of image classification, and the more accurate the results will be. Finally, the analyst must have a sound understanding of the remotely sensed data being used, such as their spatial resolution, number of spectral bands and their wavelength ranges, and time and date of acquisition. In addition, the analyst must possess rudimental photo interpretation skills. Command of basic skills in using photo elements eases the task of image analysis, and is conducive to generation of accurate results. These photo elements are critical to the proper interpretation of aerial photographs by human interpreters. Although the use of these elements is reduced considerably in the digital analysis context, their skilful application facilitates the achievement of more accurate classification results, especially if the classification method is supervised.
7.1.2 Image Elements
There are a total of seven photo elements (Table 7.1): tone/color, shape, size, shadow, texture, pattern, and location/association.

TABLE 7.1 Photo Elements, Their Mathematical Expression, and Their Utility in Spectral Image Analysis (graphic samples omitted)

Photo element         | Description/mathematical expression                  | Utility
Tone/color            | DN in the multispectral domain                       | Sole clue in parametric classifiers
Size                  | Number of spatially contiguous pixels                | Not useful in the spectral domain
Shape                 | Hard to quantify (e.g., area/perimeter ratio)        | Not useful
Shadow                | DN of a small value, mostly noise                    | Not useful
Texture               | Standard deviation of DN or spatial autocorrelation  | Useful for identifying certain covers of a unique texture
Pattern               | Could be measured as spatial adjacency               | Useful in pattern recognition
Location/association  | Logical expressions in the form of "IF … THEN …"     | Restricted use in identifying certain features

Tone refers to the darkness or brightness of a pixel. In the digital environment, tone is equivalent to gray level, representing the amount of radiation from the scene received at the sensor. Gray level is rendered
as digital number (DN) in the digital context. According to the convention, a pixel of a small value in a spectral band has a dark tone (e.g., shadow). A large pixel value has a bright tone (e.g., cloud). The potential range of all digital numbers is governed by the quantization level of the sensing system (refer to Sec. 1.4.2). If a sensing system uses 8 bits to record Earth resources satellite data, there are 28 (256) possible gray levels ranging from 0 to 255. All pixels in spaceborne multispectral images have multiple DNs, each corresponding to a spectral band in the multispectral domain. Since each multispectral band has a unique wavelength range, the value of this pixel likely varies from band to band. The magnitude of variation in DN from a band to the one immediately adjacent to it is determined by the radiometric resolution of the remote sensing system. The specific value of a pixel’s DN is
subject to the wavelength range of its band. When multispectral bands are used to produce a composite by assigning each band a unique color of blue, green, or red, tone is replaced by color, which has three dimensions of hue, saturation, and intensity (refer to Sec. 6.6.3). Color is much more useful and effective than tone in visual interpretation. In digital analysis color occurs in the form of multiple values for the same pixel. All of these values (i.e., in all the bands) are utilized to compensate for the classifier's inability to use other elements.

Related to pixel DN is contrast, which refers to the gray level range of an image. Contrast affects the visual quality of an image, but bears no direct relationship to image classification. It exerts an impact on image classification indirectly in that it affects the selection of training samples vital in supervised classification. An image with a small contrast has to be stretched, using the methods discussed in the preceding chapter, to make all details visible. In this way representative training samples can be selected with relative ease for each of the categories to be classified.

Shape refers to the outline or configuration of an object. This element is very useful in identifying certain objects that have a unique shape in manual interpretation, such as the circular shape commonly associated with open pit quarries. However, shape is meaningless if pixels in a neighborhood are treated individually instead of as a group, because each pixel is perfectly square in shape. All per-pixel classifiers have in common that they cannot incorporate shape into the decision-making process.

Size refers to the physical dimension of ground objects in an image, which is directly proportional to its scale. An object appears to be larger on images of a larger scale. Given the same scale, however, the physical dimension of a ground object on remotely sensed imagery is determined by its spatial resolution. For instance, an object is recorded with more pixels (i.e., a larger size) in a SPOT (Le Systeme Pour l'Observation de la Terre) panchromatic band of 10-m resolution than in a Landsat Thematic Mapper (TM) band of 30-m resolution. Object size is of little utility in per-pixel classification in which images are treated as being composed of pixels of a uniform size rather than as objects. Potentially, object size could be a useful clue in object-oriented image classification, to be covered in Chap. 10.

Shadow appears as dark pixels that have a small DN on satellite imagery owing to a lack of sufficient incident energy. Therefore, shadow is not of particular significance if pixels are treated as individual cells. Similar to size, shadow is meaningful only when a group of pixels is treated as an object. Although a useful clue in revealing the third dimension (e.g., topographic relief or height) of ground objects in an image, shadow is not desirable in digital image classification as it creates a pseudoimage for the same type of ground cover, and hence degrades the quality of classification results. In order to produce reliable results, shadow needs to be removed prior
to classification. An effective means of eliminating shadow is via special image processing such as band ratioing.

Texture refers to the spatial frequency of variation in image tone or pixel value. This image element is measured on the basis of a group of pixels. Use of texture in image analysis is problematic owing to its scale dependency and the difficulty of its precise quantification. This topic is so complex that it will be covered in depth in Chap. 10.

Pattern refers to the regular and predictable spatial arrangement of the same object. It differs from texture in that it never occurs at the pixel level. Instead, pattern implies the repetitive occurrence of the same artificial objects along a certain direction, such as buildings and land plots. Thus, it is commonly associated with residential pattern and land use pattern. Pattern can be a critical element in identifying an object or an activity if it is manifested with a unique pattern. However, in the digital environment pattern is of little use in image analysis at the pixel level. It can be made very useful if the spatial relationship of objects is exploited, as with intelligent image classification or pattern recognition. This issue is so complex that it will not be covered in this book.

Also called location, contexture, and convergence of evidence, association refers to the inherent linkage between one object and another in a neighborhood. At present, association is not routinely used in image classification. Its use to improve image classification accuracy forms a frontier in digital image analysis, such as rule-based or knowledge-based classification in which the relationship is explicitly coded and multiple statements may be combined logically to infer the identity of pixels. Use of contexture in image classification will be covered in Chap. 10.

In visual interpretation it is impossible to rank these elements in terms of their significance, as the value of each element depends on the area under study and the object of image interpretation. However, in the digital environment, the most useful element is tone or its equivalent. As a matter of fact, it is the only image element used in all per-pixel classifications in the spectral domain. The decision-making evidence is based solely on the pixel's DN and its relationship with that of others. By comparison, use of the other six image elements is problematic and much more challenging owing to the difficulty of their definition and representation. How to incorporate some of these mathematically ambiguous elements into image classification to enhance its accuracy is a challenge at present. This issue will be addressed separately in chapters to follow.
7.1.3 Data versus Information
In digital image analysis, data and information are by no means synonymous with each other. On the contrary, they have quite different connotations. Data refer to all the remotely sensed images that the
analyst has access to or is about to analyze. They form most of the input fed into the computer. Virtually, these remotely sensed data represent values or DNs of pixels, usually in the multispectral domain. They capture the ability of ground objects to reflect or emit energy at various wavelengths. Information, on the other hand, refers to the final outcome derived from the analysis of the data. As a type of specially processed data, information is able to provide answers to questions related to these data. In other words, information is the data useful for a particular application.

A very important difference between the input satellite data and the classification results that are regarded as the information derived from the data is the range of pixel values and their meaning. Raw satellite data may have a pixel value ranging from 0 to 255, the exact values being determined by the quantization level of the sensing system. All pixel values are indicative of the amount of reflective/emissive radiation. In contrast, the pixel value range of a classified image is much narrower, usually numbered no more than tens. The objective of image classification is to convert such a vast amount of data into useful information. During the conversion, a wide range of DNs is reduced to a certain number of codes, each ideally corresponding to a meaningful ground cover in the classified results. Therefore, image classification is essentially a process of data generalization during which a range of pixel values corresponding to a ground cover is amalgamated into a single code. How these pixel values should be amalgamated depends on the classifier used and how many information codes are preserved in the classification outcome. These codes are collectively known as information classes. They are usually rendered graphically in map format with the desired geometry (e.g., georeferenced maps) toward the end of image analysis.
7.1.4 Spectral Class versus Information Class
A spectral class is defined as a cluster of pixels that are characterized by a common similarity in their DNs in the multispectral space. Whether a group of pixels can be regarded as one cluster is subjective, dependent on the specification of spectral distance among these pixels. If the distance between a pixel and a group of pixels falls within the specified threshold, this pixel is considered a part of that cluster. An information class, on the other hand, is a category of ground features retained in the classification results. Every category included in the enacted classification scheme represents an information class. It corresponds to a specific type of ground cover or feature to be extracted from remotely sensed data. The purpose of image classification is to map the input data into these information classes as accurately and reasonably as possible. There is a complex relationship between information classes and spectral classes. Rarely, they correspond to one another neatly. In
other words, it is almost impossible for one information class to be linked uniquely with a spectral class. On the contrary, one information class may exhibit a wide range of variations in its spectral value. Consequently, it can correspond to a number of spectral classes that are formed out of many slightly different but significant variations in appearance caused by the status, spatial composition (e.g., varying density), and the environmental settings. For instance, the appearance of a typical forest in a satellite image is affected by its age, the differing proportions of mixture with trees of other species, and topography (Fig. 7.1). The task of image classification is to merge these different and numerous spectral clusters rationally to form meaningful information classes.

FIGURE 7.1 Relationship between an information class and spectral (sub)classes: the information class "vegetation" may comprise spectral subclasses arising from variation in illumination (shadowed versus sunlit slope), species (shrubs versus forest), and growing condition (healthy versus drought stricken). (Source: modified from Campbell, 2002.)
7.1.5 Classification Scheme
The success of image classification depends largely upon the nature and soundness of the classification scheme adopted. A classification scheme is virtually a list of all potential land cover types present inside a study area that can be soundly identified from the satellite image. This scheme should be comprehensive and encompass all the covers present inside the area under study. All the information classes to be mapped should have an unambiguous definition so that they are mutually exclusive. One ground feature should not fit into the criteria of two information classes. All the covers in the classification scheme are usually grouped hierarchically for the convenience of their mapping. There are a number of classification schemes in use. One of the most popular schemes is the U.S. Geological Survey Land Use/Cover System devised by Anderson et al. (1976) (Table 7.2). Its popularity is attributed to its universality. This classification scheme can be adapted to all parts of the world for general land cover/use mapping after certain modifications. All terrestrial features on the Earth’s surface are
encompassed in this scheme, organized in a hierarchical order. Those at the primary level are the most general. Their mapping is usually accomplishable from coarse resolution satellite data such as Landsat Multispectral Scanner (MSS). At the secondary level, each cover is subdivided further into more detailed classes. For instance, urban is broken down into seven subcategories of residential, industrial, commercial, transportation, mixed, and so on. The mapping of these covers requires the remotely sensed data to have a moderate spatial resolution (e.g., around 30 m) in order to achieve reasonable accuracy. Land covers at the tertiary level are even more detailed than those at the secondary level. Their successful mapping through automatic classification, however, is possible only with the use of fine-detailed imagery, such as that from very high resolution satellite data (refer to Sec. 2.5). Even so, the accuracy of the mapping might not be satisfactory unless additional photo elements other than pixel values are used in the classification.

Irrespective of the classification scheme used, all spectral image classifications are underpinned by the same assumption that different information classes on the ground have different pixel values in the satellite imagery, preferably in every multispectral band used. Moreover, the same ground feature should have the same or a similar value in the same band. While this implicit assumption is not valid in every case, it is certainly correct under most circumstances. Whenever this assumption is violated, an incorrect classification may result if spectral information is the only clue used in the decision-making process.

TABLE 7.2 The USGS Land Use and Land Cover Classification System for Use with Remote Sensing Data at the Primary and Secondary Levels (Source: Anderson et al., 1976)

1 Urban or built-up land: 11 Residential; 12 Commercial and services; 13 Industrial; 14 Transportation, communications, and utilities; 15 Industrial and commercial complexes; 16 Mixed urban or built-up land; 17 Other urban or built-up land
2 Agricultural land: 21 Cropland and pasture; 22 Orchards, groves, vineyards, nurseries, and ornamental horticultural areas; 23 Confined feeding operations; 24 Other agricultural land
3 Rangeland: 31 Herbaceous rangeland; 32 Shrub and brush rangeland; 33 Mixed rangeland
4 Forest land: 41 Deciduous forest land; 42 Evergreen forest land; 43 Mixed forest land
5 Water: 51 Streams and canals; 52 Lakes; 53 Reservoirs; 54 Bays and estuaries
6 Wetland: 61 Forested wetland; 62 Nonforested wetland
7 Barren land: 71 Dry salt flats; 72 Beaches; 73 Sandy areas other than beaches; 74 Bare exposed rock; 75 Strip mines, quarries, and gravel pits; 76 Transitional areas; 77 Mixed barren land
8 Tundra: 81 Shrub and brush tundra; 82 Herbaceous tundra; 83 Bare ground tundra; 84 Wet tundra; 85 Mixed tundra
9 Perennial snow or ice: 91 Perennial snowfields; 92 Glaciers
7.2 Distance in the Spectral Domain

Distance is defined as the shortest length between any two points in the conventional cartesian space. In the spectral domain, distance between any two pixels is measured by the disparity in their DNs in the same band. There are two such spectral distance measures, the euclidean spectral distance and the Manhattan (city-block) spectral distance.
7.2.1 Euclidean Spectral Distance

The euclidean spectral distance between two pixels A and B in a two-band domain is the straight-line distance between them (Fig. 7.2). The formula commonly used for calculating distance in the cartesian coordinate system can be easily extended to the spectral coordinate system. The euclidean spectral distance De between two pixels in a multiple-band space is calculated using Eq. (7.1):

D_e = \sqrt{\sum_{i=1}^{n} (DN_{Bi} - DN_{Ai})^2}     (7.1)

where n = number of spectral bands used in a classification
      DN_{Ai} = DN of pixel A in the ith band
      DN_{Bi} = DN of pixel B in the same band (i.e., band i)
Unlike the cartesian coordinate system, the spectral coordinate system can have a dimension higher than three, depending upon the number of spectral bands used. The use of more bands does not alter the appearance of Eq. (7.1). More spectral bands simply mean more terms in the summation. The calculated spectral distance shows how far apart one pixel is from another. If one of the pixel’s value is replaced by the mean of a cluster of pixels, the spectral distance shows how far away this pixel is from this group. As a very important measure
of spectral similarity, the euclidean spectral distance can serve as a membership function. If a pixel has a shorter distance to the center of one cluster than to the center of another, then it is more likely to be a member of the former cluster because it shares a higher spectral similarity with all the pixels in that cluster.

FIGURE 7.2 Calculation of the euclidean spectral distance De between two pixels in the multispectral domain (only two spectral bands are illustrated in this diagram).
7.2.2 Manhattan (City-Block) Spectral Distance

Also called the city-block distance, the Manhattan spectral distance is the algebraic sum of the absolute differences between the values of two pixels in the same band, taken over every band used in the classification (Fig. 7.3). This distance, Dm, is calculated using Eq. (7.2). The use of the absolute value eliminates the need to worry about which pixel value should be subtracted from which in deriving the difference. Since each difference is guaranteed to be positive, no differences negate one another in the summation.

D_m = \sum_{i=1}^{n} |DN_{Ai} - DN_{Bi}|     (7.2)
The Manhattan spectral distance is easier to compute and less complex than the euclidean distance. It remains unknown which distance produces more accurate results, De or Dm, even though the euclidean spectral distance is used much more commonly in practice.
FIGURE 7.3 Calculation of the Manhattan (city-block) distance in the spectral domain. It is the sum of the two perpendicular sides of the right triangle, as against the hypotenuse, which is the euclidean distance.
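Both distance measures are easy to compute from the band values of two pixels. The short Python sketch below, assuming NumPy and hypothetical four-band pixel values, illustrates Eqs. (7.1) and (7.2); it is not code prescribed by the text.

```python
import numpy as np

def euclidean_distance(dn_a, dn_b):
    """Euclidean spectral distance De between two pixels (Eq. 7.1)."""
    diff = np.asarray(dn_b, dtype=float) - np.asarray(dn_a, dtype=float)
    return np.sqrt(np.sum(diff ** 2))

def manhattan_distance(dn_a, dn_b):
    """Manhattan (city-block) spectral distance Dm between two pixels (Eq. 7.2)."""
    diff = np.asarray(dn_a, dtype=float) - np.asarray(dn_b, dtype=float)
    return np.sum(np.abs(diff))

# Two hypothetical pixels observed in four spectral bands
pixel_a = [45, 60, 102, 37]
pixel_b = [52, 75, 90, 41]
print(euclidean_distance(pixel_a, pixel_b))   # straight-line distance in spectral space
print(manhattan_distance(pixel_a, pixel_b))   # sum of absolute band differences
```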
7.2.3 Normalized Distance

The normalized distance, Dnorm, refers to the absolute value of the difference between the means of two clusters divided by the sum of their standard deviations (Swain, 1978), or

D_{norm} = \frac{|u_i - u_j|}{\sigma_i + \sigma_j}     (7.3)

where u_i and u_j are the means of the two clusters and \sigma_i and \sigma_j are their corresponding standard deviations. This distance applies to two clusters of pixels, not to individual pixels. The distance is measured from the center of one cluster to that of another. There is no restriction as to how many members each cluster can contain. Virtually, this distance is indicative of the statistical separability between the two clusters. The larger the distance, the more easily and accurately the two clusters can be distinguished from each other. This distance may be used to judge the quality of the selected training samples between any two covers before these samples are formally used in a classification.
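A minimal sketch of Eq. (7.3) in Python is shown below; the training DNs for the two covers are hypothetical values invented for illustration only.

```python
import numpy as np

def normalized_distance(cluster_i, cluster_j):
    """Normalized distance Dnorm between two clusters in one band (Eq. 7.3)."""
    cluster_i = np.asarray(cluster_i, dtype=float)
    cluster_j = np.asarray(cluster_j, dtype=float)
    return abs(cluster_i.mean() - cluster_j.mean()) / (cluster_i.std() + cluster_j.std())

# Hypothetical training DNs of two covers in the same spectral band
water = [12, 14, 13, 15, 11, 12]
forest = [38, 42, 40, 37, 44, 41]
print(normalized_distance(water, forest))   # a large value indicates good separability
```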
7.3 Unsupervised Classification

Unsupervised classification is essentially clustering analysis in which pixels are grouped into certain categories in terms of the similarity in their spectral values. In this analytical procedure, all pixels in the input data are categorized into one of the groups specified by the analyst beforehand. Prior to the classification, the image analyst does not have to know anything about either the scene or the covers to be produced. During subsequent processing the identity of each spectral cluster is scrutinized and may be linked to a meaningful ground cover. Unsupervised classification may be implemented in a variety of ways. This section presents four common approaches.
7.3.1 Moving Cluster Analysis
Also known as K-means clustering, moving clustering starts with the specification of the total number of spectral classes (e.g., k) to be clustered from the input data. Then the computer arbitrarily selects this number of cluster centers or means as the candidates. The distance of every pixel in the input image to each of the candidate clusters is calculated. Of all the euclidean spectral distances calculated, a pixel is assigned to a candidate cluster to which the spectral distance is the shortest. After all of the pixels in the input image have been assigned to one of the candidate clusters, the sum of squared error (SSE) is
calculated from the pixels belonging to the respective clusters using the following formula:

SSE = \sum_{j=1}^{k} \sum_{i=1}^{n} [DN(i, j) - m_j]^2     (7.4)

where k = number of clusters specified
      n = number of pixels enclosed in a given cluster (its specific value varies from cluster to cluster)
      DN(i, j) = value of the ith pixel in the jth cluster
      m_j = mean of the jth cluster
Moving clustering is carried out iteratively. At the end of each iteration, the center of each cluster mj is updated with the mean value of all pixels comprising that group. The entire process of assigning pixels to one of the updated candidate clusters is reiterated using the newly derived cluster mean. As the clustering process continues, the updated cluster mean gradually approaches the genuine mean. In other words, SSE is going to become stabilized and leveled off. There are two means by which the iteration process is terminated: either the number of iterations reaches the specified value or the SSE convergence threshold (e.g., the amount of variation in the membership of all clusters from one iteration to the next) is reached. Obviously, the number of iterations required to reach the SSE threshold is affected by the initial arbitrarily selected cluster centers. The closer these centers are located to the genuine ones, the fewer number of iterations are required to reach the final result. A sensible approach of allocating the initial means is to determine the DN range in each band. The mean is obtained by dividing this range by the number of clusters. Then the means for each cluster equals the increment of the quotient plus the minimum DN. The process of K-means clustering analysis is best understood by examining Fig. 7.4 in which there are eight pixels in the two spectralband domain. During the first iteration, two cluster centers are randomly chosen by the computer. Five of the pixels fall into the first cluster while the remaining three are grouped into the second cluster (Fig. 7.4a). After this iteration the two clusters produce an SSE value of 93. During the second iteration, the cluster means have been updated using the member pixels in the corresponding cluster. Four of the five pixels still stay inside this cluster while another is assigned to the second cluster (Fig. 7.4b). After this iteration SSE decreases to 65.52 (Fig. 7.4c). At the end of the third iteration, a few pixels switch their membership regimes. Consequently, cluster 1 comprises five pixels, but cluster 2 encompasses only three (Fig. 7.4d). SSE continues to decrease to 55.75. At the fourth iteration, the membership composition of both clusters does not change at all (Fig. 7.4e). However, SSE decreases further to only 20.27. Thus, the process of clustering is terminated after four iterations.
FIGURE 7.4 Decision-making process of K-means clustering. (a) Values of the pixels to be classified in the two spectral bands; (b) clustering results after each pixel value is assessed against the means of two arbitrarily selected centers (SSE = 93); (c) clustering results after the pixels are reassessed using the updated cluster centers during the second iteration (SSE = 65.52); (d) clustering results after the pixels are reassessed using the updated cluster centers during the third iteration (SSE = 55.75); (e) final clustering results (SSE = 20.27).
During the actual implementation of K-means clustering, the analyst needs to specify the convergence threshold in addition to the number of clusters to be generated. Since it is easier to merge several clusters into one than to split one into a few, it is recommended that more clusters than necessary be specified initially. As illustrated in Fig. 7.5, not every cluster corresponds to a unique ground cover, no matter how many clusters are generated in a classification. This problem may be alleviated via the iterative self-organizing method.

FIGURE 7.5 An example of unsupervised classification in which the input image is classified into 8 (a) and 12 (b) clusters using a convergence threshold of 0.950 and a maximum of 10 iterations. See also color insert.
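A minimal sketch of the moving (K-means) clustering loop is given below, assuming the image has been flattened into an array of pixel vectors. The initial means are spread evenly across the DN range of each band, as suggested above; the function name and the simple membership-based convergence test are illustrative assumptions rather than an operational implementation.

```python
import numpy as np

def kmeans_cluster(pixels, k, max_iter=10, tol=0.95):
    """Illustrative K-means clustering of pixel vectors.

    pixels   : (n_pixels, n_bands) array of DNs
    k        : number of clusters requested by the analyst
    max_iter : maximum number of iterations
    tol      : convergence threshold, i.e. the fraction of pixels whose
               membership must remain unchanged between iterations
    """
    pixels = np.asarray(pixels, dtype=float)
    # Spread the initial candidate means evenly across the DN range of each band
    lo, hi = pixels.min(axis=0), pixels.max(axis=0)
    means = lo + (hi - lo) * (np.arange(1, k + 1) / (k + 1))[:, None]

    labels = np.full(len(pixels), -1, dtype=int)
    for _ in range(max_iter):
        # Assign every pixel to the cluster with the shortest euclidean distance
        dist = np.linalg.norm(pixels[:, None, :] - means[None, :, :], axis=2)
        new_labels = dist.argmin(axis=1)
        # Update each cluster mean from its current member pixels
        for j in range(k):
            if np.any(new_labels == j):
                means[j] = pixels[new_labels == j].mean(axis=0)
        # Stop once cluster membership has stabilised
        if np.mean(new_labels == labels) >= tol:
            labels = new_labels
            break
        labels = new_labels
    return labels, means
```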
7.3.2 Iterative Self-Organizing Data Analysis
The Iterative Self-Organizing Data Analysis Technique (ISODATA) is very similar to the K-means clustering method. The difference lies in three additional steps that are undertaken to optimize the clusters (sketched in code after this list):

• Deletion. After a certain number of iterations, a particular cluster may be deleted if its number of member pixels falls below the prespecified threshold.
• Merging. During clustering the spectral distance between any two clusters is constantly monitored. They are merged if their spectral distance falls within the predefined threshold.
• Splitting. New clusters may be created by splitting an existing cluster if its variance is too large or if it contains a large contingent of pixels exceeding the specified threshold.

These three additional steps increase the adaptivity of the algorithm, but also make the computation more complex. Compared to K-means, ISODATA requires specification of more parameters for deletion and merging, and a variance limit for splitting (variance has to be calculated for each cluster).
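The three optimization steps can be sketched as a single refinement pass applied between batches of K-means iterations. The sketch below is illustrative only; the thresholds, the median-based split rule, and the function name are assumptions, and operational ISODATA implementations expose many more parameters.

```python
import numpy as np

def isodata_refine(pixels, labels, means, min_size=20, merge_dist=5.0, max_std=15.0):
    """One illustrative ISODATA-style refinement pass (delete, merge, split).

    pixels : (n_pixels, n_bands) DNs; labels : current cluster index per pixel;
    means  : (k, n_bands) current cluster means. Thresholds are illustrative.
    """
    clusters = [pixels[labels == j] for j in range(len(means))]

    # Deletion: drop clusters with too few member pixels
    clusters = [c for c in clusters if len(c) >= min_size]

    # Merging: fuse clusters whose means are closer than the merge threshold
    merged, skip = [], set()
    for a in range(len(clusters)):
        if a in skip:
            continue
        for b in range(a + 1, len(clusters)):
            if b not in skip and np.linalg.norm(clusters[a].mean(0) - clusters[b].mean(0)) < merge_dist:
                clusters[a] = np.vstack([clusters[a], clusters[b]])
                skip.add(b)
        merged.append(clusters[a])

    # Splitting: divide any cluster whose spread along its most variable band is too large
    final = []
    for c in merged:
        band = c.std(0).argmax()
        if c.std(0)[band] > max_std and len(c) >= 2 * min_size:
            cut = np.median(c[:, band])
            final.extend([c[c[:, band] <= cut], c[c[:, band] > cut]])
        else:
            final.append(c)
    return [c.mean(0) for c in final]        # updated cluster means for the next pass
```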
7.3.3 Agglomerative Hierarchical Clustering

Unlike K-means clustering and its variants, the agglomerative hierarchical grouping algorithm does not require specification of the number of clusters prior to classification. Instead, all pixels present in an image are treated as potential clusters. The distance among all pixels is then calculated. Those pixels that have the shortest spectral distance among themselves are considered to belong to one cluster. They are merged to form a cluster if their distance falls below the specified threshold. The means of all newly formed clusters are then calculated. They are treated as if they were still individual pixels in subsequent calculation of spectral distance between any two clusters, or from one cluster to individual pixels. This process continues until all pixels belong to one cluster. The history of merging is recorded and all cluster fusions are displayed as a dendrogram (Fig. 7.6). As shown in this figure, the number of clusters to be generated depends on the definition of spectral distance for a cluster. If the distance is defined as 1, there are seven clusters in the output, with pixels 1 and 2 merged as one cluster. If the distance rises to 1.6, then there are six clusters in the output. Pixels 6, 7, and 8 remain as three separate clusters by themselves. If the distance rises further to 2.24, then only three clusters are left in the output. This process is terminated only when the number
of clusters is equal to that specified by the image analyst after numerous mergings. This procedure is rarely used to classify remotely sensed data owing to the tremendous number of pixels involved and thus the high intensity of computation.

FIGURE 7.6 An example of the agglomerative hierarchical clustering analysis based on various euclidean distances in decision making (a dendrogram of eight pixels and the resulting 7-, 5-, 3-, and 2-cluster groupings in the two-band spectral space).
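For a small set of pixel vectors the procedure can be reproduced with SciPy's hierarchical clustering routines, as sketched below. The eight two-band pixel values are illustrative stand-ins, not the values plotted in Fig. 7.6, so the cluster counts obtained at each threshold will depend on the data used.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Eight hypothetical pixels in a two-band spectral space
pixels = np.array([[2, 1], [3, 2], [4, 2], [5, 4],
                   [4, 5], [6, 6], [7, 8], [8, 7]], dtype=float)

# Build the merge history (dendrogram); 'centroid' linkage replaces each
# merged cluster by its mean, as described in the text, using euclidean distances
tree = linkage(pixels, method='centroid')

# Cutting the dendrogram at different distance thresholds yields different cluster counts
for threshold in (1.0, 1.6, 2.24):
    labels = fcluster(tree, t=threshold, criterion='distance')
    print(threshold, labels, '->', labels.max(), 'clusters')
```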
7.3.4 Histogram-Based Clustering
Histogram-based image clustering relies on an n-dimensional (n being the number of spectral bands used) graphic histogram constructed from the input data. Illustrated in Fig. 7.7 is a two-dimensional histogram. The variable of both horizontal axes is DN, which typically ranges from 0 to 255. The vertical axis represents frequency, or the number of pixels having a specific DN in the two bands. A local peak in this histogram represents a cluster. This can be determined by searching for the local probability peaks in the multispectral space using a 3 × 3 window. A potential cluster is encountered if the center frequency is the highest within the neighborhood, such as 24 (boldfaced) in Fig. 7.8. However, false peaks can be produced if the search is executed only once strictly within the window. In order to prevent this from happening, a second search around all potential local peaks is warranted to determine whether there exist other peaks in the vicinity. If the answer is yes, then at least one of them is not a genuine peak. Only the cell with the highest frequency is a genuine candidate for the
local peak. A further search around this frequency guarantees the identification of a genuine peak. This process is repeated until all the neighboring peaks are searched and assessed. While it is relatively easy to construct a multidimensional histogram and search for local peaks, histogram-based image clustering analysis has a critical limitation in that the boundaries between two adjoining peaks cannot be drawn precisely. A common solution is to use the cells with the lowest frequency between them. However, it is still uncertain as to which of the two adjoining clusters these border pixels should be assigned.

FIGURE 7.7 A two-dimensional histogram in which a peak in frequency represents a potential cluster center. (Source: Lillesand et al., 2004.)

FIGURE 7.8 A two-dimensional histogram of an input image that can be used to identify local peaks in histogram-based clustering analysis.
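The initial peak search can be sketched in Python as shown below: build the two-band frequency array and flag every cell whose count is the strict maximum of its 3 × 3 neighborhood. The secondary search around neighboring peaks and the drawing of cluster boundaries are deliberately left out; the bin size and variable names are assumptions for the sketch.

```python
import numpy as np

def histogram_peaks(band1, band2, bins=256):
    """Find candidate cluster centres as local peaks of a two-band histogram."""
    hist, _, _ = np.histogram2d(band1.ravel(), band2.ravel(),
                                bins=bins, range=[[0, bins], [0, bins]])
    peaks = []
    for i in range(1, bins - 1):
        for j in range(1, bins - 1):
            centre = hist[i, j]
            window = hist[i - 1:i + 2, j - 1:j + 2]
            # A candidate peak: the centre frequency is the unique maximum
            # within its 3 x 3 neighbourhood and the cell is not empty
            if centre > 0 and centre == window.max() and (window == centre).sum() == 1:
                peaks.append((i, j, int(centre)))
    return peaks
```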
7.4 Supervised Classification

Supervised classification is much more complex than unsupervised classification in that it involves many more steps. All of these steps are discussed in detail in this section. Also included in the discussion are three supervised classifiers and the decision rules behind them.
7.4.1 Procedure
Illustrated in Fig. 7.9 is the entire procedure of a supervised classification. The first three steps in the procedure are development of a classification scheme, determination of spectral bands, and selection of a classifier, all of which are considered preliminary. They do not necessarily have to take place in the order shown. Formulation of a proper classification scheme is critical to the success of supervised classification as it controls the soundness of the classified results directly. At present there are several common classification schemes in use, including those for wetland (Cowardin et al., 1979), and for specific geographic regions (Florida Topographic Bureau, 1985), in addition to the already mentioned USGS one proposed by Anderson et al. (1976). No matter which classification scheme is adopted, it may not completely suit the geographic area or the theme under study without modification. It is the image analyst’s responsibility to determine the number of classes and their detail level in the modified scheme.
FIGURE 7.9 Flowchart of the complete steps involved in a supervised image classification: development of a classification scheme, selection of spectral bands, selection of a classifier, selection of training samples (repeated until the samples are satisfactory), classification (repeated until the classified results are satisfactory), postclassification processing, and results representation.
This detail level should be commensurate with the spatial resolution of the satellite data. Prior to classification, consideration should be given to the number of spectral bands used in the classification. Remotely sensed data are usually acquired in the multispectral mode. Most Earth resources satellite data are recorded in spectral bands that are numbered at tens. In this case it is acceptable to take advantage of all of them in one classification because, generally, the more bands used, the more accurate the results. However, such an indiscriminate practice is problematic for hyperspectral remote sensing data of hundreds of spectral bands. They are troublesome to handle in an image processing system. Moreover, the information content of some of these bands
is inevitably correlated with one another. The improvement in classification accuracy is not necessarily proportional to the number of spectral bands included in a classification anymore. In order to reduce data redundancy, the most informative bands should be identified from their variance/covariance (see Sec. 6.6.1). The image analyst has to decide how many of these informative bands should be kept for the classification.

Selection of the classifier is very simple. There are three commonly utilized algorithms to choose from (parallelepiped, minimum distance, and maximum likelihood), each having its own strengths and limitations (to be discussed in Sec. 7.5 below). The last two classifiers produce superior results to the first one from the same set of training samples, and are used more commonly in practice.

Once the number of spectral bands (also called spectral features) is finalized, representative samples or training areas need to be selected for each of the classes retained in the classification scheme. These samples form the foundation for the subsequent classification. Care must be taken in their selection. The quality of these samples is judged by their "representativeness." Training areas should be representative of the spectral characteristics of all the information classes in the scheme. Selection of quality training samples requires knowledge and understanding of the properties of different ground features in the satellite imagery. Such knowledge can be gained from a field visit to selected spots or via visual interpretation of aerial photographs. In order to train the computer to recognize the land cover types present in the study area, the image analysts have to train themselves first by studying collateral materials such as outdated thematic maps. After the selection of training areas, the spectral signatures of all information classes are generated from every spectral band to be used in the classification. These signatures are in the form of statistical parameters, such as mean, standard deviation, variance, and covariance. Once the selected training samples are deemed satisfactory, the process proceeds to the next stage, image classification.

Image classification is a process of labeling the identity of pixels in the input image. Most image analysis systems provide the option of not labeling all pixels when the evidence in the training samples is conflicting. However, the image analyst has the ultimate discretion in deciding how many pixels should remain unclassified. The machine then assigns the specified number of pixels into one of the information classes based on the relationship between the pixels' DNs and the statistical parameters of the training samples. This classification step may have to be repeated with updated training samples if the preliminary classification results are not satisfactory. Although the quality of training samples has been evaluated and deemed acceptable after their selection, the ultimate judgment has to wait until after results are generated from them. This process may have to be reiterated until the classification results are considered acceptable.
Once the classification results are considered sufficiently accurate, they may undergo thematic generalization during postclassification processing. At this stage the results are spatially filtered to remove isolated pixels before they are assessed for accuracy. The issue of accuracy assessment is so complex that it warrants a separate chapter (Chap. 12). After accuracy assessment the classified results are presented in map format with the necessary cartographic embellishments (e.g., inclusion of a legend, a scale bar, and annotative information).
7.4.2 Selection of Training Samples
To ensure the highest quality of the training samples and to maximize their representativeness, a number of points must be borne in mind during their selection, including the number of pixels at each training site, spatial distribution of training sites, and spectral properties of pixels within each site.
Quantity
How many pixels should be selected as training samples for an information class varies with its spatial prevalence. The more extensively a cover is distributed over the scene, the more pixels should be selected for this class. For a class of a subordinate areal extent, its total training sample size should amount to 100 pixels. Such a large size guarantees that its spectral variation is adequately represented in the selected samples. The number of training pixels required also varies with the image classifier. For the maximum likelihood classifier, the number of pixels in the training dataset should be at least 10 to 30 times the number of features for each class (Mather, 2004).

Size
A common method of selecting training samples is to delimit polygons over the area of interest. There is no definite rule stipulating the physical size of a training area patch. Generally, this size should be sufficiently large to capture the spectral characteristics of the class. How large a polygon should be delimited is governed by the physical size and shape of the information class on the image. For classes that are spatially widespread inside a study area, it is very easy to delimit a large polygon that encloses sufficient pixels. Occasionally, the desired size has to be compromised for classes that are highly restricted in their spatial extent with an irregular shape of distribution.

Location
If the spectral properties of a ground cover vary geographically across the study area, a few patches of training areas positioned throughout the image should be included in the samples. For instance, residential areas in close proximity to an urban center are quite different from those in the suburbs and periurban areas. Training samples should be collected from all localities where the spectral properties are different, if they are to be classified into the same category.

Number
How many polygons should be drawn for a cover class depends on the diversity of the radiometric properties of this information class manifested in the image. It is always preferable to select a few smaller patches distributed at different locations than to select one large polygon at only one location. For instance, training samples should be selected from a few sites to cover water of various turbidity levels if all waters are to be lumped into one class. It is a sensible practice to select more patches than is necessary so that the unsatisfactory ones can be deleted later. A minimum of five to ten polygons is recommended for each information class.

Uniformity
No matter how many polygons are drawn, all the pixels inside them should ideally exhibit a unimodal distribution in their values in every spectral band to be used. Multimodal histograms are not ideal as they violate the assumption underlying certain classifiers, a situation that should be avoided to achieve more accurate classification results.
7.4.3 Assessment of Training Sample Quality

The quality of the selected training samples is assessed for their separability, which is judged against the spectral distance between signatures, such as the euclidean spectral distance between their means, or divergence. If the signatures of two classes are very similar in the same spectral band, they cannot be separated reliably. For instance, if the mean of one information class is very close to or overlaps with that of another information class, these two classes will be severely confused with each other in the classification results. The amount of confusion is worsened if both covers happen to have a large standard deviation as well. A more rigorous approach to assessing training sample quality is to produce a confusion matrix for all selected training samples. This table is able to illustrate the spectral confusion among all classes even before the classification is performed. How to generate and interpret a confusion matrix is such a complex topic that its discussion has to be delayed until Chap. 12. If the confusion is excessive, the heavily confused samples have to be deleted and reselected.
7.5 Per-Pixel Image Classifiers

There are three per-pixel image classifiers: parallelepiped, minimum distance to mean, and maximum likelihood. Of these three, the computationally simplest parallelepiped method is presented first in this section, followed by the more complex minimum-distance-to-mean and finally the maximum likelihood classifier. In order to understand the mathematical underpinning of these classifiers, two matrices are introduced below. The first matrix is the DN matrix of pixel X, or DN_X:

DN_X = \begin{Bmatrix} DN_1 \\ DN_2 \\ \vdots \\ DN_n \end{Bmatrix}     (7.5)
where n is the total number of spectral bands used in a classification and DN_i (i ranges from 1 to n) is the digital number of pixel X in the ith spectral band. Thus, matrix DN_X has a dimension of one column by n rows [Eq. (7.5)]. The second matrix, u_j, holds the mean DNs for class j, or

u_j = \begin{Bmatrix} C_{1j} \\ C_{2j} \\ \vdots \\ C_{nj} \end{Bmatrix}     (7.6)

where m is the total number of information classes to be classified (j ranges from 1 to m) and C_{ij} is the mean DN of class j in band i. Taken over all classes and bands, the C_{ij} values therefore form a matrix comprising m × n DN values [Eq. (7.6)].
7.5.1 Parallelepiped Classifier
Also known as the “box” method, the parallelepiped classifier assigns a pixel into one of the predefined information classes in terms of its value in relation to the DN range of each class in the same band. This comparison is expressed mathematically as
Pixel X ∈ Cj   if   min DNj ≤ DNX ≤ max DNj     (7.7)
Translated into plain language, the decision rule states that pixel X under consideration is a member of information class Cj if and only if its value falls inside the DN range of this class in the same band. In this classification, the statistical parameters used are the minimum and maximum values, obtainable from the training samples, of an information class. Their minimum and maximum pixel values are defined in two ways. First, they are literally the smallest and the largest values. Use of these actual values poses a high vulnerability to the influence of a few outlier pixels. In order to prevent this from taking place, the minimum and maximum values should be defined more reliably from such statistical parameters as mean and standard deviation. For instance, the maximum value may be defined as the mean
plus one standard deviation and the minimum as the mean minus one standard deviation. The decision rule in Eq. (7.7) applies to a single band only. There are two logical permutations in applying this decision rule to multiple spectral bands, "AND" and "OR." In the former logic, the evaluation returns true only if the condition is met in every band used. Thus, only those pixels falling inside the bound of an information class are classified reliably, hence the name "box" method. As illustrated in Fig. 7.10, pixel 3 can be reliably classified into "residential" using this logic. By comparison, pixel 2 cannot be classified into either "cloud" or "quarries" for certain because it falls into two boxes simultaneously. This confusion stems from the fact that the two boxes overlap with each other in both spectral bands. The final classification outcome for this pixel depends on the sequence in which the training samples are fed into the computer. It is classified as "cloud" if its training sample is provided to the computer ahead of that of "quarries." Otherwise, it will be classified as "quarries." This subjectivity is caused by the violation of the assumption that different covers should have dissimilar DNs in every spectral band used in the classification. Judged against the above decision rule, any pixels falling outside the
boxes remain unclassified in the output results, a phenomenon known as "gap." For instance, pixel 1 will remain unclassified if the "AND" logic is applied to the decision rule. The gap can be substantial if too many pixels fall outside all boxes.

An alternative means of applying the decision rule is to adopt the "OR" logic. The evaluation returns a true value if the condition is met at least once in one of the bands. A pixel is assigned to an information class if its value falls within the class's DN range for the first time. This lax decision criterion is unscientific for those pixels that fit the range of two information classes in one of the spectral bands, even though they may not fall into any of their boxes (Fig. 7.10). The classification results vary with the sequence of entering the training samples into the computer. For instance, pixel 1 will be classified as "industrial" instead of "residential" if the training samples of the former class are compared with the pixel value first. Similarly, pixel 3 will be classified erroneously as "pasture" or even as "forest," depending upon which spectral band is used for the evaluation first and which information class's training samples are fed into the computer first. In other words, the results are not unique, and the situation should be avoided.

The parallelepiped classifier is characterized by its simplicity. The decision-making process does not require sophisticated computation. Decision rules are simple comparisons of pixel values. The upper and lower thresholds of an information class suffice in the decision making of a classification. This method is limited in that not every pixel can be reliably classified in the output result, sometimes causing it to have a considerable gap.

FIGURE 7.10 Decision rules of the parallelepiped classifier. Only those pixels that fall inside a dashed-line box can be reliably classified if it does not overlap with another box. 1, 2, and 3: locations of three representative pixels.
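A sketch of the "box" decision rule with the AND logic is given below, with each class range defined as the training mean plus or minus one standard deviation. Unlike the order-dependent behavior described above, this illustrative version simply leaves pixels that fall in no box, or in more than one box, unclassified (code 0); the variable names and the one-standard-deviation choice are assumptions.

```python
import numpy as np

def parallelepiped_classify(image, training):
    """Illustrative parallelepiped ("box") classification with AND logic.

    image    : (rows, cols, n_bands) array of DNs
    training : dict mapping a class code (int > 0) to an (n_samples, n_bands)
               array of training DNs
    Returns a (rows, cols) array of class codes; 0 = unclassified ("gap").
    """
    rows, cols, n_bands = image.shape
    pixels = image.reshape(-1, n_bands).astype(float)
    codes = sorted(training)
    hits = np.zeros((pixels.shape[0], len(codes)), dtype=bool)

    for idx, code in enumerate(codes):
        samples = np.asarray(training[code], dtype=float)
        lower = samples.mean(axis=0) - samples.std(axis=0)   # box lower bound per band
        upper = samples.mean(axis=0) + samples.std(axis=0)   # box upper bound per band
        # AND logic: the pixel must fall inside the box in every band
        hits[:, idx] = np.all((pixels >= lower) & (pixels <= upper), axis=1)

    result = np.zeros(pixels.shape[0], dtype=int)
    unique_hit = hits.sum(axis=1) == 1            # exactly one box claims the pixel
    result[unique_hit] = np.take(codes, hits[unique_hit].argmax(axis=1))
    return result.reshape(rows, cols)
```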
7.5.2 Minimum-Distance-to-Mean Classifier
The decision rule in the minimum-distance-to-mean classifier is based on the relativity among the spectral distances between the pixel in question and the center (mean) of all information classes that have been derived from the training samples. The decision rule behind this classifier takes the following form:
Pixel X ∈ Cj   if   d(Cj) = min[d(C1), d(C2), ..., d(Cm)]     (7.8)
where min[d(C1), d(C2), ..., d(Cm)] is a function for identifying the smallest distance among all those inside the bracket. For instance, min[23.4, 35.2, 47.8, 12.3, 56.7] returns a value of 12.3; d(Cj) refers to the euclidean distance between pixel X and the center of information class Cj. It is calculated using the following equation:

d(C_j) = \sqrt{\sum_{i=1}^{n} (DN_i - C_{ij})^2}     (7.9)

Equation (7.9) is identical to Eq. (7.1) except that the second pixel value is replaced with the mean of all pixels in the information class Cj. For every pixel in the input image, this computation is repeated m times, each time for one of the information classes. The amount of computation required thus grows with the number of pixels in the input image, and hence with the square of its linear dimension. According to Eq. (7.8), pixel X is considered to be a member of information class Cj if and only if its spectral distance to the mean of this class is the shortest among its distances to the centers of all information classes. The information class corresponding to this distance is used as the identity for pixel X. Therefore, the allocation of this pixel into this information class is relative. All derived distances are compared among themselves to identify the minimum one. This classifier offers no provision for cases in which the two shortest distances happen to be identical. There are two options to resolve this dilemma:

• To assign the pixel to either of the classes randomly, at the risk of making a mistake. This solution is not recommended owing to the fact that the same data will produce two different results if classified twice using the same training samples.
• To leave the pixel unclassified as a result of lack of convincing evidence. This creates the same gap problem as that commonly associated with the parallelepiped method.

Application of the decision rule [Eq. (7.8)] causes the spectral space to be partitioned into polygons or spheres of influence that are known as Voronoi polygons in GIS (Fig. 7.11). The boundaries of these polygons are formed by straight lines that bisect the line connecting the centers (means) of the two nearest clusters. All pixels falling inside one of the polygons receive the identity of the information class enclosed by this polygon (the boundary is not enclosed for outer polygons). Pixels falling on the bisecting lines have an equal distance to the two land covers on both sides. These pixels cannot be classified into either of the covers with certainty. As illustrated in Fig. 7.11, the euclidean distance from pixel 3 to the center of "pasture" is the shortest among the seven distances. Thus, it is assigned to this information class incorrectly. Intuitively, this pixel should be classified as "residential" since it is located among the cluster of residential pixels. This misclassification stems from the fact that only the mean of each information class is taken into consideration in the decision-making process and its standard deviation is ignored. The standard deviation of each class governs the compactness of its cluster. "Residential" has a much larger standard deviation (i.e., a larger cluster size) than "pasture." This misclassification can be eliminated if both the mean and standard deviation of a class are taken into account.
FIGURE 7.11 Decision rule behind the minimum-distance-to-mean classifier. The distances from a pixel to the means (+) of all information classes, plotted in the DN space of bands 1 and 2, are calculated and compared among themselves. Solid lines define the boundaries of influence; all pixels falling inside a convex polygon are classified as the land cover it encloses because the Euclidean distance to that cover is the shortest.
In spite of this limitation, the minimum-distance-to-mean classifier requires only a moderate amount of computation in its decision making. The decision is valid in most cases if the training samples fed into the computer are representative. Besides, no assumptions about the distribution of pixel values are required of these samples.
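The decision rule of Eqs. (7.8) and (7.9) translates into only a few lines of code. Below is a minimal sketch, assuming the image is held as a NumPy array of shape (rows, cols, bands) and that the class means have already been derived from training samples; the function and variable names are hypothetical.

```python
import numpy as np

def minimum_distance_classify(image, class_means):
    """Assign each pixel to the class with the nearest mean, Eqs. (7.8) and (7.9).

    image       : array of shape (rows, cols, bands)
    class_means : array of shape (m, bands), one mean vector per class
    returns     : int array of shape (rows, cols) holding class indices 0..m-1
    """
    rows, cols, bands = image.shape
    pixels = image.reshape(-1, bands).astype(float)
    # Euclidean distance from every pixel to every class mean, Eq. (7.9)
    diff = pixels[:, np.newaxis, :] - class_means[np.newaxis, :, :]
    distances = np.sqrt((diff ** 2).sum(axis=2))        # shape (rows*cols, m)
    # Decision rule of Eq. (7.8): the smallest distance wins; ties go to the
    # first class encountered rather than being flagged as unclassified
    labels = distances.argmin(axis=1)
    return labels.reshape(rows, cols)

# Hypothetical usage with a 3-band image and four training-derived class means
rng = np.random.default_rng(0)
img = rng.uniform(0, 255, size=(100, 100, 3))
means = rng.uniform(0, 255, size=(4, 3))
classified = minimum_distance_classify(img, means)
```

Note that argmin silently resolves the tie case discussed above by taking the first of the equal distances; an operational implementation would flag such pixels or leave them unclassified.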
7.5.3 Maximum Likelihood Classifier
The maximum likelihood method takes advantage of the probability of a pixel being a member of an information class in its decision making. This classifier relies on the second-order statistics of the gaussian probability density function model for each class. The basic discriminant function for pixel X is
X ∈ Cj if p(Cj/X) = max[p(C1/X), p(C2/X), …, p(Cm/X)]
(7.10)
where max [p(C1/X), p(C2/X), …, p(Cm/X)] is a function that returns the largest probability among those inside the bracket. For instance, max [0.45, 0.32, 0.67, 0.83, 0.71] returns a value of 0.83. The information
class corresponding to this probability is used as the identity for pixel X. p(Cj/X) denotes the conditional probability of pixel X being a member of class Cj. It is solved using Bayes' theorem:
p(Cj/X) = p(X/Cj) × p(Cj)/p(X)
(7.11)
where p(X/Cj) represents the conditional probability of encountering pixel X in class Cj; p(Cj), also called the a priori probability, stands for the occurrence probability of class Cj in the input image; and p(X) denotes the probability of pixel X occurring in the input image. It is obtainable from the training samples by summing up the probability of finding the pixel in every information class multiplied by the proportion of the respective class, or

p(X) = \sum_{j=1}^{m} p(X/C_j)\, p(C_j)

(7.12)
At first glance at Eqs. (7.11) and (7.12), it appears self-contradictory that the probability of encountering a specific information class, p(Cj), has to be known before the classification. Without the classification results it is impossible to know how large each land cover class is in the input image. This seeming contradiction is resolved in two ways:

• First, the percentage of each land cover can be derived from another classification, such as an unsupervised classification, the minimum-distance-to-mean classification, or a previous run of the maximum likelihood classification.
• Second, this probability can be assumed to be equal for all cover classes. This assumption is not strictly rational, but it does not appear to affect the classification results as the classifier itself is robust.

During actual implementation of maximum likelihood classification, the analyst is given the chance to specify the probability of each information class. No matter which method is used to resolve p(Cj), it does not affect the determination of p(Cj/X) after Eq. (7.12) is plugged into Eq. (7.11), or

p(C_j/X) = \frac{p(X/C_j)\, p(C_j)}{\sum_{j=1}^{m} p(X/C_j)\, p(C_j)} = \frac{p(X/C_j)}{\sum_{j=1}^{m} p(X/C_j)}

(7.13)
Therefore, the calculation of p(Cj/X) is reduced to the determination of p(X/Cj). The calculation of this conditional probability is based on the Gaussian (normal) probability density model under the assumption of a normal distribution for all training samples. In the one-dimensional case

p(X/C_j) = \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\!\left[ -\frac{(x - u_j)^2}{2\sigma_j^2} \right]

(7.14)

where u_j is the mean and σ_j the standard deviation of class Cj; both are generated from the training samples. The computation of p(X/Cj) becomes much more complex in the multispectral domain, as is the case with satellite imagery data:

p(X/C_j) = \frac{1}{(2\pi)^{p/2}\, |\Sigma_j|^{1/2}} \exp\!\left[ -\frac{1}{2} (X - u_j)^T \Sigma_j^{-1} (X - u_j) \right]

(7.15)

where p is the number of spectral bands, and u_j and Σ_j are the mean vector and covariance matrix of class Cj, respectively. They are estimated from the training data by the unbiased estimators using Eqs. (7.16) and (7.17):

u_j = \frac{1}{N_j} \sum_{i=1}^{N_j} X_i

(7.16)

\Sigma_j = \frac{1}{N_j - 1} \sum_{i=1}^{N_j} (X_i - u_j)(X_i - u_j)^T

(7.17)

where N_j is the number of training pixels in class Cj.
The basic discriminant function of Eq. (7.10) can be rewritten as
X ∈ Cj if and only if p(Cj/X) > p(Ci/X) for all i ≠ j
(7.18)
According to the above decision rule, pixel X is a member of information class Cj if and only if the probability of this pixel belonging to this information class is larger than the probability of its occurrence in any other information class. For every pixel in the input image, its probability of occurring in every information class has to be computed using Eq. (7.15). The intensity of computation can quickly get out of hand with an increase in the input image size.
FIGURE 7.12 Distribution of occurrence probability for each of the information classes to be classified over the two-band spectral domain (DN of band 1 vs. DN of band 2). Circles and ellipses are equiprobability contours.
Just as with the minimum-distance-to-mean method, this decision rule is also relative. If a pixel happens to have the highest probability in two information classes, it cannot be assigned to either class with certainty as a consequence of incomplete evidence. In this case the pixel can be allocated to one of them arbitrarily at the risk of making a mistake. Alternatively, it can be left unclassified in the output.

The probability for an information class to occur is the highest at the center of a cluster and decreases gradually away from it (Fig. 7.12). In this diagram the probability is represented as equiprobability contours. Pixel 3 has a higher probability of occurrence in "residential" than in "pasture" owing to its shorter distance to the center of the residential probability contours. Thanks to the use of both the mean and the standard deviation of the information classes in the decision making, the mistake of classifying pixel 1 into "industrial," as is the case with the minimum distance to mean, is avoided. Instead, this pixel is correctly classified as "residential." However, pixel 2 still cannot be classified with 100 percent certainty. As a matter of fact, any pixel that falls inside overlapping equiprobability contours cannot be classified with perfect confidence. In order to determine to which class these pixels should be assigned, the probability curves of both covers involved need to be examined. Illustrated in Fig. 7.13 are the probability curves generated from one of the spectral bands in Fig. 7.12.
FIGURE 7.13 Uncertain zone (shaded) in the maximum likelihood classification for pixels falling inside the overlapping probability distribution curves, p(X/Cj)p(Cj), of two cover classes ("cloud" and "quarries") plotted against the DN of band 1.
The two conditional probability distribution curves intersect with each other at DN 128. Pixels inside the shaded overlapping zone cannot be classified into either cover reliably. The appropriate assignment of these pixels requires calculation of the conditional probabilities in the respective covers, for instance, the probability of "128" being a member of "cloud," p(C/128), and of "128" being a member of "quarries," p(Q/128). The determination of these two probabilities requires the prior probabilities p(C) and p(Q), which are assumed to be 0.4 and 0.6, respectively. It is further assumed that the conditional probability of encountering 128 in the cloud and quarries training samples is, respectively, p(128/C) = 0.7 and p(128/Q) = 0.3. Thus, from Eq. (7.11),

p(C/128) = (0.7 × 0.4) / (0.7 × 0.4 + 0.3 × 0.6) = 0.28/0.46 ≈ 0.61
p(Q/128) = (0.3 × 0.6) / (0.7 × 0.4 + 0.3 × 0.6) = 0.18/0.46 ≈ 0.39

Since p(C/128) > p(Q/128), all pixels with a value of 128 in this spectral band should be classified as "cloud." Illustrated in this example is a one-spectral-band situation. In reality, multispectral bands are likely to be
used in a classification. The calculation of the conditional probabilities then has to rely on Eq. (7.15).

Like all other parametric classifiers, the maximum likelihood classifier is constrained by the following three limitations:

• The calculation of p(X/Cj) requires that the training samples of all land covers to be mapped follow some known form of distribution, such as the normal distribution. This requirement, however, is frequently violated in practice, especially for information classes that have an extremely narrow range of pixel values. Thanks to its robustness, the maximum likelihood classifier is able to tolerate this violation to a certain degree without degrading the quality of the classification results.
• The input pixel vector X is the reflectance of the Earth's surface quantified at 8 bits, or 256 gray levels, for most remotely sensed data, usually recorded as integers. Incompatibility in data formats may arise when non-remote sensing data are incorporated into the classification.
• In order to calculate p(X/Cj), the class-specific covariance matrix must be nonsingular, or invertible (Benediktsson et al., 1993). In the classification of high-dimensional remote sensing data, this may be problematic.
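Despite these caveats, the decision rule of Eqs. (7.10) and (7.15) can be sketched compactly. The following is a minimal sketch, assuming per-class mean vectors and covariance matrices estimated from training samples as in Eqs. (7.16) and (7.17); the function name is hypothetical, and equal priors are used by default, as discussed above.

```python
import numpy as np

def maximum_likelihood_classify(image, means, covariances, priors=None):
    """Label each pixel with the class maximizing p(X/Cj)p(Cj), Eqs. (7.10)-(7.15).

    image       : array of shape (rows, cols, bands)
    means       : array of shape (m, bands), class mean vectors u_j
    covariances : array of shape (m, bands, bands), class covariance matrices
    priors      : optional array of shape (m,); equal priors assumed if omitted
    """
    rows, cols, bands = image.shape
    m = means.shape[0]
    if priors is None:
        priors = np.full(m, 1.0 / m)
    pixels = image.reshape(-1, bands).astype(float)

    # Work with log densities to avoid numerical underflow over many bands
    log_scores = np.empty((pixels.shape[0], m))
    for j in range(m):
        inv = np.linalg.inv(covariances[j])            # covariance must be nonsingular
        _, logdet = np.linalg.slogdet(covariances[j])
        diff = pixels - means[j]
        mahal = np.einsum("ib,bc,ic->i", diff, inv, diff)   # (X-u)^T S^-1 (X-u)
        log_scores[:, j] = (-0.5 * (bands * np.log(2 * np.pi) + logdet + mahal)
                            + np.log(priors[j]))

    labels = log_scores.argmax(axis=1)                 # decision rule of Eq. (7.10)
    return labels.reshape(rows, cols)
```

Because the logarithm is monotonic, maximizing the log of p(X/Cj)p(Cj) yields exactly the same labels as maximizing the probability itself, while keeping the computation numerically stable.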
7.5.4 Which Classifier to Use?

The three classifiers make use of different statistical parameters in their decision making (Table 7.3). They vary in computational complexity and in ease of understanding and implementation.

| Classifier | Parallelepiped | Minimum Distance | Maximum Likelihood |
|---|---|---|---|
| Ease of understanding | Easy to understand | Moderate; relatively easy to understand | Complex; hard to understand |
| Usefulness | Not useful | Useful | Very useful |
| Computation intensity | Simple | Moderate | Intensive |
| Statistical parameters used | Minimum and maximum | Mean only | Mean and standard deviation |
| Assumption | No | No | Yes |

TABLE 7.3 Comparison of the Three Spectral Classifiers

The parallelepiped method is the easiest to use.
However, the results are not very useful due to the presence of an extensive "gap" (Fig. 7.14a). In this classification, the raw pixel values are used directly in the comparison without any assumption about these values. By comparison, the maximum likelihood classifier is mathematically the most complex in that a conditional probability must be derived for every pixel in every information class in the classification scheme. Pixel values are used indirectly in the decision making. Since the classification involves the use of more statistical parameters of the training samples than the other two classifiers, it should theoretically produce the most reliable results (Fig. 7.14b). Although this generalization is correct in most cases, its validity is subject to the quality of the training samples. If quality training samples are not provided, the minimum-distance-to-mean classifier, which requires the use of only cluster means, can outperform this robust classifier.
FIGURE 7.14 Comparison of results classified with three classifiers. The input image is shown in Fig. 5.24. (a) Parallelepiped; (b) maximum likelihood; (c) minimum distance to mean. See also color insert.
Similar to the parallelepiped method, the minimum-distance-to-mean classifier does not involve any assumption about the data. Under the same conditions, it produces classification results (Fig. 7.14c) slightly inferior to those of the maximum likelihood method.
7.6 Unsupervised and Supervised Classification

Both unsupervised and supervised classification have their own distinctive features (Table 7.4).

| Criteria | Supervised | Unsupervised |
|---|---|---|
| Ease of operation | Complex | Simple |
| Speed | Slow | Fast |
| Usefulness | Useful | Not so useful |
| Cover identity known | Prior to classification | After classification |

TABLE 7.4 Comparison between Supervised and Unsupervised Image Classification

Unsupervised classification differs drastically from supervised classification in that it moves from the unknown to the known, while the latter operates in the reverse sequence. In unsupervised classification all pixels of an unknown identity are blindly grouped into a certain number of clusters according to the similarity in their DNs. Prior to classification the image analyst may know nothing about the nature of the scene; only the number of clusters to be derived needs to be specified, even though these clusters may not correspond to meaningful ground covers found in the area of study. The identity of all clusters is ascertained in a postclassification session. In contrast, supervised classification starts from the known in that the image analyst must be knowledgeable about the variety of land covers present in the study area and their characteristics on the image. Without such knowledge it is impossible to devise a sensible classification scheme and select training samples representative of the respective information classes. Through comparison with these pixels of a known identity, pixels of an unknown identity in the input image are labeled.

Unlike supervised classification, unsupervised classification is relatively easy to undertake. Its implementation can be very fast, thanks to the minimal human intervention required during classification; a quick result can be generated within seconds. However, the results of unsupervised classification are merely spectral clusters that do not necessarily correspond to information classes on the ground. Because of this limitation, unsupervised classification is not regarded as particularly useful and hence is not widely utilized. The exception is when the scene has a rather simple structure or the information class of interest has its own unique spectral properties, such as areas affected by flooding, areas burned by fires, and deforested areas.
These distinctive areas can be reliably and efficiently mapped via unsupervised classification. Given that water usually has a distinctive spectral reflectance pattern, flood-affected areas are easily clustered as a separate group in an unsupervised classification. In the case of forest fires, burned areas are easily distinguished from intact areas and are likely to form a unique cluster of their own. Deforested areas devoid of dense vegetative cover can be mapped accurately amid their vegetated surroundings using cluster analysis.

For a complex scene of diverse land covers, however, it is unlikely that all spectral clusters formed in an unsupervised classification can be linked to unique information classes on the ground. In this case it is better to take advantage of the supervised method, even if it takes longer to generate the results. Better results can be produced quickly by combining the two methods. For instance, the selection of training samples can be simplified by taking advantage of unsupervised classification results; namely, the output from an unsupervised classification is used as the input into the supervised classification. Furthermore, the results from the unsupervised classification can guide the specification of the a priori probability of occurrence for each information class during maximum likelihood classification.

No matter which classifier is used, it is not possible to produce a perfect classification (Fig. 7.14). This situation will remain little changed no matter how carefully the training samples are selected because all classifiers share the same critical limitation: they are intrinsically aspatial in nature. Only the spectral information of pixels is used in reaching the decision, and the other six image elements (size, shape, texture, shadow, location, and association) are ignored by the classifiers. Pixels are treated individually, and the spatial relationship among them is totally disregarded. These spectral classifiers are unable to take advantage of the rich spatial information inherent in the input image that is routinely employed by human interpreters in visual image analysis. Predictably, the accuracy of machine-derived land cover maps is markedly lower than that of manual results, even though the latter could still be more detailed (Harvey and Hill, 2001). The inability of these classifiers to incorporate spatial information in the classification represents a huge waste of available clues. This deficiency may be overcome with the use of additional image elements or other ancillary data. Efforts that have been made to remedy the situation and improve classification accuracy include spatial image classification and knowledge-based image classification, covered in Chaps. 10 and 11, respectively.
7.7 Fuzzy Image Classification

The classification algorithms discussed in the preceding sections are known as hard classifiers. They produce thematic maps in which each pixel is associated with just one class. These results are crisp, or
deterministic, in that all pixels can have only one identity. Thus, all mapped land covers have only one of two memberships: 0, being a nonmember of a class (false or incorrectly labeled), and 1, being a full member of the class (true or correctly classified). Even in the probability-based maximum likelihood classification, a pixel is assigned to the class with the highest membership because it is allowed to have only one identity attached to it. The final outcome of a classification is rendered as a membership of 0 or 1, despite the fact that the probability of the pixel occurring in each of the covers has already been calculated. This kind of classification is limited in that it cannot accommodate any uncertainty about the target. It is also problematic for mixed pixels whose identity may involve at least two land covers. On the other hand, many ground covers (e.g., vegetation) can have diverse, but slightly different, appearances on satellite imagery due to a variety of factors, such as age, soil moisture content, and slope orientation. These diverse conditions cannot be reliably lumped into one information class in a crisp classification. In fact, it is very difficult for a limited number of information classes to represent the whole range of class mixture and within-class spectral variability present in the input image, regardless of how many information classes are designated to capture these spectral variations. The solution to this problem lies in soft classification, or fuzzy classification, based on fuzzy logic.
7.7.1 Fuzzy Logic
Fuzzy logic is a multivalued logic that attempts to quantify uncertainty. Essential to fuzzy logic is the concept of the membership function. In the crisp case this function can be represented as a rectangle (Fig. 7.15a). In fuzzy logic the two Boolean values of "true" and "false" are replaced by a continuous range of values between 0 and 1, thanks to the membership concept.
FIGURE 7.15 Comparison of decision boundaries in a crisp and a fuzzy classification. (a) The decision boundary is abrupt with only two possible membership values, 0 and 1; (b) the fuzzy boundary. Notice the gradual transition from 0 to 1 in membership value for class I and from 1 to 0 for part of class II.
This fuzzy expression enables an information class to be described more precisely (Fig. 7.15b) than the dichotomous membership of 1 and 0 in the traditional Boolean logic behind crisp classification. The gradual membership function is able to represent the whole transition between true and false. In general, the broader the membership function (i.e., the gentler the membership function line), the more vague the underlying concept; the lower the membership values, the more uncertain the assignment of a pixel value to an information class. Owing to these transitional values between true and false, arbitrarily sharp thresholds are avoided. The complex reality is much better approximated by fuzzy logic than by simplistic Boolean logic.

Moreover, one feature may be defined with more than one fuzzy set. For instance, one object feature may be defined with three fuzzy sets (Fig. 7.16). The more the membership functions overlap, the more objects are common to the fuzzy sets and the more vague the final classification. As illustrated in Fig. 7.16, if pixel X has a value of 80, its membership to land cover A is 0.08 but 0.2 to cover B. The membership to both A and B drops to 0 if the value is 200. Fuzzy image classification requires fuzzy expression of all features. The fuzzy sets for different features are operated on instead of the feature values themselves. Thus, all mathematical operations are based on membership values between 0 and 1, irrespective of the actual range of these features. This simplification is advantageous with a high-dimensional feature space in which different features have different types of values over different ranges.
FIGURE 7.16 Example of fuzzy decision boundaries for pixel value. A pixel value in an overlapped portion of the boundaries means that it can belong to two classes at the same time.
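As a concrete illustration of overlapping membership functions, the short sketch below evaluates three triangular fuzzy sets for a pixel value. The breakpoints are hypothetical and only loosely modeled on Fig. 7.16, so the resulting membership values will not match the 0.08 and 0.2 quoted above exactly.

```python
import numpy as np

def triangular_membership(x, left, peak, right):
    """Membership of value x in a triangular fuzzy set that rises from 0 at
    `left` to 1 at `peak` and falls back to 0 at `right`."""
    x = np.asarray(x, dtype=float)
    rising = (x - left) / (peak - left)
    falling = (right - x) / (right - peak)
    return np.clip(np.minimum(rising, falling), 0.0, 1.0)

# Hypothetical fuzzy sets A, B, and C defined over the 0-255 DN range
fuzzy_sets = {"A": (0, 40, 110), "B": (60, 130, 190), "C": (150, 220, 255)}

pixel_value = 80
memberships = {name: float(triangular_membership(pixel_value, *limits))
               for name, limits in fuzzy_sets.items()}
print(memberships)   # the value 80 belongs partly to A and partly to B, not at all to C
```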
7.7.2 Fuzziness in Image Classification
The introduction of the fuzzy concept to image classification is prompted by the fact that most land covers are fuzzy in nature. There is no abrupt change from one cover (e.g., urban residential) to another (e.g., urban commercial) in most cases. Instead, the transition from densely populated areas to sparsely populated areas is more likely to be gradual. On the other hand, the remotely sensed data on which the classification is based are full of ambiguities caused by the atmosphere and the sensor. The combination of these factors makes the results of parametric classification unsatisfactory owing to the presence of too much confusion between classes. In fuzzy image classification, land cover classes are defined as fuzzy sets, and pixels as set elements. Each pixel is associated with a membership vector containing the probabilities of this pixel being a member of each of the information classes. These probabilities are indicative of the degree of association. Therefore, pixels of mixed cover components and those covers that have a condition between two defined information classes can be easily accommodated. Unlike the decision boundaries in Fig. 7.10, there is no longer any clear-cut decision boundary in the fuzzy partition of the spectral space (Fig. 7.17). Included in the output of a fuzzy classification are multiple membership values, expressed as

f_class = [\mu_1, \mu_2, ..., \mu_n]

(7.19)
Fuzzy classification based on fuzzy logic is a process decomposable into three major steps: fuzzy expression of the input variables resulting in fuzzy sets, fuzzy classification, and defuzzification of the classified results, if necessary. Success of fuzzy image classification depends on the careful selection and parameterization of the membership function, which must accurately model the underlying relation between the value of a pixel in a spectral band and its identity. The proper design of a membership function may also require expert knowledge. How well the modeled membership function captures reality directly governs the accuracy of the final classification results (Civanlar and Trussel, 1986). The quality of a fuzzy classification is judged against overall reliability and stability, two classification parameters contained in fuzzy classification results, together with the aforementioned class mixture (Benz et al., 2004). The membership values in Eq. (7.19) are likely to vary from class to class. The higher the membership value, the higher the degree of this pixel being a member of the class concerned, and the more reliable the assignment. The larger the discrepancy between the highest and the second highest membership values, the more stable the classification.
FIGURE 7.17 Output of membership values for a pixel under consideration (here fsoil = 0.70, fveg = 0.25, fwater = 0.05) in a fuzzy partition of a two-band spectral domain. Notice that the sum of membership values is 1 for every classified pixel. (Source: modified from Wang, 1990a.)
A pixel that has an equal membership value in more than one class is said to represent an unstable classification. This means the pixel cannot be distinguished using the spectral evidence available.

Fuzzy classification results, though rich in information, are problematic for visualization. Users may have difficulty perceiving any information in the mapped results. In order to be effectively visualized and appreciated, these fuzzy results have to be converted to crisp results, or "hardened," to form standard land cover and land use maps. The easiest method of hardening fuzzy classification results is to use the maximum membership value, a typical approach to defuzzifying them. If the maximum membership value of a pixel falls below a specified threshold (e.g., 50 percent), the pixel in question may not be assigned to any information class, to ensure maximum reliability (Benz et al., 2004). After defuzzification, all the rich measures of uncertainty in the fuzzy classification are lost. Thus, this processing should be carried out toward the very end of the entire information extraction process, if it is necessary at all.
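A minimal sketch of this hardening step follows, assuming the fuzzy output is stored as a membership array of shape (rows, cols, classes); the 0.5 threshold and the use of -1 for unassigned pixels are illustrative choices rather than a prescribed convention.

```python
import numpy as np

def harden(memberships, threshold=0.5, unassigned=-1):
    """Defuzzify a membership image by the maximum-membership rule.

    memberships : array of shape (rows, cols, classes) with values in [0, 1]
    threshold   : pixels whose highest membership falls below this value are
                  left unassigned to preserve reliability
    """
    labels = memberships.argmax(axis=2)      # class with the highest membership
    top = memberships.max(axis=2)
    return np.where(top >= threshold, labels, unassigned)

# Example: a 2 x 2 membership image with three classes
mu = np.array([[[0.70, 0.25, 0.05], [0.40, 0.35, 0.25]],
               [[0.10, 0.80, 0.10], [0.33, 0.33, 0.34]]])
print(harden(mu))
# [[ 0 -1]
#  [ 1 -1]]
```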
7.7.3 Implementation and Accuracy
In addition to fuzzy logic–based classification, fuzzy classifications can also be implemented in four other ways. The first method is spectral unmixing analysis, to be discussed in Sec. 7.8. The second method is to represent image properties as fuzzy, such as fuzzy mean and variance matrices that are extensions of the conventional mean and variance matrices (Wang, 1990a). They then replace the regular matrices in a supervised classification in the same way as with the conventional classifier. The third method, which has found common application, is fuzzy c-means clustering analysis, in which fuzzy sets are used chiefly to represent intermediate results (Cannon et al., 1986). Fuzzy membership functions are then linked to the land cover composition of mixed pixels. A critical limitation of this method is the inability to evaluate the accuracy of the fuzzy classification outcome.

The last method is to "soften" the output from a hard classifier (Foody, 1999). Conventional parametric classifiers, such as maximum likelihood, already output memberships for a pixel to occur in all information classes to be mapped. Instead of converting the highest probability to 1, a vector of all probabilities of occurrence [see Eq. (7.19)] can be retained in the output. Ideally, such probabilities could indicate the composition of mixed pixels. The full membership information about a pixel is construed as fuzzy. Another hard classifier suitable for softening is the neural network (to be covered in Chap. 8). In particular, the backpropagation network is suitable for modification into a soft classifier. This modification is made to the output of the activation function, which used to be binary (e.g., activating an output unit or not at all). Instead, the level of activation can be considered a measure of the strength of membership in the class in question. Rather than retaining only the most activated network output unit, the strength of the activation level for all units is output. These levels may indicate the land cover composition at the subpixel level, equivalent to the fuzzy membership values derived from fuzzy c-means analysis (Foody, 1996).

Common to all implementations of fuzzy classification is the assumption that membership values are equivalent to the proportions of component covers at the subpixel level. It has been repeatedly demonstrated that fuzzy membership grades are indeed strongly correlated with the actual proportion of covers within a pixel (Foody and Cox, 1994; Maselli et al., 1996). Fuzzy probabilities are more precise estimates of class distribution than those of conventional hard classifiers, thanks to better characterization of mixed, uncertainly attributable pixels. However, it is questionable whether all membership grades will sum to unity. If not, membership normalization may have to follow the classification.

Fuzzy classification has been found to be more accurate than the maximum likelihood classifier in mapping land covers (Wang, 1990b). Fuzzy classifiers (overall accuracy in the vicinity of 70 percent) are
superior to conventional hard classifiers (overall accuracy in the vicinity of 50 percent) in mapping suburban land covers (Zhang and Foody, 1998). The improvement in fuzzy classification accuracy is probably due to the differentiation of homogeneous and mixed pixels via the membership grades, because most misclassifications cluster along boundaries, where land covers are usually in a transitional state and where scene complexity is rather high. With fuzzy classifiers, it is possible to identify the types and proportions of component covers even in mixed pixels. Such a capability to extract richer spectral information about the target is conducive to yielding more accurate results in subsequent analyses. Besides, stray pixels and pixels that are not normally assigned to any information class in a crisp classification can be classified as well.

The accuracy of fuzzy classification reported in the above studies has been assessed in a flawed manner in that fuzziness is confined to the classification process itself and is not present in the training samples or in the reference data used. The reference data have to be fuzzy themselves in order to generate a realistic assessment. As revealed by Zhang and Foody (1998), if fuzziness in the ground reference data is not taken into account, fuzzy image classification results are accurate in the lower 70-percent range. However, the accuracy reaches the lower 90-percent range if assessed against fuzzy reference data, and the Kappa value more than doubles relative to the per-pixel classification. Additionally, conventional measures of classification accuracy are no longer applicable to fuzzy classified results as they are designed primarily for hard classifiers. Therefore, the reported accuracy is not truly revealing. This deficiency may be overcome with the simple measures proposed by Foody (1996). In these measures the land cover composition in a fuzzy classification is compared to the composition measured on the ground. The results are presented as distance or information closeness for probability distributions, though more research is needed on their ability to identify significant differences in the output before they are widely accepted.

To sum up, fuzzy classifiers are able to tolerate imprecision or vagueness in the input data, in class descriptions, and in data modeling. Fuzzy classification may yield much more information on land covers than "hard," deterministic classifiers. However, the results so obtained may face two problems:

• First, it is uncertain how such information would find any practical application.
• Second, there is a concern that the increased information on the land covers may actually hamper the interpretation of the results themselves (Maselli et al., 1996).

In spite of the richer information supplied by fuzzy classifiers, the appropriateness of fuzzy classification for operational applications must be carefully weighed in every single case.
7.8 Subpixel Image Classification

In the parametric classification methods discussed above, land covers are mapped at the pixel level. This per-pixel approach to classification works well for ground covers that are spatially uniform over broad scales, such as plantation forests and cultivated fields. However, the results are not accurate for land covers that do not make up an entire pixel. The accuracy is lower for covers that are spatially fragmented into patches smaller than the pixel size, and is further degraded if the satellite imagery used happens to have a coarse spatial resolution. This problem is best addressed with subpixel image classification, which is able to detect surface covers occupying only a fraction of a pixel in the input image. Subpixel image classification outputs the proportions of all cover components within a pixel, which can therefore have multiple identities. Image classification at the subpixel level is very useful for precisely mapping land covers occupying less than a pixel.

Subpixel image classification can be performed with several approaches, including fuzzy maximum likelihood, fuzzy c-means clustering (Liu et al., 2004), artificial neural networks (Atkinson et al., 1997), and spectral unmixing. Fuzzy c-means classification is identical to the "hard" minimum-distance-to-mean classification except that the spectral distance to an information class is construed as the likelihood of a pixel occurring in the corresponding cover class. A shorter distance from a pixel to a given class means a greater proportion of the pixel belongs to that class. The exact proportions are determined from the full set of distances to all classes. Thus, proportions of land cover are estimated at the subpixel level. This method works well for linear ground features that have a width similar to the pixel size (Thornton et al., 2006); the accuracy is noticeably lower if the feature width falls below half a pixel. A slight variation in the implementation of fuzzy c-means classification is an additional regression analysis based on fuzzy membership functions (Foody and Cox, 1994). Although more accurate than fuzzy c-means classification alone, its success does require accurate coregistration of all images and a training dataset. The neural network method has not found wide application, and thus is not covered in depth here. By comparison, the widely used spectral unmixing is the focus of this section.
7.8.1 Mathematical Underpinning
Underlying spectral unmixing is the assumption that the net radiance at the sensor (i.e., without the atmospheric effect) is a linear combination of the spectra of all component covers within one pixel. Each component contributes in a unique way to the observed reflectance. If the mixing of the reflectance from individual component covers is independent of each other, the contribution of these covers can be
estimated via inversion of spectral mixture models (Small, 2001). The composite reflectance spectrum of band i, R_i, can be expressed mathematically as

R_i = \sum_{k=1}^{n} f_k R_{ik} + \varepsilon_i

(7.20)
where
n = number of spectral bands
R_i = spectral reflectance of a pixel in the ith band
R_ik = reflectance of the kth (k = 1, 2, …, n) component cover of the pixel in the ith spectral band
f_k = weight of the mixing, or the respective areal proportion of the kth component, or endmember, in the pixel (an endmember refers to the idealized, pure signature of a class)
ε_i = error term in the ith spectral band

If the noise is uncorrelated, the linear equation above can be inverted to compute f_k based on the least-squares principle, or
f = (E^T E)^{-1} E^T r
(7.21)
where r is the observed reflectance vector [r_1, r_2, …, r_n] and E is an n × k matrix (k being the number of endmember spectra). This equation has two constraints:

• Abundance sum-to-one, or Σ f_j = 1.
• Abundance nonnegativity, or f_j ≥ 0.

Implementation of the inversion with various constraints shows that fully constrained least squares produces the most accurate estimate, more accurate than nonnegativity-constrained least squares, though both are close in the estimated fractions (Chang et al., 2004). The constraints are essential if the target size is smaller than the ground sampling distance. The validity of the inversion is subject to the following three conditions being satisfied:

• Endmembers should be selected independently of one another.
• The number of endmembers should not exceed the number of spectral bands used.
• The selected spectral bands should be as decorrelated as possible (Lu and Weng, 2004). If these bands are highly correlated with each other, it may be necessary to transform them first using PCA, as described in Sec. 6.6.1.
Another common strategy for dealing with the correlation problem is the minimum noise fraction (MNF) transformation proposed by Green et al. (1988). It aims to diagonalize the noise covariance matrix in an effort to minimize the effect of band-specific noise.
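A minimal sketch of the inversion in Eq. (7.21) follows, assuming the endmember spectra are supplied as the columns of a matrix; the sum-to-one constraint is imposed softly by appending a heavily weighted row of ones, a common least-squares device, while the nonnegativity constraint is omitted here for brevity.

```python
import numpy as np

def unmix_pixel(r, E, sum_to_one=True, weight=1e3):
    """Estimate endmember fractions f for one pixel spectrum r, Eq. (7.21).

    r : observed reflectance vector, shape (n_bands,)
    E : endmember matrix, shape (n_bands, n_endmembers), one pure spectrum per column
    """
    A, b = E, r
    if sum_to_one:
        # Append a heavily weighted equation sum(f) = 1 to the system
        A = np.vstack([E, weight * np.ones(E.shape[1])])
        b = np.append(r, weight * 1.0)
    fractions, *_ = np.linalg.lstsq(A, b, rcond=None)
    return fractions

# Hypothetical 4-band example with three endmembers (soil, vegetation, water)
E = np.array([[0.30, 0.05, 0.02],
              [0.35, 0.08, 0.03],
              [0.40, 0.45, 0.02],
              [0.45, 0.30, 0.01]])
true_f = np.array([0.6, 0.3, 0.1])
r = E @ true_f + 0.005 * np.random.default_rng(1).standard_normal(4)
print(unmix_pixel(r, E))     # fractions should come out close to [0.6, 0.3, 0.1]
```

An operational implementation would use a fully constrained solver so that the abundance nonnegativity condition f_j ≥ 0 is also honored, as discussed above.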
7.8.2 Factors Affecting Performance
The accuracy of the inversion depends on the implicit assumption that the constituent components in a mixed pixel are known, together with their pure spectra. Therefore, the fewer the endmembers involved, the more accurate the inversion. Practically, as dictated by the second condition above, the number of endmembers that can be estimated using spectral mixing analysis is limited, as multispectral satellite images have no more than about 10 spectral bands in most cases. So far this number has been restricted to three or four in most analyses (Small, 2001; Lee and Lathrop, 2006). Nevertheless, such a small quantity proves inadequate in a complex environment such as urban areas, where the number has been increased (Van der Meer, 1997) to five endmembers: shade, green vegetation, impervious surface, dry soil, and dark soil (Lu and Weng, 2004). Shade actually does not have any reflectance and should not be regarded as an endmember. Even so, impervious surfaces of low, medium, or high albedo in urban areas are frequently confused with bare soils. In this case accurate results of spectral unmixing require multiple endmembers, each beginning with a two-endmember candidate model. This model is evaluated in terms of three criteria of fraction values: the RMS (root mean square) error, a residual threshold, and the finally produced fraction image that has the smallest error (Roberts et al., 1998). Multiple endmember spectral unmixing has performance superior to the standard mixing model.

In addition, the accuracy of the inversion depends largely on the quality and purity of the selected endmembers. The result from Eq. (7.21) is a set of endmember fraction estimates for each pixel; they provide a fraction image for each endmember. Derivation of the fraction images is subject largely to the proper selection of the endmember components, a critical step in the success of spectral unmixing analysis. Endmembers can be selected via a number of methods: they may be taken from spectral libraries, from in situ reflectance measurements, or from the image itself; derived from high-order PCA eigenvectors; or based on spectrally pure pixels or manual selection. In order to ensure high inversion accuracy, it may be necessary to carry out the selection iteratively. For instance, the initially selected endmembers may be refined after evaluating the fraction images. No matter which selection method is adopted, the selected endmembers must have pure spectra. This task may not be straightforward in a highly heterogeneous environment where many diverse land covers coexist in close spatial proximity to each other, such as residential, commercial, transportation, vegetation, and shade in an urban setting. Besides, their reflectances may interfere with each other. It is equally difficult to locate homogeneous areas of any indicator
group with any certainty in a mixed heterogeneous forest (Brown et al., 1998). Neither is it easy to locate small groups of pixels where the percentage cover of each indicator group can be determined with confidence. This difficulty may be overcome by stratifying the image first or by using multiple endmembers. The problem, however, disappears if subpixel classification is implemented using neural networks, which do not require pure spectra. Neural network–based classifiers can accommodate a wide range of training data to capture the full range of spectral variability within a land cover.
7.8.3 Implementation Environments
Subpixel classification based on spectral unmixing analysis may be implemented in two environments, ERDAS (Earth Resources Data Analysis System) Imagine and ENVI (Environment for Visualizing Images).
ERDAS Imagine Subpixel Classifier

This classifier contains the Applied Analysis Spectral Analytical Process (AASAP) module for subpixel image analysis. This module considers a pixel to contain a portion (f) of a material of interest (MOI), together with (1 − f) background materials (Brown et al., 1998). Determination of this fraction is via subtraction of fractions of candidate background spectra until the resultant residual spectrum most closely resembles that of the MOI. In this environment subpixel image classification must be preceded by preprocessing, radiometric correction, and signature derivation, undertaken in this order. The function of preprocessing is to characterize candidate backgrounds to be compared to image pixels in the scene. Radiometric correction includes correction for atmospheric radiation and environmental conditions during data acquisition, and normalization of all spectral bands obtained on different dates or over different geographic areas. In this way scene-to-scene radiance variation caused by the atmosphere is compensated for and the background of each pixel is eliminated. Such correction makes detection of materials at the subpixel level more accurate and less time consuming. In signature derivation, common subpixel components are identified and the signature defined. Such a signature may include the reflectance spectrum and extra information needed for subpixel classification. The developed signature is cleaner and transferable from scene to scene. Removal of extraneous material contributions improves subpixel classification even against complex backgrounds (Applied Analysis Inc., 2003).

In addition, there are two optional steps: data quality assurance and signature combination. The first is designed to identify duplicate-line artifacts in older satellite imagery; any training pixels falling inside these artifacts are automatically removed during signature derivation. The signature combiner merges existing signature files and environmental correction parameters to form the input into the final classification. This is a useful feature for those MOIs whose spectral properties vary with seasonality. Multiple signatures that are conducive to more reliable
discrimination may be combined in this step. However, this implementation environment is limited in that it can map only one component cover at a time instead of the k components that can be mapped using the standard linear spectral unmixing analysis.

The ERDAS Imagine subpixel classifier outputs an overlay image of the proportion of the MOI within pixels, quantified at five levels. This classifier is advantageous over linear mixture modeling in that it is not restricted to a limited set of endmembers controlled by the number of independent spectral bands (Huguenin et al., 1997). Classification of multiple spectrally dissimilar materials simply means repetition of the same process a number of times, once for each material (Flanagan and Civco, 2001). The user can develop highly specialized and transferable signatures of discrete materials in Imagine. However, this implementation has the drawback of identifying the proportion at a minimum increment of 20 percent; any proportion smaller than 20 percent cannot be resolved by this method. In addition, it is computationally demanding and very slow for large image files.
ENVI Implementation

The ENVI implementation of subpixel classification involves use of the minimum noise fraction (MNF) transformation in two steps: decorrelation of the noise in the input data and implementation of a standard PCA of the noise-whitened data. There are two options in performing the unmixing, unconstrained or partially constrained (ITT, 2007):

• In the first option, it is possible to have negative abundance values, and all abundance values do not have to sum to unity.
• The second option is the variable-weight, unit-sum constraint. It permits proper unmixing of MNF-transformed data with zero-mean bands. This unit-sum constraint, with weights defined by the user, is added to the unmixing inversion process.

During unmixing, the user can select endmember spectra from diverse sources, such as spectral libraries, spectral plots, statistics files, and regions of interest. These spectra are automatically resampled to match the wavelengths of the multiband image being unmixed. The output from ENVI spectral unmixing analysis includes a series of grayscale images, one for every endmember, in addition to an RMS error image. Brighter pixels represent higher abundances in the grayscale images and larger errors in the RMS error image. Normally, abundances in the images have values ranging from 0 to 1. Abnormal abundance values outside this range (i.e., values <0 or >1) indicate incorrect or missing endmembers. They can be located by viewing the RMS error image.
7.8.4 Results Validation
Subpixel image classification results may be validated either spatially or nonspatially. Spatial, or location-specific, validation is challenging in that pixels cannot be pinpointed precisely on the ground owing to subpixel residuals being perfectly legitimate in image geometric rectification. Therefore, all assessments of subpixel analysis have to be limited to the nonspatial overall accuracy. This can be accomplished using three methods (Hung and Ridd, 2002):

• The first method is per-pixel based, the same as that for assessing the conventional per-pixel classification, to be discussed in Chap. 12. The only extra processing is hardening of the result from the subpixel to the pixel level first.
• The second method is to use remote sensing data of a much finer spatial resolution that have already been classified using the per-pixel method. The obtained results are then degraded to simulate a coarse-resolution image to be classified at the subpixel level. The correlation coefficient of the pixel-level results with their subpixel-level counterparts is able to shed light on classification accuracy.
• The last method relies on comparison with real-world statistics that may be derived from ground surveys and interpretation of aerial photographs. This method is also the most expensive of the three. However, cost can be reduced by limiting validation to a smaller subarea.

Numerous validations have confirmed that subpixel image classification can achieve higher accuracy than per-pixel classifiers (Ichoku and Karnieli, 1996). More accurate mapping results were obtained for rural land cover features, such as trees and hedgerows, based on spatial dependence (Thornton et al., 2006). Spectral mixing analysis significantly improved classification accuracy over the maximum likelihood classifier in mapping urban landscapes (Lu and Weng, 2004). In a two-endmember subpixel analysis, it achieved accuracies 18 and 6 percent higher, respectively, than the most accurate results from per-pixel classifiers (Huguenin et al., 1997). Results generated from the linear unmixing model were correlated with the actual class proportions at a coefficient >0.7 (Foody and Cox, 1994); these estimates are more accurate than those obtained from a conventional hard classifier. The estimates of impervious surface deviated by only 0.5 to 7.6 percent from the reference data (Lee and Lathrop, 2006). The discrepancy for grass and vegetation is higher, probably due to the shadow effect of trees. The fraction images are effective for characterizing urban landscape patterns and for classifying urban land covers. In an urban setting, inversion of a three-component spectral unmixing model produced reasonably accurate
estimates of vegetation fraction over a wide range of abundances (Small, 2001). These physically based quantitative estimates of vegetation abundance and distribution are more informative and useful than NDVI in urban areas. Nevertheless, spectral mixture analysis is less accurate than neural networks and fuzzy c-means classification (Atkinson et al., 1997).

Spectral unmixing analysis is limited in that it may be lengthy to generate high-quality fraction images, and even technically challenging if some of the selected endmembers are potentially correlated among themselves, such as barren land and impervious surfaces. It may have limited applicability to the mapping of complex urban surface materials due to potentially nonlinear spectral mixing of diverse covers, especially where tree cover is a significant factor (Lee and Lathrop, 2006). Another limitation is that spectral unmixing can yield only the quantity of land covers within a pixel; it is unable to show the exact location of these fractional land covers within the pixel. These problems can be avoided by using satellite imagery of a very fine spatial resolution.
7.9 Postclassification Filtering

Postclassification filtering is a process of thematic generalization during which the identity of a minor feature is amalgamated into that of one of its dominant surrounding covers. Postclassification filtering is usually carried out in the spatial domain, and serves a number of purposes, such as to fine-tune the classification results to make them more reasonable, and to improve their aesthetic appearance and communication effectiveness. The first purpose is achieved by eliminating spatially isolated pixels or small groups of pixels falling below a threshold. This is essentially a process of thematic generalization. For instance, a passenger ship in a harbor may be correctly classified as built-up area in the classification results. Similarly, a farmstead in a predominantly rural area could have been correctly mapped as urban residential. Should the mapped features be retained in the final results in both cases? The answer to this question depends on the purpose of producing the classification in the first place, the spatial resolution of the image being used, and the scale at which the final map is to be presented, all of which are related to the concept of the minimum mapping unit, which varies with the scale of the mapping (refer to Sec. 13.6.1 for more information). From the aesthetic perspective the produced land cover map has to be thematically generalized to enhance its readability.

Regardless of the purpose of postclassification filtering, it differs from the preclassification spatial enhancement covered in the preceding chapter in that the nature of the pixel values has changed. Pixel values in the raw data represent the amount of spectral reflectance from the target on the
ground. These values, expressed as integers, are physically meaningful in that a larger value is indicative of more radiative energy from the target. However, the classified pixel values are artificial codes denoting different land covers. They do not have any physical meaning attached to them. Therefore, such statistical parameters as the mean and median are not applicable to them; the only meaningful statistical parameter is the majority.

Postclassification filtering may be implemented as majority filtering, significance-based filtering, or clumping plus sieving (elimination), all carried out in the spatial domain. In majority filtering, minority pixels within the operating window are integrated into their dominant surrounding covers. The identity of the pixel in question (i.e., the central pixel in the window) is replaced by that of the most dominant pixels inside the window (Fig. 7.18). Since the central pixel has a value of 3 (Fig. 7.18a) and the majority of pixels in the window have a value of 4, its value is changed to 4 in the output image (Fig. 7.18b). Majority filtering is relatively easy to implement, though not very effective in that the filtered image can still be spotty because each pixel is assessed in isolation (Fig. 7.19a). It is effective at removing stray pixels, but not small clusters of pixels. The degree of thematic generalization is affected by the window size: the same cover that is a majority in a small window may not be a majority in a larger window. As shown in Fig. 7.20, a totally different outcome emerges as the window size changes from 3 × 3 to 5 × 5. With a larger window, the neighborhood of influence is expanded; the filtered output is subject to the influence of more neighboring pixels, and the degree of generalization is higher. Besides, more border pixels cannot be filtered properly owing to the lack of an adequate neighborhood. Occasionally, majority filtering may fail to remove spatially adjoining pixels within the working window because they themselves can form the majority.

Significance-based filtering is identical to majority filtering except that the significance of a pixel, instead of its prevalence, is taken into consideration during filtering. For instance, if the minority pixels are considered significant (e.g., an oasis in a desert), they are preserved in the output no matter how subordinate they are in quantity; they receive the same identity in the filtered output image as in the input image. Only those pixels whose identity is considered insignificant are changed.

    (a) Input before filtering    (b) Output after filtering
            2 4 8                         2 4 8
            1 3 4                         1 4 4
            8 4 2                         8 4 2
FIGURE 7.18 In this 3 × 3 window, the central pixel 3 will be replaced by 4 (the majority inside the window) after majority filtering.
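A minimal sketch of majority filtering follows, assuming the classified map is a 2-D integer array; the default 3 × 3 window and the decision to leave border pixels unchanged are illustrative choices.

```python
import numpy as np
from collections import Counter

def majority_filter(classified, window=3):
    """Replace each pixel's class with the majority class inside the moving window.

    classified : 2-D integer array of class codes
    window     : odd window size (e.g., 3 for a 3 x 3 neighborhood)
    Border pixels without a full neighborhood are left unchanged here.
    """
    half = window // 2
    out = classified.copy()
    rows, cols = classified.shape
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            block = classified[r - half:r + half + 1, c - half:c + half + 1]
            # most_common(1) returns [(class_code, count)] for the dominant class
            out[r, c] = Counter(block.ravel().tolist()).most_common(1)[0][0]
    return out

# The 3 x 3 example of Fig. 7.18: the central 3 becomes 4
window_example = np.array([[2, 4, 8],
                           [1, 3, 4],
                           [8, 4, 2]])
print(majority_filter(window_example))
```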
FIGURE 7.19 Comparison of the effect of two filtering methods on classified results. (a) Majority filtering within a window of 5 × 5; (b) clumping with eight connections followed by elimination using a threshold of five pixels. See also color insert.
In comparison with majority filtering, clumping followed by sieving or elimination is more complex in that the spatial context is broadened to include the entire input image. As a preliminary step of filtering, clumping is the process of identifying spatially contiguous pixels in a thematic layer. Whether spatially adjacent pixels can be considered to form a clump depends upon the definition of connectivity (refer to Sec. 6.3.1).
FIGURE 7.20 Impact of the operating window on the final result in majority filtering. (a) Input image; (b) filtered with a 3 × 3 window; (c) filtered with a 5 × 5 window.
FIGURE 7.21 Impact of the sieving threshold on the final result in elimination. (a) Input image; (b) output based on a sieving threshold of 3 pixels; (c) output based on a sieving threshold of 5 pixels. In (c), all four pixels with a value of 4 receive a new identity of 6 instead of 2 because the former is more dominant.
If the eight-connection definition is adopted, all neighboring pixels of the same identity are considered a clump. After the image has been partitioned into "clumps" it is filtered, during which minor pixels forming small clumps are "sieved" or "eliminated." Of these two algorithms, "elimination" is easier to achieve, as it simply removes those isolated pixels that have an identity different from their adjoining clumps. However, this removal is problematic in that gaps are created in the final output. This problem is avoided by using the "sieving" function. During sieving, the identity of a small clump that falls below the specified threshold size is replaced with that of its larger neighboring clump; the pixels to be sieved receive a new identity, the same as that of the larger clump (Fig. 7.19b).

The degree of thematic generalization is affected by the threshold specified for "elimination" or "sieving." Its exact value should be based on the spatial resolution of the image being filtered and the final scale at which the results are presented. For instance, there are four clumps in Fig. 7.21a. If a clump threshold of three is used, then the smallest clump, made up of a single 4, will receive a new identity of 2 in the output image (Fig. 7.21b). If the clump threshold rises to five, then the four pixels with a value of 4 will receive a new identity of 6 instead of 2 (Fig. 7.21c). In comparison with majority filtering, "clumping" followed by "sieving" is more effective at removing small clusters of pixels (Fig. 7.19b). The border problem associated with majority filtering disappears in clumping and sieving, even though they are more complex and time consuming to undertake. A minimal code sketch of this clump-and-sieve procedure is given below.
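The sketch assumes SciPy is available for the connected-component labeling; the eight-connection structuring element and the neighbor-majority fill rule are illustrative simplifications of the clump and sieve tools found in commercial packages.

```python
import numpy as np
from scipy import ndimage

def clump_and_sieve(classified, min_size=5):
    """Clump a classified map with 8-connectivity, then sieve small clumps.

    Pixels belonging to clumps smaller than `min_size` are re-assigned the
    majority class of their adjacent, retained neighbors so that no gaps are
    left in the output.
    """
    eight = np.ones((3, 3), dtype=int)        # 8-connection structuring element
    to_fill = np.zeros(classified.shape, dtype=bool)

    # Clumping: label contiguous pixels of each class and flag small clumps
    for value in np.unique(classified):
        labels, n = ndimage.label(classified == value, structure=eight)
        sizes = ndimage.sum(np.ones_like(labels), labels, index=range(1, n + 1))
        for clump_id, size in enumerate(sizes, start=1):
            if size < min_size:
                to_fill |= labels == clump_id

    # Sieving: repeatedly hand each flagged pixel the majority class of its
    # unflagged neighbors until every flagged pixel has been re-assigned
    out = classified.copy()
    while to_fill.any():
        progressed = False
        for r, c in zip(*np.nonzero(to_fill)):
            r0, r1 = max(r - 1, 0), min(r + 2, out.shape[0])
            c0, c1 = max(c - 1, 0), min(c + 2, out.shape[1])
            keep = ~to_fill[r0:r1, c0:c1]
            if keep.any():
                vals, counts = np.unique(out[r0:r1, c0:c1][keep], return_counts=True)
                out[r, c] = vals[counts.argmax()]
                to_fill[r, c] = False
                progressed = True
        if not progressed:    # safeguard: nothing left with an unflagged neighbor
            break
    return out
```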
7.10 Presentation of Classification Results
There are two forms of presenting the classification results, numeric and graphic. In numeric presentation the statistics of all mapped covers are presented in tabular form. Included in the table are the names of the information classes that have been mapped, their number of pixels, and the area of each information class and its percentage of the image (Table 7.5).
Cover                    Number of Pixels    Area, km2    Percentage of Image, %
Forest                          11360            4.54             4.60
Pasture                         20132            8.05             8.15
Lush (L) mangroves               7955            3.18             3.22
Stunted (S) mangroves            6428            2.57             2.60
Residential                     62671           25.07            25.38
Industrial                       4747            1.90             1.92
Murky water                     27933           11.17            11.31
Clear water                     12670            5.07             5.13
Bare ground                      4116            1.65             1.67
Shallow water                   65825           26.33            26.65
Cloud                           21142            8.46             8.56
Shadow                           1996            0.80             0.81
Sum                            246975           98.79           100.00
Source: Gao, 1998.
TABLE 7.5 Tabular Presentation of Classified Results Shown in Fig. 7.22
This form of presentation is quantitative. It can reveal the most and least dominant covers very directly. The graphic mode of presentation usually takes the form of a color-coded thematic map showing the spatial distribution of each mapped cover, in addition to its dominance over the study area. Compared with the numeric method, this form is much more effective in visualizing the location of each information class with the assistance of a legend. The effectiveness of the map in communication is subject to the design of a proper color scheme. In order to achieve the intended effectiveness, care must be taken in designing an appropriate legend by allocating similar colors to covers of a similar nature on the ground. Care is also needed in choosing the color for a given land cover feature so as to maximize its visibility and distinctness. This type of thematic map is usually embellished with essential cartographic elements, such as a scale bar and orientation (Fig. 7.22). Annotative information, such as the method of image classification and the original date of classification, may also be inserted into the map or its caption.
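A minimal sketch of how such a table can be compiled from a classified raster is given below; the 20 m pixel size (consistent with SPOT multispectral data) and the class codes and names are illustrative assumptions.

```python
import numpy as np

def tabulate(classified, class_names, pixel_size_m=20.0):
    """Tabulate pixel count, area (km2), and percentage for each mapped cover."""
    pixel_area_km2 = (pixel_size_m ** 2) / 1.0e6
    total = classified.size
    print(f"{'Cover':<20}{'Pixels':>10}{'Area, km2':>12}{'% of image':>12}")
    for code, name in class_names.items():
        count = int(np.sum(classified == code))
        print(f"{name:<20}{count:>10}{count * pixel_area_km2:>12.2f}"
              f"{100.0 * count / total:>12.2f}")

# Illustrative class codes; a real map would carry many more covers.
names = {1: "Forest", 2: "Pasture", 3: "Residential"}
demo = np.random.default_rng(0).integers(1, 4, size=(500, 500))
tabulate(demo, names)
```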
FIGURE 7.22 An example of graphic embellishment of classified results. Essential components are a legend (forest, pasture, L mangroves, S mangroves, residential, industrial, bare ground, murky water, clear water, shallow water, cloud, and shadow), a scale bar (0–3 km), and the orientation. The statistics of all mapped covers are presented in Table 7.5. (Source: Gao, 1998.) See also color insert.
References
Anderson, J. R., E. E. Hardy, J. T. Roach, and R. E. Witmer. 1976. A Land Use and Land Cover Classification System for Use with Remote Sensor Data. U.S. Geological Survey Professional Paper 964. Washington, DC: U.S. Government Printing Office.
Applied Analysis Inc. 2003. Imagine Subpixel Classifier White Paper. Billerica, Mass.
Atkinson, P. M., M. E. Cutler, and H. Lewis. 1997. “Mapping sub-pixel proportional land cover with AVHRR imagery.” International Journal of Remote Sensing. 18(4):917–935.
Benediktsson, J. A., P. H. Swain, and O. K. Ersoy. 1993. “Conjugate-gradient neural networks in classification of multisource and very-high-dimensional remote sensing data.” International Journal of Remote Sensing. 14(15):2883–2903.
Benz, U. C., P. Hofmann, G. Willhauck, I. Lingenfelder, and M. Heynen. 2004. “Multi-resolution, object-oriented fuzzy analysis of remote sensing data for GIS-ready information.” ISPRS Journal of Photogrammetry and Remote Sensing. 58(3–4):239–258.
Brown, L. J., C. I. Trotter, and M. R. Johnston. 1998. “Assessing the potential of subpixel classification in a mixed conifer-broadleaf forest.” Geoscience and Remote Sensing Symposium Proceedings, IGARSS’98, 6–10 July. 2:776–778.
Campbell, J. B. 2002. Introduction to Remote Sensing (3rd ed.). London: Taylor & Francis.
Cannon, R. L., J. V. Dave, J. C. Bezdek, and M. M. Trivedi. 1986. “Segmentation of a thematic mapper image using the fuzzy c-means clustering algorithm.” IEEE Transactions on Geoscience and Remote Sensing. GE-24(3):400–408.
Chang, C.-I., H. Ren, C. C. Chang, F. D’Amico, and J. O. Jensen. 2004. “Estimation of subpixel target size for remotely sensed imagery.” IEEE Transactions on Geoscience and Remote Sensing. 42(6):1642–1653.
Civanlar, R., and H. Trussel. 1986. “Constructing membership functions using statistical data.” IEEE Fuzzy Sets and Systems. 18:1–14.
Cowardin, L. M., V. Carter, F. C. Golet, and E. T. LaRoe. 1979. Classification of Wetlands and Deepwater Habitats of the United States. Washington, DC: U.S. Fish and Wildlife Service.
Flanagan, M., and D. L. Civco. 2001. “IMAGINE subpixel classifier version 8.4.” Photogrammetric Engineering and Remote Sensing. 67(1):23–28.
Florida Topographic Bureau. 1985. Florida Land Use, Cover and Forms Classification System. Thematic Mapping Section, Florida Department of Transportation, Procedure No. 550-010-001-a. Florida Topographic Bureau.
Foody, G. M. 1996. “Approaches for the production and evaluation of fuzzy land cover classifications from remotely-sensed data.” International Journal of Remote Sensing. 17(7):1317–1340.
Foody, G. M. 1999. “Image classification with a neural network: From completely-crisp to fully-fuzzy situations.” In Advances in Remote Sensing and GIS Analysis, ed. P. M. Atkinson and N. J. Tate, 17–37. Chichester, UK: John Wiley & Sons.
Foody, G. M., and D. P. Cox. 1994. “Sub-pixel land cover composition estimation using a linear mixture model and fuzzy membership functions.” International Journal of Remote Sensing. 15(3):619–631.
Gao, J. 1998. “A hybrid method toward accurate mapping of mangroves in a marginal habitat from SPOT multispectral data.” International Journal of Remote Sensing. 19(10):1887–1899.
Green, A. A., M. Berman, P. Switzer, and M. D. Craig. 1988. “A transformation for ordering multispectral data in terms of image quality with implications for noise removal.” IEEE Transactions on Geoscience and Remote Sensing. 26(1):65–74.
Harvey, K. R., and G. J. E. Hill. 2001. “Vegetation mapping of a tropical freshwater swamp in the Northern Territory, Australia: A comparison of aerial photography, Landsat TM and SPOT satellite imagery.” International Journal of Remote Sensing. 22(15):2911–2925.
Huguenin, R., M. Karaska, D. van Blaricom, and J. Jensen. 1997. “Sub-pixel classification of bald cypress and tupelo gum trees in Thematic Mapper imagery.” Photogrammetric Engineering and Remote Sensing. 63(6):717–725.
Hung, M. C., and M. K. Ridd. 2002. “A subpixel classifier for urban land-cover mapping based on a maximum-likelihood approach and expert system rules.” Photogrammetric Engineering and Remote Sensing. 68(11):1173–1180.
Ichoku, C., and A. Karnieli. 1996. “A review of mixture modeling techniques for sub-pixel land cover estimation.” Remote Sensing Reviews. 13(3–4):161–186.
ITT. 2007. IDL Reference Guide, version 6.4, http://idlastro.gsfc.nasa.gov/idl_html_help/IDL_Reference_Guide.html.
Lee, S., and R. G. Lathrop. 2006. “Subpixel analysis of Landsat ETM+ using Self-Organizing Map (SOM) neural networks for urban land cover characterization.” IEEE Transactions on Geoscience and Remote Sensing. 44(6):1642–1654.
Lillesand, T. M., R. W. Kiefer, and J. C. Chipman. 2004. Remote Sensing and Image Interpretation (5th ed.). New York: John Wiley & Sons.
Liu, W., K. C. Seto, E. Y. Wu, S. Gopal, and C. E. Woodcock. 2004. “ART-MMAP: A neural network approach to subpixel classification.” IEEE Transactions on Geoscience and Remote Sensing. 42(9):1976–1983.
Lu, D., and Q. Weng. 2004. “Spectral mixture analysis of the urban landscape in Indianapolis with Landsat ETM+ imagery.” Photogrammetric Engineering and Remote Sensing. 70(9):1053–1062.
Maselli, F., A. Rodolfi, and C. Conese. 1996. “Fuzzy classification of spatially degraded thematic mapper data for the estimation of sub-pixel components.” International Journal of Remote Sensing. 17(3):537–551.
Mather, P. M. 2004. Computer Processing of Remotely-Sensed Images: An Introduction (3rd ed.). Chichester, England: John Wiley & Sons.
Roberts, D. A., M. Gardner, R. Church, S. Ustin, G. Scheer, and R. O. Green. 1998. “Mapping chaparral in the Santa Monica mountains using multiple endmember spectral mixture models.” Remote Sensing of Environment. 65(3):267–279.
Small, C. 2001. “Estimation of urban vegetation abundance by linear spectral unmixing.” International Journal of Remote Sensing. 22(7):1305–1334.
Swain, P. H. 1978. “Fundamentals of pattern recognition in remote sensing.” In Remote Sensing: The Quantitative Approach, ed. P. H. Swain and S. M. Davis, 136–187. New York: McGraw-Hill.
Thornton, M. W., P. M. Atkinson, and D. A. Holland. 2006. “Sub-pixel mapping of rural land cover objects from fine spatial resolution satellite sensor imagery using super-resolution pixel-swapping.” International Journal of Remote Sensing. 27(3):473–491.
Van der Meer, F. 1997. “Mineral mapping and Landsat Thematic Mapper image classification using spectral unmixing.” Geocarto International. 12(3):27–40.
Wang, F. 1990a. “Fuzzy supervised classification of remotely sensed images.” IEEE Transactions on Geoscience and Remote Sensing. 28(2):194–201.
Wang, F. 1990b. “Improving remote sensing image analysis through fuzzy information representation.” Photogrammetric Engineering and Remote Sensing. 56(8):1163–1169.
Zhang, J., and G. M. Foody. 1998. “A fuzzy classification of sub-urban land cover from remotely sensed imagery.” International Journal of Remote Sensing. 19(14):2721–2738.
CHAPTER 8
Neural Network Image Analysis
In addition to the unsupervised clustering algorithms and parametric classifiers introduced in Chap. 7, several machine learning algorithms have been developed for image analysis. They include neural networks and decision trees. Neural network–based image analysis is covered in this chapter, and decision tree classification will be discussed in the following chapter. The idea behind artificial neural networks (ANNs), also called neural networks for short, was first put forward more than half a century ago by McCulloch and Pitts (1943). It did not find any applications until the 1970s. It was not until the late 1980s that ANNs were applied to the analysis of multispectral remote sensing data. This resurgence of interest in image classification is attributed mainly to the advances in computing technology and the discovery of powerful learning algorithms. Since then, vast efforts have been directed toward building neural network models and exploring their utility in classifying remotely sensed data, particularly their performance relative to conventional parametric classifiers. So far ANNs have proved to be a viable alternative for automatically mapping land covers on the global scale because of their improved accuracy and their ability to provide additional information on uncertainty (Gopal et al., 1999). In this chapter the fundamentals of ANNs, including their biological counterpart and main features, are introduced first. Next, major types of ANNs that have found applications in image classification are described and compared with each other wherever possible, including alternative network models. Network architecture (e.g., network configuration and optimization of network parameters) is the topic of the third section. The essential process of ANN-based image classification, network learning, is comprehensively introduced in Sec. 8.4. Covered in the subsequent section are implementation issues of ANN classifiers, especially those related to network training, data encoding, and standardization. Finally, the potential of this nonparametric method in classifying remote sensing data is evaluated in comparison
with the per-pixel maximum likelihood classifier in a case study (Sec. 8.7), where their relative performance is compared and critically evaluated.
8.1 Fundamentals of Neural Networks
8.1.1 Human Neurons
A biological neuron is the structural and functional unit of the nervous system of the human brain. Numbering on the order of 10¹⁰, neurons each encompass the nerve cell body, a branching input called the dendrites, and a branching output called the axon that splits into thousands of synapses (Fig. 8.1a). A synapse connects the axon of one neuron to the dendrites of another. All neurons are highly interconnected with one another. As a specialized cell, each neuron fires and propagates spikes of electrochemical signals to other connected neurons via the axon. The strength of the received signal depends on the efficiency of the synapses. A neuron also collects signals from other neurons and converts them into electrical effects that either inhibit or excite activity in the connected neurons, depending on whether the total signal received exceeds the firing threshold.
8.1.2 Artificial Neurons
Artificial neurons are processing nodes or units that receive inputs from a number of connected nodes. All the connections between nodes are analogous to the dendrites of a biological neuron. All the input signals fed into an artificial node are combined linearly or nonlinearly to generate an output (summation) via a transfer function (threshold) (Fig. 8.1b). The output is passed to other artificial neurons via another connection resembling the axon. Each processing unit functions as a simple pattern recognition machine at which the input data are evaluated against the synaptic strength and an output is produced. Computationally, an artificial neuron may be implemented as a weighted sum of all the input signals coming into it, plus a numerical value or bias (Fig. 8.2).
FIGURE 8.1 Comparison of biological and artificial neurons. (a) Structure of a biological neuron; (b) a rendition of an artificial neuron that mimics the biological neuron. (Source: Stergiou and Siganos.)
FIGURE 8.2 Data processing at a typical artificial neuron in the network.
A weight w_ij is associated with every synapse-like link, or connection, between nodes i and j. Indicative of the strength of the synaptic connection, this weight is usually expressed as a real value that modulates the output of a processing unit via the transfer or activation function. The input signal contributes more toward the output if it has a larger weight. Generally, the smaller the weights are, the faster the network processes data. The actual output of the processing node is converted into a value that either fires or inhibits activity in the next node or nodes connected to it. This activation, in turn, becomes the input to other units in the layer in a multilayer network, or to the terminal nodes if there are no further hidden layers. ANNs, commonly related to artificial intelligence, machine learning, and statistics, are a learning paradigm rooted in the human brain. Initially developed to mimic the cognitive capability of the human brain, these networks are made up of independent units that process information through interactions between individual nodes that are highly interconnected. Hence they are also called connectionist models, or parallel distributed processing models, since all interconnected nodes work in parallel. This machine-based information processing system comprises a huge number of processing nodes, or artificial neurons, equivalent to biological neurons in the human brain. Neural network models are algorithms for cognitive tasks (e.g., image classification) that are based on research into how the brain processes information.
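As a minimal illustration of this computation, the sketch below forms the weighted sum of the inputs plus a bias and passes it through a sigmoid transfer function; the input values and weights are arbitrary.

```python
import numpy as np

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid transfer function."""
    net = np.dot(weights, inputs) + bias          # summation
    return 1.0 / (1.0 + np.exp(-net))             # activation (sigmoid)

# Four input signals (e.g., pixel values in four spectral bands, scaled to 0-1).
x = np.array([0.2, 0.7, 0.1, 0.5])
w = np.array([0.4, -0.6, 0.3, 0.8])
print(neuron_output(x, w, bias=0.1))
```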
8.2 Neural Network Architecture
The three parameters essential to the architecture of an ANN model are topology, learning paradigm, and learning algorithm. This section focuses on network topology, while the other two parameters will be covered in the next section. Network topology refers to the manner in which all the nodes in a neural network are organized and
connected, and how data and error information travel from one layer of nodes to the next. Fundamentally, network topology falls into two groups, feed-forward and feedback. The most common feed-forward networks are exemplified by multilayer perceptrons and radial basis functions. Their differences lie in the way in which inputs from preceding layers are combined in the hidden layer. Different combinations of these three network parameters result in a wide variety of neural network models in image classification (Table 8.1).
Model | Author(s) | Strength
Backpropagation | Heermann and Khazenie (1992) | Multichannel data acceptable
Self-organizing topological map (SOP) | Kohonen (1984); Schaale and Furrer (1995) | More accurate results; easy clustering through lateral links; fast training
Adaptive resonance theory (ART) and fuzzy ARTMAP | Carpenter et al. (1997) | Minimized predictive error; stable and “plastic,” scalable
Parallel consensual network | Benediktsson et al. (1997) | Improved accuracy
Binary diamond network | Salu and Tilton (1993) | Easy to configure and use; faster training speed
Structured neural network | Serpico and Roli (1995) | Opacity problem overcome; role of each neuron or multisource data known
Dynamic learning network | Tzeng et al. (1994) | Faster convergence; nonlinear decision boundaries; very accurate results
Hopfield network | Tatem et al. (2002) | Subpixel mapping possible; ease of incorporating ancillary data
Conjugate-gradient BP network | Benediktsson et al. (1993) | Faster convergence; excellent for classifying multisource data
Hierarchical network | Miller et al. (1995) | Fast training through weight updating
Blocked backpropagation network | Liu and Xiao (1991) | Fast convergence during training; avoidance of local minima
TABLE 8.1 Neural Network Models Commonly Used for Image Classification and Their Strengths
Each model has its own unique strengths and limitations. Of these models, the most popular ones in classification of remotely sensed data are feed-forward, backpropagation, adaptive resonance theory (ART), SOP, and structured neural networks.
8.2.1 Feed-Forward Model
In the feed-forward model, all processing units, or nodes, are organized into three general layers: input, hidden, and output (Fig. 8.3), as is common with all other models. Every node in a given layer is fully connected to every node in the layer immediately above or below it. However, no connections exist among the nodes located in the same layer. Nodes in the first layer, or input layer, perform two functions: receiving satellite data from the outside world and feeding them into the network (Atkinson and Tatnall, 1997). Sitting at the very top of the network, these input nodes do not receive any input from other nodes. Their number varies widely. Also varying in number, the hidden layer lies next to the input layer. All nodes in this layer perform the same two neuron-like functions: collecting the activation of nodes in the previous layer and setting the output activation. These processing units receive inputs either from nodes in the input layer or from the outputs of a previous hidden layer. Formation of an internal representation of the input data within the network is accomplished through interactions between individual nodes. The inputs or outputs are processed in parallel. The processed information is fed to the terminal nodes in the output layer. These units do not have any links leading away from them. The final outcome of the neural network classification from these nodes may correspond to the land covers to be mapped from the input data.
FIGURE 8.3 The structure of and data flow in a feed-forward neural network. Data flow from the input layer (satellite and ancillary data) to the hidden layer(s) and eventually to the output layer (land cover categories) in one direction. All the nodes in the same layer are fully connected to those in the layer immediately below or above. Each link is assigned a weight.
As a common type of neural network, the layered feed-forward network can be envisaged as a parallel distributed processing system. Once fed into the neural network, data flow through the input units to the output layer via the hidden layer(s) in one direction (Fig. 8.3). There is no feedback mechanism from the output to the input layers. Neither are there any direct connections between individual nodes in the input and the output layers. This kind of network is capable of dealing with only linear relationships through the linear learning discriminants. A set of weights and biases associated with each node must be known prior to a classification, usually determined during network training. The topologies of connecting the input, hidden, and output nodes in the standard feed-forward networks may be modified to form recurrent networks (Ho, 2004) in which information about previous inputs from the hidden units is fed back into and mixed with a set of additional inputs called context nodes, or a subset of the nodes designated as the input processors, through feedback connections for hidden or output nodes (Fig. 8.4a). The former are called partially recurrent networks, and the latter are known as fully recurrent networks (Fig. 8.4b). Partially recurrent networks are much simpler in structure than fully recurrent networks, which require two-way interactions between the hidden and output nodes. Data are fed into both the hidden units (if any) and the output units via the input nodes. Data continue to flow to all connected nodes recursively until the activation of the nodes reaches stability. Afterward the activations of the hidden and output units are recalculated accordingly until the neural network stabilizes.
FIGURE 8.4 Partially recurrent (a) and fully recurrent (b) neural networks. Arrows indicate the direction of data flow. (Source: Ho, 2004.)
Then the output values from the output nodes are regarded as the final outcome of the processing. Compared with standard feed-forward networks, recurrent networks are complex and dynamic, even though they may not be stable. Network stability can be enhanced by imposing a constraint on the connection weights.
8.2.2 Backpropagation Networks
A backpropagation neural network is characterized by a feed-forward topology, supervised learning, and the backpropagation learning algorithm. Data pass forward from the input layer to the output layer via the hidden layer(s), just as with the feed-forward network. In supervised learning, there is a human teacher who knows the desired outcome for any given input. After an input is presented to the input nodes, it propagates forward in the network. An output is initially produced from this input based on randomly assigned weights. This calculated outcome is then compared with the desired output. Their discrepancy is the error signal that is subsequently propagated backward from the output nodes to the input nodes through the network (Fig. 8.5). This backpropagation of errors is implemented iteratively. In each iteration, the synaptic strengths or weights between nodes are adjusted to ensure the output resembles the desired outcome as closely as possible. In its simplest form, a backpropagation network comprises a single hidden layer of nodes, one input layer, and one output layer. These simple multilayer perceptrons can model continuous functions to any degree of accuracy if there is a sufficient number of processing nodes in the hidden layer.
FIGURE 8.5 The structure of and data flow in a feed-forward, backpropagation neural network. Data flow from the input layer to the hidden layer, and eventually to the output layer. The produced output is compared with the desired one to calculate the error signal, which is recursively propagated backward to the input nodes to adjust the synapse strength, or weight, in such a way that the computed output increasingly resembles the desired output. (Source: modified from Ho, 2004.)
These nodes must have nonlinear activation functions in order to serve any useful purpose. This simple model can be expanded by inserting a new layer of hidden nodes, thus turning the linear neural network into a nonlinear one. Nonlinear neural network models are distinctly advantageous over the traditional statistical classifiers as they are capable of performing multivariate logistic regression. Use of a backpropagation network to do logistic regression enables multiple outputs to be modeled simultaneously. In fact, a single backpropagation network model is able to capture the confounding effects from multiple parameters in the input. This backpropagation neural network model has several drawbacks in classification of multispectral remote sensing data, one of which is the need to specify and fine-tune too many parameters before the network can function optimally. Consequently, a huge amount of time is required to configure the network properly during network training. In addition, it is quite common for the standard backpropagation classifier to encounter slow convergence and local minima in classifying remotely sensed data. These problems may be overcome with the blocked backpropagation network model in which the hidden layer is organized into blocks of processors (Liu and Xiao, 1991). In this modified model the hidden nodes in each block are connected to a single output node, but not fully connected to the output layer (Fig. 8.6). This modified architecture of the network converges faster and more smoothly than the standard backpropagation classifier, and achieves more accurate classification results, as well.
FIGURE 8.6 Structure of the blocked backpropagation neural network. (Source: modified from Liu and Xiao, 1991.)
8.2.3 Self-Organizing Topological Map
Devised by Kohonen (1984), this unsupervised feed-forward network model is quite distinct from other networks in that the output nodes are configured into a topological or spatial map through a self-organizing process. The SOP takes into account the spatial arrangement and geometry of output nodes. Neighborhood relations among these output nodes are preserved by imposing a topological structure on them (Fig. 8.7). Adjacent nodes interact with one another differently than with those that are farther away. The varying level of interaction is captured by a varying weight that decays with distance. After the winning output node is declared via competitive learning, a kind of unsupervised learning, its weight is adjusted. Moreover, the weights of those nodes in close proximity to the winning node, or within the specified neighborhood, are also updated in accordance with their distance to the winning node (Ho, 2004). The weights of those nodes closer to it are updated appreciably, while the weights of more distant output nodes are not altered noticeably. The simplest self-organizing topological or feature map network can be modeled in just two layers, one input layer and one competitive layer, the two being fully connected to each other without any hidden layers in between. When an input pattern is presented to the input nodes, the output nodes compete with one another to be the winning node. This winning output node typically has an incoming connection weight closest to the input pattern as measured by euclidean distance. The output deemed closest to the input pattern becomes the winner, and hence its connection weights are updated by a factor determined by the learning rate. The updating of weights is implemented recursively. Initially, the output nodes are assigned random weights. After each iteration the weights are adjusted in such a way that the output units gradually match the input pattern. As training progresses, the learning rate decreases while the size of the neighborhood centered at the winning node shrinks. The initially large number of output nodes adjoining the winning node diminishes, and fewer and fewer weights need to be updated.
FIGURE 8.7 Structure of Kohonen self-organizing feature maps in which the spatial relations among the output nodes are retained. (Source: Ho, 2004.)
At the end of training only the weight of the winning unit is adjusted. In self-organizing feature maps, lateral network connections are introduced to ease the clustering of the resulting topological feature space. In conjunction with an activity-induced clustering scheme, SOPs are valuable in classifying land surfaces (Schaale and Furrer, 1995).
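A minimal sketch of one such competitive-learning update is given below; the grid size, learning rate, and Gaussian neighborhood radius are illustrative choices rather than Kohonen's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(1)
grid_rows, grid_cols, n_bands = 10, 10, 4
som = rng.random((grid_rows, grid_cols, n_bands))     # random initial weights

def som_step(som, pixel, learning_rate=0.5, radius=2.0):
    """One competitive-learning update for a single input pixel vector."""
    # Winning node: smallest Euclidean distance between its weights and the input.
    dist = np.linalg.norm(som - pixel, axis=2)
    win = np.unravel_index(np.argmin(dist), dist.shape)
    # Distance of every node from the winner on the output grid.
    rows, cols = np.indices(dist.shape)
    grid_dist = np.hypot(rows - win[0], cols - win[1])
    # Gaussian neighborhood: nearby nodes are updated strongly, distant ones barely.
    influence = np.exp(-(grid_dist ** 2) / (2.0 * radius ** 2))
    som += learning_rate * influence[..., None] * (pixel - som)
    return som

pixel = rng.random(n_bands)        # one multispectral pixel, scaled to 0-1
som = som_step(som, pixel)
```

In a full training run the learning rate and the neighborhood radius would both be decreased as iterations progress, as described above.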
8.2.4 ART
ART, introduced by Stephen Grossberg in 1976 as a theory of human cognitive information processing, has developed into a family of evolving recurrent neural network models that are defined algorithmically by differential equations. These equations are usually approximated in the implementation of ART network models. Both supervised and unsupervised learning are possible with ART networks. Unsupervised ART networks are similar to iterative clustering algorithms. Supervised learning is a pattern-matching process during which the current input is compared with a selected learned category representation. After the input pattern is presented to the ART network, it will determine a winning output mapped to a corresponding output node in the network. The expected pattern from the output ART network provides the overall output pattern. Unlike standard feed-forward neural networks in which all input data are encoded, ART models encode only attended features (e.g., those features or patterns that have been successfully matched previously) in the input. If the computed outcome is deemed close enough to the expected outcome from the input, then a state of resonance arises. This resonant state lasts long enough for the weights to be adapted, hence the term adaptive resonance theory (ART) (Carpenter et al., 1997). An output node is declared a winner in a manner similar to the Kohonen topological map. The learned representation may be refined to reflect the newly incorporated information from the current input, or a new category is created if it does not already exist in the learned representation. However, this winning node is switched off if the match between the actual input pattern and the expected connection weights is not considered sufficiently strong. The winner will then be the next closest output node. This process continues until one of the output nodes' expectations falls within the required tolerance. If no winner is found among the output nodes, then a new output node is committed with the initial expected pattern set to the current input pattern. The search process is controlled by the orienting subsystem. It ensures that the network learns about novel inputs, but still selectively remembers its previous knowledge. The choice parameter of ART governs how deeply the search proceeds before an uncommitted node is selected. Whether the match between the input and the learned representation is deemed acceptable is judged against the vigilance parameter.
It dictates the fraction of the input that must be matched with the representation in order for the resonant state to occur. This parameter could be a fixed constant or an internally controlled variable. A small vigilance threshold allows broad generalization, coarse categories, and abstract representations (Carpenter et al., 1997), or vice versa. In ART the knowledge already "learned" in a previous training cycle is refined by incorporating new input in the next training cycle. This ability to integrate learned patterns in the training sample with new ones is important in classifying remote sensing images because the training samples may not necessarily capture the possible range of variability within and among the land covers to be mapped. By maintaining stability in tracing previously learned patterns and being flexible or plastic enough to recognize new patterns at the same time, ART network models successfully resolve the dilemma of stability versus plasticity, also known as the serial learning problem in image classification. A slight variation of ART is called ARTMAP, whose architecture consists of twin, back-to-back ART networks (Fig. 8.8). One of them is used to classify the input patterns and another to encode the matching output patterns. The two are joined together by an associative learning network, called the MAP field of units, and an internal controller.
FIGURE 8.8 Architecture of an ARTMAP neural network. ya in layer 3 is a short-term memory pattern activated by the vector output from layer 2. (Source: modified from Carpenter et al., 1997.)
These nodes serve as an index between the input ART network and the output ART network. During network training, the given input data are fed to the classifying network, while the desired output is fed to its encoding counterpart. The controller creates the minimal number of hidden units needed to meet the accuracy criteria in the classifying network. The output is fed from the classifying network to the MAP field to calculate the predictive error. This error term activates a match tracking process in the classifying network. A new memory search increases the likelihood that one of the categories in the memory will bring down the predictive error. If no such category exists, a new category is created. The learning algorithm in ARTMAP is able to minimize the predictive error (e.g., learning accurately) while maximizing its ability to generalize, namely, to predict previously unseen patterns. Learning can be expedited by adapting the learning rate, or incremental learning. Most ARTMAP algorithms are limited in that they do not have a mechanism to prevent overfitting, and hence should not be used with noisy data (Williamson, 1995). The ART family of networks can be expanded through the introduction of fuzzy logic. In this way it is possible to accept real values in the input (Ho, 2004). Fuzzy ARTMAP outperformed a standard backpropagation neural network in classification of Sahelian land covers from multitemporal Advanced Very High Resolution Radiometer (AVHRR)-derived normalized difference vegetation index (NDVI) data in terms of classification accuracy and processing speed (Gopal et al., 1999).
8.2.5 Parallel Consensual Network
Proposed by Benediktsson et al. (1997), the parallel consensual neural network has an architecture based on statistical consensus theory. It does not require prior statistical information, but functions analogously to the statistical consensus theory approach. With this structure it is possible to combine certain aspects of statistical consensus theory with neural networks to improve image classification accuracy. In this structure a series of neural networks is organized into several parallel stages, each stage being a particular neural network. Each stage neural network works independently of the other stage neural networks as it does not receive any input from the previous stage neural networks. Each stage neural network contains the same number of output nodes and is trained with the same fixed number of iterations. The final outcome of the consensual network is a weighted average of the outputs from all stage neural networks. Those stage neural networks that produce the best representation of the input data receive the largest weights. The parallel consensual network differs from the standard network structure in that it is fed with different representations of the input data that have been trained with a stage neural network first. All the input data that have been transformed several times are treated as independent inputs. Tests using two sets of data showed an overall accuracy in the lower 70 to 80 percent range, slightly higher than that of the minimum distance classifier (Benediktsson et al., 1997).
8.2.6 Binary Diamond Network
This model was developed specifically for classifying remotely sensed data by Salu and Tilton (1993). It is similar to the backpropagation network in that data are fed into the input layer and then passed to other layers until they reach the output layer. Its uniqueness lies in the large number of input nodes: there is one for every possible input value of every input spectral band. For instance, there are 256 × 4 nodes in the input layer if the remote sensing data have four multispectral bands recorded at 8 bits. In effect, every possible pixel value requires one input node to represent it. It is possible for some input nodes to be empty (i.e., off, or having a value of 0) while only certain nodes are on (i.e., having a value of 1). In either case this is of no concern to the analyst, as the structure of the binary diamond network is worked out automatically during network training. Besides, there is no network configuration to worry about, and no network parameter to adjust. As network training can be accomplished in one pass, it offers much greater ease of use with a vastly improved training speed over backpropagation networks (Murnion, 1996). The image analyst only needs to specify the number of input and output nodes. Despite these advantages, the ability of the binary diamond network to classify new regions distant from the training area is inferior to that of the popular backpropagation networks.
8.2.7 Structured Neural Network
In a fully connected neural network, it is impossible to identify the specific contribution of individual neurons. This deficiency can be remedied by modifying the architecture of existing networks to form a structured neural network. This modification also overcomes the network's unpredictable behavior, commonly known as the "opacity problem" (Serpico and Roli, 1995). In this model the output of each hidden neuron is fed to just one neuron in the next layer, so neuron contribution is kept separate thanks to the tree-like architecture of subnets (Fig. 8.9). One subnet is needed for each information class. Each subnet is dedicated to a specific source of data (i.e., from different sensors) or channel. Each channel-related subnet (CRS) is represented by one input node and an equivalent neuron. The input neuron represents pixel values in a spectral band, while the equivalent neuron is reserved for the constraints imposed on these values. All sensor-related subnets (SRSs) are identical to one another in their structure and function. Each neuron in the subnet, however, processes a unique aspect of information. Output neurons at different levels of the hierarchy perform different functions: those in a dedicated subnet combine the outputs of such subnets, while those within a subnet combine the results of this processing. The output of all the tree-like networks is compared by a decision block that makes the final classification decision.
Chapter Eight SRSL
.. ..
.. ..
Sensor L
Band J
CRSJ
.. ..
Band K
. .. . CRSK
.. ..
.. ... .
.. ..
.. ..
.. .. .. .. ..
.. .. .. .. ..
SRSM
Band L
CRSL
.. ..
.. ..
Sensor M
.. .. .. .. CRSS
Band S
318
.. ..
.. ..
.. ..
.. ..
FIGURE 8.9 The architecture of a structured neural network that is made up of two components: a channel-related subnet and a sensor-related subnet. Each sensor-related subnet functions independently of other sensor-related subnets. All channel-related subnets are organized into a tree-like structure. (Source: modified from Serpico and Roli, 1995.)
In order to facilitate the interpretation of the contribution of each data source, the representation of the neuron activation function at each node is simplified through approximation by a piecewise-linear function that achieves a similar outcome. This structure also helps to avoid the trial-and-error search for the optimal network architecture while also achieving classification accuracy comparable to that of standard backpropagation classifiers.
More importantly, multisource data can be used in the classification to improve classification accuracy. This neural network is able to explain and quantify the roles played by different sensors and by their channels in the classification of multisensor data so that the results of a neural classifier may be validated (Serpico and Roli, 1995).
8.2.8 Alternative Models
Because of the difficulties with backpropagation classifiers in network configuration, network training, and comprehension of network operation, a number of alternative ANN models have been proposed and developed to classify remotely sensed data (Table 8.1). They include conjugate-gradient, dynamic learning, and hierarchical networks. Each of them is designed to overcome a particular limitation of the standard backpropagation model. Strictly speaking, these new models cannot be regarded as any new architecture since they are still built from multilayer feed-forward networks commonly called "multilayer perceptrons." The improvements occur either in the way the learning rate is updated or in the way the learning algorithm is modified (e.g., Kalman filtering).
Conjugate-Gradient Backpropagation (CGBP) Network
The lengthy training process characteristic of standard backpropagation networks, in which error propagation is based on gradient descent, can be shortened via the conjugate-gradient backpropagation network (Benediktsson et al., 1993). Conjugate-gradient optimization is slightly more complex but more efficient than standard gradient descent in minimizing a cost function, and it does not require specification of any parameters such as the gain factor of gradient descent. Extremely effective with general-purpose functions, this network optimization is achieved by not specifying search directions beforehand. Instead, the search direction is determined at each cycle of training in such a way that the new direction is conjugate to the previous gradient. Namely, after the current gradient vector is computed, it is linearly combined with the previous direction vectors to obtain a new conjugate vector for the next movement. Weights are updated only after all patterns have been presented to the network in each epoch. Because the conjugacy deteriorates after several iterations, this direction vector has to be reinitialized every certain number of iterations. This modified network is excellent at classifying multisource data, but has the drawback of overtraining. The training process is still computationally complex.
Dynamic Learning Network
This network is based on the polynomial function, a modified version of the multilayer perceptron network. It takes advantage of Kalman filtering, a recursive minimum mean square estimation procedure, in training the network. An updated weight is estimated from the
previous weight and the new input data. In this process the stochastic characteristics of the incoming input are implicitly incorporated. The Kalman gain is regarded as an adaptive network learning rate (Tzeng et al., 1994). Thus, all weights in the same layer are concatenated to form a long vector. They are updated without the need for backpropagation. Kalman filtering not only increases the convergence rate in the learning stage, but also enhances the separability of highly nonlinear boundaries. This network produced a very high classification accuracy for every class and obtained an overall accuracy of 92 percent from a training sample as small as 692 pixels (Chen et al., 1995). Thanks to the small training sample, network training is considerably shortened.
Hierarchical Network
The rationale behind this model is that land covers derived from remotely sensed data are often hierarchical in nature. Some covers occur at the first level, but others are more detailed at the second or third tier in the Anderson scheme. Through the introduction of this model, it may be possible to improve the discrimination ability of the network and to shorten training by removing "easy" classes so that the network can focus on the training of more-difficult classes (Miller et al., 1995). Proposed by Ersoy and Hong (1990), the parallel, self-organizing, hierarchical neural network consists of multiple stages of neural networks (Fig. 8.10). Unique to this structure is the error detection of the output at the end of each stage neural network. Those inputs associated with a large error are rejected. The rejected input vectors are nonlinearly transformed before they are fed to the networks at the next stage. This reduces the network's learning burden and accelerates network training, as fewer training vectors are involved at subsequent stages. Through real-time adjustment of the error detection bounds, the network is made very robust to faulty data.
FIGURE 8.10 Structure of the hierarchical neural network in which multiple networks are organized into different levels. (Source: modified from Ersoy and Hong, 1990.)
The architecture is also parallel in that all stage networks operate simultaneously without the need to wait for data from each other during training. It is self-organizing in the sense that the number of stages needed is optimized. Two stages are adequate for easy problems, but more stages are needed for difficult problems. Use of a two-level hierarchical network in mapping 20 land covers did not improve classification accuracy, but training time was reduced by 50 percent over the four-layer standard network (Kanellopoulos et al., 1992).
8.3 Network Learning
Before a neural network can perform an image classification, it first has to be taught how to recognize the patterns associated with each information class in the input samples. During this learning process, the set of weights for every connection between two nodes in the network is fine-tuned recursively so that the network-computed output matches the desired output as closely as possible. Equipped with this set of weights, the network should produce a close approximation of the desired outcome from data not shown to it before. Core to network learning are the learning paradigm, learning rate, learning algorithms, and the activation function.
8.3.1 Learning Paradigm
The neural network learning paradigm falls into two broad categories: supervised and unsupervised. In supervised learning, the output neuron is presented with the target value or the desired outcome that should be generated from the given input data. There are a few supervised learning algorithms, including the least mean squares rule or the delta rule, error-correction learning, reinforcement learning, and stochastic learning, all of which aim to minimize errors. In reinforcement learning the feedback given from the environment (e.g., everything the learner or the decision maker interacts with) is only evaluative instead of instructive (Stergiou and Siganos). It can be regarded as learning with a critic as opposed to learning with a teacher. Unsupervised learning is based on local information. Data presented to the network are self-organized to detect their collective properties via competitive learning and Hebbian learning. In competitive learning all output nodes compete with one another for the right to respond to a request. Hebbian learning minimizes the same error function as an autoassociative network with a linear hidden layer, namely the sum of squared distances between each training case and a linear subspace of the input space (with distances measured perpendicularly) (Sarle, 2002). However, the distinction between supervised and unsupervised learning is not always so clear-cut because in unsupervised learning a summarized distribution of probability can be used to make predictions.
Neural network learning can be further differentiated as online versus off-line. Online learning occurs when the learning phase and the operation phase take place simultaneously. For instance, the weights are updated immediately after each data point is presented. All the information is discarded immediately afterwards. By comparison, off-line learning, or batch learning, takes place when the two phases are carried out separately. In this way of learning, all the information about learning (e.g., learning rate and weights) is stored and can be accessed repeatedly. It is thus possible to track the progress of training. Off-line learning is commonly associated with supervised learning, but unsupervised learning is performed online (Sarle, 2002). Online learning has both advantages and disadvantages. It is advantageous in that nonstationary environments, where the best model gradually changes over time, can be better monitored. There is less chance for noise to develop into local minima. Online learning is often faster than off-line learning if the training dataset has a high degree of redundancy (Orr et al., 1999). The downside of online learning is that it is unable to take advantage of many network optimization measures that are widely practiced in off-line learning, such as multiple random initialization, computation of a minimum of the objective function to any desired precision, conjugate and second-order gradient methods, support vector machines, and bayesian methods. Thus, off-line learning is easier and more reliable than online learning. A compromise can be reached by combining online learning with off-line learning, in which the weights are updated only after a certain number of data points.
8.3.2 Learning Rate
The learning rate η refers to the speed at which network weights are tuned with respect to the pace of changes in the average error at the output, analogous to the distance traveled over the error surface after each learning cycle (Haykin, 1999). It is critical to set an appropriate learning rate so that the network can function smoothly. A small learning rate is associated with minor adjustments to the weights. However, if the learning rate is too small, it takes a long time for the network to converge (i.e., the network learns very slowly), which prolongs the training process. Conversely, too large a learning rate makes the network unstable and the weights oscillate, resulting in no learning at all. The learning rate must be kept strictly away from zero in order to allow the network to track changes over time. Sometimes it is difficult to set the optimal learning rate, as it changes dramatically during the training process. On the other hand, use of a constant learning rate to train a network is not recommended, as it results in a tedious process based on much trial and error. A compromise can be reached by adapting the learning rate, for instance, dividing the initial rate by the number of training cycles. In general, such adaptive rates tend to produce
a better optimized incremental size in the learning rate. On the other hand, gradient-based algorithms should be avoided because the gradient, which may also change abruptly, causes the network to behave erratically.
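A trivial sketch of such an adaptive schedule (the initial rate divided by the number of completed training cycles, offset by one to avoid division by zero) is shown below.

```python
def adapted_learning_rate(initial_rate, cycle):
    """Decay the learning rate with the number of completed training cycles."""
    return initial_rate / (1 + cycle)

# The rate shrinks from 0.5 toward zero but never reaches it,
# so the network can still track changes late in training.
print([round(adapted_learning_rate(0.5, c), 3) for c in range(5)])
```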
8.3.3 Learning Algorithms
There are different learning algorithms in existence. A popular supervised learning algorithm is the backpropagation delta rule, in which the desired outputs are given as part of the training vector. This forward pass produces the predicted or computed output pattern o_i that is compared with the desired output t_i. After all the input-output pairs in the training set are processed, the discrepancy between the computed outputs and the desired outputs, known as ε, the error signal or residual, is calculated using Eq. (8.1):

\varepsilon = \frac{1}{k}\sum_{i=1}^{k}(t_i - o_i)^2    (8.1)
where i is the index of the output nodes of the network and k is the number of information classes to be mapped. Critical to the success of network learning is the minimization of the error signal through adjustment of the weights between the nodes concerned. This is achieved by propagating the errors backward from the output nodes to the input nodes, computing the contribution of each hidden node and deriving the corresponding adjustment to the connection weight w_ij and the thresholds needed to produce the correct output. In this way the neural network has just learned from an experience. Through repetitively adapting the synaptic strength w_ij according to the errors, it is possible to obtain an output that differs as little as possible from the desired one. Adjustment to the synaptic strength w_ij in the nth iteration is governed by the generalized delta rule (Rumelhart et al., 1986):

\Delta w_{ij}(n+1) = \eta(\delta_j o_i) + \alpha\,\Delta w_{ij}(n)    (8.2)
where η = learning rate
α = momentum rate
δ_j = index of the changing rate of the error for the output at node j
Momentum refers to the rate at which the previous weight change affects the current adjustment. It controls possible wide oscillations in the weights, which can be caused by alternately signed error signals, and allows the learning rate to be larger without losing stability. Learning rate and momentum are two parameters that control the training process of a backpropagation neural network, such as the network training duration and performance. This process of feeding
signals forward and propagating the error backward is terminated once the error value converges to a minimum or after the specified number of iterations or epochs has been reached.
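The following is a compact sketch of this learning procedure for a single-hidden-layer network, combining the sigmoid transfer of Eq. (8.4), the error of Eq. (8.1), and the momentum form of the delta rule in Eq. (8.2); the layer sizes, learning rate, momentum, and the omission of bias terms are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 6, 3              # bands, hidden nodes, information classes

W1 = rng.normal(0.0, 0.1, (n_hid, n_in))  # input-to-hidden weights
W2 = rng.normal(0.0, 0.1, (n_out, n_hid)) # hidden-to-output weights
dW1_prev = np.zeros_like(W1)              # previous weight changes (for momentum)
dW2_prev = np.zeros_like(W2)

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def train_step(x, t, eta=0.2, alpha=0.6):
    """One forward pass followed by backpropagation of the error signal."""
    global W1, W2, dW1_prev, dW2_prev
    h = sigmoid(W1 @ x)                       # hidden activations
    o = sigmoid(W2 @ h)                       # computed output
    error = np.mean((t - o) ** 2)             # error signal, as in Eq. (8.1)
    delta_o = (t - o) * o * (1 - o)           # error term at the output nodes
    delta_h = (W2.T @ delta_o) * h * (1 - h)  # error propagated back to hidden nodes
    dW2 = eta * np.outer(delta_o, h) + alpha * dW2_prev   # delta rule with momentum
    dW1 = eta * np.outer(delta_h, x) + alpha * dW1_prev
    W2 = W2 + dW2
    W1 = W1 + dW1
    dW2_prev, dW1_prev = dW2, dW1
    return error

x = rng.random(n_in)                      # one training pixel (scaled band values)
t = np.array([1.0, 0.0, 0.0])             # desired output: class 1
for epoch in range(200):
    err = train_step(x, t)
print(round(err, 4))
```

The error shrinks over the epochs; training would normally stop once it converges to a minimum or the specified number of epochs is reached, as stated above.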
8.3.4 Transfer Functions
All the input to a neuron is transformed into an output value plus a bias via the transfer function f, also known as the activation function because the translated output from the input either activates or inhibits an activity. This function plays a critical role in ANN image classification as it, in conjunction with the weights, governs the behavior of the network. Typical transfer functions are linear, threshold, and nonlinear (e.g., sigmoid). Linear functions produce an output that is proportional to the sum of the weighted inputs. In the threshold function the output is set to binary, depending on whether the total input is larger or smaller than the specified threshold value. Nonlinear functions produce a modulated output from the input. There are a few nonlinear functions in use, including logistic, tanh, Gaussian, and sigmoidal functions. Tanh and arctan functions produce both positive and negative values, and are conducive to faster training than the logistic function, which produces only positive values. The hyperbolic tangent function is calculated using the following equation:

o_i = m \tanh\left(k \sum_j w_{ij} o_j\right)    (8.3)
where m is a constant, and o_j is the output pattern from node j (input pixel values for input nodes). The sigmoid activation function is a better choice than the threshold function in that it is easier to train. For hidden nodes this function is especially preferable to threshold functions, which make the network difficult to train. With sigmoid nodes, a small change in the weights usually induces a change in the output. This sensitivity makes it possible to judge whether the adjustment in weight takes place in the right direction. With threshold units, a change in the weights will not necessarily result in a change in the output. For continuous-valued targets with a bounded range, such as remote sensing data, the logistic and tanh functions can be used as well. However, it may be necessary to scale the data to a range suitable for the activation function first. The sigmoid activation function resembles the functioning of biological neurons more than either the linear or the threshold functions, and is the most popular activation function in neural network classifiers. It translates an input into a nonlinear but continuous output (Fig. 8.11). Usually taking the form of the S-shaped function, it is computed using the following equation:

o_i = f(\mathrm{net}_i) = \frac{1}{1 + e^{-\mathrm{net}_i}}    (8.4)
FIGURE 8.11 The sigmoid transfer function, y = 1/(1 + e^{-x}), which translates all inputs into an output ranging from 0 to 1.
where net_i is the sum of the bias, the weights of its incoming links, and the states of the nodes connected to it, or

\mathrm{net}_i = \sum_j w_{ij} o_j + \mathrm{bias}_i    (8.5)
The above activation function behaves asymptotically in multilayer perceptrons. It makes the hidden nodes in the network function like a nonlinear signal modulator. Through this function an input value is converted to an output ranging from 0 to 1. The threshold effectively shifts the position of the curve, thereby increasing or decreasing the output value, depending on the sign of the threshold. Thanks to this transfer function, it is possible for hidden neurons to translate the input nonlinearly into an output. Without the nonlinearity of these hidden nodes, the network cannot be more powerful than plain perceptrons. It is this capability of representing nonlinear functions that makes multilayer networks so powerful in image classification.
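For comparison, the transfer functions named above can be sketched as follows; the threshold value tau and the sampled net values are arbitrary.

```python
import numpy as np

def linear(net):              # output proportional to the weighted sum
    return net

def threshold(net, tau=0.0):  # binary output, depending on the threshold tau
    return np.where(net > tau, 1.0, 0.0)

def logistic(net):            # S-shaped, output in (0, 1); Eq. (8.4)
    return 1.0 / (1.0 + np.exp(-net))

def tanh_fn(net):             # S-shaped, output in (-1, 1)
    return np.tanh(net)

net = np.linspace(-4, 4, 9)
print(logistic(net).round(2))
print(tanh_fn(net).round(2))
```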
8.4 Network Configuration
ANN-based image classification requires selection of an appropriate network model. The success of image classification depends on the proper configuration of this selected network. The potential of ANNs in image classification cannot be fully realized unless the network is optimally configured. On the other hand, the lack of theoretical
foundation in network configuration and selection of network parameters is the biggest hurdle to the routine use of ANNs in image classification. Network configuration is made more challenging by the absence of general rules for defining a suitable network architecture. Thus, it is difficult to configure and define the minimum number of nodes per layer, the learning rate, and the convergence rate objectively (Civco, 1993). Although a number of researchers have examined the impact of network configuration on image classification accuracy (Foody and Arora, 1997; Paola and Schowengerdt, 1997), it is still uncertain whether the established optimal configuration still holds true in light of data from other sensors, or in different geographic areas, because of the “opacity problem.” Therefore, the configuration effort must start from scratch and the best configuration has to be determined experimentally in the classical trial-and-error manner. The blind search for the best neural network architecture and experiments with fine-tuning network parameters reduce the efficiency of this method in image classification. A number of network configuration issues need to be resolved before a successful classification is feasible. The frequently encountered issues in deciding network architecture are specification of the number of hidden layers and hidden nodes, and the number of input and output nodes. The number of input and output nodes can be resolved relatively easily. In a typical network configuration setting, the number of input nodes can be made equal to the number of spectral bands or channels of the remote sensing data used in a classification. Although the number of hidden layers and hidden nodes in the neural network can be selected automatically using the default setting in an image analysis system, it is questionable whether such specifications are the most appropriate. An alternative is to build an arbitrarily large network and then prune out nodes and connections until a competent network is reached. Conversely, it is also possible to start with a small network and then grow it until it can perform the task competently. In either case the problem of deciding upon the exact number of hidden layers and hidden nodes still has to be addressed.
8.4.1 Number of Hidden Layers
How to set the appropriate number of hidden layers depends upon the structure of the input data. Usually, it ranges from one to three. In practice, one layer of hidden nodes suffices in most cases. Even if the function to be learned is mildly nonlinear, a simple linear model is a better choice than a complicated nonlinear model if there are too few data or too much noise in estimating the nonlinearities accurately. There is no advantage in using more than one hidden layer if there is only one input. The issue becomes more complicated in light of more than two inputs. Two layers are required only in the presence of a large number of hidden units or network divergence. More hidden
layers enable the network to learn more complex patterns and model the data more accurately. Three hidden layers produced higher training accuracy and test accuracy (Skidmore et al., 1997). Unfortunately, the use of two hidden layers exacerbates the problem of local minima, in addition to making the model more complex. Local minima with two hidden layers can result in extreme spikes even if the number of weights is fewer than the number of training cases. In order to overcome this problem, it is important to use many random initializations or other methods of global optimization. Another disadvantage of using more hidden layers is the reduced ability of the network to generalize to unseen samples, a problem known as “overfitting.” An overfit network performs well on training data, but poorly on unseen data outside the range of the training data. Moreover, more hidden layers also lengthen network training, but bring few benefits in image classification. For instance, a two-hidden-layer network produced marginally better results than a one-hidden-layer network, but took much longer to train (Murnion, 1996). Having more than one hidden layer did not improve the classification accuracy of the neural network classifier (Benediktsson et al., 1990). In fact, two hidden layers have a lower accuracy because of the “overfitting” problem. This finding is contradictory to that of Skidmore et al. (1997), plausibly due to the number of nodes contained in each hidden layer.
8.4.2 Number of Hidden Nodes
It is important to use a proper number of hidden nodes to maximize network performance. The optimal number of hidden units depends on a number of factors, such as the number of input and output nodes, the number of training cases, the amount of noise in the data, the complexity of the network architecture, the type of activation function for hidden units, and the training algorithms (Sarle, 2002). Too few nodes will not partition the input data enough to allow the network to form an internal representation of all the land covers to be classified. A large enough number of hidden nodes safeguards the neural network’s computational power to learn complex nonlinear functions. Increasing the number of hidden nodes vastly improves the classification ability of a network. A large number of hidden nodes should be used to force the network to converge more efficiently, particularly when the input images are extremely complex and granular (Benediktsson et al., 1990). However the classification accuracy of a network did not improve after the number of hidden nodes exceeded 32. A larger number of hidden nodes led to higher training accuracy, but lowered mean test accuracy (Skidmore et al., 1997). The optimal number of hidden nodes for one hidden layer should approximate 20, rising higher for two and three hidden layers to maximize both training accuracy and test accuracy. Under ideal circumstances, the first hidden layer of two hidden layers should contain 2 to 3 times the number of
inputs to define hyperregions or information classes (Kanellopoulos and Wilkinson, 1997). If this number proves unrealistic, then it should be at least equal to twice the number of input nodes. However, a large number of hidden nodes could degrade the generalization ability of the network, even though they may lead to a small training error. In fact, too many hidden nodes cause the network to become static within a localized minimum, and increase the network complexity and computation intensity (Benediktsson et al., 1993). As more and more units are added to the network, it becomes increasingly difficult for the algorithm to find a solution. Of all network configuration issues, determination of the appropriate number of hidden nodes receives the least theoretical guidance (Javis and Stuart, 1996). A “rule of thumb,” proposed by Fletcher and Goss (1993), stipulates that the number of hidden nodes be set between 2n + 1 and 2√(n + m) (n is the number of input nodes; m is the number of output nodes) in order to maximize network performance. This range tends to prevent the weights from becoming stabilized prematurely at a local minimum while still keeping the network reasonably simple in structure. This rule of thumb is flawed in that it ignores noise in the data and the complexity of the activation function. An improved rule of thumb should incorporate the data-dependent noise coefficient r in estimating the number of hidden layer nodes (Garson, 1998), or
N_p / [r(N_i + N_o)]    (8.6)
where Ni and No are the number of input and output nodes, respectively, and Np is the number of training samples. It is calculated using 60 × Ni × (Ni + 1), a heuristic suggested by Hush (1989) for estimating the optimal number of training samples. Contrary to a fixed quantity, the optimal hidden layer size has a fairly wide range below which the accuracy decreases and above which the training time increases (Paola and Schowengerdt, 1997). Within this range, changes in the hidden layer size bear almost no relationship with the final classification accuracy. A better alternative is to determine the optimal number of hidden units based on generalization errors produced from training the same network with different numbers of hidden nodes. These errors can be calculated using cross-validation, a process of repeatedly training the network with all the possible subsets into which the data have been divided. During each training, one of the subsets that has been excluded from the training is used to compute the error criterion. The model with the smallest generalization error is regarded as the best among the tried. A large training error indicates that there are too few hidden units. This method works well when there is no early stopping, namely, the training process terminates when the validation error rate becomes larger instead of smaller. A similar method proposed
by Yuan et al. (2003) calculates the information gain associated with different numbers of hidden nodes. The number of nodes that produces the highest gain is regarded as optimal. The unimportant hidden nodes are then discarded.
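The rules of thumb discussed above are simple enough to compute directly. The sketch below (a rough illustration; the band and class counts, and the value chosen for the noise coefficient r, are assumptions) evaluates the Fletcher and Goss (1993) range, the Hush (1989) sample-size heuristic, and Eq. (8.6):

```python
from math import sqrt

def fletcher_goss_range(n_inputs, n_outputs):
    # Rule-of-thumb bounds on the number of hidden nodes (Fletcher and Goss, 1993):
    # between 2*sqrt(n + m) and 2n + 1
    return 2 * int(sqrt(n_inputs + n_outputs)), 2 * n_inputs + 1

def garson_estimate(n_train, n_inputs, n_outputs, r=5.0):
    # Eq. (8.6): hidden nodes ~ Np / [r (Ni + No)]; r is a data-dependent
    # noise coefficient (the value 5.0 here is an assumed example, not a rule)
    return int(n_train / (r * (n_inputs + n_outputs)))

n_bands, n_classes = 6, 5                   # hypothetical TM bands and classes
n_train = 60 * n_bands * (n_bands + 1)      # Hush (1989) heuristic sample size
print(fletcher_goss_range(n_bands, n_classes))          # (6, 13)
print(garson_estimate(n_train, n_bands, n_classes))     # 45 hidden nodes
```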
8.5 Network Training
Classification of remotely sensed data based on neural networks consists of two stages, network training and classification. After the network has been properly configured and the weights and bias initialized, the next step is to train the network. Network training is one of the most crucial steps in the success of ANN classification. It aims at establishing a model between the given input and the desired output so that the input data can be correctly classified into the desired land cover categories. If properly trained, the constructed network should have the ability to generalize and predict outputs from input not included in the training samples, that is, to produce an approximation for new cases not used in training. A number of issues related to network training need to be resolved before satisfactory results can be expected, such as how much data should be used to train the network, at what speed, and the number of iterations or duration of training.
8.5.1 General Procedure
Network training is usually accomplished in four steps: data input, specification of the desired output, calculation of error tolerance, and feedback of the error signal to the input to adjust weights (Fig. 8.12). All the patterns in the training samples representative of all prospective known land covers from the remotely sensed data to be classified are fed forward repeatedly to the input nodes. The input patterns are propagated through the network until they reach the output nodes. As each pattern is presented, the network calculates an output that is passed to the output nodes. The results produced by the machine may differ from the desired outcome. Their difference is termed predictive error. Through iterative adjustment of the weights of connections in the network, the outcome will gradually shift to the ideal ones, bringing down the predictive error. This process is repeated with new examples as the weights are continuously adjusted until the desired output is obtained for each example eventually. In order to generate a satisfactory result, the entire set of training samples should be presented to the network a number of times. If the available training data are small in size, experience gained from previous classifications may be reused to increase ANN classification accuracy (van Coillie et al., 2004). There are no general criteria for setting the initial values, such as network connections or weights, learning step-size and search direction, and some other parameters dictated by the type
FIGURE 8.12 Process of training the backpropagation network in preparation for image classification. (The flow chart shows: (1) input data and encoding; (2) initialization of the weights, governed by the learning rate and momentum, and computation of the output through the summation and transfer function; (3) comparison of the computed output with the desired output to yield the predictive error; and (4) weight updating, repeated until the error is acceptable and training terminates.)
of function-minimization strategies employed (German and Gahegan, 1996). A common practice for initializing the network is to randomize the weight of every node in the network. Preferably, this weight should start from a small value as it is more conducive to avoidance of saturation. The exact magnitude of this small value should reflect the scale of the input data and the number of inputs and their correlations. The trouble with this strategy is that network training tends to be very slow and lengthy, especially when a large number of nodes are involved or the sample size is large.
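A minimal sketch of such small random initialization is given below; the uniform distribution and the 1/√(fan-in) scaling are common choices assumed here, not prescriptions from the text:

```python
import numpy as np

def init_weights(n_inputs, n_hidden, n_outputs, rng=None):
    # Small random weights scaled by 1/sqrt(fan-in) so that the net input of
    # each node stays modest and the sigmoid does not saturate at the start.
    rng = rng or np.random.default_rng(0)
    w_ih = rng.uniform(-1.0, 1.0, (n_hidden, n_inputs)) / np.sqrt(n_inputs)
    w_ho = rng.uniform(-1.0, 1.0, (n_outputs, n_hidden)) / np.sqrt(n_hidden)
    b_h = np.zeros(n_hidden)          # hidden-layer biases
    b_o = np.zeros(n_outputs)         # output-layer biases
    return w_ih, b_h, w_ho, b_o

# Hypothetical configuration: 6 spectral bands, 12 hidden nodes, 5 classes
w_ih, b_h, w_ho, b_o = init_weights(6, 12, 5)
print(w_ih.shape, w_ho.shape)
```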
8.5.2 Size of Training Samples
There exists a positive correlation between training sample size and classification accuracy (Heermann and Khazenie, 1992). While a large training set produces more reliable results, it also slows down the training process even with a powerful machine. Once over a certain threshold, a large training set causes the network to be overtrained, namely, it loses its ability to generalize for independent data. Conversely, if insufficiently trained, the network is unable to recognize patterns in the input samples. Therefore, it is essential to select an optimal number of training samples, defined as the minimum number that allows a representative set to be selected. In ANN classification
a much larger number of training samples are needed to produce a satisfactory result. However, it is unknown what the optimal training sample size should be. A rule of thumb is that a quarter of the total dataset is needed in order for a network to be adequately trained (Miller et al., 1995). The optimal training sample size for each class should be at least 10 to 30 times the number of input spectral bands in statistical classifiers (Mather, 1987). Certainly, this recommended size varies with scene complexity, and can be reduced by selecting distinct patterns randomly.
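The two rules of thumb above are easily contrasted numerically. The sketch below uses a hypothetical TM subscene (the image size, band count, and class count are assumptions):

```python
n_rows, n_cols, n_bands = 512, 512, 6    # hypothetical TM subscene
n_classes = 5

total_pixels = n_rows * n_cols
quarter_rule = total_pixels // 4         # Miller et al. (1995): a quarter of the dataset
per_class_low = 10 * n_bands             # Mather (1987): lower bound per class
per_class_high = 30 * n_bands            # Mather (1987): upper bound per class

print(quarter_rule)                                           # 65536 training pixels
print(per_class_low * n_classes, per_class_high * n_classes)  # 300 to 900 pixels in total
```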
8.5.3 Nature of Training Samples
Patterns in the training samples must be representative of all values likely to be associated with a particular land cover to be mapped. Those pixel values not included in the training samples may not be correctly classified. Accurate, meaningful results are possible only when quality training datasets are fed to the classifier (Skidmore et al., 1997). Otherwise, network outputs will not be very reliable. A wide variety of mixed pixels need to be included in the training samples to achieve good recognition of dominant classes and not just of subsidiary classes, as might have been anticipated (Bernard et al., 1997). It is hence better to train the network with both pure and mixed pixels. The representativeness of training samples is ensured in two ways: by using a large sample or by deleting from the samples those pixels that are incorrectly classified. A more reliable network is expected if the training dataset also contains some kind of random noise. In this sense the network is only as good as the training data.
8.5.4 Ease and Speed of Network Training
Neural networks can be trained sequentially based on distinct local subsets of the training data after data are clustered. The constructed network may be further trained using standard algorithms operating on the global training set. The newly established network effectively inherits the knowledge from the local training procedure before improving its generalization ability through subsequent global training (Jiang and Wah, 2002). The speed of network training is subject to the complexity of network architecture and training parameters, such as learning rate and momentum. Backpropagation and recurrent backpropagation networks tend to train quite slowly. By comparison, ART networks train quite fast, usually in a few passes. The process of training can be accelerated via the proper specification of training rate and momentum. They are usually treated as a pair of decimal values, with higher values nearer to unity. Within limits, momentum may be used to support a higher learning rate and faster convergence. A higher learning
rate is needed to classify scenes of high spectral variability such as the built environment. Another method of speeding up the training process is to simplify the network architecture by reducing the size of the input layer (e.g., use of fewer spectral bands; refer to data encoding in Sec. 8.6.1) and by eliminating those hidden nodes that are deemed insignificant during training, while leaving the remainder of the network intact (Dreyer, 1993). No guideline exists to govern the specification of the appropriate number of iterations or error tolerance. The relationship between the number of iterations and the error level of a training dataset, which is affected by the initial weights, can be used to determine the optimal number of iterations. The error signal is drastically reduced initially, but the reduction is more gradual after a certain number of iterations (Fig. 8.13). However, it is also possible for the error level of the validation and test datasets to surge after decreasing up to a certain degree, resulting in small fluctuations (Fig. 8.14). This process is terminated if the number of iterations or the error tolerance is reached. Termination of network training based on the number of iterations is flawed because at the specified iteration neither the training accuracy nor the test accuracy may have reached its maximum. Use of the error tolerance level as the terminator of network training is not ideal either, because the error tolerance may correspond to a local rather than the global minimum if it does not decrease consistently. This is known as the local minima problem. Besides, a smaller error tolerance only represents a higher training accuracy. There is no evidence to suggest that it will always translate
FIGURE 8.13 Relationship between the number of iterations and network accuracy expressed as RMS (root mean square) error. The three lines correspond to three different initial weight settings.
FIGURE 8.14 The relationship between training cycles and error rate for training data and test data. One potential pitfall during network training is encountering local minima, as shown in the diagram. Training should be terminated just before the error rate bounces up for the test data.
into higher test accuracy (Sun et al., 1997). The optimal way of terminating network training is when the error rate starts to bounce up for the test data (Fig. 8.14).
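As noted earlier in this section, the learning rate and momentum jointly control how quickly the weights change between iterations. The sketch below shows the generic momentum-augmented gradient-descent update (the gradient values and parameter settings are assumptions, and this is not the exact update used by any particular software package):

```python
import numpy as np

def momentum_update(w, grad, velocity, learning_rate=0.2, momentum=0.9):
    # Weight change = -learning_rate * gradient + momentum * previous change.
    # Reusing part of the last step smooths oscillations and allows a higher
    # learning rate, and hence faster convergence, without divergence.
    velocity = momentum * velocity - learning_rate * grad
    return w + velocity, velocity

w = np.array([0.1, -0.3])
velocity = np.zeros_like(w)
for grad in [np.array([0.5, -0.2]), np.array([0.4, -0.1])]:   # assumed gradients
    w, velocity = momentum_update(w, grad, velocity)
print(w)
```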
8.5.5 Issues in Network Training
Two issues related to network training deserve special consideration here: network convergence and overtraining. If a network takes too long to train, this may hint that training has encountered difficulty (e.g., it does not converge). The progress of training can be monitored periodically by examining the classification accuracy, or the prediction accuracy, of the network. If the RMS error falls quickly and then stays flat, or if it oscillates up and down, then the network is trapped in a local minimum instead of the global one (Fig. 8.14). In light of nonconvergence of a neural network, the analyst should examine the model architecture first to ensure that it is suitable for the problem. Data representation schemes need to be evaluated next to ensure all key input parameters are scaled or coded properly. Minimization of training errors can lead to overfitting and poor generalization if the number of training cases is small relative to the network complexity. Overtraining occurs when the same patterns are repeatedly presented to the neural network and the weights are adjusted to match the desired outputs accordingly. What the network does in this case is simply memorize the patterns rather than learn and extract the essence of the relationships. Such trained neural networks may perform extremely well on the training data, but poorly
with unseen patterns in the input data owing to their limited ability to generalize. Three strategies help to avoid overfitting (a minimal sketch of the second strategy follows the list):
• First, have the right perspective. The objective of network training is not to predict the training data as accurately as possible. Instead, training aims at determining an optimal set of weights based on the samples so that the trained network performs optimally on the test and validation data.
• Second, divide the data into two sets, one of which is used as the training set and the other as the validation set. Occasionally, the training session is paused to assess the network performance on the independent validation set. During validation the established weights are not altered. So long as the network is still learning (i.e., achieving a good measure of generalization), training proceeds. By plotting the error versus training cycle on both training and validation sets, it is possible to know when training should be terminated (Fig. 8.14).
• Last, insert random noise into the input data. The random nature of the noise prevents the network from forming any approximation of the noise, thus improving the network’s ability to generalize.
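A minimal sketch of the second strategy is given below. It trains a single sigmoid node by gradient descent on synthetic data and stops when the validation error starts to rise; the data, the one-node model, and the patience value are all assumptions made purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-band, two-class data standing in for training samples (assumed)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
X_train, y_train = X[:200], y[:200]          # training set
X_val, y_val = X[200:], y[200:]              # independent validation set

w, b = np.zeros(2), 0.0
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

best_err, best_wb, patience, wait = np.inf, (w, b), 10, 0
for epoch in range(500):
    # One gradient-descent update on the training set (single sigmoid node)
    p = sigmoid(X_train @ w + b)
    grad_w = X_train.T @ (p - y_train) / len(y_train)
    grad_b = np.mean(p - y_train)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

    # Validation check: the weights are not altered during this step
    val_err = np.mean((sigmoid(X_val @ w + b) - y_val) ** 2)
    if val_err < best_err:
        best_err, best_wb, wait = val_err, (w.copy(), b), 0
    else:
        wait += 1
        if wait >= patience:      # validation error has started to bounce up
            break

w, b = best_wb
print(f"stopped at epoch {epoch}, validation RMS error {np.sqrt(best_err):.3f}")
```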
8.6 Features of ANN Classifiers
8.6.1 Methods of Data Encoding
Data encoding refers to the format and enumeration scale used to represent the input data before they are fed to the network for classification. How data are encoded affects the architecture of the network, such as the number of input nodes needed. If the input data are categorical in nature, they must be recoded numerically to be understood by the neural network. Even if they are digital, they still have to undergo proper encoding. As demonstrated in Eq. (8.4), all input data must be transformed into a range of −1.0 to 1.0 to be usable in an ANN classification. Remote sensing data, however, are not recorded on this scale. Therefore, they have to be encoded. There are two broad categories of data encoding methods, per channel and binary. In the former method, each spectral band of the input data is allocated an input node. The number of input nodes equals the number of input bands. The number of output nodes is also set the same as the number of information classes, or one node per class. More input nodes may be needed for other nonimage data. For instance, an extra node is usually reserved for computing the bias (Benediktsson et al., 1997), or to record the confidence level of classification, dependent upon the network structure. In image classification
it is a common practice to use all available spectral bands because the more spectral bands used in an ANN classification, the higher the classification accuracy (Foody and Arora, 1997). However, the temptation to add more spectral bands to a classification indiscriminately must be resisted, as the addition of every input node increases the network complexity. Whether to add extra nodes must be weighed carefully. It is not cost effective to add a node unless the input data contain substantially different information from all other input bands. In binary encoding each bit of the input data is represented with an input node. If the remotely sensed data are recorded in 8 bits, or 256 gray levels, the total number of input nodes required equals the number of input channels multiplied by 256 (Benediktsson et al., 1993). Data encoding in the output layer is usually binary, with 1 for pixels belonging to the class and 0 otherwise (Serpico and Roli, 1995). This method of deterministic encoding in which the two codes add up to 1, however, can be misleading. Analogous to probabilities, they may be mistakenly construed as representing class membership or a posteriori probabilities, and thus should be avoided. Instead, a range of [0.003, 0.99] has been suggested to avoid the confusion. In order to express the genuine a posteriori probabilities, this deterministic data encoding can be expanded by inserting one more node to store the confidence level at which each pixel is classified. In this probabilistic encoding, the value varies from 0 to 1.0. A membership of 0.9 implies that the probability of the pixel belonging to the classified category is quite high, while a membership of 0.2 indicates a low probability, just as in a fuzzy classification. The difficulty with probabilistic encoding is the large number of categories in the results. If presented in a map, they will not have high legibility and cannot be evaluated for their accuracy using the traditional method.
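A minimal sketch of per-channel input encoding and one-node-per-class output encoding is shown below (the pixel values, band count, and class count are assumptions):

```python
import numpy as np

def encode_per_channel(dn, bits=8):
    # Rescale 8-bit digital numbers (0-255) to the [-1, 1] range expected by
    # the transfer functions; one input node per spectral band.
    return dn.astype(np.float32) / (2 ** bits - 1) * 2.0 - 1.0

def encode_output(labels, n_classes):
    # One output node per information class: 1 for the target class, 0 otherwise
    onehot = np.zeros((len(labels), n_classes), dtype=np.float32)
    onehot[np.arange(len(labels)), labels] = 1.0
    return onehot

pixels = np.array([[12, 87, 200, 45, 33, 9]])     # one pixel, six assumed bands
print(encode_per_channel(pixels))
print(encode_output(np.array([2]), n_classes=5))  # class index 2 of 5
```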
8.6.2 Incorporation of Ancillary Data
The attempt to further improve the accuracy of land covers using the ANN classifiers has witnessed a number of endeavors, the commonest of which is to incorporate multitemporal, multisource remote sensing data, and even non-remote sensing ancillary data, into the classification. Inclusion of multisensor, multitemporal data, in an ANN classification is conducive to the improvement of classification accuracy. Multitemporal data refer to remotely sensed data of the same geographic area acquired by the same sensor at different times. These data have the same spatial and spectral resolutions. Multitemporal data are indispensable in certain image analyses, such as detection of land cover changes. Multisensor data refer to remotely sensed data originating from different sensors and platforms. Possibly or most likely, these data do not share the same spatial and spectral resolutions, or the same polarization for radar data. Ancillary data refer to non-remote sensing data from other sources or those data that are
335
336
not normally used in image classification. Non-remote sensing data may be geographic or environmental, usually stored as a layer in a geographic information system (GIS) database. The nature and type of geographic data used in image classification vary with the application. Commonly used geographic data are topographic, such as digital elevation models (DEMs) and the variables derived from them (Frizzelle and Moody, 2001). Geographic data are vital in classifying images of mountainous regions to account for the topographic effect. Such variables as local solar zenith angle, slope, elevation, and orientation that are useful in mapping vegetation can also be derived from DEMs (Carpenter et al., 1999). Other ancillary data include pixel location defined by the northing and easting coordinates. However, the use of geographic data may invoke additional processing as they differ from remote sensing data in format and reliability. They may not be in digital format. Even if they are, they may not be enumerated at the same scale as the remote sensing data, namely, varying continuously from 0 to 255 or higher, depending upon the quantization level. Furthermore, they may not be as reliable as the remote sensing data. These differences demand extra processing to reconcile. Analog ancillary data have to be converted into digital format via scanning or digitization. The acquired digital geographic data and ancillary information have to be transformed to the same ground reference system as the remote sensing data. Non-remote sensing data have to be encoded to conform to the requirements of the neural network classifier, in the manner discussed previously. A varying weight may be attached to an input variable, which may not be indispensable if all data are to be standardized (see Sec. 8.6.3). Ancillary data may also be derived from satellite imagery itself. Image-derived ancillary data are not related to pixel values. They concern the spatial association of pixels, such as texture, context, shape, and even fractal dimension (Chen et al., 1997). Such ancillary information as texture can be incorporated into a neural network classifier without even having to define a texture measure explicitly (Bischof et al., 1992). A simple method of including texture in the classification is to specify pixels within a given window (e.g., 3 × 3) as the input to the network (Paola and Schowengerdt, 1997), as illustrated in the sketch below. Multitemporal data and multisource ancillary data are easily incorporated into ANN classifiers, thanks to their high flexibility. This incorporation can be readily accomplished by altering the number of input nodes in their structure.
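The sketch referred to above (illustrative only; the random image and the choice of a 3 × 3 moving-window variance as the texture measure are assumptions) derives a simple texture layer and stacks it with the spectral band as extra input nodes:

```python
import numpy as np

def local_variance(band, win=3):
    # 3 x 3 moving-window variance, a simple image-derived texture measure
    pad = win // 2
    padded = np.pad(band, pad, mode="edge")
    out = np.empty_like(band, dtype=np.float32)
    for i in range(band.shape[0]):
        for j in range(band.shape[1]):
            out[i, j] = padded[i:i + win, j:j + win].var()
    return out

band = np.random.default_rng(1).integers(0, 256, (64, 64)).astype(np.float32)
texture = local_variance(band)
# Stack the spectral band and its texture layer: two input nodes per pixel
inputs = np.dstack([band, texture]).reshape(-1, 2)
print(inputs.shape)    # (4096, 2)
```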
8.6.3 Standardization of Input Data
Data standardization, also called normalization, refers to a process during which all input data are rescaled to the same range or the same standard deviation. Theoretically, standardization is totally unnecessary if all input data are to be combined linearly as in a multilayer perceptron network. The importance and reliability of a data source can be captured by assigning a large weight and bias to it in
the linear combination, which achieves the same effect as rescaling. Whether the input data should be standardized depends primarily on the nature of the network. Data standardization is indispensable for multisource data that may include remote sensing images and GIS data if they are to be combined via a distance function in a radial basis function network. This is because the contribution of one source of data depends on its variability relative to other data in the input. Standardization ensures that all input data enumerated at different scales have a variance commensurate with their importance. This is especially important if their prior importance is unknown. If the relative importance is known, it is sensible to scale the more important data to a wider variance or range (Sarle, 2002). Apart from necessity, standardization can bring the following tangible benefits (a minimal sketch follows the list):
• Expediting network training and minimizing the chance of getting trapped in local minima. With standardized data fewer training cycles are needed to reach a small system error as the nodes have nearly the same range of weights (Skidmore et al., 1997).
• Standardized data enable the weight decay and Bayesian estimates to be determined more conveniently.
• Data standardization creates certainty in setting initial weights for the network. Standardization of the input data eliminates the problem of scale dependence of the initial weights.
Standardization of data to a range of [−1, 1] is preferable to a range of [0, 1] as the former is much more conducive to the avoidance of local minima. As a matter of fact, any scaling range that centers at zero for its mean or median can achieve the same effect. Best results are achieved if the standard deviation of the standardized training data amounts to 1.
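A minimal sketch of the zero-centered standardization recommended above (the multisource feature values are hypothetical):

```python
import numpy as np

def standardize(features):
    # Rescale each input variable to zero mean and unit standard deviation so
    # that sources enumerated at different scales contribute comparably.
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0          # guard against constant layers
    return (features - mean) / std, mean, std

# Hypothetical input: two spectral bands (0-255) and an elevation layer (metres)
X = np.array([[120.0, 88.0, 350.0],
              [135.0, 95.0, 420.0],
              [110.0, 70.0, 300.0]])
X_std, mean, std = standardize(X)
print(X_std.round(2))
```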
8.6.4 Strengths and Weaknesses
Unlike the human analyst, ANN classifiers rely almost exclusively on pixel values while ignoring the spatial association between pixels. This inferiority is compensated for by
• the nonnecessity of a priori knowledge on the statistical distribution of the input data
• a compliant structure that can be adapted to improve performance on particular problems (Carpenter et al., 1997)
Neural network–based classification has a number of distinct features such as the ability to generalize, error correction, and graceful degradation. Of its various strengths, the most significant ones are flexibility and versatility (Table 8.2). The extreme flexibility of neural
Advantages:
• Nonlinearity possible
• No statistical assumptions on data distribution required
• Relative tolerance of noisy and missing data
• Both nominal and continuous data acceptable
• No need to program learning algorithms into the network
• Efficient classifiers if properly trained
• Able to adapt to changes in data over time
• Ability to incorporate ancillary data easily
• Output of classification confidence possible (e.g., fuzzy classification)

Disadvantages:
• Prohibitively slow training for large networks on sequential machines
• Problematic with patterns not represented in supervised classes
• Potentially large memory needed for some types of networks
• Difficult to configure the ideal network “architecture” and training algorithms
• Problems with local minima in training
• Unpredictable network behavior (opacity problem)
• Opaque in explanation of predictive model and process

Source: modified from Javis and Stuart, 1996.
TABLE 8.2 Features of Neural Network Classifiers in Comparison with Parametric Classifiers
networks enables them to handle both continuous and categorical data in the input and output. Categorical data are dealt with in two ways, either by using a single node with each category given a subset of the range from −1 to 1 or by using a separate node for each category. Continuous data are easily mapped into the necessary range. Thanks to this flexibility, nonlinear combinations of features can be incorporated into neural networks, which makes them much more powerful than standard statistical or decision tree classifiers. It is possible to experiment with all possible combinations of features so as to derive the best solution. ANNs are capable of taking advantage of highly nonlinear decision boundaries in the feature space. The versatility of neural network classifiers enables them to be combined with themselves or with other classifiers to classify multisource data and to take full advantage of the strengths of each component classifier. For instance, one neural network classifier may base the classification on one data source, and another on a different
FIGURE 8.15 An example of embedding an expert knowledge classifier into a neural network classifier. (Source: modified from Desachy et al., 1996.)
source. Both sources of data can be fused to reach the final classification decision (Fig. 8.15). A neural network classifier may be combined with other classifiers, such as a rule-based expert system classifier, to improve the accuracy of decision making in the classification process. The output of the expert system classifier can serve as an additional input layer for the neural network classifier in an effort to produce the highest overall accuracy (Liu et al., 2002). Incorporation of correct, complete, and relevant expert knowledge improves the accuracy of neural network classification results. The performance of ANN classification of land cover is also boosted via its combined use with texture and decision trees. In spite of the aforementioned strengths, ANN classifiers do face a number of problems (Table 8.2). The three major limitations among them are identified below:
• It is difficult to predict network behavior and comprehend how a classification task is accomplished, owing to the holistic approach used by the machine. Neural network classification is based on the concept of learning. So long as the machine learns how to solve the problem, there is no need to program the algorithm, which is impossible to know in some cases. However, this also brings out the problem of unpredictability. It is unknown beforehand how the machine will behave for a given
dataset. This lack of knowledge creates uncertainties as to how to further improve the accuracy of the classification results.
• Neural networks cannot explain the results they generate because of the absence of explicit rules. It is possible to reveal which inputs are more important than others through sensitivity analysis that can be performed inside the network, by using the errors generated from backpropagation, or externally by poking the network with specific inputs. This limitation makes them a suitable choice only in applications in which production of classification results is more important than understanding the mechanism of their generation.
• Data fed to a network must be restricted to a narrow range of −1 to 1. All inputs to a neural network have to be scaled if they do not conform to this range. This requirement necessitates additional transformation and manipulation of raw data before they are acceptable to the classifier. The extra processing and transformation of categorical values into this numeric range may rely on a histogram. However, skewed distributions with a few outliers can result in poor neural network performance.
8.7 Parametric or ANN Classifier?
ANNs represent a novel approach to image classification that differs drastically from conventional parametric classifiers in that no specific algorithm is presented to the computer. This is advantageous in that sophisticated problems can still be solved by the computer even if the steps needed for the solution are unknown. In this section, the performance of ANN classification relative to parametric classifiers is compared via a case study. This study also illustrates the capability of ANN classification in mapping salt farms. The section ends with a critical assessment of the potential of ANN classification.
8.7.1 Case Study
This case study focuses on salt farm mapping using a neural network classifier (Zhang et al., 2006). Unlike land-based features that have a wide range of pixel values on multispectral satellite data, water bodies in a salt farm exhibit a subtle range of radiometric variation due to differences in water properties (e.g., concentration levels of impurities and depth). As artificially constructed spatial entities, all fields in a salt farm have a regular shape and a uniform composition, even though they may be used for the production of salts at various stages. These fields have a spatial arrangement that facilitates assessment of their mapping accuracy. Pixels within a given field have a spatially uniform spectral property. Such a property provides an excellent opportunity to train the classifier with spectrally uniform samples, and in realistically
assessing the performance of neural network classifiers in handling spatially homogeneous features within an area, and especially in resolving subtle variations among different fields. The test site is the Taibei Salt Farm located in central Lianyungang City, northern Jiangsu Province of East China. Featured prominently inside the site are water bodies of various uses, including reservoirs, and salt and marine farms. Salt fields fall into three categories of use: evaporation, condensation, and crystallization. These three zones were classified from a Landsat Thematic Mapper (TM) image that had been radiometrically calibrated and geometrically rectified, together with the marine farms and reservoirs. The neural network classifier used was a layered feed-forward model in ENVI (Environment for Visualizing Images) version 4.0 with standard backpropagation for supervised learning. The network configuration comprised a training rate of 2, 2000 training iterations, a training threshold contribution of 0.1 (a larger value was found ineffective in classifying images where a number of ground covers cluster), a training momentum of 0.1 (which had little influence on the classified results when other parameters were held constant), a training RMS exit criterion of 0.1, and a minimum output activation threshold of 100. Accuracy of the classified covers was evaluated using 50 randomly selected checkpoints from the respective cover classes. The neural network classifier enables the salt farm to be mapped into the three zones reasonably well (Fig. 8.16). Spatially, the crystallization zone is situated most closely to roads adjoining the condensation zone, which is widespread throughout the study area and arranged in elongated form. Evaporation zones are located at the bottom of the study area and next to the marine farm at the top of the scene. The classification was achieved at an overall accuracy of 84.4 percent. All the detailed neural network–produced classes have an accuracy of
FIGURE 8.16 Salt farm zones in the Taibei Salt Farm classified using the neural network method (map legend: reservoir, marine farm, crystallization zone, condensation zone, evaporation zone, other). (Source: Zhang et al., 2006.) See also color insert.
76 percent or higher. The producer’s accuracy is higher than 80 percent for all water classes except the evaporation zone. Its low salt content makes it spectrally resemble reservoir and marine farm. This resemblance enhanced its confusion with them and lowered its mapping accuracy. By comparison, the producer’s accuracy for the crystallization zone is the highest, at 86 percent. As for the user’s accuracy, it is higher than 80 percent for all covers except the condensation zone, which has the lowest user’s accuracy of 78.2 percent. In contrast, the evaporation zone has a particularly high user’s accuracy of 95 percent. The user’s accuracy is highly similar to the producer’s accuracy for the condensation zone. The producer’s accuracy of general water covers (i.e., reservoir and marine farm) is higher than that of water bodies in the salt farm. This fact suggests that it is more difficult to map water bodies in a salt farm accurately, even though they contain a higher level of salt than ocean water. Their accurate mapping demands the use of hyperspectral remote sensing data.
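Producer’s and user’s accuracy follow directly from the error matrix of the checkpoints. The sketch below uses a made-up three-class matrix, not the study’s actual counts:

```python
import numpy as np

# Rows = reference (checkpoint) class, columns = classified class (assumed counts)
confusion = np.array([[43,  4,  3],
                      [ 5, 41,  4],
                      [ 2,  6, 42]])

overall = np.trace(confusion) / confusion.sum()
producers = np.diag(confusion) / confusion.sum(axis=1)  # omission-error view
users = np.diag(confusion) / confusion.sum(axis=0)      # commission-error view

print(f"overall accuracy: {overall:.1%}")
print("producer's accuracy per class:", np.round(producers, 2))
print("user's accuracy per class:", np.round(users, 2))
```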
8.7.2 A Comparison
Since the parametric classifier (maximum likelihood) is unable to resolve the three types of salt farm fields, it was decided to amalgamate them into one cover type in both the parametric and neural network results (Fig. 8.17) for the sake of comparison. After amalgamation the salt farm mapped using the neural network method has a spatial pattern of distribution (Fig. 8.17a) highly resembling that in the maximum likelihood results (Fig. 8.17b). All the classified covers look more reasonable in the neural network map than in Fig. 8.17b, except roads and marine farm. Most of the void (not classified) areas in the maximum likelihood map have been correctly classified as roads in the neural network results. Marine farm is found inside the salt farm and, to a lesser degree, inside the reservoir. Such an illogical spatial occurrence results from misclassifications.
FIGURE 8.17 General land covers classified from TM satellite data (map legend: reservoir, marine farm, salt farm, settlement, roads). (a) Results from the neural network method; (b) results from the maximum likelihood method. (Source: Zhang et al., 2006.) See also color insert.
With the maximum likelihood method, the salt farm was mapped at an accuracy of 76 percent. A higher accuracy level could not be achieved because of the failure to reliably resolve the different water covers in the salt farm. If mapped using the ANN method, this accuracy rose to 84 percent. Thus, the neural network classifier is able to produce more accurate results than the parametric classifier when the targets of mapping exhibit subtle tonal variations, such as different fields in a salt farm. In particular, the neural network is especially good at classifying spatially extensive ground covers of a regular shape, such as reservoir and salt farm. Thanks to this strength, the neural network classifier is quite suited to the monitoring of salt farm dynamics. The improvement in the mapping accuracy of the neural network classifier over the parametric method is comparable to what others have achieved in mapping land covers. A fuzzy ARTMAP neural network achieved an accuracy of over 85 percent, against 78 percent achieved using the maximum likelihood classifier in general land cover mapping (Gopal et al., 1999). Therefore, the neural network is better than maximum likelihood not only in mapping water bodies in a salt farm but also in mapping general land covers.
8.7.3 Critical Evaluation
The performance of various neural network classifiers in mapping land covers relative to that of parametric classifiers (e.g., maximum likelihood) has been extensively evaluated in a number of studies (Table 8.3). Most studies have confirmed the superiority of ANN classifiers. In classifying land into five covers, the multilayer perceptron neural network achieved the best results, at an overall accuracy of 93 percent, higher than 88.7 percent achieved with the maximum likelihood classifier (Erbek et al., 2004). This superiority is affected by the detail level of the land cover changes detected (Seto and Liu, 2003). At the fine class resolution (23 classes), ARTMAP achieved an overall accuracy much higher than that by the maximum likelihood method. However, this superiority is subdued at the coarse class resolution level (10 classes), at which the neural network is only marginally more accurate than the maximum likelihood method. The neural network classifier is more successful than maximum likelihood and evidential reasoning in mapping tundra in the boreal forest of central Yukon (Leverington and Duguay, 1996). The superior accuracy of ANN is attributed to the use of ancillary DEM-derived data that are difficult to take advantage of in the maximum likelihood classification. Owing to the inclusion of slope, aspect, shade, and elevation, a fuzzy ARTMAP outperformed the maximum likelihood method in terms of the highest overall accuracy achieved (Carpenter et al., 1997). The superior accuracy of ANN in mapping agricultural and forested areas is not possible without the use of ancillary data (Frizzelle and Moody, 2001). It attained an overall accuracy of 90.9 percent, slightly lower than 92.2 percent associated with
Objective of Classification | Imagery Used | No. of Covers | Parametric Accuracy (%) | ANN Accuracy (%) | Authors
Vegetation and agricultural | TM | 8 | 92.2 | 90.9 (6-10-6); 96.8 (12-20-20-8) | Frizzelle and Moody (2001)
Urban change | TM | 23; 10 | 75.8; 84.1 | 84.4 (fuzzy ARTMAP); 89.8 | Seto and Liu (2003)
General land use | TM | 9 | 58.5; 61.5 | 60.5 (BP); 62.0 (BP); 85.5 (SOM + BP) | Yoshida and Omatu (1994)
General land use | TM | 4 | 84.7 | 85.9; 88.1 (with texture) | Bischof et al. (1992)
General land use | TM | 5 | 88.7 | 93 (MLP); 85.1 (LVQ) | Erbek et al. (2004)
Vegetation | TM + terrain | 8 | 54.0∗ | 57.2∗ | Carpenter et al. (1997)
Terrain | Polarimetric SAR | 3 | 74.1 | 72.6 (LVQ); 77.2 (LVQ + ML) | Hara et al. (1994)
Multiple land cover and vegetation | AVHRR NDVI | 14 | 52.3 | 82.8 (gaussian ARTMAP); 79.3 (fuzzy ARTMAP) | Muchoney and Williamson (2001)
Global land cover | AVHRR | 12 | 78† | 85† | Gopal et al. (1999)
Land and cloud | AVHRR + SMMR | 12 (8 clouds) | 84.8 | 52.7 (FF BP) | Key et al. (1989)
∗ Highest accuracy among different combinations of variables in the input (e.g., topographic variables).
† Accuracy based on training samples, not ground reference data.
TABLE 8.3 Comparison of Statistical Parametric Classifiers with Neural Network Classifiers
maximum likelihood, even though both are able to characterize land cover as continuous fields. The performance of neural network classifiers is also boosted by the use of texture, in addition to topographic data (Bischof et al., 1992). Without texture, the neural network classifier is barely more accurate than the maximum likelihood method in classifying Landsat TM data into four covers. With texture, the neural network accuracy is elevated to a level much higher than the parametric method. It must be noted that the superiority is also subject to the specific form of network implementation. For instance, a fuzzy implementation of ARTMAP achieved an accuracy 27 percent higher than its maximum likelihood counterpart in classifying NDVI data into 14 land covers (Muchoney and Williamson, 2001). The classification accuracy using gaussian ARTMAP is even higher than fuzzy ARTMAP, at 82.8 percent, thanks to the use of the a priori probabilities. In most applications both types of classifiers achieved a comparable performance. For instance, in classifying urban land use, accuracy is generally close for maximum likelihood and neural network classifiers, ranging from 60.6 to 89.5 percent (maximum likelihood) and from 72.8 to 89.8 percent (neural network) (Paola and Schowengerdt, 1995). However, the map produced by the neural network is visually more accurate. The performance of structured, probabilistic, and multilayer perceptron neural networks is extremely similar to the k-nearest neighbor classifier in mapping agricultural areas from airborne TM and SAR images (Serpico et al., 1996). The accuracy of both classifiers is highly comparable to one another, in the upper 80 percent range. The lowest accuracy is associated with the structured neural network. Even misclassifications are common to all classifiers. The comparable performance of the two types of classifiers is confirmed in mapping vegetation from TM imagery, a highly challenging task even with the incorporation of such ancillary data as topography (Carpenter et al., 1997). The best accuracy achieved with the maximum likelihood method is slightly lower than the highest overall accuracy achieved using the fuzzy ARTMAP classifier. This higher accuracy results from the inclusion of slope, aspect, shade, and elevation in the ANN-based classification. In a nine-category classification of Landsat TM data, the overall accuracy of 58.5 percent achieved from maximum likelihood is very close to the 60.5 percent from the backpropagation neural network (Yoshida and Omatu, 1994). Both accuracies improved after some pixels that were incorrectly classified previously were removed from the training samples. The superiority of the neural network method did not show up until the Kohonen self-organizing feature map was combined with the backpropagation algorithm. The combined neural network produced the most accurate results, at 85.5 percent, which are also more realistic and less noisy than those derived from the conventional statistical method.
However, the superior performance of the neural network is not guaranteed if the algorithm is not powerful enough. For instance, the learning vector quantization network produced lower classification accuracy than maximum likelihood (Erbek et al., 2004). The inferior performance of this network was substantiated by Hara et al. (1994), who obtained an accuracy of 72.6 percent in mapping terrain classes from polarimetric SAR images, lower than the 74.1 percent with the supervised maximum likelihood method. It is also possible for the neural network to perform much worse than the parametric method in certain applications, such as cloud mapping from coarse resolution AVHRR and Nimbus-7 multichannel microwave radiometer data (Key et al., 1989). The mapped land covers (4) and cloud types (8) agreed with manual interpretation at 53 percent, much lower than the 84.8 percent achieved with the maximum likelihood method. The lower accuracy is attributed to the inability of the neural network model to differentiate ice from cloud accurately, both having a similar spectral response. The neural network can also be less accurate than maximum likelihood in discriminating land covers. In mapping land covers into seven categories from TM data, the producer’s accuracy is higher for every category in the maximum likelihood–derived result except agricultural land (Civco, 1993). The user’s accuracy is similarly lower for all covers except barren land, sometimes even substantially so. The studies reviewed above (Table 8.3) indicate that both maximum likelihood and neural network classifiers can be optimal, depending on the nature of the data used. Maximum likelihood is superior to ANN if the land covers to be classified have a unimodal distribution, which makes them ideal to be classified using the maximum likelihood method (Javis and Stuart, 1996). It is inadequate to simply compare the numerical accuracy across the board as each classification was achieved with different parameterization. Although neural networks tend to be slightly more accurate in most cases, this accuracy level is not vastly better than that of the parametric method. Besides, additional data must be included in the input. There are two methods by which the performance of the neural network method can be certainly boosted:
• The first method is to use more external variables, such as topographic variables and even soil properties, in the mapping. Use of an array of pixels pooled together to form functional regions or sites is also conducive to the achievement of more accurate results if predictions are made at the site level instead of the usual pixel level (Carpenter et al., 1997), a topic to be covered in depth in Chap. 10.
• The second method is to combine different classifiers to form a hybridized classifier that is able to produce the best results. Higher accuracy can be expected from the combination of ANN classifiers with parametric classifiers such as maximum
likelihood, because their performance varies with different combinations of input variables (Hara et al., 1994). These two classifiers tend to make somewhat different predictive errors.
References Atkinson, P. M., and A. R. L. Tatnall. 1997. “Neural networks in remote sensing.” International Journal of Remote Sensing. 18(4):699–709. Benediktsson, J. A., J. R. Sveinsson, O. K. Ersoy, and P. H. Swain. 1997. “Parallel consensual neural networks.” IEEE Transactions on Neural Networks. 8(1):54–64. Benediktsson, J. A., P. H. Swain, and O. K. Ersoy. 1990. “Neural network approaches versus statistical methods in classification of multisource remote sensing data.” IEEE Transactions on Geoscience and Remote Sensing. 28(4):540–552. Benediktsson, J. A., P. H. Swain, and O. K. Ersoy. 1993. “Conjugate-gradient neural networks in classification of multisource and very-high-dimensional remote sensing data.” International Journal of Remote Sensing. 14(15):2883–2903. Bernard, A. C., G. G. Wilkinson, and I. Kanellopoulos. 1997. “Training strategies for neural network soft classification of remotely-sensed imagery.” International Journal of Remote Sensing. 18(8):1851–1856. Bischof, B., W. Schneider, and A. J. Pinz. 1992. “Multispectral classification of Landsat-images using neural networks.” IEEE Transactions on Geoscience and Remote Sensing. 30(3):482–490. Carpenter, G. A., M. N. Gjaja, S. Gopal, and C. E. Woodcock. 1997. “ART neural networks for remote sensing: Vegetation classification from Landsat TM and terrain data.” IEEE Transactions on Geoscience and Remote Sensing. 35(2):308–325. Carpenter, G. A., S. Gopal, S. Macomber, S. Martens, C. E. Woodcock, and J. Franklin. 1999. “A neural network method for efficient vegetation mapping.” Remote Sensing of Environment. 70(3):326–338. Chen, K. S., Y. C. Tzeng, C. F. Chen, and W. L. Kao. 1995. “Land-cover classification of multispectral imagery using a dynamic learning neural network.” Photogrammetric Engineering and Remote Sensing. 61(4):403–408. Chen, K. S., S. K. Yen, and D. W. Tsay. 1997. “Neural classification of SPOT imagery through integration of intensity and fractal information.” International Journal of Remote Sensing. 18(4):763–783. Civco, D. L. 1993. “Artificial neural networks for land-cover classification and mapping.” International Journal of Geographical Information Systems. 7(2):173–186. Desachy, J., L. Roux, and E. H. Zahzah. 1996. “Numeric and symbolic data fusion: A soft computing approach to remote sensing images analysis.” Pattern Recognition Letters. 17(13):1361–1378. Dreyer, P. 1993. “Classification of land cover using optimized neural nets on SPOT data.” Photogrammetric Engineering & Remote Sensing. 59(5):617–621. Erbek, F. S., C. O. Zkan, and M. Taberner. 2004. “Comparison of maximum likelihood classification method with supervised artificial neural network algorithms for land use activities.” International Journal of Remote Sensing. 25(9):1733–1748. Ersoy, O. K., and D. Hong. 1990. “Parallel, self-organizing, hierarchical neural networks.” IEEE Transactions on Neural Networks. 1(2):167–178. Fletcher, D., and Goss, E. 1993. “Forecasting with neural networks: An application using bankruptcy data.” Information and Management. 24:159–167. Foody, G. M., and M. K. Arora. 1997. “Evaluation of some factors affecting the accuracy of classification by an artificial neural network.” International Journal of Remote Sensing. 18(4):799–810. Frizzelle, B. G., and A. Moody. 2001. “Mapping continuous distributions of land cover: A comparison of maximum-likelihood estimation and artificial neural networks.” Photogrammetric Engineering and Remote Sensing. 67(6):693–705. Garson, G. D. 1998. 
Neural Networks: An Introductory Guide for Social Scientists. London: Sage.
German, G. W. H., and M. N. Gahegan. 1996. "Neural network architectures for the classification of temporal image sequences." Computers and Geosciences. 22(9):969–979.
Gopal, S., C. E. Woodcock, and A. H. Strahler. 1999. "Fuzzy neural network classification of global land cover from a 1 degrees AVHRR data set." Remote Sensing of Environment. 67(2):230–243.
Hara, Y., R. G. Atkins, S. H. Yueh, R. T. Shin, and J. A. Kong. 1994. "Application of neural networks to radar image classification." IEEE Transactions on Geoscience and Remote Sensing. 32(1):100–109.
Haykin, S. S. 1999. Neural Networks: A Comprehensive Foundation (2nd ed.). Upper Saddle River, NJ: Prentice Hall.
Heermann, P. D., and N. Khazenie. 1992. "Classification of multispectral remote sensing data using a back-propagation neural network." IEEE Transactions on Geoscience and Remote Sensing. 30(1):81–88.
Ho, T. B. 2004. Data mining with neural networks (Chap. 6). In Knowledge Discovery and Data Mining Techniques and Practice. http://www.netnam.vn/unescocourse/knowlegde/61.htm.
Hush, D. R. 1989. "Classification with neural networks: A performance analysis." Proceedings of the IEEE International Conference on Systems Engineering. Dayton, Ohio, U.S., pp. 277–280.
Javis, C. H., and N. Stuart. 1996. "The sensitivity of a neural network for classifying remotely sensed imagery." Computers and Geosciences. 22(9):959–967.
Jiang, X., and A. H. K. S. Wah. 2002. "Constructing and training feed-forward neural networks for pattern classification." Pattern Recognition. 36(4):853–867.
Kanellopoulos, I., A. Varfis, G. G. Wilkinson, and J. Megier. 1992. "Land-cover discrimination in SPOT HRV imagery using an artificial neural network—A 20 class experiment." International Journal of Remote Sensing. 13(5):917–924.
Kanellopoulos, I., and G. G. Wilkinson. 1997. "Strategies and best practice for neural network image classification." International Journal of Remote Sensing. 18(4):711–725.
Key, J., J. A. Maslanik, and A. J. Schweiger. 1989. "Classification of merged AVHRR and SMMR Arctic data with neural networks." Photogrammetric Engineering and Remote Sensing. 55(9):1331–1338.
Kohonen, T. 1984. Self Organization and Associative Memory. Berlin: Springer-Verlag.
Leverington, D. W., and C. R. Duguay. 1996. "Evaluation of three supervised classifiers in mapping 'depth to late-summer frozen ground,' central Yukon Territory." Canadian Journal of Remote Sensing. 22(2):163–174.
Liu, X. H., A. K. Skidmore, and H. van Oosten. 2002. "Integration of classification methods for improvement of land-cover map accuracy." ISPRS Journal of Photogrammetry and Remote Sensing. 56(4):257–268.
Liu, Z. K., and J. Y. Xiao. 1991. "Classification of remotely sensed image data using artificial neural networks." International Journal of Remote Sensing. 12(11):2433–2438.
Mather, P. M. 1987. Computer Processing of Remotely Sensed Images. Chichester, U.K.: Wiley-Interscience.
McCulloch, W. C., and W. Pitts. 1943. "A logical calculus of the ideas immanent in nervous activity." Bulletin of Mathematical Biophysics. 5:115–133.
Miller, D. M., E. J. Kaminsky, and S. Rana. 1995. "Neural network classification of remote sensing data." Computers and Geosciences. 21(3):377–386.
Muchoney, D., and J. Williamson. 2001. "A gaussian adaptive resonance theory neural network classification algorithm applied to supervised land cover mapping using multitemporal vegetation index data." IEEE Transactions on Geoscience and Remote Sensing. 39(9):1969–1977.
Murnion, S. D. 1996. "Comparison of back propagation and binary diamond neural networks in the classification of a Landsat TM image." Computers and Geosciences. 22(9):995–1001.
Orr, G., N. Schraudolph, and F. Cummins. 1999. CS-449: Neural networks, http://www.willamette.edu/~gorr/classes/cs449/linear2.html.
Paola, J. D., and R. A. Schowengerdt. 1995. "A detailed comparison of backpropagation neural network and maximum-likelihood classifiers for urban land use classification." IEEE Transactions on Geoscience and Remote Sensing. 33(4):981–996.
Paola, J. D., and R. A. Schowengerdt. 1997. "The effect of neural-network structure on a multispectral land-use/land-cover classification." Photogrammetric Engineering and Remote Sensing. 63(5):535–544.
Rumelhart, D. E., G. E. Hinton, and R. J. Williams. 1986. "Learning internal representations by error propagation." In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, ed. D. E. Rumelhart and J. McClelland, 1:318–362. Cambridge, MA: MIT Press.
Salu, Y., and J. Tilton. 1993. "Classification of multispectral image data by the binary diamond neural network and by nonparametric, pixel-by-pixel methods." IEEE Transactions on Geoscience and Remote Sensing. 31(3):606–617.
Sarle, W. S. (ed.). 2002. Neural network FAQ, part 1 of 7: Introduction, periodic posting to the Usenet newsgroup comp.ai.neural-nets, ftp://ftp.sas.com/pub/neural/FAQ.html.
Schaale, M., and R. Furrer. 1995. "Land surface classification by neural networks." International Journal of Remote Sensing. 16(16):3003–3031.
Serpico, S. B., and F. Roli. 1995. "Classification of multisensor remote-sensing images by structured neural networks." IEEE Transactions on Geoscience and Remote Sensing. 33(3):562–578.
Serpico, S. B., L. Bruzzone, and F. Roli. 1996. "An experimental comparison of neural and statistical non-parametric algorithms for supervised classification of remote-sensing images." Pattern Recognition Letters. 17(13):1331–1341.
Seto, K. C., and W. Liu. 2003. "Comparing ARTMAP neural network with the maximum-likelihood classifier for detecting urban change." Photogrammetric Engineering and Remote Sensing. 69(9):981–990.
Skidmore, A., B. Turner, W. Brinkhof, and E. Knowles. 1997. "Performance of a neural network: Mapping forests using GIS and remotely sensed data." Photogrammetric Engineering and Remote Sensing. 63:501–514.
Stergiou, C., and D. Siganos. Neural networks, http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html.
Sun, C., C. M. U. Neale, J. J. McDonnell, and H.-D. Cheng. 1997. "Monitoring land-surface snow conditions from SSM/I data using an artificial neural network classifier." IEEE Transactions on Geoscience and Remote Sensing. 35(4):801–809.
Tatem, A. J., H. G. Lewis, P. M. Atkinson, and M. S. Nixon. 2002. "Super-resolution land cover pattern prediction using a Hopfield neural network." Remote Sensing of Environment. 79(1):1–14.
Tzeng, Y. C., K. S. Chen, W.-L. Kao, and A. K. Fung. 1994. "Dynamic learning neural network for remote sensing applications." IEEE Transactions on Geoscience and Remote Sensing. 32(5):1096–1102.
van Coillie, F. M. B., L. P. C. Verbeke, and R. R. de Wulf. 2004. "Previously trained neural networks as ensemble members: Knowledge extraction and transfer." International Journal of Remote Sensing. 25(21):4843–4850.
Williamson, J. R. 1995. Gaussian ARTMAP: A Neural Network for Fast Incremental Learning of Noisy Multidimensional Maps. Technical Report CAS/CNS-95-003. Boston: Boston University, Center of Adaptive Systems and Department of Cognitive and Neural Systems.
Yoshida, T., and S. Omatu. 1994. "Neural network approach to land cover mapping." IEEE Transactions on Geoscience and Remote Sensing. 32(5):1103–1108.
Yuan, H. C., F. L. Xiong, and X. Y. Huai. 2003. "A method for estimating the number of hidden neurons in feed-forward neural networks based on information entropy." Computers and Electronics in Agriculture. 40(1–3):57–64.
Zhang, Y., J. Gao, and J. Wang. 2006. "Detailed mapping of a salt farm from Landsat TM imagery using neural network and maximum likelihood classifiers: A comparison." International Journal of Remote Sensing. 28(10):2077–2089.
CHAPTER 9
Decision Tree Image Analysis
Decision tree image analysis has been used to map land covers from satellite data since the early 1980s. These classifiers have the advantages of artificial neural network (ANN) algorithms while avoiding some of their limitations. In this chapter the fundamentals of decision trees are introduced first. Various types of decision trees based on one or more variables are presented next. The focus of this chapter then shifts to construction of decision trees. Issues related to building a tree, such as feature selection and node splitting rules, are discussed. The common decision trees that have found applications in image classification are introduced in the next section. Finally, this chapter concentrates on decision tree–based image classification, including the main characteristics of decision tree classifiers. Emphasis of the discussion is placed on the classification accuracy level achieved and its relative performance with respect to parametric classification methods such as the maximum likelihood classifier.
9.1 Fundamentals of Decision Trees
A decision tree describes the conditions under which a set of constituent attributes is abstracted into a set of general informational classes. Conditions are more detailed aspects of the environment that can be expressed as inequality relationships. The structure of a decision tree typically comprises variables, nodes, branches, and leaves. Variables fall into several categories, such as predictor variables referring to the input data (i.e., multispectral bands) and target variables representing the land covers to be mapped (Breiman et al., 1984). User-defined variables may include raster images, vector geographic information system (GIS) layers, spatial models, external programs, and simple scalars. There are several categories of nodes as well, such as the root node, internal nodes, and terminal nodes, depending on their position in the tree. A decision tree has only one root node, which contains all the input data.
FIGURE 9.1 A typical example of a decision tree branch. Conditions (e.g., distance to airport <20 km, distance to highway <1 km, slope gradient <15°, orientation = north) feed rules (e.g., easy accessibility), which in turn support a hypothesis (e.g., suitable location).
Internal nodes, also called splits, vary in number. At these decision nodes, a test is carried out on a single attribute value, with one branch and subtree for each possible outcome of the test. Every internal node in a decision tree is associated with one parental node and two or more descendant nodes. Terminal nodes, or leaf nodes, are the leaves of the tree and represent pixel labels; they do not have any descendant nodes.
Branches are made up of three components: hypotheses, rules, and conditions (Fig. 9.1). A hypothesis is a condition or state concerning a special attribute of the remote sensing data to be classified. Rules are conditional statements that produce an outcome based on the evaluation of their conditions. Each rule can be associated with several conditions, all of which must be true to fire the rule. When multiple rules lead to the same hypothesis, however, only one of them needs to be true to satisfy it. When the hypothesis of one rule is referred to by a condition of another rule, the decision tree grows in depth. The terminal nodes, or the hypotheses at the bottom of the tree, represent the final classes of interest. Intermediate hypotheses may also be flagged as classes of interest; this may occur when there is an association between classes.
Many rules may be needed for the reliable and accurate identification of one land cover, and multiple rules can be combined in reaching a conclusion. There is no limit to the number of rules that can be logically combined. The rule of thumb is that the more rules included in reaching a decision about a class, the more likely the conclusion will be correct (i.e., the more likely the pixel under examination will be labeled correctly). These rules must be mutually exclusive. Multiple rules can be organized using various strategies, the most popular being the tree, or hierarchical, structure. In this method rules are organized into layers; in a top-down manner, the execution of a rule at a top layer triggers the execution of other rules at the lower layers. In a hierarchical multilayer decision tree, there are more branches at a lower level than at a higher level. The tree is commonly called binary if there are only two branches at every split (Fig. 9.2).
FIGURE 9.2 An example of a binary decision tree. At each internal node, represented by an oval box, there are only two choices; a decision is made by comparing one aspect of the data with a threshold. The dataset is split based on the outcome of the evaluation. The leaf node refers to the class value or label assigned to each observation.
Namely, a dichotomous decision is made at each internal node to separate one class, or some of the classes, from the remaining classes. In this way the input remote sensing data are successively partitioned into smaller and purer subdivisions, each of which is represented by an internal node in the tree.
9.2 Types of Decision Trees
There are many types of decision trees. Trees can be classified according to different criteria, such as the variables used for testing at each internal node, the homogeneity of rules across all branches, and the nature of the terminal nodes. Accordingly, trees can be classified as univariate or multivariate, discrete or regressional, and homogeneous or hybrid. They are elaborated on below.
9.2.1 Univariate Decision Trees
A univariate decision tree is a particular type of tree in which the decision at every internal node is based on the evaluation of the same
single variable, even though it is possible for different attributes of the variable to be tested at different nodes. At each node a simple test in the form of a Boolean comparison is performed. This comparison usually looks like
$$x_i \le b \qquad (9.1)$$
where xi is an ordered variable or a feature in the data space, and b is a threshold objectively estimated from the distribution of xi by using a number of measures, all of which maximize the dissimilarity and minimize the similarity of the descendant nodes. A popular method of determining b is via a histogram. Decisions are made in accordance with the outcome of the test. Each test is normally associated with a limited number of outcomes. In a binary univariate tree, a pixel is allocated to the left node at the next level if the evaluation is found to be true, or to the right node otherwise (see Fig. 9.2). In this way the input data are partitioned into two or more homogeneous subsets. This process is recursively executed until a leaf node is reached. At this level the class value of the leaf is used to label the pixel in question by means of an allocation strategy, such as majority voting.
A good example of a univariate decision tree is the use of the normalized difference vegetation index (NDVI) in labeling pixels for land cover mapping. For instance, if NDVI is >0.7, then land cover = forest; otherwise, if NDVI is >0.4 and ≤0.7, then land cover = shrub; otherwise, if NDVI is >0.2 and ≤0.4, then land cover = pasture; otherwise, if NDVI is >0.1 and ≤0.2, then land cover = degraded grassland; otherwise, if elevation is >100, then land cover = barren or desert; otherwise land cover = water (Fig. 9.2).
The decision boundary at each node of a univariate decision tree can be derived from training samples empirically. Orthogonal to the axis of the selected attribute or variable, these boundaries partition the data or feature space into hypercuboids, with each hypercuboid corresponding to an information class (Fig. 9.3). An accuracy of 100 percent is achieved if all observations are linearly separable. As demonstrated in the NDVI example above, it is allowable to combine a few tests against the same variable in reaching a decision at an internal node. No matter how many tests are undertaken in reaching the decision, the decision boundaries in a univariate tree are always parallel to one another and to one of the attribute axes. They can be properly defined via a combination of features. This kind of tree is suitable for situations where only a small number of features are involved, where only a few features are tested at each stage, and where the features do not interact with each other. Even so, it can be computationally complex: the order of computation increases exponentially with the number of categories for a categorical variable (Loh and Shih, 1997). Most critically, its predictive performance can be poor (Breiman et al., 1984), a shortcoming that may be overcome with multivariate decision trees.
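The NDVI rule set above translates directly into nested threshold tests. Below is a minimal Python sketch of such a univariate tree applied pixel by pixel with NumPy; the thresholds restate the example, while the array shapes and values are illustrative assumptions.

```python
# A minimal sketch of the univariate NDVI decision rules described above.
import numpy as np

def classify_pixel(ndvi: float, elevation: float) -> str:
    """Label one pixel with the univariate threshold tests of Fig. 9.2."""
    if ndvi > 0.7:
        return "forest"
    elif ndvi > 0.4:
        return "shrub"
    elif ndvi > 0.2:
        return "pasture"
    elif ndvi > 0.1:
        return "degraded grassland"
    elif elevation > 100:
        return "barren"
    else:
        return "water"

# Example: classify a small 2 x 2 grid of NDVI and elevation values (synthetic).
ndvi = np.array([[0.85, 0.35], [0.15, 0.05]])
elev = np.array([[300.0, 50.0], [120.0, 2.0]])
labels = np.vectorize(classify_pixel)(ndvi, elev)
print(labels)
```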
FIGURE 9.3 Axis-parallel decision boundaries of a univariate decision tree. The feature space defined by variables A and B is partitioned into rectangular regions, each corresponding to a class.

9.2.2 Multivariate Decision Trees
In a multivariate decision tree the decision making at each internal node is based on a test against at least two attributes of the input data simultaneously, in contrast with the univariate decision tree that applies a test to a single attribute at a time. Unique to a multivariate decision tree is that the decision rule at each internal node is created through a linear combination of several features, for example by using a linear discriminant function. The splitting rule at an internal node takes the following form:

$$\sum_{i=1}^{n} a_i x_i \le b \qquad (9.2)$$
where
xi = the measurement on the ith of the n selected attributes
a = the vector of coefficients {a1, a2, …, an} of a linear discriminant function, usually estimated from the training data
b = a threshold value

This decision rule is more flexible than the one in Eq. (9.1). As with univariate decision trees, the actual value of b used in the decision rule at each internal node of a decision tree is estimated statistically from training data or automatically using a machine learning algorithm. Owing to the use of this linear discriminant function, the decision boundary is no longer orthogonal to the feature axes (Fig. 9.4), nor do the decision boundaries intersect each other at right angles anymore. Nevertheless, the decision boundaries are always linear. Thus, the performance of a multivariate decision tree decreases if the data are not linearly separable.
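The splitting rule of Eq. (9.2) can be evaluated with a single dot product. The following sketch assumes only NumPy; the coefficient vector and threshold are illustrative (they echo the values quoted in the Fig. 9.5 caption) rather than coefficients estimated from real training data.

```python
# A minimal sketch of the multivariate split of Eq. (9.2): a pixel is sent to the
# left branch when the linear combination of its features falls below a threshold.
import numpy as np

def multivariate_split(X: np.ndarray, a: np.ndarray, b: float) -> np.ndarray:
    """Return a boolean mask: True -> left branch (sum_i a_i * x_i <= b)."""
    return X @ a <= b

# Example: four pixels with two features (elevation in km, NDVI); values are synthetic.
X = np.array([[2.8, 0.2],
              [1.0, 0.8],
              [3.0, 0.3],
              [1.5, 0.6]])
a = np.array([1.0, 4.52])   # assumed coefficients, cf. the Fig. 9.5 caption
b = 3.8                     # assumed threshold
print(multivariate_split(X, a, b))
```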
FIGURE 9.4 Decision boundaries of a multivariate decision tree classifier. The boundaries in the space of attributes A and B are linear but no longer parallel to the attribute axes.
The same dataset can be partitioned into purer subsets using either a univariate decision tree or a multivariate tree (Fig. 9.5). Multivariate decision trees are more compact than univariate trees owing to the simultaneous use of multiple variables at the same internal node in the decision making. This increases the complexity of the splitting rules, and makes multivariate trees more difficult to interpret than univariate trees.
FIGURE 9.5 (a) Comparison of decision boundaries in a multivariate decision tree (dotted line) and a univariate decision tree (dashed lines). The dataset can be classified using a linear combination of the two features (a single threshold of 3.8 applied to elevation + 4.52 NDVI separates woodland from grassland); (b) the same decision rule can be expressed as a series of inequality statements.
As shown in Fig. 9.5, the application of the linear discriminant function can separate grassland from woodland neatly. However, the same level of separation cannot be achieved from a series of inequality statements in the univariate decision tree because its decision boundaries are always orthogonal to each other. In fact, multivariate trees have been found to produce more accurate classifications than univariate decision trees (Brodley and Utgoff, 1992). However, the performance of multivariate trees is subject to a number of factors, such as the decision algorithms at internal nodes. These algorithms govern how the input data should be split properly. Estimates of splitting rules (to be discussed in Sec. 9.3.4) vary with the nature of the remote sensing data and the complexity of the classification scheme. The split at each internal node of a multivariate decision tree is based on more than one feature that can be selected from a large pool of candidate features. Essentially, multivariate tree algorithms perform local rather than global feature selection: the selection of features in each test is based on the data observed at a particular node, and the selected features are likely to vary from node to node. It is not possible to select a uniform set of features on which tests for the entire tree are based. Thus, there is no guarantee that a multivariate decision tree always outperforms a univariate tree in terms of predictive accuracy in image classification (Pal and Mather, 2003). With hyperspectral data of tens or even hundreds of spectral bands, multivariate decision trees do not produce more accurate results than univariate decision trees in land cover classification. Although both univariate and multivariate trees may produce an identical classification from the same input data, the implementation in the multivariate format can be more efficient. It is also possible to use different classification algorithms at different nodes of a decision tree classifier, which leads to hybrid decision trees.
9.2.3 Hybrid Decision Trees
In a decision tree classifier, decision-making algorithms in estimating the splits at internal nodes have varying degrees of homogeneity. If these algorithms are uniform at every internal node, the decision tree is termed homogeneous. In contrast, a heterogeneous, or a hybrid, decision tree uses various hypotheses for different branches, causing the decision-making mechanism to vary from branch to branch (Fig. 9.6) (Friedl and Brodley, 1997). With the assistance of the statistical model or “learning algorithm” used, it is possible to adopt a unique splitting method at each branch of a large decision tree (Brodley, 1995). Commonly used classification algorithms (e.g., linear discriminant functions and k nearest-neighbor clustering) may be combined with learning algorithms to form a hybrid tree classifier. Another method of hybridizing decision-making algorithms is to combine a regular decision tree with a neural network classifier to provide confidence or uncertainty information via majority voting.
FIGURE 9.6 A hybrid decision tree (DT) classifier that contains different classifiers in different subtrees. In this case the right branch uses the K-means clustering algorithm, and the left one uses an ANN algorithm.
Although a hybrid decision tree is considerably more complex than its homogeneous counterpart, it is still a worthwhile and viable option in image classification because hybridization brings two advantages (a minimal sketch of such a tree follows this list):
• First, the performance of an algorithm varies with the dataset to be classified. Integrating different classification algorithms into a single hybrid tree creates an opportunity for the dataset to be classified with the most appropriate algorithm at each branch. For instance, a subset of classes, a subset of the data, or both may be classified more accurately by a different algorithm, resulting in more accurate classification results overall (Brodley, 1995).
• Second, the use of hybrid splitting rules improves the flexibility of a decision tree. In complex classification problems, subsets of the decision space (i.e., subtrees) can be adapted to characteristics specific to different subsets of the data.
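A hedged sketch of one such hybrid arrangement is given below, assuming scikit-learn is available: a single threshold test at the root routes pixels either to a k-nearest-neighbor classifier or to a small neural network. The threshold, the choice of sub-classifiers, and the synthetic data are illustrative assumptions, not the specific hybrid of Fig. 9.6.

```python
# A minimal sketch of a hybrid tree: different classifiers in different branches.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(5)
X = rng.random((400, 5))            # 5 features; column 0 is taken to be NDVI (synthetic)
y = (X[:, 1] + X[:, 2] > 1.0).astype(int)

root_mask = X[:, 0] > 0.5           # root node: a simple NDVI threshold test
left = KNeighborsClassifier(n_neighbors=5).fit(X[root_mask], y[root_mask])
right = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                      random_state=0).fit(X[~root_mask], y[~root_mask])

def hybrid_predict(X_new: np.ndarray) -> np.ndarray:
    """Route each pixel down the branch whose classifier labels it."""
    mask = X_new[:, 0] > 0.5
    out = np.empty(len(X_new), dtype=int)
    if mask.any():
        out[mask] = left.predict(X_new[mask])
    if (~mask).any():
        out[~mask] = right.predict(X_new[~mask])
    return out

print(hybrid_predict(rng.random((6, 5))))
```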
9.2.4 Regression Trees
When the target variable is discrete or categorical (e.g., a class attribute in a land cover classification), a tree is known as decision tree classification. When target variables are continuous, it is known as decision tree regression (Xu et al., 2005). A regression tree, also called decision
tree regression, is a variant of decision tree classifiers that is designed to approximate real-valued outcomes such as the proportion of a land cover within a mixed pixel. The implicit assumption underlying the decision tree regression approach is that relationships between features (i.e., spectral bands) and target objects (i.e., class proportions) can be described either linearly or nonlinearly. Therefore, the complex nonlinear relationships between spectral bands and class proportions of mixed pixels in remote sensing images are handled competently by the decision tree regression approach. Regression decision trees are suited to soft classification of remote sensing data in which mixed pixels are decomposed into constituent covers. In a soft classification, pixel values become the predictor variables, or the feature vector, and the known class proportions of a pixel in the training sample become the target variables, or the target vector (Xu et al., 2005). On the basis of training samples, a separate tree must be constructed for each class in the classification scheme in order to perform a soft classification (Fig. 9.7). Thus, M trees are needed in a classification of M classes. Class proportions expressed on a scale of 0 to 1 are predicted from the input pixel values according to each regression tree. The predicted class proportions for each tree, denoted as DT(i) (i = 1, ..., M), are normalized using the following equation to identify its proportion on the ground:
$$P(i) = \frac{DT(i)}{\sum_{i} DT(i)} \qquad (i = 1, 2, \ldots, M) \qquad (9.3)$$
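A minimal sketch of this per-class regression-tree scheme and the normalization of Eq. (9.3) is given below, assuming scikit-learn is available. DecisionTreeRegressor is used as a generic stand-in for the regression trees discussed in the text, and the synthetic bands and proportions are placeholders for real training data.

```python
# One regression tree per class; predictions normalized per pixel, cf. Eq. (9.3).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X_train = rng.random((200, 6))              # 200 training pixels, 6 bands (synthetic)
P_train = rng.dirichlet(np.ones(3), 200)    # known proportions of M = 3 classes

trees = []
for i in range(P_train.shape[1]):           # one tree per class
    trees.append(DecisionTreeRegressor(max_depth=5).fit(X_train, P_train[:, i]))

X_new = rng.random((5, 6))                                   # 5 unseen pixels
raw = np.column_stack([t.predict(X_new) for t in trees])     # DT(i) per pixel
raw = np.clip(raw, 0, None)                                  # guard against negatives
P = raw / raw.sum(axis=1, keepdims=True)                     # Eq. (9.3) normalization
print(P.round(3))
```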
FIGURE 9.7 Decision tree regression approach for soft classification of remote sensing data. Training data are drawn from the image; the reference data are the known class proportions. (Source: Xu et al., 2005.)
9.3 Construction of Decision Trees
The construction of a sound decision tree requires a large set of training samples representative of the entire dataset. Each of these samples contains the input feature vector (i.e., a number of spectral bands) and the target vector (i.e., the ground covers). These training samples must be of high quality as they are used to "learn" the relationship among the features and classes present within the data. Initially, a large portion of the data is used to determine the structure of the tree. This is done by breaking down these data using every possible binary split at an internal node. The training data are used to estimate the splits at each internal node via a statistical procedure. The splitting process is then applied to each of the new branches iteratively. The process continues until each node reaches a user-specified minimum node size (i.e., the number of training samples at the node) and becomes a terminal node.
There are a number of issues involved in the construction of a decision tree for image classification, such as the method of construction, feature selection (i.e., which features should be selected at which level of splits), tree pruning, and refinement. Depending upon the construction method adopted, specific issues may arise later in the process of construction. Common to all methods is how the input remote sensing data should be split into homogeneous subsets so that each one approximates a certain category of the land cover classes to be mapped.
9.3.1 Construction Methods
Numerous approaches have been developed to construct decision trees. The various heuristic methods for designing decision trees fall roughly into four primary categories: bottom-up, top-down, hybrid, and growing-pruning (Safavian and Landgrebe, 1991). The bottom-up method starts with the information classes (e.g., terminal or leaf nodes) and amalgamates them successively until the root node encompasses all classes. The criterion for combination is some kind of distance measure, such as the Mahalanobis distance. After pairwise distances among all classes are computed, the two classes with the shortest distance are joined together to form a larger and more generalized class. This process is repeated with updated distances until there is only one class left at the root node. In essence, this is unsupervised hierarchical clustering analysis. The top-down approach starts from the root node where the input samples are partitioned using a splitting rule until they can no longer be subdivided or the stopping criterion has been reached. The sample data are used to grow the tree to the maximum first, without any stopping rules. This tree is then pruned back by removing insignificant
branches to achieve the optimal structure. The three essential tasks in the construction are
• Selection of node splitting criteria
• Specification of the stopping rule
• Labeling of terminal nodes
Of the three tasks, the first is so complex that it warrants an in-depth discussion under a separate heading (Sec. 9.3.4). By comparison, the last is the easiest to accomplish by assigning terminal nodes to the classes of the highest probability. A common method of estimating the probability is to divide the number of samples for each class by the total number of samples at that specific terminal node. The stopping rule dictates how data subdivision should be terminated. The general principle is that termination should occur when newly generated splits lead to little gain in the overall accuracy of subsequent splits. For instance, it is not economical to increase the number of splits from 11 to 12 if the prediction accuracy rises only marginally from 89 to 89.3 percent. Data partitioning may be terminated by specifying a predefined maximum level of nodes or a minimum number of objects per node, at which point the nodes are turned into terminal nodes even if the data at these nodes could potentially be split further. With these stopping rules, it is possible to prevent the tree from overfitting. However, the drawback of this threshold method is that the stop may occur prematurely at some nodes (e.g., the tree is unable to perfectly classify the training data) but too late at some others. In either case the so-constructed tree is rarely optimal and has to be optimized.
The hybrid method, proposed by Kim and Landgrebe (1990), integrates the bottom-up and top-down approaches sequentially. In this combination the bottom-up approach supplies information on the shape and center of the clusters that are formed in the clustering analysis of the top-down approach. This integrated approach is able to expedite convergence. The implementation of this method starts with the entire dataset, which is first grouped into two clusters in a bottom-up approach. The mean and covariance of each cluster are then calculated and used to generate two new clusters in a top-down algorithm. Every cluster is examined to determine whether it contains only one class. If so, it is labeled as terminal; otherwise the process is repeated until all clusters are labeled as terminal (Safavian and Landgrebe, 1991).
The growing-pruning method is very similar to the top-down approach in that the tree is allowed to grow to the maximum using the training data and is then pruned back using the validation data, so it will not be repeated here.
9.3.2 Feature Selection
The first step in constructing a decision tree is to determine which attribute among all those available should be tested at which node. In
a decision tree, features that make a significant contribution to the variance of training data are selected for classification and the remaining features that contribute little are rejected, thereby increasing the computational efficiency. Feature selection and classification are performed simultaneously. Moreover, the sequence of selecting a feature should reflect its importance. Namely, more important features should be selected at a split higher in the tree than less important ones in the decision-making process. At a given node the selected attribute should be the most effective for splitting the dataset.
Several measures are available for feature selection at each internal node within a multivariate decision tree (Friedl and Brodley, 1997). The most commonly used ones are information gain and the information gain ratio. Information gain is a statistical parameter indicative of the effectiveness of an attribute in classifying the training data. It measures how well a given attribute separates the training samples according to their target classification. Used in every step of growing a tree, information gain can be precisely measured by entropy. In information theory, entropy indicates the amount of information that is missing before reception. In image classification it is an indicator of the purity or homogeneity of training samples. The gain in information from splitting the training samples according to an attribute is evaluated against the calculated entropy. A comparison of information gain among all the attributes reveals the most relevant one, which should be selected and tested at or near the root node of the tree.
In order to understand the mathematical underpinning of entropy, let us suppose sample T has k possible values for the target attribute, {C1, C2, …, Ck}. Its entropy is calculated as
$$\text{Entropy}(T) = -\sum_{i=1}^{k} p_i \log_2 p_i \qquad (9.4)$$
where pi stands for the proportion or probability of T belonging to class i; it is calculated using Eq. (9.5). Note that the logarithm has a base of 2 because entropy is a measure of the expected encoding length in bits. Entropy reaches its maximum (log2 k, or 1 in the two-class case) when the target attribute is equally distributed, and falls to zero when only one value is present. Thus, a larger entropy value is indicative of a more uniform distribution, and a smaller entropy value suggests a more uneven distribution.
$$p_i = \frac{\text{freq}(C_i, T)}{|T|} \qquad (9.5)$$
where freq (Ci, T) refers to the number of cases in T belonging to class Ci, and |T| is the total number of observations in T. In a test, the total information gain after applying A to partition T into n subgroups is defined as
$$\text{Gain}(T, A) = \text{Entropy}(T) - \sum_{i=1}^{n} \frac{|T_i|}{|T|} \times \text{Entropy}(T_i) \qquad (9.6)$$
Gain(T, A) is the information provided about the target function value, given the value of some other attribute A. The second term in the above equation represents the expected value of entropy after T is partitioned using attribute A; it is the sum of the entropy of each subset Ti, weighted by its proportion in the entire dataset, |Ti|/|T|. Because information gain is biased toward attributes with many distinct values, it should be normalized to compensate for tests with a large number of splits; otherwise information gain is not appropriate for such variables. In order to prevent this situation, it is better to use the information gain ratio, which is obtained by dividing the information gain by the split information, or
$$\text{Gain ratio}(X) = \frac{\text{Gain}(X)}{\text{Split info}(X)} \qquad (9.7)$$

where

$$\text{Split info}(T) = -\sum_{i=1}^{n} \frac{|T_i|}{|T|} \times \log_2\!\left(\frac{|T_i|}{|T|}\right) \qquad (9.8)$$
The process of selecting a new attribute and partitioning the training examples is repeated for each nonterminal descendant node using only the training examples associated with that node. Attributes that have already been incorporated in the tree are automatically excluded from further consideration. Thus, an attribute can appear only once along any path through the tree (Ho, 2004). This process continues for every new leaf node until either of the following two conditions is met:
• Every attribute has already been included along this path through the tree.
• The training samples associated with this leaf node all have the same target value (i.e., their entropy is zero). This means that all observations at the leaf node represent the same class and no information is gained from splitting them again.
In addition to information gain and information gain ratio, other measures such as chi-square and the Gini index have also been used for feature selection. Nevertheless, the choice of measure does not exert any noticeable impact on land cover classification accuracy (Pal and Mather, 2003). Different measures achieve nearly the same accuracy level, even though the information gain ratio is slightly more accurate than the other three.
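The entropy, information gain, and gain ratio of Eqs. (9.4) to (9.8) can be computed with a few lines of NumPy. The sketch below is illustrative only; the toy labels and attribute values at the end are invented for demonstration.

```python
# A minimal sketch of entropy, information gain, and gain ratio for a categorical attribute.
import numpy as np

def entropy(labels):
    """Entropy of a list of class labels, Eq. (9.4)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(labels, attribute):
    """Gain(T, A) of Eq. (9.6): entropy reduction from splitting on attribute A."""
    labels, attribute = np.asarray(labels), np.asarray(attribute)
    expected = 0.0
    for value in np.unique(attribute):
        subset = labels[attribute == value]
        expected += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - expected

def gain_ratio(labels, attribute):
    """Eq. (9.7): information gain divided by the split information of Eq. (9.8)."""
    split_info = entropy(attribute)   # entropy of the attribute's own value distribution
    gain = information_gain(labels, attribute)
    return gain / split_info if split_info > 0 else 0.0

# Illustrative use with invented data.
covers = ["W", "W", "G", "G", "W", "G"]
texture = ["coarse", "coarse", "fine", "fine", "medium", "fine"]
print(information_gain(covers, texture), gain_ratio(covers, texture))
```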
9.3.3 An Example
In this example, T is a collection of training samples comprising 16 pixels of two land cover identities, woodland and grassland (Table 9.1). Of these 16 pixels, 10 represent woodland (W) and 6 are grassland (G). In terms of land cover, T is divided into two groups, T = [10W, 6G]. Thus, the entropy of the 16 pixels is calculated as
Entropy([10W, 6G]) = −(10/16) log2(10/16) − (6/16) log2(6/16) = 0.954

There are four attributes defining the properties of these pixels: texture, elevation, tone, and pattern. All of them are categorical in nature. There are two types of pattern, vague and definite. The two subsets of T partitioned by these two values look like
Tvague → [3W, 5G]
Tdefinite → [7W, 1G]
Pixel   Texture   Elevation   Tone    Pattern    Land Cover
1       Medium    High        Dark    Definite   Grassland
2       Fine      Low         Light   Vague      Grassland
3       Coarse    Low         Dark    Definite   Woodland
4       Fine      Medium      Light   Definite   Woodland
5       Coarse    High        Dark    Vague      Woodland
6       Medium    High        Dark    Definite   Woodland
7       Fine      Medium      Light   Vague      Grassland
8       Coarse    High        Dark    Definite   Woodland
9       Medium    Low         Dark    Vague      Woodland
10      Fine      Medium      Light   Definite   Woodland
11      Medium    Low         Dark    Vague      Grassland
12      Coarse    Low         Light   Vague      Woodland
13      Fine      Medium      Light   Vague      Grassland
14      Coarse    High        Dark    Definite   Woodland
15      Coarse    Medium      Dark    Definite   Woodland
16      Fine      High        Light   Vague      Grassland

TABLE 9.1 Training Samples for the Target Variable Land Cover
Entropy(Tvague) = −(3/8) log2(3/8) − (5/8) log2(5/8) = 0.954
Entropy(Tdefinite) = −(7/8) log2(7/8) − (1/8) log2(1/8) = 0.544

Thus,

$$\text{Gain}(T, \text{pattern}) = \text{Entropy}(T) - \sum_{i=1}^{n} \frac{|T_i|}{|T|} \times \text{Entropy}(T_i) = 0.954 - \frac{8}{16} \times 0.954 - \frac{8}{16} \times 0.544 = 0.205$$
Similarly,

Gain(T, texture) = 0.954 − (6/16) × 0.918 − (4/16) × 0 − (6/16) × 1 = 0.235
Gain(T, tone) = 0.954 − (7/16) × 0.985 − (9/16) × 0.764 = 0.093
Gain(T, elevation) = 0.954 − (5/16) × 0.971 × 2 − (6/16) × 0.918 = 0.003

Feature selection follows the principle that the selected feature should reduce the entropy of the descendant nodes the most during data subdivision. Therefore, elevation, as the least important feature, is excluded from consideration in the classification. By comparison, texture is the most significant attribute, with the highest information gain, and thus should be tested at the top of the tree. The other two attributes (pattern and tone) are tested on two of the branches of this root node. These two split nodes have two branches each, each corresponding to a unique target value in the table, and their terminal nodes represent pure datasets equivalent to the land covers in the table. The final constructed decision tree is presented in Fig. 9.8.
The entire process of constructing a basic decision tree as described above can be summarized in the six steps below:
• Start with the root, or ancestor, node at the top of the tree.
• Calculate the information gain for all the attributes and determine the attribute with the highest information gain (texture in the example). Attach it to the ancestor node.
• Create a descendant branch for every possible value of the above attribute (three for texture in the example).
• Attach a target value to every descendant node in the branch if the attribute value matches that of the node.
• If the samples attached to a child node can be classified uniquely (the coarse branch in the example), add that classification to the node and mark it as a leaf node.
• Iterate the process from the second step for the attributes (e.g., tone and pattern) that have not yet been added to the tree until every target value is attached to a terminal node.
Although the above steps are carried out manually in this example, the process can be fully automated by a computer script.

FIGURE 9.8 A decision tree for the classification of land cover based on data provided in Table 9.1 (W = woodland; G = grassland).
9.3.4 Node Splitting Rules
After the attribute with the largest information gain has been determined and selected at an internal node, the next important consideration is to design an appropriate splitting rule through which the input data are partitioned into increasingly homogeneous subsets, each of which could correspond to a meaningful information class on the ground if sufficiently subdivided. The method used to estimate the splits at each internal node of the tree is important to image classification as it controls the pace of improvement in predictive accuracy. Many algorithms have been developed to establish the splitting rule at internal nodes while growing a decision tree. Of these rules, the four most common ones are the Gini index, entropy, twoing, and class probability (Zambon et al., 2006). As node impurity measures, they all shed light on the relative homogeneity of cases in the terminal nodes. In order to understand their mathematical underpinning, let us consider a land cover variable that has J classes (j = 1, 2, ..., J). At each node t of the tree to be constructed, the target variable has a probability for each class, denoted p(1/t), p(2/t), …, p(J/t).
The Gini index is defined as

$$\text{Gini}(t) = \sum_{i} p(i/t)\left[1 - p(i/t)\right] \qquad (9.9)$$
where p(i/t) stands for the relative frequency of class i at node t (Breiman, 1996). The Gini measure is computed as the sum of products of all pairs of class proportions, derivable for every class present at the node. This impurity, or heterogeneity, index is associated with every split at a node. The selected split partitions the data into two disjoint subsets in such a way that the sum of the squared deviations from the mean in the separate parts is minimized. The goodness of a split is measured against the reduction in impurity at the descendant nodes. Thus, the split aims to maximize the value of ΔR(i, t), or

$$M(i) = \sum_{t \in T} \Delta R(i, t) = \sum_{t \in T} \left[\text{Gini}(t) - \text{Gini}(t_R)P_R - \text{Gini}(t_L)P_L\right] \qquad (i = 1, 2, \ldots, N) \qquad (9.10)$$
where PR and PL are the proportions of cases in T sent to the right and left branches, respectively (PR + PL = 1). The Gini rule produces the optimal split by assigning all data in the class with the largest pi to tL and all other classes to tR. Gini values range from 0 to 1, with 0 signifying that all observations in a node belong to the same class (i.e., the node is homogeneous), whereas values approaching 1 mean that the observations are spread evenly across all classes. Through the application of this rule, the machine searches for the largest category in the input dataset. The largest homogeneous category is then isolated from the remainder of the data and assigned to one node (i.e., a pure node). The process is repeated for the other categories stepwise. Attention is then focused on other nodes, with the data subdivided in a similar fashion recursively until the end nodes.
The entropy measure, also called the information rule, is based on the homogeneity of a node. If the target attribute can take on c different values, then the entropy of T relative to this c-wise classification is defined as in Eq. (9.4). This rule attempts to identify splits that partition the input data into as many groups as possible and as precisely as possible, while the internal variation of each group is minimized at the same time. The splitting at a node is carried out in such a way that the reduction in entropy of the descendant nodes is maximized. The principle is the same as that described in Eq. (9.10) except the Gini index is replaced by entropy.
The twoing index, another way of measuring impurity, is defined as
$$\text{Twoing}(t) = \frac{1}{4} P_L P_R \left[\sum_{i} \left| p(i/t_L) - p(i/t_R) \right| \right]^{2} \qquad (9.11)$$
where L and R refer to the left and right branches of the split, respectively. The whole group of data is separated so that the cases making up 50 percent of the remaining data are identified at each successive node. The twoing rule thus produces splits that are more evenly distributed than those of the Gini index. The class probability rule, by contrast, is based on the probability structure of the tree rather than on its classification structure or prediction success, even though the Gini value is used to grow class probability trees. A new observation to be grouped is assigned alternatively to all possible nodes at the same level; the probability of this observation being a member of each class is then calculated, and the observation is assigned to the node for which the probability is the highest (Zambon et al., 2006). Naturally, trees grown with this measure bear a high resemblance to those based on the Gini-index splitting rule.
Of these four splitting rules, the Gini index attempts to produce pure nodes and is able to isolate minority groups more accurately than the twoing rule. By comparison, the entropy rule tends to split the data into two disjoint subsets and equalize the subsets of tL and tR (Breiman, 1996); it is commonly chosen for classification-type problems. The Gini and class probability rules have been recommended for classification of remotely sensed data (Zambon et al., 2006).
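For concreteness, the sketch below evaluates the Gini decrease of Eq. (9.10) and the twoing index of Eq. (9.11) for one hypothetical binary split, using NumPy; the class counts are invented for illustration.

```python
# A minimal sketch of node impurity measures for one candidate binary split.
import numpy as np

def gini(counts):
    """Gini impurity of a node from its class counts, Eq. (9.9)."""
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    return float(np.sum(p * (1.0 - p)))

def twoing(left_counts, right_counts):
    """Twoing index of a binary split, Eq. (9.11)."""
    left = np.asarray(left_counts, dtype=float)
    right = np.asarray(right_counts, dtype=float)
    n_left, n_right = left.sum(), right.sum()
    p_left = n_left / (n_left + n_right)
    p_right = n_right / (n_left + n_right)
    diff = np.abs(left / n_left - right / n_right).sum()
    return 0.25 * p_left * p_right * diff ** 2

# Candidate split of a node holding three classes (counts are illustrative).
parent = [40, 30, 30]
left, right = [35, 5, 10], [5, 25, 20]
p_l = sum(left) / sum(parent)
p_r = sum(right) / sum(parent)
gini_decrease = gini(parent) - p_l * gini(left) - p_r * gini(right)   # cf. Eq. (9.10)
print(gini_decrease, twoing(left, right))
```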
9.3.5 Tree Pruning
A tree is allowed to grow to accommodate all observations in the sample during training. In this way all training samples are classified as accurately as possible (e.g., 100 percent). This could be problematic in the presence of noise in the training data. Some of the splits created from such a noisy dataset may in fact yield very poor classifications when applied to unseen data or new cases (Weiss and Kulikowski, 1991; Quinlan, 1993). The decision tree is said to suffer from the overfitting, or overlearning, problem. An overfitted decision tree compromises classification accuracy on the actual data because of its poor generalization ability, even though it is rather accurate on the training data used to grow it. Such a tree has a limited predictive capability with unseen data, which is one of the objectives of building it in the first place.
Overfitted trees can be dealt with effectively via pruning using an independent set of training data. Pruning is a process of eliminating leaf nodes that fit noise in the training data, thereby simplifying the tree's overall structure, reducing its size, and making it more easily interpretable. Pruning is a means of optimization during which the sum of the output variable variance is minimized. Optimization aims at achieving a similar level of accuracy with fewer nodes through the removal of ineffective nodes. A properly pruned tree has an optimal structure, with a balance between classification accuracy and parsimony. A parsimonious tree does not overfit the training data. If properly pruned, an overfitted tree is able to predict unseen patterns reasonably accurately.
The utility of tree pruning can be evaluated against an independent set of examples or against a portion of the training dataset. In the first approach, a statistical test is applied to determine whether pruning a node is likely to produce an improvement beyond the training set, based on an explicit measure of the complexity of encoding the training examples and the decision tree. The second approach relies on training and validation sets after all the available data are divided into three sets: a training set, a validation set, and an evaluation set. Usually much larger than the other two, the first portion is used to grow the tree initially, in which the learned hypothesis is formed. The second part is used to prune the built tree via cross-validation, which is highly suited to small datasets. In n-fold cross-validation, the original input data are randomly divided into n sets, with all but one of them used to build the tree from scratch each time. The remaining set not involved in the building is used to validate the built tree. The sum of the error counts from each of the n test samples is obtained to derive the overall error estimate. The criterion for judging the suitable size of the tree is the minimum average deviance. The last set is used to evaluate the accuracy of this hypothesis over subsequent data and, in particular, to evaluate the impact of pruning this hypothesis.
The pruning process starts with the leaves and proceeds upward in a bottom-up manner. The criterion for pruning is whether the information gain of a subtree, and of the whole tree by extension, can be improved by replacing it with either its most common leaf or branch. Nodes are removed if the resultant tree performs as competently as the original; the node that improves the decision tree accuracy the least is eliminated first. Internal nodes directly above two or more leaf nodes are converted into leaf nodes if the conversion leads to a higher accuracy on the independent set of samples. Thus, any leaf node created out of coincidental regularities in the training samples is likely to be pruned because the same coincidences are unlikely to be replicated in the validation dataset. The class assignment at each new leaf node is based on the majority class at that node. This process progresses recursively up the tree until further removal of leaf nodes leads to an adverse effect, for instance, a lower classification accuracy of the tree over the independent set of samples.
The degree of tree pruning is governed by the user-specified minimum number of cases that must follow each of the branches of a tree, or by the confidence level used in deriving the predicted error rate at each leaf, branch, and/or subtree, as well as the predicted number of errors. A small value means more severe pruning. The pruned tree, with its simplified structure, has superior predictive capabilities. Each final leaf is the result of following a set of mutually exclusive decision rules down the tree. Such pruned decision trees are more compact and generalized than the original.
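A hedged sketch of this grow-then-prune strategy is shown below, assuming scikit-learn is available. It uses scikit-learn's cost-complexity pruning path as the pruning mechanism and an independent validation split to decide how far to prune; the synthetic pixels stand in for training data drawn from an image, and the approach is one workable variant rather than the exact procedure described above.

```python
# Grow a full tree, then choose the pruning level that scores best on a validation set.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.random((600, 6))                                 # 600 pixels, 6 bands (synthetic)
y = (X[:, 0] + 0.5 * X[:, 3] + 0.1 * rng.standard_normal(600) > 0.8).astype(int)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)
best_alpha, best_score = 0.0, -1.0
for alpha in path.ccp_alphas:                            # each alpha = one pruned subtree
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_tr, y_tr)
    score = tree.score(X_val, y_val)
    if score >= best_score:                              # prefer the smaller tree on ties
        best_alpha, best_score = alpha, score

pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X_tr, y_tr)
print(pruned.tree_.node_count, best_score)
```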
9.3.6 Tree Refinement
Tree refinement differs from tree pruning in that nothing in the established tree structure is changed during refinement. Rather, a series of trees is constructed from the same training dataset to determine the best one. Two techniques, known as bagging and boosting, are available for refining a decision tree.
Proposed by Breiman (1996), bagging generates an ensemble of individual decision trees from multiple samples produced by such sampling techniques as sampling with replacement, or bootstrap sampling. A decision tree classifier is constructed from each sample, which has the same size as the original training set. The final classifier is produced by aggregating all classifiers using a majority or plurality vote. Bagging is a useful way of enhancing accuracy when the learning algorithm is unstable (Chan et al., 2003), for instance, when a small change in the training samples induces a large change in accuracy.
Proposed by Freund and Shapiro (1996), boosting aims at enhancing the performance of a weak classifier in the tree. During boosting a series of trees is generated sequentially from the raw tree constructed from the training data. Inevitably, some pixels in the input data are incorrectly classified as a consequence of noise; each classifier thus focuses on the errors of the previous one. The performance is enhanced by altering the weights assigned to individual elements of the training dataset. Initially, an equal weight, or the same probability, is attached to all elements, which allows them to be selected with an equal chance. After the tree is trained, the output from the tree classifier is compared with the original data to reveal which elements are associated with more misclassifications. Boosting in essence is a process of focusing on erroneous cases: observations that are difficult to classify correctly are assigned a proportionally larger weight, while a proportionally smaller weight is given to correctly identified observations. This effectively forces the new classifiers to direct their attention to cases that are difficult to classify accurately in the next iteration. Boosting is usually carried out iteratively. The residuals from iteration n − 1 are used to modify the training sample used at iteration n to emphasize previously misclassified cases (McIver and Friedl, 2001). Naturally, some misclassifications may still persist with the second classifier based on the updated weights. Generally different from the first, they become the focus of attention during construction of the third classifier. The probability of a misclassified case is modified by a factor of
$$b_t = \frac{1 - a_t}{a_t} \qquad (9.12)$$
where at is the sum of the misclassified case probabilities of a current classifier Ct at trial t. The sum of the probabilities is then standardized to 1.
This process continues for a pre-determined number of iterations, or until the most recent classifier is either extremely accurate or inaccurate. A boosted tree is created by voting among the different trees that have been created in this fashion (Schapire, 1999). There are two well-known boosting algorithms, Adaboost and Arc-4x, both attaching a larger weight to pixels that have been misclassified in a previous iteration to maximize their chances of inclusion in the next training set. Arc-4x is an ad hoc creation that uses similar logic to Adaboost except that the probability update for the nth case at trial t + 1 is calculated as

$$\frac{1 + m(n)^4}{\sum_{n}\left[1 + m(n)^4\right]} \qquad (9.13)$$
where m(n) is the number of misclassifications of the nth case at trial Ct. Both bagging and boosting can be combined with any supervised classification algorithm. The combination of these refinement methods with a base classifier (e.g., a decision tree or neural network) brings two benefits (a minimal sketch of both techniques follows this list):
• First, it improves effectiveness and may make the results classified from remotely sensed data more accurate. After 10 such trials on 27 different datasets, boosting reduced the amount of error by about 15 percent on average over the use of a single tree (Quinlan, 1996). Boosting has been found to increase classification accuracy by between 3 and 6 percent (Pal and Mather, 2003). However, boosting may not always result in more accurate classifications. As a matter of fact, boosting can actually reduce classification accuracy in light of noisy training cases (RuleQuest Research, 2007). Thus, it should be treated with caution and tried when predictive accuracy is a paramount concern.
• Second, it increases stability, robustness, and resistance to overfitting. Decision tree classifiers refined with bagging and boosting algorithms are substantially more stable and more robust to noise in the training data than the standard tree (DeFries and Chan, 2000), even though the computation involved is more intensive, especially with the bagging algorithm.
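The sketch below illustrates bagging and boosting around a decision tree base learner, assuming scikit-learn is available. Its generic BaggingClassifier and AdaBoostClassifier (both of which default to a decision tree base learner) stand in for the specific Adaboost and Arc-4x formulations above, and the synthetic bands and labels are placeholders for real imagery.

```python
# A minimal sketch of refining a decision tree classifier with bagging and boosting.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier

rng = np.random.default_rng(2)
X = rng.random((500, 6))                           # 500 pixels, 6 bands (synthetic)
y = (2 * X[:, 1] - X[:, 4] > 0.5).astype(int)      # synthetic land cover labels

# Both ensembles use a decision tree as their default base learner.
bagged = BaggingClassifier(n_estimators=25, random_state=0).fit(X, y)
boosted = AdaBoostClassifier(n_estimators=25, random_state=0).fit(X, y)

print(bagged.score(X, y), boosted.score(X, y))
```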
9.4 Common Trees in Use
So far a number of decision trees have been developed commercially and academically. Their features and capability in handling remotely sensed data vary widely. Contained in Table 9.2 is a comparison of three major trees. Their specific features are elaborated on under separate headings below.
Feature                            CART                                     C4.5                     QUEST
Variable selection                                                                                   Unbiased selection
Split types                        Univariate, Linear combination           Univariate               Univariate, Linear combination
Nature of tree                     Binary                                   Multiple                 Binary
Choice of misclassification cost   Yes                                      No                       Yes
Use of prior probabilities         Yes                                      No                       Yes
Choice of impurity functions       Yes                                      Yes                      Yes
Bagging                            Yes                                      No                       No
Missing value handling             Surrogate split                          Probability weights      Imputation
Pruning control                    Test sample pruning, Cross-validation    Pre-determined           Test sample pruning, Cross-validation
Classification functions
Error estimation

TABLE 9.2 Comparison of Three Major Decision Trees in Remote Sensing Image Analysis (Source: modified from Shih, 2005.)
9.4.1 CART
The classification and regression tree (CART) is a binary decision tree that can operate on both continuous remote sensing and categorical ancillary data (Lawrence and Wright, 2001). This classifier automatically selects useful spectral and ancillary data from the input data (Breiman et al., 1984). Available in widely used statistical packages such as S-Plus, the CART tree is built by recursively dividing the input data until end points or terminal nodes are reached. This supervised algorithm requires training data or learning samples. After analyzing all explanatory variables (i.e., all input spectral bands and auxiliary data), the machine decides which binary splitting of a variable is the best at reducing variance in the land cover classes. Not relying on any stopping rules, CART uses the growing-pruning method in building the tree, hence avoiding overlooking important structure
owing to premature stopping. The final pruned tree tends to be optimal thanks to the reliable pruning strategy used. Provisions are available for selecting and testing the optimal tree as an integral part of the CART algorithm. Missing values in the input data are handled intelligently by back-up rules or "surrogate splitters," instead of treating all entries with missing values as if they had the same unknown value (Salford Systems, http://www.salford-systems.com). The surrogate splitter contains information that is typically similar to what would be found in the primary splitter. If some misclassifications, or cases that have been incorrectly classified, are more serious than others, they are dealt with by specifying a higher penalty so that the computer will steer the tree away from this type of error. There are seven splitting criteria (Gini, symmetric Gini, twoing, ordered twoing, and class probability for classification trees; least squares and least absolute deviation for regression trees) to choose from in building univariate trees, and one (the linear combination) for multivariate trees.
Among the huge array of potentially useful input and ancillary data, CART is able to differentiate the most useful from the least useful data in its decision making without any a priori knowledge, a characteristic distinguishing decision trees from neural networks and expert systems (Lawrence and Wright, 2001). Since the focus is on minimization of total misclassifications for the entire training dataset, a class with more training samples exerts a greater influence on the analysis than a class with fewer samples. CART is computationally expensive owing to the requirement to generate multiple auxiliary trees. Most importantly, the final pruned subtree selected from a parametric family of pruned subtrees may exclude the optimal pruned subtree (Safavian and Landgrebe, 1991).
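A minimal CART-style classification can be run with scikit-learn, whose DecisionTreeClassifier implements an optimized version of the CART algorithm; the sketch below assumes that library is available and uses synthetic bands and training pixels as placeholders for real imagery and reference data.

```python
# A minimal sketch of CART-style classification of a multiband image.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(3)
bands = rng.random((100, 100, 4))                               # a 100 x 100 image, 4 bands
X_train = rng.random((300, 4))                                  # 300 training pixels
y_train = (X_train[:, 3] - X_train[:, 0] > 0.1).astype(int)     # e.g., vegetated vs. not

cart = DecisionTreeClassifier(criterion="gini", max_depth=4, random_state=0)
cart.fit(X_train, y_train)

# Classify every pixel and reshape the labels back into map form.
land_cover = cart.predict(bands.reshape(-1, 4)).reshape(100, 100)
print(export_text(cart, feature_names=["b1", "b2", "b3", "b4"]))
```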
9.4.2 C4.5 and C5.0 Trees
C4.5 is an extension of the basic ID3 algorithm designed by Quinlan. The improvements in C4.5 include the ability to handle continuous data, data with missing values, reduced error pruning, and improved computational efficiency. C5.0 is an improved version of C4.5 in that it is much faster, more compact than C4.5, and supports boosting. Both C4.5 and C5.0 produce trees with similar predictive accuracies, but C5.0 requires much less memory and has a simpler structure. It is also possible to weight different attributes and misclassification types. C5.0 supports boosting with any number of trials, though it may take longer to produce a boosted tree. The extra computation is justified by the improvement in classification accuracy. C5.0 is better than C4.5 in that it has incorporated a number of new functions, such as variable misclassification costs instead of treating all errors equally as in C4.5. Separate costs are available for each predicted class pair to differentiate more serious misclassifications from less serious ones
(RuleQuest Research, 2007). The constructed classifiers are able to minimize expected classification costs instead of error rates. C5.0 is good at uncovering patterns in the input data, assembling them into classifiers, and using them to make predictions. The importance of each feature is reflected in the weight attribute. Weights associated with correctly labeled pixels are reduced proportionally in the subsequent tree, so that misclassified pixels receive more attention in later trials. Both the best attribute to separate the different classes in the training data and the best possible threshold to make this separation are based on the concept of information (Quinlan, 1993). Voting among all such built trees produces the boosted tree. The building of a C5.0 tree and its pruning favor dominant classes with a large training sample size, while minor classes are often "penalized." This effect may be reduced via the gain ratio. The C5.0 tree is suited for classification of remote sensing data when only a small training sample is available, even though it is designed primarily for analyzing huge quantities of numeric and nominal data running to hundreds of thousands of records. In mapping national park vegetation into 11 categories, a boosted C5.0 decision tree classifier achieved an overall accuracy of 82.05 percent with a Kappa coefficient of 0.80 from two Enhanced Thematic Mapper Plus (ETM+) scenes (de Colstoun et al., 2003). As with all decision trees, the predictive accuracy of C5.0 is influenced by the training sample size.
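C5.0 itself is a commercial product, so its boosting trials can only be approximated with open tools. The sketch below, assuming synthetic stand-in arrays for the training pixels, uses AdaBoost over small scikit-learn trees to illustrate the two ideas described above: reweighting of training pixels between trials and unequal misclassification costs (here expressed through class weights).

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.random((600, 7))      # hypothetical band and ancillary values
y = rng.integers(0, 4, 600)   # hypothetical land cover labels

# Ten boosting trials over shallow trees: after each trial the weights of
# misclassified training pixels are increased, and the final label is a
# weighted vote among the trials. Class 2 is given a higher cost here.
boosted = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=3, class_weight={0: 1, 1: 1, 2: 5, 3: 1}),
    n_estimators=10,
    random_state=0,
)
boosted.fit(X, y)
print("training accuracy of the boosted tree: %.2f" % boosted.score(X, y))
```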
9.4.3
M5 Trees
Model trees are a form of binary decision tree using linear regression functions at the leaf (terminal) nodes instead of discrete class values. Like all regression trees, the M5 model tree is commonly used to predict outcomes expressed as a continuous value instead of integers. Analogous to piecewise linear functions, model trees are more transparent and hence acceptable to decision makers, very fast in training, and always converge (Quinlan, 1992). One method of constructing an M5 model tree is to use the divide-and-conquer method in two stages. At the first stage a splitting criterion is used to create a decision tree. This splitting criterion minimizes the intra-subset variation of the target value down each branch; equivalently, it maximizes the expected reduction in error. The criterion treats the standard deviation of the class values that reach a node as a measure of the error at that node, and calculates the expected reduction in this error as a result of testing each attribute at that node. The formula to compute the standard deviation reduction (SDR) is

$$\mathrm{SDR} = sd(T) - \sum_{i} \frac{|T_i|}{|T|}\, sd(T_i) \tag{9.14}$$

where T = set of examples that reaches the node, T_i = subset of examples that have the ith outcome of the potential set, and sd = standard deviation.
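A small numeric illustration of Eq. (9.14), assuming a candidate split that simply partitions the examples reaching a node into two subsets:

```python
import numpy as np

def sdr(parent, subsets):
    """Standard deviation reduction of a candidate split (Eq. 9.14)."""
    parent = np.asarray(parent, dtype=float)
    return parent.std() - sum(
        len(s) / len(parent) * np.asarray(s, dtype=float).std() for s in subsets
    )

# Hypothetical target values reaching a node and one candidate binary split.
T = [0.2, 0.3, 0.25, 0.9, 1.0, 0.95]
split = ([0.2, 0.3, 0.25], [0.9, 1.0, 0.95])
print("SDR of the split: %.3f" % sdr(T, split))  # a larger SDR means a better split
```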
The data in child nodes have a smaller standard deviation than parent nodes as they are more homogeneous. After all the possible splits are examined, the one that maximizes the expected error reduction is selected. This division often produces a large treelike structure that must be pruned back by replacing a subtree with a leaf. During the second stage a model tree is pruned back by replacing the subtrees with linear regression functions wherever this seems appropriate. This technique of generating the model tree splits the parameter space into areas (subspaces) and builds a linear regression model in each of them. As a regression tree, M5 is not suitable for hard classification of land covers without modification. One possible way of modifying it is to use the conditional class probability function and a model tree to approximate this probability function. The use of this tree for image classification involves the following four steps (Pal, 2006):
• Selection of training datasets from the input data for each of the information classes to be mapped. The number of training datasets equals the number of information classes. All the datasets have an identical number of land covers that are assigned a value of 0 except the current class, which receives a value of 1. For example, in a mapping of four covers, the training datasets take the form of [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], and [0, 0, 0, 1].
• Application of a model tree to each of the created datasets. Its output approximates the probabilities of each instance in the training dataset belonging to the class (Frank et al., 1998).
• Testing of each of the model trees against the test dataset.
• Presentation of new cases (e.g., original satellite data) to the tested trees, each of which generates a probability. The class whose model tree produces the highest probability for an instance is taken as the predicted class.
The M5 model tree–based classification approach achieved a higher level of classification accuracy than a univariate decision tree classifier (a variant of C4.5), even with a training dataset as small as 50 to 100 pixels per class, in mapping seven types of crops from ETM+ data (Pal, 2006). With a large training dataset, the M5 model tree achieved an accuracy level comparable to a boosted univariate tree. This comparability demonstrates the utility of the M5 model tree in classifying satellite imagery for the purpose of mapping land covers, even though the M5 model tree algorithm takes longer to train.
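Because the model tree is a regression method, the four-step procedure above amounts to one-versus-rest regression on 0/1 class indicators. The sketch below substitutes an ordinary regression tree for the M5 model tree (for which scikit-learn has no implementation) and uses synthetic stand-in arrays; the class whose tree predicts the highest value is taken as the label.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X_train = rng.random((400, 6))      # hypothetical training pixels (6 bands)
y_train = rng.integers(0, 4, 400)   # hypothetical labels for 4 land covers
X_new = rng.random((10, 6))         # hypothetical pixels to classify

# Steps 1-2: one 0/1 indicator dataset and one regression tree per class.
trees = []
for c in range(4):
    indicator = (y_train == c).astype(float)  # e.g. [0, 1, 0, 0] coding for class 1
    tree = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X_train, indicator)
    trees.append(tree)

# Step 4: present new cases and take the class with the highest predicted value.
scores = np.column_stack([t.predict(X_new) for t in trees])
labels = scores.argmax(axis=1)
print(labels)
```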
9.4.4
QUEST
The quick, unbiased, and efficient statistical tree (QUEST) is a binary decision tree for image classification and data mining. It is very similar
to CART except that a few refinements have been made to improve it (Loh and Shih, 1997). For instance, missing values in the input data are handled by imputation instead of surrogate splits. Categorical predictor variables with many categories can be handled easily. These improvements make it a strong candidate for hard classification in which land covers are mapped into nonoverlapping classes. All satellite data and ancillary data are selected using an unbiased variable selection technique, which is the default option of the system. The advantages of this decision tree include its negligible variable selection bias, computational simplicity, and tree pruning. Because of its short history, QUEST has not found any applications in classification of remote sensing data so far. Thus, its performance relative to other decision trees or parametric classifiers remains unknown at present.
9.5
Decision Tree Classification Rooted in machine learning theory, decision tree classification is nonparametric and rule based. The assumption underlying this method is that a feature vector corresponds to a target vector. This correspondence is established from a sample of pixels (training data), also known as labeled samples. As with supervised classification, decision trees need to be trained prior to classification. During training, input samples are successively partitioned into more and more homogeneous subsets by producing optimal rules or decisions, which maximize the information gained and minimize the error rates in the branches of the tree (Weiss and Kulikowski, 1991). Once properly trained, a decision tree is ready for image classification, a process of recursively partitioning the remote sensing data until they can no longer be subdivided. The feature space is estimated by recursively splitting the data at each node on the basis of a statistical test that increases the homogeneity of the training data in the resulting descendant nodes. Decision tree classifiers label the identity of pixels in the feature vector in a chain of simple decisions over multiple stages. This process of recursively dividing the training data into smaller subsets continues until the leaves of the tree contain only cases from one class, or until the splitting does not bring any improvement in the gain ratio (de Colstoun et al., 2003). Toward the very end of the partition, the homogeneous subsets virtually form the target vector. The processing is generally executed by moving from the root node down to the terminal nodes in a manner known as the top-down approach. The performance of the classification is judged against a number of parameters, such as ease and speed of tree training, strengths, and limitations. Most importantly, it is the classification accuracy level and the robustness to noises in the training samples that are the critical factors to consider.
D e c i s i o n Tr e e I m a g e A n a l y s i s
9.5.1 Accuracy Classification accuracy refers to the percentage of cases correctly classified in both training and testing. It is determined using the third portion of the data in cross-validation. Test set error is a measure of the ability of the decision tree to predict data that have not been presented to it before. Training set error refers to the proportion of pixels that are not correctly predicted by the tree. The smaller the error, the more accurate the decision tree. If it occurs during the validation stage (i.e., determined using the third portion of the training sample data), it is termed cross-validation classification accuracy. The accuracy of decision tree classification of remotely sensed data has been achieved at a very high level in producing either hard or crisp classification results. For instance, the overall accuracy stands at 82 percent (Kappa = 0.8) in mapping vegetation in a national park into 11 categories from Landsat ETM+ data (de Colstoun et al., 2003). Overall, decision tree–based nonparametric classification of field variables and auxiliary variables served by Landsat Thematic Mapper (TM) imagery achieved an overall accuracy of 74.5 percent (Kappa = 0.5) in predicting vegetation types (Joy et al., 2003). This classification method successfully identified dominant vegetation types at a finer spatial resolution than is typically possible with the traditional classifier. This accuracy level, however, is dependent upon several factors, such as whether the tree is boosted, the detail level mapped, training sample size, and dimensionality of the input data. Initially, classification tree analysis achieved an overall accuracy of only 73.1 percent in detecting various wetlands and riparian zones from multiseason Landsat ETM+ imagery combined with ancillary topographic and soils data (Baker et al., 2006). This accuracy was later improved to 86.0 percent after the classification tree was refined with the stochastic gradient boosting using classification errors. The accuracy of decision tree classification of spectral indices and fraction images derived from remote sensing data varies with the number of condition classes and the nature of the crown mapped (Sims et al., 2007). The best overall accuracy decreases from 92 percent (Kappa = 0.83) in a two-class model to 68 percent (Kappa = 0.28) in a three-class model of crown transparency. The accuracy further degrades to 58 percent (Kappa = 0.21) in a two-class model of crown volumes affected by soil nitrogen levels. Similarly, accuracy decreases from 96 percent in a classification of three classes of natural vegetation, agriculture, and urban to 79 percent at eight classes, and further to 65 percent at 11 classes in mapping general land use with an emphasis on vegetation (Lawrence and Wright, 2001). Classification accuracy increases linearly with training sample size up to a limit of 300 pixels per class, with both univariate and multivariate decision trees (Pal and Mather, 2003). Classification accuracy actually declines as the feature dimension of the satellite data increases, in drastic contrast with ANN or parametric classifiers such as maximum likelihood that tend to produce more
Chapter Nine accurate results from more channels in the input data. A possible explanation is that more features in the input data increase the chance of correlation between them. This correlation makes class structure more dependent on combinations of these features and hence difficult for the decision tree to perform well. The above studies, carried out in isolation, are unable to reveal how effective this decision tree approach of image classification is in comparison with other parametric methods. Such a deficiency has been remedied by a few studies in which a comparable accuracy has been achieved with both methods. For instance, the decision tree and maximum likelihood classifiers achieved a comparable accuracy in mapping global vegetation distribution (Hansen et al., 2001). The resulting habitat unit maps have an overall accuracy of 91.8 percent with TM data, and 89.5 percent with Multispectral Scanner (MSS) data. Similar levels of classification accuracy are obtained in estimating crop cover using both maximum likelihood and decision tree classifiers from multitemporal MSS data (Belward and de Hoyos, 1987). The basic decision tree model performed comparably to a maximum likelihood classifier in classification accuracy (Hansen et al., 1996). In spite of this comparability, it must be noted that decision trees have significant advantages in feature selection and in handling disparate data types and missing data. More researchers, however, have confirmed that the decision tree approach is more accurate than parametric classifiers. Even without boosting, the decision tree classifier outperformed the maximum likelihood classifier by about 10 percent in accuracy in monitoring vegetation from multitemporal TM imagery (Rogan et al., 2002). In identifying changes in vegetation cover, their respective accuracies are 68.7 percent versus 65 percent. Decision tree algorithms achieved consistently higher accuracies than the maximum likelihood classifier, sometimes even substantially (Friedl and Brodley, 1997), owing to their ability to adapt to the noisy and nonlinear relationship often existing between land cover classes and remotely sensed data. A decision tree classifier based on linear spectral mixture analysis achieved an overall accuracy of 85.9 percent, higher than 79.75 percent and 77.17 percent achieved, respectively, with the maximum likelihood and minimum distance classifiers (Lu et al., 2004). This approach is recommended for classifying mature forest, different stages of secondary succession, pasture, agricultural lands, and bare lands. A 14-feature decision tree correctly labeled the classes in the reference map at a rate of 73.5 percent, much higher than 56.3 percent achieved with the maximum likelihood classifier (Borak and Strahler, 1999). This advantage of higher accuracy is lost if the remote sensing data have a large number of spectral bands, especially when it exceeds 20. However, the performance of the decision tree classifier with hyperspectral data seems to contradict the above claim. At the first glance, decision tree classifiers appear to be ill-suited for classification
of hyperspectral data, as misclassification rates run into the low 40-percent range in detecting all the combinations of different nitrogen and weed categories in corn fields (Goel et al., 2003). The low accuracy with the hyperspectral data is due less to the rise in the dimensionality of the input data than to the difficulty in mapping the spectrally subtle classes (e.g., nitrogen status and weed stress), judging from the fact that classification results are satisfactory (i.e., around 80 percent) when one factor (nitrogen or weed) is considered at a time. It was concluded that decision tree classification algorithms have potential in the classification of hyperspectral data for crop condition assessment. Furthermore, the CART approach can generally distinguish tillage practices with a classification accuracy of 89 percent, and residue levels with a classification accuracy of 98 percent, from hyperspectral images (Yang et al., 2003). Thus, a high dimensionality of remote sensing data is not necessarily related to a lower accuracy.
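The overall accuracy and Kappa figures quoted throughout this section are derived from an error (confusion) matrix. A minimal sketch with a hypothetical three-class matrix:

```python
import numpy as np

def overall_accuracy_and_kappa(cm):
    """Overall accuracy and Kappa coefficient from a confusion matrix
    (rows = reference classes, columns = predicted classes)."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    po = np.trace(cm) / n                                 # observed agreement
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2   # chance agreement
    return po, (po - pe) / (1 - pe)

cm = [[50, 3, 2],
      [5, 40, 5],
      [2, 4, 39]]
oa, kappa = overall_accuracy_and_kappa(cm)
print("overall accuracy %.3f, Kappa %.3f" % (oa, kappa))
```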
9.5.2
Robustness
The robustness of a decision tree to noise is a critical measure of its quality. For various reasons (e.g., atmospheric scattering) remote sensing data tend to be infested with noise, which degrades the accuracy of land covers mapped from the data. Noise exists in different forms, such as mislabeling of cover land in the training data and random noise. Mislabeling occurs when the training dataset does not originate from the input data to be classified or when the ancillary data do not match the image data. Noise level is usually measured by the ratio of information to noise. In order to test the robustness of a decision tree classifier to noise, noise is deliberately introduced to training dataset by randomly altering the class value of examples equally across the board. The percentage of chance is known as the amount of noise. Thus, a 5 percent noise means that 5 percent of pixels in the training set change their class values in proportion to its dominance in the original training data. As demonstrated in Fig. 9.9, in general, decision tree classifiers are rather resistant to noise in the input. In particular, the M5 tree is rather robust. As the noise level rises from 0 to 40 percent, the accuracy of the M5 tree drops only slightly. By comparison, the standard C4.5 tree and its boosted counterpart produce less accurate results as the noise level in the input data rises (Pal, 2006). Therefore, the M5 tree is more robust than standard C4.5 tree. Tree boosting does not seem to exert any positive effect on the tree's ability to resist noise. It must be noted that robustness to noise is a function of many variables, including the spatial resolution of the satellite data used and whether noise exists in the source data or in the training samples (DeFries and Chan, 2000). For instance, the standard C5.0 tree is not so resistant to noise in the input data (Table 9.3). The tree is significantly less robust if the remote sensing data have a finer spatial resolution. With data of all spatial resolutions, boosting and bagging
FIGURE 9.9 Robustness of three decision tree algorithms (the M5 model tree, a standard DT, and a boosted DT) to noise in the input data: classification accuracy (%) plotted against the level of noise in the training data (%). (Source: modified from Pal, 2006.)
Tree type/nature               Accuracy          Computational Intensity   Stability   Robustness to Noise
8-km AVHRR data
Standard C5.0 decision tree    Slightly lower    Low                        Low         Low
Decision tree with boosting    Slightly higher   Medium                     High        High
Decision tree with bagging     Medium            High                       High        High
Landsat data
Standard C5.0 decision tree    Slightly lower    Low                        Low         Low
Decision tree with boosting    Slightly higher   Medium                     Medium      Medium
Decision tree with bagging     Medium            High                       High        High
Source: DeFries and Chan, 2000.
TABLE 9.3 Relative Performance of Different Algorithm Settings with 8-km-AVHRR and Landsat Data
appear to enhance the classifier's ability to resist noise. Again, this level of enhancement is related to the spatial resolution of the data. With coarse satellite data such as 8-km AVHRR, boosting has a more profound effect than with fine-resolution imagery. Thus, the degree to which boosting can enhance the robustness of a decision tree also depends on the spatial resolution of the satellite data used.
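The noise-injection experiment described above is easy to reproduce. The sketch below, assuming hypothetical training labels, reassigns a chosen percentage of them at random, drawing the new labels in proportion to class dominance in the original training data:

```python
import numpy as np

def add_label_noise(y, noise_fraction, rng=None):
    """Randomly relabel a fraction of training pixels, drawing the new labels
    in proportion to class dominance in the original training data."""
    rng = np.random.default_rng(rng)
    y_noisy = np.asarray(y).copy()
    n_noisy = int(round(noise_fraction * len(y_noisy)))
    idx = rng.choice(len(y_noisy), size=n_noisy, replace=False)
    classes, counts = np.unique(y_noisy, return_counts=True)
    y_noisy[idx] = rng.choice(classes, size=n_noisy, p=counts / counts.sum())
    return y_noisy

y_train = np.repeat([0, 1, 2, 3], 100)           # hypothetical labels
y_5pct = add_label_noise(y_train, 0.05, rng=0)   # 5 percent label noise
print("labels changed:", int((y_5pct != y_train).sum()))
```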
9.5.3
Strengths
Decision trees have shown an enormous potential in classification of multisource remotely sensed data. With the decision tree regression approach, it is possible to determine class proportions within a pixel so as to produce soft classification (Xu et al., 2005). Prior probabilities and ancillary information can be incorporated into classification of land covers, which is especially beneficial for poorly separable classes. Robust probabilities of class membership can be estimated from nonparametric supervised classifiers. Even incomplete or imperfect information can be taken advantage of with the use of a confidence parameter that weights the influence of ancillary information relative to its quality. With decision trees it is possible to explore the complex relationship between multispectral bands and land cover classes. By identifying the most useful combination of spectral bands, the separability between any two classes is increased. Decision trees are effective at revealing the relationship between image object metrics and standard forest-inventory parameters (Chubey et al., 2006). Decision trees are easy to implement, and model transparency allows easy interpretation of results. The other advantages of decision tree classifiers over traditional parametric algorithms in classification of remote sensing data are summarized in five aspects below (Hansen et al., 1996).
Nonparametric Decision trees are strictly nonparametric and nonlinear. There is no underlying assumption about the distribution of the input data. In this regard decision tree classifiers are inherently superior to parametric classifiers (e.g., maximum likelihood). They are well suited to non-normal data and situations where a single cover type is represented by more than one cluster in the spectral space (DeFries and Chan, 2000). Decision trees can reveal nonlinear and hierarchical relationships between input variables and make use of them to predict class membership. The decision at a lower hierarchy is based on the outcome of previous evaluations at a higher level. In this way the complex decision-making process is decomposed into a series of simple decisions. This contrasts sharply to parametric statistical classifiers that assign a pixel to a class based solely on a single decision reached from simultaneous consideration of all evidence or features.
Comprehensible and Simple The decision tree approach makes the solution to the problem under study easier to comprehend and interpret. The final results of a decision tree classification can be summarized in a series of rules. These rules are easily interpretable owing to the explicit classification structure. They are translatable into comprehensible English in the form of logical if-then conditional statements. Besides, they also facilitate the derivation of a physical understanding of the classification process. Even when a complex domain or a domain that does not decompose easily into rectangular regions causes the decision tree to be large and complex, it is generally fairly easy to traverse any one path through the tree. Its explicit and easily interpretable structure makes it simple to understand. This simplicity enables the analyst to explain why observations are classified or predicted in a particular manner, a relatively straightforward task. Moreover, the tree structure assists the analyst to understand and interpret the information at each level within the process.
Versatile Decision tree classifiers are capable of handling data enumerated on different measurement scales. They are equally adept at dealing with both continuous and categorical variables. Categorical variables, which pose problems for neural networks and statistical classifiers, come ready-made with their own splitting criteria: one branch for each category. Continuous variables are equally easy to split by picking a number somewhere in their range of values or using the soft classification option.
Able to Deal with High-Dimensional Data Decision tree classifiers can accommodate a large array of multispectral bands and potentially useful ancillary data in their decision making. For instance, they are able to handle high-dimensional datasets of up to 87 image object metrics (Chubey et al., 2006). This is achieved by rejecting those features or data sources that contribute minimal information toward the classification. Thus, only those few features carrying rich information are automatically selected. In fact, decision trees can effectively reduce the dimensionality of the input feature space up to a manageable level of 80 percent. By retaining most of the information of the original database, classification accuracy is not degraded significantly (Borak and Strahler, 1999). Although neural networks are able to exclude less significant variables from classification, the selection is hidden from the analyst. This makes it impossible to judge the relative importance of all ancillary data and hinders their application in future classifications.
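In open implementations this automatic variable screening can be inspected directly. The sketch below, using a synthetic 20-feature stack in which only two features actually drive the labels, fits a tree and lists the features it retained; this is the transparency contrasted above with the hidden selection inside a neural network.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
X = rng.random((500, 20))                        # hypothetical 20-feature input stack
y = (X[:, 2] + 0.5 * X[:, 7] > 0.9).astype(int)  # labels driven by only two features

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

# Features with non-zero importance are the ones the tree actually used;
# the rest contributed nothing and could be dropped from future classifications.
used = np.flatnonzero(tree.feature_importances_ > 0)
print("features retained by the tree:", used)
```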
Flexible Decision trees are highly flexible and have a significant intuitive appeal. Decision trees are advantageous in classifying remotely sensed data
owing to their flexibility and computational efficiency. Tree branches can be pruned back or expanded easily. Different feature subsets and decision rules are used at different stages of classification. This flexibility allows them to be combined with other classifiers. It is also possible to trade classification accuracy for computational efficiency.
9.5.4
Limitations
Despite these strengths, decision trees do face a few limitations:
• First, the results are error prone when too many classes are involved. Some decision tree algorithms can accommodate only binary target classes. Others may be able to assign pixels into an arbitrary number of classes, but are error prone if the number of training examples per class is small. This can happen rather quickly in a tree with many levels and/or many branches per node.
• Second, it is computationally expensive to train a tree. At each node, each candidate splitting feature must be sorted before its best split can be found. In some algorithms, combinations of features are used and a search is vital to determine the optimal combination of weights. Pruning algorithms can also be expensive since many candidate subtrees must be formed and compared among themselves.
• Finally, most decision tree (e.g., univariate tree) algorithms examine a single feature at a time. This leads to rectangular classification boundaries that may not correspond well with the actual distribution of pixel values in the feature space.
In addition, there are more specific obstacles in image classification with decision tree classifiers (Safavian and Landgrebe, 1991). For instance, the designed decision tree classifier may never be optimal. Classification error may be cumulative from one layer to the next in a large tree. It is not possible to achieve both classification accuracy and efficiency through optimization of the same tree; a delicate balance must be struck between the two. Besides, there may be overlap between the terminal nodes. For instance, the same conclusion is reached at different levels via different test variables (see Fig. 9.5b). Therefore, the number of terminal nodes required far exceeds the number of information classes. This increases the memory space and the search time unnecessarily. These limitations, however, may be overcome with ensemble classifiers.
9.5.5
Ensemble Classifiers
Facilitated by the easy commercial availability of decision tree classifiers, the flexibility in breaking down a complex decision-making process into a collection of simpler, easier-to-comprehend decisions has fostered the emergence of ensemble, or hybrid, classifiers in machine
Chapter Nine learning. In a hybrid decision tree, multiple decision algorithms are provided for partitioning the dataset recursively into smaller subdivisions or classes. Hybridizing multiple classifiers to form an ensemble classifier is motivated by the fact that different algorithms exhibit selective superiority in regard to their performance. Namely, whether a classifier is optimal depends on the dataset to be classified (e.g., its spatial resolution and dimensionality). The performance of a decision tree classifier is strongly influenced by the classification strategy employed at each internal node and whether noise is present. For instance, decision tree and ARTMAP classifiers tend to make predictive errors in different contexts in classifying land covers (Liu et al., 2004). Their combination to form a hybrid classifier means that the strength of each classifier is taken advantage of. Coexistence of different classification algorithms within the same single hybrid tree ensures that the dataset is always partitioned with the “fittest” classifier. The application of this classification algorithm in the tree ensures that the particular classes are mapped most accurately, which has been verified by Friedl and Brodley (1997). They found that a hybrid tree consistently produced the highest classification accuracies, which may not be possible with individual classifiers alone. The improvements to the overall classification accuracy of an ensemble classifier is much higher if the two classifiers are complementary in their misclassifications (e.g., there is disagreement among the component classifiers), in addition to the confidence level attached to the output. This combination can enhance the confidence level of a classification (Liu et al., 2004). For instance, after two classifiers of the same type are combined via such schemes as voting in neural networks and boosting in decision trees, the final output is voted from the result of a higher confidence level. The common candidates for producing an ensemble classifier are K-means clustering and neural networks. Hybridization takes place commonly between decision trees and neural networks as both are rooted in machine learning. Besides, they seem to be complementary in their functions and their implementation. The more diverse these component classifiers are, the more benefits their hybridization can generate. Decision trees are less complex and easier to implement than ANNs. However, they may suffer from overtraining just like ANNs and have to be pruned. Decision trees are samples-hungry. A large sample is essential in training a deep tree so as not to compromise its ability to generalize. In order to avoid this problem, it is preferable to have a simple tree that is easier to implement. Similarly, it is also lengthy to train neural networks. Decision tree classifiers require the specification of attribute selection and pruning methods, whereas the use of ANNs involves selection of an appropriate type of network, configuration of network architecture, and initialization of values for various parameters (Pal and Mather, 2003). By comparison, it is much easier to configure a decision tree optimally, thanks to its simpler architecture. Besides, the problem of global minima is not so commonly encountered with decision trees. In terms of accuracy, the ANN approach slightly
outperformed decision trees (Goel et al., 2003). Decision trees achieved the same accuracy as ANNs only after boosting. Even so, a decision tree is a better choice because of its ease of use: the user simply specifies the attribute selection and pruning method. Although neither decision trees nor ANNs offer the perfect solution to overcoming the limitations of existing parametric classifiers (Gahegan and West, 1998), their hybridization can help to avoid some of these problems. Decision trees can be combined with K-means clustering and neural networks in various ways. For instance, iterative K-means clustering may be used at a top node, whereas supervised maximum likelihood classification decision rules are used at a lower level, and classification results are postprocessed using sorting rules (Hansen et al., 2001). Shown in Fig. 9.7 is one of the potential methods in which the splitting node is made up of clustering analysis algorithms or ANNs. Another method is to embed neural networks into a decision tree to form a tree of neural networks. In this way the decision tree carries out the task of feature selection and decision boundary construction. Yet another way is to juxtapose the two classifiers to create a situation where the same data are classified twice, with two sets of class labels and two sets of confidence levels (Fig. 9.10). The final
FIGURE 9.10 A possible scheme by which decision tree classifiers may be combined with other classifiers to form an ensemble classifier: the same remote sensing data are fed to an ANN classifier (pixel label plus confidence level), a DT classifier (pixel label plus membership), and a parametric classifier (pixel label plus probability), and the hybrid classifier combines their outputs into a final land cover map and an uncertainty map. (Source: modified from Liu et al., 2004.)
results are made more accurate through a general vote of the two results, with an improved confidence level. Alternatively, a decision tree classifier may be used to refine the ANN output (Frohn and Arellano-Neri, 2005). Other suggested methods of hybridization include combining component results after the results from individual classifiers have been hardened separately, or averaging the probabilities from individual classifiers before they are hardened (Huang and Lees, 2004). The second method not only achieved higher classification accuracy, but also generated estimates of prediction confidence on the basis of a comparison between a combined model and three component models.
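The second hybridization route, averaging component probabilities before hardening, can be sketched as follows; the decision tree and neural network below are generic scikit-learn classifiers trained on synthetic stand-in arrays, not the specific models used by Huang and Lees (2004).

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)
X = rng.random((600, 6))       # hypothetical training pixels
y = rng.integers(0, 3, 600)    # hypothetical labels for 3 classes
X_new = rng.random((5, 6))     # hypothetical pixels to classify

dt = DecisionTreeClassifier(max_depth=6, random_state=0).fit(X, y)
ann = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X, y)

# Average class probabilities before hardening; the averaged maximum can also
# serve as a simple per-pixel confidence (uncertainty) measure.
p = (dt.predict_proba(X_new) + ann.predict_proba(X_new)) / 2.0
labels = p.argmax(axis=1)
confidence = p.max(axis=1)
print(labels, confidence.round(2))
```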
References Baker, C., R. Lawrence, C. Montagne, and D. Patten. 2006. “Mapping wetlands and riparian areas using Landsat ETM+ imagery and decision-tree-based models.” Wetlands. 26(2):465–474. Belward, A. S., and A. de Hoyos. 1987. “A comparison of supervised maximum likelihood and decision tree classification for crop cover estimation from multitemporal Landsat MSS data.” International Journal of Remote Sensing. 8(2):229–235. Borak, J. S., and A. H. Strahler. 1999. “Feature selection and land cover classification of a MODIS-like data set for a semiarid environment.” International Journal of Remote Sensing. 20(5):919–938. Breiman, L. 1996. “Some properties of splitting criteria.” Machine Learning. 24(1):41–47. Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone. 1984. Classification and Regression Trees. Monterey, CA: Wadsworth. Brodley, C. E. 1995. “Recursive automatic bias selection for classifier construction.” Machine Learning. 20(1–2):63–94. Brodley, C. E., and P. E. Utgoff. 1992. “Multivariate decision trees.” Machine Learning. 19(1):45–77. Chan, J. C.-W., N. Laporte, and R. S. DeFries. 2003. “Texture classification of logged forests in tropical Africa using machine-learning algorithms.” International Journal of Remote Sensing. 24(6):1401–1407. Chubey, M. S., S. E. Franklin, and M. A. Wulder. 2006. “Object-based analysis of Ikonos-2 imagery for extraction of forest inventory parameters.” Photogrammetric Engineering and Remote Sensing. 72(4):383–394. de Colstoun, E. C. B., M. H. Story, C. Thompson, K. Commisso, T. G. Smith, and J. R. Irons. 2003. “National park vegetation mapping using multitemporal Landsat 7 data and a decision tree classifier.” Remote Sensing of Environment. 85(3):316–327. DeFries, R. S., and J. C. Chan. 2000. “Multiple criteria for evaluating machine learning algorithms for land cover classification from satellite data.” Remote Sensing of Environment. 74(3):503–515. Frank, E., Y. Wang, S. Inglish, G. Holmes, and I. H. Witten. 1998. “Using model trees for classification.” Machine Learning. 32(1):63–76. Freund, Y., and R. E. Shapiro. 1996. “Experiments with a new boosting algorithm.” In: Machine Learning, Proceedings of the Thirteenth International Conference. San Francisco, C.A.: Morgan Kaufman, pp. 148–156. Friedl, M. A., and C. E. Brodley. 1997. “Decision tree classification of land cover from remotely sensed data.” Remote Sensing of Environment. 61(3):399–409. Frohn, R. C., and O. Arellano-Neri. 2005. “Improving artificial neural networks using texture analysis and decision trees for the classification of land cover.” GIScience and Remote Sensing. 42(1):44–65. Gahegan, M., and G. West. 1998. “The classification of complex geographic datasets: An operational comparison of artificial neural network and decision tree classifiers.” Third International Conference on GeoComputation (CD-ROM), 17–19 September, Bristol, UK: R.J. Abrahart.
D e c i s i o n Tr e e I m a g e A n a l y s i s Goel, P. K., S. O. Prasher, R. M. Patel, J. A. Landry, R. B. Bonnell, and A. A. Viau. 2003. “Classification of hyperspectral data by decision trees and artificial neural networks to identify weed stress and nitrogen status of corn.” Computers and Electronics in Agriculture. 39(2):67–93. Hansen, M., R. Dubayah, and R. DeFries. 1996. “Classification trees: An alternative to traditional land cover classifiers.” International Journal of Remote Sensing. 17(5):1075–1081. Hansen, M. J., S. E. Franklin, C. G. Woudsma, and M. Peterson. 2001. “Caribou habitat mapping and fragmentation analysis using Landsat MSS, TM, and GIS data in the North Columbia Mountains, British Columbia, Canada.” Remote Sensing of Environment. 77(1):50–65. Ho, T. B. 2004. Data mining with decision trees (Chap. 3). Knowledge Discovery and Data Mining Techniques and Practice, http://www.netnam.vn/unescocourse/ knowlegde/3-1.htm. Huang, Z., and B. G. Lees. 2004. “Combining non-parametric models for multisource predictive forest mapping.” Photogrammetric Engineering and Remote Sensing. 70(4):415–425. Joy, S. M., R. M. Reich, and R. T. Reynolds. 2003. “A non-parametric, supervised classification of vegetation types on the Kaibab National Forest using decision trees.” International Journal of Remote Sensing. 24(9):1835–1852. Kim, B., and D. A. Landgrebe. 1990. Hierarchical Decision Tree Classifiers in High-Dimensional and Large Class Data. Ph.D. dissertation and technical report TR-EE-90-47, School of Electrical Engineering, Purdue University, West Lafayette, IN. Lawrence, R. L., and A. Wright. 2001. “Rule-based classification systems using classification and regression tree (CART) analysis.” Photogrammetric Engineering and Remote Sensing. 67(10):137–1142. Liu, W., S. Gopal, and C. E. Woodcock. 2004. “Uncertainty and confidence in land cover classification using a hybrid classifier approach.” Photogrammetric Engineering and Remote Sensing. 70(8):963–971. Loh, W. -Y., and Y. -S. Shih. 1997. “Split selection methods for classification trees.” Statistica Sinica. 7:815–840. Lu, D., P. Mausel, M. Batistella, and E. Moran. 2004. “Comparison of land-cover classification methods in the Brazilian Amazon basin.” Photogrammetric Engineering and Remote Sensing. 70(6):723–731. McIver, D. K., and M. A. Friedl. 2001. “Estimating pixel-scale land cover classification confidence using nonparametric machine learning methods.” IEEE Transactions on Geoscience and Remote Sensing. 39(9):1959–1968. Pal, M. 2006. “M5 model tree for land cover classification.” International Journal of Remote Sensing. 27(4):825–831. Pal, M., and P. M. Mather. 2003. “An assessment of the effectiveness of decision tree methods for land cover classification.” Remote Sensing of Environment. 86(4):554–565. Quinlan, J. R. 1992. “Learning with continuous classes.” In: Proceedings of Australian Joint Conference on Artificial Intelligence. Singapore: World Scientific Press, pp. 343–348. Quinlan, J. R. 1993. C4.5 Programs for Machine Learning. San Mateo, CA Morgan Kaufmann. Quinlan, J. R. 1996. “Bagging, boosting and C4.5.” In: Proceedings of The Thirteenth National Conference of Artificial Intelligence, Portland, OR, USA: American Association for Artificial Intelligence, pp. 725–730. Rogan, J., J. Franklin, and D. A. Roberts. 2002. “A comparison of methods for monitoring multitemporal vegetation change using thematic mapper imagery.” Remote Sensing of Environment. 80(1):143–156. RuleQuest Research. 2007. 
Is See5/C5.0 better than C4.5? http://www.rulequest.com/see5-comparison.html. Safavian, S. R., and D. Landgrebe. 1991. “A survey of decision tree classifier methodology.” IEEE Transactions on Systems, Man, and Cybernetics. 21(3):660–674. Schapire, R. E. 1999. “A brief introduction to boosting.” In: Proceedings of the 16th International Joint Conference on Artificial Intelligence. Portland: AAAI Press, pp. 1–6.
Shih, Y. S. 2005. QUEST classification tree (version 1.9.2), http://www.stat.wisc.edu/~loh/quest.html. Sims, N. C., C. Stone, N. C. Coops, and P. Ryan. 2007. “Assessing the health of Pinus radiata plantations using remote sensing data and decision tree analysis.” New Zealand Journal of Forestry Science. 37(1):57–80. Weiss, S. M., and C. A. Kulikowski. 1991. Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems. San Mateo, CA: Morgan Kaufmann. Xu, M., P. Watanachaturaporn, P. K. Varshney, and M. K. Arora. 2005. “Decision tree regression for soft classification of remote sensing data.” Remote Sensing of Environment. 97(3):322–336. Yang, C.-C., S. O. Prasher, P. Enright, C. Madramootoo, M. Burgess, P. K. Goel, and I. Callum. 2003. “Application of decision tree technology for image classification using remote sensing data.” Agricultural Systems. 76(3):1101–1117. Zambon, M., L. Rick, A. Bunn, and P. Scott. 2006. “Effect of alternative splitting rules on image processing using classification tree analysis.” Photogrammetric Engineering and Remote Sensing. 72(1):25–30.
CHAPTER
10
Spatial Image Analysis
As shown in Chap. 2, the capability of data acquisition has witnessed a phenomenal growth over the last decade or so with the launch of several lightweight commercial remote sensing satellites. The huge quantity of multisource, multisensor, multiresolution, and multispectral remotely sensed data accumulated demands efficient and accurate data processing. Traditionally, these data are classified using the spectral methods presented in Chap. 7, in which pixels in the input image are individually allocated into certain predefined, discrete land cover categories in a multidimensional feature space. All the image classifications discussed in the preceding chapters are carried out using either parametric or nonparametric methods. These classifiers (e.g., maximum likelihood and minimum distance to mean) have one characteristic in common: they rely exclusively on the spectral properties of pixels in deciding their allocation to one of the predefined categories of ground cover. Underpinning these classifiers is an implicit assumption that all the ground covers to be mapped have a unique combination of values in all the multispectral bands used. Whenever this assumption is violated, pixels are assigned an erroneous cover identity in the output results. The exclusive reliance on spectral information hence severely compromises the accuracy of land cover maps produced. The reliability of these traditional per-pixel image classifiers is even lower in mapping complex scenes of land covers commonly found in the urban environment, owing to the lack of consideration of pixel spatial properties. In this environment, a single spectral response pattern cannot adequately capture the wide spectral variation within land covers. One possible means of rectifying this defect is to make use of the neighboring relationship of pixels within a working window, in the same manner as the decision making behind visual image interpretation. Spatial information among pixels inherent in the input data offers ample opportunities to make the classification accurate. This is known as spatial image classification, in which spatial image elements are combined with spectral properties in reaching a classification
Chapter Ten decision. Of the available spatial image elements, the most commonly tried ones are texture, contexture, and geometry (i.e., shape). Spatial image classification gained huge momentum in its development with the increasing use of high resolution satellite imagery, airborne digital data, and radar data. The increased spectral variability among land covers in these images demands classification algorithms based on the use of more spatial information that cannot be met with the conventional per-pixel image classification methods, even though they are able to produce adequate results for simplistic surface arrangements from medium resolution satellite data. In response to this demand, a new paradigm called object-oriented image analysis has emerged recently in image analysis. It is a special type of spatial image classifiers incorporating information among pixels in the input data into the decision-making process. It has aroused renewed interest in generating detailed and accurate land cover maps from hyperspatial resolution satellite data. This chapter consists of five sections. The first section is devoted to texture-based image classification. Included in the discussion are the various methods of quantifying texture, a comparison of several quantitative textural measures, and the utility of texture in image classification. The second section discusses contexture in image classification. The third section focuses on image segmentation that is intended to form objects. Object-oriented image classification, its performance relative to spectral classification, and affecting factors are the focuses of the last two sections.
10.1 Texture and Image Classification Texture refers to the spatial variation of pixel values along a certain direction on an image, usually measured within a specific window or neighborhood. It is hence not commonly associated with individual pixels. Instead, it is formed by a collection of the same components of a land cover, such as tree branches, that are organized in different spatial permutations. This inherently qualitative image element is commonly described as smooth, medium, rough, lumpy, stippled, mottled, and rippled (Ambrosia and Whiteford, 1983). All of these terms are rather imprecise without a definite threshold separating one from another. Also called fine, smooth texture means little spatial variation in image tone. It is commonly associated with small and uniformly sized objects, such as saplings and fine sand. Coarse texture or rough texture, on the other hand, refers to huge variations in image tone over a broad range. It is indicative of large and highly variable sized objects such as mature deciduous trees. Apart from the physical dimension of ground objects, their texture in an image is also affected by its spatial resolution or scale. Scale determines the spacing of objects in an image universally. A scene may have a fine texture if
Spatial Image Analysis imaged at a distant range. Conversely, the same scene may appear as coarse textured if imaged at a close range. In order to be useable in image classification, texture must be expressed quantitatively. Quantitative description of image texture is difficult to achieve objectively as the results vary with the manner of quantification, regardless of the algorithm used. Commonly used description methods are structural, statistical, model-based, semivariogram, Fourier transforms, and Markov random fields. In the structural method, texture is regarded as comprising a number of textural elements that are spatially organized according to certain placement rules. These textural elements are characterized by their mean pixel value, area, perimeter, eccentricity, orientation, elongation, magnitude, compactness, moments, and so on. All structural methods of depiction assume the existence of a fundamental, repetitive primitive pattern of very regular and predictable blocks with a certain rule of placement (Fig. 10.1). Suitable for depicting texture
FIGURE 10.1 Examples of three structural textures from the Brodatz texture album.
Chapter Ten on the macroscale, this method of description is ill suited for remote sensing imagery as natural ground features seldom exhibit such a regular and artificial pattern. Even if artificial features do, it is still difficult to establish the rule of placement. Therefore, they will not be explored further. The Markov random-field models have been introduced to model texture recently (Rellier et al., 2004; Zhao et al., 2007). A texture image is modeled as a probability function or as a linear combination of a set of basic functions. The texture image is depicted by the coefficients of these functions. Quantification of texture, hence, is reduced to estimation of these coefficients. Although this type of description is powerful in performing invariant texture analysis, namely, the derived results are not affected by translation, rotation, affine, and perspective transform of an image, it is not covered here as it has limited applicability to image classification. Instead this section concentrates on four popular texture measures: statistical, gray tone-based, Fourier spectra, and semivariogram.
10.1.1
Statistical Texture Quantifiers
The statistical method is the most suited to description of texture on remote sensing imagery on the basis of a collection of pixel values within a neighborhood. The pixel value at one location is compared to the value of other pixels within an operating window mathematically. So far, a number of indices have been proposed for the quantification. They differ from one another in their complexity of computation and in the size of the operating window. Complexity is judged against the order of statistics that ranges from first, second, third (skewness), to even fourth (kurtosis). Common first-order statistical measures include mean Euclidean distance, mean contrast, and spatial autocorrelation. The mean Euclidean distance D is calculated using the following equation:
$$D = \frac{\sum_{i,j}\sqrt{\sum_{k}\left[DN(i,j)_k - DN_{ck}\right]^{2}}}{n-1} \tag{10.1}$$

where DN(i, j)_k = pixel value at location (i, j) in spectral band k, DN_ck = value of the central pixel in the window in band k, and n = number of pixels in the window.

Spatial autocorrelation is derived using Geary’s ratio [Eq. (10.2)] or Moran’s coefficient [Eq. (10.3)]. Although both Geary’s ratio and Moran’s coefficient were initially devised for interval data enumerated over an area, they can be adapted easily for pixel values that are enumerated over a sampling area on the ground as a ratio. In the calculation the ground sampling area can be treated as the enumeration unit. So the ratio or coefficient measures similarity in spectral value between adjoining pixels. A strong autocorrelation suggests that pixel value varies gradually from one location to the next predictably, indicating a smooth texture. If the correlation is very loose, pixel value behaves unpredictably, as is the case with a coarse texture.

$$GR = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(P_i - P_j)^{2}}{2\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(P_i - \overline{DN})^{2}/(n-1)} \tag{10.2}$$

where w_ij = the similarity of pixels P_i and P_j (w_ij = 1 if these pixels are adjoining each other, and 0 otherwise), \(\overline{DN}\) = the mean value of all pixels within the operating window, and n = the number of pixels in the window.

$$MC = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(P_i - \overline{DN})(P_j - \overline{DN})}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(P_i - \overline{DN})^{2}/n} \tag{10.3}$$
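A minimal sketch of Eqs. (10.2) and (10.3) for a single operating window, assuming w_ij = 1 for edge-sharing (rook) neighbours and 0 otherwise; the 3 × 3 window values are hypothetical.

```python
import numpy as np

def geary_moran(window):
    """Geary's ratio and Moran's coefficient for one operating window,
    with w_ij = 1 for edge-sharing (rook) neighbours and 0 otherwise."""
    w = np.asarray(window, dtype=float)
    p = w.ravel()
    n = p.size
    mean = p.mean()
    rows, cols = np.indices(w.shape)
    r, c = rows.ravel(), cols.ravel()
    # adjacency: pixels one step apart along a row or a column
    adj = (np.abs(r[:, None] - r[None, :]) + np.abs(c[:, None] - c[None, :])) == 1
    num_g = (adj * (p[:, None] - p[None, :]) ** 2).sum()
    den_g = 2 * (adj * (p[:, None] - mean) ** 2).sum() / (n - 1)
    num_m = (adj * ((p[:, None] - mean) * (p[None, :] - mean))).sum()
    den_m = (adj * (p[:, None] - mean) ** 2).sum() / n
    return num_g / den_g, num_m / den_m

window = [[5, 5, 6], [6, 7, 7], [7, 8, 8]]   # hypothetical 3 x 3 window of pixel values
gr, mc = geary_moran(window)
print("Geary's ratio %.3f, Moran's coefficient %.3f" % (gr, mc))
```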
The association between autocorrelation and image texture is complicated by its dependency on scale. The autocorrelation of immediately adjoining pixels is different from that calculated from pixels separated by a larger distance. The texture derived at one scale likely differs from that at another scale. A number of texture values can be derived from the same image, each corresponding to a unique scale. This scale dependency may be handled by deriving texture at multiple scales and then averaging the results at all scales. Second-order parameters include variance and standard deviation, both of which are able to shed light on the spatial variation of pixel values within a window. A larger variance or standard deviation is indicative of more spatial variations in pixel value, thus a coarse texture. However, both variance and standard deviation are inherently nonspatial. In this sense, neither can capture the essence of texture realistically as different textures can have the same variance or standard deviation. There is no unique correspondence between a given texture and its quantitative value. Two drastically different textures associated with different spatial placements can result in the same value in the texture measure because of their inability to capture the spatial variation of pixel values precisely. At most they are imperfect substitutes for image texture. The spatial
component is embodied through the adoption of an operating window. The deficiency of the second-order statistical parameters may be overcome with the gray tone spatial-dependence matrix of pixel values.
10.1.2 Texture Based on Gray Tone Spatial Matrix
The gray tone spatial matrix is a more precise and scientific depictor of texture than nonspatial statistical indices. This spatial dependency gray level co-occurrence matrix (GLCM) is computed from the spatial stochastic properties of pixel values in a given spectral band or its subset (Haralick, 1979). Contained in this matrix is the probability of switching from one pixel value to another value in an image neighborhood, or the frequency of concurrence of two specified pixel values at two designated positions. The position between this pair of pixels is governed by the distance displacement vector δ = (Δi, Δj) (i = row; j = column), including its bearing. For a given separation, four angular matrices (horizontal, vertical, and two diagonal directions) are generally computed (Augusteijn et al., 1995). Considered in these calculations are only immediately neighboring pixels, even though wider distances (e.g., 2 or 3 pixels) have been used. Derivation of the GLCM, denoted as Mδ, is illustrated using an image of 6 × 6 pixels (Fig. 10.2a). It is built from pairs of pixel values in an image segment separated by a distance of δ = (Δi, Δj) = (1, 0) (i.e., neighboring pixels in the same row) and at an angle of 0° (i.e., horizontally) in the direction from left to right (Fig. 10.2b). Organized in the two-dimensional (2D) GLCM are pixel values that make up both the horizontal and vertical axes. The dimension of the matrix is controlled by the pixel value range of the input image. This array has a size of m × m, with m being the number of all possible gray level values. Since the pixel value ranges from 5 to 8, the matrix has a dimension of 4 × 4. Contained in the GLCM Mδ is the probability or frequency of the pair of
(a) Image, 6 × 6 pixel values:
5 5 5 6 7 7
6 5 6 7 7 7
6 7 8 8 8 8
7 8 8 7 8 7
8 8 8 7 7 6
6 7 7 6 6 5

(b) GLCM, rows and columns labeled by gray levels 5 to 8:
   5 6 7 8
5  1 1 0 0
6  2 1 3 1
7  1 2 3 5
8  0 1 5 4

(c) GLCM, rows and columns labeled by gray levels 5 to 8:
   5 6 7 8
5  2 2 0 0
6  2 1 2 0
7  0 4 5 3
8  0 0 3 6
FIGURE 10.2 A simple image of 6 row by 6 columns (a) and its GLCM Mδ calculated at (Δi, Δj) = (1, 0) (b) and at (Δi, Δj ) = (0, 1) (c). The entry in row i and column j is the number of times gray level i occurs immediately to the left of gray level j.
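The counting rule of Fig. 10.2 can be reproduced directly. The sketch below builds the co-occurrence counts of the 6 × 6 example for horizontally adjacent pixels (left neighbour to right neighbour), normalizes them to the probabilities pδ(i, j), and derives three of the measures listed in Table 10.1. The image array is copied from panel (a) as printed, so small discrepancies with the published matrices would simply reflect the reproduction of the figure.

```python
import numpy as np

image = np.array([[5, 5, 5, 6, 7, 7],    # panel (a) of Fig. 10.2 as printed
                  [6, 5, 6, 7, 7, 7],
                  [6, 7, 8, 8, 8, 8],
                  [7, 8, 8, 7, 8, 7],
                  [8, 8, 8, 7, 7, 6],
                  [6, 7, 7, 6, 6, 5]])

levels = np.arange(image.min(), image.max() + 1)   # gray levels 5..8
glcm = np.zeros((levels.size, levels.size), dtype=int)

# Count how often gray level i occurs immediately to the left of gray level j.
for left, right in zip(image[:, :-1].ravel(), image[:, 1:].ravel()):
    glcm[left - levels[0], right - levels[0]] += 1
print(glcm)

# Normalize to the probabilities p_delta(i, j) and compute Table 10.1 measures.
p = glcm / glcm.sum()
i, j = np.meshgrid(levels, levels, indexing="ij")
energy = (p ** 2).sum()
dissimilarity = (p * np.abs(i - j)).sum()
homogeneity = (p / (1.0 + (i - j) ** 2)).sum()
print("energy %.3f, dissimilarity %.3f, homogeneity %.3f"
      % (energy, dissimilarity, homogeneity))
```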
Spatial Image Analysis gray levels (i, j) occurring at separation d, denoted as pd (i, j). For instance, “4” in the matrix represents four pixels having a value of 8 whose right neighboring pixels have a value of 8 as well. A matrix is conveniently constructed based on symmetry by counting the number of pairs of gray levels separated by a distance of either d or −d. Multiple GLCM matrices are constructed from the same image using different spacing separations δ (i.e., variation in direction and distance). A variety of texture measures can be extracted from GLCM. Four useful measures that can be derived from the probability density pd (i, j) are energy, variance, dissimilarity, and homogeneity (Table 10.1). Energy p(i, j) measures the uniformity of texture, or pixel-pair repetitions. A high energy is derived from pixel values that have a constant or periodic distribution. Variance measures the heterogeneity of pixel values. Pixels having a large range of values have a large variance. Similar to contrast, dissimilarity measures the difference between adjoining pixels. More texture indicators can be calculated from the secondorder gray level and gray level difference statistics, including contrast (the second moment of the probability density pd(i, j)), angular second moment, entropy, and the weighted mean value of the components of the probability density (Haralick et al., 1973). Contrast measures the difference between the maximum and minimum value of a contiguous set of pixels. Highly correlated to energy, entropy measures the disorder of an image. A nonuniform texture has a high entropy value. Four other texture parameters (long-runs emphasis, gray level distribution, run-length distribution, and run percentage) can also be calculated from gray level run-length statistics (Table 10.2). The gray level difference vector counts the occurrence of the absolute difference between the reference pixel and its neighbor. It can be derived by adding elements in lines parallel to the main diagonal of the GLCM. Although the dimension of the operating window is a variable critical to the quantification of image texture based on GLCM, there is no theoretical guidance to stipulate the appropriate selection of the optimal window size. Commonly adopted sizes are 3 × 3, 5 × 5, and 7 × 7 pixels in practice. The window size adopted for calculation of texture should reflect the characteristics of the area under study or the objects of study, taking into consideration the spatial resolution of the imagery. For instance, a small window is preferable for spectrally homogeneous classes (Chen et al., 2004), but a large window size is required for spectrally heterogeneous classes. Which window size is the most appropriate also depends on the nature of the information classes to be mapped. For relatively smoothly textured classes such as agricultural, a quite large window size is needed to achieve a stable and homogeneous measure (Ferro and Warner, 2002). A large window
Energy: Σᵢ Σⱼ pδ(i, j)²
  Measures texture uniformity, or pixel-pair repetitions. A constant or periodic distribution of gray level values produces high energy.
Variance: Σᵢ Σⱼ (i − u)² pδ(i, j)
  Measures heterogeneity. A wide range of gray level values returns a large variance.
Dissimilarity: Σᵢ Σⱼ pδ(i, j)|i − j|
  Similar to contrast, but its weights increase linearly instead of weighting the diagonal exponentially.
Homogeneity: Σᵢ Σⱼ pδ(i, j) / (1 + [R(i) − C(j)]²)
  Measures tonal uniformity. Sensitive to the presence of near-diagonal elements in a GLCM.

where u = Σᵢ Σⱼ i·pδ(i, j) and σ² = Σᵢ Σⱼ (i − u)² pδ(i, j), with i, j = 1, …, Ng; Ng is the number of gray levels; pδ(i, j) is the GLCM entry; R(i) is the gray level value for a row; and C(j) is the gray level value for a column. Source: Modified from Herold et al., 2003.
TABLE 10.1 Major Texture Measures Based on GLCMs and Their Description
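The mechanics of building a GLCM and reading measures such as those in Table 10.1 off it can be illustrated compactly. The following Python fragment is a minimal sketch only; the function names, the normalization of counts to probabilities, and the toy image are assumptions made for illustration rather than the book's implementation.

```python
# A minimal sketch of building a GLCM for a horizontal displacement and
# deriving two of the Table 10.1 measures from it. Illustrative only.
import numpy as np

def glcm(image, di=0, dj=1, levels=None, symmetric=True):
    """Count co-occurrences of gray levels separated by (di, dj)."""
    img = np.asarray(image)
    if levels is None:
        levels = int(img.max()) + 1
    m = np.zeros((levels, levels), dtype=float)
    rows, cols = img.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + di, c + dj
            if 0 <= r2 < rows and 0 <= c2 < cols:
                m[img[r, c], img[r2, c2]] += 1
    if symmetric:                      # count pairs at +d and -d
        m = m + m.T
    return m / m.sum()                 # normalize to probabilities p_delta(i, j)

def energy(p):
    return np.sum(p ** 2)              # uniformity / pixel-pair repetition

def dissimilarity(p):
    i, j = np.indices(p.shape)
    return np.sum(p * np.abs(i - j))   # linearly weighted contrast analogue

# Toy 6 x 6 image with gray levels 5-8, in the spirit of Fig. 10.2a
img = np.array([[5, 5, 5, 6, 7, 7],
                [6, 5, 6, 7, 7, 7],
                [6, 7, 8, 8, 8, 8],
                [7, 8, 8, 7, 8, 7],
                [8, 8, 8, 7, 7, 6],
                [6, 7, 7, 6, 6, 5]])
p = glcm(img - img.min(), di=0, dj=1)  # shift levels to start at 0
print(energy(p), dissimilarity(p))
```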
A large window suits mature trees and commercial land use classes better, whereas a small window size is appropriate for grassland. The window size, as measured by the radius of a circular window, exerts a varying influence on the utility of texture in image classification (Debeir et al., 2002). A clear relationship exists between window size and class separability.
Second-Order Gray Level Statistics
  Contrast: Σᵢ Σⱼ (i − j)² pδ(i, j)
  Angular second moment: Σᵢ Σⱼ pδ(i, j)²
  Entropy: −Σᵢ Σⱼ pδ(i, j) log pδ(i, j)
  Correlation: Σᵢ Σⱼ [i·j·pδ(i, j) − μx μy] / (σx σy)
Gray Level Difference Statistics
  Contrast: Σᵢ i² pδ(i)
  Angular second moment: Σᵢ pδ(i)²
  Entropy: −Σᵢ pδ(i) log pδ(i)
  Mean: (1/m) Σᵢ i·pδ(i)
Gray Level Run-Length Statistics
  Long-runs emphasis: Σᵢ Σⱼ j² pδ(i, j) / Σᵢ Σⱼ pδ(i, j)
  Gray level distribution: Σᵢ [Σⱼ pδ(i, j)]² / Σᵢ Σⱼ pδ(i, j)
  Run-length distribution: Σⱼ [Σᵢ pδ(i, j)]² / Σᵢ Σⱼ pδ(i, j)
  Run percentage: Σᵢ Σⱼ pδ(i, j) / N²  (N = number of pixels in the image)

where pδ(i, j) is the (i, j)th element of the matrix divided by the sum of all the matrix elements; μx and σx are the mean and standard deviation of the row sums of the matrix Mδ; and μy and σy are the analogous statistics of the column sums. Source: Summarized from Haralick et al., 1973.
TABLE 10.2 Texture Parameters Calculated from Second-Order Gray Level, Gray Level Difference, and Gray Level Run-Length Statistics
Large windows produce a stable texture measure, but also result in large edge effects. Small windows minimize edge effects, but often do not provide stable texture measures (Ferro and Warner, 2002). The selection of an appropriate window size is also affected by image resolution. An image of a finer spatial resolution requires a smaller window size than an image of a coarser spatial resolution. A larger window (e.g., 9 × 9) usually leads to more accurate estimates of the pixel gray level distribution owing to reduced random error, and consequently lower noise in the feature image. However, a large window involves more averaging over different texture properties near region boundaries, and introduces systematic errors into the derived texture. There is no universally proper window size for classifying all land covers in an image. A larger window size increases the classification accuracy for some classes, but lowers the accuracy for other classes in the same classification (Ferro and Warner, 2002). The indiscriminate application of a fixed window size is problematic for pixels at the borders, which are commonly misclassified. The larger the window size, the more pixels are classified erroneously (Walter, 2004). Such misclassifications may be remedied by establishing a texture feature using a moving window. Customized window sizes are preferred over fixed, arbitrary windows as they enable forest-stand parameters such as leaf area index, stand density, and volume to be estimated more accurately (Franklin and McDermid, 1993; Wulder et al., 1998). A 7 × 7 window size is optimal for improving the global classification accuracy of intraurban land cover types (Puissant et al., 2005). However, a window size of 32 is deemed sufficiently large to produce separable texture features consistently across different radar datasets (Clausi and Yue, 2004). The risk of using such a large window size is that it may not enable fine textures to be quantified when multiple textures coexist.
No matter which window size is adopted for texture derivation, or which texture measure is adopted for the calculation, all indices derived from the GLCM face two common limitations:
• First, the derivation is restricted to a single spectral band. Spaceborne imagery is commonly recorded in the multispectral domain, so the use of a single band in deriving texture wastes a great deal of the information content of the other spectral bands. On the other hand, it is unknown how textures derived from individual bands should be combined to generate a compound texture index. An intuitive solution is to derive texture from individual bands and then average the results.
• Second, it is computationally intensive to quantify texture from satellite imagery recorded at 8 bits, for which there are 256 possible gray levels. The time required to calculate the
matrix is prolonged further with the new generation of satellite data recorded at a radiometric resolution as high as 10 or even 11 bits.
10.1.3 Texture Measures from Fourier Spectra
The Fourier transform converts an image from the spatial domain to the frequency domain. A discrete 2D image expressed as DN(rj, ck) is transformed into a series of sine and cosine functions whose spatial frequencies un and vm are used to describe texture. For an M × N image, Eq. (6.27) is adapted to the following form (Augusteijn et al., 1995):
F(un, vm) = Σ_{rj=1}^{M} Σ_{ck=1}^{N} exp(2πi·un·rj/M) · exp(2πi·vm·ck/N) · DN(rj, ck)        (10.4)
where i = √−1. Different types of texture features can be derived from the Fourier power spectrum |F|², such as the radial distribution of its values. Large values of |F|² clustered close to the origin signify a coarse texture, whereas a fine texture is associated with a dispersed distribution of these values. Thus, a measure of coarseness can be derived by averaging |F|² over ring-shaped regions centered at the origin. Such a measure may be further complemented by the angular distribution of values in |F|², which is sensitive to texture directionality. Statistical measures such as maximum magnitude, average magnitude, energy, and variance of magnitude can also be derived from |F|. No matter how many measures are selected for calculation, they are limited in that the spatial component of texture is not reflected realistically.
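As a rough illustration of the ring-averaging idea described above, the following sketch computes the power spectrum of a band and averages |F|² over concentric rings. The ring widths, the use of a shifted spectrum, and the test patterns are assumptions made for this example, not a prescribed procedure.

```python
# A minimal sketch of Fourier-based texture: the power spectrum is averaged
# over concentric rings to yield a coarseness profile.
import numpy as np

def ring_spectrum(image, n_rings=8):
    """Average |F|^2 over ring-shaped regions centered on the zero frequency."""
    f = np.fft.fftshift(np.fft.fft2(image))
    power = np.abs(f) ** 2
    rows, cols = image.shape
    y, x = np.indices(power.shape)
    radius = np.hypot(y - rows / 2, x - cols / 2)
    r_max = radius.max()
    profile = []
    for k in range(n_rings):
        ring = (radius >= k * r_max / n_rings) & (radius < (k + 1) * r_max / n_rings)
        profile.append(power[ring].mean() if ring.any() else 0.0)
    return np.array(profile)   # energy concentrated in low rings -> coarse texture

# Example: a smooth gradient (coarse) versus random noise (fine)
coarse = np.tile(np.linspace(0, 255, 64), (64, 1))
fine = np.random.default_rng(0).integers(0, 256, (64, 64)).astype(float)
print(ring_spectrum(coarse)[:3])
print(ring_spectrum(fine)[:3])
```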
10.1.4 Semivariogram-Based Texture Quantification
The uncertainty surrounding the selection of a window size disappears with the semivariogram method of texture quantification. The semivariogram is a plot of lag distance h versus semivariance γ(h) (Fig. 10.3). Pixel self-similarity is related to distance or neighborhood size. Optimal window sizes are decided from the characteristics of a particular spectral band. In particular, the range r seems to be a reliable indicator of the appropriate scale, beyond which the variance of pixel values levels off. In images of a lower spatial resolution with fewer pixels than the optimal window size predicted by the semivariogram, the window
FIGURE 10.3 A semivariogram, or plot of the semivariance of pixel values versus lag h. Pixel self-similarity sheds light on the best window size. r, the range, or the distance at which the semivariance reaches a plateau for the first time, is used to calculate texture.
is restricted to the plot area. Several models are available for modeling the variogram, such as the spherical, exponential, and sinusoidal models. The spherical model might be the most appropriate for depicting texture in certain cases, whereas the exponential or even the sinusoidal model may be optimal in other cases (Maillard, 2003). Such an inconsistency is avoided by always choosing the best-fitting model. The standard formula for calculating semivariograms [Eq. (10.5)] is applicable to a single spectral band. Multispectral bands have to be combined arithmetically to create a single band before texture is derived from them (Carr, 1999). The merging of two bands to create a new one may be iterated for each possible two-band combination. Classification of multispectral bands based on semivariogram-derived texture requires a cross-variogram. Unlike the variogram, which accounts for within-band correlation, the cross-variogram, or paired-sum variogram, represents between-band correlation. It captures the mean spatial relationship of pixels for a particular land cover across spectral bands (Carr, 1996).
γ(h) = [1/(2N)] Σ_{i=1}^{N} [DN(i) − DN(i + h)]²        (10.5)

where DN(i) = value of the ith pixel; DN(i + h) = value of another pixel separated from it by a vector h that has a directional component; and N = the number of pixel pairs separated by the lag h in a particular direction.
In total, four texture values exist for the four directions. Their average yields one omnidirectional variogram, meaning the texture is treated as the same in all directions (Carr, 1999). Variogram texture image classification is implemented by computing the semivariogram for each class using a training sample of size M × M. Classification itself is still per-pixel in that it proceeds pixel by pixel. However, the comparison is based on the pixel's texture calculated over a neighborhood of M × M instead of its spectral value alone, namely the numeric distance metric

Distance = Σ_{i=1}^{K} |γt(i) − γp(i)|        (10.6)
where K is defined as the allowable increment of h within the constraint of the neighborhood size M; subscripts t and p refer to the semivariogram calculated from the training sample and the pixel in question, respectively. A pixel is assigned to a class to which the distance is the shortest (Carr and Miranda, 1998). Variogram-derived texture classification may be implemented in three ways: based solely on spectral information, texture information, or a combination of both. The combination of texture classification with that of spectral information is found particularly valuable for monospectral radar imagery (Carr, 1996). In the combination of spectral and textural information, both types of information are used either sequentially or simultaneously. In the former case, the input image is first classified spectrally using a per-pixel method such as minimum distance to mean. The classified results are then refined using textural information, during which the assigned pixel identity is further verified or modified in accordance with the spatial evidence. In the simultaneous use of spectral information and textural information, the decision rule is identical to Eq. (7.9) except that the calculation of spectral distance has to be modified accordingly.
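A compact sketch of Eqs. (10.5) and (10.6) is given below: a directional semivariogram is computed over a neighborhood, and a window is assigned to the training class with the smallest absolute-difference distance. The lag handling, window size, and class dictionary are illustrative assumptions, not the procedure of a particular study.

```python
# A minimal sketch of Eqs. (10.5) and (10.6): a directional semivariogram and
# the absolute-difference distance used to label a pixel's neighborhood.
import numpy as np

def semivariogram(window, max_lag, direction=(0, 1)):
    """gamma(h) for h = 1..max_lag along one direction, per Eq. (10.5)."""
    di, dj = direction
    gammas = []
    for h in range(1, max_lag + 1):
        a = window[max(0, -di * h):window.shape[0] - max(0, di * h),
                   max(0, -dj * h):window.shape[1] - max(0, dj * h)]
        b = window[max(0, di * h):, max(0, dj * h):][:a.shape[0], :a.shape[1]]
        gammas.append(np.mean((a - b) ** 2) / 2.0)
    return np.array(gammas)

def classify_window(window, class_variograms, max_lag):
    """Assign the window to the class with the smallest distance, Eq. (10.6)."""
    gamma_p = semivariogram(window, max_lag)
    distances = {name: np.sum(np.abs(gamma_t - gamma_p))
                 for name, gamma_t in class_variograms.items()}
    return min(distances, key=distances.get)

# Training semivariograms from two hypothetical M x M samples
rng = np.random.default_rng(1)
smooth = rng.normal(100, 1, (15, 15))
rough = rng.normal(100, 20, (15, 15))
training = {"smooth": semivariogram(smooth, 5), "rough": semivariogram(rough, 5)}
print(classify_window(rng.normal(100, 18, (15, 15)), training, 5))
```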
10.1.5 Comparison of Texture Measures
It is difficult to judge which texture measure is the best as the performance varies with the ground features to be mapped and the complexity of their boundaries. Both the GLCM and Gaussian Markov random field measures can produce correct feature estimates for simple boundaries in the presence of multiple textures (Clausi and Yue, 2004). Gray level co-occurrence probabilities are more capable of discriminating sea ice on Synthetic Aperture Radar (SAR) imagery than Markov random fields at a smaller window size. They are also more sensitive to texture boundary confusion than the latter in image segmentation. The reliability of estimating these features is compromised if the boundaries are irregular. The performance of texture measures is also affected by the number of spectral bands used in the classification.
In classifying remote sensing data of three multispectral bands, the GLCM is, on average, 3.1 percent more accurate than the Fourier transform measures (Dulyakarn et al., 2000). Of six measures compared (pixel patterns, co-occurrence matrix, gray level difference, texture tone, Fourier transform, and Gabor filters), the Fourier measures are the most accurate, while the performance of the co-occurrence measures is the most consistent when a single band is used (Augusteijn et al., 1995). However, this conclusion is of limited use as most satellite data are classified in the multispectral domain. The performance of a given texture measure is affected by its distinctiveness. Although the semivariogram, Fourier spectra, and GLCM are all powerful measures of texture, achieving a Kappa value over 70 percent (90 percent if edges are excluded) in separating a set of very different textures, the GLCM measure is better at separating textures that are easily separable visually (Maillard, 2003). The semivariogram-based method is slightly better at separating similarly textured patches. Both the variogram and the GLCM are generally superior to the fast Fourier transform, which is more sensitive to small variations in texture than the GLCM. The performance of both methods is degraded in complex situations (e.g., too many classes to be distinguished). The relative performance of texture quantifiers is also affected by the remote sensing data. For microwave imagery the semivariogram textural classifier outperforms the co-occurrence matrix classifier in terms of accuracy (Carr and Miranda, 1998). This method is particularly useful for classifying texture in microwave imagery. However, the performance of the two classifiers is comparable if the imagery is optically acquired over the reflective and NIR spectrum, such as Landsat Thematic Mapper (TM), SPOT (Le Systeme Pour l'Observation de la Terre), and Linear Imaging Self-Scanning Sensor (LISS)-II data, even though the variogram method is substantially more computationally efficient than spatial co-occurrence matrices.
10.1.6 Utility of Texture in Image Classification
Incorporating texture into a per-pixel classifier facilitates the improvement of classification accuracy. Texture provides additional information and makes the classification more accurate than it would be without it (Skidmore et al., 1997). Remarkable results of cloud classification were achieved from texture features calculated in a window of 512 × 512 pixels (Lee et al., 1990). Incorporation of textural measures into the classification of logged forests increased classification accuracy by almost 40 percent, even though the accuracy was only approximately 50 percent using both spectral and textural features (Chan et al., 2003). The accuracy for logged forests increased significantly, by 36 percent, after texture measures were added. Owing to the use of texture, classification accuracy at the stand level was improved by 21 percent over that obtained using spectral data alone; the
accuracy improved further to 80 percent in stands grouped according to species dominance/codominance (Franklin et al., 2001). The overall accuracy in a highly generalized classification also rose over that obtained using spectral clues alone. A combination of texture derived from the GLCM, the gray level difference histogram, and the sum and difference histogram with spectral features significantly improved the classification accuracy over that with pure spectral features (Shaban and Dikshit, 2001). The amount of improvement was larger when more texture features were used. It must be noted that the higher accuracy is achieved with distinct textures that do not overlap spatially with each other (Wang and He, 1990). The accuracy will be much lower if the image covers a natural scene where multiple textures may coexist spatially.
The utility of texture in improving classification accuracy is affected by a plethora of factors, such as the ratio of noise to information, the texture measure adopted and the window size of its derivation, the land covers to be mapped, and the imagery to be classified. If the ratio is low, there is little advantage in including texture in a classification. Highly promising results are achievable using texture quantified with a new statistical approach, reaching a perfect accuracy for nonborder pixels (Wang and He, 1990). Spatial classification based on autocorrelograms of IKONOS panchromatic imagery yielded an accuracy of 0.95 (Kappa = 0.9) in classifying orchards and vineyards (Warner and Steinmaus, 2005). By comparison, the accuracy (0.865) is lower with a maximum likelihood classification of 32 gray level co-occurrence texture bands. Classification accuracy was improved from 55 percent achieved with traditional spectral-based classification to 92 percent using lacunarity approaches in mapping complex urban features from high-resolution image data (Myint and Lam, 2005). Texture of continuous urban classes derived from a larger window size is associated with a higher mapping accuracy. Texture of other covers led to a lower accuracy if derived from a small window size, but to a higher accuracy if derived from a large window size (Chen et al., 2004). Therefore, the influence of window size on the derived texture is not uniform even at the same geographic scale.
The utility of texture in image classification, however, does vary with the land covers to be mapped since not all covers have a distinctive and unique texture. The utility of texture in improving classification accuracy depends on the distinctiveness of texture for a class. Inclusion of texture may improve the mapping accuracy for certain texturally distinct covers, such as forest, agricultural, and urban areas, at a very small scale. These covers have monotonous components. For an accurate classification of land cover classes in urban areas, especially urban residential areas, it is very important to incorporate spatial relationships among pixels by using textural features (Frohn and Arellano-Neri, 2005). They have a spatial pattern that cannot be
recognized solely from spectral information. Image texture improves classification accuracies for hardwood stands more than for softwood stands (Franklin et al., 2001). In mapping vegetation (open, flooded, and dense) and water, vegetation units and water could be discriminated and tentatively mapped on the basis of texture derived using the semivariogram method (Miranda et al., 1996). Inclusion of textural features derived from a window of 7 × 7 pixels in a neural network classification improved the accuracy for forest and agriculture, but decreased the accuracy for water and built-up land (Bischof et al., 1992).
Additionally, the degree to which texture is able to improve classification accuracy depends on the type of imagery used and its spatial resolution. Classification of radar imagery benefited considerably more from the use of texture than did classification of optical and NIR imagery (Carr, 1996; Carr and Miranda, 1998). Texture brings little additional advantage in classifying optically obtained visible and NIR images. The improvement in the overall accuracies is modest for fine resolution (<1 m) images (Franklin et al., 2000). The inclusion of texture increased classification accuracy more for high spatial resolution imagery than for coarse resolution imagery. Texture was more effective for improving the classification accuracy of land use classes at finer resolution levels (Chen et al., 2004). The use of textural information yielded a much higher rate of correct classification (75 percent) than the use of spectral information (36 percent) (Carr, 1999). However, the accuracy is similarly high, at 74 percent, in another test in which both spectral and textural information is used.
Spatially, not all image areas benefit equally from the use of texture in a classification. It is possible to achieve perfect accuracy if textures are pure and do not overlap spatially (Fig. 10.4a). However, misclassifications emerge at the junction of two similar textures (Fig. 10.4b). The limited capability of texture in classifying border pixels has been demonstrated repeatedly in a few studies. In spite of perfect classification elsewhere, classification accuracy is noticeably lower in some localized areas, such as along the boundary of two distinct textures where between-class variance is much higher (Ferro and Warner, 2002). In fact, misclassification errors can reach up to 20 percent along edges (Maillard, 2003). In order to be useful near border areas, texture must be measured from a large number of pixels that are influenced by between-class variance (Ferro and Warner, 2002). A delicate balance must be struck between making the texture measure potentially useful and adding the confusion of the between-class variance pixels. Regardless of the window size, the derivation of texture is always problematic for border pixels, which simply do not have the same number of neighboring pixels as those in the middle of an image. Here it is impossible to describe the texture adequately using a limited number of pixels. This difficulty, however, vanishes with the artificial neural network classifier (refer to Chap. 8
FIGURE 10.4 (a) Four of Brodatz's texture images: (i) beach sand; (ii) water; (iii) pressed cork; and (iv) fur hide of an unborn calf. (b) Classified results of the spatial patterns (source: Wang and He, 1990).
for details) because texture can be implicitly included by inputting all the pixels surrounding the one under consideration within a neighborhood, thus avoiding the necessity of texture quantification altogether (Bischof et al., 1992).
10.2 Contexture and Image Analysis
Contexture is another means of encoding spatial relationships among land cover types. It refers to the location or spatial association of one feature in relation to other features in its vicinity, or the interrelationship between pixels and/or regions in a predetermined neighborhood. Usually, these features must have a known identity, such as a building or car park, for the contextual information to be of any use. Contexture is usually described in three ways:
• The scene model—It describes the expected connections between different elements in the scene (Nicolin and Gabler, 1987). This method of description does not suit all land covers, as their components are not always identifiable.
• The heuristic approach—It allows the use of knowledge in a special domain with minimal requirements on training samples. A priori knowledge of the object is essential in context-based image classification. This method is not suitable for image classification since the knowledge cannot be expressed quantitatively.
• The rule format—The contextual information of an object in a given part of a scene is described in terms of its distance to other objects (Moller-Jensen, 1990). This method is the most popular in knowledge-based image classification.
Contexture has two connotations, image and spatial (Kontoes and Rokos, 1996). Image contextual information refers to measures applied internally to remotely sensed data (Gurney and Townshend, 1983), such as class labels assigned to neighboring pixels/segments on the map layer. It can be either qualitative or quantitative. Qualitative contextual information, such as a road leading to or lying close to a building, is used frequently by human interpreters. Machine-based classification relies on quantitative spatial contexture that is usually derived from ancillary data. Such information is quantified from a vector topographic database (e.g., roads, railways, settlements, hydrological networks, and so on) or raster digital elevation models (DEMs) (e.g., slope, orientation, and so on). Other data sources for acquiring quantitative spatial contexture include soil type, subsurface drainage, land suitability, temperature, precipitation, proximity to facility networks, and environmental parameters. Spatial contexture, such as a
certain detected vegetation class that is incompatible with soil type and altitude information, may be derived from soil maps and DEMs (Baltsavias, 2004).
Unlike texture, contexture has not been widely exploited to improve classification accuracy. The use of contexture in image classification may be achieved in two ways: use of additional contextual information derived either from texture features or from other map products, and use of a knowledge-based system that makes use of geographic context (Kontoes and Rokos, 1996). The implementation of the second method requires intelligent image classification and will be discussed further in Sec. 11.6. Both methods generated more accurate results in a supervised classification than a parametric image classifier alone. Spatial contexture is especially useful in situations where spectral knowledge alone proves insufficient for reliably labeling pixels (Wilkinson and Megier, 1990). The introduction of contextual features (roads, hydrology, relief, and so on) into image classification, together with texture, considerably increases the mapping accuracy of land covers, even though not all covers benefit equally from the extra information (Debeir et al., 2002). With the introduction of textural features and contextual data, the Kappa coefficient increased from 0.60 to 0.82. One problem with contexture is the presence of artifacts in the classified results. This problem disappears if the contextual information is represented in rule form in intelligent image classification (to be discussed in Sec. 11.6.4).
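The rule format lends itself to a very simple illustration. The sketch below revises a spectrally assigned label using the pixel's distance to a known ancillary feature (roads); the class names, the 50-m threshold, and the use of a distance transform are hypothetical choices, not a published rule set.

```python
# A hypothetical sketch of contexture expressed in rule form: a spectrally
# assigned label is revised using the distance to the nearest road pixel
# taken from an ancillary layer. Illustrative assumptions throughout.
import numpy as np
from scipy import ndimage

def apply_context_rule(labels, road_mask, pixel_size=10.0, max_dist=50.0):
    """Relabel 'bare soil' pixels close to a road as 'built-up'."""
    # Distance (in metres) from every pixel to the nearest road pixel
    dist_to_road = ndimage.distance_transform_edt(~road_mask) * pixel_size
    revised = labels.copy()
    rule = (labels == "bare soil") & (dist_to_road <= max_dist)
    revised[rule] = "built-up"
    return revised

labels = np.full((5, 5), "bare soil", dtype=object)
roads = np.zeros((5, 5), dtype=bool)
roads[2, :] = True                      # a road running across the scene
print(apply_context_rule(labels, roads))
```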
10.3 Image Segmentation
Image segmentation refers to the process of decomposing an input image into spatially discrete, contiguous, nonintersecting, and semantically meaningful segments or regions. These regions are patches comprising relatively homogeneous pixels, which share a higher internal spectral homogeneity among themselves than with pixels in other regions (Ryherd and Woodcock, 1996). They may be regarded as object primitives because they are not always associated with real-world objects. These object primitives serve as building blocks for subsequent segmentation processes and are fed into a classifier as the input (Definiens Imaging, 2004). The segmented image, however, should not be parametrically classified at the pixel level, though it is permissible to classify it using nonconventional classification methods (e.g., object-oriented image classification).
Image segmentation fulfils a number of image processing functions. It is a vital preparatory step toward image classification based on a hierarchy above the pixel level. Image segmentation is also an indispensable step in certain applications that focus on a particular type of land cover among several present within a study area (e.g., water bodies in water quality analysis). The land covers of no interest are
stripped off the image via image segmentation. Through image segmentation, much of the structural clutter in hyperspatial satellite data, which tend to have a high internal spectral variability within the same land cover types because of their fine spatial resolution, can be successfully removed and local spectral variation reduced. If implemented in conjunction with classification, image segmentation can yield meaningful land cover classes directly.
Image segmentation may be carried out using a top-down or bottom-up strategy, or a combination of both. In the top-down approach, the input image is partitioned into many homogeneous regions. In the bottom-up approach, pixels are linked together to form regions that are amalgamated later. In either strategy homogeneous patches are formed by generalizing the subtle spectral variation within an identified neighborhood. The number of patch classes may be as low as two (e.g., land vs. water) or as high as necessary. Image segmentation can be carried out in the spectral and/or spatial domain. In the spectral domain, pixel value– or gray level–based image segmentation may be implemented with one of five methods: measurement-space-guided spatial clustering, single-linkage region growing, centroid-linkage region growing, and split-and-merge methods (Haralick and Shapiro, 1985), and their combination (Hu et al., 2005). In terms of the manner of region formation, these methods fall into three broad categories: pixel-based, edge-based, and region-based.
10.3.1 Pixel-Based Segmentation
Also known as thresholding, pixel-based image segmentation aims to stratify an input image into pixels of two or more values through a comparison of individual pixel values with a predefined threshold T. In this method a pixel is examined in isolation to determine whether or not it belongs to a region of interest, based on its value in relation to the mean value of all pixels inside this region. If its value is smaller than the specified threshold, it is given a value of 0 in the output image; otherwise it receives a value of 1 [Eq. (10.7)]. This method is easy to implement and computationally simple.

I′(i, j) = 0  if DN(i, j) < T
I′(i, j) = 1  if DN(i, j) ≥ T        (10.7)
where DN(i, j) refers to the pixel value at position (i, j). Apparently, this kind of operation can be carried out only for a single band. In the case of multiple bands, the same operation has to be repeated a number of times, with the threshold T modified accordingly each time. Differently segmented results from multiple bands may then be combined linearly using differential weights.
Thresholding may be implemented locally or globally. In local thresholding, the image is divided into smaller subimages, and the
FIGURE 10.5 An example of image segmentation based on pixel value. (a) Raw band 3 of an IKONOS image; (b) the image segmented at a threshold of 132. This binary image shows land and water. The segmented results still require further spatial filtering to be perfect.
threshold for each subimage is derived from the local properties of its pixels. In global thresholding, the entire image is segmented against one or more thresholds universally (Fig. 10.5). Similar to density slicing, global thresholding produces a binary output if the pixel values of the input image have a bimodal distribution. The success of this method relies on the proper selection of the threshold and the nature of the scene. In order to increase the reliability of the segmented result, it is recommended that this threshold be determined from the histogram of the input band. Since pixel values along a boundary vary only gradually, there is no abrupt change in the histogram. Thus, selecting the threshold without consideration of the actual boundary may not produce a realistic segmentation. A threshold that is too conservative causes the segmented region to be smaller than in reality, and one that is too lax produces just the opposite effect. Avoiding this problem requires consideration of the nature of the edge itself. The problem, however, disappears in edge-based segmentation.
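A minimal sketch of global thresholding per Eq. (10.7) follows, with the threshold T taken from the histogram valley between the two modes of a bimodal band. The valley search shown here is a simplistic assumption; in practice the threshold is often chosen interactively or with a method such as Otsu's.

```python
# A minimal sketch of histogram-guided global thresholding per Eq. (10.7).
import numpy as np

def histogram_valley_threshold(band, bins=256):
    """Choose T at the minimum of the (smoothed) histogram between its two peaks."""
    hist, edges = np.histogram(band, bins=bins)
    smooth = np.convolve(hist, np.ones(5) / 5.0, mode="same")  # light smoothing
    peak1 = int(np.argmax(smooth))
    # second peak: highest bin sufficiently far from the first
    mask = np.abs(np.arange(bins) - peak1) > bins // 8
    peak2 = int(np.argmax(np.where(mask, smooth, -1)))
    lo, hi = sorted((peak1, peak2))
    valley = lo + int(np.argmin(smooth[lo:hi + 1]))
    return edges[valley]

def threshold_segment(band, T):
    """Eq. (10.7): 0 where DN < T, 1 where DN >= T."""
    return (band >= T).astype(np.uint8)

# Synthetic bimodal band: dark water around 40, bright land around 160
rng = np.random.default_rng(0)
band = np.concatenate([rng.normal(40, 8, 5000), rng.normal(160, 15, 5000)])
band = band.reshape(100, 100)
T = histogram_valley_threshold(band)
binary = threshold_segment(band, T)
print(T, binary.mean())
```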
10.3.2 Edge-Based Segmentation
Edges in imagery refer to boundaries between land covers at which pixel values change abruptly along a certain direction. Edges can be detected using the techniques described in Sec. 6.4.2. In edge-based segmentation, pixels completely encompassed by edge pixels or a boundary are considered part of a homogeneous region. Image pixels either belong to a segment or form a boundary. Also known as boundary-based segmentation, edge-based segmentation starts with the detection of linear features or edges in the input image using the many edge-detection algorithms covered in Chap. 6. Two important edge-based
segmentation methods are the optimal edge detector and watershed segmentation. In the former method the image is first filtered and then thresholded so that only coherent boundaries are preserved in the output image. In the watershed method the input image is first transformed into a gradient image that is then treated as a topographic surface from which catchments and basins are detected (Vincent and Soille, 1991). Each catchment basin is considered to correspond to a homogeneous region. In addition to texture, the segmented results are sensitive to noise. Overdetection of edges is rife in noisy images when the morphologic gradient is used. This problem may be remedied through median filtering, which homogenizes the image locally and eliminates extreme contour-distorting gradients. Contour sensitivity can also be limited by thresholding the image gradient. Boundary-based segmentation, nevertheless, does not always lead to partitioning of the input image into regions because the detected boundaries may be discontinuous (Carleer et al., 2005), or because many small ground objects are obscured by boundary pixels (Geneletti and Gorte, 2003). Additional processing steps are needed to link up the detected boundary segments to form polygons. Even so, the results are still subject to the selection of segmentation parameters and are inferior to those obtained with region-based algorithms. Since spatial relationships among pixels are not taken advantage of, segmented regions suffer from the salt-and-pepper effect. These limitations can be overcome by incorporating more image features into the segmentation process, such as the texture and intensity commonly used in region-based segmentation.
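A highly simplified sketch of the edge-based idea is given below: the gradient of a median-filtered band is thresholded to obtain edge pixels, and the enclosed non-edge pixels are labeled as regions. The percentile threshold and filter size are assumptions, and no boundary-linking step is included.

```python
# A simplified sketch of edge-based segmentation: median filter, gradient
# magnitude, edge threshold, then labeling of the enclosed non-edge pixels.
import numpy as np
from scipy import ndimage

def edge_based_segment(band, edge_percentile=80, median_size=3):
    """Label connected non-edge pixels; edge pixels receive the label 0."""
    smoothed = ndimage.median_filter(band, size=median_size)
    gx = ndimage.sobel(smoothed.astype(float), axis=1)
    gy = ndimage.sobel(smoothed.astype(float), axis=0)
    gradient = np.hypot(gx, gy)
    edges = gradient > np.percentile(gradient, edge_percentile)
    labels, n_regions = ndimage.label(~edges)     # regions enclosed by edges
    return labels, n_regions

# Two flat patches separated by a sharp boundary
band = np.zeros((60, 60))
band[:, 30:] = 100
labels, n = edge_based_segment(band)
print(n)
```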
10.3.3 Region-Based Segmentation
Essential to region-based segmentation is the identification of homogeneous areas via the application of homogeneity criteria among candidate segments. Relying on statistically derived homogeneity over an area, region-based methods are resistant to noise in the image data. Region-based segmentation can be divided further into the subgroups of region growing and split-and-merge.
Region Growing
Region growing can be implemented in two ways: general purpose and knowledge based (the latter is covered in Sec. 10.3.4). General-purpose, or partial-image, segmentation based on region growing starts from initial seed pixels or areas without any a priori knowledge about the scene. These seed regions are then augmented with neighboring pixels if they meet the homogeneity criteria. A region grows until certain homogeneity thresholds, such as size and shape, are reached. This process is usually iterative, so a region grows until no more pixels can be allocated to any of the segments. The process is repeated until the entire image is segmented. This very popular method is implemented either locally or globally. One local region-growing method is known
FIGURE 10.6 The pixel under consideration (DN) and its eight neighboring pixels (DN0 to DN7) within the operating window of 3 × 3 pixels.
as the Purdue method, in which spectrally similar adjoining pixels are replaced with the mean gray level of pixels within the operating window. Regions of similarly valued pixels are created on the basis of the spectral disparity in a single band. In this window-based operation, a seed pixel or region labeled K is identified first (Fig. 10.6). It then gradually expands by annexing the pixels adjoining it sequentially. Whether or not a neighboring pixel Pi (i = 1, 2, 3, …, 8) is considered a part of K is determined by its value in relation to that of K. The value of each remaining ungrouped pixel in the window is compared with K's mean value DN_K to see whether their absolute difference |DN_K − DN_i| is smaller than the predefined tolerance threshold ε. If it is, pixel Pi is considered part of K and assigned to it. This process is then repeated for the next pixel in the window. Prior to the next comparison, K's mean value DN_K must be updated using Eq. (10.8) to take the newly appended pixel's value into consideration. If the condition |DN_K − DN_i| < ε is never met for a pixel in the window, that pixel is excluded from K.
DN_n = (n·DN_{n−1} + DN_i) / (n + 1)        (10.8)

where n = the number of pixels that have been confirmed as part of region K; DN_n = the mean pixel value of region K averaged from the n pixels; DN_{n−1} = the mean value averaged from (n − 1) pixels; and DN_i = the value of pixel Pi in the operating window.
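A minimal sketch of seed-based region growing in the spirit of the procedure just described follows; the neighbor set, the tolerance ε, and the queue-based traversal are implementation assumptions rather than the exact Purdue method.

```python
# A minimal sketch of region growing from a seed: neighbours whose value
# differs from the running region mean by less than epsilon are annexed, and
# the mean is updated after every annexation in the spirit of Eq. (10.8).
import numpy as np
from collections import deque

def grow_region(band, seed, epsilon=10.0):
    """Return a boolean mask of the region grown from the seed pixel."""
    rows, cols = band.shape
    region = np.zeros((rows, cols), dtype=bool)
    region[seed] = True
    mean, n = float(band[seed]), 1
    frontier = deque([seed])
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1),
                       (-1, -1), (-1, 1), (1, -1), (1, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols and not region[rr, cc]:
                if abs(float(band[rr, cc]) - mean) < epsilon:
                    region[rr, cc] = True
                    mean = (n * mean + float(band[rr, cc])) / (n + 1)  # update mean
                    n += 1
                    frontier.append((rr, cc))
    return region

# A bright square on a dark background
band = np.zeros((20, 20)) + 20
band[5:15, 5:15] = 120
mask = grow_region(band, seed=(10, 10), epsilon=15.0)
print(mask.sum())   # roughly the 100 pixels of the bright square
```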
Unlike edge-based segmentation, region growing always produces closed patches. Besides, it is flexible in handling multispectral bands. This method, however, has the drawback of potentially generating many small regions. Thus, it is commonly implemented in multiple runs to merge some of them. In the first pass, spatially homogeneous regions are built on the basis of Euclidean distances in the n-dimensional space using the aforementioned method. These tentative results are then further refined by combining the identified regions in the next pass. Examples of region-growing approaches with proven records are the extraction and classification of homogeneous object algorithms, the hierarchical stepwise optimization
algorithm, the Woodcock-Harward algorithm, and the interactive mutually optimum region merging algorithm (Blaschke et al., 2000). These algorithms sometimes suffer from a lack of control over the break-off criterion for the growth of a region, so the approach was later modified to include a decision function for segmenting tree crowns in aerial photographs (Erikson, 2003). In this function, the decision is based on consideration of a pixel in the spatial domain and the spectral domain simultaneously. Owing to this modification, the segmented tree crowns are, on average, within 73 percent of the corresponding results from manual delineation.
Split and Merge
In this method the input image is subdivided into squares of a uniform size from initial seed pixels or areas, without any a priori knowledge about the nature of the scene. Those adjacent regions with a certain similarity in image properties, as measured by the correlation coefficient, are then merged agglomeratively until the termination criterion is reached (Hu et al., 2005). The segmented results are then fine-tuned pixelwise to make the boundaries of the segmented regions regular. The entire process consists of three steps: hierarchical splitting, agglomerative merging, and pixel-level refining. In the first step the input image is segmented into regions of a uniform feature. Their formation is based on similarity in texture, intensity, and color. Other merging criteria may include size, shape, and mean value. Similar regions that lie within a certain proximity to one another are merged agglomeratively. Many segments may grow simultaneously over a scene through the merging of adjacent objects of a similar size and thus of a comparable scale. The number of regions in the segmented results may be restricted in accordance with certain region attributes. One way of controlling the growth of merged regions is to set a threshold on the percentage of regions allowed to be merged in a single pass. In merging regions it is critical to select suitable segmentation thresholds. Every merge is accompanied by a calculation of the stopping criterion. Once this criterion is reached, the merging process is terminated. For instance, if the smallest growth exceeds the threshold, then no merging is allowed. This process is iterated until all pixels have been allocated to segments. Afterwards, pixels along the border of each finalized region are checked and assigned to one of the two regions bordering it to achieve better localization of the region boundaries. A major problem with region-based segmentation is that the outcome is sequence-dependent. Thus, different segmentation results are obtained if the search is executed in a different order.
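Only the hierarchical splitting step lends itself to a very short sketch; the quadtree recursion below splits a block until its standard deviation falls below a homogeneity threshold, leaving the agglomerative merging and pixel-level refining steps out. The thresholds are assumptions made for illustration.

```python
# A sketch of only the hierarchical splitting step of split-and-merge:
# a block is split into quadrants recursively until it is homogeneous
# (low standard deviation) or reaches a minimum size.
import numpy as np

def quadtree_split(band, r0, c0, r1, c1, std_thresh=5.0, min_size=4, leaves=None):
    """Collect homogeneous leaf blocks as (row0, col0, row1, col1) tuples."""
    if leaves is None:
        leaves = []
    block = band[r0:r1, c0:c1]
    if block.std() <= std_thresh or min(r1 - r0, c1 - c0) <= min_size:
        leaves.append((r0, c0, r1, c1))
        return leaves
    rm, cm = (r0 + r1) // 2, (c0 + c1) // 2
    for rr0, cc0, rr1, cc1 in ((r0, c0, rm, cm), (r0, cm, rm, c1),
                               (rm, c0, r1, cm), (rm, cm, r1, c1)):
        quadtree_split(band, rr0, cc0, rr1, cc1, std_thresh, min_size, leaves)
    return leaves

band = np.zeros((64, 64)) + 50
band[:32, :32] = 200                     # one contrasting corner forces splits
leaves = quadtree_split(band, 0, 0, 64, 64)
print(len(leaves))                       # adjacent similar leaves would be merged next
```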
Alternative Implementations
The split-and-merge method may be implemented as a two-stage segmentation in which the input image is first segmented into "coarse" clusters using fuzzy c-means clustering analysis (Cannon et al., 1986). A cluster may be deleted if its membership is too small, or merged
with another cluster if the two are highly similar, as judged against the constructed cross-similarity matrix, in the same way as in standard clustering analysis. During the second stage each retained cluster is subdivided into "fine" clusters in order to minimize computation. The separation into a large number of clusters enables the identification of extremely minor classes that would have been overwhelmed in a one-stage clustering. The formed regions can be labeled effectively based on a priori information. In this alternative form of implementation, the identity of the segmented outcome is known, so there is no need to classify the image any further. This is virtually a means of image classification.
Another alternative implementation of region growing is to combine the two broad types of image segmentation methods (region-oriented and line- or contour-oriented) in multiple steps (Kestner and Rumpler, 1984). In the first step the input image is partitioned into regions, lines, small spots, or groups of similar parts, distinguishable from one another by characteristics such as compactness. They are also depicted differently: regions are described by their contours (straight lines), linearly shaped objects by their skeletons, and spots by a pair of coordinates. These characteristics dictate how they should be segmented in the subsequent steps. For instance, the region-oriented procedure is invoked to deal with areal objects that have a relatively large area-to-perimeter ratio in a manner identical to that above, whereas the line-oriented procedure is activated to process local edge features characterized by an elongated shape (i.e., a large perimeter-to-area ratio). A confidence value, as a measure of segmentation reliability, is attached to each generated object. However, in order to be practically successful, this method still requires an elaborate scheme that decides and orders which actions should be selected and executed for a specified task, so as to avoid redundant calculation and to construct a high-level representation of all objects formed in the preliminary step. The confidence level attached to each preliminary area of interest can be improved using additional knowledge. This brings up the issue of knowledge-based image segmentation.
10.3.4 Knowledge-Based Image Segmentation
In contrast to the general-purpose image segmentation covered above, knowledge-based, or knowledge-guided, image segmentation involves both domain-specific and domain-independent knowledge. Domain-specific knowledge is needed to decide the types of regions into which an image should be segmented. Knowledge is also needed to refine an initial segmentation outcome derived using a standard method. The knowledge used may be spectral or spatial, represented as rules (Ton et al., 1991). Spatial knowledge refers to spatial relationships (e.g., proximity, connectivity, and relative orientation) between pixels, which are commonly used in image interpretation. Under ideal circumstances, knowledge-based image segmentation
should consist of two stages, with spectral knowledge used at the first stage and spatial knowledge at the second. One method of implementing knowledge-based image segmentation is category-oriented segmentation followed by image-oriented segmentation (Ton et al., 1991). At the first stage, spatial and spectral knowledge about a land cover category is integrated into the segmentation process to extract targeted regions. Such kernel information is then combined with the domain knowledge to ascertain their identities, such as urban and barren covers. During the second phase, the specific covers of certain targeted regions are narrowed down hierarchically on the basis of the spectral behavior of covers (e.g., water in multispectral bands, and vegetation versus nonvegetation in terms of their vegetation index [VI] values) and spatial knowledge (e.g., the rectangular shape characteristic of agricultural and clear-cut regions, and the spatial proximity of urban areas to roads). Formation of the targeted regions may rely on clustering, region growing, and region adjustment, in which spectral knowledge represented in rule form determines the number of clusters. Spatial knowledge of region size can be used to adjust regions. Key to the success of this kind of segmentation is knowledge representation, or rule generation from training samples. Predictably, the more unique (i.e., nonoverlapping) these rules are, the more accurate the subsequent segmentation results. This knowledge-based method can be successful in segmenting images of even complex scenes, achieving an accuracy in the mid-80 percent range with three types of vegetation (deciduous, coniferous, and nonforest) (Ton et al., 1991). This very flexible method is applicable to any geographic area. However, the spectral and spatial knowledge may have to be modified for different geographic areas and different satellite images. More spatial rules are needed to further perfect the segmented results. Unlike spectral knowledge, which can be generated automatically, spatial knowledge requires considerable work to be transformed into a machine-readable form of representation.
In addition to knowledge derived from the imagery itself, external knowledge can also be incorporated into image segmentation by various means. One method of integrating external knowledge with object models is to modify the Bayes theorem for calculating conditional probabilities in likelihood-based image segmentation of a forested area (Abkar et al., 2000). A parameterized model represents a priori knowledge of the geometric shape of forested and nonforested objects. A morphologic model is constructed for predicting the extent of deforestation based on the expansion of farmland caused by deforestation in the past. Such external knowledge is used to classify satellite imagery locally via the Bayes function. This knowledge-based segmentation method relies on criteria derived from local average likelihoods, instead of local means or variances, and hence is much less sensitive to radiometric outliers. It avoids problems commonly
associated with the segmentation of multispectral data based on edge detection.
Another method of incorporating knowledge into image segmentation is via a map model. After an image has been initially segmented, the result may be refined using such external knowledge as structural and contextual information acquired from a topographic map (Mason et al., 1988). Knowledge of region characteristics (e.g., size, shape, compactness, border regularity, and texture), represented in rule form, is used to refine the segmentation results, such as by checking for domain consistency (increasing or decreasing confidence) or the possibility of splitting or merging a pair of neighboring regions. External knowledge is also useful for devising the confidence level at which a segmented region is labeled. This confidence may be based on edges detected using an edge detector (e.g., the correlation between the model boundary and any coincident edge in the image). Image segmentation based on the map model has lower segmentation errors than the traditional segmentation methods.
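How region-level rules might refine an initial segmentation can be hinted at with a toy sketch such as the one below. The rule thresholds, attribute names, and class labels are entirely hypothetical and stand in for the domain-specific knowledge discussed above.

```python
# A hypothetical sketch of knowledge-guided refinement: simple rules on
# region size and shape adjust the confidence of, or relabel, regions
# produced by an initial segmentation.
def refine_regions(regions):
    """Apply spatial-knowledge rules to initially labeled regions."""
    refined = []
    for region in regions:
        label, area, compactness = region["label"], region["area"], region["compactness"]
        confidence = region.get("confidence", 0.5)
        # Rule 1: agricultural/clear-cut regions are expected to be compact
        if label == "agriculture" and compactness > 2.5:
            confidence -= 0.2
        # Rule 2: very small "urban" regions are more plausibly bare soil
        if label == "urban" and area < 20:
            label, confidence = "bare soil", 0.6
        refined.append({"label": label, "area": area,
                        "compactness": compactness, "confidence": confidence})
    return refined

regions = [{"label": "agriculture", "area": 400, "compactness": 1.2},
           {"label": "urban", "area": 8, "compactness": 3.0}]
print(refine_regions(regions))
```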
10.3.5 Segmentation Based on Multiple Criteria
The pixel-, edge-, and region-based segmentation methods are limited in that the operation is based exclusively on one criterion, namely, pixel values in the multispectral domain. Thus, a huge amount of spatial information among pixels is wasted. This inability to make full use of the available spatial properties usually leads to poor segmentation outcomes, especially if the image has a fine spatial resolution. This deficiency may be overcome via three means: • First, by making use of more segmentation criteria. For instance, spatial relationship (e.g., contexture and shape) may be incorporated into segmentation by replacing the value of individual pixels with that averaged in a neighborhood around each pixel. • Second, by making use of additional image characteristics other than pixel values (e.g., texture) as an extra criterion in the segmentation. The utility of texture in segmenting an image varies with the scene. Its use is the most beneficial for areas where the desired classes exhibit textural differences (Ryherd and Woodcock, 1996). The addition of texture into spectral image segmentation brings out stronger benefits in threshold-based segmentation than in minimum size-based segmentation. • Third, by making use of multiple criteria such as shape and texture combined in one image segmentation (Hu et al., 2005). Shape can be depicted by such geometric parameters as compactness C and smoothness S. The compactness criterion is
especially important to consider in segmenting urban scenes where building roofs and adjacent roads share similar spectral values but have very dissimilar shapes. There are different ways of calculating smoothness and compactness. One way is to divide the de facto border length l by the square root of the number of pixels n comprising the image object (Benz et al., 2004), or

C = l / √n        (10.9)

S = l / b        (10.10)
Smoothness is defined as the ratio of the de facto border length l to the shortest possible border length b, that is, the border length given by the bounding box of the image object parallel to the raster [Eq. (10.10)]. Both smoothness and compactness may be combined linearly to define the shape homogeneity criterion, which is invaluable in preventing the formation of fractal objects in urban areas. Homogeneity of image objects may be defined by the spectral and contextual information determined from such parameters as shape. Shape heterogeneity describes the change in an object's configuration as measured by smoothness and compactness. The change in shape heterogeneity accompanying a merge (Δh_shape) is calculated using the following formula:

Δh_shape = w_compt · Δh_compt + w_smooth · Δh_smooth        (10.11)

where w_smooth and w_compt stand for the weights for smoothness and compactness, respectively, with values between 0 and 1. The proper allocation of these two weight parameters adapts the heterogeneity definition to an application and determines the success of the segmentation; Δh_compt and Δh_smooth represent compactness heterogeneity and smoothness heterogeneity, respectively, both of which are governed by the number of pixels in the objects before and after the merge, or

Δh_smooth = n_merge · S_merge − (n_obj1 · S_obj1 + n_obj2 · S_obj2)

where the subscript merge refers to the merged object; n_merge denotes the number of pixels within the merged object; the subscripts obj1 and obj2
refer to the two objects prior to the merge; and n_obj1 and n_obj2 represent the number of pixels in objects 1 and 2, respectively, before the merge.
Multiple segmentation criteria calculated from different parameters are usually combined to derive a compound fusion value f. For instance, the shape heterogeneity criterion derived in Eq. (10.11) may be fused with the spectral heterogeneity criterion Δh_color to calculate spatial heterogeneity. This combination minimizes the deviation derived from a compact or smooth shape (Benz et al., 2004). The fused value f is a weighted linear combination of spectral and shape heterogeneity. In fact, the similarity between any two regions j and k is calculated separately in each feature space used in the segmentation, or

Sim_jk(f1, f2, …, fn) = Σ_{i=1}^{n} Wi·ρi = w_color·Δh_color + w_shape·Δh_shape
                      = w·Δh_color + (1 − w)·Δh_shape        (10.14)
where n denotes the total number of criteria used in the segmentation; Wi refers to the weight assigned to the ith criterion ρi (e.g., shape, color, size, texture, and so on); and w_color and w_shape stand for the weights assigned to the spectral and geometric parameters, respectively. Their sum equals 1. The determination of these weights is based on the significance of each criterion in defining the regions. Spectral heterogeneity Δh_color refers to the spectral variation induced by merging two image objects. Spectral or color heterogeneity is a weighted sum of the standard deviations of pixel values within the respective regions in a given spectral band, or the sum of the standard deviations of spectral values in each layer multiplied by its weight w_b [Eq. (10.15)]. The color heterogeneity criterion ensures the generation of meaningful objects.

Δh_color = Σ_b w_b [n_merge·σ_b,merge − (n_obj1·σ_b,obj1 + n_obj2·σ_b,obj2)]        (10.15)
where σ_b stands for the standard deviation within an object in band b, and w_b denotes the weight assigned to band b (Benz et al., 2004). This weight enables multivariate segmentation of an image based on spectral properties. The above calculation has as many terms as the number of spectral bands in the image. The weights given to shape and to the image's bands in the homogeneity of an object can be modified flexibly, and the segmentation results may be adjusted in accordance with the desired application by assigning different weights to spectral and shape heterogeneity.
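The fusion of shape and color heterogeneity in Eqs. (10.9) to (10.15) can be sketched as follows for a single candidate merge. The object attributes, the equal band weights, and the choice w_smooth = 1 − w_compt are assumptions for illustration, not a particular package's implementation.

```python
# A minimal sketch of the fusion value used to judge a candidate merge,
# following Eqs. (10.9)-(10.15): shape heterogeneity from compactness and
# smoothness, color heterogeneity from per-band standard deviations.
import numpy as np

def shape_terms(n, border, bbox_border):
    compactness = border / np.sqrt(n)          # Eq. (10.9)
    smoothness = border / bbox_border          # Eq. (10.10)
    return compactness, smoothness

def merge_heterogeneity(obj1, obj2, merged, w_color=0.7, w_compt=0.5):
    """Fusion value f for merging obj1 and obj2 into 'merged'."""
    n1, n2, nm = obj1["n"], obj2["n"], merged["n"]
    c1, s1 = shape_terms(n1, obj1["border"], obj1["bbox_border"])
    c2, s2 = shape_terms(n2, obj2["border"], obj2["bbox_border"])
    cm, sm = shape_terms(nm, merged["border"], merged["bbox_border"])
    dh_compt = nm * cm - (n1 * c1 + n2 * c2)
    dh_smooth = nm * sm - (n1 * s1 + n2 * s2)
    dh_shape = w_compt * dh_compt + (1 - w_compt) * dh_smooth      # Eq. (10.11)
    # Eq. (10.15): per-band terms, equally weighted here
    dh_color = sum(nm * merged["std"][b] - (n1 * obj1["std"][b] + n2 * obj2["std"][b])
                   for b in range(len(merged["std"]))) / len(merged["std"])
    return w_color * dh_color + (1 - w_color) * dh_shape           # Eq. (10.14)

obj1 = {"n": 40, "border": 30, "bbox_border": 26, "std": [4.0, 3.5, 5.0]}
obj2 = {"n": 60, "border": 36, "bbox_border": 32, "std": [4.5, 4.0, 5.5]}
merged = {"n": 100, "border": 50, "bbox_border": 42, "std": [6.0, 5.5, 7.0]}
print(merge_heterogeneity(obj1, obj2, merged))
```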
FIGURE 10.7 An example of image segmentation based on a combination of spectral heterogeneity and shape. (a) Color composite of three IKONOS spectral bands; (b) Multiresolution segmented results based on a combination of spectral value and shape at a scale of 50 pixels. See also color insert.
Illustrated in Fig. 10.7 is an image segmented at a scale of 50 pixels (b) from the color composite of three multispectral IKONOS bands (a), using the shape and color heterogeneity criteria. The results are not perfect in that superfluous objects are still present owing to the conservative scale (200 m) adopted. Besides, water has been segmented into a few classes.
No matter how many parameters are used in generating regions, or which method is used, the segmented results need validation to detect whether the machine has over- or undersegmented regions. Nevertheless, these segmentation results cannot be evaluated automatically. Validation of such machine-produced results has to rely on human beings, who are still regarded as the best judges in the evaluation, even though a 2D distance may be used to quantitatively indicate the difference between a human-proposed segmentation and a machine-produced counterpart. Other performance parameters may include region uniformity, region contrast, line contrast, and so on.
10.3.6 Multiscale Image Segmentation
To segment an image successfully, it is imperative to take into account the scale at which the objects of interest occur in conjunction with the spatial resolution of the image. In most cases it is not possible to specify the exact scale level beforehand as there is no universally "ideal" scale for all features. This is especially true in urban areas where ground objects occur at unique scales of their own. The appearance and characteristics of even the same type of object vary with the scale of its rendition on satellite imagery. Segmentation of such objects must take place at multiple scales. Scale is a unitless parameter related to image resolution; thus, multiscale segmentation is synonymous with multiresolution segmentation. Analysis at multiple scales or resolutions is necessitated by the fact that not all ground features occur at the same physical scale. The best segmentation result is achievable by segmenting an image at different scales (Burnett and Blaschke, 2003). In multiscale segmentation the input image is first segmented at a small scale by uniting the most similar objects, producing a set of multiscale objects whose topological relationships are fully recorded (Sun et al., 2006). During multiscale segmentation the image is converted into object primitives that share a certain spectral behavior, shape, and context. These preliminary object features are then segmented at a higher level.
Multiresolution segmentation is a bottom-up region merge starting with single seed pixels, each of which is regarded as a potential region. In subsequent steps, these small regions are merged to form fewer, larger ones. A pair of neighboring image objects is evaluated to see if they meet the merging criteria. Whether adjoining objects should be merged is governed by the principle of homogeneity or the lack of it (i.e., heterogeneity); namely, a merge should result in minimal growth in the selected heterogeneity criteria. Commonly used amalgamation criteria include area, perimeter, compactness, texture, and shape, all of which are derived from the segmented regions. Determination of their specific values is critical to achieving segmentation results suitable for a particular type of application.
Objects grow in size through successive iterations in which small objects are incrementally merged to form larger ones. This pairwise clustering is accompanied by an even and simultaneous growth of segments over the scene, and by the calculation of the above indices for the newly formed objects. These indices are used to determine whether two objects should be amalgamated to form a larger one after evaluation against a number of object properties. Expert knowledge may be involved in forming objects at different scales. As the merging process continues, the merged object becomes increasingly heterogeneous. Hence the heterogeneity criterion must be updated after every merge; it is imposed as a constraint on the merging process. The break-off, or stop, criterion is based on the heterogeneity relationship between the two candidate objects in comparison with the squared scale parameter. The merging process is terminated once all pixels have been assigned to regions or when the threshold derived from the user-defined parameters is reached (Baatz and Schäpe, 2000). The outcome of multiscale image segmentation is affected by the scale parameter, the single-layer weights, and the heterogeneity criteria. The scale parameter dictates the spatial extent within which pixel values are used to derive spectral heterogeneity in merging two regions; its squared value internally determines the threshold for terminating the segmentation process. The extent of object growth also depends on the predefined break-off value: the broader this value, the bigger the segmented objects. Proper setting of the break-off value can overcome the limitation of pixel-based approaches in mapping large urban areas. Multiscale segmentation creates homogeneous image object primitives at a desired resolution, taking local contrasts into account without any prior knowledge (Blaschke and Hay, 2001), and leads to a better understanding of the image content. A hierarchical network may be created to link image objects at different resolutions or scales. In this way the same image is represented at several resolutions (scales) simultaneously. The constructed hierarchy shows the horizontal neighbors (adjacent objects) of an image object at the same level, as well as its neighbors at other levels. This multiscale representation enables the same ground objects to be differentiated on several levels, hence increasing the reliability of their identification. Once an image has been segmented at multiple scales, it can be classified at different scales using the object-oriented method.
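A minimal sketch of the pairwise merging decision, in the spirit of the multiresolution procedure described above, is given below: the growth in size-weighted spectral heterogeneity caused by a candidate merge is compared against the squared scale parameter. The function names and toy data are illustrative assumptions; actual implementations keep additional bookkeeping (shape heterogeneity, processing order of neighbors, and so on).

```python
import numpy as np

def merge_cost(obj1_pixels, obj2_pixels, band_weights):
    """Increase in weighted spectral heterogeneity caused by merging two objects.

    obj*_pixels : (n_pixels, bands) arrays holding the two candidates' pixels
    Returns sum_b w_b * (n_m * s_m - (n_1 * s_1 + n_2 * s_2)), i.e. the growth
    in size-weighted standard deviation produced by the merge.
    """
    merged = np.vstack([obj1_pixels, obj2_pixels])
    n1, n2, nm = len(obj1_pixels), len(obj2_pixels), len(merged)
    w = np.asarray(band_weights)
    growth = nm * merged.std(axis=0) - (n1 * obj1_pixels.std(axis=0)
                                        + n2 * obj2_pixels.std(axis=0))
    return float(np.sum(w * growth))

def should_merge(obj1_pixels, obj2_pixels, band_weights, scale_parameter):
    """Accept the merge only if the heterogeneity growth stays below the
    squared scale parameter (the break-off criterion described above)."""
    return merge_cost(obj1_pixels, obj2_pixels, band_weights) < scale_parameter ** 2

# Two spectrally similar neighboring objects are merged under a scale of 50
rng = np.random.default_rng(1)
a = rng.normal(80, 5, size=(200, 3))
b = rng.normal(82, 5, size=(150, 3))
print(should_merge(a, b, [1, 1, 1], scale_parameter=50))
```

Raising the scale parameter raises the threshold, so objects keep growing for longer and the resulting segments become larger, which matches the behavior described in the text.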
10.4
Fundamentals of Object-Oriented Classification
Object-oriented image classification has been variously known as per-parcel or per-field classification. Objects, parcels, and fields all refer to ground features, such as pastures, urban residential, and impervious
surfaces. Other objects that are frequently extracted from remotely sensed imagery include forests, agricultural fields, and so on (Baltsavias, 2004). They all have a homogeneous composition and a set of identifiable characteristics. Introduced to image classification in the 1970s, object-oriented image classification was mostly abandoned in favor of per-pixel classifiers because of their ease of implementation. Object-oriented image classification did not gain popularity until a few years ago, thanks to advances in computer hardware, software, and image interpretation theories, and to refined image spatial resolution. This popularity is attributed largely to the release of commercial image analysis software packages such as eCognition and Feature Analyst. Prior to the advent of these systems, object-oriented image classification was very difficult to accomplish: results already classified using the per-pixel method had to be integrated with a vector coverage to be reclassified in the per-field manner (Aplin et al., 1999). In object-oriented image classification, objects do not refer to individual entities such as roads and buildings that may constitute very important geospatial features in many applications. Instead, image objects are defined as contiguous regions of pixels whose radiometric properties are more uniform within the region than across regions. Each object corresponds to a patch of uniform tone and texture. Image objects can be described in terms of their shape, texture, topology, heterogeneity, and spatial relationship with other objects. They are formed by spatially aggregating neighboring pixels with similar spatial or spectral characteristics (Ryherd and Woodcock, 1996; Benz et al., 2004) in image segmentation. The objective of object-oriented image classification is to associate these object primitives with real-world objects. Object-oriented image classifiers operate on objects, or groups of pixels, instead of individual pixels. The spatial variation in tone within an object has already been generalized (Fig. 10.8). Unlike pixel-based classification, in which pixels are considered individually and assigned to land cover classes independently of one another, object-based classification treats each patch or object primitive as the smallest unit of analysis in the decision making.
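The notion of an object primitive as the unit of analysis can be illustrated with a short sketch that summarizes each segment of a label image by simple spectral and size attributes. The function name and toy data are assumptions for illustration only; real systems derive a much richer set of shape, texture, and topological attributes.

```python
import numpy as np

def describe_objects(image, labels):
    """Summarize each image object (segment) by simple spectral and size
    attributes, treating the object rather than the pixel as the unit.

    image  : (rows, cols, bands) array
    labels : (rows, cols) integer array from a prior segmentation, where
             each distinct value marks one object primitive
    """
    features = {}
    for obj_id in np.unique(labels):
        mask = labels == obj_id
        pixels = image[mask]
        features[int(obj_id)] = {
            "area_pixels": int(mask.sum()),
            "mean": pixels.mean(axis=0).round(2).tolist(),
            "std": pixels.std(axis=0).round(2).tolist(),
        }
    return features

# Toy example: a two-object label image over a 3-band image
rng = np.random.default_rng(2)
img = rng.normal(50, 5, size=(20, 20, 3))
labels = np.zeros((20, 20), dtype=int)
labels[:, 10:] = 1
print(describe_objects(img, labels))
```

Because within-object tonal variation has already been generalized into a few summary attributes, the subsequent classifier only has to label a handful of objects rather than every pixel.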
10.4.1
Rationale
With the emergence of hyperspatial resolution satellite images (e.g., IKONOS and QuickBird), the Earth’s surface is captured in ever finer detail. These advanced images permit the identification of submeter ground features from space. Minutely sized features (e.g., buildings and roads in urban areas) are represented by an array of pixels in hyperspatial resolution imagery of a relatively small (e.g., 0.5 m) pixel size. The same features, however, would not even register as a single pure pixel in Earth resources satellite imagery of a medium spatial resolution, such as Landsat TM and SPOT high resolution visible (HRV). The small ground area covered by a single pixel means that more spectral variation within an object can show up
FIGURE 10.8 The fundamental unit of decision making in per-pixel and object-oriented image classification. (a) Input image composed of pixels in pixel-based classification; (b) input in object-oriented image classification. (Source: Walter, 2004.)
on satellite imagery of a very high spatial resolution, together with its shadow, if present. Consequently, the assumption that the same object has the same spectral value in the same spectral band is violated more frequently than with medium resolution images. These ground features are almost impossible to classify accurately using per-pixel classifiers because of their increased spectral variability. If pixels are classified on the sole basis of their spectral information, the classification accuracy will not be high enough to meet the application's needs, owing to the rise in internal variability within the same land cover types. Traditional per-pixel image classifiers based on statistical relationships can no longer meet the challenges of classifying very high resolution satellite imagery. Such an automatic or semiautomatic method is especially problematic in classifying heterogeneous scenes such as densely populated urban areas. The solution to this dilemma lies in object-oriented image analysis, which appears to offer a promising alternative. Instead of relying exclusively on pixel spectral information, object-oriented image classification allows the inclusion of additional information such as shape, texture, size, and context derived from the relationship between adjacent pixels, as well as ancillary information (e.g., DEM data) from other object layers in the classification. Even
the result of a pixel-based classification and the count of pixels assigned to a specific land use class can be included in the input (Walter, 2004). This diverse range of ancillary information is combined with the remotely sensed data. The combined use of spectral information with spatial arrangements (pattern, association with neighboring objects) in object-oriented image analysis is very similar to the way a human interpreter reaches a decision during visual interpretation. Thus, the derived result can be expected to be as accurate as, if not more accurate than, that from per-pixel classification.
10.4.2
Process of Object-Oriented Analysis
Object-oriented image classification comprises several steps that include selection of training samples, construction of a class hierarchy, image segmentation, and object-based classification. This classification process may be preceded by a number of preliminary steps, such as fusion of multisource data. All land covers to be mapped must be declared beforehand, with their training samples selected. During training sample selection, representative samples are declared for each of the covers to be mapped. These covers may be represented hierarchically to show their complexity. Construction of a hierarchical class structure at different spatial resolutions (Fig. 10.9) is essential to produce a reasonable classification, but may prolong the process. It
FIGURE 10.9 A hierarchical cover structure that is suitable for mapping urban land covers from fine resolution imagery.
(Figure content: Level I classes are Residential, Commercial, Transportation, Recreational, and Vegetation. Level II subclasses: Residential: roof, impervious ground, lawn, trees; Commercial: roof, impervious ground; Transportation: impervious ground; Recreational: parks (lawn), reserves (bushes); Vegetation: grassland, bushes, trees.)
is not always possible to construct a perfect hierarchy if the land covers are not hierarchical in nature. Information pertaining to an object’s sub- or superobjects in a multilevel object hierarchy allows the classification to be performed at various scales. A higher accuracy is expected at a lower level of detail. Because the likelihoods or posterior probabilities are calculated per object rather than per pixel, the variance in (spectral) likelihoods is considerably reduced. As a preliminary step in preparation for image classification, image segmentation is the process of forming a discrete number of segments or patches by grouping neighboring pixels with similar spectral characteristics. Spectral variations within each segmented region are generalized, and each patch is treated as the manifestation of a ground object. Formation of these image objects is based on simultaneous consideration of the spatial relationships among pixels and their spectral properties. An image is partitioned into separate regions in the image segmentation process described in Sec. 10.3. Homogeneous regions of pixels are extracted through multiresolution segmentation. During classification, every image segment or object is treated as the unit of analysis. All pixels comprising an object or segment are assigned to an information class according to the simultaneous consideration of the class’s detailed description. Several classification methods are available; the commonly adopted classifiers are nearest neighbor, statistical (e.g., maximum likelihood), fuzzy, and neural networks. The nearest neighbor classifier meaningfully separates classes that are defined in the same feature space. It is a soft classifier based on fuzzy logic, and takes into consideration vagueness in class description, class mixture, and pixel radiometric uncertainty in its decision making. After sample objects are declared for each class of interest, the machine searches for the closest sample object in the feature space. The more closely an image object resembles a sample of a class, the higher the membership value it receives. The highest membership values are considered to represent the best classification outcome (Baatz et al., 2001). The image object is assigned to the class with the highest membership value, provided that value exceeds the predefined minimum.
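A minimal sketch of such a nearest neighbor assignment with a distance-based membership value follows. The membership function 1/(1 + d), the minimum-membership threshold, and the toy samples are illustrative assumptions rather than the exact formulation used in any particular software package.

```python
import numpy as np

def classify_objects(object_features, samples, min_membership=0.1):
    """Assign each object to the class of its nearest sample object in
    feature space, with a simple distance-based membership value.

    object_features : dict {object_id: feature vector}
    samples         : dict {class_name: list of sample feature vectors}
    """
    results = {}
    for obj_id, feat in object_features.items():
        feat = np.asarray(feat, dtype=float)
        best_class, best_membership = None, 0.0
        for cls, sample_list in samples.items():
            # distance to the closest sample object of this class
            d = min(np.linalg.norm(feat - np.asarray(s, dtype=float))
                    for s in sample_list)
            membership = 1.0 / (1.0 + d)
            if membership > best_membership:
                best_class, best_membership = cls, membership
        if best_membership >= min_membership:
            results[obj_id] = (best_class, round(best_membership, 3))
        else:
            # no class resembles the object closely enough
            results[obj_id] = ("unclassified", round(best_membership, 3))
    return results

samples = {"water": [[20, 15, 8]], "vegetation": [[40, 60, 30], [45, 70, 35]]}
objects = {1: [22, 16, 9], 2: [90, 90, 90]}
print(classify_objects(objects, samples))
```

Object 1 falls close to the water sample and receives a high membership, whereas object 2 is far from every sample and is left unclassified, which mirrors the soft, threshold-based behavior described above.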
10.4.3
Implementation Environments
There are three environments in which object-oriented image analysis may be implemented: Environment for Visualizing Images (ENVI) Feature Extract, eCognition, and Feature Analyst. This section introduces the last two systems.
eCognition
eCognition is a very popular and successful image processing system that converts remote sensing data into accurate geographic information, and hence extends the range of image analysis applications. It is based on objects formed through multiresolution image segmentation in a hierarchical manner. The objects at a higher level tend to be larger in size and are formed by
grouping those at a lower level via three criteria: spectral similarity, contrast with neighboring objects, and the shape characteristics of the resultant object. Multiple spectral bands may be used in the segmentation, with each channel weighted, together with shape defined by smoothness and compactness. Local context is incorporated into classification in which the relationships among networked image objects are considered, in addition to image object attributes. Other alternative segmentation methods include segmentation according to the spectral difference of objects and segmentation of subobjects for analyzing linear features. In addition to this standard mode of object creation, external segmentation results can be imported into eCognition to meet special application needs. The segmented image is classified using the nearest neighbor classifier based on fuzzy logic, in which a broad spectrum of object features, including spectral values, shape, and texture, is integrated into the decision making. It is possible to fuse multisource data by parallel evaluation of image information. Textured or low-contrast data, such as very high resolution airborne or even radar data, can be analyzed to form objects (Definiens Imaging, 2004). A set of interfaces keeps the classification process transparent and accessible to the image analyst. A toolbox is also available for earth scientists who are less familiar with image analysis; through it the user is able to perform basic segmentation, threshold-based classification, and basic merge operations, and to export results.
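The combination of spectral and shape criteria can be sketched as follows, using the compactness and smoothness definitions commonly cited from Baatz and Schäpe (2000); the weights and the example numbers are illustrative assumptions rather than any software's defaults.

```python
import numpy as np

def shape_heterogeneity(perimeter, area, bbox_perimeter, w_compactness=0.5):
    """Shape heterogeneity as a weighted mix of compactness and smoothness:
    compactness = l / sqrt(n), smoothness = l / b, where l is the object
    perimeter, n its area in pixels, and b its bounding-box perimeter."""
    compactness = perimeter / np.sqrt(area)
    smoothness = perimeter / bbox_perimeter
    return w_compactness * compactness + (1 - w_compactness) * smoothness

def total_heterogeneity(spectral_h, shape_h, w_color=0.8):
    """Overall criterion: weighted combination of spectral (color) and shape
    heterogeneity, with the color and shape weights summing to one."""
    return w_color * spectral_h + (1 - w_color) * shape_h

# Example: an elongated 10 x 40 pixel object (perimeter 100, area 400,
# bounding box identical to the object)
shape_h = shape_heterogeneity(perimeter=100, area=400, bbox_perimeter=100)
print(total_heterogeneity(spectral_h=12.0, shape_h=shape_h))
```

Increasing the shape weight (lowering w_color) favors compact, smoothly bounded objects even at the cost of somewhat greater spectral variability within them.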
Feature Analyst
This commercial feature-extraction software package is released by Visual Learning Systems, Inc., located in Missoula, Montana. It attempts to overcome the limitations of the manual method of object recognition and feature extraction by introducing an inductive learning-based approach that models the feature-extraction process and incorporates object-recognition attributes. Through the system’s graphical user interface (GUI), the user manually extracts and classifies features from a small subset of the remote sensing image initially displayed on screen (Fig. 10.10). These are then fed to the system as labeled examples, or a training set, based on which the predictions of the learning algorithms are corrected during hierarchical learning in which clutter is removed. The system then classifies and extracts the remaining targets or objects based on a learned model that correlates known spectral and spatial signatures with targeted outputs (Blundell and Opitz, 2006). Featured prominently in Feature Analyst are two modules, Feature Modeler and Feature Model Library, designed to automate the process of feature extraction. Adaptive feature models can be modified to extract features from images recorded at different spatial and radiometric resolutions and across different geographic regions in different seasons. These models can also be refined via the Feature Modeler so that they can be shared among users, resulting in significant savings in time. The system is very easy to use. Users just need to click buttons to activate the necessary
FIGURE 10.10 The layout of user interface and modules in Feature Analyst. (Copyright: Visual Learning Systems.)
functions, without needing any programming knowledge. Its open architecture means that innovative feature-extraction algorithms and tools from third-party developers can be added to it. The system can be embedded easily into mainstream GIS and image analysis systems, such as ArcGIS and ERDAS (Earth Resources Data Analysis System) Imagine, as a plug-in. However, reports of its performance have yet to appear in the literature.
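The general workflow just described, in which the analyst digitizes a few examples, the system learns from them, and the remaining targets are extracted automatically, can be sketched generically as follows. This is not Feature Analyst's actual algorithm; the feature construction, the simple distance-based learner, and all names and thresholds are illustrative assumptions.

```python
import numpy as np

def window_features(image, r, c, size=3):
    """Spectral values of a pixel plus the mean of its local window, a crude
    stand-in for the spatial-context attributes used by learning-based tools."""
    half = size // 2
    win = image[max(0, r - half):r + half + 1, max(0, c - half):c + half + 1]
    return np.concatenate([image[r, c], win.reshape(-1, image.shape[2]).mean(axis=0)])

def learn_and_extract(image, labeled_pixels, target_label, threshold):
    """labeled_pixels: dict {(row, col): label} digitized by the analyst.
    Flags every pixel whose feature vector lies close to any positive example.
    (A fuller implementation would also use negative examples to remove clutter.)"""
    positives = [window_features(image, r, c)
                 for (r, c), lab in labeled_pixels.items() if lab == target_label]
    rows, cols, _ = image.shape
    out = np.zeros((rows, cols), dtype=bool)
    for r in range(rows):
        for c in range(cols):
            f = window_features(image, r, c)
            if min(np.linalg.norm(f - p) for p in positives) < threshold:
                out[r, c] = True
    return out

# Toy example: learn a bright "target" patch from two digitized examples
rng = np.random.default_rng(3)
img = rng.normal(40, 3, size=(15, 15, 3))
img[5:10, 5:10] += 30
examples = {(6, 6): "target", (7, 7): "target", (1, 1): "background"}
mask = learn_and_extract(img, examples, "target", threshold=25)
print(mask.sum(), "pixels flagged")
```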
10.5
Potential of Object-Oriented Image Analysis
The performance of object-oriented image classification has been evaluated against per-pixel classifiers in a number of studies. In this section the object-oriented method is first assessed relative to the maximum likelihood method via a case study. The reported performance of object-oriented image analysis in mapping various ground features from different satellite data is then surveyed and evaluated. The strengths and limitations of object-oriented image classification are examined next, and the section ends with a critical assessment of the factors that affect the accuracy of object-oriented image analysis.
10.5.1 A Case Study
The objective of this case study is to evaluate the potential of the object-based image classification method in mapping various forms and intensities of land degradation that lack clearly bounded extents from
medium resolution (15 m) satellite data in Tongyu County, northeast China, a geographic area that has suffered from various forms of land degradation, including land salinization/alkalization, waterlogging, and desertification. Also assessed in this study is the impact of spatial scaling on the mapping accuracy of land degradation from Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) data recorded on September 11, 2004. The 30-m shortwave infrared (SWIR) bands were georeferenced to the Universal Transverse Mercator projection (zone 51; datum: WGS84) using nine widely distributed ground control points (GCPs) and resampled to 15 m to be stacked with the three visible and near-infrared (VNIR) bands (15 m) that had already been projected to the same ground coordinate system. Training samples or objects representative of nine covers (healthy farmland, barren, degraded, wetland, water, grassland, woodland, settled areas, and fallow farmland) were declared beforehand. The nine-band image was segmented at spatial scales of 10 pixels (150 m) and 20 pixels (300 m) in eCognition, and the segmented images were classified using the standard nearest neighbor function. During image classification, all classes in the class hierarchy were linked to the image objects in a scene. Accuracy assessment was based on a grand sum of 256 pixels whose identity on the ground was ascertained from a field visit and ground photographs. The image segmented at 10 pixels was classified (Fig. 10.11) at an overall accuracy of 74.2 percent (Table 10.3). Of the four degradation-related covers, degraded land has the lowest user’s accuracy at 65.8 percent, while farmland degraded to grassland has the highest at 87.5 percent. This is because degraded land is defined by the quantity of vegetation present, which was spatially discontinuous; its boundary is therefore impossible to demarcate precisely. By comparison, grassland degraded from farmland still retained a clear boundary on the ground. The other two covers (barren and grassland), both being spatially fragmented, have a similar accuracy. The producer’s accuracy is much higher for barren land and degraded land, but noticeably lower for grassland. The image segmented at 20 pixels was classified at an overall accuracy of 76 percent, slightly higher than the accuracy achieved at 10 pixels. The four degradation-related covers respond to scaling differently. Degraded land, poorly mapped at 10 pixels, still has the lowest user’s accuracy of 64.6 percent, whereas farmland degraded to grassland has its user’s accuracy lowered by 10 percent. Barren land benefits marginally from the use of a larger scale. The most noticeable improvement took place for grassland, whose user’s accuracy rose sharply to 96.2 percent. Such differential responses to scaling are accounted for by each cover’s scale of occurrence. Farmland degraded to grassland has the smallest scale of occurrence; a large scale (e.g., 300 m) is detrimental to its mapping accuracy. Grassland is the most extensive on the ground, so a large scale is conducive to higher mapping accuracy. Neither barren ground nor grassland has a clear-cut boundary on the ground. Their
(Legend: Barren; Degraded; Farmland to grassland; Farmland (fallow); Farmland (healthy); Grassland; Settlement; Water; Wetland)
FIGURE 10.11 Land covers with an emphasis on degraded land classified from merged ASTER VNIR and SWIR bands at 15 m using the object-oriented method. (Source: Gao, 2008.) See also color insert.
10-Pixel Level (Kappa = 0.664)
Category | Correctly Classified/Sum | User's Accuracy (%) | Producer's Accuracy (%)
1. Barren | 26/33 | 78.8 | 96.3
4. Farmland degraded to grassland | 21/24 | 87.5 | 77.8
8. Degraded land | 50/76 | 65.8 | 76.9
9. Grassland | 25/33 | 75.8 | 59.5
Overall accuracy: 74.2 percent

20-Pixel Level (Kappa = 0.667)
Category | Correctly Classified/Sum | User's Accuracy (%) | Producer's Accuracy (%)
1. Barren | 23/28 | 82.1 | 95.8
4. Farmland degraded to grassland | 17/22 | 77.3 | 60.7
8. Degraded land | 42/65 | 64.6 | 73.7
9. Grassland | 25/26 | 96.2 | 64.1
Overall accuracy: 76.0 percent

Source: Modified from Gao, 2008.
TABLE 10.3 Error Matrix of Standard Nearest Neighbor Classification at the Object Level
mapping accuracy is not profoundly affected by scaling in the object-oriented method. Overall, not all land covers benefit from a broader scale, as each type of object has its own scale of occurrence.
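The accuracy figures discussed in this case study (user's, producer's, and overall accuracy, and the kappa coefficient) are all derived from an error matrix such as Table 10.3. The following minimal Python sketch shows the standard calculations; the small matrix in the example is hypothetical and does not reproduce the data of Table 10.3.

```python
import numpy as np

def accuracy_summary(error_matrix):
    """Accuracy measures from a square error matrix whose rows are classified
    categories and whose columns are reference (ground) categories."""
    m = np.asarray(error_matrix, dtype=float)
    total = m.sum()
    diag = np.diag(m)
    overall = diag.sum() / total                       # overall accuracy
    users = diag / m.sum(axis=1)                       # correct / row total
    producers = diag / m.sum(axis=0)                   # correct / column total
    expected = (m.sum(axis=1) * m.sum(axis=0)).sum() / total ** 2
    kappa = (overall - expected) / (1 - expected)      # chance-corrected agreement
    return overall, users, producers, kappa

# A small hypothetical 3-class matrix (not the values of Table 10.3)
matrix = [[26, 0, 5],
          [1, 21, 2],
          [0, 6, 50]]
overall, users, producers, kappa = accuracy_summary(matrix)
print(round(overall, 3), users.round(3), producers.round(3), round(kappa, 3))
```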
10.5.2
Performance Relative to Per-Pixel Classifiers
The outstanding performance of object-oriented image classification has been repeatedly demonstrated in the literature. It has the potential for effectively mapping large burned areas, even with coarse resolution Advanced Very High Resolution Radiometer (AVHRR) imagery. The object-oriented classifier performed rather competently in mapping a large forest fire on the Mediterranean coast of Spain (Gitas et al., 2004). However, it is questionable how much of this high accuracy stems from the use of objects in the classification. Naturally, a higher accuracy is expected if the mapped categories are as simple as burned versus unaffected, regardless of the type of image classifier used. This deficiency in comparatively assessing object-oriented and per-pixel image classification has recently been remedied by a number of comparative studies summarized in Table 10.4.
Objective of Classification | Imagery Used | No. of Covers Mapped | Per-Pixel Accuracy (%) | Object-Oriented Accuracy (%) | Authors
Agricultural land use | TM (30 m) | 6 | 83.8 | 86.3 | Geneletti and Gorte, 2003
Coal fire | ASTER (15 m) | 12 | 46.5 | 83.25 | Yan et al., 2006
Land degradation | ASTER (15 m) | 9 | 70.6 | 74.2 | Gao, 2008
Agricultural land use | SPOT (10 m) | 10 | 73.5 | 85.5 | Jensen et al., 2006
Dense urban area | IKONOS (1 m PAN and XL) | 6 | 80.3* | 86.4 | Shackelford and Davis, 2003
Land use | QuickBird (2.4 m) | 5 | 64.45 | 95.47 | Wei and Ma, 2005
Urban scenes | QuickBird | NA | 73.9 | 84.8 | Zhou et al., 2007
Land use | CASI | 13 | 63.08 | 59.23 | Aplin et al., 1999

*Averaged accuracy of six covers (road, building, grass, tree, bare soil, and water).
TABLE 10.4 Comparison of Per-Pixel and Per-Field Image Classification
Listed in the table are the types of mapping applications, the satellite data used, and the number of land cover types mapped, all of which affect the mapping accuracy of both methods. According to these studies, object-oriented image classification is superior to its per-pixel counterpart in classifying satellite data of a medium spatial resolution for a wide range of applications. The largest margin of superiority is achieved in mapping land cover into 12 categories from 15-m ASTER data in a coal-fire area (Yan et al., 2006). The object-oriented method achieved an overall accuracy 37 percent higher than that associated with the per-pixel method. In particular, the accuracy of (potential) surface coal-fire areas showed a marked increase. Both per-pixel and per-field methods are similarly accurate in mapping agricultural land use from 30-m TM data, achieving an accuracy in the mid-80 percent range (Geneletti and Gorte, 2003). This accuracy level is higher than that reported by others, probably because the land covers were mapped into only six categories. However, the relative superiority of the object-oriented method over the per-pixel method (about 3 percent) has been verified repeatedly in the literature. For instance, Gao (2008) obtained an overall accuracy of 74.2 percent in a nine-category land-degradation mapping with the object-oriented method, marginally higher than the 70.6 percent achieved using the maximum likelihood method. This relative gain in overall accuracy is very similar to what Koch et al. (2003) achieved in classifying forest from 30-m Landsat Enhanced TM Plus (ETM+) data fused with 6-m IRS 1D Pan data. The superior performance of the object-oriented method persists at the forest-stand level as well. Again, the higher accuracy is due to the fact that only forest was the target of mapping in that study. The reported mid-80 percent accuracy of the object-oriented method in mapping agricultural land use has been corroborated by Jensen et al. (2006) using multispectral SPOT data. This accuracy is much higher than the roughly 50 percent accuracy from the maximum likelihood method. The huge superiority of the object-oriented method is attributed to the fact that most of the mapped land covers (e.g., sugarcane, citrus plantations, forest, and banana) are artificially planted, and hence have a regular shape and pattern and a uniform composition. Their mapping therefore benefits considerably from the use of spatial context in object-oriented classification. The object-based approach is able to produce highly accurate results not only with satellite imagery of a medium resolution, but also with imagery of a fine spatial resolution. Object-based classification is more accurate than the pixel-based method in extracting relatively small objects such as shrubs from high-resolution QuickBird satellite imagery of 1- and 4-m resolutions (Laliberte et al., 2004). Object-based image analysis is also successful in extracting forest inventory information from 1-m panchromatic and 4-m multispectral IKONOS-2 imagery (Chubey et al., 2006). Image objects generated through image segmentation carry important forest-related information,
such as inherent spectral and spatial features of forest-stand components. The mapped results from airborne digital camera imagery of 1-m resolution matched the field reference at 68 percent (Kappa = 0.57) (Lathrop et al., 2006). The agreement between the mapped results and the independent reference data rises to 71 percent (Kappa = 0.43) if judged against the presence/absence of vegetation. An overall classification accuracy of 86.3 percent was obtained from an orthophoto mosaic of 7.5 m in a classification of six covers (vineyards, orchards, bare, built-up, pasture, and water) (Geneletti and Gorte, 2003). Thanks to the finer spatial resolution of the image, almost all the cover classes are better distinguished from the segmented image. However, the object-oriented method made little difference to the accuracy of built-up and bare, owing to both being spectrally heterogeneous. Even in highly fragmented environments such as urban areas, which are challenging to classify reliably, the object-oriented method is equally competent and produces results superior to those of per-pixel methods. In classifying a 2.4-m QuickBird image into five covers (water, vegetation, road, building, and bare lands), the object-oriented method achieved an overall accuracy of 95.5 percent (Wei et al., 2005). The accuracy dropped to 64.45 percent using the maximum likelihood classifier, owing to confusion between water and shadow, the presence of noise, and the fuzzy boundaries of cultural features. However, the exceptionally high accuracy of the object-oriented classifier is not replicated by Zhou et al. (2007). They achieved an overall accuracy of 84.82 percent based on the integration of fuzzy classification and the nearest neighbor in extracting urban information from QuickBird imagery, against 73.9 percent with the traditional pixel-based method. The object-oriented approach is promising in providing detailed and accurate information about the physical structure of urban areas. This accuracy is remarkably similar to the 86.4 percent achieved by Shackelford and Davis (2003) in mapping urban areas into buildings, impervious surface, and roads from fused panchromatic (1 m) and multispectral (4 m) IKONOS data. Although this accuracy is not directly comparable to the user’s accuracy of 80.3 percent averaged from the six covers mapped with the maximum likelihood method, the object-based method resulted in more detail in that “building” was separated from “impervious surface,” which was not possible with the per-pixel method. In mapping tree mortality from multispectral images of 1-m spatial resolution (Guo et al., 2007), knowledge-based classification of objects formed through region-growing image segmentation significantly outperformed the pixel-based maximum likelihood classification method. It achieved an overall accuracy of 95.7 percent, much higher than the 71.6 percent of the latter, even though the object-oriented method is much more complex, involving a number of steps. Experimental results also show that per-object maximum likelihood classification performs much better than the per-pixel method (Abkar et al., 2000). The object-based
approach appears to be a better alternative to the pixel-based method in mapping suburban land covers. The inferior effectiveness of the per-pixel classification method is attributed to its underlying assumption, which is frequently violated in reality, especially in urban areas where residential, commercial, and impervious surfaces coexist in close proximity to one another. Mistaken assumptions lead to misclassifications. In addition, the classification algorithm itself is limited in that it is essentially aspatial: no spatial relationship among pixels is taken into consideration in the decision-making process. Consequently, the results derived from pixel-based classification suffer from the familiar salt-and-pepper effect, with single pixels of a different identity surrounded by homogeneous regions. In spite of these limitations, the pixel-based classification approach can sometimes produce comparatively accurate results objectively and quickly, as demonstrated in the study below. So far the only study that found per-pixel classification to outperform object-oriented classification was reported by Aplin et al. (1999). They classified land covers into 13 categories using 4-m CASI data. Unsupervised classification of the CASI data achieved an accuracy of 63.08 percent, owing to extensive confusion between several classes. This accuracy dropped to 59.23 percent with the per-field method, in which objects were generated from a vector land cover layer. Of the six classification problems identified, four also face the per-pixel method. The two problems unique to per-field classification are misregistration with the vector land cover layer and errors in the vector data. The inferior performance of the per-field classification is also due to the exclusion of other spatial attributes of pixels, such as shape and texture, from the classification.
10.5.3
Strengths
Object-oriented image classification is intuitively appealing because it closely resembles how human vision divides an image into homogeneous areas at the beginning of manual interpretation (Blaschke et al., 2000). The object-oriented classification of multiscale segmented images closely mirrors our conceptual model of the spatial structure of aquatic vegetation habitats (Lathrop et al., 2006). Object-oriented analysis has the following five advantages over per-pixel classifiers: • First and foremost, the rich spectral and spatial information inherent in satellite data is fully and jointly utilized in forming objects. Apart from the spectral information that is exclusively used in pixel-based classification, shape and neighborhood relationships are incorporated into object-oriented image classification (Baatz et al., 2001). Statistical and texture information; properties of image segments, such as shape, length, size, and number of edges; and even topological
features (neighbor, superobject, and so on) can all be readily incorporated into the classification process. The characteristic textures exhibited by most image data, ignored in per-pixel classification, may also be utilized in forming objects. Able to exploit spatial metrics and texture measures, object-oriented classification can potentially open up a new avenue for extracting detailed land use information (Herold et al., 2003), and considerably increases the chances of correctly labeling pixels in the decision-making process. Simultaneous utilization of spectral information and spatial arrangements (e.g., size, shape, texture, pattern, association with neighboring objects) conforms to the way humans interpret remote sensing imagery (Hudak and Wessman, 1998), but has the benefit of an automated classification routine (Laliberte et al., 2004). • Second, multisource information can easily be taken advantage of. As shown in Eq. (10.4), more data layers in the input simply mean more terms in the summation. Commonly used ancillary information includes topographic data and existing land cover maps (Abkar et al., 2000). From these multisource data, contextual relationships and semantic information can be derived and used in object-oriented image classification. It is even possible to include vector ancillary data in a classification. Synergistic use of pixel-based or statistical signal processing methods with contextual information makes full use of the rich information inherent in satellite imagery. Furthermore, the reliability of ancillary data can be taken into consideration by assigning them an appropriate weight during production of the similarity index. • Third, object-oriented image classification is so flexible that it can be combined with other innovative classifiers covered in the previous chapters, such as fuzzy classifiers (Shackelford and Davis, 2003), neural network classifiers, and decision trees. Once an input image has been segmented to form object primitives, it can be classified using these innovative classifiers, just as a raw image can. The only difference lies in the unit of decision making: instead of pixels, it is the object primitives that are the fundamental units. This combination takes advantage of the strengths of both fuzzy classification and object-oriented classification, and is effective in separating covers that are very difficult to map using other methods. After object-oriented classification, decision trees may be used to correlate spectral and spatial metrics of image objects with field samples to assess the quality of image segmentation (Chubey et al., 2006). The object-oriented approach can contribute to powerful automatic and semiautomatic analysis for most remote sensing applications (Benz et al., 2004).
• Fourth, object-based image analysis is remarkably beneficial in mapping relatively small objects, such as pockets of remnant woodland in large residential areas in a complex suburban environment that is difficult to map using pixel-based classification (Shackelford and Davis, 2003), owing to its ability to generalize small land patches embedded in the dominant land covers. About 87 percent of all shrubs larger than 2 m² in area were detected with the object-based method (Laliberte et al., 2004), owing to its ability to generalize the spectral information within a neighborhood via the local homogeneity criterion. If mapped with the per-pixel method, such a small patch of trees would inevitably suffer a significant salt-and-pepper effect. The mapping is made possible thanks to the fine spatial resolution of the satellite imagery used. On coarse resolution imagery, such minor covers as urban forest would be classified as a small cluster of a few pixels that are likely to be generalized away during postclassification filtering. • Finally, an added advantage of object-oriented image analysis is that the results are already patches or polygons. They are more accurate and easier to interpret than those derived from per-pixel classifiers, which may still appear highly speckled even after postclassification filtering. Since the unit of decision making in object-oriented image classification is objects that have been formed through image segmentation, they can be easily converted to the vector format with little additional processing. This makes the results highly compatible with GIS data, most of which are stored in the vector format. The lengthy process of vectorizing and editing per-pixel classification results in raster form becomes redundant with the object-oriented method. This data format makes certain applications (e.g., change detection) extremely easy to carry out. It will considerably accelerate the integration of remote sensing with GIS, and lead to extensive use of remotely sensed data in increasingly diverse GIS applications. On the other hand, existing spatial data in the vector format can be integrated into the classification without having to be rasterized first. It is envisaged that object-based image analysis will trigger new developments toward a full integration of GIS and remote sensing functions (Blaschke et al., 2000), a topic to be covered in depth in Chap. 14.
10.5.4
Limitations
Although the object-oriented method is mostly accurate, it does have the following drawbacks:
• In object-oriented image classification, it is implicitly assumed that objects on the ground can be identified on the satellite imagery. This assumption is not valid if the object is too small or if the satellite imagery has too coarse a spatial resolution for objects to be discerned. Thus, this method is suitable for images of a large scale, such as high or very high resolution satellite imagery or large-scale aerial photographs, or for relatively large features on the ground. • The success of object-based image classification depends heavily on the scale at which the input image is segmented. Not all ground objects occur at the same or a similar scale. Although the scale problem can be addressed with multiscale or multiresolution image segmentation, the quality of the segmented results varies with the proper specification of the scale at which the image is partitioned into objects. There is no theoretical guidance for selecting an appropriate scale for images of a particular spatial resolution. In practice, scale determination relies mostly on the experience and knowledge of the analyst about the spatial distribution of covers and their appearance inside the study area, as well as the spatial resolution of the image. A suitable scale is established through a lengthy and occasionally painstaking process of trial and error (a simple diagnostic for comparing candidate scales is sketched at the end of this subsection). Thus, the segmented image can be highly subjective. Besides, the scale deemed optimal in one geographic area or for one type of feature in one type of imagery may not be applicable to another study area or to other types of features in another type of imagery. • This method has the drawback of arbitrary selection of training samples, whose quality directly governs the outcome and reliability of image classification. It is impossible to ensure that every object merged against the homogeneity criteria is composed of only one class of pixels; such mixed objects further blur the boundary between land cover types. Quality samples should represent the typical range of cover classes. In addition, these samples should be large enough to enable the calculation of the separation distance at different scales. In light of the above limitations, it might be necessary to combine this method with a per-pixel classifier. For instance, the segmented results can be classified using the minimum-distance-to-mean method (Guo et al., 2007). The combination of both methods may be beneficial in determining the internal composition of objects and their structure (Koch et al., 2003). When integrated with the pixel-based method, the object-oriented approach achieved a higher accuracy than other methods in mapping mangroves from IKONOS images (Wang et al., 2004).
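As referenced in the second limitation above, the sketch below illustrates one way the trial-and-error search for a segmentation scale can at least be made systematic: candidate scale values are compared by the number of objects produced and their average internal spectral variance. The diagnostic, the plug-in segmentation interface, and the toy block-based segmenter are illustrative assumptions, not a method prescribed in this chapter.

```python
import numpy as np

def mean_within_object_variance(image, labels):
    """Average per-object spectral variance, a rough indicator of how
    homogeneous the segments produced at a given scale are."""
    variances = []
    for obj_id in np.unique(labels):
        pixels = image[labels == obj_id]
        variances.append(pixels.var(axis=0).mean())
    return float(np.mean(variances))

def compare_scales(image, segment, scales):
    """segment(image, scale) is assumed to return an integer label array;
    any segmentation routine (e.g. a region-merging procedure) can be
    plugged in.  Reports object count and homogeneity per candidate scale."""
    report = []
    for s in scales:
        labels = segment(image, s)
        report.append({"scale": s,
                       "objects": int(len(np.unique(labels))),
                       "mean_within_variance": mean_within_object_variance(image, labels)})
    return report

def block_segment(image, scale):
    """A deliberately crude stand-in segmenter: square blocks of side 'scale'."""
    rows, cols = image.shape[:2]
    r_idx = np.arange(rows)[:, None] // scale
    c_idx = np.arange(cols)[None, :] // scale
    return r_idx * (cols // scale + 1) + c_idx

rng = np.random.default_rng(4)
img = rng.normal(100, 10, size=(60, 60, 3))
for row in compare_scales(img, block_segment, scales=[5, 10, 20, 30]):
    print(row)
```

Larger scales yield fewer but internally more variable objects; inspecting that trade-off for the study area still leaves the final choice with the analyst, as the text notes.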
10.5.5 Affecting Factors
As the comparison in Table 10.4 reveals, no matter how superior the object-oriented method is to per-pixel image classification, it never achieves perfect classification accuracy (i.e., 100 percent). The specific accuracy at which a cover class is classified depends on a number of factors. Some of them are common to all image classifiers, such as the level of detail at which an image is classified and the soundness of the classification scheme. Other factors unique to object-oriented image classification are summarized here.
Mapping Scale
The scale of the objects in relation to the spatial resolution of the satellite imagery used and the number of spectral bands bears a close relationship with classification accuracy (Jensen et al., 2006). In general, higher accuracy is associated with data of a finer resolution. For instance, the most accurate results are associated with SPOT multispectral (20 m) and panchromatic (10 m) data. When applied to Landsat ETM+ and TM data (30 m), the accuracy of the object-oriented approach drops by 5 to 6 percent. The average accuracy of mapping buildings and impervious surfaces is higher if the panchromatic band is merged with the multispectral bands than with either of them separately (Shackelford and Davis, 2003). Object-oriented image classification brings out clear benefits in mapping urban areas from fine resolution satellite images (e.g., 1- to 4-m IKONOS images), and even in mapping mangroves if combined with the per-pixel method (Wang et al., 2004). This confirms that “region-based approaches are suitable for the analysis of high-resolution remotely sensed data” (Schiewe et al., 2001). Therefore, object-based image classification is best performed with fine resolution satellite data.
Presence of Boundary and Its Distinctiveness
An assumption underlying object-oriented image classification is that there are clear, identifiable boundaries between different land covers. This assumption is valid for artificial ground features such as urban residential areas and cropland. The distinctiveness of the boundaries between adjacent agricultural fields is conducive to improvement in classification accuracy because the boundaries are relatively stable while the cropping pattern (also within the lots) changes often (Blaschke et al., 2000). Thanks to the correct identification of the boundaries of agricultural fields, the object-oriented method is particularly effective at improving the classification accuracy of agricultural areas. However, the assumption is not completely valid for naturally occurring phenomena, such as degraded grassland, whose boundaries are gradual. Boundaries of these land covers are hardly ever clear-cut; more likely they are fuzzy or transitional. Without definite boundaries it is impossible to form object primitives during image segmentation. For these covers, image segments do not correspond to
meaningful objects precisely. It is this misrepresentation that leads to misclassifications. For spectrally diverse covers (e.g., degraded lands) that are spatially fragmented, the juxtaposed image segments will never represent meaningful objects. This explains why object-based classification is only marginally more accurate than the per-pixel method in mapping degraded lands (Gao, 2008).
Uncertainty of Image Segmentation
This uncertainty is caused by unreliability of the satellite data (e.g., because of atmospheric conditions), and can be minimized by taking more image cues into consideration during segmentation. For instance, the use of shape and shadow increases the segmentation accuracy of buildings and impervious surfaces in dense urban areas (Shackelford and Davis, 2003). Additional cues in image segmentation may include texture and ancillary data (e.g., existing boundaries logged with a global positioning system [GPS]), and the use of more spectral bands. In addition, more robust and highly effective image segmentation algorithms are needed for multispectral remote sensing imagery (Liu et al., 2006). How best to segment an image so that the resultant object primitives always correspond precisely to real-world land covers is an issue critical to the perfection of object-oriented image classification.
References
Abkar, A. A., M. A. Sharifi, and N. J. Mulder. 2000. “Likelihood-based image segmentation and classification: A framework for the integration of expert knowledge in image classification procedures.” ITC Journal. 2000(2):104–119. Ambrosia, V., and G. T. Whiteford. 1983. “Aerial photograph interpretation in remote sensing.” In Introduction to Remote Sensing of the Environment, ed. B. F. Richason, 57–86. Dubuque, IA: Kendall/Hunt. Aplin, P., P. Atkinson, and P. Curran. 1999. “Per-field classification of land use using the forthcoming very fine resolution satellite sensors: Problems and potential solutions.” In Advances in Remote Sensing and GIS Analysis, ed. P. M. Atkinson and N. J. Tate, 219–239. Chichester, England: Wiley & Son. Augusteijn, M. F., L. E. Clemans, and K. A. Shaw. 1995. “Performance evaluation of texture measures for ground cover identification in satellite images by means of a neural network classifier.” IEEE Transactions on Geoscience and Remote Sensing. 33(3):616–626. Baatz, M., U. Benz, S. Dehghani, M. Heynen, A. Höltje, P. Hofmann, I. Lingenfelder, et al. 2001. eCognition—Object Oriented Image Analysis. Germany: Definiens Imaging GmbH. Baatz, M., and A. Schäpe. 2000. “Multiresolution segmentation—An optimization approach for high quality multi-scale image segmentation.” In Angewandte Geographische Informations-Verarbeitung XII, ed. J. Strobl, T. Blaschke, and G. Griesebner, 12–23. Karlsruhe, Germany: Wichmann Verlag (in German). Baltsavias, E. P. 2004. “Object extraction and revision by image analysis using existing geodata and knowledge: Current status and steps towards operational systems.” ISPRS Journal of Photogrammetry and Remote Sensing. 58(3–4):129–151. Benz, U. C., P. Hofmann, G. Willhauck, I. Lingenfelder, and M. Heynen. 2004. “Multi-resolution, object-oriented fuzzy analysis of remote sensing data for GIS-ready information.” ISPRS Journal of Photogrammetry and Remote Sensing. 58(3–4):239–258.
Bischof, H., W. Schneider, and A. J. Pinz. 1992. “Multispectral classification of Landsat-images using neural networks.” IEEE Transactions on Geoscience and Remote Sensing. 30(3):482–490. Blaschke, T., and G. J. Hay. 2001. “Object oriented image analysis and scale-space: Theory and methods for modeling and evaluating multiscale landscape structure.” International Archives of Photogrammetry and Remote Sensing. 58:239–258. Blaschke, T., S. Lang, E. Lorup, J. Strobl, and P. Zeil. 2000. “Object oriented image processing in an integrated GIS/remote sensing environment and perspectives for environmental applications.” In Environmental Information for Planning, Politics and the Public, ed. A. Cremers and K. Greve, II:555–570. Marburg: Metropolis-Verlag. Blundell, J. S., and D. W. Opitz. 2006. “Object recognition and feature extraction from imagery: The Feature Analyst approach.” In Proceedings of the 1st International Conference on Object-based Image Analysis, July 4–5, 2006, 1–6. (sblundell@vls-inc.com) (http://www.featureanalyst.com/). Burnett, C., and T. Blaschke. 2003. “A multi-scale segmentation/object relationship modeling methodology for landscape analysis.” Ecological Modeling. 168(3):233–249. Cannon, R. L., J. V. Dave, J. C. Bezdek, and M. M. Trivedi. 1986. “Segmentation of a thematic mapper image using the fuzzy c-means clustering algorithm.” IEEE Transactions on Geoscience and Remote Sensing. GE-24(3):400–408. Carleer, A. P., O. Debeir, and E. Wolff. 2005. “Assessment of very high spatial resolution satellite image segmentation.” Photogrammetric Engineering and Remote Sensing. 71(11):1285–1294. Carr, J. R. 1996. “Spectral and textural classification of single and multiple band digital images.” Computers and Geosciences. 22(8):849–865. Carr, J. R. 1999. “Classification of digital image texture using variograms.” In Advances in Remote Sensing and GIS Analysis, ed. P. M. Atkinson and N. J. Tate, 135–146. Chichester, England: Wiley & Son. Carr, J. R., and F. P. Miranda. 1998. “Semivariogram in comparison to the co-occurrence matrix for classification of image texture.” IEEE Transactions on Geoscience and Remote Sensing. 36(6):1945–1952. Chan, J. C. W., N. Laporte, and R. S. Defries. 2003. “Texture classification of logged forests in tropical Africa using machine-learning algorithms.” International Journal of Remote Sensing. 24(6):1401–1407. Chen, D., D. A. Stow, and P. Gong. 2004. “Examining the effect of spatial resolution and texture window size on classification accuracy: An urban environment case.” International Journal of Remote Sensing. 25(11):2177–2192. Chubey, M. C., S. E. Franklin, and M. A. Wulder. 2006. “Object-based analysis of Ikonos-2 imagery for extraction of forest inventory parameters.” Photogrammetric Engineering and Remote Sensing. 72(4):383–394. Clausi, D. A., and B. Yue. 2004. “Comparing cooccurrence probabilities and Markov random fields for texture analysis of SAR sea ice imagery.” IEEE Transactions on Geoscience and Remote Sensing. 42(1):215–228. Debeir, O., I. van den Steen, P. Latinne, P. van Ham, and E. Wolff. 2002. “Textural and contextual land-cover classification using single and multiple classifier systems.” Photogrammetric Engineering and Remote Sensing. 68(6):597–605. Definiens Imaging. 2004. eCognition User Guide 4. http://www.definiens-imaging.com. Dulyakarn, P., Y. Rangsanseri, and P. Thitimajshima. 2000.
“Comparison of two texture features for multispectral imagery analysis.” GIS Development. http://www.gisdevelopment.net/aars/acrs/2000/ts9/imgp0012pf.htm. Erikson, M. 2003. “Segmentation of individual tree crowns in color aerial photographs using region growing supported by fuzzy rules.” Canadian Journal of Forest Research. 33(8):1557–1563. Ferro, C. J. S., and T. A. Warner. 2002. “Scale and texture in digital image classification.” Photogrammetric Engineering and Remote Sensing. 68(1):51–63. Franklin, S. E., and G. J. McDermid. 1993. “Empirical relations between digital SPOT HRV and CASI spectral response and lodgepole pine (Pinus
contorta) forest stand parameters.” International Journal of Remote Sensing. 14(12):2331–2348. Franklin, S. E., R. J. Hall, L. M. Moskal, A. J. Maudie, and M. B. Lavigne. 2000. “Incorporating texture into classification of forest species composition from airborne multispectral images.” International Journal of Remote Sensing. 21(1):61–79. Franklin, S. E., A. J. Maudie, and M. B. Lavigne. 2001. “Using spatial co-occurrence texture to increase forest structure and species composition classification accuracy.” Photogrammetric Engineering and Remote Sensing. 67(7):849–855. Frohn, R. C., and O. Arellano-Neri. 2005. “Improving artificial neural networks using texture analysis and decision trees for the classification of land cover.” GIScience and Remote Sensing. 42(1):44–65. Gao, J. 2008. “Mapping of land degradation from ASTER data: A comparison of object-based and pixel-based methods.” GIScience and Remote Sensing. 45(2):1–18. Geneletti, D., and B. G. H. Gorte. 2003. “A method for object-oriented land cover classification combining Landsat TM data and aerial photographs.” International Journal of Remote Sensing. 24(6):1273–1286. Gitas, I. Z., G. H. Mitri, and G. Ventura. 2004. “Object-based image classification for burned area mapping of Creus Cape, Spain, using NOAA-AVHRR imagery.” Remote Sensing of Environment. 92(3):409–413. Guo, Q., M. Kelly, P. Gong, and D. Liu. 2007. “An object-based classification approach in mapping tree mortality using high spatial resolution imagery.” GIScience and Remote Sensing. 44(1):24–47. Gurney, M. C., and J. R. G. Townshend. 1983. “The use of contextual information in the classification of remotely sensed data.” Photogrammetric Engineering and Remote Sensing. 49(1):55–64. Haralick, R. 1979. Handbook of Pattern Recognition and Image Processing, 247–279. London: Academic Press. Haralick, R. M., K. Shanmugam, and I. Dinstein. 1973. “Textural features for image classification.” IEEE Transactions on Systems, Man, and Cybernetics. SMC-3(6):610–621. Haralick, R. M., and L. G. Shapiro. 1985. “Image segmentation techniques.” Computer Vision, Graphics and Image Processing. 29(1):100–132. Herold, M., X. Liu, and K. Clarke. 2003. “Spatial metrics and image texture for mapping urban land use.” Photogrammetric Engineering and Remote Sensing. 69(9):991–1001. Hu, X., C. V. Tao, and B. Prenzel. 2005. “Automatic segmentation of high-resolution satellite imagery by integrating texture, intensity, and color features.” Photogrammetric Engineering and Remote Sensing. 71(12):1399–1408. Hudak, A. T., and C. A. Wessman. 1998. “Textural analysis of historical aerial photography to characterize woody plant encroachment in South African savanna.” Remote Sensing of Environment. 66(3):317–330. Jensen, J. R., M. Garcia-Quijano, B. Hadley, J. Im, Z. Wang, A. L. Nel, E. Teixeira, et al. 2006. “Remote sensing agricultural crop type for sustainable development in South Africa.” Geocarto International. 21(2):5–18. Kestner, W., and C. Rumpler. 1984. “Integration of methods for the segmentation of aerial photographs.” Photogrammetria. 39(3):125–134. Koch, B., E. Ivits, and M. Jochum. 2003. “Object-based versus pixel-based: Forest classification with eCognition and ERDAS expert classifier.” GIM International. 17(12):12–15. Kontoes, C. C., and D. Rokos. 1996.
“The integration of spatial context information in an experimental knowledge-based system and the supervised relaxation algorithm—Two successful approaches to improving SPOT XS classification.” International Journal of Remote Sensing. 17(16):3093–3106. Laliberte, A. S., A. Rango, K. M. Havstad, J. F. Paris, R. F. Beck, R. McNeely, and A. L. Gonzalez. 2004. “Object-oriented image analysis for mapping shrub encroachment from 1937 to 2003 in southern New Mexico.” Remote Sensing of Environment. 93(1–2):198–210. Lathrop, R. G., P. Montesano, and S. Haag. 2006. “A multi-scale segmentation approach to mapping seagrass habitats using airborne digital camera imagery.” Photogrammetric Engineering and Remote Sensing. 72(6):665–675.
Lee, J., R. C. Weger, S. K. Sengupta, and R. M. Welch. 1990. “A neural network approach to cloud classification.” IEEE Transactions on Geosciences and Remote Sensing. 28(5):846–855. Liu, Y., M. Li, L. Mao, F. Xu, and S. Huang. 2006. “Review of remotely sensed imagery classification patterns based on object-oriented image analysis.” Chinese Geographical Science. 16(3):282–288. Maillard, P. 2003. “Comparing texture analysis methods through classification.” Photogrammetric Engineering and Remote Sensing. 69(4):357–367. Mason, D. C., D. G. Corr, A. Cross, D. C. Hogg, D. H. Lawrence, M. Petrou, and A. M. Tailor. 1988. “The use of digital map data in the segmentation and classification of remotely-sensed images.” International Journal of Geographical Information Systems. 2(3):195–215. Miranda, F. P., L. E. N. Fonseca, J. R. Carr, and J. W. Taranik. 1996. “Analysis of JERS1 (Fuyo-1) SAR data for vegetation discrimination in northwestern Brazil using the semivariogram textural classifier (STC).” International Journal of Remote Sensing. 17(17):3523–3529. Moller-Jensen, L. 1990. “Knowledge-based classification of an urban area using texture and context information in Landsat TM imagery.” Photogrammetric Engineering and Remote Sensing. 56(6):899–904. Myint, S. W., and N. Lam. 2005. “A study of lacunarity-based texture analysis approaches to improve urban image classification.” Computers, Environment and Urban Systems. 29(5):501–523. Nicolin, B., and R. Gabler. 1987. “A knowledge-based system for the analysis of aerial images.” IEEE Transactions on Geoscience and Remote Sensing. GE-25(3):317–329. Puissant, A., J. Hirsch, and C. Weber. 2005. “The utility of texture analysis to improve per-pixel classification for high to very high spatial resolution imagery.” International Journal of Remote Sensing. 26(4):733–745. Rellier, G., X. Descombes, F. Falzon, and J. Zerubia. 2004. “Texture feature analysis using a Gauss–Markov model in hyperspectral image classification.” IEEE Transactions on Geoscience and Remote Sensing. 42(7):1543–1551. Ryherd, S., and C. E. Woodcock. 1996. “Combining spectral and texture data in the segmentation of remotely sensed images.” Photogrammetric Engineering and Remote Sensing. 62(2):181–194. Schiewe, J., L. Tufte, and M. Ehlers. 2001. “Potential and problems of multi-scale segmentation methods in remote sensing.” Geo Informations Systeme. 14(6):34–39 (in German). Shackelford, A. K., and C. H. Davis. 2003. “A combined fuzzy pixel-based and object-based approach for classification of high-resolution multispectral data over urban areas.” IEEE Transactions on Geoscience and Remote Sensing. 41(10):2354–2363. Shaban, M. A., and O. Dikshit. 2001. “Improvement of classification in urban areas by the use of textural features: The case study of Lucknow city, Uttar Pradesh.” International Journal of Remote Sensing. 22(4):565–593. Skidmore, A. K., B. J. Turner, W. Brinkhof, and E. Knowles. 1997. “Performance of a neural network: mapping forests using GIS and remotely sensed data.” Photogrammetric Engineering and Remote Sensing. 63(5):501–514. Sun, K., Y. Chen, and D. Li. 2006. “Multiscale image segmentation and its application in image information extraction.” Proceedings of SPIE—The International Society for Optical Engineering. 6419:64191I. Ton, J., J. Sticklen, and A. K. Jain. 1991. “Knowledge-based segmentation of Landsat Images.” IEEE Transactions on Geoscience and Remote Sensing. 29(2):222–232. Vincent, L., and P. Soille. 1991.
“Watershed in digital spaces: An efficient algorithm based on immersion simulations.” IEEE Transactions on Pattern Analysis and Machine Intelligence. 13(6):583–598. Walter, V. 2004. “Object-based classification of remote sensing data for change detection.” ISPRS Journal of Photogrammetry & Remote Sensing. 58(3–4):225–238. Wang, L., and D. C. He. 1990. “A new statistical approach for texture analysis.” Photogrammetric Engineering and Remote Sensing. 56(1):61–66.
Spatial Image Analysis Wang, L., W. P. Sousa, and P. Gong. 2004. “Integration of object-based and pixel-based classification for mapping mangroves with IKONOS imagery.” International Journal of Remote Sensing. 25(24):5655–5668. Warner, T. A., and K. Steinmaus. 2005. “Spatial classification of orchards and vineyards with high spatial resolution panchromatic imagery.” Photogrammetric Engineering and Remote Sensing. 71(2):179–187. Wei, W., X. Chen, and A. Ma. 2005. “Object-oriented information extraction and application in high-resolution remote sensing image.” International Geoscience and Remote Sensing Symposium (IGARSS), v 6, p 3803-3806. Wilkinson, G. G., and J. Megier. 1990. “Evidential reasoning in a pixel classification hierarchy—a potential method for integrating image classifiers and expert system rules based on geographic context.” International Journal of Remote Sensing. 11(10):1963–1968. Wulder, M. A., E. F. LeDrew, S. E. Franklin, and M. B. Lavigne. 1998. “Aerial image texture information in the estimation of northern deciduous and mixed wood forest leaf area index (LAI).” Remote Sensing of Environment. 64(1):64–76. Yan, G., J. F. Mas, B. H. P. Maathuis, X. Zhang, and P. M. van Dijk. 2006. “Comparison of pixel-based and object-oriented image classification approaches—A case study in a coal fire area, Wuda, Inner Mongolia, China.” International Journal of Remote Sensing. 27(18):4039–4055. Zhao, Y., L. Zhang, P. Li, and B. Huang. 2007. “Classification of high spatial resolution imagery using improved gaussian Markov random-field-based texture features.” IEEE Transactions on Geoscience and Remote Sensing. 45(5):1458–1468. Zhou, C., P. Wang, Z. Zhang, C. Qi, and Y. Wang. 2007. “Object-oriented information extraction technology from QuickBird pan-sharpened images.” Proceedings of SPIE—The International Society for Optical Engineering. 6279(2):62793L.
CHAPTER 11
Intelligent Image Analysis
All the image classification methods covered in the preceding chapters have one limitation in common: they rely solely on the evidence derived from the image itself, be it spectral or spatial. No external knowledge is involved in the decision making. Although context is partially considered in object-oriented image classification, this evidence is not explicitly spelled out for all land covers to be mapped. Background knowledge of the identity of pixels in the input image, such as spatial location and association in a scene, is ignored during classification. This differs from the human interpreter, who routinely uses location or association in conjunction with other image elements in reaching a decision, in addition to the expertise accumulated from years of practice in image interpretation. In order to achieve higher classification accuracy than is possible at present, the automatic classification method must make use of more clues or evidence in labeling pixels. The solution to overcoming this limitation lies in the incorporation of external knowledge into the classification process. This is known as intelligent, or knowledge-based, image classification. It is defined as the use, in the decision making, of knowledge or additional evidence that does not commonly form a part of the input to the computer, in a manner similar to that of the human interpreter. The external knowledge may be derived from the image itself independently of image classification or from other data sources. In this way multisource data are taken advantage of in the analysis. Use of spatially explicit knowledge is conducive to correct identification of pixels in the image, and likely leads to more reliable classification results. Such intelligent image classification distinguishes itself from the classification methods covered in the previous three chapters in that external knowledge is used to determine the identity of pixels in the input image. It is viewed as intelligent because the decision making based on rules gives the impression that the machine is able to “think” or “reason” just like the human interpreter.
This chapter comprises seven sections. The first section introduces the general features of expert systems, a small subset of intelligent image analysis, and assesses their current status and potential in image classification. The second part of the chapter elaborates on the type of knowledge that has found applications in image classification. How to acquire such knowledge from external sources forms the content of Sec. 11.3. The various ways of representing knowledge for use in image classification are discussed in Sec. 11.4. Section 11.5 concentrates on the calculation of evidence from multiple sources that is so vital to intelligent reasoning in image analysis. The last two sections are devoted to knowledge-based image classification. The topics covered include incorporation of knowledge, and a critical evaluation of its strengths, limitations, and potential in improving classification accuracy via a case study.
11.1 Expert Systems
As a subset of artificial intelligence, expert systems are computer programs that make use of human knowledge in a restricted domain to provide advice or solutions to problems (Jackson, 1990). In these computer systems, real-world problems that demand special expertise are solved through a model of human reasoning, reaching the same conclusions that the human expert would come to if faced with the same problem (Weiss and Kulikowski, 1984). Expert systems can be commercial systems applicable directly to the targeted field of study, including image analysis in which expert knowledge is involved. They are also known as rule-based systems since knowledge is usually represented in the rule format. Such a system emulates the human expert in determining the identity of pixels based on human experience and the “rules of thumb” of experts. Knowledge-based systems, however, are not synonymous with expert systems in that they do not rely as heavily on algorithmic or statistical reasoning as expert systems do (Jackson, 1990).
11.1.1 General Features
An expert system comprises two independent components, a domain-specific knowledge base and a domain-independent inference engine or control mechanism (Robinson et al., 1986) (Fig. 11.1). This section focuses on the inference engine; the knowledge base containing rules and facts will be covered in Sec. 11.1.2. The inference engine, equivalent to the human brain, relates facts and rules defined by the image analyst. It determines which rule instantiations are relevant to a given working memory configuration and assembles them in a conflict set in order to devise a resolution strategy for firing rules. Namely, it governs the order in which the rules are executed. There are two orders of execution, either forward-chaining (top-down) or backward-chaining (bottom-up) inference (Schowengerdt and Wang, 1989).
[Figure 11.1 shows the expert, image data samples, and a GIS database (DEM, climate, soil, vegetation, roads, etc.) feeding, through knowledge acquisition and machine learning, a knowledge base of facts and rules; the inference engine (control), linked to explanatory and user interfaces, produces the classification results.]
FIGURE 11.1 General scheme and components of an expert system for image analysis. (Source: Modified from Desachy et al., 1996.)
Other components of an expert system may include the knowledge acquisition subsystem, a justifier to explain the reasoning of the program, methods for treating uncertainty, and the internal and external communications mechanisms (Goldberg et al., 1985). The justifier “explains” the line of reasoning by supplying the analyst with the piece(s) of evidence used in reaching a conclusion.
11.1.2 Knowledge Base
Featured prominently in the knowledge base is a collection of facts, in addition to rules, variables, and output classes of interest. Facts refer to factual information about the problem to be solved, properties of the target, or relationships among different targets. For instance, “mangroves are evergreen trees located in the intertidal zone” is a fact. Rules are conditional statements that can result in certain outcomes or conclusions drawn from the conditions. Variables describe a certain aspect of the environment or special properties of the target to be extracted; they can be a spatial data layer. The variety and number of variables that need to be stored in the knowledge base depend upon the type of knowledge considered effective in mapping land covers. Output classes are essentially the land covers to be mapped from the remote sensing data. They are the same as those in all other classification methods. Of these four components, rules are the most significant in that they dictate the outcome of reasoning during image classification. In order for an expert system to function properly, the expert knowledge must be specified in a way suitable for a rule-based representation.
As shown in Table 11.1, these rules should be easily expandable by including new ones or by modifying existing ones or facts already in the knowledge base without having to update all rules. Modification of the knowledge base may be triggered by conflicting assertions from multiple variables or when the rules prove inadequate in a preclassification trial.

Rule 1: If elevation > 2000, then grassland is likely.
Rule 2: If tone is dark, then native bush is likely.
Rule 3: If texture is uniform, then plantation forest is likely.
Rule 4: If there is a definite pattern, then orchards are likely.
Rule 5: If located in periurban areas, then market garden is likely.
TABLE 11.1 Expert System Rules for Classifying Vegetation

As demonstrated in Chap. 9, knowledge is commonly represented as a tree of decision rules organized hierarchically. Each rule is associated with a number of conditions under which a set of low-level primary objects (e.g., pixels) is abstracted into a set of high-level land cover classes (Fig. 11.2). The exact number of conditions associated with a rule varies with the land cover to be mapped. Usually, the more conditions are considered, the more reliably the land cover class concerned can be identified. However, more conditions also create more chances for conflicting evidence to arise.

FIGURE 11.2 A typical example of a knowledge base used in knowledge-based classification. The number of rules and conditions under each rule varies with the cover to be classified.

Rules are commonly codified manually from knowledge. However, it may take a long time to construct the knowledge base using this manual method. An alternative is to generate rules automatically via machine learning or decision trees. For this reason, expert systems are commonly known as expert decision trees. Nevertheless, the two differ in how conclusions are reached. In a decision tree, each condition is evaluated before a conclusion is reached. In an expert system, the conclusion (hypothesis) cannot be reached until all the conditions are evaluated to be true.
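To make this distinction concrete, the following is a minimal Python sketch of how rules in the spirit of Table 11.1 might be evaluated for a single pixel. The attribute names, thresholds, and rule set are illustrative assumptions, not part of any particular expert system shell.

```python
# A minimal sketch of expert-system rule evaluation in the spirit of
# Table 11.1. Attribute names, thresholds, and rules are assumptions.

RULES = {
    "grassland":         [lambda px: px["elevation"] > 2000],
    "native bush":       [lambda px: px["tone"] == "dark",
                          lambda px: px["elevation"] <= 2000],
    "plantation forest": [lambda px: px["texture"] == "uniform"],
    "market garden":     [lambda px: px["location"] == "periurban"],
}

def classify(pixel):
    """A hypothesis is reached only if all of its conditions are true."""
    hypotheses = [cover for cover, conditions in RULES.items()
                  if all(cond(pixel) for cond in conditions)]
    return hypotheses or ["unresolved"]

# A dark-toned pixel at 1800 m with uniform texture satisfies two rules;
# the conflicting hypotheses would be passed to the inference engine.
print(classify({"elevation": 1800, "tone": "dark",
                "texture": "uniform", "location": "rural"}))
```

When several hypotheses survive, as in the example above, resolving the conflict is the job of the inference engine rather than of the individual rules.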
11.1.3 Expert Systems and Image Analysis
The attempt to incorporate knowledge into automatic image analysis started in the early 1980s when Goldberg et al. (1983) produced a rule-based expert system for analyzing forested regions in multitemporal satellite imagery. After the image data are classified using standard methods with confidence or reliability attached, the results are reevaluated using the expert system. Production rules are used to check whether the detected changes are genuine or caused by misclassifications. Another hierarchical expert system was devised to update forestry maps from satellite data (Goldberg et al., 1985). The top-level experts set the agenda, such as detecting changes between the maps and satellite imagery. They communicate with low-level experts via the blackboard method. Instead of dealing with satellite data, a knowledge-based system was designed for aerial photograph interpretation (McKeown, 1984). Essentially, this system is a comprehensive database containing high-resolution aerial photographs, digital terrain data, and digitized maps from which spatial knowledge is derived. This system has a limited capability of “recognizing” objects, with the exception of airports. Since the research was still in progress at the time of publication, no report on the effectiveness of the system was published. By comparison, the expert system developed by Nicolin and Gabler (1987) for automatically analyzing aerial images of suburban scenes is much more advanced. Able to handle static, monocular, and panchromatic aerial photographs, it works by first constructing a gray-level profile, followed by image segmentation and a structural analysis of segments in the processed image. During the final stage of analysis, semantics is attached to image segments. This system, designed mainly for identifying individual objects from large-scale photographs, bears some embryonic resemblance to modern-day intelligent image analysis systems. By the late 1980s there was a resurgence of interest in the use of expert systems for processing satellite data. Goodenough et al. (1987) described two hierarchical expert systems: the first for updating maps based on their comparison with images; the second for providing
advice on extraction of information from remotely sensed data. This system is intended to help the user access existing hardware and software resources efficiently. Thus, it is not related particularly to information extraction. This deficiency persisted in other similar studies. For instance, a model of expert systems provides advice on how to combine different image processing functions to most effectively analyze images (Matsuyama, 1987). Since this research did not proceed beyond the implementation stage, it is impossible to judge its performance. This situation changed with another embryonic expert system described by Schowengerdt and Wang (1989). It has two subsystems, one for image processing, and another for rudimentary image analysis, such as contrast enhancement and noise suppression. Primarily, the system serves as a friendly interface between the analyst and the image processing system. All the above expert systems fall into three categories: consultation systems for image processing, knowledge-based program composition systems, and systems for image segmentation (Matsuyama, 1989). The knowledge is about how to use image processing techniques, not about the scene under consideration or the targets to be extracted. These systems offer a user-friendly operating environment for image analysis with limited enhancement of classification capability. Effective use of available image processing methods and functions is accomplished through the shell environment. It is so powerful that it supplies most of the features necessary for developing and implementing knowledge-based systems. The lack of progress in improving automatic extraction of objects from images using knowledge-based analysis did not change until Mulder et al. (1991) designed a rule-based expert system to classify soil from SPOT (Le Systeme Pour l’Observation de la Terre) images. An already segmented image and its attributes (e.g., output classes of interest) form the inputs to the expert system, in which rules are implemented in both boolean logic and the Bayes’ theorem. Each segment is checked against the list of attributes before being allocated to one of the classes. This early system has limited functions to improve the accuracy of image classification. Later this situation changed with the application of expert systems to selection of spectral bands for classifying polar ice (Penaloza and Welch, 1996), production of landslide warning maps from Earth observation satellite and ancillary data (Muthu and Petrou, 2007), and automated identification of surface materials based on their spectral properties (Kruse et al., 1993). In these applications, the expert system was used to determine the effectiveness of each method in classifying polar scenes, to further reduce the features into a more optimal set, and to automatically identify the principal surface mineralogy from Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) data, respectively. As for image classification, the expert system approach was applied to mapping vegetation types in a eucalypt forest by incorporating
ecological knowledge (Skidmore, 1989), combining airborne hyperspectral imagery with terrain data derived from radar altimetry to classify coastal wetland vegetation (Schmidt et al., 2004), and classifying urban areas from satellite imagery (Moller-Jensen, 1990). Different types of classification rules were tried to explore the possibility of using “quantitative knowledge” and more informally stated “heuristic” assumptions, reflecting the human interpretation process, as classification criteria. Later, object models and texture were incorporated into rule generation in an expert system (Moller-Jensen, 1997). In summary, expert systems have been used for a wide variety of purposes in image analysis, such as selection of the most useful features or evidence in a classification. The early systems were designed to facilitate the use of then-complex image processing systems. Only recently has the application of expert systems been widened to include classification of land covers from satellite data. Some successes have been achieved in improving the classification accuracy of vegetation mapping. No general expert system has yet been developed for deriving land cover maps from a generic type of satellite data.
11.2 Knowledge in Image Classification
Although knowledge has been widely associated with image analysis methods such as rule-based, model-based, and context-based, there is no universally accepted definition of knowledge. In the broad sense, knowledge refers to any kind of information other than the normal input data, including not so commonly or widely used data about the target and other related objects, such as models, rules, and context (Baltsavias, 2004). In image classification, knowledge may signify awareness of or familiarity with pixel identity as gained from visual interpretation or image data analysis in the past. It portrays the geometric, topologic, or even thematic properties of a group of pixels restricted to a special domain of thematic mapping. Knowledge can be classified into various types using different criteria.
11.2.1 Type of Knowledge
The knowledge used in image analysis falls into two broad types in terms of function, procedural and declarative (Robinson et al., 1986). Procedural knowledge refers to knowledge required to perform a particular task, such as the know-how of computer programming. It describes how to locate, recognize, and classify ground cover features. Declarative knowledge is defined as acquaintance, familiarity, a fact, or an integrated collection of facts about an information class or its relationship with other cover classes in the vicinity. Such knowledge can be either image-based or nonimage-based information that is conducive to inferring or reasoning about the genuine identity of ground features.
Declarative knowledge used in image classification can be grouped in various ways, ranging from general to domain specific. General knowledge pertains to acquaintance with a geographic area or a geographic feature (e.g., vegetation). It can be gained from reading textbooks and professional journals, or originate from special knowledge, judgment, and experience. General knowledge also refers to familiarity with the imaging system (e.g., spatial resolution and number of spectral bands), or the structural and spatial layouts of artificial structures, as well as prior map knowledge. Domain-dependent knowledge, or domain-specific knowledge, includes familiarity with the scene and types of structures and their relationship. It may also include contextual and semantic constraints among objects (e.g., hedgerows border fields) (Mason et al., 1988). In the automated mapping of ground features from satellite data, domain knowledge can be subdivided into different detail levels (Argialas and Harlow, 1990), for instance, at the regional level and at the object level. Regional knowledge provides information on the geographic area in which the ground features to be mapped are located. Such knowledge is vital in accurately mapping certain cover classes that have a strong geographic affiliation. Object knowledge concerns the appearance of features in a given image, such as size, shape, texture, and color, which may be related to its spatial resolution. Domain-specific knowledge is related closely to discipline knowledge or target knowledge. Discipline knowledge can be further classified as factual or heuristic (Argialas and Harlow, 1990). Factual knowledge, or facts, is widely shared through such publications as journals and books, and is commonly agreed upon and accepted by those knowledgeable in the field concerned. By comparison, heuristic knowledge is based on good practice, judgment, and plausible reasoning in a given field. As such, it is not publicly discussed as it is mostly experiential and judgmental knowledge of performance. Thus, heuristic knowledge has a limited scope of application in knowledge-based image classification. In terms of its sources, knowledge may be categorized as internal or external. Internal knowledge is derivable from the remote sensing image data to be classified. External knowledge refers to the image scene and its relationship with the environment. Such knowledge is classified as type I or type II by Zhu et al. (1996) in mapping natural resources. Type I knowledge refers to the typical environmental configurations under which a resource exists. This knowledge may take such forms as global classification systems (e.g., soil taxonomy) and resource descriptions (e.g., soil descriptions). Type II knowledge is usually domain specific, such as ecological knowledge and geographic knowledge. It defines the relationship between the changing membership of a resource category and the deviation of environmental conditions from the typical configuration. Type II knowledge may be approximated by general linear regression
models needed to map natural resources as spatial continua under fuzzy logic. In contrast to such domain-specific knowledge, it is the spectral and spatial knowledge that is commonly used in knowledge-based image classification.
11.2.2 Spectral Knowledge
Object knowledge, or knowledge of the target and its context within the scene, may be spectral or spatial in nature. Spectral knowledge depicts the radiometric properties, or pixel values, of target land covers in the spectral bands used in a classification. As shown in Fig. 11.3, this knowledge is a reflectance profile in the multispectral domain. Spectral knowledge may refer to the contrast and color of a cover category in a given spectral band. It can also refer to the selection of spectral bands for a classification, or the description of each category in terms of the relevant spectral features. A wide range of sources are available for gaining spectral knowledge, such as the irrefutable physical nature of the situation at hand, the experience and expertise of the image analyst, or the satellite data being analyzed (Civco, 1989). Understanding the physics of the scattering process, such as the nature of backscattering from surfaces and volumes on radar images, is able to generate spectral knowledge. In addition, spectral knowledge can be gained from experience with extensive experimental measurements and theoretical analyses conducted for ground covers (Pierce et al., 1994). However, this means of acquisition is much more complex than examining training samples for possible pixel value ranges.
[Figure 11.3 plots DN (digital number, 0–255) against spectral bands 1–4 for forest, quarry, mudflat, and water.]
FIGURE 11.3 Spectral knowledge of land covers in different spectral bands used in knowledge-based image analysis. The pixel value, or digital number (DN), corresponds to reflectance; spectral bands are recorded at different wavelengths.
Chapter Eleven In the most common method of acquiring spectral knowledge, the remotely sensed data or a small portion of them (i.e., the training samples) are examined to establish the typical response profiles of relevant surface materials in the multispectral domain, or the spectral reflectance characteristics of the features to be classified (Fig. 11.3). In order to generate reliable spectral knowledge, it may be necessary to radiometrically calibrate the bands of the same multispectral data, especially for multitemporal data. Any radiometric noise degrades the quality of spectral knowledge. The gained knowledge is represented in the form of a spectral range in a given band for a given land cover class. This spectral range characterizes spectral relationships between and within categories, or category-to-category knowledge. The wider the difference between categories, the more accurately they can be separated. Multiple spectral rules in different spectral bands may be combined logically in a statement to reach a decision in the knowledge base. As the number of spectral bands increases, spectral knowledge becomes more specific and the statement becomes more complex. Other methods of acquiring spectral knowledge include laboratory and in situ measurements using a spectrometer. Generalized spectral knowledge requires extensive laboratory spectral measurements and analysis to establish and characterize the quantitative relationship between a target, target mixtures, and their spectral response (Kruse et al., 1993). In the laboratory setting it is very difficult to replicate the natural conditions under which the target is sensed, even though it is much easier to control certain variables in the measurement. By comparison, in situ measurements are more realistic. Attention must be paid to the weather conditions and the timing of measurements. Both should resemble closely those during imaging. Neither in situ nor indoor measurements, however, are perfect in that the measured results obtained at the surface of the Earth exclude the atmospheric effect. Although relatively easy to acquire, spectral knowledge alone may not always lead to effective discrimination of all land covers to be mapped. Many vegetative cover types cannot be reliably identified solely on a spectral basis, even on large-scale airborne color imagery (Plumb, 1993). If only spectral knowledge is used, misclassifications are still persistent for certain land covers, even though the results are generally more accurate than those obtained without using the knowledge (Kartikeyan et al., 1995). The accurate identification of such covers requires additional spatial information that may help to distinguish certain cover types.
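As a minimal illustration of this most common route to spectral knowledge, the Python sketch below, assuming NumPy is available, derives per-band DN ranges for each cover class from training samples. The array shapes, class names, and DN values are fabricated for the example only.

```python
import numpy as np

# A sketch of deriving spectral knowledge (per-band DN ranges) from
# training samples; shapes and class labels are assumptions.
def spectral_ranges(samples, labels):
    """samples: (n_pixels, n_bands) array of DN values; labels: (n_pixels,)
    array of class names. Returns {class: [(min, max) per band]}."""
    knowledge = {}
    for cover in np.unique(labels):
        pixels = samples[labels == cover]
        knowledge[cover] = list(zip(pixels.min(axis=0), pixels.max(axis=0)))
    return knowledge

# Synthetic example with two covers and four bands of 8-bit DNs
rng = np.random.default_rng(0)
samples = np.vstack([rng.integers(10, 40, (50, 4)),     # "water": low DN
                     rng.integers(120, 200, (50, 4))])  # "forest": high DN
labels = np.array(["water"] * 50 + ["forest"] * 50)
print(spectral_ranges(samples, labels)["water"])
```

Each (min, max) pair corresponds to one spectral rule of the kind that can later be combined in the knowledge base.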
11.2.3 Spatial Knowledge
Spatial information depicts the spatial properties of pixels and spatial arrangement (e.g., location and association) of target covers in the input imagery. Spatial knowledge of the geometric characteristics of
Intelligent Image Analysis an array of pixels is appreciated from such image elements as shape and size (e.g., land use parcel size). Shape refers to the geometric configuration of a group of pixels. It can be defined as compact or elongated using the area-to-perimeter ratio. Size refers to the physical dimension of the array of pixels. It can be determined by counting the number of pixels that are regarded as representing the same object. Compared with spectral knowledge, spatial knowledge is more diverse. It may encompass information on geographic location, spatial relationship (e.g., context), spatial arrangement (e.g., texture) (Moller-Jensen, 1990), and geometry. Knowledge of the spatial or contextual relationship of one ground feature with another or knowledge of their spatial arrangement can also be derived from diverse sources of material. Remotely sensed images, aerial photographs, topographic maps, and thematic maps are the ideal sources for acquiring spatial knowledge, even though they may differ in their spatial resolution and reliability. It is much more difficult to represent spatial knowledge than spectral knowledge. Apart from location, spatial knowledge is thorny to portray quantitatively and precisely. Compared to geometry, location is relatively easy to represent with a pair of coordinates. So long as an image is properly georeferenced, the location of any pixels in it can be uniquely identified. In fact, spatial knowledge can be gained from any data layers that have been projected to a known ground coordinate system. The accuracy of locating a pixel or the precision of spatial knowledge is a function of the spatial resolution of the image and its georeferencing accuracy. Huge discrepancies in image rectification residuals may lead to imprecise spatial knowledge or even wrong knowledge. Dissimilar to locational knowledge, contextual knowledge is more difficult to represent as it involves a combination of distance and bearing. This representation tends to be cumbersome and imprecise. The utility of spatial knowledge in image classification varies with its type and the target features to be mapped. Location is important to discrimination of certain features that have a strong environmental association, such as vegetation. Its distribution is affected by elevation and slope aspect. Spatial knowledge in the form of slope aspect facilitates separation of spectrally overlapping vegetation classes (Plumb, 1993). Spatial knowledge in the form of geometry, however, is not as effective as expected in mapping urban areas from aerial photographs owing to the lack of uniqueness (Mehldau and Schowengerdt, 1990). This can be improved via the use of external knowledge.
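The sketch below, assuming SciPy is available, shows one way such size and compactness measures might be computed for a single candidate object represented as a binary mask. The function name and the erosion-based perimeter estimate are illustrative choices, not a standard.

```python
import numpy as np
from scipy import ndimage

# A sketch of deriving simple spatial knowledge (size and a compactness
# measure based on the area-to-perimeter ratio) for one object.
def shape_knowledge(mask):
    area = int(mask.sum())                      # size: number of pixels
    eroded = ndimage.binary_erosion(mask)
    perimeter = int(mask.sum() - eroded.sum())  # boundary pixel count
    compactness = area / max(perimeter, 1)      # higher values = more compact
    return {"area": area, "perimeter": perimeter, "compactness": compactness}

mask = np.zeros((20, 20), dtype=bool)
mask[5:15, 5:15] = True                         # a compact 10 x 10 block
print(shape_knowledge(mask))
```

An elongated object of the same area would yield a much lower compactness value, which is the kind of geometric evidence a rule can test.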
11.2.4 External Knowledge
This nonimage knowledge concerns the relationship between the geographic phenomenon under study, observable in the image, and other environmental variables. Such knowledge describes the spatial
Chapter Eleven characteristics or relationship between one aspect of the environment (e.g., slope orientation) and the identity of land covers. Knowledge external to the scene is domain specific (e.g., ecological based). It serves an especially valuable function in mapping features that bear a close relationship with the environment, such as vegetation. For instance, vegetation species distribution is related commonly to such environmental parameters as gradient, aspect, and topographic position (Skidmore, 1989). An ecotonal association may exist between forest types and the environment. Mangroves are distributed in the intertidal zone in the coastal environment. Such discipline knowledge is usually acquired exclusively from human experts. Unlike internal knowledge that can be derived from remote sensing data themselves, external knowledge has to be acquired from non-remote sensing data, such as field samples and geographic information system (GIS) data layers (Schmidt et al., 2004), or conversation with a human expert. Field plots, however, are available at limited spots at most. They are expensive to acquire and their acquisition may take a long time. Increasingly, knowledge external to the scene is acquired from an existent spatial database that has evolved to become the most important knowledge source. Common components of the database are topographic data in the form of digital elevation models (DEMs), soil maps (e.g., soil type and moisture), geology maps, climatic data (e.g., temperature and rainfall), and hydrographic data (e.g., drainage). Topographic data are readily available and have a spatial accuracy standard consistent within a national boundary. A wide range of environmental variables, such as elevation, distance to existent roads, and neighborhood size, can be readily derived from topographic data. In addition, existing digital thematic maps, large-scale plans, cadastral maps, bathymetric maps, and road maps are all potential sources of external knowledge. With the increasing use of GIS databases, these data layers will become more readily available and more diversified. Additional preprocessing may be essential to convert them into a useable format. Derivation of external knowledge is ideally done from overlay analysis of remotely sensed data with GIS data. Given that these external data sources play a significant part in the decision making during image classification, they should not be regarded as ancillary or auxiliary anymore. In fact, they are an integral part of the knowledge base. Use of external knowledge, such as a priori information about the expected distribution of classes in a final classification map, can reduce classification errors in remotely sensed images. Incorporation of external knowledge into image classification resulted in more accurate mapping of ground covers (Wilson and Franklin, 1992). For instance, the accuracy level was improved from approximately 74 percent using remote sensing data alone to 85 percent after these data were combined with geomorphometric variables in the classification. The degree of the effectiveness of external knowledge is governed by the features to
be mapped, as well as its quality, such as its reliability and currency (Gao et al., 2004).
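As a small, hypothetical illustration of the kind of overlay analysis described above, the sketch below relabels classified pixels that contradict an elevation rule taken from an external DEM layer. The class codes and the 2000-m threshold are assumptions for the example, not values from the cited studies.

```python
import numpy as np

FOREST, GRASSLAND = 1, 2   # assumed class codes

def apply_elevation_rule(classified, dem, treeline_m=2000):
    """Relabel forest pixels that lie above the treeline in the DEM layer."""
    revised = classified.copy()
    revised[(classified == FOREST) & (dem > treeline_m)] = GRASSLAND
    return revised

classified = np.full((4, 4), FOREST)                 # spectral result
dem = np.linspace(1500, 2500, 16).reshape(4, 4)      # coregistered DEM (m)
print(apply_elevation_rule(classified, dem))
```

The DEM must of course be coregistered with the imagery before such a rule can be applied, which is why the external layer is treated as part of the knowledge base rather than as an afterthought.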
11.2.5 Quality of Knowledge
Knowledge quality should be judged against four criteria: precision or accuracy, reliability, currency, and universality. Accuracy refers to the degree to which the knowledge is valid. It is usually related to the spatial resolution of the satellite data. For instance, the use of coastal line derived from small-scale topographic maps is not precise as local geographic variations are likely to have been generalized. Imprecise knowledge cannot guarantee the effectiveness of knowledge-based classification. In order to ensure the highest quality of the knowledge generated, the external sources must have a spatial scale compatible with that of satellite imagery. Discrepancy in scale means slightly varied positions for the same physical feature, and hence degradation in the quality of spatial knowledge derived from it and its effectiveness in subsequent image classification. Apart from the data source, the manner of expressing knowledge also affects its accuracy. It is well known that there is a correlation between the distribution of biomass and elevation. As elevation rises, land cover gradually transforms from forest to grassland. Dependent upon the absolute height above sea level, biomass can change gradually from mature trees, to shrubs, to lichens, and eventually to barren. At a given location, the transition from one type of vegetation to another takes place at an approximate elevation. If this relationship is expressed as a set of discrete rules, they are likely to be imprecise. The knowledge is more accurate if the relationship is expressed as fuzzy. Reliability refers to the degree to which the knowledge is correct. Knowledge is not reliable if the source data from which it is derived are imperfect. For instance, if the spectral knowledge is derived from a very small training sample that is atypical of a land cover, the knowledge derived from it cannot be reliable. Random noise in the input image and atmospheric noise can also degrade the reliability of the spectral knowledge gained from it. Similarly, unreliability occurs if the training samples are not selected from the images to be classified or are selected from images recorded at a different date. Another cause of generating unreliable knowledge is the use of external sources that have a high degree of uncertainty and a spatial scale much smaller than that of the imagery to be classified. Unreliable knowledge should be avoided in knowledge-based image classification because rules generated from it are not precise or adequate enough to warrant improved classification accuracy. On the contrary, it is highly possible for them to degrade the accuracy of final classification results. Currency refers to the recency of the knowledge. Knowledge currency is especially important to consider for those geographic features that change quickly over a short time span, such as vegetation and tidal position. Vegetation structure, species composition,
Chapter Eleven vegetation distribution, and even wildlife habitat can all change very quickly, even within days because of a forest fire. Tidal position always changes with time. If the external data source used to derive domain knowledge for land covers whose properties vary drastically with time, it is imperative that the data be obtained at a time as close to that of the image data as possible to ensure that the knowledge gained from them is reasonably current. Otherwise, the knowledge derived from obsolete sources can be so outdated that it is virtually useless. For instance, the coastal line shown on satellite imagery is the watermark at the time of imaging. At this given moment the tide could be high or low. However, the same coastal line from external topographic data represents the highest watermark. The two positions of coastline are rarely identical to each other owing to tidal fluctuation. This temporal discrepancy or delay should be avoided by using more stringent criteria if the object of study is located in the coastal environment. In addition to temporal incompatibility, another aspect of knowledge currency that is also important to consider is seasonality. The same physical feature may vary with seasons. For instance, the snow line is lower in winter but higher in summer. This seasonal variation must be factored in when mapping vegetation from satellite images using external knowledge of the snow-line height. Knowledge universality refers to the applicability of knowledge to different geographic contexts. It is a time-consuming task to construct a knowledge base from scratch even with machine learning. A tremendous amount of time and expense is potentially saved if existing knowledge can be shared widely among all scientists working on the same set of remote sensing data. Knowledge sharing is permissible only if it is universally valid. Knowledge universality may be geographic or contextual. Geographic universality is an important consideration if the knowledge is to be applied to different study areas. Its validity may be restricted to a certain geographic extent or boundary, depending upon the nature of the knowledge. For instance, it may be universal within a local area, a region, or globally. Local knowledge is valid at a specific geographic site, or in a particular image (e.g., spectral knowledge). Regional knowledge applies to a broader geographic zone such as an alpine environment where vegetation distribution is governed by elevation. In general, spectral knowledge can be universal if derived from a classified image that contains virtually a set of land cover codes, so long as the codes for the same ground covers remain unchanged. However, if derived from raw images, the thresholds in the decision rules have to be modified accordingly from training samples selected from the respective images because spectral value is rarely universal. In fact, a rule-based classification was not even universally applicable to 14 individual image mosaics covering an area as small as 36,000 ha (Lathrop et al., 2006). Spectral knowledge has to be modified for different types of satellite imagery and for different geographic areas,
even for the same type of satellite data obtained at another date, to compensate for the effect of changed illumination conditions during imaging. In contrast to spectral knowledge, spatial knowledge is more universal and applicable anywhere around the world. For instance, mangroves are located in the intertidal zone around the globe. This fact or knowledge does not vary with geography and can be applied to the mapping of all kinds of mangroves around the world. Occasionally, it may be necessary to modify such spatial knowledge for multitemporal images. As an example, the exact position of the snow line has to be modified in accordance with the season of image acquisition.
11.2.6 Knowledge Integration
The diverse types of knowledge, or multiple pieces of evidence, discussed previously may be combined to form a compound decision rule during image classification. Three methods are available for this integration. The first method is to join them together in a conditional statement using boolean decision rules based on the logic AND, in the following form:

IF “NDVI >= 0.7” (first evidence)
AND “elevation < 2000” (second evidence)
AND “ASPECT = North” (third evidence)
THEN ... (decision)

In this example, three pieces of evidence are integrated. In practice, there is no limitation as to how many pieces of evidence can be combined in the statement. This qualitative reasoning method of combination is deterministic. There are only two possible outcomes, true or false. The final evaluation of the hypothesis is true only when all conditions are met or all pieces of evidence are correct. This method of combination functions well only when all pieces of evidence are equally reliable or when their reliability is unknown, so all are treated equally. It is flawed if the available pieces of evidence have an unequal degree of reliability. In this case it is more precise to reflect the reliability of a piece of evidence using fuzzy logic (Chou et al., 2005). The uncertainty level is quantified by assigning a confidence value (e.g., a membership between 0 and 1) to the evidence. A membership lying anywhere between these two extremes is especially suited for evidence concerning environmental relationships. The knowledge base shown in Fig. 11.2 thus has a probability attached to every condition, every rule, and every information class. Furthermore, the contribution of a given piece of evidence toward the separation of ground features can be precisely captured by assigning a larger weight to it. The combination of such evidence requires the second method of integration, based on probability, known as numerical reasoning (Skidmore, 1989). Numerical reasoning is essential for nondeterministic knowledge, such as heuristic
estimates from the “feeling” or “knowledge” of experts (Schmidt et al., 2004). Based on the Bayes’ theorem, this method makes use of the same probability function as Eq. (7.11) except that the probability is substituted with the confidence value of the evidence. This method is better than the first in that it allows the accumulation of evidence by using previously classified results or when new attributes become available. It has high classification accuracies and is computationally efficient (Srinivasan and Richards, 1990). The last method is based on the Dempster-Shafer theory of evidence (D-S ToE). Its mathematical underpinning is so complex that it will be discussed in depth under a separate heading in Sec. 11.5.
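The contrast between the deterministic boolean combination and a confidence-weighted alternative can be sketched in a few lines of Python; the thresholds, membership values, and weights below are invented for illustration and are not drawn from any of the cited studies.

```python
# A sketch of two of the integration methods described above.

def boolean_rule(ndvi, elevation, aspect):
    """Deterministic combination: true only if every condition is met."""
    return ndvi >= 0.7 and elevation < 2000 and aspect == "north"

def fuzzy_rule(memberships, weights):
    """Graded combination: memberships are confidences (0 to 1) for each
    piece of evidence; weights express the relative reliability of each
    source and are assumed to sum to 1."""
    return sum(m * w for m, w in zip(memberships, weights))

print(boolean_rule(0.75, 1800, "north"))             # True or False only
print(fuzzy_rule([0.9, 0.6, 0.8], [0.5, 0.3, 0.2]))  # graded support, 0 to 1
```

The weighted sum is only one simple way of accumulating graded evidence; probability-based numerical reasoning and the Dempster-Shafer theory provide more principled alternatives.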
11.3 Knowledge Acquisition
Knowledge acquisition is the process of transforming domain-specific problem-solving expertise from some knowledge source to a format recognizable by the computer (Buchanan et al., 1983). This complex cognitive process may involve perception, learning, communication, association, and reasoning. Under ideal circumstances the knowledge about the target may be derived statistically. In particular, the Bayes’ theorem has been used to update the probability of rules. Statistics-based rule derivation is applicable only in the ideal situation. As a result, this method of acquisition has not found wide applications in comparison with other methods, such as domain experts and machine learning.
11.3.1 Acquisition via Domain Experts
Robust and versatile knowledge-based image classification requires at least one acknowledged human expert in the field concerned (e.g., distribution of vegetation in relation to the physical environment). In addition, the acquisition of external knowledge also requires a knowledge engineer who is responsible for the representation of expert knowledge and the design of the inference engine. Another responsibility of the knowledge engineer is to select the aspects or attributes of data that have been found important in differentiating the land covers to be mapped. The knowledge engineer works closely with the human expert to develop and test declarative knowledge and production rules for inclusion in the knowledge base. The domain expert may be given a detailed questionnaire to fill out, which is then followed up by an interview during which the human expert articulates the domain-specific knowledge in a language comprehensible to the knowledge engineer (Huang and Jensen, 1997). The knowledge engineer chooses the knowledge components to be included in the knowledge base, decides its format of representation, and then translates the knowledge into a machine-readable format. Before knowledge can be acquired, the image analyst needs to clearly identify and delimit the area of application. The next task is to decide upon a form suitable for representing and organizing knowledge in smaller groups. Only then can the process of acquiring
the desired knowledge from human experts in the field begin. Knowledge acquisition from domain experts has been generally considered a critical bottleneck in knowledge-based image analysis for two reasons. First, the human experts may not be able to articulate and formulate their knowledge consistently and thoroughly enough to enable its implementation. Second, it is not clearly understood how humans acquire, organize, and process domain knowledge. Associated with this lack of understanding is the difficulty in conveying the domain knowledge from the expert to the knowledge engineer. For instance, in acquiring image-based knowledge, the human expert reaches a decision about the identity of an object from simultaneous consideration of many image elements, some of which may be interrelated. The expertise accumulated through years of experience in photo interpretation may be difficult to accurately articulate and convey to others. However, the difficulty of knowledge acquisition does vary with the nature of knowledge and with the domain of application. The narrower the domain of application, the easier it becomes to acquire knowledge. The response to this difficulty is the emergence of knowledge acquisition via machine learning.
11.3.2 Acquisition through Machine Learning
Machine learning is a process of acquiring knowledge through computer modeling. Since this method can be used to build a knowledge base automatically, the process of knowledge acquisition is thus expedited. In machine learning, knowledge is generated from existing data through either inductive or deductive inference strategies, which might make the knowledge engineer redundant in knowledge acquisition (Fig. 11.4). Inductive learning refers to making accurate generalizations from a few scattered facts using inductive inferences.
[Figure 11.4 contrasts (a) the traditional approach, in which a domain expert and a knowledge engineer, drawing on the literature and training, learn, understand, interact, and encode the knowledge base of decision rules, with (b) the machine learning approach, in which the knowledge base is learned directly from training data.]
FIGURE 11.4 Methods of knowledge acquisition in knowledge-base development. (a) Traditional method of constructing the knowledge base; (b) the machine learning approach. (Source: Modified from Huang and Jensen, 1997.)
Inductive learning algorithms can be used to generate production rules from training data. A number of good training samples suffice for the construction of the knowledge base. It can be achieved much more easily than explicit extraction of complete general theories from the domain expert. Several inductive learning algorithms have been developed already, one of which is the decision-tree learning algorithm. This flexible algorithm allows the learned knowledge to be represented as rules, and hence is highly suitable for building the knowledge base. The machine learning approach to automating the construction of knowledge bases for image analysis uses an inductive learning algorithm (Huang and Jensen, 1997). This method of building a knowledge base from training data for rule-based image classification is much easier than using the conventional domain-expert approach of knowledge acquisition. The construction of a knowledge base from remotely sensed data using this method usually involves three steps: training, decision tree generation, and the creation of production rules. The training dataset serves as examples from which concepts (e.g., rules that dictate how the remaining data should be classified) are generalized. A decision tree is essentially a classifier comprising a root, branches, nodes, and leaves. The path from the root to a leaf can be represented as a production rule. The issue of knowledge representation in this rule format is so complex that it will be covered separately (Sec. 11.4.2).
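For illustration only, the sketch below induces a small tree with scikit-learn (assumed to be available) and prints its root-to-leaf paths, each of which can be rewritten as an IF-THEN production rule. The feature values and class labels are fabricated for the example.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Fabricated training examples: [NDVI, elevation in metres] per sample
X = [[0.8, 150], [0.7, 300], [0.2, 80], [0.1, 40], [0.75, 2200], [0.6, 2400]]
y = ["forest", "forest", "water", "water", "grassland", "grassland"]

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Each root-to-leaf path printed below corresponds to one candidate
# IF-THEN production rule for the knowledge base.
print(export_text(tree, feature_names=["NDVI", "elevation"]))
```

In practice the induced rules would still be screened against a test dataset before being admitted to the knowledge base.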
11.3.3 Acquisition through Remote Sensing and GPS
Remote sensing imagery is a rich source of information. A wide variety of information that cannot be recognized or taken advantage of in per-pixel classification is visible to the human interpreter. The input image can yield potentially useful special knowledge prior to any intelligent image classification using both manual interpretation and digital analysis. Data acquired via the manual method (Fig. 11.5) usually have the vector format, showing the spatial position or distribution of critical features (e.g., coastal line, hydrographic features, snow line). Such spatial knowledge has to be transformed to the same ground reference system as that of the image itself before valid knowledge can be derived from them. The automatic classification method is suited to obtaining spectral knowledge of the features to be mapped. Such knowledge can come from the training samples (knowledge base I), or externally from a GIS database (knowledge base III), and reveals the minimum and maximum pixel values of a given feature in a spectral band, and even the degree to which the land cover classes to be mapped can be differentiated spectrally. It can also originate from the classified image (knowledge base II). There are several advantages of acquiring knowledge from the image, as the knowledge and the image data have the same spatial resolution and temporal currency. Another advantage is that a wide variety of remotely sensed data are readily available.
[Figure 11.5 shows remote sensing imagery, after subsetting and georeferencing, supplying training samples (knowledge base I) and image classification results (knowledge base II), while a GIS data layer is digitized, rasterized, encoded, and coregistered with the RS imagery to form knowledge base III.]
FIGURE 11.5 Knowledge acquisition from remote sensing (RS) images and GIS databases. The external knowledge from a GIS is used to generate spatial knowledge base III manually, while image classification is able to yield spectral knowledge bases I and II.
Thus, it should be easy to find the right data at the right spatial resolution. Any ground features that are visible on the satellite imagery can be used to acquire knowledge about them. The downside of using remotely sensed data is their imprecision in comparison with in situ collected data (Muthu and Petrou, 2007). Besides, the process could be slow and tedious. It is not possible to acquire knowledge for all features. For instance, knowledge of topographic characteristics, soil, and tracks too narrow to be discernible on satellite imagery has to rely on other data sources, such as GIS and the global positioning system (GPS). The GPS method is highly effective in acquiring point and linear data, such as the position of cell phone transmission towers, roads, and administrative boundaries. Such data are invariably spatial in vector format. Such acquired data are much more accurate and current than those from other sources. Already in digital format, they can be exported to a knowledge-based image classification system easily with minimal processing. If the receiver is mounted in a vehicle, the data can be acquired rather quickly. This technology is particularly valuable for obtaining special knowledge about ground features too small to discern on remote sensing imagery, or on larger-scale aerial photographs. The main problem with this method is accessibility. Coastal lines, hydrological features (e.g., sand bars and channel boundaries), and snow lines are usually located in inaccessible and inhospitable environments. Special prior permission for access has to be sought in some cases. The acquisition process can be very slow and lengthy.
11.4 Knowledge Representation
Once the knowledge necessary for correctly classifying ground features in a particular geographic area has been acquired, the next step is to decide how to represent it in an appropriate machine-readable form. Knowledge representation should aim to render knowledge in such a manner as to facilitate maximum inference, or the drawing of conclusions from the knowledge. How the knowledge should be represented is affected by the image analysis system, logical structure, and knowledge representation architecture. Several theoretical frameworks have been developed for knowledge representation, one of which is models. They allow a ground feature to be portrayed from multiple perspectives, including its size in a given image of a certain spatial resolution (e.g., roads cannot be more than 20 pixels wide). It is advantageous to use simple, static spatial models connected to a metadatabase to represent objects if all information related to them can be modeled, because no remote sensing experience is then needed in their establishment (Moller-Jensen, 1997). Object models are usually based on geometry, even though topology and materials or spectral characteristics are sometimes used (Baltsavias, 2004). Geometry-oriented object models are able to depict mainly artificial objects that have a definable geometry or component, especially three-dimensional (3D) objects, such as buildings from stereoscopic aerial photographs (Matsuyama, 1987). They are ill suited to two-dimensional (2D) land covers that are more precisely described by their spectral and texture properties, hence such use is not covered here. Instead, this section concentrates on the methods of knowledge representation that have found applications in image analysis, including semantic networks, rule-based, frame-based, and blackboard representation. Which one of them is the best format of representation varies with the nature of the knowledge to be represented. For declarative knowledge, the best ways are semantic networks and frames. For procedural knowledge, the best way is rule-based representation.
11.4.1 Semantic Network
As one of several means by which knowledge may be utilized for image classification, a semantic network is a directed diagram well suited to representing declarative knowledge. Objects are represented by interconnected nodes. Their relationships, represented by directed arcs, are characterized by attributes. A set of binary relationships is depicted in a semantic network. Essential in a semantic network are entities. An entity consists of property slots and relation slots. Illustrated in Fig. 11.6 is a possible means of representing mangroves using a semantic network in which the relations are expressed as “part of” and “is.” Forest is described by three property slots: tone, texture, and location.
[Figure 11.6 shows nodes for mangroves, trees, canopy, forest, the intertidal zone, and the coastal environment connected by relations such as “is a,” “part of,” and “instance of,” with property slots for tone (light, an indicator of strong reflectance), texture (coarse), and location (intertidal zone).]
FIGURE 11.6 A possible layout for representing mangroves using a semantic network.
Mangroves are described by two relation slots, trees and canopy. One advantage of semantic-network representation is that an attribute need not be defined explicitly, as it can be inherited from another class in a hierarchy. For instance, urban commercial has all the features associated with built-up areas (e.g., a paved surface). This method of representation is limited in that it is not suitable for inference in rule format, and hence has not found wide applications in knowledge-based image classification.
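A minimal sketch of such an entity structure, loosely following Fig. 11.6, is given below; the slot names, the dictionary layout, and the single "instance of" inheritance link are assumptions made for illustration.

```python
# A sketch of semantic-network entities with property and relation slots.
network = {
    "forest":    {"properties": {"tone": "light", "texture": "coarse"},
                  "relations":  {"is a": "vegetation"}},
    "mangroves": {"properties": {"location": "intertidal zone"},
                  "relations":  {"is": "trees", "part of": "canopy",
                                 "instance of": "forest"}},
}

def inherit(entity, slot):
    """Walk 'instance of' links so an attribute need not be defined explicitly."""
    node = network[entity]
    if slot in node["properties"]:
        return node["properties"][slot]
    parent = node["relations"].get("instance of")
    return inherit(parent, slot) if parent else None

print(inherit("mangroves", "texture"))   # inherited from forest: 'coarse'
```

The inheritance walk is what spares the analyst from restating every attribute for every class, but, as noted above, the structure does not translate directly into rules for inference.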
11.4.2 Rule-Based Representation
Of the various forms of knowledge representation, rule-based representation is the most popular. A rule is a conditional statement, or a list of conditional statements, about a variable's values and/or attributes that determines an informational component or hypothesis. As the most successful method of knowledge representation, rules
consist of a series of logical clauses in the form of "IF condition THEN action ELSE some other action," or

IF [C1, and C2, and ... Cn] THEN Hk

in which Ci (i = 1, 2, ..., n) are logical constraints on one or more features. It is quite common for more than one rule to be involved in generating a hypothesis. If Rj (j = 1, 2, ..., m) is the set of rules that lead to a hypothesis Hk, then Hk is acceptable if at least one of the Rj is satisfied. The conditional part comprises one or more antecedent clauses, and the action part or hypothesis leads to a consequence, such as the creation or modification of working memory elements. It is also possible to have multiple hypotheses associated with the same conditional statements, such as

IF [C1, and C2, and ... Cn] THEN Hypothesis 1 (frequent)
ELSE Hypothesis 2 (rare)
ELSE Hypothesis 3 (seldom)

In this example three hypotheses are associated with the outcome of evaluating one conditional statement or a combination of several. The most likely hypothesis is activated first, followed by the less frequent hypotheses, with the rarest hypothesis always placed last. Ordering the hypotheses in this way raises processing efficiency.
Rules may be empirical and spatiotemporally restricted (e.g., certain vegetation cannot survive in certain kinds of climate). Rules can be created from constraints on spatial and spectral parameters. Multiple rules and hypotheses may be joined together hierarchically to dictate an ultimate set of target information classes or terminal hypotheses.
Rules are executed in two ways in a knowledge-based system, forward chaining and backward chaining (Argialas and Harlow, 1990). In forward chaining, rules are matched against existing facts in an attempt to construct new facts or hypotheses. In backward chaining, rules are applied to prove the hypothesis that the knowledge-based system starts with. This matching of rule conditions to the facts is the task of the inference engine. The two strategies differ in the order in which rules are selected and fired.
Rules may be organized hierarchically in a tree-like structure, not only in general terms but also in relation to the scene. At a higher level, rules may be based on the pixel value itself; at a lower level they can be based on shape, surface material, spatial context, and spatial relations. A knowledge-based classification model thus resembles a decision tree
classifier. However, this tree is not identical to those in Chap. 9, in which all rules are generated automatically by the machine. Rules in the expert classifier are based on ad hoc conditional value inputs associated with various criteria along the decision tree branches (Stow et al., 2003).
One of the main disadvantages of the rule-based approach is that each classification requires the activation of all rules for every pixel in the image, which slows the classification process considerably (Desachy et al., 1996). This limitation can be overcome with the connectionist approach (i.e., neural networks) to speed up computation. A learning process based on the fuzzy neural network approach is another means of automatically constructing the knowledge base. No matter whether a knowledge base is constructed automatically using machine learning or manually, it must meet certain criteria. Namely, all rules must be mutually exclusive and exhaustive. There must be a mechanism for resolving conflicting rules when they arise. If features in the data to be classified do not meet any rules, there should be provisions (e.g., default rules) that will be activated. The quality of all rules can be assessed against the error rates produced by applying them to a test dataset.
Rule-based representation has the advantage of being flexible. An existing rule base can be expanded simply by adding new rules to it, and obsolete or ineffective rules can be deleted from it. However, such an unordered and unstructured rule base is disadvantaged by its lack of organization. For instance, it is inadequate at representing complicated relationships between classes of objects, such as taxonomy and belonging. These limitations can be overcome with the frame method of representation.
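As an illustration of how such production rules might be evaluated by forward chaining, the short Python sketch below matches a pixel's attributes against a list of IF-THEN rules and fires the first rule whose constraints C1 ... Cn all hold. The attribute names, thresholds, and hypotheses are illustrative assumptions, not rules taken from the text:

# A minimal forward-chaining sketch: each rule is a conjunction of
# constraints on pixel attributes plus the hypothesis it supports.
# Rules, attribute names, and thresholds are illustrative only.

rules = [
    {"if": lambda p: p["ndvi"] > 0.5 and p["elevation"] < 5,  "then": "mangroves"},
    {"if": lambda p: p["ndvi"] > 0.5 and p["elevation"] >= 5, "then": "forest"},
    {"if": lambda p: p["ndvi"] <= 0.1,                        "then": "water"},
]

def forward_chain(pixel, rules, default="unclassified"):
    """Fire the first rule whose conditions all hold for this pixel."""
    for rule in rules:                 # most frequent hypotheses should come first
        if rule["if"](pixel):          # IF [C1 and C2 and ... Cn]
            return rule["then"]        # THEN hypothesis
    return default                     # default rule when no condition is met

print(forward_chain({"ndvi": 0.7, "elevation": 2}, rules))  # -> "mangroves"

Ordering the rules with the most frequently satisfied hypotheses first mirrors the efficiency consideration discussed above.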
11.4.3 Frames
A frame can be intuitively regarded as a conceptual entity corresponding to a ground cover class with identifiable components. It consists of a number of "slots," each storing a special attribute of the cover, its relations to other covers, and procedures to compute properties. Such a frame can be used to encode declarative knowledge about a ground feature, including its attributes and the values deemed important and relevant to the correct labeling of its identity on the ground. Multiple frames may be needed to represent a complex cover, and different types of frames may be used for different types of land covers. For instance, the frames for water should be much simpler than those for urban in terms of the number of attributes to be considered. Multiple frames are connected with each other by pointers or indices that point to other frames or attached procedures (Argialas, 1989). Pointers are essentially semantic links to other similar concepts, to more general concepts from which properties are inherited, or to more specialized concepts to which properties are passed. Relationships between frames can be depicted via membership links (e.g., the part-whole relationship) and subclass links (e.g., belonging).
Thus, land cover classes can be organized into taxonomies by describing each conceptual or physical class as a special case of other, more generic classes in a frame. Similar to semantic networks, representation of knowledge in frames enables all relevant information to be linked together. Dependent upon the complexity, nature, and purpose of a frame, it may contain the following information: a frame ID, its parent frame (if available) that provides a link for the inheritance mechanism, attributes and their values, and attached predicates (with the exception of the root frame) (Fig. 11.7). Sitting at the top of the hierarchy, the root frame is the parent of all frames. As such it contains neither slots nor properties. The number of explicitly defined slots in a frame varies with its complexity. A slot has an empty value field when initially defined. However, once defined, a slot can be shared by all of its descendants. Each slot may be associated with a number of conditions; only when these conditions are met is it assigned a value via a predicate. For instance, the "if added" predicate must test true before a value is inserted into a slot, and the "if needed" predicate must be invoked before a value is read. Thus, two predicates are attached to each slot, one for reading and another for writing values, to achieve procedural attachment. Procedural attachment is a distinctive feature of frames since not all kinds of knowledge are suitable for declarative representation. Frames are good at performing certain types of inferences, such as inheritance of attributes from generic class frames to specific instances. Inheritance is accomplished automatically through a
FIGURE 11.7 An example of a frame used to represent knowledge. (Source: Modified from Argialas, 1989.)
hierarchy that contains the parent-child or ancestor-descendant relationships among all frames. Every child frame inherits all slots and properties of its parent frames. In this method of representation, all declarative knowledge about a particular information class is stored together. Accessibility to the knowledge can be improved through better organization, such as making the frames more modular. Thus, the information can be accessed easily and manipulated efficiently.
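A minimal Python sketch of the frame structure just described is given below. It is illustrative only (frame names, slots, and predicates are assumed), but it shows the parent link used for inheritance and the "if added"/"if needed" predicates attached to slots:

# A minimal frame sketch: each frame has a parent (for inheritance),
# attribute slots, and optional "if needed"/"if added" predicates that
# guard reading and writing slot values. All names are illustrative.

class Frame:
    def __init__(self, name, parent=None):
        self.name, self.parent, self.slots, self.predicates = name, parent, {}, {}

    def add_slot(self, attr, value=None, if_needed=None, if_added=None):
        self.slots[attr] = value
        self.predicates[attr] = {"if_needed": if_needed, "if_added": if_added}

    def write(self, attr, value):
        pred = self.predicates.get(attr, {}).get("if_added")
        if pred is None or pred(value):          # "if added" tested before insertion
            self.slots[attr] = value

    def read(self, attr):
        if attr in self.slots:
            pred = self.predicates.get(attr, {}).get("if_needed")
            if pred is not None:
                return pred(self)                # value computed on demand
            return self.slots[attr]
        if self.parent:                          # inherit from ancestor frames
            return self.parent.read(attr)
        return None

root = Frame("root")                             # root frame: no slots of its own
vegetation = Frame("vegetation", parent=root)
vegetation.add_slot("tone", "light")
mangroves = Frame("mangroves", parent=vegetation)
mangroves.add_slot("location", if_needed=lambda f: "intertidal zone")
print(mangroves.read("tone"))      # inherited from vegetation -> "light"
print(mangroves.read("location"))  # via the if-needed predicate -> "intertidal zone"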
11.4.4 Blackboards
As a modified way of representing hierarchically organized knowledge, a blackboard is a database containing information about the properties of regions and recognized objects. It serves as a short-term memory or a central repository for all shared information, consisting of a number of nodes that work collaboratively to solve complex and ill-defined problems. Each node in this hierarchy has a collection of unique domain-specific knowledge, usually taking the form of a series of rules called a knowledge source. Each knowledge source has its own knowledge (e.g., facts, assumptions, and deductions drawn by the system during the course of solving a problem) and its own mechanism for carrying out its reasoning. Each knowledge source generates its own solution elements and tries to contribute to the overall solution independently of the other sources (Westinghouse Science & Technology Center, 1997). This independence means that it can be developed and maintained as a module. All knowledge sources contributing to the solution of the problem communicate with each other by writing on the blackboard (Fig. 11.8). The control system comprises an agenda setter and an events
FIGURE 11.8 A potential structure of the blackboard expert system. (Source: Modified from Westinghouse Science & Technology Center, 1997.)
coordinator that synchronizes blackboard activities, such as invoking pattern-directed procedures. The agenda of actions to be taken on the blackboard is visible to all knowledge sources. The agenda setter serves as a facilitator to determine which knowledge source offers the most insightful solution to the problem. The facilitator also mediates among different knowledge sources competing to write on the blackboard. The object-extraction session starts with the facilitator spelling out the problem and its specification, with all known facts and assumptions placed on the blackboard. The solution to the problem may be based on rules, neural networks, fuzzy logic, and genetic algorithms.
This structure has several advantages, such as modularity, flexibility, and extensibility. It supports collaboration and interaction between different experts. Unlike a hierarchical knowledge-based classification system, the blackboard is highly efficient in communication among all experts at different hierarchies and in knowledge sharing through the event manager. New knowledge sources, when they become available, may be added to the existing system without impacting other knowledge sources. Disparate knowledge sources can be easily integrated but still transparently managed by the control system. In addition, diverse modules can also be integrated using this model (Matsuyama, 1987). Nevertheless, this model of representation does not suit classification of satellite imagery, as it is unlikely that many experts will be involved in reasoning the identity of pixels in the input image.
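The following Python sketch, with entirely illustrative knowledge sources and facts, indicates how a blackboard architecture of this kind might be organized: independent knowledge sources watch a shared data store, and a simple controller (agenda setter) decides which source writes next:

# A minimal blackboard sketch: knowledge sources inspect the shared
# blackboard and, when selected by the controller, post partial
# solutions back onto it. Sources and facts are illustrative only.

class KnowledgeSource:
    def __init__(self, name, can_contribute, contribute):
        self.name = name
        self.can_contribute = can_contribute   # predicate on blackboard state
        self.contribute = contribute           # writes new facts to the blackboard

def controller(blackboard, sources, max_cycles=10):
    """Agenda setter: on each cycle, let one applicable source write."""
    for _ in range(max_cycles):
        applicable = [s for s in sources if s.can_contribute(blackboard)]
        if not applicable:
            break                              # no source can add anything new
        applicable[0].contribute(blackboard)   # simple agenda: first applicable wins
    return blackboard

# Two toy sources: one posts a spectral hypothesis, one refines it spatially.
spectral = KnowledgeSource(
    "spectral", lambda bb: "class" not in bb,
    lambda bb: bb.update({"class": "vegetation"}))
spatial = KnowledgeSource(
    "spatial", lambda bb: bb.get("class") == "vegetation" and bb.get("zone") == "intertidal",
    lambda bb: bb.update({"class": "mangroves"}))

print(controller({"zone": "intertidal"}, [spectral, spatial]))
# -> {'zone': 'intertidal', 'class': 'mangroves'}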
11.5 Evidential Reasoning
The D-S ToE is closely related to image analysis because of its ability to integrate the individual pieces of evidence or the rule-based model used in knowledge-based image classification (Baltsavias, 2004). It is able to perform domain-independent inference by combining evidences and rule bases and, in the process, to represent some level of ignorance, bias, and conflict. This section introduces its mathematical underpinning first, and then focuses on its applications in knowledge-based classification, including an assessment of its performance.
11.5.1 Mathematical Underpinning
In evidential theory (Shafer, 1976), the hypothesis space, denoted as Θ, is made up of n propositions. Contained in this space, or frame of discernment, are all possible hypothesis subsets of Θ, numbering 2^n in total. The hypotheses in Θ are exhaustive; the empty set Φ is considered a false hypothesis in Θ. Also called a mass function to distinguish it from a probability distribution, the basic probability assignment m(A) refers to the degree to which the evidence supports the hypothesis for element A. Known as a focal element, A is one element of 2^Θ (i.e., a subset of Θ). The mass function m(A) is expressed as a probability value in
the range of [0, 1], and is assigned to every hypothesis and to their possible unions. The mass function has the following two properties for every element A:
∑_{A ⊂ 2^Θ} m(A) = 1        (11.1)

m(Φ) = 0        (11.2)
The above Dempster-Shafer theory needs to be modified in light of image classification that aims to produce a crisp land cover map of C classes. The hypothesis space becomes the class space. The set of all possible hypotheses 2^Θ can be drastically reduced in classification of remotely sensed data because some joint sets are not relevant. For instance, there would be 2^3 − 1 = 7 classes if three land covers C1, C2, and C3 were mapped (e.g., three propositions) in the hypothesis space: C1, C2, C3, C1∪C2, C1∪C3, C2∪C3, and C1∪C2∪C3. With the D-S ToE it is possible to consider groups of subclasses (e.g., winter crops vs. summer crops, urban covers vs. rural covers) in addition to individual classes. In most classifications only the three land covers need to be mapped and the subclasses are ignored. This treatment makes all the propositions mutually exclusive in the class space. Thus, the theoretical number of 2^C − 1 hypotheses is simplified to C hypotheses if the focus is on individual cover classes, as is the case with all image classifications.
The probability mass has two extremes, belief (Bel) and plausibility (Pls), for a subset B of C. Both can be calculated from the mass function using the following equations:

Bel_m(B) = ∑_{A ⊂ B} m(A)        (11.3)

Pls_m(B) = ∑_{A ∩ B ≠ Φ} m(A)        (11.4)
These probabilities can either support or dispute a hypothesis. The lower extreme, Bel_m(B), represents the proportion of belief committed to B based on the given piece of evidence, or the minimum uncertainty value about B. Representing the upper bound of the probability distribution, Pls_m(B) refers to the maximum degree to which the current evidence allows one to believe in B, or the maximum uncertainty value of B (Lu et al., 2006). The uncertainty interval defined by [Bel_m(B), Pls_m(B)] is the range within which the true probability lies. Thus, the D-S ToE provides estimates of the imprecision and uncertainty of the information derived from different knowledge sources.
Spectral Band    Built-up (B)    Pasture (P)    Forest (F)
1                0.3             0.2            0.4
2                0.1             0.4            0.3

TABLE 11.2 Exemplary Values (Evidences) of Three Covers in Two Spectral Bands
The total mass function m of two independent sources of evidence, Pi and Qj, with mass functions of m1 and m2, respectively, can be calculated using Dempster's rule of combination:

m(D) = m1 ⊕ m2 = [∑_{Pi ∩ Qj = D} m1(Pi) m2(Qj)] / [1 − ∑_{Pi ∩ Qj = Φ} m1(Pi) m2(Qj)]        (11.5)
where ⊕ represents orthogonal summation; D is the shared subset of Pi and Qj, D ⊂ C, and D ≠ Φ. An example is provided in Table 11.2, which gives the data used to calculate m(D) from two spectral bands, treated as independent sources of knowledge for the convenience of illustration. In this particular case D = {B, P, F}. Notice that the probabilities along each row do not add up to 1. Their residual [1 − m(B) − m(P) − m(F)], denoted as I, is called ignorance. Table 11.3 contains two axes, each corresponding to a source of evidence. The figures in rows 2 to 5 are the products of the probabilities
                                  B 0.3         P 0.2         F 0.4         I 0.1
B 0.1                             B 0.03        Φ 0.02        Φ 0.04        B 0.01
P 0.4                             Φ 0.12        P 0.08        Φ 0.16        P 0.04
F 0.3                             Φ 0.09        Φ 0.06        F 0.12        F 0.03
I 0.2                             B 0.06        P 0.04        F 0.08        I 0.02
∑_{Pi∩Qj=K} m1(Pi)·m2(Qj)*        0.10          0.16          0.23          0.02
Supporting evidence Bel_m(K)      0.10/0.49 = 0.20   0.16/0.49 = 0.33   0.23/0.49 = 0.47   0.02/0.49 = 0.04
Conflicting evidence Pls_m(K)     0.24          0.37          0.27          0.04

*K = B, P, F, I. Columns carry the band 1 masses; rows carry the band 2 masses.

TABLE 11.3 Calculation of Dempster's Orthogonal Sum Using the Figures from Table 11.2
in both bands. In the next row, all products of the same cover (e.g., B), including those associated with ignorance, are summed up. For instance, the sum for B = 0.03 + 0.06 + 0.01 = 0.10. If additional sources of evidence become available for image classification, the beliefs and plausibilities in space C need to be updated with the existing orthogonal summation. In this case the evidence from the third source can be treated as m3, while the current m1 ⊕ m2 can be treated as either m1 or m2 and combined in the same manner as m1 with m2. Since the combination operator ⊕ is commutative and associative, the order in which the orthogonal summation is undertaken exerts no impact on the final product of the operation.
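A short Python sketch of Dempster's rule of combination, applied to the band-1 and band-2 mass assignments of Table 11.2, is given below. It is illustrative only; ignorance is represented as the full frame Θ so that it intersects every singleton class. The unnormalized class sums and the conflict it reports reproduce the corresponding figures in Table 11.3, and the final masses are normalized by 1 − K following Eq. (11.5):

# A sketch of Dempster's orthogonal sum using the masses of Table 11.2.
# Ignorance I is encoded as the full frame Θ = {B, P, F}.

from itertools import product

THETA = frozenset({"B", "P", "F"})

m1 = {frozenset({"B"}): 0.3, frozenset({"P"}): 0.2, frozenset({"F"}): 0.4, THETA: 0.1}
m2 = {frozenset({"B"}): 0.1, frozenset({"P"}): 0.4, frozenset({"F"}): 0.3, THETA: 0.2}

def combine(m1, m2):
    """Dempster's rule (Eq. 11.5): product sums, conflict, and normalization."""
    sums, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:                                   # supporting product
            sums[inter] = sums.get(inter, 0.0) + ma * mb
        else:                                       # product assigned to Φ
            conflict += ma * mb
    combined = {s: v / (1.0 - conflict) for s, v in sums.items()}
    return combined, sums, conflict

combined, sums, k = combine(m1, m2)
print(round(k, 2))                                  # conflict = 0.49, as in Table 11.3
print({tuple(sorted(s)): round(v, 2) for s, v in sums.items()})
# unnormalized class sums: 0.10 (B), 0.16 (P), 0.23 (F), 0.02 (Θ)
print({tuple(sorted(s)): round(v, 2) for s, v in combined.items()})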
11.5.2 Evidential Reasoning and Image Classification
Classification of remotely sensed data based on the evidence theory usually consists of five steps (Gong, 1996); steps 4 and 5 are illustrated in the sketch following this list:

• Determination of the probability distribution pij(xi) of Cj for each evidential source Ei. This evidential source may occur in the form of a spectral band in the input satellite data, or a GIS layer in the multisource input. In the case of satellite data, pij(DN) can be approximated as the histogram of the ith band recorded at n bits, where DN = {0, 1, 2, ..., 2^n − 1}.

• Calculation of the mass function m(Cj) for each class Cj (j = 1, 2, ..., k) for each evidential source Ei. If the evidential source is composed of multispectral bands, this calculation can be achieved using

mi(Cj) = ∑_{f(DN) = Cj} pij(DN)        (11.6)

where DN is the pixel digital number that varies from 0 to 255 for an 8-bit band, and f(DN) = Cj defines a mapping function between value DN in evidential source Ei (the feature space, observation space, or evidence space) and class Cj. If there is only a single band in the input, then mi(Cj) degenerates to pij(DN).

• Combination of the mass functions from all evidential sources using Eq. (11.5). This combination may be reiterated until all sources are taken into account if more than two sources are involved.

• Determination of the belief interval for each class Cj using Eqs. (11.3) and (11.4).

• Classification, based on a set of evidences or observations and measurements, X = (x1, x2, ..., xn), on either the total belief or the total plausibility.
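The Python sketch below illustrates steps 4 and 5 with illustrative combined mass values (assumed, not from the text): the belief and plausibility of Eqs. (11.3) and (11.4) are computed for each singleton class, and the pixel is labeled with the class of maximum total belief:

# A sketch of the belief interval [Bel(B), Pls(B)] and of classification
# on total belief. The combined masses below are illustrative, standing
# in for the output of Dempster's combination over all sources.

combined = {
    frozenset({"B"}): 0.20,
    frozenset({"P"}): 0.30,
    frozenset({"F"}): 0.45,
    frozenset({"B", "P", "F"}): 0.05,   # residual ignorance
}

def belief(b, m):
    return sum(v for a, v in m.items() if a <= b)       # A ⊂ B (Eq. 11.3)

def plausibility(b, m):
    return sum(v for a, v in m.items() if a & b)        # A ∩ B ≠ Φ (Eq. 11.4)

classes = [frozenset({c}) for c in ("B", "P", "F")]
intervals = {next(iter(c)): (belief(c, combined), plausibility(c, combined))
             for c in classes}
print(intervals)        # {'B': (0.2, 0.25), 'P': (0.3, 0.35), 'F': (0.45, 0.5)}
label = max(intervals, key=lambda c: intervals[c][0])   # classify on total belief
print(label)            # -> 'F'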
Image classification based on evidential reasoning has a number of advantages over that based on the Bayes' theorem. It is not subject to restrictions on the number of data sources and their enumeration scales. Multisource data enumerated at diverging levels can be easily incorporated into the classification. Nominal data such as soil and land use types can be handled by knowledge-based systems using the D-S ToE (Desachy et al., 1996). The capability of parametric classifiers, which can only handle ratio and interval data, is thus extended. Furthermore, the D-S ToE is able to function well in light of incomplete, missing, or even conflicting evidences, thanks to the incorporation of ignorance in computing the accumulated belief value for each inferred class at each pixel. A further advantage of the D-S ToE is that it provides estimates of the imprecision and uncertainty of the information derived from different sources (Cohen and Shoshany, 2005).
11.5.3 Utility
Whether a land cover Cj to be mapped can benefit from the use of knowledge in D-S ToE classification depends on a whole range of factors, such as its complexity and the evidences used. The effectiveness of knowledge in the classification is judged by the cumulative belief value, a measure of the evidential support accumulated for each cover class inferred at each image pixel xij in Dempster's rule. Its exact magnitude is affected by the evidences or rules available at the pixel that support and/or conflict with Cj. Given that a single class Cj may be determined from a number of variables, the same class may be associated with several cumulative belief values. There exists a positive relationship between this belief value and the number of evidences. A low value is indicative of more conflicting evidence and less supportive evidence. The distribution of these cumulative belief values reveals the complexity of the class: the belief value is inversely proportional to the complexity of the classification problem and the heterogeneity of the ground covers. For homogeneous classes (e.g., water), a prolific accumulation of supporting evidence is associated with high reliability and accuracy. In this case knowledge-based classification is scarcely advantageous over conventional parametric classifiers (Cohen and Shoshany, 2005). Spectrally complex classes (e.g., urban residential) usually have more conflicting and/or incomplete evidences, with little supportive evidence accumulated. It is these classes that may benefit from the use of knowledge-based classification.
Knowledge-based systems utilizing the D-S ToE have been successfully applied to a wide range of areas, partly because of their aforementioned advantages. Thanks to its capability to include multisource datasets, an evidential reasoning algorithm significantly outperformed a GIS-based maximum likelihood approach in analyzing a complex alpine tundra ecosystem in the Rocky Mountains (Peddle and Duguay, 1998). A multisource evidential reasoning
classifier yielded an acceptable level of accuracy in mapping grizzly bear habitat. The accuracy ranged from 44.3 percent for alpine/subalpine grasses to 100 percent for recently burned areas in a classification of 21 habitat classes at level III, with an overall accuracy of 85.9 percent (Franklin et al., 2002). This overall accuracy is much higher than the 65.3 percent achieved using the maximum likelihood classifier from the satellite data alone, or 71 percent using all available data. The higher accuracy of the evidential reasoning classifier is attributed to the use of a large and diverse dataset of 37 variables that encompassed satellite imagery, topographic descriptors derived from DEMs, and GIS inventory information. The inability of the conventional maximum likelihood decision rule to make full use of the available data is blamed for its inferior performance. In classification of rock types from a multisource dataset of remote sensing, aeromagnetic, radiometric, and gravity data, the evidential reasoning method resulted in an overall accuracy of 94.7 percent and an average accuracy of 83.8 percent (Gong, 1996).
In order to achieve such high accuracies, however, this classifier must be fed with training samples that have been selected with the right strategy, as they determine the quality of the knowledge. In particular, the performance of this classifier is subject to sample size. A large sample size, being more representative, is conducive to the achievement of higher mapping accuracy. Additionally, its performance is sensitive to noise and data variability. Wide data variability, as is the case with agricultural land, degrades the mapping accuracy. The accuracy level became acceptable only after the initial results were hardened using the standard procedure. Much more accurate results (accuracy >90 percent) were obtained after the individual belief surfaces were reclassified into Boolean layers in image classification (Lein, 2003). Through the application of this technique, a framework can be developed to support and guide the use of subjective judgments during the classification process and to permit greater flexibility in the formulation of informational classes. Unlike the unsupervised method, which performed well only in cases of homogeneity and uniqueness, this knowledge-based method is particularly effective in cases of conflict and moderate support when mapping orchards, crops, and natural vegetation (Cohen and Shoshany, 2005).
11.6 Knowledge-Based Image Analysis
Knowledge-based image analysis is a process in which declarative knowledge is relied on to infer the identity of pixels in the input image. In the process the knowledge used to derive the conclusion (e.g., the knowledge base) is separate from the mechanism (e.g., the inference engine) used to reach the conclusion, in a way quite different from statistical methods. Knowledge-based image analysis has been described variously as "rule-based," "model-based," and "context-based" in the literature (Strat and Fischler, 1991). Rules, models, and
context can all be regarded as specific formats of knowledge representation. The incorporation of knowledge in image classification takes place at one of two stages: concurrently with classification or sequentially after it. The former is termed knowledge-based image classification; the latter is called knowledge-based postclassification processing. There are two means by which knowledge-based postclassification may be implemented: postclassification filtering and postclassification spatial reasoning. Both implementations work on an already classified result to eradicate unreasonable assignments of pixel identities through the application of additional knowledge. All three types of knowledge-based image analysis are covered under separate headings in this section.
11.6.1 Knowledge-Based Image Classification
Knowledge-based image classification is a new approach to determining pixel identity on the basis of a pixel's properties in relation to what is known. Unlike in conventional image classification procedures, in which the classification decision is reached statistically on the basis of pixel values, in knowledge-based image analysis pixel values themselves are no longer involved directly in the decision-making process. Therefore, the statistical relationship among pixels becomes irrelevant. Instead, the relativity of pixel values becomes critical to the outcome of an evaluation. This relativity is usually expressed as a logical conditional statement so that it can be comprehended by the machine. A simple comparison or a series of comparisons forms the decision rules. During classification all knowledge is considered by evaluating the relevant rules.
Knowledge-based image classification consists of a number of steps that may vary with the implementation system. The essential steps generally encompass preparation of the database, organization of the information classes to be mapped, construction of the knowledge base, and classification (Fig. 11.9). Construction of the knowledge base forms the most crucial processing in knowledge-based image classification. Data from which knowledge is to be derived may come from a variety of sensors and platforms. Aerial photographs, satellite images, topographic data, and other GIS layers are commonly integrated into a spatial knowledge database. Of these data sources, satellite images are relatively easy to acquire. Their selection should aim at maximizing discrimination of the land covers to be mapped from them. The important factors to consider are their spatial resolution and seasonality. Non-remote sensing data useful in generating external knowledge are diverse, such as soil maps, vegetation maps, rainfall data, or even recently surveyed GPS data. Which of these data layers should be selected for inclusion in knowledge-based image classification is judged by their effectiveness in discriminating the land cover classes to be mapped. Only the effective ones should be selected in order to minimize the complexity of the knowledge base. Other selection criteria
FIGURE 11.9 Procedure of knowledge-based image classification.
are their currency and scale compatibility with the remote sensing data. Currency is an especially important attribute to consider for geographic phenomena that tend to change frequently. Ideally, the ancillary data should be temporally as close as possible to the remote sensing data to be classified. Integration of remote sensing and non-remote sensing data into the knowledge base may encounter difficulty if they have different accuracy levels or if they are enumerated at varying scales. In general, ground data collected at a local scale are more reliable than remote sensing data, even though the latter cover much larger areas on the ground (Muthu and Petrou, 2007). Such differences have to be resolved through careful selection of data and additional processing. Satellite data are enumerated numerically at a scale of 8 to 11 bits, whereas non-remote sensing data may have a different scale or be enumerated qualitatively. Data rescaling and quantification are vital if they are to be combined with other evidences. For instance, the definition of facts and rules has to be modified, using the spectral features, to a quantitative form so that probabilities can be generated (Kruse et al., 1993). Such preparatory steps form part of preprocessing. The objective of preliminary processing is to ensure that all the data have the same projection and are referenced to the same ground coordinate system. Radiometric rectification, if essential, has to be undertaken to remedy temporal variation among multitemporal remote sensing data. This is especially important if spectral knowledge is to be derived from them directly. Radiometric rectification may take the form of
normalization so that no artificial variation in radiometry exists between multitemporal data. Other preliminary processing includes resampling of the data to the same spatial resolution. Non-remote sensing data may also require further processing, albeit for different reasons. If their format is not compatible with that of the remote sensing data (e.g., if they are in vector format), it has to be unified via a conversion from raster to vector or vice versa. In general, it is better to convert vector layers such as roads and coastlines to a raster format, at a spatial resolution identical to that of the satellite imagery to be classified, than to vectorize satellite images, because no objects exist in them. The converted layers may also have to undergo projection and resampling. This may be followed by recoding of the rasterized data to differentiate the attribute values critical to reaching the right decision (e.g., above the tree line is assigned a code of 0 and below the tree line receives a code of 10). Data recoding is also essential to meet the specific needs of an application, such as conversion from a continuous scale to a categorical scale so that fuzzy logic may be applied. Finally, spatial analyses may have to be carried out to derive environmental variables, such as slope gradient and orientation, from a DEM.
If a priori knowledge about the reliability of the evidences is available, it should be incorporated into the knowledge base by attaching a weight to each of them, with the more reliable ones receiving a larger weight. In order to facilitate the calculation of probability, categorical data such as favoring a landslide event or acting against it have to be quantified by converting the verbal description to a number systematically (Muthu and Petrou, 2007). The categorical attribute values converted into numerical ones may have to be standardized to a sum of one for a given attribute.
Once the data have been properly processed, they are ready for codifying knowledge, usually as rules, the format acceptable to most knowledge-based systems. Included in the rules are targets defining specific land covers or ground objects and their attribute values. Care must be taken in determining these values, usually from training samples. The more specific the values of these attributes, the more accurate the final decision. All available evidences in the database should appear in the rules, and rules should be available for every cover to be mapped. These rules can take the form of a decision tree, for instance, organized in a way similar to those shown in Fig. 11.2. When multiple evidences are involved, the most important evidence should be tested first by arranging it near the top of the hierarchical tree. Decision rules vary from cover to cover, but they should be complete and cover every possible situation. No matter how many rules are involved, they should not overlap with each other.
Once the knowledge base is constructed, it should be tested to examine the effectiveness of the established rules. If the classification results are not satisfactory, then the attribute values may have to be
modified. If a rule is not effective in improving classification accuracy, it should be replaced with new rules involving fresh evidences. This process is repeated until the generated results are deemed accurate and acceptable. Once properly tested, the knowledge base is ready for application to classifying remote sensing data. Dependent upon the nature of the system, it may be possible to output a classification together with a confidence layer, just as in a fuzzy classification.
Knowledge-based image classification is typically implemented in the Expert Classifier of ERDAS (Earth Resources Data Analysis System) Imagine (Fig. 11.10). Core to this classifier is the Knowledge Engineer, a graphical user interface that builds the knowledge base in a tree structure. Once activated, it requires the input of variables that can be raster, vector (e.g., a road layer), or scalar (e.g., land covers to be mapped), hypotheses about them (e.g., slopes = "north"), and conditions. In this environment complex rules may be formulated. A hypothesis defines a class or a land cover type to be mapped. The rule under a hypothesis may be a subset of an intermediate hypothesis, such as "vegetation in residential" and "vegetation in parks," or conditional statements concerning variables. Hypotheses are evaluated to be true or false from the rules based on the input variables. A variable has conditions, a status, or a particular value used in the evaluation. Several attributes may be combined to reach a conclusion. All the rules must be correct
FIGURE 11.10 The ERDAS Knowledge Engineer user interface. The large window is reserved for display of the constructed tree. The smaller window in the lower left displays any files under Hypothesis, Rules, and Variables (attributes). (Copyright: ERDAS.)
to return a true evaluation of the hypothesis. The Expert Classifier offers a dynamic environment in which all criteria, rules, and hypotheses can be interactively modified.
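The hypothesis-rule-variable structure just described can be mimicked in a few lines of Python. The sketch below is not the ERDAS interface; it only conveys the logic that a hypothesis is accepted for a pixel when all conditions of at least one of its rules evaluate true on the input variables (variable names and thresholds are assumptions for illustration):

# A sketch of a hypothesis -> rules -> conditions knowledge base.
# Hypothesis names echo the examples in the text; everything else
# (variables, thresholds) is illustrative.

knowledge_base = {
    "vegetation in parks": [
        [lambda v: v["ndvi"] > 0.4, lambda v: v["land_use"] == "park"],
    ],
    "vegetation in residential": [
        [lambda v: v["ndvi"] > 0.4, lambda v: v["land_use"] == "residential"],
    ],
}

def evaluate(pixel_vars, kb):
    """Return every hypothesis whose rules are satisfied for this pixel."""
    hits = []
    for hypothesis, rules in kb.items():
        for conditions in rules:
            if all(cond(pixel_vars) for cond in conditions):
                hits.append(hypothesis)
                break
    return hits

print(evaluate({"ndvi": 0.6, "land_use": "park"}, knowledge_base))
# -> ['vegetation in parks']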
11.6.2 Postclassification Filtering
As the name suggests, postclassification filtering works on images that have already been classified. There is only one layer of remote sensing data (i.e., no multispectral bands anymore) in the input, and what this layer contains is a limited range of land cover codes. Therefore, it is much easier to derive spectral knowledge from it than from raw images. The knowledge base constructed from the parametrically classified results (knowledge base II in Fig. 11.5) and other data sources deliberately targets those misclassified covers, or a land cover of interest, for fine-tuning. This is because misclassifications rarely take place equally for all covers. By focusing attention only on those covers that are difficult to classify correctly, both classification accuracy and processing efficiency can be improved simultaneously.
During postclassification processing, this knowledge base is applied to spatially filter the land cover categories of interest against a set of rules. In this filtering process, the identity of pixels in the classified image is evaluated against the available evidences using the logical AND to determine whether the rules have been violated. If so, these misclassifications are then corrected in accordance with the new evidences. This is virtually a process of confirming or rejecting the assigned identity of pixels, or of a given segment if the image is classified based on segments, by examining other attributes at the same location. Postclassification filtering either reinforces or suppresses an existing classification.
In contrast to ordinary postclassification filtering, knowledge-based postclassification filtering can be aspatial or spatial. The former implementation does not operate within a neighborhood. Only the value of the single pixel under consideration exerts an impact on its output, while the properties of its neighboring pixels are ignored. Instead, external knowledge from other data sources or attributes at the same location is examined. These external layers, usually from a GIS database, serve as a constraint against which the validity of the classification is evaluated. During evaluation, each pixel is examined in isolation. The identity of the pixel in question is either changed or left intact, depending upon the outcome of evaluating the conditional statements. Spatial knowledge-based postclassification filtering operates within a window: the pixel in question is evaluated against the geometric and spatial properties (e.g., shape and distance) of those pixels in the defined neighborhood. Being much more challenging to implement than aspatial knowledge-based postclassification filtering, it will be covered in depth in Sec. 11.6.4.
An ideal environment for implementing postclassification filtering is Raster Calculator, one of the several Spatial Analyst
extensions of ArcGIS. This spatial cartographic modeling tool is designed for the analysis of raster data at individual cells. It has five categories of functions: local, focal, zonal, global, and application specific. One of the local functions suitable for performing postclassification filtering is called Con, short for "conditional statement." It has the following syntax:

Con (<condition>, <true expression>, {false expression})

where condition refers to a conditional statement that is evaluated at individual cells in the input data layer. If the condition is met, the value associated with the true expression is selected as the output; if the condition is false, the false expression is evaluated. These expressions can be as simple as a single value or a raster layer. They can also be very complex, consisting of multiple nested functions, including Con itself. A number of conditional statements may be logically combined to test the value of the input pixels, with a mandatory true expression that identifies the value applicable to those pixels that test true. If none of the conditional statements is found to be true, the value associated with the optional false expression argument is output. If no value is specified for the false expression argument, then a pixel that does not meet any of the conditions within the expression receives a status of "no-data."
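The same cell-by-cell conditional logic can be sketched outside the Raster Calculator, for example with NumPy, as shown below. The class codes, the no-data value, and the toy rasters are illustrative assumptions, not values from the text:

# A NumPy sketch of Con-style conditional postclassification filtering:
# where the condition holds, the "true" value is written; otherwise the
# "false" value (or a no-data code) is kept.

import numpy as np

NODATA = -9999

def con(condition, true_value, false_value=NODATA):
    """Cell-by-cell equivalent of Con(<condition>, <true>, {false})."""
    return np.where(condition, true_value, false_value)

classified = np.array([[6, 6, 7],
                       [5, 7, 7]])          # parametric class codes (toy data)
coastline  = np.array([[-5000, -5000, 0],
                       [-5000, 0, 0]])      # -5000 = water, 0 = land

# Pixels labelled class 6 are confirmed (recoded to 60) only when they
# fall on the water side of the coastline; all other pixels are left intact.
refined = con((classified == 6) & (coastline == -5000), 60, classified)
print(refined)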
11.6.3 A Case Study
Which method of implementing knowledge in image classification is better, knowledge-based classification or knowledge-based postclassification filtering? In order to answer this question, both methods were tried for mapping mangroves from SPOT satellite imagery in a case study. Mangroves are salt-tolerant vegetation confined to shallow coastal waters and estuaries. Their unique habitat within the intertidal zone means that they are spatially homogeneous. Such a predictable geographic location combined with a pure composition creates an excellent opportunity to test the effectiveness of knowledge-based image analysis in improving their mapping accuracy.
This case study was carried out for the western portion of the Waitemata Harbour in Auckland, New Zealand, which is made up of several steep-sided creeks (Gao et al., 2004). Each creek contains a prominent tidal channel even at low tides, when mudflats along the channel become extensive. Distributed over these shallow intertidal mudflats are mangroves of limited height because of the stressful environment (e.g., low mean annual temperature). The mapping of such mangroves is complicated by the fact that they are not spatially continuous, especially at high tides. Moreover, they also abut coastal cliffs over which dense coastal forests and residential areas are
situated extensively. The latter are often confused with mangroves in spectral classification of satellite images, even after the mangrove-containing coastal zone is demarcated and analyzed (Gao, 1998). Therefore, external knowledge is indispensable for their accurate mapping.
The external knowledge used in the knowledge-based mapping includes the coastline position and the values of mangrove pixels in the SPOT spectral bands. The former was acquired by tracing the coastline directly on the satellite imagery via on-screen digitization. The generated vector coverage was subsequently rasterized at the same spatial resolution as the multispectral SPOT data (20 m), and recoded as a binary image of land (0) and water (−5000). Although the focus of knowledge-based image classification is mangroves, it is still necessary to establish a separate set of decision rules for residential because it is very heavily mixed with mangroves, as found in a preclassification trial. Their mixture takes place in two directions: residential incorrectly labeled as mangroves, and mangroves misclassified as residential. In conjunction with the two classes of pure mangroves and residential, four covers were mapped in the knowledge-based image classification.
The knowledge of the values of mangrove pixels was obtained from training samples selected from the satellite image. The spectral range of mangrove pixels was ascertained by examining their values in the original multispectral bands. Decision rules defining this range for mangroves and residential areas were then established and presented in a tree-like structure using the ERDAS Imagine Expert Classifier (Fig. 11.11). Spectral knowledge was represented as a constraint of the form DN_L ≤ DN ≤ DN_U (L, U: lower and upper limits). Such a constraint was established from every spectral band used. Thus, the total number of constraints equaled the number of spectral bands of the satellite data.
Mangroves were mapped at a total area of 904 ha, of which 469 ha actually represented residential in water and should be excluded. Thus, only 435 ha were genuine, at a mapping accuracy of 100 percent. In spite of such high producer's accuracy, the user's accuracy was rather low at only 43.3 percent, owing mainly to misclassifications between stunted mangroves and residential. Dense mangroves were mapped rather accurately, though. Hence, not all types of mangroves can be accurately mapped despite the application of the spatial knowledge. The low user's accuracy suggests that the spectral thresholds for mangroves in the decision rules were too conservatively defined to encompass all mangrove pixels. By comparison, the two classes of residential were highly accurate (98.3 percent for pure residential, and 81.7 percent for residential reclassified from mangroves), in drastic contrast to the low user's accuracy of mangroves reclassified from residential (Fig. 11.12). These high accuracies demonstrate that the decision rules for residential were not adequately relaxed to encompass the whole range of values of residential pixels.
Rule I through Rule IV: combinations of DN ranges in SPOT bands 1 to 3 (e.g., 17 ≤ Band 1 ≤ 29, 10 ≤ Band 2 ≤ 15, 21 ≤ Band 3 ≤ 27) with the coastline code (= −5000 for water, ≠ −5000 for land), assigning pixels to genuine mangroves, mangroves in residential, residential in mangroves, and genuine residential.
FIGURE 11.11 The knowledge base for mangroves and residential constructed from three SPOT multispectral bands and a coastline coverage. (Source: Gao et al., 2004.)
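A Python sketch of how decision rules of this kind (DN_L ≤ DN ≤ DN_U constraints on each band, combined with the rasterized coastline) might be applied to an image is shown below. The band arrays are toy data, and the thresholds used are the relaxed residential ranges quoted in the following paragraph; both are used purely for illustration:

# A sketch of applying band-range decision rules plus a coastline mask.
# Toy DN arrays; thresholds follow the relaxed residential ranges cited
# in the text (band 1: 18-25, band 2: 11-19, band 3: 23-36).

import numpy as np

band1 = np.array([[20, 28], [19, 30]])
band2 = np.array([[15, 12], [18, 40]])
band3 = np.array([[30, 25], [24, 50]])
coast = np.array([[0, -5000], [0, -5000]])   # 0 = land, -5000 = water

def in_range(band, lower, upper):
    """Spectral constraint of the form DN_L <= DN <= DN_U."""
    return (band >= lower) & (band <= upper)

residential_rule = (in_range(band1, 18, 25) &
                    in_range(band2, 11, 19) &
                    in_range(band3, 23, 36) &
                    (coast != -5000))        # residential must lie on land

print(residential_rule)   # True where all constraints are satisfied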
In knowledge-based image analysis, the threshold values of pixels in the decision rules govern the class to which they are assigned through simple comparisons. The degree of classification success is subject directly to the specification of these threshold values; a slight variation in these values exerts a profound impact on the classification results. Judging from the above accuracies, the spectral knowledge about the threshold ranges of mangrove and residential pixels was too narrowly defined using the raw spectral bands. In order to ameliorate the confusion between stunted mangroves and residential, the spectral knowledge was fine-tuned by relaxing the spectral range of residential pixels to 18 to 25 in band 1, 11 to 19 in band 2, and 23 to 36 in band 3 in the decision rules (Fig. 11.11). Application of the updated rules caused the disappearance of most misclassified residential pixels from muddy water. The heavy mixture of sparse
FIGURE 11.12 Results (mangroves and residential areas) mapped using the knowledge-based approach in ERDAS Imagine. (Source: Gao et al., 2004.) See also color insert.
mangroves and residential was lessened, too. The accuracy of mangroves inside residential, however, increased merely to 55 percent, with many evaluation pixels situated close to the coastline. Therefore, the confusion cannot be resolved by simply relaxing the spectral range of residential pixels in the decision rules. If the decision rules for mangroves had been relaxed excessively, some residential pixels would have been misclassified as mangroves because residential had a broad spectral range in the input image, leading to misclassifications in the opposite direction. There is only limited room for maneuvering spectral knowledge. This demonstrates that a land cover cannot be accurately mapped if its spectral variability in the input image is excessively broad and overlaps with other covers. Knowledge-based image classification does not necessarily result in an acceptable level of classification accuracy unless the knowledge is reasonably precise. A high accuracy is achievable only when the threshold values in the decision rules have been appropriately set for every class.
In postclassification filtering, the same satellite data were classified into lush mangroves, stunted mangroves, and seven other categories using the maximum likelihood classifier. As with the
knowledge-based image classification, these covers have to be mapped separately even if they are not the objects of study; otherwise, their omission from the parametric classification would degrade the accuracy of mangroves. After the parametric results classified in ERDAS Imagine were spatially filtered within a window of 3 × 3 pixels using clumping and sieving, decision rules for mangroves and residential were established from them (Fig. 11.13).
Each rule pairs a parametric class code (Class == 1 through 9) with the coastline layer (== −5000 for water, != −5000 for land) to confirm or relabel covers: lush mangroves, stunted mangroves, residential, industrial, muddy water, clear water, forest, pasture, and river water.
FIGURE 11.13 The knowledge base constructed from the parametrically classified results and the coastline layer. (Source: Gao et al., 2004.)
These rules are easier to define because every pixel in the classified image has one of only nine possible values, obtainable by examining the attribute table of the classified results. These decision rules are much more comprehensive than those in Fig. 11.11, as they must be available for every cover in the classification scheme, even if no external knowledge exists for it. Refined with knowledge-based filtering using the knowledge base shown in Fig. 11.13, the parametrically classified results contain two classes of mangroves at a combined area of 798 ha, after nearly 50 percent (778 ha) of the mangroves classified in the parametric results (1576 ha) were excluded. Use of knowledge about mangrove spatial distribution enabled a portion of misclassified pixels in the parametric results to regain their genuine identity (Fig. 11.14). The accuracies of mangroves were improved from 26.7 percent (stunted mangroves) and 78.3 percent (lush mangroves) in the parametric classification to 81.7 and 98.3 percent, respectively. The overall accuracy rose to 88.3 percent, with a Kappa value of 0.87. Knowledge-based postclassification filtering can therefore considerably improve the accuracy of mangroves that have been mapped with a parametric classification. It is better than knowledge-based image classification because of its ability to produce more accurate results for more covers. In spite of the much improved accuracy, however, the use of the spatial knowledge cannot achieve a perfect accuracy because it is
FIGURE 11.14 Land covers mapped using postclassification filtering. The input image was produced with a parametric classifier. (Source: Gao et al., 2004.) See also color insert.
assumed that mangroves are strictly limited to water areas, and that the residential and industrial classes are confined to land areas in their geographic distribution. In reality, some evaluation pixels for stunted mangroves were still misclassified as residential. Such misclassifications took place in the vicinity of the land-water interface. They are attributed to positional discrepancies between the spatial knowledge (i.e., the coastline positions in the knowledge base) and the satellite image.
11.6.4 Postclassification Spatial Reasoning
Varying connotations are associated with spatial reasoning in diverse disciplines. Some regard it as the ability to visualize objects and space, and to transform qualitative information into spatial representations. In information technology, spatial reasoning refers to a deductive process of restoring incomplete or missing information from what is remaining, on the basis of the spatial relationship between absent and present features. In geoinformatics spatial reasoning can be broadly considered as a process of forming ideas through spatial relationships between geographic entities (Crawford, 1992–1993). This formation may involve manipulation of maps in a GIS using spatial statistics, spatial models, and spatial analysis functions. Here knowledge-based spatial reasoning is defined as the inference of the genuine identity of pixels from those in their immediate vicinity or from external knowledge at the same or similar locations. Spatial reasoning aims at restoring and recognizing mapped targets from an incomplete set of evidences with the assistance of spatial knowledge, such as the spatial interconnectivity of all pixels in a neighborhood. Similar to postclassification filtering, knowledge-based spatial reasoning works on classified images by examining the spatial relationships among pixels or their contextual information. Unlike knowledge-based postclassification filtering, which makes use of spectral and external knowledge, spatial reasoning relies exclusively on spatial knowledge or spatial properties of pixels. Spatial reasoning also differs from standard postclassification filtering based on clumping and sieving in that no knowledge is involved in the latter, even though the processing takes place in the spatial domain. Spatial knowledge or contextual information may include, but is not limited to, proximity to other classes or features, shape, compactness, and spatial relationships of pixels. Such knowledge is helpful in generalizing classification results in a hierarchy of land covers. For instance, a pixel labeled as tree may be further aggregated into a general land cover by examining its neighboring pixels. It can be generalized as forest if surrounded predominantly by tree pixels, or parks if surrounded by urban pixels, or orchard if the surrounding trees all have a regular pattern. Similar to postclassification filtering, postclassification spatial reasoning is a follow-up step to further refine parametric classification
results and to improve the accuracy of the mapped target or its completeness. The final identity of a pixel is checked against geometric criteria derived from pixels within a certain neighborhood. This decision is reached from assessment of additional spatial knowledge, such as the distance to existing blocks of pixels, and whether the pixel fits nicely into a spatially meaningful pattern. For instance, a road detected from per-pixel classification may not be spatially continuous or may not have a uniform width, owing to the coarse spatial resolution of the image used or to limitations of the classifier. Such results can be made more complete or reasonable via spatial reasoning based on the spatial relationship among pixels or the geometric properties of roads.
It is difficult to implement knowledge-based postclassification spatial reasoning at present because no commercial software has been released for performing this task, even though research has been done on possible search directions and algorithms. This window-based operation is highly challenging in that sophisticated rules and algorithms must be constructed to take every possible case into consideration when searching for the neighboring pixels. So far, limited success has been achieved in perfecting a street map produced from unsupervised clustering analysis using knowledge-based spatial reasoning (Gao and Wu, 2004). This initial map was created at 20 clusters from 4-m multispectral IKONOS data for a part of central Auckland, New Zealand. Because of the limitations of the algorithm and the presence of spectrally similar features (e.g., vehicles parked along streets), the results are rather noisy and the continuity of mapped streets is disrupted by the presence of trees lining the streets (Fig. 11.15a). These imperfections may be reduced through spatial reasoning based on the external knowledge that streets have a uniform width and are spatially continuous. After the initial classification results were filtered to remove noise, they were further refined using spatial reasoning based on the geometric properties of streets: they should not be dangling but connected to other roads, usually perpendicularly. The output from spatial reasoning (Fig. 11.15b) is not yet perfect in that many small dangling street segments are still present. Some of them are genuine roads that should be connected with other longer ones in the vicinity, and the other, noisy ones should have been removed. These imperfections were not eradicated during spatial reasoning because of the use of a conservative search neighborhood (i.e., only 8 pixels). A much better outcome could have been expected if the search distance had been relaxed to examine whether there was another segment in a wider spatial context. The refined streets could be further checked for continuity in multiple iterations. More research is needed to perfect the mapping results.
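As a small illustration of window-based spatial reasoning of the kind described earlier in this subsection, the Python sketch below relabels pixels classified as "tree" according to the dominant cover among their eight neighbors; the class codes and the majority threshold are illustrative assumptions:

# A sketch of neighbourhood-based spatial reasoning: a "tree" pixel is
# generalized to forest when surrounded mostly by trees, or to park
# when surrounded mostly by urban pixels. Codes and thresholds are
# illustrative only.

import numpy as np

TREE, URBAN, FOREST, PARK = 1, 2, 3, 4

def generalize_trees(labels):
    out = labels.copy()
    rows, cols = labels.shape
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            if labels[r, c] != TREE:
                continue
            window = labels[r - 1:r + 2, c - 1:c + 2].ravel()
            neighbours = np.delete(window, 4)              # drop the centre pixel
            if np.count_nonzero(neighbours == TREE) >= 5:  # mostly trees -> forest
                out[r, c] = FOREST
            elif np.count_nonzero(neighbours == URBAN) >= 5:
                out[r, c] = PARK                           # mostly urban -> park
    return out

labels = np.array([[1, 1, 1, 2],
                   [1, 1, 1, 2],
                   [1, 1, 2, 2],
                   [2, 2, 2, 2]])
print(generalize_trees(labels))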
FIGURE 11.15 Mapping of a road network using knowledge-based postclassification spatial reasoning. (a) A binary image produced from unsupervised clustering analysis. (b) A refined road map using knowledge of road configuration. The result could be further refined by taking road lengths into consideration. (Source: Gao and Wu, 2004.)
11.7 Critical Evaluation
The performance of knowledge-based image classification is usually evaluated against a number of criteria, such as accuracy, speed, and ease of use. In addition, its advantages and disadvantages relative to parametric classifiers should be considered as well. These issues are comprehensively examined in this section.
11.7.1 Relative Performance
A wide discrepancy exists in the reported performance of knowledge-based image classification relative to that of parametric classifiers. In general, most studies have demonstrated that knowledge-based image analysis is more accurate than conventional per-pixel classification methods. The knowledge-based image analysis approach outperformed its maximum likelihood counterpart in deriving land covers at level II from multitemporal TM (Thematic Mapper) data (Civco, 1989). The lowest classification accuracy of 86.1 percent achieved by knowledge-based image analysis was further improved to 100 percent after certain rules in the knowledge base were modified. By comparison, the maximum likelihood method achieved an overall accuracy of only 33.2 percent at the same level, even though this accuracy rose to 56.6 percent at level I of six covers. This superiority was later confirmed in mapping land covers from multisource data (Srinivasan and Richards, 1990). The knowledge-based method yielded a classification accuracy of 81.35 percent, against 69.0 percent using the maximum likelihood classifier. Additionally, the knowledge-based approach also produced vastly superior results in terms of visual comprehensibility.
In mapping vegetation in a forested environment, the knowledge-based method achieved an accuracy of 76.2 percent, much superior to the accuracies of 50.4 percent (maximum likelihood) and 66.7 percent (supervised nonparametric) associated with the per-pixel image classifiers (Skidmore, 1989). Also, in mapping vegetation in a national park, the knowledge-based method attained an overall accuracy of 72 percent, much higher than the 42 percent achieved by supervised classification of the same data, thanks to the use of spatial knowledge (Plumb, 1993). The accuracy lower than that reported by others is due to the fact that vegetation was classified into 20 categories from TM imagery, some of which could not be reliably identified solely from their spectral properties, even on large-scale color aerial photographs.
The superior performance of the knowledge-based approach, however, is complicated by the remote sensing data used. Their temporal and spatial resolutions all have an impact on its accuracy. Knowledge-based image classification improved the classification accuracy of the standard supervised relaxation method by 13.2 percent from a single-date satellite image (Kontoes and Rokos, 1996). However, the margin of improvement dropped to only 1.3 percent for two-date images. The superiority also decreases for data of a finer spatial resolution, such as very high resolution IKONOS data. In mapping irrigated vegetation in urban areas, the "optimal" expert classifier yielded a root-mean-square (RMS) error of about 8 percent from reference data (Stow et al., 2003), approximately 2 percent higher than the results classified using the standard unsupervised method. This intelligent classifier makes little further improvement to the results
from unsupervised classification, which was used as one of the inputs in the knowledge-based classification.

Contrary to the above comparable performance, it is possible for the knowledge-based method to be less effective than the per-pixel classification method. For instance, the knowledge-based method was the least accurate at 59 percent, lower than the 62 percent achieved by the maximum likelihood classifier and the 74 percent by artificial neural networks (ANN), in mapping land covers into 11 categories (Liu et al., 2002). No explanation was offered as to why the knowledge-based method failed to achieve more accurate results in this case. It is conjectured that such external knowledge as GIS data layers (terrain aspect, elevation, slope, and soil) was ineffective in providing more information about the land covers to be produced, or that the rules derived from the knowledge were ill formulated. As a matter of fact, the knowledge-based approach can be intrinsically limited if rules describing the relationships between terrain variables and species associations are too broadly defined (Carpenter et al., 1997). Such vagueness prevents the identification of certain species in vegetation mapping.

The mixed performance of knowledge-based image classification cited above is attributable to the methods of analysis and to the special knowledge used. If spatial knowledge is used simultaneously with spectral knowledge, the results will differ from those obtained when the same knowledge is applied in a postclassification session, as demonstrated in the case study in Sec. 11.6.3. The performance of the knowledge-based approach may be improved by making the knowledge more accurate through modification of decision rules in the knowledge base or through the use of additional spatial knowledge. For instance, the most accurate (80 percent) output was yielded after additional rules were added during postprocessing (Liu et al., 2002). The success of knowledge-based resources mapping relies heavily on the availability of the special knowledge concerned. The utilization of more external knowledge is conducive to better performance. Spatial contextual information derived from satellite imagery, GIS data layers (e.g., soil and road buffer maps), and even maximum likelihood probability layers can be incorporated into knowledge-based image classification in a postclassification session (Kontoes and Rokos, 1996). Such a knowledge-based approach is able to refine the results obtained from standard image classifiers.
11.7.2 Effectiveness of Spatial Knowledge
Apart from the manner in which spatial knowledge is utilized, the reported discrepancy in the performance of knowledge-based image classification is also attributable to the utility of the knowledge itself. The effectiveness of a given kind of knowledge in image classification can be determined easily by leaving it out of the classification, for instance, by turning the corresponding branch of the rule tree off. A comparison of the achieved accuracy with that obtained
from the use of all available knowledge is able to shed light on its effectiveness. To a large degree, the effectiveness of knowledge depends on its capability to resolve spectral overlap among the land covers to be mapped, because it is such spectral confusion that degrades the accuracy of per-pixel parametric classifiers.

The effectiveness of knowledge is affected by many factors, such as the type of knowledge, its accuracy and currency, and the land covers to be mapped. For mangroves, the most effective and useful knowledge is their spatial situation. Application of spatial location in a knowledge-based image classification considerably improved the user's accuracy of stunted mangroves and lush mangroves over a parametric classification, and the spatial distribution of the mapped mangrove forests is much more reasonable than that in the parametric classification (Gao et al., 2004). For other covers such knowledge may not exist or may be too complicated to represent (e.g., a large building next to a large parking lot is most likely to be a shopping mall, but could also be a hospital). In detecting transformed land use and land cover in urban areas, GIS data in the knowledge base (e.g., schools, roads, rivers, golf courses, and parks) played a critical role in the achievement of very high mapping accuracy in knowledge-based mapping (Chou et al., 2005). The improvement in accuracy depends on the accuracy of the contextual rules, which, in turn, depends on the type and reliability of the attributes used in the knowledge base (Wilkinson and Megier, 1990). In mapping coastal vegetation, the most effective external knowledge is elevation rather than texture and the normalized difference vegetation index (NDVI), as had been anticipated (Schmidt et al., 2004). It improved the mapping accuracy from 40 to 58 percent. By comparison, slope aspect and position on the slope are much less effective. The accuracy achieved with elevation alone is not significantly different from the highest accuracy obtained with all the external knowledge combined. This suggests that the other external knowledge is not so critical to the achievement of accurate vegetation mapping in this special environment.
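The leave-one-out test described above is straightforward to automate. The sketch below is a minimal illustration, not taken from any of the cited studies: a toy rule-based classifier carries a switch for each kind of knowledge, so accuracy can be compared with and without the elevation rule. All attribute names, thresholds, and sample values are hypothetical.

    # Minimal sketch of testing the effectiveness of one kind of knowledge
    # by leaving it out of the classification (all names are hypothetical).
    import numpy as np

    def classify(pixels, use_elevation=True, use_ndvi=True):
        """Toy rule-based classifier: label each pixel from its attributes."""
        labels = []
        for band3, ndvi, elev in pixels:
            if use_elevation and elev > 300:
                labels.append("upland_forest")
            elif use_ndvi and ndvi > 0.5:
                labels.append("lowland_vegetation")
            elif band3 < 40:
                labels.append("water")
            else:
                labels.append("bare_ground")
        return np.array(labels)

    def accuracy(predicted, reference):
        return float(np.mean(predicted == reference))

    # pixels = [(band3 DN, NDVI, elevation in m), ...]; reference = true labels
    pixels = [(35, 0.1, 10), (80, 0.7, 50), (90, 0.6, 450), (120, 0.2, 20)]
    reference = np.array(["water", "lowland_vegetation", "upland_forest", "bare_ground"])

    full = accuracy(classify(pixels), reference)
    no_elev = accuracy(classify(pixels, use_elevation=False), reference)
    print(f"all knowledge: {full:.2f}; elevation rule switched off: {no_elev:.2f}")

The drop in accuracy when a rule is switched off indicates how much that piece of knowledge contributes to resolving spectral confusion.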
11.7.3 Strengths
Knowledge-based image classification has four distinct advantages over parametric classifiers. Firstly, the knowledge-based approach produces objective and replicable results. Core to the success of knowledge-based classification is the knowledge base. Experts are needed only to construct the knowledge base and to put it into a workable format. Once the knowledge base is completed, the end user does not have to be an expert in using the system or in the subject area. This widens the accessibility of the method and increases its popularity. Once the knowledge base is constructed, it can be applied wholly to other studies of the same geographic area, or to other geographic areas with some modifications. In the former case, the results of image classification are repeatable and efficiency is improved because there is no need to construct a new knowledge base, the biggest hurdle in knowledge-based image classification.
Secondly, knowledge-based image analysis is versatile and capable of incorporating external, diverse non-remotely sensed data in the decision-making process. All kinds of geographic data, a useful source of expert knowledge, can be taken advantage of in the classification so long as they are compatible with the satellite data in accuracy and currency. In fact, knowledge-based image classification is a useful means of incorporating ancillary data (Lawrence and Wright, 2001) in an effort to improve the accuracy and reliability of classified results. Incorporation of diverse ancillary data makes use of knowledge of the environment in the decision-making process, which is conducive to the achievement of higher classification accuracy.

Thirdly, knowledge-based image classification has a transparent decision-making process in which all decisions are made independently of each other. Every decision is justifiable. Such transparency offers considerable flexibility in testing the effectiveness of hypotheses, variables, rules, and conditions in the classification, and in improving classification accuracy. After misclassifications have been diagnosed in a test, attention can then be directed at difficult cases by modifying the rules or attribute values concerning them accordingly. There is no need to interfere with or abandon other rules or variables that have proved satisfactory.

Finally, knowledge-based image classification is considerably expedited if a constructed knowledge base can find repeated applications. This is extremely beneficial in longitudinal studies, such as change detection of a particular ground feature. Images are classified very quickly if a knowledge base for the same geographic area exists (Schmidt et al., 2004), as is the case with ongoing research. Spectral knowledge may have to be modified in light of new satellite data, though. Use of the same or a similar knowledge base also makes the results obtained by different researchers directly comparable.
11.7.4 Limitations
The limitations of knowledge-based image classification can be summarized as follows:
Constrained Articulation of Knowledge
In addition to knowledge, the human expert also uses experience and intuition in reaching a decision. The human expert sifts through the available facts, processes and analyzes geographic data, and compares and combines them to infer the identity of pixels from their spatial cues and other geographic settings. The number of cues used and their significance vary, and their role in the decision making is not precisely understood. The domain expert is often not conscious of how an interpretation decision is derived or of what kind of knowledge it is based on. It may be difficult for the expert to articulate the logic and reasoning clearly under all circumstances. This creates difficulties in deciding what kind of knowledge should be included in the knowledge base
and what significance should be attached to each piece of evidence included in the knowledge base.
Limited Knowledge Expression
The most widely accepted way of representing knowledge is as rules in the form of IF…THEN conditional statements. This rule format is excellent at expressing factual relationships (e.g., roads are linear), but utterly inadequate at representing geometric relationships (e.g., a linear feature is a road), and belong-to and functional relationships (e.g., a subset of). On the other hand, ground covers may be made up of a few identifiable components. For instance, urban residential comprises a paved driveway, a concrete roof, and a lawn yard or a paved courtyard. Individually, each component cover can form a hypothesis. Identification of urban residential cannot rely on simple IF-THEN rules because of the difficulty in formulating unambiguous knowledge about its component covers. The inability to represent such knowledge in rule form reduces the potential of knowledge-based classification. Even if knowledge can be expressed as rules, they may be too complex. For instance, whether a pixel belongs to a street depends on its width, sinuosity, and orientation, not just its distance to other neighboring pixels. An exhaustive list of rules is needed for every potential situation, which makes the final tree of rules highly complex and difficult to trace and understand.
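As a concrete illustration of the rule format discussed above, the sketch below encodes a single IF…THEN rule in code. The attribute names and thresholds are hypothetical; the point is that a simple, factual rule such as this is easy to express, whereas a composite cover like urban residential, whose identity rests on evidence about several component covers, is not.

    # A single IF...THEN rule expressed as a predicate (hypothetical
    # attribute names and thresholds).
    def is_street(segment):
        # IF the segment is elongated, narrow, and not too sinuous THEN street
        return (segment["elongation"] > 5.0 and
                segment["width_m"] < 30 and
                segment["sinuosity"] < 1.5)

    candidate = {"elongation": 8.2, "width_m": 12, "sinuosity": 1.1}
    print("street" if is_street(candidate) else "not a street")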
Knowledge Imprecision
Knowledge represented in rule form may not be precise. There are three reasons for this imprecision. First, experts may disagree with each other as to what information class an observation should be assigned to when its precise definition is impossible to know (e.g., what does having a high cultural heritage value mean exactly?). Second, knowledge may be imprecise if it is derived from unreliable sources or applied to external data that may have a geometric accuracy incompatible with the source data. In the latter case, the position of a land cover in one data layer does not match its position in another layer owing to a shifted boundary caused by a large residual in its georeferencing, even though all rules regarding their identity are still correct. Finally, the knowledge derived from training samples may not be representative. In particular, the lower and upper limits of a land cover in a spectral band have a degree of randomness. Such extremes should be replaced with statistically determined values (e.g., the mean plus or minus a number of standard deviations) rather than the observed ones, as imprecise knowledge reduces the accuracy of knowledge-based classification.
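The remedy suggested above for the randomness of observed spectral limits can be sketched as follows, with hypothetical training values: the rule threshold is taken as the sample mean plus or minus two standard deviations rather than the observed minimum and maximum.

    # Deriving spectral limits for a rule statistically rather than from the
    # observed extremes (band values are hypothetical forest training DNs).
    import numpy as np

    training_band4 = np.array([62, 65, 58, 70, 61, 66, 64, 59, 63, 67])
    mean, std = training_band4.mean(), training_band4.std(ddof=1)
    lower, upper = mean - 2 * std, mean + 2 * std
    print(f"rule threshold for forest in band 4: {lower:.1f} <= DN <= {upper:.1f}")
    print(f"observed extremes (less reliable): {training_band4.min()} to {training_band4.max()}")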
Initial Slow Construction of the Knowledge Base
Critical to the success of knowledge-based image classification is a comprehensive knowledge base. The lengthy process of selecting and fine-tuning training samples in parametric classification is replaced
by the construction of a knowledge base that requires a huge amount of time and effort. This process is prolonged if spectral knowledge is acquired from raw multispectral bands. A minor shift in the cursor's position over an image could lead to a wide variation in pixel values during acquisition of spectral knowledge from raw imagery. A quality knowledge base requires repeated refinement. The constructed knowledge base has to be tested for its reliability and effectiveness. If it proves unsatisfactory, it may have to be modified by altering the thresholds in existing rules or by adding new rules. Determining the appropriate thresholds in a rule and searching for the most effective rules can be a very lengthy process.

The above limitations can be overcome in three ways. The first is to closely integrate the knowledge-based approach with other classifiers. For instance, knowledge-based postclassification processing may be combined with parametric image classification to take advantage of the strengths of both methods. Precious time can be saved in constructing the knowledge base. Another method is to use the output of the knowledge-based classifier as an additional input layer in a neural network classification (Liu et al., 2002). During postprocessing some additional expert rules are used to improve the output of the integrated classifier. The last method is to make use of more complete and relevant external knowledge obtainable from a GIS database if it is closely integrated with image analysis. Integrated image analysis will be covered in Chap. 14.
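The first of these remedies, combining a parametric classification with knowledge-based postclassification processing, can be sketched as follows. The class codes, array contents, and the road-buffer rule are hypothetical and merely stand in for the kind of GIS-derived knowledge discussed above.

    # Minimal sketch of knowledge-based postclassification refinement: pixels
    # labelled "urban" by a per-pixel classifier are relabelled "bare soil" if
    # they fall outside a road-buffer layer from a GIS (all values hypothetical).
    import numpy as np

    URBAN, BARE = 3, 4
    mlc_labels = np.array([[3, 3, 1],
                           [2, 3, 3],
                           [1, 2, 3]])          # output of a maximum likelihood run
    road_buffer = np.array([[1, 1, 0],
                            [0, 0, 1],
                            [0, 0, 0]], bool)   # True within a set distance of a road

    refined = mlc_labels.copy()
    refined[(mlc_labels == URBAN) & ~road_buffer] = BARE
    print(refined)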
References
Argialas, D. P. 1989. "A frame-based approach to modeling terrain analysis knowledge." In Technical Papers of ASPRS/ACSM Annual Convention, April 2–7, 1989, Baltimore, 3:311–319, Bethesda, MD: ASPRS (American Society for Photogrammetry and Remote Sensing).
Argialas, D. P., and C. A. Harlow. 1990. "Computational image interpretation models: An overview and a perspective." Photogrammetric Engineering and Remote Sensing. 56(6):871–886.
Baltsavias, E. P. 2004. "Object extraction and revision by image analysis using existing geodata and knowledge: Current status and steps towards operational systems." ISPRS Journal of Photogrammetry & Remote Sensing. 58(3–4):129–151.
Buchanan, B. G., D. Barstow, R. Bechtel, J. Bennett, W. Clancey, C. Kulikowski, T. Mitchell, et al. 1983. "Constructing an expert system." In Building Expert Systems, ed. F. Hayes-Roth, D. A. Waterman, and D. B. Lenat, 127–168, Reading, MA: Addison-Wesley.
Carpenter, G. A., M. N. Gjaja, S. Gopal, and C. E. Woodcock. 1997. "ART neural networks for remote sensing: Vegetation classification from Landsat TM and terrain data." IEEE Transactions on Geoscience and Remote Sensing. 35(2):308–325.
Chou, T. Y., T. C. Lei, S. Wan, and L. S. Yang. 2005. "Spatial knowledge databases as applied to the detection of changes in urban land use." International Journal of Remote Sensing. 26(14):3047–3068.
Civco, D. L. 1989. "Knowledge-based land use and land cover mapping." In Technical Papers of ASPRS/ACSM Annual Convention, April 2–7, 1989, Baltimore, 3:276–291, Bethesda, MD: ASPRS (American Society for Photogrammetry and Remote Sensing).
Cohen, Y., and M. Shoshany. 2005. "Analysis of convergent evidence in an evidential reasoning knowledge-based classification." Remote Sensing of Environment. 96(3–4):518–528.
Crawford, C. 1992–1993. "Spatial versus verbal reasoning." The Journal of Computer Game Design. 6, http://www.erasmatazz.com/library/JCGD_Volume_6/Spatial_Vs_Verbal.html, last accessed June 24, 2008.
Desachy, J., L. Roux, and E. H. Zahzah. 1996. "Numeric and symbolic data fusion: A soft computing approach to remote sensing images analysis." Pattern Recognition Letters. 17(13):1361–1378.
Franklin, S. E., D. R. Peddle, J. A. Dechka, and G. B. Stenhouse. 2002. "Evidential reasoning with Landsat TM, DEM and GIS data for land cover classification in support of grizzly bear habitat mapping." International Journal of Remote Sensing. 23(21):4633–4652.
Gao, J. 1998. "A hybrid method toward accurate mapping of mangroves in a marginal habitat from SPOT multispectral data." International Journal of Remote Sensing. 19(10):1887–1899.
Gao, J., H. Chen, Y. Zhang, and Y. Zha. 2004. "Knowledge-based approaches to accurate mapping of mangroves from satellite data." Photogrammetric Engineering and Remote Sensing. 70(11):1241–1248.
Gao, J., and L. Wu. 2004. "Automatic extraction of road networks in urban areas from Ikonos imagery based on spatial reasoning." Proceedings of the XXth Congress of ISPRS, July 12–24, 2004, Istanbul, Turkey, CD-ROM.
Goldberg, M., G. Karam, and M. Alvo. 1983. "A production rule-based expert system for interpreting multitemporal Landsat imagery." In Proceedings, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 77–82, Silver Spring, MD.
Goldberg, M., D. G. Goodenough, M. Alvo, and G. M. Karam. 1985. "A hierarchical expert system for updating forestry maps with Landsat data." Proceedings of the IEEE: A Special Issue on Perceiving Earth's Resources from Space. 73(6):1054–1063.
Gong, P. 1996. "Integrated analysis of spatial data from multiple sources: Using evidential reasoning and artificial neural network techniques for geological mapping." Photogrammetric Engineering and Remote Sensing. 62(5):513–523.
Goodenough, D. G., M. Goldberg, G. Plunkett, and J. Zelek. 1987. "An expert system for remote sensing." IEEE Transactions on Geoscience and Remote Sensing. GE-25(3):349–359.
Huang, X., and J. R. Jensen. 1997. "A machine-learning approach to automated knowledge-base building for remote sensing image analysis with GIS data." Photogrammetric Engineering and Remote Sensing. 63(10):1185–1194.
Jackson, P. 1990. Introduction to Expert Systems, 2nd ed. Wokingham, England: Addison-Wesley.
Kartikeyan, B., K. L. Majumder, and A. R. Dasgupta. 1995. "An expert system for land cover classification." IEEE Transactions on Geoscience and Remote Sensing. 33(1):58–66.
Kontoes, C. C., and D. Rokos. 1996. "The integration of spatial context information in an experimental knowledge-based system and the supervised relaxation algorithm—Two successful approaches to improving SPOT XS classification." International Journal of Remote Sensing. 17(16):3093–3106.
Kruse, F. A., A. B. Lefkoff, and J. B. Dietz. 1993. "Expert system-based mineral mapping in northern Death Valley, California/Nevada, using the airborne visible/infrared imaging spectrometer (AVIRIS)." Remote Sensing of Environment. 44(2–3):309–336.
Lathrop, R. G., P. Montesano, and S. Haag. 2006. "A multi-scale segmentation approach to mapping sea grass habitats using airborne digital camera imagery." Photogrammetric Engineering and Remote Sensing. 72(6):665–675.
Lawrence, R. L., and A. Wright. 2001. "Rule-based classification systems using classification and regression tree (CART) analysis." Photogrammetric Engineering and Remote Sensing. 67(10):1137–1142.
Lein, J. K. 2003. "Applying evidential reasoning methods to agricultural land cover classification." International Journal of Remote Sensing. 24(21):4161–4180.
Liu, X. H., A. K. Skidmore, and H. van Oosten. 2002. "Integration of classification methods for improvement of land-cover map accuracy." ISPRS Journal of Photogrammetry and Remote Sensing. 56(4):257–268.
Lu, Y. H., J. C. Trinder, and K. Kubik. 2006. "Automatic building detection using the Dempster-Shafer algorithm." Photogrammetric Engineering and Remote Sensing. 72(4):395–403.
Mason, D. C., D. G. Corr, A. Cross, D. C. Hogg, D. H. Lawrence, M. Petrou, and A. M. Tailor. 1988. "The use of digital map data in the segmentation and classification of remotely-sensed images." International Journal of Geographical Information Systems. 2(3):195–215.
Matsuyama, T. 1987. "Knowledge-based aerial image understanding systems and expert systems for image processing." IEEE Transactions on Geoscience and Remote Sensing. GE-25(3):305–316.
Matsuyama, T. 1989. "Expert systems for image processing: Knowledge-based composition of image analysis processes." Computer Vision, Graphics, and Image Processing. 48(1):22–49.
McKeown, D. M., Jr. 1984. "Knowledge-based aerial photo interpretation." Photogrammetria. 39(3):91–123.
Mehldau, G., and R. A. Schowengerdt. 1990. "A C-extension for rule-based image classification systems." Photogrammetric Engineering and Remote Sensing. 56(6):887–892.
Moller-Jensen, L. 1990. "Knowledge-based classification of an urban area using texture and context information in Landsat TM imagery." Photogrammetric Engineering and Remote Sensing. 56(6):899–904.
Moller-Jensen, L. 1997. "Classification of urban land cover based on expert systems, object models and texture." Computers, Environment and Urban Systems. 21(3–4):291–302.
Mulder, N. J., H. Middelkoop, and J. W. Miltenburg. 1991. "Progress in knowledge engineering for image interpretation and classification." ISPRS Journal of Photogrammetry and Remote Sensing. 46(3):161–171.
Muthu, K., and M. Petrou. 2007. "Landslide-hazard mapping using an expert system and a GIS." IEEE Transactions on Geoscience and Remote Sensing. 45(2):522–531.
Nicolin, B., and R. Gabler. 1987. "A knowledge-based system for the analysis of aerial images." IEEE Transactions on Geoscience and Remote Sensing. GE-25(3):317–329.
Peddle, D. R., and C. R. Duguay. 1998. "Mountain terrain analysis using a knowledge-based interface to a GIS." Geomatica. 52(3):265–272.
Penaloza, M. A., and R. M. Welch. 1996. "Feature selection for classification of polar regions using a fuzzy expert system." Remote Sensing of Environment. 58(1):81–100.
Pierce, L. E., F. T. Ulaby, K. Sarabandi, and M. C. Dobson. 1994. "Knowledge-based classification of polarimetric SAR images." IEEE Transactions on Geoscience and Remote Sensing. 32(5):1081–1086.
Plumb, G. A. 1993. "Knowledge-based digital mapping of vegetation types in Big Bend National Park, Texas." Geocarto International. 8(2):29–38.
Robinson, V. B., A. U. Frank, and M. Blaze. 1986. "Expert systems applied to problems in geographic information systems: Introduction, review and prospects." Computers, Environment and Urban Systems. 11(4):161–173.
Schmidt, K. S., A. K. Skidmore, E. H. Kloosterman, H. van Oosten, L. Kumar, and J. A. M. Janssen. 2004. "Mapping coastal vegetation using an expert system and hyperspectral imagery." Photogrammetric Engineering and Remote Sensing. 70(6):703–715.
Schowengerdt, R. A., and H. Wang. 1989. "A general purpose expert system for image processing." Photogrammetric Engineering and Remote Sensing. 55(9):1277–1284.
Shafer, G. 1976. A Mathematical Theory of Evidence. Princeton, NJ: Princeton University Press.
Skidmore, A. K. 1989. "An expert system classifies eucalypt forest types using Thematic Mapper data and a digital terrain model." Photogrammetric Engineering and Remote Sensing. 55(10):1449–1464.
Srinivasan, A., and J. A. Richards. 1990. "Knowledge-based techniques for multisource classification." International Journal of Remote Sensing. 11(3):505–525.
Stow, D., L. Coulter, J. Kaiser, A. Hope, D. Service, K. Schutte, and A. Walters. 2003. "Irrigated vegetation assessment for urban environments." Photogrammetric Engineering and Remote Sensing. 69(4):381–390.
Strat, T. M., and M. A. Fischler. 1991. "Context-based vision: Recognizing objects using information from both 2-D and 3-D imagery." IEEE Transactions on Pattern Analysis and Machine Intelligence. 13(10):1050–1065.
Weiss, S. M., and C. A. Kulikowski. 1984. A Practical Guide to Designing Expert Systems. Totowa, NJ: Rowman & Littlefield.
Westinghouse Science & Technology Center. 1997. Blackboard architecture, http://javadoug.googlepages.com/bb.htm.
Wilkinson, G. G., and J. Megier. 1990. "Evidential reasoning in a pixel classification hierarchy—A potential method for integrating image classifiers and expert system rules based on geographic context." International Journal of Remote Sensing. 11(10):1963–1968.
Wilson, B. A., and S. E. Franklin. 1992. "Characterization of alpine vegetation cover using satellite remote sensing in the Front Ranges, St. Elias Mountains, Yukon Territory." Global Ecology and Biogeography Letters. 2(3):90–95.
Zhu, A. X., L. E. Band, B. Dutton, and T. Nimlos. 1996. "Automated soil inference under fuzzy logic." Ecological Modeling. 90:123–145.
CHAPTER 12
Classification Accuracy Assessment
Before they can be treated as a trustworthy representation of the reality at the time of imaging, the land cover results classified from satellite images must be assessed for their accuracy to ensure that they meet the requirements of the intended applications. Provision of such quality assurance can also protect the analyst from possible litigation related to the use of the data products. Assessment of the thematic accuracy of the classification results is defined as the verification of the labeled pixel identity against the ground truth at the time of sensing at representative sample points. Because "ground truth" may have changed since imaging, it is commonly substituted by information obtained from a reconnaissance field visit or from other more reliable remote sensing materials, such as aerial photographs of a larger scale or an existing thematic map of known quality. This evaluation process is lengthy and expensive, and may take longer to complete than the image classification itself.

At present, it remains controversial how the accuracy of land use and land cover maps generated from digital analysis of remotely sensed data should be appropriately evaluated (Ginevan, 1979; Hay, 1979). For instance, no consensus has been reached yet on the proper size of evaluation samples and the type of accuracy that has to be reported. A proper evaluation sample size, usually only a portion of the classified pixels that are randomly selected, is able to minimize the cost of evaluation while expediting the evaluation process.

This chapter on classification accuracy assessment begins with a discussion of the relationship between two commonly used and confused terms, accuracy and precision. This discussion is followed by a comprehensive review of the sources of inaccuracy arising from image classification and accuracy assessment, and identification of their characteristics. The focus then shifts to the appropriate method
of accuracy assessment, including the selection of evaluation pixels. Lastly, this chapter presents a commonly adopted method for reporting accuracy. Practical examples are provided to illustrate how the method should be applied in practice.
12.1 Precision versus Accuracy
Precision has a number of connotations in different disciplines. In mathematics, it refers to the number of digits after the decimal point: the more significant digits a measured result contains, the more precise the measurement is. In statistics, precision means the repeatability of a measurement. If the results from repetitive measurements have a small range of variation in their values, then they are said to be very precise, even though the mean of these measured values can be far from the actual value. Accuracy refers to the deviation of the measurement from the truth. It connotes a degree of correctness. A measurement that deviates from the true value excessively is said to be inaccurate. Inaccurate measurements are commonly associated with errors and mistakes.

Accuracy and precision may be used interchangeably in other contexts. However, in digital image classification they are by no means synonymous with each other. In fact, they are two quite different concepts for image classification results. In this context, precision refers to the level of land cover detail at which a classification has been performed. A classification at the first level of the Anderson scheme (Table 7.2) is said to be less precise than another at the second level. For example, a classification of vegetation at the species level (e.g., "gorse") is more precise than one at the family level (e.g., "shrubs"). Accuracy, or thematic accuracy of spatial data, has been defined as "the closeness of results of observation, computations, or estimates to the true values or the values accepted as being true" (USGS, 1990). Thus, accuracy represents a measure of agreement between the assigned identity of a pixel and its genuine identity in the real world or its tested surrogate. If a large majority of pixels in the input image have been correctly labeled in a classification, the classification result is deemed accurate. Usually, accuracy is determined statistically from a subset of the classified pixels to minimize the cost of assessment. The material used to judge whether pixels have been correctly labeled is commonly known as reference data derived from the ground truth.

Although accuracy and precision are two quite different concepts, they are not independent of each other. The level of detail at which an image is classified is strongly correlated with the accuracy achievable in the classification. In general, the two are inversely related in a classification (Fig. 12.1). An increase in classification precision is usually accompanied by a proportional decline in classification accuracy. This relationship can be expressed roughly
FIGURE 12.1 Relationship between classification accuracy (vertical axis) and the precision level (horizontal axis, from level I to level III) of the same classification.
as a reverse J-shaped curve. It cannot be depicted precisely or via a mathematical equation, as this curve varies with the study area and the type of land covers present. According to this relationship, under the same circumstances a classification becomes increasingly less reliable if performed at a higher detail level. It must be noted that this accuracy refers to the overall accuracy of all classified covers, namely, the accuracy for all classes in a classification. The accuracy of individual classes may not follow this generalized relationship, because the accuracy of a given class (e.g., water) may not change substantially at different levels of detail.

Although a more detailed classification is more informative, it is not always desirable to classify satellite imagery data as precisely as possible because this could be expensive and time consuming to accomplish. In addition, a classification at a more precise level means that more evaluation pixels must be selected and their genuine identity checked against the reference data. This prolongs the assessment process and increases the cost of the image analysis unnecessarily. Instead, the precision level of a land cover classification should be commensurate with the intended purpose of the image analysis. The best compromise is thus to produce a highly accurate result that meets the desired requirements for precision. It is always feasible to generalize the classification results to a coarser precision level by amalgamating relevant classes at the subcategory level during postclassification processing. In this way, an even more accurate classification is derived after all confusion among sublevel classes is resolved.
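The amalgamation of subcategory classes mentioned above amounts to a simple recoding of class labels. The sketch below is a minimal, hypothetical example in which level II codes are collapsed into their level I parents during postclassification processing.

    # Generalizing a detailed (level II) classification to level I by
    # amalgamating subclasses (class codes are hypothetical).
    import numpy as np

    level2 = np.array([11, 12, 21, 22, 23, 12, 41])          # e.g., 11 residential, 12 commercial
    to_level1 = {11: 1, 12: 1, 21: 2, 22: 2, 23: 2, 41: 4}   # 1 urban, 2 agriculture, 4 forest
    level1 = np.vectorize(to_level1.get)(level2)
    print(level1)                                            # [1 1 2 2 2 1 4]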
12.2 Inaccuracy of Classification Results
The results classified from remote sensing data contain misclassifications. The sources of such mistakes and their nature are discussed in this section under four headings.
12.2.1 Image Misclassification
Not all pixels in an input image can be correctly labeled, no matter how careful the image analyst is with the classification. A common misconception about classification errors is that they can be eliminated through the exercise of more care during classification. In fact, the emergence of classification errors can be beyond the control of the image analyst. Some pixels are misclassified because of the limitation of the classifier or the noise in the input data. Misclassification or classification inaccuracy refers to the erroneous labeling of pixels in the input image, or their incorrect allocation to a land cover category. In order to avoid potential ambiguity, the term inaccuracy is preferred in this chapter. It indicates the disagreement between classification results and what is actually observed on the ground or its surrogate. Since errors have long been used to mean inaccuracy, this tradition is also observed here.

Inaccuracies in the land cover maps produced from digital analysis of remotely sensed data fall into three types: classification error, boundary error, and control error (Hord and Brooner, 1976). During grouping of pixels in the input image, some of them are incorrectly labeled as a consequence of the limitation of the classification algorithm or violation of the underlying assumption. As mentioned in Chap. 7, most spectral image classifiers are per pixel–based. This means that they can take advantage of only a pixel's spectral value in the decision-making process, while the spatial relationship among pixels is totally ignored. Furthermore, the pixel-value–based classification is performed under the implicit assumption that each spectral cluster is associated uniquely with a ground cover, and that different ground covers have differing digital numbers (DNs) in the same spectral band. Misclassifications arise whenever this assumption is violated, which is common for certain types of features. In addition, misclassifications may result from the inappropriateness of the adopted classification scheme. For instance, it may be ill suited to remotely sensed data that are not recorded in the right season, or it may not have room to accommodate transitional covers that lie between two information classes (e.g., a mixture of two classes).

Misclassifications, to some extent, can be traced to the nature of the scene to be classified, such as scene heterogeneity and ground sampling interval, or the presence of mixed pixels in the source data. It is relatively difficult to form pure pixels over a heterogeneous landscape. A broad sampling interval is conducive to the formation of mixed pixels, especially over a highly fragmented landscape. Formed out of
the integrated radiance from all types of land covers present within the scanned ground area, mixed pixels are impossible to classify correctly at the pixel level. Classification inaccuracy can also be attributed to the effect of improper image preprocessing. For instance, a geometrically rectified image is output via resampling of pixel values in the input image. Of the three resampling methods (Sec. 5.5.4), only the nearest neighbor does not require alteration of raw pixel values. The output pixel value is altered if the bilinear or cubic convolution resampling strategy is adopted. This minor adjustment in pixel values, however, can prove critical in causing misclassifications for some pixels.

Misclassifications can also arise from sensor- and atmosphere-induced noise during acquisition of remotely sensed data. The radiation received at the sensor is subject to the inconsistent sensitivity of multiple detectors. Theoretically, the same reflectance level entering the sensor should produce the same radiometric response and the same intensity level in the output signal. This ideal correspondence, however, may not always hold. In addition, the quality of the data may be degraded during transmission through the atmosphere. The radiance that reaches a sensor comes from three parts: the target, the background, and the atmosphere, of which only the radiation from the target represents genuine information. The radiance from the background and the atmosphere is essentially noise. It disguises the true signal from the target and hence should be eliminated. Its complete removal is complicated by the fact that atmospheric scattering is a function of radiation wavelength and the atmospheric path. An in-depth discussion of radiometric calibration to account for atmospheric effects is beyond the scope of this book. Readers who are interested in exploring this topic further can refer to Gordon (1978). A common practice in qualitative remote sensing, such as land use/land cover mapping, is to disregard the effect of atmospheric radiation on satellite data as it has a minimal impact on land cover classification results.
12.2.2 Boundary Inaccuracy
The second source of errors in the evaluation process stems from the inaccuracy of spatial registration caused by boundary error. This source is relevant to site-specific accuracy. Unlike classification errors, boundary-induced errors are artificial. They do not exist in the classification results. Their emergence is due solely to shortcomings of the assessment itself, which has nothing to do with the classification results to be assessed. Such artificial inaccuracy can originate from two sources.
• First, the "ground truth" used in the assessment could have become obsolete since the satellite data were recorded owing to a minor shift in the boundary of certain ground covers. As shown in Fig. 12.2, the boundary of the water body has
expanded since the area was imaged. Consequently, there is a thematic shift from water to forest at spot A. However, such a change may not be recorded if the site is inspected in the field.
• Second, the indicated boundaries have a degree of uncertainty associated with image rectification residuals. As mentioned in Chap. 5, it is quite permissible to have a rectification residual on the order of one pixel or smaller. This means that the geographic location of evaluation pixels may not be identical to the spot examined in the reference data or on the ground. Nevertheless, this geographic mismatch may be translated into thematic inaccuracy. In fact, it is quite common for the uncertainty in geometric position to translate into thematic inaccuracy along the border of two adjoining covers (Fig. 12.2). In this example, pixel B is classified as forest without boundary shift [e.g., determined using a global positioning system (GPS) unit in the field]. However, its identity changes to cropland in the classified results as a consequence of positional shift induced by image georeferencing residuals. Inaccuracy induced by boundary mismatch is confined spatially to a narrow band at the interface of two covers. This effect is highly pronounced if the evaluation pixels are distributed within the residual band.
FIGURE 12.2 Thematic inaccuracy arising from inconsistency between map geometric inaccuracy and the position of the evaluation pixels. Solid lines represent boundaries at the time of imaging that are derived from the classified image. Dotted lines represent the boundary that has changed since the image was obtained, or whose position has shifted as a consequence of residuals in image geometric rectification. (In the figure, labels A and B mark the spots discussed in the text, amid water, forest, pasture, and cropland parcels.)
12.2.3 Inaccuracy of Reference Data
The third type of inaccuracy originates from control error. Similar to boundary mismatch, control error does not exist in the classification results to be assessed. This inaccuracy, due to differential scales between image classification and accuracy assessment, is artificially
introduced to the assessment when the genuine land cover of a different spot is checked in the field. Image classification takes place at the pixel level. All the features falling within a pixel on the ground are reduced to a single cover in the classification results. On the other hand, identification of the genuine identity of evaluation pixels in the field takes place at the point level (i.e., where the assessor is standing in the field). The land cover observed at this spot is not always representative of the cover over an area equivalent to the entire pixel, unless the cover is spatially uniform. Otherwise, a perfect match is difficult to establish. Related to differential scaling is thematic discrepancy, resulting from different levels of thematic generalization. The same feature could have been faithfully preserved in the reference data, whereas it could have been generalized in the classification results during postclassification processing.

Of the three types of errors, image misclassifications are the most critical source. They are a major contributor toward inaccuracy in the classified results. The degree of misclassification could be substantial for those land covers that do not have a unique spectral signature. By comparison, both boundary mismatch and control uncertainty exert a minor influence on the final reported accuracy. Their influence is restricted to certain geographic areas and to certain covers. Thus, a more in-depth discussion of misclassifications is warranted.
12.2.4 Characteristics of Classification Inaccuracy
Classification inaccuracy has a number of identifiable traits. Misclassified pixels have neither a random nor a systematic distribution in the results. Nor are they spatially isolated; they are distributed as spatial clusters in association with certain themes. This association takes place preferentially with those information classes whose spectral properties share a high degree of similarity with other covers, as revealed by the spectral separability of their training samples. The information classes that are likely to be mixed with others are not spatially random. Instead, they may extend into certain areas and locations. As shown in Fig. 7.12, the spectral separability between any two classes is not equal among all land covers to be mapped. Those with a distinct spectral behavior can be expected to be classified accurately. Those covers whose distribution of occurrence probability overlaps with that of another cover tend to have a low accuracy. Misclassifications concentrate more densely in places where spectral similarity is higher. Furthermore, the spatial distribution of misclassified mixed pixels is also affected by the size, shape, and arrangement of land cover parcels. Similar to the errors caused by boundary mismatch, they are located along the border of land cover parcels.

Regardless of the causes of their presence in the classification results, classification errors fall into two broad categories in terms of their nature: errors of omission and errors of commission. Errors of omission
refer to misclassifications that exclude genuine pixels from an identified land cover to which they should belong. For instance, the exclusion of a forest pixel from the forest class represents an error of omission. Errors of omission are responsible for underestimating the area of the concerned land cover. By comparison, errors of commission refer to the erroneous assignment of pixels to a class to which they do not belong. For instance, the assignment of mudflat pixels to urban industrial represents an error of commission. Commission errors cause a land cover to be overestimated in the classification results. It must be emphasized that both omission errors and commission errors are relative, rather like "the two sides of a coin." The same misclassification can be regarded as either an error of omission or an error of commission, depending upon which cover is referred to. One cover's omission error could be another cover's commission error. Since the total number of incorrectly classified pixels remains the same in a classification, the overall sum of commission errors must be equal to the overall sum of omission errors among all the classified land covers. However, this statement cannot be applied to an individual cover unless there are only two covers in the classification results. This is because the amount of confusion taking place between any two covers A and B does not equal that between any two other covers C and D.
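The relationship between the two kinds of error can be verified numerically. The sketch below uses a small, hypothetical error matrix (rows for reference classes, columns for classified classes) to show that the per-class omission and commission errors differ, while their totals over all classes are equal.

    # Omission and commission errors from a hypothetical 3-class error matrix.
    import numpy as np

    # rows = reference class, columns = classified class
    cm = np.array([[50,  3,  2],
                   [ 4, 60,  6],
                   [ 1,  5, 70]])
    omission   = cm.sum(axis=1) - np.diag(cm)   # reference pixels excluded from each class
    commission = cm.sum(axis=0) - np.diag(cm)   # pixels wrongly included in each class
    print(omission, commission)                 # [ 5 10  6] versus [5 8 8]
    print(omission.sum(), commission.sum())     # both totals equal 21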
12.3 Procedure of Accuracy Assessment
Image classification results are usually evaluated against two types of standards: training samples and independently selected evaluation pixels. The training area–based assessment is relatively easy to implement. During supervised classification, training samples must be selected for every information class in the classification scheme. Thus, the image analyst has already learned their genuine identity on the ground through a reconnaissance trip or from other reference data. This identity is passed on to the computer once the training samples are saved. On the other hand, their assigned identity in the classification results is also known to the computer. With these two sets of identities, it is relatively easy to determine how many of the pixels in the training samples have been classified correctly and with which classes a given class is mixed in cases of misclassification. A table can then be readily constructed to show the type and severity of confusion among all mapped covers, together with accuracy indices for each cover and the overall classification accuracy for all covers. However, this kind of assessment is fundamentally flawed in two senses:
• First, it lacks independence and objectivity. The initial purpose of selecting the training samples is to generate the most accurate classification results. Later, the same training samples are used to assess the accuracy of the classification results derived from them. This double use of the training samples
creates a problem of dependency. Theoretically, all the pixels encompassed in the training samples should be classified with perfect accuracy. Use of these biased samples for accuracy assessment produces an artificially inflated accuracy.
• Second, these pixels are selected at the patch scale by drawing a polygon around them (e.g., use of Area of Interest in ERDAS [Earth Resources Data Analysis System] Imagine). They are ill suited to assessing the accuracy of results that are classified at the pixel level.
For these two reasons, this kind of accuracy assessment should be abandoned. Instead, the evaluation samples should be selected independently, even though this makes the assessment much more complex to undertake, longer to complete, and more costly. The procedure of using these samples for the assessment depends on the scale of assessment.
12.3.1 Scale and Procedure of Assessment
The assessment of a classification result can be carried out at two spatial scales, the patch level and the point (pixel) level. At the patch scale of assessment, a polygon is drawn to enclose a large number of evaluation pixels quickly in one session. All of them correspond to the same type of ground cover in the reference data. In this way a large sample size is easily retained. Consequently, both the reliability and the speed of assessment are improved considerably. This segment level of assessment is exemplified by the training area–based assessment in which a group of pixels is selected by delineating a polygon around them. This scale is appropriate only for land cover maps produced using the object-based image classification method discussed in Chap. 10, in which a block of pixels is classified simultaneously in the decision-making process.

However, if the land cover map is produced via per-pixel image classification, the identity of pixels in the input image is determined individually, pixel by pixel. It is inappropriate to assess their classification accuracy at the polygon level. Instead, the pixel-level assessment should be adopted. At this level the accuracy of the classified pixels is assessed by examining the identity of selected evaluation pixels representative of the respective land covers. In order to ensure statistical validity, these pixels should amount to an adequate sample size. Their genuine identity in the reference data needs to be ascertained individually too. This considerably slows down the process of accuracy assessment. Despite this drawback, this scale of assessment is appropriate and scientific, and thus commonly adopted. The remainder of this chapter will focus on this kind of accuracy assessment.

The procedure of accuracy assessment based on independently selected evaluation pixels at the pixel level involves a number of issues, such as how many pixels should be selected and with which
methods. The procedure of assessment is very complex, involving the following six significant steps:
• To decide on a sampling method appropriate for the purpose of accuracy evaluation
• To select an optimum number of samples for each land cover
• To compare them with the reference data
• To generate an error matrix
• To calculate the accuracy indices, including the Kappa index
• To provide a confidence level for the evaluation
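The last three of these steps can be sketched in a few lines of code. The labels below are hypothetical; the error matrix is tallied from paired reference and classified identities, and the overall accuracy, the per-class producer's and user's accuracies, and the Kappa index named in the steps above are then derived from it.

    # Minimal sketch: error matrix and accuracy indices from evaluation pixels.
    import numpy as np

    classes = ["water", "forest", "pasture"]
    reference  = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 0])   # identity in the reference data
    classified = np.array([0, 0, 1, 2, 1, 2, 2, 1, 2, 0])   # identity in the classified map

    k = len(classes)
    cm = np.zeros((k, k), int)
    for r, c in zip(reference, classified):
        cm[r, c] += 1                            # rows = reference, columns = classified

    n = cm.sum()
    overall   = np.trace(cm) / n
    producers = np.diag(cm) / cm.sum(axis=1)     # 1 - omission error, per class
    users     = np.diag(cm) / cm.sum(axis=0)     # 1 - commission error, per class
    expected  = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n**2
    kappa = (overall - expected) / (1 - expected)
    print(cm, overall, producers, users, round(kappa, 3))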
12.3.2 Selection of Evaluation Pixels
In evaluating image classification results, it is not realistic to check the identity of every classified pixel (van Genderen and Lock, 1977) as this is prohibitively expensive. An economical and commonly accepted practice is to use a subset of pixels that are representative of the population from which they are selected. A number of spatial sampling strategies are available for the selection of these evaluation pixels, including random, systematic, clustered, stratified, or a combination of them. Although the random method is the most scientific (Fig. 12.3a), it does not suit the evaluation of image classification results particularly well because not all land covers are randomly distributed in space. The adoption of a spatially random sampling strategy causes some of them to be underrepresented in the samples. Moreover, it may not be possible to select a sufficient number of evaluation pixels for certain classes that have a restricted spatial distribution or that have a subordinate prevalence within the study area. Similar problems also occur with the systematic sampling scheme (Fig. 12.3b) as no natural land covers have such a regular spatial arrangement and juxtaposition on the ground. Sampling at a constant interval both horizontally and vertically causes underrepresentation of minor land covers in the samples. The clustered sampling scheme is not an ideal choice either (Fig. 12.3c) because it causes overrepresentation of land cover at one location, but gross underrepresentation of another cover at a different location. The best strategy appears to be the stratified method (Fig. 12.3d) because the classified image has virtually been thematically stratified. This sampling method guarantees that a specified number of evaluation pixels can be selected from a given land cover. In this way all covers can be adequately represented in the evaluation samples. In order to retain the scientific validity of the selected samples, this stratified sampling method is usually combined with the random method. Thus, it is guaranteed that a certain number of evaluation pixels can always be randomly selected from the specified land cover. No matter how small a land cover is in size or limited in its spatial distribution, this specified number of evaluation pixels can always be selected, albeit not in a single pass. For these covers, the selection process may
have to be repeated a number of times to enable the selection of a sufficient number of evaluation pixels.
FIGURE 12.3 Spatial distribution of evaluation pixels selected using different schemes: (a) random; (b) systematic; (c) clustered; (d) stratified random. The stratified random sample shown is an equalized random sample in which three pixels are selected from each subarea.
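A stratified random selection of evaluation pixels, the strategy recommended above, can be sketched as follows. The classified image is simulated with random class codes purely for illustration; in practice the array would be the classified map itself, and the per-class sample size would be chosen as discussed in the next subsection.

    # Minimal sketch of stratified random selection of evaluation pixels:
    # a fixed number of pixels is drawn at random from each mapped class.
    import numpy as np

    rng = np.random.default_rng(0)
    classified = rng.integers(1, 5, size=(200, 200))   # stand-in for a classified image
    per_class = 50

    samples = {}
    for cls in np.unique(classified):
        rows, cols = np.nonzero(classified == cls)
        pick = rng.choice(len(rows), size=min(per_class, len(rows)), replace=False)
        samples[int(cls)] = list(zip(rows[pick], cols[pick]))   # (row, col) evaluation pixels

    print({cls: len(pts) for cls, pts in samples.items()})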
12.3.3 Number of Evaluation Pixels
It is very important to have the right number of evaluation pixels. A very small number leads to low reliability in the accuracy indicators generated because the evaluation results are heavily influenced by a few abnormalities. On the other hand, a large number is not desirable either, because it prolongs the evaluation process and increases the cost of results validation. In order to strike a delicate balance between reliability and expense, the number of pixels should be kept as small as possible, but still be capable of meeting a specified accuracy requirement (van Genderen and Lock, 1977). One method of determining the necessary minimal sampling size N is given below:

N = Z^2 pq / E^2    (12.1)

where p = the expected percentage of accuracy
q = 100 − p
E = allowable error
Z is set to 2 based on the binomial distribution (Fitzpatrick-Lins, 1981). Usually, Z is generalized from the standard normal deviate of 1.96 for the 95 percent two-tailed confidence level.

Example: If the expected accuracy is 85 percent at an allowable error of 5 percent, what is the minimum number of evaluation pixels needed for a reliable assessment?

Solution: p = 85, q = 100 − 85 = 15, E = 5, so N = 4 × (85 × 15)/5² = 204
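The worked example can be reproduced with a one-line function implementing Eq. (12.1); the parameter names follow the equation.

    # Minimum sample size from Eq. (12.1).
    def min_sample_size(p, e, z=2):
        q = 100 - p
        return z**2 * p * q / e**2

    print(min_sample_size(p=85, e=5))   # 204.0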
The formula given in Eq. (12.1) provides an estimate for the total number of pixels for all mapped land cover categories. It does not specify how these pixels should be allocated among individual land covers. The number of pixels that should be selected for a given land cover class N is determined with the use of the following equation (van Genderen and Lock, 1977):

N = (p + q)^x    (12.2)
where p and q are the same as in Eq. (12.1), and x = sample size. Based on numerous calculations, van Genderen and Lock (1977) proposed that the minimum sample size for each category necessary for 85 percent interpretation accuracy be set to 20, for 90 percent accuracy to 30, and so on. In case the importance of some categories is overemphasized, it may be desirable to increase the minimum sample size in terms of the following formula (Hay, 1979):

N = 100n_i / S_i    (12.3)

where S_i = proportion in the ith stratum
n_i = required sample size for that stratum
N = total sampling size

In the above discussion, the recommended sampling size fails to take into consideration the original population size, even though the importance of a land cover category has been recognized. The use of a constant sample size for all covers regardless of their prevalence (or lack of it) in the scene, however, undermines the authenticity of the accuracy indicators to be generated. Intuitively, more evaluation pixels should be selected for a widespread class than for a subordinate class in order to achieve the same level of confidence in the assessment. According to Hay (1979), any sample size of less than 50 pixels is an unsatisfactory guide to true error rates. It is recommended that in most cases minimum sample sizes be set between 50 and 100. The rule of thumb is to use
a sample size in the vicinity of 50. This size should vary in accordance with the prevalence of a land cover. In this way, a delicate balance between statistical validity and economical implementation is reached.
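One possible way of applying this rule of thumb is sketched below: each class receives a share of the total evaluation budget in proportion to its prevalence, subject to a floor of 50 pixels. The class proportions and the total budget are hypothetical.

    # Allocating evaluation pixels per class in proportion to prevalence,
    # with a minimum of 50 pixels per class (all figures hypothetical).
    prevalence = {"forest": 0.55, "pasture": 0.30, "water": 0.10, "urban": 0.05}
    total = 400          # overall budget of evaluation pixels, e.g. from Eq. (12.1)
    floor = 50

    allocation = {cls: max(floor, round(total * share)) for cls, share in prevalence.items()}
    print(allocation)    # {'forest': 220, 'pasture': 120, 'water': 50, 'urban': 50}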
12.3.4 Collection of Reference Data
Reference data play an indispensable and often neglected role in accuracy assessment. It is imperative that these data be as error free as possible. Any misinformation in them will cause the classification results to be judged unrealistically and degrade the confidence in the quality assurance generated from the accuracy assessment. There are a number of ways to collect reference data through which the genuine identity of reference pixels is established, such as study of existing maps, visual examination of the raw color composite image and large-scale aerial photographs, and field visits guided with a GPS receiver (Congalton and Green, 1999). Each of these sources or methods of collecting reference data has its own strengths and limitations.

Use of existing maps is very easy and cheap. In order to produce a reliable assessment outcome, the existing thematic map must have a scale comparable to that of the land cover map to be assessed, if not larger. The analyst must also be aware of potential issues relating to the use of tested thematic maps, such as map currency, the purpose of map compilation, and the thematic accuracy of the map. Land covers depicted in the thematic map could have changed since the map was compiled, especially if the map was produced long ago and land cover inside the study area is dynamic and tends to change quickly over a short span of time. The number of covers shown in a thematic map varies with the purpose of its compilation and the data source from which it is compiled. The land covers mapped through digital analysis may not be identical to those shown on the thematic map. Therefore, it may not be feasible to select evaluation pixels for every cover mapped. Use of existing thematic maps may be severely limited in that their accuracy is not normally known. The accuracy of the map produced by means of remote sensing is artificially degraded if assessed against error-infested "ground truth." Even if the accuracy is provided, it is still unknown how it applies to every land cover shown. So caution must be exercised in using tested thematic maps.

The second method of collecting reference data is through field visits. All the selected evaluation pixels can be precisely pinpointed in the field with the assistance of a positioning device such as a GPS data logger. However, this method of collection has a number of disadvantages, such as scene instability, differential scaling, site inaccessibility, and high cost.
• First, the scene may have changed since it was imaged. This issue is the same as with the use of existing thematic maps, except that the reference data are more recent than the results derived from remotely sensed data.
• Second, image classification and field verification do not take place at the same spatial scale. On the ground the analyst is able to observe land covers at specific spots that may have a limited view. In contrast, the genuine identity of a pixel on the satellite imagery is enumerated over the minimal unit of pixels that corresponds to a large ground area (e.g., 20 × 20 m for SPOT [Le Systeme Pour l'Observation de la Terre] multispectral bands). There may be a huge variation between the land cover class shown on the map (e.g., pixel identity) and the spot land cover in the field. Therefore, the analyst must determine the predominant cover within a spatial extent comparable to the pixel size on the ground when multiple covers are present. This is not an easy task, especially when the view is obstructed by high-rise buildings, topographic relief, or trees. It is especially important to determine the predominant land cover over an area equivalent to the pixel size in the field if the landscape is highly fragmented or when the evaluation pixels lie close to the border of multiple land covers. Any discrepancy in spatial extent or scaling may cause misidentification of the land cover of evaluation pixels on the ground.

• Finally, this method is severely constrained by site inaccessibility. If the evaluation pixels fall over remote sites such as over the top of a tall mountain or inside a wetland, then it is very difficult to get to these spots. A possible method of solving the inaccessibility problem is to use a helicopter or existing aerial photographs. For example, photographs of the area surrounding the evaluation pixels can be taken with a handheld camera from a helicopter under the navigation assistance of a GPS receiver. This method of collecting reference data is efficient but very costly. Besides, it is not practical when the study site is thousands of kilometers away from the nearest airport. In this case the reference data may be collected from existing aerial photographs. As with existing thematic maps, these photographs must have a scale larger than or comparable to that of the remotely sensed data used to produce the land cover map to be assessed. Again, these photographs must be current. They should at least cover an adequately large site within the study area to guarantee the selection of a sufficient number of evaluation pixels.

Regardless of the sources of the reference data, the image analyst must always keep one important point in mind: the spatial and temporal scales of the reference data must be comparable to those of the image-derived results. This compatibility ensures that the two are roughly similar in their accuracy and currency. Use of inferior and obsolete reference data does not do justice to the classification results. It may thwart the whole purpose of undertaking an accuracy assessment.
12.4 Report of Accuracy
A common approach to reporting image classification accuracy is to calculate the proportion or ratio of the mapped area that has been correctly classified in comparison to reference data or "ground truth" (Aobs) to the total area mapped (Â) (Story and Congalton, 1986), or
Accuracy = Aobs/Â        (12.4)
The above equation is applicable to individual classes. It can be implemented in two ways, site-general and site-specific. The former, also called aspatial accuracy, is discussed first. The latter, known as spatial accuracy, is covered next.
12.4.1 Aspatial Accuracy
Aspatial accuracy refers to the degree or percent of agreement in terms of areal proportion for a mapped cover in a classification result and in the reference data or their acceptable substitute, irrespective of its spatial location. For instance, if forest is classified at 90 ha from the image against an area of 100 ha in the reference data, then the classification is said to be achieved at an accuracy of 90 ha divided by 100 ha, or 90 percent, without regard to where these 90 ha of forest are located inside the study area. This method of reporting the accuracy can be simply implemented by examining the statistics of the classification results (see Sec. 7.10). However, this method of reporting is fundamentally flawed in that it ignores potential misclassifications between forest and other nonforest covers in both directions. For instance, the 90-ha classified forest may not correspond completely to genuine forest on the ground. It is possible that some forest pixels are classified as pastures, whereas some shrubs are classified as forest. In other words, misclassifications between forest and other spectrally resembling classes (i.e., both errors of omission and errors of commission) could have canceled each other out in deriving the sum of forest acreage. As illustrated in Fig. 12.4, the accuracy would be 100 percent for all
FIGURE 12.4 Two spatial permutations of a classification of four covers. (a) Ground truth; (b) classification results. Notice in these two maps how each cover differs in its spatial distribution, but remains unchanged in its area.
four covers if the area-based accuracy reporting were applied. As demonstrated here clearly, the actual accuracy is much lower than this indicator suggests. The exclusion of both commission errors and omission errors from consideration in aspatial accuracy reporting makes site-general accuracy underreport classification errors. By comparison, site-specific accuracy is a more realistic indicator.
12.4.2 Spatial Accuracy
Spatial accuracy refers to the proportion of agreement between a classification result and the reference data at certain specific locations. It can be derived for individual land covers and for all land covers in one classified result based on evaluation pixels that have been selected using the stratified random sampling method. Their labeled identity in the classification results is known to the computer. Their genuine identity on the ground has been ascertained through comparison with the reference data. The identity is then coded systematically in a manner consistent with the classified results. Namely, the same code is assigned to the same cover in both the classified results and in the reference data before it is entered into the computer manually. In order to avoid human bias, entry of the genuine identity is best done when the assigned identity in the classification results is hidden from view (column 5, Table 12.1). Once the genuine identities of all evaluation pixels are entered into the computer, various quantitative indices can be generated automatically from them, with the confusion between any two classes revealed. The outcome of spatial comparison of the evaluation pixels is usually presented in a tabular form known as an error matrix or confusion matrix. It has also previously been variously called contingency table (Story and Congalton, 1986), evaluation matrix (Aronoff, 1984), and misclassification matrix (Chrisman, 1991). The expression of classification accuracy in the error matrix form is highly effective in evaluating both errors of omission and errors of commission in a classification (Congalton et al., 1983). This two-dimensional (2D) matrix is always square (Table 12.2), showing the membership of evaluation pixels among the mapped information classes. The size of the matrix equals the number of information classes retained in a classification. Of particular notice is that the error matrix is rarely symmetric. Namely, A2 is hardly the same as B1, suggesting that the confusion between forest and water is different from that between water and forest unless there is no confusion in either direction, or there are only two land covers in the classification results. The horizontal axis and vertical axis have different meanings. The convention used here is that the assigned identity in the classification results is presented in a row (across), whereas a column (down) is reserved for the reference data (i.e., genuine identity on the ground) (Congalton and Mead, 1983).
Evaluation                Location                               Identity in             Identity in
Pixel No.    Name         X            Y                         Classified Results      Reference Data
 1           ID#1         2660610      6483287                   1                       1
 2           ID#2         2664750      6488057                   5                       5
 3           ID#3         2664750      6478637                   7                       6
 4           ID#4         2655210      6489527                   2                       2
 5           ID#5         2654370      6478607                   3                       3
 6           ID#6         2660850      6479327                   2                       2
 7           ID#7         2663070      6482447                   5                       5
 8           ID#8         2657790      6483317                   5                       5
 9           ID#9         2660520      6476477                   5                       7
10           ID#10        2660250      6482477                   4                       4
11           ID#11        2661480      6489527                   1                       1
12           ID#12        2665770      6482117                   2                       2
13           ID#13        2658360      6491477                   6                       5
14           ID#14        2657340      6478817                   6                       6
15           ID#15        2660250      6480497                   7                       7
16           ID#16        2662980      6481187                   1                       1
TABLE 12.1  Preparation of Evaluation Pixels for the Generation of the Error Matrix
Figures in the major diagonal cells of the error matrix denote the number of correctly classified pixels: the larger these cell values, the higher the classification accuracy. The off-diagonal cell values stand for the number of misclassified pixels in the form of errors of omission and errors of commission: the larger these cell values, and the more cells with a nonzero value, the lower the classification accuracy.
Classified          Reference Data →
Results ↓           Water     Forest    Pasture    Cropland
Water               A1        A2        A3         A4
Forest              B1        B2        B3         B4
Pasture             C1        C2        C3         C4
Cropland            D1        D2        D3         D4
TABLE 12.2  An Error Matrix for a Four-Cover Classification
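As a rough illustration of how such an error matrix is assembled, the sketch below tallies the (classified, reference) code pairs of Table 12.1 into a square matrix, following the row/column convention stated above. NumPy and the seven class codes appearing in Table 12.1 are assumed; the variable names are illustrative only.

```python
import numpy as np

# (classified, reference) code pairs for the 16 evaluation pixels of Table 12.1
pairs = [(1, 1), (5, 5), (7, 6), (2, 2), (3, 3), (2, 2), (5, 5), (5, 5),
         (5, 7), (4, 4), (1, 1), (2, 2), (6, 5), (6, 6), (7, 7), (1, 1)]

n_classes = 7                        # codes 1..7 occur in Table 12.1
matrix = np.zeros((n_classes, n_classes), dtype=int)
for classified, reference in pairs:
    # book convention: rows = classified results, columns = reference data
    matrix[classified - 1, reference - 1] += 1

print(matrix)
print("correctly labeled:", np.trace(matrix), "of", len(pairs))
```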
12.4.3 Interpretation of Error Matrix
A number of accuracy indices can be derived from the error matrix. In order to quantify the errors and calculate the classification accuracy, Table 12.2 needs to be expanded by inserting one column named column margin for the row sum and one row called row margin for the column sum. The row sum signifies the number of evaluation pixels selected for a particular land cover. Their selection and quantity have been covered already in Secs. 12.3.2 and 12.3.4, respectively. Representing the number of evaluation pixels that genuinely belong to each cover in the reference data, the column sum is used to calculate errors of omission. All the figures in a column, except the main diagonal ones, represent errors of omission (e.g., B1 + C1 + D1 for water). Errors of commission are derived from all the figures in a row, except the main diagonal cells (e.g., A2 + A3 + A4) (Table 12.3). These errors are calculated using the following equations:
Errors of commission = (A2 + A3 + A4)/Σr        (12.5)

Errors of omission = (B1 + C1 + D1)/Σc        (12.6)
where Σr (row sum) = A1 + A2 + A3 + A4 and Σc (column sum) = A1 + B1 + C1 + D1. The above calculation can be repeated for each of the mapped covers. Once errors of commission and errors of omission are determined, user's accuracy and producer's accuracy can be derived from them. User's accuracy is defined as the ratio of the main diagonal cell value to the sum of the same row. Alternatively, it can also be derived from the omission errors using the following equation:

User's accuracy = A1/Σc = 100 − omission errors (%)        (12.7)
Producer's accuracy is defined as the ratio of the figure in the main diagonal cell of a column to the sum of the same column (Fitzpatrick-Lins, 1981). It can also be obtained by subtracting commission errors from 100, or

Producer's accuracy = A1/Σr = 100 − commission errors (%)        (12.8)
Both user's accuracy and producer's accuracy are applicable to individual land cover classes. In practice it is far more common to report producer's accuracy than user's accuracy. The obtained producer's accuracy is usually very high, but this high accuracy can be misleading because its derivation ignores commission errors, a pitfall that should be avoided in accuracy evaluation. For this reason, user's accuracy should be reported as well (Story and Congalton, 1986).
Cover              Water            Forest           Pasture          Cropland         Row Sum (Σr)          Errors of Commission (%)
Water              A1               A2               A3               A4               A1 + A2 + A3 + A4     (A2 + A3 + A4)/Σr
Forest             B1               B2               B3               B4               B1 + B2 + B3 + B4     (B1 + B3 + B4)/Σr
Pasture            C1               C2               C3               C4               C1 + C2 + C3 + C4     (C1 + C2 + C4)/Σr
Cropland           D1               D2               D3               D4               D1 + D2 + D3 + D4     (D1 + D2 + D3)/Σr
Column sum (Σc)    A1 + B1 + C1 + D1  A2 + B2 + C2 + D2  A3 + B3 + C3 + D3  A4 + B4 + C4 + D4
Errors of omission (B1 + C1 + D1)/Σc  (A2 + C2 + D2)/Σc  (A3 + B3 + D3)/Σc  (A4 + B4 + C4)/Σc
TABLE 12.3  Calculation of Errors of Omission and Errors of Commission
The inflated high accuracy due to reporting of only producer's accuracy can be redressed by inclusion of the overall accuracy in the report. The overall accuracy of a classification is defined as the ratio of the sum of the main diagonal cells to the grand sum of all evaluation pixels [Eq. (12.9)]. This overall accuracy cannot be calculated by simply averaging all producer's accuracies or user's accuracies directly, unless the number of evaluation pixels is exactly the same for all classes. Otherwise, a weighted averaging is required for the derivation, which is cumbersome to achieve. A much more efficient and better alternative is to divide the sum of the figures in the main diagonal cells in the error matrix by the total number of pixels used in an evaluation or the grand total of all row sums or column sums [Eq. (12.9)]. Table 12.4 provides a practical example to illustrate the application of the formulas introduced previously in deriving all accuracy indicators, including producer's accuracy and user's accuracy. In this example, the overall accuracy is calculated at 88 percent.
Overall accuracy = (A1 + B2 + C3 + D4) / Σ(i=1 to 4) (Ai + Bi + Ci + Di)        (12.9)
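A minimal sketch of these calculations, assuming NumPy and using the counts of Table 12.4, is given below; the labeling of user's and producer's accuracy follows Eqs. (12.7) and (12.8).

```python
import numpy as np

# Error matrix from Table 12.4: rows = classified results, columns = reference
# data, in the order water, forest, pasture, cropland.
m = np.array([[28,  2,  0,  0],
              [ 1, 44,  5,  0],
              [ 0,  3, 53,  4],
              [ 0,  2,  7, 51]], dtype=float)

row_sum = m.sum(axis=1)              # Σr
col_sum = m.sum(axis=0)              # Σc
diag = np.diag(m)

commission = 100 * (row_sum - diag) / row_sum    # Eq. (12.5), per classified row
omission = 100 * (col_sum - diag) / col_sum      # Eq. (12.6), per reference column
producers = 100 - commission                     # Eq. (12.8)
users = 100 - omission                           # Eq. (12.7)
overall = 100 * diag.sum() / m.sum()             # Eq. (12.9)

print("commission (%):", commission.round(1))
print("omission   (%):", omission.round(1))
print("producer's (%):", producers.round(1))
print("user's     (%):", users.round(1))
print("overall    (%):", overall)                # 88.0, as in Table 12.4
```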
After the accuracy for each category is evaluated, a confidence interval for the evaluation should be provided. This confidence level is determined based on tests of different consumer risks. The confidence interval for the mean (μ) of a category is computed as:

P(−b < (X − μ)/(σ/√N) < b) = 1 − α        (12.10)
where P = probability of correctly labeling pixels in a classification
N = total number of samples
α = significance level
μ = population mean, derived from the sum of the population elements divided by the number of observations or population size
X = sample mean
σ = standard deviation
b = confidence limit

For a binomial distribution, σ² = μ(1 − μ); 100(1 − α) percent is the confidence level of the interval. For a 95 percent confidence interval, α = 0.05, at which b = 1.960 (determined from the normal distribution table). Equation (12.10) can then be rewritten as:

N(X² − 2Xμ + μ²) < 1.96² μ(1 − μ)        (12.11)
Classified Results         Reference Data
(across)                 Water (W)  Forest (F)  Pasture (P)  Cropland (C)  Row Total  Errors of Commission (%)*  Producer's Accuracy (%)
Water                       28          2           0            0           30           7 (2)                     93
Forest                       1         44           5            0           50          12 (6)                     88
Pasture                      0          3          53            4           60          12 (7)                     88
Cropland                     0          2           7           51           60          15 (9)                     85
Column total                29         51          65           55          200
Errors of omission (%)*   3 (1)     14 (7)     18 (12)        7 (4)
User's accuracy (%)         97         86          82           93
Overall accuracy = total correct/grand total = (28 + 44 + 53 + 51)/200 = 176/200 = 88%
* Figures in brackets refer to the sum of off-diagonal cells.
TABLE 12.4  A Practical Example Illustrating How to Calculate Errors of Commission, Errors of Omission, Producer's Accuracy, User's Accuracy, and the Overall Accuracy
The above equation is solved for the upper and lower limits of μ (Hord and Brooner, 1976).

Example: If N = 150 and X = 0.98, then 0.9429 < μ < 0.9931.
That is to say, with 95 percent confidence the true map accuracy lies in the range from 0.9429 to 0.9931; the sample accuracy of 98 percent was determined from a sample size of 150 pixels.

What has been missing from the discussion so far is the acceptable level of classification accuracy. For instance, what is the minimum acceptable accuracy for a cover? The minimum accuracy is defined as the lowest expected accuracy of a map at a user-selected consumer risk in an observed accuracy test result. As argued by Aronoff (1985), the minimum accuracy should be used as an index of classification accuracy. The answer to the previously posed question lies with the user as there is no universal standard for thematic accuracy, even though universal planimetric accuracy standards have been established for topographic maps. The user accepts the classification results at their own peril. After the generation of all accuracy indicators, hypotheses about image classification accuracy may be tested, which is essential in remote sensing applications that require the classification results to have a minimum accuracy as a general measure of quality control (Janssen and van der Wel, 1994). The sample size needed in the testing is determined by the risks and by the acceptable accuracy if it is predefined. The testing involves two hypotheses, null (H0) and alternative (H1). The null hypothesis states that the classification is achieved below the claimed accuracy. The alternative hypothesis states otherwise. Both hypotheses are tested at a significance level that defines the probability of erroneously rejecting H0 (type I error). The probability of incorrectly accepting H0 for situations that are valid under H1 is the type II error (β); its complement (1 − β) is the power of the test. These two types of error in testing the accuracy of land cover maps are termed consumer's risk (α) and producer's risk (β), respectively. The former means accepting a map of an unacceptable accuracy, which has a larger consequence for the user. The latter refers to rejection of a classification result of an acceptable accuracy.
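The two limits of μ in the worked example can be obtained by solving Eq. (12.11) as a quadratic in μ. A minimal sketch, assuming NumPy and the values used above (N = 150, X = 0.98), is given below.

```python
import numpy as np

# Solve Eq. (12.11), N*(X - mu)**2 < 1.96**2 * mu*(1 - mu), for the two limits
# of mu, using the worked example above (N = 150, sample accuracy X = 0.98).
N, X, b = 150, 0.98, 1.96

# Rearranged as a quadratic in mu: (N + b^2)*mu^2 - (2*N*X + b^2)*mu + N*X^2 = 0
coeffs = [N + b**2, -(2 * N * X + b**2), N * X**2]
lower, upper = sorted(np.roots(coeffs).real)

print(round(lower, 4), "< mu <", round(upper, 4))
# about 0.9429 < mu < 0.9932 (the text reports 0.9931)
```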
12.4.4 Quantitative Assessment of Error Matrix
The accuracy assessment discussed so far is unable to demonstrate how effectively the image classification has been achieved. The answer to this question lies in benchmarking the classification results against a commonly accepted standard. A handy benchmark standard is random assignment of pixel values to one of the land covers in the classification scheme. Comparison of an actual classification result with this standard is able to shed light on the effectiveness of the classification over randomness. The result of the comparison is measured by the Kappa (κ) test statistic that assesses interclassifier agreement. Kappa analysis can be used to statistically test whether
one error matrix, which is a discrete multivariate table, is significantly different from another (Bishop et al., 1975). It is a more discerning statistical parameter for comparing the accuracy of different classifiers (Fitzgerald and Lees, 1994), and offers better interclass discrimination than the overall accuracy measure. The process of deriving the Kappa value is known as the KHAT statistic, which is able to explore the impact of variables on classification accuracy and thereby provide a better understanding of the evaluation (Congalton and Mead, 1983). The estimate of Kappa, K̂, is a measure of the difference between the observed agreement of the classification result with the reference data as shown in an error matrix, and the agreement of chance matching with the same reference data that is shown in another matrix similar to the error matrix. It is calculated by subtracting the estimated contribution of chance agreement from the observed agreement, or

κ = (observed − expected)/(1 − expected)        (12.12)
where "observed" refers to the observed overall accuracy that is derivable from Eq. (12.9). The calculation of the "expected" accuracy resembles the more familiar χ² analysis. In the calculation it is assumed that the dataset has a χ² distribution. For computational convenience,
K̂ = [N Σ(i=1 to r) Xii − Σ(i=1 to r) (Σci × Σri)] / [N² − Σ(i=1 to r) (Σci × Σri)]        (12.13)
where r = number of rows and columns in the error matrix
Xii = number of observations in row i and column i
Σci = marginal total of column i
Σri = marginal total of row i
N = total number of observations

The value of K̂ ranges from −1.0 to 1.0. As illustrated in Table 12.5, K̂ = 1.00 if there are values only in the main diagonal cells. This represents a situation where the classification is perfectly superior to random assignment (Table 12.5). On the other extreme, K̂ = −1.00 if all the values fall in the opposite main diagonal cells. If every cell has exactly the same value, then K̂ = 0.00, an outcome caused by equal assignment of pixels into every information class. A practical example is provided in Fig. 12.5 to illustrate how the calculation is done. This example is based on the classification results presented in Table 12.4. In this example the expected accuracy is calculated at 26.6 percent, against the observed accuracy of 88 percent as shown in Table 12.4.
Therefore, the κ value is calculated at 83.7 percent. A larger κ value suggests a higher effectiveness of the classification. In other words, the classification is 83.7 percent more effective than random assignment of land cover identities to pixels.
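A minimal sketch of this calculation, assuming NumPy and the error matrix of Table 12.4, is given below; it reproduces the observed accuracy of 88 percent, the expected accuracy of about 26.6 percent, and a κ of about 0.837.

```python
import numpy as np

# KHAT estimate of kappa, Eq. (12.13), for the error matrix of Table 12.4.
m = np.array([[28,  2,  0,  0],
              [ 1, 44,  5,  0],
              [ 0,  3, 53,  4],
              [ 0,  2,  7, 51]], dtype=float)

N = m.sum()
observed = np.trace(m) / N                                  # overall accuracy, 0.88
expected = (m.sum(axis=0) * m.sum(axis=1)).sum() / N**2     # chance agreement, ~0.266
kappa = (observed - expected) / (1 - expected)              # Eq. (12.12)

print(round(observed, 3), round(expected, 3), round(kappa, 3))   # 0.88 0.266 0.837
```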
12.4.5 An Example of Accuracy Assessment
Provided in Table 12.6 is an example of accuracy assessment for a supervised classification of 12 land covers. A total of 560 evaluation pixels were selected using stratified random sampling. The number of evaluation pixels selected for a given cover ranges from 30 (for shadow) to 60 (for residential and deep water). The selection of these pixels has taken into account their spatial distribution and prevalence. The genuine identity of these pixels was determined from three
FIGURE 12.5 A worked example illustrating the calculation of κ. Contained in the body of the table are products of row and column marginals in Table 12.4. (Figures represent the main diagonal values of the error matrix in Table 12.4.)
sources of reference data: the color composite of the original image, aerial photographs taken only 2 years before the satellite data were recorded, and field visits. Also included in the error matrix are row and column sums, as well as user's accuracy and producer's accuracy. The highest producer's accuracy is 98.3 percent for water with the lowest being 6.7 percent for industrial. This low accuracy is attributed to its heavy confusion with murky water. This classification is achieved at an overall accuracy of 76.8 percent, with the κ value being 0.745.
12.4.6 Comparison of Error Matrices
Occasionally, it may be necessary to compare error matrices. For instance, the same image is classified with various classifiers to assess the effectiveness of different classification algorithms. The same classification may be assessed for its accuracy using different numbers of evaluation pixels or against evaluation pixels that have been collected using different sampling strategies. Comparison of multiple error matrices is able to reveal which classifier is the most effective and what is the most appropriate number of evaluation pixels. Different error matrices are not directly comparable with one another by simply examining their κ values because the number of evaluation pixels used likely varies with assessment. How should the impact of this variable be accounted for? Which classifier performs better for a particular class? The answer to these questions requires
TABLE 12.6  A Practical Example of Accuracy Assessment for Classified Results
standardization of all error matrices prior to the calculation of their κ values. Standardization of the error matrix is achievable through the Margfit technique, in which the row sum or the column sum in the margin is made to equal a predetermined value (hence the name Margfit) through iterative proportional adjustments (Congalton and Green, 1999). A convenient predetermined value is 1, which is easily achieved by dividing all the values in a row by the sum of the same row. In this way the absolute value representing the number of pixels in the error matrix is converted into a relative value or percentage. Through this standardization process, the actual number of pixels used in different accuracy assessments is no longer relevant. The overall classification accuracy is then derived by simply adding up the main diagonal cell values and then by dividing the sum by the number of information classes in the error matrix (Table 12.7). The overall classification accuracy shown in Table 12.7 is calculated at 88.65 percent after this standardization procedure. This accuracy level is slightly higher than its counterpart of 88 percent derived using the absolute value. Arguably, this normalized accuracy is better than the former overall accuracy as it incorporates all the cell values (i.e., both row sum and column sum). There are two ways of standardizing the error matrix, dividing the table by the row sum or dividing the table by the column sum. If divided by the row sum, then the results can be used to determine the user's accuracy. If divided by the column sum, then the results can be used to determine the producer's accuracy. It does not matter which sum is used to standardize the error matrix. However, the manner of standardization must remain consistent across all error matrices when several matrices are to be compared with each other. As shown in Table 12.7, the κ value is calculated at 84.87 percent using the values
Cover       Water     Forest    Pasture    Cropland
Water       0.933     0.067     0.000      0.000
Forest      0.020     0.880     0.100      0.000
Pasture     0.000     0.050     0.883      0.067
Cropland    0.000     0.033     0.117      0.850
Normalized accuracy (i.e., average of main diagonal values) = (0.933 + 0.880 + 0.883 + 0.850)/4 = 0.8865
Expected accuracy = 4/16 = 0.25
κ = (0.8865 − 0.25)/(1 − 0.25) = 0.8487
TABLE 12.7  Normalized Error Matrix for the Data Shown in Table 12.4
standardized by the row sum, highly similar to the 83.7 percent obtained using the actual values. This κ value is directly comparable among all error matrices regardless of how many evaluation pixels are used in the assessment.
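A minimal sketch of the row-sum standardization described above, assuming NumPy and the counts of Table 12.4, is given below. A full Margfit standardization would iterate the proportional adjustment over both row and column margins; the simple row division shown here is enough to reproduce the normalized accuracy of 0.8865 and the κ value of about 0.8487.

```python
import numpy as np

# Row-normalize the Table 12.4 error matrix, then derive the normalized overall
# accuracy and kappa as in Table 12.7.
m = np.array([[28,  2,  0,  0],
              [ 1, 44,  5,  0],
              [ 0,  3, 53,  4],
              [ 0,  2,  7, 51]], dtype=float)

norm = m / m.sum(axis=1, keepdims=True)       # each row now sums to 1
n = norm.shape[0]

normalized_accuracy = np.trace(norm) / n      # (0.933+0.880+0.883+0.850)/4 = 0.8865
expected = n / n**2                           # 4/16 = 0.25, as in Table 12.7
kappa = (normalized_accuracy - expected) / (1 - expected)

print(norm.round(3))
print(round(normalized_accuracy, 4), round(kappa, 4))   # 0.8865 0.8487
```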
References
Aronoff, S. 1984. "An approach to optimized labeling of image classes." Photogrammetric Engineering and Remote Sensing. 50(6):719–727.
Aronoff, S. 1985. "The minimum accuracy value as an index of classification accuracy." Photogrammetric Engineering and Remote Sensing. 51(1):99–111.
Bishop, Y., S. Fienberg, and P. Holland. 1975. Discrete Multivariate Analysis: Theory and Practice. Cambridge, MA: MIT Press.
Chrisman, N. R. 1991. "The error component in spatial data." In Geographical Information Systems: Principles and Applications, ed. D. J. Maguire, M. F. Goodchild, and D. W. Rhind, 165–174. New York: John Wiley & Sons.
Congalton, R., and R. Mead. 1983. "A quantitative method to test for consistency and correctness in photointerpretation." Photogrammetric Engineering and Remote Sensing. 49(1):69–74.
Congalton, R., R. Oderwald, and R. Mead. 1983. "Assessing Landsat classification accuracy using discrete multivariate analysis statistical techniques." Photogrammetric Engineering and Remote Sensing. 49(12):1671–1678.
Congalton, R., and K. Green. 1999. Assessing the Accuracy of Remotely Sensed Data: Principles and Practices. Boca Raton, FL: CRC Press.
Fitzpatrick-Lins, K. 1981. "Comparison of sampling procedure and data analysis for a land-use and land-cover map." Photogrammetric Engineering and Remote Sensing. 47(3):343–351.
Fitzgerald, R. W., and B. G. Lees. 1994. "Assessing the classification accuracy of multisource remote sensing data." Remote Sensing of Environment. 47(3):362–368.
Gao, J. 1999. "Evaluation of SPOT HRV data in mapping mangrove forests in a temperate zone." GeoCarto International. 14(3):43–50.
Ginevan, M. E. 1979. "Testing land use map accuracy: Another look." Photogrammetric Engineering and Remote Sensing. 45(10):1371–1377.
Gordon, H. R. 1978. "Removal of atmospheric effects from satellite imagery of the ocean." Applied Optics. 17(10):1631–1636.
Hay, A. M. 1979. "Sampling designs to test land-use map accuracy." Photogrammetric Engineering and Remote Sensing. 45:529–533.
Hord, R. M., and W. Brooner. 1976. "Land-use map accuracy criteria." Photogrammetric Engineering and Remote Sensing. 42(5):671–677.
Janssen, L. L. F., and F. J. M. van der Wel. 1994. "Accuracy assessment of satellite derived land-cover data: A review." Photogrammetric Engineering and Remote Sensing. 60(4):419–426.
Story, M., and R. G. Congalton. 1986. "Accuracy assessment: A user's perspective." Photogrammetric Engineering and Remote Sensing. 52(3):397–399.
USGS. 1990. The Spatial Data Transfer Standard Draft, January 1990. Reston, VA: USGS.
van Genderen, J. L., and B. F. Lock. 1977. "Testing land-use map accuracy." Photogrammetric Engineering and Remote Sensing. 43(9):1135–1137.
CHAPTER 13
Multitemporal Image Analysis
The image processing steps discussed in the preceding chapters have covered the complete procedure of image analysis in that the input data have been converted into useful information with the quality of the produced results assessed. Nonetheless, the obtained results are valid at the time of satellite data recording. They could become obsolete as the Earth's surface is in a state of change. For instance, forests are logged when trees reach maturity. Crops are harvested seasonally. New urban areas sprawl in the vicinity of existing built-up areas. In light of catastrophic events (e.g., flood, fire, earthquake, and tsunami), extensive and drastic changes can take place over a matter of days or even hours (Fig. 13.1). In order to bring the results up-to-date, recent images should be analyzed and compared with historic ones to identify areas of change before reasons behind the change can be explored further. This is commonly known as multitemporal remote sensing, in which results obtained from remote sensing data recorded over the same geographic area at different times are compared with each other. This kind of multitemporal data analysis is essential in detecting land cover change. Change analysis refers to a process of monitoring the state of an object or phenomenon in a given geographic area longitudinally. It can be carried out either spatially or aspatially, each having its own unique requirements and features. In this chapter, the requirements of change analysis are presented first. The various techniques of change analysis are presented next. This is followed by accuracy analysis of the detected results. Finally, methods to effectively visualize the detected changes are discussed and proposed.
FIGURE 13.1 Drastic changes in land cover can take place over a short period of time in light of a natural disaster such as a tsunami. (a) The Gleebruk village of Aceh on April 12, 2004 before the Boxing Day tsunami; (b) the same area on January 2, 2005 soon after the devastating tsunami. (Copyright DigitalGlobe.) See also color insert.
13.1 Fundamentals of Change Analysis
13.1.1 Conceptual Illustration
Change analysis is in essence a spatial comparison of two or more land covers of the same geographic area produced from remotely sensed data that are recorded at different times (Fig. 13.2). Any spatial variation in the boundaries of the land cover parcels signifies changes taking place during the interval. The input land covers may be produced from different sources (e.g., airborne and spaceborne) of remote sensing material. Preferably, all the covers should be derived from data originating from the same sensor by the same analyst, even though different spaceborne sensors are acceptable so long as their spatial resolution does not vary widely. This effectively guarantees that the impact of the factors that contribute to artificial changes is minimized. If the materials are from different sources or from different sensors, the change detection procedure may become more complicated in that additional steps may be essential in unifying the inputs prior to change analysis. The topic of data preparation for change detection is so complex that it will be covered separately in Secs. 13.1.2 and 13.1.3. The overlay of one land cover on top of another may be implemented in an image analysis system if both of the input layers are in raster format, or in a geographic information system (GIS) if they are in vector format. If the two covers differ in their data format, then a conversion in data format from one to another must precede change analysis.
FIGURE 13.2 The concept of change detection in image analysis. Land covers at different times are overlaid with one another to reveal whether the boundaries of land cover parcels have shifted. Any shift is construed as manifestation of a change. (a) Spatial illustration; (b) conceptual illustration.
13.1.2 Requirements of Change Analysis
Change analysis is meaningful only when changes on the ground have taken place in the interim between recording the remote sensing data from which the input land covers are derived. There is no requirement as to the minimum temporal interval between these covers. It can be as short as hours and days, or as long as decades because some changes can take place within a matter of hours (e.g., fire damage and flooding) and other changes occur at a temporal scale of months and even years (e.g., deforestation and urban sprawling). Before change analysis can be carried out successfully, the input land covers must meet five requirements:

• First, all input layers must cover an identical geographic area. Any discrepancy in the spatial extent of the area under study between two sets of input will lead to artificial changes. Namely, the detected change between the two covers results from the changed spatial extent instead of genuine change in land cover. The area of study can be unified by intersecting the same boundary file with all the input layers.

• Second, if all the input layers consist of raw spectral bands, they must be recorded over the same spectral wavelength ranges at the same quantization level so that the same ground feature has the same radiometric response in the satellite images. This requirement is best fulfilled by using the same spectral band from the same sensor.

• Third, if the input is made up of raw spectral bands, they must have the same spatial resolution. This requirement also applies to raster land cover maps classified from raw spectral bands. In this case it is permissible for these bands to originate from different sensors that may have a different number of bands, different ground cover per scene, even though not all spectral bands are always useable in a change analysis. Spatial resolution still matters if the change analysis is carried out in raster format with raster data. However, spatial resolution becomes irrelevant if the land cover maps are to be vectorized and the change detection is to be undertaken in vector format. If the input layers have differing spatial resolutions, they have to be unified through an additional step of resampling during image geometric rectification (see Sec. 5.5.4). However, nonuniformity in spatial resolution may exert an impact on the accuracy of the detected results, a topic to be covered in depth in Sec. 13.6.

• Fourth, if the input layers are classified land cover maps, they must be classified in accordance with the same classification scheme, namely, the same number of information classes at the same detail level. These classes must also be defined identically.
• Finally, no matter whether change analysis is implemented in raster or vector format, all input covers must have the same coordinate system and conform to the same map projection before they can be overlaid with each other spatially (a minimal pre-flight check along these lines is sketched after this list).
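A minimal pre-flight check of these five requirements is sketched below; the metadata fields and their names are hypothetical and serve only to illustrate the kind of test an analyst might run before overlaying two inputs.

```python
# A minimal pre-flight check of the requirements listed above, applied to two
# hypothetical metadata records describing the input layers.  The field names
# are assumptions for illustration, not a real library's API.

REQUIRED_MATCHES = ("extent", "spectral_band", "resolution",
                    "classification_scheme", "projection")

def check_change_inputs(layer_a, layer_b):
    """Return a list of requirements the two layers fail to satisfy."""
    return [field for field in REQUIRED_MATCHES
            if layer_a.get(field) != layer_b.get(field)]

if __name__ == "__main__":
    a = {"extent": "tile_42", "spectral_band": "red", "resolution": 20,
         "classification_scheme": "level-2", "projection": "NZMG"}
    b = dict(a, resolution=30)
    print(check_change_inputs(a, b))      # ['resolution'] -> resample before overlay
```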
13.1.3 Procedure of Change Analysis
There are many steps involved in a change analysis (Fig. 13.3). Prior to the actual detection, a number of preparatory steps may have to be
FIGURE 13.3 Major steps involved in a typical change analysis process in which both aerial photographs and satellite images are used. If only one type of data is used in a detection, there is only one set of steps prior to union. Note: The last three steps are not sequential—each can be carried out independently of the other two. (Source: Modified from Gao and Skillcorn, 1996.)
undertaken to transform the data into the right format. The number of preparatory steps required varies with the nature of the data and their sources. In general, change analysis based on classified land cover maps requires less preparation than raw data–based detection. One important preprocessing of raw data is radiometric calibration. Images obtained from different sensors or from the same sensor in different seasons must be calibrated to account for the variations in the atmospheric effects and solar radiance. The images may be smoothed first to filter out random noise. Since only one spectral band from an image is allowed in a change analysis, the most informative band should be selected. It can be determined from the main diagonal values in the variance-covariance matrix [see Eq. (6.15)] of all spectral bands. Apart from the original bands, the most informative band can be created by projecting the content of all bands into a few components through principal component analysis (PCA). The most informative component image is then used in a change analysis. If the data from different sensors have varying quantization levels, then they need to be unified by reducing the higher quantization level to the lower one. Their spatial resolution, if different, has to be unified as well if the change analysis is to take place in raster format. If the covers to be used in the change analysis are classified images, the digital number (DN) of pixels may be recoded before change analysis can take place in raster format. Recoding is done to avoid the situation where a few subtractions lead to the same outcome. After all preparatory work, change analysis can be performed simply by overlaying all the layers on top of each other. Potential changes can be explored from the change detection layer. These change analysis results are then represented in a matrix form and their spatial distribution visualized. Sometimes it is also necessary to assess the accuracy of the detected changes. The last three steps of numeric representation, results visualization, and change detection accuracy assessment are not necessarily sequential. In fact, each of them can be carried out independently of the other two or concurrently with them.
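A minimal sketch of this band-selection step, assuming NumPy and an image stored as a (bands, rows, columns) array, is given below; the variance-covariance matrix is computed from the flattened bands and the band with the largest main-diagonal value is returned.

```python
import numpy as np

# Pick the most informative band as the one with the largest variance, i.e. the
# largest main-diagonal entry of the variance-covariance matrix of the bands.
# 'image' is assumed to be a (bands, rows, cols) array of pixel values.

def most_informative_band(image):
    flat = image.reshape(image.shape[0], -1).astype(float)   # one row per band
    cov = np.cov(flat)                                       # variance-covariance matrix
    return int(np.argmax(np.diag(cov)))                      # index of max-variance band

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo = rng.integers(0, 256, size=(4, 100, 100))          # synthetic 4-band image
    print("most informative band:", most_informative_band(demo))
```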
13.2 Qualitative Change Analysis
There are several change analysis methods for use with remotely sensed data, ranging from univariate image differencing and postclassification comparison to change vector analysis (Singh, 1989). Apart from these quantitative change analysis methods, there are also qualitative ones. In this section the qualitative methods will be discussed first, followed by the quantitative ones in the next two sections. Qualitative change analysis is virtually a process of highlighting the location of changes through stacking at least two images obtained at different times on top of each other. There are two strategies by which
the changes can be revealed: visual overlay of images and image compositing.
13.2.1 Visual Overlay
In this method one of the images is displayed first, either as a color composite or in gray scale if the original image contains only a single band. Another image is displayed on top of the existing one in the same viewer if both have the same projection and are georeferenced to the same coordinate system. The second image can, however, come from different sensors, have different spatial resolutions, and involve a different number of spectral bands (a maximum of three bands can be used at a time though) than the first image. Again, this image can be displayed in color or shades of gray, depending upon the number of bands available. Since both images are in raster format, only the second image on the top is visible in the viewer. However, the bottom image can be made partially visible by displaying a portion of the top image. This can be easily accomplished through a special button, such as the "swipe" function in ERDAS (Earth Resources Data Analysis System) Imagine. The top image can be swiped either from left to right or from top to bottom (Fig. 13.4). At the border of the two images, any spatial discontinuity in ground objects manifests change. This method is advantageous in that multiple bands (a maximum of three) from
FIGURE 13.4 An image display method for carrying out qualitative change analysis. In this example the IKONOS-2 image of February 4, 2001 is displayed on top of the historic aerial photograph of the same area taken in 1960. In the middle of the display is a discontinuity in the shoreline position, suggesting that coastal erosion has taken place here during the interval. From this display the magnitude of beach erosion can be measured quantitatively on the screen. See also color insert.
the same date can be used in the display. Subtle variation in some changes can be shown very clearly. However, this method can reveal only general information on where a change (e.g., coastal erosion) has taken place, from which it is possible to quantitatively measure the change on the screen. Nevertheless, it can neither reveal the entire spatial extent of the change nor produce a permanent record of the change. The inability to show the whole extent of change can be overcome by displaying the top image at a certain transparency level, such as 50 percent. In this way the entire underlying image is also visible, enabling change in the overlaid viewer to be revealed. The advantage of this display is that both images are fully visible instead of only partially. The disadvantage is that the change may not be distinctively rendered and hence may be difficult to perceive. Regardless of the mode of display, the two images must be precisely registered either to a common geographic coordinate system or with each other spatially. Any geometric inaccuracy of registration will be translated into pseudo changes.
13.2.2 Image Compositing
This method is very similar to visual overlay of images in that all images used in change detection are stacked on top of each other to form a composite one. The difference is that spectral bands from both times are used to actually create a false color composite rather than a display on the screen. Since only a maximum of three bands can be visualized in producing one color composite, the number of spectral bands from each time cannot exceed two. Once the three bands are determined, the composite image can be visualized in several forms. Each individual band can be assigned one of the three primary colors (blue, green, red). The change is maximized through a unique band-color combination that can capture the viewer's attention easily, even though this may require repeated experiments with various band-color filter combinations. As shown in Fig. 13.5, areas of change have a color different from areas of no change. The changed area in the sea appears as bright saturated red. Near shore the changed area has a color of darkish bright red, in contrast to unchanged areas that have a color of green. However, this color pattern varies with the color-band combination. The color of both changed and unchanged covers will be different if a different color filter is assigned to a given band. The overlay of the same spectral band from both datasets, one assigned the color of blue and another the color of red, is an effective means of identifying change (Howarth and Boasson, 1983). Compared with image visual overlay, image compositing allows the entire spatial extent of areas of change to be revealed and permanently stored. It has the added advantage of allowing more than one spectral band from each time to be used in the detection. A maximum of two bands is allowed in one of the images. The flexibility in selecting these
FIGURE 13.5 The image compositing method of highlighting change. This color composite is formed by a historic aerial photograph (red) with bands 1 (green) and 2 (blue) of an IKONOS image. The red color in this composite represents change. The darkish bright red near the coast in the middle of the composite shows change from land to water (i.e., coastal erosion). See also color insert.
two bands enriches the information content of the color composite and facilitates the perception of the change in the display. However, it is also difficult, if not impossible, to determine the specific nature of change at a given location in case of multiple changes of a differing nature. Also, it is extremely difficult to quantify the observed change from the composite image, the same as with the overlay display. The solution to this difficulty lies with the quantitative methods of change analysis.
13.3 Quantitative Change Analysis
Quantitative change analysis is a kind of spatial analysis that can directly yield quantitative information on changes. It can be performed with raw remote sensing bands or thematic land cover maps classified from them. For raw bands, the quantitative information is rather general. For instance, it is available for all types of change, but not for a specific type of change. There are two methods by which this kind of quantitative change analysis can be implemented: image differencing and image ratioing. Image differencing involves subtraction of one image from another, whereas image ratioing refers to division of one image by another. These two methods operate on the principle that pixels of no change have the same reflectance value in both images. Their resultant pixel value in the change layer is quite distinct
from that of changed pixels. These two methods of analysis will be covered in this section. Another quantitative method of land cover change detection is called postclassification comparison. It is so complex that it will be covered in a separate section (Sec. 13.4).
13.3.1 Spectral Differencing
After the two images, either raw spectral bands or processed bands, of the same area are coregistered precisely or referenced to the same ground coordinate system, one image is subtracted from another. The concept of change analysis based on image spectral differencing is expressed mathematically as:
ΔDN(i, j)k = DN(i, j)k,t1 − DN(i, j)k,t2 + b        (13.1)
where ΔDN(i, j)k = difference in DN between the two input bands at location (i, j) in spectral band k
DN(i, j)k,t1 = pixel value in spectral band k at the same location at time 1
DN(i, j)k,t2 = pixel value in spectral band k at the same location at time 2
b = a bias used as an offset to prevent the occurrence of negative pixel values in the difference image

Since change can occur in both directions (e.g., either forward or backward), it is uncertain as to which image should be subtracted from which. A bias b of 127 is usually added to the difference indiscriminately in order to guarantee that the resultant difference image always contains positive pixel values. This method of analysis is advantageous in that it is relatively easy to understand and to implement. Theoretically, all pixels whose difference is 0 (i.e., a value equal to the bias b in the resultant image) manifest areas that have not changed during the study period. All the remaining pixels are considered areas of change. This method of analysis involves only subtraction with minimal human intervention. So long as the two images have been sampled to the same ground resolution and projected to the same coordinate system, the subtraction can be carried out very quickly. The results of change detection are not subject to the inaccuracy inherent in classified land cover maps. In spite of these advantages, implementation of this change analysis method is complicated by a number of factors in practice:

• First, this method is limited in that it fails to reveal the nature of a detected change (e.g., the class from which a land cover has changed) because the difference is sometimes indistinguishable between two subtractions. For instance, a difference of 10 could be derived from 235 and 245, and also from 187 and 197. However, this should not be a problem if the
change is restricted to only one category of cover, and thus foreknown, such as the change associated with bushfire and flooding. In the former case the nature of change is from vegetated land to barren land. In the latter case the change is from nonwater to water areas.

• Second, direct use of raw spectral data in change analysis makes the detected change highly susceptible to radiometric variations caused by illumination conditions and seasonality. Solar elevation varies with season. The same terrain will have a shorter shadow length when the solar elevation is higher. Such variations may not represent a genuine change in land cover. In order to avoid this problem, it may be necessary to radiometrically calibrate the images before they are used in image differencing. Radiometric calibration can be based on histograms of certain covers that have not changed in both images. Precise calibration requires the use of topographic data or digital elevation models (DEMs). Associated with the slight variation in pixel value is the identification of the threshold for genuine changes. It is difficult to set a precise threshold for change/no change. One potential solution to this uncertainty is to produce a histogram of the resultant change image. Contained in the histogram are break points that may serve as the critical thresholds for change/no change. Another method of differentiating areas of change/no change is to create a change detection mask on the basis of the spectral information of the raw images. This method works successfully only if the threshold is appropriately selected to separate pixels of change from pixels of no change (Jensen et al., 1993).
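A minimal sketch of the differencing and thresholding steps described above, assuming NumPy, is given below; the bias of 127 follows the text, whereas the two-standard-deviation threshold is only one of the options mentioned (a break point read off the histogram of the difference image would be an alternative).

```python
import numpy as np

# Spectral differencing, Eq. (13.1): subtract the co-registered band at time 2
# from the band at time 1, add a bias of 127, then flag change where the
# difference departs from the bias by more than a chosen threshold.

def difference_change(band_t1, band_t2, bias=127, n_std=2.0):
    diff = band_t1.astype(float) - band_t2.astype(float) + bias
    threshold = n_std * diff.std()
    change_mask = np.abs(diff - bias) > threshold
    return diff, change_mask

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t1 = rng.integers(0, 256, size=(50, 50))
    t2 = t1.copy()
    t2[:10, :10] += 60                       # simulate a changed patch
    _, mask = difference_change(t1, t2)
    print("changed pixels flagged:", int(mask.sum()))
```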
13.3.2 Spectral Ratioing
Also known as image division, spectral image ratioing is mathematically underpinned by the following equation:

DN(i, j)k = DN(i, j)k,t1 / DN(i, j)k,t2 + b        (13.2)
In this equation all terms are defined as in Eq. (13.1). In order to prevent the denominator from becoming 0, usually a small value (e.g., 1) is added to DN(i, j)k,t2 in practice. In this equation the image at time 1 is divided by that at time 2. This relationship can be reversed. In other words, it does not matter which image is divided by which as all resultant pixel values are expressed as ratios initially. Some image analysis systems may rescale them to a range from 0 to 255 through a gain and an offset.
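A minimal sketch of the ratioing operation, assuming NumPy, is given below; the value of 1 added to the denominator and the rescaling to the 0–255 range with a gain follow the description above.

```python
import numpy as np

# Spectral ratioing, Eq. (13.2): divide the band at time 1 by the band at
# time 2, adding 1 to the denominator to avoid division by zero, and rescale
# the ratios to 0-255 with a gain, as some image analysis systems do.

def ratio_change(band_t1, band_t2):
    ratio = band_t1.astype(float) / (band_t2.astype(float) + 1.0)
    gain = 255.0 / ratio.max() if ratio.max() > 0 else 1.0
    return ratio, (gain * ratio).astype(np.uint8)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    t1 = rng.integers(0, 256, size=(50, 50))
    t2 = t1.copy()                           # unchanged pixels give ratios near 1
    ratio, scaled = ratio_change(t1, t2)
    print("median ratio:", round(float(np.median(ratio)), 2))
```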
Image ratioing or division is extremely similar to image differencing conceptually. For instance, it also involves the use of two spectral bands, one from each of the multitemporal images recorded at different times, even though many more spectral bands may be available from either of the two sets of imagery. Covering the same geographic area, these bands have been unified in their projection and spatial resolution. Image ratioing differs from image subtraction in that which image is divided by which exerts no impact on the ratioed outcome. Besides, the results are expressed as a ratio instead of a difference. All those areas that have not changed in the interim will receive a value of 1 in the divided image while all changed pixels will have a value other than 1. In addition, image ratioing has the advantage of being able to compensate, to a certain degree, for topographic shadow and changed illumination conditions (for details, refer to Sec. 6.5.1). All other advantages and disadvantages of image differencing apply to image ratioing. For instance, both images must be recorded under an identical solar illumination. Since seasonal differences tend to mask real changes, images from the same time of year should be used (Quarmby and Cushnie, 1989). Because such ideal conditions are rarely met in practice, it is critical to determine an appropriate threshold to distinguish pixels of change from pixels of no change, both of which should be normally distributed. Finally, this method is unable to take advantage of the rich spectral information of the input images. Although each of the input images has multispectral bands as is the norm with satellite imagery, only one spectral band from each time can be used in a detection and the information contained in other spectral bands is utterly wasted in the change detection analysis. This sharply reduces the authenticity and accuracy of the detected change. This deficiency may be partially overcome with the use of a vegetation index that is derived from two spectral bands.
13.3.3 NDVI-Based Change Analysis
Although it is rather common to use a single band in spectral differencing change analysis, more spectral bands may be taken advantage of in the detection by deriving this input image from two spectral bands. Use of more spectral bands is preferable to a single band as, theoretically, more bands contain more spectral information. The use of more spectral information is conducive to improvement in the change detection accuracy. A common strategy of deriving a new image from two bands is called the normalized difference vegetation index (NDVI) (see Sec. 6.5.2). The newly derived NDVI band can be treated as a raw band, and NDVI-based change analysis can be implemented in NDVI image differencing and NDVI change classification. NDVI differencing is the same as image ratioing, except that the data used in the division are not raw pixel values but NDVI values. Unlike the subtraction of raw pixel values, an NDVI differencing image is meaningful and able to provide insights into the nature of
the detected change. That is to say, any change in the NDVI value is indicative of the change in ground biomass. Therefore, it is a better alternative to image ratioing in detecting vegetation-related changes. In fact, this method is much more accurate than other change detection methods in detecting vegetation changes (Michener and Houhoulis, 1997). In deriving NDVI, the spectral bands captured in the vicinity of red and infrared wavelengths should be used as they are more sensitive to change than other wavelengths. Since NDVI images are derived separately from respective temporal data, it is imperative to use the same spectral bands in the derivation for both sets of input data. One potential problem with NDVI-based image differencing is the difficulty in setting the appropriate threshold for significant change, such as change from full vegetation to no vegetation in mapping the burning intensity of a forest fire. Strategies for determining this threshold include combination of means with standard deviation, and empirical selection from the histogram of the ratioed NDVI result. The second method of implementation is temporal change classification of NDVI data, preferably using supervised classification rather than unsupervised classification as NDVI data are continuously changing. NDVI temporal change classification is very similar to spectral temporal change classification in that one of the input bands is substituted by NDVI. Thus, it shares the same conceptual foundation and potential disadvantages as spectral temporal change classification. NDVI temporal change classification is able to produce more accurate results in detecting vegetation changes caused by extensive flooding (Michener and Houhoulis, 1997). As a matter of fact, NDVI image differencing produced the highest accuracy among five change detection methods (spectral temporal change classification, temporal change classification based on NDVI, PCA of spectral data, PCA of NDVI data, and NDVI image differencing), and should be adopted for vegetation-related change detection (Garcia-Haro et al., 2001).
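A minimal sketch of NDVI differencing, assuming NumPy and co-registered red and near-infrared bands from the two dates, is given below; the one-standard-deviation threshold is only an illustrative choice.

```python
import numpy as np

# NDVI-based differencing: compute NDVI = (NIR - red) / (NIR + red) for each
# date from the same red and near-infrared bands, then difference the two NDVI
# layers; negative differences indicate a loss of biomass.

def ndvi(red, nir):
    red, nir = red.astype(float), nir.astype(float)
    return (nir - red) / (nir + red + 1e-6)       # small term avoids divide-by-zero

def ndvi_change(red_t1, nir_t1, red_t2, nir_t2, n_std=1.0):
    diff = ndvi(red_t2, nir_t2) - ndvi(red_t1, nir_t1)
    return diff, np.abs(diff) > n_std * diff.std()

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    red1, nir1 = rng.integers(20, 80, (40, 40)), rng.integers(120, 200, (40, 40))
    red2, nir2 = red1.copy(), nir1.copy()
    nir2[:8, :8] = 40                             # simulate vegetation loss
    _, mask = ndvi_change(red1, nir1, red2, nir2)
    print("pixels flagged as changed:", int(mask.sum()))
```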
13.4 Postclassification Change Analysis
In postclassification change analysis, land cover changes are detected from land cover maps that have been classified from remotely sensed data. It differs from change detection based on raw data in that two independently classified land cover maps are compared with each other following image classification. In these classified maps pixel values are codes corresponding to specific land covers defined by the analyst. Hence, it does not matter how the land covers are coded, so long as the codes are comprehensible to the analyst. These codes have a greatly reduced range in comparison to the raw pixel values that represent the amount of energy reflected by or emitted from the target. The number of pixel values equals the number of information
classes produced in a classification. Owing to the reduction in the possible number of pixel values, the process and complexity of change detection based on classified images are considerably simplified. This reduced range is also conducive to revealing the nature of changes. However, in order to make the two independently classified results fully compatible, the number of information classes and their nature must be kept identical before they can be used in a postclassification change analysis (refer to Fig. 13.6), even though the creation of two comparable classifications may be problematic for complicated landscapes (Weismiller et al., 1977), especially if the data used have a diverse range of spatial resolutions. Postclassification change detection can be implemented aspatially or spatially, each having its own unique features.
13.4.1 Aspatial Change Detection
Aspatial, or nonspatial, change detection involves numeric comparison of the detected areas at the categorical level (Lo and Shipman, 1990). This kind of change detection is accomplished by comparing the statistics of the same cover in two separate classifications. The detection is based on the arithmetic operation of subtraction: the amount of land cover change is ascertained by subtracting the statistics of a land cover in one of the results from those in the other (Table 13.1). Any discrepancy between the values in columns 3 and 4 of the table for the same row represents change. Depending upon the order of subtraction, a positive difference means an increase in a land cover, whereas a negative difference means a decrease in coverage. This method of change detection has the strength that the results are not directly subject to the spatial resolution of the original satellite data used in deriving the figures in columns 3 and 4 of Table 13.1, even though resolution exerts an indirect impact to a much lesser extent. This issue is so complex that it is discussed under a separate heading in Sec. 13.6.1. Although easy to comprehend and implement (a brief numeric illustration is given after the following list), nonspatial change detection has four critical limitations:
• First, the difference fails to reveal the dynamics of change. For instance, in Table 13.1 it is unknown to which cover a change (e.g., pasture) has turned, or from which cover a change (e.g., residential) has originated, unless both classified maps contain only two information classes. Therefore, this kind of change detection is able to show the quantity of change at the categorical level only.
• Second, the location where the changes have taken place is unknown and cannot be visualized.
• Third, it is subject to classification inaccuracy. Any classification inaccuracy introduced into either of the input maps propagates to the detected changes. At best, the accuracy of the change analysis is equivalent to the product of the classification accuracies of the two individual land cover maps.
• Finally, it is very time consuming to carry out the two classifications.
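As a simple numeric illustration of the subtraction described above, the sketch below reproduces three rows of Table 13.1; the cover names and the dictionary structure are convenient assumptions for the example only.

```python
# Aspatial change detection: subtract the area statistics of each cover in one
# classification from those in the other (values from three rows of Table 13.1, ha).
area_1972 = {"residential": 908.3, "pasture": 3123.4, "orchards": 0.7}
area_1994 = {"residential": 2118.2, "pasture": 2062.3, "orchards": 139.2}

for cover in area_1972:
    change = area_1994[cover] - area_1972[cover]   # +: increase, -: decrease
    print(f"{cover:12s} {area_1972[cover]:8.1f} {area_1994[cover]:8.1f} {change:+9.1f}")
```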
FIGURE 13.6 Land cover maps used in a postclassification change detection. (a) Land cover map interpreted from aerial photographs; (b) land cover map derived from supervised classification of SPOT data. Both maps were georeferenced to the New Zealand Map Grid (NZMG) coordinate system, covering a ground area of 90.23 km². For land cover codes, refer to Table 13.1. (Source: Gao and Skillcorn, 1996.) See also color insert.
Land Cover Code   Cover on the Ground     1972      1994      Change
11                Residential             908.3    2118.2    +1209.9
15                Industrial              187.4     286.3      +98.9
17                Other urban              75.5     271.5     +196.0
21                Pasture                3123.4    2062.3    −1061.1
22                Orchards                  0.7     139.2     +138.5
43                Mixed forest             41.7     116.9      +75.2
54                Estuaries              3485.3    3287.8     −197.5
61                Mangroves               135.1     164.5      +29.4
76                Transitional areas      551.2     154.6     −396.6
77                Mixed barren land       499.4     406.7      −92.7

+: increase in coverage; −: decrease in coverage. Source: Gao and Skillcorn, 1996.
TABLE 13.1 Land Cover Changes between 1972 and 1994 Detected Aspatially (Unit: ha)
13.4.2 Spatial Change Analysis
Also known as per-pixel change analysis, spatial change analysis is a process of comparing the land cover in both classified maps at the same spot on a pixel-by-pixel basis. The results of change detection can be presented nongraphically in matrix form or graphically. Graphic visualization of detected change is a complex topic that is covered in a separate section (Sec. 13.7). The change detection matrix is square (Table 13.2), with the number of rows equal to the number of columns, both being the same as the number of information classes in the classification. All the figures on the main diagonal represent areas of no change, usually expressed in hectares, while all the figures off the main diagonal indicate change. Whether a change is reported as source (i.e., change from) or destination (i.e., change to) depends upon the arrangement of the axes. As illustrated in Table 13.2, row figures represent land cover in 1972 (origin) and column figures land cover in 1994 (destination). Thus, all the figures in a column, with the exception of the main diagonal, show the quantity of changed-to covers, while all the figures in a row, except the main diagonal, represent the amount of changed-from covers.
TABLE 13.2 Change in Land Cover Detected in Raster Format (Unit: ha). Across: land cover in 1972 (change from); down: land cover in 1994 (change to). Column sums (total area changed to each cover): 11: 1314.8; 15: 197.2; 17: 257.7; 21: 323.5; 22: 138.8; 43: 95.6; 54: 34.6; 61: 89.6; 76: 125.8; 77: 296.1; all covers: 2873.7. Source: Gao and Skillcorn (1996).
As an example, the value of 724.5 in Table 13.2 means that 724.5 ha of pasture land in 1972 had been converted to residential by 1994. Since the same amount of change seldom occurs in the opposite direction, this table is rarely symmetric. All the potential changes between any two land covers in both directions (from and to) are shown, with the total change for a given cover calculated as the row or column sum. Therefore, spatial change detection is able to yield the most detailed and informative results, including the location where the changes have taken place as well as the amount and nature of change. Because of these strengths, this spatial change detection method has been widely used, even though it is subject to the accuracy at which land covers are mapped (Quarmby and Cushnie, 1989). The detailed and informative change results are obtained at the expense of a huge amount of work: the change detection process is complex and time consuming to carry out. The analysis involves generation of land cover maps from the respective satellite images or aerial photographs. The results of change analysis are also subject to misclassifications and to inaccuracy in spatially registering the two land cover maps. Any residual errors in either of the results degrade the quality of the change detection outcome. This statement holds true irrespective of whether the change detection operation is carried out in the raster or the vector environment, even though each implementation has its own unique features and limitations.
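For a sense of how such a from-to matrix is tallied, the sketch below cross-tabulates two small, co-registered code arrays; the cover codes, array sizes, and pixel area are assumptions used purely for illustration.

```python
import numpy as np

def change_matrix(map_t1, map_t2, codes):
    """Cross-tabulate two co-registered classified maps into a from-to matrix
    of pixel counts (rows: earlier date, columns: later date)."""
    index = {c: i for i, c in enumerate(codes)}
    to_index = np.vectorize(index.get)
    m = np.zeros((len(codes), len(codes)), dtype=np.int64)
    np.add.at(m, (to_index(map_t1.ravel()), to_index(map_t2.ravel())), 1)
    return m

# Tiny example with two covers, 21 (pasture) and 11 (residential)
t1 = np.array([[21, 21], [11, 21]])
t2 = np.array([[11, 21], [11, 21]])
counts = change_matrix(t1, t2, codes=[11, 21])
print(counts)          # off-diagonal cells are change
print(counts * 0.04)   # e.g., 20-m pixels: 0.04 ha per pixel converts counts to hectares
```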
13.4.3 Raster Implementation
Since all spaceborne remotely sensed data are recorded in raster format initially, implementation of change detection in raster format is advantageous in that minimal preparatory processing (e.g., unification of data format) is required. Already in the same format, all classification results to be used in a change detection are fully compatible with one another, even though it may be necessary to unify them to a common ground coordinate system. Conceptually, change is detected through subtraction of pixel values in one classified image from those of another, the same as in Eq. (13.1). Nevertheless, this wholesale subtraction is plagued by uncertainty in the nature of the detected changes, a problem caused by the huge number of permutations in mathematically combining all land cover codes. A given difference in the output image could correspond to many different types of change. For instance, a difference of 1 in the output can be derived from subtraction of 21 from 22, or from subtraction of 76 from 77 (Table 13.1). In the first instance, this difference represents change from pasture to orchard; in the second case, the same difference stands for change from transitional areas to mixed barren land. In other words, a unique kind of change does not correspond to a unique difference value in the output image.
In order to avoid this confusion and uncertainty, it is necessary to recode the land covers in one or both of the input layers. A common recoding method is to convert one of the input images into a binary one. For instance, a binary land cover map is produced by assigning the same code (e.g., 0) to nine land covers in Fig. 13.6a while leaving the code of the only remaining cover unaltered. After such recoding, the classified land cover map is turned into many binary maps. In order to reduce the number of new layers and the number of subsequent change detections, it is also possible to recode the data in such a clever way (e.g., enlarging the codes of one layer tenfold) that no identical differences result from the subtraction. Subtracting one of the binary maps, or the ingeniously recoded map, from the uncoded one reveals the potential change between this cover and all other covers (including itself) uniquely in the output image. In order to avoid negative values in the output image, it may be necessary to add an offset to the difference. In the case of multiple binary layers, the process of change detection has to be repeated several times; the number of repetitions equals the number of binary coded maps, or the number of information classes in the classified image. In each of the output images, the histogram of the overlaid image is normally analyzed to determine the total number of pixels, or area, of change, and the final results are presented in tabular form (Table 13.2). Change detection in raster format can be implemented in an image analysis system such as ERDAS Imagine, or in a raster GIS; which one to use is largely a matter of personal preference, affected by two system considerations. First, how easily can the input land cover maps be recoded? Second, how easily can the attributes of the change detection images be retrieved, analyzed, and visualized? A brief illustration of the recoding idea is given below.
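The sketch below shows one such recoding scheme. Combining the two layers by multiplying the earlier codes by 100 is an assumption chosen so that the two-digit cover codes of Fig. 13.6 can never collide; it is not the specific factor suggested in the text, but it gives every from-to pair a unique value in a single pass.

```python
import numpy as np

def unique_change_codes(map_t1, map_t2):
    """Recode two classified layers so that every from-to combination of the
    two-digit cover codes maps to a unique four-digit value."""
    combined = map_t1.astype(np.int32) * 100 + map_t2.astype(np.int32)
    codes, counts = np.unique(combined, return_counts=True)
    return combined, dict(zip(codes.tolist(), counts.tolist()))

t1 = np.array([[21, 21], [76, 77]])
t2 = np.array([[11, 21], [77, 77]])
_, tally = unique_change_codes(t1, t2)
print(tally)   # e.g., 2111 means pasture (21) in date 1 became residential (11) in date 2
```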
13.4.4 Vector Implementation
Vector-based change analysis is the natural choice if all input land cover maps are produced in vector format. Change detection in vector form is ideally implemented in a vector GIS, such as ArcGIS. The detection comprises two stages: spatial overlay, followed by aspatial query of the attribute table of the change layer. The logic operations underpinning this change detection are "union" or "intersect" (Fig. 13.3), which enable the uniquely assigned identifier, area, and land cover code of every polygon in both input layers to be retained in the newly created output layer. Therefore, all possible changes between any two land covers are detected in one overlay owing to the retention of topology for all land cover parcels, in sharp contrast to the raster implementation. During the rebuilding of the topology for the output change layer, new polygons are created through the intersection of arcs in the input layers if a land cover parcel has changed its boundary. All those newly created polygons indicative of change in land cover are given a new identity in the output layer while their old identities in the input layers are still retained. This enables the detected changes to be queried during the second stage.
After spatial union, change detection degenerates into queries of the attribute table of the newly derived layer. Determination of the nature and quantity of change becomes a matter of finding the polygons whose identity has changed and the amount of such changes. A few queries may have to be executed or combined logically to explore all possible types of change between one land cover and the others. Queried results are usually presented statistically. Since the vector format is a precise way of representing the real world (for details, refer to Sec. 14.1.2), the detected results are highly sensitive to any misalignment in spatially registering the two input layers or to any slight change in the boundary of the same polygon in both input layers (Fig. 13.7). The same boundaries could have shifted their position in different sources of data owing to a variety of reasons, such as residual positional inaccuracy or varying scales of representation. The net effect of the changed boundaries is the occurrence of tiny polygons that probably represent spurious changes. This issue is so complex that it will be systematically covered in Sec. 13.6.
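The two stages can be sketched with the GeoPandas library as shown below. The tiny inline layers, their cover codes, and the attribute names cover72 and cover94 are hypothetical, and a metric coordinate system is assumed so that polygon areas convert directly to hectares.

```python
import geopandas as gpd
from shapely.geometry import box

# Two tiny, overlapping land cover layers built inline (hypothetical covers).
cover_1972 = gpd.GeoDataFrame(
    {"cover72": [21, 54]},
    geometry=[box(0, 0, 200, 200), box(200, 0, 400, 200)], crs="EPSG:27200")
cover_1994 = gpd.GeoDataFrame(
    {"cover94": [11, 54]},
    geometry=[box(0, 0, 150, 200), box(150, 0, 400, 200)], crs="EPSG:27200")

# Stage 1: spatial overlay; attributes of both inputs are carried to the output.
union = gpd.overlay(cover_1972, cover_1994, how="union")

# Stage 2: aspatial query of the attribute table for polygons whose code differs.
changed = union[union["cover72"] != union["cover94"]].copy()
changed["area_ha"] = changed.geometry.area / 10_000    # metric CRS assumed

print(changed.groupby(["cover72", "cover94"])["area_ha"].sum())
```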
13.4.5 Raster or Vector?
Data format is not an issue if both input land cover layers happen to be in the same format. The issue arises only when the two input layers have different formats. For instance, suppose one of the land cover maps is derived from visual on-screen interpretation of historic aerial photographs in vector format, and the other is derived from automatic per-pixel classification of satellite data in raster format. In this case it is possible to undertake change detection either in vector or in raster format, depending upon personal preference. No matter which format is selected, one of the input land cover maps has to be converted from raster to vector or vice versa, because at present no image processing system or GIS allows a raster land cover map to be overlaid directly on top of a vector one to perform change detection between them. If the detection is carried out in the raster environment, the vector coverage has to be exported to an image analysis system and then rasterized.
FIGURE 13.7 Inconsistency in parcel boundaries during land cover change detection in the vector format. (a) Boundary in the first land cover map; (b) boundary in the second land cover map; (c) boundaries in the overlaid land cover map. A number of new polygons are created after the two input coverages are overlaid with each other because of the positional variation in the boundaries.
Rasterization is a process of converting the contiguous space into a continuous, nonoverlapping array of grid cells, and it can be carried out automatically. The major parameter of rasterization is the cell size, which should be set identical to the spatial resolution of the other raster input layer. If the vector format is selected instead, the raster input has to be exported from the image analysis system to a vector GIS and subsequently vectorized. Vectorization is the process of forming objects (e.g., polygons) by tracing the border pixels of spatially contiguous clusters of pixels. Although the vectorization process itself is quite simple and easy to implement, additional painstaking processing may be indispensable to further refine the vectorized cover, such as smoothing the boundaries of tiny polygons or removing them altogether if the raster cover map has a coarse spatial resolution. In addition, perfecting the topology of the newly vectorized coverage is extremely tedious and time consuming, and all the vectorized polygons must be labeled manually if the process cannot be automated. Because of these processing steps, vectorization is more difficult to accomplish than rasterization; in fact, it is the most severe bottleneck of change analysis in vector format. The task is much easier to accomplish if the image is classified using the object-oriented method presented in Sec. 10.4. However, once this demanding hurdle is overcome, subsequent detection can be performed in a single overlay. The above discussion suggests that, no matter which data format is selected, additional processing is essential in change detection. If the raster format is selected, the processing, in the form of data recoding (the most cumbersome preparatory step in the raster implementation of change detection), takes place at the detection stage. In the vector format the processing occurs prior to the detection: a huge amount of effort is required to construct a vector layer from the raster map and to ensure that its topology is complete and correct. Unlike in the raster format, change detection in the vector format is not subject to the spatial resolution of the input images. The data format not only affects the procedure, complexity, and speed of the change analysis, but also exerts an influence on the detected results. When detected in the raster format, a total of 76 types of change were identified from the two maps shown in Fig. 13.6, totaling 2873.7 ha (Table 13.2). The change from cropland and pasture to residential is the most predominant at 724.5 ha. By comparison, the minimum changes, from other urban to bays and estuaries and from orchards to residential areas, are as small as 0.1 ha. Judging from such small values, these changes are most likely artificial. In total, the spurious changes amount to 49 in number and 511.3 ha in area. A change detection in the vector format reveals that a total of 2878.9 ha of land changed cover type (Table 13.3). Of the 79 types of change identified, 34 have an area smaller than 5 ha. The most common types of change are from agricultural, transitional areas, and barren land to residential and industrial areas.
TABLE 13.3 Matrix of Vector-Based Change Analysis (Unit: ha). Codes of land covers are defined identically to those in Table 13.2. Column sums (total area changed to each cover): 11: 1314; 15: 194.6; 17: 259.0; 21: 324.7; 22: 138.8; 43: 105.6; 54: 28.8; 61: 88.3; 76: 125.8; 77: 299.3; all covers: 2878.9. Source: Skillcorn (1995).
Some of the changes, such as from residential to mixed barren land, are illogical; they amount to 505.6 ha, or 17.6 percent of the total detected change. The corresponding figures in Tables 13.2 and 13.3 resemble each other remarkably, with a Pearson correlation coefficient of 0.9999. Among the 79 pairs of figures, only 9 have a disparity larger than 1 ha. The largest difference is 6.7 ha, for the change from transitional areas to mixed forest. The total area of detected land cover change is nearly the same in both tables, the two differing by 5.2 ha in absolute terms, or by 0.18 percent in relative terms. Therefore, data format exerts a much less drastic impact on the detected results than on the procedure and complexity of detection.
13.5 Novel Change Analysis Methods
Apart from the above standard methods of change analysis, a few novel ones have been proposed to fulfill the same objective. These methods include spectral temporal change classification, PCA, change vector analysis, and correlation-based change analysis.
13.5.1 Spectral Temporal Change Classification
Spectral temporal change classification is also called layered temporal change classification, in which all images used in a change detection are merged to form a single dataset as if they were taken at the same time. This layered multitemporal dataset is then classified, by either an unsupervised or a supervised method, on the rationale that pixel values in the multidate dataset would be similar in areas of no change, but would be significantly different statistically in areas of change. Spectral temporal change classification yielded a much lower accuracy in detecting vegetation changes associated with extensive flooding than other methods such as temporal change classification of NDVI, PCA of NDVI, and NDVI image differencing (Michener and Houhoulis, 1997). This method is limited by its inability to label change classes unless they are known beforehand (e.g., only one possible direction of change from vegetation to non-vegetation). Besides, the combination of multitemporal data creates redundancy in the spectral information present in some of the bands.
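A minimal sketch of this layered approach is given below: the bands from both dates are stacked and clustered without supervision. The synthetic arrays, band count, and number of clusters are assumptions; in practice the analyst must still interpret each cluster as a change or no-change class.

```python
import numpy as np
from sklearn.cluster import KMeans

rows, cols, bands = 200, 200, 4
rng = np.random.default_rng(1)
date1 = rng.random((rows, cols, bands))
date2 = rng.random((rows, cols, bands))

# Merge the two dates into one layered dataset, one feature vector per pixel.
stacked = np.concatenate([date1, date2], axis=2)        # (rows, cols, 2 * bands)
samples = stacked.reshape(-1, 2 * bands)

labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(samples)
label_image = labels.reshape(rows, cols)
print(np.bincount(labels))   # cluster sizes; each cluster must still be labeled
```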
13.5.2 PCA
As discussed in Sec. 6.6.1, PCA is able to decorrelate the information content of the input spectral bands. Furthermore, the change in the multitemporal input data can be projected onto one or more output component images. PCA is effective at identifying areas where change has taken place between two sets of satellite imagery (Byrne et al., 1980). The capability of PCA in change detection stems from the fact that one or more of the PCA components contain information that is related directly to change. For instance, spectral change caused by gypsy moth-induced hardwood defoliation can be identified in one of the PCA component
bands (Muchoney and Haack, 1994). PCA-based change detection is usually implemented in two approaches, spectral PCA and NDVI PCA. Spectral PCA works on all the combined multitemporal bands. The combination of spectral bands from all dates creates unnecessary data redundancy for areas that have not changed, in sharp contrast to areas of change, which are characterized by an absence of correlation (Byrne et al., 1980). Such data redundancy can be reduced through PCA. Besides, change can be identified in one of the output component images (Michener and Houhoulis, 1997). PCA of stacked images produces superior results to those from the conventional classification method in that overestimation of land cover change is significantly minimized. Alternatively, using a supervised method, the most informative components can be classified interactively following the PCA transformation (Li and Yeh, 1998). The classified image needs to be decomposed into a matrix of land cover conversion to produce a map of land cover change. In NDVI PCA, a new NDVI image is derived from the raw data, and the multitemporal NDVI datasets are analyzed using the PCA method. Results in detecting flooding-caused vegetation change demonstrate that a similar accuracy level in classifying the transformed data was achieved by spectral and NDVI PCA (Michener and Houhoulis, 1997). Thus, transformation of a relatively small number of spectral bands brings minimal benefit in improving image classification accuracy. No matter which method of implementation is adopted, the difficulty in labeling change classes remains, as the detected change is not specific; namely, it does not yield any information on from-to change classes.
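The spectral PCA variant can be sketched as follows; the synthetic band stacks and the number of retained components are assumptions made only for the example.

```python
import numpy as np
from sklearn.decomposition import PCA

rows, cols, bands = 200, 200, 4
rng = np.random.default_rng(2)
stacked = np.concatenate([rng.random((rows, cols, bands)),
                          rng.random((rows, cols, bands))], axis=2)

# Transform the stacked multitemporal bands; change-related information tends
# to concentrate in one or more of the lower-order components.
pca = PCA(n_components=4)
components = pca.fit_transform(stacked.reshape(-1, 2 * bands))
component_images = components.reshape(rows, cols, 4)
print(pca.explained_variance_ratio_)
```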
13.5.3 Change Vector Analysis
Change vector analysis is a change detection algorithm suitable for use with multispectral data. The general idea behind change vector analysis, initially proposed by Malila (1980), was further expanded to include two components of a vector: its angle, indicating the direction and possible nature of change, and its length, indicating the magnitude of change in the spectral feature space between the data recorded on two different dates (Jensen, 1996; Warner, 2005) (Fig. 13.8a and 13.8b). Assume the pixel in the n-dimensional multispectral domain (n being the number of spectral bands) has a value of DN(i, j)_{t1} = [DN(i, j)^1_{t1}, DN(i, j)^2_{t1}, ..., DN(i, j)^n_{t1}]^T at time 1 and DN(i, j)_{t2} = [DN(i, j)^1_{t2}, DN(i, j)^2_{t2}, ..., DN(i, j)^n_{t2}]^T at time 2; then the change can be expressed as

\Delta G = DN(i,j)_{t1} - DN(i,j)_{t2} =
\begin{pmatrix}
DN(i,j)^{1}_{t1} - DN(i,j)^{1}_{t2} \\
DN(i,j)^{2}_{t1} - DN(i,j)^{2}_{t2} \\
\vdots \\
DN(i,j)^{n}_{t1} - DN(i,j)^{n}_{t2}
\end{pmatrix}
(13.3)
FIGURE 13.8 Principles of change vector analysis with vector direction (a) categorized according to circle quadrants (Michalek et al., 1993); (b) categorized according to vector angle grouping (Malila, 1980); (c) both direction and magnitude expressed as continuous values (mVCA). (Source: Modified from Nackaerts et al., 2005.)
The magnitude of the total change \Delta G is calculated using the following equation:

\Delta G = \sqrt{\sum_{k=1}^{n} \left[ DN(i,j)^{k}_{t1} - DN(i,j)^{k}_{t2} \right]^{2}}
(13.4)
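A minimal per-pixel sketch of Eqs. (13.3) and (13.4) is given below; the direction calculation is shown only for the simple two-band case of Fig. 13.8, and the synthetic arrays are assumptions for illustration.

```python
import numpy as np

def change_vector(t1, t2):
    """t1, t2: arrays of shape (rows, cols, n_bands) for the two dates."""
    delta = t1.astype(np.float64) - t2.astype(np.float64)    # Eq. (13.3)
    magnitude = np.sqrt((delta ** 2).sum(axis=2))             # Eq. (13.4)
    return delta, magnitude

rng = np.random.default_rng(3)
t1 = rng.random((100, 100, 2))
t2 = rng.random((100, 100, 2))
delta, magnitude = change_vector(t1, t2)

# Two-band direction in degrees (0-360), measured from the band-A axis.
direction = np.degrees(np.arctan2(delta[..., 1], delta[..., 0])) % 360
print(magnitude.mean(), direction.min(), direction.max())
```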
Of the two outputs of change vector analysis, the change direction refers to the angle between the change vector and a reference direction, usually defined as the horizontal axis in a simple two-dimensional (2D) feature space (Fig. 13.8c). Unlike the vector length, which is a
continuous variable, vector direction has an angle ranging from 0 to 360°. One approach to dealing with this continuous range is to divide it into sectors; presumably, each sector corresponds to a unique category of change, in addition to the quantitative magnitude of change. Drawing from methods in spherical statistics, change vector analysis can be extended to measure absolute angular changes and the total magnitude of Tasseled Cap indices (brightness, greenness, and wetness). Summary spherical statistics of change vectors can be used to quantify both the magnitude and the direction of change (Allen and Kupfer, 2000). Unlike some other change detection methods, such as image differencing and band ratioing, change vector analysis is able to take advantage of multiple spectral bands in its decision making instead of a few selected ones. It allows change images to be formed so that change can be interpreted and labeled with the greatest ease (Johnson and Kasischke, 1998). The conventional two-band change vector analysis can be extended to many dimensions via hyperspherical direction cosine change vector analysis, together with a simple procedure for selecting the change magnitude threshold automatically (Warner, 2005). Incorporation of multiple bands in the change analysis via hyperspherical direction cosines produced more accurate results than conventional change vector analysis. However, it is problematic to discriminate the nature of change from the derived distance in light of the large number of spectral bands used (Chen et al., 2003). Change vector analysis may also be modified to preserve the information retained in the change vector's magnitude and direction as continuous data (Nackaerts et al., 2005). Owing to this modification, computation in change vector analysis is reduced to simple multidimensional differencing of Cartesian coordinates. Change vector analysis is a valuable technique for change detection (Garcia-Haro et al., 2001; Lanjeri et al., 2004). In addition to raw pixel values, it can be adapted to analyze newly derived indices such as NDVI, in which case it compares the difference in the time trajectory of a biophysical indicator (Lambin and Strahler, 1994). Moreover, NDVI can be replaced with other biophysical parameters, such as surface temperature, to study changes in other features, or even by Tasseled Cap components transformed from raw satellite data (Schoppmann and Tyler, 1996). The modified change vector analysis algorithm has proven to be a promising method for detecting land cover changes in forested areas (Nackaerts et al., 2005). Its output, expressed as continuously varying data, is suitable for statistical change feature extraction in a follow-up step. As shown in Eq. (13.3), the mechanism behind change vector analysis is identical to that of spectral image differencing, except that the change is detected from multiple bands instead of a single band. Therefore, this method shares the same prerequisites and limitations as the former in three respects:
• First, all images must be accurately registered with each other. Inaccurate image-to-image registration causes artificial errors that are impossible to eradicate during postprocessing. In areas of large topographic relief, high resolution data should be orthorectified (refer to Sec. 5.7) with a platform-specific geometric model to ensure that residuals are kept at the subpixel level.
• Second, the input images must have high fidelity in their radiometric properties. What is detected is radiometric change in the multispectral domain. Therefore, the detected result is subject to changes in atmospheric condition, solar illumination, and background features (e.g., soil moisture content and vegetation vigor), none of which represent genuine change in land cover or in the features of interest. Radiometric calibration may therefore be mandatory for achieving temporal and spatial consistency through normalization of the multidate data (Allen and Kupfer, 2000) before this detection algorithm can be expected to produce reliable results.
• Third, all change detection outputs are multidimensional and of a continuous nature. There is a lack of guidance in setting an appropriate threshold to separate genuine change from no change, even though the outputs can be classified using supervised methods. This situation can be alleviated by adding more stages to the analysis (Chen et al., 2003), such as a double-window flexible pace search designed to ascertain the threshold of change magnitude iteratively. The nature of change is then determined by classifying the direction cosines of the change vectors using the minimum distance classifier.
13.5.4 Correlation-Based Change Analysis
Proposed by Im and Jensen (2005), this change analysis method is based on the correlation of two images of the same geographic area calculated within a subarea (e.g., a window of 3 × 3 pixels). A strong correlation between them is indicative of little change; the absence of any correlation is symptomatic of drastic change. The piecewise correlation between the two images is able to shed light on the location and numeric value of change derived using contextual information within the specified neighborhood. Correlation calculated over a small neighborhood reveals local change (such as a modification made to a building), but at this scale the result is vulnerable to noise, such as shadow-induced changes, which disappears over a larger neighborhood. Over a large neighborhood, however, some small-scale changes (e.g., a newly constructed street or road) may not be picked up in the change analysis. This deficiency can be remedied by combining the calculated correlation image with an image classifier such as a machine learning decision tree (see Chap. 9).
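The neighborhood (piecewise) correlation can be sketched with moving-window statistics as shown below; the window size, the synthetic images, and the use of 0.8 as a rough no-change cutoff (a figure mentioned in the following paragraph) are assumptions for illustration only.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def neighborhood_correlation(img1, img2, size=3, eps=1e-9):
    """Pearson correlation of two co-registered single-band images computed in
    a square moving window of the given size."""
    img1 = img1.astype(np.float64)
    img2 = img2.astype(np.float64)
    mean1 = uniform_filter(img1, size)
    mean2 = uniform_filter(img2, size)
    cov = uniform_filter(img1 * img2, size) - mean1 * mean2
    var1 = uniform_filter(img1 ** 2, size) - mean1 ** 2
    var2 = uniform_filter(img2 ** 2, size) - mean2 ** 2
    return cov / np.sqrt(np.maximum(var1 * var2, eps))

rng = np.random.default_rng(4)
a = rng.random((200, 200))
b = a + 0.05 * rng.standard_normal((200, 200))   # a largely unchanged scene
r = neighborhood_correlation(a, b, size=3)
print((r > 0.8).mean())   # share of pixels behaving as "no change"
```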
This correlation-based change analysis method can be extended to objects or image segments. In the object-based detection, correlation images are generated from meaningful objects and their properties (e.g., object slope and intercept) created from bitemporal imagery (Im et al., 2008). Neighborhood correlation images are also generated from bitemporal imagery within a window in a similar fashion. Both types of images yield visually useful information on change and have comparable performance in revealing change; they are complementary in their utility. For instance, the "salt-and-pepper" errors in neighborhood correlation images are removed in object correlation images. Most unchanged land cover classes have a correlation coefficient over 0.8, in drastic contrast to the weak correlation of changed classes. Change/no-change classes are easily separable through application of the slope and intercept values. However, determination of the nature of change requires more processing, such as image classification using the nearest neighbor and decision tree classifiers. This method is more efficient than per-pixel change detection owing to the reduced number of units in the analysis. However, the use of bitemporal data may be problematic if the data have different look angles; this difference causes errors in the composite imagery and in the detected change results. Such a problem is avoidable if the input images used for change detection have been classified at the object level. In this postclassification change detection, the minimum unit in the calculation of correlation is the object that has been assigned a land cover code. Through ingenious coding of these land cover polygons, it is possible to identify the nature of change on the basis of the closeness of the correlation. For instance, a weak correlation indicates a change from cover type 1 to cover type 9, while a strong correlation suggests a change from cover type 1 to cover type 2. The disadvantage of this object-oriented postclassification change analysis based on correlation is its inability to reveal the direction of change, namely, from-to change. Given the same correlation coefficient value, it is uncertain whether the change is from type 1 to type 9 or in the opposite direction.
13.5.5 A Comparison
The relative performance of different change detection methods, such as image differencing, PCA, and postclassification comparison, has been studied (Macleod and Congalton, 1998). In determining the change in eelgrass meadows at the nine-category level, image differencing was found to be more accurate than PCA, which in turn was more accurate than postclassification change detection. This relationship stood unchanged when the nine categories of change were amalgamated into binary change/no-change categories, and the accuracy was higher for all methods. However, the pace of improvement was not uniform: the image differencing method remained much more accurate than PCA, which was still more accurate than postclassification change detection.
The inferior performance of the last method was due mostly to misclassifications in the input layers. When modified, change vector analysis significantly outperformed standardized image differencing, image ratioing, and selective PCA in detecting change indicators (vegetation indices) that were categorized into three biophysically meaningful groups, in terms of Kappa coefficients, regardless of the number of input bands (Nackaerts et al., 2005). However, this method is not so superior to standardized image differencing and PCA if the input images are separated by a short time span (e.g., only 2 or 4 years instead of 6 years) when the input is reduced to only two bands.
13.5.6 Change Analysis from Monotemporal Imagery
Theoretically, it is impossible to detect changes from monotemporal data using the change detection methods introduced previously because they make use of pixel values only. Change detection relies exclusively on comparison of raw or processed pixel values in multitemporal images, while other image elements such as shape and size are ignored. Thus, a single pixel value obtained from a one-time image is unable to reveal whether change has occurred in a ground cover over time. However, these elements, which cannot be taken advantage of by the computer in change detection, can be used by the human interpreter. If they can be incorporated into a change analysis, the deficiency of not being able to detect change from a monotemporal image is overcome. Monotemporal change detection is preferable to multitemporal detection in that it involves processing fewer images and avoids pseudochanges that do not represent genuine change on the ground. This monotemporal change detection method is object-based in that pixels within a certain spatial adjacency are treated as a group. In addition to their value, other spatial properties of pixels, such as texture, shape, and location, are also used in the decision making. This kind of detection is underpinned by the principle that a change in land cover is not always accompanied by a change in all image elements. For instance, some image elements may remain unchanged (e.g., size, shape, and location) within a certain spatial adjacency, or may not have changed as extensively as the pixel values (e.g., texture). A combination of the changed image elements with other unchanged image elements (e.g., location, size, shape, and pattern) forms the crucial clue for detecting changes from monotemporal images (Fig. 13.9). As illustrated in the aerial photograph, the tone of cropland in the southeastern direction has changed to that of the nearby desert. However, the desertified farmland still retains a definitive boundary and outline. Such clues help to detect the change in land cover from productive farmland to abandoned farmland in this particular instance. From the changed tone it can be inferred that the farmland has been degraded by encroaching sand dunes.
FIGURE 13.9 An example of using a monotemporal aerial photograph to detect change in farmland with a combined use of tone and pattern. Healthy farmland is dark toned; degraded and abandoned cropland has a light tone with an irregular shape and pattern.
It must be acknowledged that from monotemporal data it is impossible to know how long the change has been going on. In some cases where the boundaries of the abandoned farmland are indistinct as a consequence of severe degradation, it is impossible to demarcate the spatial extent of change precisely. However, it is still possible to identify change in the opposite direction, namely land reclamation, a change from desert to farmland. Newly reclaimed land parcels still have a sharp and clearly defined boundary but lack any pattern inside (lower left corner in Fig. 13.9), because no windbreak system has been established, or has been established long enough, to show up on the aerial photograph.
13.6 Accuracy of Change Analysis
The result of change detection is affected by many factors, more than those affecting a single classification. These factors include the differential spatial resolutions of the respective remote sensing data used to derive the land cover maps, and the accuracy at which they are georeferenced. In addition, the results are also subject to processing errors introduced during changes in data format, thematic generalization, and misclassifications. The impact of these errors is comprehensively
discussed in this section. The second part of the section is devoted to assessment of change detection accuracy. After the current methods of assessment are presented, two new methods of indirect and direct accuracy assessment are proposed.
13.6.1 Factors Affecting Detection Accuracy
Classification Inaccuracy
As discussed in Chaps. 7 to 11, no image classifier can achieve perfect accuracy, even for homogeneous covers such as water. The accuracy is noticeably lower for highly heterogeneous covers like urban residential, which comprise a mixture of diverse covers. The classification accuracy is also lower if the spatial or spectral resolution of the satellite data is not commensurate with the level of detail at which land covers are mapped. Thus, classification inaccuracies are inherent in the inputs of change analysis. Any inaccuracy in the mapped land covers in either of the input layers will propagate to the resultant change detection layer. The more input layers involved, the more adversely the accuracy of the detected change is affected by their inaccuracies.
Differential Minimum Mapping Units
After an image is classified, the final results usually undergo thematic generalization to conform to a certain mapping standard. The degree of generalization is governed by the minimum mapping unit, the smallest land cover parcel that has to be identified and retained in the final classification results. The physical size of this unit varies with the scale at which the final results are represented (Table 13.4). The theoretical minimum mapping unit is 0.35 ha at a scale of 1:24,000, only slightly less than one-seventh of that at 1:62,500; thus, the minimum mapping unit is related exponentially to the mapping scale. Additionally, the minimum mapping unit is also related to the spatial resolution of the source satellite data. The same minimum threshold in filtering the classification results translates into a larger minimum mapping unit for satellite images of a coarse spatial resolution [e.g., Landsat Thematic Mapper (TM)] than for those of a fine resolution (e.g., SPOT). For this reason it is quite possible that the same small land cover parcel represented in the result obtained from large-scale aerial photographs may have been generalized away in another result classified from satellite data of a coarser spatial resolution (see Fig. 13.6), even if both results are derived by the same analyst. Consequently, artificial changes are introduced to the detected result owing to the differential minimum mapping units of the two input land covers.
Mapping Scale   Mapping Level   Patch Size (ha)
1:500,000       I               150
1:62,500        II              2.5
1:24,000        III             0.35

TABLE 13.4 Minimum Mapping Unit and Its Relationship with Mapping Scale
Misregistration
The accuracy of all per-pixel change detection results is susceptible to misregistration of the input land cover maps, even though the impact is much smaller in aspatial change detection, in which only the overall area totals are affected rather than every land cover parcel inside the study area. During image georeferencing, the allowable residual in horizontal accuracy is within one pixel. The presence of this residual in every image used in spatial change detection means that these images cannot be precisely coregistered spatially. Spatial mismatch can exert a marked influence on the ability of remotely sensed data to detect changes in land cover (Townshed et al., 1992). Registration accuracy at the subpixel level can have a large impact, and the most marked proportional changes tend to occur at the finest misregistrations. A misregistration on the order of one pixel causes an error of 50 to 100 percent of the true NDVI change due to land cover change when the pixel size is 500 m. To control the error to 10 percent requires a registration accuracy of around one-fifth of a pixel or better in the worst-case scenario. Different from the effect of parcel boundary inconsistency (Fig. 13.7), misregistration is a global factor that affects all land covers indiscriminately (Fig. 13.10). Errors caused by spatially misaligning the two land cover maps affect the detected results in both raster and vector formats alike.
FIGURE 13.10 Artificial changes along the border of land cover parcels caused by misregistration (e.g., a lateral shift) of two land cover layers. (a) Boundary in the first land cover map. (b) Boundary in the second land cover map; it is identical to that in the first map. (c) Spurious changes along some of the boundaries owing to misregistration, which occurs only horizontally in this case.
Distributed around the borders of land cover parcels, the resultant errors occur as narrow strips of roughly uniform width. The formation of these strips is determined by the direction of misregistration: they are enhanced in the direction perpendicular to that of the misregistration, but subdued in the direction parallel to it (Fig. 13.10c). For this reason, misregistration-caused change detection errors tend to congregate along the interfaces of land cover patches as spatial clusters. The net impact of misregistration on land cover patches varies with their physical size and location. For a spatially extensive land cover, a slight shift in its geometric position will not lead to a significant change in its identity in the two input maps. However, the same minor shift would cause a profound change in identity for highly fragmented parcels along their borders, across which a different land cover is encountered. Therefore, the relative errors are larger for land cover classes of a small patch size than for those of a large patch size. As shown in Table 13.6, the detected spurious changes range widely from 5.7 ha for orchards to 121.9 ha for cropland and pasture; in relative terms, such illogical change is correlated inversely with the patch size of a land cover class. The presence of misregistration in spatial change detection means that pixel values at different locations are compared with each other, instead of land covers at the same location at different times.
Processing Errors
The input land cover maps may have to be rasterized or vectorized in order to unify their data formats prior to change analysis. During rasterization and vectorization, processing errors are inevitably introduced into the boundaries of all land cover parcels (Fig. 13.11). Vectorization and rasterization errors are affected by a few factors, such as the spatial resolution of the source/destination image, the regularity and complexity of the land cover parcel boundaries, and their orientation. The finer the spatial resolution of the raster image, the more closely the rasterized boundaries resemble their depiction in vector format. Naturally, a highly sinuous and irregularly shaped boundary suffers a larger rasterization error. More errors are introduced to the entire layer if its dimensions cannot be divided evenly by the raster pixel size: any row or column residual less than half a pixel is rounded down, causing a slight reduction in the total area, whereas any residual above half a pixel is rounded up, expanding the total area slightly. Rasterization errors can be appreciated from the figures in Table 13.5. After rasterization the original vector coverage lost 3.7 ha (0.045 percent) of its total area. Such a loss stems from its reduced dimension at the coverage border. At 9.624 by 9.36 km, the original vector coverage was reduced to 481 columns instead of 481.2 columns after rasterization; the rounding down of 0.2 columns caused a loss of 3.7 ha.
FIGURE 13.11 Rasterization and vectorization errors along the border of a land cover parcel. The magnitude of the errors is related to the grid cell size, the orientation of the boundary, and its regularity.
The rasterization error for individual covers varies between −0.2 and 1.5 ha (Table 13.5, column 4). These errors are related closely to the location and complexity of the parcels in a land cover class, the conversion method, and the cell size, but do not seem to correlate with the area of the land covers.

Land Cover   Rasterization                            Vectorization
             Vector     Rasterized   Difference       Raster     Vectorized   Difference
11            908.3        906.8        1.5           2118.2       2116.7        1.5
15            187.4        186.4        1.0            286.3        285.2        1.1
17             75.5         75.5        0.0            271.5        271.5        0.0
21           3123.4       3123.6       −0.2           2062.3       2061.2        1.1
22              0.7          0.7        0.0            139.2        139.2        0.0
43             41.6         41.6        0.0            116.9        116.9        0.0
54           3485.3       3485.2        0.1           3287.8       3291.3       −3.5
61            135.1        134.5        0.6            164.5        164.6       −0.1
76            551.1        551.0        0.1            154.6        154.6        0.0
77            499.3        498.7        0.6            406.7        406.8       −0.1
Total        9007.7       9004.0        3.7 (0.76*)   9008.0       9008.0        0.0 (1.68*)

*Differences in root-mean-square (RMS) values.
TABLE 13.5 Difference in Land Covers after Vectorization and Rasterization (Unit: ha)
By comparison, vectorization does not cause the total area (9008 ha) of the coverage to change, owing to the absence of the "border effect" (Table 13.5). Nevertheless, minor changes still took place along the boundaries of land cover parcels after pixel cells were thinned to lines. This did not expand or reduce the area of a given cover type as much as might be expected, because the minor expansions and shrinkages along the border partially cancel each other out. As shown in the above example, vectorization errors range from −3.5 to 1.5 ha, with a mean of 0. These errors appear to be loosely correlated with the size of a land cover class and affected by its complexity. For instance, having the largest area of 3287.8 ha, bays and estuaries (54) also have the largest error of 3.5 ha; with an area of 286.3 ha, the industrial and commercial category (15) has an error of 1.1 ha. Expressed as a root-mean-square (RMS) error, the vectorization error of 1.68 ha is much larger than the rasterization error of 0.76 ha because the original raster map was classified from multispectral SPOT data of 20-m spatial resolution, whereas the original vector map was produced from aerial photographs. Therefore, the amount of rasterization or vectorization error is related to the spatial resolution of the input land cover map. In spite of these errors, the areas of the land covers in the original format and in the converted format are remarkably similar to each other (Table 13.5). This similarity suggests that the detected change of land cover is not noticeably affected by data format: format conversion has a minimal impact on the results of the postclassification change detection methodology, nor does it considerably affect the accuracy at which changes are detected. Although it is impossible to separate the errors caused by misregistration, data processing, and differential minimum mapping units, together they add up to no more than a few hectares. This value is well below the spurious changes on the order of tens of hectares (e.g., 60.9 ha from urban residential to cropland and pasture) (Table 13.3). Such false changes are accounted for by the inaccuracy at which a cover class is mapped. Impossible to verify in the field, the inaccuracy of the photograph-derived map is assumed to be low and is thus not taken into consideration. In contrast, the accuracy of the 10 covers mapped from the SPOT data is much lower, ranging widely from 62 percent for other urban to 100 percent for bays and estuaries. The classification accuracy (A) has a Pearson correlation coefficient of −0.707 with the nonspurious change ratio for the vector-based results, and −0.700 for the raster-based results. The two variables have the following regression relationships:
Genuine change ratio = 26.914 + 2.157(100 − A)   (R² = 49.92%, vector)   (13.5)
Genuine change ratio = 27.245 + 2.122(100 − A)   (R² = 48.99%, raster)   (13.6)
Therefore, the more accurately a land cover is classified, the less spurious change exists in its detected results.
It is impossible to rank the significance of the four identified factors precisely because of the varying nature of the scene. In most cases, the classification inaccuracy inherent in the land cover maps generated from remote sensing data is mainly responsible for spurious changes; in some cases it contributes as much as 50 percent of the variation in the ratio of spurious change to genuine change. Differential minimum mapping units of the input maps become an important factor only when the source data have drastically different spatial resolutions and detail levels. The errors caused by differential levels of thematic generalization, misregistration, and data format conversion are usually the least important among all the identified factors.
13.6.2 Evaluation of Detection Accuracy
It is important to carry out accuracy assessment for change analysis in order to provide quality assurance and to determine which change analysis method is the most accurate. Assessment of change analysis accuracy is challenging. All the issues, complications, and difficulties associated with assessing the accuracy of a single land cover map produced via digital analysis of remote sensing data apply to the assessment of spatial change detection accuracy. In addition, this assessment faces a few unique problems, such as the selection of evaluation pixels and the confirmation of the reference pixels' identity in the historic result. The accuracy of change detection may be quantitatively assessed using a change-detection error matrix as proposed by Macleod and Congalton (1998). This table is identical to that for single-date image classification results (see Table 12.2), except that it has more rows and columns reserved for both change and no-change categories. In a three-cover classification, the matrix has a dimension of nine rows by nine columns, three for no change and six for change, even though not all possible changes may have taken place. Thus, the construction of such a matrix can easily get out of hand if there are tens of land covers in the input layers. One method of simplifying this change detection error matrix is to leave out the covers of no change. Even so, it is still awkward to select a large number of evaluation pixels for the mapped classes in the input maps, given that their genuine identity has to be verified in both inputs. An alternative is to use single-date error matrices and binary change/no-change error matrices (van Oort, 2007). Based on the assumptions that errors are independent in both inputs and that there is a maximum positive temporal correlation between them, single-date error matrices can be used to obtain the most pessimistic and most optimistic estimates of the transition accuracy. However, this matrix is of limited value as it does not reveal the accuracy of transition. To do so requires the construction of the full transition error matrix, which still does not quantify certain classification errors; thus it has not become the method of choice for reporting change detection accuracy.
To some extent the accuracy of the land cover change results can be gauged from the errors in the two input land cover maps in accordance with the behavior of error propagation. The effect of land cover errors on the accuracy of change maps has been examined by simulating error propagation in land cover change analysis (Burnicki et al., 2007). The simulated results demonstrate that the ability to detect the level of error in the change maps is affected by the temporal dependence of errors in the land cover maps. However, this study fails to address the accuracy of the change detection maps on the basis of the errors inherent in the input maps. Thus, it is not possible to come up with a realistic measure of change detection accuracy via simulation of errors in the input land covers based on error propagation. Intuitively, change detection accuracy should be assessed by examining the results at selected sampling locations in a manner that is almost identical to the accuracy assessment of an image classification (see Chap. 12). In this per-pixel method the land covers in both original datasets as well as the results of change detection are examined simultaneously. The assessment is simplified if the pixel identity shown in both results is accepted at face value. Nevertheless, this blind acceptance is unable to differentiate the error caused by change detection from that caused by misclassification. This problem can be avoided by examining the genuine land cover in the respective raw images. This, however, raises the issue of how to verify the identity of the reference pixels in the historic result. Another unresolved issue is the determination of an adequate number of pixels that have changed their identity; it is unknown how they should be selected (Congalton, 2004). It is even more difficult to select a certain number of reference pixels for a particular type of change. In spite of being able to realistically assess the accuracy of change detection, this direct method is tedious, expensive, subjective, error prone, and impractical. Thus, indirect methods have to be relied upon. The concept of indirect assessment is based on the fact that illogical changes (e.g., from urban industrial to rural) are most likely not genuine in reality. Therefore, it is possible to use the ratio of illogical change to total detected change as an indicator of change detection accuracy (Table 13.6). Judged against this indicator, the classes having a large area (e.g., built-up areas) are generally detected with a higher accuracy, whereas those having a small size are detected much less accurately. The amount of false change in the raster-based detection is very consistent with that in its vector counterpart, as are the change detection accuracies for individual classes. This assessment method based on spurious change is not perfect in that it assumes that all the identified covers have been mapped at the 100 percent accuracy level, which is not true in reality. In addition, it is inapplicable to classes that do not have illogical changes associated with them; therefore, an accuracy estimate cannot be generated for every individual cover. Besides, the criterion for false change is fuzzy in some cases.
             Vector-Based                               Raster-Based
Class        False     Total     Correctly             False     Total     Correctly
                                 Detected (%)                              Detected (%)
11           111.9     1314      94.8                  113       1314.8    91.4
15             9.4      194.6    95.2                    9.4      197.2    95.2
17            14.9      259.0    94.2                   15.3      257.7    94.1
21           121.9      324.7    62.5                  122.5      323.5    62.1
22             5.7      138.8    95.9                    5.9      138.8    95.7
43            26.8      105.6    74.6                   23.1       95.6    75.8
54            28.8       28.8     0.0                   34.6       34.6     0.0
61            88.3       88.3     0.0                   89.6       89.6     0.0
76            30.6      125.8    75.7                   31.0      125.8    75.4
77            67.3      299.3    77.5                   66.9      296.1    77.4
Total        505.6     2878.9    82.4                  511.3     2873.7    82.2

TABLE 13.6 Detected Land Cover Changes and False Changes in Both Raster and Vector Formats
These deficiencies may be overcome with another method based on the classification inaccuracy of individual classes using error propagation theory. The accuracy can be calculated from the area-weighted classification accuracy of all the cover categories from which a land cover has changed [Eq. (13.7)]:

A_j = 100 - \left[\sum_{i=1}^{n} \frac{S_i}{\sum_{i=1}^{n} S_i}\,(100 - A_i)^2\right]^{1/2}
(13.7)
where
A_j = change detection accuracy of cover j
A_i = accuracy of land cover class i (expressed as a percentage) classified from satellite data
S_i = area of class i
n = total number of land covers from which cover j has changed

Determined from the area-weighted accuracy of classification using Eq. (13.7), the change detection from the two maps in Fig. 13.6 has an accuracy ranging from 74.6 percent for cropland and pasture to 88.4 percent for transitional areas, with an overall accuracy of 79.3 percent in vector format (Table 13.7).
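To make the arithmetic concrete, the sketch below implements the root-of-weighted-squares reading of Eq. (13.7) given above; that reading follows error propagation convention rather than an explicit statement in the text, and the class accuracies and areas in the example call are hypothetical.

```python
def change_detection_accuracy(source_classes):
    """Area-weighted change detection accuracy of cover j, following the
    root-of-weighted-squares reading of Eq. (13.7).

    source_classes: list of (A_i, S_i) pairs, where A_i is the classification
    accuracy (percent) of source class i and S_i is the area converted from
    class i to cover j.
    """
    total_area = sum(area for _, area in source_classes)
    weighted_sq_error = sum((area / total_area) * (100.0 - acc) ** 2
                            for acc, area in source_classes)
    return 100.0 - weighted_sq_error ** 0.5

# Hypothetical example: cover j received area from three source classes.
print(change_detection_accuracy([(68.0, 120.5), (80.0, 45.2), (92.0, 10.8)]))
```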
| Land Cover Class | Classification Accuracy (%) | Detection Accuracy (%): Vector | Detection Accuracy (%): Raster | Discrepancy |
|---|---|---|---|---|
| 11 | 68 | 78.57 | 78.90 | 0.33 |
| 15 | 92 | 77.26 | 77.27 | 0.01 |
| 17 | 62 | 78.97 | 78.96 | 0.01 |
| 21 | 80 | 74.56 | 74.66 | 0.10 |
| 22 | 84 | 79.41 | 79.42 | 0.01 |
| 43 | 80 | 80.92 | 80.78 | 0.14 |
| 54 | 100 | 79.35 | 79.99 | 0.64 |
| 61 | 96 | 82.60 | 82.64 | 0.04 |
| 76 | 80 | 77.20 | 77.23 | 0.03 |
| 77 | 72 | 88.44 | 88.32 | 0.12 |
| Overall | 81.4 | 79.29 | 79.43 | 0.14 |

TABLE 13.7 Detection Accuracy Calculated from Classification Accuracy
This overall accuracy is slightly lower than the overall genuine change ratio of 82.4 percent (Table 13.6). The change detection accuracy of a class bears little resemblance to its classification accuracy. The change detection accuracy for a given class differs from the genuine change ratio because this indicator is based on the classification accuracies of the classes that have been converted to this class. The detection accuracy of a land cover change in raster form closely resembles its counterpart in vector form. The largest discrepancy between corresponding accuracies is only 0.64 percent, for mixed forest. These figures demonstrate once again that data structure does not profoundly affect the accuracy of change detection. This assessment method may yield information on the overall change detection accuracy, but it is still not perfect because it is nonspatial. It does not address how misclassifications in either input image propagate into the change detection layer. This deficiency has been overcome with the modified error propagation model [Eq. (13.8)] proposed by Zhang (1994), in which the correlation r between the two input layers in change detection is taken into consideration.
Pr_{i→j} = Pr_{i,1}[Pr_{j,2} + (1 − Pr_{j,2})r]
(13.8)
where
Pr_{i,1} = probability that a pixel classified as i (i.e., in the ith row and ith column) has the genuine identity of i in the first image
Pr_{j,2} = probability that a pixel classified as j (i.e., in the jth row and jth column) has the genuine identity of j in the second image
Pr_{i→j} = probability of a pixel whose detected identity has changed from i to j
Equation (13.8) is valid only when the two layers are precisely coregistered with each other; in other words, there is no misregistration between them. In the presence of registration errors, Eq. (13.8) has to be modified by taking the boundary length and area of class j into consideration [Eq. (13.9)]:
Pc_{i→j} = Pr_{i→j} − 0.508 Pr_{i→j} |d_m| L_{j,2}/A_{j,2}
(13.9)
where
Pc_{i→j} = probability of a pixel that changes its identity from i in the first input map to j in the second land cover map
d_m = registration error
L_{j,2} and A_{j,2} = total boundary length and area of class j in the second land cover map, respectively
According to Eq. (13.9), change detection accuracy can be degraded by misregistration for covers that have a large boundary length relative to their area. The joint application of Eqs. (13.8) and (13.9) is able to reveal the accuracy of a pixel with a changed identity, so long as its classification accuracy in the source and destination layers is known.
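A minimal sketch of Eqs. (13.8) and (13.9) is given below; the function names and the numeric values in the example are hypothetical, and consistent units are assumed for the registration error, boundary length, and area.

```python
def transition_probability(pr_i1, pr_j2, r):
    # Eq. (13.8): probability that a detected i -> j transition is correct,
    # given the per-date probabilities and the correlation r between layers.
    return pr_i1 * (pr_j2 + (1.0 - pr_j2) * r)

def registration_adjusted(pr_ij, dm, boundary_length, area):
    # Eq. (13.9): degrade the transition probability for a misregistration
    # |dm|, given boundary length L_j,2 and area A_j,2 of class j in map 2.
    return pr_ij - 0.508 * pr_ij * abs(dm) * boundary_length / area

# Hypothetical values: 85% and 90% per-date probabilities, r = 0.3,
# a 15 m registration error, 48 km of boundary over 36 km^2 of class j.
pr_ij = transition_probability(0.85, 0.90, 0.3)
print(registration_adjusted(pr_ij, dm=15.0,
                            boundary_length=48_000.0, area=36_000_000.0))
```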
13.7 Visualization of Detected Change
It is difficult to visualize the detected change effectively because both the original and destination covers at the location of a change have to be represented in one map. A common practice is to use a specially designed color scheme to represent all the possible types of change. This method of visualization is limited in that the change from a source cover to all destination covers is not clearly conveyed in the map. Besides, map readability deteriorates rapidly when there are many different kinds of change, even though it can be improved by omitting all those parcels whose identity has not changed. In this way the reader’s attention is focused more on the changed areas. Another means of visualizing change is to combine the use of two graphic elements (e.g., color and pattern) in the map. One is reserved for the source cover while the other is reserved for the destination cover. Since human eyes are more sensitive to changes in color than in pattern, it is better to use color for the destination covers if they are considered more important than the source covers. The above visualization methods are able to exhibit all potential types of change in one map for results detected from only two land cover maps. They are inapplicable to change that has been detected from a series of maps. For instance, it is not possible to visualize the temporal evolution of urban sprawl over a period of time using the above methods. In this case only the destination cover is illustrated in the visualization. Such change is best visualized via animation in which this series of maps is superimposed on top of each other. The maps are displayed continuously at a short temporal interval as an animation on a computer screen.
Through animating the change maps at different times, the process of the gradually changing phenomenon can be effectively perceived by the viewer.
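For the animation approach just described, a minimal matplotlib sketch is given below; the stack of classified maps and the dates are hypothetical placeholders, and a real application would load the actual multitemporal classification results instead.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

# Hypothetical stack of classified maps (one per acquisition date),
# each cell holding a land cover code.
dates = ["1990", "1995", "2000", "2005"]
maps = [np.random.randint(1, 5, size=(100, 100)) for _ in dates]

fig, ax = plt.subplots()
img = ax.imshow(maps[0], cmap="tab10", vmin=1, vmax=4)

def update(frame):
    img.set_data(maps[frame])            # swap in the next map in the series
    ax.set_title(f"Land cover, {dates[frame]}")
    return [img]

# Display each map for one second in a continuous loop.
anim = FuncAnimation(fig, update, frames=len(maps), interval=1000, repeat=True)
plt.show()
```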
References
Allen, T. R., and J. A. Kupfer. 2000. “Application of spherical statistics to change vector analysis of Landsat data: Southern Appalachian spruce-fir forests.” Remote Sensing of Environment. 74(3):482–493.
Burnicki, A. C., D. G. Brown, and P. Goovaerts. 2007. “Simulating error propagation in land-cover change analysis: The implications of temporal dependence.” Computers, Environment and Urban Systems. 31(3):282–302.
Byrne, G. F., P. F. Crapper, and K. K. Mayo. 1980. “Monitoring land-cover change by principal component analysis of multitemporal Landsat data.” Remote Sensing of Environment. 10(3):175–184.
Chen, J., P. Gong, C. He, R. Pu, and P. Shi. 2003. “Land-use/land-cover change detection using improved change-vector analysis.” Photogrammetric Engineering and Remote Sensing. 69(4):369–379.
Congalton, R. G. 2004. “Putting the map back in map accuracy assessment.” In Remote Sensing and GIS Accuracy Assessment, ed. R. S. Lunetta and J. G. Lyon, 1–11. Boca Raton, FL: CRC Press.
Gao, J., and D. Skillcorn. 1996. “Detection of land cover change from remotely sensed data: A comparative study of spatial and non-spatial methods.” Proceedings of the 8th Australasian Remote Sensing Conference, March 26–28, Canberra, CD-ROM.
Garcia-Haro, F. J., M. A. Gilabert, and J. Melia. 2001. “Monitoring fire-affected areas using Thematic Mapper data.” International Journal of Remote Sensing. 22(4):533–549.
Howarth, J. P., and E. Boasson. 1983. “Landsat digital enhancements for change detection in urban environments.” Remote Sensing of Environment. 13(2):149–160.
Im, J., and J. R. Jensen. 2005. “A change detection model based on neighborhood correlation image analysis and decision tree classification.” Remote Sensing of Environment. 99(3):326–340.
Im, J., J. R. Jensen, and J. A. Tullis. 2008. “Object-based change detection using correlation image analysis and image segmentation.” International Journal of Remote Sensing. 29(2):399–423.
Jensen, J. R. 1996. Introductory Digital Image Processing: A Remote Sensing Perspective (2nd ed.). Upper Saddle River, NJ: Prentice-Hall.
Jensen, J. R., D. J. Cowen, J. D. Althausen, S. Narumalani, and O. Weatherbee. 1993. “Evaluation of the coastwatch change detection protocol in South Carolina.” Photogrammetric Engineering and Remote Sensing. 59(6):1039–1046.
Johnson, R. D., and E. S. Kasischke. 1998. “Change vector analysis: A technique for the multispectral monitoring of land cover and condition.” International Journal of Remote Sensing. 19(3):411–426.
Lambin, E. F., and A. H. Strahler. 1994. “Change-vector analysis in multitemporal space: A tool to detect and categorize land-cover change processes using high temporal-resolution satellite data.” Remote Sensing of Environment. 48(2):231–244.
Lanjeri, S., D. Segarra, and J. Melia. 2004. “Interannual vineyard crop variability in the Castilla-La Mancha region during the period 1991–1996 with Landsat Thematic Mapper images.” International Journal of Remote Sensing. 25(12):2441–2457.
Li, X., and A. G. O. Yeh. 1998. “Principal component analysis of stacked multitemporal images for the monitoring of rapid urban expansion in the Pearl River Delta.” International Journal of Remote Sensing. 19(8):1501–1518.
Lo, C. P., and R. L. Shipman. 1990. “A GIS approach to land-use change dynamics detection.” Photogrammetric Engineering and Remote Sensing. 56(2):197–206.
Macleod, R. D., and R. G. Congalton. 1998. “A quantitative comparison of change-detection algorithms for monitoring eelgrass from remotely sensed data.” Photogrammetric Engineering and Remote Sensing. 64(3):207–216.
Malila, W. A. 1980. “Change vector analysis: An approach for detecting forest changes with Landsat.” In Proceedings of the 6th Annual Symposium on Machine Processing of Remotely Sensed Data, June 3–6, Purdue University, West Lafayette, IN, 326–335. Ann Arbor, MI: ERIM (Environmental Research Institute of Michigan).
Michalek, J. L., T. W. Wagner, J. J. Luczkovich, and R. W. Stoffle. 1993. “Multispectral change vector analysis for monitoring coastal marine environments.” Photogrammetric Engineering and Remote Sensing. 59(3):381–384.
Michener, W. K., and P. F. Houhoulis. 1997. “Detection of vegetation changes associated with extensive flooding in a forested ecosystem.” Photogrammetric Engineering and Remote Sensing. 63(12):1363–1374.
Muchoney, D. M., and B. N. Haack. 1994. “Change detection for monitoring forest defoliation.” Photogrammetric Engineering and Remote Sensing. 60(10):1234–1251.
Nackaerts, K., K. Vaesen, B. Muys, and P. Coppin. 2005. “Comparative performance of a modified change vector analysis in forest change detection.” International Journal of Remote Sensing. 26(5):839–852.
Quarmby, N. A., and J. L. Cushnie. 1989. “Monitoring urban land cover changes at the urban fringe from SPOT HRV imagery in south-east England.” International Journal of Remote Sensing. 10(6):953–963.
Schoppmann, M. W., and W. A. Tyler. 1996. “Chernobyl revisited: Monitoring change with change vector analysis.” Geocarto International. 11(1):13–27.
Singh, A. 1989. “Digital change detection techniques using remotely-sensed data.” International Journal of Remote Sensing. 10(6):989–1003.
Skillcorn, D. J. 1995. Detection of Land Cover Change at the Auckland Urban Periphery: An Integrated Approach of GIS and Remote Sensing. M.S. thesis, University of Auckland.
Townshend, J. R. G., C. O. Justice, C. Gurney, and J. McManus. 1992. “The impact of misregistration on change detection.” IEEE Transactions on Geoscience and Remote Sensing. 30(5):1054–1060.
van Oort, P. A. J. 2007. “Interpreting the change detection error matrix.” Remote Sensing of Environment. 108(1):1–8.
Warner, T. 2005. “Hyperspherical direction cosine change vector analysis.” International Journal of Remote Sensing. 26(6):1201–1215.
Weismiller, R. A., S. J. Kristof, D. K. Scholz, P. E. Anuta, and S. A. Momin. 1977. “Change detection in coastal environment.” Photogrammetric Engineering and Remote Sensing. 43:1533–1539.
Zhang, B. 1994. “Comparison error analysis of two classified satellite images.” Australian Journal of Geodesy, Photogrammetry and Surveying. 61(December):49–67.
CHAPTER 14
Integrated Image Analysis
As shown in Chap. 11, nonimage ancillary data have been used increasingly in digital image analysis to overcome the limitations of conventional image classifiers and to improve the accuracy of classification results. In turn, the improved results from digital analysis of remotely sensed data also form an invaluable data source in a geographic information system (GIS) database. With these results the database can be updated more quickly and frequently than is possible otherwise. Thus, there exists a mutually interactive and beneficial relationship between digital image analysis and GIS. In the early 1990s a new geoinformatic technology called the global positioning system (GPS), developed by the U.S. Department of Defense mainly for military uses, started to find a wide range of civilian applications. As an efficient and accurate means of spatial data acquisition, this satellite-based positioning and navigation system is able “to provide a global absolute positioning capability with respect to a consistent terrestrial reference frame” (Bock, 1996). The emergence of GPS technology has not only expanded the approaches by which it can be integrated with image analysis and GIS to achieve more accurate classification results, but also has diversified the fields to which this integrated approach can be applied. Thanks to the integrated use of GPS, digital image analysis, and GIS, more and more problems in resource management and environmental modeling can be tackled with relative ease while new fields of application have become feasible. In this chapter the fundamentals of GIS and GPS are introduced, with their major functions related to image analysis highlighted. Through this introduction the necessity for the integrated approach of analysis is justified. All possible manners in which the three disciplines have been integrated are summarized and presented graphically in four models. The applications that demand different manners of integration are systematically and comprehensively reviewed.
This chapter ends with a discussion on the prospect of and the obstacles to full integration.
14.1 GIS and Image Analysis
There is no universally accepted definition for GIS in the literature. Goodchild (1985) considered GIS “a system that uses a spatial database to provide answers to queries of a geographical nature.” Burrough and McDonnell (1998) referred to it as “a powerful set of tools for collecting, storing, retrieving at will, transforming, and displaying spatial data from the real world for a particular set of purposes.” Irrespective of its precise definition, GIS plays a critical role in digital image analysis owing to its comprehensive spatial database and powerful spatial analytical functions.
14.1.1 GIS Database
Highly similar to digital image analysis, a GIS consists of a number of components, such as software, hardware, a user interface, and peripheral devices for data input, output, and display. Essential to all GIS analyses is a spatial database that contains a large collection of spatially referenced data and their attributes. Acting as a model of reality, the data stored in the database represent a selected set or an approximation of real-world phenomena that are deemed important enough to be represented in digital form. These data originate chiefly from analog maps and aerial photographs, GPS and field data, satellite imagery, as well as statistical data (Fig. 14.1). Depending upon the nature of the data, they can be entered into the database via a keyboard, a digitizer, a scanner, or direct importing. The keyboard is the proper mode for entering nonspatial data. Both the scanner and the digitizer are suited for entering spatial data (e.g., original photographs and land cover parcels interpreted from them).
FIGURE 14.1 Sources of GIS data and methods of data input into the GIS database. Note: All spatial data must be projected to a common coordinate system before they can be fully integrated with other data in the database.
Direct importing applies to existing data that have been converted to or saved in digital format already. A common method of analog spatial data entry is to scan the materials into digital format first and then trace all features, both point and linear, using on-screen digitization. Alternatively, the analog materials may be converted directly into digital form via a digitizer. Afterwards, the acquired digital data are made useful through editing. Before they can be fully integrated with data from other sources, the spatial component of all data in the GIS database has to be transformed into a ground coordinate system common to all data layers already in the database. Nonspatial data, such as statistical data and questionnaire results, are entered into the computer either through direct porting if they are already in digital format, or via the keyboard otherwise. In either case the attribute data must be linked to spatial entities through an internally generated code or identifier before they can be queried, analyzed, and modeled. No matter where the data originate, all the captured spatial data must be represented in either vector or raster format when stored in the GIS database. Each format of representation has its own unique strengths and limitations in terms of accuracy and efficiency.
14.1.2 Vector Mode of Representation
In the vector mode of representation a real-world entity is abstracted either as a point, a line, or an area (Fig. 14.2). Exemplified by fire hydrants, hospitals, and transmission towers, point data are zero-dimensional (0D) features in terms of their topological complexity. All point entities must be represented by a pair of coordinates. They indicate the horizontal location of ground features in a ground coordinate system.
FIGURE 14.2 Digital representation of objects of varying complexity in vector mode. Notice that the accuracy of representation is determined by the sampling interval as a straight line segment is used to connect any two adjacent nodes.
FIGURE 14.3 Various forms of representing linear features in a GIS database.
Linear features such as river channels, rail tracks, and highways are one-dimensional (1D) objects that require representation by a string of points. All linear features can be represented in one of six line forms: line segment, string, arc, link, directed link, and chain (Fig. 14.3). The end point of each line segment is called a node. The space between any two consecutive points is always linked with a straight line. Thus, line segments are the simplest form of representation among all linear features. They may be used to represent a street in a block or to make up a string. Both string and arc are suitable for representing sinuous features such as river channels. Link and directed link are commonly used to represent the direction of movement associated with a linear feature, such as the traffic flow of a one-way street. Chain is a combination of string with directed link. It is suitable for representing the direction associated with a sinuous feature (e.g., the flow direction of a river channel). Areas, or polygon features, are two-dimensional (2D) objects that are represented in the same manner as linear features, except that the first node and the last node in this string of points are identical (Fig. 14.4). The specific format in which a real-world object is represented in the database is a function of the scale of representation, so the same ground feature may be represented either as a point or as an area. For instance, a 2D object such as a town or city that is normally considered an area may be represented as a point if the scale of representation is sufficiently small.
FIGURE 14.4 Representation of geographic entities of various topological complexities in vector format.
| Feature | ID Number | Representation |
|---|---|---|
| Point | I | x, y (single pair) |
| Line | II | String of x, y coordinate pairs |
| Polygon | III.I | Closed loop of x, y coordinate pairs |
| Polygon | III.II | Closed loop sharing coordinates with others |
In order to achieve efficiency and convenience in data management and retrieval, all vector data are organized into layers according to the similarity in their topological complexity, with each layer containing a unique aspect of the complex world (Fig. 14.5). This is in drastic contrast to topographic maps, which incorporate all represented features in one layer. For instance, all linear features may be separated into one layer whereas all land cover parcels (polygon features) are stored in another layer. All hydrologic features (e.g., coastline, channel networks, and watershed boundaries) may also be organized into one layer. No matter what type of features a layer contains, it must be compatible with other layers in its spatial accuracy and georeferencing system so that the same object on the ground will have the same coordinates in all layers. This method of organization has a few advantages, such as efficient retrieval of features from the database. Analysis in some applications can be performed very quickly by activating only the concerned data layers while all other irrelevant data layers are left out. The vector form of representation is precise. All real-world entities can be accurately represented with different combinations of the three fundamental elements: points, lines (and their variants), and areas.
FIGURE 14.5 The data-layer concept in a GIS database. Conceptually related spatial objects are organized into one layer known as theme or coverage (e.g., a layer may contain only stream segments or may contain streams, lakes, coastline, and swamps).
This data structure is very compact, with little data redundancy. However, the data model is very complex owing to the need to encode and store spatial relationships among geographic entities (see Sec. 14.1.5). Comprehensive encoding of topology makes it efficient to carry out certain applications (e.g., spatial queries). Nevertheless, this overhead must be updated whenever the spatial component is altered. As a consequence of the complex topology and geometric computations involved, certain GIS analyses require considerable computation, while other analytical operations (e.g., spatial modeling) are almost impossible to implement in vector mode.
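As a minimal illustration of these three fundamental elements, the sketch below expresses a point, a line, and an area with the Shapely library (assuming it is available); the coordinates are hypothetical.

```python
from shapely.geometry import Point, LineString, Polygon

# 0D: a point entity, e.g., a transmission tower, stored as one coordinate pair.
tower = Point(2668071.0, 6479489.0)

# 1D: a linear entity, e.g., a track, stored as a string of coordinate pairs;
# adjacent vertices are implicitly joined by straight line segments.
track = LineString([(0.0, 0.0), (120.0, 35.0), (260.0, 40.0), (400.0, 110.0)])

# 2D: an areal entity, e.g., a lake, stored as a closed loop of coordinates
# (the first and last vertices coincide).
lake = Polygon([(0.0, 0.0), (100.0, 0.0), (100.0, 80.0), (0.0, 80.0), (0.0, 0.0)])

print(tower.coords[:], track.length, lake.area)
```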
14.1.3 Raster Mode of Representation
In the raster mode of representation, the Earth’s surface is partitioned into a set of discrete but connected 2D arrays of grid cells (Fig. 14.6). Each cell has a regular shape. The most common shape is square, although triangular and hexagonal cells are also possible. The size of each cell is known as the resolution.
FIGURE 14.6 The raster form of feature representation. In this model of representation, all ground objects are presented as cells of a uniform size. Properties of different areal objects are represented as different cell values.
All cells have a regular orientation. Each cell can be surrounded by four or eight neighboring cells, depending upon the connectivity number adopted. All cells are referenced to the origin in the upper left corner. Cell coordinates are implicitly coded by their row and column numbers, or by the distance from the origin. Raster data are obtained by sampling the space at a regular interval, both horizontally and vertically. In this view of the world, a point feature is represented as a single cell, a linear feature as a string of cells, and an area as an array of cells (Fig. 14.6). Thus, features of different topological complexities can all be stored in the same raster layer. The raster mode of representation treats space as being made up of grid cells of different values instead of objects. Objects exist only when a group of spatially contiguous cells is examined simultaneously. The attribute at each cell is represented as a code that can be nominal, categorical (e.g., 1 for forest and 2 for water), or ratio. Since each cell can have only one code, every aspect of the real world must be represented by a separate raster layer, for instance, one layer for elevation and another layer for land cover. The raster mode of representation has a simple and uniform data structure. It is very popular for representing spatially continuous surfaces. Besides, certain GIS operations, such as overlay, modeling, and simulation, can be efficiently implemented in this data structure. This model of representation is inherently compatible with remote sensing imagery.
Therefore, it is very easy to integrate raster GIS data with remotely sensed data in undertaking sophisticated image analysis and spatial modeling. However, this data mode is limited in that the representation is very crude and highly approximate for point and linear features. The representation is also inaccurate for areal features that do not conform to a regular boundary (Fig. 14.6). Most of all, the accuracy of representation is adversely affected by the cell size. A large cell size may reduce the file size, but can cause a loss of detail. The indiscriminate partition of space into an array of uniformly sized cells is also inefficient. A huge quantity of data must be maintained even if the feature of interest does not vary much spatially or does not occupy much space, resulting in severe data redundancy. Besides, it is impossible to search the data spatially without any links between cells in space. Consequently, topology cannot be built for the data, and some analyses (e.g., network analysis) are impossible to carry out using data represented in this mode.
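A toy numpy sketch of this cell-based view is given below; the grid size, class codes, cell positions, and resolution are all hypothetical.

```python
import numpy as np

# A 10 x 10 raster layer; 0 = background, other codes are thematic classes.
layer = np.zeros((10, 10), dtype=np.uint8)

layer[2, 3] = 1          # a point feature occupies a single cell
layer[5, 1:9] = 2        # a linear feature becomes a string of cells
layer[7:10, 6:10] = 3    # an areal feature becomes a block of cells

# Cell location is implicit in the row/column indices; with the origin at the
# upper-left corner, ground offsets follow from the cell size (resolution).
resolution = 30.0        # metres per cell (hypothetical)
row, col = 2, 3
x_offset, y_offset = col * resolution, row * resolution
print(layer, x_offset, y_offset)
```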
14.1.4 Attribute Data
In addition to spatial data, the GIS database must also encompass attribute data. Attributes depict certain aspects of a spatial entity. These nongeographic properties describe the quality or degree of a selected aspect of spatial features stored in vector format. Each feature may have a number of attributes associated with it. How many attributes, or which attributes, should be retained in the database is governed by the purpose of constructing the database. The manner of storing the attribute data varies with the data format. In raster format attributes are represented as cell values, inseparable from the spatial data themselves. Thus, both spatial (e.g., location) and nonspatial data (e.g., thematic value) are stored in the same raster layer. In vector format, however, attribute data are stored separately from spatial data. Attributes associated with a data layer are commonly represented in tabular format. In this attribute table, each row is called a record, corresponding to a geographic entity in the spatial database. A column represents a unique quality, or attribute, of that entity, which can be either quantitative or qualitative (Table 14.1). Both rows and columns can be updated or expanded conveniently. New rows can be added to the table if new entities are created during a spatial operation. Obsolete records are removed from the table by deleting the relevant rows. Similarly, new attributes can be added to the table by inserting new columns. Because ground features are organized into layers according to their topological complexity, a separate attribute table must be constructed for each data layer. Attribute tables fall into three categories, corresponding to the various topological complexities of spatial entities. For instance, a point attribute table is needed for a point layer or coverage, a line table is essential for a layer containing linear features, and a polygon table is required for a coverage of polygon features. Neither geographic nor attribute data can be of much use if the two are not linked with each other. This linkage is established via an internally generated identification number or code that is unique for every spatial entity.
| ID Code | Address | Organization | X Coordinate | Y Coordinate |
|---|---|---|---|---|
| 1971 | 74 Epsom Ave. | Auckland College of Education | 2667942 | 6478421 |
| 1972 | 15 Marama Ave. | Dr. Morris Rooms | 2667970 | 6478580 |
| 1973 | 16 Park Rd. | Auckland Sexual Health Service | 2668054 | 6480627 |
| 1974 | 95 Mountain Rd. | Cairnhill Health Centre | 2668071 | 6479489 |
| 1976 | 98 Mountain Rd. | St. Joseph’s Hospice | 2668142 | 6479408 |
| 1980 | 475A Manukau Rd. | Epsom Medical Care | 2668496 | 6477045 |
| 1984 | 235 Manukau Rd. | Ranfurly Medical Centre | 2668651 | 6478119 |
| 1989 | 2 Owens Rd. | Auckland Healthcare | 2668731 | 6478789 |
| 1990 | 197 Broadway | Newmarket Medical Centre | 2668879 | 6479859 |
| 9322 | 12 St. Marks Rd. | The Vein Centre | 2669017 | 6479242 |
| 9337 | 3 St. Georges Bay Rd. | Parnell Medical Centre | 2669355 | 6481055 |
| 9403 | 383 Great North Rd. | Dr. Mackay’s Surgery | 2665674 | 6480254 |
| 9455 | 491A New North Rd. | Consulting Rooms | 2665862 | 6479590 |
| … | … | … | … | … |
TABLE 14.1 An Attribute Table for Location of Medical Facilities in Auckland
Each record in the database is automatically assigned a sequential number (Table 14.1, column 1), corresponding to the same number in the spatial layer. In addition to spatial and attribute data, the GIS database also contains topological data.
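The ID-based linkage between an attribute table and its spatial entities can be mimicked with a simple pandas join, as sketched below; the records merely echo the style of Table 14.1 and the column names are assumptions.

```python
import pandas as pd

# Attribute table: one record per spatial entity, keyed by the internal ID code.
attributes = pd.DataFrame({
    "id_code": [1971, 1972, 1973],
    "organization": ["Auckland College of Education", "Dr. Morris Rooms",
                     "Auckland Sexual Health Service"],
})

# Spatial layer reduced to its entity IDs and coordinates.
points = pd.DataFrame({
    "id_code": [1971, 1972, 1973],
    "x": [2667942, 2667970, 2668054],
    "y": [6478421, 6478580, 6480627],
})

# The shared ID code is what ties each attribute record to its spatial entity.
linked = points.merge(attributes, on="id_code")
print(linked)
```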
14.1.5 Topological Data
Topological data portray the interrelationship between different spatial entities and the relationship of one entity to other subentities in the database. The former is concerned with spatial arrangement and spatial adjacency. The latter depicts compositional relationships. Topological relationships among spatial entities must be explicitly spelled out and stored in the database if it is to be queried spatially in an efficient manner.
FIGURE 14.7 Topological relationships for three adjacent polygons.
| Polygon ID | Line Segments |
|---|---|
| 31 | 1, 2, 3, 4 |
| 32 | 5, 4, 7, 6 |
| 33 | 8, 5, 9 |

| Link ID | Left Polygon | Right Polygon | From Node | To Node |
|---|---|---|---|---|
| 1 | 0 | 31 | 1 | 2 |
| 2 | 0 | 31 | 2 | 3 |
| 3 | 0 | 31 | 3 | 4 |
| 4 | 32 | 31 | 4 | 1 |
| 5 | 33 | 32 | 6 | 1 |
| 6 | 0 | 32 | 5 | 6 |
| 7 | 0 | 32 | 4 | 5 |
| 8 | 0 | 33 | 7 | 1 |
| 9 | 0 | 33 | 6 | 7 |

| Node ID | X | Y |
|---|---|---|
| 1 | 16 | 3 |
| 2 | 3 | 3 |
| 3 | 3 | 30 |
| 4 | 13 | 30 |
| 5 | 20 | 31 |
| 6 | 29 | 15 |
| 7 | 30 | 3 |
The complexity of the topological information that has to be stored for an entity varies with its spatial dimension. Polygon features have the most complex topology. Topology for linear and point features, by comparison, is much simpler. As illustrated in Fig. 14.7, polygons 31 and 32 are adjacent as they share one common boundary (first table). Both of them are made up of four line segments (second table), each defined by two nodes. All the nodes are further defined by a pair of coordinates (third table). When encoding the topological relationship it is imperative to conform to the established convention; namely, if the clockwise direction is observed, then it should be adhered to for all polygons in the map to avoid inconsistency and potential confusion. Explicit encoding of all the potential relationships (e.g., belonging and neighboring) among spatial entities beforehand is a prerequisite to an efficient search of the database. Higher search efficiency is achieved if more relationships are stored in the database, at the expense of maintaining a larger overhead of topological data. These relationships enable queries to be answered quickly. In addition, they also make certain GIS analysis functions possible. However, it must be pointed out that not all possible spatial relationships need to be encoded explicitly. For instance, those that can be determined from calculation of node coordinates during a database query (e.g., whether two lines intersect each other) do not need to be encoded explicitly. Naturally, this absence of encoded relationships slows down such queries.
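The arc–node topology of Fig. 14.7 can be encoded directly as dictionaries, as in the sketch below; the data structure is illustrative rather than the format used by any particular GIS, and polygon adjacency is recovered from the arcs’ shared left/right polygon codes.

```python
# Arc topology from Fig. 14.7: arc ID -> (left polygon, right polygon,
# from node, to node); 0 denotes the outside (universe) polygon.
arcs = {
    1: (0, 31, 1, 2), 2: (0, 31, 2, 3), 3: (0, 31, 3, 4),
    4: (32, 31, 4, 1), 5: (33, 32, 6, 1), 6: (0, 32, 5, 6),
    7: (0, 32, 4, 5), 8: (0, 33, 7, 1), 9: (0, 33, 6, 7),
}

# Node coordinates from Fig. 14.7.
nodes = {1: (16, 3), 2: (3, 3), 3: (3, 30), 4: (13, 30),
         5: (20, 31), 6: (29, 15), 7: (30, 3)}

# Two polygons are adjacent when an arc has one on its left and the other
# on its right.
adjacent = {(left, right) for left, right, _, _ in arcs.values() if left != 0}
print(adjacent)   # {(32, 31), (33, 32)} -- 31/32 and 32/33 share boundaries
```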
14.1.6 GIS Functions
A GIS can serve a number of functions, such as data storage, data retrieval, data query, spatial analysis, and results display (Fig. 14.8). Of these functions, data collection, input, and transformation are preparatory steps for the construction of the database. These generic steps are not related directly to any particular GIS application. Once all the data are stored in the database in a proper format, they can be retrieved, queried, and analyzed, and the generated results visualized and printed if necessary. Data retrieval is a process of extracting a subset of the database and visualizing it graphically for effective communication. It takes advantage of the data storage function of a GIS. In this context GIS is treated as a data repository. Since the data are organized logically, they can be retrieved quickly and efficiently. Data query is a process of searching the database to identify all the records that meet specified criteria. In this sense, it is very similar to data retrieval. However, data query is not synonymous with data retrieval in that it can be performed on new data layers derived from spatial analysis or on nonexisting entities. In this case the attribute table has to be updated, as new information has been generated following a spatial operation. It is this newly generated information that can be queried. Spatial analyses that can be applied to the data stored in the GIS database include topographic analysis, overlay analysis, network analysis, and geostatistical analysis.
FIGURE 14.8 Functions of a typical GIS system and their relationship in the flow of data analysis.
Some of these analyses are meaningful only when the data format is appropriate. All data retrieval, database query, and/or spatial analysis may be followed by display and visualization. Of all the aforementioned GIS functions, database query and spatial analysis are so important to GIS integration with digital image analysis that they will be covered in greater depth under separate headings.
14.1.7 Database Query
Database queries can be executed either nonspatially or spatially. An aspatial query is a search of an existing attribute table (e.g., a relational database) similar to an Excel spreadsheet file. It involves data retrieval followed by result display. In this kind of query, properties of spatial objects are retrieved and/or displayed without any change to the spatial component of the database; namely, no new spatial entities are created as a result of the operation. Queries are by no means always such a simple operation of data recall. On the contrary, new attributes, such as population density and per capita income, may be created following a query. They can be inserted back into the original attribute table. The query is performed on one class of objects from one attribute table. It is executed by searching the database using a particular attribute value or a combination of several values. Nonspatial queries strongly resemble those using the structured query language (SQL), an industry-standard query language used by commercial database systems, such as ORACLE, for relational databases.
It has three keywords: SELECT, FROM, and WHERE. Their proper usage is illustrated here:

SELECT: an attribute whose values are to be retrieved.
FROM: a relational table containing the data.
WHERE: a boolean expression to identify the records.

For instance, the next example illustrates a query of the database Auckland (suburb_name, population). It identifies all suburbs having a population over 50,000:

SELECT population FROM Auckland WHERE population >50,000

Similarly, the query of a relational GIS database typically involves three essential ingredients: an attribute table, an attribute, and a selection criterion. A keyword to all queries is SELECT or RESELECT. The query is done in three steps:

1. Selection of the attribute table (e.g., auckland.pat)
2. Selection of the attribute value (e.g., population)
3. Specification of the selection criteria, if any (e.g., >50,000), which can be combined with step 2 in the form “population >50,000”

Example: The property prone to landslide must meet the following conditions: Rainfall: High; Vegetation cover: Pasture; Elevation: High; Slope gradient: Steep. Result of query: Location 3.
| Location | Rainfall | Vegetation Cover | Elevation | Slope Gradient |
|---|---|---|---|---|
| 1 | High | Shrub | Moderate | Gentle |
| 2 | Low | Forest | Low | Gentle |
| 3 | High | Pasture | High | Steep |
| 4 | Moderate | Shrub | High | Moderate |
| 5 | High | Pasture | Moderate | Steep |
| 6 | High | Pasture | High | Moderate |
FIGURE 14.9 An example of a data query using multiple criteria. In this query the property prone to landslide (location 3) is identified.
Conditional Query
More sophisticated conditional queries can be formulated by combining different attributes or different values of the same attribute through boolean logic. For instance, area <50,000 AND area >10,000 would enable all suburbs with an area between 10,000 and 50,000 ha to be selected. All properties prone to landslides are identified in the query example illustrated in Fig. 14.9; in this query only one record (location 3) meets all the selection criteria. The records that meet the selection criteria may be analyzed further to derive such statistical parameters as the sum, average, and standard deviation (see the sketch below).

Spatial queries involve the use of locational information. A query can be issued for existing features after they are displayed on the computer screen. All three types of spatial objects (point, line, and area) can be queried by directly clicking on them on screen. All attributes associated with the selected entity are then displayed, such as street name, ID number, street address, location, and so on (see Table 14.1). In addition, it is also possible to query multiple objects at a time through multiple selections, or by defining a query area within which all features are selected. The selected features are usually highlighted in a color different from that of the same class of objects to confirm their selection and to show their spatial distribution. All attribute data related to these objects are highlighted in the attribute table if it is already displayed on screen.

Query of nonexistent objects is more complex, lengthy, and difficult to implement than query of existing entities. It may have to be preceded by some kind of spatial analysis, during which the geographic area to be queried is created first. If the queried area spreads over a few polygons, the query cannot be resolved with relational algebra. Relationships not stored in the database will have to be ascertained first using computational geometry, thus prolonging the query process. There are several types of such queries. The simplest form is the point-to-point query, such as identifying all point features within a spatial range from a given spot or identifying the nearest point(s) from a designated point. Typical queries are “Where is the nearest hospital from the accident spot?,” “How many restaurants are located within 500 m from here?,” and “Where is the nearest river from a burning house?” The last query exemplifies a point-to-line query. Region or zonal queries and path queries are more complex than the above queries, and hence more difficult to undertake. An example of a zonal query is “Whose properties will be affected by a proposed landfill site or a motorway route?” To answer this kind of query, preparatory steps (e.g., buffering) have to be undertaken first to create a new polygon to be used in the query. Path queries are attempts to find the shortest route between two points in a network, such as the best route to the nearest hospital from the spot of a traffic accident. Their successful implementation requires a road network database in vector format.
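The multi-criteria selection of Fig. 14.9 can be reproduced with pandas, as sketched below; the data frame simply restates the six locations of the figure, and the column names are assumptions.

```python
import pandas as pd

# The six candidate locations of Fig. 14.9.
sites = pd.DataFrame({
    "location": [1, 2, 3, 4, 5, 6],
    "rainfall": ["High", "Low", "High", "Moderate", "High", "High"],
    "vegetation": ["Shrub", "Forest", "Pasture", "Shrub", "Pasture", "Pasture"],
    "elevation": ["Moderate", "Low", "High", "High", "Moderate", "High"],
    "slope": ["Gentle", "Gentle", "Steep", "Moderate", "Steep", "Moderate"],
})

# Combine the four selection criteria with boolean AND.
prone = sites[(sites.rainfall == "High") & (sites.vegetation == "Pasture")
              & (sites.elevation == "High") & (sites.slope == "Steep")]
print(prone.location.tolist())   # [3]
```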
The queried results can be visualized in map format to show the spatial distribution and pattern of the queried attribute. The number of attributes that can be visualized in one map is normally restricted to one, even though two are possible. The attribute value can be either numeric or categorical. Charts may also be produced for the queried results.
14.1.8 GIS Overlay Functions
Of all GIS analytical functions, overlay analysis is the most important and the most relevant to digital image analysis. Overlay analysis is defined as the placement of a cartographic representation of one theme over that of another. Conceptually, it refers to stacking one map layer or theme over the top of another to generate a third coverage, namely coverage A + coverage B = coverage C (Fig. 14.10). This very intuitive but powerful GIS function is difficult to achieve with analog maps, but is a straightforward process in the digital environment. Overlay analysis can be performed to fulfill many needs, such as showing spatial correlation among different variables, revealing cause-effect relationships among them, modifying the database or study area, and detecting changes in land cover. There are different approaches by which the two input layers are stacked, and in certain kinds of operations their sequence in the input affects the overlay outcome. Prior to being overlaid, all input layers must be georeferenced to the same ground coordinate system, even though they may not cover an identical ground area. The topology of the output layer must be rebuilt after the operation in order to reflect the fact that new spatial objects may have been created in the overlay analysis. Of special notice is that overlay is by no means restricted to only two layers, although two is the common number in practice.
FIGURE 14.10 The concept of spatial overlay analysis in GIS. All coverages involved in the analysis, including both the input and output ones, must be georeferenced to the same ground coordinate system.
If a GIS can manage only one pair of input layers at a time, it is still possible to overlay more than two layers through multiple overlay analyses. For instance, three coverages of soils, crops, and farm practices may be overlaid to predict or to help understand yield potential. Any two of them can be overlaid first before the newly created layer is overlaid with the third one. Not all input layers in an overlay analysis contain the same type of features. In fact, it is quite legitimate for the input layers to contain features of different topological complexities. For instance, one layer can be point-based school or hospital locations while another layer contains the boundaries of suburbs. Combining layers of different topological complexities fulfills different purposes of overlay analysis, such as identifying point in polygon, line on polygon, and polygon on polygon.

In point-in-polygon overlay, one layer contains point data while another contains area (polygon) entities. The area boundaries are used to group the points into spatial segments, but no new polygons are created during the overlay. The properties of the point attributes may be further studied by relating them to other statistical data, for example, to identify crime scenes in different suburbs and to explore potential factors contributing to the crimes.

In line-on-polygon overlay, one of the input layers contains linear objects and the other contains polygons. The sequence of entering the two layers in the analysis critically affects the overlay outcome. If the polygon coverage is the first input layer, then lines no longer exist in the output layer. Instead, they have become the boundaries of newly created polygons (refer to the heading “Identity” later in this section). After the analysis, more but smaller polygons are created through the intersection of polygons with lines. However, lines can also be partitioned into segments by the boundaries of the polygons if the line coverage is the first input layer. A potential application of this kind of analysis is to identify the types of land cover to be crossed by a proposed pylon and the length of the power line in each type of land cover.

In polygon-on-polygon overlay, both coverages contain areal objects. Many new but smaller polygons are created after the operation, and the topology of the output coverage needs to be rebuilt following it. There are different logic options for implementing polygon overlay, such as union, split, intersect, update, and identity. They are discussed in detail next.
Union
In union a new polygon coverage is created out of two input coverages using the boolean logic OR. The resultant output coverage retains all the features in either of the input layers. In other words, all features and attributes of both coverages are preserved in the output layer (Fig. 14.11). If the two coverages do not cover an identical ground area, the output coverage will always cover a larger area than that covered in either of the input layers.
Integrated Image Analysis
+
=
Union layer
Input layer
Output layer
FIGURE 14.11 Graphic illustration of union in overlay analysis. Any area unique to the union layer will be annexed in the output layer while the common area will not be duplicated.
New polygons are created through the intersection of arcs in the input layers. They are not formed until the postoperation stage when the topology of the newly created coverage is constructed. Its attribute table is formed by joining the coverage items of both input layers. Besides, all polygons existing in either of the input coverages prior to the operation retain their identity. The sequence of inputting the two coverages exerts no effect on the output, even though both must be polygon coverages; it is illogical to use point or line coverages as inputs in a union operation. Union differs from map join in that any area common to both layers will not be duplicated in the output layer. This operation is valuable for identifying land cover parcels whose identity has changed in change detection from multitemporal satellite images, in which case both input layers cover exactly the same ground area.
Intersect
Underpinned by the boolean logic AND, intersect creates a new coverage out of two input layers. After the two coverages are geometrically intersected through their coordinates, only the area and those features common to both the input and intersect coverages are preserved in the output layer (Fig. 14.12). Thus, the output layer always covers a smaller area than either of the input coverages if they differ in size. The attribute tables from both layers are joined as a single one, with duplicated records deleted.
FIGURE 14.12 Graphic illustration of intersect overlay analysis. The output area is common to both layers, and the areas unique to either layer in the input are clipped off in the output.
The output features are of the same class as those in the input coverage. The first (input) coverage can be point, line, or area. The intersect (second) coverage must always contain polygon features; it is not permissible to use a point or line coverage as the intersect layer. Similar to union, intersect is also a useful way of identifying land cover changes from multitemporal remotely sensed results in vector mode. In this case both the input and the intersect layers contain polygon-based land cover parcels.
Identity
Similar to all overlay analyses, identity requires two layers in the input, an input (first) coverage and an identity (second) coverage. The input layer may contain points, lines, or polygons. However, the identity coverage must be polygon-based. Since most land cover parcels are polygons, the polygon option is the most common in image analysis. With this option, all arcs in the input coverage are intersected with and split by those in the identity layer (Fig. 14.13). New polygons are formed after the topology is rebuilt to update the attribute table. Unlike union, the geographic area covered by the output layer is identical to that of the input layer only, with all entities of the input coverage retained, while the area unique to the identity layer is clipped off. However, among the features in the identity coverage, only those overlapping the spatial extent of the first (input) coverage are preserved in the output coverage. Therefore, it is important to specify the correct sequence of coverages in performing the analysis. This operation is useful in unifying the spatial extent of all data layers related to the same geographic area, which may each cover a unique area of their own initially.
Erase and Clip
The first of the two input layers in erase is regarded as the input layer and the second as the erase coverage, which defines the region to be erased. Features in the input layer overlapping the erase region (polygons) are removed after this operation (Fig. 14.14). The output coverage contains only those input features outside the erase region. The input coverage can contain point, line, or polygon features.
FIGURE 14.13 Graphic illustration of the identity operation in overlay analysis. More polygons in the output layer are created through the intersection with arcs in the identity layer.
FIGURE 14.14 Graphic illustration of erase in overlay analysis. The area enclosed inside a boundary in the erase layer will be removed from the input layer after this operation.
However, the erase coverage must always contain polygon features. Output coverage features are of the same class as the input features. Their topology must be rebuilt after this operation. Erase is a useful operation for stripping off areas of no interest in certain applications, such as the removal of land areas from an image to be used in water quality analysis.

Clip is the opposite of erase in that features in the input layer outside the clip region are removed, and those overlapping the clip region are retained in the output layer (Fig. 14.15). The input (first) coverage may be a point, line, or polygon coverage. The clip (second) coverage contains a polygon that defines the clipping region. Essentially, a portion of the input coverage is cut out using a “cookie cutter” in clip. Since only those input coverage features that fall within the clipping region are preserved in the output, the output layer is always smaller than the input layer in size, in sharp contrast to erase, in which the erased area could be anywhere inside the input layer. Clipping is a useful way of extracting features, or parts of them, from a large dataset or area. In particular, it is used very commonly in redefining the size of a remote sensing image. In this case the clipping layer contains the boundary of the study area.
Split
As a feature extraction operation, split is very similar to clip in that the input coverage is divided into a number of output coverages, each covering a subarea of the whole coverage. The input coverage may contain point, line, or polygon features, but the split coverage must always contain polygons.
FIGURE 14.15 Graphic illustration of clip in overlay analysis.
FIGURE 14.16 Graphic illustration of split in overlay analysis. The input layer is partitioned into four subcoverages as there are four polygons in the split layer.
The input coverage features are partitioned by the boundaries of the split polygons. The number of resultant coverages equals the number of polygons in the split coverage (Fig. 14.16). All the output coverages have the same feature class as the input coverage, but they are smaller than the input layer in size. Split produces an effect opposite to that of union, and it is achievable via a series of clip operations. This analysis is useful in partitioning a huge geographic area into a number of smaller areas so that each one can be analyzed separately by several analysts to speed up the process of data analysis.

Before this section ends, it must be emphasized that GIS overlay analysis sounds sophisticated but is simple to perform. At most, it is only an analytical tool, no matter how powerful it is. The driving reason behind these operations lies in the applications, not the computer operations. Apart from the specific technicalities behind different overlay operations, the image analyst needs to understand which one is best at achieving the desired objective of an application. Without such an understanding, overlay is merely an exercise in fancy graphics.
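If the GeoPandas library is available, the overlay options above map roughly onto its overlay and clip operations, as sketched below; the file names are hypothetical, and erase is approximated here with the difference option.

```python
import geopandas as gpd

# Two polygon coverages already georeferenced to the same coordinate system
# (hypothetical file names).
layer_a = gpd.read_file("landcover_1990.shp")
layer_b = gpd.read_file("landcover_2000.shp")

union_cov = gpd.overlay(layer_a, layer_b, how="union")              # boolean OR
intersect_cov = gpd.overlay(layer_a, layer_b, how="intersection")   # boolean AND
identity_cov = gpd.overlay(layer_a, layer_b, how="identity")        # keep extent of layer_a
erased_cov = gpd.overlay(layer_a, layer_b, how="difference")        # remove layer_b's region
clipped_cov = gpd.clip(layer_a, layer_b)                            # "cookie cutter" clip
```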
14.1.9 Errors in Overlay Analysis
When multiple layers are input into the computer to perform one of the spatial overlay analyses discussed above, the intersection of all arcs in the respective coverages creates many new polygons. While some of them are genuinely formed by arcs representing the boundaries of different polygons in separate input layers, others are artificially created by arcs that presumably represent the same boundaries but whose horizontal positions have shifted slightly among the different data sources. Characterized by a small size and an elongated (skinny) shape, these spurious polygons are called sliver polygons (Fig. 14.17).
FIGURE 14.17 Formation of spurious polygons in overlay analysis owing to a slight shift in the position of the same boundary in different input layers.
Composed of only two line segments in most cases, these skinny polygons have a large perimeter-to-area ratio. Sliver polygons can be formed for three reasons:

1. The horizontal position of the object being depicted has indeed shifted. For instance, a river channel could have shifted its course in the interim when it is surveyed in different seasons or years. If the channel is represented with a single line in different layers, sliver polygons will result.

2. The same boundary is depicted at different scales in different data sources. Different scales mean different levels of generalization for the same boundary. For instance, the same coastline looks slightly different at a large scale from the way it looks at a small scale.

3. Finally and more likely, sliver polygons are caused by minor artificial variations in indicating the boundary. The same boundary is not identical in all input layers because its nature is ambiguous or fuzzy, or its representation is not error free. For instance, soil boundaries are rather fuzzy, and different pedologists may interpret and draw the same soil boundary differently. Moreover, artificial changes are inevitably introduced into its representation during digitization. Even if the same source is used, it is unlikely that the same boundary is captured identically by different operators, or even by the same operator at different times, owing to the use of varying sampling intervals (Fig. 14.18). Consequently, the same linear feature can look slightly different from one layer to the next.

For these reasons, no two boundaries are exactly the same. That is to say, boundary inconsistency is the norm in the captured digital data, and sliver polygons are inevitable in the overlaid results. The critical issue is how to deal with them. The varying position of the same boundary in different input layers can be resolved through conflation, a procedure of reconciling the differences in boundaries by dissolving sliver polygons. Sliver polygons may be removed from the resultant output coverage by two means. The first method is to average the two sets of boundary lines if the reliability of both boundaries is the same or unknown.
FIGURE 14.18 The impact of sampling interval on the appearance of a curved boundary (dashed line) in digital format. Solid line: the captured line consisting of line segments. Notice how the curve is more generalized at a longer interval.
This can be achieved by breaking apart the intersecting boundaries of the sliver polygons and then removing both line segments. The two dangling nodes left behind are then joined with a straight line. In this way the newly drawn line falls roughly in the middle of the dissolved sliver polygon. This process is lengthy and tedious, as every spurious polygon has to be identified and eliminated manually. A better and more efficient alternative is to eliminate them automatically. The DELETE command may be accompanied by a logic expression that defines the characteristics of the polygons to be eliminated. A common elimination criterion is polygon size: all polygons, both genuine and spurious, are eliminated after the operation so long as their area falls below the specified threshold. Thus, it is important to set the threshold carefully so that genuine polygons are not affected. Needless to say, this method is much faster than the manual one. In the output layer, the removal is accomplished by dropping one of the longest shared borders between them. A more sensible way is to remove the segment of border with the higher positional uncertainty. Of course, this has to be done manually, thus prolonging the process of conflation.
14.1.10 Relevance of GIS to Image Analysis
GIS is related closely to image analysis in at least two areas. First, it provides a framework for preparing the data for analysis and for undertaking change detection analysis. Second, it is able to supply a huge amount of nonsatellite data in knowledge-based image analysis.
Analytical Framework

Although most of the data used in image analysis are obtained from a sensor aboard a satellite or an aircraft, rudimentary GIS analysis is essential for getting the data into the right shape during data preparation. In particular, the data must be subset to an appropriate size and shape that follows the study area closely, so that precious time can be spared in subsequent analyses. The redefinition of a study area using its boundary file is effectively carried out with the clip function in a GIS overlay, as shown above. In digital image analysis it may also be necessary to undertake a long-term longitudinal study of the geographic problem under investigation, in addition to analyzing remotely sensed data acquired at a single time. This involves spatial comparison of multitemporal results derived from the analysis of satellite image data of the same geographic area recorded at different times. Such a spatial comparison is, in concept, a spatial overlay analysis, and it is best undertaken in a GIS if the results are in vector format.
Supplier of Ancillary Data

As shown in Table 14.2, a vast variety of spatial data are stored in a GIS database, all of which must have been properly edited and georeferenced to a common ground coordinate system.
Data Category        Example Layers
Topographic          Elevation, gradient, orientation
Hydrologic           Stream channels, lakes and reservoirs, watershed, coastline
Environmental        Soil pH, floodplain, protected reserves
Natural resources    Vegetation, land cover, farmland
Transport            Bus stops, passenger rail network, highway

TABLE 14.2  Exemplary Data Stored in a GIS Database
While some of these data are best represented in vector format, others are more suited to raster representation. They can be easily exported to an image analysis system with a simple change in data format, or used directly without any change if the system is able to read the GIS data. They are potential sources of external knowledge (e.g., a residential area must have a slope gradient of <10°) in knowledge-based image analysis, applied in an effort to improve the accuracy of digital image analysis. These data can also serve as constraints to exclude certain areas (e.g., removal of land pixels) in applications such as the analysis of water quality.
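The constraint-style use of GIS layers mentioned above can be illustrated with a short raster sketch. The arrays, class codes, and layer names below are invented for the example; the 10° slope rule itself comes from the text.

```python
import numpy as np

RESIDENTIAL, OTHER, NODATA = 1, 2, 0   # hypothetical class codes

# Hypothetical co-registered rasters: a classified image and a DEM-derived slope layer (degrees).
classified = np.array([[1, 1, 2],
                       [1, 2, 2],
                       [2, 1, 1]])
slope_deg  = np.array([[ 3.0,  8.0, 12.0],
                       [15.0,  2.0,  4.0],
                       [ 6.0, 25.0,  1.0]])

# Knowledge-based rule: residential cover is implausible on slopes of 10 degrees or more,
# so such pixels are flagged for reclassification (here simply set to NODATA).
implausible = (classified == RESIDENTIAL) & (slope_deg >= 10.0)
refined = np.where(implausible, NODATA, classified)

# The same mechanism can exclude areas entirely, e.g., masking land pixels out of a
# water-quality analysis with a hypothetical land/water layer (1 = land).
land_mask = np.array([[0, 0, 1],
                      [0, 1, 1],
                      [1, 0, 0]])
water_only = np.where(land_mask == 1, NODATA, classified)
print(refined)
print(water_only)
```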
14.2 GPS and Image Analysis

14.2.1 Principles of GPS
Primarily, GPS is a ranging device for measuring the distance between a satellite in space and a GPS receiver on the Earth's surface. The location of any point on the Earth's surface in a global reference system can be determined if the distances between this receiver and three satellites are known simultaneously.
The precise determination of location relies on the spread-spectrum signals broadcast by the satellites in space. In total, there are 28 satellites in the constellation, a quantity that guarantees a minimum of four satellites are always above the horizon and within the field of view of a GPS receiver on the Earth. Each satellite periodically transmits a coded signal of unique phase and amplitude, controlled by atomic clocks. The distance between each satellite and the GPS receiver is determined by tracking the transmission duration (the travel time) of the signal from the satellite to the receiver. The duration of signal propagation is determined by correlating the phase of the signal from the satellite with that generated by the receiver. The distance is then calculated by multiplying the travel time by the signal's propagation speed, using the following equation:
Range = v × t    (14.1)

where v = velocity of the signal, a constant with a value of 3 × 10^8 m/s
      t = duration of the signal's travel from the satellite to the receiver

The determination of a receiver's location on the Earth's surface is accomplished via simultaneous tracking of the signals from four satellites by the same receiver. If the distance between one satellite and the receiver is known, the potential location of the receiver is restricted to a sphere whose radius is that distance, D1 (Fig. 14.19a). If the distance from a second satellite to the same receiver, D2, is known simultaneously, the potential location of the receiver is further restricted to the commonality of the two respective spheres, that is, their intersection, which is a circle (Fig. 14.19b). Moreover, if the distance from a third satellite, D3, is also known at the same time, the location of the receiver must be common to all three spheres, that is, the intersection of this circle with the sphere of the third satellite (Fig. 14.19c). This restricts the potential location of the receiver to only two points, E and F, the outcome of intersecting the three spheres. Since one of these points lies at an unreasonably large or small distance from the satellites, falling outside the possible range between a satellite and the Earth's surface, it can be eliminated with high confidence. Theoretically, therefore, the location of the receiver can be known for certain from simultaneous tracking of the signals from three satellites. In reality, a fourth satellite is also tracked to ensure a higher positioning accuracy (refer to Sec. 14.2.3). The location of the receiver is then determined from the four ranges, as well as the positions of the satellites, whose orbital parameters in space are precisely known at any given moment.
FIGURE 14.19 The location of a receiver on the Earth’s surface can be determined by simultaneously tracking the distance between it and three satellites in space. (a) The potential position of the receiver based on its distance to one satellite; (b) the potential position of the receiver based on its distance to two satellites; (c) the potential position of the receiver based on its distance to three satellites. (Source: Modified from Trimble, 2007.)
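To make the geometry concrete, here is a minimal numerical sketch of Eq. (14.1) and of solving for a receiver position from simultaneous ranges. The Gauss-Newton solver, the satellite coordinates, and the variable names are illustrative assumptions rather than how any particular receiver firmware works, and for clarity the sketch ignores the receiver clock bias that a real receiver estimates with the help of the fourth range.

```python
import numpy as np

C = 3.0e8  # signal propagation speed used in Eq. (14.1), in m/s

def pseudorange(travel_time_s):
    """Eq. (14.1): range = v * t."""
    return C * travel_time_s

def solve_position(sat_xyz, ranges, x0=None, iterations=10):
    """Estimate receiver position from >= 3 simultaneous satellite ranges.

    sat_xyz : (n, 3) array of satellite positions in an Earth-centered frame (m).
    ranges  : (n,) array of measured ranges (m).
    """
    x = np.zeros(3) if x0 is None else np.asarray(x0, dtype=float)
    for _ in range(iterations):
        diff = x - sat_xyz                    # vectors from satellites to the trial position
        dist = np.linalg.norm(diff, axis=1)   # predicted ranges at the trial position
        J = diff / dist[:, None]              # Jacobian of each range w.r.t. position
        dx, *_ = np.linalg.lstsq(J, ranges - dist, rcond=None)
        x = x + dx                            # Gauss-Newton update
    return x

# Hypothetical example: four satellites and a receiver near the Earth's surface.
sats = np.array([[20.2e6, 0.0, 10.0e6],
                 [0.0, 20.2e6, 10.0e6],
                 [-20.2e6, 0.0, 10.0e6],
                 [0.0, -15.0e6, 18.0e6]])
truth = np.array([1.0e6, 2.0e6, 6.2e6])
obs = np.linalg.norm(sats - truth, axis=1)    # noise-free ranges for the demonstration
print(solve_position(sats, obs))              # converges to approximately `truth`
```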
14.2.2 GPS Accuracy
The accuracy of GPS readings can be degraded by a number of factors, of which the four most important ones are clock errors, the geometry of the satellite position, the atmospheric composition, and the logging environment.
Clock Inaccuracy

The atomic clocks aboard the GPS satellites are highly accurate, at an accuracy level of 9 × 10^−7 s. The accuracy of the clock in a receiver, however, is much lower, to keep the receiver affordable. As shown in Eq. (14.1), any inaccuracy in timing the duration of GPS signal propagation through the atmosphere is magnified by a factor of 3 × 10^8. The ultimate effect of timing inaccuracy shows up in the derived ranges between the receiver and the GPS satellites. Instead of converging at a point, these ranges form a small triangle (Fig. 14.20). This effect can be resolved by adjusting the ranges so that they converge at a point; in this way the inaccuracy of the receiver clock is partially canceled out.
FIGURE 14.20 Nonconvergence of three ranges from three GPS satellites due to errors in timing and other factors. This can be resolved through adjustment of the ranges (dashed lines) to make all spheres converge at a point.
Satellite Position

The second factor is the geometric configuration of the satellites in space with respect to the observing spot, a phenomenon commonly known as geometric dilution of precision (GDOP). Radio signals from different satellites eventually converge at the receiver (Fig. 14.21). Owing to the inaccuracy factors mentioned above, the signals will not converge at a point; instead, they most likely intersect within a small zone. The physical size of this zone represents the uncertainty in positioning the receiver.
FIGURE 14.21 Uncertainty formed by the intersection of signals from two satellites due to their geometric configuration. (a) Good intersection with a small GDOP; (b) poor intersection with a large GDOP.
Therefore, the smaller the intersection zone, the less uncertain the logged position. The shape and size of this zone vary with the direction of satellite signal propagation. The best configuration is achieved when the signals from two adjacent satellites intersect at a right angle, resulting in the smallest intersection zone (Fig. 14.21a). A large zone of uncertainty is created if the satellites are located close to each other in space (Fig. 14.21b). This uncertainty is minimized by tracking only those satellites that form a small GDOP among all the satellites in the field of view. This can be accomplished by specifying an appropriate GDOP threshold in the receiver settings; data logging is automatically turned off whenever this threshold is reached or surpassed.
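GDOP can be quantified directly from the satellite-receiver geometry. The sketch below uses the conventional formulation based on a geometry matrix of unit line-of-sight vectors (plus a clock column); the coordinates are made-up values chosen only to contrast a well-spread and a clustered constellation.

```python
import numpy as np

def gdop(sat_xyz, receiver_xyz):
    """Geometric dilution of precision for >= 4 satellites in view.

    Builds the geometry matrix of unit line-of-sight vectors (plus a clock column)
    and returns sqrt(trace((A^T A)^-1)); smaller values mean better geometry.
    """
    los = np.asarray(sat_xyz, float) - np.asarray(receiver_xyz, float)
    unit = los / np.linalg.norm(los, axis=1, keepdims=True)
    A = np.hstack([unit, np.ones((len(unit), 1))])
    Q = np.linalg.inv(A.T @ A)
    return float(np.sqrt(np.trace(Q)))

receiver = np.array([0.0, 0.0, 6.37e6])          # roughly on the Earth's surface
spread   = np.array([[2.0e7, 0, 1.5e7], [0, 2.0e7, 1.5e7],
                     [-2.0e7, 0, 1.5e7], [0, -2.0e7, 1.5e7]])   # widely separated satellites
bunched  = np.array([[2.0e7, 0, 1.5e7], [1.9e7, 2.0e6, 1.5e7],
                     [1.8e7, -1.0e6, 1.6e7], [2.1e7, 1.0e6, 1.4e7]])  # clustered satellites
print(gdop(spread, receiver), gdop(bunched, receiver))  # the clustered geometry yields a much larger GDOP
```

A receiver's GDOP mask works like a threshold on this value: while the computed GDOP exceeds the threshold, fixes are rejected or logging is suspended.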
Atmospheric Composition

The physical composition of the atmosphere affects the velocity at which electromagnetic radiation propagates through it. The nominal propagation speed of the signal in the atmosphere (3 × 10^8 m/s) varies with the vertical composition and heterogeneity of the atmosphere. In particular, ions in the atmosphere interfere with the signal and reduce its velocity. The effect of ionospheric refraction can be eliminated through the use of dual frequencies in transmitting the signal (Hofmann-Wellenhof et al., 1997). Thus, dual-frequency GPS receivers are more accurate than single-frequency receivers.
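The dual-frequency correction works because the ionospheric delay scales with the inverse square of the carrier frequency, so measurements on two frequencies can be combined to cancel it. Below is a sketch of the standard ionosphere-free pseudorange combination; the frequency constants are those of the GPS L1 and L2 carriers, while the pseudorange numbers are invented for illustration.

```python
# Ionosphere-free pseudorange combination from dual-frequency measurements.
F_L1 = 1575.42e6   # GPS L1 carrier frequency (Hz)
F_L2 = 1227.60e6   # GPS L2 carrier frequency (Hz)

def ionosphere_free(p1, p2, f1=F_L1, f2=F_L2):
    """Combine pseudoranges measured on two frequencies so the first-order
    ionospheric delay (proportional to 1/f^2) cancels out."""
    return (f1**2 * p1 - f2**2 * p2) / (f1**2 - f2**2)

# Hypothetical pseudoranges (m): the lower frequency suffers a larger ionospheric delay.
p_l1 = 22_000_005.0
p_l2 = 22_000_008.2
print(ionosphere_free(p_l1, p_l2))   # close to the delay-free geometric range
```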
Logging Environment

The fourth factor is the logging environment, that is, the ambient surroundings of the GPS receiver. Once transmitted, the radio signal from a satellite propagates through the atmosphere in a straight line. Upon reaching the surface of the Earth, however, the path of the signal may be bent or disrupted by tall buildings in the vicinity of the receiver. Instead of reaching the receiver directly from the satellite (path 2 in Fig. 14.22), the signal first strikes taller obstacles nearby and is deflected sideways before it finally reaches the receiver (paths 1 and 3 in Fig. 14.22). This phenomenon, known as the multipath problem, is rather common in urban areas with many high-rise buildings. It is also a very important consideration in a forest setting. In the worst case, tall trees near a GPS receiver can prevent the signal from reaching the receiver altogether; in a less severe situation, the signal is so drastically weakened that data logging is possible only intermittently. A viable means of overcoming this problem is to mount the receiver on a pole raised above the canopy.
14.2.3 Improvements in GPS Accuracy
Since the accuracy of GPS readings is degraded by a number of factors, it may be necessary to improve it to meet the requirements of certain image analysis applications. The reliability of logged GPS readings can be improved via two approaches, differential correction and the averaging of multiple readings at the same location, or by their combination.
FIGURE 14.22 The multipath problem in GPS data logging, in which the travel time of the signal from the satellite to the receiver is prolonged owing to deflection by a nearby barrier before the signal eventually reaches the GPS receiver.
These two approaches work on different principles, have different requirements, and suit data logged in different modes. Differential GPS requires the use of at least two receivers that track the signals from the same set of satellites. One receiver, called the base station, is placed at a known position (e.g., a survey mark); the positioning inaccuracy, or offset, between the true position and the GPS-logged one is therefore precisely known at any given moment. The other receiver, called the rover, is deployed in the field, tracking exactly the same signals as the base receiver simultaneously. The principle underpinning differential correction is that the inaccuracy inherent in the GPS satellite signals remains roughly unchanged within a certain geographic range (e.g., <500 km), so the time-tagged offset observed at the base station is the same as that experienced in the field. The disparity between the GPS coordinates and the true location of the base station is used to offset the coordinates measured by the rover receiver in the field. With differential correction, GPS readings can be accurate to the submeter level or better. This method of correction is applicable to all GPS readings, whether they are obtained statically at a point or dynamically along a route, so long as the timing of data logging is retained in the logged file. Differential GPS, nevertheless, cannot be implemented if the user has access to only one GPS unit, or where there are no known or suitable survey marks in the vicinity of the rover receiver, a problem common in some geographic regions of the world. This deficiency may be overcome by placing the rover receiver stationary at a position whose true coordinates can be determined via averaging of multiple readings.
The rationale underlying this improvement is that some of the errors caused by the aforementioned factors are spatially random at a given location. Certain inaccuracies will therefore cancel each other out, at least partially, when multiple readings logged at the same location are averaged. Consequently, the averaged position is much closer to the true location than the individually logged positions. It has been demonstrated that averaging multiple loggings at the same spot enhances the reliability of a GPS-derived location (August et al., 1994). If the number of loggings is sufficiently large, the errors inherent in the raw coordinate readings can cancel one another to such a degree that the GPS accuracy becomes high enough to meet certain application requirements. As shown in Table 14.3, the averaged discrepancies between uncorrected and corrected GPS coordinates, expressed as distance, are inversely related to the number of positions logged. The mean discrepancy drops sharply from 57.53 to 21.84 m as the number of loggings doubles from 15 to 30. Even at 30 loggings, averaged GPS coordinates are still more accurate than those read from a 1:50,000 topographic map (Table 14.3). However, averaging more than 30 loggings brings little further improvement to the reliability of uncorrected GPS coordinates, as the discrepancy tends to stabilize at a large number of loggings. Averaging is therefore a legitimate and effective method of correcting point-based loggings up to a certain degree. Much more accurate positioning results are obtainable by combining the two methods, namely, by averaging all differentially corrected GPS coordinates of the same location. This combined method improves GPS accuracy substantially more than either method alone.
                       Distribution of Discrepancies (m)
No. of Data Loggings   ≤10    10–20    20–30    ≥30      Mean Discrepancy (m)
180                     7       9        3       1              13.99
90                      8       6        4       2              14.36
60                      6       8        2       4              23.90
45                      4      11        2       3              19.31
30                      5       9        2       4              21.84
15                      7       6        3       4              57.53
Topographic map*        3       5        3       9              33.00

* The map used to obtain the coordinates has a scale of 1:50,000.
Source: Modified from Gao (2001).

TABLE 14.3  Mean Discrepancy between 20 Pairs of Differentially Corrected and Uncorrected Coordinates, Expressed as Distance
It must be noted that averaging is applicable only to GPS readings logged at a stationary position. In other words, it is not applicable to data logged in a dynamic environment, such as from a vehicle moving along a route.
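The two correction strategies just described reduce to very simple arithmetic. The sketch below is illustrative only: the coordinates are invented, a planar easting/northing frame is assumed, and real differential correction is applied epoch by epoch to time-matched observations rather than as a single constant shift.

```python
import numpy as np

# --- Averaging repeated loggings at a stationary point -------------------------
# Hypothetical raw fixes (easting, northing in metres) logged at the same spot.
fixes = np.array([[412035.2, 5916250.8],
                  [412041.7, 5916244.1],
                  [412029.9, 5916258.3],
                  [412038.4, 5916247.6]])
averaged = fixes.mean(axis=0)        # random error components partially cancel

# --- Differential correction using a base station ------------------------------
base_true   = np.array([410000.0, 5915000.0])   # surveyed coordinates of the base station
base_logged = np.array([410006.3, 5914995.1])   # what the base receiver recorded at that epoch
offset = base_true - base_logged                 # time-tagged error common to both receivers

rover_logged = np.array([412037.0, 5916249.0])   # rover fix at the same epoch
rover_corrected = rover_logged + offset          # shift the rover fix by the base-station offset
print(averaged, rover_corrected)
```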
14.2.4 Relevance of GPS to Image Analysis
GPS is relevant to image analysis in two areas: image georeferencing and the supply of external data in knowledge-based image analysis. Ground control point (GCP) coordinates, essential in geometrically rectifying satellite imagery (see Sec. 5.5.3), used to be routinely acquired from analog topographic maps prior to the era of GPS because of their ready availability and high geometric fidelity. Topographic maps, however, may not be available at the right scale or for all parts of the world. They are ineffective in areas where no distinct landmarks are present or where the scene has undergone drastic changes since the maps were compiled, and they may not show landmarks that can be reliably located in satellite imagery, owing to map generalization. Most importantly, map-derived coordinates may not be sufficiently accurate to meet accuracy requirements, especially for very high resolution satellite imagery. These problems can be overcome with GPS technology. GPS receivers can log the coordinates of GCPs at a much higher accuracy level. Even uncorrected GPS coordinates have accuracies higher than those read from 1:20,000 topographic maps in an urban area (Table 14.4). Of the 20 pairs of coordinates for the GCPs, only 6 have a distance discrepancy larger than the corresponding figure for the topographic map, and all the discrepancies have a root-mean-square (RMS) error of only 17.02 m, smaller than the 25.93 m achieved from the topographic map. This outcome demonstrates the potential of uncorrected GPS coordinates as a means of establishing ground control in geometrically rectifying remotely sensed imagery. More reliable GPS coordinates lead to more accurate image rectification. If the GPS coordinates are differentially corrected, a much higher accuracy is achieved in geometrically rectifying satellite images, irrespective of their spatial resolution (Table 14.5). By comparison, GCP coordinates derived from topographic maps in most cases result in the lowest accuracy, even lower than uncorrected GPS coordinates. Coordinates averaged from as few as 30 uncorrected GPS loggings achieved rectification accuracy comparable to that of a 1:20,000 topographic map. This demonstrates that GPS without differential correction can replace topographic maps as an alternative, reliable source of geometric control in rectifying satellite imagery. Although primarily a positioning technology, GPS can also be used to undertake linear measurements (profiling) if a receiver is carried aboard a traveling vehicle while data are logged at a regular temporal or spatial interval. The obtained linear feature can be directly imported into an image analysis session as an additional constraint in knowledge-based image classification. GPS also plays an indispensable role in postclassification processing, such as assessment of classification accuracy: the genuine identity of all selected evaluation pixels must be verified in the field, and an efficient method of navigating to these spots is via a GPS unit.
TABLE 14.4  Comparison of Discrepancies between Differentially Corrected GPS Coordinates and Those from Averaged GPS Loggings and Topographic Maps (Unit: m)
Coordinate Sources                 SPOT PAN (10 m)         SPOT XSL (20 m)         Landsat TM (30 m)
(No. of Loggings Used
in Averaging)                    In Pixels   In Meters   In Pixels   In Meters   In Pixels   In Meters
180                                1.71        17.1        0.86        17.3        0.81        24.2
90                                 1.72        17.2        0.92        18.4        0.87        26.0
45                                 2.45        24.5        1.36        27.2        1.20        35.9
30                                 3.02        30.2        1.68        33.6        1.38        41.5
Differentially corrected           0.76         7.6        0.63        12.8        0.66        19.9
Topographic maps                   3.26        32.6        1.60        32.0        1.34        40.2

* A total of 18 or 19 GCPs were used.
Source: Modified from Gao, 2001.

TABLE 14.5  Comparison of Image Rectification Accuracy (RMS Errors) among Uncorrected and Differentially Corrected GPS, and Map-Derived Coordinates*
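The RMS discrepancy figures quoted above (e.g., 17.02 m versus 25.93 m) come from comparing two sets of coordinates for the same control points. A minimal sketch of that computation is given below; the coordinate pairs are invented for illustration.

```python
import numpy as np

def rms_discrepancy(coords_a, coords_b):
    """Root-mean-square distance between matched coordinate pairs (same units as the input)."""
    d = np.asarray(coords_a, float) - np.asarray(coords_b, float)
    return float(np.sqrt(np.mean(np.sum(d**2, axis=1))))

# Hypothetical GCP coordinates: a reference set (e.g., differentially corrected) vs. a test source.
reference = np.array([[412035.0, 5916250.0], [413220.0, 5917105.0], [411840.0, 5915980.0]])
test      = np.array([[412047.0, 5916241.0], [413208.0, 5917120.0], [411851.0, 5915965.0]])
print(rms_discrepancy(reference, test))   # about 17.7 m for these made-up offsets
```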
14.3 Necessity of Integration

The strength of GIS lies in its comprehensive database and its powerful analytical capability for handling a wide range of georeferenced data, including those obtained by means of digital image analysis. These functions, however, cannot be fully realized if the GIS database is incomplete, inaccurate, or obsolete. The data contained in a GIS database are spatial (e.g., boundaries of land cover parcels), thematic (e.g., types of land cover), or topological. As shown in Fig. 14.1, the spatial data and the thematic data associated with them used to originate from existing thematic maps. These secondary data sources may not show all desired features. Some data already in the database, such as roads in an urban area, could have changed owing to rapid urban sprawl, and new roads could have been constructed since the map was compiled. The existing database therefore needs to be updated and expanded. Database updating can be easily achieved with image analysis or GPS. Aerial photographs and satellite images are original data sources,
and satellite images in particular are much more current than topographic and thematic maps. GPS is an efficient method of collecting up-to-date data in a timely fashion. The integration of image analysis and GPS with GIS will considerably supplement and diversify the spatial data needed in a GIS database, which is especially strong at integrating data from such diverse sources. Remote sensing deals with the gathering and processing of information about the Earth's environment, particularly its natural and cultural resources, through the use of photographs and related data acquired from satellites. Remote sensing distinguishes itself from GPS and GIS by its strong data acquisition capability. It is able to supply a wide variety of spatial data, acquired from various sensors and platforms over a wide range of the spectrum, to a GIS database. It is even possible to obtain highly accurate topographic data (e.g., LiDAR data) directly from an aircraft. Before these data can be integrated with data from other sources for a particular application, they must be processed to various levels to meet the necessary accuracy standards and detail requirements. Remote sensing image analysis, therefore, can no longer function as a stand-alone discipline that delivers definitive end products, such as maps, statistics, tables, and/or reports; it must be integrated with other systems in order to serve a broader range of applications. An important processing function of image analysis is geometric rectification. All original aerial photographs and satellite images are geographically referenced to their own local coordinate systems. They must be transformed to a common or universal system if multiple frames of images are involved in a study, or if they are to be spatially overlaid with data from other sources. Image georeferencing requires some form of geometric control on the ground, which can be provided by GPS either in real time or in a postacquisition session. Point-based data at selected landmarks are essential in georeferencing existing remotely sensed images or those scanned from analog photographs. Nevertheless, GPS can never replace aerial photographs or satellite images because it can log, at most, nongraphic data along linear features. Although GPS is able to acquire data on its own without assistance from the other two technologies, its function can be enormously enhanced or expanded if integrated with remote sensing. The use of aerial photographs and satellite imagery facilitates navigation: the target area is easily identified from the graphic images beforehand, and the GPS user can gain confidence from these photographs and images in navigating to the destination in the field and in forming a better picture of the surrounding environment. Automatic navigation to a desired destination also relies on road and street maps, both of which can be extracted from a GIS database. Even the routing from the origin to the destination can be fully automated in a GIS with the assistance of a road-network layer.
This discussion demonstrates that digital image analysis, GIS, and GPS have complementary functions, each with its own unique strengths and limitations in certain aspects of data acquisition or data analysis. Their whole is greater than the sum of the individual parts. Functioning individually, each technology may perform poorly or inadequately in certain geospatial applications; conversely, their unique strengths can be maximized through integration with the other technologies. In particular, this integrated approach has considerable potential in resource management and environmental monitoring (e.g., wildfire fighting). Integration also broadens the scope of problems to which the technologies are applicable and opens up new applications such as real-time emergency response. The integration of image analysis with GPS and GIS, in combination with ground monitoring systems, is an efficient way of managing, analyzing, and outputting spatial data for regional resources management. More areas of application become possible with innovative forms of integration.
14.4 Models of Integration

There is a wide range of approaches by which remote sensing image analysis can be integrated with GPS and GIS, depending upon the purpose of integration. These approaches are best conceptualized and summarized in four models (linear, interactive, hierarchical, and complex), each having its own characteristics.
14.4.1 Linear Integration
This model of integration maximizes the unique strengths of each discipline. Namely, GPS is employed to provide geometric control for remotely sensed data such as aerial photographs and satellite images. The photographs and images that have been rectified to the required accuracy level are then exported to a GIS database, creating a linear data flow from GPS to remote sensing and ultimately to GIS (Fig. 14.23). This data flow suggests that the three components do not play an equal role in the integration. GPS is the least significant, since it is destined to supply point-based data for image analysis; such data are absent from the final product of the integration, as there is no direct link between GPS and GIS. The function of GPS in the integration is to bridge the gap between remote sensing data and other data in the GIS database. Specifically, satellite images and aerial photographs are standardized to a coordinate system common to all other GIS data layers with the assistance of the GPS data.
FIGURE 14.23 The linear model of integration, with a unidirectional data flow from GPS to remote sensing image analysis and ultimately to GIS. (Source: Gao, 2002.)
Thus, remote sensing plays an ancillary role by feeding data to GIS, the ultimate destination of the integration, and all subsequent spatial analyses and modeling are carried out in GIS as well. The integration of GPS with remote sensing can be implemented in one of two temporal modes, independent or simultaneous, both of which must be fulfilled prior to integration with GIS. The independent mode applies to existing remotely sensed images that were recorded before GPS technology was invented or widely used; the images and the GPS data used to rectify them are acquired independently of each other. The accuracy of integration is affected by the quantity and spatial distribution of the logged GCPs, the reliability of the GPS coordinates (e.g., whether they have been differentially corrected), and the functionality of the GPS receiver used to log the data. As GPS technology is now widely used, the independent mode of implementation has gradually been replaced by simultaneous integration, which is increasingly becoming the norm. It is accomplished by equipping the platform with a GPS unit during flights that record remotely sensed data (refer to "Direct Georeferencing" in Sec. 5.8). GPS helps to navigate to the study area and to obtain additional information on the sensor at the time of imaging. Thanks to the deployment of GPS, which forms an integral part of the inertial navigation system, the position and orientation of every photograph or image are precisely known in space. Such parameters enable rectification of the acquired imagery in real time. Moreover, this simultaneous mode of integration not only eliminates the need for establishing costly ground control but also adds flexibility to data acquisition: more information on the photographs/images can be acquired with less time and effort, at a higher productivity.
14.4.2 Interactive Integration
At first glance, the structure of this model of integration closely resembles the linear model (Fig. 14.24). Upon closer scrutiny, however, major differences in data flow exist between the two. Instead of flowing unidirectionally, data flow interactively between GPS and remote sensing image analysis, as well as between image analysis and GIS. This interactive nature makes it difficult to judge their relative significance. In addition to feeding data to image analysis as in the linear model, GPS also receives data from image analysis, mostly in a postanalysis session.
FIGURE 14.24 The interactive model of integration. (Source: Modified from Gao, 2002.)
This kind of integration is termed postprocessing in that the remote sensing images have already been classified and analyzed. The classification results need to be verified in the field, together with the genuine identity of the evaluation pixels used in accuracy assessment. Furthermore, changes in land cover detected from the analysis of multitemporal remote sensing data must be verified in the field. In both cases a GPS unit is essential in guiding the analyst to the sites of interest (Haack et al., 1998). Similarly, there is a mutual relationship between image analysis and GIS. This mutuality means that GIS also serves as a data supplier for image analysis. As image processing systems become more powerful and sophisticated, they are able to perform complex modeling that requires a wide range of spatial data in addition to satellite images; these spatial data can be supplied by GIS. Besides, data from a GIS database and GPS may be overlaid with remote sensing results to map features, such as roads, that are invisible on satellite imagery (Treitz et al., 1992). Thus, remote sensing is no longer a mere data feeder to GIS. The integration of image analysis with GIS is exemplified by the detection of land cover change through the overlay of historic and current land cover maps in a GIS (Haack et al., 1998; Welch et al., 1992). Although it is possible for data to flow from GIS to image analysis in this model, the left-to-right integration is much more common and stronger than that in the opposite direction, as implied by the arrow widths in Fig. 14.24. The integration of GIS with image analysis facilitates image segmentation and classification using the information (knowledge) stored in a GIS database. The incorporation of ancillary data in the classification either overcomes the limitations of conventional parametric classifiers and those associated with the heterogeneity of the scene under study, or compensates for topographic effects; in either case, the accuracy of the results is improved. The final mapping product may be integrated into a GIS for further analyses, such as derivation of the percentage of impervious and pervious surfaces through overlay analysis. The ultimate task of integration may be carried out in a raster GIS or in a digital image analysis system such as ERDAS (Earth Resources Data Analysis System) Imagine.
14.4.3 Hierarchical Integration
The hierarchical model of integration contains two tiers (Fig. 14.25). The first tier of integration takes place between GPS and image analysis via in situ samples. In addition to feeding data to remote sensing, GPS provides positioning information on in situ samples of a physical variable so that their spectral properties can be precisely characterized in remote sensing images with the assistance of their GPS coordinates. Correlating such in situ samples with geometrically rectified digital photographs or satellite images is a prerequisite to establishing the association between the variable under study and its image properties (Gao and O'Leary, 1997).
FIGURE 14.25 The hierarchical model of integration. In this model GIS supplies the necessary data or is used to implement modeling, which can also be carried out in an image analysis system if all data are in the raster format. (Source: Modified from Gao, 2002.)
The second tier of integration involves mathematical modeling in which remotely sensed data and GIS data are combined, in conjunction with the established association expressed as a mathematical or regression model. There is no direct link between GIS and remote sensing in this model of integration. Spatial modeling is implemented in an image analysis system or a raster GIS. Remote sensing serves a more dominant role than GIS in raster-based applications: it supplies the primary data needed for monitoring and modeling, whereas GIS supplements additional data (e.g., topographic or bathymetric) and may also provide the modeling environment. GPS still plays a subordinate, albeit expanded, role because GPS data are not directly involved at the second tier of integration. Applications of this model of integration are exemplified by the quantification of grassland biomass and the estimation of suspended sediment in a water body from satellite imagery. In these applications GPS is used to guide the collection of samples in the field concurrently with the recording of remotely sensed data, to avoid any temporal variation in the physical variable to be retrieved. The precise location of these in situ samples is determined with a GPS unit in the field. In the case of grassland biomass, the variable can be correlated with the image properties directly. For sediments, additional analysis of the sampled water has to be performed in the laboratory, such as filtering, drying, and weighing, to determine the suspended solids. If the remote sensing materials are frame photographs in analog format, they must be scanned, georeferenced to the desired ground coordinate system, and mosaicked if multiple photographs or satellite images are involved (Fig. 14.26). Prior to image mosaicking, it may be vital to histogram-match the overlapping portions of the images. Histogram matching is more important for aerial photographs than for satellite images because they tend to suffer more artificial radiometric variation introduced during photographic processing or by varying solar radiance.
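A common way to histogram-match the overlapping portion of two images is to remap one image's values so that its cumulative distribution follows that of the reference image. The sketch below is a minimal, single-band illustration on made-up arrays; production mosaicking tools apply the same idea band by band, often restricted to the overlap area.

```python
import numpy as np

def match_histogram(source, reference):
    """Remap `source` pixel values so their cumulative distribution matches `reference`."""
    s_values, s_idx, s_counts = np.unique(source.ravel(),
                                          return_inverse=True, return_counts=True)
    r_values, r_counts = np.unique(reference.ravel(), return_counts=True)
    s_cdf = np.cumsum(s_counts) / source.size      # CDF of the image to be adjusted
    r_cdf = np.cumsum(r_counts) / reference.size   # CDF of the reference image
    matched = np.interp(s_cdf, r_cdf, r_values)    # reference values at matching quantiles
    return matched[s_idx].reshape(source.shape)

# Hypothetical overlapping strips of two photographs with a brightness offset between them.
rng = np.random.default_rng(0)
reference_strip = rng.normal(120, 20, (50, 50)).clip(0, 255)
darker_strip    = rng.normal(90, 25, (50, 50)).clip(0, 255)
adjusted = match_histogram(darker_strip, reference_strip)
print(darker_strip.mean(), adjusted.mean())   # the adjusted strip now resembles the reference radiometrically
```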
FIGURE 14.26 Steps involved in an exemplary case of hierarchical integration among GPS, GIS, and remote sensing for estimating total suspended solids in a water body. (Source: Gao and O’Leary, 1997.)
Furthermore, it may also be necessary to strip the land area from the image so that only water pixels remain in the resultant mosaic. The estimation of suspended solids in a water body requires bathymetric data from a GIS database. If no existing data are available, bathymetric data have to be acquired from hydrographic maps or navigational charts by first digitally tracing their contour lines, from which a DEM is later constructed at the desired spatial resolution, usually the same as the pixel size of the remote sensing data (Fig. 14.26). It may also be necessary to project the captured DEM data to a ground coordinate system identical to that of the remote sensing imagery. After the image properties of the in situ water samples are determined, they are used to establish statistically the relationship between the histogram-matched reflectance and the physical variable by plotting the two variables in a scatter diagram, which shows whether the relationship is linear or nonlinear.
FIGURE 14.27 Spatial distribution of suspended sediment in Waitemata Harbour in Auckland, New Zealand, estimated from scanned aerial photographs and bathymetric data. GPS is essential in providing information on the positions of in situ samples. (Source: Gao and O’Leary, 1997.) See also color insert.
Once the nature of the relationship is known, a model can be constructed using some randomly selected samples, with the remaining samples used to validate the constructed regression model and assess its accuracy. Through the established regression relationship, which must be tested for significance, the pixel value expressed as a digital number (DN) is finally converted to values of the physical variable. To show its spatial distribution, the variable may have to be categorized into several levels (Fig. 14.27).
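The calibration-then-validation step just described amounts to fitting and testing a simple regression between image values and the field measurements. Below is a sketch using an ordinary least-squares linear fit on invented data; in a real study the model form (linear, logarithmic, etc.) would follow whatever the scatter diagram suggests.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical paired observations: image digital numbers at the GPS-located sample sites
# and the suspended solids concentration (mg/L) measured in the laboratory.
dn  = np.array([35, 42, 48, 55, 61, 70, 78, 85, 92, 101], dtype=float)
ssc = 0.9 * dn - 12 + rng.normal(0, 3, dn.size)     # synthetic "truth" with noise

# Split into calibration and validation subsets, as the text describes.
calib = rng.choice(dn.size, size=7, replace=False)
valid = np.setdiff1d(np.arange(dn.size), calib)

slope, intercept = np.polyfit(dn[calib], ssc[calib], deg=1)   # fit on calibration samples
predicted = slope * dn[valid] + intercept                     # apply to withheld samples
rmse = np.sqrt(np.mean((predicted - ssc[valid])**2))
print(f"SSC = {slope:.2f} * DN + {intercept:.2f}, validation RMSE = {rmse:.1f} mg/L")

# Once accepted, the model converts every water pixel's DN into an SSC estimate:
water_pixels = np.array([40, 65, 88], dtype=float)
print(slope * water_pixels + intercept)
```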
14.4.4 Complex Model of Integration
There are mutual interactions between any two of the three components in the complex model, which represents the ultimate, or total, integration of GPS, image analysis, and GIS (Fig. 14.28). Apart from all the links already covered in the previous three models, an extra connection exists between GPS and GIS. The interaction of GPS with GIS may take the form of exporting GPS data to a GIS database to update it. Point, linear, or even areal data logged with a GPS receiver can be added to the GIS database once they have been transformed to the required format and projection. This kind of integration has become increasingly common as the value of GPS in acquiring accurate and timely spatial data has been widely recognized and exploited. It has found a wide range of applications, such as precision farming, in which GPS is used to determine the coordinates associated with precision-farming variables (e.g., pest infection) while GIS is used for data integration, storage, and analysis (Lachapelle et al., 1996).
FIGURE 14.28 The complex model of integration. (Source: Gao, 2002.)
The integration of GIS with GPS is initiated when the results of GIS modeling have to be substantiated in the field, or when more ground information needs to be collected in the field at positions determined from the modeled results. Because of the circular form of integration, it is very difficult to judge the relative importance of any component technology in the integration. Every component can be either critically important or insignificant, depending upon how the integrated approach is applied to the problem under study.
14.4.5 Levels of Integration
The aforementioned full integration can take place at three physical levels: primitive, seamless, and total. At the primitive level of integration, the image analysis system remains separate from the GIS and GPS systems. However, each of these systems has an interface that allows data or files to move freely among image processing, GPS, and GIS (Ehlers, 1990). The integration is achieved through such functions as import, read, and the simultaneous display of vector data on top of raster images (e.g., the overlay of road networks over rectified images to check rectification accuracy). This level of integration is fully operational today. At the second (seamless) level of integration, the image analysis, GPS, and GIS systems are interwoven. It is possible to perform tandem raster-vector processing through a common user interface, raster and vector data can be converted back and forth freely, and some analyses involving both raster and vector layers can be performed. At present some of these functions can be undertaken in a single system alone, but others still have to be done in either a raster- or a vector-based system.
The ultimate level of integration, also known as unity, involves the fusion of data from all forms of geoinformatic technology, including survey, GPS, remote sensing, and GIS. The real world is represented by an integrated model using data from all these sources, all of which are handled by one total information system. In this system the raster/vector dichotomy is replaced by a hierarchical representation of data. For instance, all features may be represented as tiles at the tertiary level, as objects at the secondary level, and as pixels at the primary level. With such a totally integrated model it is possible to undertake fully integrated spatial queries, analysis, and modeling. This level of integration is yet to be accomplished because of two barriers.
14.5 Impediments to Integration

Before the full integration of remote sensing image analysis, GIS, and GPS is realized, two impediments must be recognized and resolved: incompatibility in data format and in accuracy.
14.5.1 Format Incompatibility
The biggest hurdle to full integration is the inconsistency in data format between image analysis and GIS. All raw remotely sensed data are in raster format, as are most image analysis results. In this field-based representation, space is divided into nonempty, nonoverlapping grid cells of a uniform shape and size with minimal abstraction (refer to Sec. 14.1.3). On the other hand, all GPS data and the majority of GIS data are in vector format, in which an "empty" Euclidean space is filled with objects. This creates incompatibility in data format, as most systems, whether GIS or image analysis, cannot handle both formats equally competently for all analyses. Some systems are good at processing raster data while others are designed primarily to handle vector data, even though the capacity of systems to handle both types of data has improved considerably in recent years. So far, most efforts to reconcile the raster/vector dichotomy have concentrated on unifying data representation, for instance, through hierarchical image analysis in which satellite images are stored as vectors or objects at the mid and high levels but as pixels at the low level. Other efforts include the use of alternative data structures, such as quadtrees and Voronoi polygons, with limited success. More research is needed to fully resolve the dichotomy. The issue becomes less serious if the remotely sensed data are processed in vector format (e.g., on-screen digitization based on manual interpretation, or image classification at the object level); the results can then be exported to a GIS database with minimal extra processing, eliminating the data incompatibility problem. In fact, total integration is already fully achievable with raster GIS, such as IDRISI and GRASS (Geographic Resources Analysis Support System), in which data format is not an issue at all. Integration between GPS and GIS encounters little difficulty as far as data format is concerned. All the data logged
with a GPS receiver are in point or line mode, both of which are fully compatible with the corresponding data in GIS. Similarly, format compatibility is not problematic in the integration of GPS with image analysis. This kind of integration is achieved by directly exporting data from one system to another after a change in format (e.g., from vector to raster). It must be kept in mind, however, that the issue of accuracy incompatibility still remains in their integration.
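As an illustration of the vector-to-raster conversion mentioned above, the short sketch below burns GPS-logged points into a raster grid aligned with an image. The grid origin, cell size, and point coordinates are invented; a real system would take this information from the image's georeferencing metadata.

```python
import numpy as np

def rasterize_points(points_xy, origin_xy, cell_size, shape, value=1):
    """Convert vector points (map coordinates) into a raster mask aligned with an image grid.

    origin_xy is the map coordinate of the upper-left corner of the raster;
    rows increase downward (northing decreases), columns increase eastward.
    """
    raster = np.zeros(shape, dtype=np.uint8)
    for x, y in points_xy:
        col = int((x - origin_xy[0]) / cell_size)
        row = int((origin_xy[1] - y) / cell_size)
        if 0 <= row < shape[0] and 0 <= col < shape[1]:
            raster[row, col] = value
    return raster

# Hypothetical GPS fixes (easting, northing) and a 10 m grid whose upper-left corner
# sits at (412000, 5916300).
gps_points = [(412035.2, 5916250.8), (412101.6, 5916222.3), (412076.4, 5916288.9)]
mask = rasterize_points(gps_points, origin_xy=(412000.0, 5916300.0),
                        cell_size=10.0, shape=(10, 12))
print(mask.sum(), "cells flagged")   # 3 cells, one per logged point
```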
14.5.2 Accuracy Incompatibility

As the application areas of geoinformatics widen, the required spatial data have become increasingly diversified, ranging from remote sensing, GIS, GPS, and ground survey data to specialized statistical data. Integration of such a wide range of data is meaningful and feasible only when the remotely sensed data maintain an accuracy standard compatible with that of the other data in the GIS database (i.e., when they are collected at a similar scale). It is meaningless to perform analyses that involve data obtained at drastically divergent scales; credible results are obtainable only when the input data are compatible in their positional and thematic accuracies. Remotely sensed data are widely available, thanks to advances in satellite technology (refer to Chap. 2). These data come at different accuracy levels and at spatial resolutions ranging from submeters to kilometers, and the user has to decide which type of data is appropriate for a particular application. No matter which data are selected, inaccuracy, either geometric or thematic, is introduced at every step of processing. The final results are usually assessed for their thematic accuracy before being delivered to the user or exported to another database; in this sense the accuracy level of the results is known and adequately documented. Unlike remote sensing data, most GIS data are secondary in nature. They are gathered from the digitization of existing topographic or thematic maps, imported from a GPS session, or derived by means of remote sensing. The original source data are not error free, and GIS data, like remote sensing data, suffer from errors introduced at every processing step. Unlike remote sensing data, however, no consistent standards of accuracy have been devised for GIS data, let alone conformed to and documented properly. Information such as the history of a GIS data layer may be found in its metadata, but it is unlikely that information on the potential sources of error in the layer and its probable accuracy level is maintained there. Errors in the data in a GIS database are difficult to trace because the data have been collected from various sources by different people. Before the analyst decides whether a certain kind of data should be integrated with other remotely sensed and GPS data, a few questions must be asked, such as:

1. At what level of accuracy discrepancy can different sets of data be considered compatible?

2. How can models of the errors and their propagation in combining GIS and remote sensing datasets be constructed?
14.6 Exemplary Analyses of Integration

14.6.1 Image Analysis and GIS
Image analysis benefits tremendously from its integration with GIS in several areas. The most obvious lies in change detection. As demonstrated in the preceding chapter, vector-based change detection is virtually an overlay analysis, ideally performed in a vector GIS if both input land cover layers are in vector format. In this environment, changes are detected much more efficiently than in a raster-based image analysis system. Another area of application is the radiometric and geometric correction of the input image with the assistance of topographic data, such as a DEM. Band ratioing may partially eliminate the shadow effect of topographic relief in the image, but it is restricted to individual bands and requires multitemporal data recorded at the same or a similar solar elevation, a stringent requirement that is almost impossible to satisfy in reality; it has therefore not found wide application. By comparison, the amount of shadow at any given point in the input image can be calculated precisely from the topographic data of the study area if the solar elevation is known, and the radiometry of the input image is calibrated much more precisely with the use of DEM data from a GIS database (a minimal sketch of such a correction is given after the list below). With the assistance of DEM data, it is also possible to determine the amount of relief displacement at every pixel in the image. Thus, orthographic images can be produced from the input image in which all relief displacement is completely eliminated. This image may also be projected to a ground coordinate system of the analyst's specification using the coordinates acquired from the DEM.

As shown in Table 14.2, the rich and diverse data contained in a GIS database can be used to support a variety of image analysis functions. Popular applications are knowledge-based image classification, postclassification filtering, and knowledge-based accuracy evaluation. For instance, texture information may be stored as a separate layer in a GIS database and used in conjunction with spectral information in a classification to improve the accuracy of the derived results. In addition, the input image can be better segmented using knowledge stored in the GIS database. Another possible application is to stratify the input image using a combination of variables, such as topography, phenology, geology, and climate, layers of all of which are retrievable from a GIS database.

Integration of GIS with image analysis also brings tangible benefits to database development, effective visualization, and spatial modeling:

1. Topographic data, an important item in a GIS database, are routinely obtained by means of remote sensing (i.e., a pair of
stereoscopic photographs). The easiest way of producing them is the photogrammetric method in a digital environment. It is also possible to produce DEMs from large-scale, stereoscopic, hyperspatial-resolution satellite images. The production of DEMs requires reliable ground control, which can be supplied by GPS. Related to data acquisition is the updating of existing data already in the database. Obsolete data layers can be updated with information derived from the analysis of recent aerial photographs or satellite images. For instance, a road map is updated by overlaying it on a recent aerial photograph or very high resolution satellite image that has been geometrically rectified to the required accuracy standard. Any changes in the road system show up in the overlaid display, and new roads and newly urbanized areas can be added to the database via on-screen digitization.

2. The second area that benefits from this kind of integration is the visualization of geographic data. For instance, satellite imagery, or land cover information obtained from satellite imagery, may be superimposed or draped over elevation data in a GIS to create a realistic depiction of the ground scenery for planning purposes (e.g., construction of a highway), for environmental impact assessment of a proposed project (e.g., a pylon or hydro dam), for virtual tourism, and even for simulation.

3. Finally, certain applications, such as spatial modeling, benefit considerably from the integration. The validity of any modeled results depends on the currency of the data. One important type of data in landscape modeling is land cover, which can be derived from the digital analysis of satellite data. Since modeling is best done either in an image analysis system or in a raster GIS, this eliminates the data format compatibility issue in integration.
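As referenced above, here is a minimal sketch of a DEM-based topographic (illumination) correction using the simple cosine method. The DEM, band values, and solar angles are invented, the aspect convention is an assumption, and operational workflows often prefer more robust variants (e.g., C-correction) to avoid overcorrection on weakly illuminated slopes.

```python
import numpy as np

def cosine_topographic_correction(band, dem, cell_size, sun_zenith_deg, sun_azimuth_deg):
    """Correct a band for terrain-induced illumination differences: L_corr = L * cos(sz) / cos(i)."""
    sz = np.radians(sun_zenith_deg)
    sa = np.radians(sun_azimuth_deg)

    # Slope and aspect derived from the DEM by simple finite differences.
    dz_dy, dz_dx = np.gradient(dem, cell_size)
    slope = np.arctan(np.hypot(dz_dx, dz_dy))
    aspect = np.arctan2(-dz_dx, dz_dy)          # one common aspect convention; an assumption here

    # Local illumination angle i at each pixel.
    cos_i = (np.cos(sz) * np.cos(slope)
             + np.sin(sz) * np.sin(slope) * np.cos(sa - aspect))
    cos_i = np.clip(cos_i, 0.1, None)           # guard against division blow-up on shaded slopes
    return band * np.cos(sz) / cos_i

# Invented 30 m DEM (terrain sloping in one direction) and a uniform band for demonstration.
dem = np.outer(np.linspace(200, 260, 6), np.ones(6))
band = np.full((6, 6), 100.0)
corrected = cosine_topographic_correction(band, dem, 30.0, sun_zenith_deg=40.0, sun_azimuth_deg=135.0)
print(corrected.round(1))
```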
14.6.2 Image Analysis and GPS
Integration of image analysis with GPS takes place mostly during postclassification; two particular examples are identified here. The first concerns locating sample points in the field for accuracy assessment. GPS can guide the analyst to the location of evaluation pixels on the ground. After all the evaluation pixels have been selected indoors, it is necessary to go out into the field to check their genuine identities on the ground, provided they have not changed since imaging. How to reach these points and pinpoint them on the ground involves a degree of uncertainty. With a GPS unit the image analyst can be certain of where exactly the points are located and whether a genuine point has been selected for checking on the ground.
In the second example, a GPS unit may be used to identify the nature of change in the field after preliminary change detection has been performed. If the change detection method is band ratioing or image differencing, the nature of change at a given location is unknown. With the GPS navigation function, the analyst can be guided to the identified spot of change to ascertain its nature. Alternatively, GPS can be used to verify on the ground the kind of changes detected from multitemporal images, or to assess the accuracy of change detection, for instance, to determine whether an indicated change is genuine. GPS data can be integrated into digital image analysis in a number of ways, depending upon the nature of the data. Prior to image rectification, GPS data in point mode provide the coordinates of selected GCPs. During image processing, linear features logged with a GPS receiver can be used to construct a knowledge base for image classification (see Chap. 11). For instance, a GPS unit can be hand-carried along the water-land interface to log the precise location of the coastline at the time of imaging. Such newly logged data are much more current than the corresponding data used to make existing topographic maps; their currency is comparable to that of the remotely sensed data themselves. Similarly, the logged data can serve as ground reference data in assessing the accuracy of coastal vegetation mapped from aerial photographs or satellite images.
14.6.3 GPS and GIS
GIS can benefit considerably from its integration with GPS. The functionality and power of a GIS cannot be fully realized unless the data stored in its database maintain a certain level of currency and accuracy. Some features in the database can be easily and efficiently updated by mounting a GPS receiver on top of a moving vehicle. For instance, the logged position of a newly constructed road can be exported to the GIS database after being differentially corrected and projected to the same coordinate system as all the other data layers already in the database. In addition to linear data, GPS can also be used to construct a point database or to update an existing one, such as the distribution of mobile phone transmission towers. The locations of newly erected towers absent from the existing database can be added with a GPS unit in the field. Such information is essential in deciding where to insert additional towers into the current network.
14.7 Applications of Integrated Approach

The value of integrating remote sensing, GIS, and GPS is realized to the maximum in applications that require comprehensive georeferenced data that are current within seconds or nearly instantaneously. These
applications include resources management, environmental monitoring, emergency response, and mobile mapping.
14.7.1 Resources Management and Environmental Monitoring

In conjunction with powerful computer modeling tools, an integrated approach to remote sensing, GPS, and GIS enables planners and resources managers to better cope with the dynamic, multiuse complexity of natural resources. It enables them to model the resources quantitatively and to analyze multiple demands objectively. In this kind of application, remote sensing image analysis yields information on natural resources (e.g., forests) that tend to change rather frequently over time. The rectification and interpretation of satellite images and aerial photographs needed for a forestry information system can be expedited with GPS data. In assessing groundwater resources, remote sensing and GPS data can be used to construct a comprehensive database incorporating all anticipated data requirements (Hardisty et al., 1996). In such applications base maps and initial surface hydrogeological conditions can be produced via the interpretation of satellite imagery, whereas ground reference and point sampling data are obtained using handheld GPS units. In precision farming, the integrated use of GIS, remote sensing images, and GPS helps to reduce the loss of nutrients from agricultural fields (Ifft et al., 1995). In this integration, satellite imagery provides the information needed for compiling maps of current and planned cropland areas and the associated databases. Topography and soil properties are retrieved from the GIS database, and soils with poor fertility are located in the field using a GPS receiver. With the integrated approach, fertilizers are applied more deliberately to the target spots. Such integration is also indispensable in devising an effective scheme for applying pesticides selectively, to improve farming efficiency and reduce environmental hazards. Integration of ground photographs with GPS and GIS is also useful in the visualization of scenic resources for environmental monitoring (Clay and Marsh, 1997), or even for virtual tourism. The positions from which photographs are taken to survey the ground are determined with a GPS unit. The acquired images and the site data are then used to construct three-dimensional (3D) surfaces that are tremendously valuable in monitoring beach morphodynamics and in devising coastal management strategies. The lack of distinct and stable landmarks in the coastal environment necessitates the use of GPS in establishing geometric control for remote sensing images. Coastal managers can then rely on the integrated resource database developed from the joint application of remote sensing and GPS to evaluate proposed coastal management scenarios and make ecologically sound decisions.
14.7.2 Emergency Response
In emergency situations such as fires, accidents, and crime scenes, a vehicle needs to be dispatched to the trouble spot promptly, which is difficult to achieve if the driver is not familiar with the road network or the area. Automatic vehicle navigation in response to emergencies therefore benefits considerably from the integrated approach (a simple illustration of the dispatch step appears at the end of this subsection). Land navigation systems have been built for many cities around the world, thanks to breakthroughs in GPS and mobile communications, assisted by advances in computing technology (Adams et al., 1990). Such integration can be further improved by adding ground photographs to the system to give the driver more confidence and reassurance during navigation to the destination. The Mayday system, which allows a motorist to call for assistance in an emergency without knowing his or her location, typifies the most noteworthy applications of integrated GPS and GIS in public safety (Carstensen, 1998).
Firefighting and emergency evacuation likewise require real-time georeferenced images. Wildfire fighters can use GPS and radio communications to track field personnel in real time, whereas integration of GPS and GIS with communications technology provides them with operational resources (Ambrosia et al., 1998). Remote sensing images of the latest development of a wildfire can be relayed from the field to the disaster control center, where they are integrated with maps almost instantly. These integrated data, in turn, provide vital clues as to where to deploy firefighting resources most effectively. After the fire is extinguished, surfaces that have been affected by fire or other disastrous events (e.g., flooding) can be mapped from satellite imagery. Information on fire-induced watershed conditions may be gathered and mapped using GPS, videography, aerial photographs, and satellite imagery (Lachowski et al., 1997). Overlay of GPS- and satellite imagery-produced data with GIS data layers reveals details of burned property parcels for insurance claims or compensation. A GPS unit may be used to log the perimeters of large fires in the field to validate such imagery-derived maps. In combination with ground observations, digital images taken from the air may be used in a GIS to produce maps of burn intensity and of fire damage to soil and the environment. These maps are essential in enacting an appropriate rehabilitation scheme, as well as in planning remedial measures after a fire.
The integration of all three technologies is even more valuable in effective hazard warning and disaster evacuation, such as storm warnings, search-and-rescue operations, and postdisaster recovery. Remote sensing and communications satellites are useful in disaster prevention, preparedness, and emergency relief in that the areas about to be devastated by an imminent disaster (e.g., a hurricane) can be predicted from satellite imagery and evacuated well in advance (Walter, 1990). Improved flood prediction and global mobile communications during relief are other capabilities afforded by the integration. Relief efforts can then target those areas most severely affected, as identified from remote sensing imagery. In postdisaster recovery, aerial photographs and satellite imagery can reveal navigable routes that have not been blocked by fallen trees or damaged by the disaster. Any damaged road sections that need urgent repair can be mapped with GPS, and then assessed, analyzed, and visualized in a GIS road map.
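As a simple illustration of the dispatch step mentioned above, the sketch below picks the GPS-tracked vehicle nearest to a reported incident. It uses straight-line (great-circle) distance for brevity; the vehicle positions are invented, and an operational system would use road-network travel time from the GIS instead.

# Minimal sketch of the dispatch idea: choose the GPS-tracked vehicle closest to an
# incident reported in geographic coordinates. Fleet positions are illustrative.
from math import radians, sin, cos, asin, sqrt

def great_circle_km(lon1, lat1, lon2, lat2):
    """Haversine distance between two WGS84 points, in kilometres."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

# Latest GPS fixes of the fleet (vehicle id -> lon, lat); values are invented.
fleet = {"unit_3": (174.76, -36.85), "unit_7": (174.81, -36.90), "unit_9": (174.70, -36.88)}
incident = (174.78, -36.87)

nearest = min(fleet, key=lambda v: great_circle_km(*fleet[v], *incident))
print("Dispatch", nearest)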
14.7.3 Mapping and Mobile Mapping
The integration of GPS and GIS with remote sensing enables special point features, such as historical monuments and archaeological sites, to be located and mapped efficiently in the field. A database of important landmarks that enjoy special protection status or have significant preservation value can be compiled with GPS together with ground photographs, both of which are integrated, stored, and managed in a GIS.
More value is gained by applying the integrated approach to mobile mapping of roads. Road maps, which are vital in vehicle tracking, may be produced or updated by combining GPS with digital orthophotographs in a stationary mode. Nevertheless, they are mapped more quickly using a mobile mapping system in which GPS is integrated with digital cameras. A mobile mapping system, comprising mainly a moving platform (e.g., a vehicle, a vessel, or an aircraft), navigation sensors, and mapping sensors, is able to collect road data automatically (Tao, 2000). The vehicle is tracked by GPS receivers that also yield information on the position and orientation of the mapping sensors. Ground features and their attributes are then extracted from the georeferenced mapping sensor data (e.g., video images), either automatically on the mobile platform or during postprocessing (Novak, 1995). These real-time images provide the most accurate data about road conditions and traffic. If based on an airborne platform such as a helicopter, the mobile system considerably improves the existing capabilities of acquiring and updating large- and medium-scale spatial data. In addition to traffic surveillance, this comprehensive set of multimedia information can be used for other purposes, such as highway mapping, railway maintenance, and utility mapping.
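At the heart of mobile mapping is direct georeferencing: the position and orientation of the platform reported by GPS (and an inertial system) allow an offset measured in the sensor or vehicle frame to be turned into map coordinates. The two-dimensional sketch below shows only that rotation-and-translation step; the coordinates, heading, and offsets are invented, and a real system would also model lever arms, boresight misalignment, and the vertical component.

# Minimal 2-D sketch of georeferencing a feature seen by a mapping sensor: an offset
# measured in the vehicle frame (forward/right of the platform) is rotated into map
# coordinates using the GPS-derived position and heading. All values are illustrative.
from math import radians, sin, cos

def georeference(easting, northing, heading_deg, forward_m, right_m):
    """Rotate a (forward, right) offset in the vehicle frame into map coordinates."""
    h = radians(heading_deg)                  # heading measured clockwise from grid north
    de = forward_m * sin(h) + right_m * cos(h)
    dn = forward_m * cos(h) - right_m * sin(h)
    return easting + de, northing + dn

# Platform state from GPS/INS and a feature offset extracted from the imagery (assumed).
e, n = georeference(315240.0, 5917860.0, heading_deg=30.0, forward_m=12.5, right_m=3.0)
print(f"Feature at E={e:.1f}, N={n:.1f}")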
14.7.4 Prospect of Integrated Analysis
An exciting prospect of integrating digital image analysis with GPS and GIS is the development of the location-aware personal digital assistant (PDA). This system consists of a GPS-equipped handheld computer or a palmtop GIS. Contained in its database is such classic information as roads and their attributes (van Essen and de Taeye, 1993). With improvements in computing capability, more innovative data such as pictures of popular tourist attractions, ground and aerial photographs, and satellite imagery can be stored in it. Incorporating remote sensing, GIS, and GPS, PDAs have found important applications in vehicle navigation, route planning (an advanced version of the land navigation system), and tourism.
Perhaps a more significant PDA application is the study of the travel behavior of commuters (Murakami and Wagner, 1999). Vehicle-based daily travel information, such as the date, timing, and duration of journeys, as well as vehicle position and speed of travel, is collected automatically at frequent intervals by the PDA (see the sketch at the end of this section). These data are indispensable for better planning of transport facilities and for designing better road networks. In contrast to automatic vehicle tracking and navigation, the location-aware PDA may be guided by voice in the future. Besides, it is capable of displaying scenic views along the route of travel on its computer screen, in addition to the current location of the vehicle on a road map of the immediate vicinity. The display may be periodically refreshed and automatically updated as the vehicle moves beyond the displayed area. The driver can be assured of being on the right route if the view outside the window matches what is displayed on the screen. Such a multimedia, location-aware PDA is quickly becoming a reality as computing technology advances. With developments in mobile multimedia communications technologies, it is anticipated that the handheld computer will be replaced with a cellular phone some day. In this way the road map and scenic views can be downloaded from a remote server instead of being stored locally.
Very similar to the PDA in principle, the distributed mobile GIS (DM-GIS) represents another exciting prospect of the integration (Karimi et al., 2000). A DM-GIS is typically made up of palmtop computers equipped with a GPS unit and cameras. They communicate remotely via wireless networks with a cluster of back-end servers where all GIS data are stored. Unlike the PDA, there are two-way communications between the computers and the GIS servers. Namely, photographs captured by cameras in the field are relayed back to the servers to update the database as frequently as necessary. At present the most critical hurdle to the fulfillment of DM-GIS is the efficient verification of topology for the relayed data, as well as the reconciliation of conflicting topologies. Once these challenges are successfully overcome, DM-GIS will become invaluable in those applications that require computation by field crews and/or fleets of vehicles, such as emergency services, utilities, transportation, rescue missions, telecommunications, scientific field studies, and environmental monitoring and planning.
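As a simple illustration of the travel-behavior logging described above, the sketch below segments a stream of timestamped speed records into individual journeys and reports their start time and duration. The log format, the sample values, and the rule that a journey ends at the first stationary record are all illustrative assumptions, not details from the study cited.

# Minimal sketch: turn PDA-logged (time, speed) records into per-journey summaries.
# A journey starts at the first moving record and ends at the next zero-speed record.
from datetime import datetime

records = [                                   # (timestamp, speed in km/h), invented values
    ("2008-03-14 08:00", 0), ("2008-03-14 08:02", 35), ("2008-03-14 08:20", 42),
    ("2008-03-14 08:25", 0), ("2008-03-14 08:40", 0), ("2008-03-14 08:50", 28),
    ("2008-03-14 09:00", 30), ("2008-03-14 09:05", 0),
]
fixes = [(datetime.strptime(t, "%Y-%m-%d %H:%M"), v) for t, v in records]

journeys, start, end = [], None, None
for t, speed in fixes:
    if speed > 0:
        if start is None:
            start = t                         # vehicle starts moving: journey begins
        end = t                               # remember the last moving fix
    elif start is not None:
        journeys.append((start, end))         # vehicle has stopped: close the journey
        start, end = None, None
if start is not None:
    journeys.append((start, end))             # close a journey still open at the end

for i, (s, e) in enumerate(journeys, 1):
    minutes = int((e - s).total_seconds() // 60)
    print(f"Journey {i}: {s:%H:%M}-{e:%H:%M}, about {minutes} min of driving")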
References
Adams, J. R., M. R. Gervais, and J. F. McLellan. 1990. “GIS for dispatch automatic vehicle location and navigation systems.” In GIS for the 1990s, Proceedings of the National Conference, Ottawa, 1990 (CISM), 1185–1196. Ottawa: CISM.
Ambrosia, V. G., S. W. Buechel, J. A. Brass, J. R. Peterson, R. H. Davies, R. J. Kane, and S. Spain. 1998. “An integration of remote sensing, GIS, and information distribution for wildfire detection and management.” Photogrammetric Engineering and Remote Sensing. 64(10):977–985.
August, P., J. Michaud, C. Labash, and C. Smith. 1994. “GPS for environmental applications: Accuracy and precision of locational data.” Photogrammetric Engineering and Remote Sensing. 60(1):41–45.
Bock, Y. 1996. “Introduction.” In GPS for Geodesy, ed. A. Kleusberg and P. J. G. Teunissen, 3–36. Berlin: Springer-Verlag.
Burrough, P. A., and R. A. McDonnell. 1998. Principles of Geographical Information Systems. New York: Oxford University Press.
Carstensen, L. W. Jr. 1998. “GPS and GIS: Enhanced accuracy in map matching through effective filtering of autonomous GPS points.” Cartography and Geographic Information Systems. 25(1):51–62.
Clay, G. R., and S. E. Marsh. 1997. “Spectral analysis for articulating scenic color changes in a coniferous landscape.” Photogrammetric Engineering and Remote Sensing. 63(12):1353–1362.
Ehlers, M. 1990. “Remote sensing and geographic information systems: Towards integrated spatial information processing.” IEEE Transactions on Geoscience and Remote Sensing. 28(4):763–766.
Gao, J. 2001. “Non-differential GPS as an alternative source of planimetric control for rectifying satellite imagery.” Photogrammetric Engineering and Remote Sensing. 67(1):49–55.
Gao, J. 2002. “Integration of GPS with remote sensing and GIS: Reality and prospect.” Photogrammetric Engineering and Remote Sensing. 68(5):447–453.
Gao, J., and S. M. O’Leary. 1997. “Estimation of suspended solids from aerial photographs in a GIS.” International Journal of Remote Sensing. 18(10):2073–2085.
Goodchild, M. F. 1985. “Geographic information systems in undergraduate geography: A contemporary dilemma.” The Operational Geographer. 8:34–38.
Haack, B., J. Wolf, and R. English. 1998. “Remote sensing change detection of irrigated agriculture in Afghanistan.” Geocarto International. 13(2):65–76.
Hardisty, P. E., J. Watson, and S. D. Ross. 1996. “A geomatics platform for groundwater resources assessment and management in the Hadramout-Masila region of Yemen: Application of geographic information systems in hydrology and water resources management.” In Proceedings of the HydroGIS’96 Conference, Vienna, 1996, IAHS Publication 235, 527–533.
Hofmann-Wellenhof, B., H. Lichtenegger, and J. Collins. 1997. Global Positioning System: Theory and Practice (4th ed.). Vienna: Springer.
Ifft, T. H., W. L. Magette, and M. A. Bower. 1995. “Nutrient management models combine GIS, GPS, and digital satellite imaging.” Geo Info Systems. 5(7):48–50.
Karimi, H. A., P. Krishnamurthy, S. Banerjee, and P. K. Chrysanthis. 2000. “Distributed mobile GIS—Challenges and architecture for integration of GIS, GPS, mobile computing and wireless communications.” Geomatics Info Magazine. 14(9):80–83.
Lachapelle, G., M. E. Cannon, D. C. Penney, and T. Goddard. 1996. “GIS/GPS facilitates precision farming.” GIS World. 9(7):54–56.
Lachowski, H., P. Hardwick, R. Griffith, A. Parsons, and R. Warbington. 1997. “Faster, better data for burned watersheds needing emergency rehab.” Journal of Forestry. 95(6):4–8.
Murakami, E., and D. P. Wagner. 1999. “Can using Global Positioning System (GPS) improve trip reporting?” Transportation Research Part C: Emerging Technologies. 7(2–3):149–165.
Novak, K. 1995. “Mobile mapping technology for GIS data collection.” Photogrammetric Engineering and Remote Sensing. 61(5):493–501.
Tao, C. V. 2000. “Semi-automated object measurement using multiple-image matching from mobile mapping image sequences.” Photogrammetric Engineering and Remote Sensing. 66(12):1477–1485.
Treitz, P. M., P. J. Howarth, and P. Gong. 1992. “Application of satellite and GIS technologies for land-cover and land-use mapping at the rural-urban fringe: A case study.” Photogrammetric Engineering and Remote Sensing. 58(4):439–448.
Trimble. 2007. GPS—The First Global Navigation Satellite System. Sunnyvale, CA: Trimble Navigation Ltd.
Van Essen, R. J., and I. A. de Taeye. 1993. “The road map redefined: The European digital road map triggered by car information systems.” Geodetical Info Magazine. 7(9):32–34.
Walter, L. S. 1990. “The uses of satellite technology in disaster management.” Disasters. 14(1):20–35.
Welch, R., M. Remillard, and J. Alberts. 1992. “Integration of GPS, remote sensing, and GIS techniques for coastal resource management.” Photogrammetric Engineering and Remote Sensing. 58(11):1571–1578.
Index
Note: Page numbers referencing figures are italicized and followed by an “f”; page numbers referencing tables are italicized and followed by a “t”.
A AASAP (Applied Analysis Spectral Analytical Process) module, 294 AATSR (Advanced Along Track Scanning Radiometer), 72t, 73f abscissa axis, 12 abstractness, 3 accuracy of change analysis evaluation of, 560–564 factors affecting, 555–560 overview, 554–555 of classifiers, 343–347 decision trees, 377–379 of digital image analysis, 6 of D-S ToE, 471–472 effect of texture on, 403 of fuzzy classification, 289 GPS improvements in, 593–596 overview, 591–593 in ground cover mapping, 454–455 incompatibility as impediment to integration, 608 of knowledge, 455 of object-oriented image classification, 429–432 of per-pixel image classification, 429–432 precision versus, 498–499
accuracy assessments IDRISI, 107 in image analysis, 8–9 inaccuracy of classification results, 500–504 integration of image analysis and GPS, 610–611 in major image analysis systems, 140t overview, 497–498 precision versus accuracy, 498–499 procedure collection of reference data, 509–510 number of evaluation pixels, 507–509 overview, 504–505 scale, 505–506 selection of evaluation pixels, 506–507 report of accuracy aspatial accuracy, 511–512 comparison of error matrices, 521–524 example of accuracy assessment, 520–521 interpretation of error matrix, 514–518 overview, 511 quantitative assessment of error matrix, 518–520 spatial accuracy, 512–513
acquisition of knowledge, 458–461 activation function, 324 adaptive resonance theory (ART), 314–316 Advanced Along Track Scanning Radiometer (AATSR), 72t, 73f Advanced Land Observing Satellite (ALOS), 47–48, 49t, 50f Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER), 30, 42–46 Advanced Synthetic Aperture Radar (ASAR), 72–74 Advanced Very High Resolution Radiometer (AVHRR), 25–28, 145–146, 380t Advanced Visible and Near Infrared Radiometer type 2 (AVNIR-2), 47–48 aerial photographs expert system, 447 image direct georeferencing, 196 projection, 184–185 scanning, 74–78 site inaccessibility, 510 affine rectification model, 161 agenda setters, 467–468 agglomerative hierarchical clustering, 264–266 agglomerative merging, 412 Airborne Imaging Spectrometer (AIS), 64–65 Airborne Visible/Infrared Imaging Spectrometer (AVIRIS), 64–65 Algorithm Librarian feature, PCI Geomatica, 130–131 along track scanning radiometer (ATSR), 68f along-track scanning, 144 ALOS (Advanced Land Observing Satellite), 47–48, 49t, 50f altitude (Z), 149 analog images, 17, 74–78. See also aerial photographs analysts, 250, 283 analytical framework, GIS, 588 ancillary data ANN classifiers, 335–336 GIS as supplier of, 588–589
AVIRIS (Airborne Visible/Infrared Imaging Spectrometer), 64–65 AVNIR-2 (Advanced Visible and Near Infrared Radiometer type 2), 47–48
B backpropagation delta rule, 323–324 backpropagation networks, 311–312, 330f bagging, 370–371, 379–381 band interleaved by line (BIL) format, 87 band interleaved by pixel (BIP) format, 87 band ratioing, 228 band sequential (BSQ) format, 86–87 band-color filter combinations, 532–533 base stations, 594 basic probability assignment m(A), 468–469 batch learning, 322 bathymetric data, 604 Bayes theorem, 277, 414 BIL (band interleaved by line) format, 87 bilinear interpolation method, 173–174 binary decision trees, 353f binary diamond network, 317 binary encoding, 335 binary land cover maps, 543 biorthogonal wavelets, 101 BIP (band interleaved by pixel) format, 87 bit-plane coding, 97–98 blackboards, 467–468 blocked backpropagation network model, 312 blurring in low-pass filtering, 222 in median filtering, 224 boosting, 370–371, 379–381 bottom-up construction method, 360 bottom-up segmentation approach, 408 boundaries inaccuracies, 501–502 object-oriented image analysis, 436–437 segmentation based on, 409–410 vector-based, 544 box method, 272–274
Boxing Day tsunami, 526f brightness axis, 243 BSQ (band sequential) format, 86–87 bytes, 85
C C4.5 and C5.0 trees, 373–374 CART (classification and regression tree), 372–373 cartesian coordinate system, 11–12 Cartosat satellites, 49t, 50f, 56–58 case studies knowledge-based image analysis, 479–485 parametric versus ANN classifiers, 340–342 potential of object-oriented image analysis, 426–429 CASI (Compact Airborne Spectrographic Imager), 65–66, 432 CDs (compact disks), 88–89 cell size, 9–10, 16, 17f, 33 cells, raster, 572–573 central processing units, 2 CERES (Clouds and the Earth’s Radiant Energy System), 42, 43t change analysis accuracy of evaluating, 560–564 factors affecting, 555–560 overview, 554–555 fundamentals, 527–530 novel methods analysis from monotemporal imagery, 553–554 change vector analysis, 548–551 comparison, 552–553 correlation-based analysis, 551–552 overview, 547 PCA, 547–548 spectral-temporal change classification, 547 overview, 525–526 postclassification aspatial change analysis, 538–540 overview, 537–538 raster implementation, 542–543 rasterization versus vectorization, 544–547
change analysis, postclassification (Cont.): spatial change analysis, 540–542 vector implementation, 543–544 qualitative, 530–533 quantitative, 533–537 visualization of detected change, 564–565 change detection accuracy assessment, 560 ERDAS Imagine, 116–117 IDRISI, 107 in major image analysis systems, 140t overview, 9 spatial change analysis, 540–542 use of GPS for, 611 change vector analysis, 548–551 channel-related subnets (CRS), 317, 318f CIE (Commission International de l’Eclairage) curve, 244 classification. See also image classification decision trees accuracy, 377–379 ensemble classifiers, 383–386 limitations, 383 overview, 376 robustness, 379–381 strengths, 381–383 fuzzy c-means, 289 layered temporal change, 547 per-parcel, 420–421 soft, 367 spectral-temporal change, 547 vegetation, 446t Zhu, 450–451 classification accuracy, 6, 8–9, 555, 562–563 classification accuracy assessment inaccuracy of classification results, 500–504 overview, 497–498 precision versus accuracy, 498–499 procedure collection of reference data, 509–510 number of evaluation pixels, 507–509 overview, 504–505 scale, 505–506 selection of evaluation pixels, 506–507
classification accuracy assessment (Cont.): report of accuracy aspatial accuracy, 511–512 comparison of error matrices, 521–524 example of accuracy assessment, 520–521 interpretation of error matrix, 514–518 overview, 511 quantitative assessment of error matrix, 518–520 spatial accuracy, 512–513 classification and regression tree (CART), 372–373 classifiers. See also decision trees neural network ancillary data, 335–336 data encoding, 334–335 versus parametric classifiers, 340–347 standardization of input data, 336–337 strengths and weaknesses, 337–340 parametric, 338t, 343–347 selection of, 267–268 clip, GIS overlay function, 584–585 clock inaccuracies, GPS, 591–592 Clouds and the Earth’s Radiant Energy System (CERES), 42, 43t clumping, 298–299 clustered sampling scheme, 506, 507f clustering analysis. See unsupervised classification clusters, pixel normalized distance between, 260 spectral distance in, 254 in split and merge method, 412–413 coarse texture, 390 coastal erosion, 531f coastal management, 612 Coastal Zone Color Scanner sensor, 28 coefficients, kernel, 220–221 color composites, 532–533 heterogeneity, 417 parameters, 243–244 schemes in maps, 564 color-coding in density slicing, 204, 205f
column coordinates, 12–13 column sum, 514 commission errors, 503–504, 514, 515t, 517t Commission International de l’Eclairage (CIE) curve, 244 Compact Airborne Spectrographic Imager (CASI), 65–66, 432 compact disks (CDs), 88–89 compactness, 415–416 comparison as change analysis method, 552–553 competitive learning, 321 complex model of integration, 605–606 complexity of digital image analysis, 6 composite reflectance spectrum equation, 292 composites, color, 532–533 compositing, image, 532–533 compression. See also data GIF, 93 JPEG, 93, 94f, 101–104 computer hard disks, 90–91 conceptual illustrations, 527 conditional probabilities, 280–281 conditional queries, GIS, 580–581 conditional statements, 464, 492 conditions, decision tree, 352, 446–447 confidence intervals, 516 conflation, 587–588 conformality, 151–152 confusion matrices, 271, 512–513 Conjugate-Gradient Backpropagation (GGBP) network, 319 conjugate-gradient optimization, 319 connectionist models, 307 connectivity, 219 Constellation of Small Satellites for the Mediterranean basin Observation (COSMOSkyMed) satellite, 49t, 50f, 62 constrained articulation of knowledge, 491–492 consumer’s risk, 518 contrast stretching density slicing, 204–205 histogram equalization, 211–216 linear enhancement, 205–208 look-up table, 208–210 nonlinear stretching, 210–211 piecewise linear enhancement, 208
control errors, 502–503 controlled mosaics, 199–200, 201f convergence, network, 333 convolution of kernels, 219–221 coordinate systems NZMG, 153–155 overview, 151–152 UTM projection, 152–153 coregistration, 156 correlation matrix, 240, 240t correlation-based change analysis, 551–552 COSMOSkyMed (Constellation of Small Satellites for the Mediterranean basin Observation) satellite, 49t, 50f, 62 courier delivery, 81–82 critical evaluations effectiveness of spatial knowledge, 489–490 limitations, 491–493 overview, 487 relative performance, 488–489 strengths, 490–491 cross-track scanning, 144 cross-validation, 328, 377 CRS (channel-related subnets), 317, 318f cubic convolution method, 174–176 cumulative frequency of pixels, 212, 215–216 currency of knowledge, 455–456 curvature of Earth and geometric distortion, 145–147
D data. See also purchase of data compression JPEG and JPEG 2000, 101–104 lossy compression, 100–101 LZW encoding, 99–100 overview, 96 run-length encoding, 97–98 variable-length encoding, 96–97 encoding, 334–335 versus information, 253–254 input and output in ER Mapper, 123–125 non-remote sensing, 454 preparation ENVI, 118–119 ER Mapper, 123–125
data, preparation (Cont.): ERDAS Imagine, 112–114 for image analysis, 8 processing, artificial neuron, 307f queries, 577 raster/vector dichotomy, 607–608 redundancy, 18, 96, 232 retrieval, 577 topographic, 454, 609–610, 612 data layer concept, GIS database, 571–572 databases, GIS, 568–569, 578–579 Daubecies wavelets, 101 decision boundaries fuzzy classification, 286f multivariate decision tree, 356f univariate decision tree, 354, 355f, 356f decision rules maximum likelihood classifier, 276–281 minimum distance to mean classifier, 274–276 parallelepiped classifier, 272–273 decision trees C4.5 and C5.0 trees, 373–374 CART, 372–373 classification accuracy, 377–379 ensemble classifiers, 383–386 limitations, 383 overview, 376 robustness, 379–381 strengths, 381–383 construction of construction methods, 360–361 example, 364–366 feature selection, 361–363 node splitting rules, 366–368 tree pruning, 368–369 tree refinement, 370–371 versus expert systems, 447 fundamentals, 351–353 hybrid, 357–358 inductive learning, 459–460 knowledge-based image classification, 476–477 M5 trees, 374–375 multivariate, 355–357 QUEST, 375–376 regression, 358–359 univariate, 353–355
declarative knowledge, 449–450 Definiens Analyst, eCognition, 134 Definiens Architect, eCognition, 134 Definiens Developer, eCognition, 134 Definiens Viewer, eCognition, 134 defuzzification, 287 deletion, cluster, 264 delimitation, polygon, 270 delta rule, 323 DeltaQue module, ERDAS Imagine, 116–117 DEM (digital elevation model), 122, 188–190 Dempster’s Orthogonal Sum, 470–471 Dempster’s rule of combination, 470 Dempster-Shafer Theory of Evidence (D-S ToE), 468–473 density slicing, 204–205 desktop scanners, 75, 75f detection templates, edge, 226–227 Developers’ Toolkit, ERDAS Imagine, 117 DFT (discrete Fourier transformation), 245–246 differencing, spectral, 534–535 differential correction, GPS, 546t–547t, 594 differential minimum mapping units, 555–556 differential rectification, 185 digital elevation model (DEM), 122, 188–190 digital image analysis. See also integrated image analysis; knowledge-based image analysis; multitemporal image analysis; object-oriented image analysis features of, 3–6 IDRISI functions for, 106–107 overview, 1, 22–23 preliminary knowledge, 9–15 process of, 6–9 properties of remotely sensed data, 15–22 system components, 2–3 digital numbers (DNs) generic binary format, 92 Huffman coding, 97 in images, 10f overview, 11–13 storage, 86
digital versatile disks (DVDs), 89 direct georeferencing overview, 192 and polynomial model, 195–197 transformation equation, 192–195 direct linear transform model, 163–164 discipline knowledge, 450 discrete Fourier transformation (DFT), 245–246 discretizing pixel values, 204, 205f display ENVI, 118–119 ERDAS Imagine, 112 IDRISI, 107–109 PCI, 128–130 dissimilarity, 395, 396t distance in spectral domain, 257–260 distortion. See geometric distortion distributed mobile GIS (DMGIS), 615 division, image, 228 DNs. See digital numbers documentation ENVI, 122–123 ER Mapper, 127–128 ERDAS Imagine, 117–118 IDRISI, 110–111 PCI, 131–132 domain experts, 458–459 domain-specific knowledge, 450, 454 downloading data, 82 drives, 3 dropout lines, 223 D-S ToE (Dempster-Shafer Theory of Evidence), 468–473 DVDs (digital versatile disks), 89 dynamic learning network, 319–320
G GAC (global area coverage), 26, 30 gaussian Markov random field measures, 401 gaussian normal distribution probability model, 277–278 Gb (gigabytes), 85 GCPs. See ground control points GDOP (geometric dilution of precision), 592–593 Geary’s ratio, 392–393
general declarative knowledge, 450 generalized delta rule, 323 generic binary format, 92 genuine change ratio, 559 GeoEye-1 satellite, 50f, 59–60 geographic data, 336 geographic information system (GIS) analytical framework, 588 attribute data, 574–575 conditional queries, 580–581 database queries, 578–579 databases, 461f, 568–569 exemplary analyses of integration, 609–611 functions, 577–578 overlay functions, 581–586 overview, 568 portability, 5 raster mode of representation, 572–574 topologic data, 575–577 vector mode of representation, 569–572 Geographic Resources Analysis Support System (GRASS), 136–138 geographic universality, 456 geoid, 189, 190f GeoKeys, 95 Geomatica, PCI comparison with other systems, 140t, 141 documentation and evaluation, 131–132 image input and display, 128–130 major modules, 130–131 overview, 128 user interface, 131 geometric dilution of precision (GDOP), 592–593 geometric distortion categorizing, 151 Earth curvature, 145–147 Earth rotation, 144–145 platform orientation, 149–151 platform position, 149 platform velocity, 151 sensor distortions, 147–149 geometric properties, 80 geometric rectification, 599 georeferencing defined, 156–157 GCP quality, 181–183
georeferencing (Cont.): GCP quantity, 178–180 GPS, 596, 599 overview, 176–178 resolution, 180–181 GeoTIFF interchange standard, 95 GGBP (Conjugate-Gradient Backpropagation) network, 319 GIF (Graphic Interchange Format), 92–93 gigabytes (Gb), 85 Gini index, 367–368 GIS. See geographic information system GLCM (gray level co-occurrence matrix), 394–399, 401–402 global area coverage (GAC), 26, 28–29 global positioning system (GPS) accuracy, 591–596 exemplary analyses of integration, 610–611 georeferencing, 176, 178 inertial navigation system, 192–194 knowledge acquisition through, 460–461 principles of, 589–591 relevance of GIS to image analysis, 596–598 as source of ground control, 160–161 global thresholding, 409 Goldberg expert system, 447 Goodenough expert systems, 447–448 GPS. See global positioning system graphic histograms, 13, 14f Graphic Interchange Format (GIF), 92–93 graphic presentation of classification results, 301, 302f GRASS (Geographic Resources Analysis Support System), 136–138 grassland biomass, NDVI detection of, 231 gray level co-occurrence matrix (GLCM), 394–399, 401–402 gray levels, 206, 246, 250–251 gray tone spatial matrix, 394–399 GRD (ground resolving distance), 76 greenness axis, 243 ground control, 160–161
ground control points (GCPs) accuracy of image transformation, 168–170, 171t image georeferencing impact on, 178–183 issues in, 176–178 image resolution, 180–181 in image transformation, 158–161 orthorectification, 188–191 polynomial model, 167–168 rubber-sheeting model, 164–165 transformation equation, 194–196 ground resolving distance (GRD), 76 ground sampling distance. See spatial resolution ground sampling intervals, 10 groundwater resources, 612 growing-pruning construction method, 361
H H (satellite altitude), 15–16 hardware, 5 haze axis, 243 hebbian learning, 321 heterogeneity in multi-scale image segmentation merging, 419–420 shape, 416–417 spectral, 417 heuristic knowledge, 406, 450 hidden layers in neural networks, 326–327 hidden nodes in neural networks, 327–329 hierarchical class structure, 423–424 hierarchical integration, 602–605, 606f hierarchical network, 320–321 hierarchical splitting, 412 high-dimensional data, 382 high resolution picture transmission (HRPT), 26 high resolution visible (HRV) sensors, 35 high-pass filtering, 222f, 223 HIS (hue-intensity saturation) transformation, 244–245 histograms clustering based on, 266–267 ENVI, 119 equalization, 211–216, 217f
histograms (Cont.): linear stretching, 207f overview, 13–14 homogeneity in GLCM measures, 395, 396t region-based segmentation, 410 shape, 416–417 HP ScanJet 8200, 75f HRPT (high resolution picture transmission), 26 HRV (high resolution visible) sensors, 35 hue-intensity saturation (HIS) transformation, 244–245 Huffman coding, 97 human interpreters. See visual image interpretation human neurons, 306 hybrid classifiers, 383–386 hybrid construction method, 361 hybrid decision trees, 357–358 hyperbolic tangent function, 325 hyperion satellite data, 63–64 hyperspatial resolution satellite images, 421 hyperspectral data Airborne Imaging Spectrometer, 64–65 CASI, 65–66 decision tree classifiers and, 378–379 in ENVI system, 122 hyperion satellite data, 63–64 hyperspherical direction cosine change vector analysis, 550 hypotheses, decision tree, 352
I identity, GIS overlay function, 584 IDRISI system comparison with other systems, 139, 140t display and output, 107–109 documentation and evaluation, 110–111 file format, 109 image analysis functions, 106–107 overview, 106 user interface, 109–110 IFOV (instantaneous field-of-view), 15–16, 26 IKONOS-2 satellite, 49t, 50–52
illustrations, conceptual, 527 image analysis. See digital image analysis image classification. See also fuzzy image classification; intelligent image classification; object-oriented image classification; spatial image classification; spectral image classification; subpixel image classification contexture in, 407 eCognition, 133 ENVI, 120–123 ER Mapper, 126, 128 ERDAS Imagine, 114–115, 117–118 IDRISI, 106–107 image analysis, 8 in major image analysis systems, 140t neural network models for, 308t per-field, 420–421, 429–431 variogram texture, 400–401 image compositing, 532–533 image contextual information, 406 image differencing, 533–535 image display, ER Mapper, 125–126 image elements, 250–253 image enhancement and classification in ER Mapper, 126 contrast stretching density slicing, 204–205 histogram equalization, 211–216 linear enhancement, 205–208 look-up table, 208–210 nonlinear stretching, 210–211 piecewise linear enhancement, 208 edge enhancement and detection edge detection templates, 226–227 enhancement through subtraction, 225–226 overview, 224–225 ENVI, 119–120 ER Mapper, 126, 128 ERDAS Imagine, 114 histogram matching, 216–219 IDRISI, 106 in image analysis, 8 image filtering in frequency domain, 245–247
Z Z (altitude), 149 zero-sum kernels, 226 Zhu classification, 450–451 zonal queries, 580
FIGURE 1.8 Appearance of an image represented at four spatial resolutions of 4 m (a), 8 m (b), 20 m (c), and 40 m (d). As pixel size increases, ground features become less defined.
FIGURE 2.1 Global distribution of vegetation expressed as the normalized difference vegetation index (NDVI) and chlorophyll, averaged from multitemporal AVHRR data between June and August 1998. (Source: Goddard Space Flight Center.)
FIGURE 2.6 A subscene (1001 rows by 1101 columns) ASTER image of Northeast China recorded on September 11, 2004. This color composite is formed by VNIR bands 1 (b), 2 (g), and 3 (r). Its spatial resolution of 15 m is ideal in studying natural hazards such as land salinization, which appears as white patches in this composite.
FIGURE 2.8 A subscene (474 rows by 581 columns) color composite of IKONOS multispectral bands 2 (blue), 3 (green), and 4 (red) over a densely populated suburb of Auckland, New Zealand. It was recorded on April 8, 2001. The fine detail exhibited in IKONOS imagery is ideal in studying housing, street networks, and many environmental problems. (Copyright Space Imaging Inc.)
FIGURE 2.9 An exemplary QuickBird image of Three Gorges Dam, China, recorded on October 4, 2003. The ground resolution of 0.6 m of QuickBird imagery makes it ideal in applications that require a great deal of detail. (Copyright DigitalGlobe.)
FIGURE 2.14 Scanning of multiple photographs faces the problem of artificial variation in image radiometry caused by varying development in the darkroom.
FIGURE 3.6 Comparison of a 734-by-454 image before and after JPEG compression (quality: 60, standard deviation: 2). (a) Original image; (b) the image after JPEG compression.
FIGURE 5.14 An example of an output image that has been rectified to the NZMG coordinate system based on a first-order transformation. The edge of the image appears to be straight. The image’s dimensions have increased to 605 rows by 647 columns from 512 rows by 512 columns due to image rotation. (Copyright CNS, 1994.)
FIGURE 5.16 Distribution of GCPs in a rural area. Lack of distinct landmarks makes the selection of quality GCPs impossible. Their spatial distribution is highly uneven as a result. (Source: Gao and Zha, 2006.)
FIGURE 5.24 A 512 by 512 subscene subset from a full-scene SPOT image using a pair of row and column numbers. (Copyright CNS, 1994.)
Map extent of Fig. 6.6: approximately 37°S to 39°S, 177°E to 180°E, with a cold plume labeled offshore; temperatures are given in degrees Celsius. The look-up table used for the display is reproduced below.

DN    Blue    Green    Red
15     120        0    136
16     256        0      0
17     210       46      0
18     180       76      0
19      90      166      0
20       0      256      0
21       0      125    131
22       0        0    256
FIGURE 6.6 An example of visualizing sea surface temperature using a look-up table. (Source: Modified from Gao and Lythe, 1996.)
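The mapping performed by such a look-up table is straightforward to express in code. The sketch below applies the table above to a small, invented DN array using NumPy; values of 256 in the published table are treated as full intensity and clipped to the 8-bit maximum of 255 for display.

# Minimal sketch of applying a look-up table like the one in Fig. 6.6: each DN in a
# temperature image is replaced by an RGB triplet for display.
import numpy as np

lut = {                                       # DN: (blue, green, red), from the table above
    15: (120, 0, 136), 16: (256, 0, 0), 17: (210, 46, 0), 18: (180, 76, 0),
    19: (90, 166, 0),  20: (0, 256, 0), 21: (0, 125, 131), 22: (0, 0, 256),
}

dn = np.array([[15, 16, 17], [18, 19, 20], [21, 22, 20]], dtype=np.uint8)  # invented DNs

rgb = np.zeros(dn.shape + (3,), dtype=np.uint8)
for value, (b, g, r) in lut.items():
    mask = dn == value
    rgb[mask] = np.clip((r, g, b), 0, 255)    # store as (R, G, B), clipped to 8 bits
print(rgb[0, 0])                              # DN 15 maps to R=136, G=0, B=120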
FIGURE 6.20 Distribution of grass cover density derived from an NDVI image that is produced from Landsat TM3 and TM4 in conjunction with 68 in situ density samples. (Source: Zha et al., 2003a.)
FIGURE 7.5 An example of unsupervised classification in which the input image is classified into 8 (a) and 12 (b) clusters using a convergence threshold of 0.950 and a maximum of 10 iterations.
FIGURE 6.23 A Landsat TM subscene image of central Auckland. This color composite of bands 4 (red), 3 (green), and 2 (blue) was used for the principal component analysis.
FIGURE 7.14 Comparison of results classified with three classifiers. The input image is shown in Fig. 5.24. (a) Parallelepiped; (b) maximum likelihood; (c) minimum distance to mean.
FIGURE 7.19 Comparison of the effect of two filtering methods on classified results. (a) Majority filtering within a window of 5 × 5; (b) clumping with eight connections followed by elimination using a threshold of five pixels.
Scale bar: 0 to 3 km. Legend: Forest, Pasture, L mangroves, S mangroves, Residential, Industrial, Bare ground, Murky water, Clear water, Shallow water, Cloud, Shadow.
FIGURE 7.22 An example of graphic embellishment of classified results. Essential components are a legend, a scale bar, and the orientation. The statistics of all mapped covers are presented in Table 7.5. (Source: Gao, 1998.)
North arrow; scale bar: 0 to 4 km. Legend: Reservoir, Marine farm, Crystallization zone, Condensation zone, Evaporation zone, Other.
FIGURE 8.16 Salt farm zones in the Taibei Salt Farm classified using the neural network method. (Source: Zhang et al., 2006.)
North arrow; scale bar: 0 to 4 km. Legend (panels a and b): Reservoir, Marine farm, Salt farm, Settlement, Roads.
FIGURE 8.17 General land covers classified from TM satellite data. (a) Results from the neural network method; (b) results from the maximum likelihood method. (Source: Zhang et al., 2006.)
FIGURE 10.7 An example of image segmentation based on a combination of spectral heterogeneity and shape. (a) Color composite of three IKONOS spectral bands; (b) multiresolution segmented results based on a combination of spectral value and shape at a scale of 50 pixels.
Legend: Barren; Degraded farmland to grassland; Farmland (fallow); Farmland (healthy); Grassland; Settlement; Water; Wetland.
FIGURE 10.11 Land covers with an emphasis on degraded land classified from merged ASTER VNIR and SWIR bands at 15 m using the object-oriented method. (Source: Gao, 2008.)
Legend: Classified residential, Resid. from mangroves, Classified mangroves, Mangroves from residential, Unclassified. Scale bar: 0 to 6 km; north arrow.
FIGURE 11.12 Results (mangroves and residential areas) mapped using the knowledge-based approach in ERDAS Imagine. (Source: Gao et al., 2004.)
FIGURE 13.1 Drastic changes in land cover can take place over a short period of time in the wake of a natural disaster such as a tsunami. (a) The Gleebruk village of Aceh on April 12, 2004, before the Boxing Day tsunami; (b) the same area on January 2, 2005, soon after the devastating tsunami. (Copyright DigitalGlobe.)
FIGURE 13.4 An image display method for carrying out qualitative change analysis. In this example the IKONOS-2 image of February 4, 2001 is displayed on top of the historic aerial photograph of the same area taken in 1960. In the middle of the display is a discontinuity in the shoreline position, suggesting that coastal erosion has taken place here during the interval. From this display the magnitude of beach erosion can be measured quantitatively on the screen.
FIGURE 13.5 The image compositing method of highlighting change. This color composite is formed by a historic aerial photograph (red) with bands 1 (green) and 2 (blue) of an IKONOS image. The red color in this composite represents change. The darkish bright red near the coast in the middle of the composite shows change from land to water (i.e., coastal erosion).
Land cover codes in the map legends: (a) 11, 15, 17, 21, 22, 43, 54, 61, 76, 77; (b) 1, 2, 3, 4, 6, 7, 8, 10, 11, 12.
FIGURE 13.6 Land cover maps used in a postclassification change detection. (a) Land cover map interpreted from photograph; (b) land cover map derived from supervised classification of SPOT data. Both images were georeferenced to the New Zealand Map Grid (NZMG) coordinate system, covering a ground area of 90.23 km². For land cover codes, refer to Table 13.1. (Source: Gao and Skillcorn, 1996.)
Legend: Clear water, Forest, Industry, Lush mangroves, Muddy water, Pasture, Residential, River water, Stunted mangroves.
FIGURE 11.14 Land covers mapped using postclassification filtering. The input image was produced with a parametric classifier. (Source: Gao et al., 2004.)
FIGURE 14.27 Spatial distribution of suspended sediment over the Waitemata Harbour in Auckland, New Zealand, estimated from scanned aerial photographs and bathymetric data. GPS is essential in providing information on the positions of in situ samples. (Source: Gao and O’Leary, 1997.)