Signal and Image Processing for Remote Sensing

Signal andImage Processing forRemote Sensing Signal andImage Processing forRemote Sensing Editedby C.H.Chen Bo...

Author: C.H. Chen

280 downloads 2000 Views 29MB Size Report

This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!

Report copyright / DMCA form

DOWNLOAD PDF

Signal andImage Processing forRemote Sensing

Signal andImage Processing forRemote Sensing Editedby C.H.Chen

Boca Raton London New York

CRC is an imprint of the Taylor & Francis Group, an informa business

Published in 2007 by CRC Press Taylor & Francis Group 6000 Broken Sound Parkway NW, Suite 300 Boca Raton, FL 33487-2742 © 2007 by Taylor & Francis Group, LLC CRC Press is an imprint of Taylor & Francis Group No claim to original U.S. Government works Printed in the United States of America on acid-free paper 10 9 8 7 6 5 4 3 2 1 International Standard Book Number-10: 0-8493-5091-3 (Hardcover) International Standard Book Number-13: 978-0-8493-5091-7 (Hardcover) Library of Congress Card Number 2006009330 This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use. No part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers. For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC) 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged. Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. Library of Congress Cataloging-in-Publication Data Signal and image processing for remote sensing / edited by C.H. Chen. p. cm. Includes bibliographical references and index. ISBN 0-8493-5091-3 (978-0-8493-5091-7) 1. Remote sensing--Data processing. 2. Image processing. 3. Signal processing. I. Chen, C. H. (Chi-hau), 1937G70.4.S535 2006 621.36’78--dc22

2006009330

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com Taylor & Francis Group is the Academic Division of Informa plc.

and the CRC Press Web site at http://www.crcpress.com

Preface

Both signal processing and image processing are playing increasingly important roles in remote sensing. As most data from satellites are in image form, image processing has been used most often, whereas signal processing contributes significantly in extracting information from the remotely sensed waveforms or time series data. In contrast to other books in this field, which deal almost exclusively with the image processing for remote sensing, this book provides a good balance between the roles of signal processing and image processing in remote sensing. It focuses mainly on methodologies of signal processing and image processing in remote sensing. An emphasis is placed on the mathematical techniques that, we believe, will not change as much as sensor, software, and hardware technologies. Furthermore, the term ‘‘remote sensing’’ is not limited to the problems with data from satellite sensors. Other sensors that acquire data remotely are also considered. Another unique feature of this book is that it covers a broader scope of problems in remote sensing information processing than any other book in this area. The book is divided into two parts. Part I deals with signal processing for remote sensing and has 12 chapters. Part II deals with image processing for remote sensing and has 16 chapters. The chapters are written by leaders in the field. We are very fortunate to have Dr. Norden Huang, inventor of the Huang–Hilbert transform, along with Dr. Steven Long, write a chapter on the application of the normalized Hilbert transform to remote sensing problem, and to have Dr. Enders A. Robinson, who has made many major contributions to geophysical signal processing for over half a century, write a chapter on the basic problem of constructing seismic images by ray tracing. In Part I, following Chapter 1 by Drs. Long and Huang, and my short Chapter 2 on the roles of statistical pattern recognition and statistical signal processing in remote sensing, we start from a very low end of the electromagnetic spectrum. Chapter 3 considers the classification of infrasound at a frequency range of 0.01 Hz to 10 Hz by using a parallel bank neural network classifier and an 11-step feature selection process. The >90% correct classification rate is impressive for this kind of remote sensing data. Chapter 4 through Chapter 6 deal with seismic signal processing. Chapter 4 provides excellent physical insights on the steps for constructing digital seismic images. Even though the seismic image is an image, this chapter is placed in Part I as seismic signals start as waveforms. Chapter 5 considers the singular value decomposition of a matrix data set from scalarsensors arrays, which is followed by an independent component analysis (ICA) step to relax the unjustified orthogonality constraint for the propagation vectors by imposing a stronger constraint of fourth-order independence of the estimated waves. With an initial focus on the use of ICA in seismic data and inspired by Dr. Robinson’s lecture on seismic deconvolution at the 4th International Symposium, 2002, on ‘‘Computer Aided Seismic Analysis and Discrimination,’’ Mr. Zhenhai Wang has examined approaches beyond ICA for improving seismic images. Chapter 6 is an effort to show that factor analysis, as an alternative to stacking, can play a useful role in removing some unwanted components in the data and thereby enhancing the subsurface structure as shown in the seismic images. Chapter 7 on Kalman filtering for improving detection of landmines using electromagnetic signals, which experience severe interference, is another remote sensing problem of higher interest in recent years. Chapter 8 is a representative time-series analysis problem on using meterological and remote sensing indices to monitor vegetation moisture

dynamics. Chapter 9 actually deals with the image data for a digital elevation model but is placed in Part I mainly because the prediction error (PE) filter originates from the geophysical signal processing. The PE filter allows us to interpolate the missing parts of an image. The only chapter that deals with the sonar data is Chapter 10, which shows that a simple blind source separation algorithm based on the second-order statistics can be very effective to remove reverberations in active sonar data. Chapter 11 and Chapter 12 are excellent examples of using neural networks for the retrieval of physical parameters from the remote sensing data. Chapter 12 further provides a link between signal and image processing as the principal component analysis and image sharpening tools employed are exactly what are needed in Part II. With a focus on image processing of remote sensing images, Part II begins with Chapter 13, which is concerned with the physics and mathematical algorithms for determining the ocean surface parameters from synthetic aperture radar (SAR) images. Mathematically, the Markov random field (MRF) is one of the most useful models for the rich contextual information in an image. Chapter 14 provides a comprehensive treatment of MRF-based remote sensing image classification. Besides an overview of previous work, the chapter describes the methodological issues involved and presents results of the application of the technique to the classification of real (both single-date and multitemporal) remote sensing images. Although there are many studies on using an ensemble of classifiers to improve the overall classification performance, the random forest machine learning method for classification of hyperspectral and multisource data as presented in Chapter 15 is an excellent example of using new statistical approaches for improved classification with the remote sensing data. Chapter 16 presents another machine learning method, AdaBoost, to obtain a robustness property in the classifier. The chapter further considers the relations among the contextual classifier, MRF-based methods, and spatial boosting. The following two chapters are concerned with different aspects of the change detection problem. Change detection is a uniquely important problem in remote sensing as the images acquired at different times over the same geographical area can be used in the areas of environmental monitoring, damage management, and so on. After discussing change detection methods for multitemporal SAR images, the chapter examines an adaptive scale-driven technique for change detection in medium resolution SAR data. Chapter 18 evaluates a Wiener filter–based method, Mahalanobis distance, and subspace projection methods of change detection, as well as a higher order, nonlinear mapping using kernel-based machine learning concept to improve the change detection performance as represented by receiver operating characteristic (ROC) curves. In recent years, ICA and related approaches have presented a new potential in remote sensing information processing. A challenging task underlying many hyperspectral imagery applications is decomposing a mixed pixel into a collection of reflectance spectra, called endmember signatures, and the corresponding abundance fractions. Chapter 19 presents a new method for unsupervised endmember extraction called vertex component analysis (VCA). The VCA algorithms presented have better or comparable performance as compared to two other techniques but require less computational complexity. Other useful ICA applications in remote sensing include feature extraction and speckle reduction of SAR images. Chapter 20 presents two different methods of SAR image speckle reduction using ICA, both making use of the FastICA algorithm. In two-dimensional time series modeling, Chapter 21 makes use of a fractionally integrated autoregressive moving average (FARIMA) analysis to model the mean radial power spectral density of the sea SAR imagery. Long-range dependence models are used in addition to the fractional sea surface models for the simulation of the sea SAR image spectra at different sea states, with and without oil slicks at low computational cost.

Returning to the image classification problem, Chapter 22 deals with the topics of pixel classification using a Bayes classifier, region segmentation guided by morphology and the split-and-merge algorithm, region feature extraction, and region classification. Chapter 23 provides a tutorial presentation of different issues of data fusion for remote sensing applications. Data fusion can improve classification, and four multisensor classifiers are presented for the decision level fusion strategies. Beyond the currently popular transform techniques, Chapter 24 demonstrates that the Hermite transform can be very useful for noise reduction and image fusion in remote sensing. The Hermite transform is an image representation model that mimics some of the important properties of human visual perception, namely, local orientation analysis and the Gaussian derivative model of early vision. Chapter 25 is another chapter that demonstrates the importance of image fusion to improving sea ice classification performance, using backpropagation trained neural network and linear discrimination analysis and texture features. Chapter 26 is on the issue of accuracy assessment for which the Bradley–Terry model is adopted. Chapter 27 is on land map classification using a support vector machine, which has been increasingly popular as an effective classifier. The land map classification classifies the surface of the Earth into categories such as water area, forests, factories, or cities. Finally, with lossless data compression in mind, Chapter 28 focuses on information-theoretic measure of the quality of multiband remotely sensed digital images. The procedure relies on the estimation of parameters of the noise model. Results on image sequences acquired by AVIRIS and ASTER imaging sensors offer an estimation of the information contents of each spectral band. With rapid technological advances in both sensor and processing technologies, a book of this nature can only capture a certain amount of current progress and results. However, if past experience offers any indication, the numerous mathematical techniques presented will give this volume a long lasting value. The sister volumes of this book are the other two books edited by myself. One is Information Processing for Remote Sensing and the other is Frontiers of Remote Sensing Information Processing, both published by World Scientific in 1999 and 2003, respectively. I am grateful to all the contributors of this volume for their important contribution and, in particular, to Professors Dr. J.S. Lee, S. Serpico, L. Bruzzone, and S. Omatu for chapter contributions to all three volumes. Readers are advised to go over all three volumes for more complete information on signal and image processing for remote sensing. C.H. Chen

Editor

Chi Hau Chen was born on December 22nd, 1937. He received his Ph.D. in electrical engineering from Purdue University in 1965, M.S.E.E. degree from the University of Tennessee, Knoxville, in 1962, and B.S.E.E. degree from the National Taiwan University in 1959. He is currently chancellor professor of electrical and computer engineering at the University of Massachusetts, Dartmouth, where he has taught since 1968. His research areas are in statistical pattern recognition and signal/image processing with applications to remote sensing, geophysical, underwater acoustics, and nondestructive testing problems, as well as computer vision for video surveillance, time series analysis, and neural networks. Dr. Chen has published 23 books in his area of research. He is the editor of Digital Waveform Processing and Recognition (CRC Press, 1982) and Signal Processing Handbook (Marcel Dekker, 1988). He is the chief editor of Handbook of Pattern Recognition and Computer Vision, volumes 1, 2, and 3 (World Scientific Publishing, 1993, 1999, and 2005, respectively). He is the editor of Fuzzy Logic and Neural Network Handbook (McGraw-Hill, 1966). In the area of remote sensing, he is the editor of Information Processing for Remote Sensing and Frontiers of Remote Sensing Information Processing (World Scientific Publishing, 1999 and 2003, respectively). He served as the associate editor of the IEEE Transactions on Acoustics Speech and Signal Processing for 4 years, IEEE Transactions on Geoscience and Remote Sensing for 15 years, and since 1986 he has been the associate editor of the International Journal of Pattern Recognition and Artificial Intelligence. Dr. Chen has been a fellow of the Institutue of Electrical and Electronic Engineers (IEEE) since 1988, a life fellow of the IEEE since 2003, and a fellow of the International Association of Pattern Recognition (IAPR) since 1996.

Contributors

J. van Aardt Group of Geomatics Engineering, Department of Biosystems, Katholieke Universiteit Leuven, Leuven, Belgium Ranjan Acharyya Florida Institute of Technology, Melbourne, Florida Bruno Aiazzi Institute of Applied Physics, National Research Council, Florence, Italy Selim Aksoy Bilkent University, Ankara, Turkey V.Yu. Alexandrov Nansen International Environmental and Remote Sensing Center, St. Petersburg, Russia Luciano Alparone Department of Electronics and Telecommunications, University of Florence, Florence, Italy Stefano Baronti Institute of Applied Physics, National Research Council, Florence, Italy Jon Atli Benediktsson Department of Electrical and Computer Engineering, University of Iceland, Reykjavik, Iceland Fabrizio Berizzi Department of Information Engineering, University of Pisa, Pisa, Italy Massimo Bertacca ISL-ALTRAN, Analysis and Simulation Group—Radar Systems Analysis and Signal Processing, Pisa, Italy Nicolas Le Bihan Laboratory of Images and Signals, CNRS UMR 5083, INP of Grenoble, France William J. Blackwell Lincoln Laboratory, Massachusetts Institute of Technology, Lexington, Massachusetts L.P. Bobylev Nansen International Environmental and Remote Sensing Center, St. Petersburg, Russia A.V. Bogdanov Institute for Neuroinformatich, Bochum, Germany Francesca Bovolo Department of Information and Communication Technology, University of Trento, Trento, Italy Lorenzo Bruzzone Department of Information and Communication Technology, University of Trento, Trento, Italy

Chi Hau Chen Department of Electrical and Computer Engineering, University of Massachusetts Dartmouth, North Dartmouth, Massachusetts Frederick W. Chen Lincoln Laboratory, Massachusetts Institute of Technology, Lexington, Massachusetts Salim Chitroub Signal and Image Processing Laboratory, Department of Telecommunication, Algiers, Algeria Leslie M. Collins Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina Fengyu Cong State Key Laboratory of Vibration, Shock & Noise, Shanghai Jiaotong University, Shanghai, China P. Coppin Group of Geomatics Engineering, Department of Biosystems, Katholieke Universiteit Leuven, Leuven, Belgium Jose´ M.B. Dias Department of Electrical and Computer Engineering, Instituto Superior Te´cnico, Av. Rovisco Pais, Lisbon, Portugal Shinto Eguchi Institute of Statistical Mathematics, Tokyo, Japan Boris Escalante-Ramı´rez School of Engineering, National Autonomous University of Mexico, Mexico City, Mexico Toru Fujinaka

Osaka Prefecture University, Osaka, Japan

Gerrit Gort Department of Biometris, Wageningen University, The Netherlands Fredric M. Ham Florida Institute of Technology, Melbourne, Florida Norden E. Huang

NASA Goddard Space Flight Center, Greenbelt, Maryland

Shaoling Ji State Key Laboratory of Vibration, Shock & Noise, Shanghai Jiaotong University, Shanghai, China Peng Jia State Key Laboratory of Vibration, Shock & Noise, Shanghai Jiaotong University, Shanghai, China Sveinn R. Joelsson Department of Electrical and Computer Engineering, University of Iceland, Reykjavik, Iceland O.M. Johannessen Nansen Environmental and Remote Sensing Center, Bergen, Norway I. Jonckheere Group of Geomatics Engineering, Department of Biosystems, Katholieke Universiteit Leuven, Leuven, Belgium

P. Jo¨nsson

Teachers Education, Malno¨ University, Sweden

Dayalan Kasilingam Department of Electrical and Computer Engineering, University of Massachusetts Dartmouth, North Dartmouth, Massachusetts Heesung Kwon U.S. Army Research Laboratory, Adelphi, Maryland Jong-Sen Lee Remote Sensing Division, Naval Research Laboratory, Washington, D.C. S. Lhermitte Group of Geomatics Engineering, Department of Biosystems, Katholieke Universiteit Leuven, Leuven, Belgium Steven R. Long

NASA Goddard Space Flight Center, Wallops Island, Virginia

Alejandra A. Lo´pez-Caloca Center for Geography and Geomatics Research, Mexico City, Mexico Arko Lucieer Centre for Spatial Information Science (CenSIS), University of Tasmania, Australia Je´roˆme I. Mars Laboratory of Images and Signals, CNRS UMR 5083, INP of Grenoble, France Enzo Dalle Mese Department of Information Engineering, University of Pisa, Pisa, Italy Gabriele Moser Department of Biophysical and Electronic Engineering, University of Genoa, Genoa, Italy Jose´ M.P. Nascimento Instituto Superior, de Eugenharia de Lisbon, Lisbon, Portugal Nasser Nasrabadi U.S. Army Research Laboratory, Adelphi, Maryland Ryuei Nishii Faculty of Mathematics, Kyusyu University, Fukuoka, Japan Sigeru Omatu Osaka Prefecture University, Osaka, Japan Enders A. Robinson Columbia University, Newsburyport, Massachusetts S. Sandven

Nansen Environmental and Remote Sensing Center, Bergen, Norway

Dale L. Schuler D.C.

Remote Sensing Division, Naval Research Laboratory, Washington,

Massimo Selva Institute of Applied Physics, National Research Council, Florence, Italy

Sebastiano B. Serpico Department of Biophysical and Electronic Engineering, University of Genoa, Genoa, Italy Xizhi Shi State Key Laboratory of Vibration, Shock & Noise, Shanghai Jiaotong University, Shanghai, China Anne H.S. Solberg Department of Informatics, University of Oslo and Norwegian Computing Center, Oslo, Norway Alfred Stein International Institute for Geo-Information Science and Earth Observation, Enschede, The Netherlands Johannes R. Sveinsson Department of Electrical and Computer Engineering, University of Iceland, Reykjavik, Iceland Yingyi Tan

Applied Research Associates; Inc., Raleigh, North Carolina

Stacy L. Tantum Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina Maria Tates U.S. Army Research Laboratory, Adelphi, Maryland, and Morgan State University, Baltimore, Maryland J. Verbesselt Group of Geomatics Engineering, Department of Biosystems, Katholieke Universiteit Leuven, Leuven, Belgium Valeriu Vrabie CReSTIC, University of Reims, Reims, France Xianju Wang Department of Electrical and Computer Engineering, University of Massachusetts Dartmouth, North Dartmouth, Massachusetts Zhenhai Wang Department of Electrical and Computer Engineering, University of Massachusetts Dartmouth, North Dartmouth, Massachusetts Carl White Morgan State University, Baltimore, Maryland Michifumi Yoshioka Osaka Prefecture University, Osaka, Japan Sang-Ho Yun Department of Geophysics, Stanford University, Stanford, California Howard Zebker Department of Geophysics, Stanford University, Stanford, California

Contents

Part I

Signal Processing for Remote Sensing

1.

On the Normalized Hilbert Transform and Its Applications in Remote Sensing................................................................................................................3 Steven R. Long and Norden E. Huang

2.

Statistical Pattern Recognition and Signal Processing in Remote Sensing...........25 Chi Hau Chen

3.

A Universal Neural Network–Based Infrasound Event Classifier ..........................33 Fredric M. Ham and Ranjan Acharyya

4.

Construction of Seismic Images by Ray Tracing.........................................................55 Enders A. Robinson

5.

Multi-Dimensional Seismic Data Decomposition by Higher Order SVD and Unimodal ICA ...................................................................75 Nicolas Le Bihan, Valeriu Vrabie, and Je´roˆme I. Mars

6.

Application of Factor Analysis in Seismic Profiling Zhenhai Wang and Chi Hau Chen ......................................................................................103

7.

Kalman Filtering for Weak Signal Detection in Remote Sensing .........................129 Stacy L. Tantum, Yingyi Tan, and Leslie M. Collins

8.

Relating Time-Series of Meteorological and Remote Sensing Indices to Monitor Vegetation Moisture Dynamics .................................................153 J. Verbesselt, P. Jo¨nsson, S. Lhermitte, I. Jonckheere, J. van Aardt, and P. Coppin

9.

Use of a Prediction-Error Filter in Merging High- and Low-Resolution Images ...............................................................................173 Sang-Ho Yun and Howard Zebker

10.

Blind Separation of Convolutive Mixtures for Canceling Active Sonar Reverberation............................................................................................189 Fengyu Cong, Chi Hau Chen, Shaoling Ji, Peng Jia, and Xizhi Shi

11.

Neural Network Retrievals of Atmospheric Temperature and Moisture Profiles from High-Resolution Infrared and Microwave Sounding Data.....................................................................................205 William J. Blackwell

12.

Satellite-Based Precipitation Retrieval Using Neural Networks, Principal Component Analysis, and Image Sharpening..........................................233 Frederick W. Chen

Part II

Image Processing for Remote Sensing

13.

Polarimetric SAR Techniques for Remote Sensing of the Ocean Surface...........267 Dale L. Schuler, Jong-Sen Lee, and Dayalan Kasilingam

14.

MRF-Based Remote-Sensing Image Classification with Automatic Model Parameter Estimation......................................................................305 Sebastiano B. Serpico and Gabriele Moser

15.

Random Forest Classification of Remote Sensing Data ..........................................327 Sveinn R. Joelsson, Jon A. Benediktsson, and Johannes R. Sveinsson

16.

Supervised Image Classification of Multi-Spectral Images Based on Statistical Machine Learning........................................................................345 Ryuei Nishii and Shinto Eguchi

17.

Unsupervised Change Detection in Multi-Temporal SAR Images .......................373 Lorenzo Bruzzone and Francesca Bovolo

18.

Change-Detection Methods for Location of Mines in SAR Imagery....................401 Maria Tates, Nasser Nasrabadi, Heesung Kwon, and Carl White

19.

Vertex Component Analysis: A Geometric-Based Approach to Unmix Hyperspectral Data .....................................................................415 Jose´ M.B. Dias and Jose´ M.P. Nascimento

20.

Two ICA Approaches for SAR Image Enhancement................................................441 Chi Hau Chen, Xianju Wang, and Salim Chitroub

21.

Long-Range Dependence Models for the Analysis and Discrimination of Sea-Surface Anomalies in Sea SAR Imagery ...........................455 Massimo Bertacca, Fabrizio Berizzi, and Enzo Dalle Mese

22.

Spatial Techniques for Image Classification..............................................................491 Selim Aksoy

23.

Data Fusion for Remote-Sensing Applications..........................................................515 Anne H.S. Solberg

24.

The Hermite Transform: An Efficient Tool for Noise Reduction and Image Fusion in Remote-Sensing .........................................................................539 Boris Escalante-Ramı´rez and Alejandra A. Lo´pez-Caloca

25.

Multi-Sensor Approach to Automated Classification of Sea Ice Image Data.....559 A.V. Bogdanov, S. Sandven, O.M. Johannessen, V.Yu. Alexandrov, and L.P. Bobylev

26.

Use of the Bradley–Terry Model to Assess Uncertainty in an Error Matrix from a Hierarchical Segmentation of an ASTER Image...................591 Alfred Stein, Gerrit Gort, and Arko Lucieer

27.

SAR Image Classification by Support Vector Machine...........................................607 Michifumi Yoshioka, Toru Fujinaka, and Sigeru Omatu

28.

Quality Assessment of Remote-Sensing Multi-Band Optical Images ..................621 Bruno Aiazzi, Luciano Alparone, Stefano Baronti, and Massimo Selva

Index .............................................................................................................................................643

Part I

Signal Processing for Remote Sensing

1 On the Normalized Hilbert Transform and Its Applications in Remote Sensing

Steven R. Long and Norden E. Huang

CONTENTS 1.1 Introduction ........................................................................................................................... 3 1.2 Review of Processing Advances......................................................................................... 4 1.2.1 The Normalized Empirical Mode Decomposition.............................................. 4 1.2.2 Amplitude and Frequency Representations ........................................................ 6 1.2.3 Instantaneous Frequency......................................................................................... 9 1.3 Application to Image Analysis in Remote Sensing....................................................... 12 1.3.1 The IR Digital Camera and Setup........................................................................ 13 1.3.2 Experimental IR Images of Surface Processes ................................................... 13 1.3.3 Volume Computations and Isosurfaces.............................................................. 18 1.4 Conclusion ........................................................................................................................... 21 Acknowledgment......................................................................................................................... 22 References ..................................................................................................................................... 22

1.1

Introduction

The development of this new approach was motivated by the need to describe nonlinear distorted waves in detail, along with the variations of these signals that occur naturally in nonstationary processes (e.g., ocean waves). As has been often noted, natural physical processes are mostly nonlinear and nonstationary. Yet, there have historically been very few options in the available analysis methods to examine data from such nonlinear and nonstationary processes. The available methods have usually been for either linear but nonstationary, or nonlinear but stationary, and statistically deterministic processes. The need to examine data from nonlinear, nonstationary, and stochastic processes in the natural world is due to the nonlinear processes which require special treatment. The past approach of imposing a linear structure (by assumptions) on the nonlinear system is not adequate. Other than periodicity, the detailed dynamics in the processes from the data also need to be determined. This is needed because one of the typical characteristics of nonlinear processes is its intrawave frequency modulation (FM), which indicates the instantaneous frequency (IF) changes within one oscillation cycle. In the past, when the analysis was dependent on linear Fourier analysis, there was no means of depicting the frequency changes within one wavelength (the intrawave frequency variation) except by resorting to the concept of harmonics. The term ‘‘bound 3

Signal and Image Processing for Remote Sensing

4

harmonics’’ was often used in this connection. Thus, the distortions of any nonlinear waveform have often been referred to as ‘‘harmonic distortions.’’ The concept of harmonic distortion is a mathematical artifact resulting from imposing a linear structure (through assumptions) on a nonlinear system. The harmonic distortions may thus have mathematical meaning, but there is no physical meaning associated with them, as discussed by Huang et al. [1,2]. For example, in the case of water waves, such harmonic components do not have any of the real physical characteristics of a water wave as it occurs in nature. The physically meaningful way to describe such data should be in terms of its IF, which will reveal the intrawave FMs occurring naturally. It is reasonable to suggest that any such complicated data should consist of numerous superimposed modes. Therefore, to define only one IF value for any given time is not meaningful (see Ref. [3], for comments on the Wigner–Ville distribution). To fully consider the effects of multicomponent data, a decomposition method should be used to separate the naturally combined components completely and nearly orthogonally. In the case of nonlinear data, the orthogonality condition would need to be relaxed, as discussed by Huang et al. [1]. Initially, Huang et al. [1] proposed the empirical mode decomposition (EMD) approach to produce intrinsic mode functions (IMF), which are both monocomponent and symmetric. This was an important step toward making the application truly practical. With the EMD satisfactorily determined, an important roadblock to truly nonlinear and nonstationary analysis was finally removed. However, the difficulties resulting from the limitations stated by the Bedrosian [4] and Nuttall [5] theorems must also be addressed in connection with this approach. Both limitations have firm theoretical foundations and must be considered; IMFs satisfy only the necessary condition, but not the sufficient condition. To improve the performance of the processing as proposed by Huang et al. [1], the normalized empirical mode decomposition (NEMD) method was developed as a further improvement on the earlier processing methods.

1.2 1.2.1

Review of Processing Advances The Normalized Empirical Mode Decomposition

The NEMD method was developed to satisfy the specific limitations set by the Bedrosian theorem while also providing a sharper measure of the local error when the quadrature differs from the Hilbert transform (HT) result. From an example data set of a natural process, all the local maxima of the data are first determined. These local maxima are then connected with a cubic spline curve, which gives the local amplitude of the data, A(t), as shown together in Figure 1.1. The envelope obtained through spline fitting is used to normalize the data by y(t) ¼

a(t) cos u(t) ¼ A(t)

a(t) cos u(t): A(t)

(1:1)

Here A(t) represents the cubic spline fit of all the maxima from the example data, and thus a(t)/A(t) should normalize y(t) with all maxima then normalized to unity, as shown in Figure 1.2. As is apparent from Figure 1.2, a small number of the normalized data points can still have an amplitude in excess of unity. This is because the cubic spline is through the maxima only, so that at locations where the amplitudes are changing rapidly, the line representing the envelope spline can pass under some of the data points. These occasional

On the Normalized Hilbert Transform and Its Applications in Remote Sensing

5

Example data and splined envelope 0.3

0.2

Magnitude

0.1

0

−0.1

−0.2

−0.3

−0.4 0

200

400

600

800

1000 1200 Time (sec)

1400

1600

1800

2000

FIGURE 1.1 The best possible cubic spline fit to the local maxima of the example data. The spline fit forms an envelope as an important first step in the process. Note also how the frequency can change within a wavelength, and that the oscillations can occur in groups.

misses are unavoidable, yet the normalization scheme has effectively separated the amplitude from the carrier oscillation. The IF can then be computed from this normalized carrier function y(t), just obtained. Owing to the nearly uniform amplitude, the limitations set by the Bedrosian theorem are effectively satisfied. The IF computed in this way from the normalized data from Figure 1.2 is shown in Figure 1.3, together with the original example data. With the Bedrosian theorem addressed, what of the limitations set by the Nuttall theorem? If the HT can be considered to be the quadrature, then the absolute value of the HT performed on the perfectly normalized example data should be unity. Then any deviation from the absolute value of the HT from unity would be an indication of a difference between the quadrature and the HT results. An error index can thus be defined simply as E(t) ¼ [abs(Hilbert transform (y(t))) 1]2 :

(1:2)

This error index would be not only an energy measure as given in the Nuttall theorem but also a function of time as shown in Figure 1.4. Therefore, it gives a local measure of the error resulting from the IF computation. This local measure of error is both logically and practically superior to the integrated error bound established by the Nuttall theorem. If the quadrature and the HT results are identical, then it follows that the error should be

Signal and Image Processing for Remote Sensing

6

Example data and normalized carrier 1.5 Normalized Data 1

Magnitude

0.5

0

−0.5

−1

−1.5 0

200

400

600

800

1000 1200 Time (sec)

1400

1600

1800

2000

FIGURE 1.2 Normalized example data of Figure 1.1 with the cubic spline envelope. The occasional value beyond unity is due to the spline fit slightly missing the maxima at those locations.

zero. Based on experience with various natural data sets, the majority of the errors encountered here result from two sources. The first source is due to an imperfect normalization occurring at locations close to rapidly changing amplitudes, where the envelope spline-fitting is unable to turn sharply or quickly enough to cover all the data points. This type of error is even more pronounced when the amplitude is also locally small, thus amplifying any errors. The error index from this condition can be extremely large. The second source is due to nonlinear waveform distortions, which will cause corresponding variations of the phase function u(t). As discussed by Huang et al. [1], when the phase function is not an elementary function, the differentiation of the phase determined by the HT is not identical to that determined by the quadrature. The error index from this condition is usually small (see Ref. et al. [6]). Overall, the NEMD method gives a more consistent, stable IF. The occasionally large error index values offer an indication where the method failed simply because the spline misses and cuts through the data momentarily. All such locations occur at the minimum amplitude with a resulting negligible energy density.

1.2.2

Amplitude and Frequency Representations

In the initial methods [1,2,6], the main result of Hilbert spectral analysis (HSA) always emphasized the FM. In the original methods, the data were first decomposed into IMFs, as

On the Normalized Hilbert Transform and Its Applications in Remote Sensing

7

Example data and instantaneous frequency 2 Data IF

Magnitude and frequency

1.5

1

0.5

0

−0.5 0

200

400

600

800

1000 1200 Time (sec)

1400

1600

1800

2000

FIGURE 1.3 The instantaneous frequency determined from the normalized carrier function is shown with the example data. Data is about zero, and the instantaneous frequency varies about the horizontal 0.5 value.

defined in the initial work. Then, through the HT, the IF and the amplitude of each IMF were computed to form the Hilbert spectrum. This continues to be the method, especially when the data are normalized. The information on the amplitude or envelope variation is not examined. In the NEMD and HSA approach, it is justifiable not to pay too much attention to the amplitude variations. This is because if there is a mode mixing, the amplitude variation from such mixed mode IMFs does not reveal any true underlying physical processes. However, there are cases when the envelope variation does contain critical information. An example of this is when there is no mode mixing in any given IMF, when a beating signal representing the sum of two coexisting sinusoidal ones is encountered. In an earlier paper, Huang et al. [1] attempted to extract individual components out of the sum of two linear trigonometric functions such as x(t) ¼ cos at þ cos bt:

(1:3)

Two seemingly separate components were recovered after over 3000 sifting steps. Yet the obtained IMFs were not purely trigonometric functions anymore, and there were obvious aliases in the resulting IMF components as well as in the residue. The approach proposed then was unnecessary and unsatisfactory. The problem, in fact, has a much simpler solution: treating the envelope as an amplitude modulation (AM), and then processing just the envelope data. The function x(t), as given in Equation 1.3, can then be rewritten as

Signal and Image Processing for Remote Sensing

8

Offset example data and error measures 0.7 Error index EI quadrature

Offset magnitude, error index (EI), quadrature

0.6

Data + 0.3

0.5

0.4

0.3

0.2

0.1

0 0

200

400

600

800

1000 1200 Time (sec)

1400

1600

1800

2000

FIGURE 1.4 The error index as it changes with the data location in time. The original example data offset by 0.3 vertically for clarity is also shown. The quadrature result is not visible on this scale.

x(t) ¼ cos at þ cos bt ¼ 2 cos

aþb ab t cos t : 2 2

(1:4)

There is no difference between the sum of the individual components and the modulating envelope form; they are trigonometric identities. If both the frequency of the carrier wave, (aþb)/2, and the frequency of the envelope, (ab)/2, can be obtained, then all the information in the signal can be extracted. This indicates the reason to look for a new approach to extracting additional information from the envelope. In this example, however, the envelope becomes a rectified cosine wave. The frequency would be easier to determine from the simple period counting than from the Hilbert spectral result. For a more general case when the amplitudes of the two sinusoidal functions are not equal, the modulation is not simple anymore. For even more complicated cases, when there are more than two coexisting sinusoidal components with different amplitudes and frequencies, there is no general expression for the envelope and carrier. The final result could be represented as more than one frequency-modulated band in the Hilbert spectrum. It is then impossible to describe the individual components under this situation. In such cases, representing the signal as a carrier and envelope, variation should still be meaningful, for

On the Normalized Hilbert Transform and Its Applications in Remote Sensing

9

the dual representations of frequency arise from the different definitions of frequency. The Hilbert-inspired view of amplitude and FMs still renders a correct representation of the signal, but this view is very different from that of Fourier analysis. In such cases, if one is sure of the stationarity and regularity of the signal, Fourier analysis could be used, which will give more familiar results as suggested by Huang et al. [1]. The judgment for these cases is not on which one is correct, as both are correct; rather, it is on which one is more familiar and more revealing. When more complicated data are present, such as in the case of radar returns, tsunami wave records, earthquake data, speech signals, and so on (representing a frequency ‘‘chirp’’), the amplitude variation information can be found by processing the envelope and treating the data as an approximate carrier. When the envelope of frequency chirp data, such as the example given in Figure 1.5, is decomposed through the NEMD process, the IMF components are obtained as shown in Figure 1.6. Using these components (or IMFs), the Hilbert spectrum can be constructed as given in Figure 1.7, together with its FM counterpart. The physical meaning of the AM spectrum is not as clearly defined in this case. However, it serves to illustrate the AM contribution to the variability of the local frequency. 1.2.3

Instantaneous Frequency

It must be emphasized that IF is a very different concept from the frequency content of the data derived from Fourier-based methods, as discussed in great detail by Huang Example of frequency chirp data 0.8

0.6

0.4

Magnitude

0.2

0

−0.2

−0.4

−0.6

−0.8 0

0.1

0.2

0.3 Time (sec)

0.4

FIGURE 1.5 A typical example of complex natural data, illustrating the concept of frequency ‘‘chirps.’’

0.5

Signal and Image Processing for Remote Sensing

10

Components of frequency chirp data

0

Offset IMF components C1 (top) to C8

12000

24000

36000

48000

60000

72000

84000 0

2000

4000

6000 Time (sec*22050)

8000

10000

12000

FIGURE 1.6 The eight IMF components obtained by processing the frequency chirp data of Figure 1.5, offset vertically from C1 (top) to C8 (bottom).

et al. [1]. The IF, as discussed here, is based on the instantaneous variation of the phase function from the HT of a data-adaptive decomposition, while the frequency content in the Fourier approach is an averaged frequency on the basis of a convolution of data with an a priori basis. Therefore, whenever the basis changes, the frequency content also changes. Similarly, when the decomposition changes, the IF also has to change. However, there are still persistent and common misconceptions on the IF computed in this manner. One of the most prevailing misconceptions about IF is that, for any data with a discrete line spectrum, IF can be a continuous function. A variation of this misconception is that IF can give frequency values that are not one of the discrete spectral lines. This dilemma can be resolved easily. In the nonlinear cases, when the IF approach treats the harmonic distortions as continuous intrawave FMs, the Fourier-based methods treat the frequency content as discrete harmonic spectral lines. In the case of two or more beating waves, the IF approach treats the data as AM and FM modulation, while the frequency content from the Fourier method treats each constituting wave as a discrete spectral line, if the process is stationary. Although they appear perplexingly different, they represent the same data. Another misconception is on negative IF values. According to Gabor’s [7] approach, the HT is implemented through two Fourier transforms: the first transforms the data into frequency space, while the second performs an inverse Fourier transform after discarding all the negative frequency parts [3]. Therefore, according to this argument, all the negative frequency content has been discarded, which then raises the question, how can there still

On the Normalized Hilbert Transform and Its Applications in Remote Sensing

11

FM and AM Hilbert spectra 3000

2500

Frequency (Hz)

2000

1500

1000

500

0 0

0.05

0.1

0.15

0.2

0.25 0.3 Time (sec)

0.35

0.4

0.45

0.5

FIGURE 1.7 The AM and FM Hilbert spectral results from the frequency chirp data of Figure 1.5.

be negative frequency values? This question arises due to a misunderstanding of the nature of negative IF from the HT. The direct cause of negative frequency in the HT is the consequence of multiple extrema between two zero-crossings. Then, there are local loops not centered at the origin of the coordinate system, as discussed by Huang et al. [1]. Negative frequency can also occur even if there are no multiple extrema. For example, this would happen when there are large amplitude fluctuations, which cause the Hilberttransformed phase loop to miss the origin. Therefore, the negative frequency does not influence the frequency content in the process of the HT through Gabor’s [7] approach. Both these causes are removed by the NEMD and the normalized Hilbert transform (NHT) methods presented here. The latest versions of these methods (NEMD/NHT) consistently give more stable IF values. They satisfy the limitation set by the Bedrosian theorem and offer a local measure of error sharper than the Nuttall theorem. Note here that in the initial spline of the amplitude done in the NEMD approach, the end effects again become important. The method used here is just to assign the end points as a maximum equal to the very last value. Other improvements using characteristic waves and linear predictions, as discussed in Ref. [1], can also be employed. There could be some improvement, but the resulting fit will be very similar. Ever since the introduction of the EMD and HSA by Huang et al. [1,2,8], these methods have attracted increasing attention. Some investigators, however, have expressed certain reservations. For example, Olhede and Walden [9] suggested that the idea of computing

12

Signal and Image Processing for Remote Sensing

IF through the Hilbert transform is good, but that the EMD approach is not rigorous. Therefore, they have introduced the wavelet projection as the method for decomposition and adopt only the IF computation from the Hilbert transform. Flandrin et al. [10], however, suggest that the EMD is equivalent to a bank of dyadic filters, but refrain from using the HT. From the analysis presented here, it can be concluded that caution when using the HT is fully justified. The limitations imposed by Bedrosian and Nuttall certainly have solid theoretical foundations. The normalization procedure shown here will remove any reservations about further applications of the improved HT methods in data analysis. The method offers relatively little help to the approach advanced by Olhede and Walden [9] because the wavelet decomposition definitely removes the nonlinear distortions from the waveform. The consequence of this, however, is that their approach should also be limited to nonstationary, but linear, processes. It only serves the limited purpose of improving the poor frequency resolution of the continuous wavelet analysis. As clearly shown in Equation 1.1, to give a good representation of actual wave data or other data from natural processes by means of an analytical wave profile, the analytical profile will need to have IMFs, and also obey the limitations imposed by the Bedrosian and Nuttall theorems. In the past, such a thorough examination of the data has not been done. As reported by Huang et al. [2,8], most of the actual wave data recorded are not composed of single components. Consequently, the analytical representation of a given wave profile in the form of Equation 1.1 poses a challenging problem theoretically.

1.3

Application to Image Analysis in Remote Sensing

Just as much of the data from natural phenomena are either nonlinear or nonstationary, or both, so it is also with the data that form images of natural processes. The methods of image processing are already well advanced, as can be seen in reviews such as by Castleman [11] or Russ [12]. The NEMD/NHT methods can now be added to the available tools for producing new and unique image products. Nunes et al. [13] and Linderhed [14–16], among others, have already done significant work in this new area. Because of the nonlinear and nonstationary nature of natural processes, the NEMD/NHT approach is especially well suited for image data, giving frequencies, inverse distances, or wave numbers as a function of time or distance, along with the amplitudes or energy values associated with these, as well as a sharp identification of imbedded structures. The various possibilities and products of this new analysis approach include, but are not limited to, joint and marginal distributions, which can be viewed as isosurfaces, contour plots, and surfaces that contain information on frequency, inverse wavelength, amplitude, energy and location in time, space, or both. Additionally, the concept of component images representing the intrinsic scales and structures imbedded in the data is now possible, along with a technique for obtaining frequency variations of structures within the images. The laboratory used for producing the nonlinear waves, used as an example here, is the NASA Air–Sea Interaction Research Facility (NASIRF) located at the NASA Goddard Space Flight Center/Wallops Flight Facility, at Wallops Island, Virginia, within the Ocean Sciences Branch. The test section of the main wave tank is 18.3 m long and 0.9 m wide, filled to a depth of 0.76 m of water, leaving a height of 0.45 m over the water for airflow,

On the Normalized Hilbert Transform and Its Applications in Remote Sensing 5.05 m

13

New coil

1.22 m 18.29 m 30.48 m FIGURE 1.8 The NASA Air–Sea Interaction Research Facility’s (NASIRF) main wave tank at Wallops Island, VA. The new coils shown were used to provide cooling and humidity control in the airflow overheated water.

if needed. The facility can produce wind and paddle-generated waves over a water current in either direction, and its capabilities, instruments and software have been described in detail by Long and colleagues [17–21]. The basic description is shown with an additional new feature indicated as new coil in Figure 1.8. These were recently installed to provide cold air of controlled temperature and humidity for experiments using cold air overheated water during the Flux Exchange Dynamics Study of 2004 (FEDS4) experiments, a joint experiment involving the University of Washington/Applied Physics Laboratory (UW/APL), The University of Alberta, the Lamont-Doherty Earth Observatory of Columbia University, and NASA GSFC/Wallops Flight Facility. The cold airflow overheated water optimized conditions for the collection of infrared (IR) video images.

1.3.1

The IR Digital Camera and Setup

The camera used to acquire the laboratory image presented here as an example was provided by UW/APL as part of FEDS4. The experimental setup is shown in Figure 1.9. For the example shown here, the resolution of the IR image was 640 512 pixels. The camera was mounted to look upwind at the water surface, so that its pixel image area covered a physical rectangle on the water surface on the order of 10 cm per side. The water within the wave tank was heated by four commercial spa heaters, while the air in the airflow was cooled and humidity controlled by NASIRF’s new cooling and reheating coils. This produced a very thin layer of surface water that was cooled, so that whenever wave spilling and breaking occurred, it could be immediately seen by the IR camera.

1.3.2

Experimental IR Images of Surface Processes

With this imaging system in place, steps were taken to acquire interesting images of wave breaking and spilling due to wind and wave interactions. One such image is illustrated in Figure 1.10. To help the eyes visualize the image data, the IR camera intensity levels have been converted to a grey scale. Using a horizontal line that slices through the central area of the image at the value of 275, Figure 1.11 illustrates the details contained in the actual array of data values obtained from the IR camera. This gives the IR camera intensity values stored in the pixels along the horizontal line. These can then be converted to actual temperatures when needed. A complex structure is evident here. Breaking wave fronts are evident in the crescent-

Signal and Image Processing for Remote Sensing

14

Flux Exchange Dynamics Study (FEDS) 4 NASA Air–Sea Interaction Research Facility, Wallops Island, Virginia Q = rCPKH (Tb – Ts) Use two method for Tb • Direct measurement • Infer from PDF

IR camera

Instrumentation IR Camera IR Radiometer Air : U, T, q profiles SeaBird T sensors Fast T sensors Gas Chromatograph

Measurement KH via ACFT, p (Tsurf) Tskin, calibrated "LabRad" Qnet, u* Tbulk, calibrated Twater, sub-skin profiles Bulk KG LabRad IR radiometer Tskin (calibrated)

CO2 laser

KH via ACFT Tbulk via PDF

Tair qair u

FLUXES Qsensible Qlatent u*

Wind 45 cm

Tbulk 28 mm

Tprofile

76 cm

72 mm 150 mm

FIGURE 1.9 The experimental arrangement of FEDS4 (Flux Exchange Dynamics Study of 2004) used to capture IR images of surface wave processes. (Courtesy of A. Jessup and K. Phadnis of UW/APL.)

IR image of water wave surface 250

100 200

Vertical pixels

200 150 300

100

400

500

50

600 50

100

150

200 250 300 350 Horizontal pixels

400

450

500

0

FIGURE 1.10 Surface IR image from the FEDS4 experiment. Grey bar gives the IR camera intensity levels. (Data courtesy of A. Jessup and K. Phadnis of UW/APL.)

On the Normalized Hilbert Transform and Its Applications in Remote Sensing

15

shaped structures, where spilling and breaking brings up the underlying warmer water. After processing, the resulting components produced from the horizontal row of Figure 1.11 are shown in Figure 1.12. As can be seen, the component with the longest scale, C9, contains the bulk of the intensity values. The shorter, riding scales are fluctuations about the levels shown in component C9. The sifting was done via the extrema approach discussed in the foundation articles, and produced a total of nine components. Using this approach, the IR image was first divided into 640 horizontal rows of 512 values each. The rows were then processed to produce the components, each of the 640 rows producing a component set similar to that shown in Figure 1.12. From these basic results, component images can be assembled. This is done by taking the first component representing the shortest scale from each of the 640 component sets. These first components are then assembled together to produce an array that is also 640 rows by 512 columns and can also be visualized as an image. This is the first component image. This production of component images is then continued in a similar fashion with the remaining components representing progressively longer scales. To visualize the shortest component scales, component images 1 through 4 were added together, as shown in Figure 1.13. Throughout the image, streaks of short wavy structures can be seen to line up in the wind direction (along the vertical axis). Even though the image is formed in the IR camera by measuring heat at many different pixel locations over a rectangular area, the surface waves have an effect that can be thus remotely sensed in the image, either as streaks of warmer water exposed by breaking or as more wavelike structures. If the longer scale components are now combined using the 5th and 6th component images, a composite image is obtained as shown in Figure 1.14. Longer scales can be seen throughout the image area where breaking and mixing occur. Other wavelike

Row 275 of IR image of water wave surface

190 180 170

IR intensity levels

160 150 140 130 120 110 100 90 0

100

200

300

400

500

600

Horizontal pixels FIGURE 1.11 A horizontal slice of the raw IR image given in Figure 1.10, taken at row 275. Note the details contained in the IR image data, showing structures containing both short and longer length scales.

Signal and Image Processing for Remote Sensing

16

C1 C2 C3

10 0 −10 10 0 −10

5 0 −5 2 0 −2

C9

C7

5 0 −5 5 0 −5

C8

C6

C5

10 0 −10

C4

Components of row 275 of IR image 20 0 −20

134 132 130

50

100

150

200

250 300 Horizontal pixels

350

400

450

500

FIGURE 1.12 Components obtained by processing data from the slice shown in Figure 1.11. Note that component C9 carries the bulk of the intensity scale, while the other components with shorter scales record the fluctuations about these base levels.

structures of longer wavelengths are also visible. To produce a true wave number from images like these, one only has to convert using k ¼ 2p=l,

(1:5)

where k is wave number (in 1/cm) and l is wavelength (in cm). This would only require knowing the physical size of the image in centimeters or some other unit and its equivalent in pixels from the array analyzed. Another approach to the raw image of Figure 1.10 is to separate the original image into columns instead of rows. This would make the analysis more sensitive to structures that were better aligned with that direction, and also with the direction of wind and waves. By repeating the steps leading to Figure 1.13, the shortest scale component images in component images 3 to 5 can be combined to form Figure 1.15. Component images 1 and 2 developed from the vertical column analysis were not included here, after they were found to contain results of such a short scale uniformly spread throughout the image, and without structure. Indeed, they had the appearance of uniform noise. It is apparent that more structures at these scales can be seen by analyzing along the column direction. Figure 1.16 represents the longer scale in component image 6. By the 6th component image, the lamination process starts to fail somewhat in reassembling the image from the components. Further processing is needed to better match the results at these longer scales.

On the Normalized Hilbert Transform and Its Applications in Remote Sensing

17

Horizantal IR components 1 to 4 100 100

50

Vertical pixels

200

300

0

400 −50 500

−100

600 50

100

150

200 300 250 Horizontal pixels

350

400

450

500

FIGURE 1.13 (See color insert following page 302.) Component images 1 to 4 from the horizontal rows used to produce a composite image representing the shortest scales.

Horizontal IR components 5 to 6 40 100

30 20

200 Vertical pixels

10

300 0

400

−10 −20

500

−30 600 50

100

150

200 250 300 350 Horizontal pixels

400

450

500

−40

FIGURE 1.14 (See color insert following page 302.) Component images 5 to 6 from the horizontal rows used to produce a composite image representing the longer scales.

Signal and Image Processing for Remote Sensing

18

Vertical pixels

Vertical IR components 3 to 5

100

40

200

20

300

0

400

−20

500

−40

600

−60 50

100

150

200 250 300 Horizontal pixels

350

400

450

500

FIGURE 1.15 (See color insert following page 302.) Component images 3 to 5 from the vertical rows here combined to produce a composite image representing the midrange scales.

When the original data are a function of time, this new approach can produce the IF and amplitude as functions of time. Here, the original data are from an IR image, so that any slice through the image (horizontal or vertical) would be a set of camera values (ultimately temperature) representing the temperature variation over a physical length. Thus, instead of producing frequency (inverse time scale), the new approach here initially produces an inverse length scale. In the case of water surface waves, this is the familiar scale of the wave number, as given in Equation 1.5. To illustrate this, consider Figure 1.17, which shows the changes of scale along the selected horizontal row 400. The largest measures of IR energy can be seen to be at the smaller inverse length scales, which imply that it came from the longer scales of components 3 and 4. Figure 1.18 repeats this for the even longer length scales in components 5 and 6. Returning to the column-wise processing at column 250 of Figure 1.15 and Figure 1.16, further processing gives the contour plot of Figure 1.19, for components 3 through 5, and Figure 1.20, for components 4 through 6.

1.3.3

Volume Computations and Isosurfaces

Many interesting phenomena happen in the flow of time, and thus it is interesting to note how changes occur with time in the images. To include time in the analysis, a sequence of images taken at uniform time steps can be used. By starting with a single horizontal or vertical line from the image, a contour plot can be produced, as was shown in Figure 1.7 through Figure 1.20. Using a set of sequential images covering a known time period and a pixel line of data from each (horizontal or vertical), a set of numerical arrays can be obtained from the NEMD/NHT analysis. Each

On the Normalized Hilbert Transform and Its Applications in Remote Sensing

19

Vertical IR components image 6 25 20 100 15

Vertical pixels

200

10 5

300 0 −5

400

−10 500 −15 −20

600 50

100

150

200 250 300 Horizontal pixels

350

400

450

500

FIGURE 1.16 (See color insert following page 302.) Component image 6 from the vertical row used to produce a composite image representing the longer scale.

array can be visualized by means of a contour plot, as already shown. The entire set of arrays can also be combined in sequence to form an array volume, or an array of dimension 3. Within the volume, each element of the array contains the amplitude or intensity of the data from the image sequence. The individual element location within the three-dimensional array specifies values associated with the stored data. One axis (call it x) of the volume can represent horizontal or vertical distance down the data line taken from the image. Another axis (call it y) can represent the resulting inverse length scale associated with the data. The additional axis (call it z) is produced by laminating the arrays together, and represents time, because each image was acquired in repetitive time steps. Thus, the position of the element in the volume gives location x along the horizontal or vertical slice, inverse length along the y-axis, and time along the z-axis. Isosurface techniques would be needed to visualize this. This could be compared to peeling an onion, except that the different layers, or spatial contour values, are not bound in spherical shells. After a value of data intensity is specified, the isosurface visualization makes all array elements transparent outside of the level of the value chosen, while shading in the chosen value so that the elements inside that level (or behind it) cannot be seen. Some examples of this procedure can be seen in Ref. [21]. Another approach with the analysis of images is to reassemble lines from the image data using a different format. A sequence of images in units of time is needed, and using the same horizontal or vertical line from each image in the time sequence, each line can be laminated to its predecessor to build up an array that is the image length along the chosen line along one edge, and the number of images along the other axis, in units of

Signal and Image Processing for Remote Sensing

20

Horizontal row 400: components 1 to 4 10

Inverse length scale (1/pixel)

0.45

9

0.4

8

0.35

7

0.3

6

0.25

5

0.2

4

0.15

3

0.1

2 1

0.05 50

100

150

200 250 300 350 Horizontal pixels

400

450

500

FIGURE 1.17 (See color insert following page 302.) The results from the NEMD/NHT computation on horizontal row 400 for components 1 to 4, which resulted from Figure 1.13. Note the apparent influence of surface waves on the IR information. The most intense IR radiation can be seen at the smaller values of inverse length scale, denoting the longer scales in components 3 and 4. A wavelike influence can be seen at all scales.

Horizontal row 400: components 5 to 6

0.04

8 0.035 7 Inverse length scale (1/pixel)

0.03 6 0.025 5 0.02 4 0.015

3

0.01

2

0.005

0 0

1

50

100

150

200 250 300 Horizontal pixels

350

400

450

500

FIGURE 1.18 (See color insert following page 302.) The results from the NEMD/NHT computation on horizontal row 400 for components 5 to 6, which resulted from Figure 1.14. Even at the longer scales, an apparent influence of surface waves on the IR information can still be seen.

On the Normalized Hilbert Transform and Its Applications in Remote Sensing

21

Inverse length scale (1/pixel)

Vertical column 250: components 3 to 5

0.18

10

0.16

9 8

0.14

7 0.12 6 0.1 5 0.08 4 0.06 3 0.04

2

0.02

1

100

200

300 400 Vertical pixels

500

600

FIGURE 1.19 The contour plot developed from the vertical slice at column 250, using the components 3 to 5. The larger IR values can be seen at longer length scales.

time. Once complete, this two-dimensional array can be split into slices along the time axis. Each of these time slices, representing the variation in data values with time at a single-pixel location, can then be processed with the new NEMD/NHT technique. An example of this can also be seen in Ref. [21]. The NEMD/NHT techniques can thus reveal variations in frequency or time in the data at a specific location in the image sequence.

1.4

Conclusion

With the introduction of the normalization procedure, one of the major obstacles for NEMD/NHT analysis has been removed. Together with the establishment of the confidence limit [6] through the variation of stoppage criterion, and the statistically significant test of the information content for IMF [10,22], and the further development of the concept of IF [23], the new analysis approach has indeed approached maturity for applications empirically, if not mathematically (for a recent overview of developments, see Ref. [24]). The new NEMD/NHT methods provide the best overall approach to determine the IF for nonlinear and nonstationary data. Thus, a new tool is available to aid in further understanding and gaining deeper insight into the wealth of data now possible by remote

Signal and Image Processing for Remote Sensing

22

Vertical column 250: components 4 to 6 0.1 25 0.09

Inverse length scale (1/pixel)

0.08

20

0.07 0.06

15

0.05 10

0.04 0.03

5

0.02 0.01 100

200

300 400 Vertical pixels

500

600

FIGURE 1.20 The contour plot developed from the vertical slice at column 250, using the components 4 through 6, as in Figure 1.19.

sensing and other means. Specifically, the application of the new method to data images was demonstrated. This new approach is covered by several U.S. Patents held by NASA, as discussed by Huang and Long [25]. Further information on obtaining the software can be found at the NASA authorized commercial site: http://www.fuentek.com/technologies/hht.htm

Acknowledgment The authors wish to express their continuing gratitude and thanks to Dr. Eric Lindstrom of NASA headquarters for his encouragement and support of the work.

References 1. Huang, N.E., Shen, Z., Long, S.R., Wu, M.C., Shih, S.H., Zheng, Q., Tung, C.C., and Liu, H.H., The empirical mode decomposition method and the Hilbert spectrum for non-stationary time series analysis, Proc. Roy. Soc. London, A454, 903–995, 1998.

On the Normalized Hilbert Transform and Its Applications in Remote Sensing

23

2. Huang, N.E., Shen, Z., and Long, S.R., A new view of water waves—the Hilbert spectrum, Ann. Rev. Fluid Mech., 31, 417–457, 1999. 3. Flandrin, P., Time–Frequency/ Time–Scale Analysis, Academic Press, San Diego, 1999. 4. Bedrosian, E., On the quadrature approximation to the Hilbert transform of modulated signals, Proc. IEEE, 51, 868–869, 1963. 5. Nuttall, A.H., On the quadrature approximation to the Hilbert transform of modulated signals, Proc. IEEE, 54, 1458–1459, 1966. 6. Huang, N.E., Wu, M.L., Long, S.R., Shen, S.S.P., Qu, W.D., Gloersen, P., and Fan, K.L., A confidence limit for the empirical mode decomposition and the Hilbert spectral analysis, Proc. Roy. Soc. London, A459, 2317–2345, 2003. 7. Gabor, D., Theory of communication, J. IEEE, 93, 426–457, 1946. 8. Huang, N.E., Long, S.R., and Shen, Z., The mechanism for frequency downshift in nonlinear wave evolution, Adv. Appl. Mech., 32, 59–111, 1996. 9. Olhede, S. and Walden, A.T., The Hilbert spectrum via wavelet projections, Proc. Roy. Soc. London, A460, 955–975, 2004. 10. Flandrin, P., Rilling, G., and Gonc¸alves, P., Empirical mode decomposition as a filterbank, IEEE Signal Proc. Lett., 11(2), 112–114, 2004. 11. Castleman, K.R., Digital Image Processing, Prentice-Hall, Englewood Cliffs, NJ, 1996. 12. Russ, J.C., The Image Processing Handbook, 4th Edition, CRC Press, Boca Raton, 2002. 13. Nunes, J.C., Guyot, S., and Dele´chelle, E., Texture analysis based on local analysis of the bidimensional empirical mode decomposition, Mach. Vision Appl., 16(3), 177–188, 2005. 14. Linderhed, A., Compression by image empirical mode decomposition, IEEE Int. Conf. Image Process., 1, 553–556, 2005. 15. Linderhed, A., Variable sampling of the empirical mode decomposition of two-dimensional signals, Int. J. Wavelets, Multi-resolut. Inform. Process., 3, 2005. 16. Linderhed, A., 2D empirical mode decompositions in the spirit of image compression, Wavelet Independ. Compon. Analy. Appl. IX, SPIE Proc., 4738, 1–8, 2002. 17. Long, S.R., NASA Wallops Flight Facility Air–Sea Interaction Research Facility, NASA Reference Publication, No. 1277, 1992, 29 pp. 18. Long, S.R., Lai, R.J., Huang, N.E., and Spedding, G.R., Blocking and trapping of waves in an inhomogeneous flow, Dynam. Atmos. Oceans, 20, 79–106, 1993. 19. Long, S.R., Huang, N.E., Tung, C.C., Wu, M.-L.C., Lin, R.-Q., Mollo-Christensen, E., and Yuan, Y., The Hilbert techniques: An alternate approach for non-steady time series analysis, IEEE GRSS, 3, 6–11, 1995. 20. Long, S.R. and Klinke, J., A closer look at short waves generated by wave interactions with adverse currents, Gas Transfer at Water Surfaces, Geophysical Monograph 127, American Geophysical Union, 121–128, 2002. 21. Long, S.R., Applications of HHT in image analysis, Hilbert–Huang Transform and Its Applications, Interdisciplinary Mathematical Sciences, 5, 289–305, World Scientific, Singapore, 2005. 22. Wu, Z. and Huang, N.E., A study of the characteristics of white noise using the empirical mode decomposition method, Proc. Roy. Soc. London, A460, 1597–1611, 2004. 23. Huang, N.E., Wu, Z., Long, S.R., Arnold, K.C., Blank, K., and Liu, T.W., On instantaneous frequency, Proc. Roy. Soc. London 2006, in press. 24. Huang, N.E., Introduction to the Hilbert–Huang transform and its related mathematical problems, Hilbert–Huang Transform and Its Applications, Interdisciplinary Mathematical Sciences, 5, 1–26, World Scientific, Singapore, 2005. 25. Huang, N.E. and Long, S.R., A generalized zero-crossing for local frequency determination, US Patent pending, 2003.

2 Statistical Pattern Recognition and Signal Processing in Remote Sensing

Chi Hau Chen

CONTENTS 2.1 Introduction ......................................................................................................................... 25 2.2 Introduction to Statistical Pattern Recognition in Remote Sensing............................ 26 2.3 Using Self-Organizing Maps and Radial Basis Function Networks for Pixel Classification ....................................................................................................... 28 2.4 Introduction to Statistical Signal Processing in Remote Sensing................................ 28 2.5 Conclusions.......................................................................................................................... 30 References ..................................................................................................................................... 30

2.1

Introduction

Basically, statistical pattern recognition deals with the correct classification of a pattern into one of several available pattern classes. Basic topics in statistical pattern recognition include: preprocessing, feature extraction and selection, parametric or nonparametric probability density, decision-making processes, performance evaluation, postprocessing as needed, supervised and unsupervised learning, or training, and cluster analysis. The large amount of data available makes remote-sensing data uniquely suitable for statistical pattern recognition. Signal processing is needed not only to reduce the undesired noises and interferences but also to extract desired information from the data as well as to perform the preprocessing task for pattern recognition. Remote-sensing data considered include those from multispectral, hyperspectral, radar, optical, and infrared sensors. Statistical signal-processing methods, as used in remote sensing, include transform methods such as principal component analysis (PCA), independent component analysis (ICA), factor analysis, and the methods using high-order statistics. This chapter is presented as a brief overview of the statistical pattern recognition and statistical signal processing in remote sensing. The views and comments presented, however, are largely those of this author. The chapter introduces the pattern recognition and signal-processing topics dealt in this book. The readers are highly recommended to refer the book by Landgrebe [1] for remote-sensing pattern classification issues and the article by Duin and Tax [2] for a survey on statistical pattern recognition.

25

Signal and Image Processing for Remote Sensing

26

Although there are many applications of statistical pattern recognition, its theory has been developed only during the last half century. A list of some major theoretical developments includes the following: .

Formulation of pattern recognition as a Bayes decision theory problem [3]

.

Nearest neighbor decision rules (NNDRs) and density estimation [4]

.

Use of Parzen density estimate in nonparametric pattern recognition [5] Leave-one-out method of error estimation [6]

.

.

Use of statistical distance measures and error bounds in feature evaluation [7] Hidden Markov models as one way to deal with contextual information [8]

.

Minimization of the perceptron criterion function [9]

.

Fisher linear discriminant and multicategory generlizations [10] Link between backpropagation trained neural networks and the Bayes discriminant [11]

.

.

. .

Cover’s theorem on the separability of patterns [12] Unsupervised learning by decomposition of mixture densities [13]

.

K-mean algorithm [14] Self-organizing map (SOM) [15]

.

Statistical learning theory and VC dimension [16,17]

.

Support vector machine for pattern recognition [17] Combining classifiers [18]

.

. . .

Nonlinear mapping [19] Effect of finite sample size (e.g., [13])

In the above discussion, the role of artificial neural networks on statistical classification and clustering has been taken into account. The above list is clearly not complete and is quite subjective. However, these developments clearly have a significant impact on information processing in remote sensing. We now examine briefly the performance measures in statistical pattern recognition. .

Error probability. This is most popular as the Bayes decision rule is optimum for minimum error probability. It is noted that an average classification accuracy was proposed by Wilkinson [20] for remote sensing.

.

Ratio of interclass distance to within-class distance. This is most popular for discriminant analysis that seeks to maximize such a ratio. Mean square error. This is most popular mainly in error correction learning and in neural networks.

.

.

ROC (receiver operating characteristics) curve, which is a plot of the probability of correct decision versus the probability of false alarm, with other parameters given.

Other measures, like error-reject tradeoff, are often used in character recognition.

2.2

Introduction to Statistical Pattern Recognition in Remote Sensing

Feature extraction and selection is still a basic problem in statistical pattern recognition for any application. Feature measurements constructed from multiple bands of the

Statistical Pattern Recognition and Signal Processing in Remote Sensing

27

remote-sensing data as a vector are still most commonly used in remote-sensing pattern recognition. Transform methods are useful to reduce the redundancy in vector measurements. The dimensionality reduction has been a particularly important topic in remote sensing in view of the hyperspectral image data, which normally has several hundred spectral bands. The parametric classification rules include the Bayes or maximum likelihood decision rule and discriminant analysis. The nonparametric (or distribution free) method of classification includes NNDR and its modifications, and the Parzen density estimation. In the early 1970s, the multivariate Gaussian assumption was most popular in the multispectral data classification problem. It was demonstrated that the empirical data follows the Gaussian distribution reasonably well [21]. Even with the use of new sensors and the expanded application of remote sensing, the Gaussian assumption remains to be a good approximation. The traditional multivariate analysis still plays a useful role in remotesensing pattern recognition [22] and, because of the importance of covariance matrix, methods to use unsupervised samples to ‘‘enhance’’ the data statistics have also been considered. Indeed, for good classification, data statistics must be carefully examined. An example is the synthetic aperture radar (SAR) image data. Chapter 13 presents a discussion on the physical and statistical characteristics of the SAR image data. Without making use of the Gaussian assumption, the NNDR is the most popular nonparametric classification method. It works well even with a moderate size data set and promises an error rate that is upper-bounded by twice the Bayes error rate. However, its performance is limited in remote-sensing data classification, while neural networks–based classifiers can reach the performance nearly equal to that of the Bayes classifier. Extensive study has been done in the statistical pattern recognition community to improve the performance of NNDR. We would like to mention the work of Grabowski et al. [23] here, which introduces the k-near surrounding neighbor (k-NSN) decision rule with application to remote-sensing data classification. Some unique problem areas of statistical pattern recognition in remote sensing are the use of contextual information and the ‘‘Hughes phenomenon.’’ The use of Markov random field model for contextual information is presented in Chapter 14 of this volume. While the classification performance generally improves with increases in the feature dimension, the performance reaches a peak without a proportional increase in the training sample size, beyond which the performance degrades. This is the so-called ‘‘Hughes phenomenon.’’ Methods to reduce this phenomenon are well presented in Ref. [1]. Data fusion is important in remote sensing as different sensors, which have different strengths, are often used. The subject is treated in Chapter 23 of this volume. Though the approach is not limited to statistical methodology [24], the approaches in combining classifiers in statistical pattern recognition and neural networks can be quite useful in providing effective utilizations of information from different sensors or sources to achieve the best-available classification performance. In this volume, Chapter 15 and Chapter 16 present two approaches in statistical combing of classifiers. The recent development in support vector machine appears to present an ultimate classifier that may provide the best classification performance. Indeed, the design of the classifier is fundamental to the classification performance. There is, however, a basic question: ‘‘Is there a best classifier?’’ [25]. The answer is ‘‘No’’ as, among other reasons, it is evident that the classification process is data-dependent. Theory and practice are often not consistent in pattern recognition. The preprocessing and feature extraction and selection are important and can influence the final classification performance. There are no clear steps to be taken in preprocessing, and the optimal feature extraction and selection is still an unsolved problem. A single feature derived from the genetic algorithm may perform better than several original features. There is always a choice to be made between using a complex

28

Signal and Image Processing for Remote Sensing

feature set followed by a simple classifier and a simple feature set followed by a complex classifier. In this volume, Chapter 27 deals with the classification by support vector machine. Among many other publications on the subject, Melgani and Bruzzone [26] provide an informative comparison of the performance of several support vector machines.

2.3 Using Self-Organizing Maps and Radial Basis Function Networks for Pixel Classification In this section, some experimental results are presented to illustrate the importance of preprocessing before classification. The data set, which is now available at the IEEE Geoscience and Remote Sensing Society database, consists of 250 350 pixel images. They were acquired by two imaging sensors installed on a Daedalus 1268 Airborne Thematic Mapper (ATM) scanner and a PLC-band, fully polarimetric NASA/JPL SAR sensor of an agricultural area near the village of Feltwell, U.K. The original SAR images include nine channels. Figure 2.1 shows the original nine channels of image data. The radial basis function (RBF) neural network is used for classification [27]. However, preprocessing is performed by the SOM that performs preclustering. The weights of the SOM are chosen as centers for RBF neurons. RBF has five output nodes for five pattern classes on the image data considered (SAR and ATM images in an agricultural area). Weights of the ‘‘n’’ most-frequently-fired neurons, when each class was presented to the SOM, were separately taken as the center for the 5 n RBF neurons. The weights between the hidden-layer neurons and the output-layer neurons were computed by a procedure for a generalized radial-basis function networks. Pixel classification using SOM alone (unsupervised) is 62.7% correct. Pixel classification using RBF alone (supervised) is 89.5% correct, at best. Pixel classification using both SOM and RBF is 95.2% correct. This result is better than the reported results on the same data set using RBF [28] at 90.5% correct or ICA-based features with nearest neighbor classification rule [29] at 86% correct.

2.4

Introduction to Statistical Signal Processing in Remote Sensing

Signal and image processing is needed in remote-sensing information processing to reduce the noise and interference with the data, to extract the desired signal and image component, or to derive useful measurements for input to the classifier. The classification problem is, in fact, very closely linked to signal and image processing [30,31]. Transform methods have been most popular in signal and image processing [32]. Though the popular wavelet transform method for remote sensing is treated elsewhere [33], we have included Chapter 1 in this volume, which presents the popular Hilbert–Huang transform; Chapter 24, which deals with the use of Hermite transform in the multispectral image fusion; and Chapter 10, Chapter 19, and Chapter 20, which make use of the methods of ICA. Although there is a constant need for better sensors, the signal-processing algorithm such as the one presented in Chapter 7 demonstrates well the role of Kalman filtering in weak signal detection. Time series modeling as used in remote sensing is the subject of Chapter 8 and Chapter 21. Chapter 6 of this volume makes use of the factor analysis.

Statistical Pattern Recognition and Signal Processing in Remote Sensing

29

th-c-hh

th-c-hv

th-c-vv

th-l-hh

th-l-hv

th-l-vv

th-p-hh

th-p-hv

th-p-vv

FIGURE 2.1 A nine-channel SAR image data set.

Signal and Image Processing for Remote Sensing

30

In spite of the numerous efforts with the transform methods, the basic method of PCA always has its useful role in remote sensing [34]. Signal decomposition and the use of high-order statistics can potentially offer new solutions to the remote-sensing information processing problems. Considering the nonlinear nature of the signal and image processing problems, it is necessary to point out the important roles of artificial neural networks in signal processing and classification, as presented in Chapter 3, Chapter 11, and Chapter 12. A lot of effort has been made in the last two decades to derive effective features in signal classification through signal processing. Such efforts include about two dozen mathematical features for use in exploration seismic pattern recognition [35], multi-dimensional attribute analysis that includes both physically and mathematically significant features or attributes for seismic interpretation [36], time domain, frequency domain, and time– frequency domain extracted features for transient signal analysis [37] and classification, and about a dozen features for active sonar classification [38]. Clearly, the feature extraction method for one type of signal cannot be transferred to other signals. To use a large number of features derived from signal processing is not desirable as there is significant information overlap among features and the resulting feature selection process can be tedious. It has not been verified that features extracted from the time–frequency representation can be more useful than the features from time–domain analysis and frequency domain alone. Ideally, a combination of a small set of physically significant and mathematically significant features should be used. Instead of looking for the optimal feature set, a small, but effective, feature set should be considered. It is doubtful that an optimal feature set for any given pattern recognition application can be developed in the near future in spite of many advances in signal and image processing.

2.5

Conclusions

Remote-sensing sensors have been able to deliver abundant information [39]. The many advances in statistical pattern recognition and signal processing can be very useful in remote-sensing information processing, either to supplement the capability of sensors or to effectively utilize the enormous amount of sensor data. The potentials and opportunities of using statistical pattern recognition and signal processing in remote sensing are thus unlimited.

References 1. Landgrebe, D.A., Signal Theory Methods in Multispectral Remote Sensing, John Wiley & Sons, New York, 2003. 2. Duin, R.P.W. and Tax, D.M.J., Statistical pattern recognition, in Handbook of Pattern Recognition and Computer Vision, 3rd ed., Chen, C.H. and Wang, P.S.P., Eds., World Scientific Publishing, Singapore, Jan. 2005, Chap. 1. 3. Chow, C.K., An optimum character recognition system using decision functions, IEEE Trans. Electron. Comput., EC6, 247–254, 1957. 4. Cover, T.M. and Hart, P.E., Nearest neighbor pattern classification, IEEE Trans. Inf. Theory, IT-13(1), 21–27, 1967.

Statistical Pattern Recognition and Signal Processing in Remote Sensing

31

5. Parzen, E., On estimation of a probability density function and mode, Annu. Math. Stat., 33(3), 1065–1076, 1962. 6. Lachenbruch, P.S. and Mickey, R.M., Estimation of error rates in discriminant analysis, Technometrics, 10, 1–11, 1968. 7. Kailath, T., The divergence and Bhattacharyya distance measures in signal selection, IEEE Trans. Commn. Technol., COM-15, 52–60, 1967. 8. Duda, R.C., Hart, P.E., and Stork, D.G., Pattern Classification, John Wiley & Sons, New York, 2003. 9. Ho, Y.C. and Kashyap, R.L., An algorithm for linear inequalities and its applications, IEEE Trans. Electron. Comput., Ece-14(5), 683–688, 1965. 10. Duda, R.C. and Hart, P.E., Pattern Classification and Scene Analysis, John Wiley & Sons, New York, 1972. 11. Bishop, C., Neural Networks for Pattern Recognition, Oxford University Press, Oxford, 1995. 12. Cover, T.M., Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition, IEEE Trans. Electron. Comput., EC-14, 326–334, 1965. 13. Fukanaga, K., Introduction to Statistical Pattern Recognition, 2nd ed., Academic Press, New York, 1990. 14. MacQueen, J., Some methods for classification and analysis of multivariate observations, Proc. Fifth Berkeley Symp. Probab. Stat., 281–297, 1997. 15. Kohonen, T., Self-Organization and Associative Memory, 3rd ed., Springer-Verlag, Heidelberg, 1988 (1st ed., 1980). 16. Vapnik, V.N., and Chervonenkis, A.Ya., On the uniform convergence of relative frequencies of events to their probabilities, Theor. Probab. Appl., 17, 264–280, 1971. 17. Vapnik, V.N., Statistical Learning Theory, John Wiley & Sons, New York, 1998. 18. Kittler, J., Hatef, M., Duin, R.P.W., and Matas, J., On combining classifiers, IEEE Trans. Pattern Anal. Mach. Intellig., 20(3), 226–239, 1998. 19. Sammon, J.W., Jr., A nonlinear mapping for data structure analysis, IEEE Trans. Comput., C-18, 401–409, 1969. 20. Wilkinson, G., Results and implications of a study of fifteen years of satellite image classification experiments, IEEE Trans. Geosci. Remote Sens., 43, 433–440, 2005. 21. Fu, K.S., Application of pattern recognition to remote sensing, in Applications of Pattern Recognition, Fu, K.S., Ed., CRC Press, Boca Raton, FL, 1982, Chap. 4. 22. Benediktsson, J.A., Statistical and neural network pattern recognition methods for remote sensing applications, in Handbook of Pattern Recognition and Computer Vision, 2nd ed., Chen, C.H. et al., Eds., World Scientific Publishing, Singapore, 1999, Chap. 3.2. 23. Graboswki, S., Jozwik, A., and Chen, C.H., Nearest neighbor decision rule for pixel classification in remote sensing, in Frontiers of Remote Sensing Information Processing, Chen, C.H., Ed., World Scientific Publishing, Singapore, 2003, Chap. 13. 24. Stevens, M.R. and Snorrason, M., Multisensor automatic target segmentation, in Frontiers of Remote Sensing Information Processing, Chen, C.H., Ed., World Scientific Publishing, Singapore, 2003, Chap. 15. 25. Richards, J., Is there a best classifier? Proc. SPIE, 5982, 2005. 26. Melgani, F. and Bruzzone, L., Classification of hyperspectral remote sensing images with support vector machines, IEEE Trans. Geosci. Remote Sens., 42, 1778–1790, 2004. 27. Chen, C.H. and Shrestha, B., Classification of multi-sensor remote sensing images using selforganizing feature maps and radial basis function networks, Proc. IGARSS, Hawaii, 2000. 28. Bruzzone, L. and Prieto, D., A technique of the selection of kernel-function parameters in RBF neural networks for classification of remote sensing images, IEEE Trans. Geosci. Remote Sens., 37(2), 1179–1184, 1999. 29. Zhang, X. and Chen, C.H., Independent component analysis by using joint cumulants and its application to remote sensing images, J. VLSI Signal Process. Systems, 37(2/3), 2004. 30. Chen, C.H., Ed., Digital Waveform Processing and Recognition, CRC Press, Boca Raton, FL, 1982. 31. Chen, C.H., Ed., Signal Processing Handbook, Marcel-Dekker, New York, 1988. 32. Chen, C.H., Transform methods in remote sensing information processing, in Frontiers of Remote Sensing Information Processing, Chen, C.H., Ed., World Scientific Publishing, Singapore, 2003, Chap. 2.

32

Signal and Image Processing for Remote Sensing

33. Chen, C.H., Eds., Wavelet analysis and applications, in Frontiers of Remote Sensing Information Processing, World Scientific Publishing, Singapore, 2003, Chaps. 7–9, pp. 139–224. 34. Lee, J.B., Woodyatt, A.S., and Berman, M., Enhancement of high spectral resolution remote sensing data by a noise-adjusted principal component transform, IEEE Trans. Geosci. Remote Sens., 28, 295–304, 1990. 35. Li, Y., Bian, Z., Yan, P., and Chang, T., Pattern recognition in geophysical signal processing and interpretation, in Handbook of Pattern Recognition and Computer Vision, 1st ed., Chen, C.H. et al., Eds., World Scientific Publishing, Singapore, 1993, Chap. 3.2. 36. Olson, R.G., Signal transient analysis and classification techniques, in Handbook of Pattern Recognition and Computer Vision, 1st ed., Chen, C.H., Ed., World Scientific Publishing, Singapore, 1993, Chap. 3.3. 37. Justice, J.H., Hawkins, D.J., and Wong, G., Multidimensional attribute analysis and pattern recognition for seismic interpretation, Pattern Recognition, 18(6), 391–407, 1985. 38. Chen, C.H., Neural networks in active sonar classification, Neural Networks in Ocean Environments, IEE conference, Washington; DC; 1991. 39. Richard, J., Remote sensing sensors: capabilities and information processing requirements, in Frontiers of Remote Sensing Information Processing, Chen, C.H., Ed., World Scientific Publishing, Singapore, 2003, Chap. 1.

3 A Universal Neural Network–Based Infrasound Event Classifier

Fredric M. Ham and Ranjan Acharyya

CONTENTS 3.1 Overview of Infrasound and Why Classify Infrasound Events?................................ 33 3.2 Neural Networks for Infrasound Classification ............................................................ 34 3.3 Details of the Approach ..................................................................................................... 35 3.3.1 Infrasound Data Collected for Training and Testing ....................................... 36 3.3.2 Radial Basis Function Neural Networks ............................................................ 36 3.4 Data Preprocessing ............................................................................................................. 40 3.4.1 Noise Filtering ......................................................................................................... 40 3.4.2 Feature Extraction Process .................................................................................... 40 3.4.3 Useful Definitions ................................................................................................... 44 3.4.4 Selection Process for the Optimal Number of Feature Vector Components ................................................................................................ 46 3.4.5 Optimal Output Threshold Values and 3-D ROC Curves .............................. 46 3.5 Simulation Results .............................................................................................................. 49 3.6 Conclusions .......................................................................................................................... 53 Acknowledgments ....................................................................................................................... 53 References ..................................................................................................................................... 53

3.1

Overview of I nfrasound and Why C las sify Inf rasound Events?

Infrasound is a longitudinal pressure wave [1– 4]. The characteristics of these waves are similar to audible acoustic waves but the frequency range is far below what the human ear can detect. The typical frequency range is from 0.01 to 10 Hz (Figure 3.1). Nature is an incredible creator of infrasonic signals that can emanate from sources such as volcano eruptions, earthquakes, severe weather, tsunamis, meteors (bolides), gravity waves, microbaroms (infrasound radiated from ocean waves), surf, mountain ranges (mountain associated waves), avalanches, and auroral waves to name a few. Infrasound can also result from man-made events such as mining blasts, the space shuttle, high-speed aircraft, artillery fire, rockets, vehicles, and nuclear events. Because of relatively low atmospheric absorption at low frequencies, infrasound waves can travel long distances in the Earth’s atmosphere and can be detected with sensitive ground-based sensors. An integral part of the comprehensive nuclear test ban treaty (CTBT) international monitoring system (IMS) is an infrasound network system [3]. The goal is to have 60 33

Signal and Image Processing for Remote Sensing

34 0.01

0.03

1 Megaton yield

0.1 0.2

1.0

Hz

Impulsive events

1 Kiloton yield

Volcano events

10.0

Microbaroms

Gravity waves Bolide

Mountain associated waves FIGURE 3.1 Infrasound spectrum.

0.01

0.03

0.1

0.2

1.0

10.0 Hz

infrasound arrays operational worldwide over the next several years. The main objective of the infrasound monitoring system is the detection and verification, localization, and classification of nuclear explosions as well as other infrasonic signals-of-interest (SOI). Detection refers to the problem of detecting an SOI in the presence of all other unwanted sources and noises. Localization deals with finding the origin of a source, and classification deals with the discrimination of different infrasound events of interest. This chapter concentrates on the classification part only.

3.2

Neural Networks for Infrasound Classification

Humans excel at the task of classifying patterns. We all perform this task on a daily basis. Do we wear the checkered or the striped shirt today? For example, we will probably select from a group of checkered shirts versus a group of striped shirts. The grouping process is carried out (probably at a near subconscious level) by our ability to discriminate among all shirts in our closet and we group the striped ones in the striped class and the checkered ones in the checkered class (that is, without physically moving them around in the closet, only in our minds). However, if the closet is dimly lit, this creates a potential problem and diminishes our ability to make the right selection (that is, we are working in a ‘‘noisy’’ environment). In the case of using an artificial neural network for classification of patterns (or various ‘‘events’’) the same problem exists with noise. Noise is everywhere. In general, a common problem associated with event classification (or detection and localization for that matter) is environmental noise. In the infrasound problem, many times the distance between the source and the sensors is relatively large (as opposed to region infrasonic phenomena). Increases in the distance between sources and sensors heighten the environmental dependence of the signals. For example, the signal of an infrasonic event that takes place near an ocean may have significantly different characteristics as compared to the same event that occurs in a desert. A major contributor of noise for the signal near an ocean is microbaroms. As mentioned above, microbaroms are generated in the air from large ocean waves. One important characteristic of neural networks is their noise rejection capability [5]. This, and several other attributes, makes them highly desirable to use as classifiers.

A Universal Neural Network–Based Infrasound Event Classifier

3.3

35

Details of the Approach

Our approach of classifying infrasound events is based on a parallel bank neural network structure [6–10]. The basic architecture is shown in Figure 3.2. There are several reasons for using such an architecture; however, one very important advantage of dedicating one module to perform the classification of one event class is that the architecture is fault tolerant (i.e., if one module fails, the rest of the individual classifiers will continue to function). However, the overall performance of the classifier is enhanced when the parallel bank neural network classifier (PBNNC) architecture is used. Individual banks (or modules) within the classifier architecture are radial basis function neural networks (RBF NNs) [5]. Also, each classifier has its own dedicated preprocessor. Customized feature vectors are computed optimally for each classifier and are based on cepstral coefficients and a subset of their associated derivatives (differences) [11]. This will be explained in detail later. The different neural modules are trained to classify one and only one class; however, for the requisite module responsible for one of the classes, it is also trained not to recognize all other classes (negative reinforcement). During the training process, the output is set to a ‘‘1’’ for a correct class and a ‘‘0’’ for all the other signals associated with all the other classes. When the training process is complete the final output thresholds will be set to an optimal value based on a three-dimensional receiver operating characteristic (3-D ROC) curve for each one of the neural modules (see Figure 3.2). Optimum threshold set by ROC curve Pre-processor 1

Infrasound class 1 neural network

1 0

Optimum threshold set by ROC curve Pre-processor 2

Infrasound class 2 neural network

1 0

Optimum threshold set by ROC curve Pre-processor 3

Infrasound class 3 neural network

1 0

Optimum threshold set by ROC curve

Infrasound signal Pre-processor 4

Infrasound class 4 neural network

1 0

Optimum threshold set by ROC curve Pre-processor 5

Infrasound class 5 neural network

1 0

Optimum threshold set by ROC curve Pre-processor 6

Infrasound class 6 neural network

FIGURE 3.2 Basic parallel bank neural network classifier (PBNNC) architecture.

1 0

Signal and Image Processing for Remote Sensing

36 3.3.1

Infrasound Data Collected for Training and Testing

The data used for training and testing the individual networks are obtained from multiple infrasound arrays located in different geographical regions with different geometries. The six infrasound classes used in this study are shown in Table 3.1, and the various array geometries are shown in Figure 3.3(a) through Figure 3.3(e) [12,13]. Table 3.2 shows the various classes, along with the array numbers where the data were collected, and the associated sampling frequencies. 3.3.2

Radial Basis Function Neural Networks

As previously mentioned, each of the neural network modules in Figure 3.2 is an RBF NN. A brief overview of RBF NNs will be given here. This is not meant to be an exhaustive discourse on the subject, but only an introduction to the subject. More details can be found in Refs. [5,14]. Earlier work on the RBF NN was carried out for handling multivariate interpolation problems [15,16]. However, more recently they have been used for probability density estimation [17–19] and approximations of smooth multivariate functions [20]. In principle, the RBF NN makes adjustments of its weights so that the error between the actual and the desired responses is minimized relative to an optimization criterion through a defined learning algorithm [5]. Once trained, the network performs the interpolation in the output vector space, thus the generalization property. Radial basis functions are one type of positive-definite kernels that are extensively used for multivariate interpolation and approximation. Radial basis functions can be used for problems of any dimension, and the smoothness of the interpolants can be achieved to any desirable extent. Moreover, the structures of the interpolants are very simple. However, there are several challenges that go along with the aforementioned attributes of RBF NNs. For example, many times an ill-conditioned linear system must be solved, and the complexity of both time and space increases with the number of interpolation points. But these types of problems can be overcome. The interpolation problem may be formulated as follows. Assume M distinct data points X ¼ {x1, . . . , xM}. Also assume the data set is bounded in a region V (for a specific class). Each observed data point x 2 R u (u corresponds to the dimension of the input space) may correspond to some function of x. Mathematically, the interpolation problem may be stated as follows. Given a set of M points, i.e., {xi 2 R uji ¼ 1, 2, . . . , M} and a corresponding set of M real numbers {di 2 Rji ¼ 1, 2, . . . , M} (desired outputs or the targets), find a function F:R M ! R that satisfies the interpolation condition F(xi ) ¼ di , i ¼ 1, 2, . . . , M

(3:1)

TABLE 3.1 Infrasound Classes Used for Training and Testing Class Number 1 2 3 4 5 6

Event

No. SOI (n ¼ 574)

No. SOI Used for Training (n ¼ 351)

No. SOI Used for Testing (n ¼ 223)

Vehicle Artillery fire (ARTY) Jet Missile Rocket Shuttle

8 264 12 24 70 196

4 132 8 16 45 146

4 132 4 8 25 50

A Universal Neural Network–Based Infrasound Event Classifier (a)

37

(b)

YDIS, m

YDIS, m

30

30

Sensor 3 (3.1, 19.9)

20

Sensor 5 (−19.8, 0.0)

10

Sensor 1 (0.0, 0.0) −40

−30

−20

−10

10

20

−40

30 XDIS, m

−30

10

Sensor 1 (0.0, 0.0)

−20

−10

−10

Sensor 2 (−18.6, −7.5)

Sensor 3 (0.7, 20.1)

20

10 −10

Sensor 4 (15.3, −12.7)

−20

20

30

Sensor 4 XDIS, m (19.8, −0.1)

−20

Sensor 2 (−1.3, −19.9) −30

−30

Array BP1

Array BP2

YDIS, m

YDIS, m

(c)

(d)

30

30

20

20

−40

−30

−20

Sensor 1 (0.0, 0.0)

Sensor 1 (0.0, 0.0) −10

10

20

30 40 XDIS, m

−10

Sensor 4 (45.0, −8.0)

Sensor 3 (28.0, −21.0)

−30

YDIS, m 30

Sensor 3 (1.1, 20.0)

10

Sensor 1 (0.0, 0.0) −40

−30

−20

−10

10

20

30 XDIS, m

−10

−20

Sensor 2 (−12.3, −15.8)

Sensor 4 (14.1, −14.1)

−30

Array K8203

FIGURE 3.3 Five different array geometries.

−30

−20

Sensor 2 (−20.1, 0.0)

−10

10 −10

−30

Array K8202

Array K8201

20

−40

−20

−20

(e)

Sensor 4 (20.3, 0.5)

10

10

Sensor 2 (−22.0, 10.0)

Sensor 3 (0.0, 20.2)

20

30 XDIS, m

Signal and Image Processing for Remote Sensing

38 TABLE 3.2

Array Numbers Associated with the Event Classes and the Sampling Frequencies Used to Collect the Data Class Number 1 2 3 4 5 6 a

Event

Array

Sampling Frequency, Hz

Vehicle Artillery fire (ARTY) (K8201: Sites 1 and 2) Jet Missile Rocket Shuttle

K8201 K8201; K8203

100 (K8201: Sites 1 and 100; Sites 2 and 50); 50 50 50; 50 100; 100 100; 50

K8201 K8201; K8203 BP1; BP2 BP2; BP103a

Array geometry not available.

Thus, all the points must pass through the interpolating surface. A radial basis function may be a special interpolating function of the form F(x) ¼

M X

w i f i ( kx xi k2 )

(3:2)

i¼1

where f(.) is known as the radial basis function and k.k2 denotes the Euclidean norm. In general, the data points xi are the centers of the radial basis functions and are frequently written as ci. One of the problems encountered when attempting to fit a function to data points is over-fitting of the data, that is, the value of M is too large. However, generally speaking, this is less a problem the RBF NN that it is with, for example, a multi-layer perceptron trained by backpropagation [5]. The RBF NN is attempting to construct the hyperspace for a particular problem when given a limited number of data points. Let us take another point of view concerning how an RBF NN performs its construction of a hypersurface. Regularization theory [5,14] is applied to the construction of the hypersurface. A geometrical explanation follows. Consider a set of input data obtained from several events from a single class. The input data may be from temporal signals or defined features obtained from these signals using an appropriate transformation. The input data would be transformed by a nonlinear function in the hidden layer of the RBF NN. Each event would then correspond to a point in the feature space. Figure 3.4 depicts a two-dimensional (2-D) feature set, that is, the dimension of the output of the hidden layer in the RBF NN is two. In Figure 3.4, ‘‘(a)’’, ‘‘(b)’’, and ‘‘(c)’’ correspond to three separate events. The purpose here is to construct a surface (shown by the dotted line in Figure 3.4) such that the dotted region encompasses events of the same class. If the RBF network is to classify four different classes, there must be four different regions (four dotted contours), one for each class. Ideally, each of these regions should be separate with no overlap. However, because there is always a limited amount of observed data, perfect reconstruction of the hyperspace is not possible and it is inevitable that overlap will occur. To overcome this problem it is necessary to incorporate global information from V (i.e., the class space) in approximating the unknown hyperspace. One choice is to introduce a smoothness constraint on the targets. Mathematical details will not be given here, but for an in-depth development see Refs. [5,14]. Let us now turn our attention to the actual RBF NN architecture and how the network is trained. In its basic form, the RBF NN has three layers: an input layer, one hidden

A Universal Neural Network–Based Infrasound Event Classifier

39

Feature 2

(a)

(b)

(c)

FIGURE 3.4 Example of a two-dimensional feature set.

Feature 1

layer, and one output layer. Referring to Figure 3.5, the source nodes (or the input components) make up the input layer. The hidden layer performs a nonlinear transformation (i.e., the radial basis functions residing in the hidden layer perform this transformation) of the input to the network and is generally of a higher dimension than the input. This nonlinear transformation of the input in the hidden layer may be viewed as a basis for the construction of the input in the transformed space. Thus, the term radial basis function. In Figure 3.5, the output of the RBF NN (i.e., at the output layer) is calculated according to yi ¼ fi (x) ¼

N X

wik fk (x,ck ) ¼

k¼1

N X

wik fk (kx ck k2 ), i ¼ 1, 2, . . . , m (no: outputs)

(3:3)

k¼1

where x 2 R u1 is the input vector, fk(.) is a (RBF) function that maps R þ (set of all positive real numbers) to R (field of real numbers), k.k2 denotes the Euclidean norm, wik are the weights in the output layer, N is the number of neurons in the hidden layer, and ck 2 R u1 are the RBF centers that are selected based on the input vector space. The Euclidean distance between the center of each neuron in the hidden layer and the input to the network is computed. The output of the neuron in a hidden layer is a nonlinear x1

W T ∈ ℜm × N f1

x2

x3

Σ

y1

Σ

ym

f2

fN

xu Input layer

Hidden layer

Output layer

FIGURE 3.5 RBF NN architecture.

Signal and Image Processing for Remote Sensing

40

function of this distance, and the output of the network is computed as a weighted sum of the hidden layer outputs. The functional form of the radial basis function, fk(.), can be any of the following:

.

Linear function: f(x) ¼ x Cubic approximation: f(x) ¼ x3

.

Thin-plate-spline function: f(x) ¼ x2 ln(x)

.

Gaussian function: f(x) ¼ exp(x2/s2) pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ Multi-quadratic function: f(x) ¼ x2 þ s2

.

. .

pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ Inverse multi-quadratic function: f(x) ¼ 1=( x2 þ s2 )

The parameter s controls the ‘‘width’’ of the RBF and is commonly referred to as the spread parameter. In many practical applications the Gaussian RBF is used. The centers, ck, of the Gaussian functions are points used to perform a sampling of the input vector space. In general, the centers form a subset of the input data.

3. 4 3.4.1

D ata P reproc essi ng Noise Filtering

Microbaroms, as previously defined, are a persistently present source of noise that resides in most collected infrasound signals [21–23]. Microbaroms are a class of infrasonic signals characterized by narrow-band, nearly sinusoidal waveforms, with a period between 6 and 8 sec. These signals can be generated by marine storms through a nonlinear interaction of surface waves [24]. The frequency content of the microbaroms often coincides with that of small-yield nuclear explosions. This could be bothersome in many applications; however, simple band-pass filtering can alleviate the problem in many cases. Therefore, a band-pass filter with a pass band between 1 and 49 Hz (for signals sampled at 100 Hz) is used here to eliminate the effects of the microbaroms. Figure 3.6 shows how band-pass filtering can be used to eliminate the microbaroms problem. 3.4.2

Feature Extraction Process

Depicted in each of the six graphs in Figure 3.7 is a collection of eight signals from each class, that is, yij(t) for i ¼ 1, 2, . . . , 6 (classes) and j ¼ 1, 2, . . . , 8 (number of signals) (see Table 3.1 for total number of signals in each class). A feature extraction process is desired that will capture the salient features of the signals in each class and at the same time be invariant relative to the array geometry, the geographical location of the array, the sampling frequency, and the length of the time window. The overall performance of the classifier is contingent on the data that is used to train the neural network in each of the six modules shown in Figure 3.2. Moreover, the neural network’s ability to distinguish between the various events (presented the neural networks as feature vectors) is the distinctiveness of the features between the classes. However, within each class it is desirable to have the feature vectors as similar to each other as possible. There are two major questions to be answered: (1) What will cause the signals in one class to have markedly different characteristics? (2) What can be done to minimize these

A Universal Neural Network–Based Infrasound Event Classifier After filtering 1

1

0.5

Amplitude

Amplitude

Before filtering 1.5

0.5

0

400

–1

1000

500 Time samples

500 Time samples

1000

8 6

300 200 100 0 –50

0

After filtering

Before filtering

Magnitude

Magnitude

0.0 –0.5

0 –0.5

41

50

0 Frequency (Hz)

4 2 0 –50

0 Frequency (Hz)

50

FIGURE 3.6 Results of band-pass filtering to eliminate the effects of microbaroms (an artillery signal).

differences and achieve uniformity within a class and distinctively different feature vector characteristics between classes? The answer to the first question is quite simple—noise. This can be noise associated with the sensors, the data acquisition equipment, or other unwanted signals that are not of interest. The answer to the second question is also quite simple (once you know the answer)—using a feature extraction process based on computed cepstral coefficients and a subset of their associated derivatives (differences) [10,11,25 –28]. As mentioned in Section 3.3, each classifier has its own dedicated preprocessor (see Figure 3.2). Customized feature vectors are computed optimally for each classifier (or neural module) and are based on the aforementioned cepstral coefficients and a subset of their associated derivatives (or differences). The preprocessing procedure is as follows. Each time-domain signal is first normalized and then its mean value is computed and removed. Next, the power spectral density (PSD) is calculated for each signal, which is a mixture of the desired component and possibly other unwanted signals and noise. Therefore, when the PSDs are computed for a set of signals in a defined class there will be spectral components associated with noise and other unwanted signals that need to be suppressed. This can be systematically accomplished by first computing the average PSD (i.e., PSDavg) over the suite of PSDs for a particular class. The spectral components are defined as mi for i ¼ 1, 2, . . . for PSDavg. The maximum spectral component, mmax, of PSDavg is then determined. This is considered the dominant spectral component within a particular class and its value is used to suppress selected components in the resident PSDs for any particular class according to the following: if mi > «1 mmax (typically «1 ¼ 0:001) then mi else «2

mi mi (typically «2 ¼ 0:00001)

Signal and Image Processing for Remote Sensing

42 (a) Raw time-domain 8 signals for vehicle

(b) Raw time-domain 8 signals for missile

0.8

2 1.5

0.6

1 0.5

Amplitude

Amplitude

0.4 0.2 0

Vehicle class

–0.2

0 –0.5 –1

Missile class

–1.5 –2

–0.4 –2.5 –0.6

0

100

200

300

400

500

600

700

–3

Time (sec) (c) Raw time-domain 8 signals for artillery 0.7

1.8 1.6

0

1000

2000

3000

4000

5000

6000

7000

8000

1400

1600

Time (sec) (d) Raw time-domain 8 signals for rocket

0.6

1.4 0.5 Amplitude

Amplitude

1.2 1

Artillery class

0.8

0.4 0.3

0.6

Rocket class

0.2 0.4 0.1

0.2 0

0 0

50

100

150

200

250

300

350

0

1.4

600

800

1000

1200

Shuttle class

0.9

Jet class

1

0.8

0.8

0.7

Amplitude

Amplitude

400

1

1.2

0.6 0.4

0.6 0.5

0.2

0.4

0

0.3

–0.2

0.2

–0.4

200

Time (sec) (f) Raw time-domain 8 signals for shuttle

Time (sec) (e) Raw time-domain 8 signals for jet

0

500

1000 1500 2000 2500 3000

3500 4000 4500 0.1 0

Time (sec)

100

200

300

400

500

600

700

800

900 1000

Time (sec)

FIGURE 3.7 (See color insert following page 302.) Infrasound signals for six classes.

To some extent, this will minimize the effects of any unwanted components that may reside in the signals and at the same time minimize the effects of noise. However, another step can be taken to further minimize the effects of any unwanted signals and noise that may reside in the data. This is based on a minimum variance criterion applied to the spectral components of the PSDs in a particular class after the previously described step is completed. The second step is carried out by taking the first 90% of the spectral components that are rank-ordered according to the smallest variance. The rest of the components

A Universal Neural Network–Based Infrasound Event Classifier

43

in the power spectral densities within a particular class are set to a small value, that is, «3 (typically 0.00001). Therefore, the number of spectral components greater than «3 will dictate the number of components in the cepstral domain (i.e., the number of cepstral coefficients and associated differences). Depending on the class, the number of coefficients and differences will vary. For example, in the simulations that were run, the largest number of components was 2401 (artillery class) and the smallest number was 543 (vehicle class). Next, the mel-frequency scaling step is carried out with defined values for a and b [10], then the inverse discrete cosine transform is taken and the derivatives (differences) are computed. From this set of computed cepstral coefficients and differences, it is desired to select those components that will constitute a feature vector that is consistent within a particular class. That is, there is minimal variation among similar components across the suite of feature vectors. So the approach taken here is to think in terms of minimum variance of these similar components within the feature set. Recall, the time-domain infrasound signals are assumed to be band-pass filtered to remove any effects of microbaroms as described previously. For each discrete-time infrasound signal, y(k), where k is the discrete time index (an integer), the specific preprocessing steps are (dropping the time dependence k): (1) Normalize (i.e., divide each sample in the signal y(k) by the absolute value of the maximum amplitude, jymaxj, and also divide by the square root of the computed variance of the signal, sy2, and then remove the mean: y

y={jymax j,sy }

(3:4)

y

y mean(y)

(3:5)

(2) Compute the PSD, Syy(kv), of the signal y: Syy (kv ) ¼

1 X

Ryy (t)ejkv t

(3:6)

t¼0

where Ryy() is the autocorrelation of the infrasound signal y. (3) Find the average of the entire set of PSDs in the class, i.e., PSDavg (4) Retain only those spectral components whose contributions will maximize the overall performance of the global classifier: if mi > «1 mmax (typically «1 ¼ 0:001) then mi else «2

mi mi (typically «2 ¼ 0:00001)

(5) Compute variances of the components selected in Step (4). Then take the first 90% of the spectral components that are rank-ordered according to the smallest variance. Set the remaining components to a small value, i.e., «3 (typically 0.00001). (6) Apply mel-frequency scaling to Syy(kv): Smel (kv ) ¼ a loge ½bSyy (kv ) where a ¼ 11.25, b ¼ 0.03.

(3:7)

Signal and Image Processing for Remote Sensing

44 (7)

Take the inverse discrete cosine transform: xmel (n) ¼

1 X 1 N Sm (kv ) cos(2pkv n=N ) n k ¼0

for n ¼ 0, 1, 2, . . . , N 1

(3:8)

v

(8)

Take the consecutive differences of the sequence xmel (n) to obtain x0mel (n).

(9)

Concatenate the sequence of differences, x0mel(n), with the cepstral coefficient sequence, xmel (n), to form the augmented sequence: xamel ¼ [x0mel (i)jxmel (j)]

(3:9)

where i and j are determined experimentally. As mentioned previously, i ¼ 400 and j ¼ 600. a (10) Take the absolute value of the elements in the sequence xmel yielding:

xamel,abs ¼ jxamel j

(3:10)

(11) Take the loge of xamel,abs from the previous step to give: xamel,abs,log ¼ loge ½xamel, abs

(3:11)

Applying this 11-step feature extraction process to the infrasound signals in the six different classes results in the feature vectors shown in Figure 3.8. The length of each feature vector is 34. This will be explained in the next section. If these sets of feature vectors are compared to their time-domain signal counterparts (see Figure 3.7), it is obvious that the feature extraction process produces feature vectors that are much more consistent than the time-domain signals. Moreover, comparing the feature vectors between classes reveals that the different sets of feature vectors are markedly distinct. This should result in improved classification performance. 3.4.3

Useful Definitions

Before we go on, let us define some useful quantities that apply to the assessment of performance for classifiers. The confusion matrix [29] for a two-class classifier is shown in Table 3.3. In Table 3.3 we have the following: p: number of correct predictions that an occurrence is positive q: number of incorrect predictions that an occurrence is positive r: number of incorrect of predictions that an occurrence is negative s: number of correct predictions that an occurrence is negative With this, the correct classification rate (CCR) is defined as No: correct predictions No: classifications No: predictions p þ s No: multiple classifications ¼ pþqþrþs

CCR ¼

(3:12)

A Universal Neural Network–Based Infrasound Event Classifier (a) Feature set for vehicle

10

9

9

8

8 Amplitude

Amplitude

10

7 6

4

4 10

25 15 20 Feature number

3

30

9

8

8

Amplitude

9

7 6

4

4 10

15 20 25 Feature number

5

(e) Feature set for jet 9

8

8 Amplitude

9 Amplitude

10

15 20 25 Feature number

30

(f) Feature set for shuttle 10

7 6

7 6

5

5

4

4 5

30

3

30

10

3

15 20 25 Feature number

6 5

5

10

7

5

3

5

(d) Feature set for rocket 10

(c) Feature set for artillery fire 10

Amplitude

6 5

5

(b) Feature set for missile

7

5

3

45

10 15 20 25 Feature number

30

3 5

10 15 20 25 Feature number

30

FIGURE 3.8 (See color insert following page 302.) Infrasound signals for six class different classes.

Multiple classifications refer to more than one of the neural modules showing a ‘‘positive’’ at the output of the RBF NN indicating that the input to the global classifier belongs to more than one class (whether this is true or not). So there could be double, triple, quadruple, etc., classifications for one event. The accuracy (ACC) is given by ACC ¼

No: correct predictions pþs ¼ No predictions pþqþrþs

(3:13)

Signal and Image Processing for Remote Sensing

46 TABLE 3.3

Confusion Matrix for a Two-Class Classifier Predicted Value Actual Value Positive Negative

Positive p r

Negative q s

As seen from Equation 3.12 and Equation 3.13, if multiple classifications occur, the CCR is a more conservative performance measure than the ACC. However, if no multiple classifications occur, the CCR ¼ ACC. The true positive (TP) rate is the proportion of positive cases that are correctly identified. This is computed using p TP ¼ (3:14) pþq The false positive (FP) rate is the proportion of negative cases that are incorrectly classified as positive occurrences. This is computed using r (3:15) FP ¼ rþs 3.4.4

Selection Process for the Optimal Number of Feature Vector Components

From the set of computed cepstral coefficients and differences generated using the feature extraction process given above, an optimal subset of these is desired that will constitute the feature vectors used to train and test the PBNNC shown in Figure 3.2. The optimal subset (i.e., the optimal feature vector length) is determined by taking a minimum variance approach. Specifically, a 3-D graph is generated that plots the performance; that is, CCR versus the RBF NN spread parameter and the feature vector number (see Figure 3.9). From this graph, mean values and variances are computed across the range of spread parameters for each of the defined number of components in the feature vector. The selection criterion is defined as simultaneously maximizing the mean and at the same time minimizing the variance. Maximization of the mean ensures maximum performance; that is, maximizing the CCR and at the same time minimizing the variance to minimize variation in the feature set within each of the classes. The output threshold at each of the neural modules (i.e., the output of the single output neuron of each RBF NN) is set optimally according to a 3-D ROC curve. This will be explained next. Figure 3.10 shows the two plots used to determine the maximum mean and the minimum variance. The table insert between the two graphs shows that even though the mean value for 40 elements in the feature vector is (slightly) larger than that for 34 elements, the variance for 40 is nearly three times that for 34 elements. Therefore, a length of 34 elements for the feature vectors is the best choice. 3.4.5

Optimal Output Threshold Values and 3-D ROC Curves

At the output of the RBF NN for each of the six neural modules, there is a single output neuron with hard-limiting binary values used during the training process (see Figure 3.2). After training, to determine whether a particular SOI belongs to one of the six classes, the

A Universal Neural Network–Based Infrasound Event Classifier

47

Performance (CCR) 3-D plot using ROC curve

100

CCR (%)

80 60 40 20 0 2.5 2.0 1.5

Sp

rea

1.0

dp

ara

Co m and pute m Co var e mp ian an Com and ute m ce p var e Com and ute m ian an e var p ce Com ian an and ute m 50 put c e e var e and ian an 40 var mean ce ian 30 ce r

0.5

me

ter

20

0.1

10

60

be

re num

Featu

FIGURE 3.9 (See color insert following page 302.) Performance plot used to determine the optimal number components in the feature vector. Ill conditioning occurs for the feature number less than 10, and for the feature number greater than 60, the CCR dramatically declines.

threshold value of the output neurons is optimally set according to an ROC curve [30–32] for that individual neural module (i.e., one particular class). Before an explanation of the 3-D ROC curve is given, let us first review 2-D ROC curves and see how they are used to optimally set threshold values. An ROC curve is a plot of the TP rate versus the FP rate, or the sensitivity versus (1 – specificity); a sample ROC curve is shown in Figure 3.11. The optimal threshold value corresponds to a point nearest the ideal point (0, 1) on the graph. The point (0, 1) is considered ideal because in this case there would be no false positives and only true positives. However, because of noise and other undesirable effects in the data, the point closest to the (0, 1) point (i.e., the minimum Euclidean distance) is the best that we can do. This will then dictate the optimal threshold value to be used at the output of the RBF NN. Since there are six classifiers, that is, six neural modules in the global classifier, six ROC curves must be generated. However, using 2-D ROC curves to set the thresholds at the outputs of the six RBF NN classifiers will not result in optimal thresholds. This is because misclassifications are not taken into account when setting the threshold for a particular neural module that is responsible for classifying a particular set of infrasound signals. Recall that one neural module is associated with one infrasonic class, and each neural module acts as its own classifier. Therefore, it is necessary to account for the misclassifications that can occur and this can be accomplished by adding a third dimension to the ROC curve. When the misclassifications are taken into account the (0, 1, 0) point now becomes the optimal point, and the smallest Euclidean distance to this point is directly related to

Signal and Image Processing for Remote Sensing

48 70

Performance mean

65

60 34 Features

55

50

45

40 10

15

20

25

Feature number

30 35 40 Feature number

45

Mean

Variance

34

66.9596

47.9553

40

69.2735

50

55

60

50

55

60

122.834

500 450 400

Performance variance

350 300 250 34 Features

200 150 100 50 0 10

15

20

25

30 35 40 Feature number

45

FIGURE 3.10 Performance means and performance variances versus feature number used to determine the optimal length of the feature vector.

A Universal Neural Network–Based Infrasound Event Classifier

49

1 0.9

ROC curve Minimum Euclidean distance

TP (sensitivity)

0.8 0.7 0.6 0.5 0.4

0

0.1

0.2

0.3

0.4 0.5 0.6 0.7 FP (1 − specificity)

0.8

0.9

1

FIGURE 3.11 Example ROC curve.

the optimal threshold value for each neural module. Figure 3.12 shows the six 3-D ROC curves associated with the classifiers.

3.5

Simul ation R esults

The four basic parameters that are to be optimized in the process of training the neural network classifier (i.e., the bank of six RBF NNs) are the RBF NN spread parameters, the output thresholds of each neural module, the combination of 34 components in the feature vectors for each class (note again in Figure 3.2, each neural module has its own custom preprocessor) and of course the resulting weights of each RBF NN. The MATLAB neural networks toolbox was used to design the six RBF NNs [33]. Table 3.1 shows the specific classes and the associated number of signals used to train and test the RBF NNs. Of the 574 infrasound signals, 351 were used for training the remaining 223 were used for testing. The criterion used to divide the data between the training and testing sets was to maintain independence. Hence, the four array signals from any one event are always kept together, either in the training set or the test set. After the optimal number components for each feature vector was determined, i.e., 34 elements, and the optimal combination of the 34 components for each preprocessor, the optimal RBF spread parameters are determined along with the optimal threshold value (the six graphs in Figure 3.12 were used for this purpose). For both the RBF spread parameters and the output thresholds, the selection criterion is based on maximizing the CCR of the local network and the overall (global) classifier CCR. The RBF spread parameter and the output threshold for each neural module was determined one by one by fixing the spread parameter, i.e., s, for all other neural modules to 0.3, and holding the threshold value at 0.5. Once the first neural module’s spread

Signal and Image Processing for Remote Sensing

50

(b) ROC 3-D plot for missile

(a) ROC 3-D plot for vehicle

5 Misclassification

Misclassification

4 3.5

Vehicle

3 2.5 2 1.5 1 0.8 Tru e

0.6 pos itive 0.4

0.2 0 (c) ROC 3-D plot for artillery

0.2

0.4

0.6

0.8

2 1

Tru

0.5 osi tive

ep

0.2

0 0 (d) ROC 3-D plot for rocket

False positive

0.4

0.6

1

0.8

False positive

Misclassification

5

4 3

Artillery

2 1 0 1 0.5 ep osit ive

Tru

0

0

0.2

0.4

0.6

0.8

4

Rocket

3 2 1 0 1

1

Tru

e p 0.5 osit ive

False positive

(e) ROC 3-D plot for jet

0

0

0.2

0.4

0.6

0.8

1

False positive

(f) ROC 3-D plot for shuttle

5

5

4 3

Misclassification

Misclassification

Missile

3

0 1

1

5 Misclassification

4

Jet

2 1 0 1 0.8 Tru 0.6 ep osit 0.4 ive

0.2 0

0.2

0.4

0.6

0.8

False positive

1

4

Shuttle

3 2 1 0 1

Tru

e p 0.5 osit ive

0

0

0.2

0.4

0.6

0.8

1

False positive

FIGURE 3.12 3-D ROC curves for the six classes.

parameter and threshold is determined, then the spread parameter and output threshold of the second neural module is computed while holding all other neural modules’ (except the first one) spread parameters and output thresholds fixed at 0.3 and 0.5, respectively. Table 3.4 gives the final values of the spread parameter and the output threshold for the global classifier. Figure 3.13 shows the classifier architecture with the final values indicated for the RBF NN spread parameters and the output thresholds. Table 3.5 shows the confusion matrix for the six-classifier. Concentrating on the 6 6 portion of the matrix for each of the defined classes, the diagonal elements correspond to

A Universal Neural Network–Based Infrasound Event Classifier

51

TABLE 3.4 Spread Parameter and Threshold of Six-Class Classifier

Vehicle Artillery Jet Missile Rocket Shuttle

Spread Parameter

Threshold Value

True Positive

False Positive

0.2 2.2 0.3 1.8 0.2 0.3

0.3144 0.6770 0.6921 0.9221 0.4446 0.6170

0.5 0.9621 0.5 1 0.9600 0.9600

0 0.0330 0 0 0.0202 0.0289

the correct predictions. The trace of this 66 matrix divided by the total number of signals tested (i.e., 223) gives the accuracy of the global classifier. The formula for the accuracy is given in Equation 3.13, and here ACC ¼ 94.6%. The off-diagonal elements indicate the misclassifications that occurred and those in parentheses indicate double classifications (i.e., the actual class was identified correctly, but there was another one of the output thresholds for another class that was exceeded). The off-diagonal element that is in square

s1 = 0.2

Pre-processor 1

Infrasound class 1 neural network

Optimum threshold set by ROC curve (0.3144) 1 0

s2 = 2.2

Pre-processor 2

Infrasound class 2 neural network

1 0

Optimum threshold set by ROC curve (0.6921)

s3 = 0.3

Pre-processor 3 Infrasound signal

Infrasound class 3 neural network

1 0

Optimum threshold set by ROC curve (0.9221)

s4 = 1.8

Pre-processor 4

Infrasound class 4 neural network

1 0

Optimum threshold set by ROC curve (0.4446)

s5 = 0.2

Pre-processor 5

Infrasound class 5 neural network

Optimum threshold set by ROC curve (0.6770)

1 0

s6 = 0.3

Pre-processor 6

Infrasound class 6 neural network

Optimum threshold set by ROC curve (0.6170) 1 0

FIGURE 3.13 Parallel bank neural network classifier architecture with the final values for the RBF spread parameters and the output threshold values.

Signal and Image Processing for Remote Sensing

52 TABLE 3.5 Confusion Matrix for the Six-Class Classifier

Predicted Value

Actual Value

Vehicle Artillery Jet Missile Rocket Shuttle

Vehicle

Artillery

Jet

Missile

Rocket

Shuttle

Unclassified

Total (223)

2 0 0 0 0 0

(1) 127 0 0 0 (1)[1]

0 0 2 0 0 0

0 0 0 8 0 0

0 0 0 0 24 1(3)

0 0 0 0 (5) 48

2 5 2 0 1 1

4 132 4 8 25 50

brackets is a double misclassification, that is, this event is misclassified along with another misclassified event (this is a shuttle event that is misclassified as both a ‘‘rocket’’ event as well as an ‘‘artillery’’ event). Table 3.6 shows the final global classifier results giving both the CCR (see Equation 3.12) and the ACC (see Equation 3.13). Simulations were also run using ‘‘bi-polar’’ outputs instead of binary outputs. For the case of bi-polar outputs, the output is bound between 1 and þ1 instead of 0 and 1. As can be seen from the table, the binary case yielded the best results. Finally, Table 3.7 shows the results for the case where the threshold levels on the outputs of the individual RBF NNs are ignored and only the output with this largest value is taken as the ‘‘winner,’’ that is, ‘‘winner-takes-all’’; this is considered to be the class that the input SOI belongs to. It should be noted that even though the CCR shows a higher level of performance for the winner-takes-all approach, this is probably not a viable method for classification. The reason being that if there were truly multiple events occurring simultaneously, they would never be indicated as such using this approach.

TABLE 3.6 Six-Class Classification Result Using a Threshold Value for Each Network Performance Type CCR ACC

Binary Outputs (%)

Bi-Polar Outputs (%)

90.1 94.6

88.38 92.8

TABLE 3.7 Six-Class Classification Result Using ‘‘Winner-Takes-All’’ Performance Type CCR ACC

Binary Method (%)

Bi-Polar Method (%)

93.7 93.7

92.4 92.4

A Universal Neural Network–Based Infrasound Event Classifier

3.6

53

Conclusions

Radial basis function neural networks were used to classify six different infrasound events. The classifier was built with a parallel structure of neural modules that individually are responsible for classifying one and only one infrasound event, referred to as PBNNC architecture. The overall accuracy of the classifier was found to be greater than 90%, using the CCR performance criterion. A feature extraction technique was employed that had a major impact toward increasing the classification performance over most other methods that have been tried in the past. Receiver operating characteristic curves were also employed to optimally set the output thresholds of the individual neural modules in the PBNNC architecture. This also contributed to increased performance of the global classifier. And finally, by optimizing the individual spread parameters of the RBF NN, the overall classifier performance was increased.

Acknowledgments The authors would like to thank Dr. Steve Tenney and Dr. Duong Tran-Luu from the Army Research Laboratory for their support of this work. The authors also thank Dr. Kamel Rekab, University of Missouri–Kansas City, and Mr. Young-Chan Lee, Florida Institute of Technology, for their comments and insight.

References 1. Pierce, A.D., Acoustics: An Introduction to Its Physical Principles and Applications, Acoustical Society of America Publications, Sewickley, PA, 1989. 2. Valentina, V.N., Microseismic and Infrasound Waves, Research Reports in Physics, SpringerVerlag, New York, 1992. 3. National Research Council, Comprehensive Nuclear Test Ban Treaty Monitoring, National Academy Press, Washington, DC, 1997. 4. Bedard, A.J. and Georges, T.M., Atmospheric infrasound, Physics Today, 53(3), 32–37, 2000. 5. Ham, F.M. and Kostanic, I., Principles of Neurocomputing for Science and Engineering, McGrawHill, New York, 2001. 6. Torres, H.M. and Rufiner, H.L., Automatic speaker identification by means of mel cepstrum, wavelets and wavelet packets, In Proc. 22nd Annu. EMBS Int. Conf., Chicago, IL, July 23–28, 2000, pp. 978–981. 7. Foo, S.W. and Lim, E.G., Speaker recognition using adaptively boosted classifier, In Proc. The IEEE Region 10th Int. Conf. Electr. Electron. Technol. (TENCON 2001), Singapore, Aug. 19–22, 2001, pp. 442–446. 8. Moonasar, V. and Venayagamoorthy, G., A committee of neural networks for automatic speaker recognition (ASR) systems, IJCNN, Vol. 4, Washington, DC, 2001, pp. 2936–2940. 9. Inal, M. and Fatihoglu, Y.S., Self-organizing map and associative memory model hybrid classifier for speaker recognition, 6th Seminar on Neural Network Applications in Electrical Engineering, NEUREL-2002, Belgrade, Yugoslavia, Sept. 26–28, 2002, pp. 71–74. 10. Ham, F.M., Rekab, K., Park, S., Acharyya, R., and Lee, Y.-C., Classification of infrasound events using radial basis function neural networks, Special Session: Applications of learning and data-driven methods to earth sciences and climate modeling, Proc. Int. Joint Conf. Neural Networks, Montre´al, Que´bec, Canada, July 31–August 4, 2005, pp. 2649–2654.

54

Signal and Image Processing for Remote Sensing

11. Mammone, R.J., Zhang, X., and Ramachandran, R.P., Robust speaker recognition: a featurebased approach, IEEE Signal Process. Mag., 13(5), 58–71, 1996. 12. Noble, J., Tenney, S.M., Whitaker, R.W., and ReVelle, D.O., Event detection from small aperture arrays, U.S. Army Research Laboratory and Los Alamos National Laboratory Report, September 2002. 13. Tenney, S.M., Noble, J., Whitaker, R.W., and Sandoval, T., Infrasonic SCUD-B launch signatures, U.S. Army Research Laboratory and Los Alamos National Laboratory Report, October 2003. 14. Haykin, S., Neural Networks: A Comprehensive Foundation, 2nd ed., Prentice-Hall, Upper Saddle River, NJ, 1999. 15. Powell, M.J.D., Radial basis functions for multivariable interpolation: a review, IMA conference on Algorithms for the Approximation of Function and Data, RMCS, Shrivenham, U.K., 1985, pp. 143–167. 16. Powell, M.J.D., Radial basis functions for multivariable interpolation: a review, Algorithms for the Approximation of Function and Data, Mason, J.C. and Cox, M.G., Eds., Clarendon Press, Oxford, U.K., 1987. 17. Parzen, E., On estimation of a probability density function and mode, Ann. Math. Stat., 33, 1065–1076, 1962. 18. Duda, R.O., and Hart, P.E., Pattern Classification and Scene Analysis, John Wiley & Sons, New York, 1973. 19. Specht, D.F., Probabilistic neural networks, Neural Networks, 3(1), 109–118, 1990. 20. Poggio, T. and Girosi, F., A Theory of Networks for Approximation and Learning, A.I. Memo 1140, MIT Press, Cambridge, MA, 1989. 21. Wilson, C.R. and Forbes, R.B., Infrasonic waves from Alaskan volcanic eruptions, J. Geophys. Res., 74, 4511–4522, 1969. 22. Wilson, C.R., Olson, J.V., and Richards, R., Library of typical infrasonic signals, Report prepared for ENSCO (subcontract no. 269343–2360.009), Vols. 1–4, 1996. 23. Bedard, A.J., Infrasound originating near mountain regions in Colorado, J. Appl. Meteorol., 17, 1014, 1978. 24. Olson, J.V. and Szuberla, C.A.L., Distribution of wave packet sizes in microbarom wave trains observed in Alaska, J. Acoust. Soc. Am., 117(3), 1032–1037, 2005. 25. Ham, F.M., Leeney, T.A., Canady, H.M., and Wheeler, J.C., Discrimination of volcano activity using infrasonic data and a backpropagation neural network, In Proc. SPIE Conf. Appl. Sci. Computat. Intell. II, Priddy, K.L., Keller, P.E., Fogel, D.B., and Bezdek, J.C., Eds., Orlando, FL, 1999, Vol. 3722, pp. 344–356. 26. Ham, F.M., Leeney, T.A., Canady, H.M., and Wheeler, J.C., An infrasonic event neural network classifier, In Proc. 1999 Int. Joint Conf. Neural Networks, Session 10.7, Paper No. 77, Washington, DC, July 10–16, 1999. 27. Ham, F.M., Neural network classification of infrasound events using multiple array data, International Infrasound Workshop 2001, Kailua-Kona, Hawaii, 2001. 28. Ham, F.M. and Park, S., A robust neural network classifier for infrasound events using multiple array data, In Proc. 2002 World Congress on Computational Intelligence—International Joint Conference on Neural Networks, Honolulu, Hawaii, May 12–17, 2002, pp. 2615–2619. 29. Kohavi, R. and Provost, F., Glossary of terms, Mach. Learn., 30(2/3), 271–274, 1998. 30. McDonough, R.N. and Whalen, A.D., Detection of Signals in Noise, 2nd ed., Academic Press, San Diego, CA, 1995. 31. Smith, S.W., The Scientist and Engineer’s Guide to Digital Signal Processing, California Technical Publishing, San Diego, CA, 1997. 32. Hanley, J.A. and McNeil, B.J., The meaning and use of the area under a receiver operating characteristic (ROC) curve, Radiology, 143(1), 29–36, 1982. 33. Demuth, H. and Beale, M., Neural Network Toolbox for Use with MATLAB, The MathWorks, Inc., Natick, MA, 1998.

4 Construction of Seismic Images by Ray Tracing

Enders A. Robinson

CONTENTS 4.1 Introduction ....................................................................................................................... 55 4.2 Acquisition and Interpretation ....................................................................................... 56 4.3 Digital Seismic Processing............................................................................................... 58 4.4 Imaging by Seismic Processing ...................................................................................... 60 4.5 Iterative Improvement ..................................................................................................... 62 4.6 Migration in the Case of Constant Velocity ................................................................. 63 4.7 Implementation of Migration ......................................................................................... 64 4.8 Seismic Rays ...................................................................................................................... 66 4.9 The Ray Equations............................................................................................................ 70 4.10 Numerical Ray Tracing.................................................................................................... 71 4.11 Conclusions........................................................................................................................ 73 References ..................................................................................................................................... 74

4.1

Introduction

Reflection seismology is a method of remote imaging used in the exploration of petroleum. The seismic reflection method was developed in the 1920s. Initially, the source was a dynamite explosion set off in a shallow hole drilled into the ground, and the receiver was a geophone planted on the ground. In difficult areas, a single source would refer to an array of dynamite charges around a central point, called the source point, and a receiver would refer to an array of geophones around a central point, called the receiver point. The received waves were recorded on a photographic paper on a drum. The developed paper was the seismic record or seismogram. Each receiver accounted for a single wiggly line on the record, which is called a seismic trace or simply a trace. In other words, a seismic trace is a signal (or time series) received at a specific receiver location from a specific source location. The recordings were taken for a time span starting at the time of the shot (called time zero) until about three or four seconds after the shot. In the early days, a seismic crew would record about 10 or 20 seismic records per day, with a dozen or two traces on each record. Figure 4.1 shows a seismic record with wiggly lines as traces. Seismic crew number 23 of the Atlantic Refining Company shot the record on October 9, 1952. As written on the record, the traces were shot with a source that was laid out as a 36-hole circular array. The first circle in the array had a diameter of 130 feet with 6 holes, each hole loaded with 10 lbs of dynamite. The second circle in the array had a diameter of 215 feet with 11 holes (which should have been 12 holes, but one hole was missing), each 55

Signal and Image Processing for Remote Sensing

56

FIGURE 4.1 Seismic record taken in 1952.

hole loaded with 10 lbs of dynamite. The third circle in the array had a diameter of 300 feet with 18 holes, each hole loaded with 5 lbs of dynamite. Each charge was at a depth of 20 feet. The total charge was thus 260 lbs of dynamite, which is a large amount of dynamite for a single seismic record. The receiver for each trace was made up of a group of 24 geophones (also called seismometers) in a circular array with 6 geophones on each of 4 circles of diameters 50 feet, 150 feet, 225 feet, and 300 feet, respectively. There was a 300-feet gap between group centers (i.e., receiver points). This record is called an NR seismogram. The purpose of the elaborate source and receiver arrays was an effort to bring out visible reflections on the record. The effort was fruitless. Regions where geophysicists can see no reflections on the raw records are termed as no record or no reflection (NR) areas. The region in which this record was taken, as the great majority of possible oilbearing regions in the world, was an NR area. In such areas, the seismic method (before digital processing) failed, and hence wildcat wells had to be drilled based on surface geology and a lot of guess work. There was a very low rate of discovery in the NR regions. Because of the tremendous cost of drilling to great depths, there was little chance that any oil would ever be discovered. The outlook for oil was bleak in the 1950s.

4.2

Acquisition and Interpretation

From the years of its inception up to about 1965, the seismic method involved two steps, namely acquisition and interpretation. Acquisition refers to the generation and recording of seismic data. Sources and receivers are laid out on the surface of the Earth. The objective is to probe the unknown structure below the surface. The sources are made up of vibrators (called vibroseis), dynamite shots, or air guns. The receivers are geophones on land and hydrophones at sea. The sources are activated one at a time, not all together. Suppose a

Construction of Seismic Images by Ray Tracing

57

single source is activated, the resulting seismic waves travel from the source into the Earth. The waves pass down through sedimentary rock strata, from which the waves are reflected upward. A reflector is an interface between layers of contrasting physical properties. A reflector might represent a change in lithology, a fault, or an unconformity. The reflected energy returns to the surface, where it is recorded. For each source activated, there are many receivers surrounding the source point. Each recorded signal, called a seismic trace, is associated with a particular source point and a particular receiver point. The traces, as recorded, are referred to as the raw traces. A raw trace contains all the received events. These events are produced by the subsurface structure of the Earth. The events due to primary reflections are wanted; all the other events are unwanted. Interpretation was the next step after acquisition. Each seismic record was examined through the eye and the primary reflections that could be seen were marked by a pencil. A primary reflection is an event that represents a passage from the source to the depth point, and then a passage directly back to the receiver (Figure 4.2). At a reflection, the traces become coherent; that is, they come into phase with each other. In other words, at a reflection, the crests and troughs on adjacent traces appear to fit into one another. The arrival time of a reflection indicates the depth of the reflecting horizon below the surface, while the time differential (the so-called step-out time) in the arrivals of a given peak or trough at successive receiver positions provides information on the dip of the reflecting horizon. In favorable areas, it is possible to follow the same reflection over a distance much greater than that covered by the receiver spread for a single record. In such cases, the records are placed side-by-side. The reflection from the last trace of one record correlates with the first trace of the next record. Such a correlation can be continued on successive records as long as the reflection persists. In areas of rapid structural change, the ensemble of raw traces is unable to show the true geometry of subsurface structures. In some cases, it is possible to identify an isolated structure such as a fault or a syncline on the basis of its characteristic reflection pattern. In NR regions, the raw record section does not give a usable image of the subsurface at all. Seismic wave propagation in three dimensions is a complicated process. The rock layers absorb, reflect, refract, or scatter the waves. Inside the different layers, the waves propagate at different velocities. The waves are reflected and refracted at the interfaces between the layers. Only elementary geometries can be treated exactly in three dimensions. If the reflecting interfaces are horizontal (or nearly so), the waves going straight down will be reflected nearly straight up. Thus, the wave motion is essentially vertical. If the time axes on the records are placed in the vertical position, time appears in the same direction as the raypaths. By using the correct wave velocity, the time axis can be converted into the depth axis. The result is that the primary reflections show the locations Receiver point R

Source point S

D Depth point

FIGURE 4.2 Primary reflection.

Signal and Image Processing for Remote Sensing

58

of the reflecting interfaces. Thus, in areas that have nearly level reflecting horizons, the primary reflections, as recorded, essentially show the correct depth positions of the subsurface interfaces. However, in areas that have a more complicated subsurface structure, the primary reflections as recorded in time do not occur at the correct depth positions in space. As a result, the primary reflections have to be moved (or migrated) to their proper spatial positions. In areas of complex geology, it is necessary to move (or migrate) the energy of each primary reflection to the spatial position of the reflecting point. The method is similar to that used in Huygens’s construction. Huygens articulated the principle that every point of a wavefront can be regarded as the origin of a secondary spherical wave, and the envelope of all these secondary waves constitutes the propagated wavefront. In the predigital days, migration was carried out by straightedge and compass or by a special-purpose handmanipulated drawing machine on a large sheet of a graph paper. The arrival times of the observed reflections were marked on the seismic records. These times are the two-way traveltimes from the source point to the receiver point. If the source and the receiver were at the same point (i.e., coincident), then the raypath down would be the same as the raypath up. In such a case, the one-way time is one half of the two-way time. From the two-way traveltime data, such a one-way time was estimated for each source point. This one-way time was multiplied by an estimated seismic velocity. The travel distance to the interface was thus obtained. A circle was drawn with the surface point as center and the travel distance as radius. This process was repeated for the other source points. In Huygens’s construction, the envelope of the spherical secondary waves gives the new wavefront. In a similar manner, in the seismic case, the envelope of the circles gives the reflecting interface. This method of migration was done in 1921 in the first reflection seismic survey ever taken.

4.3

Digital Seismic Processing

Historically, most seismic work fell under the category of two-dimensional (2D) imaging. In such cases, the source positions and the receiver positions are placed on a horizontal surface line called the x-axis. The time axis is a vertical line called the t-axis. Each source would produce many traces—one trace for each receiver position on the x-axis. The waves that make up each trace take a great variety of paths, each requiring a different time to travel from the source to receiver. Some waves are refracted and others scattered. Some waves travel along the surface of the Earth, and others are reflected upward from various interfaces. A primary reflection is an event that represents a passage from the source to the depth point, and then a passage directly back to the receiver. A multiple reflection is an event that has undergone three, five, or some other odd number of reflections in its travel path. In other words, a multiple reflection takes a zig-zag course with the same number of down legs as up legs. Depending on their time delay from the primary events with which they are associated, multiple reflections are characterized as short-path, implying that they interfere with the primary reflection, or as long-path, where they appear as separate events. Usually, primary reflections are simply called reflections or primaries, whereas multiple reflections are simply called multiples. A water-layer reverberation is a type of multiple reflection due to the multiple bounces of seismic energy back and forth between the water surface and the water bottom. Such reverberations are common in marine seismic data. A reverberation re-echoes (i.e., bounces back and forth) in the water layer for a prolonged period of time. Because of its resonant nature, a

Construction of Seismic Images by Ray Tracing

59

reverberation is a troublesome type of multiple. Reverberations conceal the primary reflections. The primary reflections (i.e., events that have undergone only one reflection) are needed for image formation. To make use of the primary reflected signals on the record, it is necessary to distinguish them from the other type of signals on the record. Random noise, such as wind noise, is usually minor and, in such cases, can be neglected. All the seismic signals, except primary reflections, are unwanted. These unwanted signals are due to the seismic energy introduced by the seismic source signal; hence, they are called signalgenerated noise. Thus, we are faced with the problem of (primary reflected) signal enhancement and (signal-generated) noise suppression. In the analog days (approximately up to about 1965), the separation of signal and noise was done through the eye. In the 1950s, a good part of the Earth’s sedimentary basins, including essentially all water-covered regions, were classified as NR areas. Unfortunately, in such areas, the signal-generated noise overwhelms the primary reflections. As a result, the primary reflections cannot be picked up visually. For example, water-layer reverberations as a rule completely overwhelm the primaries in the water-covered regions such as the Gulf of Mexico, the North Sea, and the Persian Gulf. The NR areas of the world could be explored for oil in a direct way by drilling, but not by the remote detection method of reflection seismology. The decades of the 1940s and 1950s were replete with inventions, not the least of which was the modern high-speed electronic stored-program digital computer. In the years from 1952 to 1954, almost every major oil company joined the MIT Geophysical Analysis Group to use the digital computer to process NR seismograms (Robinson, 2005). Historically, the additive model (trace ¼ s þ n) was used. In this model, the desired primary reflections were the signal s and everything else was the noise n. Electric filers and other analog methods were used, but they failed to give the desired primary reflections. The breakthrough was the recognition that the convolutional model (trace ¼ sn) is the correct model for a seismic trace. Note that the asterisk denotes convolution. In this model, the signal-generated noise is the signal s and the unpredictable primary reflections are the noise n. Deconvolution removes the signal-generated noise (such as instrument responses, ground roll, diffractions, ghosts, reverberations, and other types of multiple reflections) so as to yield the underlying primary reflections. The MIT Geophysical Analysis Group demonstrated the success of deconvolution on many NR seismograms, including the record shown in Figure 4.1. However, the oil companies were not ready to undertake digital seismic processing at that time. They were discouraged because an excursion into digital seismic processing would require new effort that would be expensive, and still the effort might fail because of the unreliability of the existing computers. It is true that in 1954 the available digital computers were far from suitable for geophysical processing. However, each year from 1946 onward, there was a constant stream of improvements in computers, and this development was accelerating every year. With patience and time, the oil and geophysical companies would convert to digital processing. It would happen when the need for hard-to-find oil was great enough to justify the investment necessary to turn NR seismograms into meaningful data. Digital signal processing was a new idea to the seismic exploration industry, and the industry shied away from converting to digital methods until the 1960s. The conversion of the seismic exploration industry to digital was in full force by about 1965, at which time transistorized computers were generally available at a reasonable price. Of course, a reasonable price for one computer then would be in the range from hundreds of thousands of dollars to millions of dollars. With the digital computer, a whole new step in seismic exploration was added, namely digital processing. However, once the conversion to digital was undertaken in the years around 1965, it was done quickly and effectively. Reflection seismology now involves three steps, namely acquisition, processing, and interpretation. A comprehensive presentation of seismic data processing is given by Yilmaz (1987).

Signal and Image Processing for Remote Sensing

60

4.4

Imaging by Seismic Processing

The term imaging refers to the formation of a computer image. The purpose of seismic processing is to convert the raw seismic data into a useful image of the subsurface structure of the Earth. From about 1965 onward, most of the new oil fields discovered were the result of the digital processing of the seismic data. Digital signal processing deconvolves the data and then superimposes (migrates) the results. As a result, seismic processing is divided into two main divisions: the deconvolution phase, which produces primaries-only traces (as well as possible), and the migration phase, which moves the primaries to their true depth positions (as well as possible). The result is the desired image. The first phase of imaging (i.e., deconvolution) is carried out on the traces, either individually by means of single-channel processing or in groups by means of multi-channel processing. Ancillary signal-enhancement methods typically include such things as the analyses of velocities and frequencies, static and dynamic corrections, and alternative types of deconvolution. Deconvolution is performed on one or a few traces at a time; hence, the small capacity of the computers of the 1960s was not a severely limiting factor. The second phase of imaging (i.e., migration) is the movement of the amplitudes of the primary-reflection events to their proper spatial locations (the depth points). Migration can be implemented by a Huygens-like superposition of the deconvolved traces. In a mechanical medium, such as the Earth, forces between the small rock particles transmit the disturbance. The disturbance at some regions of rock acts locally on nearby regions. Huygens imagined that the disturbance on a given wavefront is made up of many separate disturbances, each of which acts like a point source that radiates a spherically symmetric secondary wave, or wavelet. The superposition of these secondary waves gives the wavefront at a later time. The idea that the new wavefront is obtained by superposition is the crowning achievement of Huygens. See Figure 4.3. In a similar way, seismic migration uses superposition to find the subsurface reflecting interfaces.

A

A

H

B

I B

b

C K

d

D

d

b

d

b

d

d

L C

FIGURE 4.3 Huygens’s construction.

d

b

G

F E

Construction of Seismic Images by Ray Tracing

61

Ground surface

Interface FIGURE 4.4 Construction of a reflecting interface.

See Figure 4.4. In the case of the migration, there is an added benefit of superposition, namely, superposition is one of the most effective ways to accentuate signals and suppress noise. The superposition used in migration is designed to return the primary reflections to their proper spatial locations. The remnants of the signal-generated noise on the deconvolved traces are out of step with the primary events. As a result, the remaining signal-generated noise tends to be destroyed by the superposition. The superposition used in migration provides the desired image of the underground structure of the Earth. Historically, migration (i.e., the movement of reflected events to their proper locations in space) was carried out manually, sometimes making use of elaborate drawing instruments. The transfer of the manual processes to a digital computer involved the manipulation of a great number of traces at once. The resulting digital migration schemes all relied heavily on the superposition of the traces. This tremendous amount of data handling had a tendency to overload the limited capacities of the computers made in the 1960s and 1970s. As a result, it was necessary to simplify the migration problem and to break it down into smaller parts. Thus, migration was done by a sequence of approximate operations, such as stacking, followed by normal moveout, dip moveout, and migration after stack. The process known as time migration was often used, which improved the records in time, but stopped short of placing the events in their proper spatial positions. All kinds of modifications and adjustments were made to these piecemeal operations, and seismic migration in the 1970s and 1980s was a complicated discipline—an art as much as a science. The use of this art required much skill. Meanwhile, great advances in technology were taking place. In the 1990s, everything seemed to come together. Major improvements in instrumentation and computers resulted in light compact geophysical field equipment and affordable computers with high speed and massive storage. Instead of the modest number of sources and receivers used in 2D seismic processing, the tremendous number required for three-dimensional (3D) processing started to be used on a regular basis in field operations. Finally, the computers were large enough to handle the data for 3D imaging. Event movements (or migration) in three dimensions can now be carried out economically and efficiently by time-honored superposition methods such as those used in the Huygens’s construction. These migration methods are generally named as prestack migration. This name is a relic, which implies that stacking and all the other piecemeal operations are no longer used in the migration scheme. Until the 1990s, 3D seismic imaging was rarely used because of the prohibitive costs involved. Today, 3D methods are commonly used, and the resulting subsurface images are of extraordinary quality. Three-dimensional prestack migration significantly improves seismic interpretation because the locations of geological structures, especially faults, are given much more accurately. In addition, 3D migration collapses diffractions from secondary sources such as reflector terminations against faults and corrects bow ties to form synclines. Three-dimensional seismic work gives beautiful images of the underground structure of the Earth.

Signal and Image Processing for Remote Sensing

62

4.5

Iterative Improvement

Let (x, y) represent the surface coordinates and z represent the depth coordinate. Migration takes the deconvolved records and moves (i.e., migrates) the reflected events to depth points in the 3D volume (x, y, z). In this way, seismic migration produces an image of the geologic structure g(x, y, z) from the deconvolved seismic data. In other words, migration is a process in which primary reflections are moved to their correct locations in space. Thus, for migration, we need the primary reflected events. What else is required? The answer is the complete velocity function v(x, y, z), which gives the wave velocity at each point in the given volume of the Earth under exploration. At this point, let us note that the word velocity is used in two different ways. One way is to use the word velocity for the scalar that gives the rate of change of position in relation to time. When velocity is a scalar, the terms speed or swiftness are often used instead. The other (and more correct) way is to use the word velocity for the vector quantity that specifies both the speed of a body and its direction of motion. In geophysics, the word velocity is used for the (scalar) speed or swiftness of a seismic wave. The reciprocal of velocity v is slowness, n ¼ 1/v. Wave velocity can vary vertically and laterally in isotropic media. In anisotropic media, it can also vary azimuthally. However, we consider only isotropic media; therefore, at a given point, the wave velocity is the same in all directions. Wave velocity tends to increase with depth in the Earth because deep layers suffer more compaction from the weight of the layers above. Wave velocity can be determined from laboratory measurements, acoustic logs, and vertical seismic profiles or from velocity analysis of seismic data. Often, we say velocity when we mean wave velocity. Over the years, various methods have been devised to obtain a sampling of the velocity distribution within the Earth. The velocity functions so determined vary from method to method. For example, the velocity measured vertically from a check-shot or vertical seismic profile (VSP) differs from the stacking velocity derived from normal moveout measurements of ommon depth point gathers. Ideally, we would want to know the velocity at each and every point in the volume of Earth of interest. In many areas, there are significant and rapid lateral or vertical changes in the velocity that distort the time image. Migration requires an accurate knowledge of vertical and horizontal seismic velocity variations. Because the velocity depends on the types of rocks, a complete knowledge of the velocity is essentially equivalent to a complete description of the geologic structure g(x, y, z). However, as we have stated above, the velocity function is required to get the geologic structure. In other words, to get the answer (the geologic structure) g(x, y, z), we must know the answer (the velocity function) v(x, y, z). Seismic interpretation takes the images generated as representatives of the physical Earth. In an iterative improvement scheme, any observable discrepancies in the image are used as forcing functions to correct the velocity function. Sometimes, simple adjustments can be made, and, at other times, the whole imaging process has to be redone one or more times before a satisfactory solution can be obtained. In other words, there is interplay between seismic processing and seismic interpretation, which is a manifestation of the well-accepted exchange between the disciplines of geophysics and geology. Iterative improvement is a well-known method commonly used in those cases where you must know the answer to find the answer. By the use of iterative improvement, the seismic inverse problem is solved. In other words, the imaging of seismic data requires a model of seismic velocity. Initially, a model of smoothly varying velocity is used. If the results are not satisfactory, the velocity model is adjusted and a new image is formed. This process is repeated until a satisfactory image is obtained. To get the image, we must know the velocity; the method of iterative improvement deals with this problem.

Construction of Seismic Images by Ray Tracing

4.6

63

Migration in the Case of Constant Velocity

Consider a primary reflection. Its two-way traveltime is the time it takes for the seismic energy to travel down from the source S ¼ (xS, yS) to depth point D ¼ (xD, yD, zD) and then back up to the receiver R ¼ (xR, yR). The deconvolved trace f(S, R, t) gives the amplitude of the reflected signal as a function of two-way traveltime t, which is given in milliseconds from the time that the source is activated. We know S, R, t, and f(S, R, t). The problem is to find D, which is the depth point at refection. An isotropic medium is a medium whose properties at a point are the same in all directions. In particular, the wave velocity at a point is the same in all directions. Fresnel’s principle of least time requires that in an isotropic medium the rays are orthogonal trajectories of the wavefronts. In other words, the rays are normal to the wavefronts. However, in an anisotropic medium, the rays need not be orthogonal trajectories of the wavefronts. A homogeneous medium is a medium whose physical properties are the same throughout. For ease of exposition, let us first consider the case of a homogenous isotropic medium. Within a homogeneous isotropic material, the velocity v has the same value at all points and in all directions. The rays are straight lines since by symmetry they cannot bend in any preferred direction, as there are none. The two-way traveltime t is the elapsed time for a seismic wave to travel from its source to a given depth point and return to a receiver at the surface of the Earth. The two-way traveltime t is thus equal to the one-way traveltime t1 from the source point S to the depth point D plus the one-way traveltime t2 from the depth point D to the receiver point R. Note that the traveltime from D to R is the same as the traveltime from R to D. We may write t ¼ t1 þ t2, which in terms of distance is qﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ qﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ vt ¼ (xD xS )2 þ (yD yS )2 þ (zD zS )2 þ (xD xR )2 þ (yD yR )2 þ (zD zR )2 We recall that an ellipse can be drawn with two pins, a loop of string, and a pencil. The pins are placed at the foci and the ends of the string are attached to the pins. The pencil is placed on the paper inside the string, so the string is taut. The string forms a triangle. If the pencil is moved around so that the string stays taut, the sum of the distances from the pencil to the pins remains constant, and the curve traced out by the pencil is an ellipse. Thus, if vt is the length of the string, then any point on the ellipse could be the depth point D that produces the reflection for that source S, receiver R, and traveltime t. We therefore take that event and move it out to each point on the ellipse. Suppose we have two traces with only one event on each trace. Suppose both events come from the same reflecting surface. In Figure 4.5, we show the two ellipses. In the spirit of Huygens’s construction, the reflector must be the common tangent to the ellipses. This example shows how migration works. In practice, we would take

Ellipse with possible reflector points for event on one trace

Ellipse with possible reflector points for event on another trace

Surface of earth

Tangent to the ellipses

FIGURE 4.5 Reflecting interface as a tangent.

Signal and Image Processing for Remote Sensing

64

the amplitude at each digital time instant t on the trace, and scatter the amplitude on the constructed ellipsoid. In this way, the trace is spread out into a 3D volume. Then we repeat this operation for each trace, until every trace is spread out into a 3D volume. The next step is superposition. All of these volumes are added together. (In practice, each trace would be spread out and cumulatively added into one given volume.) Interference tends to destroy the noise, and we are left with the desired 3D image of the Earth.

4.7

Implementation of Migration

A raypath is a course along which wave energy propagates through the Earth. In isotropic media, the raypath is perpendicular to the local wavefront. The raypath can be calculated using ray tracing. Let the point P ¼ (xP, yP) be either a source location or a receiver location. The subsurface volume is represented by a 3D grid (x, y, z) of depth points D. To minimize the amount of ray tracing, we first compute a traveltime table for each and every surface location, whether the location be a source point or a receiver point. In other words, for each surface location P, we compute the one-way traveltime from P to each depth point D in the 3D grid. We put these one-way traveltimes into a 3D table that is labeled by the surface location P. The traveltime for a primary reflection is the total two-way (i.e., down and up) time for a path originating at the source point S, reflected at the depth point D, and received at the receiver point R. Two identification numbers are associated with each trace: one for the source S and the other for the receiver R. We pull out the respective tables for these two identification numbers. We add the two tables together, element by element. The result is a 3D table for the two-way traveltimes for that seismic trace. Let us give a 2D example. We assume that the medium has a constant velocity, which we take to be 1. Let the subsurface grid for depth points D be given by (z, x), where depth is given by z ¼ 1, 2, . . . , 10 and horizontal distance is given by x ¼ 1, 2, . . . , 15. Let the surface locations P be (z ¼ 1, x), where x ¼ 1, 2, . . . , 15. Suppose the source is S ¼ (1, 3). We want to construct a table of one-way traveltimes, where depth z denotes the row and horizontal distance x denotes the column. The one-way traveltime from the source to the qﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ depth point (z, x) is t(z, x) ¼ (z 1)2 þ (x 3)2 . For example, the traveltime from source to depth point (z, x) ¼ (4, 6) is qﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ pﬃﬃﬃﬃﬃ t(4, 6) ¼ (4 1)2 þ (6 3)2 ¼ 18 ¼ 4:24 This number (rounded) appears in the fourth row, sixth column of the table below. The computed table (rounded) for the source is 2.0 2.2 2.8 3.6 4.5 5.4 6.3 7.3 8.2 9.2

1.0 1.4 2.2 3.2 4.1 5.1 6.1 7.1 8.1 9.1

0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0

1.0 1.4 2.2 3.2 4.1 5.1 6.1 7.1 8.1 9.1

2.0 2.2 2.8 3.6 4.5 5.4 6.3 7.3 8.2 9.2

3.0 3.2 3.6 4.2 5.0 5.8 6.7 7.6 8.5 9.5

4.0 4.1 4.5 5.0 5.7 6.4 7.2 8.1 8.9 9.8

5.0 5.1 5.4 5.8 6.4 7.1 7.8 8.6 9.4 10.3

6.0 6.1 6.3 6.7 7.2 7.8 8.5 9.2 10.0 10.8

7.0 7.1 7.3 7.6 8.1 8.6 9.2 9.9 10.6 11.4

8.0 8.1 8.2 8.5 8.9 9.4 10.0 10.6 11.3 12.0

9.0 9.1 9.2 9.5 9.8 10.3 10.8 11.4 12.0 12.7

10.0 10.0 10.2 10.4 10.8 11.2 11.7 12.2 12.8 13.5

11.0 11.0 11.2 11.4 11.7 12.1 12.5 13.0 13.6 14.2

12.0 12.0 12.2 12.4 12.6 13.0 13.4 13.9 14.4 15.0

Construction of Seismic Images by Ray Tracing

65

Next, let us compute the traveltime for the receiver point. Note that the traveltime from the depth point to the receiver is the same as the traveltime from the receiver to the depth point. Suppose the receiver is R ¼ (11, 1). Then the one-way traveltime from the receiver qﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ to the depth point (z, x) is t(z, x) ¼ (z 1)2 þ (x 11)2 . For example, the traveltime from qﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ source to depth point (z, x) ¼ (4, 6) is t(4, 6) ¼ (4 1)2 þ (6 11)2 ¼ 5:85. This number (rounded) appears in the fourth row, sixth column of the table below. The computed table (rounded) for the receiver is 10.0 10.0 10.2 10.4 10.8 11.2 11.7 12.2 12.8 13.5

9.0 9.1 9.2 9.5 9.8 10.3 10.8 11.4 12.0 12.7

8.0 8.1 8.2 8.5 8.9 9.4 10.0 10.6 11.3 12.0

7.0 7.1 7.3 7.6 8.1 8.6 9.2 9.9 10.6 11.4

6.0 6.1 6.3 6.7 7.2 7.8 8.5 9.2 10.0 10.8

5.0 5.1 5.4 5.8 6.4 7.1 7.8 8.6 9.4 10.3

4.0 4.1 4.5 5.0 5.7 6.4 7.2 8.1 8.9 9.8

3.0 3.2 3.6 4.2 5.0 5.8 6.7 7.6 8.5 9.5

2.0 2.2 2.8 3.6 4.5 5.4 6.3 7.3 8.2 9.2

1.0 1.4 2.2 3.2 4.1 5.1 6.1 7.1 8.1 9.1

0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0

1.0 1.4 2.2 3.2 4.1 5.1 6.1 7.1 8.1 9.1

2.0 2.2 2.8 3.6 4.5 5.4 6.3 7.3 8.2 9.2

3.0 3.2 3.6 4.2 5.0 5.8 6.7 7.6 8.5 9.5

4.0 4.1 4.5 5.0 5.7 6.4 7.2 8.1 8.9 9.8

The addition of the above two tables gives the two-way traveltimes. For example, the traveltime from source to depth point (z, x) ¼ (4, 6) and back to receiver is t(4, 6) ¼ 4.24 þ 5.85 ¼ 10.09. This number (rounded) appears in the fourth row, sixth column of the two-way table below. The two-way table (rounded) for source and receiver is 12 12 13 14 15 17 18 19 21 23

10 10 11 13 14 15 17 18 20 22

8 9 10 12 13 14 16 18 19 21

8 8 10 11 12 14 15 17 19 20

8 8 9 10 12 13 15 16 18 20

8 8 9 10 11 13 15 16 18 20

8 8 9 10 11 13 14 16 18 20

8 8 9 10 11 13 15 16 18 20

8 8 9 10 12 13 15 16 18 20

8 8 10 11 12 14 15 17 19 20

8 9 10 12 13 14 16 18 19 21

10 10 11 13 14 15 17 18 20 22

12 12 13 14 15 17 18 19 21 23

14 14 15 16 17 18 19 21 22 24

16 16 17 17 18 19 21 22 23 25

Figure 4.6 shows a contour map of the above table. The contour lines are elliptic curves of constant two-way traveltime for the given source and receiver pair. Let the deconvolved trace (i.e., the trace with primary reflections only) be Time 0 Trace 0 Sample

1 0

2 0

3 0

4 0

5 0

6 0

7 0

8 0

9 0

10 1

11 0

12 0

13 2

14 0

15 0

16 0

17 0

18 3

19 0

The next step is to place the amplitude of each trace at depth locations, where the traveltime of the trace sample equals the traveltime as given in the above two-way table. The trace is zero for all times except 10, 13, and 18. We now spread the trace

Signal and Image Processing for Remote Sensing

66

S1 S2 S3 S4 20.0–25.0

S5

15.0–20.0 10.0–15.0

S6

5.0–10.0 S7

0.0–5.0

S8 S9

1

2

3

4

5

6

7

8

9

10

11

12

13

S10 15

14

FIGURE 4.6 Elliptic contour lines.

out as follows. In the above two-way table, the entries with 10, 13, and 18 are replaced by the trace values 1, 2, 3, respectively. All other entries are replaced by zero. The result is 0 0 2 0 0 0 3 0 0 0

1 1 0 2 0 0 0 3 0 0

0 0 1 0 2 0 0 3 0 0

0 0 1 0 0 0 0 0 0 0

0 0 0 1 0 2 0 0 3 0

0 0 0 1 0 2 0 0 3 0

0 0 0 1 0 2 0 0 3 0

0 0 0 1 0 2 0 0 3 0

0 0 0 1 0 2 0 0 3 0

0 0 1 0 0 0 0 0 0 0

0 0 1 0 2 0 0 3 0 0

0 1 0 2 0 0 0 3 0 0

0 0 2 0 0 0 0 0 0 0

0 0 0 0 0 3 0 0 0 0

0 0 0 0 3 0 0 0 0 0

This operation is repeated for all traces in the survey, and the resulting tables are added together. The final image appears by the constructive and destructive interference among the individual trace contributions. The above procedure for a constant velocity is the same as with a variable velocity, except now the traveltimes are computed according to the velocity function v(x, y, z). The eikonal equation can provide the means; hence, for the rest of this paper we will develop the properties of this basic equation.

4.8

Seismic Rays

To move the received reflected events back into the Earth and place their energy at the point of reflection, it is necessary to have a good understanding of ray theory. We assume the medium is isotropic. Rays are directed curves that are always perpendicular to the

Construction of Seismic Images by Ray Tracing

67

wavefront at any given time. The rays point along the direction of the motion of the wavefront. As time progresses, the disturbance propagates, and we obtain a family of wavefronts. We will now describe the behavior of the rays and wavefronts in media with a continuously varying velocity. In the treatment of light as wave motion, there is a region of approximation in which the wavelength is small in comparison with the dimensions of the components of the optical system involved. This region of approximation is treated by the methods of geometrical optics. When the wave character of the light cannot be ignored, then the methods of physical optics apply. Since the wavelength of light is very small compared to ordinary objects, geometrical optics can describe the behavior of a light beam satisfactorily in many situations. Within the approximation represented by geometrical optics, light travels along lines called rays. The ray is essentially the path along which most of the energy is transmitted from one point to another. The ray is a mathematical device rather than a physical entity. In practice, one can produce very narrow beams of light (e.g., a laser beam), which may be considered as physical manifestations of rays. When we turn to a seismic wave, the wavelength is not particularly small in comparison with the dimensions of geologic layers within the Earth. However, the concept of a seismic ray fulfills an important need. Geometric seismics is not nearly as accurate as geometric optics, but still ray theory is used to solve many important practical problems. In particular, the most popular form of prestack migration is based on tracing the raypaths of the primary reflections. In ancient times, Archimedes defined the straight line as the shortest path between two points. Heron explained the paths of reflected rays of light based on a principle of least distance. In the 17th century, Fermat proposed the principle of least time, which let him account for refraction as well as reflection. The Mississippi River has created most of Louisiana with sand and silt. The river could not have deposited these sediments by remaining in one channel. If it had remained in one channel, southern Louisiana would be a long narrow peninsula reaching into the Gulf of Mexico. Southern Louisiana exists in its present form because the Mississippi River has flowed here and there within an arc of about two hundred miles wide, frequently and radically changing course, surging over the left or the right bank to flow in a new direction. It is always the river’s purpose to get to the Gulf in the least time. This means that its path must follow the steepest way down. The gradient is the vector that points in the direction of the steepest ascent. Thus, the river’s path must follow the direction of the negatives gradient, which is the path of steepest descent. As the mouth advances southward and the river lengthens, the steepness of the path declines, the current slows, and sediment builds up the bed. Eventually, the bed builds up so much that the river spills to one side to follow what has become the steepest way down. Major shifts of that nature have occurred about once in a millennium. The Mississippi’s main channel of three thousand years ago is now Bayou Teche. A few hundred years later, the channel shifted abruptly to the east. About two thousand years ago, the channel shifted to the south. About one thousand years ago, the channel shifted to the river’s present course. Today, the Mississippi River has advanced past New Orleans and out into the Gulf that the channel is about to shift again to the Atchafalaya. By the route of the Atchafalaya, the distance across the delta plain is 145 miles, which is about half the length of the route of the present channel. The Mississippi River intends changing its course to this shorter and steeper route. The concept of potential was first developed to deal with problems of gravitational attraction. In fact, a simple gravitational analogy is helpful in explaining potential. We do work in carrying an object up a hill. This work is stored as potential energy, and it can be recovered by descending in any way we choose. A topographic map can be used to visualize the terrain. Topographic maps provide information about the elevation of the surface above sea level. The elevation is represented on a topographic map by contour

Signal and Image Processing for Remote Sensing

68

lines. Each point on a contour line has the same elevation. In other words, a contour line represents a horizontal slice through the surface of the land. A set of contour lines tells you the shape of the land. For example, hills are represented by concentric loops, whereas stream valleys are represented by V-shapes. The contour interval is the elevation difference between adjacent contour lines. Steep slopes have closely spaced contour lines, while gentle slopes have very widely spaced contour lines. In seismic theory, the counterpart of gravitational potential is the wavefront t(x, y, z) ¼ constant. A vector field is a rule that assigns a vector, in our case the gradient rt(x, y, z) ¼

@t @t @t , , @x @y @z

to each point (x, y, z). In visualizing a vector field, we imagine there is a vector extending from each point. Thus, the vector field associates a direction to each point. If a hypothetical particle moves in such a manner that its direction at any point coincides with the direction of the vector field at that point, then the curve traced out is called a flow line. In the seismic case, the wavefront corresponds to the equipotential surface and the seismic ray corresponds to the flow line. In 3D space, let r be the vector from the origin (0, 0, 0) to an arbitrary point (x, y, z). A vector is specified by its components. A compact notation is to write r as r ¼ (x, y, z). We call r the radius vector. A more explicit way is to write r as r ¼ xxˆ þ yyˆ þ zzˆ, where xˆ, yˆ, and zˆ are the three orthogonal unit vectors. These unit vectors are defined as the vectors that have magnitude equal to one and have directions lying along the x, y, z axes, respectively. They are referred to as ‘‘x-hat’’ and so on. Now, let the vector r ¼ (x, y, z) represent a point on a given ray (Figure 4.7). Let s denote arc length along the ray. Let r þ dr ¼ (x þ dx, y þ dy, z þ dz) give an adjacent point on the same ray. The vector dr ¼ (dx, dy, dz) is (approximately) the tangent vector to the ray. The pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ length of this vector is (dx2 þ dy2 þ dz2 Þ which is approximately equal to the increment ds of the arc length on the ray. As a result the unit vector tangent to the ray is u¼

dr dx dy dz ¼ xˆ þ yˆ þ zˆ ds ds ds ds

The unit tangent vector can also be written as dx dy dz u¼ , , ds ds ds The velocity along the ray is v ¼ ds/dt so dt ¼ ds/v ¼ n ds, where n(r) ¼ 1/v(r) is the slowness and ds is an increment of path length along the given ray. Thus, the seismic traveltime field is

Unit tangent u

Ray Wavefront FIGURE 4.7 Raypath and wavefronts.

Radius vector r

Construction of Seismic Images by Ray Tracing

t(r) ¼

69

ðr n(r) ds r0

It is understood, of course, that the path of integration is along the given ray. The above equation holds for any ray. A wavefront is a surface of constant traveltime. The time difference between two wavefronts is a constant independent of the ray used to calculate the time difference. A wave as it travels must follow the path of least time. The wavefronts are like contour lines on a hill. The height of the hill is measured in time. Take a point on a contour line. In what direction will the ray point? Suppose the ray points along the contour line (that is, along the wavefront). As the wave travels a certain distance along this hypothetical ray, it takes time. But, all time is the same along the wavefront. Thus, a wave cannot travel along a wavefront. It follows that a ray must point away from a wavefront. Suppose a ray points away from the wavefront. The wave wants to take the least time to travel to the new wavefront. By isotropy, the wave velocity is the same in all directions. Since the traveltime is velocity multiplied by distance, the wave wants to take the raypath that goes the shortest distance. The shortest distance is along the path that has no component along the wavefront; that is, the shortest distance is along the normal to the wavefront. In other words, the ray’s unit tangent vector u must be orthogonal to the wavefront. By definition, the gradient is a vector that points in the direction orthogonal to the wavefront. Thus, the ray’s unit tangent vector u and the gradient rt of the wavefront must point in the same direction. If the given wavefront is at time t and the new wavefront is at time t þ dt, then the traveltime along the ray is dt. If s measures the path length along the given ray, then the travel distance in time dt is ds. Along the raypath, the increments dt and ds are related by the slowness, that is, dt ¼ n ds. Thus, the slowness is equal to the directional derivative in the direction of the raypath, that is, n ¼ dt/ds. In other words, the swiftness along the raypath direction is v ¼ ds/dt, and the slowness along the raypath direction is n ¼ dt/ds. If we write the directional derivative in terms of its components, this equation becomes n¼

dt @t dx @t dy @t dz dr ¼ þ þ ¼ rt ds @x ds @y ds @z ds ds

Because dr/ds ¼ u, it follows that the above equation is n ¼ rt u. Since u is a unit vector in the same direction of the gradient, it follows that n ¼ jrtj juj cos 0 ¼ jrtj In other words, the slowness is equal to the magnitude of the gradient. Since gradient rt and the raypath each have the same direction u, and the gradient has magnitude n, and u has magnitude unity, it follows that rt ¼ nu This equation is the vector eikonal equation. The vector eikonal equation written in terms of its components is

@t @t @t dx dy dz , , , , ¼n @x @y @z ds ds ds

If we take the squared magnitude of each side, we obtain the eikonal equation

Signal and Image Processing for Remote Sensing

70

@t @x

2 2 2 @t @t þ þ ¼ n2 @y @z

The left-hand side involves the wavefront and the right-hand side involves the ray. The connecting link is the slowness. In the eikonal equation, the function t(x, y, z) is the traveltime (also called the eikonal) from the source to the point with the coordinates (x, y, z), and n(x, y, z) ¼ 1/v(x, y, z) is the slowness (or reciprocal velocity) at that point. The eikonal equation describes the traveltime propagation as an isotropic medium. To obtain a well-posed initial value problem, it is necessary to know the velocity function v(x, y, z) at all points in space. Moreover, as an initial condition, the source or some particular wavefront must be specified. Furthermore, one must choose one of the two branches of the solutions (namely, either the wave going from the source or else the wave going to the source). The eikonal equation then yields the traveltime field t(x, y, z) in the heterogeneous medium, as required for migration. What does the eikonal equation rt ¼ nu say? It says that, because of Fermat’s principle of least time, the raypath direction must be orthogonal to the wavefront. The eikonal equation is the fundamental equation that connects the ray (which corresponds to the fuselage of the airplane) to the wavefront (which corresponds to the wings of the airplane). The wings let the fuselage feel the effects of points removed from the path of the fuselage. The eikonal equation makes a traveling wave (as envisaged by Huygens) fundamentally different from a traveling particle (as envisaged by Newton). Hamilton perceived that there is a wave–particle duality, which provides the mathematical foundation of quantum mechanics.

4.9

The Ray Equations

In this section, the position vector r always represents a point on a specific raypath, and not any arbitrary point in space. As time increases, r traces out the particular raypath in question. The seismic ray at any given point follows the direction of the gradient of the traveltime field t(r). As before, let u be the unit vector along the ray. The ray, in general, follows a curved path, and nu is the tangent to this curved raypath. We now want to derive an equation that will tell us how nu changes along the curved raypath. The vector eikonal equation is written as nu ¼ rt We now take the derivative of the vector eikonal equation with respect to distance s along the raypath. We obtain the ray equation d d (nu) ¼ (rt) ds ds Interchange r and (d/ds) and use dt/ds ¼ n. Thus, the right-hand side becomes drt dt ¼r ¼ rn ds ds Thus, the ray equation becomes

Construction of Seismic Images by Ray Tracing

71

High slowness

Low slowness FIGURE 4.8 Straight and curved rays.

d (nu) ¼ rn ds This equation, together with the equation for the unit tangent vector dr ¼u ds are called the ray equations. We need to understand how a single ray, say a seismic ray, moving along a particular path can know what is an extremal path in the variational sense. To illustrate the problem, consider a medium whose slowness n decreases with vertical depth, but is constant laterally. Thus, the gradient of the slowness at any location points straight up (Figure 4.8). The vertical line on the left depicts a raypath parallel to the gradient of the slowness. This ray undergoes no refraction. However, the path of the ray on the right intersects the contour lines of slowness at an angle. The right-hand ray is refracted and follows a curved path, even though the ray strikes the same horizontal contour lines of slowness as did the left-hand ray, where there was no refraction. This shows that the path of a ray cannot be explained solely in terms of the values of the slowness on the path. We must also consider the transverse values of the slowness along neighboring paths, that is, along paths not taken by that particular ray. The classical wave explanation, proposed by Huygens, resolves this problem by saying that light does not propagate in the form of a single ray. According to the wave interpretation, light propagates as a wavefront possessing transverse width. Think of an airplane traveling along the raypath. The fuselage of the airplane points in the direction of the raypath. The wings of the aircraft are along the wavefront. Clearly, the wavefront propagates more rapidly on the side where the slowness is low (i.e., where the velocity is high) than on the side where the slowness is high. As a result, the wavefront naturally turns in the direction of the gradient of slowness.

4.10

Numerical Ray Tracing

Computer technology and seismic instrumentation have experienced great advances in the past few years. As a result, exploration geophysics is in a state of transition from computer-limited 2D processing to computer-intensive 3D processing. In the past, most seismic surveys were along surface lines, which yield 2D subsurface images. The wave equation acts nicely in one dimension, and in three dimensions, but not in two dimensions. In one dimension, waves (as on a string) propagate without distortion. In three

72

Signal and Image Processing for Remote Sensing

dimensions, waves (as in the Earth) propagate in an undistorted way except for a spherical correction factor. However, in two dimensions, wave propagation is complicated and distorted. By its very nature, 2D processing can never account for events originating outside of the plane. As a result, 2D processing is broken up into a large number of approximate partial steps in a sequence of operations. These steps are ingenious, but they can never give a true image. However, 3D processing accounts for all of the events. It is now cost effective to lay out seismic surveys over a surface area and to do 3D processing. The third dimension is no longer missing, and, consequently, the need for a large number of piecemeal 2D approximations is gone. Prestack depth migration is a 3D imaging process that is computationally extensive, but mathematically simple. The resulting 3D images of the interior of the Earth surpass all expectations in utility and beauty. Let us now consider the general case in which we have a spatially varying velocity function v(x, y, z) ¼ v(r). This velocity function represents a velocity field. For a fixed constant v0, the equation v(r) ¼ v0 specifies those positions, r, where velocity has this fixed value. The locus of such positions makes up an isovelocity surface. The gradient @v @v @v , , rv(r) ¼ @x @y @z is normal to the isovelocity surface and points in the direction of the greatest increase in velocity. Similarly, the equation n(r) ¼ n0 for a fixed value of slowness n0 specifies an isoslowness surface. The gradient @n @n @n , , rn(r) ¼ @x @y @z is normal to the isoslowness surface and points in the direction of greatest increase in slowness. The isovelocity and isoslowness surfaces coincide, and rv ¼ n2 rn so the respective gradients point in the opposite direction, as we would expect. A seismic ray makes its way through the slowness field. As the wavefront progresses in time, the raypath is bent according to the slowness field. For example, suppose we have a stratified Earth in which the slowness decreases with depth, a vertical raypath does not bend, as it is pulled equally in all lateral directions. However, a nonvertical ray drags on its slow side, therefore it curves away from the vertical and bends toward the horizontal. This is the case of a diving wave, whose raypath eventually curves enough to reach the Earth’s surface again. Certainly, the slowness field, together with the initial direction of the ray, determines the entire raypath. Except in special cases, however, we must determine such raypaths numerically. Assume that we know the slowness function n(r) and that we know the ray direction u1 at point r1. We now want to derive an algorithm for finding the ray direction u2 at point r2. We choose a small, but finite, change in path length Ds. Then we use the first ray equation, which we recall is dr ¼u ds to compute the change Dr ¼ r2 r1. The required approximation is Dr ¼ u1 Ds

Construction of Seismic Images by Ray Tracing

73

or r2 ¼ r1 þ u1 Ds We have thus found the first desired quantity r2. Next, we use the second ray equation, which we recall is d(nu) ¼ rn ds in the form d(nu) ¼ rn ds The required approximation is D(nu) ¼ (rn) Ds or n(r2 )u2 n(r1 )u1 ¼ rn Ds For accuracy, rn may be evaluated by differentiating the known function n(r) midway between r1 and r2. Thus, the desired u2 is given as u2 ¼

n(r1 ) Ds u1 þ rn n(r2 ) n(r2 )

Note that the vector u1 is pulled in the direction of rn in forming u2, that is, the ray drags on the slow side, and so is bent in the direction of increasing slowness. The special case of no bending occurs when u1 and rn are parallel. As we have seen, a vertical wave in a horizontally stratified medium is an example of such a special case. We have thus found how to advance the wave along the ray by an incremental raypath distance. We can repeat the algorithm to advance the wave by any desired distance.

4.11

Conclusions

The acquisition of seismic data in many promising areas yields raw traces that cannot be interpreted. The reason is that signal-generated noise conceals the desired primary reflections. The solution to this problem was found in the 1950s with the introduction of the first commercially available digital computers and the signal-processing method of deconvolution. Digital signal-enhancement methods, and, in particular, the various methods of deconvolution, are able to suppress signal-generated noise on seismic records and bring out the desired primary-reflected energy. Next, the energy of the primary reflections must be moved to the spatial positions of the subsurface reflectors. This process, called migration, involves the superposition of all the deconvolved traces according to a scheme similar to Huygens’s construction. The result gives the reflecting horizons and other features that make up the desired image. Thus, digital processing, as it is

74

Signal and Image Processing for Remote Sensing

currently done, is roughly divided into two main parts, namely signal enhancement and event migration. The day is not far off, provided that research actively continues in the future as it has in the past, when the processing scheme will not be divided into two parts, but will be united as a whole. The signal-generated noise consists of physical signals that future processing should not destroy, but utilize. When all the seismic information is used in an integrated way, then the images produced will be even more excellent.

References Robinson, E.A., The MIT Geophysical Analysis Group from inception to 1954, Geophysics, 70, 7 JA–30JA, 2005. Yilmaz, O., Seismic Data Processing, Society of Exploration Geophysicists, Tulsa, 1987.

5 Multi-Dimensional Seismic Data Decomposition by Higher Order SVD and Unimodal ICA Nicolas Le Bihan, Valeriu Vrabie, and Je´roˆme I. Mars

CONTENTS 5.1 Introduction ......................................................................................................................... 76 5.2 Matrix Data Sets.................................................................................................................. 76 5.2.1 Acquisition............................................................................................................... 77 5.2.2 Matrix Model........................................................................................................... 77 5.3 Matrix Processing ............................................................................................................... 78 5.3.1 SVD ........................................................................................................................... 78 5.3.1.1 Definition.................................................................................................. 78 5.3.1.2 Subspace Method.................................................................................... 78 5.3.2 SVD and ICA........................................................................................................... 79 5.3.2.1 Motivation ................................................................................................ 79 5.3.2.2 Independent Component Analysis ...................................................... 79 5.3.2.3 Subspace Method Using SVD–ICA...................................................... 81 5.3.3 Application .............................................................................................................. 82 5.4 Multi-Way Array Data Sets............................................................................................... 85 5.4.1 Multi-Way Acquisition .......................................................................................... 86 5.4.2 Multi-Way Model ................................................................................................... 86 5.5 Multi-Way Array Processing ............................................................................................ 87 5.5.1 HOSVD..................................................................................................................... 87 5.5.1.1 HOSVD Definition .................................................................................. 87 5.5.1.2 Computation of the HOSVD................................................................. 88 5.5.1.3 The (rc, rx, rt)-rank................................................................................... 89 5.5.1.4 Three-Mode Subspace Method............................................................. 90 5.5.2 HOSVD and Unimodal ICA ................................................................................. 90 5.5.2.1 HOSVD and ICA..................................................................................... 91 5.5.2.2 Subspace Method Using HOSVD–Unimodal ICA ............................ 91 5.5.3 Application to Simulated Data............................................................................. 92 5.5.4 Application to Real Data ....................................................................................... 97 5.6 Conclusions........................................................................................................................ 100 References ................................................................................................................................... 100

75

Signal and Image Processing for Remote Sensing

76

5.1

Introduction

This chapter describes multi-dimensional seismic data processing using the higher order singular value decomposition (HOSVD) and partial (unimodal) independent component analysis (ICA). These techniques are used for wavefield separation and enhancement of the signal-to-noise ratio (SNR) in the data set. The use of multi-linear methods such as the HOSVD is motivated by the natural modeling of a multi-dimensional data set using multiway arrays. In particular, we present a multi-way model for signals recorded on arrays of vector-sensors acquiring seismic vibrations in different directions of the 3D space. Such acquisition schemes allow the recording of the polarization of waves and the proposed multi-way model ensures the effective use of polarization information in the processing. This leads to a substantial increase in the performances of the separation algorithms. Before introducing the multi-way model and processing, we first describe the classical subspace method based on the SVD and ICA techniques for 2D (matrix) seismic data sets. Using a matrix model for these data sets, the SVD-based subspace method is presented and it is shown how an extra ICA step in the processing allows better wavefield separation. Then, considering signals recorded on vector-sensor arrays, the multi-way model is defined and discussed. The HOSVD is presented and some properties detailed. Based on this multi-linear decomposition, we propose a subspace method that allows separation of polarized waves under orthogonality constraints. We then introduce an ICA step in the process that is performed here uniquely on the temporal mode of the data set, leading to the so-called HOSVD–unimodal ICA subspace algorithm. Results on simulated and real polarized data sets show the ability of this algorithm to surpass a matrix-based algorithm and subspace method using only the HOSVD. Section 5.2 presents matrix data sets and their associated model. In Section 5.3, the wellknown SVD is detailed, as well as the matrix-based subspace method. Then, we present the ICA concept and its contribution to subspace formulation in Section 5.3.2. Applications of SVD–ICA to seismic wavefield separation are discussed by way of illustrations. Section 5.4 exposes how signal mixtures recorded on vector-sensor arrays can be described by a multi-way model. Then, in Section 5.5, we introduce the HOSVD and the associated subspace method for multi-way data processing. As in the matrix data set case, an extra ICA step is proposed leading to a HOSVD–unimodal ICA subspace method in Section 5.5.2. Finally, in Section 5.5.3 and Section 5.5.4, we illustrate the proposed algorithm on simulated and real multi-way polarized data sets. These examples emphasize the potential of using both HOSVD and ICA in multi-way data set processing.

5.2

Matrix Data Sets

In this section, we show how the signals recorded on scalar-sensor arrays can be modeled as a matrix data set having two modes or diversities: time and distance. Such a model allows the use of subspace-based processing using a SVD of the matrix data set. Also, an additional ICA step can be added to the processing to relax the unjustified orthogonality constraint for the propagation vectors by imposing a stronger constraint of (fourth-order) independence of the estimated waves. Illustrations of these matrix algebra techniques are presented on a simulated data set. Application to a real ocean bottom seismic (OBS) data set can be found in Refs. [1,2].

Multi-Dimensional Seismic Data Decomposition by Higher Order SVD and Unimodal ICA 77 5.2.1

Acquisition

In geophysics, the most commonly used method to describe the structure of the earth is seismic reflection. This method provides images of the underground in 2D or 3D, depending on the geometry of the network of sensors used. Classical recorded data sets are usually gathered into a matrix having a time diversity describing the time or depth propagation through the medium at each sensor and a distance diversity related to the aperture of the array. Several methods exist to gather data sets and the most popular are common shotpoint gather, common receiver gather, or common midpoint gather [3]. Seismic processing consists in a series of elementary processing procedures used to transform field data, usually recorded in common shotpoint gather into a 2D or 3D common midpoint stacked 2D signals. Before stacking and interpretation, part of the processing is used to suppress unwanted coherent signals like multiple waves, ground-roll (surface waves), refracted waves, and also to cancel noise. To achieve this goal, several filters are classically applied on seismic data sets. The SVD is a popular method to separate an initial data set into signal and noise subspaces. In some applications [4,5] when wavefield alignment is performed, the SVD method allows separation of the aligned wave from the other wavefields.

5.2.2

Matrix Model

Consider a uniform linear array composed of Nx omni-directional sensors recording the contributions of P waves, with P < Nx. Such a record can be written mathematically using a convolutive model for seismic signals first suggested by Robinson [6]. Using the superposition principle, the discrete-time signal xk(m) (m is the time index) recorded on sensor k is a linear combination of the P waves received on the array together with an additive noise nk(m):

xk (m) ¼

P X

aki si (m mki ) þ nk (m)

(5:1)

i¼1

where si(m) is the ith source waveform that has been propagated through the transfer function supposed here to consist in a delay mki and a factor attenuation aki. The noises on each sensor nk(m) are supposed centered, Gaussian, spatially white, and independent of the sources. In the sequel, the use of the SVD to separate waves is only of significant interest if the subspace occupied by the part of interest contained in the mixture is of low rank. Ideally, the SVD performs well when the rank is 1. Thus, to ensure good results of the process, a preprocessing is applied on the data set. This consists of alignment (delay correction) of a chosen high amplitude wave. Denoting the aligned wave by s1(m), the model becomes after alignment:

yk (m) ¼ ak1 s1 (m) þ

P X

aki si (m m0ki ) þ n0k (m)

(5:2)

i¼2 0 where yk(m) ¼ xk(m þ mk1), mki ¼ mki mk1 and nk0 (m) ¼ nk(m þ mk1). In the following we assume that the wave s1(m) is independent from the others and therefore independent from si(m mki0 ).

Signal and Image Processing for Remote Sensing

78

Considering the simplified model of the received signals (Equation 5.2) and supposing Nt time samples available, we define the matrix model of the recorded data set Y 2 RNx Nt as Y ¼ {ykm ¼ yk (m)j 1 k Nx , 1 m Nt }

(5:3)

That is, the data matrix Y has rows that are the Nx signals yk(m) given in Equation 5.2. Such a model allows the use of matrix decomposition, and especially the SVD, for its processing.

5.3

Matrix Processing

We now present the definition of the SVD of such a data matrix that will be of use for its decomposition into orthogonal subspaces and in the associated wave separation technique. 5.3.1

SVD

As the SVD is a widely used matrix algebra technique, we only recall here theoretical remarks and redirect readers interested in computational issues to the Golub and Van Loan book [7]. 5.3.1.1

Definition

Any matrix Y 2 RNx Nt can be decomposed into the product of three matrices as follows: Y ¼ UDVT

(5:4)

where U is a Nx Nx matrix, D is an Nx Nt pseudo-diagonal matrix with singular values {l1, l2 , . . . , lN} on its diagonal, satisfying l1 l2 . . . lN 0, (with N ¼ min(Nx, Nt)), and V is an Nt Nt matrix. The columns of U (respectively of V) are called the left (respectively right) singular vectors, uj (respectively vj), and form orthonormal bases. Thus U and V are orthogonal matrices. The rank r (with r N) of the matrix Y is given by the number of nonvanishing singular values. Such a decomposition can also be rewritten as Y¼

r X

lj uj vTj

(5:5)

j¼1

where uj (respectively vj) are the columns of U (respectively V). This notation shows that the SVD allows any matrix to be expressed as a sum of r rank-1 matrices1. 5.3.1.2

Subspace Method

The SVD has been widely used in signal processing [8] because it gives the best rank approximation (in the least squares sense) of a given matrix [9]. This property allows denoising if the signal subspace is of relatively low rank. So, the subspace method consists of decomposing the data set into two orthogonal subspaces with the first one built from the p singular vectors related to the p highest singular values being the best rank approximation of the original data. This can be written as follows, using the SVD notation used in Equation 5.5, for a data matrix Y with rank r: 1

Any matrix made up of the product of a column vector by a row vector is a matrix whose rank is equal to 1 [7].

Multi-Dimensional Seismic Data Decomposition by Higher Order SVD and Unimodal ICA 79

Y ¼ YSignal þ YNoise ¼

p X j ¼1

lj uj vTj þ

r X

lj uj vTj

(5:6)

j¼pþ1

Orthogonality between the subspaces spanned by the two sets of singular vectors is ensured by the fact that left and right singular vectors form orthonormal bases. From a practical point of view, the value of p is chosen by finding an abrupt change of slope in the curve of relative singular values (relative meaning percentile representation of) contained in the matrix D defined in Equation 5.4. For some cases where no ‘‘visible’’ change of slope can be found, the value of p can be fixed at 1 for a perfect alignment of waves, or at 2 for an imperfect alignment or for dispersive waves [10].

5.3.2

SVD and ICA

The motivation to relax the unjustified orthogonality constraint for the propagation vectors is now presented. ICA is the method used to achieve this by imposing a fourthorder independence on the estimated waves. This provides a new subspace method based on SVD–ICA. 5.3.2.1 Motivation The SVD of the data matrix Y in Equation 5.4 provides two orthogonal matrices composed by the left uj (respectively right vj) singular vectors. Note here that vj are called estimated waves because they give the time dependence of received signals by the array sensor and uj propagation vectors because they give the amplitude of vj0s on sensors [2]. As SVD provides orthogonal matrices, these vectors are also orthogonal. Orthogonality of the vj0s means that the estimated waves are decorrelated (second-order independence). Actually, this supports the usual cases in geophysical situations, in which recorded waves are supposed decorrelated. However, there is no physical reason to consider the orthogonality of propagation vectors uj. Why should we have different recorded waves with orthogonal propagation vectors? Furthermore, imposing the orthogonality of uj0s, the estimated waves vj are forced to be a mixture of recorded waves [1]. One way to relax this limitation is to impose a stronger criterion for the estimated waves, that is, to be fourth-order statistically independent, and consequently to drop the unjustified orthogonality constraint for the propagation vectors. This step is motivated by cases encountered in geophysical situations, where the recorded signals can be approximated as an instantaneous linear mixture of unknown waves supposed to be mutually independent [11]. This can be done using ICA.

5.3.2.2 Independent Component Analysis ICA is a blind decomposition of a multi-channel data set composed of an unknown linear mixture of unknown source signals, based on the assumption that these signals are mutually statistically independent. It is used in blind source separation (BSS) to recover independent sources (modeled as vectors) from a set of recordings containing linear combinations of these sources [12–15]. The statistical independence of sources means that the cross-cumulants of any order vanish. Generally, the third-order cumulants are discarded because they are generally close to zero. Therefore, here we will use fourth-order statistics, which have been found to be sufficient for instantaneous mixtures [12,13].

Signal and Image Processing for Remote Sensing

80

ICA is usually resolved by a two-step algorithm: prewhitening followed by high-order step. The first one consists in extracting decorrelated waves from the initial data set. The step is carried out directly by an SVD as the vj0s are orthogonal. The second step consists in finding a rotation matrix B, which leads to fourth-order independence of the estimated waves. We suppose here that the nonaligned waves in the data set Y are contained in a subspace of dimension R 1, smaller than the rank r of Y. Assuming this, only the first R estimated waves [v1 , . . . , vR ]notation ¼ VR 2 RNt R are taken into account [2]. As the recorded waves are supposed mutually independent, this second step can be written as ~ R ¼ [v ~ 1 , . . . ,v ~R ] 2 RNt R VR B ¼ V

(5:7)

Error (%) of the estimated signal subspace

with B 2 RR R the rotation (unitary) matrix having the property BBT ¼ BTB ¼ I. The ~j are now independent at the fourth order. new estimated waves v There are different methods of finding the rotation matrix: joint approximate diagonalization of eigenmatrices (JADE) [12], maximal diagonality (MD) [13], simultaneous thirdorder tensor diagonalization (STOTD) [14], fast and robust fixed-point algorithms for independent component analysis (FastICA) [15], and so on. To compare some cited ICA algorithms, Figure 5.1 shows the relative error (see Equation 5.12) of the estimated signal subspace versus the SNR (see Equation 5.11) for the data set presented in Section 5.3.3. For SNRs greater than 7.5 dB, FastICA using a ‘‘tan h’’ nonlinearity with the parameter equal to 1 in the fixed-point algorithm provides the smallest relative error, but with some erroneous points at different SNR. Note that the ‘‘tan h’’ nonlinearity is the one which gives the smallest error for this data set, compared with ‘‘pow3’’, ‘‘gauss’’ with the parameter equal to 1, or ‘‘skew’’ nonlinearities. MD and JADE algorithms are approximately equivalent according to the relative error. For SNRs smaller than 7.5 dB, MD provides the smallest relative error. Consequently, the MD algorithm was employed in the following. Now, considering the SVD decomposition in Equation 5.5 and the ICA step in Equation 5.7, the subspace described by the first R estimated waves can be rewritten as

6 JADE FastlCA–tan h MD

5 4 3 2 1 0 –20

–15

FIGURE 5.1 ICA algorithms—comparison.

–10

–5 0 SNR (dB) of the dataset

5

10

15

Multi-Dimensional Seismic Data Decomposition by Higher Order SVD and Unimodal ICA 81 R X

R R X T R T X ~ ~jv ~iv ~Tj ¼ ~Ti lj uj vTj ¼ UR DR VR ¼ UR DR B V ¼ bj u bi u

j ¼1

j¼1

(5:8)

i¼1

where UR ¼ [u1, . . . , uR] is made up of the first R vectors of the matrix U and DR ¼ diag (l1, . . . , lR) is the R R truncated version of D containing the greatest values of lj. The ~ j are the new second equality is obtained using Equation 5.7. For the third equality, the u propagation vectors obtained as the normalized2 vectors (columns) of the matrix URDRB and bj are the ‘‘modified singular values’’ obtained as the ‘2 -norm of the columns of the matrix URDRB. The elements bj are usually not ordered. For this reason, a permutation between the ~ j as well as between the vectors v ~j is performed to order the modified singular vectors u values. Denoting with s() this permutation and with i ¼ s(j), the last equality of Equation 5.8 is obtained. In this decomposition, which is similar to that given by Equation 5.5, a stronger ~i has been imposed, that is, to be independent at criterion for the new estimated waves v the fourth order, and, at the same time, the condition of orthogonality for the new ~ i has been relaxed. propagation vectors u In practical situations, the value of R becomes a parameter. Usually, it is chosen to completely describe the aligned wave by the first R estimated waves given by the SVD. 5.3.2.3 Subspace Method Using SVD–ICA After the ICA and the permutation steps, the signal subspace is given by ~ Signal ¼ Y

~p X

~iv ~Ti bi u

(5:9)

i ¼1

where ~ p is the number of the new estimated waves necessary to describe the aligned wave. ~ Noise is obtained by subtraction of the signal subspace Y ~ Signal from The noise subspace Y the original data set Y: ~ Noise ¼ Y Y ~ Signal Y

(5:10)

~ is chosen by finding an abrupt change of From a practical point of view, the value of p slope in the curve of relative modified singular values. For cases with low SNR, no ‘‘visible’’ change of slope can be found and the value of ~p can be fixed at 1 for a perfect alignment of waves, or at 2 for an imperfect alignment or for dispersive waves. Note here that for very small SNR of the initial data set, (for example, smaller than 6.2 dB for the data set presented in Section 5.3.3, the aligned wave can be described by a less energetic estimated wave than by the first one (related to the highest singular value). For these extreme cases, a search must be done after the ICA and the permutation steps to ~i give the aligned identify the indexes for which the corresponding estimated waves v ~ Signal in Equation 5.9 must be redefined by choosing the wave. So the signal subspace Y index values found in the search. For example, applying the MD algorithm to the data set presented in Section 5.3.3 for which the SNR was modified to 9 dB, the aligned wave is ~3. Note also that using SVD without ICA in the described by the third estimated wave v same conditions, the aligned wave is described by the eighth estimated wave v8.

2

Vectors are normalized by their ‘2-norm.

Signal and Image Processing for Remote Sensing

82 5.3.3

Application

An application to a simulated data set is presented in this section to illustrate the behavior of the SVD–ICA versus the SVD subspace method. Application to a real data set obtained during an acquisition with OBS can be found in Refs. [1,2]. The preprocessed recorded signals Y on an 8-sensor array (Nx ¼ 8) during Nt ¼ 512 time samples are represented in Figure 5.2c. This synthetic data set was obtained by the addition of an original signal subspace S (Figure 5.2a) made up by a wavefront having infinite celerity (velocity), consequently associated with the aligned wave s1(m), and an original noise subspace N (Figure 5.2b) made up by several nonaligned wavefronts. These nonaligned waves are contained in a subspace of dimension 7, smaller than the rank of Y, which equals 8. The SNR ratio of the presented data set is SNR ¼ 3.9 dB. The SNR definition used here is3: SNR ¼ 20 log10

kSk kNk

(5:11)

Normalization to unit variance of each trace for each component was done before applying the described subspace methods. This ensures that even weak picked arrivals are well represented within the input data. After the computation of signal subspaces, a denormalization was applied to find the original signal subspace. Firstly, the SVD subspace method was tested. The subspace method given by Equation 5.6 was employed, keeping only one singular vector (respectively one singular value). This choice was made by finding an abrupt change of slope after the first singular value (Figure 5.6) in the relative singular values for this data set. The obtained signal subspace YSignal and noise subspace YNoise are presented in Figure 5.3a and Figure 5.3b. It is clear

Distance (sensors)

Distance (sensors)

Distance (sensors) 1

2

3

4

5

200 250 300 350

Time (samples)

200 250 300 350

Time (samples)

200 250 300 350

Time (samples)

400 450 500

400 450 500

400 450 500

qﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ SIi¼1 SJj¼1 a2ij is the Frobenius norm of the matrix A ¼ {aij} 2 RI J

6 7

100 150

100 150

100 150

kAk ¼

50

50

50

(b) Original noise subspace N

FIGURE 5.2 Synthetic data set.

3

8 0

1

2

3

4

5

6 7

8 0

1

2

3

4

5

6 7

8 0

(a) Original signal subspace S

(c) Recorded signals Y = S ⫹ N

Multi-Dimensional Seismic Data Decomposition by Higher Order SVD and Unimodal ICA 83 Distance (sensors)

Distance (sensors)

1

2

3

4

5

6

7

50

50

50

100 150

100 150

100 150

200 250 300 350

Time (samples)

200 250 300 350

Time (samples)

200 250 300 350

Time (samples)

400 450

400 450

400 450

500

500

500

(b) Estimated noise subspace YNoise

8

0

1

2

3

4

5

6 7

8 0

1

2

3

4

5

6

8 0

(a) Estimated signal subspace YSignal

Index (i )

(c) Estimated waves vj

FIGURE 5.3 Results obtained using the SVD subspace method.

from these figures that the classical SVD implies artifacts in the two estimated subspaces for a wavefield separation objective. Moreover, the estimated waves vj shown in Figure 5.3c are an instantaneous linear mixture of the recorded waves. ~ Signal and noise subspace Y ~ Noise obtained using the SVD–ICA The signal subspace Y subspace method given by Equation 5.9 are presented in Figure 5.4a and Figure 5.4b. This improvement is due to the fact that using ICA we have imposed a fourth-order independence condition stronger than the decorrelation used in classical SVD. With this subspace method we have also relaxed the nonphysically justified orthogonality of the propagation vectors. The dimension R of the rotation matrix B was chosen to be eight because the aligned wavelight is projected on all eight estimated waves vj shown in Figure 5.3c. After the ICA ~i are presented in Figure 5.4c. As and the permutation steps, the new estimated waves v we can see, the first one describes the aligned wave ‘‘perfectly’’. As no visible change of slope can be found in the relative modified singular values shown in Figure 5.6, the value of ~ p was fixed at 1 because we are dealing with a perfectly aligned wave. To compare the results qualitatively, the stack representation is usually employed [5]. Figure 5.5 shows, from left to right, the stacks on the initial data set Y, the original signal subspace S, and the estimated signal subspaces obtained with SVD and SVD–ICA sub~ Signal is very space methods, respectively. As the stack on the estimated signal subspace Y close to the stack on the original signal subspace S, we can conclude that the SVD–ICA subspace method enhances the wave separation results. To compare these methods quantitatively, we use the relative error « of the estimated signal subspace defined as Signal k2 kS Y «¼ (5:12) kSk2

Signal and Image Processing for Remote Sensing

84

Distance (sensors)

Distance (sensors)

1

2

50

50

100 150

100 150

100 150

200 250 300 350

Time (samples)

200 250 300 350

Time (samples)

200 250 300 350

Time (samples)

400 450

400 450

400 450

500

500

500

FIGURE 5.4 Results obtained using the SVD-ICA subspace method.

(c) Estimated waves ~ vi

3

4

5

6

0

0

50

(b) ~ Estimated noise subspace YNoise

0 50 100 150 250 300

Time (samples)

200 350 400 450 500

FIGURE 5.5 Stacks. From left to right: initial data set Y, original signal subspace S, SVD, and SVD–ICA estimated subspaces.

7

8

1

2

3

4

5

6 7

8

1

2

3

4

5

6

7

8

0

(a) Estimated signal subspace ~ YSignal

Index (i )

Multi-Dimensional Seismic Data Decomposition by Higher Order SVD and Unimodal ICA 85

Relative singular values (%)

30 SVD SVD–ICA

25 20 15 10 5 0

1

2

3

4 5 Index ( j )

6

7

8

FIGURE 5.6 Relative singular values.

where kk is the matrix Frobenius norm defined above, S is the original signal subspace Signal represents either the estimated signal subspace YSignal obtained using SVD or and Y ~ Signal obtained using SVD–ICA. For the data set presented the estimated signal subspace Y in Figure 5.2, we obtain « ¼ 55.7% for classical SVD and « ¼ 0.5% for SVD–ICA. The SNR of this data set was modified by keeping the initial noise subspace constant and by adjusting the energy of the initial signal subspace. The relative errors of the estimated signal subspaces versus the SNR are plotted in Figure 5.7. For SNRs greater than 17 dB, the two methods are equivalent. For smaller SNR, the SVD–ICA subspace method is obviously better than the SVD subspace method. It provides a relative error lower than 1% for SNRs greater than 10 dB. Note here that for other data sets, the SVD–ICA performance can be degraded by the unfulfilled independence assumption supposed for the aligned wave. However, for small SNR of the data set, the SVD–ICA usually gives better performances than SVD. The ICA step leads to a fourth-order independence of the estimated waves and relaxes the unjustified orthogonality constraint for the propagation vectors. This step in the process enhances the wave separation results and minimizes the error on the estimated signal subspace, especially when the SNR ratio is low.

5.4

Multi-Way Array Data Sets

Error (%) of the estimated signal subspace

We now turn to the modelization and processing of data sets having more than two modes or diversities. Such data sets are recorded by arrays of vector-sensors (also called multicomponent sensors) collecting, in addition to time and distance information, the polarization information. Note that there exist other acquisition schemes that output multi-way (or multi-dimensional, multi-modal) data sets, but they are not considered here.

100

SVD SVD–ICA

90 80 70 60 50 40 30 20 10 0 –30

–20

–10 0 10 SNR (dB) of the data set

20

30

FIGURE 5.7 Relative error of the estimated subspaces.

Signal and Image Processing for Remote Sensing

86 5.4.1

Multi-Way Acquisition

In seismic acquisition campaigns, multi-component sensors have been used for more than ten years now. Such sensors allow the recording of the polarization of seismic waves. Thus, arrays of such sensors provide useful information about the nature of the propagated wavefields and allow a more complete description of the underground structures. The polarization information is very useful to differentiate and characterize waves in signal, but the specific (multi-component) nature of the data sets has to be taken into account in the processing. The use of vector-sensor arrays provides data sets with time, distance, and polarization modes, which are called trimodal or three-mode data sets. Here we propose to use a multi-way model to model and process them.

5.4.2

Multi-Way Model

To keep the trimodal (multi-dimensional) structure of data sets originated from vectorsensor arrays in their processing, we propose a multi-way model. This model is an extension of the one proposed in Section 5.2.2 Thus, a three-mode data set is modeled as a multi-way array of size Nc Nx Nt, where Nc is the number of components of each sensor used to recover the vibrations of the wavefield in the three directions of the 3D space, Nx is the number of sensors of the vector-sensor array, and Nt is the number of time samples. Note that the number of components is defined by the vector-sensor configuration. As an example, for the illustration shown in Section 5.5.3, Nc ¼ 2 because one geophone and one hydrophone were used, while Nc ¼ 3 in Section 5.5.4 because three geophones were used. Supposing that the propagation of waves only introduces delay and attenuation, the signal recorded on the cth component (c ¼ 1, . . . ,Nc) of the kth sensor (k ¼ 1, . . . ,Nx), using the superposition principle and assuming that P waves impinge on the array of vector-sensors, can be written as P X xck (m) ¼ acki si (m mki ) þ nck (m) (5:13) i¼1

where acki represents the attenuation of the ith wave on the cth component of the kth sensor of the array. si(m) is the ith wave and mki is the delay observed at sensor k. The time index is m. nck(m) is the noise, supposed Gaussian, centered, spatially white, and independent of the waves. As in the matrix processing approach, preprocessing is needed to ensure low rank of the signal subspace and to ensure good results for a subspace-based processing method. Thus, a velocity correction applied on the dominant waveform (compensation of mk1) leads for the signal recorded on component c of sensor k to: yck (m) ¼ ack1 s1(m) þ

P X

acki si (m m0ki ) þ n0ck (m)

(5:14)

i¼2

where yck(m) ¼ xck(m þ mk1), mki0 ¼ mki mk1, and n0ck(m) ¼ nck(m þ mk1). In the sequel, the wave s1(m) is considered independent from other waves and from the noise. The subspace method developed thereafter will intend to isolate and estimate correctly this wave. Thus, three-mode data sets recorded during Nt time samples on vector-sensor arrays made up by Nx sensors each one having Nc components can be modeled as multi-way arrays Y 2 RNc Nx Nt:

Multi-Dimensional Seismic Data Decomposition by Higher Order SVD and Unimodal ICA 87 Y ¼ fyckm ¼ yck (m)j1 c Nc , 1 k Nx , 1 m Nt g

(5:15)

This multi-way model can be used for extension of subspace method separation to multicomponent data sets.

5.5

Multi- Way A rray Processing

Multi-way data analysis arose firstly in the field of psychometrics with Tucker [16]. It is still an active field of research and it has found applications in many areas such as chemometrics, signal processing, communications, biological data analysis, food industry, etc. It is an admitted fact that there exists no exact extension of the SVD for multi-way arrays of dimension greater than 2. Instead of such an extension, there exist mainly two decompositions: PARAFAC [17] and HOSVD [14,16]. The first one is also known as CANDECOMP as it gives a canonical decomposition of a multi-way array, that is, it expresses the multi-way array into a sum of rank-1 arrays. Note that the rank-1 arrays in the PARAFAC decomposition may not be orthogonal, unlike in the matrix case. The second multi-way array decomposition, HOSVD, gives orthogonal bases in the three ways of the array but is not a canonical decomposition as it does not express the original array into a sum of rank-1 arrays. However, in the sequel, we will make use of the HOSVD because of the orthogonal bases that allow extension of well-known subspace methods based on SVD to multi-way datasets.

5.5.1

HOSVD

We now introduce the HOSVD that was formulated and studied in detail in Ref. [14]. We give particular attention to the three-mode case because we will process such data in the sequel, but an extension to the multi-dimensional case exists [14]. One must notice that in the trimodal case the HOSVD is equivalent to the TUCKER3 model [16]; however, the HOSVD has a formulation that is more familiar in the signal processing community as its expression is given in terms of matrices of singular vectors just as in the SVD in the matrix case. 5.5.1.1 HOSVD Definition Consider a multi-way array Y 2 RNc

Nx Nt

, the HOSVD of Y is given by

Y ¼ C 1 V(c) 2 V(x) 3 V(t)

(5:16)

where C 2 RNc Nx Nt is called the core array and V(i) ¼ [v(i)1, . . . , v(i)j, . . . , v(i)ri] 2 RNi ri are matrices containing the singular vectors v(i)j 2 RNi of Y in the three modes (i ¼ c, x, t). These matrices are orthogonal, V(i)V(i)T ¼ I, just as in the matrix case. A schematic representation of the HOSVD is given in Figure 5.8. The core array C is the counterpart of the diagonal matrix D in the SVD case in Equation 5.4, except that it is not hyperdiagonal but fulfils the less restrictive property of being allorthogonal. All-orthogonality is defined as hCi¼ Ci¼ i ¼ 0 where i ¼ c, x, t and a 6¼ b kCi¼1 k kCi¼2 k kCi¼ri k 0,8i

(5:17)

Signal and Image Processing for Remote Sensing

88

rc Nc V(c ) Nc

Nx

y

V(x ) Nx

FIGURE 5.8 HOSVD of a three-mode data set Y.

rc

rx

Nt

rx

rt

C rt

V(t ) Nt

where h.,.i is the classical scalar product between matrices4 and kk is the matrix Frobenius norm defined in Section 5.2 (because here we deal with three-mode data and the ‘‘slices’’ Ci ¼ a define matrices). Thus, hCi ¼ a, Ci bi ¼ 0 corresponds to orthogonality between slices of the core array. Clearly, the all-orthogonality property consists of orthogonality between two slices (matrices) of the core array cut in the same mode and ordering of the norm of these slices. This second property is the counterpart of the decreasing arrangement of the singular values in the SVD [7], with the special property of being valid here for norms of slices of C and in its three modes. As a consequence, the ‘‘energy’’ of the three-mode data set Y is concentrated at the (1,1,1) corner of the core array C. The notation n in Equation 5.16 is called the n-mode product and there are three such products (namely 1, 2, and 3), which can be defined for the three-mode case. Given a multi-way array A 2 RI1 I2 I3, then the three possible n-mode products of A with matrices are: P (A 1 B)ji2 i3 ¼ ai1 i2 i3 bji1 i1

(A 2 C)i1 ji3 ¼ (A 3 D)i1 i2 j ¼

P

ai1 i2 i3 cji2

i2

P i3

ai1 i2 i3 dji3

(5:18)

where B 2 RJ I1, C 2 RJ I2, and D 2 RJ I3. This is a general notation in (multi-)linear algebra and even the SVD of a matrix can be expressed with such a product. For example, the SVD given in Equation 5.4 can be rewritten, using n-mode products, as Y ¼ D 1 U 2 V [10]. 5.5.1.2

Computation of the HOSVD

The problem of finding the elements of a three-mode decomposition was originally solved using alternate least square (ALS) techniques (see Ref. [18] for details). It was only in Ref. [14] that a technique based on unfolding matrix SVDs was proposed. We present briefly here a way to compute the HOSVD using this approach. From a multi-way array Y 2 RNc Nx Nt, it is possible to build three unfolding matrices, with respect to the three modes c, x, and t, in the following way: 8 < Y(c) 2 RNc Nx Nt Nc Nx Nt (5:19) ) Y(x) 2 RNx Nt Nc Y2R : Y(t) 2 RNt Nc Nx 4

hA, Bi ¼ SiI ¼ 1 Sj J ¼ 1aijbij is the scalar product between the matrices A ¼ {aij} 2 RI J and B ¼ {bij} 2 RI J.

Multi-Dimensional Seismic Data Decomposition by Higher Order SVD and Unimodal ICA 89 Nx Nc Y(c )

Nc

Nx Nt

Nx

Nt

Nt

Nc Nx Nt

Y(x )

Nc

Nc

Nc

Nt

Nx Nt

Y(t ) FIGURE 5.9 Schematic representation of the unfolding matrices.

Nx

A schematic representation of these unfolding matrices is presented in Figure 5.9. Then, these three unfolding matrices admit the following SVD decompositions: YT(i)

SVD

U(i) D(i) VT(i)

(5:20)

where each matrix V(i) (i ¼ c, x, t) in Equation 5.16 is the right matrix given by the SVD of T each transposed unfolding matrix Y(i) . The choice of the transpose was made to keep a homogeneous notation between matrix and multi-way processing. The matrices V(i) define orthonormal bases in the three modes of the vector space RNc Nx Nt. The core array is then directly obtained using the formula: C ¼ Y 1 VT(c) 2 VT(x) 3 VT(t)

(5:21)

The singular values contained in the three matrices D(i) (i ¼ c, x, t) in Equation 5.20 are called three-mode singular values. Thus, the HOSVD of a multi-way array can be easily obtained from the SVDs of the unfolding matrices, which makes computing of this decomposition easy using already existing algorithms.

5.5.1.3 The (rc, rx, rt)-rank Given a three-mode data Y 2 RNc Nx Nt, one gets three unfolded matrices Y(c), Y(x), and Y(t), 0 with respective ranks rc, rx, and rt. That is, the rj s are given as the number of nonvanishing singular values contained in the matrices D(i) in Equation 5.20, with i ¼ c, x, t. As mentioned before, the HOSVD is not a canonical decomposition and so is not related to the generic rank (number of rank-1 arrays that lead to the original array by linear combination). Nevertheless, the HOSVD gives other information named, in the threemode case, the three-mode rank. The three-mode rank consists of a triplet of ranks: the (rc, rx, rt)-rank, which is made up of the ranks of matrices Y(c), Y(x), and Y(t) in the HOSVD. In the sequel, the three-mode rank will be of use for determination of subspace dimensions, and so will be the counterpart of the classical rank used in matrix processing techniques.

Signal and Image Processing for Remote Sensing

90 5.5.1.4

Three-Mode Subspace Method

As in the matrix case, it is possible, using the HOSVD, to define a subspace method that decomposes the original three-mode data set into orthogonal subspaces. Such a technique was first proposed in Ref. [10] and can be stated as follows. Given a threemode data set Y 2 RNcNxNt, it is possible to decompose it into the sum of a signal and a noise subspace as Y ¼ YSignal þ YNoise

(5:22)

with a weaker constraint than in the matrix case (orthogonality), which is a mode orthogonality, that is, orthogonality in the three modes [10] (between subspaces defined using the unfolding matrices). Just as in the matrix case, the signal and noise subspaces are formed by different vectors obtained from the decomposition of the data set. In the threemode case, the signal subspace YSignal is built using the first pc rc singular vectors in the first mode, px rx in the second, and pt rt in the third: YSignal ¼ Y 1 PVpc 2 PVpx 3 PVpt (x)

(c)

(t)

(5:23)

with PVpi the projectors given by (i)

p

pT

PVpi ¼ V(i)i V(i)i (i)

(5:24)

pi where V(i) ¼ [ v(i)1, . . . , v(i)pi] are the matrices containing the first pi singular vectors (i ¼ c, x, t). Then after estimation of the signal subspace, the noise subspace is simply obtained by subtraction, that is, YNoise ¼ Y YSignal. The estimation of the signal subspace consists in finding a triplet of values pc, px, pt that allows recovery of the signal part by the (pc, px, pt)-rank truncation of the original data set Y. This truncation is obtained by classical matrix truncation of the three SVDs of the unfolding matrices. However, it is important to note that such a truncation is not the best (p1, p2, p3)-rank truncation of the data [10]. Nevertheless, the decomposition of the original three-mode data set is possible and leads, under some assumptions, to the separation of the recorded wavefields. From a practical point of view, the choice of pc, px, pt values is made by finding abrupt changes of the slope in the curves of relatives of three-mode singular values (the three sets of singular values contained in the matrices D(i)). For some special cases for which no ‘‘visible’’ change of slope can be found, the value of pc can be fixed at 1 for a linear polarization of the aligned wavefield (denoted by s1(m) ), or at 2 for an elliptical polarization [10]. The value of px can be fixed at 1 and the value of pt can be fixed at 1 for a perfect alignment of waves, or at 2 for not an imperfect alignment or for dispersive waves. As in the matrix case, the HOSVD-based subspace method decomposes the original space of the data set into orthogonal subspaces, and so following the ideas developed in Section 5.3.2, it is possible to add an ICA step to modify the orthogonal constraint in the temporal mode.

5.5.2

HOSVD and Unimodal ICA

To enhance wavefield separation results, we now introduce a unimodal ICA step following the HOSVD-based subspace decomposition.

Multi-Dimensional Seismic Data Decomposition by Higher Order SVD and Unimodal ICA 91 5.5.2.1 HOSVD and ICA The SVD of the unfolded matrix Y(t) in Equation 5.20 provides two orthogonal matrices U(t) and V(t) made up by the left and right singular vectors u(t)j and v(t)j. As for the SVD, the v(t)js0 are the estimated waves and u(t)j’s are the propagation vectors. Based on the same motivations as in the SVD case, the unjustified orthogonality constraint for the propagation vectors can be relaxed by imposing a fourth-order independence for the estimated waves. Assuming the recorded waves are mutually independent, we can write: ~ R ¼ [v ~(t)1 , . . . ,v ~(t)R ] VR(t) B ¼ V (t)

(5:25)

with B 2 RR R the rotation (unitary) matrix given by one of the algorithms presented in R Section 5.3.2 and V(t) ¼ [v(t)1, . . . ,v(t)R] 2 RNt R made up by the first R vectors of V(t). Here we also suppose that the nonaligned waves in the unfolded matrix Y(t) are contained in a subspace of dimension R 1, smaller than the rank rt of Y(t). After the ICA step, a new matrix can be computed: h i ~ R ; VNt R ~ (t) ¼ V V (t) (t)

(5:26)

R ~ (t) ~(t)j of the matrix V This matrix is made up of the R vectors v , which are independent at the fourth order, and by the last Nt R vectors v(t)j of the matrix V(t), which are kept unchanged. The HOSVD–unimodal ICA decomposition is defined as

~ 1 V(c) 2 V(x) 3 V ~ (t) Y¼C

(5:27)

~ ¼ Y 1 VT 2 VT 3 V ~T C (c) (x) (t)

(5:28)

with

Unimodal ICA implies here that ICA is only performed on one mode (the temporal mode). ~ (t) As in the SVD case, a permutation s(.) between the vectors of V(c), V(x), respectively, V must be performed for ordering the Frobenius norms of the subarrays (obtained by fixing ~ . Hence, we keep the same decomposition structure as one index) of the new core array C in relations given in Equation 5.16 and Equation 5.21, the only difference is that we have modified the orthogonality into a fourth-order independence constraint for the first R estimated waves on the third mode. Note that the (rc,rx,rt)-rank of the three-mode data set Y is unchanged. 5.5.2.2 Subspace Method Using HOSVD–Unimodal ICA On the temporal mode, a new projector can be computed after the ICA and the permutation steps: T

~ ~ ~pt ¼ V ~ ~pt V ~ ~pt P (t) (t) V

(5:29)

(t)

~ pt ~ (t) ~t] is the matrix containing the first p~t estimated waves, which where V ¼ [~ v(t)1, . . . ,~ v(t)p ~ (t) defined in Equation 5.26. Note that the two projectors on the first are the columns of V two modes Pvpc and Pvpx , given by Equation 5.24, must be recomputed after the ðcÞ ðcÞ permutation step.

Signal and Image Processing for Remote Sensing

92

The signal subspace using the HOSVD–unimodal ICA is thus given as ~ Signal ¼ Y 1 P pc 2 P px 3 P ~ ~ ~pt Y V V V (c)

(x)

(5:30)

(t)

~ Noise is obtained by the subtraction of the signal subspace from and the noise subspace Y the original three-mode data: ~ Signal ~ Noise ¼ Y Y Y

(5:31)

In practical situations, as in the SVD–ICA subspace case, the value of R becomes a parameter. It is chosen to fully describe the aligned wave by the first R estimated waves v(t)j obtained while using the HOSVD. The choice of the pc, px, and ~ pt values is made by finding abrupt changes of the slope in the curves of modified three-mode singular values, obtained after the ICA and the permutation steps. Note that in the HOSVD–unimodal ICA subspace method, the rank for the signal subspace in the third mode, ~pt, is not necessarily equal to the rank pt obtained using only the HOSVD. For some special cases for which no ‘‘visible’’ change of slope can be found, the value of ~pt can be fixed at 1 for a perfect alignment of waves, or at 2 for an imperfect alignment or for dispersive waves. As in the HOSVD subspace method, px can be fixed at 1 and pc can be fixed at 1 for a linear polarization of the aligned wavefield, or at 2 for an elliptical polarization. Applications to simulated and real data are presented in the following sections to illustrate the behavior of the HOSVD–unimodal ICA method in comparison with component-wise SVD (SVD applied on each component of the multi-way data separately) and HOSVD subspace methods. 5.5.3

Application to Simulated Data

This simulation represents a multi-way data set Y 2 R218256 composed of Nx ¼ 18 sensors each recording two directions (Nc ¼ 2) in the 3D space for a duration of Nt ¼ 256 time samples. The first component is related to a geophone Z and the second one to a hydrophone Hy. The Z component was scaled by 5 to obtain the same amplitude range. This data set shown in Figure 5.10c and Figure 5.11c has polarization, distance, and time as modes. It was obtained by the addition between an original signal subspace S with the two components shown in Figure 5.10a and Figure 5.11a, respectively, and an original noise subspace N (Figure 5.10b and Figure 5.11b respectively) obtained from a real geophysical acquisition after subtraction of aligned waves. The original signal subspace is made of several wavefronts having infinite apparent velocity, associated with the aligned wave. The relation between Z and Hy is a linear relation, which is assimilated to the wave polarization (polarization mode) in the sense that it consists of phase and amplitude relations between the two components. Wave amplitudes vary along the sensors, simulating attenuation along the distance mode. The noise is uncorrelated from one sensor to the other (spatially white) and also unpolarized. The SNR ratio of this data set is SNR ¼ 3 dB, where the SNR definition is: SNR ¼ 20 log10

kSk kNk

(5:32)

where kk is the multi-way array Frobenius norm5, S, and N the original signal and noise subspaces. 5

For any multi-way array X ¼ {xijk} 2 RIJK, his Frobenius norm is kXk ¼

qﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ j Sii¼1 Sj¼1 SKk¼1 x2ijk .

Multi-Dimensional Seismic Data Decomposition by Higher Order SVD and Unimodal ICA 93 Distance (sensors)

Distance (sensors)

Distance (sensors) 0

2 4

6 8

10

12 14

16 18

0

2 4

6 8

10

12 14

16 18

0

2 4

6 8

10

12 14

16 18

0

0

0

50

50

50

100 150

150

150

Time (samples)

100

Time (samples)

100

Time (samples)

200

200

200

250

250

250

(b) Original noise subspace N

(a) Original signal subspace S

(c) Initial dataset Y

FIGURE 5.10 Simulated data: the Z component.

Our aim is to recover the original signal (Figure 5.10a and Figure 5.11a) from the mixture, which is, in practice, the only data available. Note that normalization to unit variance of each trace for each component was done before applying the described subspace methods. This ensures that even weak peaked arrivals are well represented

Distance (sensors)

Distance (sensors)

0

2

4

6

50

50

100 150

150

Time (samples)

100

150

Time (samples)

100

Time (samples)

200

200

200

250

250

250

(c) Initial dataset Y

8

10

12

14

0

0

50

(b) Original noise subspace N

16

18

0

2

4

6

8

10

12

14

16

18

0

2

4

6

8

10

12

14

16

18 0

(a) Original signal subspace S FIGURE 5.11 Simulated data: the Hy component.

Distance (sensors)

Signal and Image Processing for Remote Sensing

94

Distance (sensors)

Distance (sensors)

0

2 4

6 8

10

12 14

16 18 0

0 2

4

6 8

10

12

14

16

18 0

50

50

100

150

150

Time (samples)

100

Time (samples)

200

200

250

250

(b) Hy component

(a) Z component FIGURE 5.12 Signal subspace using component-wise SVD.

within the input data. After computation of signal subspaces, a denormalization was applied to find the original signal subspace. Firstly, the SVD subspace method (described in Section 5.3) was applied separately on Z and Hy components of the mixture. The signal subspace components obtained keeping only one singular value for the two components are presented in Figure 5.12. This choice was made by finding an abrupt change of slope after the first singular value in the relative singular values shown in Figure 5.13 for each seismic 2D signal. The waveforms are not well recovered in respect of the original signal components (Figure 5.10a and Figure 5.11a). One can also see that the distinction between different wavefronts is not possible. Furthermore, no arrival time estimation is possible using this technique. Low signal level is a strong handicap for a component-wise process. Applied on each component separately, the SVD subspace method does not find the same aligned polarized wave. The results depend therefore on the characteristics of each matrix signal. Using the SVD–ICA subspace method, the estimated aligned waves may be improved, but we can be confronted with the same problem. 20 15 10 5 0

1

2

4

6

8

10

12

14

16

18

1

2

4

6

8

10

12

14

16

18

20 15 10

FIGURE 5.13 Relative singular values. Top: Z component. Bottom: Hy component.

5 0

Multi-Dimensional Seismic Data Decomposition by Higher Order SVD and Unimodal ICA 95 Distance (sensors)

1

2

3

4

50

50

50

100 150

150

Time (samples)

100

150

Time (samples)

100

Time (samples)

200

200

200

250

250

250

(b) Hy component of YSignal

5

0

0

2 4

6 8

10

12 14

0

(a) Z component of YSignal

Index ( j )

0

0

2 4

6 8

10

12 14

16 18

16 18

Distance (sensors)

(c) Estimated waves V(t )j

FIGURE 5.14 Results obtained using the HOSVD subspace method.

Using the HOSVD subspace method, the components of the estimated signal subspace YSignal are presented in Figure 5.14a and Figure 5.14b. In this case, the number of singular vectors kept are: one on polarization, one on distance, and one on time mode, giving a (1,1,1)-rank truncation for the signal subspace. This choice is motivated here by the linear polarization of the aligned wave. For the other two modes the choice was made by finding an abrupt change of slope after the first singular value (Figure 5.17a). There still remain some ‘‘oscillations’’ between the different wavefronts for the two components of the estimated signal subspace, that may induce some detection errors. An ICA step is required in this case to obtain a better signal separation and to cancel parasitic oscillations. ~ Signal In Figure 5.15a and Figure 5.15b, the wavefronts of the estimated signal subspace Y obtained with the HOSVD–unimodal ICA technique are very close to the original signal components. Here, the ICA method was applied on the first R ¼ 5 estimated waves shown in Figure 5.14c. These waves describe the aligned waves of the original signal subspace S. ~(t)j are shown in Figure 5.15c. As we can see, the After the ICA step, the estimated waves v ~(t)1 describes more precisely the aligned wave of the original subspace S than the first one v first estimated wave v(t)1 before the ICA step (Figure 5.14c). The estimation of signal subspace is more accurate and the aligned wavelet can be better estimated with our proposed procedure. After the permutation step, the relative singular values on the three modes in the HOSVD–unimodal ICA case are shown in Figure 5.17b. This figure justifies the choice of a (1,1,1)-rank truncation for the signal subspace, due to the linear polarization and the abrupt changes of the slopes for the other two modes. As for the bidimensional case, to compare these methods quantitatively, we use the relative error « of the estimated signal subspace defined as

Signal and Image Processing for Remote Sensing

96

Distance (sensors)

1

2

3

0 50

50

50

100 150

150

150

Time (samples)

100

Time (samples)

100

Time (samples)

200

200

200

250

250

250

~

(a) Z component of YSignal

4

5

0

2 4

6 8

10

12 14

Index ( j )

0

0

16 18

0

2

4

6

8

10

12

14

16

18

Distance (sensors)

~

(b) Hy component of YSignal

~

(c) Estimated waves v(t )j

FIGURE 5.15 Results obtained using the HOSVD–unimodal ICA subspace method. Signal 2

«¼

kS Y

k

kSk

(5:33)

2

where kk is the multi-way array Frobenius norm defined above, S is the original signal Signal represents the estimated signal subspaces obtained with the SVD, subspace, and Y HOSVD, and HOSVD–unimodal ICA methods, respectively. For this data set we obtain « ¼ 21.4% for the component-wise SVD, « ¼ 12.4% for HOSVD, and « ¼ 3.8% for HOSVD–unimodal ICA. We conclude that the ICA step minimizes the error on the estimated signal subspace. To compare the results qualitatively, the stack representation is employed. Figure 5.16 shows for each component, from left to right, the stacks on the initial data set Y, the original signal subspace S, and the estimated signal subspaces obtained with the component-wise SVD, HOSVD, and HOSVD–unimodal ICA subspace methods, respectively. 0

0

50

50

100 150

150

200

200

250

250

(a) Z component

Time (samples)

100

Time (samples)

FIGURE 5.16 Stacks. From left to right: initial data set Y, original signal subspace S, SVD, HOSVD, and HOSVD–unimodal ICA estimated subspaces, respectively.

(b) Hy component

Multi-Dimensional Seismic Data Decomposition by Higher Order SVD and Unimodal ICA 97 60 50 40 1 30 20 10 0 1 15 10 5 0 1

2

2

4

5

6

10

8

15

10

12

20

14

25

16

30

18

36

(a) Using HOSVD

60 50 40 1 30 20 10 0 1 15 10 5 0 1

2

2

4

5

6

10

8

15

10

20

12

14

25

(b) Using HOSVD−unimodal ICA

16

30

18

36

FIGURE 5.17 Relative three-mode singular values.

~ Signal is very close to the stack on the original signal As the stack on the estimated signal subspace Y subspace S, we conclude that the HOSVD–unimodal ICA subspace method enhances the wave separation results.

5.5.4

Application to Real Data

We consider now a real vertical seismic profile (VSP) geophysical data set. This 3C data set was recorded by Nx ¼ 50 sensors with the depth sampling 10 m, each one made up by Nc ¼ 3 geophones recording three directions in the 3D space: X, Y, and Z, respectively. The recording time was 700 msec, corresponding to Nt ¼ 175 time samples. The Z component was scaled seven times to obtain the same amplitude range. After the preprocessing step (velocity correction based on the direct downgoing wave), the obtained data set Y 2 R350175 is shown in Figure 5.18. As in the simulation case, normalization and denormalization of each trace for each component were done before and after applying the different subspace methods. From the original data set we have constructed three seismic 2D matrix signals representing the three components of the data set Y. The SVD subspace method presented in Section 5.3.1 was applied on each matrix signal, keeping only one singular vector (respectively one singular value) for each one, due to an abrupt change of slope after the first singular value in the curves of relative singular values. As remarked in the simulation case, the SVD–ICA subspace method may improve the estimated aligned waves, but we will not find the same aligned polarized wave for all seismic matrix signals. For the HOSVD subspace method, the estimated signal subspace YSignal can be defined as a (2,1,1)-rank truncation of the data set. This choice is motivated here by the elliptical polarization of the aligned wave. For the other two modes the choice was made by finding an abrupt change of slope after the first singular value (Figure 5.20a). Using the HOSVD–unimodal ICA, the ICA step was applied here on the first R ¼ 9 estimated waves v(t)j shown in Figure 5.19a. As suggested, R becomes a parameter

Signal and Image Processing for Remote Sensing

98 Distance (m)

0 50

100 150

200 250

300 350 400

450 500 0

0 50

100 150

200 250

300 350 400

450 500 0

0 50

100 150

200 250

300 350 400

450 500 0

0.1

0.1

0.1

0.2

0.2

0.2

0.3 0.4

Time (s)

0.4

Time (s)

0.3

0.4

Time (s)

0.3

0.5

0.5

0.5

0.6

0.6

0.6

0.7

0.7

0.7

(a) X component

Distance (m)

Distance (m)

(b) Y component

(c) Z component

FIGURE 5.18 Real VSP geophysical data Y.

~(t)j shown in Figure 5.19b are while using real data. However, the estimated waves v more realistic (shorter wavelet and no side lobes) than those obtained without ICA. Due to the elliptical polarization and the abrupt change of slope after the first singular value for the other two modes (Figure 5.20b), the estimated signal subspace ~ Signal is defined as a (2,1,1)-rank truncation of the data set. This step enhances Y the wave separation results, implying a minimization of the error on the estimated signal subspace. When we deal with a real data set, only a qualitative comparison is possible. This is allowed by a stack representation. Figure 5.21 shows the stacks for the X, Y, and Z components, respectively, on the initial trimodal data Y and on the estimated signal

Index ( j )

0.2

0.2

0.3 0.4 Time (s)

0.3 0.4 Time (s) 0.5

0.5

0.6

0.6

0.7

0.7

~

(b) After ICA, v(t )j

1

3

5

7

0.1

0.1

(a) Before ICA, v(t )j

9

0

1

3

5

7

9

0

FIGURE 5.19 The first 9 estimated waves.

Index ( j )

Multi-Dimensional Seismic Data Decomposition by Higher Order SVD and Unimodal ICA 99 60 40 20 1

2

3

20 10 01 10

10

20

30

40

50

5 0

1

20

40

60

80

100

120

140

(a) Using HOSVD 60 40 20 1 20

2

3

10 0 1 10

10

20

30

40

50

5 0 1

20

40

60

80

100

120

140

(b) Using HOSVD–unimodal ICA

FIGURE 5.20 Relative three-mode singular values.

subspaces given by the component-wise SVD, HOSVD, and HOSVD–unimodal ICA methods, respectively. The results on simulated and real data suggest that the three-dimensional subspace methods are more robust than the component-wise techniques because they exploit the relationship between the components directly in the process. Also, the fourth-order independence constraint of the estimated waves enhances the wave separation results 0

0

0

0.1

0.1

0.1

0.2

0.2

0.2

0.4

Time (s)

0.5

0.5

0.5

0.6

0.6

0.6

0.7

0.7

0.7

(b) Y component

0.3

0.4

Time (s)

0.3

0.4

Time (s)

0.3

(a) X component

(c) Z component

FIGURE 5.21 Stacks. From left to right: initial data set Y, SVD, HOSVD, and HOSVD–unimodal ICA estimated subspaces, respectively.

Signal and Image Processing for Remote Sensing

100

and minimizes the error on the estimated signal subspace. This emphasizes the potential of the HOSVD subspace method associated with a unimodal ICA step for vector-sensor array signal processing.

5.6

Conclusions

We have presented a subspace processing technique for multi-dimensional seismic data sets based on HOSVD and ICA. It is an extension of well-known subspace separation techniques for 2D (matrix) data sets based on SVD and more recently on SVD and ICA. The proposed multi-way technique can be used for the denoising and separation of polarized waves recorded on vector-sensor arrays. A multi-way (three-mode) model of polarized signals recorded on vector-sensor arrays allows us to take into account the additional polarization information in the processing and thus to enhance the separation results. A decomposition of three-mode data sets into all-orthogonal subspaces has been proposed using HOSVD. An extra unimodal ICA step has been introduced to minimize the error on the estimated signal subspace and to improve the separation result. Also, we have shown on simulated and real data sets that the proposed approach gives better results than the component-wise SVD subspace method. The use of multi-way array and associated decompositions for multi-dimensional data set processing is a powerful tool and ensures the extra dimension is fully taken into account in the process. This approach could be generalized to any multi-dimensional signal modelization and processing and could take advantage of recent work on tensors and multi-way array decomposition and analysis.

References 1. Vrabie, V.D., Statistiques d’ordre supe´rieur: applications en ge´ophysique et e´lectrotechnique, Ph.D. thesis, I.N.P. of Grenoble, France, 2003. 2. Vrabie, V.D., Mars, J.I., and Lacoume, J.-L., Modified singular value decomposition by means of independent component analysis, Signal Processing, 84(3), 645, 2004. 3. Sheriff, R.E. and Geldart, L.P., Exploration Seismology, Vol. 1 & 2, Cambridge University Press, 1982. 4. Freire, S. and Ulrich, T., Application of singular value decomposition to vertical seismic profiling, Geophysics, 53, 778–785, 1988. 5. Glangeaud, F., Mari, J.-L., and Coppens, F., Signal processing for geologists and geophysicists, Editions Technip, Paris, 1999. 6. Robinson, E.A. and Treitel, S., Geophysical Signal Processing, Prentice Hall, 1980. 7. Golub, G.H. and Van Loan, C.F., Matrix Computation, Johns Hopkins, 1989. 8. Moonen, M. and De Moor, B., SVD and signal processing III, in Algorithms, Applications and Architecture, Elsevier, Amsterdam, 1995. 9. Scharf, L.L., Statistical signal processing, detection, estimation and time series analysis, in Electrical and Computer Engineering: Digital Signal Processing Series, Addison Wesley, 1991. 10. Le Bihan, N. and Ginolhac, G., Three-mode dataset analysis using higher order subspace method: application to sonar and seismo-acoustic signal processing, Signal Processing, 84(5), 919–942, 2004. 11. Vrabie, V.D., Le Bihan, N., and Mars, J.I., 3D-SVD and partial ICA for 3D array sensors, Proceedings of the 72nd Annual International Meeting of the Society of Exploration Geophysicists, Salt Lake City, 2002, 1065.

Multi-Dimensional Seismic Data Decomposition by Higher Order SVD and Unimodal ICA 101 12. Cardoso, J.-F. and Souloumiac, A., Blind beamforming for non-Gaussian signals, IEE Proc.-F, 140(6), 362, 1993. 13. Comon, P., Independent component analysis, a new concept? Signal Processing, 36(3), 287, 1994. 14. De Lathauwer, L., Signal processing based on multilinear algebra, Ph.D. thesis, Katholieke Universiteit Leuven, 1997. 15. Hyva¨rinen, A. and Oja, E., Independent component analysis: algorithms and applications, Neural Networks, 13(4–5), 411, 2000. 16. Tucker, L.R., The extension of factor analysis to three-dimensional matrices, H. Gulliksen and N. Frederiksen (Eds.), Contributions to mathematical psychology series, Holt, Rinehart and Winston, 109–127, 1964. 17. Caroll, J.D. and Chang, J.J., Analysis of individual differences in multidimensional scaling via n-way generalization of Eckart–Young decomposition, Psychometrika, 35(3), 283–319, 1970. 18. Kroonenberg, P.M., Three-Mode Principal Component Analysis, DSWO Press, Leiden, 1983.

6 Application of Factor Analysis in Seismic Profiling

Zhenhai Wang and Chi Hau Chen

CONTENTS 6.1 Introduction to Seismic Signal Processing.................................................................... 104 6.1.1 Data Acquisition ................................................................................................... 104 6.1.2 Data Processing..................................................................................................... 105 6.1.2.1 Deconvolution ....................................................................................... 105 6.1.2.2 Normal Moveout................................................................................... 105 6.1.2.3 Velocity Analysis .................................................................................. 106 6.1.2.4 NMO Stretching .................................................................................... 106 6.1.2.5 Stacking .................................................................................................. 106 6.1.2.6 Migration................................................................................................ 106 6.1.3 Interpretation......................................................................................................... 107 6.2 Factor Analysis Framework ............................................................................................ 107 6.2.1 General Model....................................................................................................... 107 6.2.2 Within the Framework ........................................................................................ 109 6.2.2.1 Principal Component Analysis........................................................... 109 6.2.2.2 Independent Component Analysis .................................................... 110 6.2.2.3 Independent Factor Analysis .............................................................. 111 6.3 FA Application in Seismic Signal Processing .............................................................. 111 6.3.1 Marmousi Data Set............................................................................................... 111 6.3.2 Velocity Analysis, NMO Correction, and Stacking ........................................ 112 6.3.3 The Advantage of Stacking................................................................................. 114 6.3.4 Factor Analysis vs. Stacking ............................................................................... 114 6.3.5 Application of Factor Analysis........................................................................... 116 6.3.5.1 Factor Analysis Scheme No. 1 ............................................................ 116 6.3.5.2 Factor Analysis Scheme No. 2 ............................................................ 116 6.3.5.3 Factor Analysis Scheme No. 3 ............................................................ 118 6.3.5.4 Factor Analysis Scheme No. 4 ............................................................ 118 6.3.6 Factor Analysis vs. PCA and ICA ..................................................................... 120 6.4 Conclusions........................................................................................................................ 122 References ................................................................................................................................... 122 Appendices ................................................................................................................................. 124 6.A Upper Bound of the Number of Common Factors.................................................... 124 6.B Maximum Likelihood Algorithm.................................................................................. 125

103

Signal and Image Processing for Remote Sensing

104

6.1

Introduction to Seismic Signal Processing

Formed millions of years ago from plants and animals that died and decomposed beneath soil and rock, fossil fuels, namely, coal and petroleum, due to their low cost availability, will remain the most important energy resource for at least another few decades. Ongoing petroleum research continues to focus on science and technology needs for increased petroleum exploration and production. The petroleum industry relies heavily on subsurface imaging techniques for the location of these hydrocarbons.

6.1.1

Data Acquisition

Many geophysical survey techniques exist, such as multichannel reflection seismic profiling, refraction seismic survey, gravity survey, and heat flow measurement. Among them, reflection seismic profiling method stands out because of its target-oriented capability, generally good imaging results, and computational efficiency. These reflectivity data resolve features such as faults, folds, and lithologic boundaries measured in 10s of meters, and image them laterally for 100s of kilometers and to depths of 50 kilometers or more. As a result, seismic reflection profiling becomes the principal method by which the petroleum industry explores for hydrocarbon-trapping structures. The seismic reflection method works by processing echoes of seismic waves from boundaries between different Earth’s subsurfaces that characterize different acoustic impedances. Depending on the geometry of surface observation points and source locations, the survey is called a 2D or a 3D seismic survey. Figure 6.1 shows a typical 2D seismic survey, during which, a cable with attached receivers at regular intervals is dragged by a boat. The source moves along the predesigned seismic lines and generates seismic waves at regular intervals such that points in the subsurfaces are sampled several times by the receivers, producing a series of seismic traces. These seismic traces are saved on magnetic tapes or hard disks in the recording boat for future processing.

Receivers Water

Bottom

Subsurface 1 Subsurface 2 FIGURE 6.1 A typical 2D seismic survey.

Source

Application of Factor Analysis in Seismic Profiling 6.1.2

105

Data Processing

Seismic data processing has been regarded as having a flavor of interpretive character; it is even considered as an art [1]. However, there is a well-established sequence for standard seismic data processing. Deconvolution, stacking, and migration are the three principal processes that make up the foundation. Besides, some auxiliary processes can also help improve the effectiveness of the principal processes. In the following subsections, we briefly discuss the principal processes and some auxiliary processes. 6.1.2.1 Deconvolution Deconvolution can improve the temporal resolution ing the basic seismic wavelet to approximately a spike on the field data [2]. Deconvolution usually applied deconvolution. It is also a common practice to apply which is named poststack deconvolution.

of seismic data by compressand suppressing reverberations before stack is called prestack deconvolution to stacked data,

6.1.2.2 Normal Moveout Consider the simplest case where the subsurfaces of the Earth are horizontal, and within this layer, the velocity is constant. Here x is the distance (offset) between the source and the receiver positions, and v is the velocity of the medium above the reflecting interface. Given the midpoint location M, let t(x) be the traveltime along the raypath from the shot position S to the depth point D, then back to the receiver position G. Let t(0) be twice the traveltime along the vertical path MD. Utilizing the Pythagorean theorem, the traveltime equation as a function of offset is t2 (x) ¼ t2 (0) þ x2 =v2

(6:1)

Note that the above equation describes a hyperbola in the plane of two-way time vs. offset. A common-midpoint (CMP) gather are the traces whose raypaths associated with each source–receiver pair reflect from the same subsurface depth point D. The difference between the two-way time at a given offset t(x) and the two-way zero-offset time t(0) is called NMO. From Equation 6.1, we see that velocity can be computed when offset x and the two-way times t(x) and t(0) are known. Once the NMO velocity is estimated, the travletimes can be corrected to remove the influence of offset. DtNMO ¼ t(x) t(0) Traces in the NMO-corrected gather are then summed to obtain a stack trace at the particular CMP location. The procedure is called stacking. Now consider the horizontally stratified layers, with each layer’s thickness defined in terms of two-way zero-offset time. Given the number of layers N, interval velocities are represented as (v1, v2, . . . , vN). Considering the raypath from source S to depth D, back to receiver R, associated with offset x at midpoint location M, Equation 6.1 becomes t2 (x) ¼ t2 (0) þ x2= v2rms

(6:2)

where the relation between the rms velocity and the interval velocity is represented by

Signal and Image Processing for Remote Sensing

106

v2rms ¼

N 1 X v2 Dti (0) t(0) i¼1 i

where Dti is the vertical two-way time through the ith layer and t(0) ¼

i P

Dtk .

k¼1

6.1.2.3

Velocity Analysis

Effective correction for normal moveout depends on the use of accurate velocities. In CMP surveys, the appropriate velocity is derived by computer analysis of the moveout in the CMP gathers. Dynamic corrections are implemented for a range of velocity values and the corrected traces are stacked. The stacking velocity is defined as the velocity value that produces the maximum amplitude of the reflection event in the stack of traces, which clearly represents the condition of successful removal of NMO. In practice, NMO corrections are computed for narrow time windows down the entire trace, and for a range of velocities, to produce a velocity spectrum. The validity for each velocity value is assessed by calculating a form of multitrace correlation between the corrected traces of the CMP gathers. The values are shown contoured such that contour peaks occur at times corresponding to reflected wavelets and at velocities that produce an optimum stacked wavelet. By picking the location of the peaks on the velocity spectrum plot, a velocity function defining the increase of velocity with depth for that CMP gather can be derived. 6.1.2.4

NMO Stretching

After applying NMO correction, a frequency distortion appears, particularly for shallow events and at large offsets. This is called NMO stretching. The stretching is a frequency distortion where events are shifted to lower frequencies, which can be quantified as Df=f ¼ DtNMO =t(0)

(6:3)

where f is the dominant frequency, Df is change in frequency, and DtNMO is given by Equation 6.2. Because of the waveform distortion at large offsets, stacking the NMOcorrected CMP gather will severely damage the shallow events. Muting the stretched zones in the gather can solve this problem, which can be carried out by using the quantitative definition of stretching given in Equation 6.3. An alternative method for optimum selection of the mute zone is to progressively stack the data. By following the waveform along a certain event and observing where changes occur, the mute zone is derived. A trade-off exists between the signal-to-noise (SNR) ratio and mute, that is, when the SNR is high, more can be muted for less stretching; otherwise, when the SNR is low, a large amount of stretching is accepted to catch events on the stack. 6.1.2.5

Stacking

Among the three principal processes, CMP stacking is the most robust of all. Utilizing redundancy in CMP recording, stacking can significantly suppress uncorrelated noise, thereby increasing the SNR ratio. It also can attenuate a large part of the coherent noise in the data, such as guided waves and multiples. 6.1.2.6

Migration

On a seismic section such as that illustrated in Figure 6.2, each reflection event is mapped directly beneath the midpoint. However, the reflection point is located beneath the midpoint only if the reflector is horizontal. With a dip along the survey line the actual

Application of Factor Analysis in Seismic Profiling

107

x S

M

G

Surface

Reflector D

FIGURE 6.2 The NMO geometry of a single horizontal reflector.

reflection point is displaced in the up-dip direction; with a dip across the survey line the reflection point is displaced out of the plane of the section. Migration is a process that moves dipping reflectors into their true subsurface positions and collapses diffractions, thereby depicting detailed subsurface features. In this sense, migration can be viewed as a form of spatial deconvolution that increases spatial resolution. 6.1.3

Interpretation

The goal of seismic processing and imaging is to extract the reflectivity function of the subsurface from the seismic data. Once the reflectivity is obtained, it is the task of the seismic interpreter to infer the geological significance of a certain reflectivity pattern.

6.2

Factor Analysis Framework

Factor analysis (FA), a branch of multivariate analysis, is concerned with the internal relationships of a set of variates [3]. Widely used in psychology, biology, chemometrics1 [4], and social science, the latent variable model provides an important tool for the analysis of multivariate data. It offers a conceptual framework within which many disparate methods can be unified and a base from which new methods can be developed. 6.2.1

General Model

In FA the basic model is x ¼ As þ n

(6:4)

where x ¼ (x1, x2, . . . , xp)T is a vector of observable random variables (the test scores), s ¼ (s1, s2, . . . , sr)T is a vector r < p unobserved or latent random variables (the common factor scores), A is a (p r) matrix of fixed coefficients (factor loadings), n ¼ (n1, n2, . . . , np)T is a vector of random error terms (unique factor scores of order p). The means are usually set to zero for convenience so that E(x) ¼ E(s) ¼ E(n) ¼ 0. The random error term consists 1 Chemometrics is the use of mathematical and statistical methods for handling, interpreting, and predicting chemical data.

Signal and Image Processing for Remote Sensing

108

of errors of measurement and the unique individual effects associated with each variable xj, j ¼ 1, 2, . . . , p. For the present model we assume that A is a matrix of constant parameters and s is a vector of random variables. The following assumptions are usually made for the factor model [5]:

.

rank (A) ¼ r < p E(xjs) ¼ As

.

E(xxT) ¼ S, E(ssT) ¼ V and

.

2 6 6 ¼ E nnT ¼ 6 4

s21

0 s22

0

..

. s2p

3 7 7 7 5

(6:5)

That is, the errors are assumed to be uncorrelated. The common factors however are generally correlated, and V is therefore not necessarily diagonal. For the sake of convenience and computational efficiency, the common factors are usually assumed to be uncorrelated and of unit variance, so that V ¼ I. .

E(snT) ¼ 0 so that the errors and common factors are uncorrelated.

From the above assumptions, we have E xxT ¼ S ¼ E (As þ n)(As þ n)T ¼ E AssT AT þ AsnT þ nsT AT þ nnT ¼ AE ssT AT þ AE snT þ E nsT AT þ E nnT ¼ AVAT þ E nnT ¼Gþ

(6:6)

where G ¼ AVAT and ¼ E(nnT) are the true and error covariance matrices, respectively. In addition, postmultiplying Equation 6.4 by sT, considering the expectation, and using assumptions (6.3) and (6.4), we have E xsT ¼ E AssT þ nsT ¼ AE ssT þ E nsT ¼ AV

(6:7)

For the special case of V ¼ I, the covariance between the observation and the latent variables simplifies to E(xsT) ¼ A. A special case is found when x is a multivariate Gaussian; the second moments of Equation 6.6 will contain all the information concerning the factor model. The factor model Equation 6.4 will be linear, and given the factors s the variables x are conditionally independent. Let s 2 N(0, I), the conditional distribution of x is xjs 2 N(As, )

(6:8)

Application of Factor Analysis in Seismic Profiling

109

or 1 p(xjs) ¼ (2p)p=2 jj1=2 exp (x As)T 1 (x As) 2

(6:9)

with conditional independence following from the diagonality of . The common factors s therefore reproduce all covariances (or correlations) between the variables, but account for only a portion of the variance. The marginal distribution for x is found by integrating the hidden variables s, or ð p(x) ¼ p(xjs)p(s) ds 1 T p=2 T 1=2 T 1 (6:10) ¼ (2p) þ AA exp x þ AA x 2 The calculation is straightforward because both p(s) and p(xjs) are Gaussian. 6.2.2

Within the Framework

Many methods have been developed for estimating the model parameters for the special case of Equation 6.8. Unweighted least square (ULS) algorithm [6] is based on minimizing the sum of squared differences between the observed and estimated correlation matrices, not counting the diagonal. Generalized least square (GLS) [6] algorithm is adjusting ULS by weighting the correlations inversely according to their uniqueness. Another method, maximum likelihood (ML) algorithm [7], uses a linear combination of variables to form factors, where the parameter estimates are those most likely to have resulted in the observed correlation matrix. More details on the ML algorithm can be found in Appendix 6.B. These methods are all of second order, which find the representation using only the information contained in the covariance matrix of the test scores. In most cases, the mean is also used in the initial centering. The reason for the popularity of the second-order methods is that they are computationally simple, often requiring only classical matrix manipulations. Second-order methods are in contrast to most higher order methods that try to find a meaningful representation. Higher order methods use information on the distribution of x that is not contained in the covariance matrix. The distribution of fx must not be assumed to be Gaussian, because all the information of Gaussian variables is contained in the first two-order statistics from which all the high order statistics can be generated. However, for more general families of density functions, the representation problem has more degrees of freedom, and much more sophisticated techniques may be constructed for non-Gaussian random variables. 6.2.2.1 Principal Component Analysis Principal component analysis (PCA) is also known as the Hotelling transform or the Karhunen–Loe`ve transform. It is widely used in signal processing, statistics, and neural computing to find the most important directions in the data in the mean-square sense. It is the solution of the FA problem with minimum mean-square error and an orthogonal weight matrix. The basic idea of PCA is to find the r p linearly transformed components that provide the maximum amount of variance possible. During the analysis, variables in x are transformed linearly and orthogonally into an equal number of uncorrelated new variables in e. The transformation is obtained by finding the latent roots and vectors of either the covariance or the correlation matrix. The latent roots, arranged in descending order of magnitude, are

Signal and Image Processing for Remote Sensing

110

equal to the variances of the corresponding variables in e. Usually the first few components account for a large proportion of the total variance of x, accordingly, may then be used to reduce the dimensionality of the original data for further analysis. However, all components are needed to reproduce accurately the correlation coefficients within x. Mathematically, the first principal component e1 corresponds to the line on which the projection of the data has the greatest variance e1 ¼ arg max

kak¼1

T X

eT x

2

(6:11)

t¼1

The other components are found recursively by first removing the projections to the previous principal components: ek ¼ arg max

kek¼1

X

" e

T

x

k1 X

!#2 ei eTi x

(6:12)

i¼1

In practice, the principal components are found by calculating the eigenvectors of the covariance matrix S of the data as in Equation 6.6. The eigenvalues are positive and they correspond to the variances of the projections of data on the eigenvectors. The basic task in PCA is to reduce the dimension of the data. In fact, it can be proven that the representation given by PCA is an optimal linear dimension reduction technique in the mean-square sense [8,9]. The kind of reduction in dimension has important benefits [10]. First, the computational complexity of the further processing stages is reduced. Second, noise may be reduced, as the data not contained in the components may be mostly due to noise. Third, projecting into a subspace of low dimension is useful for visualizing the data. 6.2.2.2 Independent Component Analysis The independent component analysis (ICA) model originates from the multi-input and multi-output (MIMO) channel equalization [11]. Its two most important applications are blind source separation (BSS) and feature extraction. The mixing model of ICA is similar to that of the FA, but in the basic case without the noise term. The data have been generated from the latent components s through a square mixing matrix A by x ¼ As

(6:13)

In ICA, all the independent components, with the possible exception of one component, must be non-Gaussian. The number of components is typically the same as the number of observations. Such an A is searched for to enable the components s ¼ A1x to be as independent as possible. In practice, the independence can be maximized, for example, by maximizing nonGaussianity of the components or minimizing mutual information [12]. ICA can be approached from different starting points. In some extensions the number of independent components can exceed the number of dimensions of the observations making the basis overcomplete [12,13]. The noise term can be taken into the model. ICA can be viewed as a generative model when the 1D distributions for the components are modeled with, for example, mixtures of Gaussians (MoG). The problem with ICA is that it has the ambiguities of scaling and permutation [12]; that is, the indetermination of the variances of the independent components and the order of the independent components.

Application of Factor Analysis in Seismic Profiling

111

6.2.2.3 Independent Factor Analysis Independent factor analysis (IFA) is formulated by Attias [14]. It aims to describe p generally correlated observed variables x in terms of r < p independent latent variables s and an additive noise term n. The proposed algorithm derives from the ML and more specifically from the expectation–maximization (EM) algorithm. IFA model differs from the classic FA model in that the properties of the latent variables it involves are different. The noise variables n are assumed to be normally distributed, but not necessarily uncorrelated. The latent variables s are assumed to be mutually independent but not necessarily normally distributed; their densities are indeed modeled as mixtures of Gaussians. The independence assumption allows modeling the density of each si in the latent space separately. There are some problems with the EM–MoG algorithm. First, approximating source densities with MoGs is not so straightforward because the number of Gaussians has to be adjusted. Second, EM–MoG is computationally demanding where the complexity of computation grows exponentially with the number of sources [14]. Given a small number of sources the EM algorithm is exact and all the required calculations can be done analytically, whereas it becomes intractable as the number of sources in the model increases.

6.3 6.3.1

FA Application in Seismic Signal Processing Marmousi Data Set

Marmousi is a 2D synthetic data set generated at the Institut Franc¸is du Pe´trole (IFP). The geometry of this model is based on a profile through the North Quenguela trough in the Cuanza basin [15,16]. The geometry and velocity model was created to produce complex seismic data, which requires advanced processing techniques to obtain a correct Earth image. Figure 6.3 shows the velocity profile of the Marmousi model. Based on the profile and the geologic history, a geometric model containing 160 layers was created. Velocity and density distributions were defined by introducing realistic horizontal and vertical velocities and density gradients. This resulted in a 2D density– velocity grid with dimensions of 3000 m in depth by 9200 m in offset. Marmousi velocity model

0

Depth (m)

500 1000 1500 2000 2500 3000 0

1000

2000

3000

4000

5000

Offset (m) FIGURE 6.3 Marmousi velocity model.

6000

7000

8000

9000

Signal and Image Processing for Remote Sensing

112

Data were generated by a modeling package that can simulate a seismic line by computing successively the different shot records. The line was ‘‘shot’’ from west to east. The first and last shot points were, respectively, 3000 and 8975 m from the west edge of the model. Distance between shots was 25 m. Initial offset was 200 m and the maximum offset was 2575 m.

6.3.2

Velocity Analysis, NMO Correction, and Stacking

Given the Marmousi data set, after some conventional processing steps described in Section 6.2, the results of velocity analysis and normal moveout are shown in Figure 6.4. The left-most plot is a CMP gather. There are totally 574 CMP gathers in the Marmousi data set; each includes 48 traces. On the second plot, velocity spectrum is generated after the CMP gather is NMOcorrected and stacked using a range of constant velocity values, and the resultant stack traces for each velocity are placed side by side on a plane of velocity vs. two-way zerooffset time. By selecting the peaks on the velocity spectrum, an initial rms velocity can be defined, shown as a curve on the left of the second plot. The interval velocity can be calculated by using Dix formula [17] and shown on the right side of the plot. Given the estimated velocity profile, the real moveout correction can be carried out, shown in the third plot. As compared with the first plot, we can see the hyperbolic curves are flattened out after NMO correction. Usually another procedure called muting will be carried out before stacking because as we can see in the middle of the third plot, there are CMP gather

Velocity spectrum

NMO-corrected

Optimum muting

0.5

0.5

Time (sec)

1

1

1.5

1.5

2

2

2.5

3

2.5

Offset (m) 1000 2000

3000 4000 5000 Velocity (m/sec)

FIGURE 6.4 Velocity analysis and stacking of Marmousi data set.

6000 Offset (m)

3 Offset (m)

Application of Factor Analysis in Seismic Profiling

113

great distortions because of the approximation. That part will be eliminated before stacking all the 48 traces together. The fourth plot just shows a different way of highlighting the muting procedure. For details, see Ref. [1]. After we complete the velocity analysis, NMO correction, and stacking for the 56 of the CMPs, we get the following section of the subsurface image as on the left of Figure 6.5. There are two reasons that only 56 out of 574 of the CMPs are stacked. One reason is that the velocity analysis is too time consuming on a personal computer and the other is that although 56 CMPs are only one tenths of the 574 CMPs, it indeed covers nearly 700 m of the profile. It is enough to compare processing difference. The right plot is the same image as the left one except that it is after the automatic amplitude adjustment, which is to stress the vague events so that both the vague events and strong events in the image are shown with approximately the same amplitude. The algorithm includes three easy steps: 1. Compute Hilbert envelope of a trace. 2. Convolve the envelope with a triangular smoother to produce the smoothed envelope. 3. Divide the trace by the smoothed envelope to produce the amplitude-adjusted trace. By comparing the two plots, we can see that vague events at the top and bottom of the image are indeed stressed. In the following sections, we mainly use automatic amplitudeadjusted image to illustrate results. It needs to be pointed out that due to NMO stretching and lack of data at small offset after muting, events before 0.2 sec in Figure 6.5 are shown as distorted and do not provide

Stacking

Automatic amplitude adjustment

1

1 Time (sec)

0.5

Time (sec)

0.5

1.5

1.5

2

2

2.5

2.5

3

FIGURE 6.5 Stacking of 56 CMPs.

3000

3200 3400 CDP (m)

3

3000

3200 3400 CDP (m)

Signal and Image Processing for Remote Sensing

114

useful information. In the following sections, when we compare the result, we mainly consider events after 0.2 sec. 6.3.3

The Advantage of Stacking

Stacking is based on the assumption that all the traces in a CMP gather correspond to one single depth point. After they are NMO-corrected, the zero-offset traces should contain the same signal embedded in different random noises, which are caused by the different raypaths. The process of adding them together in this manner can increase the SNR ratio by adding up the signal components while canceling the noises among the traces. To see what stacking can do to improve the subsurface image quality, let us compare the image obtained from a single trace and that from stacking the 48 muted traces. In Figure 6.6, the single trace result without stacking is shown in the right plot. For every CMP (or CDP) gather, only the trace of smallest offset is NMO-corrected and placed side by side together to produce the image, while in the stack result in the left plot, 48 NMO-corrected and muted traces are stacked and placed side by side. Clearly, after stacking, the main events at 0.5, 1.0, and 1.5 sec are stressed, and the noise in between is canceled out. Noise at 0.2 is effectively removed. Noise caused by multiples from 2.0 to 3.0 sec is significantly reduced. However, due to NMO stretching and muting, there are not enough data to depict events at 0 to 0.25 sec on both plots.

6.3.4

Factor Analysis vs. Stacking

Now we suggest an alternative way of obtaining the subsurface image by using FA instead of stacking. As presented in Appendix 6.A, FA can extract one unique common factor from the traces with maximum correlation among them. It fits well with what is

Without stacking

0.5

0.5

1

1 Time (sec)

Time (sec)

Stacking

1.5

1.5

2

2

2.5

2.5

3

3000

3200 3400 CDP (m)

FIGURE 6.6 Comparison of stacking and single trace images.

3

3000

3200 3400 CDP (m)

Application of Factor Analysis in Seismic Profiling

115

expected of zero-offset traces in that after NMO correction they contain the same signal embedded in different random noises. There are two reasons that FA works better than stacking. First, FA model considers scaling factor A as in Equation 6.14, while stacking assumes no scaling as in Equation 6.15. Factor analysis: x ¼ As þ n

(6:14)

Stacking: x ¼ s þ n

(6:15)

When the scaling information is lost, simple summation does not necessarily increase the SNR ratio. For example, if one scaling factor is 1 and the other is 1, summation will simply cancel out the signal component completely, leaving only the noise component. Second, FA makes use of the second-order statistics explicitly as the criterion to extract the signal while stacking does not. Therefore, SNR ratio will improve more in the case of FA than in the case of stacking. To illustrate the idea, x(t) are generated using the following equation: x(t) ¼ As(t) þ n(t) ¼ A cos (2pt) þ n(t) where s(t) is the sinusoidal signal, n(t) are 10 independent noise terms with Gaussian distribution. The matrix of factor loadings A is also generated randomly. Figure 6.7 shows the result of stacking and FA. The top plot is one of the ten observations x(t). The middle plot is the result of stacking and the bottom plot is the result of FA using ML algorithm as presented in Appendix 6.B. Comparing the two plots suggests that FA outperforms stacking in improving the SNR ratio.

Observable variable

Result of stacking

Result of factor analysis

FIGURE 6.7 Comparison of stacking and FA.

Signal and Image Processing for Remote Sensing

116

0.5 Mute zone

Time (sec)

1

1.5

2

2.5

3

0

5

10

15

20 25 30 Trace number

35

40

45

FIGURE 6.8 Factor analysis of Scheme no. 1.

6.3.5

Application of Factor Analysis

The simulation result in Section 6.3.4 suggests that FA can be applied to the NMOcorrected seismic data. One problem arises, however, when we inspect the zero-offset traces. They need to be muted because of the NMO stretching, which means almost all the traces will have a segment set to zero (mute zone), as is shown in Figure 6.8. Is it possible to just apply FA to the muted traces? Is it possible to have other schemes that make full use of the information at hand? In the following sections, we try to answer these questions by discussing different schemes to carry out FA. 6.3.5.1 Factor Analysis Scheme No. 1 Let us start with the easiest one. The scheme is illustrated in Figure 6.8. We will set the mute zone to zero and apply FA to a CMP gather using ML algorithm. Extracting one single common factor from the 48 traces, and placing all the resulting factors from 56 CMP gathers side by side, the right plot in Figure 6.9 is obtained. Compared with the result of stacking shown on the left, events from 2.2 to 3.0 sec are more smoothly presented instead of the broken dashlike events after stacking. However, at near offset, from 0 to 0.7 sec, the image is contaminated with some vertical stripes. 6.3.5.2 Factor Analysis Scheme No. 2 In this scheme, the muted segments in each trace are replaced by segments of the nearest neighboring traces as is illustrated by Figure 6.10. Trace no. 44 borrows Segment 1

Application of Factor Analysis in Seismic Profiling

117 FA result of Scheme no. 1

0.5

0.5

1

1 Time (sec)

Time (sec)

Stack

1.5

1.5

2

2

2.5

2.5

3

3000

3200 3400 CDP (m)

3

3000

3200 3400 CDP (m)

FIGURE 6.9 Comparison of stacking and FA result of Scheme no. 1.

Segment 1 Segment 2 Segment 3 0.5

Time (sec)

1

1.5

2

2.5

3 0

5

10

FIGURE 6.10 Factor analysis of Scheme no. 2.

15

20

25 30 Trace number

35

40

45

Signal and Image Processing for Remote Sensing

118

FA result of Scheme no. 2

0.5

0.5

1

1 Time (sec)

Time (sec)

Stack

1.5

1.5

2

2

2.5

2.5

3

3000

3200 3400 CDP (m)

3

3000

3200 3400 CDP (m)

FIGURE 6.11 Comparison of stacking and FA result of Scheme no. 2.

from Trace no. 45 to fill out its muted segment. Trace no. 43 borrows Segments 1 and 2 from Trace no. 44 to fill out its muted segment and so on. As a result, Segment 1 from Trace no. 45 is copied to all the muted traces, from Trace no. 1 to 44. Segment 2 from Trace no. 44 is copied to traces from Trace no. 1 to 43. After the mute zone is filled out, FA is carried out to produce the result shown in Figure 6.11. Compared to stacking shown on the left, there is no improvement in the obtained image. Actually, the result is worse. Some events are blurred. Therefore, Scheme no. 2 is not a good scheme. 6.3.5.3 Factor Analysis Scheme No. 3 In this scheme, instead of copying the neighboring segments to the mute zone, the segments obtained from applying FA to the traces included in the nearest neighboring box are copied. In Figure 6.12, we first apply FA to traces in Box 1 (Trace no. 45 to 48), then Segment 1 is extracted from the result and copied to Trace no. 44. Segment 2 obtained from applying FA to traces in Box 2 (Trace no. 44 to 48) will be copied to Trace no. 43. When done, the image obtained is shown in Figure 6.13. Compared with Scheme no. 2, the result is better. But compared to stacking, there is still some contamination from 0 to 0.7 sec. 6.3.5.4 Factor Analysis Scheme No. 4 In the scheme, as is illustrated in Figure 6.14, Segment 1 will be extracted from applying FA to all the traces in Box 1 (traces from Trace no. 1 to 48), and Segment 2 will be extracted from applying FA to trace segments in Box 2 (traces from Trace no. 2 to 48). Note that the data are not muted before FA. In this manner, for every segment, all the data points available are fully utilized.

Application of Factor Analysis in Seismic Profiling

119 Box 3 Box 2 Box 1

Segment 1 Segment 2 Segment 3 0.5

Time (sec)

1

1.5

2

2.5

3 0

5

10

15

20 25 30 Trace number

35

40

45

FIGURE 6.12 Factor analysis of Scheme no. 3.

Stack

Factor analysis of Scheme no. 3

1

1 Time (sec)

0.5

Time (sec)

0.5

1.5

1.5

2

2

2.5

2.5

3

3000

3200 3400 CDP (m)

FIGURE 6.13 Comparison of stacking and FA result of Scheme no. 3.

3

3000

3200 3400 CDP (m)

Signal and Image Processing for Remote Sensing

120

Box 1 Box 2 Box 3

Time (sec)

0.5

1

1.5

2 Segment 3 Segment 2 2.5 Segment 1 3 0

5

10

15

20 25 30 Trace number

35

40

45

FIGURE 6.14 Factor analysis of Scheme no. 4.

In the result generated, we noticed that events from 0 to 0.16 sec are distorted. The amplitude is so large that it overshadows the other events. Comparing all the results obtained above, we conclude that both stacking and FA are unable to extract useful information from 0 to 0.16 sec. To better illustrate the FA result, we will mute the result from the distorted trace segments, and the final result is shown in Figure 6.15. Compare the results of FA and stacking; we can see that events at around 1 and 1.5 sec are strengthened. Events from 2.2 to 3.0 sec are more smoothly presented instead of the broken dashlike events in the stacked result. Overall, the SNR ratio of the image is improved.

6.3.6

Factor Analysis vs. PCA and ICA

The results of PCA and ICA (discussed in subsections 6.2.2.1 and 6.2.2.2) are placed side by side with the result of FA for comparison in Figure 6.16 and Figure 6.17. As we can see from both plots on the right side of the figures, important events are missing and the subsurface images are distorted. The reason is that the criteria used in PCA and ICA to extract the signals are improper to this particular scenario. In PCA, traces are transformed linearly and orthogonally into an equal number of new traces that have the property of being uncorrelated, where the first component having the maximum variance is used to produce the image. In ICA, the algorithm tries to extract components that are as independent to each other as possible, where the obtained components suffer from the problems of scaling and permutation.

Application of Factor Analysis in Seismic Profiling

121

Stack

Factor analysis of Scheme no. 4

1

1 Time (sec)

0.5

Time (sec)

0.5

1.5

1.5

2

2

2.5

2.5

3

3000

3

3200 3400 CDP (m)

3000

3200 3400 CDP (m)

FIGURE 6.15 Comparison of stacking and FA result of Scheme no. 4.

Factor analysis

Prinicipal component analysis

1

1 Time (sec)

0.5

Time (sec)

0.5

1.5

1.5

2

2

2.5

2.5

3

3000

3200 3400 CDP (m)

FIGURE 6.16 Comparison of FA and PCA results.

3

3000

3200 3400 CDP (m)

Signal and Image Processing for Remote Sensing

122 Factor analysis

Independent component analysis

1

1 Time (sec)

0.5

Time (sec)

0.5

1.5

1.5

2

2

2.5

2.5

3

3000

3200 3400 CDP (m)

3

3000

3200 3400 CDP (m)

FIGURE 6.17 Comparison of FA and ICA results.

6.4

Conclusions

Stacking is one of the three most important and robust processing steps in seismic signal processing. By utilizing the redundancy of the CMP gathers, stacking can effectively remove noise and increase the SNR ratio. In this chapter we propose to use FA to replace stacking to obtain better subsurface images after applying FA algorithm to the synthetic Marmousi data set. Comparisons with PCA and ICA show that FA indeed has advantages over other techniques in this scenario. It is noted that the conventional seismic processing steps adopted here are very basic and for illustrative purposes only. Better results may be obtained in velocity analysis and stacking if careful examination and iterative procedures are incorporated as is often the case in real situations.

References ¨ . Yilmaz, Seismic Data Processing, Society of Exploration Geophysicists, Tulsa, 1987. 1. O 2. E.A. Robinson, S. Treitei, R.A. Wiggins, and P.R. Gutowski, Digital Seismic Inverse Methods, International Human Resources Development Corporation, Boston, 1983. 3. D.N. Lawley and A.E. Maxwell, Factor Analysis as a Statistical Method, Butterworths, London, 1963. 4. E.R. Malinowski, Factor Analysis in Chemistry, 3rd ed., John Wiley & Sons, New York, 2002. 5. A.T. Basilevsky, Statistical Factor Analysis and Related Methods: Theory and Applications, 1st ed., Wiley-Interscience, New York, 1994.

Application of Factor Analysis in Seismic Profiling

123

6. H. Harman, Modern Factor Analysis, 2nd ed., University of Chicago Press, Chicago, 1967. 7. K.G. Jo¨reskog, Some contributions to maximum likelihood factor analysis, Psychometrika, 32:443–482, 1967. 8. I.T. Jolliffe, Principal Component Analysis, Springer-Verlag, Heidelberg, 1986. 9. M. Kendall, Multivariate Analysis, Charles Griffin, London, 1975. 10. A. Hyva¨rinen, Survey on independent component analysis, Neural Computing Surveys, 2:94–128, 1999. 11. P. Comon, Independent component analysis, a new concept? Signal Processing, 36:287–314, 1994. 12. J. Karhunen, A. Hyva¨rinen, and E. Oja, Independent Component Analysis, John Wiley & Sons, New York, 2001. 13. T. Lee, M. Girolami, M. Lewicki, and T. Sejnowski, Blind source separation of more sources than mixtures using overcomplete representations, IEEE Signal Processing Letters, 6:87–90, 1999. 14. H. Attias, Independent factor analysis, Neural Computation, 11:803–851, 1998. 15. R.J. Versteeg, Sensitivity of prestack depth migration to the velocity model, Geophysics, 58(6):873–882, 1993. 16. R.J. Versteeg, The Marmousi experience: velocity model determination on a synthetic complex data set, The Leading Edge, 13:927–936, 1994. 17. C.H. Dix, Seismic velocities from surface measurements, Geophysics, 20:68–86, 1955. 18. R. Bellman, Introduction to Matrix Analysis, McGraw-Hill, New York, 1960.

APPENDICES

6.A

Upper Bound of the Number of Common Factors

Suppose that there is a unique , matrix S must be of rank r. This is the covariance matrix for x where each diagonal element represents the part of the variance that is due to the r common factors instead of the total variance of the corresponding variate. This is known as communality of the variate. When r ¼ 1, A reduces to a column vector of p elements. It is unique, apart from a possible change of sign of all its elements. With 1 < r < p common factors, it is not generally possible to determine A and s uniquely, even in the case of a normal distribution. Although every factor model specified by Equation 6.8 leads to a multivariate normal, the converse is not necessarily true when 1 < r < p. The difficulty is known as the factor identification or factor rotation problem. Let H be any (r r) orthogonal matrix, so that HHT ¼ HTH ¼ I, then x ¼ AHHT s þ n 8 8

¼ Aasa þ n

:

Thus, s and sa˚ have the same statistical properties since

8 E sa ¼ HT E(s)

8 cov sa ¼ HT covðsÞH ¼ HT H ¼ I Assume there exist 1 < r < p common factors such that G ¼ AVAT and is Grammian and diagonal. The covariance matrix S has 1 p C þ p ¼ p(p þ 1) 2 2 distinct elements, which equals the total number of normal equations to be solved. However, the number of solutions is infinite, as can be seen from the following derivation. Since V is Grammian, its Cholesky decomposition exists. That is, there exists a nonsingular (r r) matrix U, such that V ¼ UTU and S ¼ AVAT þ ¼ AUT UAT þ : T ¼ AUT AUT þ 8

8

¼ AaAa T þ

(6A:1) 124

Application of Factor Analysis in Seismic Profiling

125

Apparently both factorization Equation 6.6 and Equation 6A.1 leave the same residual error and therefore must represent equally valid factor solutions. Also, we can substitute Aa˚ ¼ AB and Va˚ ¼ B1V (BT)1, which again yields a factor model that is indistinguishable from Equation 6.6. Therefore, no sample estimator can distinguish between such an infinite number of transformations. The coefficients A and Aa˚ are thus statistically equivalent and cannot be distinguished from each other or identified uniquely; that is, both the transformed and untransformed coefficients, together with , generate S in exactly the same way and cannot be differentiated by any estimation procedure without the introduction of additional restrictions. To solve the rotational indeterminacy of the factor model we require restrictions on V, the covariance matrix of the factors. The most straightforward and common restriction is to set V ¼ I. The number m of free parameters implied by the equation S ¼ AAT þ

(6A:2)

is then equal to the total number pr þ p for unknown parameters in A and , minus the number of zero restrictions placed on the off-diagonal elements of V, which is equal to 1/2(r2r) since V is symmetric. We then have m ¼ (pr þ p) 1=2(r2 r) ¼ p(r þ 1) 1=2(r2 r)

(6A:3)

where the columns of A are assumed to be orthogonal. The number of degrees of freedom d is then given by the number of equations implied by Equation 6A.2, that is, the number of distinct elements in S minus the number of free parameters m. We have d ¼ 1=2p(p þ 1) p(r þ 1) 1=2(r2 r) ¼ 1=2 (p r)2 (p r)

(6A:4)

which for a meaningful (i.e., nontrivial) empirical application must be strictly positive. This places an upper bound on the number of common factors r, which may be obtained in practice, a number which is generally somewhat smaller than the number of variables p.

6.B

Maximum Likelihood Algorithm

The maximum likelihood (ML) algorithm presented here is proposed by Jo¨reskog [7]. The algorithm uses an iterative procedure to compute a linear combination of variables to form factors. Assume that the random vector x has a multivariate normal distribution as defined in Equation 6.9. The elements of A, V, and are the parameters of the model to be estimated from the data. From a random sample of N observations of x we can find the mean vector and the estimated covariance matrix S, whose elements are the usual estimates of variances and covariances of the components of x.

Signal and Image Processing for Remote Sensing

126

mx ¼

N 1 X xi N i¼1

N 1 X (x mx )(x mx )T N 1 i¼1 ! N X 1 T T ¼ xx Nmx mx : N 1 i¼1

S¼

(6B:1)

The distribution of S is the Wishart distribution [3]. The log-likelihood function is given by h

i 1 log L ¼ (N 1) logjSj þ tr SS1 2 However, it is more convenient to minimize

F(A, V, ) ¼ logjSj þ tr SS1 logjSj p instead of maximizing log L [7]. They are equivalent because log L is a constant minus 1) times F. The function F is regarded as a function of A and . Note that if H is any nonsingular (k k) matrix, then 1 2 (N

F(AH1 , HVHT , ) ¼ F(A, V, ) which means that the parameters in A and V are not independent of one another, and to make the ML estimates of A and V unique, k2 independent restrictions must be imposed on A and V. To find the minimum of F we shall first find the conditional minimum for a given and then find the overall minimum. The partial derivative of F with respect to A is @F ¼ 2S1 (S S)S1 A @A See details in Ref. [7]. For a given , the minimization of A is to be found in the solution of S1 (S S)S1 A ¼ 0 Premultiplying with S gives (S S)S1 A ¼ 0 Using the following expression for the inverse S1 [3] S1 ¼ 1 1 A(I þ AT 1 A)1 AT 1 whose left side may be further simplified [7] so that (S S)1 A(I þ AT 1 A)1 ¼ 0

(6B:2)

Application of Factor Analysis in Seismic Profiling

127

Postmultiplying by I þ AT1A gives (S S)1 A ¼ 0

(6B:3)

which after substitution of S from Equation 6A.2 and rearrangement of terms gives S1 A ¼ A(I þ AT 1 A) Premultiplying by 1/2 finally gives (1=2 S1=2 )(1=2 A) ¼ (1=2 A)(I þ AT 1 A)

(6B:4)

From Equation 6B.4, we can see that it is convenient to take AT1A to be diagonal, since F is unaffected by postmultiplication of A by an orthogonal matrix and AT1A can be reduced to diagonal form by orthogonal transformations [18]. In this case, Equation 6B.4 is a standard eigen decomposition form. The columns of 1/2A are latent vectors of 1/2 S1/2, and the diagonal elements of I þ AT1A are the corresponding latent roots. Let e1 e2 ep be the latent roots of 1/2 S1/2 and let ee1, ee2, , eek be a set of latent vectors ek be the diagonal matrix with e1, e2, . . . , ek as corresponding to the k largest roots. Let L diagonal elements and let Ek be the matrix with ee1, ee2, . . . , eek as columns. Then ek I)1=2 e ¼ Ek ( L 1=2 A Premultiplying by 1/2 gives the conditional ML estimate of A as ek I)1=2 e ¼ 1=2 Ek (L A

(6B:5)

Up to now, we have considered the minimization of F with respect to A for a given . Now let us examine the partial derivative of F with respect to [3], h i @F e ) S1 ¼ diag S1 (S S @ e 1 with Equation 6B.2 and using Equation 6B.3 gives Substituting S h i @F e ) 1 ¼ diag 1 (S S @ which by Equation 6.6 becomes h i @F e ) 1 eA eT þ S ¼ diag 1 (A @ Minimizing it, we will get, eA eA e T) ¼ diag(S

(6B:6)

By iterating Equation 6B.5 and Equation 6B.6, the ML estimation of the FA model of Equation 6.4 can be obtained.

7 Kalman Filtering for Weak Signal Detection in Remote Sensing

Stacy L. Tantum, Yingyi Tan, and Leslie M. Collins

CONTENTS 7.1 Signal Models .................................................................................................................... 131 7.1.1 Harmonic Signal Model ...................................................................................... 131 7.1.2 Interference Signal Model ................................................................................... 131 7.2 Interference Mitigation .................................................................................................... 132 7.3 Postmitigation Signal Models ......................................................................................... 133 7.3.1 Harmonic Signal Model ...................................................................................... 134 7.3.2 Interference Signal Model ................................................................................... 134 7.4 Kalman Filters for Weak Signal Estimation ................................................................. 135 7.4.1 Direct Signal Estimation ...................................................................................... 136 7.4.1.1 Conventional Kalman Filter ................................................................ 136 7.4.1.2 Kalman Filter with an AR Model for Colored Noise ..................... 137 7.4.1.3 Kalman Filter for Colored Noise........................................................ 139 7.4.2 Indirect Signal Estimation................................................................................... 140 7.5 Application to Landmine Detection via Quadrupole Resonance............................. 142 7.5.1 Quadrupole Resonance........................................................................................ 143 7.5.2 Radio-Frequency Interference ............................................................................ 143 7.5.3 Postmitigation Signals ......................................................................................... 144 7.5.3.1 Postmitigation Quadrupole Resonance Signal................................. 145 7.5.3.2 Postmitigation Background Noise ..................................................... 145 7.5.4 Kalman Filters for Quadrupole Resonance Detection.................................... 145 7.5.4.1 Conventional Kalman Filter ................................................................ 145 7.5.4.2 Kalman Filter with an Autoregressive Model for Colored Noise ................................................................................. 146 7.5.4.3 Kalman Filter for Colored Noise........................................................ 146 7.5.4.4 Indirect Signal Estimation ................................................................... 146 7.6 Performance Evaluation .................................................................................................. 147 7.6.1 Detection Algorithms........................................................................................... 147 7.6.2 Synthetic Quadrupole Resonance Data ............................................................ 147 7.6.3 Measured Quadrupole Resonance Data ........................................................... 148 7.7 Summary ............................................................................................................................ 150 References ................................................................................................................................... 151

129

130

Signal and Image Processing for Remote Sensing

Remote sensing often involves probing a region of interest with a transmitted electromagnetic signal, and then analyzing the returned signal to infer characteristics of the investigated region. It is not uncommon for the measured signal to be relatively weak or for ambient noise to interfere with the sensor’s ability to isolate and measure only the desired return signal. Although there are potential hardware solutions to these obstacles, such as increasing the power in the transmit signal to strengthen the return signal, or altering the transmit frequency, or shielding the system to eliminate the interfering ambient noise, these solutions are not always viable. For example, regulatory constraints on the amount of power that may be radiated by the sensor or the trade-off between the transmit power and the battery life for a portable sensor may limit the power in the transmitted signal, and effectively shielding a system in the field from ambient electromagnetic signals is very difficult. Thus, signal processing is often utilized to improve signal detectability in situations such as these where a hardware solution is not sufficient. Adaptive filtering is an approach that is frequently employed to mitigate interference. This approach, however, relies on the ability to measure the interference on auxiliary reference sensors. The signals measured on the reference sensors are utilized to estimate the interference, and then this estimate is subtracted from the signal measured by the primary sensor, which consists of the signal of interest and the interference. When the interference measured by the reference sensors is completely correlated with the interference measured by the primary sensor, the adaptive filtering can completely remove the interference from the primary signal. When there are limitations in the ability to measure the interference, that is, the signals from the reference sensors are not completely correlated with the interference measured by the primary sensor, this approach is not completely effective. Since some residual interference remains after the adaptive interference cancellation, signal detection performance is adversely affected. This is particularly true when the signal of interest is weak. Thus, methods to improve signal detection when there is residual interference would be useful. The Kalman filter (KF) is an important development in linear estimation theory. It is the statistically optimal estimator when the noise is Gaussian-distributed. In addition, the Kalman filter is still the optimal linear estimator in the minimum mean square error (MMSE) sense even when the Gaussian assumption is dropped [1]. Here, Kalman filters are applied to improve detection of weak harmonic signals. The emphasis in this chapter is not on developing new Kalman filters but, rather, on applying them in novel ways for improved weak harmonic signal detection. Both direct estimation and indirect estimation of the harmonic signal of interest are considered. Direct estimation is achieved by applying Kalman filters in the conventional manner; the state of the system is equal to the signal to be estimated. Indirect estimation of the harmonic signal of interest is achieved by reversing the usual application of the Kalman filter so the background noise is the system state to be estimated, and the signal of interest is the observation noise in the Kalman filter problem statement. This approach to weak signal estimation is evaluated through application to quadrupole resonance (QR) signal estimation for landmine detection. Mine detection technologies and systems that are in use or have been proposed include electromagnetic induction (EMI) [2], ground penetrating radar (GPR) [3], and QR [4,5]. Regardless of the technology utilized, the goal is to achieve a high probability of detection, PD, while maintaining a low probability of false alarm, PFA. This is of particular importance for landmine detection since the nearly perfect PD required to comply with safety requirements often comes at the expense of a high PFA, and the time and cost required to remediate contaminated areas is directly proportional to PFA. In areas such as a former battlefield, the average ratio of real mines to suspect objects can be as low as 1:100, thus the process of clearing the area often proceeds very slowly.

Kalman Filtering for Weak Signal Detection in Remote Sensing

131

QR technology for explosive detection is of crucial importance in an increasing number of applications. Most explosives, such as RDX, TNT, PETN, etc., contain nitrogen (N). Some of its isotopes, such as 14N, possess electric quadrupole moments. When compounds with such moments are probed with radio-frequency (RF) signals, they emit unique signals defined by the specific nucleus and its chemical environment. The QR frequencies for explosives are quite specific and are not shared by other nitrogenous materials. Since the detection process is specific to the chemistry of the explosive and therefore is less susceptible to the types of false alarms experienced by sensors typically used for landmine detection, such as EMI or GPR sensors, the pure QR of 14N nuclei supports a promising method for detecting explosives in the quantities encountered in landmines. Unfortunately, QR signals are weak, and thus vulnerable to both the thermal noise inherent in the sensor coil and external radio-frequency interference (RFI). The performance of the Kalman filter approach is evaluated on both simulated data and measured field data collected by Quantum Magnetics, Inc. (QM). The results show that the proposed algorithm improves the performance of landmine detection.

7.1

Signal Models

In this chapter, it is assumed that the sensor operates by repeatedly transmitting excitation pulses to investigate the potential target and acquires the sensor response after each pulse. The data acquired after each excitation pulse are termed a segment, and a group of segments constitutes a measurement. In general, for each potential target there are multiple measurements with each measurement containing many segments.

7.1.1

Harmonic Signal Model

The discrete-time harmonic signal of interest, at frequency f0, in a single segment can be represented by s(n) ¼ A0 cos (2pf0 n þ f0 ), n ¼ 0, 1, . . . , N 1

(7:1)

The measured signal may be demodulated at the frequency of the desired harmonic signal, f0, to produce a baseband signal, ~s(n), ~s(n) ¼ A0 ejf0 ,

n ¼ 0, 1, . . . , N 1

(7:2)

Assuming the frequency of the harmonic signal of interest is precisely known, the signal of interest after demodulation and subsequent low-pass filtering to remove any aliasing introduced by the demodulation is a DC constant.

7.1.2

Interference Signal Model

A source of interference for this type of signal detection problem is ambient harmonic signals. For example, sensors operating in the RF band could experience interference due to other transmitters operating in the same band, such as radio stations. Since there may be many sources transmitting harmonic signals operating simultaneously, the demodulated

Signal and Image Processing for Remote Sensing

132

interference signal measured in each segment may be modeled as the sum of the contribution from M different sources, each operating at its own frequency fm, ~I ¼

M X

~ m (n)ej(2pfm nþfm ) , A

n ¼ 0, 1, . . . , N 1

(7:3)

m¼1

where the superscript denotes a complex value and we assume the frequencies are ˜ m(n) from a discrete time series. distinct, meaning fi 6¼ fj for i 6¼ j. The amplitudes A Although the amplitudes are not restricted to constant values in general, they are assumed to remain essentially constant over the short time intervals during which each data ˜ m(n) is segment is collected. For time intervals on this order, it is reasonable to assume A constant for each data segment, but may change from segment to segment. Therefore, the interference signal model may be expressed as ~I ¼

M X

~ m ej(2pfm nþfm ) , n ¼ 0, . . . , N 1 A

(7:4)

m¼1

This model represents all frequencies even though the harmonic signal of interest exists in a very narrow band. In practice, only the frequency corresponding to the harmonic signal of interest needs to be considered.

7.2

Interference Mitigation

Adaptive filtering is a widely applied approach for noise cancellation [6]. The basic approach is illustrated in Figure 7.1. The primary signal consists of both the harmonic signal of interest and the interference. In contrast, the signal measured on each auxiliary antenna or sensor consists of only the interference. Adaptive noise cancellation utilizes the measured reference signals to estimate the noise present in the measured primary signal. The noise estimate is then subtracted from the primary signal to find the signal of interest. Adaptive noise cancellation, such as the least mean square (LMS) algorithm, is well suited for those applications in which one or more reference signals are available [6].

Signal of interest

Interference signal

+ Primary antenna

Reference antenna

+ −

Adaptive filter

FIGURE 7.1 Interference mitigation based on adaptive noise cancellation.

Signal estimate

Kalman Filtering for Weak Signal Detection in Remote Sensing

133

In this application, the adaptive noise cancellation is performed in the frequency domain by applying the normalized LMS algorithm to each frequency component of the frequency domain representation of the measured primary signal. The primary measured signal at time n may be denoted by d(n) and the measured reference signal with u(n). The tap input vector may be represented by u(n) ¼ [u(n) u(n 1) u(n M þ 1)]T and ^ (n). Both the tap input and tap weight vectors are the tap weight vector may be given by w of length M. Given these definitions, the filter output at time n, e(n), is ^ H (n)u(n) e(n) ¼ d(n) w

(7:5)

^ H(n)u(n) represents the interference estimate. The tap weights are updated The quantity w according to ^ (n þ 1) ¼ w ^ (n) þ m0 P1 (n)u(n)e (n) w

(7:6)

where the parameter m0 is an adaptation constant that controls the convergence rate and P(n) is given by P(n) ¼ bP(n 1) þ (1 b)ju(n)j2

(7:7)

with 0 < b < 1 [7]. The extension of this approach to utilize multiple reference signals is straightforward.

7.3

Postmitigation Signal Models

Under perfect circumstances the interference present in the primary signal is completely correlated with the reference signals and all interference can be removed by the adaptive noise cancellation, leaving only Gaussian noise associated with the sensor system. Since the interference often travels over multiple paths and the sensing systems are not perfect, however, the adaptive interference mitigation rarely removes all the interference. Thus, there is residual interference that remains after the adaptive noise cancellation. In addition, the adaptive interference cancellation process alters the characteristics of the signal of interest. The real-valued observed data prior to interference mitigation may be represented by x(n) ¼ s(n) þ I(n) þ w(n),

n ¼ 0, 1, . . . , N 1

(7:8)

where s(n) is the signal of interest, I(n) is the interference, and w(n) is Gaussian noise associated with the sensor system. The baseband signal after demodulation becomes ~ ~ (n) x(n) ¼ ~s(n) þ ~I (n) þ w

(7:9)

where ~I (n) is the interference, which is reduced but not completely eliminated by adaptive filtering. The signal remaining after interference mitigation is ~ y(n) ¼ ~s(n) þ ~ v(n), n ¼ 0, 1 , . . . , N 1

(7:10)

Signal and Image Processing for Remote Sensing

134

where ~s(n) is now the altered signal of interest and ~v(n) is the background noise remaining after interference mitigation, both of which are described in the following sections. 7.3.1

Harmonic Signal Model

Due to the nonlinear phase effects of the adaptive filter, the demodulated harmonic signal of interest is altered by the interference mitigation; it is no longer guaranteed to be a DC constant. The postmitigation harmonic signal is modeled with a first-order Gaussian– Markov model, ~s(n þ 1) ¼ ~s(n) þ «~(n)

(7:11)

where ~ «(n) is zero mean white Gaussian noise with variance s~«2.

7.3.2

Interference Signal Model

Although interference mitigation can remove most of the interference, some residual interference remains in the postmitigation signal. The background noise remaining after interference mitigation, ~ v(n), consists of the residual interference and the remaining sensor noise, which is altered by the mitigation. Either an autoregressive (AR) model or an autoregressive moving average (ARMA) model is appropriate for representing the sharp spectral peaks, valleys, and roll-offs in the power spectrum of ~v(n). An AR model is a causal, linear, time-invariant discrete-time system with a transfer function containing only poles, whereas an ARMA model is a causal, linear, time-invariant discrete-time system with a transfer function containing both zeros and poles. The AR model has a computational advantage over the ARMA model in the coefficient computation. Specifically, the AR coefficient computation involves solving a system of linear equations known as Yule–Walker equations, whereas the ARMA coefficient computation is significantly more complicated because it requires solving systems of nonlinear equations. Estimating and identifying an AR model for real-valued time series is well understood [8]. However, for complex-valued AR models, few theoretical and practical identification and estimation methods could be found in the literature. The most common approach is to adapt methods originally developed for real-valued data to complex-valued data. This strategy works well only when the the complex-valued process is the output of a linear system driven by white circular noise whose real and imaginary parts are uncorrelated and white [9]. Unfortunately, circularity is not always guaranteed for most complexvalued processes in practical situations. Analysis of the postmitigation background noise for the specific QR signal detection problem considered here showed that it is not a pure circular complex process. However, since the cross-correlation between the real and imaginary parts is small compared to the autocorrelation, we assume that the noise is a circular complex process and can be modeled as a P-th order complex AR process. Thus, ~ v(n) ¼

P X

~ap v~(n p) þ «~(n)

(7:12)

p¼1

where the driving noise ~ «(n) is white and complex-valued. Conventional modeling methods extended from real-valued time series are used for estimating the complex AR parameters. The Burg algorithm, which estimates the AR parameters by determining reflection coefficients that minimize the sum of forward and

Kalman Filtering for Weak Signal Detection in Remote Sensing

135

backward residuals, is the preferred estimator among various estimators for AR parameters [10]. Furthermore, the Burg algorithm has been reformulated so that it can be used to analyze data containing several separate segments [11]. The usual way of combining the information across segments is to take the average of the models estimated from the individual segments [12], such as average AR parameters (AVA) and average reflection coefficient (AVK). We found that for this application, the model coefficients are stable enough to be averaged. Although this will reduce the estimate variance, it still contains a bias that is proportional to 1=N, where N is the number of observations [13]. Another novel extension of the Burg algorithm to segments, the segment Burg algorithm (SBurg), proposed in [14], estimates the reflection coefficients by minimizing the sum of the forward and backward residuals of all the segments taken together. This means a single model is fit to all the segments simultaneously. An important aspect of AR modeling is choosing the appropriate order P. Although there are several criteria to determine the order for a real AR model, no formal criterion exists for a complex AR model. We adopt the Akaike information criterion (AIC) [15] that is a common choice for real AR models. The AIC is defined as AIC(p) ¼ ln (s ^«2 (p)) þ

2p , p ¼ 1, 2, 3, . . . T

(7:13)

where ^«2(p) is the prediction error power at P-th order and T is the total number of samples. The prediction error power for a given value of P is simply the variance in the difference between the true signal and the estimate for the signal using a model of order P. For signal measurements consisting of multiple segments, i.e. S segments and N samples=segment, T ¼ N S. The model order for which the AIC has the smallest value is chosen. The AIC method tends to overestimate the order [16] and is good for short data records.

7.4

Kalman Filters for Weak Signal Estimation

Kalman filters are appropriate for discrete-time, linear, and dynamic systems whose output can be characterized by the system state. To establish notation and terminology, this section provides a brief overview consisting of the problem statement and the recursive solution of the Kalman filter variants applied for weak harmonic signal detection. Two approaches to estimate a weak signal in nonstationary noise, both employing Kalman filter approaches, are proposed [17]. The first approach directly estimates the signal of interest. In this approach, the Kalman filter is utilized in a traditional manner, in that the signal of interest is the state to be estimated. The second approach indirectly estimates the signal of interest. This approach utilizes the Kalman filter in an unconventional way because the noise is the state to be estimated, and then the noise estimate is subtracted from the measured signal to obtain an estimate of the signal of interest. The problem statements and recursive Kalman filter solutions for each of the Kalman filters considered are provided with their application to weak signal estimation. These approaches have an advantage over the more intuitive approach of simply subtracting a measurement of the noise recorded in the field because the measured background noise is nonstationary, and the residual background noise after interference mitigation is also nonstationary. Due to the nonstationarity of the background noise, it must be measured and subtracted in real time. This is exactly what the adaptive frequency-domain

Signal and Image Processing for Remote Sensing

136

interference mitigation attempts to do, and the interference mitigation does significantly improve harmonic signal detection. There are, however, limitations to the accuracy with which the reference background noise can be measured, and these limitations make the interference mitigation insufficient for noise reduction. Thus, further processing, such as the Kalman filtering proposed here, is necessary to improve detection performance.

7.4.1

Direct Signal Estimation

The first approach assumes the demodulated harmonic signal is the system state to be estimated. The conventional Kalman filter was designed for white noise. Since the measurement noise in this application is colored, it is necessary to investigate modified Kalman filters. Two modifications are considered: utilizing an autoregressive (AR) model for the colored noise and designing a Kalman filter for colored noise, which can be modeled as a Markov process. In applying each of the following Kalman filter variants to directly estimate the demodulated harmonic signal, the system is assumed to be described by state equation: ~sk ¼ ~sk1 þ «~k

(7:14)

observation equation: ~yk ¼ ~sk þ ~vk

(7:15)

where ~sk is the postmitigation harmonic signal, modeled using a first-order Gaussian– Markov model, and ~ vk is the postmitigation background noise. 7.4.1.1

Conventional Kalman Filter

The system model for the conventional Kalman filter is described by two equations: a state equation and an observation equation. The state equation relates the current system state to the previous system state, while the observation equation relates the observed data to the current system state. Thus, the system model is described by state equation: xk ¼ Fk xk1 þ Gk wk

(7:16)

observation equation: zk ¼ HH k xk þ uk

(7:17)

where the M-dimensional parameter vector xk represents the state of the system at time k, the M M matrix Fk is the known state transition matrix relating the states of the system at time k and k 1, and the N-dimensional parameter vector zk represents the measured data at time k. The M 1 vector wk represents process noise, and the N 1 vector uk is measurement noise. The M M diagonal coefficient matrix Gk modifies the variances of the process noise. If both uk and wk are independent, zero mean, white noise processes H with E{ukuH k } ¼ Rk and E{wkwk } ¼ Qk, then the initial system state x0 is a random vector, with mean x0 and covariance S0, independent of uk and wk. The Kalman filter determines the estimates of the system state, ^xkjk1 ¼ E{xkjzk1} and ^ xkjk ¼ E{xkjzk}, and the associated error covariance matrices Skjk1 and Skjk. The recursive solution is achieved in two steps. The first step predicts the current state and the error covariance matrix using the previous data, ^ xkjk1 ¼ Fk1 ^xk1jk1

(7:18)

H Skjk1 ¼ Fk1 Sk1jk1 FH k1 þ Gk1 Qk1 Gk1

(7:19)

Kalman Filtering for Weak Signal Detection in Remote Sensing

137

The second step first determines the Kalman gain, Kk, and then updates the state and the error covariance prediction using the current data, 1 Kk ¼ Skjk1 Hk (HH k Skjk1 þ Rk )

(7:20)

^ xkjk1 ) xkjk ¼ ^ xkjk1 þ Kk (zk HH k ^

(7:21)

Skjk ¼ (I Kk HH k )Skjk1

(7:22)

The initial conditions on the state estimate and error covariance matrix are ^x1j0 ¼ x0 and S1j0 ¼ S0. In general, the system parameters F, G, and H are time-varying, which is denoted in the preceding discussion by the subscript k. In the discussions of the Kalman filters implemented for harmonic signal estimation, the system is assumed to be time-invariant. Thus, the subscript k is omitted and the system model becomes state equation: xk ¼ Fxk1 þ Gwk

(7:23)

observation equation: zk ¼ HH xk þ uk

(7:24)

7.4.1.2 Kalman Filter with an AR Model for Colored Noise A Kalman filter for colored measurement noise, based on the fact that colored noise can often be simulated with sufficient accuracy by a linear dynamic system driven by white noise, is proposed in [18]. In this approach, the colored noise vector is included in an augmented state variable vector, and the observations now contain only linear combinations of the augmented state variables. The state equation is unchanged from Equation 7.23; however, the observation equation is modified so the measurement noise is colored, thus the system model is state equation: xk ¼ Fxk1 þ Gwk

(7:25)

observation equation: zk ¼ HH xk þ ~vk

(7:26)

where ~ vk is colored noise modeled by the complex-valued AR process in Equation 7.12. Expressing the P-th order AR process ~ vk in state space notion yields state equation: vk ¼ Fv vk1 þ Gv «~k

(7:27)

observation equation: ~vk ¼ HH v vk

(7:28)

3 ~ v(k P þ 1) 7 6~ 6 v(k P þ 2) 7 vk ¼ 6 7 .. 5 4 . 2

where

~ v(k) 2

0 6 0 6 . . Fv ¼ 6 6 . 4 0 ~aP

1 0 .. .

0 1 .. .

0 ~aP1

0 ~aP2

(7:29) P1

.. .

0 0 .. .

0 ~a2

3 0 0 7 .. 7 . 7 7 1 5 ~a1

PP

(7:30)

Signal and Image Processing for Remote Sensing

138

2 3 0 607 6.7 .7 Gv ¼ Hv ¼ 6 6.7 405 1

(7:31)

P 1

and ~ «k is the white driving process in Equation 7.12 with zero mean and variance s~«2. Combining Equation 7.25, Equation 7.26, and Equation 7.28, yields a new Kalman filter expression whose dimensions have been extended, k state equation: xk ¼ Fxk1 þ Gw H

observation equation: zk ¼ H xk where

H xk wk k ¼ x , wk ¼ , H ¼ HH HH v , vk «~k ¼ F 0 , and G ¼ G 0 F 0 Fv 0 Gv

(7:32) (7:33)

(7:34)

In estimation literature, this is termed the noise-free [19] or perfect measurement [20] problem. The process noise, wk, and colored noise state process noise, ~«k, are assumed to be uncorrelated, so Q 0 H Q ¼ E{wk wk } ¼ (7:35) 0 s2«~ Since there is no noise in Equation 7.33, the covariance matrix of the observation noise R ¼ 0. The recursive solution for this problem, defined in Equation 7.32 and Equation 7.33, is the same as for the conventional Kalman filter given in Equation 7.18 through Equation 7.22. When estimating the harmonic signal with the traditional Kalman filter with an AR model for the colored noise, the coefficient matrices are 3 2 ~(k) x 2 3 1 7 6~ v (k P þ 1) 7 6 6 07 v(k P þ 2) 7 k ¼ 6 x (7:36) .. 7 7, H ¼ 6 6~ 4 7 6 .. .5 5 4 . 1 Pþ1 ~ v(k) 3 2 1 0 0 0 0 60 1 0 0 0 7 6. .. 7 .. .. .. .. 6 . (7:37) F¼6. . 7 . . . . 7 40 0 0 0 1 5 0 aP aP1 a2 a1

~k w wk ¼ , «~k

2

1 60 G¼6 4 ... 0

3 0 07 .. 7 .5 1

(7:38) (Pþ1)2

Kalman Filtering for Weak Signal Detection in Remote Sensing and Q ¼ E{wk wH k }¼

s2w~ 0

0 s2«~

139

(7:39)

Since the theoretical value R ¼ 0 turns off the Kalman filter it does not track the signal, R should be set to a positive value. 7.4.1.3 Kalman Filter for Colored Noise The Kalman filter has been generalized to systems for which both the process noise and measurement noise are colored and can be modeled as Markov processes [21]. A Markov process is a process for which the probability density function of the current sample depends only on the previous sample, not the entire process history. Although the Markov assumption may not be accurate in practice, this approach remains applicable. The system model is state equation: xk ¼ Fxk1 þ Gwk

(7:40)

observation equation: zk ¼ HH k x þ uk

(7:41)

where the process noise wk and the measurement noise uk are both zero mean and colored with arbitrary covariance matrices at time k, Q, and R, so Qij ¼ cov(wi , wj ), i, j ¼ 0, 1, 2, . . . Rij ¼ cov(ui , uj ), i, j ¼ 0, 1, 2, . . . The initial state x0 is a random vector with mean x0 and covariance matrix P0, and x0, wk, and uk are independent. The prediction step of the Kalman filter solution is

where Ck

1

^ xkjk1 ¼ F^xk1jk1

(7:42)

Skjk1 ¼ FSk1jk1 FH þ GQk1 GH þ FCk1 þ (FCk1 )H

(7:43)

1 k 1 ¼ Ckk 1jk 1 and Ck 1jk

1

is given recursively by

k1 H ¼ FCk1 Ciji1 i1ji1 þ GQi1,k1 G

(7:44)

k1 H k1 Ck1 iji ¼ Ciji1 Ki H Ciji1

(7:45)

k ¼ 0 and Q0 ¼ 0. The k in the superscript denotes time k. with the initial value C0j0 The subsequent update step is

^ xkjk ¼ ^ xkjk1 þ Kk (zk HH ^xkjk1 )

(7:46)

Skjk ¼ Skjk1 Kk Sk KH k

(7:47)

Kk ¼ (Skjk1 H þ Vk )S1 k

(7:48)

where Kk and Sk are given by

Signal and Image Processing for Remote Sensing

140

Sk ¼ HH Skjk1 H þ HH Vk þ (HH Vk )H þ Rk Vk ¼ Vkkjk

1

(7:49)

is given recursively by Vkiji1 ¼ FVki1ji1 Vkiji ¼ Vkiji1 þ Ki (Ri,k HH Vkiji1 )

(7:50)

withRk being the new ‘‘data’’ and the initial value V0j0k ¼ 0. When the noise is white V ¼ 0 and C ¼ 0, and this Kalman filter reduces to the conventional Kalman filter. It is worth noting that this filter is optimal only when E{wk} and E{uk} are known.

7.4.2

Indirect Signal Estimation

A central premise in Kalman filter theory is that the underlying state-space model is accurate. When this assumption is violated, the performance of the filter can deteriorate appreciably. The sensitivity of the Kalman filter to signal modeling errors has led to the development of robust Kalman filtering techniques based on modeling the noise. The conventional point of view in applying Kalman filters to a signal detection problem is to assign the system state, xk, to the signal to be detected. Under this paradigm, the hypotheses ‘‘signal absent’’ (H0) and ‘‘signal present’’ (H1) are represented by the state itself. When the conventional point of view is applied for this particular application, the demodulated harmonic signal (a DC constant) is the system state to be estimated. Although a model for the desired signal can be developed from a relatively pure harmonic signal obtained in a shielded lab environment, that signal model often differs from the harmonic signal measured in the field, sometimes substantially, because the measured signal may be a function of system and environmental parameters, such as temperature. Therefore the harmonic signal measured in the field may deviate from the assumed signal model and the Kalman filter may produce a poor state estimate due to inaccuracies in the state equation. However, the background noise in the field can be measured, from which a reliable noise model can be developed, even though the measured noise is not exactly the same as the noise corrupting the measurements. The background noise in the postmitigation signal (Equation 7.10), ~v(n), can be modeled as a complex-valued AR process as previously described. The Kalman filter is applied to estimate the background noise in the postmitigation signal and then the background noise estimate is subtracted from the postmitigation signal to indirectly estimate the postmitigation harmonic signal (a DC constant). Thus, the state in the Kalman filter, xk, is the background noise ~ v(n), 3 ~ v(k P þ 1) 7 6~ 6 v(k P þ 2) 7 xk ¼ 6 7 .. 5 4 . 2

(7:51)

v~(k) and the measurement noise in the observation, uk, is the demodulated harmonic signal ~s(n), uk ¼ ~sk

(7:52)

Kalman Filtering for Weak Signal Detection in Remote Sensing

141

The other system parameters are 2

0 6 0 6 . . F¼6 6 . 4 0 ~aP

.. .

1 0 .. .

0 1 .. .

0 ~aP1

0 ~aP2

0 0 .. .

0 ~a2

2 3 0 607 6.7 .7 G¼H¼6 6.7 405 1

3 0 0 7 .. 7 . 7 7 1 5

(7:53)

~a1

(7:54) P1

and wk ¼ ~ «k. Therefore, the measurement equation becomes vk þ ~sk zk ¼ ~

(7:55)

where zk is the measured data corresponding to ~y(k) in Equation 7.10. A Kalman filter assumes complete a priori knowledge of the process and measurement noise statistics Q and R. These statistics, however, are inexactly known in most practical situations. The use of incorrect a priori statistics in the design of a Kalman filter can lead to large estimation errors, or even to a divergence of errors. To reduce or bound these errors, an adaptive filter is employed by modifying or adapting the Kalman filter to the real data. The approaches to adaptive filtering are divided into four categories: Bayesian, maximum likelihood, correlation, and covariance matching [22]. The last technique has been suggested for the situations when Q is known but R is unknown. The covariance matching algorithm ensures that the residuals remain consistent with the theoretical covariance. The residual, or innovation, is defined by vk ¼ zk HH ^xkjk1

(7:56)

which has a theoretical covariance of H E{vk vH k } ¼ H Skjk1 H þ R

(7:57)

If the actual covariance of vk is much larger than the covariance obtained from the Kalman filter, R should be increased to prevent divergence. This has the effect of increasing Skjk1, thus bringing the actual covariance of vk closer to that given in Equation 7.57. In this case, R is estimated as m X H ^k ¼ 1 vkj vH R kj H Skjk1 H m j¼1

(7:58)

Here, a two-step adaptive Kalman filter using the covariance matching method is pro^ . Then, the convenposed. First, the covariance matching method is applied to estimate R ^ tional Kalman filter is implemented with R ¼ R to estimate the background noise. In this application, there are several measurements of data, with each measurement containing tens to hundreds of segments. For each segment, the covariance matching method is

Signal and Image Processing for Remote Sensing

142

(yn) = s(n) + v(n)

Adaptive Kalman filter

Estimate R

Rs

Conventional Kalman filter

v(n) −

+ + s(n)

Detector FIGURE 7.2 Block diagram of the two-step adaptive Kalman filter strategy. (From Tan et al., IEEE Transactions on Geoscience and Remote Sensing 43(7), 1507–1516, 2005. With permission.)

Decision

b

H0 H1

^ k, where the subscript k denotes the sample index. Since it is an employed to estimate R ^ k (k m) are retained and averaged adaptive procedure, only the steady-state values of R ^ s for each segment, to find the value of R ^s ¼ R

1 X 1 N ^k R N m k¼m

(7:59)

where the subscript s denotes the segment index. Then, the average is taken over all the segments in each measurement. Thus, L X ^s ^¼1 R R L s¼1

(7:60)

is used in the conventional Kalman filter in this two-step process. A block diagram depicting the two-step adaptive Kalman filter strategy is shown in Figure 7.2.

7.5

Application to Landmine Detection via Quadrupole Resonance

Landmines are a form of unexploded ordnance, usually emplaced on or just under the ground, which are designed to explode in the presence of a triggering stimulus such as pressure from a foot or vehicle. Generally, landmines are divided into two categories: antipersonnel mines and antitank mines. Antipersonnel (AP) landmines are devices usually designed to be triggered by a relatively small amount of pressure, typically 40 lbs, and generally contain a small amount of explosive so that the explosion aims or

Kalman Filtering for Weak Signal Detection in Remote Sensing

143

kills the person who triggers the device. In contrast, antitank (AT) landmines are specifically designed to destroy tanks and vehicles. They explode only if compressed by an object weighing hundreds of pounds. AP landmines are generally small (less than 10 cm in diameter) and are usually more difficult to detect than the larger AT landmines.

7.5.1

Quadrupole Resonance

When compounds with quadrupole moments are excited by a properly designed EMI system, they emit unique signals characteristic of the compound’s chemical structure. The signal for a compound consists of a set of spectral lines, where the spectral lines correspond to the QR frequencies for that compound, and every compound has its own set of resonant frequencies. This phenomenon is similar to nuclear magnetic resonance (NMR). Although there are several subtle distinctions between QR and NMR, in this context it is sufficient to view QR as NMR without the external magnetic field [4]. The QR phenomenon is applicable for landmine detection because many explosives, such as RDX, TNT, and PETN, contain nitrogen, and some of nitrogen’s isotopes, namely 14N, have electric quadrupole moments. Because of the chemical specificity of QR, the QR frequencies for explosives are unique and are not shared with other nitrogenous materials. In summary, landmine detection using QR is achieved by observing the presence, or absence, of a QR signal after applying a sequence of RF pulses designed to excite the resonant frequency of frequencies for the explosive of interest [4]. RFI presents a problem since the frequencies of the QR response fall within the commercial AM radio band. After the QR response is measured, additional processing is often utilized to reduce the RFI, which is usually a non-Gaussian colored noise process. Adaptive filtering is a common method for cancelling RFI when RFI reference signals are available. The frequency-domain LMS algorithm is an efficient method for extracting the QR signal from the background RFI. Under perfect circumstances, when the RFI measured on the main antenna is completely correlated with the signals measured on the reference antennas, all RFI can be removed by RFI mitigation, leaving only Gaussian noise associated with the QR system. Since the RFI travels over multiple paths and the antenna system is not perfect, however, the RFI mitigation cannot remove all of the nonGaussian noise. Consequently, more sophisticated signal processing methods must be employed to estimate the QR signal after the RFI mitigation, and thus improve the QR signal detection. Figure 7.3 shows the basic block diagram for QR signal detection. The data acquired during each excitation pulse are termed a segment, and a group of segments constitutes a measurement. In general, for each potential target there are multiple measurements, with each measurement containing several hundred segments. The measurements are demodulated at the expected QR resonant frequency. Thus, if the demodulated frequency equals the QR resonant frequency, the QR signal after demodulation is a DC constant.

7.5.2

Radio-Frequency Interference

Although QR is a promising technology due to its chemical specificity, it is limited by the inherently weak QR signal and susceptibility to RFI. TNT is one of the most prevalent explosives in landmines, and also one of the most difficult explosives to detect. TNT possesses 18 resonant frequencies, 12 of which are clustered in the range of 700–900 kHz. Consequently, AM radio transmitters strongly interfere with TNT-QR detection in the

Signal and Image Processing for Remote Sensing

144 x(n)

e−j2πF0n

×

Low-pass filter x(n) RFI Mitigation y(n) Filtering

Detector

Decision

b

H0 H1

FIGURE 7.3 Signal processing block diagram for QR signal detection. F0 is the resonant QR frequency for the explosive of interest and denotes a complex-valued signal. (From Tan et al., IEEE Transactions on Geoscience and Remote Sensing 43(7), 1507–1516, 2005. With permission.)

field, and are the primary source of RFI. Since there may be several AM radio transmitters operating simultaneously, the baseband (after demodulation) RFI signal measured by the QR system in each segment may be modeled as ~I (n) ¼

M X

~ m (n)ej(2pfm nþfm ) , A

n ¼ 0, . . . , N 1

(7:61)

m¼1

where the superscript denotes a complex value and we assume the frequencies are distinct, meaning fi 6¼ fj for i 6¼ j. For the RFI, Am(n) is the discrete time series of the message signal from an AM transmitter, which may be a nonstationary speech or music signal from a commercial AM radio station. The statistics of this signal, however, can be assumed to remain essentially constant over the short time intervals during which data are collected. For time intervals of this order, it is reasonable to assume Am(n) is constant for each data segment, but may change from segment to segment. Therefore, Equation 7.61 may be expressed as ~I (n) ¼

M X

~ m ej(2pfm nþfm ) , A

n ¼ 0, . . . , N 1

(7:62)

m¼1

This model represents all frequencies even though each of the QR signals exists in a very narrow band. In practice, only the frequencies corresponding to the QR signals need be considered.

7.5.3

Postmitigation Signals

The applicability of the postmitigation signal models described previously is demonstrated by examining measured QR and RFI signals.

Kalman Filtering for Weak Signal Detection in Remote Sensing

145

7.5.3.1 Postmitigation Quadrupole Resonance Signal An example simulated QR signal before and after RFI mitigation is shown in Figure 7.4. Although the QR signal is a DC constant prior to RFI mitigation, that is no longer true after mitigation. The first-order Gaussian–Markov model introduced previously is an appropriate model for postmitigation QR signal. The value of s~«2 ¼ 0.1 is estimated from the data.

7.5.3.2 Postmitigation Background Noise Although RFI mitigation can remove most of the RFI, some residual RFI remains in the postmitigation signal. The background noise remaining after RFI mitigation, ~v(n), consists of the residual RFI and the remaining sensor noise, which is altered by the mitigation. An example power spectrum of ~ v(n) derived from experimental data is shown in Figure 7.5. The power spectrum contains numerous peaks and valleys. Thus, the AR and ARMA models discussed previously are appropriate for modeling the residual background noise.

7.5.4

Kalman Filters for Quadrupole Resonance Detection

7.5.4.1 Conventional Kalman Filter For the system described by Equation 7.14 and Equation 7.15, the variables in the ~ (k), uk ¼ ~v(k), and zk ¼ ~y(k), and the conventional Kalman filter are xk ¼ ~s(k), wk ¼ w coefficient and covariance matrices in the Kalman filter are F ¼ [1], G ¼ [1], H ¼ [1], and Q ¼ [sw2]. The covariance R is estimated from the data, and the off-diagonal elements are set to 0.

12

10

Amplitude

8 Pre RFI-MIT: Real Part Post RFI-MIT: Real Part Pre RFI-MIT: Imag Part Post RFI-MIT: Imag Part

6

4

2

0 −2

5

10

15

20

25

30

35

40

45

50

55

Samples FIGURE 7.4 Example realization of the the simulated QR signal before and after RFI mitigation. (From Tan et al., IEEE Transactions on Geoscience and Remote Sensing 43(7), 1507–1516, 2005. With permission.)

Signal and Image Processing for Remote Sensing

146 160

Power spectral density (dB)

140

120

100

80

60

40

20 −4

−3

−2

−1

0 Frequency

1

2

3

4

FIGURE 7.5 Power spectrum of the remaining background noise n~(n) after RFI mitigation. The x-axis units are normalized frequency ([p,p]). (From Tan et al., IEEE Transactions on Geoscience and Remote Sensing 43(7), 1507–1516, 2005. With permission.)

7.5.4.2

Kalman Filter with an Autoregressive Model for Colored Noise

The multi-segment Burg algorithms are used to estimate the AR parameters, and the simulation results do not show significant differences between AVA, AVK, and SBurg algorithms. The AVA algorithm was chosen to estimate the AR coefficients a˜p and error power s~«2 describing the background noise for each measurement. The optimal order, as determined by the AIC, is 6. 7.5.4.3

Kalman Filter for Colored Noise

For this Kalman filter, the coefficient and covariance matrices are the same as for the conventional Kalman filter, with the exception of the observation noise covariance R. In this filter, all elements of R are retained, as opposed to the conventional Kalman filter in which only the diagonal elements are retained. 7.5.4.4 Indirect Signal Estimation For the adaptive Kalman filter in the first step, all coefficient and covariance matrices, F, G, H, Q, R, and S0, are the same under both H1 and H0. The practical value of x0 is given by [~ y(0) ~ y(1) ~y(P 1)]T

(7:63)

where P is the order of the AR model representing ~v(n). For the conventional Kalman filter in the second step, F, G, H, Q, and S0 are the same under both H1 and H0; however, the ^ , depends on the postmitigation signal and estimated observation noise covariance, R therefore is different under the two hypotheses.

Kalman Filtering for Weak Signal Detection in Remote Sensing

7.6

147

Performance Evaluation

The proposed Kalman filter methods are evaluated on experimental data collected in the field by QM. A typical QR signal consists of multiple sinusoids where the amplitude and frequency of each resonant line are the parameters of interest. Usually, the amplitude of only one resonant line is estimated at a time, and the frequency is known a priori. For the baseband signal demodulated at the desired resonant frequency, adaptive RFI mitigation is employed. A 1-tap LMS mitigation algorithm is applied to each frequency component of the frequency-domain representation of the experimental data [23]. First, performance is evaluated for synthetic QR data. In this case, the RFI data are measured experimentally in the absence of an explosive material, and an ideal QR signal (complex-valued DC signal) is injected into the measured RFI data. Second, performance is evaluated for experimental QR data. In this case, the data are measured for both explosive and nonexplosive samples in the presence of RFI. The matched filter is employed for the synthetic QR data, while the energy detector is utilized for the measured QR data.

7.6.1

Detection Algorithms

After an estimate of the QR signal has been obtained, a detection algorithm must be applied to determine whether or not the QR signal is present. For this binary decision problem, both an energy detector and a matched filter are applied. The energy detector simply computes the energy in the QR signal estimate, ~^s(n) Es ¼

N 1 X

j~^s(n)j2

(7:64)

n¼0

As it is a simple detection algorithm that does not incorporate any prior knowledge of the QR signal characteristics, it is easy to compute. The matched filter computes the detection statistic ( l ¼ Re

N 1 X

^~s(n)S ~

) (7:65)

n¼0

~ is the reference signal, which, in this application, is the known QR signal. The where S matched filter is optimal only if the reference signal is precisely known a priori. Thus, if there is uncertainty regarding the resonant frequency of the QR signal, the matched filter will no longer be optimal. QR resonant frequency uncertainty may arise due to variations in environmental parameters, such as temperature.

7.6.2

Synthetic Quadrupole Resonance Data

Prior to detection using the matched filter, the QR signal is estimated directly. Three Kalman filter approaches to estimate the QR signal are considered: the conventional Kalman filter, the extended Kalman filter, and the Kalman filter for arbitrary colored noise. Each of these filters requires the initial value of the system state, x0, and the selection of the initial value may affect the estimate of x. The sample mean of the observation ~ y(n) is used to set x0. Since only the steady-state output of the Kalman filter is reliable,

148

Signal and Image Processing for Remote Sensing

the first 39 points are removed from the data used for detection to ensure the system has reached steady state. Although the measurement noise is known to be colored, the conventional Kalman filter (conKF) designed for white noise is applied to estimate the QR signal. This algorithm has the benefits of being simpler with lower computational burden than either of the other two Kalman filters proposed for direct estimation of the QR signal. Although its assumptions regarding the noise structure are recognized as inaccurate, its performance can be used as a benchmark for comparison. For a given process noise covariance Q, the covariance of the intial state x0, S0, affects the speed with which S reaches steady state in the conventional Kalman filter. As Q increases, it takes less time for S to converge. However, the steady-state value of S also increases. In these simulations, S0 ¼ 10 in the conventional Kalman filter. The second approach applied to estimate the QR signal is the Kalman filter with an AR model for the colored noise (extKF). Since R ¼ 0 shuts down the Kalman filter it does not track the signal, we set R ¼ 10. It is shown that the detection performance does not decrease when Q increases. Finally, the Kalman filter for colored noise (arbKF) is applied to the synthetic data. Compared to the conventional Kalman filter, the Kalman filter for arbitrary noise has a smaller Kalman gain, and therefore, slower convergence speed. Consequently, the error covariances for the Kalman filter for arbitrary noise are larger. Since only the steady state is used for detection, S0 ¼ 50 is chosen for Q ¼ 0.1 and S0 ¼ 100 is chosen for Q ¼ 1 and Q ¼ 10. Results for each of these Kalman filtering methods for different values of Q are presented in Figure 7.6. When Q is small (Q ¼ 0.1 and Q ¼ 1), all three methods have similar detection performance. However, when greater model error is introduced in the state equation (Q ¼ 10) both the conventional Kalman filter and the Kalman filter for colored noise have poorer detection performance than the Kalman filter with an AR model for the noise. Thus, the Kalman filter with an AR model for the noise shows robust performance. Considering the computational efficiency, the Kalman filter for colored noise is the poorest because it recursively estimates C and V for each k. In this application, although the measurement noise ~ vk is colored, the diagonal elements of the covariance matrix dominate. Therefore, the conventional Kalman filter is preferable to the Kalman filter for colored noise. 7.6.3

Measured Quadrupole Resonance Data

Indirect estimation of the QR signal is validated using two groups of real data collected both with and without an explosive present. The two explosives for which measured QR data are collected, denoted Type A and Type B, are among the more common explosives found in landmines. It is well known that the Type B explosive is the more challenging of the two explosives to detect. The first group, denoted Data II, has Type A explosive, and the second group, denoted Data III, has both Type A and Type B explosives. For each data group, a 50–50% training–testing strategy is employed. Thus, 50% of the data are used to estimate the coefficient and covariance matrices, and the remaining 50% of the data are used to test the algorithm. The AVA algorithm is utilized to estimate the AR parameters from the training data. Table 7.1 lists the four training–testing strategies considered. For example, if there are 10 measurements and measurements 1–5 are used for training and measurements 6–10 are used for testing, then this is termed ‘‘first 50% training, last 50% testing.’’ If the training–testing strategy measurements 1, 3, 5, 7, 9 are used for training, and the other measurements for testing, this is termed ‘‘odd 50% training, even 50% testing.’’

Kalman Filtering for Weak Signal Detection in Remote Sensing (b) Data II-2

1

1

0.9

0.9

0.8

0.8 Probability of detection

Probability of detection

(a) Data II-1

0.7 0.6 0.5 0.4 Pre KF Post conKF: Q=0.1 Post arbKF: Q=0.1 Post extKF: Q=0.1

0.3 0.2

0.7 0.6 0.5 0.4 Pre KF Post conKF: Q=1 Post arbKF: Q=1 Post extKF: Q=1

0.3 0.2

0.1 0

149

0.1 0

0.2 0.4 0.6 0.8 Probability of false alarm

1

0

0

0.2

0.4 0.6 0.8 Probability of false alarm

1

(c) Data II-3 1 0.9

Probability of detection

0.8 0.7 0.6 0.5 0.4 Pre KF Post conKF: Q=10 Post arbKF: Q=10 Post extKF: Q=10

0.3 0.2 0.1 0

0

0.2

0.4 0.6 0.8 Probability of false alarm

1

FIGURE 7.6 Direct estimation of QR signal tested on Data I (synthetic data) with different noise model variance Q. (From Tan et al., IEEE Transactions on Geoscience and Remote Sensing 43(7), 1507–1516, 2005. With permission.)

Figure 7.7 and Figure 7.8 present the performance of indirect QR signal estimation followed by an energy detector. The data are measured for both explosive and nonexplosive samples in the presence of RFI. The two different explosive compounds are referred to as Type A explosive and Type B explosive. Data II and Data III-1 are Type A explosive and Data III-2 is Type B explosive. Indirect QR signal estimation provides almost perfect TABLE 7.1 Training–testing strategies for Data II and Data III Training

Testing

First 50% Last 50% Odd 50% Even 50%

Last 50% First 50% Even 50% Odd 50%

Signal and Image Processing for Remote Sensing

150

1

1

0.9

0.9

0.8

0.8

Probability of detection

(b) Data II-2

Probability of detection

(a) Data II-1

0.7 0.6 0.5 0.4

2 Step KF: First/Last 2 Step KF: Last/First 2 Step KF: Odd/Even 2 Step KF: Even/Odd Pre KF

0.3 0.2 0.1 0

0

0.2 0.4 0.6 0.8 Probability of false alarm

0.7 0.6 0.5 0.4

2 Step KF: First/Last 2 Step KF: Last/First 2 Step KF: Odd/Even 2 Step KF: Even/Odd Pre KF

0.3 0.2 0.1

1

0

0

0.2 0.4 0.6 0.8 Probability of false alarm

1

(c) Data II-3 1

Probability of detection

0.9 0.8 0.7 0.6 0.5 0.4

2 Step KF: First/Last 2 Step KF: Last/First 2 Step KF: Odd/Even 2 Step KF: Even/Odd Pre KF

0.3 0.2 0.1 0

0

0.2 0.4 0.6 0.8 Probability of false alarm

1

FIGURE 7.7 Indirect estimation of QR signal tested on Data II (true data). Four different training–testing strategies are plotted together. (From Tan et al., IEEE Transactions on Geoscience and Remote Sensing 43(7), 1507–1516, 2005. With permission.)

detection for the Type A explosive. Although detection is not near-perfect for the Type B explosive, the detection performance following the application of the Kalman filter is better than the performance prior to applying the Kalman filter.

7.7

Summary

The detectability of weak signals in remote sensing applications can be hindered by the presence of interference signals. In situations where it is not possible to record the measurement without the interference, adaptive filtering is an appropriate method to mitigate the interference in the measured signal. Adaptive filtering, however, may not remove all the interference from the measured signal if the reference signals are not

Kalman Filtering for Weak Signal Detection in Remote Sensing

1

1

0.9

0.9

0.8

0.8

Probability of detection

(b) Data III-2

Probability of detection

(a) Data III-1

0.7 0.6 0.5 0.4

2 Step KF: First/Last 2 Step KF: Last/First 2 Step KF: Odd/Even 2 Step KF: Even/Odd Pre KF

0.3 0.2 0.1 0

0

0.2 0.4 0.6 0.8 Probability of false alarm

151

0.7 0.6 0.5 0.4

2 Step KF: First/Last 2 Step KF: Last/First 2 Step KF: Odd/Even 2 Step KF: Even/Odd Pre KF

0.3 0.2 0.1

1

0

0

0.2

0.4 0.6 0.8 Probability of false alarm

1

FIGURE 7.8 Indirect estimation of QR signal tested on Data III (true data). Four different training–testing strategies are plotted together. (From Tan et al., IEEE Transactions on Geoscience and Remote Sensing 43(7), 1507–1516, 2005. With permission.)

completely correlated with the primary measured signal. One approach for subsequent processing to detect the signal of interest when residual interference remains after the adaptive noise cancellation is Kalman filtering. An accurate signal model is necessary for Kalman filters to perform well. It is so critical that even small deviations may cause very poor performance. The harmonic signal of interest may be sensitive to the external environment, which may then restrict the signal model accuracy. To overcome this limitation, an adaptive two-step algorithm, employing Kalman filters, is proposed to estimate the signal of interest indirectly. The utility of this approach is illustrated by applying it to QR signal estimation for landmine detection. QR technology provides promising explosive detection efficiency because it can detect the ‘‘fingerprint’’ of explosives. In applications such as humanitarian demining, QR has proven to be highly effective if the QR sensor is not exposed to RFI. Although adaptive RFI mitigation removes most of RFI, additional signal processing algorithms applied to the postmitigation signal are still necessary to improve landmine detection. Indirect signal estimation is compared to direct signal estimation using Kalman filters and is shown to be more effective. The results of this study indicate that indirect QR signal estimation provides robust detection performance.

References 1. B.D.O. Anderson and J.B. Moore, Optimal Filtering, Prentice-Hall, Englewood Cliffs, NJ, 1979. 2. S.J. Norton and I.J. Won, Identification of buried unexploded ordnance from broadband electromagnetic induction data, IEEE Transactions on Geoscience and Remote Sensing, 39(10), 2253– 2261, 2001. 3. K. O’Neill, Discrimination of UXO in soil using broadband polarimetric GPR backscatter, IEEE Transactions on Geoscience and Remote Sensing, 39(2), 356–367, 2001.

152

Signal and Image Processing for Remote Sensing

4. N. Garroway, M.L. Buess, J.B. Miller, B.H. Suits, A.D. Hibbs, G.A. Barrall, R. Matthews, and L.J. Burnett, Remote sensing by nuclear quadrupole resonance, IEEE Transactions on Geoscience and Remote Sensing, 39(6), 1108–1118, 2001. 5. J.A.S. Smith, Nuclear quadrupole resonance spectroscopy, general principles, Journal of Chemical Education, 48, 39–49, 1971. 6. S. Haykin, Adaptive Filter Theory, Prentice-Hall, Englewood Cliffs, NJ, 1991. 7. C.F. Cowan and P.M. Grant, Adaptive Filters, Prentice-Hall, Englewood Cliffs, NJ, 1985. 8. G.E.P. Box and G.M. Jenkins, Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco, CA, 1970. 9. B. Picinbono and P. Bondon, Second-order statistics of complex signals, IEEE Transactions on Signal Processing, SP-45(2), 411–420, 1979. 10. P.M.T. Broersen, The ABC of autoregressive order selection criteria, Proceedings of SYSID SICE, 231–236, 1997. 11. S. Haykin, B.W. Currie, and S.B. Kesler, Maximum-entropy spectral analysis of radar clutter, Proceedings of the IEEE, 70(9), 953–962, 1982. 12. A.A. Beex and M.D.A. Raham, On averaging Burg spectral estimators for segments, IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-34, 1473–1484, 1986. 13. D. Tjostheim and J. Paulsen, Bias of some commonly-used time series estimates, Biometrika, 70(2), 389–399, 1983. 14. S. de Waele and P.M.T. Broersen, The Burg algorithm for segments, IEEE Transactions on Signal Processing, SP-48, 2876–2880, 2000. 15. H. Akaike, A new look at the statistical model identification, IEEE Transactions on Automatic Control, AC-19(6), 716–723, 1974. 16. M. Wax and T. Kailath, Detection of signals by information theoretic criteria, IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-33(2), 387–392, 1985. 17. Y. Tan, S.L. Tantum, and L.M. Collins, Kalman filtering for enhanced landmine detection using quadrupole resonance, IEEE Transactions on Geoscience and Remote Sensing, 43(7), 1507– 1516, 2005. 18. J.D. Gibson, B. Koo, and S.D. Gray, Filtering of colored noise for speech enhancement and coding, IEEE Transactions on Signal Processing, 39(8), 1732–1742, 1991. 19. A.P. Sage and J.L. Melsa, Estimation Theory with Applications to Communications and Control, McGraw-Hill, New York, 1971. 20. P.S. Maybeck, Stochastic Models, Estimation, and Control, Academic Press, New York, 1979. 21. X.R. Li, C. Han, and J. Wang, Discrete-time linear filtering in arbitrary noise, Proceedings of the 39th IEEE Conference on Decision and Control, 2000, 1212–1217, 2000. 22. R.K. Mehra, Approaches to adaptive filtering, IEEE Transactions on Automatic Control, AC-17, 693–398, 1972. 23. Y. Tan, S.L. Tantum, and L.M. Collins, Landmine detection with nuclear quadrupole resonance, In IGARSS 2002 Proceedings, 1575–1578, July 2002.

8 Relating Time-Series of Meteorological and Remote Sensing Indices to Monitor Vegetation Moisture Dynamics

J. Verbesselt, P. Jo¨nsson, S. Lhermitte, I. Jonckheere, J. van Aardt, and P. Coppin

CONTENTS 8.1 Introduction ....................................................................................................................... 153 8.2 Data ..................................................................................................................................... 155 8.2.1 Study Area ............................................................................................................. 155 8.2.2 Climate Data.......................................................................................................... 156 8.2.3 Remote Sensing Data ........................................................................................... 157 8.3 Serial Correlation and Time-Series Analysis................................................................ 158 8.3.1 Recognizing Serial Correlation........................................................................... 158 8.3.2 Cross-Correlation Analysis ................................................................................. 160 8.3.3 Time-Series Analysis: Relating Time-Series and Autoregression ................ 162 8.4 Methodology...................................................................................................................... 163 8.4.1 Data Smoothing .................................................................................................... 163 8.4.2 Extracting Seasonal Metrics from Time-Series and Statistical Analysis ..... 165 8.5 Results and Discussion .................................................................................................... 166 8.5.1 Temporal Analysis of the Seasonal Metrics ..................................................... 166 8.5.2 Regression Analysis Based on Values of Extracted Seasonal Metrics......... 167 8.5.3 Time-Series Analysis Techniques ...................................................................... 168 8.6 Conclusions........................................................................................................................ 169 Acknowledgments ..................................................................................................................... 170 References ................................................................................................................................... 170

8.1

Introduction

The repeated occurrence of severe wildfires, which affect various fire-prone ecosystems of the world, has highlighted the need to develop effective tools for monitoring fire-related parameters. Vegetation water content (VWC), which influences the biomass burning processes, is an example of one such parameter [1–3]. The physical definitions of VWC vary from water volume per leaf or ground area (equivalent water thickness) to water mass per mass of vegetation [4]. Therefore, VWC could also be used to infer vegetation water stress and to assess drought conditions that linked with fire risk [5]. Decreases in VWC due to the seasonal decrease in available soil moisture can induce severe fires in 153

Signal and Image Processing for Remote Sensing

154

most ecosystems. VWC is particularly important for determining the behavior of fires in savanna ecosystems because the herbaceous layer becomes especially flammable during the dry season when the VWC is low [6,7]. Typically, VWC in savanna ecosystems is measured using labor-intensive vegetation sampling. Several studies, however, indicated that VWC can be characterized temporally and spatially using meteorological or remote sensing data, which could contribute to the monitoring of fire risk [1,4]. The meteorological Keetch–Byram drought index (KBDI) was selected for this study. This index was developed to incorporate soil water content in the root zone of vegetation and is able to assess the seasonal trend of VWC [3,8]. The KBDI is a cumulative algorithm for the estimation of fire potential from meteorological information, including daily maximum temperature, daily total precipitation, and mean annual precipitation [9,10]. The KBDI also has been used for the assessment of VWC for vegetation types with shallow rooting systems, for example, the herbaceous layer of the savanna ecosystem [8,11]. The application of drought indices, however, presents specific operational challenges. These challenges are due to the lack of meteorological data for certain areas, as well as spatial interpolation techniques that are not always suitable for use in areas with complex terrain features. Satellite data provide sound alternatives to meteorological indices in this context. Remotely sensed data have significant potential for monitoring vegetation dynamics at regional to global scale, given the synoptic coverage and repeated temporal sampling of satellite observations (e.g., SPOT VEGETATION or NOAA AVHRR) [12,13]. These data have the advantage of providing information on remote areas where ground measurements are impossible to obtain on a regular basis. Most research in the scientific community using optical sensors (e.g., SPOT VEGETATION) to study biomass burning has focused on two areas [4]: (1) the direct estimation of VWC and (2) the estimation of chlorophyll content or degree of drying as an alternative to the estimation of VWC. Chlorophyll-related indices are related to VWC based on the hypothesis that the chlorophyll content of leaves decreases proportionally to the VWC [4]. This assumption has been confirmed for selected species with shallow rooting systems (e.g., grasslands and understory forest vegetation) [14–16], but cannot be generalized to all ecosystems [4]. Therefore, chlorophyll-related indices, such as the normalized difference vegetation index (NDVI), only can be used in regions where the relationship among chlorophyll content, degree of curing, and water content has been established. Accordingly, a remote sensing index that is directly coupled to the VWC is used to investigate the potential of hyper-temporal satellite imagery to monitor the seasonal vegetation moisture dynamics. Several studies [4,16–18] have demonstrated that VWC can be estimated directly through the normalized difference of the near infrared reflectance (NIR, 0.78–0.89 mm) rNIR, influenced by the internal structure and the dry matter, and the shortwave infrared reflectance (SWIR, 1.58–1.75 mm) rSWIR, influenced by plant tissue water content: NDWI ¼

rNIR rSWIR rNIR þ rSWIR

(8:1)

The NDWI or normalized difference infrared index (NDII) [19] is similar to the global vegetation moisture index (GVMI) [20]. The relationship between NDWI and KBDI time-series, both related to VWC dynamics, is explored. Although the value of time-series data for monitoring vegetation moisture dynamics has been firmly established [21], only a few studies have taken serial correlation into account when correlating time-series [6,22–25]. Serial correlation occurs when data collected through time contain values at time t, which are correlated with observations at

Relating Time-Series of Meteorological and Remote Sensing Indices to Monitor Vegetation 155 time t – 1. This type of correlation in time-series, when related to VWC dynamics, is mainly caused by the seasonal variation (dry–wet cycle) of vegetation [26]. Serial correlation can be used to forecast future values of the time-series by modeling the dependence between observations but affects correlations between variables measured in time and violates the basic regression assumption of independence [22]. Correlation coefficients of serially correlated data cannot be used as indicators of goodness-of-fit of a model as the correlation coefficients are artificially inflated [22,27]. The study of the relationship between NDWI and KBDI is a nontrivial task due to the effect of serial correlation. Remedies for serial correlation include sampling or aggregating the data over longer time intervals, as well as further modeling, which can include techniques such as weighted regression [25,28]. However, it is difficult to account for serial correlation in time-series related to VWC dynamics using extended regression techniques. The time-series related to VWC dynamics often exhibit high non-Gaussian serial correlation and are more significantly affected by outliers and measurement errors [28]. A sampling technique therefore is proposed, which accounts for serial correlation in seasonal time-series, to study the relationship between different time-series. The serial correlation effect in time-series is assumed to be minimal when extracting one metric per season (e.g., start of the dry season). The extracted seasonal metrics are then utilized to study the relationship between time-series at a specific moment in time (e.g., start of the dry season). The aim of this chapter is to address the effect of serial correlation when studying the relationship between remote sensing and meteorological time-series related to VWC by comparing nonserially correlated seasonal metrics from time-series. This chapter therefore has three defined objectives. Firstly, an overview of time-series analysis techniques and concepts (e.g., stationarity, autocorrelation, ARIMA, etc.) is presented and the relationship between time-series is studied using cross-correlation and ordinary least square (OLS) regression analysis. Secondly, an algorithm for the extraction of seasonal metrics is optimized for satellite and meteorological time-series. Finally, the temporal occurrence and values of the extracted nonserially correlated seasonal metrics are analyzed statistically to define the quantitative relationship between NDWI and KBDI time-series. The influence of serial correlation is illustrated by comparing results from cross-correlation and OLS analysis with the results from the investigation of correlation between extracted metrics.

8.2 8.2.1

Data Study Area

The Kruger National Park (KNP), located between latitudes 238S and 268S and longitudes 308E and 328E in the low-lying savanna of the northeastern part of South Africa, was selected for this study (Figure 8.1). Elevations range from 260 to 839 m above sea level, and mean annual rainfall varies between 350 mm in the north and 750 mm in the south. The rainy season within the annual climatic season can be confined to the summer months (i.e., November to April), and over a longer period can be defined by alternating wet and dry seasons [7]. The KNP is characterized by an arid savanna dominated by thorny, fine-leafed trees of the families Mimosaceae and Burseraceae. An exception is the northern part of the KNP where the Mopane, a broad-leafed tree belonging to the Ceasalpinaceae, almost completely dominates the tree layer.

Signal and Image Processing for Remote Sensing

156

Punda Maria N

Shingwedzi

Letaba

Satara

Pretoriuskop Onder Sabie 0 10 20 30 40 50 km FIGURE 8.1 The Kruger National Park (KNP) study area with the weather stations used in the analysis (right). South Africa is shown with the borders of the provinces and the study area (top left).

8.2.2

Climate Data

Climate data from six weather stations in the KNP with similar vegetation types were used to estimate the daily KBDI (Figure 8.1). KBDI was derived from daily precipitation and maximum temperature data to estimate the net effect on the soil water balance [3]. Assumptions in the derivation of KBDI include a soil water capacity of approximately 20 cm and an exponential moisture loss from the soil reservoir. KBDI was initialized during periods of rainfall events (e.g., rainy season) that result in soils with maximized field capacity and KBDI values of zero [8]. The preprocessing of KBDI was done using the method developed by Janis et al. [10]. Missing daily maximum temperatures were replaced with interpolated values of daily maximum temperatures, based on a linear interpolation function [30]. Missing daily precipitation, on the other hand, was assumed to be zero. A series of error logs were automatically generated to indicate missing precipitation values and associated estimated daily KBDI values. This was done because zeroing missing precipitation may lead to an increased fire potential bias in KBDI. The total percentage of missing data gaps in rainfall and temperature series was maximally 5% during the study period for each of the six weather stations. The daily KBDI time-series were transformed into 10-daily KBDI series, similar to the SPOT VEGETATION S10 dekads (i.e., 10-day periods), by taking the maximum of

− 600

− 400 −KBDI

NDWI −0.3 −0.2 −0.1 0.0

0.1

0.2

NDWI −KBDI

− 200

0

Relating Time-Series of Meteorological and Remote Sensing Indices to Monitor Vegetation 157

1999

2000

2001

2002

2003

FIGURE 8.2 The temporal relationship between NDWI and KBDI time-series for the ‘‘Satara’’ weather station (Figure 8.1).

each dekad. The negative of the KBDI time-series (i.e., –KBDI) was analyzed in this chapter such that the temporal dynamics of KBDI and NDWI were related (Figure 8.2). The –KBDI and NDWI are used throughout this chapter. The Satara weather station, centrally positioned in the study area, was selected to represent the temporal vegetation dynamics. The other weather stations in the study area demonstrate similar temporal vegetation dynamics.

8.2.3

Remote Sensing Data

The data set used is composed of 10-daily SPOT VEGETATION (SPOT VGT) composites (S10 NDVI maximum value syntheses) acquired over the study area for the period April 1998 to December 2002. SPOT VGT can provide local to global coverage on a regular basis (e.g., daily for SPOT VGT). The syntheses result in surface reflectance in the blue (0.43–0.47 mm), red (0.61–0.68 mm), NIR (0.78–0.89 mm), and SWIR (1.58–1.75 mm) spectral regions. Images were atmospherically corrected using the simplified method for atmospheric correction (SMAC) [30]. The geometrically and radiometrically corrected S10 images have a spatial resolution of 1 km. The S10 SPOT VGT time-series were preprocessed to detect data that erroneously influence the subsequent fitting of functions to time-series, necessary to define and extract metrics [6]. The image preprocessing procedures performed were: .

Data points with a satellite viewing zenith angle (VZA) above 508 were masked out as pixels located at the very edge of the image (VZA > 50.58) swath are affected by re-sampling methods that yield erroneous spectral values.

.

The aberrant SWIR detectors of the SPOT VGT sensor, flagged by the status mask of the SPOT VGT S10 synthesis, also were masked out. A data point was classified as cloud-free if the blue reflectance was less than 0.07 [31]. The developed threshold approach was applied to identify cloud-free pixels for the study area.

.

NDWI time-series were derived by selecting savanna pixels, based on the land cover map of South Africa [32], for a 3 3 pixel window centered at each of the meteorological

Signal and Image Processing for Remote Sensing

158

stations to reduce the effect of potential spatial misregistration (Figure 8.1). Median values of the 9-pixel windows were then retained instead of single pixel values [33]. The median was preferred to average values as it is less affected by extreme values and therefore is less sensitive to potentially undetected data errors.

8. 3

Serial C orrelation and Ti me-Series A nalysis

Serial correlation affects correlations between variables measured in time, and violates the basic regression assumption of independence. Techniques that are used to recognize serial correlation therefore are discussed by applying them to the NDWI and –KBDI time-series. Cross-correlation analysis is illustrated and used to study the relationship between time-series of –KBDI and NDWI. Fundamental time-series analysis concepts (e.g., stationarity and seasonality) are introduced and a brief overview is presented of the most frequently used method for time-series analysis to account for serial correlation, namely autoregression(AR).

8.3.1

Recognizing Serial Correlation

This chapter focuses on discrete time-series, which contain observations made at discrete time intervals (e.g., 10 daily time steps of –KBDI and NDWI time-series). Time-series are defined as a set of observations, xt, recorded at a specific time, t [26]. Time-series of – KBDI and NDWI contain a seasonal variation which is illustrated in Figure 8.2 by a smooth increase or decrease of the series related to vegetation moisture dynamics. The gradual increase or decrease of the graph of a time-series is generally indicative of the existence of a form of dependence or serial correlation among observations. The presence of serial correlation systematically biases regression analysis when studying the relationship between two or more time-series [25]. Consider the OLS regression line with a slope and an intercept: Y(t) ¼ a0 þ a1 X(t) þ e(t)

(8:2)

where t is time, a0 and a1 are the respective OLS regression intercept and slope parameter, Y(t) the dependent variable, X(t) the independent variable, and e(t) the random error term. The standard error(SE) of each parameter is required for any regression model to define the confidence interval(CI) and derive the significance of parameters in the regression equation. The parameters a0, a1, and the CIs, estimated by minimizing the sum of the squared ‘‘residuals’’ are valid only if certain assumptions related to the regression and e(t) are met [25]. These assumptions are detailed in statistical textbooks [34] but are not always met or explicitly considered in real-world applications. Figure 8.3 illustrates the biased CIs of the OLS regression model at a 95% confidence level. The SE term of the regression model is underestimated due to serially correlated residuals and explains the biased confidence interval, where CI ¼ mean + 1.96 SE. The Gauss–Markov theorem states that the OLS parameter estimate is the best linear unbiased estimate (BLUE); that is, all other linear unbiased estimates will have a larger variance, if the error term, e(t), is stationary and exhibits no serial correlation. The Gauss–Markov theorem consequently points to the error term and not to the time-series themselves as the critical consideration [35]. The error term is defined as stationary when

−400 −800

−600

−KBDI

−200

0

Relating Time-Series of Meteorological and Remote Sensing Indices to Monitor Vegetation 159

−0.3

−0.2

−0.1

0.0

0.1

0.2

0.3

NDWI FIGURE 8.3 Result of the OLS regression fit between KBDI and NDWI as dependent and independent variables, respectively, for the Satara weather station (n ¼ 157). Confidence intervals (- - -) at a 95% confidence level are shown, but are ‘‘narrowed’’ due to serial correlation in the residuals.

it does not present a trend and the variance remains constant over time [27]. It is possible that the residuals are serially correlated if one of the dependent or independent variables also is serially correlated, because the residuals constitute a linear combination of both types of variables. Both dependent and independent variables of the regression model are serially correlated (KBDI and NDWI), which explains the serial correlation observed in the residuals. A sound practice used to verify serial correlation in time-series is to perform multiple checks by both graphical and diagnostic techniques. The autocorrelation function (ACF) can be viewed as a graphical measure of serial correlation between variables or residuals. The sample ACF is defined when x1, . . . , xn are observations of a time-series. The sample mean of x1, . . . , xn is [26]: x¼

n 1X xt n t¼1

(8:3)

The sample autocovariance function with lag h and time t is g^(h) ¼ n1

n jhj X

xtþjhj x (xt x),n < h < n

(8:4)

g^(h) ,n < h < n g^(0)

(8:5)

t¼1

The sample ACF is r^ ¼

Figure 8.4 illustrates the ACF for time-series of –KBDI and NDWI presented from the Kruger park data. The ACF clearly indicates a significant autocorrelation in the

Signal and Image Processing for Remote Sensing

−0.4 0.0

ACF, NDWI 0.2 0.4 0.6

0.8

1.0

160

0.0

0.2

0.4

0.6

0.8

1.0

0.6

0.8

1.0

Lag

−0.2 0.0

ACF,−KBDI 0.2 0.4 0.6

0.8

1.0

(a)

0.0

0.2

(b)

0.4 Lag

FIGURE 8.4 The autocorrelation function (ACF) for (a) KBDI and (b) NDWI time-series for the Satara weather station. pﬃﬃﬃ The horizontal lines on the graph are the bounds ¼ 1:96= n (n ¼ 157).

time-series, as more pﬃﬃﬃ than 5% of the sample autocorrelations fall outside the significance bounds ¼ 1:96= n [26]. There are also formal tests available to detect autocorrelation such as the Ljung–Box test statistic and the Durbin–Watson statistic [25,26].

8.3.2

Cross-Correlation Analysis

The cross-correlation function (CCF) can be derived between two time-series utilizing a technique similar to the ACF applied for one time-series [27]. Cross-correlation is a measure of the degree of linear relationship existing between two data sets and can be used to study the connection between time-series. The CCF, however, can only be used if the time-series is stationary [27]. For example, when all variables are increasing in value over time, cross-correlation results will be spurious and subsequently cannot be used to study the relationship between time-series. Nonstationary time-series can be transformed to stationary time-series by implementing one of the following techniques: .

Differencing the time-series by a period d can yield a series that satisfies the assumption of stationarity (e.g., xtxt–1 for d ¼ 1). The differenced series will

Relating Time-Series of Meteorological and Remote Sensing Indices to Monitor Vegetation 161 contain one point less than the original series. Although a time-series can be differenced more than once, one difference is usually sufficient. .

Lower order polynomials can be fitted to the series when the data contains a trend or seasonality that needs to be subtracted from the original series. Seasonal time-series can be represented as the sum of a specified trend, and seasonal and random terms. For example, for statistical interpretation results, it is important to recognize the presence of seasonal components and remove them to avoid confusion with long-term trends. Figure 8.5 illustrates the seasonal trend decompositioning method using locally weighted regression for the NDWI time-series [36].

.

The logarithm or square root of the series may stabilize the variance in the case of a nonconstant variance.

0.1 −0.1

0.1 −0.2 −0.1 0.0

−0.15 −0.10 −0.05 0.00

1999

2000

2001

2002

2003

−0.3

−0.1

0.1

Remainder

Trend

Seasonal

0.2

−0.3

Data

Figure 8.6 illustrates the cross-correlation plot for stationary series of –KBDI and NDWI. –KBDI and NDWI time-series became stationary after differencing with d ¼ 1. The stationarity was confirmed using the ‘‘augmented Dickey–Fuller’’ test for stationarity [26,29] at a confidence level of 95% ( p < 0.01; with stationarity as the alternative hypothesis). Note that approximately 95% confidence limits are shown for the autocorrelation plots of an independent series. These limits must be regarded with caution, since there exists an a priori expectation of serial correlation for time-series [37].

Time (dekad) FIGURE 8.5 The results of the seasonal trend decomposition (STL) technique for NDWI time-series of Satara weather station. The original series can be reconstructed by summing the seasonal, trend, and remainder. In the y-axes the NDWI values are indicated. The gray bars at the right-hand side of the plots illustrate the relative data range of the time-series.

Signal and Image Processing for Remote Sensing

–0.2

FIGURE 8.6 The cross-correlation plot between stationary KBDI and NDWI time-series of the Satara weather station, where CCF indicates results of the cross-correlation function. The horizontal lines on the pﬃﬃﬃ graph are the bounds (¼ 1:96= n) of the approximate 95% confidence interval.

0.0

CCF 0.1 0.2

0.3

0.4

0.5

162

–0.4

–0.2

0.0 Lag

0.2

0.4

Table 8.1 illustrates the coefficients of determination (i.e., multiple R2) of the OLS regression analysis with serially correlated residuals –KBDI as dependent and NDWI as independent variable for all six weather stations in the study area. The Durbin–Watson statistic indicated that the residuals were serially correlated at a 95% confidence level (p < 0.01). These results will be compared with the method presented in Section 8.5. Table 8.1 also indicates the time lags at which correlation between time-series was maximal, as derived from the cross-correlation plot. A negative lag indicates that –KBDI reacts prior to NDWI, for example, in the cases of Punda Maria and Shingwedzi weather stations, and subsequently can be used to predict NDWI. This is logical since weather conditions, for example, rainfall and temperature, change before vegetation reacts. NDWI, which is related to the amount of water in the vegetation, consequently lags behind the –KBDI. The major vegetation type in savanna vegetation is the herbaceous layer, which has a shallow rooting system. This explains why the vegetation in the study area quickly follows climatic changes and NDWI did not lag behind –KBDI for the other four weather stations.

8.3.3

Time-Series Analysis: Relating Time-Series and Autoregression

A remedy for serial correlation, apart from applying variations in sampling strategy, is modeling of the time dependence in the error structure by AR. AR most often is used for

TABLE 8.1 Coefficients of Determination of the OLS Regression Model between KBDI and NDWI (n ¼ 157). Station Punda Maria Letaba Onder Sabie Pretoriuskop Shingwedzi Satara

R2

Time Lag

0.74 0.88 0.72 0.31 0.72 0.81

1 0 0 0 1 0

Note: The time expressed in dekads of maximum correlation of the cross-correlation between –KBDI and NDWI is also indicated.

Relating Time-Series of Meteorological and Remote Sensing Indices to Monitor Vegetation 163 purposes of forecasting and modeling of a time-series [25]. The simplest AR model for Equation 8.2, where r is the result of the sample ACF at lag 1, is et ¼ ret1 þ «t

(8:6)

where «t is a series of serially independent numbers with mean zero and constant variance. The Gauss–Markov theorem cannot be applied and therefore OLS is not an efficient estimator of the model parameters if r is not zero [35]. Many different AR models are available in statistical software systems that incorporate time-series modules. One of the most frequently used models to account for serial correlation is the autoregressive-integrated-moving-average model (ARIMA) [26,37]. Briefly stated, ARIMA models can have an AR term of order p, a differencing (integrating) term (I) of order d, and a moving average (MA) term of order q. The notation for specific models takes the form of (p,d,q) [27]. The order of each term in the model is determined by examining the raw data and plots of the ACF of the data. For example, a second-order AR (p ¼ 2) term in the model would be appropriate if a series has significant autocorrelation coefficients between xt, and xt–1, and xt–2. ARIMA models that are fitted to time-series data using AR and MA parameters, p and q, have coefficients F and u to describe the serial correlation. An underlying assumption of ARIMA models is that the series being modeled is stationary [26–27].

8.4

Methodology

The TIMESAT program is used to extract nonserially correlated metrics from remote sensing and meteorological time-series [38,39]. These metrics are utilized to study the relationship between time-series at specific moments in time. The relationship between time-series, in turn, is evaluated using statistical analysis of extracted nonserially correlated seasonal metrics from time-series (–KBDI and NDWI).

8.4.1

Data Smoothing

It often is necessary to generate smooth time-series from noisy satellite sensors or meteorological data to extract information on seasonality. The smoothing can be achieved by applying filters or by function fitting. Methods based on Fourier series [40–42] or leastsquare fits to sinusoidal functions [43–45] are known to work well in most instances. These methods, however, are not capable of capturing a sudden, steep rise or decrease of remote sensing or meteorological data values that often occur in arid and semiarid environments. Alternative smoothing and fitting methods have been developed to overcome these problems [38]. An adaptive Savitzky–Golay filtering method, implemented in the TIMESAT processing package developed by Jo¨nsson and Eklundh [39], is used in this chapter. The filter is based on local polynomial fits. Suppose we have a time-series (ti, yi), i ¼ 1, 2, . . . , N. For each point i, a quadratic polynomial f (t) ¼ c1 þ c2 t þ c3 t2

(8:7)

Signal and Image Processing for Remote Sensing

164

is fit to all 2kþ1 points for a window from n ¼ i k to m ¼ i þ k by solving the system of normal equations AT Ac ¼ AT b

(8:8)

where 0

wn B wnþ1 B A¼B @ wm

wn tn wnþ1 tnþ1 .. . wm tm

1 0 1 w n yn wn t2n B wnþ1 ynþ1 C wnþ1 t2nþ1 C B C C C and b ¼ B C .. A @ A . 2 w m ym wm tm

(8:9)

The filtered value is set to the value of the polynomial at point i. Weights are designated as w in the above expression, with weights assigned to all of the data values in the window. Data values that were flagged in the preprocessing are assigned weight ‘‘zero’’ in this application and thus do not influence the result. The clean data values all have weights ‘‘one.’’ Residual negatively biased noise (e.g., clouds) may occur for the remote sensing data and accordingly the fitting was performed in two steps [6]. The first fit was conducted using weights obtained from the preprocessing. Data points above the resulting smoothed function from the first fit are regarded more important, and in the second step the normal equations are solved using the weight of these data values, but increased by a factor 2. This multistep procedure leads to a smoothed function that is adapted to the upper envelope of the data (Figure 8.7). Similarly, the ancillary metadata of the meteorological data from the preprocessing also were used in the iterative fitting to the upper envelope of the –KBDI time-series [6]. The width of the fitting window determines the degree of smoothing, but it also affects the ability to follow a rapid change. It is sometimes necessary to locally tighten the window even when the global setting of the window performs well. A typical situation occurs in savanna ecosystems where vegetation, associated remote sensing, and meteorological indices respond rapidly to vegetation moisture dynamics. A small fitting window can be used to capture the corresponding sudden rise in data values. (b) 0.4

0.4

0.2

0.2 NDWI

NDWI

(a)

0

−0.2 −0.4

0

−0.2

0

20

40 60 Time (dekad)

80

100

−0.4 0

20

40 60 80 Time (dekad)

100

FIGURE 8.7 The Savitzky–Golay filtering of NDWI (——) is performed in two steps. Firstly, the local polynomials are fitted using the weights from the preprocessing (a). Data points above the resulting smoothed function (– – –) from the first fit are attributed a greater importance. Secondly, the normal equations are solved with the weights of these data values increased by a factor 2 (b).

Relating Time-Series of Meteorological and Remote Sensing Indices to Monitor Vegetation 165 (b) 0.3

165

0.2

160

0.1

155 NDWI

NDWI

(a)

0

150

−0.1

145

−0.2

140

−0.3

135

−0.4

0

20

40 60 80 Time (dekad)

100

130

0

20

40 60 80 Time (dekad)

100

FIGURE 8.8 The filtering of NDWI (——) in (a) is done with a window that is too large to allow the filtered data (– – –) to follow sudden increases and decreases of underlying data values. The data in the window are scanned and if there is a large increase or decrease, an automatic decrease in the window size will result. The filtering is then repeated using the new locally adapted size (b). Note the improved fit at rising edges and narrow peaks.

The data in the window are scanned and if a large increase or decrease is observed, the adaptive Savitzky–Golay method applied an automatic decrease in the window size. The filtering is then repeated using the new locally adapted size. Savitzky–Golay filtering with and without the adaptive procedure is illustrated in Figure 8.8. In the figure it is shown that the adaptation of the window improves the fit at the rising edges and at narrow seasonal peaks. 8.4.2

Extracting Seasonal Metrics from Time-Series and Statistical Analysis

Four seasonal metrics were extracted for each of the rainy seasons. Figure 8.9 illustrates the different metrics per season for NDWI and KBDI time-series. The beginning of a season, that is, 20% left of the rainy season, is defined from the final function fit as the point in time for which the index value has increased by 20% of the distance between the left minimum level and the maximum. The end of the season is defined in a similar way

(a)

(b) 0.4

0

0.2 −KBDI

NDWI

–200 0

–400

–0.2

–600

–0.4 –0.6 0

20

60 40 Time (dekad)

80

100

–800 0

20

40 60 80 Time (dekad)

100

FIGURE 8.9 The final fit of the Savitzky–Golay function (– – –) to the NDWI (a) and –KBDI (b) series (——), with the four defined metrics, that is, 20% left and right, and 80% left and right (&), overlaid on the graph. Points with flagged data errors (þ) were assigned weights of zero and did not influence the fit. A dekad is defined as a 10-day period.

Signal and Image Processing for Remote Sensing

166

as the point 20% right of the rain season. The 80% left and right points are defined as the points for which the function fit has increased to 80% of the distance between, respectively, the left and right minimum levels and the maximum. The current technique used to define metrics also is used by Verbesselt et al. [6] to define the beginning of the fire season. The temporal occurrence and the value of each metric were extracted for further exploratory statistical analysis to study the relationship between time-series. The SPOT VGT S10 time-series consisted of four seasons (1998–2002) from which four occurrences and values per metric type were extracted. Twenty-four occurrence–value combinations per metric type ultimately were available for further analysis since six weather stations were used. Serial correlation that occurs in remote sensing and climate-based time-series invalidates inferences made by standard parametric tests, such as the Student’s t-test or the Pearson correlation. All extracted occurrence–value combinations per metric type were tested for autocorrelation using the Ljung–Box autocorrelation test [26]. Robust nonparametric techniques, such as the Wilcoxon’s signed rank test were used in case of non-normally distributed data. The normality of the data was verified using the Shapiro–Wilkinson normality test [29]. Firstly, the distribution of the temporal occurrence of each metric was visualized and evaluated based on whether or not there was a significant difference between the temporal occurrence of the four metric types extracted from –KBDI and NDWI time-series. Next, the strength and significance of the relationship between –KBDI and NDWI values of the four metric types were assessed with an OLS regression analysis.

8.5

Results and Discussion

Figure 8.9 illustrates the optimized function fit and the defined metrics for the –KBDI and NDWI. Notice that the Savitzky–Golay function could properly define the behavior of the different time-series. The function was fitted to the upper envelope of the data by using the uncertainty information derived during the preprocessing step. The results of the statistical analysis based on the extracted metrics for –KBDI and NDWI are presented. The Ljung–Box statistic indicated that the extracted occurrences and values were not significantly autocorrelated at a 95% confidence level. All p-values were greater than 0.1, failing to reject the null hypothesis of independence. 8.5.1

Temporal Analysis of the Seasonal Metrics

Figure 8.10 illustrates the temporal distribution of temporal occurrence of extracted metrics from time-series of –KBDI and NDWI. The occurrences of extracted metrics were significantly non-normally distributed at a 95% confidence level (p > 0.1), indicating that the Wilcoxon’s signed rank can be used. The Wilcoxon’s signed rank test showed that –KBDI and NDWI occurrences of the 80% left and right, and 20% right were not significantly different from each other at a 95% confidence level (p > 0.1). This confirmed that –KBDI and NDWI were temporally related. It also corroborated the results of Burgan [11] and Ceccato et al. [4] who found that both –KBDI and NDWI were related to the seasonal vegetation moisture dynamics, as measured by VWC. Figure 8.10, however, illustrates that the start of the rainy season (i.e., 20% left occurrence), derived from the –KBDI and NDWI time-series, was different. The Wilcoxon’s

Relating Time-Series of Meteorological and Remote Sensing Indices to Monitor Vegetation 167 20% right

28

50

30

32

55

34

36

60

38

40

65

20% left

–KBDI

NDWI

–KBDI

80% left

NDWI

38

34

40

36

38

42 44

40

46

42

48

44

50

46

80% right

–KBDI

NDWI

–KBDI

NDWI

FIGURE 8.10 Box plots of the temporal occurrence of the four defined metrics, that is, 20% left and right, and 80% left and right, extracted from time-series of KBDI and NDWI. The dekads (10-day period) are shown on the y-axis and are indicative of the temporal occurrence of the metric. The upper and lower boundaries of the boxes indicate upper and lower quartiles. The median is indicated by the solid line (—) within each box. The whiskers connect the extremes of the data, which were defined as 1.5 times the inter-quartile range. Outliers are represented by (o).

signed rank test confirmed that the –KBDI and NDWI differed significantly from each other at a 95% confidence level (p < 0.01). This phenomenon can be explained by the fact that vegetation in the study area starts growing before the rainy season starts, due to an early change in air temperature (N. Govender, Scientific Service Kruger National Park, South Africa, personal communication). This explained why the NDWI reacted before the change in climatic conditions as measured by the –KBDI, given that the NDWI is directly related to vegetation moisture dynamics [4].

8.5.2

Regression Analysis Based on Values of Extracted Seasonal Metrics

The assumptions of the OLS regression models between values of metrics extracted from –KBDI and NDWI time-series were verified. The Wald test statistic showed nonlinearity to be not significant at a 95% confidence level (p > 0.15). The Shapiro– Wilkinson normality test confirmed that the residuals were normally distributed at a

Signal and Image Processing for Remote Sensing

168

95% confidence level (p < 0.01) [6,29]. Table 8.2 illustrates the results of the OLS regression analysis between values of metrics extracted from –KBDI and NDWI timeseries. The values extracted at the ‘‘20% right’’ position of the –KBDI and NDWI time-series showed a significant relationship at a 95% confidence level (p < 0.01). The other extracted metrics did not exhibit significant relationships at a 95% confidence level (p > 0.1). A significant relationship between the –KBDI and NDWI time-series was observed only at the moment when savanna vegetation was completely cured (i.e., 20% right-hand side). The savanna vegetation therefore reacted differently to changes in climate parameters such as rainfall and temperature, as measured by KBDI, depending on the phenological growing cycle. This phenomenon could be explained because a living plant uses defense mechanisms to protect itself from drying out, while a cured plant responds to climatic conditions [46]. These results consequently indicated that the relationship between extracted values of –KBDI and NDWI was influenced by seasonality. This is in corroboration with the results of Ji and Peters [23], who indicated that seasonality had a significant effect on the relationship between vegetation as measured by a remote sensing index and drought index. These results further illustrated that the seasonal effect needs to be taken into account when regression techniques are used to quantify the relationship between time-series related to vegetation moisture dynamics. The seasonal effect also can be accounted for by utilizing autoregression models with seasonal dummy variables, which take the effect of serial correlation and seasonality into account [23,26]. However, the proposed method to account for serial correlation by sampling at specific moments in time had an additional advantage; the influence of seasonality could be studied by extracting metrics at the specified moments, besides the fact that serial correlation was taken into account. Furthermore, it was shown that serial correlation caused an overestimation of the correlation coefficient is when results from Table 8.1 and Table 8.2 were compared. All the coefficients of determination (R2) of Table 8.1 were significant with an average value of 0.7, while in Table 8.2 only the correlation coefficient at the end of the rainy season (20% right-hand side) was significant (R2 ¼ 0.49). This confirmed the importance of accounting for serial correlation and seasonality in the residuals of a regression model, when studying the relationship between two time-series.

8.5.3

Time-Series Analysis Techniques

Time-series analysis models most often are used for purposes of describing current conditions and forecasting [25]. The models use the serial correlation in time-series as a TABLE 8.2 Coefficients of Determination of the OLS Regression Models (NDWI –KBDI) for the Four Extracted Seasonal Metric Values between –KBDI and NDWI Time-Series (n ¼ 24 per Metric) NDWI KBDI 20% left 20% right 80% left 80% right

R2

p-Values

0.01 0.49 0.00 0.01

0.66 <0.01 0.97 0.61

Relating Time-Series of Meteorological and Remote Sensing Indices to Monitor Vegetation 169 tool to relate temporal observations. Future observations can be predicted by modeling the serial correlation structure of a time-series [26]. Time-series analysis techniques (e.g., ARIMA) can be used to model a time-series by using other, independent timeseries [27]. ARIMA subsequently can be used to study the relationship between –KBDI and NDWI. However, there are constraints that have to be considered before ARIMA models can be applied to a time-series (e.g., drought index) by using other predictor time-series (e.g., satellite index). Firstly, the goodness-of-fit of an ARIMA model will not be significant when changes in the satellite index precede or coincide with those in the drought index. The CCF can be used in this context to verify how time-series are related to each other. Time lag results in Table 8.1 indicate that the –KBDI (drought index) precedes or coincides with the NDWI time-series. This illustrates that ARIMA models cannot directly be used to predict the KBDI, with NDWI as the predictor variable. Consequently, other more advanced timeseries analysis techniques are needed to model vegetation dynamics because they will precede or coincide with the dynamics monitored by remote sensing indices in most of the cases. Such more advanced time-series analysis techniques, however, are not discussed since they are outside the scope of this chapter. Secondly, availability of data is limited for time-series analysis, namely, from 1998 to 2002. This is an important constraint because two separate data sets are needed to parameterize and evaluate an ARIMA model. One set is needed for parameterization, while the other is used to forecast and validate the ARIMA model through comparison of the observed and expected values. Accordingly, it is necessary to interpolate missing satellite data that were masked out during preprocessing to ensure adequate data are available for parameterization. Thirdly, the proposed sampling strategy made investigation of the time lag and correlation at a defined instant in time possible, as opposed to ARIMA or cross-correlation analysis, through which only the overall relationship between time-series can be studied [25]. The applied sampling strategy is thus ideally suited to study the relationship between time-series of climate and remote sensing data, characterized by seasonality and serial correlation. The sampling of seasonal metrics minimized the influence of serial correlation, thereby making the study of seasonality possible.

8.6

Conclusions

Serial correlation problems are not unknown in the field of statistical or general meteorology. However, the presence of serial correlation, found during analysis of a variable sampled sequentially at regular time intervals, seems to be disregarded by many agricultural meteorologists and remote sensing scientists. This is true despite abundant documentation available in the traditional meteorological and statistical literature. Therefore, an overview of the most important time-series analysis techniques and concepts was presented, namely, stationarity, autocorrelation, differencing, decomposition, autoregression, and ARIMA. A method was proposed to study the relationship between a meteorological drought index (KBDI) and remote sensing index (NDWI), both related to vegetation moisture dynamics, by accounting for the serial correlation effect. The relationship between –KBDI and NDWI was studied by extracting nonserially correlated seasonal metrics, for example, 20% and 80% left- and right-hand side metrics of the rainy season, based on a Savitzky–Golay fit to the upper envelope of the time-series. Serial correlation between the

170

Signal and Image Processing for Remote Sensing

extracted metrics was shown to be minimal and seasonality was an important factor influencing the relationship between NDWI and –KBDI time-series. Statistical analysis using the temporal occurrence of the extracted metrics revealed that NDWI and –KBDI time-series are temporally connected, except at the beginning of the rainy season. The fact that the savanna vegetation starts re-greening before the start of the rainy season explains this inability to detect the beginning of the rainy season. The values of the extracted seasonal metrics of NDWI and –KBDI were significantly related only at the end of the rainy season, namely, at the 20% right-hand side value of the fitted curve. The savanna vegetation at the end of the rainy season was cured and responded strongly to changes in climatic conditions monitored by the –KBDI, such as rain and temperature. The relationship between –KBDI and NDWI consequently changes during the season, which indicates that seasonality is an important factor that needs to be taken into account. Moreover, it was shown that correlation coefficients estimated by OLS regression analysis were overestimated due to the influence of serial correlation in the residuals. This confirmed the importance of taking serial correlation of the residuals into account by sampling nonserially correlated seasonal metrics when studying the relationship between time-series. The serial correlation effect consequently was taken into account by the extraction of seasonal metrics from time-series. The seasonal metrics in turn could be used to study the relationship between remote sensing and ground-based time-series, such as meteorological or field measurements. A better understanding of the relationship between remote sensing and in situ observations at regular time intervals will contribute to the use of remotely sensed data for the development of an index that represents seasonal vegetation moisture dynamics.

Acknowledgments The SPOT VGT S10 data sets were generated by the Flemish Institute for Technological Development. The climate data were provided by the Weather Services of South Africa, whereas the National Land Cover Map (1995) was supplied by the Agricultural Research Centre of South Africa. We acknowledge the support of L. Eklundh, as well as the Crafoord and Bergvall foundations. The authors wish to thank N. Govender from the Scientific Services of Kruger National Park for scientific input.

References 1. Camia, A. et al., Meteorological fire danger indices and remote sensing, in Remote Sensing of Large Wildfires in the European Mediterranean Basin, E. Chuvieco, Ed., Springer-Verlag, New York, 1999, p. 39. 2. Ceccato, P. et al., Estimation of live fuel moisture content, in Wildland Fire Danger Estimation and Mapping: The Role of Remote Sensing Data, E. Chuvieco, Ed., World Scientific Publishing, New York, 2003, p. 63. 3. Dennison, P.E. et al., Modeling seasonal changes in live fuel moisture and equivalent water thickness using a cumulative water balance index, Remote Sens. Environ., 88, 442, 2003.

Relating Time-Series of Meteorological and Remote Sensing Indices to Monitor Vegetation 171 4. Ceccato, P. et al., Detecting vegetation leaf water content using reflectance in the optical domain, Remote Sens. Environ., 77, 22, 2001. 5. Jackson, T.J. et al., Vegetation water content mapping using landsat data derived normalized difference water index for corn and soybeans, Remote Sens. Environ., 92, 475, 2004. 6. Verbesselt, J. et al., Evaluating satellite and climate data derived indices as fire risk indicators in savanna ecosystems, IEEE Trans. Geosci. Remote Sens., 44; 1622–1632; 2006. 7. Van Wilgen, B.W. et al., Response of Savanna fire regimes to changing fire-management policies in a large African national park, Conserv. Biol., 18, 1533, 2004. 8. Dimitrakopoulos, A.P. and Bemmerzouk, A.M., Predicting live herbaceous moisture content from a seasonal drought index, Int. J. of Biometeorol., 47, 73, 2003. 9. Keetch, J.J. and Byram, G.M., A drought index for forest fire control, U.S.D.A. Forest Service, Asheville NC, SE-38, 1988. 10. Janis, M.J., Johnson, M.B., and Forthun, G., Near-real time mapping of Keetch–Byram drought index in the south-eastern United States, Int. J. Wildland Fire, 11, 281, 2002. 11. Burgan, E.R., Correlation of plant moisture in Hawaii with the Keetch–Byram drought index, USDA. Forest Service Research Note, PSW-307, 1976. 12. Chuvieco, E. et al., Combining NDVI and surface temperature for the estimation of live fuel moisture content in forest fire danger rating, Remote Sens. Environ., 92, 322, 2004. 13. Maki, M., Ishiahra, M., and Tamura, M., Estimation of leaf water status to monitor the risk of forest fires by using remotely sensed data, Remote Sens. Environ., 90, 441, 2004. 14. Paltridge, G.W. and Barber, J., Monitoring grassland dryness and fire potential in Australia with NOAA AVHRR data, Remote Sens. Environ., 25, 381, 1988. 15. Chladil, M.A. and Nunez, M., Assessing grassland moisture and biomass in Tasmania—the application of remote-sensing and empirical-models for a cloudy environment, Int. J. Wildland Fire, 5, 165, 1995. 16. Hardy, C.C. and Burgan, R.E., Evaluation of NDVI for monitoring live moisture in three vegetation types of the western US, Photogramm. Eng. Remote Sens., 65, 603, 1999. 17. Chuvieco, E. et al., Estimation of fuel moisture content from multitemporal analysis of LANDSAT thematic mapper reflectance data: applications in fire danger assessment, Int. J. Remote Sens., 23, 2145, 2002. 18. Fensholt, R. and Sandholt, I., Derivation of a shortwave infrared water stress index from MODIS near- and shortwave infrared data in a semiarid environment, Remote Sens. Environ., 87, 111, 2003. 19. Hunt, E.R., Rock, B.N., and Nobel, P.S., Measurement of leaf relative water content by infrared reflectance, Remote Sens. Environ., 22, 429, 1987. 20. Ceccato, P., Flasse, S., and Gregoire, J.M., Designing a spectral index to estimate vegetation water content from remote sensing data—Part 2: Validation and applications, Remote Sens. Environ., 82, 198, 2002. 21. Myneni, R.B. et al., Increased plant growth in the northern high latitudes from 1981 to 1991, Nature, 386, 698, 1997. 22. Eklundh, L., Estimating relations between AVHRR NDVI and rainfall in East Africa at 10-day and monthly time scales, Int. J. Remote Sens., 19, 563, 1998. 23. Ji, L. and Peters, A.J., Assessing vegetation response to drought in the northern great plains using vegetation and drought indices, Remote Sens. Environ., 87, 85, 2003. 24. De Beurs, K.M. and Henebry, G.M., Land surface phenology, climatic variation, and institutional change: analyzing agricultural land cover change in Kazakhstan, Remote Sens. Environ., 89, 497, 2004. 25. Meek, D.W. et al., A note on recognizing autocorrelation and using autoregression, Agricult. Forest Meterol., 96, 9, 1999. 26. Brockwell, P.J. and Davis, R.A., Introduction to Time Series and Forecasting, 2nd ed., SpringerVerlag, New York, 2002, p. 31. 27. Ford, C.R. et al., Modeling canopy transpiration using time series analysis: a case study illustrating the effect of soil moisture deficit on Pinus Taeda, Agricult. Forest Meterol., 130, 163, 2005. 28. Montanari, A., Deseasonalisation of hydrological time series through the normal quantile transform, J. Hydrol., 313, 274, 2005.

172

Signal and Image Processing for Remote Sensing

29. R Development Core Team, R: a language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, Austria, ISBN 3-900051-07-0, URL http:// www.R-project.org, 2005. 30. Rahman, H. and Dedieu, G., SMAC—a simplified method for the atmospheric correction of satellite measurements in the solar spectrum, Int. J. Remote Sens., 15, 123, 1994. 31. Stroppiana, D. et al., An algorithm for mapping burnt areas in Australia using SPOT-VEGETATION data, IEEE Trans. Geosci. Remote Sens., 41, 907, 2003. 32. Thompson, M.A., Standard land-cover classification scheme for remote-sensing applications in South Africa, South Afr. J. Sci., 92, 34, 1996. 33. Aguado, I. et al., Assessment of forest fire danger conditions in southern Spain from NOAA images and meteorological indices, Int. J. Remote Sens., 24, 1653, 2003. 34. von Storch, H. and Zwiers, F.W., Statistical Analysis in Climate Research, Cambridge University Press, Cambridge, 1999, p. 483. 35. Thejll, P. and Schmith, T., Limitations on regression analysis due to serially correlated residuals: application to climate reconstruction from proxies, J. Geophys. Res. Atmos., 110, 2005. 36. Cleveland, R.B. et al., STL: a seasonal-trend decomposition procedure based on Loess., J. Off. Stat., 6, 3, 1990. 37. Venables, W.N. and Ripley, B.D., Modern Applied Statistics with S, 4th ed., Springer-Verlag, New York, 2003, p. 493. 38. Jo¨nsson, P. and Eklundh, L., Seasonality extraction by function fitting to time-series of satellite sensor data, IEEE Trans. Geosci. Remote Sens., 40, 1824, 2002. 39. Jo¨nsson, P. and Eklundh, L., TIMESAT—a program for analyzing time-series of satellite sensor data, Comput. Geosci., 30, 833, 2004. 40. Menenti, M. et al., Mapping agroecological zones and time-lag in vegetation growth by means of Fourier-analysis of time-series of NDVI images, Adv. Space Res., 13, 233, 1993. 41. Azzali, S. and Menenti, M., Mapping vegetation–soil–climate complexes in Southern Africa using temporal Fourier analysis of NOAA-AVHRR NDVI data, Int. J. Remote Sens., 21, 973, 2000. 42. Olsson, L. and Eklundh, L., Fourier-series for analysis of temporal sequences of satellite sensor imagery, Int. J. Remote Sens., 15, 3735, 1994. 43. Cihlar, J., Identification of contaminated pixels in AVHRR composite images for studies of land biosphere, Remote Sens. Environ., 56, 149, 1996. 44. Sellers, P.J. et al., A global 1-degrees-by-1-degrees NDVI data set for climate studies. The generation of global fields of terrestrial biophysical parameters from the NDVI, Int. J. Remote Sens., 15, 3519, 1994. 45. Roerink, G.J., Menenti, M., and Verhoef, W., Reconstructing cloudfree NDVI composites using Fourier analysis of time series, Int. J. Remote Sens., 21, 1911, 2000. 46. Pyne, S.J., Andrews, P.L., and Laven, R.D., Introduction to Wildland Fire, 2nd ed., John Wiley & Sons, New York, 1996, 117.

9 Use of a Prediction-Error Filter in Merging High- and Low-Resolution Images

Sang-Ho Yun and Howard Zebker

CONTENTS 9.1 Image Descriptions ........................................................................................................... 174 9.1.1 TOPSAR DEM....................................................................................................... 174 9.1.2 SRTM DEM............................................................................................................ 175 9.2 Image Registration............................................................................................................ 176 9.3 Artifact Elimination .......................................................................................................... 177 9.4 Prediction-Error (PE) Filter ............................................................................................. 178 9.4.1 Designing the Filter.............................................................................................. 178 9.4.2 1D Example ........................................................................................................... 179 9.4.3 The Effect of the Filter ......................................................................................... 179 9.5 Interpolation ...................................................................................................................... 180 9.5.1 PE Filter Constraint .............................................................................................. 180 9.5.2 SRTM DEM Constraint........................................................................................ 181 9.5.3 Inversion with Two Constraints ........................................................................ 181 9.5.4 Optimal Weighting............................................................................................... 181 9.5.5 Simulation of the Interpolation .......................................................................... 183 9.6 Interpolation Results ........................................................................................................ 183 9.7 Effect on InSAR ................................................................................................................. 185 9.8 Conclusion ......................................................................................................................... 187 References ................................................................................................................................... 188

A prediction-error (PE) filter is an array of numbers designed to interpolate missing parts of data such that the interpolated parts have the same spectral content as the existing parts. The data can be a one-dimensional time series, two-dimensional image, or a threedimensional quantity such as subsurface material property. In this chapter, we discuss the application of a PE filter to recover missing parts of an image when a low-resolution image of the missing parts is available. One of the research issues on PE filter is improving the quality of image interpolation for nonstationary images, in which the spectral content varies with position. Digital elevation models (DEMs) are in general nonstationary. Thus, PE filter alone cannot guarantee the success of image recovery. However, the quality of the image recovery of a high-resolution image can be improved with independent data set such as a lowresolution image that has valid pixels for the missing regions of the high-resolution image. Using a DEM as an example image, we introduce a systematic method to use a 173

Signal and Image Processing for Remote Sensing

174

PE filter incorporating the low-resolution image as an additional constraint, and show the improved quality of the image interpolation. High-resolution DEMs are often limited in spatial coverage; they also may possess systematic artifacts when compared to comprehensive low-resolution maps. We correct artifacts and interpolate regions of missing data in topographic synthetic aperture radar (TOPSAR) DEMs using a low-resolution shuttle radar topography mission (SRTM) DEM. Then PE filters are to interpolate and fill missing data so that the interpolated regions have the same spectral content as the valid regions of the TOPSAR DEM. The SRTM DEM is used as an additional constraint in the interpolation. Using cross-validation methods one can obtain the optimal weighting for the PE filter and the SRTM DEM constraints.

9. 1

Image Descriptions

InSAR is a powerful tool for generating DEMs [1]. The TOPSAR and SRTM sensors are primary sources for the academic community for DEMs derived from single-pass interferometric data. Differences in system parameters such as altitude and swath width (Table 9.1) result in very different properties for derived DEMs. Specifically, TOPSAR DEMs have better resolution, while SRTM DEMs have better accuracy over larger areas. TOPSAR coverage is often not spatially complete.

9.1.1

TOPSAR DEM

TOPSAR DEMs are produced from cross-track interferometric data acquired with NASA’s AIRSAR system mounted on a DC-8 aircraft. Although the TOPSAR DEMs have a higher resolution than other existing data, they sometimes suffer from artifacts and missing data due to roll of the aircraft, layover, and flight planning limitations. The DEMs derived from the SRTM have lower resolution, but fewer artifacts and missing data than TOPSAR DEMs. Thus, the former often provides information in the missing regions of the latter. We illustrate joint use of these data sets using DEMs acquired over the Gala´pagos Islands. Figure 9.1 shows the TOPSAR DEM used in this study. The DEM covers Sierra Negra volcano on the island of Isabela. Recent InSAR observations reveal that the volcano has been deforming relatively rapidly [2,3]. InSAR analysis can require use of a DEM to produce a simulated interferogram required to isolate ground deformation. The effect of artifact elimination and interpolation for deformation studies is discussed later in this chapter. TABLE 9.1 TOPSAR Mission versus SRTM Mission Mission Platform Nominal Swath width Baseline DEM resolution DEM coord. system

TOPSAR

SRTM

DC-8 aircraft Altitude 9 km 10 km 2.583 m 10 m None

Space shuttle 233 km 225 km 60 m 90 m Lat/Long

Use of a Prediction-Error Filter in Merging High- and Low-Resolution Images

100

175

1 3

200 300 2

400 N 500 2 km 600 200

400 800

800

600 900

1000

1100

1000

1200

Altitude 1200 (m)

FIGURE 9.1 The original TOPSAR DEM of Sierra Negra volcano in Gala´pagos Islands (inset for location). The pixel spacing of the image is 10 m. The boxed areas are used for illustration later in this paper. Note that there are a number of regions of missing data with various shapes and sizes. Artifacts are not identifiable due to the variation in topography. (From Yun, S.-H., Ji, J., Zebker, H., and Segall, P., IEEE Trans. Geosci. Rem. Sens., 43(7), 1682, 2005. With permission.)

The TOPSAR DEMs have a pixel spacing of about 10 m, sufficient for most geodetic applications. However, regions of missing data are often encountered (Figure 9.1), and significant residual artifacts are found (Figure 9.2). The regions of missing data are caused by layover of the steep volcanoes and flight planning limitations. Artifacts are large-scale and systematic and most likely due to uncompensated roll of the DC-8 aircraft [4]. Attempts to compensate this motion include models of piecewise linear imaging geometry [5] and estimating imaging parameters that minimize the difference between the TOPSAR DEM and an independent reference DEM [6]. We use a nonparameterized direct approach by subtracting the difference between the TOPSAR and SRTM DEMs.

9.1.2

SRTM DEM

The recent SRTM mission produced nearly worldwide topographic data at 90 m posting. SRTM topographic data are in fact produced at 30 m posting (1 arcsec); however, highresolution data sets for areas outside of the United States are not available to the public at this time. Only DEMs at 90 m posting (3 arcsec) are available to download. For many analyses, finer scale elevation data are required. For example, a typical pixel spacing in a spaceborne SAR image is 20 m. If the SRTM DEMs are used for topography removal in spaceborne interferometry, the pixel spacing of the final interferograms would be limited by the topography data to at best 90 m. Despite the lower resolution, the SRTM DEM is useful because it has fewer motion-induced artifacts than the TOPSAR DEM. It also has fewer data holes. The merits and demerits of the two DEMs are in many ways complementary to each other. Thus, a proper data fusion method can overcome the shortcomings of each and produce a new DEM that combines the strengths of the two data sets: a DEM that has a

Signal and Image Processing for Remote Sensing

176

1000

500

50

1000

800

100

800

600

150

600

400

200

400

1000

1500

2000 200

250

200

2500

(a)

500

1000

1500

2000

0 (m)

2500

300

(b)

50

100

150

200

250

300

0 (m)

15

50

100

10

150 5 200

idth th w Swa km = 10

250

0

−5 (m)

300

(c)

50

100

150

200

250

300

FIGURE 9.2 (See color insert following page 302.) (a) TOPSAR DEM and (b) SRTM DEM. The tick labels are pixel numbers. Note the difference in pixel spacing between the two DEMs. (c) Artifacts obtained by subtracting the SRTM DEM from the TOPSAR DEM. The flight direction and the radar look direction of the aircraft associated with the swath with the artifact are indicated with long and short arrows, respectively. Note that the artifacts appear in one entire TOPSAR swath, while they are not as serious in other swaths.

resolution of the TOPSAR DEM and large-scale reliability of the SRTM DEM. In this chapter, we present an interpolation method that uses both TOPSAR and SRTM DEMs as constraints.

9.2

Image Registration

The original TOPSAR DEM, while in ground-range coordinates, is not georeferenced. Thus, we register the TOPSAR DEM to the SRTM DEM, which is already registered in a latitude–longitude coordinate system. The image registration is carried out between the DEM data sets using an affine transformation. Although the TOPSAR DEM is not georeferenced, it is already on the ground coordinate system. Thus, scaling and rotation are the two most important components. We have seen that skewing component was negligible. Any higher order transformation between the two DEMs would also be negligible. The affine transformation is as follows:

Use of a Prediction-Error Filter in Merging High- and Low-Resolution Images

xS a ¼ yS c

b d

xT e þ yT f

177

(9:1)

x xS and T are tie points in the SRTM and TOPSAR DEM coordinate systems, where yS yT respectively. Since [a b e] and [c d f] are estimated separately, at least three tie points are required to uniquely determine them. We picked 10 tie points from each DEM based on topographic features and solved for the six unknowns in a least-square sense. Given the six unknowns, we choose new georeferenced sample locations that are uniformly spaced; every ninth sample location corresponds to the sample location of xT xS and are calculated. Then, the SRTM DEM. Those sample locations from yS yT nearest TOPSAR DEM value is selected and put into the corresponding new georeferenced sample location. The intermediate values are filled in from the TOPSAR map to produce the georeferenced 10-m data set. It should be noted that it is not easy to determine the tie points in DEM data sets. Enhancing the contrast of the DEMs facilitated the process. In general, fine registration is important for correctly merging different data sets. The two DEMs in this study have different pixel spacings. It is difficult to pick tie points with higher precision than the pixel spacing of the coarser image. In our method, however, the SRTM DEM, the coarser image, is treated as an averaged image of the TOPSAR DEM, the finer image. In our inversion, only the 9-by-9 averaged values of the TOPSAR DEM are compared with the pixel values of the SRTM DEM. Thus, the fine registration is less critical in this approach than in the case where a one-to-one match is required.

9.3

Artifact Elimination

Examination of the georeferenced TOPSAR DEM (Figure 9.2a) shows motion artifacts when compared to the SRTM DEM (Figure 9.2b). The artifacts are not clearly discernible in Figure 9.2a because their magnitude is small in comparison to the overall data values. The artifacts are identified by downsampling the registered TOPSAR DEM and subtracting the SRTM DEM. Large-scale anomalies that periodically fluctuate over an entire swath are visible in Figure 9.2c. The periodic pattern is most likely due to uncompensated roll of the DC-8 aircraft. The spaceborne data are less likely to exhibit similar artifacts, because the spacecraft is not greatly affected by the atmosphere. Note that the width of the anomalies corresponds to the width of a TOPSAR swath. Because the SRTM swath is much larger than that of the TOPSAR system (Table 9.1), a larger area is covered under consistent conditions, reducing the number of parallel tracks required to form an SRTM DEM. The maximum amplitude of the motion artifacts in our study area is about 20 m. This would result in substantial errors in many analyses if not properly corrected. For example, if this TOPSAR DEM is used for topography reduction in repeat-pass InSAR using ERS-2 data with a perpendicular baseline of about 400 m, the resulting deformation interferogram would contain one fringe ( ¼ 2.8 cm) of spurious signal. To remove these artifacts from the TOPSAR DEM, we up-sample the difference image with bilinear interpolation by a factor of 9 so that its pixel spacing matches the TOPSAR DEM. The difference image is subtracted from the TOPSAR DEM. This process is described with a flow diagram in Figure 9.3. Note that the lower branch

Signal and Image Processing for Remote Sensing

178

TOPSAR DEM − 9average

−

TOPSAR DEM corrected

9bilinear

SRTM DEM FIGURE 9.3 The flow diagram of the artifact elimination. (From Yun, S.-H., Ji, J., Zebker, H., and Segall, P., IEEE Trans. Geosci. Rem. Sens., 43(7), 1682, 2005. With permission.)

undergoes two low-pass filter operations when averaging and bilinear interpolation are implemented, while the upper branch preserves the high frequency contents of the TOPSAR DEM. In this way we can eliminate the large-scale artifacts while retaining details in the TOPSAR DEM.

9.4

Prediction-Error (PE) Filter

The next step in the DEM process is to fill in missing data. We use a PE filter operating on the TOPSAR DEM to fill these gaps. The basic idea of the PE filter constraint [7,8] is that missing data can be estimated so that the restored data yield minimum energy when the PE filter is applied. The PE filter is derived from training data, which are normally valid data surrounding the missing regions. The PE filter is selected so that the missing data and the valid data share the same spectral content. Hence, we assume that the spectral content of the missing data in the TOPSAR DEM is similar to that of the regions with valid data surrounding the missing regions.

9.4.1

Designing the Filter

We generate a PE filter such that it rejects data with statistics found in the valid regions of the TOPSAR DEM. Given this PE filter, we solve for data in the missing regions such that the interpolated data are also nullified by the PE filter. This concept is illustrated in Figure 9.4. The PE filter, fPE, is found by minimizing the following objective function, kfPE xe k2

(9:2)

where xe is the existing data from the TOPSAR DEM, and * represents convolution. This expression can be rewritten in a linear algebraic form using the following matrix operation: kFPE xe k2

(9:3)

kXe fPE k2

(9:4)

or equivalently

where FPE and Xe are the matrix representations of fPE and xe for convolution operation. These matrix and vector expressions are used to indicate their linear relationship.

Use of a Prediction-Error Filter in Merging High- and Low-Resolution Images

179

Xe ∗

fPE

∗

9.4.2

fPE = 0 + e1

Xm

= 0 + e2

FIGURE 9.4 Concept of PE filter. The PE filter is estimated by solving an inverse problem constrained with the remaining part, and the missing part is estimated by solving another inverse problem constrained with the filter. The «1 and «2 are white noise with small amplitude.

1D Example

The procedure of acquiring the PE filter can be explained with a 1D example. Suppose that a data set, x ¼ [x1, . . ., xn] (where n 3) is given, and we want to compute a PE filter of length 3, fPE ¼ [1 f1 f2]. Then we form a system of linear equations as follows: 2

x3 6 x4 6 6 .. 4 . xn

x2 x3 .. .

x1 x2 .. .

xn1

xn 2

3

2 3 7 1 74 5 7 f1 0 5 f2

(9:5)

The first element of the PE filter should be equal to one to avoid the trivial solution, fPE ¼ 0. Note that Equation 9.5 is the convolution of the data and the PE filter. After simple algebra and with 3 2 3 2 x3 x2 x1 6 . 7 6 . .. 7 d 4 .. 5 and D 4 .. . 5 xn1

xn

xn2

we get D

f1 f2

d

and its normal equation becomes f1 ¼ (DT D)1 DT ( d) f2

(9:6)

(9:7)

Note that Equation 9.7 minimizes Equation 9.2 in a least-square sense. This procedure can be extended to 2D problems, and more details are described in Refs. [7] and [8]. 9.4.3

The Effect of the Filter

Figure 9.5 shows the characteristics of the PE filter in the spatial and Fourier domains. Figure 9.5a is the sample DEM chosen from Figure 9.1 (numbered box 1) for demonstration. It contains various topographic features and has a wide range of spectral content (Figure 9.5d). Figure 9.5b is the 5-by-5 PE filter derived from Figure 9.5a by solving the inverse problem in Equation 9.3. Note that the first three elements in the first column of the filter coefficients are 0 0 1. This is the PE filter’s unique constraint that ensures the filtered output to be white noise [7]. In the filtered output (Figure 9.5c) all the variations in the DEM were effectively suppressed. The size (order) of the PE filter is based on the

Signal and Image Processing for Remote Sensing

180

(a)

(b)

(c) 0

−0.204

0.122

0.067 −0.023

0

−0.365 −0.009

0.008 −0.022

1.000 −0.234

(d)

0.033

0.051 −0.010

0.063

0.001 −0.013

−0.606

0.151

−0.072

0.073 −0.016 −0.015

(e)

0.020

(f)

FIGURE 9.5 The effect of a PE filter. (a) original DEM; (b) a 2D PE filter found from the DEM; (c) DEM filtered with the PE filter; and (d), (e), and (f) the spectra of (a), (b), and (c), respectively, plotted in dB. (a) and (c) are drawn with the same color scale. Note that in (c) the variation of image (a) was effectively suppressed by the filter. The standard deviations of (a) and (c) are 27.6 m and 2.5 m, respectively. (From Yun, S.-H., Ji, J., Zebker, H., and Segall, P., IEEE Trans. Geosci. Rem. Sens., 43(7), 1682, 2005. With permission.)

complexity of the spectrum of the DEM. In general, as the spectrum becomes more complex, a larger size filter is required. After testing various sizes of the filter, we found a 5-by-5 size appropriate for the DEM used in our study. Figure 9.5d and Figure 9.5e show the spectra of the DEM and the PE filter, respectively. These illustrate the inverse relationship of the PE filter to the corresponding DEM in the Fourier domain, such that their product is minimized (Figure 9.5f). This PE filter constrains the interpolated data in the DEM to similar spectral content to the existing data. All inverse problems in this study were derived using the conjugate gradient method, where forward and adjoint functional operators are used instead of the explicit inverse operators [7], saving computer memory space.

9.5 9.5.1

Interpolation PE Filter Constraint

Once the PE filter is determined, we next estimate the missing parts of the image. As depicted in Figure 9.4, interpolation using the PE filter requires that the norm of the filtered output be minimized. This procedure can be formulated as an inverse computation minimizing the following objective function: kFPE xk2

(9:8)

Use of a Prediction-Error Filter in Merging High- and Low-Resolution Images

181

where FPE is the matrix representation of the PE filter convolution, and x represents the entire data set including the known and the missing regions. In the inversion process we only update the missing region, without changing the known region. This guarantees seamless interpolation across the boundaries between the known and missing regions. 9.5.2

SRTM DEM Constraint

As previously stated, 90-m posting SRTM DEMs were generated from 30-m posting data. This downsampling was done by calculating three ‘‘looks’’ in both the easting and northing directions. To use the SRTM DEM as a constraint to interpolate the TOPSAR DEM, we posit the following relationship between the two DEMs: each pixel value in a 90m posting SRTM DEM can be considered equivalent to the averaged value of a 9-by-9 pixel window in a 10-m posting TOPSAR DEM centered at the corresponding pixel in the SRTM DEM. The solution using the constraint of the SRTM DEM to find the missing data points in the TOPSAR DEM can be expressed as minimizing the following objective function: ky Axm k2

(9:9)

where y is an SRTM DEM expressed as a vector that covers the missing regions of the TOPSAR DEM, and A is an averaging operator generating nine looks, and xm represents the missing regions of the TOPSAR DEM. 9.5.3

Inversion with Two Constraints

By combining two constraints, one derived from the statistics of the PE filter and one from the SRTM DEM, we can interpolate the missing data optimally with respect to both criteria. The PE filter guarantees that the interpolated data will have the same spectral properties as the known data. At the same time the SRTM constraint forces the interpolated data to have average height near the corresponding SRTM DEM. We formulate the inverse problem as a minimization of the following objective function: l2 kFPE xm k2 þ ky Axm k2

(9:10)

where l set the relative effect of each criterion. Here xm has the dimensions of the TOPSAR DEM, while y has the dimensions of the SRTM DEM. If regions of missing data are localized in an image, the entire image does not have to be used for generating a PE filter. We implement interpolation in subimages to save time and computer memory space. An example of such a subimage is shown in Figure 9.6. The image is a part of Figure 9.1 (numbered box 2). Figure 9.6a and Figure 9.6b are examples of xe in Equation 9.3 and y, respectively. The multiplier l determines the relative weight of the two terms in the objective function. As l ! 1, the solution satisfies the first constraint only, and if l ¼ 0, the solution satisfies the second constraint only.

9.5.4

Optimal Weighting

We used cross-validation sum of squares (CVSS) [9] to determine the optimal weights for the two terms in Equation 9.10. Consider a model xm that minimizes the following quantity:

Signal and Image Processing for Remote Sensing

182

(b)

(a)

2 20 4 40 6 60 8 80 10 100

12

120

14

140

16 20

40

60

80

100

800

900

2 1000

4

6

8

10

12

1100 1200 (m)

FIGURE 9.6 Example subimages of (a) TOPSAR DEM showing regions of missing data (black), and (b) SRTM DEM of the same area. These subimages are engaged in one implementation of the interpolation. The grayscale is altitude in meters. (From Yun, S.-H., Ji, J., Zebker, H., and Segall, P., IEEE Trans. Geosci. Rem. Sens., 43(7), 1682, 2005. With permission.)

l2 kFPE xm k2 þ ky(k) A(k) xm k2

(k ¼ 1, . . . , N)

(9:11)

where y(k) and A(k) are the y and the A in Equation 9.10 with the k-th element and the k-th row omitted, respectively, and N is the number of elements in y that fall into the missing region. Denote this model x(k) m (l). Then we compute the CVSS defined as follows: CVSS(l) ¼

N 1 X 2 (yk Ak x(k) m (l)) N k¼1

(9:12)

where yk is the omitted element from the vector y and Ak is the omitted row vector from (k) the matrix A when the x(k) m (l) was estimated. Thus, Ak xm (l) is the prediction based on the other N 1 observations. Finally, we minimize CVSS(l) with respect to l to obtain the optimal weight (Figure 9.7). In the case of the example shown in Figure 9.6, the minimum CVSS was obtained for l ¼ 0.16 (Figure 9.7). The effect of varying l is shown in Figure 9.8. It is apparent (see Figure 9.8) that the optimal weight is a more ‘‘plausible’’ result than either of the end members, preserving aspects of both constraints. In Figure 9.8a the interpolation uses only the PE filter constraint. This interpolation does not recover the continuity of the ridge running across the DEM in north–south direction, which is observed in the SRTM DEM (Figure 9.6b). This follows from a PE filter obtained such that it eliminates the overall variations in the image. The variations include not only the ridge but also the accurate topography in the DEM. The other end member, Figure 9.8c, shows the result for applying zero weight to the PE filter constraint. Since the averaging operator A in Equation 9.10 is applied independently

Use of a Prediction-Error Filter in Merging High- and Low-Resolution Images

183

CVSS

110 100

CVSS(l)

90 80 70 60 50 40

0.2

0.4

0.6

0.8

1 l

1.2

1.4

1.6

1.8

2

FIGURE 9.7 Cross-validation sum of squares. The minimum occurs when l ¼ 0.16. (From Yun, S.-H., Ji, J., Zebker, H., and Segall, P., IEEE Trans. Geosci. Rem. Sens., 43(7), 1682, 2005. With permission.)

for each 9-by-9 pixel group, it is equivalent to simply filling the regions of missing data with 9-by-9 identical values that are the same as the corresponding SRTM DEM (Figure 9.6b). 9.5.5

Simulation of the Interpolation

The quality of cross-validation in this study is itself validated by simulating the interpolation process with known subimages that do not contain missing data. For example, if a known subimage is selected from Figure 9.1 (numbered box 3), we can remove some data and apply our recovery algorithm. The subimage is similar in topographic features to the area shown in Figure 9.6. The process is illustrated in Figure 9.9. We introduce a hole as shown in Figure 9.9b and calculate the CVSS (Figure 9.9d) for each l ranging from 0 to 2. Then we use the estimated l, which minimizes the CVSS, for the interpolation process to obtain the image in Figure 9.9c. For each value of l we also calculate the RMS error between the known and the interpolated images. The RMS error is plotted against l in Figure 9.9e. The CVSS is minimized for l ¼ 0.062, while the RMS error has a minimum at l ¼ 0.065. This agreement suggests that minimizing the CVSS is a useful method to balance the constraints. Note that the minimum RMS error in Figure 9.9e is about 5 m. This value is smaller than the relative vertical height accuracy of the SRTM DEM, which is about 10 m.

9.6

Interpolation Results

The method presented in the previous section was applied to the entire image of Figure 9.1. The registered TOPSAR DEM contains missing data in regions of various sizes. Small subimages were extracted from the DEM. Each subimage is interpolated, and the

Signal and Image Processing for Remote Sensing

184

20

20

20

40

40

40

60

60

80

80

80

100

100

100

120

120

120

140

140

140

60

A⬘

A

(a)

20

40

60

80

100

20

(b)

40

60

80

100

(c)

20

40

60

80

100

Profiles along A−A⬘

1080 1060

Altitude (m)

1040

1020

1000

980 (a) l → ∞ (b) l = 0.16 (c) l = 0 SRTM DEM

960

(d)

940 10

20

30

40

50

60

70

80

90

100

110

Distance (pixel)

FIGURE 9.8 The results of interpolation applied to DEMs in Figure 9.6, with various weights. (a) l!1, (b) l ¼ 0.16, and (c) l ¼ 0. Profiles along A–A0 are shown in the plot (d). (From Yun, S.-H., Ji, J., Zebker, H., and Segall, P., IEEE Trans. Geosci. Rem. Sens., 43(7), 1682, 2005. With permission.)

results are reinserted into the large DEM. The locations and sizes of the subimages are indicated with white boxes in Figure 9.10a. Note the largest region of missing data in the middle of the caldera. This region is not only a simple large gap but also a gap between two swaths. The interpolation is an iterative process and fills up regions of missing data starting from the boundary. If valid data along the boundary (boundaries of a swath for example) contain edge effects, error tends to propagate through the interpolation process. In this case, expanding the region of missing data by a few pixels before interpolation produces better results. If there is a large region of missing data, the spectral content information of valid data can fade out as the interpolation proceeds toward the center of the gap. In this case, sequentially applying the interpolation to parts of the gap is one solution. Due to edge effects along the boundary of the large gap, the interpolation result does not produce topography that matches the surrounding terrain well. Hence, we expand the gap by three pixels to eliminate edge effects. We divided the gap into multiple subimages, and each subimage was interpolated individually.

Use of a Prediction-Error Filter in Merging High- and Low-Resolution Images

20

20

20

40

40

40

60

60

60

80

80

80

100

100

100

120

120

120

(a)

20

40

60

80

100 120

140

(b)

20

40

60

80

100 120

140

(c)

20

40

60

185

80

100 120

140

130 125

7

120 6.5

RMS error

CVSS(l)

115 110 105 100

6 5.5

95 90

5

85

(d)

80 0

0.1

0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

l

1

4.5 0

(e)

0.1

0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

1

l

FIGURE 9.9 The quality of the CVSS, (a) a sample image that does not have a hole, (b) a hole was made, (c) interpolated image with an optimal weight, (d) CVSS as a function of l. The CVSS has a minimum when l ¼ 0.062, and (e) RMS error between true image (a) and the interpolated image (c). The minimum occurs when l ¼ 0.065. (From Yun, S.-H., Ji, J., Zebker, H., and Segall, P., IEEE Trans. Geosci. Rem. Sens., 43(7), 1682, 2005. With permission.)

9.7

Effect on InSAR

Finally, we can investigate the effect of the artifact elimination and the interpolation on simulated interferograms. It is often easier to see differences in elevation in simulated interferograms than in conventional contour plots. In addition, simulated interferograms provide a measure of how sensitive the interferogram is to the topography. Figure 9.11 shows georeferenced simulated interferograms from three DEMs: the registered TOPSAR DEM, the TOPSAR DEM after the artifact elimination, and the TOPSAR DEM after the interpolation. In all interferograms, a C-band wavelength is used, and we assume a 452 m perpendicular baseline between two satellite positions. This perpendicular baseline is realistic [2]. The fringe lines in the interferograms are approximately height contour lines. The interval of the fringe lines is inversely proportional to the perpendicular baseline [10], and in this case one color cycle of the fringes represents about 20 m. Note in Figure 9.11a that the fringe lines are discontinuous across the long region of missing data inside the caldera. This is due to artifacts in the original TOPSAR DEM. After eliminating these artifacts the discontinuity disappears (Figure 9.11b). Finally, the missing data regions are interpolated in a seamless manner (Figure 9.11c).

Signal and Image Processing for Remote Sensing

186

100

200

300

400

500

600 (a)

200

400

600

800

1000

1200

200

400

600

800

1000

1200

100

200

300

400

500

600

(b)

800

900

1000

1100

1200 (m)

FIGURE 9.10 The original TOPSAR DEM (a) and the reconstructed DEM (b) after interpolation with PE filter and SRTM DEM constraints. The grayscale is altitude in meters, and the spatial extent is about 12 km across the image. (From Yun, S.-H., Ji, J., Zebker, H., and Segall, P., IEEE Trans. Geosci. Rem. Sens., 43(7), 1682, 2005. With permission.)

Use of a Prediction-Error Filter in Merging High- and Low-Resolution Images

187

(a)

(b)

(c)

FIGURE 9.11 Simulated interferograms from (a) the original registered TOPSAR DEM, (b) the DEM after the artifact was removed, and (c) the DEM interpolated with PE filter and the SRTM DEM. All the interferograms were simulated with the C-band wavelength (5.6 cm) and a perpendicular baseline of 452 m. Thus, one color cycle represents 20 m height difference. (From Yun, S.-H., Ji, J., Zebker, H., and Segall, P., IEEE Trans. Geosci. Rem. Sens., 43(7), 1682, 2005. With permission.)

9.8

Conclusion

The aircraft roll artifacts in the TOPSAR DEM were eliminated by subtracting the difference between the TOPSAR and SRTM DEMs. A 2D PE filter derived from the existing data and the SRTM DEM for the same region are then used as interpolation constraints.

188

Signal and Image Processing for Remote Sensing

Solving the inverse problem constrained with both the PE filter and the SRTM DEM produces a high-quality interpolated map of elevation. Cross-validation works well to select optimal constraint weighting in the inversion. This objective criterion results in less biased interpolation and guarantees the best fit to the SRTM DEM. The quality of many other TOPSAR DEMs can be improved similarly.

References 1. H.A. Zebker and R.M. Goldstein, Topographic mapping from interferometric synthetic aperture radar observations, Journal of Geophysical Research, 91(B5), 4993–4999, 1986. 2. F. Amelung, S. Jo´nsson, H. Zebker, and P. Segall, Widespread uplift and ‘trapdoor’ faulting on Gala´pagos volcanoes observed with radar interferometry, Nature, 407(6807), 993–996, 2000. 3. S. Yun, P. Segall, and H. Zebker, Constraints on magma chamber geometry at Sierra Negra volcano, Gala´pagos Islands, based on InSAR observations, Journal of Volcanology and Geothermal Research, 150, 232–243, 2006. 4. H.A. Zebker, S.N. Madsen, J. Martin, K.B. Wheeler, T. Miller, Y.L. Lou, G. Alberti, S. Vetrella, and A. Cucci, The TOPSAR interferometric radar topographic mapping instrument, IEEE Transactions on Geoscience and Remote Sensing, 30(5), 933–940, 1992. 5. S.N. Madsen, H.A. Zebker, and J. Martin, Topographic mapping using radar interferometry: processing techniques, IEEE Transactions on Geoscience and Remote Sensing, 31(1), 246–256, 1993. 6. Y. Kobayashi, K. Sarabandi, L. Pierce, and M.C. Dobson, An evaluation of the JPL TOPSAR for extracting tree heights, IEEE Transactions on Geoscience and Remote Sensing, 38(6), 2446–2454, 2000. 7. J.F. Claerbout, Earth Sounding Analysis: Processing versus Inversion, Blackwell, 1992 [Online], http://sepwww.stanford.edu/sep/prof/index.html. 8. J.F. Claerbout and S. Fomel, Image Estimation by Example: Geophysical Soundings Image Construction (Class Notes), 2002 [Online], http://sepwww.stanford.edu/sep/prof/index.html. 9. G. Wahba, Spline Models for Observational Data, ser. No. 59, Applied Mathematics, Philadelphia, PA, SIAM, 1990. 10. H.A. Zebker, P.A. Rosen, and S. Hensley, Atmospheric effects in interferometric synthetic aperture radar surface deformation and topographic maps, Journal of Geophysical Research, 102(B4), 7547–7563, 1997.

10 Blind Separation of Convolutive Mixtures for Canceling Active Sonar Reverberation

Fengyu Cong, Chi Hau Chen, Shaoling Ji, Peng Jia, and Xizhi Shi

CONTENTS 10.1 Introduction ..................................................................................................................... 189 10.2 Problem Description....................................................................................................... 190 10.3 BSCM Algorithm............................................................................................................. 192 10.4 Experiments and Analysis ............................................................................................ 195 10.4.1 Backward Matrix of Active Sonar Data for Approximating the BSCM Model ............................................................................................... 195 10.4.2 Examples of Canceling Real Sea Reverberation .......................................... 196 10.4.2.1 Example 1: Simulation 1 ................................................................. 196 10.4.2.2 Example 2: Simulation 2 ................................................................. 196 10.4.2.3 Example 3: Simulation 3 ................................................................. 197 10.4.2.4 Example 4: Real Target Experiment ............................................. 200 10.4.2.5 Example 5: Simulation 4 ................................................................. 200 10.5 Conclusions...................................................................................................................... 201 References ................................................................................................................................... 202

10.1

Introduction

Under heavy oceanic reverberation, it can be difficult to detect a target echo accurately with an active sonar. To resolve the problem, researchers apply two methods that use reverberation models and specialized processing techniques [1]. For the first method, receivers with differing range resolutions may encounter different statistics for a given waveform. Furthermore, a given receiver may encounter different statistics at different ranges [2]. Researchers have used Weibull, log-normal, Rician, multi-modal Rayleigh, and nonRayleigh distributions to describe sonar reverberations [3,4]. For the second method, it is usually assumed that reverberation is a sum of returns issued from the transmitted signal. Under this assumption, a data matrix is first generated from the data received by the active sonar data [5,6]. The principal component inverse (PCI) [7–13] is primarily used to separate reverberation and target echoes from the data matrix. However, important prior knowledge such as the target power should be provided [11]. In Refs. [11–13], PCI and other methods have performed very well in Doppler cases, and the authors have also shown that PCI still performs well when the Doppler effect is not introduced. Provided that prior knowledge is 189

Signal and Image Processing for Remote Sensing

190

hard to obtain and the Doppler effect does not exist, it becomes very desirable to cancel reverberation with easily obtainable but minimal prior knowledge even in more complicated undersea situations. The chapter focuses on this case. The essence of PCI is separation by rank reduction. Nevertheless, the problem of separation is an old one in electrical engineering and many algorithms exist depending on the nature of signals. Blind signal separation (BSS) [14] is a significant statistical signal processing method that has been developed in the past 15 years. The advantage of BSS is that it does not need much prior knowledge and makes full use of the simple and apparent statistical properties, such as non-Gaussianity, nonstationarity, colored character, uncorrelatedness, independence, and so on. We studied BSS on canceling reverberation in Ref. [15]. From the perspective of BSS, the data received by active sonar is the convolutive mixture [16], while the instantaneous mixture model was only discussed in Ref. [15]. Consequently, we perform blind separation of convolutive mixtures (BSCM) to nullify oceanic reverberation in this contribution. In Ref. [15], the data waveform is only described in time-domain, and here, we will provide more illustrations on the matched filter outputs under different signal-to-reverberation ratios (SRR). We provide more examples for better explanation. The rest of this chapter is organized as follows: Section 10.2 presents the problem description; Section 10.3 introduces the BSCM algorithm, which is based on the reverberation characters; Section 10.4 provides examples of canceling real sea reverberation as the main content; and finally, Section 10.5 summarizes the above contents.

10.2

Problem D escription

In this chapter, we assume that reverberation is a sum of returns generated from the transmitted signal. In Ref. [16], the active sonar data series d(t) is expressed as d(t) ¼ h1 (t t 1 ) e(t) þ |ﬄﬄﬄﬄﬄﬄ{zﬄﬄﬄﬄﬄﬄ} target

K X k¼2

hk (t t k ) e(t) þ n(t) |ﬄﬄﬄﬄﬄﬄ{zﬄﬄﬄﬄﬄﬄ} |{z}

(10:1)

noise

reverberation

where t k is the propagation delay, hk(t) is the path impulse response, and e(t) is the transmitted signal. In the reverberation dominant circumstance, we can decompose the active sonar data into the target echo component ~d(t) and the reverberation component r(t), as defined in Equation 10.2. Extracting the target and nullifying the reverberation are done simultaneously ~ d(t) ¼ h1 (t t 1 ) e(t) þ n(t) |ﬄﬄﬄﬄﬄﬄ{zﬄﬄﬄﬄﬄﬄ} |{z} target

r(t) ¼

K X k¼2

noise

hk (t t k ) e(t) þ n(t) |ﬄﬄﬄﬄﬄﬄ{zﬄﬄﬄﬄﬄﬄ} |{z}

reverberation

(10:2)

noise

It is apparent that ~ d(t) and r(t) are of different non-Gaussianity. Next, we show an example of real sea reverberation. For simplicity, we do not introduce the experimental detail in this section. The transmitted signal is sinusoid with a frequency of 1750 Hz, a duration of 200 msec, and a sampling frequency of 6000 Hz. Figure 10.1 contains only the target echo and the background noise,

Normalized amplitude

Blind Separation of Convolutive Mixtures for Canceling Active Sonar Reverberation 0.8 0.6 0.4 0.2 0 − 0.2 − 0.4 − 0.6 − 0.8 0

0.5

1

Samples

1.5

2

191

2.5 ⫻ 104

FIGURE 10.1 Target echo waveform.

and most reverberation generated by a sine wave is included in Figure 10.2 where no target echo exists and the background noise is embedded. The two waveforms in Figure 10.1 and Figure 10.2 are the time-domain descriptions of ~d(t) and r(t) with corresponding standard kurtosis [17] 4.1 and 1.6, respectively. Here, the time-domain non-Gaussianity is enough to discriminate ~ d(t) and r(t). It may appear that detection for ~d(t) must be evident. However, in the real world problem, it is not simple to cancel the reverberation to the degree as in Figure 10.1. To explore different effects of reverberation on detection in different SRRs, we give several examples of real sea reverberation as follows. In Figure 10.3, the first upper plot is reverberation, and in the remaining three plots, different targets are simulated and added to the reverberation with the SRR ¼ 0 dB, 7 dB, and 14 dB, respectively, from the upper to lower plots. The target echo is located between the 9,000th and 11,000th samples. The target does not move, hence no Doppler effect is produced. The matched filter outputs follow in the next figure. In Figure 10.4, the middle two plots show that the matched filter results are satisfactory even in the case that the SRR is quite small, and the lowest plot gives too many false alarms. The task for us is to cancel the reverberation to the degree that the detection is possible, similar to the middle two plots. In Equation 10.2, ~ d(t) does not contain any reverberation and just covers only the target echo and the background noise. In real world case, that is impossible. However, it does not harm the detection. Figure 10.4 also reminds us that even if ~ d(t) ¼ h1 (t t 1 ) e(t) þ |ﬄﬄﬄﬄﬄﬄ{zﬄﬄﬄﬄﬄﬄ} target

~ K X k¼2

hk (t t k ) e(t) þ n(t) |ﬄﬄﬄﬄﬄﬄ{zﬄﬄﬄﬄﬄﬄ} |{z}

reverberation

(10:3)

noise

Normalized amplitude

~ < K, and as long as the reverberation is reduced to some satisfactory level, the where K detection can also be reliable. After comparing the first and the third plots in Figure 10.3 and Figure 10.4, respectively, we find that Figure 10.3 shows that the reverberation waveform with small SRR target added is nearly identical to the waveform of

0.8 0.6 0.4 0.2 0 −0.2 −0.4 −0.6 −0.8

0

FIGURE 10.2 Reverberation waveform.

0.5

1

Samples

1.5

2

2.5 ⫻ 104

Signal and Image Processing for Remote Sensing

192 0.8 0.6 0.4 0.2 0 −0.2 −0.4 −0.6 −0.8

Waveform: reverberation only

0

0.8 0.6 0.4 0.2 0 −0.2 −0.4 −0.6 −0.8 0 0.8 0.6 0.4 0.2 0 −0.2 −0.4 −0.6 −0.8

0.8 0.6 0.4 0.2 0 −0.2 −0.4 −0.6 −0.8

0.5

1

1.5

2

2.5 ⫻ 104

Waveform: SRR = 0 dB

0.5

1

1.5

2

2.5 ⫻ 104

Waveform: SRR = −7 dB

0

0.5

1

1.5

2

2.5 ⫻ 104

Waveform: SRR = −14 dB

0

0.5

1

1.5

2

2.5 ⫻ 104

FIGURE 10.3 Real reverberation with different simulated targets.

reverberation, but Figure 10.4 implies that the corresponding matched filter outputs are definitely of different non-Gaussianity. This information is useful to our method for distinguishing the target echoes from the reverberation echoes.

10 .3

BSCM Algorit hm

In the past several years, different researchers developed several BSCM algorithms according to signal properties. Generally speaking, for a given source–receiver pair, the waveform arrives at different travel times, owing to varying path lengths [18]. All of these make reverberations nonstationary. The transmitted signal is sinusoid here, so the

Blind Separation of Convolutive Mixtures for Canceling Active Sonar Reverberation 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0

193

MF output: reverberation only

0

0.5

1

1.5

2

⫻104

2.5

MF output: SRR = 0 dB

0

0.5

1

1.5

2

2.5 ⫻104

0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0

MF output: SRR = −7 dB

0

0.5

1

1.5

2

2.5

Normalized amplitude

⫻104 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0

MF output: SRR = −14 dB

0

0.5

1

Samples

1.5

2

2.5 ⫻104

FIGURE 10.4 Outputs of matched filter on the data in Figure 10.3.

reverberations must retain the colored feature. Therefore, we say reverberations are nonstationary and colored. In this section, we will give only a brief introduction on the BSCM algorithm. Consider a speech scenario, where J microphones receive multiple filtered copies of statistically uncorrelated or independent signals. Mathematically, the received signals can be expressed as a convolution, that is, xj (t) ¼

I X P 1 X

hji (p)si (t p), j ¼ 1, . . . , J 0;

t ¼ 1, . . . , L

(10:4)

i¼1 p¼0

where hji(p) models the P-point impulse response from source to microphone and L is the length of the received signal. In a more compact matrix–vector notation, Equation 10.4 can be stated as x(t) ¼

P 1 X p ¼0

H(p)s(t p)

Signal and Image Processing for Remote Sensing

194

s2(n)

+

H11

s1(n)

x1(n)

H21

W21

H12

W12 +

H22

x2(n)

Mixing

W11

+

s1(n)

W22

+

s2(n)

Unmixing

FIGURE 10.5 Specific (2 2) blind separation of convolutive mixtures problem.

¼ H(p) s(t)

(10:5)

where x(t) ¼ [x1(t), . . . , xJ(t)]T is the received signal vector, s(t) ¼ [s1(t), . . . , sI(t)]T is the source vector, H(p) is the mixing filter matrix, * denotes the convolution, and [.]T means the transpose operation. Without loss of generality, we assume J ¼ I in this chapter. With this problem setup, the objective of the BSCM techniques is to find an unmixing filter W(q) of length Q for deconvolution, that is, Ð s(t) ¼ W(q) x(t)

(10:6)

where ^s(t) denotes the estimated sources. Figure 10.5 is a block diagram for J ¼ I ¼ 2. For nonstationary sources, a moving block is usually performed on received signals Ð Ð x(t,m) ¼ H(p) s(t,m), t ¼ 1, . . . , N, m ¼ 1, . . . , M

(10:7)

where m is the index for blocks, M is the total number of blocks, and N is the length of each block. After the fast Fourier transform (FFT) is applied on the three components in Equation 10.7, we obtain

Ð Ð X(w,m) ¼ H(w) S(w,m), w ¼ 1, . . . , N

(10:8)

where w denotes the frequency bin. Thus, the convolutive mixtures in time-domain turn into the instantaneous mixtures in frequency domain. This is the basic idea of the frequency method for BSCM. Then, the separation is done in each frequency bin for the unmixed filters,

Ð ^ (w,m) Y(w,m) ¼ W(w)X

(10:9)

Ð are the unmixed filter and unmixed signals in frequency bin w, where W(w) and Y(w,m) respectively. Since the envelope of the signal spectrum is correlated, XÐ j(w,m) must be a colored signal. Under the independence assumption of signals in the time and frequency domains, we apply the cost function as follows: F¼

T1 1X kW(w,r)G xÐ (w,t)WH (w,r) Ik2 4 t¼0

(10:10)

Blind Separation of Convolutive Mixtures for Canceling Active Sonar Reverberation

195

The algorithm for the unmixing matrix is given by W(w,r þ 1) ¼ W(w,r) þ m(w)

T 1 X

W(w,r)GxÐ (w,t )WH (w,r) I W(w,r)G xÐ (w,t )

(10:11)

t¼0

where m is the learning rate, r is the iteration time, T is the total number of delayed samples, [.]H is the conjugate transpose operation, I is the identity matrix, and with Ð w,m). BSS may introduce t samples delayed, G xÐ (w,t ) is the autocorrelation matrix of x( two indeterminacies including the permutation and scaling problems [14,17]. We apply the method in Ref. [19] to resolve the permutation. More methods may be found in Refs. [20–32]. Also, the scaling ambiguity is a nearly open problem to deal with. Some methods to deal with this problem are provided in Refs. [33,34]. Generally, to normalize the permuted estimated matrix is a fast and effective way. This is helpful in maintaining a stable convergence too [35]. We do not discuss this much here. After the inverse FFT is applied on W(w) we get W(q), and then BSCM is performed according to Equation 10.6. For more information about BSCM, please refer to the literatures mentioned above.

10 .4 10.4.1

Ex pe riment s an d Analy sis Backward Matrix of Active Sonar Data for Approximating the BSCM Model

In Section 10.3, BSCM is introduced with some assumptions. However, in applications like speech enhancement, remote sensing, communication systems, geophysics, and biomedical engineering, conditions may not meet the basic theoretical requirements of BSCM. Interestingly, as long as algorithms are converged, BSCM still performs signal deconvolution effectively after some approximations are made [14,17]. For active sonar, the target echo and the oceanic reverberation are all generated by the same transmitted signal, so they must be somewhat correlated nevertheless. Principal component analysis [36] is the method for decorrelation, and PCI has been applied in separating the target echo and reverberation with some prior knowledge provided [5–13]. As Neumann and Krawczyk have pointed out, the forward matrix model used is not a principal issue of the PCI algorithm, and any other forward model can be used [37]. Approximation implies that the processing result must be discounted under the comparison to the corresponding theory. Fortunately, as we pointed out in Section 10.2 the detection in the presence of reverberation may also be reliable with the matched filter even when the reverberation is not absolutely canceled, but the SRR should be within a certain limit. So, the approximation to BSCM is possible by the one-dimensional active sonar data. Intuitively, we generate the backward data matrix from active sonar data d(t) as the received signals x(t) to approximate the classical BSCM model: T x(t) ¼ x1 (t), . . . , xJ (t) xj (t) ¼ d½t þ j(B 1) 1

(10:12)

where xj(t) is the so-called jth received signal, J is the total number of received signals, and B is the size of a moving block, and it is an empirical parameter. By adjusting J and B, we may produce a different backward matrix. Also, xj(t) will not miss the target because the

Signal and Image Processing for Remote Sensing

196

two numbers J and B are relatively small in comparison to the whole length of d(t). The losses of xj(t) are at the beginning of the received reverberation and are often not useful. These make the approximation entirely reasonable.

10.4.2

Examples of Canceling Real Sea Reverberation

To simplify the discussion in this chapter, we only outline briefly the experimental setup here. Further details are in Ref. [11] about the underwater acoustic data and experiments. The transmitted signal is sinusoid with a frequency of 1750 Hz, the duration is 200 msec, and the sampling frequency is 6000 Hz. The examples are all with different targets simulated and added into the real sea reverberation. Before the targets are added, the reverberation is through the band-pass filter from 1500 to 2000 Hz. The reverberation we use is over 6 sec. For an enlarged plot, we show only the first 4 sec, which has no effect on the result as the reverberation is very light in the last 2 sec. The block size is N ¼ 200 for Equation 10.7, and the total number of delayed samples is T ¼ M for Equation 10.9. Since the energy varies in different frequency bins, the learning rate m(w) is supposed to correspond to the various frequency bins, and then it is defined as in Ref. [38]: 0:6 m(w) ¼ sﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ T1 P jG xÐ (w,t)j2

(10:13)

t¼0

10.4.2.1

Example 1: Simulation 1

We take the example of the last plot in Figure 10.3. The corresponding part in Figure 10.4 shows that the detection is no good. A sinusoid target echo of 1750 Hz is simulated and added to the reverberation with SRR ¼ 14 dB. The target echo is located between the 9,001st and 11,000th samples. No Doppler effect is produced, and J ¼ 38 and B ¼ 1. After BSCM is performed on the backward matrix, J deconvolved signals are given, and then, all signals pass through the matched filter. We compute the kurtosis of all the outputs of the matched filter and choose the 18th deconvolved signal that has the highest kurtosis in Figure 10.6 as the BSCM result ~ d(t). In the next simulations, the BSCM results follow this rule. Figure 10.7 is the matched filter on the BSCM result ~d(t). By contrast to the fourth plot in Figure 10.4, it is apparent that the detection is more reliable after BSCM. Next, we show an example with Doppler effect.

10.4.2.2 Example 2: Simulation 2 A sinusoid target echo of 1745 Hz is simulated and added to the reverberation with SRR ¼ 19 dB. The target echo is located between the 12,001st and 14,000th samples. Doppler effect exists, and J ¼ 20 and B ¼ 1. Figure 10.8 is the matched filter on d(t) and the BSCM result ~ d(t). Figure 10.8 indicates that the BSCM is more effective when the Doppler effect is introduced because the reverberation and the target echo are more different. Figure 10.7 and Figure 10.8 show that the SRR is improved to the degree that the detection is more reliable than before. Since the transmitted signal and the target echo are all sinusoids of a single frequency or with very small bandwidth, obvious changes in frequency domain between the original d(t) and the BSCM result ~d(t) are not found. So, we perform the third

Blind Separation of Convolutive Mixtures for Canceling Active Sonar Reverberation

197

6

Components kurtosis

5.5 5 4.5 4 3.5 3

0

5

10

15

20 25 Components number

30

35

40

FIGURE 10.6 Kurtosis of outputs from the matched filter on the deconvoluted signals.

simulation. Although it is not very reasonable, it does help to show the validation of BSCM algorithm. 10.4.2.3

Example 3: Simulation 3

A hyperbolic frequency modulated (HFM) target echo is simulated and added to the reverberation with the SRR ¼ 28 dB. The center frequency is 1750 Hz and the bandwidth is 500 Hz. The target echo is located between the 12,001st and 18,000th samples, and J ¼ 30 and B ¼ 2. Figure 10.9 is the matched filter on d(t) and the BSCM result ~d(t). In Figure 10.9, it is amazing that the detection is improved greatly after BSCM. To compare the changes in frequency domain, we take out 6000 samples as the target echo from the 12,001st to 18,000th samples d(t) and ~ d(t), respectively, and then, FFT is applied on the two groups of data. Figure 10.10 shows the changes. It is apparent that the SRR has improved a lot after BSCM was applied.

0.8

Normalized amplitude

0.7

BSCM MF output

0.6 0.5 0.4 0.3 0.2 0.1 0

0

0.5

1

Samples

1.5

2

2.5 ⫻104

FIGURE 10.7 Output of matched filter on the BSCM result on reverberation with a sinusoid simulated target (SRR ¼ 14 dB, no Doppler effect exists).

Signal and Image Processing for Remote Sensing

198

Normalized amplitude

0.8 0.7 0.6 MF output

0.5 0.4 0.3 0.2 0.1 0

0

0.5

1

Samples

1.5

2

2.5 ⫻104

Normalized amplitude

0.8 0.7 0.6

BSCM MF output

0.5 0.4 0.3 0.2 0.1 0

0

0.5

1

Samples

1.5

2

2.5 ⫻104

FIGURE 10.8 Outputs of matched filter on a real reverberation with a sinusoid simulated target (SRR ¼ 19 dB, Doppler effect exists).

0.8

Normalized amplitude

0.7 MF output

0.6 0.5 0.4 0.3 0.2 0.1 0

0

0.5

1

Samples

1.5

2

2.5 ⫻104

0.8

Normalized amplitude

0.7 BSCM MF output

0.6 0.5 0.4 0.3 0.2 0.1 0

0

0.5

1

Samples

1.5

2

2.5 ⫻104

FIGURE 10.9 Outputs of matched filter on reverberation with a HFM simulated target with SRR ¼ 28 dB.

Blind Separation of Convolutive Mixtures for Canceling Active Sonar Reverberation

199

0.8

Normalized amplitude

0.7

Reverberation with simulated target

0.6 0.5 0.4 0.3 0.2 0.1 0

0

500

1000

1500 2000 Frequency bins

2500

3000

3500

0.8

Normalized amplitude

0.7

BSCM result

0.6 0.5 0.4 0.3 0.2 0.1 0

0

500

1000

1500 2000 Frequency bins

2500

3000

3500

FIGURE 10.10 The changes in frequency domain.

Normalized amplitude

0.8 0.7

MF output

0.6 0.5 0.4 0.3 0.2 0.1 0

0

1

2

3 Samples

4

5

6 ⫻ 104

Normalized amplitude

0.8 0.7

BSCM MF output

0.6 0.5 0.4 0.3 0.2 0.1 0

0

1

2

3 Samples

4

5

6 ⫻ 104

FIGURE 10.11 Outputs of matched filter on the active sonar data and the BSCM output in real target experiment.

Signal and Image Processing for Remote Sensing

200

Reverberation with simulated targets

Normalized amplitude

0.8 0.6 0.4 0.2 0 −0.2 −0.4 −0.6 −0.8

0

0.5

1

1.5

2 Samples

2.5

3

3.5

4 ⫻ 104

3.5

4 ⫻ 104

Normalized amplitude

0.8 0.6

BSCM output

0.4 0.2 0 −0.2 −0.4 −0.6 −0.8

0

0.5

1

1.5

2 Samples

2.5

3

FIGURE 10.12 Waveforms of deep sea reverberation with two simulated targets and BSCM output—no overlap between the target echoes.

10.4.2.4

Example 4: Real Target Experiment

To simplify the discussion in this paper, we only outline briefly the experimental setup here. Further details on the underwater acoustic data and experimental results are in Ref. [11]. The transmitted signal is HFM, the duration is 1 sec, and the bandwidth is 500 Hz. The target speed is about 2.0 m/sec, the range is about 2500 m. The sampling frequency is also 6000 Hz. The data we use is after beamforming. Figure 10.11 shows the matched filter output of the original data and the BSCM output signals. The figure reveals the effect in canceling oceanic reverberation by BSCM without any expensive prior knowledge. 10.4.2.5

Example 5: Simulation 4

The above examples are all reverberations in shallow water. In this simulation, we give an example for canceling deep reverberation. The sea depth is beyond 4000 m. The sampling frequency is also 20,000 Hz. The transmitted signal is HFM and the bandwidth is 100 Hz. In deep sea, more than one target echo may occur within the reverberation issued by one ping; that is, Equation 10.1 is now written as d(t) ¼

K1 X k¼1

hk (t t k ) e(t) þ |ﬄﬄﬄﬄﬄﬄ{zﬄﬄﬄﬄﬄﬄ} target

K2 X k¼K1 þ1

hk (t t k ) e(t) þ n(t) |ﬄﬄﬄﬄﬄﬄ{zﬄﬄﬄﬄﬄﬄ} |{z}

reverberation

noise

(10:14)

Blind Separation of Convolutive Mixtures for Canceling Active Sonar Reverberation

Reverberation with simulated targets

0.8 Normalized amplitude

201

0.6 0.4 0.2 0 −0.2 −0.4 −0.6 −0.8 −1

0

0.5

1

1.5

2 Samples

2.5

3

3.5

4 ⫻ 104

Normalized amplitude

0.8 0.6

BSCM output

0.4 0.2 0 −0.2 −0.4 −0.6 −0.8

0

0.5

1

1.5

2 Samples

2.5

3

3.5

4 ⫻ 104

FIGURE 10.13 Waveforms of deep sea reverberation with two simulated targets and BSCM output—overlap between the target echoes.

In this example two target echoes are simulated. The SRR is 0 dB. The first case is shown in Figure 10.12. The two echoes do not overlap, and the duration is 0.05 sec. The second overlaps at the rate of 66.7% as shown in Figure 10.13 and the duration is 0.3 sec. Both Figure 10.12 and Figure 10.13 show that SRR has improved. This simulation implies that BSCM is also effective in canceling deep sea reverberation. In all simulations, we do not use BSS, but it does not imply that BSS is not useful. Compared to BSCM, BSS needs larger J and B, so the computation is more expensive. Sometimes, BSS cannot provide sound results, but BSCM can be used because it is much closer to the convolutive model of the active sonar data.

10 .5

Conc lusi ons

In this chapter we applied three useful and easily obtainable statistical characteristics to remove reverberation by deconvolving and distinguishing the target echoes from the reverberation echoes. They are the nonstationarity and the colored features of reverberation, and the non-Gaussianity of outputs of the matched filter on BSCM results. Except for statistical information, BSCM usually does not need other expensive prior knowledge, such as the target echo energy or the Doppler shift, and so on. This is the advantage of our

202

Signal and Image Processing for Remote Sensing

method. Though the active sonar data used in this paper is one-dimensional convolutive mixtures, the elimination of real sea reverberation proves that the approximation to the BSCM model is effective through backward data matrix. Examples show that BSCM results can improve the SRR to a degree of satisfactory detection. Finding a good criterion for choosing the appropriate data matrix, as in Ref. [15], is open for exploration.

References 1. T.J. Barnard and F. Khan, Statistical normalization of spherically invariant non-Gaussian clutter, IEEE Journal of Oceanic Engineering, 29(2), 303–309, 2004. 2. J.M. Fialkowski, R.C. Gauss, and D.M. Drumheller, Measurements and modeling of lowfrequency near-surface scattering statistics, IEEE Journal of Oceanic Engineering, 29(2), 197–214, 2004. 3. K.D. LePage, Statistics of broad-band bottom reverberation predictions in shallow-water waveguides, IEEE Journal of Ocean Engineering, 29(2), 330–346, 2004. 4. J.R. Preston and D.A. Abraham, Non-Rayleigh reverberation characteristics near 400 Hz observed on the New Jersey shelf, IEEE Journal of Oceanic Engineering, 29(2), 215–235, 2004. 5. I.P. Kirsteins and D.W. Tufts, Adaptive detection using low rank approximation to a data matrix, IEEE Transactions on Aerospace and Electronic Systems, 30(1), 55–67, 1994. 6. I.P. Kirsteins and D.W. Tufts, Rapidly adaptive nulling of interference, IEEE International Conference on Systems Engineering, pp. 269–272, 24–26 August 1989. 7. B.E. Freburger and D.W. Tufts, Case study of principal component inverse and cross spectral metric for low rank interference adaptation, in Proceedings of ICASSP ’98, Vol. 4, pp. 1977–1980, 12–15 May 1998. 8. B.E. Freburger and D.W. Tufts, Adaptive detection performance of principal components inverse, cross spectral metric and the partially adaptive multistage Wiener filter, Conference Record of the Thirty-Second Asilomar Conference on Signals, Systems & Computers, Vol. 2, pp. 1522– 1526, 1–4 November 1998. 9. B.E. Freburger and D.W. Tufts, Rapidly adaptive signal detection using the principal component inverse (PCI) method, Conference Record of the Thirty-First Asilomar Conference on Signals, Systems & Computers, Vol. 1, pp. 765–769, 2–5 November 1997. 10. T.A. Palka and D.W. Tufts, Reverberation characterization and suppression by means of principal components, in OCEANS ’98 Conference Proceedings, Vol. 3, pp. 1501–1506, 28 September–1 October 1998. 11. G. Ginolhac and G. Jourdain, Principal component inverse algorithm for detection in the presence of reverberation, IEEE Journal of Oceanic Engineering, 27(2), 310–321, 2002. 12. G. Ginolhac and G. Jourdain, Detection in presence of reverberation, OCEANS 2000 MTS/IEEE Conference and Exhibition, Vol. 2, pp. 1043–1046, 11–14 September 2000. 13. V. Carmillet, P.O. Amblard, and G. Jourdain, Detection of phase- or frequency-modulated signals in reverberation noise, JASA, 105(6), 3375–3389, 1999. 14. A. Cichocki and S. Amari, Adaptive Blind Signal Image Processing: Learning Algorithm and Application, John Wiley & Sons, New York, 2002. 15. F. Cong et al., Blind signal separation and reverberation canceling with active sonar data, in Proceedings of ISSPA, Australia 2005. 16. G.S. Edelson and I.P. Kirsteins, Modeling and suppression of reverberation components, IEEE Seventh SP Workshop on Statistical Signal and Array Processing, pp. 437–440, 26–29 June 1994 17. A. Hyva¨rinen, J. Karhunen, and E. Oja, Independent Component Analysis, Wiley Interscience, New York, 2001. 18. D.A. Abraham and A.P. Lyons, Simulation of non-Rayleigh reverberation and clutter, IEEE Journal of Oceanic Engineering, 29(2), 347–362, 2004.

Blind Separation of Convolutive Mixtures for Canceling Active Sonar Reverberation

203

19. F. Cong et al., Approach based on colored character to blind deconvolution for speech signals, in Proceedings of ICIMA, pp. 396–399, 2004. 20. S. Amari, S.C. Douglas, A. Cichocki, and H.H. Yang, Multichannel blind deconvolution and equalization using the natural gradient, in Proceedings of IEEE Workshop Signal Processing Advances Wireless Communications, pp. 101–104, April 1997. 21. M. Kawamoto, K. Matsuoka, and N. Ohnishi, A method of blind separation for convolved nonstationary signals, Neurocomputing, 22, 157–171, 1998. 22. S.C. Douglas and X. Sun, Convolutive blind separation of speech mixtures using the natural gradient, Speech Communications, 39, 65–78, 2003. 23. P. Smaragdis, Blind separation of convolved mixtures in the frequency domain, Neurocomputing, 22, 21–34, 1998. 24. H. Sawada, R. Mukai, S. Araki, and S. Makino, Polar coordinate based nonlinear function for frequency domain blind source separation, IEICE Transactions Fundamentals, E86-A(3), 590– 596, 2003. 25. L. Schobben and W. Sommen, A frequency domain blind signal separation method based on decorrelation, IEEE Transactions on Signal Processing, 50, 1855–1865, 2002. 26. H. Sawada, R. Mukai, S. Araki, and S. Makino, A robust and precise method for solving the permutation problem of frequency-domain blind source separation, IEEE Transactions on Speech and Audio Processing, 12(5), 530–538, 2004. 27. W. Lu and J.C. Rajapakse, Eliminating indeterminacy in ICA, Neurocomputing, 50, 271–290, 2003. 28. S. Araki, R. Mukai, S. Makino, T. Nishikawa, and H. Saruwatari, The fundamental limitation of frequency domain blind source separation for convolutive mixtures of speech, IEEE Transactions on Speech and Audio Processing, 11(2), 109–116, 2003. 29. M.Z. Ikram and D.R. Morgan, Permutation inconsistency in blind speech separation: investigation and solutions, IEEE Transactions on Speech and Audio Processing, 13(1), 1–13, 2005. 30. A. Dapena and L. Castedo, A novel frequency domain approach for separating convolutive mixtures of temporally white signals, Digital Signal Processing, 13(2), 301–316, 2003. 31. A. Dapena, M.F. Bugallo, and L. Catcdo, Separation of convolutive mixtures of temporallywhite signals: a novel frequency-domain approach, in Proceedings of Third ICA, San Diego, California, pp. 179–184, 2001. 32. C. Mejuto, A. Dapena, and L. Casteda, Frequency-domain infomax for blind source separation of convolutive mixtures, in Proceedings of ICA 2000, pp. 315–320, Helsinki, 2000. 33. K. Matsuoka and S. Nakashima, Minimal distortion principle for blind source separation, in Proceedings of ICA, pp. 722–727, December 2001. 34. N. Murata, S. Ikeda, and A. Ziehe, An approach to blind source separation based on temporal structure of speech signals, Neurocomputing, 41(1), 1–24, 2001. 35. W. Lu and J.C. Rajapakse, Eliminating indeterminacy in ICA, Neurocomputing, 50, 271–290, 2003. 36. K.I. Diamantaras and S.Y. Kung, Principal Component and Neural Network Theory and Application, John Wiley & Sons, New York, 1996. 37. A. Neumann and H. Krawczyk, Principal Component Inversion, Training Course on Remote Sensing of Ocean Color, Ahmedabad, India, February 2001. 38. M.Z. Ikram, Multichannel Blind Separation of Speech Signals in a Reverberant Environment, Ph.D thesis, Georgia Institute of Technology, 2001.

11 Neural Network Retrievals of Atmospheric Temperature and Moisture Profiles from HighResolution Infrared and Microwave Sounding Data

William J. Blackwell

CONTENTS 11.1 Introduction ..................................................................................................................... 206 11.2 A Brief Overview of Spaceborne Atmospheric Remote Sensing............................ 207 11.2.1 Geophysical Parameter Retrieval ................................................................... 209 11.2.2 The Motivation for Computationally Efficient Algorithms ....................... 210 11.3 Principal Components Analysis of Hyperspectral Sounding Data........................ 210 11.3.1 The PC Transform............................................................................................. 211 11.3.2 The NAPC Transform ...................................................................................... 211 11.3.3 The Projected PC Transform ........................................................................... 211 11.3.4 Evaluation of Compression Performance Using Two Different Metrics ...................................................................................... 212 11.3.4.1 PC Filtering ....................................................................................... 212 11.3.4.2 PC Regression................................................................................... 213 11.3.5 NAPC of Clear and Cloudy Radiance Data ................................................. 214 11.3.6 NAPC of Infrared Cloud Perturbations ........................................................ 214 11.3.7 PPC of Clear and Cloudy Radiance Data ..................................................... 216 11.4 Neural Network Retrieval of Temperature and Moisture Profiles ........................ 218 11.4.1 An Introduction to Multi-Layer Neural Networks ..................................... 218 11.4.2 The PPC–NN Algorithm.................................................................................. 219 11.4.2.1 Network Topology........................................................................... 220 11.4.2.2 Network Training ............................................................................ 220 11.4.3 Error Analyses for Simulated Clear and Cloudy Atmospheres ............... 220 11.4.4 Validation of the PPC–NN Algorithm with AIRS=AMSU Observations of Partially Cloudy Scenes Over Land and Ocean ............. 222 11.4.4.1 Cloud Clearing of AIRS Radiances............................................... 222 11.4.4.2 The AIRS=AMSU=ECMWF Data Set.............................................. 223 11.4.4.3 AIRS=AMSU Channel Selection ..................................................... 223 11.4.4.4 PPC–NN Retrieval Enhancements for Variable Sensor Scan Angle and Surface Pressure.................................................. 225 11.4.4.5 Retrieval Performance..................................................................... 225 11.4.4.6 Retrieval Sensitivity to Cloud Amount........................................ 225 11.4.5 Discussion and Future Work .......................................................................... 226

205

Signal and Image Processing for Remote Sensing

206

11.5 Summary .......................................................................................................................... 227 Acknowledgments ..................................................................................................................... 230 References ................................................................................................................................... 230

11.1

Introduction

Modern atmospheric sounders measure radiance with unprecedented resolution and accuracy in spatial, spectral, and temporal dimensions. For example, the Atmospheric Infrared Sounder (AIRS), operational on the NASA EOS Aqua satellite since 2002, provides a spatial resolution of 15 km, a spectral resolution of n=Dn 1200 (with 2,378 channels from 650 to 2675 cm1), and a radiometric accuracy on the order of +0.2 K. Typical polar-orbiting atmospheric sounders measure approximately 90% of the Earth’s atmosphere (in the horizontal dimension) approximately every 12 h. This wealth of data presents two major challenges in the development of retrieval algorithms, which estimate the geophysical state of the atmosphere as a function of space and time from upwelling spectral radiances measured by the sensor. The first challenge concerns the robustness of the retrieval operator and involves maximal use of the geophysical content of the radiance data with minimal interference from instrument and atmospheric noise. The second is to implement a robust algorithm within a given computational budget. Estimation techniques based on neural networks (NNs) are becoming more common in high-resolution atmospheric remote sensing largely because their simplicity, flexibility, and ability to accurately represent complex multi-dimensional statistical relationships allow both of these challenges to be overcome. In this chapter, we consider the retrieval of atmospheric temperature and moisture profiles (quantity as a function of altitude) from radiance measurements at microwave and thermal infrared wavelengths. A projected principal components (PPC) transform is used to reduce the dimensionality of and optimally extract geophysical information from the spectral radiance data, and a multi-layer feedforward NN is subsequently used to estimate the desired geophysical profiles. This algorithm is henceforth referred to as the ‘‘PPC–NN’’ algorithm. The PPC–NN algorithm offers the numerical stability and efficiency of statistical methods without sacrificing the accuracy of physical, model-based methods. The chapter is organized as follows. First, the physics of spaceborne atmospheric remote sensing is reviewed. The application of principal components transforms to hyperspectral sounding data is then presented and a new approach is introduced, where the sensor radiances are projected into a subspace that reduces spectral redundancy and maximizes the resulting correlation to a given parameter. This method is very similar to the concept of canonical correlations introduced by Hotelling over 70 years ago [1], but its application in the hyperspectral sounding context is new. Second, the use of multi-layer feedforward NNs for geophysical parameter retrieval from hyperspectral measurements (first proposed in 1993 [2]) is reviewed, and an overview of the network parameters used in this work is given. The combination of the PPC radiance compression operator with an NN is then discussed, and performance analyses comparing the PPC–NN algorithm to traditional retrieval methods are presented.

Retrievals of Atmospheric Temperature and Moisture Profiles

11.2

207

A Brief Overview of Spaceborne Atmospheric Remote Sensing

The typical measurement scenario for spaceborne atmospheric remote sensing is shown in Figure 11.1. A sensor measures upwelling spectral radiance (intensity as a function of frequency) at various incidence angles. The sensor data is usually calibrated to remove measurement artifacts such as gain drift, nonlinearities, and noise. The spectral radiances measured by the sensor are related to geophysical quantities, such as the vertical temperature profile of the atmosphere, and therefore must be converted into a geophysical quantity of interest through the use of an appropriate retrieval algorithm. The radiative transfer equation describing the radiation intensity observed at altitude L, viewing angle u, and frequency n can be formulated by including reflected atmospheric and cosmic contributions and the radiance emitted by the surface as follows [3,4]: Rn (L) ¼

ðL

ðL 0 0 kn (z)Jn [T(z)] exp sec ukn (z ) dz sec u dz

0

þ rn e

t* sec u

ðL

z

ðz 0 0 kn (z)Jn [T(z)] exp sec ukn (z ) dz sec u dz

0

0

þ rn e2t* sec u Jn (Tc ) þ «n et* sec u Jn (Ts )

(11:1)

Intensity

Voltage

1 0.5 0 −0.5 −1 −1.5 −2 −1

0

1

300 290 280 270 260 250 240 230 220 210 200 500

Spectral radiance

104

103

2

1000

1500

2000

2500

Wavelength

Time

Calibraton

Temperature profile 105

Altitude

Detector output 2 1.5

3000

10 180

200

220

240

260

280

300

Temperature

Retrieval algorithm

FIGURE 11.1 Typical measurement scenario for spaceborne atmospheric remote sensing. Electromagnetic radiation that reaches the sensor is emitted by the sun, cosmic background, atmosphere, surface, and clouds. This radiation can also be reflected or scattered by the surface, atmosphere, or clouds. The spectral radiances measured by the sensor are related to geophysical quantities, such as the vertical temperature profile of the atmosphere, and therefore must be converted into a geophysical quantity of interest through the use of an appropriate retrieval algorithm.

Signal and Image Processing for Remote Sensing

208

where «n is the surface emissivity, rn is the surface reflectivity, Ts is the surface temperature, kn(z) is the atmospheric absorption coefficient, t * is the atmospheric zenith opacity, Tc is the cosmic background temperature (2.736 + 0.017 K), and Jn(T) is the radiance intensity emitted by a blackbody at temperature T, which is given by the Planck equation: Jn (T) ¼

hn3 1 W m2 ster1 Hz1 c2 ehn=kT 1

(11:2)

The first term in Equation 11.1 can be recast in terms of a transmittance function Tn(z): Rn (L) ¼

ðL 0

dTn (z) Jn [T(z)] dz dz

(11:3)

The derivative of the transmittance function with respect to altitude is often called the temperature weighting function Wn (z)

dTn (z) dz

(11:4)

and gives the relative contribution of the radiance emanating from each altitude. The temperature weighting functions for the Advanced Microwave Sounding Unit (AMSU) are shown in Figure 11.2.

AMSU-A weighting functions

AMSU-B weighting functions

60

50

10−2

14

40 Altitude (km)

Water vapor burden (g/cm2)

Channel

13

30

12 11

20

10

183 ± 1 GHz

10−1

183 ± 3 GHz

9

10

0 0

183 ± 7 GHz

100

8 7 6

3

0.2

0.4 dT/d(In P )

150 GHz

5 4

0.6

89 GHz

0

0.1 0.2 0.3 0.4 dT/d(In (water vapor burden))

FIGURE 11.2 AMSU-A temperature profile (left) and AMSU-B water vapor profile (right) weighting functions

Retrievals of Atmospheric Temperature and Moisture Profiles 11.2.1

209

Geophysical Parameter Retrieval

The objective of the geophysical parameter retrieval algorithm is to estimate the state of the atmosphere (represented by parameter matrix X, say), given observations of spectral radiance (represented by radiance matrix R, say). There are generally two approaches to this problem, as shown in Figure 11.3. The first approach, referred to here as the variational approach, uses a forward model (for example, the transmittance and radiative transfer models previously discussed) to calculate the sensor radiance that would be measured given a specific atmospheric state. Note that the inverse model typically does not exist, as there are generally an infinite number of atmospheric states that could give rise to a particular radiance measurement. In the variational approach, a ‘‘guess’’ of the atmospheric state is made (this is usually obtained through a forecast model or historical statistics), and this guess is propagated through the forward models thereby producing an estimate of the at-sensor radiance. The measured radiance is compared with this estimated radiance, and the state vector is adjusted so as to reduce the difference between the measured and estimated radiance vectors. Details on this methodology are discussed at length by Rodgers [5], and the interested reader is referred there for a more thorough treatment of the methodology and implementation of variational retrieval methods. The second approach, referred to here as the statistical, or regression-based, approach, does not use the forward model explicitly to derive the estimate of the atmospheric state vector. Instead, an ensemble of radiance–state vector pairs is assembled, and a statistical characterization (p(X), p(R), and p(X,R)) is sought. In practice, it is difficult to obtain these probability density functions (PDFs) directly from the data, and alternative methods are often used. Two of these methods are linear least-squares estimation (LLSE), or linear regression, and nonlinear least-squares estimation (NLLSE). NNs are a special class of NLLSEs, and will be discussed later.

Variational approach: • A forward model relates the geophysical state of the atmosphere to the radiances measured by the sensor.

R=f

X ≡ [T (r,t ), W (r,t ), O (r,t ),…] surface reflectivity, solar illumination, etc. +e observing system (bandwidth, resolution, etc.)

Observation noise • A “guess” of the atmospheric state is adjusted iteratively until modeled radiance “matches” observed radiance.

g = ||R – Robs|| + h(X ) “Regularization” term

Statistical (regression-based) approach: • An ensemble of radiance−state vector pairs is assembled, and X ˆ = g(Robs), where g(·) is argmin ||Xens – g(Rens)|| a statistical relationship g(·) between the two is dervied empirically. Examples of g(·) include LLSE and neural network FIGURE 11.3 Variational and statistical approaches to geophysical parameter retrieval. In the variational approach, a forward model is used to predict at-sensor radiances based on atmospheric state. In the statistical approach, an empirical relationship between at-sensor radiances and atmospheric state is derived using an ensemble of radiance–state vectors.

Signal and Image Processing for Remote Sensing

210 11.2.2

The Motivation for Computationally Efficient Algorithms

The principal advantage of regression-based methods is their simplicity—once the coefficients are derived from ‘‘training’’ data, the calculation of atmospheric state vectors is relatively easy. The variational approaches require multiple calls to the forward models, which can be computationally prohibitive. The computational complexity of the forward models is usually nonlinearly related (often O(n2) or more) to the number of spectral channels. As shown in Figure 11.4, the spectral and spatial resolution of infrared sounders has increased dramatically over the last 35 years, and the required computation needed for real-time operation with variational algorithms has outpaced Moore’s Law. There is, therefore, a motivation to reduce the computational burden of current and next-generation retrieval algorithms to allow real-time ingestion of satellite-derived geophysical products into numerical weather forecast models.

11.3

Principal Components Analysis of Hyperspectral Sounding Data

Principal components (PC) transforms can be used to represent radiance measurements in a statistically compact form, enabling subsequent retrieval operators to be substantially

Spatial resolution

Spectral resolution 105

9 IR Microwave

IR Microwave 8 IASI

7

AIRS

6 Footprints per 100 km

Number of special channels

104

103

102

5

4

3 HIRS 101

2

ITPR

AMSU/ATMS 1

100 1970

1980

1990 Year

2000

2010

MSU

NEMS 0 1970 1980

1990 Year

2000

2010

FIGURE 11.4 Improvements in sensor spectral and spatial resolution over the last 35 years is shown. The recent increases in the spectral resolutions afforded by infrared sensors has far surpassed that available from microwave sensors. The trends in spatial resolution are similar for infrared and microwave sensors.

Retrievals of Atmospheric Temperature and Moisture Profiles

211

more efficient and robust (see Ref. [6], for example). Furthermore, measurement noise can be dramatically reduced through the use of PC filtering [7,8], and it has also been shown [9] that PC transforms can be used to represent variability in high-spectral-resolution radiances perturbed by clouds. In the following sections, several variants of the PC transform are briefly discussed, with emphasis focused on the ability of each to extract geophysical information from the noisy radiance data. 11.3.1

The PC Transform

The PC transform is a linear, orthonormal operator1 QrT, which projects a noisy ~ ¼ R þ , into an r-dimensional (r m) subspace. The m-dimensional radiance vector, R additive noise vector is assumed to be uncorrelated with the radiance vector R, and is ~ , that is, P ~ ¼ QrT R ~ have characterized by the noise covariance matrix C. The ‘‘PC’’ of R two desirable properties: (1) the components are statistically uncorrelated and (2) the reduced-rank reconstruction error. ^ ^~ R ~ )T (R ~ )] ~ R c () ¼ E[(R (11:5) 1

r

r

^ ~ for some linear operator Gr with rank r, is minimized when Gr ¼ Qr QrT. ~ r D Gr R where R ¼ The rows of QTr contain the r most-significant (ordered by descending eigenvalue) eigenvectors of the noisy data covariance matrix CR~ R~ ¼ CRR þ C. 11.3.2

The NAPC Transform

Cost criteria other than in Equation 11.5 are often more suitable for typical hyperspectral compression applications. For example, it might be desirable to reconstruct the noise-free radiances and filter the noise. The cost equation thus becomes ^ r R)T (R ^ r R)] c2 () ¼ E[(R (11:6) ^ r D Hr R ~ for some linear operator Hr with rank r. The noise-adjusted principal where R ¼ 1=2 1=2 components (NAPC) transform [10], where Hr ¼ C Wr WrT C and WrT contains the 1=2 r most-significant eigenvectors of the whitened noisy covariance matrix Cw~ w~ ¼ C 1=2 (CRR þ C)C , maximizes the signal-to-noise ratio of each component, and is superior to the PC transform for most noise-filtering applications where the noise statistics are known a priori.

11.3.3

The Projected PC Transform

It is often unnecessary to require that the PC be uncorrelated, and linear operators can be derived that offer improved performance over PC transforms for minimizing cost functions such as in Equation 11.6. It can be shown [11] that the optimal linear operator with rank r that minimizes Equation 11.6 is Lr ¼ Er ETr CRR (CRR þ CCC )1

(11:7)

where Er ¼ [E1 j E2 j . . . jEr] are the r most-significant eigenvectors of CRR (CRR þ C)1CRR. Examination of Equation 11.7 reveals that the Wiener-filtered radiances are projected onto the r-dimensional subspace spanned by Er. It is this projection that The following mathematical notation is used in this chapter: ()T denotes the transpose, (~ ) denotes a noisy random vector, and () denotes an estimate of a random vector. Matrices are indicated by bold upper case, vectors by upper case, and scalars by lower case. 1

Signal and Image Processing for Remote Sensing

212

motivates the name ‘‘PPC.’’ An orthonormal basis for this r-dimensional subspace of the original m-dimensional radiance vector space R is given by the r most-significant right eigenvectors, Vr, of the reduced-rank linear regression matrix, Lr, given in Equation 11.7. ~ as We then define the PPC of R ~ ¼ VT R ~ P r

(11:8)

~ are correlated, as VT ðCRR þ CCC ÞVr is not a diagonal matrix. Note that the elements of P r Another useful application of the PPC transform is the compression of spectral radiance information that is correlated with a geophysical parameter, such as the temperature profile. The r-rank linear operator that captures the most radiance information, which is correlated to the temperature profile, is similar to Equation 11.7 and is given below: Lr ¼ Er ETr CTR (CRR þ CCC )1

(11:9)

where Er ¼ [E1 j E2 j j Er] are the r most-significant eigenvectors of CTR (CRR þ CCC )1 CRT , and CTR is the cross-covariance of the temperature profile and the spectral radiance.

11.3.4

Evaluation of Compression Performance Using Two Different Metrics

The compression performance of each of the PC transforms discussed previously was evaluated using two performance metrics. First, we seek the transform that yields the best (in the sum-squared sense) reconstruction of the noise-free radiance spectrum given a noisy spectrum. Thus, we seek the optimal reduced-rank linear filter. The second performance metric is quite different and is based on the temperature retrieval performance in the following way. A radiance spectrum is first compressed using each of the PC transforms for a given number of coefficients. The resulting coefficients are then used in a linear regression to estimate the temperature profile. The results that follow were obtained using simulated, clear-air radiance intensity spectra from an AIRS-like sounder. Approximately, seven thousand and five-hundred 1750-channel radiance vectors were generated with spectral coverage from approximately 4 to 15 mm using the NOAA88b radiosonde set. The simulated intensities were expressed in spectral radiance units (mW m2 sr1(cm1)1). 11.3.4.1

PC Filtering

Figure 11.5a shows the sum-squared radiance distortion (Equation 11.5) as a function of the number of components used in the various PC decomposition techniques. The a priori level indicates the sum-squared error due to sensor noise. Results from two variants of the PC transform are plotted, where the first variant (the ‘‘PC’’ curve) uses eigenvectors of CR^ R^ as the transform basis vectors, and the second variant (the ‘‘noise-free PC’’ curve) uses eigenvectors of CRR as the transform basis vectors. It is shown in Figure 11.5a that the PPC reconstruction of noise-free radiances (PPC[R]) yields lower distortion than both the PC and NAPC transforms for any number of components (r). It is noteworthy that the ‘‘PC’’ and ‘‘noise-free PC’’ curves never reach the theoretically optimal level, defined by the full-rank Wiener filter. Furthermore, the PPC distortion curves decrease monotonically with coefficient number, while all the PC distortion curves exhibit a local minimum, after which the distortion increases with coefficient number as noisy, high-order terms are

Retrievals of Atmospheric Temperature and Moisture Profiles (a)

213

Noise-filtered radiance reconstruction error

Sum-squared error (mW m–2 sr –1 (cm–1)–1)

106 PC Noise-free PC NAPC PPC(R ) PPC(T )

104

102

100 100

a priori

Theoretical limit

101

102

103

Number of components (r )

(b)

Temperature profile retrieval error

Sum-squared error (K)

104 PC Noise-free PC NAPC PPC(R ) PPC(T )

103

102 100

Theoretical limit 101

102

103

Number of components (r )

FIGURE 11.5 Performance comparisons of the PC (where the components are derived from both noisy and noise-free radiances), NAPC, and PPC transforms for a hypothetical 1750-channel infrared (4–15 mm) sounder. Two projected principal components transforms were considered, PPC(R) and PPC(T), which are, respectively: (a) maximum representation of noise-free radiance energy, and (b) maximum representation of temperature profile energy. The first plot shows the sum-squared error of the reduced-rank reconstruction of the noise-free spectral radiances. The second plot shows the temperature profile retrieval error (trace of the error covariance matrix) obtained using linear regression with r components.

included. The noise in the high-order PPC terms is effectively zeroed out, because it is uncorrelated with the spectral radiances. 11.3.4.2 PC Regression The PC coefficients derived in the previous example are now used in a linear regression to estimate the temperature profile. Figure 11.5b shows the temperature profile error (integrated over all altitude levels) as a function of the number of coefficients used in the linear regression, for each of the PC transforms. To reach the theoretically optimal value achieved by linear regression with all channels requires approximately 20 PPC coefficients, 200 NAPC coefficients, and 1000 PC coefficients. Thus, the PPC transform results in a factor of ten improvement over the NAPC transform when compressing temperaturecorrelated radiances (20 versus 200 coefficients required), and approximately a factor of 100 improvement over the original spectral radiance vector (20 versus 1750). Note that the first guess in the AIRS Science Team Level-2 retrieval uses a linear regression derived from approximately 60 of the most-significant NAPC coefficients of the 2378-channel AIRS spectrum (in units of brightness temperature) [6]. Results for the moisture profile

214

Signal and Image Processing for Remote Sensing

are similar, although more coefficients (typically 35 versus 25 for temperature) are needed because of the higher degree of nonlinearity in the underlying physical relationship between atmospheric moisture and the observed spectral radiance. This substantial compression enables the use of relatively small (and thus very stable and fast) NN estimators to retrieve the desired geophysical parameters. It is interesting to consider the two variants of the PPC transform shown in Figure 11.5, namely PPC(R), where the basis for the noise-free radiance subspace is desired, and PPC(T), where the basis for only the temperature profile information is desired. As shown in Figure 11.5a, the PPC(T) transform poorly represents the noise-free radiance space, because there is substantial information that is uncorrelated with temperature (and thus ignored by the PPC(T) transform) but correlated with the noise-free radiance. Conversely, the PPC(R) transform offers a significantly less compact representation of temperature profile information (see Figure 11.5b), because the transform is representing information that is not correlated with temperature and thus superfluous when retrieving the temperature profile.

11.3.5

NAPC of Clear and Cloudy Radiance Data

In the following sections we compute the NAPC (and associated eigenvalues) of clear and cloudy radiance data, the NAPC of the infrared radiance perturbations due to clouds, and the projected (temperature) principal components of clear and cloudy radiance data. The 2378 AIRS radiances were converted from spectral intensities to brightness temperatures using Equation 11.2, and were concatenated with the 20 microwave brightness temperatures from AMSU-A and AMSU-B into a single vector R of length 2398. The NAPC were computed as follows: PNAPC ¼ QT R

(11:10)

~W ~ , sorted in descending order by eigenvalue. CW ~W ~ is where Q are the eigenvectors of CW the prewhitened covariance matrix discussed in Section 11.3. The eigenvalues corresponding to the top 100 NAPC are shown in Figure 11.6 for simulated clear-air and cloudy data. Also shown are scatterplots of the first three NAPC. The eigenvalues of the 90 lowest order terms are very similar. The principal differences occur in the three highest order terms, which are dominated by channels with weighting function peaks in the lower part of the atmosphere. The eigenvalues associated with the clearair and cloudy NAPC cluster into roughly five groups: 1, 2–3, 4–9, 10–11, and 12–100. The first 11 NAPC capture 99.96% of the total radiance variance for both the clear-air and cloudy data. The top three NAPCs of both clear-air and cloudy data appear to be jointly Gaussian to a close approximation, with the exception of clear-air NAPC #1 versus NAPC #2.

11.3.6

NAPC of Infrared Cloud Perturbations

We define the infrared cloud perturbation DRIR as D Rclr Rcld DRIR ¼ IR IR

(11:11)

clr cld where RIR is the clear-air infrared brightness temperature and RIR is the cloudy infrared brightness temperature. The NAPC of DRIR were calculated using the method described above. The results are shown in Figure 11.7.

Retrievals of Atmospheric Temperature and Moisture Profiles

215

108 Clear air Clouds

Eigenvalue

106

104

102

10

20

30

70 6

2

4

4

0

−4 −6 −4

NAPC #3

6

−2

2 0 −2

0 −2 NAPC #1

−4 −4

2

0 −2 NAPC #1

−4 −5

2

4

4

−6 −5

NAPC #3

2

2 0 −2

−4 0 NAPC #1

5

−4 −5

100

−2

6

−2

90

0

6

0

80

2

4

NAPC #3

NAPC #2

40 50 60 NAPC coefficient #

4

NAPC #3

NAPC #2

100 0

0 NAPC #2

5

0 NAPC #2

5

2 0 −2

0 NAPC #1

5

−4 −5

FIGURE 11.6 Noise-adjusted principal components transform analysis of clear and cloudy simulated AIRS=AMSU data. The top plot shows the eigenvalues of each NAPC coefficient for clear and cloudy data. The middle row presents scatterplots of the three clear-air NAPC coefficients with the largest variance (shown normalized to unit variance). The bottom row presents scatterplots of the three cloudy NAPC coefficients with the largest variance (shown normalized to unit variance).

The six highest order NAPC of DRIR capture approximately 99.96% of the total cloud perturbation variance, which suggests that there are more degrees of freedom in the atmosphere than there are in the clouds. Furthermore, there is significant crosstalk between the cloud perturbation and the underlying atmosphere, and this crosstalk is highly nonlinear and non-Gaussian. Evidence of this can be seen in the scatterplot of NAPC #1 versus NAPC #2, shown in the lower left corner of Figure 11.7.

Signal and Image Processing for Remote Sensing

216 108

Clear−cloudy Clear

107

Eigenvalue

106 105 104 103 102 101

10

20

30

40 50 60 NAPC coefficient #

NAPC #3

NAPC #2

5

0

−5 −2

0

2 4 NAPC #1

6

70

5

5

0

0

NAPC #3

100 0

−5

−10 −2

0

2 4 NAPC #1

6

80

90

100

−5

−10 −5

0 NAPC #2

5

FIGURE 11.7 Noise-adjusted principal components transform analysis of the cloud impact (clear radiance–cloudy radiance) for simulated AIRS data. The top plot shows the eigenvalue of each NAPC coefficient of cloud impact, along with the NAPC coefficients of clear-air data (shown in Figure 11.6). The bottom row presents scatterplots of the three cloud-impact NAPC coefficients with the largest variance (shown normalized to unit variance).

The temperature weighting functions of NAPC #1 and NAPC #2 are shown in Figure 11.8. NAPC #1 consists primarily of surface channels and NAPC #2 consists primarily of channels that peak near 3–6 km and channels that peak near the surface. Therefore, NAPC #1 is sensitive principally to the overall cloud amount, while NAPC #2 is also sensitive to cloud altitude.

11.3.7

PPC of Clear and Cloudy Radiance Data

The PPC transform discussed previously was used to identify temperature information contained in the clear and cloudy radiances. Figure 11.9 shows the mean temperature profile retrieval error for the reduced-rank regression operator given in Equation 11.9 as a function of rank (the number of PPC coefficients retained) for clear-air and cloudy radiance data. Both curves have asymptotes near 15 coefficients, and clouds degrade the temperature retrieval by an average of approximately 0.3 K RMS.

Retrievals of Atmospheric Temperature and Moisture Profiles

217

15 Clear–cloudy: NAPC #1 Clear–cloudy: NAPC #2

Altitude (km)

10

5

0

0

0.05

0.1

0.15 0.2 0.25 0.3 0.35 Temperature weighting function (km−1)

0.4

0.45

0.5

FIGURE 11.8 Temperature weighting functions of the first two noise-adjusted principal components of AIRS cloud perturbations (clear radiance–cloudy radiance).

3

Mean temperature profile error (K RMS)

Clear air Clouds 2.5

2

1.5

1

0.5 0

5

10

15

20

25

30

35

40

45

50

Number of PPC coefficients FIGURE 11.9 Projected principal components transform analysis of clear and cloudy simulated AIRS=AMSU data. The mean temperature profile retrieval error (K RMS) is shown as a function of the number of PPC coefficients used in a linear regression for simulated clear and cloudy data.

Signal and Image Processing for Remote Sensing

218

11.4

Neural Network Retrieval of Temperature and Moisture Profiles

An NN is an interconnection of simple computational elements, or nodes, with activation functions that are usually nonlinear, monotonically increasing, and differentiable. NNs are able to deduce input–output relationships directly from the training ensemble without requiring underlying assumptions about the distribution of the data. Furthermore, an NN with only a single hidden layer of a sufficient number of nodes with nonlinear activation functions is capable of approximating any real-valued continuous scalar function to a given precision over a finite domain [12,13].

11.4.1

An Introduction to Multi-Layer Neural Networks

Consider a multi-layer feedforward NN consisting of an input layer, an arbitrary number of hidden layers (usually one or two), and an output layer (see Figure 11.10). The hidden layersPtypically contain sigmoidal activation functions of the form zj ¼ tanh(aj), where d aj ¼ i ¼ 1 wji xi þ bj. The output layer is typically linear. The weights (wji) and biases (bj) for the jth neuron are chosen to minimize a cost function over a set of P training patterns. A common choice for the cost function is the sum-squared error, defined as E(w) ¼

1 X X (p) (p) 2 t k yk 2 p k

(11:12)

(p) where y(p) k and tk denote the network outputs and target responses, respectively, of each output node k given a pattern p, and w is a vector containing all the weights and biases of the network. The ‘‘training’’ process involves iteratively finding the weights and biases that minimize the cost function through some numerical optimization procedure. Secondorder methods are commonly used, where the local approximation of the cost function by a quadratic form is given by

(a) Neural network topology

(b) Perceptron x1 x2

wj,i

Σ

x3

. . .

xd Input layer

First hidden layer

Second hidden layer

Output layer

aj

F(aj)

zj

wj,d bj

FIGURE 11.10 The structure of the multi-layer feedforward NN (specifically, the multi-layer perceptron) is shown in (a), and the perceptron (or node) is shown in (b).

Retrievals of Atmospheric Temperature and Moisture Profiles 1 E(w þ dw) E(w) þ rE(w)T dw þ dwT r2 E(w) dw 2

219 (11:13)

where rE(w) and r2E(w) are the gradient vector and the Hessian matrix of the cost function, respectively. Setting the derivative of Equation 11.13 to zero and solving for the weight update vector dw yields dw ¼ [r2 E(w)]1 rE(w)

(11:14)

Direct application of Equation 11.14 is difficult in practice, because computation of the Hessian matrix (and its inverse) is nontrivial, and usually needs to be repeated at each iteration. For sum-squared error cost functions, it can be shown that rE(w) ¼ JT e

(11:15)

r2 E(w) ¼ JT J þ S

(11:16)

where J is the Jacobian matrix that contains first derivatives of the network P Perrors with 2 respect to the weights and biases, e is a vector of network errors, and S ¼ p ¼ 1 ep r ep. The Jacobian matrix can be computed using a standard backpropagation technique [14] that is significantly more computationally efficient than direct calculation of the Hessian matrix [15]. However, an inversion of a square matrix with dimensions equal to the total number of weights and biases in the network is required. For the Gauss–Newton method, it is assumed that S is zero (a reasonable assumption only near the solution), and the updated Equation 11.14 becomes dw ¼ [JT J]1 Je

(11:17)

The Levenberg–Marquardt modification [16] to the Gauss–Newton method is dw ¼ [JT J þ mI]1 Je

(11:18)

As m varies between zero and 1, dw varies continuously between the Gauss–Newton step and the steepest descent. The Levenberg–Marquardt method is thus an example of a model-trust-region approach in which the model (in this case the linearized approximation of the error function) is trusted only within some region around the current search point [17]. The size of this region is governed by the value m. The use of multi-layer feedforward NNs, such as the multi-layer perceptron (MLP), to retrieve temperature profiles from hyperspectral radiance measurements has been addressed by several investigators (see Refs. [18,19], for example). NN retrieval of moisture profiles from hyperspectral data is relatively new [20], but follows the same methodology used to retrieve temperature.

11.4.2

The PPC–NN Algorithm

A first attempt to combine the properties of both NN estimators and PC transforms for the inversion of microwave radiometric data to retrieve atmospheric temperature and moisture profiles is reported in Ref. [21], and a more recent study with hyperspectral data is presented in Ref. [20]. A conceptually similar approach is taken in this work by combining the PPC compression technique described in Section 11.3.3 with the NN estimator discussed in the previous section. PPC compression offers substantial performance

Signal and Image Processing for Remote Sensing

220

advantages over traditional principal components analysis (PCA) and is the cornerstone of the present work. 11.4.2.1

Network Topology

All MLPs used in the PPC–NN algorithm are composed of one or two hidden layers of nonlinear (hyperbolic tangent) nodes and an output layer of linear nodes. For the temperature retrieval, 25 PPC coefficients are input to six NNs, each with a single hidden layer of 15 nodes. Separate NNs are used for different vertical regions of the atmosphere; a total of six networks are used to estimate the temperature profile at 65 points from the surface to 50 mbar. For the water vapor retrieval, 35 PPC coefficients are input to nine NNs, each with a single hidden layer of 25 nodes. The water vapor profile (mass mixing ratio) is estimated at 58 points from the surface to 75 mbar. These network parameters were determined largely through empirical analyses. Work is underway to dynamically optimize these parameters as the NN is trained. Separate training and testing data sets are used and are discussed in more detail in Section 11.4.2.2. 11.4.2.2

Network Training

The weights and biases were initialized using the Nguyen–Widrow method [22]. This method reduces the training time by initializing the weights so that each node is ‘‘active’’ (in the linear region of the activation function) over the input range of interest. The NN was trained using the Levenberg–Marquardt backpropagation algorithm discussed in Section 11.4.1. For each epoch, the m parameter was initialized to 0.001. If a successful step was taken (i.e., E(w þ dw) < E(w)), then m was decreased by a factor of ten. If the current step was unsuccessful, the value of m was increased by a factor of ten until a successful step could be found (or until m reached 1010). The network training was stopped when the error on a separate data set did not decrease for 10 consecutive epochs. The sensor noise was changed on each training epoch to desensitize the network to radiance measurement errors. 11.4.3

Error Analyses for Simulated Clear and Cloudy Atmospheres

AIRS and AMSU-A=B clear and cloudy radiances were simulated for an ensemble of approximately 10,000 profiles. These profiles were produced using a numerical weather prediction (NWP) model, and are substantially smoother (vertically) than the NOAA88b profile set that is used in the following sections. The profiles were separated into mutually exclusive sets for training and testing of the PPC–NN algorithm. The RMS errors for the LLSE and NN temperature retrievals are shown in Figure 11.11 for clear and cloudy atmospheres. The NN estimator significantly outperforms the LLSE in both cases. The sensitivity of the retrieval to instrument noise is examined by repeating the retrieval with instrument noise set to zero. The difference in retrieval errors (with and without noise) is shown in the first panel of Figure 11.12. The atmospheric contribution to the retrieval error (i.e., the noise-free retrieval error) is shown in the second panel of Figure 11.12. Finally, the differences (net minus LLSE) in error contributions for the two methods are shown in the third panel of Figure 11.12. It is noteworthy that the NN is a much better filter of instrument noise than is the LLSE. As a final test of sensitivity to instrument noise, the LLSE and NN retrievals were repeated while varying the instrument noise between 10% and 1000% of its nominal value. The resulting retrieval errors are shown in Figure 11.13. The NN retrieval is significantly less sensitive to instrument noise than is the LLSE retrieval.

Retrievals of Atmospheric Temperature and Moisture Profiles

221

15 Neural net (clear air) LLSE (clear air) Neural net (clouds) LLSE (clouds)

Altitude (km)

10

5

0 0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

RMS error (K) FIGURE 11.11 RMS temperature profile retrieval error for the NN and LLSE estimators. Results for clear air and clouds are shown. Surface emissivities were modeled as random variables with clipped-Gaussian PDFs; mean ¼ 0.975, standard deviation ¼ 0.025 (AIRS); mean ¼ 0.95, standard deviation ¼ 0.05 (AMSU).

15

Atmosphere contribution 15 15

10

10

10

5

5

5

Altitude (km)

Noise contribution

0

0 0

0.2 0.4 0.6 RMS error (K)

Neural net LLSE

Difference

0 0

0.5 1 RMS error (K)

Neural net LLSE

0 0.2 0.4 Net error LLSE error (K)

Noise Atmosphere

FIGURE 11.12 Contributions to temperature profile retrieval error due to measurement noise and atmospheric noise for the NN and LLSE estimators.

Signal and Image Processing for Remote Sensing

222 Absolute error

Normalized error

4

3.5

5 Neural net LLSE

4.5

Neural net LLSE

4

3

3.5 Retrieval error factor

Retrieval error (K)

2.5

2

1.5

1

2.5 2 1.5

0.5

0 10⫺1

3

1

100 Noise factor

101

0.5 10⫺1

100 Noise factor

101

FIGURE 11.13 Sensitivity of temperature profile retrieval to measurement noise. The mean RMS error over 15 km is shown as a function of the noise amplification factor.

11.4.4 Validation of the PPC–NN Algorithm with AIRS=AMSU Observations of Partially Cloudy Scenes Over Land and Ocean In this section, the performance of the PPC–NN algorithm is evaluated using cloud-cleared AIRS observations (not simulations, as were used in the previous section) and colocated ECMWF (European Center for Medium-range Weather Forecasting) forecast fields. The cloud clearing is performed using both AIRS and AMSU data. The PPC–NN retrieval performance is compared with that obtained using the AIRS Level-2 algorithm. Both ocean and land cases are considered, including elevated surface terrain, and retrievals at all sensor scan angles (out to +488) are derived. Finally, sensitivity analyses of PPC–NN retrieval performance are presented with respect to scan angle, orbit type (ascending or descending), cloud amount, and training set comprehensiveness. 11.4.4.1

Cloud Clearing of AIRS Radiances

The cloud-clearing approach discussed in Susskind et al. [23] was applied to the AIRS data by the AIRS Science Team. Version 3.x of the algorithm was used in this work (see Table 11.1 for a detailed listing of the software version numbers). The algorithm seeks to estimate a clear-column radiance (the radiance that would have been measured if the scene were cloud-free) from a number of adjacent cloud-impacted fields of view.

Retrievals of Atmospheric Temperature and Moisture Profiles

223

TABLE 11.1 AIRS Software Version Numbers for the Seven Days Used in the Match-Up Data Set 6 Sep 2002 25 Jan 2003 8 Jun 2003 21 Aug 2003 3 Sep 2003 12 Oct 2003 5 Dec 2003 Cloud clearing Level-2

11.4.4.2

3.7.0 3.0.8

3.7.0 3.0.8

3.7.0 3.0.8

3.1.9 3.0.8

3.1.9 3.0.8

3.1.9 3.0.8

3.7.0 3.0.10

The AIRS=AMSU=ECMWF Data Set

The performance of the PPC–NN algorithm was evaluated using 352,903 AIRS=AMSU observations and colocated ECMWF atmospheric fields collected on seven days throughout 2002 and 2003 (see Table 11.1). Various software version changes were made over the course of this work (see Table 11.1 for details), but these changes were primarily with regard to quality control and do not significantly affect the results presented here. However, the version 4.x release of the AIRS software, which was not available in time to be included in this work, should offer many enhancements over version 3.x, including improved cloud clearing, retrieval accuracies, quality control, and retrieval yield [24]. Reanalyses of the results presented in this section are therefore planned with the new AIRS software release. The 352,903 observations were randomly divided into a training set of 302,903 observations (206,061 of which were over ocean) and a separate validation set of 50,000 observations (40,000 of which were over ocean). The a priori RMS variation of the temperature and water vapor (mass mixing ratio) profiles contained in the validation set are shown in Figure 11.14. The observations in the validation set were matched with AIRS Level-2 retrievals obtained from the Earth Observing System (EOS) Data Gateway (EDG). As advised in the AIRS Version 3.0 L2 Data Release Documentation, only retrievals that met certain quality standards (specifically, RetQAFlag ¼ 0 for ocean and RetQAFlag ¼ 256 for land) were included in the analyses. There were 17,856 AIRS Level-2 retrievals (all within +408 latitude) that met this criterion. Re-analysis with AIRS Level-2 version 4.x software is planned, as the version 4.x products have been validated over both ocean and land at near-polar latitudes. To facilitate comparison with results published in the AIRS v3.0 Validation Report [25], layer error statistics are calculated as follows. First, layer averages are calculated in layers of approximately (but not exactly) 1-km width—the exact layer widths can be found in Appendix III in the AIRS v3.0 Validation Report. Second, weighted water vapor errors in each layer are calculated by dividing the RMS mass mixing ratio error by the RMS variation of the true mass mixing ratio (as opposed to dividing the mass mixing ratio error of each profile by the true mass mixing ratio for that profile and computing the RMS of the resulting ensemble). 11.4.4.3

AIRS=AMSU Channel Selection

Thirty-seven percent (888 of the 2378) of the AIRS channels were discarded for the analysis, as the radiance values for these channels frequently were flagged as invalid by the AIRS calibration software. A simulated AIRS brightness temperature spectrum is shown in Figure 11.15, which shows the original 2378 AIRS channels and the 1490 channels that were selected for use with the PPC–NN algorithm. All 15 AMSU-A channels were used. The algorithm automatically discounts channels that are excessively corrupted by sensor noise (for example, AMSU-A channel 7 on EOS Aqua) or other interfering

Signal and Image Processing for Remote Sensing

224

(b) a priori RMS MMR error

(a) a priori RMS temperature error 1

1

10

Pressure (mbar)

Pressure (mbar)

10

102

103 0

2

4

6 K

8

10

12

102

103

0

20

40 Percent

60

80

FIGURE 11.14 Temperature and water vapor profile statistics for the validation data set used in the analysis. See the text for details on how the statistics are computed at each layer.

Brightness temperature (K)

320 300

Availble AIRS channels Selected AIRS channels

280 260 240 220 200 500

1000

1500

2000

2500

Wavenumber (cm−1) FIGURE 11.15 A typical AIRS spectrum (simulated) is shown. 1490 out of 2378 AIRS channels were selected.

3000

Retrievals of Atmospheric Temperature and Moisture Profiles

225

signals (for example, the effects of nonlocal thermodynamic equilibrium) because the corruptive signals are largely uncorrelated with the geophysical parameters that are to be estimated. 11.4.4.4 PPC–NN Retrieval Enhancements for Variable Sensor Scan Angle and Surface Pressure When dealing with global AIRS=AMSU data, a variety of scan angles and surface pressures must be accommodated. Therefore, two additional inputs were added to the NNs discussed previously: (1) the secant of the scan angle and (2) the forecast surface pressure (in mbar) divided by 1013.25. The resulting temperature and water vapor profile estimates were reported on a variable pressure grid anchored by the forecast surface pressure. Because the number of inputs to the NNs increased, the number of hidden nodes in NNs used for temperature retrievals was increased from 15 to 20. For water vapor retrievals, the number of hidden nodes in the first hidden layer was maintained at 25, but a second hidden layer of 15 hidden nodes was added. 11.4.4.5 Retrieval Performance We now compare the retrieval performance of the PPC–NN, linear regression, and AIRS Level-2 methods. For both the ocean and land cases, the PPC–NN and linear regression retrievals were derived using the same training set, and the same validation set was used for all methods. The temperature profile retrieval performance over ocean for the linear regression retrieval, the PPC–NN retrieval, and the AIRS Level-2 retrieval is shown in Figure 11.16, and the water vapor retrieval performance is shown in Figure 11.17. The error statistics were calculated using the 13,156 (out of 40,000) AIRS Level-2 retrievals that converged successfully. A bias of approximately 1 K near 100 mbar was found between the AIRS Level-2 temperature retrievals and the ECMWF data (ECMWF was colder). This bias was removed prior to computation of the AIRS Level-2 retrieval error statistics, which are shown in Figure 11.16. The temperature profile retrieval performance over land for the linear regression retrieval, the PPC–NN retrieval, and the AIRS Level-2 retrieval is shown in Figure 11.18, and the water vapor retrieval performance is shown in Figure 11.19. The error statistics were calculated using the 4,700 (out of 10,000) AIRS Level-2 retrievals that converged successfully. There are several features in Figure 11.16 through Figure 11.19 that are worthy of note. First, for all retrieval methods, the performance over land is worse than that over ocean, as expected. The cloud-clearing problem is significantly more difficult over land, as variations in surface emissivity can be mistaken for cloud perturbations, thus resulting in improper radiance corrections. Second, the magnitude of the temperature profile error degradation for land versus ocean is larger for the PPC–NN algorithm than for the AIRS Level-2 algorithm. In fact, the temperature profile retrieval performance of the AIRS Level-2 algorithm is superior to that of the PPC–NN algorithm throughout most of the lower troposphere over land. Further analyses of this discrepancy suggest that the performance of the PPC–NN method over elevated terrain is suboptimal, and could be improved. This work is currently underway. 11.4.4.6 Retrieval Sensitivity to Cloud Amount A plot of the temperature retrieval error in the layer closest to the surface as a function of the cloud fraction retrieved by the AIRS Level-2 algorithm is shown in Figure 11.20.

Signal and Image Processing for Remote Sensing

226 101 Linear regression

AIRS v3.0 Level-2 algorithm

Pressure (mbar)

PPC–NN

102

103 0

0.5

1

1.5

RMS temperature error (K) FIGURE 11.16 Temperature retrieval performance of the PPC–NN, linear regression, and AIRS Level-2 methods over ocean. Statistics were calculated over 13,156 fields of regard.

Similar curves for the water vapor retrieval performance are shown in Figure 11.21. Both methods produce temperature and moisture retrievals with RMS errors near 1 K and 15%, respectively, even in cases with large cloud fractions. The figures show that the PPC–NN temperature and moisture retrievals are less sensitive than the AIRS Level-2 retrievals to cloud amount. Furthermore, it has been shown that the PPC–NN retrieval technique is relatively insensitive to sensor scan angle, orbit type, and training set comprehensiveness [26].

11.4.5

Discussion and Future Work

Although the PPC–NN performance results presented in the previous section are very encouraging, several caveats must be mentioned. The ECMWF fields used for ‘‘ground truth’’ contain errors, and the NN will tune to these errors as part of its training process. Therefore, the PPC–NN RMS errors shown in the previous section may be underestimated, and the AIRS Level-2 RMS errors may be overestimated, as the ECMWF data is not an accurate representation of the true state of the atmosphere. Therefore, the ‘‘true’’ spread between the performance of the PPC–NN and AIRS Level-2 algorithms is

Retrievals of Atmospheric Temperature and Moisture Profiles

227

100

Linear regression AIRS v3.0 Level-2 algorithm

Pressure (mbar)

200

PPC–NN

300

400 500 600 700 800 900 1000 0

10

20

30

40

50

Percent RMS mass mixing ratio error (RMS error/RMS true * 100) FIGURE 11.17 Water vapor (mass mixing ratio) retrieval performance of the PPC–NN, linear regression, and AIRS Level-2 methods over ocean. Statistics were calculated over 13,156 fields of regard.

almost certainly smaller than that shown here. Work is currently underway to test the performance of both the PPC–NN and AIRS Level-2 algorithms with additional groundtruth data, including radiosonde data, and ground- and aircraft-based measurements. It should be noted that the PPC–NN algorithm as implemented in this work is currently not a stand-alone system, as both AIRS cloud-cleared radiances and quality flags produced by the AIRS Level-2 algorithm are required. Future work is planned to adapt the PPC–NN algorithm for use directly on cloudy AIRS=AMSU radiances and to produce quality assessments of the retrieved products. Finally, assimilation of PPC–NN-derived atmospheric parameters into NWP models is planned, and the resulting impact on forecast accuracy will be an excellent indicator of retrieval quality.

11.5

Summary

The PPC–NN temperature and moisture profile retrieval technique combines a linear radiance compression operator with an NN estimator. The PPC transform was shown to be well suited for this application because information correlated to the geophysical

Signal and Image Processing for Remote Sensing

228 1

10

Linear regression AIRS v3.0 Level-2 algorithm

Pressure (mbar)

PPC–NN

102

103

0

0.5

1

1.5

2

2.5

RMS temperature error (K) FIGURE 11.18 Temperature retrieval performance of the PPC–NN, linear regression, and AIRS Level-2 methods over land. Statistics were calculated over 4,700 fields of regard.

quantity of interest is optimally represented with only a few dozen components. This substantial amount of radiance compression (approximately a factor of 100) allows relatively small NNs to be used thereby improving both the stability and computational efficiency of the algorithm. Test cases with observed partially cloudy AIRS=AMSU data demonstrate that the PPC–NN temperature and moisture retrievals yield accuracies commensurate with those of physical methods at a substantially reduced computational burden. Retrieval accuracies (defined as agreement with ECMWF fields) near 1 K for temperature and 25% for water vapor mass mixing ratio in layers of approximately 1-km thickness were obtained using the PPC–NN retrieval method with AIRS=AMSU data in partially cloudy areas. PPC–NN retrievals with partially cloudy AIRS=AMSU data over land were also performed. The PPC–NN retrieval technique is relatively insensitive to cloud amount, sensor scan angle, orbit type, and training set comprehensiveness. These results further suggest the AIRS Level-2 algorithm that produced the cloud-cleared radiances and quality flags used by the PPC–NN retrieval is performing well. The high level of performance achieved by the PPC–NN algorithm suggests it would be a suitable candidate for the retrieval of geophysical parameters other than temperature and moisture from high resolution spectral data. Potential applications include the retrieval of ozone profiles and trace gas amounts. Future work will involve further evaluation of the algorithm with simulated and observed partially cloudy data, including global radiosonde data and ground- and aircraft-based observations.

Retrievals of Atmospheric Temperature and Moisture Profiles

229

100

Linear regression AIRS v3.0 Level-2 algorithm

200

Pressure (mbar)

PPC–NN

300

400 500 600 700 800 900 1000

0

10

20

30

40

50

Percent RMS mass mixing ratio error (RMS error/RMS true * 100)

FIGURE 11.19 Water vapor (mass mixing ratio) retrieval performance of the PPC–NN, linear regression, and AIRS Level-2 methods over land. Statistics were calculated over 4,700 fields of regard.

RMS temperature retrieval error (K) in lowest layer

1.5 AIRS Level-2 PPC–NN

1.4 1.3 1.2 1.1 1 0.9 0.8 0.7 0.6 0.5 0

10

20

30

40 50 60 Cloud fraction (%)

70

80

90

100

FIGURE 11.20 Cumulative RMS temperature error in the layer closest to the surface. Pixels were ranked in order of increasing cloudiness according to the retrieved cloud fraction from the AIRS Level-2 algorithm. No retrievals were attempted by the AIRS Level-2 algorithm if the retrieved cloud fraction exceeded 80%.

Signal and Image Processing for Remote Sensing

230

RMS humidity retrieval error (%) in lowest layer

20 AIRS Level-2 PPC–NN

15

10

5

0

10

20

30

40 50 60 Cloud fraction (%)

70

80

90

100

FIGURE 11.21 Cumulative RMS water vapor error in the layer closest to the surface. Pixels were ranked in order of increasing cloudiness, according to the retrieved cloud fraction from the AIRS Level-2 algorithm. No retrievals were attempted by the AIRS Level-2 algorithm if the retrieved cloud fraction exceeded 80%.

Acknowledgments The author would like to thank NOAA NESDIS, Office of Systems Development, and the National Aeronautics and Space Administration for financial support of this work. The colocated AIRS=AMSU=ECMWF data sets were provided by the AIRS Science Team, with assistance from M. Goldberg (NOAA NESDIS, Office of Research Applications). The author would also like to thank L. Zhou for help in obtaining and interpreting the AIRS=AMSU=ECMWF data sets, and D. Staelin and P. Rosenkranz for many useful discussions on all facets of this work. This work was supported by the National Oceanic and Atmospheric administration under Air Force Contract F19628-00-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the author and are not necessarily endorsed by the United States Government.

References 1. H. Hotelling, The most predictable criterion, J. Educ. Psychol., vol. 26, pp. 139–142, 1935. 2. J. Escobar-Munoz, A. Che´din, F. Cheruy, and N. Scott, Re´seaux de neurons multi-couches pour la restitution de variables thermodynamiques atmosphe´riques a` l’aide de sondeurs verticaux satellitaires, Comptes-Rendus de L’Acade´mie des Sciences; Se´rie II, vol. 317, no. 7, pp. 911–918, 1993.

Retrievals of Atmospheric Temperature and Moisture Profiles

231

3. D.H. Staelin, Passive remote sensing at microwave wavelengths, Proc. IEEE, vol. 57, no. 4, pp. 427–439, 1969. 4. M.A. Janssen, Atmospheric Remote Sensing by Microwave Radiometry, New York, John Wiley & Sons, Inc., 1993. 5. C.D. Rodgers, Retrieval of atmospheric temperature and composition from remote measurements of thermal radiation, J. Geophys. Res., vol. 41, no. 7, pp. 609–624, 1976. 6. M.D. Goldberg, Y. Qu, L.M. McMillin, W. Wolff, L. Zhou, and M. Divakarla, AIRS nearreal-time products and algorithms in support of operational numerical weather prediction, IEEE Trans. Geosci. Rem. Sens., vol. 41, no. 2, pp. 379–389, 2003. 7. H. Huang and P. Antonelli, Application of principal component analysis to high-resolution infrared measurement compression and retrieval, J. Appl. Meteorol., vol. 40, pp. 365–388, Mar. 2001. 8. F. Aires, W.B. Rossow, N.A. Scott, and A. Che´din, Remote sensing from the infrared atmospheric sounding interferometer instrument: 1. compression, denoising, first-guess retrieval inversion algorithms, J. Geophys. Res., vol. 107, Nov. 2002. 9. W.J. Blackwell and D.H. Staelin, Cloud flagging and clearing using high-resolution infrared and microwave sounding data, IEEE Int. Geosci. Rem. Sens. Symp. Proc., June 2002. 10. J.B. Lee, A.S. Woodyatt, and M. Berman, Enhancement of high spectral resolution remotesensing data by a noise-adjusted principal components transform, IEEE Trans. Geosci. Rem. Sens., vol. 28, pp. 295–304, May 1990. 11. W.J. Blackwell, Retrieval of cloud-cleared atmospheric temperature profiles from hyperspectral infrared and microwave observations, Ph.D. dissertation, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, June 2002. 12. K.M. Hornik, M. Stinchcombe, and H. White, Multilayer feedforward networks are universal approximators, Neural Netw., vol. 4, no. 5, pp. 359–366, 1989. 13. S. Haykin, Neural Networks: A Comprehensive Foundation, New York, Macmillan College Publishing Company, 1994. 14. D.E. Rumelhart, G. Hinton, and R. Williams, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations., ser. D.E. Rumelhart and J.L. McClelland, Eds., Cambridge, MA, MIT Press, 1986. 15. M.T. Hagan and M.B. Menhaj, Training feedforward networks with the Marquardt algorithm, IEEE Trans. Neural Netw., vol. 5, pp. 989–993, Nov. 1994. 16. P.E. Gill, W. Murray, and M.H. Wright, The Levenberg–Marquardt method, in Practical Optimization, London, Academic Press, 1981. 17. C.M. Bishop, Neural Networks for Pattern Recognition, Oxford, Oxford University Press, 1995. 18. H.E. Motteler, L.L. Strow, L. McMillin, and J.A. Gualtieri, Comparison of neural networks and regression based methods for temperature retrievals, Appl. Opt., vol. 34, no. 24, pp. 5390–5397, 1995. 19. F. Aires, A. Che´din, N.A. Scott, and W.B. Rossow, A regularized neural net approach for retrieval of atmospheric and surface temperatures with the IASI instrument, J. Appl. Meteorol., vol. 41, pp. 144–159, Feb. 2002. 20. F. Aires, W.B. Rossow, N.A. Scott, and A. Che´din, Remote sensing from the infrared atmospheric sounding interferometer instrument: 2. simultaneous retrieval of temperature, water vapor, and ozone atmospheric profiles, J. Geophys. Res., vol. 107, Nov. 2002. 21. F.D. Frate and G. Schiavon, A combined natural orthogonal functions=neural network technique for the radiometric estimation of atmospheric profiles, Radio Sci., vol. 33, no. 2, pp. 405–410, 1998. 22. D. Nguyen and B. Widrow, Improving the learning speed of two-layer neural networks by choosing initial values of the adaptive weights, IJCNN, vol. 3, pp. 21–26, 1990. 23. J. Susskind, C.D. Barnet, and J.M. Blaisdell, Retrieval of atmospheric and surface parameters from AIRS=AMSU=HSB data in the presence of clouds, IEEE Trans. Geosci. Rem. Sens., vol. 41, no. 2, pp. 390–409, 2003. 24. J. Susskind et al., Accuracy of geophysical parameters derived from AIRS=AMSU as a function of fractional cloud cover, J. Geophys. Res., III; May 2006.

232

Signal and Image Processing for Remote Sensing

25. H.H. Aumann et al., Validation of AIRS=AMSU=HSB core products for data release version 3.0, NASA JPL Tech. Rep. D-26538, Aug 2003. 26. W.J. Blackwell, A neural-network technique for the retrieval of atmospheric temperature and moisture profiles from high spectral resolution sounding data, IEEE Trans. Geosci. Rem. Sens., vol. 43, no. 11, pp. 2535–2546, 2005.

12 Satellite-Based Precipitation Retrieval Using Neural Networks, Principal Component Analysis, and Image Sharpening*

Frederick W. Chen

CONTENTS 12.1 Introduction ..................................................................................................................... 233 12.2 Physical Basis of Passive Microwave Precipitation Sensing ................................... 234 12.3 Description of AMSU-A/B and AMSU/HSB.............................................................. 240 12.4 Signal Processing ............................................................................................................ 242 12.4.1 Regional Laplacian Filtering ........................................................................... 242 12.4.2 Principal Component Analysis....................................................................... 243 12.4.2.1 Basic PCA.......................................................................................... 244 12.4.2.2 Constrained PCA ............................................................................. 244 12.4.3 Data Fusion ........................................................................................................ 245 12.4.4 Neural Nets........................................................................................................ 245 12.5 The Chen–Staelin Algorithm ........................................................................................ 247 12.5.1 Limb-Correction of Temperature Profile Channels .................................... 249 12.5.2 Detection of Precipitation ................................................................................ 251 12.5.3 Cloud-Clearing by Regional Laplacian Interpolation................................. 252 12.5.4 Image Sharpening ............................................................................................. 252 12.5.5 Temperature and Water Vapor Profile Principal Components ................ 254 12.5.6 The Neural Net.................................................................................................. 254 12.6 Retrieval Performance Evaluation ............................................................................... 255 12.6.1 Image Comparisons of NEXRAD and AMSU-A/B Retrievals.................. 255 12.6.2 Numerical Comparisons of NEXRAD and AMSU-A/B Retrievals .......... 256 12.6.3 Global Retrievals of Rain and Snow.............................................................. 259 12.7 Conclusions...................................................................................................................... 260 Acknowledgments ..................................................................................................................... 261 References ................................................................................................................................... 261

12.1

Introduction

Global estimation of precipitation is important to studies in areas such as atmospheric dynamics, hydrology, climatology and meteorology. Improvements in methods for *The material in this chapter is taken from Refs. [66] and [67].

233

Signal and Image Processing for Remote Sensing

234

satellite-based estimation of precipitation can lead to improvements in weather forecasting, climate studies, and climate models. Precipitation presents many challenges because of its complicated physics and statistics. Existing models of clouds do not adequately account for all of the possible variations in clouds and precipitation that can be encountered. Instead of a physics-based approach, Chen and Staelin developed a method using a statistics-based approach. This method was developed for the Advanced Microwave Sounding Unit (AMSU) instruments AMSU-A/B on the NOAA-15, NOAA-16, and NOAA-17 satellites. The development process made use of the fact that although satellite-based passive microwave data cannot completely characterize the observed clouds and precipitation, the data can still yield useful information relevant to precipitation. This chapter discusses the methods used by Chen and Staelin to extract from the data information related to atmospheric state variables that are highly correlated with precipitation. Principal component analysis for signal separation and data compression, data fusion for resolution sharpening, neural nets, and regional filtering are among the signal and image processing methods used. AMSU-A/B and the corresponding instruments AMSU/HSB (Humidity Sounder for Brazil) collect data near 23.8, 31.4, 54, 89, and 183 GHz. These frequency bands provide useful information about precipitation. In this chapter, the precipitation estimation algorithm developed by Chen and Staelin is discussed with an emphasis on the signal processing employed to sense important parameters about precipitation. Section 12.2 explains why it is possible to use the data on AMSU-A/B to estimate precipitation. Section 12.3 provides a more detailed description of AMSU-A/B and AMSU/HSB. Section 12.4 describes the Chen– Staelin algorithm. Section 12.7 provides concluding remarks.

12 .2

Phys ical Bas is of Pas sive Microwave Precipitation Sensing

Matter radiates thermal energy depending on its physical temperature and characteristic properties. When a spaceborne radiometer observes a location on the Earth, the amount of energy it receives depends on contributions from the various atmospheric and topographical constituents within its field of view (Figure 12.1). One useful quantity describing the amount of thermal radiation emitted by a body is spectral brightness. Spectral brightness is a measure of how much energy a body radiates at a specified frequency per unit receiving area, per transmitting solid angle, and per unit frequency. The spectral brightness of a blackbody (Wster1m2Hz1) is a function of its physical temperature T (K) and frequency f (Hz) and is given by the following formula B(f ,T) ¼

2hf 3 c2 (ehf =kT

1)

(12:1)

where h is Planck’s constant (Js), c is the speed of light (m/s), and k is Boltzmann’s constant (J/K). For this chapter, f never exceeds 200 GHz, and T never falls below 100 K, so hf/ kT < 0.1. Then, the Taylor series expansion for exponential functions is used to simplify Equation 12.1 " # hf 1 hf 2 hf =kT e 1¼ 1þ þ 1 (12:2) þ kT 2! kT

hf kT

(12:3)

Satellite-Based Precipitation Retrieval Using Neural Networks

235

z Space Td

Tcosmic

f H

Tc

Ta q

Atmosphere

ka(z), T(z)

Tb r(q), T(0)

FIGURE 12.1 The major components of the radiative transfer equation (note: u 6¼ w since the surface of the Earth is spherical).

B( f ,T)¼

2kT l2

(12:4)

where l is the wavelength (m) associated with the frequency f. For blackbodies, given an observation of spectral brightness, one can calculate the physical temperature of a blackbody as follows: T¼

B( f , t)l2 2k

(12:5)

Unlike blackbodies, gray bodies reflect some of the energy incident on them, so the intrinsic spectral brightness of a gray body is not equal to that of a blackbody. For a gray body, B( f ,T)¼«

2kT l2

(12:6)

where « is the emissivity of the gray body. This equation is rewritten as follows: B( f ,T)¼

2k «T l2

(12:7)

The quantity «T is called the brightness temperature of an unilluminated gray body; that is, the temperature of a blackbody radiating the same brightness. Another useful property of a gray body is its reflectivity r, the fraction of incident energy that is reflected. A body in thermal equilibrium emits the same amount of energy that it absorbs. Therefore, r þ « ¼ 1. For a blackbody, « ¼ 1 and r ¼ 0. In an atmosphere without hydrometeors, the brightness temperature observed by an Earth-observing spaceborne radiometer can be divided into four components (Figure 12.1): 1. Ta, the brightness temperature due to radiation emitted by atmospheric gases that is not reflected off the surface

Signal and Image Processing for Remote Sensing

236

2. Tb, the brightness temperature due to radiation emitted by the surface 3. Tc, the brightness temperature due to radiation that is emitted by atmospheric gases that is reflected off the surface 4. Td, the brightness temperature due to cosmic background radiation that passes through the atmosphere twice and is reflected off the surface Thermal radiation can be attenuated through absorption or reflected off the surface before being received by a radiometer. The observed brightness temperature is, therefore, a function of several variables such as the atmospheric temperature profile, water vapor profile, the emissivity of the surface (as a function of satellite zenith angle), and the absorption coefficients of atmospheric gases (as a function of altitude z). Given the atmospheric absorption coefficients ka(z) (in Np per unit length), the reflectivity of the surface r(u), the altitude of the satellite H, the cosmic background temperature Tcosmic (in K), and the satellite zenith angle u, the brightness temperature components can be computed as follows (assuming specular surface reflection and the absence of hydrometeors): ðH

Ta ¼ sec u T(z0 )ka (z0 )et(z ,H) sec u dz0 0

(12:8)

z0

Tb ¼½1 r(u)T(z0 )et(z0 ,H) sec u t(z0 ,H) sec u

Tc ¼r(u) sec u e

ðH

(12:9)

T(z0 )ka (z0 )et(z0 ,z ) sec u dz0 0

(12:10)

z0

Td ¼r(u)Tcosmic e2t(z0 ,H) sec u

(12:11)

Then, the brightness temperature TB measured by the radiometer is the sum of Ta, Tb, Tc, and Td. ðH 0 TB ¼ sec u T(z0 )ka (z0 )et(z ,H) sec u dz0 þ½1 r(u)T(z0 )et(z0 ,H) sec u z0 t(z0 ,H) sec u

þr(u) sec u e

ðH

T(z0 )ka (z0 )et(z0 ,z ) sec u dz0 þr(u)Tcosmic e2t(z0 ,H) sec u 0

(12:12)

z0

where z0 is the altitude of the surface and t(z1,z2) is the integrated atmospheric absorption of the atmosphere between altitudes z1 and z2. zð2 (12:13) t(z1 ,z2 )¼ ka (z) dz z1

Equation 12.12 can be rewritten as follows (assuming that the contribution of thermal radiation from altitudes above H to Tc is negligible):

TB

ðH z0

T(z0 )W(z0 ) dz0 þ½1 r(u)T(z0 )et(z0 ,H) sec u þr(u)Tcosmic e2t(z0 ,H) sec u

(12:14)

Satellite-Based Precipitation Retrieval Using Neural Networks

103

1976 Standard Atmosphere Water vapor removed

54-GHz O2 band

118-GHz 183-GHz O2 band H2O band

2

Zenith opacity (dB)

237

10

101

100

10

425-GHz O2 band

−1

50

100

150

200 250 300 Frequency (GHz)

350

400

450

500

FIGURE 12.2 Zenith opacity for the microwave spectrum.

where W (z)¼ sec u ka (z0 )et(z,H) sec u þr(u) sec u et(z0 ,H) sec u ka (z0 )et(z0 ,z) sec u

(12:15)

W(z) is called the weighting function or Jacobian. Equation 12.12 shows that the relationship between the thermal energy radiated and the physical temperature of a body is linear; that is, the brightness temperature can be expressed as a linear integral of physical temperatures in the field of view, where these radiated signals are uncorrelated and superimposed [1]. The previous section shows that the brightness temperatures seen by a satellite-borne radiometer depend on many variables in a highly complex and nonlinear fashion, so retrievals of temperature and water vapor profiles by direct inversion would be very difficult. However, the physics of the atmosphere still allows the extraction of useful information about the atmosphere from microwave frequency bands. Two of the most important determinants of precipitation rate are the temperature and water vapor profiles. The presence of oxygen and water vapor resonance frequencies in the microwave spectrum and the presence of oxygen and water vapor in the atmosphere result in frequency bands that are sensitive primarily to a specific range of altitudes. Figure 12.2 shows the zenith opacity as a function of frequency for the range from 10 to 500 GHz for a ground-based zenith-observing radiometer. There are absorption spikes around oxygen resonance frequencies such as those in the neighborhood of 54, 118.75, and 424.76 GHz and at water vapor resonance frequencies such as 183.31 GHz. A satelliteborne radiometer observing at these frequencies would sense only the highest layers of the atmosphere. On the other hand, a satellite-borne radiometer observing at frequencies away from the spikes would be sensitive to the surface. By observing at frequency bands that are near the resonance frequencies, but still on the sides of the spikes, one can capture information on conditions at specific layers of the atmosphere. The AMSU focuses primarily on the 54-GHz oxygen band and the 183-GHz water vapor band. Figure 12.3 shows the weighting functions of the channels on AMSU. Opaque microwave channels were used with great success to retrieve atmospheric conditions. Rosenkranz used AMSU-A and AMSU-B data from NOAA satellites to estimate temperature and water vapor profiles [2,3]. Shi used AMSU-A to estimate temperature

Signal and Image Processing for Remote Sensing

238 AMSU-A weighting functions

AMSU-B weighting functions

60

50

10−2

Channel

Altitude (km)

Water vapor burden (g/cm2)

14

40

13 12

30

11 20

10 9

0 0

10

183 ± 3 GHz

100

8 10

183 ± 1 GHz −1

183 ± 7 GHz

7 6 3 0.2

0.4 dT/d(In P)

5 4 0.6

150 GHz 89 GHz 0 0.1 0.2 0.3 0.4 dT/d(In(water vapor burden))

FIGURE 12.3 Weighting functions of AMSU-A and AMSU-B channels.

profiles [4]. Blackwell et al. used the 54-GHz and 118-GHz bands aboard the National PolarOrbiting Environmental Satellite System (NPOESS) Aircraft Sounder Testbed-Microwave (NAST-M) [5]. Leslie et al. added the 183-GHz and 425-GHz bands to NAST-M [6,7,8]. Clouds and precipitation result from humid air that rises, cools, and condenses. Precipitation typically occurs in two forms: convective and stratiform. Stratiform precipitation occurs when one air mass slides under or over another, as in a cold or warm front, causing the upper air mass to cool and the water vapor within it to condense. Such precipitation is spread out. Convective precipitation is initiated by instabilities in the atmosphere caused by cold, dense air supported by warm, humid, and less dense air. Such instabilities result in the warm, humid air escaping upward and cooling. Water vapor in this ascending air mass condenses and releases latent heat. The latent heat warms the surrounding air, which pushes the air mass and the water and ice particles that have formed further upward. This cycle continues until the original warm humid air mass has cooled to the temperature of the surrounding cooler air [9,10]. The tops of convective clouds spew forth ice particles and can reach more than 10 km above sea level [11]. Convective precipitation often occurs within stratiform precipitation. Several factors affect the precipitation rate. Higher degrees of instability caused by large vertical temperature gradients force ice particles higher in the atmosphere, causing such particles to pick up more moisture and grow. Precipitation amounts are limited by the water vapor available in the atmosphere; therefore, higher concentrations of water vapor result in higher precipitation rates. Warmer surface air contributes to higher precipitation rates because warmer air holds more water vapor. Channels in the 54-GHz and 183-GHz bands provide important clues about factors such as cloud-top altitude, temperature profile, water vapor profile, and particle size distribution. Because each channel is sensitive to a specific layer of the atmosphere

Satellite-Based Precipitation Retrieval Using Neural Networks

239

(Figure 12.3), it is possible to extract information about the cloud-top altitude. Precipitating clouds exhibit perturbations in channels whose weighting functions have significant values in the range of altitudes occupied by the cloud. For example, one would not expect to detect low-lying clouds below 3 km in the AMSU-A channel 14 because the weighting function of channel 14 peaks at 40 km, far above the tops of nearly all precipitating clouds. The 54-GHz and 183-GHz bands together provide information about particle size distribution through their sensitivities to different ranges of ice particle diameters. The scattering of electromagnetic waves by spherical ice particles is described by Mie scattering coefficients. For a single spherical particle with radius r, given the power scattered by the particle Ps and the power density of the incident plane wave Si, the scattering crosssectional area Qs of the particle is defined as follows: Qs ¼

Ps Si

(12:16)

Then, the Mie scattering coefficient is defined as the ratio of Qs of the particle to the geometric cross-sectional area of the particle js ¼

Qs pr2

(12:17)

where r is the radius of the particle. Figure 12.4 shows the Mie scattering coefficients for fresh-water ice spheres at 54 GHz and 183.31 GHz as a function of diameter. The permittivities of ice were calculated using formulas developed by Hufford [12], and the Mie scattering coefficients were calculated using an iterative computational procedure developed by Deirmendjian [1,13]. At 183.31 GHz, the diameter of a particle can increase to about 0.7 mm before its diameter can no longer be uniquely determined by the Mie scattering coefficient curve. At 54 GHz, this limit is about 2.4 mm. For diameters less than 0.7 mm, js for 54 GHz is smaller than that for 183.31 GHz by a factor of at least

Mie scattering efficiency

101

100

10−1

183.31 GHz

54 GHz 10−2

10−3 −4 10

Temperature = 218 K (−55⬚C) 10−3 Drop size diameter (m)

FIGURE 12.4 Mie scattering coefficients for 54 GHz and 183.31 GHz at a temperature of 558C.

10−2

Signal and Image Processing for Remote Sensing

240

100. These differences make it possible to use both bands together to extract information about particle size distributions. The Mie scattering coefficients in Figure 12.4 were calculated for a temperature of 558C (218 K), near the temperature at an altitude of 10 km for the 1962 U.S. Standard Atmosphere [1]. Corresponding values for 108C (263 K) were also calculated, but did not differ from those for 558C by more than 7.2%. In addition to particle size distribution, the 54-GHz and 183-GHz bands can also provide information about particle abundance. While the particle size distribution can be sensed from a comparison of the Mie scattering efficiencies in both bands, particle abundance can be sensed from absolute scattering over volumes. The cloud-top altitude is another important variable in precipitation. This is correlated with particle size density because only higher updraft velocities are able to support larger particles and reach higher altitudes. The sensitivities of the 54-GHz and 183-GHz channels to specific layers of the atmosphere suggest that these channels are able to provide information about cloud-altitude. Spina et al. used data from the opaque 118-GHz band to estimate cloud-top altitudes [14]. Gasiewski showed that the 54-GHz and 118-GHz bands together could be used to estimate cell-top altitude and hydrometeor density (in terms of mass per volume) [15,16]. Window channels contribute information about precipitation through their sensitivity to the warm emission signatures of precipitating clouds against a sea background and their sensitivity to scattering. In addition to opaque channels, AMSU also includes window channels at 23.8 GHz, 31.4 GHz, 89.0 GHz, and 150 GHz. Weng et al. and Grody et al. have developed a precipitation-retrieval algorithm for AMSU that relies primarily on these channels [17,18]. Some of the recent passive microwave instruments that have focused on windows channels and have been used to study precipitation include the following: 1. The Special Sensor Microwave Imager (SSM/I) for the Defense Meteorological Satellite Program (DMSP) [19–21] 2. The Advanced Microwave Sounding Radiometer (AMSR-E) for the Earth Observing System aboard the NASA Aqua satellite [22,23] 3. The Tropical Rainfall Measurement Mission (TRMM) microwave imager (TMI) aboard the TRMM satellite [24,25] One weakness of window channels is that they tend to be sensitive to the surface. Over land, surface signatures can obscure the emission signatures of precipitation. Also, window-channel-based precipitation-rate retrieval algorithms tend to use one method over ocean and another over land [26,27]. The opaque channels aboard AMSU enable the development of an algorithm that uses the same method over both land and sea.

12.3

Description of AMSU-A/B and AMSU/HSB

The AMSU has been aboard the NOAA-15, NOAA-16, and NOAA-17 satellites launched in May 1998, September 2000, and June 2002, respectively. They are each equipped with the instruments AMSU-A and AMSU-B. AMSU-A has 15 channels: one each at 23.8 GHz, 31.4 GHz, and 89.0 GHz, and 12 channels in the 54-GHz oxygen absorption band (Table 12.1). AMSU-B has 5 channels at the following frequencies: 89.0 GHz, 150 GHz, 183.31 + 1 GHz, 183.31 + 3 GHz, and 183.31 + 7 GHz (Table 12.2). AMSU-A and AMSU-B measure

Satellite-Based Precipitation Retrieval Using Neural Networks

241

TABLE 12.1 AMSU-A Channel Frequencies Channel 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Center Frequencies (MHz) 23,800 + 72.5 31,400 + 50 50,300 + 50 52,800 + 105 53,596 + 115 54,400 + 105 54,940 + 105 55,500 + 87.5 57,290.344 + 87.5 57,290.344 + 217 57,290.344 + 322.2 + 48 57,290.344 + 322.2 + 22 57,290.344 + 322.2 + 10 57,290.344 + 322.2 + 4.5 89,000 + 900

Bandwidth (MHz) 2125 280 280 2190 2168 2190 2190 2155 2155 277 435 415 48 43 21000

brightness temperatures at 50 and 15 km nominal resolutions at nadir, respectively. AMSUA has a 3.338-diameter 3-dB beamwidth and observes at 308 angles spaced at 3.338 intervals up to 48.338 from nadir every 8.00 sec. AMSU-B has a 1.18-diameter beamwidth and observes at 908 angles spaced at 1.18 intervals up to 48.958 from nadir every 2.67 sec (Figure 12.5b) [28–30]. AMSU covers a swath width of 2200 km. NOAA-15, NOAA-16, and NOAA-17 are sun-synchronous polar-orbiting satellites with equatorial crossing times of about 7 A.M./P.M., 2 A.M./P.M., and 10 A.M./P.M., respectively, so together they observe most locations approximately six times a day (Figure 12.6). The ascending local equatorial crossing times are 7 P.M., 2 P.M. 10 P.M. for NOAA–16 and NOAA–17, respectively. The 15-km, 89.0-GHz channel on AMSU-B was not used in this research in order that the algorithm developed for AMSU–A/B could also be used with AMSU/HSB with minimal adjustment. We also use data from the NASA Aqua satellite that was launched in May 2002. It is equipped with AMSU and is identical to AMSU-A aboard the NOAA satellites, and the HSB, which is identical to AMSU-B, but without the 89.0-GHz channel [31]. The nominal resolutions of AMSU and HSB are 40.5 and 13.5 km, respectively. Aqua has an equatorial crossing time of about 1:30 A.M./P.M. (Figure 12.6) [31–33]. The scan pattern of AMSU/HSB is slightly different from that of AMSU-A/B on the NOAA satellites in that the path traced during one AMSU scan is more parallel to that traced by a nearly coincident HSB scan (Figure 12.5). TABLE 12.2 AMSU-B Channel Frequencies Channel 1 2 3 4

Center Frequencies (GHz) 150 + 0.9 183.31 + 1 183.31 + 3 183.31 + 7

Bandwidth (GHz) 21 20.5 21 22

Signal and Image Processing for Remote Sensing

242

24⬚N

16⬚E

18⬚E

20⬚E

22⬚E

24⬚E

26⬚E

28⬚E

30⬚E

32⬚E

34⬚E

23⬚N 22⬚N 21⬚N 20⬚N 19⬚N

(a)

Aqua AMSU/HSB, Southbound 178⬚W 176⬚W 174⬚W 172⬚W 170⬚W 168⬚W 166⬚W 164⬚W 162⬚W 160⬚W 158⬚W 156⬚W 27⬚N 26⬚N 25⬚N 24⬚N 23⬚N 22⬚N 21⬚N

(b)

Aqua AMSU/HSB, Northbound

FIGURE 12.5 Scan patterns of (a) Aqua AMSU/HSB and (b) AMSU-A/B on NOAA-15, NOAA-16, and NOAA-17. AMSU and AMSU-A spots are labeled with þ’s and HSB and AMSU-B spots with dots.

12.4

Signal Processing

The preceding section provides an overview of the types of information available in data from AMSU-A/B. While developing an algorithm for estimating precipitation, it is important to process the data in a way that makes the information relevant to precipitation stand out as much as possible. This section provides an overview of the signal-processing methods used in the Chen and Staelin algorithm.

12.4.1

Regional Laplacian Filtering

Laplacian filtering is useful for clearing the effects of clouds over regions that are identified as cloudy. This enables computation of not only cloud-cleared brightness temperatures but also the perturbation due to precipitation. Laplacian filtering gives the Chen and Staelin algorithm a spatial filtering component not found in other algorithms that process data in a manner that treats each pixel independently of any other pixel. Laplacian filtering of a rectangular field F is done by first determining the region over which filtering is to be done. Using this region, a set of boundary conditions is determined. Then, using these boundary conditions, values for pixels in the region of interest are computed so that the discrete Laplace’s equation is satisfied. In a continuous twodimensional (2D) domain, Laplace’s equation is as follows: r2 F¼0

(12:18)

Satellite-Based Precipitation Retrieval Using Neural Networks

243

90⬚N 75⬚N 60⬚N 45⬚N

NOAA-15 7 A.M

30⬚N

NOAA-17

15⬚N 90⬚W

Aqua

10 A.M 60⬚W

0⬚

−16

90 k

30⬚W 15⬚S −2200

km

30⬚S

1:30 P.M

m

2 P.M

0⬚ 30⬚E

45⬚S

60⬚E NOAA-16

60⬚S

90⬚E

75⬚S 90⬚S FIGURE 12.6 Orbital patterns of the NOAA-15, NOAA-16, NOAA-17, and Aqua satellites.

12.4.2

Principal Component Analysis

Signal separation is an important concept in this chapter. Although precipitation is very complex, nonstationary, and sporadic, one can process the data in a way that adequately separates the different degrees of freedom that affect precipitation. Principal component analysis (PCA) is a linear method for reducing the dimensionality of a data set of interrelated variables. PCA transforms the data into a set of uncorrelated random variables that capture all of the variance of the original data set and assign as much variance as possible to the fewest number of variables. PCA is also known by other names such as singular value decomposition (SVD) and Karhunen–Loe`ve transform (KLT). PCA is useful for data compression. This feature can be critical in problems related to the compression of satellite data where a large amount of data must be downloaded from a satellite using a communications link with limited bandwidth and in situations where computational resources are limited. Blackwell used projected principal component transform, a variant of PCA, to compress data for use in estimating atmospheric temperature profiles. This reduced the number of inputs and, as a result, simplified the neural net, created a more stable neural net, and reduced the training time [34,35]. CabreraMercader used noise-adjusted principal components (NAPC) to compress simulated NASA Atmospheric Infrared Sounder (AIRS) data [36].

Signal and Image Processing for Remote Sensing

244

PCA is also used to filter out noise in data. Several different versions or extensions of PCA have been used to eliminate various types of noise. A variant of PCA has been used to remove signatures of the surface from passive microwave data for the purpose of detecting and characterizing precipitating clouds [37]. PCA is also an important part of blind multivariate noise estimation and filtering algorithms such as iterative order and noise (ION) estimation and an extension of ION that is capable of estimating mixing matrices [38–40]. 12.4.2.1 Basic PCA Here, a definition of basic PCA is presented. For a random vector x of p random variables, the principal components of x can be defined inductively. The first principal component is the product a1T x, where a1 is a unit-length vector that maximizes the variance of a1T x. a1 ¼ arg max Var(aT x) kak¼1

(12:19)

where Var() denotes the variance of a random variable. Each of the other principal components are defined as follows: the nth principal component is the product anT x, where an is a unit-length vector that is orthogonal to a1, a2, . . . , and an1 and maximizes the variance of anT x. an ¼ arg

max

kak¼1; a?ai ; 8i2{1; 2; ...; n1}

Var(aT x)

(12:20)

a1, a2, . . . , and ap are derived in Ref. [41]. The nth principal component is the product anTx where an is the eigenvector associated with the nth highest eigenvalue of the covariance matrix Lx of x. In this chapter, the definitions of PCA and the term principal component follow the convention of Ref. [41]. 12.4.2.2

Constrained PCA

In addition to the basic PCA, the algorithm developed in this chapter uses a variation of PCA known as constrained PCA. This form of PCA finds principal components that are constrained to be orthogonal to a given subspace [41]. It can be used to filter out noise in remote-sensing data. Constrained PCA has been used to filter out signatures of surface variations from microwave remote-sensing data [42]. Filtering out unwanted signatures involves the following steps: 1. Select a set of data that captures a good representation of the type of noise to be filtered without capturing too much of the variation of the signal of interest 2. Apply basic PCA to the resulting subset of data 3. Examine the resulting principal components for sensitivity to the type of noise to be filtered out 4. Project the data onto the subspace orthogonal to the noise-sensitive principal components 5. Apply basic PCA to the data resulting from the projection The principal components resulting from steps (2) and (5) are called preconstraint principal components and postconstraint principal components, respectively, as in Ref. [37].

Satellite-Based Precipitation Retrieval Using Neural Networks 12.4.3

245

Data Fusion

Data fusion is a very broad area involving the combination of information from different sources. A working group set up by the European Association of Remote Sensing Laboratories and the French Society for Electricity and Electronics has adopted the following definition of data fusion [42]: Data fusion is a formal framework in which are expressed means and tools for the alliance of data originating from different sources. It aims at obtaining information of greater quality; the exact definition of ‘greater quality’ will depend upon the application.

Review papers have referred to three levels of data fusion: measurement, feature, and decision [43–45]. The measurement level is sometimes called the pixel level [44]. The algorithm described in Section 12.5 involves the measurement and decision levels. In this chapter, nontrivial uses of data fusion occur only at the measurement level. Some of the applications of image fusion (or data fusion applied to 2D data) include image sharpening or enhancement [45], feature enhancement, and replacement of missing or faulty data [44]. For this research, nonlinear data fusion is applied to sharpen images. Rosenkranz developed a method for nonlinear geophysical parameter estimation through multi-resolution data fusion [46–48].

12.4.4

Neural Nets

Neural nets are computational structures that were developed to mimic the way biological neural nets learn from their environment and are useful for pattern recognition and classification. Neural nets can be used to learn and compute functions for which the relationships between inputs and outputs are unknown or computationally complex. There are a variety of neural nets such as feedforward neural nets (sometimes called multilayer perceptrons [49]), Kohonen self-organizing feature maps, and Hopfield nets [50,51]. The feedforward neural net is used in this chapter. The basic structural element of feedforward neural nets is called a perceptron. It computes a function of the weighted sum of inputs and a bias, as shown in Figure 12.7. y¼f

n X

! wi xi þb

(12:21)

i¼1

where xi is the ith input, wi is the weight associated with the ith input, b is the bias, f is the transfer function of the perceptron, and y is the output.

x1 x2

w1 w2

f (·)

xn

wn

b

y

FIGURE 12.7 The structure of a perceptron.

Signal and Image Processing for Remote Sensing

246 x1

w11 w12 f (·)

w1m w21

b1

x2

v1

w22 f (·) v2

w2m

g (·)

b2

y

c wn1

vm wn1 f (·)

xn

wnm bm

FIGURE 12.8 A two-layer feedforward neural net with one output node.

Perceptrons can be combined to form a multi-layer network, as shown in Figure 12.8. In Figure 12.8, xi is the ith input, n is the number of inputs, wij is the weight associated with the connection from the ith input to the jth node in the hidden layer, bi is the bias of the ith node, m is the number of nodes in the hidden layer, f is the transfer function of the perceptrons in the hidden layer, vi is the weight between the ith node and the output node, c is the bias of the output node, g is the transfer function of the output node, and y is the output. Then, 0 ! 1 m n X X vj f wij xi þbj þcA y¼g@ (12:22) j ¼1

i ¼1

In this chapter, f and g are defined as follows: f (x)¼ tanh x¼

ex ex ex þex

g(x)¼x

(12:23) (12:24)

The function tanh x is approximately linear in the range 0.6 x 0.6, and approaches 1 as x tends to 1 and 1 as x tends to 1, so it has a nonlinearity that is not too complex (Figure 12.9). This neural net topology is good for situations in which one wants to develop a simple nonlinear estimator whose output depends approximately monotonically on each input.

Satellite-Based Precipitation Retrieval Using Neural Networks

247

2.5 2

tanh(x) (solid) and x (dashed)

1.5 1 0.5 0

−0.5 −1

−1.5 −2 −2.5 −2.5

−2

−1.5

−1

−0.5

0 X

0.5

1

1.5

2

2.5

FIGURE 12.9 Neural net transfer functions.

The neural nets for this chapter were trained using the Levenberg–Marquardt training algorithm. Marquardt developed an efficient algorithm (called the Marquardt–Levenberg algorithm in Ref. [51]) for nonlinear least-square parameter estimation [52]. Hagan and Menhaj incorporated this algorithm into a backpropagation training algorithm for feedforward neural nets [52]. The weights of the neural net were initialized using the Nguyen– Widrow method to facilitate convergence of the neural net weights during training [53]. The vectors used to train and evaluate the neural nets were divided into three disjoint sets: 1. The training set, the set used to determine how the weights of the neural net should be adjusted during the training 2. The validation set, the set used to determine when the training should stop 3. The testing set, the set used to evaluate the resulting neural net These definitions are from Ref. [54]. One of the challenges encountered in the course of developing an estimator involved dealing with an output range that covered several orders of magnitude. Chapter 4 in this volume describes how this was accomplished.

12.5

The C hen–Staelin Algorithm

The basic structure of the algorithm includes some signal-processing components and a neural net, as shown in Figure 12.10. The signal processing components process the data

Signal and Image Processing for Remote Sensing

248 AMSU data Auxiliary data

Neural net

Signal processing

Rain rate metric

FIGURE 12.10 Basic structure of the algorithm.

into forms that characterize the most important degrees of freedom related to the precipitation rate such as atmospheric temperature profile, water vapor profile, cloud-top altitude, particle size distribution, and vertical updraft velocity. The neural net is trained to learn the nonlinear dependencies of precipitation rate on these variables. The dependence of the precipitation rate on these variables should be monotonic, so the neural net does not need to be complicated. A feedforward neural net with one hidden layer of tangent sigmoid nodes (with transfer function f(x) ¼ tanh x) and one linear output node should be sufficient (Figure 12.8) [49]. The Chen–Staelin algorithm uses the signal-processing methods in the previous section to extract the most relevant information from AMSU data. Figure 12.11 shows a block diagram of the first part of the algorithm and Figure 12.12 shows the final part of the algorithm with a neural net. A neural net that takes the following sets of inputs is at the heart of the algorithm: 1. Inferred 15-km-resolution perturbations at 52.8 GHz, 53.6 GHz, 54.4 GHz, 54.9 GHz, and 55.5 GHz.

HSB 183 ± 7 GHz

Compute 183 ±7 GHz perturbations

HSB 183 ± 3 GHz

Compute 183 ± 3 GHz perturbations

Out of bounds HSB 4 Flag invalid 183 ± 1 GHz 15-km points 183 ± 3 GHz 183 ± 7 GHz Out of bounds 150 GHz 13 Flag invalid AMSU 50-km points chs. 1–12,15

Land/sea flag Scan angle

9

C

OR

OR A

Interpolate to 15-km

AND

Extend to 50-km Form valid interpolation mask

Extend to 50-km D

B

Limb and surface 5 5 Laplacian 5 correction (chs. 4–8) interpolation − ⊕ + 5 cosine

FIGURE 12.11 Block diagram of the algorithm, part 1.

z < 0?

Compute scaling factor

y y < 0?

OR

z

E

53.6-GHz switch

Elevation flag

Surface elevation

AMSU chs. 4–12

Filter to 50-km

Extend to 50-km Form valid interpolation mask

NOT

AND

⊗

5

NOT

5 5 Set warm 5 5 Interpolate parts to 0 to 15-km 5 50-km, 54-GHz perturbations

15-km, 540-GHz 50-km valid 15-km valid retrieval mask perturbations retrieval mask

2. 183 + 1-, 183 + 3-, and 183 + 7-GHz 15-km HSB data.

Satellite-Based Precipitation Retrieval Using Neural Networks

249

5

15-km, 54-GHz perturbations

5

50-km, 54-GHz perturbations

– 9

AMSU chs. 4–12 Land/sea flag Scan angle

5

+

Limb and surface correction

50-km precipitation retrieval

3 5

Interpolate to 15-km

Temperature profile principal components

cosine

50-km valid retrieval mask

Linear transform

Filter to 50-km

3

AMSU chs. 1–3,15

4

4

Interpolate to 15-km

HSB 150-GHz

Linear transform

4 Water vapor profile principal components

HSB 183-GHz Satellite zenith angle

3

2

Neural x net

10x–1

3

3

15-km precipitation retrieval

15-km valid retrieval mask

secant

FIGURE 12.12 Block diagram of the algorithm, part 2.

3. The leading three principal components characterizing the original five corrected 50-km AMSU-A temperature radiances. 4. Two surface-insensitive principal components that characterize the window channels at 23.8 GHz, 31.4 GHz, 50 GHz, and 89 GHz, along with the four HSB channels. 5. The secant of the satellite zenith angle u. Each of these sets provides the neural net with information that is relevant to precipitation. The three principal components characterizing AMSU-A temperature radiances provide information on the atmospheric temperature profile, which is important because it determines how much water vapor can be precipitated. The secant of the satellite zenith angle u allows the neural net to account for variations in the data due to the scan angle. The 15-km cloud-induced perturbations provide information on the cloud-top altitude. The current AMSU/HSB precipitation retrieval algorithm is based on NOAA-15 AMSU comparisons with NEXRAD over the eastern United States during 38 orbits that exhibited significant precipitation and were distributed throughout the year. These orbits are listed in Table 12.3. The primary precipitation-rate retrieval products of AMSU/HSB are 15and 50-km-resolution contiguous retrievals over the viewing positions of HSB and AMSU, respectively, within 438 of nadir. The two outermost 50-km and six outermost 15-km viewing positions on each side of the swath are omitted due to their grazing angles. The algorithm architectures for these two retrieval methods and the derivation of the numerical coefficients characterizing the neural network are described and presented below.

12.5.1

Limb-Correction of Temperature Profile Channels

AMSU observes at angles up to 498 away from the nadir. For angles further away from nadir, the electromagnetic energy originating from a given altitude and atmospheric state

Signal and Image Processing for Remote Sensing

250 TABLE 12.3

List of Rainy Orbits Used for Training, Validation, and Testing October 16, 1999, 0030 UTC October 31, 1999, 0130 UTC November 2, 1999, 0045 UTC December 4, 1999, 1445 UTC December 12, 1999, 0100 UTC January 28, 2000, 0200 UTC January 31, 2000, 0045 UTC February 14, 2000, 0045 UTC February 27, 2000, 0045 UTC March 11, 2000, 0100 UTC March 17, 2000, 0015 UTC March 17, 2000, 0200 UTC March 19, 2000, 0115 UTC April 2, 2000, 0100 UTC April 4, 2000, 0015 UTC April 8, 2000, 0030 UTC April 12, 2000, 0045 UTC April 12, 2000, 0215 UTC April 20, 2000, 0100 UTC

April 30, 2000, 1430 UTC May 14, 2000, 0030 UTC May 19, 2000, 0015 UTC May 19, 2000, 0145 UTC May 20, 2000, 0130 UTC May 25, 2000, 0115 UTC June 10, 2000, 0200 UTC June 16, 2000, 0130 UTC June 30, 2000, 0115 UTC July 4, 2000, 0115 UTC July 15, 2000, 0030 UTC August 1, 2000, 0045 UTC August 8, 2000, 0145 UTC August 18, 2000, 0115 UTC August 23, 2000, 1315 UTC September 23, 2000, 1315 UTC October 5, 2000, 0130 UTC October 6, 2000, 0100 UTC October 14, 2000, 0130 UTC

has to travel longer paths before reaching the radiometer and, therefore, is subject to more absorption and scattering effects. This results in scan-angle-dependent effects in brightness temperature images, as shown in Figure 12.13a. A limb and surface correction method for AMSU-A channels 4–8 brightness temperatures is needed to make precipitation-induced perturbations more apparent and for extracting information about atmospheric conditions. AMSU-A channels 4 and 5 are corrected for surface variations, as they are sensitive to the surface. For these two channels, the brightness temperature for pixels over ocean is corrected to what might be observed for the same atmospheric conditions over land. AMSU-A channels 9–14 brightness temperatures are not corrected because they are not significantly perturbed by clouds and therefore are not used for anything other than limb correction. Limb and surface correction was done by training a neural net of the type shown in Figure 12.8 to estimate nadir-viewing brightness temperatures. For each pixel, the neural net used brightness temperatures from several channels at that pixel to estimate the brightness temperature seen at the pixel closest to nadir at a nearly identical latitude and at nearly the same time. It is assumed that the temperature field does not vary significantly over one scan. Limb and surface correction was done for AMSU-A channels 4–8. The data used to correct each of these channels are listed in Table 12.4. No attempt has been made to correct for the scan-angle-dependent asymmetry in the brightness temperatures. These neural nets were trained using data between 558N and 558S from seven orbits spaced over 1 year. Channels 4 and 5 are surface sensitive, so they were trained to estimate brightness temperatures that would be seen over land. Figure 12.13 shows a sample of (a) uncorrected and (b) limb and surface corrected 54.4GHz brightness temperatures. The shapes of precipitation systems over Texas and the Mexico–Guatemala border are more apparent after the limb correction. In Figure 12.13(a), the difference between brightness temperatures at nadir and the swath edge is as high as 18 K. In Figure 12.13(b), the angle-dependent variation is less than 3 K. One limitation of the training is that one cannot really know what the nadir-viewing brightness temperature is supposed to be when there is precipitation.

Satellite-Based Precipitation Retrieval Using Neural Networks 110⬚W

100⬚W

90⬚W

80⬚W

251 244

40⬚N

242

238 30⬚N

236 234

25⬚N

232

Brightness temperature (K)

240

35⬚N

230

20⬚N

228 15⬚N

226

(a) 110⬚W

100⬚W

90⬚W

80⬚W

243

40⬚N

242.5 242 241.5 30⬚N

241 240.5

25⬚N

240 239.5

20⬚N

Brightness temperature (K)

35⬚N

239 238.5

15⬚N

(b)

238

FIGURE 12.13 (See color insert following page 302.) NOAA-15 AMSU-A 54.4-GHz brightness temperatures for a northbound track on September 13, 2000. (a) Uncorrected and (b) limb and surface corrected.

12.5.2

Detection of Precipitation

The 15-km-resolution precipitation-rate retrieval algorithm, summarized in Figure 5.2, and Figure 5.3 begins with the identification of potentially precipitating pixels. The neural net operates on data from only FOVs labeled as potentially precipitating. This choice eliminates the need to exhaustively learn all of the conditions where the precipitation rate is exactly zero. The neural net is also likely to have difficulty forcing precipitation rates in nonprecipitating FOVs to be exactly zero. This choice also reduces the time needed to train the neural net since, at any given time, precipitation falls over less than 10% of the Earth’s surface. All 15-km pixels with brightness temperatures at 183 + 7 GHz that are below a threshold T7 are flagged as potentially precipitating, where

Signal and Image Processing for Remote Sensing

252 TABLE 12.4

Data Used in Limb and Surface Correction of AMSU-A Channels AMSU-A Channel

Inputs Used for Limb and Surface Correction

4 5 6 7 8

AMSU-A channels 4–12, land/sea flag, cos w AMSU-A channels 5–12, land/sea flag, cos w AMSU-A channels 6–12, cos w AMSU-A channels 6–12, cos w AMSU-A channels 6–12, cos w

T7 ¼0:667(T53:6 248)þ262þ6 cos u

(12:25)

and where u is the satellite zenith angle and T53.6 is the spatially filtered 53.6-GHz brightness temperature obtained by selecting the warmest brightness temperature within a 77 array of AMSU-B pixels. If, however, T53.6 is below 248 K, then the brightness temperature at 183 + 3 GHz is compared, instead, to a different threshold T3, where T3 ¼242:5þ5 cos u

(12:26)

The 183 + 3-GHz band is used to flag potential precipitation when the 183 + 7-GHz flag could be erroneously set by low-surface emissivity in very cold and dry atmospheres, as indicated by T53.6. These thresholds T7 and T3 are slightly colder than a saturated atmosphere would be, implying the presence of a microwave-absorbing cloud. If the locally filtered T53.6 is less than 242 K, then the pixel is assumed not to be precipitating. 12.5.3

Cloud-Clearing by Regional Laplacian Interpolation

Within regions flagged as potentially precipitating, strong precipitation is generally characterized by cold, cloud-induced perturbations of the AMSU-A tropospheric temperature sounding channels in the range of 52.5–55.6 GHz. Brightness temperature images approximately satisfy Laplace’s equation in the absence of precipitation. When the potentially precipitating FOVs have been identified, Laplacian interpolation can be performed to clear the brightness temperature image of the effects of precipitation, and the perturbations due to precipitation can be computed. Examples of 183 + 7-GHz data and the corresponding 50-km cold perturbations at 52.8 GHz are illustrated in Figure 12.14a and Figure 12.14c. Physical considerations and aircraft data show that convective cells near 54 GHz typically appear slightly off-center and less extended relative to the 183-GHz images [55,56].

12.5.4

Image Sharpening

The small interpolation errors in converting 54-GHz perturbations to 15-km contribute to the total errors and discrepancies discussed in Section 12.3. These 50-km-resolution 52.8GHz perturbations DT50,52.8 are then used to infer the perturbations DT15,52.8 [Figure 12.14d]. These might have been observed at 52.8 GHz with a 15-km resolution had those perturbations been distributed spatially in the same way as the cold perturbations observed at either 183 + 7 or 183 + 3 GHz, the choice between these two channels being the same as described above. This requires the bilinearly interpolated 50-km AMSU data

Satellite-Based Precipitation Retrieval Using Neural Networks 92⬚W

94⬚W

90⬚W

88⬚W

260

34⬚N

92⬚W

94⬚W

86⬚W

253 90⬚W

88⬚W

86⬚W

260 34⬚N

240

240

220 32⬚N

32⬚N

220 200 30⬚N

30⬚N

200

180 (a)

(b) 92⬚W

94⬚W

34⬚N

90⬚W

88⬚W

86⬚W

0

–2

92⬚W

94⬚W

90⬚W

88⬚W

86⬚W

0

–2

34⬚N

–4 –4

32⬚N

32⬚N

–6 –6 30⬚N

30⬚N

–8

–8 (c)

(d)

FIGURE 12.14 (See color insert following page 302.) Frontal system on September 13, 2000, 0130 UTC. (a) Brightness temperatures (K) near 183 + 7 GHz. (b) Brightness temperatures (K) near 183 + 3 GHz. (c) Brightness temperature perturbations (K) near 52.8 GHz. (d) Inferred 15-km-resolution brightness temperature perturbations (K) near 52.8 GHz.

to be resampled at the HSB beam positions. These inferred 15-km perturbations are computed for five AMSU-A channels using DT15,54 ¼ 20 tan h

DT15,183 DT50,54 DT50,183

(12:27)

The perturbation DT15,183 near 183 GHz is defined to be the difference between the observed radiance and the appropriate threshold given by (12.25) or (12.26). The perturbation DT50,54 near 54 GHz is defined to be the difference between the observed radiance and the Laplacian-interpolated radiance based on those pixels surrounding the flagged region [58]. Any warm perturbations in the images of DT15,183 and DT50,54 are set to zero. Limb and surface-emissivity corrections to nadir for the five 54-GHz channels are produced by neural networks for each channel; they operate on nine AMSU-A channels above 52 GHz, the cosine of the viewing angle w from nadir, and a land–sea flag (Figure 12.12). They were trained on seven orbits spaced over 1 year for latitudes up to +558. Inferred 50- and 15-km precipitation-induced perturbations at 52.8 GHz are shown in Figure 12.14c and Figure 12.14d for a frontal system. Such estimates of 15-km perturbations near 54 GHz help characterize heavily precipitating small cells.

Signal and Image Processing for Remote Sensing

254 12.5.5

Temperature and Water Vapor Profile Principal Components

One important determinant of precipitation is the temperature profile. Warmer atmospheres can hold more water vapor and result in higher vertical updraft velocities. Therefore, inputs to the neural net in Figure 12.10 should include some that have information about the temperature profile. For each of AMSU-A channels 4–8, the brightness temperatures were corrected for limb and surface effects and then processed to eliminate precipitation signatures with the methods described in Section 12.4.1 through Section 12.4.3. The corrected brightness temperatures from all five of these channels could have been inputs to the neural net in Figure 12.10, but it was determined that a more compact representation of these channels was sufficient. PCA was applied to these five channels, and the first three principal components were found to be sufficient for characterizing the temperature profile. Adding the fourth and fifth principal components did not significantly improve the training of the neural net. The water vapor profile is another important determinant of precipitation. Higher concentrations of water vapor can result in higher precipitation rates. The water vapor principal components are computed using AMSU-A channels 1–3, and 15, and the AMSU-B 150-, 183 + 7-, 183 + 3-, and 183 + 1-GHz channels. Some of these channels are sensitive to surface variations. Therefore, it is necessary to project the vector of these observations onto a subspace that is not significantly sensitive to surface variations. Constrained PCA, which was described in Section 12.4.2.2, was used to compute the water vapor principal components. A set of pixels without precipitation and with different types of surfaces was selected to compute surface-sensitive eigenvectors using PCA. The surface-sensitive eigenvectors were determined by visual inspection of the preconstraint principal components for correlation with surface features (e.g., land and sea boundaries). Then, a set of data that also included precipitation was selected. The observations over this set were projected onto a linear subspace that was orthogonal to the subspace spanned by the surface-sensitive eigenvectors. Then, PCA was done on the resulting data set to determine the water vapor principal components. It was found that two water vapor principal components were adequate for characterizing the eight channels.

12.5.6

The Neural Net

All 13 of the variables listed at the beginning of this section are fed into the neural net used for 15-km precipitation-rate retrievals, as shown in Figure 12.12. The relative insensitivity of these inputs to surface emissivity is important to the success of this technique over land, ice, and snow. This network was trained to minimize the rms value of the difference between the logarithms of the (AMSUþ1 mm/h) and (NEXRADþ1 mm/h) retrievals; the use of logarithms reduced the emphasis on the heaviest rain rates, which were roughly three orders of magnitude greater than the lightest rates. Adding 1 mm/h reduced the emphasis on the lightest rain rates, which were more noise-dominated. These intuitive choices clearly impact the retrieval error distribution, and therefore further studies should enable algorithm improvements. However, retrievals with training optimized for low rain rates did not markedly improve that regime. NEXRAD precipitation retrievals with a 2-km resolution were smoothed to approximate Gaussian spatial averages that were centered on and approximated the view-angle-distorted 15- or 50-km antenna beam patterns. The accuracy of NEXRAD precipitation observations is known to vary with distance; therefore, only points beyond 30 km, but within 110 km, of each NEXRAD radar site were included in the data used to train and test the neural nets. Eighty different networks were trained using the Levenberg–Marquardt algorithm, each with different numbers of nodes

Satellite-Based Precipitation Retrieval Using Neural Networks

255

and water vapor principal components. A network with nearly the best performance over the testing dataset was chosen; it used two surface-blind water vapor principal components, and a slightly better performance was achieved with five water vapor principal components with increased surface sensitivity. The final network had one hidden layer with five nodes that used the tanh sigmoid function. These neural networks were similar to those described in Ref. [57]. The resulting 15-km-resolution precipitation retrievals were then smoothed to yield 50-km retrievals. The 15-km retrieval neural network was trained using precipitation data from the 38 orbits listed in Table 12.3. During this period, the radio interference to AMSU-B was negligible relative to other sources of retrieval error. Each 15-km pixel flagged as potentially precipitating using 183 + 7- or 183 + 3-GHz radiances (see Figure 12.11 and Figure 12.12) was used for training, validation, or testing of the neural network. For these 38 orbits over the United States, 15 one-hundred and sixty 15-km pixels were flagged and considered suitable for training, validation, and testing; half were used for training and one quarter were used for each of validation and testing, where the validation pixels were used to determine when the training of the neural network should cease. On the basis of the final AMSU and NEXRAD 15-km retrievals, approximately 14 and 38%, respectively, of the flagged 15-km pixels appeared to have been precipitating less than 0.1 mm/h for the test set.

12.6

Retrieval Performance Evaluation

This section presents three forms of evaluation for this initial precipitation-rate retrieval algorithm: (1) representative qualitative comparisons of AMSU and NEXRAD precipitation rate images, (2) quantitative comparisons of AMSU and NEXRAD retrievals stratified by NEXRAD rain rate, and (3) representative precipitation images at more extreme latitudes beyond the NEXRAD training zone.

12.6.1

Image Comparisons of NEXRAD and AMSU-A/B Retrievals

Each NEXRAD comparison at 15-km resolution occurred within 8 min of satellite overpass; such coincidence is needed to characterize single-pixel retrievals because convective precipitation evolves rapidly on this spatial scale. Although comparison with instruments such as TRMM and SSM/I would be useful, their orbits unfortunately overlap those of AMSU within 8 min so infrequently (if ever) that comparisons over precipitation are too rare to be useful until several years of data have been analyzed. This challenge of simultaneity and the sporadic character of rain have restricted most prior instrument comparisons (passive microwave satellites, radar, rain gauges) to dimensions over 100 km and to periods of an hour to a month [58–60]. The uniformity and extent of the NEXRAD network offer a unique degree of simultaneity on 15- and 50-km scales and also the ability to match the Gaussian shape of the AMSU antenna beams. Although these AMSU/HSB–NEXRAD comparisons are encouraging because they involve single pixels and independent physics and facilities, further extensive analyses are required for real validation. For example, comparisons of precipitation averages and differences over the same time/space units used to validate other precipitation measurement systems (e.g., SSM/I [61], ATOVS, TRMM, rain gauges) are needed to characterize variances and systematic biases based on the precipitation rate, type, location, or season.

Signal and Image Processing for Remote Sensing

256

These biases include any present in the NEXRAD data used to train the AMSU/HSB algorithm; once characterized, they can be diminished. Any excess variance experienced for rain cells too small to be resolved by AMSU/HSB can also eventually be better characterized, although it is believed to be modest for cells with microwave signatures larger than 10 km. Smaller cells contribute little to the total rainfall.

12.6.2

Numerical Comparisons of NEXRAD and AMSU-A/B Retrievals

Figure 12.15a and Figure 12.15b present 15-km-resolution precipitation retrieval images for September 13, 2000, obtained from NEXRAD and AMSU, respectively. On this occasion, both sensors yielded rain rates over 50 mm/h at similar locations and lower rain rates down to 0.5 mm/h over comparable areas. The revealed morphology is thus very similar, even though AMSU observes 6 min before NEXRAD, and it senses altitudes that are separated by several kilometers; rain falling at a nominal rate of 10 m/s takes 10 min to fall 6 km. Figure 12.16 shows the scatter between the 15-km AMSU and NEXRAD rain-rate retrievals for the test pixels not used for training or validation. Figure 12.17 shows the scatter between the 50-km AMSU and NEXRAD rain-rate retrievals over all points flagged as precipitating.

92⬚W

94⬚W

90⬚W

88⬚W

86⬚W

34⬚N

50

92⬚W

94⬚W

90⬚W

88⬚W

86⬚W

50

40 34⬚N

40

30

30 32⬚N

32⬚N

30⬚N

20

20

10 30⬚N

10

0

0 (a)

(b) 92⬚W

94⬚W

34⬚N

90⬚W

88⬚W

86⬚W

50

92⬚W

94⬚W

90⬚W

88⬚W

86⬚W

50

40 34⬚N

40

30

30 32⬚N

32⬚N

30⬚N

20

20

10 30⬚N

10

0

0 (c)

(d)

FIGURE 12.15 (See color insert following page 302.) Precipitation rates (mm/h) above 0.5 mm/h observed on September 13, 2000, 0130 UTC. (a) 15-km-resolution NEXRAD retrievals, (b) 15-km-resolution AMSU retrievals, (c) 50-km-resolution NEXRAD retrievals, and (d) 50km-resolution AMSU retrievals.

Satellite-Based Precipitation Retrieval Using Neural Networks

257

15-km AMSU rain rate (mm/h) + 0.1 mm/h

103

102

101

100

10−1 10−1

100

101

102

103

15-km NEXRAD rain rate (mm/h) + 0.1 mm/h FIGURE 12.16 Comparison of AMSU and NEXRAD estimates of rain rate at 15-km resolution.

50-km AMSU rain rate (mm/h) + 0.1 mm/h

102

101

100

10−1 10−1

100 101 50-km NEXRAD rain rate (mm/h) + 0.1 mm/h

FIGURE 12.17 Comparison of AMSU and NEXRAD estimates of rain rate at 50-km resolution.

102

Signal and Image Processing for Remote Sensing

258

The relative sensitivity of AMSU and NEXRAD to light and heavy rain can be seen in Figure 12.17. In general, these figures suggest that AMSU responds less to the highest radar rain rates, perhaps because AMSU is less sensitive to the bright-band or hail anomalies that affect the radar. They also suggest that the risk of false rain detections increases for AMSU retrievals below 0.5 mm/h at a 50-km resolution, although further study is required. Greater accuracy at these low rates requires more space-time averaging and careful calibration. The risk of overestimating rain rate also appears to be limited. Only 3.3% of the total AMSU-derived rainfall was in areas where AMSU saw more than 1 mm/h and NEXRAD saw less than 1 mm/h. Only 7.6% of the total NEXRAD-derived rainfall was in areas where NEXRAD saw more than 1 mm/h and AMSU saw less than 1 mm/h. These percentages were compared with the total percentages of AMSU and NEXRAD rain that fell at rates above 1 mm/h, which were 94 and 97%, respectively. It is also interesting to see to what degree does each sensor retrieve rain when the other does not, and how much rain does each sensor miss. For example, of the 73 NEXRAD 15-km rain-rate retrievals in Figure 12.16 above 54 mm/h, none were found by AMSU to be below 3 mm/h, and of the 61 AMSU 15-km retrievals above 45 mm/h, none were found by NEXRAD to be below 16 mm/h. Also, of the 69 NEXRAD 50-km rainrate retrievals in Figure 12.17 above 30 mm/h, none were found by AMSU to be below 5 mm/h, and of the 102 AMSU 50-km retrievals above 16 mm/h, none were found by NEXRAD to be below 10 mm/h. Perhaps the most significant AMSU precipitation performance metric is the rms difference between the NEXRAD and AMSU rain-rate retrievals; these are grouped by retrieved NEXRAD rain rates in octaves. The central 26 AMSU-A scan angles and central 78 AMSU-B scan angles were included in these evaluations; only the outermost two AMSUA angles on each side were omitted. These comparisons used all 50-km pixels and only the 15-km pixels were not used for training or validation. The results are listed in Table 12.5. The smoothing of the 15-km NEXRAD and AMSU results to a nominal 50-km resolution was consistent with an AMSU-A Gaussian beamwidth of 3.38. The rms agreement between these two very different precipitation-rate sensors appears surprisingly good, particularly since a single AMSU neural network is used over all angles, seasons, and latitudes. The 3-GHz radar retrievals respond most strongly to the largest hydrometeors, especially those below the bright band near the freezing level, while AMSU interacts with the general population of hydrometeors in the top few kilometers of the precipitation cell, which may lie several kilometers above the freezing level. Much of the agreement between AMSU and NEXRAD rain-rate retrievals must therefore result from the statistical consistency of the relations between rain rate and its various electromagnetic signatures. It is difficult to say how much of the observed TABLE 12.5 RMS AMSU/NEXRAD Discrepancies (mm/h) NEXRAD Range <0.5 mm/h 0.5–1 mm/h 1–2 mm/h 2–4 mm/h 4–8 mm/h 8–16 mm/h 16–32 mm/h >32 mm/h

15-km Resolution

50-km Resolution

1.0 2.0 2.3 2.7 3.5 6.9 19.0 42.9

0.5 0.9 1.1 1.8 3.2 6.6 12.9 22.1

Satellite-Based Precipitation Retrieval Using Neural Networks

259

discrepancy is due to each sensor or how well each correlates with precipitation reaching the ground. Furthermore, this study provided an opportunity for evaluation of radar data. The rms discrepancies between AMSU and NEXRAD retrievals were separately calculated over all points at ranges from 110 to 230 km from any radar. For NEXRAD precipitation rates below 16 mm/h, these rms discrepancies were approximately 40% greater than those computed for test points at the 30- to 110-km range. At rain rates greater than 16 mm/h, the accuracies beyond 110 km were more comparable. Most points in the eastern United States are more than 110 km from any NEXRAD radar site.

12.6.3

Global Retrievals of Rain and Snow

Figure 12.18 illustrates precipitation-rate retrievals at points around the globe where radar confirmation data are scarce. Figure 12.18a shows precipitation retrievals in the tropics over a mix of land and sea, while Figure 12.18b shows a more intense tropical 120⬚E

125⬚E

130⬚E

135⬚E

14⬚N

22⬚N

12⬚N

20⬚N

95⬚E

100⬚E

105⬚E

110⬚E

18⬚N

10⬚N

16⬚N 8⬚N

14⬚N

6⬚N

12⬚N

4⬚N

10⬚N 8⬚N

2⬚N

6⬚N (b)

(a) 0⬚ 75⬚N

108⬚W

104⬚W

74⬚W 45⬚N

100⬚W

72⬚W

70⬚W

68⬚W

44⬚N 74⬚N 43⬚N 42⬚N

73⬚N

41⬚N 72⬚N 40⬚N 71⬚N (c)

39⬚N (d)

FIGURE 12.18 AMSU precipitation-rate retrievals (mm/h) with 15-km resolution. (a) Philippines, April 16, 2000; (b) Indochina, July 5, 2000; (c) Canada, August 2, 2000; and (d) New England snowstorm, March 5, 2001. Precipitation-rate retrievals exceed 0.5 mm/h in the shaded regions, and contours are drawn for 0.5 mm/h, 2 mm/h, 8 mm/h, 32 mm/h, and 128 mm/h. The peak retrieved values are 47 mm/h, 143 mm/h, 30 mm/h, and 1.5 mm/h in (a), (b), (c), and (d), respectively

Signal and Image Processing for Remote Sensing

260

event. Figure 12.18c illustrates strong precipitation near 728N to 748N, again over both land and sea. Finally, Figure 12.18d illustrates the March 5, 2001, New England snowstorm that deposited roughly a foot of snow within a few hours: an accumulation somewhat greater than is indicated by the retrieved rain rates of 1.2 mm/h. This applicability of the algorithm to snowfall rate should be expected because the observed radio emission originates exclusively at high altitudes. Whether the hydrometeors are rain or snow on impact depends only on air temperatures near the surface—far below those altitudes being probed. For essentially all of the pixels shown in Figure 12.18, the adjacent clear air exhibited temperature and humidity profiles (inferred from AMSU) within the range of the training set. Nonetheless, regional biases are expected and will require evaluation. For example, polar stratiform precipitation is expected to exhibit relatively weaker radiometric signatures in winter when the temperature lapse rates are lower, and snow-covered mountains in cold polar air can produce false detections.

12.7

Conclusions

In this chapter, the precipitation estimation method for microwave radiometric data from Chen and Staelin and the role of signal processing methods were described. The development of the Chen–Staelin algorithm shows that signal processing can play a useful role in satellite-based precipitation estimation. In this algorithm, which was developed for AMSU-A/B and AMSU/HSB, PCA was used to reduce the dimensionality of selected sets of channels and to separate the effects of surface variations from atmospheric variations. Data fusion was used to sharpen 50-km data from AMSU-A so that 15-km precipitation retrievals could be done. Laplacian filtering was applied to data from the 54-GHz band to quantify the effects of clouds, and neural nets were trained to learn the mathematical relationships between precipitation and the information resulting from the signal processing. The signal processing components of the algorithm were designed to process the brightness temperature measurements in a way that extracts the most relevant information, and the neural net was trained to learn the relationship between precipitation rate and the inputs. This Chen–Staelin algorithm represents a step in the ongoing development of microwave precipitation retrieval algorithms. The algorithm of Chen and Staelin likely can be improved by choosing more general signal processing methods or fine-tuning the ones already being used. For example, variations or extensions of PCA such as independent component analysis (ICA) could be used [63,64]. Additionally, methods like PCA and ICA can be improved by incorporating components from physics-based methods. Additional improvements can be made by making better use of the data from window channels. Future instruments also present opportunities for better precipitation retrievals. The Advanced Technology Microwave Sounder (ATMS) to be launched aboard the NPOESS preparatory project (NPP) and NPOESS satellite series could be considered a more advanced version of AMSU-A/B and AMSU/HSB because it has a set of channels very similar to that of AMSU-A/B and offers finer resolution and sampling for most channels [65,66]. Because of the similarity of channel sets, the algorithm of Chen and Staelin can be a starting point for the development of an algorithm for ATMS. The finer resolution and sampling of ATMS will likely lead to better image-sharpening methods and temperature and water vapor profile characterization.

Satellite-Based Precipitation Retrieval Using Neural Networks

261

Ackn owledgments The author wishes to thank P.W Rosenkranz, E. Williams, and M.M. Wolfson for helpful discussions, and S.P.L. Maloney and C. Lebell for assistance with the NEXRAD data. This work was supported by the National Oceanographic and Atmospheric Administration under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the author and not necessarily endorsed by the United States Government.

Re fer enc es 1. F.T. Ulaby, R.K. Moore, and A.K. Fung, Microwave Remote Sensing: Active and Passive, AddisonWesley, Reading, MA, 1981. 2. P.W. Rosenkranz, Retrieval of temperature of moisture profiles from AMSU-A and AMSU-B measurements, IEEE Transactions on Geoscience and Remote Sensing, 39(11), 2429–2435, 2001. 3. P.W. Rosenkranz, Rapid radiative transfer model for AMSU/HSB channels, IEEE Transactions on Geoscience and Remote Sensing, 41(2), 362–368, 2003. 4. L. Shi, Retrieval of atmospheric temperature profiles from AMSU-A measurement using a neural network approach, Journal of Atmospheric and Oceanic Technology, 18, 340–347, 2001. 5. W.J. Blackwell, J.W. Barrett, F.W. Chen, R.V. Leslie, P.W. Rosenkranz, M.J. Schwartz, and D.H. Staelin, NPOESS Aircraft Sounder Testbed-Microwave (NAST-M), instrument description and initial flight results, IEEE Transactions on Geoscience and Remote Sensing, 39(11), 2444–2453, 2001. 6. R.V. Leslie, W.J. Blackwell, P.W. Rosenkranz, and D.H. Staelin, 183-GHz and 425-GHz passive microwave sounders on the NPOESS aircraft sounder testbed-microwave (NASTM), Proceedings of the 2003 IEEE International Geoscience and Remote Sensing Symposium, 1, 506–508, 2003. 7. R.V. Leslie, J.A. Loparo, P.W. Rosenkranz, and D.H. Staelin, Cloud and precipitation observations with the NPOESS Aircraft Sounder Testbed-Microwave (NAST-M) spectrometer suite at 54/118/183/425 GHz, Proceedings of the 2003 IEEE International Geoscience and Remote Sensing Symposium, 2, 1212–1214, 2003. 8. R.V. Leslie and D.H. Staelin, NPOESS Aircraft Sounder Testbed-Microwave (NAST-M) observations of clouds and precipitation at 54, 118, 183, and 425 GHz, IEEE Transactions on Geoscience and Remote Sensing, 42(10), 2240–2247, 2004. 9. J. Houghton, The Physics of the Atmospheres, Cambridge University Press, New York, 2002. 10. Tropical Rainfall Measuring Mission, http://trmm.gsfc.nasa.gov/ 11. R.A. Houze, Cloud Dynamics, Academic Press, New York, 1993. 12. G. Hufford, A model for the complex permittivity of ice at frequencies below 1 THz, International Journal of Infrared and Millimeter Waves, 12(7), 677–682, 1991. 13. D. Deirmendjian, Electromagnetic Scattering on Spherical Polydispersions, Elsevier, New York, 1969. 14. M.S. Spina, M.J. Schwartz, D.H. Staelin, and A.J. Gasiewski, Application of multilayer feedforward neural networks to precipitation cell-top altitude estimation, IEEE Transactions on Geoscience and Remote Sensing, 36(1), 154–162, 1998. 15. A.J. Gasiewski, Atmospheric Temperature Sounding and Precipitation Cell Parameter Estimation Using Passive 118-GHz O2 Observations, Ph.D. thesis, MIT Department of Electrical Engineering and Computer Science, December 1988. 16. A.J. Gasiewski, Microwave radiative transfer in hydrometeors, Atmospheric Remote Sensing by Microwave Radiometry, Ed. M.A. Janssen, John Wiley & Sons, New York, 1993. 17. N. Grody, F. Weng, and R. Ferraro, Application of AMSU for obtaining hydrological parameters, Microwave Radiometry and Remote Sensing of the Earth’s Surface and Atmosphere, Eds. P. Pampaloni and S. Paloscia, pp. 339–351, 2000.

262

Signal and Image Processing for Remote Sensing

18. F. Weng, L. Zhao, R.R. Ferraro, G. Poe, X. Li, and N.C. Grody, Advanced microwave sounding unit cloud and precipitation algorithms, Radio Science, 38(4), 8068, 2003; doi: 10.1029/ 2002RS002679. 19. J.P. Hollinger, SSM/I instrument evaluation, IEEE Transactions on Geoscience and Remote Sensing, 28(5), 781–790, 1990. 20. C. Kummerow and L. Giglio, A passive microwave technique for estimating rainfall and vertical structure information from space. Part I: Algorithm description, Journal of Applied Meteorology, 33, 3–18, 1994. 21. D. Tsintikidis, J.L. Haferman, E.N. Anagnostou, W.F. Krajewski, and T.F. Smith, A neural network approach to estimating rainfall from spaceborne microwave data, IEEE Transactions on Geoscience and Remote Sensing, 35(5), 1079–1093, 1997. 22. T. Kawanishi, T. Sezai, Y. Ito, K. Imaoka, T. Takeshima, Y. Ishido, A. Shibata, M. Miura, H. Inahata, and R.W. Spencer, The Advanced Microwave Scanning Radiometer for the Earth observing system (AMSR-E), NASDA’s contribution to the EOS for global energy and water cycle studies, IEEE Transactions on Geoscience and Remote Sensing, 41(2), 184–194, 2003. 23. T. Wilheit, C.D. Kummerow, and R. Ferraro, Rainfall algorithms for AMSR-E, IEEE Transactions on Geoscience and Remote Sensing, 41(2), 204–213, 2003. 24. J. Ikai and K. Nakamura, Comparison of rain rates over the ocean derived from TRMM microwave imager and precipitation radar, Journal of Atmospheric and Oceanic Technology, 20, 1709–1726, 2003. 25. C. Kummerow, W. Barnes, T. Kozu, J. Shiue, and J. Simpson, The tropical rainfall measuring mission (TRMM) sensor package, Journal of Atmospheric and Oceanic Technology, 15, 809–817, 1998. 26. C. Kummerow and L. Giglio, A passive microwave technique for estimating rainfall and vertical structure information from space. Part I: Algorithm description, Journal of Applied Meteorology, 33, 3–18, 1994. 27. T. Wilheit, C.D. Kummerow, and R. Ferraro, Rainfall algorithms for AMSR-E, IEEE Transactions on Geoscience and Remote Sensing, 41(2), 204–213, 2003. 28. T.J. Hewison and R. Saunders, Measurements of the AMSU-B antenna pattern, IEEE Transactions on Geoscience and Remote Sensing, 34(2), 405–412, 1996. 29. T. Mo, AMSU-A antenna pattern corrections, IEEE Transactions on Geoscience and Remote Sensing, 37(1), 103–112, 1999. 30. P.W. Rosenkranz, Improved rapid transmittance algorithm for microwave sounding channels, Proceedings of the 1998 IEEE International Geoscience and Remote Sensing Symposium, 2, 728–730, 1998. 31. B.H. Lambrigtsen, Calibration of the AIRS microwave instruments, IEEE Transactions on Geoscience and Remote Sensing, 41(2), 369–378, 2003. 32. H.H. Aumann, M.T. Chahine, C. Gautier, M.D. Goldberg, E. Kalnay, L.M. McMillin, H. Revercomb, P.W. Rosenkranz, W.L. Smith, D.H. Staelin, L.L. Strow, and J. Susskind, AIRS/AMSU/ HSB on the aqua mission: design, science objectives, data products, and processing systems, IEEE Transactions on Geoscience and Remote Sensing, 41(2), 253–264, 2003. 33. C.L. Parkinson, Aqua: an Earth-observing satellite mission to examine water and other climate variables, IEEE Transactions on Geoscience and Remote Sensing, 41(2), 173–183, 2003. 34. W.J. Blackwell, Retrieval of Cloud-Cleared Atmospheric Temperature Profiles from Hyperspectral Infrared and Microwave Observations, Sc.D. thesis, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, June 2002. 35. W.J. Blackwell, Retrieval of atmospheric temperature and moisture profiles from hyperspectral sounding data using a projected principal components transform and a neural network, Proceedings of the 2003 IEEE International Geoscience and Remote Sensing Symposium, 3, 2078–2081, 2003. 36. C.R. Cabrera-Mercader, Robust Compression of Multispectral Remote Sensing Data, Ph.D. thesis, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, Boston, MA, 1999. 37. F.W. Chen, Characterization of Clouds in Atmospheric Temperature Profile Retrievals, M.E. thesis, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, Boston, MA, June 1998.

Satellite-Based Precipitation Retrieval Using Neural Networks

263

38. J. Lee, Blind Noise Estimation and Compensation for Improved Characterization of Multivariate Processes, Ph.D. thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, March 2000. 39. J. Lee and D.H. Staelin, Iterative signal-order and noise estimation for multivariate data, IEE Electronics Letters, 37(2), 134–135, 2001. 40. A. Mueller, Iterative Blind Seperation of Gaussian Data of Unknown Order, M.E. thesis, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer, June 2003. 41. I.T. Joliffe, Principal Component Analysis, Springer-Verlag, New York, 2002. 42. L. Wald, Some terms of reference in data fusion, IEEE Transactions on Geoscience and Remote Sensing, 37(3), 1190–1193, 1999. 43. D.L. Hall and J. Llinas, An introduction to multisensor data fusion, Proceedings of the IEEE, 85(1), 6–23, 1997. 44. C. Pohl and J.L. van Genderen, Multisensor image fusion in remote sensing: concepts, methods, and application, International Journal of Remote Sensing, 19(5), 823–854, 1998. 45. V.J.D. Tsai, Frequency-based fusion of multiresolution images, Proceedings of the 2003 IEEE International Geoscience and Remote Sensing Symposium, 6, 3665–3667, 2003. 46. P.W. Rosenkranz, Inversion of data from diffraction-limited multiwavelength remote sensors, 1. Linear case, Radio Science, 13(6), 1003–1010, 1978. 47. P.W. Rosenkranz, Inversion of data from diffraction-limited multiwavelength remote sensors, 2. Nonlinear dependence of observables on the geophysical parameters, Radio Science, 17(1), 245–256, 1982. 48. P.W. Rosenkranz, Inversion of data from diffraction-limited multiwavelength remote sensors, 3. Scanning multichannel microwave radiometer data, Radio Science, 17(1), 257–267, 1982. 49. R.P. Lippmann, An introduction to computing with neural nets, IEEE Acoustics, Speech, and Signal Processing Magazine, 4(2), 4–22, 1987. 50. S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan, New York, 1994. 51. M.T. Hagan and M.B. Menhaj, Training feedforward networks with the Marquardt algorithm, IEEE Transactions on Neural Networks, 5(6), 989–993, 1994. 52. D.W. Marquardt, An algorithm for least-squares estimation of nonlinear parameters, Journal of the Society for Industrial and Applied Mathematics, 11(2), 431–441, 1963. 53. D. Nguyen and B. Widrow, Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights, Proceedings of the International Joint Conference on Neural Networks, 3, 21–26, 1990. 54. Neural Network Toolbox User’s Guide, The MathWorks, Inc., Natick, Massachusetts, 1998. 55. W.J. Blackwell et al., NPOESS aircraft sounder testbed-microwave (NAST-M): instrument description and initial flight results, IEEE Transactions on Geoscience and Remote Sensing, 39, 2444–2453, 2001. 56. A.J. Gasiewski, D.M. Jackson, J.R. Wang, P.E. Racette, and D.S. Zacharias, Airborne imaging of tropospheric emission at millimeter and submillimeter wavelengths, in Proceedings IGARSS, 2, 663–665, 1994. 57. D.H. Staelin and F.W. Chen, Precipitation observations near 54 and 183 GHz using the NOAA15 satellite, IEEE Transactions on Geoscience and Remote Sensing, 38, 2322–2332, 2000. 58. P.A. Arkin and P. Xie, The global precipitation climatology project: first algorithm intercomparison project, Bulletin of the American Meteorological Society, 75(3), 401–420, 1994. 59. E.A. Smith et al., Results of WetNet PIP-2 project, Journal of the Atmospheric Sciences, 55(9), 1483– 1536, 1998. 60. R.F. Adler et al., Intercomparison of global precipitation products: the third precipitation intercomparison project (PIP-3), Bulletin of the American Meteorological Society, 82(7), 1377–1396, 2001. 61. M.D. Conner and G.W. Petty, Validation and intercomparison of SSM/I rain-rate retrieval methods over the continental U.S., Journal of Applied Meteorology, 37(7), 679–700, 1998. 62. A. Hyva¨rinen, J. Karhunen, and E. Oja, Independent Component Analysis, John Wiley & Sons, New York, 2001. 63. F.R. Bach and M.I. Jordan, Kernel independent component analysis, Journal of Machine Learning Research, 3, 1–48, 2002.

264

Signal and Image Processing for Remote Sensing

64. J.C. Shiue, The advanced technology microwave sounder, a new atmospheric temperature and humidity sounder for operational polar-orbiting weather satellites, Proceedings of the SPIE, 4540, 159–165, 2001. 65. C. Muth, P.S. Lee, J.C. Shiue, and W.A. Webb, Advanced technology microwave sounder on NPOESS and NPP, Proceedings of the 2004 IEEE International Geoscience Remote Sensing Symposium, 4, 2454–2458, 2004. 66. F.W. Chen and D.H. Staelin, AIRS/AMSU/HSB precipitation estimates, IEEE Transactions on Geoscience and Remote Sensing, 41(2), 410–417, 2003. 67. F.W. Chen, Global Estimation of Precipitation using opaque Microwave Bands, PhD thesis, Massachusetts Institute of Technology, Massachusetts, 2004.

Part II

Image Processing for Remote Sensing

13 Polarimetric SAR Techniques for Remote Sensing of the Ocean Surface

Dale L. Schuler, Jong-Sen Lee, and Dayalan Kasilingam

CONTENTS 13.1 Introduction ..................................................................................................................... 268 13.2 Measurement of Directional Slopes and Wave Spectra ........................................... 268 13.2.1 Single Polarization versus Fully Polarimetric SAR Techniques ............... 268 13.2.2 Single-Polarization SAR Measurements of Ocean Surface Properties..... 269 13.2.3 Measurement of Ocean Wave Slopes Using Polarimetric SAR Data....... 271 13.2.3.1 Orientation Angle Measurement of Azimuth Slopes ................ 271 13.2.3.2 Orientation Angle Measurement Using the Circular-Pol Algorithm .......................................................................................... 271 13.2.4 Ocean Wave Spectra Measured Using Orientation Angles....................... 272 13.2.5 Two-Scale Ocean-Scattering Model: Effect on the Orientation Angle Measurement ......................................................................................... 275 13.2.6 Alpha Parameter Measurement of Range Slopes........................................ 277 13.2.6.1 Cloude–Pottier Decomposition Theorem and the Alpha Parameter .......................................................................................... 277 13.2.6.2 Alpha Parameter Sensitivity to Range Traveling Waves .......... 279 13.2.6.3 Alpha Parameter Measurement of Range Slopes and Wave Spectra .................................................................................... 280 13.2.7 Measured Wave Properties and Comparisons with Buoy Data ............... 282 13.2.7.1 Coastal Wave Measurements: Gualala River Study Site........... 282 13.2.7.2 Open-Ocean Measurements: San Francisco Study Site ............. 284 13.3 Polarimetric Measurement of Ocean Wave–Current Interactions.......................... 286 13.3.1 Introduction ....................................................................................................... 286 13.3.2 Orientation Angle Changes Caused by Wave–Current Interactions ....... 287 13.3.3 Orientation Angle Changes at Ocean Current Fronts ................................ 291 13.3.4 Modeling SAR Images of Wave–Current Interactions ............................... 291 13.4 Ocean Surface Feature Mapping Using Current-Driven Slick Patterns ................ 293 13.4.1 Introduction ....................................................................................................... 293 13.4.2 Classification Algorithm .................................................................................. 297 13.4.2.1 Unsupervised Classification of Ocean Surface Features ........... 297 13.4.2.2 Classification Using Alpha–Entropy Values and the Wishart Classifier............................................................................. 297 13.4.2.3 Comparative Mapping of Slicks Using Other Classification Algorithms ........................................................................................ 300 13.5 Conclusions...................................................................................................................... 300 References ................................................................................................................................... 302 267

Signal and Image Processing for Remote Sensing

268

13.1

Introduction

Selected methods that use synthetic aperture radar (SAR) image data to remotely sense ocean surfaces are described in this chapter. Fully polarimetric SAR radars provide much more usable information than conventional single-polarization radars. Algorithms, presented here, to measure directional wave spectra, wave slopes, wave–current interactions, and current-driven surface features use this additional information. Polarimetric techniques that measure directional wave slopes and spectra with data collected from a single aircraft, or satellite, collection pass are described here. Conventional single-polarization backscatter cross-section measurements require two orthogonal passes and a complex SAR modulation transfer function (MTF) to determine vector slopes and directional wave spectra. The algorithm to measure wave spectra is described in Section 13.2. In the azimuth (flight) direction, wave-induced perturbations of the polarimetric orientation angle are used to sense the azimuth component of the wave slopes. In the orthogonal range direction, a technique involving an alpha parameter from the well-known Cloude–Pottier ) polarimetric decomposition theorem is entropy/anisotropy/averaged alpha (H/A/ used to measure the range slope component. Both measurement types are highly sensitive to ocean wave slopes and are directional. Together, they form a means of using polarimetric SAR image data to make complete directional measurements of ocean wave slopes and wave slope spectra. NASA Jet Propulsion Laboratory airborne SAR (AIRSAR) P-, L-, and C-band data obtained during flights over the coastal areas of California are used as wave-field examples. Wave parameters measured using the polarimetric methods are compared with those obtained using in situ NOAA National Data Buoy Center (NDBC) buoy products. In a second topic (Section 13.3), polarization orientation angles are used to remotely sense ocean wave slope distribution changes caused by ocean wave–current interactions. The wave–current features studied include surface manifestations of ocean internal waves and wave interactions with current fronts. A model [1], developed at the Naval Research Laboratory (NRL), is used to determine the parametric dependencies of the orientation angle on internal wave current, windwave direction, and wind-wave speed. An empirical relation is cited to relate orientation angle perturbations to the underlying parametric dependencies [1]. A third topic (Section 13.4) deals with the detection and classification of biogenic slick fields. Various techniques, using the Cloude–Pottier decomposition and Wishart classifier, are used to classify the slicks. An application utilizing current-driven ocean features, marked by slick patterns, is used to map spiral eddies. Finally, a related technique, using the polarimetric orientation angle, is used to segment slick fields from ocean wave slopes.

13.2 13.2.1

Measurement of Directional Slopes and Wave Spectra Single Polarization versus Fully Polarimetric SAR Techniques

SAR systems conventionally use backscatter intensity-based algorithms [2] to measure physical ocean wave parameters. SAR instruments, operating at a single-polarization, measure wave-induced backscatter cross section, or sigma-0, modulations that can be

Polarimetric SAR Techniques for Remote Sensing of the Ocean Surface

269

developed into estimates of surface wave slopes or wave spectra. These measurements, however, require a parametrically complex MTF to relate the SAR backscatter measurements to the physical ocean wave properties [3]. Section 13.2.3 through Section 13.2.6 outline a means of using fully polarimetric SAR (POLSAR) data with algorithms [4] to measure ocean wave slopes. In the Fourier-transform domain, this orthogonal slope information is used to estimate a complete directional ocean wave slope spectrum. A parametrically simple measurement of the slope is made by using POLSAR-based algorithms. Modulations of the polarization orientation angle, u, are largely caused by waves traveling in the azimuth direction. The modulations are, to a lesser extent, also affected by range traveling waves. A method, originally used in topographic measurements [5], has been applied to the ocean and used to measure wave slopes. The method measures vector components of ocean wave slopes and wave spectra. Slopes smaller than 18 are measurable for ocean surfaces using this method. , described in Ref. An eigenvector or eigenvalue decomposition average parameter [6], is used to measure wave slopes in the orthogonal range direction. Waves in the range direction cause modulation of the local incidence angle f, which, in turn, changes . The alpha parameter is ‘‘roll-invariant.’’ This means that it is not affected the value of by slopes in the azimuth direction. Likewise, for ocean wave measurements, the orientation angle u parameter is largely insensitive to slopes in the range direction. An , u) is, therefore, capable of measuring slopes in any algorithm employing both ( direction. The ability to measure a physical parameter in two orthogonal directions within an individual resolution cell is rare. Microwave instruments, generally, must have a two-dimensional (2D) imaging or scanning capability to obtain information in two orthogonal directions. Motion-induced nonlinear ‘‘velocity-bunching’’ effects still present difficulties for wave measurements in the azimuth direction using POLSAR data. These difficulties are dealt with by using the same proven algorithms [3,7] that reduce nonlinearities for singlepolarization SAR measurements.

13.2.2

Single-Polarization SAR Measurements of Ocean Surface Properties

SAR systems have previously been used for imaging ocean features such as surface waves, shallow-water bathymetry, internal waves, current boundaries, slicks, and ship wakes [8]. In all of these applications, the modulation of the SAR image intensity by the ocean feature makes the feature visible in the image [9]. When imaging ocean surface waves, the main modulation mechanisms have been identified as tilt modulation, hydrodynamic modulation, and velocity bunching [2]. Tilt modulation is due to changes in the local incidence angle caused by the surface wave slopes [10]. Tilt modulation is strongest for waves traveling in the range direction. Hydrodynamic modulation is due to the hydrodynamic interactions between the long-scale surface waves and the short-scale surface (Bragg) waves that contribute most of the backscatter at moderate incidence angles [11]. Velocity bunching is a modulation process that is unique to SAR imaging systems [12]. It is a result of the azimuth shifting of scatterers in the image plane, owing to the motion of the scattering surface. Velocity bunching is the highest for azimuth traveling waves. In the past, considerable effort had gone into retrieving quantitative surface wave information from SAR images of ocean surface waves [13]. Data from satellite SAR missions, such as ERS 1 and 2 and RADARSAT 1 and 2, had been used to estimate surface wave spectra from SAR image information. Generally, wave height and wave

270

Signal and Image Processing for Remote Sensing

slope spectra are used as quantitative overall descriptors of the ocean surface wave properties [14]. Over the years, several different techniques have been developed for retrieving wave spectra from SAR image spectra [7,15,16]. Linear techniques, such as those having a linear MTF, are used to relate the wave spectrum to the image spectrum. Individual MTFs are derived for the three primary modulation mechanisms. A transformation based on the MTF is used to retrieve the wave spectrum from the SAR image spectrum. Since the technique is linear, it does not account for any nonlinear processes in the modulation mechanisms. It has been shown that SAR image modulation is nonlinear under certain ocean surface conditions. As the sea state increases, the degree of nonlinear behavior generally increases. Under these conditions, the linear methods do not provide accurate quantitative estimates of the wave spectra [15]. Thus, the linear transfer function method has limited utility and can be used as a qualitative indicator. More accurate estimates of wave spectra require the use of nonlinear inversion techniques [15]. Several nonlinear inversion techniques have been developed for retrieving wave spectra from SAR image spectra. Most of these techniques are based on a technique developed in Ref. [7]. The original method used an iterative technique to estimate the wave spectrum from the image spectrum. Initial estimates are obtained using a linear transfer function similar to the one used in Ref. [15]. These estimates are used as inputs in the forward SAR imaging model, and the revised image spectrum is used to iteratively correct the previous estimate of the wave spectra. The accuracy of this technique is dependent on the specific SAR imaging model. Improvements to this technique [17] have incorporated closed-form descriptions of the nonlinear transfer function, which relates the wave spectrum to the SAR image spectrum. However, this transfer function also has to be evaluated iteratively. Further improvements to this method have been suggested in Refs. [3,18]. In this method, a cross-spectrum is generated between different looks of the same ocean wave scene. The primary advantage of this method is that it resolves the 1808 ambiguity [3,18] of the wave direction. This method also reduces the effects of speckle in the SAR spectrum. Methods that incorporate additional a posteriori information about the wave field, which improves the accuracy of these nonlinear methods, have also been developed in recent years [19]. In all of the slope-retrieval methods, the one nonlinear mechanism that may completely destroy wave structure is velocity bunching [3,7]. Velocity bunching is a result of moving scatterers on the ocean surface either bunching or dilating in the SAR image domain. The shifting of the scatterers in the azimuth direction may, in extreme conditions, result in the destruction of the wave structure in the SAR image. SAR imaging simulations were performed at different range-to-velocity (R/V) ratios to study the effect of velocity bunching on the slope-retrieval algorithms. When the (R/V) ratio is artificially increased to large values, the effects of velocity bunching are expected to destroy the wave structure in the slope estimates. Simulations of the imaging process for a wide range of radar-viewing conditions indicate that the slope structure is preserved in the presence of moderate velocity-bunching modulation. It can be argued that for velocity bunching to affect the slope estimates, the (R/V) ratio has to be significantly larger than 100 s. The two data sets discussed here are designated ‘‘Gualala River’’ and ‘‘San Francisco.’’ The Gualala river data set has the longest waves and it also produces the best results. The R/V ratio for the AIRSAR missions was 59 s (Gualala) and 55 s (San Francisco). These values suggest that the effects of velocity bunching are present, but are not sufficiently strong to significantly affect the slope-retrieval process. However, for spaceborne SAR imaging applications, where the (R / V) ratio may be greater than 100 s, the effects of velocity bunching may limit the utility of all methods, especially in high sea states.

Polarimetric SAR Techniques for Remote Sensing of the Ocean Surface 13.2.3

271

Measurement of Ocean Wave Slopes Using Polarimetric SAR Data

In this section, the techniques that were developed for the measurement of ocean surface slopes and wave spectra using the capabilities of fully polarimetric radars are discussed. Wave-induced perturbations of the polarization orientation angle are used to directly measure slopes for azimuth traveling waves. This technique is accurate for scattering from surface resolution cells where the sea return can be represented as a two-scale Bragg-scattering process. 13.2.3.1

Orientation Angle Measurement of Azimuth Slopes

It has been shown [5] that by measuring the orientation angle shift in the polarization signature, one can determine the effects of the azimuth surface tilts. In particular, the shift in the orientation angle is related to the azimuth surface tilt, the local incidence angle, and, to a lesser degree, the range tilt. This relationship is derived [20] and independently verified [6] as tan v tan u ¼ (13:1) sin f tan g cos f where u, tan v, tan g, and f are the shifts in the orientation angle, the azimuth slope, the ground range slope, and the radar look angle, respectively. According to Equation 13.1, the azimuth tilts may be estimated from the shift in the orientation angle, if the look angle and range tilt are known. The orthogonal range slope tan g can be estimated using the value of the local incidence angle associated with the alpha parameter for each pixel. The azimuth slope tan v and the range slope tan g provide complete slope information for each image pixel. For the ocean surface at scales of the size of the AIRSAR resolution cell (6.6 m 8.2 m), the averaged tilt angles are small and the denominator in Equation 13.1 may be approximated by sin f for a wide range of look angles, cos f, and ground range slope, tan g, values. Under this approximation, the ocean azimuth slope, tan v, is written as tan v ﬃ ( sin f) tan u

(13:2)

The above equation is important because it provides a direct link between polarimetric SAR measurable parameters and physical slopes on the ocean surface. This estimation of ocean slopes relies only on (1) the knowledge of the radar look angle (generally known from the SAR viewing geometry) and (2) the measurement of the wave-perturbed orientation angle. In ocean areas where the average scattering mechanism is predominantly tilted-Bragg scatter, the orientation angle can be measured accurately for angular changes <18, as demonstrated in Ref. [20]. POLSAR data can be represented by the scattering matrix for single-look complex data and by the Stokes matrix, the covariance matrix, or the coherency matrix for multi-look data. An orientation angle shift causes rotation of all these matrices about the line of sight. Several methods have been developed to estimate the azimuth slope–induced orientation angles for terrain and ocean applications. The ‘‘polarization signature maximum’’ method and the ‘‘circular polarization’’ method have proven to be the two most effective methods. Complete details of these methods and the relation of the orientation angle to orthogonal slopes and radar parameters are given [21,22]. 13.2.3.2

Orientation Angle Measurement Using the Circular-Pol Algorithm

Image processing was done with both the polarization signature maximum and the circular polarization algorithms. The results indicate that for ocean images a significant improvement in wave visibility is achieved when a circular polarization algorithm is chosen. In addition to this improvement, the circular polarization algorithm is

272

Signal and Image Processing for Remote Sensing

computationally more efficient. Therefore, the circular polarization algorithm method was chosen to estimate orientation angles. The most sensitive circular polarization estimator [21], which involves RR (right-hand transmit, right-hand receive) and LL (left-hand transmit, left-hand receive) terms, is * i) þ p]=4 u ¼ [Arg(hSRR SLL

(13:3)

A linear-pol basis has a similar transmit-and-receive convention, but the terms (HH, VV, HV, VH) involve horizontal (H) and vertical (V) transmitted, or received, components. The known relations between a circular-pol basis and a linear-pol basis are SRR ¼ (SHH SVV þ i2SHV )=2 SLL ¼ (SVV SHH þ i2SHV )=2

(13:4)

Using the above equation, the Arg term of Equation 13.3 can be written as 1 * i 4Re h(SHH SVV )SHV A u ¼ Arg(hSRR S*LL i) ¼ tan1 @ hjSHH SVV j2 i þ 4hjSHV j2 i 0

(13:5)

The above equation gives the orientation angle, u, in terms of three of the terms of the linear-pol coherency matrix. This algorithm has been proven to be successful in Ref. [21]. An example of the accuracy is cited from related earlier studies involving wave–current interactions [1]. In these studies, it has been shown that small wave slope asymmetries could be accurately detected as changes in the orientation angle. These small asymmetries had been predicted by theory [1] and their detection indicates the sensitivity of the circular-pol orientation angle measurement. 13.2.4

Ocean Wave Spectra Measured Using Orientation Angles

NASA/JPL/AIRSAR data were taken (1994) at L-band imaging a northern California coastal area near the town of Gualala (Mendocino County) and the Gualala River. This data set was used to determine if the azimuth component of an ocean wave spectrum could be measured using orientation angle modulation. The radar resolution cell had dimensions of 6.6 m (range direction) and 8.2 m (azimuth direction), and 3 3 boxcar averaging was done to the data inputted into the orientation angle algorithms. Figure 13.1 is an L-band, VV-pol, pseudo color-coded image of a northern California coastal area and the selected measurement study site. A wave system with an estimated dominant wavelength of 157 m is propagating through the site with a wind-wave direction of 3068 (estimates from wave spectra, Figure 13.4). The scattering geometry for a single average tilt radar resolution cell is shown in Figure 13.2. Modulations in the polarization orientation angle induced by azimuth traveling ocean waves in the study area are shown in Figure 13.3a and a histogram of the orientation angles is given in Figure 13.3b. An orientation angle spectrum versus wave number for azimuth direction waves propagating in the study area is given in Figure 13.4. The white rings correspond to ocean wavelengths of 50, 100, 150, and 200 m. The dominant 157-m wave is propagating at a heading of 3068. Figure 13.5a and Figure 13.5b give plots of spectral intensity versus wave number (a) for wave-induced orientation angle modulations and (b) for single-polarization (VV-pol)-intensity modulations. The plots are of wave spectra taken in the direction that maximizes the dominant wave peak. The orientation angle–dominant wave peak

Polarimetric SAR Techniques for Remote Sensing of the Ocean Surface

273

N

Northern California Pacific Ocean

Wave direction (306ⴗ)

Gualala River

Range Study−site box (512 512)

Flight direction (azimuth) FIGURE 13.1 (See color insert following page 302.) An L-band, VV-pol, AIRSAR image, of northern California coastal waters (Gualala River dataset), showing ocean waves propagating through a study-site box.

(Figure 13.5a) has a significantly higher signal and background ratio than the conventional intensity-based VV-pol-dominant wave peak (Figure 13.5b). Finally, the orientation angles measured within the study sites were converted into azimuth direction slopes using an average incidence angle and Equation 13.2. From the estimates of these values, the ocean rms azimuth slopes were computed. These values are given in Table 13.1.

Signal and Image Processing for Remote Sensing

274

z

V

Ra

dar

H

Incidence plane

I1

N (Surface normal)

f (Ground range)

y

(A

zim

ut

h)

Tilted ocean resolution cell

x , I2 FIGURE 13.2 Geometry of scattering from a single, tilted, resolution cell. In the Gualala River dataset, the resolution cell has dimensions 6.6 m (range direction) and 8.2 m (azimuth direction).

(a)

Study area: orientation angles

FIGURE 13.3 (a) Image of modulations in the orientation angle, u, in the study site and (b) a histogram of the distribution of study-site u values. (continued)

Polarimetric SAR Techniques for Remote Sensing of the Ocean Surface

275

(b) 1.2 x 104

Occurence number

1.0 x 104 8.0 x 103 6.0 x 103 4.0 x 103 2.0 x 103 0

–6

–4

–2 0 2 Orientation angle (deg)

4

6

FIGURE 13.3 (continued)

13.2.5

Two-Scale Ocean-Scattering Model: Effect on the Orientation Angle Measurement

In Section 13.2.3.2, Equation 13.5 is given for the orientation angle. This equation gives the orientation angle u as a function of three terms from the polarimetric coherency matrix T. Scattering has only been considered as occurring from a slightly rough, tilted surface equal to or greater than the size of the radar resolution cell (see Figure 13.2). The surface is planar and has a single tilt us. This section examines the effects of having a distribution of azimuth tilts, p (w), within the resolution cell, rather than a single averaged tilt. For single-look or multi-look processed data, the coherency matrix is defined as 2

D E jSHH þ SVV j2

3 h(SHH þ SVV )(SHH SVV )*i 2 (SHH þ SVV )S*HV D E 7 16 6 7 T ¼ hkk*T i ¼ 6 h(SHH SVV )(SHH þ SVV )*i 2 (SHH SVV )S*HV 7 jSHH SVV j2 5 24 D E 2hSHV (SHH þ SVV )*i 2hSHV (SHH SVV )*i 4 jSHV j2 2

3 S þ SVV 1 4 HH with k ¼ pﬃﬃﬃ SHH SVV 5 2 2SHV

(13:6)

We now follow the approach of Cloude and Pottier [23]. The composite surface consists of flat, slightly rough ‘‘facets,’’ which are tilted in the azimuth direction with a distribution of tilts, p(w): 1 jw sj b p(w) ¼ 2b (13:7) 0 otherwise where s is the average surface azimuth tilt of the entire radar resolution cell. The effect on T of having both (1) a mean bias azimuthal tilt us and (2) a distribution of azimuthal tilts b has been calculated in Ref. [22] as 2

3 A B sin c(2b) cos 2us B sin c(2b) sin 2us 5 T ¼ 4 B* sin c(2b) cos 2us 2C( sin2 2us þ sin c(4b) cos 4us ) C(1 2 sin c(4b)) sin 4us 2 B* sin c(2b) sin 2us C(1 2 sin c(4b)) sin 4us 2C( cos 2us sin c(4b) cos 4us ) (13:8)

Signal and Image Processing for Remote Sensing

276

50 m

Dominant wave: 157 m 100 m

200 m

150 m

Wave direction 306ⴗ FIGURE 13.4 (See color insert following page 302.) Orientation angle spectra versus wave number for azimuth direction waves propagating through the study site. The white rings correspond to 50, 100, 150, and 200 m. The dominant wave, of wavelength 157 m, is propagating at a heading of 3068.

where sin c(x) is ¼ sin(x)/x and A ¼ jSHH þ SVV j2 ,

B ¼ (SHH þ SVV )(S*HH S*VV ), C ¼ 0:5jSHH SVV j2

Equation 13.8 reveals the changes due to the tilt distribution (b) and bias (us) that occur in all terms, except in the term A ¼ jSHH þ SVVj2, which is roll-invariant. In the corresponding expression for the orientation angle, all of the other terms, except the denominator term hjSHH SVVj2i, are modified ! 4Re h(SHH SVV )S*HV i 1 u ¼ Arg(hSRR S*LL i) ¼ tan (13:9) hjSHH SVV j2 i þ 4hjSHV j2 i From Equation 13.8 and Equation 13.9, it can be determined that the exact estimation of the orientation angle u becomes more difficult as the distribution of ocean tilts b becomes stronger and wider because the ocean surface becomes progressively rougher.

Polarimetric SAR Techniques for Remote Sensing of the Ocean Surface

277

L-band orientation angle wave spectra

(a)

Spectral intensity

0.50

0.40

0.30

0.20

0.10

0.00

(b)

0.06 0.02 0.04 Azimuthal wave number

0.08

L-band, VV-intensity wave spectra 0.50

Spectral intensity

0.40

0.30

0.20

0.10 0.00 0.00

0.02

0.06 0.08 0.04 Azimuthal wave number

0.10

FIGURE 13.5 Plots of spectral intensity versus wave number (a) for wave-induced orientation angle modulations and (b) for VV-pol intensity modulations. The plots are taken in the propagation direction of the dominant wave (3068).

13.2.6

Alpha Parameter Measurement of Range Slopes

A second measurement technique is needed to remotely sense waves that have significant propagation direction components in the range direction. The technique must be more sensitive than current intensity-based techniques that depend on tilt and hydrodynamic modulations. Physically based POLSAR measurements of ocean slopes in the range direction are achieved using a technique involving the ‘‘alpha’’ parameter of the Cloude–Pottier polarimetric decomposition theorem [23]. 13.2.6.1

Cloude–Pottier Decomposition Theorem and the Alpha Parameter

The Cloude–Pottier entropy, anisotropy, and the alpha polarization decomposition theorem [23] introduce a new parameterization of the eigenvectors of the 3 3 averaged coherency matrix hjTji in the form

Signal and Image Processing for Remote Sensing

278 TABLE 13.1 Northern California: Gualala Coastal Results

In Situ Measurement Instrument Bodega Bay, CA 3-m Point Arena, CA Discus Buoy 46013 Wind Station

Parameter Dominant wave period (s) Dominant wavelength (m) Dominant wave direction (8) rms slopes azimuth direction (8) rms slopes range direction (8) Estimate of wave height (m)

10.0

Orientation Angle Method

N/A

Alpha Angle Method

156 From period, depth 320 Est. from wind direction N/A

10.03 From dominant wave number N/A 157 From wave spectra 284 Est. from 306 From wave wind direction spectra N/A 1.58

10.2 From dominant wave number 162 From wave spectra 306 From wave spectra N/A

N/A

N/A

N/A

1.36

2.4 Significant wave height

N/A

2.16 Est. from rms 1.92 Est. from rms slope, wave number slope, wave number

Date: 7/15/94; data start time (UTC): 20:04:44 (BB, PA), 20:02:98 (AIRSAR); wind speed: 1.0 m/s (BB), 2.9 m/s (PA) Mean ¼ 1.95 m/s; wind direction: 3208 (BB), 2848 (PA), mean ¼ 3028; Buoy: ‘‘Bodega Bay’’ (46013) ¼ BB; location: 38.23 N 123.33 W; water depth: 122.5 m; wind station: ‘‘Point Arena’’ (PTAC – 1) ¼ PA; location: 38.968 N, 123.74 W; study-site location: 38839.60 N, 123835.80 W.

2

l1 hjTji ¼ [U3 ] 4 0 0

3 0 0 5 [U3 ]*T l3

0 l2 0

(13:10)

where 2

cos a1 [U3 ] ¼ ejf 4 sin a1 cos b1 ejd1 sin a1 sin b1 ejg1

cos a2 ejf2 sin a2 cos b2 ejd2 sin a2 sin b2 ejg2

3 cos a3 ejf3 sin a3 cos b3 ejd3 5 sin a3 sin b3 ejg3

(13:11)

The average estimate of the alpha parameter is a ¼ P1 a 1 þ P 2 a 2 þ P 3 a 3

(13:12)

where li Pi ¼ Pj¼3 j¼1

: lj

(13:13)

The individual alphas are for the three eigenvectors and the Ps are the probabilities defined with respect to the eigenvalues. In this method, the average alpha is used and is, a. For the ocean backscatter, the contributions to the for simplicity, defined as average alpha are dominated, however, by the first eigenvalue or eigenvector term. The alpha parameter, developed from the Cloude–Pottier polarimetric scattering decomposition theorem [23], has desirable directional measurement properties. It is (1) roll-invariant in the azimuth direction and (2) in the range direction, it is highly sensitive to wave-induced modulations of f in the local incidence angle f. Thus, the

Polarimetric SAR Techniques for Remote Sensing of the Ocean Surface

279

alpha parameter is well suited for measuring wave components traveling in the range direction, and discriminates against wave components traveling in the azimuth direction. 13.2.6.2 Alpha Parameter Sensitivity to Range Traveling Waves The alpha angle sensitivity to range traveling waves may be estimated using the small perturbation scattering model (SPSM) as a basis. For flat, slightly rough scattering areas that can be characterized by Bragg scattering, the scattering matrix has the form

S¼

SHH 0

0

SVV

(13:14)

Bragg-scattering coefficients SVV and SHH are given by

SHH

qﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ «r sin2 fi qﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ ¼ cos fi þ «r sin2 fi cos fi

and

SVV ¼

(«r 1)( sin2 fi «r (1 þ sin2 fi ) ) qﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ 2 «r cos fi þ «r sin2 fi

(13:15)

The alpha angle is defined such that the eigenvectors of the coherency matrix T are parameterized by a vector k as 2 3 cos a k ¼ 4 sin a cos bejd 5 sin a sin bejg (13:16) For Bragg scattering, one may assume that there is only one dominant eigenvector (depolarization is negligible) and the eigenvector is given by 2

3 SVV þ SHH k ¼ 4 SVV SHH 5 0

(13:17)

Since there is only one dominant eigenvector, for Bragg scattering, a ¼ a1. For a horizontal, slightly rough resolution cell, the orientation angle b ¼ 0, and d may be set to zero. With these constraints, comparing Equation 13.16 and Equation 13.17 yields tan a ¼

SVV SHH SVV þ SHH

(13:18)

For « ! 1, SVV ¼ 1 þ sin2 fi and SHH ¼ cos2 fi

(13:19)

tan a ¼ sin2 fi

(13:20)

which yields

Figure 13.6 shows the alpha angle as a function of incidence angle for « ! 1 (blue) and for (red) « ¼ 80–70j, which is a representative dielectric constant of sea water. The sensitivity to alpha values to incidence angle changes (this is effectively the polarimetric

Signal and Image Processing for Remote Sensing

280 45 40

Alpha angle (deg)

35 30 25 20 15 10 5 0 0

10

20

30

40

50

60

70

80

90

Incidence angle (deg) FIGURE 13.6 (See color insert following page 302.) Small perturbation model dependence of alpha on the incidence angle. The red curve is for a dielectric constant representative of sea water (80–70j) and the blue curve is for a perfectly conducting surface.

MTF) as the range slope estimation is dependent on the derivative of a with respect to fi. For « ! 1 Da sin 2fi ¼ Dfi 1 þ sin4 fi

(13:21)

Figure 13.7 shows this curve (blue) and the exact curve for « ¼ 80–70j (red). Note that for the typical AIRSAR range of incidence angles (20–608), across the swath, the effective MTF is high (>0.5). 13.2.6.3

Alpha Parameter Measurement of Range Slopes and Wave Spectra

Model studies [6] result in an estimate of what the parametric relation a versus the incidence angle f should be for an assumed Bragg-scatter model. The sensitivity (i.e., the slope of the curve of a(f)) was large enough (Figure 13.6) to warrant investigation using real POLSAR ocean backscatter data. In Figure 13.8a, a curve of a versus the incidence angle f is given for a strip of Gualala data in the range direction that has been averaged 10 pixels in the azimuth direction. This curve shows a high sensitivity for the slope of a(f). Figure 13.8b gives a histogram of the frequency of occurrence of the alpha values. The curve of Figure 13.8a was smoothed by utilizing a least-square fit of the a(f) data to a third-order polynomial function. This closely fitting curve was used to transform the a values into corresponding incidence angle f perturbations. Pottier [6] used a model-based approach and fitted a third-order polynomial to the a(f) (red curve) of Figure 13.6 instead of using the smoothed, actual, image a(f) data. A distribution of f values has been made and the rms range slope value has been determined. The rms range slope values for the data sets are given in Table 13.1 and Table 13.2.

Polarimetric SAR Techniques for Remote Sensing of the Ocean Surface

281

0.9 0.8

Derivative of alpha (f )

0.7 0.6 0.5 0.4 0.3 0.2 0.1 0

0

10

20

30 40 50 60 70 Incidence angle (F ) (deg)

80

90

FIGURE 13.7 (See color insert following page 302.) Derivative of alpha with respect to the incidence angle. The red curve is for a sea water dielectric and the blue curve is for a perfectly conducting surface.

Finally, to measure an alpha wave spectrum, an image of the study area is formed with the mean of a(f) removed line by line in the range direction. An FFT of the study area results in the wave spectrum that is shown in Figure 13.9. The spectrum of Figure 13.9 is an alpha spectrum in the range direction. It can be converted to a range direction wave slope spectrum by transforming the slope values obtained from the smoothed alpha, a(f), values.

Alpha (deg)

(a)

40

20

0 20

30

40 Incidence angle (deg)

50

60

FIGURE 13.8 Empirical determination of the (a) sensitivity of the alpha parameter to the radar incidence angle (for Gualala River data) and (b) a histogram of the alpha values occurring within the study site. (continued)

Signal and Image Processing for Remote Sensing

282 (b) 1.4 × 104

Alpha value occurrences

1.2 × 104 1.0 × 104 8.0 × 103 6.0 × 103 4.0 × 103 2.0 × 103 0 –6

–4

–2 0 2 4 Alpha parameter perturbation (deg)

6

FIGURE 13.8 (continued) (b) a histogram of the alpha values occurring within the study site.

13.2.7

Measured Wave Properties and Comparisons with Buoy Data

The ocean wave properties estimated from the L- and P-band SAR data sets and the algorithms are the (1) dominant wavelength, (2) dominant wave direction, (3) rms slopes (azimuth and range), and (4) average dominant wave height. The NOAA NDBC buoys provided data on the (1) dominant wave period, (2) wind speed and direction, (3) significant wave height, and (4) wave classification (swell and wind waves). Both the Gualala and the San Francisco data sets involved waves classified as swell. Estimates of the average wave period can be determined either from buoy data or from the SAR-determined dominant wave number and water depth (see Equation 13.22). The dominant wavelength and direction are obtained from the wave spectra (see Figure 13.4 and Figure 13.9). The rms slopes in the azimuth direction are determined from the distribution of orientation angles converted to slope angles using Equation 13.2. The rms slopes in the range direction are determined by the distribution of alpha angles converted to slope angles using values of the smoothed curve fitted to the data of Figure 13.8a. Finally, an estimate of the average wave height, Hd, of the dominant wave was made using the peak-to-trough rms slope in the propagation direction Srms and the dominant wavelength ld. The estimated average dominant wave height was then determined from tan (Srms) ¼ Hd/(ld/2). This average dominant wave height estimate was compared with the (related) significant wave height provided by the NDBC buoy. The results of the measurement comparisons are given in Table 13.1 and Table 13.2. 13.2.7.1

Coastal Wave Measurements: Gualala River Study Site

For the Gualala River data set, parameters were calculated to characterize ocean waves present in the study area. Table 13.1 gives a summary of the ocean parameters that were determined using the data set as well as wind conditions and air and sea temperatures at the nearby NDBC buoy (‘‘Bodega Bay’’) and wind station (‘‘Point Arena’’) sites. The most important measured SAR parameters were rms wave slopes (azimuth and range directions), rms wave height, dominant wave period, and dominant wavelength. These quantities were estimated using the full-polarization data and the NDBC buoy data.

Open Ocean: Pacific Swell Results In Situ Measurement Instrument Parameter

San Francisco, CA, 3 m Discus Buoy 46026

Half Moon Bay, CA, 3 m Discus Buoy 46012

Dominant wave period (s)

15.7

15.7

Dominant wavelength (m) Dominant wave direction (8)

376 From period, depth 289 Est. from wind direction N/A N/A 3.10 Significant wave height

364 From period, depth 280 Est. from wind direction N/A N/A 2.80 Significant wave height

rms slopes azimuth direction (8) rms slopes range direction (8) Estimate of wave height (m)

Orientation Angle Method

Alpha Angle Method

15.17 From dominant wave number 359 From wave spectra 265 From wave spectra

15.23 From dominant wave number 362 From wave spectra 265 From wave spectra

0.92 N/A 2.88 Est. from rms slope, wave number

N/A 0.86 2.72 Est. from rms slope, wave number

Date: 7/17/88; data start time (UTC): 00:45:26 (Buoys SF, HMB), 00:52:28 (AIRSAR); wind speed: 8.1 m/s (SF), 5.0 m/s (HMB), mean ¼ 6.55 m/s; wind direction: 2898 (SF), 2808 (HMB), mean ¼ 284.58; Buoys: ‘‘San Francisco’’ (46026) ¼ SF; location: 37.75 N 122.82 W; water depth: 52.1 m; ‘‘Half Moon Bay’’ (46012) ¼ HMB; location: 37.36 N 122.88 W; water depth: 87.8 m.

Polarimetric SAR Techniques for Remote Sensing of the Ocean Surface

TABLE 13.2

283

Signal and Image Processing for Remote Sensing

284

Wave specturm

Dominant wave: 162 m

50 m

200 m

100 m

150 m

Wave direction 306ⴗ

FIGURE 13.9 (See color insert following page 302.) Spectrum of waves in the range direction using the alpha parameter from the Cloude–Pottier decomposition method. Wave direction is 3068 and dominant wavelength is 162 m.

The dominant wave at the Bodega Bay buoy during the measurement period is classified as a long wavelength swell. The contribution from wind wave systems or other swell components is small relative to the single dominant wave system. Using the surface gravity wave dispersion relation, one can calculate the dominant wavelength at this buoy location where the water depth is 122.5 m. The dispersion relation for surface water waves at finite depth is v2W ¼ gkW tan h(kH)

(13:22)

where vW is the wave frequency, kW is the wave number (2p/l), and H is the water depth. The calculated value for l is given in Table 13.1. A spectral profile similar to Figure 13.5a was developed for the alpha parameter technique and a dominant wave was measured having a wavelength of 156 m and a propagation direction of 3068. Estimates of the ocean parameters obtained using the orientation angle and alpha angle algorithms are summarized in Table 13.1. 13.2.7.2 Open-Ocean Measurements: San Francisco Study Site AIRSAR P-band image data were obtained for an ocean swell traveling in the azimuth direction. The location of this image was to the west of San Francisco Bay. It is a valuable

Polarimetric SAR Techniques for Remote Sensing of the Ocean Surface

285

N

Wave direction

FIGURE 13.10 P-band span image of near-azimuth traveling (2658) swell in the Pacific Ocean off the coast of California near San Francisco.

data set because its location is near two NDBC buoys (‘‘San Francisco’’ and ‘‘Half-Moon Bay’’). Figure 13.10 gives a SPAN image of the ocean scene. The long-wavelength swell is clearly visible. The covariance matrix data was first Lee-filtered to reduce speckle noise [24] and was then corrected radiometrically. A polarimetric signature was developed for a 512 512 segment of the image and some distortion was noted. Measuring the distribution of the phase between the HH-pol and VV-pol backscatter returns eliminated this distortion. For the ocean, this distribution should have a mean nearly equal to zero. The recalibration procedure set the mean to zero and the distortion in the polarimetric signature was corrected. Figure 13.11a gives a plot of the spectral intensity (cross-section modulation) versus the wave number in the direction of the dominant wave propagation. Figure 13.11b presents a spectrum of orientation angles versus the wave number. The major peak, caused by the visible swell, in both plots occurs at a wave number of 0.0175 m1 or a wavelength of 359 m. Using Equation 13.22, the dominant wavelength was calculated at the San Francisco/Half Moon Bay buoy positions and depths. Estimates of the wave parameters developed from this data set using the orientation and alpha angle algorithms are presented in Table 13.2.

Signal and Image Processing for Remote Sensing

286 (a) 0.20

Spectral intensity

0.15

0.10

0.05

0.00 0.00

(b)

0.02

0.04 Azimuthal wave number

0.06

0.08

0.02

0.04 Azimuthal wave number

0.06

0.08

0.20

Spectral intensity

0.15

0.10

0.05

0.00 0.00

FIGURE 13.11 (a) Wave number spectrum of P-band intensity modulations and (b) a wave number spectrum of orientation angle modulations. The plots are taken in the propagation direction of the dominant wave (2658).

13.3 13.3.1

Polarimetric Measurement of Ocean Wave–Current Interactions Introduction

Studies have been carried out on the use of polarization orientation angles to remotely sense ocean wave slope distribution changes caused by wave–current interactions. The wave–current features studied here involve the surface manifestations of internal waves [1,25,26–29,30–32] and wave modifications at oceanic current fronts. Studies have shown that polarimetric SAR data may be used to measure bare surface roughness [33] and terrain topography [34,35]. Techniques have also been developed for measuring directional ocean wave spectra [36].

Polarimetric SAR Techniques for Remote Sensing of the Ocean Surface

287

The polarimetric SAR image data used in all of the studies are NASA JPL/AIRSAR, P-, L-, and C-band, quad-pol microwave backscatter data. AIRSAR images of internal waves were obtained from the 1992 Joint US/Russia Internal Wave Remote Sensing Experiment (JUSREX’92) conducted in the New York Bight [25,26]. AIRSAR data on current fronts were obtained during the NRL Gulf Stream Experiment (NRL-GS’90). The NRL experiment is described in Ref. [20]. Extensive sea-truth is available for both of these experiments. These studies were motivated by the observation that strong perturbations occur in the polarization orientation angle u in the vicinity of internal waves and current fronts. The remote sensing of orientation angle changes associated with internal waves and current fronts are applications that have only recently been investigated [27–29]. Orientation angle changes should also occur for the related SAR application involving surface expressions of shallow-water bathymetry [37]. In the studies outlined here, polarization orientation angle changes are shown to be associated with wave–current interaction features. Orientation angle changes are not, however, produced by all types of ocean surface features. For example, orientation angle changes have been successfully used here to discriminate internal wave signatures from other ocean features, such as surfactant slicks, which produce no mean orientation angle changes.

13.3.2

Orientation Angle Changes Caused by Wave–Current Interactions

A study was undertaken to determine the effect that several important types of wave– current interactions have on the polarization orientation angle. The study involved both actual SAR data and an NRL theoretical model described in Ref. [38]. An example of a JPL/AIRSAR VV-polarization, L-band image of several strong, intersecting, internal wave packets is given in Figure 13.12. Packets of internal waves are generated from parent solitons as the soliton propagates into shallower water at, in this case, the continental shelf break. The white arrow in Figure 13.12 indicates the propagation direction of a wedge of internal waves (bounded by the dashed lines). The packet members within the area of this wedge were investigated. Radar cross-section (s0) intensity perturbations for the type of internal waves encountered in the New York Bight have been calculated in [30,31,39] and others. Related perturbations also occur in the ocean wave height and slope spectra. For the solitons often found in the New York Bight area, these perturbations become significantly larger for ocean wavelengths longer than about 0.25 m and shorter than 10–20 m. Thus, the study is essentially concerned with slope changes to meter-length wave scales. The AIRSAR slant range resolution cell size for these data is 6.6 m, and the azimuth resolution cell size is 12.1 m. These resolutions are fine enough for the SAR backscatter to be affected by perturbed wave slopes (meter-length scales). The changes in the orientation angle caused by these wave perturbations are seen in Figure 13.13. The magnitude of these perturbations covers a range u ¼ [1 to þ1]. The orientation angle perturbations have a large spatial extent (>100 m for the internal wave soliton width). The hypothesis assumed was that wave–current interactions make the meter wavelength slope distributions asymmetric. A profile of orientation angle perturbations caused by the internal wave study packet is given in Figure 13.14a. The values are obtained along the propagation vector line of Figure 13.12. Figure 13.14b gives a comparison of the orientation angle profile (solid line) and a normalized VV-pol backscatter intensity profile (dotted-dash line) along the same interval. Note that the orientation angle positive peaks (white stripe areas, Figure 13.13) align with the negative

Signal and Image Processing for Remote Sensing

288

Range direction

Azimuth direction

α

Internal waves packet propagation vector

FIGURE 13.12 AIRSAR L-band, VV-pol image of internal wave intersecting packets in the New York Bight. The arrow indicates the propagation direction for the chosen study packet (within the dashed lines). The angle a relates to the SAR/ packet coordinates. The image intensity has been normalized by the overall average.

troughs (black areas, Figure 13.12). In the direction orthogonal to the propagation vector, every point is averaged 5 5 pixels along the profile. The ratio of the maximum of u caused by the soliton to the average values of u within the ambient ocean is quite large. The current-induced asymmetry creates a mean wave slope that is manifested as a mean orientation angle. The relation between the tangent of the orientation angle u, wave slopes in the radar azimuth and ground range directions (tan v, tan g), and the radar look angle f from [21] is given by Equation 13.1, and for a given look angle f the average orientation angle tangent is

Polarimetric SAR Techniques for Remote Sensing of the Ocean Surface

–1.0

0

289

+1.0

Orientation angle FIGURE 13.13 The orientation angle image of the internal wave packets in the New York Bight. The area within the wedge (dashed lines) was studied intensively.

htan ui ¼

ðp ðp

tan u(v,g) P(v,g) dg dv

(13:23)

0 0

where P(v,g) is the joint probability distribution function for the surface slopes in the azimuth and range directions. If the slopes are zero-meaned, but P(v,g) is skewed, then the mean orientation angle may not be zero even though the mean azimuth and range slopes are zero. It is evident from the above equation that both the azimuth and the range

Signal and Image Processing for Remote Sensing

290 (a)

1.0

Orientation angle (deg)

Internal waves Ambient ocean

0.5

0.0

–0.5

–1.0 0

200

400

600

800

Distance along line (pixels)

Normalized orientation angle and VV-pol backscatter intensity

(b)

1.0

0.5

0.0

–0.5 Orientation angle VV-pol intensity

–1.0 0

200

400

600

800

Distance along line (pixels) FIGURE 13.14 (a) The orientation angle value profile along the propagation vector for the internal wave study packet of Figure 13.12 and (b) a comparison of the orientation angle profile (solid line) and a normalized VV-pol backscatter intensity profile (dot–dash line). Note that the orientation angle positive peaks (white areas, Figure 13.13) align with the negative troughs (black areas, Figure 13. 11).

slopes have an effect on the mean orientation angle. The azimuth slope effect is generally larger because it is not reduced by the cos f term, which only affects the range slope. If, for instance, the meter-wavelength waves are produced by a broad wind-wave spectrum, then both v and g change locally. This yields a nonzero mean for the orientation angle. Figure 13.15 gives a histogram of orientation angle values (solid line) for a box inside the black area of the first packet member of the internal wave. A histogram for the ambient ocean orientation angle values for a similar-sized box near the internal wave is given by the dot–dash–dot line in Figure 13.15. Notice the significant difference in the mean value of these two distributions. The mean change in htan (u)i inferred from the bias for the perturbed area within the internal wave is 0.03 rad, corresponding to a u value of 1.728. The mean water wave slope changes needed to cause such orientation angle changes are estimated from Equation 13.1. In the denominator of Equation 13.1, the value of tan(g) cos(f) sin(f) for the value f ( ¼ 518) at the packet member location. Using this approximation, the ensemble average of Equation 13.1 provides the mean azimuth slope value,

Polarimetric SAR Techniques for Remote Sensing of the Ocean Surface

291

150

Internal wave

Occurrences

100

Ambient ocean

50

0 –0.10

–0.15 –0.00 –0.05 Mean orientation angle tangent (radians)

–0.10

FIGURE 13.15 Distributions of orientation angles for the internal wave (solid line) and the ambient ocean (dot–dash–dot line).

htan(v)i ﬃ sin (f)htan(u)i

(13:24)

From the data provided in Figure 13.15, htan(v)i ¼ 0.0229 rad or v ¼ 1.328. A slope value of this magnitude is in approximate agreement with slope changes predicted by Lyzenga et al. [32] for internal waves in the same area during an earlier experiment (SARSEX, 1988). 13.3.3

Orientation Angle Changes at Ocean Current Fronts

An example of orientation angle changes induced by a second type of wave–current interaction, the convergent current front, is given in Figure 13.16a and Figure 13.16b. This image was created using AIRSAR P-band polarimetric data. The orientation angle response to this (NRL-GS’90) Gulf-Stream convergent-current front is the vertical white linear feature in Figure 13.16a and the sharp peak in Figure 13.16b. The perturbation of the orientation angle at, and near, the front location is quite strong relative to angle fluctuations in the ambient ocean. The change in the orientation angle maximum is ﬃ0.688. Other fronts in the same area of the Gulf Stream have similar changes in the orientation angle. 13.3.4

Modeling SAR Images of Wave–Current Interactions

To investigate wave–current-interaction features, a time-dependent ocean wave model has been developed that allows for general time-varying current, wind fields, and depth [20,38]. The model uses conservation of the wave action to compute the propagation of a statistical wind-wave system. The action density formalism that is used and an outline of the model are both described in Ref. [38]. The original model has been extended [1] to include calculations of polarization orientation angle changes due to wave–current interactions.

Signal and Image Processing for Remote Sensing

292 (a)

Front location

(b) 1.0

Orientation angle (deg)

Orientation angle response to current front 0.5

0.0

–0.5 0

2

Distance (km)

4

6

FIGURE 13.16 Current front within the Gulf Stream. An orientation angle image is given in (a) and orientation angle values are plotted in (b) (for values along the white line in (a)).

Model predictions have been made for the wind-wave field, radar return, and perturbation of the polarization orientation angle due to an internal wave. A model of the surface manifestation of an internal wave has also been developed. The algorithm used in the model has been modified from its original form to allow calculation of the polarization orientation angle and its variation throughout the extent of the soliton current field at the surface.

Polarimetric SAR Techniques for Remote Sensing of the Ocean Surface

293

Orientation angle tangent maximum variation

6 Primary Reponse (0.25–10.0 m)

5 4 3 2 1 0 0

5

20 10 15 Ocean wavelength (m)

25

FIGURE 13.17 The internal wave orientation angle tangent maximum variation as a function of ocean wavelength as predicted by the model. The primary response is in the range of 0.25–10.0 m and is in good agreement with previous studies of sigma-0. (From Thompson, D.R., J. Geophys. Res., 93, 12371, 1988.)

The values of both RCS (hs0i) and htan(u)i are computed by the model. The dependence of htan(u)i on the perturbed ocean wavelength was calculated by the model. This wavelength dependence is shown in Figure 13.17. The waves resonantly perturb htan(u)i for wavelengths in the range of 0.25–10.0 m. This result is in good agreement with previous studies of sigma-0 resonant perturbations for the JUSREX’92 area [39]. Figure 13.18a and Figure 13.18b show the form of the soliton current speed dependence of hs0i and htan(u)i. The potentially useful near-linear relation of htan(u)iv with current U (Figure 13.18b) is important in applications where determination of current gradients is the goal. The near-linear nature of this relationship provides the possibility that, from the value of htan(u)iv, the current magnitude can be estimated. Examination of the model results has led to the following empirical model of the variation of htan(u)i as: htan ui ¼ f (U, w, uw ) ¼ (aU) (w2 ebw ) sin (ajcw j þ bc2w )

(13:25)

where U, the surface current maximum speed (in m/s), w, the wind speed (in m/s) at (standard) 19.5 m height, and cw, the wind direction (in radians) relative to the soliton propagation direction. The constants are a ¼ 0.00347, b ¼ 0.365, a ¼ 0.65714, and b ¼ 0.10913. The range of cw is over [p,p]. Using Equation 13.25, the dashed curve in Figure 13.18 can be generated to show good agreement relative to the complete model. The solid lines in Figure 13.18 represent results from the complete model and the dashed lines are results from the empirical relation of Equation 13.25. This relation is much simpler than conventional estimates based on perturbation of the backscatter intensity. The scaling for the relationship is a relatively simple function of the wind speed and the direction of the locally wind-driven sea. If the orientation angle and wind measurements are available, then Equation 13.25 allows the internal wave current maximum U to be calculated.

13.4 13.4.1

Ocean Surface Feature Mapping Using Current-Driven Slick Patterns Introduction

Biogenic and man-made slicks are widely dispersed throughout the oceans. Current driven surface features, such as spiral eddies, can be made visible by associated patterns of slicks [40]. A combined algorithm using the Cloude–Pottier decomposition and the Wishart classifier

Signal and Image Processing for Remote Sensing

294

Current speed dependence

(a)

Max RCS variation (dB)

–18 –20 –22 –24 –26 –28 –30 –32 0.0

Max 〈tan(θ)〉 variation

(b)

0.2 0.4 Peak current speed (m / s)

0.6

0.008

0.006

0.004

0.002

0.000 0.0

0.2 0.4 Peak current speed (m/s) L-band, VV-pol, wind = (6.0 m/s, 135°)

0.6

FIGURE 13.18 (a and b) Model development of the current speed dependence of the max RCS and htan(u)i variations. The dashed line in Figure 13.23b gives the values predicted by an empirical equation in Ref. [38].

[41] is utilized to produce accurate maps of slick patterns and to suppress the background wave field. This technique uses the classified slick patterns to detect spiral eddies. Satellite SAR instruments performing wave spectral measurements, or operating as wind scatterometers, regard the slicks as a measurement error term. The classification maps produced by the algorithm facilitate the flagging of slick-contaminated pixels within the image. Aircraft L-band AIRSAR data (4/2003) taken in California coastal waters provided data on features that contained spiral eddies. The images also included biogenic slick patterns, internal wave packets, wind waves, and long wave swell. The temporal and spatial development of spiral eddies is of considerable importance to oceanographers. Slick patterns are used as ‘‘markers’’ to detect the presence and extent of spiral eddies generated in coastal waters. In a SAR image, the slicks appear as black distributed patterns of lower return. The slick patterns are most prevalent during periods of low to moderate winds. The spatial distribution of the slicks is determined by local surface current gradients that are associated with the spiral eddies. It has been determined that biogenic surfactant slicks may be identified and classified using SAR polarimetric decompositions. The purpose of the decomposition is to discriminate against other features such as background wave systems. The

Polarimetric SAR Techniques for Remote Sensing of the Ocean Surface

295

) of the Cloude–Pottier parameters entropy (H), anisotropy (A), and average alpha ( decomposition [23] were used in the classification. The results indicate that biogenic slick patterns, classified by the algorithm, can be used to detect the spiral eddies. The decomposition parameters were also used to measure small-scale surface roughness as well as larger-scale rms slope distributions and wave spectra [4]. Examples of slope distributions are given in Figure 13.8b and that of wave spectra in Figure 13.9. Small-scale roughness variations that were detected by anisotropy changes are given in Figure 13.19. This figure shows variations in anisotropy at low wind speeds for a filament of colder, trapped water along the northern California coast. The air–sea stability has changed for the region containing the filament. The roughness changes are not seen in (a) the conventional VV-pol image but are clearly visible in (b) an anisotropy image. The data are from coastal waters near the Mendocino Co. town of Gualala. Finally, the classification algorithm may also be used to create a flag for the presence of slicks. Polarimetric satellite SAR systems (e.g., RADARSAT-2, ALOS/PALSAR, SIR-C) attempting to measure wave spectra, or scatterometers measuring wind speed and direction can avoid using slick contaminated data. In April 2003, the NRL and the NASA Jet Propulsion Laboratory (JPL) jointly carried out a series of AIRSAR flights over the Santa Monica Basin off the coast of California. Backscatter POLSAR image data at P-, L-, and C-bands were acquired. The purpose of the flights was to better understand the dynamical evolution of spiral eddies, which are

N

Anisotropy-related Roughness variations

California coast

California coast

Cold water mass

Pacific Ocean

L-band, VV-pol image

Pacific Ocean

Anisotropy-A image

FIGURE 13.19 (See color insert following page 302.) (a) Variations in anisotropy at low wind speeds for a filament of colder, trapped water along the northern California coast. The roughness changes are not seen in the conventional VV-pol image, but are clearly visible in (b) an anisotropy image. The data are from coastal waters near the Mendocino Co. town of Gualala.

Signal and Image Processing for Remote Sensing

296

generated in this area by interaction of currents with the Channel Islands. Sea-truth was gathered from a research vessel owned by the University of California at Los Angeles (UCLA). The flights yielded significant data not only on the time history of spiral eddies but also on surface waves, natural surfactants, and internal wave signatures. The data were analyzed using a polarimetric technique, the Cloude–Pottier hH/A/ai decomposition given in Ref. [23]. In Figure 13.20a, the anisotropy is again mapped for a study site

Anisotropy-A image

f = 23

f = 62 0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

L-band, VV-pol image FIGURE 13.20 (See color insert following page 302.) (a) Image of anisotropy values. The quantity, 1A, is proportional to small-scale surface roughness and (b) a conventional L-band, VV-pol image of the study area.

Polarimetric SAR Techniques for Remote Sensing of the Ocean Surface

297

east of Catalina Island, CA. For comparison, a VV-pol image is given in Figure 13.20b. The slick field is reasonably well mapped by anisotropy—but the image is noisy because of the difference in the two small second and third eigenvalues that are used to compute it.

13.4.2

Classification Algorithm

The overall purpose of the field research effort outlined in Section 12.4.1 was to create a means of detecting ocean features such as spiral eddies using biogenic slicks as markers, while suppressing other effects such as wave fields and wind-gradient effects. A polarimetric classification algorithm [41–43] was tested as a candidate means to create such a feature map. 13.4.2.1 Unsupervised Classification of Ocean Surface Features Van Zyl [44] and Freeman–Durden [46] developed unsupervised classification algorithms that separate the image into four classes: odd-bounce, even bounce, diffuse (volume), and an in-determinate class. For an L-band image, the ocean surface typically is dominated by the characteristics of the Bragg-scattering odd (single) bounce. City buildings and structures have the characteristics of even (double) scattering, and heavy forest vegetation has the characteristics of diffuse (volume) scattering. Consequently, this classification algorithm provides information on the terrain scatterer type. For a refined separation into more classes, Pottier [6] proposed an unsupervised classification algorithm based on their target decomposition theory. The medium’s scattering mechanisms, characterized by average alpha angle, and later anisotropy A, were used for classification. entropy H, The entropy H is a measure of randomness of the scattering mechanisms, and the alpha angle characterizes the scattering mechanism. The unsupervised classification is achieved plane, which is segmented into by projecting the pixels of an image onto the H– scattering zones. The zones for the Gualala study-site data are shown in Figure 13.21. Details of this segmentation are given in Ref. [6]. In the alpha–entropy scattering zone map of the decomposition, backscatter returns from the ocean surface normally occur in the lowest (dark blue color) zone of both alpha and entropy. Returns from slick covered values, and occur in both the lowest areas have higher entropy H and average alpha zone and higher zones. 13.4.2.2

Classification Using Alpha–Entropy Values and the Wishart Classifier

Classification of the image was initiated by creating an alpha–entropy zone scatterplot to angle and level of entropy H for scatterers in the slick study area. determine the Secondly, the image was classified into eight distinct classes using the Wishart classifier [41]. The alpha–entropy decomposition method provides good image segmentation based on the scattering characteristics. The algorithm used is a combination of the unsupervised decomposition classifier and the supervised Wishart classifier [41]. One uses the segmented image of the decomposition method to form training sets as input for the Wishart classifier. It has been noted that , especially in the multi-look data are required to obtain meaningful results in H and entropy H. In general, 4-look processed data are not sufficient. Normally, additional averaging (e.g., 55 boxcar filter), either of the covariance or of coherency matrices, has computation. This prefiltering is done on all the to be performed prior to the H and . Initial classification data. The filtered coherency matrix is then used to compute H and is made using the eight zones. This initial classification map is then used to train the Wishart classification. The reclassified result shows improvement in retaining details.

Signal and Image Processing for Remote Sensing

Alpha angle

298

Entropy parameter FIGURE 13.21 (See color insert following page 302.) Alpha-entropy scatter plot for the image study area. The plot is divided into eight color-coded scattering classes for the Cloude–Pottier decomposition described in Ref. [6].

Further improvement is possible by using several iterations. The reclassified image is then used to update the cluster centers of the coherency matrices. For the present data, two iterations of this process were sufficient to produce good classifications of the complete biogenic fields. Figure 13.22 presents a completed classification map of the biogenic slick fields. Information is provided by the eight color-code classes in the image in Figure 13.22. The returns from within the largest slick (labeled as A) have classes that progressively increase in both average alpha and entropy as a path is made from clean water inward toward the center of the slick. Therefore, the scattering becomes less surfacelike ( increase) and also becomes more depolarized (H increase) as one approaches the center of the slick (Figure 13.22, Label A). The algorithm outlined above may be applied to an image containing large-scale ocean features. An image (JPL/CM6744) of classified slick patterns for two-linked spiral eddies near Catalina Island, CA, is given in Figure 13.23b. An L-band, HH-pol image is presented in Figure 13.23a for comparison. The Pacific swell is suppressed in areas where there are no slicks. The waves do, however, appear in areas where there are slicks because the currents associated with the orbital motion of the waves alternately compress or expand the slick-field density. Note the dark slick patch to the left of label A in Figure 13.23a and Figure 13.23b. This patch clearly has strongly suppressed the backscatter at HH-pol. The corresponding area of Figure 13.23b has been classified into three classes and colors (Class 7—salmon, Class 5—yellow, and Class 2—dark green), which indicate progressive increases in scattering complexity and depolarization as one moves from the perimeter of the slick toward its interior. A similar change in scattering occurs to the left of label B near the center of Figure 13.23a and Figure 13.23b. In this case, as one moves

Polarimetric SAR Techniques for Remote Sensing of the Ocean Surface

299

A

Class 1 Class 2 Class 3 Class 4 Class 5 Class 7 Class 8 f = 23

Clean surface

Slicks trapped within internal wave packet

Slicks

f = 62

FIGURE 13.22 (See color insert following page 302.) scattering classes. Classification of the slick-field image into H/

(a)

(b)

A

A

Spiral eddy

B

B Class 1 Class 2

Spiral eddy

Class 3 Class 4 Class 5 Class 7 Class 8

L-band, HH-pol image

Wishartand H-Alpha algorithm classified image

FIGURE 13.23 (See color insert following page 302.) (a) L-band, HH-pol image of a second study image (CM6744) containing two strong spiral eddies marked by natural biogenic slicks and (b) classification of the slicks marking the spiral eddies. The image features were values combined with the Wishart classifier. classified into eight classes using the H–

Signal and Image Processing for Remote Sensing

300

from the perimeter into the slick toward the center, the classes and colors (Class 7— salmon, Class 4—light green, Class 1—white) also indicate progressive increases in scattering complexity and depolarization. 13.4.2.3

Comparative Mapping of Slicks Using Other Classification Algorithms

The question arises whether or not the algorithm using entropy-alpha values with the Wishart classifier is the best candidate for unsupervised detection and mapping of slick fields. Two candidate algorithms were suggested as possible competitive classification )– methods. These were (1) the Freeman–Durden decomposition [45] and (2) the (H/A/ Wishart segmentation algorithm [42,43], which introduce anisotropy to the parameter mix because of its sensitivity to ocean surface roughness. Programs were developed to investigate the slick classification capabilities of these candidate algorithms. The same amount of averaging (5 5) and speckle reduction was done for all of the algorithms. The results with the Freeman–Durden classification were poor at both L- and C-bands. Nearly all of the returns were surface, single-bounce scatter. This is expected because the Freeman– Durden decomposition was developed on the basis of scattering models of land features. This method could not discriminate between waves and slicks and did not improve on the results using conventional VV or HH polarization. )–Wishart segmentation algorithm was investigated to take advantage of The (H/A/ the small-scale roughness sensitivity of the polarimetric anisotropy A. The anisotropy is shown (Figure 13.20b) to be very sensitive to slick patterns across the whole image. The )–Wishart segmentation method expands the number of classes from 8 to 16 by (H/A/ including the anisotropy A. The best way to introduce information about A in the classification procedure is to carry out two successive Wishart classifier algorithms. The . Each class in the H/ plane is then further divided first classification only involves H/ into two classes according to whether the pixel’s anisotropy values are greater than 0.5 or less than 0.5. The Wishart classifier is then employed a second time. Details of this )–Wishart method algorithm are given in Ref. [42,43]. The results of using the (H/A/ and iterating it twice are given in Figure 13.24. )–Wishart method resulted in Classification of the slick-field image using the (H/A/ 14 scattering classes. Two of the expected 16 classes were suppressed. The Classes 1–7 corresponded to anisotropy A values from 0.5 to 1.0 and the Classes 8–14 corresponded to anisotropy A values from 0.0 to 0.49. The new two lighter blue vertical features at the lower right of the image appeared in all images involving anisotropy and were thought to be a smooth slick of the lower surfactant material concentration. This algorithm was an –Wishart algorithm for slick mapping. All of the slickimprovement relative to the H/ covered areas were classified well and the unwanted wave field intensity modulations were suppressed.

13.5

Conclusions

Methods that are capable of measuring ocean wave spectra and slope distributions in both the range and azimuth directions were described. The new measurements are sensitive and provide nearly direct measurements of ocean wave spectra and slopes without the need for a complex MTF. The orientation modulation spectrum has a higher dominant wave peak and background ratio than the intensity-based spectrum. The results

Polarimetric SAR Techniques for Remote Sensing of the Ocean Surface

301

Class 1 2 3 4 5 6 7 8 9 10 11 12 13 14 FIGURE 13.24 (See color insert following page 302.) 14 scattering classes. The Classes 1–7 correspond to anisotropy Classification of the slick-field image into H/A/ A values 0.5 to 1.0 and the Classes 8–14 correspond to anisotropy A values 0.0 to 0.49. The two lighter blue vertical features at the lower right of the image appear in all images involving anisotropy and are thought to be smooth slicks of lower concentration.

determined for the dominant wave direction, wavelength, and wave height are comparable to the NDBC buoy measurements. The wave slope and wave spectra measurement methods that have been investigated may be developed further into fully operational algorithms. These algorithms may then be used by polarimetric SAR instruments, such as ALOS/PALSAR and RADARSAT II, to monitor sea-state conditions globally. Secondly, this work has investigated the effect of internal waves and current fronts on the SAR polarization orientation angle. The results provide a potential (1) independent means for identifying these ocean features and (2) a method of estimating the mean value of the surface current and slope changes associated with an internal wave. Simulations of the NRL wave–current interaction model [38] have been used to identify and quantify the different variables such as current speed, wind speed, and wind direction, which determine changes in the SAR polarization orientation angle. The polarimetric scattering properties of biogenic slicks have been found to be different from those of the clean surface wave field and the slicks may be separated from this background wave field. Damping of capillary waves, in the slick areas, lowers all of the eigenvalues of the decomposition and increases the average alpha angle, entropy, and the anisotropy. The Cloude–Pottier polarimetric decomposition was also used as a new means of studying scattering properties of surfactant slicks perturbed by current-driven surface features. The features, for example, spiral eddies, were marked by filament patterns of

302

Signal and Image Processing for Remote Sensing

slicks. These slick filaments were physically smoother. Backscatter from them was more complex (three eigenvalues nearly equal) and was more depolarized. Anisotropy was found to be sensitive to small-scale ocean surface roughness, but was not a function of large-scale range or azimuth wave slopes. These unique properties provided an achievable separation of roughness scales on the ocean surface at low wind speeds. Changes in anisotropy due to surfactant slicks were found to be measurable across the entire radar swath. Finally, polarimetric SAR decomposition parameters alpha, entropy, and anisotropy were used as an effective means for classifying biogenic slicks. Algorithms, using these parameters, were developed for the mapping of both slick fields and ocean surface features. Selective mapping of biogenic slick fields may be achieved using either the entropy or the alpha parameters with the Wishart classifier or, by the entropy, anisotropy, or the alpha parameters with the Wishart classifier. The latter algorithm gives the best results overall. Slick maps made using this algorithm are of use for satellite scatterometers and wave spectrometers in efforts aimed at flagging ocean surface areas that are contaminated by slick fields.

References 1. Schuler, D.L., Jansen, R.W., Lee, J.S., and Kasilingam, D., Polarisation orientation angle measurements of ocean internal waves and current fronts using polarimetric SAR, IEE Proc. Radar, Sonar Navigation, 150(3), 135–143, 2003. 2. Alpers, W., Ross, D.B., and Rufenach, C.L., The detectability of ocean surface waves by real and synthetic aperture radar, J. Geophys. Res., 86(C7), 6481, 1981. 3. Engen, G. and Johnsen, H., SAR-ocean wave inversion using image cross-spectra, IEEE Trans. Geosci. Rem. Sens., 33, 1047, 1995. 4. Schuler, D.L., Kasilingam, D., Lee, J.S., and Pottier, E., Studies of ocean wave spectra and surface features using polarimetric SAR, Proc. Int. Geosci. Rem. Sens. Symp. (IGARSS’03), Toulouse, France, IEEE, 2003. 5. Schuler, D.L., Lee, J.S., and De Grandi, G., Measurement of topography using polarimetric SAR Images, IEEE Trans. Geosci. Rem. Sens., 34, 1266, 1996. 6. Pottier, E., Unsupervised classification scheme and topography derivation of POLSAR data on the <> polarimetric decomposition theorem, Proc. 4th Int. Workshop Radar Polarimetry, IRESTE, Nantes, France, 535–548, 1998. 7. Hasselmann, K. and Hasselmann, S., The nonlinear mapping of an ocean wave spectrum into a synthetic aperture radar image spectrum and its inversion, J. Geophys. Res., 96(10), 713, 1991. 8. Vesecky, J.F. and Stewart, R.H., The observation of ocean surface phenomena using imagery from SEASAT synthetic aperture radar—an assessment, J. Geophys. Res., 87, 3397, 1982. 9. Beal, R.C., Gerling, T.W., Irvine, D.E., Monaldo, F.M., and Tilley, D.G., Spatial variations of ocean wave directional spectra from the SEASAT synthetic aperture radar, J. Geophys. Res., 91, 2433, 1986. 10. Valenzuela, G.R., Theories for the interaction of electromagnetic and oceanic waves—a review, Boundary Layer Meteorol., 13, 61, 1978. 11. Keller, W.C. and Wright, J.W., Microwave scattering and straining of wind-generated waves, Radio Sci., 10, 1091, 1975. 12. Alpers, W. and Rufenach, C.L., The effect of orbital velocity motions on synthetic aperture radar imagery of ocean waves, IEEE Trans. Antennas Propagat., 27, 685, 1979. 13. Plant, W.J. and Zurk, L.M., Dominant wave directions and significant wave heights from SAR imagery of the ocean, J. Geophys. Res., 102(C2), 3473, 1997.

Polarimetric SAR Techniques for Remote Sensing of the Ocean Surface

303

14. Hasselmann, K., Raney, R.K., Plant, W.J., Alpers, W., Shuchman, R.A., Lyzenga, D.R., Rufenach, C.L., and Tucker, M.J., Theory of synthetic aperture radar ocean imaging: a MARSEN view, J. Geophys. Res., 90, 4659, 1985. 15. Lyzenga, D.R., An analytic representation of the synthetic aperture radar image spectrum for ocean waves, J. Geophys. Res., 93(13), 859, 1998. 16. Kasilingam, D. and Shi, J., Artificial neural network based-inversion technique for extracting ocean surface wave spectra from SAR images, Proc. IGARSS’97, Singapore, IEEE, 1193–1195, 1997. 17. Hasselmann, S., Bruning, C., Hasselmann, K., and Heimbach, P., An improved algorithm for the retrieval of ocean wave spectra from synthetic aperture radar image spectra, J. Geophys. Res., 101, 16615, 1996. 18. Lehner, S., Schulz-Stellenfleth, Schattler, B., Breit, H., and Horstmann, J., Wind and wave measurements using complex ERS-2 SAR wave mode data, IEEE Trans. Geosci. Rem. Sens., 38(5), 2246, 2000. 19. Dowd, M., Vachon, P.W., and Dobson, F.W., Ocean wave extraction from RADARSAT synthetic aperture radar inter-look image cross-spectra, IEEE Trans. Geosci. Rem. Sens., 39, 21–37, 2001. 20. Lee, J.S., Jansen, R., Schuler, D., Ainsworth, T., Marmorino, G., and Chubb, S., Polarimetric analysis and modeling of multi-frequency SAR signatures from Gulf Stream fronts, IEEE J. Oceanic Eng., 23, 322, 1998. 21. Lee, J.S., Schuler, D.L., and Ainsworth, T.L., Polarimetric SAR data compensation for terrain azimuth slope variation, IEEE Trans. Geosci. Rem. Sens., 38, 2153–2163, 2000. 22. Lee, J.S., Schuler, D.L., Ainsworth, T.L., Krogager, E., Kasilingam, D., and Boerner, W.M., The estimation of radar polarization shifts induced by terrain slopes, IEEE Trans. Geosci. Rem. Sens., 40, 30–41, 2001. 23. Cloude, S.R. and Pottier, E., A review of target decomposition theorems in radar polarimetry, IEEE Trans. Geosci. Rem. Sens., 34(2), 498, 1996. 24. Lee, J.S., Grunes, M.R., and De Grandi, G., Polarimetric SAR speckle filtering and its implication for classification, IEEE Trans. Geosci. Rem. Sens., 37, 2363, 1999. 25. Gasparovic, R.F., Apel, J.R., and Kasischke, E., An overview of the SAR internal wave signature experiment, J. Geophys. Res., 93, 12304, 1998. 26. Gasparovic, R.F., Chapman, R., Monaldo, F.M., Porter, D.L., and Sterner, R.F., Joint U.S./Russia internal wave remote sensing experiment: interim results, Applied Physics Laboratory Report S1R-93U-011, Johns Hopkins University, 1993. 27. Schuler, D.L., Kasilingam, D., and Lee, J.S., Slope measurements of ocean internal waves and current fronts using polarimetric SAR, European Conference on Synthetic Aperture Radar (EUSAR’2002), Cologne, Germany, 2002. 28. Schuler, D.L., Kasilingam, D., Lee, J.S., Jansen, R.W., and De Grandi, G., Polarimetric SAR measurements of slope distribution and coherence changes due to internal waves and current fronts, Proc. Int. Geosci. Rem. Sens. (IGARSS’2002) Symp., Toronto, Canada, 2002. 29. Schuler, D.L., Lee, J.S., Kasilingam, D., and De Grandi, G., Studies of ocean current fronts and internal waves using polarimetric SAR coherences, in Proc. Prog. Electromagnetic Res. Symp. (PIERS’2002), Cambridge, MA, 2002. 30. Alpers, W., Theory of radar imaging of internal waves, Nature, 314, 245, 1985. 31. Brant, P., Alpers, W., and Backhaus, J.O., Study of the generation and propagation of internal waves in the Strait of Gibraltar using a numerical model and synthetic aperture radar images of the European ERS-1 satellite, J. Geophys. Res., 101(14), 14237, 1996. 32. Lyzenga, D.R. and Bennett, J.R., Full-spectrum modeling of synthetic aperture radar internal wave signatures, J. Geophys. Res., 93(C10), 12345, 1988. 33. Schuler D.L., Lee, J.S., Kasilingam, D., and Nesti, G., Surface roughness and slope measurements using polarimetric SAR data, IEEE Trans. Geosci. Rem. Sens., 40(3), 687, 2002. 34. Schuler, D.L., Ainsworth, T.L., Lee, J.S., and De Grandi, G., Topographic mapping using polarimetric SAR data, Int. J. Rem. Sens., 35(5), 1266, 1998. 35. Schuler, D.L, Lee, J.S., Ainsworth, T.L., and Grunes, M.R., Terrain topography measurement using multi-pass polarimetric synthetic aperture radar data, Radio Sci., 35(3), 813, 2002. 36. Schuler, D.L. and Lee, J.S., A microwave technique to improve the measurement of directional ocean wave spectra, Int. J. Rem. Sens., 16, 199, 1995.

304

Signal and Image Processing for Remote Sensing

37. Alpers, W. and Hennings, I., A theory of the imaging mechanism of underwater bottom topography by real and synthetic aperture radar, J. Geophys. Res., 89, 10529, 1984. 38. Jansen, R.W., Chubb, S.R., Fusina, R.A., and Valenzuela, G.R., Modeling of current features in Gulf Stream SAR imagery, Naval Research Laboratory Report NRL/MR/7234-93-7401, 1993 39. Thompson, D.R., Calculation of radar backscatter modulations from internal waves, J. Geophys. Res., 93(C10), 12371, 1988. 40. Schuler, D.L., Lee, J.S., and De Grandi, G., Spiral eddy detection using surfactant slick patterns and polarimetric SAR image decomposition techniques, Proc. Int. Geosci. Rem. Sens. Symp. (IGARSS), Anchorage, Alaska, September, 2004. 41. Lee, J.S., Grunes, M.R., Ainsworth, T.L., Du, L.J., Schuler, D.L., and Cloude, S.R., Unsupervised classification using polarimetric decomposition and the complex Wishart classifier, IEEE Trans. Geosci. Rem. Sens., 37(5), 2249, 1999. 42. Pottier, E. and Lee, J.S., Unsupervised classification scheme of POLSAR images based on the complex Wishart distribution and the polarimetric decomposition theorem, Proc. 3rd Eur. Conf. Synth. Aperture Radar (EUSAR’2000), Munich, Germany, 2000. 43. Ferro-Famil, L., Pottier, E., and Lee, J-S, Unsupervised classification of multifrequency and fully polarimetric SAR images based on the H/A/Alpha-Wishart classifier, IEEE Trans. Geosci. Rem. Sens., 39(11), 2332, 2001. 44. Van Zyl, J.J., Unsupervised classification of scattering mechanisms using radar polarimetry data, IEEE Trans. Geosci. Rem. Sens., 27, 36, 1989. 45. Freeman, A., and Durden, S.L., A three component scattering model for polarimetric SAR data, IEEE Trans. Geosci. Rem. Sens., 36, 963, 1998.

14 MRF-Based Remote-Sensing Image Classification with Automatic Model Parameter Estimation

Sebastiano B. Serpico and Gabriele Moser

CONTENTS 14.1 Introduction ..................................................................................................................... 305 14.2 Previous Work on MRF Parameter Estimation ......................................................... 306 14.3 Supervised MRF-Based Classification......................................................................... 308 14.3.1 MRF Models for Image Classification ........................................................... 308 14.3.2 Energy Functions .............................................................................................. 309 14.3.3 Operational Setting of the Proposed Method .............................................. 310 14.3.4 The Proposed MRF Parameter Optimization Method................................ 311 14.4 Experimental Results...................................................................................................... 314 14.4.1 Experiment I: Spatial MRF Model for Single-Date Image Classification.......................................................................................... 317 14.4.2 Experiment II: Spatio-Temporal MRF Model for Two-Date Multi-Temporal Image Classification ............................................................ 317 14.4.3 Experiment III: Spatio-Temporal MRF Model for Multi-Temporal Classification of Image Sequences.................................... 321 14.5 Conclusions...................................................................................................................... 322 Acknowledgments ..................................................................................................................... 324 References ................................................................................................................................... 324

14.1

Introduction

Within remote-sensing image analysis, Markov random field (MRF) models represent a powerful tool [1], due to their ability to integrate contextual information associated with the image data in the analysis process [2,3,4]. In particular, the use of a global model for the statistical dependence of all the image pixels in a given image-analysis scheme typically turns out to be an intractable task. The MRF approach offers a solution to this issue, as it allows expressing a global model of the contextual information by using only local relations among neighboring pixels [2]. Specifically, due to the Hammersley– Clifford theorem [3], a large class of global contextual models (i.e., the Gibbs random fields, GRFs [2]) can be proved to be equivalent to local MRFs, thus sharply reducing the related model complexity. In particular, MRFs have been used for remote-sensing image analysis, for single-date [5], multi-temporal [6–8], multi-source [9], and multi-resolution [10] 305

306

Signal and Image Processing for Remote Sensing

classification, for denoising [1], segmentation [11–14], anomaly detection [15], texture extraction [2,13,16], and change detection [17–19]. Focusing on the specific problem of image classification, the MRF approach allows one to express a ‘‘maximum-a-posteriori’’ (MAP) decision task as the minimization of a suitable energy function. Several techniques have been proposed to deal with this minimization problem, such as the simulated annealing (SA), an iterative stochastic optimization algorithm converging to a global minimum of the energy [3] but typically involving long execution times [2,5], the iterative conditional modes (ICM), an iterative deterministic algorithm converging to a local (but usually good) minimum point [20] and requiring much shorter computation times than SA [2], and the maximization of posterior marginals (MPM), which approximates the MAP rule by maximizing the marginal posterior distribution of the class label of each pixel instead of the joint posterior distribution of all image labels [2,11,21]. However, an MRF model usually involves the use of one or more internal parameters, thus requiring a preliminary parameter-setting stage before the application of the model itself. In particular, especially in the context of supervised classification, interactive ‘‘trial-and-error’’ procedures are typically employed to choose suitable values for the model parameters [2,5,9,11,17,19], while the problem of fast automatic parameter-setting for MRF classifiers is still an open issue in the MRF literature. The lack of effective automatic parameter-setting techniques has represented a significant limitation on the operational use of MRF-based supervised classification architectures, although such methodologies are known for their ability to generate accurate classification maps [9]. On the other hand, the availability of the above-mentioned unsupervised parameter-setting procedures has contributed to an extensive use of MRFs for segmentation and unsupervised classification purposes [22,23]. In the present chapter, an automatic parameter optimization algorithm is proposed to overcome the above limitation in the context of supervised image classification. The method refers to a broad category of MRF models, characterized by energy functions expressed as linear combinations of different energy contributions (e.g., representing different typologies of contextual information) [2,7]. The algorithm exploits this linear dependence to formalize the parameter-setting problem as the solution of a set of linear inequalities, and addresses this problem by extending to the present context the Ho–Kashyap method for linear classifier training [24,25]. The well-known convergence properties of such a method and the absence of parameters to be tuned are among the good features of the proposed technique. The chapter is organized as follows. Section 14.2 provides an overview of the previous work on the problem of MRF parameter estimation. Section 14.3 describes the methodological issues of the method, and Section 14.4 presents the results of the application of the technique to the classification of real (both single-date and multi-temporal) remote sensing images. Finally, conclusions are drawn in Section 14.5.

14.2

Previous Work on MRF Parameter Estimation

Several parameter-setting algorithms have been proposed in the context of MRF models for image segmentation (e.g., [12,22,26,27]) or unsupervised classification [23,28,29], although often resulting in a considerable computational burden. In particular, the usual maximum likelihood (ML) parameter estimation approach [30] exhibits good theoretical consistency properties when applied to GRFs and MRFs [31], but turns out to be computationally very expensive for most MRF models [22] or even intractable [20,32,33], due to the difficulty of analytical computation and numerical maximization of the

MRF-Based Remote-Sensing Image Classification

307

normalization constants involved by the MRF-based distributions (the so-called ‘‘partition functions’’) [20,23,33]. Operatively, the use of ML estimates for MRF parameters turns out to be restricted just to specific typologies of MRFs, such as continuous Gaussian [1,13,15,34,35] or generalized Gaussian [36] fields, which allows an ML estimation task to be formulated in an analytically feasible way. Beyond such case-specific techniques, essentially three approaches have been suggested in the literature to address the problem of unsupervised ML estimation of MRF parameters indirectly, namely the Monte Carlo methods, the stochastic gradient approaches, or the pseudo-likelihood approximations [23]. The combination of the ML criterion with Monte Carlo simulations has been proposed to overcome the difficulty of computation of the partition function [22,37] and it provides good estimation results, but it usually involves long execution times. Stochastic-gradient approaches aim at maximizing the log-likelihood function by integrating a Gibbs stochastic sampling strategy into the gradient ascent method [23,38]. Combinations of the stochastic-gradient approach with the iterated conditional expectation (ICE) and with an estimation–restoration scheme have also been proposed in Refs. [12,29], respectively. Approximate pseudo-likelihood functionals have been introduced [20,23], and they are numerically feasible, although the resulting estimates are not actual ML estimates (except in the trivial noncontextual case of pixel independence) [20]. Moreover the pseudo-likelihood approximation may underestimate the interactions between pixels and can provide unsatisfactory results unless the interactions are suitably weak [21,23]. Pseudo-likelihood approaches have also been developed in conjunction with mean-field approximations [26], with the expectationmaximization (EM) [21,39], the Metropolis–Hastings [40], and the ICE [21] algorithms, with Monte Carlo simulations [41], or with multi-resolution analysis [42]. In Ref. [20] a pseudo-likelihood approach is plugged into the ICM energy minimization strategy, by iteratively alternating the update of the contextual clustering map and the update of the parameter values. In Ref. [10] this method is integrated with EM in the context of multiresolution MRFs. A related technique is the ‘‘coding method,’’ which is based on a pseudo-likelihood functional computed over a subset of pixels, although these subsets depend on the choice of suitable coding strategies [34]. Several empirical or ad hoc estimators have also been developed. In Ref. [33] a family of MRF models with polynomial energy functions is proved to be dense in the space of all MRFs and is endowed with a case-specific estimation scheme based on the method of moments. In Ref. [43] a least-square approach is proposed for Ising-type and Potts-type MRF models [22], and it formulates an overdetermined system of linear equations relating the unknown parameters with a set of relative frequencies of pixel-label configurations. The combination of this method with EM is applied in Ref. [44] for sonar image segmentation purposes. A simpler but conceptually similar approach is adopted in Ref. [12] and is combined with ICE: a simple algebraic empirical estimator for a one-parameter spatial MRF model is developed and it directly relates the parameter value with the relative frequencies of several class-label configurations in the image. On the other hand, the literature about MRF parameter-setting for supervised image classification is very limited. Any unsupervised MRF parameter estimator can also naturally be applied to a supervised problem, by simply neglecting the training data in the estimation process. However, only a few techniques have been proposed so far, effectively exploiting the available training information for MRF parameter-optimization purposes as well. In particular, a heuristic algorithm is developed in Ref. [7] aiming to optimize automatically the parameters of a multi-temporal MRF model for a joint supervised classification of two-date imagery, and a genetic approach is combined in Ref. [45] with simulated annealing [3] for the estimation of the parameters of a spatial MRF model for multi-source classification.

Signal and Image Processing for Remote Sensing

308

14.3

Supervised MRF-Based Classification

14.3.1

MRF Models for Image Classification

Let I ¼ {x1, x2, . . . , xN} be a given n-band remote-sensing image, modeled as a set of N identically distributed n-variate random vectors. We assume M thematic classes v1, v2, . . . , vM to be present in the image and we denote the resulting set of classes by V ¼ {v1, v2, . . . , vM} and the class label of the k-th image pixel (k ¼ 1, 2, . . . , N) by sk 2 V. By operating in the context of supervised image classification, we assume a training set to be available, and we denote the index set of the training pixels by T {1, 2, . . . , N} and the corresponding true class label of the k-th training pixel (k 2 T) by sk*. When collecting all the feature vectors of the N image pixels in a single (N n)-dimensional column vector1 X ¼ col[x1, x2, . . . , xN] and all the pixel labels in a discrete random vector S ¼ (s1, s2, . . . , SN) 2 VN, the MAP decision rule (i.e., the Bayes rule for minimum classification error [46]) assigns to the image ~, which maximizes the joint posterior probability P(SjX), that is, data X the label vector S ~ ¼ arg max P(SjX) ¼ arg max ½p(XjS)P(S) S S2VN

S2VN

(14:1)

where p(XjS) and P(S) are the joint probability density function (PDF) of the global feature vector X conditioned to the label vector S and the joint probability mass function (PMF) of the label vector itself, respectively. The MRF approach offers a computationally tractable solution to this maximization problem by passing from a global model for the statistical dependence of the class labels to a model of the local image properties, defined according to a given neighborhood system [2,3]. Specifically, for each k-th image pixel, a neighborhood Nk {1, 2, . . . , N} is assumed to be defined, such that, for instance, Nk includes the four (firstorder neighborhood) or the eight (second-order neighborhood) pixels surrounding the k-th pixel (k ¼ 1, 2, . . . , N). More formally, a neighborhood system is a collection {Nk}kN¼ 1 of subsets of pixels such that each pixel is outside its neighborhood (i.e., k 2 = Nk for all k ¼ 1, 2, . . . , N) and neighboring pixels are always mutually neighbors (i.e., k 2 Nh if and only if h 2 Nk for all k,h ¼ 1, 2, . . . , N, k ¼ 6 h). This simple discrete topological structure attached to the image data is exploited in the MRF framework to model the statistical relationships between the class labels of spatially distinct pixels and to provide a computationally affordable solution to the global MAP classification problem of Equation 14.1. Specifically, we assume the feature vectors x1, x2, . . . , xN to be conditionally independent and identically distributed with PDF p(xjs) (x 2 Rn, s 2 V), that is [2], p(XjS) ¼

N Y

p(xk jsk )

(14:2)

k¼1

and the joint prior PMF P(S) to be a Markov random field with respect to the abovementioned neighborhood system, that is [2,3], .

1

the probability distribution of each k-th image label, conditioned to all the other image labels, is equivalent to the distribution of the k-th label conditioned only to the labels of the neighboring pixels (k ¼ 1, 2, . . . , N):

All the vectors in the chapter are implicitly assumed to be column vectors, and we denote by ui the i-th component of an m-dimensional vector u 2 Rm (i ¼ 1, 2, . . . , m), by ‘‘col’’ the operator of column vector juxtaposition (i.e., col [u,v] is the vector obtained by stacking the two vectors u 2 Rm and v 2 Rn in a single (m þ n)-dimensional column vector), and by the superscript ‘‘T’’ the matrix transpose operator.

MRF-Based Remote-Sensing Image Classification

309

P{sk ¼ vi jsh : h 6¼ k} ¼ P{sk ¼ vi jsh : h 2 Nk }, .

i ¼ 1, 2, . . . , M

(14:3)

the PMF of S is a strictly positive function on VN, that is, P(S) > 0 for all S 2 VN.

The Markov assumption expressed by Equation 14.3 allows restricting the statistical relationships among the image labels to the local relationships inside the predefined neighborhood, thus greatly simplifying the spatial-contextual model for the label distribution as compared to a generic global model for the joint PMF of all the image labels. However, as stated in the next subsection, due to the so-called Hammersley–Clifford theorem, despite this strong analytical simplification, a large class of contextual models can be accomplished under the MRF formalism. 14.3.2

Energy Functions

Given the neighborhood system {Nk}kN¼ 1 , we denote by ‘‘clique’’ a set Q of pixels (Q {1, 2, . . . , N}) such that, for each pair (k, h) of pixels in Q, k and h turn out to be mutually neighbors, that is, k 2 Nh

() h 2 Nk

8k, h 2 Q,

k 6¼ h

(14:4)

By marking the collection of all the cliques in the adopted neighborhood system by Q and the vector of the pixel labels in the clique Q 2 Q (i.e., SQ ¼ col[sk:k 2 Q]) by SQ, the Hammersley–Clifford theorem states that the label configuration S is an MRF if and only if, for any clique Q 2 Q there exists a real-valued function VQ (SQ) (usually named ‘‘potential function’’) of the pixel labels in Q, so that the global PMF of S is given by the following Gibbs distribution [3]: P(S) ¼

1 Zprior

Uprior (S) exp , Q

where: Uprior (S) ¼

X

VQ (SQ )

(14:5)

Q2Q

Q is a positive parameter, and Zprior is a normalizing constant. Because of the formal similarity between the probability distribution in Equation 14.5 and the well-known Maxwell–Boltzmann distribution introduced in statistical mechanics for canonical ensembles [47], Uprior (), Q, and Zprior are usually named ‘‘energy function,’’ ‘‘temperature,’’ and ‘‘partition function,’’ respectively. A very large class of statistical interactions among spatially distinct pixels can be modeled in this framework, by simply choosing a suitable function Uprior (), which makes the MRF approach highly flexible. In addition, when coupling the Hammersley–Clifford formulation of the prior PMF P(S) with the conditional independence assumption stated for the conditional PDF p(XjS) (see Equation 14.2), an energy representation also holds for the global posterior distribution P(SjX), that is [2], Upost (SjX) P(SjX) ¼ exp Q Zpost,X 1

(14:6)

where Zpost,X is a normalizing constant and Upost () is a posterior energy function, given by: Upost (SjX) ¼ Q

N X k¼1

ln p(xk jsk ) þ

X

VQ (SQ )

(14:7)

Q2Q

This formulation of the global posterior probability allows addressing the MAP classification task as the minimization of the energy function Upost(jX), which is locally

Signal and Image Processing for Remote Sensing

310

defined, according to the pixelwise PDF of the feature vectors conditioned to the class labels and to the collection of the potential functions. This makes the maximization problem of Equation 14.1 tractable and allows a contextual classification map of the image I to be feasibly generated. However, the (prior and posterior) energy functions are generally parameterized by several real parameters l1, l2, . . . , lL. Therefore, the solution of the minimum-energy problem requires the preliminary selection of a proper value for the parameter vector l ¼ (l1, l2, . . . , lL)T. As described in the next subsections, in the present chapter, we address this parameter-setting issue with regard to a broad family of MRF models, which have been employed in the remote-sensing literature, and to the ICM minimization strategy for the energy function. 14.3.3

Operational Setting of the Proposed Method

Among the techniques proposed in the literature to deal with the task of minimizing the posterior energy function (see Section 14.1), in the present chapter ICM is adopted as a trade-off between the effectiveness of the minimization process and the computation time [5,7], and it is endowed with an automatic parameter-setting stage. Specifically, ICM is 0 T initialized with a given label vector S0 ¼ (s10, s20, . . . , sN ) (e.g., generated by a previously applied noncontextual supervised classifier) and it iteratively modifies the class labels to decrease the energy function. In particular, by marking by Ck the set of the neighbor labels of the k-th pixel (i.e., the ‘‘context’’ of the k-th pixel, Ck ¼ col[sh:h 2 Nk]), at the t-th ICM iteration, the label sk is updated according to the feature vector xk and to the current neighboring labels Ckt, so that skt þ 1 is the class label vi that minimizes a local energy function Ul (vijxk,Ckt) (the subscript l is introduced to stress the dependence of the energy on the parameter vector l). More specifically, given the neighborhood system, an energy representation can also be proved for the local distribution of the class labels, that is, (i ¼ 1, 2, . . . , M; k ¼ 1, 2, . . . , N) [9]: 1 Ul (vi jxk , Ck ) P{sk ¼ vi jxk , Ck } ¼ exp (14:8) Zkl Q where Zkl is a further normalizing constant and the local energy is defined by X VQ (SQ ) Ul (vi jxk , Ck ) ¼ Q ln p(xk jvi ) þ

(14:9)

Q3k

Here we focus on the family of MRF models whose local energy functions Ul() can be expressed as weighted sums of distinct energy contributions, that is, Ul (vi jxk , Ck ) ¼

L X

l‘ E ‘ (vi jxk , Ck ), k ¼ 1, 2, . . . , N

(14:10)

‘¼1

where E ‘() is the ‘-th contribution and the parameter l‘ plays the role of the weight of E ‘() (‘ ¼ 1, 2, . . . , L). A formal comparison between Equation 14.9 and Equation 14.10 suggests that one of the considered L contributions (say, E 1()) should be related to the pixelwise conditional PDF of the feature vector (i.e., E 1(vijxk) / ln p(xkjvi)), which formalizes the spectral information associated with each single pixel. From this viewpoint, postulating the presence of (L 1) further contributions implicitly means that (L 1) typologies of contextual information are modeled by the adopted MRF and are supposed to be separable (i.e., combined in a purely additive way) [7]. Formally speaking, Equation 14.10

MRF-Based Remote-Sensing Image Classification

311

represents a constraint on the energy function, but most MRF models employed in remote sensing for classification, change detection, or segmentation purposes belong to this category. For instance, the pairwise interaction model described in Ref. [2], the wellknown spatial Potts MRF model employed in Ref. [17] for change detection purposes and in Ref. [5] for hyperspectral data classification, the spatio-temporal MRF model introduced in Ref. [7] for multi-date image classification, and the multi-source classification models defined in Ref. [9] belong to this family of MRFs. 14.3.4

The Proposed MRF Parameter Optimization Method

With regard to the class of MRFs identified by Equation 14.10, the proposed method employs the training data to identify a set of parameters for the purpose of maximizing the classification accuracy by exploiting the linear relation between the energy function and the parameter vector. When focusing on the first ICM iteration, the k-th training sample (k 2 T) is correctly classified by the minimum-energy rule if: Ul (sk jxk , C0k ) Ul (vi jxk , C0k )

8vi 6¼ s*k

(14:11)

or equivalently: L X

l‘ E ‘ (vi jxk , C0k ) E ‘ (s*k jxk , C0k ) 0

8vi 6¼ s*k

(14:12)

‘¼1

We note that the inequalities in Equation 14.12 are linear with respect to l, that is, the correct classification of a training pixel is expressed as a set of (M 1) linear inequalities with respect to the model parameters. More formally, when collecting the energy differences contained in Equation 14.12 in a single L-dimensional column vector: «ki ¼ col E ‘ (vi jxk , C0k ) E ‘ (s*k jxk , C0k ): ‘ ¼ 1, 2, . . . , L

(14:13)

the k-th training pixel is correctly classified by ICM if «Tki l 0 for all the class labels vi 6¼ sk* (k 2 T). By denoting by E the matrix obtained by juxtaposing all the row vectors «kiT (k 2 T, vi 6¼ sk*), we conclude that ICM correctly classifies the whole training set T if and only if2: El 0

(14:14)

Operatively, the matrix E includes all the coefficients (i.e., the energy differences) of the linear inequalities in Equation 14.12; hence, E has L columns and has a row for each inequality, that is, it has (M 1) rows for each training pixel (corresponding to the (M 1) class labels, which are different from the true label). Therefore, denoting by jTj the number of training samples (i.e., the cardinality of T), E is an (R L)-sized matrix, with R ¼ jTj (M 1). Since R L, the system in Equation 14.14 presents a number of inequalities, which is much larger than the number of unknown variables. In particular, a feasible approach would lie in formulating the matrix inequality in Equation 14.14 as an equality E l ¼ b, where binRR is a ‘‘margin’’ vector with positive components, to solve such equality by a minimum square error (MSE) approach [24]. However, MSE would require a preliminary manual choice of margin vector b. Therefore, we avoid using this approach, and note that this problem is formally identical to the problem of computing 2

Given two m-dimensional vectors, u and v (u, v 2 Rm), we write u v to mean ui vi for all i ¼ 1, 2, . . . , m.

Signal and Image Processing for Remote Sensing

312

the weight parameters of a linear binary discriminant function [24]. Hence, we propose to address the solution of Equation 14.14 by extending to the present context the methods described in the literature to compute such a linear discriminant function. In particular, we adopt the Ho–Kashyap method [24]. This approach jointly optimizes both l and b according to the following quadratic programing problem [24]: 8 > kEl bk2 < min l ,b L R > :l 2 R ,b 2 R br > 0 for r ¼ 1, 2, . . . , R

(14:15)

which is solved by alternating iteratively an MSE step to update l and a gradientlike descent step to update b [24]. Specifically, the following operations are performed at the t-th step (t ¼ 0, 1, 2, . . . ) of the Ho–Kashyap procedure [24]: .

given the current margin vector bt, compute a corresponding parameter vector lt by minimizing kE l btk2 with respect to l, that is, compute: lt ¼ E# bt , where E# ¼ (ET E)1 ET

(14:16)

is the so-called pseudo-inverse of E (provided that ETE is nonsingular) [48] . .

compute the error vector et ¼ Elt bt 2 RR update each component of the margin vector by minimizing kElt bk2 with respect to b by a gradient-like descent step that allows the margin components only to be increased [24], that is, compute

brtþ1 ¼

btr þ retr btr

if etr > 0 if etr 0

r ¼ 1, 2, . . . , R

(14:17)

where r is a convergence parameter. In particular, the proposed approach to the MRF parameter-setting problem allows exploiting the known theoretical results about the Ho–Kashyap method in the context of linear discriminant functions. Specifically, one can prove that, if the matrix inequality in Equation 14.14 has solutions and if 0 < r < 2, then the Ho–Kashyap algorithm converges to a solution of Equation 14.14 in a finite number t of iterations (i.e., we obtain et ¼ 0 and consequently bt þ 1 ¼ bt and Elt ¼ bt 0) [24]. On the other hand, if the inequality in Equation 14.14 has no solution, it is possible to prove that each component ert of the error vector (r ¼ 1, 2, . . . , R) either vanishes for t ! þ1 or takes on nonpositive values, while the error magnitude ketk converges to a positive limit, which is bounded away from zero; in the latter situation, according to the Ho – Kashyap iterative procedure, the algorithm stops, and one can conclude that the matrix inequality in Equation 14.14 has no solution [46]. Therefore, the convergence (either finite-time or asymptotic) of the proposed parameter-setting method is guaranteed in any case. However, it is worth noting that the number t of iterations required to reach convergence in the first case or the number t0 of iterations needed to detect the nonexistence of a solution of Equation 14.14 in the second case are not known in advance. Hence, specific stop conditions are usually adopted for the Ho–Kashyap procedure, for instance, by stopping the iterative process when j l‘tþ1 lt‘ j < «stop for all ‘ ¼ 1, 2, . . . , L and jbrtþ1brtj< «stop for all r ¼ 1, 2, . . . , R, where «stop is a given threshold (in the present

MRF-Based Remote-Sensing Image Classification

313

chapter, «stop ¼ 0.0001 is used). The algorithm has been initialized by setting unitary values for all the components of l and b and by choosing r ¼ 1. Coupling the proposed Ho–Kashyap-based parameter-setting method with the ICM classification approach results in an automatic contextual supervised classification approach (hereafter marked by HK–ICM), which performs the following processing steps: .

Noncontextual step: generate an initial noncontextual classification map (i.e., an initial label vector S0) by applying a given supervised noncontextual (e.g., Bayesian or neural) classifier;

.

Energy-difference step: compute the energy-difference matrix E according to the adopted MRF model and to the label vector S0; Ho–Kashyap step: compute an optimal parameter vector l* by running the Ho–Kashyap procedure (applied to the matrix E) up to convergence; and

.

.

ICM step: generate a contextual classification map by running ICM (fed with the parameter vector l* ) up to convergence.

A block diagram of the proposed contextual classification scheme is shown in Figure 14.1. It is worth noting that, according to the role of the parameters as weights of energy contributions, negative values for l*1, l*2 , . . . , L* would be undesirable. To prevent this issue, we note that, if all the entries in a given row of the matrix E are negative,

Multiband image

Training map

Noncontextual step: generation of an initial noncontextual classification map S0 Energy-difference step: computation of the matrix E E Ho–Kashyap step: computation of the optimal parameter vector l* l*

ICM step: contextual classification of the input image

Contextual classification map

FIGURE 14.1 Block diagram of the proposed contextual classification scheme.

Signal and Image Processing for Remote Sensing

314

the corresponding inequality cannot be satisfied by a vector l with positive components. Therefore, before the application of the Ho–Kashyap step, the rows of E containing only negative entries are cancelled, as such rows would represent linear inequalities that could not be solved by positive parameter vector solutions. In other words, each deleted row corresponds to a training pixel k 2 T, so that a class vi 6¼ sk* has a lower energy than the true class label sk* for all possible positive values of the parameters l1, l2, . . . , lL (therefore, its classification is wrong).

14.4

Expe rimental Results

The proposed HK–ICM method was tested on three different MRF models, each applied to a distinct real data set. In all the cases, the results provided by ICM, endowed with the proposed automatic parameter-optimization method, are compared with the ones that can be achieved by performing an exhaustive grid search for the parameter values yielding the highest accuracies on the training set; operationally, such parameter values can be obtained by a ‘‘trial-and-error’’ procedure (TE–ICM in the following). Specifically, in all the experiments, spatially distinct training and test fields were available for each class. We note that the highest training-set (and not test-set) accuracy is searched for by TE–ICM to perform a consistent comparison with the proposed method, which deals only with the available training samples. A comparison between the classification accuracies of HK–ICM and TE–ICM on such a set of samples aims at assessing the performances of the proposed method from the viewpoint of the MRF parameter-optimization problem. On the other hand, an analysis of the test-set accuracies allows one to assess the quality of the resulting classification maps. Correspondingly, Table 14.1 through Table 14.3 show, for the three experiments, the classification accuracies provided by HK–ICM and those given by TE–ICM on both the training and the test sets. In all the experiments, 10 ICM iterations were sufficient to reach convergence, and the Ho–Kashyap algorithm was initialized by setting a unitary value for all the MRF parameters (i.e., by initially giving the same weight to all the energy contributions).

TABLE 14.1 Experiment I: Training and Test-Set Cardinalities and Classification Accuracies Provided by the Noncontextual GMAP Classifier, by HK–ICM, and by TE–ICM

Class Wet soil Urban Wood Water Bare soil Overall accuracy Average accuracy

Test-Set Accuracy Training-Set Accuracy Training Test Samples Samples GMAP (%) HK–ICM (%) TE–ICM (%) HK–ICM (%) TE–ICM (%) 1866 4195 3685 1518 3377

750 2168 2048 875 2586

93.73 94.10 98.19 91.66 97.53

96.93 99.31 98.97 92.69 99.54

96.80 99.45 98.97 92.69 99.54

98.66 97.00 96.85 98.16 99.20

98.61 97.02 96.85 98.22 99.20

95.86

98.40

98.42

97.80

97.81

95.04

97.49

97.49

97.97

97.98

Experiment II: Training and Test-Set Cardinalities and Classification Accuracies Provided by the Noncontextual DTC Classifier, by HK–ICM, and by TE–ICM

Image October 2000

June 2001

Class Wet soil Urban Wood Water Bare soil Overall accuracy Average accuracy Wet soil Urban Wood Water Bare soil Agricultural Overall accuracy Average accuracy

Training Samples

Test Samples

1194 3089 4859 1708 4509

895 3033 3284 1156 3781

1921 3261 1719 1461 3168 2773

1692 2967 1413 1444 3052 2431

Test-Set Accuracy

Training-Set Accuracy

DTC (%)

HK–ICM (%)

TE–ICM (%)

HK–ICM (%)

TE–ICM (%)

79.29 88.30 99.15 98.70 97.70 93.26 92.63 96.75 89.25 94.55 99.72 97.67 82.48 92.68 93.40

92.63 96.41 99.76 98.70 99.00 98.06 97.30 98.46 92.35 98.51 99.72 99.34 83.42 94.61 95.30

91.84 99.04 99.88 98.79 99.34 98.81 97.78 98.94 94.00 98.51 99.79 99.48 83.67 95.13 95.73

96.15 94.59 99.96 99.82 98.54 98.15 97.81 99.64 98.25 96.74 99.93 98.77 99.89 98.86 98.87

94.30 96.89 99.98 99.82 98.91 98.59 97.98 99.74 99.39 98.08 100 98.90 99.96 99.34 99.34

MRF-Based Remote-Sensing Image Classification

TABLE 14.2

315

316

TABLE 14.3 Experiment III: Training and Test-Set Cardinalities and Classification Accuracies Provided by the Noncontextual DTC Classifier, by HK–ICM, and by TE–ICM Class

April 16

Wet soil Bare soil Overall accuracy Average accuracy Wet soil Bare soil Overall accuracy Average accuracy Wet soil Bare soil Overall accuracy Average accuracy

April 17

April 18

Test Samples

6205 1346

5082 1927

6205 1469

5082 1163

6205 1920

5082 2122

Test-Set Accuracy

Training-Set Accuracy

DTC (%)

HK–ICM (%)

TE–ICM (%)

HK–ICM (%)

TE–ICM (%)

90.38 91.70 90.74 91.04 93.33 99.66 94.51 96.49 86.80 97.41 89.92 92.10

99.19 92.42 97.33 95.81 99.17 100 99.33 99.59 97.28 96.80 97.14 97.04

96.79 95.28 96.38 96.04 97.93 100 98.32 98.97 92.84 98.26 94.43 95.55

100 99.93 99.99 99.96 100 99.52 99.91 99.76 100 98.70 99.69 99.35

100 100 100 100 100 100 100 100 100 99.95 99.99 99.97

Signal and Image Processing for Remote Sensing

Image

Training Samples

MRF-Based Remote-Sensing Image Classification 14.4.1

317

Experiment I: Spatial MRF Model for Single-Date Image Classification

The spatial MRF model described in Refs. [5,17] is adopted: it employs a second-order neighborhood (i.e., Ck includes the labels of the 8 pixels surrounding the k-th pixel; k ¼ 1, 2, . . . , N) and a parametric Gaussian N(mi, Si) model for the PDF p(xkjvi) of a feature vector xk conditioned to the class vi (i ¼ 1, 2, . . . , M; k ¼ 1, 2, . . . , N). Specifically, two energy contributions are defined, that is, a ‘‘spectral’’ energy term E1(), related to the information conveyed by the conditional statistics of the feature vector of each single pixel and a ‘‘spatial’’ term E2(), related to the information conveyed by the correlation among the class labels of neighboring pixels, that is, (k ¼ 1, 2, . . . , N): X ^ i, ^ 1 ( xk m ^ i )T ^ i ) þ lndet E 1 (vi j xk ) ¼ (xk m E 2 (vi j Ck ) ¼ d(vi , vj ) (14:18) i vj 2 Ck

^ i are a sample-mean estimate and a sample-covariance estimate of the ^ i and S where m conditional mean mi and of the conditional covariance matrix Si, respectively, computed using the training samples of vi (i ¼ 1, 2, . . . , M); d (, ) is the Kronecker delta function (i.e., d(a, b) ¼ 1 for a ¼ b and d(a, b) ¼ 0 for a 6¼ b). Hence, in this case, two parameters, l1 and l2, have to be set. The model was applied to a 6 -band 870 498 pixel-sized Landsat-5 TM image acquired in April, 1994, over an urban and agricultural area around the town of Pavia (Italy) (Figure 14.2a), and presenting five thematic classes (i.e., ‘‘wet soil,’’ ‘‘urban,’’ ‘‘wood,’’ ‘‘water,’’ and ‘‘bare soil;’’ see Table 14.1). The above-mentioned normality assumption for the conditional PDFs is usually accepted as a model for the class statistics in optical data [49,50]. A standard noncontextual MAP classifier with Gaussian classes (hereafter denoted simply by GMAP) [49,50] was adopted in the noncontextual step. The Ho–Kashyap step required 31716 iterations to reach convergence and selected l*1 ¼ 0.99 and l2* ¼ 10.30. The classification accuracies provided by GMAP and HK–ICM on the test set are given in Table 14.1. GMAP already provides good classification performances, with accuracies higher than 90% for all the classes. However, as expected, the contextual HK–ICM classifier further improves the classification result, in particular, yielding, a 3.20% accuracy increase for ‘‘wet soil’’ and a 5.21% increase for ‘‘urban,’’ which result in a 2.54% increase in the overall accuracy (OA, i.e., the percentage of correctly classified test samples) and in a 2.45% increase in the average accuracy (AA, i.e., the average of the accuracies obtained on the five classes). Furthermore, for the sake of comparison, Table 14.1 also shows the results obtained by the contextual MRF–ICM classifier, applied by using a standard ‘‘trial-and-error’’ parametersetting procedure (TE–ICM). Specifically, fixing3 l1 ¼ 1, the value of l2 yielding the highest value of OA on the training set has been searched exhaustively in the range [0,20] with discretization step 0.2. The value selected exhaustively by TE–ICM was l2 ¼ 11.20, which is quite close to the value computed automatically by the proposed procedure. Accordingly, the classification accuracies obtained by HK–ICM and TE–ICM on both the training and test sets were very similar (Table 14.1). 14.4.2 Experiment II: Spatio-Temporal MRF Model for Two-Date Multi-Temporal Image Classification As a second experiment, we addressed the problem of supervised multi-temporal classification of two co-registered images, I0 and I1, acquired over the same ground area at 3 According to the minimum-energy decision rule, this choice causes no loss of generality, as the classification result is affected only by the relative weight l2/l1 between the two parameters and not by their absolute values.

Signal and Image Processing for Remote Sensing

318

(a)

(b)

(c) FIGURE 14.2 Data sets employed for the experiments: (a) band TM-3 of the multi-spectral image acquired in April, 1994, and employed in Experiment I; (b) ERS-2 channel (after histogram equalization) of the multi-sensor image acquired in October, 2000, and employed in Experiment II; (c) XSAR channel (after histogram equalization) of the SAR multi-frequency image acquired in April 16, 1994, and employed in Experiment III.

different times, t0 and t1 (t1 > t0). Specifically, the spatio-temporal mutual MRF model proposed in Ref. [7] is adopted that introduces an energy function taking into account both the spatial context (related to the correlation among neighboring pixels in the same image) and the temporal context (related to the correlation between distinct images of the same area). Let us denote by Mj, xjk, and vji the number of classes in Ij, the feature vector of the k-th pixel in Ij (k ¼ 1, 2, . . . , N), and the i-th class in Ij (i ¼ 1, 2, . . . , Mj), respectively

MRF-Based Remote-Sensing Image Classification

319

( j ¼ 0,1). Focusing, for instance, on the classification of the first-date image I0, the context of the k-th pixel includes two distinct sets of labels: (1) the set S0k of labels of the spatial context, that is, the labels of the 8 pixels surrounding the k-th pixel in I0, and (2) the set T0k of labels of the temporal context, that is, the labels of the pixels lying in I1 inside a 3 3 window centered on the k-th pixel (k ¼ 1, 2, . . . , N) [7]. Accordingly, three energy contributions are introduced, that is, a spectral and a spatial term (as in Experiment I), and an additional temporal term, that is, (i ¼ 1, 2, . . . , M0; k ¼ 1, 2, . . . , N): X p(x0k jv0i ), E 2 (v0i jS0k ) ¼ d(v0i , v0j ) E 1 (v0i jx0k ) ¼ ln ^ v0j 2S0k

E 3 (v0i jT0k ) ¼

X

^ (v0i jv1j ) P

(14:19)

v1j 2T0k

where ^ p(x0kjv0i) is an estimate of the PDF p(x0kjv0i) of a feature vector x0k (k ¼ 1, 2, . . . , N) ^ (v0ijv1j) is an estimate of the transition probability conditioned to the class v0i, and P between class v1j at time t1 and class v0i at time t0 (i ¼ 1, 2, . . . , M0 ; j ¼ 1, 2, . . . , M1) [7]. A similar 3-component energy function is also introduced at the second date; hence, three parameters have to be set at date t0 (namely, l01, l02, and l03) and three other parameters at date t1 (namely, l11, l12, and l13). In the present experiment, HK–ICM was applied to a two-date multi-temporal data set, consisting of two 870 498 pixel-sized images acquired over the same ground area as in Experiment I in October, 2000 and July, 2001, respectively. At both acquisition dates, eight Landsat-7 ETMþ bands were available, and a further ERS-2 channel (C-band, VV polarization) was also available in October, 2000 (Figure 14.2b). The same five thematic classes, as considered in Experiment I were considered in the October, 2000 scene, while the July, 2001 image also presented a further ‘‘agricultural’’ class. For all the classes, training and test data were available (Table 14.2). Due to the multi-sensor nature of the October, 2000 image, a nonparametric technique was employed in the noncontextual step. Specifically, the dependence tree classifier (DTC) approach was adopted that approximates each multi-variate class-conditional PDF as a product of automatically selected bivariate PDFs [51]. As proposed in Ref. [7], the Parzen window method [30], applied with Gaussian kernels, was used to model such bivariate PDFs. In particular, to avoid the usual ‘‘trial-and-error’’ selection of the kernelwidth parameters involved by the Parzen method, we adopted the simple ‘‘reference density’’ automatization procedure, which computes the kernel width by asymptotically minimizing the ‘‘mean integrated square error’’ functional, according to a given reference model for the unknown PDF (for further details on the automatic kernel-width selection for Parzen density estimation, we refer the reader to Refs. [52,53]). The ‘‘reference density’’ approach is chosen for its simplicity and short computation time; in particular, it is applied according to a Gaussian reference density for the kernel widths of the ETMþ features [53] and to a Rayleigh density for the kernel widths of the ERS-2 feature (for further details on this SAR-specific kernel-width selection approach, we refer the reader to Ref. [54]). The resulting PDF estimates were fed to an MAP classifier together with prior-probability estimates (computed as relative frequencies on the training set) to generate an initial noncontextual classification map for each date. During the noncontextual step, transition-probability estimates were computed as ^ (v0ijv1j) was computed as the ratio relative frequencies on the two initial maps (i.e., P nij/mj between the number nij of pixels assigned both to v0i in I0 and to v1j in I1 and the total number mj of pixels assigned to v1j in I1; i ¼ 1, 2, . . . , M0 ; j ¼ 1, 2, . . . , M1). The energy-difference and the Ho–Kashyap steps were applied first to the October,

Signal and Image Processing for Remote Sensing

320

2000 image, converging to (1.06,0.95,1.97) in 2632 iterations, and then to the July, 2001 image, converging to (1.00,1.00,1.01) in only 52 iterations. As an example, the convergent behavior of the parameter values during the Ho–Kashyap iterations for the October, 2000 image is shown in Figure 14.3. The classification accuracies obtained by ICM, applied to the spatio-temporal model of Equation 14.19 with these parameter values, are shown in Table 14.2, together with the results of the noncontextual DTC algorithm. Also in this experiment, HK–ICM provided good classification accuracies at both dates, specifically yielding large accuracy increases for several classes as compared with the noncontextual DTC initialization (in particular, ‘‘urban’’ at both dates, ‘‘wet soil’’ in October, 2000, and ‘‘wood’’ in June, 2001). Also for this experiment, we compared the results obtained by HK–ICM with the ones provided by the MRF–ICM classifier with a ‘‘trial-and-error’’ parameter setting (TE–ICM, Table 14.2). Fixing l01 ¼ l11 ¼ 1 (as in Experiment I) at both dates, a grid search was performed to identify the values of the other parameters that yielded the highest overall accuracies on the training set at the two dates. In general, this would require an interactive optimization of four distinct parameters (namely, l02, l03, l12, and l13). To reduce the time taken by this interactive procedure, the restrictions l02 ¼ l12 and l03 ¼ l13 were adopted, thus assigning the same weight l2 to the two spatial energy contributions and the same weight l3 to the two temporal contributions, and performing a grid search in a two-dimensional (and not four-dimensional) space. The adopted search range was [0,10] for both parameters with discretization step 0.2: for each combination of (l2, l3), ICM was run until convergence and the resulting classification accuracies on the training set were computed. A global maximum of the average of the two training-set OAs obtained at the two dates was reached for l2 ¼ 9.8 and l3 ¼ 2.6: the corresponding accuracies (on both the training and test sets) are shown in Table 14.2. This optimal parameter vector turns out to be quite different from the solution automatically computed by the proposed method. The training-set accuracies of TE–ICM are better than the ones of HK–ICM; however, the difference between the OAs provided on the training set by the two approaches is only 0.44% in October, 2000 and 0.48% in June, 2001, and the differences between the AAs are only 0.17% and 0.47%, respectively. This suggests that, although identifying different parameter solutions, the proposed automatic approach and the standard interactive one allow achieving similar classification performances. A similar small difference of performance can also be noted between the corresponding test-set accuracies.

2.2 l01

2

l02

l03

1.8 1.6 1.4 1.2 1 0.8 1

401

801

1201

1601

2001

2401

Ho–Kashyap iteration FIGURE 14.3 Experiment II: Plots of behaviors of the ‘‘spectral’’ parameter l01, the ‘‘spatial’’ parameter l02, and the ‘‘temporal’’ parameter l03 versus the number of Ho–Kashyap iterations for the October, 2000, image.

MRF-Based Remote-Sensing Image Classification

321

14.4.3 Experiment III: Spatio-Temporal MRF Model for Multi-Temporal Classification of Image Sequences As a third experiment, HK–ICM was applied to the multi-temporal mutual MRF model (see Section 14.2) extended to the problem of supervised classification of image sequences [7]. In particular, considering, a sequence {I0, I1, I2} of three co-registered images acquired over the same ground area at times t0, t1, t2, respectively (t0 < t1 < t2), the mutual model is generalized by introducing an energy function with four contributions at the intermediate date t1, that is, a spectral and a spatial energy term (namely, E1() and E2()) expressed as in Equation 14.19 and two distinct temporal energy terms, E3() and E4(), related to the backward temporal correlation of I1 with the previous image I0 and to the forward temporal correlation of I1 with I2, respectively [7]. For the first and last images in the sequence (i.e., I0 and I2), only one typology of temporal energy is well defined (in particular, only the forward temporal energy E3() for I0 and only the backward energy E4() for I2). All the temporal energy terms are computed in terms of transition probabilities, as in Equation 14.19. Therefore, four parameters (l11, l12, l13, and l14) have to be set at date t1, while only three parameters are needed at date t0 (namely, l01, l02, and l03) or at date t2 (namely, l21, l22, and l24). Specifically, a sequence of three 700 280 pixel-sized co-registered multi-frequency SAR images, acquired over an agricultural area near the city of Pavia in April 16, 17, and 18, 1994, was used in the experiment (the ground area is not the same as the one considered in Experiments I and II, although the two regions are quite close to each other). At each date, a 4-look XSAR band (VV polarization) and three 4-look SIR-C-bands (HH, HV, and TP (total power)) were available. The observed scene at the three dates presented two main classes, that is, ‘‘wet soil’’ and ‘‘bare soil,’’ and the temporal evolution of these classes between April 16 and April 18 was due to the artificial flooding processes caused by rice cultivation. The PDF estimates and the initial classification maps were computed using the DTC-Parzen method (as in Experiment II) applied here with a kernel-width optimized according again to the SAR-specific method developed in Ref. [54] and based on a Rayleigh reference density. As in Experiment II, the transition probabilities were estimated as relative frequencies on the initial noncontextual maps. Specifically, the Ho–Kashyap step was applied separately to set the parameters at each date, converging to (0.97,1.23,1.56) in 11510 iterations for April 16, to (1.12,1.03,0.99,1.28) in 15092 iterations for April 17, and to (1.00,1.04,0.97) in 2112 iterations for April 18. As shown in Table 14.3, the noncontextual classification stage already provided good accuracies, but the use of the contextual information allowed a sharp increase in accuracy to be obtained at all the three dates (i.e., þ6.59%, þ4.82%, and þ7.22% in OA for April 16, 17, and 18, respectively, and þ4.77%, þ3.09%, and þ4.94% in AA, respectively), thus suggesting the effectiveness of the parameter values selected by the developed algorithm. In particular, as shown in Figure 14.4 with regard to the April 16 image, the noncontextual DTC maps (e.g., Figure 14.4a) were very noisy, due to the impact of the presence of speckle in the SAR data on the classification results, whereas the contextual HK–ICM maps (e.g., Figure 4b) were far less affected by speckle, because of the integration of contextual information in the classification process. It is worth noting that no preliminary despeckling procedure was applied to the input SIR-C/XSAR images before classifying them. Also for this experiment, in Table 14.3, we present the results obtained by searching exhaustively for the MRF parameters yielding the best training-set accuracies. As in Experiment II, for the ‘‘trial-and-error’’ parameter optimization the following restrictions were adopted: l01 ¼ l11 ¼ l21 ¼ 1, l02 ¼ l12 ¼ l22 ¼ l2, and l03 ¼ l13 ¼ l14 ¼ l24 ¼ l3 (i.e., a unitary value was used for the ‘‘spectral’’ parameters, the same weight l2 was assigned to all the spatial energy contributions employed at the three dates, and the same weight l3 was associated with all the temporal energy terms). Therefore, the number of

Signal and Image Processing for Remote Sensing

322

(a)

(b) FIGURE 14.4 Experiment III: Classification maps for the April 16, image provided by: (a) the noncontextual DTC classifier; (b) the proposed contextual HK–ICM classifier. Color legend: white ¼ ‘‘wet soil,’’ grey ¼ ‘‘dry soil,’’ black ¼ not classified (not classified pixels are present where the DTC-Parzen PDF estimates exhibit very low values for both classes).

independent parameters to be set interactively was reduced from 7 to 2. Again the search range [0,10] (discretized with step 0.2) was adopted for both parameters and, for each couple (l2, l3), ICM was run up to convergence. Then, the parameter vector yielding the best classification result was chosen. In particular, the average of the three OAs on the training set at the three dates exhibited a global maximum for l2 ¼ 1 and l3 ¼ 0.4. The corresponding accuracies are given in Table 14.3 and denoted by TE–ICM. As noted with regard to Experiment II, although the parameter vectors selected by the proposed automatic procedure and by the interactive ‘‘trial-and-error’’ one were quite different, the classification performances achieved by the two methods on the training set were very similar (and very close or equal to 100%). On the other hand, in most cases the test-set accuracies obtained by HK–ICM are even better than the ones given by TE–ICM. These results can be interpreted as a consequence of the fact that, to make the computation time for the exhaustive parameter search tractable, the above-mentioned restrictions on the parameter values were used when applying TE–ICM, whereas HK–ICM optimized all the MRF parameters, without any need for additional restrictions. The resulting TE–ICM parameters were effective for the classification of the training set (as they were specifically optimized toward this end), but they yielded less accurate results on the test set.

14.5

Conclusions

In the present chapter, an innovative algorithm has been proposed that addresses the problem of the automatization of the parameter-setting operations involved by MRF

MRF-Based Remote-Sensing Image Classification

323

models for ICM-based supervised image classification. The method is applicable to a wide class of MRF models (i.e., the models whose energy functions can be expressed as weighted sums of distinct energy contributions), and expresses the parameter optimization as a linear discrimination problem, solved by using the Ho–Kashyap method. In particular, the theoretical properties known for this algorithm in the context of linear classifier design still holds also for the proposed method, thus ensuring a good convergent behavior. The numerical results proved the capability of HK–ICM to select automatically parameter values that can yield accurate contextual classification maps. These results have been obtained for several different MRF models, dealing both with single-date supervised classification and with multi-temporal classification of two-date or multidate imagery, which suggests a high flexibility of the method. This is further confirmed by the fact that good classification results were achieved on different typologies of remote sensing data (i.e., multi-spectral, optical-SAR multi-source, and multi-frequency SAR) and with different PDF estimation strategies (both parametric and nonparametric). In particular, an interactive choice of the MRF parameters that selects through an exhaustive grid search the parameter values that provide the best performances on the training set yielded results very similar to the ones obtained by the proposed HK–ICM methodology in terms of classification accuracy. Specifically, in the first experiment, HK–ICM automatically identified parameter values quite close to the ones selected by this ‘‘trial-and-error’’ procedure (thus providing a very similar classification map), while in the other experiments, HK–ICM chose parameter values different from the ones obtained by the ‘‘trial-and-error’’ strategy, although generating classification results with similar (Experiment II) or even better (Experiment III) accuracies. The application of the ‘‘trial-and-error’’ approach in acceptable execution times required the definition of suitable restrictions on the parameter values to reduce the dimension of the parameter space to be explored. Such restrictions are not needed by the proposed method, which turns out to be a good compromise between accuracy and level of automatization, as it allows obtaining very good classification results, although avoiding completely the time-consuming phase of ‘‘trial-and-error’’ parameter setting. In addition, in all the experiments, the adopted MRF models yielded a significant increase in accuracy, as compared with the initial noncontextual results. This further highlights the advantages of MRF models, which effectively exploit the contextual information in remote sensing image classification, and further confirms the usefulness of the proposed technique in automatically setting the parameters of such models. The proposed algorithm allows one to overcome the usual limitation on the use of MRF techniques in supervised image classification, consisting in the lack of automatization of such techniques, and supports a more extensive use of this powerful classification approach in remote sensing. It is worth noting that HK–ICM optimizes the MRF parameter vector according to the initial classification maps provided as an input to ICM: the resulting optimal parameters are then employed to run ICM up to convergence. A further generalization of the method would adapt automatically the MRF parameters also during the ICM iterative process by applying the proposed Ho–Kashyap-based method at each ICM iteration or according to a different predefined iterative scheme. This could allow a further improvement in accuracy, although requiring one to run the Ho–Kashyap algorithm not once but several times, thus increasing the total computation time of the contextual classification process. The effect of this combined optimization strategy on the convergence of ICM is an issue worth being investigated.

324

Signal and Image Processing for Remote Sensing

Acknowledgments This research was carried out within the framework of the PRIN-2002 project entitled ‘‘Processing and analysis of multitemporal and hypertemporal remote-sensing images for environmental monitoring,’’ which was funded by the Italian Ministry of Education, University, and Research (MIUR). The support is gratefully acknowledged. The authors would also like to thank Dr. Paolo Gamba from the University of Pavia, Italy, for providing the SAR images employed in Experiments II and III.

References 1. M. Datcu, K. Seidel, and M. Walessa, Spatial information retrieval from remote sensing images: Part I. Information theoretical perspective, IEEE Trans. Geosci. Rem. Sens., 36(5), 1431–1445, 1998. 2. R.C. Dubes and A.K. Jain, Random field models in image analysis, J. Appl. Stat., 16, 131–163, 1989. 3. S. Geman and D. Geman, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration, IEEE Trans. Pattern Anal. Machine Intell., 6, 721–741, 1984. 4. A.H.S. Solberg, Flexible nonlinear contextual classification, Pattern Recogn. Lett., 25(13), 1501–1508, 2004. 5. Q. Jackson and D. Landgrebe, Adaptive Bayesian contextual classification based on Markov random fields, IEEE Trans. Geosci. Rem. Sens., 40(11), 2454–2463, 2002. 6. M.De Martino, G. Macchiavello, G. Moser., and S.B. Serpico, Partially supervised contextual classification of multitemporal remotely sensed images, In Proc. of IEEE-IGARSS 2003, Toulouse, 2, 1377–1379, 2003. 7. F. Melgani and S.B. Serpico, A Markov random field approach to spatio-temporal contextual image classification, IEEE Trans. Geosci. Rem. Sens., 41(11), 2478–2487, 2003. 8. P.H. Swain, Bayesian classification in a time-varying environment, IEEE Trans. Syst., Man, Cybern., 8(12), 880–883, 1978. 9. A.H.S. Solberg, T. Taxt, and A.K. Jain, A Markov Random Field model for classification of multisource satellite imagery, IEEE Trans. Geosci. Rem. Sens., 34(1), 100–113, 1996. 10. G. Storvik, R. Fjortoft, and A.H.S. Solberg, A bayesian approach to classification of multiresolution remote sensing data, IEEE Trans. Geosci. Rem. Sens., 43(3), 539–547, 2005. 11. M.L. Comer and E.J. Delp, The EM/MPM algorithm for segmentation of textured images: Analysis and further experimental results, IEEE Trans. Image Process., 9(10), 1731–1744, 2000. 12. Y. Delignon, A. Marzouki, and W. Pieczynski, Estimation of generalized mixtures and its application to image segmentation, IEEE Trans. Image Process., 6(10), 1364–1375, 2001. 13. X. Descombes, M. Sigelle, and F. Preteux, Estimating Gaussian Markov random field parameters in a nonstationary framework: Application to remote sensing imaging, IEEE Trans. Image Process., 8(4), 490–503, 1999. 14. P. Smits and S. Dellepiane, Synthetic aperture radar image segmentation by a detail preserving Markov random field approach, IEEE Trans. Geosci. Rem. Sens., 35(4), 844–857, 1997. 15. G.G. Hazel, Multivariate Gaussian MRF for multispectral scene segmentation and anomaly detection, IEEE Trans. Geosci. Rem. Sens., 38(3), 1199–1211, 2000. 16. G. Rellier, X. Descombes, F. Falzon, and J. Zerubia, Texture feature analysis using a Gauss–Markov model in hyperspectral image classification, IEEE Trans. Geosci. Rem. Sens., 42(7), 1543–1551, 2004. 17. L. Bruzzone and D.F. Prieto, Automatic analysis of the difference image for unsupervised change detection, IEEE Trans. Geosci. Rem. Sens., 38(3), 1171–1182, 2000. 18. L. Bruzzone and D.F. Prieto, An adaptive semiparametric and context-based approach to unsupervised change detection in multitemporal remote-sensing images, IEEE Trans. Image Process., 40(4), 452–466, 2002.

MRF-Based Remote-Sensing Image Classification

325

19. T. Kasetkasem and P.K. Varshney, An image change detection algorithm based on markov random field models, IEEE Trans. Geosci. Rem. Sens., 40(8), 1815–1823, 2002. 20. J. Besag, On the statistical analysis of dirty pictures, J. R. Statist. Soc., 68, 259–302, 1986. 21. Y. Cao, H. Sun, and X. Xu, An unsupervised segmentation method based on MPM for SAR images, IEEE Geosci. Rem. Sens. Lett., 2(1), 55–58, 2005. 22. X. Descombes, R.D. Morris, J. Zerubia, and M. Berthod, Estimation of Markov Random Field prior parameters using Markov Chain Monte Carlo maximum likelihood, IEEE Trans. Image Process., 8(7), 954–963, 1999. 23. M.V. Ibanez and A. Simo’, Parameter estimation in Markov random field image modeling with imperfect observations. A comparative study, Pattern Rec. Lett., 24(14), 2377–2389, 2003. 24. J.T. Tou and R.C. Gonzalez, Pattern Recognition Principles, Addison-Wesley, MA, 1974. 25. Y.-C. Ho and R.L. Kashyap, An algorithm for linear inequalities and its applications, IEEE Trans. Elec. Comp., 14, 683–688, 1965. 26. G. Celeux, F. Forbes, and N. Peyrand, EM procedures using mean field-like approximations for Markov model-based image segmentation, Pattern Recogn., 36, 131–144, 2003. 27. Y. Zhang, M. Brady, and S. Smith, Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm, IEEE Trans. Med. Imag., 20(1), 45–57, 2001. 28. R. Fiortoft, Y. Delignon, W. Pieczynski, M. Sigelle, and F. Tupin, Unsupervised classification of radar images using hidden Markov models and hidden Markov random fields, IEEE Trans. Geosci. Rem. Sens., 41(3), 675–686, 2003. 29. Z. Kato, J. Zerubia, and M. Berthod, Unsupervised parallel image classification using Markovian models, Pattern Rec., 32, 591–604, 1999. 30. K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd edition, Academic Press, New York, 1990. 31. F. Comets and B. Gidas, Parameter estimation for Gibbs distributions from partially observed data, Ann. Appl. Prob., 2(1), 142–170, 1992. 32. W. Qian and D.N. Titterington, Estimation of parameters of hidden Markov models, Phil. Trans.: Phys. Sci. and Eng., 337(1647), 407–428, 1991. 33. X. Descombes, A dense class of Markov random fields and associated parameter estimation, J. Vis. Commun. Image R., 8(3), 299–316, 1997. 34. J. Besag, Spatial interaction and the statistical analysis of lattice systems, J. R. Statist. Soc., 36, 192–236, 1974. 35. L. Bedini, A. Tonazzini, and S. Minutoli, Unsupervised edge-preserving image restoration via a saddle point approximation, Image and Vision Computing, 17, 779–793, 1999. 36. S.S. Saquib, C.A. Bouman, and K. Sauer, ML parameter estimation for Markov random fields with applications to Bayesian tomography, IEEE Trans. Image Process., 7(7), 1029–1044, 1998. 37. C. Geyer and E. Thompson, Constrained Monte Carlo maximum likelihood for dependent data, J. Roy. Statist. Soc. Ser. B, 54, 657–699, 1992. 38. L. Younes, Estimation and annealing for Gibbsian fields, Annales de l’Institut Henri Poincare´. Probabilite´s et Statistiques, 24, 269–294, 1988. 39. B. Chalmond, An iterative Gibbsian technique for reconstruction of mary images, Pattern Rec., 22(6), 747–761, 1989. 40. Y. Yu and Q. Cheng, MRF parameter estimation by an accelerated method, Patt. Rec. Lett., 24, 1251–1259, 2003. 41. L. Wang, J. Liu, and S.Z. Li, MRF parameter estimation by MCMC method, Pattern Rec., 33, 1919–1925, 2000. 42. L. Wang and J. Liu, Texture segmentation based on MRMRF modeling, Patt. Rec. Lett., 21, 189– 200, 2000. 43. H. Derin and H. Elliott, Modeling and segmentation of noisy and textured images using Gibbs random fields, IEEE Trans. Pattern Anal. Machine Intell., 9(1), 39–55, 1987. 44. M. Mignotte, C. Collet, P. Perez, and P. Bouthemy, Sonar image segmentation using an unsupervised hierarchical MRF model, IEEE Trans. Image Process., 9(7), 1216–1231, 2000. 45. B.C.K. Tso and P.M. Mather, Classification of multisource remote sensing imagery using a genetic algorithm and Markov Random Fields, IEEE Trans. Geosci. Rem. Sens., 37(3), 1255–1260, 1999.

326

Signal and Image Processing for Remote Sensing

46. R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification, 2nd edition, Wiley, New York, 2001. 47. M.W. Zemansky, Heat and Thermodynamics, 5th edition, McGraw-Hill, New York, 1968. 48. J. Stoer and R. Bulirsch, Introduction to Numerical Analysis, 2nd edition, Springer-Verlag, New York, 1992. 49. J. Richards and X. Jia, Remote sensing digital image analysis, 3rd edition, Springer-Verlag, Berlin, 1999. 50. D.A. Landgrebe, Signal Theory Methods in Multispectral Remote Sensing. Wiley-InterScience, New York, 2003. 51. M. Datcu, F. Melgani, A. Piardi, and S.B. Serpico, Multisource data classification with dependence trees, IEEE Trans. Geosci. Rem. Sens., 40(3), 609–617, 2002. 52. A. Berlinet and L. Devroye, A comparison of kernel density estimates, Publications de l’Institut de Statistique de l’Universite´ de Paris, 38(3), 3–59, 1994. 53. K.-D. Kim and J.-H. Heo, Comparative study of flood quantiles estimation by nonparametric models, J. Hydrology, 260, 176–193, 2002. 54. G. Moser, J. Zerubia, and S.B. Serpico, Dictionary-based stochastic expectation-maximization for SAR amplitude probability density function estimation, Research Report 5154, INRIA, Mar. 2004.

15 Random Forest Classification of Remote Sensing Data

Sveinn R. Joelsson, Jon A. Benediktsson, and Johannes R. Sveinsson

CONTENTS 15.1 Introduction ..................................................................................................................... 327 15.2 The Random Forest Classifier....................................................................................... 328 15.2.1 Derived Parameters for Random Forests...................................................... 329 15.2.1.1 Out-of-Bag Error .............................................................................. 329 15.2.1.2 Variable Importance ........................................................................ 329 15.2.1.3 Proximities ........................................................................................ 329 15.3 The Building Blocks of Random Forests..................................................................... 330 15.3.1 Classification and Regression Tree ................................................................ 330 15.3.2 Binary Hierarchy Classifier Trees .................................................................. 330 15.4 Different Implementations of Random Forests ......................................................... 331 15.4.1 Random Forest: Classification and Regression Tree................................... 331 15.4.2 Random Forest: Binary Hierarchical Classifier............................................ 331 15.5 Experimental Results...................................................................................................... 331 15.5.1 Classification of a Multi-Source Data Set...................................................... 331 15.5.1.1 The Anderson River Data Set Examined with a Single CART Tree ......................................................................... 335 15.5.1.2 The Anderson River Data Set Examined with the BHC Approach ................................................................................. 337 15.5.2 Experiments with Hyperspectral Data.......................................................... 338 15.6 Conclusions...................................................................................................................... 343 Acknowledgment....................................................................................................................... 343 References ................................................................................................................................... 343

15.1

Introduction

Ensemble classification methods train several classifiers and combine their results through a voting process. Many ensemble classifiers [1,2] have been proposed. These classifiers include consensus theoretic classifiers [3] and committee machines [4]. Boosting and bagging are widely used ensemble methods. Bagging (or bootstrap aggregating) [5] is based on training many classifiers on bootstrapped samples from the training set and has been shown to reduce the variance of the classification. In contrast, boosting uses iterative re-training, where the incorrectly classified samples are given more weight in 327

Signal and Image Processing for Remote Sensing

328

successive training iterations. This makes the algorithm slow (much slower than bagging) while in most cases it is considerably more accurate than bagging. Boosting generally reduces both the variance and the bias of the classification and has been shown to be a very accurate classification method. However, it has various drawbacks: it is computationally demanding, it can overtrain, and is also sensitive to noise [6]. Therefore, there is much interest in investigating methods such as random forests. In this chapter, random forests are investigated in the classification of hyperspectral and multi-source remote sensing data. A random forest is a collection of classification trees or treelike classifiers. Each tree is trained on a bootstrapped sample of the training data, and at each node in each tree the algorithm only searches across a random subset of the features to determine a split. To classify an input vector in a random forest, the vector is submitted as an input to each of the trees in the forest. Each tree gives a classification, and it is said that the tree votes for that class. In the classification, the forest chooses the class having the most votes (over all the trees in the forest). Random forests have been shown to be comparable to boosting in terms of accuracies, but without the drawbacks of boosting. In addition, the random forests are computationally much less intensive than boosting. Random forests have recently been investigated for classification of remote sensing data. Ham et al. [7] applied them in the classification of hyperspectral remote sensing data. Joelsson et al. [8] used random forests in the classification of hyperspectral data from urban areas and Gislason et al. [9] investigated random forests in the classification of multi-source remote sensing and geographic data. All studies report good accuracies, especially when computational demand is taken into account. The chapter is organized as follows. Firstly random forest classifiers are discussed. Then, two different building blocks for random forests, that is, the classification and regression tree (CART) and the binary hierarchical classifier (BHC) approaches are reviewed. In Section 15.4, random forests with the two different building blocks are discussed. Experimental results for hyperspectral and multi-source data are given in Section 15.5. Finally, conclusions are given in Section 15.6.

15.2

The Random Forest Classifier

A random forest classifier is a classifier comprising a collection of treelike classifiers. Ideally, a random forest classifier is an i.i.d. randomization of weak learners [10]. The classifier uses a large number of individual decision trees, all of which are trained (grown) to tackle the same problem. A sample is decided to belong to the most frequently occurring of the classes as determined by the individual trees. The individuality of the trees is maintained by three factors: 1. Each tree is trained using a random subset of the training samples. 2. During the growing process of a tree the best split on each node in the tree is found by searching through m randomly selected features. For a data set with M features, m is selected by the user and kept much smaller than M. 3. Every tree is grown to its fullest to diversify the trees so there is no pruning. As described above, a random forest is an ensemble of treelike classifiers, each trained on a randomly chosen subset of the input data where final classification is based on a majority vote by the trees in the forest.

Random Forest Classification of Remote Sensing Data

329

Each node of a tree in a random forest looks to a random subset of features of fixed size m when deciding a split during training. The trees can thus be viewed as random vectors of integers (features used to determine a split at each node). There are two points to note about the parameter m: 1. Increasing the correlation between the trees in the forest by increasing m, increases the error rate of the forest. 2. Increasing the classification accuracy of every individual tree by increasing m, decreases the error rate of the forest. An optimal interval for m is between the somewhat fuzzy extremes discussed above. The parameter m is often said to be the only adjustable parameter to which the forest is sensitive and the ‘‘optimal’’ range for m is usually quite wide [10]. 15.2.1

Derived Parameters for Random Forests

There are three parameters that are derived from the random forests. These parameters are the out-of-bag (OOB) error, the variable importance, and the proximity analysis. 15.2.1.1

Out-of-Bag Error

To estimate the test set accuracy, the out-of-bag samples (the remaining training set samples that are not in the bootstrap for a particular tree) of each tree can be run down through the tree (cross-validation). The OOB error estimate is derived by the classification error for the samples left out for each tree, averaged over the total number of trees. In other words, for all the trees where case n was OOB, run case n down the trees and note if it is correctly classified. The proportion of times the classification is in error, averaged over all the cases, is the OOB error estimate. Let us consider an example. Each tree is trained on a random 2/3 of the sample population (training set) while the remaining 1/3 is used to derive the OOB error rate for that tree. The OOB error rate is then averaged over all the OOB cases yielding the final or total OOB error. This error estimate has been shown to be unbiased in many tests [10,11]. 15.2.1.2

Variable Importance

For a single tree, run it on its OOB cases and count the votes for the correct class. Then, repeat this again after randomly permuting the values of a single variable in the OOB cases. Now subtract the correctly cast votes for the randomly permuted data from the number of correctly cast votes for the original OOB data. The average of this value over all the forest is the raw importance score for the variable [5,6,11]. If the values of this score from tree to tree are independent, then the standard error can be computed by a standard computation [12]. The correlations of these scores between trees have been computed for a number of data sets and proved to be quite low [5,6,11]. Therefore, we compute standard errors in the classical way: divide the raw score by its standard error to get a z-score, and assign a significance level to the z-score assuming normality [5,6,11]. 15.2.1.3 Proximities After a tree is grown all the data are passed through it. If cases k and n are in the same terminal node, their proximity is increased by one. The proximity measure can be used

Signal and Image Processing for Remote Sensing

330

(directly or indirectly) to visualize high dimensional data [5,6,11]. As the proximities are indicators on the ‘‘distance’’ to other samples this measure can be used to detect outliers in the sense that an outlier is ‘‘far’’ from all other samples.

15.3

The Building Blocks of Random Forests

Random forests are made up of several trees or building blocks. The building blocks considered here are CART, which partition the input data, and the BHC trees, which partition the labels (the output).

15.3.1

Classification and Regression Tree

CART is a decision tree where splits are made on a variable/feature/dimension resulting in the greatest change in impurity or minimum impurity given a split on a variable in the data set at a node in the tree [12]. The growing of a tree is maintained until either the change in impurity has stopped or is below some bound or the number of samples left to split is too small according to the user. CART trees are easily overtrained, so a single tree is usually pruned to increase its generality. However, a collection of unpruned trees, where each tree is trained to its fullest on a subset of the training data to diversify individual trees can be very useful. When collected in a multi-classifier ensemble and trained using the random forest algorithm, these are called RF-CART.

15.3.2

Binary Hierarchy Classifier Trees

A binary hierarchy of classifiers, where each node is based on a split regarding labels and output instead of input as in the CART case, are naturally organized in trees and can as such be combined, under similar rules as the CART trees, to form RF-BHC. In a BHC, the best split on each node is based on (meta-) class separability starting with a single metaclass, which is split into two meta-classes and so on; the true classes are realized in the leaves. Simultaneously to the splitting process, the Fisher discriminant and the corresponding projection are computed, and the data are projected along the Fisher direction [12]. In ‘‘Fisher space,’’ the projected data are used to estimate the likelihood of a sample belonging to a meta-class and from there the probabilities of a true class belonging to a meta-class are estimated and used to update the Fisher projection. Then, the data are projected using this updated projection and so forth until a user-supplied level of separation is acquired. This approach utilizes natural class affinities in the data, that is, the most natural splits occur early in the growth of the tree [13]. A drawback is the possible instability of the split algorithm. The Fisher projection involves an inverse of an estimate of the within-class covariance matrix, which can be unstable at some nodes of the tree, depending on the data being considered and so if this matrix estimate is singular (to numerical precision), the algorithm fails. As mentioned above, the BHC trees can be combined to an RF-BHC where the best splits on classes are performed on a subset of the features in the data to diversify individual trees and stabilize the aforementioned inverse. Since the number of leaves in a BHC tree is the same as the number of classes in the data set the trees themselves can be very informative when compared to CART-like trees.

Random Forest Classification of Remote Sensing Data

15.4 15.4.1

331

Different Implementations of Random Forests Random Forest: Classification and Regression Tree

The RF-CART approach is based on CART-like trees where trees are grown to minimize an impurity measure. When trees are grown using a minimum Gini impurity criterion [12], the impurity of two descendent nodes in a tree is less than the parents. Adding up the decrease in the Gini value for each variable over all the forest gives a variable importance that is often very consistent with the permutation importance measure.

15.4.2

Random Forest: Binary Hierarchical Classifier

RF-BHC is a random forest based on an ensemble of BHC trees. In the RF-BHC, a split in the tree is based on the best separation between meta-classes. At each node the best separation is found by examining m features selected at random. The value of m can be selected by trials to yield optimal results. In the case where the number of samples is small enough to induce the ‘‘curse’’ of dimensionality, m is calculated by looking to a user-supplied ratio R between the number of samples and the number of features; then m is either used unchanged as the supplied value or a new value is calculated to preserve the ratio R, whichever is smaller at the node in question [7]. An RF-BHC is uniform regarding tree size (depth) because the number of nodes is a function of the number of classes in the dataset.

15.5

Experimental Results

Random forests have many important qualities of which many apply directly to multi- or hyperspectral data. It has been shown that the volume of a hypercube concentrates in the corners and the volume of a hyper ellipsoid concentrates in an outer shell, implying that with limited data points, much of the hyperspectral data space is empty [17]. Making a collection of trees is attractive, when each of the trees looks to minimize or maximize some information content related criteria given a subset of the features. This means that the random forest can arrive at a good decision boundary without deleting or extracting features explicitly while making the most out of the training set. This ability to handle thousands of input features is especially attractive when dealing with multi- or hyperspectral data, because more often than not it is composed of tens to hundreds of features and a limited number of samples. The unbiased nature of the OOB error rate can in some cases (if not all) eliminate the need for a validation dataset, which is another plus when working with a limited number of samples. In experiments, the RF-CART approach was tested using a FORTRAN implementation of random forests supplied on a web page maintained by Leo Breiman and Adele Cutler [18].

15.5.1

Classification of a Multi-Source Data Set

In this experiment we use the Anderson River data set, which is a multi-source remote sensing and geographic data set made available by the Canada Centre for Remote Sensing

Signal and Image Processing for Remote Sensing

332

(CCRS) [16]. This data set is very difficult to classify due to a number of mixed forest type classes [15]. Classification was performed on a data set consisting of the following six data sources: 1. Airborne multispectral scanner (AMSS) with 11 spectral data channels (ten channels from 380 nm to 1100 nm and one channel from 8 mm to 14 mm) 2. Steep mode synthetic aperture radar (SAR) with four data channels (X-HH, X-HV, L-HH, and L-HV) 3. Shallow mode SAR with four data channels (X-HH, X-HV, L-HH, and L-HV) 4. Elevation data (one data channel, where elevation in meters pixel value) 5. Slope data (one data channel, where slope in degrees pixel value) 6. Aspect data (one data channel, where aspect in degrees pixel value) There are 19 information classes in the ground reference map provided by CCRS. In the experiments, only the six largest ones were used, as listed in Table 15.1. Here, training samples were selected uniformly, giving 10% of the total sample size. All other known samples were then used as test samples [15]. The experimental results for random forest classification are given in Table 15.2 through Table 15.4. Table 15.2 shows line by line, how the parameters (number of split variables m and number of trees) are selected. First, a forest of 50 trees is grown for various number of split variables, then the number yielding the highest train accuracy (OOB) is selected, and then growing more trees until the overall accuracy stops increasing is tried. The overall accuracy (see Table 15.2) was seen to be insensitive to variable settings on the interval 10–22 split variables. Growing the forest larger than 200 trees improves the overall accuracy insignificantly, so a forest of 200 trees, each of which considers all the input variables at every node, yields the highest accuracy. The OOB accuracy in Table 15.2 seems to support the claim that overfitting is next to impossible using random forests in this manner. However the ‘‘best’’ results were obtained using 22 variables so there is no random selection of input variables at each node of every tree here because all variables are being considered on every split. This might suggest that a boosting algorithm using decision trees might yield higher overall accuracies. The highest overall accuracies achieved with the Anderson River data set, known to the authors at the time of this writing, have been reached by boosting using j4.8 trees [17]. These accuracies were 100% training accuracy (vs. 77.5% here) and 80.6% accuracy for test data, which are not dramatically higher than the overall accuracies observed here (around 79.0%) with a random forest (about 1.6 percentage points difference). Therefore, even though m is not much less than the total number of variables (in fact equal), the TABLE 15.1 Anderson River Data: Information Classes and Samples Class No. 1 2 3 4 5 6

Class Description

Training Samples

Test Samples

Douglas fir (31–40 m) Douglas fir (21–40 m) Douglas fir þ Other species (31–40 m) Douglas fir þ Lodgepole pine (21–30 m) Hemlock þ Cedar (31–40 m) Forest clearings Total

971 551 548 542 317 1260 4189

1250 817 701 705 405 1625 5503

Random Forest Classification of Remote Sensing Data

333

TABLE 15.2 Anderson River Data: Selecting m and the Number of Trees Trees

Split Variables

Runtime (min:sec)

50 1 00:19 50 5 00:20 50 10 00:22 50 15 00:22 50 20 00:24 50 22 00:24 100 22 00:38 200 22 01:06 400 22 02:06 1000 22 05:09 100 10 00:32 200 10 00:52 400 10 01:41 1000 10 04:02 22 split variables selected as the ‘‘best’’ choice

OOB acc. (%)

Test Set acc. (%)

68.42 74.00 75.89 76.30 76.01 76.63 77.18 77.51 77.56 77.68 76.65 77.04 77.54 77.66

71.58 75.74 77.63 78.50 78.14 78.10 78.56 79.01 78.81 78.87 78.39 78.34 78.41 78.25

random forest ensemble performs rather well, especially when running times are taken into consideration. Here, in the random forest, each tree is an expert on a subset of the data but all the experts look to the same number of variables and do not, in the strictest sense, utilize the strength of random forests. However, the fact remains that the results are among the best ones for this data set. The training and test accuracies for the individual classes using random forests with 200 trees and 22 variables at each node are given in Table 15.3 and Table 15.4, respectively. From these tables, it can be seen that the random forest yields the highest accuracies for classes 5 and 6 but the lowest for class 2, which is in accordance with the outlier analysis below. A variable importance estimate for the training data can be seen in Figure 15.1, where each data channel is represented by one variable. The first 11 variables are multi-spectral data, followed by four steep-mode SAR data channels, four shallow-mode synthetic aperture radar, and then elevation, slope, and aspect measurements, one channel each. It is interesting to note that variable 20 (elevation) is the most important variable, followed by variable 22 (aspect), and spectral channel 6 when looking at the raw importance (Figure 15.1a), but slope when looking at the z-score (Figure 15.1b). The variable importance for each individual class can be seen in Figure 15.2. Some interesting conclusions can be drawn from Figure 15.2. For example, with the exception of class 6, topographic data TABLE 15.3 Anderson River Data: Confusion Matrix for Training Data in Random Forest Classification (Using 200 Trees and Testing 22 Variables at Each Node) Class No. 1 2 3 4 5 6

1

2

3

4

5

6

%

764 75 32 11 8 81

126 289 62 3 2 69

20 38 430 11 9 40

35 8 21 423 39 16

1 1 0 42 271 2

57 43 51 25 14 1070

78.68 52.45 78.47 78.04 85.49 84.92

Signal and Image Processing for Remote Sensing

334 TABLE 15.4

Anderson River Data: Confusion Matrix for Test Data in Random Forest Classification (Using 200 Trees and Testing 22 Variables at Each Node) Class No. 1 2 3 4 5 6

1

2

3

4

5

6

%

1006 87 26 19 7 105

146 439 67 65 6 94

29 40 564 12 7 49

44 23 18 565 45 10

2 0 3 44 351 5

60 55 51 22 14 1423

80.48 53.73 80.46 80.14 86.67 87.57

(channels 20–22) are of high importance and then come the spectral channels (channels 1–11). In Figure 15.2, we can see that the SAR channels (channels 12–19) seem to be almost irrelevant to class 5, but seem to play a more important role for the other classes. They always come third after the topographic and multi-spectral variables, with the exception of class 6, which seems to be the only class where this is not true; that is, the topographic variables score lower than an SAR channel (Shallow-mode SAR channel number 17 or X-HV). These findings can then be verified by classifying the data set according to only the most important variables and compared to the accuracy when all the variables are

Raw importance 15

10

5

0

2

4

6

8

10

12

14

16

18

20

22

16

18

20

22

(a) z-score

60 40 20 0 (b)

2

4

6

8 10 12 14 Variable (dimension) number

FIGURE 15.1 Anderson river training data: (a) variable importance and (b) z-score on raw importance.

Random Forest Classification of Remote Sensing Data

335

Class 1

Class 2

4 1

3 2

0.5 1 0

5

10

15

20

0

5

10

Class 3

15

20

15

20

Class 4 4

2

3 2

1

1 0

5

10

15

20

0

5

10

Class 5

Class 6

3 4 2

3 2

1 0

1 5 10 15 20 Variable (dimension) number

0

5 10 15 20 Variable (dimension) number

FIGURE 15.2 Anderson river training data: variable importance for each of the six classes.

included. For example leaving out variable 20 should have less effect on classification accuracy in class 6 than on all the other classes. A proximity matrix was computed for the training data to detect outliers. The results of this outlier analysis are shown in Figure 15.3, where it can be seen that the data set is difficult for classification as there are several outliers. From Figure 15.3, the outliers are spread over all classes—with a varying degree. The classes with the least amount of outliers (classes 5 and 6) are indeed those with the highest classification accuracy (Table 15.3 and Table 15.4). On the other hand, class 2 has the lowest accuracy and the highest number of outliers. In the experiments, the random forest classifier proved to be fast. Using an Intelt Celeront CPU 2.20-GHz desktop, it took about a minute to read the data set into memory, train, and classify the data set, with the settings of 200 trees and 22 split variables when the FORTRAN code supplied on the random forest web site was used [18]. The running times seem to indicate a linear time increase when considering the number of trees. They are seen along with a least squares fit to a line in Figure 15.4. 15.5.1.1 The Anderson River Data Set Examined with a Single CART Tree We look to all of the 22 features when deciding a split in the RF-CART approach above, so it is of interest here to examine if the RF-CART performs any better than a single CART tree. Unlike the RF-CART, a single CART is easily overtrained. Here we prune the CART tree to reduce or eliminate any overtraining features of the tree and hence use three data sets, a training set, testing set (used to decide the level of pruning), and a validation set to estimate the performance of the tree as a classifier (Table 15.5 and Table 15.6).

Signal and Image Processing for Remote Sensing

336

Outliers in the training data 20 10 0

500

1000

1500

2000 2500 Sample number

3000

3500

Class 1

Class 2

20

20

10

10

0

200

400 600 Class 3

0

800

20

20

10

10

0

100

200

300 Class 5

400

500

0

20

20

10

10

0

4000

50 100 150 200 250 Sample number (within class)

300

0

100

200

300 Class 4

400

500

100

200

300 Class 6

400

500

200 400 600 800 1000 1200 Sample number (within class)

FIGURE 15.3 Anderson River training data: outlier analysis for individual classes. In each case, the x-axis (index) gives the number of a training sample and the y-axis the outlier measure.

Random forest running times 350 10 variables, slope: 0.235 sec per tree + 22 variables, slope: 0.302 sec per tree

300

+

Running time (sec)

250

200 150 + 100 + 50 0

+

0

200

400

600 800 Number of trees

1000

FIGURE 15.4 Anderson river data set: random forest running times for 10 and 22 split variables.

1200

Random Forest Classification of Remote Sensing Data

337

TABLE 15.5 Anderson River Data Set: Training, Test, and Validation Sets Class Description

Training Samples

Test Samples

Validation Samples

Douglas fir (31–40 m) Douglas fir (21–40 m) Douglas fir þ Other species (31–40 m) Douglas fir þ Lodgepole pine (21–30 m) Hemlock þ Cedar (31–40 m) Forest clearings Total samples

971 551 548 542 317 1260 4189

250 163 140 141 81 325 1100

1000 654 561 564 324 1300 4403

Class No. 1 2 3 4 5 6

As can be seen in Table 15.6 and from the results of the RF-CART runs above (Table 15.2), the overall accuracy is about 8 percentage points higher ( (78.8/70.81)*100 ¼ 11.3%) than the overall accuracy for the validation set in Table 15.6. Therefore, a boosting effect is present even though we need all the variables to determine a split in every tree in the RF-CART. 15.5.1.2

The Anderson River Data Set Examined with the BHC Approach

The same procedure was used to select the variable m when using the RF-BHC as in the RF-CART case. However, for the RF-BHC, the separability of the data set is an issue. When the number of randomly selected features was less than 11, it was seen that a singular matrix was likely for the Anderson River data set. The best overall performance regarding the realized classification accuracy turned out to be the same as for the RFCART approach or for m ¼ 22. The R parameter was set to 5, but given the number of samples per (meta-)class in this data set, the parameter is not necessary. This means 22 is always at least 5 times smaller than the number of samples in a (meta-)class during the growing of the trees in the RF-BHC. Since all the trees were trained using all the available features, the trees are more or less the same, the only difference is that the trees are trained on different subsets of the samples and thus the RF-BHC gives a very similar result as a single BHC. It can be argued that the RF-BHC is a more general classifier due to the nature of the error or accuracy estimates used during training, but as can be seen in Table 15.7 and Table 15.8 the differences are small, at least for this data set and no boosting effect seems to be present when using the RF-BHC approach when compared to a single BHC. TABLE 15.6 Anderson River Data Set: Classification Accuracy (%) for Training, Test, and Validation Sets Class Description Douglas fir (31–40 m) Douglas fir (21–40 m) Douglas fir þ Other species (31–40 m) Douglas fir þ Lodgepole pine (21–30 m) Hemlock þ Cedar (31–40 m) Forest clearings Overall accuracy

Training

Test

Validation

87.54 77.50 87.96 84.69 90.54 90.08 86.89

73.20 47.24 70.00 68.79 79.01 81.23 71.18

71.30 46.79 72.01 69.15 77.78 81.00 70.82

Signal and Image Processing for Remote Sensing

338 TABLE 15.7

Anderson River Data Set: Classification Accuracies in Percentage for a Single BHC Tree Classifier Class Description Douglas fir (31–40 m) Douglas fir (21–40 m) Douglas fir þ Other species (31–40 m) Douglas fir þ Lodgepole pine (21–30 m) Hemlock þ Cedar (31–40 m) Forest clearings Overall accuracy

15.5.2

Training

Test

50.57 47.91 58.94 72.32 77.60 71.75 62.54

50.40 43.57 59.49 67.23 73.58 72.80 61.02

Experiments with Hyperspectral Data

The data used in this experiment were collected in the framework of the HySens project, managed by Deutschen Zentrum fur Luft-und Raumfahrt (DLR) (the German Aerospace Center) and sponsored by the European Union. The optical sensor reflective optics system imaging spectrometer (ROSIS 03) was used to record four flight lines over the urban area of Pavia, northern Italy. The number of bands of the ROSIS 03 sensor used in the experiments is 103, with spectral coverage from 0.43 mm through 0.86 mm. The flight altitude was chosen as the lowest available for the airplane, which resulted in a spatial resolution of 1.3 m per pixel. The ROSIS data consist of nine classes (Table 15.9): The data were composed of 43923 samples, split up into 3921 training samples and 40002 for testing. Pseudo color image of the area along with the ground truth mask (training and testing samples) are shown in Figure 15.5. This data set was classified using a BHC tree, an RF-BHC, a single CART, and an RF-CART. The BCH, RF-BHC, CART, and RF-CART were applied on the ROSIS data. The forest parameters, m and R (for RF-BHC), were chosen by trials to maximize accuracies. The growing of trees was stopped when the overall accuracy did not improve using additional trees. This is the same procedure as for the Anderson River data set (see Table 15.2). For the RF-BHC, R was chosen to be 5, m chosen to be 25 and the forest was grown to only 10 trees. For the RF-CART, m was set to 25 and the forest was grown to 200 trees. No feature extraction was done at individual nodes in the tree when using the single BHC approach. TABLE 15.8 Anderson River Data Set: Classification Accuracies in Percentage for an RF-BHC, R ¼ 5, m ¼ 22, and 10 Trees Class Description Douglas fir (31–40 m) Douglas fir (21–40 m) Douglas fir þ Other species (31–40 m) Douglas fir þ Lodgepole pine (21–30 m) Hemlock þ Cedar (31–40 m) Forest clearings Overall accuracy

Training

Test

51.29 45.37 59.31 72.14 77.92 71.75 62.43

51.12 41.13 57.20 67.80 71.85 72.43 60.37

Random Forest Classification of Remote Sensing Data

339

TABLE 15.9 ROSIS University Data Set: Classes and Number of Samples Class No. 1 2 3 4 5 6 7 8 9

Class Description

Training Samples

Test Samples

Asphalt Meadows Gravel Trees (Painted) metal sheets Bare soil Bitumen Self-blocking bricks Shadow Total samples

548 540 392 524 265 532 375 514 231 3,921

6,304 18,146 1,815 2,912 1,113 4,572 981 3,364 795 40,002

Classification accuracies are presented in Table 15.11 through Table 15.14. As in the single CART case for the Anderson River data set, approximately 20% of the samples in the original test set were randomly sampled into a new test set to select a pruning level for the tree, leaving 80% of the original test samples for validation as seen in Table 15.10. All the other classification methods used the training and test sets as described in Table 15.9. From Table 15.11 through Table 15.14 we can see that the RF-BHC give the highest overall accuracies of the tree methods where the single BHC, single CART, and the RFCART methods yielded lower and comparable overall accuracies. These results show that using many weak learners as opposed to a few stronger ones is not always the best choice in classification and is dependent on the data set. In our experience the RF-BHC approach is as accurate or more accurate than the RF-CART when the data set consists of moder-

Class color

(a) University ground truth

(b) University pseudo color (Gray) image

bg 1 2 3 4 5 6 7 8 9 FIGURE 15.5 ROSIS University: (a) reference data and (b) gray scale image.

Signal and Image Processing for Remote Sensing

340 TABLE 15.10

ROSIS University Data Set: Train, Test, and Validation Sets Class No. 1 2 3 4 5 6 7 8 9

Class Description

Training Samples

Test Samples

Validation Samples

Asphalt Meadows Gravel Trees (Painted) metal sheets Bare soil Bitumen Self-blocking bricks Shadow Total samples

548 540 392 524 265 532 375 514 231 3,921

1,261 3,629 363 582 223 914 196 673 159 8,000

5,043 14,517 1,452 2,330 890 3,658 785 2,691 636 32,002

ately to highly separable (meta-)classes, but for difficult data sets the partitioning algorithm used for the BHC trees can fail to converge (inverse of the within-class covariance matrix becomes singular to numerical precision) and thus no BHC classifier can be realized. This is not a problem when using the CART trees as the building blocks partition the input and simply minimize an impurity measure given a split on a node. The classification results for the single CART tree (Table 15.11), especially for the two classes gravel and bare-soil, may be considered unacceptable when compared to the other methods that seem to yield more balanced accuracies for all classes. The classified images for the results given in Table 15.12 through Table 15.14 are shown in Figure 15.6a through Figure 15.6d. Since BHC trees are of fixed size regarding the number of leafs it is worth examining the tree in the single case (Figure 15.7). Notice the siblings on the tree (nodes sharing a parent): gravel (3)/shadow (9), asphalt (1)/bitumen (7), and finally meadows (2)/bare soil (6). Without too much stretch of the imagination, one can intuitively decide that these classes are related, at least asphalt/ bitumen and meadows/bare soil. When comparing the gravel area in the ground truth image (Figure 15.5a) and the same area in the gray scale image (Figure 15.5b), one can see it has gray levels ranging from bright to relatively dark, which might be interpreted as an intuitive relation or overlap between the gravel (3) and the shadow (9) classes. The selfblocking bricks (8) are the class closest to the asphalt-bitumen meta-class, which again looks very similar in the pseudo color image. So the tree more or less seems to place ‘‘naturally’’ TABLE 15.11 Single CART: Training, Test, and Validation Accuracies in Percentage for ROSIS University Data Set Class Description Asphalt Meadows Gravel Trees (Painted) metal sheets Bare soil Bitumen Self-blocking bricks Shadow Overall accuracy

Training

Test

Validation

80.11 83.52 0.00 88.36 97.36 46.99 85.07 84.63 96.10 72.35

70.74 75.48 0.00 97.08 91.03 24.73 82.14 92.42 100.00 69.59

72.24 75.80 0.00 97.00 86.07 26.60 80.38 92.27 99.84 69.98

Random Forest Classification of Remote Sensing Data

341

TABLE 15.12 BHC: Training and Test Accuracies in Percentage for ROSIS University Data Set Class Asphalt Meadows Gravel Trees (Painted) metal sheets Bare soil Bitumen Self-blocking bricks Shadow Overall accuracy

Training

Test

78.83 93.33 72.45 91.60 97.74 94.92 93.07 85.60 94.37 88.52

69.86 55.11 62.92 92.20 94.79 89.63 81.55 88.64 96.35 69.83

TABLE 15.13 RF-BHC: Training and Test Accuracies in Percentage for ROSIS University Data Set Class Asphalt Meadows Gravel Trees (Painted) metal sheets Bare soil Bitumen Self-blocking bricks Shadow Overall accuracy

Training

Test

76.82 84.26 59.95 88.36 100.00 75.38 92.53 83.07 96.10 82.53

71.41 68.17 51.35 95.91 99.28 78.85 87.36 92.45 99.50 75.16

TABLE 15.14 RF-CART: Train and Test Accuracies in Percentage for ROSIS University Data Set Class Asphalt Meadows Gravel Trees (Painted) metal sheets Bare soil Bitumen Self-blocking bricks Shadow Overall accuracy

Training

Test

86.86 90.93 76.79 92.37 99.25 91.17 88.80 83.46 94.37 88.75

80.36 54.32 46.61 98.73 99.01 77.60 78.29 90.64 97.23 69.70

Signal and Image Processing for Remote Sensing

342 (a) Single BHC

(b) RF-BHC

(c) Single CART

(d) RF-CART

Colorbar indicating the color of information classes in images above

1

2

3

4

5 6 Class Number

7

8

9

FIGURE 15.6 ROSIS University: image classified by (a) single BHC, (b) RF-BHC, (c) single CART, and (d) RF-CART.

related classes close to one another in the tree. That would mean that classes 2, 6, 4, and 5 are more related to each other than to classes 3, 9, 1, 7, or 8. On the other hand, it is not clear if (painted) metal sheets (5) are ‘‘naturally’’ more related to trees (4) than to bare soil (6) or asphalt (1). However, the point is that the partition algorithm finds the ‘‘clearest’’ separation between meta-classes. Therefore, it may be better to view the tree as a separation hierarchy rather than a relation hierarchy. The single BHC classifier finds that class 5 is the most separable class within the first right meta-class of the tree, so it might not be related to meta-class 2–6–4 in any ‘‘natural’’ way, but it is more separable along with these classes when the whole data set is split up to two meta-classes.

52/48

30/70

86/14

67/33

64/36

63/37 FIGURE 15.7 The BHC tree used for classification of Figure 15.5 with left/right probabilities (%).

3

51/49

60/40

9

1

7

8

2

6

4

5

Random Forest Classification of Remote Sensing Data

15 .6

343

Conc lusi ons

The use of random forests for classification of multi-source remote sensing data and hyperspectral remote sensing data has been discussed. Random forests should be considered attractive for classification of both data types. They are both fast in training and classification, and are distribution-free classifiers. Furthermore, the problem with the curse of dimensionality is naturally addressed by the selection of a low m, without having to discard variables and dimensions completely. The only parameter random forests are truly sensitive to is the number of variables m, the nodes in every tree draw at random during training. This parameter should generally be much smaller than the total number of available variables, although selecting a high m can yield good classification accuracies, as can be seen above for the Anderson River data (Table 15.2). In experiments, two types of random forests were used, that is, random forests based on the CART approach and random forests that use BHC trees. Both approaches performed well in experiments. They gave excellent accuracies for both data types and were shown to be very fast.

Acknowledgment This research was supported in part by the Research Fund of the University of Iceland and the Assistantship Fund of the University of Iceland. The Anderson River SAR/MSS data set was acquired, preprocessed, and loaned by the Canada Centre for Remote Sensing, Department of Energy Mines and Resources, Government of Canada.

References 1. L.K. Hansen and P. Salamon, Neural network ensembles, IEEE Transactions on Pattern Analysis and Machine Intelligence, 12, 993–1001, 1990. 2. L.I. Kuncheva, Fuzzy versus nonfuzzy in combining classifiers designed by Boosting, IEEE Transactions on Fuzzy Systems, 11, 1214–1219, 2003. 3. J.A. Benediktsson and P.H. Swain, Consensus Theoretic Classification Methods, IEEE Transactions on Systems, Man and Cybernetics, 22(4), 688–704, 1992. 4. S. Haykin, Neural Networks, A Comprehensive Foundation, 2nd ed., Prentice-Hall, Upper Saddle River, NJ, 1999. 5. L. Breiman, Bagging predictors, Machine Learning, 24I(2), 123–140, 1996. 6. Y. Freund and R.E. Schapire: Experiments with a new boosting algorithm, Machine Learning: Proceedings of the Thirteenth International Conference, 148–156, 1996. 7. J. Ham, Y. Chen, M.M. Crawford, and J. Ghosh, Investigation of the random forest framework for classification of hyperspectral data, IEEE Transactions on Geoscience and Remote Sensing, 43(3), 492–501, 2005. 8. S.R. Joelsson, J.A. Benediktsson, and J.R. Sveinsson, Random forest classifiers for hyperspectral data, IEEE International Geoscience and Remote Sensing Symposium (IGARSS 0 05), Seoul, Korea, 25–29 July 2005, pp. 160–163. 9. P.O. Gislason, J.A. Benediktsson, and J.R. Sveinsson, Random forests for land cover classification, Pattern Recognition Letters, 294–300, 2006.

344

Signal and Image Processing for Remote Sensing

10. L. Breiman, Random forests, Machine Learning, 45(1), 5–32, 2001. 11. L. Breiman, Random forest, Readme file. Available at: http://www.stat.berkeley.edu/~ briman/ RandomForests/cc.home.htm Last accessed, 29 May, 2006. 12. R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification, 2nd ed., John Wiley & Sons, New York, 2001. 13. S. Kumar, J. Ghosh, and M.M. Crawford, Hierarchical fusion of multiple classifiers for hyperspectral data analysis, Pattern Analysis & Applications, 5, 210–220, 2002. 14. http://oz.berkeley.edu/users/breiman/RandomForests/cc_home.htm (Last accessed, 29 May, 2006.) 15. G.J. Briem, J.A. Benediktsson, and J.R. Sveinsson, Multiple classifiers applied to multisource remote sensing data, IEEE Transactions on Geoscience and Remote Sensing, 40(10), 2291–2299, 2002. 16. D.G. Goodenough, M. Goldberg, G. Plunkett, and J. Zelek, The CCRS SAR/MSS Anderson River data set, IEEE Transactions on Geoscience and Remote Sensing, GE-25(3), 360–367, 1987. 17. L. Jimenez and D. Landgrebe, Supervised classification in high-dimensional space: Geometrical, statistical, and asymptotical properties of multivariate data, IEEE Transactions on Systems, Man, and Cybernetics, Part. C, 28, 39–54, 1998. 18. http://www.stat.berkeley.edu/users/breiman/RandomForests/cc_software.htm (Last accessed, 29 May, 2006.)

16 Supervised Image Classification of Multi-Spectral Images Based on Statistical Machine Learning

Ryuei Nishii and Shinto Eguchi

CONTENTS 16.1 Introduction ..................................................................................................................... 346 16.2 AdaBoost .......................................................................................................................... 346 16.2.1 Toy Example in Binary Classification ........................................................... 347 16.2.2 AdaBoost for Multi-Class Problems .............................................................. 348 16.2.3 Sequential Minimization of Exponential Risk with Multi-Class .............. 348 16.2.3.1 Case 1................................................................................................. 349 16.2.3.2 Case 2................................................................................................. 349 16.2.4 AdaBoost Algorithm ........................................................................................ 350 16.3 LogitBoost and EtaBoost................................................................................................ 350 16.3.1 Binary Class Case.............................................................................................. 350 16.3.2 Multi-Class Case ............................................................................................... 351 16.4 Contextual Image Classification................................................................................... 352 16.4.1 Neighborhoods of Pixels.................................................................................. 352 16.4.2 MRFs Based on Divergence ............................................................................ 353 16.4.3 Assumptions ...................................................................................................... 353 16.4.3.1 Assumption 1 (Local Continuity of the Classes) ........................ 353 16.4.3.2 Assumption 2 (Class-Specific Distribution) ................................ 353 16.4.3.3 Assumption 3 (Conditional Independence) ................................ 354 16.4.3.4 Assumption 4 (MRFs) ..................................................................... 354 16.4.4 Switzer’s Smoothing Method.......................................................................... 354 16.4.5 ICM Method....................................................................................................... 354 16.4.6 Spatial Boosting................................................................................................. 355 16.5 Relationships between Contextual Classification Methods..................................... 356 16.5.1 Divergence Model and Switzer’s Model....................................................... 356 16.5.2 Error Rates ......................................................................................................... 357 16.5.3 Spatial Boosting and the Smoothing Method .............................................. 358 16.5.4 Spatial Boosting and MRF-Based Methods .................................................. 359 16.6 Spatial Parallel Boost by Meta-Learning..................................................................... 359 16.7 Numerical Experiments ................................................................................................. 360 16.7.1 Legends of Three Data Sets............................................................................. 361 16.7.1.1 Data Set 1: Synthetic Data Set........................................................ 361 16.7.1.2 Data Set 2: Benchmark Data Set grss_dfc_0006 .......................... 361 16.7.1.3 Data Set 3: Benchmark Data Set grss_dfc_0009 .......................... 361 16.7.2 Potts Models and the Divergence Models.................................................... 361 16.7.3 Spatial AdaBoost and Its Robustness ............................................................ 363 345

Signal and Image Processing for Remote Sensing

346

16.7.4 Spatial AdaBoost and Spatial LogitBoost ..................................................... 365 16.7.5 Spatial Parallel Boost ........................................................................................ 367 16.8 Conclusion ....................................................................................................................... 368 Acknowledgment ....................................................................................................................... 370 References ................................................................................................................................... 370

16.1

Introduction

Image classification for geostatistical data is one of the most important issues in the remote-sensing community. Statistical approaches have been discussed extensively in the literature. In particular, Markov random fields (MRFs) are used for modeling distributions of land-cover classes, and contextual classifiers based on MRFs exhibit efficient performances. In addition, various classification methods were proposed. See Ref. [3] for an excellent review paper on classification. See also Refs. [1,4–7] for a general discussion on classification methods, and Refs. [8,9] for backgrounds on spatial statistics. In a paradigm of supervised learning, AdaBoost was proposed as a machine learning technique in Ref. [10] and has been widely and rapidly improved for use in pattern recognition. AdaBoost linearly combines several weak classifiers into a strong classifier. The coefficients of the classifiers are tuned by minimizing an empirical exponential risk. The classification method exhibits high performance in various fields [11,12]. In addition, fusion techniques have been discussed [13–15]. In the present chapter, we consider contextual classification methods based on statistics and machine learning. We review AdaBoost with binary class labels as well as multi-class labels. The procedures for deriving coefficients for classifiers are discussed, and robustness for loss functions is emphasized here. Next, contextual image classification methods including Switzer’s smoothing method [1], MRF-based methods [16], and spatial boosting [2,17] are introduced. Relationships among them are also pointed out. Spatial parallel boost by meta-learning for multi-source and multi-temporal data classification is proposed. The remainder of the chapter is organized as follows. In Section 16.2, AdaBoost is briefly reviewed. A simple example with binary class labels is provided to illustrate AdaBoost. Then, we proceed to the case with multi-class labels. Section 16.3 gives general boosting methods to obtain the robustness property of the classifier. Then, contextual classifiers including Switzer’s method, an MRF-based method, and spatial boosting are discussed. Relationships among them are shown in Section 16.5. The exact error rate and the properties of the MRF-based classifier are given. Section 16.6 proposes spatial parallel boost applicable to classification of multi-source and multi-temporal data sets. The methods treated here are applied to a synthetic data set and two benchmark data sets, and the performances are examined in Section 16.7. Section 16.8 concludes the chapter and mentions future problems.

16.2

AdaBoost

We begin this section with a simple example to illustrate AdaBoost [10]. Later, AdaBoost with multi-class labels is mentioned.

Supervised Image Classification of Multi-Spectral Images 16.2.1

347

Toy Example in Binary Classification

Suppose that a q-dimensional feature vector x 2 Rq observed by a supervised example labeled by þ 1 or 1 is available. Furthermore, let gk(x) be functions (classifiers) of the feature vector x into label set {þ1, 1} for k ¼ 1, 2, 3. If these three classifiers are equally efficient, a new function, sign( f1(x) þ f2(x) þ f3(x)), is a combined classifier based on a majority vote, where sign(z) is the sign of the argument z. Suppose that classifier f1 is the most reliable, f2 has the next greatest reliability, and f3 is the least reliable. Then, a new function sign (b1f1(x) þ b2f2(x) þ b3f3(x)) is a boosted classifier based on a weighted vote, where b1 > b2 > b3 are positive constants to be determined according to efficiencies of the classifiers. Constants bk are tuned by minimizing the empirical risk, which will be defined shortly. In general, let y be the true label of feature vector x. Then, label y is estimated by a signature, sign(F(x)), of a classification function F(x). Actually, if F(x) > 0, then x is classified into the class with label 1, otherwise into 1. Hence, if yF(x) < 0, vector x is misclassified. For evaluating classifier F, AdaBoost in Ref. [10] takes the exponential loss function defined by Lexp (F j x, y) ¼ exp {yF(x)}

(16:1)

The loss function Lexp(t) ¼ exp(t) vs. t ¼ yF(x) is given in Figure 16.1. Note that the exponential function assigns a heavy loss to an outlying example that is misclassified. AdaBoost is apt to overlearn misclassified examples. Let {(xi, yi) 2 Rq { þ 1, 1} j i ¼ 1, 2, . . . , n} be a set of training data. The classification function, F, is determined to minimize the empirical risk: Rexp (F) ¼

n n 1X 1X Lexp (F j xi , yi ) ¼ exp {yi F(xi )} n i ¼1 n i¼1

(16:2)

In the toy example above, F(x) is b1f1(x) þ b2f2(x) þ b3f3(x) and coefficients b1, b2, b3 are tuned by minimizing the empirical risk in Equation 16.2. A fast sequential procedure for minimizing the empirical risk is well known [11]. We will provide a new understanding of the procedure in the binary class case as well as in the multi-class case in Section 16.2.3. A typical classifier is a decision stump defined by a function d sign(xj t), where d ¼ + 1, t 2 R and xj denotes the j-th coordinate of the feature vector x. Nevertheless, each decision stump is poor. Finally, a linearly combined function of many stumps is expected to be a strong classification function.

6 5

exp logit eta 0−1

4 3 2 1 0 −1 −3

−2

−1

0

1

2

3

FIGURE 16.1 Loss functions (loss vs. yF(x)).

Signal and Image Processing for Remote Sensing

348 16.2.2

AdaBoost for Multi-Class Problems

We will give an extension of loss and risk functions to cases with multi-class labels. Suppose that there are g possible land-cover classes C1, . . . , Cg, for example, coniferous forest, broad leaf forest, and water area. Let D ¼ {1, . . . , n} be a training region with n pixels over some scene. Each pixel i in region D is supposed to belong to one of the g classes. We denote a set of all class labels by G ¼ {1, . . . , g}. Let xi 2 Rq be a q-dimensional feature vector observed at pixel i, and yi be its true label in label set G. Note that pixel i in region D is a numbered small area corresponding to the observed unit on the earth. Let F(x, k) be a classification function of feature vector x 2 Rq and label k in set G. We allocate vector x into the class with label ^ yF 2 G by the following maximizer: ^ yF ¼ arg max F(x, k) k2G

(16:3)

Typical examples of the strong classification function would be given by posterior probability functions. Let p(x j k) be a class-specific probability density function of the k-th class, Ck. Thus, the posterior probability of the label, Y ¼ k, given feature vector x, is defined by X p(k j x) ¼ p(x j k)= p(x j ‘) with k 2 G (16:4) ‘2G

which gives a strong classification function, where the prior distribution of class Ck is assumed to be uniform. Note that the label estimated by posteriors p(k j x), or equivalently by log posteriors log p(k j x), is just the Bayes rule of classification. Note also that p(k j x) is a measure of the confidence of the current classification and is closely related to logistic discriminant functions [18]. Let y 2 G be the true label of feature vector x and F(x) a classification function. Then, the loss by misclassification into class label k is assessed by the following exponential loss function: Lexp (F, k j x, y) ¼ exp {F(x, k) F(x, y)}

for k 6¼ y with k 2 G

(16:5)

This is an extension of the exponential loss (Equation 16.1) with binary classification. The empirical risk is defined by averaging the loss functions over the training data set {(xi, yi) 2 Rq G j i 2 D} as Rexp (F) ¼

1XX 1XX Lexp (F, k j xi , yi ) ¼ exp {F(xi , k) F(xi , yi )} n i2D k6¼y n i2D k6¼y i

(16:6)

i

AdaBoost determines the classification function F to minimize exponential risk Rexp(F), in which F is a linear combination of base functions. 16.2.3

Sequential Minimization of Exponential Risk with Multi-Class

Let f and F be fixed classification functions. Then, we obtain the optimal coefficient, b*, which gives the minimum value of empirical risk Remp(F þ b f): b* ¼ arg min {Rexp (F þ bf )}; b2R

(16:7)

Supervised Image Classification of Multi-Spectral Images

349

Applying procedure in Equation 16.7 sequentially, we combine classifiers f1, f2, . . . , fT as F(0) 0, F(1) ¼ b1 f1 , F(2) ¼ b1 f1 þ b2 f2 , . . . , F(T) ¼ b1 f1 þ b2 f2 þ þ bT fT where bT is defined by the formula given in Equation 16.7 with F ¼ F(t 16.2.3.1

1)

and f ¼ ft.

Case 1

Suppose that function f(, k) takes values 0 or 1, and it takes 1 only once. In this case, coefficient b in Equation 16.7 is given by the closed form as follows. Let ^yi,f be the label of * pixel i selected by classifier f, and Df be a subset of D classified correctly by f. Define Vi (k) ¼ F(xi , k) F(xi , yi ) and

vi (k) ¼ f (xi , k) f (xi , yi )

(16:8)

Then, we obtain Rexp (F þ bf ) ¼

n X X

exp [Vi (k) þ bvi (k)]

i¼1 k6¼yi

X X exp Vj (^yjf ) þ exp {Vi (k)} i2Df k6¼yi j62Df j62Df k6¼yi , ^yjf sﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ XX X X X ^2 exp {Vi (k)} exp {Vj (^yjf )} þ exp {Vj (k)} (16:9) i2Df k2yi j62Df j62Df k6¼yi , ^yjf ¼ eb

XX

exp {Vi (k)} þ eb

X

The last inequality is due to the relationship between the arithmetic and the geometric means. The equality holds if and only if b ¼ b , where * 2 3 X X X 1 (16:10) exp {Vi (k)}= exp {Vj (^yjf )}5 b ¼ log4 * 2 i2D k6¼y j62D f

f

i

The optimal coefficient, b , can be expressed as *

1 1 «F (f ) b ¼ log * 2 «F (f )

with «F (f ) ¼ P

P

exp Vj (^yjf )

j62Df

P P exp Vj (^yjf ) þ expfVi (k)g

j62Df

(16:11)

i2Df k6¼yi

In the binary class case, «F( f) coincides with the error rate of classifier f. 16.2.3.2

Case 2

If f(, k) takes real values, there is no closed form of coefficient b*. We must perform an iterative procedure for the optimization of risk Remp. Using the Newton-like method, we update estimate b(t) at the t-th step as follows: b(tþ1) ¼ b(t)

n X X i¼1 k6¼yi

vi (k) exp [Vi (k) þ b(t) vi (k)]=

n X X

v2i (k) exp [Vi (k) þ b(t) vi (k)]

i¼1 k6¼yi

(16:12)

Signal and Image Processing for Remote Sensing

350

where vi(k) and Vi(k) are defined in the formulas in Equation 16.8. We observe that the convergence of the iterative procedure starting from b(0) ¼ 0 is very fast. In numerical examples in Section 16.7, the procedure converges within five steps in most cases. 16.2.4

AdaBoost Algorithm

Now, we summarize an iterative procedure of AdaBoost for minimizing the empirical exponential risk. Let {F} ¼ {f : Rq ! G} be a set of classification functions, where G ¼ {1, . . . , g} is the label set. AdaBoost combines classification functions as follows: .

.

.

.

Find classification function f in F and coefficient b that jointly minimize empirical risk Rexp( bf ) defined in Equation 16.6, for example, f1 and b1. Consider empirical risk Rexp(b1f1 þ bf) with b1f1 given from the previous step. Then, find classification function f 2 {F} and coefficient b that minimize the empirical risk, for example, f2 and b2. This procedure is repeated T-times and the final classification function FT ¼ b1 f1 þ þ bTfT is obtained. Test vector x 2 Rq is classified into the label maximizing the final function FT(x, k) with respect to k 2 G.

Substituting exponential risk Rexp for other risk functions, we have different classification methods. Risk functions Rlogit and Reta will be defined in the next section.

16.3

Logi tBoost and EtaBoost

AdaBoost was originally designed to combine weak classifiers for deriving a strong classifier. However, if we combine strong classifiers with AdaBoost, the exponential loss assigns an extreme penalty for misclassified data. It is well known that AdaBoost is not robust. In the multi-class case, this seems more serious than the binary class case. Actually, this is confirmed by our numerical example in Section 16.7.3. In this section, we consider robust classifiers derived by a loss function that is more robust than the exponential loss function.

16.3.1

Binary Class Case

Consider binary class problems such that feature vector x with true label y 2 {1,1} is classified into class label sign (F(x)). Then, we take the logit and the eta loss functions defined by Llogit (Fjx, y) ¼ log [1 þ exp {yF(x)}] Leta (F j x, y) ¼ (1 h) log [1 þ exp {yF(x)}] þ h{yF(x)}

(16:13) for 0 < h < 1

(16:14)

The logit loss function is derived by the log posterior probability of a binomial distribution. The eta loss function, an extension of the logit loss, was proposed by Takenouchi and Eguchi [19].

Supervised Image Classification of Multi-Spectral Images

351

Three loss functions given in Figure 16.1 are defined as follows: Lexp (t) ¼ exp (t), Llogit (t) ¼ log {1 þ exp (2t)} log 2 þ 1 and Leta (t) ¼ (1=2)Llogit (t) þ (1=2)(t)

(16:15)

We see that the logit and the eta loss functions assign less penalty for misclassified data than the exponential loss function does. In addition, the three loss functions are convex and differentiable with respect to t. The convexity assures the uniqueness of the coefficient minimizing Remp(F þ bf ) with respect to b, where Remp denotes an empirical risk function under consideration. The convexity makes the sequential minimization of the empirical risk feasible. For corresponding empirical risk functions, we define the empirical risks as follows:

Rlogit (F) ¼

Reta (F) ¼

16.3.2

n 1X log [1 þ exp {yi F(xi )}], and n i¼1

n n 1h X hX log [1 þ exp {yi F(xi )}] þ {yi F(xi )} n i¼1 n i¼1

(16:16)

(16:17)

Multi-Class Case

Let y be the true label of feature vector x, and F(x, k) a classification function. Then, we define the following function in a similar manner to that of posterior probabilities: plogit (y j x) ¼ P

exp {F(x, y)} k2G exp {F(x, k)}

Using the function, we define the loss functions in the multi-class case as follows: Llogit (F j x, y) ¼ log plogit (y j x) and Leta (F j x, y) ¼ {1 (g 1)h}{ log plogit (y j x)} þ h

X k6¼y

log plogit (k j x)

where h is a constant with 0 < h < 1/(g1). Then empirical risks are defined by the average of the loss functions evaluated by training data set {(xi, yi) 2 Rq Gji 2 D} as Rlogit (F) ¼

n 1X Llogit (F j xi , yi ) and n i¼1

Reta (F) ¼

n 1X Leta (F j xi , yi ) n i¼1

(16:18)

LogitBoost and EtaBoost aim to minimize logit risk function Rlogit(F) and eta risk function Reta(F), respectively. These risk functions are expected to be more robust than the exponential risk function. Actually, EtaBoost is more robust than LogitBoost in the presence of mislabeled training examples.

Signal and Image Processing for Remote Sensing

352

16 .4

Context ual I ma ge Clas sific ation

Ordinary classifiers proposed for independent samples are of course utilized for image classification. However, it is known that contextual classifiers show better performance than noncontextual classifiers. In this section, contextual classifiers: the smoothing method by Switzer [1], the MRF-based classifiers, and spatial boosting [17], will be discussed.

16.4.1

Neighborhoods of Pixels

In this subsection, we define notations related to observations and two sorts of neighborhoods. Let D ¼ {1, . . . , n} be an observed area consisting of n pixels. A q-dimensional feature vector and its observation at pixel i are denoted as Xi and xi, respectively, for i in area D. The class label covering pixel i is denoted by random variable Yi, where Yi takes an element in the label set G ¼ {1, . . . , g}. All feature vectors are expressed in vector form as X ¼ (XT1 , . . . , XTn )T : nq 1

(16:19)

In addition, we define random label vectors as Y ¼ (Y1 , . . . , Yn )T : n 1

and

Yi ¼ Y with deleted Yi : (n 1) 1

(16:20)

Recall that class-specific density functions are defined by p(x j k) with x 2 Rq for deriving the posterior distribution in Equation 16.4. In the numerical study in Section 16.7, the densities are fitted by homoscedastic q-dimensional Gaussian distributions, Nq(m(k), S), with common variance–covariance matrix S, or heteroscedastic Gaussian distributions, Nq(m(k), Sk), with class-specific variance–covariance matrices Sk. Here, we define neighborhoods to provide contextual information. Let d(i, j) denote the distance between centers of pixels i and j. Then, we define two kinds of neighborhoods of pixel i as follows: Ur (i) ¼ { j 2 D j d(i, j) ¼ r} and Nr (i) ¼ {j 2 D j % d(i, j) % r} (16:21) pﬃﬃﬃ where r ¼ 1, 2, 2, . . ., which denotes the radius of the neighborhood. pﬃﬃﬃNote that subset Ur(i) constitutes an isotropic ring region. Subsets Ur(i) with r ¼ 0, 1, 2, 2 are shown in Figure 16.2. Here, we find that U0(i) ¼ {i}, N1(i) ¼ U1(i) is the first-order neighborhood, and Npﬃﬃ2 (i) ¼ U1 (i) [ Upﬃﬃ2 (i) forms the second-order neighborhood of pixel i. In general, we have Nr(i) ¼ [1 % r0 % r Ur0 (i) for r ^ 1.

(a) r = 0

i

(b) r = 1

i

FIGURE 16.2 Isotropic neighborhoods Ur(i) with center pixel i and radius r.

(c) r = √2

i

(d) r = 2

i

Supervised Image Classification of Multi-Spectral Images 16.4.2

353

MRFs Based on Divergence

Here, we will discuss the spatial distribution of the classes. A pairwise dependent MRF is an important model for specifying the field. Let D(k, ‘) > 0 be a divergence between two classes, Ck and C‘ (k 6¼ ‘), and put D(k, k) ¼ 0. The divergence is employed for modeling the MRF. In Potts model, D(k, ‘) is defined by D0 (k, ‘): ¼ 1 if k 6¼ ‘:¼ 0 otherwise. Nishii [18] proposed to take the squared Mahalanobis distance between homoscedastic Gaussian distributions Nq(m(k), S) defined by D1 (m(k), m(‘)) ¼ {m(k) m(‘)}T S1 {m(k) m(‘)}

(16:22) Ð

Nishii and Eguchi (2004) proposed to take Jeffreys divergence {p(x j k) p(xj‘)} log{p(x j k)/p(x j ‘)}dx between densities p(x j k). The models are called divergence models. Let Di(g) be the average of divergences in the neighborhood Nr(i) defined by Equation 16.21 as follows: ( 1 P D(k, yj ), if jNr (i) j 1 j Nr (i)j Di (k) ¼ for (i, k) 2 D G (16:23) j2Nr (i) 0 otherwise where jSj denotes the cardinality of set S. Then, random variable Yi conditional on all the other labels Yi ¼ yi is assumed to follow a multinomial distribution with the following probabilities: exp {bDi (k)} Pr{Yi ¼ k j Yi ¼ yi } ¼ P exp {bDi (‘)}

for k 2 G

(16:24)

‘2G

Here, b is a non-negative constant called the clustering parameter, or the granularity of the classes, and Di(k) is defined by the formula given in Equation 16.23. Parameter b characterizes the degree of the spatial dependency of the MRF. If b ¼ 0, then the classes are spatially independent. Here, radius r of neighborhood Ur(i) denotes the extent of spatial dependency. Of course, b, as well as r, are parameters that need to be estimated. Due to the Hammersley–Clifford theorem, conditional distribution in Equation 16.24 is known to specify the distribution of test label vector, Y, under the mild condition. The joint distribution of test labels, however, cannot be obtained in a closed form. This causes a difficulty in estimating the parameters specifying the MRF. Geman and Geman [6] developed a method for the estimation of test labels by simulated annealing. However, the procedure is time consuming. Besag [4] proposed an iterative conditional mode (ICM) method, which is reviewed in Section 16.4.5. 16.4.3

Assumptions

Now, we make the following assumptions for deriving classifiers. 16.4.3.1 Assumption 1 (Local Continuity of the Classes) If a class label of a pixel is k 2 G, then pixels in the neighborhood have the same class label k. Furthermore, this is true for any pixel. 16.4.3.2 Assumption 2 (Class-Specific Distribution) A feature vector of a sample from class Ck follows a class-specific probability density function p(x j k) for label k in G.

Signal and Image Processing for Remote Sensing

354 16.4.3.3

Assumption 3 (Conditional Independence)

The conditional distribution of vector X in Equation 16.19 given label vector Y ¼ y in Equation 16.20 is given by Pi 2 D p(xi j yi). 16.4.3.4

Assumption 4 (MRFs)

Label vector Y defined by Equation 16.20 follows an MRF specified by divergence (quasidistance) between the classes.

16.4.4

Switzer’s Smoothing Method

Switzer [1] derived the contextual classification method (the smoothing method) under Assumptions 1–3 with homoscedastic Gaussian distributions Nq(m(k), S). Let c(x j k) be its probability density function. Assume that Assumption 1 holds for neighborhoods Nr(). Then, he proposed to estimate label yi of pixel i by maximizing the following joint probability densities: c(xi j k) Pj2Nr (i) c(xj j k)

c(x j k) (2p)q=2 j Sj1=2 exp {D1 (x, m(k))=2}

with respect to label k 2 G, where D1(,) stands for the squared Mahalanobis distance in Equation 16.22. The maximization problem is equivalent to minimizing the following quantity: X D1 (xi , m(k)) þ D1 (xj , m(k)) (16:25) j2Nr (i)

Obviously, Assumption 1 does not hold for the whole image. However, the method still exhibits good performance, and the classification is performed very quickly. Thus, the method is a pioneering work of contextual image classification.

16.4.5

ICM Method

Under Assumptions 2–4 with conditional distribution in Equation 16.24, the posterior probability of Yi ¼ k given feature vector X ¼ x and label vector Yi ¼ yi is expressed by exp {bDi (k)}p(xi j k) pi (k j r, b) Pr{Yi ¼ k j X ¼ x, Yi ¼ yi } ¼ P exp {bDi (‘)gp(xi j ‘)

(16:26)

‘2G

Then, the posterior probability Pr{Y ¼ y j X ¼ x} of label vector y is approximated by the pseudo-likelihood PL(y j r, b) ¼

n Y

pi (yi j r, b)

(16:27)

i¼1

where posterior probability pi (yi j r, b) is defined by Equation 16.26. Pseudo-likelihood in Equation 16.27 is used for accuracy assessment of the classification as well as for parameter estimation. Here, class-specific densities p(x j k) are estimated using the training data.

Supervised Image Classification of Multi-Spectral Images

355

When radius r and clustering parameter b are given, the optimal label vector y, which maximizes the pseudo-likelihood PL(y j r, b) defined by Equation 16.27, is usually estimated by the ICM procedure [4]. At the (t þ 1)-st step, ICM finds the optimal label, yi(t þ 1), (t) for all test pixels i 2 D. This procedure is repeated until the convergence of the given yi label vector, for example y ¼ y(r, b) : n 1. Furthermore, we must optimize a pair of parameters (r, b) by maximizing pseudo-likelihood PL(y(r, b) j r, b). 16.4.6

Spatial Boosting

As shown in Section 16.2, AdaBoost combines classification functions defined over the feature space. Of course, the classifiers give noncontextual classification. We extend AdaBoost to build contextual classification functions, which we call spatial AdaBoost. Define an averaged logarithm of the posterior probabilities (Equation 16.4) in neighborhood Ur(i) (Equation 16.21) by 8 X < 1 pﬃﬃﬃ log p(k j xj ) if jUr (i) j 1 j Ur (i)j j2U (i) for r ¼ 0, 1, 2, . . . (16:28) fr (x, k j i) ¼ r : 0 otherwise where x ¼ (x1T, . . . , xnT)T : qn 1. Therefore, the averaged log posterior f0 (x, k j i) with radius r ¼ 0 is equal to the log posterior log p(k j xi) itself. Hence, the classification due to function f0(x, k j i) is equivalent to a noncontextual classification based on the maximum-aposteriori (MAP) criterion. If the spatial dependency among the classes is not negligible, then the averaged log posteriors f1(x, k j i) in the first-order neighborhood may have information for classification. If the spatial dependency becomes stronger, then fr(x, k j i) with a larger r is also useful. Thus, we adopt the average of the log posteriors fr(x, k j i) as a classification function of center pixel i. The efficiency of the averaged log posteriors as classification functions would be intuitively arranged in the following order: f0 (x, k j i), f1 (x, k j i), fp2ﬃﬃ (x, k j i), f2 (x, k j i), , where x ¼ (xT1 , , xTn )T

(16:29)

The coefficients for the above classification functions can be tuned by minimizing the empirical risk given by Equation 16.6 or Equation 16.18. See Ref. [2] for possible candidates for contextual classification functions. The following is the contextual classification procedure based on the spatial boosting method. .

.

Fix an empirical risk function, Remp(F), of classification function F evaluated over training data set {(xi, yi) 2 Rq G j i 2 D}. Let f0 (x, k j i), f1(x, k j i), fpﬃﬃ2 (x, k j i), . . . , fr (x, k j i) be the classification functions defined by Equation 16.28.

.

Find coefficient b that minimizes empirical risk Remp (bf0). Put the optimal value to b0.

.

If coefficient b0 is negative, quit the procedure. Otherwise, consider empirical risk Remp(b0f0 þ b f1) with b0 f0 obtained by the previous step. Then, find coefficient b, which minimizes the empirical risk. Put the optimal value to b1.

.

If b1 is negative, quit the procedure. Otherwise, consider empirical risk Remp (b0f0 þ b1f1 þ bfpﬃﬃ2 ). This procedure is repeated, and we obtain a sequence of positive coefficients b0, b1, . . . , br for the classification functions.

Signal and Image Processing for Remote Sensing

356

Finally, the classification function is derived by Fr (x, k j i) ¼ b0 f0 (x, k j i) þ b1 f1 (x, k j i) þ þ br fr (x, k j i), x ¼ (xT1 , . . . , xTn )T

(16:30)

Test label y* of test vector x* 2 Rq is estimated by maximizing classification function in Equation 16.30 with respect to label k 2 G. Note that the pixel is classified by the feature vector at the pixel as well as feature vectors in neighborhood Nr(i) in the test area only. There is no need to estimate labels of neighbors, whereas the ICM method requires estimated labels of neighbors and needs an iterative procedure for the classification. Hence, we claim that spatial boosting provides a very fast classifier.

16.5

Relationshi ps between Contextual Classification Methods

Contextual classifiers discussed in the chapter can be regarded as an extension of Switzer’s method from a unified viewpoint, cf. [16] and [2].

16.5.1

Divergence Model and Switzer’s Model

Let us consider the divergence model in Gaussian MRFs (GMRFs), where feature vectors follow homoscedastic Gaussian distributions Nq(m(k), S). The divergence model can be viewed as a natural extension of Switzer’s model. The image with center pixel 1 and its neighbors is shown in Figure 16.3. First-order and second-order neighborhoods of the center pixel are given by sets of pixel numbers N1(1) ¼ {2, 4, 6, 8} and Npﬃﬃ2 (1) ¼ {2, 3, . . . , 9}, respectively. We focus our attention on center pixel 1 and its neighborhood Nr(1) of size 2K in general and discuss the classification problem of center pixel 1 when labels yj of 2K neighbors are observed. ^ be a non-negative estimated value of the clustering parameter b. Then, label y1 of Let b center pixel 1 is estimated by the ICM algorithm. In this case, the estimate is derived by maximizing conditional probability (Equation 16.26) with p(x j k) ¼ c(x j k). This is equiva^Div defined by lent to finding label Y ^ X ^ Div ¼ arg min {D1 (x1 , m(k)) þ b D1 (m(yj ), m(k))}, Y K j2N (1) r k2G

jNr (1) j ¼ 2K

(16:31)

where D1(s, t) is the squared Mahalanobis distance (Equation 16.22). Switzer’s method [1] classifies the center pixel by minimizing formula given in Equation 16.25 with resp Pect to label k. Here, the method can be slightly extended by changing ^ /K. Thus, we define the estimate due to the coefficient for j 2 Nr (1) D1(xj, m(k)) from 1 to b Switizer’s method as follows:

FIGURE 16.3 Pixel numbers (left) and pixel labels (right).

5

4

3

1

1

1

6

1

2

1

1

2

7

8

9

1

2

2

Supervised Image Classification of Multi-Spectral Images

^ Switzer Y

357

8 <

9 = X ^ b ¼ arg min D1 (x1 , m(k)) þ D1 (xj , m(k)) : ; K j2N (1) r k2G

(16:32)

Estimates in Equation 16.31 and Equation 16.32 of label y1 are given by the same formula except for m(yj) and xj in the respective last terms. Note that xj itself is a primitive estimate of m(yj). Hence, the classification method based on the divergence model is an extension of Switzer’s method. 16.5.2

Error Rates

We derive the exact error rate due to the divergence model and Switzer’s model in the previous local region with two classes G ¼ {1, 2}. In the two-class case, the only positive quasi-distance is D(1, 2). Substituting bD(1, 2) for b, we note that the MRF based on Jeffreys divergence is reduced to the Potts model. Let d be the Mahalanobis distance between distributions Nq(m(k), S) for k ¼ 1, 2, and Nr (1) be a neighborhood consisting of 2K neighbors of center pixel 1, where K is a fixed natural number. Furthermore, suppose that the number of neighbors with label 1 or 2 is randomly changing. Our aim is to derive the error rate of pixel 1 given features x1, xj, and labels yj of neighbors j in Nr(1). Recall that Y^Div is the estimated label of y1 obtained by ^Div 6¼ Y1}, is given by formula in Equation 16.31. Then, the exact error rate, Pr{Y e (b^; b, d) ¼ p0 F(d=2) þ

K X

( pk

k¼1

F( d=2 kb^d=g) F(d=2 þ kb^d=g) þ 2 2 1 þ ekbd =g 1 þ ekbd =g

) (16:33)

^ is an where F(x) is the cumulative standard Gaussian distribution function, and b estimate of clustering parameter b. Here, pk gives a prior probability such that the number of neighbors with label 1, W1, is equal to K þ k or K k in the neighborhood Nr(1) for k ¼ 0, 1, . . . , K. In Figure 16.3, first-order neighborhood N1(1) is given by {2, 4, 6, 8} with (W1, K) ¼ (2, 2), and second-order neighborhood Npﬃﬃ2 (1) is given by {2, 3, . . . ,9} with (W1, K) ¼ (3, 4); see Ref. [16]. If prior probability p0 is equal to one, K pixels in neighborhood Nr(1) are labeled 1 and the remaining K pixels are labeled 2 with probability one. In this case, the majority vote of the neighbors does not work. Hence, we assume that p0 is less than one. Then, we have ^ ; b, d); see Ref. [16]. the following properties of the error rate, e(b .

K P

pk . kbd2 =g k¼1 1 þ e ^ ; b, d), of b ^ takes a minimum at b ^ ¼ b (Bayes rule), and the P2. The function, e(b minimum e(b; b, d) is a monotonically decreasing function of the Mahalanobis distance, d, for any fixed positive clustering parameter b. ^ ; b, d), is a monotonically decreasing function of b for any P3. The function, e(b ^ and d. fixed positive constants b ^ ; b, d) < F(d/2) for any positive b ^ if the inequalP4. We have nthe inequality: e(b o P1. e(0; b, d) ¼ F(d/2), lim e(b^; b, d) ¼ p0 F(d=2) þ b^!1

.

.

.

ity b d12 log

1F(d=2) F(d=2)

holds.

Note that the value e(0; b, d) ¼ F(d/2) is simply the error rate due to Fisher’s linear discriminant function (LDF) with uniform prior probabilities on the labels. PK kbd2 =g The asymptotic value given in P1 is the error rate due to the k¼1 p k = 1 þ e

Signal and Image Processing for Remote Sensing

358

vote-for-majority rule when the number of neighbors, W1, is not equal to K. The property, P2, recommends that we use the true parameter, b, if it is known, and this is quite natural. P3 means that the classification becomes more efficient when d or b becomes large. Note that d is a distance in the feature space and b is a distance in the image. P4 implies that the use of spatial information always improves the performance of noncontextual discrimination ^ is far from the true value b. even if estimate b The error rate due to Switzer’s method is obtained in the same form as that of Equation qﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ 16.33 by replacing d with d * d= 1 þ 4b^2 =g
1F(d=2) F(d=2)

=d2 as in P3. Actually, the error rate exceeds the 1y-

^ (Figure 16.4a). On the other hand, the parameters of (b) intercept F(d/2) for large b meet the condition because b ¼ 1 and d is the same. Hence, the error rate is always less ^ , and this is shown in Figure 16.4b. than F(d/2) for any b 16.5.3

Spatial Boosting and the Smoothing Method

We consider the proposed classification function in Equation 16.30. When the radius is equal to zero (r ¼ 0), classification function in Equation 16.28 is given by x ¼ (xT1 , . . . , xTn )T : nq 1

F0 (x, k j i) ¼ b0 log p(k j xi ) ¼ b0 log p(xi j k) þ C,

(16:34)

where p(k j xi) is the posterior probability given in Equation 16.4, and C denotes a constant independent of label k. Constant b0 is positive, so the maximization of function F0(x, k j i) with respect to label k is equivalent to that of class-specific density p(xi j k). Hence, classification function F0 yields the usual noncontextual Bayes classifier. d = 1.5, b = 0.5

d = 1.5, b = 1.0

0.45

0.45 MRF Switzer

0.35 0.3 0.25

0.35 0.3 0.25

0.2 0.15 (a)

MRF Switzer

0.4 Error rate

Error rate

0.4

0.2 0

0.5

1

1.5

2

^

b

FIGURE 16.4 ^ ). Error rates due to MRF and Switzer’s method (x-axis: b

0.15 (b)

0

1

2 ^

b

3

4

Supervised Image Classification of Multi-Spectral Images

359

When the radius is equal to one (r ¼ 1), the classification function in Equation 16.18 is given by F1 (x, k j i) ¼ b0 log p(k j xi ) þ

X b1 log p(k j xj ) j U1 (i)j j2U (i) 1

¼ b0 log p(xi j k) þ

X b1 log p(xj j k) þ C0 jU1 (i)j j2U (i)

(16:35)

1

where C0 denotes a constant independent of k, and set U1(i) is defined by Equation 16.21, which is the first-order neighborhood of pixel i. Hence, the label of pixel i is estimated by a convex combination of the log-likelihood of the feature vector xi and that of xj with pixel j in neighborhood U1(i). In other words, the label is estimated by a majority vote based on the weighted sum of the log-likelihoods. Thus, the proposed method is an extension of the smoothing method proposed by Switzer [1] for multivariate Gaussian distributions with an isotropic correlation function. However, the derivation is completely different. 16.5.4

Spatial Boosting and MRF-Based Methods

Usually, the MRF-based method uses training data for the estimation of parameters that specify the class-specific densities p(x j k). Clustering parameter b appearing in conditional probability in Equation 16.24 and the test labels are jointly estimated. The ICM method maximizes conditional probability in Equation 16.26 sequentially with a given clustering parameter b. Furthermore, b is chosen to maximize pseudo-likelihood in Equation 16.27 calculated over the test data. This procedure requires a great deal of CPU time. On the other hand, spatial boosting uses the training data for the estimation of coefficients br and parameters of densities p(x j k). The estimation cost of the parameters of the densities is the same as that of the MRF-based classifier. Although estimation of the coefficients requires iterative procedure (Equation 16.12), the number of iterations is very low. After estimation of the coefficients and the parameters, the test labels can be estimated without any iterative procedure. Thus, the required computational resources, such as CPU times, should be significantly different between the two methods, which needs further scrutiny. Although the training data and the test data are numerous, spatial AdaBoost classifies the test data very quickly.

16.6

Spatial Parallel Boost by Meta-Learning

Spatial boosting is extended for classification of spatio-temporal data. Suppose we have S sets of training data over regions D(s) D for s ¼ 1, 2, . . . , S. Let {(xi(s), yi(s) 2 Rq G j i 2 D(s)} be a set of training data with n(s) ¼ j D(s) j samples. Parameters specifying the classspecific densities are estimated by each training data set. Then, we estimate the posterior distribution in Equation 16.4 according to each training data set. Thus, we derive the averaged log posteriors (Equation 16.28). We assume that an appropriate calibration for feature vectors xi(s) throughout the training data sets is preprocessed.

Signal and Image Processing for Remote Sensing

360

Fix loss function L, and let F(s) be a set of classification functions of the s-th training data set. Now, we propose spatial boosting based on S training data sets (Parallel Boost). Let F be a classification function applicable to all training data sets. Then, we propose an empirical risk function of F: Remp (F) ¼

S X

R(s) emp (F)

(16:36)

s¼1

where R(s) emp(F) denotes empirical risk given in Equation 16.6 or Equation 16.18 due to loss function L evaluated over the s-th training data set. The following steps describe parallel boost: .

.

Let {f0(s), f1(s), . . . , fr(s)} be a set of averages of the log posteriors (Equation 16.28) evaluated through the s-th training data set for s ¼ 1, . . . , S. Stage for radius 0 (a) Obtain the optimal coefficient and the training data set defined by (a01 , b01 ) ¼ arg min Remp (bf0(s) ) (s, b) 2 {1, . . . , S} R (b) If coefficient b01 is negative, quit the procedure. Otherwise, find the minimizer of empirical risk Remp (b01 f0(a01) þ bf0(s)). Let s ¼ a02 and b ¼ b02 2 R be the solution. (c) If coefficient b02 is negative, quit the procedure. Repeat this at most S times and define the convex combination, F0 b01 f0(a01) þ þ b0m0 f0(a0m0), of the classification functions, where m0 denotes the number of replications of the substeps.

.

.

Stage for radius 1 (first-order neighborhood) (a) Let F0 be a function obtained by the previous stage. Then, find classification function f 2 F1 and coefficient b 2 R that minimize the empirical risk Remp(F0 þ b f0(s)), for example, s ¼ a11 and b ¼ b11. (b) If coefficient b11 is negative, quit the procedure. Otherwise, minimize the empirical risk, Remp(F0 þ b11 f1(a11) þ bf1(s)), given F0 and b11 f1(a11). This is repeated and we define a convex combination F1 b11 f1(a11) þ þ b1a1 f1(a1m1). The function F0 þ F1 constitutes a contextual classifier. r The stage is repeated up to radius r, and we obtain the classification function F F0 þ F1 þ þ Fr.

Note that spatial parallel boost is based on the assumption that training data sets and test data are not significantly different, especially in the spatial distribution of the classes.

16.7

Numerical Experiments

Various classification methods discussed here will be examined by applying them to three data sets. Data set 1 is a synthetic data set, and Data sets 2 and 3 are real data sets offered in Ref. [20] as benchmarks. Legends of the data sets are given in the following

Supervised Image Classification of Multi-Spectral Images

361

section; then, the classification methods are applied to the data sets for evaluating the efficiency of the methods.

16.7.1 16.7.1.1

Legends of Three Data Sets Data Set 1: Synthetic Data Set

Data set 1 is constituted by multi-spectral images generated over the image shown in Figure 16.5a. There are three classes (g ¼ 3), and the labels correspond to black, white, and gray. Sample sizes from these classes are 3330, 1371, and 3580 (n ¼ 8281), respectively. We simulate four-dimensional spectral images (q ¼ 4) at each pixel of the true image of Figure 16.4a following independent multi-var iate Gaussian distributions pﬃﬃﬃ with mean vectors m1 ¼ (0 0 0 0)T, m2 ¼ (1 1 0 0)T= 2, and m3 ¼ (1.04980.6379 0 0)T and with common variance–covariance matrix s2 I, where I denotes the identity matrix. The mean vectors were chosen to maximize pseudo-likelihood (Equation 16.27) of the training data based on the squared Mahalanobis distance used as the divergence in Equation 16.22. Test data of size 8281 are similarly generated over the same image. Gaussian density functions with the common variance–covariance matrix are used to derive the posterior probabilities. 16.7.1.2

Data Set 2: Benchmark Data Set grss_dfc_0006

Data set grss_dfc_0006 is a benchmark data set for supervised image classification. The data set can be accessed from the IEEE GRS-S Data Fusion reference database [20], which consists of samples acquired by ATM and SAR (six and nine bands, respectively; q ¼ 15) observed at Feltwell, UK. There are five agricultural classes of sugar beets, stubble, bare soil, potatoes, and carrots (g ¼ 5). Sample sizes of the training data and test data are 5072 ¼ 1436 þ 1070 þ 341 þ 1411 þ 814 and 5760 ¼ 2017 þ 1355 þ 548 þ 877 þ 963, respectively, in a rectangular region of size 350 250. The data set was analyzed in Ref. [21] through various methods based on statistics and neural networks. In this case, the labels of the test and training data are observed sparsely. 16.7.1.3

Data Set 3: Benchmark Data Set grss_dfc_0009

Data set grss_dfc_0009 is also taken from the IEEE GRS-S Data Fusion reference database, which consists of examples acquired by Landsat 7 TMþ (q ¼ 7) with 11 agricultural classes (g ¼ 11) in Flevoland, Netherlands. Training data and test data sizes are 2891 and 2890, respectively, in a square region of size 512 512. We note that the labels of the test and training data of Data sets 2 and 3 are observed sparsely in the region.

16.7.2

Potts Models and the Divergence Models

We compare the performance of the Potts and the divergence models applied to Data sets 1 and 2 (see Table 16.1 and Table 16.2). Gaussian distributions with a common variance–covariance matrix (homoscedastic), and Gaussian distributions with class-specific matrices (heteroscedastic) are employed for class-specific distributions. The former corresponds to the LDF and the latter to the quadratic discriminant function (QDF). Test errors due to GMRF-based classifiers with several radii r are listed in the tables. Numerals in boldface denote the optimal values in the respective columns. We see that the divergence models give the minimum values in Table 16.1 and Table 16.2.

Signal and Image Processing for Remote Sensing

362 (a) True labels

(b) r = 0

(c) r = 1

(d) r = √2

(e) r = √8

(f) r = √13

FIGURE 16.5 (a) True labels of data set 1, (b) labels estimated by LDF, (c)–(f) labels estimated by spatial AdaBoost with b0 f0 þ b1 f1 þ þ br fr.

In the homoscedastic cases of Table 16.1 and Table 16.2, the divergence model gives a performance similar to that of the Potts model. In the heteroscedastic case, the divergence model is superior to the Potts model to a small degree. Classified images with s2 ¼ 1 are given in Figure 16.5b through Figure 16.5f. Figure 16.5b is an image classified by the LDF, which exhibits very poor performance. As radius r becomes larger, images classified by

TABLE 16.1 Test Error Rates (%) Due to Gaussian MRFs with Variance–Covariance Matrices s2I with s2 ¼ 1, 2 for Data Set 1 Homoscedastic Gaussian MRFs 2

s2 ¼ 2

s ¼ 1

s2 ¼ 3

s2 ¼ 4

Radius r2

Potts

Jeffreys

Potts

Jeffreys

Potts

Jeffreys

Potts

Jeffreys

0 1 2 4 5 8 9 10 13 16 17 18 20

41.66 9.35 3.90 3.47 3.19 3.39 3.65 3.89 4.46 4.49 4.96 5.00 5.68

41.66 9.25 3.61 2.48 2.72 3.06 3.30 3.79 4.13 4.17 4.67 4.94 6.11

41.57 9.97 3.90 3.02 3.32 3.35 3.51 3.86 4.36 5.00 5.30 5.92 6.50

41.57 9.68 4.46 3.12 2.96 3.37 4.06 4.09 4.66 4.99 5.54 5.83 6.56

52.63 25.60 13.26 9.82 8.03 7.76 8.28 8.88 10.12 11.16 12.04 12.64 14.89

52.63 27.41 12.23 9.20 7.75 8.57 8.20 10.08 10.20 11.68 11.65 12.67 13.55

54.45 30.76 17.49 11.83 10.72 10.14 10.58 10.57 13.25 14.25 15.34 16.07 17.18

54.45 31.07 15.58 14.70 9.12 10.61 11.00 12.66 14.06 14.95 15.76 15.89 17.00

Supervised Image Classification of Multi-Spectral Images

363

TABLE 16.2 Test Error Rates (%) Due to Homoscedastic and Heteroscedastic Gaussian MRFs with Two Sorts of Divergences for Data Set 2 Gaussian MRFs Homoscedastic 2

Heteroscedastic

Radius r

Potts

Jeffreys

Potts

Jeffreys

0 4 9 16 25 36 49 64

8.61 6.44 6.16 5.99 6.16 6.49 6.37 6.30

8.61 6.09 5.69 5.35 5.69 5.64 6.18 6.65

14.67 12.93 13.09 13.28 12.80 13.16 12.90 12.92

14.67 11.44 10.19 9.67 9.39 9.55 9.90 9.76

spatial AdaBoost becomes clearer. Therefore, we see that spatial AdaBoost improves the noncontextual classification result (r ¼ 0) significantly.

16.7.3

Spatial AdaBoost and Its Robustness

Spatial AdaBoost is applied to the three data sets. The error rates due to spatial AdaBoost and the MRF-based classifier for Data set 1 are listed in Table 16.3. Here, homoscedastic Gaussian distributions are employed for class-specific distributions in both classifiers. Hence, classification with radius r ¼ 0 indicates noncontextual classification by the LDF. We used the GMRF with the divergence model discussed in Section 16.4.3. From Table 16.3, we see that the minimum error rate due to AdaBoost is higher than that due to the GMRF-based method, but the value is still comparable to that due to the TABLE 16.3 Error Rates (%) Due to Homoscedastic GMRF and Spatial AdaBoost for Data Set 1 with Variance– Covariance Matrix s2I s2 ¼ 1 Radius r2 0 1 2 4 5 8 9 10 13 16 17 18 20

s2 ¼ 4

GMRF

AdaBoost

GMRF

AdaBoost

41.66 9.25 3.61 2.48 2.72 3.06 3.30 3.79 4.13 4.17 4.67 4.94 6.11

41.65 20.40 12.15 8.77 5.72 5.39 4.82 4.38 4.34 4.31 4.19 4.34 4.43

54.45 31.07 15.58 14.70 9.12 10.61 11.00 12.66 14.06 14.95 15.76 15.89 17.00

54.45 40.88 33.79 28.78 22.71 20.50 18.96 16.58 14.99 14.14 13.02 12.57 12.16

Signal and Image Processing for Remote Sensing

364 (a) Data set 1

(b) Data set 2 0.06

3 Coefficients

Coefficients

0.05 2

1

0.04 0.03 0.02 0.01 0

−0.01 0

0

10

20

30

40

50

−0.02

20 40 60 80 Squared radii of neighborhoods

0

Squared radii of neighborhoods

100

FIGURE 16.6 Estimated coefficients for the log posteriors in the ring regions of data set 1 (left) and data set 2 (right).

GMRF-based method. The left figure in Figure 16.6 is a plot of the estimated coefficients for the averaged log posterior fr vs. r2. All the coefficients are positive. Therefore, the problem of stopping to combine classification function arises. Training and test errors as radius r becomes large are given in Figure 16.7. The test error takes the minimum value at r2 ¼ 17, whereas the training error is stable for large r. The same study as the above is applied to Data set 2 (Table 16.4). We obtain a similar result as that observed in Table 16.3. The left figur e in Figure 16.6 gives the estimated pﬃﬃﬃﬃﬃ coefficient. We see that coefficient br with r ¼ 72 isp ﬃﬃﬃﬃﬃ 0.012975, a negative value. This implies that averaged log posterior fr with r ¼ 72 ispﬃﬃﬃﬃﬃ not appropriate for the classification. Therefore, we must exclude functions of radius 72 or larger. The application to Data set 3 leads to a quite different result (Table 16.5). We fit homoscedastic and heteroscedastic Gaussian distributions for feature vectors. Hence, classification with radius r ¼ 0 is performed by the LDF and the QDF. The error rates due to noncontextual classification function log p(k j x) and the coefficients are tabulated in Table 16.5. The test error due to the QDF is 1.66%, and this is excellent. Unfortunately, coefficients b0 of functions log p(k j x) in both cases are estimated by the negative values.

(b) Test data

0.4

0.4

0.3

0.3

Test errors

Training errors

(a) Training data

0.2

0.1

0.1

0 0

0.2

10

20

30

40

Squared radii of neighborhoods

50

0 0

10

20

30

40

50

Squared radii of neighborhoods

FIGURE 16.7 Error rates with respect to squared radius, r2, for training data (left) and test data (right) due to classifiers b0f0 þ b1f1 þ þ brfr for data set 1.

Supervised Image Classification of Multi-Spectral Images

365

TABLE 16.4 Error Rates (%) Due to GMRF and Spatial AdaBoost for Data Set 2 Homoscedastic 2

Radius r 0 4 9 16 25 36 49 64

GMRF

AdaBoost

8.61 6.09 5.69 5.35 5.69 5.69 6.13 6.65

8.61 7.08 6.11 5.95 5.83 5.85 5.99 6.01

Therefore, the classification functions b0 log p(k j x) are not applicable. This implies that the exponential loss defined by Equation 16.5 is too sensitive to misclassified data. Error rates for spatial AdaBoost b0f0 þ b1f1 are listed in Table 16.6 when coefficient b0 is predetermined and b1 is tuned by the empirical risk Remp(b0f0 þ b1f1). The LDF and the QDF columns correspond to the posterior probabilities based on the homoscedastic Gaussian distributions and the heteroscedastic Gaussian distributions, respectively. For both of these cases, information in the first-order neighborhood (r ¼ 1) improves the classification slightly. We note that homoscedastic and heteroscedastic GMRFs have minimum error rates of 4.01% and 0.52%, respectively.

16.7.4

Spatial AdaBoost and Spatial LogitBoost

pﬃﬃﬃﬃﬃ We apply classification functions F0, F1, . . . , Fr with r ¼ 20 to the test data,pwhere ﬃﬃﬃﬃﬃ function Fr is defined by Equation 16.30, and coefficients b0, b1, . . . , br with r ¼ 20 are sequentially tuned by minimizing exponential risk (Equation 16.6) or logit risk (Equation 16.18). Error rates due to GMRF-based classifiers and spatial boosting methods for error variances s2 ¼ 1 and 4 for Data set 1 are compared in Table 16.7. We see that GMRF is superior to the spatial boosting methods to a small degree. Furthermore, spatial AdaBoost and spatial LogitBoost exhibit similar performances for Data set 1. Note that spatial boosting is very fast compared with GMRF-based classifiers. This observation is also true for Data set 2 (Table 16.8). Then, GMRF and spatial LogitBoost were applied to Data set 3 (Table 16.9). Spatial LogitBoost still works well in spite of the failure of spatial AdaBoost. Note that spatial AdaBoost and spatial LogitBoost use the same classification functions: f0, f1, . . . , fr. However, the loss function used by TABLE 16.5 Optimal Coefficients Tuned by Minimizing Exponential Risks Based on LDF and QDF for Data Set 3

Error rate by f0 Tuned b0 to f0 Error rate by b0f0

LDF

QDF

10.69 % 0.006805 Nearly 100%

1.66 % 0.000953 Nearly 100%

Signal and Image Processing for Remote Sensing

366 TABLE 16.6

Error Rates (%) Due to Contextual Classification Function b0f0 þ b1f1 Given b0 for Data Set 3. b1 Is Tuned by Minimizing the Empirical Exponential Risk b0

Homoscedastic

Heteroscedastic

107 106 105 104 103 102 101 1 1

9.97 10.00 10.00 10.00 9.10 9.45 10.24 10.69 10.69

1.35 1.35 1.35 1.25 1.00 1.38 1.66 1.66 1.66

TABLE 16.7 Error Rates (%) Due to GMRF and Spatial Boosting for Data Set 1 with Variance–Covariance Matrix s2I s2 ¼ 1

s2 ¼ 4

Spatial Boosting 2

Radius r

Homoscedastic GMRF

AdaBoost

41.66 9.25 3.61 2.48 2.72 3.06 3.30 3.79 4.13 4.17 4.67 4.94 6.11

41.66 20.40 12.15 8.77 5.72 5.39 4.82 4.38 4.34 4.31 4.19 4.34 4.43

0 1 2 4 5 8 9 10 13 16 17 18 20

Spatial Boosting

LogitBoost

Homoscedastic GMRF

AdaBoost

LogitBoost

41.66 19.45 11.63 8.36 5.93 5.43 4.96 4.65 4.54 4.43 4.31 4.58 4.66

54.45 31.07 15.58 14.70 9.12 10.61 11.00 12.66 14.06 14.95 15.76 15.89 17.00

54.45 40.88 33.79 28.78 22.71 20.50 18.96 16.58 14.99 14.14 13.02 12.57 12.16

54.45 40.77 33.95 28.74 22.44 20.55 19.01 16.56 15.08 14.12 13.22 12.69 12.11

TABLE 16.8 Error Rates (%) Due to GMRF and Spatial Boosting for Data Set 2 Homoscedastic 2

Radius r 0 4 9 16 25 36 49 64

Homoscedastic GMRF

AdaBoost

LogitBoost

8.61 6.09 5.69 5.35 5.69 5.69 6.13 6.65

8.61 7.08 6.11 5.95 5.83 5.85 5.99 6.01

8.61 7.93 7.33 6.74 6.35 6.13 6.32 6.48

Supervised Image Classification of Multi-Spectral Images

367

TABLE 16.9 Error Rates (%) Due to GMRF and Spatial LogitBoost for Data Set 3 Homoscedastic 2

Radius r 0 1 4 8 9 10 13 16

Heteroscedastic

GMRF

LogitBoost

GMRF

LogitBoost

10.69 8.58 7.06 3.81 4.01 4.08 4.01 4.01

10.69 9.48 6.51 5.43 5.40 5.29 5.16 5.26

1.66 1.97 1.69 0.55 0.52 0.52 0.52 0.52

1.66 1.31 0.93 0.76 0.76 0.83 0.87 0.90

LogitBoost assigns less penalty to misclassified data. Hence, spatial LogitBoost seems to have a robust property. 16.7.5

Spatial Parallel Boost

We examine parallel boost by applying it to Data set 1. Three rectangular regions found in Figure 16.8a give training areas. Now, we consider the following four subsets of the training data set: .

Near the center of Figure 16.8a with sample size 75 (D(1))

.

North-west of Figure 16.8a with sample size 160 (D(2)) (a) True labels

(d) D (2), r = √13

(b) D (1) ∪ D (2) ∪ D (3) , r = 0

(e) D (3), r = √13

(c) D (1), r = √13

(f) D (1) ∪ D (2) ∪ D (3) , r = √13

FIGURE 16.8 True labels, training regions, and classified images. The n denotes the size of training samples. (a) True labels with three training regions. (b) Parallel boost with pﬃﬃﬃﬃﬃ r ¼ 0 (error rate ¼ 42.29%, n ¼ 435) noncontextual (1) ¼ 10.26%, p n ﬃﬃﬃﬃﬃ ¼ 75). (d) Spatial boost by D(2) classification. pﬃﬃﬃﬃﬃ (c) Spatial boost by D with r ¼ 13 (error rate (3) with r ¼ 13 (error rate ¼ 6.29%, n ¼ 160). (e) p Spatial boost by D with r ¼ 13 (error rate ¼ 7.46%, n ¼ 200). ﬃﬃﬃﬃﬃ (f) Parallel boost by D(1) [ D(2) [ D(3) with r ¼ 13 (error rate ¼ 4.96%, n ¼ 435).

Signal and Image Processing for Remote Sensing

368 TABLE 16.10

Error Rates (%) Due to Spatial AdaBoost Based on Four Training Data Sets and Spatial Parallel Boost Based on Union of Three Subsets in Data Set 1 Spatial AdaBoost 2

(1)

Radius r

D

0 1 2 4 5 8 9 10 13

43.13 23.84 17.16 13.88 10.88 10.49 10.48 10.41 10.26

(2)

Parallel Boost (3)

D

D

41.67 20.23 12.84 9.44 7.34 6.97 6.64 6.35 6.29

45.88 26.51 18.92 15.25 11.21 10.26 9.26 8.08 7.46

Whole

D(1)U D(2)U D(3)

41.75 20.40 12.15 8.77 5.72 5.39 4.82 4.38 4.34

42.29 21.18 13.03 9.46 6.39 5.78 5.41 5.19 4.96

.

South-east of Figure 16.8a with sample size 200 (D(3))

.

All training data with sample size 8281

The test data set is also of sample size 8281. The training region D(1) is relatively small, pﬃﬃﬃﬃﬃ so we only consider radius r up to 13. Gaussian density functions with a common variance–covariance matrix are fitted to each training data set, and we obtain four types of posterior probabilities for each pixel in training data sets and the test data set. Spatial AdaBoost derived by the four training data sets and spatial parallel boost derived by the region D(1) [ D(2) [ D(3) are applied to the test region shown in Figure 16.2a. Error rates due to five classifiers examined with the test data using 8281 samples are comparedpin ﬃﬃﬃﬃﬃTable 16.10. Each row corresponds to spatial boost based on radius r in {0, 1, . . . , 13}. The second to fourth columns correspond to Spatial AdaBoost based on the four training data sets. The last column is the error rate due to spatial parallel boost. We see that spatial AdaBoost exhibits a better performance as the training data size increases. The classifier based on D(3) seems exceptional, but it gives better results for a large r. Spatial parallel boost exhibits a performance similar to that of spatial AdaBoost based on all the training data. Here, we note again that our method uses only 435 samples, whereas all the training data have 8281 samples. A noncontextual classification result based on spatial parallel boost is given in Figure 16.8b, and it is very poor. Anpexcellent classification result is obtained as radius r becomes ﬃﬃﬃﬃﬃ bigger. The result with r ¼ 13, which is better than the results (b), (d), and (f) due to spatial AdaBoost using three training data sets, is given in Figure 16.8e. The error rates corresponding to figures (b), (d), (e), and (f) are given in Table 16.10 in bold.

16.8

Conclusion

We have considered contextual classifiers, Switzer’s smoothing method, MRF-based methods, and spatial boosting methods. We showed that the last two methods can be regarded as extensions of the Switzer’s method. The MRF-based method is known to be efficient, and it exhibited the best performance in our numerical studies given here. However, the method requires extra computation time for (1) the estimation of parameters

Supervised Image Classification of Multi-Spectral Images

369

specifying the MRF and (2) the estimation of test labels, even if the ICM procedure is utilized for (2). Both estimations are accomplished by iterative procedures. The method has already been fully discussed in the literature. The following are our remarks on spatial boosting and its possible future development. Spatial boosting is derived as long as posterior probabilities are available. Hence, posteriors need not be defined by statistical models. Posteriors due to statistical support vector machines (SVM) [22,23] could be utilized. Spatial boosting is derived by fewer assumptions than those of an MRF-based method. Features of spatial boosting are as follows: (a) Various types of posteriors can be implemented. Pairwise coupling [24] is a popular strategy for multi-class classification that involves estimating posteriors for each pair of classes, and then coupling the estimates together. (b) Directed neighborhoods can be implemented. See Figure 16.9 for the directed neighborhoods. They would be suitable for classification of textured images. (c) The method is applicable to a situation with multi-source and multi-temporal data sets. (d) Classification is performed non-iteratively, and the performance is similar to that of MRF-based classifiers. The following new questions arise: . . .

How do we define the posteriors? This is closely related to feature (a). How do we choose the loss function? How do we determine the maximum radius for regarding log posteriors into the classification function.

This is an old question in boosting: How many classification functions are sufficient for classification? Spatial boosting should be used in cases where estimation of coefficients is performed through test data only. This is a first step for possible development into unsupervised classification. In addition, the method will be useful for unmixing. Originally, classification function F defined by Equation 16.30 is given by a convex combination of averaged

(a) r = 1

(b) r = 1

(c) r = √2

(d) r = √2

i

i

i

i

FIGURE 16.9 Directed subsets with center pixel i.

370

Signal and Image Processing for Remote Sensing

log posteriors. Therefore, F would be useful for estimation of posterior probabilities, which is directly applicable to unmixing.

Acknowledgment Data sets grss_dfc_0006 and grss_dfc_0009 were provided by the IEEE GRS-S Data Fusion Committee.

References 1. Switzer, P., (1980) Extensions of linear discriminant analysis for statistical classification of remotely sensed satellite imagery, Mathematical Geology, 12(4), 367–376. 2. Nishii, R. and Eguchi, S. (2005), Supervised image classification by contextual AdaBoost based on posteriors in neighborhoods, IEEE Transactions on Geoscience and Remote Sensing, 43(11), 2547–2554. 3. Jain, A.K., Duin, R.P.W., and Mao, J. (2000), Statistical pattern recognition: a review, IEEE Transaction on Pattern Analysis and Machine Intelligence, PAMI-22, 4–37. 4. Besag, J. (1986), On the statistical analysis of dirty pictures, Journal of the Royal Statistical Society, Series B, 48, 259–302. 5. Duda, R.O., Hart, P.E., and Stork, D.G. (2000), Pattern Classification (2nd ed.), John Wiley & Sons, New York. 6. Geman, S. and Geman, D. (1984), Stochastic relaxation, Gibbs distribution, and Bayesian restoration of images, IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6, 721–741. 7. McLachlan, G.J. (2004), Discriminant Analysis and Statistical Pattern Recognition (2nd ed.), John Wiley & Sons, New York. 8. Chile`s, J.P. and Delfiner, P. (1999), Geostatistics, John Wiley & Sons, New York. 9. Cressie, N. (1993), Statistics for Spatial Data, John Wiley & Sons, New York. 10. Freund, Y. and Schapire, R.E. (1997), A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences, 55(1), 119–139. 11. Friedman, J., Hastie, T., and Tibshirani, R. (2000), Additive logistic regression: a statistical view of boosting (with discussion), Annals of Statistics, 28, 337–407. 12. Hastie, T., Tibshirani, R., and Friedman, J., (2001), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, New York. 13. Benediktsson, J.A. and Kanellopoulos, I. (1999), Classification of multisource and hyperspectral data based on decision fusion, IEEE Transactions on Geoscience and Remote Sensing, 37, 1367–1377. 14. Melgani, F. and Bruzzone, L. (2004), Classification of hyperspectral remote sensing images with support vector machines, IEEE Transactions on Geoscience and Remote Sensing, 42(8), 1778–1790. 15. Nishii, R. (2003), A Markov random field-based approach to decision level fusion for remote sensing image classification, IEEE Transactions on Geoscience and Remote Sensing, 41(10), 2316–2319. 16. Nishii, R. and Eguchi, S. (2002), Image segmentation by structural Markov random fields based on Jeffreys divergence, Res. Memo., Institute of Statistical Mathematics, No. 849. 17. Nishii, R. and Eguchi, S. (2005), Robust supervised image classifiers by spatial AdaBoost based on robust loss functions, Proc. SPIE, Vol. 5982. 18. Eguchi, S. and Copas, J. (2002), A class of logistic-type discriminant functions, Biometrika, 89, 1–22. 19. Takenouchi, T. and Eguchi, S. (2004), Robustifying AdaBoost by adding the naive error rate, Neural Computation, 16, 767–787.

Supervised Image Classification of Multi-Spectral Images

371

20. IEEE GRSS Data Fusion reference database, Data sets GRSS_DFC_0006, GRSS_DFC_0009 Online. http://www.dfc-grss.org/, 2001. 21. Giacinto, G., Roli, F., and Bruzzone, L. (2000), Combination of neural and statistical algorithms for supervised classification of remote-sensing images, Pattern Recognition Letters, 21, 385–397. 22. Kwok, J.T. (2000), The evidence framework applied to support vector machine, IEEE Transactions on Neural Networks, 11(5), 1162–1173. 23. Wahba, G. (1999), Support vector machines, reproducing kernel Hilbert spaces and the randomized GACV, In Advances in Kernel Methods—Support Vector Learning, Scho¨lkopf, Burges and Smola, eds., MIT press, Cambridge, MA, 69–88. 24. Hastie, T. and Tibshirani, R. (1998), Classification by pairwise coupling, Annals of Statistics, 26(2), 451–471.

17 Unsupervised Change Detection in Multi-Temporal SAR Images

Lorenzo Bruzzone and Francesca Bovolo

CONTENTS 17.1 Introduction ..................................................................................................................... 373 17.2 Change Detection in Multi-Temporal Remote-Sensing Images: Literature Survey ............................................................................................................ 376 17.2.1 General Overview ............................................................................................. 376 17.2.2 Change Detection in SAR Images .................................................................. 379 17.2.2.1 Preprocessing.................................................................................... 379 17.2.2.2 Multi-Temporal Image Comparison............................................. 380 17.2.2.3 Analysis of the Ratio and Log-Ratio Image ................................ 381 17.3 Advanced Approaches to Change Detection in SAR Images: A Detail-Preserving Scale-Driven Technique............................................................. 383 17.3.1 Multi-Resolution Decomposition of the Log-Ratio Image ......................... 385 17.3.2 Adaptive Scale Identification .......................................................................... 387 17.3.3 Scale-Driven Fusion.......................................................................................... 388 17.4 Experimental Results and Comparisons..................................................................... 390 17.4.1 Data Set Description ......................................................................................... 390 17.4.2 Results................................................................................................................. 392 17.5 Conclusions...................................................................................................................... 396 Acknowledgments ..................................................................................................................... 397 References ................................................................................................................................... 397

17.1

Introduction

The recent natural disasters (e.g., tsunami, hurricanes, eruptions, earthquakes, etc.) and the increasing amount of anthropogenic changes (e.g., due to wars, pollution, etc.) gave prominence to the topics related to environment monitoring and damage assessment. The study of environmental variations due to the time evolution of the above phenomena is of fundamental interest from a political point of view. In this context, the development of effective change-detection techniques capable of automatically identifying land-cover

373

Signal and Image Processing for Remote Sensing

374

variations occurring on the ground by analyzing multi-temporal remote-sensing images assumes an important relevance for both the scientific community and the end-users. The change-detection process considers images acquired at different times over the same geographical area of interest. These images acquired from repeat-pass satellite sensors are an effective input for addressing change-detection problems. Several different Earthobservation satellite missions are currently operative, with different kinds of sensors mounted on board (e.g., MODIS and ASTER on board NASA’s TERRA satellite, MERIS and ASAR on board ESA’s ENVISAT satellite, Hyperion on board EO-1 NASA’s satellite, SAR sensors on board RADARSAT-1 and RADARSAT-2 CSA’s satellites, Ikonos and Quickbird satellites that acquire very high resolution pancromatic and multi-spectral (MS) images, etc.). Each sensor has specific properties with respect to the image acquisition mode (e.g., passive or active), geometrical, spectral, and radiometric resolutions, etc. In the development of automatic change-detection techniques, it is mandatory to take into account the properties of the sensors to properly extract information from the considered data. Let us discuss the main characteristics of different kinds of sensors in detail (Table 17.1 summarizes some advantages and disadvantages of different sensors for change-detection applications according to their characteristics). Images acquired from passive sensors are obtained by measuring the land-cover reflectance on the basis of the energy emitted from the sun and reflected from the ground1. Usually, the measured signal can be modeled as the desired reflectance (measured as a radiance) altered from an additive Gaussian noise. This noise model enables relatively easy processing of the signal when designing data analysis techniques. Passive sensors can acquire two different kinds of images [panchromatic (PAN) images and MS images] by defining different trade-offs between geometrical and spectral resolutions according to the radiometric resolution of the adopted detectors. PAN images are characterized by poor spectral resolution but very high geometrical resolution, whereas MS images have medium geometrical resolution but high spectral resolution. From the perspective of change detection, PAN images should be used when the expected size of the changed area is too small for adopting MS data. For example, in the case of the analysis of changes in urban areas, where detailed urban studies should be carried out, change detection in PAN images requires the definition of techniques capable of capturing the richness of information present both in the spatial-context relations between neighboring pixels and in the geometrical shapes of objects. MS data should be used

TABLE 17.1 Advantages and Disadvantages of Different Kinds of Sensors for Change-Detection Applications Sensor Multispectral (passive)

Panchromatic (passive)

SAR (active)

1

Advantages

Disadvantages

3 Characterization of the spectral signature of land-covers 3 The noise has an additive model 3 High geometrical resolution

3 Atmospheric conditions strongly affect the acquisition phase

3 High content of spatial-context information 3 Not affected by sunlight and atmospheric conditions

3 Atmospheric conditions strongly affect the acquisition phase 3 Poor characterization of the spectral signature of land-covers 3 Complexity of data preprocessing 3 Presence of multiplicative speckle noise

Also, the emission of Earth affects the measurements in the infrared portion of the spectrum.

Unsupervised Change Detection in Multi-Temporal SAR Images

375

when a medium geometrical resolution (i.e., 10–30 m) is sufficient for characterizing the size of the changed areas and a detailed modeling of the spectral signature of the landcovers is necessary for identifying the change investigated. Change-detection methods in MS images should be able to properly exploit the available MS information in the change detection process. A critical problem related to the use of passive sensors in change detection consists in the sensitivity of the image-acquisition phase to atmospheric conditions. This problem has two possible effects: (1) atmospheric conditions may not be conducive to measure land-cover spectral signatures, which depends on the presence of clouds; and (2) variations in illumination and atmospheric conditions at different acquisition times may be a potential source of errors, which should be taken into account to avoid the identification of false changes (or the missed detection of true changes). The working principle of active synthetic aperture radar (SAR) sensors is completely different from that of the passive ones and allows overcoming some of the drawbacks that affect optical images. The signal measured by active sensors is the Earth backscattering of an electromagnetic pulse emitted from the sensor itself. SAR instruments acquire different kinds of signals that result in different images: medium or high-resolution images, singlefrequency or multi-frequency, and single-polarimetric or fully polarimetric images. As for optical data, the proper geometrical resolution should be chosen according to the size of the expected investigated changes. The SAR signal has different geometrical resolutions and a different penetration capability depending on the signal wavelength, which is usually included between band X and band P (i.e., between 2 and 100 cm). In other words, shorter wavelengths should be used for measuring vegetation changes and longer and more penetrating wavelengths for studying changes that have occurred on or under the terrain. All the wavelengths adopted for SAR sensors neither suffer from atmospheric and sunlight conditions nor from the presence of clouds; thus multi-temporal radar backscattering does not change with atmospheric conditions. The main problem related to the use of active sensors is the coherent nature of the SAR signal, which results in a multiplicative speckle noise that makes acquired data intrinsically complex to be analyzed. A proper handling of speckle requires both an intensive preprocessing phase and the development of effective data analysis techniques. The different properties and statistical behaviors of signals acquired by active and passive sensors require the definition of different change-detection techniques capable of properly exploiting the specific data peculiarities. In the literature, many different techniques for change detection in images acquired by passive sensors have been presented [1–8], and many applications of these techniques have been reported. This is because of both the amount of information present in MS images and the relative simplicity of data analysis, which results from the additive noise model adopted for MS data (the radiance of natural classes can be approximated with a Gaussian distribution). Less attention has been devoted to change detection in SAR images. This is explained by the intrinsic complexity of SAR data, which require both an intensive preprocessing phase and the development of effective data analysis techniques capable of dealing with multiplicative speckle noise. Nonetheless, in the past few years the remote-sensing community has shown more interest in the use of SAR images in change-detection problems, due to their independence from atmospheric conditions that results in excellent operational properties. The recent technological developments in sensors and satellites have resulted in the design of more sophisticated systems with increased geometrical resolution. Apart from the active or passive nature of the sensor, the very high geometrical resolution images acquired by these systems (e.g., PAN images) require the development of specific techniques capable of taking advantage of the richness of the geometrical information they contain. In particular, both the high correlation

Signal and Image Processing for Remote Sensing

376

between neighboring pixels and the object shapes should be considered in the design of data analysis procedures. In the above-mentioned context, two main challenging issues of particular interest in the development of automatic change-detection techniques are: (1) the definition of advanced and effective techniques for change detection in SAR images, and (2) the development of proper methods for the detection of changes in very high geometrical resolution images. A solution for these issues lies in the definition of multi-scale and multi-resolution change-detection techniques, which can properly analyze the different components of the change signal at their optimal scale2. On the one hand, the multi-scale analysis allows one to better handle the noise present in medium-resolution SAR images, resulting in the possibility of obtaining accurate change-detection maps characterized by a high spatial fidelity. On the other hand, multi-scale approaches are intrinsically suitable to exploit the information present in very high geometrical resolution images according to effective modeling (at different resolution levels) of the different objects present at the scene. According to the analysis mentioned above, after a brief survey on change detection and on unsupervised change detection in SAR images, we present, in this chapter, a novel adaptive multi-scale change detection technique for multi-temporal SAR images. This technique exploits a proper scale-driven analysis to obtain a high sensitivity to geometrical features (i.e., details and borders of changed areas are well preserved) and a high robustness to noisy speckle components in homogeneous areas. Although explicitly developed and tested for change detection in medium-resolution SAR images, this technique can be easily extended to the analysis of very high geometrical resolution images. The chapter is organized into five sections. Section 17.2 defines the change-detection problem in multi-temporal remote-sensing images and focuses attention on unsupervised techniques for multi-temporal SAR images. Section 17.3 presents a multi-scale approach to change detection in multi-temporal SAR images recently developed by the authors. Section 17.4 gives an example of the application of the proposed multi-scale technique to a real multi-temporal SAR data set and compares the effectiveness of the presented method with those of standard single-scale change-detection techniques. Finally, in Section 17.5, results are discussed and conclusions are drawn.

17.2 17.2.1

Change Detection in Multi-Temporal Remote-Sensing Images: Literature Survey General Overview

A very important preliminary step in the development of a change-detection system, based on automatic or semi-automatic procedures, consists in the design of a proper phase of data collection. The phase of data collection aims at defining: (1) the kind of satellite to be used (on the basis of the repetition time and on the characteristics of the sensors mounted on-board), (2) the kind of sensor to be considered (on the basis of the desired properties of the images and of the system), (3) the end-user requirements (which are of basic importance for the development of a proper change-detection

2

It is worth noting that these kinds of approaches have been successfully exploited in image classification problems [9–12].

Unsupervised Change Detection in Multi-Temporal SAR Images

377

technique), and (4) the kinds of available ancillary data (all the available information that can be used for constraining the change-detection procedure). The outputs of the data-collection phase should be used for defining the automatic change-detection technique. In the literature, many different techniques have been proposed. We can distinguish between two main categories: supervised and unsupervised methods [9,13]. When performing supervised change detection, in addition to the multi-temporal images, multi-temporal ground-truth information is also needed. This information is used for identifying, for each possible land-cover class, spectral signature samples for performing supervised data classification and also for explicitly identifying what kinds of land-cover transitions have taken place. Three main general approaches to supervised change detection can be found in the literature: postclassification comparison, supervised direct multi-data classification [13], and compound classification [14–16]. Postclassification comparison computes the change-detection map by comparing the classification maps obtained by classifying independently two multi-temporal remote-sensing images. On the one hand, this procedure avoids data normalization aimed at reducing atmospheric conditions, sensor differences, etc. between the two acquisitions; on the other hand, it critically depends on the accuracies of the classification maps computed at the two acquisition dates. As postclassification comparison does not take into account the dependence existing between two images of the same area acquired at two different times, the global accuracy is close to the product of the accuracies yielded at the two times [13]. Supervised direct multi-data classification [13] performs change detection by considering each possible transition (according to the available a priori information) as a class and by training a classifier to recognize the transitions. Although this method exploits the temporal correlation between images in the classification process, its major drawback is that training pixels should be related to the same points on the ground at the two times and should accurately represent the proportions of all the transitions in the whole images. Compound classification overcomes the drawbacks of supervised multi-date classification technique by removing the constraint that training pixels should be related to the same area on the ground [14–16]. In general, the approach based on supervised classification is more accurate and detailed than the unsupervised one; nevertheless, the latter approach is often preferred in real-data applications. This is due to the difficulties in collecting proper ground-truth information (necessary for supervised techniques), which is a complex, time consuming, and expensive process (in many cases this process is not consistent with the application constraints). Unsupervised change-detection techniques are based on the comparison of the spectral reflectances of multi-temporal raw images and a subsequent analysis of the comparison output. In the literature, the most widely used unsupervised change-detection techniques are based on a three-step procedure [13,17]: (1) preprocessing, (2) pixel-by-pixel comparison of two raw images, and (3) image analysis and thresholding (Figure 17.1). The aim of the preprocessing step is to make the two considered images as comparable as possible. In general, preprocessing operations include: co-registration, radiometric and geometric corrections, and noise reduction. From the practical point of view, co-registration is a fundamental step as it allows obtaining a pair of images where corresponding pixels are associated to the same position on the ground3. Radiometric corrections reduce differences between the two acquisitions due to sunlight and atmospheric conditions. These procedures are applied to optical images, but they are not necessary for SAR 3

It is worth noting that usually it is not possible to obtain a perfect alignment between temporal images. This may considerably affect the change-detection process [18]. Consequently, if the amount of residual misregistration noise is significant, proper techniques aimed at reducing its effects should be used for change detection [1,4].

Signal and Image Processing for Remote Sensing

378

SAR Image (date t2)

SAR Image (date t2)

Preprocessing

Preprocessing

Comparison

Analysis of the log-ratio image FIGURE 17.1 Block scheme of a standard unsupervised changedetection approach.

Change-detection map (M )

images (as SAR data are not affected by atmospheric conditions). Also noise reduction is performed differently according to the kind of remote-sensing images considered. In optical images common low-pass filters can be used, whereas in SAR images proper despeckling filters should be applied. The comparison step aims at producing a further image where differences between the two acquisitions considered are highlighted. Different mathematical operators (see Table 17.2 for a summary) can be adopted for performing image comparison; this choice gives rise to different kinds of techniques [13,19–23]. One of the most widely used operators is the difference one. The difference can be applied to: (1) a single spectral band (univariate image differencing) [13,21–23], (2) multiple spectral bands (change vector analysis) [13,24], and (3) vegetation indices (vegetation index differencing) [13,19] or other linear (e.g., tasselled cap transformation [22]) or nonlinear combinations of spectral bands. Another widely used operator is the ratio operator (image ratioing) [13], which can be successfully used in SAR image processing [17,25,26]. A different approach is based on the use of the principal component analysis (PCA) [13,20,23]. PCA can be applied separately to the feature space at single times or jointly to both images. In the first case, comparison should be performed in the transformed feature space before performing change detection; in the second case, the minor components of the transformed feature space contains change information. TABLE 17.2 Summary of the Most Widely Used Comparison Operators. (fk is the considered feature at time tk that can be: (1) a single spectral band Xbk, (2) a vector of m spectral bands [Xk1, . . . ,Xkm], (3) a vegetation index Vk, or (4) a vector of features [Pk1, . . . ,Pkm] obtained after PCA. XD and XR are the images after comparison with the difference or ratio operators, respectively) Technique Univariate image differencing Vegetation index differencing Image rationing Change vector analysis Principal component analysis

Feature Vector fk at the Time tk fk fk fk fk fk

¼ ¼ ¼ ¼ ¼

Xkb Vk Xkb [Xk1, . . . ,Xkm] [Pk1, . . . , Pkn]

Comparison Operator XD XD XR XD XD

¼ ¼ ¼ ¼ ¼

f2 f1 f2 f1 f2=f1 kf2 f1k kf2 f1k

Unsupervised Change Detection in Multi-Temporal SAR Images

379

Performances of the above-mentioned techniques could be degraded by several factors (like differences in illumination at two dates, differences in atmospheric conditions, and in sensor calibration) that make a direct comparison between raw images acquired at different times difficult. These problems related to unsupervised change detection disappear when dealing with SAR images instead of optical data. Once image comparison is performed, the decision threshold can be selected either with a manual trial-and-error procedure (according to the desired trade-off between false and missed alarms) or with automatic techniques (e.g., by analyzing the statistical distribution of the image obtained after comparison, by fixing the desired false alarm probability [27,28], or following a Bayesian minimum-error decision rule [17]). Since the remote-sensing community has devoted more attention to passive sensors [6–8] rather than active SAR sensors, in the following section we focus our attention on changedetection techniques for SAR data. Although change-detection techniques based on different architectures have been proposed for SAR images [29–37], we focus on the most widely used techniques, which are based on the three-step procedure described above (see Figure 17.1). 17.2.2

Change Detection in SAR Images

Let us consider two co-registered intensity SAR images, X1 ¼ {X1(i,j), 1 i I, 1 j J} and X2 ¼ {X2(i,j), 1 i I, 1 j J}, of size I J, acquired over the same area at different times t1 and t2. Let V ¼ {vc, vu} be the set of classes associated with changed and unchanged pixels. Let us assume that no ground-truth information is available for the design of the change-detection algorithm, i.e., the statistical analysis of change and no-change classes should be performed only on the basis of the raw data. The changedetection process aims at generating a change-detection map representing changes on the ground between the two considered acquisition dates. In other words, one of the possible labels in V should be assigned to each pixel (i,j) in the scene. 17.2.2.1

Preprocessing

The first step for properly performing change detection based on direct image comparison is image preprocessing. This procedure aims at generating two images that are as similar as possible unless in changed areas. As SAR data are not corrupted by differences in atmospheric and sunlight conditions, preprocessing usually comprises three steps: (1) geometric correction, (2) co-registration, and (3) noise reduction. The first procedure aims at reducing distortions that are strictly related to the active nature of the SAR signal, as layover, foreshortening, and shadowing due to ground topography. The second step is very important, as it allows aligning temporal images to ensure that corresponding pixels in the spatial domain are associated to the same geographical position on the ground. Co-registration in SAR images is usually carried out by maximizing crosscorrelation between the multi-temporal images [38,39]. The major drawback of this process is the need for performing interpolation of backscattering values, which is a time-consuming process. Finally, the last step is aimed at reducing the speckle noise. Many different techniques have been developed in the literature for reducing the speckle. One of the most attractive techniques for speckle reduction is multi-looking [25]. This procedure, which is used for generating images with the same resolution along the azimuth and range directions, allows reduction of the effect of the coherent speckle components. However, a further filtering step is usually applied to the images for making them suitable to the desired analysis. Usually, adaptive despeckling procedures are applied. Among these procedures we mention the following filtering techniques:

380

Signal and Image Processing for Remote Sensing

Frost [40], Lee [41], Kuan [42], Gamma Map [43,44], and Gamma WMAP [45] (i.e., the Gamma MAP filter applied in the wavelet domain). As the description of despeckling filters is outside the scope of this chapter, we refer the reader to the literature for more details.

17.2.2.2 Multi-Temporal Image Comparison As described in Section 17.1, image pixel-by-pixel comparison can be performed by means of different mathematical operators. In general, the most widely used operators are the difference and the ratio (or log-ratio). Depending on the selected operator, the image resulting from the comparison presents different behaviors with respect to the change-detection problem and to the signal statistics. To analyze this issue, let us consider two multi-look intensity images. It is possible to show that the measured backscattering of each image follows a Gamma distribution [25,26], that is, LL XkL1 LXk p(Xk ) ¼ L exp , mk mk (L 1)!

k ¼ 1, 2

(17:1)

where Xk is a random variable that represents the value of the pixels in image Xk (k ¼ 1, 2), mk is the average intensity of a homogeneous region at time tk, and L is the equivalent number of looks (ENL) of the considered image. Let us also assume that the intensity images X1 and X2 are statistically independent. This assumption, even if not entirely realistic, simplifies the analytical derivation of the pixel statistical distribution in the image after comparison. In the following, we analyze the effects of the use of the difference and ratio (log-ratio) operators on the statistical distributions of the signal. 17.2.2.2.1 Difference Operator The difference image XD is computed subtracting the image acquired before the change from the image acquired after, i.e., XD ¼ X2 X1

(17:2)

Under the stated conditions, the distribution of the difference image XD is given by [25,26]: n o XD j L1 X LL exp L m2 (L 1 þ j)! m1 m2 L1J X p(XD ) ¼ j!(L 1 j)! D (L 1)! (m1 þ m2 )L L(m1 þ m2 ) j¼0

(17:3)

where XD is a random variable that represents the values of the pixels in XD. As can be seen, the difference-image distribution depends on both the relative change between the intensity values in the two images and also a reference intensity value (i.e., the intensity at t1 or t2). It is possible to show that the distribution variance of XD increases with the reference intensity level. From a practical point of view, this leads to a higher changedetection error for changes that have occurred in high intensity regions of the image than in low intensity regions. Although in some applications the difference operator was used with SAR data [46], this behavior is an undesired effect that renders the difference operator intrinsically not suited to the statistics of SAR images.

Unsupervised Change Detection in Multi-Temporal SAR Images 17.2.2.2.2

381

Ratio Operator

The ratio image XR is computed by dividing the image acquired after the change by the image acquired before (or vice-versa), i.e., X R ¼ X 2 =X 1

(17:4)

It is possible to prove that the distribution of the ratio image XR can be written as follows [25,26]: p(XR ) ¼

L XL1 (2L 1)! X R R R þ XR )2L (L 1)!2 (X

(17:5)

R is the where XR is a random variable that represents the values of the pixels in XR and X true change in the radar cross section. The ratio operator shows two main advantages over the difference operator. The first one is that the ratio-image distribution depends R ¼ m2=m1 in the average intensity between the two dates only on the relative change X and not on a reference intensity level. Thus changes are detected in the same manner both in high- and low-intensity regions. The second advantage is that the ratioing allows reduction in common multiplicative error components (which are due to both multiplicative sensor calibration errors and the multiplicative effects of the interaction of the coherent signal with the terrain geometry [25,47]), as far as these components are the same for images acquired with the same geometry. It is worth noting that, in the literature, the ratio image is usually expressed in a logarithmic scale. With this operation the distribution of the two classes of interest (vc and vu) in the ratio image can be made more symmetrical and the residual multiplicative speckle noise can be transformed to an additive noise component [17]. Thus the log-ratio operator is typically preferred when dealing with SAR images and change detection is performed analyzing the log-ratio image XLR defined as: X LR ¼ log X R ¼ log

X2 ¼ log X 2 log X 1 X1

(17:6)

Based on the above considerations, the ratio and log-ratio operators are more used than the difference one in SAR change-detection applications [17,26,29,47–49]. It is worth noting that for keeping the changed class on one side of the histogram of the ratio (or log-ratio) image, a normalized ratio can be computed pixel-by-pixel, i.e.,

X NR

X1 X2 ¼ min , X2 X1

(17:7)

This operator allows all changed areas (independently of the increasing or decreasing value of the backscattering coefficient) to play a similar role in the change-detection problem.

17.2.2.3

Analysis of the Ratio and Log-Ratio Image

The most widely used approach to extract change information from the ratio and log-ratio image is based on histogram thresholding4. In this context, the most difficult task is to 4

For simplicity, in the following, we will refer to the log-ratio image.

382

Signal and Image Processing for Remote Sensing

properly define the threshold value. Typically, changed pixels are identified as those pixels that modified their backscattering more than + x dB, where x is a real number depending on the considered scene. The value of x is fixed according to the kind of change and the expected magnitude variation to obtain a desired probability of correct detection Pd (which is the probability to be over the threshold if a change occurred) or false alarm Pfa (which is the probability to be over the threshold if no change occurred). It has been shown that the value of x can be analytically defined as a function of the true change in R and of the ENL L [25,26], once Pd and Pfa are fixed. This the radar backscattering X means that there exists a value of L such that the given constraints on Pd and Pfa are satisfied. The major drawback of this approach is that, as the desired change intensity decreases and the detection probability increases, a ratio image with an even higher ENL is required for constraint satisfaction. This is due to the sensitivity of the ratio to the presence of speckle; thus a complex preprocessing procedure is required for increasing the ENL. A similar approach is presented in Ref. [46]; it identifies the decision threshold on the basis of predefined values on the cumulative histogram of the difference image. It is worth noting that these approaches are not fully automatic and objective from an application point of view, as they depend on the user’s sensibility in constraint definition with respect to the considered kind of change. Recently, extending the work previously carried out for MS passive images [3,5,23,50], a novel Bayesian framework has been developed for performing automatic unsupervised change detection in the log-ratio image derived from SAR data. The aim of this framework is to use the well-known Bayes decision theory in unsupervised problems for deriving decision thresholds that optimize the separation between changed and unchanged pixels. The main problems to be solved for the application of the Bayes decision theory consist in the estimation of both the probability density functions p(XLR=vc) and p(XLR=vu) and the a-priori probabilities P(vc) and P(vu) of the classes vc and vu, respectively [51], without any ground-truth information (i.e., without any training set). The starting point of such kinds of methodologies is the hypothesis that the statistical distributions of pixels in the log-ratio image can be modeled as a mixture of two densities associated with the classes of changed and unchanged pixels, i.e., p(XLR ) ¼ p(XLR =vu )P(vu ) þ p(XLR =vc )P(vc )

(17:8)

Under this hypothesis, two different approaches to estimate class statistical parameters have been proposed in the literature: (1) an implicit approach [17] and (2) an explicit approach [49]. The first approach derives the decision threshold according to an implicit and biased parametric estimation of the statistical model parameters, carried out on the basis of simple cost functions. In this case, the change-detection map is computed in a one-step procedure. The second approach separates the image analysis in two steps: (1) estimation of the class statistical parameters and (2) definition of the decision threshold based on the estimated statistical parameters. Both techniques require the selection of a proper statistical model for the distributions of the change and no-change classes. In Ref. [17], it has been shown that the generalized Gaussian distribution is a flexible statistical model that allows handling the complexity of the log-ratio images better than the more commonly used Gaussian distribution. Based on this consideration, in Refs. [17] the wellknown Kittler and Illingworth (KI) thresholding technique (which is an implicit estimation approach) [52–55] was reformulated under the generalized Gaussian assumption for the statistical distributions of classes. Despite its simplicity, the KI technique produces satisfactory change-detection results. The alternative approach, proposed in Refs. [56–58], which is based on a theoretically more precise explicit procedure for the estimation of statistical parameters of classes, exploits the combined use of the expectation–maximization (EM)

Unsupervised Change Detection in Multi-Temporal SAR Images

383

algorithm and of the Bayesian decision theory for producing the change-detection map. In Ref. [49], the iterative EM algorithm (reformulated under the hypothesis of generalized Gaussian data distribution) was successfully applied to the analysis of the log-ratio image. Once the statistical parameters are computed, pixel-based or context-based decision rules can be applied. In the former group, we find the Bayes rule for minimum error, the Bayes rule for minimum cost, the Neyman–Pearson criterion, etc. [3,5,24,59]. In the latter group, we find the contextual Bayes rule for minimum error formulated in the Markov random field (MRF) framework [24,60]. In both cases (implicit and explicit parameter estimation), change-detection accuracy increases as the ENL increases. This means that, depending on the data and the application, an intensive despeckling phase may be required to achieve good change-detection accuracies [17]. It is worth noting that Equation 17.8 assumes that only one kind of change5 has occurred in the area under investigation between the two acquisition dates. However, techniques that can automatically identify the number of changes and the related threshold values have been recently proposed in the literature (both in the context of implicit [61] and explicit [60] approaches). We refer the reader to the literature for greater details on these approaches. Depending on the kind of preprocessing applied to the multi-temporal images, the standard thresholding techniques can achieve different trade-offs between the preservation of detail and accuracy in the representation of homogeneous areas in change-detection maps. In most of the applications, both properties need to be satisfied; however, they are contrasting to each other. On the one hand, high accuracy in homogeneous areas usually requires an intensive despeckling phase; on the other hand, intensive despeckling degrades the geometrical details in the SAR images. This is due to both the smoothing effects of the filter and the removal of the informative components of the speckle (which are related to the coherent properties of the SAR signal). To address the above limitations of standard methods, in the next section we present an adaptive scale-driven approach to the analysis of the log-ratio image recently developed by the authors.

17.3

Advanced Approaches to Change Detection in SAR Images: A Detail-Preserving Scale-Driven Technique

In this chapter we present a scale-driven approach to unsupervised change detection in SAR images, which is based on a multi-scale analysis of the log-ratio image. This approach can be suitably applied to medium- and high-resolution SAR images to produce change-detection maps characterized by high accuracy both in modeling details present in the scene (e.g., border of changed areas) and in homogeneous regions. Multi-temporal SAR images intrinsically contain different areas of changes in the spatial domain that can be modeled at different spatial resolutions. The identification of these areas with high accuracy requires the development of proper change-detection techniques capable of handling information at different scales. The rationale of the presented scale-driven unsupervised change-detection method is to exploit only high-resolution levels in the analysis of the expected edge (or detail) pixels and to use low-resolution levels also in the processing of pixels in homogeneous areas, to improve both preservation of geometrical detail and accuracy in homogeneous areas in the final change-detection map. In 5

Or different kinds of changes that can be represented with a single generalized Gaussian distribution.

Signal and Image Processing for Remote Sensing

384

the following, we present the proposed multi-scale change-detection technique in the context of the analysis of multi-temporal SAR images. However, we expect that the methodology has general validity and can also be applied successfully to change detection in very high resolution optical and SAR images with small modifications in the implementation procedures. The presented scale-driven technique is based on three main steps: (1) multi-scale decomposition of the log-ratio image, (2) adaptive selection of the reliable scales for each pixel (i.e., the scales at which the considered pixel can be represented without border details problems) according to an adaptive analysis of its local statistics, and (3) scaledriven combination of the selected scales (Figure 17.2). Scale-driven combination can be performed by following three different strategies: (1) fusion at the decision level by an ‘‘optimal’’ scale selection, (2) fusion at the decision level of all reliable scales, and (3) fusion at the feature level of all reliable scales. The first step of the proposed method aims at building a multi-scale representation of the change information in the considered test site. The desired scale-dependent representation can be obtained by applying different decomposition techniques to the data, such as Laplacian–Gaussian pyramid decomposition [62], wavelet transform [63,64], recursively upsampled bicubic filter [65], etc. Given the computational cost and the assumption of the additive noise model required by the above techniques, we chose to apply the multiresolution decomposition process to the log-ratio image XLR, instead of decomposing the two original images X1 and X2 separately. At the same time this allows a reduction in the computational cost and satisfies the additive noise model hypothesis. The selection of the most appropriate multi-resolution technique is related to the statistical behaviors of XLR and will be discussed in the next section. The multi-resolution decomposition step produces 0 n N1 a set of images XMS ¼ {XLR , ... , XLR , ... , XLR }, where the superscript n (n ¼ 0, 1,..., N1) indicates the resolution level. As we consider a dyadic decomposition process, the scale corresponding to each resolution level is given by 2n. In our notation, the output at 0 resolution level 0 corresponds to the original image, i.e., XLR XLR. For n ranging from 0 to N1, the obtained images are distinguished by different trade-offs between preservation of spatial detail and speckle reduction. In particular, images with a low value of n are strongly affected by speckle, but they are characterized by a large amount of geometrical details, whereas images identified by a high value of n show significant speckle reduction and contain degraded geometrical details (high frequencies are smoothed out). In the second step, local and global statistics are evaluated for each pixel at different resolution levels. At each level and for each spatial position, by comparing the local and X LR

Multi-resolution decomposition

0

X LR

…

n

X LR

…

N−1

Adaptive scale identification

X LR

… Scale-driven fusion

Change-detection map (M ) FIGURE 17.2 General scheme of the proposed approach.

…

Unsupervised Change Detection in Multi-Temporal SAR Images

385

global statistical behaviors it is possible to identify adaptively whether the considered scale is reliable for the analyzed pixel. The selected scales are used to drive the last step, which consists of the generation of the change-detection map according to a scale-driven fusion. In this paper, three different scale-driven combination strategies are proposed and investigated. Two perform fusion at the decision level, while the third performs it at the feature level. Fusion at the decision level can either be based on ‘‘optimal’’ scale selection or on the use of all reliable scales; fusion at the feature level is carried out by analyzing all reliable scales.

17.3.1

Multi-Resolution Decomposition of the Log-Ratio Image

As mentioned in the previous section, our aim is to handle the information at different scales (resolution levels) to improve both preservation of geometrical detail and accuracy in homogeneous areas in the final change-detection map. Images included in the set XMS are computed by adopting a multi-resolution decomposition process of the log-ratio image XLR. In the SAR literature [45,66–70], image multi-resolution representation has been applied extensively to image de-noising. Here, a decomposition based on the twodimensional discrete stationary wavelet transform (2D-SWT) has been adopted, as in our image analysis framework it has a few advantages (as described in the following) over the standard discrete wavelet transform (DWT) [71]. As the log-ratio operation transforms the SAR signal multiplicative model into an additive noise model, SWT can be applied to XLR without any additional processing. 2D-SWT applies appropriate level-dependent highand low-pass filters with impulse response hn() and ln(), (n ¼ 0, 1, . . . , N 1), respectively, to the considered signal at each resolution level. A one-step wavelet decomposition is based on both level-dependent high- and low-pass filtering, first along rows and then along columns to produce four different images at the next scale. After each convolution step, unlike DWT, SWT avoids down-sampling the filtered signals. Thus, according to the scheme in Figure 17.3, the image XLR is decomposed into four images of the same size as LL1 the original. In particular, decomposition produces: (1) a lower resolution version XLR of image XLR, which is called the approximation sub-band, and contains low spatial frequencies both in the horizontal and the vertical direction at resolution level 1; and (2) LH1 HL1 HH1 three high-frequency images XLR , XLR , and XLR , which correspond to the horizontal, vertical, and diagonal detail sub-bands at resolution level 1, respectively. Note that, superscripts LL, LH, HL, and HH specify the order in which high-(H) and low-(L) pass filters have been applied to obtain the considered sub-band.

Row-wise

Column-wise LL

l 0(·)

X LR1

h0(·)

LH1 X LR

0 l (·)

X LR1

h0(·)

X LR 1

l 0(·)

X LR

HL

h0(·) HH

FIGURE 17.3 Block scheme of the stationary wavelet decomposition of the log-ratio image XLR.

Signal and Image Processing for Remote Sensing

386

Multi-resolution decomposition is obtained by recursively applying the described LLn procedure to the approximation sub-band XLR obtained at each scale 2n. Thus, the outputs at a generic resolution level n can be expressed analytically as follows: LL X LR(nþ1) (i,

j) ¼

n n D 1 D 1 X X

p¼0 LH X LR (nþ1) (i,

j) ¼

n n D 1 D 1 X X

p¼0 HL

HH

n n D 1 D 1 X X

p¼0

X LR (nþ1) (i, j) ¼

p¼0

n ln [p]hn [q]X LL LR (i þ p, j þ q)

q¼0

n n D 1 D 1 X X

X LR (nþ1) (i, j) ¼

n ln [p]ln [q]X LL LR (i þ p, j þ q)

q¼0

n hn [p]ln [q]X LL LR (i þ p, j þ q)

q¼0

q¼0

n hn [p]hn [q]X LL LR (i þ p, j þ q)

(17:9)

where Dn is the length of the wavelet filters at resolution level n. At each decomposition step, the length of the impulse response of both high- and low-pass filters is up-sampled by a factor 2. Thus, filter coefficients for computing sub-bands at resolution level nþ1 can be obtained by applying a dilation operation to the filter coefficients used to compute level n. In particular, 2n1 zeros are inserted between the filter coefficients used to compute sub-bands at the lower resolution levels [71]. This allows a reduction in the bandwidth of the filters by a factor two between subsequent resolution levels. Filter coefficients of the first decomposition step for n ¼ 0 depend on the selected wavelet family and on the length of the chosen wavelet filter. According to an analysis of the literature [68,72], we selected the Daubechies wavelet family and set the filter length to 8. The impulse response of Daubechies of order 4 low-pass filter prototype is given by the following coefficients set: {0.230378, 0.714847, 0.630881, 0.0279838, 0.187035, 0.0308414, 0.0328830, 0.0105974}. The finite impulse response of the high-pass filter for the decomposition step is obtained by satisfying the properties of the quadrature mirror filters. This is done by reversing the order of the low-pass decomposition filter coefficients and by changing the sign of the even indexed coefficients [73]. To adopt the proposed multi-resolution fusion strategies, one should return to the original image domain. This is done by applying the two-dimensional inverse stationary wavelet transform (2D-ISWT) at each computed resolution level independently. For further detail about the stationary wavelet transform, the reader is referred to Ref. [71]. To obtain the desired image set XMS (where each image contains information at a different scale), for each resolution level a one-step inverse stationary wavelet transform is applied in the reconstruction phase as many times as in the decomposition phase. The reconstruction process can be performed by applying the 2D-ISWT either to the approximation and thresholded detail sub-bands at the considered level (this is usually done in wavelet-based speckle filters [69]) or only to the approximation sub-bands at each resolution level6. Since the change-detection phase considers all the different levels, all the geometrical detail is in XMS even when detail coefficients at a particular scale are 6

It is worth noting that the approximation sub-band contains low frequencies in both horizontal and vertical directions. It represents the input image at a coarser scale and contains most informative components, whereas detail sub-bands contain information related to high frequencies (i.e., both geometrical detail information and noise components) each in a preferred direction. According to this observation, it is easy to understand how proper thresholding of detail coefficients allows noise reduction [69].

Unsupervised Change Detection in Multi-Temporal SAR Images

387

neglected (in other words, the details removed at a certain resolution level are recovered at a higher level without removing them from the decision process). Once all resolution levels have been brought back to the image domain, the desired multi-scale sequence of n images XLR (n ¼ 0, 1, . . . , N 1) is complete and each element in XMS has the same size as the original image. It is important to point out that unlike DWT, SWT avoids decimating. Thus, this multiresolution decomposition strategy ‘‘fills in the gaps’’ caused by the decimation step in the standard wavelet transform [71]. In particular, the SWT decomposition preserves translation invariance and allows avoiding aliasing effects during synthesis without providing high-frequency components.

17.3.2

Adaptive Scale Identification

n Based on the set of multi-scale images XLR (n ¼ 0, 1, . . . , N 1) obtained, we must identify reliable scales for each considered spatial position to drive the next fusion stage with this information. By using this information we can obtain change-detection maps characterized by high accuracy in homogeneous and border areas. Reliable scales are selected according to whether the considered pixel belongs to a border or a homogeneous area at different scales. It is worth noting that the information at low-resolution levels is not reliable for pixels belonging to the border area, because at those scales details and edge information has been removed from the decomposition process. Thus, a generic scale is reliable for a given pixel, if the pixel at this scale is not in a border region or if it does not represent a geometrical detail. To define whether a pixel belongs to a border or a homogeneous area at a given scale n, we use a multi-scale local coefficient of variation (LCVn), as typically done in adaptive speckle de-noising algorithms [70,74]. This allows better handling of any residual multiplicative noise that may still be present in the scale selection process after ratioing7. As the coefficient of variation cannot be computed on the multi-scale log-ratio image sequence, the analysis is applied to the multi-resolution ratio image sequence, which can be easily obtained from the former by inverting the logarithm operation. Furthermore, it should be mentioned that by working on the multi-resolution ratio sequence we can design a homogeneity test capable of identifying border regions (or details) and no-border regions related to the presence of changes on the ground. This is different from applying the same test to the original images (which would result in identifying border and no-border regions with respect to the original scene but not with respect to the change signal). The LCVn is defined as:

LCVn (i, j) ¼

sn (i, j) mn (i, j)

(17:10)

where sn(i, j) and mn(i, j) are the local standard deviation and the local mean, respectively, computed for the spatial position (i, j) at resolution level n (n ¼ 0, 1, . . . , N 1), on a moving window of a user-defined size. Windows that are too small reduce the reliability of the local statistical parameters, while those that are too large decrease in sensitivity to identify geometrical details. Thus the selected size should be a trade-off between the above properties. The normalization operation defined in Equation 17.10 helps in adapting the standard deviation to the multiplicative speckle model. This coefficient is a 7 An alternative choice could be to use the standard deviation computed on the log-ratio image. However, in this way we would neglect possible residual effects of the multiplicative noise component.

Signal and Image Processing for Remote Sensing

388

measure of the scene heterogeneity [74]: low values correspond to homogeneous areas, while high values refer to heterogeneous areas (e.g., border areas and point targets). To separate the homogeneous from the heterogeneous regions, a threshold value must be defined. In a homogeneous region the degree of homogeneity can be expressed in relation to the global coefficient of variation (CVn) of the considered image at resolution level n, which is defined as: CVn ¼

sn mn

(17:11)

where sn and mn are the mean and the standard deviation computed over a homogeneous region at resolution level n, (n ¼ 0, 1, . . . , N 1). Homogeneous regions at each scale can be defined as those regions that satisfy the following condition: LCVn (i, j) CVn

(17:12)

In detail, a resolution level r (r ¼ 0, 1, . . . , N 1) is said to be reliable for a given pixel if Equation 17.12 is satisfied for all resolution levels t (t ¼ 0, 1, . . . , r). Thus, for the pixel (i, j), Rij the set XMS of images with a reliable scale is defined as: n o Rij Sij X MS ¼ X 0LR , . . . , X nLR , . . . , X LR , with Sij N 1

(17:13)

where Sij is the level with the lowest resolution (identified by the highest value of n), such that the pixel can be represented without any border problems and therefore it satisfies the definition of a reliable scale shown in Equation 17.12 (note that the value of Sij is pixeldependent). It is worth noting that, if the scene contains different kinds of changes with different radiometry (e.g., with increasing and decreasing radiometry), the above analysis should be applied to the normalized ratio image XNR (Equation 17.7) (rather than to the standard ratio image XR). This makes the identification of border areas independent of the order with which the images are considered in the ratio, thus allowing all changed areas (independently of the related radiometry) to play a similar role in the definition of border pixels.

17.3.3

Scale-Driven Fusion

Rij Once the set XMS has been defined for each spatial position, selected reliable scales are used to drive the fourth step, which consists of the generation of the change-detection map according to a scale-driven fusion. In this chapter three different fusion strategies are reported: two of them perform fusion at the decision level, while the third performs it at the feature level. Fusion at the decision level can either be based on ‘‘optimal’’ scale selection Rij (FDL-OSS) or on the use of all reliable scales (i.e., scales included in XMS ) (FDL-ARS); fusion at the feature level is carried out by analyzing all reliable scales (FFL-ARS). For each pixel, the FDL-OSS strategy only considers the reliable level with the lowest resolution, that is, the ‘‘optimal’’ resolution level Sij. The rationale of this strategy is that the reliable level with the lowest resolution presents an ‘‘optimal’’ trade-off between speckle reduction and detail preservation for the considered pixel. In detail, each scaledependent image in the set XMS is analyzed independently to discriminate between the two classes vc and vu associated with change and no-change classes, respectively. The n desired partitioning for the generic scale n can be obtained by thresholding XLR . It is

Unsupervised Change Detection in Multi-Temporal SAR Images

389

worth noting that since the threshold value is scale-dependent, given the set of images 0 n N1 XMS ¼ {XLR , . . . , XLR , . . . , XLR }, we should determine (either automatically [17,52,59] or manually) a set of threshold values T ¼ {T0, . . . , Tn, . . . , TN 1}. Regardless of the threshold-selection method adopted, a sequence of change-detection maps 0 n MMS ¼ {M0, . . . , Mn, . . . , MN 1} is obtained from the images in XMS ¼ {XLR , . . . , XLR ,..., N1 XLR }. A generic pixel M(i,j) in the final change-detection map M is assigned to the class it belongs to in the map MSij (2 MMS) computed at its optimal selected scale, i.e., M(i, j) 2 vk , MSij (i, j) 2 vk ,

with k ¼ {c, u}

and

Sij N 1

(17:14)

The accuracy of the resulting change-detection map depends both on the accuracy of the maps in the multi-resolution sequence and on the effectiveness of the procedure adopted to select the optimal resolution level. Both aspects are affected by the amount of residual Sij noise in XLR . The second approach that considers FDL-ARS, makes the decision process more robust R to noise. For each pixel, the set MMS (i, j) ¼ {M0 (i, j), . . . , Mn (i, j), . . . , MSij (i, j)} of the R related reliable multi-resolution labels is considered. Each label in MMS can be seen as a decision of a member of a pool of experts. Thus the pixel is assigned to the class that obtains the highest number of votes. In fact, the final change-detection map M is comR puted by applying a majority voting rule to the set MMS (i, j), at each spatial position. The class that receives the largest number of votes Vvk (i, j), k ¼ {c, u}, represents the final decision for the considered input pattern, i.e., M(i, j) 2 vk , vk ¼ arg max Vvh (i, j) , vh 20

k ¼ {c, u}

(17:15)

The main disadvantage of the FDL-ARS strategy is that it only considers the final classification of each pixel at different reliable scales. A better utilization of the information in the multi-resolution sequence XMS can be obtained by considering a fusion at feature level strategy (FFL-ARS). To accomplish the fusion process at different scales, a R

0

n

N1

new set of images X MS ¼ {X MS , . . . , X MS , . . . , X MS } is computed by averaging all possible sequential combinations of images in XMS, i.e., n

X MS ¼

n 1 X Xh , n þ 1 h¼0 LR

with n ¼ 0, 1, . . . , N 1

(17:16)

where the superscript n identifies the highest scale included in the average operation. n When low values of n are considered, the image X MS contains a large amount of both n geometrical details and speckle components, whereas when n increases, the image X MS contains a smaller amount of both geometrical details and speckle components. A pixel in position (i, j) is assigned to the class obtained by applying a standard thresholding Sij procedure to the image X MS , Sij N 1, computed by averaging on the reliable scales selected for that spatial position, i.e., ( Sij vu if X MS (i, j) T Sij M(i, j) 2 (17:17) Sij vc if X MS (i, j) >TSij where TSij is the decision threshold optimized (either automatically [17,52,59] or manuSij ally) for the considered image X MS . The latter strategy is capable of exploiting also the information component in the speckle, as it considers all the high frequencies in the

Signal and Image Processing for Remote Sensing

390

decision process. It is worth noting that in the FFL-ARS strategy, as the information n present at a given scale r is also contained in all images XLR with n < r, the components characterizing the optimal scale Sij (and the scales closer to the optimal one) in the fusion process are implicitly associated with greater weights than those associated with other considered levels. This seems reasonable, given the importance of these components for the analyzed spatial position.

17.4

Expe rimental Results and C omparisons

In this section experimental results obtained by applying the proposed multi-scale change-detection technique to a data set of multi-temporal remote-sensing SAR images are reported. These results are compared with those obtained by applying standard single-scale change-detection techniques to the same data set. In all trials involving image thresholding, the optimal threshold value was obtained according to a manual trial-and-error procedure. We selected for each image the threshold value (among all possible) that showed the minimum overall error in the change-detection map compared to the available reference map. This ensured it is possible to properly compare method performances without any bias due to human operator subjectivity or to the kind of automatic thresholding algorithm adopted. However, any type of automatic thresholdselection technique can be used with this technique (see Ref. [17] for more details about automatic thresholding of the log-ratio image). Performance assessment was accomplished both quantitatively (in terms of overall errors, false, and missed alarms) and qualitatively (according to a visual comparison of the produced change-detection maps with reference data).

17.4.1

Data Set Description

The data set used in the experiments is made up of two SAR images acquired by the ERS1 SAR sensor (C-band and VV-polarization) in the province of Saskatchewan (Canada) before (1st July) and after (14th October) the 1995 fire season. The two images considered are characterized by a geometrical resolution of 25 m in both directions and by a nominal number of looks equal to 5. The selected test site (see in Figure 17.4a and Figure 17.4b) is a section (350350 pixels) of the entire available scene. A fire caused by a lightning event destroyed a large portion of the vegetation in the considered area between the two dates mentioned above. The two multi-look intensity images were geocoded using the digital elevation model (DEM) GTOPO30; no speckle reduction algorithms were applied to the images. The logratio image was computed from the above data according to Equation 17.4. To enable a quantitative evaluation of the effectiveness of the proposed approach, a reference map was defined manually (see Figure 17.5b). To this end, we used the available ground-truth information provided by the Canadian Forest Service (CFS) and by the fire agencies of the individual Canadian provinces. Ground-truth information is coded in a vector format and includes information about fires (e.g., geographical coordinates, final size, cause, etc.) that occurred from 1981 to 1995 and in areas greater than 200 ha in final size. CFS ground truth was used for a rough localization of the burned areas as it shows a medium geometrical resolution. An accurate identification of the boundaries of the burned areas was obtained from a detailed visual analysis of the two original 5-look

Unsupervised Change Detection in Multi-Temporal SAR Images (a)

391

(b)

(c)

FIGURE 17.4 Images of the Saskatchewan province, Canada, used in the experiments. (a) Image acquired from the ERS-1 SAR sensor in July 1995, (b) image acquired from the ERS-1 SAR sensor in October 1995, and (c) analyzed log-ratio image.

intensity images (Figure 17.4a and Figure 17.4b), the ratio image, and the log-ratio image (Figure 17.4c), carried out accurately in cooperation with experts in SAR-image interpretation. In particular, different color composites of the above-mentioned images were used to highlight all the portions of the changed areas in the best possible way. It is worth noting that no despeckling or wavelet-based analysis was applied to the images exploited to generate the reference map for this process to be as independent as possible of the methods adopted in the proposed change-detection technique. In generating the reference map, the irregularities of the edges of the burned areas were faithfully reproduced to enable accurate assessment of the effectiveness of the proposed change-detection approach. At the end of the process, the obtained reference map contained 101219 unchanged pixels and 21281 changed pixels. Our goal was to obtain, with the proposed automatic technique, a change-detection map as similar as possible to the reference map obtained according to the above-mentioned time-consuming manual process driven by ground-truth information and by experts in SAR image interpretation.

Signal and Image Processing for Remote Sensing

392

FIGURE 17.5 Multi-scale image sequence obtained by applying the wavelet decomposition procedure to the log-ratio image XMS ¼ {X1LR , . . . , X7LR }.

17.4.2

Results

Several experiments were carried out to assess the effectiveness of the proposed changedetection technique (which is based on scale-driven fusion strategies) with respect to classical methods (which are based on thresholding of the log-ratio image). In all trials involving image thresholding, the optimal threshold value was obtained according to a manual trial-and-error procedure. Among all possible values, we selected for each image the threshold value that showed the minimum overall error in the changedetection map compared to the reference map. Subsequently, it was possible to evaluate the optimal performance of the proposed methodology without any bias due to human operator subjectivity or to the fact that the selection was made by an automatic thresholding algorithm. However, any type of automatic threshold-selection technique can be used with this technique (see Ref. [17] for more details about automatic thresholding of

Unsupervised Change Detection in Multi-Temporal SAR Images

393

the log-ratio image). As the described procedure is independently optimized for each considered image, it leads to different threshold values in each case. Performance assessment was accomplished both quantitatively (in terms of overall errors, false, and missed alarms) and qualitatively (according to a visual comparison of the produced change-detection maps with reference data). To apply the three scale-driven fusion strategies (see Section 17.3.3), the log-ratio image was first decomposed into seven resolution levels by applying the Daubechies-4 SWT. Each computed approximation sub-band was used to construct different scales, that is, 7 XMS ¼ {X1LR , . . . , XLR } (see Figure 17.5). For simplicity, only approximation sub-bands have been involved in the reconstruction phase (it is worth noting that empirical experiments on real data have confirmed that details sub-band elimination does not affect the 0 change-detection accuracy). Observe that the full-resolution original image XLR ( XLR ) was discarded from the analyzed set, as it was affected by a strong speckle noise. In 0 particular, empirical experiments pointed out that when XLR is used on this data set the accuracy of the proposed change-detection technique gets degraded. Nevertheless, in the general case, resolution level 0 can also be considered and should not be discarded a priori. A number of trials were carried out to identify the optimal window size to compute the local coefficient of variation (LCVn) used to detect detail pixels (e.g., border) at different resolution levels. The optimal size (i.e., the one that gives the minimum overall error) was selected for all analyzed strategies (see Table 17.3). Table 17.3 summarizes the quantitative results obtained with the different fusion strategies. As can be seen from the analysis of the overall error, the FFL-ARS strategy gave the lowest error, i.e., 5557 pixels, while the FDL-ARS strategy gave 6223, and the FDL-OSS strategy 7603 (the highest overall error). As expected, by including all the reliable scales in the fusion phase it was possible to improve the change-detection accuracy compared to a single ‘‘optimal’’ scale. The FFL-ARS strategy gave the lowest false and missed alarms, decreasing their values by 1610 and 436 pixels, respectively, compared to the FDL-OSS strategy. This is because on the one hand the FDL-OSS procedure is penalized both by the change-detection accuracy at a single resolution level (which is significantly affected by noise when fine scales are considered) and by residual errors in identifying the optimal scale of a given pixel; on the other hand, the use of the entire subset of reliable scales allows a better exploitation of the information at the highest resolution levels of the multi-resolution sequence in the change-detection process. It is worth noting that FFL-ARS outperformed FDL-ARS also in terms of false (2181 vs. 2695) and missed (3376 vs. 3528) alarms. This is mainly due to its ability to handle better all the information in the scale-dependent images before the decision process. This leads to a more accurate recognition of critical pixels (i.e., pixels that are very close to the boundary between the changed and unchanged classes on the log-ratio image) that

TABLE 17.3 Overall Error, False Alarms, and Missed Alarms (in Number of Pixels and Percentage) Resulting from the Proposed Adaptive Scale-Driven Fusion Approaches False Alarms Fusion Strategy FDL-OSS FDL-ARS FFL-ARS

Missed Alarms

Overall Errors

Pixels

%

Pixels

%

Pixels

%

LCV Window Size

3791 2695 2181

3.75% 2.66% 2.15%

3812 3528 3376

17.91% 16.58% 15.86%

7603 6223 5557

6.21% 5.08% 4.54%

23 23 77 55

Signal and Image Processing for Remote Sensing

394 (a)

(b)

FIGURE 17.6 (a) Change-detection map obtained for the considered data set using the FFL-ARS strategy on all reliable scales and (b) reference map of the changed area used in the experiment.

exploit the joint consideration of all the information present at the different scales in the decision process. For a better understanding of the results achieved, we made a visual analysis of the change-detection maps obtained. Figure 17.6a shows the change-detection map obtained with the FFL-ARS strategy (which proved to be the most accurate), while Figure 17.6b is the reference map. As can be seen, the considered strategy produced a change-detection map that was very similar to the reference map. In particular, the change-detection map obtained with the proposed approach shows good properties both in terms of detail preservation and in terms of high accuracy in homogeneous areas. To assess the effectiveness of the scale-driven change-detection approach, the results obtained with the FFL-ARS strategy were compared with those obtained with a classical change-detection algorithm. In particular, we computed a change-detection map by an optimal (in the sense of minimum error) thresholding of the log-ratio image obtained after despeckling with the adaptive enhanced Lee filter [74]. The enhanced Lee filter was applied to the two original images (since a multiplicative speckle model is required). Several trials were carried out while varying the window size, to find the value that leads to the minimum overall error. The best result for the considered test site (see Table 17.4) was obtained with a 77 window size. The thresholding operation gave an overall error of 8053 pixels. This value is significantly higher than the overall error obtained with the TABLE 17.4 Overall Error, False Alarms, and Missed Alarms (in Number of Pixels and Percentage) Resulting from Classical Change-Detection Approaches False Alarms Applied Filtering Technique Enhanced Lee filter Gamma MAP filter Wavelet de-noising

Missed Alarms

Total Errors

Pixels

%

Pixels

%

Pixels

%

Filter Window Size

3725 3511 2769

3.68% 3.47% 2.74%

4328 4539 4243

20.34% 21.33% 19.94%

8053 8050 7012

6.57% 6.57% 5.72%

77 77 —

Unsupervised Change Detection in Multi-Temporal SAR Images

395

FFL-ARS strategy (i.e., 5557). In addition the proposed scale-driven fusion technique also decreased both the false (2181 vs. 3725) and the missed alarms (3376 vs. 4328) compared to the considered classical procedure. From a visual analysis of Figure 17.7a and Figure 17.6a and Figure 17.6b, it is clear that the change-detection map obtained after the Lee-based despeckling procedure significantly reduces the geometrical detail content in the final change-detection map compared to that obtained with the FFL-ARS approach. This is mainly due to the use of the filter, which not only results in a significant smoothing of the images but also strongly reduces the information component present in the speckle. Similar results and considerations both from a quantitative and qualitative point of view were obtained by filtering the image with the Gamma MAP filter (compare Table 17.3 and Table 17.4). Furthermore, we also analyzed the effectiveness of classical thresholding of the logratio image after de-noising with a recently proposed more advanced despeckling procedure. In particular, we investigated a DWT-based de-noising [69,75] technique (not used previously in change-detection problems). This technique achieves noise reduction in three steps: (1) image decomposition (DWT), (2) thresholding of wavelet coefficients, and (3) image reconstruction by inverse wavelet transformation (IDWT) [69,75]. It is worth noting also that this procedure is based on the multi-scale decomposition of the images. We can therefore better evaluate the effectiveness of the scale-driven procedure in exploiting the multi-scale information obtained by the DWT decomposition. Wavelet-based de-noising was applied to the log-ratio image because an additive speckle model was required. Several trials were carried out varying the wavelet-coefficient denoising algorithm while keeping the type of wavelet fixed, that is, Daubechies-4 (the same used for multi-level decomposition). The best change-detection result (see Table 17.4) was obtained by soft thresholding detail coefficients according to the universal threshold pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ T ¼ 2s2 log (I J), where I J is the image size and s2 is the estimated noise variance [75]. The soft thresholding procedure sets detail coefficients that fall between T and T to zero, and shrinks the module of coefficients that fall out of this interval by a factor T. The noise variance estimation was performed by computing the variance of the diagonaldetail sub-band at the first decomposition level, given the above thresholding approach. However, in this case also the error (i.e., 7012 pixels) obtained was significantly higher than the overall error obtained with the multi-scale change-detection technique based on the FFL-ARS strategy (i.e., 5557 pixels). Moreover, the multi-scale method performs better also in terms of false and missed alarms, which were reduced from 4243 to 3376, and from 2769 to 2181, respectively (see Table 17.3 and Table 17.4). By analyzing Figure 17.7b, Figure 17.6a, and Figure 17.6b, it can be seen that the change-detection map obtained by thresholding the log-ratio image after applying the DWT-based de-noising algorithm preserves geometrical information well. Nevertheless, on observing the map in greater detail, it can be concluded qualitatively that the spatial fidelity obtained with this procedure is lower than that obtained with the proposed approach. This is confirmed, for example, when we look at the right part of the burned area (circles in Figure 17.7b), where some highly irregular areas saved from the fire are properly modeled by the proposed technique, but smoothed out by the procedure based on DWT de-noising. This confirms the quantitative results and thus the effectiveness of the proposed approach in exploiting information from multi-level image decomposition. It is worth noting that the improvement in performance shown by the proposed approach was obtained without any additional computational burden compared to the thresholding procedure after wavelet de-noising. In particular, both methods require analysis and synthesis steps (though for different purposes). The main difference between the two considered techniques is the scale-driven combination step, which

Signal and Image Processing for Remote Sensing

396 (a)

(b)

FIGURE 17.7 Change-detection maps obtained for the considered data set by optimal manual thresholding of the log-ratio image after the despeckling with (a) the Lee-enhanced filter and (b) the DWT-based technique.

does not increase the computational time required by the thresholding of detail coefficients according to the standard wavelet-based de-noising procedure.

17.5

Conclusions

In this chapter the problem of unsupervised change detection in multi-temporal remotesensing images has been addressed. In particular, after a general overview on the unsupervised change-detection problem, attention has been focused on multi-temporal SAR images. A brief analysis of the literature on unsupervised techniques in SAR images has been reported, and a novel adaptive scale-driven approach to change detection in multi-temporal SAR data (recently developed by the authors) has been proposed. Unlike classical methods, this approach exploits information at different scales (obtained by a wavelet-based decomposition of the log-ratio image) to improve the accuracy and geometric fidelity of the change-detection map. Three different fusion strategies that exploit the subset of reliable scales for each pixel have been proposed and tested: (1) fusion at the decision level by an optimal scale selection (FDL-OSS), (2) fusion at the decision level of all reliable scales (FDL-ARS), and (3) fusion at the feature level of all reliable scales (FFL-ARS). As expected, a comparison among these strategies showed that fusion at the feature level led to better results than the other two procedures, in terms both of geometrical detail preservation and accuracy in homogeneous areas. This is due to a better intrinsic capability of this technique to exploit the information present in all the reliable scales for the analyzed spatial position, including the amount of information present in the speckle. Experimental results confirmed the effectiveness of the proposed scale-driven approach with the FFL-ARS strategy on the considered data set. This approach outperformed a classical change-detection technique based on the thresholding of the log-ratio image after a proper despeckling based on the application of the enhanced Lee filter and

Unsupervised Change Detection in Multi-Temporal SAR Images

397

also of the Gamma filter. In particular, change detection after despeckling resulted in a higher overall error, more false alarms and missed alarms, and significantly lower geometrical fidelity. To further assess the validity of the proposed approach, the standard technique based on the thresholding of the log-ratio image was applied after a despeckling phase applied according to an advanced DWT-based de-noising procedure (which has not been used previously in change-detection problems). The obtained results suggest that the proposed approach performs slightly better in terms of spatial fidelity and significantly increases the overall accuracy of the change-detection map. This confirms that on the considered data set and for solving change-detection problems, the scaledriven fusion strategy exploits the multi-scale decomposition better than standard denoising methods. It is worth noting that all experimental results were carried out applying an optimal manual trial-and-error threshold selection procedure, to avoid any bias related to the selected automatic procedure in assessing the effectiveness of both the proposed and standard techniques. Nevertheless, this step can be performed adopting automatic thresholding procedures [17,50]. As a final remark, it is important to point out that the proposed scale-driven approach is intrinsically suitable to be used with very high geometrical resolution active (SAR) and passive (PAN) images, as it properly handles and processes the information present at different scales. Furthermore, we think that the use of multi-scale and multi-resolution approaches represents a very promising avenue for investigation to develop advanced automatic change-detection techniques for the analysis of very high spatial resolution images acquired by the previous generation of remote-sensing sensors.

Acknowledgments This work was partially supported by the Italian Ministry of Education, University and Research. The authors are grateful to Dr. Francesco Holecz and Dr. Paolo Pasquali (SARMAP s.a.1, Cascine di Barico, CH-6989 Purasca, Switzerland) for providing images of Canada and advice about ground truth.

References 1. Bruzzone, L. and Serpico, S.B., Detection of changes in remotely-sensed images by the selective use of multi-spectral information, Int. J. Rem. Sens., 18, 3883–3888, 1997. 2. Bruzzone, L. and Ferna´ndez Prieto, D., An adaptive parcel-based technique for unsupervised change detection, Int. J. Rem. Sens., 21, 817–822, 2000. 3. Bruzzone, L. and Ferna´ndez Prieto, D., A minimum-cost thresholding technique for unsupervised change detection, Int. J. Rem. Sens., 21, 3539–3544, 2000. 4. Bruzzone, L. and Cossu, R., An adaptive approach for reducing registration noise effects in unsupervised change detection, IEEE Trans. Geosci. Rem. Sens, 41, 2455–2465, 2003. 5. Bruzzone, L. and Cossu, R., Analysis of multitemporal remote-sensing images for change detection: bayesian thresholding approaches, in Geospatial Pattern Recognition, Eds. Binaghi, E., Brivio, P.A., and Serpico, S.B., Research Signpost=Transworld Research, Kerala, India, 2002, Chap. 15.

398

Signal and Image Processing for Remote Sensing

6. Proc. First Int. Workshop Anal. Multitemporal Rem. Sens. Images, Eds. Bruzzone, L. and Smits, P.C., World Scientific Publishing, Singapore, 2002, ISBN: 981-02-4955-1, Book Code: 4997 hc. 7. Proc. Second Int. Workshop Anal. Multitemporal Rem. Sens. Images, Eds. Smits, P.C. and Bruzzone, L., World Scientific Publishing, Singapore, 2004, ISBN: 981-238-915-6, Book Code: 5582 hc. 8. Special Issue on Analysis of Multitemporal Remote Sensing Images, IEEE Trans. Geosci. Rem. Sens., Guest Eds. Bruzzone, L. and Smits, P.C., Tilton, J.C., 41, 2003. 9. Carlin, L. and Bruzzone, L., A scale-driven classification technique for very high geometrical resolution images, Proc. SPIE Conf. Image Signal Proc. Rem. Sens. XI, 2005. 10. Baraldi, A. and Bruzzone, L., Classification of high-spatial resolution images by using Gabor wavelet decomposition, Proc. SPIE Conf. Image Signal Proc. Rem. Sens. X, 5573, 19–29, 2004. 11. Binaghi, E., Gallo, I., and Pepe, M., A cognitive pyramid for contextual classification of remote sensing images, IEEE Trans. Geosci. Rem. Sens., 41, 2906–2922, 2003. 12. Benz, U.C. et al., Multi-resolution, object-oriented fuzzy analysis of remote sensing data for GISready information, ISPRS J. Photogramm. Rem. Sens., 58, 239–258, 2004. 13. Singh, A., Digital change detection techniques using remotely-sensed data, Int. J. Rem. Sens., 10, 989–1003, 1989. 14. Bruzzone, L. and Serpico, S.B., An iterative technique for the detection of land-cover transitions in multitemporal remote-sensing images, IEEE Trans. Geosci. Rem. Sens., 35, 858–867, 1997. 15. Bruzzone, L., Ferna´ndez Prieto, D., and Serpico, S.B., A neural-statistical approach to multitemporal and multisource remote-sensing image classification, IEEE Trans. Geosci. Rem. Sens., 37, 1350–1359, 1999. 16. Serpico, S.B. and Bruzzone, L., Change detection, in Information Processing for Remote Sensing, Ed. Chen, C.H., World Scientific Publishing, Singapore, Chap. 15. 17. Bazi, Y., Bruzzone, L., and Melgani, F., An unsupervised approach based on the generalized Gaussian model to automatic change detection in multitemporal SAR images, IEEE Trans. Geosci. Rem. Sens., 43, 874–887, 2005. 18. Dai, X. and Khorram, S., The effects of image misregistration on the accuracy of remotely sensed change detection, IEEE Trans. Geosci. Rem. Sens., 36, 1566–1577, 1998. 19. Townshend, J.R.G. and Justice, C.O., Spatial variability of images and the monitoring of changes in the normalized difference vegetation index, Int. J. Rem. Sens., 16, 2187–2195, 1995. 20. Fung, T. and LeDrew, E., Application of principal component analysis to change detection, Photogramm. Eng. Rem. Sens., 53, 1649–1658, 1987. 21. Chavez, P.S. Jr., and MacKinnon, D.J., Automatic detection of vegetation changes in the southwestern united states using remotely sensed images, Photogramm. Eng. Rem. Sens., 60, 571–583, 1994. 22. Fung, T., An assessment of TM imagery for land-cover change detection, IEEE Trans. Geosci. Rem. Sens., 28, 681–684, 1990. 23. Muchoney, D.M. and Haack, B.N., Change detection for monitoring forest defoliation, Photogramm. Eng. Rem. Sens., 60, 1243–1251, 1994. 24. Bruzzone, L. and Ferna´ndez Prieto, D., Automatic analysis of the difference image for unsupervised change detection, IEEE Trans. Geosci. Rem. Sens., 38, 1170–1182, 2000. 25. Oliver, C.J. and Quegan, S., Understanding Synthetic Aperture Radar Images, Artech House, Norwood, MA, 1998. 26. Rignot, E.J.M. and van Zyl, J.J., Change detection techniques for ERS-1 SAR data, IEEE Trans. Geosci. Rem. Sens., 31, 896–906, 1993. 27. Quegan, S. et al., Multitemporal ERS SAR analysis applied to forest mapping, IEEE Trans. Geosci. Rem. Sens., 38, 741–753, 2000. 28. Dierking, W. and Skriver, H., Change detection for thematic mapping by means of airborne multitemporal polarimetric SAR imagery, IEEE Trans. Geosci. Rem. Sens., 40, 618–636, 2002. 29. Grover, K., Quegan, S., and da Costa Freitas, C., Quantitative estimation of tropical forest cover by SAR, IEEE Trans. Geosci. Rem. Sens., 37, 479–490, 1999. 30. Liu, J.G. et al., Land surface change detection in a desert area in Algeria using multi-temporal ERS SAR coherence images, Int. J. Rem. Sens., 2, 2463–2477, 2001. 31. Lombardo, P. and Macrı` Pellizzeri, T., Maximum likelihood signal processing techniques to detect a step pattern of change in multitemporal SAR images, IEEE Trans. Geosci. Rem. Sens., 40, 853–870, 2002.

Unsupervised Change Detection in Multi-Temporal SAR Images

399

32. Lombardo, P. and Oliver, C.J., Maximum likelihood approach to the detection of changes between multitemporal SAR images, IEEE Proc. Radar, Sonar, Navig., 148, 200–210, 2001. 33. Engeset, R.V. and Weydahl, D.J., Analysis of glaciers and geomorphology on svalbard using multitemporal ERS-1 SAR images, IEEE Trans. Geosci. Rem. Sens., 36, 1879–1887, 1998. 34. White, R.G. and Oliver, C.J., Change detection in SAR imagery, Rec. IEEE 1990 Int. Radar Conf., 217–222, 1990. 35. Caves, R.G. and Quegan, S., Segmentation based change detection in ERS-1 SAR images, Proc. IGARSS, 4, 2149–2151, 1994. 36. Inglada, J., Change detection on SAR images by using a parametric estimation of the Kullback– Leibler divergence, Proc. IGARSS, 6, 4104–4106, 2003. 37. Rignot, E. and Chellappa, R., A Bayes classifier for change detection in synthetic aperture radar imagery, Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 3, 25–28, 1992. 38. Brown. L., A survey of image registration techniques, ACM Computing Surveys, 24, 1992. 39. Homer, J. and Longstaff, I.D., Minimizing the tie patch window size for SAR image co-registration, Electron. Lett., 39, 122–124, 2003. 40. Frost, V.S. et al., A model for radar images and its application to adaptive digital filtering of multiplicative noise, IEEE Trans. Pattern Anal. Machine Intell., PAMI-4, 157–165, 1982. 41. Lee, J.S., Digital image enhancement and noise filtering by use of local statistics, IEEE Trans. Pattern Anal. Machine Intell., PAMI-2, 165–168, 1980. 42. Kuan, D.T. et al., Adaptive noise smoothing filter for images with signal-dependent noise, IEEE Trans. Pattern Anal. Machine Intell., PAMI-7, 165–177, 1985. 43. Lopes, A. et al., Structure detection and statistical adaptive filtering in SAR images, Int. J. Rem. Sens. 41, 1735–1758, 1993. 44. Lopes, A. et al., Maximum a posteriori speckle filtering and first order texture models in SAR images, Proc. Geosci. Rem. Sens. Symp., IGARSS90, Collage Park, MD, 2409–2412, 1990. 45. Solbø, S. and Eltoft, T., G-WMAP: a statistical speckle filter operating in the wavelet domain, Int. J. Rem. Sens., 25, 1019–1036, 2004. 46. Cihlar, J., Pultz, T.J., and Gray, A.L., Change detection with synthetic aperture radar, Int. J. Rem. Sens., 13, 401–414, 1992. 47. Dekker, R.J., Speckle filtering in satellite SAR change detection imagery, Int. J. Rem. Sens., 19, 1133–1146, 1998. 48. Bovolo, F. and Bruzzone, L., A detail-preserving scale-driven approach to change detection in multitemporal SAR images IEEE Trans. Geosci. Rem. Sens., 43, 2963–2972, 2005. 49. Bazi, Y., Bruzzone, L. and Melgani, F., Change detection in multitemporal SAR images based on generalized Gaussian distribution and EM algorithm, Proc. SPIE Conf. Image Signal Processing Rem. Sens. X, 364–375, 2004. 50. Bruzzone, L. and Ferna´ndez Prieto, D., An adaptive semiparametric and context-based approach to unsupervised change detection in multitemporal remote-sensing images, IEEE Trans. Image Processing, 11, 452–446, 2002. 51. Fukunaga, K., Introduction to Statistical Pattern Recognition, 2nd ed., Academic Press, London, 1990. 52. Melgani, F., Moser, G., and Serpico, S.B., Unsupervised change-detection methods for remotesensing data, Optical Engineering, 41, 3288–3297, 2002. 53. Kittler, J. and Illingworth, J., Minimum error thresholding, Pattern Recognition, 19, 41–47, 1986. 54. Rosin, P.L. and Ioannidis, E., Evaluation of global image thresholding for change detection, Pattern Recognition Lett., 24, 2345–2356, 2003. 55. Lee, S.U., Chung, S.Y., and Park, R.H., A comparative performance study of several global thresholding techniques for segmentation, Comput. Vision, Graph. Image Processing, 52, 171– 190, 1990. 56. Dempster, A.P., Laird, N.M., and Rubin, D.B., Maximum likelihood from incomplete data via the EM algorithm, J. Roy. Statist. Soc., 39, 1–38, 1977. 57. Moon, T.K., The expectation–maximization algorithm, Sig. Process. Mag., 13, 47–60, 1996. 58. Redner, A.P. and Walker, H.F., Mixture densities, maximum likelihood and the EM algorithm, SIAM Review, 26, 195–239, 1984. 59. Duda, R.O., Hart, P.E., and Stork, D.G., Pattern Classification, John Wiley & Sons, New York, 2001.

400

Signal and Image Processing for Remote Sensing

60. Bazi, Y., Bruzzone, L. and Melgani, F., Change detection in multitemporal SAR images based on the EM–GA algorithm and Markov Random Fields, Proc. IEEE Third Int. Workshop Anal. MultiTemporal Rem. Sens. Images (Multi-Temp 2005), 2005. 61. Bazi, Y., Bruzzone, L., and Melgani, F., Automatic identification of the number and values of decision thresholds in the log-ratio image for change detection in SAR images, IEEE Lett. Geosci. Rem. Sens., 43, 874–887, 2005. 62. Burt, P.J. and Adelson, E.H., The Laplacian pyramid as a compact image code IEEE Trans. Comm., 31, 532–540, 1983. 63. Mallat, S.G., A theory for multiresolution signal decomposition: the wavelet representation, IEEE Trans. Pattern Anal. Machine Intell., PAMI-11, 674–693, 1989. 64. Mallat, S.G., A Wavelet Tour of Signal Processing, Academic Press, San Diego, 1998. 65. Aiazzi, B. et al., A Laplacian pyramid with rational scale factor for multisensor image data fusion, in Proc. Int. Conf. Sampling Theory and Applications, SampTA 97, 55–60, 1997. 66. Aiazzi, B., Alparone, L. and Baronti, S., Multiresolution local-statistics speckle filtering based on a ratio Laplacian pyramid, IEEE Trans. Geosci. Rem. Sens., 36, 1466–1476, 1998. 67. Xie, H., Pierce, L.E. and Ulaby, F.T., SAR speckle reduction using wavelet denoising and Markov random field modeling, IEEE Trans. Geosci. Rem. Sens., 40, 2196–2212, 2002. 68. Zeng, Z. and Cumming, I., Bayesian speckle noise reduction using the discrete wavelet transform, IEEE Int. Geosci. Rem. Sens. Symp. Proc., IGARSS-1998, 7–9, 1998. 69. Guo, H. et al., Wavelet based speckle reduction with application to SAR based ATD=R, IEEE Int. Conf. Image Processing Proc., ICIP-1994, 1, 75–79, 1994. 70. Argenti, F. and Alparone, L., Speckle removal for SAR images in the undecimated wavelet domain, IEEE Trans. Geosci. Rem. Sens., 40, 2363–2374, 2002. 71. Nason, G.P. and Silverman, B.W., The stationary wavelet transform and some statistical applications, in Wavelets and Statistics, Eds. Antoniadis, A. and Oppenheim, G., Springer-Verlag, New York, Lect. Notes Statist., 103, 281–300, 1995. 72. Manian, V. and Va`squez, R., On the use of transform for SAR image classification, IEEE Int. Geosci. Rem. Sens. Symp. Proc., 2, 1068–1070, 1998. 73. Strang, G. and Nguyen, T, Wavelets and Filter Banks, Cambridge Press, Wellesley, USA, 1996. 74. Lopes, A., Touzi, R., and Nerzy, E., Adaptive speckle filters and scene heterogeneity, IEEE Trans. Geosci. Rem. Sens., 28, 992–1000, 1990. 75. Donoho, D.L. and Johnstone, I.M., Ideal spatial adaptation via wavelet shrinkage, Biometrika, 81, 425–455, 1994.

18 Change-Detection Methods for Location of Mines in SAR Imagery

Maria Tates, Nasser Nasrabadi, Heesung Kwon, and Carl White

CONTENTS 18.1 Introduction ..................................................................................................................... 401 18.2 Difference, Euclidean Distance, and Image Ratioing Change-Detection Methods ............................................................................................................................ 403 18.3 Mahalanobis Distance–Based Change Detection....................................................... 403 18.4 Subspace Projection–Based Change Detection .......................................................... 404 18.5 Wiener Prediction–Based Change Detection.............................................................. 405 18.6 Implementation Considerations ................................................................................... 406 18.6.1 Local Implementation Problem Formulation ............................................... 406 18.6.2 Computing Matrix Inversion for CX1 and R1 XX............................................ 407 18.7 Experimental Results...................................................................................................... 408 18.8 Conclusions...................................................................................................................... 411 References ................................................................................................................................... 412

18.1

Introduction

Using multi-temporal SAR images of the same scene, analysts employ several methods to determine changes among the set [1]. Changes may be abrupt in which case only two images are required or it may be gradual in which case several images of a given scene are compared to identify changes. The former case is considered here, where SAR images are analyzed to detect land mines. Much of the change-detection activity has been focused on optical data for specific applications, for example, several change-detection methods are implemented and compared to detect land-cover changes in multi-spectral imagery [2]. However, due to the natural limitations of optical sensors, such as sensitivity to weather and illumination conditions, SAR sensors may constitute a superior sensor for change detection as images for this purpose are obtained at various times of the day under varying conditions. SAR systems are capable of collecting data in all weather conditions and are unaffected by cloud cover or changing sunlight conditions. Exploiting the long-range propagation qualities of radar signals and utilizing the complex processing capabilities of current digital technology, SAR produces high-resolution imagery providing a wealth of information in many civilian and military applications. One major characteristic of SAR data is 401

402

Signal and Image Processing for Remote Sensing

the presence of speckle noise, which poses a challenge in classification. SAR data taken from different platforms also shows high variability. The accuracy of preprocessing tasks, such as image registration, also affects the accuracy of the postprocessing task of change detection. Image registration, the process of aligning images into the same coordinate frame, must be done prior to change detection. Image registration can be an especially nontrivial task in instances where the images are acquired from nonstationary sources in the presence of sensor motion. In this paper we do not address the issue of image registration, but note that there is a plethora of information available on this topic and a survey is provided in Ref. [3]. In Ref. [4], Mayer and Schaum implemented several transforms on multi-temporal hyperspectral target signatures to improve target detection in a matched filter. The transforms implemented in their paper were either model-dependent, based solely on atmospheric corrections, or image-based techniques. The image-based transforms, including one that used Wiener filtering, yielded improvements in target detection using the matched filter and proved robust to various atmospheric conditions. In Ref. [5], Rignot and van Zyl compared various change-detection techniques for SAR imagery. In one method they implemented image ratioing, which proves more appropriate for SAR than simple differencing. However, the speckle decorrelation techniques they implemented outperform the ratio method under certain conditions. The results lead to the conclusion that the ratio method and speckle decorrelation method have complimentary characteristics for detecting gradual changes in the structural and dielectric properties of remotely sensed surfaces. In Ref. [6], Ranney and Soumekh evaluated change detection using the so-called signal subspace processing method, which is similar to the subspace projection technique [7], to detect mines in averaged multi-look SAR imagery. They posed the problem as a trinary hypothesis-testing problem to determine whether no change has occurred, a target has entered a scene, or whether a target has left the scene. In the signal subspace processing method, one image is modeled as the convolution of the other image by a transform that accounts for variances in the image-point response between the reference and test images due to varying conditions present during the different times of data collection. For the detection of abrupt changes in soil characteristics following volcanic eruptions, in Ref. [8] a two-stage change-detection process was utilized using SAR data. In the first stage, the probability distribution functions of the classes present in the images were estimated and then changes in the images via paradoxical and evidential reasoning were detected. In Ref. [9], difference images are automatically analyzed using two different methods based on Bayes theory in an unsupervised change-detection paradigm, which estimates the statistical distributions of the changed and unchanged pixels using an iterative method based on the expectation maximization algorithm. In Ref. [10], the performance of the minimum error solution based on the matrix Wiener filter was compared to the covariance equalization method (CE) to detect changes in hyperspectral imagery. They found that CE, which has relaxed operational requirements, is more robust to imperfect image registration than the matrix Wiener filter method. In this chapter, we have implemented image differencing, ratio, Euclidean distance, Mahalanobis distance, subspace projection-based, and Wiener filter-based change detection and compared their performances to one another. We demonstrate all algorithms on co-registered SAR images obtained from a high resolution, VV-polarized SAR system. In Section 18.2 we discuss difference-based change detection, Euclidean distance, and ratio change-detection methods, which comprise the simpler change-detection methods implemented in this paper. In Section 18.3 through Section 18.5, we discuss more complex methods such as Mahalanobis distance, subspace projection–based change detection, and

Change-Detection Methods for Location of Mines in SAR Imagery

403

Wiener filter–based change detection, respectively. We consider specific implementation issues in Section 18.6 and present results in Section 18.7. Finally, a conclusion is provided in Section 18.8.

18.2

Difference, Euclidean Distance, and Image Ratioing Change-Detection Methods

In simple difference–based change detection, a pixel from one image (the reference image) is subtracted from the pixel in the corresponding location of another image (the test image), which has changed with respect to the reference image. If the difference is greater than a threshold, then a change is said to have occurred. One can also subtract a block of pixels from one image from the corresponding block from a test image. This is referred to as Euclidean difference if the blocks of pixels are arranged into vectors and the L-2 norm of the difference of these vectors is computed. We implemented the Euclidean distance as in Equation 18.1, where we computed the difference between a block of pixels (arranged into a one-dimensional vector) from reference image, fX, and the corresponding block in the test image, fY, to obtain an error, eE: eE ¼ ðy xÞT

(18:1)

where x and y are vectors of pixels taken from fX and fY, respectively. We display this error, eE, as an image. When no change has occurred this error is expected to be low and the error will be high when a change has occurred. While simple differencing considers only two pixels in making a change decision, the Euclidean distance takes into account pixels within the neighborhood of the pixel in question. This regional decision may have a smoothing effect on the change image at the expense of additional computational complexity. Closely related to the Euclidean distance metric for change detection is image ratioing. In many change-detection applications, ratioing proved more robust to illumination effects than simple differencing. It is implemented as follows: eR ¼

y x

(18:2)

where y and x are pixels from the same locations in the test and reference images, respectively.

18.3

Mahalanobis Distance–Based Change Detection

Simple techniques like differencing and image ratioing suffer from sensitivity to noise and illumination intensity variations in the images. Therefore, these methods may prove adequate only for applications to images with low noise and which show little illumination variation. These methods may be inadequate when applied to SAR images due to the presence of highly correlated speckle noise, misregistration errors, and other unimportant

404

Signal and Image Processing for Remote Sensing

changes in the images. There may be differences in the reference and test images caused by unknown fluctuations in the amplitude-phase in the radiation pattern of the physical radar between the two data sets and subtle inconsistencies in the data acquisition circuitry [5]; consequently, more robust methods are sought. One such method uses the Mahalanobis distance to detect changes in SAR imagery. In this change-detection application, we obtain an error (change) image by computing the Mahalanobis distance between x and y to obtain eMD as follows: 1 eMD ¼ (x y)T C X (x y)

(18:3)

The CX1 term in the Mahalanobis distance is the inverse covariance matrix computed from vectors of pixels in fX. By considering the effects of other pixels in making a change decision with the inclusion of second-order statistics, the Mahalanobis distance method is expected to reduce false alarms. The CX1 term should improve the estimate by reducing the effects of background clutter variance, which, for the purpose of detecting mines in SAR imagery, does not constitute a significant change.

18.4

Subspa ce Projection–Based C hange Detection

To apply subspace projection [11] to change detection a subspace must be defined for either the reference or the test image. One may implement Gram–Schmidt orthogonalization as in [6] or one can make use of eigen-decomposition to define a suitable subspace for the data. We computed the covariance of a sample from fX and its eigenvectors and eigenvalues as follows: CX ¼ (X mX )(X mX )T

(18:4)

where X is a matrix of pixels whose columns represent a block of pixels from fX, arranged as described in Section 18.6.1, having mean mX ¼ X1NN. 1NN is a square matrix of size NN whose elements are 1/N, where N is the number of columns of X. We define the subspace of the reference data, which we can express using eigen-decomposition in terms of its eigenvectors, Z, and eigenvalues, V: CX ¼ ZVZT

(18:5)

e , to develop a subspace We truncate the number of eigenvectors in Z, denoted by Z projection operator, PX: eZ eT PX ¼ Z

(18:6)

The projection of the test image onto the subspace of the reference image will provide a measure of how much of the test sample is represented by the reference image. Therefore, by computing the squared difference between the test image and its projection onto the subspace of the reference image we obtain an estimate of the difference between the two images: eSP (y) ¼ [yT (I PX )y]

(18:7)

Change-Detection Methods for Location of Mines in SAR Imagery

405

We evaluated the effects of various levels of truncation and display the best results 1 achieved. In our implementations we include the CX term in the subspace projection error term as follows: eSP (y) ¼ yT (I PX )T C1 X (I PX )y

(18:8)

We expect it will play a similar role as it does in the Mahalanobis prediction and further diminish false alarms by suppressing the background clutter. Expanding the terms in (18.8) we get T T 1 T 1 T T 1 eSP (y) ¼ yT C1 X y y PX CX y y CX PX y þ y PX CX PX y

(18:9)

T T 1 T T 1 It can be shown that yT C1 X PX y ¼ y PX CX y ¼ y PX CX PX y, so we can rewrite (18.9) as 1 T eSP (y) ¼ yT C1 X y yCX PX y

18.5

(18:10)

Wiener Prediction–Based Change Detection

We propose a Wiener filter–based change-detection algorithm to overcome some of the limitations of simple differencing, namely to exploit the highly correlated nature of speckle noise, thereby reducing false alarms. Given two stochastic, jointly wide-sense stationary signals, the Wiener equation can be implemented for prediction of one data set from the other using the auto-and cross-correlations of the data. By minimizing the variance of the error committed in this prediction, the Wiener predictor provides the optimal solution in the linear minimum mean-squared error sense. If the data are well represented by the second-order statistics, the Wiener prediction will be very close to the actual data. In applying Wiener prediction theory to change detection the number of pixels where abrupt change has occurred is very small relative to the total number of pixels in the image; therefore, these changes will not be adequately represented by the correlation matrix. Consequently, there will be a low error in the prediction where the reference and test images are the same and no change exists between the two, and there will be a large error in the prediction of pixels where a change has occurred. The Wiener filter [11] is the linear minimum mean-squared error filter for second-order ^W ¼ Wx. The goal of Wiener filtering is to find W, stationary data. Consider the signal y ^W k, where y represents a desired signal. In the which minimizes the error, eW ¼ k y y case of change detection y is taken from the test image, fY, which contains changes as compared to the reference image, fX. The Wiener filter seeks the value of W, which minimizes the mean-squared error, eW. If the linear minimum mean-squared error estimator of Y satisfies the orthogonality condition, the following expressions hold: E{(Y WX)XT } ¼ 0 E{YXT } E{WXXT } ¼ 0 RYX WRXX ¼ 0

(18:11)

where E{} represents the expectation operator, X is a matrix of pixels whose columns represent a block of pixels from fX, Y is a matrix of pixels from fY whose columns

Signal and Image Processing for Remote Sensing

406

represent a block of pixels at the same locations as those in X, RYX is the cross-correlation 1 matrix of X and Y, and RYX is the autocorrelation matrix of X. For a complete description of the construction of X and Y see Section 18.6.1. Equations 18.11 imply that W satisfies the Wiener–Hopf equation, which has the following solution: 1 W ¼ RYX R XX

(18:12)

1 ^W ¼ RYX R y XX x

(18:13)

Therefore, we have:

1 where RYX ¼ YXT and RXX ¼ (XXT)1. So the error can be computed as follows:

T ^W eW ¼ y y

(18:14)

In a modified implementation of the Wiener filter–based change-detection method, we insert a normalization (or whitening) term, CX1, into the error equation as follows: 1 ^W )T C ^W ) eW0 ¼ (y y X (y y

(18:15)

We expect it will play a similar role as it does in the Mahalanobis prediction and further diminish false alarms by suppressing the background clutter. In our work we implemented Equation 18.15 to detect changes in pixels occurring from one image to another. The error at each pixel is eW0 , which we display as an image.

18.6 18.6.1

Implementation C onsiderations Local Implementation Problem Formulation

In implementing the Mahalanobis distance, subspace projection–based and the Wiener filter–based change-detection methods, we constructed our data matrices and vectors locally to compute the local statistics and generate a change image. We generate X, Y, x, and y using dual concentric sliding windows. X and Y are matrices of pixels contained within an outer moving window (size m m), whose columns are obtained from overlapping smaller blocks (x1, x2, . . . , xM each of size nn) within the outer windows of the reference image and test image, respectively, as demonstrated in Figure 18.1 and Figure 18.2 for X, where X ¼ [x1 x2 xM]. Accordingly, the dimensions of X and Y will be NM, where N ¼ n2 and M ¼ m2. Note, for clarity the windows that generate the columns of X and Y are not shown as overlapping in Figure 18.2. However, in actual implementation the X and Y matrices are constructed from overlapping windows within X and Y. Figure 18.1 shows x and y, which are N1 vectors composed of pixels within the inner window (size nn) of the reference image, fX, and test image, fY, respectively. In this local implementation, we must compute the necessary statistics at each iteration as we move pixel-by-pixel through the image, thereby generating new X and Y matrices and x and y vectors.

Change-Detection Methods for Location of Mines in SAR Imagery

x

y

X

Y

407

FIGURE 18.1 Dual concentric sliding windows of the test and reference images.

18.6.2

1 Computing Matrix Inversion for C1 X and RXX

Because of numerical errors, computation of the inverse of these matrices may become unstable. We compute the pseudo-inverse using eigenvalue decomposition as shown below: 1 T C1 X ¼ UV U ,

(18:16)

where U and V are the matrices of eigenvectors and eigenvalues of CX, respectively. However, since the eigenvalues are obtained using a numerical tool, some values are very small such that its inverse blows up and causes the matrix to be unstable. To avoid this problem, we discard eigenvalues that are below a certain threshold. We have evaluated the performance for various thresholds.

x1 n

n

x2

n

n

m

n xM n n

FIGURE 18.2 Construction of X matrix.

Signal and Image Processing for Remote Sensing

408

18.7

Expe rimental Results

Results from all methods implemented were normalized between zero and one before producing ROC curves. The threshold to generate the ROC curves also ranged from zero to one in increments of 103. We used multi-look data collected at X band from a high resolution, VV-polarized SAR system (Figure 18.3a and Figure 18.3b). Small mines, comprising about a 55 pixel area, appear in the test image (Figure 18.3b). The size of both the reference and test images is 950950 pixels. Ground truth was provided, which indicates the approximate center location of the known mines; we defined the mine area as an area slightly bigger than the average mine, centered at the locations indicated by ground-truth information. To determine the performance, a hit was tabulated if at least a single pixel within the mine area surpassed the threshold. Any pixel outside of a mine area, which surpassed the threshold, was counted as a false alarm. Figure 18.4a shows the pixel differencing and Figure 18.4b shows the Euclidean distance change images. The ratio change image obtained from Equation 18.2 is represented in Figure 18.5. The local implementation results shown in Figure 18.6 through Figure 18.8 are from implementations where the inner window was of size 33; the outer window, from which we constructed X and Y matrices to compute the second-order statistics, was of size 1313. We gathered 169 overlapping 33 blocks from the 1313 area, arranged each into a 91 vector, and placed them as columns in a matrix of size 9169. We re-computed X and Y at each iteration to determine the change at each pixel. The three very bright spots in the original images, Figure 18.3, are used to register the images and were identified as changes in the simple change-detection methods. The local methods that incorporated image statistics were able to eliminate these bright areas from the change image (Figure 18.6 through Figure 18.8), as they are present in both the reference and test images at different intensities. Another area of the image that results in many false alarms, after processing by the simpler methods (results shown in Figure 18.4a,b, and Figure 18.5), is the area in the upper right quadrant of the image. In the test image, Figure 18.3b, this area is comprised of several slanted lines, while in the reference

(a)

(b)

FIGURE 18.3 Original SAR Images: (a) reference image, (b) test image with mines contained in elliptical area.

Change-Detection Methods for Location of Mines in SAR Imagery (a)

409

(b)

FIGURE 18.4 (a) Simple difference change image, (b) Euclidean distance change image.

image, Figure 18.3a, there is a more random, grainy pattern. The Wiener filter–based method with CX1, Mahalanobis distance, and the subspace projection method with CX1 methods all show far fewer false alarms in this area than is shown in the results from the simpler change-detection methods. ROC curves for the simple change-detection methods can be seen in Figure 18.9a and ROC curves for the more complicated methods that have the CX1 term are shown in Figure 18.9b. A plot of the ROC curves for all the methods implemented is displayed in Figure 18.10 for comparison.

FIGURE 18.5 Ratio change image.

410

Signal and Image Processing for Remote Sensing

FIGURE 18.6 Local Mahalanobis distance change image.

Local Wiener with CX1, local Mahalanobis, local subspace projection with CX1, and ratio methods show similar performance. However, the local Mahalanobis distance method begins to perform worse than the others from about .4 to .65 probability of detection (Pd) rate, then surpasses all the other methods above Pd ¼ .65. The simple difference method has the worst performance over all Pd rates.

FIGURE 18.7 1 Local subspace projection (with CX term) change image.

Change-Detection Methods for Location of Mines in SAR Imagery

411

FIGURE 18.8 Local Wiener filter–based (with CX1 term) change image.

18.8

Conclusions

As we have only demonstrated the algorithms on a limited data set of one scene, it is not prudent to come to broad conclusions. However there are some trends indicated by the (a)

(b) ROC, change detection methods with C x−1

1

1

0.9

0.9

0.8

0.8

0.7

0.7

Probability of detection

Probability of detection

ROC, simple change detection methods

0.6 0.5 0.4 Difference Ratio Euclidean

0.3

0.6 0.5

0.3

0.2

0.2

0.1

0.1

0

Local subspace projection, with invo(Cx) Local Mahalanobis Local Wiener, w ith inv(Cx)

0.4

0 0

0.05

0.1

Probability of false alarm

0.15

0

0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 Probability of false alarm

FIGURE 18.9 ROC curves displaying performance of (a) simple change-detection methods and (b) methods incorporating 1 term. CX

Signal and Image Processing for Remote Sensing

412

Roc, all change detection methods 1 0.9

Probability of detection

0.8 0.7 0.6 0.5 0.4 Local subspace projection, with inv(Cx) Local Mahalanobis Local Wiener, with inv(Cx) Euclidean Difference Ratio

0.3 0.2 0.1 0

0

0.01

0.02

0.03

0.04 0.05 0.06 Probability of false alarm

0.07

0.08

0.09

0.1

FIGURE 18.10 ROC curves displaying performance of the various change-detection methods implemented in this paper.

results above. The results above indicate that in general the more complex algorithms exhibited superior performance as compared to the simple methods; the exception seems to be the ratio method. The ratio method, although a simple implementation, has a performance that is competitive with the more complex change-detection methods implemented up to a Pd rate approximately equal to .65. Its performance degrades at higher detection rates, where it suffers from more false alarms than the methods that have the CX1. Results indicate that taking into account the local spatial and statistical properties of the image improves the change-detection performance. Local characteristics are used in computing the inverse covariance matrix, resulting in fewer false alarms as compared to the simple change detection methods. The ROC curves in Figure 18.9 and Figure 18.10 also show that that addition of the CX1 term may serve to mitigate false alarms that arise from speckle, which is a hindrance to achieving low false alarm rates with the Euclidean distance and simple difference methods. As expected the simple difference method exhibits the worst performance, having a high occurrence of false alarms.

References 1. R.J. Radke, S. Andra, O. Al-Kofahi, and B. Roysam, Image change detection algorithms: A systematic survey, IEEE Transactions on Image Processing, 14(3), 294–307, 2005. 2. J.F. Mas, Monitoring land-cover changes: A comparison of change detection techniques, International Journal of Remote Sensing, 20(1), 139–152, 1999. 3. B. Zitova and J. Flusser, Image registration methods: A survey, Image and Vision Computing, 21, 977–1000, 2003.

Change-Detection Methods for Location of Mines in SAR Imagery

413

4. R. Mayer and A. Schaum, Target detection using temporal signature propagation, Proceedings of SPIE Conference Algorithms for Multispectral, Hyperspectral and Ultraspectral Imagery, 4049, 64–74, 2000. 5. E.J.M. Rignot and J. van Zyl, Change detection techniques for ERS-1 SAR data, IEEE Transactions on Geoscience and Remote Sensing, 31(4), 896–906, 1993. 6. K. Ranney and M. Soumekh, Adaptive change detection in coherent and noncoherent SAR imagery, Proceedings of IEEE International Radar Conference, pp. 195–200, 2005. 7. H. Kwon, S.Z. Der, and N.M. Nasrabadi, Projection-based adaptive anomaly detection for hyperspectral imagery, Proceedings of IEEE International Conference on Image Processing, pp. 1001– 1004, 2003. 8. G. Mercier and S. Derrode, SAR image change detection using distance between distributions of classes, Proceedings of IEEE International Geoscience and Remote Sensing Symposium, 6, 3872–3875, 2004. 9. L. Bruzzone and D.F. Prieto, Automatic analysis of the difference image for unsupervised change detection, IEEE Transactions on Geoscience and Remote Sensing, 38(3), 1171–1182, 2000. 10. A. Schaum and A. Stocker, Advanced algorithms for autonomous hyperspectral change detection, Proceedings of IEEE, Applied Imagery Pattern Recognition Workshop, pp. 39–44, 2004. 11. L. Scharf, Statistical Signal Processing: Detection Estimation, and Time Series Analysis, Addison Wesley, Colorado, 1991.

19 Vertex Component Analysis: A Geometric-Based Approach to Unmix Hyperspectral Data* Jose´ M.B. Dias and Jose´ M.P. Nascimento

CONTENTS 19.1 Introduction ..................................................................................................................... 415 19.2 Vertex Component Analysis Algorithm ..................................................................... 417 19.2.1 Subspace Estimation......................................................................................... 418 19.2.1.1 Subspace Estimation Method Description................................... 419 19.2.1.2 Computer Simulations .................................................................... 421 19.2.2 Dimensionality Reduction ............................................................................... 423 19.2.3 VCA Algorithm ................................................................................................. 424 19.3 Evaluation of the VCA Algorithm ............................................................................... 426 19.4 Evaluation with Experimental Data ............................................................................ 432 19.5 Conclusions...................................................................................................................... 434 References ................................................................................................................................... 436

19.1

Introduction

Hyperspectral remote sensing exploits the electromagnetic scattering patterns of the different materials at specific wavelengths [2,3]. Hyperspectral sensors have been developed to sample the scattered portion of the electromagnetic spectrum extending from the visible region through the near-infrared and mid-infrared, in hundreds of narrow contiguous bands [4,5]. The number and variety of potential civilian and military applications of hyperspectral remote sensing is enormous [6,7]. Very often, the resolution cell corresponding to a single pixel in an image contains several substances (endmembers) [4]. In this situation, the scattered energy is a mixing of the endmember spectra. A challenging task underlying many hyperspectral imagery applications is then decomposing a mixed pixel into a collection of reflectance spectra, called endmember signatures, and the corresponding abundance fractions [8–10]. Depending on the mixing scales at each pixel, the observed mixture is either linear or nonlinear [11,12]. Linear mixing model holds approximately when the mixing scale is macroscopic [13] and there is negligible interaction among distinct endmembers [3,14]. If, however, the mixing scale is microscopic (or intimate mixtures) [15,16] and the incident solar radiation is scattered by the scene through multiple bounces involving several endmembers [17], the linear model is no longer accurate. *Work partially based on [1].

415

416

Signal and Image Processing for Remote Sensing

Linear spectral unmixing has been intensively researched in the last years [9,10,12, 18–21]. It considers that a mixed pixel is a linear combination of endmember signatures weighted by the correspondent abundance fractions. Under this model, and assuming that the number of substances and their reflectance spectra are known, hyperspectral unmixing is a linear problem for which many solutions have been proposed (e.g., maximum likelihood estimation [8], spectral signature matching [22], spectral angle mapper [23], subspace projection methods [24,25], and constrained least squares [26]). In most cases, the number of substances and their reflectances are not known and, then, hyperspectral unmixing falls into the class of blind source separation problems [27]. Independent component analysis (ICA) has recently been proposed as a tool to blindly unmix hyperspectral data [28–31]. ICA is based on the assumption of mutually independent sources (abundance fractions), which is not the case of hyperspectral data, since the sum of abundance fractions is constant, implying statistical dependence among them. This dependence compromises ICA applicability to hyperspectral images as shown in Refs. [21,32]. In fact, ICA finds the endmember signatures by multiplying the spectral vectors with an unmixing matrix, which minimizes the mutual information among sources. If sources are independent, ICA provides the correct unmixing, since the minimum of the mutual information is obtained only when sources are independent. This is no longer true for dependent abundance fractions. Nevertheless, some endmembers may be approximately unmixed. These aspects are addressed in Ref. [33]. Under the linear mixing model, the observations from a scene are in a simplex whose vertices correspond to the endmembers. Several approaches [34–36] have exploited this geometric feature of hyperspectral mixtures [35]. Minimum volume transform (MVT) algorithm [36] determines the simplex of minimum volume containing the data. The method presented in Ref. [37] is also of MVT type but, by introducing the notion of bundles, it takes into account the endmember variability usually present in hyperspectral mixtures. The MVT type approaches are complex from the computational point of view. Usually, these algorithms find in the first place the convex hull defined by the observed data and then fit a minimum volume simplex to it. For example, the gift wrapping algorithm [38] computes the convex hull of n data points in a d-dimensional space with a computational complexity of O(nbd=2cþ1), where bxc is the highest integer lower or equal than x and n is the number of samples. The complexity of the method presented in Ref. [37] is even higher, since the temperature of the simulated annealing algorithm used shall follow a log() law [39] to assure convergence (in probability) to the desired solution. Aiming at a lower computational complexity, some algorithms such as the pixel purity index (PPI) [35] and the N-FINDR [40] still find the minimum volume simplex containing the data cloud, but they assume the presence of at least one pure pixel of each endmember in the data. This is a strong requisite that may not hold in some data sets. In any case, these algorithms find the set of most pure pixels in the data. PPI algorithm uses the minimum noise fraction (MNF) [41] as a preprocessing step to reduce dimensionality and to improve the signal-to-noise ratio (SNR). The algorithm then projects every spectral vector onto skewers (large number of random vectors) [35,42,43]. The points corresponding to extremes, for each skewer direction, are stored. A cumulative account records the number of times each pixel (i.e., a given spectral vector) is found to be an extreme. The pixels with the highest scores are the purest ones. N-FINDR algorithm [40] is based on the fact that in p spectral dimensions, the p-volume defined by a simplex formed by the purest pixels is larger than any other volume defined by any other combination of pixels. This algorithm finds the set of pixels defining the largest volume by inflating a simplex inside the data.

Vertex Component Analysis

417

ORASIS [44,45] is a hyperspectral framework developed by the U.S. Naval Research Laboratory consisting of several algorithms organized in six modules: exemplar selector, adaptative learner, demixer, knowledge base or spectral library, and spatial postprocessor. The first step consists in flat-fielding the spectra. Next, the exemplar selection module is used to select spectral vectors that best represent the smaller convex cone containing the data. The other pixels are rejected when the spectral angle distance (SAD) is less than a given threshold. The procedure finds the basis for a subspace of a lower dimension using a modified Gram–Schmidt orthogonalization. The selected vectors are then projected onto this subspace and a simplex is found by an MVT process. ORASIS is oriented to real-time target detection from uncrewed air vehicles using hyperspectral data [46]. In this chapter we develop a new algorithm to unmix linear mixtures of endmember spectra. First, the algorithm determines the number of endmembers and the signal subspace using a newly developed concept [47,48]. Second, the algorithm extracts the most pure pixels present in the data. Unlike other methods, this algorithm is completely automatic and unsupervised. To estimate the number of endmembers and the signal subspace in hyperspectral linear mixtures, the proposed scheme begins by estimating signal and noise correlation matrices. The latter is based on multiple regression theory. The signal subspace is then identified by selecting the set of signal eigenvalues that best represents the data, in the least-square sense [48,49], we note, however, that VCA works with projected and with unprojected data. The extraction of the endmembers exploits two facts: (1) the endmembers are the vertices of a simplex and (2) the affine transformation of a simplex is also a simplex. As PPI and N-FINDR algorithms, VCA also assumes the presence of pure pixels in the data. The algorithm iteratively projects data onto a direction orthogonal to the subspace spanned by the endmembers already determined. The new endmember signature corresponds to the extreme of the projection. The algorithm iterates until all endmembers are exhausted. VCA performs much better than PPI and better than or comparable to N-FINDR; yet it has a computational complexity between one and two orders of magnitude lower than N-FINDR. The chapter is structured as follows. Section 19.2 describes the fundamentals of the proposed method. Section 19.3 and Section 19.4 evaluate the proposed algorithm using simulated and real data, respectively. Section 19.5 presents some concluding remarks.

19.2

Vertex Component Analysis Algorithm

Assuming the linear mixing scenario, each observed spectral vector is given by r¼xþn ¼ M ga þn , |{z}

(19:1)

s

where r is an L-vector (L is the number of bands), M [m1, m2, . . . , mp] is the mixing matrix (mi denotes the ith endmember signature and p is the number of endmembers present in the covered area), s ga (g is a scale factor modeling illumination variability due to surface topography), a ¼ [a1, a2, . . . , ap]T is the vector containing the abundance fraction of each endmember (the notation ()T stands for vector transposed), and n models system additive noise.

Signal and Image Processing for Remote Sensing

418

(b)

(a) Reflectance 1

Carnallite Ammonioalunite Biotite

Cone (g ≠1) 0.8

0.8

C Reflectance (%)

Channel 150 (l`=1780 nm)

1

0.6 B

0.4 A 0.2

C 0.6 0.4 0.2

A B

Simplex (g = 1) 0

0

0.2 0.4 0.6 0.8 Channel 50 (l`= 827 nm)

1

0

0.5

1

1.5 l (μm)

2

2.5

FIGURE 19.1 (a) 2D scatterplot of mixtures of the three endmembers shown in Figure 19.2; circles denote pure materials. (b) Reflectances of Carnallite, Ammonioalunite, and Biotite. (ß 2005 IEEE. With permission.)

Owing to physical constraints [20], abundance fractions are non-negative (a 0) and satisfy the so-called positivity constraint 1T a ¼ 1, where 1 is a p 1 vector of ones. Each pixel can be viewed as a vector in an L-dimensional Euclidean space, where each channel is assigned to one axis of space. Since the set {a 2 R p : 1Ta ¼ 1, a 0} is a simplex, the set Sx ¼ {x 2 R L: x ¼ Ma,1Ta ¼ 1, a 0} is also a simplex. However, even assuming n ¼ 0, the observed vector set belongs to Cp ¼ {r 2 R L : r ¼ Mga, 1T a ¼ 1, a 0, g 0} that is a convex cone, owing to scale factor g. For illustration purposes, a simulated scene was generated according to the expression in Equation 19.1. Figure 19.1a illustrates a simplex and a cone, projected on a 2D subspace, defined by a mixture of three endmembers. These spectral signatures (AM Biotite, BM Carnallite, and CM Ammonioalunite) were selected from the U.S. Geological Survey (USGS) digital spectral library [50]. The simplex boundary is a triangle whose vertices correspond to these endmembers shown in Figure 19.1b. Small and medium dots are simulated mixed spectra belonging to the simplex Sx (g ¼ 1) and to the cone Cp (g i 0), respectively. The projective projection of the convex cone Cp onto a properly chosen hyperplane is a simplex with vertices corresponding to the vertices of the simplex Sx. This is illustrated in Figure 19.2. The simplex Sp ¼ {y 2 R L: y ¼ r=(rT u), r 2 Cp} is the projective projection of the convex cone Cp onto the plane rTu ¼ 1, where the choice of u assures that there is no observed vectors orthogonal to it. After identifying Sp, the VCA algorithm iteratively projects data onto a direction orthogonal to the subspace spanned by the endmembers already determined. The new endmember signature corresponds to the extreme of the projection. Figure 19.2 illustrates the two iterations of VCA algorithm applied to the simplex Sp defined by the mixture of two endmembers. In the first iteration, data is projected onto the first direction f1. The extreme of the projection corresponds to endmember ma. In the next iteration, endmember mb is found by projecting data onto direction f2, which is orthogonal to ma. The algorithm iterates until the number of endmembers is exhausted. 19.2.1

Subspace Estimation

As mentioned before each spectral vector, also called pixel, of a hyperspectral image can be represented as a vector in the space R L. Under the linear mixing scenario, those

Vertex Component Analysis

419

Convex cone Cp (γ ≠ 1) Simplex Sp T

f2mb

ej mb

u

T

ru = 1

ma

f2 f1 T

f1ma

Simplex Sx (γ = 1) ei FIGURE 19.2 Illustration of the VCA algorithm. (ß 2005 IEEE. With permission.)

spectral vectors are a linear combination of a few vectors, the so-called endmember signatures. Therefore, the dimensionality of data (number of endmembers) is usually much lower than the number of bands, that is, those pixels are in a subspace of dimension p. If p L, it is worthy to project the observed spectral vectors onto the subspace signal. This leads to significant savings in computational complexity and to SNR improvements. Principal component analysis (PCA) [51], maximum noise fraction (MNF) [52], and singular value decomposition (SVD) [49] are three well-known projection techniques widely used in remote sensing. PCA, also known as Karhunen–Loe´ve transform, seeks the projection that best represents data in a least-square sense; MNF or noise adjusted principal components (NAPC) [53] seeks the projection that optimizes SNR; and SVD provides the projection that best represents data in the maximum power sense. PCA and MNF are equal in the case of white noise. SVD and PCA are also equal in the case of zeromean data. A key problem in dimensionality reduction of hyperspectral data is the determination of the number of endmembers present in the data set. Recently, Harsanyi, Farrand, and Chang developed a Neyman–Pearson detection theory-based thresholding method (HFC), to determine the number of spectral endmembers in hyperspectral data (referred in Ref. [18] as virtual dimensionality—VD). HFC method uses the eigenvalues to measure signal [54]. A modified version of this method, energy in a detection frame work called noise-whitened HFC (NWHFC), includes a noise-whitening process as a preprocessing step to remove the second-order statistical correlation [18]. In the NWHFC method a noise estimation is required. By specifying the false alarm probability, Pf, a Neyman–Pearson detector is derived to determine whether or not a distinct signature is present in each of spectral bands. The number of times the Neyman–Pearson detector fails the test is exactly the number of endmembers assumed to be present in the data. A new mean-squared error-based approach is next presented to determine the signal subspace in hyperspectral imagery [47,48]. The method firstly estimates the signal and noise correlations matrices, then it selects the subset of eigenvalues that best represents the signal subspace in the least-square sense. 19.2.1.1

Subspace Estimation Method Description

Let R ¼ [r1, r2, . . . ,rN] be an L N matrix of spectral vectors, one per pixel, where N is the number of pixels and L is the number of bands. Assuming a linear mixing scenario, each observed spectral vector is given by Equation 19.1.

Signal and Image Processing for Remote Sensing

420

The correlation matrix of vector r is Kr ¼ Kx þ Kn, where Kx ¼ MKs MT is the signal correlation matrix, Kn is the noise correlation matrix, and Ks is the abundance correlation matrix. An estimate of the signal correlation matrix is given by ^x ¼ K ^r K ^ n, K

(19:2)

^ n is an estimate of noise ^ r ¼ RRT=N is the sample correlation matrix of R, and K where K correlation matrix. Define Ri ¼ ([R]i,:)T, where symbol [R]i,: stands for the ith row of R, that is, Ri is the transpose of the ith line of matrix R, thus containing the data read by the hyperspectral sensor at the ith band for all image pixels. Define also the matrix R@ i ¼ [R1, . . . , Ri1, Riþ1, . . . , RL]. Assuming that the dimension of the signal subspace is much lower than the number of ^ n can be inferred based on multiple regression theory bands, the noise correlation matrix K [55]. This consists in assuming that Ri ¼ R@i Bi þ «i ,

(19:3)

where R@ i is the explanatory data matrix, Bi ¼ [B1, . . . , BL1]T is the regression vector, and «i are modeling errors. For each i 2 {1, . . . , L}, the regression vector is given by Bi ¼ [R@ i]# ^ i and its Ri, where ()# denotes pseudo-inverse matrix. Finally, we compute «^i ¼ Ri R@ i B ^ n. sample correlation matrix K ^ x be Let the singular value decomposition (SVD) of K ^ x ¼ ESET , K

(19:4)

where E ¼ [e1, . . . , ek, ekþ1, . . . , eL] is a matrix with the singular vectors ordered by the descendent magnitude of the respective singular values. The space R L can be split into ? two orthogonal subspaces: h Ek i spanned by Ek ¼ [e1, . . . , ek] and h E? k i spanned by Ek ¼ [ek þ 1, . . . , eL], where k is the order of the signal subspace. Because hyperspectral mixtures have non-negative components, the projection of the mean value of R onto any eigenvector ei, 1 i k, is always nonzero. Therefore, the signal subspace can be identified by finding the subset of eigenvalues that best represents, in the least-square sense, the mean value of data set. The sample mean value of R is

r ¼

N N N 1 X 1 X 1 X ri ¼ M si þ ni ¼ c þ w, N i¼1 N N i¼1 i¼1

(19:5)

where c is in the signal subspace and w N(0, Kn=N) [the notation N(m, C) stands for normal density function with mean m and covariance C]. Let ck be the projection of c onto h Ek i. The estimation of ck can be obtained by projecting r onto the signal subspace h Ek i, i.e., ^ck ¼ Ukr , where Uk ¼ Ek EkT is the projection matrix onto h Ek i.

Vertex Component Analysis

421

The first and the second-order moments of the estimated error c ^ck are E[c ^ck ] ¼ c E[^ck ] ¼ c E[Ukr] ¼ c Uk c ¼ c ck bk ,

(19:6)

E[(c ^ck )(c ^ck )T ] ¼ bk bTk þ Uk Kn UTk =N,

(19:7)

where E[] denotes the expectation operator and the bias bk ¼ U? k c is the projection of c onto the space h E? ck is N(bk, bTk bk þ Uk k i. Therefore, the density of the estimated error c ^ T Kn Uk =N). The mean-squared error between c and ^ck is mse(k) ¼ E[(c ^ck )T (c ^ck )] ¼ tr{E[(c ^ck )(c ^ck )T ]} ¼ bTk bk þ tr(Uk Kn UTk =N),

(19:8)

where tr() denotes the trace operator. As we do not know the bias bk, an app^ k ¼ U? r. Howroximation of Equation 19.8 can be achieved by using the bias estimate b k^ T ^ T ? ?T ^ ^ ever, E[ bk] ¼ bk and E[ bk bk] ¼ bk bk þ tr(Uk Kn Uk =N), that is, an unbiased estimate of ?T ^ k tr(U? ^Tk b bTk bk is b k KnUk =N). The criteria for the signal subspace order determination are then T ? ?T ^k ¼ arg min (b ^ ^Tb k k þ tr(Uk Kn Uk =N) tr(Uk Kn Uk =N )) k

T

? r þ 2tr(Uk Kn =N) tr(Kn =N )) ¼ arg min (rT U? k Uk k

r þ 2tr(Uk Kn =N)), ¼ arg min (rT U? k

(19:9)

k

where we have used U ¼ UT and U2 ¼ U for any projection matrix. Each term of Equation 19.9 has a clear meaning: the first accounts for projection error power and it is a decreasing function of k; the second accounts for noise power and it is an increasing function of k. 19.2.1.2

Computer Simulations

In this section we test the proposed method in simulated scenes. The spectral signatures are selected from the USGS digital spectral library. Abundance fractions are generated according to a Dirichlet distribution given by p(a1 , a2 , . . . , ap ) ¼

G(m1 þ m2 þ þ mp ) G(m1 )G(m2 ) G(mp )

m 1 m 1

a1 1 a2 2

m 1

ap p ,

(19:10)

P where 0 ai 1, ip¼ 1 ai ¼ 1, and G() denotes P the Gamma function. The mean value of the ith endmember fraction ai is E [ai] ¼ mi= pk ¼ 1 mk. The results next presented are organized into two experiments: in the first experiment the method is evaluated with respect to the SNR and to the number of endmembers p. We define SNR as SNR 10 log10

E[xT x] E[nT n]

(19:11)

Signal and Image Processing for Remote Sensing

422

Mean squared error Projection error Noise power

mse(k )

10−4

FIGURE 19.3 Mean-squared error versus k, with SNR ¼ 35 dB, p ¼ 5 (first experiment).

10−7

10−10 0

5

10 k

15

20

In the second experiment, the method is evaluated when a subset of the endmembers are present only in a few pixels of the scene. In the first experiment, the hyperspectral scene has 104 pixels and the number of endmembers vary from 3 to 21. The abundance fractions are Dirichlet distributed with mean value mi ¼ 1=p, for i ¼ 1, . . . , p. r þ 2 tr Figure 19.3 shows the evolution of the mean-squared error, that is, of rT U? k (Uk Kn=N) as a function of the parameter k, for SNR ¼ 35 dB and p ¼ 5. The minimum of the mean-squared error occurs for k ¼ 5, which is exactly the number of endmembers present in the image. Table 19.1 presents the signal subspace order estimate as a function of the SNR and of p. In this table we compare the proposed method and the VD, recently proposed in Ref. [18]. The VD was estimated by the NWHFC-based eigen-thresholding method using the Neyman–Pearson test with the false-alarm probability set to Pf ¼ 104. The proposed method finds the correct ID for SNR larger than 25 dB, and underestimates number of endmembers as the SNR decreases. In comparison with the NWHFC algorithm, the proposed approach yields systematically equal or better results. In the second experiment SNR ¼ 35 dB and p ¼ 8. The first five endmembers have a Dirichlet distribution as in the previous experiment and Figure 19.4 shows the meansquared error versus k, when p ¼ 8. The minimum of mse(k) is achieved for k ¼ 8. This means that the method is able to detect rare endmembers in the image. However, this ability degrades as SNR decreases, as expected.

TABLE 19.1 Signal Subspace Dimension ^k as a Function of SNR and of p (Bold: Proposed method; In Brackets: VD Estimation with NWHFC Method and Pf ¼ 104) Method SNR p p p p p

¼ ¼ ¼ ¼ ¼

3 5 10 15 21

New (VD) 50 dB

New (VD) 35 dB

3 (3) 5 (5) 10 (10) 15 (15) 21 (21)

3 (3) 5 (5) 10 (10) 15 (15) 21 (19)

New (VD) 25 dB 3 5 10 13 14

(3) (5) (8) (12) (12)

New (VD) 15 dB 3 (3) 5 (3) 8 (5) 9 (9) 10 (6)

New (VD) 5 dB 3 4 6 5 5

(2) (2) (2) (2) (1)

Vertex Component Analysis

100

Mean squared error Projection error Noise power

10−2 mse(k)

423

10−4 10−6 10−8 10−10

19.2.2

0

5

10 k

15

20

FIGURE 19.4 Mean-squared error versus k, with SNR ¼ 35 dB, p ¼ 8 (3 spectral vectors occur only on 4 pixels each (second experiment).

Dimensionality Reduction

As discussed before, in the absence of noise, observed vectors r lie in a convex cone Cp contained in a subspace h Ep i of dimension p. The VCA algorithm starts by identifying h Ep i and then projects points in Cp onto a simplex Sp by computing y ¼ r=(rT u) (see Figure 19.2). This simplex is contained in an affine set of dimension p 1. We note that the rational underlying the VCA algorithm is still valid if the observed data set is projected onto any subspace h Ed i h Ep i of dimension d, for p d L; that is, the projection of the cone Cp onto h Ed i followed by a projective projection is also a simplex with the same vertices. Of course, the SNR decreases as d increases. To illustrate the point above-mentioned, a simulated scene was generated according to the expression in Equation 19.1 with three spectral signatures (A, Biotite; B, Carnallite; and C, Ammonioalunite) (see Figure 19.1b). The abundance fractions follow a Dirichlet distribution, parameter g is set to 1, and the noise is zero-mean white Gaussian with covariance matrix s 2I, where I is the identity matrix and s ¼ 0.045, leading to an SNR ¼ 20 dB. Figure 19.5a presents a scatterplot of the simulated spectral mixtures without projection (bands l ¼ 827 nm and l ¼ 1780 nm). Two triangles whose vertices represent the true endmembers (solid line) and the estimated endmembers (dashed line) by the VCA algorithm are also plotted. Figure 19.5b presents a scatterplot (same bands) of projected data onto the estimated affine set of dimension 2 inferred by SVD. Noise is clearly reduced, leading to a visible improvement on the VCA results. As referred before, we apply the rescaling r=(rT u) to get rid of the topographic modulation factor. As the SNR decreases, this rescaling amplifies noise and is preferable to identify directly the affine space of dimension p 1 by using only PCA. This phenomenon is illustrated in Figure 19.6, where data clouds (noiseless and noisy) generated by two signatures are shown. Affine spaces h Ap1 i and h Ap1 i identified by PCA of dimension p 1 and SVD of dimension p, respectively, followed by projective projection are schematized by straight lines. In the absence of noise, the direction of ma is better ^^ a); in the presence of ^ a better than m identified by projective projection onto h Ap1 i (m strong noise, the direction of ma is better identified by orthogonal projection onto h Ap1 i ^ a better than m ^ ^ a). As a conclusion, when the SNR is higher than a given threshold SNRth, (m data is projected onto h Ep i followed by the rescaling r=(rT u); otherwise data is projected onto h Ap1 i. Based on experimental results, we propose the threshold SNRth ¼ 15 þ 10 log10 (p) dB. Since for zero-mean white noise SNR ¼ E[xT x]=(Ls2), we conclude that at

Signal and Image Processing for Remote Sensing

424

(b)

(a) Reflectance

Reflectance 1 Channel 150 (l =1780 nm)

Channel 150 (l =1780 nm)

1 0.8 0.6 0.4 0.2 0

0

0.8 0.6 0.4 0.2 0

0.2 0.4 0.6 0.8 1 Channel 50 (l = 827 nm)

0

0.2 0.4 0.6 0.8 Channel 50 (l = 827 nm)

1

FIGURE 19.5 Scatterplot (bands l ¼ 827 nm and l ¼ 1780 nm) of the three endmembers mixture. (a) Unprojected data. (b) Projected data using SVD. Solid and dashed lines represent simplexes computed from original and estimated endmembers (using VCA), respectively. (ß 2005 IEEE. With permission.)

SNRth, E[xTx]=(ps2) ¼ 101.5 L; that is, the SNRth corresponds to the fixed value L 101.5 of the SNR measured with respect to the signal subspace. 19.2.3

VCA Algorithm

The pseudo-code for the VCA method is shown in Algorithm 1. Symbols [c M]:,j and [c M]:,j:k stand for the jth column of c M and for the ith to k th columns of c M, respectively. Symbol c M stands for the estimated mixing matrix.

ma (σ ≠ 0) ma (σ= 0) ej Sp ma (σ ≠ 0) ma

ma (σ=0)

u

Sx mb FIGURE 19.6 Illustration of the noise effect on the dimensionality reduction. (ß 2005 IEEE. With permission.)

A¢p −1 (SVDp + projective. proj.) ei

Noiseless data cloud (r = x)

Ap −1 (PCAp −1)

Noisy data cloud (r = x + n)

Vertex Component Analysis

425

Algorithm 1: Vertex Component Analysis (VCA) INPUT: p, R [r1, r2, . . . , rN] r þ 2 tr(Uk Kn=N)); Uk obtained by SVD, and Kn estimated 1: p : ¼ argmink (rT U? k by multiple regression 2: SNRth ¼ 15 þ 10 log10 (p) dB 3: if SNR > SNRth then 4: d : ¼ p; 5: X : ¼ UdT R; Ud obtained by SVD 6: u : ¼ mean (X); u is a 1 d vector 7: [Y]:,j : ¼ [X]:,j=([X]:,jT u); projective projection 8: else 9: d : ¼ p 1; 10: [X]:,j : ¼ UdT ([R]:,j r ); {Ud obtained by PCA} 11: k : ¼ arg maxj ¼ 1 . . . N k [X]:,j k; k : ¼ [k j k j . . . jk]; 12: k is a1 N vector X 13: Y : ¼ ; k 14: end if 15: A : ¼ [«u j 0 j j 0]; {«u : ¼ [0, . . . ,0,1]T and A is a p p auxiliary matrix} 16: for i : ¼ 1 to p do 17: w : ¼ randn (0, Ip); {w is a zero-mean random Gaussian vector of covariance Ip}. (IAA# )w 18: f: ¼ k(IAA ; {f is a vector orthonormal to the subspace spanned by [A]:,1:i.} # )wk 19: v : ¼ fT Y;

20: k : ¼ arg maxj ¼ 1, . . . ,N j[v]:,jj; {find the projection extreme} 21: [A]:,i : ¼ [Y]:,k; 22: [indice]i : ¼ k; {stores the pixel index} 23: end for 24: if SNR i SNRth then 25: c M : ¼ Ud[X]:,indice; {c M is a L p estimated mixing matrix} 26: else c: ¼ Ud[X]:,indice þr ; {M c is a L p estimated mixing matrix} 27: M 28: end if Step 3: Test if the SNR is higher than SNRth to decide whether the data is to be projected onto a subspace of dimension p or p 1. In the first case the projection matrix Ud is obtained by SVD from RRT=N. In the second case the projection is obtained by PCA from (Rr )(Rr )T=N; recall that r is the sample mean of [R]:,i, for i ¼ 1, . . . , N. Step 5 and Step 10: Assure that the inner product between any vector [X]:,j and vector u is non-negative—a crucial condition for the VCA algorithm to work correctly. The chosen value of k ¼ argmaxj ¼ 1 N k[X]:,jk assures that the colatitude angle between u and any vector [X]:,j is between 08 and 458, avoiding numerical errors that otherwise would occur for angles near 908. Step 15: Initializes the auxiliary matrix A, which stores the projection of the estimated endmembers signatures. Assume that there exists at least one pure pixel of each

Signal and Image Processing for Remote Sensing

426

endmember in the input sample R (see Figure 19.2). Each time the loop for is executed, a vector f orthonormal to the space spanned by the columns of the auxiliary matrix A is randomly generated and y is projected onto f. Because we assume that pure endmembers occupy the vertices of a simplex, a fT [Y]:,i b, for i ¼ 1, . . . , N, where values a and b correspond to only pure pixels. We store the endmember signature corresponding to max(jaj,jbj). The next time loop for is executed, f is orthogonal to the space spanned by the signatures already determined. Since f is the projection of a zero-mean Gaussian independent random vector onto the orthogonal space spanned by the columns of [A]:,1:i, then the probability of f being null is zero. Notice that the underlying reason for generating a random vector is only to get a non-null projection onto the orthogonal space generated by the columns of A. Figure 19.2 shows the input samples and the chosen pixels, after the projection v ¼ fT Y. Then a second vector f orthonormal to the endmember a is generated and the second endmember is stored. Finally, Step 25 and Step 27 compute the columns of matrix c M, which contain the estimated endmembers signatures in the L-dimensional space.

19.3

Evaluation of the VCA Algorithm

In this section, we compare VCA, PPI, and N-FINDR algorithms. N-FINDR and PPI were coded according to Refs. [40] and [35], respectively. Regarding PPI, the number of skewers must be large [41,42,56–58]. On the basis of Monte Carlo runs, we concluded that the minimum number of skewers beyond which there is no unmixing improvements is about 1000. All experiments are based on simulated scenes from which we know the number of endmembers, their signatures, and their abundance fractions. Estimated endmembers are b 1, m b 2, . . . , m b p]. We also compare estimated abundance fractions the columns of c M [m ^ ¼M c# [r1, r2 , . . . , rN], (c given by S M# stands for pseudo-inverse of c M) with the true abundance fractions. To evaluate the performance of the three algorithms, we compute vectors of angles u [u1, u2, . . . , up]T and f [f1, f2, . . . , fp]T with1 ui

fi

^i > < m i ,m arccos , ^ ik kmi kkm ! ^ ] i ,: > < [S]i,: ,[S , arccos ^ ] i ,: k[S]i,: k[S

(19:12)

(19:13)

b i (ith endmember signature estimate) and where ui is the angle between vectors mi and m ^ ]i,: (vectors of R N formed by the ith lines of fi is the angle between vectors [S]i,: and [S ^ and S [s1, s2, . . . , sN], respectively). matrices S Based on u and f, we estimate the following root mean square error distances h i1=2 1 E kuk22 «u ¼ , (19:14) p «f ¼

i1=2 1 h E kfk22 : p

(19:15)

^ i and mi, for i ¼ 1, . . . , p; the second is The first quantity measures distances between m similar to the first, but for the estimated abundance fractions. Here we name «u and «f as 1

Notation hx,yi stands for the inner product xTy.

Vertex Component Analysis

427

rmsSAE and rmsFAAE, respectively (SAE stands for signature angle error and FAAE stands for fractional abundance angle error). Mean values in Equation 19.14 and Equation 19.15 are approximated by sample means based on one hundred Monte Carlo runs. In all experiments, the spectral signatures are selected from the USGS digital spectral library (Figure 19.1b shows three of these endmember signatures). Abundance fractions are generated according to a Dirichlet distribution given by Equation 19.10. Parameter g is Beta (b1, b2) distributed, that is, p(g) ¼

G(b1 þ b2 ) b1 1 g (g 1)b2 1 , G(b1 )G(b2 )

which is also a Dirichlet distribution*. The Dirichlet density, besides enforcing positivity and full additivity constraints, displays a wide range of shapes depending on the parameters m1, . . . , mi. This flexibility influences its choice in our simulations. The results presented next are organized into five experiments: in the first experiment, the algorithms are evaluated with respect to the SNR and to the absence of pure pixels. As mentioned before, we define SNR 10 log10

E[xT x] : E[nT n]

(19:16)

In the case of zero-mean noise with covariance s 2I and Dirichlet abundance fractions, one obtains SNR ¼ 10 log10

tr[MKs MT ] , Ls2

(19:17)

where mmT þ diag(m) Ks s2g E[aaT ] ¼ s2g Pp , Pp ( i¼1 mi )(1 þ i¼1 mi )

(19:18)

e m ¼ [m1 mp]T, and sg2 is the variance of parameter g. For example, assuming abundanc 2 Pp fractio ns equally distri buted, we have, after som e a lgebra, SNR ’ 10 log s ¼ 1 10 g i p p p P P P 2 mij =p =(Ls2) for mp 1 and SNR ’ 10 log10 sg2 ( mij Þ2 =p2 =(Ls2) for m p 1. j¼1

i¼1 j¼1

In the second experiment, the performance is measured as function of the parameter g, which models fluctuations on the illumination due to surface topography. In the third experiment, to illustrate the algorithm performance, the number of pixels of the scene varies with the size of the covered area—as the number of pixels increases, the likelihood of having pure pixels also increases, improving the performance of the unmixing algorithms. In the fourth experiment, the algorithms are evaluated as a function of the number of endmembers present in the scene. Finally, in the fifth experiment, the number of floating point operations (flops) is measured, to compare the computational complexity of the VCA, N-FINDR, and the PPI algorithms. In the first experiment, the hyperspectral scene has 1000 pixels and the abundance fractions are Dirichlet distributed with mi ¼ 1=3, for i ¼ 1,2,3; parameter g is Beta-distributed with b1 ¼ 20 and b2 ¼ 1 implying E[g] ¼ 0.952 and sg ¼ 0.05. Figure 19.7 shows performance results as a function of the SNR. As expected, the presence of noise degrades the performance of all algorithms. In terms of rmsSAE and rmsFAAE (Figure 19.7a and Figure 19.7b), we can see that when SNR is less than 20 dB *With one component.

Signal and Image Processing for Remote Sensing

428 (a)

(b) rmsSAE

20

50

15

ef (degrees)

eq (degrees)

60

VCA NFINDR PPI

10

rmsAE VCA NFINDR PPI

40 30 20

5 10 0

∞

30

20 SNR (dB)

0 ∞

10

30

20 SNR (dB)

10

FIGURE 19.7 First scenario: (N ¼ 1000, p ¼ 3, L ¼ 224, m1 ¼ m2 ¼ m3 ¼ 1=3, b1 ¼ 20, b2 ¼ 1): (a) rmsSAE as function of SNR; (b) rmsFAAE as function of SNR. (ß 2005 IEEE. With permission.)

the VCA algorithm exhibits the best performance. Note that for noiseless scenes, only the VCA algorithm has zero rmsSAE. PPI algorithm displays the worst result. Figure 19.8 shows performance results as a function of the SNR in the absence of pure pixels. Spectral data without pure pixels was obtained by rejecting pixels with any abundance fraction smaller than 0.2. Figure 19.9 shows the scatterplot obtained. VCA and N-FINDR display similar results, and both are better than PPI. Notice that the performance is almost independent of the SNR and is uniformly worse than that displayed with pure pixels and SNR ¼ 5 dB in the first experiment. We conclude that this family of algorithms is more affected by the lack of pure pixels than by low SNR. For economy of space and also because rmsSAE and rmsFAAE disclose similar pattern of behavior, we only present the rmsSAE in the remaining experiments. In the second experiment, abundance fractions are generated as in the first one, SNR is set to 20 dB, and parameter g is Beta-distributed with b2 ¼ 2, . . . , 28. This corresponds to the variation of E[g] from 0.66 to 0.96 and sg from 0.23 to 0.03. By varying parameter b1, the severity of topographic modulation is also varied. Figure 19.10 illustrates the effect of topographic modulation on the performance of the three algorithms. When b1 grows (sg (a)

eq (degrees)

25

rmsSAE

50

20 15 10

VCA NFINDR PPI

40 30 20 10

5 0 ∞

rmsAE

60

VCA NFINDR PPI ef (degrees)

30

(b)

30

20 SNR (dB)

10

0 ∞

30

20 SNR (dB)

10

FIGURE 19.8 Robustness to the absence of pure pixels (N ¼ 1000, p ¼ 3, L ¼ 224, m1 ¼ m2 ¼ m3 ¼ 1=3, b1 ¼ 20, b2 ¼ 1): (a) rmsSAE as function of SNR; (b) rmsFAAE as function of SNR. (ß 2005 IEEE. With permission.)

Vertex Component Analysis

429

Reflectance Channel 150 (l =1780 nm)

1 0.8 0.6 0.4 0.2 0

0

0.2 0.4 0.6 0.8 Channel 50 (l = 827 nm)

1

FIGURE 19.9 Illustration of the absence of pure pixels (N ¼ 1000, p ¼ 3, L ¼ 224, m1 ¼ m2 ¼ m3 ¼ 1=3, g ¼ 1). Scatterplot (bands l ¼ 827 nm and l ¼ 1780 nm), with abundance fraction smaller than 0.2 rejected. (ß 2005 IEEE. With permission.)

gets smaller) the performance improves. This is expected because the simplex identification is more accurate when the topographic modulation is smaller. The PPI algorithm displays the worst performance for sg h 0.1. VCA and N-FINDR algorithms have identical performances when b1 takes higher values (sg h 0.045); otherwise VCA algorithm has the best performance. VCA is more robust to topographic modulation because it seeks for the extreme projections of the simplex, whereas N-FINDR seeks for the maximum volume, which is more sensitive to fluctuations on g. In the third experiment, the number of pixels is varied, the abundance fractions are generated as in the first one, and SNR ¼ 20 dB. Figure 19.11 shows that VCA and N-FINDR exhibit identical results, whereas the PPI algorithm displays the worst result. Note that the behavior of the three algorithms is quasi-independent of the number of pixels. In the fourth experiment, we vary the number of signatures from p ¼ 3 to p ¼ 21, the scene has 1000 pixels, and SNR ¼ 30 dB. Figure 19.12a shows that VCA and N-FINDR performances are comparable, whereas PPI displays the worst result. The rmsSAE increase slightly as the number of endmembers present in the scene increases. The rmsSAE is also plotted as a function of the SNR with p ¼ 10 (see Figure 19.12b). Comparing with Figure 19.7a we conclude that when the number of endmembers increases, the performance of the algorithms slightly decreases. In the fifth and last experiment, the number of flops is measured to compare the computational complexity of VCA, PPI, and N-FINDR algorithms. Here we use the scenarios of the second and third experiments. Table 19.2 presents approximated expressions

rmsSAE

25

eq (degrees)

VCA NFINDR PPI 10

3

1 0.23

0.1

0.05 sg

0.04

0.03

FIGURE 19.10 Robustness to the topographic modulation (N ¼ 1000, p ¼ 3, L ¼ 224, m1 ¼ m2 ¼ m3 ¼ 1=3, SNR ¼ 20 dB, b2 ¼ 1), rmsSEA as function of the sg2 (variance of g). (ß 2005 IEEE. With permission.)

Signal and Image Processing for Remote Sensing

430

rmsSEA 20 VCA NFINDR PPI

eq (degrees)

15

10

5 FIGURE 19.11 rmsSEA as function of the number of pixels in a scene (p ¼ 6, L ¼ 224, m1 ¼ m2 ¼ m3 ¼ 1=3, SNR ¼ 20 dB, b2 ¼ 20, b2 ¼ 1). (ß 2005 IEEE. With permission.)

0 102

103 Number of pixels

104

for the number of flops used by each algorithm. These expressions neither account for the computational complexities involved in the computations of the sample covariance (R r) (R r )T=N nor in the computations of the eigen decomposition. The reason is that these operations, compared with the VCA, PPI, and N-FINDR algorithms, have a negligible computational cost since: .

The computation of (R r )(R r )T=N has a complexity of 2NL2 flops. However, in practice one does not need to use the complete set of N hyperspectral vectors. If the scene is noiseless, only p 1 linearly independent vectors would be enough to infer the exact subspace h Ep 1 i. In the presence of noise, however, a larger set should be used. For example in a 1000 1000 hyperspectral image, we found out that only 1000 samples randomly sampled are enough to find a very good estimate of h Ep 1 i. Even a sample size of 100 leads to good results in this respect.

.

Concerning the eigen decomposition of (R r )(R r )T=N (or the SVD of RRT=N), we only need to compute p 1 (or p) eigenvectors corresponding to the largest p 1 eigenvalues (or p single values). For these partial eigen decompositions, we (b)

(a) rmsSAE

20

VCA NFINDR PPI

10

20

10

5

0

VCA NFINDR PPI

30 eq (degrees)

eq (degrees)

15

rmsSAE (p =10)

40

5

10 15 Number of sources

20

0

∞

30

20 SNR (dB)

10

FIGURE 19.12 Impact of the number of endmembers (N ¼ 1000, L ¼ 224, m1 ¼ m2 ¼ m3 ¼ 1=3, SNR ¼ 30 dB, b2 ¼ 20, b2 ¼ 1), (a) rmsSEA as function of the number of endmembers; (b) rmsSEA function of the SNR with p ¼ 10. (ß 2005 IEEE. With permission.)

Vertex Component Analysis

431 TABLE 19.2 Computational Complexity of VCA, N-FINDR, and PPI Algorithms Algorithm

Complexity (flops)

VCA N-FINDR PPI

2p2N phþ1N 2psN

Source: ß 2005 IEEE. With permission.

have used the PCA algorithm [51] (or SVD analysis [49]) whose complexity is negligible compared with the remaining operations. The VCA algorithm projects all data (N vectors of size p) onto p orthogonal directions. N-FINDR computes pN times the determinant of a p p matrix, whose complexity is ph, with 2.3 h h h 2.9 [59]. Assuming that N p i 2, VCA complexity is lower than that of N-FINDR. With regard to PPI, given that the number of skewers (s) is much higher than the usual number of endmembers, the PPI complexity is much higher than that of VCA. Hence, we conclude that the VCA algorithm has always the lowest complexity. Figure 19.13 plots the flops for the three algorithms after data projection. In Figure 19.13a the abscissa is the number of endmembers in the scene, whereas in Figure 19.13b the abscissa is the number of pixels. Note that for five endmembers, VCA computational complexity is one order of magnitude lower than that of the N-FINDR algorithm. When the number of endmembers is higher than 15, the VCA computational complexity is, at least, two orders of magnitude lower than PPI and N-FINDR algorithms. In the introduction, besides PPI and N-FINDR algorithms, we have also mentioned ORASIS. Nevertheless, no comparison whatsoever was made with this method. The reason is that there are no ORASIS implementation details published in the literature. We can, however, make a few considerations based on the results recently published in Ref. [58]. This work compares, among others, PPI, N-FINDR, and ORASIS algorithms. Although the relative performance of the three algorithms varies, depending on SNR, number of endmembers, spectral signatures, type of atmospheric correction, and so on, (a) 109

109

VCA NFINDR PPI

108

107

Flops

Flops

108

(b) Computational complexity

106 105 104 0

Computational complexity VCA NFINDR PPI

107 106 105

5

10 15 20 Number of sources

25

104 102

FIGURE 19.13 Computational complexity measured in the number of flops.

103 Number of pixels

104

Signal and Image Processing for Remote Sensing

432

both PPI and N-FINDR generally perform better than ORASIS when SNR is low. Since in all comparisons conducted here the VCA performs better than or equal to PPI and N-FINDR, we expect that the proposed method performs better than or equal to ORASIS when low SNR dominates the data, although further experiments would be required to demonstrate the above remark.

19 .4

E val uati on with Ex peri me nta l Da ta

In this section, we apply the VCA algorithm to real hyperspectral data collected by the AVIRIS [5] sensor over Cuprite, Nevada. Cuprite is a mining area in southern Nevada with mineral and little vegetation [60], located approximately 200 km northwest of Las Vegas. The test site is a relatively undisturbed acid-sulphate hydrothermal system near highway 95. The geology and alteration were previously mapped in detail [61,62]. A geologic summary and a mineral map can be found in Refs. [60,63]. This site has been used extensively for remote sensing experiments over the past years [64,65]. Our study is based on a subimage (250 190 pixels and 224 bands) of a data set acquired on the AVIRIS flight 19 June 1997 (see Figure 19.14a). The AVIRIS instrument covers the spectral region from 0.41 to 2.45 mm in 224 bands with 10 nm bands. Flying at an altitude of 20 km, the AVIRIS flight has an instantaneous field of view (IFOV) of 20 m and views a swath over 10 km wide. To compare results with a signature library, we process the reflectance image after atmospheric correction. The proposed method to estimate the number of endmembers when applied to this data set estimates ^k ¼ 23 (see Figure 19.15b). According to the truth data presented in Ref. [60], there are eight materials in this area. This difference is due to (1) the presence of rare pixels not accounted for in the truth data [60] and (2) spectral variability. The bulk of spectral energy is explained with only a few eigenvectors. This can be observed from Figure 19.15a where the accumulated signal energy is plotted as a function of the eigenvalue index. The energy contained in the first eight eigenvalues is 99.94% of the total signal energy. This is further confirmed in Figure 19.14b where we show, in gray level and for each pixel, the percentage of energy contained in the subspace (a)

(b) 0.16 0.14 0.12 0.1 0.08 0.06 0.04 0.02

FIGURE 19.14 (a) Band 30 (wavelength l ¼ 667.3 nm) of the subimage of AVIRIS Cuprite Nevada data set; (b) percentage of energy in the subspace E9:23.

Vertex Component Analysis

433

(a)

(b) 10−2

100

Mean squared error Projection error Noise power

mse(k )

Energy (%)

98 96 94

10−10

92 90

10−6

2

4 6 8 Number of eigenvalues

10

0

20

40 k

60

80

FIGURE 19.15 (a) Percentage of signal energy as a function of the number of eigenvalues; (b) mean-squared error versus k for cuprite data set.

h E9:23 i ¼ h [e9, . . . , e23] i. Notice that only a few (rare) pixels contain energy in h E9:23 i. Furthermore, these energies are a very small percentage of the corresponding spectral vector energies (less than 0.16%) in this subspace. The VD estimated by the HFC-based eigen-thresholding method [18] (Pf ¼ 103) on the same data set yields ^k ¼ 20. A lower value of Pf would lead to a lower number of endmembers. This result seems to indicate that the proposed method performs better than the HFC with respect to rare materials. To determine the type of projection applied by VCA, we compute SNR ’ 10 log10

PRp (p=L)PR , P R P Rp

(19:19)

where PR E[rT r] and PRp ¼ E[rT Ud UdT r] in the case of SVD and PRp E[rT Ud UdT r] þ rTr in the case of PCA. A visual comparison between VCA results on the Cuprite data set and the ground truth presented in Ref. [63] shows that the first component (see Figure 19.16a) is predominantly Alunite, the second component (see Figure 19.16b) is Sphene, the third component (see Figure 19.16c) is Buddingtonite, and the fourth component (see Figure 19.16d) is Montmorillonite. The fifth, seventh, and the eighth components (see Figure 19.16e, Figure 19.16g, and Figure 19.16h) are Kaolinite and the sixth component (see Figure 19.16f) is predominantly Nontronite. To confirm the classification based on the estimated abundance fractions, a comparison of the estimated VCA endmember signatures with laboratory spectra [53] is presented in Figure 19.17. The signatures provided by VCA are scaled by a factor to minimize the mean square error between them and the respective library spectra. The estimated signatures are close to the laboratory spectra. The larger mismatches occur for buddingtonite and kaolinite (#1) signatures, but only on a small percentage of the total bands. Table 19.3 compares the spectral angles between extracted endmembers and laboratory reflectances for the VCA, N-FINDR, and the PPI algorithms. The first column shows the laboratory substances with smaller spectral angle distance with respect to the signature extracted by VCA algorithm; the second column shows the respective angles. The third and the fourth columns are similar to the second one, except when the closest spectral substance is different from the corresponding VCA one. In these

Signal and Image Processing for Remote Sensing

434 (a)

(b)

(c)

(d)

(e)

(f)

(g)

(h)

FIGURE 19.16 Eight abundance fractions estimated with VCA algorithm: (a) Alunite or Montmorillonite; (b) Sphene; (c) Buddingtonite; (d) Montmorillonite; (e) Kaolinite #1; (f) Nontronite or Kaolinite; (g) Kaolinite #2; (h) Kaolinite #3;

cases, we write the name of the substance. The displayed results follow the pattern of behavior shown in the simulations, where VCA performs better than PPI and better or similar to N-FINDR.

19.5

Conclusions

We have presented a new algorithm to unmix linear mixtures of hyperspectral sources, termed vertex component analysis. The VCA algorithm is unsupervised and is based on the geometry of convex sets. It exploits the fact that endmembers occupy the vertices

Vertex Component Analysis

435 (b)

1

1

0.8

0.8

Reflectance (%)

Reflectance (%)

(a)

0.6 0.4

0.6 0.4 0.2

0.2 0

0.5

1

1.5 l (μm)

2

(c)

1

0.8

0.8

0.6 0.4

0

0.5

1

0.5

0.5

1.5 l (μm)

2

2.5

1.5 l (μm)

2

2.5

1

1.5 l (μm)

2

2.5

1

1.5 l (μm)

2

2.5

0.4 0.2

0.5

1

1.5 l (μm)

2

0

2.5

(e)

(f) 1

1

0.8

0.8

Reflectance (%)

Reflectance (%)

1

0.6

0.2

0.6 0.4

0.6 0.4

0.2

0.2

0

0 0.5

1

1.5 l (μm)

2

2.5

(g)

(h) 1

1

0.8

0.8

Reflectance (%)

Reflectance (%)

0.5

(d) 1 Reflectance (%)

Reflectance (%)

0

2.5

0.6 0.4 0.2 0

0.6 0.4 0.2

0.5

1

1.5 l (μm)

2

2.5

0

FIGURE 19.17 Comparison of the extracted signatures (dotted line) with the USGS spectral library (solid line): (a) Alunite or Montmorillonite; (b) Sphene; (c) Buddingtonite; (d) Montmorillonite; (e) Kaolinite #1; (f) Nontronite or Kaolinite; (g) Kaolinite #2; (h) Kaolinite #3;

Signal and Image Processing for Remote Sensing

436 TABLE 19.3

Spectral Angle Distance between Extracted Endmembers and Laboratory Reflectances for VCA, N-FINDR, and PPI Algorithms Substance Alunite or montmorillonite Sphene Buddingtonite Montmorillonite Kaolinite #1 Nontronite or kaolinite Kaolinite #2 Kaolinite #3

VCA

N-FINDR

PPI

3.9 3.1 4.2 3.1 5.7 3.4 3.5 4.2

3.9 Barite (2.7) 4.1 3.0 5.3 4.8 Montmor. (4.2) 4.3

4.3 Pyrope (3.9) 3.9 2.9 Dumortierite (5.3) 4.7 3.5 5.0

of a simplex. This algorithm also estimates the dimensionality of hyperspectral linear mixtures. To determine the signal subspace in hyperspectral data set the proposed method first estimates the signal and noise correlations matrices, and then it selects the subset of eigenvalues that best represents the signal subspace in the least-square sense. A comparison with HFC and NWHFC methods is conducted yielding comparable or better results than these methods. The VCA algorithm assumes the presence of pure pixels in the data and iteratively projects data onto a direction orthogonal to the subspace spanned by the endmembers already determined. The new endmember signature corresponds to the extreme of the projection. The algorithm iterates until the number of endmembers is exhausted. A comparison of VCA with PPI [35] and N-FINDR [40] algorithms is conducted. Several experiments with simulated data lead to the conclusion that VCA performs better than PPI and better than or similar to N-FINDR. However, VCA has the lowest computational complexity among these three algorithms. Savings in computational complexity ranges between one and two orders of magnitude. This conclusion has great impact when the data set has a large number of pixels. VCA was also applied to real hyperspectral data. The results achieved show that VCA is an effective tool to unmix hyperspectral data.

References 1. Jose´ M.P. Nascimonto and Jose´ M. Bioucas Dias, Vertex component analysis: A fast algorithm to unmix hyperspectral data, IEEE Transactions on Geoscience and Remote Sensing, vol. 43, no. 4, pp. 898–910, 2005. 2. B. Hapke, Theory of Reflectance and Emmittance Spectroscopy, Cambridge, U.K., Cambridge University Press, 1993. 3. R.N. Clark and T.L. Roush, Reflectance spectroscopy: Quantitative analysis techniques for remote sensing applications, Journal of Geophysical Research, vol. 89, no. B7, pp. 6329–6340, 1984. 4. T.M. Lillesand, R.W. Kiefer, and J.W. Chipman, Remote Sensing and Image Interpretation, 5th ed., John Wiley & Sons, Inc., New York, 2004. 5. G. Vane, R. Green, T. Chrien, H. Enmark, E. Hansen, and W. Porter, The airborne visible= infrared imaging spectrometer (AVIRIS), Remote Sensing of the Environment, vol. 44, pp. 127– 143, 1993.

Vertex Component Analysis

437

6. M.O. Smith, J.B. Adams, and D.E. Sabol, Spectral mixture analysis-New strategies for the analysis of multispectral data, Brussels and Luxemburg, Belgium, 1994, pp. 125–143. 7. A.R. Gillespie, M.O. Smith, J.B. Adams, S.C. Willis, A.F. Fisher, and D.E. Sabol, Interpretation of residual images: Spectral mixture analysis of AVIRIS images, Owens Valley, California, in Proceedings of the 2nd AVIRIS Workshop, R.O. Green, Ed., JPL Publications, vol. 90–54, pp. 243–270, 1990. 8. J.J. Settle, On the relationship between spectral unmixing and sub-space projection, IEEE Transactions on Geoscience And Remote Sensing, vol. 34, pp. 1045–1046, 1996. 9. Y.H. Hu, H.B. Lee, and F.L. Scarpace, Optimal linear spectral un-mixing, IEEE Transactions on Geoscience and Remote Sensing, vol. 37, pp. 639–644, 1999. 10. M. Petrou and P.G. Foschi, Confidence in linear spectral unmixing of single pixels, IEEE Transactions on Geoscience and Remote Sensing, vol. 37, pp. 624–626, 1999. 11. S. Liangrocapart and M. Petrou, Mixed pixels classification, in Proceedings of the SPIE Conference on Image and Signal Processing for Remote Sensing IV, vol. 3500, pp. 72–83, 1998. 12. N. Keshava and J. Mustard, Spectral unmixing, IEEE Signal Processing Magazine, vol. 19, no. 1, pp. 44–57, 2002. 13. R.B. Singer and T.B. McCord, Mars: Large scale mixing of bright and dark surface materials and implications for analysis of spectral reflectance, in Proceedings of the 10th Lunar and Planetary Science Conference, pp. 1835–1848, 1979. 14. B. Hapke, Bidirection reflectance spectroscopy. I. theory, Journal of Geophysical Research, vol. 86, pp. 3039–3054, 1981. 15. R. Singer, Near-infrared spectral reflectance of mineral mixtures: Systematic combinations of pyroxenes, olivine, and iron oxides, Journal of Geophysical Research, vol. 86, pp. 7967–7982, 1981. 16. B. Nash and J. Conel, Spectral reflectance systematics for mixtures of powdered hypersthene, labradoride, and ilmenite, Journal of Geophysical Research, vol. 79, pp. 1615–1621, 1974. 17. C.C. Borel and S.A. Gerstl, Nonlinear spectral mixing models for vegetative and soils surface, Remote Sensing of the Environment, vol. 47, no. 2, pp. 403–416, 1994. 18. C.-I. Chang, Hyperspectral Imaging: Techniques for spectral detection and classification, New York, Kluwer Academic, 2003. 19. G. Shaw and H. Burke, Spectral imaging for remote sensing, Lincoln Laboratory Journal, vol. 14, no. 1, pp. 3–28, 2003. 20. D. Manolakis, C. Siracusa, and G. Shaw, Hyperspectral subpixel target detection using linear mixing model, IEEE Transactions on Geoscience and Remote Sensing, vol. 39, no. 7, pp. 1392–1409, 2001. 21. N. Keshava, J. Kerekes, D. Manolakis, and G. Shaw, An algorithm taxonomy for hyperspectral unmixing, in Proceedings of the SPIE AeroSense Conference on Algorithms for Multispectral and Hyperspectral Imagery VI, vol. 4049, pp. 42–63, 2000. 22. A.S. Mazer, M. Martin, et al., Image processing software for imaging spectrometry data analysis, Remote Sensing of the Environment, vol. 24, no. 1, pp. 201–210, 1988. 23. R.H. Yuhas, A.F.H. Goetz, and J.W. Boardman, Discrimination among semi-arid landscape endmembres using the spectral angle mapper (SAM) algorithm, in Summaries of the 3rd Annual JPL Airborne Geoscience Workshop, R.O. Green, Ed., Publ., 92–14, vol. 1, pp. 147–149, 1992. 24. J.C. Harsanyi and C.-I. Chang, Hyperspectral image classification and dimensionality reduction: an orthogonal subspace projection approach, IEEE Transactions on Geoscience and Remote Sensing, vol. 32, no. 4, pp. 779–785, 1994. 25. C. Chang, X. Zhao, M.L.G. Althouse, and J.J. Pan, Least squares subspace projection approach to mixed pixel classification for hyperspectral images, IEEE Transactions on Geoscience and Remote Sensing, vol. 36, no. 3, pp. 898–912, 1998. 26. D.C. Heinz, C.-I. Chang, and M.L.G. Althouse, Fully constrained least squares-based linear unmixing, in Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, pp. 1401–1403, 1999. 27. P. Common, C. Jutten, and J. Herault, Blind separation of sources, part II: Problem statement, Signal Processing, vol. 24, pp. 11–20, 1991. 28. J. Bayliss, J.A. Gualtieri, and R. Cromp, Analysing hyperspectral data with independent component analysis, in Proceedings of SPIE, vol. 3240, pp. 133–143, 1997.

438

Signal and Image Processing for Remote Sensing

29. C. Chen and X. Zhang, Independent component analysis for remote sensing study, in Proceedings of the SPIE Symposium on Remote Sensing Conference on Image and Signal Processing for Remote Sensing V, vol. 3871, pp. 150–158, 1999. 30. T.M. Tu, Unsupervised signature extraction and separation in hyperspectral images: A noiseadjusted fast independent component analysis approach, Optical Engineering=SPIE, vol. 39, no. 4, pp. 897–906, 2000. 31. S.-S. Chiang, C.-I. Chang, and I.W. Ginsberg, Unsupervised hyperspectral image analysis using independent component analysis, in Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 2000. 32. Jos´e M.P. Nascimonto and Jose´ M. Bioucas Dias, Does independent component analysis play a role in unmixing hyperspectral data? in Pattern Recognition and Image Analysis, ser. Lecture Notes in Computer Science, F. j. Perales, A. Campilho, and N.P.B.A. Sanfeliu, Eds., vol. 2652. SpringerVerlag, pp. 616–625, 2003. 33. Jos´e M.P. Nascimonto and Jose´ M. Bioucas Dias, Does independent component analysis play a role in unmixing hyperspectral data? IEEE Transactions on Geoscience and Remote Sensing, vol. 43, no. 1, pp. 175–187, 2005. 34. A. Ifarraguerri and C.-I. Chang, Multispectral and hyperspectral image analysis with convex cones, IEEE Transactions on Geoscience and Remote Sensing, vol. 37, no. 2, pp. 756–770, 1999. 35. J. Boardman, Automating spectral unmixing of AVIRIS data using convex geometry concepts, in Summaries of the Fourth Annual JPL Airborne Geoscience Workshop, JPL Publications, 93–26, AVIRIS Workshop, vol. 1, 1993, pp. 11–14. 36. M.D. Craig, Minimum-volume transforms for remotely sensed data, IEEE Transactions on Geoscience and Remote Sensing, vol. 32, pp. 99–109, 1994. 37. C. Bateson, G. Asner, and C. Wessman, Endmember bundles: A new approach to incorporating endmember variability into spectral mixture analysis, IEEE Transactions on Geoscience and Remote Sensing, vol. 38, pp. 1083–1094, 2000. 38. R. Seidel, Convex Hull Computations, Boca Raton, CRC Press, ch. 19, pp. 361–375, 1997. 39. S. Geman and D. Geman, Stochastic relaxation, Gibbs distribution and the Bayesian restoration of images, IEEE Trans. Pattern Anal. Machine Intell., vol. 6, no. 6, pp. 721–741, 1984. 40. M.E. Winter, N-findr: an algorithm for fast autonomous spectral endmember determination in hyperspectral data, in Proceedings of the SPIE Conference on Imaging Spectrometry V, pp. 266–275, 1999. 41. J. Boardman, F.A. Kruse, and R.O. Green, Mapping target signatures via partial unmixing of AVIRIS data, in Summaries of the V JPL Airborne Earth Science Workshop, vol. 1, pp. 23–26, 1995. 42. J. Theiler, D. Lavenier, N. Harvey, S. Perkins, and J. Szymanski, Using blocks of skewers for faster computation of pixel purity index, in Proceedings of the SPIE International Conference on Optical Science and Technology, vol. 4132, pp. 61–71, 2000. 43. D. Lavenier, J. Theiler, J. Szymanski, M. Gokhale, and J. Frigo, Fpga implementation of the pixel purity index algorithm, in Proceedings of the SPIE Photonics East, Workshop on Reconfigurable Architectures, 2000. 44. J.H. Bowles, P.J. Palmadesso, J.A. Antoniades, M.M. Baumback, and L.J. Rickard, Use of filter vectors in hyperspectral data analysis, in Proceedings of the SPIE Conference on Infrared Spaceborne Remote Sensing III, vol. 2553, pp. 148–157, 1995. 45. J.H. Bowles, J.A. Antoniades, M.M. Baumback, J.M. Grossmann, D. Haas, P.J. Palmadesso, and J. Stracka, Real-time analysis of hyperspectral data sets using nrl’s orasis algorithm, in Proceedings of the SPIE Conference on Imaging Spectrometry III, vol. 3118, pp. 38–45, 1997. 46. J.M. Grossmann, J. Bowles, D. Haas, J.A. Antoniades, M.R. Grunes, P. Palmadesso, D. Gillis, K. Y. Tsang, M. Baumback, M. Daniel, J. Fisher, and I. Triandaf, Hyperspectral analysis and target detection system for the adaptative spectral reconnaissance program (asrp), in Proceedings of the SPIE Conference on Algorithms for Multispectral and Hyperspectral Imagery IV, vol. 3372, pp. 2–13, 1998. 47. J.M.P. Nascimento and J.M.B. Dias, Signal subspace identification in hyperspectral linear mixtures, in Pattern Recognition and Image Analysis, ser. Lecture Notes in Computer Science, J. S. Marques, N.P. de la Blanca, and P. Pina, Eds., vol. 3523, no. 2., Heidelberg, Springer-Verlag, pp. 207–214, 2005. 48. J.M.B. Dias and J.M.P. Nascimento, Estimation of signal subspace on hyperspectral data, in Proceedings of SPIE conference on Image and Signal Processing for Remote Sensing XI, L. Bruzzone, Ed., vol. 5982, pp. 191–198, 2005.

Vertex Component Analysis

439

49. L.L. Scharf, Statistical Signal Processing, Detection Estimation and Time Series Analysis, Reading, MA, Addison-Wesley, 1991. 50. R.N. Clark, G.A. Swayze, A. Gallagher, T.V. King, and W.M. Calvin, The U.S. geological survey digital spectral library: Version 1: 0.2 to 3.0 mm, U.S. Geological Survey, Open File Report 93–592, 1993. 51. I.T. Jolliffe, Principal Component Analysis, New York, Springer-Verlag, 1986. 52. A. Green, M. Berman, P. Switzer, and M.D. Craig, A transformation for ordering multispectral data in terms of image quality with implications for noise removal, IEEE Transactions on Geoscience and Remote Sensing, vol. 26, no. 1, pp. 65–74, 1994. 53. J.B. Lee, S. Woodyatt, and M. Berman, Enhancement of high spectral resolution remote-sensing data by noise-adjusted principal components transform, IEEE Transactions on Geoscience and Remote Sensing, vol. 28, pp. 295–304, 1990. 54. J. Harsanyi, W. Farrand, and C.-I. Chang, Determining the number and identity of spectral endmembers: An integrated approach using neymanpearson eigenthresholding and iterative constrained rms error minimization, in Proceedings of the 9th Thematic Conference on Geologic Remote Sensing, 1993. 55. R. Roger and J. Arnold, Reliably estimating the noise in aviris hyperspectral imagers, International Journal of Remote Sensing, vol. 17, no. 10, pp. 1951–1962, 1996. 56. J.H. Bowles, M. Daniel, J.M. Grossmann, J.A. Antoniades, M.M. Baumback, and P. J. Palmadesso, Comparison of output from orasis and pixel purity calculations, in Proceedings of the SPIE Conference on Imaging Spectrometry IV, vol. 3438, pp. 148–156, 1998. 57. A. Plaza, P. Martinez, R. Perez, and J. Plaza, Spatial=spectral endmember extraction by multidimensional morphological operations, IEEE Transactions on Geoscience and Remote Sensing, vol. 40, no. 9, pp. 2025–2041, 2002. 58. A. Plaza, P. Martinez, R. Perez, and J. Plaza, A quantitative and comparative analysis of endmember extraction algorithms from hyperspectral data, IEEE Transactions on Geoscience and Remote Sensing, vol. 42, no. 3, pp. 650–663, 2004. 59. E. Kaltofen and G. Villard, On the complexity of computing determinants, in Proceedings of the Fifth Asian Symposium on Computer Mathematics, Singapore, ser. Lecture Notes Series on Computing, K. Shirayanagi and K. Yokoyama, Eds., vol. 9, pp. 13–27, 2001. 60. G. Swayze, R. Clark, S. Sutley, and A. Gallagher, Ground-truthing aviris mineral mapping at Cuprite, Nevada, in Summaries of the Third Annual JPL Airborne Geosciences Workshop, vol. 1, pp. 47–49, 1992. 61. R. Ashley and M. Abrams, Alteration mapping using multispectral images—Cuprite mining district, Esmeralda county, U.S. Geological Survey, Open File Report 80-367, 1980. 62. M. Abrams, R. Ashley, L. Rowan, A. Goetz, and A. Kahle, Mapping of hydrothermal alteration in the Cuprite mining district, Nevada, using aircraft scanner images for the spectral region 0.46 to 2.36 mm, Geology, vol. 5, pp. 713–718, 1977. 63. G. Swayze, The hydrothermal and structural history of the Cuprite mining district, southwestern Nevada: An integrated geological and geo-physical approach, Ph.D. Dissertation, University of Colorado, 1997. 64. A. Goetz and V. Strivastava, Mineralogical mapping in the cuprite mining district, in Proceedings of the Airborne Imaging Spectrometer Data Analysis Workshop, JPL Publications 85-41, pp. 22–29, 1985. 65. F. Kruse, J. Boardman, and J. Huntington, Comparison of airborne and satellite hyperspectral data for geologic mapping, in Proceedings of the SPIE Aerospace Conference, vol. 4725, pp. 128–139, 2002.

20 Two ICA Approaches for SAR Image Enhancement

Chi Hau Chen, Xianju Wang, and Salim Chitroub

CONTENTS 20.1 Part 1: Subspace Approach of Speckle Reduction in SAR Images Using ICA..... 441 20.1.1 Introduction ....................................................................................................... 441 20.1.2 Review of Speckle Reduction Techniques in SAR Images ........................ 442 20.1.3 The Subspace Approach to ICA Speckle Reduction ................................... 442 20.1.3.1 Estimating ICA Bases from the Image ......................................... 442 20.1.3.2 Basis Image Classification .............................................................. 442 20.1.3.3 Feature Emphasis by Generalized Adaptive Gain..................... 444 20.1.3.4 Nonlinear Filtering for Each Component .................................... 445 20.2 Part 2: A Bayesian Approach to ICA of SAR Images............................................... 446 20.2.1 Introduction ....................................................................................................... 446 20.2.2 Model and Statistics ......................................................................................... 447 20.2.3 Whitening Phase ............................................................................................... 447 20.2.4 ICA of SAR Images by Ensemble Learning ................................................. 449 20.2.5 Experimental Results........................................................................................ 451 20.2.6 Conclusions........................................................................................................ 452 References ................................................................................................................................... 454

20.1 20.1.1

Part 1: Subspace Approach of Speckle Reduction in SAR Images Using ICA Introduction

The use of synthetic aperture radar (SAR) can provide images with good details under many environmental conditions. However, the main disadvantage of SAR imagery is the poor quality of images, which are degraded by multiplicative speckle noise. SAR image speckle noise appears to be randomly granular and results from phase variations of radar waves from unit reflectors within a resolution cell. Its existence is undesirable because it degrades quality of the image and affects the task of human interpretation and evaluation. Thus, speckle removal is a key preprocessing step for automatic interpretation of SAR images. A subspace method using independent component analysis (ICA) for speckle reduction is presented here.

441

442 20.1.2

Signal and Image Processing for Remote Sensing Review of Speckle Reduction Techniques in SAR Images

Many adaptive filters for speckle reduction have been proposed in the past. Earlier approaches include Frost filter, Lee filter, Kuan filter, etc. The Frost filter was designed as an adaptive Wiener filter based on the assumption that the scene reflectivity is an autoregressive (AR) exponential model [1]. The Lee filter is a linear approximation filter based on the minimum mean-square error (MMSE) criterion [2]. The Kuan filter is the generalized case of the Lee filter. It is an MMSE linear filter based on the multiplicative speckle model and is optimal when both the scene and the detected intensities are Gaussian distributed [3]. Recently, there has been considerable interest in using the ICA as an effective tool for signal blind separation and deconvolution. In the field of image processing, ICA has strong adaptability for representing different kinds of images and is very suitable for tasks like compression and denoising. Since the mid-1990s its applications have been extended to more practical fields, such as signal and image denoising and pattern recognition. Zhang [4] presented a new ICA algorithm by working directly with highorder statistics and demonstrated its better performance on SAR image speckle reduction problem. Malladi [5] developed a speckle filtering technique using Holder regularity analysis of the Sparse coding. Other approaches [6–8] employ multi-scale and wavelet analysis. 20.1.3

The Subspace Approach to ICA Speckle Reduction

In this approach, we assume that the speckle noise in SAR images comes from a different signal source, which accompanies but is independent of the ‘‘true signal source’’ (image details). Thus the speckle removal problem can also be described as ‘‘signal source separation’’ problem. The steps taken are illustrated by the nine-channel SAR images considered in Chapter 2, which are reproduced here as shown in Figure 20.1. 20.1.3.1

Estimating ICA Bases from the Image

One of the important problems in ICA is how to estimate the transform from the given data. It has been shown that the estimation of the ICA data model can be reduced to the search for uncorrelated directions in which the components are as non-Gaussian as possible [9]. In addition, we note that ICA usually gives one component (DC component) representing the local mean image intensity, which is noise-free. Thus we should treat it separately from the other components in image denoising applications. Therefore, in all experiments we first subtract the local mean, and then estimate a suitable basis for the rest of the components. The original image is first linearly normalized so that it has zero mean and unit variance. A set of overlapped image windows of 16 16 pixels are taken from it and the local mean of each patch is subtracted. The choice of window size can be critical in this application. For smaller sizes, the reconstructed separated sources can still be very correlated. To overcome the difficulties related to the high dimensionality of vectors, their dimensionality has been reduced to 64 by PCA. (Experiments prove that for SAR images that have few image details, 64 components can make image reconstruction nearly error-free.) The preprocessed data set is used as the input to FastICA algorithm, using the tanh nonlinearity. Figure 20.2 shows the estimated basis vectors after convergence of the FastICA algorithm. 20.1.3.2 Basis Image Classification As alluded earlier, we believe that ‘‘speckle pattern’’ (speckle noise) in the SAR image comes from another kind of signal source, which is independent of true signal source; hence our problem can be considered as signal source separation. However, for the

Two ICA Approaches for SAR Image Enhancement

443

th-c-hh

th-c-hv

th-c-vv

th-p-hh

th-p-hv

th-p-vv

th-l-hh

th-l-hv

th-l-vv

FIGURE 20.1 The nine-channel polarimetric synthetic aperture radar (POLSAR) images.

image signal separation, we first need to classify the basis images; that is, we denote basis images that span speckle pattern space by S2 and the basis images that span ‘‘true signal’’ space by S1. Then we have S1 þ S2 ¼ V. The whole signal space that is spanned by all the basis images is denoted by V. Here, we sample in the main noise regions, which we denote by P. From the above discussion, S1 and S2 are essentially nonoverlapping or ‘‘orthogonal.’’ Then our classification rule is 8 1 P sij > T ith component 2S2 > sij < T ith component 2S1 :N j2P

Signal and Image Processing for Remote Sensing

444

FIGURE 20.2 ICA basis images of the images in Figure 20.1.

where T is a selected threshold. Figure 20.3 shows the classification result. The processing results of the first five channels are shown in Figure 20.4. We further calculate the ratio of local standard deviation to mean (SD/mean) for each image and use it as a criterion for image quality. Both visual quality and performance criterion demonstrate that our method can remove the speckle noise in SAR images efficiently. 20.1.3.3

Feature Emphasis by Generalized Adaptive Gain

We now apply nonlinear contrast stretching in each component to enhance the image features. Here, adaptive gain [6] through nonlinear processing, denoted as f(), is generalized to incorporate hard thresholding to avoid amplifying noise and remove small noise perturbations.

FIGURE 20.3 (a) The basis images 2S1 (19 components). (b) The basis images 2S2 (45 components).

Two ICA Approaches for SAR Image Enhancement

Channel C-HH

445

Channel C-HV

Channel L-HH

Channel C-VV

Channel L-HV

FIGURE 20.4 Recovered images with our method.

20.1.3.4 Nonlinear Filtering for Each Component Our nonlinear filtering is simple to realize. For the components that belong to S2, we simply set them to zero, but we apply our GAG operator to other components that belong to S1, to enhance the image feature. Then the recovered Sij can be calculated by the following equation: 0 ith component 2S2 ^sij ¼ f (sij ) ith component 2S1 Finally the restored image can be obtained after a mixing transform. A comparison is made with other methods including the Wiener filter, the Lee filter, and Kuan filter. The result of using Lee filter is shown in Figure 20.5. The ratio comparison is shown in Table 20.1. The smaller the ratio, the better the image quality. Our method has the smallest ratios in most cases.

Signal and Image Processing for Remote Sensing

446

FIGURE 20.5 Recovered images using Lee filter.

As a concluding remark the subspace approach as presented allows quite a flexibility to adjust parameters such that significant improvement in speckle reduction with the SAR images can be achieved.

20.2

Part 2: A Bayesian Approach to ICA of SAR Images

20.2.1

Introduction

We present a PCA–ICA neural network for analyzing the SAR images. With this model, the correlation between the images is eliminated and the speckle noise is largely reduced in only the first independent component (IC) image. We have used, as input data for the ICA parts, only the first principal component (PC) image. The IC images obtained are of very high quality and better contrasted than the first PC image. However, when the second and third PC images are also used as input images with the first PC image, the results are less impressive and the first IC images become less contrasted and more affected by the noise. This can be justified by the fact that the ICA parts of the models

TABLE 20.1 Ratio Comparison

Channel 1 Channel 2 Channel 3 Channel 4 Channel 5

Original

Our Method

Wiener Filter

Lee Filter

Kuan Filter

0.1298 0.1009 0.1446 0.1259 0.1263

0.1086 0.0526 0.0938 0.0371 0.1010

0.1273 0.0852 0.1042 0.0531 0.0858

0.1191 0.1133 0.1277 0.0983 0.1933

0.1141 0.0770 0.1016 0.0515 0.0685

Two ICA Approaches for SAR Image Enhancement

447

are essentially based on the principle of the Infomax algorithm for the model proposed in Ref. [10]. The Informax algorithm, however, is efficient only in the case where the input data have low additive noise. The purpose of Part 2 is to propose a Bayesian approach of the ICA method that performs well for analyzing images and that presents some advantages compared to the previous model. The Bayesian approach ICA method is based on the so-called ensemble learning algorithm [11,12]. The purpose is to overcome the disadvantages of the method proposed in Ref. [10]. Before detailing the present method in Section 20.2.4, we present in Section 20.2.2 the SAR image model and the statistics to be used later. Section 20.2.3 is devoted to the whitening phase of the proposed method. This step of processing is based on the so-called simultaneous diagonalization transform for performing the PCA method of SAR images [13]. Experimental results based on real SAR images shown in Figure 20.1 are discussed in Section 20.2.5. To prove the effectiveness of the proposed method, the FastICA-based method [9] is used for comparison. The conclusion for Part 2 is in Section 20.2.6.

20.2.2

Model and Statistics

We adopt the same model used in Ref. [10]. Speckle has the characteristics of a multiplicative noise in the sense that its intensity is proportional to the value of the pixel content and is dependent on the target nature [13]. Let xi be the content of the pixel in the ith image, si the noise-free signal response of the target, and ni the speckle. Then, we have the following multiplicative model: xi ¼ si ni

(20:1)

By supposing that the speckle has unity mean, standard deviation of si, and is statistically independent of the observed signal xi [14], the multiplicative model can be rewritten as xi ¼ si þ si (ni 1)

(20:2)

The term si (ni 1) represents the zero mean signal-dependent noise and characterizes the speckle noise variation. Now, let X be the stationary random vector of input SAR images. The covariance matrix of X, SX, can be written as SX ¼ Ss þ Sn

(20:3)

where Ss and Sn are the covariance matrices of the noise-free signal vector and the signaldependent noise vector, respectively. The two matrices, SX and Sn, are used in constructing the linear transformation matrix of the whitening phase of the proposed method.

20.2.3

Whitening Phase

The whitening phase is ensured by the PCA part of the proposed model (Figure 20.6). The PCA-based part (Figure 20.7) is devoted to the extraction of the PC images. It is based on the simultaneous diagonalization concept of the two matrices SX and Sn, via one orthogonal matrix A. This means that the PC images (vector Y) are uncorrelated and have an additive noise that has a unit variance. This step of processing allows us to make our application coherent with the theoretical development of ICA. In fact, the constraint to

Signal and Image Processing for Remote Sensing

448

POLSAR images

PC images

PCA part of the model using neural networks

IC images

ICA part of the model using ensemble

A

B

FIGURE 20.6 The proposed PCA–ICA model for SAR image analysis.

have whitening uncorrelated inputs is desirable in ICA algorithms because it simplifies the computations considerably [11,12]. These inputs are assumed non-Gaussian, centered, and have unit variance. It is ordinarily assumed that X is zero-mean, which in turn means that Y is also zero-mean, where the condition of unit variance can be achieved by standardizing Y. For the non-Gaussianity of Y, it is clear that the speckle, which has non-Gaussianity properties, is not affected by this step of processing because only the second-order statistics are used to compute the matrix A. The criterion for determining A is: ‘‘Finding A such as the matrix Sn becomes an identity matrix and the matrix SX is transformed, at the same time, to a diagonal matrix.’’ This criterion can be formulated in the constrained optimization framework as Maximize AT SX A subject to AT Sn A ¼ I

(20:4)

where I is the identity matrix. Based on the well-developed aspects of the matrix theories and computations, the existence of A is proved in Ref. [12] and a statistical algorithm for obtaining it is also proposed. Here, we propose a neuronal implementation of this algorithm [15] with some modifications (Figure 20.7). It is composed of two PCA neural networks that have the same topology. The lateral weights cj1 and cj2, forming the vectors C1 and C2, respectively, connect all the first m 1 neurons with the mth one. These connections play a very important role in the model because they work toward the orthogonalization of the synaptic vector of the mth neuron with the vectors of the previous m 1 neurons. The solid lines denote the weights wi1, cj1 and wi2, cj2, respectively, W1

. . .

Y

C1

. .

W2

C2

. .

.

First PCA neural network

FIGURE 20.7 The PCA part of the proposed model for SAR image analysis.

Second PCA neural network

X

Two ICA Approaches for SAR Image Enhancement

449

which are trained at the mth stage, while the dashed lines correspond to the weights of the already trained neurons. Note that the lateral weights asymptotically converge to zero, so they do not appear among the already trained neurons. The first network of Figure 20.7 is devoted to whitening the noise in Equation 20.2, while the second one is for maximizing the variance given that the noise is already whitened. Let X1 be the input vector of the first network. The noise is whitened, through the feed-forward weights {wij1}, where i and j correspond to the input and output neurons, respectively, and the superscript 1 designates the weighted matrix of the first network. After convergence, the vector X is transformed to the new vector X0 via the matrix U ¼ W1L1/2, where W1 is the weighted matrix of the first network, L is the diagonal matrix of eigenvalues of Sn (variances of the output neurons) and L1/2 is the inverse of its square root. Next, X0 be the input vector of the second network. It is connected to M outputs, with M N, corresponding to the intermediate output vector noted X2, through the feedforward weights {wij2}. Once this network is converged, the PC images to be extracted (vector Y) are obtained as Y ¼ AT X ¼ UW2 X

(20:5)

where W2 is the weighted matrix of the second network. The activation of each neuron in the two parts of the network is a linear function of their inputs. The kth iteration of the learning algorithm, for both networks, is given as: w(k þ 1) ¼ w(k) þ b(k)(qm (k)P q2m (k)w(k))

(20:6)

c(k þ 1) ¼ c(k) þ b(k)(qm (k)Q q2m (k)c(k))

(20:7)

Here P and Q are the input and output vectors of the network, respectively. b(k) is a positive sequence of the learning parameter. The global convergence of the PCA-based part of the model is strongly dependent on the parameter b. The optimal choice of this parameter is well studied in Ref. [15].

20.2.4

ICA of SAR Images by Ensemble Learning

Ensemble learning is a computationally efficient approximation for exact Bayesian analysis. With Bayesian learning, all information is taken into account in the posterior probabilities. However, the posterior probability density function (pdf) is a complex high-dimensional function whose exact treatment is often difficult, if not impossible. Thus some suitable approximation method must be used. One solution is to find the maximum A posterior (MAP) parameters. But this method can overfit because it is sensitive to probability density rather than probability mass. The correct way to perform the inference would be to average over all possible parameter values by drawing samples from the posterior density. Rather than performing a Markov chain Monte Carlo (MCMC) approach to sample from the true posterior, we use the ensemble learning approximation [11]. Ensemble learning [11,12] which is a special case of variational learning, is a recently developed method for parametric approximation of posterior pdfs where the search takes into account the probability mass of the models. Therefore, it solves the tradeoff between under- and overfitting. The basic idea in ensemble learning is to minimize the misfit between the posterior pdf and its parametric approximation by choosing a computationally tractable parametric approximation—an ensemble—for the posterior pdf.

Signal and Image Processing for Remote Sensing

450

In fact, all the relevant information needed in choosing an appropriate model is contained in the posterior pdfs of hidden sources and parameters. Let us denote the set of available data, which are the PC output images of the PCA part of Figure 20.7, by X, and the respective source vectors by S. Given the observed data X, the unknown variables of the model are the sources S, the mixing matrix B, the parameters of the noise and source distributions, and the hyperparameters. For notational simplicity, we shall denote the ensemble of these variables and parameters by u. The posterior P (S, ujX) is thus a pdf of all these unknown variables and parameters. We wish to infer the set pdf parameters u given the observed data matrix X. We approximate the exact posterior probability density, P(S, ujX), by a more tractable parametric approximation, Q(S, ujX), for which it is easy to perform inferences by integration rather than by sampling. We optimize the approximate distribution by minimizing the Kullback–Leibler divergence between the approximate and the true posterior distribution. If we choose a separable distribution for Q(S, ujX), the Kullback–Leibler divergence will split into a sum of simpler terms. An ensemble learning model can approximate the full posterior of the sources by a more tractable separable distribution. The Kullback–Leibler divergence CKL, between P(S, ujX) and Q(S, ujX), is defined by the following cost function: Q(S, ujX) ¼ Q(S, ujX) log du dS P(S, ujX) ð

CKL

(20:8)

CKL measures the difference in the probability mass between the densities P(S, ujX) and Q(S, ujX). Its minimum value 0 is achieved when the two densities are the same. For approximating and then minimizing CKL, we need the exact posterior density P(S, ujX) and its parametric approximation Q(S, ujX). According to the Bayes rule, the posterior pdf of the unknown variables S and u is such as: P(S, ujX) ¼

P(XjS, u)P(Sju)P(u) P(X)

(20:9)

The term P(XjS, u) is obtained from the model that relates the observed data and the sources. The terms P(Sju) and P(u) are products of simple Gaussian distributions and they are obtained directly from the definition of the model structure [16]. The term P(X) does not depend on the model parameters and can be neglected. The approximation Q(S, ujX) must be simple for mathematical tractability and computational efficiency. Here, both the posterior density P(S, ujX) and its approximation Q(S, ujX) are products of simple Gaussian terms, which simplify the cost function given by Equation 20.8 considerably: it splits into expectations of many simple terms. In fact, to make the approximation of the posterior pdf computationally tractable, we shall choose the ensemble Q(S, ujX) to be a Gaussian pdf with diagonal covariance. The independent sources are assumed to have mixtures of Gaussian as distributions. The observed data are also assumed to have additive Gaussian noise with diagonal covariance. This hypothesis is verified by performing the whitening step using the simultaneous diagonalization transform as it is given in Section 20.2.3. The model structure and all the parameters of the distributions are estimated from the data. First, we assume that the sources S are independent of the other parameters u, so that Q(S, ujX) decouples into Q(S, ujX) ¼ Q(SjX)Q(ujX)

(20:10)

Two ICA Approaches for SAR Image Enhancement

451

For the parameters u, a Gaussian density with a diagonal covariance matrix is supposed. This implies that the approximation is a product of independent distributions: Y Q(ujX) ¼ Q (u jX) (20:11) i i i The parameters of each Gaussian component density Qi(uijX) are its means i and variances ~i. The Q(ujX) is similar. The cost function CKL is a function of the posterior means i and variances ~i of the sources and the parameters of the network. This is because instead of finding a point estimate, the joint posterior pdf of the sources and parameters is estimated in ensemble learning. The variances give information about the reliability of the estimates. Let us now denote the two parts of the cost function given by Equation 20.8 arising in the denominator and numerator of the logarithm respectively by Cp ¼ Ep(log (P)) and Cq ¼ Eq(log (Q)). The variances ~i are obtained by differentiating Equation 20.8 with respect to ~i [16]: @ CKL @ Cp @ Cq @ Cp 1 ¼ þ ¼ ~ ~ ~ ~ @ ui @ ui @ ui @ ui 2u~i

(20:12)

Equating this to zero yields a fixed-point iteration for updating the variances: u~i ¼

2

@ Cp @ u~i

1 (20:13)

The means i can be estimated from the approximate Newton iteration [16]: ui

1 @ Cp @ 2 Cp @ Cp ui ui u~i 2 @ ui @ ui @ ui

(20:14)

The algorithm solves Equation 20.13 and Equation 20.14 iteratively until convergence is achieved. The practical learning procedure consists of applying the PCA part of the model. The output PC images are used to find sensible initial values for the posterior means of the sources. The PCA part of the model yields clearly better initial values than a random choice. The posterior variances of the sources are initialized to small values. 20.2.5

Experimental Results

The SAR images used are shown in Figure 20.1. To prove the effectiveness of the proposed method, the FastICA-based method [13,14] is used for comparison. The extracted IC images using the proposed method are given in Figure 20.8. The extracted IC images using the FastICA-based method are presented in Figure 20.9. We note that the FastICA-based method gives inadequate results because the IC images obtained are contrasted too much. It is clear that the proposed method gives the IC images that are better than the original SAR images. Also, the results of ICA by ensemble learning exceed largely the results of the FastICA-based method. We observe that the effect of the speckle noise is largely reduced in the images based on ensemble learning especially in the sixth image, which is an image of high quality. It appears that the low quality of some of the images by the FastICA method is caused by being trapped in local minimum while the ensemble learning–based method is much more robust.

Signal and Image Processing for Remote Sensing

452

First IC image

Second IC image

Third IC image

Fourth IC image

Fifth IC image

Sixth IC image

Seventh IC image

Eighth IC image

Ninth IC image

FIGURE 20.8 The results of ICA by ensemble learning.

Table 20.2 shows a comparison of the computation time between the FastICA method and the proposed method. It is evident that the FastICA method has significant advantage in computation time. 20.2.6

Conclusions

We have suggested a Bayesian approach of ICA applied to SAR image analysis. This consists of using the ensemble learning, which is a computationally efficient approximation for exact Bayesian analysis. Before performing the ICA by ensemble learning, a PCA neural network model that performs the simultaneous diagonalization of the noise

Two ICA Approaches for SAR Image Enhancement

453

First IC image

Second IC image

Third IC image

Fourth IC image

Fifth IC image

Sixth IC image

Seventh IC image

Eighth IC image

Ninth IC image

FIGURE 20.9 The results of the FastICA-based method.

TABLE 20.2 Computation Time of FastICA-Based Method and ICA by Ensemble Learning Method FastICA-based method ICA by ensemble learning

Computation Time (sec)

Number of Iterations

23.92 2819.53

270 130

454

Signal and Image Processing for Remote Sensing

covariance matrix and the observed data covariance matrix is applied to SAR images. The PC images are used as input data of ICA by ensemble learning. The obtained results are satisfactory. The comparative study with FastICA-based method shows that ICA by ensemble learning is a robust technique that has an ability to avoid the local minimal and so reaching the global minimal in contrast to the FastICA-based method, which does not have this ability. However, the drawback of ICA by ensemble learning is the prohibitive computation time compared to that of the FastICA-based method. This can be justified by the fact that ICA by ensemble learning requires many parameter estimations during its learning process. Further investigation is needed to reduce the computational requirement.

References 1. Frost, V.S., Stiles, J.A., Shanmugan, K.S., and Holtzman, J.C., A model for radar images and its application to adaptive digital filtering of multiplicative noise, IEEE Trans. Pattern Anal. Mach. Intell., 4, 157–166, 1982. 2. Lee, J.S., Digital image enhancement and noise filtering by use of local statistics, IEEE Trans. Pattern Anal. Mach. Intell., 2(2), 165–168, 1980. 3. Kuan, D.R., Sawchuk, A.A., Strand, T.C., and Chavel, P., Adaptive noise smoothing filter for images with signal-dependent noise, IEEE Trans. Pattern Anal. Mach. Intell., 7, 165–177, 1985. 4. Zhang, X. and Chen, C.H., Independent component analysis by using joint cumulations and its application to remote sensing images, J. VLSI Signal Process. Syst., 37(2/3), 2004. 5. Malladi, R.K., Speckle filtering of SAR images using Holder Regularity Analysis of the Sparse Code, Master Dissertation of ECE Department, University of Massachusetts, Dartmouth, September 2003. 6. Zong, X., Laine, A.F., and Geiser, E.A., Speckle reduction and contrast enhancement of echocardiograms via multiscale nonlinear processing, IEEE Trans. Med. Imag., 17, 532–540, 1998. 7. Fukuda, S. and Hirosawa, H., Suppression of speckle in synthetic aperture radar images using wavelet, Int. J. Rem. Sens., 19(3), 507–519, 1998. 8. Achim, A., Tsakalides, P., and Bezerianos, A., SAR image denoising via Bayesian, IEEE Trans. Geosci. Rem. Sens., 41(8), 1773–1784, 2003. 9. Hyvarinen, A., Karhunen, J., and Oja, E., Independent Component Analysis, Wiley Interscience, New York, 2001. 10. Chitroub, S., PCA–ICA neural network model for POLSAR images analysis, in Proc. IEEE Int. Conf. Acoustics, Speech Signal Process. (ICASSP’04), Montreal, Canada, May 17–21, 2004, pp. 757–760. 11. Lappalainen, H. and Miskin, J., Ensemble learning, in M. Girolami, Ed., Advances in Independent Component Analysis, Springer-Verlag, Berlin, 2000, pp. 75–92. 12. Mackay, D.J.C., Developments in probabilistic modeling with neural networks—ensemble learning, in Proc. 3rd Annu. Symp. Neural Networks, Springer-Verlag, Berlin, 1995, pp. 191–198. 13. Chitroub, S., Houacine, A., and Sansal, B., Statistical characterisation and modelling of SAR images, Signal Processing, 82(1), 69–92, 2002. 14. Chitroub, S., Houacine, A., and Sansal, B., A new PCA-based method for data compression and enhancement of multi-frequency polarimetric SAR imagery, Intell. Data Anal. Int. J., 6(2), 187– 207, 2002. 15. Chitroub, S., Houacine, A., and Sansal, B., Neuronal principal component analysis for an optimal representation of multispectral images, Intell. Data Anal. Int. J., 5(5), 385–403, 2001. 16. Lappalainen, H. and Honkela, A., Bayesian non-linear independent component analysis by multilayer perceptrons, in M. Girolami, Ed., Advances in Independent Component Analysis, Springer-Verlag, Berlin, 2000, pp. 93–121.

21 Long-Range Dependence Models for the Analysis and Discrimination of Sea-Surface Anomalies in Sea SAR Imagery

Massimo Bertacca, Fabrizio Berizzi, and Enzo Dalle Mese

CONTENTS 21.1 Introduction ..................................................................................................................... 455 21.2 Methods of Estimating the PSD of Images................................................................. 458 21.2.1 The Periodogram............................................................................................... 458 21.2.2 Bartlett Method: Average of the Periodograms........................................... 459 21.3 Self-Similar Stochastic Processes .................................................................................. 461 21.3.1 Covariance and Correlation Functions for Self-Similar Processes with Stationary Increments ............................................................................. 463 21.3.2 Power Spectral Density of Self-Similar Processes with Stationary Increments ............................................................................. 465 21.4 Long-Memory Stochastic Processes............................................................................. 465 21.5 Long-Memory Stochastic Fractal Models ................................................................... 466 21.5.1 FARIMA Models ............................................................................................... 467 21.5.2 FEXP Models ..................................................................................................... 468 21.5.3 Spectral Densities of FARIMA and FEXP Processes................................... 470 21.6 LRD Modeling of Mean Radial Spectral Densities of Sea SAR Images ................ 471 21.6.1 Estimation of the Fractional Differencing Parameter d .............................. 473 21.6.2 ARMA Parameter Estimation ......................................................................... 475 21.6.3 FEXP Parameter Estimation ............................................................................ 476 21.7 Analysis of Sea SAR Images ......................................................................................... 476 21.7.1 Two-Dimensional Long-Memory Models for Sea SAR Image Spectra .................................................................................... 480 21.8 Conclusions...................................................................................................................... 483 References ................................................................................................................................... 487

21.1

Introduction

In this chapter, by employing long-memory spectral analysis techniques, the discrimination between oil spill and low-wind areas in sea synthetic aperture radar (SAR) images and the simulation of spectral densities of sea SAR images are described. Oil on the sea surface dampens capillary waves, reduces Bragg’s electromagnetic backscattering effect and, therefore, generates darker zones in the SAR image. A low surface wind speed, 455

456

Signal and Image Processing for Remote Sensing

which reduces the amplitudes of all the wave components (not just capillary waves), and the presence of phytoplankton, algae, or natural films can also cause analogous effects. Some current recognition and classification techniques span from different algorithms for fractal analysis [1] (i.e., spectral algorithms, wavelet, and box-counting algorithms for the estimation of the fractal dimension D) to algorithms for the calculation of the normalized intensity moments (NIM) of the sea SAR image [2]. The problems faced when estimating the value of D include small variations due to oil slick and weak-wind areas and the effect of the edges between two anomaly regions with different physical characteristics. There are also computational problems that arise when the calculation of NIM is related to real (i.e., not simulated) sea SAR images. In recent years, the analysis of natural clutter in high-resolution SAR images has improved by the utilization of self-similar random process models. Many natural surfaces, like terrain, grass, trees, and also sea surfaces, correspond to SAR precision images (PRI) that exhibit long-term dependence behavior and scale-limited fractal properties. Specifically, the long-term dependence or long-range dependence (LRD) property describes the high-order correlation structure of a process. Suppose that Y(m,n) is a discrete twodimensional (2D) process whose realizations are digital images. If Y (m,n) exhibits long memory, persistent spatial (linear) dependence exists even between distant observations. On the contrary, the short memory or short-range dependence (SRD) property describes the low-order correlation structure of a process. If Y (m,n) is a short-memory process, observations separated by a long spatial span are nearly independent. Among the possible self-similar models, two classes have been used in the literature to describe the spatial correlation properties of the scattering from natural surfaces: fractional Brownian motion (fBm) models and fractionally integrated autoregressive moving average (FARIMA) models. In particular, fBm provides a mathematical framework for the description of scale-invariant random textures and amorphous clutter of natural settings. Datcu [3] used an fBm model for synthesizing SAR imagery. Stewart et al. [4] proposed an analysis technique for natural background clutter in high-resolution SAR imagery. They employed fBm models to discriminate among three clutter types: grass, trees, and radar shadows. If the fBm model provides a good fit with the periodogram of the data, it means that the power spectral density (PSD), as a function of the frequency, is approximately a straight line with negative slope in a log–log plot. For particular data sets, the estimated PSD cannot be correctly represented by an fBm model. There are different slopes that characterize the plot of the logarithm of the periodogram versus the logarithm of the frequency. They reveal a greater complexity of the analyzed phenomenon. Therefore, we can utilize FARIMA models that preserve the negative slope of the long-memory data PSD near the origin and, through the so-called SRD functions, modify the shape and the slope of the PSD with increasing frequency. The SRD part of a FARIMA model is an autoregressive moving average (ARMA) process. Ilow and Leung [5] used the FARIMA model as a texture model for sea SAR images to capture the long-range and short-range spatial dependence structures of some sea SAR images collected by the RADARSAT sensor. Their work was limited to the analysis of isotropic and homogeneous random fields, and only to AR or MA models (ARMA models were not considered). They observed that, for a statistically isotropic and homogeneous field, it is a common practice to derive a 2D model from a one-dimensional (1D) model by replacing the argument K in the PSD of a 1D process, S(K), with qﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ kKk ¼ Kx2 þ Ky2 to get the radial PSD: S(kKk). When such properties hold, the PSD of the correspondent image can be completely described by using the radial PSD.

Sea-Surface Anomalies in Sea SAR Imagery

457

Unfortunately, sea SAR images cannot be considered simply in terms of a homogeneous, isotropic, or amorphous clutter. The action of the wind contributes to the anisotropy of the sea surfaces and the particular self-similar behavior of sea surfaces and spectra, correctly described by means of the Weierstrass-like fractal model [6], strongly complicates the self-similar representation of sea SAR imagery. Bertacca et al. [7,8] extended the work of Ilow and Leung to the analysis of nonisotropic sea surfaces. The authors made use of ARMA processes to model the SRD part of the mean radial PSD (MRPSD) of sea European remote sensing 1 and 2 (ERS-1 and ERS-2) SAR PRI. They utilized a FARIMA analysis technique of the spectral densities to discriminate low-wind from oil slick areas on the sea surface. A limitation to the applicability of FARIMA models is the high number of parameters required for the ARMA part of the PSD. Using an excessive number of parameters is undesirable because it increases the uncertainty of the statistical inference and the parameters become difficult to interpret. Using fractionally exponential (FEXP) models allows the representation of the logarithm of the SRD part of the long-memory PSD to be obtained, and greatly reduces the number of parameters to be estimated. FEXP models provide the same goodness of fit as FARIMA models at lower computational costs. We have experimentally determined that three parameters are sufficient to characterize the SRD part of the PSD of sea SAR images corresponding to absence of wind, low surface wind speeds, or to oil slicks (or spills) on the sea surface [9]. The first step in all the methods presented in this chapter is the calculation of the directional spectrum of a sea SAR image by using the 2D periodogram of an N N image. To decrease the variance of the spectral estimation, we average spectral estimates obtained from nonoverlapping squared blocks of data. The characterization of isotropic or anisotropic 2D random fields is done first using a rectangular to polar coordinates transformation of the 2D PSD, and then considering, as radial PSD, the average of the radial spectral densities for q ranging from 0 to 2p radians. This estimated MRPSD is finally modeled using a FARIMA or an FEXP model independently of the anisotropy of sea SAR images. As the MRPSD is a 1D signal, we define these techniques as 1D PSD modeling techniques. It is observed that sea SAR images, in the presence of a high or moderate wind, do not have statistical isotropic properties [7,8]. In these cases, MRPSD modeling permits discrimination between different sea surface anomalies, but it is not sufficient to completely represent anisotropic and nonhomogeneous fields in the spectral domain. For instance, to characterize the sea wave directional spectrum of a sea surface, we can use its MRPSD together with an apposite spreading function. Spreading functions describe the anisotropy of sea surfaces and depend on the directions of the waves. The assumption of spatial isotropy and nondirectionality for sea SAR images is valid when the sea is calm, as the sea wave energy is spread in all directions and the SAR image PSD shows a circular symmetry. However, with surface wind speeds over 7 m/sec, and, in particular, when the wind and the radar directions are orthogonal [10], the anisotropy of the PSD of sea SAR images starts to be perceptible. Using a 2D model allows the information on the shape of the SAR image PSD to be preserved and provides a better representation of sea SAR images. In this chapter, LRD models are used in addition to the fractal sea surface spectral model [6] to obtain a suitable representation of the spectral densities of sea SAR images. We define this technique as the 2D PSD modeling technique. These 2D spectral models (FARIMA-fractal or FEXP-fractal models) can be used to simulate sea SAR image spectra in different sea states and wind conditions—and with oil slicks—at a very low computational cost. All the presented methods demonstrated reliable results when applied to ERS2 SAR PRI and to ERS-2 SAR Ellipsoid Geocoded Images.

Signal and Image Processing for Remote Sensing

458

21.2

Methods of Estimating the PSD of Images

The problem of spectral estimation can be faced in two ways: applying classical methods, which consist of estimating the spectrum directly from the observed data, or by a parametrical approach, which consists of hypothesizing a model, estimating its parameters from the data, and verifying the validity of the adopted model a posteriori. The classical methods of spectrum estimation are based on the calculation of the Fourier transform of the observed data or of their autocorrelation function [11]. These techniques of estimation ensure good performances in case the available samples are numerous and require the sole hypothesis of stationarity of the observed data. The methods that depend on the choice of a model ensure a better estimation than the ones obtainable with the classical methods in case the available data are less (provided the adopted model is correct). The classical methods are preferable for the study of SAR images. In these applications, an elevated number of pixels are available and one cannot use models that describe the process of generation of the samples and that turn out to be simple and accurate at the same time.

21.2.1

The Periodogram

This method of estimation, in the 1D case, requires the calculation of the Fourier transform of the sequence of the observed data. When working with bidimensional stochastic processes, whose sample functions are images [12], in place of a sequence x[n], we consider a data matrix x[m, n], m ¼ 0, 1, . . . , (M 1), n ¼ 0, 1, . . . , (N 1). In these cases, one uses the bidimensional version of the periodogram as defined by the equation 2 M 1 N 1 X X 1 j2p( f m þ f n) 1 2 ^ PER ( f1 , f2 ) ¼ P x[m, n]e MN m¼0 n¼0

(21:1)

Observing that X( f1 , f2 ) ¼

M 1 N 1 X X

x[m, n]ej2p( f1 m þ f2 n)

(21:2)

m¼0 n¼0

is the Fourier bidimensional discrete transform of the data sequence, Equation (21.1), can thus be rewritten as ^ PER ( f1 , f2 ) ¼ 1 jX( f1 , f2 )j2 P MN

(21:3)

It can be demonstrated that the estimator in the above equation is not only asymptotically unbiased (the average value of the estimator tends, at the limit for N ! 1 and M ! 1, to the PSD of the data) but also inconsistent (the variance of the estimator does not tend to zero in the limit for N ! 1 and M ! 1). The technique of estimation through the periodogram remains, however, of great practical interest: from Equation 21.3, we perceive the computational simplicity achievable through an implementation of the calculation of the fast Fourier transform (FFT).

Sea-Surface Anomalies in Sea SAR Imagery 21.2.2

459

Bartlett Method: Average of the Periodograms

A simple strategy adopted to reduce the variance of the estimator (Equation 21.3) consists of calculating the average of several independent estimations. As the variance of the estimator does not decrease with the increasing of the dimensions of the matrix of the data, one can subdivide this matrix in disconnected subsets, calculate the periodogram of each subset and execute the average of all the periodograms. Figure 21.1 shows an image of N M pixels (a matrix of N M elements) subdivided into K2 subwindows that are not superimposed by each of the R S elements xlx ly [m, n] ¼ x[m þ lx K, n þ ly K]

m ¼ 0, 1, . . . , (R 1) n ¼ 0, 1, . . . , (S 1)

(21:4)

with M¼RK N ¼ S K,

lx ¼ 0, 1, . . . , (K 1) ly ¼ 0, 1, . . . , (K 1)

(21:5)

The estimation according to Bartlett’s procedure gives K1 X K1 X ^ BART ( f1 , f2 ) ¼ 1 ^ (lx ly ) ( f1 , f2 ) P P K2 l ¼0 l ¼0 PER x

(21:6)

y

M

S

R 1

Ix 2

3

K

2

Iy

N

3

K FIGURE 21.1 Calculation of the periodogram in an image (a bidimensional sequence) with Bartlett’s method.

Signal and Image Processing for Remote Sensing

460

^ PER(lx ly )( f1, f2) represents the periodogram calculated on the In the above equation, P subwindows identified in the couple (lx, ly) and defined by the equation ^ (lx ly ) ( f1 , f2 ) P PER

2 R1 X S1 1 X j2p( f1 m þ f2 n) ¼ xlx ly [m, n]e RS m¼0 n¼0

(21:7)

A reduction in the dimensions of the windows containing the data that are analyzed corresponds to a loss of resolution for the estimated spectrum. Consider the average value of the estimator modified according to Bartlett: K 1 X K 1 n o n o n o X ^ BART ( f1 , f2 ) ¼ 1 ^ (lx ly ) ( f1 , f2 ) ¼ E P ^ (lx ly ) ( f1 , f2 ) E P E P PER PER K 2 l ¼0 l ¼0 x

(21:8)

y

From the above equation, we obtain the equation n o ^ BART ( f1 , f2 ) ¼ WB (f1 , f2 ) Sx ( f1 , f2 ) E P

(21:9)

From the above equation, we have that the average value of the estimator is the result of the double periodical convolution between the function WB( f1, f2) and the spectral density of power Sx( f1, f2) relative to the data matrix. Equation 21.6 thus defines a biased estimator. By a direct extension of the 1D theory of the spectral estimate, it is possible to interpret the function WB( f1, f2) as a 2D Fourier transform of the window (

1jkj R

w(k, l) ¼ 0

1 jlSj

jkj R jlj S otherwise if

(21:10)

Given that the window (Equation 21.10) is separable, the Fourier transform is the product of the 1D transforms: 1 1 sin(pf1 R) 2 sin(pf2 S) 2 WB ( f1 , f2 ) ¼ R S sin(pf1 ) sin(pf2 )

(21:11)

The application of Bartlett’s method determines the smearing of the estimated spectrum. Such a phenomenon is scarcely relevant only if WB( f1, f2) is very narrow in relation to X( f1, f2), that is, if the window used is sufficiently long. For example, for R ¼ S ¼ 256, we 1 have that the band at 3 dB of the principal lobe of WB( f1, f2) is equal to around R1 ¼ 256 . The resolution in frequency is equal to this value. Bartlett’s method permits a reduction in the variance of the estimator by a factor proportional to K2 [12]: n o n o ^ BART ( f1 , f2 ) ¼ 1 var P ^ (l) ( f1 , f2 ) var P PER K2

(21:12)

Such a result is correct only if the various periodograms are independent estimates. When the subsets of data are chosen as the contiguous blocks of the same realization, as shown in Figure 21.1, the windows of the data do not turn out to be uncorrelated among themselves, and a reduction in the variance by a factor inferior to K2 must be accepted. In conclusion, the periodogram is a nonconsistent and asymptotically unbiased estimator of the PSD of a bidimensional sequence. The bias of the estimator can be mathematically

Sea-Surface Anomalies in Sea SAR Imagery

461

described in the form of a double convolution of the true spectrum with a spectral window 2 2 sin (pf2 N ) 1 1 sin (p f1 M) of the type WBtot ( f1 , f2 ) ¼ M , where with M and N we indicate the sin (p f1 ) sin (pf2 ) N dimensions of the bidimensional sequence considered. If we carry out an average of the periodograms calculated on several adjacent subsequences (Bartlett’s method), each of R S samples, as in Figure 21.1, the bias of the estimator can still be represented as the double convolution (see Equation 21.9) of the true spectrum with the spectral window (Equation 21.11): 1 1 sin(p f1 R) 2 sin(pf2 S) 2 M N WB ( f1 , f2 ) ¼ , where R ¼ , S ¼ R S sin(p f1 ) sin(pf2 ) K K

(21:13)

^ BART( f1, f2) is greater than that of P ^ PER( f1, f2), because of the The bias of the estimator P greater width of the principal lobe of the corresponding spectral window. The bias can hence be interpreted in relation to its effects on the resolution of the spectrum. For a fixed dimension of the sequence to be analyzed, M rows and N columns, the variance of the estimator diminishes with the increase in the number of the periodograms, but R and S also diminish and thus the resolution of the estimated spectrum. Therefore, in Bartlett’s method, a compromise needs to be reached between the bias or resolution of the spectrum on one side and the variance of the estimator on the other. The actual choice of the parameters M, N, R, and S in a real situation is orientated by the a priori knowledge of the signal to be analyzed. For example, if we know that the spectrum has a very narrow peak and if it is important to resolve it, we must choose sufficiently large R and S values to obtain the desired resolution in frequency. It is then necessary to use a pair of sufficiently high M and N values to obtain a conveniently low variance for Bartlett’s estimator.

21.3

Self-Similar Stochastic Processes

In this section, we recall the definitions of self-similar and long-memory stochastic processes. Definition 1: Let Y(u), u 2 R be a continuous random process. It is called self-similar with selfsimilarity parameter H, if for any positive constant b, the following relation holds: d

bH Y(bu)¼ Y(u)

(21:14) d

In Definition 1, bH Y (bu) is called rescaled process with scale factor b, and ¼ means the equality of all the finite dimensional probability distributions. For any sequence of points u1, . . . ,uN and all b > 0, we have that bH [Y(bu1), . . . ,Y (buN)] has the same distribution as [Y(u1), . . . , Y(uN)] In this chapter, we analyze the correlation and PSD structure of discrete stochastic processes. Then we can impose a condition on the correlation of stochastic stationary processes introducing the definition of second-order self-similarity. Let Y(n), n 2 N be a covariance stationary discrete random process with mean h ¼ E{Y(n)}, variance s2, and autocorrelation function R(m) ¼ E{Y(n þ m)Y(n)}, m 0. The spectral density of the process is defined as [13]:

Signal and Image Processing for Remote Sensing

462

S(K) ¼

1 s2 X R(m)eimK 2p m¼1

(21:15)

Assume that [14,15] lim R(m) ¼ cR mg

(21:16)

m!1

where cR is a constant and 0 < g < 1. For each l ¼ 1, 2, . . . , we indicate with the symbols Yl (n) ¼ {Yl(n), n ¼ 1, 2, . . . } the series obtained by averaging Y(n) over nonoverlapping blocks of size l: Yl (n) ¼

Y[(n 1)l] þ Y[(nl 1)] , n>1 l

(21:17)

Definition 2: A process is called exactly second-order self-similar with self-similarity parameter g H ¼ 1 if, for each l ¼ 1, 2, . . . , we have that 2 Var{Yl } ¼ s2 lg Rl (m) ¼ R(m) ¼

i 1h (m þ 1)2H 2m2H þ jm 1j2H , 2

m0

(21:18)

where Rl(m) denotes the autocorrelation function of Yl(n) Definition 3: A process is called asymptotically second-order self-similar with self-similarity g parameter H ¼ 1 if 2 lim Rl (m) ¼ R(m) (21:19) l!1

Thus, if the autocorrelation functions of the processes Yl(n) are the same as or become indistinguishable from R(m) as m ! 1, the covariance stationary discrete process Y(n) is second-order self-similar. Lamperti [16] demonstrated that self-similarity is produced as a consequence of limit theorems for sums of stochastic variables. Definition 4: Let Y(u), u 2 R be a continuous random process. Suppose that for any n 1, n 2 R and any n points (u1, . . . , un), the random vectors {Y(u1 þ n) Y(u1 þ n 1), . . . , Y(un þ n) Y(un þ n 1)} show the same distribution. Then the process Y(u) has stationary increments. Theorem 1: Let Y(u), u 2 R be a continuous random process. Suppose that: 1. P{Y(1) 6¼ 0} > 0 2. X1, X2, is a stationary sequence of stochastic variables 3. b1, b2, are real, positive normalizing constants for which lim {log (bn )} ¼ 1 n!1

4. Y(u) is the limit in distribution of the sequence of the normalized partial sums

1 b1 n Snu ¼ bn

bnuc X

Xj ,

n ¼ 1,2, . . .

j¼1

Then, for each t > 0 there exists an H > 0 such that 1. Y(u) is self-similar with self-similarity parameter H 2. Y(u) has stationary increments

(21:20)

Sea-Surface Anomalies in Sea SAR Imagery

463

Furthermore, all self-similar processes with stationary increments and H > 0 can be obtained as sequences of normalized partial sums. Let Y(u), u 2 R be a continuous self-similar random process with self-similarity parameter H such that d

Y(u) ¼ uH Y(1)

(21:21)

for any u > 0. d Then, indicating with ! the convergence in distribution, we have the following behavior of Y(u) as u tends to infinity [17]: .

d

If H < 0, then Y(u) ! 0 d Y(u)¼ Y(1)

.

If H ¼ 0, then

.

If H > 0 and Y(u) 6¼ 0, then jY(u)j ! 1

d

If u tends to zero, we have the following: d

.

If H < 0 and Y(u) 6¼ 0, then jY(u)j ! 1

.

If H ¼ 0, then Y(u)¼ Y(1) d If H > 0, then Y(u) ! 0

.

d

We notice that: . . .

.

Y(u) is not stationary unless Y(u) 0 or H ¼ 0 If H ¼ 0, then P{Y(u) ¼ Y(1)} ¼ 1 for any u > 0 If H < 0, then Y(u) is not a measurable process unless P{Y(u) ¼ Y(1) ¼ 0} ¼ 1 for any u > 0 [18] As stationary data models, we use self-similar processes, Y(u), with stationary increments, self-similarity parameter H > 0 and P{Y(0) ¼ 0} ¼ 1

21.3.1

Covariance and Correlation Functions for Self-Similar Processes with Stationary Increments

Let Y(u), u 2 R be a continuous self-similar random process with self-similarity parameter H. Assume that Y(u) has stationary increments and that E{Y(u)} ¼ 0. Indicate with s2 ¼ E{Y(u) Y(u 1)} ¼ E{Y2 (u)} the variance of the stationary increment process X(u) with X(u) ¼ Y(u) Y(u 1). We have that n o n o E ½Y(u) Y(v) 2 ¼ E ½Y(u v) Y(0) 2 ¼ s2 (u v)2H

(21:22)

where u, v 2 R and u > v. In addition n o n o n o E ½Y(u) Y(v) 2 ¼ E ½Y(u) 2 þ E ½Y(v) 2 2EfY(u)Y(v)g ¼ ¼ s2 u2H þ s2 v2H 2CY (u, v) where CY(u, v) denotes the covariance function of the nonstationary process Y(u).

(21:23)

Signal and Image Processing for Remote Sensing

464 Thus, we obtain that

1 CY (u, v) ¼ s2 u2H (u v)2H þ v2H 2

(21:24)

The covariance function of the stationary increment sequence X(j) ¼ Y(j) Y(j 1), j ¼ 1, 2, . . . is C(m) ¼ Cov{X(j), X(j þ m)} ¼ Cov{X(1), X(1 þ m)} ¼ 82 32 2 32 2 32 2 32 9 = mþ1 m m mþ1 X X X 1 <4X X(p)5 þ4 X(p)5 4 X(p)5 4 X(p)5 ¼ E ; 2 : p ¼1 p ¼2 p ¼1 p ¼2 ¼

o n o n oo 1n n E ½Y(1 þ m) Y(0) 2 þ E ½Y(m 1) Y(0) 2 2E ½Y(m) Y(0) 2 2 (21:25)

After some algebra, we obtain

1 C(m) ¼ s2 (m þ 1)2H 2m2H þ (m 1)2H , 2 C(m) ¼ C( m), m < 0 Then, the correlation function, R(m) ¼

m0

(21:26)

C(m) , is s2

1 (m þ 1)2H 2m2H þ (m 1)2H , 2 R(m) ¼ R( m), m < 0

R(m) ¼

m0

(21:27)

If 0 < H < 1 and H 6¼ 0, we have that [19] lim {R(m)} ¼ H(2H 1)m2H2

m!1

(21:28)

We notice that: .

If 12 < H < 1, then the correlations decay very slowly and sum to infinity: 1 P R(m) ¼ 1. The process has long memory (it has LRD behavior). m¼1

. .

If H ¼ 12, then the observations are uncorrelated: R (m) ¼ 0 for each m. 1 P If 0 < H < 12, then the correlations sum to zero: R(m) ¼ 0. In reality, the last m¼1

condition is very unstable. In the presence of arbitrarily small disturbances [19], 1 P the series sums to a finite number: R(m) ¼ c, c 6¼ 0. m¼1 .

If H ¼ 1, from Equation 21.7 we obtain d

d

Y(m) ¼ uH Y(1) ¼ Y(m),

m ¼ 1, 2, . . .

(21:29)

and R(m) ¼ 1 for each m. .

If H > 1, then R(m) can become greater than 1 or less than 1 when m tends to infinity.

Sea-Surface Anomalies in Sea SAR Imagery

465

The last two points, corresponding to H 1, are not of any importance. Thus, if correlations exist and lim {R(m)} ¼ 0, then 0 < H < 1. m!1 We can conclude by observing that: .

A self-similar process for which Equation 21.7 holds is nonstationary.

.

Stationary data can be modeled using self-similar processes with stationary increments. Analyzing the autocorrelation function of the stationary increment process, we obtain

.

– If 12 < H < 1, then the increment process has long memory (LRD) – If H ¼ 12, then the observations are uncorrelated – If 0 < H < 12, then the process has short memory and its correlations sum to zero 21.3.2

Power Spectral Density of Self-Similar Processes with Stationary Increments

Let Y(u) be a self-similar process with stationary increments, finite second-order moments, 0 < H < 1 and lim R(m) ¼ 0. m!1 Under these hypotheses, the PSD of the stationary increment sequence X( j) is [20]: S(k) ¼ 2cS [1 cos(k)]

1 X

j2pp þ kj2H1 ,

k 2 [ p, p]

(21:30)

p¼1

s2 sin(pH)G(2H þ 1) and s2 ¼ Var {X(j)}. 2p Calculating the Taylor expansion of S(K) in zero, we obtain

where cS ¼

S(K) ¼ cS jkj12H þ o jkjmin(3 2H, 2)

(21:31)

We note that if 12 < H < 1, then the logarithm of the PSD, log[S(k)], plotted against the logarithm of frequency, log(k), diverges when k tends to zero. In other words, the PSD of long-memory data tends to infinity at the origin.

21.4

Long-Memory Stochastic Processes

Intuitively, long-memory or LRD can be considered as a phenomenon in which current observations are strongly correlated to observations that are far away in time or space. In Section 21.3, the concept of self-similar LRD processes was introduced and was shown to be related to the shape of the autocorrelation function of the stationary increment sequence X(j). If the correlations R(m) decay asymptotically as a hyperbolic function, their sum over all lags diverges and the self-similar process exhibits an LRD behavior. For the correlations and the PSD of a stationary LRD process, the following properties hold [19]: .

The correlations R(m) are asymptotically equal to cRjmjd for some 0 < d < 1.

Signal and Image Processing for Remote Sensing

466 .

The PSD S(K) has a pole at zero that is equal to a constant cSkb for some 0 < b < 1.

.

Near the origin, the logarithm of the periodogram I(k) plotted versus the logarithm of the frequency is randomly scattered around a straight line with negative slope.

Definition 5: Let X(v), v 2 R be a continuous stationary random process. Assume that there exists a real number 0 < d < 1 and a constant cR such that lim

m!1

R(m) ¼1 cR md

(21:32)

Then X(v) is called a stationary process with long-memory or LRD d In Equation 21.32, the Hurst parameter H ¼ 1 is often used instead of d. 2 On the contrary, stationary processes with exponentially decaying correlations R(m) cbm

(21:33)

where c and b are real constants and 0 < c < 1, 0 < b < 1 are called stationary processes with short-memory or SRD. We can also define LRD by imposing a condition on the PSD of a stationary process. Definition 6: Let X(v), v 2 R be a continuous stationary random process. Assume that there exists a real number 0 < b < 1 and a constant cS such that lim

k!1

S(k) cS jkjb

¼1

(21:34)

Then X(v) is called a stationary process with long-memory or LRD. Such spectra occur frequently in engineering, geophysics, and physics [21,22]. In particular, studies on sea spectra using long-memory processes have been carried out by Sarpkaya and Isaacson [23] and Bretschneider [24]. We notice that the definition of LRD by Equation 21.33 or Equation 21.34 is an asymptotic definition. It depends on the behavior of the spectral density as the frequency tends to zero and behavior of the correlations as the lag tends to infinity.

21.5

Long-Memory Stochastic Fractal Model s

Examples of LRD processes include fractional Gaussian noise (fGn), FARIMA, and FEXP. fGn is the stationary first-order increment of the well-known fractionally fBm model. fBm was defined by Kolmogorov and studied by Mandelbrot and Van Ness [25]. It is a Gaussian, zero mean, nonstationary self-similar process with stationary increments. In one dimension, it is the only self-similar Gaussian process with stationary increments. Its covariance function is given by Equation 21.24. A particular case of an fBm model is the Wiener process (Brownian motion). It is a zeromean Gaussian process whose covariance function is equal to Equation 21.24 with H ¼ 12 (the observations are uncorrelated). In fact, one of the most important properties of Brownian motion is the independence of its increments. As fBm is nonstationary, its PSD cannot be defined. Therefore, we can study the characteristics of the process by analyzing the autocorrelation function and the PSD of the fGn process.

Sea-Surface Anomalies in Sea SAR Imagery

467

Fractional Gaussian noise is a Gaussian, null mean, and stationary discrete process. Its autocorrelation function is given by Equation 21.27) and is proportional to jmj2H 2 as m tends to infinity (Equation 21.28). Therefore, the discrete process exhibits SRD for 0 < H < 12, independence for H ¼ 12 (Brownian motion), and LRD for 12 < H < 1. As all second-order self-similar processes with stationary increments have the same second-order statistics as fBm, their increments have all the same correlation functions as fGn. FGn processes can be completely specified by three parameters: mean, variance, and the Hurst parameter. Some data sets in diverse fields of statistical applications, such as hydrology, broadband network traffic, and sea SAR images analysis, can exhibit a complex mixture of SRD and LRD. It means that the corresponding autocorrelation function behaves similar to that of LRD processes at large lags, and to that of SRD processes at small lags [8,26]. Models such as fGn can capture LRD but not SRD behavior. In these cases, we can use models specifically developed to characterize both LRD and SRD, like FARIMA and FEXP. 21.5.1

FARIMA Models

FARIMA models can be introduced as an extension of the classic ARMA and ARIMA models. To simplify the notation, we consider only zero-mean stochastic processes. An ARMA(p, q) process model is a stationary discrete random process. It is defined to be the stationary solution of F(B)Y(n) ¼ C(B)W(n)

(21:35)

where: .

B denotes the backshift operator defined by Y(n) Y(n 1) ¼ (1 B)Y(n) (Y(n) Y(n 1) ) (Y(n 1) Y(n 2) ) ¼ (1 B)2 Y(n)

.

(21:36)

F(x) and C(x) are polynomials of order p and q, respectively:

F(x) ¼ 1

p X

Fm xm

m¼1

C(x) ¼ 1

q X

Cm xm

m¼1 .

It is assumed that all solutions of F(x) ¼ 0 and C(x) ¼ 0 are outside the unit circle.

.

W(n), n ¼ 1, 2, . . . are i.i.d. Gaussian random variables with zero mean and variance sW2.

An ARIMA process is the stationary solution of F(B)(1 B)d Y(n) ¼ C(B)W(n)

(21:37)

Signal and Image Processing for Remote Sensing

468 where d is an integer. If d 0, then (1 B)d is given by

d X d (1 B) ¼ ( 1)m Bm m m¼0 d

(21:38)

with the binomial coefficients

d m

¼

d! G(d þ 1) ¼ m!(d m)! G(m þ 1)G(d m þ 1)

(21:39)

As the gamma function G(x) is defined for all real numbers, the definition of binomial coefficients can be extended to all real numbers d. The extended definition is given by 1 X d (1 B) ¼ ( 1)m Bm m m¼0 d

(21:40)

In Equation 21.26, if d is an integer, all the terms for m > d are zero. On the contrary, if d is a real number, we have a summation over an infinite number of indices. Definition 7: Let Y(n) be a discrete stationary stochastic process. If it is the solution of F(B)(1 B)d Y(n) ¼ C(B)W(n)

(21:41)

for some 12 < d < 12, then Y(n) is a FARIMA(p, d, q) process [27,28]. LRD occurs for 0 < d < 12, and for d 12 the process is not stationary. The coefficient d is called the fractional differencing parameter. There are four special cases of a FARIMA(p, d, q) model: . . . .

Fractional differencing (FD) ¼ FARIMA(0, d, 0) Fractionally autoregressive (FAR) ¼ FARIMA(p, d, 0) Fractionally moving average (FMA) ¼ FARIMA(0, d, q) FARIMA(p, d, q)

In Ref. [5], Ilow and Leung used FMA and FAR models to represent some sea SAR images collected by the RADARSAT sensor as 2D isotropic and homogeneous random fields. Bertacca et al. extended Ilow’s approach to characterize nonhomogeneous high-resolution sea SAR images [7,8]. In these papers, the modeling of the SRD part of the PSD model (MA, AR, or ARMA) required from 8 to more than 30 parameters. FARIMA(p, d, q) has p þ q þ 3 parameters, it is much more flexible than fGn in terms of the simultaneous modeling of both LRD and SRD, but it is known to require a large number of model parameters and to be not computationally efficient [26]. Using FEXP models, Bertacca et al. defined a simplified analysis technique of sea SAR images PSD [9]. 21.5.2

FEXP Models

FEXP models were introduced by Beran in Ref. [29] to reduce the numerical complexity of Whittle’s approximate maximum likelihood estimator (time domain MLE estimation of

Sea-Surface Anomalies in Sea SAR Imagery

469

long memory) [30] principally for large sample size, and to decrease the CPU times required for the approximate frequency domain MLE of long-memory, in particular, for high-dimensional parameter vectors to estimate. Operating with FEXP models leads to the estimation of the parameters in a generalized linear model. This methodology permits the valuation of the whole vector of parameters independently of their particular character (i.e., LRD or SRD parameters). In a generalized linear model, we observe a random response y with mean m and distribution function F [31]. The mean m can be expressed as g(m) ¼ h0 þ h1 v1 þ þ hn vn

(21:42)

where v1, v2, . . . ,vn are called explanatory variables and are related to m through the link function g(m). Let X(l), l ¼ 1, 2, . . . , n be samples of a stationary sequence at points l ¼ 1, 2, . . . , n. The periodogram ordinates, I(kj,n), j ¼ 1, 2, . . . , n, are given by [19]: 2 n

ilk 1 X 1 I(kj,n ) ¼ X(l) Xn e j,n ¼ 2pn l¼1 2p

n1 X

^ n (m)eimkj,n C

(21:43)

m¼(n1)

2pj where kj,n ¼ n , j ¼ 1, 2, . . . ,n are the Fourier frequencies, n ¼ n1 2 , bxc denotes the ^ integer part of x, Xn is the sample mean, and Cn(m) are the sample covariances: njmj X

^ n (m) ¼ 1 X(l) Xn X(l þ jmj) Xn C n l¼1

(21:44)

It is known that the periodogram I(k), calculated on an n-size data vector, is an asymptotically unbiased estimate of the power spectral density S(k): lim E{I(k)} ¼ S(k)

n!1

(21:45)

Further, for short-memory processes and a finite number of frequencies k1, . . . , kN 2 [0,p], the corresponding periodogram ordinates I(k1), . . . , I(kN) are approximately independent exponential random variables with means S(k1), . . . , S(kN). For long-memory processes, this result continues to be valid under certain mild regularity conditions [32]. The periodogram of the data is usually calculated at the Fourier frequencies. Thus, the samples I(kj,n) are independent exponential random variables with means S(kj,n). When we estimate the MRPSD, we can employ the central limit theorem and consider the mean radial periodogram ordinates Imr(kj,n) as approximately independent Gaussian random variables with means Smr(kj,n) (the MRPSD at the frequencies kj,n). Assume that yj,n ¼ Imr (kj,n )

(21:46)

Then the expected value of yj,n is given by m ¼ Smr (kj,n )

(21:47)

Suppose that there exists a link function n(m) ¼ h1 f1 (k) þ h2 f2 (k) þ þ hM fM (k)

(21:48)

Signal and Image Processing for Remote Sensing

470

Equation 21.48 defines a generalized linear model where y is the vector of the mean radial periodogram ordinates with a Gaussian distribution F, the functions fi (k), i ¼ 1, . . . , M are called explanatory variables, and n is the link function. It is observed that, near the origin, the spectral densities of long-memory processes are proportional to k2d ¼ e2d log(k)

(21:49)

so that a convenient choice of the link function is n(m) ¼ log (m)

(21:50)

Definition 8: Let r (k):[p, p] ! Rþ be an even function [29] for which lim k!0

r(k) k

¼ 1.

Define z0 ¼ 1. Let z1, . . . , zq be smooth even functions in [p,p] such that the n* (q þ 1) 4p 6p

2n p T matrix A, with column vectors zl 2p , l ¼ 1, . . . , q, is nonsingular n , zl n , zl n , . . . , zl n 1 for any n. Define a real vector f ¼ [h0, H, h1, . . . , hq] with 2 H < 1. A stationary discrete process Y(m) whose spectral density is given by ( S(k;f) ¼ r(k)12H exp

q X

) hl zl (k)

(21:51)

l¼0

is an FEXP process. The functions z1, . . . , zq are called short-memory components, whereas r is the long-memory component of the process. We can choose different sets of short-memory components. Each set z1, . . . , zq corresponds to a class of FEXP models. Beran observed that two classes of FEXP models are particularly convenient. These classes are characterized by the same LRD component k r(k) ¼ j1 eik j ¼ 2 sin (21:52) 2 If we define the short-memory functions as zl (k) ¼ cos (lk), l ¼ 1, 2, . . . , q

(21:53)

then the short-memory part of the PSD can be expressed as a Fourier series [33]. If the short-memory components are equal to zl (k) ¼ kl ,

l ¼ 1, 2, . . . , q

(21:54)

then the logarithm of the SRD component of the spectral density is assumed to be a finiteorder polynomial [34]. In all that follows, this class of FEXP models is referred to as polynomial FEXP models. 21.5.3

Spectral Densities of FARIMA and FEXP Processes

It is observed that a FARIMA process can be obtained by passing an FD process through an ARMA filter [19]. Therefore, in deriving the PSD of a FARIMA process SFARIMA(k), we can refer to the spectral density of an ARMA model SARMA(k). Following the notation of Section 21.5.1, we have that

Sea-Surface Anomalies in Sea SAR Imagery

SARMA (k) ¼

471 2 s2W C(eik ) 2pjC(eik )j

(21:55)

2

Then we can write the expression of SFARIMA(k) as ik 2d

SFARIMA (k) ¼ j1 e j

2d k SARMA (k) ¼ 2 sin SARMA (k) 2

(21:56)

If k tends to zero, SFARIMA(k) is asymptotically equal to SFARIMA (k) ¼ SARMA (0)jkj2d ¼

s2W jc(1)j2 2pjC(1)j2

jkj2d

(21:57)

and it diverges for d > 0. Comparing Equation 21.57 and Equation 21.31, we see that d¼H

1 2

(21:58)

As mentioned in the preceding sections, LRD occurs for 12 < H < 1 or 0 < d < 12. It has been demonstrated that LRD occurs for 0 < d < 32 for 2D self-similar processes, and, in particular, for a fractal image [35]. Let us rewrite the expression of the polynomial FEXP PSD as follows: ( ik 12H

S(k;f) ¼ j1 e

j

exp

q X l¼0

) hl zl (k)

2d k ¼ 2 sin SSRD (k;f) 2

(21:59)

where SSRD (k;f) denotes the SRD part of the PSD. Modeling of sea SAR images PSD is concerned with handling functions of both LRD and SRD behaviors. If we compare Equation 21.56 with Equation 21.59, the result is that FARIMA and FEXP models provide an equivalent description for the LRD behavior of the estimated sea SAR image MRPSD. To have a better understanding of the gain obtained by using polynomial FEXP PSD, we must analyze the expressions of its SRD component. It goes without saying that the exponential SRD of FEXP is more suitable than a ratio of polynomials to represent rapid SRD variability of the estimated MRPSD. In the next section, we employ FARIMA and FEXP models to fit some MRPSD obtained from highresolution ERS sea SAR images.

21 .6

LRD Modeling of Mean Radial Spectr al Densiti es of Se a S AR I ma ges

As described in Section 21.1, the MRPSD of a sea SAR image is obtained by using a rectangular to polar coordinates transformation of the 2D PSD and by calculating the average of the radial spectral densities for q ranging from 0 to 2p radians. This MRPSD can assume different shapes corresponding to low-wind areas, oil slicks, and the sea in the presence of strong winds. In any case, it diverges at the origin independently of the surface wind speeds, sea states, and the presence of oily substances on the sea surface [8].

472

Signal and Image Processing for Remote Sensing

LRD was first used by Ilow and Leung in the context of sea SAR images modeling and simulations. Analyzing some images collected by the RADARSAT sensor and utilizing FAR and FMA processes, they extended the applicability of long-memory models to characterize the sea clutter texture in high-resolution SAR imagery. An accurate representation of clutter in satellite images is important in sea traffic monitoring as well as in search and rescue operations [5], because it can lead to the development of automatic target recognition algorithms with improved performance. The work of Ilow and Leung was limited to isotropic processes. In fact, in moderate weather conditions, the ocean surface exhibits homogeneous behavior. The anisotropy of sea SAR images depends, in a complex way, on the SAR system (frequency, polarization, look-angle, sensor velocity, pulse repetition frequency, pulse duration, chirp bandwidth) and on the ocean (directional spectra of waves), slick (concentration, surface tension, and then ocean spectral dampening), and environment (wind history) parameters. It is observed [36] that the backscattered signal intensity depends on the horizontal angle w between the radar look direction and the directions of the waves on the sea surface. The maximum signal occurs when the radar looks in the upwind direction, a smaller signal when the radar looks downwind, and the minimum signal is measured when the radar looks normal to the wind direction. In employing the direct transformation defined by Hasselmann and Hasselmann [10], we observed that the anisotropy of sea SAR image spectra tends to decrease as w increases from 0 to p/2 radians [37]. We can consider the sea SAR images corresponding to sea surfaces with no swells and with surface winds with speeds lower than 5 m/sec as isotropic. When the surface wind speed increases, the anisotropy of the estimated directional spectra starts to be noticeable. For surface wind speeds higher than 10 m/sec, evident main lobes appear in the directional spectra of sea SAR images, and the hypothesis of isotropy is no longer appropriate. It is important to underline that a low surface wind speed is not always associated with an isotropic sea surface. Sometimes, wind waves generated far from the considered area can spread to a sea surface with little wind (with low amplitudes) and determine a reduction in the spatial correlation in the context of long-memory models. In addition, the land and some man-made structures can act as a shelter from the wind in the proximity of the coast. Also, an island or the particular conformation of the shore can represent a barrier and partially reduce the surface wind speed. Nevertheless, a low surface wind speed measured far from the coast, which is responsible for the attenuation of the sea wave spectra in the region of the wind waves, frequently corresponds to almost isotropic sea SAR images. The spectral density and autocorrelation properties, introduced in the previous sections for 1D self-similar and LRD series, can be extended to 2D processes Y(x, y), where (x, y) 2 R2, whose realizations are images [35]. This process is a collection of matrices whose elements are random variables. We also refer to Y(x, y) as a random field. If the autocorrelation function of the image is invariant under all Euclidean motions, then it depends only on the Euclidean distance between the points RY (u, v) ¼ E{Y(x, y)Y(x þ u, y þ v)}

(21:60)

and the field is referred to as homogeneous and isotropic. The PSD of the homogeneous field, SY(kx, ky), can be calculated by taking the 2D Fourier transform of RY(u, v). For sampled data, this is commonly implemented using the 2D FFT algorithm.

Sea-Surface Anomalies in Sea SAR Imagery

473

As observed in Ref. [5], for a statistically isotropic and homogeneous field, we can derive a 2D model from 1D model qaﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ ﬃ by replacing k 2 R in the PSD of a 1D process with the radial frequency kR ¼ (k2x þ k2y ) to get the radial PSD. Isotropic and homogeneous random fields can be completely represented in the spectral domain by means of their radial PSD. On the contrary, the representation of nonisotropic fields requires additional information. Our experimental results show that FARIMA models always provide a good fit with the MRPSD of isotropic and nonisotropic sea SAR images. In the next section, the results obtained in Refs. [5,8] are compared with those we get by employing a modified FARIMA-based technique or using FEXP models. In particular, the proposed method utilizes a noniterative technique to estimate the fractional differencing parameter d [9]. As in Ref. [8], we use the LRD parameter d, which depends on the SAR image roughness, and a SRD parameter, which depends on the amplitude of the backscattered signal to discriminate between low-wind and oil slick areas in sea SAR imagery. Furthermore, we will demonstrate that FEXP models allow the computational efficiency to be maximized. In this chapter, with the general term ‘‘SAR images,’’ we refer to amplitude multi-look ground range SAR images, such as either ERS-2 PRI or ERS-2 GEC. ERS-2 PRI are corrected for antenna elevation gain and range-spreading loss. The terrain-induced radiometric effects and terrain distortion are not removed. ERS-2 SAR GEC are multi-look (speckle-reduced), ground range, system-corrected images. They are precisely located and rectified onto a map projection, but not corrected for terrain distortion. These images are high-level products, calibrated, and corrected for the SAR antenna pattern and range-spreading loss. 21.6.1

Estimation of the Fractional Differencing Parameter d

The LRD estimation algorithms set includes the R/S statistic, first proposed by Hurst in a hydrological context, the log–log correlogram, the log–log plot of the sample mean, the semivariogram, and the least-squares regression in the spectral domain. Least-squares regression in the spectral domain is concerned with the analysis of the asymptotic behavior of the estimated MRPSD b x c as kR tends to zero. As mentioned in Section 21.3.5, the mean radial periodogram is usually sampled at 2pj and b x c denotes the the Fourier frequencies kj,n ¼ n , j ¼ 1, 2, . . . n , where n ¼ n1 2 integer part of x. To obtain the LRD parameter d, we use a linear regression algorithm and estimate the slope of the mean radial periodogram, in a log–log plot, when the radial frequency tends to zero. It is observed that the LRD is an asymptotic property and that the negative slope of log{Imr(kR)} is usually proportional to the fractional differencing parameter only in a restricted neighborhood of the origin. This means that we must perform the linear regression using only a certain number of the smallest Fourier frequencies [38,39]. Least-squares regression in the spectral domain was employed by Geweke and PorterHudak in Ref. [38]. Ilow and Leung [5] extended their techniques and made use of a twostep procedure to estimate the FARIMA parameters of homogeneous and isotropic sea SAR images. An iterative two-step analogous approach was introduced by Bertacca et al. [7,8]. In all these papers, the d parameter was repeatedly estimated inside a loop for increasing Fourier frequencies. The loop was terminated by a logical expression, which compared the logarithms of the long-memory and short-memory spectral density components. In the proximity of the origin, the logarithm of the spectral density was dominated

474

Signal and Image Processing for Remote Sensing

by the long-memory component, so that only the Fourier frequencies for which the SRD contribution was negligible were chosen. There is an important difference between the 1D spectral densities introduced by Ilow and Bertacca. Ilow calculated the radial PSD through the conversion of frequency points on a square lattice from the 2D FFT to the discrete radial frequency points. The radial PSD was obtained at frequency points, which were not equally spaced, by an angular integration and averaging of the spectral density. Bertacca first introduced a rectangular to polar coordinates transformation of the 2D PSD. This was obtained through a cubic interpolation of the spectral density on a polar grid. Thus, the rapid fluctuations of the 2D periodogram were reduced before the mean radial periodogram computation. It is assumed (see Section 21.2.2) that the mean radial periodogram ordinates Imr(kj,n) are approximately independent Gaussian random variables with means Smr(kj,n) (the MRPSD at the Fourier frequencies kj,n). Then the average of the radial spectral densities for q ranging from 0 to 2p radians allows the variance of Imr(kj,n) to be significantly reduced. For example, if we calculate the 2D spectral density using nonoverlapping 256 256 size squared blocks of data, after going into polar coordinates (with the cubic interpolation), the mean radial periodogram is obtained by averaging over 256 different radial spectral densities. On the contrary, in Ilow’s technique, there is no interpolation and the average is calculated over a smaller number of frequency points. This means that the variance of the mean radial periodogram ordinates Imr(kj,n) introduced by Bertacca is significantly smaller than that obtained by Ilow. Furthermore, supposing that the scattering of the random variables Imr(kj,n) around their means Smr(kj,n) is the effect of an additive noise, then the variance reduction is not affected by the anisotropy of R sea SAR images. Let {kj,n}m j=1 denote the set of the smallest Fourier frequencies to be used for the fractional differencing parameter estimation. Experimental results (shown in Section 21.7) demonstrate that, near the origin, the ordinates log{Imr(kj,n)}, represented versus log{kj,n}, are approximately the samples of a straight line with a negative slope. In addition, allows the set of the the small variance of all the samples log {Imr (kj,n )}, kj,n ¼ 1, 2, . . . , n1 2 mR Fourier frequencies {kj,n}j=1 ¼ 1 to be easily distinguished from this log–log plot. In all that mR follows, we will limit the set {kj,n}j=1 to the last Fourier frequency before the lobe (the slope increment) of the plot. In Ref. [9], Bertacca et al. obtained good results by determining the number mR of the Fourier frequencies to be used for the parameter d estimation from the log–log plot of the mR mean radial periodogram. This method prevents the estimation of the set {kj,n}j=1 inside a loop and increases the computational efficiency of the proposed technique. The FARIMA MRPSD ordinates are equal to 2d kj,n Smr (kj,n ) ¼ 2 sin SARMA (kj,n ) 2 Calculating the logarithm of the above equation, we obtain kj,n log Smr (kj,n ) ¼ d log 4 sin2 þ log SARMA (kj,n ) 2

(21:61)

(21:62)

mR After determining {kj,n}j=1 and substituting the ordinates Smr(kj,n) with their estimates Imr (kj,n), we have

Sea-Surface Anomalies in Sea SAR Imagery

475

kj,n log Imr (kj,n ) d log 4 sin2 2

(21:63)

The least-squares estimator of d is given by mR P j ¼1 ^ d¼

)(vj v) ( uj u mR P

(21:64) )2 ( uj u

j¼1

mR mR u P P k j ,n j vj ¼ , and u . where vj ¼ log Imr (kj,n ) , uj ¼ log 4 sin2 , v ¼ m R 2 m R j¼1 j¼1 In Section 21.7, we show the experimental results corresponding to high-resolution sea SAR images of clean and windy sea areas, oil spill, and low-wind areas. It is observed that considering the SAR image of a rough sea surface that shows clearly identifiable darker zones corresponding to an oil slick (or to an oil spill) and a low-wind area, we have [8] the following: .

.

.

These darker areas are both caused by the amplitude attenuation for the tiny capillary and short-gravity waves contributing to the Bragg resonance phenomenon. In the oil slick (or spill), the low amplitude of the backscattered signal is due to the concentration of the surfactant in the water that affects only the shortwavelength waves. In the low-wind area, the amplitudes of all the wind-generated waves are reduced. In other words, the low-wind area tends to a flat sea surface.

Bertacca et al. observed that the spatial correlation of SAR subimages of low-wind areas always decays to zero at a slower rate than that related to the oil slicks (or spills) in the same SAR image [8]. Since lower decaying correlations are related to higher values of the fractional differencing parameter, low-wind areas are characterized by the greatest d value in the whole image. It is observed that oily substances determine the attenuation of only tiny capillary and short-gravity waves contributing to the Bragg resonance phenomenon [36]. Out of the Bragg wavelength range, the same waves are present in oil spill and clean water areas, and experimental results show that the shapes (and the fractional differencing parameter d) of the MRPSD of oil slick (or spill) and clean water areas on the SAR image are very similar. On the contrary, the shapes of the mean radial spectra differ for low-wind and windy clean water areas [8].

21.6.2

ARMA Parameter Estimation

After obtaining the fractional differencing parameter estimate ^d, we can write the SRD component of the mean radial periodogram as 2^d ^SRD (kj,n ) ¼ Imr (kj,n ) 2 sin kj,n ISRD (kj,n ) ¼ S 2 The square root of this function,

(21:65)

Signal and Image Processing for Remote Sensing

476

hSRD (kj,n ) ¼

qﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ ISRD (kj,n )

(21:66)

can be defined as the short-memory frequency response vector. The vector of the ARMA parameter estimates that fits the data, hSRD (kj,n), can be obtained using different algorithms. Ilow considered either MA or AR representation of the ARMA part. With his method, a 2D MA model, with a circular symmetric impulse response, is obtained from a 1D MA model with a symmetric impulse response in a similar way as the 2D filter is designed in the frequency transformation technique [40]. Bertacca considered ARMA models. He estimated the real numerator and denominator coefficients of the transfer function by employing the classical Levi algorithm [41], whose output was used to initialize the Gaussian Newton method that directly minimized the mean square error. The stability of the system was ensured by using the damped Gauss– Newton algorithm for iterative search [42]. 21.6.3

FEXP Parameter Estimation

FEXP models for sea SAR images modeling and representation were first introduced by Bertacca et al. in Ref. [9]. As observed in Ref. [19], using FEXP models leads to the estimation of parameters in a generalized linear model. A different approach consists of estimating the fractional differencing parameter d as described in Section 21.6.1. The SRD component of the mean radial periodogram, ISRD (kj,n), can be calculated as in Equation 21.65. After determining the logarithm of the data n o ^SRD (kj,n ) ¼ log ISRD (kj,n ) y, y(j) ¼ log S (21:67) we define the vector x ¼ x(j) ¼ kj,n, and compute the coefficients vector h of the polynomial p(x) that fits the data, p(x(j)) to y(j), in a least-squares sense. The SRD part of the FEXP model is equal to ( ) m X i (21:68) h(i)kj,n SFEXP srd (kj,n ) ¼ exp i¼0

where m denotes the order of the polynomial p(x). It is worth noting the difference between the FMA and the polynomial FEXP models for considering the computational costs and the number of the parameters to estimate. Using ^SRD (kj,n) and estimate the bestFMA models, we calculate the square root of ISRD (kj,n) ¼ S pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ mR . fit polynomial regression of f1 (kj,n ) ¼ ISRD (kj,n ), j ¼ 1, . . . , mR on {kj,n}j=1 Emp lo ying FEXP m ode ls, we perfo rm a pol ynomi al regress ion of pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ mR f2 (kj,n ) ¼ lo g ISRD (kj,n ) , j ¼ 1, . . . , mR on {k j, n } j =1 . Since the logarithm generates functions much smoother than those obtained by calculating the square root of ISRD(kj,n), FEXP models give the same goodness of fit of FARIMA models and require a reduced number of parameters.

21 .7

Ana lysi s of Sea S AR Images

In this section, we show the results obtained for an ERS-2 GEC and an ERS-2 SAR PRI (Figure 21.2 and Figure 21.3). In fact, the results show that the geocoding process does not

Sea-Surface Anomalies in Sea SAR Imagery

Sea

477

Low wind

FIGURE 21.2 Sea ERS-2 SAR GEC acquired near the Santa Catalina and San Clemente Islands (Los Angeles, CA). (Data provided by the European Space Agency ß ESA (2002). With permission.)

affect the LRD behavior of sea SAR image spectral densities. In all that follows, figures are represented with respect to the spatial frequency ks (in linear or logarithmic scale). The spatial resolution of ERS-2 SAR PRI is 12.5 m for both the azimuth and the (ground) range coordinates. The North and East axes pixel spacing is 12.5 m for ERS-2 SAR GEC. In the calculation of the periodogram, we average spectral estimates obtained from squared blocks of data containing 256 256 pixels. This square window size permits the representation of the low-frequency spectral components that provide good estimates of the fractional differencing parameter d. Thus, we have Spatial resolution ¼ 12:5(m) 1 fmax ¼ ¼ 0:08(m1 ) Resolution 2pj , j ¼ 1, . . . , 127, n ¼ 256 kj,n ¼ n ksj,n ¼ kj,n fmax

(21:69)

478

Signal and Image Processing for Remote Sensing

Sea

Oil Spill

FIGURE 21.3 Sea ERS-2 SAR PRI acquired near the coast of Asturias region, Spain. (Data provided by the European Space Agency ß ESA (2002). With permission.)

The first image (Figure 21.2) represents a rough sea area near Santa Catalina and San Clemente Islands (Los Angeles, USA), on the Pacific Ocean, containing a low-wind area (ERS-2 SAR GEC orbit 34364 frame 2943). The second image (Figure 21.3) covers the coast of the Asturias region on the Atlantic Ocean (ERS-2 SAR PRI orbit 40071 frame 2727), and represents a rough sea surface with extended oil spill areas. Four subimages from Figure 21.2 and Figure 21.3 are considered for this analysis. The two subimages marked in Figure 21.2 represent a rough sea area and a low-wind area, respectively. The two subimages marked in Figure 21.3 correspond to a rough sea area and to an oil spill. Note that not all the considered subimages have the same dimension. Subimages of greater dimension provide lower variance of the 2D periodogram. The two clean water areas contain 1024 1024 pixels. The low-wind area in Figure 21.2 and the oil spill in Figure 21.3 contain 512 512 pixels. Using FARIMA models, we estimate the parameter d as explained in Section 21.6.1. The estimation of the ARMA part of the spectrum is made as in Ref. [8]. First we choose the maximum orders, mmax and nmax, of the numerator and denominator polynomials of the ARMA transfer function. Then we estimate the real numerator and denominator coefficients in vectors b and a, for m and n indexes ranging from 0 to mmax and nmax, respectively. The estimated frequency response vector of the short memory PSD, hSRD (kj,n), has been defined in Equation 21.66 as the square root of ISRD (kj,n). The ARMA transfer function is the one that corresponds to the estimated frequency response ^hARMA (kj,n) for which the RMS error between hSRD(kj,n) and ^ hARMA(kj,n) is the minimum among all values of (m, n). The estimated ARMA part of the spectrum then becomes

Sea-Surface Anomalies in Sea SAR Imagery

479

TABLE 21.1 Estimated FARIMA Parameters Analyzed Image

mR

m

n

d

SRD A

Asturias region sea Asturias region oil spill Santa Catalina sea Santa Catalina low wind

11 15 7 11

11 16 6 3

18 11 6 14

0.471 0.568 0.096 0.423

4.87 0.24 218.88 0.47

^ARMA (kj,n ) ¼ j^hARMA (kj,n )j2 S

(21:70)

In the data processing performed using FARIMA models, we have used mmax ¼ 20, nmax ¼ 20: The FARIMA parameters estimated from the four considered sub SRD, introduced by Bertacca et al. in Ref. images are shown in Table 21.1. The parameter A [8], is given by SRD ¼ A

n

X

ISRD (kj,n )=n

(21:71)

j¼1

It depends on the strength of the backscattered signal and thus on the roughness of the sea surface. It is observed [8] that the ratio of these parameters for the clean sea and the oil spill SAR images is always lower than those corresponding to the clean sea and the low-wind subimages SRD S SRD S A A < SRD OS A SRD LW A

(21:72)

SRD_S From Table 21.1, we have AASRD S ¼ 20:29 and AASRD S ¼ 465:7. The parameter A SRD OS SRD LW assumes low values for clean sea areas with moderate wind velocities, as we can see in Table 21.1, for the subimage of Figure 21.3. However, the attenuation of the Bragg resonance phenomenon due to the oil on the sea surface leads to the measurement of a SRD_S and A SRD_OS. In any case, for rough sea surfaces, the discrimhigh ratio between A ination between oil spill and low wind is very easy using either the LRD parameter d or SRD. The parameter d is used together with A SRD to resolve different the SRD parameter A darker areas corresponding to smooth sea surfaces [8]. Figure 21.4 and Figure 21.5 show the estimated mean radial spectral densities and their FARIMA models for all the subimages of Figure 21.2 and Figure 21.3. Table 21.2 displays the results obtained by using an FEXP model, and estimating its parameters in a generalized linear model of order m ¼ 2, for the almost isotropic sea SAR subimage shown in Figure 21.3. In Figure 21.6, we examine visually the fit of the estimated model to the MRPSD of the sea clutter. Figure 21.7 and Figure 21.8 show the estimated mean radial spectral densities of all the subimages of Figure 21.2 and Figure 21.3 and their FEXP models obtained by using the two-step technique described in Section 21.6.3. We notice that increasing the polynomial FEXP order allows better fits to be obtained. To illustrate this, FEXP models of order m ¼ 2 and m ¼ 20 are shown in Figure 21.7 and Figure 21.8.

Signal and Image Processing for Remote Sensing

480 104

Mean radial PSD FARIMA model fitting

103

Mean radial PSD

Sea

102 Low wind

101

100

10–1 10–3

10–2 10–1 Spatial frequency (radians/m)

100

FIGURE 21.4 Mean radial power spectral densities and FARIMA models corresponding to Figure 21.2.

21.7.1

Two-Dimensional Long-Memory Models for Sea SAR Image Spectra

As mentioned in Section 21.6, the 2D periodograms estimated for sea SAR images can exhibit either isotropic or anisotropic spatial structure [8]. This depends on the SAR system, on the directional spectra of waves, on the possible presence of oil spills or slicks, and on the wind history parameters. fBm and FARIMA models have been used to represent isotropic and homogeneous self-similar random fields [5,8,35,43]. Owing to the complex nature of the surface scattering phenomenon involved in the problem and the highly nonlinear techniques of SAR image generation, the definition of anisotropic spectral densities models is not merely related to the sea state or wind conditions. In the context of self-similar random fields, anisotropy can be introduced by linear spatial transformations of isotropic fractal fields, by spatial filtering of fields with desired characteristics, or by building intrinsically anisotropic fields as in the case of the so-called fractional Brownian sheet [43]. Anisotropic 2D FARIMA has been introduced in Ref. [43] as the discrete-space equivalent of a 2D fractionally differenced Gaussian noise, taking into account the long-memory property and the directionality of the image. In that work, Pesquet-Popescu and Le´vy Ve´hel defined the anisotropic extension of 2D FARIMA by multiplying its isotropic spectral density by an anisotropic part Aa,w (kx, ky). The function Aa,w (kx, ky) depended on two parameters: a and w. The parameter w 2 (0,2p] provided the

Sea-Surface Anomalies in Sea SAR Imagery

481

104 Mean radial PSD FARIMA model fitting

103

Mean radial PSD

Sea

102

Oil spill

101

100 10–3

10–2 10–1 Spatial frequency (radians/m)

100

FIGURE 21.5 Mean radial power spectral densities and FARIMA models corresponding to Figure 21.3.

orientation of the field. The anisotropy coefficient a 2 [0,1) determined the dispersion of the PSD around its main direction w. This anisotropic model allows the directionality and the shape of the 2D PSD to be modified, but the maximum of the PSD main lobe depends on the values assumed by the anisotropy coefficient. Furthermore, the anisotropy of this model depends on only one parameter (a). Experimental results show that this is not sufficient to correctly represent the anisotropy of the PSD of sea SAR images. This means the model must be changed both to provide a better fit with the spectral densities of sea SAR images and allow the shape and the position of the spectral density main lobes to be independently modified. TABLE 21.2 Parameter Estimates for the FEXP Model of Order m ¼ 2 of Figure 21.6. Parameter

Coefficient Estimates

Standard Errors

t-Statistics

p-Values

¼ ¼ ¼ ¼

1.0379 2.2743 25.651 126.49

0.018609 0.11393 1.7359 9.7813

55.775 19.962 14.777 12.932

3.5171e089 2.1482e040 4.881e029 1.1051e024

b(0) b(1) b(2) b(3)

2d h(0) h(1) h(2)

Note: p-Values are given for testing b(i) ¼ 0 against the two-sided alternative b(i) 6¼ 0.

Signal and Image Processing for Remote Sensing

482 104

FEXP model fitting (order m = 2) Mean radial PSD

Mean radial PSD

103

102

101

100 10–3

10–2 10–1 Spatial frequency (rad/m)

100

FIGURE 21.6 Mean radial power spectral densities and FEXP model corresponding to the sea subimage of Figure 21.3.

Bertacca et al. [44] showed that the PSD of either isotropic or anisotropic sea SAR images can be modeled by multiplying the isotropic spectral density of a 2D FARIMA by an anisotropic part derived from the fractal model of sea surface spectra [6]. This anisotropic component depends on seven different parameters and can be broken down into a radial component, the omnidirectional spectrum S(ksj,n), and a spreading function G (qi,m), qi:m ¼ 2pi m , i ¼ 1, 2, . . . , m. This allows the shape, the slope, and the anisotropic characteristics of the 2D PSD to be independently modified and a good fit with the estimated periodogram to be obtained. With a different approach [37], a reliable model of anisotropic PSD has been defined, for sea SAR intensity images, by adding a 2D isotropic FEXP to an anisotropic term newly defined starting from the fractal model of sea surface spectra [6]. However, the additive model cannot be interpreted as an anisotropic extension of the LRD spectral models as in Refs. [5,44]. Analyzing homogeneous and isotropic random fields leads to the estimation of isotropic spectral densities. In such cases, it is better to utilize FEXP instead of FARIMA models [9]. To illustrate this, in Figure 21.9 and Figure 21.10, we show the isotropic PSD of the low-wind area in Figure 21.3 and the result obtained by employing a 2D isotropic FEXP model to represent it. Figure 21.11 and Figure 21.12 compare the anisotropic PSD estimated from a directional sea SAR image (ERS-2 SAR PRI orbit 40801 frame 2727, data provided by the European Space Agency ß ESA (2002)) with its additive model [37].

Sea-Surface Anomalies in Sea SAR Imagery

483

104

103

Mean radial PSD

Sea 102 Low wind

101

100 FEXP model fitting (m = 2) Mean radial PSD FEXP model fitting (m = 20) 10–1 –3 10

10–2 10–1 Spatial frequency (rad/m)

100

FIGURE 21.7 Mean radial PSD and FEXP models corresponding to Figure 21.2.

In Figure 21.9 and Figure 21.10, we present results using the log scale for the z-axis. This emphasizes the divergence of the long-memory PSD. In Figure 21.11 and Figure 21.12, to better examine the anisotropy characteristics of the spectral densities at medium and high frequencies, the value of the PSD at the origin has been set to zero.

21 .8

Conc lusi ons

Some self-similar and LRD processes, such as fBm, fGn, FD, and FMA or FAR models, have been used in the literature to model the spatial correlation properties of the scattering from natural surfaces. These models have demonstrated reliable results in the analysis and modeling of high-definition sea SAR images under the assumption of homogeneity and isotropy. Unfortunately, a wind speed greater than 7 m/sec is often present on sea surfaces and this can produce directionality and nonhomogeneity of the corresponding SAR images. Furthermore, the mean radial spectral density (MRPSD) of sea SAR images always shows an LRD behavior and a slope that changes with increasing frequency. This means that:

Signal and Image Processing for Remote Sensing

484

104

Mean radial PSD

103

Sea

Oil spill 102

101

FEXP model fitting (order m = 2) Mean radial PSD FEXP model fitting (order m = 20) 100 10–3

10–2 10–1 Spatial frequency (radians/m)

100

FIGURE 21.8 Mean radial PSD and FEXP models corresponding to Figure 21.3.

.

.

When analyzing mean radial spectral densities (1D), fBm and FD models, which do not represent both the long and short-memory behaviors, cannot provide a good fit. When analyzing nonhomogeneous and nonisotropic sea SAR images, due to their directionality, suitable 2D LRD spectral models, corresponding to nonhomogeneous and anisotropic random fields, must be defined to better fit their power spectral densities.

Here we have presented a brief review of the most important LRD models as well as an explanation of the techniques of analysis and discrimination of sea SAR images corresponding to oil slicks (or spills) and low-wind areas. In the context of MRPSD modeling, we have shown that FEXP models improve the goodness of fit of FARIMA models and ensure lower computational costs. In particular, we have used Bertacca’s noniterative technique for estimating the LRD parameter d. The result obtained by using this algorithm has been compared with that corresponding to the estimation of parameters in a generalized linear model. The two estimates of the parameter d, corresponding to the sea SAR subimage of Figure 21.3, are dI ¼ 0:471, dII ¼ b(0) 2 ¼ 0:518, from Table 21.1 and Table 21.2, respectively. This demon-

Sea-Surface Anomalies in Sea SAR Imagery

485

1010

2D PSD

105

100

10–5 0.2 0.1

0.2 0.1

0 0

–0.1

–0.1 –0.2

–0.2

ky (radians/m)

kx (radians/m)

FIGURE 21.9 The isotropic spectral density estimated from the low-wind area of Figure 21.2.

strates the reliability of the noniterative algorithm: it prevents the estimation of the set R fkj ,ngm j¼1 inside a loop and increases the computational efficiency. We notice that the ratios of the FD parameters for oil spill and clean sea subimages are always lower than those corresponding to low-wind and clean sea areas. From Table 21.1, we have that dOS 0:568 ¼ 1:206 ¼ 0:471 dS dLW 0:423 ¼ 4:406 ¼ 0:096 dS

(21:73)

As observed by Bertacca et al. [8], the discrimination among clean water, oil spill, and low-wind areas is very easy by using either the long-memory parameter d or the short SRD, in the case of rough sea surfaces. The two parameters must be memory parameter A used together only to resolve different darker areas corresponding to smooth sea surfaces. Furthermore, using large square window sizes in the estimation of the SAR image PSD allows low-frequency spectral components to be represented and good estimates of the fractional differencing parameter to be obtained. Therefore, the proposed technique can be better used to distinguish oil spills or slicks with large extent from low-wind areas. In the context of modeling of 2D spectral densities, we have shown that the anisotropic models introduced in the literature by linear spatial transformations of isotropic fractal fields, by spatial filtering of fields with desired characteristics, or by building intrinsically

Signal and Image Processing for Remote Sensing

486

1010

2D PSD

105

100

0.2 0.1

0.2 0.1

0 0

–0.1

–0.1 –0.2

–0.2

ky (rad/m)

kx (rad/m)

FIGURE 21.10 The 2D isotropic FEXP model (m ¼ 20) of the PSD in Figure 21.9.

× 1010

–0.25

9 –0.2 8 –0.15 7

ky (rad/m)

–0.1

6

–0.05

5

0 0.05

4

0.1

3

0.15

2

0.2

1

0.25

–0.2

–0.1

0 kx (rad/m)

FIGURE 21.11 Anisotropic PSD estimate from a directional sea SAR image.

0.1

0.2

Sea-Surface Anomalies in Sea SAR Imagery

487

× 1010 9

–0.25 –0.2

8

–0.15

7

–0.1 ky (radians/m)

6 –0.05 5 0 4

0.05

3

0.1 0.15

2

0.2

1

0.25 –0.2

–0.1

0 kx (radians/m)

0.1

0.2

FIGURE 21.12 Anisotropic 2D model of the PSD of Figure 21.11.

anisotropic fields, are not suitable for representing the spectral densities of sea SAR images. In this chapter, two anisotropic spectral models have been presented. They have been obtained by either adding or multiplying a 2D isotropic FEXP to an anisotropic term defined starting from the fractal model of sea surface spectra [6,37]. To illustrate this, we have shown an anisotropic sea SAR image PSD in Figure 21.11, and its spectral model, in Figure 21.12, obtained by adding a 2D FEXP model of order m ¼ 20 to an anisotropic fractal spectral component.

Re fer enc es 1. Berizzi, F. et al., Fractal mapping for sea surface anomalies recognition, IEEE Proc. IGARSS 2003, Toulouse, France, 4, 2665–2667, 2003. 2. Franceschetti, G. et al., SAR raw signal simulation of oil slicks in ocean environments, IEEE Trans. Geosci. Rem. Sens., 40, 1935, 2002. 3. Datcu, M., Model for SAR images, Int. Symp. Opt. Eng. Photonics Aerospace Sens., SPIE, Orlando, FL, Apr 1992. 4. Stewart, C.V. et al., Fractional Brownian motion models for synthetic aperture radar imagery scene segmentation, Proc. IEEE, 81, 1511, 1993. 5. Ilow, J. and Leung, H., Self-similar texture modeling using FARIMA processes with applications to satellite images, IEEE Trans. Image Process., 10, 792, 2001. 6. Berizzi, F. and Dalle Mese, E., Sea-wave fractal spectrum for SAR remote sensing, IEEE Proc. Radar Sonar Navigat., 148, 56, 2001. 7. Bertacca, M. et al., A FARIMA-based analysis for wind falls and oil slicks discrimination in sea SAR imagery, Proc. IGARSS 2004, Alaska, 7, 4703, 2004.

488

Signal and Image Processing for Remote Sensing

8. Bertacca, M., Berizzi, F., and Dalle Mese, E., A FARIMA-based technique for oil slick and lowwind areas discrimination in sea SAR imagery, IEEE Trans. Geosci. Rem. Sens., 43, 2484, 2005. 9. Bertacca, M., Berizzi, F., and Dalle Mese, E., FEXP models for oil slick and low-wind areas analysis and discrimination in sea SAR images, SEASAR 2006: Advances in SAR oceanophy from ENVISAT and ERS mission, ESA ESRIN, Prascati (Rome), Italy, 2006 to ESA-SEASAR 2006 Advances in SAR oceanography from ENVISAT and ERS missions. 10. Hasselmann, K. and Hasselmann, S., The nonlinear mapping of an ocean wave spectrum into a synthetic aperture radar image spectrum and its inversion, J. Geophys. Res., 96, 713, 1991. 11. Therrien, C.W., Discrete Random Signals and Statistical Signal Processing, Prentice Hall, Englewood Cliffs, NJ, 1992. 12. Oppenheim, A.V. and Schafer, R.W. Digital Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, 1975. 13. Priestley, M.B., Spectral Analysis of Time Series, Accdemic Press, London, 1981. 14. Willinger, W., et al., Self-similarity in high-speed packet traffic: analysis and modeling of ethernet traffic measurements, Stat. Sci., 10, 67, 1995. 15. Rose, O., Estimation of the Hurst parameter of long-range dependent time series, Institute Of Computer Science, University of Wu¨rzburg, Wu¨rzburg, Germany, Res. Rep. 137, 1996. 16. Lamperti, J.W., Semi-stable stochastic processes, Trans. Am. Math. Soc., 104, 62, 1962. 17. Veervaat, W., Properties of general self-similar processes, Bull. Int. Statist. Inst., 52, 199, 1987. 18. Veervaat, W., Sample path properties of self-similar processes with stationary increments, Ann. Probab., 13, 1, 1985. 19. Beran, J., Statistics for Long-Memory Processes, Chapman & Hall, London, U.K., 1994, chap.2. 20. Sinai, Ya.G., Self-similar probability distributions, Theory Probab. Appl., 21, 64, 1976. 21. Akaike, H., A limiting process which asymptotically produces f2 spectral density, Ann. Inst. Statist. Math., 12, 7, 1960. 22. Sayles, R.S. and Thomas, T.R., Surface topography as a nonstationary random process, Nature, 271, 431, 1978. 23. Sarpkaya, T. and Isaacson, M., Mechanics of Wave Forces on Offshore Structures, Van Nostrand Reinhold, New York, 1981. 24. Bretschneider, C.L., Wave variability and wave spectra for wind-generated gravity waves, Beach Erosion Board, U.S. Army Corps Eng., U.S. Gov. Printing Off., Washington, DC, Tech. Memo., pp. 118, 1959. 25. Mandelbrot, B.B. and Van Ness, J.W., Fractional Brownian motions, fractional noises and applications, SIAM Rev., 10, 422, 1968. 26. Ma, S. and Chuanyi, J., Modeling heterogeneous network traffic in wavelet domain, IEEE/ACM Trans. Networking, 9, 634, 2001. 27. Granger, C.W.J. and Joyeux, R., An introduction to long-range time series models and fractional differencing, J. Time Ser. Anal., 1, 15, 1980. 28. Hosking, J.R.M., Fractional differencing, Biometrika, 68, 165, 1981. 29. Beran, J., Fitting long-memory models by generalized linear regression, Biometrika, 80, 785, 1993. 30. Whittle, P., Estimation and information in stationary time series, Ark. Mat., 2, 423, 1953. 31. McCullagh, P. and Nelder, J.A., Generalized Linear Models, Chapman and Hall, London, 1983. 32. Yajima, Y., A central limit theorem of Fourier transforms of strongly dependent stationary processes, J. Time Ser. Anal., 10, 375, 1989. 33. Bloomfield, P., An exponential model for the spectrum of a scalar time series, Biometrika, 60, 217, 1973. 34. Diggle, P., Time Series: A Biostatistical Introduction, Oxford University Press, Oxford, 1990. 35. Reed, I., Lee, P., and Truong, T., Spectral representation of fractional Brownian motion in n dimension and its properties, IEEE Trans. Inf. Theory, 41, 1439, 1995. 36. Ulaby, F.T., Moore, R.K., and Fung, A.K., Microwave Remote Sensing, Artech House Inc., Vol. II, Norwood, 1982, chap. 11. 37. Bertacca, M., Analisi spettrale delle immagini SAR della superficie marina per la discriminazione delle anomalie di superficie, Ph.D. thesis, University of Pisa, Pisa, 2005. 38. Geweke, J. and Porter-Hudak, S., The estimation and application of long memory time series models, J. Time Ser. Anal., 4, 221,1983.

Sea-Surface Anomalies in Sea SAR Imagery

489

39. Robinson, P.M. and Hidalgo, F.J. Time series regression with long-range dependence, Ann. Statist., 25, 77, 1997. 40. Lim, J., Two-Dimensional Signal and Image Processing, Prentice-Hall, Englewood Cliffs, NJ, 1990. 41. Levi, E.C., Complex-curve fitting, IRE Trans. Autom. Control, AC-4, 37, 1959. 42. Dennis, J.E. Jr. and Schnabel, R.B., Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall, Upper Saddle River, NJ, 1983. 43. Pesquet-Popescu, B. and Ve´hel, J.L., Stochastic fractal models for image processing, IEEE Signal Process. Mag., 19, 48, 2002. 44. Berizzi, F., Bertacca, M., et al., Development and validation of a sea surface fractal model: project results and new perspectives, in Proc. 2004 Envisat & ERS Symp., Salzburg, Austria, 2004.

22 Spatial Techniques for Image Classification*

Selim Aksoy

CONTENTS 22.1 Introduction ..................................................................................................................... 491 22.2 Pixel Feature Extraction................................................................................................. 493 22.3 Pixel Classification.......................................................................................................... 497 22.4 Region Segmentation...................................................................................................... 502 22.5 Region Feature Extraction ............................................................................................. 504 22.6 Region Classification ...................................................................................................... 506 22.7 Experiments ..................................................................................................................... 506 22.8 Conclusions...................................................................................................................... 509 Acknowledgments ..................................................................................................................... 512 References ................................................................................................................................... 512

22.1

Introduction

The amount of image data that is received from satellites is constantly increasing. For example, nearly 3 terabytes of data are being sent to Earth by NASA’s satellites every day [1]. Advances in satellite technology and computing power have enabled the study of multi-modal, multi-spectral, multi-resolution, and multi-temporal data sets for applications such as urban land-use monitoring and management, GIS and mapping, environmental change, site suitability, and agricultural and ecological studies. Automatic content extraction, classification, and content-based retrieval have become highly desired goals for developing intelligent systems for effective and efficient processing of remotely sensed data sets. There is extensive literature on classification of remotely sensed imagery using parametric or nonparametric statistical or structural techniques with many different features [2]. Most of the previous approaches try to solve the content extraction problem by building pixel-based classification and retrieval models using spectral and textural features. However, a recent study [3] that investigated classification accuracies reported in the last 15 years showed that there has not been any significant improvement in the

*This work was supported by the TUBITAK CAREER Grant 104E074 and European Commission Sixth Framework Programme Marie Curie International Reintegration Grant MIRG-CT-2005-017504.

491

492

Signal and Image Processing for Remote Sensing

performance of classification methodologies over this period. The reason behind this problem is the large semantic gap between the low-level features used for classification and the high-level expectations and scenarios required by the users. This semantic gap makes a human expert’s involvement and interpretation in the final analysis inevitable, and this makes processing of data in large remote-sensing archives practically impossible. Therefore, practical accessibility of large remotely sensed data archives is currently limited to queries on geographical coordinates, time of acquisition, sensor type, and acquisition mode [4]. The commonly used statistical classifiers model image content using distributions of pixels in spectral or other feature domains by assuming that similar land-cover and land-use structures will cluster together and behave similarly in these feature spaces. However, the assumptions for distribution models often do not hold for different kinds of data. Even when nonlinear tools such as neural networks or multi-classifier systems are used, the use of only pixel-based data often fails expectations. An important element of understanding an image is the spatial information because complex land structures usually contain many pixels that have different feature characteristics. Remote-sensing experts also use spatial information to interpret the land-cover because pixels alone do not give much information about image content. Image segmentation techniques [5] automatically group neighboring pixels into contiguous regions based on similarity criteria on the pixels’ properties. Even though image segmentation has been heavily studied in image processing and computer vision fields, and despite the early efforts [6] that use spatial information for classification of remotely sensed imagery, segmentation algorithms have only recently started receiving emphasis in remote-sensing image analysis. Examples of image segmentation in the remote-sensing literature include region growing [7] and Markov random field models [8] for segmentation of natural scenes, hierarchical segmentation for image mining [9], region growing for object-level change detection [10] and fuzzy rule–based classification [11], and boundary delineation of agricultural fields [12]. We model spatial information by segmenting images into spatially contiguous regions and classifying these regions according to the statistics of their spectral and textural properties and shape features. To develop segmentation algorithms that group pixels into regions, first, we use nonparametric Bayesian classifiers that create probabilistic links between low-level image features and high-level user-defined semantic land-cover and land-use labels. Pixel-level characterization provides classification details for each pixel with automatic fusion of its spectral, textural, and other ancillary attributes [13]. Then, each resulting pixel-level classification map is converted into a set of contiguous regions using an iterative split-and-merge algorithm [13,14] and mathematical morphology. Following this segmentation process, resulting regions are modeled using the statistical summaries of their spectral and textural properties along with shape features that are computed from region polygon boundaries [14,15]. Finally, nonparametric Bayesian classifiers are used with these region-level features that describe properties shared by groups of pixels to classify these groups into land-cover and land-use categories defined by the user. The rest of the chapter is organized as follows. An overview of feature data used for modeling pixels is given in Section 22.2. Bayesian classifiers used for classifying these pixels are described in Section 22.3. Algorithms for segmentation of regions are presented in Section 22.4. Feature data used for modeling resulting regions are described in Section 22.5. Application of the Bayesian classifiers to region-level classification is described in Section 22.6. Experiments are presented in Section 22.7 and conclusions are provided in Section 22.8.

Spatial Techniques for Image Classification

22.2

493

Pixel Feature Extraction

The algorithms presented in this chapter will be illustrated using three different data sets: .

DC Mall: Hyperspectral digital image collection experiment (HYDICE) image with 1,280 307 pixels and 191 spectral bands corresponding to an airborne data flightline over the Washington DC Mall area. The DC Mall data set includes seven land-cover and land-use classes: roof, street, path, grass, trees, water, and shadow. A thematic map with ground-truth labels for 8,079 pixels was supplied with the original data [2]. We used this ground truth for testing and separately labeled 35,289 pixels for training. Details are given in Figure 22.1.

(a) DC Mall data

(b) Training map

(c) Test map

Roof (3834) Roof (5106) Street (416) Street (5068) Path (175) Path (1144) Grass (1928) Grass (8545) Trees (5078)

Trees (405)

Water (9157)

Water (1224)

Shadow (1191)

Shadow (97)

FIGURE 22.1 (See color insert following page 302.) False color image of the DC Mall data set (generated using the bands 63, 52, and 36) and the corresponding groundtruth maps for training and testing. The number of pixels for each class is shown in parenthesis in the legend.

Signal and Image Processing for Remote Sensing

494 .

Centre: Digital airborne imaging spectrometer (DAIS) and reflective optics system imaging spectrometer (ROSIS) data with 1,096 715 pixels and 102 spectral bands corresponding to the city center in Pavia, Italy. The Centre data set includes nine land-cover and land-use classes: water, trees, meadows, self-blocking bricks, bare soil, asphalt, bitumen, tiles, and shadow. The thematic maps for ground truth contain 7,456 pixels for training and 148,152 pixels for testing. Details are given in Figure 22.2.

.

University: DAIS and ROSIS data with 610 340 pixels and 103 spectral bands corresponding to a scene over the University of Pavia, Italy. The University data set also includes nine land-cover and land-use classes: asphalt, meadows, gravel, trees, (painted) metal sheets, bare soil, bitumen, self-blocking bricks, and shadow. The thematic maps for ground truth contain 3,921 pixels for training and 42,776 pixels for testing. Details are given in Figure 22.3.

(b) Training map (a) Centre data

Water (824) Trees (820) Meadows (824) Self-blocking bricks (808) Bare soil (820) Asphalt (816) Bitumen (808) Tiles (1260) Shadow (476)

(c) Test map

Water (65971) Trees (7598) Meadows (3090) Self-blocking bricks (2685) Bare soil (6584) Asphalt (9248) Bitumen (7287) Tiles (42826) Shadow (2863)

FIGURE 22.2 (See color insert following page 302.) False color image of the Centre data set (generated using the bands 68, 30, and 2) and the corresponding groundtruth maps for training and testing. The number of pixels for each class is shown in parenthesis in the legend. (A missing vertical section in the middle was removed.)

Spatial Techniques for Image Classification

495 (b) Training map

(a) University data

Asphalt (548) Meadows (540) Gravel (392) Trees (524) (Painted) metal shoots (265) Bare soil (532) Bitumen (375) Self-blocking bricks (514) Shadow (231)

(c) Test map

Asphalt (6631) Meadows (18649) Gravel (2099) Trees (3064) (Painted) metal shoots (1345) Bare soil (5029) Bitumen (1330) Self-blocking bricks (3682) Shadow (947)

FIGURE 22.3 (See color insert following page 302.) False color image of the University data set (generated using the bands 68, 30, and 2) and the corresponding ground-truth maps for training and testing. The number of pixels for each class is shown in parenthesis in the legend.

The Bayesian classification framework that will be described in the rest of the chapter supports fusion of multiple feature representations such as spectral values, textural features, and ancillary data such as elevation from DEM. In the rest of the chapter, pixel-level characterization consists of spectral and textural properties of pixels that are extracted as described below. To simplify computations and to avoid the curse of dimensionality during the analysis of hyperspectral data, we apply Fisher’s linear discriminant analysis (LDA) [16] that finds a projection to a new set of bases that best separate the data in a least-square sense. The resulting number of bands for each data set is one less than the number of classes in the ground truth. We also apply principal components analysis (PCA) [16] that finds a projection to a new set of bases that best represent the data in a least-square sense. Then, we retain the top ten principal components instead of the large number of hyperspectral bands. In addition,

Signal and Image Processing for Remote Sensing

496

we extract Gabor texture features [17] by filtering the first principal component image with Gabor kernels at different scales and orientations shown in Figure 22.4. We use kernels rotated by np/4, n ¼ 0, . . . , 3, at four scales resulting in feature vectors of length 16. In previous work [13], we observed that, in general, microtexture analysis algorithms like Gabor features smooth noisy areas and become useful for modeling neighborhoods of pixels by distinguishing areas that may have similar spectral responses but have different spatial structures. Finally, each feature component is normalized by linear scaling to unit variance [18] as ~ x ¼

xm s

(22:1)

~ is the normalized value, m is the sample mean, where x is the original feature value, x and s is the sample standard deviation of that feature, so that the features with larger

s = 1, o = 0°

s = 1, o = 45°

s = 1, o = 90°

s = 1, o = 135°

s = 2, o = 0°

s = 2, o = 45°

s = 2, o = 90°

s = 2, o = 135°

s = 3, o = 0°

s = 3, o = 45°

s = 3, o = 90°

s = 3, o = 135°

s = 4, o = 0°

s = 4, o = 45°

s = 4, o = 90°

s = 4, o = 135°

FIGURE 22.4 Gabor texture filters at different scales (s ¼ 1, . . . , 4) and orientations (o 2 {08,458,908,1358}). Each filter is approximated using 3131 pixels.

Spatial Techniques for Image Classification

497

FIGURE 22.5 Pixel feature examples for the DC Mall data set. From left to right: the first LDA band, the first PCA band, Gabor features for 908 orientation at the first scale, Gabor features for 08 orientation at the third scale, and Gabor features for 458 orientation at the fourth scale. Histogram equalization was applied to all images for better visualization.

ranges do not bias the results. Examples for pixel-level features are shown in Figure 22.5 through Figure 22.7.

22 .3

Pi xel Clas sific ation

We use Bayesian classifiers to create subjective class definitions that are described in terms of easily computable objective attributes such as spectral values, texture, and ancillary data [13]. The Bayesian framework is a probabilistic tool to combine information from multiple sources in terms of conditional and prior probabilities. Assume there are k class labels, w1, . . . , wk, defined by the user. Let x1, . . . , xm be the attributes computed for a pixel. The goal is to find the most probable label for that pixel given a particular set of values of these attributes. The degree of association between the pixel and class wj can be computed using the posterior probability

498

Signal and Image Processing for Remote Sensing

FIGURE 22.6 Pixel feature examples for the Centre data set. From left to right, first row: the first LDA band, the first PCA band, and Gabor features for 1358 orientation at the first scale; second row: Gabor features for 458 orientation at the third scale, Gabor features for 458 orientation at the fourth scale, and Gabor features for 1358 orientation at the fourth scale. Histogram equalization was applied to all images for better visualization.

p(wj j x1 , . . . , xm ) p(x1 , . . . , xm j wj )p(wj ) ¼ p(x1 , . . . , xm ) p(x1 , . . . , xm j wj )p(wj ) ¼ p(x1 , . . . , xm j wj )p(wj ) þ p(x1 , . . . , xm j :wj )p(:wj ) Q p(wj ) m i ¼ 1 p(xi j wj ) Qm Q ¼ p(wj ) i ¼ 1 p(xi j wj ) þ p(:wj ) m i ¼ 1 p(xi j :wj )

(22:2)

under the conditional independence assumption. The conditional independence assumption simplifies learning because the parameters for each attribute model p(xi j wj) can be estimated separately. Therefore, user interaction is only required for the labeling of pixels as positive (wj) or negative (–wj) examples for a particular class under training. Models for

Spatial Techniques for Image Classification

499

FIGURE 22.7 Pixel feature examples for the University data set. From left to right, first row: the first LDA band, the first PCA band, and Gabor features for 458 orientation at the first scale; second row: Gabor features for 458 orientation at the third scale, Gabor features for 1358 orientation at the third scale, and Gabor features for 1358 orientation at the fourth scale. Histogram equalization was applied to all images for better visualization.

different classes are learned separately from the corresponding positive and negative examples. Then, the predicted class becomes the one with the largest posterior probability and the pixel is assigned the class label w*j ¼ arg max p(wj j x1 , . . . , xm ) j ¼ 1, ... , k

(22:3)

We use discrete variables and a nonparametric model in the Bayesian framework where continuous features are converted to discrete attribute values using the unsupervised kmeans clustering algorithm for vector quantization. The number of clusters (quantization

Signal and Image Processing for Remote Sensing

500

levels) is empirically chosen for each feature. (An alternative is to use a parametric distribution assumption, for example, Gaussian, for each individual continuous feature but these parametric assumptions do not always hold.) Schroder et al. [19] used similar classifiers to retrieve images from remote-sensing archives by approximating the probabilities of images belonging to different classes using pixel-level probabilities. In the following, we describe learning of the models for p(xi j wj) using the positive training examples for the jth class label. Learning of p(xi j : wj) is done the same way using the negative examples. For a particular class, let each discrete variable xi have ri possible values (states) with probabilities p(xi ¼ z j ui ) ¼ uiz > 0

(22:4)

where z 2 {1, . . . , ri} and ui ¼ {uiz}zri¼ 1 is the set of parameters for the ith attribute model. This corresponds to a multinomial distribution. Since maximum likelihood estimates can give unreliable results when the sample is small and the number of parameters is large, we use the Bayes estimate of uiz that can be computed as the expected value of the posterior distribution. We can choose any prior for ui in the computation of the posterior distribution but there is a big advantage in using conjugate priors. A conjugate prior is one which, when multiplied with the direct probability, gives a posterior probability having the same functional form as the prior, thus allowing the posterior to be used as a prior in further computations [20]. The conjugate prior for the multinomial distribution is the Dirichlet distribution [21]. Geiger and Heckerman [22] showed that if all allowed states of the variables are possible (i.e., uiz > 0) and if certain parameter independence assumptions hold, then a Dirichlet distribution is indeed the only possible choice for the prior. Given the Dirichlet prior p(ui) ¼ Dir(ui j ai1, . . . , airi), where aiz are positive constants, the posterior distribution of ui can be computed using the Bayes rule as p(ui j D) ¼

p(D j ui )p(ui ) p(D)

¼ Dir(ui j ai1 þ Ni1 , . . . , airi þ Niri )

(22:5)

where D is the training sample and Niz is the number of cases in D in which xi ¼ z. Then, the Bayes estimate for uiz can be found by taking the conditional expected value aiz þ Niz u^iz ¼ Ep(ui j D) [uiz ] ¼ ai þ Ni

(22:6)

P P where ai ¼ zri¼ 1 aiz and Ni ¼ zri¼ 1 Niz. An intuitive choice for the hyperparameters ai1, . . . , airi of the Dirichlet distribution is Laplace’s uniform prior [23] that assumes all ri states to be equally probable (aiz ¼ 1 8z 2 {1, . . . , ri}), which results in the Bayes estimate 1 þ Niz u^iz ¼ ri þ N i

(22:7)

Laplace’s prior is regarded to be a safe choice when the distribution of the source is unknown and the number of possible states ri is fixed and known [24].

Spatial Techniques for Image Classification

501

Given the current state of the classifier that was trained using the prior information and the sample D, we can easily update the parameters when new data D0 is available. The new posterior distribution for ui becomes p(ui j D, D0 ) ¼

p(D0 j ui )p(ui j D) p(D0 j D)

(22:8)

With the Dirichlet priors and the posterior distribution for p(ui j D) given in Equation 22.5, the updated posterior distribution becomes 0 p(ui j D, D0 ) ¼ Dir(ui j ai1 þ Ni1 þ Ni1 , . . . , airi þ Niri þ Nir0 i )

(22:9)

where Niz0 is the number of cases in D0 in which xi ¼ z. Hence, updating the classifier parameters involves only updating the counts in the estimates for ^iz. The Bayesian classifiers that are learned from examples as described above are used to compute probability maps for all land-cover and land-use classes and assign each pixel to one of these classes using the maximum a posteriori probability (MAP) rule given in Equation 22.3. Example probability maps are shown in Figure 22.8 through Figure 22.10.

FIGURE 22.8 Pixel-level probability maps for different classes of the DC Mall data set. From left to right: roof, street, path, trees, shadow. Brighter values in the map show pixels with high probability of belonging to that class.

Signal and Image Processing for Remote Sensing

502

FIGURE 22.9 Pixel-level probability maps for different classes of the Centre data set. From left to right, first row: trees, selfblocking bricks, asphalt; second row: bitumen, tiles, shadow. Brighter values in the map show pixels with high probability of belonging to that class.

22.4

Region Segmentation

Image segmentation is used to group pixels that belong to the same structure with the goal of delineating each individual structure as an individual region. In previous work [25], we used an automatic segmentation algorithm that breaks an image into many small regions and merges them by minimizing an energy functional that trades off the similarity of regions against the length of their shared boundaries. We have also recently experimented with several segmentation algorithms from the computer vision literature. Algorithms that are based on graph clustering [26], mode seeking [27], and classification [28] have been reported to be successful in moderately sized color images with relatively homogeneous structures. However, we could not apply these techniques successfully to our data sets because the huge amount of data in hyperspectral images made processing infeasible due to both memory and computational requirements, and the detailed

Spatial Techniques for Image Classification

503

FIGURE 22.10 Pixel-level probability maps for different classes of the University data set. From left to right, first row: asphalt, meadows, trees; second row: metal sheets, self-blocking bricks, shadow. Brighter values in the map show pixels with high probability of belonging to that class.

structure in high-resolution remotely sensed imagery prevented the use of sampling that has been often used to reduce the computational requirements of these techniques. The segmentation approach we have used in this work consists of smoothing filters and mathematical morphology. The input to the algorithm includes the probability maps for all classes, where each pixel is assigned either to one of these classes or to the reject class for probabilities smaller than a threshold (the latter type of pixels are initially marked as background). Because pixel-based classification ignores spatial correlations, the initial segmentation may contain isolated pixels with labels different from those of their neighbors. We use an iterative split-and-merge algorithm [13] to convert this intermediate step into contiguous regions as follows: 1. Merge pixels with identical class labels to find the initial set of regions and mark these regions as foreground.

Signal and Image Processing for Remote Sensing

504

2. Mark regions with areas smaller than a threshold as background using connected components analysis [5]. 3. Use region growing to iteratively assign background pixels to the foreground regions by placing a window at each background pixel and assigning it to the class that occurs the most in its neighborhood. This procedure corresponds to a spatial smoothing of the clustering results. We further process the resulting regions using mathematical morphology operators [5] to automatically divide large regions into more compact subregions as follows [13]: 1. Find individual regions using connected components analysis for each class. 2. For all regions, compute the erosion transform [5] and repeat: – – – – –

Threshold erosion transform at steps of 3 pixels in every iteration Find connected components of the thresholded image Select subregions that have an area smaller than a threshold Dilate these subregions to restore the effects of erosion Mark these subregions in the output image by masking the dilation using the original image – Until no more sub-regions are found 3. Merge the residues of previous iterations to their smallest neighbors. The merging and splitting process is illustrated in Figure 22.11. The probability of each region belonging to a land-cover or land-use class can be estimated by propagating class labels from pixels to regions. Let X ¼ {x1, . . . , xn} be the set of pixels that are merged to form a region. Let wj and p(wj j xi) be the class label and its posterior probability, respectively, assigned to pixel xi by the classifier. The probability p(wj j x 2 X) that a pixel in the merged region belongs to the class wj can be computed as p(wj j x 2 X) p(wj , x 2 X) p(wj , x 2 X) ¼ Pk p(x 2 X) t ¼ 1 p(wt , x 2 X) P P p(wj j x)p(x) x2X p(wj , x) ¼ Pk P ¼ Pk x2X P x2X p(wt , x) x2X p(wt j x)p(x) t¼1 t¼1

¼

n Ex {Ix2X (x)p(wj j x)} 1X ¼ Pk p(wj j xi ) ¼ n i¼1 t ¼ 1 Ex {Ix2X (x)p(wt j x)}

(22:10)

where IA() is the indicator function associated with the set A. Each region in the final segmentation is assigned labels with probabilities using Equation 22.10.

22.5

Region Feature Extraction

Region-level representations include properties shared by groups of pixels obtained through region segmentation. The regions are modeled using the statistical summaries of their spectral and textural properties along with shape features that are computed from

Spatial Techniques for Image Classification (a) A large connected region formed by merging pixels labeled as street in DC Mall data

(b) More compact subregions after splitting the region in (a)

505 (c) A large connected region formed by merging pixels labeled as tiled in Centre data

(d) More compact subregions after splitting the region in (c).

FIGURE 22.11 (See color insert following page 302.) Examples for the region segmentation process. The iterative algorithm that uses mathematical morphology operators is used to split a large connected region into more compact subregions.

region polygon boundaries. The statistical summary for a region is computed as the means and standard deviations of features of the pixels in that region. Multi-dimensional histograms also provide pixel feature distributions within individual regions. The shape properties [5] of a region correspond to its . . .

.

Aarea Orientation of the region’s major axis with respect to the x axis Eccentricity (ratio of the distance between the foci to the length of the major axis; for example, a circle is an ellipse with zero eccentricity) Euler number (1 minus the number of holes in the region)

.

Solidity (ratio of the area to the convex area) Extent (ratio of the area to the area of the bounding box)

.

Spatial variances along the x and y axes

.

Spatial variances along the region’s principal (major and minor) axes Resulting in a feature vector of length 10

.

.

Signal and Image Processing for Remote Sensing

506

22 .6

Regi on Clas sific atio n

In the remote-sensing literature, image classification is usually done by using pixel features as input to classifiers such as minimum distance, maximum likelihood, neural networks, or decision trees. However, large within-class variations and small betweenclass variations of these features at the pixel level and the lack of spatial information limit the accuracy of these classifiers. In this work, we perform final classification using region-level information. To use the Bayesian classifiers that were described in Section 22.3, different region-based features such as statistics and shape features are independently converted to discrete random variables using the k-means algorithm for vector quantization. In particular, for each region, we obtain four values from

.

Clustering of the statistics of the LDA bands (6 bands for DC Mall data, 8 bands for Centre and University data) Clustering of the statistics of the 10 PCA bands

.

Clustering of the statistics of the 16 Gabor bands

.

Clustering of the 10 shape features

.

In the next section, we evaluate the performance of these new features for classifying regions (and the corresponding pixels) into land-cover and land-use categories defined by the user.

22.7

Expe riments

Performances of the features and the algorithms described in the previous sections were evaluated both quantitatively and qualitatively. First, pixel-level features (LDA, PCA, and Gabor) were extracted and normalized for all three data sets as described in Section 22.2. The ground-truth maps shown in Figure 22.1 through Figure 22.3 were used to divide the data into independent training and test sets. Then, the k-means algorithm was used to cluster (quantize) the continuous features and convert them to discrete attribute values, and Bayesian classifiers with discrete nonparametric models were trained using these attributes and the training examples as described in Section 22.3. The value of k was set to 25 empirically for all data sets. Example probability maps for some of the classes were given in Figure 22.8 through Figure 22.10. Confusion matrices, shown in Table 22.1 through Table 22.3, were computed using the test ground truth for all data sets. Next, the iterative split-and-merge algorithm described in Section 22.4 was used to convert the pixel-level classification results into contiguous regions. The neighborhood size for region growing was set to 3 3. The minimum area threshold in the segmentation process was set to 5 pixels. After the region-level features (LDA, PCA, and Gabor statistics, and shape features) were computed and normalized for all resulting regions as described in Section 22.5, they were also clustered (quantized) and converted to discrete values. The value of k was set to 25 again for all data sets. Then, Bayesian classifiers were trained using the training ground truth as described in Section 22.6, and were applied to the test data to produce the confusion matrices shown in Table 22.4 through Table 22.6.

Spatial Techniques for Image Classification

507

TABLE 22.1 Confusion Matrix for Pixel-Level Classification of the DC Mall Data Set (Testing Subset) Using LDA, PCA, and Gabor Features Assigned True

Roof

Street

Path

Grass

Trees

Water

Shadow

Total

% Agree

Roof Street Path Grass Trees Water Shadow Total

3771 0 0 0 0 0 0 3771

49 412 0 0 0 0 4 465

12 0 175 0 0 0 0 187

0 0 0 1926 0 0 0 1926

1 0 0 2 405 0 0 408

0 0 0 0 0 1223 0 1223

1 4 0 0 0 1 93 99

3834 416 175 1928 405 1224 97 8079

98.3568 99.0385 100.0000 99.8963 100.0000 99.9183 95.8763 99.0840

TABLE 22.2 Confusion Matrix for Pixel-Level Classification of the Centre Data Set (Testing Subset) Using LDA, PCA, and Gabor Features Assigned True

Bare Water Trees Meadows Bricks Soil Asphalt Bitumen

Tiles

Shadow

Total

% Agree

Water Trees Meadows Bricks Bare soil Asphalt Bitumen Tiles Shadow Total

65,877 1 0 0 0 4 4 0 12 65,898

0 0 0 0 3 5 9 41,826 0 41,843

85 29 0 0 0 756 53 211 2371 3505

65,971 7598 3090 2685 6584 9248 7287 42,826 2863 148,152

99.8575 84.4959 87.9612 83.3520 78.7667 85.3914 83.1755 97.6650 82.8152 94.8985

0 6420 349 0 9 0 0 1 0 6779

1 1094 2718 0 110 0 1 0 0 3924

0 5 0 2238 1026 317 253 150 3 3992

1 0 22 221 5186 30 22 85 0 5567

7 45 1 139 191 7897 884 437 477 10,078

0 4 0 87 59 239 6061 116 0 6566

TABLE 22.3 Confusion Matrix for Pixel-Level Classification of the University Data Set (Testing Subset) Using LDA, PCA, and Gabor Features Assigned True Asphalt Meadows Gravel Trees Metal sheets Bare soil Bitumen Bricks Shadow Total

Metal Bare Asphalt Meadows Gravel Trees Sheets Soil Bitumen Bricks Shadow Total 4045 21 91 5 0 34 424 382 22 5024

38 14,708 14 76 2 1032 1 45 0 15,916

391 14 1466 1 0 7 7 959 0 2845

39 691 0 2927 1 38 1 2 0 3699

1 0 0 0 1341 20 0 1 0 1363

105 3132 3 40 0 3745 1 87 0 7113

1050 11 19 1 0 32 829 141 0 2083

875 71 506 2 1 119 67 2064 2 3707

87 1 0 12 0 2 0 1 923 1026

6631 18,649 2099 3064 1345 5029 1330 3682 947 42,776

% Agree 61.0014 78.8675 69.8428 95.5287 99.7026 74.4681 62.3308 56.0565 97.4657 74.9205

Signal and Image Processing for Remote Sensing

508 TABLE 22.4

Confusion Matrix for Region-Level Classification of the DC Mall Data Set (Testing Subset) Using LDA, PCA, and Gabor Statistics, and Shape Features Assigned True

Roof

Street

Path

Grass

Trees

Water

Shadow

Total

% Agree

Roof Street Path Grass Trees Water Shadow Total

3814 0 0 0 0 0 1 3815

11 414 0 0 0 1 2 428

5 0 175 0 0 0 0 180

0 0 0 1928 0 0 0 1928

0 0 0 0 405 0 0 405

1 0 0 0 0 1223 0 1224

3 2 0 0 0 0 94 99

3834 416 175 1928 405 1224 97 8079

99.4784 99.5192 100.0000 100.0000 100.0000 99.9183 96.9072 99.678

TABLE 22.5 Confusion Matrix for Region-Level Classification of the Centre Data Set (Testing Subset) Using LDA, PCA, and Gabor Statistics, and Shape Features Assigned True

Water

Trees

Meadows

Bricks

Bare Soil

Asphalt

Bitumen

Tiles

Shadow

Total

% Agree

Water Trees Meadows Bricks Bare soil Asphalt Bitumen Tiles Shadow Total

65,803 0 0 0 1 0 0 0 38 65,842

0 6209 138 0 4 1 0 0 0 6352

0 1282 2942 1 59 2 0 0 2 4288

0 28 0 2247 257 37 24 39 2 2634

0 22 10 173 6139 4 3 13 0 6364

0 11 0 31 11 8669 726 220 341 10,009

0 5 0 233 102 163 6506 2 12 7023

0 0 0 0 0 0 0 42,380 0 42,380

168 41 0 0 11 372 28 172 2468 3260

65,971 7598 3090 2685 6584 9248 7287 42,826 2863 148,152

99.7453 81.7189 95.2104 83.6872 93.2412 93.7392 89.2823 98.9586 86.2033 96.7675

TABLE 22.6 Confusion Matrix for Region-Level Classification of the University Data Set (Testing Subset) Using LDA, PCA, and Gabor Statistics, and Shape Features Assigned True Asphalt Meadows Gravel Trees Metal sheets Bare soil Bitumen Bricks Shadow Total

Metal Bare Asphalt Meadows Gravel Trees Sheets Soil Bitumen Bricks Shadow 4620 8 9 39 0 0 162 248 16 5102

7 17,246 5 37 0 991 0 13 0 18,299

281 0 1360 0 0 0 0 596 0 2237

4 1242 2 2941 0 5 0 33 0 4227

0 0 0 0 1344 0 0 5 1 1350

52 19 0 4 0 4014 0 21 0 4110

344 6 0 13 0 0 1033 125 0 1521

1171 7 723 14 1 19 135 2635 1 4706

152 121 0 16 0 0 0 6 929 1224

Total

% Agree

6631 18,649 2099 3064 1345 5029 1330 3682 947 42,776

69.6727 92.4768 64.7928 95.9856 99.9257 79.8171 77.6692 71.5644 98.0993 84.4445

Spatial Techniques for Image Classification

509

TABLE 22.7 Summary of Classification Accuracies Using the Pixel-Level and Region-Level Bayesian Classifiers and the Quadratic Gaussian Classifier

Pixel-level Bayesian Region-level Bayesian Quadratic Gaussian

DC Mall

Centre

University

99.0840 99.6782 99.3811

94.8985 96.7675 93.9677

74.9205 84.4445 81.2792

Finally, comparative experiments were done by training and evaluating traditional maximum likelihood classifiers with the multi-variate Gaussian with full covariance matrix assumption for each class (quadratic Gaussian classifier) using the same training and test ground-truth data. The classification performances of all three classifiers (pixellevel Bayesian, region-level Bayesian, quadratic Gaussian) are summarized in Table 22.7. For qualitative comparison, the classification maps for all classifiers for all data sets were computed as shown in Figure 22.12 through Figure 22.14. The results show that the proposed region-level features and Bayesian classifiers performed better than the traditional maximum likelihood classifier with the Gaussian density assumption for all data sets with respect to the ground-truth maps available. Using texture features, which model spatial neighborhoods of pixels, in addition to the spectral-based ones improved the performances of all classifiers. Using the Gabor filters at the third and fourth scales (corresponding to eight features) improved the results the most. (The confusion matrices presented show the performances of using these features instead of the original 16.) The reason for this is the high spatial image resolution where filters with a larger coverage include mixed effects from multiple structures within a pixel’s neighborhood. Using region-level information gave the most significant improvement for the University data set. The performances of pixel-level classifiers for DC Mall and Centre data sets using LDA- and PCA-based spectral and Gabor-based textural features were already quite high. In all cases, region-level classification performed better than pixel-level classifiers. One important observation to note is that even though the accuracies of all classifiers seem quite high, some misclassified areas can still be found in the classification maps for all images. This is especially apparent in the results of pixel-level classifiers where many isolated pixels that are not covered by test ground-truth maps (e.g., the upper part of the DC Mall data, tiles on the left of the Centre data, many areas in the University data) were assigned wrong class labels because of the lack of spatial information and, hence, the context. The same phenomenon can be observed in many other results published in the literature. A more detailed ground truth is necessary for a more reliable evaluation of classifiers for high-resolution imagery. We believe that there is still a large margin for improvement in the performance of classification techniques for data received from stateof-the-art satellites.

22.8

Conclusions

We have presented an approach for classification of remotely sensed imagery using spatial techniques. First, pixel-level spectral and textural features were extracted and used for classification with nonparametric Bayesian classifiers. Next, an iterative

Signal and Image Processing for Remote Sensing

510 (a) Pixel-level Bayesian

(b) Region-level Bayesian

(c) Quadratic Gaussian

FIGURE 22.12 (See color insert following page 302.) Final classification maps with the Bayesian pixel-and region-level classifiers and the quadratic Gaussian classifier for the DC Mall data set. Class color codes were listed in Figure 22.1.

split-and-merge algorithm was used to convert the pixel-level classification maps into contiguous regions. Then, spectral and textural statistics and shape features extracted from these regions were used with similar Bayesian classifiers to compute the final classification maps. Comparative quantitative and qualitative evaluation using traditional maximum likelihood Gaussian classifiers in experiments with three different data sets with ground truth showed that the proposed region-level features and Bayesian classifiers performed better than the traditional pixel-level classification techniques. Even though the numerical results already look quite impressive, we believe that selection of the most discriminative

Spatial Techniques for Image Classification (a) Pixel-level Bayesian

(b) Region-level Bayesian

511 (c) Quadratic Gaussian

FIGURE 22.13 (See color insert following page 302.) Final classification maps with the Bayesian pixel-and region-level classifiers and the quadratic Gaussian classifier for the Centre data set. Class color codes were listed in Figure 22.2.

subset of features and better segmentation of regions will bring further improvements in classification accuracy. We are also in the process of gathering ground-truth data with a larger coverage for better evaluation of classification techniques for images from highresolution satellites. (a) Pixel-level Bayesian

(b) Region-level Bayesian

(c) Quadratic Gaussian

FIGURE 22.14 (See color insert following page 302.) Final classification maps with the Bayesian pixel- and region-level classifiers and the quadratic Gaussian classifier for the University data set. Class color codes were listed in Figure 22.3.

512

Signal and Image Processing for Remote Sensing

Acknowledgments The author would like to thank Dr. David A. Landgrebe and Mr. Larry L. Biehl from Purdue University, Indiana, U.S.A., for the DC Mall data set, and Dr. Paolo Gamba from the University of Pavia, Italy, for the Centre and University data sets.

References 1. S.S. Durbha and R.L. King, Knowledge mining in earth observation data archives: a domain ontology perspective, in Proceedings of IEEE International Geoscience and Remote Sensing Symposium, September 1, 2004. 2. D.A. Landgrebe, Signal Theory Methods in Multispectral Remote Sensing, John Wiley & Sons, Inc., New York, 2003. 3. G.G. Wilkinson, Results and implications of a study of fifteen years of satellite image classification experiments, IEEE Transactions on Geoscience and Remote Sensing, 43(3), 433–440, 2005. 4. M. Datcu, H. Daschiel, A. Pelizzari, M. Quartulli, A. Galoppo, A. Colapicchioni, M. Pastori, K. Seidel, P.G. Marchetti, and S. D’Elia, Information mining in remote sensing image archives: system concepts, IEEE Transactions on Geoscience and Remote Sensing, 41(12), 2923–2936, 2003. 5. R.M. Haralick and L.G. Shapiro, Computer and Robot Vision, Addison-Wesley, Reading, MA, 1992. 6. R.L. Kettig and D.A. Landgrebe, Classification of multispectral image data by extraction and classification of homogeneous objects, IEEE Transactions on Geoscience Electronics, GE-14(1), 19–26, 1976. 7. C. Evans, R. Jones, I. Svalbe, and M. Berman, Segmenting multispectral Landsat TM images into field units, IEEE Transactions on Geoscience and Remote Sensing, 40(5), 1054–1064, 2002. 8. A. Sarkar, M.K. Biswas, B. Kartikeyan, V. Kumar, K.L. Majumder, and D.K. Pal, A MRF modelbased segmentation approach to classification for multispectral imagery, IEEE Transactions on Geoscience and Remote Sensing, 40(5), 1102–1113, 2002. 9. J.C. Tilton, G. Marchisio, K. Koperski, and M. Datcu, Image information mining utilizing hierarchical segmentation, in Proceedings of IEEE International Geoscience and Remote Sensing Symposium, 2, 1029–1031, Toronto, Canada, June 2002. 10. G.G. Hazel, Object-level change detection in spectral imagery, IEEE Transactions on Geoscience and Remote Sensing, 39(3), 553–561, 2001. 11. T. Blaschke, Object-based contextual image classification built on image segmentation, in Proceedings of IEEE GRSS Workshop on Advances in Techniques for Analysis of Remotely Sensed Data, 113–119, Washington, DC, October 2003. 12. A. Rydberg and G. Borgefors, Integrated method for boundary delineation of agricultural fields in multispectral satellite images, IEEE Transactions on Geoscience and Remote Sensing, 39(11), 2514–2520, 2001. 13. S. Aksoy, K. Koperski, C. Tusk, G. Marchisio, and J.C. Tilton, Learning Bayesian classifiers for scene classification with a visual grammar, IEEE Transactions on Geoscience and Remote Sensing, 43(3), 581–589, 2005. 14. S. Aksoy and H.G. Akcay, Multi-resolution segmentation and shape analysis for remote sensing image classification, in Proceedings of 2nd International Conference on Recent Advances in Space Technologies, Istanbul, Turkey, June 9–11, 599–604, 2005. 15. S. Aksoy, K. Koperski, C. Tusk, and G. Marchisio, Interactive training of advanced classifiers for mining remote sensing image archives, in Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 773–782, Seattle, WA, August 22–25, 2004. 16. R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification, John Wiley & Sons, Inc., New York, 2000.

Spatial Techniques for Image Classification

513

17. B.S. Manjunath and W.Y. Ma, Texture features for browsing and retrieval of image data, IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8), 837–842, 1996. 18. S. Aksoy and R.M. Haralick, Feature normalization and likelihood-based similarity measures for image retrieval, Pattern Recognition Letters, 22(5), 563–582, 2001. 19. M. Schroder, H. Rehrauer, K. Siedel, and M. Datcu, Interactive learning and probabilistic retrieval in remote sensing image archives, IEEE Transactions on Geoscience and Remote Sensing, 38(5), 2288–2298, 2000. 20. C.M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, 1995. 21. M.H. DeGroot, Optimal Statistical Decisions, McGraw-Hill, New York, 1970. 22. D. Geiger and D. Heckerman, A characterization of the Dirichlet distribution through global and local parameter independence, The Annals of Statistics, 25(3), 1344–1369, 1997, MSRTR-94-16. 23. T.M. Mitchell, Machine Learning, McGraw-Hill, New York, 1997. 24. R.F. Krichevskiy, Laplace’s law of succession and universal encoding, IEEE Transactions on Information Theory, 44(1), 296–303, 1998. 25. S. Aksoy, C. Tusk, K. Koperski, and G. Marchisio, Scene modeling and image mining with a visual grammar, in C.H. Chen, ed., Frontiers of Remote Sensing Information Processing, World Scientific, 2003, 35–62. 26. J. Shi and J. Malik, Normalized cuts and image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905, 2000. 27. D. Comaniciu and P. Meer, Mean shift: a robust approach toward feature space analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5), 603–619, 2002. 28. P. Paclik, R.P.W. Duin, G.M.P. van Kempen, and R. Kohlus, Segmentation of multispectral images using the combined classifier approach, Image and Vision Computing, 21(6), 473–482, 2003.

23 Data Fusion for Remote-Sensing Applications

Anne H.S. Solberg

CONTENTS 23.1 Introduction ..................................................................................................................... 516 23.2 The ‘‘Multi’’ Concept in Remote Sensing ................................................................... 516 23.2.1 The Multi-Spectral or Multi-Frequency Aspect ........................................... 516 23.2.2 The Multi-Temporal Aspect............................................................................ 517 23.2.3 The Multi-Polarization Aspect........................................................................ 517 23.2.4 The Multi-Sensor Aspect ................................................................................. 517 23.2.5 Other Sources of Spatial Data ......................................................................... 517 23.3 Multi-Sensor Data Registration .................................................................................... 518 23.4 Multi-Sensor Image Classification ............................................................................... 520 23.4.1 A General Introduction to Multi-Sensor Data Fusion for Remote-Sensing Applications ......................................................................... 520 23.4.2 Decision-Level Data Fusion for Remote-Sensing Applications ................ 520 23.4.3 Combination Schemes for Combining Classifier Outputs......................... 522 23.4.4 Statistical Multi-Source Classification ........................................................... 523 23.4.5 Neural Nets for Multi-Source Classification ................................................ 523 23.4.6 A Closer Look at Dempster–Shafer Evidence Theory for Data Fusion ... 524 23.4.7 Contextual Methods for Data Fusion ............................................................ 525 23.4.8 Using Markov Random Fields to Incorporate Ancillary Data .................. 526 23.4.9 A Summary of Data Fusion Architectures ................................................... 526 23.5 Multi-Temporal Image Classification.......................................................................... 526 23.5.1 Multi-Temporal Classifiers.............................................................................. 529 23.5.1.1 Direct Multi-Date Classification .................................................... 529 23.5.1.2 Cascade Classifiers .......................................................................... 529 23.5.1.3 Markov Chain and Markov Random Field Classifiers.............. 530 23.5.1.4 Approaches Based on Characterizing the Temporal Signature............................................................................................ 530 23.5.1.5 Other Decision-Level Approaches to Multi-Temporal Classification..................................................................................... 530 23.6 Multi-Scale Image Classification .................................................................................. 530 23.7 Concluding Remarks...................................................................................................... 532 23.7.1 Fusion Level....................................................................................................... 533 23.7.2 Selecting a Multi-Sensor Classifier................................................................. 533 23.7.3 Selecting a Multi-Temporal Classifier ........................................................... 533 23.7.4 Approaches for Multi-Scale Data ................................................................... 533 Acknowledgment....................................................................................................................... 533 References ................................................................................................................................... 533 515

Signal and Image Processing for Remote Sensing

516

23.1

Introduction

Earth observation is currently developing more rapidly than ever before. During the last decade the number of satellites has been growing steadily, and the coverage of the Earth in space, time, and the electromagnetic spectrum is increasing correspondingly fast. The accuracy in classifying a scene can be increased by using images from several sensors operating at different wavelengths of the electromagnetic spectrum. The interaction between the electromagnetic radiation and the earth’s surface is characterized by certain properties at different frequencies of electromagnetic energy. Sensors with different wavelengths provide complementary information about the surface. In addition to image data, prior information about the scene might be available in the form of map data from geographic information systems (GIS). The merging of multi-source data can create a more consistent interpretation of the scene compared to an interpretation based on data from a single sensor. This development opens up for a potential significant change in the approach of analysis of earth observation data. Traditionally, analysis of such data has been by means of analysis of a single satellite image. The emerging exceptionally good coverage in space, time, and the spectrum opens for analysis of time series of data, combining different sensor types, combining imagery of different scales, and better integration with ancillary data and models. Thus, data fusion to combine data from several sources is becoming increasingly more important in many remote-sensing applications. This paper provides a tutorial on data fusion for remote-sensing applications. The main focus is on methods for multi-source image classification, but separate sections on multisensor image registration, multi-scale classification, and multi-temporal image classification are also included. The remainder of this chapter is organized in the following manner: in Section 23.2 the ‘‘multi’’ concept in remote sensing is presented. Multi-sensor data registration is treated in Section 23.3. Classification strategies for multi-sensor applications are discussed in Section 23.4. Multi-temporal image classification is discussed in Section 23.5, while multi-scale approaches are discussed in Section 23.6. Concluding remarks are given in Section 23.7.

23.2

The ‘‘Multi’’ Concept in Remote Sensing

The variety of different sensors already available or being planned creates a number of possibilities for data fusion to provide better capabilities for scene interpretation. This is referred to as the ‘‘multi’’ concept in remote sensing. The ‘‘multi’’ concept includes multitemporal, multi-spectral or multi-frequency, multi-polarization, multi-scale, and multisensor image analysis. In addition to the concepts discussed here, imaging using multiple incidence angles can also provide additional information [1,2].

23.2.1

The Multi-Spectral or Multi-Frequency Aspect

The measured backscatter values for an area vary with the wavelength band. A land-use category will give different image signals depending on the frequency used, and by using different frequencies, a spectral signature that characterizes the land-use category can be found. A description of the scattering mechanisms for optical sensors can be found in

Data Fusion for Remote-Sensing Applications

517

Ref. [3], while Ref. [4] contains a thorough discussion of the backscattering mechanisms in the microwave region. Multi-spectral optical sensors have demonstrated this effect for a substantial number of applications for several decades; they are now followed by high-spatial-resolution multi-spectral sensors such as Ikonos and Quickbird, and by hyperspectral sensors from satellite platforms (e.g., Hyperion).

23.2.2

The Multi-Temporal Aspect

The term multi-temporal refers to the repeated imaging of an area over a period. By analyzing an area through time, it is possible to develop interpretation techniques based on an object’s temporal variations and to discriminate different pattern classes accordingly. Multi-temporal imagery allows, the study of the variation of backscatter of different areas with time, weather conditions, and seasons. It also allows monitoring of processes that change over time. The principal advantage of multi-temporal analysis is the increased amount of information for the study area. The information provided for a single image is, for certain applications, not sufficient to properly distinguish between the desired pattern classes. This limitation can sometimes be resolved by examining the pattern of temporal changes in the spectral signature of an object. This is particularly important for vegetation applications. Multi-temporal image analysis is discussed in more detail in Section 23.5.

23.2.3

The Multi-Polarization Aspect

The multi-polarization aspect is related to microwave image data. The polarization of an electromagnetic wave refers to the orientation of the electric field during propagation. A review of the theory and features of polarization is given in Refs. [5,6].

23.2.4

The Multi-Sensor Aspect

With an increasing number of operational and experimental satellites, information about a phenomenon can be captured using different types of sensors. Fusion of images from different sensors requires some additional preprocessing and poses certain difficulties that are not solved in traditional image classifiers. Each sensor has its own characteristics, and the image captured usually contains various artifacts that should be corrected or removed. The images also need to be geometrically corrected and co-registered. Because the multi-sensor images often are not acquired on the same data, the multi-temporal nature of the data must also often be explained. Figure 23.1 shows a simple visualization of two synthetic aperture radar (SAR) images from an oil spill in the Baltic sea, imaged by the ENVISAT ASAR sensor and the Radarsat SAR sensor. The images were taken a few hours apart. During this time, the oil slick has drifted to some extent, and it has become more irregular in shape.

23.2.5

Other Sources of Spatial Data

The preceding sections have addressed spatial data in the form of digital images obtained from remote-sensing satellites. For most regions, additional information is available in the form of various kinds of maps, for example, topography, ground cover, elevation, and so on. Frequently, maps contain spatial information not obtainable from a single remotely sensed image. Such maps represent a valuable information resource in addition to the

518

Signal and Image Processing for Remote Sensing

FIGURE 23.1 (See color insert following page 302.) Example of multi-sensor visualization of an oil spill in the Baltic sea created by combining an ENVISAT ASAR image with a Radarsat SAR image taken a few hours later.

satellite images. To integrate map information with a remotely sensed image, the map must be available in digital form, for example, in a GIS system.

23.3

Multi-Sensor Data Registration

A prerequisite for data fusion is that the data are co-registered, and geometrically and radiometrically corrected. Data co-registration can be simple if the data are georeferenced. In that case, the co-registration consists merely of resampling the images to a common map projection. However, an image-matching step is often necessary to obtain subpixel accuracy in matching. Complicating factors for multi-sensor data are the different appearances of the same object imaged by different sensors, and nonrigid changes in object position between multi-temporal images.

Data Fusion for Remote-Sensing Applications

519

The image resampling can be done at various stages of the image interpretation process. Resampling an image affects the spatial statistics of the neighboring pixel, which is of importance for many radar image feature extraction methods that might use speckle statistics or texture. When fusing a radar image with other data sources, a solution might be to transform the other data sources to the geometry of the radar image. When fusing a multi-temporal radar image, an alternative might be to use images from the same image mode of the sensor, for example, only ascending scenes with a given incidence angle range. If this is not possible and the spatial information from the original geometry is important, the data can be fused and resampling done after classification by the sensor-specific classifiers. An image-matching step may be necessary to achieve subpixel accuracy in the coregistration even if the data are georeferenced. A survey of image registration methods is given by Zitova and Flusser [7]. A full image registration process generally consists of four steps: .

Feature extraction. This is the step where regions, edges, and contours can be used to represent tie-points in the set of images to be matched are extracted. This is a crucial step, as the registration accuracy can be no better than what is achieved for the tie-points.

Feature extraction can be grouped into area-based methods [8,9], feature-based methods [10–12], and hybrid approaches [7]. In area-based methods, the gray levels of the images are used directly for matching, often by statistical comparison of pixel values in small windows, and they are best suited for images from the same or highly similar sensors. Feature-based methods will be application-dependent, as the type of features to use as tie points needs to be tailored to the application. Features can be extracted either from the spatial domain (edges, lines, regions, intersections, and so on) or from the frequency domain (e.g., wavelet features). Spatial features can perform well for matching data from heterogeneous sensors, for example, optical and radar images. Hybrid approaches use both area-based and feature-based techniques by combining both a correlation-based matching with an edge-based approach, and they are useful in matching data from heterogeneous sensors. .

Feature matching. In this step, the correspondence between the tie-points or features in the sensed image and the reference image is found. Area-based methods for feature extraction use correlation, Fourier-transform methods, or optical flow [13]. Feature-based methods use the equivalence between correlation in the spatial domain and multiplication in the Fourier domain to perform matching in the Fourier domain [10,11]. Correlation-based methods are best suited for data from similar sensors. The optical flow approach involves estimation of the relative motion between two images and is a broad approach. It is commonly used in video analysis, but only a few studies have used it in remote-sensing applications [29,30].

.

Transformation selection concerns the choice of mapping function and estimation of its parameters based on the established feature correspondence. The affine transform model is commonly used for remote-sensing applications, where the images normally are preprocessed for geometrical correction—a step that justifies the use of affine transforms.

.

Image resampling. In this step, the image is transformed by means of the mapping function. Image values in no-integer coordinates are computed by the

Signal and Image Processing for Remote Sensing

520

appropriate interpolation technique. Normally, either a nearest neighbor or a bilinear interpolation is used. Nearest neighbor interpolation is applicable when no new pixel values should be introduced. Bilinear interpolation is often a good trade-off between accuracy and computational complexity compared to cubic or higher order interpolation.

23.4

Multi-Sensor Image Classification

The literature on data fusion in the computer vision and machine intelligence domains is substantial. For an extensive review of data fusion, we recommend the book by Abidi and Gonzalez [16]. Multi-sensor architectures, sensor management, and designing sensor setup are also thoroughly discussed in Ref. [17].

23.4.1

A General Introduction to Multi-Sensor Data Fusion for Remote-Sensing Applications

Fusion can be performed at the signal, pixel, feature, or decision level of representation (see Figure 23.2). In signal-based fusion, signals from different sensors are combined to create a new signal with a better signal-to-noise ratio than the original signals [18]. Techniques for signal-level data fusion typically involve classic detection and estimation methods [19]. If the data are noncommensurate, they must be fused at a higher level. Pixel-based fusion consists of merging information from different images on a pixel-bypixel basis to improve the performance of image processing tasks such as segmentation [20]. Feature-based fusion consists of merging features extracted from different signals or images [21]. In feature-level fusion, features are extracted from multiple sensor observations, then combined into a concatenated feature vector, and classified using a standard classifier. Symbol-level or decision-level fusion consists of merging information at a higher level of abstraction. Based on the data from each single sensor, a preliminary classification is performed. Fusion then consists of combining the outputs from the preliminary classifications. The main approaches to data fusion in the remote-sensing literature are statistical methods [22–25], Dempster–Shafer theory [26–28], and neural networks [22,29]. We will discuss each of these approaches in the following sections. The best level and methodology for a given remote-sensing application depends on several factors: the complexity of the classification problem, the available data set, and the goal of the analysis.

23.4.2

Decision-Level Data Fusion for Remote-Sensing Applications

In the general multi-sensor fusion case, we have a set of images X1 XP from P sensors. The class labels of the scene are denoted C. The Bayesian approach is to assign each pixel to the class that maximizes the posterior probabilities P(C j X1, . . . , XP) P(CjX1 , . . . , XP ) ¼

P(X1 , . . . , XP jC)P(C) P(X1 , . . . , XP )

where P(C) is the prior model for the class labels.

(23:1)

Data Fusion for Remote-Sensing Applications

521

Decision-level fusion

Feature extraction

Classifier module

Statistical Consensus theory Neural nets Dempster−Shager Fusion module

Image data sensor 1

Feature extraction

Classifier module

Image data sensor p

Classified image

Feature-level fusion

Feature extraction

Classifier module

Image data sensor 1

Feature extraction

Statistical Neural nets Dempster−Shafer

Classified image

Image data sensor p

Pixel-level fusion

Classifier module

Image data sensor 1

Multi-band image data

Image data sensor p FIGURE 23.2 (See color insert following page 302.) An illustration of data fusion on different levels.

Statistical Neural nets Dempster−Shafer

Classified image

Signal and Image Processing for Remote Sensing

522

For decision-level fusion, the following conditional independence assumption is used: P(X1 , . . . , XP jC) P(X1 jC) P(XP jC) This assumption means that the measurements from the different sensors are considered to be conditionally independent.

23.4.3

Combination Schemes for Combining Classifier Outputs

In the data fusion literature [30], various alternative methods have been proposed for combining the outputs from the sensor-specific classifiers by weighting the influence of each sensor. This is termed consensus theory. The weighting schemes can be linear, logarithmic, or of a more general form (see Figure 23.3). The simplest choice, the linear opinion pool (LOP), is given by LOP(X1 , . . . , XP ) ¼

P X

P(Xp jC)lp

(23:2)

p¼1

The logarithmic opinion pool (LOGP) is given by LOGP(X1 , . . . , XP ) ¼

P Y

P(Xp jC)lp

(23:3)

p¼1

which is equivalent to the Bayesian combination if the weights lp are equal. This weighting scheme contradicts the statistical formulation in which the sensor’s uncertainty is supposed to be modeled by the variance of the probability density function. The weights are supposed to represent the sensor’s reliability. The weights can be selected by heuristic methods based on their goodness [3] by weighting a sensor’s influence by a factor proportional to its overall classification accuracy on the training data set. An alternative approach for a linear combination pool is to use a genetic algorithm [32]. An approach using a neural net to optimize the weights is presented in Ref. [30]. Yet another possibility is to choose the weights in such a way that they not only weigh the individual data sources but also the classes within the data sources [33].

Classifier for data source 1

P(w|x1) l1

Label w f (X,l)

Classifier for data source p

P(w|xp) lp

FIGURE 23.3 Schematic view of weighting the outputs from sensor-specific classifiers in decision-level fusion.

Data Fusion for Remote-Sensing Applications

523

Benediktsson et al. [30,31] use a multi-layer perceptron (MLP) neural network to combine the class-conditional probability densities P(Xp j C). This allows a more flexible, nonlinear combination scheme. They compare the classification accuracy using MLPs to LOPs and LOGPs, and find that the neural net combination performs best. Benediktsson and Sveinsson [34] provide a comparison of different weighting schemes for an LOP and LOGP, genetic algorithm with and without pruning, parallel consensus neural nets, and conjugate gradient backpropagation (CGBP) nets on a single multisource data set. The best results were achieved by using a CGBP net to optimize the weights in an LOGP. A study that contradicts the weighting of different sources is found in Ref. [35]. In this study, three different data sets (optical and radar) were merged using the LOGP, and the weights were varied between 0 and 1. Best results for all three data sets were found by using equal weights.

23.4.4

Statistical Multi-Source Classification

Statistical methods for fusion of remotely sensed data can be divided into four categories: the augmented vector approach, stratification, probabilistic relaxation, and extended statistical fusion. In the augmented vector approach, data from different sources are concatenated as if they were measurements from a single sensor. This is the most common approach for many application-oriented applications of multi-source classification, because no special software is needed. This is an example of pixel-level fusion. Such a classifier is difficult to use when the data cannot be modeled with a common probability density function, or when the data set includes ancillary data (e.g., from a GIS system). The fused data vector is then classified using ordinary single-source classifiers [36]. Stratification has been used to incorporate ancillary GIS data in the classification process. The GIS data are stratified into categories and then a spectral model for each of these categories is used [37]. Richards et al. [38] extended the methods used for spatially contextual classification based on probabilistic relaxation to incorporate ancillary data. The methods based on extended statistical fusion [10,43] were derived by extending the concepts used for classification of single-sensor data. Each data source is considered independently and the classification results are fused using weighted linear combinations. By using a statistical classifier one often assumes that the data have a multi-variate Gaussian distribution. Recent developments in statistical classifiers based on regression theory include choices of nonlinear classifiers [11–13,18–20,26,28,33,38,39–56]. For a comparison of neural nets and regression-based nonlinear classifiers, see Ref. [57].

23.4.5

Neural Nets for Multi-Source Classification

Many multi-sensor studies have used neural nets because no specific assumptions about the underlying probability densities are needed [40,58]. A drawback of neural nets in this respect is that they act like a black box in that the user cannot control the usage of different data sources. It is also difficult to explicitly use a spatial model for neighboring pixels (but one can extend the input vector from measurements from a single pixel to measurements from neighboring pixels). Guan et al. [41] utilized contextual information by using a network of neural networks with which they built a quadratic regularizer. Another drawback is that specifying a neural network architecture involves specifying a large number of parameters. A classification experiment should take care in choosing them and apply different configurations, making the complete training process very time

Signal and Image Processing for Remote Sensing

524

Sensor-specific neural net

P(w |x1) Multi-sensor fusion net Image data sensor 1 Sensor-specific neural net P(w |xp)

Image data sensor p Classified image FIGURE 23.4 (See color insert following page 302.) Network architecture for decision-level fusion using neural networks.

consuming [52,58]. Hybrid approaches combining statistical methods and neural networks for data fusion have also been proposed [30]. Benediktsson et al. [30] apply a statistical model to each individual source and use neural nets to reach a consensus decision. Most applications involving a neural net use an MLP or radial basis function network, but other neural network architectures can be used [59–61]. Neural nets for data fusion can be applied both at the pixel, feature, and decision level. For pixel- and feature-level fusion a single neural net is used to classify the joint feature vector or pixel measurement vector. For decision-level fusion, a network combination like the one outlined in Figure 23.4 is often used [29]. An MLP neural net is first used to classify the images from each source separately. Then, the outputs from the sensorspecific nets are fused and weighted in a fusion network.

23.4.6

A Closer Look at Dempster–Shafer Evidence Theory for Data Fusion

Dempster–Shafer theory of evidence provides a representation of multi-source data using two central concepts: plausibility and belief. Mathematical evidence theory was first introduced by Dempster in the 1960s, and later extended by Shafer [62]. A good introduction to Dempster–Shafer evidence theory for remote sensing data fusion is given in Ref. [28]. Plausibility (Pls) and belief (Bel) are derived from a mass function m, which is defined on the [0,1] interval. The belief and plausibility functions for an element A are defined as X Bel(A) ¼ m(B) (23:4) BA

Pls(A) ¼

X B \ A 6¼ ;

m(B):

(23:5)

Data Fusion for Remote-Sensing Applications

525

They are sometimes referred to as lower and upper probability functions. The belief value of hypothesis A can be interpreted as the minimum uncertainty value about A, and its plausibility as the maximum uncertainty [28]. Evidence from p different sources is combined by combining the mass functions m1 mp by Pm(;)¼0 If K 6¼ 1, m(A) ¼

B1 \\Bp ¼A

Q

1K

1ip

mi (Bi )

Q P where K ¼ B1\ \Bp ¼o 1 < i < p mi(Bi) is interpreted as a measure of conflict between the different sources. The decision rule used to combine the evidence from each sensor varies from different applications, either maximum of plausibility or maximum of belief (with variations). The performance of Dempster–Shafer theory for data fusion does however depend on the methods used to compute the mass functions. Lee et al. [20] assign nonzero mass function values only to the single classes, whereas Hegarat-Mascle et al. [28] propose two strategies for assigning mass function values to sets of classes according to the membership for a pixel for these classes. The concepts of evidence theory belong to a different school than Bayesian multi-sensor models. Researchers coming from one school often have a tendency to dislike modeling used in the alternative theory. Not many neutral comparisons of these two approaches exist. The main advantage of this approach is its robustness in the method by which information from several heterogeneous sources is combined. A disadvantage is the underlying basic assumption that the evidence from different sources is independent. According to Ref. [43], Bayesian theory assumes that imprecision about uncertainty in the measurements is assumed to be zero and uncertainty about an event is only measured by the probability. The author disagrees with this by pointing out that in Bayesian modeling, uncertainty about the measurements can be modeled in the priors. Priors of this kind are not always used, however. Priors in a Bayesian model can also be used to model spatial context and temporal class development. It might be argued that the Dempster– Shafer theory can be more appropriate for a high number of heterogeneous sources. However, most papers on data fusion for remote sensing consider two or maximum three different sources. 23.4.7

Contextual Methods for Data Fusion

Remote-sensing data have an inherent spatial nature. To account for this, contextual information can be incorporated in the interpretation process. Basically, the effect of context in an image-labeling problem is that when a pixel is considered in isolation, it may provide incomplete information about the desired characteristics. By considering the pixel in context with other measurements, more complete information might be derived. Only a limited set of studies have involved spatially contextual multi-source classification. Richards et al. [38] extended the methods used for spatial contextual classification based on probabilistic relaxation to incorporate ancillary data. Binaghi et al. [63] presented a knowledge-based framework for contextual classification based on fuzzy set theory. Wan and Fraser [61] used multiple self-organizing maps for contextual classification. Le He´garat-Mascle et al. [28] combined the use of a Markov random field model with the Dempster–Shafer theory. Smits and Dellepiane [64] used a multi-channel image segmentation method based on Markov random fields with adaptive neighborhoods. Markov random fields have also been used for data fusion in other application domains [65,66].

Signal and Image Processing for Remote Sensing

526 23.4.8

Using Markov Random Fields to Incorporate Ancillary Data

Schistad Solberg et al. [67,68] used a Markov random field model to include map data into the fusion. In this framework, the task is to estimate the class labels of the scene C given the image data X and the map data M (from a previous survey): P(CjX, M) ¼ P(XjC, M)P(C) with respect to C. The spatial context between neighboring pixels in the scene is modeled in P(C) using the common Ising model. By using the equivalence between Markov random fields and Gibbs distribution P( ) ¼

1 exp U( ) Z

where U is called the energy function and Z is a constant; the task of maximizing P(CjX,M) is equivalent to minimizing the sum U¼

P X

Udata(i) þ Uspatial, map

i¼1

Uspatial is the common Ising model: Uspatial ¼ bs

X

I (ci , ck )

k2N

and Umap ¼ bm

X

t(ci jmk )

k2M

mk is the class assigned to the pixel in the map, and t(cijmk) is the probability of a class transition from class mk to class ci. This kind of model can also be used for multi-temporal classification [67].

23.4.9

A Summary of Data Fusion Architectures

Table 23.1 gives a schematic view on different fusion architectures applied to remotesensing data.

23.5

Multi-Temporal Image Classification

For most applications where multi-source data are involved, it is not likely that all the images are acquired at the same time. When the temporal aspect is involved, the classification methodology must handle changes in pattern classes between the image acquisitions, and possibly also use different classes.

Data Fusion for Remote-Sensing Applications

527

TABLE 23.1 A Summary of Data Fusion Architectures Pixel-level fusion Advantages

Limitations Feature-level fusion Advantages

Limitations Decision-level fusion Advantages

Limitations

Simple. No special classifier software needed. Correlation between sources utilized. Well suited for change detection. Assumes that the data can be modeled using a common probability density function. Source reliability cannot be modeled. Simple. No special classifier software needed. Sensor-specific features give advantage over pixel-based fusion. Well suited for change detection. Assumes that the data can be modeled using a common probability density function. Source reliability cannot be modeled. Suited for data with different probability densities. Source-specific reliabilities can be modeled. Prior information about the source combination can be modeled. Special software often needed.

To find the best classification strategy for a multi-temporal data set, it is useful to consider the goal of the analysis and the complexity of the multi-temporal image data to be used. Multi-temporal image classification can be applied for different purposes: .

Monitor and identify specific changes. If the goal is to monitor changes, multitemporal data are required either in the form of a combination of existing maps and new satellite imagery or as a set of satellite images. For identifying changes, different fusion levels can be considered. Numerous methods for

TABLE 23.2 A Summary of Decision-Level Fusion Strategies Statistical multi-sensor classifiers Advantages Good control over the process. Prior knowledge can be included if the model is adapted to the application. Inclusion of ancillary data simple using a Markov random field approach. Limitations Assumes a particular probability density function. Dempster–Shafer multi-sensor classifiers Advantages Useful for representation of heterogeneous sources. Inclusion of ancillary data simple. Well suited to model a high number of sources. Limitations Performance depends on selected mass functions. Not many comparisons with other approaches. Neural net multi-sensor classifiers Advantages No assumption about probability densities needed. Sensor-specific weights can easily be estimated. Suited for heterogeneous sources. Limitations The user has little control over the fusion process and how different sources are used. Involves a large number of parameters and a risk of overfitting. Hybrid multi-sensor classifiers Advantages Can combine the best of statistical and neural net or Dempster–Shafer approaches. Limitations More complex to use.

Signal and Image Processing for Remote Sensing

528

.

.

change detection exist, ranging from pixel-level to decision-level fusion. Examples of pixel-level change detection are classical unsupervised approaches like image math, image regression, and principal component analysis of a multi-temporal vector of spectral measurements or derived feature vectors like normalized vegetation indexes. In this paper, we will not discuss in detail these well-established unsupervised methods. Decision-level change detection includes postclassification comparisons, direct multi-date classification, and more sophisticated classifiers. Improved quality in discriminating between a set of classes. Sometimes, parts of an area might be covered by clouds, and a multi-temporal image set is needed to map all areas. For microwave images, the signature depends on temperature and soil moisture content, and several images might be necessary to obtain good coverage of all regions in an area as two classes can have different mechanisms affecting their signature. For this kind of application, a data fusion model that takes source reliability weighting into account should be considered. An example concerning vegetation classification in a series of SAR images is shown in Figure 23.5. Discriminate between classes based on their temporal signature development. By analyzing an area through time and studying how the spectral signature changes, it is possible to discriminate between classes that are not separable on a single

FIGURE 23.5 Multi-temporal image from 13 different dates during August–December 1991 for agricultural sites in Norway. The ability to identify ploughing activity in a SAR image depends on the soil moisture content at the given date.

Data Fusion for Remote-Sensing Applications

529

image. Consider for example vegetation mapping. Based on a single image, we might be able to discriminate between deciduous and conifer trees, but not between different kinds of conifer or deciduous. By studying how the spectral signature varies during the growth season, we might also be able to discriminate between different vegetation species. It is also relevant to consider the available data set. How many images can be included in the analysis? Most studies use bi-temporal data sets, which are easy to obtain. Obtaining longer time series of images can sometimes be difficult due to sensor repeat cycles and weather limitations. In Northern Europe, cloud coverage is a serious limitation for many applications of temporal trajectory analysis. Obtaining long time series tends to be easier for low- and medium-resolution images from satellites with frequent passes. A principal decision in multi-temporal image analysis is whether the images are to be combined on the pixel level or the decision level. Pixel-level fusion consists of combining the multi-temporal images into a joint data set and performing the classification based on all data at the same time. In decision-level fusion, a classification is first performed for each time, and then the individual decisions are combined to reach a consensus decision. If no changes in the spectral signatures of the objects to be studied have occurred between the image acquisitions, then this is very similar to classifier combination [31].

23.5.1

Multi-Temporal Classifiers

In the following, we describe the main approaches for multi-temporal classification. The methods utilize temporal correlation in different ways. Temporal feature correlation means that the correlation between the pixel measurements or feature vectors at different times is modeled. Temporal class correlation means that the correlation between the class labels of a given pixel at different times is modeled. 23.5.1.1

Direct Multi-Date Classification

In direct compound or stacked vector classification, the multi-temporal data set is merged at the pixel level into one vector of measurements, followed by classification using a traditional classifier. This is a simple approach that utilizes temporal feature correlation. However, the approach might not be suited when some of the images are of lower quality due to noise. An example of this classification strategy is to use multiple self-organizing map (MSOM) [69] as a classifier for compound bi-temporal images. 23.5.1.2

Cascade Classifiers

Swain [70] presented the initial work on using cascade classifiers. In a cascade-classifier approach the temporal class correlation between multi-temporal images is utilized in a recursive manner. To find a class label for a pixel at time t2, the conditional probability for observing class v given the images x1 and x2 is modeled as P(vjx1 , x2 ) Classification was performed using a maximum likelihood classifier. In several papers by Bruzzone and co-authors [71,72] the use of cascade classifiers has been extended to unsupervised classification using multiple classifiers (combining both maximum likelihood classifiers and radial basis function neural nets).

Signal and Image Processing for Remote Sensing

530 23.5.1.3

Markov Chain and Markov Random Field Classifiers

Schistad Solberg et al. [67] describe a method for classification of multi-source and multitemporal images where the temporal changes of classes are modeled using Markov chains with transition probabilities. This approach utilizes temporal class correlation. In the Markov random field model presented in Ref. [25], class transitions are modeled in terms of Markov chains of possible class changes and specific energy functions are used to combine temporal information with multi-source measurements, and ancillary data. Bruzzone and Prieto [73] use a similar framework for unsupervised multi-temporal classification. 23.5.1.4 Approaches Based on Characterizing the Temporal Signature Several papers have studied changes in vegetation parameters (for a review see Ref. [74]). In Refs. [50,75] the temporal signatures of classes are modeled using Fourier series (using temporal feature correlation). Not many approaches have integrated phenological models for the expected development of vegetation parameters during the growth season. Aurdal et al. [76] model the phenological evolution of mountain vegetation using hidden Markov models. The different vegetation classes can be in one of a predefined set of states related to their phenological development, and classifying a pixel consists of selecting the class that has the highest probability of producing a given series of observations. The performance of this model is compared to a compound maximum likelihood approach and found to give comparable results for a single scene, but more robust when testing and training on different images. 23.5.1.5 Other Decision-Level Approaches to Multi-Temporal Classification Jeon and Landgrebe [46] developed a spatio-temporal classifier utilizing both the temporal and the spatial context of the image data. Khazenie and Crawford [47] proposed a method for contextual classification using both spatial and temporal correlation of data. In this approach, the feature vectors are modeled as resulting from a class-dependent process and a contaminating noise process, and the noise is correlated in both space and time. Middelkoop and Janssen [49] presented a knowledge-based classifier, which used landcover data from preceding years. An approach to decision-level change detection using evidence theory is given in Ref. [43]. A summary of approaches for multi-temporal image classifiers is given in Table 23.3.

23.6

Multi-Scale Image Classification

Most of the approaches to multi-sensor image classification do not treat the multi-scale aspect of the input data. The most common approach is to resample all the images to be fused to a common pixel resolution. In other domains of science, much work on combining data sources at different resolutions exists, for example, in epidemiology [77], in the estimation of hydraulic conductivity for characterizing groundwater flow [78], and in the estimation of environmental components [44]. These approaches are mainly for situations where the aim is to estimate an underlying continuous variable. The remote-sensing literature contains many examples of multi-scale and multi-sensor data visualization. Many multi-spectral sensors, such as SPOT XS or Ikonos, provide a combination of multi-spectral band and a panchromatic band of a higher resolution. Several methods for visualizing such multi-scale data sets have been proposed, and

Data Fusion for Remote-Sensing Applications

531

TABLE 23.3 A Discussion of Multi-Temporal Classifiers Direct multi-date classifier Advantages Limitations Cascade classifiers Advantages

Simple. Temporal feature correlation between image measurements utilized. Is restricted to pixel-level fusion. Not suited for data sets containing noisy images. Temporal correlation of class labels considered. Information about special class transitions can be modeled.

Limitations Special software needed. Markov chain and MRF classifiers Advantages Spatial and temporal correlation of class labels considered. Information about special class transitions can be modeled. Limitations Special software needed. Temporal signature trajectory approaches Advantages Can discriminate between classes not separable at a single point in time. Can be used either at feature level or at decision level. Decision-level approaches allow flexible modeling. Limitations Feature-level approaches can be sensitive to noise. A time series of images needed (can be difficult to get more than bi-temporal).

they are often based on overlaying a multi-spectral image on the panchromatic image using different colors. We will not describe such techniques in detail, but refer the reader to surveys like [51,55,79]. Van der Meer [80] studied the effect of multi-sensor image fusion in terms of information content for visual interpretation, and concluded that image fusion aiming at improving the visual content and interpretability was more successful for homogeneous data than for heteorogeneous data. For classification problems, Puyou-Lascassies [54] and Zhukov et al. [81] considered unmixing of low-resolution data by using class label information obtained from classification of high-resolution data. The unmixing is performed through several sequential steps, but no formal model for the complete data set is derived. Price [53] proposed unmixing by relating the correlation between low-resolution data and high-resolution data resampled to low resolution, to correlation between high-resolution data and lowresolution data resampled to high resolution. The possibility of mixed pixels was not taken into account. In Ref. [82], separate classifications were performed based on data from each resolution. The resulting resolution-dependent probabilities were averaged over the resolutions. Multi-resolution tree models are sometimes used for multi-scale analysis (see, e.g., Ref. [48]). Such models yield a multi-scale representation through a quad tree, in which each pixel at a given resolution is decomposed into four child pixels at higher resolution, which are correlated. This gives a model where the correlation between neighbor pixels depends on the pixel locations in an arbitrary (i.e., not problem-related) manner. The multi-scale model presented in Ref. [83] is based on the concept of a reference resolution and is developed in a Bayesian framework [84]. The reference resolution corresponds to the highest resolution present in the data set. For each pixel of the input image at the reference resolution it is assumed that there is an underlying discrete class. The observed pixel values are modeled conditionally on the classes. The properties of the class label image are described through an a priori model. Markov random fields have been selected for this purpose. Data at coarser resolutions are modeled as mixed pixels,

Signal and Image Processing for Remote Sensing

532 TABLE 23.4 A Discussion of Multi-Scale Classifiers Resampling combined with single-scale classifier Advantages Limitations Classifier with explicit multi-scale model Advantages Limitations

Simple. Works well enough for homogeneous regions. Can fail in identifying small or detailed structures. Can give increased performance for small or detailed structures. More complex software needed. Not necessary for homogeneous regions.

that is, the observations are allowed to include contributions from several distinct classes. In this way it is possible to exploit spectrally richer images at lower resolutions to obtain more accurate classification results at the reference level, without smoothing the results as much as if we simply oversample the low-resolution data to the reference resolution prior to the analysis. Methods that use a model for the relationship between the multi-scale data might offer advantages compared to simple resampling both in terms of increased classification accuracy and being able to describe relationships between variables measured at different scales. This can provide tools to predict high-resolution properties from coarser resolution properties. Of particular concern in the establishment of statistical relationships is the quantification of what is lost in precision at various resolutions and the associated uncertainty. The potential of using multi-scale classifiers will also depend on the level of detail needed for the application, and might be related to the typical size of the structures one wants to identify in the images. Even simple resampling of the coarsest resolution to the finest resolution, followed by classification using a multi-sensor classifier, can help improve the classification result. The gain obtained by using a classifier that explicitly models the data at different scales depends not only on the set of classes used but also on the regions used to train and test the classifier. For scenes with a high level of detail, for example, in urban scenes, the performance gain might be large. However, it depends also on how the classifier performance is evaluated. If the regions used for testing the classifier are well inside homogeneous regions and not close to other classes, the difference in performance in terms of overall classification accuracy might not be large, but visual inspection of the level of detail in the classified images can reveal the higher level of detail. A summary of multi-scale classification approaches is given in Table 23.4.

23.7

Concluding Remarks

A number of different approaches for data fusion in remote-sensing applications have been presented in the literature. A prerequisite for data fusion is that the data are coregistered and geometrically and radiometrically corrected. In general, there is no consensus on which multi-source or multi-temporal classification approach works best. Different studies and comparisons report different results. There is still a need for a better understanding on which methods are most suited to different applications types, and also broader comparison studies. The best level and methodology for a given remote-sensing application depends on several factors: the complexity of the

Data Fusion for Remote-Sensing Applications

533

classification problem, the available data set, the number of sensors involved, and the goal of the analysis. Some guidelines for selecting the methodology and architecture for a given fusion task are given below. 23.7.1

Fusion Level

Decision-level fusion gives best control and allows weighting the influence of each sensor. Pixel-level fusion can be suited for simple analysis, for example, fast unsupervised change detection. 23.7.2

Selecting a Multi-Sensor Classifier

If decision-level fusion is selected, three main approaches for fusion should be considered: the statistical approach, neural networks, or evidence theory. A hybrid approach can also be used to combine these approaches. If the sources are believed to provide data of different quality, weighting schemes for consensus combination of the sensor-specific classifiers should be considered. 23.7.3

Selecting a Multi-Temporal Classifier

To find the best classification strategy for a multi-temporal data set, the complexity of the class separation problem must be considered in light of the available data set. If the classes are difficult to separate, it might be necessary to use methods for characterizing the temporal trajectory of signatures. For pixel-level classification of multi-temporal imagery, the direct multi-date classification approach can be used. If specific knowledge about certain types of changes needs to be modeled, Markov chain and Markov random field approaches or cascade classifiers should be used. 23.7.4

Approaches for Multi-Scale Data

Multi-scale images can either be resampled to a common resolution or a classifier with implicit modeling of the relationship between the different scales can be used. For classification problems involving small or detailed structures (e.g., urban areas) or heteorogeneous sources, the latter is recommended.

Acknowledgment The author would like to thank Line Eikvil for valuable input, in particular, regardingmulti-sensor image registration.

References 1. C. Elachi, J. Cimino, and M. Settle, Overview of the shuttle imaging radar-B preliminary scientific results, Science, 232, 1511–1516, 1986.

534

Signal and Image Processing for Remote Sensing

2. J. Cimino, A. Brandani, D. Casey, J. Rabassa, and S.D. Wall, Multiple incidence angle SIR-B experiment over Argentina: Mapping of forest units, IEEE Trans. Geosc. Rem. Sens., 24, 498–509, 1986. 3. G. Asrar, Theory and Applications of Optical Remote Sensing, Wiley, New York, 1989. 4. F.T. Ulaby, R.K. Moore, and A.K. Fung, Microwave Remote Sensing, Active and Passive, Vols. I–III, Artech House Inc., 1981, 1982, 1986. 5. F.T. Ulaby and C. Elachi, Radar Polarimetry for Geoscience Applications, Artec House Inc., 1990. 6. H.A. Zebker and J.J. Van Zyl, Imaging radar polarimetry: a review, Proc. IEEE, 79, 1583– 1606, 1991. 7. B. Zitova and J. Flusser, Image registration methods: a survey, Image and Vision Computing, 21, 977–1000, 2003. 8. P. Chalermwat and T. El-Chazawi, Multi-resolution image registration using genetics, in Proc. ICIP, 452–456, 1999. 9. H.M. Chen, M.K. Arora, and P.K. Varshney, Mutual information-based image registration for remote sensing data, Int. J. Rem. Sens., 24, 3701–3706, 2003. 10. X. Dai and S. Khorram, A feature-based image registration algorithm using improved chain-code representation combined with invariant moments, IEEE Trans. Geosc. Rem. Sens., 37, 17–38, 1999. 11. D.M. Mount, N.S. Netanyahu, and L. Le Moigne, Efficient algorithms for robust feature matching, Pattern Recognition, 32, 17–38, 1999. 12. E. Rignot, R. Kwok, J.C. Curlander, J. Homer, and I. Longstaff, Automated multisensor registration: Requirements and techniques, Photogramm. Eng. Rem. Sens., 57, 1029–1038, 1991. 13. Z.-D. Lan, R. Mohr, and P. Remagnino, Robust matching by partial correlation, in British Machine Vision Conference, 651–660, 1996. 14. D. Fedorov, L.M.G. Fonseca, C. Kennedy, and B.S. Manjunath, Automatic registration and mosaicking system for remotely sensed imagery, in Proc. 9th Int. Symp. Rem. Sens., 22–27, Crete, Greece, 2002. 15. L. Fonseca, G. Hewer, C. Kenney, and B. Manjunath, Registration and fusion of multispectral images using a new control point assessment method derived from optical flow ideas, in Proc. Algorithms for Multispectral and Hyperspectral Imagery V, 104–111, SPIE, Orlando, USA, 1999. 16. M.A. Abidi and R.C. Gonzalez, Data Fusion in Robotics and Machine Intelligence, Academic Press, Inc., New York, 1992. 17. N. Xiong and P. Svensson, Multi-sensor management for information fusion: issues and approaches, Information Fusion, 3, 163–180, 2002. 18. J.M. Richardson and K.A. Marsh, Fusion of multisensor data, Int. J. Robot. Res. 7, 78–96, 1988. 19. D.L. Hall and J. Llinas, An introduction to multisensor data fusion, Proc. IEEE, 85(1), 6–23, 1997. 20. T. Lee, J.A. Richards, and P.H. Swain, Probabilistic and evidential approaches for multisource data analysis, IEEE Trans. Geosc. Rem. Sens., 25, 283–293, 1987. 21. N. Ayache and O. Faugeras, Building, registrating, and fusing noisy visual maps, Int. J. Robot. Res., 7, 45–64, 1988. 22. J.A. Benediktsson and P.H. Swain, A method of statistical multisource classification with a mechanism to weight the influence of the data sources, in IEEE Symp. Geosc. Rem. Sens. (IGARSS), 517–520, Vancouver, Canada, July 1989. 23. S. Wu, Analysis of data acquired by shuttle imaging radar SIR-A and Landsat Thematic Mapper over Baldwin county, Alabama, in Proc. Mach. Process. Remotely Sensed Data Symp., 173–182, West Lafayette, Indiana, June 1985. 24. A.H. Schistad Solberg, A.K. Jain, and T. Taxt, Multisource classification of remotely sensed data: Fusion of Landsat TM and SAR images, IEEE Trans. Geosc. Rem. Sens., 32, 768–778, 1994. 25. A. Schistad Solberg, Texture fusion and classification based on flexible discriminant analysis, in Int. Conf. Pattern Recogn. (ICPR), 596–600, Vienna, Austria, August 1996. 26. H. Kim and P.H. Swain, A method for classification of multisource data using interval-valued probabilities and its application to HIRIS data, in Proc. Workshop Multisource Data Integration Rem. Sens., 75–82, NASA Conference Publication 3099, Maryland, June 1990. 27. J. Desachy, L. Roux, and E-H. Zahzah, Numeric and symbolic data fusion: a soft computing approach to remote sensing image analysis, Pattern Recognition Letters, 17, 1361–1378, 1996.

Data Fusion for Remote-Sensing Applications

535

28. S.L. He´garat-Mascle, I. Bloch, and D. Vidal-Madjar, Application of Dempster–Shafer evidence theory to unsupervised classification in multisource remote sensing, IEEE Trans. Geosc. Rem. Sens., 35, 1018–1031, 1997. 29. S.B. Serpico and F. Roli, Classification of multisensor remote-sensing images by structured neural networks, IEEE Trans. Geosc. Rem. Sens., 33, 562–578, 1995. 30. J.A. Benediktsson, J.R. Sveinsson, and P.H. Swain, Hybrid consensys theoretic classification, IEEE Trans. Geosc. Rem. Sens., 35, 833–843, 1997. 31. J.A. Benediktsson and I. Kanellopoulos, Classification of multisource and hyperspectral data based on decision fusion, IEEE Trans. Geosc. Rem. Sens., 37, 1367–1377, 1999. 32. B.C.K. Tso and P.M. Mather, Classification of multisource remote sensing imagery using a genetic algorithm and Markov random fields, IEEE Trans. Geosc. Rem. Sens., 37, 1255–1260, 1999. 33. M. Petrakos, J.A. Benediktsson, and I. Kannelopoulos, The effect of classifier agreement on the accuracy of the combined classifier in decision level fusion, IEEE Trans. Geosc. Rem. Sens., 39, 2539–2546, 2001. 34. J.A. Benediktsson and J. Sveinsson, Multisource remote sensing data classification based on consensus and pruning, IEEE Trans. Geosc. Rem. Sens., 41, 932–936, 2003. 35. A. Solberg, G. Storvik, and R. Fjørtoft, A comparison of criteria for decision fusion and parameter estimation in statistical multisensor image classification, in IEEE Symp. Geosc. Rem. Sens. (IGARSS’02), July 2002. 36. D.G. Leckie, Synergism of synthetic aperture radar and visible/infrared data for forest type discrimination, Photogramm. Eng. Rem. Sens., 56, 1237–1246, 1990. 37. S.E. Franklin, Ancillary data input to satellite remote sensing of complex terrain phenomena, Comput. Geosci., 15, 799–808, 1989. 38. J.A. Richards, D.A. Landgrebe, and P.H. Swain, A means for utilizing ancillary information in multispectral classification, Rem. Sens. Environ., 12, 463–477, 1982. 39. J. Friedman, Multivariate adaptive regression splines (with discussion), Ann. Stat., 19, 1–141, 1991. 40. P. Gong, R. Pu, and J. Chen, Mapping ecological land systems and classification uncertainties from digital elevation and forest-cover data using neural networks, Photogramm. Eng. Rem. Sens., 62, 1249–1260, 1996. 41. L. Guan, J.A. Anderson, and J.P. Sutton, A network of networks processing model for image regularization, IEEE Trans. Neural Networks, 8, 169–174, 1997. 42. T. Hastie, R. Tibshirani, and A. Buja, Flexible discriminant analysis by optimal scoring, J. Am. Stat. Assoc., 89, 1255–1270, 1994. 43. S. Le He´garat-Mascle and R. Seltz, Automatic change detection by evidential fusion of change indices, Rem. Sens. Environ., 91, 390–404, 2004. 44. D. Hirst, G. Storvik, and A.R. Syversveen, A hierarchical modelling approach to combining environmental data at different scales, J. Royal Stat. Soc., Series C, 52, 377–390, 2003. 45. J.-N. Hwang, D. Li, M. Maechelr, D. Martin, and J. Schimert, Projection pursuit learning networks for regression, Eng. Appl. Artif. Intell., 5, 193–204, 1992. 46. B. Jeon and D.A. Landgrebe, Classification with spatio-temporal interpixel class dependency contexts, IEEE Trans. Geosc. Rem. Sens., 30, 663–672, 1992. 47. N. Khazenie and M.M. Crawford, Spatio-temporal autocorrelated model for contextual classification, IEEE Trans. Geosc. Rem. Sens., 28, 529–539, 1990. 48. M.R. Luettgen, W. Clem Karl, and A.S. Willsky, Efficient multiscale regularization with applications to the computation of optical flow, IEEE Trans. Image Process., 3(1), 41–63, 1994. 49. J. Middelkoop and L.L.F. Janssen, Implementation of temporal relationships in knowledge based classification of satellite images, Photogramm. Eng. Rem. Sens., 57, 937–945, 1991. 50. L. Olsson and L. Eklundh, Fourier series for analysis of temporal sequences of satellite sensor imagery, Int. J. Rem. Sens., 15, 3735–3741, 1994. 51. G. Pajares and J.M. de la Cruz, A wavelet-based image fusion tutorial, Pattern Recognition, 37, 1855–1871, 2004. 52. J.D. Paola and R.A. Schowengerdt, The effect of neural-network structure on a multispectral land-use/land-cover classification, Photogramm. Eng. Rem. Sens., 63, 535–544, 1997. 53. J.C. Price, Combining multispectral data of differing spatial resolution, IEEE Trans. Geosci. Rem. Sens., 37(3), 1199–1203, 1999.

536

Signal and Image Processing for Remote Sensing

54. P. Puyou-Lascassies, A. Podaire, and M. Gay, Extracting crop radiometric responses from simulated low and high spatial resolution satellite data using a linear mixing model, Int. J. Rem. Sens., 15(18), 3767–3784, 1994. 55. T. Ranchin, B. Aiazzi, L. Alparone, S. Baronti, and L. Wald, Image fusion—the ARSIS concept and some successful implementations, ISPRS J. Photogramm. Rem. Sens., 58, 4–18, 2003. 56. B.D. Ripley, Flexible non-linear approaches to classification, in From Statistics to Neural Networks. Theory and Pattern Recognition Applications, V. Cherkassky, J.H. Friedman, and H. Wechsler, eds., 105–126, NATO ASI series F: Computer and systems sciences, springer-Verlag, Heidelberg, 1994. 57. A.H. Solberg, Flexible nonlinear contextual classification, Pattern Recognition Letters, 25, 1501– 1508, 2004. 58. A.K. Skidmore, B.J. Turner, W. Brinkhof, and E. Knowles, Performance of a neural network: mapping forests using GIS and remotely sensed data, Photogramm. Eng. Rem. Sens., 63, 501–514, 1997. 59. J.A. Benediktsson, J.R. Sveinsson, and O.K. Ersoy, Optimized combination of neural networks, in IEEE Int. Symp. Circuits and Sys. (ISCAS’96), 535–538, Atlanta, Georgia, May 1996. 60. G.A. Carpenter, M.N. Gjaja, S. Gopal, and C.E. Woodcock, ART neural networks for remote sensing: Vegetation classification from Landsat TM and terrain data, in IEEE Symp. Geosc. Rem. Sens. (IGARSS), 529–531, Lincoln, Nebraska, May 1996. 61. W. Wan and D. Fraser, A self-organizing map model for spatial and temporal contextual classification, in IEEE Symp. Geosc. Rem. Sens. (IGARSS), 1867–1869, Pasadena, California, August 1994. 62. G. Shafer, A Mathematical Theory of Evidence, Princeton University Press, 1976. 63. E. Binaghi, P. Madella, M.G. Montesano, and A. Rampini, Fuzzy contextual classification of multisource remote sensing images, IEEE Trans. Geosc. Rem. Sens., 35, 326–340, 1997. 64. P.C. Smits and S.G. Dellepiane, Synthetic aperture radar image segmentation by a detail preserving Markov random field approach, IEEE Trans. Geosc. Rem. Sens., 35, 844–857, 1997. 65. P.B. Chou and C.M. Brown, Multimodal reconstruction and segmentation with Markov random fields and HCF optimization, in Proc. 1988 DARPA Image Understanding Workshop, 214– 221, 1988. 66. W.A. Wright, A Markov random field approach to data fusion and colour segmentation, Image Vision Comp., 7, 144–150, 1989. 67. A.H. Schistad Solberg, T. Taxt, and Anil K. Jain, A Markov random field model for classification of multisource satellite imagery, IEEE Trans. Geosc. Rem. Sens., 34, 100–113, 1996. 68. A.H. Schistad Solberg, Contextual data fusion applied to forest map revision, IEEE Trans. Geosc. Rem. Sens., 37, 1234–1243, 1999. 69. W. Wan and D. Fraser, Multisource data fusion with multiple self-organizing maps, IEEE Trans. Geosc. Rem. Sens., 37, 1344–1349, 1999. 70. P.H. Swain, Bayesian classification in a time-varying environment, IEEE Trans. Sys. Man Cyber., 8, 879–883, 1978. 71. L. Bruzzone and R. Cossu, A multiple-cascade-classifier system for a robust and partially unsupervised updating of land-cover maps, IEEE Trans. Geosc. Rem. Sens., 40, 1984–1996, 2002. 72. L. Bruzzone and D.F. Prieto, Unsupervised retraining of a maximum-likelihood classifier for the analysis of multitemporal remote-sensing images, IEEE Trans. Geosc. Rem. Sens., 39, 456–460, 2001. 73. L. Bruzzone and D.F. Prieto, An adaptive semiparametric and context-based approach to unsupervised change detection in multitemporal remote-sensing images, IEEE Trans. Image Proc., 11, 452–466, 2002. 74. P. Coppin, K. Jonkheere, B. Nackaerts, and B. Muys, Digital change detection methods in ecosystem monitoring: A review, Int. J. Rem. Sens., 25, 1565–1596, 2004. 75. L. Andres, W.A. Salas, and D. Skole, Fourier analysis of multi-temporal AVHRR data applied to a land cover classification, Int. J. Rem. Sens., 15, 1115–1121, 1994. 76. L. Aurdal, R. B. Huseby, L. Eikvil, R. Solberg, D. Vikhamar, and A. Solberg, Use of hiddel Markov models and phenology for multitemporal satellite image classification: applications to mountain vegetation classification, in MULTITEMP 2005, 220–224, May 2005. 77. N.G. Besag, K. Ickstadt, and R.L. Wolpert, Spatial poisson regression for health and exposure data measured at disparate resolutions, J. Am. Stat. Assoc., 452, 1076–1088, 2000.

Data Fusion for Remote-Sensing Applications

537

78. M.M. Daniel and A.S. Willsky, A multiresolution methodology for signal-level fusion and data assimilation with applications to remote sensing, Proc. IEEE, 85(1), 164–180, 1997. 79. L. Wald, Data Fusion: Definitions and Achitectures—Fusion of Images of Different Spatial Resolutions, Ecole des Mines Press, 2002. 80. F. Van der Meer, What does multisensor image fusion add in terms of information content for visual interpretation? Int. J. Rem. Sens., 18, 445–452, 1997. 81. B. Zhukov, D. Oertel, F. Lanzl, and G. Reinha¨ckel, Unmixing-based multisensor multiresolution image fusion, IEEE Trans. Geosci. Rem. Sens., 37(3), 1212–1226, 1999. 82. M.M. Crawford, S. Kumar, M.R. Ricard, J.C. Gibeaut, and A. Neuenshwander, Fusion of airborne polarimetric and interferometric SAR for classification of coastal environments, IEEE Trans. Geosci. Rem. Sens., 37(3), 1306–1315, 1999. 83. G. Storvik, R. Fjørtoft, and A. Solberg, A Bayesian approach to classification in multiscale remote sensing data, IEEE Trans. Geosc. Rem. Sens., 43, 539–547, 2005. 84. J. Besag, Towards Bayesian image analysis, J. Appl. Stat., 16(3), 395–407, 1989.

24 The Hermite Transform: An Efficient Tool for Noise Reduction and Image Fusion in Remote-Sensing Boris Escalante-Ramı´rez and Alejandra A. Lo´pez-Caloca

CONTENTS 24.1 Introduction ..................................................................................................................... 539 24.2 The Hermite Transform ................................................................................................. 541 24.2.1 The Hermite Transform as an Image Representation Model.................... 541 24.2.2 The Steered Hermite Transform..................................................................... 543 24.3 Noise Reduction in SAR Images .................................................................................. 545 24.4 Fusion Based on the Hermite Transform ................................................................... 546 24.4.1 Fusion Scheme with Multi-Spectral and Panchromatic Images ............... 550 24.4.2 Experimental Results with Multi-Spectral and Panchromatic Images .... 551 24.4.3 Fusion Scheme with Multi-Spectral and SAR Images ................................ 552 24.4.4 Experimental Results with Multi-Spectral and SAR Images..................... 553 24.5 Conclusions...................................................................................................................... 555 Acknowledgments ..................................................................................................................... 556 References ................................................................................................................................... 556

24.1

Introduction

In this chapter, we introduce the Hermite transform (HT) as an efficient tool for remotesensing image processing applications. The HT is an image representation model that mimics some of the more important properties of human visual perception, namely the local orientation analysis and the Gaussian derivative model of early vision. We limit our discussion to the cases of noise reduction and image fusion. However many different applications can be tackled within the scheme of direct-inverse HT. It is generally acknowledged that visual perception models must involve two major processing stages: (1) initial measurements and (2) high-level interpretation. Fleet and Jepson [1] pointed out that the early measurement is a rich encoding of image structure in terms of generic properties from which structures that are more complex are easily detected and analyzed. Such measurement processes should be image-independent and require no previous or concurrent interpretation. Unfortunately, it is not known what primitives are necessary and sufficient for interpretation or even identification of meaningful features. However, we know that, for image processing purposes, linear operators that exhibit special kind of symmetries related to translation, rotation, and magnification are of particular interest. A family of generic neighborhood operators fulfilling these 539

540

Signal and Image Processing for Remote Sensing

requirements is that formed by the so-called Gaussian derivatives [2]. These operators have long been used in computer vision for feature extraction [3,4], and are relevant in visual system modeling [5]. Formal integration of these operators is achieved in the HT introduced first by Martens [6,7], and recently reformulated as a multi-scale image representation model for local orientation analysis [8,9]. This transform can take many alternative forms corresponding to different ways of coding local orientations in the image. Young showed that Gaussian derivatives model the measured receptive field data more accurately than the Gabor functions do [10]. Like the receptive fields, both Gabor functions and Gaussian derivatives are spatially local and consist of alternating excitatory and inhibitory regions within a decaying envelope. However, the Gaussian derivative analysis is found to be more efficient because it takes advantage of the fact that Gaussian derivatives comprise an orthogonal basis if they belong to the same point of analysis. Gaussian derivatives can be interpreted as local generic operators in a scale-space representation described by the isotropic diffusion equation [2]. In a related work, the Gaussian derivatives have been interpreted as the product of Hermite polynomials and a Gaussian window [6], where windowed images are decomposed into a set of Hermite polynomials. Some mathematical models based on these operators at a single spatial scale have been described elsewhere [6,11]. In the case of the HT, it has been extended to the multi-scale case [7–9], and has been successfully used in different applications such as noise reduction [12], coding [13], and motion estimation for the case of image sequences [14]. Applications to local orientation analysis are a major concern in this chapter. It is well known that local orientation estimation can be achieved by combining the outputs from polar separable quadrature filters [15]. Freeman and Adelson developed a technique to steer filters by linearly combining basis filters oriented at a number of specific directions [16]. The possibilities are, in fact, infinite because the set of basis functions required to steer a function is not unique [17]. The Gaussian derivative family is perhaps the most common example of such functions. In the first part of this chapter we introduce the HT as an image representation model, and show how local analysis can be achieved from a steered HT. In the second part we build a noise-reduction algorithm for synthetic aperture radar (SAR) images based on the steered HT that adapts to the local image content and to the multiplicative nature of speckle. In the third section, we fuse multi-spectral and panchromatic images from the same satellite (Landsat ETMþ) with different spatial resolutions. In this case we show how the proposed method improves spatial resolution and preserves the spectral characteristics, that is, the biophysical variable interpretation of the original images remains intact. Finally, we fuse SAR and multi-spectral Landsat ETMþ images, and show that in this case spatial resolution is also improved while spectral resolution is preserved. Speckle reduction in the SAR image is achieved, along with image fusion, within the analysis– synthesis process of the fusion scheme. Both fusion and speckle-reduction algorithms are based on the detection of relevant image structures (primitives) during the analysis stage. For this purpose, Gaussianderivative filters at different scales can be used. Local orientation is estimated so that the transform can be rotated at every position of the analysis window. In the case of noise reduction, transform coefficients are classified based on structure dimensionality and energy content so that those belonging to speckle are discarded. With a similar criterion, transform coefficients from different image sources are classified to select coefficients from each image that contribute to synthesize the fused image.

The Hermite Transform

24.2 24.2.1

541

The Hermite Transform The Hermite Transform as an Image Representation Model

The HT [6,7] is a special case of polynomial transform. It can be regarded as an image description model. Firstly, windowing with a local function v(x, y) takes place at several positions over the input image. Next, local information at every analysis window is expanded in terms of a family of orthogonal polynomials. The polynomials Gm,nm(x, y) used to approximate the windowed information are determined by the analysis window function and satisfy the orthogonal condition: þ1 ð þ1 ð

v2 (x, y)Gm,nm (x, y)Gl,kl (x, y) dx dy ¼ dnk dml

(24:1)

1 1

for n, k ¼ 0, . . . , 1; m ¼ 0, . . . , n; l ¼ 0, . . . , k; where dnk denotes the Kronecker function. Psychophysical insights suggest using a Gaussian window function, which resembles the receptive field profiles of human vision, that is, v(x, y) ¼

1 (x2 þ y2 ) exp 2ps2 2s2

(24:2)

The Gaussian window is separable into Cartesian coordinates; it is isotropic, thus it is rotationally invariant and its derivatives are good models of some of the more important retinal and cortical cells of the human visual system [5,10]. In the case of a Gaussian window function, the associated orthogonal polynomials are the Hermite polynomials [18]: x y 1 Gnm, m (x, y) ¼ pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ Hnm Hm s s 2n (n m)!m!

(24:3)

where Hn(x) denotes the nth Hermite polynomial. The original signal L(x, y), where (x, y) are the pixel coordinates, is multiplied by the window function v(x p, y q), at positions ( p, q) that conform the sampling lattice S. Through replication of the window function P over the sampling lattice, a periodic weighting function is defined as W(x, y) ¼ v(x p, y q). This weighting function ðp,qÞ2S

must be different from zero for all coordinates (x, y), then: L(x, y) ¼

X 1 L(x, y)v(x p, y q) W(x, y) ðp,qÞ2S

(24:4)

The signal content within every window function is described as a weighted sum of polynomials Gm,nm(x, y) of m degree in x and n m in y. In a discrete implementation, the Gaussian window function may be approximated by the binomial window function, and in this case, its orthogonal polynomials Gm,nm(x, y) are known as the Krawtchouck polynomials.

Signal and Image Processing for Remote Sensing

542

In either case, the polynomial coefficients Lm,nm(p, q) are calculated by convolution of the original image L(x, y) with the function filter Dm,nm(x, y) ¼ Gm,nm(x, y) v2(x, y) followed by subsampling at positions (p, q) of the sampling lattice S, i.e.,

Lm, nm (p, q) ¼

þ1 ð ð þ1

L(x, y)Dm,nm (x p, y q) dx dy

(24:5)

1 1

For the case of the HT, it can be shown [18] that the filter functions Dm,nm(x, y) correspond to Gaussian derivatives of order m in x and n m in y, in agreement with the Gaussian derivative model of early vision [5,10]. The process of recovering the original image consists of interpolating the transform coefficients with the proper synthesis filters. This process is called an inverse polynomial transform and is defined by ^(x, y) ¼ L

1 X n X X

Lm,nm (p, q)Pm,nm (x p, y q)

(24:6)

n¼0 m¼0 (p, q)2S

The synthesis filters Pm,nm(x, y) of order m and n m are defined by Pm,nm (x, y) ¼

Gm, nm (x, y)v(x, y) W(x, y)

for m ¼ 0, . . . , n

and

n ¼ 0, . . . , 1:

Figure 24.1 shows the analysis and synthesis stages of a polynomial transform. Figure 24.2 shows a HT calculated on a satellite image. To define a polynomial transform, some parameters have to be chosen. First, we have to define the characteristics of the window function. The Gaussian window is the best option from a perceptual point of view and from the scale-space theory. Other free parameters are the size of the Gaussian window spread (s) and the distance between adjacent window positions (sampling lattice). The size of the window functions must be related to the spatial scale of the image structures that are to be analyzed. Fine local changes are better detected with small windows, but on the contrary, representation of low-resolution objects needs large windows. To overcome this compromise, multi-resolution representations are a good alternative. For the case of the HT, a multi-resolution extension has recently been proposed [8,9].

D00

T

D01

T

D10

T

L00

L00

L01

L01

L10

L10

T

P00

T

P01

T

P10

Original image

image

DNN T Subsampling

Synthesized

T

LNN Polynomial coefficients

FIGURE 24.1 Analysis and synthesis with the polynomial transform.

LNN Polynomial coefficients

T

PNN T Oversampling

The Hermite Transform

n⫽ 0

543

L0,0

L1,0

LN,0

L0,1

L1,1

LN,1

L0,N

L1,N

LN,N

1 n⫽

2 n⫽ (b)

FIGURE 24.2 (a) Hermite transform calculated on a satellite image. (b) Diagram showing the coefficient orders. Diagonals depict pﬃﬃﬃ zero-order coefficients (n ¼ 0), first-order coefficients (n ¼ 1), etc. Gaussian window with spread s ¼ 2 and subsampling d ¼ 4 was used.

24.2.2

The Steered Hermite Transform

The HT has the advantage that high-energy compaction can be obtained through adaptively steering the transform [19]. The term steerable filters describes a set of filters that are rotated copies of each other, and a copy of the filter in any orientation which is then constructed as a linear combination of a set of basis filters. The steering property of the Hermite filters can be considered because the filters are products of polynomials with a radially symmetric window function. The N þ 1 Hermite filters of Nth-order form a steerable basis for each individual filter of order N. Based on the steering property, the Hermite filters at each position in the image adapt to the local orientation content. This adaptability results in significant compaction. For orientation analysis purposes, it is convenient to work with a rotational version of the HT. The polynomial coefficients can be computed through a convolution of the image with the filter functions Dm(x)Dnm(y); the properties of the filter functions are separable in spatial and polar domains and the Fourier transform of the filter functions are expressed in polar coordinates considering vx ¼ v cos u and vy ¼ v sin u, dm (vx )dnm (vy ) ¼ gm,nm (u) dn (v)

(24:7)

where dn(v) is the Fourier transform for each filter function, and the radial frequency of the filter function of the nth order Gaussian derivative is given by 1 dn (v) ¼ pﬃﬃﬃﬃﬃﬃﬃﬃﬃ ( jvs)n exp (vs)2 =4 n 2 n!

(24:8)

and the orientation selectivity of the filter is expressed by sﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ ﬃ n gm,nm (u) ¼ cosm u sinnm u m

(24:9)

Signal and Image Processing for Remote Sensing

544

In terms of orientation frequency functions, this property of the Hermite filters can be expressed by gm,nm (u u0 ) ¼

n X

cnm, k (u0 )gnk, k (u)

(24:10)

k¼0 n where cm,k (u0) is the steering coefficient. The Hermite filter rotation at each position over the image is an adaptation to local orientation content. Figure 24.3 shows the directional Hermite decomposition over an image. First a HT was applied and then the coefficients of this transform were rotated towards the local estimated orientation, according to a maximum oriented energy criterion at each window position. For local 1D patterns, the steered HT provides a very efficient representation. This representation consists of a parameter u, indicating the orientation of the pattern, and a small number of coefficients, representing the profile of the pattern perpendicular to its orientation. For a 1D pattern with orientation u, the following relation holds: 8 n
0

For such a pattern, steering over u results in a compaction of energy into the coefficients Lun,0, while all other coefficients are set to zero. The energy content can be expressed through the Hermite coefficients (Parseval Theorem) as E1 ¼

1 X n X

½Lnm, m 2

(24:12)

n¼0 m¼0

The energy up to order N, EN is defined as the addition of all squared coefficients up to N order.

FIGURE 24.3 Steered Hermite transform. (a) Original coefficients. (b) Steered coefficients. It can be noted that most coefficient energy is concentrated on the upper row.

The Hermite Transform

545

The steered Hemite transform offers a way to describe 1D patterns on the basis of their orientation and profile. We can differentiate 1D energy terms and 2D energy terms. That is, for each local signal we have E1D N (u) ¼

N X

2 Lun, 0 ,

(24:13)

n¼1

E2D N (u) ¼

N X n X

Lunm, m

2

(24:14)

n¼1 m¼1

24.3

Noise Reduction in SAR Images

The use of SAR images instead of visible and multi-spectral images is becoming increasingly popular, because of their capability of imaging even in the case of cloud-covered remote areas In addition to the all-weather capacity, there are several well-known advantages of SAR data over other imaging systems [20]. Unfortunately, the poor quality of SAR images makes it very difficult to perform direct information extraction tasks. Even more, the incorporation of external reference data (in-situ measurements) is frequently needed to guaranty a good positioning of the results. Numerous filters have been proposed to remove speckle in SAR imagery; however, in most cases and even in the most elegant approaches, filtering algorithms have a tendency to smooth speckle as well as information. For numerous applications, low-level processing of SAR images remains a partially unsolved problem. In this context, we propose a restoration algorithm that adaptively smoothes images. Its main advantage is that it retains subtle details. The HT coefficients are used to discriminate noise from relevant information such as borders and lines in a SAR image. Then an energy mask containing relevant image locations is built by thresholding the first-order transform coefficient energy E1: E1 ¼ 2 2 L0,1 þ L1,0 where L0,1 and L1,0 are the first-order coefficients of the HT. These coefficients are obtained by convolving the original image with the first-order derivatives of a Gaussian function, which are known to be quasi-optimal edge detectors [21]; therefore, the first-order energy can be used to discriminate edges from noise by means of a threshold scheme. The optimal threshold is set considering two important characteristics of SAR images. First, one-look amplitude SAR images have a Rayleigh distribution and the signal-tonoise ratio (SNR) is approximately 1.9131. Second, in general, the SNR of multi-look SAR pﬃﬃﬃﬃ images does not change over the whole image; furthermore, SNRNlooks ¼ 1.9131 N , which yields for a homogeneous region l: sl ¼

m1 pﬃﬃﬃﬃ 1:9131 N

(24:15)

where sl is the standard deviation of the region l, ml is its mean value, and N is the number of looks of the image. The first-order coefficient noise variance in homogeneous regions is given by s2 ¼ as2l ,

(24:16)

Signal and Image Processing for Remote Sensing

546 where

a ¼ jRL (x, y) D1, 0 (x, y) D1, 0 ( x, y)jx¼y¼0 RL is the normalized autocorrelation function of the input noise, and D1,0 is the filter used to calculate the first-order coefficient. Moreover, the probability density function (PDF) of L1,0 and L0,1 in uniform regions can be considered Gaussian, according to the Central Limit Theorem, then, the energy PDF is exponential: P(E1 ) ¼

1 E1 exp 2s2 2s2

(24:17)

Finally, the threshold is fixed: T ¼ 2 ln

1 s2 PR

(24:18)

where PR is the probability (percentage) of noise left on the image and will be set by the user. A careful analysis of this expression reveals that this threshold adapts to the local content of the image since Equation 24.15 and Equation 24.16 show the dependence of s on the local mean value ml, the latter being approximated by the Hermite coefficient L00. With the locations of relevant edges detected, the next step is to represent these locations as one-dimensional patterns. This can be achieved by steering the HT as described in the previous section so that the steering angle u is determined by the local u edge orientation. Next, only coefficients Ln,0 are preserved, all others are set to zero. In summary, the noise reduction strategy consists of classifying the image in either zero-dimensional patterns consisting of homogeneous noisy regions, or one-dimensional patterns containing noisy edges. The former are represented by the zeroth order coefficient, that is, the local mean value, and the latter by oriented 1D Hermite coefficients. When an inverse HT is performed over these selected coefficients, the resulting synthesized image consists of noise-free sharp edges and smoothed homogeneous regions. Therefore the denoised image preserves sharpness and thus, image quality. Some speckle remains in the image because there is always a compromise between the degree of noise reduction and the preservation of low-contrast edges. The user controls the balance of this compromise by changing the percentage of noise left PR on the image according to Equation 24.18. Figure 24.4 shows the algorithm for noise reduction, and Figure 24.5 through Figure 24.8 show different results of the algorithm.

24.4

Fusion Based on the Hermite Transform

Image fusion has become a useful tool to enhance information provided by two or more sensors by combining the most relevant features of each image. A wide range of disciplines including remote sensing and medicine have taken advantage of fusion techniques, which in recent years have evolved from simple linear combinations to sophisticated methods based on principal components, color models, and signal transformations

Reconstruction 1D → 2D in optimal directions

LNN

Noisy image

Forward polynomial transform

Projection 2D → 1D in several orientation

q

L01 Lq00 q LNN

L01 ..

Directional processing

.. ..

L10 L01 L00

The Hermite Transform

q

L00

Determination of orientation with maximum contrast

Energy computation Adaptive threshold

L10

Inverse polynomial transform

Restored image

LNN

E1 Th

Masking of polynomial coefficients

Binary decision

m1

Detection of edges’ position

FIGURE 24.4 Noise-reduction algorithm.

547

548

Signal and Image Processing for Remote Sensing

FIGURE 24.5 Left: Original SAR AeS-1 image. Right: Image after noise reduction.

among others [22–25]. Recently, multi-resolution techniques such as image pyramids and wavelet transforms have been successfully used [25–27]. Several authors have shown that, for image fusion, the wavelet transform approach offers good results [1,25,27]. Comparisons of Mallat’s and ‘‘a` trous’’ methodologies have been studied [28]. Furthermore, multi-sensor image fusion algorithms based on intensity modulation have been proposed for SAR and multi-band optical data fusion [29]. Information in the fused image must lead to improved accuracy (from redundant information) and improved capacity (from complementary information). Moreover, from a visual perception point of view, patterns included in the fused image must be perceptually relevant and must not include distracting artifacts. Our approach aims at analyzing images by means of the HT, which allows us to identify perceptually relevant patterns to be included in the fusion process while discriminating spurious artifacts. The steered HT has the advantage of energy compaction. Transform coefficients are selected with an energy compaction criterion from the steered Hermite transform;

FIGURE 24.6 Left: Original SEASAT image. Right: Image after noise reduction.

The Hermite Transform

549

FIGURE 24.7 Left: Original ERS1 image. Right: Image after noise reduction.

therefore, it is possible to reconstruct an image with few coefficients and still preserve details such as edges and textures. The general framework for fusion through HT includes five steps: (1) HT of the image. (2) Detection of maximum energy orientation with the energy measure E1D N (u) at each window position. In practice, one estimator of the optimal orientation u can be obtained through tan(u) ¼ L0,1/L1,0, where L0,1 and L1,0 are the first-order HT coefficients. (3) Adaptive steering of the transform coefficients, as described in previous sections. (4) Coefficient selection based on the method of verification of consistency [27]. This selection rule uses the maximum absolute value within a 5 5 window over the image (area of activity). The window variance is computed and used as a measurement of the activity associated with the central pixel of the window. In this way, a significant value indicates the presence of a dominant pattern in the local area. A map of binary decision is then

FIGURE 24.8 Left: Original ERS1 image. Right: Image after noise reduction.

Signal and Image Processing for Remote Sensing

550

q Image 1

Fused image

Image 2

q Inverse transform

Coefficient selection HT Local rotation

HT FIGURE 24.9 Fusion scheme with the Hermite transform.

created for the registry of the results. This binary map is subjected to consistency verification. (5) The final step of the fusion is the inverse transformation from the selected coefficients and their corresponding optimal u. Figure 24.9 shows a simplified diagram of this method.

24.4.1

Fusion Scheme with Multi-Spectral and Panchromatic Images

Our objective of image fusion is to generate synthetic images with a higher resolution that attempts to preserve the radiometric characteristics of the original multi-spectral data. It is desirable that any procedure that fuses high-resolution panchromatic data with low-resolution multi-spectral data preserves, as much as possible, the original spectral characteristics. To apply this image fusion method, it is necessary to resample the multi-spectral images so that their pixel size is the same as that of the panchromatic image’s. The steps for fusing multi-spectral and panchromatic images are as follows: (1) Generate new panchromatic images, whose histograms match those of each band of the multispectral image. (2) Apply the HT with local orientation extraction and detection of maximum energy orientation. (3) Select the coefficients based on the method of verification of consistency. (4) Inverse transformation with the optimal u resulting from the selected coefficient set. This process of fusion is depicted in Figure 24.10.

q Coefficient of high degree of energy

30 m

Inverse transform

q Multi-spectral Histogram match

15 m

Hermite transform

Hermite transform rotated L q = arctan 0,1 L1,0

Energy computation

Panchromatic

FIGURE 24.10 Hermite transform fusion for multi-spectral and panchromatic images.

Fusion

Fused image

The Hermite Transform 24.4.2

551

Experimental Results with Multi-Spectral and Panchromatic Images

The proposed fusion scheme with multi-spectral images has been tested on optical data. We fused multi-spectral images from Landsat ETMþ (30 m) with its panchromatic band (15 m). We show in Figure 24.11 how the proposed method can help improve spatial resolution. To evaluate the efficiency of the proposed method we calibrate the images so that digital values are transformed to reflectance values. Calibrated images were compared before and after fusion, by means of the Tasselep cap transformation (TCT) [30–32]. The TCT method is reported in [33]. The TCT transforms multi-spectral spatial values to a new domain based on biophysical variables, namely brightness, greenness, and a third component of the scene under study. It is deduced that the brightness component is a weighted sum of all the bands, based on the reflectance variation of the ground. The greenness component describes the contrast between near-infrared and visible bands with the mid-infrared bands. It is strongly related to the amount of green vegetation in the scene. Finally, the third component gives a measurement of the humidity content of the ground. Figure 24.12 shows the brightness, greenness, and third components obtained from HT fusion results. The TCT was applied to the original multi-spectral image, the HT fusion result, and principal component analysis (PCA) fusion method. To understand the variability of TCT results on the original, HT fusion and PCA fusion images, the greenness, and brightness components were compared. The greenness and brightness components define the plane of vegetation in ETMþ data. These results are displayed in Figure 24.13. It can be noticed that, in the case of PCA, the brightness and greenness content differs considerably from the original image, while in the case of HT they are very similar to the original ones. A linear regression analysis of the TCT components (Table 24.1) shows that the brightness and greenness components of the HT-fused image present a high linear correlation with the original image values. In other words, the biophysical properties of multi-spectral images are preserved when using the HT for image fusion, in contrast to the case of PCA fusion.

FIGURE 24.11 (a) Original Landsat 7 ETMþ image of Mexico city (resampled to 15 m to match geocoded panchromatic), (b) Resulting image pﬃﬃﬃ of ETMþ and panchromatic band fusion with Hermite transform (Gaussian window with spread s ¼ 2 and window spacing T ¼ 4) (RGB composition 5–4–3).

Signal and Image Processing for Remote Sensing

552

FIGURE 24.12 (a) Brightness, (b) greenness, and (c) third component in Hermite transform in a fused image.

24.4.3

Fusion Scheme with Multi-Spectral and SAR Images

In the case of SAR images, the characteristic noise, also known as speckle, imposes additional difficulties to the problem of image fusion. In spite of this limitation, the use of SAR images is becoming more popular due to their immunity to cloud coverage. Speckle removal, as described in previous section, is therefore a mandatory task in fusion (b)

240 220 200 180 160 140 120 100 80 60 40 20 0

Scale 600 400 200 0

0

50

100 150 Brightness

200

Greenness

Greenness

(a)

240 220 200 180 160 140 120 100 80 60 40 20 0

Scale 300 200 100 0

0

250

50

100

Brightness

(c)

Greenness

150

240 220 200 180 160 140 120 100 80 60 40 20 0

Scale 600 400 200 0

0

50

100 150 Brightness

200

250

FIGURE 24.13 (See color insert following page 302.) Greenness versus brightness, (a) original multi-spectral, (b) HT fusion, (c) PCA fusion.

200

250

The Hermite Transform

553

TABLE 24.1 Linear Regression Analysis of TCT Components: Correlation Factors of Original Image with HT Fusion and PCA Fusion Images

Correlation factor

Brightness (Original/HT)

Brightness (Original/PCA)

Greenness (Original/HT)

Greenness (Original/PCA)

Third Component (Original/HT)

Third Component (Original/PCA)

1.00

0.93

0.99

0.98

0.97

0.94

applications involving SAR imagery. The HT allows us to achieve both noise reduction and image fusion. It is easy to figure out that local orientation analysis for the purpose of noise reduction can be combined with image fusion in a single direct-inverse HT scheme. Figure 24.14 shows the complete methodology to reduce noise and fuse Landsat ETMþ with SAR images.

24.4.4

Experimental Results with Multi-Spectral and SAR Images

We fused multi-sensor images, namely SAR Radarsat (8 m) and multi-spectral Landsat ETMþ (30 m), with the HT and showed that in this case too spatial resolution was improved while spectral resolution was preserved. Speckle reduction in the SAR image was achieved, along with image fusion, within the analysis–synthesis process of the fusion scheme proposed. Figure 24.15 shows the result of panchromatic and SAR image HT fusion including speckle reduction. Figure 24.16 illustrates the result of multi-spectral and SAR image HT fusion. No significant distortion in the spectral and radiometric information is detected. A comparison of the TCT of the original multi-spectral image and the fused image can be seen in Figure 24.17. There is a variation between both plots; however, the vegetation

Energy mask

q

Multi-spectral

Histogram match

Coefficient combination

Hermite transform rotated

Hermite transform

q

SAR

Hermite transform

Hermite transform rotated

Energy mask

FIGURE 24.14 Noise reduction and fusion for multi-spectral and SAR images.

Noise reduction

Threshold

Inverse transform

Fused images

Signal and Image Processing for Remote Sensing

554

(a)

(b)

(c)

FIGURE 24.15 (a) Radarsat image with speckle (1998). (b) Panchromatic Landsat-7 ETMþ (1998). (c) Resulting image fusion with noise reduction.

(a)

(b)

FIGURE 24.16 (See color insert following page 302.) (a) Original multi-spectral. (b) Result of ETMþ and Radarsat image fusion with HT (Gaussian window with pﬃﬃﬃ spread s ¼ 2 and window spacing d ¼ 4) (RGB composition 5–4–3).

240

240

220

220

200

200 180

Scale

160

600

140

400

120

200

100

0

80

Greenness

Greenness

180

6000

140

4000

120

2000

100

0

80

60

60

40

40

20

20

0

0 0

(a)

Scale

160

50

100

150

Brightness

200

250

0

(b)

50

100

150

200

Brightness

FIGURE 24.17 (See color insert following page 302.) Greenness versus brightness: (a) original multi-spectral, (b) LANDSAT–SAR fusion with HT.

250

The Hermite Transform

555

FIGURE 24.18 Left: Original first principal component of a 25 m resolution LANDSAT TM5 image. Right: Result of fusion with SAR AeS-1 denoised image of Figure 24.5.

plane remains similar, meaning that the fused image still can be used to interpret biophysical properties. Another fusion result is displayed in Figrue 24.18. In this case, the 5 m resolution SAR AeS-1 denoised image displayed on right side of Figure 24.5 is fused with its corresponding 25 m resolution LANDSAT TM5 image. Multi-spectral bands were analyzed with principal components. The first component is shown on the left in Figure 24.18. Fusion of the latter with the SAR AeS-1 image is shown on the right. Note the resolution improvement of the fused image in comparison with the LANDSAT image.

24 .5

Conc lusi ons

In this chapter the HT was introduced as an efficient image representation model that can be used for noise reduction and fusion in remote perception imagery. Other applications such as coding and motion estimation have been demonstrated in related works [13,14]. In the case of noise reduction in SAR images, the adaptive algorithm presented here allows us to preserve image sharpness while smoothing homogeneous regions. The proposed fusion algorithm based on the HT integrates images with different spatial and spectral resolutions, either from the same or different image sensors. The algorithm is intended to preserve both the highest spatial and spectral resolutions of the original data. In the case of ETMþ multi-spectral and panchromatic image fusion, we demonstrated that the HT fusion method did not lose the radiometric properties of the original multispectral image; thus, the fused image preserved biophysical variable interpretation. Furthermore, the spatial resolution of the fused images was considerably improved. In the case of SAR and ETMþ image fusion, spatial resolution of the fused image was also improved, and we showed for this case how noise reduction could be incorporated within the fusion scheme. These algorithms present several common features, namely, detection of relevant image primitives, local orientation analysis, and Gaussian derivative operators, which are common to some of the more important characteristics of the early stages of human vision.

556

Signal and Image Processing for Remote Sensing

The algorithms presented here are formulated in a single spatial scale scheme, that is, the Gaussian window of analysis is fixed; however, multi-resolution is also an important characteristic of human vision and has also proved to be an efficient way to construct image processing solutions. Multi-resolution image processing algorithms are straightforward to build from the HT by means of hierarchical pyramidal structures that replicate, at each resolution level, the analysis–synthesis image processing schemes proposed here. Moreover, a formal approach to the multi-resolution HT for local orientation analysis has been recently developed, clearing the way to propose new multi-resolution image processing tasks [8,9].

Ackn owledgments This work was sponsored by UNAM grant PAPIIT IN105505 and by the Center for Geography and Geomatics Research ‘‘Ing. Jorge L. Tamayo’’.

References 1. D.J. Fleet and A.D. Jepson, Hierarchical construction of orientation and velocity selective filters, IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(3), 315–325, 1989. 2. J. Koenderink and A.J. Van Doorn, Generic neighborhood operators, IEEE Transactions on Pattern Analysis and Machine Intelligence, 14, 597–605, 1992. 3. J. Bevington and R. Mersereau, Differential operator based edge and line detection, Proceedings ICASSP, 249–252, 1987. 4. V. Torre and T. Poggio, On edge detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, 8, 147–163, 1986. 5. R. Young, The Gaussian derivative theory of spatial vision: analysis of cortical cell receptive field line-weighting profiles, General Motors Research Laboratory, Report 4920, 1986. 6. J.B. Martens, The Hermite transform—theory, IEEE Transactions on Acoustics, Speech and Signal Processing, 38(9), 1607–1618, 1990. 7. J.B. Martens, The Hermite transform—applications, IEEE Transactions on Acoustics, Speech and Signal Processing, 38(9), 1595–1606, 1990. 8. B. Escalante-Ramı´rez and J.L. Silvan-Cardenas, Advanced modeling of visual information processing: a multiresolution directional-oriented image transform based on Gaussian derivatives, Signal Processing: Image Communication, 20, 801–812, 2005. 9. J.L. Silva´n-Ca´rdenas and B. Escalante-Ramı´rez, The multiscale hermite transform for local orientation analysis, IEEE Transactions on Image Processing, 15(5), 1236–1253, 2006. 10. R. Young, Oh say, can you see? The physiology of vision, Proceedings of SPIE, 1453, 92–723, 1991. 11. Z.-Q. Liu, R.M. Rangayyan, and C.B. Frank, Directional analysis of images in scale space, [On line] IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(11), 1185–1192, 1991; http://iel.ihs.com:80/cgi-bin/. 12. B. Escalante-Ramı´rez and J.-B. Martens, Noise reduction in computed tomography images by means of polynomial transforms, Journal of Visual Communication and Image Representation, 3(3), 272–285, 1992 13. J.L. Silva´n-Ca´rdenas and B. Escalante-Ramı´rez, Image coding with a directional-oriented discrete Hermite transform on a hexagonal sampling lattice, Applications of Digital Image Processing XXIV (A.G. Tescher, Ed.), Proceedings of SPIE, 4472, 528–536, 2001.

The Hermite Transform

557

14. B. Escalante-Ramı´rez, J.L. Silva´n-Ca´rdenas, and H. Yuen-Zhou, Optic flow estimation using the Hermite transform, Applications of Digital Image Processing XXVII (A.G. Tescher, Ed.), Proceedings of SPIE, 5558, 632–643, 2004. 15. G. Granlund and H. Knutsson, Signal Processing for Computer Vision, Kluwer, Dordrecht, The Netherlands, 1995. 16. W.T. Freeman and E.H. Adelson, The design and use of steerable filters, IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(9), 891–906, 1991. 17. M. Michaelis and G. Sommer, A lie group-approach to steerable filters, Pattern Recognition Letters, 16(11), 1165–1174, 1995. 18. G. Szego¨, Orthogonal Polynomials, American Mathematical Society, Colloquium Publications, 1959. 19. A.M. Van Dijk and J.B. Martens, Image representation and compression with steered Hermite transform, Signal Processing, 56, 1–16, 1997. 20. F. Leberl, Radargrammetric Image Processing, Artech House, Inc, 1990. 21. J.F. Canny, Finding edges and lines in images, MIT Technical Report 720, 1983. 22. C. Pohl and J.L. van Genderen, Multisensor image fusion in remote sensing: concepts, methods and applications, International Journal of Remote Sensing, 19(5), 823–854, 1998. 23. Y. Du, P.W. Vachon, and J.J. van der Sanden, Satellite image fusion with multiscale wavelet analysis for marine applications: preserving spatial information and minimizing artifacts (PSIMA), Canadian Journal of Remote Sensing, 29, 14–23, 2003. 24. T. Feingersh, B.G.H. Gorte, and H.J.C. van Leeuwen, Fusion of SAR and SPOT image data for crop mapping, Proceedings of the International Geoscience and Remote Sensing Symposium, IGARSS, pp. 873–875, 2001. 25. J. Nu´n˜ez, X. Otazu, O. Fors, A. Prades, and R. Arbiol, Multiresolution-based image fusion with additive wavelet decomposition, IEEE Transactions on Geoscience and Remote Sensing, 37(3), 1204– 1211, 1999. 26. T. Ranchin and L. Wald, Fusion of high spatial and spectral resolution images: the ARSIS concepts and its implementation, Photogrammetric Engineering and Remote Sensing, 66(1), 49–61, 2000. 27. H. Li, B.S. Manjunath, and S.K. Mitra, Multisensor image fusion using the wavelet transform, Graphical Models and Image Processing, 57(3), 235–245, 1995. 28. M. Gonza´lez-Audı´cana, X. Otazu, O. Fors, and A. Seco, Comparison between Mallat’s and the ‘a` trous’ discrete wavelet transform based algorithms for the fusion of multispectyral and panchromatic images, International Journal of Remote Sensing, 26(3,10), 595–614, 2005. 29. L. Alparone, S. Baronti, A. Garzelli, and F. Nencini, Landsat ETMþ and SAR image fusion based on generalized intensity modulation, IEEE Transactions on Geoscience and Remote Sensing, 42(12), 2832–2839, 2004. 30. E.P. Crist and R.C. Cicone, A physically based transformation of thematic mapper data—the TM Tasselep Cap, IEEE Transactions on Geoscience and Remote Sensing, 22(3), 256–263, 1984. 31. E.P. Crist and R.J. Kauth, The tasseled cap de-mystified, Photogrammetric Engineering and Remote Sensing, 52(1), 81–86, 1986. 32. E.P. Crist and R.C. Cicone, Application of the Tasseled Cap concept to simulated thematic mapper data, Photogrammetric Engineering and Remote Sensing, 50(3), 343–352, 1984. 33. C. Huang, B. Wylie, L. Yang, C. Homer, and. G. Zylstra, Derivation of a tasseled cap transform. Based on Landsat 7 at satellite reflectance, Raytheon ITSS, USGS EROS Data Center, Sioux Falls, SD 57198, USA www.nr.usu.edu/regap/download/documents/t-cap/usgs-tcap.pdf.

25 Multi-Sensor Approach to Automated Classification of Sea Ice Image Data

A.V. Bogdanov, S. Sandven, O.M. Johannessen, V.Yu. Alexandrov, and L.P. Bobylev

CONTENTS 25.1 Introduction ..................................................................................................................... 559 25.2 Data Sets and Image Interpretation ............................................................................. 562 25.2.1 Acquisition and Processing of Satellite Images ........................................... 562 25.2.2 Visual Analysis of the Images ........................................................................ 566 25.2.3 Selection of Training Regions ......................................................................... 566 25.3 Algorithms Used for Sea Ice Classification ................................................................ 567 25.3.1 General Methodology....................................................................................... 567 25.3.2 Image Features .................................................................................................. 568 25.3.3 Backpropagation Neural Network................................................................. 570 25.3.4 Linear Discriminant Analysis Based Algorithm.......................................... 571 25.4 Results............................................................................................................................... 572 25.4.1 Analysis of Sensor Brightness Scatterplots................................................... 572 25.4.2 MLP Training..................................................................................................... 574 25.4.3 MLP Performance for Sensor Data Fusion ................................................... 576 25.4.3.1 ERS and RADARSAT SAR Image Classification........................ 576 25.4.3.2 Fusion of ERS and RADARSAT SAR Images ............................. 576 25.4.3.3 Fusion of ERS, RADARSAT SAR, and Meteor Visible Images .................................................................................. 577 25.4.4 LDA Algorithm Performance for Sensor Data Fusion ............................... 577 25.4.5 Texture Features for Multi-Sensor Data Set ................................................. 578 25.4.6 Neural Network Optimization and Reduction of the Number of Input Features............................................................................... 578 25.4.6.1 Neural Network with No Hidden Layers ................................... 579 25.4.6.2 Neural Network with One Hidden Layer ................................... 580 25.4.7 Classification of the Whole Image Scene ...................................................... 582 25.5 Conclusions...................................................................................................................... 585 Acknowledgments ..................................................................................................................... 586 References ................................................................................................................................... 586

25.1

Introduction

Satellite radar systems have an important ability to observe the Earth’s surface, independent of cloud and light conditions. This property of satellite radars is particularly 559

560

Signal and Image Processing for Remote Sensing

useful in high latitude regions, where harsh weather conditions and the polar night restrict the use of optical sensors. Regular observations of sea ice using space-borne radars started in 1983 when the Russian OKEAN side looking radar (SLR) system became operational. The wide swath (450 km) SLR images of 0.7–2.8 km spatial resolution were used to support ship transportation along the Northern Sea Route and to provide ice information to facilitate other polar activities. Sea ice observation using high-resolution synthetic aperture radar (SAR) from satellites began with the launch of Seasat in 1978, which operated for only three months, and continued with ERS from 1991, RADARSAT from 1996, and ENVISAT from 2002. Satellite SAR images with a typical pixel size of 30 m to 100 m, allow observation of a number of sea ice parameters such as floe parameters [1], concentration [2], drift [3], ice type classification [4,5,6], leads [7], and ice edge processes. RADARSAT wide swath SAR images, providing enlarged spatial coverage, are now in operation at several sea ice centers [8]. The ENVISAT advanced SAR (ASAR) operating at several imaging modes, including single polarization wide swath (400 km) and alternating polarization narrow swath (100 km) modes can improve the classification of several ice types and open water (OW) using dual polarization images. Several methods for sea ice classification have been developed and tested [9–11]. The straightforward and physically plausible approach is based on the application of sea ice microwave scattering models for the inverse problem solution [12]. This is, however, a difficult task because the SAR signature depends on many sea ice characteristics [13]. A common approach in classification is to use empirically determined sea ice backscatter coefficients obtained from field campaigns [14,15]. Classical statistical methods based on Bayesian theory [16] are known to be optimal if the form of the probability density function (PDF) is known and can be parameterized in the algorithm. A Bayesian classifier, developed at the Alaska SAR Facility (ASF) [4], assumes a Gaussian distribution of sea ice backscatter coefficients [17]. Utilization of backscatter coefficients only, limits the number of ice classes that can be distinguished and decreases the accuracy of classification because backscatter coefficients of several sea ice types and OW overlap significantly [18]. Incorporation of other image features with a non-Gaussian distribution requires modeling of the joint PDF of features from different sensors, which is difficult to achieve. The classification errors can be grouped into two categories: (1) labeling inconsistencies and (2) classification-induced errors [19]. The errors in the first group are due to mixed pixels, transition zones between different ice regimes, temporal change of physical properties, sea ice drift, within-class variability, and limited training and test data sets. The errors in the second group are errors induced by the classifier. These errors can be due to the selection of an improper classifier for the given problem, its parameters, learning algorithms, input features, etc.—the problems traditionally considered within pattern recognition and classification domains. Fusion of data from several observation systems can greatly reduce errors in labeling inconsistency. These can be satellite and aircraft images obtained at different wavelengths and polarizations, data in cartographic format represented by vectors and polygons (i.e., bathymetry profiles, currents, meteorological information), and expert knowledge. Data fusion can improve the classification and extend the use of the algorithms to larger geographical areas and several seasons. Data fusion can be done using statistical methods, the theory of belief functions, fuzzy logic and fuzzy set theory, neural networks, and expert systems [20]. Some of these methods have been successfully applied to sea ice classification [21]. Haverkamp et al. [11] combined a number of SAR-derived sea ice parameters and expert geophysical knowledge in the rule-based expert system. Beaven [22] used a combination of ERS-1 SAR and special sensor microwave/imager (SSM/I) data to improve estimates of ice concentration after the onset of freeze-up. Soh and Tsatsoulis [23] used information from various data sources in a new fusion process

Multi-Sensor Approach to Automated Classification of Sea Ice Image Data

561

based on Dempster–Shafer belief theory. Steffen and Heinrichs [24] merged ERS SAR and Landsat thematic mapper data using a maximum likelihood classifier. These studies demonstrated the advantages that can be gained by fusing different types of data. However, there is still forthcoming work to compare different sea sensor data fusion algorithms and assess their performances using ground-truth data. In this study we investigate and analyze the performance of an artificial neural network model applied for sea ice classification and compare its performance with the performance of the linear discriminant analysis (LDA) based algorithm. Artificial neural network models received high attention during the last decades due to their ability to approximate complex input–output relationships using a training data set, perform without any prior assumptions on the statistical model of the data, generalize well on the new, previously unseen data (see Ref. [25] and therein), and be less affected by noise. These properties make neural networks especially attractive for the sensor data fusion and classification. Empirical comparisons of neural network–based algorithms with the standard parametric statistical classifiers [26,27] showed that the neural network model, being distribution free, can outperform the statistical methods on the condition that a sufficient number of representative training samples is presented to the neural network. It also avoids the problem of determining the amount of influence a source should have in the classification [26]. Standard statistical parametric classifiers require a statistical model and thus work well when the used statistical model (usually multivariate normal) is in good correspondence with the observed data. There are not many comparisons of neural network models with nonparametric statistical algorithms. However, there are some indications that these algorithms can work at least as well as neural network approaches [28]. Several researchers proposed neural network models for sea ice classification. Key et al. [29] applied a backpropagation neural network to fuse the data of two satellite radiometers. Sea ice was among 12 surface and cloud classes identified on the images. Hara et al. [30] developed an unsupervised algorithm that combines learning vector quantization and iterative maximum likelihood algorithms for the classification of polarimetric SAR images. The total classification accuracy, estimated using three ice classes, comprised 77.8% in the best case (P-band). Karvonen [31] used a pulse-coupled neural network for unsupervised sea ice classification in RADARSAT SAR images. Although these studies demonstrated the usefulness of neural network models when applied to sea ice classification, the algorithms still need to be extensively tested under different environmental conditions using ground-truth data. It is unclear whether neural network models outperform traditional statistical classifiers and generalize well on the test data set. It is also unclear which input features and neural network structure should be used in classification. This study analyzes the performance of a multi-sensor data fusion algorithm based on a multi-layer neural network also known as multi-layer perceptron (MLP) applied for sea ice classification. The algorithm fuses three different types of satellite images: ERS, RADARSAT SAR, and low-resolution visible images; each type of data carries unique information on sea ice properties. The structure of the neural network is optimized for the sea ice classification using a pruning method that removes redundant connections between neurons. The analysis presented in this study consists of the following steps: Firstly, we use a set of in situ sea ice observations to estimate the contribution of different sensor combinations to the total classification accuracy. Secondly, we evaluate the positive effect of SAR image texture features included in the ice classification algorithm, utilizing only tonal image information. Thirdly, we verify the performance of the classifier by comparing it with the performance of the standard statistical approach. As a benchmark, and for comparison, we use an LDA-based algorithm [6] that resides in an intermediate position

Signal and Image Processing for Remote Sensing

562

between parametric and nonparametric algorithms such as the K-nearest-neighbor classifier. Finally, the whole image area is classified and analyzed to give additional evidence of the generalization properties of the classifier; and the results of automatic classification are compared with manually prepared classification maps. In the following sections we describe the multi-sensor image sets used and the in situ data (Section 25.2), the MLP and LDA-based classification algorithms (Section 25.3), and finally discuss the results of our experiments in Section 25.4.

25.2 25.2.1

Data Set s and Image Interpretation Acquisition and Processing of Satellite Images

In our experiments, we used a set of spatially overlapping ERS-2 SAR low-resolution images (LRI), RADARSAT Scan SAR Wide beam mode image, and Meteor 3/5 TV optical image, acquired on April 30, 1998. The characteristics of the satellite data are summarized in Table 25.1. The RADARSAT Scan SAR scene and the corresponding fragment of the Meteor image, covering a part of the coastal Kara Sea with the Ob and Yenisey estuaries, are shown in Figure 25.1. ERS SAR image has the narrowest swath width (100 km) among the three sensors. Thus the size of the image fragments (Figure 25.2) used for fusion is limited by the spatial coverage of the two ERS SAR images available for the study shown in Figure 25.2a. The images contain various stages and forms of first year, young, and new ice. The selection of test and training regions in the images is done using in situ observations made onboard the Russian nuclear icebreaker ‘‘Sovetsky Soyuz,’’ which sailed through the area as shown in Figure 25.1a by a white line. Compressed SAR images were transmitted to the icebreaker via INMARSAT in near realtime and were available onboard for ice navigation. The satellite images onboard enabled direct identification of various sea ice types observed in SAR images and verification of their radar signatures. The SAR data were received and processed into images at Kongsberg Satellite Services in Tromsø, Norway. The ScanSAR image is 500 km wide and has 100 m spatial resolution (Table 25.1), which corresponds to a pixel spacing of 50 m. The image was filtered and down-sampled to have the same pixel size (100 m) as the ERS SAR LRI with 200 m spatial resolution (Table 25.1). Further processing includes antenna pattern correction, range spreading loss compensation, and a correction for incidence angle. The resulting pixel TABLE 25.1 The Main Parameters of the Satellite Systems and Images Used in the Study

Sensor RADARSAT Scan SAR Wide beam mode ERS SAR low resolution image (LRI) Meteor-3/5 MR-900 TV camera system

Wavelength and Band

Polarization

Swath Width

Spatial Resolution/ Number of Looks

Range of Incidence Angles

5.66 cm C-band

HH

500 km

100 m 42

208–498

5.66 cm C-band

VV

100 km

200 m 30

208–268

0.5–0.7 mm VIS

Nonpolarized, panchromatic

2600 km

2 km

46.68(left)–46.68 (right-looking)

Multi-Sensor Approach to Automated Classification of Sea Ice Image Data

563

(a) E75⬚

E70⬚

E80⬚

E85⬚

N75⬚

N74⬚

N73⬚

N72⬚

N71⬚

(b) E65⬚

E70⬚

E75⬚

E80⬚

E85⬚

E90⬚

E95⬚

N78⬚

N77⬚

N76⬚

N75⬚

N74⬚

N73⬚

FIGURE 25.1 RADARSAT Scan SAR (a) and Meteor 3/5 TV (b) images acquired on April 30, 1998. The icebreaker route and coastal line are shown. Flaw polynyas are marked by letters A, B, and C.

564

(a)

N75⬚

E76⬚

E78⬚

E80⬚

E82⬚

(b) E76⬚

E78⬚

E80⬚

E82⬚

(c)

N75⬚

N75⬚

N74⬚30'

N74⬚30'

N74⬚

N74⬚

N73⬚30'

N73⬚30'

N73⬚

N73⬚

E76⬚

E78⬚

E80⬚

E82⬚

N74⬚30'

N74⬚

FIGURE 25.2 Satellite images used for data fusion: (a) mosaic of ERS-2 SAR images, (b) a part of the RADARSAT Scan SAR image, and (c) a part of Meteor-3/5 TV image (April 30, 1998). Coastal line, fast ice edge [dark lines in (a) and (b)], and ERS image border are overlaid. The letters mark: (A) nilas, new ice, and open water, (B) first-year ice, and (C) young ice.

Signal and Image Processing for Remote Sensing

N73⬚30'

Multi-Sensor Approach to Automated Classification of Sea Ice Image Data

565

value is proportional to the logarithm of the backscatter coefficient. The scaling factor and a fixed offset, normally provided in CEOS radiometric data record, are used to obtain absolute values of the backscatter coefficients (sigma-zero) in decibels [32]. These parameters are not available for the relevant operational quantized 8-bit product, making retrieval of absolute values of sigma-zero difficult. However, in a supervised classification procedure it is important that only relative values of image brightness within a single image and across different images used in classification are preserved. Variations of backscatter coefficients of sea ice in the range direction are relatively large, varying from 4 dB (for dry multi-year ice) to 8 dB (for wet ice) [33], due to the large range of incidence angles from 208 to 498. The range-varying normalization, using empirical dependencies for the first-year (FY) ice dominant in the images, was applied to reduce this effect [33]. The uncompensated radiometric residuals for the other ice types presented in the images increase classification error. The latter effect may be reduced by application of texture and other statistical local parameters, or by restricting the range of incidence angles and training classification algorithms separately within each range. In this study we apply texture features, which depend on relative image values and thus should be less sensitive to the variations of image brightness in range direction. The two ERS SAR LRI (200 m spatial resolution) were processed in a similar way to the RADARSAT image. The image pixel value is proportional to the square root of backscatter coefficients [34], which is different from the RADARSAT pixel value representation where a logarithm function is used. The absolute values of the backscatter coefficients can be obtained using calibration constants provided by the European Space Agency (ESA) [34], but for this study we used only the pixel values derived from the processing described above. The visual image was obtained in the visible spectrum (0.5–0.7 mm) by the MR-900 camera system used onboard the Meteor-3/5 satellite. The swath width of the sensor is 2600 km and the spatial resolution is 2 km. For fusion purposes the coarse image is resampled to the same pixel size as RADARSAT and ERS images. Even though no clouds are observed in the image, small or minor clouds might be present but not visible in the image due to the ice-dominated background. For spatial alignment, the images were georeferenced using corner coordinates and ground control points and then transformed to the universal transverse mercator (UTM) geographical projection. The corresponding pixels of the spatially aligned and resampled images cover approximately the same ice on the ground. Because the images are acquired with time delay reaching 8 h 42 min for RADARSAT—Meteor images (Table 25.2) and several kilometers ice drift occur during this period, a certain mismatch of the ice features in the images are present. This is corrected for as much as possible, but there are still minor errors in the co-location of ice features due to rotation and local convergence or divergence of the drifting ice pack. The fast ice does not introduce this error due to its stationarity.

TABLE 25.2 Satellite Images and In Situ Data Sensor

Date/Time (GMT)

No. of Images

No. of In Situ Observations

RADARSAT Scan SAR ERS-2 SAR Meteor-3 TV camera MR 900

30 April 1998/11:58 30 April 1998/06:39 30 April 1998/03:16

1 3 1

56 25 >56

Signal and Image Processing for Remote Sensing

566 25.2.2

Visual Analysis of the Images

The ice in the area covered by the visual image shown in Figure 25.1b mostly consists of thick and medium FY ice of different deformations, identified by a bright signature in the visible image and various grayish signatures in the ScanSAR image in Figure 25.1a. Due to dominant easterly and southeasterly winds in the region before and during the image acquisition, the ice drifted westwards, creating the coastal polynyas with OW and very thin ice, characterized by the dark signatures in the optical image. Over new and young ice types, the brightness of ice in the visual image increases as the ice becomes thicker. Over FY ice types, increases in ice thickness are masked by high albedo snow cover. The coarse spatial resolution of the TV image reduces the discrimination ability of the classifier, which is especially noticeable in regions of mixed sea ice. However, two considerations need to be taken into account: firstly, the texture features computed over the relatively large SAR image regions are themselves characterized by the lower spatial resolution, and secondly, the neural network–based classifier providing nonlinear input–output mapping can theoretically mediate the later affects by combining low and high spatial resolution data. The physical processes of scattering, reflectance, and attenuation of microwaves determine sea ice radar signatures [35]. The scattered signal received depends on the surface and volume properties of ice. For thin, high salinity ice types the attenuation of microwaves in the ice volume is high, and the backscatter signal is mostly due to surface scattering. Multi-year ice characterized by strong volume scattering is usually not observed in the studied region. During the initial stages of sea ice growth, the sea ice exhibits a strong change in its physical and chemical properties [36]. Radar signatures of thin sea ice starting to form at different periods of time and growing under different ambient conditions are very diverse. Polynyas appearing dark in the visual image (Figure 25.1b, regions A, B, and C) are depicted by various levels of brightness in the SAR image in Figure 25.1a. The dark signature of the SAR image in region A corresponds mostly to grease ice formed on the water surface. The low image brightness of grease ice is primarily due to its high salinity and smooth surface, which results in a strong specular reflection of the incident electromagnetic waves. At C-band, this ice is often detectable due to the brighter scattering of adjacent, rougher, OW. Smooth nilas also appears dark in the SAR image but the formation of salt flowers or its rafting strongly increases the backscatter. The bright signature of the polynya in region B could be due to the formation of pancake ice, brash ice, or salt flowers on the surface of the nilas. A common problem of sea ice classification of SAR images acquired at single frequency and polarization is the separation of OW and sea ice, since backscatter of OW changes as a function of wind speed. An example of ice-free polynya can be found in region C in Figure 25.1a where OW and thin ice have practically the same backscatter. This vast polynya (Taimyrskaya), expanding far northeast, can be easily identified in the Meteor TV image in Figure 25.1b, region C, due to its dark signature (RADARSAT SAR image covers only a part of it). As mentioned before, dark signature in the visual image mainly corresponds to the thin ice and OW. Therefore, to some extent, the visual image is complementary to SAR data, enabling separation of FY ice with different surface roughness from thinner sea ice and OW. SAR images, on the other hand, can be used for classification of FY ice of different surface roughness, and separation of thin ice types.

25.2.3

Selection of Training Regions

For supervised classification it is necessary to define sea ice classes and to select in the image training regions for each class. The classes should generally correspond to the

Multi-Sensor Approach to Automated Classification of Sea Ice Image Data

567

TABLE 25.3 Description of the Ice Classes and the Number of Training and Test Feature Vectors for Each Class

Sea Ice Class 1. Smooth first-year ice 2. Medium deformation first-year ice 3. Deformed first-year ice 4. Young ice

5. Nilas 6. Open water

Description Very smooth first-year ice of medium thickness (70–120 cm) Deformed medium and thick (>120 cm) first-year ice, deformation is 2–3 using the 5-grade scale The same as above, but with deformation 3–5 Gray (10–15 cm) and gray–white (15–30 cm) ice, small floes (20–100 m) and ice cake, contains new ice in between floes space Nilas (5–10 cm), grease ice, areas of open water Mostly open water, at some places formation of new ice on water surface

No. of Training Vectors

No. of Test Vectors

150

130

1400

1400

1400

1400

1400

1400

1400

1400

30

26

World Meteorological Organization (WMO) terminology [37], so that the produced sea ice maps can be used in practical applications. WMO defines a number of sea ice types and parameters, but the WMO classification is not necessarily in agreement with the classification that can be retrieved from satellite images. In defining the sea ice classes, we combine some of the ice types into the larger classes based on a priori knowledge of their separation in the images and some practical considerations. For navigation in sea ice it is more important to identify thicker ice types, their deformations, and OW regions. Since microwave backscatter from active radars such as SAR is sensitive to various stages of new and young ice, multi-year FY ice and surface roughness, we have selected the following six sea ice classes for use in the classification: smooth, medium deformation, deformed FY ice, young ice, nilas, and open water OW. From their description given in Table 25.3 it is seen that the defined ice classes contain inclusions of other ice types because it is usually difficult to find ‘‘pure’’ ice types extended over large areas in the studied region. The selected training and test regions for different sea ice classes, overlaid on the RADARSAT SAR image, are shown by the rectangles in Figure 25.3. These are homogeneous areas that represent ‘‘typical’’ ice signatures as known a priori based on the combined analysis of the multi-sensor data set, image archive, bathymetry, meteorological data, and in situ observations. The in situ ice observations from the icebreaker were done along the whole sailing route between Murmansk and the Yenisei estuary. In this study we have mainly used the observations falling into the image fragment used for fusion, or located nearby, as shown in Figure 25.3a. Examples of photographs of various ice types are shown in Figure 25.4.

25.3 25.3.1

Algorithms Used for Sea Ice Classification General Methodology

To assess the improvement of classification accuracy that can be achieved by combining data from the three sensors we trained and tested several classifiers using different combinations of image features stacked in feature vectors. A set of the feature vectors

Signal and Image Processing for Remote Sensing

568 (a)

(b) Young ice

FY smooth

Nilas

FY medium deformation

Open water

FY deformed

FIGURE 25.3 Selection of the training and test regions: (a) fragment of the RADARSAT Scan SAR image (April 30, 1998) with the ship route and the image regions for different ice classes overlaid and (b) enlarged part of the same fragment.

computed for different ice classes is randomly separated into training and test data sets. The smaller dimensionality subsets have been produced from the original data sets containing all features and were used for training and validation of both algorithms in the experiments described below.

25.3.2

Image Features

The SAR image features used for the feature vectors are in three main groups: image moments, gray level co-occurrence matrix (GLCM) texture, and autocorrelation function– based features. These features describe local statistical image properties within a small region of an image. They have been investigated in several studies [6,9,10,38–41] and are in general found to be useful for sea ice classification. A set of the most informative features differs from study to study, and it may depend on several factors including geographical region, ambient conditions, etc. Application of texture usually increases classification accuracy; however, it cannot fully resolve ambiguities between different sea ice types, so that incorporation of other information is required. The texture features are often understood as a description of spatial variations of image brightness in a small image region. Some texture features can be used to describe regular patterns in the region, while others depend on the overall distribution of brightness. Texture has been used for a long time by sea ice image interpreters for visual classification of different sea ice types in radar images. For example, multi-year ice is characterized by a patchy image structure explained by the formation of numerous melting ponds on its surface during summer and then freezing in winter. Another example is the network of bright linear segments corresponding to ridges in the deformed FY ice. The texture depends on the spatial resolution of the radar, the spatial scale of sea ice surface, and volume inhomogeneity. There is currently a lack of information on large-scale sea ice properties, and as a consequence, on mechanisms of texture formation.

Multi-Sensor Approach to Automated Classification of Sea Ice Image Data (a)

(b)

(c)

(d)

(e)

(f)

569

FIGURE 25.4 Photographs of different sea ice types outlining different mechanisms of ice surface roughness formation. (a) Deformed first-year ice. (b) Gray–white ice, presumably formed from congealed pancake ice. (c) Rafted nilas. (d) Pancake ice formed in the marginal ice zone from newly formed ice (grease ice, frazil ice) influenced by surface waves. (e) Frost flowers on top of the first-year ice. (f) Level fast ice (smooth first-year ice).

In supervised classification the texture features are computed over the defined training regions and the classifier is trained to recognize similar patterns in the newly acquired images. Several texture patterns can correspond to one ice class, which implies the existence of several disjointed regions in feature space for the given class. The latter, however, is not observed in our data. The structure of data in the input space is affected by several factors including definition of ice classes, selection of the training regions, and existence of smooth transitions between different textures. In this study the training and test data have been collected over a relatively small geographic area where image and in situ data are overlapped. In contrast to this local approach, the ice texture

Signal and Image Processing for Remote Sensing

570

investigation can be carried out using training regions selected over a relatively large geographic area and across different seasons based on visual analysis of images [38]. Selection of ice types that may have several visually distinct textures can facilitate formation of disjointed or complex form clusters in the feature space pertinent for one ice type. Note that in this case MLP should show better results than the LDA-based algorithm. The approach to texture computation is closely related to the classification approach adopted to design multi-season, large geographic area classification system using (1) a single classifier with additional inputs indicating area and season (month number), (2) a set (ensemble) of local classifiers designed to classify ice within a particular region and season, and (3) a multiple classifier system (MCS). The trained classifier presented in this paper can be considered as a member of a set of classifiers, each of which performs a simpler job than a single multi-season, multi-region classifier. The image moments used in this study are mean value, second-, third-, and fourthorder moments, and central moments computed over the distribution of pixel values within a small computation window. The GLCM-based texture features include homogeneity, contrast, entropy, inverse difference moment [42], cluster prominence, and cluster shade. The autocorrelation function–based features are decorrelation lengths computed along 08, 458, and 908 directions. In total, 16 features are used for SAR image classification. Only the mean value was used for the visual image because of its lower spatial resolution. The texture computation parameters are selected experimentally, taking into account the results of previous investigations [6,9,10,38–41]. There are several important parameters that need to be defined for GLCM: (1) the computation window size; (2) the displacement value, also called interpixel distance; (3) the number of quantization levels; and (4) orientation. We took into account that the studied region contains mixed sea ice types, while defining these parameters. With increasing window size and interpixel distance (which is related to the spatial scale of inhomogeneities ‘‘captured’’ by the algorithm), computed texture would be more affected by the composition of ice types within the computational window rather than the properties of ice. Therefore in the hard classification approach adopted here, we selected the smaller window size equal to 55 pixels and interpixel distance equal to 2. This implies that we explore moderate scale ice texture. The use of macro texture information (larger displacement values) or multi-scale information (a range of different displacement values), recommended in the latest and comprehensive ice texture study [38], would require a soft classification approach in our case. To reduce the computational time, the range of image gray levels is usually quantized into a number of separate bins. The image quantization, generally leading to the loss of image information, does not strongly influence the computation of texture parameters on the condition that a sufficient number of bins are used (>16–32) [38]. In our experiments the range of image gray levels is quantified to the 20 equally spaced bins (see Ref. [38] for the discussion on different quantization schemes); the GLCM is averaged for the three different directions 08, 458, and 908 to account for possible rotation of ice. The training data set is prepared by moving the computational window within the defined training regions. For each nonoverlapping placement of the window, the image features are computed in three images and stacked in a vector. The number of feature vectors computed for different ice classes is given in Table 25.3.

25.3.3

Backpropagation Neural Network

In our experiments we used a multi-layer feedforward neural network trained by a standard backpropagation algorithm [43,44]. Backpropagation neural networks also known as MLP [45] are structures of highly interconnected processing units, which

Multi-Sensor Approach to Automated Classification of Sea Ice Image Data

571

are usually organized in layers. MLP can be considered as a universal approximator of functions that learns or approximates the nonlinear input–output mapping function using a training data set. During training the weights between processing units are iteratively adjusted to minimize an error function, usually the root-mean-square (RMS) error function. The simple method for finding the weight updates is the steepest descent algorithm in which the weights are changed in the direction of the largest reduction of the error, that is, in the direction where the gradient of the error function with respect to the weights is negative. This method has some limitations [25], including slow convergence in the areas characterized by substantially different curvatures along different directions in the error surface as, for example, in the long, steep-sided valley. To speed up the convergence, we used a modification of the method that adds a momentum term [44] to the equation: Dwt ¼ hrEt þ mDwt1

(25:1)

where Dwt is the weight change at iteration t, rEt is the gradient of the error function with respect to the weights evaluated at the current iteration, h is the learning rate parameter, and m is the momentum constant, 0 < jmj < 1. Due to the inclusion of the second term, the changes of weights, having the same sign in steady downhill regions of the error surface, are accumulated during successive iterations, which increases the step size of the algorithm. In the regions where oscillations take place, contributions from the momentum terms change sign and thus tend to cancel each other, reducing the step size of the algorithm. The gradients rEt are computed using the known backpropagation algorithm [43,44].

25.3.4

Linear Discriminant Analysis Based Algorithm

An LDA-based algorithm is proposed by Wackerman and Miller [6] for sea ice classification in the marginal ice zone (MIZ) using single channel SAR data. In this study it is applied for data fusion of different sensors. LDA is a known method for the reduction of dimensionality of the input space, which can be used at the preprocessing stage of the classification algorithm, to reduce the number of input features. This method is used to project the original, usually high-dimensional input space onto a lower dimensional one. The projection of n-dimensional data vector ~ x is done using the linear transformation ~ y ¼ VT~ x, where ~ y is the vector of dimension m (m
(25:2)

P P ~ i)(~ ~ i)T þ (~ ~ j)(~ ~ j)T is the total within-class covariance xi m xi m xj m xj m where Wij ¼ (~ ~ i and m ~j matrix, given as a sum of the two covariance matrices of ith and jth ice classes, m are the mean feature vectors of classes i and j, respectively; Bij is the between-class matrix, ~i m ~ j)(m ~i m ~ j)T, and ~ given by Bij ¼ (m ij is the transformation (projection) vector to which matrix V reduces in the two-class case. Vector ~ ij defines a new direction in feature space, along which separation of classes i and j is maximal. It can be shown that vector ~ ij

Signal and Image Processing for Remote Sensing

572

maximizing the clustering metric in Equation 25.2 is the eigenvector with the maximum eigenvalue l that satisfies the equation [47]: nij ¼ l~ nij Wij1 Bij~

(25:3)

In general, case classification of vectors ~ y can be performed using traditional statistical classifiers. The central limit theorem is applied since ~ y represents a weighted sum of random variables and conditional PDFs of ~ y are assumed to be multi-variate normal. The method is distribution-free in the sense that ‘‘it is a reasonable criterion for constructing a linear combination’’ [47]. It is shown to be statistically optimal if the input features are multi-variate normal [25]. Another assumption that needs to be satisfied when applying LDA is the equivalence of the class conditional covariance matrices for each class. These assumptions are difficult to satisfy in practice. However, the slight violation of these criteria does not strongly degrade the performance of the classifier [47]. In the limiting case LDA can be used to project the input space in one dimension only. By projecting feature vectors of pairs of classes, the multi-class classification problem can be decomposed into two-class problems. The constructed classifier is a piecewise linear classifier. For training of the classification algorithm and finding parameters of the classifier the following steps are performed [6]: ~ i (i ¼ 1, . . . , c), the between-class Bij, and within-class covar1. The mean vectors m iance matrices Wij (i ¼ 1, . . . , c; i ¼ 1, . . . , c; i 6¼ j) are estimated using the training data set; 2. The transformation vectors ~ ij (i ¼ 1, . . . , c; j ¼ 1, . . . , c; i 6¼ j) are found as eigenvectors of the matrix Wij1 Bij solving Equation 25.3 (since Bij~ ij has the ~i m ~ j and the scaling factor is not important, the ~ same direction as m ij can also ~i m ~ j) [48]). be found as ~ ij ¼ Wij1(m 3. The feature training vectors for ice classes i and j are projected on lines defined by ~ ij, computed in the previous step. The threshold tij between two classes is found as an intersection of two histograms. In total there are c2– c thresholds for c ice classes. During the classification stage, to see if a new vector ~ x belongs to class i, c – 1 projections pj ~ i)T ~ ¼ (~ xm x assigned to the class i if pj < tij ij, (j ¼ 1, . . . , c – 1; i 6¼ j) are computed and ~ for all j. If the latter condition is not satisfied vector ~ x is left unclassified.

25.4 25.4.1

Results Analysis of Sensor Brightness Scatterplots

The image brightness1 distribution for the six classes in the ERS and RADARSAT image is presented in the scatter diagram in Figure 25.5a. As mentioned before, the absolute values of backscatter coefficients were not available for the RADARSAT image and therefore only relative values between ice classes can be analyzed from the scatter diagram. It shows the variability of image brightness for most of the ice classes, exclusive 1

We use this term as equivalent to image value or image digital number (DN).

Multi-Sensor Approach to Automated Classification of Sea Ice Image Data

573

(a)

RADARSAT Brightness

100 Nilas FY medium deformation FY deformed Young ice FY smooth Open water

90

80

70

60

50

40 20

40

60

−21

−14

−11

80 −9

100 −7

120 −5

140 −4

ERS Brightness s0 (dB)

(b)

250

Meter Brightness

200 150 100 50 0 20

120 100 40

80

60

80 100 120 140 ERS Brightness

40

60 RADARSAT Brightness

FIGURE 25.5 Image brightness scattergrams for (a) ERS, RADARSAT and (b) ERS, RADARSAT SAR, and Meteor television camera images plotted using a subset of training feature vectors.

FY smooth ice is relatively high, and the clusters in two-dimensional subspace are overlapped. The exception is OW for which polarization ratio is as high as 16 dB [49] (VV/HH), estimated using the CMOD4 model (a semiempirical C-band model that describes the dependency of backscatter signal on wind speed and image geometry [50]), with wind speed values of 3–4 m/s measured onboard the ship. The corresponding cluster is located away from those of sea ice, thus separation of OW from ice can be

Signal and Image Processing for Remote Sensing

574

achieved using both HH and VV SAR sensors if the wind speed is in a certain range. The changes of wind speed and, consequently, changes of OW radar signatures from those used for training, can decrease classification accuracy. In the latter case incorporation of visual data (see Figure 25.5b), where OW and nilas can be well separated from the ice independently on wind speed, is quite important. The yearly averaged wind speeds at the Kara Sea are in the range of 5–8 m/s for different coastal stations [51], which suggests that the calm water conditions (wind speed below 3 m/s) are less common than wind roughed OW conditions. The variation in backscatter from FY ice mostly depends on surface roughness, thus FY smooth ice appears dark in the images. Nilas has higher backscatter than FY smooth ice due to rafting and formation of frost flowers on its surface. An example of frost flowers is shown in Figure 25.4e. The relatively high variation of backscatter can be partially explained by the spatial variation of surface roughness, and the existence of OW areas between the ice. These factors may also influence the other ice type signatures (FY deformed and young ice) often containing OW areas in small leads between ice floes. Formation of new ice in leads and existence of brash ice between larger ice floes can also modify ice radar signatures. The backscatter from young ice in the diagram is high due to the small size of the ice floes. The raised, highly saline ice edges are perfect scatterers and the backscatter signal integrated over large areas of young ice is often typically high.

25.4.2

MLP Training

In our study we used a standard software package—Stuttgart Neural Network Simulator (SNNS) [52]—developed at the University of Stuttgart in collaboration with other universities and research institutes. In addition to a variety of implemented neural network architectures it provides a convenient graphical user interface. The proper representation of the neural network inputs and outputs is important for its performance. There are several choices of target coding schemes. The ‘‘one from n’’ or ‘‘winner-takes-all’’ coding scheme is often used for classification. According to this scheme a desired target value of an output unit corresponding to true class of the input vector is unity, and the target values of all other units are zero. In cases where significantly different numbers of training vectors are used for each class, the neural network biases strongly in favour of the classes with the largest membership, as shown in Ref. [53]. Neural network training using this technique implicitly encodes proportions of samples within classes as prior probabilities. In our training data set the number of training vectors for each class does not correspond to their prior probabilities. These probabilities depend on many factors and are generally unknown. We assumed them to be equal and adopted the following modification of the coding scheme [53]: Ti ¼

pﬃﬃﬃﬃ xi 2 class i 1= N i if ~ 0 otherwise,

where Ti is the target value of class i, and Ni is the number of patterns in that class. Since the number of vectors in several classes are large, their target values do not differ much from zero. Therefore we linearly scaled the target values to span the range [0,1]. Different parameters of the MLP used in our experiments are presented in Table 25.4 and Table 25.5 along with classification results. In notation nu1–nu2–nu3, nui (i ¼ 1 – 3) is the number of units in the input, hidden, and output layers of the neural network, respectively. The number of input units of the network is equivalent to the number of

Multi-Sensor Approach to Automated Classification of Sea Ice Image Data

575

TABLE 25.4 MLP Percentage of Correct Classification on the Test Data Set Image Features/BPNN Parameters ERS mean and texture/16–20–6, 100 cycles RADARSAT mean and texture/16–20–6, 100 cycles ERS, RADARSAT mean and texture/32–40–6, 300 cycles ERS, RADARSAT mean and texture, Meteor mean/33–40–6, 300 cycles

FY Smooth

FY Medium

FY Def.

Young Ice

Nilas

OW

Total

82.3

55.1

71.9

82.9

52.4

76.9

66.0

68.5

70.9

51.2

89.5

72.6

40.0

70.7

88.5

78.3

80.5

93.8

82.4

96.2

83.9

99.2

87.1

84.6

94.1

98.2

100.0

91.2

input image features and nu3 corresponds to the number of ice classes. We found sufficient to have one layer of hidden neurons. The number of units in this layer is determined empirically. In Section 25.4.6 we use a pruning method to evaluate the number of hidden neurons more precisely. This number should generally correspond to the complexity of the task. The more neurons are in the hidden layer, the more precise the achieved approximation of the input–output mapping function. It is known that the fine approximation of the training data set does not necessarily lead to the improved generalization of the neural network, i.e., the ability to classify previously unseen data. Therefore, nu2 is determined as a compromise between the two factors and it usually increases with increasing the nu1. The following parameters are used for the MLP training: the learning rate h and the momentum parameter a are set to 0.5 and 0.2, respectively. The weights of the MLP are randomly initialized before training. They are updated after each presentation of the training vector to the neural network, that is, online learning is used. At each training cycle all vectors of the training data set are selected at random and sequentially provided to the classifier. The activation function is the standard logistic sigmoid function. In the experiments below we present accuracies estimated using the test data set. The absolute differences between the total classification accuracies of the test data set (Table 25.4 and Table 25.6) and those of the training data set (not shown) are less than 1.1 percent point, which indicates good generalization of the trained algorithms. TABLE 25.5 MLP Percentage of Correct Classification on the Test Data Set Using Only Mean Values of Sensor Brightness and a Reduced Set of Texture Features (Last Row) Image Features/MLP Parameters ERS and RADARSAT mean values/2–10–6, 80 cycles ERS, RADARSAT, and Meteor mean values/3–10–6, 80 cycles ERS and RADARSAT mean values, fourth-order central moment, cluster shade, Meteor mean value/7–6–6, 200 cycles

FY Smooth

FY Medium

FY Def.

Young Ice

Nilas

OW

Total

91.5

68.3

71.1

82.7

78.1

100.0

75.5

99.2

87.0

69.4

84.0

97.0

100.0

84.8

99.2

84.6

81.8

93.1

97.3

100.0

89.5

Signal and Image Processing for Remote Sensing

576 TABLE 25.6

LDA Percentage of Correct Classification on the Test Data Set Image Features ERS mean and texture RADARSAT mean and texture ERS and RADARSAT mean and texture ERS, RADARSAT mean and texture, Meteor mean

25.4.3 25.4.3.1

FY Smooth

FY Medium

FY Def.

Young Ice

Nilas

OW

Total

93.1 43.1 91.5

50.6 60.6 74.4

67.9 65.9 80.6

81.4 89.1 93.2

55.2 60.9 81.3

92.3 46.2 100.0

64.6 68.5 82.6

99.2

86.6

81.7

93.4

97.4

100.0

90.1

MLP Performance for Sensor Data Fusion ERS and RADARSAT SAR Image Classification

The estimated accuracy for separate classification of ERS and RADARSAT images by the MLP-based algorithm are shown in the first two rows of Table 25.4. The mean value of image brightness and image texture are used in both cases. As evident from the table, the single source classification provides poor accuracy for several ice classes, such as FY ice, nilas, and OW. This is expected since corresponding clusters are difficult to separate along axes of the scatterplot in Figure 25.5a. OW, which is visually dark in RADARSAT and bright in ERS images, is largely misclassified as FY smooth ice in RADARSAT image and classified with higher accuracy in ERS images. The evaluated total classification accuracies are 66.0% and 70.7% for ERS and RADARSAT, respectively.

25.4.3.2

Fusion of ERS and RADARSAT SAR Images

The joint use of ERS and RADARSAT image features increases the number of correctly classified test vectors to the value of about 83.9% which is 17.9 and 13.2 percent points higher than those values obtained using ERS and RADARSAT SAR separately. It is known that the radar signatures of sea ice vary across different Arctic regions (see for example Ref. [54] for multi-year ice) and are subject to the seasonal changes [35] that should influence the classifier performance. The number of ice types discriminated is maximal in winter and less in other seasons. From late spring to late summer, only two classes, namely, ice and water are most likely to be separated [4]. This limits the classifier applicability to the particular season that the classifier is trained for and to a certain geographical region. As seen from the table, the improvements are observed for all ice classes. The classification accuracy of OW is much higher, due to the incorporation of polarimetric information. This increase is as much as 56.2 and 19.3 percent points compared with RADARSAT and ERS images classified separately. Classification of the dual polarization data not only facilitates separation of classes with large polarization ratios, but also those with relatively small ratios, which have similar tonal signatures on ERS and RADARSAT images. The improvement in classification of young ice, medium, and rough FY ice with a pronounced texture, can be due to incorporation of texture features in the classification discussed in the next sections. The latter factor can also explain the higher increase in accuracy observed in our experiment as compared with previous studies [30], which demonstrated 10 percent points increase in the total classification accuracy (63.6% vs. 52.4%) for the

Multi-Sensor Approach to Automated Classification of Sea Ice Image Data

577

three ice types using fully polarimetric data obtained at C-band, although the direct comparison of the results is rather difficult. 25.4.3.3

Fusion of ERS, RADARSAT SAR, and Meteor Visible Images

Combination of visible and SAR images improves estimated accuracy up to 91.2% (the last row in Table 25.4). The increase in classification accuracy gained by fusion of a single polarization SAR (ERS-1) with Landsat TM images was demonstrated earlier by using maximum likelihood classifier [24]. Our results show that visual data are also useful in combination with the multi-polarization SAR data. A significant improvement of about 16 percent points is observed for nilas, since the optical sensor has a large capacity to discriminate between nilas and OW (see Figure 25.5b). An increase in classification accuracy is also observed for the smooth and deformed FY ice because less of their test vectors are misclassified as nilas. The improvement is insignificant for OW and young ice because these classes have already been well separated using the polarimetric data set. It should be mentioned that the importance of optical data for classification of OW using C-band should increase in the near range because polarization ratio (VV/HH) decreases with decreasing incident angle [55] and is small in the directions close to nadir. Another important factor favoring incorporation of optical clear sky data is the generalization of the classification algorithm over the images containing OW areas, acquired under different wind conditions (not available in this study). Since backscattering from OW, among other factors, depends on wind speed and direction [55], it would be more difficult for a classifier utilizing only polarimetric information to generalize well over OW regions, unless wind speed and direction, obtained from the other data sources (scatterometer derived wind fields, ground-truth data) are presented to the classifier.

25.4.4

LDA Algorithm Performance for Sensor Data Fusion

The classification accuracy of the LDA algorithm, estimated using the same combinations of image features as for MLP are presented in Table 25.6. The MLP slightly outperforms the LDA algorithm with an accuracy increase of 1.2–2.2 percent points for different feature combinations. In interpreting this result we should mention the close relationship between neural networks and traditional statistical classifiers. It is shown that a feed forward neural network without hidden layers approximates an LDA [25], and multilayer neural network with at least one hidden layer of neurons provides nonlinear discrimination between classes. For the performance and selection of the classifier, a structure of the input space, the form of clusters, and their closeness in the feature space, are of primary importance. If the clusters have a complex form and are close to each other, construction of nonlinear decision boundaries may improve discrimination. By experimentally comparing LDA- and MLP-based algorithms, we compare linear and nonlinear discriminant methods applied for the sea ice classification problem. Our results suggest that the used feature space is fairly simple, and that the ice classes can be linearly separated piecewise. Construction of nonlinear decision boundaries can only slightly improve classification results. The assessed classification accuracies for individual classes are different. Some ice classes are better classified by the LDA algorithm, but there is a reduction in classification accuracy for the other classes. Taking into account the longer training time for MLP, compared with the LDA algorithm (5 min 59 sec vs. only 12 sec

578

Signal and Image Processing for Remote Sensing

respectively2) the latter may be preferable when the time factor is important. However, a greater reduction of training and classification time can be gained by using less input features or even just using image brightness values, since computation of texture is time consuming (31 min 39 sec, 16 parameters over the 2163 2763 single image in nonoverlapping window3). Therefore, in the next section we will empirically evaluate the usefulness of texture computation for a multi-sensor data set.

25.4.5

Texture Features for Multi-Sensor Data Set

Assuming that texture is useful for classification of single polarization SAR data [6,9,10,38–41], the question that we would like to address is ‘‘are the texture features still useful for classification of the multi-polarization data set?’’ For this purpose the MLP is trained using only mean values of image brightness, and the test results obtained (Table 25.5, the first two rows) are compared with those presented earlier (Table 25.4, the last two rows), where texture is used. By comparing them, one notices the increase of total classification accuracy from 75.5% (no texture) to 83.9% when texture is computed for a combination of ERS and RADARSAT. The improvement gained by the incorporation of texture is largest for the FY rough ice, young ice, and nilas. It is interesting to mention that computation of texture features for the dual polarization data set (VV and HH) provides almost the same accuracy (83.9%) as reached by the fusion of data from all three sensors without texture computation (84.8%), as shown in the second row of Table 25.5. This is useful because visual images may be unavailable due to the weather or low light conditions so that texture computation for the polarimetric data set can, at least partially, compensate for the absence of optical data. As expected, sea ice classes with a developed texture, such as young and deformed first ice, are better classified when texture features are computed for the dual polarization data set, while the estimated classification accuracy of other classes are higher using brightness from all three sensors without texture computation. The texture features are still useful when data from all three sensors are available, gaining an improvement of 6.4 percent points (91.2% vs. 84.8%).

25.4.6

Neural Network Optimization and Reduction of the Number of Input Features

In the previous sections we considered texture features as comprising one group. Nevertheless, some of the selected features may carry similar information on sea ice classes, thus increasing the dimensionality of the input feature space and necessitating the use of larger training data sets. To exclude the redundant features and simultaneously optimize the structure of the neural network we apply pruning methods [56] that trim or remove units or links of the neural network (depending on the method), and evaluate some parameters of neural network performance. The stepwise LDA is one of the methods that can be used for the analysis and selection of the input features for the linear classifier; however, application of this method may not be optimal for the nonlinear MLP. 2

33 input features, 300 training cycles for MLP, Sun Blade 100 workstation with 500 MHz UltraSPARC II CPU are used. 3 The computation, perhaps, can be made faster by the code optimization.

Multi-Sensor Approach to Automated Classification of Sea Ice Image Data

579

42 40 38 36

Sum-squared error

34 32 30 28 26 24 22 20 18 16 14

A 0

200

400

600

B 800

1000

1200

1400

No. of training cycles FIGURE 25.6 Change of the sum-squared error during initial training and pruning of the MLP.

The largest trained neural network (33–40–6) is selected for the experiments. During pruning the redundant units or links are sequentially removed. After this the neural network is additionally trained to recover from the change in its structure. Figure 25.6 shows the change of sum-squared error during initial training and pruning using the skeletonization algorithm [57]. This is a sensitivity type algorithm that estimates the relevance of each unit by computing derivative of a linear error function with respect to the switching off and on of the unit using attenuation strength parameter. These errors propagate backward using a similar algorithm to that used for training of the MLP. The picks in the error plot correspond to removal of the input and hidden units. Approximately half of the input and several hidden units are removed at the middle of the pruning (letter A). As shown, their removal at this stage does not strongly effect classification. Increasing the number of removed units causes the MLP performance on the training data set to decrease. This process continues until only three input units, corresponding to the three sensor image brightnesses and six hidden units, are left (letter B in Figure 25.6). In the next section the two neural networks similar to those obtained at the end and at the middle of pruning will be analyzed. 25.4.6.1 Neural Network with No Hidden Layers We start with the description of the neural network consisting of three input and no hidden units. Its Hinton diagram is shown in Figure 25.7. Connections between the input and output units are represented by squares in the diagram. The size and color of the

Signal and Image Processing for Remote Sensing

580 ERS

RADARSAT Meteor

4

Nilas

5

FY medium deformation

6

FY deformed

7

Young ice

8

FY smooth

9

Open water 1

2

3

FIGURE 25.7 The Hinton diagram of the MLP with no hidden layers (3–6). The squares at the diagram represent weights between input (no. 1–3) and output units (no. 4–9) of the neural network. The size of the squares lineally scales with the absolute value of weights from 20.0 to 0, dark and white colors mark positive and negative weights, respectively.

squares correspond to the strength of the connections, and their sign, respectively. In a neural network without hidden layers these weights affect the incoming signals, directly defining the contribution of each sensor (i.e., input unit) to the class outputs. As known, this neural network provides a linear discrimination between classes. Analyzing the diagram, we find the large Meteor image values, which were changed by positive weight connections, (last column in the diagram), increase the FY ice output signal, and correspondingly, a posterior probability of the input vector to belong to the FY ice. Simultaneously, the probability for nilas class is reduced due to the negative weight between the respective input and output units. The high ERS image values increase FY deformed ice output through the positive weight connection (first column in the diagram) and decrease it for smooth and medium deformation FY ice. The test vector most probably belongs to OW when the ERS image value is high and the RADARSAT image value is low. There is no redundancy in using these sensors: if FY deformed ice can be separated using a combination of ERS and Meteor sensors, RADARSAT SAR data need not be involved. The latter is also true for the young ice. The other sensors can indirectly influence the assignment of the vector to a particular class by decreasing or increasing outputs for the other classes. 25.4.6.2

Neural Network with One Hidden Layer

Neural networks obtained in the middle stages of the pruning (Figure 25.6, letter A) correspond to some compromise between neural network complexity (i.e., number of input and hidden units) and the RMS error for the training data set. An increase in this

Multi-Sensor Approach to Automated Classification of Sea Ice Image Data

581

error does not necessarily imply a decrease in the classification rate for the test data set, since generalization of the neural network may improve. This neural network still has a number of hidden units and can thus provide nonlinear discrimination between ice classes. The outputs of the neural network on presenting the feature vector to the classifier are proportional to the class conditional probabilities [25] for the type of neural network and target coding schema used here. During several pruning trials, neural networks of different structure and with different combinations of input features are obtained, which implies the existence of several minima in the multi-dimensional error surface. The application of pruning methods does not guarantee finding a global minimum, thus several MLPs with similar performances on the test data set are obtained. The Hinton diagram of the MLP with the structure 7–6–6 is shown in Figure 25.8. In addition to the brightness for each of the three sensors, it uses two texture features: a fourth-order central statistical moment and a cluster shade for ERS and RADARSAT images. For its analysis it is convenient to represent the performance of each hidden neuron as a linear combination of several input features, weighted by a sigmoid activation function. There are six such combinations (see first six rows of the diagram), corresponding to the six hidden neurons. In the first two of these, the individual sensor brightness values play a dominant role through the large positive weights. The contribution of each combination to the formation of different class outputs can be analyzed by looking at weights between the hidden and output units (shown in the last six columns of the diagram and marked by arrows). The role of the first two combinations is similar to

ERS 4th Mean moment

RADARSAT Meteor 4th CS Mean CS Mean moment Hidden unit 1

8

Hidden unit 2

9

Hidden unit 3

10

Hidden unit 4

11

Hidden unit 5

12

Hidden unit 6

13 14

Nilas

15

FY medium

16

FY deformed

17

Young ice

18

FY smooth

19

Open water 1

2

3

4

5

6

7

8

9

10

11

12

13

FIGURE 25.8 The Hinton diagram of the MLP with one hidden layer of neurons (7–6–6). The squares at the upper left-hand corner of the diagram represent weights between input (no. 1–7) and hidden units (no. 8–13), the squares at the lower right-hand corner of the diagram represent weights between hidden and outputs units (no. 14–19) of the neural network. The size of the squares lineally scales with the absolute value of weights from 15.0 to 0, dark and white colors mark positive and negative weights, respectively. The used abbreviation: CS, cluster shade.

Signal and Image Processing for Remote Sensing

582

the individual sensor contributions in the network without hidden neurons described earlier. However, the availability of the sixth combination, where the ERS and RADARSAT image brightness are combined together, modifies the weight structure. For example, a negative contribution of ERS brightness (1st hidden unit) to the FY medium ice output is partially compensated by positively weighted ERS brightness of the 6th combination. Eventually, only RADARSAT brightness plays a dominant role in output of FY medium ice. From the diagram it is seen that a cluster shade feature contributes little to the sea ice separation due to the small weights connecting the corresponding input units with hidden units (2nd and 6th columns). Perhaps this feature can be removed when data from all three sensors are used in classification since inclusion of cluster shade provides only about a 0.5 percent point increase in classification accuracy (5–6–6, 200 cycles). The fourth-order statistical moment mostly takes part in formation of FY deformed and young ice outputs through the 3rd, 4th, and 5th hidden units. This observation is supported by the increase in classification accuracy for these classes when the texture features are used (last row of Table 25.5). It is evident from the classification results that the role of texture increases with decreasing number of sensors used in classification (8.4 vs. 6.4 percent points for 2 and 3 sensors, respectively). More texture features in addition to those shown in the diagram may be required for the two-sensor image classification (ERS and RADARSAT). The suitable candidates are variation and entropy. Analysis of the pruning sequences suggests that texture features computed over VV and HH polarization images are not fully redundant (see also Ref. [58]); therefore, it seems reasonable to use the same feature sets for ERS and RADARSAT images. 25.4.7

Classification of the Whole Image Scene

Sea ice maps produced by different classifiers (Table 25.4 and Table 25.6) are presented in Figure 25.9 and Figure 25.10 for the MLP and LDA-based algorithms, respectively. (a)

(b)

(c)

FY smooth

FY medium deformation

FY deformed

Young ice

Open water

Nilas

(d)

FIGURE 25.9 (See color insert following page 302.) MLP sea ice classification maps obtained using: (a) ERS; (b) RADARSAT; (c) ERS and RADARSAT; and (d) ERS, RADARSAT, and Meteor images. The classifiers’ parameters are given in Table 25.4.

Multi-Sensor Approach to Automated Classification of Sea Ice Image Data (a)

(b)

(c)

FY smooth

FY medium deformation

FY deformed

Young ice

Open water

Nilas

583 (d)

Not classified

FIGURE 25.10 (See color insert following page 302.) LDA sea ice classification maps obtained using: (a) ERS; (b) RADARSAT; (c) ERS and RADARSAT; and (d) ERS, RADARSAT, and Meteor images.

No postclassification filtering or smoothing is performed. The color legend is based on the WMO-defined color code with the following discrepancies: firstly, different gradations of green reflect ice surface roughness and, secondly, the red color marks OW, to separate it from nilas. The ERS and RADARSAT image classification maps are visually rather poor, and this is independent of the type of algorithm used. Some parts of the polynya (regions A and B in Figure 25.9) are misclassified as FY ice and, conversely, some areas of the FY ice (regions C and D) are misclassified as nilas. This also applies to the LDA classification maps (Figure 25.10), which appear a bit more ‘‘noisy’’ because some image areas are left unclassified by the LDA algorithm (dark pixels) and some areas of smooth FY ice are largely misclassified as OW in the case of the RADARSAT image in Figure 25.10b. Due to classification errors, some of the objects cannot be visually identified in the images in the case of the single sensor classification. Sea ice classification is improved by combining the two SAR images (Figure 25.9c and Figure 25.10c). OW, young, and FY ice are much better delineated using dual polarization information. Classification of sea ice using three data sources (Figure 25.9d and Figure 25.10d) in general corresponds well to the visual expert classification shown in Figure 25.11, but with more details in comparison to the usual manually prepared sea ice maps. To generalize ice zones well (some applications do not require detailed ice information) and reduce classification time, larger ice zones are usually manually outlined, partial concentrations of ice types within these zones are (often subjectively) evaluated and presented in an ice chart using the WMO egg code (not used in Figure 25.11). As can be seen from the examples the detailed presentation of information in automatically produced sea ice maps can be useful in tactical ice navigation, where small-scale ice information is needed. For applications that do not require a high level of detail, the partial ice concentrations can be calculated more precisely based on the results of

Signal and Image Processing for Remote Sensing

N74⬚

584

2

3

1

1

2

6

4

6

1 2

2 6

1

5

N73⬚

2 3

6a

5

6

6

6

5

N72⬚

7

7

N71⬚

6

E74⬚ 0

50

E76⬚ 100

E78⬚ 150

200

E80⬚

E82⬚

E84⬚

E86⬚

250

km FIGURE 25.11 Expert interpretation of RADARSAT Scan SAR image (30 April 1998) made by N. Babitch (Murmansk Shipping Company). (1) The darker areas are mainly large thick FY ice while the brighter areas are young ice. The darkest areas between the FY and young ice is nilas. (2) Heavily deformed thin and medium thick FY ice with some inclusions of young ice. (3) Nilas (dark) and young ice (bright). (4) A difficult area for interpretation. Probably it is a polynya with a mixture of water and thin ice types such as grease ice and nilas. (5) Mixture of open water and very thin ice. (6) Fast ice of various age and deformation. The darker areas are undeformed ice while brighter areas are more deformed ice. The brightest signature in the river estuaries can be due to freshwater ice from the rivers. (6a) Larger floes of fast ice drifting out from the Yenisei estuary. (7) The bright lines in the fast ice are ship tracks. (From O.M. Johannessen et al., SAR sea ice interpretation guide. NERSC Tech. Rep. No. 227, Bergen, Norway, 2002. With Permission.)

automatic classification. The major difference between expert and automatic classifications is in the assignment of bright segment no. 2 in the upper right hand corner of Figure 25.11, which has been assigned to heavily deformed FY ice by ice expert and to young ice in our case. Some discrepancies in classification results can arise because in situ and optical data were not used for the expert classification. In many cases, the incorporation of low-resolution visible information reduces the noise in the classification, while preserving small-scale ice features. The nilas and fast ice (regions E and F in Figure 25.9d, respectively) are correctly classified, while some relatively thin strips of rafted ice in polynya (region G) are still retained in the classified images. Classification maps produced by the algorithms look rather similar except for some minor details: the strips of rafted ice (region G) are better delineated by the

Multi-Sensor Approach to Automated Classification of Sea Ice Image Data

585

MLP-based algorithm and the region of OW in the low right hand corner of the images (marked by letter H in Figure 25.9d) is better classified by LDA-based algorithm. It is interesting to mention that in the latter case only, the transition from OW to nilas and then to young ice is correctly outlined in the rather complex region (H). Such transition zones are observed many times during the expedition and this classification result seems reasonable. It is also interesting to see how fast ice in region F in Figure 25.9d is delineated by different classifiers (Figure 25.9 and Figure 25.10a–d).

25.5

Conclusions

In this paper we have presented a sea ice classification algorithm that fuses data of different satellite sensors. The algorithm is based on a multi-layer backpropagation neural network and utilizes ERS, RADARSAT SAR and low-resolution optical images. Radiometric correction, spatial alignment, and co-registration of images from these three sensors have been done before their fusion. The selection of image test and training regions is based on in situ observations made onboard the Russian nuclear icebreaker ‘‘Sovetsky Soyuz’’ in April, 1998. The image features used include image moments, graylevel co-occurrence matrix texture, and autocorrelation function–based features for the SAR image and mean brightness for visible image. The empirical assessment of the algorithms via the independent test data set showed that the performance of MLP with only one data source (SAR) is rather poor (66.0% and 70.7% for the ERS and RADARSAT SAR, respectively), even if texture features are used in the classification. A substantial improvement in sea ice classification is achieved by combining ERS and RADARSAT SAR images obtained at two different polarizations (VV and HH). Fusion of these two SAR data types increases the classification accuracy up to 83.9%. This noticeable improvement is observed for all ice classes. These results suggest that an improvement in ice classification can be also expected for ENVISAT ASAR operating at alternating polarization mode and for future RADARSAT-2 SAR with its several multi-polarization modes. Incorporation of visual data provides additional information, especially on the nilas and OW classes. The estimated total classification accuracy reaches 91.2% when the low-resolution visual data are combined with the dual polarization data set. In cases where visual data are unavailable, computation of texture features is found to be particularly useful, since it can, at least partially, compensate for the reduction in accuracy due to the absence of low-resolution visual data. Both of the considered MLP and LDA algorithms show similar results when applied for sea ice classification. This implies that the ice classes can be linearly separated piecewise in multi-dimensional feature space. Incorporation of texture features into the classification improves the separation of various ice classes, although its contribution decreases with an increasing number of sensors used in classification. Computation of texture is time consuming. Application of a pruning method allowed us to reduce the number of input features of the neural network, to adjust the number of hidden neurons, and, as a result, to decrease classification time. These methods are not free from entrapment in the local minima of the error surface, thus the application of evolutionary optimization methods for the neural network structure optimization is a possible future development. The other interesting direction of research can be in combining the outputs of several classifiers in an MCS [60]. The differences in classifier performances for different ice classes (especially in single polarization image classification) can be accommodated in MCS to improve overall classification accuracy.

586

Signal and Image Processing for Remote Sensing

In this study we used an extensive set of in situ observations collected on board the icebreaker sailing in ice; however, the studied area and the number of observations are still quite small to describe the variety of possible ice conditions. The presented results should therefore be taken with care, because they are obtained for a relatively small geographical region and during the short time period in cold conditions. With new data acquisition it is necessary to assess the generalization of the algorithm to a larger geographical region and to a larger number of seasons.

Acknowledgments This work was fulfilled during the first author’s Ph.D. study at the Nansen Environmental and Remote Sensing Centres in St. Petersburg (Russia) and Bergen (Norway). The study was sponsored by the Norwegian Research Council. The first author would like to thank his scientific supervisor Dr. V.V. Melentyev and Prof. V.D. Pozdniakov for their help and valuable advice. The ice data sets were obtained through ‘‘Ice Routes’’ (WA-96-AM-1136) and ‘‘ARCDEV’’ (WA-97-SC2191) EU Projects. The authors are also grateful to Dr. C.C. Wackerman (Veridian Systems Division, Ann Arbor, Michigan), SNNS developers for providing the software packages, and Dr. G. Scho¨ner and Dr. C. Igel (INI, Bochum) for their suggestions for improving the manuscript.

References 1. L.K. Soh, C. Tsatsoulis, and B. Holt, Identifying ice floes and computing ice floe distributions in SAR images, in Analysis of SAR Data of the Polar Oceans, C. Tsatsoulis and R. Kwok, Eds., New York: Springer-Verlag, pp. 9–34, 1998. 2. D. Haverkamp and C. Tsatsoulis, Information fusion for estimation of summer MIZ ice concentration from SAR imagery, IEEE Trans. Geosci. Rem. Sens., 37(3), 1278–1291, 1999. 3. J. Banfield, Automated tracking of ice floes: a stochastic approach, IEEE Trans. Geosci. Rem. Sens., 29(6), 905–911, 1991. 4. R. Kwok, E. Rignot, B. Holt, and R. Onstott, Identification of sea ice types in spaceborne synthetic aperture radar data, J. Geophys. Res., 97(C2), 2391–2402, 1992. 5. Y. Sun, A. Calrlstro¨m, and J. Askne, SAR image classification of ice in the Gulf of Bothnia, Int. J. Rem. Sens., 13(13), 2489–2514, 1992. 6. C.C. Wackerman and D.L. Miller, An automated algorithm for sea ice classification in the marginal ice zone using ERS-1 synthetic aperture radar imagery, Tech. Rep., ERIM, Ann Arbor, MI, 1996. 7. M.M. Van Dyne, C. Tsatsoulis, and F. Fetterer, Analyzing lead information from SAR images, IEEE Trans. Geosci. Rem. Sens., 36(2), 647–660, 1998. 8. C. Bertoia, J. Falkingham, and F. Fetterer, Polar SAR data for operational sea ice mapping, in Analysis of SAR Data of the Polar Oceans, C. Tsatsoulis and R. Kwok, Eds., New York: SpringerVerlag, pp. 201–234, 1998. 9. M.E. Shokr, Evaluation of second-order texture parameters for sea ice classification from radar images, J. Geophys. Res., 96(C6), 10625–10640, 1991. 10. D.G. Barber, M.E. Shokr, R.A. Fernandes, E.D. Soulis, D.G. Flett, and E.F. LeDrew, A comparison of second-order classifiers for SAR sea ice discrimination, Photogram. Eng. Remote Sens., 59(9), 1397–1408, 1993.

Multi-Sensor Approach to Automated Classification of Sea Ice Image Data

587

11. D. Haverkamp, L.K. Soh, and C. Tsatsoulis, A comprehensive, automated approach to determining sea ice thickness from SAR data, IEEE Trans. Geosci. Rem. Sens., 33(1), 46–57, 1995. 12. A.K. Fang, Microwave Scattering and Emission Models and Their Applications, Norwood, MA: Artech House, 1994. 13. D.P. Winebrenner, J. Bredow, M.R. Drinkwater, A.K. Fung, S.P. Gogineni, A.J. Gow, T.C. Grenfell, H.C. Han, J.K. Lee, J.A. Kong, S. Mudaliar, S. Nghiem, R.G. Onstott, D. Perovich, L. Tsang, and R.D. West., Microwave sea ice signature modelling, in Geophysical Monograph 68: Microwave Remote Sensing of Sea Ice, F.D. Carsey, Ed., Washington, DC: American Geophysical Union, pp. 137–175, 1992. 14. R.G. Onstott, R.K. Moore, and W.F. Weeks, Surface-based scatterometer results of Arctic sea ice, IEEE Trans. Geosci. Rem. Sens., GE-17, 78–85, 1979. 15. O.M. Johannessen, W.J. Campbell, R. Shuchman, S. Sandven, P. Gloersen, E.G. Jospberger, J.A. Johannessen, and P.M. Haugan, Microwave study programs of air–ice–ocean interactive processes in the Seasonal Ice Zone of the Greenland and Barents seas, in Geophysical Monograph 68: Microwave Remote Sensing of Sea Ice, F.D. Carsey, Ed., Washington, DC: American Geophysical Union, pp. 261–289, 1992. 16. K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd ed., New York: Academic Press, 1990. 17. F.M. Fetterer, D. Gineris, and R. Kwok, Sea ice type maps from Alaska Synthetic Aperture Radar Facility imagery: an assessment, J. Geophys. Res., 99(C11), 22443–22458, 1994. 18. S. Sandven, O.M. Johannessen, M.W. Miles, L.H. Pettersson, and K. Kloster, Barents Sea seasonal ice zone features and processes from ERS 1 synthetic aperture radar: seasonal Ice Zone Experiment 1992, J. Geophys. Res., 104(C7), 15843–15857, 1999. 19. R.A. Schowengerdt, Remote Sensing—Models and Methods for Image Processing, New York: Academic Press, 1997. 20. D.L. Hall, Mathematical Techniques in Multisensor Data Fusion, Norwood, MA: Artech House, 1992. 21. M.J. Collins, Information fusion in sea ice remote sensing, in Geophysical Monograph 68: Microwave Remote Sensing of Sea Ice, F.D. Carsey, Ed., Washington, DC: American Geophysical Union, pp. 431–441, 1992. 22. S.G. Beaven and S.P. Gogineni, Fusion of satellite SAR with passive microwave data for sea remote sensing, in Analysis of SAR Data of the Polar Oceans, C. Tsatsoulis and R. Kwok, Eds., New York: Springer-Verlag, pp. 91–109, 1998. 23. L.K. Soh, C. Tsatsoulis, D. Gineris, and C. Bertoia, ARKTOS: an intelligent system for SAR sea ice image classification, IEEE Trans. Geosci. Rem. Sens., 42(1), 229–248, 2004. 24. K. Steffen and J. Heinrichs, Feasibility of sea ice typing with synthetic aperture radar (SAR): Merging of Landsat thematic mapper and ERS 1 SAR satellite imagery, J. Geophys. Res., 99(C11), 22413–22424, 1994. 25. C.M. Bishop, Neural Networks for Pattern Recognition, Oxford: Clarendon Press, 1995. 26. J.A. Benediktsson, P.H. Swain, and O.K. Ersoy, Neural network approaches versus statistical methods in classification of multisource remote sensing data, IEEE Trans. Geosci. Rem. Sens., 28(4), 540–552, 1990. 27. J.D. Paola and R.A. Schowengerdt, A detailed comparison of backpropagation neural network and maximum-likelihood classifiers for urban land use classification, IEEE Trans. Geosci. Rem. Sens., 33(4), 981–996, 1995. 28. C. Eckes and B. Fritzke, Classification of sea-ice with neural networks—results of the EU research project ICE ROUTS, Int. Rep. 2001–02, Institut fu¨r Neuroinformatik, Ruhr-Universita¨t Bochum, Bochum, Germany, March 2001. 29. J. Key, J.A. Maslanik, and A.J. Schweiger, Classification of merged AVHRR and SMMR Arctic data with neural networks, Photogram. Eng. Rem. Sens., 55(9), 1331–1338, 1989. 30. Y. Hara, R.G. Atkins, R.T. Shin, J.A. Kong, S.H. Yueh, and R. Kwok, Application of neural networks for sea ice classification in polarimetric SAR images, IEEE Trans. Geosci. Rem. Sens., 33(3), 740–748, 1995. 31. J.A. Karvonen, Baltic sea ice SAR segmentation and classification using modified pulse-coupled neural networks, IEEE Trans. Geosci. Rem. Sens., 42(7), 1566–1574, 2004.

588

Signal and Image Processing for Remote Sensing

32. RADARSAT data products specifications, Doc. RSI-GS-026, ver. 3.0, RADARSAT International, May 2000. 33. K. Kloster, A report on backscatter normalisation and calibration of C-band SAR, NERSC Tech. Notes, Bergen, Norway, 1997. 34. H. Laur, P. Bally, P. Meadows, J. Sanchez, B. Schaettler, and E. Lopinto, ERS SAR calibration. Derivation of the backscattering coefficient s0 in ESA ERS SAR PRI products, Tech. Rep. ES-TNRS-PM-HL09, issue 2, ESA/ESRIN, Frascati, Italy, 1996. 35. R.G. Onstott, SAR and scatterometer signatures of sea ice, in Geophysical Monograph 68: Microwave Remote Sensing of Sea Ice, F.D. Carsey, Ed., Washington, DC: American Geophysical Union, pp. 73–104, 1992. 36. T.C. Grenfell, D. Cavalieri, D. Comiso., and K. Steffen, Considerations for microwave remote sensing of thin sea ice, in Geophysical Monograph 68: Microwave Remote Sensing of Sea Ice, F.D. Carsey, Ed., Washington, DC: American Geophysical Union, pp. 291–301, 1992. 37. WMO, Sea Ice Nomenclature, World Meteorological Organization, Geneva, Switzerland, 1970. 38. L.K. Soh and C. Tsatsoulis, Texture analysis of SAR sea ice imagery using gray level co-occurrence matrices, IEEE Trans. Geosci. Rem. Sens., 37(2), 780–795, 1999. 39. Q.A. Holmes, D.R. Nu¨esch, and R.A. Shuchman, Textural analysis and real-time classification of sea ice types using digital SAR data, IEEE Trans. Geosci. Rem. Sens., GE 22(2), 113–120, 1984. 40. J.A. Nystuen and F.W. Garcia, Jr., Sea ice classification using SAR backscatter statistics, IEEE Trans. Geosci. Rem. Sens., 30(3), 502–509, 1992. 41. M.J. Collins, C.E. Livingstone, and R.K. Raney, Discrimination of sea ice in the Labrador marginal ice zone from synthetic aperture radar image texture, Int. J. Rem. Sens., 18(3), 535– 571, 1997. 42. R.M. Haralic, K. Shanmugam, and I. Dinstein, Textural features for image classification, IEEE Trans. Syst. Man Cybern., SMS-3(6), 610–621, 1973. 43. P.J. Werbos, Beyond regression: New tools for prediction and analysis in the behavioural sciences, Ph.D. Thesis, Harvard University, Cambridge, 1974. 44. D.E. Rumelhart, G.E. Hinton, and R.J. Williams, Learning representations by back-propagating errors, Nature, 323, 533–536, 1986. 45. F. Rosenblatt, The Perceptron: a probabilistic model for information storage and organization in the brain, Psychol. Rev., 65, 386–408, 1958. 46. R.A. Fisher, The use of multiple measurement in taxonomic problems, Ann. Eugen., 7, 179–188, 1936. 47. P.A. Lachenbruch, Discriminant Analysis, New York: Hafner Press, 1975. 48. R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis, New York: John Wiley & Sons, 1973. 49. K. Kloster, Personal communication, May 1999. 50. A.C.M. Stoffelen and D.L.T. Andreson, Scatterometer data interpretation: estimation and validation of the transfer function CMOD4, J. Geophys. Res., 102(C3), 5767–5780, 1997. 51. Atlas of the Arctic, Moscow: Glavnoye Upravlenie Geodezii i Kartografii, 1985. 52. A. Zell, G. Mamier, M. Vogt, N. Mache, R. Hu¨bner, S. Do¨ring, et al., SNNS Stuttgart neural network simulator user manual, ver. 4.1, Rep. no. 6/95, Inst. Parallel Distributed High Performance Syst., University of Stuttgart, Stuttgart, Germany, 1995. 53. A.R. Webb and D. Lowe, The optimised internal representation of multiplayer classifier networks performs nonlinear discriminant analysis, Neural Netw., 3, 367–375, 1990. 54. R. Kwok and G.F. Cunningham, Backscatter characteristics of the winter ice cover in the Beaufort Sea, J. Geophys. Res., 99(C4), 7787–7802, 1994. 55. R.H. Stewart, Methods of Satellite Oceanography, Berkeley: University of California Press, 1985. 56. J. Siestma and R.J.F. Dow, Creating artificial neural networks that generalize, Neural Netw., 4, 67–79, 1991. 57. P. Smolensky and M. Mozer, Skeletonization: a technique for trimming the fat from a network via relevance assessment, in Advances in Neural Information Processing Systems (NIPS), San Mateo: Morgan Kaufmann, pp. 107–115, 1989.

Multi-Sensor Approach to Automated Classification of Sea Ice Image Data

589

58. M.E. Shokr, L.J.Wilson, and D.L. Surdu-Miller, Effect of radar parameters on sea ice tonal and textural signatures using multi-frequency polarimetric SAR data, Photogram. Eng. Rem. Sens., 61(12), 1463–1473, 1995. 59. O.M. Johannessen et al., SAR sea ice interpretation guide. NERSC Tech. Rep. No. 227, Bergen, Norway, 2002. 60. A.V. Bogdanov, G. Scho¨ner, A. Steinhage, and S. Sandven, Multiple classifier system based on attractor dynamics, in Proc. IEEE Symp. Geosci. Rem. Sens. (IGARSS), Toulouse, France, July 2003.

26 Use of the Bradley–Terry Model to Assess Uncertainty in an Error Matrix from a Hierarchical Segmentation of an ASTER Image

Alfred Stein, Gerrit Gort, and Arko Lucieer

CONTENTS 26.1 Introduction ..................................................................................................................... 591 26.2 Concepts and Methods .................................................................................................. 592 26.2.1 The k-Statistic .................................................................................................... 593 26.2.2 The BT Model .................................................................................................... 595 26.2.3 Formulating and Testing a Hypothesis......................................................... 596 26.3 Case Study ....................................................................................................................... 596 26.3.1 The Error Matrix ............................................................................................... 597 26.3.2 Implementation in SAS .................................................................................... 598 26.3.3 The BT Model .................................................................................................... 598 26.3.4 The BT Model for Standardized Data ........................................................... 602 26.4 Discussion ........................................................................................................................ 603 26.5 Conclusions...................................................................................................................... 605 References ................................................................................................................................... 605

26.1

Introduction

Remotely sensed images are increasingly being used for collection of spatial information. A wide development in sensor systems has occurred during the last decades, resulting in improved spatial, temporal, and spectral resolution. The collection of data by remote sensing is generally more efficient and cheaper than by direct observation and measurement on the ground, although still of a varying quality. Data collected by sensors may be affected by atmospheric factors between sensors and the values reflected on the earth’s surface, local impurities on the earth’s surface, technical deficiencies of sensors and other factors. In addition, only the reflection of the sensor’s signal or of the sunlight on the earth’s surface is being measured, and no direct measurements are made. Consequently, the quality of maps produced by remote sensing needs to be assessed [1]. Another major issue of interest concerns the ontological interpretation of remote-sensing imagery. The truth as it supposedly exists at the earth surface is due to what one wishes to see. This can be formalized in terms of formal ontologies, although these may be subject to changes in time as well, and also a subjective interpretation is often being given. 591

Signal and Image Processing for Remote Sensing

592

Image classification and segmentation are important concepts in this respect. A segmentation of an image yields segments of more or less homogeneity supposed to be applicable on the earth’s surface. In fact, a segmentation procedure can be done mathematically, without any more knowledge of the earth than is present in the data, hence governed by the conditions of the sensor on the earth’s surface and in the atmosphere. Image classification leads to segmentation with meaningful contents. Meaningful is either what is described by the ontologies, or what a researcher aims to find. There still is a large discrepancy between the image representation on one hand and the earth’s surface processes on the other hand. After a classification is being carried out, its accuracy can be determined if ground truth is available. Classification accuracy refers to the extent to which the classified image or map corresponds with the description of a class on the earth’s surface. This is commonly described by an error matrix, in which the overall accuracy and the accuracy of the individual classes are calculated. The k-statistic is then used for testing on homogeneity [2]. Its use is somewhat controversial, because its value depends strongly on the marginal distributions [3]. Indeed, it measures the strength of agreement without taking into account the strength of disagreement. Another criticism, which we will not address in this chapter, is that it requires the presence of well-identifiable objects. In many ontologies, though, we realize that objects at the earth’s surface are fuzzy or vague. In such cases, it appears to be more convenient to compare membership values. This chapter aims to focus on an alternative to the k-statistic. To do so, we consider on pairwise comparisons, dealing with the structure of disagreement between categories. A way to do so is by using the Bradley–Terry (BT) model [4]. Based on the logistic regression model, it may provide more details on the strength and direction of disagreement. The chapter builds on a previous study published recently [5]. It extends that paper by addressing another case study, providing more details at various places and by using a very large number of test points. We also provide additional discussion on the qualities of the BT model. The aim of research described in this chapter is to study the use of the BT model as an alternative measure for association in the remotely sensed image for the k-statistic. To do so, we implement and interpret a test of significance of the BT model for paired preferences in a multi-variate texture-based segmentation of geological classes from an ASTER image in Mongolia. An error matrix is obtained by validation with an existing geological map.

26.2

Concepts and Methods

A common aim in classifying a multi-spectral image is to automatically categorize all pixels in the image into classes [6]. Each pixel in the image is assigned in a Boolean fashion into one class. As a result, a thematic layer from the multi-spectral image emerges. Several classification algorithms applicable to thematic mapping from satellite images exist [7]. Supervised image classification depends on differences in the reflection on the earth’s surface and hence, in bare areas, on the composition of the material on the earth’s surface. Pixel-by-pixel spectral information is used as the basis for automated land-cover classification. A classification algorithm gives a classified thematic layer. Such a form of classification yields, therefore, both a segmentation of the image (i.e., different segments

Use of the Bradley–Terry Model to Assess Uncertainty in an Error Matrix

593

emerge) and a classification (i.e., each segment has a meaning within some ontology). This is particularly the case if a supervised classification is carried out. A feature of this method, often recognized as a drawback, is that it yields a large number of very small segments, hence a highly fragmented thematic layer. Next, the thematic layer is assessed in terms of accuracy by comparing image class samples to reference samples [8]. Representative samples are selected for each class from different locations of the image. The amount of training data usually represents between 1% and 5% of the pixels [9]. Accuracy assessment of classification is usually carried out by evaluating error matrices. An error matrix is the square contingency table. The columns represent the reference data and the rows represent the classified data [2,9]. On the basis of the error matrix, overall classification accuracy, producer’s accuracy, and the user’s accuracy are calculated. In addition, the k-statistic for each individual category within the matrix can be calculated. In all these cases, the accuracy of any individual pixel is associated in a strictly Boolean fashion, that is, the classification is either correct or incorrect.

26.2.1

The k-Statistic

The k-statistic derived from the error matrix is a measure of agreement [2]. It is based on the difference between the actual agreement in the error matrix and the chance agreement. The sample outcome is the k^ statistic, an estimate of k. The actual agreement refers to the correctly classified pixel indicated by the major diagonal of the error matrix. k measures the accuracy of classification in terms of whether the classification is better than random or not. To describe k we assume a multinomial sampling model: each class label is derived from a finite set of possible labels. To calculate the maximum likelihood estimate k^ of k and a test statistic z, the error matrix is represented in mathematical form. Then k^ is given by k^ ¼

p0 pc 1 pc

(26:1)

where p0 and pc are the actual agreement and the chance agreement. Let nij equal the number of samples classified into category i, as belonging to category j in the reference data. The k^ value can be calculated using the following formula: n k^ ¼

k P

nii

i¼1

n2

k P

niþ nþi

i¼1

k P

(26:2)

niþ nþi

i¼1

where k is the number of classes, nii is the number of correctly classified pixels of category i, niþ is the total number of pixels classified as category i, nþi is the total number of actual pixels in category i, and n is the total number of pixels. The approximate large sample variance of k equals: ! 1 q1 (1 q1 ) 2(1 q1 )(2q1 q2 q3 ) 2(1 q2 )2 (q4 4q22 ) var(^ k) ¼ þ þ (26:3) n (1 q2 )2 (1 q2 )3 (1 q2 )4

Signal and Image Processing for Remote Sensing

594 where q1 ¼

q2 ¼

k 1X nii n i¼1

k 1 X niþ nþi 2 n i¼1

q3 ¼

k 1 X nii (niþ þ nþi ) n2 i¼1

q4 ¼

k 1 X nii (niþ þ nþi )2 n3 i¼1

Each of the terms qi can be easily calculated, and hence the k-statistic has found its place in a wide range of (semi-)automatic classification algorithms. Apart from being a measure of accuracy, the conditional k coefficient ki is defined as the conditional agreement for the ith category. It can also be used to test the individual class (category) agreement. The maximum likelihood estimate of ki is given by k^i ¼

n nii niþ nþi n niþ niþ nþi

(26:4)

Its approximate large sample variance is given by var(^ ki ) ¼

n (niþ nii ) [niþ nii (ni þ nþi n nii ) þ n nii (nii niþ nþi þ n)] niþ (n nþi )3

(26:5)

A test of significance for a single error matrix is based on comparing the obtained classification with a random classification, that is, with assigning random labels to the individual classes at each of the test points. For this test the null hypothesis equals H0: k ¼ 0, that is, the association does not significantly differ from that obtained from a random classification. The alternative hypothesis equals H1: k 6¼ 0, that is, the value of k significantly differs from that obtained from a random classification. The test statistic equals k^ z ¼ pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ var(^ k)

(26:6)

A similar test of significance can be carried out for each individual category using the value of the conditional ki and its variance. Different classifications may yield different error matrices. Possibly, one classification significantly improves on the other one. Comparing two error matrices, we might therefore be willing to test the null hypothesis H0: k1 ¼ k2, against the alternative hypothesis H1: k1 6¼ k2, that is, no significant difference occurs in association between the two error matrices. A test statistic for the significance of these two error matrices is given by k^1 k^2 z12 ¼ pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ k2 ) var(^ k1 ) þ var(^

(26:7)

Computation of the z or the z12 statistic then allows us to evaluate H0, as the distribution of z and that of z12 is approximately a normal distribution. H0 is rejected if z > za/2 (or

Use of the Bradley–Terry Model to Assess Uncertainty in an Error Matrix

595

equivalently z12 > za/2), where za/2 is the 1 – a/2 confidence level of the two-tailed standard normal distribution. A similar argument applies if a one-sided test is to be carried out, for example, to investigate a claim that some new classification procedure is better than another one.

26.2.2

The BT Model

The BT model makes a pairwise comparison among n individuals, classes, or categories. It focuses on misclassifications without considering the total number and the number of correctly classified pixels (or objects). First we follow the model in this original form. Below, we address the issue of including also the total number of classified pixels. In the BT model, classes are ordered according to magnitude on the basis of misclassifications. The ordering is estimated on the basis of pairwise comparisons. Pairwise comparisons model the preference of one individual (class, category) over another [3]. The BT model has found applications in various fields, notably in sports statistics. The original paper [4] considered basketball statistics, whereas a well worked-out example on tennis appears in Ref. [3]. Both examples include multiple confrontations of various teams against one another. The BT model can be seen as the logit model for paired preference data. A logit model is a generally applicable statistical model for the relation between the probability p that an effect occurs and an explanatory variable x by using two parameters a and b. It is modeled as ln(p/(1 – p) ¼ a þ b x, ensuring that probability values are between 0 and 1 [10]. To apply the BT model to the error matrix, we consider the pair of classes Ci and Cj and let Pij denote the probability that Ci is classified as Cj and let Pji denote that Cj is classified as Ci. Obviously, Pji ¼ 1Pij. The BT model has parameters bi such that logit(

Q

ij

) ¼ log (

Q Q = ji ) ¼ bi bj ij

(26:8)

To interpret this equation, we note that equal probabilities emerge for Ci being classified as Cj and Cj being classified as Ci if Pij ¼ Pji ¼ 1⁄2, hence if logit(Pij) ¼ 0, therefore bi ¼ bj. If Pij > 1⁄2, i.e., if Ci is more likely to be classified as Cj than Cj to be classified as Ci, then bi > bj. A value of bi larger than that of bj indicates a preference of misclassification of Ci to Cj above that of Cj to Ci. By fitting this model to the error matrix, one obtains the estimates b^i of the bi, and the ^ ij that Ci is classified as Cj are given by estimated probabilities ^ ij ¼ P

exp(b^i b^j ) 1 þ exp(b^i b^j )

(26:9)

Similarly, from the fitted values of the BT model, we can derive fitted values for misclassification, ^ ij ¼ P

m ^ ij m ^ ij m ^ ji

(26:10)

where ^ij is the expected count value of Ci over Cj, that is, the fitted value for the model. The fitted value of the model can be derived from the output of SAS or SPSS.

Signal and Image Processing for Remote Sensing

596 26.2.3

Formulating and Testing a Hypothesis

In practice, we may be interested to formulate a test on the parameters. In particular we are interested in whether there is equality in performance of a classifier for the different classes that can be distinguished. Therefore, let H0 be the null hypothesis: the class parameters are equal, that is, no significant difference exists between bi and bj. The alternative hypothesis equals H1: bi 6¼ bj. To test H0 against H1, we compare b^i b^j to its asymptotic standard error (ASE). To find the ASE, we note that var(b^i b^j) ¼ var(b^i) þ var(b^j) – 2 cov(b^i,b^j). Using the estimated variance–covariance matrix, each ASE is calculated as the square root of the sum of two diagonal values minus twice an offdiagonal value. An approximate 95% confidence interval for bi bj is then seen to be equal to b^i b^j + 1.96 ASE. If the value 0 occurs within the confidence interval, then the difference between the two classes is not significantly different from zero, that is, no significant difference between the class parameters occurs, and H0 is not rejected. On the other hand, if the value 0 does not occur within the confidence interval, then the difference between the two classes is significantly different from zero, indicating a significant difference between the class parameter and H0 is rejected in favor of H1.

26 .3

Case Study

To illustrate the BT model in an actual remote-sensing study, we return to a case described in Ref. [11]. It was shown how spectral and spatial data collected by means of remote sensing can provide information about geological aspects of the earth surface. In particular in remote, barren areas, remote-sensing imagery can provide useful information on the geological constitution. If surface observations are combined with geologic knowledge and insights, geologists are able to make valid inferences about subsurface materials. The study area is located in the Dundgovi Aimag province in Southern Mongolia (longitude: 1058500 –1068260 E and lattitude: 468010 –468180 N). The total area is 1415.58 km2. The area is characterized by an arid, mountainous-steppe zone with elevations between 1300 and 1700 m. Five geological units are distinguished: Cretaceous basalt (K1), Permian–Triassic sandstone (PT), Proterozoic granite (yPR), Triassic–Jurassic granite (yT3-J1) (an intrusive rock outcrop), and Triassic–Jurassic andesite (aT3-J1). For the identification of general geological units we use images from the advanced spaceborne thermal emission and reflection radiometer (ASTER) satellite, acquired on 21 May, 2002. The multi-spectral ASTER data cover the visible, near infrared, shortwave and thermal infrared portions of the electromagnetic spectrum, in 14 discrete channels. Level 1B data as used in this study are radiometrically calibrated and geometrically co-registered for all ASTER bands. The combination of ASTER shortwave infrared (SWIR) bands is highly useful for the extraction of information on rock and soil types. Figure 26.1 shows a color composite of ASTER band combination 9, 6, and 4 in the SWIR range. An important aspect of this study is the validation of geological units derived with segmentation (Figure 26.2a). Reference data in the form of a geological map were obtained by expert field observation and image interpretation (Figure 26.2b). Textural information derived from remotely sensed imagery can be helpful in the identification of geological units. These units are commonly mapped based on field observations or interpretation of aerial photographs. Geological units often show charac-

Use of the Bradley–Terry Model to Assess Uncertainty in an Error Matrix

597

FIGURE 26.1 SWIR band combination of the ASTER image for the study area in Mongolia, showing different geological units.

teristic image texture features, for example, in the form of fracture patterns. Pixel-based classification methods might, therefore, fail to identify these units. A texture-based segmentation approach, taking into account the spatial relations between pixels, can be helpful in identifying geological units from an image scene. For segmentation, we applied a hierarchical splitting algorithm to identify areas with homogeneous texture in the image. Similar to split-and-merge segmentation each square image block in the image is split into four sub-blocks forming a quadtree structure. The criterion used to determine if an image block is divided is based on a comparison between uncertainty of a block and uncertainty of its sub-blocks. Uncertainty is defined as the ratio between the similarity values (G-statistic), computed for an image block B, of the two most likely reference textures. This measure is also known as the confusion index (CI). The image is segmented such that uncertainty is minimized. Reference textures are defined by two-dimensional histograms of the local binary pattern and the variance texture measures. To test for similarity between an image block texture and a reference texture, the G-statistic is applied. Finally, a partition of the image with objects labeled with reference texture class labels is obtained [12].

26.3.1

The Error Matrix

Segmentation and classification resulted in a thematic layer with geological classes. Comparison of this layer with the geological map yielded the error matrix (Table 26.1). Accuracy of the overall classification equals 71.0%, and the k-statistic equals 0.51. A major source for incorrect segmentation is caused by the differences in detail between the segmentation results and the geological map. In the map, only the main geological units are given, where the segmentation provides many more details. A majority filter of 15 15 pixels was applied to filter out the smallest objects from the ASTER segmentation map. Visually, the segmentation is similar to the geological map. However, the K1 unit is much more abundant in the segmentation map. The original image clearly shows a distinctly different texture from the surrounding area. Therefore, this area is segmented as a K1 instead of as a PT unit. The majority filtering did not provide higher accuracy values, as the total accuracy only increased by 0.5%.

Signal and Image Processing for Remote Sensing

598 (a)

yT3-J1

K1 yPR aT3-J1

PT

FIGURE 26.2 (a) Segmentation of the ASTER image. (b) The geological map used as a reference.

26.3.2

Implementation in SAS

The BT model was implemented as a logit model in the statistical package SAS (Table 26.2). We applied proc Genmod, using the built-in binomial probability distribution (DIST ¼ BIN) and the logit link function. The covb option provides the estimated covariance matrix of the model parameter estimators, and estimated model parameters are obtained with the obstats option. In the data matrix, a dummy variable is set for each geological class. The variable for Ci is 1 and that for Cj is –1 if Ci is classified over Cj. The logit model has these variates as explanatory variables. Each line further lists the error value (‘k’) of Ci over Cj and the sum of the error values of Ci over Cj and Cj over Ci, (‘n’). The intercept term is excluded. 26.3.3

The BT Model

The BT model was fitted to the error matrix as shown in Table 26.1. Table 26.3 shows the parameter estimates b^i for each class. These values give the ranking of the category in

Use of the Bradley–Terry Model to Assess Uncertainty in an Error Matrix

599

TABLE 26.1 Error Matrix for Classification Classified as PT yT3-J1 K1 aT3-J1 yPR Total

PT

yT3-J1

K1

Reality aT3-J1

yPR

Total

518,326 805 140,464 50,825 23,788 734,208

304 86,103 95 2,997 8,082 97,581

8,772 0 8,123 827 60 17,782

3,256 2,836 13,645 95,244 21,226 136,207

13,716 3,848 3,799 2,621 31,569 55,553

544,374 93,592 166,126 152,514 84,725 1,041,331

Note: PT (Permian and Triassic formation), yT3-J1 (upper Triassic and lower Jurassic granite), K1 (lower Cretaceous basalt), aT3-J1 (upper Triassic and lower Jurassic andesite) and yPR (Penterozoic granite). The overall classification accuracy is 71.0%, the overall k-statistic equals 0.51.

comparison with the reference category. Table 26.3 also shows standard errors for the b^is for each class. The b parameter for class yPR is not estimated, being the last in the input series, and is hence set equal to 0. The highest b^ coefficient equal to 1.545 is observed for class PT (Permian and Triassic formation) and the lowest value is equal to –1.484 for class K1 (lower Cretaceous basalt). Standard errors are relatively small (below 0.016), indicating that all coefficients significantly differ from 0. Hence the significantly highest erroneous classification occurs for the class PT and the lowest for class K1. Estimated b^i values are used in turn to determine misclassification of one class over another (Table 26.4). This anti-symmetric matrix shows again relatively high values for differences of geological classes with PT and lower values for differences with class K1. Next, the probability of a misclassification is calculated using Equation 26.9 (Table 26.5). For example, the estimate of probability of misclassifying PT as upper Triassic and lower Jurassic granite (yT3-J1) is 0.28, whereas that of a misclassification of yT3-J1 as PT equals TABLE 26.2 data matrix; input PT yT3J1 K1 aT3J1 yPR k1 n1 k2 n2 w; k ¼ (k1/n1); n ¼ (k1/n1 þ k2/n2); cards; Data Matrix 1 1 1 1 0 0 0 0 0 0

Input

PT

yT3-J1

K1

aT3-J1

yPR k n; Cards

1 0 0 0 1 1 1 0 0 0

0 1 0 0 1 0 0 1 1 0

0 0 1 0 0 1 0 1 0 1

0 0 0 1 0 0 1 0 1 1

304 8,772 3,256 13,716 0 2,836 3,848 13,645 3,799 2,621

1,109 149,236 54,081 37,504 95 5,833 11,930 14,472 3,859 23,847

proc genmod data ¼ matrix; model k/n ¼ PT yT3J1 K1 aT3J1 yPR/NOINT DIST ¼ BIN link ¼ logit covb obstats; run;

Signal and Image Processing for Remote Sensing

600 TABLE 26.3

Estimated Parameters b^i and Standard Deviations se(b^i) for the BT Model

b^i se(b^i)

PT

yT3-J1

K1

aT3-J1

yPR

1.545 0.010

0.612 0.016

1.484 0.014

0.317 0.010

0.000 0.000

Note: See Table 26.1.

TABLE 26.4 Values for b^i b^j Comparing Differences of One Geological Class Over Another

PT yT3-J1 K1 aT3-J1 yPR

PT

yT3-J1

K1

aT3-J1

yPR

0.000 0.932 3.029 1.228 1.545

0.932 0.000 2.097 0.295 0.612

3.029 2.097 0.000 1.801 1.484

1.228 0.295 1.801 0.000 0.317

1.545 0.612 1.484 0.317 0.000

Note: See Table 26.1.

0.72: it is therefore much more likely that PT is misclassified as yT3-J1, than yT3-J1 as PT. Table 26.6 shows the observed and fitted entries in the error matrix (in brackets) for the BT model. We notice that the estimated and observed differences are relatively close. An exception is the expected entry for the yPR–aT3-J1 combination, where fitting is clearly erroneous. To test the significance of differences, ASEs are calculated, yielding a symmetric matrix (Table 26.7). Low values (less than 0.02) emerge for the different class combinations, mainly because of the large number of pixels. Next, we test for the significance of differences (Table 26.8), with H0 being equal to the hypothesis that ^i ¼ ^j, and the alternative hypothesis H1 that ^i 6¼ bj. Because of the large number of pixels, H0 is rejected for all class combinations. This means that there is a significant difference between the parameter values for each of these classes. We now turn to the standardized data, where also the diagonal values of the error matrix are included in the calculations. TABLE 26.5 ^ ij Using Equation 26.9 of Misclassifying Ci over Cj Using Estimated Probabilities Standardized Data PT PT yT3-J1 K1 aT3-J1 yPR

0.72 0.95 0.77 0.82

Note: See Table 26.1.

yT3-J1

K1

aT3-J1

yPR

0.28

0.05 0.11

0.23 0.43 0.86

0.18 0.35 0.82 0.42

0.89 0.57 0.65

0.14 0.18

0.58

Use of the Bradley–Terry Model to Assess Uncertainty in an Error Matrix

601

TABLE 26.6 Observed and Expected Entries in the Error Matrix PT PT yT3-J1 K1 aT3-J1 yPR

yT3-J1 304 (313)

805 (796) 140,464 (142,351) 50,825 (41,826) 23,788 (30,909)

95 (85) 2,997 (3,344) 8,082 (7,736)

K1

aT3-J1

yPR

8,772 (6,885) 0 (10)

3,256 (12,255) 2,836 (2,489) 13,645 (12,421)

13,716 (6,595) 3,848 (4,194) 3,799 (3,146) 2,621 (41,240)

827 (2,051) 60 (713)

21,226 (56,625)

Note: See Table 26.1.

TABLE 26.7 Asymptotic Standard Errors for the Parameters in the BT Model, Using the Standard Deviations and the Covariances between the Parameters PT PT yT3-J1 K1 aT3-J1 yPR

0.017 0.011 0.008 0.010

yT3-J1

K1

aT3-J1

yPR

0.017

0.011 0.019

0.008 0.016 0.012

0.010 0.016 0.014 0.010

0.019 0.016 0.016

0.012 0.014

0.010

Note: See Table 26.1.

TABLE 26.8 Test of Significances in Differences between Two Classes

PT

yT3-J1

K1 aT3-J1

yT-J1 K1 aT-J1 yPr K1 aT-J1 yPr aT-J1 yPr yPr

Note: See Table 26.1.

b^i b^j

ASE

t-Ratio

H0

0.932 3.029 2.097 1.228 0.295 1.801 1.545 0.612 1.484 0.317

0.017 0.011 0.019 0.008 0.016 0.012 0.010 0.016 0.014 0.010

54.84 280.99 108.02 145.39 18.08 145.01 155.96 39.47 108.69 32.77

Reject Reject Reject Reject Reject Reject Reject Reject Reject Reject

Signal and Image Processing for Remote Sensing

602 26.3.4

The BT Model for Standardized Data

When applying the BT model, diagonal values are excluded. This may have the following effect. If C1 contains k1 pixels identified as C2, and C2 contains k2 pixels identified as C1, then the BT model analyzes the binomial fraction k1/(k1þk2). The denominator k1 þ k2, however, depends also on the total number of incorrectly identified pixels in the two classes, say n1 for C1 and n2 for C2. Indeed, suppose that the number of classified pixels doubles for one particular class. One would have the assumption that this does not affect the misclassification probability. This does not apply to the BT model as presented above. As a solution, we standardized the counts per row by defining k ¼ (k1/n1)/(k1/n1 þ k2/ n2) and n ¼ 1. This has as an effect that the diagonals in the error matrix are scaled to 1, and that matrix elements not on the diagonal contain the number of misclassified pixels for each class combination. Furthermore, a large number of classified pixels should lead to lower standard deviations than a low number of classified pixels. To take this into account, a weighting was done, with weights equal to k1 þ k2 being equal to the number of incorrect classifications. Again, a generalized model is defined, with as a dependent variable the ratio k/n and the same explanatory variables as in Table 26.2. The SAS input file is given in Table 26.9. The first five columns, describing particular class combinations, are similar to those in Table 26.2, and the final five columns are described above. The variables in the model have also been described in Table 26.2. Estimated parameters b^i, together with their standard deviations are given in Table 26.10. We note that the largest b^i value again occurs for the class PT (Permian and Triassic formation) and the lowest value for class K1 (lower Cretaceous basalt). Class PT therefore has the highest probability of being misclassified and class K1 has the lowest probability. Being negative, this probability is smaller than that for Penterozoic granite (yPR). In contrast to the first analysis, none of the parameters significantly differs from 0, as is shown by the relatively large standard deviations. Subsequent calculations are carried out to compare differences in classes, as was done earlier. For example, misclassification probabilities corresponding to Table 26.5 are now in Table 26.11 and observed and expected entries in the error matrix are in Table 26.12. We now observe first that some modeled values are extremely good (such as the combinations between PT and yT3-J1) and second, some modeled values are totally different TABLE 26.9 data matrix; input PT yT3J1 K1 aT3J1 yPR k1 n1 k2 n2 w; k ¼ (k1/n1); n ¼ (k1/n1 þ k2/n2); cards; 1 1 1 1 0 0 0 0 0 0

1 0 0 0 1 1 1 0 0 0

0 1 0 0 1 0 0 1 1 0

0 0 1 0 0 1 0 1 0 1

0 0 0 1 0 0 1 0 1 1

304 8,772 3,256 13,716 0 2,836 3,848 13,645 3,799 2,621

518,326 518,326 518,326 518,326 86,103 86,103 86,103 8,123 8,123 95,244

proc genmod data ¼ matrix; model k/n ¼ PT yT3J1 K1 aT3J1 yPR/NOINT DIST ¼ BIN covb obstats; run;

805 140,464 50,825 23,788 95 2,997 8,082 827 60 21,226

86,103 8,123 95,244 31,569 8,123 95,244 31,569 95,244 31,569 31,569

1,109 149,236 54,081 37,504 95 5,833 11,930 14,472 3,859 23,847

Use of the Bradley–Terry Model to Assess Uncertainty in an Error Matrix

603

TABLE 26.10 Estimated Parameters b^i and Standard Deviations se(b^i) Using the Weighted BT Model for Standardized Data

b^i se(b^i)

PT

yT3-J1

K1

aT3-J1

yPR

4.813 5.404

1.887 4.585

3.337 6.297

2.245 3.483

0.000 0.000

TABLE 26.11 ^ ij Using Equation 26.9 of Misclassifying Ci over Cj Using Standardized Data Estimated Probabilities PT PT yT3-J1 K1 aT3-J1 yPR

0.95 1.00 0.93 0.99

yT3-J1

K1

aT3-J1

yPR

0.05

0.00 0.01

0.07 0.59 1.00

0.01 0.13 0.97 0.10

0.99 0.41 0.87

0.00 0.03

0.90

from the observed values, in particular, negative values still emerge. These differences occur in particular in the right corner of the matrix. Testing of differences between different classes (Table 26.13) can again be carried out in a similar way, with again significance occurring in all differences, due to the very large number of pixels.

26.4

Discussion

This chapter focuses on the BT model for summarizing the error matrix. The model is applied to the error matrix, derived from a segmentation of an image using a hierarchical

TABLE 26.12 Observed and Expected Entries in the Error Matrix PT PT yT3-J1 K1 aT3-J1 yPR

805 (804) 140464 (10515) 50825 * 23788 (14605)

yT3-J1

K1

304 (304)

8772 (117179) 0 (0)

95 (0) 2997 (3057) 8082 (7766)

827 (1208) 60 *

Note: See Table 26.1; entries marked * indicate an estimate of a negative value.

aT3-J1 3256 2836 (2780) 13645 (9342)

21226 *

yPR 13716 (22340) 3848 (4005) 3799 * 2621 *

Signal and Image Processing for Remote Sensing

604 TABLE 26.13

Test of Significances in Differences between Two Classes Using Standardized Data

PT

yT3-J1

K1 aT3-J1

yT3-J1 K1 aT3-J1 yPR K1 aT3-J1 yPR aT3-J1 yPR yPR

^j ^i b b

ASE

t-Ratio

H0

2.926 8.150 5.224 2.568 0.358 5.582 4.813 1.887 3.337 2.245

0.017 0.011 0.019 0.008 0.016 0.012 0.010 0.016 0.014 0.010

172.09 756.06 269.15 304.09 21.94 449.40 485.93 121.64 244.36 232.07

Reject Reject Reject Reject Reject Reject Reject Reject Reject Reject

segmentation algorithm. The image is classified on geological units. The k-statistic, measuring the accuracy of the whole error matrix, considers the actual agreement and chance agreement, but ignores asymmetry in the matrix. In addition, the conditional k-statistic measures the accuracy of agreement within each category, but does not consider the preference of one category over another category. These measures of accuracy only consider the agreement of classified pixels and reference pixels. In this study we extended these measures with those from the BT model to include chance agreement and disagreement. The BT model in its original form studies preference of one category over another. A pairwise comparison between classes gives additional parameters as compared to other measures of accuracy. The model also yields expected against observed values, estimated parameters, and probabilities of misclassification of one category over another category. Using the BT model, we can determine both the agreement within a category and disagreement in relation to another category. Parameters computed from this model can be tested for statistical significance. This analysis does not take into account the categories with zero values in combination with other categories. A formal testing procedure can be implemented, using ASEs. The class parameters bi provide a ranking of the categories. The BT model shows that a class, which is easier to recognize, is less confused with other classes. At this stage it is difficult to say which of the two implemented BT models is most useful and appropriate for application. It appears that on the one hand the standardized model allows us to give an interpretation that is fairer and more stable in the long run, but the sometimes highly erroneous estimates of misclassification are a major drawback for application. The original, unstandardized BT model may be applicable for an error matrix as demonstrated in this study, but a large part of the available information is ignored. One reason is that the error matrix is typically different from the error matrix as applied in Ref. [11]. Positional and thematic accuracy of the reference data is crucial for a successful accuracy assessment. Often the positional and thematic errors in the reference data are unknown or are not taken into account. Vagueness in the class definition and the spatial extent of objects is not included in most accuracy assessments. To take into account uncertainty in accuracy assessment, membership values or error index values could be used.

Use of the Bradley–Terry Model to Assess Uncertainty in an Error Matrix

26.5

605

Conclusions

We conclude that the BT model can be used for a statistical analysis of an error matrix obtained by a hierarchical classification of a remotely sensed image. This model relies on the key assumption that misclassification of one class as another is one minus the probability of misclassifying the other class as the first class. The model provides parameters and estimates for differences between classes. As such, it may serve as an extension to the k-statistic. As this study has shown, more directed information is obtained, including a statement whether these differences are significantly different from zero.

References 1. Foody, M.G., 2002, Status of land cover classification accuracy assessment, Rem. Sens. Environ., 80, 185–201. 2. Congalton, R.G., 1994, A Review of assessing the accuracy of classifications of remotely sensed data, in Remote Sensing Thematic Accuracy Assessment: A compendium, Fenstermaker, K.L., ed., Am. Soc. Photogramm. Rem. Sens., Bethseda, pp. 73–96. 3. Agresti, A., 1996, An Introduction to Categorical Data Analysis, John Wiley & Sons, Inc., New York. 4. Bradley, R.A. and Terry, M.E., 1952, Rank analysis of incomplete block designs I. The method of paired comparisons, Biometrika, 39, 324–345. 5. Stein, A., Aryal, J., and Gort, G., 2005, Use of the Bradley–Terry model to quantify association in remotely sensed images, IEEE Trans. Geosci. Rem. Sens., 43, 852–856. 6. Lillesand, T.M. and Kiefer, R.W., 2000, Remote sensing and Image Interpretation, 4th edition, John Wiley & Sons, Inc., New York. 7. Gordon, A.D., 1980, Classification, Chapman & Hall, London. 8. Janssen, L.L.F. and Gorte, B.G.H., 2001, Digital image classification, in Principles of Remote Sensing, L.L.F. Janssen and G.C. Huurneman, eds., 2nd edition, ITC, Enschede, pp. 73–96. 9. Richards, J.A. and Jia, X., 1999, Remote Sensing Digital Image Analysis, 3rd edition, SpringerVerlag, Berlin. 10. Hosmer, D.W. and Lemeshow, S., 1989, Applied Logistic Regression, John Wiley & Sons, Inc., New York. 11. Lucieer, L., Tsolmongerel, O., and Stein, A., 2004, Texture-based segmentation for identification of geological units in remotely sensed imagery, in Proc. ISSDQ ’04, A.U. Frank and E. Grum, eds., pp. 117–120. 12. Lucieer, A., Stein, A., and Fisher, P., 2005, Texture-based segmentation of high-resolution remotely sensed imagery for identification of fuzzy objects, Int. J. Rem. Sens., 26, 2917–2936.

27 SAR Image Classification by Support Vector Machine

Michifumi Yoshioka, Toru Fujinaka, and Sigeru Omatu

CONTENTS 27.1 Introduction ..................................................................................................................... 607 27.2 Proposed Method............................................................................................................ 608 27.3 Simulation ........................................................................................................................ 614 27.3.1 Data Set and Condition for Simulations ....................................................... 614 27.3.2 Simulation Results ............................................................................................ 615 27.3.3 Reduction of SVM Learning Cost .................................................................. 615 27.4 Conclusions...................................................................................................................... 618 References ................................................................................................................................... 618

27.1

Introduction

Remote sensing is the term used for observing the strength of electromagnetic radiation that is radiated or reflected from various objects on the ground level with a sensor installed in a space satellite or in an aircraft. The analysis of acquired data is an effective means to survey vast areas periodically [1]. Land map classification is one of the analyses. The land map classification classifies the surface of the Earth into categories such as water area, forests, factories, or cities. In this study, we will discuss an effective method for land map classification by using synthetic aperture radar (SAR) and support vector machine (SVM). The sensor installed in the space satellite includes an optical and a microwave sensor. SAR as an active-type microwave sensor is used for land map classification in this study. A feature of SAR is that it is not influenced by weather conditions [2–9]. As a classifier, SVM is adopted, which is known as one of the most effective methods in pattern and texture classification; texture patterns are composed of many pixels and are used as input features for SVM [10–12]. Traditionally, the maximum likelihood method has been used as a general classification technique for land map classification. However, the categories to be classified might not achieve high accuracy because the method assumes normal distribution of the data of each category. Finally, the effectiveness of our proposed method is shown by simulations.

607

Signal and Image Processing for Remote Sensing

608

27.2

Proposed Method

The outline of the proposed method is described here. At first, the target images from SAR are divided into an area of 88 pixels for the calculation of texture features. The texture features that serve as input data to the SVM are calculated using gray level cooccurrence matrix (GLCM), Cij, and gray level difference matrix (GLDM), Dk. The term GLCM means the co-occurrence probability that neighbor pixels i and j become the same gray level, and GLDM means the gray level difference of neighbor pixels whose distance is k. The definitions of texture features based on GLCM and GLDM are as follows: Energy (GLCM) E¼

X i ,j

C2ij

(27:1)

Entropy (GLCM) X

H¼

Ci j log Cij

(27:2)

i,j

Local homogeneity L¼

X

1

Cij

2 i,j 1 þ (i j)

(27:3)

Inertia I¼

X

(i j)2 Cij

(27:4)

i ,j

Correlation C¼

X (i mi )(j mj ) si s j

i ,j

mi ¼

X X X X i Cij , mj ¼ j Cij i

s2i ¼

X i

Variance

Cij

j

j

(i mi )2

X

i

s2j ¼

Ci j ,

X

j

V¼

j

X i ,j

(i mi )Cij

(j mj )2

X

Cij

(27:5)

i

(27:6)

Sum average S¼

X i,j

(i þ j)Cij

(27:7)

SAR Image Classification by Support Vector Machine

609

Energy (GLDM) Ed ¼

X

D2k

(27:8)

k

Entropy (GLDM) Hd ¼

X

Dk log {Dk }

(27:9)

k

Mean M¼

X

k Dk

(27:10)

{k M}2 Dk

(27:11)

k

Difference variance V¼

X k

The next step is to select effective texture features as an input to SVM as there are too many texture features to feed SVM [(7 GLCMs þ 4 GLDMs) 8 bands ¼ totally 88 features]. Kullback–Leibler distance is adopted as the selection method of features in this study. The definition of Kullback–Leibler distance between two probability density functions p(x) and q(x) is as follows: ð

L ¼ p(x) log

p(x) dx q(x)

(27:12)

Using the above as the distance measure, the distance indicated in the selected features between two categories can be compared, and the feature combinations whose distance is large are selected as input to the SVM. However, it is difficult to calculate all combinations of 88 features for computational costs. Therefore, in this study, each 5-feature combination from 88 is tested for selection. Then the selected features are fed to the SVM for classification. The SVM classifies the data into two categories at a time. Therefore, in this study, input data are classified into two sets, that is, a set of water and cultivation areas or a set of city and factory areas in the first stage. In the second stage, these two sets are classified into two categories, respectively. In this step, it is important to reduce the learning costs of SVM since the remote sensing data from SAR are too large for learning. In this study, we propose a reduction method of SVM learning costs using the extraction of surrounding part data based on the distance in the kernel space because the boundary data of categories determine the SVM learning efficiency. The distance d(x) of an element x in the kernel space from the category to which the element belongs is defined as follows using the kernel function F(x): 2 n 1X d (x) ¼ F(x) F(xk ) n k¼1 !t ! n n 1X 1X ¼ F(x) F(xk ) F(x) F(xl ) n k¼1 n l¼1 n n n X n 1X 1X 1 X ¼ F(x)t F(x) F(x)t F(xl ) F(xk )t F(x) þ 2 F(xk )t F(xl ) (27:13) n l¼1 n k¼1 n k¼1 l¼1 2

Signal and Image Processing for Remote Sensing

610

Here, xk denotes the elements of category, and n is the total number of elements. Using the above distance d(x), the relative distance r1(x) and r2(x) can be defined as r1 (x) ¼

d2 (x) d1 (x) d1 (x)

(27:14)

r2 (x) ¼

d1 (x) d2 (x) d2 (x)

(27:15)

In these equations, d1(x) and d2(x) indicate the distance of the element x from the category 1 or 2, respectively. A half of the total data that has small relative distance is extracted and fed to the SVM. To evaluate this extraction method by comparing with the traditional method based on Mahalanobis distance, the simulation is performed using sample data 1 and 2 illustrated in Figure 27.1 through Figure 27.4, respectively. The distribution of samples 1 and 2 is Gaussian. The centers of distributions are (0.5,0), (0.5,0) in class 1 and 2 of sample 1, and (0.6), (0.6) in class 1, and (0,0) in class 2 of sample 2, respectively. The variances of distributions are 0.03 and 0.015, respectively. The total number of data is 500 per class. The kernel function used in this simulation is as follows: kx x0 k2 K(x,x ) ¼ F(x) F(x ) ¼ exp 2s2 0

T

!

0

(27:16)

2s2 ¼ 0:1 As a result of the simulation illustrated in Figure 27.2, Figure 27.3, Figure 27.5, and Figure 27.6, in the case of sample 1, both the proposed and the Mahalanobis-based method classify 1

Class 1 Class 2

0.5

0

−0.5

−1 −1 FIGURE 27.1 Sample data 1.

−0.5

0

0.5

1

SAR Image Classification by Support Vector Machine

611

1 Class 1 Class 2 Extraction 1 Extraction 2 0.5

0

−0.5

−1 −1

−0.5

0

0.5

1

FIGURE 27.2 Extracted boundary elements by proposed method (sample 1).

1

Class 1 Class 2 Extraction 1 Extraction 2

0.5

0

−0.5

−1 −1

−0.5

0

FIGURE 27.3 Extracted boundary elements by Mahalanobis distance (sample 1).

0.5

1

Signal and Image Processing for Remote Sensing

612 1

Class 1 Class 2

0.5

0

−0.5

−1 −1

−0.5

0

0.5

1

FIGURE 27.4 Sample data 2.

1 Class 1 Class 2 Extraction 1 Extraction 2 0.5

0

−0.5

−1 −1

−0.5

0

FIGURE 27.5 Extracted boundary elements by proposed method (sample 2).

0.5

1

SAR Image Classification by Support Vector Machine

613

1 Class 1 Class 2 Extraction 1 Extraction 2 0.5

0

−0.5

−1 −1

−0.5

0

0.5

1

FIGURE 27.6 Extracted boundary elements by Mahalanobis distance (sample 2).

data successfully. However, in the case of sample 2, the Mahalanobis-based method fails to classify data though the proposed method succeeds. This is because Mahalanobis-based method assumes that the data distribution is spheroidal. The distance function in those methods illustrated in Figure 27.7 through Figure 27.10 clearly shows the reason for the classification ability difference of those methods.

d1(x,y)

1.15 1.1 1.05 1 0.95 0.9 0.85 0.8 0.75 0.7 1 0.5 −1

−0.5

0 −0.5

0 0.5

FIGURE 27.7 Distance d1(x) for class 1 in sample 2 (proposed method).

1 −1

Signal and Image Processing for Remote Sensing

614

d2(x,y)

1.3 1.2 1.1 1 0.9 0.8 0.7 0.6 0.5 0.4 1 0.5 −1

0

−0.5

−0.5

0 0.5

1 −1

FIGURE 27.8 Distance d2(x) for class 2 in sample 2 (proposed method).

27 .3 27.3.1

Simulat ion Data Set and Condition for Simulations

The target data (Figure 27.11) used in this study for the classification are the observational data by SIR-C. The SIR-C device is a SAR system that consists of two wavelengths: L-band (wavelength 23 cm) and C-band (wavelength 6 cm) and four polarized electromagnetic radiations. The observed region is Kagawa Prefecture, Sakaide City, Japan (October 3,

d1(x,y)

9 8 7 6 5 4 3 2 1 0 1 0.5 −1

−0.5

0 −0.5

0 0.5

FIGURE 27.9 Distance d1(x) for class 2 in sample 2 (Mahalanobis).

1 −1

SAR Image Classification by Support Vector Machine

615 d2(x,y)

14 12 10 8 6 4 2 0 1 0.5 −1

−0.5

0 −0.5

0 0.5

1 −1

FIGURE 27.10 Distance d2(x) for class 2 in sample 2 (Mahalanobis).

1994). The image sizes are 1000 pixels in height, 696 pixels in width, and each pixel has 256 gray levels in eight bands. To extract the texture features, the areas of 88 pixels on the target data are combined and classified into four categories ‘‘water area,’’ ‘‘cultivation region,’’ ‘‘city region,’’ and ‘‘factory region.’’ The mountain region is not classified because of the backscatter. The ground-truth data for training are shown in Figure 27.12 and the numbers of sample data in each category are shown in Table 27.1. The selected texture features based on Kullback–Leibler distance mentioned in the previous section are shown in Table 27.2. The kernel function of SVM is Gaussian kernel with the variance s2 ¼ 0.5, and soft margin parameter C is 1000. The SVM training data are 100 samples randomly selected from ground-truth data for each category.

27.3.2

Simulation Results

The final result of simulation is shown in Table 27.3. In this table, ‘‘selected’’ implies the classification accuracy with selected texture features in Table 27.2, and ‘‘all’’ implies the accuracy with all 88 kinds of texture features. The final result shows the effectiveness of feature selection for improving classification accuracy.

27.3.3

Reduction of SVM Learning Cost

The learning time of SVM depends on the number of sample data. Therefore, the computational cost reduction method for SVM learning is important for complex data sets such as ‘‘city’’ and ‘‘factory’’ region in this study. Then, the reduction method proposed in the previous section is applied, and the effectiveness of this method is evaluated by comparing the learning time with traditional methods. The numbers of data are from 200 to 4000, and the learning times of the SVM classifier are measured in two cases. In the

616

Signal and Image Processing for Remote Sensing

FIGURE 27.11 Target data.

first case, all data are used for learning and in the second case, by using the proposed method, data for learning are reduced by half. The selected texture features for classification are the energy (band 1), the entropy (band 6), and the local homogeneity (band 2), and the SVM kernel is Gaussian. Figure 27.13 shows the result of the simulation. The CPU of the computer used in this simulation is Pentium 4/2GHz. The result of the simulation clearly shows that the learning time is reduced by using the proposed method. The learning time is reduced to about 50% on average.

SAR Image Classification by Support Vector Machine

617

Water City Cultivation Factory Mountain

FIGURE 27.12 (See color insert following page 302.) Ground-truth data for training.

TABLE 27.1 Number of Data Category Water Cultivation City Factory

Number of Data 133201 16413 11378 2685

TABLE 27.2 Selected Features. Category Water, cultivation/city, factory Water/factory City/factory Numbers in parentheses indicate SAR band

Texture Features Correlation (1, 2), sum average (1) Variance (1) Energy (1), entropy (6), local homogeneity (2)

Signal and Image Processing for Remote Sensing

618

TABLE 27.3 Classification Accuracy (%)

Selected All

Water

Cultivation

City

Factory

Average

99.82 99.49

94.37 94.35

93.18 92.18

89.77 87.55

94.29 93.39

1600 Learning time (sec)

1400 1200

Normal

1000 800

Accelerated

600 400 200 0

200

400

600

800 1000 Number of data

2000

3000

4000

FIGURE 27.13 SVM learning Time.

27.4

Conclusions

In this chapter, we have proposed the automatic selection of texture feature combinations based on the Kullback–Leibler distance between category data distributions, and the computational cost reduction method for SVM classifier learning. As a result of simulations, by using our proposed texture feature selection method and the SVM classfier, higher classification accuracy is achieved when compared with traditional methods. In addition, it is shown that our proposed SVM learning method can be applied to more complex distributions than applying traditional methods.

References 1. Richards J.A., Remote Sensing Digital Image Analysis, 2nd ed., Springer-Verlag, Berlin, p. 246, 1993. 2. Hara Y., Atkins R.G., Shin R.T., Kong J.A., Yueh S.H., and Kwok R., Application of neural networks for sea ice classification in polarimetric SAR images, IEEE Transactions on Geoscience and Remote Sensing, 33, 740, 1995. 3. Heermann P.D. and Khazenie N., Classification of multispectral remote sensing data using a back-propagation neural network, IEEE Transactions on Geoscience and Remote Sensing, 30, 81, 1992.

SAR Image Classification by Support Vector Machine

619

4. Yoshida T., Omatu S., and Teranishi M., Pattern classification for remote sensing data using neural network, Transactions of the Institute of Systems, Control and Information Engineers, 4, 11, 1991. 5. Yoshida T. and Omatu S., Neural network approach to land cover mapping, IEEE Transactions on Geoscience and Remote Sensing, 32, 1103, 1994. 6. Hecht-Nielsen R., Neurocomputing, Addison-Wesley, New York, 1990. 7. Ulaby F.T. and Elachi C., Radar Polarimetry for Geoscience Applications, Artech House, Norwood, 1990. 8. Van Zyl J.J., Zebker H.A., and Elach C., Imaging radar polarization signatures: theory and observation, Radio Science, 22, 529, 1987. 9. Lim H.H., Swartz A.A., Yueh H.A., Kong J.A., Shin R.T., and Van Zyl J.J., Classification of earth terrain using polarimetric synthetic aperture radar images, Journal of Geophysical Research, 94, 7049, 1989. 10. Vapnik V.N., The Nature of Statistical Learning Theory, 2nd ed., Springer, New York, 1999. 11. Platt J., Sequential minimal optimization: A fast algorithm for training support vector machines, Technical Report MSR-TR-98-14, Microsoft Research, 1998. 12. Joachims T., Making large-scale SVM learning practical, In B. Scho¨lkopf, C.J.C. Burges, and A.J. Smola, Eds., Advanced in Kernel Method—Support Vector Learning, MIT Press, Cambridge, MA, 1998.

28 Quality Assessment of Remote-Sensing Multi-Band Optical Images

Bruno Aiazzi, Luciano Alparone, Stefano Baronti, and Massimo Selva

CONTENTS 28.1 Introduction ..................................................................................................................... 621 28.2 Information Theoretic Problem Statement ................................................................. 623 28.3 Information Assessment Procedure............................................................................. 624 28.3.1 Noise Modeling ................................................................................................. 624 28.3.2 Estimation of Noise Variance and Correlation ............................................ 625 28.3.3 Source Decorrelation via DPCM .................................................................... 628 28.3.4 Entropy Modeling............................................................................................. 629 28.3.5 Generalized Gaussian PDF.............................................................................. 630 28.3.6 Information Theoretic Assessment ................................................................ 631 28.4 Experimental Results...................................................................................................... 632 28.4.1 AVIRIS Hyperspectral Data ............................................................................ 632 28.4.2 ASTER Superspectral Data.............................................................................. 636 28.5 Conclusions...................................................................................................................... 640 Acknowledgment....................................................................................................................... 640 References ................................................................................................................................... 640

28.1

Introduction

Information theoretic assessment is a branch of image analysis aimed at defining and measuring the quality of digital images and is presently an open problem [1–3]. By resorting to Shannon’s information theory [4], the concept of quality can be related to the information conveyed to a user by an image or, in general, by multi-band data, that is, to the mutual information between the unknown noise-free digitized signal (either radiance or reflectance in the visible-near infrared (VNIR) and short-wave infrared (SWIR) wavelengths, or irradiance in the middle infrared (MIR), thermal infrared (TIR), and far infrared (FIR) bands) and the corresponding noise-affected observed digital samples. Accurate estimates of the entropy of an image source can only be obtained provided the data are uncorrelated. Hence, data decorrelation must be considered to suppress or largely reduce the correlation existing in natural images. Indeed, entropy is a measure

621

622

Signal and Image Processing for Remote Sensing

of statistical information, that is, of uncertainty of symbols emitted by a source. Hence, any observation noise introduced by the imaging sensor results in an increment in entropy, which is accompanied by a decrement of the information content useful in application contexts. Modeling and estimation of the noise must be preliminarily carried out [5] to quantify its contribution to the entropy of the observed source. Modeling of information sources is also important to assess the role played by the signal-to-noise ratio (SNR) in determining the extent to which an increment in radiometric resolution can increase the amount of information available to users. The models that are exploited are simple, yet adequate, for describing first-order statistics of memoryless information sources and autocorrelation functions of noise processes, typically encountered in digitized raster data. The mathematical tractability of models is fundamental for deriving an information theoretic closed-form solution yielding the entropy of the noise-free signal from the entropy of the observed noisy signal and the estimated noise model parameters. This work focuses on measuring the quality of multi-band remotely sensed digitized images. Lossless data compression is exploited to measure the information content of the data. To this purpose, extremely advanced lossless compression methods capable of attaining the ultimate compression ratio, regardless of any issues of computational complexity [6,7], are utilized. In fact, the bit rate achieved by a reversible compression process takes into account both the contribution of the ‘‘observation’’ noise (i.e., information regarded as statistical uncertainty, whose relevance is null to a user) and the intrinsic information of hypothetically noise-free samples. Once the parametric model of the noise, assumed to be possibly non-Gaussian and both spatially and spectrally autocorrelated, has been preliminarily estimated, the mutual information between noise-free signal and recorded noisy signal is calculated as the difference between the entropy of the noisy signal and the entropy derived from the parametric model of the noise. Afterward, the amount of information that the digitized samples would convey if they were ideally recorded without observation noise is estimated. To this purpose, an entropy model of the source is defined. The inversion of the model yields an estimate of the information content of the noise-free source starting from the code rate and the noise model. Thus, it is possible to establish the extent to which an increment in the radiometric resolution, or equivalently in the SNR, obtained due to technological improvements of the imaging sensor can increase the amount of information that is available to the users’ applications. This objective measurement of quality fits better the subjective concept of quality, that is, the capability of achieving a desired objective as the number of spectral bands increases. Practically, (mutual) information, or equivalently SNR, is the sole quality index for hyperspectral imagery, generally used for detection and identification of materials and spectral anomalies, rather than for conventional multi-spectral classification tasks. The remainder of this chapter is organized as follows. The information theoretic fundamentals underlying the analysis procedure are reviewed in Section 28.2. Section 28.3 presents the information theoretic procedure step by step: noise model, estimation of noise parameters, source decorrelation by differential pulse code modulation (DPCM), and parametric entropy modeling of memoryless information sources via generalized Gaussian densities. Section 28.4 reports experimental results on a hyperspectral image acquired by the Airborne Visible InfraRed Imaging Spectrometer (AVIRIS) radiometer and on a test superspectral image acquired by the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) imaging radiometer. Concluding remarks are drawn in Section 28.5.

Quality Assessment of Remote-Sensing Multi-Band Optical Images

28.2

623

Informat ion Theoretic Problem Statement

If we consider a discrete multi-dimensional signal as an information source S, its average information content is given by its entropy H(S)[8]. An acquisition procedure originates an observed digitized source S^, whose information is the entropy H(S^). H(S^) is not adequate for measuring the amount of acquired information, since the observed source generally does not coincide with the digitized original source, mainly because of the observation noise. Furthermore, the source is not exactly band-limited by half of the sampling frequency; hence, the nonideal sampling is responsible for an additional amount of noise generated by the aliasing phenomenon. Therefore, only a fraction of the original source information is conveyed by the digitized noisy signal. The amount of source information that is not contained in the digitized samples is measured by the conditional entropy H(S jS^) or equivocation, which is the residual uncertainty on the original source when the observed source is known. The contribution of the overall noise (i.e., aliasing and acquisition noise) to the entropy of the digitized source is measured by the conditional entropy H(S^ j S), which represents the uncertainty on the observed source S^ when the original source S is known. Therefore, the larger the acquisition noise, the larger H(S^), even if the amount of information of the original source that is available from the observed (noisy) source is not increased, but diminished by the presence of the noise. A suitable measure of the information content of a recorded source is instead represented by the mutual information: ^) ¼ H(S) H(SjS ^) I(S;S ^) H(S ^jS) ¼ H(S ^) H(S,S ^): ¼ H(S) þ H(S

(28:1)

Figure 28.1 describes the relationship existing between the entropy of the original and the recorded source and mutual information and joint entropy H(S, S^). In the following sections, the procedure reported in Figure 28.2 for estimating the mutual information I(S; S^) and the entropy of the noise-free source H(S) is described later. The estimation relies on parametric noise and source modeling that is also capable of describing non-Gaussian sources usually encountered in a number of application contexts.

^ H (S)

H (S )

^ H (S⏐S )

^ ^ I(S;S) I(S;S)

^ H (S ,S )

^ H (S⏐S)

FIGURE 28.1 Relationship between entropies H(S) and H(S^), equivocation H(S jS^), conditional entropy H(S^jS), mutual information I(S; S^ ), and joint entropy H(S, S^ ).

Signal and Image Processing for Remote Sensing

624

GG PDF Entropy of PDF of Noisy residuals of noisy residuals Noise-free residuals noise-free signal Parametric H(S ) GG PDF De convolution entropy modelling Decorrelation estimation Entropy of noisy signal ^ H(S )

Noisy Signal

GG PFD of decorrelated noise

Noise histogram Noise estimation

GG PDF modelling

Noise parameters (s , r)

+ Noise entropy ^ H(S⏐S )

Mutual information ^ I(S ;S )

+ –

FIGURE 28.2 Flowchart of the information theoretic assessment procedure for a digital signal.

28.3 28.3.1

Information Assessment Procedure Noise Modeling

This section focuses on modeling the noise affecting digitized observed signal samples. Unlike coherent or systematic disturbances, which may occur in some kind of data, the noise is assumed to be due to a fully stochastic process. Let us assume an additive signalindependent non-Gaussian model for the noise g(i) ¼ f (i) þ n(i)

(28:2)

in which g(i) is the recorded noisy signal level and f(i) the noise-free signal at position (i). Both g(i) and f(i) are regarded as nonstationary non-Gaussian autocorrelated stochastic processes. The term n(i) is a zero-mean process, independent of f, stationary and autocorrelated. Let its variance sn2 and its correlation coefficient (CC) r be constant. Let us assume for the stationary zero-mean noise a first-order Markov model, uniquely defined by the r and the sn2 n(i) ¼ r n(i 1) þ «n (i)

(28:3)

in which «n(i) is an uncorrelated random process having variance s2«n ¼ s2n (1 r2 )

(28:4)

The variance of Equation 28.2 can be easily calculated as s2g (i) ¼ s2f (i) þ s2n

(28:5)

due to the independence between signal and noise components and to the spatial stationarity of the latter. From Equation 28.3, it stems that the autocorrelation, Rnn(m), of n(i) is an exponentially decaying function of the correlation coefficient r:

Quality Assessment of Remote-Sensing Multi-Band Optical Images Rnn (m) D E[n(i)n(i þ m)] ¼ rjmj s2n ¼

625 (28:6)

The zero-mean additive signal-independent correlated noise model (Equation 28.3) is relatively simple and mathematically tractable. Its accuracy has been validated for twodimensional (2D) and three-dimensional (3D) signals produced by incoherent systems [3] by measuring the exponential decay of the autocorrelation function in Equation 28.6. The noise samples n(i) may be estimated on homogeneous signal segments, in which f(i) is constant, by taking the difference between g(i) and its average g(i) on a sliding window of length 2m þ 1. Once the CC of the noise, r, and the most homogeneous image pixels have been found by means of robust bivariate regression procedures [3], as described in the next section, the noise samples are estimated in the following way. If Equation 28.3 and Equation 28.6 are utilized to calculate the correlation of the noise affecting g and g on a homogeneous window, the estimated noise sample at the ith position is written as vﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ u (2m þ 1) u [g(i) g(i)] ^(i) ¼ t n m (2m þ 1) 1 þ 2r 1r 1r

(28:7)

^(i)} is made available to find the noise probability density function The resulting set {n (PDF), either empirical (histogram) or parametric, via proper modeling.

28.3.2

Estimation of Noise Variance and Correlation

To properly describe the estimation procedure, a 2D notation is adopted in this section, that is, (i, j) identifies the spatial position of the indexed entities. The standard deviation of the noisy observed band g(i, j) is stated in homogeneous areas as sg (i; j) ¼ sn

(28:8)

Therefore, Equation 28.8 yields an estimate of sn, namely ^n, as the y-intercept of the horizontal regression line drawn on the scatterplot of ^g versus ^g, in which the symbol ^ denotes estimated values, and is calculated only on pixels belonging to homogeneous areas. Although methods based on scatterplots have been devised more than one decade ago for speckle noise assessment [9], the crucial point is the reliable identification of homogeneous areas. To overcome the drawback of a user-supervised method, an automatic procedure was developed [10] on the basis of the fact that each homogeneous area originates a cluster of scatterpoints. All these clusters are aligned along a horizontal straight line having the y-intercept equal to sn. Instead, the presence of signal edges and textures originates scatterpoints spread throughout the plot. The scatterplot relative to the whole band may be regarded as the joint PDF of the estimated local standard deviation to the estimated local mean. In the absence of any signal textures, the image is made up by uniform noisy patches; by assuming that the noise is stationary, the PDF is given by the superposition of as many unimodal distributions as the patches. Because the noise is independent of the signal, the measured variance does not depend on the underlying mean. Thus, all the above expectations are aligned along a horizontal line. The presence of textured areas modifies the ‘‘flaps’’ of the PDF, which still exhibit aligned modes, or possibly a watershed. The idea is to threshold the PDF to identify a number of points belonging to the most homogeneous image areas, large enough to yield

Signal and Image Processing for Remote Sensing

626

a statistically consistent estimate and small enough to avoid comprising signal textures that, even if weak, might introduce a bias by excess. Now let us calculate the space-varying (auto)covariance of unity lag along either of the coordinate directions, say i and j, Cg (i, j;1, 0) D E{[g(i, j) E(g(i, j))] [g(i þ 1, j) E(g(i þ 1, j))]} ¼ Cf (i, j;1, 0) þ rx s2n ¼ (28:9) The term Cf (i, j;1, 0) on right-hand side of Equation 28.9 is identically zero in homogeneous areas in which sg (i, j) becomes equal to sn. Thus, Equation 28.9 becomes Cg (i, j;1, 0) ¼ rx s2n ¼ rx sg (i, j) sg (i þ 1, j)

(28:10)

Hence, rx, and analogously ry, is estimated from those points, lying on the covariance-tovariance scatterplots, corresponding to homogeneous areas. To avoid calculating the PDF, the following procedure was devised: .

Within a (2m þ 1) (2m þ 1) window, sliding over the image, calculate the local statistics of the noisy image to estimate its space-varying ensemble statistics: – average g(i, j ) ^g(i, j ) g(i, j) ¼

1

m m X X

(2m þ 1)2

k¼m l¼m

g(i þ k, j þ l)

(28:11)

– RMS deviation from the average, ^g(i, j), with s ^g2 (i, j) ¼

1

m m X X

(2m þ 1)2 1

k¼m l¼m

[g(i þ k, j þ l) g(i, j)]2

(28:12)

^g(i, j; 1, 0), given by – cross deviation from the average C ^ g (i, j;1, 0) ¼ C

1

m m X X

(2m þ 1)2 1 k¼m l¼m

[g(i þ k, j þ l) g(i, j)] [g(i þ k þ 1, j þ l) g(i þ 1, j)] (28:13)

.

^g(i, j; 1, 0) Draw scatterplots either of ^g(i, j ) versus ^g(i, j ) to estimate sn or of C versus ^g(i, j ) ^g(i þ 1, j ) to estimate rx.

.

Partition the scatterplot planes into an L L array of rectangular blocks. Sort and label such blocks by decreasing population, that is, the number of scatter points: if C() denotes the cardinality of a set, then C(B(k)) C(B(k þ 1)), k ¼ 1, . . . , L2.

.

.

For each scatterplot, calculate a succession ofShorizontal regression lines, that is {^ n(k)}, from the set of scatter points {p j p 2 lk¼ 1 B(l)}.

The succession attains a steady value of the parameter after a number of terms that depend on the actual percentage of homogeneous points. The size of the partition and the stop criterion are noncrucial; however, a 100 100 array of blocks and the stop criterion is applied after processing 2–10% of points, depending on the degree of

Quality Assessment of Remote-Sensing Multi-Band Optical Images

627

heterogeneity of the scene, are usually setup. The size of the processing window, that is, (2m þ 1) (2m þ 1), is noncrucial if the noise is white. Otherwise, a size 7 7 to 11 11 is recommended, because too small a size will underestimate the covariance. It is noteworthy that, unlike most of the methods that rely on the assumption of white noise, the scatterplot method, which is easily adjustable to deal with signaldependent noise, can also accurately measure correlated noise, and is thus preferred in this context. To show an example of the estimation procedure, Figure 28.3a portrays band 25 extracted from the sequence acquired on the Cuprite Mine test site in 1997 by the AVIRIS instrument. On this image, homogeneous areas that contribute to the estimation procedure have been automatically extracted (Figure 28.3b). The scatterplot of local standard deviation to local mean is plotted in Figure 28.3c. Homogeneous areas cluster and determine the regression line whose y-intercept is the estimated standard deviation of noise. Eventually, the scatterplot of local one-lag covariance to local variance is reported in Figure 28.3d; the slope of the regression line represents the estimated correlation coefficient. (a)

(b)

(d) 2,000

Local one-lag covariance

Local standard deviation

(c) 1,800 1,600 1,400 1,200 1,000 800 600 400

200,000

160,000

120,000

80,000

40,000

200 0 4,000

5,000

6,000

7,000

8,000

Local mean

9,000

10,000

0 0

40,000

80,000

120,000 160,000 200,000

Local variance

FIGURE 28.3 Estimation of noise parameters: (a) AVIRIS band 25 extracted from the sequence of Cuprite Mine test site acquired in 1997; (b) map of homogeneous areas that contribute to the estimation; (c) scatterplot of local standard deviation to local mean; the y-intercept of the regression line is the estimated standard deviation of noise; (d) scatterplot of local one-lag covariance to local variance; the slope of the regression line is the estimated correlation coefficient.

Signal and Image Processing for Remote Sensing

628 28.3.3

Source Decorrelation via DPCM

Differential pulse code modulation is usually employed for reversible data compression. DPCM basically consists of a prediction stage followed by entropy coding of the resulting prediction errors. For the sake of clarity, we develop the analysis for a 1D fixed DPCM and extend its results to the case of 2D and 3D adaptive prediction [6,7,11,12]. Let ^ g(i) denote the prediction at pixel i obtained as a linear regression of the values of P previous pixels ^ g(i) ¼

P X

f(j) g(i j)

(28:14)

j¼1

in which {f(j), j ¼ 1,. . ., P} are the coefficients of the linear predictor and constant throughout the image. By replacing the additive noise model, one obtains ^ g(i) ¼ ^f (i) þ

P X

f(j) n(i j)

(28:15)

j¼1

in which ^f (i) ¼

P X

f(j) f (i j)

(28:16)

j¼1

represents the prediction for the noise-free signal expressed from its previous samples. Prediction errors of g are D

eg (i) ¼ g(i) ^ g(i) ¼ ef (i) þ n(i)

P X

f(j) n(i j)

(28:17)

j¼1 D f (i) ^f (i) is the error the predictor would produce starting from noise-free in which ef (i) ¼ data. Both eg(i) and ef (i) are zero-mean processes, uncorrelated, and nonstationary. The zero-mean property stems from an assumption of local first-order stationarity, within the (P þ 1)-pixel window comprising the current pixel and its prediction support. Equation 28.17 is written as

eg (i) ¼ ef (i) þ en (i)

(28:18)

in which D

^(i) ¼ n(i) en (i) ¼ n(i) n

P X

f(j) n(i j)

(28:19)

j¼1

is the error produced when the correlated noise is predicted. The term en(i) is assumed to be zero mean, stationary, and independent of ef (i), because f and n are assumed to be independent of each other. Thus, the relationship among the variances of the three types of prediction errors becomes

Quality Assessment of Remote-Sensing Multi-Band Optical Images s2eg (i) ¼ s2ef (i) þ s2en

629 (28:20)

From the noise model in Equation 28.3, it is easily noticed that the term se2n is lower bounded by s«2n , which means that se2n sn2 (1 r2). The optimum MMSE predictor for a first-order Markov model, like in Equation 28.3, is f(1) ¼ r and f( j) ¼ 0, j ¼ 2, . . . , P; it yields se2n ¼ sn2 (1 r2) ¼ s«2n , which can easily be verified. Thus, the residual variance of the noise after decorrelation may be approximated from the estimated variance of the correlated noise, that is, ^n2 , and from its estimated CC, ^, as ^n2 (1 r^2 ) s2en ﬃ s

(28:21)

The approximation is more accurate as the predictor attains the optimal MMSE performance.

28.3.4

Entropy Modeling

Given a stationary memoryless source S, uniquely defined by its PDF, p(x), having zero mean and variance s2, linearly quantized with a step size D, the minimum bit rate needed to encode one of its symbols is [13] R ﬃ h(S) log2 D

(28:22)

in which h(S) is the differential entropy of S defined as h(S) ¼

ð1 1

p(x) log2 p(x) dx ¼

1 log2 (c s2 ) 2

(28:23)

where 0 < c 2pe is a positive constant accounting for the shape of the PDF and attaining its maximum for a Gaussian function. Such a constant is referred in the following as the entropy factor. The approximation in Equation 28.22 holds for s D, but is still acceptable for s > D [14]. Now, the minimum average bit rate Rg necessary to reversibly encode an integervalued sample of g, approximated as in Equation 28.22, in which prediction errors are regarded as an uncorrelated source G {eg(i)} and are linearly quantized with a step size D ¼ 1 Rg ﬃ h(G) ¼

1 log2 (cg s e2g ) 2

(28:24)

in which e2g is the average variance of eg(i). By averaging Equation 28.20 and replacing it in Equation 28.24, Rg may be written as Rg ¼

1 log2 [cg (s e2f þ s2en )] 2

(28:25)

where ef2 is the average variance of sef2(i). If sef2 ¼ 0, then Equation 28.25 reduces to Rg Rn ¼

1 log2 (cn s2en ) 2

(28:26)

Signal and Image Processing for Remote Sensing

630

in which cn ¼ 2pe is the entropy factor of the PDF of en, if n is Gaussian. Analogously, if se2n ¼ 0, then Equation 28.25 becomes Rg Rf ¼

1 log2 (cf s e2f ) 2

(28:27)

in which cf 2pe is the entropy factor of prediction errors of the noise-free image, which are generally non-Gaussian. The average entropy of the noise-free signal f in the case of correlated noise is given by replacing Equation 28.20 and Equation 28.21 in Equation 28.27 to yield Rf ¼

1 log2 {cf [s e2g (1 r2 ) s2n ]} 2

(28:28)

Since e2g can be measured during compression process by averaging se2g, cf is the only unknown parameter whose estimation is crucial for the accuracy of Rf. The determination of cf can be performed by modeling the PDF of ef through the PDF of eg and en. The generalized Gaussian density (GGD) model, which can properly represent these PDFs, is described in the next section. 28.3.5

Generalized Gaussian PDF

A model suitable for describing unimodal non-Gaussian amplitude distributions is achieved by varying the parameters n (shape factor) and s (standard deviation) of the GGD [15], which is defined as

n h(n, s) pGG (x) ¼ exp{ [h(n, s) jxj]n } 2 G(1=n)

(28:29)

where h(n, s) ¼

1 G(3=n) 1=2 s G(1=n)

(28:30)

Ð and G() is the well-known Gamma function, that is, G(z) ¼ 01 tz 1 e t dt, z > 0. Since G(n) ¼ (n 1)!, when n ¼ 1, a Laplacian law is obtained; but, n ¼ 2 yields a Gaussian distribution. As limit cases, for n ! 0, pGG(x) becomes an impulse function, yet has extremely heavy tails and thus a nonzero s2 variance, whereas for n ! 1, pGG(x) approaches a uniform distribution having variance s2 as well. The shape parameter n rules the exponential rate of decay: the larger the n, the flatter the PDF; the smaller the n, the more peaked the PDF. Figure 28.4a shows the trend of the GG function for different values of n. The matching between a GGD and the empirical data distribution can be obtained following a maximum likelihood (ML) approach [16]. Such an approach has the disadvantage of a cumbersome numerical solution. Some effective methods, which are suitable for real-time applications, have been developed. They are based on fitting a parametric function (shape function) of the modeled source to statistics calculated from the observed data [17–19]. In Figure 28.4b, the kurtosis [19], Mallat’s [20] and entropy matching [18] shape functions are plotted as a function of the shape factor n for a unity variance GG. In the experiments carried out in this work, the method briefly introduced by Mallat [20] and then developed in greater detail by Sharifi and Leon-Garcia [17] has been adopted.

Quality Assessment of Remote-Sensing Multi-Band Optical Images (a)

(b)

1.8

2.5 0.6

1.5

1.4 1.2 1 0.8

1

0.6 2

0.4

Mallat’s function

1 0.5

Kurtosis function

0 −0.5 −1 −1.5

10

−2

0.2 0 −3

Entropic function

2 Shape function

Probability density

1.6

631

−2

−1

0 1 Amplitude

2

3

−2.5

0

0.5

1

1.5 2 Shape factor

2.5

3

FIGURE 28.4 (a) Unity-variance GG density plotted for several n’s; (b) shape functions of a unity-variance GG PDF as a function of the shape factor n.

28.3.6

Information Theoretic Assessment

Let us assume that the real-valued eg(i) may be modeled as a GGD. From Equation 28.24, the entropy function is 1 log2 (cg ) ¼ Rg log2 (s eg ) ¼ FH (neg ) 2

(28:31)

in which neg is the shape factor of eg(i), the average rate of which, Rg, has been set equal to the entropy H of the discrete source. The neg is found by inverting either the entropy function FH(neg) or any other shape function; in that case, the eg(i) produced by DPCM, instead of its variance, is directly used. Eventually, the parametric PDF of the uncorrelated observed source eg(i) is available. The term eg(i) is obtained by adding a sample of white non-Gaussian noise of variance se2n , approximately equal to (1 r2) sn2 , to a sample of the noise-free uncorrelated nonGaussian signal ef (i). Furthermore, ef (i) and en(i) are independent of each other. Therefore, the GG PDF of eg previously found is given by the linear convolution of the unknown pef(x) with a GG PDF having variance se2n and shape factor nen. By assuming that the PDF of the noise-free residues, pef(x), is GG as well, its shape factor nef can be obtained starting from the forward relationship pGG [s eg , neg ](x) ¼ pGG [

qﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ s e2g s2en , nef ](x) pGG [sen , nen ](x)

(28:32)

by deconvolving the PDF of noise residue from that of noisy signal residue. In a practical implementation, the estimated value of nef is found such that the direct convolution at the right-hand side of Equation 28.32 yields a GGD, whose shape factor matches neg as much as possible. Eventually, the estimated shape factor ^ef is used to determine the entropy function 1 log2 (cf ) ¼ FH (^ ne f ) 2

(28:33)

Signal and Image Processing for Remote Sensing

632

which is replaced in Equation 28.28 to yield the entropy of the noise-free signal Rf H(S). Figure 28.2 summarizes the overall procedure. The mutual information is simply given as the difference between the rate of the decorrelated source and the rate of the noise. The extension of the procedure to 2D and 3D signals, that is, to digital images and sequences of digital images, is straightforward. In the former case, 2D prediction is used to find eg, and two correlation coefficients, rx and ry, are to be estimated for the noise, since its variance after decorrelation is approximated as sn2(1 r2x)(1 r2y), by assuming a separable 2D Markov model. Analogously, Equation 28.7 that defines the estimated value of a sample of correlated noise is extended as vﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ u N2 u [g(i, j) g(i, j)] ^(i, j) ¼ t n 1rm rm x (N2 1 þ 2rx 11r 1 þ 2ry 1ry x

(28:34)

y

in which N ¼ 2m þ 1 is the length of the side of the square window on which the average (i, j) is calculated. g The 3D extension is more critical because a sequence of images may have noise variances and spatial CCs different for each image. Moreover, it is often desirable to estimate entropy and mutual information of the individual images of the sequence. Therefore, each image is decorrelated both spatially and along the third dimension by using a 3D prediction.

28.4

Experimental Results

The results presented in this section refer to AVIRIS and ASTER sensors. Such sensors can be considered as representatives of hyperspectral and superspectral instruments, respectively. Apart from specific design characteristics (e.g., only ASTER has TIR channels), their main difference for users consists in the spectral resolution that is particularly high (10 nm) for AVIRIS, thus yielding the acquisition of a consistent number of spectral bands (224). An obvious question would be to assess to what extent such a high spectral resolution and large number of bands may correspond to an increase in information content. Although it is not possible to answer this and it would require the definition of an ideal experiment in which the same scene would be acquired with two ideal sensors differing in spectral resolution only, some careful consideration is, however, possible by referring to the mutual information measured for the two sensors.

28.4.1

AVIRIS Hyperspectral Data

Hyperspectral imaging sensors provide a thorough description of a scene in a quasicontinuous range of wavelengths [21]. A huge amount of data is produced for each scene: problems may arise for transmission, storage, and processing. In several applications, the whole set of data is redundant and therefore it is difficult to seek the intrinsically embedded information [22]. Hence, a representation of the data that allows essential information to be condensed in a few basic components is desirable to expedite scene analysis [23]. The challenging task of hypervariate data analysis may be alleviated by

Quality Assessment of Remote-Sensing Multi-Band Optical Images

633

resorting to information theoretic methods capable of quantifying the information content of the data that are likely to be useful in application contexts [3,24]. The proposed information theoretic procedure was run on a hyperspectral image having 224 spectral bands. The image was acquired in 1997 by the AVIRIS operated by NASA/JPL on the Cuprite Mine test site in Nevada. AVIRIS sequences are constituted by 224 bands recorded at different wavelengths in the range 380–2500 nm, with a spectral separation between two bands of 10 nm nominally. The sensor acquires images pixel by pixel (whisk-broom), recording the spectrum of each pixel. The size of an image is 614 pixels in the across-track direction, while the size in the along-track direction is variable and limited only by the onboard mass storage capacity. The instantaneous field of view (IFOV) is about 1 mrad, which means that, at the operating height of 20 km, the spatial resolution is about 20 m. The sequence was acquired by the sensor with the 12-bit analog-to-digital converter (ADC) introduced in 1995 in substitution of the 10-bit ADC originally designed in 1987. All the data are radiometrically calibrated and expressed as radiance values (i.e., power per surface, solid angle, and wavelength units). The AVIRIS system features four distinct spectrometers capable of operating in the visible (VIS) (380–720 nm), NIR (730–1100 nm), and in the first and second part of the SWIR interval (1110–2500). The three transition regions of the spectrum are covered by two adjacent spectrometers, whose bands are partially overlapped. The Cuprite Mine image for which results are reported is the fourth 614 512 portion of the full image composed of four consecutive 614 512 frames. A sample of six bands, four acquired in the blue, green, red, and NIR wavelengths, two in the SWIR wavelengths, is shown in Figure 28.5. The first step of the procedure concerns noise estimation of all 224 bands. Across-track and along-track CCs of the noise, that is, rx and ry, are plotted against band number in (a)

(b)

(c)

(d)

(e)

(f)

FIGURE 28.5 Sample bands from Cuprite Mine AVIRIS hyperspectral image (128 128, detail): (a) blue wavelength (band 8); (b) green (band 13); (c) red (band 23); (d) near infrared (band 43); (e) lower short-wave infrared (band 137); (f) upper short-wave infrared (band 181).

Signal and Image Processing for Remote Sensing

634

(b) 1

1

−8

−8

Noise CC along track

Noise CC across track

(a)

−6 −4 −2 0 −2

0

50

100 150 Band number

Noise standard deviation

Noise CC along wavelength

−8 −6 −4 −2 0 0

50

100 150 Band number

0 0

50

100 150 Band number

200

0

50

100 150 Band number

200

0

50

100 150 Band number

200

120 100 80 60 40 20 0

200

(e)

(f) 4

45

3.5

40

3

35 SNR (dB)

Noise GG shape factor

−2

(d) 140

1

−2

−4

−2

200

(c)

−6

2.5 2 1.5

30 25 20 15

1

10

−5

5

0

0

50

100 150 Band number

200

0

FIGURE 28.6 Noise parameters of test AVIRIS hyperspectral image plotted against band number: (a) CC across track, rx; (b) CC along track, ry; (c) CC along spectral direction, rl; (d) square root of noise variance, sn; (e) shape factor of GGmodeled noise PDF, nn (average 2.074); (f) SNR.

Figure 28.6a and Figure 28.6b, respectively. The CCs exhibit similar values (averages of 0.403 versus 0.404), showing that the data have been properly preprocessed to eliminate any striping effect possibly caused by the scanning mechanism that produces the images line by line following the motion of the airborne platform (track). Apart from the transition between the first and the second spectrometers, marked losses of correlation are noticed around changes between the other spectrometers. The spectral CC of the noise

Quality Assessment of Remote-Sensing Multi-Band Optical Images

635

(Figure 28.6c) seems rather singular: apart from the absorption bands, all the spectrometers exhibit extremely high values (average of 0.798), probably originated by preprocessing along the spectral direction aimed at mitigating structured noise patterns occurring in the raw data [3]. The measured standard deviation of correlated noise is drawn in Figure 28.6d; the noise follows the typical spectral distribution provided by JPL as NER [14]; the average value of its standard deviation is 32.86. Figure 28.6e shows that the noise of AVIRIS data is Gaussian with good approximation (average shape factor 2.074), as assumed in earlier works [25,3], with the noise shape factor rather stable around 2. Eventually, Figure 28.6f demonstrates that the SNR is almost uniform, varying with the wavelength, apart from the absorption bands, and close to values greater than 35 dB (average of 33.03 dB). This feature is essential for the analysis of spectral pixels, for which a uniform SNR is desirable. Once all noise parameters had been estimated, the information theoretic assessment procedure was run on the AVIRIS data set. Figure 28.7 reports plots of information parameters varying with band number (not exactly corresponding to wavelength because every couple of adjacent spectrometers yields duplicate bands at the same wavelengths).

(b) 10 9 8 7 6 5 4 3 2 1 0

Entropy of noise (bit/pel)

Entropy of observed radiance (bit/pel)

(a)

0

50

100 150 Band number

200

0

50

100 150 Band number

200

0

50

100 150 Band number

200

(d) 10 9 8 7 6 5 4 3 2 1 0

Entropy of noise-free data (bit/pel)

(c) Mutual information (bit/pel)

10 9 8 7 6 5 4 3 2 1 0

0

50

100 150 Band number

200

10 9 8 7 6 5 4 3 2 1 0

FIGURE 28.7 Information parameters of test AVIRIS hyperspectral image plotted against band number: (a) code rate of noisy radiance, Rg, approximating H(S^ ); (b) noise entropy, Rn, corresponding to H(S^jS ); (c) mutual information, I(S; S^), given by Rg Rn (average 0.249); (d) entropy of noise-free radiance, H(S), given by Rf obtained by inverting the parametric source entropy model.

Signal and Image Processing for Remote Sensing

636

The entropy of noisy radiance was approximated by the code bit rates provided by the RLP 3D encoder [12], working in reversible mode, that is, without quantization (roundoff to integer only) of prediction residues. The entropy of the estimated noise (Rn), plotted in Figure 28.7b, follows a trend similar to that of bit rate (Rg) in Figure 28.7a, which also reflects the trend of noise variance (Figure 28.6d) with the only exception of the first spectrometer. Rg has an average value of 4.896 bit per radiance sample, Rn of 4.647. The mutual information, given by the difference of the two former plots and drawn in Figure 28.7c, reveals that the noisiness of the instrument has destroyed most of noisefree radiance information. Only an average of 0.249 bit survived. Conversely, the radiance information varying with band number would approximately be equal to that shown in Figure 28.7d (about 3.57 bits, in average) if the recording process were ideally noise-free. According to the assumed distortion model of spectral radiance, its entropy must be zero whenever the mutual information is zero as well. This condition occurs only in the absorption bands, where SNR is low, thereby validating the implicit relationship between SNR and information.

28.4.2

ASTER Superspectral Data

Considerations on the huge amount of produced data and the opportunity to characterize the basic components in which information is condensed also apply for superspectral sensors like ASTER [26]. In fact, the number of spectral bands is moderate, but the swath width is far wider than that of hyperspectral sensors and therefore the data volume is still huge. The proposed information theoretic procedure was run on the ASTER superspectral image acquired on the area of Mt. Fuji and made available by Earth Remote Sensing Data Analysis Center (ERSDAC) of Ministry of Economy, Trade, and Industry of Japan (METI). L1B data, that is, georeferenced and radiometrically corrected, were analyzed. ASTER collected data in 14 channels of VNIR, SWIR, and TIR spectral range. Images had different spatial, spectral, and radiometric resolutions. The main characteristics of the ASTER data are reported in Table 28.1. Figure 28.8 shows details of the full size test image. Three out of the 14 ASTER bands are reported as representative of VNIR, SWIR, and TIR spectral intervals. The relative original spatial scale has been maintained between images, while contrast has been enhanced to improve visual appearance. The first step of the procedure concerns noise estimation of all 14 bands. Across-track and along-track CCs of the noise, that is, rx and ry, are plotted against band number in Figure 28.9a and Figure 28.9b, respectively. In the former, the correlation of noise is lower in VNIR and SWIR bands than in TIR ones, whereas in the latter this behavior is opposite. This can easily be explained since VNIR and SWIR sensors are push-broom imagers that TABLE 28.1 Spectral, Spatial, and Radiometric Resolution of ASTER Data VNIR (15 m–8 bit/sample) (mm) Band 1: 0.52–0.60 Band 2: 0.63–0.69 Band 3: 0.76–0.86

SWIR (30 m–8 bit/sample) (mm) Band 4: 1.600–1.700 Band 5: 2.145–2.185 Band 6: 2.185–2.225 Band 7: 2.235–2.285 Band 8: 2.295–2.365 Band 9: 2.360–2.430

TIR (90 m–12 bit/sample) (mm) Band 10: Band 11: Band 12: Band 13: Band 14:

8.125–8.475 8.475–8.825 8.925–9.275 10.25–10.95 10.95–11.65

Quality Assessment of Remote-Sensing Multi-Band Optical Images (a)

637

(b)

(c)

FIGURE 28.8 Detail of ASTER superspectral test image: (a) VNIR band 2, resolution 15 m; (b) SWIR band 6, resolution 30 m; (c) TIR band 12, resolution 90 m.

acquire data along-track, whereas the TIR sensor is a whisk-broom device that scans the image across-track. Therefore, in VNIR and SWIR bands, the same sensor element produces a line in the along-track direction and the noise CC results higher along that direction. The spectral CC of the noise is reported in Figure 28.9c. The CC values are rather low, in contrast to hyperspectral sensors where preprocessing along the spectral direction, aimed at mitigating structured noise patterns occurring in the raw data [3], can introduce significant correlation among bands. The CC values of bands 1, 4, and 10 in Figure 28.9c have been arbitrarily set equal to 0, because a previous reference band is not available for the measurement. The measured standard deviation of correlated noise is drawn in Figure 28.9d where two curves are plotted to take into account that the ADC of the TIR sensor has 12 bits, while the ADCs of the VNIR and SWIR sensors have 8 bits. The solid line refers to the acquired data and suggests that the VNIR and SWIR data are less noisy than those of TIR. The dashed line has been obtained by rescaling the 8-bit VNIR and SWIR data to the 12-bit dynamic range of TIR sensor; it represents a likely indicator of the noisiness of the VNIR and SWIR data if they were acquired with a 12-bit ADC. Therefore, it is evident that TIR data are less noisy than the others, as shown in the SNR plot reported in Figure 28.9f. In fact, the SNR values of TIR bands are higher than in the other bands because of the 12-bit dynamic range of the sensor and the low value of the standard deviation of the noise, once the data have been properly rescaled. As a further consideration, Figure 28.9d and Figure 28.9f evidence that the standard deviation of the noise of ASTER data in all bands is rather low and, conversely, the SNR is somewhat high when compared to other satellite imaging sensors.

Signal and Image Processing for Remote Sensing

638

(b) 0.6

0.6

0.5

0.5

Noise CC along track

Noise CC across track

(a)

0.4 0.3 0.2 0.1 0

0.3 0.2 0.1 0

0.1

0.1

0.2

0.2

0.3

0

2

4

6 8 10 Band number

12

0.3

14

(c) 0.7

10

0.6

9

0.5

2

4

6 8 10 Band number

12

14

0.4 0.3 0.2 0.1 0 0.1

8 7 6 5 4 3 2 1

0.2 0.3

0

2

4

6 8 10 Band number

12

0

14

(e)

0

2

4

6 8 10 Band number

12

14

0

2

4

6 8 10 Band number

12

14

(f) 55

2.5

50

2 SNR (dB)

Noise GG shape factor

0

(d)

Noise standard deviation

Noise CC along wavelength

0.4

1.5 1

45 40 35

0.5 30 0 0

2

4

6 8 10 Band Number

12

14

25

FIGURE 28.9 Noise parameters of test ASTER superspectral image plotted against band number: (a) CC across track, rx; (b) CC along track, ry; (c) CC along spectral direction, rl; (d) square root of noise variance, sn: solid line represents values relative to data with their original dynamic range; dashed line plots values of VNIR and SWIR data rescaled to match the dynamic range of TIR; (e) shape factor of GG-modeled noise PDF, nn; (f) SNR.

Eventually, Figure 28.9e plots the noise shape factor. Conversely, from AVIRIS, the noise shape factor is rather different from 2, thereby suggesting that the noise might be non-Gaussian. This apparent discrepancy is due to an inaccuracy in the estimation procedure and probably occurs because the noise level in VNIR and SWIR bands is low (in the 8-bit scale) and hardly separable from weak textures.

Quality Assessment of Remote-Sensing Multi-Band Optical Images

639

Once all noise parameters had been estimated, the information theoretic assessment procedure was run on the ASTER data set. The entropy of observed images was again approximated by the code bit rates provided by the RLP 3D encoder [12], working in reversible mode. Owing to the different spatial scales of ASTER images in VNIR, SWIR, and TIR spectral range, each data set has been processed separately. Therefore, the 3D prediction has been carried out for each spectrometer. As a consequence, the decorrelation of the first image of each data set cannot exploit the correlation with the spectrally preceding band. Thus, the code rate Rg, and consequently Rn and Rf, is inflated for bands 1, 4, 10 (some tenths of bit for bands 1, 4, and roughly 1 bit for band 10). Figure 28.10 reports plots of information parameters varying with band numbers. Entropies are higher for TIR bands than for the others because of the 12-bit ADC. The bit rate Rg is reported in Figure 28.10a. The inflation effect is apparent for band 10 and hardly recognizable for bands 1 and 4. The high value of Rg for band 3 is due to the response of vegetation that introduces a strong variability in the observed scene and makes band 3 strongly uncorrelated with the other VNIR bands. Such variability represents an increase in information that is identified and measured by the entropy values.

(b) 6

6 Entropy of noise (bit/pel)

Entropy of observed data (bit/pel)

(a)

5 4 3 2 1 0

0

2

4

6 8 10 Band number

12

3 2 1 0

2

4

6 8 10 Band number

12

14

0

2

4

6 8 10 Band number

12

14

(d) Entropy of noise-free data (bit/pel)

6 Mutual information (bit/pel)

4

0

14

(c)

5

5 4 3 2 1 0

0

2

4

6 8 10 Band number

12

14

6 5 4 3 2 1 0

FIGURE 28.10 Information parameters of test ASTER superspectral image plotted against band number: (a) code rate of observed data, Rg, approximating H(S^ ); (b) noise entropy, Rn, corresponding to H(S^jS); (c) mutual information, I(S, S^), given by Rg Rn; (d) entropy of noise-free data, H(S), given by Rf obtained by inverting the parametric source entropy model.

Signal and Image Processing for Remote Sensing

640

The entropy of the estimated noise, plotted in Figure 28.10b, follows a trend similar to that of bit rate in Figure 28.10a, with a noticeable difference in band 3 (NIR), where the information presents a peak value, as already noticed. The mutual information, I(S; S^), given by the difference of the two former plots and drawn in Figure 28.10c has retained a significant part of noise-free radiance information. In fact, the radiance information varying with the band number is consistent with that shown in Figure 28.10d. This happens because the recording process is weakly affected by the noise. According to the assumed model, when the noise level is low and does not affect the observation significantly, the entropy of the noise-free source must show the same trend of the mutual information and the entropy of the observed data.

28.5

Conclusions

A procedure for information theoretic assessment of multi-dimensional remote-sensing data has been described. It relies on robust noise parameters estimation and advanced lossless compression to calculate the mutual information between the noise-free bandlimited analog signal and the acquired digitized signal. Also, because of the parametric entropy modeling of information sources, it is possible to upper bound the amount of information generated by an ideally noise-free process of sampling and digitization. The results on image sequences acquired by AVIRIS and ASTER imaging sensors offer an estimation of the true and hypothetical information contents of each spectral band. In particular, for a single spectral band, mutual information of ASTER data results higher than that of AVIRIS. This is not surprising and is due to the nature of hyperspectral data that are strongly correlated and thus well predictable, apart from the noise. Therefore, a hyperspectral band usually exhibits mutual information lower than that of a superspectral band. Nevertheless, the sum of the mutual information of the hyperspectral bands on a given spectral interval should be higher than the mutual information of only one band covering the same spectral interval. Interesting considerations should stem from future work devoted to the analysis of data of a same scene acquired at the same time by different sensors.

Acknowledgment The authors wish to thank NASA/JPL and ERSDAC/METI for providing the test data.

References 1. Huck, F.O., Fales, C.L., Alter-Ganterberg, R., Park, S.K., and Rahman, Z., Information-theoretic assessment of sampled imaging systems, J. Opt. Eng., 38, 742–762, 1999. 2. Park, S.K. and Rahman, Z., Fidelity analysis of sampled imaging systems, J. Opt. Eng., 38, 786– 800, 1999.

Quality Assessment of Remote-Sensing Multi-Band Optical Images

641

3. Aiazzi, B., Alparone, L., Barducci, A., Baronti, S., and Pippi, I., Information theoretic assessment of sampled hyperspectral imagers, IEEE Trans. Geosci. Rem. Sens., 39, 1447–1458, 2001. 4. Shannon, C.E. and Weaver, W., The Mathematical Theory of Communication, University of Illinois Press, Urbana, IL, 1949. 5. Aiazzi, B., Alparone, L., Barducci, A., Baronti, S., and Pippi, I., Estimating noise and information of multispectral imagery, J. Opt. Eng., 41, 656–668, 2002. 6. Aiazzi, B., Alparone, L., and Baronti, S., Fuzzy logic-based matching pursuits for lossless predictive coding of still images, IEEE Trans. Fuzzy Syst., 10, 473–483, 2002. 7. Aiazzi, B., Alparone, L., and Baronti, S., Near-lossless image compression by relaxation-labelled prediction, Signal Process., 82, 1619–1631, 2002. 8. Blahut, R.E., Principles and Practice of Information Theory, Addison-Wesley, Reading, MA, 1987. 9. Lee, J.S. and Hoppel, K., Noise modeling and estimation of remotely sensed images, Proc. IEEE Int. Geosci. Rem. Sens. Symp., 2,1005–1008, 1989. 10. Aiazzi, B., Alparone, L., and Baronti, S., Reliably estimating the speckle noise from SAR data, Proc. IEEE Int. Geosci. Rem. Sens. Symp., 3,1546–1548, 1999. 11. Aiazzi, B., Alba, P., Alparone, L., and Baronti, S., Lossless compression of multi/hyper-spectral imagery based on a 3-D fuzzy prediction, IEEE Trans. Geosci. Rem. Sens., 37, 2287–2294, 1999. 12. Aiazzi, B., Alparone, L., and Baronti, S., Near-lossless compression of 3-D optical data, IEEE Trans. Geosci. Rem. Sens., 39, 2547–2557, 2001. 13. Jayant, N.S. and Noll, P., Digital Coding of Waveforms: Principles and Applications to Speech and Video, Prentice Hall, Englewood Cliffs, NJ, 1984. 14. Roger, R.E. and Arnold, J.F., Reversible image compression bounded by noise, IEEE Trans. Geosci. Rem. Sens., 32, 19–24, 1994. 15. Birney, K.A. and Fischer, T.R., On the modeling of DCT and subband image data for compression, IEEE Trans. Image Process., 4, 186–193, 1995. 16. Mu¨ller, F., Distribution shape of two-dimensional DCT coefficients of natural images, Electron. Lett., 29, 1935–1936, 1993. 17. Sharifi, K. and Leon-Garcia, A., Estimation of shape parameter for generalized Gaussian distributions in subband decompositions of video, IEEE Trans. Circuits Syst. Video Technol., 5, 52–56, 1995. 18. Aiazzi, B., Alparone, L., and Baronti, S., Estimation based on entropy matching for generalized Gaussian PDF modeling, IEEE Signal Process. Lett., 6, 138–140, 1999. 19. Kokkinakis, K. and Nandi, A.K., Exponent parameter estimation for generalized Gaussian PDF modeling, Signal Process., 85, 1852–1858, 2005. 20. Mallat, S., A theory for multiresolution signal decomposition: the wavelet representation, IEEE Trans. Pattern Anal. Machine Intell., 11, 674–693, 1989. 21. Shaw, G. and Manolakis, D., Signal processing for hyperspectral image exploitation, IEEE Signal Process. Magazine, 19, 12–16, 2002. 22. Jimenez, L.O. and Landgrebe, D.A., Hyperspectral data analysis and supervised feature reduction with projection pursuit, IEEE Trans. Geosci. Rem. Sens., 37, 2653–2667, 1999. 23. Landgrebe, D.A., Hyperspectral image data analysis, IEEE Signal Process. Mag., 19, 17–28, 2002. 24. Aiazzi, B., Baronti, S., Santurri, L., Selva, M., and Alparone, L., Information–theoretic assessment of multi-dimensional signals, Signal Process., 85, 903–916, 2005. 25. Roger, R.E. and Arnold, J.F., Reliably estimating the noise in AVIRIS hyperspectral images, Int. J. Rem. Sens., 17, 1951–1962, 1996. 26. Aiazzi, B., Alparone, L., Baronti, S., Santurri, L., and Selva, M., Information theoretic assessment of aster super-spectral imagery. In Image and Signal Processing for Remote Sensing XI, Bruzzone, L., ed., Proc. SPIE, Bellingham, Washington, Volume 5982, 2005.

Horizantal IR components 1 to 4 100 100

50

Vertical pixels

200

300

0

400 −50 500

−100

600 50

100

150

200 300 250 Horizontal pixels

350

400

450

500

FIGURE 1.13 Component images 1 to 4 from the horizontal rows used to produce a composite image representing the shortest scales.

Horizontal IR components 5 to 6 40 100

30 20

200 Vertical pixels

10

300 0

400

−10 −20

500

−30 600 50

100

150

200 250 300 350 Horizontal pixels

400

450

500

−40

FIGURE 1.14 Component images 5 to 6 from the horizontal rows used to produce a composite image representing the longer scales.

Vertical pixels

Vertical IR components 3 to 5

100

40

200

20

300

0

400

−20

500

−40

600

−60 50

100

150

200 250 300 Horizontal pixels

350

400

450

500

FIGURE 1.15 Component images 3 to 5 from the vertical rows here combined to produce a composite image representing the midrange scales.

Vertical IR components image 6 25 20 100 15

Vertical pixels

200

10 5

300 0 −5

400

−10 500 −15 −20

600 50

100

150

200 250 300 Horizontal pixels

350

400

450

500

FIGURE 1.16 Component image 6 from the vertical row used to produce a composite image representing the longer scale.

Horizontal row 400: components 1 to 4 10

Inverse length scale (1/pixel)

0.45

9

0.4

8

0.35

7

0.3

6

0.25

5

0.2

4

0.15

3

0.1

2 1

0.05 50

100

150

200 250 300 350 Horizontal pixels

400

450

500

FIGURE 1.17 The results from the NEMD/NHT computation on horizontal row 400 for components 1 to 4, which resulted from Figure 1.13. Note the apparent influence of surface waves on the IR information. The most intense IR radiation can be seen at the smaller values of inverse length scale, denoting the longer scales in components 3 and 4. A wavelike influence can be seen at all scales.

Horizontal row 400: components 5 to 6

0.04

8 0.035 7 Inverse length scale (1/pixel)

0.03 6 0.025 5 0.02 4 0.015

3

0.01

2

0.005

0 0

1

50

100

150

200 250 300 Horizontal pixels

350

400

450

500

FIGURE 1.18 The results from the NEMD/NHT computation on horizontal row 400 for components 5 to 6, which resulted from Figure 1.14. Even at the longer scales, an apparent influence of surface waves on the IR information can still be seen.

(a) Raw time-domain 8 signals for vehicle

(b) Raw time-domain 8 signals for missile

0.8

2 1.5

0.6

1 0.5 Amplitude

Amplitude

0.4 0.2 0

Vehicle class

–0.2

0 –0.5 –1

Missile class

–1.5 –2

–0.4 –2.5 –0.6

0

100

200

300

400

500

600

–3

700

Time (sec)

0

1000

2000

3000

4000

5000

6000

7000

8000

1400

1600

Time (sec)

(c) Raw time-domain 8 signals for artillery

(d) Raw time-domain 8 signals for rocket

1.8

0.7

1.6

0.6

1.4 0.5 Amplitude

Amplitude

1.2 1

Artillery class

0.8

0.4 0.3

0.6

Rocket class

0.2 0.4 0.1

0.2 0

0 0

50

100

150

200

250

300

350

0

200

400

Time (sec) (e) Raw time-domain 8 signals for jet

1000

1200

(f) Raw time-domain 8 signals for shuttle 1

Jet class

1.2 1

0.8

0.8

0.7

0.6 0.4

0.6 0.5

0.2

0.4

0

0.3

–0.2

0.2 0

500

1000 1500 2000 2500 3000

Time (sec)

FIGURE 3.7 Infrasound signals for six classes.

Shuttle class

0.9

Amplitude

Amplitude

800

Time (sec)

1.4

–0.4

600

3500 4000 4500 0.1 0

100

200

300

400

500

600

Time (sec)

700

800

900 1000

(a) Feature set for vehicle

10

9

9

8

8 Amplitude

Amplitude

10

7 6

7 6

5

5

4

4

3

5

10

15

20

25

3

30

(b) Feature set for missile

5

10

10

9

9

8

8

Amplitude

Amplitude

(c) Feature set for artillery fire 10

7 6

4

4 15

20

25

(d) Feature set for rocket

3

30

5

Feature number (e) Feature set for jet

10

9

9

8

8 Amplitude

Amplitude

10

30

6 5

10

25

7

5

5

20

Feature number

Feature number

3

15

7 6

10

15 20 25 Feature number

30

(f) Feature set for shuttle

7 6

5

5

4

4 3

3 5

10 15 20 25 Feature number

30

FIGURE 3.8 Infrasound signals for the six class different classes.

5

10 15 20 25 Feature number

30

FIGURE 3.9 Performance plot used to determine the optimal number components in the feature vector. Ill conditioning occurs for the feature number less than 10, and for the feature number greater than 60, the CCR dramatically declines.

FIGURE 9.2 (a) TOPSAR DEM and (b) SRTM DEM. The tick labels are pixel numbers. Note the difference in pixel spacing between the two DEMs. (c) Artifacts obtained by subtracting the SRTM DEM from the TOPSAR DEM. The flight direction and the radar look direction of the aircraft associated with the swath with the artifact are indicated with long and short arrows, respectively. Note that the artifacts appear in one entire TOPSAR swath, while they are not as serious in other swaths.

110W 40N

100W

90W

80W

244 242

238 30N

236 234

25N

232

Brightness temperature (K)

240

35N

230

20N

228 15N

226

(a) 110W 40N

100W

90W

80W

243 242.5 242 241.5

30N

241 240.5

25N

240 239.5

20N

Brightness temperature (K)

35N

239 238.5

15N (b)

238

FIGURE 12.13 NOAA-15 AMSU-A 54.4-GHz brightness temperatures for a northbound track on September 13, 2000. (a) Uncorrected and (b) limb and surface corrected.

92W

94W

90W

88W

260

34N

92W

94W

86W

90W

88W

86W

260 34N

240

240

220 32N

32N

220 200 30N

30N

200

180 (a)

(b) 92W

94W

34N

90W

88W

86W

0

–2

92W

94W

90W

88W

86W

0

–2

34N

–4 –4

32N

32N

–6 –6 30N

30N

–8

–8 (c)

(d)

FIGURE 12.14 Frontal system on September 13, 2000, 0130 UTC. (a) Brightness temperatures (K) near 183 + 7 GHz. (b) Brightness temperatures (K) near 183 + 3 GHz. (c) Brightness temperature perturbations (K) near 52.8 GHz. (d) Inferred 15-km-resolution brightness temperature perturbations (K) near 52.8 GHz.

94W

92W

90W

88W

86W

34N

50

94W

92 W

90W

88W

86W

50

40 34N

40

30

30 32N

32N

20

20

30N

10 30N

10

(a)

0

0

94W

34N

92W

90W

88W

86W

50

(b) 94W

92W

90W

88W

86W

50

40 34N

40

30

30 32N

32N

20

20

30N

10 30N

10

(c)

0

0

(d)

FIGURE 12.15 Precipitation rates (mm/h) above 0.5 mm/h observed on September 13, 2000, 0130 UTC. (a) 15-km-resolution NEXRAD retrievals, (b) 15-km-resolution AMSU retrievals, (c) 50-km-resolution NEXRAD retrievals, and (d) 50km-resolution AMSU retrievals.

N

Northern California Pacific Ocean

Wave Direction (306ⴗ)

Gualala River

Range

Study-Site Box (512 3 512)

Flight Direction (Azimuth) FIGURE 13.1 An L-band, VV-pol, AIRSAR image, of northern California coastal waters (Gualala River dataset), showing ocean waves propagating through a study-site box.

FIGURE 13.4 Orientation angle spectra versus wave number for azimuth direction waves propagating through the study site. The white rings correspond to 50 m, 100 m, 150 m, and 200 m. The dominant wave, of wavelength 157 m, is propagating at a heading of 3068.

45 40

Alpha Angle (degrees)

35 30 25 20 15 10 5 0 0

10

20

30

40

50

60

70

80

90

Incidence Angle (degrees) FIGURE 13.6 Small perturbation model dependence of alpha on the incidence angle. The red curve is for a dielectric constant representative of sea water (80–70j) and the blue curve is for a perfectly conducting surface.

0.9 0.8

Derivative of Alpha (f )

0.7 0.6 0.5 0.4 0.3 0.2 0.1 0

0

10

20 30 40 50 60 70 Incidence Angle (F ) (degrees)

80

90

FIGURE 13.7 Derivative of alpha with respect to the incidence angle. The red curve is for a sea water dielectric and the blue curve is for a perfectly conducting surface.

FIGURE 13.9 Spectrum of waves in the range direction using the alpha parameter from the Cloude–Pottier decomposition method. Wave direction is 3068 and dominant wavelength is 162 m.

FIGURE 13.19 (a and b) (a) Variations in anisotropy at low wind speeds for a filament of colder, trapped water along the northern California coast. The roughness changes are not seen in the conventional VV-pol image, but are clearly visible in (b) an anisotropy image. The data is from coastal waters near the Mendocino Co. town of Gualala.

Anisotropy - A Image

f = 23

f = 62 0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

L-band, vv-pol image FIGURE 13.20 (a) Image of anisotropy values. The quantity, 1-A, is proportional to small-scale surface roughness and (b) a conventional L-band VV-pol image of the study area.

FIGURE 13.21 Alpha-entropy scatter plot for the image study area. The plot is divided into eight color-coded scattering classes for the Cloude–Pottier decomposition described in Ref. [6].

FIGURE 13.22 Classification of the slick-field image into H/ scattering classes.

FIGURE 13.23 (a) L-band, HH-pol image of a second study image (CM6744) containing two strong spiral eddies marked by natural biogenic slicks and (b) classification of the slicks marking the spiral eddies. The image features were values combined with the Wishart classifier. classified into eight classes using the H–

FIGURE 13.24 14 scattering classes. The Classes 1–7 correspond to anisotropy Classification of the slick-field image into H/A/ A values 0.5 to 1.0 and the classes 8–14 correspond to anisotropy A values 0.0 to 0.49. The two lighter blue vertical features at the lower right of the image appear in all images involving anisotropy and are thought to be smooth slicks of lower concentration.

(a) DC Mall data

(b) Training map

(c) Test map

Roof (3834) Roof (5106) Street (416) Street (5068) Path (175) Path (1144) Grass (8545) Trees (5078)

Grass (1928) Trees (405)

Water (9157)

Water (1224)

Shadow (1191)

Shadow (97)

FIGURE 22.1 False color image of the DC Mall data set (generated using the bands 63, 52, and 36) and the corresponding ground-truth maps for training and testing. The number of pixels for each class are shown in parenthesis in the legend.

(b) Training map (a) Centre data

Water (824) Trees (820) Meadows (820) Self-Blocking Bricks (820) Bare Soil (820) Asphalt (816) Bitumen (816) Tiles (816) Shadow (816)

(c) Test map

Water (65971) Trees (7508) Meadows (3090) Self-Blocking Bricks (2685) Bare Soil (6584) Asphalt (9248) Bitumen (7287) Tiles (42826) Shadow (2863)

FIGURE 22.2 False color image of the Centre data set (generated using the bands 68, 30, and 2) and the corresponding groundtruth maps for training and testing. The number of pixels for each class are shown in parenthesis in the legend. (A missing vertical section in the middle was removed.)

(b) Training map (a) University data

Asphalt (548) Meadows (540) Gravel (392) Trees (524) (Painted) Metal shoots (265) Bare Soil (532) Bitumen (375) Self-Blockin Bricks (514) Shadow (231)

(c) Test map

Asphalt (6831) Meadows (18641) Gravel (2099) Trees (3064) (Painted) Metal shoots (1345) Bare Soil (5029) Bitumen (1330) Self-Blockin Bricks (3682) Shadow (947)

FIGURE 22.3 False color image of the University data set (generated using the bands 68, 30, and 2) and the corresponding ground-truth maps for training and testing. The number of pixels for each class are shown in parenthesis in the legend.

(a) A large connected region formed by merging pixels labeled as street in DC Mall data

(b) More compact sub-regions after splitting the region in (a)

(c) A large connected region formed by merging pixles labeled as tiled in Centre data

(d) More compact sub-regions after splitting the region in (c)

FIGURE 22.11 Examples for the region segmentation process. The iterative algorithm that uses mathematical morphology operators is used to split a large connected region into more compact subregions.

(a) Pixel level Bayesian

(b) Region level Bayesian

(c) Quadratic Gausian

FIGURE 22.12 Final classification maps with the Bayesian pixel and region-level classifiers and the quadratic Gaussian classifier for the DC Mall data set. Class color codes were listed in Figure 22.1.

(a) Pixel level Bayesian

(b) Region level Bayesian

(c) Quadratic Gaussian

FIGURE 22.13 Final classification maps with the Bayesian pixel and region-level classifiers and the quadratic Gaussian classifier for the Centre data set. Class color codes were listed in Figure 22.2.

(a) Pixel level Bayesian

(b) Region level Bayesian

(c) Quadratic Gaussian

FIGURE 22.14 Final classification maps with the Bayesian pixel and region-level classifiers and the quadratic Gaussian classifier for the University data set. Class color codes were listed in Figure 22.3.

FIGURE 23.1 Example of multi-sensor visualization of an oil spill in the Baltic Sea created by combining an ENVISAT ASAR image with a Radarsat SAR image taken a few hours later.

Decision-level fusion

Feature extraction

Classifier module

Statistical Consensus theory Neural nets Dempster-Shager Fusion module

Image data Sensor 1

Feature extraction

Classifier module

Image data Sensor p

Classified image

Feature-level fusion Feature extraction

Classifier module

Image data Sensor 1

Feature extraction

Statistical Neural nets Dempster-Shafer

Classified Image

Image data Sensor p

Pixel-level fusion

Classifier module

Image data Sensor 1

Multiband image data

Image data Sensor p FIGURE 23.2 An illustration of data fusion on different levels.

Statistical Neural nets Dempster-Shafer

Classified image

Sensor-specific neural net

P(w|x1) Multisensor fusion net Image data Sensor 1 Sensor-specific neural net P(w|xp)

Image data Sensor p Classified image FIGURE 23.4 Network architecture for decision-level fusion using neural networks.

(b) 240 220 200 180 60 40 20 00 80 60 40 20 0

Scale 600 400 200 0

0

50

100 150 Brightness

200

Greenness

Greenness

(a)

240 220 200 180 160 140 120 100 80 60 40 20 0

Scale 300 200 100 0

0

250

50

100

(c)

Greenness

150

Brightness

240 220 200 180 160 140 120 100 80 60 40 20 0

Scale 600 400 200 0

0

50

100 150 Brightness

200

250

FIGURE 24.13 Greenness versus brightness, (a) original multi-spectral, (b) HT fusion, (c) PCA fusion.

200

250

(a)

(b)

240

240

220 200

220 200

180 160

Scale

140

600 400

120

200

100

0

80

180

Greenness

Greenness

FIGURE 24.16 (a) Original p multi-spectral, (b)Result of ETMþ and Radarsat image fusion with HT (Gaussian window with ﬃﬃﬃ spread s ¼ 2 and window spacing d ¼ 4) (RGB composition 5–4–3).

600

140

400

120

200

100

0

80

60

60

40

40

20

20 0

0 0

(a)

Scale

160

50

100

150

Brightness

200

250

0

(b)

50

100

150

200

Brightness

FIGURE 24.17 Greenness versus brightness, (a) original multi-spectral, (b) LANDSAT–SAR fusion with HT.

250

(a)

(b)

(c)

FY smooth

FY medium deformation

FY deformed

Young ice

Open water

Nilas

(d)

FIGURE 25.9 MLP sea ice classification maps obtained using (a) ERS, (b) RADARSAT, (c) ERS and RADARSAT, (d) ERS, RADARSAT, and Meteor images. The classifiers’ parameters are given in Table 25.4.

(a)

(b)

(c)

FY smooth

FY medium deformation

FY deformed

Young ice

Open water

Nilas

(d)

Not classified

FIGURE 25.10 LDA sea ice classification maps obtained using: (a) ERS; (b) RADARSAT; (c) ERS and RADARSAT; and (d) ERS, RADARSAT, and Meteor images.

Water City Cultivation Factory Mountain

FIGURE 27.12 Ground-truth data for training.

Signal Processing for Remote Sensing

Read more

Remote Sensing Image Processing

Read more

Remote Sensing and Image Interpretation

Read more

Essential Image Processing and GIS for Remote Sensing

Read more

Essential Image Processing and GIS for Remote Sensing

Read more

Optical Remote Sensing: Advances in Signal Processing and Exploitation Techniques

Read more

Radar Remote Sensing of Urban Areas (Remote Sensing and Digital Image Processing, Volume 15)

Read more

Remote Sensing Image Analysis Including the Spatial Domain Remote Sensing and Digital Image Proc

Read more

Signal theory methods in multispectral remote sensing

Read more

Signal theory methods in multispectral remote sensing

Read more

Signal Processing for Image Enhancement and Multimedia Processing

Read more

Signal processing for image enhancement and multimedia processing

Read more

Remote Sensing Digital Image Analysis: An Introduction

Read more

Geoscience and Remote Sensing

Read more

Signal Processing Fundamentals and Applications for Communications and Sensing Systems

Read more

Signal Processing Fundamentals and Applications for Communications and Sensing Systems

Read more

Spatial Statistics for Remote Sensing

Read more

Spatial statistics for remote sensing

Read more

Satellite remote sensing for archaeology

Read more

Satellite remote sensing for archaeology

Read more

Satellite remote sensing for archaeology

Read more

Frontiers of remote sensing information processing

Read more

Satellite Remote Sensing for Archaeology

Read more

Two dimentional signal and image processing

Read more

Optimisation in Signal and Image Processing

Read more

Evolutionary image analysis and signal processing

Read more

Adaptive Blind Signal and Image Processing

Read more

Signal and Image Processing in Navigational Systems

Read more

Digital signal and image processing using MATLAB

Read more

Adaptive Blind Signal and Image Processing

Read more

Recommend Documents

Signal Processing for Remote Sensing

Signal Processing forRemote Sensing ß 2007 by Taylor & Francis Group, LLC. ß 2007 by Taylor & Francis Group, LLC. ...

Remote Sensing Image Processing

Remote Sensing Image Processing Synthesis Lectures on Image, Video, and Multimedia Processing Editor Alan C. Bovik, Un...

Remote Sensing and Image Interpretation

REMOTE SENSING AND IMAGE INTERPRETATION Fifth Edition Thomas M. Lillesand University of Wisconsin-Madison Ralph W. K...

Essential Image Processing and GIS for Remote Sensing

Essential Image Processing and GIS for Remote Sensing Jian Guo Liu Philippa J. Mason Imperial College London, UK Ess...

Essential Image Processing and GIS for Remote Sensing

Essential Image Processing and GIS for Remote Sensing Jian Guo Liu Philippa J. Mason Imperial College London, UK Ess...

Optical Remote Sensing: Advances in Signal Processing and Exploitation Techniques

Augmented Vision and Reality Series Editors Riad Hammoud, DynaVox Technologies, Pittsburgh, PA, USA Lawrence B. Wolff,...

Radar Remote Sensing of Urban Areas (Remote Sensing and Digital Image Processing, Volume 15)

Radar Remote Sensing of Urban Areas Remote Sensing and Digital Image Processing VOLUME 15 Series Editor: EARSel Ser...

Remote Sensing Image Analysis Including the Spatial Domain Remote Sensing and Digital Image Proc

Remote Sensing Image Analysis: Including the Spatial Domain Remote Sensing and Digital Image Processing VOLUME 5 Seri...

Signal theory methods in multispectral remote sensing

Signal theory methods in multispectral remote sensing

SIGNAL THEORY METHODS IN MULTISPECTRAL REMOTE SENSING WILEY SERIES IN REMOTE SENSING Jin Au Kong, Editor Asrar THE...