
Signal and Image Processing for Remote Sensing

Edited by C.H. Chen

Boca Raton London New York

CRC is an imprint of the Taylor & Francis Group, an informa business

Published in 2007 by CRC Press, Taylor & Francis Group, 6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742

© 2007 by Taylor & Francis Group, LLC. CRC Press is an imprint of Taylor & Francis Group. No claim to original U.S. Government works.

Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1

International Standard Book Number-10: 0-8493-5091-3 (Hardcover)
International Standard Book Number-13: 978-0-8493-5091-7 (Hardcover)
Library of Congress Card Number 2006009330

This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

No part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data
Signal and image processing for remote sensing / edited by C.H. Chen.
p. cm.
Includes bibliographical references and index.
ISBN 0-8493-5091-3 (978-0-8493-5091-7)
1. Remote sensing--Data processing. 2. Image processing. 3. Signal processing. I. Chen, C. H. (Chi-hau), 1937-
G70.4.S535 2006
621.36'78--dc22

2006009330

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com Taylor & Francis Group is the Academic Division of Informa plc.

and the CRC Press Web site at http://www.crcpress.com

Preface

Both signal processing and image processing are playing increasingly important roles in remote sensing. As most data from satellites are in image form, image processing has been used most often, whereas signal processing contributes significantly in extracting information from the remotely sensed waveforms or time series data. In contrast to other books in this field, which deal almost exclusively with image processing for remote sensing, this book provides a good balance between the roles of signal processing and image processing in remote sensing. It focuses mainly on methodologies of signal processing and image processing in remote sensing. An emphasis is placed on the mathematical techniques that, we believe, will not change as much as sensor, software, and hardware technologies. Furthermore, the term "remote sensing" is not limited to problems with data from satellite sensors. Other sensors that acquire data remotely are also considered. Another unique feature of this book is that it covers a broader scope of problems in remote sensing information processing than any other book in this area. The book is divided into two parts. Part I deals with signal processing for remote sensing and has 12 chapters. Part II deals with image processing for remote sensing and has 16 chapters. The chapters are written by leaders in the field. We are very fortunate to have Dr. Norden Huang, inventor of the Huang–Hilbert transform, along with Dr. Steven Long, write a chapter on the application of the normalized Hilbert transform to remote sensing problems, and to have Dr. Enders A. Robinson, who has made many major contributions to geophysical signal processing for over half a century, write a chapter on the basic problem of constructing seismic images by ray tracing. In Part I, following Chapter 1 by Drs.
Long and Huang, and my short Chapter 2 on the roles of statistical pattern recognition and statistical signal processing in remote sensing, we start from a very low end of the electromagnetic spectrum. Chapter 3 considers the classification of infrasound at a frequency range of 0.01 Hz to 10 Hz by using a parallel-bank neural network classifier and an 11-step feature selection process. The >90% correct classification rate is impressive for this kind of remote sensing data. Chapter 4 through Chapter 6 deal with seismic signal processing. Chapter 4 provides excellent physical insights on the steps for constructing digital seismic images. Even though the seismic image is an image, this chapter is placed in Part I as seismic signals start as waveforms. Chapter 5 considers the singular value decomposition of a matrix data set from scalar-sensor arrays, which is followed by an independent component analysis (ICA) step to relax the unjustified orthogonality constraint for the propagation vectors by imposing a stronger constraint of fourth-order independence of the estimated waves. With an initial focus on the use of ICA in seismic data and inspired by Dr. Robinson's lecture on seismic deconvolution at the 4th International Symposium, 2002, on "Computer Aided Seismic Analysis and Discrimination," Mr. Zhenhai Wang has examined approaches beyond ICA for improving seismic images. Chapter 6 is an effort to show that factor analysis, as an alternative to stacking, can play a useful role in removing some unwanted components in the data and thereby enhancing the subsurface structure as shown in the seismic images. Chapter 7, on Kalman filtering for improving detection of landmines using electromagnetic signals, which experience severe interference, is another remote sensing problem of growing interest in recent years. Chapter 8 is a representative time-series analysis problem on using meteorological and remote sensing indices to monitor vegetation moisture

dynamics. Chapter 9 actually deals with the image data for a digital elevation model but is placed in Part I mainly because the prediction error (PE) filter originates from geophysical signal processing. The PE filter allows us to interpolate the missing parts of an image. The only chapter that deals with sonar data is Chapter 10, which shows that a simple blind source separation algorithm based on second-order statistics can be very effective in removing reverberations in active sonar data. Chapter 11 and Chapter 12 are excellent examples of using neural networks for the retrieval of physical parameters from remote sensing data. Chapter 12 further provides a link between signal and image processing, as the principal component analysis and image sharpening tools employed are exactly what are needed in Part II. With a focus on image processing of remote sensing images, Part II begins with Chapter 13, which is concerned with the physics and mathematical algorithms for determining the ocean surface parameters from synthetic aperture radar (SAR) images. Mathematically, the Markov random field (MRF) is one of the most useful models for the rich contextual information in an image. Chapter 14 provides a comprehensive treatment of MRF-based remote sensing image classification. Besides an overview of previous work, the chapter describes the methodological issues involved and presents results of the application of the technique to the classification of real (both single-date and multitemporal) remote sensing images. Although there are many studies on using an ensemble of classifiers to improve the overall classification performance, the random forest machine learning method for classification of hyperspectral and multisource data as presented in Chapter 15 is an excellent example of using new statistical approaches for improved classification with remote sensing data.
Chapter 16 presents another machine learning method, AdaBoost, to obtain a robustness property in the classifier. The chapter further considers the relations among the contextual classifier, MRF-based methods, and spatial boosting. The following two chapters are concerned with different aspects of the change detection problem. Change detection is a uniquely important problem in remote sensing, as the images acquired at different times over the same geographical area can be used in the areas of environmental monitoring, damage management, and so on. After discussing change detection methods for multitemporal SAR images, Chapter 17 examines an adaptive scale-driven technique for change detection in medium-resolution SAR data. Chapter 18 evaluates a Wiener filter–based method, Mahalanobis distance, and subspace projection methods of change detection, as well as a higher order, nonlinear mapping using a kernel-based machine learning concept to improve the change detection performance as represented by receiver operating characteristic (ROC) curves. In recent years, ICA and related approaches have presented a new potential in remote sensing information processing. A challenging task underlying many hyperspectral imagery applications is decomposing a mixed pixel into a collection of reflectance spectra, called endmember signatures, and the corresponding abundance fractions. Chapter 19 presents a new method for unsupervised endmember extraction called vertex component analysis (VCA). The VCA algorithms presented have performance comparable to or better than two other techniques, at lower computational complexity. Other useful ICA applications in remote sensing include feature extraction and speckle reduction of SAR images. Chapter 20 presents two different methods of SAR image speckle reduction using ICA, both making use of the FastICA algorithm.
In two-dimensional time series modeling, Chapter 21 makes use of a fractionally integrated autoregressive moving average (FARIMA) analysis to model the mean radial power spectral density of the sea SAR imagery. Long-range dependence models are used in addition to the fractional sea surface models for the simulation of the sea SAR image spectra at different sea states, with and without oil slicks at low computational cost.

Returning to the image classification problem, Chapter 22 deals with the topics of pixel classification using a Bayes classifier, region segmentation guided by morphology and the split-and-merge algorithm, region feature extraction, and region classification. Chapter 23 provides a tutorial presentation of different issues of data fusion for remote sensing applications. Data fusion can improve classification, and four multisensor classifiers are presented for the decision-level fusion strategies. Beyond the currently popular transform techniques, Chapter 24 demonstrates that the Hermite transform can be very useful for noise reduction and image fusion in remote sensing. The Hermite transform is an image representation model that mimics some of the important properties of human visual perception, namely, local orientation analysis and the Gaussian derivative model of early vision. Chapter 25 also demonstrates the importance of image fusion in improving sea ice classification performance, using a backpropagation-trained neural network, linear discriminant analysis, and texture features. Chapter 26 is on the issue of accuracy assessment, for which the Bradley–Terry model is adopted. Chapter 27 is on land map classification using a support vector machine, which has been increasingly popular as an effective classifier. The land map classification classifies the surface of the Earth into categories such as water area, forests, factories, or cities. Finally, with lossless data compression in mind, Chapter 28 focuses on an information-theoretic measure of the quality of multiband remotely sensed digital images. The procedure relies on the estimation of parameters of the noise model. Results on image sequences acquired by AVIRIS and ASTER imaging sensors offer an estimation of the information contents of each spectral band.
With rapid technological advances in both sensor and processing technologies, a book of this nature can only capture a certain amount of current progress and results. However, if past experience offers any indication, the numerous mathematical techniques presented will give this volume long-lasting value. The sister volumes of this book are two other books I edited: Information Processing for Remote Sensing and Frontiers of Remote Sensing Information Processing, published by World Scientific in 1999 and 2003, respectively. I am grateful to all the contributors of this volume for their important contributions and, in particular, to Drs. J.S. Lee, S. Serpico, L. Bruzzone, and S. Omatu for chapter contributions to all three volumes. Readers are advised to go over all three volumes for more complete information on signal and image processing for remote sensing.

C.H. Chen

Editor

Chi Hau Chen was born on December 22, 1937. He received his Ph.D. in electrical engineering from Purdue University in 1965, his M.S.E.E. degree from the University of Tennessee, Knoxville, in 1962, and his B.S.E.E. degree from the National Taiwan University in 1959. He is currently Chancellor Professor of electrical and computer engineering at the University of Massachusetts, Dartmouth, where he has taught since 1968. His research areas are in statistical pattern recognition and signal/image processing with applications to remote sensing, geophysical, underwater acoustics, and nondestructive testing problems, as well as computer vision for video surveillance, time series analysis, and neural networks. Dr. Chen has published 23 books in his area of research. He is the editor of Digital Waveform Processing and Recognition (CRC Press, 1982) and Signal Processing Handbook (Marcel Dekker, 1988). He is the chief editor of Handbook of Pattern Recognition and Computer Vision, volumes 1, 2, and 3 (World Scientific Publishing, 1993, 1999, and 2005, respectively). He is the editor of Fuzzy Logic and Neural Network Handbook (McGraw-Hill, 1996). In the area of remote sensing, he is the editor of Information Processing for Remote Sensing and Frontiers of Remote Sensing Information Processing (World Scientific Publishing, 1999 and 2003, respectively). He served as an associate editor of the IEEE Transactions on Acoustics, Speech and Signal Processing for 4 years and of the IEEE Transactions on Geoscience and Remote Sensing for 15 years, and since 1986 he has been an associate editor of the International Journal of Pattern Recognition and Artificial Intelligence. Dr. Chen has been a fellow of the Institute of Electrical and Electronics Engineers (IEEE) since 1988, a life fellow of the IEEE since 2003, and a fellow of the International Association for Pattern Recognition (IAPR) since 1996.

Contributors

J. van Aardt Group of Geomatics Engineering, Department of Biosystems, Katholieke Universiteit Leuven, Leuven, Belgium Ranjan Acharyya Florida Institute of Technology, Melbourne, Florida Bruno Aiazzi Institute of Applied Physics, National Research Council, Florence, Italy Selim Aksoy Bilkent University, Ankara, Turkey V.Yu. Alexandrov Nansen International Environmental and Remote Sensing Center, St. Petersburg, Russia Luciano Alparone Department of Electronics and Telecommunications, University of Florence, Florence, Italy Stefano Baronti Institute of Applied Physics, National Research Council, Florence, Italy Jon Atli Benediktsson Department of Electrical and Computer Engineering, University of Iceland, Reykjavik, Iceland Fabrizio Berizzi Department of Information Engineering, University of Pisa, Pisa, Italy Massimo Bertacca ISL-ALTRAN, Analysis and Simulation Group—Radar Systems Analysis and Signal Processing, Pisa, Italy Nicolas Le Bihan Laboratory of Images and Signals, CNRS UMR 5083, INP of Grenoble, France William J. Blackwell Lincoln Laboratory, Massachusetts Institute of Technology, Lexington, Massachusetts L.P. Bobylev Nansen International Environmental and Remote Sensing Center, St. Petersburg, Russia A.V. Bogdanov Institut für Neuroinformatik, Bochum, Germany Francesca Bovolo Department of Information and Communication Technology, University of Trento, Trento, Italy Lorenzo Bruzzone Department of Information and Communication Technology, University of Trento, Trento, Italy

Chi Hau Chen Department of Electrical and Computer Engineering, University of Massachusetts Dartmouth, North Dartmouth, Massachusetts Frederick W. Chen Lincoln Laboratory, Massachusetts Institute of Technology, Lexington, Massachusetts Salim Chitroub Signal and Image Processing Laboratory, Department of Telecommunication, Algiers, Algeria Leslie M. Collins Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina Fengyu Cong State Key Laboratory of Vibration, Shock & Noise, Shanghai Jiaotong University, Shanghai, China P. Coppin Group of Geomatics Engineering, Department of Biosystems, Katholieke Universiteit Leuven, Leuven, Belgium José M.B. Dias Department of Electrical and Computer Engineering, Instituto Superior Técnico, Av. Rovisco Pais, Lisbon, Portugal Shinto Eguchi Institute of Statistical Mathematics, Tokyo, Japan Boris Escalante-Ramírez School of Engineering, National Autonomous University of Mexico, Mexico City, Mexico Toru Fujinaka Osaka Prefecture University, Osaka, Japan

Gerrit Gort Department of Biometris, Wageningen University, The Netherlands Fredric M. Ham Florida Institute of Technology, Melbourne, Florida Norden E. Huang NASA Goddard Space Flight Center, Greenbelt, Maryland

Shaoling Ji State Key Laboratory of Vibration, Shock & Noise, Shanghai Jiaotong University, Shanghai, China Peng Jia State Key Laboratory of Vibration, Shock & Noise, Shanghai Jiaotong University, Shanghai, China Sveinn R. Joelsson Department of Electrical and Computer Engineering, University of Iceland, Reykjavik, Iceland O.M. Johannessen Nansen Environmental and Remote Sensing Center, Bergen, Norway I. Jonckheere Group of Geomatics Engineering, Department of Biosystems, Katholieke Universiteit Leuven, Leuven, Belgium

P. Jönsson Teachers Education, Malmö University, Sweden

Dayalan Kasilingam Department of Electrical and Computer Engineering, University of Massachusetts Dartmouth, North Dartmouth, Massachusetts Heesung Kwon U.S. Army Research Laboratory, Adelphi, Maryland Jong-Sen Lee Remote Sensing Division, Naval Research Laboratory, Washington, D.C. S. Lhermitte Group of Geomatics Engineering, Department of Biosystems, Katholieke Universiteit Leuven, Leuven, Belgium Steven R. Long NASA Goddard Space Flight Center, Wallops Island, Virginia

Alejandra A. López-Caloca Center for Geography and Geomatics Research, Mexico City, Mexico Arko Lucieer Centre for Spatial Information Science (CenSIS), University of Tasmania, Australia Jérôme I. Mars Laboratory of Images and Signals, CNRS UMR 5083, INP of Grenoble, France Enzo Dalle Mese Department of Information Engineering, University of Pisa, Pisa, Italy Gabriele Moser Department of Biophysical and Electronic Engineering, University of Genoa, Genoa, Italy José M.P. Nascimento Instituto Superior de Engenharia de Lisboa, Lisbon, Portugal Nasser Nasrabadi U.S. Army Research Laboratory, Adelphi, Maryland Ryuei Nishii Faculty of Mathematics, Kyushu University, Fukuoka, Japan Sigeru Omatu Osaka Prefecture University, Osaka, Japan Enders A. Robinson Columbia University, Newburyport, Massachusetts S. Sandven Nansen Environmental and Remote Sensing Center, Bergen, Norway

Dale L. Schuler Remote Sensing Division, Naval Research Laboratory, Washington, D.C.

Massimo Selva Institute of Applied Physics, National Research Council, Florence, Italy

Sebastiano B. Serpico Department of Biophysical and Electronic Engineering, University of Genoa, Genoa, Italy Xizhi Shi State Key Laboratory of Vibration, Shock & Noise, Shanghai Jiaotong University, Shanghai, China Anne H.S. Solberg Department of Informatics, University of Oslo and Norwegian Computing Center, Oslo, Norway Alfred Stein International Institute for Geo-Information Science and Earth Observation, Enschede, The Netherlands Johannes R. Sveinsson Department of Electrical and Computer Engineering, University of Iceland, Reykjavik, Iceland Yingyi Tan Applied Research Associates, Inc., Raleigh, North Carolina

Stacy L. Tantum Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina Maria Tates U.S. Army Research Laboratory, Adelphi, Maryland, and Morgan State University, Baltimore, Maryland J. Verbesselt Group of Geomatics Engineering, Department of Biosystems, Katholieke Universiteit Leuven, Leuven, Belgium Valeriu Vrabie CReSTIC, University of Reims, Reims, France Xianju Wang Department of Electrical and Computer Engineering, University of Massachusetts Dartmouth, North Dartmouth, Massachusetts Zhenhai Wang Department of Electrical and Computer Engineering, University of Massachusetts Dartmouth, North Dartmouth, Massachusetts Carl White Morgan State University, Baltimore, Maryland Michifumi Yoshioka Osaka Prefecture University, Osaka, Japan Sang-Ho Yun Department of Geophysics, Stanford University, Stanford, California Howard Zebker Department of Geophysics, Stanford University, Stanford, California

Contents

Part I

Signal Processing for Remote Sensing

1. On the Normalized Hilbert Transform and Its Applications in Remote Sensing .......... 3
   Steven R. Long and Norden E. Huang

2. Statistical Pattern Recognition and Signal Processing in Remote Sensing .......... 25
   Chi Hau Chen

3. A Universal Neural Network–Based Infrasound Event Classifier .......... 33
   Fredric M. Ham and Ranjan Acharyya

4. Construction of Seismic Images by Ray Tracing .......... 55
   Enders A. Robinson

5. Multi-Dimensional Seismic Data Decomposition by Higher Order SVD and Unimodal ICA .......... 75
   Nicolas Le Bihan, Valeriu Vrabie, and Jérôme I. Mars

6. Application of Factor Analysis in Seismic Profiling .......... 103
   Zhenhai Wang and Chi Hau Chen

7. Kalman Filtering for Weak Signal Detection in Remote Sensing .......... 129
   Stacy L. Tantum, Yingyi Tan, and Leslie M. Collins

8. Relating Time-Series of Meteorological and Remote Sensing Indices to Monitor Vegetation Moisture Dynamics .......... 153
   J. Verbesselt, P. Jönsson, S. Lhermitte, I. Jonckheere, J. van Aardt, and P. Coppin

9. Use of a Prediction-Error Filter in Merging High- and Low-Resolution Images .......... 173
   Sang-Ho Yun and Howard Zebker

10. Blind Separation of Convolutive Mixtures for Canceling Active Sonar Reverberation .......... 189
    Fengyu Cong, Chi Hau Chen, Shaoling Ji, Peng Jia, and Xizhi Shi

11. Neural Network Retrievals of Atmospheric Temperature and Moisture Profiles from High-Resolution Infrared and Microwave Sounding Data .......... 205
    William J. Blackwell

12. Satellite-Based Precipitation Retrieval Using Neural Networks, Principal Component Analysis, and Image Sharpening .......... 233
    Frederick W. Chen

Part II

Image Processing for Remote Sensing

13. Polarimetric SAR Techniques for Remote Sensing of the Ocean Surface .......... 267
    Dale L. Schuler, Jong-Sen Lee, and Dayalan Kasilingam

14. MRF-Based Remote-Sensing Image Classification with Automatic Model Parameter Estimation .......... 305
    Sebastiano B. Serpico and Gabriele Moser

15. Random Forest Classification of Remote Sensing Data .......... 327
    Sveinn R. Joelsson, Jon A. Benediktsson, and Johannes R. Sveinsson

16. Supervised Image Classification of Multi-Spectral Images Based on Statistical Machine Learning .......... 345
    Ryuei Nishii and Shinto Eguchi

17. Unsupervised Change Detection in Multi-Temporal SAR Images .......... 373
    Lorenzo Bruzzone and Francesca Bovolo

18. Change-Detection Methods for Location of Mines in SAR Imagery .......... 401
    Maria Tates, Nasser Nasrabadi, Heesung Kwon, and Carl White

19. Vertex Component Analysis: A Geometric-Based Approach to Unmix Hyperspectral Data .......... 415
    José M.B. Dias and José M.P. Nascimento

20. Two ICA Approaches for SAR Image Enhancement .......... 441
    Chi Hau Chen, Xianju Wang, and Salim Chitroub

21. Long-Range Dependence Models for the Analysis and Discrimination of Sea-Surface Anomalies in Sea SAR Imagery .......... 455
    Massimo Bertacca, Fabrizio Berizzi, and Enzo Dalle Mese

22. Spatial Techniques for Image Classification .......... 491
    Selim Aksoy

23. Data Fusion for Remote-Sensing Applications .......... 515
    Anne H.S. Solberg

24. The Hermite Transform: An Efficient Tool for Noise Reduction and Image Fusion in Remote-Sensing .......... 539
    Boris Escalante-Ramírez and Alejandra A. López-Caloca

25. Multi-Sensor Approach to Automated Classification of Sea Ice Image Data .......... 559
    A.V. Bogdanov, S. Sandven, O.M. Johannessen, V.Yu. Alexandrov, and L.P. Bobylev

26. Use of the Bradley–Terry Model to Assess Uncertainty in an Error Matrix from a Hierarchical Segmentation of an ASTER Image .......... 591
    Alfred Stein, Gerrit Gort, and Arko Lucieer

27. SAR Image Classification by Support Vector Machine .......... 607
    Michifumi Yoshioka, Toru Fujinaka, and Sigeru Omatu

28. Quality Assessment of Remote-Sensing Multi-Band Optical Images .......... 621
    Bruno Aiazzi, Luciano Alparone, Stefano Baronti, and Massimo Selva

Index .......... 643

Part I

Signal Processing for Remote Sensing

1 On the Normalized Hilbert Transform and Its Applications in Remote Sensing

Steven R. Long and Norden E. Huang

CONTENTS
1.1 Introduction .......... 3
1.2 Review of Processing Advances .......... 4
    1.2.1 The Normalized Empirical Mode Decomposition .......... 4
    1.2.2 Amplitude and Frequency Representations .......... 6
    1.2.3 Instantaneous Frequency .......... 9
1.3 Application to Image Analysis in Remote Sensing .......... 12
    1.3.1 The IR Digital Camera and Setup .......... 13
    1.3.2 Experimental IR Images of Surface Processes .......... 13
    1.3.3 Volume Computations and Isosurfaces .......... 18
1.4 Conclusion .......... 21
Acknowledgment .......... 22
References .......... 22

1.1 Introduction

The development of this new approach was motivated by the need to describe nonlinear distorted waves in detail, along with the variations of these signals that occur naturally in nonstationary processes (e.g., ocean waves). As has been often noted, natural physical processes are mostly nonlinear and nonstationary. Yet, there have historically been very few options in the available analysis methods to examine data from such nonlinear and nonstationary processes. The available methods have usually been for either linear but nonstationary, or nonlinear but stationary, and statistically deterministic processes. The need to examine data from nonlinear, nonstationary, and stochastic processes in the natural world arises because nonlinear processes require special treatment. The past approach of imposing a linear structure (by assumptions) on a nonlinear system is not adequate. Other than periodicity, the detailed dynamics in the processes also need to be determined from the data. This is needed because one of the typical characteristics of nonlinear processes is intrawave frequency modulation (FM), in which the instantaneous frequency (IF) changes within one oscillation cycle. In the past, when the analysis was dependent on linear Fourier analysis, there was no means of depicting the frequency changes within one wavelength (the intrawave frequency variation) except by resorting to the concept of harmonics. The term "bound harmonics" was often used in this connection. Thus, the distortions of any nonlinear waveform have often been referred to as "harmonic distortions." The concept of harmonic distortion is a mathematical artifact resulting from imposing a linear structure (through assumptions) on a nonlinear system. The harmonic distortions may thus have mathematical meaning, but there is no physical meaning associated with them, as discussed by Huang et al. [1,2]. For example, in the case of water waves, such harmonic components do not have any of the real physical characteristics of a water wave as it occurs in nature. The physically meaningful way to describe such data should be in terms of its IF, which will reveal the intrawave FMs occurring naturally.

It is reasonable to suggest that any such complicated data should consist of numerous superimposed modes. Therefore, to define only one IF value for any given time is not meaningful (see Ref. [3] for comments on the Wigner–Ville distribution). To fully consider the effects of multicomponent data, a decomposition method should be used to separate the naturally combined components completely and nearly orthogonally. In the case of nonlinear data, the orthogonality condition would need to be relaxed, as discussed by Huang et al. [1]. Initially, Huang et al. [1] proposed the empirical mode decomposition (EMD) approach to produce intrinsic mode functions (IMF), which are both monocomponent and symmetric. This was an important step toward making the application truly practical. With the EMD satisfactorily determined, an important roadblock to truly nonlinear and nonstationary analysis was finally removed. However, the difficulties resulting from the limitations stated by the Bedrosian [4] and Nuttall [5] theorems must also be addressed in connection with this approach. Both limitations have firm theoretical foundations and must be considered; IMFs satisfy only the necessary condition, but not the sufficient condition.
To improve on the processing proposed by Huang et al. [1], the normalized empirical mode decomposition (NEMD) method was developed as a refinement of the earlier methods.
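The sifting idea behind EMD, referenced above, can be illustrated compactly. The sketch below is a minimal toy version, not the authors' algorithm: one sifting pass subtracts the mean of the upper and lower cubic-spline envelopes, and repeating the pass a fixed number of times stands in for a proper stopping criterion. The function names, test signal, and sift count are our own assumptions.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_once(x, t):
    """One sifting pass: subtract the mean of the two spline envelopes."""
    mx = argrelextrema(x, np.greater)[0]
    mn = argrelextrema(x, np.less)[0]
    if len(mx) < 2 or len(mn) < 2:
        return None  # too few extrema to build envelopes
    upper = CubicSpline(t[mx], x[mx])(t)   # envelope through the maxima
    lower = CubicSpline(t[mn], x[mn])(t)   # envelope through the minima
    return x - 0.5 * (upper + lower)

def extract_imf(x, t, n_sifts=10):
    """A fixed number of sifting passes stands in for a stopping criterion."""
    h = x.copy()
    for _ in range(n_sifts):
        h_new = sift_once(h, t)
        if h_new is None:
            break
        h = h_new
    return h

# Two-tone test signal: a fast oscillation riding on a slow one
t = np.linspace(0.0, 1.0, 1000)
x = np.cos(2 * np.pi * 25 * t) + 0.5 * np.cos(2 * np.pi * 3 * t)
imf1 = extract_imf(x, t)
```

For this two-tone test signal, the first IMF extracted this way approximates the faster oscillation away from the record ends, with the slower one left in the remainder x - imf1.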

1.2 Review of Processing Advances

1.2.1 The Normalized Empirical Mode Decomposition

The NEMD method was developed to satisfy the specific limitations set by the Bedrosian theorem while also providing a sharper measure of the local error when the quadrature differs from the Hilbert transform (HT) result. From an example data set of a natural process, all the local maxima of the data are first determined. These local maxima are then connected with a cubic spline curve, which gives the local amplitude of the data, A(t), as shown in Figure 1.1. The envelope obtained through spline fitting is used to normalize the data by

y(t) = a(t) cos θ(t)/A(t) = [a(t)/A(t)] cos θ(t).   (1.1)

Here A(t) represents the cubic spline fit of all the maxima from the example data, and thus a(t)/A(t) should normalize y(t), with all maxima then normalized to unity, as shown in Figure 1.2. As is apparent from Figure 1.2, a small number of the normalized data points can still have an amplitude in excess of unity. This is because the cubic spline passes through the maxima only, so that at locations where the amplitudes are changing rapidly, the line representing the envelope spline can pass under some of the data points. These occasional misses are unavoidable, yet the normalization scheme has effectively separated the amplitude from the carrier oscillation. The IF can then be computed from this normalized carrier function y(t). Owing to the nearly uniform amplitude, the limitations set by the Bedrosian theorem are effectively satisfied. The IF computed in this way from the normalized data of Figure 1.2 is shown in Figure 1.3, together with the original example data.

On the Normalized Hilbert Transform and Its Applications in Remote Sensing

[Figure: ''Example data and splined envelope'' (magnitude vs. time in seconds).]

FIGURE 1.1 The best possible cubic spline fit to the local maxima of the example data. The spline fit forms an envelope as an important first step in the process. Note also how the frequency can change within a wavelength, and that the oscillations can occur in groups.

With the Bedrosian theorem addressed, what of the limitations set by the Nuttall theorem? If the HT can be considered to be the quadrature, then the absolute value of the HT performed on the perfectly normalized example data should be unity. Any deviation of the absolute value of the HT from unity would then indicate a difference between the quadrature and the HT results. An error index can thus be defined simply as

E(t) = [abs(Hilbert transform(y(t))) - 1]^2.   (1.2)

This error index is not only an energy measure, as given in the Nuttall theorem, but also a function of time, as shown in Figure 1.4. Therefore, it gives a local measure of the error resulting from the IF computation. This local measure of error is both logically and practically superior to the integrated error bound established by the Nuttall theorem. If the quadrature and the HT results are identical, then it follows that the error should be zero. Based on experience with various natural data sets, the majority of the errors encountered here result from two sources. The first source is imperfect normalization at locations close to rapidly changing amplitudes, where the envelope spline-fitting is unable to turn sharply or quickly enough to cover all the data points. This type of error is even more pronounced when the amplitude is also locally small, thus amplifying any errors; the error index from this condition can be extremely large. The second source is nonlinear waveform distortion, which causes corresponding variations of the phase function θ(t). As discussed by Huang et al. [1], when the phase function is not an elementary function, the differentiation of the phase determined by the HT is not identical to that determined by the quadrature. The error index from this condition is usually small (see Ref. [6]). Overall, the NEMD method gives a more consistent, stable IF. The occasionally large error index values indicate where the method failed, simply because the spline misses and cuts through the data momentarily. All such locations occur at minimum amplitude, with a resulting negligible energy density.

[Figure: ''Example data and normalized carrier'' (magnitude vs. time in seconds).]

FIGURE 1.2 Normalized example data of Figure 1.1 with the cubic spline envelope. The occasional value beyond unity is due to the spline fit slightly missing the maxima at those locations.
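As a concrete sketch of the normalization and error index just described, the fragment below spline-fits the maxima of the absolute value of a synthetic AM signal to estimate A(t), normalizes by it as in Equation 1.1, and evaluates Equation 1.2 with scipy's Hilbert transform. The test signal, the use of |x| rather than the raw data for the maxima, and the edge trimming are illustrative choices, not the chapter's implementation.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema, hilbert

# Synthetic AM signal: slowly varying envelope a(t) on a fast carrier
t = np.linspace(0.0, 1.0, 4000)
a = 1.0 + 0.3 * np.sin(2 * np.pi * 2 * t)   # slowly varying amplitude
x = a * np.cos(2 * np.pi * 50 * t)          # the "data"

# Cubic spline through the maxima of |x| estimates the envelope A(t)
idx = argrelextrema(np.abs(x), np.greater)[0]
A = CubicSpline(t[idx], np.abs(x)[idx])(t)

# Eq. (1.1): normalized carrier; Eq. (1.2): error index
y = x / A
E = (np.abs(hilbert(y)) - 1.0) ** 2
```

Away from the record ends, E(t) stays small for this well-separated AM signal; spikes in E(t) would flag the places where the spline envelope missed the data, as described above.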

1.2.2 Amplitude and Frequency Representations

In the initial methods [1,2,6], the main result of Hilbert spectral analysis (HSA) always emphasized the FM. In the original methods, the data were first decomposed into IMFs, as defined in the initial work. Then, through the HT, the IF and the amplitude of each IMF were computed to form the Hilbert spectrum. This continues to be the method, especially when the data are normalized; the information on the amplitude or envelope variation is not examined. In the NEMD and HSA approach, it is justifiable not to pay too much attention to the amplitude variations, because if there is mode mixing, the amplitude variation from such mixed-mode IMFs does not reveal any true underlying physical process. However, there are cases when the envelope variation does contain critical information; an example is when there is no mode mixing in any given IMF and a beating signal representing the sum of two coexisting sinusoids is encountered.

[Figure: ''Example data and instantaneous frequency'' (magnitude and frequency vs. time in seconds).]

FIGURE 1.3 The instantaneous frequency determined from the normalized carrier function, shown with the example data. The data vary about zero, and the instantaneous frequency varies about the horizontal 0.5 value.

In an earlier paper, Huang et al. [1] attempted to extract individual components out of the sum of two linear trigonometric functions such as

x(t) = cos at + cos bt.   (1.3)

Two seemingly separate components were recovered after more than 3000 sifting steps. Yet the IMFs obtained were no longer purely trigonometric functions, and there were obvious aliases in the resulting IMF components as well as in the residue. The approach proposed then was unnecessary and unsatisfactory. The problem, in fact, has a much simpler solution: treat the envelope as an amplitude modulation (AM), and then process just the envelope data. The function x(t), as given in Equation 1.3, can then be rewritten as

x(t) = cos at + cos bt = 2 cos[(a + b)t/2] cos[(a - b)t/2].   (1.4)

[Figure: ''Offset example data and error measures'' (error index, quadrature, and offset data vs. time in seconds).]

FIGURE 1.4 The error index as it changes with the data location in time. The original example data, offset by 0.3 vertically for clarity, are also shown. The quadrature result is not visible on this scale.
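The identity in Equation 1.4, and the rectified-cosine envelope it implies, can be checked numerically; the frequencies below are arbitrary illustrative values.

```python
import numpy as np

t = np.linspace(0.0, 10.0, 5000)
a, b = 2 * np.pi * 5.0, 2 * np.pi * 4.0   # two nearby angular frequencies

lhs = np.cos(a * t) + np.cos(b * t)                           # Eq. (1.3)
rhs = 2 * np.cos((a + b) / 2 * t) * np.cos((a - b) / 2 * t)   # Eq. (1.4)

# The two forms are the same signal: a carrier at (a + b)/2 with a
# slowly varying, rectified-cosine envelope |2 cos((a - b)t/2)|
envelope = np.abs(2 * np.cos((a - b) / 2 * t))
```

The arrays lhs and rhs agree to floating-point precision, which is the sense in which the sum of components and the carrier-and-envelope form carry identical information.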

There is no difference between the sum of the individual components and the modulating envelope form; they are trigonometric identities. If both the frequency of the carrier wave, (a + b)/2, and the frequency of the envelope, (a - b)/2, can be obtained, then all the information in the signal can be extracted. This indicates the reason to look for a new approach to extracting additional information from the envelope. In this example, however, the envelope becomes a rectified cosine wave, and its frequency would be easier to determine from simple period counting than from the Hilbert spectral result. For the more general case when the amplitudes of the two sinusoidal functions are not equal, the modulation is no longer simple. For even more complicated cases, when there are more than two coexisting sinusoidal components with different amplitudes and frequencies, there is no general expression for the envelope and carrier, and the final result could be represented as more than one frequency-modulated band in the Hilbert spectrum. It is then impossible to describe the individual components in this situation. In such cases, representing the signal as a carrier and envelope variation should still be meaningful, for

the dual representations of frequency arise from the different definitions of frequency. The Hilbert-inspired view of amplitude and FM still renders a correct representation of the signal, but this view is very different from that of Fourier analysis. In such cases, if one is sure of the stationarity and regularity of the signal, Fourier analysis could be used, which will give more familiar results, as suggested by Huang et al. [1]. The judgment in these cases is not about which one is correct, as both are correct; rather, it is about which one is more familiar and more revealing. When more complicated data are present, such as radar returns, tsunami wave records, earthquake data, or speech signals (representing a frequency ''chirp''), the amplitude variation information can be found by processing the envelope and treating the data as an approximate carrier. When the envelope of frequency chirp data, such as the example given in Figure 1.5, is decomposed through the NEMD process, the IMF components are obtained as shown in Figure 1.6. Using these components (or IMFs), the Hilbert spectrum can be constructed as given in Figure 1.7, together with its FM counterpart. The physical meaning of the AM spectrum is not as clearly defined in this case; however, it serves to illustrate the AM contribution to the variability of the local frequency.

1.2.3 Instantaneous Frequency

[Figure: ''Example of frequency chirp data'' (magnitude vs. time in seconds).]

FIGURE 1.5 A typical example of complex natural data, illustrating the concept of frequency ''chirps.''

[Figure: ''Components of frequency chirp data'' (offset IMF components C1 to C8 vs. time).]

FIGURE 1.6 The eight IMF components obtained by processing the frequency chirp data of Figure 1.5, offset vertically from C1 (top) to C8 (bottom).

It must be emphasized that IF is a very different concept from the frequency content of the data derived from Fourier-based methods, as discussed in great detail by Huang

et al. [1]. The IF, as discussed here, is based on the instantaneous variation of the phase function from the HT of a data-adaptive decomposition, while the frequency content in the Fourier approach is an averaged frequency based on the convolution of the data with an a priori basis. Therefore, whenever the basis changes, the frequency content also changes; similarly, when the decomposition changes, the IF also has to change. However, there are still persistent and common misconceptions about the IF computed in this manner. One of the most prevalent is that, for any data with a discrete line spectrum, IF can be a continuous function. A variation of this misconception is that IF can give frequency values that are not one of the discrete spectral lines. This dilemma can be resolved easily. In the nonlinear cases, where the IF approach treats the harmonic distortions as continuous intrawave FM, the Fourier-based methods treat the frequency content as discrete harmonic spectral lines. In the case of two or more beating waves, the IF approach treats the data as AM and FM modulations, while the Fourier method treats each constituent wave as a discrete spectral line, if the process is stationary. Although they appear perplexingly different, they represent the same data.

Another misconception concerns negative IF values. According to Gabor's [7] approach, the HT is implemented through two Fourier transforms: the first transforms the data into frequency space, while the second performs an inverse Fourier transform after discarding all the negative frequency parts [3]. According to this argument, all the negative frequency content has been discarded, which then raises the question, how can there still be negative frequency values? This question arises from a misunderstanding of the nature of negative IF from the HT. The direct cause of negative frequency in the HT is multiple extrema between two zero-crossings: there are then local loops not centered at the origin of the coordinate system, as discussed by Huang et al. [1]. Negative frequency can also occur even if there are no multiple extrema; for example, when there are large amplitude fluctuations that cause the Hilbert-transformed phase loop to miss the origin. The negative IF values therefore do not arise from the frequency content discarded in Gabor's [7] implementation of the HT. Both these causes are removed by the NEMD and normalized Hilbert transform (NHT) methods presented here.

[Figure: ''FM and AM Hilbert spectra'' (frequency in Hz vs. time in seconds).]

FIGURE 1.7 The AM and FM Hilbert spectral results from the frequency chirp data of Figure 1.5.

The latest versions of these methods (NEMD/NHT) consistently give more stable IF values. They satisfy the limitation set by the Bedrosian theorem and offer a local measure of error sharper than the Nuttall theorem's bound. Note that in the initial spline fitting of the amplitude in the NEMD approach, the end effects again become important. The method used here simply assigns the end points a maximum equal to the very last value. Other improvements using characteristic waves and linear predictions, as discussed in Ref. [1], can also be employed. There could be some improvement, but the resulting fit will be very similar.

Ever since the introduction of the EMD and HSA by Huang et al. [1,2,8], these methods have attracted increasing attention. Some investigators, however, have expressed certain reservations. For example, Olhede and Walden [9] suggested that the idea of computing

IF through the Hilbert transform is good, but that the EMD approach is not rigorous. They have therefore introduced the wavelet projection as the method of decomposition, adopting only the IF computation from the Hilbert transform. Flandrin et al. [10], however, suggest that the EMD is equivalent to a bank of dyadic filters, but refrain from using the HT. From the analysis presented here, it can be concluded that such caution in using the HT is fully justified: the limitations imposed by the Bedrosian and Nuttall theorems certainly have solid theoretical foundations. The normalization procedure shown here should remove any reservations about further applications of the improved HT methods in data analysis. The procedure offers relatively little help to the approach advanced by Olhede and Walden [9], because the wavelet decomposition removes the nonlinear distortions from the waveform; the consequence, however, is that their approach should also be limited to nonstationary but linear processes, and it serves only the limited purpose of improving the poor frequency resolution of the continuous wavelet analysis.

As clearly shown in Equation 1.1, to give a good representation of actual wave data, or other data from natural processes, by means of an analytical wave profile, the analytical profile will need to consist of IMFs and also obey the limitations imposed by the Bedrosian and Nuttall theorems. In the past, such a thorough examination of the data has not been done. As reported by Huang et al. [2,8], most of the actual wave data recorded are not composed of single components. Consequently, the analytical representation of a given wave profile in the form of Equation 1.1 poses a theoretically challenging problem.
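Gabor's two-transform construction of the analytic signal, referred to in Section 1.2.3 (forward Fourier transform, discard the negative-frequency half, inverse transform), can be sketched directly with the FFT. This is the generic textbook construction, essentially what scipy.signal.hilbert implements, not code from this chapter.

```python
import numpy as np

def analytic_signal(x):
    """Gabor's construction: FFT, discard negative frequencies, inverse FFT."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0                  # keep the DC bin
    h[1:(n + 1) // 2] = 2.0     # double the positive frequencies
    if n % 2 == 0:
        h[n // 2] = 1.0         # Nyquist bin for even-length records
    return np.fft.ifft(X * h)   # real part = x, imaginary part = HT(x)

t = np.linspace(0.0, 1.0, 1024, endpoint=False)
x = np.cos(2 * np.pi * 30 * t)
z = analytic_signal(x)
```

For this pure cosine the analytic signal has unit magnitude and its phase advances uniformly; any negative IF obtained from real data therefore comes from the geometry of the phase loop, as discussed above, not from residual negative-frequency content.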

1.3 Application to Image Analysis in Remote Sensing

Just as much of the data from natural phenomena are either nonlinear or nonstationary, or both, so it is also with the data that form images of natural processes. The methods of image processing are already well advanced, as can be seen in reviews such as by Castleman [11] or Russ [12]. The NEMD/NHT methods can now be added to the available tools for producing new and unique image products. Nunes et al. [13] and Linderhed [14–16], among others, have already done significant work in this new area. Because of the nonlinear and nonstationary nature of natural processes, the NEMD/NHT approach is especially well suited for image data, giving frequencies, inverse distances, or wave numbers as a function of time or distance, along with the amplitudes or energy values associated with these, as well as a sharp identification of imbedded structures. The various possibilities and products of this new analysis approach include, but are not limited to, joint and marginal distributions, which can be viewed as isosurfaces, contour plots, and surfaces that contain information on frequency, inverse wavelength, amplitude, energy and location in time, space, or both. Additionally, the concept of component images representing the intrinsic scales and structures imbedded in the data is now possible, along with a technique for obtaining frequency variations of structures within the images. The laboratory used for producing the nonlinear waves, used as an example here, is the NASA Air–Sea Interaction Research Facility (NASIRF) located at the NASA Goddard Space Flight Center/Wallops Flight Facility, at Wallops Island, Virginia, within the Ocean Sciences Branch. The test section of the main wave tank is 18.3 m long and 0.9 m wide, filled to a depth of 0.76 m of water, leaving a height of 0.45 m over the water for airflow,

[Figure: schematic of the main wave tank, with dimensions indicated (5.05 m, 1.22 m, 18.29 m, 30.48 m) and the location of the new coil marked.]

FIGURE 1.8 The NASA Air–Sea Interaction Research Facility's (NASIRF) main wave tank at Wallops Island, VA. The new coils shown were used to provide cooling and humidity control in the airflow over heated water.

if needed. The facility can produce wind- and paddle-generated waves over a water current in either direction, and its capabilities, instruments, and software have been described in detail by Long and colleagues [17–21]. The basic layout is shown in Figure 1.8, with an additional new feature indicated as ''new coil.'' These coils were recently installed to provide cold air of controlled temperature and humidity for experiments using cold air over heated water during the Flux Exchange Dynamics Study of 2004 (FEDS4), a joint experiment involving the University of Washington/Applied Physics Laboratory (UW/APL), the University of Alberta, the Lamont-Doherty Earth Observatory of Columbia University, and NASA GSFC/Wallops Flight Facility. The cold airflow over heated water optimized conditions for the collection of infrared (IR) video images.

1.3.1 The IR Digital Camera and Setup

The camera used to acquire the laboratory image presented here as an example was provided by UW/APL as part of FEDS4. The experimental setup is shown in Figure 1.9. For the example shown here, the resolution of the IR image was 640 × 512 pixels. The camera was mounted to look upwind at the water surface, so that its pixel image area covered a physical rectangle on the water surface on the order of 10 cm per side. The water within the wave tank was heated by four commercial spa heaters, while the air in the airflow was cooled and its humidity controlled by NASIRF's new cooling and reheating coils. This produced a very thin, cooled layer of surface water, so that whenever wave spilling and breaking occurred, it could be seen immediately by the IR camera.

1.3.2 Experimental IR Images of Surface Processes

With this imaging system in place, steps were taken to acquire images of wave breaking and spilling due to wind and wave interactions. One such image is illustrated in Figure 1.10. To help the eye visualize the image data, the IR camera intensity levels have been converted to a grey scale. Using a horizontal line that slices through the central area of the image at row 275, Figure 1.11 illustrates the details contained in the actual array of data values obtained from the IR camera: the IR camera intensity values stored in the pixels along that line, which can be converted to actual temperatures when needed. A complex structure is apparent. Breaking wave fronts are evident in the crescent-shaped structures, where spilling and breaking bring up the underlying warmer water.

[Figure: FEDS4 instrumentation schematic at NASIRF, showing the IR camera, IR radiometer, CO2 laser, air profile sensors (U, T, q), SeaBird and fast temperature sensors, and gas chromatograph, together with the measured fluxes and transfer velocities.]

FIGURE 1.9 The experimental arrangement of FEDS4 (Flux Exchange Dynamics Study of 2004) used to capture IR images of surface wave processes. (Courtesy of A. Jessup and K. Phadnis of UW/APL.)

[Figure: ''IR image of water wave surface'' (vertical vs. horizontal pixels; grey bar gives IR intensity).]

FIGURE 1.10 Surface IR image from the FEDS4 experiment. Grey bar gives the IR camera intensity levels. (Data courtesy of A. Jessup and K. Phadnis of UW/APL.)

After processing, the resulting components produced from the horizontal row of Figure 1.11 are shown in Figure 1.12. As can be seen, the component with the longest scale, C9, contains the bulk of the intensity values; the shorter, riding scales are fluctuations about the levels shown in component C9. The sifting was done via the extrema approach discussed in the foundation articles and produced a total of nine components.

Using this approach, the IR image was first divided into 640 horizontal rows of 512 values each. The rows were then processed to produce the components, each of the 640 rows producing a component set similar to that shown in Figure 1.12. From these basic results, component images can be assembled. This is done by taking the first component, representing the shortest scale, from each of the 640 component sets. These first components are then assembled to produce an array that is also 640 rows by 512 columns and can likewise be visualized as an image: the first component image. The production of component images then continues in a similar fashion with the remaining components, representing progressively longer scales. To visualize the shortest component scales, component images 1 through 4 were added together, as shown in Figure 1.13. Throughout the image, streaks of short wavy structures can be seen to line up in the wind direction (along the vertical axis). Even though the image is formed in the IR camera by measuring heat at many different pixel locations over a rectangular area, the surface waves have an effect that can thus be remotely sensed in the image, either as streaks of warmer water exposed by breaking or as more wavelike structures. If the longer-scale components are now combined using the 5th and 6th component images, a composite image is obtained as shown in Figure 1.14.
Longer scales can be seen throughout the image area where breaking and mixing occur. Other wavelike structures of longer wavelengths are also visible. To produce a true wave number from images like these, one only has to convert using

k = 2π/λ,   (1.5)

where k is the wave number (in 1/cm) and λ is the wavelength (in cm). This requires only knowing the physical size of the image in centimeters, or some other unit, and its equivalent in pixels from the array analyzed.

[Figure: ''Row 275 of IR image of water wave surface'' (IR intensity levels vs. horizontal pixels).]

FIGURE 1.11 A horizontal slice of the raw IR image given in Figure 1.10, taken at row 275. Note the details contained in the IR image data, showing structures containing both short and longer length scales.

[Figure: ''Components of row 275 of IR image'' (offset IMF components C1, top, to C9 vs. horizontal pixels).]

FIGURE 1.12 Components obtained by processing data from the slice shown in Figure 1.11. Note that component C9 carries the bulk of the intensity scale, while the other components with shorter scales record the fluctuations about these base levels.

Another approach to the raw image of Figure 1.10 is to separate the original image into columns instead of rows. This makes the analysis more sensitive to structures better aligned with that direction, which is also the direction of wind and waves. By repeating the steps leading to Figure 1.13, the shortest-scale component images, 3 to 5, can be combined to form Figure 1.15. Component images 1 and 2 developed from the vertical column analysis were not included here, as they were found to contain results of such a short scale, uniformly spread throughout the image and without structure, that they had the appearance of uniform noise. It is apparent that more structures at these scales can be seen by analyzing along the column direction. Figure 1.16 represents the longer scale in component image 6. By the 6th component image, the lamination process starts to fail somewhat in reassembling the image from the components. Further processing is needed to better match the results at these longer scales.
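The row-by-row procedure of Section 1.3.2 (decompose each image row, then reassemble the k-th component of every row into a ''component image'') can be sketched as follows. A toy envelope-mean sifting routine stands in for the full NEMD, and the synthetic image and function names are our own assumptions.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def first_component(x, t, n_sifts=8):
    """Toy stand-in for NEMD: repeatedly remove the mean of the spline envelopes."""
    h = x.copy()
    for _ in range(n_sifts):
        mx = argrelextrema(h, np.greater)[0]
        mn = argrelextrema(h, np.less)[0]
        if len(mx) < 2 or len(mn) < 2:
            break
        upper = CubicSpline(t[mx], h[mx])(t)
        lower = CubicSpline(t[mn], h[mn])(t)
        h = h - 0.5 * (upper + lower)
    return h

# Synthetic "IR image": short waves riding on a strong long-scale background
rows, cols = 64, 512
t = np.linspace(0.0, 1.0, cols)
image = np.array([np.cos(2 * np.pi * 40 * t + 0.1 * r)
                  + 3.0 * np.cos(2 * np.pi * 2 * t)
                  for r in range(rows)])

# Component image 1: the first (shortest-scale) component of every row
comp1 = np.array([first_component(image[r], t) for r in range(rows)])
background = image - comp1   # the remainder carries the long-scale structure
```

Repeating the loop for the second, third, and later components of each row, and stacking them the same way, would yield the progressively longer-scale component images combined in Figure 1.13 and Figure 1.14.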

[Figure: ''Horizontal IR components 1 to 4'' (vertical vs. horizontal pixels; color scale of component amplitude).]

FIGURE 1.13 (See color insert following page 302.) Component images 1 to 4 from the horizontal rows used to produce a composite image representing the shortest scales.

[Figure: ''Horizontal IR components 5 to 6'' (vertical vs. horizontal pixels; color scale of component amplitude).]

FIGURE 1.14 (See color insert following page 302.) Component images 5 to 6 from the horizontal rows used to produce a composite image representing the longer scales.

[Figure: ''Vertical IR components 3 to 5'' (vertical vs. horizontal pixels; color scale of component amplitude).]

FIGURE 1.15 (See color insert following page 302.) Component images 3 to 5 from the vertical columns, combined to produce a composite image representing the midrange scales.

When the original data are a function of time, this new approach can produce the IF and amplitude as functions of time. Here, the original data are from an IR image, so that any slice through the image (horizontal or vertical) is a set of camera values (ultimately temperatures) representing the variation over a physical length. Thus, instead of producing frequency (an inverse time scale), the new approach here initially produces an inverse length scale. In the case of water surface waves, this is the familiar wave number, as given in Equation 1.5. To illustrate this, consider Figure 1.17, which shows the changes of scale along the selected horizontal row 400. The largest measures of IR energy can be seen at the smaller inverse length scales, which implies that the energy came from the longer scales of components 3 and 4. Figure 1.18 repeats this for the even longer length scales in components 5 and 6. Returning to the column-wise processing at column 250 of Figure 1.15 and Figure 1.16, further processing gives the contour plot of Figure 1.19, for components 3 through 5, and Figure 1.20, for components 4 through 6.
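For a single pixel row, the inverse length scale plotted in Figure 1.17 through Figure 1.20 can be approximated by differentiating the phase of the analytic signal of the row, exactly as IF is obtained from a time series, with Equation 1.5 converting the result to a wave number. The synthetic row below (a pure 32-pixel wavelength) is an illustration, not FEDS4 data.

```python
import numpy as np
from scipy.signal import hilbert

# Synthetic pixel row: a pure wave with a 32-pixel wavelength
pixels = np.arange(1024)
row = np.cos(2 * np.pi * pixels / 32.0)

# Instantaneous inverse length scale: phase derivative of the analytic signal
phase = np.unwrap(np.angle(hilbert(row)))
inv_length = np.gradient(phase) / (2 * np.pi)   # in cycles per pixel

# Eq. (1.5): wave number from the inverse length scale (here in 1/pixel)
k = 2 * np.pi * inv_length
```

Away from the record ends, inv_length sits near 1/32 cycles per pixel for this row; multiplying by the pixels-per-centimeter calibration would give k in 1/cm, as noted after Equation 1.5.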

1.3.3 Volume Computations and Isosurfaces

[Figure: ''Vertical IR components image 6'' (vertical vs. horizontal pixels; color scale of component amplitude).]

FIGURE 1.16 (See color insert following page 302.) Component image 6 from the vertical column analysis, used to produce a composite image representing the longer scale.

Many interesting phenomena happen in the flow of time, and thus it is also of interest how changes occur with time in the images. To include time in the analysis, a sequence of images taken at uniform time steps can be used. By starting with a single horizontal or vertical line from the image, a contour plot can be produced, as was shown in Figure 1.17 through Figure 1.20. Using a set of sequential images covering a known time period and a pixel line of data from each (horizontal or vertical), a set of numerical arrays can be obtained from the NEMD/NHT analysis. Each array can be visualized by means of a contour plot, as already shown. The entire set of arrays can also be combined in sequence to form an array volume, an array of dimension 3. Within the volume, each element of the array contains the amplitude or intensity of the data from the image sequence, and the element's location within the three-dimensional array specifies the values associated with the stored data. One axis (call it x) of the volume can represent horizontal or vertical distance along the data line taken from the image. Another axis (call it y) can represent the resulting inverse length scale associated with the data. The additional axis (call it z) is produced by laminating the arrays together and represents time, because each image was acquired in repetitive time steps. Thus, the position of an element in the volume gives location x along the horizontal or vertical slice, inverse length along the y-axis, and time along the z-axis. Isosurface techniques are needed to visualize this. The procedure can be compared to peeling an onion, except that the different layers, or spatial contour values, are not bound in spherical shells. After a value of data intensity is specified, the isosurface visualization makes all array elements transparent outside the chosen level, while shading in the chosen value so that the elements inside that level (or behind it) cannot be seen. Some examples of this procedure can be seen in Ref. [21].

Another approach to the analysis of images is to reassemble lines from the image data in a different format. A sequence of images in units of time is needed; using the same horizontal or vertical line from each image in the time sequence, each line can be laminated to its predecessor to build up an array that is the image length along the chosen line on one edge, and the number of images along the other axis, in units of

Signal and Image Processing for Remote Sensing

[Figure 1.17 here: contour plot titled "Horizontal row 400: components 1 to 4"; x-axis: horizontal pixels (50–500), y-axis: inverse length scale (0.05–0.45 1/pixel), contour levels 1–10.]

FIGURE 1.17 (See color insert following page 302.) The results from the NEMD/NHT computation on horizontal row 400 for components 1 to 4, which resulted from Figure 1.13. Note the apparent influence of surface waves on the IR information. The most intense IR radiation can be seen at the smaller values of inverse length scale, denoting the longer scales in components 3 and 4. A wavelike influence can be seen at all scales.

[Figure 1.18 here: contour plot titled "Horizontal row 400: components 5 to 6"; x-axis: horizontal pixels (50–500), y-axis: inverse length scale (0–0.04 1/pixel), contour levels 1–8.]

FIGURE 1.18 (See color insert following page 302.) The results from the NEMD/NHT computation on horizontal row 400 for components 5 to 6, which resulted from Figure 1.14. Even at the longer scales, an apparent influence of surface waves on the IR information can still be seen.

On the Normalized Hilbert Transform and Its Applications in Remote Sensing


[Figure 1.19 here: contour plot titled "Vertical column 250: components 3 to 5"; x-axis: vertical pixels (100–600), y-axis: inverse length scale (0.02–0.18 1/pixel), contour levels 1–10.]

FIGURE 1.19 The contour plot developed from the vertical slice at column 250, using the components 3 to 5. The larger IR values can be seen at longer length scales.

time. Once complete, this two-dimensional array can be split into slices along the time axis. Each of these time slices, representing the variation in data values with time at a single-pixel location, can then be processed with the new NEMD/NHT technique. An example of this can also be seen in Ref. [21]. The NEMD/NHT techniques can thus reveal variations in frequency or time in the data at a specific location in the image sequence.
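The volume-building and time-lamination steps described above can be sketched in a few lines of Python with NumPy. This is a schematic illustration only (the frame contents, shapes, and variable names are made up, and the raw frames stand in for the NEMD component arrays):

```python
import numpy as np

# Hypothetical image sequence: 5 frames of 4 x 6 pixels, acquired at
# repetitive time steps.
frames = [np.arange(24).reshape(4, 6) + t for t in range(5)]

# Stack the arrays into a volume: axes are (y, x, time). Isosurfaces
# of this volume are what the visualization step would render.
volume = np.stack(frames, axis=2)
print(volume.shape)  # (4, 6, 5)

# Laminate the same horizontal row (row 2) from each frame: one axis
# is position along the chosen line, the other is time.
laminated = np.stack([f[2, :] for f in frames], axis=1)

# Splitting along the time axis gives, for each single-pixel
# location, the variation of the data value with time -- the slices
# that would be processed with the NEMD/NHT technique.
time_slices = [laminated[x, :] for x in range(laminated.shape[0])]
print(len(time_slices), time_slices[0].tolist())
```

Each `time_slices[x]` is a one-dimensional series in time, so the same one-dimensional NEMD/NHT machinery used for spatial lines applies to it unchanged.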

1.4 Conclusion

With the introduction of the normalization procedure, one of the major obstacles for NEMD/NHT analysis has been removed. Together with the establishment of the confidence limit [6] through variation of the stoppage criterion, the statistical significance test of the information content of IMFs [10,22], and the further development of the concept of IF [23], the analysis has indeed approached maturity for applications empirically, if not mathematically (for a recent overview of developments, see Ref. [24]). The new NEMD/NHT methods provide the best overall approach to determining the IF for nonlinear and nonstationary data. Thus, a new tool is available to aid in further understanding and gaining deeper insight into the wealth of data now made possible by remote

[Figure 1.20 here: contour plot titled "Vertical column 250: components 4 to 6"; x-axis: vertical pixels (100–600), y-axis: inverse length scale (0.01–0.1 1/pixel), contour levels 5–25.]

FIGURE 1.20 The contour plot developed from the vertical slice at column 250, using the components 4 through 6, as in Figure 1.19.

sensing and other means. Specifically, the application of the new method to data images was demonstrated. This new approach is covered by several U.S. Patents held by NASA, as discussed by Huang and Long [25]. Further information on obtaining the software can be found at the NASA authorized commercial site: http://www.fuentek.com/technologies/hht.htm

Acknowledgment

The authors wish to express their continuing gratitude and thanks to Dr. Eric Lindstrom of NASA headquarters for his encouragement and support of the work.

References

1. Huang, N.E., Shen, Z., Long, S.R., Wu, M.C., Shih, S.H., Zheng, Q., Tung, C.C., and Liu, H.H., The empirical mode decomposition method and the Hilbert spectrum for non-stationary time series analysis, Proc. Roy. Soc. London, A454, 903–995, 1998.
2. Huang, N.E., Shen, Z., and Long, S.R., A new view of water waves—the Hilbert spectrum, Ann. Rev. Fluid Mech., 31, 417–457, 1999.
3. Flandrin, P., Time–Frequency/Time–Scale Analysis, Academic Press, San Diego, 1999.
4. Bedrosian, E., On the quadrature approximation to the Hilbert transform of modulated signals, Proc. IEEE, 51, 868–869, 1963.
5. Nuttall, A.H., On the quadrature approximation to the Hilbert transform of modulated signals, Proc. IEEE, 54, 1458–1459, 1966.
6. Huang, N.E., Wu, M.L., Long, S.R., Shen, S.S.P., Qu, W.D., Gloersen, P., and Fan, K.L., A confidence limit for the empirical mode decomposition and the Hilbert spectral analysis, Proc. Roy. Soc. London, A459, 2317–2345, 2003.
7. Gabor, D., Theory of communication, J. IEEE, 93, 426–457, 1946.
8. Huang, N.E., Long, S.R., and Shen, Z., The mechanism for frequency downshift in nonlinear wave evolution, Adv. Appl. Mech., 32, 59–111, 1996.
9. Olhede, S. and Walden, A.T., The Hilbert spectrum via wavelet projections, Proc. Roy. Soc. London, A460, 955–975, 2004.
10. Flandrin, P., Rilling, G., and Gonçalves, P., Empirical mode decomposition as a filter bank, IEEE Signal Proc. Lett., 11(2), 112–114, 2004.
11. Castleman, K.R., Digital Image Processing, Prentice-Hall, Englewood Cliffs, NJ, 1996.
12. Russ, J.C., The Image Processing Handbook, 4th ed., CRC Press, Boca Raton, 2002.
13. Nunes, J.C., Guyot, S., and Deléchelle, E., Texture analysis based on local analysis of the bidimensional empirical mode decomposition, Mach. Vision Appl., 16(3), 177–188, 2005.
14. Linderhed, A., Compression by image empirical mode decomposition, IEEE Int. Conf. Image Process., 1, 553–556, 2005.
15. Linderhed, A., Variable sampling of the empirical mode decomposition of two-dimensional signals, Int. J. Wavelets, Multi-resolut. Inform. Process., 3, 2005.
16. Linderhed, A., 2D empirical mode decompositions in the spirit of image compression, Wavelet Independ. Compon. Analy. Appl. IX, SPIE Proc., 4738, 1–8, 2002.
17. Long, S.R., NASA Wallops Flight Facility Air–Sea Interaction Research Facility, NASA Reference Publication, No. 1277, 1992, 29 pp.
18. Long, S.R., Lai, R.J., Huang, N.E., and Spedding, G.R., Blocking and trapping of waves in an inhomogeneous flow, Dynam. Atmos. Oceans, 20, 79–106, 1993.
19. Long, S.R., Huang, N.E., Tung, C.C., Wu, M.-L.C., Lin, R.-Q., Mollo-Christensen, E., and Yuan, Y., The Hilbert techniques: An alternate approach for non-steady time series analysis, IEEE GRSS, 3, 6–11, 1995.
20. Long, S.R. and Klinke, J., A closer look at short waves generated by wave interactions with adverse currents, Gas Transfer at Water Surfaces, Geophysical Monograph 127, American Geophysical Union, 121–128, 2002.
21. Long, S.R., Applications of HHT in image analysis, Hilbert–Huang Transform and Its Applications, Interdisciplinary Mathematical Sciences, 5, 289–305, World Scientific, Singapore, 2005.
22. Wu, Z. and Huang, N.E., A study of the characteristics of white noise using the empirical mode decomposition method, Proc. Roy. Soc. London, A460, 1597–1611, 2004.
23. Huang, N.E., Wu, Z., Long, S.R., Arnold, K.C., Blank, K., and Liu, T.W., On instantaneous frequency, Proc. Roy. Soc. London, 2006, in press.
24. Huang, N.E., Introduction to the Hilbert–Huang transform and its related mathematical problems, Hilbert–Huang Transform and Its Applications, Interdisciplinary Mathematical Sciences, 5, 1–26, World Scientific, Singapore, 2005.
25. Huang, N.E. and Long, S.R., A generalized zero-crossing for local frequency determination, US Patent pending, 2003.

2 Statistical Pattern Recognition and Signal Processing in Remote Sensing

Chi Hau Chen

CONTENTS
2.1 Introduction
2.2 Introduction to Statistical Pattern Recognition in Remote Sensing
2.3 Using Self-Organizing Maps and Radial Basis Function Networks for Pixel Classification
2.4 Introduction to Statistical Signal Processing in Remote Sensing
2.5 Conclusions
References

2.1 Introduction

Basically, statistical pattern recognition deals with the correct classification of a pattern into one of several available pattern classes. Basic topics in statistical pattern recognition include preprocessing, feature extraction and selection, parametric or nonparametric probability density estimation, decision-making processes, performance evaluation, postprocessing as needed, supervised and unsupervised learning (or training), and cluster analysis. The large amount of data available makes remote-sensing data uniquely suitable for statistical pattern recognition. Signal processing is needed not only to reduce undesired noises and interferences but also to extract desired information from the data, as well as to perform the preprocessing task for pattern recognition. Remote-sensing data considered include those from multispectral, hyperspectral, radar, optical, and infrared sensors. Statistical signal-processing methods, as used in remote sensing, include transform methods such as principal component analysis (PCA), independent component analysis (ICA), and factor analysis, and methods using high-order statistics.

This chapter is presented as a brief overview of statistical pattern recognition and statistical signal processing in remote sensing. The views and comments presented, however, are largely those of this author. The chapter introduces the pattern recognition and signal-processing topics dealt with in this book. Readers are highly recommended to refer to the book by Landgrebe [1] for remote-sensing pattern classification issues and the article by Duin and Tax [2] for a survey of statistical pattern recognition.


Although there are many applications of statistical pattern recognition, its theory has been developed only during the last half century. A list of some major theoretical developments includes the following:

- Formulation of pattern recognition as a Bayes decision theory problem [3]
- Nearest neighbor decision rules (NNDRs) and density estimation [4]
- Use of the Parzen density estimate in nonparametric pattern recognition [5]
- Leave-one-out method of error estimation [6]
- Use of statistical distance measures and error bounds in feature evaluation [7]
- Hidden Markov models as one way to deal with contextual information [8]
- Minimization of the perceptron criterion function [9]
- Fisher linear discriminant and multicategory generalizations [10]
- Link between backpropagation-trained neural networks and the Bayes discriminant [11]
- Cover's theorem on the separability of patterns [12]
- Unsupervised learning by decomposition of mixture densities [13]
- K-means algorithm [14]
- Self-organizing map (SOM) [15]
- Statistical learning theory and VC dimension [16,17]
- Support vector machine for pattern recognition [17]
- Combining classifiers [18]
- Nonlinear mapping [19]
- Effect of finite sample size (e.g., [13])

In the above discussion, the role of artificial neural networks in statistical classification and clustering has been taken into account. The above list is clearly not complete and is quite subjective. However, these developments clearly have a significant impact on information processing in remote sensing. We now examine briefly the performance measures in statistical pattern recognition:

- Error probability. This is most popular, as the Bayes decision rule is optimum for minimum error probability. It is noted that an average classification accuracy was proposed by Wilkinson [20] for remote sensing.
- Ratio of interclass distance to within-class distance. This is most popular for discriminant analysis, which seeks to maximize such a ratio.
- Mean square error. This is most popular mainly in error-correction learning and in neural networks.
- ROC (receiver operating characteristic) curve, which is a plot of the probability of correct decision versus the probability of false alarm, with other parameters given.

Other measures, like error-reject tradeoff, are often used in character recognition.
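The ROC measure above can be sketched in a few lines of Python. This is a generic illustration, not tied to any data set in the chapter; the scores and labels are made up, and the function name is an assumption:

```python
# Sketch: sweep a decision threshold over classifier scores and trace
# the ROC curve (probability of detection P_d versus probability of
# false alarm P_fa). Hypothetical scores and ground-truth labels.
scores = [0.1, 0.35, 0.4, 0.55, 0.6, 0.8, 0.9]
labels = [0,   0,    1,   0,    1,   1,   1  ]  # 1 = target present

def roc_points(scores, labels):
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for thr in sorted(set(scores), reverse=True):
        p_d = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= thr) / n_pos
        p_fa = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= thr) / n_neg
        points.append((p_fa, p_d))
    return points

for p_fa, p_d in roc_points(scores, labels):
    print(f"P_fa={p_fa:.2f}  P_d={p_d:.2f}")
```

Lowering the threshold moves up the curve: both P_d and P_fa increase, which is the trade-off the ROC curve makes explicit.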

2.2 Introduction to Statistical Pattern Recognition in Remote Sensing

Feature extraction and selection is still a basic problem in statistical pattern recognition for any application. Feature measurements constructed from multiple bands of the


remote-sensing data as a vector are still most commonly used in remote-sensing pattern recognition. Transform methods are useful to reduce the redundancy in vector measurements. Dimensionality reduction has been a particularly important topic in remote sensing in view of hyperspectral image data, which normally have several hundred spectral bands. The parametric classification rules include the Bayes or maximum likelihood decision rule and discriminant analysis. The nonparametric (or distribution-free) methods of classification include the NNDR and its modifications, and the Parzen density estimation.

In the early 1970s, the multivariate Gaussian assumption was most popular in the multispectral data classification problem. It was demonstrated that the empirical data follow the Gaussian distribution reasonably well [21]. Even with the use of new sensors and the expanded application of remote sensing, the Gaussian assumption remains a good approximation. The traditional multivariate analysis still plays a useful role in remote-sensing pattern recognition [22] and, because of the importance of the covariance matrix, methods to use unsupervised samples to "enhance" the data statistics have also been considered. Indeed, for good classification, data statistics must be carefully examined. An example is the synthetic aperture radar (SAR) image data. Chapter 13 presents a discussion on the physical and statistical characteristics of SAR image data.

Without making use of the Gaussian assumption, the NNDR is the most popular nonparametric classification method. It works well even with a moderate-size data set and promises an error rate that is upper-bounded by twice the Bayes error rate. However, its performance is limited in remote-sensing data classification, while neural network–based classifiers can reach a performance nearly equal to that of the Bayes classifier.
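The NNDR itself is simple enough to state in a few lines. The following is a minimal, hypothetical sketch in Python (the feature vectors and class names are invented for illustration, not taken from any data set in the chapter):

```python
import math

# Minimal nearest neighbor decision rule (NNDR): assign each test
# pixel the class of its closest training sample in feature space.
# The two-band feature vectors below are hypothetical pixel values.
train = [((0.2, 0.9), "water"), ((0.3, 0.8), "water"),
         ((0.7, 0.2), "soil"),  ((0.8, 0.3), "soil")]

def nndr(x):
    # Euclidean distance to every training sample; return the class
    # of the nearest one.
    return min(train, key=lambda t: math.dist(x, t[0]))[1]

print(nndr((0.25, 0.85)))  # closest to the "water" samples
print(nndr((0.75, 0.25)))  # closest to the "soil" samples
```

As the training set grows, the asymptotic error rate of this rule is bounded above by twice the Bayes error rate, which is what makes such a simple rule a useful baseline.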
Extensive study has been done in the statistical pattern recognition community to improve the performance of the NNDR. We would like to mention the work of Grabowski et al. [23] here, which introduces the k-near surrounding neighbor (k-NSN) decision rule with application to remote-sensing data classification.

Some unique problem areas of statistical pattern recognition in remote sensing are the use of contextual information and the "Hughes phenomenon." The use of the Markov random field model for contextual information is presented in Chapter 14 of this volume. While classification performance generally improves with increases in the feature dimension, without a proportional increase in the training sample size the performance reaches a peak, beyond which it degrades. This is the so-called "Hughes phenomenon." Methods to reduce this phenomenon are well presented in Ref. [1].

Data fusion is important in remote sensing, as different sensors, which have different strengths, are often used. The subject is treated in Chapter 23 of this volume. Though the approach is not limited to statistical methodology [24], the approaches to combining classifiers in statistical pattern recognition and neural networks can be quite useful in providing effective utilization of information from different sensors or sources to achieve the best available classification performance. In this volume, Chapter 15 and Chapter 16 present two approaches to the statistical combining of classifiers.

The recent development in support vector machines appears to present an ultimate classifier that may provide the best classification performance. Indeed, the design of the classifier is fundamental to the classification performance. There is, however, a basic question: "Is there a best classifier?" [25]. The answer is "No" as, among other reasons, it is evident that the classification process is data-dependent. Theory and practice are often not consistent in pattern recognition.
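The Hughes phenomenon can be reproduced in a toy simulation. The sketch below is entirely synthetic (the class model, training size, and the nearest-mean classifier are illustrative assumptions, not from the chapter): only the first few dimensions carry class information, and with a fixed training sample the estimated class means degrade as noise dimensions are added, so accuracy typically rises, peaks, and then falls:

```python
import random

random.seed(0)

# Toy illustration of the Hughes phenomenon with a nearest-mean
# classifier. Only the first `informative` dimensions separate the
# classes; the remaining dimensions are pure noise.
def sample(cls, dim, informative=3):
    shift = 0.5 if cls == 1 else -0.5
    return [random.gauss(shift if d < informative else 0.0, 1.0)
            for d in range(dim)]

def accuracy(dim, n_train=10, n_test=200):
    # Estimate each class mean from a small, fixed training sample.
    means = {}
    for cls in (0, 1):
        pts = [sample(cls, dim) for _ in range(n_train)]
        means[cls] = [sum(col) / n_train for col in zip(*pts)]
    # Classify test points by squared distance to the nearest mean.
    correct = 0
    for _ in range(n_test):
        cls = random.randint(0, 1)
        x = sample(cls, dim)
        d = {c: sum((xi - mi) ** 2 for xi, mi in zip(x, m))
             for c, m in means.items()}
        correct += (min(d, key=d.get) == cls)
    return correct / n_test

for dim in (3, 10, 50, 200):
    print(dim, accuracy(dim))
```

With only 10 training samples per class, the extra noise dimensions eventually swamp the fixed amount of class information, which is exactly the peaking behavior described above.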
The preprocessing and feature extraction and selection are important and can influence the final classification performance. There are no clear steps to be taken in preprocessing, and the optimal feature extraction and selection is still an unsolved problem. A single feature derived from the genetic algorithm may perform better than several original features. There is always a choice to be made between using a complex


feature set followed by a simple classifier and a simple feature set followed by a complex classifier. In this volume, Chapter 27 deals with the classification by support vector machine. Among many other publications on the subject, Melgani and Bruzzone [26] provide an informative comparison of the performance of several support vector machines.

2.3 Using Self-Organizing Maps and Radial Basis Function Networks for Pixel Classification

In this section, some experimental results are presented to illustrate the importance of preprocessing before classification. The data set, which is now available in the IEEE Geoscience and Remote Sensing Society database, consists of 250 × 350 pixel images. They were acquired by two imaging sensors installed on a Daedalus 1268 Airborne Thematic Mapper (ATM) scanner and a PLC-band, fully polarimetric NASA/JPL SAR sensor over an agricultural area near the village of Feltwell, U.K. The original SAR images include nine channels. Figure 2.1 shows the original nine channels of image data.

The radial basis function (RBF) neural network is used for classification [27]. However, preprocessing is performed by the SOM, which performs preclustering. The weights of the SOM are chosen as centers for the RBF neurons. The RBF network has five output nodes for the five pattern classes in the image data considered (SAR and ATM images of an agricultural area). The weights of the n most-frequently-fired neurons, when each class was presented to the SOM, were separately taken as the centers for the 5 × n RBF neurons. The weights between the hidden-layer neurons and the output-layer neurons were computed by a procedure for generalized radial basis function networks.

Pixel classification using the SOM alone (unsupervised) is 62.7% correct. Pixel classification using the RBF network alone (supervised) is 89.5% correct, at best. Pixel classification using both SOM and RBF is 95.2% correct. This result is better than the reported results on the same data set using RBF [28] at 90.5% correct, or ICA-based features with the nearest neighbor classification rule [29] at 86% correct.
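The center-selection step described above can be sketched as follows. This is a heavily simplified, hypothetical Python illustration (a plain winner-take-all competitive layer stands in for a full SOM, the data are synthetic two-band pixels, and n = 2 centers per class is an arbitrary choice); it is not the procedure of Ref. [27]:

```python
import math
import random
from collections import Counter

random.seed(1)

# Synthetic "pixels" from two classes, clustered in feature space.
def make_pixels(cls, n=60):
    cx, cy = [(0.2, 0.2), (0.8, 0.7)][cls]
    return [(random.gauss(cx, 0.08), random.gauss(cy, 0.08)) for _ in range(n)]

data = [(p, c) for c in (0, 1) for p in make_pixels(c)]

# Train 6 competitive neurons (winner-take-all, decaying learning
# rate), initialized from data samples -- a simplified stand-in for
# an unsupervised SOM.
weights = [data[i][0] for i in range(0, len(data), 20)]
for epoch in range(20):
    lr = 0.5 / (1 + epoch)
    for p, _ in data:
        w = min(range(6), key=lambda i: math.dist(p, weights[i]))
        weights[w] = tuple(wi + lr * (pi - wi) for wi, pi in zip(weights[w], p))

# Count how often each neuron fires when each class is presented;
# keep the weights of the n = 2 most-frequently-fired per class as
# RBF centers.
centers = []
for cls in (0, 1):
    fired = Counter(min(range(6), key=lambda i: math.dist(p, weights[i]))
                    for p, c in data if c == cls)
    centers += [weights[i] for i, _ in fired.most_common(2)]

def rbf_features(p, sigma=0.2):
    # Gaussian hidden-layer activations of the RBF network for one pixel.
    return [math.exp(-math.dist(p, c) ** 2 / (2 * sigma ** 2)) for c in centers]

print(len(centers), [round(v, 3) for v in rbf_features((0.2, 0.2))])
```

The output weights between this hidden layer and the class outputs would then be fitted by least squares, which is the supervised part of the pipeline.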

2.4 Introduction to Statistical Signal Processing in Remote Sensing

Signal and image processing is needed in remote-sensing information processing to reduce the noise and interference with the data, to extract the desired signal and image component, or to derive useful measurements for input to the classifier. The classification problem is, in fact, very closely linked to signal and image processing [30,31]. Transform methods have been most popular in signal and image processing [32]. Though the popular wavelet transform method for remote sensing is treated elsewhere [33], we have included Chapter 1 in this volume, which presents the popular Hilbert–Huang transform; Chapter 24, which deals with the use of Hermite transform in the multispectral image fusion; and Chapter 10, Chapter 19, and Chapter 20, which make use of the methods of ICA. Although there is a constant need for better sensors, the signal-processing algorithm such as the one presented in Chapter 7 demonstrates well the role of Kalman filtering in weak signal detection. Time series modeling as used in remote sensing is the subject of Chapter 8 and Chapter 21. Chapter 6 of this volume makes use of the factor analysis.


[Figure 2.1 here: nine SAR channel images, panels labeled th-c-hh, th-c-hv, th-c-vv, th-l-hh, th-l-hv, th-l-vv, th-p-hh, th-p-hv, th-p-vv.]

FIGURE 2.1 A nine-channel SAR image data set.


In spite of the numerous efforts with transform methods, the basic method of PCA always has its useful role in remote sensing [34]. Signal decomposition and the use of high-order statistics can potentially offer new solutions to remote-sensing information processing problems. Considering the nonlinear nature of signal and image processing problems, it is necessary to point out the important roles of artificial neural networks in signal processing and classification, as presented in Chapter 3, Chapter 11, and Chapter 12.

A lot of effort has been made in the last two decades to derive effective features in signal classification through signal processing. Such efforts include about two dozen mathematical features for use in exploration seismic pattern recognition [35]; multidimensional attribute analysis, which includes both physically and mathematically significant features or attributes for seismic interpretation [36]; time-domain, frequency-domain, and time–frequency-domain features extracted for transient signal analysis [37] and classification; and about a dozen features for active sonar classification [38]. Clearly, the feature extraction method for one type of signal cannot be transferred to other signals. To use a large number of features derived from signal processing is not desirable, as there is significant information overlap among features and the resulting feature selection process can be tedious. It has not been verified that features extracted from the time–frequency representation can be more useful than features from time-domain and frequency-domain analysis alone. Ideally, a combination of a small set of physically significant and mathematically significant features should be used. Instead of looking for the optimal feature set, a small, but effective, feature set should be considered.
It is doubtful that an optimal feature set for any given pattern recognition application can be developed in the near future in spite of many advances in signal and image processing.
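The basic PCA step mentioned above can be sketched as follows. This is a generic illustration with made-up data (the band correlations, the choice of three components, and all variable names are assumptions); the noise-adjusted transform of Ref. [34] refines this basic step but is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical multiband pixel vectors: 500 pixels, 6 spectral bands,
# with two bands made nearly redundant to mimic band correlation.
X = rng.normal(size=(500, 6))
X[:, 3] = X[:, 0] + 0.1 * X[:, 3]
X[:, 4] = X[:, 1] + 0.1 * X[:, 4]

Xc = X - X.mean(axis=0)                 # center each band
cov = np.cov(Xc, rowvar=False)          # 6 x 6 band covariance matrix
evals, evecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
order = np.argsort(evals)[::-1]         # sort descending by variance
k = 3                                   # keep 3 principal components
components = evecs[:, order[:k]]
scores = Xc @ components                # reduced pixel feature vectors

explained = evals[order[:k]].sum() / evals.sum()
print(scores.shape, round(float(explained), 2))
```

Because two band pairs are strongly correlated, the top three components capture most of the variance, which is the redundancy-reduction effect PCA is used for in multiband data.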

2.5 Conclusions

Remote-sensing sensors have been able to deliver abundant information [39]. The many advances in statistical pattern recognition and signal processing can be very useful in remote-sensing information processing, either to supplement the capability of sensors or to effectively utilize the enormous amount of sensor data. The potentials and opportunities of using statistical pattern recognition and signal processing in remote sensing are thus unlimited.

References

1. Landgrebe, D.A., Signal Theory Methods in Multispectral Remote Sensing, John Wiley & Sons, New York, 2003.
2. Duin, R.P.W. and Tax, D.M.J., Statistical pattern recognition, in Handbook of Pattern Recognition and Computer Vision, 3rd ed., Chen, C.H. and Wang, P.S.P., Eds., World Scientific Publishing, Singapore, Jan. 2005, Chap. 1.
3. Chow, C.K., An optimum character recognition system using decision functions, IEEE Trans. Electron. Comput., EC-6, 247–254, 1957.
4. Cover, T.M. and Hart, P.E., Nearest neighbor pattern classification, IEEE Trans. Inf. Theory, IT-13(1), 21–27, 1967.
5. Parzen, E., On estimation of a probability density function and mode, Ann. Math. Stat., 33(3), 1065–1076, 1962.
6. Lachenbruch, P.S. and Mickey, R.M., Estimation of error rates in discriminant analysis, Technometrics, 10, 1–11, 1968.
7. Kailath, T., The divergence and Bhattacharyya distance measures in signal selection, IEEE Trans. Commun. Technol., COM-15, 52–60, 1967.
8. Duda, R.O., Hart, P.E., and Stork, D.G., Pattern Classification, John Wiley & Sons, New York, 2003.
9. Ho, Y.C. and Kashyap, R.L., An algorithm for linear inequalities and its applications, IEEE Trans. Electron. Comput., EC-14(5), 683–688, 1965.
10. Duda, R.O. and Hart, P.E., Pattern Classification and Scene Analysis, John Wiley & Sons, New York, 1973.
11. Bishop, C., Neural Networks for Pattern Recognition, Oxford University Press, Oxford, 1995.
12. Cover, T.M., Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition, IEEE Trans. Electron. Comput., EC-14, 326–334, 1965.
13. Fukunaga, K., Introduction to Statistical Pattern Recognition, 2nd ed., Academic Press, New York, 1990.
14. MacQueen, J., Some methods for classification and analysis of multivariate observations, Proc. Fifth Berkeley Symp. Math. Stat. Probab., 281–297, 1967.
15. Kohonen, T., Self-Organization and Associative Memory, 3rd ed., Springer-Verlag, Heidelberg, 1988 (1st ed., 1984).
16. Vapnik, V.N. and Chervonenkis, A.Ya., On the uniform convergence of relative frequencies of events to their probabilities, Theor. Probab. Appl., 16, 264–280, 1971.
17. Vapnik, V.N., Statistical Learning Theory, John Wiley & Sons, New York, 1998.
18. Kittler, J., Hatef, M., Duin, R.P.W., and Matas, J., On combining classifiers, IEEE Trans. Pattern Anal. Mach. Intell., 20(3), 226–239, 1998.
19. Sammon, J.W., Jr., A nonlinear mapping for data structure analysis, IEEE Trans. Comput., C-18, 401–409, 1969.
20. Wilkinson, G., Results and implications of a study of fifteen years of satellite image classification experiments, IEEE Trans. Geosci. Remote Sens., 43, 433–440, 2005.
21. Fu, K.S., Application of pattern recognition to remote sensing, in Applications of Pattern Recognition, Fu, K.S., Ed., CRC Press, Boca Raton, FL, 1982, Chap. 4.
22. Benediktsson, J.A., Statistical and neural network pattern recognition methods for remote sensing applications, in Handbook of Pattern Recognition and Computer Vision, 2nd ed., Chen, C.H. et al., Eds., World Scientific Publishing, Singapore, 1999, Chap. 3.2.
23. Grabowski, S., Jozwik, A., and Chen, C.H., Nearest neighbor decision rule for pixel classification in remote sensing, in Frontiers of Remote Sensing Information Processing, Chen, C.H., Ed., World Scientific Publishing, Singapore, 2003, Chap. 13.
24. Stevens, M.R. and Snorrason, M., Multisensor automatic target segmentation, in Frontiers of Remote Sensing Information Processing, Chen, C.H., Ed., World Scientific Publishing, Singapore, 2003, Chap. 15.
25. Richards, J., Is there a best classifier?, Proc. SPIE, 5982, 2005.
26. Melgani, F. and Bruzzone, L., Classification of hyperspectral remote sensing images with support vector machines, IEEE Trans. Geosci. Remote Sens., 42, 1778–1790, 2004.
27. Chen, C.H. and Shrestha, B., Classification of multi-sensor remote sensing images using self-organizing feature maps and radial basis function networks, Proc. IGARSS, Hawaii, 2000.
28. Bruzzone, L. and Prieto, D., A technique for the selection of kernel-function parameters in RBF neural networks for classification of remote sensing images, IEEE Trans. Geosci. Remote Sens., 37(2), 1179–1184, 1999.
29. Zhang, X. and Chen, C.H., Independent component analysis by using joint cumulants and its application to remote sensing images, J. VLSI Signal Process. Syst., 37(2/3), 2004.
30. Chen, C.H., Ed., Digital Waveform Processing and Recognition, CRC Press, Boca Raton, FL, 1982.
31. Chen, C.H., Ed., Signal Processing Handbook, Marcel Dekker, New York, 1988.
32. Chen, C.H., Transform methods in remote sensing information processing, in Frontiers of Remote Sensing Information Processing, Chen, C.H., Ed., World Scientific Publishing, Singapore, 2003, Chap. 2.
33. Chen, C.H., Ed., Wavelet analysis and applications, in Frontiers of Remote Sensing Information Processing, World Scientific Publishing, Singapore, 2003, Chaps. 7–9, pp. 139–224.
34. Lee, J.B., Woodyatt, A.S., and Berman, M., Enhancement of high spectral resolution remote-sensing data by a noise-adjusted principal components transform, IEEE Trans. Geosci. Remote Sens., 28, 295–304, 1990.
35. Li, Y., Bian, Z., Yan, P., and Chang, T., Pattern recognition in geophysical signal processing and interpretation, in Handbook of Pattern Recognition and Computer Vision, 1st ed., Chen, C.H. et al., Eds., World Scientific Publishing, Singapore, 1993, Chap. 3.2.
36. Olson, R.G., Signal transient analysis and classification techniques, in Handbook of Pattern Recognition and Computer Vision, 1st ed., Chen, C.H., Ed., World Scientific Publishing, Singapore, 1993, Chap. 3.3.
37. Justice, J.H., Hawkins, D.J., and Wong, G., Multidimensional attribute analysis and pattern recognition for seismic interpretation, Pattern Recognition, 18(6), 391–407, 1985.
38. Chen, C.H., Neural networks in active sonar classification, Neural Networks in Ocean Environments, IEEE Conference, Washington, DC, 1991.
39. Richards, J., Remote sensing sensors: capabilities and information processing requirements, in Frontiers of Remote Sensing Information Processing, Chen, C.H., Ed., World Scientific Publishing, Singapore, 2003, Chap. 1.

3 A Universal Neural Network–Based Infrasound Event Classifier

Fredric M. Ham and Ranjan Acharyya

CONTENTS
3.1 Overview of Infrasound and Why Classify Infrasound Events?
3.2 Neural Networks for Infrasound Classification
3.3 Details of the Approach
3.3.1 Infrasound Data Collected for Training and Testing
3.3.2 Radial Basis Function Neural Networks
3.4 Data Preprocessing
3.4.1 Noise Filtering
3.4.2 Feature Extraction Process
3.4.3 Useful Definitions
3.4.4 Selection Process for the Optimal Number of Feature Vector Components
3.4.5 Optimal Output Threshold Values and 3-D ROC Curves
3.5 Simulation Results
3.6 Conclusions
Acknowledgments
References

3.1 Overview of Infrasound and Why Classify Infrasound Events?

Infrasound is a longitudinal pressure wave [1–4]. The characteristics of these waves are similar to audible acoustic waves but the frequency range is far below what the human ear can detect. The typical frequency range is from 0.01 to 10 Hz (Figure 3.1). Nature is an incredible creator of infrasonic signals that can emanate from sources such as volcano eruptions, earthquakes, severe weather, tsunamis, meteors (bolides), gravity waves, microbaroms (infrasound radiated from ocean waves), surf, mountain ranges (mountain associated waves), avalanches, and auroral waves, to name a few. Infrasound can also result from man-made events such as mining blasts, the space shuttle, high-speed aircraft, artillery fire, rockets, vehicles, and nuclear events. Because of relatively low atmospheric absorption at low frequencies, infrasound waves can travel long distances in the Earth's atmosphere and can be detected with sensitive ground-based sensors.

An integral part of the comprehensive nuclear test ban treaty (CTBT) international monitoring system (IMS) is an infrasound network system [3]. The goal is to have 60

FIGURE 3.1 Infrasound spectrum (0.01–10.0 Hz): gravity waves, mountain-associated waves, microbaroms, bolides, volcano events, impulsive events, and nuclear yields (1 kiloton, 1 megaton) shown against frequency.

infrasound arrays operational worldwide over the next several years. The main objective of the infrasound monitoring system is the detection and verification, localization, and classification of nuclear explosions as well as other infrasonic signals-of-interest (SOI). Detection refers to the problem of detecting an SOI in the presence of all other unwanted sources and noises. Localization deals with finding the origin of a source, and classification deals with the discrimination of different infrasound events of interest. This chapter concentrates on the classification part only.

3.2 Neural Networks for Infrasound Classification

Humans excel at the task of classifying patterns, and we all perform this task on a daily basis. Do we wear the checkered or the striped shirt today? We will probably select from a group of checkered shirts versus a group of striped shirts. The grouping process is carried out (probably at a near-subconscious level) by our ability to discriminate among all the shirts in our closet: we group the striped ones in the striped class and the checkered ones in the checkered class (without physically moving them around in the closet, only in our minds). However, if the closet is dimly lit, this creates a potential problem and diminishes our ability to make the right selection (that is, we are working in a "noisy" environment). When an artificial neural network is used for classification of patterns (or various "events"), the same problem exists with noise. Noise is everywhere. In general, a common problem associated with event classification (or detection and localization, for that matter) is environmental noise. In the infrasound problem, the distance between the source and the sensors is often relatively large (as opposed to regional infrasonic phenomena). Increases in the distance between sources and sensors heighten the environmental dependence of the signals. For example, the signal of an infrasonic event that takes place near an ocean may have significantly different characteristics compared with the same event occurring in a desert. A major contributor of noise for the signal near an ocean is microbaroms; as mentioned above, microbaroms are generated in the air by large ocean waves. One important characteristic of neural networks is their noise rejection capability [5]. This, and several other attributes, makes them highly desirable as classifiers.

A Universal Neural Network–Based Infrasound Event Classifier

3.3 Details of the Approach

Our approach to classifying infrasound events is based on a parallel bank neural network structure [6–10]. The basic architecture is shown in Figure 3.2. There are several reasons for using such an architecture; one very important advantage of dedicating one module to the classification of one event class is that the architecture is fault tolerant (i.e., if one module fails, the rest of the individual classifiers continue to function). The overall performance of the classifier is also enhanced when the parallel bank neural network classifier (PBNNC) architecture is used. Individual banks (or modules) within the classifier architecture are radial basis function neural networks (RBF NNs) [5]. Also, each classifier has its own dedicated preprocessor. Customized feature vectors are computed optimally for each classifier and are based on cepstral coefficients and a subset of their associated derivatives (differences) [11]; this will be explained in detail later. Each neural module is trained to classify one and only one class, and the module responsible for a given class is also trained not to recognize all other classes (negative reinforcement). During the training process, the output is set to a "1" for the correct class and a "0" for all signals associated with the other classes. When the training process is complete, the final output thresholds are set to optimal values based on a three-dimensional receiver operating characteristic (3-D ROC) curve for each of the neural modules (see Figure 3.2).

FIGURE 3.2 Basic parallel bank neural network classifier (PBNNC) architecture. The incoming infrasound signal feeds six parallel branches; each branch consists of a dedicated preprocessor and a class-specific RBF neural network with a binary (1/0) output whose optimum threshold is set by an ROC curve.
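The fault-tolerance argument above can be sketched in code: each class gets an independent module with its own preprocessor, network, and ROC-derived threshold, and a failed module simply drops out of the decision. This is an illustrative sketch only; the names (`Module`, `classify_event`) and the stand-in networks are not from the chapter.

```python
# Sketch of the parallel bank neural network classifier (PBNNC) idea:
# one independent binary classifier per event class.

class Module:
    def __init__(self, name, preprocess, network, threshold):
        self.name = name
        self.preprocess = preprocess   # class-specific feature extractor
        self.network = network         # trained RBF NN returning a score in [0, 1]
        self.threshold = threshold     # optimal value from the 3-D ROC curve

    def score(self, signal):
        try:
            return self.network(self.preprocess(signal))
        except Exception:
            return None  # a failed module does not take down the whole bank

def classify_event(signal, modules):
    """Return every class whose module fires; multiple (or zero) labels can occur."""
    labels = []
    for m in modules:
        s = m.score(signal)
        if s is not None and s >= m.threshold:
            labels.append(m.name)
    return labels
```

Note that nothing forces exactly one label: the same input may trigger several modules, which is exactly the "multiple classifications" situation accounted for later in the CCR definition.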

3.3.1 Infrasound Data Collected for Training and Testing

The data used for training and testing the individual networks are obtained from multiple infrasound arrays located in different geographical regions with different geometries. The six infrasound classes used in this study are shown in Table 3.1, and the various array geometries are shown in Figure 3.3(a) through Figure 3.3(e) [12,13]. Table 3.2 shows the various classes, along with the array numbers where the data were collected, and the associated sampling frequencies.

3.3.2 Radial Basis Function Neural Networks

As previously mentioned, each of the neural network modules in Figure 3.2 is an RBF NN. A brief overview of RBF NNs is given here; it is not meant to be an exhaustive discourse, only an introduction, and more details can be found in Refs. [5,14]. Earlier work on the RBF NN was carried out for handling multivariate interpolation problems [15,16]. More recently, RBF NNs have been used for probability density estimation [17–19] and approximation of smooth multivariate functions [20]. In principle, the RBF NN adjusts its weights so that the error between the actual and the desired responses is minimized relative to an optimization criterion through a defined learning algorithm [5]. Once trained, the network performs interpolation in the output vector space; hence the generalization property. Radial basis functions are one type of positive-definite kernel extensively used for multivariate interpolation and approximation. They can be used for problems of any dimension, the smoothness of the interpolants can be achieved to any desirable extent, and the structures of the interpolants are very simple. However, several challenges go along with these attributes: for example, an ill-conditioned linear system must often be solved, and the time and space complexity grow with the number of interpolation points. These problems can, however, be overcome.

The interpolation problem may be formulated as follows. Assume M distinct data points X = {x_1, ..., x_M}, and assume the data set is bounded in a region V (for a specific class). Each observed data point x ∈ ℝ^u (u is the dimension of the input space) may correspond to some function of x. Given the set of M points {x_i ∈ ℝ^u | i = 1, 2, ..., M} and a corresponding set of M real numbers {d_i ∈ ℝ | i = 1, 2, ..., M} (the desired outputs, or targets), find a function F: ℝ^u → ℝ that satisfies the interpolation condition

F(x_i) = d_i,  i = 1, 2, ..., M    (3.1)
TABLE 3.1
Infrasound Classes Used for Training and Testing

Class Number   Event                   No. SOI (n = 574)   No. SOI Used for Training (n = 351)   No. SOI Used for Testing (n = 223)
1              Vehicle                 8                   4                                     4
2              Artillery fire (ARTY)   264                 132                                   132
3              Jet                     12                  8                                     4
4              Missile                 24                  16                                    8
5              Rocket                  70                  45                                    25
6              Shuttle                 196                 146                                   50

FIGURE 3.3 Five different array geometries (arrays BP1, BP2, K8201, K8202, and K8203); each panel plots the four sensor positions (XDIS, YDIS) in meters.
TABLE 3.2
Array Numbers Associated with the Event Classes and the Sampling Frequencies Used to Collect the Data

Class Number   Event                   Array            Sampling Frequency, Hz
1              Vehicle                 K8201            100
2              Artillery fire (ARTY)   K8201; K8203     (K8201: Site 1 at 100, Site 2 at 50); 50
3              Jet                     K8201            50
4              Missile                 K8201; K8203     50; 50
5              Rocket                  BP1; BP2         100; 100
6              Shuttle                 BP2; BP103*      100; 50

* Array geometry not available.

Thus, all the points must pass through the interpolating surface. A radial basis function may be a special interpolating function of the form

F(x) = Σ_{i=1}^{M} w_i φ_i(‖x − x_i‖_2)    (3.2)

where φ(·) is known as the radial basis function and ‖·‖_2 denotes the Euclidean norm. In general, the data points x_i are the centers of the radial basis functions and are frequently written as c_i. One of the problems encountered when attempting to fit a function to data points is over-fitting, that is, the value of M is too large. Generally speaking, however, this is less of a problem with the RBF NN than it is with, for example, a multi-layer perceptron trained by backpropagation [5]. The RBF NN attempts to construct the hypersurface for a particular problem when given a limited number of data points.

Let us take another point of view concerning how an RBF NN constructs a hypersurface. Regularization theory [5,14] is applied to the construction of the hypersurface; a geometrical explanation follows. Consider a set of input data obtained from several events of a single class. The input data may be temporal signals or features obtained from these signals using an appropriate transformation. The input data are transformed by a nonlinear function in the hidden layer of the RBF NN, and each event then corresponds to a point in the feature space. Figure 3.4 depicts a two-dimensional (2-D) feature set, that is, the dimension of the output of the hidden layer of the RBF NN is two. In Figure 3.4, (a), (b), and (c) correspond to three separate events. The purpose here is to construct a surface (shown by the dotted line in Figure 3.4) such that the dotted region encompasses events of the same class. If the RBF network is to classify four different classes, there must be four different regions (four dotted contours), one for each class. Ideally, each of these regions should be separate with no overlap. However, because there is always a limited amount of observed data, perfect reconstruction of the hyperspace is not possible and it is inevitable that overlap will occur. To overcome this problem it is necessary to incorporate global information from V (i.e., the class space) in approximating the unknown hyperspace. One choice is to introduce a smoothness constraint on the targets. Mathematical details are not given here; for an in-depth development see Refs. [5,14].

FIGURE 3.4 Example of a two-dimensional feature set (Feature 1 versus Feature 2; points (a), (b), and (c) are three separate events enclosed by a dotted contour).

Let us now turn our attention to the actual RBF NN architecture and how the network is trained. In its basic form, the RBF NN has three layers: an input layer, one hidden layer, and one output layer. Referring to Figure 3.5, the source nodes (or the input components) make up the input layer. The hidden layer performs a nonlinear transformation of the input to the network (the radial basis functions residing in the hidden layer perform this transformation) and is generally of a higher dimension than the input. This nonlinear transformation of the input in the hidden layer may be viewed as a basis for the construction of the input in the transformed space; thus the term radial basis function. In Figure 3.5, the output of the RBF NN (i.e., at the output layer) is calculated according to

y_i = f_i(x) = Σ_{k=1}^{N} w_ik φ_k(x, c_k) = Σ_{k=1}^{N} w_ik φ_k(‖x − c_k‖_2),  i = 1, 2, ..., m (no. of outputs)    (3.3)

where x ∈ ℝ^{u×1} is the input vector, φ_k(·) is a (RBF) function that maps ℝ⁺ (the set of all positive real numbers) to ℝ (the field of real numbers), ‖·‖_2 denotes the Euclidean norm, w_ik are the weights in the output layer, N is the number of neurons in the hidden layer, and c_k ∈ ℝ^{u×1} are the RBF centers, which are selected based on the input vector space. The Euclidean distance between the center of each neuron in the hidden layer and the input to the network is computed. The output of a neuron in the hidden layer is a nonlinear

FIGURE 3.5 RBF NN architecture: input layer (x_1, ..., x_u), hidden layer (φ_1, ..., φ_N), and output layer with weight matrix W^T ∈ ℜ^{m×N} producing outputs y_1, ..., y_m.

function of this distance, and the output of the network is computed as a weighted sum of the hidden layer outputs. The functional form of the radial basis function, φ_k(·), can be any of the following:

• Linear function: φ(x) = x
• Cubic approximation: φ(x) = x^3
• Thin-plate-spline function: φ(x) = x^2 ln(x)
• Gaussian function: φ(x) = exp(−x^2/σ^2)
• Multi-quadratic function: φ(x) = √(x^2 + σ^2)
• Inverse multi-quadratic function: φ(x) = 1/√(x^2 + σ^2)

The parameter σ controls the "width" of the RBF and is commonly referred to as the spread parameter. In many practical applications the Gaussian RBF is used. The centers, c_k, of the Gaussian functions are points used to sample the input vector space. In general, the centers form a subset of the input data.
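The forward pass of Eq. (3.3) with Gaussian hidden units can be sketched as follows. This assumes the centers `C`, the output weights `W`, and the spread `sigma` have already been fitted; none of this is the chapter's training code.

```python
import numpy as np

# Forward pass of the three-layer RBF network of Eq. (3.3).
# C: (N, u) centers; W: (m, N) output weights; sigma: spread parameter.

def rbf_forward(x, C, W, sigma):
    dist = np.linalg.norm(C - x, axis=1)   # ||x - c_k||_2 for each center
    phi = np.exp(-dist**2 / sigma**2)      # Gaussian hidden-layer outputs
    return W @ phi                         # y_i = sum_k w_ik * phi_k
```

A larger `sigma` makes each hidden unit respond over a wider region of the input space, trading locality for smoother generalization.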

3.4 Data Preprocessing

3.4.1 Noise Filtering

Microbaroms, as previously defined, are a persistently present source of noise in most collected infrasound signals [21–23]. Microbaroms are a class of infrasonic signals characterized by narrow-band, nearly sinusoidal waveforms with a period between 6 and 8 sec. These signals can be generated by marine storms through a nonlinear interaction of surface waves [24]. The frequency content of the microbaroms often coincides with that of small-yield nuclear explosions. This can be bothersome in many applications; however, simple band-pass filtering can alleviate the problem in many cases. Therefore, a band-pass filter with a pass band between 1 and 49 Hz (for signals sampled at 100 Hz) is used here to eliminate the effects of the microbaroms. Figure 3.6 shows how band-pass filtering can be used to eliminate the microbarom problem.
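The microbarom-suppression step can be sketched with a standard filter design. The text only specifies the 1–49 Hz pass band at a 100 Hz sampling rate; the Butterworth family, the filter order, and the zero-phase application below are assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

# Band-pass filter passing 1-49 Hz for data sampled at 100 Hz,
# applied forward and backward (zero phase) in second-order sections.

def bandpass(signal, fs=100.0, low=1.0, high=49.0, order=4):
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)

# Example: a 0.15 Hz "microbarom-like" component riding on a 5 Hz signal.
fs = 100.0
t = np.arange(0, 30, 1 / fs)
x = np.sin(2 * np.pi * 5 * t) + 2 * np.sin(2 * np.pi * 0.15 * t)
y = bandpass(x, fs)   # the 0.15 Hz component is strongly attenuated
```

Microbaroms at 6–8 sec periods (0.125–0.167 Hz) fall well below the 1 Hz corner, so they are heavily attenuated while the pass band is left essentially untouched.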

3.4.2 Feature Extraction Process

Depicted in each of the six graphs in Figure 3.7 is a collection of eight signals from each class, that is, y_ij(t) for i = 1, 2, ..., 6 (classes) and j = 1, 2, ..., 8 (number of signals) (see Table 3.1 for the total number of signals in each class). A feature extraction process is desired that will capture the salient features of the signals in each class and at the same time be invariant to the array geometry, the geographical location of the array, the sampling frequency, and the length of the time window. The overall performance of the classifier is contingent on the data used to train the neural network in each of the six modules shown in Figure 3.2. Moreover, the neural networks' ability to distinguish between the various events (presented to the neural networks as feature vectors) depends on the distinctiveness of the features between the classes. Within each class, however, it is desirable to have the feature vectors as similar to each other as possible. There are two major questions to be answered: (1) What will cause the signals in one class to have markedly different characteristics? (2) What can be done to minimize these

FIGURE 3.6 Results of band-pass filtering to eliminate the effects of microbaroms (an artillery signal). The panels show the time-domain amplitude and the magnitude spectrum before and after filtering.

differences and achieve uniformity within a class and distinctly different feature vector characteristics between classes? The answer to the first question is quite simple: noise. This can be noise associated with the sensors, the data acquisition equipment, or other unwanted signals that are not of interest. The answer to the second question is also quite simple (once you know the answer): a feature extraction process based on computed cepstral coefficients and a subset of their associated derivatives (differences) [10,11,25–28]. As mentioned in Section 3.3, each classifier has its own dedicated preprocessor (see Figure 3.2). Customized feature vectors are computed optimally for each classifier (or neural module) and are based on the aforementioned cepstral coefficients and a subset of their associated derivatives (or differences). The preprocessing procedure is as follows. Each time-domain signal is first normalized, and then its mean value is computed and removed. Next, the power spectral density (PSD) is calculated for each signal, which is a mixture of the desired component and possibly other unwanted signals and noise. Therefore, when the PSDs are computed for a set of signals in a defined class, there will be spectral components associated with noise and other unwanted signals that need to be suppressed. This can be accomplished systematically by first computing the average PSD (i.e., PSD_avg) over the suite of PSDs for a particular class. The spectral components of PSD_avg are denoted m_i for i = 1, 2, .... The maximum spectral component, m_max, of PSD_avg is then determined. This is considered the dominant spectral component within a particular class, and its value is used to suppress selected components in the resident PSDs for any particular class according to the following rule:

if m_i > ε_1·m_max (typically ε_1 = 0.001), m_i is retained; otherwise m_i ← ε_2·m_i (typically ε_2 = 0.00001)
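The suppression rule above can be sketched directly. The function name is illustrative; the thresholds follow the typical values quoted in the text.

```python
import numpy as np

# Per-class spectral suppression: components below a fraction eps1 of the
# dominant component of the average PSD are scaled down by eps2.

def suppress(psd, eps1=0.001, eps2=0.00001):
    m = np.asarray(psd, dtype=float).copy()
    m_max = m.max()
    weak = m <= eps1 * m_max
    m[weak] *= eps2          # keep strong components, suppress the rest
    return m
```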

FIGURE 3.7 (See color insert following page 302.) Infrasound signals for the six classes: raw time-domain plots of eight signals each for (a) vehicle, (b) missile, (c) artillery, (d) rocket, (e) jet, and (f) shuttle.

To some extent, this will minimize the effects of any unwanted components that may reside in the signals and at the same time minimize the effects of noise. However, another step can be taken to further minimize the effects of any unwanted signals and noise that may reside in the data. This is based on a minimum variance criterion applied to the spectral components of the PSDs in a particular class after the previously described step is completed. The second step is carried out by taking the first 90% of the spectral components that are rank-ordered according to the smallest variance. The rest of the components


in the power spectral densities within a particular class are set to a small value, ε_3 (typically 0.00001). Therefore, the number of spectral components greater than ε_3 dictates the number of components in the cepstral domain (i.e., the number of cepstral coefficients and associated differences). Depending on the class, the number of coefficients and differences will vary. For example, in the simulations that were run, the largest number of components was 2401 (artillery class) and the smallest was 543 (vehicle class). Next, the mel-frequency scaling step is carried out with defined values for a and b [10]; then the inverse discrete cosine transform is taken and the derivatives (differences) are computed. From this set of computed cepstral coefficients and differences, it is desired to select those components that constitute a feature vector that is consistent within a particular class, that is, with minimal variation among similar components across the suite of feature vectors. So the approach taken here is to think in terms of minimum variance of these similar components within the feature set. Recall that the time-domain infrasound signals are assumed to be band-pass filtered to remove any effects of microbaroms, as described previously. For each discrete-time infrasound signal, y(k), where k is the discrete time index (an integer), the specific preprocessing steps are (dropping the time dependence k):

(1) Normalize: divide each sample of the signal y(k) by the absolute value of the maximum amplitude, |y_max|, and also by the square root of the computed variance of the signal, σ_y², and then remove the mean:

y ← y / {|y_max|·σ_y}    (3.4)

y ← y − mean(y)    (3.5)

(2) Compute the PSD, S_yy(k_ω), of the signal y:

S_yy(k_ω) = Σ_{τ=0}^{∞} R_yy(τ) e^{−j k_ω τ}    (3.6)

where R_yy(·) is the autocorrelation of the infrasound signal y.

(3) Find the average of the entire set of PSDs in the class, i.e., PSD_avg.

(4) Retain only those spectral components whose contributions will maximize the overall performance of the global classifier:

if m_i > ε_1·m_max (typically ε_1 = 0.001), m_i is retained; otherwise m_i ← ε_2·m_i (typically ε_2 = 0.00001)

(5) Compute the variances of the components selected in Step (4). Then take the first 90% of the spectral components, rank-ordered according to the smallest variance. Set the remaining components to a small value, ε_3 (typically 0.00001).

(6) Apply mel-frequency scaling to S_yy(k_ω):

S_mel(k_ω) = a·log_e[b·S_yy(k_ω)]    (3.7)

where a = 11.25 and b = 0.03.

(7) Take the inverse discrete cosine transform:

x_mel(n) = (1/N) Σ_{k_ω=0}^{N−1} S_mel(k_ω) cos(2π k_ω n / N),  for n = 0, 1, 2, ..., N−1    (3.8)

(8) Take the consecutive differences of the sequence x_mel(n) to obtain x′_mel(n).

(9) Concatenate the sequence of differences, x′_mel(n), with the cepstral coefficient sequence, x_mel(n), to form the augmented sequence

x^a_mel = [x′_mel(i) | x_mel(j)]    (3.9)

where i and j are determined experimentally (here, i = 400 and j = 600).

(10) Take the absolute value of the elements in the sequence x^a_mel, yielding

x^a_mel,abs = |x^a_mel|    (3.10)

(11) Take the log_e of x^a_mel,abs from the previous step to give

x^a_mel,abs,log = log_e[x^a_mel,abs]    (3.11)
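The 11 steps can be condensed into a sketch. This is a simplified stand-in, not the authors' code: the PSD is a plain periodogram rather than Eq. (3.6), the class-dependent suppression of Steps (3)–(5) is replaced by a simple floor (a single signal cannot supply a class-average PSD), and a small offset is added before the final logarithm to keep it finite.

```python
import numpy as np

# Condensed sketch of the cepstral feature extraction of Eqs. (3.4)-(3.11).

def cepstral_features(y, a=11.25, b=0.03, i_diff=400, j_cep=600):
    y = y / (np.abs(y).max() * y.std())           # (3.4) normalize
    y = y - y.mean()                              # (3.5) remove mean
    S = np.abs(np.fft.rfft(y)) ** 2               # (3.6) PSD estimate (periodogram)
    S = np.maximum(S, 1e-5)                       # floor, stands in for Steps (3)-(5)
    S_mel = a * np.log(b * S)                     # (3.7) mel-style log scaling
    N = len(S_mel)
    x_mel = np.array([(S_mel * np.cos(2 * np.pi * np.arange(N) * n / N)).sum() / N
                      for n in range(N)])         # (3.8) inverse DCT
    dx = np.diff(x_mel)                           # (8) consecutive differences
    x_aug = np.concatenate([dx[:i_diff], x_mel[:j_cep]])   # (3.9) concatenate
    return np.log(np.abs(x_aug) + 1e-12)          # (3.10)-(3.11), small offset added
```

With i = 400 and j = 600 as in the text, the augmented sequence has 1000 elements; Section 3.4.4 then selects the 34 best components from such sequences.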

Applying this 11-step feature extraction process to the infrasound signals in the six different classes results in the feature vectors shown in Figure 3.8. The length of each feature vector is 34; this is explained in the next section. If these sets of feature vectors are compared with their time-domain counterparts (see Figure 3.7), it is obvious that the feature extraction process produces feature vectors that are much more consistent than the time-domain signals. Moreover, comparing the feature vectors between classes reveals that the different sets are markedly distinct. This should result in improved classification performance.

3.4.3 Useful Definitions

Before we go on, let us define some useful quantities that apply to the assessment of classifier performance. The confusion matrix [29] for a two-class classifier is shown in Table 3.3. In Table 3.3 we have the following:

p: number of correct predictions that an occurrence is positive
q: number of incorrect predictions for an occurrence that is positive
r: number of incorrect predictions for an occurrence that is negative
s: number of correct predictions that an occurrence is negative

With this, the correct classification rate (CCR) is defined as

CCR = (No. correct predictions − No. multiple classifications) / No. predictions = (p + s − No. multiple classifications) / (p + q + r + s)    (3.12)

FIGURE 3.8 (See color insert following page 302.) Feature sets for the six different classes: (a) vehicle, (b) missile, (c) artillery fire, (d) rocket, (e) jet, and (f) shuttle (amplitude versus feature number).

Multiple classifications refer to more than one of the neural modules showing a "positive" at the output of its RBF NN, indicating that the input to the global classifier belongs to more than one class (whether this is true or not). So there could be double, triple, quadruple, etc., classifications for one event. The accuracy (ACC) is given by

ACC = No. correct predictions / No. predictions = (p + s) / (p + q + r + s)    (3.13)

TABLE 3.3
Confusion Matrix for a Two-Class Classifier

                    Predicted Value
Actual Value        Positive        Negative
Positive            p               q
Negative            r               s
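Under one reading of Eq. (3.12) above (correct predictions minus multiple classifications, over total predictions), the performance measures built from the Table 3.3 counts can be sketched as follows; the function names are illustrative.

```python
# Performance measures from the two-class confusion counts p, q, r, s.
# With no multiple classifications, CCR reduces to ACC.

def acc(p, q, r, s):
    return (p + s) / (p + q + r + s)

def ccr(p, q, r, s, n_multiple=0):
    return (p + s - n_multiple) / (p + q + r + s)
```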

As seen from Equation 3.12 and Equation 3.13, if multiple classifications occur, the CCR is a more conservative performance measure than the ACC; if no multiple classifications occur, CCR = ACC. The true positive (TP) rate is the proportion of positive cases that are correctly identified, computed as

TP = p / (p + q)    (3.14)

The false positive (FP) rate is the proportion of negative cases that are incorrectly classified as positive, computed as

FP = r / (r + s)    (3.15)

3.4.4 Selection Process for the Optimal Number of Feature Vector Components

From the set of computed cepstral coefficients and differences generated by the feature extraction process given above, an optimal subset is desired that will constitute the feature vectors used to train and test the PBNNC shown in Figure 3.2. The optimal subset (i.e., the optimal feature vector length) is determined by taking a minimum-variance approach. Specifically, a 3-D graph is generated that plots the performance, that is, CCR versus the RBF NN spread parameter and the feature vector number (see Figure 3.9). From this graph, mean values and variances are computed across the range of spread parameters for each of the defined numbers of components in the feature vector. The selection criterion is to maximize the mean while minimizing the variance: maximizing the mean ensures maximum performance (i.e., maximizes the CCR), while minimizing the variance minimizes variation in the feature set within each of the classes. The output threshold at each of the neural modules (i.e., the output of the single output neuron of each RBF NN) is set optimally according to a 3-D ROC curve; this is explained next. Figure 3.10 shows the two plots used to determine the maximum mean and the minimum variance. The table insert between the two graphs shows that even though the mean value for 40 elements in the feature vector is (slightly) larger than that for 34 elements, the variance for 40 is nearly three times that for 34 elements. Therefore, a length of 34 elements for the feature vectors is the best choice.
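The selection rule above can be sketched in code. The exact trade-off between high mean and low variance is not fully specified in the text; the tolerance used below to shortlist near-maximal means is an assumption for illustration.

```python
import numpy as np

# Pick the feature-vector length from a grid of CCR values
# (rows = spread parameter values, cols = candidate feature numbers):
# shortlist columns with near-maximal mean CCR, then take the one
# with minimal variance across spreads.

def pick_feature_length(ccr_grid, feature_numbers, mean_tol=3.0):
    means = ccr_grid.mean(axis=0)
    varis = ccr_grid.var(axis=0)
    candidates = np.where(means >= means.max() - mean_tol)[0]  # tolerance assumed
    best = candidates[np.argmin(varis[candidates])]
    return feature_numbers[best]
```

This reproduces the behavior of the table insert described in the text: a slightly lower mean is accepted when it comes with a much smaller variance (34 features chosen over 40).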

3.4.5 Optimal Output Threshold Values and 3-D ROC Curves

At the output of the RBF NN for each of the six neural modules, there is a single output neuron with hard-limiting binary values used during the training process (see Figure 3.2). After training, to determine whether a particular SOI belongs to one of the six classes, the

FIGURE 3.9 (See color insert following page 302.) Performance (CCR) 3-D plot, generated using the ROC curve, versus the RBF spread parameter and the feature number; used to determine the optimal number of components in the feature vector. Ill-conditioning occurs for feature numbers less than 10, and for feature numbers greater than 60 the CCR dramatically declines.

threshold value of the output neurons is optimally set according to an ROC curve [30–32] for that individual neural module (i.e., one particular class). Before an explanation of the 3-D ROC curve is given, let us first review 2-D ROC curves and see how they are used to optimally set threshold values. An ROC curve is a plot of the TP rate versus the FP rate, or the sensitivity versus (1 – specificity); a sample ROC curve is shown in Figure 3.11. The optimal threshold value corresponds to a point nearest the ideal point (0, 1) on the graph. The point (0, 1) is considered ideal because in this case there would be no false positives and only true positives. However, because of noise and other undesirable effects in the data, the point closest to the (0, 1) point (i.e., the minimum Euclidean distance) is the best that we can do. This will then dictate the optimal threshold value to be used at the output of the RBF NN. Since there are six classifiers, that is, six neural modules in the global classifier, six ROC curves must be generated. However, using 2-D ROC curves to set the thresholds at the outputs of the six RBF NN classifiers will not result in optimal thresholds. This is because misclassifications are not taken into account when setting the threshold for a particular neural module that is responsible for classifying a particular set of infrasound signals. Recall that one neural module is associated with one infrasonic class, and each neural module acts as its own classifier. Therefore, it is necessary to account for the misclassifications that can occur and this can be accomplished by adding a third dimension to the ROC curve. When the misclassifications are taken into account the (0, 1, 0) point now becomes the optimal point, and the smallest Euclidean distance to this point is directly related to

Signal and Image Processing for Remote Sensing

[Figure 3.10 plot area: two panels, performance mean and performance variance versus feature number (10 to 60), each highlighting the 34-feature point. An inset table lists feature number 34 (mean 66.9596, variance 47.9553) and feature number 40 (mean 69.2735, variance 122.834).]

FIGURE 3.10 Performance means and performance variances versus feature number used to determine the optimal length of the feature vector.

A Universal Neural Network–Based Infrasound Event Classifier


[Figure 3.11 plot area: TP (sensitivity) versus FP (1 − specificity), showing the ROC curve and the minimum-Euclidean-distance point.]

FIGURE 3.11 Example ROC curve.

the optimal threshold value for each neural module. Figure 3.12 shows the six 3-D ROC curves associated with the classifiers.
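The minimum-distance rule described above is simple to state in code. The sketch below is not from the chapter: the candidate thresholds, rates, and misclassification values are hypothetical illustration values, and the helper names are invented.

```python
import math

# Hedged sketch of threshold selection from an ROC curve: the optimal
# threshold minimizes the Euclidean distance to the ideal point, which is
# (0, 1) in 2-D and (0, 1, 0) in 3-D.  All numbers below are hypothetical.

def best_threshold_2d(candidates):
    """candidates: list of (threshold, fp_rate, tp_rate)."""
    return min(candidates,
               key=lambda c: math.sqrt(c[1] ** 2 + (1.0 - c[2]) ** 2))[0]

def best_threshold_3d(candidates):
    """candidates: list of (threshold, fp_rate, tp_rate, misclassification)."""
    return min(candidates,
               key=lambda c: math.sqrt(c[1] ** 2 + (1.0 - c[2]) ** 2
                                       + c[3] ** 2))[0]

roc_2d = [(0.3, 0.40, 0.95), (0.5, 0.05, 0.90), (0.7, 0.01, 0.60)]
roc_3d = [(0.3, 0.40, 0.95, 0.30), (0.5, 0.05, 0.90, 0.25),
          (0.7, 0.01, 0.60, 0.02)]
print(best_threshold_2d(roc_2d))  # 0.5, the point nearest (0, 1)
print(best_threshold_3d(roc_3d))
```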

3.5 Simulation Results

The four basic parameters to be optimized in training the neural network classifier (i.e., the bank of six RBF NNs) are the RBF NN spread parameters, the output thresholds of each neural module, the combination of 34 components in the feature vectors for each class (note again in Figure 3.2 that each neural module has its own custom preprocessor), and, of course, the resulting weights of each RBF NN. The MATLAB neural networks toolbox was used to design the six RBF NNs [33]. Table 3.1 shows the specific classes and the associated number of signals used to train and test the RBF NNs. Of the 574 infrasound signals, 351 were used for training and the remaining 223 were used for testing. The criterion used to divide the data between the training and testing sets was to maintain independence; hence, the four array signals from any one event are always kept together, either in the training set or the test set. After the optimal number of components for each feature vector (i.e., 34 elements) and the optimal combination of those 34 components for each preprocessor were determined, the optimal RBF spread parameters were determined along with the optimal threshold values (the six graphs in Figure 3.12 were used for this purpose). For both the RBF spread parameters and the output thresholds, the selection criterion is to maximize the CCR of the local network and the overall (global) classifier CCR. The RBF spread parameter and the output threshold for each neural module were determined one by one by fixing the spread parameter, i.e., s, for all other neural modules to 0.3 and holding their threshold values at 0.5. Once the first neural module’s spread


[Figure 3.12 plot area: six 3-D ROC surfaces plotting misclassification against true-positive and false-positive rates; panels (a) vehicle, (b) missile, (c) artillery, (d) rocket, (e) jet, and (f) shuttle.]

FIGURE 3.12 3-D ROC curves for the six classes.

parameter and threshold are determined, the spread parameter and output threshold of the second neural module are computed while holding all other neural modules’ (except the first one) spread parameters and output thresholds fixed at 0.3 and 0.5, respectively. Table 3.4 gives the final values of the spread parameter and the output threshold for the global classifier. Figure 3.13 shows the classifier architecture with the final values indicated for the RBF NN spread parameters and the output thresholds. Table 3.5 shows the confusion matrix for the six-class classifier. Concentrating on the 6 × 6 portion of the matrix for the defined classes, the diagonal elements correspond to


TABLE 3.4 Spread Parameter and Threshold of Six-Class Classifier

Class       Spread Parameter   Threshold Value   True Positive   False Positive
Vehicle           0.2              0.3144            0.5              0
Artillery         2.2              0.6770            0.9621           0.0330
Jet               0.3              0.6921            0.5              0
Missile           1.8              0.9221            1                0
Rocket            0.2              0.4446            0.9600           0.0202
Shuttle           0.3              0.6170            0.9600           0.0289

the correct predictions. The trace of this 6 × 6 matrix divided by the total number of signals tested (i.e., 223) gives the accuracy of the global classifier. The formula for the accuracy is given in Equation 3.13, and here ACC = 94.6%. The off-diagonal elements indicate the misclassifications that occurred, and those in parentheses indicate double classifications (i.e., the actual class was identified correctly, but the output threshold for another class was also exceeded). The off-diagonal element that is in square

[Figure 3.13 diagram: the infrasound signal feeds six parallel branches; each branch is a custom preprocessor followed by the RBF neural network for its class (spread parameters s1 = 0.2, s2 = 2.2, s3 = 0.3, s4 = 1.8, s5 = 0.2, s6 = 0.3) and a binary (1/0) output threshold set by that class's ROC curve.]

FIGURE 3.13 Parallel bank neural network classifier architecture with the final values for the RBF spread parameters and the output threshold values.

TABLE 3.5 Confusion Matrix for the Six-Class Classifier

                               Predicted Value
Actual Value   Vehicle  Artillery  Jet  Missile  Rocket  Shuttle  Unclassified  Total (223)
Vehicle           2        (1)      0      0        0       0          2             4
Artillery         0        127      0      0        0       0          5           132
Jet               0         0       2      0        0       0          2             4
Missile           0         0       0      8        0       0          0             8
Rocket            0         0       0      0       24      (5)         1            25
Shuttle           0      (1)[1]     0      0      1(3)     48          1            50

brackets is a double misclassification, that is, this event is misclassified along with another misclassified event (a shuttle event that is misclassified as both a "rocket" event and an "artillery" event). Table 3.6 shows the final global classifier results, giving both the CCR (see Equation 3.12) and the ACC (see Equation 3.13). Simulations were also run using "bi-polar" outputs instead of binary outputs; for bi-polar outputs, the output is bounded between −1 and +1 instead of 0 and 1. As can be seen from the table, the binary case yielded the best results. Finally, Table 3.7 shows the results for the case where the threshold levels on the outputs of the individual RBF NNs are ignored and only the output with the largest value is taken as the "winner," that is, "winner-takes-all"; the winning output indicates the class to which the input SOI belongs. It should be noted that even though the CCR shows a higher level of performance for the winner-takes-all approach, it is probably not a viable method for classification: if multiple events truly occurred simultaneously, they would never be indicated as such using this approach.
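The accuracy figure quoted above can be checked directly against Table 3.5. The snippet below uses only numbers printed in the table; since Equation 3.13 itself is not reproduced in this excerpt, ACC is taken simply as the trace of the confusion matrix divided by the total number of test signals.

```python
# Accuracy check for Table 3.5: ACC is the trace of the 6 x 6 confusion
# matrix (the correct predictions on the diagonal) divided by the number of
# test signals.  Parenthesized double classifications are excluded, as in
# the text.

diagonal = [2, 127, 2, 8, 24, 48]  # vehicle, artillery, jet, missile, rocket, shuttle
total_tested = 223

acc = sum(diagonal) / total_tested
print(f"ACC = {100 * acc:.1f}%")  # ACC = 94.6%, matching the text
```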

TABLE 3.6 Six-Class Classification Result Using a Threshold Value for Each Network

Performance Type   Binary Outputs (%)   Bi-Polar Outputs (%)
CCR                      90.1                 88.38
ACC                      94.6                 92.8

TABLE 3.7 Six-Class Classification Result Using "Winner-Takes-All"

Performance Type   Binary Method (%)   Bi-Polar Method (%)
CCR                     93.7                92.4
ACC                     93.7                92.4

3.6 Conclusions

Radial basis function neural networks were used to classify six different infrasound events. The classifier was built with a parallel structure of neural modules, each responsible for classifying one and only one infrasound event, referred to as the PBNNC architecture. The overall accuracy of the classifier was found to be greater than 90% using the CCR performance criterion. A feature-extraction technique was employed that had a major impact on increasing the classification performance over most other methods that have been tried in the past. Receiver operating characteristic curves were also employed to optimally set the output thresholds of the individual neural modules in the PBNNC architecture; this too contributed to the increased performance of the global classifier. Finally, optimizing the individual spread parameters of the RBF NNs further increased the overall classifier performance.

Acknowledgments The authors would like to thank Dr. Steve Tenney and Dr. Duong Tran-Luu from the Army Research Laboratory for their support of this work. The authors also thank Dr. Kamel Rekab, University of Missouri–Kansas City, and Mr. Young-Chan Lee, Florida Institute of Technology, for their comments and insight.

References

1. Pierce, A.D., Acoustics: An Introduction to Its Physical Principles and Applications, Acoustical Society of America Publications, Sewickley, PA, 1989.
2. Valentina, V.N., Microseismic and Infrasound Waves, Research Reports in Physics, Springer-Verlag, New York, 1992.
3. National Research Council, Comprehensive Nuclear Test Ban Treaty Monitoring, National Academy Press, Washington, DC, 1997.
4. Bedard, A.J. and Georges, T.M., Atmospheric infrasound, Physics Today, 53(3), 32–37, 2000.
5. Ham, F.M. and Kostanic, I., Principles of Neurocomputing for Science and Engineering, McGraw-Hill, New York, 2001.
6. Torres, H.M. and Rufiner, H.L., Automatic speaker identification by means of mel cepstrum, wavelets and wavelet packets, In Proc. 22nd Annu. EMBS Int. Conf., Chicago, IL, July 23–28, 2000, pp. 978–981.
7. Foo, S.W. and Lim, E.G., Speaker recognition using adaptively boosted classifier, In Proc. IEEE Region 10th Int. Conf. Electr. Electron. Technol. (TENCON 2001), Singapore, Aug. 19–22, 2001, pp. 442–446.
8. Moonasar, V. and Venayagamoorthy, G., A committee of neural networks for automatic speaker recognition (ASR) systems, IJCNN, Vol. 4, Washington, DC, 2001, pp. 2936–2940.
9. Inal, M. and Fatihoglu, Y.S., Self-organizing map and associative memory model hybrid classifier for speaker recognition, 6th Seminar on Neural Network Applications in Electrical Engineering, NEUREL-2002, Belgrade, Yugoslavia, Sept. 26–28, 2002, pp. 71–74.
10. Ham, F.M., Rekab, K., Park, S., Acharyya, R., and Lee, Y.-C., Classification of infrasound events using radial basis function neural networks, Special Session: Applications of learning and data-driven methods to earth sciences and climate modeling, Proc. Int. Joint Conf. Neural Networks, Montréal, Québec, Canada, July 31–August 4, 2005, pp. 2649–2654.

11. Mammone, R.J., Zhang, X., and Ramachandran, R.P., Robust speaker recognition: a feature-based approach, IEEE Signal Process. Mag., 13(5), 58–71, 1996.
12. Noble, J., Tenney, S.M., Whitaker, R.W., and ReVelle, D.O., Event detection from small aperture arrays, U.S. Army Research Laboratory and Los Alamos National Laboratory Report, September 2002.
13. Tenney, S.M., Noble, J., Whitaker, R.W., and Sandoval, T., Infrasonic SCUD-B launch signatures, U.S. Army Research Laboratory and Los Alamos National Laboratory Report, October 2003.
14. Haykin, S., Neural Networks: A Comprehensive Foundation, 2nd ed., Prentice-Hall, Upper Saddle River, NJ, 1999.
15. Powell, M.J.D., Radial basis functions for multivariable interpolation: a review, IMA Conference on Algorithms for the Approximation of Function and Data, RMCS, Shrivenham, U.K., 1985, pp. 143–167.
16. Powell, M.J.D., Radial basis functions for multivariable interpolation: a review, Algorithms for the Approximation of Function and Data, Mason, J.C. and Cox, M.G., Eds., Clarendon Press, Oxford, U.K., 1987.
17. Parzen, E., On estimation of a probability density function and mode, Ann. Math. Stat., 33, 1065–1076, 1962.
18. Duda, R.O. and Hart, P.E., Pattern Classification and Scene Analysis, John Wiley & Sons, New York, 1973.
19. Specht, D.F., Probabilistic neural networks, Neural Networks, 3(1), 109–118, 1990.
20. Poggio, T. and Girosi, F., A Theory of Networks for Approximation and Learning, A.I. Memo 1140, MIT Press, Cambridge, MA, 1989.
21. Wilson, C.R. and Forbes, R.B., Infrasonic waves from Alaskan volcanic eruptions, J. Geophys. Res., 74, 4511–4522, 1969.
22. Wilson, C.R., Olson, J.V., and Richards, R., Library of typical infrasonic signals, Report prepared for ENSCO (subcontract no. 269343–2360.009), Vols. 1–4, 1996.
23. Bedard, A.J., Infrasound originating near mountain regions in Colorado, J. Appl. Meteorol., 17, 1014, 1978.
24. Olson, J.V. and Szuberla, C.A.L., Distribution of wave packet sizes in microbarom wave trains observed in Alaska, J. Acoust. Soc. Am., 117(3), 1032–1037, 2005.
25. Ham, F.M., Leeney, T.A., Canady, H.M., and Wheeler, J.C., Discrimination of volcano activity using infrasonic data and a backpropagation neural network, In Proc. SPIE Conf. Appl. Sci. Computat. Intell. II, Priddy, K.L., Keller, P.E., Fogel, D.B., and Bezdek, J.C., Eds., Orlando, FL, 1999, Vol. 3722, pp. 344–356.
26. Ham, F.M., Leeney, T.A., Canady, H.M., and Wheeler, J.C., An infrasonic event neural network classifier, In Proc. 1999 Int. Joint Conf. Neural Networks, Session 10.7, Paper No. 77, Washington, DC, July 10–16, 1999.
27. Ham, F.M., Neural network classification of infrasound events using multiple array data, International Infrasound Workshop 2001, Kailua-Kona, Hawaii, 2001.
28. Ham, F.M. and Park, S., A robust neural network classifier for infrasound events using multiple array data, In Proc. 2002 World Congress on Computational Intelligence—International Joint Conference on Neural Networks, Honolulu, Hawaii, May 12–17, 2002, pp. 2615–2619.
29. Kohavi, R. and Provost, F., Glossary of terms, Mach. Learn., 30(2/3), 271–274, 1998.
30. McDonough, R.N. and Whalen, A.D., Detection of Signals in Noise, 2nd ed., Academic Press, San Diego, CA, 1995.
31. Smith, S.W., The Scientist and Engineer's Guide to Digital Signal Processing, California Technical Publishing, San Diego, CA, 1997.
32. Hanley, J.A. and McNeil, B.J., The meaning and use of the area under a receiver operating characteristic (ROC) curve, Radiology, 143(1), 29–36, 1982.
33. Demuth, H. and Beale, M., Neural Network Toolbox for Use with MATLAB, The MathWorks, Inc., Natick, MA, 1998.

4 Construction of Seismic Images by Ray Tracing

Enders A. Robinson

CONTENTS
4.1 Introduction
4.2 Acquisition and Interpretation
4.3 Digital Seismic Processing
4.4 Imaging by Seismic Processing
4.5 Iterative Improvement
4.6 Migration in the Case of Constant Velocity
4.7 Implementation of Migration
4.8 Seismic Rays
4.9 The Ray Equations
4.10 Numerical Ray Tracing
4.11 Conclusions
References

4.1 Introduction

Reflection seismology is a method of remote imaging used in the exploration of petroleum. The seismic reflection method was developed in the 1920s. Initially, the source was a dynamite explosion set off in a shallow hole drilled into the ground, and the receiver was a geophone planted on the ground. In difficult areas, a single source would refer to an array of dynamite charges around a central point, called the source point, and a receiver would refer to an array of geophones around a central point, called the receiver point. The received waves were recorded on photographic paper on a drum. The developed paper was the seismic record or seismogram. Each receiver accounted for a single wiggly line on the record, which is called a seismic trace or simply a trace. In other words, a seismic trace is a signal (or time series) received at a specific receiver location from a specific source location. The recordings were taken for a time span starting at the time of the shot (called time zero) until about three or four seconds after the shot. In the early days, a seismic crew would record about 10 or 20 seismic records per day, with a dozen or two traces on each record. Figure 4.1 shows a seismic record with wiggly lines as traces. Seismic crew number 23 of the Atlantic Refining Company shot the record on October 9, 1952. As written on the record, the traces were shot with a source that was laid out as a 36-hole circular array. The first circle in the array had a diameter of 130 feet with 6 holes, each hole loaded with 10 lbs of dynamite. The second circle in the array had a diameter of 215 feet with 11 holes (which should have been 12 holes, but one hole was missing), each


FIGURE 4.1 Seismic record taken in 1952.

hole loaded with 10 lbs of dynamite. The third circle in the array had a diameter of 300 feet with 18 holes, each hole loaded with 5 lbs of dynamite. Each charge was at a depth of 20 feet. The total charge was thus 260 lbs of dynamite, which is a large amount of dynamite for a single seismic record. The receiver for each trace was made up of a group of 24 geophones (also called seismometers) in a circular array with 6 geophones on each of 4 circles of diameters 50 feet, 150 feet, 225 feet, and 300 feet, respectively. There was a 300-foot gap between group centers (i.e., receiver points). This record is called an NR seismogram. The purpose of the elaborate source and receiver arrays was an effort to bring out visible reflections on the record. The effort was fruitless. Regions where geophysicists can see no reflections on the raw records are termed no-record or no-reflection (NR) areas. The region in which this record was taken, like the great majority of possible oil-bearing regions in the world, was an NR area. In such areas, the seismic method (before digital processing) failed, and hence wildcat wells had to be drilled based on surface geology and a lot of guesswork. There was a very low rate of discovery in the NR regions. Because of the tremendous cost of drilling to great depths, there was little chance that any oil would ever be discovered. The outlook for oil was bleak in the 1950s.

4.2 Acquisition and Interpretation

From the years of its inception up to about 1965, the seismic method involved two steps, namely acquisition and interpretation. Acquisition refers to the generation and recording of seismic data. Sources and receivers are laid out on the surface of the Earth. The objective is to probe the unknown structure below the surface. The sources are made up of vibrators (called vibroseis), dynamite shots, or air guns. The receivers are geophones on land and hydrophones at sea. The sources are activated one at a time, not all together. Suppose a


single source is activated, the resulting seismic waves travel from the source into the Earth. The waves pass down through sedimentary rock strata, from which the waves are reflected upward. A reflector is an interface between layers of contrasting physical properties. A reflector might represent a change in lithology, a fault, or an unconformity. The reflected energy returns to the surface, where it is recorded. For each source activated, there are many receivers surrounding the source point. Each recorded signal, called a seismic trace, is associated with a particular source point and a particular receiver point. The traces, as recorded, are referred to as the raw traces. A raw trace contains all the received events. These events are produced by the subsurface structure of the Earth. The events due to primary reflections are wanted; all the other events are unwanted. Interpretation was the next step after acquisition. Each seismic record was examined through the eye and the primary reflections that could be seen were marked by a pencil. A primary reflection is an event that represents a passage from the source to the depth point, and then a passage directly back to the receiver (Figure 4.2). At a reflection, the traces become coherent; that is, they come into phase with each other. In other words, at a reflection, the crests and troughs on adjacent traces appear to fit into one another. The arrival time of a reflection indicates the depth of the reflecting horizon below the surface, while the time differential (the so-called step-out time) in the arrivals of a given peak or trough at successive receiver positions provides information on the dip of the reflecting horizon. In favorable areas, it is possible to follow the same reflection over a distance much greater than that covered by the receiver spread for a single record. In such cases, the records are placed side-by-side. 
The reflection from the last trace of one record correlates with the first trace of the next record. Such a correlation can be continued on successive records as long as the reflection persists. In areas of rapid structural change, the ensemble of raw traces is unable to show the true geometry of subsurface structures. In some cases, it is possible to identify an isolated structure such as a fault or a syncline on the basis of its characteristic reflection pattern. In NR regions, the raw record section does not give a usable image of the subsurface at all. Seismic wave propagation in three dimensions is a complicated process. The rock layers absorb, reflect, refract, or scatter the waves. Inside the different layers, the waves propagate at different velocities. The waves are reflected and refracted at the interfaces between the layers. Only elementary geometries can be treated exactly in three dimensions. If the reflecting interfaces are horizontal (or nearly so), the waves going straight down will be reflected nearly straight up. Thus, the wave motion is essentially vertical. If the time axes on the records are placed in the vertical position, time appears in the same direction as the raypaths. By using the correct wave velocity, the time axis can be converted into the depth axis. The result is that the primary reflections show the locations

FIGURE 4.2 Primary reflection: the raypath runs from the source point S down to the depth point D and back up to the receiver point R.


of the reflecting interfaces. Thus, in areas that have nearly level reflecting horizons, the primary reflections, as recorded, essentially show the correct depth positions of the subsurface interfaces. However, in areas that have a more complicated subsurface structure, the primary reflections as recorded in time do not occur at the correct depth positions in space. As a result, the primary reflections have to be moved (or migrated) to their proper spatial positions. In areas of complex geology, it is necessary to move (or migrate) the energy of each primary reflection to the spatial position of the reflecting point. The method is similar to that used in Huygens’s construction. Huygens articulated the principle that every point of a wavefront can be regarded as the origin of a secondary spherical wave, and the envelope of all these secondary waves constitutes the propagated wavefront. In the predigital days, migration was carried out by straightedge and compass or by a special-purpose hand-manipulated drawing machine on a large sheet of graph paper. The arrival times of the observed reflections were marked on the seismic records. These times are the two-way traveltimes from the source point to the receiver point. If the source and the receiver were at the same point (i.e., coincident), then the raypath down would be the same as the raypath up. In such a case, the one-way time is one half of the two-way time. From the two-way traveltime data, such a one-way time was estimated for each source point. This one-way time was multiplied by an estimated seismic velocity. The travel distance to the interface was thus obtained. A circle was drawn with the surface point as center and the travel distance as radius. This process was repeated for the other source points. In Huygens’s construction, the envelope of the spherical secondary waves gives the new wavefront. In a similar manner, in the seismic case, the envelope of the circles gives the reflecting interface.
This method of migration was done in 1921 in the first reflection seismic survey ever taken.
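The circle construction lends itself to a short numerical sketch. The estimated velocity and the picked two-way traveltimes below are hypothetical illustration values, not data from the 1921 survey.

```python
# Numerical version of the straightedge-and-compass migration described
# above: for each source point, half the two-way traveltime multiplied by
# an estimated velocity gives the radius of the circle drawn about that
# point; the reflecting interface is the envelope of the circles.

velocity = 2500.0  # m/s, estimated seismic velocity (hypothetical)

# (surface x-position of source point [m], picked two-way traveltime [s])
picks = [(0.0, 1.60), (300.0, 1.58), (600.0, 1.60)]

# One-way time is half the two-way time; radius = velocity * one-way time.
circles = [(x, velocity * t / 2.0) for x, t in picks]
for x, radius in circles:
    print(f"circle centre x = {x:6.1f} m, radius = {radius:7.1f} m")
```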

4.3 Digital Seismic Processing

Historically, most seismic work fell under the category of two-dimensional (2D) imaging. In such cases, the source positions and the receiver positions are placed on a horizontal surface line called the x-axis. The time axis is a vertical line called the t-axis. Each source would produce many traces—one trace for each receiver position on the x-axis. The waves that make up each trace take a great variety of paths, each requiring a different time to travel from the source to receiver. Some waves are refracted and others scattered. Some waves travel along the surface of the Earth, and others are reflected upward from various interfaces. A primary reflection is an event that represents a passage from the source to the depth point, and then a passage directly back to the receiver. A multiple reflection is an event that has undergone three, five, or some other odd number of reflections in its travel path. In other words, a multiple reflection takes a zig-zag course with the same number of down legs as up legs. Depending on their time delay from the primary events with which they are associated, multiple reflections are characterized as short-path, implying that they interfere with the primary reflection, or as long-path, where they appear as separate events. Usually, primary reflections are simply called reflections or primaries, whereas multiple reflections are simply called multiples. A water-layer reverberation is a type of multiple reflection due to the multiple bounces of seismic energy back and forth between the water surface and the water bottom. Such reverberations are common in marine seismic data. A reverberation re-echoes (i.e., bounces back and forth) in the water layer for a prolonged period of time. Because of its resonant nature, a


reverberation is a troublesome type of multiple. Reverberations conceal the primary reflections. The primary reflections (i.e., events that have undergone only one reflection) are needed for image formation. To make use of the primary reflected signals on the record, it is necessary to distinguish them from the other types of signals on the record. Random noise, such as wind noise, is usually minor and, in such cases, can be neglected. All the seismic signals, except primary reflections, are unwanted. These unwanted signals are due to the seismic energy introduced by the seismic source signal; hence, they are called signal-generated noise. Thus, we are faced with the problem of (primary reflected) signal enhancement and (signal-generated) noise suppression. In the analog days (approximately up to about 1965), the separation of signal and noise was done through the eye. In the 1950s, a good part of the Earth’s sedimentary basins, including essentially all water-covered regions, were classified as NR areas. Unfortunately, in such areas, the signal-generated noise overwhelms the primary reflections. As a result, the primary reflections cannot be picked up visually. For example, water-layer reverberations as a rule completely overwhelm the primaries in water-covered regions such as the Gulf of Mexico, the North Sea, and the Persian Gulf. The NR areas of the world could be explored for oil in a direct way by drilling, but not by the remote detection method of reflection seismology. The decades of the 1940s and 1950s were replete with inventions, not the least of which was the modern high-speed electronic stored-program digital computer. In the years from 1952 to 1954, almost every major oil company joined the MIT Geophysical Analysis Group to use the digital computer to process NR seismograms (Robinson, 2005). Historically, the additive model (trace = s + n) was used. In this model, the desired primary reflections were the signal s and everything else was the noise n. Electric filters and other analog methods were used, but they failed to give the desired primary reflections. The breakthrough was the recognition that the convolutional model (trace = s * n) is the correct model for a seismic trace. Note that the asterisk denotes convolution. In this model, the signal-generated noise is the signal s and the unpredictable primary reflections are the noise n. Deconvolution removes the signal-generated noise (such as instrument responses, ground roll, diffractions, ghosts, reverberations, and other types of multiple reflections) so as to yield the underlying primary reflections. The MIT Geophysical Analysis Group demonstrated the success of deconvolution on many NR seismograms, including the record shown in Figure 4.1. However, the oil companies were not ready to undertake digital seismic processing at that time. They were discouraged because an excursion into digital seismic processing would require new effort that would be expensive, and still the effort might fail because of the unreliability of the existing computers. It is true that in 1954 the available digital computers were far from suitable for geophysical processing. However, each year from 1946 onward, there was a constant stream of improvements in computers, and this development was accelerating every year. With patience and time, the oil and geophysical companies would convert to digital processing. It would happen when the need for hard-to-find oil was great enough to justify the investment necessary to turn NR seismograms into meaningful data. Digital signal processing was a new idea to the seismic exploration industry, and the industry shied away from converting to digital methods until the 1960s. The conversion of the seismic exploration industry to digital was in full force by about 1965, at which time transistorized computers were generally available at a reasonable price.
Of course, a reasonable price for one computer then would be in the range from hundreds of thousands of dollars to millions of dollars. With the digital computer, a whole new step in seismic exploration was added, namely digital processing. However, once the conversion to digital was undertaken in the years around 1965, it was done quickly and effectively. Reflection seismology now involves three steps, namely acquisition, processing, and interpretation. A comprehensive presentation of seismic data processing is given by Yilmaz (1987).
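The convolutional model, and why deconvolution works, can be illustrated in a few lines. This is a toy sketch under simplifying assumptions (a two-sample minimum-phase wavelet whose exact inverse is a truncated geometric series), not the Geophysical Analysis Group's actual processing.

```python
# Sketch of the convolutional model, trace = s * n: the recorded trace is
# the source-generated wavelet convolved with the reflectivity (the
# "unpredictable" primary reflections).  Deconvolution applies an inverse
# filter to recover the reflectivity.  Wavelet and reflectivity are toy values.

def convolve(a, b):
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

wavelet = [1.0, 0.5]                     # minimum-phase, so a stable inverse exists
reflectivity = [0.0, 0.0, 1.0, 0.0, 0.0, 0.5, 0.0, 0.0]
trace = convolve(wavelet, reflectivity)  # the convolutional model of the trace

# Truncated inverse filter of [1, 0.5]: 1 - 0.5z + 0.25z^2 - ...
inverse = [(-0.5) ** k for k in range(12)]
recovered = convolve(inverse, trace)[:len(reflectivity)]
print([round(v, 6) for v in recovered])  # the primaries reappear at lags 2 and 5
```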

Signal and Image Processing for Remote Sensing


4.4 Imaging by Seismic Processing

The term imaging refers to the formation of a computer image. The purpose of seismic processing is to convert the raw seismic data into a useful image of the subsurface structure of the Earth. From about 1965 onward, most of the new oil fields discovered were the result of the digital processing of the seismic data. Digital signal processing deconvolves the data and then superimposes (migrates) the results. As a result, seismic processing is divided into two main divisions: the deconvolution phase, which produces primaries-only traces (as well as possible), and the migration phase, which moves the primaries to their true depth positions (as well as possible). The result is the desired image. The first phase of imaging (i.e., deconvolution) is carried out on the traces, either individually by means of single-channel processing or in groups by means of multi-channel processing. Ancillary signal-enhancement methods typically include such things as the analyses of velocities and frequencies, static and dynamic corrections, and alternative types of deconvolution. Deconvolution is performed on one or a few traces at a time; hence, the small capacity of the computers of the 1960s was not a severely limiting factor. The second phase of imaging (i.e., migration) is the movement of the amplitudes of the primary-reflection events to their proper spatial locations (the depth points). Migration can be implemented by a Huygens-like superposition of the deconvolved traces. In a mechanical medium, such as the Earth, forces between the small rock particles transmit the disturbance. The disturbance at some regions of rock acts locally on nearby regions. Huygens imagined that the disturbance on a given wavefront is made up of many separate disturbances, each of which acts like a point source that radiates a spherically symmetric secondary wave, or wavelet. The superposition of these secondary waves gives the wavefront at a later time. 
The idea that the new wavefront is obtained by superposition is the crowning achievement of Huygens. See Figure 4.3. In a similar way, seismic migration uses superposition to find the subsurface reflecting interfaces.

FIGURE 4.3 Huygens’s construction.

Construction of Seismic Images by Ray Tracing


FIGURE 4.4 Construction of a reflecting interface (between the ground surface and the interface below).

See Figure 4.4. In the case of migration, superposition has an added benefit: superposition is one of the most effective ways to accentuate signals and suppress noise. The superposition used in migration is designed to return the primary reflections to their proper spatial locations. The remnants of the signal-generated noise on the deconvolved traces are out of step with the primary events. As a result, the remaining signal-generated noise tends to be destroyed by the superposition. The superposition used in migration provides the desired image of the underground structure of the Earth. Historically, migration (i.e., the movement of reflected events to their proper locations in space) was carried out manually, sometimes making use of elaborate drawing instruments. The transfer of the manual processes to a digital computer involved the manipulation of a great number of traces at once. The resulting digital migration schemes all relied heavily on the superposition of the traces. This tremendous amount of data handling had a tendency to overload the limited capacities of the computers made in the 1960s and 1970s. As a result, it was necessary to simplify the migration problem and to break it down into smaller parts. Thus, migration was done by a sequence of approximate operations, such as normal moveout, dip moveout, stacking, and migration after stack. The process known as time migration was often used, which improved the records in time but stopped short of placing the events in their proper spatial positions. All kinds of modifications and adjustments were made to these piecemeal operations, and seismic migration in the 1970s and 1980s was a complicated discipline, as much an art as a science. The use of this art required much skill. Meanwhile, great advances in technology were taking place. In the 1990s, everything seemed to come together.
Major improvements in instrumentation and computers resulted in light, compact geophysical field equipment and affordable computers with high speed and massive storage. Instead of the modest number of sources and receivers used in two-dimensional (2D) seismic processing, the tremendous number required for three-dimensional (3D) processing started to be used on a regular basis in field operations. Finally, the computers were large enough to handle the data for 3D imaging. Event movements (or migration) in three dimensions can now be carried out economically and efficiently by time-honored superposition methods such as those used in Huygens’s construction. These migration methods are generally called prestack migration. The name is a relic; it signifies that stacking and all the other piecemeal operations are no longer used in the migration scheme. Until the 1990s, 3D seismic imaging was rarely used because of the prohibitive costs involved. Today, 3D methods are commonly used, and the resulting subsurface images are of extraordinary quality. Three-dimensional prestack migration significantly improves seismic interpretation because the locations of geological structures, especially faults, are given much more accurately. In addition, 3D migration collapses diffractions from secondary sources such as reflector terminations against faults, and corrects bow ties to form synclines. Three-dimensional seismic work gives beautiful images of the underground structure of the Earth.


4.5 Iterative Improvement

Let (x, y) represent the surface coordinates and z represent the depth coordinate. Migration takes the deconvolved records and moves (i.e., migrates) the reflected events to depth points in the 3D volume (x, y, z). In this way, seismic migration produces an image of the geologic structure g(x, y, z) from the deconvolved seismic data. In other words, migration is a process in which primary reflections are moved to their correct locations in space. Thus, for migration, we need the primary reflected events. What else is required? The answer is the complete velocity function v(x, y, z), which gives the wave velocity at each point in the given volume of the Earth under exploration. At this point, let us note that the word velocity is used in two different ways. One way is to use the word velocity for the scalar that gives the rate of change of position in relation to time. When velocity is a scalar, the terms speed or swiftness are often used instead. The other (and more correct) way is to use the word velocity for the vector quantity that specifies both the speed of a body and its direction of motion. In geophysics, the word velocity is used for the (scalar) speed or swiftness of a seismic wave. The reciprocal of velocity v is slowness, n = 1/v. Wave velocity can vary vertically and laterally in isotropic media. In anisotropic media, it can also vary azimuthally. However, we consider only isotropic media; therefore, at a given point, the wave velocity is the same in all directions. Wave velocity tends to increase with depth in the Earth because deep layers suffer more compaction from the weight of the layers above. Wave velocity can be determined from laboratory measurements, acoustic logs, and vertical seismic profiles or from velocity analysis of seismic data. Often, we say velocity when we mean wave velocity. Over the years, various methods have been devised to obtain a sampling of the velocity distribution within the Earth.
The velocity functions so determined vary from method to method. For example, the velocity measured vertically from a check shot or vertical seismic profile (VSP) differs from the stacking velocity derived from normal moveout measurements of common depth point gathers. Ideally, we would want to know the velocity at each and every point in the volume of Earth of interest. In many areas, there are significant and rapid lateral or vertical changes in the velocity that distort the time image. Migration requires an accurate knowledge of vertical and horizontal seismic velocity variations. Because the velocity depends on the types of rocks, a complete knowledge of the velocity is essentially equivalent to a complete description of the geologic structure g(x, y, z). However, as we have stated above, the velocity function is required to get the geologic structure. In other words, to get the answer (the geologic structure) g(x, y, z), we must know the answer (the velocity function) v(x, y, z). Seismic interpretation takes the images generated as representatives of the physical Earth. In an iterative improvement scheme, any observable discrepancies in the image are used as forcing functions to correct the velocity function. Sometimes, simple adjustments can be made; at other times, the whole imaging process has to be redone one or more times before a satisfactory solution can be obtained. In other words, there is interplay between seismic processing and seismic interpretation, which is a manifestation of the well-accepted exchange between the disciplines of geophysics and geology. Iterative improvement is a well-known method commonly used in those cases where you must know the answer to find the answer. By the use of iterative improvement, the seismic inverse problem is solved. In other words, the imaging of seismic data requires a model of seismic velocity. Initially, a model of smoothly varying velocity is used.
If the results are not satisfactory, the velocity model is adjusted and a new image is formed. This process is repeated until a satisfactory image is obtained. To get the image, we must know the velocity; the method of iterative improvement deals with this problem.
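The loop just described can be sketched in miniature. The following toy Python fragment uses invented numbers for illustration only (a real velocity model is a full 3D function, not a scalar): it images a single flat reflector with a trial velocity, measures the depth misfit against an assumed well tie, and updates the velocity until the image is consistent:

```python
# Toy iterative improvement: a reflector at true depth 2.0 km in a medium of
# true velocity 3.0 km/s is recorded with two-way time t = 2 * depth / velocity.
true_velocity, true_depth = 3.0, 2.0
t = 2 * true_depth / true_velocity           # observed two-way traveltime

v = 2.0                                      # initial smooth velocity guess
for _ in range(25):
    imaged_depth = v * t / 2                 # image made with the current model
    misfit = true_depth - imaged_depth       # discrepancy drives the update
    v += misfit / t                          # adjust the velocity, re-image

print(round(v, 6), round(v * t / 2, 6))      # prints 3.0 2.0
```

Each pass halves the remaining error, so the trial velocity converges to the true value; in practice the "misfit" comes from wells, mis-focused reflectors, or residual moveout rather than from a known answer.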


4.6 Migration in the Case of Constant Velocity

Consider a primary reflection. Its two-way traveltime is the time it takes for the seismic energy to travel down from the source S = (xS, yS) to the depth point D = (xD, yD, zD) and then back up to the receiver R = (xR, yR). The deconvolved trace f(S, R, t) gives the amplitude of the reflected signal as a function of two-way traveltime t, which is given in milliseconds from the time that the source is activated. We know S, R, t, and f(S, R, t). The problem is to find D, which is the depth point at reflection. An isotropic medium is a medium whose properties at a point are the same in all directions. In particular, the wave velocity at a point is the same in all directions. Fermat’s principle of least time requires that in an isotropic medium the rays are orthogonal trajectories of the wavefronts. In other words, the rays are normal to the wavefronts. However, in an anisotropic medium, the rays need not be orthogonal trajectories of the wavefronts. A homogeneous medium is a medium whose physical properties are the same throughout. For ease of exposition, let us first consider the case of a homogeneous isotropic medium. Within a homogeneous isotropic material, the velocity v has the same value at all points and in all directions. The rays are straight lines, since by symmetry they cannot bend in any preferred direction, as there are none. The two-way traveltime t is the elapsed time for a seismic wave to travel from its source to a given depth point and return to a receiver at the surface of the Earth. The two-way traveltime t is thus equal to the one-way traveltime t1 from the source point S to the depth point D plus the one-way traveltime t2 from the depth point D to the receiver point R. Note that the traveltime from D to R is the same as the traveltime from R to D.
We may write t = t1 + t2, which in terms of distance is

vt = √[(xD − xS)² + (yD − yS)² + (zD − zS)²] + √[(xD − xR)² + (yD − yR)² + (zD − zR)²]

We recall that an ellipse can be drawn with two pins, a loop of string, and a pencil. The pins are placed at the foci and the ends of the string are attached to the pins. The pencil is placed on the paper inside the string, so the string is taut. The string forms a triangle. If the pencil is moved around so that the string stays taut, the sum of the distances from the pencil to the pins remains constant, and the curve traced out by the pencil is an ellipse. Thus, if vt is the length of the string, then any point on the ellipse could be the depth point D that produces the reflection for that source S, receiver R, and traveltime t. We therefore take that event and move it out to each point on the ellipse. Suppose we have two traces with only one event on each trace. Suppose both events come from the same reflecting surface. In Figure 4.5, we show the two ellipses. In the spirit of Huygens’s construction, the reflector must be the common tangent to the ellipses. This example shows how migration works. In practice, we would take

FIGURE 4.5 Reflecting interface as a tangent. Two ellipses of possible reflector points, one for the event on each trace, lie below the surface of the earth; the reflector is the common tangent to the ellipses.


the amplitude at each digital time instant t on the trace, and scatter the amplitude on the constructed ellipsoid. In this way, the trace is spread out into a 3D volume. Then we repeat this operation for each trace, until every trace is spread out into a 3D volume. The next step is superposition. All of these volumes are added together. (In practice, each trace would be spread out and cumulatively added into one given volume.) Interference tends to destroy the noise, and we are left with the desired 3D image of the Earth.
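The ellipse construction can be checked numerically. In the Python sketch below, the velocity, traveltime, and source-receiver offset are invented values; the check confirms that every sampled depth point on the ellipse yields the same two-way path length vt:

```python
import math

# Assumed toy values: constant velocity v, two-way time t, source S and
# receiver R on the surface z = 0 (z measured downward as depth).
v, t = 2.0, 5.0
S, R = (-3.0, 0.0), (3.0, 0.0)

L = v * t                      # length of the "string": d(S, D) + d(D, R)
a = L / 2.0                    # semi-major axis of the ellipse
c = abs(R[0] - S[0]) / 2.0     # half the source-receiver offset
b = math.sqrt(a * a - c * c)   # semi-minor axis (maximum reflector depth)

# Sample depth points on the lower half of the ellipse and check that the
# two-way path length is the same for all of them.
for k in range(1, 8):
    u = math.pi * k / 8.0
    D = (a * math.cos(u), b * math.sin(u))
    path = math.dist(S, D) + math.dist(D, R)
    assert abs(path - L) < 1e-9

print(a, b)    # prints 5.0 4.0
```

For zero offset (S = R) the ellipse degenerates to a circle of radius vt/2, which is the familiar "exploding reflector" picture.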

4.7 Implementation of Migration

A raypath is a course along which wave energy propagates through the Earth. In isotropic media, the raypath is perpendicular to the local wavefront. The raypath can be calculated using ray tracing. Let the point P = (xP, yP) be either a source location or a receiver location. The subsurface volume is represented by a 3D grid (x, y, z) of depth points D. To minimize the amount of ray tracing, we first compute a traveltime table for each and every surface location, whether the location be a source point or a receiver point. In other words, for each surface location P, we compute the one-way traveltime from P to each depth point D in the 3D grid. We put these one-way traveltimes into a 3D table that is labeled by the surface location P. The traveltime for a primary reflection is the total two-way (i.e., down and up) time for a path originating at the source point S, reflected at the depth point D, and received at the receiver point R. Two identification numbers are associated with each trace: one for the source S and the other for the receiver R. We pull out the respective tables for these two identification numbers. We add the two tables together, element by element. The result is a 3D table of the two-way traveltimes for that seismic trace. Let us give a 2D example. We assume that the medium has a constant velocity, which we take to be 1. Let the subsurface grid for depth points D be given by (z, x), where depth is given by z = 1, 2, . . . , 10 and horizontal distance is given by x = 1, 2, . . . , 15. Let the surface locations P be (z = 1, x), where x = 1, 2, . . . , 15. Suppose the source is S = (1, 3). We want to construct a table of one-way traveltimes, where depth z denotes the row and horizontal distance x denotes the column. The one-way traveltime from the source to the depth point (z, x) is t(z, x) = √[(z − 1)² + (x − 3)²].
For example, the traveltime from source to depth point (z, x) = (4, 6) is

t(4, 6) = √[(4 − 1)² + (6 − 3)²] = √18 = 4.24

This number (rounded) appears in the fourth row, sixth column of the table below. The computed table (rounded) for the source, with depth z = 1, . . . , 10 down the rows and horizontal distance x = 1, . . . , 15 across the columns, is

2.0  1.0  0.0  1.0  2.0  3.0  4.0  5.0  6.0  7.0  8.0  9.0 10.0 11.0 12.0
2.2  1.4  1.0  1.4  2.2  3.2  4.1  5.1  6.1  7.1  8.1  9.1 10.0 11.0 12.0
2.8  2.2  2.0  2.2  2.8  3.6  4.5  5.4  6.3  7.3  8.2  9.2 10.2 11.2 12.2
3.6  3.2  3.0  3.2  3.6  4.2  5.0  5.8  6.7  7.6  8.5  9.5 10.4 11.4 12.4
4.5  4.1  4.0  4.1  4.5  5.0  5.7  6.4  7.2  8.1  8.9  9.8 10.8 11.7 12.6
5.4  5.1  5.0  5.1  5.4  5.8  6.4  7.1  7.8  8.6  9.4 10.3 11.2 12.1 13.0
6.3  6.1  6.0  6.1  6.3  6.7  7.2  7.8  8.5  9.2 10.0 10.8 11.7 12.5 13.4
7.3  7.1  7.0  7.1  7.3  7.6  8.1  8.6  9.2  9.9 10.6 11.4 12.2 13.0 13.9
8.2  8.1  8.0  8.1  8.2  8.5  8.9  9.4 10.0 10.6 11.3 12.0 12.8 13.6 14.4
9.2  9.1  9.0  9.1  9.2  9.5  9.8 10.3 10.8 11.4 12.0 12.7 13.5 14.2 15.0
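The table can be generated with a few lines of Python (a sketch of the text’s recipe, unit velocity assumed):

```python
import math

# One-way traveltime table: unit velocity, depth points z = 1..10 (rows)
# and x = 1..15 (columns), source at surface location (z, x) = (1, 3).
zs, xs = range(1, 11), range(1, 16)

def one_way_table(src_z, src_x):
    # straight-ray distance equals traveltime when velocity = 1
    return [[math.hypot(z - src_z, x - src_x) for x in xs] for z in zs]

table = one_way_table(1, 3)

# The text's example: depth point (z, x) = (4, 6) lies in row 4, column 6.
print(round(table[3][5], 2))   # prints 4.24, i.e., sqrt(18)
```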


Next, let us compute the traveltime table for the receiver point. Note that the traveltime from the depth point to the receiver is the same as the traveltime from the receiver to the depth point. Suppose the receiver is R = (1, 11). Then the one-way traveltime from the receiver to the depth point (z, x) is t(z, x) = √[(z − 1)² + (x − 11)²]. For example, the traveltime from the receiver to depth point (z, x) = (4, 6) is

t(4, 6) = √[(4 − 1)² + (6 − 11)²] = √34 = 5.83

This number (rounded) appears in the fourth row, sixth column of the table below. The computed table (rounded) for the receiver is

10.0  9.0  8.0  7.0  6.0  5.0  4.0  3.0  2.0  1.0  0.0  1.0  2.0  3.0  4.0
10.0  9.1  8.1  7.1  6.1  5.1  4.1  3.2  2.2  1.4  1.0  1.4  2.2  3.2  4.1
10.2  9.2  8.2  7.3  6.3  5.4  4.5  3.6  2.8  2.2  2.0  2.2  2.8  3.6  4.5
10.4  9.5  8.5  7.6  6.7  5.8  5.0  4.2  3.6  3.2  3.0  3.2  3.6  4.2  5.0
10.8  9.8  8.9  8.1  7.2  6.4  5.7  5.0  4.5  4.1  4.0  4.1  4.5  5.0  5.7
11.2 10.3  9.4  8.6  7.8  7.1  6.4  5.8  5.4  5.1  5.0  5.1  5.4  5.8  6.4
11.7 10.8 10.0  9.2  8.5  7.8  7.2  6.7  6.3  6.1  6.0  6.1  6.3  6.7  7.2
12.2 11.4 10.6  9.9  9.2  8.6  8.1  7.6  7.3  7.1  7.0  7.1  7.3  7.6  8.1
12.8 12.0 11.3 10.6 10.0  9.4  8.9  8.5  8.2  8.1  8.0  8.1  8.2  8.5  8.9
13.5 12.7 12.0 11.4 10.8 10.3  9.8  9.5  9.2  9.1  9.0  9.1  9.2  9.5  9.8

The addition of the above two tables gives the two-way traveltimes. For example, the traveltime from the source to depth point (z, x) = (4, 6) and back to the receiver is t(4, 6) = 4.24 + 5.83 = 10.07. This number (rounded) appears in the fourth row, sixth column of the two-way table below. The two-way table (rounded) for source and receiver is

12 10  8  8  8  8  8  8  8  8  8 10 12 14 16
12 10  9  8  8  8  8  8  8  8  9 10 12 14 16
13 11 10 10  9  9  9  9  9 10 10 11 13 15 17
14 13 12 11 10 10 10 10 10 11 12 13 14 16 17
15 14 13 12 12 11 11 11 12 12 13 14 15 17 18
17 15 14 14 13 13 13 13 13 14 14 15 17 18 19
18 17 16 15 15 15 14 15 15 15 16 17 18 19 21
19 18 18 17 16 16 16 16 16 17 18 18 19 21 22
21 20 19 19 18 18 18 18 18 19 19 20 21 22 23
23 22 21 20 20 20 20 20 20 20 21 22 23 24 25
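The table arithmetic can be reproduced in Python (a sketch following the text’s geometry, with the source at x = 3 and the receiver at x = 11):

```python
import math

# Build the two-way table by adding the source and receiver one-way tables
# element by element, as in the text (unit velocity, z = 1..10, x = 1..15).
zs, xs = range(1, 11), range(1, 16)

def one_way(src_x):
    # every surface location sits at depth z = 1
    return [[math.hypot(z - 1, x - src_x) for x in xs] for z in zs]

src, rec = one_way(3), one_way(11)
two_way = [[s + r for s, r in zip(srow, rrow)] for srow, rrow in zip(src, rec)]

# Depth point (z, x) = (4, 6): 4.24 + 5.83, which rounds to 10 in the table.
print(round(two_way[3][5], 2))   # prints 10.07
```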

Figure 4.6 shows a contour map of the above table. The contour lines are elliptic curves of constant two-way traveltime for the given source and receiver pair. Let the deconvolved trace (i.e., the trace with primary reflections only) be

Time:   0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Sample: 0 0 0 0 0 0 0 0 0 0  1  0  0  2  0  0  0  0  3  0

The next step is to place the amplitude of each trace at depth locations, where the traveltime of the trace sample equals the traveltime as given in the above two-way table. The trace is zero for all times except 10, 13, and 18. We now spread the trace


FIGURE 4.6 Elliptic contour lines. (Contour map of the two-way traveltime table, with shaded bands at 0.0–5.0, 5.0–10.0, 10.0–15.0, 15.0–20.0, and 20.0–25.0.)

out as follows. In the above two-way table, the entries with traveltimes 10, 13, and 18 are replaced by the trace values 1, 2, 3, respectively. All other entries are replaced by zero. The result is

0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 1 0 0 0
2 0 1 1 0 0 0 0 0 1 1 0 2 0 0
0 2 0 0 1 1 1 1 1 0 0 2 0 0 0
0 0 2 0 0 0 0 0 0 0 2 0 0 0 3
0 0 0 0 2 2 2 2 2 0 0 0 0 3 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 3 3 0 0 0 0 0 0 0 3 3 0 0 0
0 0 0 0 3 3 3 3 3 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

This operation is repeated for all traces in the survey, and the resulting tables are added together. The final image appears through the constructive and destructive interference among the individual trace contributions. The above procedure for a constant velocity is the same as that for a variable velocity, except that now the traveltimes are computed according to the velocity function v(x, y, z). The eikonal equation can provide the means; hence, for the rest of this chapter we will develop the properties of this basic equation.
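The whole constant-velocity procedure of this section can be condensed into a short Python sketch (the grid, source, receiver, and trace values follow the text’s example; the helper names are our own):

```python
import math

# Spread each trace amplitude onto every grid point whose rounded two-way
# time matches the sample time, then sum the contributions of all traces.
# Constant velocity 1; depth points z = 1..10, x = 1..15.
zs, xs = range(1, 11), range(1, 16)

def two_way(src_x, rec_x):
    return [[math.hypot(z - 1, x - src_x) + math.hypot(z - 1, x - rec_x)
             for x in xs] for z in zs]

def spread(trace, times_table):
    # trace maps sample time -> amplitude; all other times are zero
    return [[trace.get(round(t), 0) for t in row] for row in times_table]

trace = {10: 1, 13: 2, 18: 3}           # nonzero samples from the text
image = [[0] * 15 for _ in zs]
for src_x, rec_x in [(3, 11)]:          # one trace here; a survey has many
    contrib = spread(trace, two_way(src_x, rec_x))
    image = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(image, contrib)]

print(image[3][5])   # depth point (4, 6): two-way time rounds to 10 -> amplitude 1
```

With many traces the loop accumulates all the spread ellipses into one volume, which is the superposition step of migration.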

4.8 Seismic Rays

To move the received reflected events back into the Earth and place their energy at the point of reflection, it is necessary to have a good understanding of ray theory. We assume the medium is isotropic. Rays are directed curves that are always perpendicular to the


wavefront at any given time. The rays point along the direction of the motion of the wavefront. As time progresses, the disturbance propagates, and we obtain a family of wavefronts. We will now describe the behavior of the rays and wavefronts in media with a continuously varying velocity. In the treatment of light as wave motion, there is a region of approximation in which the wavelength is small in comparison with the dimensions of the components of the optical system involved. This region of approximation is treated by the methods of geometrical optics. When the wave character of the light cannot be ignored, then the methods of physical optics apply. Since the wavelength of light is very small compared to ordinary objects, geometrical optics can describe the behavior of a light beam satisfactorily in many situations. Within the approximation represented by geometrical optics, light travels along lines called rays. The ray is essentially the path along which most of the energy is transmitted from one point to another. The ray is a mathematical device rather than a physical entity. In practice, one can produce very narrow beams of light (e.g., a laser beam), which may be considered as physical manifestations of rays. When we turn to a seismic wave, the wavelength is not particularly small in comparison with the dimensions of geologic layers within the Earth. However, the concept of a seismic ray fulfills an important need. Geometric seismics is not nearly as accurate as geometric optics, but still ray theory is used to solve many important practical problems. In particular, the most popular form of prestack migration is based on tracing the raypaths of the primary reflections. In ancient times, Archimedes defined the straight line as the shortest path between two points. Heron explained the paths of reflected rays of light based on a principle of least distance. 
In the 17th century, Fermat proposed the principle of least time, which let him account for refraction as well as reflection. The Mississippi River has created most of Louisiana with sand and silt. The river could not have deposited these sediments by remaining in one channel. If it had remained in one channel, southern Louisiana would be a long narrow peninsula reaching into the Gulf of Mexico. Southern Louisiana exists in its present form because the Mississippi River has flowed here and there within an arc about two hundred miles wide, frequently and radically changing course, surging over the left or the right bank to flow in a new direction. It is always the river’s purpose to get to the Gulf in the least time. This means that its path must follow the steepest way down. The gradient is the vector that points in the direction of steepest ascent. Thus, the river’s path must follow the direction of the negative gradient, which is the path of steepest descent. As the mouth advances southward and the river lengthens, the steepness of the path declines, the current slows, and sediment builds up the bed. Eventually, the bed builds up so much that the river spills to one side to follow what has become the steepest way down. Major shifts of that nature have occurred about once in a millennium. The Mississippi’s main channel of three thousand years ago is now Bayou Teche. A few hundred years later, the channel shifted abruptly to the east. About two thousand years ago, the channel shifted to the south. About one thousand years ago, the channel shifted to the river’s present course. Today, the Mississippi River has advanced so far past New Orleans and out into the Gulf that the channel is about to shift again, to the Atchafalaya. By the route of the Atchafalaya, the distance across the delta plain is 145 miles, about half the length of the route of the present channel. Left to itself, the Mississippi River would change its course to this shorter and steeper route.
The concept of potential was first developed to deal with problems of gravitational attraction. In fact, a simple gravitational analogy is helpful in explaining potential. We do work in carrying an object up a hill. This work is stored as potential energy, and it can be recovered by descending in any way we choose. A topographic map can be used to visualize the terrain. Topographic maps provide information about the elevation of the surface above sea level. The elevation is represented on a topographic map by contour


lines. Each point on a contour line has the same elevation. In other words, a contour line represents a horizontal slice through the surface of the land. A set of contour lines tells you the shape of the land. For example, hills are represented by concentric loops, whereas stream valleys are represented by V-shapes. The contour interval is the elevation difference between adjacent contour lines. Steep slopes have closely spaced contour lines, while gentle slopes have very widely spaced contour lines. In seismic theory, the counterpart of gravitational potential is the wavefront t(x, y, z) = constant. A vector field is a rule that assigns a vector, in our case the gradient

∇t(x, y, z) = (∂t/∂x, ∂t/∂y, ∂t/∂z)

to each point (x, y, z). In visualizing a vector field, we imagine there is a vector extending from each point. Thus, the vector field associates a direction to each point. If a hypothetical particle moves in such a manner that its direction at any point coincides with the direction of the vector field at that point, then the curve traced out is called a flow line. In the seismic case, the wavefront corresponds to the equipotential surface and the seismic ray corresponds to the flow line. In 3D space, let r be the vector from the origin (0, 0, 0) to an arbitrary point (x, y, z). A vector is specified by its components. A compact notation is to write r as r = (x, y, z). We call r the radius vector. A more explicit way is to write r as r = x x̂ + y ŷ + z ẑ, where x̂, ŷ, and ẑ are the three orthogonal unit vectors. These unit vectors are defined as the vectors that have magnitude equal to one and have directions lying along the x, y, z axes, respectively. They are referred to as ‘‘x-hat’’ and so on. Now, let the vector r = (x, y, z) represent a point on a given ray (Figure 4.7). Let s denote arc length along the ray. Let r + dr = (x + dx, y + dy, z + dz) give an adjacent point on the same ray. The vector dr = (dx, dy, dz) is (approximately) the tangent vector to the ray. The length of this vector is √(dx² + dy² + dz²), which is approximately equal to the increment ds of the arc length on the ray. As a result, the unit vector tangent to the ray is

u = dr/ds = (dx/ds) x̂ + (dy/ds) ŷ + (dz/ds) ẑ

The unit tangent vector can also be written as

u = (dx/ds, dy/ds, dz/ds)

The velocity along the ray is v = ds/dt, so dt = ds/v = n ds, where n(r) = 1/v(r) is the slowness and ds is an increment of path length along the given ray. Thus, the seismic traveltime field is

FIGURE 4.7 Raypath and wavefronts. The radius vector r locates a point on the ray; the unit tangent u points along the ray, normal to the wavefront.


t(r) = ∫_{r0}^{r} n(r) ds

It is understood, of course, that the path of integration is along the given ray. The above equation holds for any ray. A wavefront is a surface of constant traveltime. The time difference between two wavefronts is a constant independent of the ray used to calculate the time difference. A wave as it travels must follow the path of least time. The wavefronts are like contour lines on a hill. The height of the hill is measured in time. Take a point on a contour line. In what direction will the ray point? Suppose the ray points along the contour line (that is, along the wavefront). As the wave travels a certain distance along this hypothetical ray, it takes time. But all time is the same along the wavefront. Thus, a wave cannot travel along a wavefront. It follows that a ray must point away from a wavefront. Suppose a ray points away from the wavefront. The wave wants to take the least time to travel to the new wavefront. By isotropy, the wave velocity is the same in all directions. Since the traveltime is distance divided by velocity, the wave wants to take the raypath that goes the shortest distance. The shortest distance is along the path that has no component along the wavefront; that is, the shortest distance is along the normal to the wavefront. In other words, the ray’s unit tangent vector u must be orthogonal to the wavefront. By definition, the gradient is a vector that points in the direction orthogonal to the wavefront. Thus, the ray’s unit tangent vector u and the gradient ∇t of the wavefront must point in the same direction. If the given wavefront is at time t and the new wavefront is at time t + dt, then the traveltime along the ray is dt. If s measures the path length along the given ray, then the travel distance in time dt is ds. Along the raypath, the increments dt and ds are related by the slowness; that is, dt = n ds. Thus, the slowness is equal to the directional derivative in the direction of the raypath, that is, n = dt/ds.
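As a toy numerical check of the traveltime integral (with an invented constant slowness and a straight ray), the sum of n ds over short segments reduces to n times the path length:

```python
import math

# t(r) = integral of n ds along the ray. Toy setting: constant slowness n
# along a straight ray from r0 to r1, so t should equal n * path length.
n = 0.5
r0, r1 = (0.0, 0.0), (3.0, 4.0)

steps = 1000
t = 0.0
for k in range(steps):
    a, b = k / steps, (k + 1) / steps
    p = ((1 - a) * r0[0] + a * r1[0], (1 - a) * r0[1] + a * r1[1])
    q = ((1 - b) * r0[0] + b * r1[0], (1 - b) * r0[1] + b * r1[1])
    t += n * math.dist(p, q)    # accumulate n * ds over short segments

print(abs(t - 2.5) < 1e-9)      # path length is 5, so t = n * 5 = 2.5
```

For a curved ray in a heterogeneous medium the same sum applies, with n evaluated at each segment.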
In other words, the swiftness along the raypath direction is v = ds/dt, and the slowness along the raypath direction is n = dt/ds. If we write the directional derivative in terms of its components, this equation becomes

n = dt/ds = (∂t/∂x)(dx/ds) + (∂t/∂y)(dy/ds) + (∂t/∂z)(dz/ds) = ∇t · (dr/ds)

Because dr/ds = u, the above equation is n = ∇t · u. Since u is a unit vector in the same direction as the gradient, it follows that

n = |∇t| |u| cos 0 = |∇t|

In other words, the slowness is equal to the magnitude of the gradient. Since the gradient ∇t and the raypath have the same direction u, and the gradient has magnitude n while u has magnitude unity, it follows that

∇t = n u

This equation is the vector eikonal equation. Written in terms of its components, it is

(∂t/∂x, ∂t/∂y, ∂t/∂z) = n (dx/ds, dy/ds, dz/ds)

If we take the squared magnitude of each side, we obtain the eikonal equation

(∂t/∂x)² + (∂t/∂y)² + (∂t/∂z)² = n²
The left-hand side involves the wavefront and the right-hand side involves the ray. The connecting link is the slowness. In the eikonal equation, the function t(x, y, z) is the traveltime (also called the eikonal) from the source to the point with the coordinates (x, y, z), and n(x, y, z) = 1/v(x, y, z) is the slowness (or reciprocal velocity) at that point. The eikonal equation describes the traveltime propagation in an isotropic medium. To obtain a well-posed initial value problem, it is necessary to know the velocity function v(x, y, z) at all points in space. Moreover, as an initial condition, the source or some particular wavefront must be specified. Furthermore, one must choose one of the two branches of the solutions (namely, either the wave going from the source or else the wave going to the source). The eikonal equation then yields the traveltime field t(x, y, z) in the heterogeneous medium, as required for migration. What does the vector eikonal equation ∇t = nu say? It says that, because of Fermat’s principle of least time, the raypath direction must be orthogonal to the wavefront. The eikonal equation is the fundamental equation that connects the ray (which corresponds to the fuselage of the airplane) to the wavefront (which corresponds to the wings of the airplane). The wings let the fuselage feel the effects of points removed from the path of the fuselage. The eikonal equation makes a traveling wave (as envisaged by Huygens) fundamentally different from a traveling particle (as envisaged by Newton). Hamilton perceived that there is a wave–particle duality, which provides the mathematical foundation of quantum mechanics.
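The eikonal equation can be verified numerically for the simplest traveltime field, that of a point source in a homogeneous medium, where t = n · r. A short Python check with finite differences (the slowness value is an invented toy number):

```python
import math

# Check (dt/dx)^2 + (dt/dy)^2 + (dt/dz)^2 = n^2 for the traveltime field of
# a point source at the origin in a homogeneous medium: t = n * |r|.
n = 0.4

def t(x, y, z):
    return n * math.sqrt(x * x + y * y + z * z)

def grad_sq(x, y, z, h=1e-6):
    # central finite differences for the squared gradient of t
    dtdx = (t(x + h, y, z) - t(x - h, y, z)) / (2 * h)
    dtdy = (t(x, y + h, z) - t(x, y - h, z)) / (2 * h)
    dtdz = (t(x, y, z + h) - t(x, y, z - h)) / (2 * h)
    return dtdx ** 2 + dtdy ** 2 + dtdz ** 2

# At any point away from the source, the squared gradient equals n^2.
for p in [(1, 2, 2), (3, 0, 4), (-2, 1, 5)]:
    assert abs(grad_sq(*p) - n * n) < 1e-6
print("eikonal equation satisfied; n^2 =", n * n)
```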

4.9 The Ray Equations

In this section, the position vector r always represents a point on a specific raypath, and not an arbitrary point in space. As time increases, r traces out the particular raypath in question. The seismic ray at any given point follows the direction of the gradient of the traveltime field t(r). As before, let u be the unit vector along the ray. The ray, in general, follows a curved path, and $n u$ is the tangent to this curved raypath. We now want to derive an equation that tells us how $n u$ changes along the curved raypath. The vector eikonal equation is written as

\[ n u = \nabla t \]

We now take the derivative of the vector eikonal equation with respect to distance s along the raypath. We obtain the ray equation

\[ \frac{d}{ds}(n u) = \frac{d}{ds}(\nabla t) \]

Interchange $\nabla$ and $d/ds$ and use $dt/ds = n$. Thus, the right-hand side becomes

\[ \frac{d(\nabla t)}{ds} = \nabla \frac{dt}{ds} = \nabla n \]

Thus, the ray equation becomes

Construction of Seismic Images by Ray Tracing

FIGURE 4.8 Straight and curved rays (regions of high and low slowness).

\[ \frac{d}{ds}(n u) = \nabla n \]

This equation, together with the equation for the unit tangent vector

\[ \frac{dr}{ds} = u \]

are called the ray equations. We need to understand how a single ray, say a seismic ray, moving along a particular path can know what is an extremal path in the variational sense. To illustrate the problem, consider a medium whose slowness n decreases with vertical depth, but is constant laterally. Thus, the gradient of the slowness at any location points straight up (Figure 4.8). The vertical line on the left depicts a raypath parallel to the gradient of the slowness. This ray undergoes no refraction. However, the path of the ray on the right intersects the contour lines of slowness at an angle. The right-hand ray is refracted and follows a curved path, even though the ray strikes the same horizontal contour lines of slowness as did the left-hand ray, where there was no refraction. This shows that the path of a ray cannot be explained solely in terms of the values of the slowness on the path. We must also consider the transverse values of the slowness along neighboring paths, that is, along paths not taken by that particular ray. The classical wave explanation, proposed by Huygens, resolves this problem by saying that light does not propagate in the form of a single ray. According to the wave interpretation, light propagates as a wavefront possessing transverse width. Think of an airplane traveling along the raypath. The fuselage of the airplane points in the direction of the raypath. The wings of the aircraft lie along the wavefront. Clearly, the wavefront propagates more rapidly on the side where the slowness is low (i.e., where the velocity is high) than on the side where the slowness is high. As a result, the wavefront naturally turns in the direction of the gradient of slowness.

4.10 Numerical Ray Tracing

Computer technology and seismic instrumentation have experienced great advances in the past few years. As a result, exploration geophysics is in a state of transition from computer-limited 2D processing to computer-intensive 3D processing. In the past, most seismic surveys were along surface lines, which yield 2D subsurface images. The wave equation behaves nicely in one dimension and in three dimensions, but not in two dimensions. In one dimension, waves (as on a string) propagate without distortion. In three dimensions, waves (as in the Earth) propagate in an undistorted way except for a spherical correction factor. However, in two dimensions, wave propagation is complicated and distorted. By its very nature, 2D processing can never account for events originating outside of the plane. As a result, 2D processing is broken up into a large number of approximate partial steps in a sequence of operations. These steps are ingenious, but they can never give a true image. However, 3D processing accounts for all of the events. It is now cost-effective to lay out seismic surveys over a surface area and to do 3D processing. The third dimension is no longer missing, and, consequently, the need for a large number of piecemeal 2D approximations is gone. Prestack depth migration is a 3D imaging process that is computationally intensive, but mathematically simple. The resulting 3D images of the interior of the Earth surpass all expectations in utility and beauty. Let us now consider the general case in which we have a spatially varying velocity function v(x, y, z) = v(r). This velocity function represents a velocity field. For a fixed constant $v_0$, the equation $v(r) = v_0$ specifies those positions r where the velocity has this fixed value. The locus of such positions makes up an isovelocity surface. The gradient

\[ \nabla v(r) = \left( \frac{\partial v}{\partial x}, \frac{\partial v}{\partial y}, \frac{\partial v}{\partial z} \right) \]

is normal to the isovelocity surface and points in the direction of the greatest increase in velocity. Similarly, the equation $n(r) = n_0$ for a fixed value of slowness $n_0$ specifies an isoslowness surface. The gradient

\[ \nabla n(r) = \left( \frac{\partial n}{\partial x}, \frac{\partial n}{\partial y}, \frac{\partial n}{\partial z} \right) \]

is normal to the isoslowness surface and points in the direction of greatest increase in slowness. The isovelocity and isoslowness surfaces coincide, and, because $v = 1/n$,

\[ \nabla v = -n^{-2}\, \nabla n \]

so the respective gradients point in opposite directions, as we would expect. A seismic ray makes its way through the slowness field. As the wavefront progresses in time, the raypath is bent according to the slowness field.
For example, suppose we have a stratified Earth in which the slowness decreases with depth. A vertical raypath does not bend, as it is pulled equally in all lateral directions. However, a nonvertical ray drags on its slow side, and therefore it curves away from the vertical and bends toward the horizontal. This is the case of a diving wave, whose raypath eventually curves enough to reach the Earth's surface again. Certainly, the slowness field, together with the initial direction of the ray, determines the entire raypath. Except in special cases, however, we must determine such raypaths numerically. Assume that we know the slowness function n(r) and that we know the ray direction $u_1$ at point $r_1$. We now want to derive an algorithm for finding the ray direction $u_2$ at point $r_2$. We choose a small, but finite, change in path length $\Delta s$. Then we use the first ray equation, which we recall is

\[ \frac{dr}{ds} = u \]

to compute the change $\Delta r = r_2 - r_1$. The required approximation is

\[ \Delta r = u_1\, \Delta s \]


or

\[ r_2 = r_1 + u_1\, \Delta s \]

We have thus found the first desired quantity $r_2$. Next, we use the second ray equation, which we recall is

\[ \frac{d(n u)}{ds} = \nabla n \]

in the form

\[ d(n u) = \nabla n\, ds \]

The required approximation is

\[ \Delta(n u) = (\nabla n)\, \Delta s \]

or

\[ n(r_2)\, u_2 - n(r_1)\, u_1 = \nabla n\, \Delta s \]

For accuracy, $\nabla n$ may be evaluated by differentiating the known function n(r) midway between $r_1$ and $r_2$. Thus, the desired $u_2$ is given as

\[ u_2 = \frac{n(r_1)}{n(r_2)}\, u_1 + \frac{\Delta s}{n(r_2)}\, \nabla n \]

Note that the vector $u_1$ is pulled in the direction of $\nabla n$ in forming $u_2$; that is, the ray drags on its slow side, and so is bent in the direction of increasing slowness. The special case of no bending occurs when $u_1$ and $\nabla n$ are parallel. As we have seen, a vertical wave in a horizontally stratified medium is an example of such a special case. We have thus found how to advance the wave along the ray by an incremental raypath distance. We can repeat the algorithm to advance the wave by any desired distance.
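The incremental algorithm above can be sketched in a few lines of Python. This is a sketch under our own assumptions: a 2D medium with a linear slowness field, and the names n0, g, and trace_ray are ours, not the chapter's:

```python
import numpy as np

# Illustrative 2D slowness field n(r) = n0 + g . r (linear in position),
# so its gradient is the constant vector g.
n0 = 0.5
g = np.array([0.0, 0.1])  # slowness increases in the +y direction

def slowness(r):
    return n0 + g @ r

def trace_ray(r1, u1, ds, nsteps):
    """Advance a ray step by step: r2 = r1 + u1*ds, then
    u2 = (n(r1)/n(r2)) u1 + (ds/n(r2)) grad n."""
    path = [r1]
    r, u = r1, u1 / np.linalg.norm(u1)
    for _ in range(nsteps):
        r_next = r + u * ds
        grad_n = g  # constant gradient for this linear field
        u = (slowness(r) / slowness(r_next)) * u + (ds / slowness(r_next)) * grad_n
        u /= np.linalg.norm(u)  # renormalize to keep u a unit vector
        r = r_next
        path.append(r)
    return np.array(path)

# A horizontally launched ray drags on its slow side and bends toward
# increasing slowness (+y here).
path = trace_ray(np.array([0.0, 0.0]), np.array([1.0, 0.0]), ds=0.1, nsteps=50)
print(path[-1])  # the final y-coordinate is positive: the ray has curved
```

A ray launched parallel to the slowness gradient traces a straight line, reproducing the no-bending special case noted above.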

4.11 Conclusions

The acquisition of seismic data in many promising areas yields raw traces that cannot be interpreted. The reason is that signal-generated noise conceals the desired primary reflections. The solution to this problem was found in the 1950s with the introduction of the first commercially available digital computers and the signal-processing method of deconvolution. Digital signal-enhancement methods, and, in particular, the various methods of deconvolution, are able to suppress signal-generated noise on seismic records and bring out the desired primary-reflected energy. Next, the energy of the primary reflections must be moved to the spatial positions of the subsurface reflectors. This process, called migration, involves the superposition of all the deconvolved traces according to a scheme similar to Huygens's construction. The result gives the reflecting horizons and other features that make up the desired image. Thus, digital processing, as it is currently done, is roughly divided into two main parts, namely signal enhancement and event migration. The day is not far off, provided that research continues as actively in the future as it has in the past, when the processing scheme will not be divided into two parts, but will be united as a whole. The signal-generated noise consists of physical signals that future processing should not destroy, but utilize. When all the seismic information is used in an integrated way, the resulting images will be better still.

References

Robinson, E.A., The MIT Geophysical Analysis Group from inception to 1954, Geophysics, 70, 7JA–30JA, 2005.
Yilmaz, O., Seismic Data Processing, Society of Exploration Geophysicists, Tulsa, 1987.

5 Multi-Dimensional Seismic Data Decomposition by Higher Order SVD and Unimodal ICA

Nicolas Le Bihan, Valeriu Vrabie, and Jérôme I. Mars

CONTENTS
5.1 Introduction 76
5.2 Matrix Data Sets 76
5.2.1 Acquisition 77
5.2.2 Matrix Model 77
5.3 Matrix Processing 78
5.3.1 SVD 78
5.3.1.1 Definition 78
5.3.1.2 Subspace Method 78
5.3.2 SVD and ICA 79
5.3.2.1 Motivation 79
5.3.2.2 Independent Component Analysis 79
5.3.2.3 Subspace Method Using SVD–ICA 81
5.3.3 Application 82
5.4 Multi-Way Array Data Sets 85
5.4.1 Multi-Way Acquisition 86
5.4.2 Multi-Way Model 86
5.5 Multi-Way Array Processing 87
5.5.1 HOSVD 87
5.5.1.1 HOSVD Definition 87
5.5.1.2 Computation of the HOSVD 88
5.5.1.3 The (rc, rx, rt)-rank 89
5.5.1.4 Three-Mode Subspace Method 90
5.5.2 HOSVD and Unimodal ICA 90
5.5.2.1 HOSVD and ICA 91
5.5.2.2 Subspace Method Using HOSVD–Unimodal ICA 91
5.5.3 Application to Simulated Data 92
5.5.4 Application to Real Data 97
5.6 Conclusions 100
References 100


5.1 Introduction

This chapter describes multi-dimensional seismic data processing using the higher order singular value decomposition (HOSVD) and partial (unimodal) independent component analysis (ICA). These techniques are used for wavefield separation and enhancement of the signal-to-noise ratio (SNR) in the data set. The use of multi-linear methods such as the HOSVD is motivated by the natural modeling of a multi-dimensional data set using multi-way arrays. In particular, we present a multi-way model for signals recorded on arrays of vector-sensors acquiring seismic vibrations in different directions of the 3D space. Such acquisition schemes allow the recording of the polarization of waves, and the proposed multi-way model ensures the effective use of polarization information in the processing. This leads to a substantial increase in the performance of the separation algorithms. Before introducing the multi-way model and processing, we first describe the classical subspace method based on the SVD and ICA techniques for 2D (matrix) seismic data sets. Using a matrix model for these data sets, the SVD-based subspace method is presented, and it is shown how an extra ICA step in the processing allows better wavefield separation. Then, considering signals recorded on vector-sensor arrays, the multi-way model is defined and discussed. The HOSVD is presented and some of its properties are detailed. Based on this multi-linear decomposition, we propose a subspace method that allows separation of polarized waves under orthogonality constraints. We then introduce an ICA step in the process that is performed here uniquely on the temporal mode of the data set, leading to the so-called HOSVD–unimodal ICA subspace algorithm. Results on simulated and real polarized data sets show the ability of this algorithm to surpass a matrix-based algorithm and a subspace method using only the HOSVD. Section 5.2 presents matrix data sets and their associated model.
In Section 5.3, the well-known SVD is detailed, as well as the matrix-based subspace method. Then, we present the ICA concept and its contribution to the subspace formulation in Section 5.3.2. Applications of SVD–ICA to seismic wavefield separation are discussed by way of illustrations. Section 5.4 shows how signal mixtures recorded on vector-sensor arrays can be described by a multi-way model. Then, in Section 5.5, we introduce the HOSVD and the associated subspace method for multi-way data processing. As in the matrix data set case, an extra ICA step is proposed, leading to a HOSVD–unimodal ICA subspace method in Section 5.5.2. Finally, in Section 5.5.3 and Section 5.5.4, we illustrate the proposed algorithm on simulated and real multi-way polarized data sets. These examples emphasize the potential of using both HOSVD and ICA in multi-way data set processing.

5.2 Matrix Data Sets

In this section, we show how the signals recorded on scalar-sensor arrays can be modeled as a matrix data set having two modes or diversities: time and distance. Such a model allows the use of subspace-based processing using an SVD of the matrix data set. Also, an additional ICA step can be added to the processing to relax the unjustified orthogonality constraint on the propagation vectors by imposing a stronger constraint of (fourth-order) independence of the estimated waves. Illustrations of these matrix algebra techniques are presented on a simulated data set. Application to a real ocean bottom seismic (OBS) data set can be found in Refs. [1,2].

5.2.1 Acquisition

In geophysics, the most commonly used method to describe the structure of the Earth is seismic reflection. This method provides images of the underground in 2D or 3D, depending on the geometry of the network of sensors used. Classical recorded data sets are usually gathered into a matrix having a time diversity, describing the time or depth propagation through the medium at each sensor, and a distance diversity, related to the aperture of the array. Several methods exist to gather data sets, and the most popular are common shotpoint gather, common receiver gather, and common midpoint gather [3]. Seismic processing consists of a series of elementary procedures used to transform field data, usually recorded in common shotpoint gathers, into 2D or 3D common midpoint stacked signals. Before stacking and interpretation, part of the processing is used to suppress unwanted coherent signals such as multiple waves, ground-roll (surface waves), and refracted waves, and also to cancel noise. To achieve this goal, several filters are classically applied to seismic data sets. The SVD is a popular method to separate an initial data set into signal and noise subspaces. In some applications [4,5], when wavefield alignment is performed, the SVD method allows separation of the aligned wave from the other wavefields.

5.2.2 Matrix Model

Consider a uniform linear array composed of Nx omni-directional sensors recording the contributions of P waves, with P < Nx. Such a record can be written mathematically using a convolutive model for seismic signals first suggested by Robinson [6]. Using the superposition principle, the discrete-time signal xk(m) (m is the time index) recorded on sensor k is a linear combination of the P waves received on the array together with an additive noise nk(m):

\[ x_k(m) = \sum_{i=1}^{P} a_{ki}\, s_i(m - m_{ki}) + n_k(m) \tag{5.1} \]

where $s_i(m)$ is the i-th source waveform, propagated through a transfer function supposed here to consist of a delay $m_{ki}$ and an attenuation factor $a_{ki}$. The noises $n_k(m)$ on each sensor are supposed centered, Gaussian, spatially white, and independent of the sources. In the sequel, the use of the SVD to separate waves is only of significant interest if the subspace occupied by the part of interest contained in the mixture is of low rank. Ideally, the SVD performs well when the rank is 1. Thus, to ensure good results, a preprocessing step is applied to the data set. This consists of alignment (delay correction) of a chosen high-amplitude wave. Denoting the aligned wave by $s_1(m)$, the model after alignment becomes:

\[ y_k(m) = a_{k1}\, s_1(m) + \sum_{i=2}^{P} a_{ki}\, s_i(m - m'_{ki}) + n'_k(m) \tag{5.2} \]

where $y_k(m) = x_k(m + m_{k1})$, $m'_{ki} = m_{ki} - m_{k1}$, and $n'_k(m) = n_k(m + m_{k1})$. In the following, we assume that the wave $s_1(m)$ is independent of the others and therefore independent of $s_i(m - m'_{ki})$.


Considering the simplified model of the received signals (Equation 5.2) and supposing $N_t$ time samples available, we define the matrix model of the recorded data set $Y \in \mathbb{R}^{N_x \times N_t}$ as

\[ Y = \{ y_{km} = y_k(m) \mid 1 \le k \le N_x,\ 1 \le m \le N_t \} \tag{5.3} \]

That is, the data matrix Y has rows that are the Nx signals yk(m) given in Equation 5.2. Such a model allows the use of matrix decomposition, and especially the SVD, for its processing.
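As a rough illustration of Equation 5.2 and Equation 5.3, the matrix model can be simulated directly. This is our own toy example; the wavelet shape, delays, and amplitudes are invented for illustration and are not specified in the chapter:

```python
import numpy as np

rng = np.random.default_rng(0)
Nx, Nt = 8, 512  # sensors and time samples, as in the chapter's example

# A hypothetical Ricker-like source pulse (the chapter does not specify
# the waveforms, so this shape is purely illustrative).
def wavelet(m, center, width=10.0):
    a = (m - center) / width
    return (1 - 2 * a**2) * np.exp(-a**2)

m = np.arange(Nt)
Y = np.zeros((Nx, Nt))
for k in range(Nx):
    # Aligned wave s1: same arrival time on every sensor (after alignment).
    Y[k] += 1.0 * wavelet(m, 100)
    # A second, non-aligned wave with a sensor-dependent delay m'_k2
    # and attenuation a_k2, following Equation 5.2.
    Y[k] += 0.7 * wavelet(m, 200 + 15 * k)
    # Additive white Gaussian noise n'_k(m).
    Y[k] += 0.1 * rng.standard_normal(Nt)

print(Y.shape)  # (8, 512): rows are the sensor signals y_k(m)
```

Because the aligned wave appears identically on all rows, it spans a rank-1 subspace of Y, which is exactly the situation in which the SVD-based separation of the next section works best.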

5.3 Matrix Processing

We now present the definition of the SVD of such a data matrix, which will be of use for its decomposition into orthogonal subspaces and in the associated wave separation technique.

5.3.1 SVD

As the SVD is a widely used matrix algebra technique, we only recall theoretical remarks here and redirect readers interested in computational issues to the book by Golub and Van Loan [7].

5.3.1.1 Definition

Any matrix $Y \in \mathbb{R}^{N_x \times N_t}$ can be decomposed into the product of three matrices as follows:

\[ Y = U D V^T \tag{5.4} \]

where $U$ is an $N_x \times N_x$ matrix, $D$ is an $N_x \times N_t$ pseudo-diagonal matrix with singular values $\{\lambda_1, \lambda_2, \ldots, \lambda_N\}$ on its diagonal, satisfying $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N \ge 0$ (with $N = \min(N_x, N_t)$), and $V$ is an $N_t \times N_t$ matrix. The columns of $U$ (respectively of $V$) are called the left (respectively right) singular vectors, $u_j$ (respectively $v_j$), and form orthonormal bases. Thus $U$ and $V$ are orthogonal matrices. The rank $r$ (with $r \le N$) of the matrix $Y$ is given by the number of nonvanishing singular values. Such a decomposition can also be rewritten as

\[ Y = \sum_{j=1}^{r} \lambda_j\, u_j v_j^T \tag{5.5} \]

where $u_j$ (respectively $v_j$) are the columns of $U$ (respectively $V$). This notation shows that the SVD allows any matrix to be expressed as a sum of $r$ rank-1 matrices.¹
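In NumPy, Equation 5.4 and the rank-1 expansion of Equation 5.5 can be checked directly (a minimal sketch; the random test matrix is ours):

```python
import numpy as np

# Equation 5.4/5.5 in NumPy: Y = U D V^T, equivalently a sum of rank-1 terms.
rng = np.random.default_rng(1)
Y = rng.standard_normal((8, 512))

U, lam, Vt = np.linalg.svd(Y, full_matrices=False)  # lam: l1 >= l2 >= ... >= 0

# Rebuild Y as the sum of r rank-1 matrices l_j u_j v_j^T (Equation 5.5).
r = len(lam)
Y_rebuilt = sum(lam[j] * np.outer(U[:, j], Vt[j, :]) for j in range(r))
print(np.allclose(Y, Y_rebuilt))  # True
```

Each term `np.outer(U[:, j], Vt[j, :])` is a column vector times a row vector, hence a rank-1 matrix, exactly as in the footnote.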

Subspace Method

The SVD has been widely used in signal processing [8] because it gives the best rank approximation (in the least-squares sense) of a given matrix [9]. This property allows denoising if the signal subspace is of relatively low rank. So, the subspace method consists of decomposing the data set into two orthogonal subspaces, with the first one, built from the $p$ singular vectors related to the $p$ highest singular values, being the best rank-$p$ approximation of the original data. Using the SVD notation of Equation 5.5, this can be written as follows for a data matrix $Y$ with rank $r$:¹

¹ Any matrix formed as the product of a column vector and a row vector has rank equal to 1 [7].

\[ Y = Y_{\mathrm{Signal}} + Y_{\mathrm{Noise}} = \sum_{j=1}^{p} \lambda_j\, u_j v_j^T + \sum_{j=p+1}^{r} \lambda_j\, u_j v_j^T \tag{5.6} \]

Orthogonality between the subspaces spanned by the two sets of singular vectors is ensured by the fact that the left and right singular vectors form orthonormal bases. From a practical point of view, the value of $p$ is chosen by finding an abrupt change of slope in the curve of relative singular values (relative meaning expressed as percentages of their sum) contained in the matrix $D$ defined in Equation 5.4. For cases where no "visible" change of slope can be found, the value of $p$ can be fixed at 1 for a perfect alignment of waves, or at 2 for an imperfect alignment or for dispersive waves [10].
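A minimal sketch of the subspace separation of Equation 5.6 follows; the slope-change rule for choosing p is implemented here as a simple largest-drop heuristic, which is our own illustrative choice, not the chapter's procedure:

```python
import numpy as np

# Split Y into signal and noise subspaces (Equation 5.6): keep the p leading
# singular triplets as signal, the remainder as noise.
def svd_subspaces(Y, p=None):
    U, lam, Vt = np.linalg.svd(Y, full_matrices=False)
    if p is None:
        rel = lam / lam.sum()                       # relative singular values
        p = int(np.argmax(rel[:-1] - rel[1:])) + 1  # largest change of slope
    Y_signal = (U[:, :p] * lam[:p]) @ Vt[:p, :]
    return Y_signal, Y - Y_signal, p

# A rank-1 "aligned wave" plus weak noise: the heuristic should pick p = 1.
rng = np.random.default_rng(2)
wave = np.outer(np.ones(8), np.sin(np.linspace(0, 20, 512)))
Y = wave + 0.05 * rng.standard_normal((8, 512))
Y_sig, Y_noise, p = svd_subspaces(Y)
print(p)  # 1
```

By construction the two parts sum back to Y, and they are orthogonal because the singular vectors are.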

5.3.2 SVD and ICA

The motivation for relaxing the unjustified orthogonality constraint on the propagation vectors is now presented. ICA is the method used to achieve this by imposing fourth-order independence on the estimated waves. This provides a new subspace method based on SVD–ICA.

5.3.2.1 Motivation

The SVD of the data matrix $Y$ in Equation 5.4 provides two orthogonal matrices composed of the left $u_j$ (respectively right $v_j$) singular vectors. Note here that the $v_j$ are called estimated waves, because they give the time dependence of the signals received by the array sensors, and the $u_j$ propagation vectors, because they give the amplitudes of the $v_j$'s on the sensors [2]. As the SVD provides orthogonal matrices, these vectors are also orthogonal. Orthogonality of the $v_j$'s means that the estimated waves are decorrelated (second-order independence). Actually, this suits the usual geophysical situations, in which the recorded waves are supposed decorrelated. However, there is no physical reason to impose orthogonality on the propagation vectors $u_j$. Why should different recorded waves have orthogonal propagation vectors? Furthermore, by imposing the orthogonality of the $u_j$'s, the estimated waves $v_j$ are forced to be a mixture of the recorded waves [1]. One way to relax this limitation is to impose a stronger criterion on the estimated waves, that is, to be fourth-order statistically independent, and consequently to drop the unjustified orthogonality constraint on the propagation vectors. This step is motivated by cases encountered in geophysical situations, where the recorded signals can be approximated as an instantaneous linear mixture of unknown waves supposed to be mutually independent [11]. This can be done using ICA.

5.3.2.2 Independent Component Analysis

ICA is a blind decomposition of a multi-channel data set composed of an unknown linear mixture of unknown source signals, based on the assumption that these signals are mutually statistically independent. It is used in blind source separation (BSS) to recover independent sources (modeled as vectors) from a set of recordings containing linear combinations of these sources [12–15]. The statistical independence of the sources means that their cross-cumulants of any order vanish. The third-order cumulants are usually discarded because they are generally close to zero. Therefore, here we will use fourth-order statistics, which have been found to be sufficient for instantaneous mixtures [12,13].


ICA is usually resolved by a two-step algorithm: prewhitening followed by a higher-order step. The first step consists of extracting decorrelated waves from the initial data set; it is carried out directly by an SVD, as the $v_j$'s are orthogonal. The second step consists of finding a rotation matrix $B$ that leads to fourth-order independence of the estimated waves. We suppose here that the nonaligned waves in the data set $Y$ are contained in a subspace of dimension $R - 1$, smaller than the rank $r$ of $Y$. Assuming this, only the first $R$ estimated waves $[v_1, \ldots, v_R] = V_R \in \mathbb{R}^{N_t \times R}$ are taken into account [2]. As the recorded waves are supposed mutually independent, this second step can be written as

\[ V_R B = \tilde{V}_R = [\tilde{v}_1, \ldots, \tilde{v}_R] \in \mathbb{R}^{N_t \times R} \tag{5.7} \]


with $B \in \mathbb{R}^{R \times R}$ the rotation (unitary) matrix having the property $B B^T = B^T B = I$. The new estimated waves $\tilde{v}_j$ are now independent at the fourth order. There are different methods of finding the rotation matrix: joint approximate diagonalization of eigenmatrices (JADE) [12], maximal diagonality (MD) [13], simultaneous third-order tensor diagonalization (STOTD) [14], fast and robust fixed-point algorithms for independent component analysis (FastICA) [15], and so on. To compare some of the cited ICA algorithms, Figure 5.1 shows the relative error (see Equation 5.12) of the estimated signal subspace versus the SNR (see Equation 5.11) for the data set presented in Section 5.3.3. For SNRs greater than −7.5 dB, FastICA using a "tanh" nonlinearity with the parameter equal to 1 in the fixed-point algorithm provides the smallest relative error, but with some erroneous points at different SNRs. Note that the "tanh" nonlinearity is the one that gives the smallest error for this data set, compared with the "pow3", "gauss" (with the parameter equal to 1), or "skew" nonlinearities. The MD and JADE algorithms are approximately equivalent according to the relative error. For SNRs smaller than −7.5 dB, MD provides the smallest relative error. Consequently, the MD algorithm was employed in the following. Now, considering the SVD decomposition in Equation 5.5 and the ICA step in Equation 5.7, the subspace described by the first $R$ estimated waves can be rewritten as

FIGURE 5.1 Comparison of ICA algorithms (JADE, FastICA–tanh, MD): relative error (%) of the estimated signal subspace versus the SNR (dB) of the data set.

\[ \sum_{j=1}^{R} \lambda_j\, u_j v_j^T = U_R D_R V_R^T = U_R D_R B\, \tilde{V}_R^T = \sum_{j=1}^{R} b_j\, \tilde{u}_j \tilde{v}_j^T = \sum_{i=1}^{R} b_i\, \tilde{u}_i \tilde{v}_i^T \tag{5.8} \]

where $U_R = [u_1, \ldots, u_R]$ is made up of the first $R$ vectors of the matrix $U$, and $D_R = \mathrm{diag}(\lambda_1, \ldots, \lambda_R)$ is the $R \times R$ truncated version of $D$ containing the greatest values $\lambda_j$. The second equality is obtained using Equation 5.7. For the third equality, the $\tilde{u}_j$ are the new propagation vectors, obtained as the normalized² vectors (columns) of the matrix $U_R D_R B$, and the $b_j$ are the "modified singular values", obtained as the $\ell_2$-norms of the columns of the matrix $U_R D_R B$. The elements $b_j$ are usually not ordered. For this reason, a permutation of the vectors $\tilde{u}_j$ as well as of the vectors $\tilde{v}_j$ is performed to order the modified singular values. Denoting this permutation by $\sigma(\cdot)$ and letting $i = \sigma(j)$, the last equality of Equation 5.8 is obtained. In this decomposition, which is similar to that given by Equation 5.5, a stronger criterion has been imposed on the new estimated waves $\tilde{v}_i$, that is, to be independent at the fourth order, and, at the same time, the condition of orthogonality on the new propagation vectors $\tilde{u}_i$ has been relaxed. In practical situations, the value of $R$ becomes a parameter. Usually, it is chosen so that the aligned wave is completely described by the first $R$ estimated waves given by the SVD.

5.3.2.3 Subspace Method Using SVD–ICA

After the ICA and permutation steps, the signal subspace is given by

~p X

~iv ~Ti bi u

(5:9)

i ¼1

where $\tilde{p}$ is the number of new estimated waves necessary to describe the aligned wave. The noise subspace $\tilde{Y}_{\mathrm{Noise}}$ is obtained by subtracting the signal subspace $\tilde{Y}_{\mathrm{Signal}}$ from the original data set $Y$:

\[ \tilde{Y}_{\mathrm{Noise}} = Y - \tilde{Y}_{\mathrm{Signal}} \tag{5.10} \]

From a practical point of view, the value of $\tilde{p}$ is chosen by finding an abrupt change of slope in the curve of relative modified singular values. For cases with low SNR, no "visible" change of slope can be found, and the value of $\tilde{p}$ can be fixed at 1 for a perfect alignment of waves, or at 2 for an imperfect alignment or for dispersive waves. Note here that for a very small SNR of the initial data set (for example, smaller than −6.2 dB for the data set presented in Section 5.3.3), the aligned wave can be described by a less energetic estimated wave than the first one (related to the highest singular value). For these extreme cases, a search must be done after the ICA and permutation steps to identify the indexes for which the corresponding estimated waves $\tilde{v}_i$ give the aligned wave. The signal subspace $\tilde{Y}_{\mathrm{Signal}}$ in Equation 5.9 must then be redefined by choosing the index values found in the search. For example, applying the MD algorithm to the data set presented in Section 5.3.3, with its SNR modified to −9 dB, the aligned wave is described by the third estimated wave $\tilde{v}_3$. Note also that using the SVD without ICA under the same conditions, the aligned wave is described by the eighth estimated wave $v_8$.

² Vectors are normalized by their $\ell_2$-norm.
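The prewhitening-plus-rotation scheme of Equation 5.7 can be illustrated on a toy R = 2 case: an SVD decorrelates the data, and a Givens rotation B is then chosen to maximize a fourth-order contrast. The brute-force search over the sum of squared kurtoses below is a stand-in for the JADE/MD/FastICA algorithms cited above, not an implementation of them; all names, sources, and the mixing matrix are our own choices:

```python
import numpy as np

# Fourth-order contrast: excess kurtosis of a standardized signal.
def kurt(v):
    v = (v - v.mean()) / v.std()
    return (v**4).mean() - 3.0

# Brute-force search over 2-D Givens rotations for the B of Equation 5.7,
# maximizing the sum of squared kurtoses of the rotated waves.
def ica_rotation(VR, n_angles=3600):
    best_B, best_score = np.eye(2), -np.inf
    for theta in np.linspace(0.0, np.pi / 2, n_angles, endpoint=False):
        c, s = np.cos(theta), np.sin(theta)
        B = np.array([[c, -s], [s, c]])
        W = VR @ B  # candidate estimated waves (Equation 5.7)
        score = kurt(W[:, 0])**2 + kurt(W[:, 1])**2
        if score > best_score:
            best_B, best_score = B, score
    return best_B

rng = np.random.default_rng(3)
# Two independent, non-Gaussian sources (binary and uniform), then mixed.
S = np.column_stack([np.sign(rng.standard_normal(2000)),
                     rng.uniform(-1.0, 1.0, 2000)])
Y = S @ np.array([[1.0, 0.6], [0.2, 1.0]]).T

# Prewhitening: the right singular vectors v_j are orthonormal, hence the
# estimated waves are decorrelated.
U, lam, Vt = np.linalg.svd(Y.T, full_matrices=False)
VR = Vt.T                 # N_t x R matrix of decorrelated waves
B = ica_rotation(VR)
V_tilde = VR @ B          # fourth-order independent estimated waves
```

Up to sign, permutation, and scale, the columns of V_tilde match the original sources far better than the raw SVD waves do, which is exactly the point of adding the rotation step.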

5.3.3 Application

An application to a simulated data set is presented in this section to illustrate the behavior of the SVD–ICA versus the SVD subspace method. Application to a real data set obtained during an acquisition with OBS can be found in Refs. [1,2]. The preprocessed recorded signals $Y$ on an 8-sensor array ($N_x = 8$) during $N_t = 512$ time samples are represented in Figure 5.2c. This synthetic data set was obtained by the addition of an original signal subspace $S$ (Figure 5.2a), made up of a wavefront having infinite celerity (velocity) and consequently associated with the aligned wave $s_1(m)$, and an original noise subspace $N$ (Figure 5.2b), made up of several nonaligned wavefronts. These nonaligned waves are contained in a subspace of dimension 7, smaller than the rank of $Y$, which equals 8. The SNR of the presented data set is $\mathrm{SNR} = -3.9$ dB. The SNR definition used here is³:

\[ \mathrm{SNR} = 20 \log_{10} \frac{\|S\|}{\|N\|} \tag{5.11} \]

Normalization to unit variance of each trace for each component was done before applying the described subspace methods. This ensures that even weak picked arrivals are well represented within the input data. After the computation of the signal subspaces, a denormalization was applied to recover the original signal subspace. Firstly, the SVD subspace method was tested. The subspace method given by Equation 5.6 was employed, keeping only one singular vector (respectively one singular value). This choice was made by finding an abrupt change of slope after the first singular value (Figure 5.6) in the relative singular values for this data set. The obtained signal subspace $Y_{\mathrm{Signal}}$ and noise subspace $Y_{\mathrm{Noise}}$ are presented in Figure 5.3a and Figure 5.3b. It is clear

FIGURE 5.2 Synthetic data set. (a) Original signal subspace S; (b) original noise subspace N; (c) recorded signals Y = S + N (distance in sensors versus time in samples).

3. ||A|| = sqrt(Σ_{i=1..I} Σ_{j=1..J} a_ij²) is the Frobenius norm of the matrix A = {a_ij} ∈ R^{I×J}.

Multi-Dimensional Seismic Data Decomposition by Higher Order SVD and Unimodal ICA

FIGURE 5.3 Results obtained using the SVD subspace method. (a) Estimated signal subspace YSignal; (b) estimated noise subspace YNoise; (c) estimated waves vj, plotted against index i.

from these figures that the classical SVD introduces artifacts in the two estimated subspaces with respect to the wavefield separation objective. Moreover, the estimated waves vj shown in Figure 5.3c are an instantaneous linear mixture of the recorded waves.

The signal subspace ỸSignal and noise subspace ỸNoise obtained using the SVD–ICA subspace method given by Equation 5.9 are presented in Figure 5.4a and Figure 5.4b. The improvement is due to the fact that, using ICA, we have imposed a fourth-order independence condition, stronger than the decorrelation used in classical SVD. With this subspace method we have also relaxed the nonphysically justified orthogonality of the propagation vectors. The dimension R of the rotation matrix B was chosen to be eight because the aligned wave is projected on all eight estimated waves vj shown in Figure 5.3c. After the ICA and the permutation steps, the new estimated waves ṽi are presented in Figure 5.4c. As we can see, the first one describes the aligned wave "perfectly". As no visible change of slope can be found in the relative modified singular values shown in Figure 5.6, the value of p̃ was fixed at 1 because we are dealing with a perfectly aligned wave.

To compare the results qualitatively, the stack representation is usually employed [5]. Figure 5.5 shows, from left to right, the stacks on the initial data set Y, the original signal subspace S, and the estimated signal subspaces obtained with the SVD and SVD–ICA subspace methods, respectively. As the stack on the estimated signal subspace ỸSignal is very close to the stack on the original signal subspace S, we can conclude that the SVD–ICA subspace method enhances the wave separation results. To compare these methods quantitatively, we use the relative error ε of the estimated signal subspace, defined as

ε = ||S − YSignal||² / ||S||²   (5.12)
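The relative error of Equation 5.12 is straightforward to compute; a minimal sketch (the function name and the toy matrices are ours):

```python
import numpy as np

def relative_error_pct(S, S_hat):
    """Relative error of Equation 5.12, in percent:
    100 * ||S - S_hat||_F^2 / ||S||_F^2."""
    return 100.0 * np.linalg.norm(S - S_hat) ** 2 / np.linalg.norm(S) ** 2

# a perfect estimate gives 0%, the all-zero estimate gives 100%
S = np.outer(np.ones(4), np.hanning(32))
print(relative_error_pct(S, S))                 # 0.0
print(relative_error_pct(S, np.zeros_like(S)))  # 100.0
```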


FIGURE 5.4 Results obtained using the SVD–ICA subspace method. (a) Estimated signal subspace ỸSignal; (b) estimated noise subspace ỸNoise; (c) estimated waves ṽi, plotted against index i.

FIGURE 5.5 Stacks. From left to right: initial data set Y, original signal subspace S, SVD, and SVD–ICA estimated subspaces.


FIGURE 5.6 Relative singular values (%) versus index j, for SVD and SVD–ICA.

where ||·|| is the matrix Frobenius norm defined above, S is the original signal subspace, and YSignal represents either the estimated signal subspace YSignal obtained using SVD or the estimated signal subspace ỸSignal obtained using SVD–ICA. For the data set presented in Figure 5.2, we obtain ε = 55.7% for classical SVD and ε = 0.5% for SVD–ICA.

The SNR of this data set was modified by keeping the initial noise subspace constant and by adjusting the energy of the initial signal subspace. The relative errors of the estimated signal subspaces versus the SNR are plotted in Figure 5.7. For SNRs greater than 17 dB, the two methods are equivalent. For smaller SNRs, the SVD–ICA subspace method is clearly better than the SVD subspace method: it provides a relative error lower than 1% for SNRs greater than -10 dB. Note that for other data sets, the SVD–ICA performance can be degraded if the independence assumption made for the aligned wave is not fulfilled. However, for a small SNR of the data set, SVD–ICA usually performs better than SVD. The ICA step leads to a fourth-order independence of the estimated waves and relaxes the unjustified orthogonality constraint on the propagation vectors. This step enhances the wave separation results and minimizes the error on the estimated signal subspace, especially when the SNR is low.

5.4 Multi-Way Array Data Sets


We now turn to the modeling and processing of data sets having more than two modes or diversities. Such data sets are recorded by arrays of vector-sensors (also called multi-component sensors) collecting, in addition to time and distance information, the polarization information. Note that there exist other acquisition schemes that output multi-way (or multi-dimensional, multi-modal) data sets, but they are not considered here.

FIGURE 5.7 Relative error (%) of the estimated signal subspaces versus the SNR (dB) of the data set, for SVD and SVD–ICA.


5.4.1 Multi-Way Acquisition

In seismic acquisition campaigns, multi-component sensors have been used for more than ten years now. Such sensors allow the recording of the polarization of seismic waves. Thus, arrays of such sensors provide useful information about the nature of the propagated wavefields and allow a more complete description of the underground structures. The polarization information is very useful to differentiate and characterize waves in the recorded signals, but the specific (multi-component) nature of the data sets has to be taken into account in the processing. The use of vector-sensor arrays provides data sets with time, distance, and polarization modes, which are called trimodal or three-mode data sets. Here we propose a multi-way model to represent and process them.

5.4.2 Multi-Way Model

To keep the trimodal (multi-dimensional) structure of data sets originating from vector-sensor arrays in their processing, we propose a multi-way model. This model is an extension of the one proposed in Section 5.2.2. Thus, a three-mode data set is modeled as a multi-way array of size Nc × Nx × Nt, where Nc is the number of components of each sensor used to recover the vibrations of the wavefield in the three directions of the 3D space, Nx is the number of sensors of the vector-sensor array, and Nt is the number of time samples. Note that the number of components is defined by the vector-sensor configuration. As an example, Nc = 2 for the illustration shown in Section 5.5.3 because one geophone and one hydrophone were used, while Nc = 3 in Section 5.5.4 because three geophones were used.

Supposing that the propagation of waves only introduces delay and attenuation, using the superposition principle and assuming that P waves impinge on the array of vector-sensors, the signal recorded on the cth component (c = 1, ..., Nc) of the kth sensor (k = 1, ..., Nx) can be written as

xck(m) = Σ_{i=1}^{P} acki si(m − mki) + nck(m)   (5.13)

where acki represents the attenuation of the ith wave on the cth component of the kth sensor of the array, si(m) is the ith wave, mki is the delay observed at sensor k, and m is the time index. nck(m) is the noise, supposed Gaussian, centered, spatially white, and independent of the waves.

As in the matrix processing approach, preprocessing is needed to ensure a low rank of the signal subspace and good results for a subspace-based processing method. Thus, a velocity correction applied on the dominant waveform (compensation of mk1) leads, for the signal recorded on component c of sensor k, to:

yck(m) = ack1 s1(m) + Σ_{i=2}^{P} acki si(m − m′ki) + n′ck(m)   (5.14)

where yck(m) = xck(m + mk1), m′ki = mki − mk1, and n′ck(m) = nck(m + mk1). In the sequel, the wave s1(m) is considered independent of the other waves and of the noise. The subspace method developed hereafter is intended to isolate and correctly estimate this wave. Thus, three-mode data sets recorded during Nt time samples on vector-sensor arrays made up of Nx sensors, each one having Nc components, can be modeled as multi-way arrays Y ∈ R^{Nc×Nx×Nt}:

Y = {yckm = yck(m) | 1 ≤ c ≤ Nc, 1 ≤ k ≤ Nx, 1 ≤ m ≤ Nt}   (5.15)
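The alignment preprocessing of Equation 5.14 and the stacking into the three-mode array of Equation 5.15 can be sketched as follows. This is a simplified sketch assuming integer-sample delays mk1 already picked on the dominant wave; `np.roll` wraps around at the trace ends, which a real implementation would handle by tapering or padding. The function name is ours:

```python
import numpy as np

def build_three_mode_array(x, delays):
    """x[c, k, m]: raw traces (component, sensor, time).
    Compensate the delay m_k1 of the dominant wave on each sensor,
    y_ck(m) = x_ck(m + m_k1), and return Y in R^{Nc x Nx x Nt}
    (Equations 5.14-5.15)."""
    Y = np.empty_like(x)
    for k, d in enumerate(delays):
        # advance all components of sensor k by d samples
        Y[:, k, :] = np.roll(x[:, k, :], -d, axis=-1)
    return Y
```

After this correction the dominant wave s1(m) appears with infinite apparent velocity (aligned) along the distance mode, which is what keeps the signal subspace low-rank.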

This multi-way model can be used to extend the subspace separation methods to multi-component data sets.

5.5 Multi-Way Array Processing

Multi-way data analysis first arose in the field of psychometrics with Tucker [16]. It is still an active field of research and has found applications in many areas such as chemometrics, signal processing, communications, biological data analysis, the food industry, etc. It is an accepted fact that there exists no exact extension of the SVD for multi-way arrays of dimension greater than 2. Instead of such an extension, there exist mainly two decompositions: PARAFAC [17] and HOSVD [14,16]. The first one is also known as CANDECOMP, as it gives a canonical decomposition of a multi-way array, that is, it expresses the multi-way array as a sum of rank-1 arrays. Note that the rank-1 arrays in the PARAFAC decomposition may not be orthogonal, unlike in the matrix case. The second multi-way array decomposition, HOSVD, gives orthogonal bases in the three modes of the array but is not a canonical decomposition, as it does not express the original array as a sum of rank-1 arrays. However, in the sequel, we will make use of the HOSVD because of the orthogonal bases, which allow the extension of well-known SVD-based subspace methods to multi-way data sets.

5.5.1 HOSVD

We now introduce the HOSVD, which was formulated and studied in detail in Ref. [14]. We give particular attention to the three-mode case because we will process such data in the sequel, but an extension to the multi-dimensional case exists [14]. One must notice that in the trimodal case the HOSVD is equivalent to the TUCKER3 model [16]; however, the HOSVD has a formulation that is more familiar to the signal processing community, as its expression is given in terms of matrices of singular vectors, just as the SVD in the matrix case.

5.5.1.1 HOSVD Definition

Consider a multi-way array Y ∈ R^{Nc×Nx×Nt}. The HOSVD of Y is given by

Y = C ×1 V(c) ×2 V(x) ×3 V(t)   (5.16)

where C ∈ R^{Nc×Nx×Nt} is called the core array and V(i) = [v(i)1, ..., v(i)j, ..., v(i)ri] ∈ R^{Ni×ri} are matrices containing the singular vectors v(i)j ∈ R^{Ni} of Y in the three modes (i = c, x, t). These matrices are orthogonal, V(i)^T V(i) = I, just as in the matrix case. A schematic representation of the HOSVD is given in Figure 5.8.

The core array C is the counterpart of the diagonal matrix D in the SVD case in Equation 5.4, except that it is not hyperdiagonal but fulfills the less restrictive property of being all-orthogonal. All-orthogonality is defined as

⟨C_{i=a}, C_{i=b}⟩ = 0, where i = c, x, t and a ≠ b
||C_{i=1}|| ≥ ||C_{i=2}|| ≥ ... ≥ ||C_{i=ri}|| ≥ 0, ∀i   (5.17)


FIGURE 5.8 HOSVD of a three-mode data set Y: the core array C and the mode matrices V(c) (Nc × rc), V(x) (Nx × rx), and V(t) (Nt × rt).

where ⟨·,·⟩ is the classical scalar product between matrices4 and ||·|| is the matrix Frobenius norm defined in Section 5.2 (because here we deal with three-mode data, the "slices" C_{i=a} define matrices). Thus, ⟨C_{i=a}, C_{i=b}⟩ = 0 corresponds to orthogonality between slices of the core array. Clearly, the all-orthogonality property consists of orthogonality between any two slices (matrices) of the core array cut in the same mode, together with an ordering of the norms of these slices. The ordering property is the counterpart of the decreasing arrangement of the singular values in the SVD [7], with the special feature of being valid here for the norms of the slices of C in its three modes. As a consequence, the "energy" of the three-mode data set Y is concentrated at the (1,1,1) corner of the core array C.

The notation ×n in Equation 5.16 is called the n-mode product, and three such products (×1, ×2, and ×3) can be defined in the three-mode case. Given a multi-way array A ∈ R^{I1×I2×I3}, the three possible n-mode products of A with matrices are:

(A ×1 B)_{j i2 i3} = Σ_{i1} a_{i1 i2 i3} b_{j i1}
(A ×2 C)_{i1 j i3} = Σ_{i2} a_{i1 i2 i3} c_{j i2}
(A ×3 D)_{i1 i2 j} = Σ_{i3} a_{i1 i2 i3} d_{j i3}   (5.18)

where B ∈ R^{J×I1}, C ∈ R^{J×I2}, and D ∈ R^{J×I3}. This is a general notation in (multi-)linear algebra, and even the SVD of a matrix can be expressed with such products; for example, the SVD given in Equation 5.4 can be rewritten, using n-mode products, as Y = D ×1 U ×2 V [10].

5.5.1.2 Computation of the HOSVD

The problem of finding the elements of a three-mode decomposition was originally solved using alternating least squares (ALS) techniques (see Ref. [18] for details). It was only in Ref. [14] that a technique based on SVDs of unfolding matrices was proposed. We briefly present here a way to compute the HOSVD using this approach. From a multi-way array Y ∈ R^{Nc×Nx×Nt}, it is possible to build three unfolding matrices, with respect to the three modes c, x, and t, in the following way:

Y ∈ R^{Nc×Nx×Nt}  ⇒  Y(c) ∈ R^{Nc×(Nx·Nt)},  Y(x) ∈ R^{Nx×(Nt·Nc)},  Y(t) ∈ R^{Nt×(Nc·Nx)}   (5.19)

4. ⟨A, B⟩ = Σ_{i=1..I} Σ_{j=1..J} aij bij is the scalar product between the matrices A = {aij} ∈ R^{I×J} and B = {bij} ∈ R^{I×J}.

FIGURE 5.9 Schematic representation of the unfolding matrices Y(c), Y(x), and Y(t).

A schematic representation of these unfolding matrices is presented in Figure 5.9. These three unfolding matrices then admit the following SVD decompositions:

Y(i)^T = U(i) D(i) V(i)^T   (5.20)

where each matrix V(i) (i = c, x, t) in Equation 5.16 is the right matrix given by the SVD of the corresponding transposed unfolding matrix Y(i)^T. The choice of the transpose was made to keep a homogeneous notation between matrix and multi-way processing. The matrices V(i) define orthonormal bases in the three modes of the vector space R^{Nc×Nx×Nt}. The core array is then directly obtained using the formula:

C = Y ×1 V(c)^T ×2 V(x)^T ×3 V(t)^T   (5.21)

The singular values contained in the three matrices D(i) (i = c, x, t) in Equation 5.20 are called three-mode singular values. Thus, the HOSVD of a multi-way array can easily be obtained from the SVDs of its unfolding matrices, which makes this decomposition easy to compute using already existing algorithms.
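The unfolding of Equation 5.19, the n-mode product of Equation 5.18, and the core computation of Equation 5.21 translate directly into numpy. This is a sketch with our own helper names; we take V(i) as the left singular vectors of Y(i), which equal the right singular vectors of Y(i)^T used in Equation 5.20:

```python
import numpy as np

def unfold(A, mode):
    """Mode-n unfolding: rows indexed by the chosen mode (Equation 5.19)."""
    return np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1)

def mode_product(A, M, mode):
    """n-mode product of Equation 5.18: the `mode` axis of A is summed
    against the columns of M and replaced by an axis of length M.shape[0]."""
    return np.moveaxis(np.tensordot(M, A, axes=(1, mode)), 0, mode)

def hosvd(Y):
    """HOSVD (Equations 5.16, 5.20, 5.21) via SVDs of the unfoldings."""
    V = [np.linalg.svd(unfold(Y, n), full_matrices=False)[0]
         for n in range(Y.ndim)]
    C = Y
    for n, Vn in enumerate(V):  # core: C = Y x1 Vc^T x2 Vx^T x3 Vt^T
        C = mode_product(C, Vn.T, n)
    return C, V
```

The reconstruction Y = C ×1 V(c) ×2 V(x) ×3 V(t) is exact, and the slices of the core are mutually orthogonal in each mode (the all-orthogonality of Equation 5.17).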

5.5.1.3 The (rc, rx, rt)-rank

Given a three-mode data set Y ∈ R^{Nc×Nx×Nt}, one gets three unfolding matrices Y(c), Y(x), and Y(t), with respective ranks rc, rx, and rt. That is, the ri's are given as the number of nonvanishing singular values contained in the matrices D(i) in Equation 5.20, with i = c, x, t. As mentioned before, the HOSVD is not a canonical decomposition and so is not related to the generic rank (the number of rank-1 arrays that lead to the original array by linear combination). Nevertheless, the HOSVD gives other information named, in the three-mode case, the three-mode rank. The three-mode rank consists of a triplet of ranks, the (rc, rx, rt)-rank, which is made up of the ranks of the matrices Y(c), Y(x), and Y(t) in the HOSVD. In the sequel, the three-mode rank will be used for the determination of subspace dimensions, and so will be the counterpart of the classical rank used in matrix processing techniques.


5.5.1.4 Three-Mode Subspace Method

As in the matrix case, it is possible, using the HOSVD, to define a subspace method that decomposes the original three-mode data set into orthogonal subspaces. Such a technique was first proposed in Ref. [10] and can be stated as follows. Given a three-mode data set Y ∈ R^{Nc×Nx×Nt}, it is possible to decompose it into the sum of a signal and a noise subspace as

Y = YSignal + YNoise   (5.22)

with a weaker constraint than in the matrix case (orthogonality), namely mode orthogonality, that is, orthogonality in the three modes [10] (between subspaces defined using the unfolding matrices). Just as in the matrix case, the signal and noise subspaces are formed by different vectors obtained from the decomposition of the data set. In the three-mode case, the signal subspace YSignal is built using the first pc ≤ rc singular vectors in the first mode, px ≤ rx in the second, and pt ≤ rt in the third:

YSignal = Y ×1 P_{V(c)}^{pc} ×2 P_{V(x)}^{px} ×3 P_{V(t)}^{pt}   (5.23)

with the projectors P_{V(i)}^{pi} given by

P_{V(i)}^{pi} = V(i)^{pi} (V(i)^{pi})^T   (5.24)

where V(i)^{pi} = [v(i)1, ..., v(i)pi] are the matrices containing the first pi singular vectors (i = c, x, t). After estimation of the signal subspace, the noise subspace is simply obtained by subtraction, that is, YNoise = Y − YSignal.

The estimation of the signal subspace consists in finding a triplet of values pc, px, pt that allows recovery of the signal part by the (pc, px, pt)-rank truncation of the original data set Y. This truncation is obtained by classical matrix truncation of the three SVDs of the unfolding matrices. However, it is important to note that such a truncation is not the best (pc, px, pt)-rank approximation of the data [10]. Nevertheless, the decomposition of the original three-mode data set is possible and leads, under some assumptions, to the separation of the recorded wavefields. From a practical point of view, the choice of the pc, px, pt values is made by finding abrupt changes of slope in the curves of relative three-mode singular values (the three sets of singular values contained in the matrices D(i)). For some special cases for which no "visible" change of slope can be found, the value of pc can be fixed at 1 for a linear polarization of the aligned wavefield (denoted by s1(m)), or at 2 for an elliptical polarization [10]. The value of px can be fixed at 1, and the value of pt can be fixed at 1 for a perfect alignment of waves, or at 2 for an imperfect alignment or for dispersive waves.

As in the matrix case, the HOSVD-based subspace method decomposes the original space of the data set into orthogonal subspaces, and so, following the ideas developed in Section 5.3.2, it is possible to add an ICA step to modify the orthogonality constraint in the temporal mode.
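The (pc, px, pt)-rank truncation of Equations 5.23 and 5.24 can be sketched as follows. The helper names are ours; `V` is the list of mode matrices from the HOSVD (computed, for example, as in Section 5.5.1.2):

```python
import numpy as np

def projector(V, p):
    """P_V^p = V_p V_p^T from the first p singular vectors (Equation 5.24)."""
    Vp = V[:, :p]
    return Vp @ Vp.T

def hosvd_signal_subspace(Y, V, ranks):
    """(pc, px, pt)-rank truncation of Equations 5.22-5.23:
    Y_signal = Y x1 P^pc x2 P^px x3 P^pt; Y_noise = Y - Y_signal."""
    Ys = Y
    for n, (Vn, pn) in enumerate(zip(V, ranks)):
        P = projector(Vn, pn)
        Ys = np.moveaxis(np.tensordot(P, Ys, axes=(1, n)), 0, n)
    return Ys, Y - Ys
```

On a data set dominated by one linearly polarized aligned wave, a (1,1,1)-rank truncation (as used in Section 5.5.3) keeps the signal part and rejects most of the noise.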

5.5.2 HOSVD and Unimodal ICA

To enhance wavefield separation results, we now introduce a unimodal ICA step following the HOSVD-based subspace decomposition.

5.5.2.1 HOSVD and ICA

The SVD of the unfolded matrix Y(t) in Equation 5.20 provides two orthogonal matrices U(t) and V(t), made up of the left and right singular vectors u(t)j and v(t)j. As in the SVD case, the v(t)j's are the estimated waves and the u(t)j's are the propagation vectors. Based on the same motivations as in the SVD case, the unjustified orthogonality constraint on the propagation vectors can be relaxed by imposing fourth-order independence on the estimated waves. Assuming the recorded waves are mutually independent, we can write:

V(t)^R B = Ṽ(t)^R = [ṽ(t)1, ..., ṽ(t)R]   (5.25)

with B ∈ R^{R×R} the rotation (unitary) matrix given by one of the algorithms presented in Section 5.3.2, and V(t)^R = [v(t)1, ..., v(t)R] ∈ R^{Nt×R} made up of the first R vectors of V(t). Here we also suppose that the nonaligned waves in the unfolded matrix Y(t) are contained in a subspace of dimension R − 1, smaller than the rank rt of Y(t). After the ICA step, a new matrix can be computed:

Ṽ(t) = [Ṽ(t)^R ; V(t)^{Nt−R}]   (5.26)

This matrix is made up of the R vectors ṽ(t)j of the matrix Ṽ(t)^R, which are independent at the fourth order, and of the last Nt − R vectors v(t)j of the matrix V(t), which are kept unchanged. The HOSVD–unimodal ICA decomposition is defined as

Y = C̃ ×1 V(c) ×2 V(x) ×3 Ṽ(t)   (5.27)

with

C̃ = Y ×1 V(c)^T ×2 V(x)^T ×3 Ṽ(t)^T   (5.28)

Unimodal ICA means here that ICA is performed on one mode only (the temporal mode). As in the SVD case, a permutation σ(·) of the vectors of V(c), V(x), and Ṽ(t), respectively, must be performed to order the Frobenius norms of the subarrays (obtained by fixing one index) of the new core array C̃. Hence, we keep the same decomposition structure as in the relations given in Equation 5.16 and Equation 5.21; the only difference is that we have replaced orthogonality with a fourth-order independence constraint for the first R estimated waves in the third mode. Note that the (rc, rx, rt)-rank of the three-mode data set Y is unchanged.

5.5.2.2 Subspace Method Using HOSVD–Unimodal ICA

On the temporal mode, a new projector can be computed after the ICA and permutation steps:

P̃_{Ṽ(t)}^{p̃t} = Ṽ(t)^{p̃t} (Ṽ(t)^{p̃t})^T   (5.29)

where Ṽ(t)^{p̃t} = [ṽ(t)1, ..., ṽ(t)p̃t] is the matrix containing the first p̃t estimated waves, which are the columns of Ṽ(t) defined in Equation 5.26. Note that the two projectors on the first two modes, P_{V(c)}^{pc} and P_{V(x)}^{px}, given by Equation 5.24, must be recomputed after the permutation step.


The signal subspace using HOSVD–unimodal ICA is thus given as

ỸSignal = Y ×1 P_{V(c)}^{pc} ×2 P_{V(x)}^{px} ×3 P̃_{Ṽ(t)}^{p̃t}   (5.30)

and the noise subspace ỸNoise is obtained by subtracting the signal subspace from the original three-mode data:

ỸNoise = Y − ỸSignal   (5.31)
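The bookkeeping of Equations 5.25 through 5.31 can be sketched as follows. The unitary matrix B is assumed to come from one of the ICA algorithms of Section 5.3.2 (it is an input here), the permutation step is omitted for brevity, and the function and variable names are ours:

```python
import numpy as np

def ica_rotate_temporal(Vt, B):
    """Equations 5.25-5.26: rotate the first R temporal singular vectors
    by the unitary ICA matrix B (R x R); the rest are kept unchanged."""
    R = B.shape[0]
    Vt_new = Vt.copy()
    Vt_new[:, :R] = Vt[:, :R] @ B
    return Vt_new

def ica_signal_subspace(Y, Vc, Vx, Vt_new, ranks):
    """Signal and noise subspaces of Equations 5.30-5.31, with the
    projectors of Equations 5.24 and 5.29 built from (pc, px, pt~)."""
    Ys = Y
    for n, (Vn, pn) in enumerate(zip((Vc, Vx, Vt_new), ranks)):
        P = Vn[:, :pn] @ Vn[:, :pn].T
        Ys = np.moveaxis(np.tensordot(P, Ys, axes=(1, n)), 0, n)
    return Ys, Y - Ys
```

Note that when p̃t = R, any unitary B leaves the projected subspace unchanged (a projector depends only on the span of its columns); the benefit of the ICA rotation is precisely that it concentrates the aligned wave into few vectors, so that a small p̃t (typically 1) becomes meaningful.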

In practical situations, as in the SVD–ICA subspace case, the value of R becomes a parameter. It is chosen such that the aligned wave is fully described by the first R estimated waves v(t)j obtained using the HOSVD. The choice of the pc, px, and p̃t values is made by finding abrupt changes of slope in the curves of modified three-mode singular values, obtained after the ICA and permutation steps. Note that in the HOSVD–unimodal ICA subspace method, the rank p̃t of the signal subspace in the third mode is not necessarily equal to the rank pt obtained using only the HOSVD. For some special cases for which no "visible" change of slope can be found, the value of p̃t can be fixed at 1 for a perfect alignment of waves, or at 2 for an imperfect alignment or for dispersive waves. As in the HOSVD subspace method, px can be fixed at 1, and pc can be fixed at 1 for a linear polarization of the aligned wavefield, or at 2 for an elliptical polarization.

Applications to simulated and real data are presented in the following sections to illustrate the behavior of the HOSVD–unimodal ICA method in comparison with component-wise SVD (SVD applied on each component of the multi-way data separately) and HOSVD subspace methods.

5.5.3 Application to Simulated Data

This simulation represents a multi-way data set Y ∈ R^{2×18×256} composed of Nx = 18 sensors, each recording two directions (Nc = 2) in the 3D space during Nt = 256 time samples. The first component is related to a geophone Z and the second one to a hydrophone Hy. The Z component was scaled by 5 to obtain the same amplitude range. This data set, shown in Figure 5.10c and Figure 5.11c, has polarization, distance, and time as modes. It was obtained by the addition of an original signal subspace S, with its two components shown in Figure 5.10a and Figure 5.11a, respectively, and an original noise subspace N (Figure 5.10b and Figure 5.11b, respectively) obtained from a real geophysical acquisition after subtraction of the aligned waves. The original signal subspace is made of several wavefronts having infinite apparent velocity, associated with the aligned wave. The relation between Z and Hy is linear and is assimilated to the wave polarization (polarization mode), in the sense that it consists of phase and amplitude relations between the two components. Wave amplitudes vary along the sensors, simulating attenuation along the distance mode. The noise is uncorrelated from one sensor to the other (spatially white) and also unpolarized. The SNR of this data set is SNR = -3 dB, where the SNR definition is:

SNR = 20 log10 (||S|| / ||N||)   (5.32)

where ||·|| is the multi-way array Frobenius norm5, and S and N are the original signal and noise subspaces.

5. For any multi-way array X = {xijk} ∈ R^{I×J×K}, its Frobenius norm is ||X|| = sqrt(Σ_{i=1..I} Σ_{j=1..J} Σ_{k=1..K} x_ijk²).

FIGURE 5.10 Simulated data: the Z component. (a) Original signal subspace S; (b) original noise subspace N; (c) initial data set Y (distance in sensors versus time in samples).

Our aim is to recover the original signal (Figure 5.10a and Figure 5.11a) from the mixture, which is, in practice, the only data available. Note that normalization to unit variance of each trace of each component was done before applying the described subspace methods. This ensures that even weak picked arrivals are well represented

FIGURE 5.11 Simulated data: the Hy component. (a) Original signal subspace S; (b) original noise subspace N; (c) initial data set Y.

FIGURE 5.12 Signal subspace using component-wise SVD. (a) Z component; (b) Hy component.

within the input data. After computation of the signal subspaces, a denormalization was applied to recover the original signal subspace. First, the SVD subspace method (described in Section 5.3) was applied separately on the Z and Hy components of the mixture. The signal subspace components obtained by keeping only one singular value for each of the two components are presented in Figure 5.12. This choice was made by finding an abrupt change of slope after the first singular value in the relative singular values shown in Figure 5.13 for each seismic 2D signal. The waveforms are not well recovered with respect to the original signal components (Figure 5.10a and Figure 5.11a). One can also see that the distinction between different wavefronts is not possible. Furthermore, no arrival time estimation is possible using this technique. Low signal level is a strong handicap for a component-wise process. Applied on each component separately, the SVD subspace method does not find the same aligned polarized wave; the results therefore depend on the characteristics of each matrix signal. Using the SVD–ICA subspace method, the estimated aligned waves may be improved, but we are confronted with the same problem.

FIGURE 5.13 Relative singular values. Top: Z component. Bottom: Hy component.

FIGURE 5.14 Results obtained using the HOSVD subspace method. (a) Z component of YSignal; (b) Hy component of YSignal; (c) estimated waves v(t)j, plotted against index j.

Using the HOSVD subspace method, the components of the estimated signal subspace YSignal are presented in Figure 5.14a and Figure 5.14b. In this case, the numbers of singular vectors kept are: one on the polarization mode, one on the distance mode, and one on the time mode, giving a (1,1,1)-rank truncation for the signal subspace. This choice is motivated here by the linear polarization of the aligned wave; for the other two modes, the choice was made by finding an abrupt change of slope after the first singular value (Figure 5.17a). There still remain some "oscillations" between the different wavefronts in the two components of the estimated signal subspace, which may induce some detection errors. An ICA step is required in this case to obtain a better signal separation and to cancel these parasitic oscillations.

In Figure 5.15a and Figure 5.15b, the wavefronts of the estimated signal subspace ỸSignal obtained with the HOSVD–unimodal ICA technique are very close to the original signal components. Here, the ICA method was applied on the first R = 5 estimated waves shown in Figure 5.14c. These waves describe the aligned waves of the original signal subspace S. The estimated waves ṽ(t)j obtained after the ICA step are shown in Figure 5.15c. As we can see, the first one, ṽ(t)1, describes the aligned wave of the original subspace S more precisely than the first estimated wave v(t)1 before the ICA step (Figure 5.14c). The estimation of the signal subspace is more accurate, and the aligned wavelet can be better estimated with our proposed procedure. After the permutation step, the relative singular values on the three modes in the HOSVD–unimodal ICA case are shown in Figure 5.17b. This figure justifies the choice of a (1,1,1)-rank truncation for the signal subspace, due to the linear polarization and the abrupt changes of slope in the other two modes.

As in the bidimensional case, to compare these methods quantitatively, we use the relative error ε of the estimated signal subspace, defined as


ε = ||S − YSignal||² / ||S||²   (5.33)

FIGURE 5.15 Results obtained using the HOSVD–unimodal ICA subspace method. (a) Z component of ỸSignal; (b) Hy component of ỸSignal; (c) estimated waves ṽ(t)j.

where ||·|| is the multi-way array Frobenius norm defined above, S is the original signal subspace, and YSignal represents the estimated signal subspaces obtained with the SVD, HOSVD, and HOSVD–unimodal ICA methods, respectively. For this data set we obtain ε = 21.4% for the component-wise SVD, ε = 12.4% for HOSVD, and ε = 3.8% for HOSVD–unimodal ICA. We conclude that the ICA step minimizes the error on the estimated signal subspace. To compare the results qualitatively, the stack representation is employed. Figure 5.16 shows, for each component, from left to right, the stacks on the initial data set Y, the original signal subspace S, and the estimated signal subspaces obtained with the component-wise SVD, HOSVD, and HOSVD–unimodal ICA subspace methods, respectively.

FIGURE 5.16 Stacks for (a) the Z component and (b) the Hy component. From left to right: initial data set Y, original signal subspace S, SVD, HOSVD, and HOSVD–unimodal ICA estimated subspaces, respectively.

Multi-Dimensional Seismic Data Decomposition by Higher Order SVD and Unimodal ICA

FIGURE 5.17 Relative three-mode singular values. [(a) using HOSVD; (b) using HOSVD–unimodal ICA.]

As the stack on the estimated signal subspace Ỹ^Signal is very close to the stack on the original signal subspace S, we conclude that the HOSVD–unimodal ICA subspace method enhances the wave separation results.
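A rank-(r1, r2, r3) truncation of the kind used above can be sketched in a few lines of NumPy: take the SVD of each mode-n unfolding, keep the leading left singular vectors, and project the array back. This is a simplified illustration with an invented rank-1 "aligned wave", not the authors' implementation:

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: the chosen mode becomes the rows."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd_truncate(T, ranks):
    """(r1, r2, r3)-rank truncation: project each mode of T onto the
    span of the leading left singular vectors of its unfolding."""
    Y = np.asarray(T, float)
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        proj = U[:, :r] @ U[:, :r].T                  # rank-r projector
        Y = np.moveaxis(np.tensordot(proj, np.moveaxis(Y, mode, 0), axes=1),
                        0, mode)
    return Y

# Rank-1 "aligned wave" in each mode plus weak noise
rng = np.random.default_rng(1)
a, b, c = rng.standard_normal(3), rng.standard_normal(50), rng.standard_normal(175)
data = np.einsum('i,j,k->ijk', a, b, c) + 0.01 * rng.standard_normal((3, 50, 175))
approx = hosvd_truncate(data, (1, 1, 1))
err = np.sum((data - approx) ** 2) / np.sum(data ** 2)
```

Because the toy signal is exactly rank one in every mode, a (1,1,1) truncation discards essentially only the noise, so the relative error stays small.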

5.5.4 Application to Real Data

We now consider a real vertical seismic profile (VSP) geophysical data set. This 3C data set was recorded by N_x = 50 sensors with a depth sampling of 10 m, each made up of N_c = 3 geophones recording the three directions of 3D space: X, Y, and Z, respectively. The recording time was 700 msec, corresponding to N_t = 175 time samples. The Z component was scaled seven times to obtain the same amplitude range. After the preprocessing step (velocity correction based on the direct downgoing wave), the obtained data set Y ∈ R^(3×50×175) is shown in Figure 5.18. As in the simulation case, normalization and denormalization of each trace of each component were done before and after applying the different subspace methods. From the original data set we constructed three 2D seismic matrix signals representing the three components of the data set Y. The SVD subspace method presented in Section 5.3.1 was applied to each matrix signal, keeping only one singular vector (respectively, one singular value) for each, owing to an abrupt change of slope after the first singular value in the curves of relative singular values. As remarked in the simulation case, the SVD–ICA subspace method may improve the estimated aligned waves, but we will not find the same aligned polarized wave for all seismic matrix signals. For the HOSVD subspace method, the estimated signal subspace Y^Signal can be defined as a (2,1,1)-rank truncation of the data set. This choice is motivated here by the elliptical polarization of the aligned wave. For the other two modes, the choice was made by finding an abrupt change of slope after the first singular value (Figure 5.20a). Using HOSVD–unimodal ICA, the ICA step was applied here to the first R = 9 estimated waves v(t)_j shown in Figure 5.19a. As suggested, R becomes a parameter

FIGURE 5.18 Real VSP geophysical data Y. [(a) X component; (b) Y component; (c) Z component; axes: distance (m) vs. time (s).]

while using real data. However, the estimated waves ṽ(t)_j shown in Figure 5.19b are more realistic (shorter wavelet and no side lobes) than those obtained without ICA. Due to the elliptical polarization and the abrupt change of slope after the first singular value for the other two modes (Figure 5.20b), the estimated signal subspace Ỹ^Signal is defined as a (2,1,1)-rank truncation of the data set. This step enhances the wave separation results, implying a minimization of the error on the estimated signal subspace. When we deal with a real data set, only a qualitative comparison is possible. This is allowed by a stack representation. Figure 5.21 shows the stacks for the X, Y, and Z components, respectively, on the initial trimodal data Y and on the estimated signal

FIGURE 5.19 The first 9 estimated waves. [(a) before ICA, v(t)_j; (b) after ICA, ṽ(t)_j; axes: index (j) vs. time (s).]

FIGURE 5.20 Relative three-mode singular values. [(a) using HOSVD; (b) using HOSVD–unimodal ICA.]

subspaces given by the component-wise SVD, HOSVD, and HOSVD–unimodal ICA methods, respectively. The results on simulated and real data suggest that the three-dimensional subspace methods are more robust than the component-wise techniques because they exploit the relationship between the components directly in the process. Also, the fourth-order independence constraint on the estimated waves enhances the wave separation results

FIGURE 5.21 Stacks. From left to right: initial data set Y, SVD, HOSVD, and HOSVD–unimodal ICA estimated subspaces, respectively. [(a) X component; (b) Y component; (c) Z component; axes: time (s).]


and minimizes the error on the estimated signal subspace. This emphasizes the potential of the HOSVD subspace method associated with a unimodal ICA step for vector-sensor array signal processing.

5.6 Conclusions

We have presented a subspace processing technique for multi-dimensional seismic data sets based on HOSVD and ICA. It is an extension of well-known subspace separation techniques for 2D (matrix) data sets based on SVD and, more recently, on SVD combined with ICA. The proposed multi-way technique can be used for the denoising and separation of polarized waves recorded on vector-sensor arrays. A multi-way (three-mode) model of polarized signals recorded on vector-sensor arrays allows us to take the additional polarization information into account in the processing and thus to enhance the separation results. A decomposition of three-mode data sets into all-orthogonal subspaces has been proposed using HOSVD. An extra unimodal ICA step has been introduced to minimize the error on the estimated signal subspace and to improve the separation result. We have also shown on simulated and real data sets that the proposed approach gives better results than the component-wise SVD subspace method. The use of multi-way arrays and their associated decompositions for multi-dimensional data set processing is a powerful tool and ensures that the extra dimension is fully taken into account in the process. This approach can be generalized to any multi-dimensional signal modeling and processing task and can take advantage of recent work on tensor and multi-way array decomposition and analysis.

References

1. Vrabie, V.D., Statistiques d'ordre supérieur: applications en géophysique et électrotechnique, Ph.D. thesis, I.N.P. of Grenoble, France, 2003.
2. Vrabie, V.D., Mars, J.I., and Lacoume, J.-L., Modified singular value decomposition by means of independent component analysis, Signal Processing, 84(3), 645, 2004.
3. Sheriff, R.E. and Geldart, L.P., Exploration Seismology, Vols. 1 & 2, Cambridge University Press, 1982.
4. Freire, S. and Ulrich, T., Application of singular value decomposition to vertical seismic profiling, Geophysics, 53, 778–785, 1988.
5. Glangeaud, F., Mari, J.-L., and Coppens, F., Signal Processing for Geologists and Geophysicists, Editions Technip, Paris, 1999.
6. Robinson, E.A. and Treitel, S., Geophysical Signal Processing, Prentice Hall, 1980.
7. Golub, G.H. and Van Loan, C.F., Matrix Computations, Johns Hopkins, 1989.
8. Moonen, M. and De Moor, B., SVD and Signal Processing III: Algorithms, Applications and Architectures, Elsevier, Amsterdam, 1995.
9. Scharf, L.L., Statistical Signal Processing: Detection, Estimation, and Time Series Analysis, Electrical and Computer Engineering: Digital Signal Processing Series, Addison Wesley, 1991.
10. Le Bihan, N. and Ginolhac, G., Three-mode dataset analysis using higher order subspace method: application to sonar and seismo-acoustic signal processing, Signal Processing, 84(5), 919–942, 2004.
11. Vrabie, V.D., Le Bihan, N., and Mars, J.I., 3D-SVD and partial ICA for 3D array sensors, Proceedings of the 72nd Annual International Meeting of the Society of Exploration Geophysicists, Salt Lake City, 2002, 1065.

12. Cardoso, J.-F. and Souloumiac, A., Blind beamforming for non-Gaussian signals, IEE Proceedings-F, 140(6), 362, 1993.
13. Comon, P., Independent component analysis, a new concept?, Signal Processing, 36(3), 287, 1994.
14. De Lathauwer, L., Signal processing based on multilinear algebra, Ph.D. thesis, Katholieke Universiteit Leuven, 1997.
15. Hyvärinen, A. and Oja, E., Independent component analysis: algorithms and applications, Neural Networks, 13(4–5), 411, 2000.
16. Tucker, L.R., The extension of factor analysis to three-dimensional matrices, in H. Gulliksen and N. Frederiksen (Eds.), Contributions to Mathematical Psychology, Holt, Rinehart and Winston, 109–127, 1964.
17. Carroll, J.D. and Chang, J.J., Analysis of individual differences in multidimensional scaling via an n-way generalization of Eckart–Young decomposition, Psychometrika, 35(3), 283–319, 1970.
18. Kroonenberg, P.M., Three-Mode Principal Component Analysis, DSWO Press, Leiden, 1983.

6 Application of Factor Analysis in Seismic Profiling

Zhenhai Wang and Chi Hau Chen

CONTENTS
6.1 Introduction to Seismic Signal Processing ... 104
  6.1.1 Data Acquisition ... 104
  6.1.2 Data Processing ... 105
    6.1.2.1 Deconvolution ... 105
    6.1.2.2 Normal Moveout ... 105
    6.1.2.3 Velocity Analysis ... 106
    6.1.2.4 NMO Stretching ... 106
    6.1.2.5 Stacking ... 106
    6.1.2.6 Migration ... 106
  6.1.3 Interpretation ... 107
6.2 Factor Analysis Framework ... 107
  6.2.1 General Model ... 107
  6.2.2 Within the Framework ... 109
    6.2.2.1 Principal Component Analysis ... 109
    6.2.2.2 Independent Component Analysis ... 110
    6.2.2.3 Independent Factor Analysis ... 111
6.3 FA Application in Seismic Signal Processing ... 111
  6.3.1 Marmousi Data Set ... 111
  6.3.2 Velocity Analysis, NMO Correction, and Stacking ... 112
  6.3.3 The Advantage of Stacking ... 114
  6.3.4 Factor Analysis vs. Stacking ... 114
  6.3.5 Application of Factor Analysis ... 116
    6.3.5.1 Factor Analysis Scheme No. 1 ... 116
    6.3.5.2 Factor Analysis Scheme No. 2 ... 116
    6.3.5.3 Factor Analysis Scheme No. 3 ... 118
    6.3.5.4 Factor Analysis Scheme No. 4 ... 118
  6.3.6 Factor Analysis vs. PCA and ICA ... 120
6.4 Conclusions ... 122
References ... 122
Appendices ... 124
6.A Upper Bound of the Number of Common Factors ... 124
6.B Maximum Likelihood Algorithm ... 125


6.1 Introduction to Seismic Signal Processing

Formed millions of years ago from plants and animals that died and decomposed beneath soil and rock, fossil fuels, namely coal and petroleum, will remain the most important energy resource for at least another few decades because of their low cost and availability. Ongoing petroleum research continues to focus on the science and technology needed for increased petroleum exploration and production. The petroleum industry relies heavily on subsurface imaging techniques to locate these hydrocarbons.

6.1.1 Data Acquisition

Many geophysical survey techniques exist, such as multichannel reflection seismic profiling, refraction seismic surveying, gravity surveying, and heat-flow measurement. Among them, the reflection seismic profiling method stands out because of its target-oriented capability, generally good imaging results, and computational efficiency. These reflectivity data resolve features such as faults, folds, and lithologic boundaries measured in tens of meters, and image them laterally for hundreds of kilometers and to depths of 50 km or more. As a result, seismic reflection profiling has become the principal method by which the petroleum industry explores for hydrocarbon-trapping structures. The seismic reflection method works by processing echoes of seismic waves from boundaries between subsurface layers of the Earth that have different acoustic impedances. Depending on the geometry of the surface observation points and source locations, the survey is called a 2D or a 3D seismic survey. Figure 6.1 shows a typical 2D seismic survey, during which a cable with receivers attached at regular intervals is towed by a boat. The source moves along predesigned seismic lines and generates seismic waves at regular intervals, such that points in the subsurface are sampled several times by the receivers, producing a series of seismic traces. These seismic traces are saved on magnetic tapes or hard disks in the recording boat for future processing.

FIGURE 6.1 A typical 2D seismic survey. [Source and receivers towed near the water surface; water bottom; subsurface 1 and subsurface 2 below.]

6.1.2 Data Processing

Seismic data processing has been regarded as having an interpretive flavor; it is even considered an art [1]. However, there is a well-established sequence for standard seismic data processing. Deconvolution, stacking, and migration are the three principal processes that make up the foundation. In addition, some auxiliary processes can help improve the effectiveness of the principal processes. In the following subsections, we briefly discuss the principal processes and some auxiliary processes.

6.1.2.1 Deconvolution

Deconvolution can improve the temporal resolution of seismic data by compressing the basic seismic wavelet to approximately a spike and suppressing reverberations on the field data [2]. Deconvolution usually applied before stack is called prestack deconvolution. It is also a common practice to apply deconvolution to stacked data, which is named poststack deconvolution.
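As a rough illustration of spiking deconvolution (a textbook least-squares sketch, not this chapter's algorithm), an inverse filter can be designed from the wavelet's autocorrelation and applied by convolution. The wavelet below is invented:

```python
import numpy as np

def spiking_decon_filter(wavelet, n, prewhiten=0.01):
    """Length-n least-squares inverse (spiking) filter: solves R f = g,
    where R is the Toeplitz autocorrelation matrix of the wavelet and
    g requests a spike at zero lag."""
    ac = np.correlate(wavelet, wavelet, mode='full')[len(wavelet) - 1:]
    r = np.zeros(n)
    r[:min(n, len(ac))] = ac[:n]
    r[0] *= 1.0 + prewhiten        # prewhitening stabilizes the inversion
    R = np.array([[r[abs(i - j)] for j in range(n)] for i in range(n)])
    g = np.zeros(n)
    g[0] = wavelet[0]              # desired output: spike at zero lag
    return np.linalg.solve(R, g)

# A decaying, minimum-phase-like wavelet is compressed toward a spike
w = np.array([1.0, -0.5, 0.25, -0.125])
f = spiking_decon_filter(w, 20)
spiked = np.convolve(w, f)
```

The prewhitening fraction trades a slightly imperfect spike for numerical stability, which matters when the autocorrelation matrix is nearly singular on real data.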

6.1.2.2 Normal Moveout

Consider the simplest case, where the subsurface layers of the Earth are horizontal and, within each layer, the velocity is constant. Here x is the distance (offset) between the source and receiver positions, and v is the velocity of the medium above the reflecting interface. Given the midpoint location M, let t(x) be the traveltime along the raypath from the shot position S to the depth point D and back to the receiver position G, and let t(0) be twice the traveltime along the vertical path MD. Using the Pythagorean theorem, the traveltime equation as a function of offset is

t^2(x) = t^2(0) + x^2/v^2   (6.1)

Note that the above equation describes a hyperbola in the plane of two-way time vs. offset. A common-midpoint (CMP) gather consists of the traces whose raypaths, associated with each source–receiver pair, reflect from the same subsurface depth point D. The difference between the two-way time at a given offset, t(x), and the two-way zero-offset time, t(0), is called the normal moveout (NMO). From Equation 6.1, we see that the velocity can be computed when the offset x and the two-way times t(x) and t(0) are known. Once the NMO velocity is estimated, the traveltimes can be corrected to remove the influence of offset:

Δt_NMO = t(x) − t(0)

Traces in the NMO-corrected gather are then summed to obtain a stack trace at the particular CMP location. This procedure is called stacking. Now consider horizontally stratified layers, with each layer's thickness defined in terms of two-way zero-offset time. Given the number of layers N, the interval velocities are represented as (v_1, v_2, ..., v_N). Considering the raypath from source S to depth point D and back to receiver R, associated with offset x at midpoint location M, Equation 6.1 becomes

t^2(x) = t^2(0) + x^2/v_rms^2   (6.2)

where the relation between the rms velocity and the interval velocities is represented by

v_rms^2 = (1/t(0)) Σ_{i=1}^{N} v_i^2 Δt_i(0)

where Δt_i is the vertical two-way time through the ith layer and t(0) = Σ_{k=1}^{i} Δt_k.
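For a concrete example (with invented layer values), the rms velocity follows directly from this relation:

```python
import numpy as np

def v_rms(interval_v, dt):
    """rms velocity from interval velocities and two-way layer times:
    v_rms^2 = (1/t0) * sum_i v_i^2 * dt_i, with t0 = sum_i dt_i."""
    v = np.asarray(interval_v, float)
    dt = np.asarray(dt, float)
    return np.sqrt(np.sum(v ** 2 * dt) / np.sum(dt))

# Three layers at 1500, 2000, and 2500 m/s, 0.4 s two-way time each
v = v_rms([1500.0, 2000.0, 2500.0], [0.4, 0.4, 0.4])   # about 2041 m/s
```

Note that the rms velocity always lies between the smallest and largest interval velocities and is weighted toward the layers with the longest traveltimes.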

6.1.2.3 Velocity Analysis

Effective correction for normal moveout depends on the use of accurate velocities. In CMP surveys, the appropriate velocity is derived by computer analysis of the moveout in the CMP gathers. Dynamic corrections are implemented for a range of velocity values and the corrected traces are stacked. The stacking velocity is defined as the velocity value that produces the maximum amplitude of the reflection event in the stack of traces, which clearly represents the condition of successful removal of NMO. In practice, NMO corrections are computed for narrow time windows down the entire trace, and for a range of velocities, to produce a velocity spectrum. The validity of each velocity value is assessed by calculating a form of multitrace correlation between the corrected traces of the CMP gather. The values are contoured such that contour peaks occur at times corresponding to reflected wavelets and at velocities that produce an optimum stacked wavelet. By picking the location of the peaks on the velocity spectrum plot, a velocity function defining the increase of velocity with depth for that CMP gather can be derived.
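The mechanics behind such a velocity spectrum can be sketched as follows: for each trial velocity, NMO-correct the gather and check how well the event stacks. This constant-velocity toy (nearest-sample interpolation; all names and values are ours) flattens a synthetic reflection at the true velocity:

```python
import numpy as np

def nmo_correct(trace, offset, v, dt):
    """Map each zero-offset time t0 to t(x) = sqrt(t0^2 + (x/v)^2)
    and pull the sample back (nearest-neighbour interpolation)."""
    n = len(trace)
    t0 = np.arange(n) * dt
    idx = np.rint(np.sqrt(t0 ** 2 + (offset / v) ** 2) / dt).astype(int)
    out = np.zeros(n)
    valid = idx < n
    out[valid] = trace[idx[valid]]
    return out

# Synthetic gather: one reflector at t0 = 0.4 s, v = 2000 m/s
dt, v_true, t0_ref, n = 0.004, 2000.0, 0.4, 250
offsets = np.arange(0.0, 2000.0, 200.0)
gather = np.zeros((len(offsets), n))
for i, x in enumerate(offsets):
    gather[i, int(round(np.sqrt(t0_ref ** 2 + (x / v_true) ** 2) / dt))] = 1.0

flat = np.array([nmo_correct(tr, x, v_true, dt)
                 for tr, x in zip(gather, offsets)])
stack = flat.sum(axis=0)     # peaks sharply at sample t0/dt = 100
```

Repeating the correction over a range of trial velocities and plotting the stacked amplitude against velocity and t(0) gives exactly the velocity-spectrum picture described above.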

6.1.2.4 NMO Stretching

After applying NMO correction, a frequency distortion appears, particularly for shallow events and at large offsets. This is called NMO stretching: events are shifted to lower frequencies, which can be quantified as

Δf/f = Δt_NMO/t(0)   (6.3)

where f is the dominant frequency, Δf is the change in frequency, and Δt_NMO = t(x) − t(0) is the normal moveout. Because of the waveform distortion at large offsets, stacking the NMO-corrected CMP gather would severely damage the shallow events. Muting the stretched zones in the gather solves this problem, and can be carried out using the quantitative definition of stretching given in Equation 6.3. An alternative method for optimum selection of the mute zone is to progressively stack the data: by following the waveform along a certain event and observing where changes occur, the mute zone is derived. A trade-off exists between the signal-to-noise ratio (SNR) and the mute; that is, when the SNR is high, more can be muted for less stretching, whereas when the SNR is low, a large amount of stretching must be accepted to catch events on the stack.

6.1.2.5 Stacking

Among the three principal processes, CMP stacking is the most robust. Utilizing the redundancy in CMP recording, stacking can significantly suppress uncorrelated noise, thereby increasing the SNR. It can also attenuate a large part of the coherent noise in the data, such as guided waves and multiples.

6.1.2.6 Migration

On a seismic section such as that illustrated in Figure 6.2, each reflection event is mapped directly beneath the midpoint. However, the reflection point is located beneath the midpoint only if the reflector is horizontal. With a dip along the survey line the actual


FIGURE 6.2 The NMO geometry of a single horizontal reflector. [Shot S, midpoint M, and receiver G at the surface, separated by offset x; depth point D on the reflector.]

reflection point is displaced in the up-dip direction; with a dip across the survey line, the reflection point is displaced out of the plane of the section. Migration is a process that moves dipping reflectors into their true subsurface positions and collapses diffractions, thereby depicting detailed subsurface features. In this sense, migration can be viewed as a form of spatial deconvolution that increases spatial resolution.

6.1.3 Interpretation

The goal of seismic processing and imaging is to extract the reflectivity function of the subsurface from the seismic data. Once the reflectivity is obtained, it is the task of the seismic interpreter to infer the geological significance of a certain reflectivity pattern.

6.2 Factor Analysis Framework

Factor analysis (FA), a branch of multivariate analysis, is concerned with the internal relationships of a set of variates [3]. Widely used in psychology, biology, chemometrics¹ [4], and social science, the latent variable model provides an important tool for the analysis of multivariate data. It offers a conceptual framework within which many disparate methods can be unified and a base from which new methods can be developed.

6.2.1 General Model

In FA the basic model is

x = As + n   (6.4)

where x = (x_1, x_2, ..., x_p)^T is a vector of observable random variables (the test scores), s = (s_1, s_2, ..., s_r)^T is a vector of r < p unobserved or latent random variables (the common factor scores), A is a (p × r) matrix of fixed coefficients (the factor loadings), and n = (n_1, n_2, ..., n_p)^T is a vector of random error terms (the unique factor scores of order p). The means are usually set to zero for convenience, so that E(x) = E(s) = E(n) = 0. The random error term consists of errors of measurement and the unique individual effects associated with each variable x_j, j = 1, 2, ..., p. For the present model we assume that A is a matrix of constant parameters and s is a vector of random variables. The following assumptions are usually made for the factor model [5]:

¹ Chemometrics is the use of mathematical and statistical methods for handling, interpreting, and predicting chemical data.

• rank(A) = r < p

• E(x|s) = As

• E(xx^T) = S, E(ss^T) = V, and

Ψ = E(nn^T) = diag(σ_1^2, σ_2^2, ..., σ_p^2)   (6.5)

That is, the errors are assumed to be uncorrelated. The common factors, however, are generally correlated, and V is therefore not necessarily diagonal. For the sake of convenience and computational efficiency, the common factors are usually assumed to be uncorrelated and of unit variance, so that V = I.

• E(sn^T) = 0, so that the errors and common factors are uncorrelated.

From the above assumptions, we have

E(xx^T) = S = E[(As + n)(As + n)^T]
         = A E(ss^T) A^T + A E(sn^T) + E(ns^T) A^T + E(nn^T)
         = AVA^T + E(nn^T)
         = G + Ψ   (6.6)

where G = AVA^T and Ψ = E(nn^T) are the true and error covariance matrices, respectively. In addition, postmultiplying Equation 6.4 by s^T, taking the expectation, and using the assumptions above, we have

E(xs^T) = E(Ass^T + ns^T) = A E(ss^T) + E(ns^T) = AV   (6.7)
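Equation 6.6 can be verified with a quick simulation (the dimensions and parameter values below are invented, and V = I):

```python
import numpy as np

rng = np.random.default_rng(2)
p, r, T = 6, 2, 200_000
A = rng.standard_normal((p, r))           # factor loadings
psi = rng.uniform(0.5, 1.5, size=p)       # diagonal error variances
s = rng.standard_normal((r, T))           # common factors, V = I
n = np.sqrt(psi)[:, None] * rng.standard_normal((p, T))
x = A @ s + n                             # the FA model, Eq. 6.4

Sigma_hat = (x @ x.T) / T                 # sample second moments of x
Sigma = A @ A.T + np.diag(psi)            # model covariance, Eq. 6.6
```

With a large sample the empirical covariance `Sigma_hat` matches the model covariance `A A^T + Ψ` entry by entry, up to sampling noise.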

For the special case of V = I, the covariance between the observations and the latent variables simplifies to E(xs^T) = A. A special case is found when x is multivariate Gaussian; the second moments of Equation 6.6 then contain all the information concerning the factor model. The factor model of Equation 6.4 is linear and, given the factors s, the variables x are conditionally independent. Let s ~ N(0, I); the conditional distribution of x is

x|s ~ N(As, Ψ)   (6.8)

or

p(x|s) = (2π)^(−p/2) |Ψ|^(−1/2) exp[ −(1/2) (x − As)^T Ψ^(−1) (x − As) ]   (6.9)

with conditional independence following from the diagonality of Ψ. The common factors s therefore reproduce all covariances (or correlations) between the variables, but account for only a portion of the variance. The marginal distribution of x is found by integrating over the hidden variables s:

p(x) = ∫ p(x|s) p(s) ds = (2π)^(−p/2) |Ψ + AA^T|^(−1/2) exp[ −(1/2) x^T (Ψ + AA^T)^(−1) x ]   (6.10)

The calculation is straightforward because both p(s) and p(x|s) are Gaussian.

6.2.2 Within the Framework

Many methods have been developed for estimating the model parameters in the special case of Equation 6.8. The unweighted least squares (ULS) algorithm [6] is based on minimizing the sum of squared differences between the observed and estimated correlation matrices, not counting the diagonal. The generalized least squares (GLS) algorithm [6] adjusts ULS by weighting the correlations inversely according to their uniqueness. Another method, the maximum likelihood (ML) algorithm [7], uses a linear combination of variables to form factors, where the parameter estimates are those most likely to have resulted in the observed correlation matrix. More details on the ML algorithm can be found in Appendix 6.B. These methods are all of second order: they find the representation using only the information contained in the covariance matrix of the test scores. In most cases, the mean is also used in the initial centering. The reason for the popularity of second-order methods is that they are computationally simple, often requiring only classical matrix manipulations. Second-order methods stand in contrast to most higher order methods, which try to find a meaningful representation using information on the distribution of x that is not contained in the covariance matrix. The distribution of x must then not be assumed to be Gaussian, because all the information about Gaussian variables is contained in the first two orders of statistics, from which all the higher order statistics can be generated. For more general families of density functions, however, the representation problem has more degrees of freedom, and much more sophisticated techniques can be constructed for non-Gaussian random variables.

6.2.2.1 Principal Component Analysis

Principal component analysis (PCA) is also known as the Hotelling transform or the Karhunen–Loève transform. It is widely used in signal processing, statistics, and neural computing to find the most important directions in the data in the mean-square sense. It is the solution of the FA problem with minimum mean-square error and an orthogonal weight matrix. The basic idea of PCA is to find the r < p linearly transformed components that provide the maximum amount of variance possible. During the analysis, the variables in x are transformed linearly and orthogonally into an equal number of uncorrelated new variables in e. The transformation is obtained by finding the latent roots and vectors of either the covariance or the correlation matrix. The latent roots, arranged in descending order of magnitude, are


equal to the variances of the corresponding variables in e. Usually the first few components account for a large proportion of the total variance of x and, accordingly, may be used to reduce the dimensionality of the original data for further analysis. However, all components are needed to reproduce accurately the correlation coefficients within x. Mathematically, the first principal component e_1 corresponds to the line on which the projection of the data has the greatest variance:

e_1 = arg max_{||e||=1} Σ_{t=1}^{T} (e^T x(t))^2   (6.11)

The other components are found recursively by first removing the projections onto the previous principal components:

e_k = arg max_{||e||=1} Σ_t [ e^T ( x(t) − Σ_{i=1}^{k−1} e_i e_i^T x(t) ) ]^2   (6.12)
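Numerically, the maximization in Equation 6.11 and Equation 6.12 reduces to an eigendecomposition of the sample covariance matrix; a small sketch with synthetic 2D data (our variable names):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 10_000
# Anisotropic cloud whose dominant axis is (1, 1)/sqrt(2)
basis = np.array([[1.0, -1.0], [1.0, 1.0]]) / np.sqrt(2.0)
z = rng.standard_normal((2, T)) * np.array([[3.0], [0.3]])
x = basis @ z

cov = (x @ x.T) / T                       # sample covariance of the data
eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
e1 = eigvecs[:, -1]                       # first principal component
var_explained = eigvals[-1] / eigvals.sum()
```

The leading eigenvector recovers the dominant direction (up to sign), and its eigenvalue gives the fraction of total variance that the first component explains.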

In practice, the principal components are found by calculating the eigenvectors of the covariance matrix S of the data, as in Equation 6.6. The eigenvalues are positive and correspond to the variances of the projections of the data on the eigenvectors. The basic task in PCA is to reduce the dimension of the data. In fact, it can be proven that the representation given by PCA is an optimal linear dimension reduction technique in the mean-square sense [8,9]. This kind of dimension reduction has important benefits [10]. First, the computational complexity of the further processing stages is reduced. Second, noise may be reduced, as the data not contained in the components may be mostly due to noise. Third, projecting into a subspace of low dimension is useful for visualizing the data.

6.2.2.2 Independent Component Analysis

The independent component analysis (ICA) model originates from multi-input and multi-output (MIMO) channel equalization [11]. Its two most important applications are blind source separation (BSS) and feature extraction. The mixing model of ICA is similar to that of FA, but in the basic case without the noise term. The data are generated from the latent components s through a square mixing matrix A by

x = As   (6.13)

In ICA, all the independent components, with the possible exception of one component, must be non-Gaussian. The number of components is typically the same as the number of observations. Such an A is searched for that enables the components s = A^(−1) x to be as independent as possible. In practice, the independence can be maximized, for example, by maximizing the non-Gaussianity of the components or by minimizing mutual information [12]. ICA can be approached from different starting points. In some extensions the number of independent components can exceed the number of dimensions of the observations, making the basis overcomplete [12,13]. The noise term can also be taken into the model. ICA can be viewed as a generative model when the 1D distributions of the components are modeled with, for example, mixtures of Gaussians (MoG). The problem with ICA is that it has the ambiguities of scaling and permutation [12]; that is, the indetermination of the variances of the independent components and of the order of the independent components.


6.2.2.3 Independent Factor Analysis

Independent factor analysis (IFA) was formulated by Attias [14]. It aims to describe p generally correlated observed variables x in terms of r < p independent latent variables s and an additive noise term n. The proposed algorithm derives from ML, and more specifically from the expectation–maximization (EM) algorithm. The IFA model differs from the classic FA model in that the properties of its latent variables are different. The noise variables n are assumed to be normally distributed, but not necessarily uncorrelated. The latent variables s are assumed to be mutually independent but not necessarily normally distributed; their densities are modeled as mixtures of Gaussians. The independence assumption allows the density of each s_i to be modeled separately in the latent space. There are some problems with the EM–MoG algorithm. First, approximating source densities with MoGs is not straightforward, because the number of Gaussians has to be adjusted. Second, EM–MoG is computationally demanding: the complexity of the computation grows exponentially with the number of sources [14]. Given a small number of sources, the EM algorithm is exact and all the required calculations can be done analytically, whereas it becomes intractable as the number of sources in the model increases.
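For intuition, the mixture-of-Gaussians source density that IFA fits by EM is easy to evaluate once its parameters are given (the parameters below are invented for illustration):

```python
import numpy as np

def mog_pdf(s, weights, means, stds):
    """Density of a 1D mixture of Gaussians evaluated at points s."""
    s = np.atleast_1d(np.asarray(s, float))
    dens = np.zeros_like(s)
    for w_k, m_k, sd_k in zip(weights, means, stds):
        dens += w_k * np.exp(-0.5 * ((s - m_k) / sd_k) ** 2) \
                / (sd_k * np.sqrt(2.0 * np.pi))
    return dens

# A bimodal source prior: two equally weighted Gaussians
w, m, sd = [0.5, 0.5], [-2.0, 2.0], [0.7, 0.7]
grid = np.linspace(-6.0, 6.0, 1201)
p = mog_pdf(grid, w, m, sd)
area = np.sum(p) * (grid[1] - grid[0])   # close to 1: a proper density
```

Adjusting the number of components, weights, means, and variances is exactly the part of IFA that the EM iterations have to carry out, which is what makes the method expensive as the number of sources grows.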

6.3 FA Application in Seismic Signal Processing

6.3.1 Marmousi Data Set

Marmousi is a 2D synthetic data set generated at the Institut Français du Pétrole (IFP). The geometry of this model is based on a profile through the North Quenguela trough in the Cuanza basin [15,16]. The geometry and velocity model were created to produce complex seismic data that require advanced processing techniques to obtain a correct Earth image. Figure 6.3 shows the velocity profile of the Marmousi model. Based on the profile and the geologic history, a geometric model containing 160 layers was created. Velocity and density distributions were defined by introducing realistic horizontal and vertical velocity and density gradients. This resulted in a 2D density–velocity grid with dimensions of 3000 m in depth by 9200 m in offset.

FIGURE 6.3 Marmousi velocity model (depth 0–3000 m vs. offset 0–9000 m).

Signal and Image Processing for Remote Sensing


Data were generated by a modeling package that simulates a seismic line by computing the successive shot records. The line was "shot" from west to east. The first and last shot points were, respectively, 3000 and 8975 m from the west edge of the model. The distance between shots was 25 m. The initial offset was 200 m and the maximum offset was 2575 m.

6.3.2 Velocity Analysis, NMO Correction, and Stacking

Given the Marmousi data set, after the conventional processing steps described in Section 6.2, the results of velocity analysis and normal moveout correction are shown in Figure 6.4. The left-most plot is a CMP gather; there are in total 574 CMP gathers in the Marmousi data set, each containing 48 traces. In the second plot, a velocity spectrum is generated: the CMP gather is NMO-corrected and stacked using a range of constant velocity values, and the resulting stack traces for each velocity are placed side by side on a plane of velocity vs. two-way zero-offset time. By picking the peaks on the velocity spectrum, an initial rms velocity function can be defined, shown as a curve on the left of the second plot. The interval velocity can then be calculated using the Dix formula [17], shown on the right side of the plot. Given the estimated velocity profile, the actual moveout correction can be carried out, as shown in the third plot. Compared with the first plot, the hyperbolic curves are flattened out after NMO correction. Usually another procedure, called muting, is carried out before stacking because, as we can see in the middle of the third plot, there are

FIGURE 6.4 Velocity analysis and stacking of the Marmousi data set (panels: CMP gather, velocity spectrum, NMO-corrected gather, and optimum muting).
great distortions because of the approximation. That part will be eliminated before stacking all 48 traces together. The fourth plot simply shows a different way of highlighting the muting procedure; for details, see Ref. [1].

After completing the velocity analysis, NMO correction, and stacking for 56 of the CMPs, we obtain the section of the subsurface image shown on the left of Figure 6.5. There are two reasons why only 56 of the 574 CMPs are stacked: the velocity analysis is time consuming on a personal computer, and although 56 CMPs are only about one tenth of the 574, they cover nearly 700 m of the profile, which is enough for comparing the processing approaches. The right plot shows the same image after automatic amplitude adjustment, which stresses the weak events so that both weak and strong events are displayed with approximately the same amplitude. The algorithm consists of three simple steps:

1. Compute the Hilbert envelope of a trace.
2. Convolve the envelope with a triangular smoother to produce the smoothed envelope.
3. Divide the trace by the smoothed envelope to produce the amplitude-adjusted trace.

Comparing the two plots, we can see that the weak events at the top and bottom of the image are indeed stressed. In the following sections, we mainly use the automatic amplitude-adjusted image to illustrate results.

It should be pointed out that, owing to NMO stretching and the lack of data at small offsets after muting, events before 0.2 sec in Figure 6.5 are distorted and do not provide useful information. In the following sections, when we compare results, we mainly consider events after 0.2 sec.

FIGURE 6.5 Stacking of 56 CMPs (stacked section and automatic amplitude adjustment; CDP 3000–3400 m).

6.3.3 The Advantage of Stacking

Stacking is based on the assumption that all the traces in a CMP gather correspond to a single depth point. After NMO correction, the zero-offset traces should contain the same signal embedded in different random noise caused by the different raypaths. Adding them together increases the SNR by summing the signal components coherently while canceling the noise among the traces. To see what stacking does to improve the subsurface image quality, let us compare the image obtained from a single trace with that obtained from stacking the 48 muted traces. In Figure 6.6, the single-trace result, without stacking, is shown in the right plot: for every CMP (or CDP) gather, only the trace with the smallest offset is NMO-corrected, and these traces are placed side by side to produce the image. In the stacked result in the left plot, the 48 NMO-corrected and muted traces of each gather are stacked, and the stacks are placed side by side. Clearly, after stacking, the main events at 0.5, 1.0, and 1.5 sec are stressed, and the noise in between is canceled out. Noise at 0.2 sec is effectively removed, and noise caused by multiples from 2.0 to 3.0 sec is significantly reduced. However, due to NMO stretching and muting, there are not enough data to depict events from 0 to 0.25 sec in either plot.
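The noise-canceling effect described above is easy to check numerically: for N traces carrying the same signal in independent, equal-variance noise, stacking (averaging) raises the SNR by about a factor of N in power, i.e., roughly 10·log₁₀(48) ≈ 16.8 dB for a 48-fold gather. A minimal sketch under those idealized assumptions (the signal, trace count, and noise level are invented for the demonstration):

```python
import numpy as np

rng = np.random.default_rng(1)
n_traces, n_samples = 48, 1000

t = np.linspace(0, 1, n_samples)
signal = np.sin(2 * np.pi * 10 * t)                   # common zero-offset signal
noise = rng.normal(0.0, 1.0, (n_traces, n_samples))   # independent noise per raypath
traces = signal + noise                               # idealized NMO-corrected gather

stack = traces.mean(axis=0)                           # stacking = averaging the gather

def snr_db(x, s):
    """SNR of an estimate x of the known signal s, in dB."""
    return 10 * np.log10(np.mean(s ** 2) / np.mean((x - s) ** 2))

gain = snr_db(stack, signal) - snr_db(traces[0], signal)
print(f"stacking gain: {gain:.1f} dB")                # close to 10*log10(48) = 16.8 dB
```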

6.3.4 Factor Analysis vs. Stacking

Now we suggest an alternative way of obtaining the subsurface image: using FA instead of stacking. As presented in Appendix 6.A, FA can extract a unique common factor that captures the maximum correlation among the traces. It fits well with what is

FIGURE 6.6 Comparison of stacking and single-trace images (CDP 3000–3400 m).

expected of zero-offset traces: after NMO correction they contain the same signal embedded in different random noise. There are two reasons why FA works better than stacking. First, the FA model includes a scaling factor A, as in Equation 6.14, whereas stacking assumes no scaling, as in Equation 6.15:

Factor analysis: x = As + n    (6.14)

Stacking: x = s + n    (6.15)

When the scaling information is lost, simple summation does not necessarily increase the SNR. For example, if one scaling factor is 1 and the other is −1, summation simply cancels out the signal component completely, leaving only the noise component. Second, FA explicitly uses second-order statistics as the criterion for extracting the signal, whereas stacking does not. Therefore, the SNR improves more with FA than with stacking. To illustrate the idea, observations x(t) are generated using

x(t) = As(t) + n(t) = A cos(2πt) + n(t)

where s(t) is a sinusoidal signal and n(t) consists of 10 independent Gaussian noise terms. The matrix of factor loadings A is also generated randomly. Figure 6.7 shows the results of stacking and FA. The top plot is one of the ten observations x(t), the middle plot is the result of stacking, and the bottom plot is the result of FA using the ML algorithm presented in Appendix 6.B. Comparing the two plots suggests that FA outperforms stacking in improving the SNR.
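The toy experiment described above can be reproduced in a few lines. To make the cancellation failure of stacking explicit, the sketch below fixes the loadings to alternating +1/−1 rather than drawing them randomly as in the chapter; scikit-learn's FactorAnalysis stands in for the ML algorithm of Appendix 6.B, and the sizes and noise level are our own choices.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)
p, n = 10, 2000
t = np.linspace(0, 4, n)
s = np.cos(2 * np.pi * t)                            # common signal s(t)

A = np.array([1.0, -1.0] * (p // 2))[:, None]        # loadings of mixed sign
x = A @ s[None, :] + 0.5 * rng.normal(size=(p, n))   # x(t) = A s(t) + n(t)

stack = x.mean(axis=0)                               # stacking ignores the loadings: signal cancels

fa = FactorAnalysis(n_components=1)
factor = fa.fit_transform(x.T).ravel()               # estimated common factor (sign/scale arbitrary)

corr_stack = abs(np.corrcoef(stack, s)[0, 1])        # near zero: signal canceled
corr_fa = abs(np.corrcoef(factor, s)[0, 1])          # near one: signal recovered
```

Because the ±1 loadings sum to zero, the stacked trace contains essentially no signal, while the estimated factor remains strongly correlated with s(t).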

FIGURE 6.7 Comparison of stacking and FA (top: one observable variable; middle: result of stacking; bottom: result of factor analysis).

FIGURE 6.8 Factor analysis of Scheme no. 1 (mute zone of an NMO-corrected CMP gather; time vs. trace number).

6.3.5 Application of Factor Analysis

The simulation result in Section 6.3.4 suggests that FA can be applied to the NMO-corrected seismic data. One problem arises, however, when we inspect the zero-offset traces: they need to be muted because of NMO stretching, which means almost all the traces have a segment set to zero (the mute zone), as shown in Figure 6.8. Is it possible simply to apply FA to the muted traces? Are there other schemes that make fuller use of the information at hand? In the following sections, we try to answer these questions by discussing different schemes for carrying out FA.

6.3.5.1 Factor Analysis Scheme No. 1
Let us start with the simplest scheme, illustrated in Figure 6.8. We set the mute zone to zero and apply FA to each CMP gather using the ML algorithm, extracting one single common factor from the 48 traces. Placing the resulting factors from the 56 CMP gathers side by side yields the right plot in Figure 6.9. Compared with the result of stacking shown on the left, events from 2.2 to 3.0 sec are presented more smoothly instead of as the broken, dash-like events after stacking. However, at near offset, from 0 to 0.7 sec, the image is contaminated with some vertical stripes.

6.3.5.2 Factor Analysis Scheme No. 2
In this scheme, the muted segments in each trace are replaced by segments of the nearest neighboring traces, as illustrated in Figure 6.10. Trace no. 44 borrows Segment 1

FIGURE 6.9 Comparison of stacking and FA result of Scheme no. 1.

FIGURE 6.10 Factor analysis of Scheme no. 2 (segments borrowed from neighboring traces to fill the mute zone).

FIGURE 6.11 Comparison of stacking and FA result of Scheme no. 2.

from Trace no. 45 to fill out its muted segment, Trace no. 43 borrows Segments 1 and 2 from Trace no. 44, and so on. As a result, Segment 1 from Trace no. 45 is copied to all the muted traces (Traces no. 1 to 44), and Segment 2 from Trace no. 44 is copied to Traces no. 1 to 43. After the mute zone is filled, FA is carried out, producing the result shown in Figure 6.11. Compared to stacking, shown on the left, there is no improvement in the obtained image; in fact, the result is worse, with some events blurred. Therefore, Scheme no. 2 is not a good scheme.

6.3.5.3 Factor Analysis Scheme No. 3
In this scheme, instead of copying the neighboring segments into the mute zone, we copy the segments obtained by applying FA to the traces in the nearest neighboring box. In Figure 6.12, we first apply FA to the traces in Box 1 (Traces no. 45 to 48); Segment 1 is then extracted from the result and copied to Trace no. 44. Segment 2, obtained by applying FA to the traces in Box 2 (Traces no. 44 to 48), is copied to Trace no. 43, and so on. The resulting image is shown in Figure 6.13. Compared with Scheme no. 2 the result is better, but compared to stacking there is still some contamination from 0 to 0.7 sec.

6.3.5.4 Factor Analysis Scheme No. 4
In this scheme, as illustrated in Figure 6.14, Segment 1 is extracted by applying FA to all the traces in Box 1 (Traces no. 1 to 48), and Segment 2 by applying FA to the trace segments in Box 2 (Traces no. 2 to 48). Note that the data are not muted before FA; in this manner, for every segment, all the available data points are fully utilized.
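For concreteness, the core of Scheme no. 1 — one common factor per muted CMP gather, placed side by side to form the image — can be sketched as below. The function name, the use of scikit-learn's FactorAnalysis in place of the ML algorithm of Appendix 6.B, and the sign-alignment step are our own conventions, not the chapter's.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

def fa_image(cmp_gathers):
    """Scheme no. 1 sketch: extract one common factor per muted CMP gather.

    cmp_gathers: iterable of (n_samples, n_traces) arrays of NMO-corrected,
    muted traces (mute zone already set to zero).
    Returns an (n_samples, n_cmps) image with one factor trace per gather.
    """
    columns = []
    for gather in cmp_gathers:
        fa = FactorAnalysis(n_components=1)
        factor = fa.fit_transform(gather).ravel()   # time samples as observations
        # resolve the FA sign ambiguity by aligning the factor with the stack
        if np.dot(factor, gather.mean(axis=1)) < 0:
            factor = -factor
        columns.append(factor)
    return np.column_stack(columns)
```

Placing the 56 factor traces side by side then gives an image directly comparable with the stacked section.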

FIGURE 6.12 Factor analysis of Scheme no. 3 (FA applied to traces in successive boxes to fill the mute zone).

FIGURE 6.13 Comparison of stacking and FA result of Scheme no. 3.

FIGURE 6.14 Factor analysis of Scheme no. 4 (FA applied to the unmuted trace segments in successive boxes).

In the resulting image, we noticed that events from 0 to 0.16 sec are distorted; their amplitude is so large that it overshadows the other events. Comparing all the results obtained above, we conclude that neither stacking nor FA can extract useful information from 0 to 0.16 sec. To better present the FA result, we mute the distorted trace segments; the final result is shown in Figure 6.15. Comparing the results of FA and stacking, we can see that events at around 1 and 1.5 sec are strengthened, and events from 2.2 to 3.0 sec are presented more smoothly instead of as the broken, dash-like events in the stacked result. Overall, the SNR of the image is improved.

6.3.6 Factor Analysis vs. PCA and ICA

The results of PCA and ICA (discussed in subsections 6.2.2.1 and 6.2.2.2) are placed side by side with the result of FA for comparison in Figure 6.16 and Figure 6.17. As can be seen from the plots on the right side of both figures, important events are missing and the subsurface images are distorted. The reason is that the criteria used by PCA and ICA to extract the signals are not appropriate for this particular scenario. In PCA, the traces are transformed linearly and orthogonally into an equal number of new traces that are mutually uncorrelated, and the first component, which has the maximum variance, is used to produce the image. In ICA, the algorithm extracts components that are as independent of each other as possible, and the obtained components suffer from the scaling and permutation ambiguities.
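The PCA criterion just described — the maximum-variance first component of the orthogonally transformed traces — can be sketched in a few lines (the function name is ours):

```python
import numpy as np

def first_principal_component(traces):
    """First principal component of traces (n_samples, n_traces): the unit-norm
    linear combination of the traces with maximum variance."""
    centered = traces - traces.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    w = eigvecs[:, -1]                       # loading vector of the largest eigenvalue
    return centered @ w
```

Unlike the FA model, nothing here separates common signal from trace-specific noise; the component simply maximizes variance, which is why strong but incoherent energy can dominate the PCA image.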

FIGURE 6.15 Comparison of stacking and FA result of Scheme no. 4.

FIGURE 6.16 Comparison of FA and PCA results.

FIGURE 6.17 Comparison of FA and ICA results.

6.4 Conclusions

Stacking is one of the three most important and robust processing steps in seismic signal processing. By exploiting the redundancy of the CMP gathers, stacking can effectively remove noise and increase the SNR. In this chapter we propose using FA in place of stacking; applying the FA algorithm to the synthetic Marmousi data set yields better subsurface images. Comparisons with PCA and ICA show that FA indeed has advantages over these techniques in this scenario. It should be noted that the conventional seismic processing steps adopted here are very basic and for illustrative purposes only. Better results may be obtained in velocity analysis and stacking if careful examination and iterative procedures are incorporated, as is often the case in real situations.

References
1. Ö. Yilmaz, Seismic Data Processing, Society of Exploration Geophysicists, Tulsa, 1987.
2. E.A. Robinson, S. Treitel, R.A. Wiggins, and P.R. Gutowski, Digital Seismic Inverse Methods, International Human Resources Development Corporation, Boston, 1983.
3. D.N. Lawley and A.E. Maxwell, Factor Analysis as a Statistical Method, Butterworths, London, 1963.
4. E.R. Malinowski, Factor Analysis in Chemistry, 3rd ed., John Wiley & Sons, New York, 2002.
5. A.T. Basilevsky, Statistical Factor Analysis and Related Methods: Theory and Applications, 1st ed., Wiley-Interscience, New York, 1994.


6. H. Harman, Modern Factor Analysis, 2nd ed., University of Chicago Press, Chicago, 1967.
7. K.G. Jöreskog, Some contributions to maximum likelihood factor analysis, Psychometrika, 32:443–482, 1967.
8. I.T. Jolliffe, Principal Component Analysis, Springer-Verlag, Heidelberg, 1986.
9. M. Kendall, Multivariate Analysis, Charles Griffin, London, 1975.
10. A. Hyvärinen, Survey on independent component analysis, Neural Computing Surveys, 2:94–128, 1999.
11. P. Comon, Independent component analysis, a new concept? Signal Processing, 36:287–314, 1994.
12. J. Karhunen, A. Hyvärinen, and E. Oja, Independent Component Analysis, John Wiley & Sons, New York, 2001.
13. T. Lee, M. Girolami, M. Lewicki, and T. Sejnowski, Blind source separation of more sources than mixtures using overcomplete representations, IEEE Signal Processing Letters, 6:87–90, 1999.
14. H. Attias, Independent factor analysis, Neural Computation, 11:803–851, 1998.
15. R.J. Versteeg, Sensitivity of prestack depth migration to the velocity model, Geophysics, 58(6):873–882, 1993.
16. R.J. Versteeg, The Marmousi experience: velocity model determination on a synthetic complex data set, The Leading Edge, 13:927–936, 1994.
17. C.H. Dix, Seismic velocities from surface measurements, Geophysics, 20:68–86, 1955.
18. R. Bellman, Introduction to Matrix Analysis, McGraw-Hill, New York, 1960.

APPENDICES

6.A Upper Bound of the Number of Common Factors

Suppose that there is a unique Ψ; the matrix Σ − Ψ must then be of rank r. This is the reduced covariance matrix for x, in which each diagonal element represents the part of the variance that is due to the r common factors, rather than the total variance of the corresponding variate; it is known as the communality of the variate. When r = 1, A reduces to a column vector of p elements and is unique, apart from a possible change of sign of all its elements. With 1 < r < p common factors, it is not generally possible to determine A and s uniquely, even in the case of a normal distribution. Although every factor model specified by Equation 6.8 leads to a multivariate normal, the converse is not necessarily true when 1 < r < p. The difficulty is known as the factor identification or factor rotation problem. Let H be any (r × r) orthogonal matrix, so that HH^T = H^T H = I; then

x = AHH^T s + n = A* s* + n

where A* = AH and s* = H^T s. Thus, s and s* have the same statistical properties, since

E(s*) = H^T E(s),  cov(s*) = H^T cov(s) H = H^T H = I

Assume there exist 1 < r < p common factors such that Σ − Ψ = AVA^T, where V is Gramian and Ψ is diagonal. The covariance matrix Σ has

C(p, 2) + p = p(p + 1)/2

distinct elements, which equals the total number of normal equations to be solved. However, the number of solutions is infinite, as can be seen from the following derivation. Since V is Gramian, its Cholesky decomposition exists; that is, there exists a nonsingular (r × r) matrix U such that V = U^T U, and

Σ = AVA^T + Ψ = AU^T UA^T + Ψ = (AU^T)(AU^T)^T + Ψ = A*A*^T + Ψ    (6A.1)


Apparently both factorizations, Equation 6.6 and Equation 6A.1, leave the same residual error and therefore represent equally valid factor solutions. Also, we can substitute A* = AB and V* = B⁻¹V(B^T)⁻¹, which again yields a factor model indistinguishable from Equation 6.6. Therefore, no sample estimator can distinguish among such an infinite number of transformations. The coefficients A and A* are statistically equivalent and cannot be identified uniquely; that is, both the transformed and untransformed coefficients, together with Ψ, generate Σ in exactly the same way and cannot be differentiated by any estimation procedure without the introduction of additional restrictions.

To resolve the rotational indeterminacy of the factor model we require restrictions on V, the covariance matrix of the factors. The most straightforward and common restriction is to set V = I. The number m of free parameters implied by the equation

Σ = AA^T + Ψ    (6A.2)

is then equal to the total number pr + p of unknown parameters in A and Ψ, minus the number of zero restrictions placed on the off-diagonal elements of V, which is (1/2)(r² − r) since V is symmetric. We then have

m = (pr + p) − (1/2)(r² − r) = p(r + 1) − (1/2)(r² − r)    (6A.3)

where the columns of A are assumed to be orthogonal. The number of degrees of freedom d is then given by the number of equations implied by Equation 6A.2, that is, the number of distinct elements of Σ, minus the number of free parameters m. We have

d = (1/2)p(p + 1) − p(r + 1) + (1/2)(r² − r) = (1/2)[(p − r)² − (p + r)]    (6A.4)

which for a meaningful (i.e., nontrivial) empirical application must be strictly positive. This places an upper bound on the number of common factors r that may be obtained in practice, a number generally somewhat smaller than the number of variables p.
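Equation 6A.3 and Equation 6A.4 translate directly into a quick check of how many common factors a given number of variables can support (the function names are ours):

```python
def fa_degrees_of_freedom(p, r):
    """Degrees of freedom d = (1/2)[(p - r)^2 - (p + r)], Equation 6A.4."""
    return ((p - r) ** 2 - (p + r)) / 2

def max_common_factors(p):
    """Largest r with strictly positive degrees of freedom."""
    return max((r for r in range(1, p) if fa_degrees_of_freedom(p, r) > 0),
               default=0)

print(max_common_factors(10))   # 5: at most five common factors for ten variables
```

For example, with p = 10 variables the bound is r = 5, since d vanishes at r = 6 — noticeably smaller than p, as the text states.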

6.B Maximum Likelihood Algorithm

The maximum likelihood (ML) algorithm presented here was proposed by Jöreskog [7]. The algorithm uses an iterative procedure to compute a linear combination of variables to form factors. Assume that the random vector x has the multivariate normal distribution defined in Equation 6.9. The elements of A, V, and Ψ are the parameters of the model, to be estimated from the data. From a random sample of N observations of x we can compute the mean vector m_x and the estimated covariance matrix S, whose elements are the usual estimates of the variances and covariances of the components of x:


m_x = (1/N) Σ_{i=1}^{N} x_i

S = (1/(N − 1)) Σ_{i=1}^{N} (x_i − m_x)(x_i − m_x)^T = (1/(N − 1)) [ Σ_{i=1}^{N} x_i x_i^T − N m_x m_x^T ]    (6B.1)

The distribution of S is the Wishart distribution [3]. The log-likelihood function is given by

log L = −(1/2)(N − 1)[ log|Σ| + tr(SΣ⁻¹) ]

However, it is more convenient to minimize

F(A, V, Ψ) = log|Σ| + tr(SΣ⁻¹) − log|S| − p

instead of maximizing log L [7]; the two are equivalent because log L equals a constant minus (1/2)(N − 1) times F. The function F is regarded as a function of A and Ψ. Note that if H is any nonsingular (k × k) matrix, then

F(AH⁻¹, HVH^T, Ψ) = F(A, V, Ψ)

which means that the parameters in A and V are not independent of one another; to make the ML estimates of A and V unique, k² independent restrictions must be imposed on A and V. To find the minimum of F we first find the conditional minimum for a given Ψ and then find the overall minimum. The partial derivative of F with respect to A is

∂F/∂A = 2Σ⁻¹(Σ − S)Σ⁻¹A

(see Ref. [7] for details). For a given Ψ, the minimizing A is found from the solution of

Σ⁻¹(Σ − S)Σ⁻¹A = 0

Premultiplying by Σ gives

(Σ − S)Σ⁻¹A = 0

Using the following expression for the inverse Σ⁻¹ [3],

Σ⁻¹ = Ψ⁻¹ − Ψ⁻¹A(I + A^TΨ⁻¹A)⁻¹A^TΨ⁻¹    (6B.2)

the left-hand side may be further simplified [7], so that

(Σ − S)Ψ⁻¹A(I + A^TΨ⁻¹A)⁻¹ = 0

Postmultiplying by I + A^TΨ⁻¹A gives

(Σ − S)Ψ⁻¹A = 0    (6B.3)

which, after substitution of Σ from Equation 6A.2 and rearrangement of terms, gives

SΨ⁻¹A = A(I + A^TΨ⁻¹A)

Premultiplying by Ψ^(−1/2) finally gives

(Ψ^(−1/2)SΨ^(−1/2))(Ψ^(−1/2)A) = (Ψ^(−1/2)A)(I + A^TΨ⁻¹A)    (6B.4)

From Equation 6B.4 we see that it is convenient to take A^TΨ⁻¹A to be diagonal, since F is unaffected by postmultiplication of A by an orthogonal matrix and A^TΨ⁻¹A can be reduced to diagonal form by orthogonal transformations [18]. In this case, Equation 6B.4 is a standard eigendecomposition: the columns of Ψ^(−1/2)A are latent vectors of Ψ^(−1/2)SΨ^(−1/2), and the diagonal elements of I + A^TΨ⁻¹A are the corresponding latent roots. Let e₁ ≥ e₂ ≥ … ≥ e_p be the latent roots of Ψ^(−1/2)SΨ^(−1/2) and let ω₁, ω₂, …, ω_k be a set of latent vectors corresponding to the k largest roots. Let Λ_k be the diagonal matrix with e₁, e₂, …, e_k as diagonal elements and let E_k be the matrix with ω₁, ω₂, …, ω_k as columns. Then

Ψ^(−1/2)A = E_k(Λ_k − I)^(1/2)

Premultiplying by Ψ^(1/2) gives the conditional ML estimate of A as

A = Ψ^(1/2)E_k(Λ_k − I)^(1/2)    (6B.5)

Up to now we have considered the minimization of F with respect to A for a given Ψ. Now consider the partial derivative of F with respect to Ψ [3]:

∂F/∂Ψ = diag[ Σ⁻¹(Σ − S)Σ⁻¹ ]

Substituting Σ⁻¹ from Equation 6B.2 and using Equation 6B.3 gives

∂F/∂Ψ = diag[ Ψ⁻¹(Σ − S)Ψ⁻¹ ]

which by Equation 6.6 becomes

∂F/∂Ψ = diag[ Ψ⁻¹(AA^T + Ψ − S)Ψ⁻¹ ]

Setting this derivative to zero gives

Ψ = diag(S − AA^T)    (6B.6)

By iterating Equation 6B.5 and Equation 6B.6, the ML estimate of the FA model of Equation 6.4 can be obtained.
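The two-step iteration can be implemented directly from Equation 6B.5 and Equation 6B.6. The sketch below is a bare-bones version: the starting values, iteration count, and clipping guards are our own choices, and Jöreskog's actual procedure uses a more sophisticated minimization.

```python
import numpy as np

def ml_factor_analysis(S, k, n_iter=200, eps=1e-10):
    """Iterate Equation 6B.5 and Equation 6B.6 on a sample covariance matrix S.

    Returns the (p, k) loading matrix A and the vector of unique variances psi.
    """
    psi = 0.5 * np.diag(S).copy()                      # crude starting values
    for _ in range(n_iter):
        d = 1.0 / np.sqrt(psi + eps)                   # Psi^{-1/2} as a vector
        M = d[:, None] * S * d[None, :]                # Psi^{-1/2} S Psi^{-1/2}
        eigvals, eigvecs = np.linalg.eigh(M)           # ascending eigenvalues
        lam = np.clip(eigvals[-k:] - 1.0, 0.0, None)   # Lambda_k - I, kept nonnegative
        E = eigvecs[:, -k:]                            # eigenvectors of the k largest roots
        A = (E * np.sqrt(lam)) / d[:, None]            # Eq. 6B.5: A = Psi^{1/2} E (Lambda_k - I)^{1/2}
        psi = np.clip(np.diag(S - A @ A.T), eps, None) # Eq. 6B.6: Psi = diag(S - A A^T)
    return A, psi
```

For a covariance matrix generated by a true one-factor model, the iteration recovers the loadings up to a common sign.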

7 Kalman Filtering for Weak Signal Detection in Remote Sensing

Stacy L. Tantum, Yingyi Tan, and Leslie M. Collins

CONTENTS
7.1 Signal Models ... 131
7.1.1 Harmonic Signal Model ... 131
7.1.2 Interference Signal Model ... 131
7.2 Interference Mitigation ... 132
7.3 Postmitigation Signal Models ... 133
7.3.1 Harmonic Signal Model ... 134
7.3.2 Interference Signal Model ... 134
7.4 Kalman Filters for Weak Signal Estimation ... 135
7.4.1 Direct Signal Estimation ... 136
7.4.1.1 Conventional Kalman Filter ... 136
7.4.1.2 Kalman Filter with an AR Model for Colored Noise ... 137
7.4.1.3 Kalman Filter for Colored Noise ... 139
7.4.2 Indirect Signal Estimation ... 140
7.5 Application to Landmine Detection via Quadrupole Resonance ... 142
7.5.1 Quadrupole Resonance ... 143
7.5.2 Radio-Frequency Interference ... 143
7.5.3 Postmitigation Signals ... 144
7.5.3.1 Postmitigation Quadrupole Resonance Signal ... 145
7.5.3.2 Postmitigation Background Noise ... 145
7.5.4 Kalman Filters for Quadrupole Resonance Detection ... 145
7.5.4.1 Conventional Kalman Filter ... 145
7.5.4.2 Kalman Filter with an Autoregressive Model for Colored Noise ... 146
7.5.4.3 Kalman Filter for Colored Noise ... 146
7.5.4.4 Indirect Signal Estimation ... 146
7.6 Performance Evaluation ... 147
7.6.1 Detection Algorithms ... 147
7.6.2 Synthetic Quadrupole Resonance Data ... 147
7.6.3 Measured Quadrupole Resonance Data ... 148
7.7 Summary ... 150
References ... 151


Remote sensing often involves probing a region of interest with a transmitted electromagnetic signal, and then analyzing the returned signal to infer characteristics of the investigated region. It is not uncommon for the measured signal to be relatively weak or for ambient noise to interfere with the sensor’s ability to isolate and measure only the desired return signal. Although there are potential hardware solutions to these obstacles, such as increasing the power in the transmit signal to strengthen the return signal, or altering the transmit frequency, or shielding the system to eliminate the interfering ambient noise, these solutions are not always viable. For example, regulatory constraints on the amount of power that may be radiated by the sensor or the trade-off between the transmit power and the battery life for a portable sensor may limit the power in the transmitted signal, and effectively shielding a system in the field from ambient electromagnetic signals is very difficult. Thus, signal processing is often utilized to improve signal detectability in situations such as these where a hardware solution is not sufficient. Adaptive filtering is an approach that is frequently employed to mitigate interference. This approach, however, relies on the ability to measure the interference on auxiliary reference sensors. The signals measured on the reference sensors are utilized to estimate the interference, and then this estimate is subtracted from the signal measured by the primary sensor, which consists of the signal of interest and the interference. When the interference measured by the reference sensors is completely correlated with the interference measured by the primary sensor, the adaptive filtering can completely remove the interference from the primary signal. 
When there are limitations in the ability to measure the interference, that is, the signals from the reference sensors are not completely correlated with the interference measured by the primary sensor, this approach is not completely effective. Since some residual interference remains after the adaptive interference cancellation, signal detection performance is adversely affected. This is particularly true when the signal of interest is weak. Thus, methods to improve signal detection when there is residual interference would be useful. The Kalman filter (KF) is an important development in linear estimation theory. It is the statistically optimal estimator when the noise is Gaussian-distributed. In addition, the Kalman filter is still the optimal linear estimator in the minimum mean square error (MMSE) sense even when the Gaussian assumption is dropped [1]. Here, Kalman filters are applied to improve detection of weak harmonic signals. The emphasis in this chapter is not on developing new Kalman filters but, rather, on applying them in novel ways for improved weak harmonic signal detection. Both direct estimation and indirect estimation of the harmonic signal of interest are considered. Direct estimation is achieved by applying Kalman filters in the conventional manner; the state of the system is equal to the signal to be estimated. Indirect estimation of the harmonic signal of interest is achieved by reversing the usual application of the Kalman filter so the background noise is the system state to be estimated, and the signal of interest is the observation noise in the Kalman filter problem statement. This approach to weak signal estimation is evaluated through application to quadrupole resonance (QR) signal estimation for landmine detection. Mine detection technologies and systems that are in use or have been proposed include electromagnetic induction (EMI) [2], ground penetrating radar (GPR) [3], and QR [4,5]. 
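As background for the estimators developed in this chapter, the conventional Kalman filter referred to above is the standard predict/update recursion. A generic sketch follows; the state-space matrices are the user's model, not values taken from this chapter.

```python
import numpy as np

def kalman_filter(z, F, H, Q, R, x0, P0):
    """Conventional Kalman filter. z is a (T, m) array of observations for the
    model x_k = F x_{k-1} + w_k (cov Q), z_k = H x_k + v_k (cov R)."""
    x, P = x0, P0
    estimates = []
    for zk in z:
        # predict
        x = F @ x
        P = F @ P @ F.T + Q
        # update
        S = H @ P @ H.T + R                  # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
        x = x + K @ (zk - H @ x)
        P = (np.eye(len(x)) - K @ H) @ P
        estimates.append(x.copy())
    return np.array(estimates)
```

For a harmonic signal, F can be taken as a rotation matrix at the known frequency, so the filtered first state component tracks the sinusoid through the noise.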
Regardless of the technology utilized, the goal is to achieve a high probability of detection, PD, while maintaining a low probability of false alarm, PFA. This is of particular importance for landmine detection since the nearly perfect PD required to comply with safety requirements often comes at the expense of a high PFA, and the time and cost required to remediate contaminated areas is directly proportional to PFA. In areas such as a former battlefield, the average ratio of real mines to suspect objects can be as low as 1:100, thus the process of clearing the area often proceeds very slowly.

Kalman Filtering for Weak Signal Detection in Remote Sensing


QR technology for explosive detection is of crucial importance in an increasing number of applications. Most explosives, such as RDX, TNT, PETN, etc., contain nitrogen (N). Some of its isotopes, such as 14N, possess electric quadrupole moments. When compounds with such moments are probed with radio-frequency (RF) signals, they emit unique signals defined by the specific nucleus and its chemical environment. The QR frequencies for explosives are quite specific and are not shared by other nitrogenous materials. Since the detection process is specific to the chemistry of the explosive and therefore is less susceptible to the types of false alarms experienced by sensors typically used for landmine detection, such as EMI or GPR sensors, the pure QR of 14N nuclei supports a promising method for detecting explosives in the quantities encountered in landmines. Unfortunately, QR signals are weak, and thus vulnerable to both the thermal noise inherent in the sensor coil and external radio-frequency interference (RFI). The performance of the Kalman filter approach is evaluated on both simulated data and measured field data collected by Quantum Magnetics, Inc. (QM). The results show that the proposed algorithm improves the performance of landmine detection.

7.1 Signal Models

In this chapter, it is assumed that the sensor operates by repeatedly transmitting excitation pulses to investigate the potential target and acquires the sensor response after each pulse. The data acquired after each excitation pulse are termed a segment, and a group of segments constitutes a measurement. In general, for each potential target there are multiple measurements with each measurement containing many segments.

7.1.1 Harmonic Signal Model

The discrete-time harmonic signal of interest, at frequency $f_0$, in a single segment can be represented by

$$ s(n) = A_0 \cos(2\pi f_0 n + \phi_0), \qquad n = 0, 1, \ldots, N-1 \tag{7.1} $$

The measured signal may be demodulated at the frequency of the desired harmonic signal, $f_0$, to produce a baseband signal, $\tilde{s}(n)$,

$$ \tilde{s}(n) = A_0 e^{j\phi_0}, \qquad n = 0, 1, \ldots, N-1 \tag{7.2} $$

Assuming the frequency of the harmonic signal of interest is precisely known, the signal of interest after demodulation and subsequent low-pass filtering to remove any aliasing introduced by the demodulation is a DC constant.
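As a quick numerical check of Equation 7.1 and Equation 7.2, the following sketch (with illustrative sampling parameters, not values from the chapter) demodulates a real cosine to baseband. Note that demodulating a real cosine yields the DC constant $(A_0/2)e^{j\phi_0}$; the factor of one-half is absorbed into the amplitude convention of Equation 7.2.

```python
import numpy as np

# Illustrative parameters (not from the chapter): a 5 kHz tone sampled at 50 kHz.
fs = 50e3                      # sampling rate (Hz)
f0 = 5e3                       # harmonic signal frequency (Hz)
A0, phi0 = 1.0, 0.3
N = 1024
n = np.arange(N)

# Equation 7.1: real-valued harmonic signal in one segment
s = A0 * np.cos(2 * np.pi * (f0 / fs) * n + phi0)

# Demodulate at f0; averaging over the segment acts as a crude low-pass
# filter that suppresses the image component at 2*f0, leaving the DC
# constant of Equation 7.2 (up to the factor of 1/2 from the real cosine).
baseband = s * np.exp(-2j * np.pi * (f0 / fs) * n)
dc = baseband.mean()

print(abs(dc), np.angle(dc))   # approximately A0/2 and phi0
```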

7.1.2 Interference Signal Model

A source of interference for this type of signal detection problem is ambient harmonic signals. For example, sensors operating in the RF band could experience interference due to other transmitters operating in the same band, such as radio stations. Since there may be many sources transmitting harmonic signals operating simultaneously, the demodulated


interference signal measured in each segment may be modeled as the sum of the contributions from $M$ different sources, each operating at its own frequency $f_m$,

$$ \tilde{I}(n) = \sum_{m=1}^{M} \tilde{A}_m(n)\, e^{j(2\pi f_m n + \phi_m)}, \qquad n = 0, 1, \ldots, N-1 \tag{7.3} $$

where the superscript $\sim$ denotes a complex value and we assume the frequencies are distinct, meaning $f_i \neq f_j$ for $i \neq j$. The amplitudes $\tilde{A}_m(n)$ form a discrete time series. Although the amplitudes are not restricted to constant values in general, they are assumed to remain essentially constant over the short time intervals during which each data segment is collected. For time intervals on this order, it is reasonable to assume $\tilde{A}_m(n)$ is constant for each data segment, but may change from segment to segment. Therefore, the interference signal model may be expressed as

$$ \tilde{I}(n) = \sum_{m=1}^{M} \tilde{A}_m\, e^{j(2\pi f_m n + \phi_m)}, \qquad n = 0, \ldots, N-1 \tag{7.4} $$

This model represents all frequencies even though the harmonic signal of interest exists in a very narrow band. In practice, only the frequency corresponding to the harmonic signal of interest needs to be considered.
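The segment-wise model of Equation 7.4 can be simulated directly; the number of sources, frequencies, amplitude ranges, and segment counts below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative values (not from the chapter): M = 3 interfering sources,
# S segments of N samples each.
M, S, N = 3, 5, 256
freqs = np.array([0.05, 0.12, 0.31])    # distinct f_m (cycles/sample)
phases = rng.uniform(0, 2 * np.pi, M)   # phi_m
n = np.arange(N)

# Equation 7.4: each amplitude A_m is constant within a segment but is
# redrawn per segment to model slow, segment-to-segment variation.
segments = []
for _ in range(S):
    A = rng.uniform(0.5, 2.0, M)
    I = sum(A[m] * np.exp(1j * (2 * np.pi * freqs[m] * n + phases[m]))
            for m in range(M))
    segments.append(I)
segments = np.array(segments)           # shape (S, N), complex-valued

print(segments.shape)
```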

7.2 Interference Mitigation

Adaptive filtering is a widely applied approach for noise cancellation [6]. The basic approach is illustrated in Figure 7.1. The primary signal consists of both the harmonic signal of interest and the interference. In contrast, the signal measured on each auxiliary antenna or sensor consists of only the interference. Adaptive noise cancellation utilizes the measured reference signals to estimate the noise present in the measured primary signal. The noise estimate is then subtracted from the primary signal to find the signal of interest. Adaptive noise cancellation, such as the least mean square (LMS) algorithm, is well suited for those applications in which one or more reference signals are available [6].

FIGURE 7.1 Interference mitigation based on adaptive noise cancellation.


In this application, the adaptive noise cancellation is performed in the frequency domain by applying the normalized LMS algorithm to each frequency component of the frequency-domain representation of the measured primary signal. The primary measured signal at time $n$ may be denoted by $d(n)$ and the measured reference signal by $u(n)$. The tap input vector may be represented by $u(n) = [u(n)\ u(n-1)\ \cdots\ u(n-M+1)]^T$ and the tap weight vector by $\hat{w}(n)$. Both the tap input and tap weight vectors are of length $M$. Given these definitions, the filter output at time $n$, $e(n)$, is

$$ e(n) = d(n) - \hat{w}^H(n)\, u(n) \tag{7.5} $$

The quantity $\hat{w}^H(n)\, u(n)$ represents the interference estimate. The tap weights are updated according to

$$ \hat{w}(n+1) = \hat{w}(n) + \mu_0\, P^{-1}(n)\, u(n)\, e^{*}(n) \tag{7.6} $$

where the parameter $\mu_0$ is an adaptation constant that controls the convergence rate and $P(n)$ is given by

$$ P(n) = \beta P(n-1) + (1-\beta)\, |u(n)|^2 \tag{7.7} $$

with 0 < b < 1 [7]. The extension of this approach to utilize multiple reference signals is straightforward.
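A time-domain sketch of the normalized LMS canceller of Equations 7.5 through 7.7 follows (the chapter applies the algorithm per frequency bin in the frequency domain; the signals, step size, and smoothing constant here are illustrative choices, not chapter values):

```python
import numpy as np

def nlms_cancel(d, u, M=8, mu=0.1, beta=0.9, eps=1e-8):
    """Single-reference normalized LMS canceller (Equations 7.5 to 7.7).

    d : complex primary signal (signal of interest plus interference)
    u : complex reference signal (correlated with the interference)
    Returns e, the interference-cancelled output.
    """
    w = np.zeros(M, dtype=complex)          # tap weights w-hat(n)
    P = 1.0                                 # smoothed reference power P(n)
    e = np.zeros(len(d), dtype=complex)
    for n in range(len(d)):
        # tap-input vector u(n) = [u(n), u(n-1), ..., u(n-M+1)]^T
        un = np.array([u[n - k] if n - k >= 0 else 0.0 for k in range(M)])
        e[n] = d[n] - np.vdot(w, un)                    # Eq. 7.5
        P = beta * P + (1 - beta) * abs(u[n]) ** 2      # Eq. 7.7
        w = w + (mu / (P + eps)) * un * np.conj(e[n])   # Eq. 7.6
    return e

# Toy usage (illustrative, not chapter data): a weak DC signal of interest
# buried under a strong tone that also appears, phase-shifted, on the reference.
rng = np.random.default_rng(1)
n = np.arange(4000)
tone = np.exp(1j * 2 * np.pi * 0.1 * n)
d = 0.05 + 2.0 * tone + 0.01 * rng.standard_normal(len(n))
u = tone * np.exp(1j * 0.7)
e = nlms_cancel(d, u)

print(np.mean(np.abs(e[-1000:])))   # residual: mostly the weak DC signal
```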

7.3 Postmitigation Signal Models

Under perfect circumstances the interference present in the primary signal is completely correlated with the reference signals and all interference can be removed by the adaptive noise cancellation, leaving only Gaussian noise associated with the sensor system. Since the interference often travels over multiple paths and the sensing systems are not perfect, however, the adaptive interference mitigation rarely removes all the interference. Thus, there is residual interference that remains after the adaptive noise cancellation. In addition, the adaptive interference cancellation process alters the characteristics of the signal of interest. The real-valued observed data prior to interference mitigation may be represented by

$$ x(n) = s(n) + I(n) + w(n), \qquad n = 0, 1, \ldots, N-1 \tag{7.8} $$

where $s(n)$ is the signal of interest, $I(n)$ is the interference, and $w(n)$ is Gaussian noise associated with the sensor system. The baseband signal after demodulation becomes

$$ \tilde{x}(n) = \tilde{s}(n) + \tilde{I}(n) + \tilde{w}(n) \tag{7.9} $$

where $\tilde{I}(n)$ is the interference, which is reduced but not completely eliminated by adaptive filtering. The signal remaining after interference mitigation is

$$ \tilde{y}(n) = \tilde{s}(n) + \tilde{v}(n), \qquad n = 0, 1, \ldots, N-1 \tag{7.10} $$


where $\tilde{s}(n)$ is now the altered signal of interest and $\tilde{v}(n)$ is the background noise remaining after interference mitigation, both of which are described in the following sections.

7.3.1 Harmonic Signal Model

Due to the nonlinear phase effects of the adaptive filter, the demodulated harmonic signal of interest is altered by the interference mitigation; it is no longer guaranteed to be a DC constant. The postmitigation harmonic signal is modeled with a first-order Gaussian-Markov model,

$$ \tilde{s}(n+1) = \tilde{s}(n) + \tilde{\varepsilon}(n) \tag{7.11} $$

where $\tilde{\varepsilon}(n)$ is zero-mean white Gaussian noise with variance $\sigma_{\tilde{\varepsilon}}^2$.
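The random-walk behavior implied by Equation 7.11 can be visualized with a short simulation (the driving-noise variance and starting value below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Equation 7.11: first-order Gaussian-Markov (random-walk) model for the
# postmitigation harmonic signal; sigma_eps is an illustrative value.
N = 512
sigma_eps = 0.01
s0 = 1.0 + 0.5j                 # nominal DC value before mitigation effects

# complex white Gaussian driving noise with variance sigma_eps**2
eps = (sigma_eps / np.sqrt(2)) * (rng.standard_normal(N)
                                  + 1j * rng.standard_normal(N))
s = s0 + np.cumsum(eps)         # s(n+1) = s(n) + eps(n)

# Over a short segment the signal wanders only slightly from its starting
# value, which is why the DC model remains a reasonable approximation.
print(np.abs(s - s0).max())
```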

7.3.2 Interference Signal Model

Although interference mitigation can remove most of the interference, some residual interference remains in the postmitigation signal. The background noise remaining after interference mitigation, $\tilde{v}(n)$, consists of the residual interference and the remaining sensor noise, which is altered by the mitigation. Either an autoregressive (AR) model or an autoregressive moving average (ARMA) model is appropriate for representing the sharp spectral peaks, valleys, and roll-offs in the power spectrum of $\tilde{v}(n)$. An AR model is a causal, linear, time-invariant discrete-time system with a transfer function containing only poles, whereas an ARMA model is a causal, linear, time-invariant discrete-time system with a transfer function containing both zeros and poles. The AR model has a computational advantage over the ARMA model in the coefficient computation. Specifically, the AR coefficient computation involves solving a system of linear equations known as the Yule-Walker equations, whereas the ARMA coefficient computation is significantly more complicated because it requires solving systems of nonlinear equations. Estimating and identifying an AR model for a real-valued time series is well understood [8]. For complex-valued AR models, however, few theoretical and practical identification and estimation methods can be found in the literature. The most common approach is to adapt methods originally developed for real-valued data to complex-valued data. This strategy works well only when the complex-valued process is the output of a linear system driven by circular white noise, whose real and imaginary parts are uncorrelated [9]. Unfortunately, circularity is not guaranteed for most complex-valued processes in practical situations. Analysis of the postmitigation background noise for the specific QR signal detection problem considered here showed that it is not a pure circular complex process.
However, since the cross-correlation between the real and imaginary parts is small compared to the autocorrelation, we assume that the noise is a circular complex process and can be modeled as a $P$-th order complex AR process. Thus,

$$ \tilde{v}(n) = \sum_{p=1}^{P} \tilde{a}_p\, \tilde{v}(n-p) + \tilde{\varepsilon}(n) \tag{7.12} $$

where the driving noise $\tilde{\varepsilon}(n)$ is white and complex-valued. Conventional modeling methods extended from real-valued time series are used for estimating the complex AR parameters. The Burg algorithm, which estimates the AR parameters by determining reflection coefficients that minimize the sum of forward and


backward residuals, is the preferred estimator among the various estimators for AR parameters [10]. Furthermore, the Burg algorithm has been reformulated so that it can be used to analyze data containing several separate segments [11]. The usual way of combining the information across segments is to take the average of the models estimated from the individual segments [12], such as the average of the AR parameters (AVA) or the average of the reflection coefficients (AVK). We found that for this application, the model coefficients are stable enough to be averaged. Although this reduces the estimate variance, the estimate still contains a bias that is proportional to $1/N$, where $N$ is the number of observations [13]. Another extension of the Burg algorithm to segments, the segment Burg algorithm (SBurg) proposed in [14], estimates the reflection coefficients by minimizing the sum of the forward and backward residuals of all the segments taken together; that is, a single model is fit to all the segments simultaneously.

An important aspect of AR modeling is choosing the appropriate order $P$. Although there are several criteria for determining the order of a real AR model, no formal criterion exists for a complex AR model. We adopt the Akaike information criterion (AIC) [15], a common choice for real AR models. The AIC is defined as

$$ \mathrm{AIC}(p) = \ln\big(\hat{\sigma}_{\varepsilon}^2(p)\big) + \frac{2p}{T}, \qquad p = 1, 2, 3, \ldots \tag{7.13} $$

where $\hat{\sigma}_{\varepsilon}^2(p)$ is the prediction error power at order $p$ and $T$ is the total number of samples. The prediction error power for a given order $p$ is simply the variance of the difference between the true signal and the estimate of the signal using a model of order $p$. For signal measurements consisting of multiple segments, i.e., $S$ segments of $N$ samples each, $T = NS$. The model order for which the AIC has the smallest value is chosen. The AIC method tends to overestimate the order [16] and performs well for short data records.
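A single-record sketch of complex Burg estimation with AIC order selection (Equation 7.12 and Equation 7.13) is given below; the simulated AR(2) coefficients are illustrative, and the segment-averaging variants (AVA, AVK, SBurg) are not implemented here.

```python
import numpy as np

def burg_ar(x, P):
    """Burg estimate of a complex AR(P) model.

    Returns (a, E): prediction-error filter coefficients a = [1, a_1, ..., a_P]
    and the prediction error power E.  The AR coefficients of Equation 7.12
    are the negated polynomial taps, a_p(AR) = -a_p.
    """
    x = np.asarray(x, dtype=complex)
    f, b = x[1:].copy(), x[:-1].copy()      # forward / backward errors
    a = np.array([1.0 + 0j])
    E = np.mean(np.abs(x) ** 2)
    for m in range(1, P + 1):
        # reflection coefficient minimizing forward + backward residual power
        k = -2.0 * np.sum(f * np.conj(b)) / (
            np.sum(np.abs(f) ** 2) + np.sum(np.abs(b) ** 2))
        a_ext = np.concatenate([a, [0.0 + 0j]])
        a = a_ext + k * np.conj(a_ext[::-1])          # Levinson recursion
        E *= 1.0 - abs(k) ** 2
        if m < P:
            f, b = (f + k * b)[1:], (b + np.conj(k) * f)[:-1]
    return a, E

def aic_order(x, Pmax):
    """Select the AR order by the AIC of Equation 7.13 (single record, T = N)."""
    aics = [np.log(burg_ar(x, p)[1].real) + 2 * p / len(x)
            for p in range(1, Pmax + 1)]
    return int(np.argmin(aics)) + 1

# Illustrative check: a complex AR(2) process with known coefficients
rng = np.random.default_rng(3)
N = 4000
e = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
v = np.zeros(N, dtype=complex)
for n in range(2, N):
    v[n] = 0.6 * v[n - 1] - 0.3 * v[n - 2] + e[n]

a_hat, _ = burg_ar(v, 2)
best_p = aic_order(v, 8)
print(np.round(-a_hat[1:], 3), best_p)
```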

7.4 Kalman Filters for Weak Signal Estimation

Kalman filters are appropriate for discrete-time, linear, and dynamic systems whose output can be characterized by the system state. To establish notation and terminology, this section provides a brief overview consisting of the problem statement and the recursive solution of the Kalman filter variants applied for weak harmonic signal detection. Two approaches to estimate a weak signal in nonstationary noise, both employing Kalman filter approaches, are proposed [17]. The first approach directly estimates the signal of interest. In this approach, the Kalman filter is utilized in a traditional manner, in that the signal of interest is the state to be estimated. The second approach indirectly estimates the signal of interest. This approach utilizes the Kalman filter in an unconventional way because the noise is the state to be estimated, and then the noise estimate is subtracted from the measured signal to obtain an estimate of the signal of interest. The problem statements and recursive Kalman filter solutions for each of the Kalman filters considered are provided with their application to weak signal estimation. These approaches have an advantage over the more intuitive approach of simply subtracting a measurement of the noise recorded in the field because the measured background noise is nonstationary, and the residual background noise after interference mitigation is also nonstationary. Due to the nonstationarity of the background noise, it must be measured and subtracted in real time. This is exactly what the adaptive frequency-domain


interference mitigation attempts to do, and the interference mitigation does significantly improve harmonic signal detection. There are, however, limitations to the accuracy with which the reference background noise can be measured, and these limitations make the interference mitigation insufficient for noise reduction. Thus, further processing, such as the Kalman filtering proposed here, is necessary to improve detection performance.

7.4.1 Direct Signal Estimation

The first approach assumes the demodulated harmonic signal is the system state to be estimated. The conventional Kalman filter was designed for white noise. Since the measurement noise in this application is colored, it is necessary to investigate modified Kalman filters. Two modifications are considered: utilizing an autoregressive (AR) model for the colored noise, and designing a Kalman filter for colored noise that can be modeled as a Markov process. In applying each of the following Kalman filter variants to directly estimate the demodulated harmonic signal, the system is assumed to be described by

$$ \text{state equation:}\quad \tilde{s}_k = \tilde{s}_{k-1} + \tilde{\varepsilon}_k \tag{7.14} $$

$$ \text{observation equation:}\quad \tilde{y}_k = \tilde{s}_k + \tilde{v}_k \tag{7.15} $$

where $\tilde{s}_k$ is the postmitigation harmonic signal, modeled using a first-order Gaussian-Markov model, and $\tilde{v}_k$ is the postmitigation background noise.

7.4.1.1 Conventional Kalman Filter

The system model for the conventional Kalman filter is described by two equations: a state equation and an observation equation. The state equation relates the current system state to the previous system state, while the observation equation relates the observed data to the current system state. Thus, the system model is described by

$$ \text{state equation:}\quad x_k = F_k x_{k-1} + G_k w_k \tag{7.16} $$

$$ \text{observation equation:}\quad z_k = H_k^H x_k + u_k \tag{7.17} $$

where the $M$-dimensional parameter vector $x_k$ represents the state of the system at time $k$, the $M \times M$ matrix $F_k$ is the known state transition matrix relating the states of the system at times $k$ and $k-1$, and the $N$-dimensional parameter vector $z_k$ represents the measured data at time $k$. The $M \times 1$ vector $w_k$ represents process noise, and the $N \times 1$ vector $u_k$ is measurement noise. The $M \times M$ diagonal coefficient matrix $G_k$ modifies the variances of the process noise. Both $u_k$ and $w_k$ are independent, zero-mean, white noise processes with $E\{u_k u_k^H\} = R_k$ and $E\{w_k w_k^H\} = Q_k$. The initial system state $x_0$ is a random vector, with mean $\bar{x}_0$ and covariance $\Sigma_0$, independent of $u_k$ and $w_k$. The Kalman filter determines the estimates of the system state, $\hat{x}_{k|k-1} = E\{x_k | z_{k-1}\}$ and $\hat{x}_{k|k} = E\{x_k | z_k\}$, and the associated error covariance matrices $\Sigma_{k|k-1}$ and $\Sigma_{k|k}$. The recursive solution is achieved in two steps. The first step predicts the current state and the error covariance matrix using the previous data,

$$ \hat{x}_{k|k-1} = F_{k-1}\, \hat{x}_{k-1|k-1} \tag{7.18} $$

$$ \Sigma_{k|k-1} = F_{k-1}\, \Sigma_{k-1|k-1}\, F_{k-1}^H + G_{k-1} Q_{k-1} G_{k-1}^H \tag{7.19} $$


The second step first determines the Kalman gain, $K_k$, and then updates the state and the error covariance prediction using the current data,

$$ K_k = \Sigma_{k|k-1} H_k \big( H_k^H \Sigma_{k|k-1} H_k + R_k \big)^{-1} \tag{7.20} $$

$$ \hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k \big( z_k - H_k^H \hat{x}_{k|k-1} \big) \tag{7.21} $$

$$ \Sigma_{k|k} = \big( I - K_k H_k^H \big)\, \Sigma_{k|k-1} \tag{7.22} $$

The initial conditions on the state estimate and error covariance matrix are $\hat{x}_{1|0} = \bar{x}_0$ and $\Sigma_{1|0} = \Sigma_0$. In general, the system parameters $F$, $G$, and $H$ are time-varying, which is denoted in the preceding discussion by the subscript $k$. In the discussions of the Kalman filters implemented for harmonic signal estimation, the system is assumed to be time-invariant. Thus, the subscript $k$ is omitted and the system model becomes

$$ \text{state equation:}\quad x_k = F x_{k-1} + G w_k \tag{7.23} $$

$$ \text{observation equation:}\quad z_k = H^H x_k + u_k \tag{7.24} $$
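The recursion of Equations 7.18 through 7.22 can be sketched compactly; the scalar usage example below (a nearly DC complex signal in white noise, matching the random-walk model of Equations 7.14 and 7.15) uses illustrative noise powers, not chapter data.

```python
import numpy as np

def kalman_filter(z, F, G, H, Q, R, x0, S0):
    """Conventional Kalman filter, Equations 7.18 through 7.22.

    z : (T, N) observations; F, G : (M, M); H : (M, N); Q : (M, M); R : (N, N)
    Returns the filtered state estimates x_hat[k] = x_{k|k}.
    """
    x, S = x0.copy(), S0.copy()
    x_hat = []
    for zk in z:
        # predict (Eqs. 7.18-7.19)
        x = F @ x
        S = F @ S @ F.conj().T + G @ Q @ G.conj().T
        # gain and update (Eqs. 7.20-7.22)
        K = S @ H @ np.linalg.inv(H.conj().T @ S @ H + R)
        x = x + K @ (zk - H.conj().T @ x)
        S = (np.eye(len(x)) - K @ H.conj().T) @ S
        x_hat.append(x.copy())
    return np.array(x_hat)

# Illustrative scalar use: estimate a (nearly) DC complex signal in white noise.
rng = np.random.default_rng(4)
T = 500
s_true = 0.2 + 0.1j                           # postmitigation DC level
noise = 0.5 * (rng.standard_normal(T) + 1j * rng.standard_normal(T))
z = (s_true + noise).reshape(T, 1)

F = G = H = np.eye(1, dtype=complex)
Q = 1e-6 * np.eye(1)                          # small process noise (sigma_eps^2)
R = 0.5 * np.eye(1)                           # measurement noise power
x_hat = kalman_filter(z, F, G, H, Q, R,
                      x0=np.zeros(1, dtype=complex),
                      S0=np.eye(1, dtype=complex))

print(x_hat[-1, 0])   # close to s_true
```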

7.4.1.2 Kalman Filter with an AR Model for Colored Noise

A Kalman filter for colored measurement noise, based on the fact that colored noise can often be simulated with sufficient accuracy by a linear dynamic system driven by white noise, is proposed in [18]. In this approach, the colored noise vector is included in an augmented state variable vector, and the observations then contain only linear combinations of the augmented state variables. The state equation is unchanged from Equation 7.23; however, the observation equation is modified so the measurement noise is colored. Thus, the system model is

$$ \text{state equation:}\quad x_k = F x_{k-1} + G w_k \tag{7.25} $$

$$ \text{observation equation:}\quad z_k = H^H x_k + \tilde{v}_k \tag{7.26} $$

where $\tilde{v}_k$ is colored noise modeled by the complex-valued AR process in Equation 7.12. Expressing the $P$-th order AR process $\tilde{v}_k$ in state-space notation yields

$$ \text{state equation:}\quad v_k = F_v v_{k-1} + G_v \tilde{\varepsilon}_k \tag{7.27} $$

$$ \text{observation equation:}\quad \tilde{v}_k = H_v^H v_k \tag{7.28} $$

where

$$ v_k = \begin{bmatrix} \tilde{v}(k-P+1) \\ \tilde{v}(k-P+2) \\ \vdots \\ \tilde{v}(k) \end{bmatrix}_{P \times 1} \tag{7.29} $$

$$ F_v = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ \tilde{a}_P & \tilde{a}_{P-1} & \tilde{a}_{P-2} & \cdots & \tilde{a}_1 \end{bmatrix}_{P \times P} \tag{7.30} $$

$$ G_v = H_v = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}_{P \times 1} \tag{7.31} $$

and $\tilde{\varepsilon}_k$ is the white driving process in Equation 7.12, with zero mean and variance $\sigma_{\tilde{\varepsilon}}^2$. Combining Equation 7.25, Equation 7.26, and Equation 7.28 yields a new Kalman filter expression whose dimensions have been extended,

$$ \text{state equation:}\quad \bar{x}_k = \bar{F} \bar{x}_{k-1} + \bar{G} \bar{w}_k \tag{7.32} $$

$$ \text{observation equation:}\quad z_k = \bar{H}^H \bar{x}_k \tag{7.33} $$

where

$$ \bar{x}_k = \begin{bmatrix} x_k \\ v_k \end{bmatrix}, \quad \bar{w}_k = \begin{bmatrix} w_k \\ \tilde{\varepsilon}_k \end{bmatrix}, \quad \bar{H}^H = \begin{bmatrix} H^H & H_v^H \end{bmatrix}, \quad \bar{F} = \begin{bmatrix} F & 0 \\ 0 & F_v \end{bmatrix}, \quad \bar{G} = \begin{bmatrix} G & 0 \\ 0 & G_v \end{bmatrix} \tag{7.34} $$

In the estimation literature, this is termed the noise-free [19] or perfect measurement [20] problem. The process noise, $w_k$, and the colored-noise state process noise, $\tilde{\varepsilon}_k$, are assumed to be uncorrelated, so

$$ \bar{Q} = E\{\bar{w}_k \bar{w}_k^H\} = \begin{bmatrix} Q & 0 \\ 0 & \sigma_{\tilde{\varepsilon}}^2 \end{bmatrix} \tag{7.35} $$

Since there is no noise in Equation 7.33, the covariance matrix of the observation noise is $R = 0$. The recursive solution for this problem, defined in Equation 7.32 and Equation 7.33, is the same as for the conventional Kalman filter given in Equation 7.18 through Equation 7.22. When estimating the harmonic signal with the traditional Kalman filter with an AR model for the colored noise, the coefficient matrices are

$$ \bar{x}_k = \begin{bmatrix} \tilde{x}(k) \\ \tilde{v}(k-P+1) \\ \tilde{v}(k-P+2) \\ \vdots \\ \tilde{v}(k) \end{bmatrix}, \qquad \bar{H} = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}_{(P+1) \times 1} \tag{7.36} $$

$$ \bar{F} = \begin{bmatrix} 1 & 0 \\ 0 & F_v \end{bmatrix}_{(P+1) \times (P+1)} \tag{7.37} $$

$$ \bar{w}_k = \begin{bmatrix} \tilde{w}_k \\ \tilde{\varepsilon}_k \end{bmatrix}, \qquad \bar{G} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ \vdots & \vdots \\ 0 & 1 \end{bmatrix}_{(P+1) \times 2} \tag{7.38} $$

and

$$ \bar{Q} = E\{\bar{w}_k \bar{w}_k^H\} = \begin{bmatrix} \sigma_{\tilde{w}}^2 & 0 \\ 0 & \sigma_{\tilde{\varepsilon}}^2 \end{bmatrix} \tag{7.39} $$
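Assembling the augmented matrices of Equations 7.36 through 7.39 is mechanical; the sketch below builds them for a given set of AR coefficients (the second-order coefficients and variances in the example are illustrative only).

```python
import numpy as np

def augmented_model(a, sigma_w2, sigma_eps2):
    """Build the augmented state-space model of Equations 7.36 to 7.39.

    a : AR coefficients [a_1, ..., a_P] of the colored noise (Eq. 7.12)
    Returns (F_bar, G_bar, H_bar, Q_bar) for use with a conventional
    Kalman filter (Eqs. 7.18-7.22) and a small positive R.
    """
    P = len(a)
    # companion matrix F_v of Eq. 7.30: superdiagonal ones, AR taps in last row
    F_v = np.zeros((P, P), dtype=complex)
    F_v[:-1, 1:] = np.eye(P - 1)
    F_v[-1, :] = np.asarray(a)[::-1]          # [a_P, ..., a_1]
    # block-diagonal F_bar = diag(1, F_v) (Eq. 7.37)
    F_bar = np.zeros((P + 1, P + 1), dtype=complex)
    F_bar[0, 0] = 1.0
    F_bar[1:, 1:] = F_v
    # G_bar routes (w_k, eps_k) into the first and last states (Eq. 7.38)
    G_bar = np.zeros((P + 1, 2), dtype=complex)
    G_bar[0, 0] = 1.0
    G_bar[-1, 1] = 1.0
    # observation picks out signal state plus current noise sample (Eq. 7.36)
    H_bar = np.zeros((P + 1, 1), dtype=complex)
    H_bar[0, 0] = 1.0
    H_bar[-1, 0] = 1.0
    # Q_bar = diag(sigma_w^2, sigma_eps^2) (Eq. 7.39)
    Q_bar = np.diag([sigma_w2, sigma_eps2]).astype(complex)
    return F_bar, G_bar, H_bar, Q_bar

# Example with a 2nd-order AR noise model (illustrative coefficients)
F_bar, G_bar, H_bar, Q_bar = augmented_model([0.6 + 0j, -0.3 + 0j], 1e-6, 0.1)
print(F_bar.shape, G_bar.shape, H_bar.shape)
```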

Since the theoretical value $R = 0$ turns off the Kalman filter so that it does not track the signal, $R$ should instead be set to a small positive value.

7.4.1.3 Kalman Filter for Colored Noise

The Kalman filter has been generalized to systems in which both the process noise and the measurement noise are colored and can be modeled as Markov processes [21]. A Markov process is a process for which the probability density function of the current sample depends only on the previous sample, not on the entire process history. Although the Markov assumption may not be accurate in practice, this approach remains applicable. The system model is

$$ \text{state equation:}\quad x_k = F x_{k-1} + G w_k \tag{7.40} $$

$$ \text{observation equation:}\quad z_k = H^H x_k + u_k \tag{7.41} $$

where the process noise $w_k$ and the measurement noise $u_k$ are both zero-mean and colored, with arbitrary covariance matrices

$$ Q_{ij} = \mathrm{cov}(w_i, w_j), \qquad R_{ij} = \mathrm{cov}(u_i, u_j), \qquad i, j = 0, 1, 2, \ldots $$

The initial state $x_0$ is a random vector with mean $\bar{x}_0$ and covariance matrix $P_0$, and $x_0$, $w_k$, and $u_k$ are independent. The prediction step of the Kalman filter solution is

$$ \hat{x}_{k|k-1} = F \hat{x}_{k-1|k-1} \tag{7.42} $$

$$ \Sigma_{k|k-1} = F \Sigma_{k-1|k-1} F^H + G Q_{k-1} G^H + F C_{k-1} + (F C_{k-1})^H \tag{7.43} $$

where $C_{k-1} = C^{k-1}_{k-1|k-1}$ and $C^{k-1}_{i|i-1}$ is given recursively by

$$ C^{k-1}_{i|i-1} = F C^{k-1}_{i-1|i-1} + G Q_{i-1,\,k-1} G^H \tag{7.44} $$

$$ C^{k-1}_{i|i} = C^{k-1}_{i|i-1} - K_i H^H C^{k-1}_{i|i-1} \tag{7.45} $$

with the initial value $C^{k}_{0|0} = 0$ and $Q_0 = 0$. The $k$ in the superscript denotes time $k$. The subsequent update step is

$$ \hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k \big( z_k - H^H \hat{x}_{k|k-1} \big) \tag{7.46} $$

$$ \Sigma_{k|k} = \Sigma_{k|k-1} - K_k S_k K_k^H \tag{7.47} $$

where $K_k$ and $S_k$ are given by

$$ K_k = \big( \Sigma_{k|k-1} H + V_k \big) S_k^{-1} \tag{7.48} $$

$$ S_k = H^H \Sigma_{k|k-1} H + H^H V_k + (H^H V_k)^H + R_k \tag{7.49} $$

and $V_k = V^k_{k|k-1}$ is given recursively by

$$ V^k_{i|i-1} = F V^k_{i-1|i-1}, \qquad V^k_{i|i} = V^k_{i|i-1} + K_i \big( R_{i,k} - H^H V^k_{i|i-1} \big) \tag{7.50} $$

with $R_k$ being the new "data" and the initial value $V^k_{0|0} = 0$. When the noise is white, $V = 0$ and $C = 0$, and this Kalman filter reduces to the conventional Kalman filter. It is worth noting that this filter is optimal only when $E\{w_k\}$ and $E\{u_k\}$ are known.

7.4.2 Indirect Signal Estimation

A central premise in Kalman filter theory is that the underlying state-space model is accurate. When this assumption is violated, the performance of the filter can deteriorate appreciably. The sensitivity of the Kalman filter to signal modeling errors has led to the development of robust Kalman filtering techniques based on modeling the noise. The conventional point of view in applying Kalman filters to a signal detection problem is to assign the system state, xk, to the signal to be detected. Under this paradigm, the hypotheses ‘‘signal absent’’ (H0) and ‘‘signal present’’ (H1) are represented by the state itself. When the conventional point of view is applied for this particular application, the demodulated harmonic signal (a DC constant) is the system state to be estimated. Although a model for the desired signal can be developed from a relatively pure harmonic signal obtained in a shielded lab environment, that signal model often differs from the harmonic signal measured in the field, sometimes substantially, because the measured signal may be a function of system and environmental parameters, such as temperature. Therefore the harmonic signal measured in the field may deviate from the assumed signal model and the Kalman filter may produce a poor state estimate due to inaccuracies in the state equation. However, the background noise in the field can be measured, from which a reliable noise model can be developed, even though the measured noise is not exactly the same as the noise corrupting the measurements. The background noise in the postmitigation signal (Equation 7.10), ~v(n), can be modeled as a complex-valued AR process as previously described. The Kalman filter is applied to estimate the background noise in the postmitigation signal and then the background noise estimate is subtracted from the postmitigation signal to indirectly estimate the postmitigation harmonic signal (a DC constant). 
Thus, the state in the Kalman filter, $x_k$, is the background noise $\tilde{v}(n)$,

$$ x_k = \begin{bmatrix} \tilde{v}(k-P+1) \\ \tilde{v}(k-P+2) \\ \vdots \\ \tilde{v}(k) \end{bmatrix} \tag{7.51} $$

and the measurement noise in the observation, $u_k$, is the demodulated harmonic signal $\tilde{s}(n)$,

$$ u_k = \tilde{s}_k \tag{7.52} $$


The other system parameters are

$$ F = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ \tilde{a}_P & \tilde{a}_{P-1} & \tilde{a}_{P-2} & \cdots & \tilde{a}_1 \end{bmatrix} \tag{7.53} $$

$$ G = H = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}_{P \times 1} \tag{7.54} $$

and $w_k = \tilde{\varepsilon}_k$. Therefore, the measurement equation becomes

$$ z_k = \tilde{v}_k + \tilde{s}_k \tag{7.55} $$

where $z_k$ is the measured data corresponding to $\tilde{y}(k)$ in Equation 7.10. A Kalman filter assumes complete a priori knowledge of the process and measurement noise statistics $Q$ and $R$. These statistics, however, are inexactly known in most practical situations. The use of incorrect a priori statistics in the design of a Kalman filter can lead to large estimation errors, or even to a divergence of errors. To reduce or bound these errors, an adaptive filter is employed by modifying or adapting the Kalman filter to the real data. The approaches to adaptive filtering are divided into four categories: Bayesian, maximum likelihood, correlation, and covariance matching [22]. The last technique has been suggested for situations in which $Q$ is known but $R$ is unknown. The covariance matching algorithm ensures that the residuals remain consistent with their theoretical covariance. The residual, or innovation, is defined by

$$ \nu_k = z_k - H^H \hat{x}_{k|k-1} \tag{7.56} $$

which has a theoretical covariance of

$$ E\{\nu_k \nu_k^H\} = H^H \Sigma_{k|k-1} H + R \tag{7.57} $$

If the actual covariance of $\nu_k$ is much larger than the covariance obtained from the Kalman filter, $R$ should be increased to prevent divergence. This has the effect of increasing $\Sigma_{k|k-1}$, thus bringing the actual covariance of $\nu_k$ closer to that given in Equation 7.57. In this case, $R$ is estimated as

$$ \hat{R}_k = \frac{1}{m} \sum_{j=1}^{m} \nu_{k-j} \nu_{k-j}^H - H^H \Sigma_{k|k-1} H \tag{7.58} $$

Here, a two-step adaptive Kalman filter using the covariance matching method is proposed. First, the covariance matching method is applied to estimate $\hat{R}$. Then, the conventional Kalman filter is implemented with $R = \hat{R}$ to estimate the background noise. In this application, there are several measurements of data, with each measurement containing tens to hundreds of segments. For each segment, the covariance matching method is employed to estimate $\hat{R}_k$, where the subscript $k$ denotes the sample index. Since it is an adaptive procedure, only the steady-state values of $\hat{R}_k$ ($k \geq m$) are retained and averaged to find the value of $\hat{R}_s$ for each segment,

$$ \hat{R}_s = \frac{1}{N-m} \sum_{k=m}^{N-1} \hat{R}_k \tag{7.59} $$

FIGURE 7.2 Block diagram of the two-step adaptive Kalman filter strategy. (From Tan et al., IEEE Transactions on Geoscience and Remote Sensing 43(7), 1507-1516, 2005. With permission.)

where the subscript $s$ denotes the segment index. Then the average is taken over all the segments in each measurement. Thus,

$$ \hat{R} = \frac{1}{L} \sum_{s=1}^{L} \hat{R}_s \tag{7.60} $$

is used in the conventional Kalman filter in this two-step process. A block diagram depicting the two-step adaptive Kalman filter strategy is shown in Figure 7.2.
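A simplified scalar sketch of the two-step strategy follows. It treats the state as an AR(1) background noise and the measurement noise power as unknown; all model parameters, window length, and the single-segment setup are illustrative assumptions (the chapter runs step 1 per segment and averages across segments per Equations 7.59 and 7.60).

```python
import numpy as np

def covariance_matching_R(z, F, H, Q, m=50, R_init=1.0):
    """Step 1: scalar Kalman filter that estimates R from its innovations.

    Applies the covariance matching idea of Eq. 7.58 over a sliding window
    of m innovations and retains the steady-state estimates (Eq. 7.59).
    """
    x, S, R = 0.0 + 0j, 1.0, R_init
    nus, R_hats = [], []
    for zk in z:
        x, S = F * x, F * S * np.conj(F) + Q        # predict
        nu = zk - H * x                             # innovation (Eq. 7.56)
        nus.append(nu)
        if len(nus) >= m:
            # Eq. 7.58: windowed innovation power minus the predicted part
            Rk = np.mean(np.abs(nus[-m:]) ** 2) - abs(H) ** 2 * S
            R_hats.append(max(Rk, 1e-12))
            R = R_hats[-1]                          # adapt R
        K = S * np.conj(H) / (abs(H) ** 2 * S + R)  # update
        x = x + K * nu
        S = (1 - K * H) * S
    return float(np.mean(R_hats))

def kalman_scalar(z, F, H, Q, R):
    """Step 2: conventional scalar Kalman filter run with the fixed R-hat."""
    x, S = 0.0 + 0j, 1.0
    xs = []
    for zk in z:
        x, S = F * x, F * S * np.conj(F) + Q
        K = S * np.conj(H) / (abs(H) ** 2 * S + R)
        x = x + K * (zk - H * x)
        S = (1 - K * H) * S
        xs.append(x)
    return np.array(xs)

# Illustrative two-step run on simulated data (all parameters are examples)
rng = np.random.default_rng(5)
F, H, Q, R_true = 0.9, 1.0, 0.1, 0.5
N = 3000
w = np.sqrt(Q / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
u = np.sqrt(R_true / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
x = np.zeros(N, dtype=complex)
for k in range(1, N):
    x[k] = F * x[k - 1] + w[k]
z = H * x + u

R_hat = covariance_matching_R(z, F, H, Q)   # step 1
x_hat = kalman_scalar(z, F, H, Q, R_hat)    # step 2, with R = R_hat
print(R_hat)                                # near R_true
```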

7.5 Application to Landmine Detection via Quadrupole Resonance

Landmines are a form of unexploded ordnance, usually emplaced on or just under the ground, which are designed to explode in the presence of a triggering stimulus such as pressure from a foot or vehicle. Generally, landmines are divided into two categories: antipersonnel mines and antitank mines. Antipersonnel (AP) landmines are devices usually designed to be triggered by a relatively small amount of pressure, typically 40 lbs, and generally contain a small amount of explosive, so that the explosion maims or kills the person who triggers the device. In contrast, antitank (AT) landmines are specifically designed to destroy tanks and vehicles. They explode only if compressed by an object weighing hundreds of pounds. AP landmines are generally small (less than 10 cm in diameter) and are usually more difficult to detect than the larger AT landmines.

7.5.1 Quadrupole Resonance

When compounds with quadrupole moments are excited by a properly designed EMI system, they emit unique signals characteristic of the compound's chemical structure. The signal for a compound consists of a set of spectral lines, where the spectral lines correspond to the QR frequencies for that compound, and every compound has its own set of resonant frequencies. This phenomenon is similar to nuclear magnetic resonance (NMR). Although there are several subtle distinctions between QR and NMR, in this context it is sufficient to view QR as NMR without the external magnetic field [4]. The QR phenomenon is applicable for landmine detection because many explosives, such as RDX, TNT, and PETN, contain nitrogen, and some of nitrogen's isotopes, namely 14N, have electric quadrupole moments. Because of the chemical specificity of QR, the QR frequencies for explosives are unique and are not shared with other nitrogenous materials. In summary, landmine detection using QR is achieved by observing the presence, or absence, of a QR signal after applying a sequence of RF pulses designed to excite the resonant frequency or frequencies for the explosive of interest [4]. RFI presents a problem since the frequencies of the QR response fall within the commercial AM radio band. After the QR response is measured, additional processing is often utilized to reduce the RFI, which is usually a non-Gaussian colored noise process. Adaptive filtering is a common method for cancelling RFI when RFI reference signals are available. The frequency-domain LMS algorithm is an efficient method for extracting the QR signal from the background RFI. Under perfect circumstances, when the RFI measured on the main antenna is completely correlated with the signals measured on the reference antennas, all RFI can be removed by RFI mitigation, leaving only Gaussian noise associated with the QR system.
Since the RFI travels over multiple paths and the antenna system is not perfect, however, the RFI mitigation cannot remove all of the non-Gaussian noise. Consequently, more sophisticated signal processing methods must be employed to estimate the QR signal after the RFI mitigation, and thus improve the QR signal detection. Figure 7.3 shows the basic block diagram for QR signal detection. The data acquired during each excitation pulse are termed a segment, and a group of segments constitutes a measurement. In general, for each potential target there are multiple measurements, with each measurement containing several hundred segments. The measurements are demodulated at the expected QR resonant frequency. Thus, if the demodulated frequency equals the QR resonant frequency, the QR signal after demodulation is a DC constant.

7.5.2

Radio-Frequency Interference

Although QR is a promising technology due to its chemical specificity, it is limited by the inherently weak QR signal and susceptibility to RFI. TNT is one of the most prevalent explosives in landmines, and also one of the most difficult explosives to detect. TNT possesses 18 resonant frequencies, 12 of which are clustered in the range of 700–900 kHz. Consequently, AM radio transmitters strongly interfere with TNT-QR detection in the

Signal and Image Processing for Remote Sensing

[Figure 7.3 block diagram: the measured signal x(n) is mixed with e^(−j2πF₀n), low-pass filtered, passed through RFI mitigation to give y(n), then filtered and applied to a detector that decides between H₀ and H₁.]

FIGURE 7.3 Signal processing block diagram for QR signal detection. F0 is the resonant QR frequency for the explosive of interest and the tilde (~) denotes a complex-valued signal. (From Tan et al., IEEE Transactions on Geoscience and Remote Sensing 43(7), 1507–1516, 2005. With permission.)

field, and are the primary source of RFI. Since there may be several AM radio transmitters operating simultaneously, the baseband (after demodulation) RFI signal measured by the QR system in each segment may be modeled as

\tilde{I}(n) = \sum_{m=1}^{M} \tilde{A}_m(n)\, e^{j(2\pi f_m n + \phi_m)}, \qquad n = 0, \ldots, N-1 \qquad (7.61)

where the superscript ~ denotes a complex value and we assume the frequencies are distinct, meaning f_i ≠ f_j for i ≠ j. For the RFI, Ã_m(n) is the discrete time series of the message signal from an AM transmitter, which may be a nonstationary speech or music signal from a commercial AM radio station. The statistics of this signal, however, can be assumed to remain essentially constant over the short time intervals during which data are collected. For time intervals of this order, it is reasonable to assume Ã_m(n) is constant for each data segment, but may change from segment to segment. Therefore, Equation 7.61 may be expressed as

\tilde{I}(n) = \sum_{m=1}^{M} \tilde{A}_m\, e^{j(2\pi f_m n + \phi_m)}, \qquad n = 0, \ldots, N-1 \qquad (7.62)

This model represents all frequencies even though each of the QR signals exists in a very narrow band. In practice, only the frequencies corresponding to the QR signals need be considered.
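For illustration, the segment-wise RFI model of Equation 7.62 can be simulated as a superposition of complex exponentials whose amplitudes are held constant over the segment; every numeric value below is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 256, 3                          # segment length and number of AM transmitters
f = rng.uniform(-0.4, 0.4, M)          # distinct normalized baseband frequencies f_m
phi = rng.uniform(0.0, 2 * np.pi, M)   # carrier phases phi_m
A = rng.uniform(1.0, 5.0, M)           # message amplitudes A_m, constant within a segment

n = np.arange(N)
# Equation 7.62: I(n) = sum_m A_m * exp(j * (2*pi*f_m*n + phi_m))
rfi = sum(A[m] * np.exp(1j * (2 * np.pi * f[m] * n + phi[m])) for m in range(M))
```

From segment to segment, only the amplitudes (and possibly phases) would be redrawn.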

7.5.3

Postmitigation Signals

The applicability of the postmitigation signal models described previously is demonstrated by examining measured QR and RFI signals.

Kalman Filtering for Weak Signal Detection in Remote Sensing


7.5.3.1 Postmitigation Quadrupole Resonance Signal

An example simulated QR signal before and after RFI mitigation is shown in Figure 7.4. Although the QR signal is a DC constant prior to RFI mitigation, that is no longer true after mitigation. The first-order Gaussian–Markov model introduced previously is an appropriate model for the postmitigation QR signal. The value σ̃_ε² = 0.1 is estimated from the data.

7.5.3.2 Postmitigation Background Noise

Although RFI mitigation can remove most of the RFI, some residual RFI remains in the postmitigation signal. The background noise remaining after RFI mitigation, ṽ(n), consists of the residual RFI and the remaining sensor noise, which is altered by the mitigation. An example power spectrum of ṽ(n) derived from experimental data is shown in Figure 7.5. The power spectrum contains numerous peaks and valleys. Thus, the AR and ARMA models discussed previously are appropriate for modeling the residual background noise.

7.5.4

Kalman Filters for Quadrupole Resonance Detection

7.5.4.1 Conventional Kalman Filter

For the system described by Equation 7.14 and Equation 7.15, the variables in the conventional Kalman filter are x_k = s̃(k), w_k = w̃(k), u_k = ṽ(k), and z_k = ỹ(k), and the coefficient and covariance matrices in the Kalman filter are F = [1], G = [1], H = [1], and Q = [σ_w²]. The covariance R is estimated from the data, and the off-diagonal elements are set to 0.
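With scalar state and measurement (F = G = H = [1]), the conventional Kalman filter reduces to a few lines. The sketch below uses illustrative values of Q, R, and data:

```python
import numpy as np

def scalar_kalman(z, Q, R, x0=0.0, S0=10.0):
    """Scalar Kalman filter with F = G = H = 1 (random-walk state model)."""
    x, S = x0, S0
    estimates = []
    for zk in z:
        # predict (F = 1, G = 1)
        S = S + Q
        # update (H = 1)
        K = S / (S + R)                 # Kalman gain
        x = x + K * (zk - x)
        S = (1 - K) * S
        estimates.append(x)
    return np.array(estimates)

# illustrative: a DC QR amplitude of 2 buried in white noise (hypothetical values)
rng = np.random.default_rng(1)
z = 2.0 + rng.normal(0.0, 1.0, 200)
est = scalar_kalman(z, Q=0.01, R=1.0)
```

After the filter reaches steady state, `est` settles near the true DC amplitude.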

[Figure 7.4: amplitude vs. sample index (5–55) for the real and imaginary parts of the QR signal before and after RFI mitigation.]

FIGURE 7.4 Example realization of the simulated QR signal before and after RFI mitigation. (From Tan et al., IEEE Transactions on Geoscience and Remote Sensing 43(7), 1507–1516, 2005. With permission.)


[Figure 7.5: power spectral density (dB) of the postmitigation background noise vs. normalized frequency, showing numerous peaks and valleys.]

FIGURE 7.5 Power spectrum of the remaining background noise ṽ(n) after RFI mitigation. The x-axis units are normalized frequency ([−π, π]). (From Tan et al., IEEE Transactions on Geoscience and Remote Sensing 43(7), 1507–1516, 2005. With permission.)

7.5.4.2

Kalman Filter with an Autoregressive Model for Colored Noise

The multi-segment Burg algorithms are used to estimate the AR parameters, and the simulation results do not show significant differences among the AVA, AVK, and SBurg algorithms. The AVA algorithm was chosen to estimate the AR coefficients ã_p and error power σ̃_ε² describing the background noise for each measurement. The optimal order, as determined by the AIC, is 6.

7.5.4.3 Kalman Filter for Colored Noise

For this Kalman filter, the coefficient and covariance matrices are the same as for the conventional Kalman filter, with the exception of the observation noise covariance R. In this filter, all elements of R are retained, as opposed to the conventional Kalman filter in which only the diagonal elements are retained.

7.5.4.4 Indirect Signal Estimation

For the adaptive Kalman filter in the first step, all coefficient and covariance matrices, F, G, H, Q, R, and S0, are the same under both H1 and H0. The practical value of x0 is given by

x_0 = [\,\tilde{y}(0) \;\; \tilde{y}(1) \;\; \cdots \;\; \tilde{y}(P-1)\,]^{T} \qquad (7.63)

where P is the order of the AR model representing ṽ(n). For the conventional Kalman filter in the second step, F, G, H, Q, and S0 are the same under both H1 and H0; however, the estimated observation noise covariance, R̂, depends on the postmitigation signal and therefore is different under the two hypotheses.

7.6

Performance Evaluation

The proposed Kalman filter methods are evaluated on experimental data collected in the field by QM. A typical QR signal consists of multiple sinusoids where the amplitude and frequency of each resonant line are the parameters of interest. Usually, the amplitude of only one resonant line is estimated at a time, and the frequency is known a priori. For the baseband signal demodulated at the desired resonant frequency, adaptive RFI mitigation is employed. A 1-tap LMS mitigation algorithm is applied to each frequency component of the frequency-domain representation of the experimental data [23]. First, performance is evaluated for synthetic QR data. In this case, the RFI data are measured experimentally in the absence of an explosive material, and an ideal QR signal (complex-valued DC signal) is injected into the measured RFI data. Second, performance is evaluated for experimental QR data. In this case, the data are measured for both explosive and nonexplosive samples in the presence of RFI. The matched filter is employed for the synthetic QR data, while the energy detector is utilized for the measured QR data.

7.6.1

Detection Algorithms

After an estimate of the QR signal has been obtained, a detection algorithm must be applied to determine whether or not the QR signal is present. For this binary decision problem, both an energy detector and a matched filter are applied. The energy detector simply computes the energy in the QR signal estimate, \hat{\tilde{s}}(n):

E_s = \sum_{n=0}^{N-1} |\hat{\tilde{s}}(n)|^2 \qquad (7.64)

As it is a simple detection algorithm that does not incorporate any prior knowledge of the QR signal characteristics, it is easy to compute. The matched filter computes the detection statistic

\lambda = \mathrm{Re}\left\{ \sum_{n=0}^{N-1} \hat{\tilde{s}}(n)\, \tilde{S}^{*}(n) \right\} \qquad (7.65)

where \tilde{S} is the reference signal, which, in this application, is the known QR signal. The matched filter is optimal only if the reference signal is precisely known a priori. Thus, if there is uncertainty regarding the resonant frequency of the QR signal, the matched filter will no longer be optimal. QR resonant frequency uncertainty may arise due to variations in environmental parameters, such as temperature.
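Both statistics are one-liners over the signal estimate. In the sketch below the reference is conjugated, which is the usual convention for a complex matched filter, and the function names are illustrative:

```python
import numpy as np

def energy_detector(s_hat):
    """Equation 7.64: energy of the estimated QR signal."""
    return float(np.sum(np.abs(s_hat) ** 2))

def matched_filter(s_hat, s_ref):
    """Equation 7.65: real part of the correlation with the reference signal."""
    return float(np.real(np.sum(s_hat * np.conj(s_ref))))
```

The detection statistic is then compared with a threshold to decide between H0 and H1.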

7.6.2

Synthetic Quadrupole Resonance Data

Prior to detection using the matched filter, the QR signal is estimated directly. Three Kalman filter approaches to estimate the QR signal are considered: the conventional Kalman filter, the extended Kalman filter, and the Kalman filter for arbitrary colored noise. Each of these filters requires the initial value of the system state, x0, and the selection of the initial value may affect the estimate of x. The sample mean of the observation ~ y(n) is used to set x0. Since only the steady-state output of the Kalman filter is reliable,
the first 39 points are removed from the data used for detection to ensure the system has reached steady state. Although the measurement noise is known to be colored, the conventional Kalman filter (conKF) designed for white noise is applied to estimate the QR signal. This algorithm has the benefit of being simpler, with a lower computational burden than either of the other two Kalman filters proposed for direct estimation of the QR signal. Although its assumptions regarding the noise structure are recognized as inaccurate, its performance can be used as a benchmark for comparison. For a given process noise covariance Q, the covariance of the initial state x0, S0, affects the speed with which S reaches steady state in the conventional Kalman filter. As Q increases, it takes less time for S to converge; however, the steady-state value of S also increases. In these simulations, S0 = 10 in the conventional Kalman filter. The second approach applied to estimate the QR signal is the Kalman filter with an AR model for the colored noise (extKF). Since R = 0 shuts down the Kalman filter so that it does not track the signal, we set R = 10. It is shown that the detection performance does not decrease when Q increases. Finally, the Kalman filter for colored noise (arbKF) is applied to the synthetic data. Compared to the conventional Kalman filter, the Kalman filter for arbitrary noise has a smaller Kalman gain and, therefore, slower convergence. Consequently, the error covariances for the Kalman filter for arbitrary noise are larger. Since only the steady state is used for detection, S0 = 50 is chosen for Q = 0.1 and S0 = 100 is chosen for Q = 1 and Q = 10. Results for each of these Kalman filtering methods for different values of Q are presented in Figure 7.6. When Q is small (Q = 0.1 and Q = 1), all three methods have similar detection performance.
However, when greater model error is introduced in the state equation (Q = 10), both the conventional Kalman filter and the Kalman filter for colored noise have poorer detection performance than the Kalman filter with an AR model for the noise. Thus, the Kalman filter with an AR model for the noise shows robust performance. Considering computational efficiency, the Kalman filter for colored noise is the poorest because it recursively estimates C and V for each k. In this application, although the measurement noise ṽ_k is colored, the diagonal elements of the covariance matrix dominate. Therefore, the conventional Kalman filter is preferable to the Kalman filter for colored noise.

7.6.3

Measured Quadrupole Resonance Data

Indirect estimation of the QR signal is validated using two groups of real data collected both with and without an explosive present. The two explosives for which measured QR data are collected, denoted Type A and Type B, are among the more common explosives found in landmines. It is well known that the Type B explosive is the more challenging of the two explosives to detect. The first group, denoted Data II, has Type A explosive, and the second group, denoted Data III, has both Type A and Type B explosives. For each data group, a 50–50% training–testing strategy is employed. Thus, 50% of the data are used to estimate the coefficient and covariance matrices, and the remaining 50% of the data are used to test the algorithm. The AVA algorithm is utilized to estimate the AR parameters from the training data. Table 7.1 lists the four training–testing strategies considered. For example, if there are 10 measurements and measurements 1–5 are used for training and measurements 6–10 are used for testing, then this is termed ‘‘first 50% training, last 50% testing.’’ If the training–testing strategy measurements 1, 3, 5, 7, 9 are used for training, and the other measurements for testing, this is termed ‘‘odd 50% training, even 50% testing.’’
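The four strategies of Table 7.1 amount to simple index partitions over the ordered list of measurements; a sketch (the function and strategy names are illustrative):

```python
def split_measurements(measurements, strategy):
    """Partition an ordered list of measurements into (training, testing)."""
    half = len(measurements) // 2
    if strategy == "first/last":
        return measurements[:half], measurements[half:]
    if strategy == "last/first":
        return measurements[half:], measurements[:half]
    if strategy == "odd/even":           # odd-numbered measurements (1, 3, 5, ...) train
        return measurements[0::2], measurements[1::2]
    if strategy == "even/odd":
        return measurements[1::2], measurements[0::2]
    raise ValueError(f"unknown strategy: {strategy}")
```

For 10 measurements, `split_measurements(list(range(1, 11)), "odd/even")` trains on measurements 1, 3, 5, 7, 9 and tests on the rest.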

[Figure 7.6: three panels of ROC curves (probability of detection vs. probability of false alarm), one each for Q = 0.1, Q = 1, and Q = 10, comparing Pre KF, Post conKF, Post arbKF, and Post extKF.]

FIGURE 7.6 Direct estimation of QR signal tested on Data I (synthetic data) with different noise model variance Q. (From Tan et al., IEEE Transactions on Geoscience and Remote Sensing 43(7), 1507–1516, 2005. With permission.)

Figure 7.7 and Figure 7.8 present the performance of indirect QR signal estimation followed by an energy detector. The data are measured for both explosive and nonexplosive samples in the presence of RFI. The two different explosive compounds are referred to as Type A explosive and Type B explosive. Data II and Data III-1 are Type A explosive and Data III-2 is Type B explosive. Indirect QR signal estimation provides almost perfect

TABLE 7.1
Training–testing strategies for Data II and Data III

Training      Testing
First 50%     Last 50%
Last 50%      First 50%
Odd 50%       Even 50%
Even 50%      Odd 50%


[Figure 7.7: ROC curves for Data II-1, Data II-2, and Data II-3, comparing the two-step Kalman filter under the four training–testing strategies with the pre-KF performance.]

FIGURE 7.7 Indirect estimation of QR signal tested on Data II (true data). Four different training–testing strategies are plotted together. (From Tan et al., IEEE Transactions on Geoscience and Remote Sensing 43(7), 1507–1516, 2005. With permission.)

detection for the Type A explosive. Although detection is not near-perfect for the Type B explosive, the detection performance following the application of the Kalman filter is better than the performance prior to applying the Kalman filter.

7.7

Summary

The detectability of weak signals in remote sensing applications can be hindered by the presence of interference signals. In situations where it is not possible to record the measurement without the interference, adaptive filtering is an appropriate method to mitigate the interference in the measured signal. Adaptive filtering, however, may not remove all the interference from the measured signal if the reference signals are not

[Figure 7.8: ROC curves for Data III-1 and Data III-2, comparing the two-step Kalman filter under the four training–testing strategies with the pre-KF performance.]

FIGURE 7.8 Indirect estimation of QR signal tested on Data III (true data). Four different training–testing strategies are plotted together. (From Tan et al., IEEE Transactions on Geoscience and Remote Sensing 43(7), 1507–1516, 2005. With permission.)

completely correlated with the primary measured signal. One approach for subsequent processing to detect the signal of interest when residual interference remains after adaptive noise cancellation is Kalman filtering. An accurate signal model is necessary for Kalman filters to perform well; even small deviations from the model may cause very poor performance. The harmonic signal of interest may be sensitive to the external environment, which may then restrict the signal model accuracy. To overcome this limitation, an adaptive two-step algorithm, employing Kalman filters, is proposed to estimate the signal of interest indirectly. The utility of this approach is illustrated by applying it to QR signal estimation for landmine detection. QR technology provides promising explosive detection capability because it can detect the ‘‘fingerprint’’ of explosives. In applications such as humanitarian demining, QR has proven to be highly effective if the QR sensor is not exposed to RFI. Although adaptive RFI mitigation removes most of the RFI, additional signal processing algorithms applied to the postmitigation signal are still necessary to improve landmine detection. Indirect signal estimation is compared to direct signal estimation using Kalman filters and is shown to be more effective. The results of this study indicate that indirect QR signal estimation provides robust detection performance.

References

1. B.D.O. Anderson and J.B. Moore, Optimal Filtering, Prentice-Hall, Englewood Cliffs, NJ, 1979.
2. S.J. Norton and I.J. Won, Identification of buried unexploded ordnance from broadband electromagnetic induction data, IEEE Transactions on Geoscience and Remote Sensing, 39(10), 2253–2261, 2001.
3. K. O'Neill, Discrimination of UXO in soil using broadband polarimetric GPR backscatter, IEEE Transactions on Geoscience and Remote Sensing, 39(2), 356–367, 2001.
4. N. Garroway, M.L. Buess, J.B. Miller, B.H. Suits, A.D. Hibbs, G.A. Barrall, R. Matthews, and L.J. Burnett, Remote sensing by nuclear quadrupole resonance, IEEE Transactions on Geoscience and Remote Sensing, 39(6), 1108–1118, 2001.
5. J.A.S. Smith, Nuclear quadrupole resonance spectroscopy, general principles, Journal of Chemical Education, 48, 39–49, 1971.
6. S. Haykin, Adaptive Filter Theory, Prentice-Hall, Englewood Cliffs, NJ, 1991.
7. C.F. Cowan and P.M. Grant, Adaptive Filters, Prentice-Hall, Englewood Cliffs, NJ, 1985.
8. G.E.P. Box and G.M. Jenkins, Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco, CA, 1970.
9. B. Picinbono and P. Bondon, Second-order statistics of complex signals, IEEE Transactions on Signal Processing, SP-45(2), 411–420, 1997.
10. P.M.T. Broersen, The ABC of autoregressive order selection criteria, Proceedings of SYSID SICE, 231–236, 1997.
11. S. Haykin, B.W. Currie, and S.B. Kesler, Maximum-entropy spectral analysis of radar clutter, Proceedings of the IEEE, 70(9), 953–962, 1982.
12. A.A. Beex and M.D.A. Raham, On averaging Burg spectral estimators for segments, IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-34, 1473–1484, 1986.
13. D. Tjostheim and J. Paulsen, Bias of some commonly-used time series estimates, Biometrika, 70(2), 389–399, 1983.
14. S. de Waele and P.M.T. Broersen, The Burg algorithm for segments, IEEE Transactions on Signal Processing, SP-48, 2876–2880, 2000.
15. H. Akaike, A new look at the statistical model identification, IEEE Transactions on Automatic Control, AC-19(6), 716–723, 1974.
16. M. Wax and T. Kailath, Detection of signals by information theoretic criteria, IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-33(2), 387–392, 1985.
17. Y. Tan, S.L. Tantum, and L.M. Collins, Kalman filtering for enhanced landmine detection using quadrupole resonance, IEEE Transactions on Geoscience and Remote Sensing, 43(7), 1507–1516, 2005.
18. J.D. Gibson, B. Koo, and S.D. Gray, Filtering of colored noise for speech enhancement and coding, IEEE Transactions on Signal Processing, 39(8), 1732–1742, 1991.
19. A.P. Sage and J.L. Melsa, Estimation Theory with Applications to Communications and Control, McGraw-Hill, New York, 1971.
20. P.S. Maybeck, Stochastic Models, Estimation, and Control, Academic Press, New York, 1979.
21. X.R. Li, C. Han, and J. Wang, Discrete-time linear filtering in arbitrary noise, Proceedings of the 39th IEEE Conference on Decision and Control, 1212–1217, 2000.
22. R.K. Mehra, Approaches to adaptive filtering, IEEE Transactions on Automatic Control, AC-17, 693–698, 1972.
23. Y. Tan, S.L. Tantum, and L.M. Collins, Landmine detection with nuclear quadrupole resonance, IGARSS 2002 Proceedings, 1575–1578, July 2002.

8 Relating Time-Series of Meteorological and Remote Sensing Indices to Monitor Vegetation Moisture Dynamics

J. Verbesselt, P. Jönsson, S. Lhermitte, I. Jonckheere, J. van Aardt, and P. Coppin

CONTENTS
8.1 Introduction
8.2 Data
8.2.1 Study Area
8.2.2 Climate Data
8.2.3 Remote Sensing Data
8.3 Serial Correlation and Time-Series Analysis
8.3.1 Recognizing Serial Correlation
8.3.2 Cross-Correlation Analysis
8.3.3 Time-Series Analysis: Relating Time-Series and Autoregression
8.4 Methodology
8.4.1 Data Smoothing
8.4.2 Extracting Seasonal Metrics from Time-Series and Statistical Analysis
8.5 Results and Discussion
8.5.1 Temporal Analysis of the Seasonal Metrics
8.5.2 Regression Analysis Based on Values of Extracted Seasonal Metrics
8.5.3 Time-Series Analysis Techniques
8.6 Conclusions
Acknowledgments
References

8.1

Introduction

The repeated occurrence of severe wildfires, which affect various fire-prone ecosystems of the world, has highlighted the need to develop effective tools for monitoring fire-related parameters. Vegetation water content (VWC), which influences the biomass burning processes, is an example of one such parameter [1–3]. The physical definitions of VWC vary from water volume per leaf or ground area (equivalent water thickness) to water mass per mass of vegetation [4]. Therefore, VWC could also be used to infer vegetation water stress and to assess drought conditions that are linked with fire risk [5]. Decreases in VWC due to the seasonal decrease in available soil moisture can induce severe fires in
most ecosystems. VWC is particularly important for determining the behavior of fires in savanna ecosystems because the herbaceous layer becomes especially flammable during the dry season when the VWC is low [6,7]. Typically, VWC in savanna ecosystems is measured using labor-intensive vegetation sampling. Several studies, however, indicated that VWC can be characterized temporally and spatially using meteorological or remote sensing data, which could contribute to the monitoring of fire risk [1,4]. The meteorological Keetch–Byram drought index (KBDI) was selected for this study. This index was developed to incorporate soil water content in the root zone of vegetation and is able to assess the seasonal trend of VWC [3,8]. The KBDI is a cumulative algorithm for the estimation of fire potential from meteorological information, including daily maximum temperature, daily total precipitation, and mean annual precipitation [9,10]. The KBDI also has been used for the assessment of VWC for vegetation types with shallow rooting systems, for example, the herbaceous layer of the savanna ecosystem [8,11]. The application of drought indices, however, presents specific operational challenges. These challenges are due to the lack of meteorological data for certain areas, as well as spatial interpolation techniques that are not always suitable for use in areas with complex terrain features. Satellite data provide sound alternatives to meteorological indices in this context. Remotely sensed data have significant potential for monitoring vegetation dynamics at regional to global scale, given the synoptic coverage and repeated temporal sampling of satellite observations (e.g., SPOT VEGETATION or NOAA AVHRR) [12,13]. These data have the advantage of providing information on remote areas where ground measurements are impossible to obtain on a regular basis. 
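As a rough sketch of how such a cumulative index evolves, one daily update in the commonly published Keetch–Byram formulation (temperature in °F, rainfall in inches, index on a 0–800 scale) might look as follows. The rainfall bookkeeping is simplified relative to the full algorithm (which tracks a 0.20-inch threshold over consecutive rain days), and the function name is illustrative:

```python
import math

def kbdi_step(kbdi, t_max_f, net_rain_in, annual_rain_in):
    """One daily KBDI update (simplified sketch of the Keetch-Byram 1968 form)."""
    # net rainfall reduces the moisture deficiency (100 index points per inch)
    kbdi = max(0.0, kbdi - 100.0 * net_rain_in)
    # evapotranspiration-driven drought increment
    dq = ((800.0 - kbdi)
          * (0.968 * math.exp(0.0486 * t_max_f) - 8.30) * 1e-3
          / (1.0 + 10.88 * math.exp(-0.0441 * annual_rain_in)))
    return min(800.0, kbdi + max(0.0, dq))
```

A hot, dry day raises the index, while rainfall pulls it back toward zero (field capacity).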
Most research in the scientific community using optical sensors (e.g., SPOT VEGETATION) to study biomass burning has focused on two areas [4]: (1) the direct estimation of VWC and (2) the estimation of chlorophyll content or degree of drying as an alternative to the estimation of VWC. Chlorophyll-related indices are related to VWC based on the hypothesis that the chlorophyll content of leaves decreases proportionally to the VWC [4]. This assumption has been confirmed for selected species with shallow rooting systems (e.g., grasslands and understory forest vegetation) [14–16], but cannot be generalized to all ecosystems [4]. Therefore, chlorophyll-related indices, such as the normalized difference vegetation index (NDVI), only can be used in regions where the relationship among chlorophyll content, degree of curing, and water content has been established. Accordingly, a remote sensing index that is directly coupled to the VWC is used to investigate the potential of hyper-temporal satellite imagery to monitor the seasonal vegetation moisture dynamics. Several studies [4,16–18] have demonstrated that VWC can be estimated directly through the normalized difference of the near infrared reflectance (NIR, 0.78–0.89 μm) ρNIR, influenced by the internal structure and the dry matter, and the shortwave infrared reflectance (SWIR, 1.58–1.75 μm) ρSWIR, influenced by plant tissue water content:

\mathrm{NDWI} = \frac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{SWIR}}}{\rho_{\mathrm{NIR}} + \rho_{\mathrm{SWIR}}} \qquad (8.1)
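Equation 8.1 maps directly to code; a minimal sketch with a hypothetical function name:

```python
import numpy as np

def ndwi(nir, swir):
    """Equation 8.1: normalized difference of NIR and SWIR reflectances."""
    nir = np.asarray(nir, dtype=float)
    swir = np.asarray(swir, dtype=float)
    return (nir - swir) / (nir + swir)
```

Higher plant water content lowers SWIR reflectance and thus raises NDWI.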

The NDWI or normalized difference infrared index (NDII) [19] is similar to the global vegetation moisture index (GVMI) [20]. The relationship between NDWI and KBDI time-series, both related to VWC dynamics, is explored. Although the value of time-series data for monitoring vegetation moisture dynamics has been firmly established [21], only a few studies have taken serial correlation into account when correlating time-series [6,22–25]. Serial correlation occurs when data collected through time contain values at time t, which are correlated with observations at

time t − 1. This type of correlation in time-series, when related to VWC dynamics, is mainly caused by the seasonal variation (dry–wet cycle) of vegetation [26]. Serial correlation can be used to forecast future values of the time-series by modeling the dependence between observations but affects correlations between variables measured in time and violates the basic regression assumption of independence [22]. Correlation coefficients of serially correlated data cannot be used as indicators of goodness-of-fit of a model as the correlation coefficients are artificially inflated [22,27]. The study of the relationship between NDWI and KBDI is a nontrivial task due to the effect of serial correlation. Remedies for serial correlation include sampling or aggregating the data over longer time intervals, as well as further modeling, which can include techniques such as weighted regression [25,28]. However, it is difficult to account for serial correlation in time-series related to VWC dynamics using extended regression techniques. The time-series related to VWC dynamics often exhibit high non-Gaussian serial correlation and are more significantly affected by outliers and measurement errors [28]. A sampling technique therefore is proposed, which accounts for serial correlation in seasonal time-series, to study the relationship between different time-series. The serial correlation effect in time-series is assumed to be minimal when extracting one metric per season (e.g., start of the dry season). The extracted seasonal metrics are then utilized to study the relationship between time-series at a specific moment in time (e.g., start of the dry season).
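Serial correlation of this kind is easy to diagnose with the sample lag-1 autocorrelation; the sketch below (with illustrative data) shows that a smooth seasonal cycle is strongly autocorrelated at lag one:

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a time series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

# a smooth dry-wet seasonal cycle (one value per day over a year)
t = np.arange(360)
season = np.sin(2 * np.pi * t / 360)
assert lag1_autocorr(season) > 0.9
```

Values near +1 indicate that regression residuals will violate the independence assumption, motivating the one-metric-per-season sampling used here.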
The aim of this chapter is to address the effect of serial correlation when studying the relationship between remote sensing and meteorological time-series related to VWC by comparing nonserially correlated seasonal metrics from time-series. This chapter therefore has three defined objectives. Firstly, an overview of time-series analysis techniques and concepts (e.g., stationarity, autocorrelation, ARIMA, etc.) is presented and the relationship between time-series is studied using cross-correlation and ordinary least square (OLS) regression analysis. Secondly, an algorithm for the extraction of seasonal metrics is optimized for satellite and meteorological time-series. Finally, the temporal occurrence and values of the extracted nonserially correlated seasonal metrics are analyzed statistically to define the quantitative relationship between NDWI and KBDI time-series. The influence of serial correlation is illustrated by comparing results from cross-correlation and OLS analysis with the results from the investigation of correlation between extracted metrics.

8.2

Data

8.2.1

Study Area

The Kruger National Park (KNP), located between latitudes 23°S and 26°S and longitudes 30°E and 32°E in the low-lying savanna of the northeastern part of South Africa, was selected for this study (Figure 8.1). Elevations range from 260 to 839 m above sea level, and mean annual rainfall varies between 350 mm in the north and 750 mm in the south. The rainy season within the annual climatic season can be confined to the summer months (i.e., November to April), and over a longer period can be defined by alternating wet and dry seasons [7]. The KNP is characterized by an arid savanna dominated by thorny, fine-leafed trees of the families Mimosaceae and Burseraceae. An exception is the northern part of the KNP where the Mopane, a broad-leafed tree belonging to the Caesalpiniaceae, almost completely dominates the tree layer.

Signal and Image Processing for Remote Sensing

FIGURE 8.1 The Kruger National Park (KNP) study area with the weather stations used in the analysis (right). South Africa is shown with the borders of the provinces and the study area (top left).

8.2.2 Climate Data

Climate data from six weather stations in the KNP with similar vegetation types were used to estimate the daily KBDI (Figure 8.1). KBDI was derived from daily precipitation and maximum temperature data to estimate the net effect on the soil water balance [3]. Assumptions in the derivation of KBDI include a soil water capacity of approximately 20 cm and an exponential moisture loss from the soil reservoir. KBDI was initialized during periods of rainfall events (e.g., the rainy season) that result in soils at maximum field capacity and KBDI values of zero [8]. The preprocessing of KBDI was done using the method developed by Janis et al. [10]. Missing daily maximum temperatures were replaced with interpolated values of daily maximum temperatures, based on a linear interpolation function [30]. Missing daily precipitation, on the other hand, was assumed to be zero. A series of error logs was automatically generated to indicate missing precipitation values and the associated estimated daily KBDI values, because zeroing missing precipitation may lead to an increased fire potential bias in KBDI. The total percentage of missing data in the rainfall and temperature series was at most 5% during the study period for each of the six weather stations. The daily KBDI time-series were transformed into 10-daily KBDI series, similar to the SPOT VEGETATION S10 dekads (i.e., 10-day periods), by taking the maximum of each dekad. The negative of the KBDI time-series (i.e., –KBDI) was analyzed in this chapter such that the temporal dynamics of KBDI and NDWI were related (Figure 8.2). The –KBDI and NDWI are used throughout this chapter. The Satara weather station, centrally positioned in the study area, was selected to represent the temporal vegetation dynamics. The other weather stations in the study area demonstrate similar temporal vegetation dynamics.

FIGURE 8.2 The temporal relationship between NDWI and KBDI time-series for the "Satara" weather station (Figure 8.1).
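The dekadal aggregation described above can be sketched in code as follows. This is a minimal illustration with synthetic numbers, not the chapter's station data, and it assumes fixed 10-day blocks rather than true calendar dekads:

```python
import numpy as np

def daily_to_dekad_max(daily, dekad_len=10):
    """Aggregate a daily series into dekads (10-day periods) by taking
    the maximum value within each dekad; a trailing incomplete dekad
    is dropped."""
    n = (len(daily) // dekad_len) * dekad_len
    blocks = np.asarray(daily[:n], dtype=float).reshape(-1, dekad_len)
    return blocks.max(axis=1)

# Example: 30 days of synthetic KBDI values -> 3 dekadal maxima,
# negated so the series co-varies with NDWI as in Figure 8.2.
kbdi_daily = np.concatenate([np.linspace(0, 90, 10),
                             np.linspace(90, 40, 10),
                             np.full(10, 55.0)])
kbdi_dekad = daily_to_dekad_max(kbdi_daily)
neg_kbdi = -kbdi_dekad
```

Negating the dekadal maxima yields the –KBDI series analyzed throughout the chapter.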

8.2.3 Remote Sensing Data

The data set used is composed of 10-daily SPOT VEGETATION (SPOT VGT) composites (S10 NDVI maximum value syntheses) acquired over the study area for the period April 1998 to December 2002. SPOT VGT can provide local to global coverage on a regular basis (e.g., daily for SPOT VGT). The syntheses result in surface reflectance in the blue (0.43–0.47 µm), red (0.61–0.68 µm), NIR (0.78–0.89 µm), and SWIR (1.58–1.75 µm) spectral regions. Images were atmospherically corrected using the simplified method for atmospheric correction (SMAC) [30]. The geometrically and radiometrically corrected S10 images have a spatial resolution of 1 km. The S10 SPOT VGT time-series were preprocessed to detect data that erroneously influence the subsequent fitting of functions to time-series, which is necessary to define and extract metrics [6]. The image preprocessing procedures performed were:

• Data points with a satellite viewing zenith angle (VZA) above 50° were masked out, as pixels located at the very edge of the image swath (VZA > 50.5°) are affected by re-sampling methods that yield erroneous spectral values.

• The aberrant SWIR detectors of the SPOT VGT sensor, flagged by the status mask of the SPOT VGT S10 synthesis, also were masked out. A data point was classified as cloud-free if the blue reflectance was less than 0.07 [31]. The developed threshold approach was applied to identify cloud-free pixels for the study area.

• NDWI time-series were derived by selecting savanna pixels, based on the land cover map of South Africa [32], for a 3 × 3 pixel window centered at each of the meteorological stations to reduce the effect of potential spatial misregistration (Figure 8.1). Median values of the 9-pixel windows were then retained instead of single pixel values [33]. The median was preferred to average values as it is less affected by extreme values and therefore is less sensitive to potentially undetected data errors.
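The window-based preprocessing can be illustrated with a small sketch. The thresholds follow the text, but the function and its hypothetical reflectance inputs are simplifications, not the operational SPOT VGT processing chain:

```python
import numpy as np

def window_ndwi_median(nir, swir, blue, vza, savanna):
    """Median NDWI over a 3x3 pixel window, keeping only savanna pixels
    that pass the masks from the text: VZA <= 50 deg and blue
    reflectance < 0.07 (cloud-free)."""
    valid = (vza <= 50.0) & (blue < 0.07) & savanna
    if not valid.any():
        return np.nan  # flag the dekad as missing
    ndwi = (nir - swir) / (nir + swir)  # NDWI from the NIR and SWIR bands
    return float(np.median(ndwi[valid]))

# Hypothetical 3x3 window: one cloudy pixel (blue = 0.10) is excluded.
nir = np.full((3, 3), 0.30)
swir = np.full((3, 3), 0.20)
blue = np.full((3, 3), 0.05); blue[0, 0] = 0.10
vza = np.full((3, 3), 30.0)
savanna = np.ones((3, 3), dtype=bool)
med = window_ndwi_median(nir, swir, blue, vza, savanna)
```

The median over the valid pixels, rather than the mean, reproduces the robustness argument made above.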

8.3 Serial Correlation and Time-Series Analysis

Serial correlation affects correlations between variables measured in time, and violates the basic regression assumption of independence. Techniques that are used to recognize serial correlation therefore are discussed by applying them to the NDWI and –KBDI time-series. Cross-correlation analysis is illustrated and used to study the relationship between time-series of –KBDI and NDWI. Fundamental time-series analysis concepts (e.g., stationarity and seasonality) are introduced, and a brief overview is presented of the most frequently used method for time-series analysis to account for serial correlation, namely, autoregression (AR).

8.3.1 Recognizing Serial Correlation

This chapter focuses on discrete time-series, which contain observations made at discrete time intervals (e.g., the 10-daily time steps of the –KBDI and NDWI time-series). Time-series are defined as a set of observations, xt, recorded at a specific time, t [26]. Time-series of –KBDI and NDWI contain a seasonal variation, which is illustrated in Figure 8.2 by a smooth increase or decrease of the series related to vegetation moisture dynamics. The gradual increase or decrease of the graph of a time-series is generally indicative of the existence of a form of dependence or serial correlation among observations. The presence of serial correlation systematically biases regression analysis when studying the relationship between two or more time-series [25]. Consider the OLS regression line with a slope and an intercept:

Y(t) = a0 + a1 X(t) + e(t)    (8.2)

where t is time, a0 and a1 are the OLS regression intercept and slope parameters, Y(t) the dependent variable, X(t) the independent variable, and e(t) the random error term. The standard error (SE) of each parameter is required for any regression model to define the confidence interval (CI) and derive the significance of parameters in the regression equation. The parameters a0 and a1 and the CIs, estimated by minimizing the sum of the squared residuals, are valid only if certain assumptions related to the regression and e(t) are met [25]. These assumptions are detailed in statistical textbooks [34] but are not always met or explicitly considered in real-world applications. Figure 8.3 illustrates the biased CIs of the OLS regression model at a 95% confidence level. The SE term of the regression model is underestimated due to serially correlated residuals, which explains the biased confidence interval, where CI = mean ± 1.96 SE. The Gauss–Markov theorem states that the OLS parameter estimate is the best linear unbiased estimate (BLUE); that is, all other linear unbiased estimates will have a larger variance, if the error term, e(t), is stationary and exhibits no serial correlation. The Gauss–Markov theorem consequently points to the error term, and not to the time-series themselves, as the critical consideration [35]. The error term is defined as stationary when it does not present a trend and the variance remains constant over time [27].

FIGURE 8.3 Result of the OLS regression fit between –KBDI and NDWI as dependent and independent variables, respectively, for the Satara weather station (n = 157). Confidence intervals (- - -) at a 95% confidence level are shown, but are "narrowed" due to serial correlation in the residuals.

It is possible that the residuals are serially correlated if one of the dependent or independent variables also is serially correlated, because the residuals constitute a linear combination of both types of variables. Both the dependent and independent variables of the regression model are serially correlated (–KBDI and NDWI), which explains the serial correlation observed in the residuals. A sound practice used to verify serial correlation in time-series is to perform multiple checks by both graphical and diagnostic techniques. The autocorrelation function (ACF) can be viewed as a graphical measure of serial correlation between variables or residuals. The sample ACF is defined when x1, . . . , xn are observations of a time-series. The sample mean of x1, . . . , xn is [26]:

\bar{x} = \frac{1}{n} \sum_{t=1}^{n} x_t    (8.3)

The sample autocovariance function with lag h and time t is

\hat{\gamma}(h) = n^{-1} \sum_{t=1}^{n-|h|} \left( x_{t+|h|} - \bar{x} \right) \left( x_t - \bar{x} \right), \quad -n < h < n    (8.4)

The sample ACF is

\hat{\rho}(h) = \frac{\hat{\gamma}(h)}{\hat{\gamma}(0)}, \quad -n < h < n    (8.5)

Figure 8.4 illustrates the ACF for the time-series of –KBDI and NDWI from the Kruger park data. The ACF clearly indicates significant autocorrelation in the time-series, as more than 5% of the sample autocorrelations fall outside the significance bounds ±1.96/√n [26]. There are also formal tests available to detect autocorrelation, such as the Ljung–Box test statistic and the Durbin–Watson statistic [25,26].

FIGURE 8.4 The autocorrelation function (ACF) for (a) –KBDI and (b) NDWI time-series for the Satara weather station. The horizontal lines on the graphs are the bounds ±1.96/√n (n = 157).
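Equation 8.3 through Equation 8.5 translate directly into code. The sketch below uses a synthetic seasonal series (36 dekads per year, standing in for the station data) and also computes the ±1.96/√n significance bounds shown in Figure 8.4:

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation function following Eq. 8.3-8.5:
    rho_hat(h) = gamma_hat(h) / gamma_hat(0), using the biased (1/n)
    autocovariance estimator."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()                                  # Eq. 8.3
    d = x - xbar
    gamma0 = (d * d).sum() / n                       # gamma_hat(0)
    acf = np.empty(max_lag + 1)
    for h in range(max_lag + 1):
        acf[h] = (d[h:] * d[:n - h]).sum() / n / gamma0  # Eq. 8.4-8.5
    return acf

# A strongly seasonal series (like dekadal NDWI) shows slowly decaying,
# significant autocorrelations relative to the +/-1.96/sqrt(n) bounds.
t = np.arange(157)
x = np.sin(2 * np.pi * t / 36)      # 36 dekads ~ one annual cycle
acf = sample_acf(x, 10)
bound = 1.96 / np.sqrt(len(x))
```

For a seasonal series such as this, the low-lag autocorrelations fall far outside the bounds, exactly the diagnostic pattern described above.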

8.3.2 Cross-Correlation Analysis

The cross-correlation function (CCF) can be derived between two time-series utilizing a technique similar to the ACF applied to one time-series [27]. Cross-correlation is a measure of the degree of linear relationship existing between two data sets and can be used to study the connection between time-series. The CCF, however, can only be used if the time-series are stationary [27]. For example, when all variables are increasing in value over time, cross-correlation results will be spurious and subsequently cannot be used to study the relationship between time-series. Nonstationary time-series can be transformed to stationary time-series by implementing one of the following techniques:

• Differencing the time-series by a period d can yield a series that satisfies the assumption of stationarity (e.g., xt − xt−1 for d = 1). The differenced series will contain one point less than the original series. Although a time-series can be differenced more than once, one difference is usually sufficient.

• Lower order polynomials can be fitted to the series when the data contain a trend or seasonality that needs to be subtracted from the original series. Seasonal time-series can be represented as the sum of a specified trend, and seasonal and random terms. For statistical interpretation of results, it is important to recognize the presence of seasonal components and remove them to avoid confusion with long-term trends. Figure 8.5 illustrates the seasonal trend decomposition method using locally weighted regression for the NDWI time-series [36].

• The logarithm or square root of the series may stabilize the variance in the case of a nonconstant variance.
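A decomposition in the spirit of Figure 8.5 can be sketched as follows. Note that this is a simplified classical additive decomposition (moving-average trend, per-phase seasonal means), standing in for the LOESS-based STL method the chapter actually uses:

```python
import numpy as np

def classical_decompose(x, period):
    """Additive decomposition x = trend + seasonal + remainder.
    A sketch only: the LOESS smoothers of STL are replaced by a
    centered moving average and per-phase (dekad-of-year) means."""
    x = np.asarray(x, dtype=float)
    # Trend: moving average over one full season (edge effects ignored).
    kernel = np.ones(period) / period
    trend = np.convolve(x, kernel, mode="same")
    detrended = x - trend
    # Seasonal component: mean of each phase, repeated over the series.
    phase_means = np.array([detrended[p::period].mean() for p in range(period)])
    seasonal = np.tile(phase_means, len(x) // period + 1)[:len(x)]
    seasonal -= seasonal.mean()          # center so the trend keeps the level
    remainder = x - trend - seasonal
    return trend, seasonal, remainder

# Synthetic NDWI-like series: weak trend plus an annual (36-dekad) cycle.
t = np.arange(180)
x = 0.0005 * t + 0.2 * np.sin(2 * np.pi * t / 36)
trend, seasonal, remainder = classical_decompose(x, period=36)
```

As in Figure 8.5, the original series is recovered exactly by summing the three components.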


Figure 8.6 illustrates the cross-correlation plot for the stationary series of –KBDI and NDWI. The –KBDI and NDWI time-series became stationary after differencing with d = 1. The stationarity was confirmed using the augmented Dickey–Fuller test for stationarity [26,29] at a confidence level of 95% (p < 0.01; with stationarity as the alternative hypothesis). Note that approximate 95% confidence limits are shown for the autocorrelation plots of an independent series. These limits must be regarded with caution, since there exists an a priori expectation of serial correlation for time-series [37].

FIGURE 8.5 The results of the seasonal trend decomposition (STL) technique for the NDWI time-series of the Satara weather station. The original series can be reconstructed by summing the seasonal, trend, and remainder components. The y-axes indicate NDWI values. The gray bars at the right-hand side of the plots illustrate the relative data range of the time-series.
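The differencing-plus-cross-correlation workflow can be sketched as below, on synthetic series with a known three-dekad lag. The sign convention matches the text: a peak at a negative lag means the first series (–KBDI) leads the second (NDWI):

```python
import numpy as np

def cross_correlation(x, y, max_lag):
    """Sample CCF between two (stationary) series. ccf[i] estimates
    corr(x[t+k], y[t]) for lag k = lags[i]; with x = -KBDI and
    y = NDWI, a peak at negative k means -KBDI leads NDWI."""
    x = np.asarray(x, float); x = x - x.mean()
    y = np.asarray(y, float); y = y - y.mean()
    n = len(x)
    denom = n * x.std() * y.std()
    lags = np.arange(-max_lag, max_lag + 1)
    ccf = np.empty(len(lags))
    for i, k in enumerate(lags):
        if k >= 0:
            ccf[i] = np.sum(x[k:] * y[:n - k]) / denom
        else:
            ccf[i] = np.sum(x[:n + k] * y[-k:]) / denom
    return lags, ccf

# Synthetic seasonal series where NDWI lags -KBDI by 3 dekads; both are
# differenced once (d = 1) to make them stationary before the CCF.
t = np.arange(157)
kbdi = np.sin(2 * np.pi * t / 36)          # stands in for -KBDI
ndwi = np.sin(2 * np.pi * (t - 3) / 36)    # same cycle, 3 dekads later
lags, ccf = cross_correlation(np.diff(kbdi), np.diff(ndwi), 5)
```

The CCF peaks at lag −3, recovering the built-in lead of the meteorological series over the satellite series.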

FIGURE 8.6 The cross-correlation plot between stationary –KBDI and NDWI time-series of the Satara weather station, where CCF indicates results of the cross-correlation function. The horizontal lines on the graph are the bounds (±1.96/√n) of the approximate 95% confidence interval.

Table 8.1 lists the coefficients of determination (i.e., multiple R2) of the OLS regression analysis, with serially correlated residuals, between –KBDI as dependent and NDWI as independent variable for all six weather stations in the study area. The Durbin–Watson statistic indicated that the residuals were serially correlated at a 95% confidence level (p < 0.01). These results will be compared with the method presented in Section 8.5. Table 8.1 also indicates the time lags at which the correlation between the time-series was maximal, as derived from the cross-correlation plot. A negative lag indicates that –KBDI reacts prior to NDWI, for example, in the cases of the Punda Maria and Shingwedzi weather stations, and subsequently can be used to predict NDWI. This is logical since weather conditions, for example, rainfall and temperature, change before vegetation reacts. NDWI, which is related to the amount of water in the vegetation, consequently lags behind –KBDI. The major vegetation type in savanna vegetation is the herbaceous layer, which has a shallow rooting system. This explains why the vegetation in the study area quickly follows climatic changes and NDWI did not lag behind –KBDI for the other four weather stations.
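The regression diagnostics behind Table 8.1 can be sketched as follows, on synthetic co-varying seasonal series rather than the station data; the Durbin–Watson statistic is computed directly from the residuals:

```python
import numpy as np

def ols_with_durbin_watson(x, y):
    """OLS fit y = a0 + a1*x, returning (coefficients, R^2, DW).
    Durbin-Watson values well below 2 indicate positive serial
    correlation in the residuals, as found for the station regressions."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
    return coef, r2, dw

# Two co-varying seasonal series: the regression fits well (high R^2),
# but the residuals remain smooth and serially correlated (DW << 2).
t = np.arange(157)
ndwi = 0.2 * np.sin(2 * np.pi * t / 36)
kbdi = 300 * np.sin(2 * np.pi * t / 36) + 20 * np.sin(2 * np.pi * t / 9)
coef, r2, dw = ols_with_durbin_watson(ndwi, kbdi)
```

A high R2 together with a Durbin–Watson statistic far below 2 is precisely the situation the chapter warns about: the fit looks strong, but its confidence intervals cannot be trusted.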

TABLE 8.1 Coefficients of Determination of the OLS Regression Model between –KBDI and NDWI (n = 157)

Station        R2     Time Lag
Punda Maria    0.74    −1
Letaba         0.88     0
Onder Sabie    0.72     0
Pretoriuskop   0.31     0
Shingwedzi     0.72    −1
Satara         0.81     0

Note: The time lag, expressed in dekads, at which the cross-correlation between –KBDI and NDWI is maximal is also indicated; negative lags (Punda Maria and Shingwedzi) indicate that –KBDI leads NDWI.

8.3.3 Time-Series Analysis: Relating Time-Series and Autoregression

A remedy for serial correlation, apart from applying variations in sampling strategy, is modeling of the time dependence in the error structure by autoregression (AR). AR most often is used for

purposes of forecasting and modeling of a time-series [25]. The simplest AR model for Equation 8.2, where r is the sample ACF value at lag 1, is

e_t = r e_{t−1} + ε_t    (8.6)

where ε_t is a series of serially independent numbers with mean zero and constant variance. The Gauss–Markov theorem cannot be applied, and therefore OLS is not an efficient estimator of the model parameters, if r is not zero [35]. Many different AR models are available in statistical software systems that incorporate time-series modules. One of the most frequently used models to account for serial correlation is the autoregressive integrated moving average (ARIMA) model [26,37]. Briefly stated, ARIMA models can have an AR term of order p, a differencing (integrating) term (I) of order d, and a moving average (MA) term of order q. The notation for specific models takes the form (p,d,q) [27]. The order of each term in the model is determined by examining the raw data and plots of the ACF of the data. For example, a second-order AR (p = 2) term in the model would be appropriate if a series has significant autocorrelation coefficients between xt and xt−1, and xt−2. ARIMA models that are fitted to time-series data using AR and MA parameters, p and q, have coefficients Φ and θ to describe the serial correlation. An underlying assumption of ARIMA models is that the series being modeled is stationary [26,27].
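A minimal illustration of Equation 8.6 is given below, in plain numpy as a stand-in for a full ARIMA package. The error series is simulated, so the true lag-1 coefficient is known and can be recovered:

```python
import numpy as np

def fit_ar1(e):
    """Estimate r in Eq. 8.6, e_t = r * e_{t-1} + eps_t, by regressing
    the series on its own first lag (no intercept). A minimal sketch of
    the AR component of the ARIMA machinery discussed in the text."""
    e = np.asarray(e, dtype=float)
    e_prev, e_next = e[:-1], e[1:]
    r = np.sum(e_prev * e_next) / np.sum(e_prev * e_prev)
    eps = e_next - r * e_prev        # innovations after removing AR(1)
    return r, eps

# Simulate an AR(1) error series with r = 0.8 and recover the coefficient.
rng = np.random.default_rng(42)
r_true = 0.8
noise = rng.normal(0.0, 1.0, 500)
e = np.zeros(500)
for i in range(1, 500):
    e[i] = r_true * e[i - 1] + noise[i]
r_hat, innovations = fit_ar1(e)
```

After removing the fitted AR(1) structure, the innovations are approximately serially independent, which is the modeling goal described above.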

8.4 Methodology

The TIMESAT program is used to extract nonserially correlated metrics from remote sensing and meteorological time-series [38,39]. These metrics are utilized to study the relationship between time-series at specific moments in time. The relationship between time-series, in turn, is evaluated using statistical analysis of the nonserially correlated seasonal metrics extracted from the time-series (–KBDI and NDWI).

8.4.1 Data Smoothing

It often is necessary to generate smooth time-series from noisy satellite sensor or meteorological data to extract information on seasonality. The smoothing can be achieved by applying filters or by function fitting. Methods based on Fourier series [40–42] or least-squares fits to sinusoidal functions [43–45] are known to work well in most instances. These methods, however, are not capable of capturing the sudden, steep rises or decreases of remote sensing or meteorological data values that often occur in arid and semiarid environments. Alternative smoothing and fitting methods have been developed to overcome these problems [38]. An adaptive Savitzky–Golay filtering method, implemented in the TIMESAT processing package developed by Jönsson and Eklundh [39], is used in this chapter. The filter is based on local polynomial fits. Suppose we have a time-series (ti, yi), i = 1, 2, . . . , N. For each point i, a quadratic polynomial

f(t) = c1 + c2 t + c3 t²    (8.7)

is fit to all 2k + 1 points in a window from n = i − k to m = i + k by solving the system of normal equations

A^T A c = A^T b    (8.8)

where

A = \begin{pmatrix} w_n & w_n t_n & w_n t_n^2 \\ w_{n+1} & w_{n+1} t_{n+1} & w_{n+1} t_{n+1}^2 \\ \vdots & \vdots & \vdots \\ w_m & w_m t_m & w_m t_m^2 \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} w_n y_n \\ w_{n+1} y_{n+1} \\ \vdots \\ w_m y_m \end{pmatrix}    (8.9)

The filtered value is set to the value of the polynomial at point i. Weights are designated as w in the above expression, with weights assigned to all of the data values in the window. Data values that were flagged in the preprocessing are assigned weight zero in this application and thus do not influence the result; the clean data values all have weight one. Residual negatively biased noise (e.g., clouds) may occur in the remote sensing data, and accordingly the fitting was performed in two steps [6]. The first fit was conducted using the weights obtained from the preprocessing. Data points above the resulting smoothed function from the first fit are regarded as more important, and in the second step the normal equations are solved with the weights of these data values increased by a factor of 2. This multistep procedure leads to a smoothed function that is adapted to the upper envelope of the data (Figure 8.7). Similarly, the ancillary metadata of the meteorological data from the preprocessing also were used in the iterative fitting to the upper envelope of the –KBDI time-series [6].

The width of the fitting window determines the degree of smoothing, but it also affects the ability to follow a rapid change. It is sometimes necessary to locally tighten the window even when the global setting of the window performs well. A typical situation occurs in savanna ecosystems, where vegetation, and the associated remote sensing and meteorological indices, responds rapidly to vegetation moisture dynamics. A small fitting window can be used to capture the corresponding sudden rise in data values.

FIGURE 8.7 The Savitzky–Golay filtering of NDWI (——) is performed in two steps. Firstly, the local polynomials are fitted using the weights from the preprocessing (a). Data points above the resulting smoothed function (– – –) from the first fit are attributed a greater importance. Secondly, the normal equations are solved with the weights of these data values increased by a factor of 2 (b).
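Equation 8.7 through Equation 8.9 and the two-step upper-envelope fit can be sketched as follows. This is a simplified reading of the TIMESAT procedure, without its adaptive window logic:

```python
import numpy as np

def weighted_local_quadratic(t, y, w, i, k):
    """Filtered value at index i from a quadratic fit (Eq. 8.7) over the
    window i-k..i+k, solving the weighted normal equations
    A^T A c = A^T b (Eq. 8.8-8.9). Weight-zero points are ignored."""
    lo, hi = max(i - k, 0), min(i + k, len(y) - 1)
    tt, yy, ww = t[lo:hi + 1], y[lo:hi + 1], w[lo:hi + 1]
    A = ww[:, None] * np.column_stack([np.ones_like(tt), tt, tt ** 2])
    b = ww * yy
    c = np.linalg.solve(A.T @ A, A.T @ b)
    return c[0] + c[1] * t[i] + c[2] * t[i] ** 2

def envelope_filter(t, y, w, k, boost=2.0):
    """Two-step upper-envelope fit: points above the first smoothed
    curve get their weights multiplied by `boost` in the second pass."""
    first = np.array([weighted_local_quadratic(t, y, w, i, k)
                      for i in range(len(y))])
    w2 = np.where(y > first, w * boost, w)
    return np.array([weighted_local_quadratic(t, y, w2, i, k)
                     for i in range(len(y))])

# Sanity check: data that are exactly quadratic are reproduced exactly,
# whatever the (positive) weights.
t = np.arange(30.0)
y = 0.01 * t ** 2 - 0.2 * t + 1.0
w = np.ones_like(y)
smoothed = envelope_filter(t, y, w, k=4)
```

On real dekadal data, flagged points would carry weight zero and cloudy drops in NDWI would be pulled up toward the envelope, as in Figure 8.7.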


FIGURE 8.8 The filtering of NDWI (——) in (a) is done with a window that is too large to allow the filtered data (– – –) to follow sudden increases and decreases of underlying data values. The data in the window are scanned and if there is a large increase or decrease, an automatic decrease in the window size will result. The filtering is then repeated using the new locally adapted size (b). Note the improved fit at rising edges and narrow peaks.

The data in the window are scanned and, if a large increase or decrease is observed, the adaptive Savitzky–Golay method applies an automatic decrease in the window size. The filtering is then repeated using the new locally adapted size. Savitzky–Golay filtering with and without the adaptive procedure is illustrated in Figure 8.8, which shows that the adaptation of the window improves the fit at the rising edges and at narrow seasonal peaks.

8.4.2 Extracting Seasonal Metrics from Time-Series and Statistical Analysis

Four seasonal metrics were extracted for each of the rainy seasons. Figure 8.9 illustrates the different metrics per season for the NDWI and –KBDI time-series. The beginning of a season, that is, 20% left of the rainy season, is defined from the final function fit as the point in time for which the index value has increased by 20% of the distance between the left minimum level and the maximum. The end of the season is defined in a similar way as the point 20% right of the rainy season. The 80% left and right points are defined as the points for which the function fit has increased to 80% of the distance between, respectively, the left and right minimum levels and the maximum. The technique used here to define metrics also is used by Verbesselt et al. [6] to define the beginning of the fire season.

FIGURE 8.9 The final fit of the Savitzky–Golay function (– – –) to the NDWI (a) and –KBDI (b) series (——), with the four defined metrics, that is, 20% left and right, and 80% left and right, overlaid on the graph. Points with flagged data errors (+) were assigned weights of zero and did not influence the fit. A dekad is defined as a 10-day period.

The temporal occurrence and the value of each metric were extracted for further exploratory statistical analysis to study the relationship between time-series. The SPOT VGT S10 time-series consisted of four seasons (1998–2002), from which four occurrences and values per metric type were extracted. Since six weather stations were used, twenty-four occurrence–value combinations per metric type ultimately were available for further analysis. Serial correlation that occurs in remote sensing and climate-based time-series invalidates inferences made by standard parametric tests, such as the Student's t-test or the Pearson correlation. All extracted occurrence–value combinations per metric type therefore were tested for autocorrelation using the Ljung–Box autocorrelation test [26]. Robust nonparametric techniques, such as the Wilcoxon signed rank test, were used in case of non-normally distributed data. The normality of the data was verified using the Shapiro–Wilk normality test [29]. Firstly, the distribution of the temporal occurrence of each metric was visualized and evaluated based on whether or not there was a significant difference between the temporal occurrences of the four metric types extracted from the –KBDI and NDWI time-series. Next, the strength and significance of the relationship between –KBDI and NDWI values of the four metric types were assessed with an OLS regression analysis.
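The metric definitions can be sketched for a single fitted season as follows. The example uses an idealized half-sine season and assumes a single peak with a monotonic rise and fall, which the fitted TIMESAT curves satisfy within one season:

```python
import numpy as np

def season_metrics(t, y):
    """Times at which a single-peaked fitted season rises through 20%
    and 80% of the distance between the left/right minima and the
    maximum (the chapter's four metrics), with linear interpolation
    between dekads."""
    i_max = int(np.argmax(y))

    def crossing(seg_t, seg_y, level, rising):
        idx = np.where(seg_y >= level)[0] if rising else np.where(seg_y < level)[0]
        j = idx[0]
        t0, t1, y0, y1 = seg_t[j - 1], seg_t[j], seg_y[j - 1], seg_y[j]
        return t0 + (level - y0) * (t1 - t0) / (y1 - y0)

    left_min, right_min, peak = y[:i_max].min(), y[i_max:].min(), y[i_max]
    out = {}
    for frac, name in [(0.2, "20%"), (0.8, "80%")]:
        out[name + " left"] = crossing(t[:i_max + 1], y[:i_max + 1],
                                       left_min + frac * (peak - left_min), True)
        out[name + " right"] = crossing(t[i_max:], y[i_max:],
                                        right_min + frac * (peak - right_min), False)
    return out

# One idealized season over 36 dekads: rise to a peak, then decay.
t = np.arange(37.0)
y = np.sin(np.pi * t / 36.0)
m = season_metrics(t, y)
```

For this symmetric season the 20% left and right occurrences mirror each other around the peak, and the 80% metrics bracket it more tightly.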

8.5 Results and Discussion

Figure 8.9 illustrates the optimized function fit and the defined metrics for the –KBDI and NDWI. Notice that the Savitzky–Golay function could properly describe the behavior of the different time-series. The function was fitted to the upper envelope of the data by using the uncertainty information derived during the preprocessing step. The results of the statistical analysis based on the extracted metrics for –KBDI and NDWI are presented below. The Ljung–Box statistic indicated that the extracted occurrences and values were not significantly autocorrelated at a 95% confidence level. All p-values were greater than 0.1, failing to reject the null hypothesis of independence.

8.5.1 Temporal Analysis of the Seasonal Metrics

Figure 8.10 illustrates the distribution of the temporal occurrence of the metrics extracted from the –KBDI and NDWI time-series. The occurrences of the extracted metrics were significantly non-normally distributed at a 95% confidence level (p > 0.1), indicating that the Wilcoxon signed rank test can be used. The Wilcoxon signed rank test showed that the –KBDI and NDWI occurrences of the 80% left and right, and 20% right metrics were not significantly different from each other at a 95% confidence level (p > 0.1). This confirmed that –KBDI and NDWI were temporally related. It also corroborated the results of Burgan [11] and Ceccato et al. [4], who found that both –KBDI and NDWI were related to the seasonal vegetation moisture dynamics, as measured by VWC.

Figure 8.10, however, illustrates that the start of the rainy season (i.e., the 20% left occurrence) derived from the –KBDI and NDWI time-series was different. The Wilcoxon signed rank test confirmed that the –KBDI and NDWI occurrences differed significantly from each other at a 95% confidence level (p < 0.01). This phenomenon can be explained by the fact that vegetation in the study area starts growing before the rainy season starts, due to an early change in air temperature (N. Govender, Scientific Service Kruger National Park, South Africa, personal communication). This explains why the NDWI reacted before the change in climatic conditions as measured by the –KBDI, given that the NDWI is directly related to vegetation moisture dynamics [4].

FIGURE 8.10 Box plots of the temporal occurrence of the four defined metrics, that is, 20% left and right, and 80% left and right, extracted from time-series of –KBDI and NDWI. The dekads (10-day periods) on the y-axis indicate the temporal occurrence of the metric. The upper and lower boundaries of the boxes indicate the upper and lower quartiles. The median is indicated by the solid line (—) within each box. The whiskers connect the extremes of the data, defined as 1.5 times the inter-quartile range. Outliers are represented by (o).
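The comparison of metric occurrences can be sketched with a hand-rolled Wilcoxon signed-rank test. This uses the normal approximation and no tie correction, so it is a simplification of the full test used in the chapter, and the dekad values below are synthetic:

```python
import numpy as np

def wilcoxon_signed_rank(a, b):
    """Paired Wilcoxon signed-rank test with a normal approximation.
    Returns (W+, z). Zero differences are dropped; ties in |d| are not
    given average ranks, so this is only a sketch of the full test."""
    d = np.asarray(a, float) - np.asarray(b, float)
    d = d[d != 0]
    n = len(d)
    ranks = np.abs(d).argsort().argsort() + 1.0   # ranks 1..n of |d|
    w_pos = ranks[d > 0].sum()                    # sum of positive ranks
    mu = n * (n + 1) / 4.0
    sigma = np.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    return w_pos, (w_pos - mu) / sigma

# 24 synthetic '20% left' occurrences (4 seasons x 6 stations, in dekads).
kbdi_occ = np.arange(24) + 30.0
# Balanced small differences -> no significant shift between the series.
d_null = 0.1 * np.array([(i + 1) * (-1) ** i for i in range(24)])
_, z_null = wilcoxon_signed_rank(kbdi_occ, kbdi_occ - d_null)
# NDWI metric occurring 2 dekads earlier everywhere -> significant shift.
_, z_early = wilcoxon_signed_rank(kbdi_occ, kbdi_occ - 2.0)
```

A |z| beyond 1.96 corresponds to a significant difference at the 95% level, mirroring the comparison of the 20% left occurrences above.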

8.5.2 Regression Analysis Based on Values of Extracted Seasonal Metrics

The assumptions of the OLS regression models between the values of metrics extracted from –KBDI and NDWI time-series were verified. The Wald test statistic showed nonlinearity to be not significant at a 95% confidence level (p > 0.15). The Shapiro–Wilk normality test confirmed that the residuals were normally distributed at a 95% confidence level (p < 0.01) [6,29]. Table 8.2 presents the results of the OLS regression analysis between the values of metrics extracted from –KBDI and NDWI time-series. The values extracted at the "20% right" position of the –KBDI and NDWI time-series showed a significant relationship at a 95% confidence level (p < 0.01). The other extracted metrics did not exhibit significant relationships at a 95% confidence level (p > 0.1). A significant relationship between the –KBDI and NDWI time-series was observed only at the moment when savanna vegetation was completely cured (i.e., the 20% right-hand side). The savanna vegetation therefore reacted differently to changes in climate parameters such as rainfall and temperature, as measured by KBDI, depending on the phenological growing cycle. This phenomenon could be explained because a living plant uses defense mechanisms to protect itself from drying out, while a cured plant responds directly to climatic conditions [46]. These results consequently indicated that the relationship between the extracted values of –KBDI and NDWI was influenced by seasonality. This corroborates the results of Ji and Peters [23], who indicated that seasonality had a significant effect on the relationship between vegetation, as measured by a remote sensing index, and a drought index. These results further illustrated that the seasonal effect needs to be taken into account when regression techniques are used to quantify the relationship between time-series related to vegetation moisture dynamics. The seasonal effect also can be accounted for by utilizing autoregression models with seasonal dummy variables, which take the effect of serial correlation and seasonality into account [23,26].
However, the proposed method to account for serial correlation by sampling at specific moments in time had an additional advantage: besides taking serial correlation into account, it allowed the influence of seasonality to be studied by extracting metrics at the specified moments. Furthermore, comparing the results of Table 8.1 and Table 8.2 showed that serial correlation caused an overestimation of the correlation coefficients. All the coefficients of determination (R2) in Table 8.1 were significant, with an average value of 0.7, while in Table 8.2 only the coefficient at the end of the rainy season (20% right-hand side) was significant (R2 = 0.49). This confirmed the importance of accounting for serial correlation and seasonality in the residuals of a regression model when studying the relationship between two time-series.

TABLE 8.2 Coefficients of Determination of the OLS Regression Models (NDWI ~ –KBDI) for the Four Extracted Seasonal Metric Values between –KBDI and NDWI Time-Series (n = 24 per Metric)

Metric       R2     p-Value
20% left     0.01   0.66
20% right    0.49   <0.01
80% left     0.00   0.97
80% right    0.01   0.61

8.5.3 Time-Series Analysis Techniques

Time-series analysis models most often are used for purposes of describing current conditions and forecasting [25]. The models use the serial correlation in time-series as a tool to relate temporal observations. Future observations can be predicted by modeling the serial correlation structure of a time-series [26]. Time-series analysis techniques (e.g., ARIMA) can be used to model a time-series by using other, independent time-series [27]. ARIMA subsequently could be used to study the relationship between –KBDI and NDWI. However, there are constraints that have to be considered before ARIMA models can be applied to a time-series (e.g., a drought index) by using other predictor time-series (e.g., a satellite index).

Firstly, the goodness-of-fit of an ARIMA model will not be significant when changes in the satellite index do not precede those in the drought index. The CCF can be used in this context to verify how the time-series are related to each other. The time lag results in Table 8.1 indicate that the –KBDI (drought index) precedes or coincides with the NDWI time-series. This illustrates that ARIMA models cannot directly be used to predict the KBDI with NDWI as the predictor variable. Consequently, other, more advanced time-series analysis techniques are needed to model vegetation dynamics, because the latter will precede or coincide with the dynamics monitored by remote sensing indices in most cases. Such advanced time-series analysis techniques, however, are not discussed here since they are outside the scope of this chapter. Secondly, the availability of data is limited for time-series analysis, namely, from 1998 to 2002. This is an important constraint because two separate data sets are needed to parameterize and evaluate an ARIMA model. One set is needed for parameterization, while the other is used to forecast and validate the ARIMA model through comparison of the observed and expected values.
Accordingly, it is necessary to interpolate missing satellite data that were masked out during preprocessing to ensure adequate data are available for parameterization. Thirdly, the proposed sampling strategy made investigation of the time lag and correlation at a defined instant in time possible, as opposed to ARIMA or cross-correlation analysis, through which only the overall relationship between time-series can be studied [25]. The applied sampling strategy is thus ideally suited to study the relationship between time-series of climate and remote sensing data, characterized by seasonality and serial correlation. The sampling of seasonal metrics minimized the influence of serial correlation, thereby making the study of seasonality possible.
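The lag check that underlies the CCF argument above is straightforward to reproduce. The sketch below is our own minimal illustration (function names are ours, not from the chapter): it computes a sample cross-correlation function between two standardized series and reads off the lag of maximum correlation, as a CCF analysis of the –KBDI and NDWI composites would.

```python
import numpy as np

def ccf(x, y, max_lag):
    """Sample cross-correlation of x with y for lags -max_lag..max_lag.
    A peak at positive lag k means x leads y by k time steps."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    n = len(x)
    r = []
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:                      # pair x[t] with y[t + k]
            r.append(np.dot(x[:n - k], y[k:]) / n)
        else:                           # pair x[t] with y[t + k], k < 0
            r.append(np.dot(x[-k:], y[:n + k]) / n)
    return np.array(r)

def lead_lag(x, y, max_lag=5):
    """Lag (in time steps) at which the CCF peaks; positive means x leads y."""
    r = ccf(x, y, max_lag)
    return int(np.argmax(r)) - max_lag
```

Applied to 10-day –KBDI and NDWI composites, a zero or positive peak lag for –KBDI versus NDWI would reproduce the finding that the drought index precedes or coincides with the satellite index.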

8.6 Conclusions

Serial correlation problems are not unknown in the field of statistical or general meteorology. However, the presence of serial correlation, found during analysis of a variable sampled sequentially at regular time intervals, seems to be disregarded by many agricultural meteorologists and remote sensing scientists. This is true despite abundant documentation available in the traditional meteorological and statistical literature. Therefore, an overview of the most important time-series analysis techniques and concepts was presented, namely, stationarity, autocorrelation, differencing, decomposition, autoregression, and ARIMA. A method was proposed to study the relationship between a meteorological drought index (KBDI) and remote sensing index (NDWI), both related to vegetation moisture dynamics, by accounting for the serial correlation effect. The relationship between –KBDI and NDWI was studied by extracting nonserially correlated seasonal metrics, for example, 20% and 80% left- and right-hand side metrics of the rainy season, based on a Savitzky–Golay fit to the upper envelope of the time-series. Serial correlation between the


extracted metrics was shown to be minimal and seasonality was an important factor influencing the relationship between NDWI and –KBDI time-series. Statistical analysis using the temporal occurrence of the extracted metrics revealed that NDWI and –KBDI time-series are temporally connected, except at the beginning of the rainy season. The fact that the savanna vegetation starts re-greening before the start of the rainy season explains this inability to detect the beginning of the rainy season. The values of the extracted seasonal metrics of NDWI and –KBDI were significantly related only at the end of the rainy season, namely, at the 20% right-hand side value of the fitted curve. The savanna vegetation at the end of the rainy season was cured and responded strongly to changes in climatic conditions monitored by the –KBDI, such as rain and temperature. The relationship between –KBDI and NDWI consequently changes during the season, which indicates that seasonality is an important factor that needs to be taken into account. Moreover, it was shown that correlation coefficients estimated by OLS regression analysis were overestimated due to the influence of serial correlation in the residuals. This confirmed the importance of taking serial correlation of the residuals into account by sampling nonserially correlated seasonal metrics when studying the relationship between time-series. The serial correlation effect consequently was taken into account by the extraction of seasonal metrics from time-series. The seasonal metrics in turn could be used to study the relationship between remote sensing and ground-based time-series, such as meteorological or field measurements. A better understanding of the relationship between remote sensing and in situ observations at regular time intervals will contribute to the use of remotely sensed data for the development of an index that represents seasonal vegetation moisture dynamics.

Acknowledgments

The SPOT VGT S10 data sets were generated by the Flemish Institute for Technological Development. The climate data were provided by the Weather Services of South Africa, whereas the National Land Cover Map (1995) was supplied by the Agricultural Research Centre of South Africa. We acknowledge the support of L. Eklundh, as well as the Crafoord and Bergvall foundations. The authors wish to thank N. Govender from the Scientific Services of Kruger National Park for scientific input.

References

1. Camia, A. et al., Meteorological fire danger indices and remote sensing, in Remote Sensing of Large Wildfires in the European Mediterranean Basin, E. Chuvieco, Ed., Springer-Verlag, New York, 1999, p. 39.
2. Ceccato, P. et al., Estimation of live fuel moisture content, in Wildland Fire Danger Estimation and Mapping: The Role of Remote Sensing Data, E. Chuvieco, Ed., World Scientific Publishing, New York, 2003, p. 63.
3. Dennison, P.E. et al., Modeling seasonal changes in live fuel moisture and equivalent water thickness using a cumulative water balance index, Remote Sens. Environ., 88, 442, 2003.
4. Ceccato, P. et al., Detecting vegetation leaf water content using reflectance in the optical domain, Remote Sens. Environ., 77, 22, 2001.
5. Jackson, T.J. et al., Vegetation water content mapping using Landsat data derived normalized difference water index for corn and soybeans, Remote Sens. Environ., 92, 475, 2004.
6. Verbesselt, J. et al., Evaluating satellite and climate data derived indices as fire risk indicators in savanna ecosystems, IEEE Trans. Geosci. Remote Sens., 44, 1622, 2006.
7. Van Wilgen, B.W. et al., Response of savanna fire regimes to changing fire-management policies in a large African national park, Conserv. Biol., 18, 1533, 2004.
8. Dimitrakopoulos, A.P. and Bemmerzouk, A.M., Predicting live herbaceous moisture content from a seasonal drought index, Int. J. Biometeorol., 47, 73, 2003.
9. Keetch, J.J. and Byram, G.M., A drought index for forest fire control, U.S.D.A. Forest Service, Asheville, NC, SE-38, 1988.
10. Janis, M.J., Johnson, M.B., and Forthun, G., Near-real time mapping of Keetch–Byram drought index in the south-eastern United States, Int. J. Wildland Fire, 11, 281, 2002.
11. Burgan, E.R., Correlation of plant moisture in Hawaii with the Keetch–Byram drought index, USDA Forest Service Research Note, PSW-307, 1976.
12. Chuvieco, E. et al., Combining NDVI and surface temperature for the estimation of live fuel moisture content in forest fire danger rating, Remote Sens. Environ., 92, 322, 2004.
13. Maki, M., Ishiahra, M., and Tamura, M., Estimation of leaf water status to monitor the risk of forest fires by using remotely sensed data, Remote Sens. Environ., 90, 441, 2004.
14. Paltridge, G.W. and Barber, J., Monitoring grassland dryness and fire potential in Australia with NOAA AVHRR data, Remote Sens. Environ., 25, 381, 1988.
15. Chladil, M.A. and Nunez, M., Assessing grassland moisture and biomass in Tasmania—the application of remote-sensing and empirical-models for a cloudy environment, Int. J. Wildland Fire, 5, 165, 1995.
16. Hardy, C.C. and Burgan, R.E., Evaluation of NDVI for monitoring live moisture in three vegetation types of the western US, Photogramm. Eng. Remote Sens., 65, 603, 1999.
17. Chuvieco, E. et al., Estimation of fuel moisture content from multitemporal analysis of LANDSAT thematic mapper reflectance data: applications in fire danger assessment, Int. J. Remote Sens., 23, 2145, 2002.
18. Fensholt, R. and Sandholt, I., Derivation of a shortwave infrared water stress index from MODIS near- and shortwave infrared data in a semiarid environment, Remote Sens. Environ., 87, 111, 2003.
19. Hunt, E.R., Rock, B.N., and Nobel, P.S., Measurement of leaf relative water content by infrared reflectance, Remote Sens. Environ., 22, 429, 1987.
20. Ceccato, P., Flasse, S., and Gregoire, J.M., Designing a spectral index to estimate vegetation water content from remote sensing data—Part 2: Validation and applications, Remote Sens. Environ., 82, 198, 2002.
21. Myneni, R.B. et al., Increased plant growth in the northern high latitudes from 1981 to 1991, Nature, 386, 698, 1997.
22. Eklundh, L., Estimating relations between AVHRR NDVI and rainfall in East Africa at 10-day and monthly time scales, Int. J. Remote Sens., 19, 563, 1998.
23. Ji, L. and Peters, A.J., Assessing vegetation response to drought in the northern Great Plains using vegetation and drought indices, Remote Sens. Environ., 87, 85, 2003.
24. De Beurs, K.M. and Henebry, G.M., Land surface phenology, climatic variation, and institutional change: analyzing agricultural land cover change in Kazakhstan, Remote Sens. Environ., 89, 497, 2004.
25. Meek, D.W. et al., A note on recognizing autocorrelation and using autoregression, Agricult. Forest Meteorol., 96, 9, 1999.
26. Brockwell, P.J. and Davis, R.A., Introduction to Time Series and Forecasting, 2nd ed., Springer-Verlag, New York, 2002, p. 31.
27. Ford, C.R. et al., Modeling canopy transpiration using time series analysis: a case study illustrating the effect of soil moisture deficit on Pinus taeda, Agricult. Forest Meteorol., 130, 163, 2005.
28. Montanari, A., Deseasonalisation of hydrological time series through the normal quantile transform, J. Hydrol., 313, 274, 2005.
29. R Development Core Team, R: a language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, Austria, ISBN 3-900051-07-0, URL http://www.R-project.org, 2005.
30. Rahman, H. and Dedieu, G., SMAC—a simplified method for the atmospheric correction of satellite measurements in the solar spectrum, Int. J. Remote Sens., 15, 123, 1994.
31. Stroppiana, D. et al., An algorithm for mapping burnt areas in Australia using SPOT-VEGETATION data, IEEE Trans. Geosci. Remote Sens., 41, 907, 2003.
32. Thompson, M.A., Standard land-cover classification scheme for remote-sensing applications in South Africa, South Afr. J. Sci., 92, 34, 1996.
33. Aguado, I. et al., Assessment of forest fire danger conditions in southern Spain from NOAA images and meteorological indices, Int. J. Remote Sens., 24, 1653, 2003.
34. von Storch, H. and Zwiers, F.W., Statistical Analysis in Climate Research, Cambridge University Press, Cambridge, 1999, p. 483.
35. Thejll, P. and Schmith, T., Limitations on regression analysis due to serially correlated residuals: application to climate reconstruction from proxies, J. Geophys. Res. Atmos., 110, 2005.
36. Cleveland, R.B. et al., STL: a seasonal-trend decomposition procedure based on Loess, J. Off. Stat., 6, 3, 1990.
37. Venables, W.N. and Ripley, B.D., Modern Applied Statistics with S, 4th ed., Springer-Verlag, New York, 2003, p. 493.
38. Jönsson, P. and Eklundh, L., Seasonality extraction by function fitting to time-series of satellite sensor data, IEEE Trans. Geosci. Remote Sens., 40, 1824, 2002.
39. Jönsson, P. and Eklundh, L., TIMESAT—a program for analyzing time-series of satellite sensor data, Comput. Geosci., 30, 833, 2004.
40. Menenti, M. et al., Mapping agroecological zones and time-lag in vegetation growth by means of Fourier-analysis of time-series of NDVI images, Adv. Space Res., 13, 233, 1993.
41. Azzali, S. and Menenti, M., Mapping vegetation–soil–climate complexes in Southern Africa using temporal Fourier analysis of NOAA-AVHRR NDVI data, Int. J. Remote Sens., 21, 973, 2000.
42. Olsson, L. and Eklundh, L., Fourier-series for analysis of temporal sequences of satellite sensor imagery, Int. J. Remote Sens., 15, 3735, 1994.
43. Cihlar, J., Identification of contaminated pixels in AVHRR composite images for studies of land biosphere, Remote Sens. Environ., 56, 149, 1996.
44. Sellers, P.J. et al., A global 1-degree-by-1-degree NDVI data set for climate studies. The generation of global fields of terrestrial biophysical parameters from the NDVI, Int. J. Remote Sens., 15, 3519, 1994.
45. Roerink, G.J., Menenti, M., and Verhoef, W., Reconstructing cloudfree NDVI composites using Fourier analysis of time series, Int. J. Remote Sens., 21, 1911, 2000.
46. Pyne, S.J., Andrews, P.L., and Laven, R.D., Introduction to Wildland Fire, 2nd ed., John Wiley & Sons, New York, 1996, p. 117.

9 Use of a Prediction-Error Filter in Merging High- and Low-Resolution Images

Sang-Ho Yun and Howard Zebker

CONTENTS
9.1 Image Descriptions
    9.1.1 TOPSAR DEM
    9.1.2 SRTM DEM
9.2 Image Registration
9.3 Artifact Elimination
9.4 Prediction-Error (PE) Filter
    9.4.1 Designing the Filter
    9.4.2 1D Example
    9.4.3 The Effect of the Filter
9.5 Interpolation
    9.5.1 PE Filter Constraint
    9.5.2 SRTM DEM Constraint
    9.5.3 Inversion with Two Constraints
    9.5.4 Optimal Weighting
    9.5.5 Simulation of the Interpolation
9.6 Interpolation Results
9.7 Effect on InSAR
9.8 Conclusion
References

A prediction-error (PE) filter is an array of numbers designed to interpolate missing parts of data such that the interpolated parts have the same spectral content as the existing parts. The data can be a one-dimensional time series, a two-dimensional image, or a three-dimensional quantity such as a subsurface material property. In this chapter, we discuss the application of a PE filter to recover missing parts of an image when a low-resolution image of the missing parts is available. One research issue for PE filters is improving the quality of image interpolation for nonstationary images, in which the spectral content varies with position. Digital elevation models (DEMs) are in general nonstationary. Thus, a PE filter alone cannot guarantee the success of image recovery. However, the quality of the recovery of a high-resolution image can be improved with an independent data set, such as a low-resolution image that has valid pixels in the missing regions of the high-resolution image. Using a DEM as an example image, we introduce a systematic method to use a


PE filter incorporating the low-resolution image as an additional constraint, and show the improved quality of the image interpolation. High-resolution DEMs are often limited in spatial coverage; they may also possess systematic artifacts when compared to comprehensive low-resolution maps. We correct artifacts and interpolate regions of missing data in topographic synthetic aperture radar (TOPSAR) DEMs using a low-resolution shuttle radar topography mission (SRTM) DEM. PE filters are then used to interpolate and fill missing data so that the interpolated regions have the same spectral content as the valid regions of the TOPSAR DEM. The SRTM DEM is used as an additional constraint in the interpolation. Using cross-validation methods, one can obtain the optimal weighting for the PE filter and SRTM DEM constraints.

9.1 Image Descriptions

InSAR is a powerful tool for generating DEMs [1]. The TOPSAR and SRTM sensors are primary sources for the academic community for DEMs derived from single-pass interferometric data. Differences in system parameters such as altitude and swath width (Table 9.1) result in very different properties for derived DEMs. Specifically, TOPSAR DEMs have better resolution, while SRTM DEMs have better accuracy over larger areas. TOPSAR coverage is often not spatially complete.

9.1.1 TOPSAR DEM

TOPSAR DEMs are produced from cross-track interferometric data acquired with NASA's AIRSAR system mounted on a DC-8 aircraft. Although the TOPSAR DEMs have a higher resolution than other existing data, they sometimes suffer from artifacts and missing data due to roll of the aircraft, layover, and flight planning limitations. The DEMs derived from the SRTM have lower resolution, but fewer artifacts and missing data than TOPSAR DEMs. Thus, the former often provides information in the missing regions of the latter. We illustrate joint use of these data sets using DEMs acquired over the Galápagos Islands. Figure 9.1 shows the TOPSAR DEM used in this study. The DEM covers Sierra Negra volcano on the island of Isabela. Recent InSAR observations reveal that the volcano has been deforming relatively rapidly [2,3]. InSAR analysis can require use of a DEM to produce a simulated interferogram required to isolate ground deformation. The effect of artifact elimination and interpolation for deformation studies is discussed later in this chapter.

TABLE 9.1
TOPSAR Mission versus SRTM Mission

                      TOPSAR           SRTM
Platform              DC-8 aircraft    Space shuttle
Nominal altitude      9 km             233 km
Swath width           10 km            225 km
Baseline              2.583 m          60 m
DEM resolution        10 m             90 m
DEM coord. system     None             Lat/Long

FIGURE 9.1 The original TOPSAR DEM of Sierra Negra volcano in the Galápagos Islands (inset for location). The pixel spacing of the image is 10 m. The boxed areas are used for illustration later in this paper. Note that there are a number of regions of missing data with various shapes and sizes. Artifacts are not identifiable due to the variation in topography. (From Yun, S.-H., Ji, J., Zebker, H., and Segall, P., IEEE Trans. Geosci. Rem. Sens., 43(7), 1682, 2005. With permission.)

The TOPSAR DEMs have a pixel spacing of about 10 m, sufficient for most geodetic applications. However, regions of missing data are often encountered (Figure 9.1), and significant residual artifacts are found (Figure 9.2). The regions of missing data are caused by layover of the steep volcanoes and flight planning limitations. Artifacts are large-scale and systematic and most likely due to uncompensated roll of the DC-8 aircraft [4]. Attempts to compensate this motion include models of piecewise linear imaging geometry [5] and estimating imaging parameters that minimize the difference between the TOPSAR DEM and an independent reference DEM [6]. We use a nonparameterized direct approach by subtracting the difference between the TOPSAR and SRTM DEMs.

9.1.2 SRTM DEM

The recent SRTM mission produced nearly worldwide topographic data at 90 m posting. SRTM topographic data are in fact produced at 30 m posting (1 arcsec); however, the high-resolution data sets for areas outside the United States are not available to the public at this time. Only DEMs at 90 m posting (3 arcsec) are available for download. For many analyses, finer scale elevation data are required. For example, a typical pixel spacing in a spaceborne SAR image is 20 m. If the SRTM DEMs are used for topography removal in spaceborne interferometry, the pixel spacing of the final interferograms would be limited by the topography data to at best 90 m. Despite the lower resolution, the SRTM DEM is useful because it has fewer motion-induced artifacts than the TOPSAR DEM. It also has fewer data holes. The merits and demerits of the two DEMs are in many ways complementary. Thus, a proper data fusion method can overcome the shortcomings of each and produce a new DEM that combines the strengths of the two data sets: a DEM that has a

FIGURE 9.2 (See color insert following page 302.) (a) TOPSAR DEM and (b) SRTM DEM. The tick labels are pixel numbers. Note the difference in pixel spacing between the two DEMs. (c) Artifacts obtained by subtracting the SRTM DEM from the TOPSAR DEM. The flight direction and the radar look direction of the aircraft associated with the swath with the artifact are indicated with long and short arrows, respectively. Note that the artifacts appear in one entire TOPSAR swath (swath width = 10 km), while they are not as serious in other swaths.

resolution of the TOPSAR DEM and large-scale reliability of the SRTM DEM. In this chapter, we present an interpolation method that uses both TOPSAR and SRTM DEMs as constraints.

9.2 Image Registration

The original TOPSAR DEM, while in ground-range coordinates, is not georeferenced. Thus, we register the TOPSAR DEM to the SRTM DEM, which is already registered in a latitude–longitude coordinate system. The image registration is carried out between the DEM data sets using an affine transformation. Although the TOPSAR DEM is not georeferenced, it is already in a ground coordinate system; thus, scaling and rotation are the two most important components. We found the skewing component to be negligible, and any higher order transformation between the two DEMs would also be negligible. The affine transformation is as follows:

\begin{bmatrix} x_S \\ y_S \end{bmatrix} =
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
\begin{bmatrix} x_T \\ y_T \end{bmatrix} +
\begin{bmatrix} e \\ f \end{bmatrix}
\qquad (9.1)

where [x_S, y_S]^T and [x_T, y_T]^T are tie points in the SRTM and TOPSAR DEM coordinate systems, respectively. Since [a b e] and [c d f] are estimated separately, at least three tie points are required to determine them uniquely. We picked 10 tie points from each DEM based on topographic features and solved for the six unknowns in a least-squares sense. Given the six unknowns, we choose new georeferenced sample locations that are uniformly spaced; every ninth sample location corresponds to a sample location of the SRTM DEM. The TOPSAR coordinates [x_T, y_T] of those sample locations are calculated from [x_S, y_S], and the nearest TOPSAR DEM value is selected and put into the corresponding new georeferenced sample location. The intermediate values are filled in from the TOPSAR map to produce the georeferenced 10-m data set. It should be noted that it is not easy to determine tie points in DEM data sets; enhancing the contrast of the DEMs facilitated the process. In general, fine registration is important for correctly merging different data sets. The two DEMs in this study have different pixel spacings, and it is difficult to pick tie points with higher precision than the pixel spacing of the coarser image. In our method, however, the SRTM DEM, the coarser image, is treated as an averaged image of the TOPSAR DEM, the finer image. In our inversion, only the 9-by-9 averaged values of the TOPSAR DEM are compared with the pixel values of the SRTM DEM. Thus, fine registration is less critical in this approach than in the case where a one-to-one match is required.
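The tie-point step can be sketched in a few lines. The fragment below is our own illustrative rendering (variable names are ours): it estimates the six affine parameters of Equation 9.1 in a least-squares sense, solving for [a b e] and [c d f] separately as described.

```python
import numpy as np

def fit_affine(topsar_pts, srtm_pts):
    """Least-squares estimate of the affine transform of Equation 9.1.
    topsar_pts, srtm_pts: (n, 2) arrays of matching tie points (n >= 3).
    Returns (a, b, e) and (c, d, f) such that
    xS = a*xT + b*yT + e  and  yS = c*xT + d*yT + f."""
    T = np.asarray(topsar_pts, float)
    S = np.asarray(srtm_pts, float)
    G = np.column_stack([T, np.ones(len(T))])        # rows [xT, yT, 1]
    abe, *_ = np.linalg.lstsq(G, S[:, 0], rcond=None)  # a, b, e
    cdf, *_ = np.linalg.lstsq(G, S[:, 1], rcond=None)  # c, d, f
    return abe, cdf
```

With ten tie points, as used in the chapter, the two three-parameter systems are comfortably overdetermined.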

9.3 Artifact Elimination

Examination of the georeferenced TOPSAR DEM (Figure 9.2a) shows motion artifacts when compared to the SRTM DEM (Figure 9.2b). The artifacts are not clearly discernible in Figure 9.2a because their magnitude is small in comparison to the overall data values. The artifacts are identified by downsampling the registered TOPSAR DEM and subtracting the SRTM DEM. Large-scale anomalies that fluctuate periodically over an entire swath are visible in Figure 9.2c. The periodic pattern is most likely due to uncompensated roll of the DC-8 aircraft. Spaceborne data are less likely to exhibit similar artifacts, because the spacecraft is not greatly affected by the atmosphere. Note that the width of the anomalies corresponds to the width of a TOPSAR swath. Because the SRTM swath is much larger than that of the TOPSAR system (Table 9.1), a larger area is covered under consistent conditions, reducing the number of parallel tracks required to form an SRTM DEM. The maximum amplitude of the motion artifacts in our study area is about 20 m. This would result in substantial errors in many analyses if not properly corrected. For example, if this TOPSAR DEM were used for topography reduction in repeat-pass InSAR using ERS-2 data with a perpendicular baseline of about 400 m, the resulting deformation interferogram would contain one fringe (= 2.8 cm) of spurious signal. To remove these artifacts from the TOPSAR DEM, we up-sample the difference image with bilinear interpolation by a factor of 9 so that its pixel spacing matches the TOPSAR DEM. The difference image is then subtracted from the TOPSAR DEM. This process is described with a flow diagram in Figure 9.3. Note that the lower branch


FIGURE 9.3 The flow diagram of the artifact elimination: the TOPSAR DEM is averaged 9-by-9, the SRTM DEM is subtracted, the difference is bilinearly upsampled by 9, and the result is subtracted from the TOPSAR DEM to give the corrected TOPSAR DEM. (From Yun, S.-H., Ji, J., Zebker, H., and Segall, P., IEEE Trans. Geosci. Rem. Sens., 43(7), 1682, 2005. With permission.)

undergoes two low-pass filter operations when averaging and bilinear interpolation are implemented, while the upper branch preserves the high-frequency content of the TOPSAR DEM. In this way we can eliminate the large-scale artifacts while retaining details in the TOPSAR DEM.
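In code, the flow of Figure 9.3 reduces to three array operations. The following is a minimal NumPy sketch of our reading of that diagram, assuming a 9:1 pixel-spacing ratio; function names are ours, and the chapter's actual processing chain is not reproduced here.

```python
import numpy as np

def block_average(img, k):
    """Downsample by averaging non-overlapping k-by-k blocks ("k looks")."""
    h, w = (img.shape[0] // k) * k, (img.shape[1] // k) * k
    return img[:h, :w].reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def bilinear_upsample(img, k):
    """Upsample by a factor k with separable linear interpolation."""
    h, w = img.shape
    yi = np.linspace(0.0, h - 1.0, h * k)
    xi = np.linspace(0.0, w - 1.0, w * k)
    tmp = np.stack([np.interp(yi, np.arange(h), img[:, j]) for j in range(w)], axis=1)
    return np.stack([np.interp(xi, np.arange(w), tmp[i, :]) for i in range(h * k)], axis=0)

def remove_artifacts(topsar, srtm, k=9):
    """Figure 9.3: estimate the low-frequency artifact as the difference
    between the averaged TOPSAR DEM and the SRTM DEM, upsample it, and
    subtract it from the full-resolution TOPSAR DEM."""
    diff = block_average(topsar, k) - srtm        # low-pass artifact estimate
    return topsar - bilinear_upsample(diff, k)    # high frequencies preserved
```

Because both branches subtract the same low-pass estimate, only the large-scale anomaly is removed; the fine-scale topographic detail of the TOPSAR DEM passes through unchanged.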

9.4 Prediction-Error (PE) Filter

The next step in the DEM process is to fill in missing data. We use a PE filter operating on the TOPSAR DEM to fill these gaps. The basic idea of the PE filter constraint [7,8] is that missing data can be estimated so that the restored data yield minimum energy when the PE filter is applied. The PE filter is derived from training data, which are normally valid data surrounding the missing regions. The PE filter is selected so that the missing data and the valid data share the same spectral content. Hence, we assume that the spectral content of the missing data in the TOPSAR DEM is similar to that of the regions with valid data surrounding the missing regions.

9.4.1 Designing the Filter

We generate a PE filter such that it rejects data with statistics found in the valid regions of the TOPSAR DEM. Given this PE filter, we solve for data in the missing regions such that the interpolated data are also nullified by the PE filter. This concept is illustrated in Figure 9.4. The PE filter, f_PE, is found by minimizing the following objective function,

\| f_{PE} * x_e \|^2 \qquad (9.2)

where x_e is the existing data from the TOPSAR DEM, and * represents convolution. This expression can be rewritten in linear algebraic form using the following matrix operation:

\| F_{PE} \, x_e \|^2 \qquad (9.3)

or equivalently

\| X_e \, f_{PE} \|^2 \qquad (9.4)

where F_PE and X_e are the matrix representations of f_PE and x_e for the convolution operation. These matrix and vector expressions are used to indicate their linear relationship.


FIGURE 9.4 Concept of PE filter: X_e * f_PE = 0 + ε1 and X_m * f_PE = 0 + ε2. The PE filter is estimated by solving an inverse problem constrained with the remaining part, and the missing part is estimated by solving another inverse problem constrained with the filter. The ε1 and ε2 are white noise with small amplitude.

9.4.2 1D Example

The procedure for acquiring the PE filter can be explained with a 1D example. Suppose that a data set x = [x_1, ..., x_n] (where n ≥ 3) is given, and we want to compute a PE filter of length 3, f_PE = [1 f_1 f_2]. Then we form a system of linear equations as follows:

\begin{bmatrix}
x_3 & x_2 & x_1 \\
x_4 & x_3 & x_2 \\
\vdots & \vdots & \vdots \\
x_n & x_{n-1} & x_{n-2}
\end{bmatrix}
\begin{bmatrix} 1 \\ f_1 \\ f_2 \end{bmatrix} \approx 0
\qquad (9.5)

The first element of the PE filter should be equal to one to avoid the trivial solution, f_PE = 0. Note that Equation 9.5 is the convolution of the data and the PE filter. After simple algebra, and with

\mathbf{d} = \begin{bmatrix} x_3 \\ \vdots \\ x_n \end{bmatrix}
\quad \text{and} \quad
\mathbf{D} = \begin{bmatrix} x_2 & x_1 \\ \vdots & \vdots \\ x_{n-1} & x_{n-2} \end{bmatrix}

we get

\mathbf{D} \begin{bmatrix} f_1 \\ f_2 \end{bmatrix} \approx -\mathbf{d}
\qquad (9.6)

and its normal equation becomes

\begin{bmatrix} f_1 \\ f_2 \end{bmatrix} = (\mathbf{D}^T \mathbf{D})^{-1} \mathbf{D}^T (-\mathbf{d})
\qquad (9.7)

Note that Equation 9.7 minimizes Equation 9.2 in a least-squares sense. This procedure can be extended to 2D problems; more details are described in Refs. [7] and [8].

9.4.3 The Effect of the Filter

Figure 9.5 shows the characteristics of the PE filter in the spatial and Fourier domains. Figure 9.5a is the sample DEM chosen from Figure 9.1 (numbered box 1) for demonstration. It contains various topographic features and has a wide range of spectral content (Figure 9.5d). Figure 9.5b is the 5-by-5 PE filter derived from Figure 9.5a by solving the inverse problem in Equation 9.3. Note that the first three elements in the first column of the filter coefficients are 0, 0, and 1. This is the PE filter's unique constraint, which ensures that the filtered output is white noise [7]. In the filtered output (Figure 9.5c) all the variations in the DEM are effectively suppressed. The size (order) of the PE filter is based on the


FIGURE 9.5 The effect of a PE filter. (a) original DEM; (b) a 2D PE filter found from the DEM; (c) DEM filtered with the PE filter; and (d), (e), and (f) the spectra of (a), (b), and (c), respectively, plotted in dB. (a) and (c) are drawn with the same color scale. Note that in (c) the variation of image (a) was effectively suppressed by the filter. The standard deviations of (a) and (c) are 27.6 m and 2.5 m, respectively. (From Yun, S.-H., Ji, J., Zebker, H., and Segall, P., IEEE Trans. Geosci. Rem. Sens., 43(7), 1682, 2005. With permission.)

complexity of the spectrum of the DEM. In general, as the spectrum becomes more complex, a larger filter is required. After testing various sizes, we found a 5-by-5 filter appropriate for the DEM used in our study. Figure 9.5d and Figure 9.5e show the spectra of the DEM and the PE filter, respectively. These illustrate the inverse relationship of the PE filter to the corresponding DEM in the Fourier domain, such that their product is minimized (Figure 9.5f). The PE filter thus constrains the interpolated data in the DEM to have spectral content similar to that of the existing data. All inverse problems in this study were solved using the conjugate gradient method, in which forward and adjoint functional operators are used instead of explicit inverse operators [7], saving computer memory.
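The 1D construction of Section 9.4.2 takes only a few lines. The sketch below is our own illustration, using plain least squares rather than the conjugate-gradient machinery mentioned above: it builds the matrix D and data vector d of Equations 9.6 and 9.7 and solves for the filter tail.

```python
import numpy as np

def pe_filter_1d(x, p=2):
    """Length-(p+1) prediction-error filter f = [1, f1, ..., fp] from a 1D
    series, solving Equation 9.7: [f1..fp] = (D^T D)^-1 D^T (-d)."""
    x = np.asarray(x, float)
    n = len(x)
    d = x[p:]                                             # x_{p+1}, ..., x_n
    D = np.column_stack([x[p - 1 - j: n - 1 - j] for j in range(p)])
    tail, *_ = np.linalg.lstsq(D, -d, rcond=None)         # least-squares solve
    return np.concatenate(([1.0], tail))
```

Convolving the resulting filter with the training data then yields (near) white noise, the property exploited in Figure 9.5c; the 2D case replaces the rows of D with shifted image patches.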

9.5 Interpolation

9.5.1 PE Filter Constraint

Once the PE filter is determined, we next estimate the missing parts of the image. As depicted in Figure 9.4, interpolation using the PE filter requires that the norm of the filtered output be minimized. This procedure can be formulated as an inverse computation minimizing the following objective function:

$\|F_{PE}\, x\|^2$   (9.8)

Use of a Prediction-Error Filter in Merging High- and Low-Resolution Images


where $F_{PE}$ is the matrix representation of the PE filter convolution, and x represents the entire data set, including the known and the missing regions. In the inversion process we update only the missing region, leaving the known region unchanged. This guarantees seamless interpolation across the boundaries between the known and missing regions.
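A sketch of this masked inversion, again ours rather than the authors' conjugate-gradient code: plain gradient descent on $\|F_{PE}\, x\|^2$ in which only the pixels flagged as missing are ever updated.

```python
import numpy as np

def pef_forward(x, f):
    """'Valid' application of the PE filter to an image."""
    fh, fw = f.shape
    oh, ow = x.shape[0] - fh + 1, x.shape[1] - fw + 1
    out = np.zeros((oh, ow))
    for r in range(fh):
        for c in range(fw):
            out += f[r, c] * x[r:r + oh, c:c + ow]
    return out

def pef_adjoint(res, f, shape):
    """Adjoint of pef_forward: scatter residuals back onto the image."""
    fh, fw = f.shape
    oh, ow = res.shape
    x = np.zeros(shape)
    for r in range(fh):
        for c in range(fw):
            x[r:r + oh, c:c + ow] += f[r, c] * res
    return x

def interpolate(x0, missing, f, n_iter=300, step=0.05):
    """Gradient descent on ||F_PE x||^2, updating masked pixels only."""
    x = x0.copy()
    for _ in range(n_iter):
        grad = pef_adjoint(pef_forward(x, f), f, x.shape)
        x[missing] -= step * grad[missing]   # known pixels never change
    return x
```

With a simple first-difference filter, for example, this fills a gap with the smooth interpolant pinned to the known boundary values, which is the seamless behavior the text describes.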

9.5.2 SRTM DEM Constraint

As previously stated, the 90-m posting SRTM DEMs were generated from 30-m posting data. This downsampling was done by calculating three "looks" in both the easting and northing directions. To use the SRTM DEM as a constraint for interpolating the TOPSAR DEM, we posit the following relationship between the two DEMs: each pixel value in a 90-m posting SRTM DEM can be considered equivalent to the averaged value of a 9-by-9 pixel window, centered at the corresponding pixel, in a 10-m posting TOPSAR DEM. The solution using the SRTM DEM constraint to find the missing data points in the TOPSAR DEM can be expressed as minimizing the following objective function:

$\|y - A\, x_m\|^2$   (9.9)

where y is an SRTM DEM, expressed as a vector, that covers the missing regions of the TOPSAR DEM; A is an averaging operator generating nine looks; and $x_m$ represents the missing regions of the TOPSAR DEM.
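The averaging operator A and its adjoint are simple block operations; a NumPy sketch (our own naming, assuming image dimensions divisible by the block size):

```python
import numpy as np

def looks_forward(x, b=9):
    """Average non-overlapping b-by-b blocks (the operator A)."""
    H, W = x.shape
    assert H % b == 0 and W % b == 0
    return x.reshape(H // b, b, W // b, b).mean(axis=(1, 3))

def looks_adjoint(y, b=9):
    """Adjoint of looks_forward: spread each coarse value over its block."""
    return np.kron(y, np.ones((b, b))) / (b * b)
```

The forward/adjoint pair satisfies the dot-product test, which is what an adjoint-based conjugate-gradient solver of the kind cited in [7] requires.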

9.5.3 Inversion with Two Constraints

By combining two constraints, one derived from the statistics of the PE filter and one from the SRTM DEM, we can interpolate the missing data optimally with respect to both criteria. The PE filter guarantees that the interpolated data will have the same spectral properties as the known data. At the same time, the SRTM constraint forces the interpolated data to have average heights near the corresponding SRTM DEM. We formulate the inverse problem as a minimization of the following objective function:

$\lambda^2 \|F_{PE}\, x_m\|^2 + \|y - A\, x_m\|^2$   (9.10)

where λ sets the relative effect of each criterion. Here $x_m$ has the dimensions of the TOPSAR DEM, while y has the dimensions of the SRTM DEM. If regions of missing data are localized in an image, the entire image does not have to be used for generating a PE filter. We implement the interpolation in subimages to save time and computer memory. An example of such a subimage is shown in Figure 9.6. The image is a part of Figure 9.1 (numbered box 2). Figure 9.6a and Figure 9.6b are examples of $x_e$ in Equation 9.3 and y, respectively. The multiplier λ determines the relative weight of the two terms in the objective function. As λ → ∞, the solution satisfies the first constraint only, and if λ = 0, the solution satisfies the second constraint only.

9.5.4 Optimal Weighting

We used cross-validation sum of squares (CVSS) [9] to determine the optimal weights for the two terms in Equation 9.10. Consider a model xm that minimizes the following quantity:



FIGURE 9.6 Example subimages of (a) the TOPSAR DEM, showing regions of missing data (black), and (b) the SRTM DEM of the same area. These subimages are used in one implementation of the interpolation. The grayscale is altitude in meters. (From Yun, S.-H., Ji, J., Zebker, H., and Segall, P., IEEE Trans. Geosci. Rem. Sens., 43(7), 1682, 2005. With permission.)

$\lambda^2 \|F_{PE}\, x_m\|^2 + \|y^{(k)} - A^{(k)} x_m\|^2, \quad k = 1, \ldots, N$   (9.11)

where $y^{(k)}$ and $A^{(k)}$ are the y and the A of Equation 9.10 with the k-th element and the k-th row omitted, respectively, and N is the number of elements of y that fall into the missing region. Denote this model $x_m^{(k)}(\lambda)$. Then we compute the CVSS, defined as follows:

$\mathrm{CVSS}(\lambda) = \frac{1}{N} \sum_{k=1}^{N} \left( y_k - A_k\, x_m^{(k)}(\lambda) \right)^2$   (9.12)

where $y_k$ is the omitted element of the vector y and $A_k$ is the omitted row of the matrix A when $x_m^{(k)}(\lambda)$ was estimated. Thus, $A_k x_m^{(k)}(\lambda)$ is the prediction based on the other N − 1 observations. Finally, we minimize CVSS(λ) with respect to λ to obtain the optimal weight (Figure 9.7). In the case of the example shown in Figure 9.6, the minimum CVSS was obtained for λ = 0.16 (Figure 9.7). The effect of varying λ is shown in Figure 9.8. It is apparent (see Figure 9.8) that the optimal weight gives a more "plausible" result than either of the end members, preserving aspects of both constraints. In Figure 9.8a the interpolation uses only the PE filter constraint. This interpolation does not recover the continuity of the ridge running across the DEM in the north–south direction, which is observed in the SRTM DEM (Figure 9.6b). This follows from the PE filter having been obtained such that it eliminates the overall variations in the image; those variations include not only the ridge but also the accurate topography in the DEM. The other end member, Figure 9.8c, shows the result of applying zero weight to the PE filter constraint. Since the averaging operator A in Equation 9.10 is applied independently



FIGURE 9.7 Cross-validation sum of squares. The minimum occurs when λ = 0.16. (From Yun, S.-H., Ji, J., Zebker, H., and Segall, P., IEEE Trans. Geosci. Rem. Sens., 43(7), 1682, 2005. With permission.)

for each 9-by-9 pixel group, it is equivalent to simply filling the regions of missing data with 9-by-9 blocks of identical values equal to the corresponding SRTM DEM pixels (Figure 9.6b).

9.5.5 Simulation of the Interpolation

The quality of cross-validation in this study is itself validated by simulating the interpolation process with known subimages that contain no missing data. For example, if a known subimage is selected from Figure 9.1 (numbered box 3), we can remove some data and apply our recovery algorithm. The subimage is similar in topographic features to the area shown in Figure 9.6. The process is illustrated in Figure 9.9. We introduce a hole as shown in Figure 9.9b and calculate the CVSS (Figure 9.9d) for each λ ranging from 0 to 2. Then we use the estimated λ, which minimizes the CVSS, in the interpolation process to obtain the image in Figure 9.9c. For each value of λ we also calculate the RMS error between the known and the interpolated images. The RMS error is plotted against λ in Figure 9.9e. The CVSS is minimized for λ = 0.062, while the RMS error has its minimum at λ = 0.065. This agreement suggests that minimizing the CVSS is a useful way to balance the constraints. Note that the minimum RMS error in Figure 9.9e is about 5 m. This value is smaller than the relative vertical height accuracy of the SRTM DEM, which is about 10 m.
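The leave-one-out procedure of Equations 9.11 and 9.12 can be sketched generically in Python; `solve(lam, y_k, A_k)` is a caller-supplied stand-in (our naming) for the full PE-filter/SRTM inversion with the k-th observation withheld:

```python
import numpy as np

def cvss(lam, y, A, solve):
    """Cross-validation sum of squares (Equation 9.12).

    `solve(lam, y_k, A_k)` must return the model estimated from the
    observations with the k-th one withheld.
    """
    N = len(y)
    total = 0.0
    for k in range(N):
        keep = np.arange(N) != k
        x_k = solve(lam, y[keep], A[keep])   # fit without observation k
        total += (y[k] - A[k] @ x_k) ** 2    # predict the held-out one
    return total / N
```

For a toy regularized least-squares `solve`, a well-chosen small λ yields a much lower CVSS than an absurdly large one, which is the behavior exploited in Figure 9.7.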

9.6 Interpolation Results

The method presented in the previous section was applied to the entire image of Figure 9.1. The registered TOPSAR DEM contains missing data in regions of various sizes. Small subimages were extracted from the DEM. Each subimage is interpolated, and the



FIGURE 9.8 The results of interpolation applied to the DEMs in Figure 9.6, with various weights: (a) λ → ∞, (b) λ = 0.16, and (c) λ = 0. Profiles along A–A′ are shown in plot (d). (From Yun, S.-H., Ji, J., Zebker, H., and Segall, P., IEEE Trans. Geosci. Rem. Sens., 43(7), 1682, 2005. With permission.)

results are reinserted into the large DEM. The locations and sizes of the subimages are indicated with white boxes in Figure 9.10a. Note the largest region of missing data in the middle of the caldera. This region is not only a large gap but also a gap between two swaths. The interpolation is an iterative process and fills regions of missing data starting from the boundary. If valid data along the boundary (the boundaries of a swath, for example) contain edge effects, error tends to propagate through the interpolation process. In this case, expanding the region of missing data by a few pixels before interpolation produces better results. If there is a large region of missing data, the spectral content information of the valid data can fade out as the interpolation proceeds toward the center of the gap. In this case, sequentially applying the interpolation to parts of the gap is one solution. Due to edge effects along the boundary of the large gap, the interpolation at first did not produce topography matching the surrounding terrain well. Hence, we expanded the gap by three pixels to eliminate edge effects, divided the gap into multiple subimages, and interpolated each subimage individually.
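Expanding the mask of missing data by a few pixels is a plain binary dilation; a dependency-free sketch (the helper name is ours):

```python
import numpy as np

def expand_mask(missing, pixels=3):
    """Grow a True=missing mask by `pixels` steps in every direction,
    so that suspect samples along a gap's edge are re-estimated too."""
    m = missing.copy()
    for _ in range(pixels):
        grown = m.copy()
        grown[1:, :] |= m[:-1, :]   # down
        grown[:-1, :] |= m[1:, :]   # up
        grown[:, 1:] |= m[:, :-1]   # right
        grown[:, :-1] |= m[:, 1:]   # left
        m = grown
    return m
```

A library dilation with a square structuring element would grow the mask diagonally as well; the 4-neighbor version above is the simplest variant and suffices to push the interpolation boundary away from suspect edge samples.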



FIGURE 9.9 The quality of the CVSS: (a) a sample image that has no hole; (b) a hole was made; (c) interpolated image with the optimal weight; (d) CVSS as a function of λ, with minimum at λ = 0.062; and (e) RMS error between the true image (a) and the interpolated image (c), with minimum at λ = 0.065. (From Yun, S.-H., Ji, J., Zebker, H., and Segall, P., IEEE Trans. Geosci. Rem. Sens., 43(7), 1682, 2005. With permission.)

9.7 Effect on InSAR

Finally, we can investigate the effect of the artifact elimination and the interpolation on simulated interferograms. It is often easier to see differences in elevation in simulated interferograms than in conventional contour plots. In addition, simulated interferograms provide a measure of how sensitive the interferogram is to the topography. Figure 9.11 shows georeferenced simulated interferograms from three DEMs: the registered TOPSAR DEM, the TOPSAR DEM after the artifact elimination, and the TOPSAR DEM after the interpolation. In all interferograms, a C-band wavelength is used, and we assume a 452 m perpendicular baseline between two satellite positions. This perpendicular baseline is realistic [2]. The fringe lines in the interferograms are approximately height contour lines. The interval of the fringe lines is inversely proportional to the perpendicular baseline [10], and in this case one color cycle of the fringes represents about 20 m. Note in Figure 9.11a that the fringe lines are discontinuous across the long region of missing data inside the caldera. This is due to artifacts in the original TOPSAR DEM. After eliminating these artifacts the discontinuity disappears (Figure 9.11b). Finally, the missing data regions are interpolated in a seamless manner (Figure 9.11c).
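Given the quoted geometry, one fringe cycle corresponding to about 20 m of height, simulating such an interferogram from a DEM amounts to wrapping scaled heights; a sketch under that single assumption (the exact height ambiguity depends on wavelength, range, look angle, and baseline):

```python
import numpy as np

def simulate_interferogram(dem, height_per_cycle=20.0):
    """Wrap DEM heights into interferometric phase; one 2*pi fringe
    cycle corresponds to `height_per_cycle` meters (about 20 m for
    the C-band, 452 m baseline geometry quoted in the text)."""
    phase = 2.0 * np.pi * dem / height_per_cycle
    return np.angle(np.exp(1j * phase))   # wrapped to (-pi, pi]
```

Plotting the wrapped phase with a cyclic colormap reproduces the contour-like fringes of Figure 9.11, where any step in the DEM shows up as a fringe discontinuity.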



FIGURE 9.10 The original TOPSAR DEM (a) and the reconstructed DEM (b) after interpolation with PE filter and SRTM DEM constraints. The grayscale is altitude in meters, and the spatial extent is about 12 km across the image. (From Yun, S.-H., Ji, J., Zebker, H., and Segall, P., IEEE Trans. Geosci. Rem. Sens., 43(7), 1682, 2005. With permission.)



FIGURE 9.11 Simulated interferograms from (a) the original registered TOPSAR DEM, (b) the DEM after the artifact was removed, and (c) the DEM interpolated with PE filter and the SRTM DEM. All the interferograms were simulated with the C-band wavelength (5.6 cm) and a perpendicular baseline of 452 m. Thus, one color cycle represents 20 m height difference. (From Yun, S.-H., Ji, J., Zebker, H., and Segall, P., IEEE Trans. Geosci. Rem. Sens., 43(7), 1682, 2005. With permission.)

9.8 Conclusion

The aircraft roll artifacts in the TOPSAR DEM were eliminated by subtracting the difference between the TOPSAR and SRTM DEMs. A 2D PE filter derived from the existing data, together with the SRTM DEM for the same region, was then used as the interpolation constraint.


Solving the inverse problem constrained with both the PE filter and the SRTM DEM produces a high-quality interpolated map of elevation. Cross-validation works well to select optimal constraint weighting in the inversion. This objective criterion results in less biased interpolation and guarantees the best fit to the SRTM DEM. The quality of many other TOPSAR DEMs can be improved similarly.

References

1. H.A. Zebker and R.M. Goldstein, Topographic mapping from interferometric synthetic aperture radar observations, Journal of Geophysical Research, 91(B5), 4993–4999, 1986.
2. F. Amelung, S. Jónsson, H. Zebker, and P. Segall, Widespread uplift and 'trapdoor' faulting on Galápagos volcanoes observed with radar interferometry, Nature, 407(6807), 993–996, 2000.
3. S. Yun, P. Segall, and H. Zebker, Constraints on magma chamber geometry at Sierra Negra volcano, Galápagos Islands, based on InSAR observations, Journal of Volcanology and Geothermal Research, 150, 232–243, 2006.
4. H.A. Zebker, S.N. Madsen, J. Martin, K.B. Wheeler, T. Miller, Y.L. Lou, G. Alberti, S. Vetrella, and A. Cucci, The TOPSAR interferometric radar topographic mapping instrument, IEEE Transactions on Geoscience and Remote Sensing, 30(5), 933–940, 1992.
5. S.N. Madsen, H.A. Zebker, and J. Martin, Topographic mapping using radar interferometry: processing techniques, IEEE Transactions on Geoscience and Remote Sensing, 31(1), 246–256, 1993.
6. Y. Kobayashi, K. Sarabandi, L. Pierce, and M.C. Dobson, An evaluation of the JPL TOPSAR for extracting tree heights, IEEE Transactions on Geoscience and Remote Sensing, 38(6), 2446–2454, 2000.
7. J.F. Claerbout, Earth Sounding Analysis: Processing versus Inversion, Blackwell, 1992 [Online], http://sepwww.stanford.edu/sep/prof/index.html.
8. J.F. Claerbout and S. Fomel, Image Estimation by Example: Geophysical Soundings Image Construction (Class Notes), 2002 [Online], http://sepwww.stanford.edu/sep/prof/index.html.
9. G. Wahba, Spline Models for Observational Data, ser. No. 59, Applied Mathematics, Philadelphia, PA, SIAM, 1990.
10. H.A. Zebker, P.A. Rosen, and S. Hensley, Atmospheric effects in interferometric synthetic aperture radar surface deformation and topographic maps, Journal of Geophysical Research, 102(B4), 7547–7563, 1997.

10 Blind Separation of Convolutive Mixtures for Canceling Active Sonar Reverberation

Fengyu Cong, Chi Hau Chen, Shaoling Ji, Peng Jia, and Xizhi Shi

CONTENTS
10.1 Introduction ......... 189
10.2 Problem Description ......... 190
10.3 BSCM Algorithm ......... 192
10.4 Experiments and Analysis ......... 195
  10.4.1 Backward Matrix of Active Sonar Data for Approximating the BSCM Model ......... 195
  10.4.2 Examples of Canceling Real Sea Reverberation ......... 196
    10.4.2.1 Example 1: Simulation 1 ......... 196
    10.4.2.2 Example 2: Simulation 2 ......... 196
    10.4.2.3 Example 3: Simulation 3 ......... 197
    10.4.2.4 Example 4: Real Target Experiment ......... 200
    10.4.2.5 Example 5: Simulation 4 ......... 200
10.5 Conclusions ......... 201
References ......... 202

10.1 Introduction

Under heavy oceanic reverberation, it can be difficult to detect a target echo accurately with an active sonar. To resolve the problem, researchers apply two methods that use reverberation models and specialized processing techniques [1]. For the first method, receivers with differing range resolutions may encounter different statistics for a given waveform. Furthermore, a given receiver may encounter different statistics at different ranges [2]. Researchers have used Weibull, log-normal, Rician, multi-modal Rayleigh, and non-Rayleigh distributions to describe sonar reverberation [3,4]. For the second method, it is usually assumed that reverberation is a sum of returns issued from the transmitted signal. Under this assumption, a data matrix is first generated from the data received by the active sonar [5,6]. The principal component inverse (PCI) [7–13] is primarily used to separate reverberation and target echoes from the data matrix. However, important prior knowledge, such as the target power, must be provided [11]. In Refs. [11–13], PCI and other methods have performed very well in Doppler cases, and the authors have also shown that PCI still performs well when the Doppler effect is not introduced. Provided that prior knowledge is


hard to obtain and the Doppler effect does not exist, it becomes very desirable to cancel reverberation with easily obtainable but minimal prior knowledge, even in more complicated undersea situations. This chapter focuses on that case. The essence of PCI is separation by rank reduction. Nevertheless, the problem of separation is an old one in electrical engineering, and many algorithms exist depending on the nature of the signals. Blind signal separation (BSS) [14] is a significant statistical signal processing method that has been developed over the past 15 years. The advantage of BSS is that it does not need much prior knowledge and makes full use of simple and apparent statistical properties, such as non-Gaussianity, nonstationarity, colored character, uncorrelatedness, independence, and so on. We studied BSS for canceling reverberation in Ref. [15]. From the perspective of BSS, the data received by active sonar is a convolutive mixture [16], while only the instantaneous mixture model was discussed in Ref. [15]. Consequently, in this contribution we perform blind separation of convolutive mixtures (BSCM) to nullify oceanic reverberation. In Ref. [15], the data waveform is described only in the time domain; here, we provide more illustrations of the matched filter outputs under different signal-to-reverberation ratios (SRR), along with more examples for better explanation. The rest of this chapter is organized as follows: Section 10.2 presents the problem description; Section 10.3 introduces the BSCM algorithm, which is based on the reverberation characteristics; Section 10.4 provides examples of canceling real sea reverberation as the main content; and finally, Section 10.5 summarizes the above contents.

10.2 Problem Description

In this chapter, we assume that reverberation is a sum of returns generated from the transmitted signal. In Ref. [16], the active sonar data series d(t) is expressed as

$d(t) = \underbrace{h_1(t - \tau_1) * e(t)}_{\text{target}} + \underbrace{\sum_{k=2}^{K} h_k(t - \tau_k) * e(t)}_{\text{reverberation}} + \underbrace{n(t)}_{\text{noise}}$   (10.1)

where $\tau_k$ is the propagation delay, $h_k(t)$ is the path impulse response, and e(t) is the transmitted signal. In a reverberation-dominant circumstance, we can decompose the active sonar data into the target echo component $\tilde{d}(t)$ and the reverberation component r(t), as defined in Equation 10.2. Extracting the target and nullifying the reverberation are done simultaneously:

$\tilde{d}(t) = \underbrace{h_1(t - \tau_1) * e(t)}_{\text{target}} + \underbrace{n(t)}_{\text{noise}}, \qquad r(t) = \underbrace{\sum_{k=2}^{K} h_k(t - \tau_k) * e(t)}_{\text{reverberation}} + \underbrace{n(t)}_{\text{noise}}$   (10.2)

It is apparent that $\tilde{d}(t)$ and r(t) are of different non-Gaussianity. Next, we show an example of real sea reverberation. For simplicity, we do not introduce the experimental detail in this section. The transmitted signal is a sinusoid with a frequency of 1750 Hz, a duration of 200 msec, and a sampling frequency of 6000 Hz. Figure 10.1 contains only the target echo and the background noise,


FIGURE 10.1 Target echo waveform.

and most of the reverberation generated by a sine wave is included in Figure 10.2, where no target echo exists and the background noise is embedded. The two waveforms in Figure 10.1 and Figure 10.2 are the time-domain descriptions of $\tilde{d}(t)$ and r(t), with corresponding standard kurtosis [17] of 4.1 and 1.6, respectively. Here, the time-domain non-Gaussianity is enough to discriminate $\tilde{d}(t)$ and r(t). It may appear that detection of $\tilde{d}(t)$ must be evident. However, in the real-world problem, it is not simple to cancel the reverberation to the degree shown in Figure 10.1. To explore the different effects of reverberation on detection at different SRRs, we give several examples of real sea reverberation as follows. In Figure 10.3, the first (upper) plot is reverberation, and in the remaining three plots, different targets are simulated and added to the reverberation with SRR = 0 dB, −7 dB, and −14 dB, respectively, from the upper to the lower plots. The target echo is located between the 9,000th and 11,000th samples. The target does not move, hence no Doppler effect is produced. The matched filter outputs follow in the next figure. In Figure 10.4, the middle two plots show that the matched filter results are satisfactory even when the SRR is quite small, while the lowest plot gives too many false alarms. The task for us is to cancel the reverberation to the degree that detection is possible, as in the middle two plots. In Equation 10.2, $\tilde{d}(t)$ does not contain any reverberation and covers only the target echo and the background noise. In the real-world case, that is impossible. However, it does not harm the detection. Figure 10.4 also reminds us that even if

$\tilde{d}(t) = \underbrace{h_1(t - \tau_1) * e(t)}_{\text{target}} + \underbrace{\sum_{k=2}^{\tilde{K}} h_k(t - \tau_k) * e(t)}_{\text{reverberation}} + \underbrace{n(t)}_{\text{noise}}$   (10.3)


where $\tilde{K} < K$, as long as the reverberation is reduced to some satisfactory level, the detection can still be reliable. After comparing the first and the third plots of Figure 10.3 and Figure 10.4, respectively, we find that Figure 10.3 shows that the reverberation waveform with a small-SRR target added is nearly identical to the waveform of


FIGURE 10.2 Reverberation waveform.



[Figure 10.3 panels, top to bottom: waveform of reverberation only; SRR = 0 dB; SRR = −7 dB; SRR = −14 dB.]

FIGURE 10.3 Real reverberation with different simulated targets.

reverberation, but Figure 10.4 implies that the corresponding matched filter outputs are definitely of different non-Gaussianity. This information is useful to our method for distinguishing the target echoes from the reverberation echoes.
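The standard (non-excess) kurtosis used for this discrimination is $E[x^4]/E[x^2]^2$: it equals 3 for a Gaussian and 1.5 for a pure sinusoid, consistent with the values of 4.1 and 1.6 quoted above. A minimal sketch:

```python
import numpy as np

def kurtosis(x):
    """Standard (non-excess) kurtosis E[x^4]/E[x^2]^2 after mean
    removal: 3 for a Gaussian, 1.5 for a pure sinusoid, and larger
    for spiky, echo-like signals."""
    x = x - x.mean()
    m2 = np.mean(x ** 2)
    return np.mean(x ** 4) / (m2 ** 2)
```

A reverberation-dominated record, being close to a filtered sinusoid, scores near 1.5, while a record containing a compact echo scores well above 3, which is the separation criterion the method exploits.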

10.3 BSCM Algorithm

In the past several years, researchers have developed several BSCM algorithms according to signal properties. Generally speaking, for a given source–receiver pair, the waveform arrives at different travel times, owing to varying path lengths [18]. All of these make reverberations nonstationary. The transmitted signal here is a sinusoid, so the

[Figure 10.4 panels, top to bottom: matched filter output for reverberation only; SRR = 0 dB; SRR = −7 dB; SRR = −14 dB.]

FIGURE 10.4 Outputs of matched filter on the data in Figure 10.3.

reverberations must retain the colored feature. Therefore, we say reverberations are nonstationary and colored. In this section, we give only a brief introduction to the BSCM algorithm. Consider a speech scenario, where J microphones receive multiple filtered copies of statistically uncorrelated or independent signals. Mathematically, the received signals can be expressed as a convolution, that is,

$x_j(t) = \sum_{i=1}^{I} \sum_{p=0}^{P-1} h_{ji}(p)\, s_i(t - p), \quad j = 1, \ldots, J; \quad t = 1, \ldots, L$   (10.4)

where $h_{ji}(p)$ models the P-point impulse response from source i to microphone j and L is the length of the received signal. In a more compact matrix–vector notation, Equation 10.4 can be stated as

$x(t) = \sum_{p=0}^{P-1} H(p)\, s(t - p)$



FIGURE 10.5 The specific (2 × 2) blind separation of convolutive mixtures problem.

$= H(p) * s(t)$   (10.5)

where x(t) ¼ [x1(t), . . . , xJ(t)]T is the received signal vector, s(t) ¼ [s1(t), . . . , sI(t)]T is the source vector, H(p) is the mixing filter matrix, * denotes the convolution, and [.]T means the transpose operation. Without loss of generality, we assume J ¼ I in this chapter. With this problem setup, the objective of the BSCM techniques is to find an unmixing filter W(q) of length Q for deconvolution, that is, Ð s(t) ¼ W(q) x(t)

(10:6)

where $\hat{s}(t)$ denotes the estimated sources. Figure 10.5 is a block diagram for J = I = 2. For nonstationary sources, a moving block is usually applied to the received signals:

$x(t, m) = H(p) * s(t, m), \quad t = 1, \ldots, N; \quad m = 1, \ldots, M$   (10.7)

where m is the index for blocks, M is the total number of blocks, and N is the length of each block. After the fast Fourier transform (FFT) is applied to the three components of Equation 10.7, we obtain

$X(\omega, m) = H(\omega)\, S(\omega, m), \quad \omega = 1, \ldots, N$   (10.8)

where ω denotes the frequency bin. Thus, the convolutive mixtures in the time domain turn into instantaneous mixtures in the frequency domain. This is the basic idea of the frequency-domain method for BSCM. The separation is then done in each frequency bin by the unmixing filters:

$Y(\omega, m) = \hat{W}(\omega)\, X(\omega, m)$   (10.9)

where $\hat{W}(\omega)$ and $Y(\omega, m)$ are the unmixing filter and unmixed signals in frequency bin ω, respectively. Since the envelope of the signal spectrum is correlated, $X_j(\omega, m)$ must be a colored signal. Under the independence assumption of signals in the time and frequency domains, we apply the cost function as follows:

$F = \frac{1}{4} \sum_{\tau=0}^{T-1} \left\| W(\omega, r)\, \Gamma_x(\omega, \tau)\, W^H(\omega, r) - I \right\|^2$   (10.10)

The algorithm for the unmixing matrix is given by

$W(\omega, r+1) = W(\omega, r) + \mu(\omega) \sum_{\tau=0}^{T-1} \left[ W(\omega, r)\, \Gamma_x(\omega, \tau)\, W^H(\omega, r) - I \right] W(\omega, r)\, \Gamma_x(\omega, \tau)$   (10.11)

where μ is the learning rate, r is the iteration index, T is the total number of delayed samples, $[\cdot]^H$ is the conjugate transpose operation, I is the identity matrix, and, with τ samples of delay, $\Gamma_x(\omega, \tau)$ is the autocorrelation matrix of $X(\omega, m)$. BSS may introduce two indeterminacies, the permutation and scaling problems [14,17]. We apply the method in Ref. [19] to resolve the permutation; more methods may be found in Refs. [20–32]. The scaling ambiguity is a nearly open problem; some methods to deal with it are provided in Refs. [33,34]. Generally, normalizing the permuted estimated matrix is a fast and effective way, which is also helpful in maintaining stable convergence [35]. We do not discuss this much here. After the inverse FFT is applied to $W(\omega)$ we get W(q), and then BSCM is performed according to Equation 10.6. For more information about BSCM, please refer to the literature mentioned above.
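One iteration of the per-bin update can be sketched as follows; the correlation matrices are assumed precomputed, and we take a descent step (minus sign) so that the cost of Equation 10.10 decreases, since sign conventions vary across write-ups:

```python
import numpy as np

def update_unmixing(W, G_list, mu):
    """One iteration of the per-bin update: drive the unmixed
    correlation matrices W G W^H toward the identity."""
    J = W.shape[0]
    step = np.zeros_like(W)
    for G in G_list:                   # one matrix per delay tau
        C = W @ G @ W.conj().T         # unmixed correlation at this tau
        step += (C - np.eye(J)) @ W @ G
    return W - mu * step
```

Iterating this on the correlation matrix of a simple 2-by-2 mixture steadily reduces the cost, i.e., the unmixed correlations approach the identity.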

10.4 Experiments and Analysis

10.4.1 Backward Matrix of Active Sonar Data for Approximating the BSCM Model

In Section 10.3, BSCM was introduced under some assumptions. However, in applications like speech enhancement, remote sensing, communication systems, geophysics, and biomedical engineering, conditions may not meet the basic theoretical requirements of BSCM. Interestingly, as long as the algorithms converge, BSCM still performs signal deconvolution effectively after some approximations are made [14,17]. For active sonar, the target echo and the oceanic reverberation are all generated by the same transmitted signal, so they must nevertheless be somewhat correlated. Principal component analysis [36] is the method for decorrelation, and PCI has been applied to separating the target echo and reverberation when some prior knowledge is provided [5–13]. As Neumann and Krawczyk have pointed out, the forward matrix model used is not a principal issue of the PCI algorithm, and any other forward model can be used [37]. Approximation implies that the processing result must be discounted in comparison with the corresponding theory. Fortunately, as we pointed out in Section 10.2, detection in the presence of reverberation may still be reliable with the matched filter even when the reverberation is not absolutely canceled, provided the SRR is within a certain limit. So, the approximation to BSCM is possible with the one-dimensional active sonar data. Intuitively, we generate the backward data matrix from the active sonar data d(t) as the received signals x(t) to approximate the classical BSCM model:

$x(t) = [x_1(t), \ldots, x_J(t)]^T, \qquad x_j(t) = d[t + j(B - 1) - 1]$   (10.12)

where $x_j(t)$ is the so-called j-th received signal, J is the total number of received signals, and B is the size of a moving block, an empirical parameter. By adjusting J and B, we may produce different backward matrices. Also, $x_j(t)$ will not miss the target, because the


two numbers J and B are relatively small in comparison with the whole length of d(t). The losses in $x_j(t)$ occur at the beginning of the received reverberation and are often not useful. These facts make the approximation entirely reasonable.
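A plausible realization of this backward matrix stacks delayed copies of d(t); the exact row offsets implied by Equation 10.12 depend on how J and B are chosen, so the shift here is kept as a parameter of our own:

```python
import numpy as np

def backward_matrix(d, J, shift=1):
    """Stack J delayed copies of the sonar series d(t) as the
    'received signals' x_j(t), with `shift` samples between rows.
    This is one reading of the backward matrix of Eq. (10.12)."""
    L = d.size - (J - 1) * shift
    return np.stack([d[j * shift: j * shift + L] for j in range(J)])
```

Because J times the shift is small relative to the record length, every row still contains the target interval, which is the point made in the text.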

10.4.2 Examples of Canceling Real Sea Reverberation

To simplify the discussion in this chapter, we only outline briefly the experimental setup here. Further details are in Ref. [11] about the underwater acoustic data and experiments. The transmitted signal is sinusoid with a frequency of 1750 Hz, the duration is 200 msec, and the sampling frequency is 6000 Hz. The examples are all with different targets simulated and added into the real sea reverberation. Before the targets are added, the reverberation is through the band-pass filter from 1500 to 2000 Hz. The reverberation we use is over 6 sec. For an enlarged plot, we show only the first 4 sec, which has no effect on the result as the reverberation is very light in the last 2 sec. The block size is N ¼ 200 for Equation 10.7, and the total number of delayed samples is T ¼ M for Equation 10.9. Since the energy varies in different frequency bins, the learning rate m(w) is supposed to correspond to the various frequency bins, and then it is defined as in Ref. [38]: 0:6 m(w) ¼ sﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ T1 P jG xÐ (w,t)j2

(10.13)

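The learning rate of Equation 10.13 normalizes the step size by the spectral energy accumulated in each frequency bin. A minimal sketch, with a plain Hann-windowed STFT standing in for the short-time spectrum and all names ours:

```python
import numpy as np

def learning_rate(x, n_fft=256, hop=128, c=0.6):
    """Per-bin learning rate mu(w) = c / sqrt(sum_t |X(w,t)|^2), cf. Eq. 10.13."""
    # Short-time spectra X(w, t): one row per frame, then transpose to (bins, frames).
    frames = [x[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(x) - n_fft + 1, hop)]
    S = np.fft.rfft(np.array(frames), axis=1).T
    energy = np.sum(np.abs(S) ** 2, axis=1)      # sum over frames t, per bin w
    return c / np.sqrt(energy + 1e-12)           # guard against silent bins

rng = np.random.default_rng(0)
x = rng.standard_normal(6000)
mu = learning_rate(x)
print(mu.shape)   # one rate per rfft bin
```

Bins carrying more energy get proportionally smaller steps, which keeps the frequency-domain adaptation stable across a colored spectrum.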

10.4.2.1 Example 1: Simulation 1

We take as our example the last plot in Figure 10.3. The corresponding part of Figure 10.4 shows that the detection is poor. A sinusoidal target echo of 1750 Hz is simulated and added to the reverberation with SRR = −14 dB. The target echo is located between the 9,001st and 11,000th samples. No Doppler effect is produced, and J = 38 and B = 1. After BSCM is performed on the backward matrix, J deconvolved signals are obtained, and all of them are passed through the matched filter. We compute the kurtosis of all the matched-filter outputs and choose the 18th deconvolved signal, which has the highest kurtosis in Figure 10.6, as the BSCM result d̃(t). The BSCM results in the subsequent simulations follow this rule. Figure 10.7 shows the matched filter applied to the BSCM result d̃(t). In contrast to the fourth plot in Figure 10.4, the detection is clearly more reliable after BSCM. Next, we show an example with the Doppler effect.
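The selection rule used in Example 1 (run every deconvolved channel through the matched filter and keep the one whose output has the highest kurtosis) can be sketched as follows; the replica, channel count, and amplitudes are illustrative, not the experiment's:

```python
import numpy as np

def matched_filter(x, replica):
    # Correlation with the time-reversed conjugate replica.
    return np.convolve(x, replica[::-1].conj(), mode="same")

def kurtosis(y):
    y = y - y.mean()
    return np.mean(y**4) / np.mean(y**2)**2

rng = np.random.default_rng(1)
# 20-ms, 1750-Hz replica at fs = 6000 Hz, as in the simulations.
replica = np.sin(2 * np.pi * 1750 * np.arange(0, 0.02, 1 / 6000.0))

# J candidate "deconvolved" channels: noise, one of which hides a scaled echo.
channels = [rng.standard_normal(3000) for _ in range(5)]
channels[3][1000:1000 + len(replica)] += 4.0 * replica

scores = [kurtosis(matched_filter(c, replica)) for c in channels]
best = int(np.argmax(scores))
print(best)
```

A channel containing a genuine echo produces a sharp matched-filter peak, which makes its output strongly non-Gaussian (high kurtosis); noise-only channels stay near the Gaussian value of 3.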

10.4.2.2 Example 2: Simulation 2

A sinusoidal target echo of 1745 Hz is simulated and added to the reverberation with SRR = −19 dB. The target echo is located between the 12,001st and 14,000th samples. A Doppler effect is present, and J = 20 and B = 1. Figure 10.8 shows the matched filter applied to d(t) and to the BSCM result d̃(t). Figure 10.8 indicates that BSCM is more effective when the Doppler effect is introduced, because the reverberation and the target echo then differ more. Figure 10.7 and Figure 10.8 show that the SRR is improved to a degree that makes detection more reliable than before. Since the transmitted signal and the target echo are both single-frequency sinusoids, or have very small bandwidth, no obvious changes in the frequency domain between the original d(t) and the BSCM result d̃(t) are found. So, we perform the third

Blind Separation of Convolutive Mixtures for Canceling Active Sonar Reverberation

197


FIGURE 10.6 Kurtosis of outputs from the matched filter on the deconvoluted signals.

simulation. Although it is not entirely realistic, it does help to demonstrate the validity of the BSCM algorithm.

10.4.2.3

Example 3: Simulation 3

A hyperbolic frequency modulated (HFM) target echo is simulated and added to the reverberation with SRR = −28 dB. The center frequency is 1750 Hz and the bandwidth is 500 Hz. The target echo is located between the 12,001st and 18,000th samples, and J = 30 and B = 2. Figure 10.9 shows the matched filter applied to d(t) and to the BSCM result d̃(t). In Figure 10.9, it is striking how greatly the detection improves after BSCM. To compare the changes in the frequency domain, we extract 6000 samples (the 12,001st to the 18,000th) as the target echo from d(t) and d̃(t), respectively, and then apply the FFT to the two groups of data. Figure 10.10 shows the changes. It is apparent that the SRR improved considerably after BSCM was applied.


FIGURE 10.7 Output of the matched filter on the BSCM result for reverberation with a simulated sinusoidal target (SRR = −14 dB, no Doppler effect).



FIGURE 10.8 Outputs of the matched filter on real reverberation with a simulated sinusoidal target (top: original data; bottom: BSCM result; SRR = −19 dB, Doppler effect present).


FIGURE 10.9 Outputs of the matched filter on reverberation with a simulated HFM target (top: original data; bottom: BSCM result; SRR = −28 dB).



FIGURE 10.10 The changes in the frequency domain (top: reverberation with the simulated target; bottom: BSCM result).


FIGURE 10.11 Outputs of the matched filter on the active sonar data and on the BSCM output in the real-target experiment.



FIGURE 10.12 Waveforms of deep sea reverberation with two simulated targets and BSCM output—no overlap between the target echoes.

10.4.2.4 Example 4: Real Target Experiment

To simplify the discussion, we only briefly outline the experimental setup here; further details on the underwater acoustic data and experimental results are given in Ref. [11]. The transmitted signal is HFM, the duration is 1 sec, and the bandwidth is 500 Hz. The target speed is about 2.0 m/sec, and the range is about 2500 m. The sampling frequency is again 6000 Hz. The data we use have been beamformed. Figure 10.11 shows the matched filter output of the original data and of the BSCM output signals. The figure reveals the effectiveness of BSCM in canceling oceanic reverberation without any expensive prior knowledge.

10.4.2.5

Example 5: Simulation 4

The above examples all involve shallow-water reverberation. In this simulation, we give an example of canceling deep-sea reverberation. The sea depth is beyond 4000 m. The sampling frequency is 20,000 Hz. The transmitted signal is HFM and the bandwidth is 100 Hz. In the deep sea, more than one target echo may occur within the reverberation produced by one ping; that is, Equation 10.1 is now written as

d(t) = \underbrace{\sum_{k=1}^{K_1} h_k(t - t_k) * e(t)}_{\text{target}} + \underbrace{\sum_{k=K_1+1}^{K_2} h_k(t - t_k) * e(t)}_{\text{reverberation}} + \underbrace{n(t)}_{\text{noise}}

(10.14)
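A toy instance of Equation 10.14, with several path responses convolved with the same transmitted signal plus noise, can be generated as below; a linear chirp stands in for the HFM ping, and all delays and gains are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(2)
fs = 20_000                                  # Hz, as in Simulation 4
t = np.arange(0, 0.05, 1 / fs)
# 100-Hz-wide linear chirp as a stand-in for the HFM ping e(t).
e = np.sin(2 * np.pi * (1700 * t + 0.5 * 2000 * t**2))

N = 4 * fs                                   # 4-s record d(t)
d = np.zeros(N)
# Target paths (k = 1..K1) followed by reverberation paths (k = K1+1..K2):
for delay, gain in [(0.50, 1.0), (0.80, 0.6),     # "targets"
                    (0.10, 0.3), (0.30, 0.2)]:    # "reverberation"
    h = gain * np.ones(1)                         # trivial path response h_k
    echo = np.convolve(e, h)                      # h_k(t - t_k) * e(t)
    start = int(delay * fs)
    d[start:start + len(echo)] += echo
d += 0.05 * rng.standard_normal(N)                # additive noise n(t)
print(d.shape)
```

All echoes share the same waveform e(t); only the path responses and delays differ, which is exactly what makes the convolutive-mixture model appropriate here.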


FIGURE 10.13 Waveforms of deep sea reverberation with two simulated targets and BSCM output—overlap between the target echoes.

In this example two target echoes are simulated. The SRR is 0 dB. The first case is shown in Figure 10.12: the two echoes do not overlap, and the duration is 0.05 sec. In the second case, shown in Figure 10.13, the echoes overlap at a rate of 66.7% and the duration is 0.3 sec. Both Figure 10.12 and Figure 10.13 show that the SRR has improved. This simulation implies that BSCM is also effective in canceling deep-sea reverberation. In these simulations we do not use BSS, but that does not imply that BSS is not useful. Compared to BSCM, BSS needs larger J and B, so the computation is more expensive. Sometimes BSS cannot provide sound results while BSCM can, because BSCM is much closer to the convolutive model of the active sonar data.

10.5 Conclusions

In this chapter we applied three useful and easily obtainable statistical characteristics to remove reverberation by deconvolving and distinguishing the target echoes from the reverberation: the nonstationarity of reverberation, its colored spectral character, and the non-Gaussianity of the matched-filter outputs of the BSCM results. Apart from this statistical information, BSCM generally does not need other expensive prior knowledge, such as the target echo energy or the Doppler shift. This is the advantage of our


method. Though the active sonar data used in this chapter are one-dimensional convolutive mixtures, the elimination of real sea reverberation proves that the approximation to the BSCM model through the backward data matrix is effective. Examples show that BSCM results can improve the SRR to a degree that allows satisfactory detection. Finding a good criterion for choosing the appropriate data matrix, as in Ref. [15], remains open for exploration.

References

1. T.J. Barnard and F. Khan, Statistical normalization of spherically invariant non-Gaussian clutter, IEEE Journal of Oceanic Engineering, 29(2), 303–309, 2004.
2. J.M. Fialkowski, R.C. Gauss, and D.M. Drumheller, Measurements and modeling of low-frequency near-surface scattering statistics, IEEE Journal of Oceanic Engineering, 29(2), 197–214, 2004.
3. K.D. LePage, Statistics of broad-band bottom reverberation predictions in shallow-water waveguides, IEEE Journal of Oceanic Engineering, 29(2), 330–346, 2004.
4. J.R. Preston and D.A. Abraham, Non-Rayleigh reverberation characteristics near 400 Hz observed on the New Jersey shelf, IEEE Journal of Oceanic Engineering, 29(2), 215–235, 2004.
5. I.P. Kirsteins and D.W. Tufts, Adaptive detection using low rank approximation to a data matrix, IEEE Transactions on Aerospace and Electronic Systems, 30(1), 55–67, 1994.
6. I.P. Kirsteins and D.W. Tufts, Rapidly adaptive nulling of interference, IEEE International Conference on Systems Engineering, pp. 269–272, 24–26 August 1989.
7. B.E. Freburger and D.W. Tufts, Case study of principal component inverse and cross spectral metric for low rank interference adaptation, in Proceedings of ICASSP '98, Vol. 4, pp. 1977–1980, 12–15 May 1998.
8. B.E. Freburger and D.W. Tufts, Adaptive detection performance of principal components inverse, cross spectral metric and the partially adaptive multistage Wiener filter, Conference Record of the Thirty-Second Asilomar Conference on Signals, Systems & Computers, Vol. 2, pp. 1522–1526, 1–4 November 1998.
9. B.E. Freburger and D.W. Tufts, Rapidly adaptive signal detection using the principal component inverse (PCI) method, Conference Record of the Thirty-First Asilomar Conference on Signals, Systems & Computers, Vol. 1, pp. 765–769, 2–5 November 1997.
10. T.A. Palka and D.W. Tufts, Reverberation characterization and suppression by means of principal components, in OCEANS '98 Conference Proceedings, Vol. 3, pp. 1501–1506, 28 September–1 October 1998.
11. G. Ginolhac and G. Jourdain, Principal component inverse algorithm for detection in the presence of reverberation, IEEE Journal of Oceanic Engineering, 27(2), 310–321, 2002.
12. G. Ginolhac and G. Jourdain, Detection in presence of reverberation, OCEANS 2000 MTS/IEEE Conference and Exhibition, Vol. 2, pp. 1043–1046, 11–14 September 2000.
13. V. Carmillet, P.O. Amblard, and G. Jourdain, Detection of phase- or frequency-modulated signals in reverberation noise, JASA, 105(6), 3375–3389, 1999.
14. A. Cichocki and S. Amari, Adaptive Blind Signal and Image Processing: Learning Algorithms and Applications, John Wiley & Sons, New York, 2002.
15. F. Cong et al., Blind signal separation and reverberation canceling with active sonar data, in Proceedings of ISSPA, Australia, 2005.
16. G.S. Edelson and I.P. Kirsteins, Modeling and suppression of reverberation components, IEEE Seventh SP Workshop on Statistical Signal and Array Processing, pp. 437–440, 26–29 June 1994.
17. A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis, Wiley-Interscience, New York, 2001.
18. D.A. Abraham and A.P. Lyons, Simulation of non-Rayleigh reverberation and clutter, IEEE Journal of Oceanic Engineering, 29(2), 347–362, 2004.
19. F. Cong et al., Approach based on colored character to blind deconvolution for speech signals, in Proceedings of ICIMA, pp. 396–399, 2004.
20. S. Amari, S.C. Douglas, A. Cichocki, and H.H. Yang, Multichannel blind deconvolution and equalization using the natural gradient, in Proceedings of the IEEE Workshop on Signal Processing Advances in Wireless Communications, pp. 101–104, April 1997.
21. M. Kawamoto, K. Matsuoka, and N. Ohnishi, A method of blind separation for convolved non-stationary signals, Neurocomputing, 22, 157–171, 1998.
22. S.C. Douglas and X. Sun, Convolutive blind separation of speech mixtures using the natural gradient, Speech Communication, 39, 65–78, 2003.
23. P. Smaragdis, Blind separation of convolved mixtures in the frequency domain, Neurocomputing, 22, 21–34, 1998.
24. H. Sawada, R. Mukai, S. Araki, and S. Makino, Polar coordinate based nonlinear function for frequency-domain blind source separation, IEICE Transactions on Fundamentals, E86-A(3), 590–596, 2003.
25. L. Schobben and W. Sommen, A frequency domain blind signal separation method based on decorrelation, IEEE Transactions on Signal Processing, 50, 1855–1865, 2002.
26. H. Sawada, R. Mukai, S. Araki, and S. Makino, A robust and precise method for solving the permutation problem of frequency-domain blind source separation, IEEE Transactions on Speech and Audio Processing, 12(5), 530–538, 2004.
27. W. Lu and J.C. Rajapakse, Eliminating indeterminacy in ICA, Neurocomputing, 50, 271–290, 2003.
28. S. Araki, R. Mukai, S. Makino, T. Nishikawa, and H. Saruwatari, The fundamental limitation of frequency domain blind source separation for convolutive mixtures of speech, IEEE Transactions on Speech and Audio Processing, 11(2), 109–116, 2003.
29. M.Z. Ikram and D.R. Morgan, Permutation inconsistency in blind speech separation: investigation and solutions, IEEE Transactions on Speech and Audio Processing, 13(1), 1–13, 2005.
30. A. Dapena and L. Castedo, A novel frequency domain approach for separating convolutive mixtures of temporally white signals, Digital Signal Processing, 13(2), 301–316, 2003.
31. A. Dapena, M.F. Bugallo, and L. Castedo, Separation of convolutive mixtures of temporally white signals: a novel frequency-domain approach, in Proceedings of the Third ICA, San Diego, California, pp. 179–184, 2001.
32. C. Mejuto, A. Dapena, and L. Castedo, Frequency-domain infomax for blind source separation of convolutive mixtures, in Proceedings of ICA 2000, pp. 315–320, Helsinki, 2000.
33. K. Matsuoka and S. Nakashima, Minimal distortion principle for blind source separation, in Proceedings of ICA, pp. 722–727, December 2001.
34. N. Murata, S. Ikeda, and A. Ziehe, An approach to blind source separation based on temporal structure of speech signals, Neurocomputing, 41(1), 1–24, 2001.
35. W. Lu and J.C. Rajapakse, Eliminating indeterminacy in ICA, Neurocomputing, 50, 271–290, 2003.
36. K.I. Diamantaras and S.Y. Kung, Principal Component Neural Networks: Theory and Applications, John Wiley & Sons, New York, 1996.
37. A. Neumann and H. Krawczyk, Principal Component Inversion, Training Course on Remote Sensing of Ocean Color, Ahmedabad, India, February 2001.
38. M.Z. Ikram, Multichannel Blind Separation of Speech Signals in a Reverberant Environment, Ph.D. thesis, Georgia Institute of Technology, 2001.

11 Neural Network Retrievals of Atmospheric Temperature and Moisture Profiles from High-Resolution Infrared and Microwave Sounding Data

William J. Blackwell

CONTENTS
11.1 Introduction
11.2 A Brief Overview of Spaceborne Atmospheric Remote Sensing
  11.2.1 Geophysical Parameter Retrieval
  11.2.2 The Motivation for Computationally Efficient Algorithms
11.3 Principal Components Analysis of Hyperspectral Sounding Data
  11.3.1 The PC Transform
  11.3.2 The NAPC Transform
  11.3.3 The Projected PC Transform
  11.3.4 Evaluation of Compression Performance Using Two Different Metrics
    11.3.4.1 PC Filtering
    11.3.4.2 PC Regression
  11.3.5 NAPC of Clear and Cloudy Radiance Data
  11.3.6 NAPC of Infrared Cloud Perturbations
  11.3.7 PPC of Clear and Cloudy Radiance Data
11.4 Neural Network Retrieval of Temperature and Moisture Profiles
  11.4.1 An Introduction to Multi-Layer Neural Networks
  11.4.2 The PPC–NN Algorithm
    11.4.2.1 Network Topology
    11.4.2.2 Network Training
  11.4.3 Error Analyses for Simulated Clear and Cloudy Atmospheres
  11.4.4 Validation of the PPC–NN Algorithm with AIRS/AMSU Observations of Partially Cloudy Scenes Over Land and Ocean
    11.4.4.1 Cloud Clearing of AIRS Radiances
    11.4.4.2 The AIRS/AMSU/ECMWF Data Set
    11.4.4.3 AIRS/AMSU Channel Selection
    11.4.4.4 PPC–NN Retrieval Enhancements for Variable Sensor Scan Angle and Surface Pressure
    11.4.4.5 Retrieval Performance
    11.4.4.6 Retrieval Sensitivity to Cloud Amount
  11.4.5 Discussion and Future Work


11.5 Summary
Acknowledgments
References

11.1 Introduction

Modern atmospheric sounders measure radiance with unprecedented resolution and accuracy in the spatial, spectral, and temporal dimensions. For example, the Atmospheric Infrared Sounder (AIRS), operational on the NASA EOS Aqua satellite since 2002, provides a spatial resolution of 15 km, a spectral resolution of ν/Δν ≈ 1200 (with 2,378 channels from 650 to 2675 cm⁻¹), and a radiometric accuracy on the order of ±0.2 K. Typical polar-orbiting atmospheric sounders measure approximately 90% of the Earth's atmosphere (in the horizontal dimension) approximately every 12 h. This wealth of data presents two major challenges in the development of retrieval algorithms, which estimate the geophysical state of the atmosphere as a function of space and time from upwelling spectral radiances measured by the sensor. The first challenge concerns the robustness of the retrieval operator and involves maximal use of the geophysical content of the radiance data with minimal interference from instrument and atmospheric noise. The second is to implement a robust algorithm within a given computational budget. Estimation techniques based on neural networks (NNs) are becoming more common in high-resolution atmospheric remote sensing largely because their simplicity, flexibility, and ability to accurately represent complex multi-dimensional statistical relationships allow both of these challenges to be overcome. In this chapter, we consider the retrieval of atmospheric temperature and moisture profiles (quantity as a function of altitude) from radiance measurements at microwave and thermal infrared wavelengths. A projected principal components (PPC) transform is used to reduce the dimensionality of, and optimally extract geophysical information from, the spectral radiance data, and a multi-layer feedforward NN is subsequently used to estimate the desired geophysical profiles. This algorithm is henceforth referred to as the "PPC–NN" algorithm.
The PPC–NN algorithm offers the numerical stability and efficiency of statistical methods without sacrificing the accuracy of physical, model-based methods. The chapter is organized as follows. First, the physics of spaceborne atmospheric remote sensing is reviewed. The application of principal components transforms to hyperspectral sounding data is then presented and a new approach is introduced, where the sensor radiances are projected into a subspace that reduces spectral redundancy and maximizes the resulting correlation to a given parameter. This method is very similar to the concept of canonical correlations introduced by Hotelling over 70 years ago [1], but its application in the hyperspectral sounding context is new. Second, the use of multi-layer feedforward NNs for geophysical parameter retrieval from hyperspectral measurements (first proposed in 1993 [2]) is reviewed, and an overview of the network parameters used in this work is given. The combination of the PPC radiance compression operator with an NN is then discussed, and performance analyses comparing the PPC–NN algorithm to traditional retrieval methods are presented.

11.2 A Brief Overview of Spaceborne Atmospheric Remote Sensing

The typical measurement scenario for spaceborne atmospheric remote sensing is shown in Figure 11.1. A sensor measures upwelling spectral radiance (intensity as a function of frequency) at various incidence angles. The sensor data are usually calibrated to remove measurement artifacts such as gain drift, nonlinearities, and noise. The spectral radiances measured by the sensor are related to geophysical quantities, such as the vertical temperature profile of the atmosphere, and must therefore be converted into the geophysical quantity of interest through the use of an appropriate retrieval algorithm. The radiative transfer equation describing the radiation intensity observed at altitude L, viewing angle θ, and frequency ν can be formulated by including reflected atmospheric and cosmic contributions and the radiance emitted by the surface as follows [3,4]:

R_\nu(L) = \int_0^L k_\nu(z) J_\nu[T(z)] \exp\left[-\sec\theta \int_z^L k_\nu(z')\,dz'\right] \sec\theta \, dz
  + r_\nu e^{-\tau^* \sec\theta} \int_0^L k_\nu(z) J_\nu[T(z)] \exp\left[-\sec\theta \int_0^z k_\nu(z')\,dz'\right] \sec\theta \, dz
  + r_\nu e^{-2\tau^* \sec\theta} J_\nu(T_c) + \epsilon_\nu e^{-\tau^* \sec\theta} J_\nu(T_s)

(11.1)


FIGURE 11.1 Typical measurement scenario for spaceborne atmospheric remote sensing. Electromagnetic radiation that reaches the sensor is emitted by the sun, cosmic background, atmosphere, surface, and clouds. This radiation can also be reflected or scattered by the surface, atmosphere, or clouds. The spectral radiances measured by the sensor are related to geophysical quantities, such as the vertical temperature profile of the atmosphere, and therefore must be converted into a geophysical quantity of interest through the use of an appropriate retrieval algorithm.


where ε_ν is the surface emissivity, r_ν is the surface reflectivity, T_s is the surface temperature, k_ν(z) is the atmospheric absorption coefficient, τ* is the atmospheric zenith opacity, T_c is the cosmic background temperature (2.736 ± 0.017 K), and J_ν(T) is the radiance intensity emitted by a blackbody at temperature T, which is given by the Planck equation:

J_\nu(T) = \frac{h\nu^3}{c^2}\,\frac{1}{e^{h\nu/kT} - 1} \qquad [\mathrm{W\,m^{-2}\,sr^{-1}\,Hz^{-1}}]

(11.2)
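Equation 11.2 is straightforward to evaluate numerically; the sketch below hard-codes the CODATA physical constants and uses a frequency near the 15-μm CO₂ sounding band, a choice made purely for illustration:

```python
import numpy as np

H = 6.62607015e-34   # Planck constant, J s
K = 1.380649e-23     # Boltzmann constant, J/K
C = 2.99792458e8     # speed of light, m/s

def planck_radiance(nu_hz, T):
    """Spectral radiance J_nu(T) in W m^-2 sr^-1 Hz^-1 (Eq. 11.2)."""
    return (H * nu_hz**3 / C**2) / np.expm1(H * nu_hz / (K * T))

# ~667 cm^-1 (15 um), the CO2 temperature-sounding band: wavenumber times c.
nu = 667e2 * C
cold = planck_radiance(nu, 220.0)
warm = planck_radiance(nu, 300.0)
print(warm > cold)   # warmer scenes are brighter at these frequencies
```

Using `expm1` rather than `exp(x) - 1` keeps the expression numerically stable when hν/kT is small (the microwave, Rayleigh–Jeans regime).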

The first term in Equation 11.1 can be recast in terms of a transmittance function \mathcal{T}_\nu(z):

R_\nu(L) = \int_0^L J_\nu[T(z)]\,\frac{d\mathcal{T}_\nu(z)}{dz}\,dz

(11.3)

The derivative of the transmittance function with respect to altitude is often called the temperature weighting function,

W_\nu(z) \triangleq \frac{d\mathcal{T}_\nu(z)}{dz}

(11.4)

and gives the relative contribution of the radiance emanating from each altitude. The temperature weighting functions for the Advanced Microwave Sounding Unit (AMSU) are shown in Figure 11.2.
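On a discrete altitude grid, the transmittance and the weighting function of Equation 11.4 follow from cumulative integration of an absorption profile; the exponential absorption profile below is a made-up illustration, not an actual AMSU channel:

```python
import numpy as np

z = np.linspace(0.0, 60.0, 601)        # altitude grid, km
dz = z[1] - z[0]
k_nu = 0.5 * np.exp(-z / 8.0)          # toy absorption coefficient, km^-1

# Transmittance from altitude z up to the sensor: exp(-int_z^L k_nu dz').
tau_above = np.cumsum(k_nu[::-1])[::-1] * dz
T_nu = np.exp(-tau_above)

# Weighting function W_nu(z) = dT_nu/dz (Eq. 11.4), by finite differences.
W_nu = np.gradient(T_nu, z)
z_peak = z[np.argmax(W_nu)]
print(round(float(z_peak), 1))         # altitude of peak sensitivity, km
```

The weighting function peaks where k_ν(z)·T_ν(z) is largest; channels with stronger absorption saturate higher in the atmosphere, which is why a set of channels of graded opacity sounds a stack of different layers.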


FIGURE 11.2 AMSU-A temperature profile (left) and AMSU-B water vapor profile (right) weighting functions.

11.2.1 Geophysical Parameter Retrieval

The objective of the geophysical parameter retrieval algorithm is to estimate the state of the atmosphere (represented by parameter matrix X, say), given observations of spectral radiance (represented by radiance matrix R, say). There are generally two approaches to this problem, as shown in Figure 11.3. The first approach, referred to here as the variational approach, uses a forward model (for example, the transmittance and radiative transfer models previously discussed) to calculate the sensor radiance that would be measured given a specific atmospheric state. Note that the inverse model typically does not exist, as there are generally an infinite number of atmospheric states that could give rise to a particular radiance measurement. In the variational approach, a ‘‘guess’’ of the atmospheric state is made (this is usually obtained through a forecast model or historical statistics), and this guess is propagated through the forward models thereby producing an estimate of the at-sensor radiance. The measured radiance is compared with this estimated radiance, and the state vector is adjusted so as to reduce the difference between the measured and estimated radiance vectors. Details on this methodology are discussed at length by Rodgers [5], and the interested reader is referred there for a more thorough treatment of the methodology and implementation of variational retrieval methods. The second approach, referred to here as the statistical, or regression-based, approach, does not use the forward model explicitly to derive the estimate of the atmospheric state vector. Instead, an ensemble of radiance–state vector pairs is assembled, and a statistical characterization (p(X), p(R), and p(X,R)) is sought. In practice, it is difficult to obtain these probability density functions (PDFs) directly from the data, and alternative methods are often used. 
Two of these methods are linear least-squares estimation (LLSE), or linear regression, and nonlinear least-squares estimation (NLLSE). NNs are a special class of NLLSEs, and will be discussed later.
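The regression-based approach can be sketched with an LLSE trained on an ensemble of radiance–state pairs; the hidden linear forward model and the synthetic three-level "profile" below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic ensemble: 3-level "temperature profile" states X and 6-channel
# radiances R produced by a hidden linear forward model plus sensor noise.
n, levels, channels = 5000, 3, 6
A = rng.standard_normal((channels, levels))           # hidden forward model
X = 250.0 + 10.0 * rng.standard_normal((n, levels))   # state ensemble, K
R = X @ A.T + 0.5 * rng.standard_normal((n, channels))

# LLSE fit from the ensemble: x_hat = mean_x + C_xr C_rr^{-1} (r - mean_r).
mx, mr = X.mean(axis=0), R.mean(axis=0)
Cxr = (X - mx).T @ (R - mr) / n
Crr = (R - mr).T @ (R - mr) / n
G = Cxr @ np.linalg.inv(Crr)

X_hat = (R - mr) @ G.T + mx
rmse = float(np.sqrt(np.mean((X_hat - X) ** 2)))
print(rmse)   # well below the 10-K prior spread of the ensemble
```

Once the coefficients G, mx, and mr are computed, applying the retrieval is a single matrix multiply per observation, which is the computational appeal of regression-based methods discussed in the next section.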

FIGURE 11.3 Variational and statistical approaches to geophysical parameter retrieval. In the variational approach, a forward model R = f(X) + e, with state vector X ≡ [T(r,t), W(r,t), O(r,t), …] together with surface reflectivity, solar illumination, and the characteristics of the observing system (bandwidth, resolution, etc.), relates the geophysical state of the atmosphere to the radiances measured by the sensor; a "guess" of the atmospheric state is adjusted iteratively until the modeled radiance "matches" the observed radiance, minimizing g = ||R − R_obs|| + h(X), where h(X) is a regularization term. In the statistical approach, an ensemble of radiance–state vector pairs is assembled, and a statistical relationship X̂ = g(R_obs), with g(·) = argmin ||X_ens − g(R_ens)||, is derived empirically; examples of g(·) include the LLSE and neural networks.

11.2.2 The Motivation for Computationally Efficient Algorithms

The principal advantage of regression-based methods is their simplicity—once the coefficients are derived from ‘‘training’’ data, the calculation of atmospheric state vectors is relatively easy. The variational approaches require multiple calls to the forward models, which can be computationally prohibitive. The computational complexity of the forward models is usually nonlinearly related (often O(n2) or more) to the number of spectral channels. As shown in Figure 11.4, the spectral and spatial resolution of infrared sounders has increased dramatically over the last 35 years, and the required computation needed for real-time operation with variational algorithms has outpaced Moore’s Law. There is, therefore, a motivation to reduce the computational burden of current and next-generation retrieval algorithms to allow real-time ingestion of satellite-derived geophysical products into numerical weather forecast models.

11.3 Principal Components Analysis of Hyperspectral Sounding Data

Principal components (PC) transforms can be used to represent radiance measurements in a statistically compact form, enabling subsequent retrieval operators to be substantially


FIGURE 11.4 Improvements in sensor spectral and spatial resolution over the last 35 years (left: number of spectral channels versus year for infrared and microwave sounders, from ITPR and NEMS through HIRS, MSU, AMSU/ATMS, AIRS, and IASI; right: footprints per 100 km versus year). The recent increase in the spectral resolution afforded by infrared sensors has far surpassed that available from microwave sensors; the trends in spatial resolution are similar for infrared and microwave sensors.


more efficient and robust (see Ref. [6], for example). Furthermore, measurement noise can be dramatically reduced through the use of PC filtering [7,8], and it has also been shown [9] that PC transforms can be used to represent variability in high-spectral-resolution radiances perturbed by clouds. In the following sections, several variants of the PC transform are briefly discussed, with emphasis focused on the ability of each to extract geophysical information from the noisy radiance data. 11.3.1

The PC Transform

The PC transform is a linear, orthonormal operator1 QrT, which projects a noisy ~ ¼ R þ , into an r-dimensional (r m) subspace. The m-dimensional radiance vector, R additive noise vector is assumed to be uncorrelated with the radiance vector R, and is ~ , that is, P ~ ¼ QrT R ~ have characterized by the noise covariance matrix C. The ‘‘PC’’ of R two desirable properties: (1) the components are statistically uncorrelated and (2) the reduced-rank reconstruction error. ^ ^~ R ~ )T (R ~ )] ~ R c () ¼ E[(R (11:5) 1

r

r

^ ~ for some linear operator Gr with rank r, is minimized when Gr ¼ Qr QrT. ~ r D Gr R where R ¼ The rows of QTr contain the r most-significant (ordered by descending eigenvalue) eigenvectors of the noisy data covariance matrix CR~ R~ ¼ CRR þ C. 11.3.2

11.3.2 The NAPC Transform

Cost criteria other than that in Equation 11.5 are often more suitable for typical hyperspectral compression applications. For example, it might be desirable to reconstruct the noise-free radiances and filter the noise. The cost function thus becomes

$$ c_2(r) = E\big[(\hat{R}_r - R)^T (\hat{R}_r - R)\big] \qquad (11.6) $$

where $\hat{R}_r \triangleq H_r \tilde{R}$ for some linear operator $H_r$ with rank $r$. The noise-adjusted principal components (NAPC) transform [10], where $H_r = C_{\Psi\Psi}^{1/2} W_r W_r^T C_{\Psi\Psi}^{-1/2}$ and $W_r^T$ contains the $r$ most-significant eigenvectors of the whitened noisy covariance matrix $C_{\tilde{W}\tilde{W}} = C_{\Psi\Psi}^{-1/2}(C_{RR} + C_{\Psi\Psi})\,C_{\Psi\Psi}^{-1/2}$, maximizes the signal-to-noise ratio of each component, and is superior to the PC transform for most noise-filtering applications where the noise statistics are known a priori.
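A sketch of the NAPC filter, under the simplifying assumption of a known, diagonal noise covariance (synthetic data; not the chapter's processing code):

```python
import numpy as np

rng = np.random.default_rng(1)

m, n, k = 40, 4000, 3
R = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))  # noise-free
sigma = 0.5 * (1.0 + rng.random(m))               # per-channel noise levels
R_noisy = R + sigma[:, None] * rng.standard_normal((m, n))

C_half = np.diag(sigma)                           # C_psi^{1/2}
C_half_inv = np.diag(1.0 / sigma)                 # C_psi^{-1/2}

# Eigenvectors of the whitened noisy covariance, by descending eigenvalue.
Cw = C_half_inv @ np.cov(R_noisy) @ C_half_inv
vals, vecs = np.linalg.eigh(Cw)
W = vecs[:, np.argsort(vals)[::-1]]

def napc_filter(X, r):
    """H_r = C_psi^{1/2} W_r W_r^T C_psi^{-1/2}: rank-r NAPC noise filtering."""
    Wr = W[:, :r]
    return C_half @ (Wr @ (Wr.T @ (C_half_inv @ X)))

raw_err = np.mean((R_noisy - R) ** 2)
napc_err = np.mean((napc_filter(R_noisy, 3) - R) ** 2)
print(raw_err, napc_err)  # the rank-3 NAPC filter removes most of the noise
```

Whitening by the noise covariance is what orders the components by signal-to-noise ratio rather than by raw variance; with unequal channel noise levels this ordering can differ substantially from that of the plain PC transform.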

11.3.3 The Projected PC Transform

It is often unnecessary to require that the PC be uncorrelated, and linear operators can be derived that offer improved performance over PC transforms for minimizing cost functions such as that in Equation 11.6. It can be shown [11] that the optimal linear operator with rank $r$ that minimizes Equation 11.6 is

$$ L_r = E_r E_r^T C_{RR} (C_{RR} + C_{\Psi\Psi})^{-1} \qquad (11.7) $$

where $E_r = [E_1 \,|\, E_2 \,|\, \cdots \,|\, E_r]$ contains the $r$ most-significant eigenvectors of $C_{RR}(C_{RR} + C_{\Psi\Psi})^{-1}C_{RR}$. Examination of Equation 11.7 reveals that the Wiener-filtered radiances are projected onto the $r$-dimensional subspace spanned by $E_r$; it is this projection that motivates the name "PPC." An orthonormal basis for this $r$-dimensional subspace of the original $m$-dimensional radiance vector space is given by the $r$ most-significant right eigenvectors, $V_r$, of the reduced-rank linear regression matrix $L_r$ given in Equation 11.7. We then define the PPC of $\tilde{R}$ as

$$ \tilde{P} = V_r^T \tilde{R} \qquad (11.8) $$

Note that the elements of $\tilde{P}$ are correlated, as $V_r^T (C_{RR} + C_{\Psi\Psi}) V_r$ is not a diagonal matrix.

Another useful application of the PPC transform is the compression of spectral radiance information that is correlated with a geophysical parameter, such as the temperature profile. The rank-$r$ linear operator that captures the most radiance information correlated with the temperature profile is similar to Equation 11.7 and is given below:

$$ L_r = E_r E_r^T C_{TR} (C_{RR} + C_{\Psi\Psi})^{-1} \qquad (11.9) $$

where $E_r = [E_1 \,|\, E_2 \,|\, \cdots \,|\, E_r]$ contains the $r$ most-significant eigenvectors of $C_{TR} (C_{RR} + C_{\Psi\Psi})^{-1} C_{RT}$, and $C_{TR}$ is the cross-covariance of the temperature profile and the spectral radiance.

¹The following mathematical notation is used in this chapter: $(\cdot)^T$ denotes the transpose, $\tilde{(\cdot)}$ denotes a noisy random vector, and $\hat{(\cdot)}$ denotes an estimate of a random vector. Matrices are indicated by bold upper case, vectors by upper case, and scalars by lower case.
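A small numerical check of Equation 11.7, with synthetic covariances standing in for $C_{RR}$ and $C_{\Psi\Psi}$, confirms two properties implied above: $L_r$ has rank $r$, and at full rank the projection disappears, leaving the Wiener filter:

```python
import numpy as np

rng = np.random.default_rng(2)

m = 20
B = rng.standard_normal((m, m))
C_rr = B @ B.T / m                    # synthetic signal covariance (SPD)
C_psi = 0.1 * np.eye(m)               # synthetic noise covariance

# E_r: leading eigenvectors of the symmetric matrix C_RR (C_RR+C_psi)^{-1} C_RR
M = C_rr @ np.linalg.inv(C_rr + C_psi) @ C_rr
vals, vecs = np.linalg.eigh(M)
E = vecs[:, np.argsort(vals)[::-1]]

def ppc_operator(r):
    """L_r = E_r E_r^T C_RR (C_RR + C_psi)^{-1}  (Equation 11.7)."""
    Er = E[:, :r]
    return Er @ Er.T @ C_rr @ np.linalg.inv(C_rr + C_psi)

wiener = C_rr @ np.linalg.inv(C_rr + C_psi)
print(np.linalg.matrix_rank(ppc_operator(5)))   # a rank-5 operator
print(np.allclose(ppc_operator(m), wiener))     # True: full rank = Wiener
```

The full-rank limit is a useful sanity check: when $r = m$, $E_r E_r^T = I$ and the PPC operator reduces exactly to the Wiener filter cited as the theoretical optimum in Section 11.3.4.1.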

11.3.4 Evaluation of Compression Performance Using Two Different Metrics

The compression performance of each of the PC transforms discussed previously was evaluated using two performance metrics. First, we seek the transform that yields the best (in the sum-squared sense) reconstruction of the noise-free radiance spectrum given a noisy spectrum; thus, we seek the optimal reduced-rank linear filter. The second performance metric is quite different and is based on temperature retrieval performance in the following way. A radiance spectrum is first compressed using each of the PC transforms for a given number of coefficients. The resulting coefficients are then used in a linear regression to estimate the temperature profile. The results that follow were obtained using simulated, clear-air radiance intensity spectra from an AIRS-like sounder. Approximately 7500 1750-channel radiance vectors were generated, with spectral coverage from approximately 4 to 15 μm, using the NOAA88b radiosonde set. The simulated intensities were expressed in spectral radiance units ($\mathrm{mW\,m^{-2}\,sr^{-1}\,(cm^{-1})^{-1}}$).

11.3.4.1 PC Filtering

Figure 11.5a shows the sum-squared radiance distortion (Equation 11.5) as a function of the number of components used in the various PC decomposition techniques. The a priori level indicates the sum-squared error due to sensor noise. Results from two variants of the PC transform are plotted: the first variant (the "PC" curve) uses eigenvectors of $C_{\tilde{R}\tilde{R}}$ as the transform basis vectors, and the second variant (the "noise-free PC" curve) uses eigenvectors of $C_{RR}$ as the transform basis vectors. Figure 11.5a shows that the PPC reconstruction of noise-free radiances (PPC(R)) yields lower distortion than both the PC and NAPC transforms for any number of components ($r$). It is noteworthy that the "PC" and "noise-free PC" curves never reach the theoretically optimal level, defined by the full-rank Wiener filter. Furthermore, the PPC distortion curves decrease monotonically with coefficient number, while all the PC distortion curves exhibit a local minimum, after which the distortion increases with coefficient number as noisy, high-order terms are included. The noise in the high-order PPC terms is effectively zeroed out, because it is uncorrelated with the spectral radiances.

FIGURE 11.5 Performance comparisons of the PC (where the components are derived from both noisy and noise-free radiances), NAPC, and PPC transforms for a hypothetical 1750-channel infrared (4–15 μm) sounder. Two projected principal components transforms were considered, PPC(R) and PPC(T), which provide, respectively, maximum representation of noise-free radiance energy and maximum representation of temperature profile energy. The first plot (a) shows the sum-squared error of the reduced-rank reconstruction of the noise-free spectral radiances. The second plot (b) shows the temperature profile retrieval error (trace of the error covariance matrix) obtained using linear regression with r components.

11.3.4.2 PC Regression

The PC coefficients derived in the previous example are now used in a linear regression to estimate the temperature profile. Figure 11.5b shows the temperature profile error (integrated over all altitude levels) as a function of the number of coefficients used in the linear regression, for each of the PC transforms. To reach the theoretically optimal value achieved by linear regression with all channels requires approximately 20 PPC coefficients, 200 NAPC coefficients, and 1000 PC coefficients. Thus, the PPC transform yields a factor-of-ten improvement over the NAPC transform when compressing temperature-correlated radiances (20 versus 200 coefficients required), and approximately a factor-of-100 improvement over the original spectral radiance vector (20 versus 1750). Note that the first guess in the AIRS Science Team Level-2 retrieval uses a linear regression derived from approximately 60 of the most-significant NAPC coefficients of the 2378-channel AIRS spectrum (in units of brightness temperature) [6]. Results for the moisture profile

are similar, although more coefficients (typically 35 versus 25 for temperature) are needed because of the higher degree of nonlinearity in the underlying physical relationship between atmospheric moisture and the observed spectral radiance. This substantial compression enables the use of relatively small (and thus very stable and fast) NN estimators to retrieve the desired geophysical parameters. It is interesting to consider the two variants of the PPC transform shown in Figure 11.5, namely PPC(R), where the basis for the noise-free radiance subspace is desired, and PPC(T), where the basis for only the temperature profile information is desired. As shown in Figure 11.5a, the PPC(T) transform poorly represents the noise-free radiance space, because there is substantial information that is uncorrelated with temperature (and thus ignored by the PPC(T) transform) but correlated with the noise-free radiance. Conversely, the PPC(R) transform offers a significantly less compact representation of temperature profile information (see Figure 11.5b), because the transform is representing information that is not correlated with temperature and thus superfluous when retrieving the temperature profile.
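The PC-regression procedure of Section 11.3.4.2 (compress the spectrum, then regress the geophysical parameter on the retained coefficients) can be illustrated with a toy scalar "temperature" and synthetic radiances; all dimensions and noise levels below are illustrative assumptions, not AIRS values:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy ensemble: a scalar "temperature" T drives every channel through G,
# on top of strong T-uncorrelated clutter and unit sensor noise.
m, n = 100, 5000
T = rng.standard_normal(n)
G = rng.standard_normal(m)                      # channel sensitivities to T
clutter = 3.0 * rng.standard_normal((m, 8))     # 8 strong unrelated modes
R = (np.outer(G, T) + clutter @ rng.standard_normal((8, n))
     + rng.standard_normal((m, n)))

# Compress: project each spectrum onto the r leading PC eigenvectors.
vals, vecs = np.linalg.eigh(np.cov(R))
Q = vecs[:, np.argsort(vals)[::-1]]

def regress_error(r):
    """RMS error of a linear regression of T on the first r PC coefficients."""
    X = np.vstack([Q[:, :r].T @ R, np.ones(n)])  # coefficients + intercept
    beta, *_ = np.linalg.lstsq(X.T, T, rcond=None)
    return np.sqrt(np.mean((X.T @ beta - T) ** 2))

print(regress_error(1), regress_error(20))
```

Because the leading PCs here capture high-variance clutter rather than the temperature-correlated direction, the single-coefficient regression performs poorly while the 20-coefficient regression is nearly noise-limited; this is the variance-versus-relevance distinction that motivates the PPC(T) transform.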

11.3.5 NAPC of Clear and Cloudy Radiance Data

In the following sections we compute the NAPC (and associated eigenvalues) of clear and cloudy radiance data, the NAPC of the infrared radiance perturbations due to clouds, and the projected (temperature) principal components of clear and cloudy radiance data. The 2378 AIRS radiances were converted from spectral intensities to brightness temperatures using Equation 11.2 and were concatenated with the 20 microwave brightness temperatures from AMSU-A and AMSU-B into a single vector $R$ of length 2398. The NAPC were computed as follows:

$$ P_{\mathrm{NAPC}} = Q^T R \qquad (11.10) $$

where $Q$ contains the eigenvectors of $C_{\tilde{W}\tilde{W}}$, sorted in descending order by eigenvalue, and $C_{\tilde{W}\tilde{W}}$ is the prewhitened covariance matrix discussed in Section 11.3. The eigenvalues corresponding to the top 100 NAPC are shown in Figure 11.6 for simulated clear-air and cloudy data. Also shown are scatterplots of the first three NAPC. The eigenvalues of the 90 lowest order terms are very similar. The principal differences occur in the three highest order terms, which are dominated by channels with weighting function peaks in the lower part of the atmosphere. The eigenvalues associated with the clear-air and cloudy NAPC cluster into roughly five groups: 1, 2–3, 4–9, 10–11, and 12–100. The first 11 NAPC capture 99.96% of the total radiance variance for both the clear-air and cloudy data. The top three NAPC of both clear-air and cloudy data appear to be jointly Gaussian to a close approximation, with the exception of clear-air NAPC #1 versus NAPC #2.
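The cumulative-variance bookkeeping behind statements like "the first 11 NAPC capture 99.96% of the total radiance variance" can be sketched as follows (synthetic toy data with four latent modes, not an AIRS-like spectrum):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy data: 30 "channels" dominated by 4 latent modes plus weak noise.
m, n, k = 30, 3000, 4
R = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
R = R + 0.01 * rng.standard_normal((m, n))

eigvals = np.sort(np.linalg.eigvalsh(np.cov(R)))[::-1]
cum_frac = np.cumsum(eigvals) / np.sum(eigvals)

# Smallest number of components capturing 99.96% of the total variance.
r_9996 = int(np.searchsorted(cum_frac, 0.9996)) + 1
print(r_9996)  # → 4 (one component per latent mode)
```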

11.3.6 NAPC of Infrared Cloud Perturbations

We define the infrared cloud perturbation $\Delta R_{IR}$ as

$$ \Delta R_{IR} \triangleq R_{IR}^{clr} - R_{IR}^{cld} \qquad (11.11) $$

where $R_{IR}^{clr}$ is the clear-air infrared brightness temperature and $R_{IR}^{cld}$ is the cloudy infrared brightness temperature. The NAPC of $\Delta R_{IR}$ were calculated using the method described above. The results are shown in Figure 11.7.

FIGURE 11.6 Noise-adjusted principal components transform analysis of clear and cloudy simulated AIRS/AMSU data. The top plot shows the eigenvalues of each NAPC coefficient for clear and cloudy data. The middle row presents scatterplots of the three clear-air NAPC coefficients with the largest variance (shown normalized to unit variance). The bottom row presents scatterplots of the three cloudy NAPC coefficients with the largest variance (shown normalized to unit variance).

The six highest order NAPC of $\Delta R_{IR}$ capture approximately 99.96% of the total cloud perturbation variance, which suggests that there are more degrees of freedom in the atmosphere than there are in the clouds. Furthermore, there is significant crosstalk between the cloud perturbation and the underlying atmosphere, and this crosstalk is highly nonlinear and non-Gaussian. Evidence of this can be seen in the scatterplot of NAPC #1 versus NAPC #2, shown in the lower left corner of Figure 11.7.

FIGURE 11.7 Noise-adjusted principal components transform analysis of the cloud impact (clear radiance minus cloudy radiance) for simulated AIRS data. The top plot shows the eigenvalue of each NAPC coefficient of cloud impact, along with the NAPC eigenvalues of clear-air data (shown in Figure 11.6). The bottom row presents scatterplots of the three cloud-impact NAPC coefficients with the largest variance (shown normalized to unit variance).

The temperature weighting functions of NAPC #1 and NAPC #2 are shown in Figure 11.8. NAPC #1 consists primarily of surface channels and NAPC #2 consists primarily of channels that peak near 3–6 km and channels that peak near the surface. Therefore, NAPC #1 is sensitive principally to the overall cloud amount, while NAPC #2 is also sensitive to cloud altitude.

11.3.7 PPC of Clear and Cloudy Radiance Data

The PPC transform discussed previously was used to identify temperature information contained in the clear and cloudy radiances. Figure 11.9 shows the mean temperature profile retrieval error for the reduced-rank regression operator given in Equation 11.9 as a function of rank (the number of PPC coefficients retained) for clear-air and cloudy radiance data. Both curves have asymptotes near 15 coefficients, and clouds degrade the temperature retrieval by an average of approximately 0.3 K RMS.

FIGURE 11.8 Temperature weighting functions of the first two noise-adjusted principal components of AIRS cloud perturbations (clear radiance minus cloudy radiance).

FIGURE 11.9 Projected principal components transform analysis of clear and cloudy simulated AIRS/AMSU data. The mean temperature profile retrieval error (K RMS) is shown as a function of the number of PPC coefficients used in a linear regression for simulated clear and cloudy data.

11.4 Neural Network Retrieval of Temperature and Moisture Profiles

An NN is an interconnection of simple computational elements, or nodes, with activation functions that are usually nonlinear, monotonically increasing, and differentiable. NNs are able to deduce input–output relationships directly from the training ensemble without requiring underlying assumptions about the distribution of the data. Furthermore, an NN with only a single hidden layer of a sufficient number of nodes with nonlinear activation functions is capable of approximating any real-valued continuous scalar function to a given precision over a finite domain [12,13].
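As a minimal illustration of such a network (illustrative layer sizes; random, untrained weights), a single-hidden-layer MLP with hyperbolic tangent activations and a linear output layer, together with the sum-squared cost used below, can be written as:

```python
import numpy as np

rng = np.random.default_rng(5)

d, h, k = 25, 15, 5   # illustrative sizes: inputs, hidden nodes, outputs
W1 = 0.1 * rng.standard_normal((h, d)); b1 = np.zeros(h)
W2 = 0.1 * rng.standard_normal((k, h)); b2 = np.zeros(k)

def mlp(x):
    """Single hidden layer of tanh nodes followed by a linear output layer."""
    z = np.tanh(W1 @ x + b1)   # z_j = tanh(sum_i w_ji x_i + b_j)
    return W2 @ z + b2

def sum_squared_error(X, T):
    """Sum-squared cost accumulated over the training patterns."""
    return 0.5 * sum(np.sum((t - mlp(x)) ** 2) for x, t in zip(X, T))

X = rng.standard_normal((100, d))
T = rng.standard_normal((100, k))
print(mlp(X[0]).shape, sum_squared_error(X, T) > 0)
```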

11.4.1 An Introduction to Multi-Layer Neural Networks

Consider a multi-layer feedforward NN consisting of an input layer, an arbitrary number of hidden layers (usually one or two), and an output layer (see Figure 11.10). The hidden layers typically contain sigmoidal activation functions of the form $z_j = \tanh(a_j)$, where $a_j = \sum_{i=1}^{d} w_{ji} x_i + b_j$. The output layer is typically linear. The weights ($w_{ji}$) and biases ($b_j$) for the $j$th neuron are chosen to minimize a cost function over a set of $P$ training patterns. A common choice for the cost function is the sum-squared error, defined as

$$ E(w) = \frac{1}{2} \sum_{p} \sum_{k} \big( t_k^{(p)} - y_k^{(p)} \big)^2 \qquad (11.12) $$

where $y_k^{(p)}$ and $t_k^{(p)}$ denote the network outputs and target responses, respectively, of each output node $k$ given a pattern $p$, and $w$ is a vector containing all the weights and biases of the network. The "training" process involves iteratively finding the weights and biases that minimize the cost function through some numerical optimization procedure. Second-order methods are commonly used, where the local approximation of the cost function by a quadratic form is given by

$$ E(w + \delta w) \approx E(w) + \nabla E(w)^T \delta w + \tfrac{1}{2}\, \delta w^T \nabla^2 E(w)\, \delta w \qquad (11.13) $$

where $\nabla E(w)$ and $\nabla^2 E(w)$ are the gradient vector and the Hessian matrix of the cost function, respectively. Setting the derivative of Equation 11.13 to zero and solving for the weight update vector $\delta w$ yields

$$ \delta w = -\big[\nabla^2 E(w)\big]^{-1} \nabla E(w) \qquad (11.14) $$

FIGURE 11.10 The structure of the multi-layer feedforward NN (specifically, the multi-layer perceptron) is shown in (a), and the perceptron (or node) is shown in (b).

Direct application of Equation 11.14 is difficult in practice, because computation of the Hessian matrix (and its inverse) is nontrivial and usually must be repeated at each iteration. For sum-squared error cost functions, it can be shown that

$$ \nabla E(w) = J^T e \qquad (11.15) $$

$$ \nabla^2 E(w) = J^T J + S \qquad (11.16) $$

where $J$ is the Jacobian matrix that contains the first derivatives of the network errors with respect to the weights and biases, $e$ is a vector of network errors, and $S = \sum_{p=1}^{P} e_p \nabla^2 e_p$. The Jacobian matrix can be computed using a standard backpropagation technique [14] that is significantly more computationally efficient than direct calculation of the Hessian matrix [15]. However, an inversion of a square matrix with dimensions equal to the total number of weights and biases in the network is required. For the Gauss–Newton method, it is assumed that $S$ is zero (a reasonable assumption only near the solution), and the update equation (Equation 11.14) becomes

$$ \delta w = -\big[J^T J\big]^{-1} J^T e \qquad (11.17) $$

The Levenberg–Marquardt modification [16] to the Gauss–Newton method is

$$ \delta w = -\big[J^T J + \mu I\big]^{-1} J^T e \qquad (11.18) $$

As $\mu$ varies between zero and infinity, $\delta w$ varies continuously between the Gauss–Newton step and the steepest-descent direction. The Levenberg–Marquardt method is thus an example of a model-trust-region approach in which the model (in this case the linearized approximation of the error function) is trusted only within some region around the current search point [17]. The size of this region is governed by the value of $\mu$. The use of multi-layer feedforward NNs, such as the multi-layer perceptron (MLP), to retrieve temperature profiles from hyperspectral radiance measurements has been addressed by several investigators (see Refs. [18,19], for example). NN retrieval of moisture profiles from hyperspectral data is relatively new [20], but follows the same methodology used to retrieve temperature.
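The update of Equation 11.18, together with the trust-region adjustment of $\mu$, can be sketched on a toy curve-fitting problem (synthetic data; a linear-in-parameters model is used here so that the Jacobian is constant, which is a simplification relative to an NN):

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy problem: fit y = a*x + b*x^2 to noisy samples via Equation 11.18.
x = np.linspace(-1.0, 1.0, 50)
t = 2.0 * x - 1.5 * x ** 2 + 0.01 * rng.standard_normal(50)

def residuals(w):
    return w[0] * x + w[1] * x ** 2 - t            # error vector e

def jacobian(w):
    return np.column_stack([x, x ** 2])            # J = de/dw

def levenberg_marquardt(w, iters=50):
    mu = 0.001
    for _ in range(iters):
        e, J = residuals(w), jacobian(w)
        dw = -np.linalg.solve(J.T @ J + mu * np.eye(w.size), J.T @ e)
        if np.sum(residuals(w + dw) ** 2) < np.sum(e ** 2):
            w, mu = w + dw, mu / 10.0   # step accepted: trust the model more
        else:
            mu *= 10.0                  # step rejected: shrink the trust region
    return w

w = levenberg_marquardt(np.zeros(2))
print(w)  # close to the true parameters [2.0, -1.5]
```

The factor-of-ten adjustment of $\mu$ mirrors the schedule described in Section 11.4.2.2; small $\mu$ recovers the Gauss–Newton step, while large $\mu$ yields a short step along the negative gradient.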

11.4.2 The PPC–NN Algorithm

A first attempt to combine the properties of both NN estimators and PC transforms for the inversion of microwave radiometric data to retrieve atmospheric temperature and moisture profiles is reported in Ref. [21], and a more recent study with hyperspectral data is presented in Ref. [20]. A conceptually similar approach is taken in this work by combining the PPC compression technique described in Section 11.3.3 with the NN estimator discussed in the previous section. PPC compression offers substantial performance advantages over traditional principal components analysis (PCA) and is the cornerstone of the present work.

11.4.2.1 Network Topology

All MLPs used in the PPC–NN algorithm are composed of one or two hidden layers of nonlinear (hyperbolic tangent) nodes and an output layer of linear nodes. For the temperature retrieval, 25 PPC coefficients are input to six NNs, each with a single hidden layer of 15 nodes. Separate NNs are used for different vertical regions of the atmosphere; a total of six networks are used to estimate the temperature profile at 65 points from the surface to 50 mbar. For the water vapor retrieval, 35 PPC coefficients are input to nine NNs, each with a single hidden layer of 25 nodes. The water vapor profile (mass mixing ratio) is estimated at 58 points from the surface to 75 mbar. These network parameters were determined largely through empirical analyses. Work is underway to dynamically optimize these parameters as the NN is trained. Separate training and testing data sets are used and are discussed in more detail in Section 11.4.2.2.

11.4.2.2 Network Training

The weights and biases were initialized using the Nguyen–Widrow method [22]. This method reduces the training time by initializing the weights so that each node is "active" (in the linear region of the activation function) over the input range of interest. The NN was trained using the Levenberg–Marquardt backpropagation algorithm discussed in Section 11.4.1. For each epoch, the $\mu$ parameter was initialized to 0.001. If a successful step was taken (i.e., $E(w + \delta w) < E(w)$), then $\mu$ was decreased by a factor of ten. If the current step was unsuccessful, the value of $\mu$ was increased by a factor of ten until a successful step could be found (or until $\mu$ reached $10^{10}$). The network training was stopped when the error on a separate data set did not decrease for 10 consecutive epochs. The sensor noise was changed on each training epoch to desensitize the network to radiance measurement errors.
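The stopping rule described above (halt when a held-out error fails to improve for 10 consecutive epochs) can be sketched as follows, here driven by a synthetic validation-error history rather than an actual NN training run:

```python
import numpy as np

# Synthetic validation-error history: improves for 30 epochs, then plateaus
# at a slightly worse level (a stand-in for a real training run).
val_errors = np.concatenate([np.linspace(1.0, 0.2, 30), np.full(170, 0.25)])

best_err, best_epoch = np.inf, 0
stop_epoch = len(val_errors) - 1
for epoch, err in enumerate(val_errors):
    if err < best_err:
        best_err, best_epoch = float(err), epoch
    elif epoch - best_epoch >= 10:     # 10 consecutive epochs w/o improvement
        stop_epoch = epoch
        break
print(stop_epoch, best_epoch)  # → 39 29
```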

11.4.3 Error Analyses for Simulated Clear and Cloudy Atmospheres

AIRS and AMSU-A/B clear and cloudy radiances were simulated for an ensemble of approximately 10,000 profiles. These profiles were produced using a numerical weather prediction (NWP) model and are substantially smoother (vertically) than the NOAA88b profile set used in the following sections. The profiles were separated into mutually exclusive sets for training and testing of the PPC–NN algorithm. The RMS errors for the linear least-squares estimation (LLSE) and NN temperature retrievals are shown in Figure 11.11 for clear and cloudy atmospheres. The NN estimator significantly outperforms the LLSE in both cases. The sensitivity of the retrieval to instrument noise is examined by repeating the retrieval with instrument noise set to zero. The difference in retrieval errors (with and without noise) is shown in the first panel of Figure 11.12. The atmospheric contribution to the retrieval error (i.e., the noise-free retrieval error) is shown in the second panel of Figure 11.12. Finally, the differences (net minus LLSE) in error contributions for the two methods are shown in the third panel of Figure 11.12. It is noteworthy that the NN is a much better filter of instrument noise than is the LLSE. As a final test of sensitivity to instrument noise, the LLSE and NN retrievals were repeated while varying the instrument noise between 10% and 1000% of its nominal value. The resulting retrieval errors are shown in Figure 11.13. The NN retrieval is significantly less sensitive to instrument noise than is the LLSE retrieval.

FIGURE 11.11 RMS temperature profile retrieval error for the NN and LLSE estimators. Results for clear air and clouds are shown. Surface emissivities were modeled as random variables with clipped-Gaussian PDFs: mean = 0.975, standard deviation = 0.025 (AIRS); mean = 0.95, standard deviation = 0.05 (AMSU).

FIGURE 11.12 Contributions to temperature profile retrieval error due to measurement noise and atmospheric noise for the NN and LLSE estimators.

FIGURE 11.13 Sensitivity of temperature profile retrieval to measurement noise. The mean RMS error over 15 km is shown as a function of the noise amplification factor.

11.4.4 Validation of the PPC–NN Algorithm with AIRS/AMSU Observations of Partially Cloudy Scenes over Land and Ocean

In this section, the performance of the PPC–NN algorithm is evaluated using cloud-cleared AIRS observations (not simulations, as were used in the previous section) and colocated ECMWF (European Centre for Medium-Range Weather Forecasts) forecast fields. The cloud clearing is performed using both AIRS and AMSU data. The PPC–NN retrieval performance is compared with that obtained using the AIRS Level-2 algorithm. Both ocean and land cases are considered, including elevated surface terrain, and retrievals at all sensor scan angles (out to ±48°) are derived. Finally, sensitivity analyses of PPC–NN retrieval performance are presented with respect to scan angle, orbit type (ascending or descending), cloud amount, and training set comprehensiveness.

11.4.4.1 Cloud Clearing of AIRS Radiances

The cloud-clearing approach discussed in Susskind et al. [23] was applied to the AIRS data by the AIRS Science Team. Version 3.x of the algorithm was used in this work (see Table 11.1 for a detailed listing of the software version numbers). The algorithm seeks to estimate a clear-column radiance (the radiance that would have been measured if the scene were cloud-free) from a number of adjacent cloud-impacted fields of view.

TABLE 11.1 AIRS Software Version Numbers for the Seven Days Used in the Match-Up Data Set

Date          Cloud Clearing   Level-2
6 Sep 2002    3.7.0            3.0.8
25 Jan 2003   3.7.0            3.0.8
8 Jun 2003    3.7.0            3.0.8
21 Aug 2003   3.1.9            3.0.8
3 Sep 2003    3.1.9            3.0.8
12 Oct 2003   3.1.9            3.0.8
5 Dec 2003    3.7.0            3.0.10

11.4.4.2 The AIRS/AMSU/ECMWF Data Set

The performance of the PPC–NN algorithm was evaluated using 352,903 AIRS/AMSU observations and colocated ECMWF atmospheric fields collected on seven days throughout 2002 and 2003 (see Table 11.1). Various software version changes were made over the course of this work (see Table 11.1 for details), but these changes were primarily with regard to quality control and do not significantly affect the results presented here. However, the version 4.x release of the AIRS software, which was not available in time to be included in this work, should offer many enhancements over version 3.x, including improved cloud clearing, retrieval accuracies, quality control, and retrieval yield [24]. Reanalyses of the results presented in this section are therefore planned with the new AIRS software release. The 352,903 observations were randomly divided into a training set of 302,903 observations (206,061 of which were over ocean) and a separate validation set of 50,000 observations (40,000 of which were over ocean). The a priori RMS variation of the temperature and water vapor (mass mixing ratio) profiles contained in the validation set is shown in Figure 11.14. The observations in the validation set were matched with AIRS Level-2 retrievals obtained from the Earth Observing System (EOS) Data Gateway (EDG). As advised in the AIRS Version 3.0 L2 Data Release Documentation, only retrievals that met certain quality standards (specifically, RetQAFlag = 0 for ocean and RetQAFlag = 256 for land) were included in the analyses. There were 17,856 AIRS Level-2 retrievals (all within ±40° latitude) that met this criterion. Reanalysis with AIRS Level-2 version 4.x software is planned, as the version 4.x products have been validated over both ocean and land at near-polar latitudes. To facilitate comparison with results published in the AIRS v3.0 Validation Report [25], layer error statistics are calculated as follows. First, layer averages are calculated in layers of approximately (but not exactly) 1-km width; the exact layer widths can be found in Appendix III of the AIRS v3.0 Validation Report. Second, weighted water vapor errors in each layer are calculated by dividing the RMS mass mixing ratio error by the RMS variation of the true mass mixing ratio (as opposed to dividing the mass mixing ratio error of each profile by the true mass mixing ratio for that profile and computing the RMS of the resulting ensemble).
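The second (weighted) statistic can be sketched for a single hypothetical layer; the numbers below are synthetic, the "RMS variation" is interpreted here as the standard deviation about the ensemble mean, and the real calculation uses the layer definitions from the AIRS v3.0 Validation Report:

```python
import numpy as np

rng = np.random.default_rng(8)

# Hypothetical 1-km layer: true and retrieved mass mixing ratios
# for an ensemble of 1000 profiles (synthetic illustrative numbers).
true_mmr = 2.0 + 0.5 * rng.standard_normal(1000)
retrieved_mmr = true_mmr + 0.2 * rng.standard_normal(1000)

rms_error = np.sqrt(np.mean((retrieved_mmr - true_mmr) ** 2))
rms_true_variation = np.std(true_mmr)    # RMS variation about the mean

# Ensemble-weighted percent error: RMS error / RMS variation of the truth.
weighted_pct = 100.0 * rms_error / rms_true_variation
print(weighted_pct)  # roughly 40% for these synthetic numbers
```

Note that this ensemble-level ratio is deliberately not the RMS of per-profile percent errors; dividing each profile's error by its own true mixing ratio would overweight dry profiles.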

11.4.4.3 AIRS/AMSU Channel Selection

Thirty-seven percent (888 of the 2378) of the AIRS channels were discarded for the analysis, as the radiance values for these channels frequently were flagged as invalid by the AIRS calibration software. A simulated AIRS brightness temperature spectrum is shown in Figure 11.15, which shows the original 2378 AIRS channels and the 1490 channels that were selected for use with the PPC–NN algorithm. All 15 AMSU-A channels were used. The algorithm automatically discounts channels that are excessively corrupted by sensor noise (for example, AMSU-A channel 7 on EOS Aqua) or other interfering signals (for example, the effects of nonlocal thermodynamic equilibrium), because the corruptive signals are largely uncorrelated with the geophysical parameters that are to be estimated.

FIGURE 11.14 Temperature and water vapor profile statistics for the validation data set used in the analysis. See the text for details on how the statistics are computed at each layer.

FIGURE 11.15 A typical AIRS brightness temperature spectrum (simulated). 1490 of the 2378 AIRS channels were selected.

11.4.4.4 PPC–NN Retrieval Enhancements for Variable Sensor Scan Angle and Surface Pressure

When dealing with global AIRS/AMSU data, a variety of scan angles and surface pressures must be accommodated. Therefore, two additional inputs were added to the NNs discussed previously: (1) the secant of the scan angle and (2) the forecast surface pressure (in mbar) divided by 1013.25. The resulting temperature and water vapor profile estimates were reported on a variable pressure grid anchored by the forecast surface pressure. Because the number of inputs to the NNs increased, the number of hidden nodes in the NNs used for temperature retrievals was increased from 15 to 20. For the water vapor retrievals, the number of hidden nodes in the first hidden layer was maintained at 25, but a second hidden layer of 15 hidden nodes was added.

11.4.4.5 Retrieval Performance

We now compare the retrieval performance of the PPC–NN, linear regression, and AIRS Level-2 methods. For both the ocean and land cases, the PPC–NN and linear regression retrievals were derived using the same training set, and the same validation set was used for all methods. The temperature profile retrieval performance over ocean for the linear regression retrieval, the PPC–NN retrieval, and the AIRS Level-2 retrieval is shown in Figure 11.16, and the water vapor retrieval performance is shown in Figure 11.17. The error statistics were calculated using the 13,156 (out of 40,000) AIRS Level-2 retrievals that converged successfully. A bias of approximately 1 K near 100 mbar was found between the AIRS Level-2 temperature retrievals and the ECMWF data (ECMWF was colder). This bias was removed prior to computation of the AIRS Level-2 retrieval error statistics, which are shown in Figure 11.16.
The temperature profile retrieval performance over land for the linear regression retrieval, the PPC–NN retrieval, and the AIRS Level-2 retrieval is shown in Figure 11.18, and the water vapor retrieval performance is shown in Figure 11.19. The error statistics were calculated using the 4,700 (out of 10,000) AIRS Level-2 retrievals that converged successfully. There are several features in Figure 11.16 through Figure 11.19 that are worthy of note. First, for all retrieval methods, the performance over land is worse than that over ocean, as expected. The cloud-clearing problem is significantly more difficult over land, as variations in surface emissivity can be mistaken for cloud perturbations, resulting in improper radiance corrections. Second, the magnitude of the temperature profile error degradation for land versus ocean is larger for the PPC–NN algorithm than for the AIRS Level-2 algorithm. In fact, the temperature profile retrieval performance of the AIRS Level-2 algorithm is superior to that of the PPC–NN algorithm throughout most of the lower troposphere over land. Further analyses of this discrepancy suggest that the performance of the PPC–NN method over elevated terrain is suboptimal and could be improved. This work is currently underway.

11.4.4.6 Retrieval Sensitivity to Cloud Amount

A plot of the temperature retrieval error in the layer closest to the surface as a function of the cloud fraction retrieved by the AIRS Level-2 algorithm is shown in Figure 11.20.

FIGURE 11.16 Temperature retrieval performance of the PPC–NN, linear regression, and AIRS Level-2 methods over ocean. Statistics were calculated over 13,156 fields of regard.

Similar curves for the water vapor retrieval performance are shown in Figure 11.21. Both methods produce temperature and moisture retrievals with RMS errors near 1 K and 15%, respectively, even in cases with large cloud fractions. The figures show that the PPC–NN temperature and moisture retrievals are less sensitive than the AIRS Level-2 retrievals to cloud amount. Furthermore, it has been shown that the PPC–NN retrieval technique is relatively insensitive to sensor scan angle, orbit type, and training set comprehensiveness [26].

11.4.5 Discussion and Future Work

Although the PPC–NN performance results presented in the previous section are very encouraging, several caveats must be mentioned. The ECMWF fields used for "ground truth" contain errors, and the NN will tune to these errors as part of its training process. Therefore, the PPC–NN RMS errors shown in the previous section may be underestimated, and the AIRS Level-2 RMS errors may be overestimated, as the ECMWF data is not an accurate representation of the true state of the atmosphere. As a result, the "true" spread between the performance of the PPC–NN and AIRS Level-2 algorithms is almost certainly smaller than that shown here. Work is currently underway to test the performance of both the PPC–NN and AIRS Level-2 algorithms with additional ground-truth data, including radiosonde data and ground- and aircraft-based measurements. It should be noted that the PPC–NN algorithm as implemented in this work is currently not a stand-alone system, as both AIRS cloud-cleared radiances and quality flags produced by the AIRS Level-2 algorithm are required. Future work is planned to adapt the PPC–NN algorithm for use directly on cloudy AIRS/AMSU radiances and to produce quality assessments of the retrieved products. Finally, assimilation of PPC–NN-derived atmospheric parameters into NWP models is planned, and the resulting impact on forecast accuracy will be an excellent indicator of retrieval quality.

FIGURE 11.17 Water vapor (mass mixing ratio) retrieval performance of the PPC–NN, linear regression, and AIRS Level-2 methods over ocean. Statistics were calculated over 13,156 fields of regard.

11.5 Summary

The PPC–NN temperature and moisture profile retrieval technique combines a linear radiance compression operator with an NN estimator. The PPC transform was shown to be well suited for this application because information correlated to the geophysical quantity of interest is optimally represented with only a few dozen components. This substantial amount of radiance compression (approximately a factor of 100) allows relatively small NNs to be used, thereby improving both the stability and computational efficiency of the algorithm. Test cases with observed partially cloudy AIRS/AMSU data demonstrate that the PPC–NN temperature and moisture retrievals yield accuracies commensurate with those of physical methods at a substantially reduced computational burden. Retrieval accuracies (defined as agreement with ECMWF fields) near 1 K for temperature and 25% for water vapor mass mixing ratio in layers of approximately 1-km thickness were obtained using the PPC–NN retrieval method with AIRS/AMSU data in partially cloudy areas. PPC–NN retrievals with partially cloudy AIRS/AMSU data over land were also performed. The PPC–NN retrieval technique is relatively insensitive to cloud amount, sensor scan angle, orbit type, and training set comprehensiveness. These results further suggest that the AIRS Level-2 algorithm that produced the cloud-cleared radiances and quality flags used by the PPC–NN retrieval is performing well. The high level of performance achieved by the PPC–NN algorithm suggests it would be a suitable candidate for the retrieval of geophysical parameters other than temperature and moisture from high-resolution spectral data. Potential applications include the retrieval of ozone profiles and trace gas amounts. Future work will involve further evaluation of the algorithm with simulated and observed partially cloudy data, including global radiosonde data and ground- and aircraft-based observations.

FIGURE 11.18 Temperature retrieval performance of the PPC–NN, linear regression, and AIRS Level-2 methods over land. Statistics were calculated over 4,700 fields of regard.

FIGURE 11.19 Water vapor (mass mixing ratio) retrieval performance of the PPC–NN, linear regression, and AIRS Level-2 methods over land. Statistics were calculated over 4,700 fields of regard.


FIGURE 11.20 Cumulative RMS temperature error in the layer closest to the surface. Pixels were ranked in order of increasing cloudiness according to the retrieved cloud fraction from the AIRS Level-2 algorithm. No retrievals were attempted by the AIRS Level-2 algorithm if the retrieved cloud fraction exceeded 80%.


FIGURE 11.21 Cumulative RMS water vapor error in the layer closest to the surface. Pixels were ranked in order of increasing cloudiness, according to the retrieved cloud fraction from the AIRS Level-2 algorithm. No retrievals were attempted by the AIRS Level-2 algorithm if the retrieved cloud fraction exceeded 80%.

Acknowledgments

The author would like to thank NOAA NESDIS, Office of Systems Development, and the National Aeronautics and Space Administration for financial support of this work. The colocated AIRS/AMSU/ECMWF data sets were provided by the AIRS Science Team, with assistance from M. Goldberg (NOAA NESDIS, Office of Research Applications). The author would also like to thank L. Zhou for help in obtaining and interpreting the AIRS/AMSU/ECMWF data sets, and D. Staelin and P. Rosenkranz for many useful discussions on all facets of this work. This work was supported by the National Oceanic and Atmospheric Administration under Air Force Contract F19628-00-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the author and are not necessarily endorsed by the United States Government.

References

1. H. Hotelling, The most predictable criterion, J. Educ. Psychol., vol. 26, pp. 139–142, 1935.
2. J. Escobar-Munoz, A. Chédin, F. Cheruy, and N. Scott, Réseaux de neurones multi-couches pour la restitution de variables thermodynamiques atmosphériques à l'aide de sondeurs verticaux satellitaires, Comptes Rendus de l'Académie des Sciences, Série II, vol. 317, no. 7, pp. 911–918, 1993.


3. D.H. Staelin, Passive remote sensing at microwave wavelengths, Proc. IEEE, vol. 57, no. 4, pp. 427–439, 1969.
4. M.A. Janssen, Atmospheric Remote Sensing by Microwave Radiometry, New York, John Wiley & Sons, Inc., 1993.
5. C.D. Rodgers, Retrieval of atmospheric temperature and composition from remote measurements of thermal radiation, Rev. Geophys. Space Phys., vol. 14, no. 4, pp. 609–624, 1976.
6. M.D. Goldberg, Y. Qu, L.M. McMillin, W. Wolf, L. Zhou, and M. Divakarla, AIRS near-real-time products and algorithms in support of operational numerical weather prediction, IEEE Trans. Geosci. Rem. Sens., vol. 41, no. 2, pp. 379–389, 2003.
7. H. Huang and P. Antonelli, Application of principal component analysis to high-resolution infrared measurement compression and retrieval, J. Appl. Meteorol., vol. 40, pp. 365–388, Mar. 2001.
8. F. Aires, W.B. Rossow, N.A. Scott, and A. Chédin, Remote sensing from the infrared atmospheric sounding interferometer instrument: 1. Compression, denoising, first-guess retrieval inversion algorithms, J. Geophys. Res., vol. 107, Nov. 2002.
9. W.J. Blackwell and D.H. Staelin, Cloud flagging and clearing using high-resolution infrared and microwave sounding data, IEEE Int. Geosci. Rem. Sens. Symp. Proc., June 2002.
10. J.B. Lee, A.S. Woodyatt, and M. Berman, Enhancement of high spectral resolution remote-sensing data by a noise-adjusted principal components transform, IEEE Trans. Geosci. Rem. Sens., vol. 28, pp. 295–304, May 1990.
11. W.J. Blackwell, Retrieval of cloud-cleared atmospheric temperature profiles from hyperspectral infrared and microwave observations, Ph.D. dissertation, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, June 2002.
12. K.M. Hornik, M. Stinchcombe, and H. White, Multilayer feedforward networks are universal approximators, Neural Netw., vol. 2, no. 5, pp. 359–366, 1989.
13. S. Haykin, Neural Networks: A Comprehensive Foundation, New York, Macmillan College Publishing Company, 1994.
14. D.E. Rumelhart, G. Hinton, and R. Williams, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations, D.E. Rumelhart and J.L. McClelland, Eds., Cambridge, MA, MIT Press, 1986.
15. M.T. Hagan and M.B. Menhaj, Training feedforward networks with the Marquardt algorithm, IEEE Trans. Neural Netw., vol. 5, pp. 989–993, Nov. 1994.
16. P.E. Gill, W. Murray, and M.H. Wright, The Levenberg–Marquardt method, in Practical Optimization, London, Academic Press, 1981.
17. C.M. Bishop, Neural Networks for Pattern Recognition, Oxford, Oxford University Press, 1995.
18. H.E. Motteler, L.L. Strow, L. McMillin, and J.A. Gualtieri, Comparison of neural networks and regression-based methods for temperature retrievals, Appl. Opt., vol. 34, no. 24, pp. 5390–5397, 1995.
19. F. Aires, A. Chédin, N.A. Scott, and W.B. Rossow, A regularized neural net approach for retrieval of atmospheric and surface temperatures with the IASI instrument, J. Appl. Meteorol., vol. 41, pp. 144–159, Feb. 2002.
20. F. Aires, W.B. Rossow, N.A. Scott, and A. Chédin, Remote sensing from the infrared atmospheric sounding interferometer instrument: 2. Simultaneous retrieval of temperature, water vapor, and ozone atmospheric profiles, J. Geophys. Res., vol. 107, Nov. 2002.
21. F.D. Frate and G. Schiavon, A combined natural orthogonal functions/neural network technique for the radiometric estimation of atmospheric profiles, Radio Sci., vol. 33, no. 2, pp. 405–410, 1998.
22. D. Nguyen and B. Widrow, Improving the learning speed of two-layer neural networks by choosing initial values of the adaptive weights, IJCNN, vol. 3, pp. 21–26, 1990.
23. J. Susskind, C.D. Barnet, and J.M. Blaisdell, Retrieval of atmospheric and surface parameters from AIRS/AMSU/HSB data in the presence of clouds, IEEE Trans. Geosci. Rem. Sens., vol. 41, no. 2, pp. 390–409, 2003.
24. J. Susskind et al., Accuracy of geophysical parameters derived from AIRS/AMSU as a function of fractional cloud cover, J. Geophys. Res., vol. 111, May 2006.


25. H.H. Aumann et al., Validation of AIRS/AMSU/HSB core products for data release version 3.0, NASA JPL Tech. Rep. D-26538, Aug. 2003.
26. W.J. Blackwell, A neural-network technique for the retrieval of atmospheric temperature and moisture profiles from high spectral resolution sounding data, IEEE Trans. Geosci. Rem. Sens., vol. 43, no. 11, pp. 2535–2546, 2005.

12 Satellite-Based Precipitation Retrieval Using Neural Networks, Principal Component Analysis, and Image Sharpening*

Frederick W. Chen

CONTENTS
12.1 Introduction ............................................................... 233
12.2 Physical Basis of Passive Microwave Precipitation Sensing .............. 234
12.3 Description of AMSU-A/B and AMSU/HSB ................................... 240
12.4 Signal Processing ...................................................... 242
     12.4.1 Regional Laplacian Filtering .................................... 242
     12.4.2 Principal Component Analysis .................................... 243
            12.4.2.1 Basic PCA .............................................. 244
            12.4.2.2 Constrained PCA ........................................ 244
     12.4.3 Data Fusion ..................................................... 245
     12.4.4 Neural Nets ..................................................... 245
12.5 The Chen–Staelin Algorithm ............................................. 247
     12.5.1 Limb-Correction of Temperature Profile Channels ................. 249
     12.5.2 Detection of Precipitation ...................................... 251
     12.5.3 Cloud-Clearing by Regional Laplacian Interpolation .............. 252
     12.5.4 Image Sharpening ................................................ 252
     12.5.5 Temperature and Water Vapor Profile Principal Components ....... 254
     12.5.6 The Neural Net .................................................. 254
12.6 Retrieval Performance Evaluation ....................................... 255
     12.6.1 Image Comparisons of NEXRAD and AMSU-A/B Retrievals ............ 255
     12.6.2 Numerical Comparisons of NEXRAD and AMSU-A/B Retrievals ........ 256
     12.6.3 Global Retrievals of Rain and Snow .............................. 259
12.7 Conclusions ............................................................ 260
Acknowledgments ............................................................. 261
References .................................................................. 261

12.1 Introduction

Global estimation of precipitation is important to studies in areas such as atmospheric dynamics, hydrology, climatology, and meteorology. Improvements in methods for satellite-based estimation of precipitation can lead to improvements in weather forecasting, climate studies, and climate models. Precipitation presents many challenges because of its complicated physics and statistics. Existing models of clouds do not adequately account for all of the possible variations in clouds and precipitation that can be encountered. Instead of a physics-based approach, Chen and Staelin developed a method using a statistics-based approach. This method was developed for the Advanced Microwave Sounding Unit (AMSU) instruments AMSU-A/B on the NOAA-15, NOAA-16, and NOAA-17 satellites. The development process made use of the fact that although satellite-based passive microwave data cannot completely characterize the observed clouds and precipitation, the data can still yield useful information relevant to precipitation.

This chapter discusses the methods used by Chen and Staelin to extract from the data information related to atmospheric state variables that are highly correlated with precipitation. Principal component analysis for signal separation and data compression, data fusion for resolution sharpening, neural nets, and regional filtering are among the signal and image processing methods used. AMSU-A/B and the corresponding instruments AMSU/HSB (Humidity Sounder for Brazil) collect data near 23.8, 31.4, 54, 89, and 183 GHz. These frequency bands provide useful information about precipitation. In this chapter, the precipitation estimation algorithm developed by Chen and Staelin is discussed with an emphasis on the signal processing employed to sense important parameters about precipitation. Section 12.2 explains why it is possible to use the data on AMSU-A/B to estimate precipitation. Section 12.3 provides a more detailed description of AMSU-A/B and AMSU/HSB. Section 12.4 describes the Chen–Staelin algorithm. Section 12.7 provides concluding remarks.

*The material in this chapter is taken from Refs. [66] and [67].

12.2 Physical Basis of Passive Microwave Precipitation Sensing

Matter radiates thermal energy depending on its physical temperature and characteristic properties. When a spaceborne radiometer observes a location on the Earth, the amount of energy it receives depends on contributions from the various atmospheric and topographical constituents within its field of view (Figure 12.1). One useful quantity describing the amount of thermal radiation emitted by a body is spectral brightness. Spectral brightness is a measure of how much energy a body radiates at a specified frequency per unit receiving area, per transmitting solid angle, and per unit frequency. The spectral brightness of a blackbody (W ster⁻¹ m⁻² Hz⁻¹) is a function of its physical temperature T (K) and frequency f (Hz) and is given by the following formula:

  B(f, T) = 2hf³ / [c²(e^{hf/kT} − 1)]    (12.1)

where h is Planck's constant (J s), c is the speed of light (m/s), and k is Boltzmann's constant (J/K). For this chapter, f never exceeds 200 GHz, and T never falls below 100 K, so hf/kT < 0.1. Then, the Taylor series expansion for exponential functions is used to simplify Equation 12.1:

  e^{hf/kT} − 1 = [1 + (hf/kT) + (1/2!)(hf/kT)² + ···] − 1    (12.2)

  e^{hf/kT} − 1 ≈ hf/kT    (12.3)

FIGURE 12.1 The major components of the radiative transfer equation (note: θ ≠ φ since the surface of the Earth is spherical).

  B(f, T) = 2kT/λ²    (12.4)

where λ is the wavelength (m) associated with the frequency f. For blackbodies, given an observation of spectral brightness, one can calculate the physical temperature of a blackbody as follows:

  T = B(f, T)λ²/(2k)    (12.5)

Unlike blackbodies, gray bodies reflect some of the energy incident on them, so the intrinsic spectral brightness of a gray body is not equal to that of a blackbody. For a gray body,

  B(f, T) = ε(2kT/λ²)    (12.6)

where ε is the emissivity of the gray body. This equation is rewritten as follows:

  B(f, T) = (2k/λ²)εT    (12.7)

The quantity εT is called the brightness temperature of an unilluminated gray body; that is, the temperature of a blackbody radiating the same brightness. Another useful property of a gray body is its reflectivity r, the fraction of incident energy that is reflected. A body in thermal equilibrium emits the same amount of energy that it absorbs. Therefore, r + ε = 1. For a blackbody, ε = 1 and r = 0. In an atmosphere without hydrometeors, the brightness temperature observed by an Earth-observing spaceborne radiometer can be divided into four components (Figure 12.1):

1. Ta, the brightness temperature due to radiation emitted by atmospheric gases that is not reflected off the surface


2. Tb, the brightness temperature due to radiation emitted by the surface
3. Tc, the brightness temperature due to radiation that is emitted by atmospheric gases and is reflected off the surface
4. Td, the brightness temperature due to cosmic background radiation that passes through the atmosphere twice and is reflected off the surface

Thermal radiation can be attenuated through absorption or reflected off the surface before being received by a radiometer. The observed brightness temperature is, therefore, a function of several variables such as the atmospheric temperature profile, the water vapor profile, the emissivity of the surface (as a function of satellite zenith angle), and the absorption coefficients of atmospheric gases (as a function of altitude z). Given the atmospheric absorption coefficients ka(z) (in Np per unit length), the reflectivity of the surface r(θ), the altitude of the satellite H, the cosmic background temperature Tcosmic (in K), and the satellite zenith angle θ, the brightness temperature components can be computed as follows (assuming specular surface reflection and the absence of hydrometeors):

  Ta = sec θ ∫_{z0}^{H} T(z′) ka(z′) e^{−τ(z′,H) sec θ} dz′    (12.8)

  Tb = [1 − r(θ)] T(z0) e^{−τ(z0,H) sec θ}    (12.9)

  Tc = r(θ) sec θ e^{−τ(z0,H) sec θ} ∫_{z0}^{H} T(z′) ka(z′) e^{−τ(z0,z′) sec θ} dz′    (12.10)

  Td = r(θ) Tcosmic e^{−2τ(z0,H) sec θ}    (12.11)

Then, the brightness temperature TB measured by the radiometer is the sum of Ta, Tb, Tc, and Td:

  TB = sec θ ∫_{z0}^{H} T(z′) ka(z′) e^{−τ(z′,H) sec θ} dz′ + [1 − r(θ)] T(z0) e^{−τ(z0,H) sec θ}
       + r(θ) sec θ e^{−τ(z0,H) sec θ} ∫_{z0}^{H} T(z′) ka(z′) e^{−τ(z0,z′) sec θ} dz′
       + r(θ) Tcosmic e^{−2τ(z0,H) sec θ}    (12.12)

where z0 is the altitude of the surface and τ(z1, z2) is the integrated atmospheric absorption of the atmosphere between altitudes z1 and z2:

  τ(z1, z2) = ∫_{z1}^{z2} ka(z) dz    (12.13)

Equation 12.12 can be rewritten as follows (assuming that the contribution of thermal radiation from altitudes above H to Tc is negligible):

  TB ≈ ∫_{z0}^{H} T(z′) W(z′) dz′ + [1 − r(θ)] T(z0) e^{−τ(z0,H) sec θ} + r(θ) Tcosmic e^{−2τ(z0,H) sec θ}    (12.14)

FIGURE 12.2 Zenith opacity (dB) for the microwave spectrum from 10 to 500 GHz, computed for the 1976 Standard Atmosphere with and without water vapor; the 54-GHz, 118-GHz, and 425-GHz O2 bands and the 183-GHz H2O band are marked.

where

  W(z) = sec θ ka(z) e^{−τ(z,H) sec θ} + r(θ) sec θ e^{−τ(z0,H) sec θ} ka(z) e^{−τ(z0,z) sec θ}    (12.15)

W(z) is called the weighting function or Jacobian. Equation 12.12 shows that the relationship between the thermal energy radiated and the physical temperature of a body is linear; that is, the brightness temperature can be expressed as a linear integral of physical temperatures in the field of view, where these radiated signals are uncorrelated and superimposed [1]. The previous section shows that the brightness temperatures seen by a satellite-borne radiometer depend on many variables in a highly complex and nonlinear fashion, so retrievals of temperature and water vapor profiles by direct inversion would be very difficult. However, the physics of the atmosphere still allows the extraction of useful information about the atmosphere from microwave frequency bands. Two of the most important determinants of precipitation rate are the temperature and water vapor profiles. The presence of oxygen and water vapor resonance frequencies in the microwave spectrum and the presence of oxygen and water vapor in the atmosphere result in frequency bands that are sensitive primarily to a specific range of altitudes. Figure 12.2 shows the zenith opacity as a function of frequency for the range from 10 to 500 GHz for a ground-based zenith-observing radiometer. There are absorption spikes around oxygen resonance frequencies, such as those in the neighborhood of 54, 118.75, and 424.76 GHz, and at water vapor resonance frequencies such as 183.31 GHz. A satellite-borne radiometer observing at these frequencies would sense only the highest layers of the atmosphere. On the other hand, a satellite-borne radiometer observing at frequencies away from the spikes would be sensitive to the surface. By observing at frequency bands that are near the resonance frequencies, but still on the sides of the spikes, one can capture information on conditions at specific layers of the atmosphere. The AMSU focuses primarily on the 54-GHz oxygen band and the 183-GHz water vapor band.
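The behavior of the weighting function, and the internal consistency of Equation 12.12, can be illustrated numerically with a toy atmosphere. In this hedged sketch the scale height, total opacity, and grid are assumed values, and the reflection terms are dropped (r = 0). Two properties are checked: for an isothermal atmosphere over a blackbody surface at the same temperature, TB must equal that temperature, and the weighting function peaks near the altitude where the overlying opacity times sec θ reaches 1.

```python
import numpy as np

# Toy clear-air atmosphere: exponentially decaying absorption, nadir view,
# non-reflective surface (r = 0). Scale height and opacity are assumed values.
z = np.linspace(0.0, 50.0, 2001)            # altitude (km); z0 = 0, H = 50 km
dz = z[1] - z[0]
Hs = 8.0                                    # absorber scale height (km)
tau_total = 3.0                             # total zenith opacity (Np)
ka = (tau_total / Hs) * np.exp(-z / Hs)     # absorption coefficient (Np/km)

# tau(z, H): opacity from altitude z up to the satellite (trapezoid rule)
ka_mid = 0.5 * (ka[:-1] + ka[1:])
tau_up = np.append((np.cumsum(ka_mid[::-1]) * dz)[::-1], 0.0)

sec = 1.0                                   # sec(theta) at nadir
W = sec * ka * np.exp(-tau_up * sec)        # weighting function (r = 0 terms only)

# The peak sits where tau(z, H) sec(theta) = 1; here that is near Hs*ln(3)
z_peak = z[np.argmax(W)]

# Consistency check: an isothermal atmosphere over a blackbody surface at the
# same temperature must yield a brightness temperature equal to it.
T0 = 250.0
integral_W = np.sum(0.5 * (W[:-1] + W[1:])) * dz
TB = T0 * integral_W + T0 * np.exp(-tau_up[0] * sec)
print(f"peak at {z_peak:.1f} km, TB = {TB:.2f} K")
```

Moving the channel closer to a resonance raises tau_total and pushes the peak upward, which is exactly how the 54-GHz and 183-GHz channel sets sample different atmospheric layers.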
Figure 12.3 shows the weighting functions of the channels on AMSU.

FIGURE 12.3 Weighting functions of AMSU-A and AMSU-B channels.

Opaque microwave channels have been used with great success to retrieve atmospheric conditions. Rosenkranz used AMSU-A and AMSU-B data from NOAA satellites to estimate temperature and water vapor profiles [2,3]. Shi used AMSU-A to estimate temperature profiles [4]. Blackwell et al. used the 54-GHz and 118-GHz bands aboard the National Polar-Orbiting Environmental Satellite System (NPOESS) Aircraft Sounder Testbed-Microwave (NAST-M) [5]. Leslie et al. added the 183-GHz and 425-GHz bands to NAST-M [6–8].

Clouds and precipitation result from humid air that rises, cools, and condenses. Precipitation typically occurs in two forms: convective and stratiform. Stratiform precipitation occurs when one air mass slides under or over another, as in a cold or warm front, causing the upper air mass to cool and the water vapor within it to condense. Such precipitation is spread out. Convective precipitation is initiated by instabilities in the atmosphere caused by cold, dense air supported by warm, humid, and less dense air. Such instabilities result in the warm, humid air escaping upward and cooling. Water vapor in this ascending air mass condenses and releases latent heat. The latent heat warms the surrounding air, which pushes the air mass and the water and ice particles that have formed further upward. This cycle continues until the original warm, humid air mass has cooled to the temperature of the surrounding cooler air [9,10]. The tops of convective clouds spew forth ice particles and can reach more than 10 km above sea level [11]. Convective precipitation often occurs within stratiform precipitation.

Several factors affect the precipitation rate. Higher degrees of instability caused by large vertical temperature gradients force ice particles higher in the atmosphere, causing such particles to pick up more moisture and grow. Precipitation amounts are limited by the water vapor available in the atmosphere; therefore, higher concentrations of water vapor result in higher precipitation rates. Warmer surface air contributes to higher precipitation rates because warmer air holds more water vapor.
Channels in the 54-GHz and 183-GHz bands provide important clues about factors such as cloud-top altitude, temperature profile, water vapor profile, and particle size distribution. Because each channel is sensitive to a specific layer of the atmosphere (Figure 12.3), it is possible to extract information about the cloud-top altitude. Precipitating clouds exhibit perturbations in channels whose weighting functions have significant values in the range of altitudes occupied by the cloud. For example, one would not expect to detect low-lying clouds below 3 km in AMSU-A channel 14 because the weighting function of channel 14 peaks at 40 km, far above the tops of nearly all precipitating clouds.

The 54-GHz and 183-GHz bands together provide information about particle size distribution through their sensitivities to different ranges of ice particle diameters. The scattering of electromagnetic waves by spherical ice particles is described by Mie scattering coefficients. For a single spherical particle with radius r, given the power scattered by the particle Ps and the power density of the incident plane wave Si, the scattering cross-sectional area Qs of the particle is defined as follows:

  Qs = Ps / Si    (12.16)

Then, the Mie scattering coefficient is defined as the ratio of Qs to the geometric cross-sectional area of the particle:

  ξs = Qs / (πr²)    (12.17)

where r is the radius of the particle. Figure 12.4 shows the Mie scattering coefficients for fresh-water ice spheres at 54 GHz and 183.31 GHz as a function of diameter. The permittivities of ice were calculated using formulas developed by Hufford [12], and the Mie scattering coefficients were calculated using an iterative computational procedure developed by Deirmendjian [1,13]. At 183.31 GHz, the diameter of a particle can increase to about 0.7 mm before it can no longer be uniquely determined from the Mie scattering coefficient curve. At 54 GHz, this limit is about 2.4 mm. For diameters less than 0.7 mm, ξs for 54 GHz is smaller than that for 183.31 GHz by a factor of at least 100.

FIGURE 12.4 Mie scattering coefficients for fresh-water ice spheres at 54 GHz and 183.31 GHz at a temperature of 218 K (−55°C).

These differences make it possible to use both bands together to extract information about particle size distributions. The Mie scattering coefficients in Figure 12.4 were calculated for a temperature of −55°C (218 K), near the temperature at an altitude of 10 km for the 1962 U.S. Standard Atmosphere [1]. Corresponding values for −10°C (263 K) were also calculated, but did not differ from those for −55°C by more than 7.2%. In addition to particle size distribution, the 54-GHz and 183-GHz bands can also provide information about particle abundance. While the particle size distribution can be sensed from a comparison of the Mie scattering efficiencies in both bands, particle abundance can be sensed from absolute scattering over volumes.

The cloud-top altitude is another important variable in precipitation. It is correlated with particle size because only higher updraft velocities are able to support larger particles and reach higher altitudes. The sensitivities of the 54-GHz and 183-GHz channels to specific layers of the atmosphere suggest that these channels are able to provide information about cloud-top altitude. Spina et al. used data from the opaque 118-GHz band to estimate cloud-top altitudes [14]. Gasiewski showed that the 54-GHz and 118-GHz bands together could be used to estimate cell-top altitude and hydrometeor density (in terms of mass per volume) [15,16].

Window channels contribute information about precipitation through their sensitivity to the warm emission signatures of precipitating clouds against a sea background and their sensitivity to scattering. In addition to opaque channels, AMSU also includes window channels at 23.8 GHz, 31.4 GHz, 89.0 GHz, and 150 GHz. Weng et al. and Grody et al. have developed a precipitation-retrieval algorithm for AMSU that relies primarily on these channels [17,18].
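The frequency leverage between the two bands can be illustrated in the small-particle (Rayleigh) limit of Mie scattering, where the efficiency scales as the fourth power of the size parameter. This is a simplified stand-in for the full Mie computation used in the chapter, and the ice refractive index m ≈ 1.78 is an assumed, roughly frequency-flat value:

```python
import math

# Small-particle (Rayleigh) limit of the Mie scattering efficiency for a
# dielectric sphere:
#   Q_sca / (pi r^2) = (8/3) x^4 |(m^2 - 1)/(m^2 + 2)|^2,  x = pi d / lambda
c = 2.99792458e8      # speed of light (m/s)
m = 1.78              # assumed refractive index of ice (microwave, real part)

def rayleigh_efficiency(freq_hz, d_m):
    lam = c / freq_hz
    x = math.pi * d_m / lam               # size parameter
    K = (m**2 - 1.0) / (m**2 + 2.0)
    return (8.0 / 3.0) * x**4 * K**2

d = 0.2e-3            # 0.2-mm diameter, well below the 0.7-mm limit
ratio = rayleigh_efficiency(183.31e9, d) / rayleigh_efficiency(54e9, d)
print(f"183.31/54 GHz efficiency ratio: {ratio:.0f}")
```

In this limit the ratio is simply (183.31/54)⁴ ≈ 133, independent of particle size, consistent with the factor-of-100-plus separation quoted above for small diameters.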
Some of the recent passive microwave instruments that have focused on window channels and have been used to study precipitation include the following:

1. The Special Sensor Microwave Imager (SSM/I) for the Defense Meteorological Satellite Program (DMSP) [19–21]
2. The Advanced Microwave Scanning Radiometer (AMSR-E) for the Earth Observing System aboard the NASA Aqua satellite [22,23]
3. The Tropical Rainfall Measuring Mission (TRMM) Microwave Imager (TMI) aboard the TRMM satellite [24,25]

One weakness of window channels is that they tend to be sensitive to the surface. Over land, surface signatures can obscure the emission signatures of precipitation. Also, window-channel-based precipitation-rate retrieval algorithms tend to use one method over ocean and another over land [26,27]. The opaque channels aboard AMSU enable the development of an algorithm that uses the same method over both land and sea.

12.3

Description of AMSU-A/B and AMSU/HSB

The AMSU has been aboard the NOAA-15, NOAA-16, and NOAA-17 satellites, launched in May 1998, September 2000, and June 2002, respectively. They are each equipped with the instruments AMSU-A and AMSU-B. AMSU-A has 15 channels: one each at 23.8 GHz, 31.4 GHz, and 89.0 GHz, and 12 channels in the 54-GHz oxygen absorption band (Table 12.1). AMSU-B has 5 channels at the following frequencies: 89.0 GHz, 150 GHz, 183.31 ± 1 GHz, 183.31 ± 3 GHz, and 183.31 ± 7 GHz (Table 12.2). AMSU-A and AMSU-B measure

Satellite-Based Precipitation Retrieval Using Neural Networks


TABLE 12.1

AMSU-A Channel Frequencies

Channel    Center Frequencies (MHz)        Bandwidth (MHz)
1          23,800 ± 72.5                   2 × 125
2          31,400 ± 50                     2 × 80
3          50,300 ± 50                     2 × 80
4          52,800 ± 105                    2 × 190
5          53,596 ± 115                    2 × 168
6          54,400 ± 105                    2 × 190
7          54,940 ± 105                    2 × 190
8          55,500 ± 87.5                   2 × 155
9          57,290.344 ± 87.5               2 × 155
10         57,290.344 ± 217                2 × 77
11         57,290.344 ± 322.2 ± 48         4 × 35
12         57,290.344 ± 322.2 ± 22         4 × 15
13         57,290.344 ± 322.2 ± 10         4 × 8
14         57,290.344 ± 322.2 ± 4.5        4 × 3
15         89,000 ± 900                    2 × 1000

brightness temperatures at 50- and 15-km nominal resolutions at nadir, respectively. AMSU-A has a 3.33°-diameter 3-dB beamwidth and observes at 30 angles spaced at 3.33° intervals up to 48.33° from nadir every 8.00 sec. AMSU-B has a 1.1°-diameter beamwidth and observes at 90 angles spaced at 1.1° intervals up to 48.95° from nadir every 2.67 sec (Figure 12.5b) [28–30]. AMSU covers a swath width of 2200 km. NOAA-15, NOAA-16, and NOAA-17 are sun-synchronous polar-orbiting satellites with equatorial crossing times of about 7 A.M./P.M., 2 A.M./P.M., and 10 A.M./P.M., respectively, so together they observe most locations approximately six times a day (Figure 12.6). The ascending local equatorial crossing times are 7 P.M., 2 P.M., and 10 P.M. for NOAA-15, NOAA-16, and NOAA-17, respectively. The 15-km, 89.0-GHz channel on AMSU-B was not used in this research so that the algorithm developed for AMSU-A/B could also be used with AMSU/HSB with minimal adjustment. We also use data from the NASA Aqua satellite, launched in May 2002. It is equipped with AMSU, which is identical to AMSU-A aboard the NOAA satellites, and HSB, which is identical to AMSU-B but without the 89.0-GHz channel [31]. The nominal resolutions of AMSU and HSB are 40.5 and 13.5 km, respectively. Aqua has an equatorial crossing time of about 1:30 A.M./P.M. (Figure 12.6) [31–33]. The scan pattern of AMSU/HSB is slightly different from that of AMSU-A/B on the NOAA satellites in that the path traced during one AMSU scan is more parallel to that traced by a nearly coincident HSB scan (Figure 12.5).

TABLE 12.2

AMSU-B Channel Frequencies

Channel    Center Frequencies (GHz)    Bandwidth (GHz)
1          150 ± 0.9                   2 × 1
2          183.31 ± 1                  2 × 0.5
3          183.31 ± 3                  2 × 1
4          183.31 ± 7                  2 × 2

Signal and Image Processing for Remote Sensing


FIGURE 12.5 Scan patterns of (a) Aqua AMSU/HSB and (b) AMSU-A/B on NOAA-15, NOAA-16, and NOAA-17. AMSU and AMSU-A spots are labeled with +'s and HSB and AMSU-B spots with dots.

12.4

Signal Processing

The preceding section provides an overview of the types of information available in data from AMSU-A/B. While developing an algorithm for estimating precipitation, it is important to process the data in a way that makes the information relevant to precipitation stand out as much as possible. This section provides an overview of the signal-processing methods used in the Chen and Staelin algorithm.

12.4.1

Regional Laplacian Filtering

Laplacian filtering is useful for clearing the effects of clouds over regions that are identified as cloudy. This enables computation of not only cloud-cleared brightness temperatures but also the perturbation due to precipitation. Laplacian filtering gives the Chen and Staelin algorithm a spatial filtering component not found in other algorithms, which process the data in a manner that treats each pixel independently of any other pixel. Laplacian filtering of a rectangular field F is done by first determining the region over which filtering is to be done. Using this region, a set of boundary conditions is determined. Then, using these boundary conditions, values for pixels in the region of interest are computed so that the discrete Laplace's equation is satisfied. In a continuous two-dimensional (2D) domain, Laplace's equation is as follows:

∇²F = 0    (12.18)
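As a concrete illustration, regional Laplacian interpolation can be sketched with a plain Jacobi relaxation on the flagged region; the function name and the relaxation scheme below are our own illustration of the definition above, not the chapter's implementation:

```python
import numpy as np

def laplacian_interpolate(field, mask, iters=20000, tol=1e-9):
    """Replace pixels where mask is True with values satisfying the
    discrete Laplace equation, using the unmasked pixels as fixed
    boundary conditions (plain Jacobi relaxation)."""
    f = field.astype(float).copy()
    for _ in range(iters):
        # Average of the four neighbors; edge replication handles borders.
        p = np.pad(f, 1, mode="edge")
        avg = 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:])
        new = np.where(mask, avg, f)   # relax only the flagged region
        if np.max(np.abs(new - f)) < tol:
            return new
        f = new
    return f
```

The precipitation-induced perturbation over the flagged region is then simply field − laplacian_interpolate(field, mask).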


FIGURE 12.6 Orbital patterns of the NOAA-15, NOAA-16, NOAA-17, and Aqua satellites.

12.4.2

Principal Component Analysis

Signal separation is an important concept in this chapter. Although precipitation is very complex, nonstationary, and sporadic, one can process the data in a way that adequately separates the different degrees of freedom that affect precipitation. Principal component analysis (PCA) is a linear method for reducing the dimensionality of a data set of interrelated variables. PCA transforms the data into a set of uncorrelated random variables that capture all of the variance of the original data set and assigns as much variance as possible to the fewest variables. PCA is also known by other names such as singular value decomposition (SVD) and the Karhunen–Loève transform (KLT). PCA is useful for data compression. This feature can be critical in problems related to the compression of satellite data, where a large amount of data must be downloaded from a satellite using a communications link with limited bandwidth, and in situations where computational resources are limited. Blackwell used the projected principal component transform, a variant of PCA, to compress data for use in estimating atmospheric temperature profiles. This reduced the number of inputs and, as a result, simplified the neural net, created a more stable neural net, and reduced the training time [34,35]. Cabrera-Mercader used noise-adjusted principal components (NAPC) to compress simulated NASA Atmospheric Infrared Sounder (AIRS) data [36].


PCA is also used to filter out noise in data. Several different versions or extensions of PCA have been used to eliminate various types of noise. A variant of PCA has been used to remove signatures of the surface from passive microwave data for the purpose of detecting and characterizing precipitating clouds [37]. PCA is also an important part of blind multivariate noise estimation and filtering algorithms such as iterative order and noise (ION) estimation and an extension of ION that is capable of estimating mixing matrices [38–40].

12.4.2.1 Basic PCA

Here, a definition of basic PCA is presented. For a random vector x of p random variables, the principal components of x can be defined inductively. The first principal component is the product a1ᵀx, where a1 is a unit-length vector that maximizes the variance of a1ᵀx:

a1 = arg max_{‖a‖=1} Var(aᵀx)    (12.19)

where Var(·) denotes the variance of a random variable. Each of the other principal components is defined as follows: the nth principal component is the product anᵀx, where an is a unit-length vector that is orthogonal to a1, a2, . . . , an−1 and maximizes the variance of anᵀx:

an = arg max_{‖a‖=1, a⊥ai ∀i∈{1, 2, ..., n−1}} Var(aᵀx)    (12.20)

a1, a2, . . . , and ap are derived in Ref. [41]. The nth principal component is the product anᵀx, where an is the eigenvector associated with the nth highest eigenvalue of the covariance matrix of x. In this chapter, the definitions of PCA and the term principal component follow the convention of Ref. [41].
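Computationally, the principal components are obtained from an eigendecomposition of the sample covariance matrix. A minimal numpy sketch of this eigenvector view of PCA (the function name is ours):

```python
import numpy as np

def principal_components(X, k):
    """Return the k leading principal-component directions (columns of A,
    unit length) and the corresponding scores for a data matrix X whose
    rows are observations."""
    Xc = X - X.mean(axis=0)             # center each variable
    cov = np.cov(Xc, rowvar=False)      # sample covariance matrix
    vals, vecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    order = np.argsort(vals)[::-1]      # sort descending by variance
    A = vecs[:, order[:k]]              # a_1 ... a_k as columns
    return A, Xc @ A                    # directions and PC scores
```

By construction the scores are mutually uncorrelated in the sample, and the first component carries the largest share of the variance.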

12.4.2.2 Constrained PCA

In addition to the basic PCA, the algorithm developed in this chapter uses a variation of PCA known as constrained PCA. This form of PCA finds principal components that are constrained to be orthogonal to a given subspace [41]. It can be used to filter out noise in remote-sensing data. Constrained PCA has been used to filter out signatures of surface variations from microwave remote-sensing data [42]. Filtering out unwanted signatures involves the following steps:

1. Select a set of data that captures a good representation of the type of noise to be filtered without capturing too much of the variation of the signal of interest
2. Apply basic PCA to the resulting subset of data
3. Examine the resulting principal components for sensitivity to the type of noise to be filtered out
4. Project the data onto the subspace orthogonal to the noise-sensitive principal components
5. Apply basic PCA to the data resulting from the projection

The principal components resulting from steps (2) and (5) are called preconstraint principal components and postconstraint principal components, respectively, as in Ref. [37].
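The projection and re-decomposition steps can be sketched as follows, assuming the noise-sensitive directions have already been identified (step 3 is a visual inspection in the text, so here they are simply passed in; names are ours):

```python
import numpy as np

def constrained_pca(X, noise_vectors, k):
    """Post-constraint PCA: project the centered data onto the subspace
    orthogonal to the given noise-sensitive directions (rows of
    noise_vectors), then apply basic PCA to the projected data."""
    Xc = X - X.mean(axis=0)
    # Orthonormal basis for the noise subspace, and the complementary projector.
    N = np.linalg.qr(np.asarray(noise_vectors).T)[0]
    P = np.eye(X.shape[1]) - N @ N.T
    Xp = Xc @ P
    vals, vecs = np.linalg.eigh(np.cov(Xp, rowvar=False))
    order = np.argsort(vals)[::-1]
    A = vecs[:, order[:k]]              # post-constraint principal components
    return A, Xp @ A
```

The returned directions are orthogonal to every noise-sensitive vector, so surface-like signatures cannot leak into the retained components.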

12.4.3

Data Fusion

Data fusion is a very broad area involving the combination of information from different sources. A working group set up by the European Association of Remote Sensing Laboratories and the French Society for Electricity and Electronics has adopted the following definition of data fusion [42]: Data fusion is a formal framework in which are expressed means and tools for the alliance of data originating from different sources. It aims at obtaining information of greater quality; the exact definition of ‘greater quality’ will depend upon the application.

Review papers have referred to three levels of data fusion: measurement, feature, and decision [43–45]. The measurement level is sometimes called the pixel level [44]. The algorithm described in Section 12.5 involves the measurement and decision levels. In this chapter, nontrivial uses of data fusion occur only at the measurement level. Some of the applications of image fusion (or data fusion applied to 2D data) include image sharpening or enhancement [45], feature enhancement, and replacement of missing or faulty data [44]. For this research, nonlinear data fusion is applied to sharpen images. Rosenkranz developed a method for nonlinear geophysical parameter estimation through multi-resolution data fusion [46–48].

12.4.4

Neural Nets

Neural nets are computational structures that were developed to mimic the way biological neural nets learn from their environment and are useful for pattern recognition and classification. Neural nets can be used to learn and compute functions for which the relationships between inputs and outputs are unknown or computationally complex. There are a variety of neural nets such as feedforward neural nets (sometimes called multilayer perceptrons [49]), Kohonen self-organizing feature maps, and Hopfield nets [50,51]. The feedforward neural net is used in this chapter. The basic structural element of feedforward neural nets is called a perceptron. It computes a function of the weighted sum of inputs and a bias, as shown in Figure 12.7:

y = f( Σ_{i=1}^n w_i x_i + b )    (12.21)

where xi is the ith input, wi is the weight associated with the ith input, b is the bias, f is the transfer function of the perceptron, and y is the output.

FIGURE 12.7 The structure of a perceptron.

FIGURE 12.8 A two-layer feedforward neural net with one output node.

Perceptrons can be combined to form a multi-layer network, as shown in Figure 12.8. In Figure 12.8, xi is the ith input, n is the number of inputs, wij is the weight associated with the connection from the ith input to the jth node in the hidden layer, bj is the bias of the jth hidden node, m is the number of nodes in the hidden layer, f is the transfer function of the perceptrons in the hidden layer, vj is the weight between the jth hidden node and the output node, c is the bias of the output node, g is the transfer function of the output node, and y is the output. Then,

y = g( Σ_{j=1}^m v_j f( Σ_{i=1}^n w_{ij} x_i + b_j ) + c )    (12.22)

In this chapter, f and g are defined as follows:

f(x) = tanh x = (e^x − e^{−x}) / (e^x + e^{−x})    (12.23)

g(x) = x    (12.24)

The function tanh x is approximately linear in the range −0.6 ≤ x ≤ 0.6, and approaches 1 as x tends to ∞ and −1 as x tends to −∞, so it has a nonlinearity that is not too complex (Figure 12.9). This neural net topology is good for situations in which one wants to develop a simple nonlinear estimator whose output depends approximately monotonically on each input.
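Equation 12.22 with these transfer functions amounts to a few lines of numpy (array names are ours):

```python
import numpy as np

def forward(x, W, b, v, c):
    """Forward pass of the two-layer net of Equation 12.22: a tanh hidden
    layer and a single linear output node. W is (n, m) input-to-hidden
    weights, b is (m,) hidden biases, v is (m,) hidden-to-output weights,
    and c is the scalar output bias."""
    hidden = np.tanh(x @ W + b)    # f = tanh, applied elementwise
    return float(hidden @ v + c)   # g is the identity
```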

FIGURE 12.9 Neural net transfer functions: tanh(x) (solid) and x (dashed).

The neural nets for this chapter were trained using the Levenberg–Marquardt training algorithm. Marquardt developed an efficient algorithm (called the Marquardt–Levenberg algorithm in Ref. [51]) for nonlinear least-squares parameter estimation [52]. Hagan and Menhaj incorporated this algorithm into a backpropagation training algorithm for feedforward neural nets [52]. The weights of the neural net were initialized using the Nguyen–Widrow method to facilitate convergence of the neural net weights during training [53]. The vectors used to train and evaluate the neural nets were divided into three disjoint sets:

1. The training set, used to determine how the weights of the neural net should be adjusted during training
2. The validation set, used to determine when the training should stop
3. The testing set, used to evaluate the resulting neural net

These definitions are from Ref. [54]. One of the challenges encountered in the course of developing an estimator involved dealing with an output range that covered several orders of magnitude. Chapter 4 in this volume describes how this was accomplished.
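The three-way split can be sketched as follows; the half/quarter/quarter proportions are those used for the retrieval network in Section 12.5.6 and are shown here only as an illustration:

```python
import numpy as np

def split_vectors(n, rng):
    """Partition n sample indices into disjoint training, validation, and
    testing sets (one half, one quarter, one quarter)."""
    idx = rng.permutation(n)           # shuffle so the sets are disjoint and random
    n_tr, n_va = n // 2, n // 4
    return idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
```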

12.5

The Chen–Staelin Algorithm

The basic structure of the algorithm includes some signal-processing components and a neural net, as shown in Figure 12.10.

FIGURE 12.10 Basic structure of the algorithm.

The signal-processing components process the data into forms that characterize the most important degrees of freedom related to the precipitation rate, such as atmospheric temperature profile, water vapor profile, cloud-top altitude, particle size distribution, and vertical updraft velocity. The neural net is trained to learn the nonlinear dependencies of precipitation rate on these variables. The dependence of the precipitation rate on these variables should be monotonic, so the neural net does not need to be complicated. A feedforward neural net with one hidden layer of tangent sigmoid nodes (with transfer function f(x) = tanh x) and one linear output node should be sufficient (Figure 12.8) [49]. The Chen–Staelin algorithm uses the signal-processing methods in the previous section to extract the most relevant information from AMSU data. Figure 12.11 shows a block diagram of the first part of the algorithm and Figure 12.12 shows the final part of the algorithm with a neural net. A neural net that takes the following sets of inputs is at the heart of the algorithm:

1. Inferred 15-km-resolution perturbations at 52.8 GHz, 53.6 GHz, 54.4 GHz, 54.9 GHz, and 55.5 GHz.

FIGURE 12.11 Block diagram of the algorithm, part 1.

2. 183 ± 1-, 183 ± 3-, and 183 ± 7-GHz 15-km HSB data.

FIGURE 12.12 Block diagram of the algorithm, part 2.

3. The leading three principal components characterizing the original five corrected 50-km AMSU-A temperature radiances.
4. Two surface-insensitive principal components that characterize the window channels at 23.8 GHz, 31.4 GHz, 50 GHz, and 89 GHz, along with the four HSB channels.
5. The secant of the satellite zenith angle θ.

Each of these sets provides the neural net with information that is relevant to precipitation. The three principal components characterizing AMSU-A temperature radiances provide information on the atmospheric temperature profile, which is important because it determines how much water vapor can be precipitated. The secant of the satellite zenith angle θ allows the neural net to account for variations in the data due to the scan angle. The 15-km cloud-induced perturbations provide information on the cloud-top altitude. The current AMSU/HSB precipitation retrieval algorithm is based on NOAA-15 AMSU comparisons with NEXRAD over the eastern United States during 38 orbits that exhibited significant precipitation and were distributed throughout the year. These orbits are listed in Table 12.3. The primary precipitation-rate retrieval products of AMSU/HSB are 15- and 50-km-resolution contiguous retrievals over the viewing positions of HSB and AMSU, respectively, within 43° of nadir. The two outermost 50-km and six outermost 15-km viewing positions on each side of the swath are omitted due to their grazing angles. The algorithm architectures for these two retrieval methods and the derivation of the numerical coefficients characterizing the neural network are described and presented below.

12.5.1

Limb-Correction of Temperature Profile Channels

AMSU observes at angles up to 49° away from nadir. For angles farther from nadir, the electromagnetic energy originating from a given altitude and atmospheric state

TABLE 12.3

List of Rainy Orbits Used for Training, Validation, and Testing

October 16, 1999, 0030 UTC
October 31, 1999, 0130 UTC
November 2, 1999, 0045 UTC
December 4, 1999, 1445 UTC
December 12, 1999, 0100 UTC
January 28, 2000, 0200 UTC
January 31, 2000, 0045 UTC
February 14, 2000, 0045 UTC
February 27, 2000, 0045 UTC
March 11, 2000, 0100 UTC
March 17, 2000, 0015 UTC
March 17, 2000, 0200 UTC
March 19, 2000, 0115 UTC
April 2, 2000, 0100 UTC
April 4, 2000, 0015 UTC
April 8, 2000, 0030 UTC
April 12, 2000, 0045 UTC
April 12, 2000, 0215 UTC
April 20, 2000, 0100 UTC
April 30, 2000, 1430 UTC
May 14, 2000, 0030 UTC
May 19, 2000, 0015 UTC
May 19, 2000, 0145 UTC
May 20, 2000, 0130 UTC
May 25, 2000, 0115 UTC
June 10, 2000, 0200 UTC
June 16, 2000, 0130 UTC
June 30, 2000, 0115 UTC
July 4, 2000, 0115 UTC
July 15, 2000, 0030 UTC
August 1, 2000, 0045 UTC
August 8, 2000, 0145 UTC
August 18, 2000, 0115 UTC
August 23, 2000, 1315 UTC
September 23, 2000, 1315 UTC
October 5, 2000, 0130 UTC
October 6, 2000, 0100 UTC
October 14, 2000, 0130 UTC

has to travel longer paths before reaching the radiometer and, therefore, is subject to more absorption and scattering effects. This results in scan-angle-dependent effects in brightness temperature images, as shown in Figure 12.13a. A limb and surface correction method for AMSU-A channels 4–8 brightness temperatures is needed to make precipitation-induced perturbations more apparent and to extract information about atmospheric conditions. AMSU-A channels 4 and 5 are corrected for surface variations, as they are sensitive to the surface. For these two channels, the brightness temperature for pixels over ocean is corrected to what might be observed for the same atmospheric conditions over land. AMSU-A channels 9–14 brightness temperatures are not corrected because they are not significantly perturbed by clouds and therefore are not used for anything other than limb correction. Limb and surface correction was done by training a neural net of the type shown in Figure 12.8 to estimate nadir-viewing brightness temperatures. For each pixel, the neural net used brightness temperatures from several channels at that pixel to estimate the brightness temperature seen at the pixel closest to nadir at a nearly identical latitude and at nearly the same time. It is assumed that the temperature field does not vary significantly over one scan. Limb and surface correction was done for AMSU-A channels 4–8. The data used to correct each of these channels are listed in Table 12.4. No attempt has been made to correct for the scan-angle-dependent asymmetry in the brightness temperatures. These neural nets were trained using data between 55°N and 55°S from seven orbits spaced over 1 year. Channels 4 and 5 are surface sensitive, so they were trained to estimate brightness temperatures that would be seen over land. Figure 12.13 shows a sample of (a) uncorrected and (b) limb and surface corrected 54.4-GHz brightness temperatures. The shapes of precipitation systems over Texas and the Mexico–Guatemala border are more apparent after the limb correction. In Figure 12.13a, the difference between brightness temperatures at nadir and the swath edge is as high as 18 K. In Figure 12.13b, the angle-dependent variation is less than 3 K. One limitation of the training is that one cannot really know what the nadir-viewing brightness temperature is supposed to be when there is precipitation.

FIGURE 12.13 (See color insert following page 302.) NOAA-15 AMSU-A 54.4-GHz brightness temperatures for a northbound track on September 13, 2000. (a) Uncorrected and (b) limb and surface corrected.

12.5.2

Detection of Precipitation

The 15-km-resolution precipitation-rate retrieval algorithm, summarized in Figure 12.11 and Figure 12.12, begins with the identification of potentially precipitating pixels. The neural net operates on data from only those FOVs labeled as potentially precipitating. This choice eliminates the need to exhaustively learn all of the conditions where the precipitation rate is exactly zero; the neural net would also be likely to have difficulty forcing precipitation rates in nonprecipitating FOVs to be exactly zero. This choice also reduces the time needed to train the neural net since, at any given time, precipitation falls over less than 10% of the Earth's surface. All 15-km pixels with brightness temperatures at 183 ± 7 GHz that are below a threshold T7 are flagged as potentially precipitating, where

TABLE 12.4

Data Used in Limb and Surface Correction of AMSU-A Channels

AMSU-A Channel    Inputs Used for Limb and Surface Correction
4                 AMSU-A channels 4–12, land/sea flag, cos φ
5                 AMSU-A channels 5–12, land/sea flag, cos φ
6                 AMSU-A channels 6–12, cos φ
7                 AMSU-A channels 6–12, cos φ
8                 AMSU-A channels 6–12, cos φ

T7 = 0.667(T53.6 − 248) + 262 + 6 cos θ    (12.25)

where θ is the satellite zenith angle and T53.6 is the spatially filtered 53.6-GHz brightness temperature obtained by selecting the warmest brightness temperature within a 7 × 7 array of AMSU-B pixels. If, however, T53.6 is below 248 K, then the brightness temperature at 183 ± 3 GHz is compared, instead, to a different threshold T3, where

T3 = 242.5 + 5 cos θ    (12.26)

The 183 ± 3-GHz band is used to flag potential precipitation when the 183 ± 7-GHz flag could be erroneously set by low surface emissivity in very cold and dry atmospheres, as indicated by T53.6. These thresholds T7 and T3 are slightly colder than a saturated atmosphere would be, implying the presence of a microwave-absorbing cloud. If the locally filtered T53.6 is less than 242 K, then the pixel is assumed not to be precipitating.
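This two-threshold test can be sketched directly, taking the thresholds as T7 = 0.667(T53.6 − 248) + 262 + 6 cos θ and T3 = 242.5 + 5 cos θ (our reading of Equations 12.25 and 12.26; array names are ours, angles in radians):

```python
import numpy as np

def flag_precipitation(tb183_7, tb183_3, t53_6, zenith):
    """Flag 15-km pixels as potentially precipitating. t53_6 is the
    spatially filtered 53.6-GHz brightness temperature (K); zenith is
    the satellite zenith angle (rad)."""
    cos_t = np.cos(zenith)
    t7 = 0.667 * (t53_6 - 248.0) + 262.0 + 6.0 * cos_t   # 183 +/- 7 GHz threshold
    t3 = 242.5 + 5.0 * cos_t                             # 183 +/- 3 GHz threshold
    # Use the 183 +/- 7 GHz test normally; fall back to the 183 +/- 3 GHz
    # test when T53.6 < 248 K (very cold, dry atmospheres).
    flagged = np.where(t53_6 >= 248.0, tb183_7 < t7, tb183_3 < t3)
    # Pixels with locally filtered T53.6 below 242 K are assumed not precipitating.
    return flagged & (t53_6 >= 242.0)
```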

12.5.3 Cloud-Clearing by Regional Laplacian Interpolation

Within regions flagged as potentially precipitating, strong precipitation is generally characterized by cold, cloud-induced perturbations of the AMSU-A tropospheric temperature sounding channels in the range of 52.5–55.6 GHz. Brightness temperature images approximately satisfy Laplace's equation in the absence of precipitation. When the potentially precipitating FOVs have been identified, Laplacian interpolation can be performed to clear the brightness temperature image of the effects of precipitation, and the perturbations due to precipitation can be computed. Examples of 183 ± 7-GHz data and the corresponding 50-km cold perturbations at 52.8 GHz are illustrated in Figure 12.14a and Figure 12.14c. Physical considerations and aircraft data show that convective cells near 54 GHz typically appear slightly off-center and less extended relative to the 183-GHz images [55,56].

12.5.4

Image Sharpening

The small interpolation errors in converting 54-GHz perturbations to 15 km contribute to the total errors and discrepancies discussed in Section 12.3. These 50-km-resolution 52.8-GHz perturbations ΔT50,52.8 are then used to infer the perturbations ΔT15,52.8 (Figure 12.14d) that might have been observed at 52.8 GHz with a 15-km resolution had those perturbations been distributed spatially in the same way as the cold perturbations observed at either 183 ± 7 or 183 ± 3 GHz, the choice between these two channels being the same as described above. This requires the bilinearly interpolated 50-km AMSU data

FIGURE 12.14 (See color insert following page 302.) Frontal system on September 13, 2000, 0130 UTC. (a) Brightness temperatures (K) near 183 ± 7 GHz. (b) Brightness temperatures (K) near 183 ± 3 GHz. (c) Brightness temperature perturbations (K) near 52.8 GHz. (d) Inferred 15-km-resolution brightness temperature perturbations (K) near 52.8 GHz.

to be resampled at the HSB beam positions. These inferred 15-km perturbations are computed for five AMSU-A channels using

ΔT15,54 = 20 tanh[ ΔT15,183 ΔT50,54 / (20 ΔT50,183) ]    (12.27)

The perturbation ΔT15,183 near 183 GHz is defined to be the difference between the observed radiance and the appropriate threshold given by (12.25) or (12.26). The perturbation ΔT50,54 near 54 GHz is defined to be the difference between the observed radiance and the Laplacian-interpolated radiance based on those pixels surrounding the flagged region [58]. Any warm perturbations in the images of ΔT15,183 and ΔT50,54 are set to zero. Limb and surface-emissivity corrections to nadir for the five 54-GHz channels are produced by neural networks for each channel; they operate on nine AMSU-A channels above 52 GHz, the cosine of the viewing angle φ from nadir, and a land–sea flag (Figure 12.12). They were trained on seven orbits spaced over 1 year for latitudes up to ±55°. Inferred 50- and 15-km precipitation-induced perturbations at 52.8 GHz are shown in Figure 12.14c and Figure 12.14d for a frontal system. Such estimates of 15-km perturbations near 54 GHz help characterize heavily precipitating small cells.
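A sketch of this sharpening step, assuming Equation 12.27 has the soft-limited form ΔT15,54 = 20 tanh[ΔT15,183 ΔT50,54 / (20 ΔT50,183)], where the factor of 20 K acts as a saturation scale (this normalization is our reading of the equation, not a confirmed detail; function and array names are ours):

```python
import numpy as np

def sharpen_54ghz(dt50_54, dt15_183, dt50_183, eps=1e-6):
    """Redistribute 50-km 54-GHz cold perturbations at 15-km resolution
    using the spatial pattern of the 183-GHz perturbations. All inputs
    are nonpositive (warm perturbations already set to zero)."""
    denom = 20.0 * np.minimum(dt50_183, -eps)   # guard against division by zero
    return 20.0 * np.tanh(dt15_183 * dt50_54 / denom)
```

For small perturbations this reduces to ratio scaling, ΔT50,54 · ΔT15,183/ΔT50,183, while very deep perturbations saturate near −20 K.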

12.5.5

Temperature and Water Vapor Profile Principal Components

One important determinant of precipitation is the temperature profile. Warmer atmospheres can hold more water vapor and result in higher vertical updraft velocities. Therefore, inputs to the neural net in Figure 12.10 should include some that carry information about the temperature profile. For each of AMSU-A channels 4–8, the brightness temperatures were corrected for limb and surface effects and then processed to eliminate precipitation signatures with the methods described in Section 12.4.1 through Section 12.4.3. The corrected brightness temperatures from all five of these channels could have been inputs to the neural net in Figure 12.10, but it was determined that a more compact representation of these channels was sufficient. PCA was applied to these five channels, and the first three principal components were found to be sufficient for characterizing the temperature profile. Adding the fourth and fifth principal components did not significantly improve the training of the neural net. The water vapor profile is another important determinant of precipitation. Higher concentrations of water vapor can result in higher precipitation rates. The water vapor principal components are computed using AMSU-A channels 1–3 and 15, and the AMSU-B 150-, 183 ± 7-, 183 ± 3-, and 183 ± 1-GHz channels. Some of these channels are sensitive to surface variations. Therefore, it is necessary to project the vector of these observations onto a subspace that is not significantly sensitive to surface variations. Constrained PCA, described in Section 12.4.2.2, was used to compute the water vapor principal components. A set of pixels without precipitation and with different types of surfaces was selected to compute surface-sensitive eigenvectors using PCA. The surface-sensitive eigenvectors were determined by visual inspection of the preconstraint principal components for correlation with surface features (e.g., land and sea boundaries).
Then, a set of data that also included precipitation was selected. The observations over this set were projected onto a linear subspace that was orthogonal to the subspace spanned by the surface-sensitive eigenvectors. Then, PCA was done on the resulting data set to determine the water vapor principal components. It was found that two water vapor principal components were adequate for characterizing the eight channels.

12.5.6

The Neural Net

All 13 of the variables listed at the beginning of this section are fed into the neural net used for 15-km precipitation-rate retrievals, as shown in Figure 12.12. The relative insensitivity of these inputs to surface emissivity is important to the success of this technique over land, ice, and snow. This network was trained to minimize the rms value of the difference between the logarithms of the (AMSU + 1 mm/h) and (NEXRAD + 1 mm/h) retrievals; the use of logarithms reduced the emphasis on the heaviest rain rates, which were roughly three orders of magnitude greater than the lightest rates. Adding 1 mm/h reduced the emphasis on the lightest rain rates, which were more noise-dominated. These intuitive choices clearly impact the retrieval error distribution, and therefore further studies should enable algorithm improvements. However, retrievals with training optimized for low rain rates did not markedly improve that regime. NEXRAD precipitation retrievals with a 2-km resolution were smoothed to approximate Gaussian spatial averages that were centered on and approximated the view-angle-distorted 15- or 50-km antenna beam patterns. The accuracy of NEXRAD precipitation observations is known to vary with distance; therefore, only points beyond 30 km, but within 110 km, of each NEXRAD radar site were included in the data used to train and test the neural nets. Eighty different networks were trained using the Levenberg–Marquardt algorithm, each with different numbers of nodes

Satellite-Based Precipitation Retrieval Using Neural Networks


and water vapor principal components. A network with nearly the best performance over the testing data set was chosen; it used two surface-blind water vapor principal components, although slightly better performance was achieved with five water vapor principal components that had increased surface sensitivity. The final network had one hidden layer with five nodes that used the tanh sigmoid function. These neural networks were similar to those described in Ref. [57]. The resulting 15-km-resolution precipitation retrievals were then smoothed to yield 50-km retrievals. The 15-km retrieval neural network was trained using precipitation data from the 38 orbits listed in Table 12.3. During this period, the radio interference to AMSU-B was negligible relative to other sources of retrieval error. Each 15-km pixel flagged as potentially precipitating using 183±7- or 183±3-GHz radiances (see Figure 12.11 and Figure 12.12) was used for training, validation, or testing of the neural network. For these 38 orbits over the United States, 15,160 15-km pixels were flagged and considered suitable for training, validation, and testing; half were used for training and one quarter each for validation and testing, where the validation pixels were used to determine when the training of the neural network should cease. On the basis of the final AMSU and NEXRAD 15-km retrievals, approximately 14 and 38%, respectively, of the flagged 15-km pixels appeared to have been precipitating at less than 0.1 mm/h for the test set.
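The training setup described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: it shows the log-domain target transform (logarithm of rate + 1 mm/h), the half/quarter/quarter train–validation–test split, and a forward pass through a network with 13 inputs and one five-node tanh hidden layer. The weights here are random; the chapter's networks were trained with the Levenberg–Marquardt algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def to_log_domain(rate_mm_h, offset=1.0):
    """Training target: log(rate + 1 mm/h) de-emphasizes extreme rates."""
    return np.log(np.asarray(rate_mm_h, dtype=float) + offset)

def from_log_domain(y, offset=1.0):
    """Invert the transform to recover a rain rate in mm/h."""
    return np.exp(y) - offset

def mlp_forward(x, W1, b1, W2, b2):
    """One hidden layer of tanh nodes, linear output (a log-domain rate)."""
    return np.tanh(x @ W1 + b1) @ W2 + b2

# 13 inputs, 5 hidden tanh nodes, 1 output, as in the final network.
n_in, n_hidden = 13, 5
W1 = 0.1 * rng.standard_normal((n_in, n_hidden))
b1 = np.zeros(n_hidden)
W2 = 0.1 * rng.standard_normal((n_hidden, 1))
b2 = np.zeros(1)

# Half the flagged pixels for training, a quarter each for validation
# (used for early stopping) and for testing.
n = 1000
idx = rng.permutation(n)
train, val, test = idx[: n // 2], idx[n // 2 : 3 * n // 4], idx[3 * n // 4 :]

X = rng.standard_normal((n, n_in))          # stand-in for the 13 inputs
y_log = mlp_forward(X, W1, b1, W2, b2)      # log-domain retrievals
rates = from_log_domain(y_log)              # back to mm/h
print(rates.shape, len(train), len(val), len(test))
```

With random weights the outputs are meaningless; the sketch only fixes the architecture and data handling that the text describes.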

12.6 Retrieval Performance Evaluation

This section presents three forms of evaluation for this initial precipitation-rate retrieval algorithm: (1) representative qualitative comparisons of AMSU and NEXRAD precipitation rate images, (2) quantitative comparisons of AMSU and NEXRAD retrievals stratified by NEXRAD rain rate, and (3) representative precipitation images at more extreme latitudes beyond the NEXRAD training zone.

12.6.1 Image Comparisons of NEXRAD and AMSU-A/B Retrievals

Each NEXRAD comparison at 15-km resolution occurred within 8 min of satellite overpass; such coincidence is needed to characterize single-pixel retrievals because convective precipitation evolves rapidly on this spatial scale. Although comparison with instruments such as TRMM and SSM/I would be useful, their orbits unfortunately overlap those of AMSU within 8 min so infrequently (if ever) that comparisons over precipitation are too rare to be useful until several years of data have been analyzed. This challenge of simultaneity and the sporadic character of rain have restricted most prior instrument comparisons (passive microwave satellites, radar, rain gauges) to dimensions over 100 km and to periods of an hour to a month [58–60]. The uniformity and extent of the NEXRAD network offer a unique degree of simultaneity on 15- and 50-km scales and also the ability to match the Gaussian shape of the AMSU antenna beams. Although these AMSU/HSB–NEXRAD comparisons are encouraging because they involve single pixels and independent physics and facilities, further extensive analyses are required for real validation. For example, comparisons of precipitation averages and differences over the same time/space units used to validate other precipitation measurement systems (e.g., SSM/I [61], ATOVS, TRMM, rain gauges) are needed to characterize variances and systematic biases based on the precipitation rate, type, location, or season.
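The Gaussian spatial averaging used to match 2-km NEXRAD grids to the AMSU antenna beams can be sketched as below. The 15-km full width at half maximum, the edge padding, and the function name are illustrative assumptions, not the chapter's implementation.

```python
import numpy as np

def gaussian_beam_average(field, grid_km=2.0, fwhm_km=15.0):
    """Smooth a radar rain-rate grid with a separable Gaussian approximating
    an antenna beam of the given full width at half maximum."""
    sigma = fwhm_km / (grid_km * 2.0 * np.sqrt(2.0 * np.log(2.0)))  # pixels
    half = int(np.ceil(3 * sigma))
    x = np.arange(-half, half + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    k /= k.sum()                                  # unit-gain kernel
    # Pad, then convolve rows and columns so the output matches the input size.
    pad = np.pad(field, half, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, pad)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

field = np.zeros((50, 50))
field[25, 25] = 100.0                 # an isolated 2-km rain cell
smoothed = gaussian_beam_average(field)
print(smoothed.shape, round(float(smoothed.sum()), 1))   # (50, 50) 100.0
```

Because the kernel has unit gain, total rainfall is conserved while the isolated cell is spread over the effective beam footprint.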

Signal and Image Processing for Remote Sensing


These biases include any present in the NEXRAD data used to train the AMSU/HSB algorithm; once characterized, they can be diminished. Any excess variance experienced for rain cells too small to be resolved by AMSU/HSB can also eventually be better characterized, although it is believed to be modest for cells with microwave signatures larger than 10 km. Smaller cells contribute little to the total rainfall.

12.6.2 Numerical Comparisons of NEXRAD and AMSU-A/B Retrievals

Figure 12.15a and Figure 12.15b present 15-km-resolution precipitation retrieval images for September 13, 2000, obtained from NEXRAD and AMSU, respectively. On this occasion, both sensors yielded rain rates over 50 mm/h at similar locations and lower rain rates down to 0.5 mm/h over comparable areas. The revealed morphology is thus very similar, even though AMSU observes 6 min before NEXRAD and senses altitudes separated from those sensed by the radar by several kilometers; rain falling at a nominal rate of 10 m/s takes 10 min to fall 6 km. Figure 12.16 shows the scatter between the 15-km AMSU and NEXRAD rain-rate retrievals for the test pixels not used for training or validation. Figure 12.17 shows the scatter between the 50-km AMSU and NEXRAD rain-rate retrievals over all points flagged as precipitating.

[Four map panels covering approximately 94°W–86°W and 30°N–34°N, each with a 0–50 mm/h color scale.]

FIGURE 12.15 (See color insert following page 302.) Precipitation rates (mm/h) above 0.5 mm/h observed on September 13, 2000, 0130 UTC. (a) 15-km-resolution NEXRAD retrievals, (b) 15-km-resolution AMSU retrievals, (c) 50-km-resolution NEXRAD retrievals, and (d) 50-km-resolution AMSU retrievals.


[Log-log scatter plot; both axes span 10^-1 to 10^3.]

FIGURE 12.16 Comparison of AMSU and NEXRAD estimates of rain rate at 15-km resolution. Axes: 15-km NEXRAD rain rate (mm/h) + 0.1 mm/h (horizontal) versus 15-km AMSU rain rate (mm/h) + 0.1 mm/h (vertical).

[Log-log scatter plot; both axes span 10^-1 to 10^2.]

FIGURE 12.17 Comparison of AMSU and NEXRAD estimates of rain rate at 50-km resolution. Axes: 50-km NEXRAD rain rate (mm/h) + 0.1 mm/h (horizontal) versus 50-km AMSU rain rate (mm/h) + 0.1 mm/h (vertical).


The relative sensitivity of AMSU and NEXRAD to light and heavy rain can be seen in Figure 12.17. In general, these figures suggest that AMSU responds less to the highest radar rain rates, perhaps because AMSU is less sensitive to the bright-band or hail anomalies that affect the radar. They also suggest that the risk of false rain detections increases for AMSU retrievals below 0.5 mm/h at a 50-km resolution, although further study is required. Greater accuracy at these low rates requires more space-time averaging and careful calibration. The risk of overestimating rain rate also appears to be limited. Only 3.3% of the total AMSU-derived rainfall was in areas where AMSU saw more than 1 mm/h and NEXRAD saw less than 1 mm/h. Only 7.6% of the total NEXRAD-derived rainfall was in areas where NEXRAD saw more than 1 mm/h and AMSU saw less than 1 mm/h. These percentages were compared with the total percentages of AMSU and NEXRAD rain that fell at rates above 1 mm/h, which were 94 and 97%, respectively. It is also interesting to see to what degree each sensor retrieves rain when the other does not, and how much rain each sensor misses. For example, of the 73 NEXRAD 15-km rain-rate retrievals in Figure 12.16 above 54 mm/h, none were found by AMSU to be below 3 mm/h, and of the 61 AMSU 15-km retrievals above 45 mm/h, none were found by NEXRAD to be below 16 mm/h. Also, of the 69 NEXRAD 50-km rain-rate retrievals in Figure 12.17 above 30 mm/h, none were found by AMSU to be below 5 mm/h, and of the 102 AMSU 50-km retrievals above 16 mm/h, none were found by NEXRAD to be below 10 mm/h. Perhaps the most significant AMSU precipitation performance metric is the rms difference between the NEXRAD and AMSU rain-rate retrievals; these are grouped by retrieved NEXRAD rain rates in octaves. The central 26 AMSU-A scan angles and central 78 AMSU-B scan angles were included in these evaluations; only the outermost two AMSU-A angles on each side were omitted.
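The cross-sensor rainfall percentages quoted above (the share of one sensor's total rainfall that fell where it saw more than 1 mm/h while the other saw less) can be computed as sketched below. The helper name and the tiny arrays are mine, used only for illustration.

```python
import numpy as np

def disagreement_fraction(primary, other, threshold=1.0):
    """Fraction of the primary sensor's total rainfall that fell where the
    primary saw more than `threshold` mm/h but the other sensor saw less."""
    primary = np.asarray(primary, dtype=float)
    other = np.asarray(other, dtype=float)
    mask = (primary > threshold) & (other < threshold)
    return float(primary[mask].sum() / primary.sum())

amsu = np.array([0.0, 0.5, 2.0, 5.0, 20.0])
nexrad = np.array([0.1, 0.4, 0.3, 6.0, 18.0])

# Share of AMSU rainfall where AMSU > 1 mm/h but NEXRAD < 1 mm/h:
print(round(100 * disagreement_fraction(amsu, nexrad), 1))   # 7.3
```

Swapping the argument order gives the complementary NEXRAD-centric percentage.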
These comparisons used all 50-km pixels, and only those 15-km pixels not used for training or validation. The results are listed in Table 12.5. The smoothing of the 15-km NEXRAD and AMSU results to a nominal 50-km resolution was consistent with an AMSU-A Gaussian beamwidth of 3.3°. The rms agreement between these two very different precipitation-rate sensors appears surprisingly good, particularly since a single AMSU neural network is used over all angles, seasons, and latitudes. The 3-GHz radar retrievals respond most strongly to the largest hydrometeors, especially those below the bright band near the freezing level, while AMSU interacts with the general population of hydrometeors in the top few kilometers of the precipitation cell, which may lie several kilometers above the freezing level. Much of the agreement between AMSU and NEXRAD rain-rate retrievals must therefore result from the statistical consistency of the relations between rain rate and its various electromagnetic signatures.

TABLE 12.5
RMS AMSU/NEXRAD Discrepancies (mm/h)

NEXRAD Range     15-km Resolution     50-km Resolution
<0.5 mm/h        1.0                  0.5
0.5–1 mm/h       2.0                  0.9
1–2 mm/h         2.3                  1.1
2–4 mm/h         2.7                  1.8
4–8 mm/h         3.5                  3.2
8–16 mm/h        6.9                  6.6
16–32 mm/h       19.0                 12.9
>32 mm/h         42.9                 22.1

It is difficult to say how much of the observed


discrepancy is due to each sensor or how well each correlates with precipitation reaching the ground. Furthermore, this study provided an opportunity for evaluation of radar data. The rms discrepancies between AMSU and NEXRAD retrievals were separately calculated over all points at ranges from 110 to 230 km from any radar. For NEXRAD precipitation rates below 16 mm/h, these rms discrepancies were approximately 40% greater than those computed for test points at the 30- to 110-km range. At rain rates greater than 16 mm/h, the accuracies beyond 110 km were more comparable. Most points in the eastern United States are more than 110 km from any NEXRAD radar site.
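The octave binning used for Table 12.5 can be sketched as follows. The bin edges mirror the table's rows; the helper name and synthetic arrays are illustrative assumptions, not the chapter's code.

```python
import numpy as np

def rms_by_octave(nexrad, amsu, edges=(0.5, 1, 2, 4, 8, 16, 32)):
    """RMS AMSU-NEXRAD discrepancy in octave-wide bins of the NEXRAD rate,
    plus open bins below the first edge and above the last."""
    nexrad = np.asarray(nexrad, dtype=float)
    amsu = np.asarray(amsu, dtype=float)
    bins = np.concatenate(([-np.inf], edges, [np.inf]))
    out = []
    for lo, hi in zip(bins[:-1], bins[1:]):
        m = (nexrad >= lo) & (nexrad < hi)
        out.append(float(np.sqrt(np.mean((amsu[m] - nexrad[m]) ** 2)))
                   if m.any() else float("nan"))
    return out

# One synthetic pixel per bin; identical retrievals give zero rms per bin.
nexrad = np.array([0.2, 0.7, 1.5, 3.0, 6.0, 12.0, 20.0, 40.0])
amsu   = np.array([0.2, 0.7, 1.5, 3.0, 6.0, 12.0, 20.0, 40.0])
print(rms_by_octave(nexrad, amsu))   # eight zeros, one per Table 12.5 row
```

Applied to real matched retrievals, the eight returned values would correspond to one column of Table 12.5.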

12.6.3 Global Retrievals of Rain and Snow

Figure 12.18 illustrates precipitation-rate retrievals at points around the globe where radar confirmation data are scarce. Figure 12.18a shows precipitation retrievals in the tropics over a mix of land and sea, while Figure 12.18b shows a more intense tropical

FIGURE 12.18 AMSU precipitation-rate retrievals (mm/h) with 15-km resolution. (a) Philippines, April 16, 2000; (b) Indochina, July 5, 2000; (c) Canada, August 2, 2000; and (d) New England snowstorm, March 5, 2001. Precipitation-rate retrievals exceed 0.5 mm/h in the shaded regions, and contours are drawn for 0.5 mm/h, 2 mm/h, 8 mm/h, 32 mm/h, and 128 mm/h. The peak retrieved values are 47 mm/h, 143 mm/h, 30 mm/h, and 1.5 mm/h in (a), (b), (c), and (d), respectively.


event. Figure 12.18c illustrates strong precipitation near 72°N to 74°N, again over both land and sea. Finally, Figure 12.18d illustrates the March 5, 2001, New England snowstorm that deposited roughly a foot of snow within a few hours: an accumulation somewhat greater than is indicated by the retrieved rain rates of 1.2 mm/h. This applicability of the algorithm to snowfall rate should be expected because the observed radio emission originates exclusively at high altitudes. Whether the hydrometeors are rain or snow on impact depends only on air temperatures near the surface, far below the altitudes being probed. For essentially all of the pixels shown in Figure 12.18, the adjacent clear air exhibited temperature and humidity profiles (inferred from AMSU) within the range of the training set. Nonetheless, regional biases are expected and will require evaluation. For example, polar stratiform precipitation is expected to exhibit relatively weaker radiometric signatures in winter, when the temperature lapse rates are lower, and snow-covered mountains in cold polar air can produce false detections.
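A back-of-envelope check of the snowstorm numbers, assuming a typical 10:1 ratio of fresh-snow depth to liquid equivalent (the ratio is an assumption, not from the text), shows why a foot of snow "within a few hours" implies more than the retrieved 1.2 mm/h:

```python
# Liquid-equivalent rate retrieved by AMSU for the March 5, 2001 storm:
retrieved_mm_h = 1.2

# Assumed ratio of fresh-snow depth to liquid-equivalent depth:
snow_to_liquid = 10.0

snow_depth_rate = retrieved_mm_h * snow_to_liquid    # mm of snow per hour
foot_mm = 304.8                                      # one foot in mm

hours_for_a_foot = foot_mm / snow_depth_rate
print(round(hours_for_a_foot, 1))   # 25.4
```

At the retrieved rate, a foot of snow would take roughly a day rather than a few hours, consistent with the text's remark that the actual accumulation was somewhat greater than the retrievals indicate.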

12.7 Conclusions

In this chapter, the precipitation estimation method for microwave radiometric data from Chen and Staelin and the role of signal processing methods were described. The development of the Chen–Staelin algorithm shows that signal processing can play a useful role in satellite-based precipitation estimation. In this algorithm, which was developed for AMSU-A/B and AMSU/HSB, PCA was used to reduce the dimensionality of selected sets of channels and to separate the effects of surface variations from atmospheric variations. Data fusion was used to sharpen 50-km data from AMSU-A so that 15-km precipitation retrievals could be done. Laplacian filtering was applied to data from the 54-GHz band to quantify the effects of clouds, and neural nets were trained to learn the mathematical relationships between precipitation and the information resulting from the signal processing. The signal processing components of the algorithm were designed to process the brightness temperature measurements in a way that extracts the most relevant information, and the neural net was trained to learn the relationship between precipitation rate and the inputs. This Chen–Staelin algorithm represents a step in the ongoing development of microwave precipitation retrieval algorithms. The algorithm of Chen and Staelin likely can be improved by choosing more general signal processing methods or fine-tuning the ones already being used. For example, variations or extensions of PCA such as independent component analysis (ICA) could be used [63,64]. Additionally, methods like PCA and ICA can be improved by incorporating components from physics-based methods. Additional improvements can be made by making better use of the data from window channels. Future instruments also present opportunities for better precipitation retrievals. 
The Advanced Technology Microwave Sounder (ATMS), to be launched aboard the NPOESS Preparatory Project (NPP) and NPOESS satellite series, could be considered a more advanced version of AMSU-A/B and AMSU/HSB because it has a set of channels very similar to that of AMSU-A/B and offers finer resolution and sampling for most channels [65,66]. Because of the similarity of channel sets, the algorithm of Chen and Staelin can be a starting point for the development of an algorithm for ATMS. The finer resolution and sampling of ATMS will likely lead to better image-sharpening methods and better characterization of temperature and water vapor profiles.


Acknowledgments

The author wishes to thank P.W. Rosenkranz, E. Williams, and M.M. Wolfson for helpful discussions, and S.P.L. Maloney and C. Lebell for assistance with the NEXRAD data. This work was supported by the National Oceanic and Atmospheric Administration under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the author and not necessarily endorsed by the United States Government.

References

1. F.T. Ulaby, R.K. Moore, and A.K. Fung, Microwave Remote Sensing: Active and Passive, Addison-Wesley, Reading, MA, 1981.
2. P.W. Rosenkranz, Retrieval of temperature and moisture profiles from AMSU-A and AMSU-B measurements, IEEE Transactions on Geoscience and Remote Sensing, 39(11), 2429–2435, 2001.
3. P.W. Rosenkranz, Rapid radiative transfer model for AMSU/HSB channels, IEEE Transactions on Geoscience and Remote Sensing, 41(2), 362–368, 2003.
4. L. Shi, Retrieval of atmospheric temperature profiles from AMSU-A measurement using a neural network approach, Journal of Atmospheric and Oceanic Technology, 18, 340–347, 2001.
5. W.J. Blackwell, J.W. Barrett, F.W. Chen, R.V. Leslie, P.W. Rosenkranz, M.J. Schwartz, and D.H. Staelin, NPOESS Aircraft Sounder Testbed-Microwave (NAST-M): instrument description and initial flight results, IEEE Transactions on Geoscience and Remote Sensing, 39(11), 2444–2453, 2001.
6. R.V. Leslie, W.J. Blackwell, P.W. Rosenkranz, and D.H. Staelin, 183-GHz and 425-GHz passive microwave sounders on the NPOESS Aircraft Sounder Testbed-Microwave (NAST-M), Proceedings of the 2003 IEEE International Geoscience and Remote Sensing Symposium, 1, 506–508, 2003.
7. R.V. Leslie, J.A. Loparo, P.W. Rosenkranz, and D.H. Staelin, Cloud and precipitation observations with the NPOESS Aircraft Sounder Testbed-Microwave (NAST-M) spectrometer suite at 54/118/183/425 GHz, Proceedings of the 2003 IEEE International Geoscience and Remote Sensing Symposium, 2, 1212–1214, 2003.
8. R.V. Leslie and D.H. Staelin, NPOESS Aircraft Sounder Testbed-Microwave (NAST-M) observations of clouds and precipitation at 54, 118, 183, and 425 GHz, IEEE Transactions on Geoscience and Remote Sensing, 42(10), 2240–2247, 2004.
9. J. Houghton, The Physics of Atmospheres, Cambridge University Press, New York, 2002.
10. Tropical Rainfall Measuring Mission, http://trmm.gsfc.nasa.gov/
11. R.A. Houze, Cloud Dynamics, Academic Press, New York, 1993.
12. G. Hufford, A model for the complex permittivity of ice at frequencies below 1 THz, International Journal of Infrared and Millimeter Waves, 12(7), 677–682, 1991.
13. D. Deirmendjian, Electromagnetic Scattering on Spherical Polydispersions, Elsevier, New York, 1969.
14. M.S. Spina, M.J. Schwartz, D.H. Staelin, and A.J. Gasiewski, Application of multilayer feedforward neural networks to precipitation cell-top altitude estimation, IEEE Transactions on Geoscience and Remote Sensing, 36(1), 154–162, 1998.
15. A.J. Gasiewski, Atmospheric Temperature Sounding and Precipitation Cell Parameter Estimation Using Passive 118-GHz O2 Observations, Ph.D. thesis, MIT Department of Electrical Engineering and Computer Science, December 1988.
16. A.J. Gasiewski, Microwave radiative transfer in hydrometeors, in Atmospheric Remote Sensing by Microwave Radiometry, Ed. M.A. Janssen, John Wiley & Sons, New York, 1993.
17. N. Grody, F. Weng, and R. Ferraro, Application of AMSU for obtaining hydrological parameters, in Microwave Radiometry and Remote Sensing of the Earth's Surface and Atmosphere, Eds. P. Pampaloni and S. Paloscia, pp. 339–351, 2000.


18. F. Weng, L. Zhao, R.R. Ferraro, G. Poe, X. Li, and N.C. Grody, Advanced microwave sounding unit cloud and precipitation algorithms, Radio Science, 38(4), 8068, 2003; doi: 10.1029/2002RS002679.
19. J.P. Hollinger, SSM/I instrument evaluation, IEEE Transactions on Geoscience and Remote Sensing, 28(5), 781–790, 1990.
20. C. Kummerow and L. Giglio, A passive microwave technique for estimating rainfall and vertical structure information from space. Part I: Algorithm description, Journal of Applied Meteorology, 33, 3–18, 1994.
21. D. Tsintikidis, J.L. Haferman, E.N. Anagnostou, W.F. Krajewski, and T.F. Smith, A neural network approach to estimating rainfall from spaceborne microwave data, IEEE Transactions on Geoscience and Remote Sensing, 35(5), 1079–1093, 1997.
22. T. Kawanishi, T. Sezai, Y. Ito, K. Imaoka, T. Takeshima, Y. Ishido, A. Shibata, M. Miura, H. Inahata, and R.W. Spencer, The Advanced Microwave Scanning Radiometer for the Earth observing system (AMSR-E), NASDA's contribution to the EOS for global energy and water cycle studies, IEEE Transactions on Geoscience and Remote Sensing, 41(2), 184–194, 2003.
23. T. Wilheit, C.D. Kummerow, and R. Ferraro, Rainfall algorithms for AMSR-E, IEEE Transactions on Geoscience and Remote Sensing, 41(2), 204–213, 2003.
24. J. Ikai and K. Nakamura, Comparison of rain rates over the ocean derived from TRMM microwave imager and precipitation radar, Journal of Atmospheric and Oceanic Technology, 20, 1709–1726, 2003.
25. C. Kummerow, W. Barnes, T. Kozu, J. Shiue, and J. Simpson, The tropical rainfall measuring mission (TRMM) sensor package, Journal of Atmospheric and Oceanic Technology, 15, 809–817, 1998.
26. C. Kummerow and L. Giglio, A passive microwave technique for estimating rainfall and vertical structure information from space. Part I: Algorithm description, Journal of Applied Meteorology, 33, 3–18, 1994.
27. T. Wilheit, C.D. Kummerow, and R. Ferraro, Rainfall algorithms for AMSR-E, IEEE Transactions on Geoscience and Remote Sensing, 41(2), 204–213, 2003.
28. T.J. Hewison and R. Saunders, Measurements of the AMSU-B antenna pattern, IEEE Transactions on Geoscience and Remote Sensing, 34(2), 405–412, 1996.
29. T. Mo, AMSU-A antenna pattern corrections, IEEE Transactions on Geoscience and Remote Sensing, 37(1), 103–112, 1999.
30. P.W. Rosenkranz, Improved rapid transmittance algorithm for microwave sounding channels, Proceedings of the 1998 IEEE International Geoscience and Remote Sensing Symposium, 2, 728–730, 1998.
31. B.H. Lambrigtsen, Calibration of the AIRS microwave instruments, IEEE Transactions on Geoscience and Remote Sensing, 41(2), 369–378, 2003.
32. H.H. Aumann, M.T. Chahine, C. Gautier, M.D. Goldberg, E. Kalnay, L.M. McMillin, H. Revercomb, P.W. Rosenkranz, W.L. Smith, D.H. Staelin, L.L. Strow, and J. Susskind, AIRS/AMSU/HSB on the Aqua mission: design, science objectives, data products, and processing systems, IEEE Transactions on Geoscience and Remote Sensing, 41(2), 253–264, 2003.
33. C.L. Parkinson, Aqua: an Earth-observing satellite mission to examine water and other climate variables, IEEE Transactions on Geoscience and Remote Sensing, 41(2), 173–183, 2003.
34. W.J. Blackwell, Retrieval of Cloud-Cleared Atmospheric Temperature Profiles from Hyperspectral Infrared and Microwave Observations, Sc.D. thesis, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, June 2002.
35. W.J. Blackwell, Retrieval of atmospheric temperature and moisture profiles from hyperspectral sounding data using a projected principal components transform and a neural network, Proceedings of the 2003 IEEE International Geoscience and Remote Sensing Symposium, 3, 2078–2081, 2003.
36. C.R. Cabrera-Mercader, Robust Compression of Multispectral Remote Sensing Data, Ph.D. thesis, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, Boston, MA, 1999.
37. F.W. Chen, Characterization of Clouds in Atmospheric Temperature Profile Retrievals, M.E. thesis, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, Boston, MA, June 1998.


38. J. Lee, Blind Noise Estimation and Compensation for Improved Characterization of Multivariate Processes, Ph.D. thesis, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, March 2000.
39. J. Lee and D.H. Staelin, Iterative signal-order and noise estimation for multivariate data, IEE Electronics Letters, 37(2), 134–135, 2001.
40. A. Mueller, Iterative Blind Separation of Gaussian Data of Unknown Order, M.E. thesis, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, June 2003.
41. I.T. Jolliffe, Principal Component Analysis, Springer-Verlag, New York, 2002.
42. L. Wald, Some terms of reference in data fusion, IEEE Transactions on Geoscience and Remote Sensing, 37(3), 1190–1193, 1999.
43. D.L. Hall and J. Llinas, An introduction to multisensor data fusion, Proceedings of the IEEE, 85(1), 6–23, 1997.
44. C. Pohl and J.L. van Genderen, Multisensor image fusion in remote sensing: concepts, methods, and applications, International Journal of Remote Sensing, 19(5), 823–854, 1998.
45. V.J.D. Tsai, Frequency-based fusion of multiresolution images, Proceedings of the 2003 IEEE International Geoscience and Remote Sensing Symposium, 6, 3665–3667, 2003.
46. P.W. Rosenkranz, Inversion of data from diffraction-limited multiwavelength remote sensors, 1. Linear case, Radio Science, 13(6), 1003–1010, 1978.
47. P.W. Rosenkranz, Inversion of data from diffraction-limited multiwavelength remote sensors, 2. Nonlinear dependence of observables on the geophysical parameters, Radio Science, 17(1), 245–256, 1982.
48. P.W. Rosenkranz, Inversion of data from diffraction-limited multiwavelength remote sensors, 3. Scanning multichannel microwave radiometer data, Radio Science, 17(1), 257–267, 1982.
49. R.P. Lippmann, An introduction to computing with neural nets, IEEE Acoustics, Speech, and Signal Processing Magazine, 4(2), 4–22, 1987.
50. S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan, New York, 1994.
51. M.T. Hagan and M.B. Menhaj, Training feedforward networks with the Marquardt algorithm, IEEE Transactions on Neural Networks, 5(6), 989–993, 1994.
52. D.W. Marquardt, An algorithm for least-squares estimation of nonlinear parameters, Journal of the Society for Industrial and Applied Mathematics, 11(2), 431–441, 1963.
53. D. Nguyen and B. Widrow, Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights, Proceedings of the International Joint Conference on Neural Networks, 3, 21–26, 1990.
54. Neural Network Toolbox User's Guide, The MathWorks, Inc., Natick, MA, 1998.
55. W.J. Blackwell et al., NPOESS Aircraft Sounder Testbed-Microwave (NAST-M): instrument description and initial flight results, IEEE Transactions on Geoscience and Remote Sensing, 39, 2444–2453, 2001.
56. A.J. Gasiewski, D.M. Jackson, J.R. Wang, P.E. Racette, and D.S. Zacharias, Airborne imaging of tropospheric emission at millimeter and submillimeter wavelengths, Proceedings IGARSS, 2, 663–665, 1994.
57. D.H. Staelin and F.W. Chen, Precipitation observations near 54 and 183 GHz using the NOAA-15 satellite, IEEE Transactions on Geoscience and Remote Sensing, 38, 2322–2332, 2000.
58. P.A. Arkin and P. Xie, The global precipitation climatology project: first algorithm intercomparison project, Bulletin of the American Meteorological Society, 75(3), 401–420, 1994.
59. E.A. Smith et al., Results of WetNet PIP-2 project, Journal of the Atmospheric Sciences, 55(9), 1483–1536, 1998.
60. R.F. Adler et al., Intercomparison of global precipitation products: the third precipitation intercomparison project (PIP-3), Bulletin of the American Meteorological Society, 82(7), 1377–1396, 2001.
61. M.D. Conner and G.W. Petty, Validation and intercomparison of SSM/I rain-rate retrieval methods over the continental U.S., Journal of Applied Meteorology, 37(7), 679–700, 1998.
62. A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis, John Wiley & Sons, New York, 2001.
63. F.R. Bach and M.I. Jordan, Kernel independent component analysis, Journal of Machine Learning Research, 3, 1–48, 2002.


64. J.C. Shiue, The advanced technology microwave sounder, a new atmospheric temperature and humidity sounder for operational polar-orbiting weather satellites, Proceedings of the SPIE, 4540, 159–165, 2001.
65. C. Muth, P.S. Lee, J.C. Shiue, and W.A. Webb, Advanced technology microwave sounder on NPOESS and NPP, Proceedings of the 2004 IEEE International Geoscience and Remote Sensing Symposium, 4, 2454–2458, 2004.
66. F.W. Chen and D.H. Staelin, AIRS/AMSU/HSB precipitation estimates, IEEE Transactions on Geoscience and Remote Sensing, 41(2), 410–417, 2003.
67. F.W. Chen, Global Estimation of Precipitation Using Opaque Microwave Bands, Ph.D. thesis, Massachusetts Institute of Technology, Massachusetts, 2004.

Part II

Image Processing for Remote Sensing

13 Polarimetric SAR Techniques for Remote Sensing of the Ocean Surface

Dale L. Schuler, Jong-Sen Lee, and Dayalan Kasilingam

CONTENTS
13.1 Introduction
13.2 Measurement of Directional Slopes and Wave Spectra
    13.2.1 Single Polarization versus Fully Polarimetric SAR Techniques
    13.2.2 Single-Polarization SAR Measurements of Ocean Surface Properties
    13.2.3 Measurement of Ocean Wave Slopes Using Polarimetric SAR Data
        13.2.3.1 Orientation Angle Measurement of Azimuth Slopes
        13.2.3.2 Orientation Angle Measurement Using the Circular-Pol Algorithm
    13.2.4 Ocean Wave Spectra Measured Using Orientation Angles
    13.2.5 Two-Scale Ocean-Scattering Model: Effect on the Orientation Angle Measurement
    13.2.6 Alpha Parameter Measurement of Range Slopes
        13.2.6.1 Cloude–Pottier Decomposition Theorem and the Alpha Parameter
        13.2.6.2 Alpha Parameter Sensitivity to Range Traveling Waves
        13.2.6.3 Alpha Parameter Measurement of Range Slopes and Wave Spectra
    13.2.7 Measured Wave Properties and Comparisons with Buoy Data
        13.2.7.1 Coastal Wave Measurements: Gualala River Study Site
        13.2.7.2 Open-Ocean Measurements: San Francisco Study Site
13.3 Polarimetric Measurement of Ocean Wave–Current Interactions
    13.3.1 Introduction
    13.3.2 Orientation Angle Changes Caused by Wave–Current Interactions
    13.3.3 Orientation Angle Changes at Ocean Current Fronts
    13.3.4 Modeling SAR Images of Wave–Current Interactions
13.4 Ocean Surface Feature Mapping Using Current-Driven Slick Patterns
    13.4.1 Introduction
    13.4.2 Classification Algorithm
        13.4.2.1 Unsupervised Classification of Ocean Surface Features
        13.4.2.2 Classification Using Alpha–Entropy Values and the Wishart Classifier
        13.4.2.3 Comparative Mapping of Slicks Using Other Classification Algorithms
13.5 Conclusions
References


13.1 Introduction

Selected methods that use synthetic aperture radar (SAR) image data to remotely sense ocean surfaces are described in this chapter. Fully polarimetric SAR radars provide much more usable information than conventional single-polarization radars. Algorithms, presented here, to measure directional wave spectra, wave slopes, wave–current interactions, and current-driven surface features use this additional information. Polarimetric techniques that measure directional wave slopes and spectra with data collected from a single aircraft, or satellite, collection pass are described here. Conventional single-polarization backscatter cross-section measurements require two orthogonal passes and a complex SAR modulation transfer function (MTF) to determine vector slopes and directional wave spectra. The algorithm to measure wave spectra is described in Section 13.2. In the azimuth (flight) direction, wave-induced perturbations of the polarimetric orientation angle are used to sense the azimuth component of the wave slopes. In the orthogonal range direction, a technique involving an alpha parameter from the well-known Cloude–Pottier ) polarimetric decomposition theorem is entropy/anisotropy/averaged alpha (H/A/ used to measure the range slope component. Both measurement types are highly sensitive to ocean wave slopes and are directional. Together, they form a means of using polarimetric SAR image data to make complete directional measurements of ocean wave slopes and wave slope spectra. NASA Jet Propulsion Laboratory airborne SAR (AIRSAR) P-, L-, and C-band data obtained during flights over the coastal areas of California are used as wave-field examples. Wave parameters measured using the polarimetric methods are compared with those obtained using in situ NOAA National Data Buoy Center (NDBC) buoy products. In a second topic (Section 13.3), polarization orientation angles are used to remotely sense ocean wave slope distribution changes caused by ocean wave–current interactions. 
The wave–current features studied include surface manifestations of ocean internal waves and wave interactions with current fronts. A model [1], developed at the Naval Research Laboratory (NRL), is used to determine the parametric dependencies of the orientation angle on internal wave current, wind-wave direction, and wind-wave speed. An empirical relation is cited that relates orientation angle perturbations to the underlying parametric dependencies [1]. A third topic (Section 13.4) deals with the detection and classification of biogenic slick fields. Various techniques, using the Cloude–Pottier decomposition and the Wishart classifier, are used to classify the slicks. An application utilizing current-driven ocean features, marked by slick patterns, is used to map spiral eddies. Finally, a related technique, using the polarimetric orientation angle, is used to segment slick fields from ocean wave slopes.

13.2 Measurement of Directional Slopes and Wave Spectra

13.2.1 Single Polarization versus Fully Polarimetric SAR Techniques

SAR systems conventionally use backscatter intensity-based algorithms [2] to measure physical ocean wave parameters. SAR instruments operating at a single polarization measure wave-induced backscatter cross-section (sigma-0) modulations that can be

Polarimetric SAR Techniques for Remote Sensing of the Ocean Surface


developed into estimates of surface wave slopes or wave spectra. These measurements, however, require a parametrically complex MTF to relate the SAR backscatter measurements to the physical ocean wave properties [3]. Section 13.2.3 through Section 13.2.6 outline a means of using fully polarimetric SAR (POLSAR) data with algorithms [4] to measure ocean wave slopes. In the Fourier-transform domain, this orthogonal slope information is used to estimate a complete directional ocean wave slope spectrum. A parametrically simple measurement of the slope is made by using POLSAR-based algorithms. Modulations of the polarization orientation angle, θ, are largely caused by waves traveling in the azimuth direction. The modulations are, to a lesser extent, also affected by range traveling waves. A method, originally used in topographic measurements [5], has been applied to the ocean and used to measure wave slopes. The method measures vector components of ocean wave slopes and wave spectra. Slopes smaller than 1° are measurable for ocean surfaces using this method. An eigenvector/eigenvalue decomposition average alpha parameter, ᾱ, described in Ref. [6], is used to measure wave slopes in the orthogonal range direction. Waves in the range direction cause modulation of the local incidence angle φ, which, in turn, changes the value of ᾱ. The alpha parameter ᾱ is "roll-invariant": it is not affected by slopes in the azimuth direction. Likewise, for ocean wave measurements, the orientation angle θ is largely insensitive to slopes in the range direction. An algorithm employing both (ᾱ, θ) is, therefore, capable of measuring slopes in any direction. The ability to measure a physical parameter in two orthogonal directions within an individual resolution cell is rare. Microwave instruments generally must have a two-dimensional (2D) imaging or scanning capability to obtain information in two orthogonal directions.
Motion-induced nonlinear "velocity-bunching" effects still present difficulties for wave measurements in the azimuth direction using POLSAR data. These difficulties are dealt with by using the same proven algorithms [3,7] that reduce nonlinearities for single-polarization SAR measurements.

13.2.2 Single-Polarization SAR Measurements of Ocean Surface Properties

SAR systems have previously been used for imaging ocean features such as surface waves, shallow-water bathymetry, internal waves, current boundaries, slicks, and ship wakes [8]. In all of these applications, the modulation of the SAR image intensity by the ocean feature makes the feature visible in the image [9]. When imaging ocean surface waves, the main modulation mechanisms have been identified as tilt modulation, hydrodynamic modulation, and velocity bunching [2]. Tilt modulation is due to changes in the local incidence angle caused by the surface wave slopes [10]. Tilt modulation is strongest for waves traveling in the range direction. Hydrodynamic modulation is due to the hydrodynamic interactions between the long-scale surface waves and the short-scale surface (Bragg) waves that contribute most of the backscatter at moderate incidence angles [11]. Velocity bunching is a modulation process that is unique to SAR imaging systems [12]. It is a result of the azimuth shifting of scatterers in the image plane, owing to the motion of the scattering surface. Velocity bunching is the highest for azimuth traveling waves. In the past, considerable effort had gone into retrieving quantitative surface wave information from SAR images of ocean surface waves [13]. Data from satellite SAR missions, such as ERS 1 and 2 and RADARSAT 1 and 2, had been used to estimate surface wave spectra from SAR image information. Generally, wave height and wave


Signal and Image Processing for Remote Sensing

slope spectra are used as quantitative overall descriptors of the ocean surface wave properties [14]. Over the years, several different techniques have been developed for retrieving wave spectra from SAR image spectra [7,15,16]. Linear techniques, such as those having a linear MTF, are used to relate the wave spectrum to the image spectrum. Individual MTFs are derived for the three primary modulation mechanisms. A transformation based on the MTF is used to retrieve the wave spectrum from the SAR image spectrum. Since the technique is linear, it does not account for any nonlinear processes in the modulation mechanisms. It has been shown that SAR image modulation is nonlinear under certain ocean surface conditions. As the sea state increases, the degree of nonlinear behavior generally increases. Under these conditions, the linear methods do not provide accurate quantitative estimates of the wave spectra [15]. Thus, the linear transfer function method has limited utility and can be used as a qualitative indicator. More accurate estimates of wave spectra require the use of nonlinear inversion techniques [15]. Several nonlinear inversion techniques have been developed for retrieving wave spectra from SAR image spectra. Most of these techniques are based on a technique developed in Ref. [7]. The original method used an iterative technique to estimate the wave spectrum from the image spectrum. Initial estimates are obtained using a linear transfer function similar to the one used in Ref. [15]. These estimates are used as inputs in the forward SAR imaging model, and the revised image spectrum is used to iteratively correct the previous estimate of the wave spectra. The accuracy of this technique is dependent on the specific SAR imaging model. Improvements to this technique [17] have incorporated closed-form descriptions of the nonlinear transfer function, which relates the wave spectrum to the SAR image spectrum. 
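As a concrete, deliberately simplified illustration of the linear transfer-function idea described above, the sketch below inverts a synthetic image spectrum by regularized division. The function name, the flat MTF, and the Gaussian swell spectrum are hypothetical stand-ins, not the published retrieval, which derives individual MTFs for the tilt, hydrodynamic, and velocity-bunching mechanisms.

```python
import numpy as np

def invert_linear_mtf(image_spectrum, mtf, eps=1e-6):
    """Linear retrieval: under S_img(k) = |M(k)|^2 * S_wave(k), estimate
    S_wave by regularized division, which suppresses noise at wavenumbers
    where the MTF response is weak."""
    gain = np.abs(mtf) ** 2
    return image_spectrum * gain / (gain ** 2 + eps)

# Synthetic check: with a known wave spectrum and a known (flat, purely
# illustrative) MTF, the inversion recovers the input spectrum.
k = np.linspace(0.01, 0.2, 64)                     # wavenumber axis (rad/m)
wave_spec = np.exp(-((k - 0.04) / 0.01) ** 2)      # swell peak near 0.04 rad/m
mtf = 1.5 * np.ones_like(k)                        # hypothetical net MTF
image_spec = np.abs(mtf) ** 2 * wave_spec
recovered = invert_linear_mtf(image_spec, mtf)
```

Because the division is only well conditioned where |M(k)| is large, this linear inversion degrades exactly where the text notes the linear methods fail: at wavenumbers dominated by nonlinear imaging effects.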
However, this transfer function also has to be evaluated iteratively. Further improvements to this method have been suggested in Refs. [3,18]. In this method, a cross-spectrum is generated between different looks of the same ocean wave scene. The primary advantage of this method is that it resolves the 180° ambiguity [3,18] of the wave direction. This method also reduces the effects of speckle in the SAR spectrum. Methods that incorporate additional a posteriori information about the wave field, which improves the accuracy of these nonlinear methods, have also been developed in recent years [19]. In all of the slope-retrieval methods, the one nonlinear mechanism that may completely destroy wave structure is velocity bunching [3,7]. Velocity bunching is a result of moving scatterers on the ocean surface either bunching or dilating in the SAR image domain. The shifting of the scatterers in the azimuth direction may, in extreme conditions, result in the destruction of the wave structure in the SAR image. SAR imaging simulations were performed at different range-to-velocity (R/V) ratios to study the effect of velocity bunching on the slope-retrieval algorithms. When the (R/V) ratio is artificially increased to large values, the effects of velocity bunching are expected to destroy the wave structure in the slope estimates. Simulations of the imaging process for a wide range of radar-viewing conditions indicate that the slope structure is preserved in the presence of moderate velocity-bunching modulation. It can be argued that for velocity bunching to affect the slope estimates, the (R/V) ratio has to be significantly larger than 100 s. The two data sets discussed here are designated "Gualala River" and "San Francisco." The Gualala River data set has the longest waves, and it also produces the best results. The (R/V) ratio for the AIRSAR missions was 59 s (Gualala) and 55 s (San Francisco).
These values suggest that the effects of velocity bunching are present but are not sufficiently strong to significantly affect the slope-retrieval process. However, for spaceborne SAR imaging applications, where the (R/V) ratio may be greater than 100 s, the effects of velocity bunching may limit the utility of all methods, especially in high sea states.

13.2.3 Measurement of Ocean Wave Slopes Using Polarimetric SAR Data

In this section, the techniques that were developed for the measurement of ocean surface slopes and wave spectra using the capabilities of fully polarimetric radars are discussed. Wave-induced perturbations of the polarization orientation angle are used to directly measure slopes for azimuth traveling waves. This technique is accurate for scattering from surface resolution cells where the sea return can be represented as a two-scale Bragg-scattering process.

13.2.3.1 Orientation Angle Measurement of Azimuth Slopes

It has been shown [5] that by measuring the orientation angle shift in the polarization signature, one can determine the effects of the azimuth surface tilts. In particular, the shift in the orientation angle is related to the azimuth surface tilt, the local incidence angle, and, to a lesser degree, the range tilt. This relationship is derived [20] and independently verified [6] as

    tan θ = tan ω / (sin φ − tan γ cos φ)    (13.1)

where θ, tan ω, tan γ, and φ are the shift in the orientation angle, the azimuth slope, the ground range slope, and the radar look angle, respectively. According to Equation 13.1, the azimuth tilts may be estimated from the shift in the orientation angle, if the look angle and range tilt are known. The orthogonal range slope tan γ can be estimated using the value of the local incidence angle associated with the alpha parameter for each pixel. The azimuth slope tan ω and the range slope tan γ provide complete slope information for each image pixel. For the ocean surface at scales of the size of the AIRSAR resolution cell (6.6 m × 8.2 m), the averaged tilt angles are small, and the denominator in Equation 13.1 may be approximated by sin φ for a wide range of look angle and ground range slope values. Under this approximation, the ocean azimuth slope, tan ω, is written as

    tan ω ≈ sin φ · tan θ    (13.2)
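For concreteness, Equation 13.1 and its small-tilt approximation, Equation 13.2, can be sketched in code. This is a minimal illustration; the function names are hypothetical, and angles follow the chapter's conventions (θ orientation-angle shift, φ look angle, tan γ ground range slope).

```python
import numpy as np

def azimuth_slope_exact(theta, phi, range_slope):
    """Invert Eq. (13.1), tan(theta) = tan(omega)/(sin(phi) - tan(gamma)*cos(phi)),
    for the azimuth slope tan(omega). All angles in radians."""
    return np.tan(theta) * (np.sin(phi) - range_slope * np.cos(phi))

def azimuth_slope_ocean(theta, phi):
    """Small-tilt ocean approximation, Eq. (13.2): tan(omega) ~ sin(phi)*tan(theta)."""
    return np.sin(phi) * np.tan(theta)

# For small range slopes the approximation tracks the exact inversion closely.
theta = np.deg2rad(0.5)        # 0.5 deg orientation-angle perturbation
phi = np.deg2rad(45.0)         # radar look angle
exact = azimuth_slope_exact(theta, phi, range_slope=0.01)
approx = azimuth_slope_ocean(theta, phi)
```

With a 1% range slope the two estimates differ by less than 10⁻⁴ in slope, which is why the sin φ approximation is adequate over the AIRSAR swath.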

The above equation is important because it provides a direct link between polarimetric SAR measurable parameters and physical slopes on the ocean surface. This estimation of ocean slopes relies only on (1) knowledge of the radar look angle (generally known from the SAR viewing geometry) and (2) measurement of the wave-perturbed orientation angle. In ocean areas where the average scattering mechanism is predominantly tilted-Bragg scatter, the orientation angle can be measured accurately for angular changes smaller than 1°, as demonstrated in Ref. [20]. POLSAR data can be represented by the scattering matrix for single-look complex data and by the Stokes matrix, the covariance matrix, or the coherency matrix for multi-look data. An orientation angle shift causes rotation of all these matrices about the line of sight. Several methods have been developed to estimate the azimuth slope–induced orientation angles for terrain and ocean applications. The "polarization signature maximum" method and the "circular polarization" method have proven to be the two most effective. Complete details of these methods and the relation of the orientation angle to orthogonal slopes and radar parameters are given in Refs. [21,22].

13.2.3.2 Orientation Angle Measurement Using the Circular-Pol Algorithm

Image processing was done with both the polarization signature maximum and the circular polarization algorithms. The results indicate that for ocean images a significant improvement in wave visibility is achieved when a circular polarization algorithm is chosen. In addition to this improvement, the circular polarization algorithm is


computationally more efficient. Therefore, the circular polarization method was chosen to estimate orientation angles. The most sensitive circular polarization estimator [21], which involves the RR (right-hand transmit, right-hand receive) and LL (left-hand transmit, left-hand receive) terms, is

    θ = [Arg(⟨S_RR S_LL*⟩) + π] / 4    (13.3)

A linear-pol basis has a similar transmit-and-receive convention, but the terms (HH, VV, HV, VH) involve horizontal (H) and vertical (V) transmitted, or received, components. The known relations between a circular-pol basis and a linear-pol basis are

    S_RR = (S_HH − S_VV + i 2 S_HV) / 2
    S_LL = (S_VV − S_HH + i 2 S_HV) / 2    (13.4)

Using the above equation, the Arg term of Equation 13.3 can be written as

    Arg(⟨S_RR S_LL*⟩) = tan⁻¹[ −4 Re⟨(S_HH − S_VV) S_HV*⟩ / (−⟨|S_HH − S_VV|²⟩ + 4⟨|S_HV|²⟩) ]    (13.5)
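Equation 13.3 through Equation 13.5 can be implemented in a few lines. The sketch below is a minimal illustration (the function name is hypothetical); it forms the circular-pol products from the linear-pol channels and applies a wrap so the estimate stays in the principal interval.

```python
import numpy as np

def orientation_angle(shh, svv, shv):
    """Circular-polarization orientation-angle estimator, Eq. (13.3)-(13.5):
    build S_RR and S_LL from the linear-pol channels (Eq. 13.4), then
    theta = [Arg(<S_RR S_LL*>) + pi]/4, wrapped into (-pi/4, pi/4]."""
    srr = (shh - svv + 2j * shv) / 2.0
    sll = (svv - shh + 2j * shv) / 2.0
    theta = (np.angle(np.mean(srr * np.conj(sll))) + np.pi) / 4.0
    if theta > np.pi / 4.0:            # wrap to the principal interval
        theta -= np.pi / 2.0
    return theta

# An untilted Bragg cell (svv > shh, no cross-pol) has zero orientation angle.
theta0 = orientation_angle(np.array([1.0 + 0j]),
                           np.array([1.5 + 0j]),
                           np.array([0.0 + 0j]))
```

In practice the ensemble average ⟨·⟩ would run over a small pixel neighborhood (e.g., the 3 × 3 boxcar used later in the chapter) rather than a single sample.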

The above equation gives the orientation angle, θ, in terms of three terms of the linear-pol coherency matrix. This algorithm was proven successful in Ref. [21]. An example of its accuracy is cited from related earlier studies involving wave–current interactions [1]. In these studies, it was shown that small wave slope asymmetries could be accurately detected as changes in the orientation angle. These small asymmetries had been predicted by theory [1], and their detection indicates the sensitivity of the circular-pol orientation angle measurement.

13.2.4 Ocean Wave Spectra Measured Using Orientation Angles

NASA/JPL AIRSAR data were taken (1994) at L-band, imaging a northern California coastal area near the town of Gualala (Mendocino County) and the Gualala River. This data set was used to determine whether the azimuth component of an ocean wave spectrum could be measured using orientation angle modulation. The radar resolution cell had dimensions of 6.6 m (range direction) × 8.2 m (azimuth direction), and 3 × 3 boxcar averaging was applied to the data input to the orientation angle algorithms. Figure 13.1 is an L-band, VV-pol, pseudo color-coded image of the northern California coastal area and the selected measurement study site. A wave system with an estimated dominant wavelength of 157 m is propagating through the site with a wind-wave direction of 306° (estimates from wave spectra, Figure 13.4). The scattering geometry for a single average tilt radar resolution cell is shown in Figure 13.2. Modulations in the polarization orientation angle induced by azimuth traveling ocean waves in the study area are shown in Figure 13.3a, and a histogram of the orientation angles is given in Figure 13.3b. An orientation angle spectrum versus wave number for azimuth direction waves propagating in the study area is given in Figure 13.4. The white rings correspond to ocean wavelengths of 50, 100, 150, and 200 m. The dominant 157-m wave is propagating at a heading of 306°. Figure 13.5a and Figure 13.5b give plots of spectral intensity versus wave number (a) for wave-induced orientation angle modulations and (b) for single-polarization (VV-pol) intensity modulations. The plots are of wave spectra taken in the direction that maximizes the dominant wave peak. The orientation angle–dominant wave peak


FIGURE 13.1 (See color insert following page 302.) An L-band, VV-pol AIRSAR image of northern California coastal waters (Gualala River dataset), showing ocean waves (direction 306°) propagating through a 512 × 512 study-site box.

(Figure 13.5a) has a significantly higher signal-to-background ratio than the conventional intensity-based VV-pol dominant wave peak (Figure 13.5b). Finally, the orientation angles measured within the study site were converted into azimuth direction slopes using an average incidence angle and Equation 13.2. From the estimates of these values, the ocean rms azimuth slopes were computed. These values are given in Table 13.1.
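The spectral processing just described can be sketched as follows. This is a minimal, hypothetical illustration (the function name and the synthetic single-wave scene are not from the chapter), assuming the study-site orientation-angle image is available as a 2-D array in radians.

```python
import numpy as np

def slope_spectrum(theta_img, phi, dx, dy):
    """Azimuth-slope wavenumber spectrum from an orientation-angle image.
    Pixels are converted to slopes with Eq. (13.2), the mean tilt is
    removed, and a 2-D FFT gives the spectral power.
    dx, dy: pixel sizes (m) in range (x) and azimuth (y)."""
    slopes = np.sin(phi) * np.tan(theta_img)      # Eq. (13.2), per pixel
    slopes = slopes - slopes.mean()               # remove mean tilt bias
    spec = np.abs(np.fft.fftshift(np.fft.fft2(slopes))) ** 2
    ky = 2 * np.pi * np.fft.fftshift(np.fft.fftfreq(theta_img.shape[0], d=dy))
    kx = 2 * np.pi * np.fft.fftshift(np.fft.fftfreq(theta_img.shape[1], d=dx))
    return spec, kx, ky

# Synthetic check: a single azimuth-traveling wave of 157 m wavelength
# should produce a spectral peak near k = 2*pi/157 ~ 0.04 rad/m.
ny, nx, dy, dx = 128, 128, 8.2, 6.6               # AIRSAR-like pixel sizes
y = np.arange(ny) * dy
theta_img = np.deg2rad(1.0) * np.sin(2 * np.pi * y / 157.0)[:, None] * np.ones((1, nx))
spec, kx, ky = slope_spectrum(theta_img, np.deg2rad(45.0), dx, dy)
k_peak = abs(ky[np.unravel_index(np.argmax(spec), spec.shape)[0]])
```

The recovered peak wavenumber is limited by the FFT bin spacing (about 0.006 rad/m for a 128 × 8.2 m strip), which is adequate to localize the 157-m dominant wave.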


FIGURE 13.2 Geometry of scattering from a single, tilted resolution cell. In the Gualala River dataset, the resolution cell has dimensions 6.6 m (range direction) × 8.2 m (azimuth direction).

FIGURE 13.3 (a) Image of modulations in the orientation angle, θ, in the study site and (b) a histogram of the distribution of study-site θ values.


13.2.5 Two-Scale Ocean-Scattering Model: Effect on the Orientation Angle Measurement

In Section 13.2.3.2, Equation 13.5 is given for the orientation angle. This equation gives the orientation angle θ as a function of three terms from the polarimetric coherency matrix T. Scattering has so far only been considered as occurring from a slightly rough, tilted surface equal to or greater than the size of the radar resolution cell (see Figure 13.2). The surface is planar and has a single tilt θ_s. This section examines the effects of having a distribution of azimuth tilts, p(ψ), within the resolution cell, rather than a single averaged tilt. For single-look or multi-look processed data, the coherency matrix is defined as

    T = ⟨k k*ᵀ⟩ = (1/2) ×
        [ ⟨|S_HH + S_VV|²⟩               ⟨(S_HH + S_VV)(S_HH − S_VV)*⟩    2⟨(S_HH + S_VV) S_HV*⟩
          ⟨(S_HH − S_VV)(S_HH + S_VV)*⟩  ⟨|S_HH − S_VV|²⟩                 2⟨(S_HH − S_VV) S_HV*⟩
          2⟨S_HV (S_HH + S_VV)*⟩         2⟨S_HV (S_HH − S_VV)*⟩           4⟨|S_HV|²⟩ ]

    with k = (1/√2) [S_HH + S_VV, S_HH − S_VV, 2 S_HV]ᵀ    (13.6)

We now follow the approach of Cloude and Pottier [23]. The composite surface consists of flat, slightly rough "facets," which are tilted in the azimuth direction with a distribution of tilts, p(ψ):

    p(ψ) = 1/(2β)  for |ψ − s| ≤ β
    p(ψ) = 0       otherwise    (13.7)

where s is the average surface azimuth tilt of the entire radar resolution cell. The effect on T of having both (1) a mean bias azimuthal tilt θ_s and (2) a distribution of azimuthal tilts of half-width β has been calculated in Ref. [22] as

    T = [ A                        B sinc(2β) cos 2θ_s                  −B sinc(2β) sin 2θ_s
          B* sinc(2β) cos 2θ_s     2C (sin² 2θ_s + sinc(4β) cos 4θ_s)   (C/2)(1 − 2 sinc(4β)) sin 4θ_s
          −B* sinc(2β) sin 2θ_s    (C/2)(1 − 2 sinc(4β)) sin 4θ_s       2C (cos² 2θ_s − sinc(4β) cos 4θ_s) ]    (13.8)


FIGURE 13.4 (See color insert following page 302.) Orientation angle spectrum versus wave number for azimuth direction waves propagating through the study site. The white rings correspond to wavelengths of 50, 100, 150, and 200 m. The dominant wave, of wavelength 157 m, is propagating at a heading of 306°.

where sinc(x) = sin(x)/x and

    A = ⟨|S_HH + S_VV|²⟩,  B = ⟨(S_HH + S_VV)(S_HH* − S_VV*)⟩,  C = 0.5⟨|S_HH − S_VV|²⟩
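The sinc factors in Equation 13.8 come from averaging the rotation terms over the uniform tilt distribution of Equation 13.7, and that averaging is easy to verify numerically. A small sketch (variable names are illustrative):

```python
import numpy as np

def sinc(x):
    """Unnormalized sinc(x) = sin(x)/x, as used in Eq. (13.8)."""
    return np.sinc(x / np.pi)          # numpy's sinc is sin(pi*x)/(pi*x)

# Average cos(2*psi) over the uniform tilt distribution of Eq. (13.7),
# psi in [s - beta, s + beta]; analytically this equals
# sinc(2*beta) * cos(2*s), the factor that multiplies B in Eq. (13.8).
beta, s = 0.3, 0.1                     # half-width and mean tilt (radians)
psi = np.linspace(s - beta, s + beta, 200001)
numeric = np.cos(2 * psi).mean()
analytic = sinc(2 * beta) * np.cos(2 * s)
```

The same averaging applied to the cos 4ψ and sin 4ψ terms produces the sinc(4β) factors, so a wider facet-tilt spread (larger β) pulls every tilt-dependent entry of T toward zero.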

Equation 13.8 reveals the changes due to the tilt distribution (β) and bias (θ_s) that occur in all terms, except in the term A = ⟨|S_HH + S_VV|²⟩, which is roll-invariant. In the corresponding expression for the orientation angle, all of the other terms, except the denominator term ⟨|S_HH − S_VV|²⟩, are modified:

    θ = (1/4) [Arg(⟨S_RR S_LL*⟩) + π]
      = (1/4) [tan⁻¹( −4 Re⟨(S_HH − S_VV) S_HV*⟩ / (−⟨|S_HH − S_VV|²⟩ + 4⟨|S_HV|²⟩) ) + π]    (13.9)

From Equation 13.8 and Equation 13.9, it can be determined that the exact estimation of the orientation angle θ becomes more difficult as the distribution of ocean tilts becomes stronger and wider, because the ocean surface becomes progressively rougher.


FIGURE 13.5 Plots of spectral intensity versus azimuthal wave number (a) for wave-induced orientation angle modulations and (b) for VV-pol intensity modulations. The plots are taken in the propagation direction of the dominant wave (306°).

13.2.6 Alpha Parameter Measurement of Range Slopes

A second measurement technique is needed to remotely sense waves that have significant propagation components in the range direction. The technique must be more sensitive than current intensity-based techniques that depend on tilt and hydrodynamic modulations. Physically based POLSAR measurements of ocean slopes in the range direction are achieved using a technique involving the "alpha" parameter of the Cloude–Pottier polarimetric decomposition theorem [23].

13.2.6.1 Cloude–Pottier Decomposition Theorem and the Alpha Parameter

The Cloude–Pottier entropy, anisotropy, and alpha polarization decomposition theorem [23] introduces a new parameterization of the eigenvectors of the 3 × 3 averaged coherency matrix ⟨T⟩ in the form


TABLE 13.1
Northern California: Gualala Coastal Results

Parameter | Bodega Bay, CA, 3-m Discus Buoy 46013 | Point Arena, CA, Wind Station | Orientation Angle Method | Alpha Angle Method
Dominant wave period (s) | 10.0 | N/A | 10.03 (from dominant wave number) | 10.2 (from dominant wave number)
Dominant wavelength (m) | 156 (from period, depth) | N/A | 157 (from wave spectra) | 162 (from wave spectra)
Dominant wave direction (°) | 320 (est. from wind direction) | 284 (est. from wind direction) | 306 (from wave spectra) | 306 (from wave spectra)
rms slope, azimuth direction (°) | N/A | N/A | 1.58 | N/A
rms slope, range direction (°) | N/A | N/A | N/A | 1.36
Estimate of wave height (m) | 2.4 (significant wave height) | N/A | 2.16 (est. from rms slope, wave number) | 1.92 (est. from rms slope, wave number)

Date: 7/15/94; data start time (UTC): 20:04:44 (BB, PA), 20:02:98 (AIRSAR); wind speed: 1.0 m/s (BB), 2.9 m/s (PA), mean = 1.95 m/s; wind direction: 320° (BB), 284° (PA), mean = 302°; buoy: "Bodega Bay" (46013) = BB; location: 38.23° N, 123.33° W; water depth: 122.5 m; wind station: "Point Arena" (PTAC-1) = PA; location: 38.96° N, 123.74° W; study-site location: 38°39.6′ N, 123°35.8′ W.

    ⟨T⟩ = [U₃] [ λ_1  0    0
                 0    λ_2  0
                 0    0    λ_3 ] [U₃]*ᵀ    (13.10)

where

    [U₃] = [ cos α_1 e^{jφ_1}          cos α_2 e^{jφ_2}          cos α_3 e^{jφ_3}
             sin α_1 cos β_1 e^{jδ_1}  sin α_2 cos β_2 e^{jδ_2}  sin α_3 cos β_3 e^{jδ_3}
             sin α_1 sin β_1 e^{jγ_1}  sin α_2 sin β_2 e^{jγ_2}  sin α_3 sin β_3 e^{jγ_3} ]    (13.11)

The average estimate of the alpha parameter is

    ᾱ = P_1 α_1 + P_2 α_2 + P_3 α_3    (13.12)

where

    P_i = λ_i / (λ_1 + λ_2 + λ_3)    (13.13)
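Equation 13.12 and Equation 13.13 follow directly from an eigendecomposition of the coherency matrix. The sketch below is a minimal illustration (the helper name `mean_alpha` is hypothetical); each α_i is recovered from the first component of eigenvector i, per the parameterization of Equation 13.11.

```python
import numpy as np

def mean_alpha(T):
    """Average alpha (Eq. 13.12-13.13) from a 3x3 Hermitian coherency
    matrix: alpha_i = arccos(|first component of eigenvector i|),
    weighted by P_i = lambda_i / sum(lambda_j)."""
    lam, U = np.linalg.eigh(T)                    # eigenvalues, ascending
    lam = np.clip(lam, 0.0, None)                 # guard round-off negatives
    P = lam / lam.sum()
    alphas = np.arccos(np.clip(np.abs(U[0, :]), 0.0, 1.0))
    return float(np.sum(P * alphas))

# Bragg-like check: a rank-1 matrix built from k = [svv+shh, svv-shh, 0]
# must reproduce tan(alpha) = (svv - shh)/(svv + shh) (Eq. 13.18).
shh, svv = 0.6, 1.0
kvec = np.array([svv + shh, svv - shh, 0.0], dtype=complex)
T = np.outer(kvec, kvec.conj())
alpha = mean_alpha(T)
expected = np.arctan((svv - shh) / (svv + shh))
```

Because the zero eigenvalues carry zero probability weight, a rank-1 (pure Bragg) coherency matrix yields ᾱ = α_1 exactly, consistent with the dominant-eigenvector argument made below for ocean backscatter.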

The individual alphas are those of the three eigenvectors, and the P_i are probabilities defined with respect to the eigenvalues. In this method, the average alpha ᾱ is used and is, for simplicity, written as α. For ocean backscatter, the contributions to the average alpha are dominated, however, by the first eigenvalue/eigenvector term. The alpha parameter, developed from the Cloude–Pottier polarimetric scattering decomposition theorem [23], has desirable directional measurement properties. It is (1) roll-invariant in the azimuth direction, and (2) in the range direction, it is highly sensitive to wave-induced modulations of the local incidence angle φ. Thus, the


alpha parameter is well suited for measuring wave components traveling in the range direction and discriminates against wave components traveling in the azimuth direction.

13.2.6.2 Alpha Parameter Sensitivity to Range Traveling Waves

The alpha angle sensitivity to range traveling waves may be estimated using the small perturbation scattering model (SPSM) as a basis. For flat, slightly rough scattering areas that can be characterized by Bragg scattering, the scattering matrix has the form

    S = [ S_HH   0
          0      S_VV ]    (13.14)

The Bragg-scattering coefficients S_HH and S_VV are given by

    S_HH = [cos φ_i − √(ε_r − sin² φ_i)] / [cos φ_i + √(ε_r − sin² φ_i)]

and

    S_VV = (ε_r − 1) [sin² φ_i − ε_r (1 + sin² φ_i)] / [ε_r cos φ_i + √(ε_r − sin² φ_i)]²    (13.15)

The alpha angle is defined such that the eigenvectors of the coherency matrix T are parameterized by a vector k as

    k = [cos α, sin α cos β e^{jδ}, sin α sin β e^{jγ}]ᵀ    (13.16)

For Bragg scattering, one may assume that there is only one dominant eigenvector (depolarization is negligible), and that eigenvector is given by

    k = [S_VV + S_HH, S_VV − S_HH, 0]ᵀ    (13.17)

Since there is only one dominant eigenvector for Bragg scattering, α = α_1. For a horizontal, slightly rough resolution cell, the orientation angle β = 0, and δ may be set to zero. With these constraints, comparing Equation 13.16 and Equation 13.17 yields

    tan α = (S_VV − S_HH) / (S_VV + S_HH)    (13.18)

For ε_r → ∞,

    S_VV = 1 + sin² φ_i  and  S_HH = cos² φ_i    (13.19)

which yields

    tan α = sin² φ_i    (13.20)
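The chain from Equation 13.15 through Equation 13.20 can be sketched numerically. The helper below is hypothetical, not the chapter's code; it evaluates the Bragg coefficients for a (possibly complex) dielectric constant and forms the alpha angle from the magnitudes, which reduces to tan α = sin² φ_i in the perfect-conductor limit.

```python
import numpy as np

def bragg_alpha(phi_i, eps_r):
    """Alpha angle for Bragg scattering: S_HH and S_VV from Eq. (13.15),
    then tan(alpha) = |S_VV - S_HH| / |S_VV + S_HH| (Eq. 13.18).
    phi_i in radians; eps_r may be complex (e.g., sea water)."""
    root = np.sqrt(eps_r - np.sin(phi_i) ** 2)
    shh = (np.cos(phi_i) - root) / (np.cos(phi_i) + root)
    svv = ((eps_r - 1) * (np.sin(phi_i) ** 2 - eps_r * (1 + np.sin(phi_i) ** 2))
           / (eps_r * np.cos(phi_i) + root) ** 2)
    return np.arctan(np.abs((svv - shh) / (svv + shh)))

phi = np.deg2rad(40.0)
alpha_sea = bragg_alpha(phi, 80 - 70j)            # sea-water dielectric
alpha_pc = np.arctan(np.sin(phi) ** 2)            # Eq. (13.20) limiting form
```

Evaluating `bragg_alpha` over 0–90° reproduces the qualitative behavior of Figure 13.6: α rises monotonically from 0 toward 45° with increasing incidence angle.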

Figure 13.6 shows the alpha angle as a function of incidence angle for ε_r → ∞ (blue) and for ε_r = 80 − 70j (red), which is a representative dielectric constant of sea water. The sensitivity of the alpha values to incidence angle changes (this is effectively the polarimetric

FIGURE 13.6 (See color insert following page 302.) Small perturbation model dependence of alpha on the incidence angle. The red curve is for a dielectric constant representative of sea water (80 − 70j) and the blue curve is for a perfectly conducting surface.

MTF), as the range slope estimation is dependent on the derivative of α with respect to φ_i. For ε_r → ∞,

    Δα/Δφ_i = sin 2φ_i / (1 + sin⁴ φ_i)    (13.21)
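Equation 13.21 can be checked against a numerical derivative of Equation 13.20; the short sketch below does exactly that (the variable names are illustrative).

```python
import numpy as np

# Check Eq. (13.21) against a numerical derivative of Eq. (13.20):
# alpha(phi) = arctan(sin^2 phi), so d(alpha)/d(phi) = sin(2*phi)/(1 + sin^4 phi).
phi = np.deg2rad(35.0)
h = 1e-6
alpha = lambda p: np.arctan(np.sin(p) ** 2)
numeric = (alpha(phi + h) - alpha(phi - h)) / (2 * h)
analytic = np.sin(2 * phi) / (1 + np.sin(phi) ** 4)
```

The agreement is at machine-precision level for the central difference, confirming the closed form used to argue that the effective polarimetric MTF exceeds 0.5 across the AIRSAR swath.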

Figure 13.7 shows this curve (blue) and the exact curve for ε_r = 80 − 70j (red). Note that for the typical AIRSAR range of incidence angles (20–60°) across the swath, the effective MTF is high (>0.5).

13.2.6.3 Alpha Parameter Measurement of Range Slopes and Wave Spectra

Model studies [6] provide an estimate of the parametric relation of α versus the incidence angle φ for an assumed Bragg-scatter model. The sensitivity (i.e., the slope of the curve of α(φ)) was large enough (Figure 13.6) to warrant investigation using real POLSAR ocean backscatter data. In Figure 13.8a, a curve of α versus the incidence angle φ is given for a strip of Gualala data in the range direction that has been averaged over 10 pixels in the azimuth direction. This curve shows a high sensitivity for the slope of α(φ). Figure 13.8b gives a histogram of the frequency of occurrence of the alpha values. The curve of Figure 13.8a was smoothed using a least-squares fit of the α(φ) data to a third-order polynomial. This closely fitting curve was used to transform the α values into corresponding incidence angle φ perturbations. Pottier [6] used a model-based approach and fitted a third-order polynomial to the α(φ) curve (red) of Figure 13.6 instead of using the smoothed, actual, image α(φ) data. A distribution of φ values was formed, and the rms range slope value was determined. The rms range slope values for the data sets are given in Table 13.1 and Table 13.2.


FIGURE 13.7 (See color insert following page 302.) Derivative of alpha with respect to the incidence angle. The red curve is for a sea water dielectric and the blue curve is for a perfectly conducting surface.

Finally, to measure an alpha wave spectrum, an image of the study area is formed with the mean of α(φ) removed line by line in the range direction. An FFT of the study area results in the wave spectrum shown in Figure 13.9. The spectrum of Figure 13.9 is an alpha spectrum in the range direction. It can be converted to a range direction wave slope spectrum by transforming the slope values obtained from the smoothed alpha, α(φ), values.

FIGURE 13.8 Empirical determination of (a) the sensitivity of the alpha parameter to the radar incidence angle (for Gualala River data) and (b) a histogram of the alpha values occurring within the study site.


13.2.7 Measured Wave Properties and Comparisons with Buoy Data

The ocean wave properties estimated from the L- and P-band SAR data sets and the algorithms are (1) the dominant wavelength, (2) the dominant wave direction, (3) the rms slopes (azimuth and range), and (4) the average dominant wave height. The NOAA NDBC buoys provided data on (1) the dominant wave period, (2) wind speed and direction, (3) significant wave height, and (4) wave classification (swell and wind waves). Both the Gualala and the San Francisco data sets involved waves classified as swell. Estimates of the average wave period can be determined either from buoy data or from the SAR-determined dominant wave number and water depth (see Equation 13.22). The dominant wavelength and direction are obtained from the wave spectra (see Figure 13.4 and Figure 13.9). The rms slopes in the azimuth direction are determined from the distribution of orientation angles converted to slope angles using Equation 13.2. The rms slopes in the range direction are determined from the distribution of alpha angles converted to slope angles using values of the smoothed curve fitted to the data of Figure 13.8a. Finally, an estimate of the average wave height, H_d, of the dominant wave was made using the peak-to-trough rms slope in the propagation direction, S_rms, and the dominant wavelength, λ_d. The estimated average dominant wave height was then determined from tan(S_rms) = H_d/(λ_d/2). This average dominant wave height estimate was compared with the (related) significant wave height provided by the NDBC buoy. The results of the measurement comparisons are given in Table 13.1 and Table 13.2.

13.2.7.1 Coastal Wave Measurements: Gualala River Study Site

For the Gualala River data set, parameters were calculated to characterize ocean waves present in the study area. Table 13.1 gives a summary of the ocean parameters that were determined using the data set as well as wind conditions and air and sea temperatures at the nearby NDBC buoy (‘‘Bodega Bay’’) and wind station (‘‘Point Arena’’) sites. The most important measured SAR parameters were rms wave slopes (azimuth and range directions), rms wave height, dominant wave period, and dominant wavelength. These quantities were estimated using the full-polarization data and the NDBC buoy data.
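The wave-height estimate used in these comparisons, tan(S_rms) = H_d/(λ_d/2), is a one-line calculation. A minimal sketch (the function name is illustrative), using the Gualala values quoted in Table 13.1:

```python
import numpy as np

def dominant_wave_height(rms_slope_deg, wavelength_m):
    """Average dominant wave height from tan(S_rms) = H_d / (lambda_d / 2),
    the peak-to-trough slope estimate used for the buoy comparisons."""
    return np.tan(np.deg2rad(rms_slope_deg)) * wavelength_m / 2.0

# Gualala values: a 1.58 deg azimuth rms slope and a 157 m dominant wave
# give roughly the 2.16 m height quoted in Table 13.1.
H_d = dominant_wave_height(1.58, 157.0)
```

This is an average dominant wave height, so it is expected to sit somewhat below the buoy's significant wave height (2.4 m at Bodega Bay), as the tables show.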

TABLE 13.2
Open Ocean: Pacific Swell Results

In situ measurements (NDBC buoys):

Parameter                           San Francisco, CA,               Half Moon Bay, CA,
                                    3 m Discus Buoy 46026            3 m Discus Buoy 46012
Dominant wave period (s)            15.7                             15.7
Dominant wavelength (m)             376 (from period, depth)         364 (from period, depth)
Dominant wave direction (°)         289 (est. from wind direction)   280 (est. from wind direction)
rms slopes, azimuth direction (°)   N/A                              N/A
rms slopes, range direction (°)     N/A                              N/A
Estimate of wave height (m)         3.10 (significant wave height)   2.80 (significant wave height)

SAR measurements:

Parameter                           Orientation Angle Method                  Alpha Angle Method
Dominant wave period (s)            15.17 (from dominant wave number)         15.23 (from dominant wave number)
Dominant wavelength (m)             359 (from wave spectra)                   362 (from wave spectra)
Dominant wave direction (°)         265 (from wave spectra)                   265 (from wave spectra)
rms slopes, azimuth direction (°)   0.92                                      N/A
rms slopes, range direction (°)     N/A                                       0.86
Estimate of wave height (m)         2.88 (est. from rms slope, wave number)   2.72 (est. from rms slope, wave number)

Note: Date: 7/17/88; data start time (UTC): 00:45:26 (Buoys SF, HMB), 00:52:28 (AIRSAR); wind speed: 8.1 m/s (SF), 5.0 m/s (HMB), mean = 6.55 m/s; wind direction: 289° (SF), 280° (HMB), mean = 284.5°; Buoys: "San Francisco" (46026) = SF; location: 37.75 N 122.82 W; water depth: 52.1 m; "Half Moon Bay" (46012) = HMB; location: 37.36 N 122.88 W; water depth: 87.8 m.

Polarimetric SAR Techniques for Remote Sensing of the Ocean Surface


Signal and Image Processing for Remote Sensing



FIGURE 13.9 (See color insert following page 302.) Spectrum of waves in the range direction using the alpha parameter from the Cloude–Pottier decomposition method. Wave direction is 306° and dominant wavelength is 162 m.

The dominant wave at the Bodega Bay buoy during the measurement period is classified as a long-wavelength swell. The contribution from wind-wave systems or other swell components is small relative to the single dominant wave system. Using the surface gravity wave dispersion relation, one can calculate the dominant wavelength at this buoy location, where the water depth is 122.5 m. The dispersion relation for surface water waves at finite depth is

ω_W² = g k_W tanh(k_W H)        (13.22)

where ω_W is the wave frequency, k_W is the wave number (2π/λ), and H is the water depth. The calculated value for λ is given in Table 13.1. A spectral profile similar to Figure 13.5a was developed for the alpha parameter technique, and a dominant wave was measured having a wavelength of 156 m and a propagation direction of 306°. Estimates of the ocean parameters obtained using the orientation angle and alpha angle algorithms are summarized in Table 13.1.

13.2.7.2 Open-Ocean Measurements: San Francisco Study Site

AIRSAR P-band image data were obtained for an ocean swell traveling in the azimuth direction. The location of this image was to the west of San Francisco Bay. It is a valuable


FIGURE 13.10 P-band span image of near-azimuth traveling (265°) swell in the Pacific Ocean off the coast of California near San Francisco.

data set because its location is near two NDBC buoys ("San Francisco" and "Half Moon Bay"). Figure 13.10 gives a SPAN image of the ocean scene. The long-wavelength swell is clearly visible. The covariance matrix data were first Lee-filtered to reduce speckle noise [24] and then corrected radiometrically. A polarimetric signature was developed for a 512 × 512 segment of the image, and some distortion was noted. Measuring the distribution of the phase between the HH-pol and VV-pol backscatter returns eliminated this distortion. For the ocean, this distribution should have a mean nearly equal to zero. The recalibration procedure set the mean to zero, and the distortion in the polarimetric signature was corrected. Figure 13.11a gives a plot of the spectral intensity (cross-section modulation) versus the wave number in the direction of the dominant wave propagation. Figure 13.11b presents a spectrum of orientation angles versus the wave number. The major peak in both plots, caused by the visible swell, occurs at a wave number of 0.0175 m⁻¹, or a wavelength of 359 m. Using Equation 13.22, the dominant wavelength was calculated at the San Francisco/Half Moon Bay buoy positions and depths. Estimates of the wave parameters developed from this data set using the orientation and alpha angle algorithms are presented in Table 13.2.
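Equation 13.22 is transcendental in k_W, so calculating the dominant wavelength at a given buoy depth and period requires a numerical root-find. A sketch using Newton iteration (function names are illustrative):

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def wavenumber(period_s: float, depth_m: float) -> float:
    """Solve omega^2 = g*k*tanh(k*H) (Equation 13.22) for the wave number k
    by Newton iteration, starting from the deep-water value k0 = omega^2/g."""
    omega2 = (2.0 * math.pi / period_s) ** 2
    k = omega2 / G  # deep-water initial guess
    for _ in range(50):
        th = math.tanh(k * depth_m)
        f = G * k * th - omega2
        dfdk = G * th + G * k * depth_m * (1.0 - th * th)
        k -= f / dfdk
    return k

def wavelength(period_s: float, depth_m: float) -> float:
    """Dominant wavelength (m) from period (s) and water depth (m)."""
    return 2.0 * math.pi / wavenumber(period_s, depth_m)

# The 15.7 s swell of Table 13.2: finite depth shortens the deep-water wavelength
print(wavelength(15.7, 52.1) < wavelength(15.7, 4000.0))  # → True
```

In the deep-water limit the result reduces to the familiar λ = gT²/2π, which makes a convenient sanity check.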

FIGURE 13.11 (a) Wave number spectrum of P-band intensity modulations and (b) a wave number spectrum of orientation angle modulations. In both panels, spectral intensity is plotted against azimuthal wave number (0.00–0.08 m⁻¹). The plots are taken in the propagation direction of the dominant wave (265°).

13.3 Polarimetric Measurement of Ocean Wave–Current Interactions

13.3.1 Introduction

Studies have been carried out on the use of polarization orientation angles to remotely sense ocean wave slope distribution changes caused by wave–current interactions. The wave–current features studied here involve the surface manifestations of internal waves [1,25,26–29,30–32] and wave modifications at oceanic current fronts. Studies have shown that polarimetric SAR data may be used to measure bare surface roughness [33] and terrain topography [34,35]. Techniques have also been developed for measuring directional ocean wave spectra [36].


The polarimetric SAR image data used in all of the studies are NASA JPL/AIRSAR P-, L-, and C-band, quad-pol microwave backscatter data. AIRSAR images of internal waves were obtained from the 1992 Joint US/Russia Internal Wave Remote Sensing Experiment (JUSREX'92) conducted in the New York Bight [25,26]. AIRSAR data on current fronts were obtained during the NRL Gulf Stream Experiment (NRL-GS'90). The NRL experiment is described in Ref. [20]. Extensive sea-truth is available for both of these experiments. These studies were motivated by the observation that strong perturbations occur in the polarization orientation angle θ in the vicinity of internal waves and current fronts. The remote sensing of orientation angle changes associated with internal waves and current fronts is an application that has only recently been investigated [27–29]. Orientation angle changes should also occur for the related SAR application involving surface expressions of shallow-water bathymetry [37]. In the studies outlined here, polarization orientation angle changes are shown to be associated with wave–current interaction features. Orientation angle changes are not, however, produced by all types of ocean surface features. For example, orientation angle changes have been successfully used here to discriminate internal wave signatures from other ocean features, such as surfactant slicks, which produce no mean orientation angle changes.

13.3.2 Orientation Angle Changes Caused by Wave–Current Interactions

A study was undertaken to determine the effect that several important types of wave–current interactions have on the polarization orientation angle. The study involved both actual SAR data and an NRL theoretical model described in Ref. [38]. An example of a JPL/AIRSAR VV-polarization, L-band image of several strong, intersecting, internal wave packets is given in Figure 13.12. Packets of internal waves are generated from parent solitons as the soliton propagates into shallower water at, in this case, the continental shelf break. The white arrow in Figure 13.12 indicates the propagation direction of a wedge of internal waves (bounded by the dashed lines). The packet members within the area of this wedge were investigated. Radar cross-section (σ₀) intensity perturbations for the type of internal waves encountered in the New York Bight have been calculated in Refs. [30,31,39] and elsewhere. Related perturbations also occur in the ocean wave height and slope spectra. For the solitons often found in the New York Bight area, these perturbations become significantly larger for ocean wavelengths longer than about 0.25 m and shorter than 10–20 m. Thus, the study is essentially concerned with slope changes at meter-length wave scales. The AIRSAR slant range resolution cell size for these data is 6.6 m, and the azimuth resolution cell size is 12.1 m. These resolutions are fine enough for the SAR backscatter to be affected by perturbed wave slopes (meter-length scales). The changes in the orientation angle caused by these wave perturbations are seen in Figure 13.13. The magnitude of these perturbations covers a range θ = [−1° to +1°]. The orientation angle perturbations have a large spatial extent (>100 m for the internal wave soliton width). The hypothesis assumed was that wave–current interactions make the meter-wavelength slope distributions asymmetric. A profile of orientation angle perturbations caused by the internal wave study packet is given in Figure 13.14a.
The values are obtained along the propagation vector line of Figure 13.12. Figure 13.14b gives a comparison of the orientation angle profile (solid line) and a normalized VV-pol backscatter intensity profile (dot–dash line) along the same interval. Note that the orientation angle positive peaks (white stripe areas, Figure 13.13) align with the negative


FIGURE 13.12 AIRSAR L-band, VV-pol image of internal wave intersecting packets in the New York Bight. The arrow indicates the propagation direction for the chosen study packet (within the dashed lines). The angle α relates the SAR and packet coordinates. The image intensity has been normalized by the overall average.

troughs (black areas, Figure 13.12). In the direction orthogonal to the propagation vector, each point along the profile is an average over 5 × 5 pixels. The ratio of the maximum of θ caused by the soliton to the average values of θ within the ambient ocean is quite large. The current-induced asymmetry creates a mean wave slope that is manifested as a mean orientation angle. The relation between the tangent of the orientation angle θ, the wave slopes in the radar azimuth and ground range directions (tan ω, tan γ), and the radar look angle φ from [21] is given by Equation 13.1, and for a given look angle φ the average orientation angle tangent is

FIGURE 13.13 The orientation angle image of the internal wave packets in the New York Bight (gray scale: −1.0° to +1.0° orientation angle). The area within the wedge (dashed lines) was studied intensively.

⟨tan θ⟩ = ∫∫ tan θ(ω, γ) P(ω, γ) dγ dω        (13.23)

where P(ω, γ) is the joint probability distribution function for the surface slopes in the azimuth and range directions. If the slopes are zero-meaned, but P(ω, γ) is skewed, then the mean orientation angle may not be zero even though the mean azimuth and range slopes are zero. It is evident from the above equation that both the azimuth and the range
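This point can be checked numerically. The sketch below draws zero-mean but skewed azimuth slope angles, maps them through Equation 13.1 (taken here in its commonly cited form tan θ = tan ω/(sin φ − tan γ cos φ), an assumption since the equation itself is not reproduced in this excerpt), and evaluates Equation 13.23 as a Monte Carlo average; the slope model is purely illustrative, not the measured P(ω, γ):

```python
import numpy as np

rng = np.random.default_rng(0)
phi = np.radians(51.0)   # radar look angle at the packet location
n = 4_000_000

# Zero-mean but positively skewed azimuth slope angles (radians), plus
# symmetric zero-mean range slope angles (an arbitrary illustrative model).
omega = 0.07 * (rng.gamma(2.0, 1.0, n) - 2.0)
gamma = 0.05 * rng.standard_normal(n)

# Assumed form of Equation 13.1:
# tan(theta) = tan(omega) / (sin(phi) - tan(gamma) * cos(phi))
tan_theta = np.tan(omega) / (np.sin(phi) - np.tan(gamma) * np.cos(phi))

# The slope angles average to ~zero, yet the skewness leaves a clearly
# nonzero <tan(theta)>, i.e., a mean orientation-angle bias
# (Equation 13.23 evaluated as a sample average).
print(abs(omega.mean()) < 1e-3, tan_theta.mean() > 2e-4)
```

Because tan is nonlinear, the third (skewness) moment of the slope distribution survives the average even when the first moment vanishes, which is exactly the mechanism the text describes.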

FIGURE 13.14 (a) The orientation angle value profile along the propagation vector for the internal wave study packet of Figure 13.12 and (b) a comparison of the orientation angle profile (solid line) and a normalized VV-pol backscatter intensity profile (dot–dash line). The profiles are plotted in degrees versus distance along the line (pixels). Note that the orientation angle positive peaks (white areas, Figure 13.13) align with the negative troughs (black areas, Figure 13.12).

slopes have an effect on the mean orientation angle. The azimuth slope effect is generally larger because it is not reduced by the cos φ term, which only affects the range slope. If, for instance, the meter-wavelength waves are produced by a broad wind-wave spectrum, then both ω and γ change locally. This yields a nonzero mean for the orientation angle. Figure 13.15 gives a histogram of orientation angle values (solid line) for a box inside the black area of the first packet member of the internal wave. A histogram of the ambient-ocean orientation angle values for a similar-sized box near the internal wave is given by the dot–dash–dot line in Figure 13.15. Notice the significant difference in the mean value of these two distributions. The mean change in ⟨tan(θ)⟩ inferred from the bias for the perturbed area within the internal wave is 0.03, corresponding to a θ value of 1.72°. The mean water wave slope changes needed to cause such orientation angle changes are estimated from Equation 13.1. In the denominator of Equation 13.1, the value of tan(γ) cos(φ) is small compared with sin(φ) for the value φ (= 51°) at the packet member location. Using this approximation, the ensemble average of Equation 13.1 provides the mean azimuth slope value,

FIGURE 13.15 Distributions of orientation angles for the internal wave (solid line) and the ambient ocean (dot–dash–dot line), plotted as occurrences versus mean orientation angle tangent (radians).

⟨tan(ω)⟩ ≅ sin(φ) ⟨tan(θ)⟩        (13.24)

From the data provided in Figure 13.15, ⟨tan(ω)⟩ = 0.0229, or ω = 1.32°. A slope value of this magnitude is in approximate agreement with slope changes predicted by Lyzenga et al. [32] for internal waves in the same area during an earlier experiment (SARSEX, 1988).

13.3.3 Orientation Angle Changes at Ocean Current Fronts

An example of orientation angle changes induced by a second type of wave–current interaction, the convergent current front, is given in Figure 13.16a and Figure 13.16b. This image was created using AIRSAR P-band polarimetric data. The orientation angle response to this (NRL-GS'90) Gulf Stream convergent-current front is the vertical white linear feature in Figure 13.16a and the sharp peak in Figure 13.16b. The perturbation of the orientation angle at, and near, the front location is quite strong relative to angle fluctuations in the ambient ocean. The change in the orientation angle maximum is ≅ 0.68°. Other fronts in the same area of the Gulf Stream have similar changes in the orientation angle.

13.3.4 Modeling SAR Images of Wave–Current Interactions

To investigate wave–current-interaction features, a time-dependent ocean wave model has been developed that allows for general time-varying current, wind fields, and depth [20,38]. The model uses conservation of the wave action to compute the propagation of a statistical wind-wave system. The action density formalism that is used and an outline of the model are both described in Ref. [38]. The original model has been extended [1] to include calculations of polarization orientation angle changes due to wave–current interactions.

FIGURE 13.16 Current front within the Gulf Stream. An orientation angle image is given in (a), and orientation angle values (degrees) versus distance (km) are plotted in (b) (for values along the white line in (a)).

Model predictions have been made for the wind-wave field, radar return, and perturbation of the polarization orientation angle due to an internal wave. A model of the surface manifestation of an internal wave has also been developed. The algorithm used in the model has been modified from its original form to allow calculation of the polarization orientation angle and its variation throughout the extent of the soliton current field at the surface.

FIGURE 13.17 The internal wave orientation angle tangent maximum variation as a function of ocean wavelength, as predicted by the model. The primary response is in the range of 0.25–10.0 m and is in good agreement with previous studies of sigma-0. (From Thompson, D.R., J. Geophys. Res., 93, 12371, 1988.)

The values of both the RCS (⟨σ₀⟩) and ⟨tan(θ)⟩ are computed by the model, and the dependence of ⟨tan(θ)⟩ on the perturbed ocean wavelength was calculated. This wavelength dependence is shown in Figure 13.17. The waves resonantly perturb ⟨tan(θ)⟩ for wavelengths in the range of 0.25–10.0 m. This result is in good agreement with previous studies of sigma-0 resonant perturbations for the JUSREX'92 area [39]. Figure 13.18a and Figure 13.18b show the form of the soliton current speed dependence of ⟨σ₀⟩ and ⟨tan(θ)⟩. The potentially useful near-linear relation of ⟨tan(θ)⟩ with current U (Figure 13.18b) is important in applications where determination of current gradients is the goal. The near-linear nature of this relationship provides the possibility that, from the value of ⟨tan(θ)⟩, the current magnitude can be estimated. Examination of the model results has led to the following empirical model of the variation of ⟨tan(θ)⟩:

⟨tan θ⟩ = f(U, w, ψ_w) = (aU)(w² e^(−bw)) sin(α|ψ_w| + β ψ_w²)        (13.25)

where U is the surface current maximum speed (in m/s), w is the wind speed (in m/s) at the (standard) 19.5 m height, and ψ_w is the wind direction (in radians) relative to the soliton propagation direction. The constants are a = 0.00347, b = 0.365, α = 0.65714, and β = 0.10913. The range of ψ_w is [−π, π]. Using Equation 13.25, the dashed curve in Figure 13.18 can be generated to show good agreement relative to the complete model. The solid lines in Figure 13.18 represent results from the complete model and the dashed lines are results from the empirical relation of Equation 13.25. This relation is much simpler than conventional estimates based on perturbation of the backscatter intensity. The scaling for the relationship is a relatively simple function of the wind speed and the direction of the locally wind-driven sea. If the orientation angle and wind measurements are available, then Equation 13.25 allows the internal wave current maximum U to be calculated.
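Because Equation 13.25 is linear in U, a measured ⟨tan θ⟩ together with the wind speed and direction inverts directly for the soliton current maximum. A sketch (the decaying exponential e^(−bw) is assumed from the reconstructed form of the equation; function names are illustrative):

```python
import math

A_COEF, B_COEF = 0.00347, 0.365   # a, b in Equation 13.25
ALPHA, BETA = 0.65714, 0.10913    # alpha, beta in Equation 13.25

def mean_tan_theta(u: float, w: float, psi_w: float) -> float:
    """Empirical model, Equation 13.25:
    <tan(theta)> = (aU)(w^2 e^{-bw}) sin(alpha*|psi_w| + beta*psi_w^2),
    with U in m/s, w in m/s at 19.5 m height, psi_w in [-pi, pi] radians."""
    return (A_COEF * u) * (w**2 * math.exp(-B_COEF * w)) \
        * math.sin(ALPHA * abs(psi_w) + BETA * psi_w**2)

def current_from_tan_theta(t: float, w: float, psi_w: float) -> float:
    """Invert Equation 13.25 (linear in U) for the current maximum U (m/s)."""
    return t / mean_tan_theta(1.0, w, psi_w)

# Round trip at U = 0.4 m/s, w = 6 m/s, psi_w = 2.0 rad
t = mean_tan_theta(0.4, 6.0, 2.0)
print(round(current_from_tan_theta(t, 6.0, 2.0), 6))  # → 0.4
```

The inversion fails only where sin(α|ψ_w| + βψ_w²) vanishes, i.e., where the wind-relative geometry produces no orientation-angle response.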

13.4 Ocean Surface Feature Mapping Using Current-Driven Slick Patterns

13.4.1 Introduction

Biogenic and man-made slicks are widely dispersed throughout the oceans. Current driven surface features, such as spiral eddies, can be made visible by associated patterns of slicks [40]. A combined algorithm using the Cloude–Pottier decomposition and the Wishart classifier

FIGURE 13.18 (a and b) Model development of the current speed dependence of the maximum RCS and ⟨tan(θ)⟩ variations (L-band, VV-pol, wind = (6.0 m/s, 135°)). The dashed line in (b) gives the values predicted by an empirical equation in Ref. [38].

[41] is utilized to produce accurate maps of slick patterns and to suppress the background wave field. This technique uses the classified slick patterns to detect spiral eddies. Satellite SAR instruments performing wave spectral measurements, or operating as wind scatterometers, regard the slicks as a measurement error term. The classification maps produced by the algorithm facilitate the flagging of slick-contaminated pixels within the image. Aircraft L-band AIRSAR data (4/2003) acquired over California coastal waters provided imagery of features that contained spiral eddies. The images also included biogenic slick patterns, internal wave packets, wind waves, and long-wave swell. The temporal and spatial development of spiral eddies is of considerable importance to oceanographers. Slick patterns are used as "markers" to detect the presence and extent of spiral eddies generated in coastal waters. In a SAR image, the slicks appear as black distributed patterns of lower return. The slick patterns are most prevalent during periods of low to moderate winds. The spatial distribution of the slicks is determined by local surface current gradients that are associated with the spiral eddies. It has been determined that biogenic surfactant slicks may be identified and classified using SAR polarimetric decompositions. The purpose of the decomposition is to discriminate against other features such as background wave systems. The


parameters entropy (H), anisotropy (A), and average alpha (ᾱ) of the Cloude–Pottier decomposition [23] were used in the classification. The results indicate that biogenic slick patterns, classified by the algorithm, can be used to detect the spiral eddies. The decomposition parameters were also used to measure small-scale surface roughness as well as larger-scale rms slope distributions and wave spectra [4]. Examples of slope distributions are given in Figure 13.8b and of wave spectra in Figure 13.9. Small-scale roughness variations that were detected by anisotropy changes are given in Figure 13.19. This figure shows variations in anisotropy at low wind speeds for a filament of colder, trapped water along the northern California coast. The air–sea stability has changed for the region containing the filament. The roughness changes are not seen in (a) the conventional VV-pol image but are clearly visible in (b) an anisotropy image. The data are from coastal waters near the Mendocino Co. town of Gualala. Finally, the classification algorithm may also be used to create a flag for the presence of slicks. Polarimetric satellite SAR systems (e.g., RADARSAT-2, ALOS/PALSAR, SIR-C) attempting to measure wave spectra, or scatterometers measuring wind speed and direction, can then avoid using slick-contaminated data. In April 2003, the NRL and the NASA Jet Propulsion Laboratory (JPL) jointly carried out a series of AIRSAR flights over the Santa Monica Basin off the coast of California. Backscatter POLSAR image data at P-, L-, and C-bands were acquired. The purpose of the flights was to better understand the dynamical evolution of spiral eddies, which are

FIGURE 13.19 (See color insert following page 302.) (a) Variations in anisotropy at low wind speeds for a filament of colder, trapped water along the northern California coast. The roughness changes are not seen in the conventional VV-pol image, but are clearly visible in (b) an anisotropy image. The data are from coastal waters near the Mendocino Co. town of Gualala.


generated in this area by the interaction of currents with the Channel Islands. Sea-truth was gathered from a research vessel owned by the University of California at Los Angeles (UCLA). The flights yielded significant data not only on the time history of spiral eddies but also on surface waves, natural surfactants, and internal wave signatures. The data were analyzed using a polarimetric technique, the Cloude–Pottier H/A/ᾱ decomposition given in Ref. [23]. In Figure 13.20a, the anisotropy is again mapped for a study site

FIGURE 13.20 (See color insert following page 302.) (a) Image of anisotropy values on a 0.0–1.0 scale; annotations φ = 23° and φ = 62° mark the incidence angles at the image edges. The quantity 1 − A is proportional to small-scale surface roughness. (b) A conventional L-band, VV-pol image of the study area.


east of Catalina Island, CA. For comparison, a VV-pol image is given in Figure 13.20b. The slick field is reasonably well mapped by anisotropy, but the image is noisy because the anisotropy is computed from the difference of the two small (second and third) eigenvalues.

13.4.2 Classification Algorithm

The overall purpose of the field research effort outlined in Section 13.4.1 was to create a means of detecting ocean features such as spiral eddies using biogenic slicks as markers, while suppressing other effects such as wave fields and wind-gradient effects. A polarimetric classification algorithm [41–43] was tested as a candidate means to create such a feature map.

13.4.2.1 Unsupervised Classification of Ocean Surface Features

Van Zyl [44] and Freeman–Durden [46] developed unsupervised classification algorithms that separate the image into four classes: odd bounce, even bounce, diffuse (volume), and an indeterminate class. For an L-band image, the ocean surface typically is dominated by the characteristics of Bragg-scattering odd (single) bounce. City buildings and structures have the characteristics of even (double) scattering, and heavy forest vegetation has the characteristics of diffuse (volume) scattering. Consequently, this classification algorithm provides information on the terrain scatterer type. For a refined separation into more classes, Pottier [6] proposed an unsupervised classification algorithm based on their target decomposition theory. The medium's scattering mechanisms, characterized by entropy H, average alpha angle ᾱ, and later anisotropy A, were used for classification. The entropy H is a measure of randomness of the scattering mechanisms, and the alpha angle characterizes the scattering mechanism. The unsupervised classification is achieved by projecting the pixels of an image onto the H–ᾱ plane, which is segmented into scattering zones. The zones for the Gualala study-site data are shown in Figure 13.21. Details of this segmentation are given in Ref. [6]. In the alpha–entropy scattering zone map of the decomposition, backscatter returns from the ocean surface normally occur in the lowest (dark blue color) zone of both alpha and entropy. Returns from slick-covered areas have higher entropy H and average alpha ᾱ values, and occur in both the lowest zone and higher zones.

13.4.2.2 Classification Using Alpha–Entropy Values and the Wishart Classifier

Classification of the image was initiated by creating an alpha–entropy zone scatterplot to determine the ᾱ angle and level of entropy H for scatterers in the slick study area. Secondly, the image was classified into eight distinct classes using the Wishart classifier [41]. The alpha–entropy decomposition method provides good image segmentation based on the scattering characteristics. The algorithm used is a combination of the unsupervised decomposition classifier and the supervised Wishart classifier [41]. One uses the segmented image of the decomposition method to form training sets as input for the Wishart classifier. It has been noted that multi-look data are required to obtain meaningful results in H and ᾱ, especially in the entropy H. In general, 4-look processed data are not sufficient. Normally, additional averaging (e.g., a 5 × 5 boxcar filter), either of the covariance or of the coherency matrices, has to be performed prior to the H and ᾱ computation. This prefiltering is done on all the data. The filtered coherency matrix is then used to compute H and ᾱ. Initial classification is made using the eight zones. This initial classification map is then used to train the Wishart classification. The reclassified result shows improvement in retaining details.
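For reference, the per-pixel quantities behind this procedure can be computed from a prefiltered 3 × 3 coherency matrix by eigen-decomposition. A minimal sketch of the standard Cloude–Pottier definitions (base-3 entropy, anisotropy from the two minor eigenvalues, probability-weighted alpha of the eigenvectors):

```python
import numpy as np

def cloude_pottier(T):
    """Entropy H, anisotropy A, and mean alpha angle (radians) from a
    3x3 Hermitian coherency matrix T (Cloude-Pottier decomposition)."""
    lam, vec = np.linalg.eigh(T)               # eigenvalues in ascending order
    lam = np.clip(lam[::-1].real, 0.0, None)   # sort descending, clamp noise
    vec = vec[:, ::-1]                         # reorder eigenvector columns to match
    p = lam / lam.sum()                        # pseudo-probabilities
    pz = p[p > 0]
    H = float(-(pz * np.log(pz)).sum() / np.log(3.0))   # base-3 entropy in [0, 1]
    denom = lam[1] + lam[2]
    A = float((lam[1] - lam[2]) / denom) if denom > 0 else 0.0
    alphas = np.arccos(np.abs(vec[0, :]))      # alpha angle of each eigenvector
    return H, A, float(p @ alphas)

# Idealized single-mechanism surface scatter: zero entropy, zero mean alpha
H, A, abar = cloude_pottier(np.diag([1.0, 0.0, 0.0]))
print(H == 0.0, abar == 0.0)  # → True True
```

In practice T would be the 5 × 5 boxcar-averaged coherency matrix at each pixel, and the (H, ᾱ) pair would then be mapped into the eight scattering zones of Figure 13.21.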

FIGURE 13.21 (See color insert following page 302.) Alpha angle versus entropy scatterplot for the image study area. The plot is divided into eight color-coded scattering classes for the Cloude–Pottier decomposition described in Ref. [6].

Further improvement is possible by using several iterations. The reclassified image is then used to update the cluster centers of the coherency matrices. For the present data, two iterations of this process were sufficient to produce good classifications of the complete biogenic fields. Figure 13.22 presents a completed classification map of the biogenic slick fields; information is provided by the eight color-coded classes in the image. The returns from within the largest slick (labeled as A) have classes that progressively increase in both average alpha and entropy as a path is made from clean water inward toward the center of the slick. Therefore, the scattering becomes less surface-like (ᾱ increase) and also becomes more depolarized (H increase) as one approaches the center of the slick (Figure 13.22, label A). The algorithm outlined above may be applied to an image containing large-scale ocean features. An image (JPL/CM6744) of classified slick patterns for two linked spiral eddies near Catalina Island, CA, is given in Figure 13.23b. An L-band, HH-pol image is presented in Figure 13.23a for comparison. The Pacific swell is suppressed in areas where there are no slicks. The waves do, however, appear in areas where there are slicks because the currents associated with the orbital motion of the waves alternately compress or expand the slick-field density. Note the dark slick patch to the left of label A in Figure 13.23a and Figure 13.23b. This patch clearly has strongly suppressed backscatter at HH-pol. The corresponding area of Figure 13.23b has been classified into three classes and colors (Class 7—salmon, Class 5—yellow, and Class 2—dark green), which indicate progressive increases in scattering complexity and depolarization as one moves from the perimeter of the slick toward its interior. A similar change in scattering occurs to the left of label B near the center of Figure 13.23a and Figure 13.23b. In this case, as one moves

FIGURE 13.22 (See color insert following page 302.) Classification of the slick-field image into H/ᾱ scattering classes. Annotations mark the clean surface, slicks, slicks trapped within an internal wave packet, and the incidence-angle range (φ = 23° to φ = 62°); label A marks the largest slick.

FIGURE 13.23 (See color insert following page 302.) (a) L-band, HH-pol image of a second study image (CM6744) containing two strong spiral eddies marked by natural biogenic slicks (labels A and B) and (b) classification of the slicks marking the spiral eddies. The image features were classified into eight classes using the H–ᾱ values combined with the Wishart classifier.


from the perimeter into the slick toward the center, the classes and colors (Class 7—salmon, Class 4—light green, Class 1—white) also indicate progressive increases in scattering complexity and depolarization.

13.4.2.3 Comparative Mapping of Slicks Using Other Classification Algorithms

The question arises whether or not the algorithm using entropy-alpha values with the Wishart classifier is the best candidate for unsupervised detection and mapping of slick fields. Two candidate algorithms were suggested as possible competitive classification )– methods. These were (1) the Freeman–Durden decomposition [45] and (2) the (H/A/ Wishart segmentation algorithm [42,43], which introduce anisotropy to the parameter mix because of its sensitivity to ocean surface roughness. Programs were developed to investigate the slick classification capabilities of these candidate algorithms. The same amount of averaging (5 5) and speckle reduction was done for all of the algorithms. The results with the Freeman–Durden classification were poor at both L- and C-bands. Nearly all of the returns were surface, single-bounce scatter. This is expected because the Freeman– Durden decomposition was developed on the basis of scattering models of land features. This method could not discriminate between waves and slicks and did not improve on the results using conventional VV or HH polarization. )–Wishart segmentation algorithm was investigated to take advantage of The (H/A/ the small-scale roughness sensitivity of the polarimetric anisotropy A. The anisotropy is shown (Figure 13.20b) to be very sensitive to slick patterns across the whole image. The )–Wishart segmentation method expands the number of classes from 8 to 16 by (H/A/ including the anisotropy A. The best way to introduce information about A in the classification procedure is to carry out two successive Wishart classifier algorithms. The . Each class in the H/ plane is then further divided first classification only involves H/ into two classes according to whether the pixel’s anisotropy values are greater than 0.5 or less than 0.5. The Wishart classifier is then employed a second time. Details of this )–Wishart method algorithm are given in Ref. [42,43]. 
The results of using the (H/A/α)–Wishart method and iterating it twice are given in Figure 13.24. Classification of the slick-field image using the (H/A/α)–Wishart method resulted in 14 scattering classes; two of the expected 16 classes were suppressed. Classes 1–7 corresponded to anisotropy A values from 0.5 to 1.0 and Classes 8–14 corresponded to anisotropy A values from 0.0 to 0.49. The two new lighter blue vertical features at the lower right of the image appeared in all images involving anisotropy and were thought to be a smooth slick of lower surfactant material concentration. This algorithm was an improvement relative to the H/α–Wishart algorithm for slick mapping. All of the slick-covered areas were classified well and the unwanted wave-field intensity modulations were suppressed.
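The class-doubling step of the two-pass procedure (split each H/α class by thresholding the anisotropy at 0.5 before the second Wishart pass) can be sketched as follows. The array names and the toy data below are hypothetical illustrations, not code or data from the chapter.

```python
import numpy as np

def split_classes_by_anisotropy(labels, anisotropy, threshold=0.5):
    """Double an H/alpha-Wishart class map by thresholding anisotropy.

    Pixels in class c keep label c where A >= threshold and are moved
    to class c + n_classes where A < threshold, mirroring the
    (H/A/alpha)-Wishart scheme's expansion from 8 to 16 classes.
    """
    labels = np.asarray(labels)
    n_classes = labels.max() + 1
    low_a = np.asarray(anisotropy) < threshold
    return np.where(low_a, labels + n_classes, labels)

# Toy example: 8 initial classes and random anisotropy values.
rng = np.random.default_rng(0)
labels = rng.integers(0, 8, size=(64, 64))
anisotropy = rng.random((64, 64))
expanded = split_classes_by_anisotropy(labels, anisotropy)
assert expanded.max() <= 15  # at most 16 classes after splitting
```

After this split, each of the (up to) 16 classes would be re-estimated and the Wishart classifier applied a second time, as the text describes.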

13.5 Conclusions

Methods that are capable of measuring ocean wave spectra and slope distributions in both the range and azimuth directions were described. The new measurements are sensitive and provide nearly direct estimates of ocean wave spectra and slopes without the need for a complex MTF. The orientation modulation spectrum has a higher dominant-wave-peak to background ratio than the intensity-based spectrum. The results

Polarimetric SAR Techniques for Remote Sensing of the Ocean Surface


FIGURE 13.24 (See color insert following page 302.) Classification of the slick-field image into 14 scattering classes using the (H/A/α)–Wishart method (legend: Classes 1–14). Classes 1–7 correspond to anisotropy A values 0.5 to 1.0 and Classes 8–14 correspond to anisotropy A values 0.0 to 0.49. The two lighter blue vertical features at the lower right of the image appear in all images involving anisotropy and are thought to be smooth slicks of lower concentration.

determined for the dominant wave direction, wavelength, and wave height are comparable to the NDBC buoy measurements. The wave slope and wave spectra measurement methods that have been investigated may be developed further into fully operational algorithms. These algorithms may then be used by polarimetric SAR instruments, such as ALOS/PALSAR and RADARSAT II, to monitor sea-state conditions globally.

Second, this work has investigated the effect of internal waves and current fronts on the SAR polarization orientation angle. The results provide (1) a potential independent means for identifying these ocean features and (2) a method of estimating the mean value of the surface current and slope changes associated with an internal wave. Simulations of the NRL wave–current interaction model [38] have been used to identify and quantify the different variables, such as current speed, wind speed, and wind direction, that determine changes in the SAR polarization orientation angle.

The polarimetric scattering properties of biogenic slicks have been found to be different from those of the clean surface wave field, and the slicks may therefore be separated from this background wave field. Damping of capillary waves in the slick areas lowers all of the eigenvalues of the decomposition and increases the average alpha angle, the entropy, and the anisotropy. The Cloude–Pottier polarimetric decomposition was also used as a new means of studying scattering properties of surfactant slicks perturbed by current-driven surface features. The features, for example, spiral eddies, were marked by filament patterns of


Signal and Image Processing for Remote Sensing

slicks. These slick filaments were physically smoother. Backscatter from them was more complex (three eigenvalues nearly equal) and more depolarized. Anisotropy was found to be sensitive to small-scale ocean surface roughness, but was not a function of large-scale range or azimuth wave slopes. These unique properties provided an achievable separation of roughness scales on the ocean surface at low wind speeds. Changes in anisotropy due to surfactant slicks were found to be measurable across the entire radar swath.

Finally, the polarimetric SAR decomposition parameters alpha, entropy, and anisotropy were used as an effective means for classifying biogenic slicks. Algorithms using these parameters were developed for the mapping of both slick fields and ocean surface features. Selective mapping of biogenic slick fields may be achieved using either the entropy and alpha parameters with the Wishart classifier or the entropy, anisotropy, and alpha parameters with the Wishart classifier. The latter algorithm gives the best results overall. Slick maps made using this algorithm are of use for satellite scatterometers and wave spectrometers in efforts aimed at flagging ocean surface areas that are contaminated by slick fields.

References

1. Schuler, D.L., Jansen, R.W., Lee, J.S., and Kasilingam, D., Polarisation orientation angle measurements of ocean internal waves and current fronts using polarimetric SAR, IEE Proc. Radar, Sonar Navigation, 150(3), 135–143, 2003.
2. Alpers, W., Ross, D.B., and Rufenach, C.L., The detectability of ocean surface waves by real and synthetic aperture radar, J. Geophys. Res., 86(C7), 6481, 1981.
3. Engen, G. and Johnsen, H., SAR-ocean wave inversion using image cross-spectra, IEEE Trans. Geosci. Rem. Sens., 33, 1047, 1995.
4. Schuler, D.L., Kasilingam, D., Lee, J.S., and Pottier, E., Studies of ocean wave spectra and surface features using polarimetric SAR, Proc. Int. Geosci. Rem. Sens. Symp. (IGARSS'03), Toulouse, France, IEEE, 2003.
5. Schuler, D.L., Lee, J.S., and De Grandi, G., Measurement of topography using polarimetric SAR images, IEEE Trans. Geosci. Rem. Sens., 34, 1266, 1996.
6. Pottier, E., Unsupervised classification scheme and topography derivation of POLSAR data on the <> polarimetric decomposition theorem, Proc. 4th Int. Workshop Radar Polarimetry, IRESTE, Nantes, France, 535–548, 1998.
7. Hasselmann, K. and Hasselmann, S., The nonlinear mapping of an ocean wave spectrum into a synthetic aperture radar image spectrum and its inversion, J. Geophys. Res., 96(10), 713, 1991.
8. Vesecky, J.F. and Stewart, R.H., The observation of ocean surface phenomena using imagery from SEASAT synthetic aperture radar—an assessment, J. Geophys. Res., 87, 3397, 1982.
9. Beal, R.C., Gerling, T.W., Irvine, D.E., Monaldo, F.M., and Tilley, D.G., Spatial variations of ocean wave directional spectra from the SEASAT synthetic aperture radar, J. Geophys. Res., 91, 2433, 1986.
10. Valenzuela, G.R., Theories for the interaction of electromagnetic and oceanic waves—a review, Boundary Layer Meteorol., 13, 61, 1978.
11. Keller, W.C. and Wright, J.W., Microwave scattering and straining of wind-generated waves, Radio Sci., 10, 1091, 1975.
12. Alpers, W. and Rufenach, C.L., The effect of orbital velocity motions on synthetic aperture radar imagery of ocean waves, IEEE Trans. Antennas Propagat., 27, 685, 1979.
13. Plant, W.J. and Zurk, L.M., Dominant wave directions and significant wave heights from SAR imagery of the ocean, J. Geophys. Res., 102(C2), 3473, 1997.


14. Hasselmann, K., Raney, R.K., Plant, W.J., Alpers, W., Shuchman, R.A., Lyzenga, D.R., Rufenach, C.L., and Tucker, M.J., Theory of synthetic aperture radar ocean imaging: a MARSEN view, J. Geophys. Res., 90, 4659, 1985.
15. Lyzenga, D.R., An analytic representation of the synthetic aperture radar image spectrum for ocean waves, J. Geophys. Res., 93(13), 859, 1998.
16. Kasilingam, D. and Shi, J., Artificial neural network-based inversion technique for extracting ocean surface wave spectra from SAR images, Proc. IGARSS'97, Singapore, IEEE, 1193–1195, 1997.
17. Hasselmann, S., Bruning, C., Hasselmann, K., and Heimbach, P., An improved algorithm for the retrieval of ocean wave spectra from synthetic aperture radar image spectra, J. Geophys. Res., 101, 16615, 1996.
18. Lehner, S., Schulz-Stellenfleth, Schattler, B., Breit, H., and Horstmann, J., Wind and wave measurements using complex ERS-2 SAR wave mode data, IEEE Trans. Geosci. Rem. Sens., 38(5), 2246, 2000.
19. Dowd, M., Vachon, P.W., and Dobson, F.W., Ocean wave extraction from RADARSAT synthetic aperture radar inter-look image cross-spectra, IEEE Trans. Geosci. Rem. Sens., 39, 21–37, 2001.
20. Lee, J.S., Jansen, R., Schuler, D., Ainsworth, T., Marmorino, G., and Chubb, S., Polarimetric analysis and modeling of multi-frequency SAR signatures from Gulf Stream fronts, IEEE J. Oceanic Eng., 23, 322, 1998.
21. Lee, J.S., Schuler, D.L., and Ainsworth, T.L., Polarimetric SAR data compensation for terrain azimuth slope variation, IEEE Trans. Geosci. Rem. Sens., 38, 2153–2163, 2000.
22. Lee, J.S., Schuler, D.L., Ainsworth, T.L., Krogager, E., Kasilingam, D., and Boerner, W.M., The estimation of radar polarization shifts induced by terrain slopes, IEEE Trans. Geosci. Rem. Sens., 40, 30–41, 2001.
23. Cloude, S.R. and Pottier, E., A review of target decomposition theorems in radar polarimetry, IEEE Trans. Geosci. Rem. Sens., 34(2), 498, 1996.
24. Lee, J.S., Grunes, M.R., and De Grandi, G., Polarimetric SAR speckle filtering and its implication for classification, IEEE Trans. Geosci. Rem. Sens., 37, 2363, 1999.
25. Gasparovic, R.F., Apel, J.R., and Kasischke, E., An overview of the SAR internal wave signature experiment, J. Geophys. Res., 93, 12304, 1998.
26. Gasparovic, R.F., Chapman, R., Monaldo, F.M., Porter, D.L., and Sterner, R.F., Joint U.S./Russia internal wave remote sensing experiment: interim results, Applied Physics Laboratory Report S1R-93U-011, Johns Hopkins University, 1993.
27. Schuler, D.L., Kasilingam, D., and Lee, J.S., Slope measurements of ocean internal waves and current fronts using polarimetric SAR, European Conference on Synthetic Aperture Radar (EUSAR'2002), Cologne, Germany, 2002.
28. Schuler, D.L., Kasilingam, D., Lee, J.S., Jansen, R.W., and De Grandi, G., Polarimetric SAR measurements of slope distribution and coherence changes due to internal waves and current fronts, Proc. Int. Geosci. Rem. Sens. Symp. (IGARSS'2002), Toronto, Canada, 2002.
29. Schuler, D.L., Lee, J.S., Kasilingam, D., and De Grandi, G., Studies of ocean current fronts and internal waves using polarimetric SAR coherences, Proc. Prog. Electromagnetic Res. Symp. (PIERS'2002), Cambridge, MA, 2002.
30. Alpers, W., Theory of radar imaging of internal waves, Nature, 314, 245, 1985.
31. Brant, P., Alpers, W., and Backhaus, J.O., Study of the generation and propagation of internal waves in the Strait of Gibraltar using a numerical model and synthetic aperture radar images of the European ERS-1 satellite, J. Geophys. Res., 101(14), 14237, 1996.
32. Lyzenga, D.R. and Bennett, J.R., Full-spectrum modeling of synthetic aperture radar internal wave signatures, J. Geophys. Res., 93(C10), 12345, 1988.
33. Schuler, D.L., Lee, J.S., Kasilingam, D., and Nesti, G., Surface roughness and slope measurements using polarimetric SAR data, IEEE Trans. Geosci. Rem. Sens., 40(3), 687, 2002.
34. Schuler, D.L., Ainsworth, T.L., Lee, J.S., and De Grandi, G., Topographic mapping using polarimetric SAR data, Int. J. Rem. Sens., 35(5), 1266, 1998.
35. Schuler, D.L., Lee, J.S., Ainsworth, T.L., and Grunes, M.R., Terrain topography measurement using multi-pass polarimetric synthetic aperture radar data, Radio Sci., 35(3), 813, 2002.
36. Schuler, D.L. and Lee, J.S., A microwave technique to improve the measurement of directional ocean wave spectra, Int. J. Rem. Sens., 16, 199, 1995.


37. Alpers, W. and Hennings, I., A theory of the imaging mechanism of underwater bottom topography by real and synthetic aperture radar, J. Geophys. Res., 89, 10529, 1984.
38. Jansen, R.W., Chubb, S.R., Fusina, R.A., and Valenzuela, G.R., Modeling of current features in Gulf Stream SAR imagery, Naval Research Laboratory Report NRL/MR/7234-93-7401, 1993.
39. Thompson, D.R., Calculation of radar backscatter modulations from internal waves, J. Geophys. Res., 93(C10), 12371, 1988.
40. Schuler, D.L., Lee, J.S., and De Grandi, G., Spiral eddy detection using surfactant slick patterns and polarimetric SAR image decomposition techniques, Proc. Int. Geosci. Rem. Sens. Symp. (IGARSS), Anchorage, Alaska, September 2004.
41. Lee, J.S., Grunes, M.R., Ainsworth, T.L., Du, L.J., Schuler, D.L., and Cloude, S.R., Unsupervised classification using polarimetric decomposition and the complex Wishart classifier, IEEE Trans. Geosci. Rem. Sens., 37(5), 2249, 1999.
42. Pottier, E. and Lee, J.S., Unsupervised classification scheme of POLSAR images based on the complex Wishart distribution and the polarimetric decomposition theorem, Proc. 3rd Eur. Conf. Synth. Aperture Radar (EUSAR'2000), Munich, Germany, 2000.
43. Ferro-Famil, L., Pottier, E., and Lee, J.S., Unsupervised classification of multifrequency and fully polarimetric SAR images based on the H/A/Alpha-Wishart classifier, IEEE Trans. Geosci. Rem. Sens., 39(11), 2332, 2001.
44. Van Zyl, J.J., Unsupervised classification of scattering mechanisms using radar polarimetry data, IEEE Trans. Geosci. Rem. Sens., 27, 36, 1989.
45. Freeman, A. and Durden, S.L., A three-component scattering model for polarimetric SAR data, IEEE Trans. Geosci. Rem. Sens., 36, 963, 1998.

14 MRF-Based Remote-Sensing Image Classification with Automatic Model Parameter Estimation

Sebastiano B. Serpico and Gabriele Moser

CONTENTS
14.1 Introduction
14.2 Previous Work on MRF Parameter Estimation
14.3 Supervised MRF-Based Classification
14.3.1 MRF Models for Image Classification
14.3.2 Energy Functions
14.3.3 Operational Setting of the Proposed Method
14.3.4 The Proposed MRF Parameter Optimization Method
14.4 Experimental Results
14.4.1 Experiment I: Spatial MRF Model for Single-Date Image Classification
14.4.2 Experiment II: Spatio-Temporal MRF Model for Two-Date Multi-Temporal Image Classification
14.4.3 Experiment III: Spatio-Temporal MRF Model for Multi-Temporal Classification of Image Sequences
14.5 Conclusions
Acknowledgments
References

14.1 Introduction

Within remote-sensing image analysis, Markov random field (MRF) models represent a powerful tool [1], due to their ability to integrate contextual information associated with the image data in the analysis process [2,3,4]. In particular, the use of a global model for the statistical dependence of all the image pixels in a given image-analysis scheme typically turns out to be an intractable task. The MRF approach offers a solution to this issue, as it allows expressing a global model of the contextual information by using only local relations among neighboring pixels [2]. Specifically, thanks to the Hammersley–Clifford theorem [3], a large class of global contextual models (i.e., the Gibbs random fields, GRFs [2]) can be proved to be equivalent to local MRFs, thus sharply reducing the related model complexity. In particular, MRFs have been used for remote-sensing image analysis, for single-date [5], multi-temporal [6–8], multi-source [9], and multi-resolution [10]


classification, for denoising [1], segmentation [11–14], anomaly detection [15], texture extraction [2,13,16], and change detection [17–19].

Focusing on the specific problem of image classification, the MRF approach allows one to express a ‘‘maximum-a-posteriori’’ (MAP) decision task as the minimization of a suitable energy function. Several techniques have been proposed to deal with this minimization problem: simulated annealing (SA), an iterative stochastic optimization algorithm that converges to a global minimum of the energy [3] but typically involves long execution times [2,5]; iterated conditional modes (ICM), an iterative deterministic algorithm that converges to a local (but usually good) minimum point [20] and requires much shorter computation times than SA [2]; and the maximization of posterior marginals (MPM), which approximates the MAP rule by maximizing the marginal posterior distribution of the class label of each pixel instead of the joint posterior distribution of all image labels [2,11,21].

However, an MRF model usually involves one or more internal parameters, thus requiring a preliminary parameter-setting stage before the application of the model itself. In particular, especially in the context of supervised classification, interactive ‘‘trial-and-error’’ procedures are typically employed to choose suitable values for the model parameters [2,5,9,11,17,19], while the problem of fast automatic parameter setting for MRF classifiers is still an open issue in the MRF literature. The lack of effective automatic parameter-setting techniques has been a significant limitation on the operational use of MRF-based supervised classification architectures, although such methodologies are known for their ability to generate accurate classification maps [9].
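To make the ICM strategy mentioned above concrete, the following sketch runs ICM on a simple Potts-type model with precomputed unary (spectral) energies. The function name, the toy energies, and the parameter values are illustrative assumptions, not taken from the chapter.

```python
import numpy as np

def icm_potts(unary, beta=1.0, n_iter=5):
    """Iterated conditional modes for a Potts MRF.

    unary: (H, W, M) array holding -log p(x_k | class i) for each pixel.
    At each sweep, every pixel takes the label minimizing its local
    energy: unary term + beta * (number of disagreeing 4-neighbors).
    """
    H, W, M = unary.shape
    labels = unary.argmin(axis=2)          # noncontextual initialization
    for _ in range(n_iter):
        for k in range(H):
            for h in range(W):
                energies = unary[k, h].copy()
                for dk, dh in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nk, nh = k + dk, h + dh
                    if 0 <= nk < H and 0 <= nh < W:
                        # Potts potential: +beta per neighbor with a different label
                        energies += beta * (np.arange(M) != labels[nk, nh])
                labels[k, h] = energies.argmin()
    return labels
```

Increasing `beta` strengthens the spatial smoothing relative to the spectral evidence; choosing such weights automatically is exactly the parameter-setting problem this chapter addresses.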
On the other hand, the availability of the above-mentioned unsupervised parameter-setting procedures has contributed to an extensive use of MRFs for segmentation and unsupervised classification purposes [22,23].

In the present chapter, an automatic parameter optimization algorithm is proposed to overcome the above limitation in the context of supervised image classification. The method refers to a broad category of MRF models, characterized by energy functions expressed as linear combinations of different energy contributions (e.g., representing different typologies of contextual information) [2,7]. The algorithm exploits this linear dependence to formalize the parameter-setting problem as the solution of a set of linear inequalities, and addresses this problem by extending to the present context the Ho–Kashyap method for linear classifier training [24,25]. The well-known convergence properties of this method and the absence of parameters to be tuned are among the good features of the proposed technique.

The chapter is organized as follows. Section 14.2 provides an overview of previous work on the problem of MRF parameter estimation. Section 14.3 describes the methodological issues of the method, and Section 14.4 presents the results of applying the technique to the classification of real (both single-date and multi-temporal) remote-sensing images. Finally, conclusions are drawn in Section 14.5.

14.2 Previous Work on MRF Parameter Estimation

Several parameter-setting algorithms have been proposed in the context of MRF models for image segmentation (e.g., [12,22,26,27]) or unsupervised classification [23,28,29], although often resulting in a considerable computational burden. In particular, the usual maximum likelihood (ML) parameter estimation approach [30] exhibits good theoretical consistency properties when applied to GRFs and MRFs [31], but turns out to be computationally very expensive for most MRF models [22] or even intractable [20,32,33], due to the difficulty of analytical computation and numerical maximization of the


normalization constants involved by the MRF-based distributions (the so-called ‘‘partition functions’’) [20,23,33]. Operatively, the use of ML estimates for MRF parameters turns out to be restricted to specific typologies of MRFs, such as continuous Gaussian [1,13,15,34,35] or generalized Gaussian [36] fields, which allow an ML estimation task to be formulated in an analytically feasible way. Beyond such case-specific techniques, essentially three approaches have been suggested in the literature to address the problem of unsupervised ML estimation of MRF parameters indirectly, namely, Monte Carlo methods, stochastic-gradient approaches, and pseudo-likelihood approximations [23]. The combination of the ML criterion with Monte Carlo simulations has been proposed to overcome the difficulty of computing the partition function [22,37]; it provides good estimation results, but usually involves long execution times. Stochastic-gradient approaches aim at maximizing the log-likelihood function by integrating a Gibbs stochastic sampling strategy into the gradient-ascent method [23,38]. Combinations of the stochastic-gradient approach with the iterated conditional expectation (ICE) algorithm and with an estimation–restoration scheme have also been proposed in Refs. [12] and [29], respectively. Approximate pseudo-likelihood functionals have been introduced [20,23]; they are numerically feasible, although the resulting estimates are not actual ML estimates (except in the trivial noncontextual case of pixel independence) [20]. Moreover, the pseudo-likelihood approximation may underestimate the interactions between pixels and can provide unsatisfactory results unless the interactions are suitably weak [21,23].
Pseudo-likelihood approaches have also been developed in conjunction with mean-field approximations [26], with the expectation–maximization (EM) [21,39], Metropolis–Hastings [40], and ICE [21] algorithms, with Monte Carlo simulations [41], or with multi-resolution analysis [42]. In Ref. [20] a pseudo-likelihood approach is plugged into the ICM energy minimization strategy, by iteratively alternating the update of the contextual clustering map and the update of the parameter values. In Ref. [10] this method is integrated with EM in the context of multi-resolution MRFs. A related technique is the ‘‘coding method,’’ which is based on a pseudo-likelihood functional computed over a subset of pixels, although these subsets depend on the choice of suitable coding strategies [34]. Several empirical or ad hoc estimators have also been developed. In Ref. [33] a family of MRF models with polynomial energy functions is proved to be dense in the space of all MRFs and is endowed with a case-specific estimation scheme based on the method of moments. In Ref. [43] a least-squares approach is proposed for Ising-type and Potts-type MRF models [22]; it formulates an overdetermined system of linear equations relating the unknown parameters to a set of relative frequencies of pixel-label configurations. The combination of this method with EM is applied in Ref. [44] for sonar image segmentation purposes. A simpler but conceptually similar approach is adopted in Ref. [12] and combined with ICE: a simple algebraic empirical estimator for a one-parameter spatial MRF model is developed, which directly relates the parameter value to the relative frequencies of several class-label configurations in the image. On the other hand, the literature on MRF parameter setting for supervised image classification is very limited.
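As a concrete illustration of the pseudo-likelihood idea discussed above, the sketch below scores a label field for an isotropic Potts model by the sum of its local conditional log-probabilities (thereby avoiding the partition function) and grid-searches the smoothing parameter. The symbols, the 4-neighborhood, and the search grid are illustrative assumptions, not the specific estimators cited above.

```python
import numpy as np

def potts_pseudo_log_likelihood(labels, beta, n_classes):
    """Log pseudo-likelihood of a Potts label field.

    Each pixel contributes log P(s_k | neighbors), where
    P(s_k = i | C_k) is proportional to exp(beta * m_i) and m_i counts
    the 4-neighbors labeled i; only local normalizers are needed.
    """
    H, W = labels.shape
    m = np.zeros((n_classes, H, W))       # m[i]: neighbor counts for label i
    for i in range(n_classes):
        same = (labels == i).astype(float)
        m[i, 1:, :] += same[:-1, :]       # neighbor above
        m[i, :-1, :] += same[1:, :]       # neighbor below
        m[i, :, 1:] += same[:, :-1]       # neighbor to the left
        m[i, :, :-1] += same[:, 1:]       # neighbor to the right
    logits = beta * m                                   # (M, H, W)
    log_z = np.log(np.exp(logits).sum(axis=0))          # local normalizers
    picked = np.take_along_axis(logits, labels[None], axis=0)[0]
    return (picked - log_z).sum()

def estimate_beta(labels, n_classes, grid=np.linspace(0.0, 2.0, 41)):
    """Grid-search maximum pseudo-likelihood estimate of beta."""
    scores = [potts_pseudo_log_likelihood(labels, b, n_classes) for b in grid]
    return grid[int(np.argmax(scores))]

# On a perfectly smooth field the pseudo-likelihood grows with beta,
# so the estimate saturates at the top of the search grid.
smooth = np.zeros((12, 12), dtype=int)
print(estimate_beta(smooth, n_classes=2))  # prints 2.0
```

This also makes the weakness noted above visible: the estimate depends only on local label configurations, so strong long-range interactions are not captured.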
Any unsupervised MRF parameter estimator can naturally be applied to a supervised problem as well, by simply neglecting the training data in the estimation process. However, only a few techniques have been proposed so far that effectively exploit the available training information for MRF parameter-optimization purposes. In particular, a heuristic algorithm is developed in Ref. [7] to automatically optimize the parameters of a multi-temporal MRF model for the joint supervised classification of two-date imagery, and a genetic approach is combined in Ref. [45] with simulated annealing [3] for the estimation of the parameters of a spatial MRF model for multi-source classification.


14.3 Supervised MRF-Based Classification

14.3.1 MRF Models for Image Classification

Let I = {x_1, x_2, ..., x_N} be a given n-band remote-sensing image, modeled as a set of N identically distributed n-variate random vectors. We assume M thematic classes ω_1, ω_2, ..., ω_M to be present in the image; we denote the resulting set of classes by Ω = {ω_1, ω_2, ..., ω_M} and the class label of the k-th image pixel (k = 1, 2, ..., N) by s_k ∈ Ω. Operating in the context of supervised image classification, we assume a training set to be available, and we denote the index set of the training pixels by T ⊂ {1, 2, ..., N} and the true class label of the k-th training pixel (k ∈ T) by s_k^*. Collecting all the feature vectors of the N image pixels in a single Nn-dimensional column vector¹ X = col[x_1, x_2, ..., x_N] and all the pixel labels in a discrete random vector S = (s_1, s_2, ..., s_N) ∈ Ω^N, the MAP decision rule (i.e., the Bayes rule for minimum classification error [46]) assigns to the image data X the label vector S̃ that maximizes the joint posterior probability P(S|X), that is:

$$\tilde{S} = \arg\max_{S \in \Omega^N} P(S|X) = \arg\max_{S \in \Omega^N} \left[\, p(X|S)\, P(S) \,\right] \tag{14.1}$$

where p(X|S) and P(S) are the joint probability density function (PDF) of the global feature vector X conditioned to the label vector S and the joint probability mass function (PMF) of the label vector itself, respectively. The MRF approach offers a computationally tractable solution to this maximization problem by passing from a global model for the statistical dependence of the class labels to a model of the local image properties, defined according to a given neighborhood system [2,3]. Specifically, for each k-th image pixel, a neighborhood N_k ⊂ {1, 2, ..., N} is assumed to be defined, such that, for instance, N_k includes the four (first-order neighborhood) or the eight (second-order neighborhood) pixels surrounding the k-th pixel (k = 1, 2, ..., N). More formally, a neighborhood system is a collection {N_k}_{k=1}^N of subsets of pixels such that each pixel is outside its neighborhood (i.e., k ∉ N_k for all k = 1, 2, ..., N) and neighboring pixels are always mutually neighbors (i.e., k ∈ N_h if and only if h ∈ N_k for all k, h = 1, 2, ..., N, k ≠ h). This simple discrete topological structure attached to the image data is exploited in the MRF framework to model the statistical relationships between the class labels of spatially distinct pixels and to provide a computationally affordable solution to the global MAP classification problem of Equation 14.1. Specifically, we assume the feature vectors x_1, x_2, ..., x_N to be conditionally independent and identically distributed with PDF p(x|s) (x ∈ R^n, s ∈ Ω), that is [2]:

$$p(X|S) = \prod_{k=1}^{N} p(x_k | s_k) \tag{14.2}$$
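The two neighborhood-system axioms stated above (each pixel outside its own neighborhood, and mutual neighborship) can be checked directly on a toy grid. This second-order (8-connected) construction is a generic illustration, not code from the chapter.

```python
# Illustration of the neighborhood-system axioms on a width x height grid:
# k is excluded from N_k, and h in N_k implies k in N_h.

def second_order_neighbors(k, width, height):
    """Indices of the 8-connected neighbors of pixel k (row-major order)."""
    r, c = divmod(k, width)
    out = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) == (0, 0):
                continue                  # the pixel itself is not a neighbor
            rr, cc = r + dr, c + dc
            if 0 <= rr < height and 0 <= cc < width:
                out.append(rr * width + cc)
    return out

W, H = 5, 4
for k in range(W * H):
    nk = second_order_neighbors(k, W, H)
    assert k not in nk                                   # k not in N_k
    for h in nk:
        assert k in second_order_neighbors(h, W, H)      # symmetry
```

Note that border pixels simply have smaller neighborhoods (three at the corners, five on the edges), which the definition permits.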

and the joint prior PMF P(S) to be a Markov random field with respect to the above-mentioned neighborhood system, that is [2,3]:

- the probability distribution of each k-th image label, conditioned to all the other image labels, is equivalent to the distribution of the k-th label conditioned only to the labels of the neighboring pixels (k = 1, 2, ..., N):

$$P\{s_k = \omega_i \mid s_h : h \neq k\} = P\{s_k = \omega_i \mid s_h : h \in N_k\}, \quad i = 1, 2, \ldots, M \tag{14.3}$$

- the PMF of S is a strictly positive function on Ω^N, that is, P(S) > 0 for all S ∈ Ω^N.

¹ All the vectors in the chapter are implicitly assumed to be column vectors; we denote by u_i the i-th component of an m-dimensional vector u ∈ R^m (i = 1, 2, ..., m), by ‘‘col’’ the operator of column-vector juxtaposition (i.e., col[u, v] is the vector obtained by stacking the two vectors u ∈ R^m and v ∈ R^n in a single (m + n)-dimensional column vector), and by the superscript ‘‘T’’ the matrix transpose operator.

The Markov assumption expressed by Equation 14.3 allows restricting the statistical relationships among the image labels to the local relationships inside the predefined neighborhood, thus greatly simplifying the spatial-contextual model for the label distribution as compared to a generic global model for the joint PMF of all the image labels. However, as stated in the next subsection, thanks to the Hammersley–Clifford theorem, despite this strong analytical simplification, a large class of contextual models can be accommodated under the MRF formalism.

14.3.2 Energy Functions

Given the neighborhood system {N_k}_{k=1}^N, we denote by ‘‘clique’’ a set Q of pixels (Q ⊂ {1, 2, ..., N}) such that, for each pair (k, h) of pixels in Q, k and h turn out to be mutually neighbors, that is:

$$k \in N_h \iff h \in N_k \qquad \forall\, k, h \in Q,\; k \neq h \tag{14.4}$$

By denoting the collection of all the cliques in the adopted neighborhood system by $\mathcal{Q}$ and the vector of the pixel labels in the clique Q ∈ $\mathcal{Q}$ by S_Q (i.e., S_Q = col[s_k : k ∈ Q]), the Hammersley–Clifford theorem states that the label configuration S is an MRF if and only if, for any clique Q ∈ $\mathcal{Q}$, there exists a real-valued function V_Q(S_Q) (usually named ‘‘potential function’’) of the pixel labels in Q, such that the global PMF of S is given by the following Gibbs distribution [3]:

$$P(S) = \frac{1}{Z_{\mathrm{prior}}} \exp\left[-\frac{U_{\mathrm{prior}}(S)}{\Theta}\right], \qquad \text{where } U_{\mathrm{prior}}(S) = \sum_{Q \in \mathcal{Q}} V_Q(S_Q) \tag{14.5}$$

Θ is a positive parameter, and Z_prior is a normalizing constant. Because of the formal similarity between the probability distribution in Equation 14.5 and the well-known Maxwell–Boltzmann distribution introduced in statistical mechanics for canonical ensembles [47], U_prior(·), Θ, and Z_prior are usually named ‘‘energy function,’’ ‘‘temperature,’’ and ‘‘partition function,’’ respectively. A very large class of statistical interactions among spatially distinct pixels can be modeled in this framework by simply choosing a suitable function U_prior(·), which makes the MRF approach highly flexible. In addition, when coupling the Hammersley–Clifford formulation of the prior PMF P(S) with the conditional independence assumption stated for the conditional PDF p(X|S) (see Equation 14.2), an energy representation also holds for the global posterior distribution P(S|X), that is [2]:

$$P(S|X) = \frac{1}{Z_{\mathrm{post},X}} \exp\left[-\frac{U_{\mathrm{post}}(S|X)}{\Theta}\right] \tag{14.6}$$

where Z_{post,X} is a normalizing constant and U_post(·) is a posterior energy function, given by:

$$U_{\mathrm{post}}(S|X) = -\Theta \sum_{k=1}^{N} \ln p(x_k|s_k) + \sum_{Q \in \mathcal{Q}} V_Q(S_Q) \tag{14.7}$$

This formulation of the global posterior probability allows addressing the MAP classification task as the minimization of the energy function U_post(·|X), which is locally defined, according to the pixelwise PDF of the feature vectors conditioned to the class labels and to the collection of the potential functions. This makes the maximization problem of Equation 14.1 tractable and allows a contextual classification map of the image I to be feasibly generated. However, the (prior and posterior) energy functions are generally parameterized by several real parameters λ_1, λ_2, ..., λ_L. Therefore, the solution of the minimum-energy problem requires the preliminary selection of a proper value for the parameter vector λ = (λ_1, λ_2, ..., λ_L)^T. As described in the next subsections, in the present chapter we address this parameter-setting issue with regard to a broad family of MRF models that have been employed in the remote-sensing literature, and to the ICM minimization strategy for the energy function.

14.3.3 Operational Setting of the Proposed Method

Among the techniques proposed in the literature to deal with the task of minimizing the posterior energy function (see Section 14.1), in the present chapter ICM is adopted as a trade-off between the effectiveness of the minimization process and the computation time [5,7], and it is endowed with an automatic parameter-setting stage. Specifically, ICM is initialized with a given label vector S^0 = (s1^0, s2^0, . . . , sN^0)^T (e.g., generated by a previously applied noncontextual supervised classifier) and it iteratively modifies the class labels to decrease the energy function. In particular, denoting by Ck the set of the neighbor labels of the k-th pixel (i.e., the "context" of the k-th pixel, Ck = col[sh : h ∈ Nk]), at the t-th ICM iteration, the label sk is updated according to the feature vector xk and to the current neighboring labels Ck^t, so that sk^{t+1} is the class label ωi that minimizes a local energy function Uλ(ωi | xk, Ck^t) (the subscript λ is introduced to stress the dependence of the energy on the parameter vector λ). More specifically, given the neighborhood system, an energy representation can also be proved for the local distribution of the class labels, that is (i = 1, 2, . . . , M; k = 1, 2, . . . , N) [9]:

P{sk = ωi | xk, Ck} = (1/Zkλ) exp[−Uλ(ωi | xk, Ck)]   (14.8)

where Zkλ is a further normalizing constant and the local energy is defined by

Uλ(ωi | xk, Ck) = −ln p(xk | ωi) + Σ_{Q∋k} VQ(SQ)   (14.9)
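The ICM label update just described can be sketched as follows. This is a minimal illustration, not the authors' code; the array layout, the sequential raster-scan update order, and the callable local_energy (which stands for Uλ evaluated on a candidate class, a feature vector, and the current neighborhood labels) are assumptions.

```python
import numpy as np

def icm_sweep(labels, features, local_energy, neighbors):
    """One ICM sweep: each pixel takes the class that minimizes its local
    energy, scanning pixels sequentially and reading already-updated labels."""
    n_classes = int(labels.max()) + 1
    new_labels = labels.copy()
    for k in range(len(labels)):
        context = new_labels[neighbors[k]]  # C_k: current labels of the neighborhood N_k
        energies = [local_energy(i, features[k], context) for i in range(n_classes)]
        new_labels[k] = int(np.argmin(energies))  # minimum-energy class label
    return new_labels
```

Repeated sweeps until the label vector stops changing realize the iterative energy decrease described above.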

Here we focus on the family of MRF models whose local energy functions Uλ(·) can be expressed as weighted sums of distinct energy contributions, that is,

Uλ(ωi | xk, Ck) = Σ_{ℓ=1}^{L} λℓ Eℓ(ωi | xk, Ck),   k = 1, 2, . . . , N   (14.10)

where Eℓ(·) is the ℓ-th contribution and the parameter λℓ plays the role of the weight of Eℓ(·) (ℓ = 1, 2, . . . , L). A formal comparison between Equation 14.9 and Equation 14.10 suggests that one of the considered L contributions (say, E1(·)) should be related to the pixelwise conditional PDF of the feature vector (i.e., E1(ωi | xk) ∝ −ln p(xk | ωi)), which formalizes the spectral information associated with each single pixel. From this viewpoint, postulating the presence of (L − 1) further contributions implicitly means that (L − 1) typologies of contextual information are modeled by the adopted MRF and are supposed to be separable (i.e., combined in a purely additive way) [7]. Formally speaking, Equation 14.10

MRF-Based Remote-Sensing Image Classification


represents a constraint on the energy function, but most MRF models employed in remote sensing for classification, change-detection, or segmentation purposes belong to this category. For instance, the pairwise interaction model described in Ref. [2], the well-known spatial Potts MRF model employed in Ref. [17] for change-detection purposes and in Ref. [5] for hyperspectral data classification, the spatio-temporal MRF model introduced in Ref. [7] for multi-date image classification, and the multi-source classification models defined in Ref. [9] all belong to this family of MRFs.

14.3.4 The Proposed MRF Parameter Optimization Method

With regard to the class of MRFs identified by Equation 14.10, the proposed method employs the training data to identify a set of parameters for the purpose of maximizing the classification accuracy by exploiting the linear relation between the energy function and the parameter vector. When focusing on the first ICM iteration, the k-th training sample (k ∈ T) is correctly classified by the minimum-energy rule if:

Uλ(sk* | xk, Ck^0) ≤ Uλ(ωi | xk, Ck^0)   for all ωi ≠ sk*   (14.11)

or equivalently:

Σ_{ℓ=1}^{L} λℓ [Eℓ(ωi | xk, Ck^0) − Eℓ(sk* | xk, Ck^0)] ≥ 0   for all ωi ≠ sk*   (14.12)

We note that the inequalities in Equation 14.12 are linear with respect to λ, that is, the correct classification of a training pixel is expressed as a set of (M − 1) linear inequalities with respect to the model parameters. More formally, when collecting the energy differences contained in Equation 14.12 in a single L-dimensional column vector:

εki = col[Eℓ(ωi | xk, Ck^0) − Eℓ(sk* | xk, Ck^0) : ℓ = 1, 2, . . . , L]   (14.13)

the k-th training pixel is correctly classified by ICM if εki^T λ ≥ 0 for all the class labels ωi ≠ sk* (k ∈ T). By denoting by E the matrix obtained by juxtaposing all the row vectors εki^T (k ∈ T, ωi ≠ sk*), we conclude that ICM correctly classifies the whole training set T if and only if²:

Eλ ≥ 0   (14.14)
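For illustration, the construction of the matrix E from the per-contribution energies of the training pixels can be sketched as follows. This is a minimal sketch under an assumed array layout; the function name is hypothetical.

```python
import numpy as np

def build_inequality_matrix(energies, true_labels):
    """Stack the energy differences of Equation 14.13 into the matrix E.

    energies[k, i, l] holds E_l(class i | x_k, C_k^0) for the k-th training
    pixel; the output has one row per (training pixel, wrong class) pair,
    i.e., R = T*(M - 1) rows and L columns.
    """
    T, M, L = energies.shape
    rows = []
    for k in range(T):
        true_e = energies[k, true_labels[k], :]          # E_l(s_k* | x_k, C_k^0)
        for i in range(M):
            if i != true_labels[k]:
                rows.append(energies[k, i, :] - true_e)  # eps_ki row of Equation 14.13
    return np.vstack(rows)
```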

Operatively, the matrix E includes all the coefficients (i.e., the energy differences) of the linear inequalities in Equation 14.12; hence, E has L columns and a row for each inequality, that is, (M − 1) rows for each training pixel (corresponding to the (M − 1) class labels that differ from the true label). Therefore, denoting by |T| the number of training samples (i.e., the cardinality of T), E is an (R × L)-sized matrix, with R = |T| (M − 1). Since R ≫ L, the system in Equation 14.14 presents a number of inequalities much larger than the number of unknown variables. In particular, a feasible approach would lie in reformulating the matrix inequality in Equation 14.14 as an equality Eλ = b, where b ∈ R^R is a "margin" vector with positive components, and solving this equality by a minimum square error (MSE) approach [24]. However, MSE would require a preliminary manual choice of the margin vector b. Therefore, we avoid using this approach, and note that this problem is formally identical to the problem of computing

² Given two m-dimensional vectors u and v (u, v ∈ R^m), we write u ≥ v to mean ui ≥ vi for all i = 1, 2, . . . , m.

Signal and Image Processing for Remote Sensing


the weight parameters of a linear binary discriminant function [24]. Hence, we propose to address the solution of Equation 14.14 by extending to the present context the methods described in the literature to compute such a linear discriminant function. In particular, we adopt the Ho–Kashyap method [24]. This approach jointly optimizes both λ and b according to the following quadratic programming problem [24]:

minimize ‖Eλ − b‖² over λ ∈ R^L and b ∈ R^R, subject to br > 0 for r = 1, 2, . . . , R   (14.15)

which is solved by iteratively alternating an MSE step to update λ and a gradient-like descent step to update b [24]. Specifically, the following operations are performed at the t-th step (t = 0, 1, 2, . . . ) of the Ho–Kashyap procedure [24]:

• given the current margin vector bt, compute a corresponding parameter vector λt by minimizing ‖Eλ − bt‖² with respect to λ, that is, compute:

λt = E# bt, where E# = (E^T E)^{−1} E^T   (14.16)

is the so-called pseudo-inverse of E (provided that E^T E is nonsingular) [48];

• compute the error vector et = Eλt − bt ∈ R^R;

• update each component of the margin vector by minimizing ‖Eλt − b‖² with respect to b by a gradient-like descent step that allows the margin components only to be increased [24], that is, compute

br^{t+1} = br^t + ρ er^t if er^t > 0;   br^{t+1} = br^t if er^t ≤ 0,   r = 1, 2, . . . , R   (14.17)

where ρ is a convergence parameter. In particular, the proposed approach to the MRF parameter-setting problem allows exploiting the known theoretical results about the Ho–Kashyap method in the context of linear discriminant functions. Specifically, one can prove that, if the matrix inequality in Equation 14.14 has solutions and if 0 < ρ < 2, then the Ho–Kashyap algorithm converges to a solution of Equation 14.14 in a finite number t* of iterations (i.e., we obtain et* = 0 and consequently bt*+1 = bt* and Eλt* = bt* > 0) [24]. On the other hand, if the inequality in Equation 14.14 has no solution, it is possible to prove that each component er^t of the error vector (r = 1, 2, . . . , R) either vanishes for t → +∞ or takes on nonpositive values, while the error magnitude ‖et‖ converges to a positive limit, which is bounded away from zero; in the latter situation, according to the Ho–Kashyap iterative procedure, the algorithm stops, and one can conclude that the matrix inequality in Equation 14.14 has no solution [46]. Therefore, the convergence (either finite-time or asymptotic) of the proposed parameter-setting method is guaranteed in any case. However, the number t* of iterations required to reach convergence in the first case, or the number t0 of iterations needed to detect the nonexistence of a solution of Equation 14.14 in the second case, is not known in advance. Hence, specific stop conditions are usually adopted for the Ho–Kashyap procedure, for instance, by stopping the iterative process when |λℓ^{t+1} − λℓ^t| < εstop for all ℓ = 1, 2, . . . , L and |br^{t+1} − br^t| < εstop for all r = 1, 2, . . . , R, where εstop is a given threshold (in the present


chapter, εstop = 0.0001 is used). The algorithm was initialized by setting unitary values for all the components of λ and b and by choosing ρ = 1. Coupling the proposed Ho–Kashyap-based parameter-setting method with the ICM classification approach results in an automatic contextual supervised classification approach (hereafter denoted by HK–ICM), which performs the following processing steps:

• Noncontextual step: generate an initial noncontextual classification map (i.e., an initial label vector S^0) by applying a given supervised noncontextual (e.g., Bayesian or neural) classifier;

• Energy-difference step: compute the energy-difference matrix E according to the adopted MRF model and to the label vector S^0;

• Ho–Kashyap step: compute an optimal parameter vector λ* by running the Ho–Kashyap procedure (applied to the matrix E) up to convergence; and

• ICM step: generate a contextual classification map by running ICM (fed with the parameter vector λ*) up to convergence.
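The Ho–Kashyap step at the core of this scheme can be sketched as follows. This is a minimal numpy illustration of Equation 14.15 through Equation 14.17 together with the stop condition on λ and b, not the authors' implementation; numpy's pinv is used for the pseudo-inverse, and the function name and defaults are assumptions.

```python
import numpy as np

def ho_kashyap(E, rho=1.0, eps_stop=1e-4, max_iter=100000):
    """Ho-Kashyap iteration for the system E @ lam >= 0.

    Alternates the MSE step for lam with the gradient-like margin update,
    starting from unitary lam and b.
    """
    R, L = E.shape
    lam, b = np.ones(L), np.ones(R)
    E_pinv = np.linalg.pinv(E)  # E# = (E^T E)^-1 E^T when E^T E is nonsingular
    for _ in range(max_iter):
        lam_new = E_pinv @ b                       # MSE step (Equation 14.16)
        e = E @ lam_new - b                        # error vector e_t
        b_new = b + rho * np.where(e > 0, e, 0.0)  # margin update (Equation 14.17)
        done = (np.abs(lam_new - lam).max() < eps_stop
                and np.abs(b_new - b).max() < eps_stop)
        lam, b = lam_new, b_new
        if done:
            break
    return lam, b
```

With 0 < rho < 2 and a solvable system, the returned lam satisfies E @ lam > 0, i.e., all the training inequalities of Equation 14.14.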

A block diagram of the proposed contextual classification scheme is shown in Figure 14.1. It is worth noting that, according to the role of the parameters as weights of energy contributions, negative values for λ1*, λ2*, . . . , λL* would be undesirable. To prevent this issue, we note that, if all the entries in a given row of the matrix E are negative,

FIGURE 14.1 Block diagram of the proposed contextual classification scheme: the multiband image and the training map feed the noncontextual step, which generates the initial noncontextual classification map S^0; the energy-difference step computes the matrix E; the Ho–Kashyap step computes the optimal parameter vector λ*; and the ICM step produces the contextual classification map.


the corresponding inequality cannot be satisfied by a vector λ with positive components. Therefore, before the application of the Ho–Kashyap step, the rows of E containing only negative entries are cancelled, as such rows would represent linear inequalities that could not be satisfied by positive parameter vectors. In other words, each deleted row corresponds to a training pixel k ∈ T for which some class ωi ≠ sk* has a lower energy than the true class label sk* for all possible positive values of the parameters λ1, λ2, . . . , λL (therefore, its classification is necessarily wrong).
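This preliminary cancellation of the all-negative rows of E can be sketched as follows (hypothetical helper; a row with all entries negative cannot satisfy εki^T λ ≥ 0 for any strictly positive λ):

```python
import numpy as np

def drop_unsatisfiable_rows(E):
    """Cancel the rows of E whose entries are all negative: the corresponding
    inequalities cannot hold for any strictly positive parameter vector."""
    keep = ~np.all(E < 0, axis=1)
    return E[keep]
```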

14.4 Experimental Results

The proposed HK–ICM method was tested on three different MRF models, each applied to a distinct real data set. In all the cases, the results provided by ICM, endowed with the proposed automatic parameter-optimization method, are compared with the ones that can be achieved by performing an exhaustive grid search for the parameter values yielding the highest accuracies on the training set; operationally, such parameter values can be obtained by a ‘‘trial-and-error’’ procedure (TE–ICM in the following). Specifically, in all the experiments, spatially distinct training and test fields were available for each class. We note that the highest training-set (and not test-set) accuracy is searched for by TE–ICM to perform a consistent comparison with the proposed method, which deals only with the available training samples. A comparison between the classification accuracies of HK–ICM and TE–ICM on such a set of samples aims at assessing the performances of the proposed method from the viewpoint of the MRF parameter-optimization problem. On the other hand, an analysis of the test-set accuracies allows one to assess the quality of the resulting classification maps. Correspondingly, Table 14.1 through Table 14.3 show, for the three experiments, the classification accuracies provided by HK–ICM and those given by TE–ICM on both the training and the test sets. In all the experiments, 10 ICM iterations were sufficient to reach convergence, and the Ho–Kashyap algorithm was initialized by setting a unitary value for all the MRF parameters (i.e., by initially giving the same weight to all the energy contributions).
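The comparisons below are stated in terms of the overall accuracy (the fraction of correctly classified samples) and the average accuracy (the mean of the per-class accuracies). As a reference, both can be computed from a confusion matrix as follows (hypothetical helper, not from the chapter):

```python
import numpy as np

def overall_and_average_accuracy(confusion):
    """OA and AA from a confusion matrix with confusion[i, j] = number of
    samples of true class i assigned to class j."""
    confusion = np.asarray(confusion, dtype=float)
    per_class = np.diag(confusion) / confusion.sum(axis=1)  # class-wise accuracies
    oa = np.trace(confusion) / confusion.sum()              # overall accuracy
    return oa, float(per_class.mean())                      # (OA, AA)
```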

TABLE 14.1
Experiment I: Training and Test-Set Cardinalities and Classification Accuracies Provided by the Noncontextual GMAP Classifier, by HK–ICM, and by TE–ICM

                                          Test-Set Accuracy              Training-Set Accuracy
Class              Training  Test       GMAP     HK–ICM   TE–ICM        HK–ICM   TE–ICM
                   Samples   Samples    (%)      (%)      (%)           (%)      (%)
Wet soil           1866      750        93.73    96.93    96.80         98.66    98.61
Urban              4195      2168       94.10    99.31    99.45         97.00    97.02
Wood               3685      2048       98.19    98.97    98.97         96.85    96.85
Water              1518      875        91.66    92.69    92.69         98.16    98.22
Bare soil          3377      2586       97.53    99.54    99.54         99.20    99.20
Overall accuracy                        95.86    98.40    98.42         97.80    97.81
Average accuracy                        95.04    97.49    97.49         97.97    97.98

TABLE 14.2
Experiment II: Training and Test-Set Cardinalities and Classification Accuracies Provided by the Noncontextual DTC Classifier, by HK–ICM, and by TE–ICM

                                                       Test-Set Accuracy             Training-Set Accuracy
Image          Class             Training  Test      DTC     HK–ICM   TE–ICM       HK–ICM   TE–ICM
                                 Samples   Samples   (%)     (%)      (%)          (%)      (%)
October 2000   Wet soil          1194      895       79.29   92.63    91.84        96.15    94.30
               Urban             3089      3033      88.30   96.41    99.04        94.59    96.89
               Wood              4859      3284      99.15   99.76    99.88        99.96    99.98
               Water             1708      1156      98.70   98.70    98.79        99.82    99.82
               Bare soil         4509      3781      97.70   99.00    99.34        98.54    98.91
               Overall accuracy                      93.26   98.06    98.81        98.15    98.59
               Average accuracy                      92.63   97.30    97.78        97.81    97.98
June 2001      Wet soil          1921      1692      96.75   98.46    98.94        99.64    99.74
               Urban             3261      2967      89.25   92.35    94.00        98.25    99.39
               Wood              1719      1413      94.55   98.51    98.51        96.74    98.08
               Water             1461      1444      99.72   99.72    99.79        99.93    100
               Bare soil         3168      3052      97.67   99.34    99.48        98.77    98.90
               Agricultural      2773      2431      82.48   83.42    83.67        99.89    99.96
               Overall accuracy                      92.68   94.61    95.13        98.86    99.34
               Average accuracy                      93.40   95.30    95.73        98.87    99.34

TABLE 14.3
Experiment III: Training and Test-Set Cardinalities and Classification Accuracies Provided by the Noncontextual DTC Classifier, by HK–ICM, and by TE–ICM

                                                     Test-Set Accuracy             Training-Set Accuracy
Image       Class             Training  Test       DTC     HK–ICM   TE–ICM       HK–ICM   TE–ICM
                              Samples   Samples    (%)     (%)      (%)          (%)      (%)
April 16    Wet soil          6205      5082       90.38   99.19    96.79        100      100
            Bare soil         1346      1927       91.70   92.42    95.28        99.93    100
            Overall accuracy                       90.74   97.33    96.38        99.99    100
            Average accuracy                       91.04   95.81    96.04        99.96    100
April 17    Wet soil          6205      5082       93.33   99.17    97.93        100      100
            Bare soil         1469      1163       99.66   100      100          99.52    100
            Overall accuracy                       94.51   99.33    98.32        99.91    100
            Average accuracy                       96.49   99.59    98.97        99.76    100
April 18    Wet soil          6205      5082       86.80   97.28    92.84        100      100
            Bare soil         1920      2122       97.41   96.80    98.26        98.70    99.95
            Overall accuracy                       89.92   97.14    94.43        99.69    99.99
            Average accuracy                       92.10   97.04    95.55        99.35    99.97

14.4.1 Experiment I: Spatial MRF Model for Single-Date Image Classification

The spatial MRF model described in Refs. [5,17] is adopted: it employs a second-order neighborhood (i.e., Ck includes the labels of the 8 pixels surrounding the k-th pixel; k = 1, 2, . . . , N) and a parametric Gaussian N(mi, Σi) model for the PDF p(xk | ωi) of a feature vector xk conditioned to the class ωi (i = 1, 2, . . . , M; k = 1, 2, . . . , N). Specifically, two energy contributions are defined, that is, a "spectral" energy term E1(·), related to the information conveyed by the conditional statistics of the feature vector of each single pixel, and a "spatial" term E2(·), related to the information conveyed by the correlation among the class labels of neighboring pixels, that is (k = 1, 2, . . . , N):

E1(ωi | xk) = (xk − m̂i)^T Σ̂i^{−1} (xk − m̂i) + ln det Σ̂i,   E2(ωi | Ck) = −Σ_{ωj ∈ Ck} δ(ωi, ωj)   (14.18)

where m̂i and Σ̂i are a sample-mean estimate and a sample-covariance estimate of the conditional mean mi and of the conditional covariance matrix Σi, respectively, computed using the training samples of ωi (i = 1, 2, . . . , M); δ(·, ·) is the Kronecker delta function (i.e., δ(a, b) = 1 for a = b and δ(a, b) = 0 for a ≠ b). Hence, in this case, two parameters, λ1 and λ2, have to be set. The model was applied to a 6-band, 870 × 498 pixel Landsat-5 TM image acquired in April, 1994, over an urban and agricultural area around the town of Pavia (Italy) (Figure 14.2a), and presenting five thematic classes (i.e., "wet soil," "urban," "wood," "water," and "bare soil;" see Table 14.1). The above-mentioned normality assumption for the conditional PDFs is usually accepted as a model for the class statistics in optical data [49,50]. A standard noncontextual MAP classifier with Gaussian classes (hereafter denoted simply by GMAP) [49,50] was adopted in the noncontextual step. The Ho–Kashyap step required 31716 iterations to reach convergence and selected λ1* = 0.99 and λ2* = 10.30. The classification accuracies provided by GMAP and HK–ICM on the test set are given in Table 14.1. GMAP already provides good classification performances, with accuracies higher than 90% for all the classes. However, as expected, the contextual HK–ICM classifier further improves the classification result, in particular yielding a 3.20% accuracy increase for "wet soil" and a 5.21% increase for "urban," which result in a 2.54% increase in the overall accuracy (OA, i.e., the percentage of correctly classified test samples) and in a 2.45% increase in the average accuracy (AA, i.e., the average of the accuracies obtained on the five classes). Furthermore, for the sake of comparison, Table 14.1 also shows the results obtained by the contextual MRF–ICM classifier, applied by using a standard "trial-and-error" parameter-setting procedure (TE–ICM).
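A minimal sketch of the two energy contributions of this spatial model follows. It assumes, as in Equation 14.18, that the Potts-like spatial term enters with a negative sign so that label agreement lowers the energy; the function names and argument layout are illustrative, not the authors' code.

```python
import numpy as np

def spectral_energy(x, mean, cov):
    """E1: Mahalanobis term plus log-determinant of the class covariance."""
    d = x - mean
    return float(d @ np.linalg.inv(cov) @ d + np.log(np.linalg.det(cov)))

def spatial_energy(class_index, context):
    """E2: minus the number of neighboring labels equal to the candidate class."""
    return -float(np.sum(np.asarray(context) == class_index))

def local_energy(class_index, x, context, means, covs, lam1, lam2):
    """Weighted sum of the two contributions, as in Equation 14.10."""
    return (lam1 * spectral_energy(x, means[class_index], covs[class_index])
            + lam2 * spatial_energy(class_index, context))
```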
Specifically, fixing³ λ1 = 1, the value of λ2 yielding the highest value of OA on the training set was searched exhaustively in the range [0,20] with discretization step 0.2. The value selected exhaustively by TE–ICM was λ2 = 11.20, which is quite close to the value computed automatically by the proposed procedure. Accordingly, the classification accuracies obtained by HK–ICM and TE–ICM on both the training and test sets were very similar (Table 14.1).

14.4.2 Experiment II: Spatio-Temporal MRF Model for Two-Date Multi-Temporal Image Classification

As a second experiment, we addressed the problem of supervised multi-temporal classification of two co-registered images, I0 and I1, acquired over the same ground area at

³ According to the minimum-energy decision rule, this choice causes no loss of generality, as the classification result is affected only by the relative weight λ2/λ1 between the two parameters and not by their absolute values.



FIGURE 14.2 Data sets employed for the experiments: (a) band TM-3 of the multi-spectral image acquired in April, 1994, and employed in Experiment I; (b) ERS-2 channel (after histogram equalization) of the multi-sensor image acquired in October, 2000, and employed in Experiment II; (c) XSAR channel (after histogram equalization) of the multi-frequency SAR image acquired on April 16, 1994, and employed in Experiment III.

different times, t0 and t1 (t1 > t0). Specifically, the spatio-temporal mutual MRF model proposed in Ref. [7] is adopted, which introduces an energy function taking into account both the spatial context (related to the correlation among neighboring pixels in the same image) and the temporal context (related to the correlation between distinct images of the same area). Let us denote by Mj, xjk, and ωji the number of classes in Ij, the feature vector of the k-th pixel in Ij (k = 1, 2, . . . , N), and the i-th class in Ij (i = 1, 2, . . . , Mj), respectively


(j = 0, 1). Focusing, for instance, on the classification of the first-date image I0, the context of the k-th pixel includes two distinct sets of labels: (1) the set S0k of labels of the spatial context, that is, the labels of the 8 pixels surrounding the k-th pixel in I0, and (2) the set T0k of labels of the temporal context, that is, the labels of the pixels lying in I1 inside a 3 × 3 window centered on the k-th pixel (k = 1, 2, . . . , N) [7]. Accordingly, three energy contributions are introduced, that is, a spectral and a spatial term (as in Experiment I) and an additional temporal term, that is (i = 1, 2, . . . , M0; k = 1, 2, . . . , N):

E1(ω0i | x0k) = −ln p̂(x0k | ω0i),   E2(ω0i | S0k) = −Σ_{ω0j ∈ S0k} δ(ω0i, ω0j),
E3(ω0i | T0k) = −Σ_{ω1j ∈ T0k} P̂(ω0i | ω1j)   (14.19)

where p̂(x0k | ω0i) is an estimate of the PDF p(x0k | ω0i) of a feature vector x0k (k = 1, 2, . . . , N) conditioned to the class ω0i, and P̂(ω0i | ω1j) is an estimate of the transition probability between class ω1j at time t1 and class ω0i at time t0 (i = 1, 2, . . . , M0; j = 1, 2, . . . , M1) [7]. A similar 3-component energy function is also introduced at the second date; hence, three parameters have to be set at date t0 (namely, λ01, λ02, and λ03) and three other parameters at date t1 (namely, λ11, λ12, and λ13). In the present experiment, HK–ICM was applied to a two-date multi-temporal data set, consisting of two 870 × 498 pixel images acquired over the same ground area as in Experiment I in October, 2000 and July, 2001, respectively. At both acquisition dates, eight Landsat-7 ETM+ bands were available, and a further ERS-2 channel (C-band, VV polarization) was also available in October, 2000 (Figure 14.2b). The same five thematic classes as considered in Experiment I were considered in the October, 2000 scene, while the July, 2001 image also presented a further "agricultural" class. For all the classes, training and test data were available (Table 14.2). Due to the multi-sensor nature of the October, 2000 image, a nonparametric technique was employed in the noncontextual step. Specifically, the dependence tree classifier (DTC) approach was adopted, which approximates each multi-variate class-conditional PDF as a product of automatically selected bivariate PDFs [51]. As proposed in Ref. [7], the Parzen window method [30], applied with Gaussian kernels, was used to model such bivariate PDFs.
In particular, to avoid the usual "trial-and-error" selection of the kernel-width parameters involved in the Parzen method, we adopted the simple "reference density" automatization procedure, which computes the kernel width by asymptotically minimizing the "mean integrated square error" functional, according to a given reference model for the unknown PDF (for further details on the automatic kernel-width selection for Parzen density estimation, we refer the reader to Refs. [52,53]). The "reference density" approach was chosen for its simplicity and short computation time; in particular, it is applied according to a Gaussian reference density for the kernel widths of the ETM+ features [53] and to a Rayleigh reference density for the kernel widths of the ERS-2 feature (for further details on this SAR-specific kernel-width selection approach, we refer the reader to Ref. [54]). The resulting PDF estimates were fed to an MAP classifier, together with prior-probability estimates (computed as relative frequencies on the training set), to generate an initial noncontextual classification map for each date. During the noncontextual step, transition-probability estimates were computed as relative frequencies on the two initial maps (i.e., P̂(ω0i | ω1j) was computed as the ratio nij/mj between the number nij of pixels assigned both to ω0i in I0 and to ω1j in I1 and the total number mj of pixels assigned to ω1j in I1; i = 1, 2, . . . , M0; j = 1, 2, . . . , M1). The energy-difference and the Ho–Kashyap steps were applied first to the October,


2000 image, converging to (1.06, 0.95, 1.97) in 2632 iterations, and then to the July, 2001 image, converging to (1.00, 1.00, 1.01) in only 52 iterations. As an example, the convergent behavior of the parameter values during the Ho–Kashyap iterations for the October, 2000 image is shown in Figure 14.3. The classification accuracies obtained by ICM, applied to the spatio-temporal model of Equation 14.19 with these parameter values, are shown in Table 14.2, together with the results of the noncontextual DTC algorithm. Also in this experiment, HK–ICM provided good classification accuracies at both dates, specifically yielding large accuracy increases for several classes as compared with the noncontextual DTC initialization (in particular, "urban" at both dates, "wet soil" in October, 2000, and "wood" in June, 2001). Also for this experiment, we compared the results obtained by HK–ICM with the ones provided by the MRF–ICM classifier with a "trial-and-error" parameter setting (TE–ICM, Table 14.2). Fixing λ01 = λ11 = 1 (as in Experiment I) at both dates, a grid search was performed to identify the values of the other parameters that yielded the highest overall accuracies on the training set at the two dates. In general, this would require an interactive optimization of four distinct parameters (namely, λ02, λ03, λ12, and λ13). To reduce the time taken by this interactive procedure, the restrictions λ02 = λ12 and λ03 = λ13 were adopted, thus assigning the same weight λ2 to the two spatial energy contributions and the same weight λ3 to the two temporal contributions, and performing a grid search in a two-dimensional (rather than four-dimensional) space. The adopted search range was [0,10] for both parameters with discretization step 0.2: for each combination of (λ2, λ3), ICM was run until convergence and the resulting classification accuracies on the training set were computed.
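The "trial-and-error" grid search just described can be sketched as follows. This is an illustrative sketch, not the authors' code: train_accuracy stands for a run of ICM followed by a training-set overall-accuracy evaluation, and is an assumption.

```python
import numpy as np

def grid_search_weights(train_accuracy, grid=None):
    """TE-ICM-style exhaustive search: lambda_1 is fixed to 1 and the spatial
    and temporal weights (lam2, lam3) are scanned on a discretized grid,
    keeping the pair with the highest training-set overall accuracy."""
    if grid is None:
        grid = np.arange(0.0, 10.0 + 1e-9, 0.2)  # [0, 10] with step 0.2
    best_lam2, best_lam3, best_oa = None, None, -np.inf
    for lam2 in grid:
        for lam3 in grid:
            oa = train_accuracy(lam2, lam3)  # run ICM, evaluate on the training set
            if oa > best_oa:
                best_lam2, best_lam3, best_oa = lam2, lam3, oa
    return best_lam2, best_lam3, best_oa
```

The nested scan makes the cost quadratic in the grid size, which is why the restrictions on the parameters are needed to keep this interactive procedure tractable.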
A global maximum of the average of the two training-set OAs obtained at the two dates was reached for λ2 = 9.8 and λ3 = 2.6; the corresponding accuracies (on both the training and test sets) are shown in Table 14.2. This optimal parameter vector turns out to be quite different from the solution automatically computed by the proposed method. The training-set accuracies of TE–ICM are better than those of HK–ICM; however, the difference between the OAs provided on the training set by the two approaches is only 0.44% in October, 2000 and 0.48% in June, 2001, and the differences between the AAs are only 0.17% and 0.47%, respectively. This suggests that, although identifying different parameter solutions, the proposed automatic approach and the standard interactive one achieve similar classification performances. A similarly small difference in performance can also be noted between the corresponding test-set accuracies.
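As noted above, the transition probabilities used by the temporal energy term were estimated as relative frequencies on the two initial noncontextual maps, i.e., P̂(ω0i | ω1j) = nij/mj. A minimal sketch of this estimate follows (hypothetical function, assuming integer-coded label maps):

```python
import numpy as np

def estimate_transition_probabilities(map0, map1, m0, m1):
    """P_hat(w0i | w1j) = n_ij / m_j, with n_ij the number of pixels labelled
    i in the first-date map and j in the second-date map."""
    map0 = np.asarray(map0).ravel()
    map1 = np.asarray(map1).ravel()
    counts = np.zeros((m0, m1))
    for i, j in zip(map0, map1):
        counts[i, j] += 1.0                       # joint counts n_ij
    m_j = counts.sum(axis=0)                      # pixels assigned to w1j in I1
    return counts / np.where(m_j > 0, m_j, 1.0)   # each nonempty column sums to 1
```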

FIGURE 14.3 Experiment II: Plots of the behaviors of the "spectral" parameter λ01, the "spatial" parameter λ02, and the "temporal" parameter λ03 versus the number of Ho–Kashyap iterations for the October, 2000, image.


14.4.3 Experiment III: Spatio-Temporal MRF Model for Multi-Temporal Classification of Image Sequences

As a third experiment, HK–ICM was applied to the multi-temporal mutual MRF model (see Section 14.2) extended to the problem of supervised classification of image sequences [7]. In particular, considering a sequence {I0, I1, I2} of three co-registered images acquired over the same ground area at times t0, t1, t2, respectively (t0 < t1 < t2), the mutual model is generalized by introducing an energy function with four contributions at the intermediate date t1, that is, a spectral and a spatial energy term (namely, E1(·) and E2(·)) expressed as in Equation 14.19 and two distinct temporal energy terms, E3(·) and E4(·), related to the backward temporal correlation of I1 with the previous image I0 and to the forward temporal correlation of I1 with I2, respectively [7]. For the first and last images in the sequence (i.e., I0 and I2), only one typology of temporal energy is well defined (in particular, only the forward temporal energy E3(·) for I0 and only the backward energy E4(·) for I2). All the temporal energy terms are computed in terms of transition probabilities, as in Equation 14.19. Therefore, four parameters (λ11, λ12, λ13, and λ14) have to be set at date t1, while only three parameters are needed at date t0 (namely, λ01, λ02, and λ03) or at date t2 (namely, λ21, λ22, and λ24). Specifically, a sequence of three 700 × 280 pixel co-registered multi-frequency SAR images, acquired over an agricultural area near the city of Pavia on April 16, 17, and 18, 1994, was used in the experiment (the ground area is not the same as the one considered in Experiments I and II, although the two regions are quite close to each other). At each date, a 4-look XSAR band (VV polarization) and three 4-look SIR-C bands (HH, HV, and TP (total power)) were available.
The observed scene at the three dates presented two main classes, that is, "wet soil" and "bare soil," and the temporal evolution of these classes between April 16 and April 18 was due to the artificial flooding processes caused by rice cultivation. The PDF estimates and the initial classification maps were computed using the DTC-Parzen method (as in Experiment II), applied here with a kernel width optimized again according to the SAR-specific method developed in Ref. [54] and based on a Rayleigh reference density. As in Experiment II, the transition probabilities were estimated as relative frequencies on the initial noncontextual maps. Specifically, the Ho–Kashyap step was applied separately to set the parameters at each date, converging to (0.97, 1.23, 1.56) in 11510 iterations for April 16, to (1.12, 1.03, 0.99, 1.28) in 15092 iterations for April 17, and to (1.00, 1.04, 0.97) in 2112 iterations for April 18. As shown in Table 14.3, the noncontextual classification stage already provided good accuracies, but the use of the contextual information allowed a sharp increase in accuracy at all three dates (i.e., +6.59%, +4.82%, and +7.22% in OA for April 16, 17, and 18, respectively, and +4.77%, +3.09%, and +4.94% in AA, respectively), thus suggesting the effectiveness of the parameter values selected by the developed algorithm. In particular, as shown in Figure 14.4 with regard to the April 16 image, the noncontextual DTC maps (e.g., Figure 14.4a) were very noisy, due to the impact of the speckle present in the SAR data on the classification results, whereas the contextual HK–ICM maps (e.g., Figure 14.4b) were far less affected by speckle, because of the integration of contextual information in the classification process. It is worth noting that no preliminary despeckling procedure was applied to the input SIR-C/XSAR images before classifying them.
Also for this experiment, in Table 14.3, we present the results obtained by searching exhaustively for the MRF parameters yielding the best training-set accuracies. As in Experiment II, for the "trial-and-error" parameter optimization the following restrictions were adopted: λ01 = λ11 = λ21 = 1, λ02 = λ12 = λ22 = λ2, and λ03 = λ13 = λ14 = λ24 = λ3 (i.e., a unitary value was used for the "spectral" parameters, the same weight λ2 was assigned to all the spatial energy contributions employed at the three dates, and the same weight λ3 was associated with all the temporal energy terms). Therefore, the number of


FIGURE 14.4 Experiment III: Classification maps for the April 16 image provided by: (a) the noncontextual DTC classifier; (b) the proposed contextual HK–ICM classifier. Color legend: white = "wet soil," grey = "dry soil," black = not classified (not-classified pixels are present where the DTC-Parzen PDF estimates exhibit very low values for both classes).

independent parameters to be set interactively was reduced from 7 to 2. Again, the search range [0,10] (discretized with step 0.2) was adopted for both parameters and, for each pair (λ2, λ3), ICM was run up to convergence. Then, the parameter vector yielding the best classification result was chosen. In particular, the average of the three training-set OAs at the three dates exhibited a global maximum for λ2 = 1 and λ3 = 0.4. The corresponding accuracies are given in Table 14.3 and denoted by TE–ICM. As noted with regard to Experiment II, although the parameter vectors selected by the proposed automatic procedure and by the interactive "trial-and-error" one were quite different, the classification performances achieved by the two methods on the training set were very similar (and very close or equal to 100%). On the other hand, in most cases the test-set accuracies obtained by HK–ICM are even better than those given by TE–ICM. These results can be interpreted as a consequence of the fact that, to make the computation time for the exhaustive parameter search tractable, the above-mentioned restrictions on the parameter values were used when applying TE–ICM, whereas HK–ICM optimized all the MRF parameters, without any need for additional restrictions. The resulting TE–ICM parameters were effective for the classification of the training set (as they were specifically optimized toward this end), but they yielded less accurate results on the test set.

14.5 Conclusions

In the present chapter, an innovative algorithm has been proposed that addresses the automation of the parameter-setting operations involved in MRF

MRF-Based Remote-Sensing Image Classification


models for ICM-based supervised image classification. The method is applicable to a wide class of MRF models (i.e., models whose energy functions can be expressed as weighted sums of distinct energy contributions) and formulates the parameter optimization as a linear discrimination problem, solved by using the Ho–Kashyap method. In particular, the theoretical properties known for this algorithm in the context of linear classifier design still hold for the proposed method, thus ensuring a good convergence behavior. The numerical results proved the capability of HK–ICM to automatically select parameter values that yield accurate contextual classification maps. These results have been obtained for several different MRF models, dealing both with single-date supervised classification and with multi-temporal classification of two-date or multidate imagery, which suggests a high flexibility of the method. This is further confirmed by the fact that good classification results were achieved on different types of remote-sensing data (i.e., multi-spectral, optical-SAR multi-source, and multi-frequency SAR) and with different PDF estimation strategies (both parametric and nonparametric). In particular, an interactive choice of the MRF parameters, which selects through an exhaustive grid search the parameter values that provide the best performances on the training set, yielded results very similar to the ones obtained by the proposed HK–ICM methodology in terms of classification accuracy. Specifically, in the first experiment, HK–ICM automatically identified parameter values quite close to the ones selected by this ‘‘trial-and-error’’ procedure (thus providing a very similar classification map), while in the other experiments, HK–ICM chose parameter values different from the ones obtained by the ‘‘trial-and-error’’ strategy, while generating classification results with similar (Experiment II) or even better (Experiment III) accuracies.
The application of the ‘‘trial-and-error’’ approach in acceptable execution times required the definition of suitable restrictions on the parameter values to reduce the dimension of the parameter space to be explored. Such restrictions are not needed by the proposed method, which turns out to be a good compromise between accuracy and degree of automation, as it allows very good classification results to be obtained while completely avoiding the time-consuming phase of ‘‘trial-and-error’’ parameter setting. In addition, in all the experiments, the adopted MRF models yielded a significant increase in accuracy, as compared with the initial noncontextual results. This further highlights the advantages of MRF models, which effectively exploit the contextual information in remote-sensing image classification, and further confirms the usefulness of the proposed technique in automatically setting the parameters of such models. The proposed algorithm allows one to overcome the usual limitation on the use of MRF techniques in supervised image classification, namely their lack of automation, and supports a more extensive use of this powerful classification approach in remote sensing. It is worth noting that HK–ICM optimizes the MRF parameter vector according to the initial classification maps provided as an input to ICM: the resulting optimal parameters are then employed to run ICM up to convergence. A further generalization of the method would adapt the MRF parameters automatically during the ICM iterative process as well, by applying the proposed Ho–Kashyap-based method at each ICM iteration or according to a different predefined iterative scheme. This could allow a further improvement in accuracy, although it would require running the Ho–Kashyap algorithm not once but several times, thus increasing the total computation time of the contextual classification process.
The effect of this combined optimization strategy on the convergence of ICM is an issue worth investigating.


Acknowledgments

This research was carried out within the framework of the PRIN-2002 project entitled ‘‘Processing and analysis of multitemporal and hypertemporal remote-sensing images for environmental monitoring,’’ which was funded by the Italian Ministry of Education, University, and Research (MIUR). The support is gratefully acknowledged. The authors would also like to thank Dr. Paolo Gamba from the University of Pavia, Italy, for providing the SAR images employed in Experiments II and III.

References

1. M. Datcu, K. Seidel, and M. Walessa, Spatial information retrieval from remote sensing images: Part I. Information theoretical perspective, IEEE Trans. Geosci. Rem. Sens., 36(5), 1431–1445, 1998.
2. R.C. Dubes and A.K. Jain, Random field models in image analysis, J. Appl. Stat., 16, 131–163, 1989.
3. S. Geman and D. Geman, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Trans. Pattern Anal. Machine Intell., 6, 721–741, 1984.
4. A.H.S. Solberg, Flexible nonlinear contextual classification, Pattern Recogn. Lett., 25(13), 1501–1508, 2004.
5. Q. Jackson and D. Landgrebe, Adaptive Bayesian contextual classification based on Markov random fields, IEEE Trans. Geosci. Rem. Sens., 40(11), 2454–2463, 2002.
6. M. De Martino, G. Macchiavello, G. Moser, and S.B. Serpico, Partially supervised contextual classification of multitemporal remotely sensed images, in Proc. of IEEE-IGARSS 2003, Toulouse, 2, 1377–1379, 2003.
7. F. Melgani and S.B. Serpico, A Markov random field approach to spatio-temporal contextual image classification, IEEE Trans. Geosci. Rem. Sens., 41(11), 2478–2487, 2003.
8. P.H. Swain, Bayesian classification in a time-varying environment, IEEE Trans. Syst., Man, Cybern., 8(12), 880–883, 1978.
9. A.H.S. Solberg, T. Taxt, and A.K. Jain, A Markov random field model for classification of multisource satellite imagery, IEEE Trans. Geosci. Rem. Sens., 34(1), 100–113, 1996.
10. G. Storvik, R. Fjortoft, and A.H.S. Solberg, A Bayesian approach to classification of multiresolution remote sensing data, IEEE Trans. Geosci. Rem. Sens., 43(3), 539–547, 2005.
11. M.L. Comer and E.J. Delp, The EM/MPM algorithm for segmentation of textured images: Analysis and further experimental results, IEEE Trans. Image Process., 9(10), 1731–1744, 2000.
12. Y. Delignon, A. Marzouki, and W. Pieczynski, Estimation of generalized mixtures and its application to image segmentation, IEEE Trans. Image Process., 6(10), 1364–1375, 2001.
13. X. Descombes, M. Sigelle, and F. Preteux, Estimating Gaussian Markov random field parameters in a nonstationary framework: Application to remote sensing imaging, IEEE Trans. Image Process., 8(4), 490–503, 1999.
14. P. Smits and S. Dellepiane, Synthetic aperture radar image segmentation by a detail preserving Markov random field approach, IEEE Trans. Geosci. Rem. Sens., 35(4), 844–857, 1997.
15. G.G. Hazel, Multivariate Gaussian MRF for multispectral scene segmentation and anomaly detection, IEEE Trans. Geosci. Rem. Sens., 38(3), 1199–1211, 2000.
16. G. Rellier, X. Descombes, F. Falzon, and J. Zerubia, Texture feature analysis using a Gauss–Markov model in hyperspectral image classification, IEEE Trans. Geosci. Rem. Sens., 42(7), 1543–1551, 2004.
17. L. Bruzzone and D.F. Prieto, Automatic analysis of the difference image for unsupervised change detection, IEEE Trans. Geosci. Rem. Sens., 38(3), 1171–1182, 2000.
18. L. Bruzzone and D.F. Prieto, An adaptive semiparametric and context-based approach to unsupervised change detection in multitemporal remote-sensing images, IEEE Trans. Image Process., 40(4), 452–466, 2002.


19. T. Kasetkasem and P.K. Varshney, An image change detection algorithm based on Markov random field models, IEEE Trans. Geosci. Rem. Sens., 40(8), 1815–1823, 2002.
20. J. Besag, On the statistical analysis of dirty pictures, J. R. Statist. Soc., 68, 259–302, 1986.
21. Y. Cao, H. Sun, and X. Xu, An unsupervised segmentation method based on MPM for SAR images, IEEE Geosci. Rem. Sens. Lett., 2(1), 55–58, 2005.
22. X. Descombes, R.D. Morris, J. Zerubia, and M. Berthod, Estimation of Markov random field prior parameters using Markov chain Monte Carlo maximum likelihood, IEEE Trans. Image Process., 8(7), 954–963, 1999.
23. M.V. Ibanez and A. Simo’, Parameter estimation in Markov random field image modeling with imperfect observations. A comparative study, Pattern Rec. Lett., 24(14), 2377–2389, 2003.
24. J.T. Tou and R.C. Gonzalez, Pattern Recognition Principles, Addison-Wesley, MA, 1974.
25. Y.-C. Ho and R.L. Kashyap, An algorithm for linear inequalities and its applications, IEEE Trans. Elec. Comp., 14, 683–688, 1965.
26. G. Celeux, F. Forbes, and N. Peyrand, EM procedures using mean field-like approximations for Markov model-based image segmentation, Pattern Recogn., 36, 131–144, 2003.
27. Y. Zhang, M. Brady, and S. Smith, Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm, IEEE Trans. Med. Imag., 20(1), 45–57, 2001.
28. R. Fiortoft, Y. Delignon, W. Pieczynski, M. Sigelle, and F. Tupin, Unsupervised classification of radar images using hidden Markov models and hidden Markov random fields, IEEE Trans. Geosci. Rem. Sens., 41(3), 675–686, 2003.
29. Z. Kato, J. Zerubia, and M. Berthod, Unsupervised parallel image classification using Markovian models, Pattern Rec., 32, 591–604, 1999.
30. K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd edition, Academic Press, New York, 1990.
31. F. Comets and B. Gidas, Parameter estimation for Gibbs distributions from partially observed data, Ann. Appl. Prob., 2(1), 142–170, 1992.
32. W. Qian and D.N. Titterington, Estimation of parameters of hidden Markov models, Phil. Trans.: Phys. Sci. and Eng., 337(1647), 407–428, 1991.
33. X. Descombes, A dense class of Markov random fields and associated parameter estimation, J. Vis. Commun. Image R., 8(3), 299–316, 1997.
34. J. Besag, Spatial interaction and the statistical analysis of lattice systems, J. R. Statist. Soc., 36, 192–236, 1974.
35. L. Bedini, A. Tonazzini, and S. Minutoli, Unsupervised edge-preserving image restoration via a saddle point approximation, Image and Vision Computing, 17, 779–793, 1999.
36. S.S. Saquib, C.A. Bouman, and K. Sauer, ML parameter estimation for Markov random fields with applications to Bayesian tomography, IEEE Trans. Image Process., 7(7), 1029–1044, 1998.
37. C. Geyer and E. Thompson, Constrained Monte Carlo maximum likelihood for dependent data, J. Roy. Statist. Soc. Ser. B, 54, 657–699, 1992.
38. L. Younes, Estimation and annealing for Gibbsian fields, Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, 24, 269–294, 1988.
39. B. Chalmond, An iterative Gibbsian technique for reconstruction of m-ary images, Pattern Rec., 22(6), 747–761, 1989.
40. Y. Yu and Q. Cheng, MRF parameter estimation by an accelerated method, Patt. Rec. Lett., 24, 1251–1259, 2003.
41. L. Wang, J. Liu, and S.Z. Li, MRF parameter estimation by MCMC method, Pattern Rec., 33, 1919–1925, 2000.
42. L. Wang and J. Liu, Texture segmentation based on MRMRF modeling, Patt. Rec. Lett., 21, 189–200, 2000.
43. H. Derin and H. Elliott, Modeling and segmentation of noisy and textured images using Gibbs random fields, IEEE Trans. Pattern Anal. Machine Intell., 9(1), 39–55, 1987.
44. M. Mignotte, C. Collet, P. Perez, and P. Bouthemy, Sonar image segmentation using an unsupervised hierarchical MRF model, IEEE Trans. Image Process., 9(7), 1216–1231, 2000.
45. B.C.K. Tso and P.M. Mather, Classification of multisource remote sensing imagery using a genetic algorithm and Markov random fields, IEEE Trans. Geosci. Rem. Sens., 37(3), 1255–1260, 1999.


46. R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification, 2nd edition, Wiley, New York, 2001.
47. M.W. Zemansky, Heat and Thermodynamics, 5th edition, McGraw-Hill, New York, 1968.
48. J. Stoer and R. Bulirsch, Introduction to Numerical Analysis, 2nd edition, Springer-Verlag, New York, 1992.
49. J. Richards and X. Jia, Remote Sensing Digital Image Analysis, 3rd edition, Springer-Verlag, Berlin, 1999.
50. D.A. Landgrebe, Signal Theory Methods in Multispectral Remote Sensing, Wiley-InterScience, New York, 2003.
51. M. Datcu, F. Melgani, A. Piardi, and S.B. Serpico, Multisource data classification with dependence trees, IEEE Trans. Geosci. Rem. Sens., 40(3), 609–617, 2002.
52. A. Berlinet and L. Devroye, A comparison of kernel density estimates, Publications de l’Institut de Statistique de l’Université de Paris, 38(3), 3–59, 1994.
53. K.-D. Kim and J.-H. Heo, Comparative study of flood quantiles estimation by nonparametric models, J. Hydrology, 260, 176–193, 2002.
54. G. Moser, J. Zerubia, and S.B. Serpico, Dictionary-based stochastic expectation-maximization for SAR amplitude probability density function estimation, Research Report 5154, INRIA, Mar. 2004.

15 Random Forest Classification of Remote Sensing Data

Sveinn R. Joelsson, Jon A. Benediktsson, and Johannes R. Sveinsson

CONTENTS
15.1 Introduction ............................................................................ 327
15.2 The Random Forest Classifier ................................................ 328
  15.2.1 Derived Parameters for Random Forests ...................... 329
    15.2.1.1 Out-of-Bag Error ...................................................... 329
    15.2.1.2 Variable Importance ................................................ 329
    15.2.1.3 Proximities ............................................................... 329
15.3 The Building Blocks of Random Forests ............................... 330
  15.3.1 Classification and Regression Tree ................................ 330
  15.3.2 Binary Hierarchy Classifier Trees .................................. 330
15.4 Different Implementations of Random Forests .................... 331
  15.4.1 Random Forest: Classification and Regression Tree .... 331
  15.4.2 Random Forest: Binary Hierarchical Classifier ............ 331
15.5 Experimental Results ............................................................... 331
  15.5.1 Classification of a Multi-Source Data Set ..................... 331
    15.5.1.1 The Anderson River Data Set Examined with a Single CART Tree ...... 335
    15.5.1.2 The Anderson River Data Set Examined with the BHC Approach ....... 337
  15.5.2 Experiments with Hyperspectral Data .......................... 338
15.6 Conclusions ............................................................................... 343
Acknowledgment ............................................................................. 343
References ........................................................................................ 343

15.1 Introduction

Ensemble classification methods train several classifiers and combine their results through a voting process. Many ensemble classifiers [1,2] have been proposed. These classifiers include consensus theoretic classifiers [3] and committee machines [4]. Boosting and bagging are widely used ensemble methods. Bagging (or bootstrap aggregating) [5] is based on training many classifiers on bootstrapped samples from the training set and has been shown to reduce the variance of the classification. In contrast, boosting uses iterative re-training, where the incorrectly classified samples are given more weight in


successive training iterations. This makes the algorithm slow (much slower than bagging) while in most cases it is considerably more accurate than bagging. Boosting generally reduces both the variance and the bias of the classification and has been shown to be a very accurate classification method. However, it has various drawbacks: it is computationally demanding, it can overtrain, and is also sensitive to noise [6]. Therefore, there is much interest in investigating methods such as random forests. In this chapter, random forests are investigated in the classification of hyperspectral and multi-source remote sensing data. A random forest is a collection of classification trees or treelike classifiers. Each tree is trained on a bootstrapped sample of the training data, and at each node in each tree the algorithm only searches across a random subset of the features to determine a split. To classify an input vector in a random forest, the vector is submitted as an input to each of the trees in the forest. Each tree gives a classification, and it is said that the tree votes for that class. In the classification, the forest chooses the class having the most votes (over all the trees in the forest). Random forests have been shown to be comparable to boosting in terms of accuracies, but without the drawbacks of boosting. In addition, the random forests are computationally much less intensive than boosting. Random forests have recently been investigated for classification of remote sensing data. Ham et al. [7] applied them in the classification of hyperspectral remote sensing data. Joelsson et al. [8] used random forests in the classification of hyperspectral data from urban areas and Gislason et al. [9] investigated random forests in the classification of multi-source remote sensing and geographic data. All studies report good accuracies, especially when computational demand is taken into account. The chapter is organized as follows. 
First, random forest classifiers are discussed. Then, two different building blocks for random forests, the classification and regression tree (CART) and the binary hierarchical classifier (BHC), are reviewed. In Section 15.4, random forests with the two different building blocks are discussed. Experimental results for hyperspectral and multi-source data are given in Section 15.5. Finally, conclusions are given in Section 15.6.

15.2 The Random Forest Classifier

A random forest classifier is a classifier comprising a collection of treelike classifiers. Ideally, a random forest classifier is an i.i.d. randomization of weak learners [10]. The classifier uses a large number of individual decision trees, all of which are trained (grown) to tackle the same problem. A sample is assigned to the class that occurs most frequently among the decisions of the individual trees. The individuality of the trees is maintained by three factors:

1. Each tree is trained using a random subset of the training samples.
2. During the growing of a tree, the best split at each node is found by searching through m randomly selected features. For a data set with M features, m is selected by the user and kept much smaller than M.
3. Every tree is grown to its fullest to diversify the trees, so there is no pruning.

As described above, a random forest is an ensemble of treelike classifiers, each trained on a randomly chosen subset of the input data, where the final classification is based on a majority vote by the trees in the forest.
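A deliberately minimal sketch of these three ingredients follows, using depth-1 trees (stumps) in place of fully grown trees to keep the code short; all names are ours, not from the chapter or from Breiman's implementation, and a real forest would grow each tree to its fullest.

```python
import random
from collections import Counter

def train_stump(X, y, feats):
    """Depth-1 tree: best axis-aligned split over a random feature subset."""
    best = None
    for f in feats:
        for t in sorted({x[f] for x in X}):
            left = [y[i] for i, x in enumerate(X) if x[f] <= t]
            right = [y[i] for i, x in enumerate(X) if x[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            score = left.count(l_lab) + right.count(r_lab)
            if best is None or score > best[0]:
                best = (score, f, t, l_lab, r_lab)
    if best is None:                       # degenerate bootstrap: constant tree
        maj = Counter(y).most_common(1)[0][0]
        return lambda x: maj
    _, f, t, l_lab, r_lab = best
    return lambda x: l_lab if x[f] <= t else r_lab

def train_forest(X, y, n_trees=25, m=1, seed=0):
    rng = random.Random(seed)
    n, n_feat = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]     # 1. bootstrap sample
        feats = rng.sample(range(n_feat), m)           # 2. random feature subset
        forest.append(train_stump([X[i] for i in idx],
                                  [y[i] for i in idx], feats))
    return forest

def predict(forest, x):
    """3. Majority vote over all trees in the forest."""
    return Counter(tree(x) for tree in forest).most_common(1)[0][0]
```

Each stump sees a bootstrap replicate of the data and only m of the features, so the trees disagree on individual cases while the majority vote remains stable.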


Each node of a tree in a random forest looks at a random subset of features of fixed size m when deciding a split during training. The trees can thus be viewed as random vectors of integers (the features used to determine a split at each node). There are two points to note about the parameter m:

1. Increasing m increases the correlation between the trees in the forest, which increases the error rate of the forest.
2. Increasing m increases the classification accuracy of every individual tree, which decreases the error rate of the forest.

An optimal interval for m lies between these two somewhat fuzzy extremes. The parameter m is often said to be the only adjustable parameter to which the forest is sensitive, and the ‘‘optimal’’ range for m is usually quite wide [10].

15.2.1 Derived Parameters for Random Forests

There are three parameters derived from a random forest: the out-of-bag (OOB) error, the variable importance, and the proximity analysis.

15.2.1.1 Out-of-Bag Error

To estimate the test-set accuracy, the out-of-bag samples (the training-set samples that are not in the bootstrap sample of a particular tree) can be run down through that tree, in the manner of cross-validation. The OOB error estimate is derived from the classification error for the samples left out of each tree, averaged over the total number of trees. In other words, for all trees for which case n was OOB, run case n down the tree and note whether it is correctly classified. The proportion of times the classification is in error, averaged over all cases, is the OOB error estimate. As an example, suppose each tree is trained on a random two thirds of the sample population (training set), while the remaining one third is used to derive the OOB error rate for that tree. The OOB error rate is then averaged over all OOB cases, yielding the final or total OOB error. This error estimate has been shown to be unbiased in many tests [10,11].
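The OOB bookkeeping can be sketched as follows; `train_tree` is any function returning a trained classifier, and a one-nearest-neighbour rule on 1-D data stands in for a fully grown tree purely to exercise the machinery (the names are ours, not the chapter's).

```python
import random
from collections import Counter

def oob_error(X, y, train_tree, n_trees=30, seed=1):
    """Out-of-bag error estimate: each sample is voted on only by the
    trees whose bootstrap sample did not contain it; the majority OOB
    vote is then compared with the true label."""
    rng = random.Random(seed)
    n = len(X)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]       # bootstrap sample
        tree = train_tree([X[i] for i in idx], [y[i] for i in idx])
        for i in set(range(n)) - set(idx):               # OOB cases for this tree
            votes[i][tree(X[i])] += 1
    wrong = sum(1 for i in range(n)
                if votes[i] and votes[i].most_common(1)[0][0] != y[i])
    voted = sum(1 for v in votes if v)
    return wrong / voted

# A one-nearest-neighbour rule on 1-D samples stands in for a fully
# grown tree here (both are zero-training-error learners).
def train_1nn(Xb, yb):
    return lambda x: min(zip(Xb, yb), key=lambda p: abs(p[0][0] - x[0]))[1]
```

Because each bootstrap sample leaves out roughly one third of the cases, every sample accumulates OOB votes from many trees without any held-out validation set.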

15.2.1.2 Variable Importance

For a single tree, run it on its OOB cases and count the votes for the correct class. Then repeat this after randomly permuting the values of a single variable in the OOB cases. Now subtract the number of correctly cast votes for the permuted data from the number of correctly cast votes for the original OOB data. The average of this value over the whole forest is the raw importance score for the variable [5,6,11]. If the values of this score are independent from tree to tree, then the standard error can be computed by a standard computation [12]. The correlations of these scores between trees have been computed for a number of data sets and proved to be quite low [5,6,11]. Therefore, the standard errors are computed in the classical way: divide the raw score by its standard error to get a z-score, and assign a significance level to the z-score assuming normality [5,6,11].

15.2.1.3 Proximities

After a tree is grown, all the data are passed through it. If cases k and n are in the same terminal node, their proximity is increased by one. The proximity measure can be used (directly or indirectly) to visualize high-dimensional data [5,6,11]. As the proximities indicate the ‘‘distance’’ to other samples, this measure can be used to detect outliers in the sense that an outlier is ‘‘far’’ from all other samples.
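Both derived measures can be sketched directly from these descriptions; here `forest` is a list of per-tree classifiers with matching OOB index sets, and `leaf_ids` records which terminal node each sample reaches in each tree (all names and data structures are ours, chosen for illustration).

```python
import random
from collections import defaultdict

def raw_importance(forest, oob_sets, X, y, feature, seed=2):
    """Correct OOB votes minus correct votes after permuting one feature,
    averaged over the trees of the forest."""
    rng = random.Random(seed)
    diffs = []
    for tree, oob in zip(forest, oob_sets):
        correct = sum(tree(X[i]) == y[i] for i in oob)
        shuffled = [X[i][feature] for i in oob]
        rng.shuffle(shuffled)                     # permute one variable
        permuted = 0
        for i, v in zip(oob, shuffled):
            x = list(X[i])
            x[feature] = v
            permuted += tree(x) == y[i]
        diffs.append(correct - permuted)
    return sum(diffs) / len(diffs)

def proximities(leaf_ids):
    """prox[k][n]: fraction of trees in which samples k and n share a leaf."""
    n = len(leaf_ids[0])
    prox = [[0.0] * n for _ in range(n)]
    for leaves in leaf_ids:
        groups = defaultdict(list)
        for k, leaf in enumerate(leaves):
            groups[leaf].append(k)
        for members in groups.values():           # same terminal node
            for k in members:
                for m in members:
                    prox[k][m] += 1.0 / len(leaf_ids)
    return prox
```

A sample whose row of the proximity matrix is uniformly small is ‘‘far’’ from all other samples, which is exactly the outlier notion used above.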

15.3 The Building Blocks of Random Forests

Random forests are made up of several trees, or building blocks. The building blocks considered here are CART trees, which partition the input data, and BHC trees, which partition the labels (the output).

15.3.1 Classification and Regression Tree

CART is a decision tree in which each node is split on the variable (feature/dimension) that yields the greatest decrease in impurity for the data reaching that node [12]. A tree is grown until the change in impurity stops, falls below some bound, or the number of samples left to split becomes too small according to the user. CART trees are easily overtrained, so a single tree is usually pruned to increase its generality. However, a collection of unpruned trees, where each tree is trained to its fullest on a subset of the training data to diversify individual trees, can be very useful. When collected in a multi-classifier ensemble and trained using the random forest algorithm, these are called RF-CART.
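The split criterion can be made concrete with the Gini impurity, the measure used by the RF-CART implementation discussed later; this is a minimal sketch with our own names, not code from [12].

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum over classes of p_c squared."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """(impurity decrease, feature, threshold) with the largest decrease."""
    parent = gini(y)
    best = (0.0, None, None)
    n = len(y)
    for f in range(len(X[0])):
        # skip the largest value: splitting there leaves the right side empty
        for t in sorted({x[f] for x in X})[:-1]:
            left = [y[i] for i, x in enumerate(X) if x[f] <= t]
            right = [y[i] for i, x in enumerate(X) if x[f] > t]
            drop = (parent
                    - (len(left) / n) * gini(left)
                    - (len(right) / n) * gini(right))
            if drop > best[0]:
                best = (drop, f, t)
    return best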

15.3.2 Binary Hierarchy Classifier Trees

A binary hierarchy of classifiers, in which each node splits on the labels (the output) instead of the input as in the CART case, is naturally organized as a tree, and such trees can be combined, under rules similar to those for CART trees, to form an RF-BHC. In a BHC, the best split at each node is based on (meta-)class separability, starting with a single meta-class, which is split into two meta-classes, and so on; the true classes are realized in the leaves. Simultaneously with the splitting process, the Fisher discriminant and the corresponding projection are computed, and the data are projected along the Fisher direction [12]. In ‘‘Fisher space,’’ the projected data are used to estimate the likelihood of a sample belonging to a meta-class; from there, the probabilities of a true class belonging to a meta-class are estimated and used to update the Fisher projection. The data are then projected using this updated projection, and so forth, until a user-supplied level of separation is acquired. This approach utilizes natural class affinities in the data, that is, the most natural splits occur early in the growth of the tree [13]. A drawback is the possible instability of the split algorithm. The Fisher projection involves an inverse of an estimate of the within-class covariance matrix, which can be unstable at some nodes of the tree, depending on the data being considered; if this matrix estimate is singular (to numerical precision), the algorithm fails. As mentioned above, the BHC trees can be combined into an RF-BHC, where the best splits on classes are performed on a subset of the features in the data, to diversify individual trees and stabilize the aforementioned inverse. Since the number of leaves in a BHC tree equals the number of classes in the data set, the trees themselves can be very informative when compared to CART-like trees.

Random Forest Classification of Remote Sensing Data

15.4 15.4.1

331

Different Implementations of Random Forests Random Forest: Classification and Regression Tree

The RF-CART approach is based on CART-like trees, where trees are grown to minimize an impurity measure. When trees are grown using the minimum Gini impurity criterion [12], the combined impurity of the two descendant nodes in a tree is less than that of the parent node. Adding up the decrease in the Gini value for each variable over all the forest gives a variable importance that is often very consistent with the permutation importance measure.

15.4.2 Random Forest: Binary Hierarchical Classifier

RF-BHC is a random forest based on an ensemble of BHC trees. In the RF-BHC, a split in the tree is based on the best separation between meta-classes. At each node, the best separation is found by examining m features selected at random. The value of m can be selected by trials to yield optimal results. In the case where the number of samples is small enough to induce the ‘‘curse’’ of dimensionality, m is calculated from a user-supplied ratio R between the number of samples and the number of features; then either m is used unchanged as the supplied value or a new value is calculated to preserve the ratio R, whichever is smaller at the node in question [7]. An RF-BHC is uniform in tree size (depth) because the number of nodes is a function of the number of classes in the data set.
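Our reading of this capping rule as code; the exact formula used in [7] may differ, and the names are illustrative only.

```python
def node_split_features(m_supplied, n_samples, n_features, R):
    """Features examined at a node: the supplied m, capped so that the
    samples-to-features ratio R is preserved when samples are scarce.
    Note: this is an interpretation of the rule in [7], not its code."""
    m_ratio = max(1, n_samples // R)          # largest m preserving the ratio
    return min(m_supplied, m_ratio, n_features)
```

For instance, at a node with 30 samples and R = 5, at most 6 features are examined even if m = 20 was supplied, keeping the within-class covariance estimate better conditioned.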

15.5 Experimental Results

Random forests have many important qualities, of which several apply directly to multi- or hyperspectral data. It has been shown that the volume of a hypercube concentrates in its corners and the volume of a hyperellipsoid concentrates in an outer shell, implying that, with limited data points, much of the hyperspectral data space is empty [17]. Making a collection of trees is attractive when each of the trees looks to minimize or maximize some information-content-related criterion given a subset of the features. This means that a random forest can arrive at a good decision boundary without deleting or extracting features explicitly, while making the most of the training set. This ability to handle thousands of input features is especially attractive when dealing with multi- or hyperspectral data, because such data are more often than not composed of tens to hundreds of features and a limited number of samples. The unbiased nature of the OOB error rate can in some cases (if not all) eliminate the need for a validation data set, which is another plus when working with a limited number of samples. In the experiments, the RF-CART approach was tested using a FORTRAN implementation of random forests supplied on a web page maintained by Leo Breiman and Adele Cutler [18].

15.5.1 Classification of a Multi-Source Data Set

In this experiment we use the Anderson River data set, which is a multi-source remote sensing and geographic data set made available by the Canada Centre for Remote Sensing


(CCRS) [16]. This data set is very difficult to classify due to a number of mixed forest-type classes [15]. Classification was performed on a data set consisting of the following six data sources:

1. Airborne multispectral scanner (AMSS) with 11 spectral data channels (ten channels from 380 nm to 1100 nm and one channel from 8 μm to 14 μm)
2. Steep-mode synthetic aperture radar (SAR) with four data channels (X-HH, X-HV, L-HH, and L-HV)
3. Shallow-mode SAR with four data channels (X-HH, X-HV, L-HH, and L-HV)
4. Elevation data (one data channel, where the pixel value is the elevation in meters)
5. Slope data (one data channel, where the pixel value is the slope in degrees)
6. Aspect data (one data channel, where the pixel value is the aspect in degrees)

There are 19 information classes in the ground reference map provided by CCRS. In the experiments, only the six largest ones were used, as listed in Table 15.1. Here, training samples were selected uniformly, giving 10% of the total sample size. All other known samples were then used as test samples [15]. The experimental results for random forest classification are given in Table 15.2 through Table 15.4. Table 15.2 shows, line by line, how the parameters (number of split variables m and number of trees) were selected. First, a forest of 50 trees was grown for various numbers of split variables; then the number yielding the highest training (OOB) accuracy was selected, and more trees were grown until the overall accuracy stopped increasing. The overall accuracy (see Table 15.2) was seen to be insensitive to variable settings on the interval of 10 to 22 split variables. Growing the forest beyond 200 trees improves the overall accuracy insignificantly, so a forest of 200 trees, each of which considers all the input variables at every node, yields the highest accuracy. The OOB accuracy in Table 15.2 seems to support the claim that overfitting is next to impossible using random forests in this manner.
However, the ''best'' results were obtained using 22 variables, so there is no random selection of input variables at each node of any tree here, because all variables are considered at every split. This might suggest that a boosting algorithm using decision trees could yield higher overall accuracies. The highest overall accuracies achieved with the Anderson River data set, known to the authors at the time of this writing, were reached by boosting using J4.8 trees [17]. These accuracies were 100% training accuracy (vs. 77.5% here) and 80.6% accuracy for test data, which are not dramatically higher than the overall accuracies observed here (around 79.0%) with a random forest (about 1.6 percentage points difference). Therefore, even though m is not much less than the total number of variables (in fact equal), the

TABLE 15.1
Anderson River Data: Information Classes and Samples

Class No.  Class Description                          Training Samples  Test Samples
1          Douglas fir (31–40 m)                      971               1250
2          Douglas fir (21–40 m)                      551               817
3          Douglas fir + Other species (31–40 m)      548               701
4          Douglas fir + Lodgepole pine (21–30 m)     542               705
5          Hemlock + Cedar (31–40 m)                  317               405
6          Forest clearings                           1260              1625
           Total                                      4189              5503

Random Forest Classification of Remote Sensing Data


TABLE 15.2
Anderson River Data: Selecting m and the Number of Trees

Trees  Split Variables  Runtime (min:sec)  OOB acc. (%)  Test Set acc. (%)
50     1                00:19              68.42         71.58
50     5                00:20              74.00         75.74
50     10               00:22              75.89         77.63
50     15               00:22              76.30         78.50
50     20               00:24              76.01         78.14
50     22               00:24              76.63         78.10
100    22               00:38              77.18         78.56
200    22               01:06              77.51         79.01
400    22               02:06              77.56         78.81
1000   22               05:09              77.68         78.87
100    10               00:32              76.65         78.39
200    10               00:52              77.04         78.34
400    10               01:41              77.54         78.41
1000   10               04:02              77.66         78.25

Note: 22 split variables were selected as the ''best'' choice.

random forest ensemble performs rather well, especially when running times are taken into consideration. Here, in the random forest, each tree is an expert on a subset of the data, but all the experts consider the same number of variables and do not, in the strictest sense, utilize the strength of random forests. However, the fact remains that the results are among the best for this data set. The training and test accuracies for the individual classes using random forests with 200 trees and 22 variables at each node are given in Table 15.3 and Table 15.4, respectively. From these tables, it can be seen that the random forest yields the highest accuracies for classes 5 and 6 but the lowest for class 2, which is in accordance with the outlier analysis below. A variable importance estimate for the training data can be seen in Figure 15.1, where each data channel is represented by one variable. The first 11 variables are multi-spectral data, followed by four steep-mode SAR data channels, four shallow-mode SAR channels, and then elevation, slope, and aspect measurements, one channel each. It is interesting to note that variable 20 (elevation) is the most important variable, followed by variable 22 (aspect) and spectral channel 6, when looking at the raw importance (Figure 15.1a), but slope when looking at the z-score (Figure 15.1b). The variable importance for each individual class can be seen in Figure 15.2, and some interesting conclusions can be drawn from it. For example, with the exception of class 6, topographic data

TABLE 15.3
Anderson River Data: Confusion Matrix for Training Data in Random Forest Classification (Using 200 Trees and Testing 22 Variables at Each Node)

Class No.     1     2     3     4     5     6      %
1           764    75    32    11     8    81   78.68
2           126   289    62     3     2    69   52.45
3            20    38   430    11     9    40   78.47
4            35     8    21   423    39    16   78.04
5             1     1     0    42   271     2   85.49
6            57    43    51    25    14  1070   84.92

TABLE 15.4
Anderson River Data: Confusion Matrix for Test Data in Random Forest Classification (Using 200 Trees and Testing 22 Variables at Each Node)

Class No.     1     2     3     4     5     6      %
1          1006    87    26    19     7   105   80.48
2           146   439    67    65     6    94   53.73
3            29    40   564    12     7    49   80.46
4            44    23    18   565    45    10   80.14
5             2     0     3    44   351     5   86.67
6            60    55    51    22    14  1423   87.57

(channels 20–22) are of high importance, followed by the spectral channels (channels 1–11). In Figure 15.2, we can see that the SAR channels (channels 12–19) seem to be almost irrelevant for class 5 but play a more important role for the other classes. They always come third after the topographic and multi-spectral variables, with the exception of class 6, the only class for which the topographic variables score lower than an SAR channel (shallow-mode SAR channel 17, or X-HV). These findings can be verified by classifying the data set using only the most important variables and comparing the result to the accuracy when all the variables are


FIGURE 15.1 Anderson River training data: (a) variable importance and (b) z-score on raw importance.



FIGURE 15.2 Anderson River training data: variable importance for each of the six classes.

included. For example, leaving out variable 20 should have less effect on the classification accuracy of class 6 than on that of the other classes. A proximity matrix was computed for the training data to detect outliers. The results of this outlier analysis are shown in Figure 15.3, where it can be seen that the data set is difficult to classify, as there are several outliers. From Figure 15.3, the outliers are spread over all classes, to varying degrees. The classes with the fewest outliers (classes 5 and 6) are indeed those with the highest classification accuracy (Table 15.3 and Table 15.4). On the other hand, class 2 has the lowest accuracy and the highest number of outliers. In the experiments, the random forest classifier proved to be fast. Using an Intel Celeron CPU 2.20-GHz desktop, it took about a minute to read the data set into memory, train, and classify the data set with the settings of 200 trees and 22 split variables, when the FORTRAN code supplied on the random forest web site was used [18]. The running times appear to increase linearly with the number of trees; they are shown along with a least-squares line fit in Figure 15.4.

15.5.1.1 The Anderson River Data Set Examined with a Single CART Tree

All 22 features are considered when deciding a split in the RF-CART approach above, so it is of interest to examine whether the RF-CART performs any better than a single CART tree. Unlike the RF-CART, a single CART is easily overtrained. Here we prune the CART tree to reduce or eliminate any overtraining, and hence use three data sets: a training set, a testing set (used to decide the level of pruning), and a validation set to estimate the performance of the tree as a classifier (Table 15.5 and Table 15.6).
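The three-set procedure above (train, choose a pruning level on the test set, report on the validation set) can be sketched as follows. The chapter does not specify the pruning criterion, so scikit-learn's cost-complexity pruning is used here as an assumed stand-in, with synthetic data in place of the Anderson River set:

```python
# Hedged sketch: grow a full CART, pick the pruning strength on a test set,
# and estimate performance on a held-out validation set.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=900, n_features=22, n_informative=10,
                           random_state=0)
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_te, X_val, y_te, y_val = train_test_split(X_rest, y_rest, test_size=0.5,
                                            random_state=0)

# Candidate pruning levels from the cost-complexity pruning path.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)

best_alpha, best_acc = 0.0, -1.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_tr, y_tr)
    acc = tree.score(X_te, y_te)      # the test set decides the pruning level
    if acc > best_acc:
        best_alpha, best_acc = alpha, acc

final = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(X_tr, y_tr)
val_acc = final.score(X_val, y_val)   # the validation set estimates performance
```

Keeping the validation set out of the pruning decision is what makes `val_acc` an honest estimate, mirroring the table layout in Table 15.5 and Table 15.6.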



FIGURE 15.3 Anderson River training data: outlier analysis for individual classes. In each case, the x-axis (index) gives the number of a training sample and the y-axis the outlier measure.
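The outlier measure plotted in Figure 15.3 can be derived from the forest's proximity matrix. The sketch below follows Breiman's definition as given in the random forest documentation (the unnormalized outlyingness of sample i is n divided by the sum of squared proximities to samples of the same class); the data set and forest settings are illustrative assumptions, not the code used in the chapter:

```python
# Hedged sketch of a proximity-based outlier measure for a random forest.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# prox(i, j): fraction of trees in which samples i and j fall in the same leaf.
leaves = rf.apply(X)                                  # shape (n_samples, n_trees)
prox = np.mean(leaves[:, None, :] == leaves[None, :, :], axis=2)

n = len(y)
outlier = np.empty(n)
for i in range(n):
    same = (y == y[i])                                # within-class proximities only
    outlier[i] = n / np.sum(prox[i, same] ** 2)
```

Samples whose measure is large have low proximity to the rest of their class; in practice the raw values are usually standardized per class (e.g., by the class median) before plotting, as in Figure 15.3.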

FIGURE 15.4 Anderson River data set: random forest running times for 10 and 22 split variables, with least-squares line fits (slope 0.235 sec per tree for 10 variables and 0.302 sec per tree for 22 variables).
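The linearity claim behind Figure 15.4 amounts to fitting a line to (number of trees, running time) pairs. The timing values below are hypothetical, chosen only to be consistent with the reported slope of roughly 0.3 sec per tree for 22 split variables:

```python
# Least-squares line fit to running-time measurements (hypothetical data).
import numpy as np

trees = np.array([50, 100, 200, 400, 1000])
seconds = np.array([16.0, 31.0, 62.0, 121.0, 302.0])   # assumed timings

slope, intercept = np.polyfit(trees, seconds, 1)
print(round(slope, 3))   # per-tree cost in seconds
```

A near-zero intercept and a good fit confirm that total cost scales linearly with forest size, so doubling the forest roughly doubles the runtime.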


TABLE 15.5
Anderson River Data Set: Training, Test, and Validation Sets

Class No.  Class Description                          Training Samples  Test Samples  Validation Samples
1          Douglas fir (31–40 m)                      971               250           1000
2          Douglas fir (21–40 m)                      551               163           654
3          Douglas fir + Other species (31–40 m)      548               140           561
4          Douglas fir + Lodgepole pine (21–30 m)     542               141           564
5          Hemlock + Cedar (31–40 m)                  317               81            324
6          Forest clearings                           1260              325           1300
           Total samples                              4189              1100          4403

As can be seen from the RF-CART results above (Table 15.2) and from Table 15.6, the overall accuracy of the RF-CART (about 78.8%) is about 8 percentage points higher than the overall accuracy for the validation set in Table 15.6, a relative improvement of (78.8/70.81 − 1) × 100 ≈ 11.3%. Therefore, a boosting effect is present even though all the variables are needed to determine a split in every tree of the RF-CART.

15.5.1.2 The Anderson River Data Set Examined with the BHC Approach

The same procedure was used to select the number of split variables m for the RF-BHC as in the RF-CART case. For the RF-BHC, however, the separability of the data set is an issue: when the number of randomly selected features was less than 11, a singular matrix was likely for the Anderson River data set. The best overall performance regarding the realized classification accuracy turned out to occur at the same setting as for the RF-CART approach, m = 22. The R parameter was set to 5, but given the number of samples per (meta-)class in this data set, the parameter is not needed: 22 is always at least 5 times smaller than the number of samples in a (meta-)class during the growing of the trees in the RF-BHC. Since all the trees were trained using all the available features, the trees are more or less the same; the only difference is that they are trained on different subsets of the samples, and thus the RF-BHC gives a very similar result to a single BHC. It can be argued that the RF-BHC is a more general classifier due to the nature of the error or accuracy estimates used during training, but as can be seen in Table 15.7 and Table 15.8, the differences are small, at least for this data set, and no boosting effect seems to be present with the RF-BHC approach compared to a single BHC.

TABLE 15.6
Anderson River Data Set: Classification Accuracy (%) for Training, Test, and Validation Sets

Class Description                          Training  Test   Validation
Douglas fir (31–40 m)                      87.54     73.20  71.30
Douglas fir (21–40 m)                      77.50     47.24  46.79
Douglas fir + Other species (31–40 m)      87.96     70.00  72.01
Douglas fir + Lodgepole pine (21–30 m)     84.69     68.79  69.15
Hemlock + Cedar (31–40 m)                  90.54     79.01  77.78
Forest clearings                           90.08     81.23  81.00
Overall accuracy                           86.89     71.18  70.82

TABLE 15.7
Anderson River Data Set: Classification Accuracies in Percentage for a Single BHC Tree Classifier

Class Description                          Training  Test
Douglas fir (31–40 m)                      50.57     50.40
Douglas fir (21–40 m)                      47.91     43.57
Douglas fir + Other species (31–40 m)      58.94     59.49
Douglas fir + Lodgepole pine (21–30 m)     72.32     67.23
Hemlock + Cedar (31–40 m)                  77.60     73.58
Forest clearings                           71.75     72.80
Overall accuracy                           62.54     61.02

15.5.2 Experiments with Hyperspectral Data

The data used in this experiment were collected in the framework of the HySens project, managed by the Deutsches Zentrum für Luft- und Raumfahrt (DLR, the German Aerospace Center) and sponsored by the European Union. The optical sensor reflective optics system imaging spectrometer (ROSIS 03) was used to record four flight lines over the urban area of Pavia, northern Italy. The number of bands of the ROSIS 03 sensor used in the experiments is 103, with spectral coverage from 0.43 μm through 0.86 μm. The flight altitude was chosen as the lowest available for the airplane, which resulted in a spatial resolution of 1.3 m per pixel. The ROSIS data consist of nine classes (Table 15.9). The data were composed of 43,923 samples, split into 3,921 training samples and 40,002 for testing. A pseudo color image of the area along with the ground truth mask (training and testing samples) is shown in Figure 15.5. This data set was classified using a BHC tree, an RF-BHC, a single CART, and an RF-CART. The forest parameters, m and R (for the RF-BHC), were chosen by trials to maximize accuracies, and the growing of trees was stopped when the overall accuracy did not improve with additional trees, following the same procedure as for the Anderson River data set (see Table 15.2). For the RF-BHC, R was chosen to be 5, m was chosen to be 25, and the forest was grown to only 10 trees. For the RF-CART, m was set to 25 and the forest was grown to 200 trees. No feature extraction was done at individual nodes in the tree when using the single BHC approach.

TABLE 15.8
Anderson River Data Set: Classification Accuracies in Percentage for an RF-BHC, R = 5, m = 22, and 10 Trees

Class Description                          Training  Test
Douglas fir (31–40 m)                      51.29     51.12
Douglas fir (21–40 m)                      45.37     41.13
Douglas fir + Other species (31–40 m)      59.31     57.20
Douglas fir + Lodgepole pine (21–30 m)     72.14     67.80
Hemlock + Cedar (31–40 m)                  77.92     71.85
Forest clearings                           71.75     72.43
Overall accuracy                           62.43     60.37


TABLE 15.9
ROSIS University Data Set: Classes and Number of Samples

Class No.  Class Description        Training Samples  Test Samples
1          Asphalt                  548               6,304
2          Meadows                  540               18,146
3          Gravel                   392               1,815
4          Trees                    524               2,912
5          (Painted) metal sheets   265               1,113
6          Bare soil                532               4,572
7          Bitumen                  375               981
8          Self-blocking bricks     514               3,364
9          Shadow                   231               795
           Total samples            3,921             40,002

Classification accuracies are presented in Table 15.11 through Table 15.14. As in the single CART case for the Anderson River data set, approximately 20% of the samples in the original test set were randomly sampled into a new test set to select a pruning level for the tree, leaving 80% of the original test samples for validation, as seen in Table 15.10. All the other classification methods used the training and test sets as described in Table 15.9. From Table 15.11 through Table 15.14 we can see that the RF-BHC gives the highest overall accuracies of the tree methods, while the single BHC, single CART, and RF-CART methods yielded lower and comparable overall accuracies. These results show that using many weak learners as opposed to a few stronger ones is not always the best choice in classification; the outcome depends on the data set. In our experience, the RF-BHC approach is as accurate as or more accurate than the RF-CART when the data set consists of moderately to highly separable (meta-)classes, but for difficult data sets the partitioning algorithm used for the BHC trees can fail to converge (the inverse of the within-class covariance matrix becomes singular to numerical precision), and thus no BHC classifier can be realized. This is not a problem when using CART trees as the building blocks, since they partition the input by simply minimizing an impurity measure for each split on a node.

FIGURE 15.5 ROSIS University: (a) reference data and (b) gray scale image.

TABLE 15.10
ROSIS University Data Set: Train, Test, and Validation Sets

Class No.  Class Description        Training Samples  Test Samples  Validation Samples
1          Asphalt                  548               1,261         5,043
2          Meadows                  540               3,629         14,517
3          Gravel                   392               363           1,452
4          Trees                    524               582           2,330
5          (Painted) metal sheets   265               223           890
6          Bare soil                532               914           3,658
7          Bitumen                  375               196           785
8          Self-blocking bricks     514               673           2,691
9          Shadow                   231               159           636
           Total samples            3,921             8,000         32,002

The classification results for the single CART tree (Table 15.11), especially for the two classes gravel and bare soil, may be considered unacceptable when compared to the other methods, which seem to yield more balanced accuracies for all classes. The classified images for the results given in Table 15.12 through Table 15.14 are shown in Figure 15.6a through Figure 15.6d. Since BHC trees are of fixed size regarding the number of leaves, it is worth examining the tree in the single case (Figure 15.7). Notice the siblings in the tree (nodes sharing a parent): gravel (3)/shadow (9), asphalt (1)/bitumen (7), and finally meadows (2)/bare soil (6). Without too much stretch of the imagination, one can intuitively decide that these classes are related, at least asphalt/bitumen and meadows/bare soil. When comparing the gravel area in the ground truth image (Figure 15.5a) with the same area in the gray scale image (Figure 15.5b), one can see that it has gray levels ranging from bright to relatively dark, which might be interpreted as an intuitive relation or overlap between the gravel (3) and shadow (9) classes. The self-blocking bricks (8) are the class closest to the asphalt–bitumen meta-class, which again looks very similar in the pseudo color image. So the tree more or less seems to place ''naturally''

TABLE 15.11
Single CART: Training, Test, and Validation Accuracies in Percentage for ROSIS University Data Set

Class Description        Training  Test    Validation
Asphalt                  80.11     70.74   72.24
Meadows                  83.52     75.48   75.80
Gravel                   0.00      0.00    0.00
Trees                    88.36     97.08   97.00
(Painted) metal sheets   97.36     91.03   86.07
Bare soil                46.99     24.73   26.60
Bitumen                  85.07     82.14   80.38
Self-blocking bricks     84.63     92.42   92.27
Shadow                   96.10     100.00  99.84
Overall accuracy         72.35     69.59   69.98

TABLE 15.12
BHC: Training and Test Accuracies in Percentage for ROSIS University Data Set

Class                    Training  Test
Asphalt                  78.83     69.86
Meadows                  93.33     55.11
Gravel                   72.45     62.92
Trees                    91.60     92.20
(Painted) metal sheets   97.74     94.79
Bare soil                94.92     89.63
Bitumen                  93.07     81.55
Self-blocking bricks     85.60     88.64
Shadow                   94.37     96.35
Overall accuracy         88.52     69.83

TABLE 15.13
RF-BHC: Training and Test Accuracies in Percentage for ROSIS University Data Set

Class                    Training  Test
Asphalt                  76.82     71.41
Meadows                  84.26     68.17
Gravel                   59.95     51.35
Trees                    88.36     95.91
(Painted) metal sheets   100.00    99.28
Bare soil                75.38     78.85
Bitumen                  92.53     87.36
Self-blocking bricks     83.07     92.45
Shadow                   96.10     99.50
Overall accuracy         82.53     75.16

TABLE 15.14
RF-CART: Training and Test Accuracies in Percentage for ROSIS University Data Set

Class                    Training  Test
Asphalt                  86.86     80.36
Meadows                  90.93     54.32
Gravel                   76.79     46.61
Trees                    92.37     98.73
(Painted) metal sheets   99.25     99.01
Bare soil                91.17     77.60
Bitumen                  88.80     78.29
Self-blocking bricks     83.46     90.64
Shadow                   94.37     97.23
Overall accuracy         88.75     69.70


FIGURE 15.6 ROSIS University: image classified by (a) single BHC, (b) RF-BHC, (c) single CART, and (d) RF-CART.

related classes close to one another in the tree. That would mean that classes 2, 6, 4, and 5 are more related to each other than to classes 3, 9, 1, 7, or 8. On the other hand, it is not clear whether (painted) metal sheets (5) are ''naturally'' more related to trees (4) than to bare soil (6) or asphalt (1). However, the point is that the partitioning algorithm finds the ''clearest'' separation between meta-classes. Therefore, it may be better to view the tree as a separation hierarchy rather than a relation hierarchy. The single BHC classifier finds that class 5 is the most separable class within the first right meta-class of the tree, so it might not be related to the meta-class 2–6–4 in any ''natural'' way, but it is more separable along with these classes when the whole data set is split into two meta-classes.

FIGURE 15.7 The BHC tree used for classification of Figure 15.5 with left/right probabilities (%). The leaves, from left to right, are classes 3, 9, 1, 7, 8, 2, 6, 4, and 5; the internal-node left/right probabilities are 52/48, 30/70, 86/14, 67/33, 64/36, 63/37, 51/49, and 60/40.

15.6 Conclusions

The use of random forests for classification of multi-source remote sensing data and hyperspectral remote sensing data has been discussed. Random forests should be considered attractive for classification of both data types. They are fast in both training and classification, and they are distribution-free classifiers. Furthermore, the problem of the curse of dimensionality is naturally addressed by the selection of a low m, without having to discard variables and dimensions completely. The only parameter random forests are truly sensitive to is the number of variables m that each node in every tree draws at random during training. This parameter should generally be much smaller than the total number of available variables, although selecting a high m can also yield good classification accuracies, as seen above for the Anderson River data (Table 15.2). In the experiments, two types of random forests were used: random forests based on the CART approach and random forests that use BHC trees. Both approaches performed well, giving excellent accuracies for both data types, and were shown to be very fast.

Acknowledgment This research was supported in part by the Research Fund of the University of Iceland and the Assistantship Fund of the University of Iceland. The Anderson River SAR/MSS data set was acquired, preprocessed, and loaned by the Canada Centre for Remote Sensing, Department of Energy Mines and Resources, Government of Canada.

References

1. L.K. Hansen and P. Salamon, Neural network ensembles, IEEE Transactions on Pattern Analysis and Machine Intelligence, 12, 993–1001, 1990.
2. L.I. Kuncheva, Fuzzy versus nonfuzzy in combining classifiers designed by boosting, IEEE Transactions on Fuzzy Systems, 11, 1214–1219, 2003.
3. J.A. Benediktsson and P.H. Swain, Consensus theoretic classification methods, IEEE Transactions on Systems, Man and Cybernetics, 22(4), 688–704, 1992.
4. S. Haykin, Neural Networks, A Comprehensive Foundation, 2nd ed., Prentice-Hall, Upper Saddle River, NJ, 1999.
5. L. Breiman, Bagging predictors, Machine Learning, 24(2), 123–140, 1996.
6. Y. Freund and R.E. Schapire, Experiments with a new boosting algorithm, Machine Learning: Proceedings of the Thirteenth International Conference, 148–156, 1996.
7. J. Ham, Y. Chen, M.M. Crawford, and J. Ghosh, Investigation of the random forest framework for classification of hyperspectral data, IEEE Transactions on Geoscience and Remote Sensing, 43(3), 492–501, 2005.
8. S.R. Joelsson, J.A. Benediktsson, and J.R. Sveinsson, Random forest classifiers for hyperspectral data, IEEE International Geoscience and Remote Sensing Symposium (IGARSS '05), Seoul, Korea, 25–29 July 2005, pp. 160–163.
9. P.O. Gislason, J.A. Benediktsson, and J.R. Sveinsson, Random forests for land cover classification, Pattern Recognition Letters, 294–300, 2006.

10. L. Breiman, Random forests, Machine Learning, 45(1), 5–32, 2001.
11. L. Breiman, Random forest, Readme file. Available at: http://www.stat.berkeley.edu/~breiman/RandomForests/cc.home.htm (Last accessed 29 May 2006.)
12. R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification, 2nd ed., John Wiley & Sons, New York, 2001.
13. S. Kumar, J. Ghosh, and M.M. Crawford, Hierarchical fusion of multiple classifiers for hyperspectral data analysis, Pattern Analysis & Applications, 5, 210–220, 2002.
14. http://oz.berkeley.edu/users/breiman/RandomForests/cc_home.htm (Last accessed 29 May 2006.)
15. G.J. Briem, J.A. Benediktsson, and J.R. Sveinsson, Multiple classifiers applied to multisource remote sensing data, IEEE Transactions on Geoscience and Remote Sensing, 40(10), 2291–2299, 2002.
16. D.G. Goodenough, M. Goldberg, G. Plunkett, and J. Zelek, The CCRS SAR/MSS Anderson River data set, IEEE Transactions on Geoscience and Remote Sensing, GE-25(3), 360–367, 1987.
17. L. Jimenez and D. Landgrebe, Supervised classification in high-dimensional space: Geometrical, statistical, and asymptotical properties of multivariate data, IEEE Transactions on Systems, Man, and Cybernetics, Part C, 28, 39–54, 1998.
18. http://www.stat.berkeley.edu/users/breiman/RandomForests/cc_software.htm (Last accessed 29 May 2006.)

16 Supervised Image Classification of Multi-Spectral Images Based on Statistical Machine Learning

Ryuei Nishii and Shinto Eguchi

CONTENTS
16.1 Introduction .......... 346
16.2 AdaBoost .......... 346
  16.2.1 Toy Example in Binary Classification .......... 347
  16.2.2 AdaBoost for Multi-Class Problems .......... 348
  16.2.3 Sequential Minimization of Exponential Risk with Multi-Class .......... 348
    16.2.3.1 Case 1 .......... 349
    16.2.3.2 Case 2 .......... 349
  16.2.4 AdaBoost Algorithm .......... 350
16.3 LogitBoost and EtaBoost .......... 350
  16.3.1 Binary Class Case .......... 350
  16.3.2 Multi-Class Case .......... 351
16.4 Contextual Image Classification .......... 352
  16.4.1 Neighborhoods of Pixels .......... 352
  16.4.2 MRFs Based on Divergence .......... 353
  16.4.3 Assumptions .......... 353
    16.4.3.1 Assumption 1 (Local Continuity of the Classes) .......... 353
    16.4.3.2 Assumption 2 (Class-Specific Distribution) .......... 353
    16.4.3.3 Assumption 3 (Conditional Independence) .......... 354
    16.4.3.4 Assumption 4 (MRFs) .......... 354
  16.4.4 Switzer's Smoothing Method .......... 354
  16.4.5 ICM Method .......... 354
  16.4.6 Spatial Boosting .......... 355
16.5 Relationships between Contextual Classification Methods .......... 356
  16.5.1 Divergence Model and Switzer's Model .......... 356
  16.5.2 Error Rates .......... 357
  16.5.3 Spatial Boosting and the Smoothing Method .......... 358
  16.5.4 Spatial Boosting and MRF-Based Methods .......... 359
16.6 Spatial Parallel Boost by Meta-Learning .......... 359
16.7 Numerical Experiments .......... 360
  16.7.1 Legends of Three Data Sets .......... 361
    16.7.1.1 Data Set 1: Synthetic Data Set .......... 361
    16.7.1.2 Data Set 2: Benchmark Data Set grss_dfc_0006 .......... 361
    16.7.1.3 Data Set 3: Benchmark Data Set grss_dfc_0009 .......... 361
  16.7.2 Potts Models and the Divergence Models .......... 361
  16.7.3 Spatial AdaBoost and Its Robustness .......... 363
  16.7.4 Spatial AdaBoost and Spatial LogitBoost .......... 365
  16.7.5 Spatial Parallel Boost .......... 367
16.8 Conclusion .......... 368
Acknowledgment .......... 370
References .......... 370

16.1 Introduction

Image classification for geostatistical data is one of the most important issues in the remote-sensing community. Statistical approaches have been discussed extensively in the literature. In particular, Markov random fields (MRFs) are used for modeling distributions of land-cover classes, and contextual classifiers based on MRFs exhibit efficient performance. In addition, various classification methods have been proposed. See Ref. [3] for an excellent review paper on classification, Refs. [1,4–7] for a general discussion on classification methods, and Refs. [8,9] for background on spatial statistics. In the paradigm of supervised learning, AdaBoost was proposed as a machine learning technique in Ref. [10] and has been widely and rapidly improved for use in pattern recognition. AdaBoost linearly combines several weak classifiers into a strong classifier; the coefficients of the classifiers are tuned by minimizing an empirical exponential risk. The method exhibits high performance in various fields [11,12]. In addition, fusion techniques have been discussed [13–15]. In the present chapter, we consider contextual classification methods based on statistics and machine learning. We review AdaBoost with binary class labels as well as multi-class labels. The procedures for deriving coefficients for classifiers are discussed, and the robustness of loss functions is emphasized. Next, contextual image classification methods, including Switzer's smoothing method [1], MRF-based methods [16], and spatial boosting [2,17], are introduced, and relationships among them are pointed out. Spatial parallel boost by meta-learning for multi-source and multi-temporal data classification is proposed.
The remainder of the chapter is organized as follows. In Section 16.2, AdaBoost is briefly reviewed: a simple example with binary class labels is provided to illustrate AdaBoost, and we then proceed to the case with multi-class labels. Section 16.3 gives general boosting methods that attain the robustness property of the classifier. Then, contextual classifiers including Switzer's method, an MRF-based method, and spatial boosting are discussed; relationships among them are shown in Section 16.5, where the exact error rate and the properties of the MRF-based classifier are also given. Section 16.6 proposes spatial parallel boost, applicable to classification of multi-source and multi-temporal data sets. The methods treated here are applied to a synthetic data set and two benchmark data sets, and their performances are examined in Section 16.7. Section 16.8 concludes the chapter and mentions future problems.

16.2 AdaBoost

We begin this section with a simple example illustrating AdaBoost [10]; later, AdaBoost with multi-class labels is described.

16.2.1 Toy Example in Binary Classification

Suppose that a q-dimensional feature vector x ∈ R^q, observed for a supervised example labeled +1 or −1, is available. Furthermore, let f_k(x) be functions (classifiers) mapping the feature vector x into the label set {+1, −1} for k = 1, 2, 3. If these three classifiers are equally efficient, a new function, sign(f_1(x) + f_2(x) + f_3(x)), is a combined classifier based on a majority vote, where sign(z) is the sign of the argument z. Suppose instead that classifier f_1 is the most reliable, f_2 has the next greatest reliability, and f_3 is the least reliable. Then a new function, sign(b_1 f_1(x) + b_2 f_2(x) + b_3 f_3(x)), is a boosted classifier based on a weighted vote, where b_1 > b_2 > b_3 are positive constants to be determined according to the efficiencies of the classifiers. The constants b_k are tuned by minimizing the empirical risk, which will be defined shortly.

In general, let y be the true label of feature vector x. Then label y is estimated by the sign, sign(F(x)), of a classification function F(x). If F(x) > 0, then x is classified into the class with label +1, otherwise into −1. Hence, if yF(x) < 0, vector x is misclassified. For evaluating classifier F, AdaBoost [10] takes the exponential loss function defined by

    L_exp(F | x, y) = exp{−yF(x)}    (16.1)

The loss function L_exp(t) = exp(−t) vs. t = yF(x) is shown in Figure 16.1. Note that the exponential function assigns a heavy loss to an outlying example that is misclassified; AdaBoost is thus apt to overlearn misclassified examples. Let {(x_i, y_i) ∈ R^q × {+1, −1} | i = 1, 2, ..., n} be a set of training data. The classification function F is determined to minimize the empirical risk:

    R_exp(F) = (1/n) Σ_{i=1}^{n} L_exp(F | x_i, y_i) = (1/n) Σ_{i=1}^{n} exp{−y_i F(x_i)}    (16.2)

In the toy example above, F(x) is b_1 f_1(x) + b_2 f_2(x) + b_3 f_3(x), and the coefficients b_1, b_2, b_3 are tuned by minimizing the empirical risk in Equation 16.2. A fast sequential procedure for minimizing the empirical risk is well known [11]. We will provide a new understanding of the procedure in the binary class case as well as in the multi-class case in Section 16.2.3. A typical classifier is a decision stump defined by a function δ·sign(x_j − t), where δ = ±1, t ∈ R, and x_j denotes the j-th coordinate of the feature vector x. Each decision stump alone is a poor classifier; a linearly combined function of many stumps, however, is expected to be a strong classification function.
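To make the toy example concrete, the following sketch (illustrative only; the function names and the NumPy representation are assumptions, not the chapter's code) implements a decision stump δ·sign(x_j − t) and the empirical exponential risk of Equation 16.2:

```python
import numpy as np

def stump(X, j, t, d):
    """Decision stump d * sign(x_j - t) with d = +/-1 on the j-th feature."""
    return d * np.sign(X[:, j] - t)

def empirical_exp_risk(F, y):
    """Empirical exponential risk (1/n) sum_i exp(-y_i F(x_i)) of Equation 16.2."""
    return float(np.mean(np.exp(-y * F)))

# Tiny illustration: two stumps combined with weights b1 > b2.
X = np.array([[0.2], [0.8], [1.5], [2.5]])
y = np.array([-1, -1, 1, 1])
f1 = stump(X, j=0, t=1.0, d=1)   # classifies all four examples correctly
f2 = stump(X, j=0, t=2.0, d=1)   # misclassifies the example at x = 1.5
F = 1.0 * f1 + 0.3 * f2          # weighted vote with b1 = 1.0 > b2 = 0.3
print(empirical_exp_risk(F, y))
```

Because the more reliable stump carries the larger weight, the combined score still classifies every example correctly, and the risk is below the value 1 attained by F ≡ 0.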

FIGURE 16.1 Loss functions (loss vs. yF(x)): the exponential, logit, eta, and 0–1 losses. [Plot omitted.]

Signal and Image Processing for Remote Sensing

16.2.2 AdaBoost for Multi-Class Problems

We will give an extension of the loss and risk functions to cases with multi-class labels. Suppose that there are g possible land-cover classes C_1, ..., C_g, for example, coniferous forest, broad-leaf forest, and water area. Let D = {1, ..., n} be a training region with n pixels over some scene. Each pixel i in region D is supposed to belong to one of the g classes. We denote the set of all class labels by G = {1, ..., g}. Let x_i ∈ R^q be a q-dimensional feature vector observed at pixel i, and y_i its true label in label set G. Note that pixel i in region D is a numbered small area corresponding to the observed unit on the earth. Let F(x, k) be a classification function of feature vector x ∈ R^q and label k in set G. We allocate vector x into the class with label ŷ_F ∈ G given by the maximizer

    ŷ_F = arg max_{k∈G} F(x, k)    (16.3)

Typical examples of a strong classification function are given by posterior probability functions. Let p(x | k) be the class-specific probability density function of the k-th class, C_k. Then the posterior probability of the label Y = k, given feature vector x, is defined by

    p(k | x) = p(x | k) / Σ_{ℓ∈G} p(x | ℓ)  with k ∈ G    (16.4)

which gives a strong classification function, where the prior distribution of class C_k is assumed to be uniform. Note that the label estimated by the posteriors p(k | x), or equivalently by the log posteriors log p(k | x), is just the Bayes rule of classification. Note also that p(k | x) is a measure of the confidence of the current classification and is closely related to logistic discriminant functions [18].

Let y ∈ G be the true label of feature vector x and F(x, k) a classification function. Then the loss incurred by misclassifying x into class label k is assessed by the following exponential loss function:

    L_exp(F, k | x, y) = exp{F(x, k) − F(x, y)}  for k ≠ y with k ∈ G    (16.5)

This is an extension of the exponential loss (Equation 16.1) for binary classification. The empirical risk is defined by averaging the loss functions over the training data set {(x_i, y_i) ∈ R^q × G | i ∈ D}:

    R_exp(F) = (1/n) Σ_{i∈D} Σ_{k≠y_i} L_exp(F, k | x_i, y_i) = (1/n) Σ_{i∈D} Σ_{k≠y_i} exp{F(x_i, k) − F(x_i, y_i)}    (16.6)
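As an illustration of Equation 16.6, here is a minimal sketch (hypothetical names; the classification function is assumed to be supplied as an n × g array of scores with 0-based labels):

```python
import numpy as np

def multiclass_exp_risk(F, y):
    """Empirical exponential risk of Equation 16.6.

    F : (n, g) array, F[i, k] = classification function value F(x_i, k)
    y : (n,) int array of true labels in {0, ..., g-1}
    """
    n, g = F.shape
    Fy = F[np.arange(n), y][:, None]   # F(x_i, y_i)
    loss = np.exp(F - Fy)              # exp{F(x_i, k) - F(x_i, y_i)}
    loss[np.arange(n), y] = 0.0        # the sum runs over k != y_i only
    return loss.sum() / n

# If F is uniformly zero, each of the g - 1 wrong labels contributes exp(0) = 1,
# so the risk equals g - 1.
F0 = np.zeros((5, 3))
y = np.array([0, 1, 2, 0, 1])
print(multiclass_exp_risk(F0, y))   # -> 2.0
```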

AdaBoost determines the classification function F that minimizes the exponential risk R_exp(F), where F is a linear combination of base functions.

16.2.3 Sequential Minimization of Exponential Risk with Multi-Class Labels

Let f and F be fixed classification functions. Then we obtain the optimal coefficient b*, which gives the minimum value of the empirical risk R_exp(F + βf):

    b* = arg min_{β∈R} R_exp(F + βf)    (16.7)


Applying the procedure in Equation 16.7 sequentially, we combine classifiers f_1, f_2, ..., f_T as

    F^(0) ≡ 0, F^(1) = b_1 f_1, F^(2) = b_1 f_1 + b_2 f_2, ..., F^(T) = b_1 f_1 + b_2 f_2 + ... + b_T f_T

where b_t is defined by the formula given in Equation 16.7 with F = F^(t−1) and f = f_t.

16.2.3.1 Case 1

Suppose that function f(·, k) takes values 0 or 1, and it takes the value 1 only once. In this case, the coefficient b* in Equation 16.7 is given in closed form as follows. Let ŷ_{i,f} be the label of pixel i selected by classifier f, and D_f the subset of D classified correctly by f. Define

    V_i(k) = F(x_i, k) − F(x_i, y_i)  and  v_i(k) = f(x_i, k) − f(x_i, y_i)    (16.8)

Then we obtain

    R_exp(F + βf) = Σ_{i=1}^{n} Σ_{k≠y_i} exp[V_i(k) + βv_i(k)]
                  = e^{−β} Σ_{i∈D_f} Σ_{k≠y_i} exp{V_i(k)} + e^{β} Σ_{j∉D_f} exp{V_j(ŷ_{jf})} + Σ_{j∉D_f} Σ_{k≠y_j, ŷ_{jf}} exp{V_j(k)}
                  ≥ 2 √[ Σ_{i∈D_f} Σ_{k≠y_i} exp{V_i(k)} · Σ_{j∉D_f} exp{V_j(ŷ_{jf})} ] + Σ_{j∉D_f} Σ_{k≠y_j, ŷ_{jf}} exp{V_j(k)}    (16.9)

The last inequality is due to the relationship between the arithmetic and geometric means. The equality holds if and only if β = b*, where

    b* = (1/2) log[ Σ_{i∈D_f} Σ_{k≠y_i} exp{V_i(k)} / Σ_{j∉D_f} exp{V_j(ŷ_{jf})} ]    (16.10)

The optimal coefficient b* can be expressed as

    b* = (1/2) log[(1 − ε_F(f)) / ε_F(f)]  with  ε_F(f) = Σ_{j∉D_f} exp{V_j(ŷ_{jf})} / [ Σ_{j∉D_f} exp{V_j(ŷ_{jf})} + Σ_{i∈D_f} Σ_{k≠y_i} exp{V_i(k)} ]    (16.11)

In the binary class case, ε_F(f) coincides with the error rate of classifier f.

16.2.3.2 Case 2

If f(·, k) takes real values, there is no closed form for the coefficient b*. We must perform an iterative procedure for the minimization of the risk. Using a Newton-like method, we update the estimate β^(t) at the t-th step as follows:

    β^(t+1) = β^(t) − Σ_{i=1}^{n} Σ_{k≠y_i} v_i(k) exp[V_i(k) + β^(t) v_i(k)] / Σ_{i=1}^{n} Σ_{k≠y_i} v_i²(k) exp[V_i(k) + β^(t) v_i(k)]    (16.12)


where v_i(k) and V_i(k) are defined by the formulas in Equation 16.8. We observe that the convergence of the iterative procedure starting from β^(0) = 0 is very fast. In the numerical examples in Section 16.7, the procedure converges within five steps in most cases.

16.2.4 AdaBoost Algorithm

Now we summarize the iterative procedure of AdaBoost for minimizing the empirical exponential risk. Let F = {f : R^q → G} be a set of classification functions, where G = {1, ..., g} is the label set. AdaBoost combines classification functions as follows:

• Find the classification function f in F and coefficient β that jointly minimize the empirical risk R_exp(βf) defined in Equation 16.6; say, f_1 and b_1.
• Consider the empirical risk R_exp(b_1 f_1 + βf) with b_1 f_1 given from the previous step. Then find the classification function f ∈ F and coefficient β that minimize the empirical risk; say, f_2 and b_2.
• This procedure is repeated T times, and the final classification function F_T = b_1 f_1 + ... + b_T f_T is obtained.
• A test vector x ∈ R^q is classified into the label maximizing the final function F_T(x, k) with respect to k ∈ G.

Substituting other risk functions for the exponential risk R_exp yields different classification methods. The risk functions R_logit and R_eta will be defined in the next section.
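The steps above can be sketched as follows. This is an illustrative implementation under assumptions not stated in the chapter: candidate classifiers are represented as precomputed (n, g) score arrays, and each coefficient is tuned with the Newton iteration of Equation 16.12.

```python
import numpy as np

def exp_risk(F, y):
    """Multi-class empirical exponential risk (Equation 16.6); F is (n, g)."""
    n = len(y)
    Fy = F[np.arange(n), y][:, None]
    L = np.exp(F - Fy)
    L[np.arange(n), y] = 0.0
    return L.sum() / n

def best_beta(F, f, y, steps=20):
    """Newton iteration of Equation 16.12 for the coefficient of f given current F.
    A classifier that is never wrong drives the coefficient to infinity, so the
    number of steps is capped."""
    n = len(y)
    V = F - F[np.arange(n), y][:, None]
    v = f - f[np.arange(n), y][:, None]
    mask = np.ones_like(V)
    mask[np.arange(n), y] = 0.0
    b = 0.0
    for _ in range(steps):
        w = mask * np.exp(V + b * v)
        grad, hess = (v * w).sum(), (v * v * w).sum()
        if hess == 0.0:
            break
        b -= grad / hess
    return b

def adaboost(fs, y, T):
    """Greedily combine candidate functions fs (each an (n, g) array) for T rounds."""
    F = np.zeros_like(fs[0])
    for _ in range(T):
        cands = [(best_beta(F, f, y), f) for f in fs]
        b, f = min(cands, key=lambda c: exp_risk(F + c[0] * c[1], y))
        if b <= 0:
            break
        F = F + b * f
    return F
```

Each round performs one step of the sequential minimization: the candidate and its coefficient that jointly give the smallest risk are added to the combination.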

16.3 LogitBoost and EtaBoost

AdaBoost was originally designed to combine weak classifiers into a strong classifier. However, if we combine strong classifiers with AdaBoost, the exponential loss assigns an extreme penalty to misclassified data. It is well known that AdaBoost is not robust, and in the multi-class case this seems more serious than in the binary class case. This is confirmed by our numerical example in Section 16.7.3. In this section, we consider robust classifiers derived from loss functions that are more robust than the exponential loss function.

16.3.1 Binary Class Case

Consider binary class problems such that feature vector x with true label y ∈ {+1, −1} is classified into class label sign(F(x)). Then we take the logit and the eta loss functions defined by

    L_logit(F | x, y) = log[1 + exp{−yF(x)}]    (16.13)

    L_eta(F | x, y) = (1 − η) log[1 + exp{−yF(x)}] + η{−yF(x)}  for 0 < η < 1    (16.14)

The logit loss function is derived from the log posterior probability of a binomial distribution. The eta loss function, an extension of the logit loss, was proposed by Takenouchi and Eguchi [19].
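For reference, the three loss curves of Figure 16.1 can be coded directly. This is a sketch with hypothetical names; `loss_logit` uses the normalized form of Equation 16.15 (so that the loss at t = 0 equals 1), and `loss_eta` reproduces Equation 16.15 when eta = 1/2.

```python
import numpy as np

def loss_exp(t):
    """Exponential loss exp(-t) with t = yF(x)."""
    return np.exp(-t)

def loss_logit(t):
    """Normalized logit loss log[1 + exp(-2t)] - log 2 + 1 (Equation 16.15)."""
    return np.log1p(np.exp(-2.0 * t)) - np.log(2.0) + 1.0

def loss_eta(t, eta=0.5):
    """Eta loss: a (1 - eta, eta) mixture of the logit loss and the linear term -t."""
    return (1.0 - eta) * loss_logit(t) + eta * (-t)
```

For a badly misclassified example (large negative t), the exponential loss blows up exponentially, while the logit and eta losses grow only linearly, which is the source of their robustness.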


The three loss functions shown in Figure 16.1 are defined as follows:

    L_exp(t) = exp(−t),  L_logit(t) = log{1 + exp(−2t)} − log 2 + 1,  and  L_eta(t) = (1/2)L_logit(t) + (1/2)(−t)    (16.15)

We see that the logit and the eta loss functions assign less penalty to misclassified data than the exponential loss function does. In addition, the three loss functions are convex and differentiable with respect to t. The convexity assures the uniqueness of the coefficient minimizing R_emp(F + βf) with respect to β, where R_emp denotes the empirical risk function under consideration, and makes the sequential minimization of the empirical risk feasible. The corresponding empirical risks are defined as follows:

    R_logit(F) = (1/n) Σ_{i=1}^{n} log[1 + exp{−y_i F(x_i)}]    (16.16)

    R_eta(F) = ((1 − η)/n) Σ_{i=1}^{n} log[1 + exp{−y_i F(x_i)}] + (η/n) Σ_{i=1}^{n} {−y_i F(x_i)}    (16.17)

16.3.2 Multi-Class Case

Let y be the true label of feature vector x, and F(x, k) a classification function. Then we define the following function in a manner similar to that of the posterior probabilities:

    p_logit(y | x) = exp{F(x, y)} / Σ_{k∈G} exp{F(x, k)}

Using this function, we define the loss functions in the multi-class case as follows:

    L_logit(F | x, y) = −log p_logit(y | x)  and
    L_eta(F | x, y) = {1 − (g − 1)η}{−log p_logit(y | x)} + η Σ_{k≠y} log p_logit(k | x)

where η is a constant with 0 < η < 1/(g − 1). The empirical risks are then defined by the average of the loss functions evaluated over the training data set {(x_i, y_i) ∈ R^q × G | i ∈ D}:

    R_logit(F) = (1/n) Σ_{i=1}^{n} L_logit(F | x_i, y_i)  and  R_eta(F) = (1/n) Σ_{i=1}^{n} L_eta(F | x_i, y_i)    (16.18)

LogitBoost and EtaBoost aim to minimize the logit risk function R_logit(F) and the eta risk function R_eta(F), respectively. These risk functions are expected to be more robust than the exponential risk function. In particular, EtaBoost is more robust than LogitBoost in the presence of mislabeled training examples.
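A minimal sketch of the multi-class logit posterior and the two losses above (hypothetical names; F is assumed to be an (n, g) score array with 0-based labels):

```python
import numpy as np

def p_logit(F, y):
    """Posterior exp F(x,y) / sum_k exp F(x,k) for the true labels; F is (n, g)."""
    e = np.exp(F - F.max(axis=1, keepdims=True))   # numerically stabilized softmax
    return e[np.arange(len(y)), y] / e.sum(axis=1)

def loss_logit_mc(F, y):
    """Multi-class logit loss -log p_logit(y | x)."""
    return -np.log(p_logit(F, y))

def loss_eta_mc(F, y, eta):
    """Multi-class eta loss with 0 < eta < 1/(g - 1)."""
    n, g = F.shape
    assert 0 < eta < 1.0 / (g - 1)
    e = np.exp(F - F.max(axis=1, keepdims=True))
    logp = np.log(e / e.sum(axis=1, keepdims=True))
    ly = logp[np.arange(n), y]                       # log p_logit(y | x)
    return (1 - (g - 1) * eta) * (-ly) + eta * (logp.sum(axis=1) - ly)
```

With F ≡ 0 every posterior equals 1/g, so the logit loss reduces to log g, which is a convenient sanity check.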


16.4 Contextual Image Classification

Ordinary classifiers proposed for independent samples can, of course, be utilized for image classification. However, it is known that contextual classifiers show better performance than noncontextual classifiers. In this section, three contextual classifiers will be discussed: the smoothing method by Switzer [1], MRF-based classifiers, and spatial boosting [17].

16.4.1 Neighborhoods of Pixels

In this subsection, we define notation related to the observations and two sorts of neighborhoods. Let D = {1, ..., n} be an observed area consisting of n pixels. A q-dimensional feature vector and its observation at pixel i are denoted by X_i and x_i, respectively, for i in area D. The class label covering pixel i is denoted by the random variable Y_i, where Y_i takes an element of the label set G = {1, ..., g}. All feature vectors are expressed in vector form as

    X = (X_1^T, ..., X_n^T)^T : nq × 1    (16.19)

In addition, we define random label vectors as

    Y = (Y_1, ..., Y_n)^T : n × 1  and  Y_{−i} = Y with Y_i deleted : (n − 1) × 1    (16.20)

Recall that the class-specific density functions p(x | k), x ∈ R^q, were defined for deriving the posterior distribution in Equation 16.4. In the numerical study in Section 16.7, the densities are fitted by homoscedastic q-dimensional Gaussian distributions, N_q(μ(k), Σ), with common variance–covariance matrix Σ, or by heteroscedastic Gaussian distributions, N_q(μ(k), Σ_k), with class-specific variance–covariance matrices Σ_k. Here we define neighborhoods to provide contextual information. Let d(i, j) denote the distance between the centers of pixels i and j. Then we define two kinds of neighborhoods of pixel i as follows:

    U_r(i) = {j ∈ D | d(i, j) = r}  and  N_r(i) = {j ∈ D | 1 ≤ d(i, j) ≤ r}    (16.21)

where r = 1, √2, 2, ... denotes the radius of the neighborhood. Note that subset U_r(i) constitutes an isotropic ring region. Subsets U_r(i) with r = 0, 1, √2, 2 are shown in Figure 16.2. Here we find that U_0(i) = {i}, N_1(i) = U_1(i) is the first-order neighborhood, and N_√2(i) = U_1(i) ∪ U_√2(i) forms the second-order neighborhood of pixel i. In general, we have N_r(i) = ∪_{1 ≤ r′ ≤ r} U_{r′}(i) for r ≥ 1.
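On a regular width × height grid with row-major pixel indices, the two neighborhoods of Equation 16.21 can be sketched as follows (`ring` and `disc` are hypothetical helper names, not from the chapter):

```python
import math

def ring(i, r, width, height):
    """U_r(i) of Equation 16.21: pixels whose centers lie at distance exactly r
    from pixel i (row-major indices). U_0(i) = {i} by convention."""
    if r == 0:
        return [i]
    yi, xi = divmod(i, width)
    out = []
    for j in range(width * height):
        yj, xj = divmod(j, width)
        if j != i and math.isclose(math.hypot(xj - xi, yj - yi), r):
            out.append(j)
    return out

def disc(i, r, width, height):
    """N_r(i) of Equation 16.21: all pixels with 1 <= d(i, j) <= r."""
    yi, xi = divmod(i, width)
    out = []
    for j in range(width * height):
        yj, xj = divmod(j, width)
        if j != i and math.hypot(xj - xi, yj - yi) <= r + 1e-9:
            out.append(j)
    return out
```

For the center pixel of a 3 × 3 grid, `ring` with r = 1 returns the four first-order neighbors, `ring` with r = √2 the four diagonal neighbors, and `disc` with r = √2 all eight, matching the second-order neighborhood in Figure 16.2.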

FIGURE 16.2 Isotropic neighborhoods U_r(i) with center pixel i and radius r: (a) r = 0, (b) r = 1, (c) r = √2, (d) r = 2. [Diagrams omitted.]

16.4.2 MRFs Based on Divergence

Here we discuss the spatial distribution of the classes. A pairwise-dependent MRF is an important model for specifying the field. Let D(k, ℓ) > 0 be a divergence between two classes, C_k and C_ℓ (k ≠ ℓ), and put D(k, k) = 0. The divergence is employed for modeling the MRF. In the Potts model, D(k, ℓ) is defined by D_0(k, ℓ) := 1 if k ≠ ℓ, := 0 otherwise. Nishii [18] proposed to take the squared Mahalanobis distance between homoscedastic Gaussian distributions N_q(μ(k), Σ), defined by

    D_1(μ(k), μ(ℓ)) = {μ(k) − μ(ℓ)}^T Σ^{−1} {μ(k) − μ(ℓ)}    (16.22)

Nishii and Eguchi (2004) proposed to take the Jeffreys divergence ∫{p(x | k) − p(x | ℓ)} log{p(x | k)/p(x | ℓ)} dx between the densities p(x | k). These models are called divergence models. Let D_i(k) be the average of the divergences in the neighborhood N_r(i) defined by Equation 16.21:

    D_i(k) = (1/|N_r(i)|) Σ_{j∈N_r(i)} D(k, y_j)  if |N_r(i)| ≥ 1, and 0 otherwise,  for (i, k) ∈ D × G    (16.23)

where |S| denotes the cardinality of set S. Then the random variable Y_i, conditional on all the other labels Y_{−i} = y_{−i}, is assumed to follow a multinomial distribution with the probabilities

    Pr{Y_i = k | Y_{−i} = y_{−i}} = exp{−βD_i(k)} / Σ_{ℓ∈G} exp{−βD_i(ℓ)}  for k ∈ G    (16.24)

Here, β is a non-negative constant called the clustering parameter, or the granularity of the classes, and D_i(k) is defined by the formula given in Equation 16.23. Parameter β characterizes the degree of the spatial dependency of the MRF. If β = 0, then the classes are spatially independent. The radius r of neighborhood U_r(i) denotes the extent of the spatial dependency. Of course, β, as well as r, are parameters that need to be estimated. Due to the Hammersley–Clifford theorem, the conditional distribution in Equation 16.24 is known to specify the distribution of the test label vector Y under mild conditions. The joint distribution of the test labels, however, cannot be obtained in closed form. This causes a difficulty in estimating the parameters specifying the MRF. Geman and Geman [6] developed a method for the estimation of test labels by simulated annealing. However, the procedure is time consuming. Besag [4] proposed an iterated conditional modes (ICM) method, which is reviewed in Section 16.4.5.

16.4.3 Assumptions

Now we make the following assumptions for deriving the classifiers.

16.4.3.1 Assumption 1 (Local Continuity of the Classes)
If the class label of a pixel is k ∈ G, then the pixels in its neighborhood have the same class label k. Furthermore, this is true for any pixel.

16.4.3.2 Assumption 2 (Class-Specific Distribution)
A feature vector of a sample from class C_k follows the class-specific probability density function p(x | k) for label k in G.

16.4.3.3 Assumption 3 (Conditional Independence)

The conditional distribution of vector X in Equation 16.19, given label vector Y = y in Equation 16.20, is given by Π_{i∈D} p(x_i | y_i).

16.4.3.4 Assumption 4 (MRFs)

The label vector Y defined by Equation 16.20 follows an MRF specified by a divergence (quasi-distance) between the classes.

16.4.4 Switzer's Smoothing Method

Switzer [1] derived a contextual classification method (the smoothing method) under Assumptions 1–3 with homoscedastic Gaussian distributions N_q(μ(k), Σ). Let φ(x | k) be the corresponding probability density function:

    φ(x | k) = (2π)^{−q/2} |Σ|^{−1/2} exp{−D_1(x, μ(k))/2}

Assume that Assumption 1 holds for the neighborhoods N_r(·). Then he proposed to estimate the label y_i of pixel i by maximizing the joint probability density

    φ(x_i | k) Π_{j∈N_r(i)} φ(x_j | k)

with respect to label k ∈ G, where D_1(·,·) stands for the squared Mahalanobis distance in Equation 16.22. The maximization problem is equivalent to minimizing the quantity

    D_1(x_i, μ(k)) + Σ_{j∈N_r(i)} D_1(x_j, μ(k))    (16.25)

Obviously, Assumption 1 does not hold for the whole image. However, the method still exhibits good performance, and the classification is performed very quickly. Thus, the method is a pioneering work of contextual image classification.
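Switzer's rule in Equation 16.25 can be sketched as follows (illustrative names, not the chapter's code; `Sinv` is the inverse of the common covariance matrix Σ):

```python
import numpy as np

def mahalanobis_sq(x, mu, Sinv):
    """Squared Mahalanobis distance D1(x, mu) of Equation 16.22."""
    d = x - mu
    return float(d @ Sinv @ d)

def switzer_label(i, x, neighbors, mus, Sinv):
    """Estimate the label of pixel i by minimizing Equation 16.25:
    D1(x_i, mu(k)) + sum_{j in N_r(i)} D1(x_j, mu(k))."""
    scores = []
    for mu in mus:
        s = mahalanobis_sq(x[i], mu, Sinv)
        s += sum(mahalanobis_sq(x[j], mu, Sinv) for j in neighbors)
        scores.append(s)
    return int(np.argmin(scores))
```

In the usage below, the center pixel alone is slightly closer to class 0, but its two neighbors strongly favor class 1, so the smoothed decision flips: a pixel-level illustration of how neighborhood evidence overrides a marginal observation.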

16.4.5 ICM Method

Under Assumptions 2–4 with the conditional distribution in Equation 16.24, the posterior probability of Y_i = k, given feature vector X = x and label vector Y_{−i} = y_{−i}, is expressed by

    p_i(k | r, β) ≡ Pr{Y_i = k | X = x, Y_{−i} = y_{−i}} = exp{−βD_i(k)} p(x_i | k) / Σ_{ℓ∈G} exp{−βD_i(ℓ)} p(x_i | ℓ)    (16.26)

Then the posterior probability Pr{Y = y | X = x} of label vector y is approximated by the pseudo-likelihood

    PL(y | r, β) = Π_{i=1}^{n} p_i(y_i | r, β)    (16.27)

where the posterior probability p_i(y_i | r, β) is defined by Equation 16.26. The pseudo-likelihood in Equation 16.27 is used for accuracy assessment of the classification as well as for parameter estimation. Here, the class-specific densities p(x | k) are estimated using the training data.


When the radius r and clustering parameter β are given, the optimal label vector y, which maximizes the pseudo-likelihood PL(y | r, β) defined by Equation 16.27, is usually estimated by the ICM procedure [4]. At the (t+1)-st step, ICM finds the optimal label y_i^(t+1), given y_{−i}^(t), for all test pixels i ∈ D. This procedure is repeated until convergence of the label vector, say y = y(r, β) : n × 1. Furthermore, we must optimize the pair of parameters (r, β) by maximizing the pseudo-likelihood PL(y(r, β) | r, β).

16.4.6 Spatial Boosting

As shown in Section 16.2, AdaBoost combines classification functions defined over the feature space. Of course, such classifiers give a noncontextual classification. We extend AdaBoost to build contextual classification functions, which we call spatial AdaBoost. Define the averaged logarithm of the posterior probabilities (Equation 16.4) in neighborhood U_r(i) (Equation 16.21) by

    f_r(x, k | i) = (1/|U_r(i)|) Σ_{j∈U_r(i)} log p(k | x_j)  if |U_r(i)| ≥ 1, and 0 otherwise,  for r = 0, 1, √2, ...    (16.28)

where x = (x_1^T, ..., x_n^T)^T : qn × 1. The averaged log posterior f_0(x, k | i) with radius r = 0 is equal to the log posterior log p(k | x_i) itself. Hence, classification by the function f_0(x, k | i) is equivalent to a noncontextual classification based on the maximum-a-posteriori (MAP) criterion. If the spatial dependency among the classes is not negligible, then the averaged log posteriors f_1(x, k | i) in the first-order neighborhood may carry information for classification; if the spatial dependency becomes stronger, then f_r(x, k | i) with a larger r is also useful. Thus, we adopt the averages of the log posteriors f_r(x, k | i) as classification functions of center pixel i. The efficiency of the averaged log posteriors as classification functions would intuitively be arranged in the following order:

    f_0(x, k | i), f_1(x, k | i), f_√2(x, k | i), f_2(x, k | i), ...,  where x = (x_1^T, ..., x_n^T)^T    (16.29)

The coefficients for the above classification functions can be tuned by minimizing the empirical risk given by Equation 16.6 or Equation 16.18. See Ref. [2] for possible candidates for contextual classification functions. The following is the contextual classification procedure based on the spatial boosting method.

• Fix an empirical risk function, R_emp(F), of classification function F evaluated over the training data set {(x_i, y_i) ∈ R^q × G | i ∈ D}. Let f_0(x, k | i), f_1(x, k | i), f_√2(x, k | i), ..., f_r(x, k | i) be the classification functions defined by Equation 16.28.
• Find the coefficient β that minimizes the empirical risk R_emp(βf_0). Denote the optimal value by b_0.
• If coefficient b_0 is negative, quit the procedure. Otherwise, consider the empirical risk R_emp(b_0 f_0 + βf_1) with b_0 f_0 obtained in the previous step. Then find the coefficient β that minimizes the empirical risk. Denote the optimal value by b_1.
• If b_1 is negative, quit the procedure. Otherwise, consider the empirical risk R_emp(b_0 f_0 + b_1 f_1 + βf_√2). This procedure is repeated, and we obtain a sequence of positive coefficients b_0, b_1, ..., b_r for the classification functions.
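The procedure above can be sketched as follows. This is an illustrative implementation under stated assumptions: the log posteriors are precomputed as an (n, g) array, and the coefficient search is a simple grid search over non-negative β rather than the chapter's Newton iteration.

```python
import numpy as np

def exp_risk(F, y):
    """Multi-class empirical exponential risk (Equation 16.6); F is (n, g)."""
    n = len(y)
    Fy = F[np.arange(n), y][:, None]
    L = np.exp(F - Fy)
    L[np.arange(n), y] = 0.0
    return L.sum() / n

def averaged_log_posterior(logpost, rings):
    """f_r of Equation 16.28: mean of log p(k | x_j) over each ring U_r(i).

    logpost : (n, g) array of log p(k | x_i)
    rings   : list over pixels i of the index lists U_r(i)
    """
    f = np.zeros_like(logpost)
    for i, ring in enumerate(rings):
        if ring:
            f[i] = logpost[ring].mean(axis=0)
    return f

def spatial_boost(fs, y, grid=None):
    """Fit b_0, b_1, ... sequentially; stop when the best coefficient is not positive."""
    if grid is None:
        grid = np.linspace(0.0, 5.0, 501)
    F = np.zeros_like(fs[0])
    coeffs = []
    for f in fs:
        b = min(grid, key=lambda bb: exp_risk(F + bb * f, y))
        if b <= 0:
            break
        coeffs.append(b)
        F = F + b * f
    return coeffs, F
```

For a four-pixel line image whose posteriors favor the true labels, both f_0 and the ring average f_1 receive positive coefficients and the final risk falls below the trivial value attained at F ≡ 0.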


Finally, the classification function is derived as

    F_r(x, k | i) = b_0 f_0(x, k | i) + b_1 f_1(x, k | i) + ... + b_r f_r(x, k | i),  x = (x_1^T, ..., x_n^T)^T    (16.30)

The test label y* of test vector x* ∈ R^q is estimated by maximizing the classification function in Equation 16.30 with respect to label k ∈ G. Note that the pixel is classified by the feature vector at the pixel as well as the feature vectors in neighborhood N_r(i) in the test area only. There is no need to estimate the labels of the neighbors, whereas the ICM method requires estimated labels of the neighbors and needs an iterative procedure for the classification. Hence, we claim that spatial boosting provides a very fast classifier.

16.5 Relationships between Contextual Classification Methods

The contextual classifiers discussed in this chapter can be regarded, from a unified viewpoint, as extensions of Switzer's method; cf. [16] and [2].

16.5.1 Divergence Model and Switzer's Model

Let us consider the divergence model in Gaussian MRFs (GMRFs), where feature vectors follow homoscedastic Gaussian distributions N_q(μ(k), Σ). The divergence model can be viewed as a natural extension of Switzer's model. The image with center pixel 1 and its neighbors is shown in Figure 16.3. The first-order and second-order neighborhoods of the center pixel are given by the sets of pixel numbers N_1(1) = {2, 4, 6, 8} and N_√2(1) = {2, 3, ..., 9}, respectively. We focus our attention on center pixel 1 and its neighborhood N_r(1) of size 2K in general, and discuss the classification problem of center pixel 1 when the labels y_j of the 2K neighbors are observed.

Let β̂ be a non-negative estimate of the clustering parameter β. Then label y_1 of center pixel 1 is estimated by the ICM algorithm. In this case, the estimate is derived by maximizing the conditional probability (Equation 16.26) with p(x | k) = φ(x | k). This is equivalent to finding the label Ŷ_Div defined by

    Ŷ_Div = arg min_{k∈G} { D_1(x_1, μ(k)) + (β̂/K) Σ_{j∈N_r(1)} D_1(μ(y_j), μ(k)) },  |N_r(1)| = 2K    (16.31)

where D_1(s, t) is the squared Mahalanobis distance (Equation 16.22). Switzer's method [1] classifies the center pixel by minimizing the formula given in Equation 16.25 with respect to label k. Here, the method can be slightly extended by changing the coefficient of Σ_{j∈N_r(1)} D_1(x_j, μ(k)) from 1 to β̂/K. Thus, we define the estimate due to Switzer's method as follows:

FIGURE 16.3 Pixel numbers (left) and pixel labels (right).

    5 4 3        1 1 1
    6 1 2        1 1 2
    7 8 9        1 2 2

    Ŷ_Switzer = arg min_{k∈G} { D_1(x_1, μ(k)) + (β̂/K) Σ_{j∈N_r(1)} D_1(x_j, μ(k)) }    (16.32)

The estimates of label y_1 in Equation 16.31 and Equation 16.32 are given by the same formula except for μ(y_j) and x_j in the respective last terms. Note that x_j itself is a primitive estimate of μ(y_j). Hence, the classification method based on the divergence model is an extension of Switzer's method.

16.5.2 Error Rates

We derive the exact error rate due to the divergence model and Switzer's model in the previous local region with two classes G = {1, 2}. In the two-class case, the only positive quasi-distance is D(1, 2). Substituting βD(1, 2) for β, we note that the MRF based on the Jeffreys divergence reduces to the Potts model. Let d be the Mahalanobis distance between the distributions N_q(μ(k), Σ) for k = 1, 2, and N_r(1) a neighborhood consisting of 2K neighbors of center pixel 1, where K is a fixed natural number. Furthermore, suppose that the number of neighbors with label 1 or 2 is randomly changing. Our aim is to derive the error rate of pixel 1 given the features x_1, x_j, and the labels y_j of neighbors j in N_r(1). Recall that Ŷ_Div is the estimated label of y_1 obtained by the formula in Equation 16.31. Then the exact error rate, Pr{Ŷ_Div ≠ Y_1}, is given by

    e(β̂; β, d) = p_0 Φ(−d/2) + Σ_{k=1}^{K} p_k [ Φ(−d/2 − kβ̂d/K)/(1 + e^{−kβd²/K}) + Φ(−d/2 + kβ̂d/K)/(1 + e^{kβd²/K}) ]    (16.33)
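The error-rate formula can be evaluated directly. The sketch below (hypothetical names) uses the standard Gaussian CDF Φ via the error function and takes the prior probabilities p_0, ..., p_K as input; the divisor K in the shifted arguments matches the β̂/K weighting of Equation 16.31, an assumption of this sketch.

```python
import math

def phi(x):
    """Standard Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def error_rate(beta_hat, beta, d, p, K):
    """Exact error rate e(beta_hat; beta, d) in the form of Equation 16.33.

    p : list [p_0, p_1, ..., p_K] of prior probabilities of |W_1 - K| = k
    """
    e = p[0] * phi(-d / 2.0)
    for k in range(1, K + 1):
        shift = k * beta_hat * d / K
        e += p[k] * (phi(-d / 2.0 - shift) / (1.0 + math.exp(-k * beta * d * d / K))
                     + phi(-d / 2.0 + shift) / (1.0 + math.exp(k * beta * d * d / K)))
    return e
```

Setting β̂ = 0 collapses the bracketed terms to Φ(−d/2), recovering the noncontextual LDF error rate of property P1 below, and choosing β̂ = β reduces the error, in line with property P2.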

^ is an where F(x) is the cumulative standard Gaussian distribution function, and b estimate of clustering parameter b. Here, pk gives a prior probability such that the number of neighbors with label 1, W1, is equal to K þ k or K k in the neighborhood Nr(1) for k ¼ 0, 1, . . . , K. In Figure 16.3, first-order neighborhood N1(1) is given by {2, 4, 6, 8} with (W1, K) ¼ (2, 2), and second-order neighborhood Npﬃﬃ2 (1) is given by {2, 3, . . . ,9} with (W1, K) ¼ (3, 4); see Ref. [16]. If prior probability p0 is equal to one, K pixels in neighborhood Nr(1) are labeled 1 and the remaining K pixels are labeled 2 with probability one. In this case, the majority vote of the neighbors does not work. Hence, we assume that p0 is less than one. Then, we have ^ ; b, d); see Ref. [16]. the following properties of the error rate, e(b .

K P

pk . kbd2 =g k¼1 1 þ e ^ ; b, d), of b ^ takes a minimum at b ^ ¼ b (Bayes rule), and the P2. The function, e(b minimum e(b; b, d) is a monotonically decreasing function of the Mahalanobis distance, d, for any fixed positive clustering parameter b. ^ ; b, d), is a monotonically decreasing function of b for any P3. The function, e(b ^ and d. fixed positive constants b ^ ; b, d) < F(d/2) for any positive b ^ if the inequalP4. We have nthe inequality: e(b o P1. e(0; b, d) ¼ F(d/2), lim e(b^; b, d) ¼ p0 F(d=2) þ b^!1

.

.

.

ity b d12 log

1F(d=2) F(d=2)

holds.

Note that the value e(0; β, d) = Φ(−d/2) is simply the error rate due to Fisher's linear discriminant function (LDF) with uniform prior probabilities on the labels. The asymptotic value p_0 Φ(−d/2) + Σ_{k=1}^{K} p_k/(1 + e^{kβd²/K}) given in P1 is the error rate due to the vote-for-majority rule when the number of neighbors, W_1, is not equal to K. Property P2 recommends that we use the true parameter β if it is known, which is quite natural. P3 means that the classification becomes more efficient when d or β becomes large. Note that d is a distance in the feature space and β is a distance in the image. P4 implies that the use of spatial information always improves the performance of noncontextual discrimination even if the estimate β̂ is far from the true value β.

The error rate due to Switzer's method is obtained in the same form as that of Equation 16.33 by replacing d with d* = d/√(1 + 4β̂²/K). In Figure 16.4a, the clustering parameter β = 0.5 does not satisfy the inequality β ≥ (1/d²) log[(1 − Φ(−d/2))/Φ(−d/2)] of P4; actually, the error rate exceeds the y-intercept Φ(−d/2) for large β̂. On the other hand, the parameters of Figure 16.4b meet the condition because β = 1 and d is the same. Hence, the error rate is always less than Φ(−d/2) for any β̂, as shown in Figure 16.4b.

16.5.3 Spatial Boosting and the Smoothing Method

We consider the proposed classification function in Equation 16.30. When the radius is equal to zero (r = 0), the classification function is given by

    F_0(x, k | i) = b_0 log p(k | x_i) = b_0 log p(x_i | k) + C,  x = (x_1^T, ..., x_n^T)^T : nq × 1    (16.34)

where p(k | x_i) is the posterior probability given in Equation 16.4, and C denotes a constant independent of label k. The constant b_0 is positive, so the maximization of function F_0(x, k | i) with respect to label k is equivalent to that of the class-specific density p(x_i | k). Hence, classification function F_0 yields the usual noncontextual Bayes classifier.

FIGURE 16.4 Error rates due to the MRF-based method and Switzer's method (x-axis: β̂): (a) d = 1.5, β = 0.5; (b) d = 1.5, β = 1.0. [Plots omitted.]


When the radius is equal to one (r = 1), the classification function in Equation 16.30 is given by

    F_1(x, k | i) = b_0 log p(k | x_i) + (b_1/|U_1(i)|) Σ_{j∈U_1(i)} log p(k | x_j)
                  = b_0 log p(x_i | k) + (b_1/|U_1(i)|) Σ_{j∈U_1(i)} log p(x_j | k) + C′    (16.35)

where C′ denotes a constant independent of k, and the set U_1(i), defined by Equation 16.21, is the first-order neighborhood of pixel i. Hence, the label of pixel i is estimated by a convex combination of the log-likelihood of the feature vector x_i and those of the x_j for pixels j in neighborhood U_1(i). In other words, the label is estimated by a majority vote based on the weighted sum of the log-likelihoods. Thus, the proposed method is an extension of the smoothing method proposed by Switzer [1] for multivariate Gaussian distributions with an isotropic correlation function. However, the derivation is completely different.

16.5.4 Spatial Boosting and MRF-Based Methods

Usually, the MRF-based method uses the training data for the estimation of the parameters that specify the class-specific densities p(x | k). The clustering parameter β appearing in the conditional probability in Equation 16.24 and the test labels are then jointly estimated: the ICM method maximizes the conditional probability in Equation 16.26 sequentially with a given clustering parameter β, and β is chosen to maximize the pseudo-likelihood in Equation 16.27 calculated over the test data. This procedure requires a great deal of CPU time. On the other hand, spatial boosting uses the training data for the estimation of the coefficients b_r and the parameters of the densities p(x | k). The estimation cost of the parameters of the densities is the same as that of the MRF-based classifier. Although estimation of the coefficients requires the iterative procedure of Equation 16.12, the number of iterations is very small. After estimation of the coefficients and the parameters, the test labels can be estimated without any iterative procedure. Thus, the required computational resources, such as CPU time, differ significantly between the two methods, although this needs further scrutiny. Even when the training data and the test data are numerous, spatial AdaBoost classifies the test data very quickly.

16.6 Spatial Parallel Boost by Meta-Learning

Spatial boosting can be extended for the classification of spatio-temporal data. Suppose we have S sets of training data over regions D(s) ⊂ D for s = 1, 2, ..., S. Let {(x_i(s), y_i(s)) ∈ R^q × G | i ∈ D(s)} be the s-th set of training data with n(s) = |D(s)| samples. The parameters specifying the class-specific densities are estimated from each training data set. Then we estimate the posterior distribution in Equation 16.4 according to each training data set, and thus derive the averaged log posteriors (Equation 16.28). We assume that an appropriate calibration of the feature vectors x_i(s) across the training data sets has been performed as preprocessing.


Fix a loss function L, and let F(s) be a set of classification functions for the s-th training data set. Now we propose spatial boosting based on S training data sets (parallel boost). Let F be a classification function applicable to all training data sets. Then we propose the following empirical risk function of F:

    R_emp(F) = Σ_{s=1}^{S} R_emp^(s)(F)    (16.36)

where R_emp^(s)(F) denotes the empirical risk given in Equation 16.6 or Equation 16.18 due to loss function L, evaluated over the s-th training data set. The following steps describe parallel boost:

• Let {f_0^(s), f_1^(s), ..., f_r^(s)} be the set of averaged log posteriors (Equation 16.28) evaluated through the s-th training data set, for s = 1, ..., S.
• Stage for radius 0:
  (a) Obtain the optimal coefficient and training data set defined by (a_01, b_01) = arg min_{(s, β) ∈ {1,...,S} × R} R_emp(βf_0^(s)).
  (b) If coefficient b_01 is negative, quit the procedure. Otherwise, find the minimizer of the empirical risk R_emp(b_01 f_0^(a_01) + βf_0^(s)). Let s = a_02 and β = b_02 ∈ R be the solution.
  (c) If coefficient b_02 is negative, quit the procedure. Repeat this at most S times and define the convex combination F_0 ≡ b_01 f_0^(a_01) + ... + b_0m_0 f_0^(a_0m_0) of the classification functions, where m_0 denotes the number of repetitions of the substeps.
• Stage for radius 1 (first-order neighborhood):
  (a) Let F_0 be the function obtained in the previous stage. Then find the classification function f_1^(s) and coefficient β ∈ R that minimize the empirical risk R_emp(F_0 + βf_1^(s)); say, s = a_11 and β = b_11.
  (b) If coefficient b_11 is negative, quit the procedure. Otherwise, minimize the empirical risk R_emp(F_0 + b_11 f_1^(a_11) + βf_1^(s)), given F_0 and b_11 f_1^(a_11). This is repeated, and we define a convex combination F_1 ≡ b_11 f_1^(a_11) + ... + b_1m_1 f_1^(a_1m_1). The function F_0 + F_1 constitutes a contextual classifier.
• The stage is repeated up to radius r, and we obtain the classification function F ≡ F_0 + F_1 + ... + F_r.

Note that spatial parallel boost is based on the assumption that training data sets and test data are not significantly different, especially in the spatial distribution of the classes.
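One stage of the greedy selection above can be sketched as follows (a toy sketch: the grid line search over b, the exponential loss, and the binary margin convention are our simplifications):

```python
import numpy as np

def stage_parallel_boost(fs, y, F=None, max_steps=None,
                         betas=np.linspace(0.0, 2.0, 201)):
    """One stage of parallel boost (sketch): greedily add b * f^(s) terms.

    fs : list of S arrays; fs[s][i] is f^(s) evaluated at pooled sample i.
    y  : labels in {-1, +1} for the pooled samples.
    At each substep the pair (s, b) minimizing the empirical exponential
    risk is chosen; the stage quits when no positive coefficient helps,
    mirroring steps (b) and (c) in the text.
    """
    F = np.zeros(len(y)) if F is None else F.copy()
    max_steps = max_steps or len(fs)
    picks = []
    for _ in range(max_steps):
        best = None
        for s, f in enumerate(fs):
            for b in betas:
                risk = np.mean(np.exp(-y * (F + b * f)))
                if best is None or risk < best[0]:
                    best = (risk, s, b)
        _, s, b = best
        if b <= 0:          # non-positive coefficient: quit the stage
            break
        F = F + b * fs[s]
        picks.append((s, b))
    return F, picks
```

The full procedure would run this stage once per radius, carrying F forward, so that F = F0 + F1 + · · · + Fr.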

16.7 Numerical Experiments

Various classification methods discussed here will be examined by applying them to three data sets. Data set 1 is a synthetic data set, and Data sets 2 and 3 are real data sets offered in Ref. [20] as benchmarks. Legends of the data sets are given in the following

Supervised Image Classification of Multi-Spectral Images


section; then, the classification methods are applied to the data sets for evaluating the efficiency of the methods.

16.7.1 Legends of Three Data Sets

16.7.1.1 Data Set 1: Synthetic Data Set

Data set 1 consists of multi-spectral images generated over the image shown in Figure 16.5a. There are three classes (g = 3), whose labels correspond to black, white, and gray. Sample sizes for these classes are 3330, 1371, and 3580 (n = 8281), respectively. We simulate four-dimensional spectral images (q = 4) at each pixel of the true image of Figure 16.5a following independent multi-variate Gaussian distributions with mean vectors m1 = (0, 0, 0, 0)^T, m2 = (1, 1, 0, 0)^T/√2, and m3 = (1.0498, 0.6379, 0, 0)^T and with common variance–covariance matrix s2·I, where I denotes the identity matrix. The mean vectors were chosen to maximize the pseudo-likelihood (Equation 16.27) of the training data based on the squared Mahalanobis distance used as the divergence in Equation 16.22. Test data of size 8281 are similarly generated over the same image. Gaussian density functions with the common variance–covariance matrix are used to derive the posterior probabilities.
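The simulation just described can be sketched as follows (a minimal sketch using the stated means, sample sizes, and common covariance s2·I; the seed and variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

# Class means from the text; q = 4 spectral bands, common covariance s2 * I.
means = {
    "black": np.array([0.0, 0.0, 0.0, 0.0]),
    "white": np.array([1.0, 1.0, 0.0, 0.0]) / np.sqrt(2.0),
    "gray":  np.array([1.0498, 0.6379, 0.0, 0.0]),
}
sizes = {"black": 3330, "white": 1371, "gray": 3580}   # n = 8281 in total
s2 = 1.0

X, y = [], []
for k, (label, m) in enumerate(means.items()):
    # independent 4-variate Gaussians with mean m and covariance s2 * I
    X.append(rng.normal(loc=m, scale=np.sqrt(s2), size=(sizes[label], 4)))
    y.append(np.full(sizes[label], k))
X, y = np.vstack(X), np.concatenate(y)
```

Note that m2 has unit Euclidean norm, so the white class sits at unit Mahalanobis distance from the black class when s2 = 1.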

16.7.1.2 Data Set 2: Benchmark Data Set grss_dfc_0006

Data set grss_dfc_0006 is a benchmark data set for supervised image classification. It can be accessed from the IEEE GRS-S Data Fusion reference database [20] and consists of samples acquired by ATM and SAR sensors (six and nine bands, respectively; q = 15) observed at Feltwell, UK. There are five agricultural classes: sugar beets, stubble, bare soil, potatoes, and carrots (g = 5). Sample sizes of the training data and test data are 5072 = 1436 + 1070 + 341 + 1411 + 814 and 5760 = 2017 + 1355 + 548 + 877 + 963, respectively, in a rectangular region of size 350 × 250. The data set was analyzed in Ref. [21] through various methods based on statistics and neural networks. In this case, the labels of the test and training data are observed sparsely.

16.7.1.3 Data Set 3: Benchmark Data Set grss_dfc_0009

Data set grss_dfc_0009 is also taken from the IEEE GRS-S Data Fusion reference database. It consists of examples acquired by Landsat 7 TM+ (q = 7) with 11 agricultural classes (g = 11) in Flevoland, Netherlands. Training data and test data sizes are 2891 and 2890, respectively, in a square region of size 512 × 512. We note that the labels of the test and training data of Data sets 2 and 3 are observed sparsely in the region.

16.7.2 Potts Models and the Divergence Models

We compare the performance of the Potts and the divergence models applied to Data sets 1 and 2 (see Table 16.1 and Table 16.2). Gaussian distributions with a common variance–covariance matrix (homoscedastic), and Gaussian distributions with class-specific matrices (heteroscedastic) are employed for class-specific distributions. The former corresponds to the LDF and the latter to the quadratic discriminant function (QDF). Test errors due to GMRF-based classifiers with several radii r are listed in the tables. Numerals in boldface denote the optimal values in the respective columns. We see that the divergence models give the minimum values in Table 16.1 and Table 16.2.
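The divergence model measures dissimilarity between class-specific densities; for Gaussians the Jeffreys divergence (the symmetrized Kullback–Leibler divergence) has a closed form, sketched below (function name ours). In the homoscedastic case it reduces to the squared Mahalanobis distance between the class means, the quantity used as the divergence in Equation 16.22.

```python
import numpy as np

def jeffreys_gaussian(m1, S1, m2, S2):
    """Jeffreys divergence J(p1, p2) = KL(p1||p2) + KL(p2||p1) between two
    multivariate Gaussians, in closed form; the log-determinant terms of
    the two KL directions cancel.  With common covariance S1 = S2, J equals
    the squared Mahalanobis distance between the means."""
    d = len(m1)
    dm = np.asarray(m1) - np.asarray(m2)
    S1i, S2i = np.linalg.inv(S1), np.linalg.inv(S2)
    return 0.5 * (np.trace(S2i @ S1) + np.trace(S1i @ S2) - 2 * d
                  + dm @ (S1i + S2i) @ dm)
```

The heteroscedastic (QDF) case simply passes class-specific covariance matrices S1 and S2.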

FIGURE 16.5 (a) True labels of data set 1; (b) labels estimated by LDF; (c)–(f) labels estimated by spatial AdaBoost with b0 f0 + b1 f1 + · · · + br fr, for r = 1, √2, √8, √13.

In the homoscedastic cases of Table 16.1 and Table 16.2, the divergence model gives a performance similar to that of the Potts model. In the heteroscedastic case, the divergence model is superior to the Potts model to a small degree. Classified images with s2 = 1 are given in Figure 16.5b through Figure 16.5f. Figure 16.5b is an image classified by the LDF, which exhibits very poor performance. As radius r becomes larger, the images classified by

TABLE 16.1
Test Error Rates (%) Due to Homoscedastic Gaussian MRFs with Variance–Covariance Matrices s2·I, s2 = 1, 2, 3, 4, for Data Set 1

           s2 = 1            s2 = 2            s2 = 3            s2 = 4
Radius r2  Potts   Jeffreys  Potts   Jeffreys  Potts   Jeffreys  Potts   Jeffreys
 0         41.66   41.66     41.57   41.57     52.63   52.63     54.45   54.45
 1          9.35    9.25      9.97    9.68     25.60   27.41     30.76   31.07
 2          3.90    3.61      3.90    4.46     13.26   12.23     17.49   15.58
 4          3.47    2.48      3.02    3.12      9.82    9.20     11.83   14.70
 5          3.19    2.72      3.32    2.96      8.03    7.75     10.72    9.12
 8          3.39    3.06      3.35    3.37      7.76    8.57     10.14   10.61
 9          3.65    3.30      3.51    4.06      8.28    8.20     10.58   11.00
10          3.89    3.79      3.86    4.09      8.88   10.08     10.57   12.66
13          4.46    4.13      4.36    4.66     10.12   10.20     13.25   14.06
16          4.49    4.17      5.00    4.99     11.16   11.68     14.25   14.95
17          4.96    4.67      5.30    5.54     12.04   11.65     15.34   15.76
18          5.00    4.94      5.92    5.83     12.64   12.67     16.07   15.89
20          5.68    6.11      6.50    6.56     14.89   13.55     17.18   17.00


TABLE 16.2
Test Error Rates (%) Due to Homoscedastic and Heteroscedastic Gaussian MRFs with Two Sorts of Divergences for Data Set 2

           Homoscedastic     Heteroscedastic
Radius r2  Potts   Jeffreys  Potts   Jeffreys
 0          8.61    8.61     14.67   14.67
 4          6.44    6.09     12.93   11.44
 9          6.16    5.69     13.09   10.19
16          5.99    5.35     13.28    9.67
25          6.16    5.69     12.80    9.39
36          6.49    5.64     13.16    9.55
49          6.37    6.18     12.90    9.90
64          6.30    6.65     12.92    9.76

spatial AdaBoost become clearer. Therefore, we see that spatial AdaBoost improves the noncontextual classification result (r = 0) significantly.

16.7.3 Spatial AdaBoost and Its Robustness

Spatial AdaBoost is applied to the three data sets. The error rates due to spatial AdaBoost and the MRF-based classifier for Data set 1 are listed in Table 16.3. Here, homoscedastic Gaussian distributions are employed for class-specific distributions in both classifiers. Hence, classification with radius r = 0 indicates noncontextual classification by the LDF. We used the GMRF with the divergence model discussed in Section 16.4.3. From Table 16.3, we see that the minimum error rate due to AdaBoost is higher than that due to the GMRF-based method, but the value is still comparable to that due to the

TABLE 16.3
Error Rates (%) Due to Homoscedastic GMRF and Spatial AdaBoost for Data Set 1 with Variance–Covariance Matrix s2·I

           s2 = 1            s2 = 4
Radius r2  GMRF    AdaBoost  GMRF    AdaBoost
 0         41.66   41.65     54.45   54.45
 1          9.25   20.40     31.07   40.88
 2          3.61   12.15     15.58   33.79
 4          2.48    8.77     14.70   28.78
 5          2.72    5.72      9.12   22.71
 8          3.06    5.39     10.61   20.50
 9          3.30    4.82     11.00   18.96
10          3.79    4.38     12.66   16.58
13          4.13    4.34     14.06   14.99
16          4.17    4.31     14.95   14.14
17          4.67    4.19     15.76   13.02
18          4.94    4.34     15.89   12.57
20          6.11    4.43     17.00   12.16

FIGURE 16.6 Estimated coefficients for the log posteriors in the ring regions of data set 1 (left) and data set 2 (right); the horizontal axes give the squared radii of the neighborhoods.

GMRF-based method. The left panel of Figure 16.6 plots the estimated coefficients for the averaged log posteriors fr vs. r2. All the coefficients are positive; therefore, the problem of when to stop combining classification functions arises. Training and test errors as radius r becomes large are given in Figure 16.7. The test error takes its minimum value at r2 = 17, whereas the training error is stable for large r. The same study is applied to Data set 2 (Table 16.4). We obtain a result similar to that observed in Table 16.3. The right panel of Figure 16.6 gives the estimated coefficients. We see that coefficient br with r = √72 is −0.012975, a negative value. This implies that the averaged log posterior fr with r = √72 is not appropriate for the classification. Therefore, we must exclude functions of radius √72 or larger. The application to Data set 3 leads to a quite different result (Table 16.5). We fit homoscedastic and heteroscedastic Gaussian distributions to the feature vectors. Hence, classification with radius r = 0 is performed by the LDF and the QDF. The error rates due to the noncontextual classification function log p(k | x) and the coefficients are tabulated in Table 16.5. The test error due to the QDF is 1.66%, which is excellent. Unfortunately, the coefficients b0 of the functions log p(k | x) in both cases are estimated to be negative.
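The sequential tuning with its stopping rule can be sketched as follows (a toy sketch; the grid line search, exponential loss, and binary margin convention are our simplifications of the chapter's multi-class procedure):

```python
import numpy as np

def tune_coefficients(f_list, y, grid=np.linspace(-1.0, 3.0, 401)):
    """Sequentially tune b_0, b_1, ... for F = b_0 f_0 + ... + b_r f_r by
    minimizing the empirical exponential risk (labels y in {-1, +1}).
    Following the stopping rule discussed in the text, the procedure quits
    as soon as a tuned coefficient comes out negative, excluding that
    radius and all larger ones."""
    F = np.zeros(len(y))
    coefs = []
    for f in f_list:
        risks = [np.mean(np.exp(-y * (F + b * f))) for b in grid]
        b = float(grid[int(np.argmin(risks))])
        if b < 0:
            break          # this f_r (and larger radii) are excluded
        F = F + b * f
        coefs.append(b)
    return F, coefs
```

When a useful function (positive tuned coefficient) is followed by a harmful one (negative tuned coefficient), the returned list of coefficients stops at the last useful radius.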

FIGURE 16.7 Error rates with respect to squared radius, r2, for training data (left) and test data (right) due to classifiers b0 f0 + b1 f1 + · · · + br fr for data set 1.


TABLE 16.4
Error Rates (%) Due to Homoscedastic GMRF and Spatial AdaBoost for Data Set 2

Radius r2  GMRF   AdaBoost
 0         8.61   8.61
 4         6.09   7.08
 9         5.69   6.11
16         5.35   5.95
25         5.69   5.83
36         5.69   5.85
49         6.13   5.99
64         6.65   6.01

Therefore, the classification functions b0 log p(k | x) are not applicable. This implies that the exponential loss defined by Equation 16.5 is too sensitive to misclassified data. Error rates for spatial AdaBoost b0 f0 + b1 f1 are listed in Table 16.6, where coefficient b0 is predetermined and b1 is tuned by minimizing the empirical risk Remp(b0 f0 + b1 f1). The LDF and QDF columns correspond to the posterior probabilities based on the homoscedastic and heteroscedastic Gaussian distributions, respectively. In both cases, information in the first-order neighborhood (r = 1) improves the classification slightly. We note that the homoscedastic and heteroscedastic GMRFs have minimum error rates of 4.01% and 0.52%, respectively.
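A toy illustration of why the tuned coefficient can come out negative under the exponential loss (as in Table 16.5): a handful of confidently misclassified samples dominate the risk, even when the vast majority of samples are correct. The setup below (sample sizes and scores) is ours, not from the chapter.

```python
import numpy as np

def best_b(f, y, grid=np.linspace(-1.0, 1.0, 2001)):
    """Coefficient minimizing the empirical exponential risk of b*f
    (binary margin convention, labels y in {-1, +1})."""
    risks = [np.mean(np.exp(-y * b * f)) for b in grid]
    return float(grid[int(np.argmin(risks))])

# A score that is right on 95 of 100 samples but confidently wrong on 5:
y = np.ones(100)
y[-5:] = -1.0
f = np.ones(100)
f[-5:] = 30.0               # large scores on the misclassified samples

b_outlier = best_b(f, y)    # exp(30*b) on 5 samples drives b below zero
b_clean = best_b(np.ones(100), np.ones(100))   # all correct: b stays > 0
```

Despite 95% accuracy of f, the exponential penalty exp(30·b) on the five outliers forces the risk-minimizing coefficient to be negative, making b·f useless as a classifier; a more slowly growing loss (such as the logit loss) does not suffer from this.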

16.7.4 Spatial AdaBoost and Spatial LogitBoost

We apply classification functions F0, F1, . . . , Fr with r = √20 to the test data, where function Fr is defined by Equation 16.30, and coefficients b0, b1, . . . , br with r = √20 are sequentially tuned by minimizing the exponential risk (Equation 16.6) or the logit risk (Equation 16.18). Error rates due to GMRF-based classifiers and spatial boosting methods for error variances s2 = 1 and 4 for Data set 1 are compared in Table 16.7. We see that GMRF is superior to the spatial boosting methods to a small degree. Furthermore, spatial AdaBoost and spatial LogitBoost exhibit similar performances for Data set 1. Note that spatial boosting is very fast compared with GMRF-based classifiers. This observation is also true for Data set 2 (Table 16.8). Then, GMRF and spatial LogitBoost were applied to Data set 3 (Table 16.9). Spatial LogitBoost still works well in spite of the failure of spatial AdaBoost. Note that spatial AdaBoost and spatial LogitBoost use the same classification functions f0, f1, . . . , fr. However, the loss function used by

TABLE 16.5
Optimal Coefficients Tuned by Minimizing Exponential Risks Based on LDF and QDF for Data Set 3

                      LDF          QDF
Error rate by f0      10.69%       1.66%
Tuned b0 to f0        −0.006805    −0.000953
Error rate by b0 f0   Nearly 100%  Nearly 100%

TABLE 16.6
Error Rates (%) Due to Contextual Classification Function b0 f0 + b1 f1 Given b0 for Data Set 3; b1 Is Tuned by Minimizing the Empirical Exponential Risk

b0       Homoscedastic  Heteroscedastic
10^-7     9.97          1.35
10^-6    10.00          1.35
10^-5    10.00          1.35
10^-4    10.00          1.25
10^-3     9.10          1.00
10^-2     9.45          1.38
10^-1    10.24          1.66
1        10.69          1.66
10       10.69          1.66

TABLE 16.7
Error Rates (%) Due to GMRF and Spatial Boosting for Data Set 1 with Variance–Covariance Matrix s2·I

           s2 = 1                                s2 = 4
           Homoscedastic  Spatial Boosting       Homoscedastic  Spatial Boosting
Radius r2  GMRF           AdaBoost  LogitBoost   GMRF           AdaBoost  LogitBoost
 0         41.66          41.66     41.66        54.45          54.45     54.45
 1          9.25          20.40     19.45        31.07          40.88     40.77
 2          3.61          12.15     11.63        15.58          33.79     33.95
 4          2.48           8.77      8.36        14.70          28.78     28.74
 5          2.72           5.72      5.93         9.12          22.71     22.44
 8          3.06           5.39      5.43        10.61          20.50     20.55
 9          3.30           4.82      4.96        11.00          18.96     19.01
10          3.79           4.38      4.65        12.66          16.58     16.56
13          4.13           4.34      4.54        14.06          14.99     15.08
16          4.17           4.31      4.43        14.95          14.14     14.12
17          4.67           4.19      4.31        15.76          13.02     13.22
18          4.94           4.34      4.58        15.89          12.57     12.69
20          6.11           4.43      4.66        17.00          12.16     12.11

TABLE 16.8
Error Rates (%) Due to GMRF and Spatial Boosting for Data Set 2

           Homoscedastic
Radius r2  GMRF   AdaBoost  LogitBoost
 0         8.61   8.61      8.61
 4         6.09   7.08      7.93
 9         5.69   6.11      7.33
16         5.35   5.95      6.74
25         5.69   5.83      6.35
36         5.69   5.85      6.13
49         6.13   5.99      6.32
64         6.65   6.01      6.48


TABLE 16.9
Error Rates (%) Due to GMRF and Spatial LogitBoost for Data Set 3

           Homoscedastic      Heteroscedastic
Radius r2  GMRF   LogitBoost  GMRF   LogitBoost
 0         10.69  10.69       1.66   1.66
 1          8.58   9.48       1.97   1.31
 4          7.06   6.51       1.69   0.93
 8          3.81   5.43       0.55   0.76
 9          4.01   5.40       0.52   0.76
10          4.08   5.29       0.52   0.83
13          4.01   5.16       0.52   0.87
16          4.01   5.26       0.52   0.90

LogitBoost assigns less penalty to misclassified data. Hence, spatial LogitBoost seems to have a robust property.

16.7.5 Spatial Parallel Boost

We examine parallel boost by applying it to Data set 1. Three rectangular regions shown in Figure 16.8a give the training areas. Now, we consider the following four subsets of the training data set:

- Near the center of Figure 16.8a, with sample size 75 (D(1))
- North-west of Figure 16.8a, with sample size 160 (D(2))

FIGURE 16.8 True labels, training regions, and classified images; n denotes the training sample size. (a) True labels with three training regions. (b) Parallel boost with r = 0 (noncontextual classification; error rate = 42.29%, n = 435). (c) Spatial boost by D(1) with r = √13 (error rate = 10.26%, n = 75). (d) Spatial boost by D(2) with r = √13 (error rate = 6.29%, n = 160). (e) Spatial boost by D(3) with r = √13 (error rate = 7.46%, n = 200). (f) Parallel boost by D(1) ∪ D(2) ∪ D(3) with r = √13 (error rate = 4.96%, n = 435).

TABLE 16.10

Error Rates (%) Due to Spatial AdaBoost Based on Four Training Data Sets and Spatial Parallel Boost Based on the Union of Three Subsets in Data Set 1

           Spatial AdaBoost               Parallel Boost
Radius r2  D(1)   D(2)   D(3)   Whole     D(1) ∪ D(2) ∪ D(3)
 0         43.13  41.67  45.88  41.75     42.29
 1         23.84  20.23  26.51  20.40     21.18
 2         17.16  12.84  18.92  12.15     13.03
 4         13.88   9.44  15.25   8.77      9.46
 5         10.88   7.34  11.21   5.72      6.39
 8         10.49   6.97  10.26   5.39      5.78
 9         10.48   6.64   9.26   4.82      5.41
10         10.41   6.35   8.08   4.38      5.19
13         10.26   6.29   7.46   4.34      4.96

- South-east of Figure 16.8a, with sample size 200 (D(3))
- All training data, with sample size 8281

The test data set also has sample size 8281. The training region D(1) is relatively small, so we only consider radius r up to √13. Gaussian density functions with a common variance–covariance matrix are fitted to each training data set, and we obtain four types of posterior probabilities for each pixel in the training data sets and the test data set. Spatial AdaBoost derived from the four training data sets and spatial parallel boost derived from the region D(1) ∪ D(2) ∪ D(3) are applied to the test region shown in Figure 16.8a. Error rates due to the five classifiers examined with the test data of 8281 samples are compared in Table 16.10. Each row corresponds to spatial boost based on a radius r in {0, 1, . . . , √13}. The second through fifth columns correspond to spatial AdaBoost based on the four training data sets. The last column gives the error rate due to spatial parallel boost. We see that spatial AdaBoost exhibits a better performance as the training data size increases. The classifier based on D(3) seems exceptional, but it gives better results for large r. Spatial parallel boost exhibits a performance similar to that of spatial AdaBoost based on all the training data. Here, we note again that our method uses only 435 samples, whereas all the training data have 8281 samples. A noncontextual classification result based on spatial parallel boost is given in Figure 16.8b, and it is very poor. An excellent classification result is obtained as radius r becomes bigger. The result with r = √13, which is better than the results (c), (d), and (e) due to spatial AdaBoost using the three training data subsets, is given in Figure 16.8f. The error rates corresponding to figures (b) through (f) are given in Table 16.10 in bold.
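The per-pixel posterior probabilities under fitted homoscedastic Gaussian densities (the LDF setting used above) can be sketched as follows (function name and interface are ours):

```python
import numpy as np

def gaussian_posteriors(X, means, cov, priors):
    """Posterior probabilities p(k | x) under Gaussian class-conditional
    densities with a common covariance matrix `cov` (the LDF setting).

    X      : array (n, q) of feature vectors.
    means  : list of g mean vectors of length q.
    priors : list of g prior probabilities.
    """
    covi = np.linalg.inv(cov)
    # log p(x | k) + log prior, up to a constant shared by all classes
    scores = np.stack(
        [-0.5 * np.einsum('ij,jk,ik->i', X - m, covi, X - m) + np.log(p)
         for m, p in zip(means, priors)], axis=1)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    post = np.exp(scores)
    return post / post.sum(axis=1, keepdims=True)
```

Fitting one such model per training region D(s) yields the "four types of posterior probabilities" the parallel boost procedure combines.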

16.8 Conclusion

We have considered contextual classifiers: Switzer's smoothing method, MRF-based methods, and spatial boosting methods. We showed that the last two methods can be regarded as extensions of Switzer's method. The MRF-based method is known to be efficient, and it exhibited the best performance in the numerical studies given here. However, the method requires extra computation time for (1) the estimation of parameters

specifying the MRF and (2) the estimation of test labels, even if the ICM procedure is utilized for (2). Both estimations are accomplished by iterative procedures. The method has already been fully discussed in the literature.

The following are our remarks on spatial boosting and its possible future development. Spatial boosting can be derived as long as posterior probabilities are available. Hence, the posteriors need not be defined by statistical models; posteriors due to statistical support vector machines (SVM) [22,23] could be utilized. Spatial boosting is derived under fewer assumptions than those of an MRF-based method. Features of spatial boosting are as follows:

(a) Various types of posteriors can be implemented. Pairwise coupling [24] is a popular strategy for multi-class classification that involves estimating posteriors for each pair of classes and then coupling the estimates together.
(b) Directed neighborhoods can be implemented (see Figure 16.9); they would be suitable for classification of textured images.
(c) The method is applicable to situations with multi-source and multi-temporal data sets.
(d) Classification is performed non-iteratively, and the performance is similar to that of MRF-based classifiers.

The following new questions arise:

- How do we define the posteriors? This is closely related to feature (a).
- How do we choose the loss function?
- How do we determine the maximum radius for incorporating log posteriors into the classification function?

The last question is an old one in boosting: how many classification functions are sufficient for classification? Spatial boosting should be used in cases where estimation of coefficients is performed through test data only. This is a first step toward possible development into unsupervised classification. In addition, the method will be useful for unmixing. Originally, classification function F defined by Equation 16.30 is given by a convex combination of averaged

FIGURE 16.9 Directed subsets with center pixel i: (a), (b) r = 1; (c), (d) r = √2.


log posteriors. Therefore, F would be useful for estimation of posterior probabilities, which is directly applicable to unmixing.
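One plausible way to turn F back into posterior estimates (a sketch of our own construction, not the chapter's stated procedure) is to undo the convex combination's scaling, exponentiate, and renormalize so that each pixel carries a probability vector usable as abundance estimates in unmixing:

```python
import numpy as np

def posteriors_from_F(F_scores, weights):
    """Recover posterior probability estimates from F = sum_r b_r f_r,
    where each f_r is an averaged log posterior (sketch; this particular
    renormalization is an assumption).

    F_scores : array (..., g) of per-pixel, per-class values of F.
    weights  : the coefficients b_0, ..., b_r.
    """
    scores = np.asarray(F_scores) / float(np.sum(weights))
    scores = scores - scores.max(axis=-1, keepdims=True)  # stability
    p = np.exp(scores)
    return p / p.sum(axis=-1, keepdims=True)
```

When r = 0 and b0 = 1, F is exactly the log posterior, and the construction returns the original posterior probabilities unchanged.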

Acknowledgment Data sets grss_dfc_0006 and grss_dfc_0009 were provided by the IEEE GRS-S Data Fusion Committee.

References

1. Switzer, P. (1980), Extensions of linear discriminant analysis for statistical classification of remotely sensed satellite imagery, Mathematical Geology, 12(4), 367–376.
2. Nishii, R. and Eguchi, S. (2005), Supervised image classification by contextual AdaBoost based on posteriors in neighborhoods, IEEE Transactions on Geoscience and Remote Sensing, 43(11), 2547–2554.
3. Jain, A.K., Duin, R.P.W., and Mao, J. (2000), Statistical pattern recognition: a review, IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-22, 4–37.
4. Besag, J. (1986), On the statistical analysis of dirty pictures, Journal of the Royal Statistical Society, Series B, 48, 259–302.
5. Duda, R.O., Hart, P.E., and Stork, D.G. (2000), Pattern Classification (2nd ed.), John Wiley & Sons, New York.
6. Geman, S. and Geman, D. (1984), Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6, 721–741.
7. McLachlan, G.J. (2004), Discriminant Analysis and Statistical Pattern Recognition (2nd ed.), John Wiley & Sons, New York.
8. Chilès, J.P. and Delfiner, P. (1999), Geostatistics, John Wiley & Sons, New York.
9. Cressie, N. (1993), Statistics for Spatial Data, John Wiley & Sons, New York.
10. Freund, Y. and Schapire, R.E. (1997), A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences, 55(1), 119–139.
11. Friedman, J., Hastie, T., and Tibshirani, R. (2000), Additive logistic regression: a statistical view of boosting (with discussion), Annals of Statistics, 28, 337–407.
12. Hastie, T., Tibshirani, R., and Friedman, J. (2001), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, New York.
13. Benediktsson, J.A. and Kanellopoulos, I. (1999), Classification of multisource and hyperspectral data based on decision fusion, IEEE Transactions on Geoscience and Remote Sensing, 37, 1367–1377.
14. Melgani, F. and Bruzzone, L. (2004), Classification of hyperspectral remote sensing images with support vector machines, IEEE Transactions on Geoscience and Remote Sensing, 42(8), 1778–1790.
15. Nishii, R. (2003), A Markov random field-based approach to decision level fusion for remote sensing image classification, IEEE Transactions on Geoscience and Remote Sensing, 41(10), 2316–2319.
16. Nishii, R. and Eguchi, S. (2002), Image segmentation by structural Markov random fields based on Jeffreys divergence, Research Memorandum, Institute of Statistical Mathematics, No. 849.
17. Nishii, R. and Eguchi, S. (2005), Robust supervised image classifiers by spatial AdaBoost based on robust loss functions, Proc. SPIE, Vol. 5982.
18. Eguchi, S. and Copas, J. (2002), A class of logistic-type discriminant functions, Biometrika, 89, 1–22.
19. Takenouchi, T. and Eguchi, S. (2004), Robustifying AdaBoost by adding the naive error rate, Neural Computation, 16, 767–787.


20. IEEE GRSS Data Fusion reference database, Data sets GRSS_DFC_0006 and GRSS_DFC_0009, Online: http://www.dfc-grss.org/, 2001.
21. Giacinto, G., Roli, F., and Bruzzone, L. (2000), Combination of neural and statistical algorithms for supervised classification of remote-sensing images, Pattern Recognition Letters, 21, 385–397.
22. Kwok, J.T. (2000), The evidence framework applied to support vector machines, IEEE Transactions on Neural Networks, 11(5), 1162–1173.
23. Wahba, G. (1999), Support vector machines, reproducing kernel Hilbert spaces and the randomized GACV, in Advances in Kernel Methods: Support Vector Learning, Schölkopf, B., Burges, C., and Smola, A., eds., MIT Press, Cambridge, MA, 69–88.
24. Hastie, T. and Tibshirani, R. (1998), Classification by pairwise coupling, Annals of Statistics, 26(2), 451–471.

17 Unsupervised Change Detection in Multi-Temporal SAR Images

Lorenzo Bruzzone and Francesca Bovolo

CONTENTS
17.1 Introduction ..... 373
17.2 Change Detection in Multi-Temporal Remote-Sensing Images: Literature Survey ..... 376
     17.2.1 General Overview ..... 376
     17.2.2 Change Detection in SAR Images ..... 379
            17.2.2.1 Preprocessing ..... 379
            17.2.2.2 Multi-Temporal Image Comparison ..... 380
            17.2.2.3 Analysis of the Ratio and Log-Ratio Image ..... 381
17.3 Advanced Approaches to Change Detection in SAR Images: A Detail-Preserving Scale-Driven Technique ..... 383
     17.3.1 Multi-Resolution Decomposition of the Log-Ratio Image ..... 385
     17.3.2 Adaptive Scale Identification ..... 387
     17.3.3 Scale-Driven Fusion ..... 388
17.4 Experimental Results and Comparisons ..... 390
     17.4.1 Data Set Description ..... 390
     17.4.2 Results ..... 392
17.5 Conclusions ..... 396
Acknowledgments ..... 397
References ..... 397

17.1 Introduction

The recent natural disasters (e.g., tsunami, hurricanes, eruptions, earthquakes, etc.) and the increasing amount of anthropogenic changes (e.g., due to wars, pollution, etc.) gave prominence to the topics related to environment monitoring and damage assessment. The study of environmental variations due to the time evolution of the above phenomena is of fundamental interest from a political point of view. In this context, the development of effective change-detection techniques capable of automatically identifying land-cover


variations occurring on the ground by analyzing multi-temporal remote-sensing images assumes an important relevance for both the scientific community and the end-users. The change-detection process considers images acquired at different times over the same geographical area of interest. These images acquired from repeat-pass satellite sensors are an effective input for addressing change-detection problems. Several different Earth-observation satellite missions are currently operative, with different kinds of sensors mounted on board (e.g., MODIS and ASTER on board NASA's TERRA satellite, MERIS and ASAR on board ESA's ENVISAT satellite, Hyperion on board NASA's EO-1 satellite, SAR sensors on board CSA's RADARSAT-1 and RADARSAT-2 satellites, and the Ikonos and Quickbird satellites that acquire very high resolution panchromatic and multi-spectral (MS) images). Each sensor has specific properties with respect to the image acquisition mode (e.g., passive or active) and its geometrical, spectral, and radiometric resolutions. In the development of automatic change-detection techniques, it is mandatory to take into account the properties of the sensors to properly extract information from the considered data. Let us discuss the main characteristics of the different kinds of sensors in detail (Table 17.1 summarizes some advantages and disadvantages of different sensors for change-detection applications according to their characteristics). Images acquired from passive sensors are obtained by measuring the land-cover reflectance on the basis of the energy emitted from the sun and reflected from the ground.¹ Usually, the measured signal can be modeled as the desired reflectance (measured as a radiance) altered by additive Gaussian noise. This noise model enables relatively easy processing of the signal when designing data analysis techniques.
Passive sensors can acquire two different kinds of images [panchromatic (PAN) images and MS images] by defining different trade-offs between geometrical and spectral resolutions according to the radiometric resolution of the adopted detectors. PAN images are characterized by poor spectral resolution but very high geometrical resolution, whereas MS images have medium geometrical resolution but high spectral resolution. From the perspective of change detection, PAN images should be used when the expected size of the changed area is too small for adopting MS data. For example, in the case of the analysis of changes in urban areas, where detailed urban studies should be carried out, change detection in PAN images requires the definition of techniques capable of capturing the richness of information present both in the spatial-context relations between neighboring pixels and in the geometrical shapes of objects. MS data should be used

TABLE 17.1
Advantages and Disadvantages of Different Kinds of Sensors for Change-Detection Applications

Multispectral (passive)
  Advantages: characterization of the spectral signature of land-covers; the noise has an additive model.
  Disadvantages: atmospheric conditions strongly affect the acquisition phase.

Panchromatic (passive)
  Advantages: high geometrical resolution; high content of spatial-context information.
  Disadvantages: atmospheric conditions strongly affect the acquisition phase; poor characterization of the spectral signature of land-covers.

SAR (active)
  Advantages: not affected by sunlight and atmospheric conditions.
  Disadvantages: complexity of data preprocessing; presence of multiplicative speckle noise.

¹ Also, the emission of Earth affects the measurements in the infrared portion of the spectrum.


when a medium geometrical resolution (i.e., 10–30 m) is sufficient for characterizing the size of the changed areas and a detailed modeling of the spectral signature of the landcovers is necessary for identifying the change investigated. Change-detection methods in MS images should be able to properly exploit the available MS information in the change detection process. A critical problem related to the use of passive sensors in change detection consists in the sensitivity of the image-acquisition phase to atmospheric conditions. This problem has two possible effects: (1) atmospheric conditions may not be conducive to measure land-cover spectral signatures, which depends on the presence of clouds; and (2) variations in illumination and atmospheric conditions at different acquisition times may be a potential source of errors, which should be taken into account to avoid the identification of false changes (or the missed detection of true changes). The working principle of active synthetic aperture radar (SAR) sensors is completely different from that of the passive ones and allows overcoming some of the drawbacks that affect optical images. The signal measured by active sensors is the Earth backscattering of an electromagnetic pulse emitted from the sensor itself. SAR instruments acquire different kinds of signals that result in different images: medium or high-resolution images, singlefrequency or multi-frequency, and single-polarimetric or fully polarimetric images. As for optical data, the proper geometrical resolution should be chosen according to the size of the expected investigated changes. The SAR signal has different geometrical resolutions and a different penetration capability depending on the signal wavelength, which is usually included between band X and band P (i.e., between 2 and 100 cm). 
In other words, shorter wavelengths should be used for measuring vegetation changes, and longer, more penetrating wavelengths for studying changes that have occurred on or under the terrain. The wavelengths adopted for SAR sensors suffer neither from atmospheric and sunlight conditions nor from the presence of clouds; thus multi-temporal radar backscattering does not change with atmospheric conditions. The main problem related to the use of active sensors is the coherent nature of the SAR signal, which results in a multiplicative speckle noise that makes the acquired data intrinsically complex to analyze. A proper handling of speckle requires both an intensive preprocessing phase and the development of effective data-analysis techniques. The different properties and statistical behaviors of signals acquired by active and passive sensors require the definition of different change-detection techniques capable of properly exploiting the specific data peculiarities. In the literature, many different techniques for change detection in images acquired by passive sensors have been presented [1–8], and many applications of these techniques have been reported. This is because of both the amount of information present in MS images and the relative simplicity of data analysis, which results from the additive noise model adopted for MS data (the radiance of natural classes can be approximated with a Gaussian distribution). Less attention has been devoted to change detection in SAR images. This is explained by the intrinsic complexity of SAR data, which require both an intensive preprocessing phase and the development of effective data-analysis techniques capable of dealing with multiplicative speckle noise. Nonetheless, in the past few years the remote-sensing community has shown increasing interest in the use of SAR images for change-detection problems, owing to their independence from atmospheric conditions, which results in excellent operational properties.
The recent technological developments in sensors and satellites have resulted in the design of more sophisticated systems with increased geometrical resolution. Apart from the active or passive nature of the sensor, the very high geometrical resolution images acquired by these systems (e.g., PAN images) require the development of specific techniques capable of taking advantage of the richness of the geometrical information they contain. In particular, both the high correlation


between neighboring pixels and the object shapes should be considered in the design of data-analysis procedures. In the above-mentioned context, two main challenging issues of particular interest in the development of automatic change-detection techniques are: (1) the definition of advanced and effective techniques for change detection in SAR images, and (2) the development of proper methods for the detection of changes in very high geometrical resolution images. A solution to both issues lies in the definition of multi-scale and multi-resolution change-detection techniques, which can properly analyze the different components of the change signal at their optimal scales². On the one hand, the multi-scale analysis allows one to better handle the noise present in medium-resolution SAR images, resulting in the possibility of obtaining accurate change-detection maps characterized by a high spatial fidelity. On the other hand, multi-scale approaches are intrinsically suitable for exploiting the information present in very high geometrical resolution images through an effective modeling (at different resolution levels) of the different objects present in the scene. According to the analysis mentioned above, after a brief survey on change detection and on unsupervised change detection in SAR images, we present in this chapter a novel adaptive multi-scale change-detection technique for multi-temporal SAR images. This technique exploits a proper scale-driven analysis to obtain both a high sensitivity to geometrical features (i.e., details and borders of changed areas are well preserved) and a high robustness to noisy speckle components in homogeneous areas. Although explicitly developed and tested for change detection in medium-resolution SAR images, this technique can be easily extended to the analysis of very high geometrical resolution images. The chapter is organized into five sections.
Section 17.2 defines the change-detection problem in multi-temporal remote-sensing images and focuses attention on unsupervised techniques for multi-temporal SAR images. Section 17.3 presents a multi-scale approach to change detection in multi-temporal SAR images recently developed by the authors. Section 17.4 gives an example of the application of the proposed multi-scale technique to a real multi-temporal SAR data set and compares the effectiveness of the presented method with those of standard single-scale change-detection techniques. Finally, in Section 17.5, results are discussed and conclusions are drawn.
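As a rough numerical illustration of the scale-driven idea (a hedged sketch, not the adaptive technique presented later in this chapter), the log-ratio image of two simulated multi-look intensity images can be decomposed into progressively smoother scale levels by Gaussian filtering; coarse levels suppress speckle in homogeneous areas while fine levels preserve borders. The image size, number of looks, and filter widths below are illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def multiscale_levels(log_ratio, n_levels=4, sigma0=1.0):
    """Decompose a log-ratio image into progressively smoother scale levels.

    Level 0 keeps full detail (borders of changed areas); higher levels
    average out more speckle in homogeneous regions.
    """
    return [gaussian_filter(log_ratio, sigma=sigma0 * 2 ** k)
            for k in range(n_levels)]

rng = np.random.default_rng(0)
L = 4                                   # equivalent number of looks (assumed)
mu1 = np.full((64, 64), 10.0)           # mean backscattering at time t1
mu2 = mu1.copy()
mu2[20:40, 20:40] *= 4.0                # a changed square: backscatter x4 at t2
x1 = rng.gamma(shape=L, scale=mu1 / L)  # multi-look intensity images
x2 = rng.gamma(shape=L, scale=mu2 / L)
levels = multiscale_levels(np.log(x2 / x1))
```

At the coarsest level the changed square stands out clearly against the speckle background, while the finest level retains its borders; a scale-driven combination of such levels is the idea pursued in Section 17.3.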

17.2 Change Detection in Multi-Temporal Remote-Sensing Images: Literature Survey

17.2.1 General Overview

A very important preliminary step in the development of a change-detection system based on automatic or semi-automatic procedures is the design of a proper data-collection phase. This phase aims at defining: (1) the kind of satellite to be used (on the basis of the repetition time and of the characteristics of the sensors mounted on board), (2) the kind of sensor to be considered (on the basis of the desired properties of the images and of the system), (3) the end-user requirements (which are of basic importance for the development of a proper change-detection

² It is worth noting that these kinds of approaches have been successfully exploited in image classification problems [9–12].


technique), and (4) the kinds of available ancillary data (all the available information that can be used for constraining the change-detection procedure). The outputs of the data-collection phase should be used for defining the automatic change-detection technique. In the literature, many different techniques have been proposed. We can distinguish between two main categories: supervised and unsupervised methods [9,13]. When performing supervised change detection, in addition to the multi-temporal images, multi-temporal ground-truth information is also needed. This information is used for identifying, for each possible land-cover class, spectral signature samples for performing supervised data classification, and also for explicitly identifying what kinds of land-cover transitions have taken place. Three main general approaches to supervised change detection can be found in the literature: postclassification comparison, supervised direct multi-data classification [13], and compound classification [14–16]. Postclassification comparison computes the change-detection map by comparing the classification maps obtained by independently classifying two multi-temporal remote-sensing images. On the one hand, this procedure avoids the data normalization aimed at reducing the effects of different atmospheric conditions, sensor differences, etc., between the two acquisitions; on the other hand, it critically depends on the accuracies of the classification maps computed at the two acquisition dates. As postclassification comparison does not take into account the dependence existing between two images of the same area acquired at two different times, the global accuracy is close to the product of the accuracies yielded at the two times [13]. Supervised direct multi-data classification [13] performs change detection by considering each possible transition (according to the available a priori information) as a class and by training a classifier to recognize the transitions.
Although this method exploits the temporal correlation between images in the classification process, its major drawback is that training pixels should be related to the same points on the ground at the two times and should accurately represent the proportions of all the transitions in the whole images. Compound classification overcomes this drawback of the supervised direct multi-data classification technique by removing the constraint that training pixels be related to the same area on the ground [14–16]. In general, the approach based on supervised classification is more accurate and detailed than the unsupervised one; nevertheless, the latter approach is often preferred in real-data applications. This is due to the difficulties in collecting proper ground-truth information (necessary for supervised techniques), which is a complex, time-consuming, and expensive process (in many cases this process is not consistent with the application constraints). Unsupervised change-detection techniques are based on the comparison of the spectral reflectances of multi-temporal raw images and a subsequent analysis of the comparison output. In the literature, the most widely used unsupervised change-detection techniques are based on a three-step procedure [13,17]: (1) preprocessing, (2) pixel-by-pixel comparison of two raw images, and (3) image analysis and thresholding (Figure 17.1). The aim of the preprocessing step is to make the two considered images as comparable as possible. In general, preprocessing operations include: co-registration, radiometric and geometric corrections, and noise reduction. From the practical point of view, co-registration is a fundamental step, as it allows obtaining a pair of images where corresponding pixels are associated with the same position on the ground³. Radiometric corrections reduce differences between the two acquisitions due to sunlight and atmospheric conditions.
These procedures are applied to optical images, but they are not necessary for SAR

³ It is worth noting that usually it is not possible to obtain a perfect alignment between temporal images. This may considerably affect the change-detection process [18]. Consequently, if the amount of residual misregistration noise is significant, proper techniques aimed at reducing its effects should be used for change detection [1,4].


[FIGURE 17.1. Block scheme of a standard unsupervised change-detection approach: the two SAR images (dates t1 and t2) are preprocessed and compared, and the log-ratio image is analyzed to produce the change-detection map M.]

images (as SAR data are not affected by atmospheric conditions). Noise reduction is also performed differently according to the kind of remote-sensing images considered: in optical images common low-pass filters can be used, whereas in SAR images proper despeckling filters should be applied. The comparison step aims at producing a further image in which differences between the two considered acquisitions are highlighted. Different mathematical operators can be adopted for performing image comparison (see Table 17.2 for a summary); this choice gives rise to different kinds of techniques [13,19–23]. One of the most widely used operators is the difference. The difference can be applied to: (1) a single spectral band (univariate image differencing) [13,21–23], (2) multiple spectral bands (change vector analysis) [13,24], and (3) vegetation indices (vegetation index differencing) [13,19] or other linear (e.g., tasselled cap transformation [22]) or nonlinear combinations of spectral bands. Another widely used operator is the ratio (image ratioing) [13], which can be successfully applied to SAR image processing [17,25,26]. A different approach is based on the use of principal component analysis (PCA) [13,20,23]. PCA can be applied separately to the feature space at single times or jointly to both images. In the first case, comparison should be performed in the transformed feature space before performing change detection; in the second case, the minor components of the transformed feature space contain the change information.

TABLE 17.2
Summary of the Most Widely Used Comparison Operators (fk is the considered feature at time tk, which can be: (1) a single spectral band Xk^b, (2) a vector of m spectral bands [Xk^1, ..., Xk^m], (3) a vegetation index Vk, or (4) a vector of features [Pk^1, ..., Pk^n] obtained after PCA; XD and XR are the images after comparison with the difference or ratio operators, respectively)

Technique                      | Feature Vector fk at Time tk | Comparison Operator
Univariate image differencing  | fk = Xk^b                    | XD = f2 - f1
Vegetation index differencing  | fk = Vk                      | XD = f2 - f1
Image ratioing                 | fk = Xk^b                    | XR = f2 / f1
Change vector analysis         | fk = [Xk^1, ..., Xk^m]       | XD = ||f2 - f1||
Principal component analysis   | fk = [Pk^1, ..., Pk^n]       | XD = ||f2 - f1||
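Assuming the images are available as NumPy arrays (a hypothetical minimal setup; the band indexing and toy values are illustrative), the operators in Table 17.2 reduce to a few array expressions:

```python
import numpy as np

def univariate_difference(b1, b2):
    # Univariate image differencing: X_D = f2 - f1 on a single spectral band
    return b2 - b1

def image_ratioing(b1, b2):
    # Image ratioing: X_R = f2 / f1 (well suited to SAR intensity data)
    return b2 / b1

def change_vector_magnitude(f1, f2):
    # Change vector analysis: X_D = ||f2 - f1|| over m bands,
    # with f1 and f2 of shape (rows, cols, m)
    return np.linalg.norm(f2 - f1, axis=-1)

rng = np.random.default_rng(1)
t1 = rng.uniform(1.0, 2.0, size=(8, 8, 3))   # toy 3-band image at time t1
t2 = t1.copy()
t2[2:4, 2:4, :] += 1.0                        # simulated change in a 2x2 patch
xd = univariate_difference(t1[..., 0], t2[..., 0])
xr = image_ratioing(t1[..., 0], t2[..., 0])
cv = change_vector_magnitude(t1, t2)
```

Unchanged pixels give XD = 0, XR = 1, and a zero change-vector magnitude; the changed patch yields XD = 1 and a change-vector magnitude of √3.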


The performance of the above-mentioned techniques can be degraded by several factors (such as differences in illumination at the two dates, in atmospheric conditions, and in sensor calibration) that make a direct comparison between raw images acquired at different times difficult. These problems related to unsupervised change detection disappear when dealing with SAR images instead of optical data. Once image comparison has been performed, the decision threshold can be selected either with a manual trial-and-error procedure (according to the desired trade-off between false and missed alarms) or with automatic techniques (e.g., by analyzing the statistical distribution of the image obtained after comparison, by fixing the desired false-alarm probability [27,28], or by following a Bayesian minimum-error decision rule [17]). Since the remote-sensing community has devoted more attention to passive sensors [6–8] than to active SAR sensors, in the following section we focus our attention on change-detection techniques for SAR data. Although change-detection techniques based on different architectures have been proposed for SAR images [29–37], we focus on the most widely used techniques, which are based on the three-step procedure described above (see Figure 17.1).

17.2.2 Change Detection in SAR Images

Let us consider two co-registered intensity SAR images, X1 = {X1(i,j), 1 ≤ i ≤ I, 1 ≤ j ≤ J} and X2 = {X2(i,j), 1 ≤ i ≤ I, 1 ≤ j ≤ J}, of size I × J, acquired over the same area at different times t1 and t2. Let Ω = {ωc, ωu} be the set of classes associated with changed and unchanged pixels. Let us assume that no ground-truth information is available for the design of the change-detection algorithm, i.e., the statistical analysis of the change and no-change classes should be performed only on the basis of the raw data. The change-detection process aims at generating a change-detection map representing the changes on the ground between the two considered acquisition dates. In other words, one of the possible labels in Ω should be assigned to each pixel (i,j) in the scene.

17.2.2.1 Preprocessing

The first step for properly performing change detection based on direct image comparison is image preprocessing. This procedure aims at generating two images that are as similar as possible except in the changed areas. As SAR data are not corrupted by differences in atmospheric and sunlight conditions, preprocessing usually comprises three steps: (1) geometric correction, (2) co-registration, and (3) noise reduction. The first procedure aims at reducing distortions that are strictly related to the active nature of the SAR signal, such as layover, foreshortening, and shadowing due to ground topography. The second step is very important, as it aligns the multi-temporal images to ensure that corresponding pixels in the spatial domain are associated with the same geographical position on the ground. Co-registration of SAR images is usually carried out by maximizing the cross-correlation between the multi-temporal images [38,39]. The major drawback of this process is the need for interpolating backscattering values, which is a time-consuming operation. Finally, the last step is aimed at reducing the speckle noise. Many different techniques have been developed in the literature for reducing speckle. One of the most attractive techniques for speckle reduction is multi-looking [25]. This procedure, which is used for generating images with the same resolution along the azimuth and range directions, reduces the effect of the coherent speckle components. However, a further filtering step is usually applied to make the images suitable for the desired analysis. Usually, adaptive despeckling procedures are applied. Among these procedures we mention the following filtering techniques:

380

Signal and Image Processing for Remote Sensing

Frost [40], Lee [41], Kuan [42], Gamma MAP [43,44], and Gamma WMAP [45] (i.e., the Gamma MAP filter applied in the wavelet domain). As the description of despeckling filters is outside the scope of this chapter, we refer the reader to the literature for more details.
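Of the filters just listed, the Lee filter [41] admits a compact local-statistics formulation. The sketch below is a simplified, hedged version (boxcar windows, intensity data; the window size and look number are assumed), not a reference implementation:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def lee_filter(img, size=7, looks=4):
    """Simplified Lee despeckling filter based on local statistics.

    Pixels in homogeneous areas are replaced by the local mean; where the
    local variation exceeds what speckle alone would explain (edges,
    point targets), the original value is increasingly preserved.
    """
    mean = uniform_filter(img, size)
    sq_mean = uniform_filter(img ** 2, size)
    var = np.maximum(sq_mean - mean ** 2, 0.0)
    cu2 = 1.0 / looks                            # speckle CV^2 for L looks
    ci2 = var / np.maximum(mean ** 2, 1e-12)     # local CV^2 of the image
    w = np.clip(1.0 - cu2 / np.maximum(ci2, 1e-12), 0.0, 1.0)
    return mean + w * (img - mean)

rng = np.random.default_rng(3)
clean = np.full((64, 64), 5.0)                   # homogeneous test scene
speckled = clean * rng.gamma(shape=4, scale=1.0 / 4, size=clean.shape)
filtered = lee_filter(speckled, size=7, looks=4)
```

On this homogeneous scene the filter collapses toward the local mean, sharply reducing the variance while preserving the mean backscattering level.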

17.2.2.2 Multi-Temporal Image Comparison

As described in Section 17.1, image pixel-by-pixel comparison can be performed by means of different mathematical operators. In general, the most widely used operators are the difference and the ratio (or log-ratio). Depending on the selected operator, the image resulting from the comparison presents different behaviors with respect to the change-detection problem and to the signal statistics. To analyze this issue, let us consider two multi-look intensity images. It is possible to show that the measured backscattering of each image follows a Gamma distribution [25,26], that is,

p(X_k) = \frac{L^L X_k^{L-1}}{\mu_k^L (L-1)!} \exp\left(-\frac{L X_k}{\mu_k}\right), \qquad k = 1, 2    (17.1)

where X_k is a random variable that represents the value of the pixels in image X_k (k = 1, 2), \mu_k is the average intensity of a homogeneous region at time t_k, and L is the equivalent number of looks (ENL) of the considered image. Let us also assume that the intensity images X_1 and X_2 are statistically independent. This assumption, even if not entirely realistic, simplifies the analytical derivation of the pixel statistical distribution in the image after comparison. In the following, we analyze the effects of the use of the difference and ratio (log-ratio) operators on the statistical distributions of the signal.

17.2.2.2.1 Difference Operator

The difference image X_D is computed by subtracting the image acquired before the change from the image acquired after it, i.e.,

X_D = X_2 - X_1    (17.2)

Under the stated conditions, the distribution of the difference image X_D is given by [25,26]:

p(X_D) = \frac{L^L \exp\left(-\frac{L X_D}{\mu_2}\right)}{(L-1)!\,(\mu_1 + \mu_2)^L} \sum_{j=0}^{L-1} \frac{(L-1+j)!}{j!\,(L-1-j)!}\, X_D^{L-1-j} \left[\frac{\mu_1 \mu_2}{L(\mu_1 + \mu_2)}\right]^{j}    (17.3)

where X_D is a random variable that represents the values of the pixels in X_D. As can be seen, the difference-image distribution depends not only on the relative change between the intensity values in the two images but also on a reference intensity value (i.e., the intensity at t1 or t2). It is possible to show that the variance of the distribution of X_D increases with the reference intensity level. From a practical point of view, this leads to a higher change-detection error for changes that occurred in high-intensity regions of the image than in low-intensity regions. Although in some applications the difference operator has been used with SAR data [46], this behavior is an undesired effect that renders the difference operator intrinsically unsuited to the statistics of SAR images.
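Both properties are easy to check numerically (an illustrative sketch; the number of looks, mean levels, and sample sizes are assumed): an L-look intensity drawn from Equation 17.1 has mean μ and variance μ²/L, and the spread of the difference image grows with the reference intensity even when no change has occurred:

```python
import numpy as np

rng = np.random.default_rng(4)
L, n = 4, 200_000

def multilook_intensity(mu, size):
    # Equation 17.1: Gamma with shape L and scale mu / L,
    # hence mean mu and variance mu**2 / L
    return rng.gamma(shape=L, scale=mu / L, size=size)

x = multilook_intensity(10.0, n)
est_mean, est_var = x.mean(), x.var()     # expected: ~10 and ~10**2 / 4 = 25

def diff_spread(mu):
    # Difference image (Equation 17.2) over a homogeneous, unchanged area:
    # its standard deviation scales linearly with the intensity level mu
    return (multilook_intensity(mu, n) - multilook_intensity(mu, n)).std()

low = diff_spread(1.0)     # spread of X_D in a low-intensity region
high = diff_spread(50.0)   # spread of X_D in a high-intensity region
```

This is the practical meaning of the intensity dependence in Equation 17.3: with a fixed threshold on X_D, high-intensity regions generate far more false alarms than low-intensity ones.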

17.2.2.2.2 Ratio Operator

The ratio image X_R is computed by dividing the image acquired after the change by the image acquired before it (or vice versa), i.e.,

X_R = X_2 / X_1    (17.4)

It is possible to prove that the distribution of the ratio image X_R can be written as follows [25,26]:

p(X_R) = \frac{(2L-1)!}{[(L-1)!]^2}\, \frac{\bar{X}_R^{L} X_R^{L-1}}{(\bar{X}_R + X_R)^{2L}}    (17.5)

where X_R is a random variable that represents the values of the pixels in X_R and \bar{X}_R is the true change in the radar cross section. The ratio operator shows two main advantages over the difference operator. The first is that the ratio-image distribution depends only on the relative change \bar{X}_R = \mu_2 / \mu_1 in the average intensity between the two dates and not on a reference intensity level; thus changes are detected in the same manner in both high- and low-intensity regions. The second advantage is that ratioing allows a reduction of common multiplicative error components (which are due both to multiplicative sensor calibration errors and to the multiplicative effects of the interaction of the coherent signal with the terrain geometry [25,47]), as far as these components are the same for images acquired with the same geometry. It is worth noting that, in the literature, the ratio image is usually expressed in a logarithmic scale. With this operation, the distributions of the two classes of interest (ωc and ωu) in the ratio image can be made more symmetrical, and the residual multiplicative speckle noise can be transformed into an additive noise component [17]. Thus the log-ratio operator is typically preferred when dealing with SAR images, and change detection is performed by analyzing the log-ratio image X_LR, defined as:

X_LR = \log X_R = \log \frac{X_2}{X_1} = \log X_2 - \log X_1    (17.6)

Based on the above considerations, the ratio and log-ratio operators are more widely used than the difference operator in SAR change-detection applications [17,26,29,47–49]. It is worth noting that, in order to keep the changed class on one side of the histogram of the ratio (or log-ratio) image, a normalized ratio can be computed pixel by pixel, i.e.,

X_NR = \min\left(\frac{X_1}{X_2}, \frac{X_2}{X_1}\right)    (17.7)

This operator allows all changed areas (independently of the increasing or decreasing value of the backscattering coefficient) to play a similar role in the change-detection problem.
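In code form (assuming NumPy arrays of strictly positive intensities; the change factors below are illustrative), Equation 17.6 and Equation 17.7 become:

```python
import numpy as np

def log_ratio(x1, x2):
    # Equation 17.6: X_LR = log(X2 / X1) = log X2 - log X1
    return np.log(x2) - np.log(x1)

def normalized_ratio(x1, x2):
    # Equation 17.7: X_NR = min(X1 / X2, X2 / X1), so increased and
    # decreased backscattering fall on the same side of the histogram
    return np.minimum(x1 / x2, x2 / x1)

rng = np.random.default_rng(6)
x1 = rng.gamma(shape=4, scale=10.0 / 4, size=(32, 32))  # 4-look intensity at t1
x2 = x1.copy()
x2[:16] *= 5.0        # upper half: backscatter increase
x2[16:] /= 5.0        # lower half: backscatter decrease
xlr = log_ratio(x1, x2)
xnr = normalized_ratio(x1, x2)
```

In X_LR the two kinds of change sit symmetrically at ±log 5, while X_NR maps both to the same value 1/5, keeping all changed pixels in one tail of the histogram.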

17.2.2.3 Analysis of the Ratio and Log-Ratio Image

The most widely used approach to extracting change information from the ratio and log-ratio image is based on histogram thresholding⁴. In this context, the most difficult task is to properly define the threshold

⁴ For simplicity, in the following, we will refer to the log-ratio image.
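As one concrete way to pick the threshold automatically (a hedged sketch: Otsu's between-class-variance criterion is used here as a simple stand-in, not the Bayesian minimum-error rule of [17]; the bimodal test data are simulated):

```python
import numpy as np

def otsu_threshold(x, bins=256):
    """Histogram-based threshold maximizing the between-class variance.

    A simple automatic alternative to manual trial-and-error selection;
    model-based rules instead fit the two class densities explicitly.
    """
    hist, edges = np.histogram(x.ravel(), bins=bins)
    p = hist / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(p)                 # probability mass below each candidate
    w1 = 1.0 - w0
    m = np.cumsum(p * centers)        # cumulative first moments
    mt = m[-1]                        # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mt * w0 - m) ** 2 / (w0 * w1)
        k = np.nanargmax(between[:-1])
    return centers[k]

rng = np.random.default_rng(2)
# Bimodal log-ratio-like sample: unchanged pixels around 0, changed around 1.4
unchanged = rng.normal(0.0, 0.3, size=5000)
changed = rng.normal(1.4, 0.3, size=500)
thr = otsu_threshold(np.concatenate([unchanged, changed]))
```

The returned threshold falls between the two modes, so nearly all changed samples lie above it and nearly all unchanged ones below it.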

Preface

Both signal processing and image processing are playing increasingly important roles in remote sensing. As most data from satellites are in image form, image processing has been used most often, whereas signal processing contributes significantly in extracting information from the remotely sensed waveforms or time series data. In contrast to other books in this field, which deal almost exclusively with image processing for remote sensing, this book provides a good balance between the roles of signal processing and image processing in remote sensing. It focuses mainly on methodologies of signal processing and image processing in remote sensing. An emphasis is placed on the mathematical techniques that, we believe, will not change as much as sensor, software, and hardware technologies. Furthermore, the term "remote sensing" is not limited to the problems with data from satellite sensors. Other sensors that acquire data remotely are also considered. Another unique feature of this book is that it covers a broader scope of problems in remote sensing information processing than any other book in this area. The book is divided into two parts. Part I deals with signal processing for remote sensing and has 12 chapters. Part II deals with image processing for remote sensing and has 16 chapters. The chapters are written by leaders in the field. We are very fortunate to have Dr. Norden Huang, inventor of the Hilbert–Huang transform, along with Dr. Steven Long, write a chapter on the application of the normalized Hilbert transform to remote sensing problems, and to have Dr. Enders A. Robinson, who has made many major contributions to geophysical signal processing for over half a century, write a chapter on the basic problem of constructing seismic images by ray tracing. In Part I, following Chapter 1 by Drs.
Long and Huang, and my short Chapter 2 on the roles of statistical pattern recognition and statistical signal processing in remote sensing, we start from a very low end of the electromagnetic spectrum. Chapter 3 considers the classification of infrasound at a frequency range of 0.01 Hz to 10 Hz by using a parallel-bank neural network classifier and an 11-step feature selection process. The >90% correct classification rate is impressive for this kind of remote sensing data. Chapter 4 through Chapter 6 deal with seismic signal processing. Chapter 4 provides excellent physical insights on the steps for constructing digital seismic images. Even though the seismic image is an image, this chapter is placed in Part I as seismic signals start as waveforms. Chapter 5 considers the singular value decomposition of a matrix data set from scalar-sensor arrays, which is followed by an independent component analysis (ICA) step to relax the unjustified orthogonality constraint for the propagation vectors by imposing a stronger constraint of fourth-order independence of the estimated waves. With an initial focus on the use of ICA in seismic data and inspired by Dr. Robinson's lecture on seismic deconvolution at the 4th International Symposium, 2002, on "Computer Aided Seismic Analysis and Discrimination," Mr. Zhenhai Wang has examined approaches beyond ICA for improving seismic images. Chapter 6 is an effort to show that factor analysis, as an alternative to stacking, can play a useful role in removing some unwanted components in the data and thereby enhancing the subsurface structure as shown in the seismic images. Chapter 7 on Kalman filtering for improving detection of landmines using electromagnetic signals, which experience severe interference, is another remote sensing problem of higher interest in recent years. Chapter 8 is a representative time-series analysis problem on using meteorological and remote sensing indices to monitor vegetation moisture dynamics. Chapter 9 actually deals with the image data for a digital elevation model but is placed in Part I mainly because the prediction error (PE) filter originates from geophysical signal processing. The PE filter allows us to interpolate the missing parts of an image. The only chapter that deals with sonar data is Chapter 10, which shows that a simple blind source separation algorithm based on second-order statistics can be very effective in removing reverberations in active sonar data. Chapter 11 and Chapter 12 are excellent examples of using neural networks for the retrieval of physical parameters from the remote sensing data. Chapter 12 further provides a link between signal and image processing as the principal component analysis and image sharpening tools employed are exactly what are needed in Part II. With a focus on image processing of remote sensing images, Part II begins with Chapter 13, which is concerned with the physics and mathematical algorithms for determining the ocean surface parameters from synthetic aperture radar (SAR) images. Mathematically, the Markov random field (MRF) is one of the most useful models for the rich contextual information in an image. Chapter 14 provides a comprehensive treatment of MRF-based remote sensing image classification. Besides an overview of previous work, the chapter describes the methodological issues involved and presents results of the application of the technique to the classification of real (both single-date and multitemporal) remote sensing images. Although there are many studies on using an ensemble of classifiers to improve the overall classification performance, the random forest machine learning method for classification of hyperspectral and multisource data as presented in Chapter 15 is an excellent example of using new statistical approaches for improved classification with the remote sensing data.
Chapter 16 presents another machine learning method, AdaBoost, to obtain a robustness property in the classifier. The chapter further considers the relations among the contextual classifier, MRF-based methods, and spatial boosting. The following two chapters are concerned with different aspects of the change detection problem. Change detection is a uniquely important problem in remote sensing as the images acquired at different times over the same geographical area can be used in the areas of environmental monitoring, damage management, and so on. Chapter 17, after discussing change-detection methods for multitemporal SAR images, examines an adaptive scale-driven technique for change detection in medium-resolution SAR data. Chapter 18 evaluates a Wiener filter–based method, Mahalanobis distance, and subspace projection methods of change detection, as well as a higher order, nonlinear mapping using kernel-based machine learning concepts to improve the change detection performance as represented by receiver operating characteristic (ROC) curves. In recent years, ICA and related approaches have presented a new potential in remote sensing information processing. A challenging task underlying many hyperspectral imagery applications is decomposing a mixed pixel into a collection of reflectance spectra, called endmember signatures, and the corresponding abundance fractions. Chapter 19 presents a new method for unsupervised endmember extraction called vertex component analysis (VCA). The VCA algorithms presented have better or comparable performance as compared to two other techniques but require less computational complexity. Other useful ICA applications in remote sensing include feature extraction and speckle reduction of SAR images. Chapter 20 presents two different methods of SAR image speckle reduction using ICA, both making use of the FastICA algorithm.
In two-dimensional time series modeling, Chapter 21 makes use of a fractionally integrated autoregressive moving average (FARIMA) analysis to model the mean radial power spectral density of the sea SAR imagery. Long-range dependence models are used in addition to the fractional sea surface models for the simulation of the sea SAR image spectra at different sea states, with and without oil slicks at low computational cost.

Returning to the image classification problem, Chapter 22 deals with the topics of pixel classification using a Bayes classifier, region segmentation guided by morphology and the split-and-merge algorithm, region feature extraction, and region classification. Chapter 23 provides a tutorial presentation of different issues of data fusion for remote sensing applications. Data fusion can improve classification, and four multisensor classifiers are presented for decision-level fusion strategies. Beyond the currently popular transform techniques, Chapter 24 demonstrates that the Hermite transform can be very useful for noise reduction and image fusion in remote sensing. The Hermite transform is an image representation model that mimics some of the important properties of human visual perception, namely, local orientation analysis and the Gaussian derivative model of early vision. Chapter 25 likewise demonstrates the importance of image fusion for improving sea ice classification performance, using a backpropagation-trained neural network, linear discriminant analysis, and texture features. Chapter 26 is on the issue of accuracy assessment, for which the Bradley–Terry model is adopted. Chapter 27 is on land map classification using a support vector machine, which has become increasingly popular as an effective classifier. Land map classification assigns the surface of the Earth to categories such as water areas, forests, factories, or cities. Finally, with lossless data compression in mind, Chapter 28 focuses on an information-theoretic measure of the quality of multiband remotely sensed digital images. The procedure relies on the estimation of parameters of the noise model. Results on image sequences acquired by AVIRIS and ASTER imaging sensors offer an estimate of the information content of each spectral band.
With rapid technological advances in both sensor and processing technologies, a book of this nature can only capture a certain amount of current progress and results. However, if past experience offers any indication, the numerous mathematical techniques presented will give this volume a long-lasting value. The sister volumes of this book are the other two books I have edited: Information Processing for Remote Sensing and Frontiers of Remote Sensing Information Processing, published by World Scientific in 1999 and 2003, respectively. I am grateful to all the contributors of this volume for their important contributions and, in particular, to Drs. J.S. Lee, S. Serpico, L. Bruzzone, and S. Omatu for chapter contributions to all three volumes. Readers are advised to go over all three volumes for more complete information on signal and image processing for remote sensing. C.H. Chen

Editor

Chi Hau Chen was born on December 22, 1937. He received his Ph.D. in electrical engineering from Purdue University in 1965, his M.S.E.E. degree from the University of Tennessee, Knoxville, in 1962, and his B.S.E.E. degree from the National Taiwan University in 1959. He is currently chancellor professor of electrical and computer engineering at the University of Massachusetts, Dartmouth, where he has taught since 1968. His research areas are statistical pattern recognition and signal/image processing with applications to remote sensing, geophysical, underwater acoustics, and nondestructive testing problems, as well as computer vision for video surveillance, time series analysis, and neural networks. Dr. Chen has published 23 books in his areas of research. He is the editor of Digital Waveform Processing and Recognition (CRC Press, 1982) and Signal Processing Handbook (Marcel Dekker, 1988). He is the chief editor of Handbook of Pattern Recognition and Computer Vision, volumes 1, 2, and 3 (World Scientific Publishing, 1993, 1999, and 2005, respectively). He is the editor of Fuzzy Logic and Neural Network Handbook (McGraw-Hill, 1996). In the area of remote sensing, he is the editor of Information Processing for Remote Sensing and Frontiers of Remote Sensing Information Processing (World Scientific Publishing, 1999 and 2003, respectively). He served as an associate editor of the IEEE Transactions on Acoustics, Speech and Signal Processing for 4 years and of the IEEE Transactions on Geoscience and Remote Sensing for 15 years, and since 1986 he has been an associate editor of the International Journal of Pattern Recognition and Artificial Intelligence. Dr. Chen has been a fellow of the Institute of Electrical and Electronics Engineers (IEEE) since 1988, a life fellow of the IEEE since 2003, and a fellow of the International Association of Pattern Recognition (IAPR) since 1996.

Contributors

J. van Aardt Group of Geomatics Engineering, Department of Biosystems, Katholieke Universiteit Leuven, Leuven, Belgium Ranjan Acharyya Florida Institute of Technology, Melbourne, Florida Bruno Aiazzi Institute of Applied Physics, National Research Council, Florence, Italy Selim Aksoy Bilkent University, Ankara, Turkey V.Yu. Alexandrov Nansen International Environmental and Remote Sensing Center, St. Petersburg, Russia Luciano Alparone Department of Electronics and Telecommunications, University of Florence, Florence, Italy Stefano Baronti Institute of Applied Physics, National Research Council, Florence, Italy Jon Atli Benediktsson Department of Electrical and Computer Engineering, University of Iceland, Reykjavik, Iceland Fabrizio Berizzi Department of Information Engineering, University of Pisa, Pisa, Italy Massimo Bertacca ISL-ALTRAN, Analysis and Simulation Group—Radar Systems Analysis and Signal Processing, Pisa, Italy Nicolas Le Bihan Laboratory of Images and Signals, CNRS UMR 5083, INP of Grenoble, France William J. Blackwell Lincoln Laboratory, Massachusetts Institute of Technology, Lexington, Massachusetts L.P. Bobylev Nansen International Environmental and Remote Sensing Center, St. Petersburg, Russia A.V. Bogdanov Institut für Neuroinformatik, Bochum, Germany Francesca Bovolo Department of Information and Communication Technology, University of Trento, Trento, Italy Lorenzo Bruzzone Department of Information and Communication Technology, University of Trento, Trento, Italy

Chi Hau Chen Department of Electrical and Computer Engineering, University of Massachusetts Dartmouth, North Dartmouth, Massachusetts Frederick W. Chen Lincoln Laboratory, Massachusetts Institute of Technology, Lexington, Massachusetts Salim Chitroub Signal and Image Processing Laboratory, Department of Telecommunication, Algiers, Algeria Leslie M. Collins Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina Fengyu Cong State Key Laboratory of Vibration, Shock & Noise, Shanghai Jiaotong University, Shanghai, China P. Coppin Group of Geomatics Engineering, Department of Biosystems, Katholieke Universiteit Leuven, Leuven, Belgium José M.B. Dias Department of Electrical and Computer Engineering, Instituto Superior Técnico, Av. Rovisco Pais, Lisbon, Portugal Shinto Eguchi Institute of Statistical Mathematics, Tokyo, Japan Boris Escalante-Ramírez School of Engineering, National Autonomous University of Mexico, Mexico City, Mexico Toru Fujinaka Osaka Prefecture University, Osaka, Japan Gerrit Gort Department of Biometris, Wageningen University, The Netherlands Fredric M. Ham Florida Institute of Technology, Melbourne, Florida Norden E. Huang NASA Goddard Space Flight Center, Greenbelt, Maryland

Shaoling Ji State Key Laboratory of Vibration, Shock & Noise, Shanghai Jiaotong University, Shanghai, China Peng Jia State Key Laboratory of Vibration, Shock & Noise, Shanghai Jiaotong University, Shanghai, China Sveinn R. Joelsson Department of Electrical and Computer Engineering, University of Iceland, Reykjavik, Iceland O.M. Johannessen Nansen Environmental and Remote Sensing Center, Bergen, Norway I. Jonckheere Group of Geomatics Engineering, Department of Biosystems, Katholieke Universiteit Leuven, Leuven, Belgium P. Jönsson Teachers Education, Malmö University, Sweden

Dayalan Kasilingam Department of Electrical and Computer Engineering, University of Massachusetts Dartmouth, North Dartmouth, Massachusetts Heesung Kwon U.S. Army Research Laboratory, Adelphi, Maryland Jong-Sen Lee Remote Sensing Division, Naval Research Laboratory, Washington, D.C. S. Lhermitte Group of Geomatics Engineering, Department of Biosystems, Katholieke Universiteit Leuven, Leuven, Belgium Steven R. Long NASA Goddard Space Flight Center, Wallops Island, Virginia Alejandra A. López-Caloca Center for Geography and Geomatics Research, Mexico City, Mexico Arko Lucieer Centre for Spatial Information Science (CenSIS), University of Tasmania, Australia Jérôme I. Mars Laboratory of Images and Signals, CNRS UMR 5083, INP of Grenoble, France Enzo Dalle Mese Department of Information Engineering, University of Pisa, Pisa, Italy Gabriele Moser Department of Biophysical and Electronic Engineering, University of Genoa, Genoa, Italy José M.P. Nascimento Instituto Superior de Engenharia de Lisboa, Lisbon, Portugal Nasser Nasrabadi U.S. Army Research Laboratory, Adelphi, Maryland Ryuei Nishii Faculty of Mathematics, Kyushu University, Fukuoka, Japan Sigeru Omatu Osaka Prefecture University, Osaka, Japan Enders A. Robinson Columbia University, Newburyport, Massachusetts S. Sandven Nansen Environmental and Remote Sensing Center, Bergen, Norway

Dale L. Schuler Remote Sensing Division, Naval Research Laboratory, Washington, D.C. Massimo Selva Institute of Applied Physics, National Research Council, Florence, Italy Sebastiano B. Serpico Department of Biophysical and Electronic Engineering, University of Genoa, Genoa, Italy Xizhi Shi State Key Laboratory of Vibration, Shock & Noise, Shanghai Jiaotong University, Shanghai, China Anne H.S. Solberg Department of Informatics, University of Oslo and Norwegian Computing Center, Oslo, Norway Alfred Stein International Institute for Geo-Information Science and Earth Observation, Enschede, The Netherlands Johannes R. Sveinsson Department of Electrical and Computer Engineering, University of Iceland, Reykjavik, Iceland Yingyi Tan Applied Research Associates, Inc., Raleigh, North Carolina

Stacy L. Tantum Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina Maria Tates U.S. Army Research Laboratory, Adelphi, Maryland, and Morgan State University, Baltimore, Maryland J. Verbesselt Group of Geomatics Engineering, Department of Biosystems, Katholieke Universiteit Leuven, Leuven, Belgium Valeriu Vrabie CReSTIC, University of Reims, Reims, France Xianju Wang Department of Electrical and Computer Engineering, University of Massachusetts Dartmouth, North Dartmouth, Massachusetts Zhenhai Wang Department of Electrical and Computer Engineering, University of Massachusetts Dartmouth, North Dartmouth, Massachusetts Carl White Morgan State University, Baltimore, Maryland Michifumi Yoshioka Osaka Prefecture University, Osaka, Japan Sang-Ho Yun Department of Geophysics, Stanford University, Stanford, California Howard Zebker Department of Geophysics, Stanford University, Stanford, California

Contents

Part I Signal Processing for Remote Sensing

1. On the Normalized Hilbert Transform and Its Applications in Remote Sensing .......... 3
Steven R. Long and Norden E. Huang

2. Statistical Pattern Recognition and Signal Processing in Remote Sensing .......... 25
Chi Hau Chen

3. A Universal Neural Network–Based Infrasound Event Classifier .......... 33
Fredric M. Ham and Ranjan Acharyya

4. Construction of Seismic Images by Ray Tracing .......... 55
Enders A. Robinson

5. Multi-Dimensional Seismic Data Decomposition by Higher Order SVD and Unimodal ICA .......... 75
Nicolas Le Bihan, Valeriu Vrabie, and Jérôme I. Mars

6. Application of Factor Analysis in Seismic Profiling .......... 103
Zhenhai Wang and Chi Hau Chen

7. Kalman Filtering for Weak Signal Detection in Remote Sensing .......... 129
Stacy L. Tantum, Yingyi Tan, and Leslie M. Collins

8. Relating Time-Series of Meteorological and Remote Sensing Indices to Monitor Vegetation Moisture Dynamics .......... 153
J. Verbesselt, P. Jönsson, S. Lhermitte, I. Jonckheere, J. van Aardt, and P. Coppin

9. Use of a Prediction-Error Filter in Merging High- and Low-Resolution Images .......... 173
Sang-Ho Yun and Howard Zebker

10. Blind Separation of Convolutive Mixtures for Canceling Active Sonar Reverberation .......... 189
Fengyu Cong, Chi Hau Chen, Shaoling Ji, Peng Jia, and Xizhi Shi

11. Neural Network Retrievals of Atmospheric Temperature and Moisture Profiles from High-Resolution Infrared and Microwave Sounding Data .......... 205
William J. Blackwell

12. Satellite-Based Precipitation Retrieval Using Neural Networks, Principal Component Analysis, and Image Sharpening .......... 233
Frederick W. Chen

Part II Image Processing for Remote Sensing

13. Polarimetric SAR Techniques for Remote Sensing of the Ocean Surface .......... 267
Dale L. Schuler, Jong-Sen Lee, and Dayalan Kasilingam

14. MRF-Based Remote-Sensing Image Classification with Automatic Model Parameter Estimation .......... 305
Sebastiano B. Serpico and Gabriele Moser

15. Random Forest Classification of Remote Sensing Data .......... 327
Sveinn R. Joelsson, Jon A. Benediktsson, and Johannes R. Sveinsson

16. Supervised Image Classification of Multi-Spectral Images Based on Statistical Machine Learning .......... 345
Ryuei Nishii and Shinto Eguchi

17. Unsupervised Change Detection in Multi-Temporal SAR Images .......... 373
Lorenzo Bruzzone and Francesca Bovolo

18. Change-Detection Methods for Location of Mines in SAR Imagery .......... 401
Maria Tates, Nasser Nasrabadi, Heesung Kwon, and Carl White

19. Vertex Component Analysis: A Geometric-Based Approach to Unmix Hyperspectral Data .......... 415
José M.B. Dias and José M.P. Nascimento

20. Two ICA Approaches for SAR Image Enhancement .......... 441
Chi Hau Chen, Xianju Wang, and Salim Chitroub

21. Long-Range Dependence Models for the Analysis and Discrimination of Sea-Surface Anomalies in Sea SAR Imagery .......... 455
Massimo Bertacca, Fabrizio Berizzi, and Enzo Dalle Mese

22. Spatial Techniques for Image Classification .......... 491
Selim Aksoy

23. Data Fusion for Remote-Sensing Applications .......... 515
Anne H.S. Solberg

24. The Hermite Transform: An Efficient Tool for Noise Reduction and Image Fusion in Remote-Sensing .......... 539
Boris Escalante-Ramírez and Alejandra A. López-Caloca

25. Multi-Sensor Approach to Automated Classification of Sea Ice Image Data .......... 559
A.V. Bogdanov, S. Sandven, O.M. Johannessen, V.Yu. Alexandrov, and L.P. Bobylev

26. Use of the Bradley–Terry Model to Assess Uncertainty in an Error Matrix from a Hierarchical Segmentation of an ASTER Image .......... 591
Alfred Stein, Gerrit Gort, and Arko Lucieer

27. SAR Image Classification by Support Vector Machine .......... 607
Michifumi Yoshioka, Toru Fujinaka, and Sigeru Omatu

28. Quality Assessment of Remote-Sensing Multi-Band Optical Images .......... 621
Bruno Aiazzi, Luciano Alparone, Stefano Baronti, and Massimo Selva

Index .......... 643

Part I

Signal Processing for Remote Sensing

1 On the Normalized Hilbert Transform and Its Applications in Remote Sensing

Steven R. Long and Norden E. Huang

CONTENTS
1.1 Introduction .......... 3
1.2 Review of Processing Advances .......... 4
    1.2.1 The Normalized Empirical Mode Decomposition .......... 4
    1.2.2 Amplitude and Frequency Representations .......... 6
    1.2.3 Instantaneous Frequency .......... 9
1.3 Application to Image Analysis in Remote Sensing .......... 12
    1.3.1 The IR Digital Camera and Setup .......... 13
    1.3.2 Experimental IR Images of Surface Processes .......... 13
    1.3.3 Volume Computations and Isosurfaces .......... 18
1.4 Conclusion .......... 21
Acknowledgment .......... 22
References .......... 22

1.1 Introduction

The development of this new approach was motivated by the need to describe nonlinear distorted waves in detail, along with the variations of these signals that occur naturally in nonstationary processes (e.g., ocean waves). As has been often noted, natural physical processes are mostly nonlinear and nonstationary. Yet, there have historically been very few options in the available analysis methods to examine data from such nonlinear and nonstationary processes. The available methods have usually been for either linear but nonstationary, or nonlinear but stationary, and statistically deterministic processes. Data from nonlinear, nonstationary, and stochastic processes in the natural world need to be examined because nonlinear processes require special treatment. The past approach of imposing a linear structure (by assumptions) on the nonlinear system is not adequate. Other than periodicity, the detailed dynamics in the processes also need to be determined from the data. This is needed because one of the typical characteristics of nonlinear processes is their intrawave frequency modulation (FM), in which the instantaneous frequency (IF) changes within one oscillation cycle. In the past, when the analysis was dependent on linear Fourier analysis, there was no means of depicting the frequency changes within one wavelength (the intrawave frequency variation) except by resorting to the concept of harmonics. The term ''bound


harmonics’’ was often used in this connection. Thus, the distortions of any nonlinear waveform have often been referred to as ‘‘harmonic distortions.’’ The concept of harmonic distortion is a mathematical artifact resulting from imposing a linear structure (through assumptions) on a nonlinear system. The harmonic distortions may thus have mathematical meaning, but there is no physical meaning associated with them, as discussed by Huang et al. [1,2]. For example, in the case of water waves, such harmonic components do not have any of the real physical characteristics of a water wave as it occurs in nature. The physically meaningful way to describe such data should be in terms of its IF, which will reveal the intrawave FMs occurring naturally. It is reasonable to suggest that any such complicated data should consist of numerous superimposed modes. Therefore, to define only one IF value for any given time is not meaningful (see Ref. [3], for comments on the Wigner–Ville distribution). To fully consider the effects of multicomponent data, a decomposition method should be used to separate the naturally combined components completely and nearly orthogonally. In the case of nonlinear data, the orthogonality condition would need to be relaxed, as discussed by Huang et al. [1]. Initially, Huang et al. [1] proposed the empirical mode decomposition (EMD) approach to produce intrinsic mode functions (IMF), which are both monocomponent and symmetric. This was an important step toward making the application truly practical. With the EMD satisfactorily determined, an important roadblock to truly nonlinear and nonstationary analysis was finally removed. However, the difficulties resulting from the limitations stated by the Bedrosian [4] and Nuttall [5] theorems must also be addressed in connection with this approach. Both limitations have firm theoretical foundations and must be considered; IMFs satisfy only the necessary condition, but not the sufficient condition. 
To improve the performance of the processing as proposed by Huang et al. [1], the normalized empirical mode decomposition (NEMD) method was developed as a further improvement on the earlier processing methods.
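Although the chapter gives no code, the EMD sifting idea it builds on can be illustrated with a toy sketch. This is our own simplification, not the authors' implementation: a fixed number of sifting passes per IMF, record endpoints crudely clamped as extra spline knots, and no formal stoppage criterion; the helper names are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_once(x, t):
    """One sifting pass: subtract the mean of cubic-spline envelopes
    through the local maxima and minima."""
    imax = argrelextrema(x, np.greater)[0]
    imin = argrelextrema(x, np.less)[0]
    if len(imax) < 2 or len(imin) < 2:
        return None                       # too few extrema: treat x as the residue
    imax = np.r_[0, imax, len(x) - 1]     # clamp endpoints as extra knots
    imin = np.r_[0, imin, len(x) - 1]
    upper = CubicSpline(t[imax], x[imax])(t)
    lower = CubicSpline(t[imin], x[imin])(t)
    return x - 0.5 * (upper + lower)      # remove the local mean

def emd(x, t, n_imf=4, n_sift=10):
    """Tiny EMD: extract up to n_imf IMFs, each via n_sift sifting passes."""
    imfs, resid = [], x.astype(float).copy()
    for _ in range(n_imf):
        h = resid.copy()
        for _ in range(n_sift):
            h_next = sift_once(h, t)
            if h_next is None:
                break
            h = h_next
        if np.allclose(h, resid):         # sifting changed nothing: stop
            break
        imfs.append(h)
        resid = resid - h
    return imfs, resid

t = np.linspace(0, 1, 2000)
x = np.cos(2 * np.pi * 5 * t) + 0.5 * np.cos(2 * np.pi * 40 * t)
imfs, resid = emd(x, t)
# the first IMF should track the fast 40-Hz component; by construction
# the extracted IMFs plus the residue reconstruct x exactly
```

The key design point, as in the original EMD, is that the basis is derived from the data itself (via the envelopes of the extrema) rather than prescribed a priori.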

1.2 Review of Processing Advances

1.2.1 The Normalized Empirical Mode Decomposition

The NEMD method was developed to satisfy the specific limitations set by the Bedrosian theorem while also providing a sharper measure of the local error when the quadrature differs from the Hilbert transform (HT) result. From an example data set of a natural process, all the local maxima of the data are first determined. These local maxima are then connected with a cubic spline curve, which gives the local amplitude of the data, A(t), as shown together in Figure 1.1. The envelope obtained through spline fitting is used to normalize the data by

y(t) = a(t) cos θ(t) / A(t) = [a(t)/A(t)] cos θ(t).   (1.1)

Here A(t) represents the cubic spline fit of all the maxima from the example data, and thus a(t)/A(t) should normalize y(t) with all maxima then normalized to unity, as shown in Figure 1.2. As is apparent from Figure 1.2, a small number of the normalized data points can still have an amplitude in excess of unity. This is because the cubic spline is through the maxima only, so that at locations where the amplitudes are changing rapidly, the line representing the envelope spline can pass under some of the data points. These occasional


[Figure 1.1 appears here: plot titled ''Example data and splined envelope''; magnitude versus time (sec).]

FIGURE 1.1 The best possible cubic spline fit to the local maxima of the example data. The spline fit forms an envelope as an important first step in the process. Note also how the frequency can change within a wavelength, and that the oscillations can occur in groups.

misses are unavoidable, yet the normalization scheme has effectively separated the amplitude from the carrier oscillation. The IF can then be computed from this normalized carrier function y(t), just obtained. Owing to the nearly uniform amplitude, the limitations set by the Bedrosian theorem are effectively satisfied. The IF computed in this way from the normalized data from Figure 1.2 is shown in Figure 1.3, together with the original example data. With the Bedrosian theorem addressed, what of the limitations set by the Nuttall theorem? If the HT can be considered to be the quadrature, then the absolute value of the HT performed on the perfectly normalized example data should be unity. Then any deviation of the absolute value of the HT from unity would be an indication of a difference between the quadrature and the HT results. An error index can thus be defined simply as

E(t) = [abs(Hilbert transform(y(t))) − 1]².   (1.2)

This error index would be not only an energy measure as given in the Nuttall theorem but also a function of time as shown in Figure 1.4. Therefore, it gives a local measure of the error resulting from the IF computation. This local measure of error is both logically and practically superior to the integrated error bound established by the Nuttall theorem. If the quadrature and the HT results are identical, then it follows that the error should be


[Figure 1.2 appears here: plot titled ''Example data and normalized carrier''; magnitude versus time (sec).]

FIGURE 1.2 Normalized example data of Figure 1.1 with the cubic spline envelope. The occasional value beyond unity is due to the spline fit slightly missing the maxima at those locations.

zero. Based on experience with various natural data sets, the majority of the errors encountered here result from two sources. The first source is due to an imperfect normalization occurring at locations close to rapidly changing amplitudes, where the envelope spline-fitting is unable to turn sharply or quickly enough to cover all the data points. This type of error is even more pronounced when the amplitude is also locally small, thus amplifying any errors. The error index from this condition can be extremely large. The second source is due to nonlinear waveform distortions, which will cause corresponding variations of the phase function θ(t). As discussed by Huang et al. [1], when the phase function is not an elementary function, the differentiation of the phase determined by the HT is not identical to that determined by the quadrature. The error index from this condition is usually small (see Ref. [6]). Overall, the NEMD method gives a more consistent, stable IF. The occasionally large error index values offer an indication of where the method failed simply because the spline misses and cuts through the data momentarily. All such locations occur at the minimum amplitude with a resulting negligible energy density.
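The normalization and error-index steps (Equation 1.1 and Equation 1.2) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: a few fixed normalization passes, crude endpoint handling, and a synthetic test signal. Note that scipy's `hilbert` returns the analytic signal y + iH[y], whose modulus should be near unity for a well-normalized carrier.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema, hilbert

def normalize_carrier(x, t, n_iter=3):
    """Divide the data by a cubic-spline envelope through the maxima of
    |x|, repeating a few times so the carrier amplitude approaches unity
    (Equation 1.1). Endpoints are simply used as extra knots."""
    y = x.astype(float).copy()
    for _ in range(n_iter):
        idx = argrelextrema(np.abs(y), np.greater)[0]
        idx = np.r_[0, idx, len(y) - 1]              # crude endpoint handling
        A = CubicSpline(t[idx], np.abs(y)[idx])(t)   # envelope estimate A(t)
        y = y / np.maximum(A, 1e-12)                 # guard against division by ~0
    return y

def error_index(y):
    """Equation 1.2 in spirit: squared deviation from unity of the
    modulus of the analytic signal of the normalized carrier."""
    return (np.abs(hilbert(y)) - 1.0) ** 2

# synthetic AM signal: a slow envelope on a 2-Hz carrier
t = np.linspace(0, 10, 4000)
x = (1 + 0.5 * np.sin(2 * np.pi * 0.2 * t)) * np.cos(2 * np.pi * 2 * t)

y = normalize_carrier(x, t)
E = error_index(y)
# E should be small over most of the record, spiking only where the
# spline slightly misses a maximum or near the record edges
```

As the text notes, E(t) is a local (time-resolved) diagnostic, which is what makes it more informative than the integrated bound of the Nuttall theorem.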

1.2.2 Amplitude and Frequency Representations

In the initial methods [1,2,6], the main result of Hilbert spectral analysis (HSA) always emphasized the FM. In the original methods, the data were first decomposed into IMFs, as


[Figure 1.3 appears here: plot titled ''Example data and instantaneous frequency''; magnitude and frequency versus time (sec).]

FIGURE 1.3 The instantaneous frequency determined from the normalized carrier function is shown with the example data. Data is about zero, and the instantaneous frequency varies about the horizontal 0.5 value.

defined in the initial work. Then, through the HT, the IF and the amplitude of each IMF were computed to form the Hilbert spectrum. This continues to be the method, especially when the data are normalized. The information on the amplitude or envelope variation is not examined. In the NEMD and HSA approach, it is justifiable not to pay too much attention to the amplitude variations. This is because if there is a mode mixing, the amplitude variation from such mixed mode IMFs does not reveal any true underlying physical processes. However, there are cases when the envelope variation does contain critical information. An example of this is when there is no mode mixing in any given IMF, when a beating signal representing the sum of two coexisting sinusoidal ones is encountered. In an earlier paper, Huang et al. [1] attempted to extract individual components out of the sum of two linear trigonometric functions such as

x(t) = cos at + cos bt.   (1.3)

Two seemingly separate components were recovered after over 3000 sifting steps. Yet the obtained IMFs were not purely trigonometric functions anymore, and there were obvious aliases in the resulting IMF components as well as in the residue. The approach proposed then was unnecessary and unsatisfactory. The problem, in fact, has a much simpler solution: treating the envelope as an amplitude modulation (AM), and then processing just the envelope data. The function x(t), as given in Equation 1.3, can then be rewritten as


[Figure 1.4 appears here: plot titled ''Offset example data and error measures''; offset magnitude, error index (EI), and quadrature versus time (sec).]

FIGURE 1.4 The error index as it changes with the data location in time. The original example data offset by 0.3 vertically for clarity is also shown. The quadrature result is not visible on this scale.

x(t) = cos at + cos bt = 2 cos[(a + b)t/2] cos[(a − b)t/2].   (1.4)
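Equation 1.4 is the product-to-sum trigonometric identity, which is easy to verify numerically (the frequencies below are chosen arbitrarily for illustration):

```python
import numpy as np

t = np.linspace(0.0, 10.0, 1001)
a, b = 2 * np.pi * 1.9, 2 * np.pi * 1.7      # two nearby angular frequencies

lhs = np.cos(a * t) + np.cos(b * t)                          # sum of two tones
rhs = 2 * np.cos((a + b) / 2 * t) * np.cos((a - b) / 2 * t)  # carrier times envelope

assert np.allclose(lhs, rhs)  # the two forms are identical (Equation 1.4)
```

Neither form is more correct than the other: the carrier frequency (a + b)/2 and the envelope frequency (a − b)/2 together carry the same information as the two original tones.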

There is no difference between the sum of the individual components and the modulating envelope form; they are trigonometric identities. If both the frequency of the carrier wave, (a + b)/2, and the frequency of the envelope, (a − b)/2, can be obtained, then all the information in the signal can be extracted. This indicates the reason to look for a new approach to extracting additional information from the envelope. In this example, however, the envelope becomes a rectified cosine wave. The frequency would be easier to determine from simple period counting than from the Hilbert spectral result. For a more general case, when the amplitudes of the two sinusoidal functions are not equal, the modulation is no longer simple. For even more complicated cases, when there are more than two coexisting sinusoidal components with different amplitudes and frequencies, there is no general expression for the envelope and carrier. The final result could be represented as more than one frequency-modulated band in the Hilbert spectrum. It is then impossible to describe the individual components under this situation. In such cases, representing the signal as a carrier and envelope variation should still be meaningful, for


the dual representations of frequency arise from the different definitions of frequency. The Hilbert-inspired view of amplitude and FMs still renders a correct representation of the signal, but this view is very different from that of Fourier analysis. In such cases, if one is sure of the stationarity and regularity of the signal, Fourier analysis could be used, which will give more familiar results, as suggested by Huang et al. [1]. The judgment for these cases is not on which one is correct, as both are correct; rather, it is on which one is more familiar and more revealing. When more complicated data are present, such as in the case of radar returns, tsunami wave records, earthquake data, speech signals, and so on (representing a frequency ''chirp''), the amplitude variation information can be found by processing the envelope and treating the data as an approximate carrier. When the envelope of frequency chirp data, such as the example given in Figure 1.5, is decomposed through the NEMD process, the IMF components are obtained as shown in Figure 1.6. Using these components (or IMFs), the Hilbert spectrum can be constructed as given in Figure 1.7, together with its FM counterpart. The physical meaning of the AM spectrum is not as clearly defined in this case. However, it serves to illustrate the AM contribution to the variability of the local frequency.

1.2.3 Instantaneous Frequency

It must be emphasized that IF is a very different concept from the frequency content of the data derived from Fourier-based methods, as discussed in great detail by Huang

[Figure 1.5: example of frequency chirp data; vertical axis: Magnitude, −0.8 to 0.8; horizontal axis: Time (sec), 0 to 0.5.]

FIGURE 1.5 A typical example of complex natural data, illustrating the concept of frequency ‘‘chirps.’’


[Figure 1.6: components of frequency chirp data; vertical axis: offset IMF components C1 (top) to C8; horizontal axis: Time (sec × 22050), 0 to 12000.]

FIGURE 1.6 The eight IMF components obtained by processing the frequency chirp data of Figure 1.5, offset vertically from C1 (top) to C8 (bottom).

et al. [1]. The IF, as discussed here, is based on the instantaneous variation of the phase function from the HT of a data-adaptive decomposition, while the frequency content in the Fourier approach is an averaged frequency on the basis of a convolution of the data with an a priori basis. Therefore, whenever the basis changes, the frequency content also changes. Similarly, when the decomposition changes, the IF also has to change. However, there are still persistent and common misconceptions about the IF computed in this manner. One of the most prevalent misconceptions about IF is that, for any data with a discrete line spectrum, IF can be a continuous function. A variation of this misconception is that IF can give frequency values that are not one of the discrete spectral lines. This dilemma can be resolved easily. In the nonlinear cases, the IF approach treats the harmonic distortions as continuous intrawave frequency modulations, whereas the Fourier-based methods treat the frequency content as discrete harmonic spectral lines. In the case of two or more beating waves, the IF approach treats the data as a combination of amplitude and frequency modulations, while the frequency content from the Fourier method treats each constituent wave as a discrete spectral line, if the process is stationary. Although they appear perplexingly different, they represent the same data. Another misconception concerns negative IF values. According to Gabor's [7] approach, the HT is implemented through two Fourier transforms: the first transforms the data into frequency space, while the second performs an inverse Fourier transform after discarding all the negative frequency parts [3]. Therefore, according to this argument, all the negative frequency content has been discarded, which then raises the question, how can there still


[Figure 1.7: FM and AM Hilbert spectra; vertical axis: Frequency (Hz), 0 to 3000; horizontal axis: Time (sec), 0 to 0.5.]

FIGURE 1.7 The AM and FM Hilbert spectral results from the frequency chirp data of Figure 1.5.

be negative frequency values? This question arises from a misunderstanding of the nature of negative IF from the HT. The direct cause of negative frequency in the HT is the presence of multiple extrema between two zero-crossings. There are then local loops not centered at the origin of the coordinate system, as discussed by Huang et al. [1]. Negative frequency can also occur even if there are no multiple extrema. For example, this would happen when there are large amplitude fluctuations, which cause the Hilbert-transformed phase loop to miss the origin. Therefore, the negative frequency values do not conflict with the discarding of negative frequency content in the HT implemented through Gabor's [7] approach. Both these causes are removed by the NEMD and the normalized Hilbert transform (NHT) methods presented here. The latest versions of these methods (NEMD/NHT) consistently give more stable IF values. They satisfy the limitation set by the Bedrosian theorem and offer a local measure of error sharper than that of the Nuttall theorem. Note here that in the initial spline of the amplitude done in the NEMD approach, the end effects again become important. The method used here simply assigns the end points a maximum equal to the very last value. Other improvements using characteristic waves and linear predictions, as discussed in Ref. [1], can also be employed. There could be some improvement, but the resulting fit will be very similar. Ever since the introduction of the EMD and HSA by Huang et al. [1,2,8], these methods have attracted increasing attention. Some investigators, however, have expressed certain reservations. For example, Olhede and Walden [9] suggested that the idea of computing


IF through the Hilbert transform is good, but that the EMD approach is not rigorous. Therefore, they introduced the wavelet projection as the method for decomposition and adopted only the IF computation from the Hilbert transform. Flandrin et al. [10], however, suggested that the EMD is equivalent to a bank of dyadic filters, but refrained from using the HT. From the analysis presented here, it can be concluded that caution in using the HT is fully justified. The limitations imposed by Bedrosian and Nuttall certainly have solid theoretical foundations. The normalization procedure shown here will remove any reservations about further applications of the improved HT methods in data analysis. The method offers relatively little help to the approach advanced by Olhede and Walden [9], because the wavelet decomposition definitely removes the nonlinear distortions from the waveform. The consequence of this, however, is that their approach should also be limited to nonstationary, but linear, processes. It only serves the limited purpose of improving the poor frequency resolution of the continuous wavelet analysis. As clearly shown in Equation 1.1, to give a good representation of actual wave data or other data from natural processes by means of an analytical wave profile, the analytical profile will need to be composed of IMFs and also to obey the limitations imposed by the Bedrosian and Nuttall theorems. In the past, such a thorough examination of the data has not been done. As reported by Huang et al. [2,8], most of the actual wave data recorded are not composed of single components. Consequently, the analytical representation of a given wave profile in the form of Equation 1.1 poses a challenging problem theoretically.
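The analytic-signal route to IF described above (two Fourier transforms, with the negative-frequency half discarded) can be sketched in a few dozen lines. The following is a minimal pure-Python illustration, not the NEMD/NHT procedure itself: there is no sifting and no spline-envelope normalization, and it assumes a clean, single-component record whose length is a power of two.

```python
import cmath
import math

def fft(x):
    # Radix-2 Cooley-Tukey FFT (input length must be a power of two).
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def ifft(x):
    # Inverse FFT via conjugation.
    y = fft([v.conjugate() for v in x])
    return [v.conjugate() / len(x) for v in y]

def analytic_signal(x):
    # Gabor's construction: forward FFT, zero the negative frequencies
    # (doubling the positive ones), then inverse FFT.
    n = len(x)
    spec = fft([complex(v) for v in x])
    gain = [1.0] + [2.0] * (n // 2 - 1) + [1.0] + [0.0] * (n // 2 - 1)
    return ifft([s * g for s, g in zip(spec, gain)])

def instantaneous_frequency(x, fs):
    # IF as the sample-to-sample derivative of the unwrapped Hilbert phase.
    phase = [cmath.phase(v) for v in analytic_signal(x)]
    freqs = []
    for k in range(len(phase) - 1):
        d = (phase[k + 1] - phase[k] + math.pi) % (2 * math.pi) - math.pi
        freqs.append(d * fs / (2 * math.pi))
    return freqs

# Demo: a unit-amplitude chirp sweeping 50 -> 150 Hz over 1 s at fs = 1024 Hz.
fs, n = 1024, 1024
x = [math.cos(2 * math.pi * (50 * t + 50 * t * t)) for t in (k / fs for k in range(n))]
f_inst = instantaneous_frequency(x, fs)
# Away from the record ends, f_inst[k] tracks the chirp law 50 + 100 * (k / fs).
```

For interior samples the estimate follows the linear chirp law closely; the first and last few samples show the end effects discussed above.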

1.3

Application to Image Analysis in Remote Sensing

Just as much of the data from natural phenomena are either nonlinear or nonstationary, or both, so it is also with the data that form images of natural processes. The methods of image processing are already well advanced, as can be seen in reviews such as by Castleman [11] or Russ [12]. The NEMD/NHT methods can now be added to the available tools for producing new and unique image products. Nunes et al. [13] and Linderhed [14–16], among others, have already done significant work in this new area. Because of the nonlinear and nonstationary nature of natural processes, the NEMD/NHT approach is especially well suited for image data, giving frequencies, inverse distances, or wave numbers as a function of time or distance, along with the amplitudes or energy values associated with these, as well as a sharp identification of imbedded structures. The various possibilities and products of this new analysis approach include, but are not limited to, joint and marginal distributions, which can be viewed as isosurfaces, contour plots, and surfaces that contain information on frequency, inverse wavelength, amplitude, energy and location in time, space, or both. Additionally, the concept of component images representing the intrinsic scales and structures imbedded in the data is now possible, along with a technique for obtaining frequency variations of structures within the images. The laboratory used for producing the nonlinear waves, used as an example here, is the NASA Air–Sea Interaction Research Facility (NASIRF) located at the NASA Goddard Space Flight Center/Wallops Flight Facility, at Wallops Island, Virginia, within the Ocean Sciences Branch. The test section of the main wave tank is 18.3 m long and 0.9 m wide, filled to a depth of 0.76 m of water, leaving a height of 0.45 m over the water for airflow,

[Figure 1.8: tank schematic with dimensions 5.05 m, 1.22 m, 18.29 m, and 30.48 m; location of the new coil indicated.]

FIGURE 1.8 The NASA Air–Sea Interaction Research Facility's (NASIRF) main wave tank at Wallops Island, VA. The new coils shown were used to provide cooling and humidity control in the airflow over heated water.

if needed. The facility can produce wind- and paddle-generated waves over a water current in either direction, and its capabilities, instruments, and software have been described in detail by Long and colleagues [17–21]. The basic description is shown in Figure 1.8, with an additional new feature indicated as the new coil. These coils were recently installed to provide cold air of controlled temperature and humidity for experiments using cold air over heated water during the Flux Exchange Dynamics Study of 2004 (FEDS4), a joint experiment involving the University of Washington/Applied Physics Laboratory (UW/APL), the University of Alberta, the Lamont-Doherty Earth Observatory of Columbia University, and NASA GSFC/Wallops Flight Facility. The cold airflow over heated water optimized conditions for the collection of infrared (IR) video images.

1.3.1

The IR Digital Camera and Setup

The camera used to acquire the laboratory image presented here as an example was provided by UW/APL as part of FEDS4. The experimental setup is shown in Figure 1.9. For the example shown here, the resolution of the IR image was 640 × 512 pixels. The camera was mounted to look upwind at the water surface, so that its pixel image area covered a physical rectangle on the water surface on the order of 10 cm per side. The water within the wave tank was heated by four commercial spa heaters, while the air in the airflow was cooled and humidity controlled by NASIRF's new cooling and reheating coils. This produced a very thin layer of surface water that was cooled, so that whenever wave spilling and breaking occurred, it could be immediately seen by the IR camera.

1.3.2

Experimental IR Images of Surface Processes

With this imaging system in place, steps were taken to acquire interesting images of wave breaking and spilling due to wind and wave interactions. One such image is illustrated in Figure 1.10. To help visualize the image data, the IR camera intensity levels have been converted to a grey scale. Using a horizontal line that slices through the central area of the image at row 275, Figure 1.11 illustrates the details contained in the actual array of data values obtained from the IR camera. This gives the IR camera intensity values stored in the pixels along the horizontal line. These can then be converted to actual temperatures when needed. A complex structure is evident here. Breaking wave fronts are evident in the crescent-


[Figure 1.9 schematic residue: Flux Exchange Dynamics Study (FEDS) 4 at the NASA Air–Sea Interaction Research Facility, Wallops Island, Virginia. Heat flux Q = ρ·cP·KH(Tb − Ts), with Tb obtained by direct measurement and inferred from the PDF. Instrumentation: IR camera, "LabRad" IR radiometer (calibrated Tskin), CO2 laser, air U, T, q profiles, SeaBird and fast T sensors, gas chromatograph. Measurements: KH via ACFT, Tbulk via PDF, Qnet, u*, sub-skin Twater profiles, bulk KG, and fluxes Qsensible and Qlatent. Tank cross section: 45 cm of airflow over 76 cm of water.]

FIGURE 1.9 The experimental arrangement of FEDS4 (Flux Exchange Dynamics Study of 2004) used to capture IR images of surface wave processes. (Courtesy of A. Jessup and K. Phadnis of UW/APL.)

[Figure 1.10: IR image of the water wave surface; axes: horizontal and vertical pixels; grey bar of IR intensity levels, 0 to 250.]

FIGURE 1.10 Surface IR image from the FEDS4 experiment. Grey bar gives the IR camera intensity levels. (Data courtesy of A. Jessup and K. Phadnis of UW/APL.)


shaped structures, where spilling and breaking bring up the underlying warmer water. After processing, the resulting components produced from the horizontal row of Figure 1.11 are shown in Figure 1.12. As can be seen, the component with the longest scale, C9, contains the bulk of the intensity values. The shorter, riding scales are fluctuations about the levels shown in component C9. The sifting was done via the extrema approach discussed in the foundation articles, and produced a total of nine components. Using this approach, the IR image was first divided into 640 horizontal rows of 512 values each. The rows were then processed to produce the components, each of the 640 rows producing a component set similar to that shown in Figure 1.12. From these basic results, component images can be assembled. This is done by taking the first component, representing the shortest scale, from each of the 640 component sets. These first components are then assembled together to produce an array that is also 640 rows by 512 columns and can also be visualized as an image. This is the first component image. The production of component images is then continued in a similar fashion with the remaining components, representing progressively longer scales. To visualize the shortest component scales, component images 1 through 4 were added together, as shown in Figure 1.13. Throughout the image, streaks of short wavy structures can be seen to line up in the wind direction (along the vertical axis). Even though the image is formed in the IR camera by measuring heat at many different pixel locations over a rectangular area, the surface waves have an effect that can thus be remotely sensed in the image, either as streaks of warmer water exposed by breaking or as more wavelike structures. If the longer scale components are now combined using the 5th and 6th component images, a composite image is obtained as shown in Figure 1.14.
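The row-by-row decomposition and regrouping into component images described above can be sketched as follows. The moving-average split here is only a stand-in for the NEMD sifting (a real implementation would extract IMFs via envelope sifting); the point being illustrated is the assembly logic: decompose each row, then stack the k-th components. All names are illustrative.

```python
def _smooth(x, w):
    # Centered moving average with shrinking windows at the edges.
    half = w // 2
    out = []
    for i in range(len(x)):
        seg = x[max(0, i - half):i + half + 1]
        out.append(sum(seg) / len(seg))
    return out

def decompose_row(row, n_components=3):
    # Stand-in for NEMD sifting: peel off detail layers (shortest scale first)
    # with successively wider moving averages; the last entry is the residual
    # trend. By construction the layers sum back to the original row.
    comps, residual = [], list(row)
    for level in range(n_components - 1):
        trend = _smooth(residual, 2 ** (level + 2) + 1)
        comps.append([a - b for a, b in zip(residual, trend)])
        residual = trend
    comps.append(residual)
    return comps

def component_images(image, n_components=3):
    # Decompose every row, then regroup: the k-th component of each row,
    # stacked in row order, forms the k-th component image.
    per_row = [decompose_row(r, n_components) for r in image]
    return [[rc[k] for rc in per_row] for k in range(n_components)]
```

Summing the component images pixel by pixel reassembles the original image exactly, which is the "lamination" property used in the composites below.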
Longer scales can be seen throughout the image area where breaking and mixing occur. Other wavelike

[Figure 1.11: row 275 of the IR image of the water wave surface; vertical axis: IR intensity levels, 90 to 190; horizontal axis: horizontal pixels, 0 to 600.]

FIGURE 1.11 A horizontal slice of the raw IR image given in Figure 1.10, taken at row 275. Note the details contained in the IR image data, showing structures containing both short and longer length scales.


[Figure 1.12: components of row 275 of the IR image; IMFs C1 (top) to C9 (bottom), offset vertically; horizontal axis: horizontal pixels, 50 to 500.]

FIGURE 1.12 Components obtained by processing data from the slice shown in Figure 1.11. Note that component C9 carries the bulk of the intensity scale, while the other components with shorter scales record the fluctuations about these base levels.

structures of longer wavelengths are also visible. To produce a true wave number from images like these, one only has to convert using

k = 2π/λ,  (1.5)

where k is the wave number (in 1/cm) and λ is the wavelength (in cm). This would only require knowing the physical size of the image in centimeters or some other unit and its equivalent in pixels from the array analyzed. Another approach to the raw image of Figure 1.10 is to separate the original image into columns instead of rows. This would make the analysis more sensitive to structures that were better aligned with that direction, and also with the direction of wind and waves. By repeating the steps leading to Figure 1.13, the shortest scale component images, in component images 3 to 5, can be combined to form Figure 1.15. Component images 1 and 2 developed from the vertical column analysis were not included here, because they were found to contain results of such a short scale uniformly spread throughout the image, and without structure. Indeed, they had the appearance of uniform noise. It is apparent that more structures at these scales can be seen by analyzing along the column direction. Figure 1.16 represents the longer scale in component image 6. By the 6th component image, the lamination process starts to fail somewhat in reassembling the image from the components. Further processing is needed to better match the results at these longer scales.
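Equation 1.5, combined with the pixel-to-physical mapping just described, amounts to a one-line conversion. A small sketch (the 10 cm and 512-pixel figures in the usage note are taken from the camera description above, as an illustration):

```python
import math

def wavenumber(wavelength_px, image_extent_px, image_extent_cm):
    # Map a wavelength measured in pixels to physical units, then apply
    # Equation 1.5: k = 2*pi / lambda.
    wavelength_cm = wavelength_px * (image_extent_cm / image_extent_px)
    return 2 * math.pi / wavelength_cm
```

For example, a 64-pixel wavelength along a 512-pixel line spanning roughly 10 cm gives λ ≈ 1.25 cm and k ≈ 5.03 cm⁻¹.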


[Figure 1.13: horizontal IR component images 1 to 4 combined; axes: horizontal and vertical pixels; color scale −100 to 100.]

FIGURE 1.13 (See color insert following page 302.) Component images 1 to 4 from the horizontal rows used to produce a composite image representing the shortest scales.

[Figure 1.14: horizontal IR component images 5 to 6 combined; axes: horizontal and vertical pixels; color scale −40 to 40.]

FIGURE 1.14 (See color insert following page 302.) Component images 5 to 6 from the horizontal rows used to produce a composite image representing the longer scales.


[Figure 1.15: vertical IR component images 3 to 5 combined; axes: horizontal and vertical pixels; color scale −60 to 40.]

FIGURE 1.15 (See color insert following page 302.) Component images 3 to 5 from the vertical columns, here combined to produce a composite image representing the midrange scales.

When the original data are a function of time, this new approach can produce the IF and amplitude as functions of time. Here, the original data are from an IR image, so that any slice through the image (horizontal or vertical) is a set of camera values (ultimately temperatures) representing the temperature variation over a physical length. Thus, instead of producing frequency (inverse time scale), the new approach here initially produces an inverse length scale. In the case of water surface waves, this is the familiar scale of the wave number, as given in Equation 1.5. To illustrate this, consider Figure 1.17, which shows the changes of scale along the selected horizontal row 400. The largest measures of IR energy can be seen at the smaller inverse length scales, which implies that the energy came from the longer scales of components 3 and 4. Figure 1.18 repeats this for the even longer length scales in components 5 and 6. Returning to the column-wise processing at column 250 of Figure 1.15 and Figure 1.16, further processing gives the contour plot of Figure 1.19, for components 3 through 5, and Figure 1.20, for components 4 through 6.

1.3.3

Volume Computations and Isosurfaces

Many interesting phenomena happen in the flow of time, and thus it is interesting to note how changes occur with time in the images. To include time in the analysis, a sequence of images taken at uniform time steps can be used. By starting with a single horizontal or vertical line from the image, a contour plot can be produced, as was shown in Figure 1.17 through Figure 1.20. Using a set of sequential images covering a known time period and a pixel line of data from each (horizontal or vertical), a set of numerical arrays can be obtained from the NEMD/NHT analysis. Each


[Figure 1.16: vertical IR component image 6; axes: horizontal and vertical pixels; color scale −20 to 25.]

FIGURE 1.16 (See color insert following page 302.) Component image 6 from the vertical columns, used to produce a composite image representing the longer scale.

array can be visualized by means of a contour plot, as already shown. The entire set of arrays can also be combined in sequence to form an array volume, or an array of dimension 3. Within the volume, each element of the array contains the amplitude or intensity of the data from the image sequence. The individual element location within the three-dimensional array specifies values associated with the stored data. One axis (call it x) of the volume can represent horizontal or vertical distance along the data line taken from the image. Another axis (call it y) can represent the resulting inverse length scale associated with the data. The additional axis (call it z) is produced by laminating the arrays together, and represents time, because each image was acquired in repetitive time steps. Thus, the position of the element in the volume gives location x along the horizontal or vertical slice, inverse length along the y-axis, and time along the z-axis. Isosurface techniques would be needed to visualize this. This could be compared to peeling an onion, except that the different layers, or spatial contour values, are not bound in spherical shells. After a value of data intensity is specified, the isosurface visualization makes all array elements transparent outside the level of the chosen value, while shading in the chosen value so that the elements inside that level (or behind it) cannot be seen. Some examples of this procedure can be seen in Ref. [21]. Another approach to the analysis of images is to reassemble lines from the image data using a different format. A sequence of images in units of time is needed; using the same horizontal or vertical line from each image in the time sequence, each line can be laminated to its predecessor to build up an array with the image length along the chosen line along one edge, and the number of images along the other axis, in units of


[Figure 1.17: horizontal row 400, components 1 to 4; vertical axis: inverse length scale (1/pixel), 0.05 to 0.45; horizontal axis: horizontal pixels, 50 to 500; color scale 1 to 10.]

FIGURE 1.17 (See color insert following page 302.) The results from the NEMD/NHT computation on horizontal row 400 for components 1 to 4, which resulted from Figure 1.13. Note the apparent influence of surface waves on the IR information. The most intense IR radiation can be seen at the smaller values of inverse length scale, denoting the longer scales in components 3 and 4. A wavelike influence can be seen at all scales.

[Figure 1.18: horizontal row 400, components 5 to 6; vertical axis: inverse length scale (1/pixel), 0 to 0.04; horizontal axis: horizontal pixels, 50 to 500; color scale 1 to 8.]

FIGURE 1.18 (See color insert following page 302.) The results from the NEMD/NHT computation on horizontal row 400 for components 5 to 6, which resulted from Figure 1.14. Even at the longer scales, an apparent influence of surface waves on the IR information can still be seen.


[Figure 1.19: vertical column 250, components 3 to 5; vertical axis: inverse length scale (1/pixel), 0.02 to 0.18; horizontal axis: vertical pixels, 100 to 600; color scale 1 to 10.]

FIGURE 1.19 The contour plot developed from the vertical slice at column 250, using the components 3 to 5. The larger IR values can be seen at longer length scales.

time. Once complete, this two-dimensional array can be split into slices along the time axis. Each of these time slices, representing the variation in data values with time at a single-pixel location, can then be processed with the new NEMD/NHT technique. An example of this can also be seen in Ref. [21]. The NEMD/NHT techniques can thus reveal variations of frequency with time in the data at a specific location in the image sequence.
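The lamination just described — the same pixel line pulled from each frame, stacked in time, then re-sliced into per-pixel time histories ready for 1-D NEMD/NHT processing — can be sketched as follows (illustrative names; frames are plain nested lists):

```python
def laminate_line_over_time(image_sequence, line_index):
    # Pull the same horizontal pixel line from every frame and stack them:
    # rows of the result are time steps, columns are positions along the line.
    return [frame[line_index] for frame in image_sequence]

def time_slices(laminated):
    # Re-slice along the position axis: each slice is the time history of a
    # single pixel, ready for one-dimensional NEMD/NHT processing.
    return [[row[x] for row in laminated] for x in range(len(laminated[0]))]
```

Each returned slice is a one-dimensional time series, so the same decomposition used on image rows applies to it unchanged.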

1.4

Conclusion

With the introduction of the normalization procedure, one of the major obstacles for NEMD/NHT analysis has been removed. Together with the establishment of the confidence limit [6] through variation of the stoppage criterion, the statistical significance test of the information content of IMFs [10,22], and the further development of the concept of IF [23], the new analysis approach has indeed approached maturity for applications empirically, if not mathematically (for a recent overview of developments, see Ref. [24]). The new NEMD/NHT methods provide the best overall approach to determining the IF for nonlinear and nonstationary data. Thus, a new tool is available to aid in further understanding and gaining deeper insight into the wealth of data now made possible by remote


[Figure 1.20: vertical column 250, components 4 to 6; vertical axis: inverse length scale (1/pixel), 0.01 to 0.1; horizontal axis: vertical pixels, 100 to 600; color scale 5 to 25.]

FIGURE 1.20 The contour plot developed from the vertical slice at column 250, using the components 4 through 6, as in Figure 1.19.

sensing and other means. Specifically, the application of the new method to image data was demonstrated. This new approach is covered by several U.S. patents held by NASA, as discussed by Huang and Long [25]. Further information on obtaining the software can be found at the NASA-authorized commercial site: http://www.fuentek.com/technologies/hht.htm

Acknowledgment The authors wish to express their continuing gratitude and thanks to Dr. Eric Lindstrom of NASA headquarters for his encouragement and support of the work.

References

1. Huang, N.E., Shen, Z., Long, S.R., Wu, M.C., Shih, S.H., Zheng, Q., Tung, C.C., and Liu, H.H., The empirical mode decomposition method and the Hilbert spectrum for non-stationary time series analysis, Proc. Roy. Soc. London, A454, 903–995, 1998.


2. Huang, N.E., Shen, Z., and Long, S.R., A new view of water waves—the Hilbert spectrum, Ann. Rev. Fluid Mech., 31, 417–457, 1999.
3. Flandrin, P., Time–Frequency/Time–Scale Analysis, Academic Press, San Diego, 1999.
4. Bedrosian, E., On the quadrature approximation to the Hilbert transform of modulated signals, Proc. IEEE, 51, 868–869, 1963.
5. Nuttall, A.H., On the quadrature approximation to the Hilbert transform of modulated signals, Proc. IEEE, 54, 1458–1459, 1966.
6. Huang, N.E., Wu, M.L., Long, S.R., Shen, S.S.P., Qu, W.D., Gloersen, P., and Fan, K.L., A confidence limit for the empirical mode decomposition and the Hilbert spectral analysis, Proc. Roy. Soc. London, A459, 2317–2345, 2003.
7. Gabor, D., Theory of communication, J. IEEE, 93, 426–457, 1946.
8. Huang, N.E., Long, S.R., and Shen, Z., The mechanism for frequency downshift in nonlinear wave evolution, Adv. Appl. Mech., 32, 59–111, 1996.
9. Olhede, S. and Walden, A.T., The Hilbert spectrum via wavelet projections, Proc. Roy. Soc. London, A460, 955–975, 2004.
10. Flandrin, P., Rilling, G., and Gonçalves, P., Empirical mode decomposition as a filterbank, IEEE Signal Proc. Lett., 11(2), 112–114, 2004.
11. Castleman, K.R., Digital Image Processing, Prentice-Hall, Englewood Cliffs, NJ, 1996.
12. Russ, J.C., The Image Processing Handbook, 4th Edition, CRC Press, Boca Raton, 2002.
13. Nunes, J.C., Guyot, S., and Deléchelle, E., Texture analysis based on local analysis of the bidimensional empirical mode decomposition, Mach. Vision Appl., 16(3), 177–188, 2005.
14. Linderhed, A., Compression by image empirical mode decomposition, IEEE Int. Conf. Image Process., 1, 553–556, 2005.
15. Linderhed, A., Variable sampling of the empirical mode decomposition of two-dimensional signals, Int. J. Wavelets, Multi-resolut. Inform. Process., 3, 2005.
16. Linderhed, A., 2D empirical mode decompositions in the spirit of image compression, Wavelet Independ. Compon. Analy. Appl. IX, SPIE Proc., 4738, 1–8, 2002.
17. Long, S.R., NASA Wallops Flight Facility Air–Sea Interaction Research Facility, NASA Reference Publication, No. 1277, 1992, 29 pp.
18. Long, S.R., Lai, R.J., Huang, N.E., and Spedding, G.R., Blocking and trapping of waves in an inhomogeneous flow, Dynam. Atmos. Oceans, 20, 79–106, 1993.
19. Long, S.R., Huang, N.E., Tung, C.C., Wu, M.-L.C., Lin, R.-Q., Mollo-Christensen, E., and Yuan, Y., The Hilbert techniques: An alternate approach for non-steady time series analysis, IEEE GRSS, 3, 6–11, 1995.
20. Long, S.R. and Klinke, J., A closer look at short waves generated by wave interactions with adverse currents, Gas Transfer at Water Surfaces, Geophysical Monograph 127, American Geophysical Union, 121–128, 2002.
21. Long, S.R., Applications of HHT in image analysis, Hilbert–Huang Transform and Its Applications, Interdisciplinary Mathematical Sciences, 5, 289–305, World Scientific, Singapore, 2005.
22. Wu, Z. and Huang, N.E., A study of the characteristics of white noise using the empirical mode decomposition method, Proc. Roy. Soc. London, A460, 1597–1611, 2004.
23. Huang, N.E., Wu, Z., Long, S.R., Arnold, K.C., Blank, K., and Liu, T.W., On instantaneous frequency, Proc. Roy. Soc. London, 2006, in press.
24. Huang, N.E., Introduction to the Hilbert–Huang transform and its related mathematical problems, Hilbert–Huang Transform and Its Applications, Interdisciplinary Mathematical Sciences, 5, 1–26, World Scientific, Singapore, 2005.
25. Huang, N.E. and Long, S.R., A generalized zero-crossing for local frequency determination, US Patent pending, 2003.

2 Statistical Pattern Recognition and Signal Processing in Remote Sensing

Chi Hau Chen

CONTENTS
2.1 Introduction
2.2 Introduction to Statistical Pattern Recognition in Remote Sensing
2.3 Using Self-Organizing Maps and Radial Basis Function Networks for Pixel Classification
2.4 Introduction to Statistical Signal Processing in Remote Sensing
2.5 Conclusions
References

2.1

Introduction

Basically, statistical pattern recognition deals with the correct classification of a pattern into one of several available pattern classes. Basic topics in statistical pattern recognition include: preprocessing, feature extraction and selection, parametric or nonparametric probability density estimation, decision-making processes, performance evaluation, postprocessing as needed, supervised and unsupervised learning (or training), and cluster analysis. The large amount of data available makes remote-sensing data uniquely suitable for statistical pattern recognition. Signal processing is needed not only to reduce undesired noise and interference but also to extract desired information from the data, as well as to perform the preprocessing task for pattern recognition. Remote-sensing data considered include those from multispectral, hyperspectral, radar, optical, and infrared sensors. Statistical signal-processing methods, as used in remote sensing, include transform methods such as principal component analysis (PCA), independent component analysis (ICA), and factor analysis, and methods using higher-order statistics. This chapter is presented as a brief overview of statistical pattern recognition and statistical signal processing in remote sensing. The views and comments presented, however, are largely those of this author. The chapter introduces the pattern recognition and signal-processing topics dealt with in this book. The readers are highly recommended to refer to the book by Landgrebe [1] for remote-sensing pattern classification issues and the article by Duin and Tax [2] for a survey of statistical pattern recognition.
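As a small illustration of one of the transform methods mentioned above, the first principal component (PCA) can be found by power iteration on the sample covariance matrix. This is a generic sketch, not code from the chapter:

```python
def principal_component(data, iters=200):
    # Power iteration on the sample covariance matrix: returns the leading
    # eigenvector, i.e., the first principal component direction.
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(centered[i][a] * centered[i][b] for i in range(n)) / (n - 1)
            for b in range(d)] for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v
```

Projecting each pixel vector onto this direction yields the first principal-component band of a multispectral image.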


Signal and Image Processing for Remote Sensing


Although there are many applications of statistical pattern recognition, its theory has been developed only during the last half century. A list of some major theoretical developments includes the following:

- Formulation of pattern recognition as a Bayes decision theory problem [3]
- Nearest neighbor decision rules (NNDRs) and density estimation [4]
- Use of the Parzen density estimate in nonparametric pattern recognition [5]
- Leave-one-out method of error estimation [6]
- Use of statistical distance measures and error bounds in feature evaluation [7]
- Hidden Markov models as one way to deal with contextual information [8]
- Minimization of the perceptron criterion function [9]
- Fisher linear discriminant and multicategory generalizations [10]
- Link between backpropagation-trained neural networks and the Bayes discriminant [11]
- Cover's theorem on the separability of patterns [12]
- Unsupervised learning by decomposition of mixture densities [13]
- K-means algorithm [14]
- Self-organizing map (SOM) [15]
- Statistical learning theory and the VC dimension [16,17]
- Support vector machine for pattern recognition [17]
- Combining classifiers [18]
- Nonlinear mapping [19]
- Effect of finite sample size (e.g., [13])

In the above discussion, the role of artificial neural networks in statistical classification and clustering has been taken into account. The above list is clearly not complete and is quite subjective. However, these developments clearly have had a significant impact on information processing in remote sensing. We now examine briefly the performance measures in statistical pattern recognition.

- Error probability. This is the most popular measure, as the Bayes decision rule is optimum for minimum error probability. It is noted that an average classification accuracy was proposed by Wilkinson [20] for remote sensing.
- Ratio of interclass distance to within-class distance. This is most popular for discriminant analysis, which seeks to maximize such a ratio.
- Mean square error. This is most popular in error-correction learning and in neural networks.
- ROC (receiver operating characteristic) curve, which is a plot of the probability of correct decision versus the probability of false alarm, with other parameters given.

Other measures, like the error-reject tradeoff, are often used in character recognition.
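To make the first and last of these measures concrete, here is a small sketch (not from this chapter; the scores and labels are invented) that computes an empirical error probability and ROC points from classifier scores:

```python
# Sketch: empirical error probability and ROC points from classifier scores.
# The scores and labels below are invented for illustration.

def error_probability(scores, labels, threshold):
    """Fraction of samples misclassified by thresholding the score."""
    wrong = sum(1 for s, y in zip(scores, labels) if (s >= threshold) != (y == 1))
    return wrong / len(labels)

def roc_points(scores, labels, thresholds):
    """(P_false_alarm, P_detection) pairs, one per threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

print(error_probability(scores, labels, 0.5))        # 0.25: one false alarm, one miss
print(roc_points(scores, labels, [0.0, 0.5, 1.0]))   # [(1.0, 1.0), (0.25, 0.75), (0.0, 0.0)]
```

Sweeping the threshold traces the full ROC curve; the error-reject tradeoff mentioned above would additionally withhold decisions for scores near the threshold.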

2.2 Introduction to Statistical Pattern Recognition in Remote Sensing

Feature extraction and selection is still a basic problem in statistical pattern recognition for any application. Feature measurements constructed from the multiple bands of remote-sensing data as a vector are still the most commonly used in remote-sensing pattern recognition. Transform methods are useful to reduce the redundancy in vector measurements. Dimensionality reduction has been a particularly important topic in remote sensing in view of hyperspectral image data, which normally have several hundred spectral bands. The parametric classification rules include the Bayes or maximum likelihood decision rule and discriminant analysis. The nonparametric (or distribution-free) methods of classification include the NNDR and its modifications, and Parzen density estimation. In the early 1970s, the multivariate Gaussian assumption was most popular in the multispectral data classification problem. It was demonstrated that the empirical data follow the Gaussian distribution reasonably well [21]. Even with the use of new sensors and the expanded application of remote sensing, the Gaussian assumption remains a good approximation. Traditional multivariate analysis still plays a useful role in remote-sensing pattern recognition [22] and, because of the importance of the covariance matrix, methods that use unsupervised samples to "enhance" the data statistics have also been considered. Indeed, for good classification, the data statistics must be carefully examined. An example is synthetic aperture radar (SAR) image data. Chapter 13 presents a discussion of the physical and statistical characteristics of SAR image data. Without making use of the Gaussian assumption, the NNDR is the most popular nonparametric classification method. It works well even with a moderate-size data set and promises an error rate that is upper-bounded by twice the Bayes error rate. However, its performance is limited in remote-sensing data classification, while neural network-based classifiers can reach performance nearly equal to that of the Bayes classifier.
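The NNDR itself can be stated in a few lines. A minimal sketch, with invented two-band "pixel" features rather than real remote-sensing data:

```python
import math

# Nearest neighbor decision rule (NNDR): assign a pattern the class of its
# closest training sample. The training points below are invented.

def nearest_neighbor(train, x):
    """train: list of (feature_vector, class_label) pairs."""
    _, label = min(train, key=lambda p: math.dist(p[0], x))
    return label

train = [((0.0, 0.0), "water"), ((0.2, 0.1), "water"),
         ((1.0, 1.0), "crop"), ((0.9, 1.2), "crop")]

print(nearest_neighbor(train, (0.1, 0.1)))   # water
print(nearest_neighbor(train, (0.8, 0.9)))   # crop
```

With unlimited training data, the asymptotic error of this rule is at most twice the Bayes error, which is the bound cited above.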
Extensive study has been done in the statistical pattern recognition community to improve the performance of the NNDR. We would like to mention here the work of Grabowski et al. [23], which introduces the k-near surrounding neighbor (k-NSN) decision rule with application to remote-sensing data classification. Some unique problem areas of statistical pattern recognition in remote sensing are the use of contextual information and the "Hughes phenomenon." The use of the Markov random field model for contextual information is presented in Chapter 14 of this volume. While classification performance generally improves with increases in the feature dimension, the performance reaches a peak without a proportional increase in the training sample size, beyond which the performance degrades. This is the so-called "Hughes phenomenon." Methods to reduce this phenomenon are well presented in Ref. [1]. Data fusion is important in remote sensing as different sensors, which have different strengths, are often used. The subject is treated in Chapter 23 of this volume. Though the approach is not limited to statistical methodology [24], the approaches to combining classifiers in statistical pattern recognition and neural networks can be quite useful in providing effective utilization of information from different sensors or sources to achieve the best available classification performance. In this volume, Chapter 15 and Chapter 16 present two approaches to the statistical combining of classifiers. The recent development of the support vector machine appears to present an ultimate classifier that may provide the best classification performance. Indeed, the design of the classifier is fundamental to the classification performance. There is, however, a basic question: "Is there a best classifier?" [25]. The answer is "No" as, among other reasons, it is evident that the classification process is data-dependent. Theory and practice are often not consistent in pattern recognition.
The preprocessing and the feature extraction and selection are important and can influence the final classification performance. There are no clear steps to be taken in preprocessing, and optimal feature extraction and selection is still an unsolved problem. A single feature derived from a genetic algorithm may perform better than several original features. There is always a choice to be made between using a complex feature set followed by a simple classifier and a simple feature set followed by a complex classifier. In this volume, Chapter 27 deals with classification by the support vector machine. Among many other publications on the subject, Melgani and Bruzzone [26] provide an informative comparison of the performance of several support vector machines.

2.3 Using Self-Organizing Maps and Radial Basis Function Networks for Pixel Classification

In this section, some experimental results are presented to illustrate the importance of preprocessing before classification. The data set, which is now available in the IEEE Geoscience and Remote Sensing Society database, consists of 250 × 350 pixel images. They were acquired by two imaging sensors, a Daedalus 1268 Airborne Thematic Mapper (ATM) scanner and a P-, L-, and C-band, fully polarimetric NASA/JPL SAR sensor, over an agricultural area near the village of Feltwell, U.K. The original SAR images include nine channels. Figure 2.1 shows the original nine channels of image data. The radial basis function (RBF) neural network is used for classification [27]. However, preprocessing is performed by the SOM, which carries out preclustering. The weights of the SOM are chosen as centers for the RBF neurons. The RBF network has five output nodes for the five pattern classes in the image data considered (SAR and ATM images of an agricultural area). The weights of the n most-frequently-fired neurons, when each class was presented to the SOM, were separately taken as the centers for the 5 × n RBF neurons. The weights between the hidden-layer neurons and the output-layer neurons were computed by a procedure for generalized radial basis function networks. Pixel classification using the SOM alone (unsupervised) is 62.7% correct. Pixel classification using the RBF network alone (supervised) is 89.5% correct, at best. Pixel classification using both SOM and RBF is 95.2% correct. This result is better than the reported results on the same data set using RBF [28] at 90.5% correct or ICA-based features with the nearest neighbor classification rule [29] at 86% correct.
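The two-stage SOM-plus-RBF scheme can be sketched as follows. This is a toy illustration on synthetic two-class data, not the Feltwell experiment; the map size, kernel width, and learning schedule are invented choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class "pixel" data standing in for the SAR/ATM feature vectors.
X = np.vstack([rng.normal(0.0, 0.3, size=(100, 2)),
               rng.normal(1.5, 0.3, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Stage 1: a minimal 1-D SOM preclusters the data; its weight vectors
# then serve as the RBF centers, as in the scheme described above.
n_nodes = 6
W = rng.normal(0.75, 0.5, size=(n_nodes, 2))
n_steps = 2000
for t in range(n_steps):
    x = X[rng.integers(len(X))]
    winner = int(np.argmin(np.linalg.norm(W - x, axis=1)))
    lr = 0.5 * (1 - t / n_steps)                # decaying learning rate
    for j in range(n_nodes):
        h = np.exp(-((j - winner) ** 2) / 2.0)  # neighborhood function
        W[j] += lr * h * (x - W[j])

# Stage 2: RBF hidden layer centered on the SOM weights; the output
# weights are solved by linear least squares on the activations.
sigma = 0.5
def hidden(points):
    d2 = ((points[:, None, :] - W[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

H = hidden(X)
targets = np.eye(2)[y]                          # one output node per class
W_out, *_ = np.linalg.lstsq(H, targets, rcond=None)
accuracy = ((hidden(X) @ W_out).argmax(axis=1) == y).mean()
print(accuracy)
```

On this well-separated synthetic data the combined pipeline classifies nearly every point; the point of the sketch is only the division of labor, with unsupervised preclustering supplying the supervised network's centers.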

2.4 Introduction to Statistical Signal Processing in Remote Sensing

Signal and image processing is needed in remote-sensing information processing to reduce the noise and interference in the data, to extract the desired signal and image components, or to derive useful measurements for input to the classifier. The classification problem is, in fact, very closely linked to signal and image processing [30,31]. Transform methods have been most popular in signal and image processing [32]. Though the popular wavelet transform method for remote sensing is treated elsewhere [33], we have included Chapter 1 in this volume, which presents the popular Hilbert-Huang transform; Chapter 24, which deals with the use of the Hermite transform in multispectral image fusion; and Chapter 10, Chapter 19, and Chapter 20, which make use of the methods of ICA. Although there is a constant need for better sensors, signal-processing algorithms such as the one presented in Chapter 7 demonstrate well the role of Kalman filtering in weak-signal detection. Time series modeling as used in remote sensing is the subject of Chapter 8 and Chapter 21. Chapter 6 of this volume makes use of factor analysis.

Statistical Pattern Recognition and Signal Processing in Remote Sensing

FIGURE 2.1 A nine-channel SAR image data set (channels th-c-hh, th-c-hv, th-c-vv, th-l-hh, th-l-hv, th-l-vv, th-p-hh, th-p-hv, and th-p-vv).


In spite of the numerous efforts with transform methods, the basic method of PCA always has its useful role in remote sensing [34]. Signal decomposition and the use of high-order statistics can potentially offer new solutions to remote-sensing information processing problems. Considering the nonlinear nature of the signal and image processing problems, it is necessary to point out the important roles of artificial neural networks in signal processing and classification, as presented in Chapter 3, Chapter 11, and Chapter 12. A lot of effort has been made in the last two decades to derive effective features in signal classification through signal processing. Such efforts include about two dozen mathematical features for use in exploration seismic pattern recognition [35], multi-dimensional attribute analysis that includes both physically and mathematically significant features or attributes for seismic interpretation [36], time-domain, frequency-domain, and time-frequency-domain features extracted for transient signal analysis [37] and classification, and about a dozen features for active sonar classification [38]. Clearly, the feature extraction method for one type of signal cannot be transferred to other signals. Using a large number of features derived from signal processing is not desirable, as there is significant information overlap among features and the resulting feature selection process can be tedious. It has not been verified that features extracted from the time-frequency representation are more useful than features from time-domain and frequency-domain analysis alone. Ideally, a combination of a small set of physically significant and mathematically significant features should be used. Instead of looking for the optimal feature set, a small, but effective, feature set should be considered. 
It is doubtful that an optimal feature set for any given pattern recognition application can be developed in the near future in spite of many advances in signal and image processing.
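As an illustration of the PCA role noted above, the following sketch (with invented, highly correlated synthetic "bands") projects the data onto the leading eigenvector of its covariance matrix:

```python
import numpy as np

# Four highly correlated synthetic "bands"; PCA packs nearly all of the
# variance into the first principal component. Data are invented.
rng = np.random.default_rng(1)
base = rng.normal(size=(500, 1))
bands = np.hstack([base + 0.05 * rng.normal(size=(500, 1)) for _ in range(4)])

centered = bands - bands.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
order = np.argsort(eigvals)[::-1]               # descending variance
explained = eigvals[order] / eigvals.sum()

components = centered @ eigvecs[:, order[:1]]   # reduced 1-D features
print(explained[0])                             # first PC dominates
print(components.shape)                         # (500, 1)
```

The same eigendecomposition applied to hyperspectral data with hundreds of bands is one standard route to the dimensionality reduction discussed in Section 2.2.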

2.5 Conclusions

Remote-sensing sensors have been able to deliver abundant information [39]. The many advances in statistical pattern recognition and signal processing can be very useful in remote-sensing information processing, either to supplement the capability of sensors or to effectively utilize the enormous amount of sensor data. The potentials and opportunities of using statistical pattern recognition and signal processing in remote sensing are thus unlimited.

References

1. Landgrebe, D.A., Signal Theory Methods in Multispectral Remote Sensing, John Wiley & Sons, New York, 2003.
2. Duin, R.P.W. and Tax, D.M.J., Statistical pattern recognition, in Handbook of Pattern Recognition and Computer Vision, 3rd ed., Chen, C.H. and Wang, P.S.P., Eds., World Scientific Publishing, Singapore, Jan. 2005, Chap. 1.
3. Chow, C.K., An optimum character recognition system using decision functions, IEEE Trans. Electron. Comput., EC-6, 247–254, 1957.
4. Cover, T.M. and Hart, P.E., Nearest neighbor pattern classification, IEEE Trans. Inf. Theory, IT-13(1), 21–27, 1967.
5. Parzen, E., On estimation of a probability density function and mode, Ann. Math. Stat., 33(3), 1065–1076, 1962.
6. Lachenbruch, P.S. and Mickey, R.M., Estimation of error rates in discriminant analysis, Technometrics, 10, 1–11, 1968.
7. Kailath, T., The divergence and Bhattacharyya distance measures in signal selection, IEEE Trans. Commun. Technol., COM-15, 52–60, 1967.
8. Duda, R.O., Hart, P.E., and Stork, D.G., Pattern Classification, John Wiley & Sons, New York, 2003.
9. Ho, Y.C. and Kashyap, R.L., An algorithm for linear inequalities and its applications, IEEE Trans. Electron. Comput., EC-14(5), 683–688, 1965.
10. Duda, R.O. and Hart, P.E., Pattern Classification and Scene Analysis, John Wiley & Sons, New York, 1973.
11. Bishop, C., Neural Networks for Pattern Recognition, Oxford University Press, Oxford, 1995.
12. Cover, T.M., Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition, IEEE Trans. Electron. Comput., EC-14, 326–334, 1965.
13. Fukunaga, K., Introduction to Statistical Pattern Recognition, 2nd ed., Academic Press, New York, 1990.
14. MacQueen, J., Some methods for classification and analysis of multivariate observations, Proc. Fifth Berkeley Symp. Probab. Stat., 281–297, 1967.
15. Kohonen, T., Self-Organization and Associative Memory, 3rd ed., Springer-Verlag, Heidelberg, 1988.
16. Vapnik, V.N. and Chervonenkis, A.Ya., On the uniform convergence of relative frequencies of events to their probabilities, Theory Probab. Appl., 16(2), 264–280, 1971.
17. Vapnik, V.N., Statistical Learning Theory, John Wiley & Sons, New York, 1998.
18. Kittler, J., Hatef, M., Duin, R.P.W., and Matas, J., On combining classifiers, IEEE Trans. Pattern Anal. Mach. Intell., 20(3), 226–239, 1998.
19. Sammon, J.W., Jr., A nonlinear mapping for data structure analysis, IEEE Trans. Comput., C-18, 401–409, 1969.
20. Wilkinson, G., Results and implications of a study of fifteen years of satellite image classification experiments, IEEE Trans. Geosci. Remote Sens., 43, 433–440, 2005.
21. Fu, K.S., Application of pattern recognition to remote sensing, in Applications of Pattern Recognition, Fu, K.S., Ed., CRC Press, Boca Raton, FL, 1982, Chap. 4.
22. Benediktsson, J.A., Statistical and neural network pattern recognition methods for remote sensing applications, in Handbook of Pattern Recognition and Computer Vision, 2nd ed., Chen, C.H. et al., Eds., World Scientific Publishing, Singapore, 1999, Chap. 3.2.
23. Grabowski, S., Jozwik, A., and Chen, C.H., Nearest neighbor decision rule for pixel classification in remote sensing, in Frontiers of Remote Sensing Information Processing, Chen, C.H., Ed., World Scientific Publishing, Singapore, 2003, Chap. 13.
24. Stevens, M.R. and Snorrason, M., Multisensor automatic target segmentation, in Frontiers of Remote Sensing Information Processing, Chen, C.H., Ed., World Scientific Publishing, Singapore, 2003, Chap. 15.
25. Richards, J., Is there a best classifier?, Proc. SPIE, 5982, 2005.
26. Melgani, F. and Bruzzone, L., Classification of hyperspectral remote sensing images with support vector machines, IEEE Trans. Geosci. Remote Sens., 42, 1778–1790, 2004.
27. Chen, C.H. and Shrestha, B., Classification of multi-sensor remote sensing images using self-organizing feature maps and radial basis function networks, Proc. IGARSS, Hawaii, 2000.
28. Bruzzone, L. and Prieto, D., A technique for the selection of kernel-function parameters in RBF neural networks for classification of remote sensing images, IEEE Trans. Geosci. Remote Sens., 37(2), 1179–1184, 1999.
29. Zhang, X. and Chen, C.H., Independent component analysis by using joint cumulants and its application to remote sensing images, J. VLSI Signal Process. Syst., 37(2/3), 2004.
30. Chen, C.H., Ed., Digital Waveform Processing and Recognition, CRC Press, Boca Raton, FL, 1982.
31. Chen, C.H., Ed., Signal Processing Handbook, Marcel Dekker, New York, 1988.
32. Chen, C.H., Transform methods in remote sensing information processing, in Frontiers of Remote Sensing Information Processing, Chen, C.H., Ed., World Scientific Publishing, Singapore, 2003, Chap. 2.
33. Chen, C.H., Ed., Wavelet analysis and applications, in Frontiers of Remote Sensing Information Processing, World Scientific Publishing, Singapore, 2003, Chaps. 7–9, pp. 139–224.
34. Lee, J.B., Woodyatt, A.S., and Berman, M., Enhancement of high spectral resolution remote sensing data by a noise-adjusted principal component transform, IEEE Trans. Geosci. Remote Sens., 28, 295–304, 1990.
35. Li, Y., Bian, Z., Yan, P., and Chang, T., Pattern recognition in geophysical signal processing and interpretation, in Handbook of Pattern Recognition and Computer Vision, 1st ed., Chen, C.H. et al., Eds., World Scientific Publishing, Singapore, 1993, Chap. 3.2.
36. Olson, R.G., Signal transient analysis and classification techniques, in Handbook of Pattern Recognition and Computer Vision, 1st ed., Chen, C.H., Ed., World Scientific Publishing, Singapore, 1993, Chap. 3.3.
37. Justice, J.H., Hawkins, D.J., and Wong, G., Multidimensional attribute analysis and pattern recognition for seismic interpretation, Pattern Recognition, 18(6), 391–407, 1985.
38. Chen, C.H., Neural networks in active sonar classification, Neural Networks in Ocean Environments, IEEE conference, Washington, DC, 1991.
39. Richards, J., Remote sensing sensors: capabilities and information processing requirements, in Frontiers of Remote Sensing Information Processing, Chen, C.H., Ed., World Scientific Publishing, Singapore, 2003, Chap. 1.

3 A Universal Neural Network–Based Infrasound Event Classifier

Fredric M. Ham and Ranjan Acharyya

CONTENTS
3.1 Overview of Infrasound and Why Classify Infrasound Events?
3.2 Neural Networks for Infrasound Classification
3.3 Details of the Approach
3.3.1 Infrasound Data Collected for Training and Testing
3.3.2 Radial Basis Function Neural Networks
3.4 Data Preprocessing
3.4.1 Noise Filtering
3.4.2 Feature Extraction Process
3.4.3 Useful Definitions
3.4.4 Selection Process for the Optimal Number of Feature Vector Components
3.4.5 Optimal Output Threshold Values and 3-D ROC Curves
3.5 Simulation Results
3.6 Conclusions
Acknowledgments
References

3.1 Overview of Infrasound and Why Classify Infrasound Events?

Infrasound is a longitudinal pressure wave [1–4]. The characteristics of these waves are similar to audible acoustic waves, but the frequency range is far below what the human ear can detect. The typical frequency range is from 0.01 to 10 Hz (Figure 3.1). Nature is an incredible creator of infrasonic signals that can emanate from sources such as volcano eruptions, earthquakes, severe weather, tsunamis, meteors (bolides), gravity waves, microbaroms (infrasound radiated from ocean waves), surf, mountain ranges (mountain associated waves), avalanches, and auroral waves, to name a few. Infrasound can also result from man-made events such as mining blasts, the space shuttle, high-speed aircraft, artillery fire, rockets, vehicles, and nuclear events. Because of relatively low atmospheric absorption at low frequencies, infrasound waves can travel long distances in the Earth's atmosphere and can be detected with sensitive ground-based sensors. An integral part of the comprehensive nuclear test ban treaty (CTBT) international monitoring system (IMS) is an infrasound network system [3]. The goal is to have 60 infrasound arrays operational worldwide over the next several years. The main objective of the infrasound monitoring system is the detection and verification, localization, and classification of nuclear explosions as well as other infrasonic signals-of-interest (SOI). Detection refers to the problem of detecting an SOI in the presence of all other unwanted sources and noises. Localization deals with finding the origin of a source, and classification deals with the discrimination of different infrasound events of interest. This chapter concentrates on the classification part only.

FIGURE 3.1 Infrasound spectrum (0.01 to 10 Hz), indicating approximate bands for impulsive events, 1-kiloton and 1-megaton yields, volcano events, microbaroms, gravity waves, bolides, and mountain associated waves.

3.2 Neural Networks for Infrasound Classification

Humans excel at the task of classifying patterns. We all perform this task on a daily basis. Do we wear the checkered or the striped shirt today? For example, we will probably select from a group of checkered shirts versus a group of striped shirts. The grouping process is carried out (probably at a near-subconscious level) by our ability to discriminate among all shirts in our closet: we group the striped ones in the striped class and the checkered ones in the checkered class (that is, without physically moving them around in the closet, only in our minds). However, if the closet is dimly lit, this creates a potential problem and diminishes our ability to make the right selection (that is, we are working in a "noisy" environment). In the case of using an artificial neural network for classification of patterns (or various "events"), the same problem exists with noise. Noise is everywhere. In general, a common problem associated with event classification (or detection and localization, for that matter) is environmental noise. In the infrasound problem, many times the distance between the source and the sensors is relatively large (as opposed to regional infrasonic phenomena). Increases in the distance between sources and sensors heighten the environmental dependence of the signals. For example, the signal of an infrasonic event that takes place near an ocean may have significantly different characteristics as compared to the same event occurring in a desert. A major contributor of noise for the signal near an ocean is microbaroms. As mentioned above, microbaroms are generated in the air by large ocean waves. One important characteristic of neural networks is their noise rejection capability [5]. This, and several other attributes, makes them highly desirable to use as classifiers.


3.3 Details of the Approach

Our approach to classifying infrasound events is based on a parallel bank neural network structure [6–10]. The basic architecture is shown in Figure 3.2. There are several reasons for using such an architecture; one very important advantage of dedicating one module to the classification of one event class is that the architecture is fault tolerant (i.e., if one module fails, the rest of the individual classifiers will continue to function). Moreover, the overall performance of the classifier is enhanced when the parallel bank neural network classifier (PBNNC) architecture is used. The individual banks (or modules) within the classifier architecture are radial basis function neural networks (RBF NNs) [5], and each classifier has its own dedicated preprocessor. Customized feature vectors are computed optimally for each classifier and are based on cepstral coefficients and a subset of their associated derivatives (differences) [11]; this will be explained in detail later. Each neural module is trained to classify one and only one class; the module responsible for a given class is also trained not to recognize all the other classes (negative reinforcement). During the training process, the output is set to a "1" for the correct class and a "0" for all signals associated with all other classes. When the training process is complete, the final output threshold of each neural module is set to an optimal value based on a three-dimensional receiver operating characteristic (3-D ROC) curve (see Figure 3.2).
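A sketch of cepstral-coefficient features with first differences, in the spirit of the preprocessing just described; the synthetic signal, 50-Hz rate, frame length, and coefficient count are illustrative assumptions, not the chapter's actual settings:

```python
import numpy as np

# Real-cepstrum features with first differences (deltas) for framed data.
# The synthetic signal, 50-Hz rate, 1-s frames, and 12 coefficients are
# illustrative assumptions only.
rng = np.random.default_rng(2)
fs = 50.0                                   # Hz, a plausible infrasound rate
t = np.arange(0, 4, 1 / fs)
x = np.sin(2 * np.pi * 0.5 * t) + 0.1 * rng.normal(size=t.size)

def real_cepstrum(frame, n_coeff):
    spectrum = np.abs(np.fft.rfft(frame)) + 1e-12   # avoid log(0)
    return np.fft.irfft(np.log(spectrum))[:n_coeff]

frames = x.reshape(4, -1)                   # four 1-second frames
ceps = np.array([real_cepstrum(f, 12) for f in frames])
deltas = np.diff(ceps, axis=0)              # frame-to-frame differences
feature = np.hstack([ceps[1:], deltas])     # 24-dim rows: cepstra + deltas

print(ceps.shape, deltas.shape, feature.shape)   # (4, 12) (3, 12) (3, 24)
```

Stacking the cepstra with their differences gives each frame a feature vector carrying both spectral shape and its short-term dynamics.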

FIGURE 3.2 Basic parallel bank neural network classifier (PBNNC) architecture: six preprocessor and neural-network module pairs, one per infrasound class, each with binary (1/0) output and an optimum threshold set by its ROC curve.
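The one-module-per-class organization with per-module thresholds can be sketched abstractly as below; the scoring functions, prototypes, and thresholds are invented stand-ins for the trained RBF modules:

```python
# One module per event class, each with its own output threshold, as in the
# PBNNC architecture above. The scoring functions and thresholds here are
# invented stand-ins, not trained RBF networks.

def make_module(prototype):
    # Hypothetical module: responds strongly near its class prototype.
    return lambda x: 1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(x, prototype)))

modules = {
    "vehicle": (make_module((0.0, 0.0)), 0.5),   # (scorer, ROC-chosen threshold)
    "rocket":  (make_module((1.0, 1.0)), 0.5),
    "shuttle": (make_module((2.0, 0.0)), 0.5),
}

def classify(x):
    fired = {name: score(x) for name, (score, thr) in modules.items()
             if score(x) >= thr}
    return max(fired, key=fired.get) if fired else "unknown"

print(classify((0.1, 0.0)))   # vehicle
print(classify((5.0, 5.0)))   # unknown: no module exceeds its threshold
```

Deleting any one entry from `modules` leaves the others functioning, which is the fault-tolerance property argued for above.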

3.3.1 Infrasound Data Collected for Training and Testing

The data used for training and testing the individual networks were obtained from multiple infrasound arrays located in different geographical regions and with different geometries. The six infrasound classes used in this study are shown in Table 3.1, and the various array geometries are shown in Figure 3.3(a) through Figure 3.3(e) [12,13]. Table 3.2 shows the various classes, along with the array numbers where the data were collected and the associated sampling frequencies.

3.3.2 Radial Basis Function Neural Networks

As previously mentioned, each of the neural network modules in Figure 3.2 is an RBF NN. A brief overview of RBF NNs is given here. This is not meant to be an exhaustive discourse on the subject, but only an introduction; more details can be found in Refs. [5,14]. Earlier work on the RBF NN was carried out for handling multivariate interpolation problems [15,16]. More recently, they have been used for probability density estimation [17–19] and approximation of smooth multivariate functions [20]. In principle, the RBF NN adjusts its weights so that the error between the actual and the desired responses is minimized relative to an optimization criterion through a defined learning algorithm [5]. Once trained, the network performs interpolation in the output vector space, hence the generalization property. Radial basis functions are one type of positive-definite kernel that is extensively used for multivariate interpolation and approximation. Radial basis functions can be used for problems of any dimension, and the smoothness of the interpolants can be achieved to any desirable extent. Moreover, the structures of the interpolants are very simple. However, several challenges go along with the aforementioned attributes of RBF NNs. For example, many times an ill-conditioned linear system must be solved, and the complexity in both time and space increases with the number of interpolation points. But these types of problems can be overcome. The interpolation problem may be formulated as follows. Assume M distinct data points X = {x_1, . . . , x_M}, and assume the data set is bounded in a region V (for a specific class). Each observed data point x ∈ R^u (u corresponds to the dimension of the input space) may correspond to some function of x. Mathematically, the interpolation problem may be stated as follows. Given a set of M points {x_i ∈ R^u | i = 1, 2, . . . , M} and a corresponding set of M real numbers {d_i ∈ R | i = 1, 2, . . . , M} (the desired outputs or targets), find a function F: R^u → R that satisfies the interpolation condition

F(x_i) = d_i,  i = 1, 2, . . . , M    (3.1)

TABLE 3.1 Infrasound Classes Used for Training and Testing

Class Number  Event                  No. SOI (n = 574)  No. SOI Used for Training (n = 351)  No. SOI Used for Testing (n = 223)
1             Vehicle                8                  4                                    4
2             Artillery fire (ARTY)  264                132                                  132
3             Jet                    12                 8                                    4
4             Missile                24                 16                                   8
5             Rocket                 70                 45                                   25
6             Shuttle                196                146                                  50

FIGURE 3.3 Five different array geometries, panels (a) through (e): arrays BP1, BP2, K8201, K8202, and K8203, with sensor positions given in meters (XDIS, YDIS).

TABLE 3.2 Array Numbers Associated with the Event Classes and the Sampling Frequencies Used to Collect the Data

Class Number  Event                  Array                         Sampling Frequency, Hz
1             Vehicle                K8201                         100
2             Artillery fire (ARTY)  K8201 (Sites 1 and 2); K8203  100 (Site 1), 50 (Site 2); 50
3             Jet                    K8201                         50
4             Missile                K8201; K8203                  50; 50
5             Rocket                 BP1; BP2                      100; 100
6             Shuttle                BP2; BP103(a)                 100; 50

(a) Array geometry not available.

Thus, all the points must pass through the interpolating surface. A radial basis function interpolant is a special interpolating function of the form

    F(x) = Σ_{i=1}^{M} wi φi(||x − xi||_2)    (3.2)

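Equations 3.1 and 3.2 together define a linear system Φw = d for the weights. The following is a minimal NumPy sketch (not from the chapter; the function names, the Gaussian basis, and σ = 1 are assumptions) showing that the resulting surface passes through every data point:

```python
import numpy as np

def rbf_weights(X, d, sigma=1.0):
    """Solve the interpolation system of Eq. 3.1-3.2 for the weights w_i,
    using Gaussian basis functions centered at the data points."""
    diff = X[:, None, :] - X[None, :, :]      # pairwise differences
    r2 = (diff ** 2).sum(axis=-1)             # squared Euclidean distances
    Phi = np.exp(-r2 / sigma ** 2)            # interpolation matrix
    # Phi may be ill-conditioned for closely spaced points (noted in the text)
    return np.linalg.solve(Phi, d)

def rbf_eval(x, X, w, sigma=1.0):
    """Evaluate F(x) = sum_i w_i * phi(||x - x_i||_2) of Eq. 3.2."""
    r2 = ((X - x) ** 2).sum(axis=1)
    return np.dot(w, np.exp(-r2 / sigma ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(10, 2))          # M = 10 points, u = 2
d = np.sin(X[:, 0]) + X[:, 1] ** 2            # arbitrary targets d_i
w = rbf_weights(X, d)
# The surface passes through every data point: F(x_i) = d_i
err = max(abs(rbf_eval(x, X, w) - di) for x, di in zip(X, d))
```

The interpolation condition is satisfied to machine precision as long as the kernel matrix stays reasonably well conditioned, which is the ill-conditioning caveat raised in the text.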
where φ(·) is known as the radial basis function and ||·||_2 denotes the Euclidean norm. In general, the data points xi are the centers of the radial basis functions and are frequently written as ci. One of the problems encountered when attempting to fit a function to data points is over-fitting, that is, the value of M is too large. Generally speaking, however, this is less of a problem with the RBF NN than it is with, for example, a multi-layer perceptron trained by backpropagation [5]. The RBF NN attempts to construct the hypersurface for a particular problem when given a limited number of data points. Let us take another point of view concerning how an RBF NN performs its construction of a hypersurface. Regularization theory [5,14] is applied to the construction of the hypersurface; a geometrical explanation follows. Consider a set of input data obtained from several events from a single class. The input data may be temporal signals or defined features obtained from these signals using an appropriate transformation. The input data are transformed by a nonlinear function in the hidden layer of the RBF NN, and each event then corresponds to a point in the feature space. Figure 3.4 depicts a two-dimensional (2-D) feature set, that is, the dimension of the output of the hidden layer in the RBF NN is two. In Figure 3.4, ''(a)'', ''(b)'', and ''(c)'' correspond to three separate events. The purpose here is to construct a surface (shown by the dotted line in Figure 3.4) such that the dotted region encompasses events of the same class. If the RBF network is to classify four different classes, there must be four different regions (four dotted contours), one for each class. Ideally, each of these regions should be separate with no overlap. However, because there is always a limited amount of observed data, perfect reconstruction of the hypersurface is not possible and it is inevitable that overlap will occur.
To overcome this problem it is necessary to incorporate global information from V (i.e., the class space) in approximating the unknown hyperspace. One choice is to introduce a smoothness constraint on the targets. Mathematical details will not be given here, but for an in-depth development see Refs. [5,14]. Let us now turn our attention to the actual RBF NN architecture and how the network is trained. In its basic form, the RBF NN has three layers: an input layer, one hidden

FIGURE 3.4 Example of a two-dimensional feature set (axes: Feature 1 and Feature 2; (a), (b), and (c) correspond to three separate events).

layer, and one output layer. Referring to Figure 3.5, the source nodes (or the input components) make up the input layer. The hidden layer performs a nonlinear transformation of the input to the network (the radial basis functions residing in the hidden layer perform this transformation) and is generally of a higher dimension than the input. This nonlinear transformation of the input in the hidden layer may be viewed as a basis for the construction of the input in the transformed space; thus the term radial basis function. In Figure 3.5, the output of the RBF NN (i.e., at the output layer) is calculated according to

    yi = fi(x) = Σ_{k=1}^{N} wik φk(x, ck) = Σ_{k=1}^{N} wik φk(||x − ck||_2),  i = 1, 2, . . . , m (no. of outputs)    (3.3)

where x ∈ R^(u×1) is the input vector, φk(·) is an (RBF) function that maps R⁺ (the set of all positive real numbers) to R (the field of real numbers), ||·||_2 denotes the Euclidean norm, wik are the weights in the output layer, N is the number of neurons in the hidden layer, and ck ∈ R^(u×1) are the RBF centers that are selected based on the input vector space. The Euclidean distance between the center of each neuron in the hidden layer and the input to the network is computed. The output of a neuron in the hidden layer is a nonlinear

FIGURE 3.5 RBF NN architecture (input layer x1, . . . , xu; hidden layer φ1, . . . , φN; output layer y1, . . . , ym with weight matrix W^T ∈ R^(m×N)).

function of this distance, and the output of the network is computed as a weighted sum of the hidden layer outputs. The functional form of the radial basis function, φk(·), can be any of the following:

- Linear function: φ(x) = x
- Cubic approximation: φ(x) = x³
- Thin-plate-spline function: φ(x) = x² ln(x)
- Gaussian function: φ(x) = exp(−x²/σ²)
- Multi-quadratic function: φ(x) = √(x² + σ²)
- Inverse multi-quadratic function: φ(x) = 1/√(x² + σ²)

The parameter σ controls the ''width'' of the RBF and is commonly referred to as the spread parameter. In many practical applications the Gaussian RBF is used. The centers, ck, of the Gaussian functions are points used to perform a sampling of the input vector space. In general, the centers form a subset of the input data.

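The forward pass of Equation 3.3 with a Gaussian basis can be sketched in a few lines of NumPy. This is a hypothetical illustration, not the chapter's MATLAB implementation; the sizes N, u, and m are arbitrary:

```python
import numpy as np

def rbf_forward(x, centers, W, sigma):
    """Forward pass of a single-hidden-layer RBF network (Eq. 3.3).

    x       : (u,) input vector
    centers : (N, u) hidden-layer centers c_k (typically a subset of inputs)
    W       : (m, N) output-layer weight matrix, rows w_i
    sigma   : Gaussian spread parameter
    """
    r = np.linalg.norm(centers - x, axis=1)   # distance of x to every center
    phi = np.exp(-(r ** 2) / sigma ** 2)      # Gaussian hidden-layer outputs
    return W @ phi                            # weighted sums at the output layer

rng = np.random.default_rng(1)
centers = rng.normal(size=(5, 3))             # N = 5 centers, u = 3 inputs
W = rng.normal(size=(2, 5))                   # m = 2 output neurons
y = rbf_forward(rng.normal(size=3), centers, W, sigma=0.7)
```

Note that an input lying exactly on a center drives that neuron's output to 1, the maximum of the Gaussian basis, which is why the spread parameter governs how far an input can stray from a center and still activate it.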
3.4 Data Preprocessing

3.4.1 Noise Filtering

Microbaroms, as previously defined, are a persistently present source of noise that resides in most collected infrasound signals [21-23]. Microbaroms are a class of infrasonic signals characterized by narrow-band, nearly sinusoidal waveforms, with a period between 6 and 8 sec. These signals can be generated by marine storms through a nonlinear interaction of surface waves [24]. The frequency content of the microbaroms often coincides with that of small-yield nuclear explosions. This can be bothersome in many applications; however, simple band-pass filtering can alleviate the problem in many cases. Therefore, a band-pass filter with a pass band between 1 and 49 Hz (for signals sampled at 100 Hz) is used here to eliminate the effects of the microbaroms. Figure 3.6 shows how band-pass filtering can be used to eliminate the microbaroms problem.

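One minimal way to realize the 1-49 Hz band-pass described above is to zero the out-of-band FFT bins; this crude sketch (an assumption; the chapter does not specify the filter design, and an IIR or FIR design such as a Butterworth filter would be typical in practice) removes a synthetic microbarom-band tone while leaving a 10 Hz signal of interest intact:

```python
import numpy as np

def bandpass_fft(y, fs, f_lo=1.0, f_hi=49.0):
    """Crude FFT band-pass: zero every spectral bin outside [f_lo, f_hi] Hz."""
    Y = np.fft.rfft(y)
    f = np.fft.rfftfreq(len(y), d=1.0 / fs)
    Y[(f < f_lo) | (f > f_hi)] = 0.0          # suppress microbarom band (< 1 Hz)
    return np.fft.irfft(Y, n=len(y))

fs = 100.0                                     # sampling frequency, Hz
t = np.arange(2000) / fs                       # 20 s record
microbarom = np.sin(2 * np.pi * 0.15 * t)      # ~6.7 s period interference
signal = np.sin(2 * np.pi * 10.0 * t)          # signal of interest
filtered = bandpass_fft(microbarom + signal, fs)
```

Because both tones fall exactly on FFT bins in this toy record, the microbarom component is removed completely; with real data the sharp brick-wall cutoff would cause ringing, which is one reason a conventional filter design is preferable.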
3.4.2 Feature Extraction Process

Depicted in each of the six graphs in Figure 3.7 is a collection of eight signals from each class, that is, yij(t) for i = 1, 2, . . . , 6 (classes) and j = 1, 2, . . . , 8 (number of signals) (see Table 3.1 for the total number of signals in each class). A feature extraction process is desired that will capture the salient features of the signals in each class and at the same time be invariant relative to the array geometry, the geographical location of the array, the sampling frequency, and the length of the time window. The overall performance of the classifier is contingent on the data that is used to train the neural network in each of the six modules shown in Figure 3.2. Moreover, the neural network's ability to distinguish between the various events (presented to the neural networks as feature vectors) depends on the distinctiveness of the features between the classes. However, within each class it is desirable to have the feature vectors as similar to each other as possible. There are two major questions to be answered: (1) What will cause the signals in one class to have markedly different characteristics? (2) What can be done to minimize these

FIGURE 3.6 Results of band-pass filtering to eliminate the effects of microbaroms (an artillery signal): time-domain amplitude and spectral magnitude, before and after filtering.

differences and achieve uniformity within a class and distinctively different feature vector characteristics between classes? The answer to the first question is quite simple: noise. This can be noise associated with the sensors, the data acquisition equipment, or other unwanted signals that are not of interest. The answer to the second question is also quite simple (once you know the answer): use a feature extraction process based on computed cepstral coefficients and a subset of their associated derivatives (differences) [10,11,25-28]. As mentioned in Section 3.3, each classifier has its own dedicated preprocessor (see Figure 3.2). Customized feature vectors are computed optimally for each classifier (or neural module) and are based on the aforementioned cepstral coefficients and a subset of their associated derivatives (or differences). The preprocessing procedure is as follows. Each time-domain signal is first normalized, and then its mean value is computed and removed. Next, the power spectral density (PSD) is calculated for each signal, which is a mixture of the desired component and possibly other unwanted signals and noise. Therefore, when the PSDs are computed for a set of signals in a defined class, there will be spectral components associated with noise and other unwanted signals that need to be suppressed. This can be systematically accomplished by first computing the average PSD (i.e., PSD_avg) over the suite of PSDs for a particular class. The spectral components of PSD_avg are defined as mi for i = 1, 2, . . . . The maximum spectral component, m_max, of PSD_avg is then determined. This is considered the dominant spectral component within a particular class, and its value is used to suppress selected components in the resident PSDs for any particular class according to the following rule:

    if mi > ε1·m_max (typically ε1 = 0.001), then mi ← mi; else mi ← ε2 (typically ε2 = 0.00001)

FIGURE 3.7 (See color insert following page 302.) Raw time-domain signals (eight per class) for the six classes: (a) vehicle, (b) missile, (c) artillery, (d) rocket, (e) jet, and (f) shuttle.

To some extent, this will minimize the effects of any unwanted components that may reside in the signals and at the same time minimize the effects of noise. However, another step can be taken to further minimize the effects of any unwanted signals and noise that may reside in the data. This is based on a minimum variance criterion applied to the spectral components of the PSDs in a particular class after the previously described step is completed. The second step is carried out by taking the first 90% of the spectral components that are rank-ordered according to the smallest variance. The rest of the components


in the power spectral densities within a particular class are set to a small value, ε3 (typically 0.00001). Therefore, the number of spectral components greater than ε3 dictates the number of components in the cepstral domain (i.e., the number of cepstral coefficients and associated differences). Depending on the class, the number of coefficients and differences will vary; for example, in the simulations that were run, the largest number of components was 2401 (artillery class) and the smallest was 543 (vehicle class). Next, the mel-frequency scaling step is carried out with defined values for a and b [10]; then the inverse discrete cosine transform is taken and the derivatives (differences) are computed. From this set of computed cepstral coefficients and differences, it is desired to select those components that will constitute a feature vector that is consistent within a particular class, that is, with minimal variation among similar components across the suite of feature vectors. So the approach taken here is to think in terms of minimum variance of these similar components within the feature set. Recall that the time-domain infrasound signals are assumed to be band-pass filtered to remove any effects of microbaroms, as described previously. For each discrete-time infrasound signal, y(k), where k is the discrete time index (an integer), the specific preprocessing steps are (dropping the time dependence k):

(1) Normalize: divide each sample of the signal y(k) by the absolute value of the maximum amplitude, |y_max|, and also by the square root of the computed variance of the signal, σy², and then remove the mean:

    y ← y / (|y_max|·σy)    (3.4)

    y ← y − mean(y)    (3.5)

(2) Compute the PSD, S_yy(kω), of the signal y:

    S_yy(kω) = Σ_{τ=0}^{∞} R_yy(τ) e^(−j·kω·τ)    (3.6)

where R_yy(·) is the autocorrelation of the infrasound signal y.

(3) Find the average of the entire set of PSDs in the class, i.e., PSD_avg.

(4) Retain only those spectral components whose contributions will maximize the overall performance of the global classifier:

    if mi > ε1·m_max (typically ε1 = 0.001), then mi ← mi; else mi ← ε2 (typically ε2 = 0.00001)

(5) Compute the variances of the components selected in Step (4). Then take the first 90% of the spectral components, rank-ordered according to the smallest variance. Set the remaining components to a small value, i.e., ε3 (typically 0.00001).

(6) Apply mel-frequency scaling to S_yy(kω):

    S_mel(kω) = a·log_e[b·S_yy(kω)]    (3.7)

where a = 11.25 and b = 0.03.

(7) Take the inverse discrete cosine transform:

    x_mel(n) = (1/N) Σ_{kω=0}^{N−1} S_mel(kω) cos(2π·kω·n/N),  for n = 0, 1, 2, . . . , N − 1    (3.8)

(8) Take the consecutive differences of the sequence x_mel(n) to obtain x′_mel(n).

(9) Concatenate the sequence of differences, x′_mel(n), with the cepstral coefficient sequence, x_mel(n), to form the augmented sequence:

    x^a_mel = [x′_mel(i) | x_mel(j)]    (3.9)

where i and j are determined experimentally. As mentioned previously, i = 400 and j = 600.

(10) Take the absolute value of the elements in the sequence x^a_mel, yielding:

    x^a_mel,abs = |x^a_mel|    (3.10)

(11) Take the log_e of x^a_mel,abs from the previous step to give:

    x^a_mel,abs,log = log_e[x^a_mel,abs]    (3.11)

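For a single signal, the per-signal portion of the 11 steps can be sketched as follows. This is a hedged illustration: the class-average PSD (Step 3) and the 90% minimum-variance pruning (Step 5) require a whole class of signals and are omitted, and the index values i and j are toy values rather than the chapter's 400 and 600:

```python
import numpy as np

def extract_features(y, eps1=0.001, eps2=1e-5, eps3=1e-5, a=11.25, b=0.03,
                     i=8, j=8):
    """Single-signal sketch of the cepstral feature extraction steps."""
    # (1) normalize by |y_max| and sigma_y, then remove the mean (Eq. 3.4-3.5)
    y = y / (np.abs(y).max() * y.std())
    y = y - y.mean()
    # (2) PSD via the periodogram (one realization of Eq. 3.6)
    S = np.abs(np.fft.rfft(y)) ** 2 / len(y)
    # (4) suppress components far below the dominant one
    S = np.where(S > eps1 * S.max(), S, eps2)
    # (6) mel-style log scaling (Eq. 3.7)
    S_mel = a * np.log(b * S)
    # (7) inverse discrete cosine transform (Eq. 3.8)
    N = len(S_mel)
    n = np.arange(N)
    x_mel = (S_mel[None, :] * np.cos(2 * np.pi * np.outer(n, n) / N)).sum(1) / N
    # (8)-(9) consecutive differences, concatenated with the coefficients
    x_aug = np.concatenate([np.diff(x_mel)[:i], x_mel[:j]])        # Eq. 3.9
    # (10)-(11) absolute value, then natural log (a small offset guards
    # against log(0); an assumption, not stated in the chapter)
    return np.log(np.abs(x_aug) + eps3)

rng = np.random.default_rng(2)
features = extract_features(rng.normal(size=512))
```

With i = j = 8 the sketch yields a 16-element feature vector; the chapter's choices (i = 400, j = 600, later reduced to 34 selected components) follow from the optimization described in the next sections.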
Applying this 11-step feature extraction process to the infrasound signals in the six different classes results in the feature vectors shown in Figure 3.8. The length of each feature vector is 34; this will be explained in the next section. If these sets of feature vectors are compared with their time-domain counterparts (see Figure 3.7), it is obvious that the feature extraction process produces feature vectors that are much more consistent than the time-domain signals. Moreover, comparing the feature vectors between classes reveals that the different sets of feature vectors are markedly distinct. This should result in improved classification performance.

3.4.3 Useful Definitions

Before we go on, let us define some useful quantities that apply to the assessment of classifier performance. The confusion matrix [29] for a two-class classifier is shown in Table 3.3. In Table 3.3 we have the following:

p: number of correct predictions for an occurrence that is positive
q: number of incorrect predictions for an occurrence that is positive (a positive case predicted negative)
r: number of incorrect predictions for an occurrence that is negative (a negative case predicted positive)
s: number of correct predictions for an occurrence that is negative

With this, the correct classification rate (CCR) is defined as

    CCR = (No. correct predictions − No. multiple classifications) / No. predictions
        = (p + s − No. multiple classifications) / (p + q + r + s)    (3.12)

FIGURE 3.8 (See color insert following page 302.) Feature sets for the six different classes: (a) vehicle, (b) missile, (c) artillery fire, (d) rocket, (e) jet, and (f) shuttle (feature amplitude versus feature number).

Multiple classifications refer to more than one of the neural modules showing a ''positive'' at the output of its RBF NN, indicating that the input to the global classifier belongs to more than one class (whether this is true or not). So there could be double, triple, quadruple, etc., classifications for one event. The accuracy (ACC) is given by

    ACC = No. correct predictions / No. predictions = (p + s) / (p + q + r + s)    (3.13)

TABLE 3.3
Confusion Matrix for a Two-Class Classifier

                        Predicted Value
Actual Value      Positive      Negative
Positive          p             q
Negative          r             s

As seen from Equation 3.12 and Equation 3.13, if multiple classifications occur, the CCR is a more conservative performance measure than the ACC; however, if no multiple classifications occur, CCR = ACC. The true positive (TP) rate is the proportion of positive cases that are correctly identified. It is computed as

    TP = p / (p + q)    (3.14)

The false positive (FP) rate is the proportion of negative cases that are incorrectly classified as positive occurrences. It is computed as

    FP = r / (r + s)    (3.15)

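The quantities in Equations 3.12 through 3.15 can be computed directly from the confusion counts. A small sketch (the function name and the example counts are illustrative, not from the chapter):

```python
def classifier_metrics(p, q, r, s, n_multiple=0):
    """CCR, ACC, TP, and FP rates from two-class confusion counts.

    p, q : actual-positive cases predicted positive / negative
    r, s : actual-negative cases predicted positive / negative
    n_multiple : events flagged by more than one neural module
    """
    total = p + q + r + s
    ccr = (p + s - n_multiple) / total   # Eq. 3.12 (the conservative measure)
    acc = (p + s) / total                # Eq. 3.13
    tp = p / (p + q)                     # Eq. 3.14, sensitivity
    fp = r / (r + s)                     # Eq. 3.15
    return ccr, acc, tp, fp

# With no multiple classifications, CCR equals ACC
ccr, acc, tp, fp = classifier_metrics(p=24, q=1, r=2, s=48)
```

Passing a nonzero `n_multiple` lowers the CCR below the ACC, which is exactly the conservatism the text describes.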
3.4.4 Selection Process for the Optimal Number of Feature Vector Components

From the set of computed cepstral coefficients and differences generated using the feature extraction process given above, an optimal subset is desired that will constitute the feature vectors used to train and test the PBNNC shown in Figure 3.2. The optimal subset (i.e., the optimal feature vector length) is determined by taking a minimum variance approach. Specifically, a 3-D graph is generated that plots the performance, that is, CCR versus the RBF NN spread parameter and the number of feature vector components (see Figure 3.9). From this graph, mean values and variances are computed across the range of spread parameters for each candidate number of components in the feature vector. The selection criterion is to simultaneously maximize the mean and minimize the variance: maximizing the mean ensures maximum performance (i.e., maximum CCR), while minimizing the variance minimizes variation in the feature set within each of the classes. The output threshold at each of the neural modules (i.e., the output of the single output neuron of each RBF NN) is set optimally according to a 3-D ROC curve; this will be explained next. Figure 3.10 shows the two plots used to determine the maximum mean and the minimum variance. The table insert between the two graphs shows that even though the mean value for 40 elements in the feature vector is (slightly) larger than that for 34 elements, the variance for 40 is nearly three times that for 34. Therefore, a length of 34 elements for the feature vectors is the best choice.

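The mean/variance selection rule can be sketched on a synthetic CCR grid. The combination rule used below (mean minus variance) is an assumption made for illustration; the chapter compares the two curves by inspection:

```python
import numpy as np

# Hypothetical CCR grid: rows = candidate feature-vector lengths,
# columns = RBF spread-parameter settings (as in Figure 3.9).
rng = np.random.default_rng(3)
feature_counts = np.arange(10, 61, 2)
ccr = (50 + 15 * np.exp(-((feature_counts - 34) / 12.0) ** 2)[:, None]
       + rng.normal(0, 2, size=(len(feature_counts), 20)))

means = ccr.mean(axis=1)       # performance mean across spread values
variances = ccr.var(axis=1)    # performance variance across spread values

# Select the length with large mean and small variance
best = feature_counts[np.argmax(means - variances)]
```

On real data the two curves are examined separately, as in Figure 3.10, so that a marginal gain in mean (40 elements) can be rejected when it comes with a much larger variance.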
3.4.5 Optimal Output Threshold Values and 3-D ROC Curves

At the output of the RBF NN for each of the six neural modules, there is a single output neuron with hard-limiting binary values used during the training process (see Figure 3.2). After training, to determine whether a particular SOI belongs to one of the six classes, the

FIGURE 3.9 (See color insert following page 302.) Performance (CCR) 3-D plot, using the ROC curve, versus the spread parameter and the feature number; used to determine the optimal number of components in the feature vector. Ill-conditioning occurs for feature numbers less than 10, and for feature numbers greater than 60 the CCR declines dramatically.

threshold value of the output neurons is optimally set according to an ROC curve [30–32] for that individual neural module (i.e., one particular class). Before an explanation of the 3-D ROC curve is given, let us first review 2-D ROC curves and see how they are used to optimally set threshold values. An ROC curve is a plot of the TP rate versus the FP rate, or the sensitivity versus (1 – specificity); a sample ROC curve is shown in Figure 3.11. The optimal threshold value corresponds to a point nearest the ideal point (0, 1) on the graph. The point (0, 1) is considered ideal because in this case there would be no false positives and only true positives. However, because of noise and other undesirable effects in the data, the point closest to the (0, 1) point (i.e., the minimum Euclidean distance) is the best that we can do. This will then dictate the optimal threshold value to be used at the output of the RBF NN. Since there are six classifiers, that is, six neural modules in the global classifier, six ROC curves must be generated. However, using 2-D ROC curves to set the thresholds at the outputs of the six RBF NN classifiers will not result in optimal thresholds. This is because misclassifications are not taken into account when setting the threshold for a particular neural module that is responsible for classifying a particular set of infrasound signals. Recall that one neural module is associated with one infrasonic class, and each neural module acts as its own classifier. Therefore, it is necessary to account for the misclassifications that can occur and this can be accomplished by adding a third dimension to the ROC curve. When the misclassifications are taken into account the (0, 1, 0) point now becomes the optimal point, and the smallest Euclidean distance to this point is directly related to

FIGURE 3.10 Performance means and performance variances versus feature number, used to determine the optimal length of the feature vector. (Table insert: 34 features: mean 66.9596, variance 47.9553; 40 features: mean 69.2735, variance 122.834.)

FIGURE 3.11 Example ROC curve: TP (sensitivity) versus FP (1 − specificity), with the minimum Euclidean distance to the ideal point (0, 1) marked.

the optimal threshold value for each neural module. Figure 3.12 shows the six 3-D ROC curves associated with the classifiers.

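The 3-D ROC threshold rule, choosing the threshold whose (FP, TP, misclassification) point lies nearest (0, 1, 0), can be sketched as follows. The misclassification scaling is an assumption; the chapter does not state how that axis is normalized:

```python
import numpy as np

def optimal_threshold(thresholds, tp, fp, misclass):
    """Pick the output threshold whose 3-D ROC point (FP, TP, misclassifications)
    lies nearest the ideal point (0, 1, 0).

    The misclassification axis is normalized to [0, 1] here so the three
    axes are comparable (an assumption; the chapter does not give a scaling).
    """
    m = np.asarray(misclass, dtype=float)
    if m.max() > 0:
        m = m / m.max()
    d = np.sqrt(np.asarray(fp) ** 2 + (1.0 - np.asarray(tp)) ** 2 + m ** 2)
    return thresholds[np.argmin(d)]

# Toy sweep of candidate output thresholds for one neural module
thresholds = np.array([0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])
tp = np.array([1.00, 1.00, 0.95, 0.90, 0.75, 0.60, 0.40])
fp = np.array([0.60, 0.40, 0.20, 0.05, 0.02, 0.00, 0.00])
misclass = np.array([6, 4, 2, 1, 0, 0, 0])
best = optimal_threshold(thresholds, tp, fp, misclass)
```

Dropping the misclassification term recovers the conventional 2-D rule (distance to (0, 1)), so the same helper covers both cases.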
3.5 Simulation Results

The four basic parameters to be optimized in the process of training the neural network classifier (i.e., the bank of six RBF NNs) are the RBF NN spread parameters, the output thresholds of each neural module, the combination of 34 components in the feature vectors for each class (note again in Figure 3.2 that each neural module has its own custom preprocessor), and, of course, the resulting weights of each RBF NN. The MATLAB neural networks toolbox was used to design the six RBF NNs [33]. Table 3.1 shows the specific classes and the associated number of signals used to train and test the RBF NNs. Of the 574 infrasound signals, 351 were used for training and the remaining 223 for testing. The criterion used to divide the data between the training and testing sets was to maintain independence; hence, the four array signals from any one event are always kept together, either in the training set or in the test set. After the optimal number of components for each feature vector (34 elements) and the optimal combination of the 34 components for each preprocessor were determined, the optimal RBF spread parameters were determined along with the optimal threshold values (the six graphs in Figure 3.12 were used for this purpose). For both the RBF spread parameters and the output thresholds, the selection criterion is based on maximizing the CCR of the local network and the overall (global) classifier CCR. The RBF spread parameter and the output threshold for each neural module were determined one by one by fixing the spread parameter, σ, for all other neural modules to 0.3, and holding the threshold value at 0.5. Once the first neural module's spread

FIGURE 3.12 3-D ROC curves (misclassification versus true positive versus false positive) for the six classes: (a) vehicle, (b) missile, (c) artillery, (d) rocket, (e) jet, and (f) shuttle.

parameter and threshold are determined, the spread parameter and output threshold of the second neural module are computed while holding all other neural modules' (except the first one) spread parameters and output thresholds fixed at 0.3 and 0.5, respectively. Table 3.4 gives the final values of the spread parameter and the output threshold for the global classifier. Figure 3.13 shows the classifier architecture with the final values indicated for the RBF NN spread parameters and the output thresholds. Table 3.5 shows the confusion matrix for the six-class classifier. Concentrating on the 6 × 6 portion of the matrix for each of the defined classes, the diagonal elements correspond to

TABLE 3.4
Spread Parameter and Threshold of Six-Class Classifier

            Spread Parameter   Threshold Value   True Positive   False Positive
Vehicle     0.2                0.3144            0.5             0
Artillery   2.2                0.6770            0.9621          0.0330
Jet         0.3                0.6921            0.5             0
Missile     1.8                0.9221            1               0
Rocket      0.2                0.4446            0.9600          0.0202
Shuttle     0.3                0.6170            0.9600          0.0289

the correct predictions. The trace of this 6 × 6 matrix divided by the total number of signals tested (i.e., 223) gives the accuracy of the global classifier. The formula for the accuracy is given in Equation 3.13; here ACC = 94.6%. The off-diagonal elements indicate the misclassifications that occurred, and those in parentheses indicate double classifications (i.e., the actual class was identified correctly, but the output threshold for another class was also exceeded). The off-diagonal element in square

FIGURE 3.13 Parallel bank neural network classifier architecture with the final values for the RBF spread parameters (σ1 = 0.2, σ2 = 2.2, σ3 = 0.3, σ4 = 1.8, σ5 = 0.2, σ6 = 0.3) and the output threshold values set by the ROC curves.

TABLE 3.5
Confusion Matrix for the Six-Class Classifier

                                        Predicted Value
Actual Value   Vehicle   Artillery   Jet   Missile   Rocket   Shuttle   Unclassified   Total (223)
Vehicle        2         (1)         0     0         0        0         2              4
Artillery      0         127         0     0         0        0         5              132
Jet            0         0           2     0         0        0         2              4
Missile        0         0           0     8         0        0         0              8
Rocket         0         0           0     0         24       (5)       1              25
Shuttle        0         (1)[1]      0     0         1(3)     48        1              50

brackets is a double misclassification; that is, the event is misclassified along with another misclassified event (this is a shuttle event that is misclassified as both a ''rocket'' event and an ''artillery'' event). Table 3.6 shows the final global classifier results, giving both the CCR (see Equation 3.12) and the ACC (see Equation 3.13). Simulations were also run using ''bi-polar'' outputs instead of binary outputs; for the bi-polar case, the output is bounded between −1 and +1 instead of between 0 and 1. As can be seen from the table, the binary case yielded the best results. Finally, Table 3.7 shows the results for the case where the threshold levels on the outputs of the individual RBF NNs are ignored and only the output with the largest value is taken as the ''winner'' (i.e., ''winner-takes-all''); this is considered to be the class to which the input SOI belongs. It should be noted that even though the CCR shows a higher level of performance for the winner-takes-all approach, it is probably not a viable method for classification: if multiple events truly occurred simultaneously, they would never be indicated as such using this approach.

TABLE 3.6
Six-Class Classification Result Using a Threshold Value for Each Network

Performance Type   Binary Outputs (%)   Bi-Polar Outputs (%)
CCR                90.1                 88.38
ACC                94.6                 92.8

TABLE 3.7
Six-Class Classification Result Using ''Winner-Takes-All''

Performance Type   Binary Method (%)   Bi-Polar Method (%)
CCR                93.7                92.4
ACC                93.7                92.4

3.6 Conclusions

Radial basis function neural networks were used to classify six different infrasound events. The classifier was built with a parallel structure of neural modules that individually are responsible for classifying one and only one infrasound event, referred to as PBNNC architecture. The overall accuracy of the classifier was found to be greater than 90%, using the CCR performance criterion. A feature extraction technique was employed that had a major impact toward increasing the classification performance over most other methods that have been tried in the past. Receiver operating characteristic curves were also employed to optimally set the output thresholds of the individual neural modules in the PBNNC architecture. This also contributed to increased performance of the global classifier. And finally, by optimizing the individual spread parameters of the RBF NN, the overall classifier performance was increased.

Acknowledgments The authors would like to thank Dr. Steve Tenney and Dr. Duong Tran-Luu from the Army Research Laboratory for their support of this work. The authors also thank Dr. Kamel Rekab, University of Missouri–Kansas City, and Mr. Young-Chan Lee, Florida Institute of Technology, for their comments and insight.

References 1. Pierce, A.D., Acoustics: An Introduction to Its Physical Principles and Applications, Acoustical Society of America Publications, Sewickley, PA, 1989. 2. Valentina, V.N., Microseismic and Infrasound Waves, Research Reports in Physics, SpringerVerlag, New York, 1992. 3. National Research Council, Comprehensive Nuclear Test Ban Treaty Monitoring, National Academy Press, Washington, DC, 1997. 4. Bedard, A.J. and Georges, T.M., Atmospheric infrasound, Physics Today, 53(3), 32–37, 2000. 5. Ham, F.M. and Kostanic, I., Principles of Neurocomputing for Science and Engineering, McGrawHill, New York, 2001. 6. Torres, H.M. and Rufiner, H.L., Automatic speaker identification by means of mel cepstrum, wavelets and wavelet packets, In Proc. 22nd Annu. EMBS Int. Conf., Chicago, IL, July 23–28, 2000, pp. 978–981. 7. Foo, S.W. and Lim, E.G., Speaker recognition using adaptively boosted classifier, In Proc. The IEEE Region 10th Int. Conf. Electr. Electron. Technol. (TENCON 2001), Singapore, Aug. 19–22, 2001, pp. 442–446. 8. Moonasar, V. and Venayagamoorthy, G., A committee of neural networks for automatic speaker recognition (ASR) systems, IJCNN, Vol. 4, Washington, DC, 2001, pp. 2936–2940. 9. Inal, M. and Fatihoglu, Y.S., Self-organizing map and associative memory model hybrid classifier for speaker recognition, 6th Seminar on Neural Network Applications in Electrical Engineering, NEUREL-2002, Belgrade, Yugoslavia, Sept. 26–28, 2002, pp. 71–74. 10. Ham, F.M., Rekab, K., Park, S., Acharyya, R., and Lee, Y.-C., Classification of infrasound events using radial basis function neural networks, Special Session: Applications of learning and data-driven methods to earth sciences and climate modeling, Proc. Int. Joint Conf. Neural Networks, Montre´al, Que´bec, Canada, July 31–August 4, 2005, pp. 2649–2654.


Signal and Image Processing for Remote Sensing


4 Construction of Seismic Images by Ray Tracing

Enders A. Robinson

CONTENTS
4.1 Introduction
4.2 Acquisition and Interpretation
4.3 Digital Seismic Processing
4.4 Imaging by Seismic Processing
4.5 Iterative Improvement
4.6 Migration in the Case of Constant Velocity
4.7 Implementation of Migration
4.8 Seismic Rays
4.9 The Ray Equations
4.10 Numerical Ray Tracing
4.11 Conclusions
References

4.1 Introduction

Reflection seismology is a method of remote imaging used in the exploration for petroleum. The seismic reflection method was developed in the 1920s. Initially, the source was a dynamite explosion set off in a shallow hole drilled into the ground, and the receiver was a geophone planted on the ground. In difficult areas, a single source would refer to an array of dynamite charges around a central point, called the source point, and a receiver would refer to an array of geophones around a central point, called the receiver point. The received waves were recorded on photographic paper on a drum. The developed paper was the seismic record, or seismogram. Each receiver accounted for a single wiggly line on the record, which is called a seismic trace or simply a trace. In other words, a seismic trace is a signal (or time series) received at a specific receiver location from a specific source location. The recordings were taken for a time span starting at the time of the shot (called time zero) until about three or four seconds after the shot. In the early days, a seismic crew would record about 10 or 20 seismic records per day, with a dozen or two traces on each record.

FIGURE 4.1 Seismic record taken in 1952.

Figure 4.1 shows a seismic record with wiggly lines as traces. Seismic crew number 23 of the Atlantic Refining Company shot the record on October 9, 1952. As written on the record, the traces were shot with a source that was laid out as a 36-hole circular array. The first circle in the array had a diameter of 130 feet with 6 holes, each hole loaded with 10 lbs of dynamite. The second circle in the array had a diameter of 215 feet with 11 holes (which should have been 12 holes, but one hole was missing), each hole loaded with 10 lbs of dynamite. The third circle in the array had a diameter of 300 feet with 18 holes, each hole loaded with 5 lbs of dynamite. Each charge was at a depth of 20 feet. The total charge was thus 260 lbs of dynamite, which is a large amount of dynamite for a single seismic record. The receiver for each trace was made up of a group of 24 geophones (also called seismometers) in a circular array, with 6 geophones on each of 4 circles of diameters 50 feet, 150 feet, 225 feet, and 300 feet, respectively. There was a 300-foot gap between group centers (i.e., receiver points).

This record is called an NR seismogram. The purpose of the elaborate source and receiver arrays was to bring out visible reflections on the record. The effort was fruitless. Regions where geophysicists can see no reflections on the raw records are termed no-record or no-reflection (NR) areas. The region in which this record was taken, like the great majority of possible oil-bearing regions in the world, was an NR area. In such areas, the seismic method (before digital processing) failed, and hence wildcat wells had to be drilled on the basis of surface geology and a great deal of guesswork. There was a very low rate of discovery in the NR regions. Because of the tremendous cost of drilling to great depths, there was little chance that any oil would ever be discovered. The outlook for oil was bleak in the 1950s.

4.2 Acquisition and Interpretation

From its inception up to about 1965, the seismic method involved two steps, namely acquisition and interpretation. Acquisition refers to the generation and recording of seismic data. Sources and receivers are laid out on the surface of the Earth. The objective is to probe the unknown structure below the surface. The sources are made up of vibrators (called vibroseis), dynamite shots, or air guns. The receivers are geophones on land and hydrophones at sea. The sources are activated one at a time, not all together. Suppose a single source is activated; the resulting seismic waves travel from the source into the Earth. The waves pass down through sedimentary rock strata, from which the waves are reflected upward. A reflector is an interface between layers of contrasting physical properties. A reflector might represent a change in lithology, a fault, or an unconformity. The reflected energy returns to the surface, where it is recorded. For each source activated, there are many receivers surrounding the source point. Each recorded signal, called a seismic trace, is associated with a particular source point and a particular receiver point. The traces, as recorded, are referred to as the raw traces. A raw trace contains all the received events. These events are produced by the subsurface structure of the Earth. The events due to primary reflections are wanted; all the other events are unwanted.

Interpretation was the next step after acquisition. Each seismic record was examined by eye, and the primary reflections that could be seen were marked with a pencil. A primary reflection is an event that represents a passage from the source to the depth point, and then a passage directly back to the receiver (Figure 4.2). At a reflection, the traces become coherent; that is, they come into phase with each other. In other words, at a reflection, the crests and troughs on adjacent traces appear to fit into one another. The arrival time of a reflection indicates the depth of the reflecting horizon below the surface, while the time differential (the so-called step-out time) in the arrivals of a given peak or trough at successive receiver positions provides information on the dip of the reflecting horizon. In favorable areas, it is possible to follow the same reflection over a distance much greater than that covered by the receiver spread for a single record. In such cases, the records are placed side by side.
The reflection from the last trace of one record correlates with the first trace of the next record. Such a correlation can be continued on successive records as long as the reflection persists. In areas of rapid structural change, the ensemble of raw traces is unable to show the true geometry of subsurface structures. In some cases, it is possible to identify an isolated structure, such as a fault or a syncline, on the basis of its characteristic reflection pattern. In NR regions, the raw record section does not give a usable image of the subsurface at all.

Seismic wave propagation in three dimensions is a complicated process. The rock layers absorb, reflect, refract, or scatter the waves. Inside the different layers, the waves propagate at different velocities. The waves are reflected and refracted at the interfaces between the layers. Only elementary geometries can be treated exactly in three dimensions. If the reflecting interfaces are horizontal (or nearly so), the waves going straight down will be reflected nearly straight up. Thus, the wave motion is essentially vertical. If the time axes on the records are placed in the vertical position, time appears in the same direction as the raypaths. By using the correct wave velocity, the time axis can be converted into the depth axis. The result is that the primary reflections show the locations of the reflecting interfaces.

FIGURE 4.2 Primary reflection: the ray travels from the source point S down to the depth point D and back up to the receiver point R.

Thus, in areas that have nearly level reflecting horizons, the primary reflections, as recorded, essentially show the correct depth positions of the subsurface interfaces. However, in areas that have a more complicated subsurface structure, the primary reflections as recorded in time do not occur at the correct depth positions in space. As a result, the primary reflections have to be moved (or migrated) to their proper spatial positions. In areas of complex geology, it is necessary to move (or migrate) the energy of each primary reflection to the spatial position of the reflecting point. The method is similar to that used in Huygens's construction. Huygens articulated the principle that every point of a wavefront can be regarded as the origin of a secondary spherical wave, and the envelope of all these secondary waves constitutes the propagated wavefront. In the predigital days, migration was carried out by straightedge and compass or by a special-purpose hand-manipulated drawing machine on a large sheet of graph paper.

The arrival times of the observed reflections were marked on the seismic records. These times are the two-way traveltimes from the source point to the receiver point. If the source and the receiver were at the same point (i.e., coincident), then the raypath down would be the same as the raypath up. In such a case, the one-way time is one half of the two-way time. From the two-way traveltime data, such a one-way time was estimated for each source point. This one-way time was multiplied by an estimated seismic velocity. The travel distance to the interface was thus obtained. A circle was drawn with the surface point as center and the travel distance as radius. This process was repeated for the other source points. In Huygens's construction, the envelope of the spherical secondary waves gives the new wavefront. In a similar manner, in the seismic case, the envelope of the circles gives the reflecting interface.
This method of migration was done in 1921 in the first reflection seismic survey ever taken.
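The circle-drawing step of this predigital procedure is easy to state in code. The following minimal sketch (not from the chapter; the function name and the numbers are illustrative) computes the circle of candidate reflector points for one coincident source and receiver, assuming a constant velocity:

```python
import math

def migration_circle(x_surface, two_way_time_s, velocity_m_s, n_points=360):
    """Candidate reflector points for a coincident source/receiver at
    (x_surface, 0): a semicircle into the earth whose radius is the
    one-way travel distance."""
    one_way_time = two_way_time_s / 2.0       # down leg equals up leg
    radius = velocity_m_s * one_way_time      # travel distance to the interface
    points = []
    for k in range(n_points + 1):
        theta = math.pi * k / n_points        # sweep the lower half-plane
        x = x_surface + radius * math.cos(theta)
        z = radius * math.sin(theta)          # depth is positive downward
        points.append((x, z))
    return radius, points

# One source point: 1.0 s two-way time at 2000 m/s gives a 1000 m radius.
radius, circle = migration_circle(x_surface=0.0, two_way_time_s=1.0,
                                  velocity_m_s=2000.0)
```

Repeating this for each source point and taking the envelope of the circles, as described above, recovers the reflecting interface.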

4.3 Digital Seismic Processing

Historically, most seismic work fell under the category of two-dimensional (2D) imaging. In such cases, the source positions and the receiver positions are placed on a horizontal surface line called the x-axis. The time axis is a vertical line called the t-axis. Each source would produce many traces, one trace for each receiver position on the x-axis. The waves that make up each trace take a great variety of paths, each requiring a different time to travel from the source to the receiver. Some waves are refracted and others scattered. Some waves travel along the surface of the Earth, and others are reflected upward from various interfaces.

A primary reflection is an event that represents a passage from the source to the depth point, and then a passage directly back to the receiver. A multiple reflection is an event that has undergone three, five, or some other odd number of reflections in its travel path. In other words, a multiple reflection takes a zig-zag course with the same number of down legs as up legs. Depending on their time delay from the primary events with which they are associated, multiple reflections are characterized as short-path, implying that they interfere with the primary reflection, or as long-path, where they appear as separate events. Usually, primary reflections are simply called reflections or primaries, whereas multiple reflections are simply called multiples.

A water-layer reverberation is a type of multiple reflection due to the multiple bounces of seismic energy back and forth between the water surface and the water bottom. Such reverberations are common in marine seismic data. A reverberation re-echoes (i.e., bounces back and forth) in the water layer for a prolonged period of time. Because of its resonant nature, a reverberation is a troublesome type of multiple. Reverberations conceal the primary reflections. The primary reflections (i.e., events that have undergone only one reflection) are needed for image formation. To make use of the primary reflected signals on the record, it is necessary to distinguish them from the other types of signals on the record. Random noise, such as wind noise, is usually minor and, in such cases, can be neglected. All the seismic signals except primary reflections are unwanted. These unwanted signals are due to the seismic energy introduced by the seismic source signal; hence, they are called signal-generated noise. Thus, we are faced with the problem of (primary reflected) signal enhancement and (signal-generated) noise suppression. In the analog days (up to about 1965), the separation of signal and noise was done by eye.

In the 1950s, a good part of the Earth's sedimentary basins, including essentially all water-covered regions, were classified as NR areas. Unfortunately, in such areas the signal-generated noise overwhelms the primary reflections. As a result, the primary reflections cannot be picked out visually. For example, water-layer reverberations as a rule completely overwhelm the primaries in water-covered regions such as the Gulf of Mexico, the North Sea, and the Persian Gulf. The NR areas of the world could be explored for oil in a direct way by drilling, but not by the remote detection method of reflection seismology.

The decades of the 1940s and 1950s were replete with inventions, not the least of which was the modern high-speed electronic stored-program digital computer. In the years from 1952 to 1954, almost every major oil company joined the MIT Geophysical Analysis Group to use the digital computer to process NR seismograms (Robinson, 2005). Historically, the additive model (trace = s + n) was used. In this model, the desired primary reflections were the signal s and everything else was the noise n. Electric filters and other analog methods were used, but they failed to give the desired primary reflections. The breakthrough was the recognition that the convolutional model (trace = s * n) is the correct model for a seismic trace. Note that the asterisk denotes convolution. In this model, the signal-generated noise is the signal s and the unpredictable primary reflections are the noise n. Deconvolution removes the signal-generated noise (such as instrument responses, ground roll, diffractions, ghosts, reverberations, and other types of multiple reflections) so as to yield the underlying primary reflections.

The MIT Geophysical Analysis Group demonstrated the success of deconvolution on many NR seismograms, including the record shown in Figure 4.1. However, the oil companies were not ready to undertake digital seismic processing at that time. They were discouraged because an excursion into digital seismic processing would require new effort that would be expensive, and the effort might still fail because of the unreliability of the existing computers. It is true that in 1954 the available digital computers were far from suitable for geophysical processing. However, each year from 1946 onward there was a constant stream of improvements in computers, and this development was accelerating every year. With patience and time, the oil and geophysical companies would convert to digital processing. It would happen when the need for hard-to-find oil was great enough to justify the investment necessary to turn NR seismograms into meaningful data. Digital signal processing was a new idea to the seismic exploration industry, and the industry shied away from converting to digital methods until the 1960s. The conversion of the seismic exploration industry to digital was in full force by about 1965, at which time transistorized computers were generally available at a reasonable price.
Of course, a reasonable price for one computer then would be in the range from hundreds of thousands of dollars to millions of dollars. With the digital computer, a whole new step in seismic exploration was added, namely digital processing. However, once the conversion to digital was undertaken in the years around 1965, it was done quickly and effectively. Reflection seismology now involves three steps, namely acquisition, processing, and interpretation. A comprehensive presentation of seismic data processing is given by Yilmaz (1987).
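The convolutional model trace = s * n and its deconvolution can be illustrated with a small numeric sketch. The example below is not from the chapter: the wavelet, the reflectivity, and the filter length are invented for illustration, and the deconvolution shown is a generic least-squares (spiking) inverse filter rather than the specific procedure used by the MIT Geophysical Analysis Group:

```python
import numpy as np

# Convolutional model: trace = s * n, where s stands for the source wavelet
# (signal-generated response) and n for the unpredictable primary reflections.
wavelet = np.array([1.0, -0.6, 0.2])        # illustrative minimum-phase wavelet s
primaries = np.zeros(50)
primaries[[5, 18, 33]] = [1.0, -0.7, 0.5]   # sparse reflectivity n
trace = np.convolve(wavelet, primaries)     # recorded trace s * n

# Deconvolution: design a least-squares inverse filter f with f * s ~ spike.
m = 30                                      # filter length
full_auto = np.correlate(wavelet, wavelet, mode="full")
zero_lag = len(wavelet) - 1
r = np.zeros(m)                             # autocorrelation lags 0..m-1
n_lags = min(m, len(wavelet))
r[:n_lags] = full_auto[zero_lag:zero_lag + n_lags]
R = np.array([[r[abs(i - j)] for j in range(m)] for i in range(m)])  # Toeplitz
g = np.zeros(m)
g[0] = wavelet[0]                           # desired output: unit spike at lag 0
f = np.linalg.solve(R, g)                   # normal equations R f = g

# Applying the inverse filter to the trace recovers the primaries.
recovered = np.convolve(f, trace)[:len(primaries)]
```

Because f * wavelet is approximately a unit spike, f * trace is approximately the primary reflection series, which is the goal of deconvolution as described above.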


4.4 Imaging by Seismic Processing

The term imaging refers to the formation of a computer image. The purpose of seismic processing is to convert the raw seismic data into a useful image of the subsurface structure of the Earth. From about 1965 onward, most of the new oil fields discovered were the result of the digital processing of the seismic data. Digital signal processing deconvolves the data and then superimposes (migrates) the results. As a result, seismic processing is divided into two main divisions: the deconvolution phase, which produces primaries-only traces (as well as possible), and the migration phase, which moves the primaries to their true depth positions (as well as possible). The result is the desired image.

The first phase of imaging (i.e., deconvolution) is carried out on the traces, either individually by means of single-channel processing or in groups by means of multi-channel processing. Ancillary signal-enhancement methods typically include such things as the analyses of velocities and frequencies, static and dynamic corrections, and alternative types of deconvolution. Deconvolution is performed on one or a few traces at a time; hence, the small capacity of the computers of the 1960s was not a severely limiting factor.

The second phase of imaging (i.e., migration) is the movement of the amplitudes of the primary-reflection events to their proper spatial locations (the depth points). Migration can be implemented by a Huygens-like superposition of the deconvolved traces. In a mechanical medium, such as the Earth, forces between the small rock particles transmit the disturbance. The disturbance at some regions of rock acts locally on nearby regions. Huygens imagined that the disturbance on a given wavefront is made up of many separate disturbances, each of which acts like a point source that radiates a spherically symmetric secondary wave, or wavelet. The superposition of these secondary waves gives the wavefront at a later time.
The idea that the new wavefront is obtained by superposition is the crowning achievement of Huygens. See Figure 4.3. In a similar way, seismic migration uses superposition to find the subsurface reflecting interfaces.
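Huygens's construction can be checked numerically in a toy setting. In the sketch below (function name and values are illustrative, not from the chapter), a planar wavefront at depth z = 0 radiates secondary circular wavelets of radius v*dt, the envelope is computed pointwise, and the result reproduces the plane advanced by v*dt:

```python
import math

def huygens_envelope(source_xs, radius, query_xs):
    """For a plane wavefront sampled at source_xs (all at depth z = 0),
    each point radiates a secondary circular wavelet of radius `radius`.
    The forward envelope at horizontal position x is the largest depth
    reached by any wavelet: z(x) = max over xs of sqrt(radius^2 - (x - xs)^2)."""
    envelope = []
    for x in query_xs:
        z = 0.0
        for xs in source_xs:
            dx = x - xs
            if abs(dx) <= radius:
                z = max(z, math.sqrt(radius * radius - dx * dx))
        envelope.append(z)
    return envelope

# A planar front sampled every 1 m; secondary wavelets of radius v*dt = 10 m.
sources = [float(i) for i in range(-50, 51)]
env = huygens_envelope(sources, 10.0, [float(i) for i in range(-20, 21)])
# To sampling accuracy, the envelope is the plane advanced by 10 m.
```

The same superposition idea, with ellipses in place of circles, underlies the migration described in the following sections.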

FIGURE 4.3 Huygens's construction.

FIGURE 4.4 Construction of a reflecting interface. (Labels: ground surface; interface.)

See Figure 4.4. In the case of migration, there is an added benefit of superposition: superposition is one of the most effective ways to accentuate signals and suppress noise. The superposition used in migration is designed to return the primary reflections to their proper spatial locations. The remnants of the signal-generated noise on the deconvolved traces are out of step with the primary events. As a result, the remaining signal-generated noise tends to be destroyed by the superposition. The superposition used in migration provides the desired image of the underground structure of the Earth.

Historically, migration (i.e., the movement of reflected events to their proper locations in space) was carried out manually, sometimes making use of elaborate drawing instruments. The transfer of the manual processes to a digital computer involved the manipulation of a great number of traces at once. The resulting digital migration schemes all relied heavily on the superposition of the traces. This tremendous amount of data handling had a tendency to overload the limited capacities of the computers made in the 1960s and 1970s. As a result, it was necessary to simplify the migration problem and to break it down into smaller parts. Thus, migration was done by a sequence of approximate operations, such as stacking, followed by normal moveout, dip moveout, and migration after stack. The process known as time migration was often used, which improved the records in time but stopped short of placing the events in their proper spatial positions. All kinds of modifications and adjustments were made to these piecemeal operations, and seismic migration in the 1970s and 1980s was a complicated discipline, an art as much as a science. The use of this art required much skill.

Meanwhile, great advances in technology were taking place. In the 1990s, everything seemed to come together. Major improvements in instrumentation and computers resulted in light, compact geophysical field equipment and affordable computers with high speed and massive storage. Instead of the modest number of sources and receivers used in 2D seismic processing, the tremendous number required for three-dimensional (3D) processing started to be used on a regular basis in field operations. Finally, the computers were large enough to handle the data for 3D imaging. Event movements (or migration) in three dimensions can now be carried out economically and efficiently by time-honored superposition methods such as those used in Huygens's construction. These migration methods are generally known as prestack migration. The name is a relic; it implies that stacking and all the other piecemeal operations are no longer used in the migration scheme.

Until the 1990s, 3D seismic imaging was rarely used because of the prohibitive costs involved. Today, 3D methods are commonly used, and the resulting subsurface images are of extraordinary quality. Three-dimensional prestack migration significantly improves seismic interpretation because the locations of geological structures, especially faults, are given much more accurately. In addition, 3D migration collapses diffractions from secondary sources such as reflector terminations against faults and corrects bow ties to form synclines. Three-dimensional seismic work gives beautiful images of the underground structure of the Earth.


4.5 Iterative Improvement

Let (x, y) represent the surface coordinates and z represent the depth coordinate. Migration takes the deconvolved records and moves (i.e., migrates) the reflected events to depth points in the 3D volume (x, y, z). In this way, seismic migration produces an image of the geologic structure g(x, y, z) from the deconvolved seismic data. In other words, migration is a process in which primary reflections are moved to their correct locations in space. Thus, for migration, we need the primary reflected events. What else is required? The answer is the complete velocity function v(x, y, z), which gives the wave velocity at each point in the given volume of the Earth under exploration.

At this point, let us note that the word velocity is used in two different ways. One way is to use the word velocity for the scalar that gives the rate of change of position in relation to time. When velocity is a scalar, the terms speed or swiftness are often used instead. The other (and more correct) way is to use the word velocity for the vector quantity that specifies both the speed of a body and its direction of motion. In geophysics, the word velocity is used for the (scalar) speed or swiftness of a seismic wave. The reciprocal of velocity v is slowness, n = 1/v.

Wave velocity can vary vertically and laterally in isotropic media. In anisotropic media, it can also vary azimuthally. However, we consider only isotropic media; therefore, at a given point, the wave velocity is the same in all directions. Wave velocity tends to increase with depth in the Earth because deep layers suffer more compaction from the weight of the layers above. Wave velocity can be determined from laboratory measurements, acoustic logs, and vertical seismic profiles, or from velocity analysis of seismic data. Often, we say velocity when we mean wave velocity. Over the years, various methods have been devised to obtain a sampling of the velocity distribution within the Earth. The velocity functions so determined vary from method to method. For example, the velocity measured vertically from a check-shot or vertical seismic profile (VSP) differs from the stacking velocity derived from normal-moveout measurements of common depth point gathers. Ideally, we would want to know the velocity at each and every point in the volume of Earth of interest.

In many areas, there are significant and rapid lateral or vertical changes in the velocity that distort the time image. Migration requires an accurate knowledge of vertical and horizontal seismic velocity variations. Because the velocity depends on the types of rocks, a complete knowledge of the velocity is essentially equivalent to a complete description of the geologic structure g(x, y, z). However, as we have stated above, the velocity function is required to get the geologic structure. In other words, to get the answer (the geologic structure) g(x, y, z), we must know the answer (the velocity function) v(x, y, z).

Seismic interpretation takes the images generated as representative of the physical Earth. In an iterative improvement scheme, any observable discrepancies in the image are used as forcing functions to correct the velocity function. Sometimes simple adjustments can be made, and at other times the whole imaging process has to be redone one or more times before a satisfactory solution can be obtained. In other words, there is interplay between seismic processing and seismic interpretation, which is a manifestation of the well-accepted exchange between the disciplines of geophysics and geology. Iterative improvement is a well-known method commonly used in those cases where you must know the answer to find the answer. By the use of iterative improvement, the seismic inverse problem is solved. In other words, the imaging of seismic data requires a model of seismic velocity. Initially, a model of smoothly varying velocity is used.
If the results are not satisfactory, the velocity model is adjusted and a new image is formed. This process is repeated until a satisfactory image is obtained. To get the image, we must know the velocity; the method of iterative improvement deals with this problem.
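The iterative-improvement loop can be caricatured in a few lines. In the toy sketch below, every name and number is invented for illustration: "imaging" is reduced to placing a reflector at depth z = vt/2, and an independent depth marker (for example, a well tie) supplies the discrepancy that is fed back to correct the velocity model:

```python
def migrate_depth(velocity, two_way_time):
    """Toy 'imaging': constant velocity places the reflector at z = v*t/2."""
    return velocity * two_way_time / 2.0

def refine_velocity(v0, two_way_time, z_marker, n_iter=50, step=0.5):
    """Iterative improvement: compare the imaged depth against an
    independent depth marker and nudge the velocity model by a damped
    fraction of the relative discrepancy."""
    v = v0
    for _ in range(n_iter):
        z_image = migrate_depth(v, two_way_time)
        discrepancy = (z_marker - z_image) / z_marker
        v *= 1.0 + step * discrepancy   # damped multiplicative update
    return v

# Observed two-way time 2.0 s; a well shows the reflector at 2500 m.
# Starting from a smooth guess of 2000 m/s, the loop converges to 2500 m/s.
v_est = refine_velocity(v0=2000.0, two_way_time=2.0, z_marker=2500.0)
```

Real velocity-model building is vastly more involved, but the control structure, image, compare, correct, repeat, is the same.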


4.6 Migration in the Case of Constant Velocity

Consider a primary reflection. Its two-way traveltime is the time it takes for the seismic energy to travel down from the source S ¼ (xS, yS) to depth point D ¼ (xD, yD, zD) and then back up to the receiver R ¼ (xR, yR). The deconvolved trace f(S, R, t) gives the amplitude of the reflected signal as a function of two-way traveltime t, which is given in milliseconds from the time that the source is activated. We know S, R, t, and f(S, R, t). The problem is to find D, which is the depth point at refection. An isotropic medium is a medium whose properties at a point are the same in all directions. In particular, the wave velocity at a point is the same in all directions. Fresnel’s principle of least time requires that in an isotropic medium the rays are orthogonal trajectories of the wavefronts. In other words, the rays are normal to the wavefronts. However, in an anisotropic medium, the rays need not be orthogonal trajectories of the wavefronts. A homogeneous medium is a medium whose physical properties are the same throughout. For ease of exposition, let us first consider the case of a homogenous isotropic medium. Within a homogeneous isotropic material, the velocity v has the same value at all points and in all directions. The rays are straight lines since by symmetry they cannot bend in any preferred direction, as there are none. The two-way traveltime t is the elapsed time for a seismic wave to travel from its source to a given depth point and return to a receiver at the surface of the Earth. The two-way traveltime t is thus equal to the one-way traveltime t1 from the source point S to the depth point D plus the one-way traveltime t2 from the depth point D to the receiver point R. Note that the traveltime from D to R is the same as the traveltime from R to D. 
We may write t = t_1 + t_2, which in terms of distance is

$$v t = \sqrt{(x_D - x_S)^2 + (y_D - y_S)^2 + (z_D - z_S)^2} + \sqrt{(x_D - x_R)^2 + (y_D - y_R)^2 + (z_D - z_R)^2}$$

We recall that an ellipse can be drawn with two pins, a loop of string, and a pencil. The pins are placed at the foci and the ends of the string are attached to the pins. The pencil is placed on the paper inside the string, so that the string is taut. The string forms a triangle. If the pencil is moved around so that the string stays taut, the sum of the distances from the pencil to the pins remains constant, and the curve traced out by the pencil is an ellipse. Thus, if vt is the length of the string, then any point on the ellipse could be the depth point D that produces the reflection for that source S, receiver R, and traveltime t. We therefore take that event and move it out to each point on the ellipse. Suppose we have two traces with only one event on each trace, and suppose both events come from the same reflecting surface. In Figure 4.5, we show the two ellipses. In the spirit of Huygens's construction, the reflector must be the common tangent to the ellipses. This example shows how migration works. In practice, we would take
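The ellipse condition can be checked numerically. The following sketch (not from the chapter; the function name `on_reflection_ellipsoid` and the sample coordinates are illustrative) tests whether a candidate depth point D satisfies v t = |S - D| + |D - R|:

```python
import math

def on_reflection_ellipsoid(S, R, D, v, t, tol=1e-9):
    """True if depth point D could have produced the reflection:
    the down-leg distance |S - D| plus the up-leg distance |D - R|
    must equal the distance v*t implied by the two-way traveltime t."""
    return abs(math.dist(S, D) + math.dist(D, R) - v * t) < tol

# Source and receiver on the surface (z = 0); constant velocity v = 2.
S, R, v = (0.0, 0.0, 0.0), (4.0, 0.0, 0.0), 2.0
D = (2.0, 0.0, 3.0)                            # candidate depth point
t = (math.dist(S, D) + math.dist(D, R)) / v    # its two-way traveltime
print(on_reflection_ellipsoid(S, R, D, v, t))  # True: D lies on the ellipsoid
```

Any other point with the same total path length v t lies on the same ellipsoid and is an equally valid candidate, which is exactly why a single trace cannot locate the reflector by itself.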

FIGURE 4.5 Reflecting interface as a tangent. (The figure shows, beneath the surface of the earth, the ellipse of possible reflector points for the event on each of two traces; the reflector is the common tangent to the two ellipses.)

Signal and Image Processing for Remote Sensing


the amplitude at each digital time instant t on the trace, and scatter the amplitude on the constructed ellipsoid. In this way, the trace is spread out into a 3D volume. Then we repeat this operation for each trace, until every trace is spread out into a 3D volume. The next step is superposition. All of these volumes are added together. (In practice, each trace would be spread out and cumulatively added into one given volume.) Interference tends to destroy the noise, and we are left with the desired 3D image of the Earth.

4.7 Implementation of Migration

A raypath is a course along which wave energy propagates through the Earth. In isotropic media, the raypath is perpendicular to the local wavefront. The raypath can be calculated using ray tracing. Let the point P = (x_P, y_P) be either a source location or a receiver location. The subsurface volume is represented by a 3D grid (x, y, z) of depth points D. To minimize the amount of ray tracing, we first compute a traveltime table for each and every surface location, whether the location be a source point or a receiver point. In other words, for each surface location P, we compute the one-way traveltime from P to each depth point D in the 3D grid. We put these one-way traveltimes into a 3D table that is labeled by the surface location P.

The traveltime for a primary reflection is the total two-way (i.e., down and up) time for a path originating at the source point S, reflected at the depth point D, and received at the receiver point R. Two identification numbers are associated with each trace: one for the source S and the other for the receiver R. We pull out the respective tables for these two identification numbers. We add the two tables together, element by element. The result is a 3D table of the two-way traveltimes for that seismic trace.

Let us give a 2D example. We assume that the medium has a constant velocity, which we take to be 1. Let the subsurface grid for depth points D be given by (z, x), where depth is given by z = 1, 2, . . . , 10 and horizontal distance is given by x = 1, 2, . . . , 15. Let the surface locations P be (z = 1, x), where x = 1, 2, . . . , 15. Suppose the source is S = (1, 3). We want to construct a table of one-way traveltimes, where depth z denotes the row and horizontal distance x denotes the column. The one-way traveltime from the source to the depth point (z, x) is

$$t(z, x) = \sqrt{(z - 1)^2 + (x - 3)^2}$$
For example, the traveltime from the source to depth point (z, x) = (4, 6) is

$$t(4, 6) = \sqrt{(4 - 1)^2 + (6 - 3)^2} = \sqrt{18} = 4.24$$

This number (rounded) appears in the fourth row, sixth column of the table below. The computed table (rounded) for the source, with depth z as the row (z = 1, . . . , 10) and horizontal distance x as the column (x = 1, . . . , 15), is

z =  1:   2.0  1.0  0.0  1.0  2.0  3.0  4.0  5.0  6.0  7.0  8.0  9.0 10.0 11.0 12.0
z =  2:   2.2  1.4  1.0  1.4  2.2  3.2  4.1  5.1  6.1  7.1  8.1  9.1 10.0 11.0 12.0
z =  3:   2.8  2.2  2.0  2.2  2.8  3.6  4.5  5.4  6.3  7.3  8.2  9.2 10.2 11.2 12.2
z =  4:   3.6  3.2  3.0  3.2  3.6  4.2  5.0  5.8  6.7  7.6  8.5  9.5 10.4 11.4 12.4
z =  5:   4.5  4.1  4.0  4.1  4.5  5.0  5.7  6.4  7.2  8.1  8.9  9.8 10.8 11.7 12.6
z =  6:   5.4  5.1  5.0  5.1  5.4  5.8  6.4  7.1  7.8  8.6  9.4 10.3 11.2 12.1 13.0
z =  7:   6.3  6.1  6.0  6.1  6.3  6.7  7.2  7.8  8.5  9.2 10.0 10.8 11.7 12.5 13.4
z =  8:   7.3  7.1  7.0  7.1  7.3  7.6  8.1  8.6  9.2  9.9 10.6 11.4 12.2 13.0 13.9
z =  9:   8.2  8.1  8.0  8.1  8.2  8.5  8.9  9.4 10.0 10.6 11.3 12.0 12.8 13.6 14.4
z = 10:   9.2  9.1  9.0  9.1  9.2  9.5  9.8 10.3 10.8 11.4 12.0 12.7 13.5 14.2 15.0
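The construction of this table is easy to verify in code. The sketch below is illustrative, not the chapter's code; the helper name `traveltime_table` is ours. It builds the 10 x 15 source table for S = (1, 3) and reproduces the worked value t(4, 6) = 4.24:

```python
import math

def traveltime_table(px, nz=10, nx=15, v=1.0):
    """One-way traveltime from surface point P = (z=1, x=px) to every
    depth point (z, x) of the grid, for a constant velocity v."""
    return [[math.hypot(z - 1, x - px) / v for x in range(1, nx + 1)]
            for z in range(1, nz + 1)]

src = traveltime_table(px=3)      # source S = (1, 3)
print(round(src[3][5], 2))        # row z=4, column x=6 -> 4.24
```

The same function, called with a different surface coordinate `px`, produces the receiver table used in the next step.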


Next, let us compute the traveltime table for the receiver point. Note that the traveltime from the depth point to the receiver is the same as the traveltime from the receiver to the depth point. Suppose the receiver is R = (1, 11). Then the one-way traveltime from the receiver to the depth point (z, x) is

$$t(z, x) = \sqrt{(z - 1)^2 + (x - 11)^2}$$

For example, the traveltime from the receiver to depth point (z, x) = (4, 6) is

$$t(4, 6) = \sqrt{(4 - 1)^2 + (6 - 11)^2} = \sqrt{34} = 5.83$$

This number (rounded) appears in the fourth row, sixth column of the table below. The computed table (rounded) for the receiver, with depth z as the row (z = 1, . . . , 10) and horizontal distance x as the column (x = 1, . . . , 15), is

z =  1:  10.0  9.0  8.0  7.0  6.0  5.0  4.0  3.0  2.0  1.0  0.0  1.0  2.0  3.0  4.0
z =  2:  10.0  9.1  8.1  7.1  6.1  5.1  4.1  3.2  2.2  1.4  1.0  1.4  2.2  3.2  4.1
z =  3:  10.2  9.2  8.2  7.3  6.3  5.4  4.5  3.6  2.8  2.2  2.0  2.2  2.8  3.6  4.5
z =  4:  10.4  9.5  8.5  7.6  6.7  5.8  5.0  4.2  3.6  3.2  3.0  3.2  3.6  4.2  5.0
z =  5:  10.8  9.8  8.9  8.1  7.2  6.4  5.7  5.0  4.5  4.1  4.0  4.1  4.5  5.0  5.7
z =  6:  11.2 10.3  9.4  8.6  7.8  7.1  6.4  5.8  5.4  5.1  5.0  5.1  5.4  5.8  6.4
z =  7:  11.7 10.8 10.0  9.2  8.5  7.8  7.2  6.7  6.3  6.1  6.0  6.1  6.3  6.7  7.2
z =  8:  12.2 11.4 10.6  9.9  9.2  8.6  8.1  7.6  7.3  7.1  7.0  7.1  7.3  7.6  8.1
z =  9:  12.8 12.0 11.3 10.6 10.0  9.4  8.9  8.5  8.2  8.1  8.0  8.1  8.2  8.5  8.9
z = 10:  13.5 12.7 12.0 11.4 10.8 10.3  9.8  9.5  9.2  9.1  9.0  9.1  9.2  9.5  9.8

The addition of the above two tables gives the two-way traveltimes. For example, the traveltime from the source to depth point (z, x) = (4, 6) and back to the receiver is t(4, 6) = 4.24 + 5.83 = 10.07. This number (rounded) appears in the fourth row, sixth column of the two-way table below. The two-way table (rounded) for the source and receiver, with depth z as the row and horizontal distance x as the column, is

z =  1:  12 10  8  8  8  8  8  8  8  8  8 10 12 14 16
z =  2:  12 10  9  8  8  8  8  8  8  8  9 10 12 14 16
z =  3:  13 11 10 10  9  9  9  9  9 10 10 11 13 15 17
z =  4:  14 13 12 11 10 10 10 10 10 11 12 13 14 16 17
z =  5:  15 14 13 12 12 11 11 11 12 12 13 14 15 17 18
z =  6:  17 15 14 14 13 13 13 13 13 14 14 15 17 18 19
z =  7:  18 17 16 15 15 15 14 15 15 15 16 17 18 19 21
z =  8:  19 18 18 17 16 16 16 16 16 17 18 18 19 21 22
z =  9:  21 20 19 19 18 18 18 18 18 19 19 20 21 22 23
z = 10:  23 22 21 20 20 20 20 20 20 20 21 22 23 24 25
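The element-by-element addition of the two one-way tables can be sketched as follows (an illustrative reconstruction, not the chapter's code; `one_way` is our helper name):

```python
import math

def one_way(px, nz=10, nx=15):
    # one-way times from surface location (z=1, x=px), velocity 1
    return [[math.hypot(z - 1, x - px) for x in range(1, nx + 1)]
            for z in range(1, nz + 1)]

src, rcv = one_way(3), one_way(11)            # S = (1, 3), R = (1, 11)
two_way = [[s + r for s, r in zip(srow, rrow)]
           for srow, rrow in zip(src, rcv)]

print(round(two_way[3][5], 2))   # depth point (4, 6): 4.24 + 5.83 -> 10.07
```

In the full 3D procedure the same element-by-element sum is taken over the two 3D traveltime tables pulled out for the trace's source and receiver identification numbers.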

Figure 4.6 shows a contour map of the above table. The contour lines are elliptic curves of constant two-way traveltime for the given source and receiver pair. Let the deconvolved trace (i.e., the trace with primary reflections only) be

Time:    0  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19
Sample:  0  0  0  0  0  0  0  0  0  0   1   0   0   2   0   0   0   0   3   0

The next step is to place the amplitude of each trace sample at the depth locations where the traveltime of that sample equals the traveltime given in the above two-way table. The trace is zero for all times except 10, 13, and 18. We now spread the trace


FIGURE 4.6 Elliptic contour lines. (Contour map of the two-way traveltime table, with contour bands 0.0-5.0, 5.0-10.0, 10.0-15.0, 15.0-20.0, and 20.0-25.0.)

out as follows. In the above two-way table, the entries with 10, 13, and 18 are replaced by the trace values 1, 2, 3, respectively. All other entries are replaced by zero. The result is

z =  1:  0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
z =  2:  0 1 0 0 0 0 0 0 0 0 0 1 0 0 0
z =  3:  2 0 1 1 0 0 0 0 0 1 1 0 2 0 0
z =  4:  0 2 0 0 1 1 1 1 1 0 0 2 0 0 0
z =  5:  0 0 2 0 0 0 0 0 0 0 2 0 0 0 3
z =  6:  0 0 0 0 2 2 2 2 2 0 0 0 0 3 0
z =  7:  3 0 0 0 0 0 0 0 0 0 0 0 0 0 0
z =  8:  0 3 3 0 0 0 0 0 0 0 3 3 0 0 0
z =  9:  0 0 0 0 3 3 3 3 3 0 0 0 0 0 0
z = 10:  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

This operation is repeated for all traces in the survey, and the resulting tables are added together. The final image appears through the constructive and destructive interference among the individual trace contributions. The procedure for a variable velocity is the same as the above procedure for a constant velocity, except that now the traveltimes are computed according to the velocity function v(x, y, z). The eikonal equation can provide the means; hence, for the rest of this chapter we will develop the properties of this basic equation.
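The spread step for the single trace above can be sketched in a few lines. This is illustrative only: a real implementation would loop over all traces and accumulate them into one image, and the book's time-matching rule may differ slightly from the simple rounding used here.

```python
import math

NZ, NX = 10, 15

def one_way(px):
    return [[math.hypot(z - 1, x - px) for x in range(1, NX + 1)]
            for z in range(1, NZ + 1)]

src, rcv = one_way(3), one_way(11)     # tables for S = (1, 3), R = (1, 11)

# deconvolved trace: amplitude 1 at time 10, 2 at time 13, 3 at time 18
trace = {10: 1, 13: 2, 18: 3}

# spread: a depth point receives a sample's amplitude when its rounded
# two-way time equals that sample's time; all other points get zero
image = [[trace.get(round(src[iz][ix] + rcv[iz][ix]), 0)
          for ix in range(NX)] for iz in range(NZ)]

print(image[2][0], image[6][0])   # depth points (3, 1) and (7, 1) -> 2 3
```

Summing such per-trace images over the whole survey is the superposition step: coherent reflectors add constructively while the scattered ellipse arcs interfere destructively.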

4.8 Seismic Rays

To move the received reflected events back into the Earth and place their energy at the point of reflection, it is necessary to have a good understanding of ray theory. We assume the medium is isotropic. Rays are directed curves that are always perpendicular to the


wavefront at any given time. The rays point along the direction of the motion of the wavefront. As time progresses, the disturbance propagates, and we obtain a family of wavefronts. We will now describe the behavior of the rays and wavefronts in media with a continuously varying velocity.

In the treatment of light as wave motion, there is a region of approximation in which the wavelength is small in comparison with the dimensions of the components of the optical system involved. This region of approximation is treated by the methods of geometrical optics. When the wave character of the light cannot be ignored, the methods of physical optics apply. Since the wavelength of light is very small compared to ordinary objects, geometrical optics can describe the behavior of a light beam satisfactorily in many situations. Within the approximation represented by geometrical optics, light travels along lines called rays. The ray is essentially the path along which most of the energy is transmitted from one point to another. The ray is a mathematical device rather than a physical entity. In practice, one can produce very narrow beams of light (e.g., a laser beam), which may be considered as physical manifestations of rays.

When we turn to a seismic wave, the wavelength is not particularly small in comparison with the dimensions of geologic layers within the Earth. However, the concept of a seismic ray fulfills an important need. Geometric seismics is not nearly as accurate as geometric optics, but ray theory is still used to solve many important practical problems. In particular, the most popular form of prestack migration is based on tracing the raypaths of the primary reflections. In ancient times, Archimedes defined the straight line as the shortest path between two points. Heron explained the paths of reflected rays of light based on a principle of least distance.
In the 17th century, Fermat proposed the principle of least time, which let him account for refraction as well as reflection. The Mississippi River has created most of Louisiana with sand and silt. The river could not have deposited these sediments by remaining in one channel. If it had remained in one channel, southern Louisiana would be a long narrow peninsula reaching into the Gulf of Mexico. Southern Louisiana exists in its present form because the Mississippi River has flowed here and there within an arc about two hundred miles wide, frequently and radically changing course, surging over the left or the right bank to flow in a new direction. It is always the river's purpose to get to the Gulf in the least time. This means that its path must follow the steepest way down. The gradient is the vector that points in the direction of the steepest ascent. Thus, the river's path must follow the direction of the negative gradient, which is the path of steepest descent. As the mouth advances southward and the river lengthens, the steepness of the path declines, the current slows, and sediment builds up the bed. Eventually, the bed builds up so much that the river spills to one side to follow what has become the steepest way down. Major shifts of that nature have occurred about once in a millennium. The Mississippi's main channel of three thousand years ago is now Bayou Teche. A few hundred years later, the channel shifted abruptly to the east. About two thousand years ago, the channel shifted to the south. About one thousand years ago, the channel shifted to the river's present course. Today, the Mississippi River has advanced so far past New Orleans and out into the Gulf that the channel is about to shift again, to the Atchafalaya. By the route of the Atchafalaya, the distance across the delta plain is 145 miles, about half the length of the route of the present channel. The Mississippi River intends to change its course to this shorter and steeper route.
The concept of potential was first developed to deal with problems of gravitational attraction. In fact, a simple gravitational analogy is helpful in explaining potential. We do work in carrying an object up a hill. This work is stored as potential energy, and it can be recovered by descending in any way we choose. A topographic map can be used to visualize the terrain. Topographic maps provide information about the elevation of the surface above sea level. The elevation is represented on a topographic map by contour


lines. Each point on a contour line has the same elevation. In other words, a contour line represents a horizontal slice through the surface of the land. A set of contour lines tells you the shape of the land. For example, hills are represented by concentric loops, whereas stream valleys are represented by V-shapes. The contour interval is the elevation difference between adjacent contour lines. Steep slopes have closely spaced contour lines, while gentle slopes have very widely spaced contour lines.

In seismic theory, the counterpart of gravitational potential is the wavefront t(x, y, z) = constant. A vector field is a rule that assigns a vector, in our case the gradient

$$\nabla t(x, y, z) = \left( \frac{\partial t}{\partial x}, \frac{\partial t}{\partial y}, \frac{\partial t}{\partial z} \right)$$

to each point (x, y, z). In visualizing a vector field, we imagine there is a vector extending from each point. Thus, the vector field associates a direction to each point. If a hypothetical particle moves in such a manner that its direction at any point coincides with the direction of the vector field at that point, then the curve traced out is called a flow line. In the seismic case, the wavefront corresponds to the equipotential surface and the seismic ray corresponds to the flow line.

In 3D space, let r be the vector from the origin (0, 0, 0) to an arbitrary point (x, y, z). A vector is specified by its components. A compact notation is to write r as r = (x, y, z). We call r the radius vector. A more explicit way is to write r as r = x x̂ + y ŷ + z ẑ, where x̂, ŷ, and ẑ are the three orthogonal unit vectors. These unit vectors are defined as the vectors that have magnitude equal to one and have directions lying along the x, y, z axes, respectively. They are referred to as ''x-hat'' and so on.

Now, let the vector r = (x, y, z) represent a point on a given ray (Figure 4.7). Let s denote arc length along the ray. Let r + dr = (x + dx, y + dy, z + dz) give an adjacent point on the same ray. The vector dr = (dx, dy, dz) is (approximately) the tangent vector to the ray. The length of this vector is $\sqrt{dx^2 + dy^2 + dz^2}$, which is approximately equal to the increment ds of the arc length on the ray. As a result, the unit vector tangent to the ray is

$$\mathbf{u} = \frac{d\mathbf{r}}{ds} = \frac{dx}{ds}\hat{x} + \frac{dy}{ds}\hat{y} + \frac{dz}{ds}\hat{z}$$

The unit tangent vector can also be written as

$$\mathbf{u} = \left( \frac{dx}{ds}, \frac{dy}{ds}, \frac{dz}{ds} \right)$$

The velocity along the ray is v = ds/dt, so dt = ds/v = n ds, where n(r) = 1/v(r) is the slowness and ds is an increment of path length along the given ray. Thus, the seismic traveltime field is

FIGURE 4.7 Raypath and wavefronts. (The figure shows a ray crossing a wavefront, with the radius vector r from the origin and the unit tangent u along the ray.)

$$t(\mathbf{r}) = \int_{\mathbf{r}_0}^{\mathbf{r}} n(\mathbf{r}) \, ds$$

It is understood, of course, that the path of integration is along the given ray. The above equation holds for any ray. A wavefront is a surface of constant traveltime. The time difference between two wavefronts is a constant independent of the ray used to calculate the time difference. A wave as it travels must follow the path of least time. The wavefronts are like contour lines on a hill, where the height of the hill is measured in time.

Take a point on a contour line. In what direction will the ray point? Suppose the ray points along the contour line (that is, along the wavefront). As the wave travels a certain distance along this hypothetical ray, it takes time. But all time is the same along the wavefront. Thus, a wave cannot travel along a wavefront. It follows that a ray must point away from a wavefront. Suppose a ray points away from the wavefront. The wave wants to take the least time to travel to the new wavefront. By isotropy, the wave velocity is the same in all directions. Since the traveltime is the slowness multiplied by the distance, the wave wants to take the raypath that goes the shortest distance. The shortest distance is along the path that has no component along the wavefront; that is, the shortest distance is along the normal to the wavefront. In other words, the ray's unit tangent vector u must be orthogonal to the wavefront. By definition, the gradient is a vector that points in the direction orthogonal to the wavefront. Thus, the ray's unit tangent vector u and the gradient ∇t of the wavefront must point in the same direction.

If the given wavefront is at time t and the new wavefront is at time t + dt, then the traveltime along the ray is dt. If s measures the path length along the given ray, then the travel distance in time dt is ds. Along the raypath, the increments dt and ds are related by the slowness, that is, dt = n ds. Thus, the slowness is equal to the directional derivative in the direction of the raypath, that is, n = dt/ds.
In other words, the swiftness along the raypath direction is v = ds/dt, and the slowness along the raypath direction is n = dt/ds. If we write the directional derivative in terms of its components, this equation becomes

$$n = \frac{dt}{ds} = \frac{\partial t}{\partial x}\frac{dx}{ds} + \frac{\partial t}{\partial y}\frac{dy}{ds} + \frac{\partial t}{\partial z}\frac{dz}{ds} = \nabla t \cdot \frac{d\mathbf{r}}{ds}$$

Because dr/ds = u, it follows that the above equation is n = ∇t · u. Since u is a unit vector in the same direction as the gradient, it follows that

$$n = |\nabla t| \, |\mathbf{u}| \cos 0 = |\nabla t|$$

In other words, the slowness is equal to the magnitude of the gradient. Since the gradient ∇t and the raypath each have the same direction u, and the gradient has magnitude n, and u has magnitude unity, it follows that

$$\nabla t = n \mathbf{u}$$

This equation is the vector eikonal equation. The vector eikonal equation written in terms of its components is

$$\left( \frac{\partial t}{\partial x}, \frac{\partial t}{\partial y}, \frac{\partial t}{\partial z} \right) = n \left( \frac{dx}{ds}, \frac{dy}{ds}, \frac{dz}{ds} \right)$$

If we take the squared magnitude of each side, we obtain the eikonal equation

$$\left( \frac{\partial t}{\partial x} \right)^2 + \left( \frac{\partial t}{\partial y} \right)^2 + \left( \frac{\partial t}{\partial z} \right)^2 = n^2$$

The left-hand side involves the wavefront and the right-hand side involves the ray. The connecting link is the slowness. In the eikonal equation, the function t(x, y, z) is the traveltime (also called the eikonal) from the source to the point with the coordinates (x, y, z), and n(x, y, z) = 1/v(x, y, z) is the slowness (or reciprocal velocity) at that point. The eikonal equation describes the traveltime propagation in an isotropic medium. To obtain a well-posed initial value problem, it is necessary to know the velocity function v(x, y, z) at all points in space. Moreover, as an initial condition, the source or some particular wavefront must be specified. Furthermore, one must choose one of the two branches of the solutions (namely, either the wave going from the source or else the wave going to the source). The eikonal equation then yields the traveltime field t(x, y, z) in the heterogeneous medium, as required for migration.

What does the eikonal equation ∇t = nu say? It says that, because of Fermat's principle of least time, the raypath direction must be orthogonal to the wavefront. The eikonal equation is the fundamental equation that connects the ray (which corresponds to the fuselage of the airplane) to the wavefront (which corresponds to the wings of the airplane). The wings let the fuselage feel the effects of points removed from the path of the fuselage. The eikonal equation makes a traveling wave (as envisaged by Huygens) fundamentally different from a traveling particle (as envisaged by Newton). Hamilton perceived that there is a wave-particle duality, which provides the mathematical foundation of quantum mechanics.
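The eikonal equation can be checked numerically for the simplest traveltime field, that of a point source in a homogeneous medium, where t(r) = n|r|. The sketch below (illustrative, not from the chapter) estimates ∇t by central finite differences and confirms that its magnitude equals the slowness n:

```python
import math

n = 0.5                        # constant slowness (velocity v = 2)

def t(x, y, z):                # traveltime field of a point source at origin
    return n * math.sqrt(x * x + y * y + z * z)

# central finite differences for grad t at an arbitrary point p
h, p = 1e-6, (3.0, -2.0, 5.0)
grad = [(t(*(c + h * (i == j) for j, c in enumerate(p)))
         - t(*(c - h * (i == j) for j, c in enumerate(p)))) / (2 * h)
        for i in range(3)]

mag = math.sqrt(sum(g * g for g in grad))
print(round(mag, 6))           # |grad t| -> 0.5, the slowness n
```

The gradient direction here is radial, which is exactly the ray direction u for a point source in a homogeneous medium, in agreement with ∇t = n u.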

4.9 The Ray Equations

In this section, the position vector r always represents a point on a specific raypath, and not an arbitrary point in space. As time increases, r traces out the particular raypath in question. The seismic ray at any given point follows the direction of the gradient of the traveltime field t(r). As before, let u be the unit vector along the ray. The ray, in general, follows a curved path, and nu is the tangent to this curved raypath. We now want to derive an equation that will tell us how nu changes along the curved raypath. The vector eikonal equation is written as

$$n\mathbf{u} = \nabla t$$

We now take the derivative of the vector eikonal equation with respect to distance s along the raypath. We obtain the ray equation

$$\frac{d}{ds}(n\mathbf{u}) = \frac{d}{ds}(\nabla t)$$

Interchange ∇ and d/ds and use dt/ds = n. Thus, the right-hand side becomes

$$\frac{d \nabla t}{ds} = \nabla \frac{dt}{ds} = \nabla n$$

Thus, the ray equation becomes

FIGURE 4.8 Straight and curved rays. (The figure shows rays in a medium with high slowness above and low slowness below: a ray parallel to the slowness gradient travels straight, while an oblique ray is refracted along a curved path.)

$$\frac{d}{ds}(n\mathbf{u}) = \nabla n$$

This equation, together with the equation for the unit tangent vector

$$\frac{d\mathbf{r}}{ds} = \mathbf{u}$$

are called the ray equations.

We need to understand how a single ray, say a seismic ray, moving along a particular path can know what is an extremal path in the variational sense. To illustrate the problem, consider a medium whose slowness n decreases with vertical depth, but is constant laterally. Thus, the gradient of the slowness at any location points straight up (Figure 4.8). The vertical line on the left depicts a raypath parallel to the gradient of the slowness. This ray undergoes no refraction. However, the path of the ray on the right intersects the contour lines of slowness at an angle. The right-hand ray is refracted and follows a curved path, even though the ray strikes the same horizontal contour lines of slowness as did the left-hand ray, where there was no refraction. This shows that the path of a ray cannot be explained solely in terms of the values of the slowness on the path. We must also consider the transverse values of the slowness along neighboring paths, that is, along paths not taken by that particular ray.

The classical wave explanation, proposed by Huygens, resolves this problem by saying that light does not propagate in the form of a single ray. According to the wave interpretation, light propagates as a wavefront possessing transverse width. Think of an airplane traveling along the raypath. The fuselage of the airplane points in the direction of the raypath. The wings of the aircraft are along the wavefront. Clearly, the wavefront propagates more rapidly on the side where the slowness is low (i.e., where the velocity is high) than on the side where the slowness is high. As a result, the wavefront naturally turns in the direction of the gradient of slowness.

4.10 Numerical Ray Tracing

Computer technology and seismic instrumentation have experienced great advances in the past few years. As a result, exploration geophysics is in a state of transition from computer-limited 2D processing to computer-intensive 3D processing. In the past, most seismic surveys were along surface lines, which yield 2D subsurface images. The wave equation acts nicely in one dimension, and in three dimensions, but not in two dimensions. In one dimension, waves (as on a string) propagate without distortion. In three


dimensions, waves (as in the Earth) propagate in an undistorted way except for a spherical correction factor. However, in two dimensions, wave propagation is complicated and distorted. By its very nature, 2D processing can never account for events originating outside of the plane. As a result, 2D processing is broken up into a large number of approximate partial steps in a sequence of operations. These steps are ingenious, but they can never give a true image. However, 3D processing accounts for all of the events. It is now cost effective to lay out seismic surveys over a surface area and to do 3D processing. The third dimension is no longer missing, and, consequently, the need for a large number of piecemeal 2D approximations is gone. Prestack depth migration is a 3D imaging process that is computationally extensive, but mathematically simple. The resulting 3D images of the interior of the Earth surpass all expectations in utility and beauty.

Let us now consider the general case in which we have a spatially varying velocity function v(x, y, z) = v(r). This velocity function represents a velocity field. For a fixed constant v_0, the equation v(r) = v_0 specifies those positions r where the velocity has this fixed value. The locus of such positions makes up an isovelocity surface. The gradient

$$\nabla v(\mathbf{r}) = \left( \frac{\partial v}{\partial x}, \frac{\partial v}{\partial y}, \frac{\partial v}{\partial z} \right)$$

is normal to the isovelocity surface and points in the direction of the greatest increase in velocity. Similarly, the equation n(r) = n_0 for a fixed value of slowness n_0 specifies an isoslowness surface. The gradient

$$\nabla n(\mathbf{r}) = \left( \frac{\partial n}{\partial x}, \frac{\partial n}{\partial y}, \frac{\partial n}{\partial z} \right)$$

is normal to the isoslowness surface and points in the direction of greatest increase in slowness. The isovelocity and isoslowness surfaces coincide, and

$$\nabla v = -n^{-2} \, \nabla n$$

so the respective gradients point in opposite directions, as we would expect. A seismic ray makes its way through the slowness field. As the wavefront progresses in time, the raypath is bent according to the slowness field.
For example, suppose we have a stratified Earth in which the slowness decreases with depth. A vertical raypath does not bend, as it is pulled equally in all lateral directions. However, a nonvertical ray drags on its slow side, and therefore it curves away from the vertical and bends toward the horizontal. This is the case of a diving wave, whose raypath eventually curves enough to reach the Earth's surface again. Certainly, the slowness field, together with the initial direction of the ray, determines the entire raypath. Except in special cases, however, we must determine such raypaths numerically.

Assume that we know the slowness function n(r) and that we know the ray direction u_1 at point r_1. We now want to derive an algorithm for finding the ray direction u_2 at point r_2. We choose a small, but finite, change in path length Δs. Then we use the first ray equation, which we recall is

$$\frac{d\mathbf{r}}{ds} = \mathbf{u}$$

to compute the change Δr = r_2 - r_1. The required approximation is

$$\Delta \mathbf{r} = \mathbf{u}_1 \, \Delta s$$


or

$$\mathbf{r}_2 = \mathbf{r}_1 + \mathbf{u}_1 \, \Delta s$$

We have thus found the first desired quantity r_2. Next, we use the second ray equation, which we recall is

$$\frac{d(n\mathbf{u})}{ds} = \nabla n$$

in the form

$$d(n\mathbf{u}) = \nabla n \, ds$$

The required approximation is

$$\Delta(n\mathbf{u}) = (\nabla n) \, \Delta s$$

or

$$n(\mathbf{r}_2)\mathbf{u}_2 - n(\mathbf{r}_1)\mathbf{u}_1 = \nabla n \, \Delta s$$

For accuracy, ∇n may be evaluated by differentiating the known function n(r) midway between r_1 and r_2. Thus, the desired u_2 is given as

$$\mathbf{u}_2 = \frac{n(\mathbf{r}_1)}{n(\mathbf{r}_2)} \mathbf{u}_1 + \frac{\Delta s}{n(\mathbf{r}_2)} \nabla n$$

Note that the vector u_1 is pulled in the direction of ∇n in forming u_2; that is, the ray drags on the slow side, and so is bent in the direction of increasing slowness. The special case of no bending occurs when u_1 and ∇n are parallel. As we have seen, a vertical wave in a horizontally stratified medium is an example of such a special case. We have thus found how to advance the wave along the ray by an incremental raypath distance. We can repeat the algorithm to advance the wave by any desired distance.
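The stepping algorithm can be sketched as follows (an illustrative implementation under assumed names such as `trace_ray`; the slowness field n = 1 - 0.05z is a made-up example). It confirms the two behaviors noted above: a ray parallel to ∇n does not bend, while an oblique ray in a medium whose slowness decreases with depth bends toward the horizontal.

```python
import math

def trace_ray(r0, u0, n_func, grad_n, ds=0.01, nsteps=1000):
    """Advance a ray by repeated application of
    r2 = r1 + u1*ds  and  u2 = (n(r1)/n(r2))*u1 + (ds/n(r2))*grad n."""
    r, u = list(r0), list(u0)
    for _ in range(nsteps):
        r_new = [ri + ui * ds for ri, ui in zip(r, u)]
        g = grad_n([(a + b) / 2 for a, b in zip(r, r_new)])  # midpoint grad n
        n1, n2 = n_func(r), n_func(r_new)
        u = [(n1 / n2) * ui + (ds / n2) * gi for ui, gi in zip(u, g)]
        norm = math.sqrt(sum(ui * ui for ui in u))
        u = [ui / norm for ui in u]        # keep u a unit vector
        r = r_new
    return r, u

# coordinates (x, z), z pointing down; slowness decreases with depth,
# so grad n points straight up: n = 1 - 0.05*z, grad n = (0, -0.05)
n_func = lambda r: 1.0 - 0.05 * r[1]
grad_n = lambda r: (0.0, -0.05)

# a ray parallel to grad n (straight down) undergoes no refraction ...
_, u_vert = trace_ray((0.0, 0.0), (0.0, 1.0), n_func, grad_n)
# ... while an oblique ray bends toward the horizontal (a diving wave)
_, u_obl = trace_ray((0.0, 0.0), (math.sin(0.5), math.cos(0.5)), n_func, grad_n)

print(abs(u_vert[0]) < 1e-12, u_obl[0] > math.sin(0.5))   # True True
```

Renormalizing u at each step is a small numerical safeguard; the update rule itself preserves unit length only to first order in Δs.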

4.11 Conclusions

The acquisition of seismic data in many promising areas yields raw traces that cannot be interpreted. The reason is that signal-generated noise conceals the desired primary reflections. The solution to this problem was found in the 1950s with the introduction of the first commercially available digital computers and the signal-processing method of deconvolution. Digital signal-enhancement methods, and, in particular, the various methods of deconvolution, are able to suppress signal-generated noise on seismic records and bring out the desired primary-reflected energy. Next, the energy of the primary reflections must be moved to the spatial positions of the subsurface reflectors. This process, called migration, involves the superposition of all the deconvolved traces according to a scheme similar to Huygens’s construction. The result gives the reflecting horizons and other features that make up the desired image. Thus, digital processing, as it is


currently done, is roughly divided into two main parts, namely signal enhancement and event migration. The day is not far off, provided that research actively continues in the future as it has in the past, when the processing scheme will not be divided into two parts, but will be united as a whole. The signal-generated noise consists of physical signals that future processing should not destroy, but utilize. When all the seismic information is used in an integrated way, then the images produced will be even more excellent.

References

Robinson, E.A., The MIT Geophysical Analysis Group from inception to 1954, Geophysics, 70, 7JA-30JA, 2005.

Yilmaz, O., Seismic Data Processing, Society of Exploration Geophysicists, Tulsa, 1987.

5 Multi-Dimensional Seismic Data Decomposition by Higher Order SVD and Unimodal ICA

Nicolas Le Bihan, Valeriu Vrabie, and Jérôme I. Mars

CONTENTS
5.1 Introduction
5.2 Matrix Data Sets
    5.2.1 Acquisition
    5.2.2 Matrix Model
5.3 Matrix Processing
    5.3.1 SVD
        5.3.1.1 Definition
        5.3.1.2 Subspace Method
    5.3.2 SVD and ICA
        5.3.2.1 Motivation
        5.3.2.2 Independent Component Analysis
        5.3.2.3 Subspace Method Using SVD-ICA
    5.3.3 Application
5.4 Multi-Way Array Data Sets
    5.4.1 Multi-Way Acquisition
    5.4.2 Multi-Way Model
5.5 Multi-Way Array Processing
    5.5.1 HOSVD
        5.5.1.1 HOSVD Definition
        5.5.1.2 Computation of the HOSVD
        5.5.1.3 The (rc, rx, rt)-rank
        5.5.1.4 Three-Mode Subspace Method
    5.5.2 HOSVD and Unimodal ICA
        5.5.2.1 HOSVD and ICA
        5.5.2.2 Subspace Method Using HOSVD-Unimodal ICA
    5.5.3 Application to Simulated Data
    5.5.4 Application to Real Data
5.6 Conclusions
References


Signal and Image Processing for Remote Sensing


5.1 Introduction

This chapter describes multi-dimensional seismic data processing using the higher order singular value decomposition (HOSVD) and partial (unimodal) independent component analysis (ICA). These techniques are used for wavefield separation and enhancement of the signal-to-noise ratio (SNR) in the data set. The use of multi-linear methods such as the HOSVD is motivated by the natural modeling of a multi-dimensional data set using multi-way arrays. In particular, we present a multi-way model for signals recorded on arrays of vector-sensors acquiring seismic vibrations in different directions of 3D space. Such acquisition schemes allow the recording of the polarization of waves, and the proposed multi-way model ensures the effective use of the polarization information in the processing. This leads to a substantial increase in the performance of the separation algorithms. Before introducing the multi-way model and processing, we first describe the classical subspace method based on the SVD and ICA techniques for 2D (matrix) seismic data sets. Using a matrix model for these data sets, the SVD-based subspace method is presented, and it is shown how an extra ICA step in the processing allows better wavefield separation. Then, considering signals recorded on vector-sensor arrays, the multi-way model is defined and discussed. The HOSVD is presented and some of its properties are detailed. Based on this multi-linear decomposition, we propose a subspace method that allows separation of polarized waves under orthogonality constraints. We then introduce an ICA step in the process, performed here uniquely on the temporal mode of the data set, leading to the so-called HOSVD–unimodal ICA subspace algorithm. Results on simulated and real polarized data sets show the ability of this algorithm to surpass both a matrix-based algorithm and a subspace method using only the HOSVD. Section 5.2 presents matrix data sets and their associated model.
In Section 5.3, the well-known SVD is detailed, as well as the matrix-based subspace method. Then, we present the ICA concept and its contribution to the subspace formulation in Section 5.3.2. Applications of SVD–ICA to seismic wavefield separation are discussed by way of illustrations. Section 5.4 shows how signal mixtures recorded on vector-sensor arrays can be described by a multi-way model. Then, in Section 5.5, we introduce the HOSVD and the associated subspace method for multi-way data processing. As in the matrix data set case, an extra ICA step is proposed, leading to the HOSVD–unimodal ICA subspace method in Section 5.5.2. Finally, in Section 5.5.3 and Section 5.5.4, we illustrate the proposed algorithm on simulated and real multi-way polarized data sets. These examples emphasize the potential of using both HOSVD and ICA in multi-way data set processing.

5.2 Matrix Data Sets

In this section, we show how the signals recorded on scalar-sensor arrays can be modeled as a matrix data set having two modes or diversities: time and distance. Such a model allows the use of subspace-based processing using a SVD of the matrix data set. Also, an additional ICA step can be added to the processing to relax the unjustified orthogonality constraint for the propagation vectors by imposing a stronger constraint of (fourth-order) independence of the estimated waves. Illustrations of these matrix algebra techniques are presented on a simulated data set. Application to a real ocean bottom seismic (OBS) data set can be found in Refs. [1,2].

Multi-Dimensional Seismic Data Decomposition by Higher Order SVD and Unimodal ICA

5.2.1 Acquisition

In geophysics, the most commonly used method to describe the structure of the earth is seismic reflection. This method provides images of the underground in 2D or 3D, depending on the geometry of the network of sensors used. Classical recorded data sets are usually gathered into a matrix having a time diversity, describing the time or depth propagation through the medium at each sensor, and a distance diversity, related to the aperture of the array. Several methods exist to gather data sets; the most popular are common shotpoint gather, common receiver gather, and common midpoint gather [3]. Seismic processing consists of a series of elementary processing procedures used to transform field data, usually recorded in common shotpoint gathers, into 2D or 3D common midpoint stacked signals. Before stacking and interpretation, part of the processing is used to suppress unwanted coherent signals like multiple waves, ground-roll (surface waves), and refracted waves, and also to cancel noise. To achieve this goal, several filters are classically applied on seismic data sets. The SVD is a popular method to separate an initial data set into signal and noise subspaces. In some applications [4,5], when wavefield alignment is performed, the SVD method allows separation of the aligned wave from the other wavefields.

5.2.2 Matrix Model

Consider a uniform linear array composed of Nx omni-directional sensors recording the contributions of P waves, with P < Nx. Such a record can be written mathematically using a convolutive model for seismic signals, first suggested by Robinson [6]. Using the superposition principle, the discrete-time signal x_k(m) (m is the time index) recorded on sensor k is a linear combination of the P waves received on the array together with an additive noise n_k(m):

x_k(m) = \sum_{i=1}^{P} a_{ki} s_i(m - m_{ki}) + n_k(m)    (5.1)

where s_i(m) is the ith source waveform, propagated through a transfer function supposed here to consist of a delay m_{ki} and an attenuation factor a_{ki}. The noises n_k(m) on each sensor are supposed centered, Gaussian, spatially white, and independent of the sources. In the sequel, the use of the SVD to separate waves is only of significant interest if the subspace occupied by the part of interest contained in the mixture is of low rank. Ideally, the SVD performs well when the rank is 1. Thus, to ensure good results of the process, a preprocessing step is applied to the data set, consisting of the alignment (delay correction) of a chosen high-amplitude wave. Denoting the aligned wave by s_1(m), the model becomes after alignment:

y_k(m) = a_{k1} s_1(m) + \sum_{i=2}^{P} a_{ki} s_i(m - m'_{ki}) + n'_k(m)    (5.2)

where y_k(m) = x_k(m + m_{k1}), m'_{ki} = m_{ki} - m_{k1}, and n'_k(m) = n_k(m + m_{k1}). In the following we assume that the wave s_1(m) is independent from the others and therefore independent from s_i(m - m'_{ki}).
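The convolutive model of Equation 5.1 and the alignment preprocessing of Equation 5.2 can be sketched as follows. This is a minimal NumPy illustration, not the chapter's implementation; the circular shift used for the delays is an assumption that is adequate only when the waveforms are zero near the trace edges:

```python
import numpy as np

def record_sensor(sources, delays, attenuations, noise_std, rng):
    """x_k(m) = sum_i a_ki * s_i(m - m_ki) + n_k(m)  (Equation 5.1) for one
    sensor: each wave is delayed by m_ki samples and scaled by a_ki."""
    nt = len(sources[0])
    x = rng.normal(0.0, noise_std, nt)          # additive Gaussian noise n_k(m)
    for s_i, m_ki, a_ki in zip(sources, delays, attenuations):
        x = x + a_ki * np.roll(s_i, m_ki)       # delayed, attenuated wave
    return x

def align(x_k, m_k1):
    """Alignment preprocessing: y_k(m) = x_k(m + m_k1)  (Equation 5.2)."""
    return np.roll(x_k, -m_k1)

# One impulsive wave, delayed by 5 samples on this sensor, then re-aligned.
rng = np.random.default_rng(0)
s1 = np.zeros(64)
s1[10] = 1.0
x = record_sensor([s1], [5], [2.0], 0.0, rng)   # spike lands at index 15
y = align(x, 5)                                 # spike back at index 10
```

With the noise level set to zero, the aligned trace recovers the attenuated waveform exactly.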


Considering the simplified model of the received signals (Equation 5.2) and supposing Nt time samples available, we define the matrix model of the recorded data set Y ∈ R^{Nx×Nt} as

Y = \{ y_{km} = y_k(m) \mid 1 \le k \le N_x,\ 1 \le m \le N_t \}    (5.3)

That is, the data matrix Y has rows that are the Nx signals y_k(m) given in Equation 5.2. Such a model allows the use of matrix decompositions, and especially the SVD, for its processing.

5.3 Matrix Processing

We now present the definition of the SVD of such a data matrix, which will be used to decompose it into orthogonal subspaces and in the associated wave separation technique.

5.3.1 SVD

As the SVD is a widely used matrix algebra technique, we only recall here some theoretical remarks and redirect readers interested in computational issues to the book by Golub and Van Loan [7].

5.3.1.1 Definition

Any matrix Y ∈ R^{Nx×Nt} can be decomposed into the product of three matrices as follows:

Y = U D V^T    (5.4)

where U is an Nx × Nx matrix, D is an Nx × Nt pseudo-diagonal matrix with singular values {λ1, λ2, ..., λN} on its diagonal, satisfying λ1 ≥ λ2 ≥ ... ≥ λN ≥ 0 (with N = min(Nx, Nt)), and V is an Nt × Nt matrix. The columns of U (respectively of V) are called the left (respectively right) singular vectors, u_j (respectively v_j), and form orthonormal bases. Thus U and V are orthogonal matrices. The rank r (with r ≤ N) of the matrix Y is given by the number of nonvanishing singular values. Such a decomposition can also be rewritten as

Y = \sum_{j=1}^{r} \lambda_j u_j v_j^T    (5.5)

where u_j (respectively v_j) are the columns of U (respectively V). This notation shows that the SVD allows any matrix to be expressed as a sum of r rank-1 matrices.¹
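As a quick numerical check of Equation 5.4 and Equation 5.5, here is a NumPy sketch; the matrix is random and simply stands in for a data matrix Y:

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.normal(size=(8, 32))                     # an Nx x Nt data matrix

# Equation 5.4: Y = U D V^T (thin form; lam holds the decreasing singular values)
U, lam, Vt = np.linalg.svd(Y, full_matrices=False)

# Equation 5.5: Y as a sum of r rank-1 matrices lambda_j * u_j * v_j^T
r = int(np.sum(lam > 1e-10 * lam[0]))            # numerical rank
Y_rebuilt = sum(lam[j] * np.outer(U[:, j], Vt[j]) for j in range(r))
```

The rank-1 sum reconstructs Y exactly (up to floating-point precision), and the singular values come out already sorted in decreasing order.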

5.3.1.2 Subspace Method

The SVD has been widely used in signal processing [8] because it gives the best low-rank approximation (in the least squares sense) of a given matrix [9]. This property allows denoising if the signal subspace is of relatively low rank. The subspace method thus consists of decomposing the data set into two orthogonal subspaces, with the first one built from the p singular vectors related to the p highest singular values, which is the best rank-p approximation of the original data. Using the SVD notation of Equation 5.5, this can be written as follows for a data matrix Y with rank r:

Y = Y_{Signal} + Y_{Noise} = \sum_{j=1}^{p} \lambda_j u_j v_j^T + \sum_{j=p+1}^{r} \lambda_j u_j v_j^T    (5.6)

1. Any matrix made up of the product of a column vector by a row vector is a matrix whose rank is equal to 1 [7].

Orthogonality between the subspaces spanned by the two sets of singular vectors is ensured by the fact that the left and right singular vectors form orthonormal bases. From a practical point of view, the value of p is chosen by finding an abrupt change of slope in the curve of relative singular values (i.e., the singular values expressed as percentages of their sum) contained in the matrix D defined in Equation 5.4. For cases where no "visible" change of slope can be found, the value of p can be fixed at 1 for a perfect alignment of waves, or at 2 for an imperfect alignment or for dispersive waves [10].
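The subspace split of Equation 5.6 can be sketched as follows, assuming the truncation rank p has already been chosen:

```python
import numpy as np

def svd_subspaces(Y, p):
    """Split Y into Y_Signal + Y_Noise by truncating the SVD at rank p
    (Equation 5.6). The two parts are orthogonal by construction."""
    U, lam, Vt = np.linalg.svd(Y, full_matrices=False)
    Y_signal = (U[:, :p] * lam[:p]) @ Vt[:p]     # sum of the first p rank-1 terms
    return Y_signal, Y - Y_signal

# A rank-1 "aligned wave" plus weak noise: p = 1 captures almost everything.
rng = np.random.default_rng(2)
wave = np.sin(np.linspace(0.0, 6.0, 128))
Y = np.outer(np.ones(8), wave) + 0.01 * rng.normal(size=(8, 128))
Y_signal, Y_noise = svd_subspaces(Y, 1)
```

On this toy data the estimated noise subspace carries only a small fraction of the energy, mirroring the aligned-wave case discussed above.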

5.3.2 SVD and ICA

The motivation to relax the unjustified orthogonality constraint for the propagation vectors is now presented. ICA is the method used to achieve this, by imposing fourth-order independence on the estimated waves. This provides a new subspace method based on SVD–ICA.

5.3.2.1 Motivation

The SVD of the data matrix Y in Equation 5.4 provides two orthogonal matrices composed of the left u_j (respectively right v_j) singular vectors. Note here that the v_j are called estimated waves, because they give the time dependence of the signals received by the array sensors, and the u_j propagation vectors, because they give the amplitudes of the v_j's on the sensors [2]. As the SVD provides orthogonal matrices, these vectors are also orthogonal. Orthogonality of the v_j's means that the estimated waves are decorrelated (second-order independence). This actually matches the usual geophysical situations, in which recorded waves are supposed decorrelated. However, there is no physical reason to impose orthogonality of the propagation vectors u_j: why should different recorded waves have orthogonal propagation vectors? Furthermore, imposing the orthogonality of the u_j's forces the estimated waves v_j to be a mixture of the recorded waves [1]. One way to relax this limitation is to impose a stronger criterion on the estimated waves, namely fourth-order statistical independence, and consequently to drop the unjustified orthogonality constraint for the propagation vectors. This step is motivated by cases encountered in geophysical situations, where the recorded signals can be approximated as an instantaneous linear mixture of unknown waves supposed to be mutually independent [11]. This can be done using ICA.

5.3.2.2 Independent Component Analysis

ICA is a blind decomposition of a multi-channel data set composed of an unknown linear mixture of unknown source signals, based on the assumption that these signals are mutually statistically independent. It is used in blind source separation (BSS) to recover independent sources (modeled as vectors) from a set of recordings containing linear combinations of these sources [12–15]. The statistical independence of the sources means that their cross-cumulants of any order vanish. The third-order cumulants are usually discarded because they are generally close to zero. Therefore, we will use here fourth-order statistics, which have been found to be sufficient for instantaneous mixtures [12,13].


ICA is usually resolved by a two-step algorithm: prewhitening followed by a higher-order step. The first step consists of extracting decorrelated waves from the initial data set; it is carried out directly by an SVD, as the v_j's are orthogonal. The second step consists of finding a rotation matrix B that leads to fourth-order independence of the estimated waves. We suppose here that the nonaligned waves in the data set Y are contained in a subspace of dimension R − 1, smaller than the rank r of Y. Under this assumption, only the first R estimated waves V_R = [v_1, ..., v_R] ∈ R^{Nt×R} are taken into account [2]. As the recorded waves are supposed mutually independent, this second step can be written as

V_R B = \tilde{V}_R = [\tilde{v}_1, ..., \tilde{v}_R] ∈ R^{Nt×R}    (5.7)


with B ∈ R^{R×R} the rotation (unitary) matrix having the property B B^T = B^T B = I. The new estimated waves \tilde{v}_j are now independent at the fourth order.
There are different methods for finding the rotation matrix: joint approximate diagonalization of eigenmatrices (JADE) [12], maximal diagonality (MD) [13], simultaneous third-order tensor diagonalization (STOTD) [14], fast and robust fixed-point algorithms for independent component analysis (FastICA) [15], and so on. To compare some of the cited ICA algorithms, Figure 5.1 shows the relative error (see Equation 5.12) of the estimated signal subspace versus the SNR (see Equation 5.11) for the data set presented in Section 5.3.3. For SNRs greater than 7.5 dB, FastICA using a "tanh" nonlinearity with the parameter equal to 1 in the fixed-point algorithm provides the smallest relative error, but with some erroneous points at various SNRs. Note that the "tanh" nonlinearity is the one that gives the smallest error for this data set, compared with the "pow3", "gauss" (with the parameter equal to 1), or "skew" nonlinearities. The MD and JADE algorithms are approximately equivalent in terms of relative error. For SNRs smaller than 7.5 dB, MD provides the smallest relative error. Consequently, the MD algorithm is employed in the following. Now, considering the SVD decomposition in Equation 5.5 and the ICA step in Equation 5.7, the subspace described by the first R estimated waves can be rewritten as

FIGURE 5.1 Comparison of ICA algorithms (JADE, FastICA–tanh, MD): relative error (%) of the estimated signal subspace versus SNR (dB) of the data set.

\sum_{j=1}^{R} \lambda_j u_j v_j^T = U_R D_R V_R^T = U_R D_R B \tilde{V}_R^T = \sum_{j=1}^{R} b_j \tilde{u}_j \tilde{v}_j^T = \sum_{i=1}^{R} b_i \tilde{u}_i \tilde{v}_i^T    (5.8)

where U_R = [u_1, ..., u_R] is made up of the first R vectors of the matrix U, and D_R = diag(λ_1, ..., λ_R) is the R × R truncated version of D containing the greatest values λ_j. The second equality is obtained using Equation 5.7. For the third equality, the \tilde{u}_j are the new propagation vectors, obtained as the normalized² columns of the matrix U_R D_R B, and the b_j are the "modified singular values", obtained as the ℓ2-norms of the columns of U_R D_R B. The elements b_j are usually not ordered. For this reason, a permutation of the vectors \tilde{u}_j, and correspondingly of the vectors \tilde{v}_j, is performed to order the modified singular values. Denoting this permutation by σ(·) and setting i = σ(j), the last equality of Equation 5.8 is obtained. In this decomposition, which is similar to that given by Equation 5.5, a stronger criterion has been imposed on the new estimated waves \tilde{v}_i, namely fourth-order independence, and, at the same time, the condition of orthogonality of the new propagation vectors \tilde{u}_i has been relaxed.
In practical situations, the value of R becomes a parameter. Usually, it is chosen so that the aligned wave is completely described by the first R estimated waves given by the SVD.

5.3.2.3 Subspace Method Using SVD–ICA

After the ICA and the permutation steps, the signal subspace is given by

\tilde{Y}_{Signal} = \sum_{i=1}^{\tilde{p}} b_i \tilde{u}_i \tilde{v}_i^T    (5.9)

where \tilde{p} is the number of new estimated waves necessary to describe the aligned wave. The noise subspace \tilde{Y}_{Noise} is obtained by subtracting the signal subspace \tilde{Y}_{Signal} from the original data set Y:

\tilde{Y}_{Noise} = Y - \tilde{Y}_{Signal}    (5.10)

~ is chosen by finding an abrupt change of From a practical point of view, the value of p slope in the curve of relative modified singular values. For cases with low SNR, no ‘‘visible’’ change of slope can be found and the value of ~p can be fixed at 1 for a perfect alignment of waves, or at 2 for an imperfect alignment or for dispersive waves. Note here that for very small SNR of the initial data set, (for example, smaller than 6.2 dB for the data set presented in Section 5.3.3, the aligned wave can be described by a less energetic estimated wave than by the first one (related to the highest singular value). For these extreme cases, a search must be done after the ICA and the permutation steps to ~i give the aligned identify the indexes for which the corresponding estimated waves v ~ Signal in Equation 5.9 must be redefined by choosing the wave. So the signal subspace Y index values found in the search. For example, applying the MD algorithm to the data set presented in Section 5.3.3 for which the SNR was modified to 9 dB, the aligned wave is ~3. Note also that using SVD without ICA in the described by the third estimated wave v same conditions, the aligned wave is described by the eighth estimated wave v8.

2. Vectors are normalized by their ℓ2-norm.

5.3.3 Application

An application to a simulated data set is presented in this section to illustrate the behavior of the SVD–ICA subspace method versus the SVD subspace method. An application to a real data set obtained during an acquisition with OBSs can be found in Refs. [1,2]. The preprocessed signals Y recorded on an 8-sensor array (Nx = 8) during Nt = 512 time samples are represented in Figure 5.2c. This synthetic data set was obtained by the addition of an original signal subspace S (Figure 5.2a), made up of a wavefront having infinite celerity (velocity) and consequently associated with the aligned wave s_1(m), and an original noise subspace N (Figure 5.2b), made up of several nonaligned wavefronts. These nonaligned waves are contained in a subspace of dimension 7, smaller than the rank of Y, which equals 8. The SNR of the presented data set is SNR = 3.9 dB. The SNR definition used here is³:

SNR = 20 \log_{10} \frac{\|S\|}{\|N\|}    (5.11)
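The SNR of Equation 5.11 is straightforward to compute; a small helper using NumPy's default (Frobenius) matrix norm:

```python
import numpy as np

def snr_db(S, N):
    """SNR (Equation 5.11): 20 * log10(||S||_F / ||N||_F)."""
    return 20.0 * np.log10(np.linalg.norm(S) / np.linalg.norm(N))
```

For instance, a signal part with ten times the Frobenius norm of the noise part gives 20 dB.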

Normalization to unit variance of each trace for each component was done before applying the described subspace methods. This ensures that even weak picked arrivals are well represented within the input data. After the computation of signal subspaces, a denormalization was applied to find the original signal subspace. Firstly, the SVD subspace method was tested. The subspace method given by Equation 5.6 was employed, keeping only one singular vector (respectively one singular value). This choice was made by finding an abrupt change of slope after the first singular value (Figure 5.6) in the relative singular values for this data set. The obtained signal subspace YSignal and noise subspace YNoise are presented in Figure 5.3a and Figure 5.3b. It is clear

3. \|A\| = \sqrt{\sum_{i=1}^{I} \sum_{j=1}^{J} a_{ij}^2} is the Frobenius norm of the matrix A = \{a_{ij}\} ∈ R^{I×J}.

FIGURE 5.2 Synthetic data set (distance in sensors versus time in samples). (a) Original signal subspace S; (b) original noise subspace N; (c) recorded signals Y = S + N.

FIGURE 5.3 Results obtained using the SVD subspace method. (a) Estimated signal subspace Y_Signal; (b) estimated noise subspace Y_Noise; (c) estimated waves v_j.

from these figures that the classical SVD introduces artifacts in the two estimated subspaces, with respect to the wavefield separation objective. Moreover, the estimated waves v_j shown in Figure 5.3c are an instantaneous linear mixture of the recorded waves.
The signal subspace \tilde{Y}_{Signal} and noise subspace \tilde{Y}_{Noise} obtained using the SVD–ICA subspace method given by Equation 5.9 are presented in Figure 5.4a and Figure 5.4b. The improvement is due to the fact that, using ICA, we have imposed a fourth-order independence condition stronger than the decorrelation used in the classical SVD. With this subspace method we have also relaxed the nonphysically justified orthogonality of the propagation vectors. The dimension R of the rotation matrix B was chosen to be eight because the aligned wave is projected on all eight estimated waves v_j shown in Figure 5.3c. The new estimated waves \tilde{v}_i obtained after the ICA and permutation steps are presented in Figure 5.4c. As we can see, the first one describes the aligned wave "perfectly". As no visible change of slope can be found in the relative modified singular values shown in Figure 5.6, the value of \tilde{p} was fixed at 1, because we are dealing with a perfectly aligned wave.
To compare the results qualitatively, the stack representation is usually employed [5]. Figure 5.5 shows, from left to right, the stacks on the initial data set Y, the original signal subspace S, and the estimated signal subspaces obtained with the SVD and SVD–ICA subspace methods, respectively. As the stack on the estimated signal subspace \tilde{Y}_{Signal} is very close to the stack on the original signal subspace S, we can conclude that the SVD–ICA subspace method enhances the wave separation results.
To compare these methods quantitatively, we use the relative error ε of the estimated signal subspace, defined as

\varepsilon = \frac{\|S - \hat{Y}_{Signal}\|^2}{\|S\|^2}    (5.12)


FIGURE 5.4 Results obtained using the SVD–ICA subspace method. (a) Estimated signal subspace \tilde{Y}_{Signal}; (b) estimated noise subspace \tilde{Y}_{Noise}; (c) estimated waves \tilde{v}_i.

FIGURE 5.5 Stacks. From left to right: initial data set Y, original signal subspace S, SVD, and SVD–ICA estimated subspaces.

FIGURE 5.6 Relative singular values (%) versus index j, for SVD and SVD–ICA.

where ||·|| is the matrix Frobenius norm defined above, S is the original signal subspace, and \hat{Y}_{Signal} represents either the estimated signal subspace Y_Signal obtained using the SVD or the estimated signal subspace \tilde{Y}_{Signal} obtained using SVD–ICA. For the data set presented in Figure 5.2, we obtain ε = 55.7% for the classical SVD and ε = 0.5% for SVD–ICA. The SNR of this data set was then modified by keeping the initial noise subspace constant and adjusting the energy of the initial signal subspace. The relative errors of the estimated signal subspaces versus the SNR are plotted in Figure 5.7. For SNRs greater than 17 dB, the two methods are equivalent. For smaller SNRs, the SVD–ICA subspace method is clearly better than the SVD subspace method: it provides a relative error lower than 1% for SNRs greater than 10 dB. Note that for other data sets, the SVD–ICA performance can be degraded if the independence assumption made for the aligned wave is not fulfilled. However, for a small SNR of the data set, SVD–ICA usually performs better than the SVD. The ICA step leads to fourth-order independence of the estimated waves and relaxes the unjustified orthogonality constraint for the propagation vectors. This step enhances the wave separation results and minimizes the error on the estimated signal subspace, especially when the SNR is low.
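The relative error of Equation 5.12 is equally simple to compute; a small helper returning a percentage, as in the figures quoted in the text:

```python
import numpy as np

def relative_error_pct(S, Y_signal_hat):
    """Relative error epsilon (Equation 5.12), in percent:
    100 * ||S - Y_signal_hat||_F^2 / ||S||_F^2."""
    return 100.0 * np.linalg.norm(S - Y_signal_hat) ** 2 / np.linalg.norm(S) ** 2
```

A perfect estimate gives 0%, and the error grows quadratically with the Frobenius distance to the true signal subspace.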

5.4 Multi-Way Array Data Sets

We now turn to the modeling and processing of data sets having more than two modes or diversities. Such data sets are recorded by arrays of vector-sensors (also called multi-component sensors) collecting, in addition to time and distance information, the polarization information. Note that other acquisition schemes exist that output multi-way (or multi-dimensional, multi-modal) data sets, but they are not considered here.

FIGURE 5.7 Relative error (%) of the estimated signal subspace versus SNR (dB) of the data set, for SVD and SVD–ICA.

5.4.1 Multi-Way Acquisition

In seismic acquisition campaigns, multi-component sensors have been used for more than ten years now. Such sensors allow the recording of the polarization of seismic waves. Thus, arrays of such sensors provide useful information about the nature of the propagated wavefields and allow a more complete description of the underground structures. The polarization information is very useful to differentiate and characterize waves in the recorded signals, but the specific (multi-component) nature of the data sets has to be taken into account in the processing. The use of vector-sensor arrays provides data sets with time, distance, and polarization modes, which are called trimodal or three-mode data sets. Here we propose a multi-way model to represent and process them.

5.4.2 Multi-Way Model

To keep the trimodal (multi-dimensional) structure of data sets originating from vector-sensor arrays in their processing, we propose a multi-way model, which extends the one proposed in Section 5.2.2. A three-mode data set is modeled as a multi-way array of size Nc × Nx × Nt, where Nc is the number of components of each sensor used to record the vibrations of the wavefield in the three directions of 3D space, Nx is the number of sensors of the vector-sensor array, and Nt is the number of time samples. Note that the number of components is defined by the vector-sensor configuration: for the illustration in Section 5.5.3, Nc = 2 because one geophone and one hydrophone were used, while Nc = 3 in Section 5.5.4 because three geophones were used. Supposing that the propagation of waves only introduces delay and attenuation, and using the superposition principle, the signal recorded on the cth component (c = 1, ..., Nc) of the kth sensor (k = 1, ..., Nx), assuming that P waves impinge on the array of vector-sensors, can be written as

x_{ck}(m) = \sum_{i=1}^{P} a_{cki} s_i(m - m_{ki}) + n_{ck}(m)    (5.13)

where a_{cki} represents the attenuation of the ith wave on the cth component of the kth sensor of the array, s_i(m) is the ith wave, m_{ki} is the delay observed at sensor k, and m is the time index. The noise n_{ck}(m) is supposed Gaussian, centered, spatially white, and independent of the waves. As in the matrix processing approach, preprocessing is needed to ensure a low rank of the signal subspace and good results for a subspace-based processing method. Thus, a velocity correction applied on the dominant waveform (compensation of m_{k1}) leads, for the signal recorded on component c of sensor k, to:

y_{ck}(m) = a_{ck1} s_1(m) + \sum_{i=2}^{P} a_{cki} s_i(m - m'_{ki}) + n'_{ck}(m)    (5.14)

where y_{ck}(m) = x_{ck}(m + m_{k1}), m'_{ki} = m_{ki} - m_{k1}, and n'_{ck}(m) = n_{ck}(m + m_{k1}). In the sequel, the wave s_1(m) is considered independent from the other waves and from the noise. The subspace method developed hereafter will aim to isolate and correctly estimate this wave. Thus, three-mode data sets recorded during Nt time samples on vector-sensor arrays made up of Nx sensors, each one having Nc components, can be modeled as multi-way arrays Y ∈ R^{Nc×Nx×Nt}:

Y = \{ y_{ckm} = y_{ck}(m) \mid 1 \le c \le N_c,\ 1 \le k \le N_x,\ 1 \le m \le N_t \}    (5.15)

This multi-way model can be used to extend the subspace separation method to multi-component data sets.
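The three-mode array of Equation 5.15 can be assembled as follows. This is a sketch with hypothetical sizes; only the aligned wave s1 of Equation 5.14 plus noise is simulated, the P − 1 residual waves being omitted for brevity:

```python
import numpy as np

Nc, Nx, Nt = 3, 8, 128                          # components, sensors, time samples
rng = np.random.default_rng(4)

s1 = np.sin(2.0 * np.pi * 5.0 * np.arange(Nt) / Nt)   # aligned wave s1(m)
a = rng.uniform(0.5, 1.5, size=(Nc, Nx))               # attenuations a_ck1

# Y[c, k, m] = a_ck1 * s1(m) + noise, i.e. Equation 5.14 without the residual
# waves; the result is the multi-way array of Equation 5.15.
Y = a[:, :, None] * s1[None, None, :] + 0.05 * rng.normal(size=(Nc, Nx, Nt))
```

Each channel of the array is a scaled copy of the aligned wave buried in weak noise, so every trace correlates strongly with s1.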

5.5 Multi-Way Array Processing

Multi-way data analysis arose first in the field of psychometrics with Tucker [16]. It is still an active field of research and has found applications in many areas such as chemometrics, signal processing, communications, biological data analysis, the food industry, etc. It is an admitted fact that there exists no exact extension of the SVD for multi-way arrays of dimension greater than 2. Instead, there exist mainly two decompositions: PARAFAC [17] and HOSVD [14,16]. The first is also known as CANDECOMP, as it gives a canonical decomposition of a multi-way array, that is, it expresses the multi-way array as a sum of rank-1 arrays. Note that the rank-1 arrays in the PARAFAC decomposition may not be orthogonal, unlike in the matrix case. The second multi-way array decomposition, the HOSVD, gives orthogonal bases in the three modes of the array but is not a canonical decomposition, as it does not express the original array as a sum of rank-1 arrays. In the sequel, we will nevertheless make use of the HOSVD, because its orthogonal bases allow the extension of well-known SVD-based subspace methods to multi-way data sets.

5.5.1 HOSVD

We now introduce the HOSVD, which was formulated and studied in detail in Ref. [14]. We give particular attention to the three-mode case because we will process such data in the sequel, but an extension to the multi-dimensional case exists [14]. One must notice that in the trimodal case the HOSVD is equivalent to the TUCKER3 model [16]; however, the HOSVD has a formulation that is more familiar to the signal processing community, as its expression is given in terms of matrices of singular vectors, just as the SVD in the matrix case.

5.5.1.1 HOSVD Definition

Consider a multi-way array Y ∈ R^{Nc×Nx×Nt}; the HOSVD of Y is given by

Y = C \times_1 V^{(c)} \times_2 V^{(x)} \times_3 V^{(t)}    (5.16)

where C ∈ R^{Nc×Nx×Nt} is called the core array and V^{(i)} = [v^{(i)}_1, ..., v^{(i)}_j, ..., v^{(i)}_{r_i}] ∈ R^{N_i×r_i} are matrices containing the singular vectors v^{(i)}_j ∈ R^{N_i} of Y in the three modes (i = c, x, t). These matrices are orthogonal, V^{(i)} V^{(i)T} = I, just as in the matrix case. A schematic representation of the HOSVD is given in Figure 5.8.
The core array C is the counterpart of the diagonal matrix D in the SVD case in Equation 5.4, except that it is not hyperdiagonal but fulfils the less restrictive property of being all-orthogonal. All-orthogonality is defined as

\langle C_{i=\alpha}, C_{i=\beta} \rangle = 0 \quad \text{for } i = c, x, t \text{ and } \alpha \ne \beta,
\qquad \|C_{i=1}\| \ge \|C_{i=2}\| \ge \cdots \ge \|C_{i=r_i}\| \ge 0, \ \forall i    (5.17)

Signal and Image Processing for Remote Sensing


FIGURE 5.8 HOSVD of a three-mode data set Y.

where $\langle \cdot,\cdot \rangle$ is the classical scalar product between matrices⁴ and $\|\cdot\|$ is the matrix Frobenius norm defined in Section 5.2 (because here we deal with three-mode data, the ``slices'' $C_{i=a}$ define matrices). Thus, $\langle C_{i=a}, C_{i=b} \rangle = 0$ corresponds to orthogonality between slices of the core array. Clearly, the all-orthogonality property consists of orthogonality between two slices (matrices) of the core array cut in the same mode, together with an ordering of the norms of these slices. This second property is the counterpart of the decreasing arrangement of the singular values in the SVD [7], with the special property of being valid here for norms of slices of $C$ and in its three modes. As a consequence, the ``energy'' of the three-mode data set $Y$ is concentrated at the (1,1,1) corner of the core array $C$. The notation $\times_n$ in Equation 5.16 is called the n-mode product, and there are three such products (namely $\times_1$, $\times_2$, and $\times_3$) in the three-mode case. Given a multi-way array $A \in \mathbb{R}^{I_1 \times I_2 \times I_3}$, the three possible n-mode products of $A$ with matrices are:

$$(A \times_1 B)_{j i_2 i_3} = \sum_{i_1} a_{i_1 i_2 i_3}\, b_{j i_1}$$
$$(A \times_2 C)_{i_1 j i_3} = \sum_{i_2} a_{i_1 i_2 i_3}\, c_{j i_2}$$
$$(A \times_3 D)_{i_1 i_2 j} = \sum_{i_3} a_{i_1 i_2 i_3}\, d_{j i_3}$$    (5.18)

where $B \in \mathbb{R}^{J \times I_1}$, $C \in \mathbb{R}^{J \times I_2}$, and $D \in \mathbb{R}^{J \times I_3}$. This is a general notation in (multi-)linear algebra, and even the SVD of a matrix can be expressed with such a product. For example, the SVD given in Equation 5.4 can be rewritten, using n-mode products, as $Y = D \times_1 U \times_2 V$ [10].

5.5.1.2 Computation of the HOSVD

The problem of finding the elements of a three-mode decomposition was originally solved using alternating least squares (ALS) techniques (see Ref. [18] for details). It was only in Ref. [14] that a technique based on SVDs of unfolding matrices was proposed. We briefly present here a way to compute the HOSVD using this approach. From a multi-way array $Y \in \mathbb{R}^{N_c \times N_x \times N_t}$, it is possible to build three unfolding matrices, with respect to the three modes c, x, and t, in the following way:

$$Y \in \mathbb{R}^{N_c \times N_x \times N_t} \;\Rightarrow\; \begin{cases} Y_{(c)} \in \mathbb{R}^{N_c \times N_x N_t} \\ Y_{(x)} \in \mathbb{R}^{N_x \times N_t N_c} \\ Y_{(t)} \in \mathbb{R}^{N_t \times N_c N_x} \end{cases}$$    (5.19)

⁴ $\langle A, B \rangle = \sum_{i=1}^{I} \sum_{j=1}^{J} a_{ij} b_{ij}$ is the scalar product between the matrices $A = \{a_{ij}\} \in \mathbb{R}^{I \times J}$ and $B = \{b_{ij}\} \in \mathbb{R}^{I \times J}$.
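The unfoldings in Equation 5.19 are plain reshape/transpose operations. A small NumPy sketch, assuming one fixed (but arbitrary) ordering of the merged column indices:

```python
import numpy as np

Nc, Nx, Nt = 2, 4, 8
Y = np.random.randn(Nc, Nx, Nt)              # three-mode data set

# Each unfolding puts one mode on the rows and merges the other two
# into the columns (the column ordering is a convention):
Y_c = Y.reshape(Nc, Nx * Nt)                          # Y_(c), (Nc, Nx*Nt)
Y_x = Y.transpose(1, 2, 0).reshape(Nx, Nt * Nc)       # Y_(x), (Nx, Nt*Nc)
Y_t = Y.transpose(2, 0, 1).reshape(Nt, Nc * Nx)       # Y_(t), (Nt, Nc*Nx)

assert Y_c.shape == (Nc, Nx * Nt)
assert Y_x.shape == (Nx, Nt * Nc)
assert Y_t.shape == (Nt, Nc * Nx)
```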

FIGURE 5.9 Schematic representation of the unfolding matrices.

A schematic representation of these unfolding matrices is presented in Figure 5.9. These three unfolding matrices then admit the following SVDs:

$$Y_{(i)}^{T} \;\overset{\text{SVD}}{=}\; U_{(i)} D_{(i)} V_{(i)}^{T}$$    (5.20)

where each matrix $V_{(i)}$ ($i = c, x, t$) in Equation 5.16 is the right matrix given by the SVD of the corresponding transposed unfolding matrix $Y_{(i)}^{T}$. The choice of the transpose was made to keep a homogeneous notation between matrix and multi-way processing. The matrices $V_{(i)}$ define orthonormal bases in the three modes of the vector space $\mathbb{R}^{N_c \times N_x \times N_t}$. The core array is then directly obtained using the formula:

$$C = Y \times_1 V_{(c)}^{T} \times_2 V_{(x)}^{T} \times_3 V_{(t)}^{T}$$    (5.21)

The singular values contained in the three matrices $D_{(i)}$ ($i = c, x, t$) in Equation 5.20 are called three-mode singular values. Thus, the HOSVD of a multi-way array can easily be obtained from the SVDs of its unfolding matrices, which makes this decomposition straightforward to compute with existing algorithms.
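This procedure translates almost line by line into NumPy: unfold along each mode, take matrix SVDs, and contract the data with the transposed singular-vector matrices to obtain the core array. The following is a rough sketch (the helper names and the unfolding convention are our own); it also checks the exact reconstruction of Equation 5.16 and the all-orthogonality of the core slices (Equation 5.17):

```python
import numpy as np

def unfold(Y, mode):
    """Unfolding matrix Y_(mode): that mode on the rows, the others merged."""
    return np.moveaxis(Y, mode, 0).reshape(Y.shape[mode], -1)

def mode_product(Y, M, mode):
    """n-mode product Y x_mode M (Equation 5.18), M acting on the given mode."""
    return np.moveaxis(np.tensordot(M, Y, axes=(1, mode)), 0, mode)

def hosvd(Y):
    # Left singular vectors of Y_(i) = right singular vectors of Y_(i)^T,
    # i.e., the matrices V_(i) of Equation 5.20.
    Vs = [np.linalg.svd(unfold(Y, m), full_matrices=False)[0]
          for m in range(Y.ndim)]
    # Core array (Equation 5.21): C = Y x_1 V_(c)^T x_2 V_(x)^T x_3 V_(t)^T
    C = Y
    for m, V in enumerate(Vs):
        C = mode_product(C, V.T, m)
    return C, Vs

Y = np.random.randn(2, 4, 8)
C, Vs = hosvd(Y)

# Exact reconstruction (Equation 5.16): Y = C x_1 V_(c) x_2 V_(x) x_3 V_(t)
Y_rec = C
for m, V in enumerate(Vs):
    Y_rec = mode_product(Y_rec, V, m)
assert np.allclose(Y, Y_rec)

# All-orthogonality (Equation 5.17): slices of C cut in the same mode are
# mutually orthogonal under <A, B> = sum_ij a_ij * b_ij.
assert abs(np.sum(C[0] * C[1])) < 1e-8
```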

5.5.1.3 The $(r_c, r_x, r_t)$-rank

Given a three-mode data set $Y \in \mathbb{R}^{N_c \times N_x \times N_t}$, one gets three unfolding matrices $Y_{(c)}$, $Y_{(x)}$, and $Y_{(t)}$, with respective ranks $r_c$, $r_x$, and $r_t$. That is, the $r_i$'s are given by the number of nonvanishing singular values contained in the matrices $D_{(i)}$ in Equation 5.20, with $i = c, x, t$. As mentioned before, the HOSVD is not a canonical decomposition and so is not related to the generic rank (the number of rank-1 arrays that yield the original array by linear combination). Nevertheless, the HOSVD provides other information, named in the three-mode case the three-mode rank. The three-mode rank is a triplet of ranks, the $(r_c, r_x, r_t)$-rank, made up of the ranks of the matrices $Y_{(c)}$, $Y_{(x)}$, and $Y_{(t)}$. In the sequel, the three-mode rank will be used to determine subspace dimensions, and so will be the counterpart of the classical rank used in matrix processing techniques.
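In other words, the three-mode rank is just the triplet of ordinary matrix ranks of the three unfoldings. A hedged NumPy sketch on a toy array whose three-mode rank is known by construction:

```python
import numpy as np

def unfold(Y, mode):
    """Unfolding matrix: the chosen mode on the rows, the others merged."""
    return np.moveaxis(Y, mode, 0).reshape(Y.shape[mode], -1)

# Toy data set with a known three-mode rank: two rank-1 terms that share
# the same "polarization" vector u, so the first mode has rank 1.
u = np.array([1.0, 2.0])
A = np.einsum('i,j,k->ijk', u, np.array([1.0, 0.0, 1.0, 0.0]),
              np.arange(8.0))
B = np.einsum('i,j,k->ijk', u, np.array([0.0, 1.0, 0.0, 1.0]),
              np.cos(np.arange(8.0)))
Y = A + B

three_mode_rank = tuple(np.linalg.matrix_rank(unfold(Y, m)) for m in range(3))
assert three_mode_rank == (1, 2, 2)
```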

5.5.1.4 Three-Mode Subspace Method

As in the matrix case, it is possible, using the HOSVD, to define a subspace method that decomposes the original three-mode data set into orthogonal subspaces. Such a technique was first proposed in Ref. [10] and can be stated as follows. Given a three-mode data set $Y \in \mathbb{R}^{N_c \times N_x \times N_t}$, it is possible to decompose it into the sum of a signal and a noise subspace as

$$Y = Y_{\text{Signal}} + Y_{\text{Noise}}$$    (5.22)

with a weaker constraint than the orthogonality of the matrix case, namely mode orthogonality, that is, orthogonality in the three modes [10] (between subspaces defined using the unfolding matrices). Just as in the matrix case, the signal and noise subspaces are formed by different vectors obtained from the decomposition of the data set. In the three-mode case, the signal subspace $Y_{\text{Signal}}$ is built using the first $p_c \le r_c$ singular vectors in the first mode, $p_x \le r_x$ in the second, and $p_t \le r_t$ in the third:

$$Y_{\text{Signal}} = Y \times_1 P_{V_{(c)}^{p_c}} \times_2 P_{V_{(x)}^{p_x}} \times_3 P_{V_{(t)}^{p_t}}$$    (5.23)

with $P_{V_{(i)}^{p_i}}$ the projectors given by

$$P_{V_{(i)}^{p_i}} = V_{(i)}^{p_i} V_{(i)}^{p_i T}$$    (5.24)

where $V_{(i)}^{p_i} = [v_{(i)1}, \ldots, v_{(i)p_i}]$ are the matrices containing the first $p_i$ singular vectors ($i = c, x, t$). Then, after estimation of the signal subspace, the noise subspace is simply obtained by subtraction, that is, $Y_{\text{Noise}} = Y - Y_{\text{Signal}}$. The estimation of the signal subspace consists in finding a triplet of values $p_c$, $p_x$, $p_t$ that allows recovery of the signal part by the $(p_c, p_x, p_t)$-rank truncation of the original data set $Y$. This truncation is obtained by classical matrix truncation of the three SVDs of the unfolding matrices. However, it is important to note that such a truncation is not the best $(p_1, p_2, p_3)$-rank truncation of the data [10]. Nevertheless, the decomposition of the original three-mode data set is possible and leads, under some assumptions, to the separation of the recorded wavefields. From a practical point of view, the values $p_c$, $p_x$, $p_t$ are chosen by finding abrupt changes of slope in the curves of relative three-mode singular values (the three sets of singular values contained in the matrices $D_{(i)}$). For some special cases in which no ``visible'' change of slope can be found, the value of $p_c$ can be fixed at 1 for a linear polarization of the aligned wavefield (denoted by $s_1(m)$), or at 2 for an elliptical polarization [10]. The value of $p_x$ can be fixed at 1, and the value of $p_t$ can be fixed at 1 for a perfect alignment of waves, or at 2 for an imperfect alignment or for dispersive waves. As in the matrix case, the HOSVD-based subspace method decomposes the original space of the data set into orthogonal subspaces, and so, following the ideas developed in Section 5.3.2, it is possible to add an ICA step to relax the orthogonality constraint in the temporal mode.
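The $(p_c, p_x, p_t)$-rank truncation of Equation 5.23 and Equation 5.24 can be sketched as follows: a minimal NumPy illustration on a synthetic rank-(1,1,1) signal plus noise (helper names and the unfolding convention are our own):

```python
import numpy as np

def unfold(Y, mode):
    return np.moveaxis(Y, mode, 0).reshape(Y.shape[mode], -1)

def mode_product(Y, M, mode):
    return np.moveaxis(np.tensordot(M, Y, axes=(1, mode)), 0, mode)

def hosvd_signal_subspace(Y, p):
    """(p_c, p_x, p_t)-rank truncation: project each mode of Y onto its
    leading singular vectors, P_i = V_(i)^{p_i} V_(i)^{p_i T} (Eq. 5.23/5.24)."""
    Y_sig = Y
    for mode, p_i in enumerate(p):
        V = np.linalg.svd(unfold(Y, mode), full_matrices=False)[0][:, :p_i]
        Y_sig = mode_product(Y_sig, V @ V.T, mode)
    return Y_sig

rng = np.random.default_rng(0)
# Toy aligned, linearly polarized wave: a rank-(1,1,1) signal plus weak noise.
S = np.einsum('c,x,t->cxt', rng.standard_normal(2),
              rng.standard_normal(18), rng.standard_normal(64))
Y = S + 0.05 * rng.standard_normal(S.shape)

Y_sig = hosvd_signal_subspace(Y, p=(1, 1, 1))
Y_noise = Y - Y_sig                       # Equation 5.22

# The truncation should bring the estimate closer to the clean signal.
assert np.linalg.norm(Y_sig - S) < np.linalg.norm(Y - S)
```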

5.5.2 HOSVD and Unimodal ICA

To enhance wavefield separation results, we now introduce a unimodal ICA step following the HOSVD-based subspace decomposition.

5.5.2.1 HOSVD and ICA

The SVD of the unfolding matrix $Y_{(t)}$ in Equation 5.20 provides two orthogonal matrices $U_{(t)}$ and $V_{(t)}$ made up of the left and right singular vectors $u_{(t)j}$ and $v_{(t)j}$. As in the SVD case, the $v_{(t)j}$'s are the estimated waves and the $u_{(t)j}$'s are the propagation vectors. Based on the same motivations as in the SVD case, the unjustified orthogonality constraint on the propagation vectors can be relaxed by imposing fourth-order independence on the estimated waves. Assuming the recorded waves are mutually independent, we can write:

$$\tilde{V}_{(t)}^{R} = V_{(t)}^{R} B = [\tilde{v}_{(t)1}, \ldots, \tilde{v}_{(t)R}]$$    (5.25)

with $B \in \mathbb{R}^{R \times R}$ the rotation (unitary) matrix given by one of the algorithms presented in Section 5.3.2, and $V_{(t)}^{R} = [v_{(t)1}, \ldots, v_{(t)R}] \in \mathbb{R}^{N_t \times R}$ made up of the first $R$ vectors of $V_{(t)}$. Here we also suppose that the nonaligned waves in the unfolding matrix $Y_{(t)}$ are contained in a subspace of dimension $R - 1$, smaller than the rank $r_t$ of $Y_{(t)}$. After the ICA step, a new matrix can be computed:

$$\tilde{V}_{(t)} = \left[ \tilde{V}_{(t)}^{R};\, V_{(t)}^{N_t - R} \right]$$    (5.26)

This matrix is made up of the $R$ vectors $\tilde{v}_{(t)j}$ of the matrix $\tilde{V}_{(t)}^{R}$, which are independent at the fourth order, and of the last $N_t - R$ vectors $v_{(t)j}$ of the matrix $V_{(t)}$, which are kept unchanged. The HOSVD–unimodal ICA decomposition is defined as

$$Y = \tilde{C} \times_1 V_{(c)} \times_2 V_{(x)} \times_3 \tilde{V}_{(t)}$$    (5.27)

with

$$\tilde{C} = Y \times_1 V_{(c)}^{T} \times_2 V_{(x)}^{T} \times_3 \tilde{V}_{(t)}^{T}$$    (5.28)

Unimodal ICA means here that ICA is performed on only one mode (the temporal mode). As in the SVD case, a permutation $\sigma(\cdot)$ between the vectors of $V_{(c)}$, $V_{(x)}$, and $\tilde{V}_{(t)}$, respectively, must be performed to order the Frobenius norms of the subarrays (obtained by fixing one index) of the new core array $\tilde{C}$. Hence, we keep the same decomposition structure as in Equation 5.16 and Equation 5.21; the only difference is that the orthogonality has been replaced by a fourth-order independence constraint for the first $R$ estimated waves in the third mode. Note that the $(r_c, r_x, r_t)$-rank of the three-mode data set $Y$ is unchanged.

5.5.2.2 Subspace Method Using HOSVD–Unimodal ICA

On the temporal mode, a new projector can be computed after the ICA and permutation steps:

$$\tilde{P}_{\tilde{V}_{(t)}^{\tilde{p}_t}} = \tilde{V}_{(t)}^{\tilde{p}_t}\, \tilde{V}_{(t)}^{\tilde{p}_t T}$$    (5.29)

where $\tilde{V}_{(t)}^{\tilde{p}_t} = [\tilde{v}_{(t)1}, \ldots, \tilde{v}_{(t)\tilde{p}_t}]$ is the matrix containing the first $\tilde{p}_t$ estimated waves, which are the columns of $\tilde{V}_{(t)}$ defined in Equation 5.26. Note that the two projectors on the first two modes, $P_{V_{(c)}^{p_c}}$ and $P_{V_{(x)}^{p_x}}$, given by Equation 5.24, must be recomputed after the permutation step.
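The chapter obtains the rotation B from one of the ICA algorithms of Section 5.3.2 (e.g., JADE [12] or FastICA [15]). As a stand-in, the sketch below builds a unitary B with a minimal kurtosis-based symmetric fixed-point iteration of FastICA type (our own simplified implementation, not the authors' code) and then forms the matrix of Equation 5.26:

```python
import numpy as np

def kurtosis_rotation(VR, n_iter=200, seed=1):
    """Unitary B making the columns of VR @ B as fourth-order independent
    as possible (symmetric, kurtosis-based FastICA-type fixed point)."""
    Nt, R = VR.shape
    Z = VR * np.sqrt(Nt)          # unit-variance waves: Z.T @ Z / Nt = I
    B = np.linalg.qr(np.random.default_rng(seed).standard_normal((R, R)))[0]
    for _ in range(n_iter):
        B_new = Z.T @ ((Z @ B) ** 3) / Nt - 3.0 * B   # update with g(u) = u^3
        U, _, Wt = np.linalg.svd(B_new)               # symmetric decorrelation
        B = U @ Wt                                    # stays unitary
    return B

# Toy temporal singular vectors: orthonormal columns standing in for V_(t).
rng = np.random.default_rng(0)
Nt, rt, R = 256, 6, 3
V_t = np.linalg.qr(rng.standard_normal((Nt, rt)))[0]

B = kurtosis_rotation(V_t[:, :R])
# Equations 5.25/5.26: rotate the first R waves, keep the remaining ones.
V_t_tilde = np.hstack([V_t[:, :R] @ B, V_t[:, R:]])

assert np.allclose(B @ B.T, np.eye(R), atol=1e-8)
assert np.allclose(V_t_tilde.T @ V_t_tilde, np.eye(rt), atol=1e-8)
```

Because the rotation is unitary and only mixes the first R columns, the new basis keeps orthonormal columns, which is what makes the recomputed projector of Equation 5.29 well defined.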


The signal subspace using the HOSVD–unimodal ICA is thus given as

$$\tilde{Y}_{\text{Signal}} = Y \times_1 P_{V_{(c)}^{p_c}} \times_2 P_{V_{(x)}^{p_x}} \times_3 \tilde{P}_{\tilde{V}_{(t)}^{\tilde{p}_t}}$$    (5.30)

and the noise subspace $\tilde{Y}_{\text{Noise}}$ is obtained by subtracting the signal subspace from the original three-mode data:

$$\tilde{Y}_{\text{Noise}} = Y - \tilde{Y}_{\text{Signal}}$$    (5.31)

In practical situations, as in the SVD–ICA subspace case, the value of $R$ becomes a parameter. It is chosen so that the aligned wave is fully described by the first $R$ estimated waves $v_{(t)j}$ obtained with the HOSVD. The choice of the $p_c$, $p_x$, and $\tilde{p}_t$ values is made by finding abrupt changes of slope in the curves of modified three-mode singular values, obtained after the ICA and permutation steps. Note that in the HOSVD–unimodal ICA subspace method, the rank for the signal subspace in the third mode, $\tilde{p}_t$, is not necessarily equal to the rank $p_t$ obtained using only the HOSVD. For some special cases in which no ``visible'' change of slope can be found, the value of $\tilde{p}_t$ can be fixed at 1 for a perfect alignment of waves, or at 2 for an imperfect alignment or for dispersive waves. As in the HOSVD subspace method, $p_x$ can be fixed at 1, and $p_c$ can be fixed at 1 for a linear polarization of the aligned wavefield, or at 2 for an elliptical polarization. Applications to simulated and real data are presented in the following sections to illustrate the behavior of the HOSVD–unimodal ICA method in comparison with component-wise SVD (SVD applied on each component of the multi-way data separately) and HOSVD subspace methods.

5.5.3 Application to Simulated Data

This simulation represents a multi-way data set $Y \in \mathbb{R}^{2 \times 18 \times 256}$ composed of $N_x = 18$ sensors, each recording two directions ($N_c = 2$) in the 3D space for a duration of $N_t = 256$ time samples. The first component is related to a geophone Z and the second to a hydrophone Hy. The Z component was scaled by 5 to obtain the same amplitude range. This data set, shown in Figure 5.10c and Figure 5.11c, has polarization, distance, and time as modes. It was obtained by adding an original signal subspace $S$, whose two components are shown in Figure 5.10a and Figure 5.11a, respectively, to an original noise subspace $N$ (Figure 5.10b and Figure 5.11b, respectively) obtained from a real geophysical acquisition after subtraction of aligned waves. The original signal subspace is made of several wavefronts having infinite apparent velocity, associated with the aligned wave. The relation between Z and Hy is a linear relation, which is assimilated to the wave polarization (polarization mode) in the sense that it consists of phase and amplitude relations between the two components. Wave amplitudes vary along the sensors, simulating attenuation along the distance mode. The noise is uncorrelated from one sensor to the other (spatially white) and also unpolarized. The SNR of this data set is SNR = 3 dB, where the SNR definition is:

$$\text{SNR} = 20 \log_{10} \frac{\|S\|}{\|N\|}$$    (5.32)

where $\|\cdot\|$ is the multi-way array Frobenius norm⁵ and $S$ and $N$ are the original signal and noise subspaces.

⁵ For any multi-way array $X = \{x_{ijk}\} \in \mathbb{R}^{I \times J \times K}$, its Frobenius norm is $\|X\| = \sqrt{\sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K} x_{ijk}^2}$.
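Equation 5.32 and the multi-way Frobenius norm are direct to evaluate; a small sketch (with made-up stand-in arrays) that rescales a synthetic noise array to a prescribed SNR of 3 dB:

```python
import numpy as np

def fro(X):
    """Multi-way Frobenius norm: square root of the sum of squared entries."""
    return np.sqrt(np.sum(X ** 2))

rng = np.random.default_rng(0)
S = rng.standard_normal((2, 18, 256))        # stand-in signal subspace
N = rng.standard_normal(S.shape)             # stand-in noise subspace
N *= fro(S) / fro(N) / 10 ** (3 / 20)        # rescale noise so SNR = 3 dB

snr_db = 20 * np.log10(fro(S) / fro(N))      # Equation 5.32
assert abs(snr_db - 3.0) < 1e-9
```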

FIGURE 5.10 Simulated data: the Z component. (a) Original signal subspace S; (b) original noise subspace N; (c) initial data set Y.

Our aim is to recover the original signal (Figure 5.10a and Figure 5.11a) from the mixture, which is, in practice, the only data available. Note that normalization to unit variance of each trace for each component was done before applying the described subspace methods. This ensures that even weak peaked arrivals are well represented

FIGURE 5.11 Simulated data: the Hy component. (a) Original signal subspace S; (b) original noise subspace N; (c) initial data set Y.

FIGURE 5.12 Signal subspace using component-wise SVD: (a) Z component; (b) Hy component.

within the input data. After computation of the signal subspaces, a denormalization was applied to recover the original signal subspace. First, the SVD subspace method (described in Section 5.3) was applied separately on the Z and Hy components of the mixture. The signal subspace components obtained by keeping only one singular value for each of the two components are presented in Figure 5.12. This choice was made by finding an abrupt change of slope after the first singular value in the relative singular values shown in Figure 5.13 for each seismic 2D signal. The waveforms are not well recovered with respect to the original signal components (Figure 5.10a and Figure 5.11a). One can also see that the distinction between different wavefronts is not possible. Furthermore, no arrival time estimation is possible using this technique. Low signal level is a strong handicap for a component-wise process. Applied on each component separately, the SVD subspace method does not find the same aligned polarized wave. The results therefore depend on the characteristics of each matrix signal. Using the SVD–ICA subspace method, the estimated aligned waves may be improved, but we can be confronted with the same problem.

FIGURE 5.13 Relative singular values. Top: Z component. Bottom: Hy component.

FIGURE 5.14 Results obtained using the HOSVD subspace method: (a) Z component of $Y_{\text{Signal}}$; (b) Hy component of $Y_{\text{Signal}}$; (c) estimated waves $v_{(t)j}$.

Using the HOSVD subspace method, the components of the estimated signal subspace $Y_{\text{Signal}}$ are presented in Figure 5.14a and Figure 5.14b. In this case, the numbers of singular vectors kept are: one on the polarization mode, one on the distance mode, and one on the time mode, giving a (1,1,1)-rank truncation for the signal subspace. This choice is motivated here by the linear polarization of the aligned wave. For the other two modes, the choice was made by finding an abrupt change of slope after the first singular value (Figure 5.17a). There still remain some ``oscillations'' between the different wavefronts for the two components of the estimated signal subspace, which may induce some detection errors. An ICA step is required in this case to obtain a better signal separation and to cancel parasitic oscillations. In Figure 5.15a and Figure 5.15b, the wavefronts of the estimated signal subspace $\tilde{Y}_{\text{Signal}}$ obtained with the HOSVD–unimodal ICA technique are very close to the original signal components. Here, the ICA method was applied on the first $R = 5$ estimated waves shown in Figure 5.14c. These waves describe the aligned waves of the original signal subspace $S$. After the ICA step, the estimated waves $\tilde{v}_{(t)j}$ are shown in Figure 5.15c. As we can see, the first one, $\tilde{v}_{(t)1}$, describes the aligned wave of the original subspace $S$ more precisely than the first estimated wave $v_{(t)1}$ before the ICA step (Figure 5.14c). The estimation of the signal subspace is more accurate, and the aligned wavelet can be better estimated with our proposed procedure. After the permutation step, the relative singular values on the three modes in the HOSVD–unimodal ICA case are shown in Figure 5.17b. This figure justifies the choice of a (1,1,1)-rank truncation for the signal subspace, due to the linear polarization and the abrupt changes of the slopes for the other two modes. As in the bidimensional case, to compare these methods quantitatively, we use the relative error $\varepsilon$ of the estimated signal subspace, defined as


FIGURE 5.15 Results obtained using the HOSVD–unimodal ICA subspace method: (a) Z component of $\tilde{Y}_{\text{Signal}}$; (b) Hy component of $\tilde{Y}_{\text{Signal}}$; (c) estimated waves $\tilde{v}_{(t)j}$.

$$\varepsilon = \frac{\| S - \hat{Y}_{\text{Signal}} \|^2}{\| S \|^2}$$    (5.33)

where $\|\cdot\|$ is the multi-way array Frobenius norm defined above, $S$ is the original signal subspace, and $\hat{Y}_{\text{Signal}}$ represents the estimated signal subspace obtained with the SVD, HOSVD, or HOSVD–unimodal ICA method. For this data set we obtain $\varepsilon = 21.4\%$ for the component-wise SVD, $\varepsilon = 12.4\%$ for HOSVD, and $\varepsilon = 3.8\%$ for HOSVD–unimodal ICA. We conclude that the ICA step minimizes the error on the estimated signal subspace. To compare the results qualitatively, the stack representation is employed. Figure 5.16 shows, for each component, from left to right, the stacks on the initial data set $Y$, the original signal subspace $S$, and the estimated signal subspaces obtained with the component-wise SVD, HOSVD, and HOSVD–unimodal ICA subspace methods, respectively.
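The relative error of Equation 5.33 is a one-line computation once an estimate is available; a sketch with a hypothetical (made-up) perturbed estimate:

```python
import numpy as np

def relative_error(S, S_hat):
    """Equation 5.33: squared Frobenius error relative to the signal energy."""
    return np.sum((S - S_hat) ** 2) / np.sum(S ** 2)

rng = np.random.default_rng(0)
S = rng.standard_normal((2, 18, 256))            # stand-in signal subspace
S_hat = S + 0.1 * rng.standard_normal(S.shape)   # hypothetical estimate

eps = relative_error(S, S_hat)                   # ~1% for this perturbation
assert 0.0 <= eps < 0.05
```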

FIGURE 5.16 Stacks. From left to right: initial data set Y, original signal subspace S, SVD, HOSVD, and HOSVD–unimodal ICA estimated subspaces, respectively. (a) Z component; (b) Hy component.

FIGURE 5.17 Relative three-mode singular values: (a) using HOSVD; (b) using HOSVD–unimodal ICA.

As the stack on the estimated signal subspace $\tilde{Y}_{\text{Signal}}$ is very close to the stack on the original signal subspace $S$, we conclude that the HOSVD–unimodal ICA subspace method enhances the wave separation results.

5.5.4 Application to Real Data

We now consider a real vertical seismic profile (VSP) geophysical data set. This 3C data set was recorded by $N_x = 50$ sensors with a depth sampling of 10 m, each one made up of $N_c = 3$ geophones recording three directions in the 3D space: X, Y, and Z, respectively. The recording time was 700 msec, corresponding to $N_t = 175$ time samples. The Z component was scaled by 7 to obtain the same amplitude range. After the preprocessing step (velocity correction based on the direct downgoing wave), the obtained data set $Y \in \mathbb{R}^{3 \times 50 \times 175}$ is shown in Figure 5.18. As in the simulation case, normalization and denormalization of each trace for each component were done before and after applying the different subspace methods. From the original data set we constructed three seismic 2D matrix signals representing the three components of the data set $Y$. The SVD subspace method presented in Section 5.3.1 was applied on each matrix signal, keeping only one singular vector (respectively one singular value) for each, due to an abrupt change of slope after the first singular value in the curves of relative singular values. As remarked in the simulation case, the SVD–ICA subspace method may improve the estimated aligned waves, but it will not find the same aligned polarized wave for all seismic matrix signals. For the HOSVD subspace method, the estimated signal subspace $Y_{\text{Signal}}$ can be defined as a (2,1,1)-rank truncation of the data set. This choice is motivated here by the elliptical polarization of the aligned wave. For the other two modes, the choice was made by finding an abrupt change of slope after the first singular value (Figure 5.20a). Using the HOSVD–unimodal ICA, the ICA step was applied here on the first $R = 9$ estimated waves $v_{(t)j}$ shown in Figure 5.19a. As suggested, $R$ becomes a parameter

FIGURE 5.18 Real VSP geophysical data Y: (a) X component; (b) Y component; (c) Z component.

while using real data. However, the estimated waves $\tilde{v}_{(t)j}$ shown in Figure 5.19b are more realistic (shorter wavelet and no side lobes) than those obtained without ICA. Due to the elliptical polarization and the abrupt change of slope after the first singular value for the other two modes (Figure 5.20b), the estimated signal subspace $\tilde{Y}_{\text{Signal}}$ is defined as a (2,1,1)-rank truncation of the data set. This step enhances the wave separation results, implying a minimization of the error on the estimated signal subspace. When we deal with a real data set, only a qualitative comparison is possible. This is allowed by a stack representation. Figure 5.21 shows the stacks for the X, Y, and Z components, respectively, on the initial trimodal data Y and on the estimated signal

FIGURE 5.19 The first 9 estimated waves: (a) before ICA, $v_{(t)j}$; (b) after ICA, $\tilde{v}_{(t)j}$.

FIGURE 5.20 Relative three-mode singular values: (a) using HOSVD; (b) using HOSVD–unimodal ICA.

subspaces given by the component-wise SVD, HOSVD, and HOSVD–unimodal ICA methods, respectively. The results on simulated and real data suggest that the three-dimensional subspace methods are more robust than the component-wise techniques because they exploit the relationship between the components directly in the process. Also, the fourth-order independence constraint on the estimated waves enhances the wave separation results

FIGURE 5.21 Stacks. From left to right: initial data set Y, SVD, HOSVD, and HOSVD–unimodal ICA estimated subspaces, respectively. (a) X component; (b) Y component; (c) Z component.


and minimizes the error on the estimated signal subspace. This emphasizes the potential of the HOSVD subspace method associated with a unimodal ICA step for vector-sensor array signal processing.

5.6 Conclusions

We have presented a subspace processing technique for multi-dimensional seismic data sets based on HOSVD and ICA. It is an extension of well-known subspace separation techniques for 2D (matrix) data sets based on SVD and, more recently, on SVD and ICA. The proposed multi-way technique can be used for the denoising and separation of polarized waves recorded on vector-sensor arrays. A multi-way (three-mode) model of polarized signals recorded on vector-sensor arrays allows us to take the additional polarization information into account in the processing and thus to enhance the separation results. A decomposition of three-mode data sets into all-orthogonal subspaces has been proposed using HOSVD. An extra unimodal ICA step has been introduced to minimize the error on the estimated signal subspace and to improve the separation result. We have also shown on simulated and real data sets that the proposed approach gives better results than the component-wise SVD subspace method. The use of multi-way arrays and associated decompositions for multi-dimensional data set processing is a powerful tool and ensures that the extra dimension is fully taken into account in the process. This approach could be generalized to any multi-dimensional signal modeling and processing task and could take advantage of recent work on tensor and multi-way array decomposition and analysis.

References

1. Vrabie, V.D., Statistiques d'ordre supérieur : applications en géophysique et électrotechnique, Ph.D. thesis, I.N.P. of Grenoble, France, 2003.
2. Vrabie, V.D., Mars, J.I., and Lacoume, J.-L., Modified singular value decomposition by means of independent component analysis, Signal Processing, 84(3), 645, 2004.
3. Sheriff, R.E. and Geldart, L.P., Exploration Seismology, Vols. 1 & 2, Cambridge University Press, 1982.
4. Freire, S. and Ulrych, T., Application of singular value decomposition to vertical seismic profiling, Geophysics, 53, 778–785, 1988.
5. Glangeaud, F., Mari, J.-L., and Coppens, F., Signal Processing for Geologists and Geophysicists, Editions Technip, Paris, 1999.
6. Robinson, E.A. and Treitel, S., Geophysical Signal Processing, Prentice Hall, 1980.
7. Golub, G.H. and Van Loan, C.F., Matrix Computations, Johns Hopkins, 1989.
8. Moonen, M. and De Moor, B., SVD and Signal Processing III: Algorithms, Applications and Architectures, Elsevier, Amsterdam, 1995.
9. Scharf, L.L., Statistical Signal Processing: Detection, Estimation and Time Series Analysis, Electrical and Computer Engineering: Digital Signal Processing Series, Addison Wesley, 1991.
10. Le Bihan, N. and Ginolhac, G., Three-mode data set analysis using higher order subspace method: application to sonar and seismo-acoustic signal processing, Signal Processing, 84(5), 919–942, 2004.
11. Vrabie, V.D., Le Bihan, N., and Mars, J.I., 3D-SVD and partial ICA for 3D array sensors, Proceedings of the 72nd Annual International Meeting of the Society of Exploration Geophysicists, Salt Lake City, 2002, 1065.
12. Cardoso, J.-F. and Souloumiac, A., Blind beamforming for non-Gaussian signals, IEE Proc.-F, 140(6), 362, 1993.
13. Comon, P., Independent component analysis, a new concept?, Signal Processing, 36(3), 287, 1994.
14. De Lathauwer, L., Signal Processing Based on Multilinear Algebra, Ph.D. thesis, Katholieke Universiteit Leuven, 1997.
15. Hyvärinen, A. and Oja, E., Independent component analysis: algorithms and applications, Neural Networks, 13(4–5), 411, 2000.
16. Tucker, L.R., The extension of factor analysis to three-dimensional matrices, in H. Gulliksen and N. Frederiksen (Eds.), Contributions to Mathematical Psychology, Holt, Rinehart and Winston, 109–127, 1964.
17. Carroll, J.D. and Chang, J.J., Analysis of individual differences in multidimensional scaling via an n-way generalization of Eckart–Young decomposition, Psychometrika, 35(3), 283–319, 1970.
18. Kroonenberg, P.M., Three-Mode Principal Component Analysis, DSWO Press, Leiden, 1983.

6 Application of Factor Analysis in Seismic Profiling

Zhenhai Wang and Chi Hau Chen

CONTENTS

6.1 Introduction to Seismic Signal Processing
    6.1.1 Data Acquisition
    6.1.2 Data Processing
        6.1.2.1 Deconvolution
        6.1.2.2 Normal Moveout
        6.1.2.3 Velocity Analysis
        6.1.2.4 NMO Stretching
        6.1.2.5 Stacking
        6.1.2.6 Migration
    6.1.3 Interpretation
6.2 Factor Analysis Framework
    6.2.1 General Model
    6.2.2 Within the Framework
        6.2.2.1 Principal Component Analysis
        6.2.2.2 Independent Component Analysis
        6.2.2.3 Independent Factor Analysis
6.3 FA Application in Seismic Signal Processing
    6.3.1 Marmousi Data Set
    6.3.2 Velocity Analysis, NMO Correction, and Stacking
    6.3.3 The Advantage of Stacking
    6.3.4 Factor Analysis vs. Stacking
    6.3.5 Application of Factor Analysis
        6.3.5.1 Factor Analysis Scheme No. 1
        6.3.5.2 Factor Analysis Scheme No. 2
        6.3.5.3 Factor Analysis Scheme No. 3
        6.3.5.4 Factor Analysis Scheme No. 4
    6.3.6 Factor Analysis vs. PCA and ICA
6.4 Conclusions
References
Appendices
6.A Upper Bound of the Number of Common Factors
6.B Maximum Likelihood Algorithm


Signal and Image Processing for Remote Sensing


6.1 Introduction to Seismic Signal Processing

Fossil fuels, namely coal and petroleum, formed millions of years ago from plants and animals that died and decomposed beneath soil and rock. Because of their availability and low cost, they will remain the most important energy resource for at least another few decades. Ongoing petroleum research continues to focus on the science and technology needed for increased petroleum exploration and production, and the petroleum industry relies heavily on subsurface imaging techniques to locate these hydrocarbons.

6.1.1 Data Acquisition

Many geophysical survey techniques exist, such as multichannel reflection seismic profiling, refraction seismic surveying, gravity surveying, and heat flow measurement. Among them, the reflection seismic profiling method stands out because of its target-oriented capability, generally good imaging results, and computational efficiency. These reflectivity data resolve features such as faults, folds, and lithologic boundaries on the scale of tens of meters, and image them laterally for hundreds of kilometers and to depths of 50 km or more. As a result, seismic reflection profiling has become the principal method by which the petroleum industry explores for hydrocarbon-trapping structures. The seismic reflection method works by processing echoes of seismic waves from boundaries between subsurface layers of different acoustic impedance. Depending on the geometry of the surface observation points and source locations, the survey is called a 2D or a 3D seismic survey. Figure 6.1 shows a typical 2D seismic survey, in which a cable with receivers attached at regular intervals is towed by a boat. The source moves along the predesigned seismic lines and generates seismic waves at regular intervals, such that points in the subsurface are sampled several times by the receivers, producing a series of seismic traces. These seismic traces are saved on magnetic tapes or hard disks in the recording boat for later processing.

FIGURE 6.1 A typical 2D seismic survey.

Application of Factor Analysis in Seismic Profiling

6.1.2 Data Processing

Seismic data processing has been regarded as having a flavor of interpretive character; it is even considered an art [1]. However, there is a well-established sequence for standard seismic data processing. Deconvolution, stacking, and migration are the three principal processes that make up the foundation. In addition, some auxiliary processes can help improve the effectiveness of the principal processes. In the following subsections, we briefly discuss the principal processes and some auxiliary processes.

6.1.2.1 Deconvolution

Deconvolution can improve the temporal resolution of seismic data by compressing the basic seismic wavelet to approximately a spike and suppressing reverberations on the field data [2]. Deconvolution applied before stack is called prestack deconvolution. It is also common practice to apply deconvolution to stacked data, which is named poststack deconvolution.
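As a rough sketch of how spiking deconvolution works (not the processing applied to any particular data set here), the example below designs a Wiener filter that compresses a known synthetic wavelet toward a zero-lag spike by solving the Toeplitz normal equations; the wavelet, filter length, and prewhitening value are all invented for illustration:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

# Synthetic trace: sparse reflectivity convolved with a decaying,
# minimum-phase-like wavelet (all values invented for illustration).
nt = 200
reflectivity = np.zeros(nt)
reflectivity[[30, 80, 140]] = [1.0, -0.7, 0.5]
n = np.arange(20)
wavelet = np.exp(-0.3 * n) * np.sin(0.6 * (n + 1)) / np.sin(0.6)
trace = np.convolve(reflectivity, wavelet)[:nt]

# Wiener spiking filter: solve the Toeplitz normal equations R f = g,
# where R is the wavelet autocorrelation and the desired output is a
# zero-lag spike, so g = [w0, 0, ..., 0].
nf = 40
r = np.correlate(wavelet, wavelet, "full")[len(wavelet) - 1:]
r = np.r_[r, np.zeros(nf - r.size)]
r[0] *= 1.001                         # prewhitening for numerical stability
g = np.r_[wavelet[0], np.zeros(nf - 1)]
f = solve_toeplitz(r, g)
deconvolved = np.convolve(trace, f)[:nt]

# The filter compresses each wavelet toward a spike, so the output should
# resemble the reflectivity much more closely than the raw trace does.
def ncorr(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

raw_fit = ncorr(trace, reflectivity)
dec_fit = ncorr(deconvolved, reflectivity)
```

In a real workflow the wavelet is not known and its autocorrelation is estimated from the data themselves; the algebra of the filter design is unchanged.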

6.1.2.2 Normal Moveout

Consider the simplest case, where the subsurface consists of a single horizontal layer within which the velocity is constant. Here x is the distance (offset) between the source and receiver positions, and v is the velocity of the medium above the reflecting interface. Given the midpoint location M, let t(x) be the traveltime along the raypath from the shot position S to the depth point D and back to the receiver position G, and let t(0) be twice the traveltime along the vertical path MD. Using the Pythagorean theorem, the traveltime as a function of offset is

t^2(x) = t^2(0) + x^2/v^2    (6.1)

Note that the above equation describes a hyperbola in the plane of two-way time vs. offset. A common-midpoint (CMP) gather is the set of traces whose raypaths, associated with each source-receiver pair, reflect from the same subsurface depth point D. The difference between the two-way time at a given offset, t(x), and the two-way zero-offset time, t(0), is called normal moveout (NMO):

Δt_NMO = t(x) − t(0)

From Equation 6.1, we see that the velocity can be computed when the offset x and the two-way times t(x) and t(0) are known. Once the NMO velocity is estimated, the traveltimes can be corrected to remove the influence of offset. Traces in the NMO-corrected gather are then summed to obtain a stack trace at the particular CMP location; the procedure is called stacking. Now consider horizontally stratified layers, with each layer's thickness defined in terms of two-way zero-offset time. Given the number of layers N, the interval velocities are (v1, v2, ..., vN). Considering the raypath from source S to depth point D and back to receiver R, associated with offset x at midpoint location M, Equation 6.1 becomes

t^2(x) = t^2(0) + x^2/v_rms^2    (6.2)

where the relation between the rms velocity and the interval velocities is

v_rms^2 = (1/t(0)) Σ_{i=1}^{N} v_i^2 Δt_i(0)

where Δt_i is the vertical two-way time through the i-th layer and t(0) = Σ_{k=1}^{i} Δt_k.

6.1.2.3 Velocity Analysis

Effective correction for normal moveout depends on the use of accurate velocities. In CMP surveys, the appropriate velocity is derived by computer analysis of the moveout in the CMP gathers. Dynamic corrections are implemented for a range of velocity values, and the corrected traces are stacked. The stacking velocity is defined as the velocity value that produces the maximum amplitude of the reflection event in the stack of traces, which clearly represents the condition of successful removal of NMO. In practice, NMO corrections are computed for narrow time windows down the entire trace, and for a range of velocities, to produce a velocity spectrum. The validity of each velocity value is assessed by calculating a form of multitrace correlation between the corrected traces of the CMP gather. The values are contoured such that contour peaks occur at times corresponding to reflected wavelets and at velocities that produce an optimum stacked wavelet. By picking the locations of the peaks on the velocity spectrum plot, a velocity function defining the increase of velocity with depth for that CMP gather can be derived.
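A velocity spectrum of this kind can be sketched with a simple semblance scan: for each trial velocity, sum the amplitudes along the corresponding hyperbola and measure their coherence. Everything below (grid, offsets, and the semblance normalization) is illustrative:

```python
import numpy as np

def semblance(gather, offsets, t0, vel, dt):
    """Stack coherence at zero-offset time t0 for one trial velocity."""
    t = np.arange(gather.shape[0]) * dt
    tx = np.sqrt(t0**2 + (offsets / vel) ** 2)       # Eq. 6.1 traveltimes
    amps = np.array([np.interp(tx[j], t, gather[:, j])
                     for j in range(offsets.size)])
    n = amps.size
    return float(amps.sum() ** 2 / (n * (amps**2).sum() + 1e-12))

# Synthetic CMP gather: one reflector at t0 = 0.5 s, true velocity 2500 m/s.
dt, ns = 0.004, 301
offsets = np.linspace(0.0, 2350.0, 48)
gather = np.zeros((ns, offsets.size))
for j, x in enumerate(offsets):
    i = int(round(np.sqrt(0.5**2 + (x / 2500.0) ** 2) / dt))
    gather[i, j] = 1.0

# Velocity spectrum at t0 = 0.5 s: scan trial velocities, pick the peak.
trials = np.arange(1500.0, 4001.0, 100.0)
spectrum = np.array([semblance(gather, offsets, 0.5, v, dt) for v in trials])
v_pick = float(trials[spectrum.argmax()])
```

Repeating the scan over a grid of zero-offset times gives the contoured spectrum described above, and the picked peaks form the velocity function.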

6.1.2.4 NMO Stretching

After applying NMO correction, a frequency distortion appears, particularly for shallow events and at large offsets. This is called NMO stretching. The stretching is a frequency distortion in which events are shifted to lower frequencies; it can be quantified as

Δf/f = Δt_NMO/t(0)    (6.3)

where f is the dominant frequency, Δf is the change in frequency, and Δt_NMO is given by Equation 6.2. Because of the waveform distortion at large offsets, stacking the NMO-corrected CMP gather would severely damage the shallow events. Muting the stretched zones in the gather solves this problem, and can be carried out using the quantitative definition of stretching given in Equation 6.3. An alternative method for optimum selection of the mute zone is to progressively stack the data: by following the waveform along a certain event and observing where changes occur, the mute zone is derived. A trade-off exists between the signal-to-noise ratio (SNR) and the mute: when the SNR is high, more can be muted for less stretching; when the SNR is low, a larger amount of stretching is accepted to catch events on the stack.
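Using Equation 6.3, the mute zone can be computed directly from the geometry. The sketch below builds a keep/mute mask for a constant-velocity gather; the 50% stretch threshold, offsets, and names are all invented for illustration:

```python
import numpy as np

def stretch_mute(t0, offsets, vel, max_stretch=0.5):
    """Keep-mask from Eq. 6.3: mute samples whose NMO stretch
    Δf/f = Δt_NMO / t(0) exceeds max_stretch."""
    t0col = t0[:, None]                                 # (n_samples, 1)
    tx = np.sqrt(t0col**2 + (offsets[None, :] / vel[:, None]) ** 2)
    with np.errstate(divide="ignore", invalid="ignore"):
        stretch = (tx - t0col) / t0col
    keep = stretch <= max_stretch
    keep[t0 == 0.0, :] = False                          # t(0) = 0: pure stretch
    return keep

t0 = np.arange(0.0, 3.0, 0.004)                         # zero-offset times (s)
offsets = np.linspace(200.0, 2575.0, 48)                # Marmousi-like offsets
vel = np.full(t0.size, 2000.0)                          # constant 2000 m/s
keep = stretch_mute(t0, offsets, vel)
```

As expected from Equation 6.3, the mask mutes shallow times at large offsets while keeping the deep part of every trace, so the far trace retains fewer live samples than the near trace.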

6.1.2.5 Stacking

Among the three principal processes, CMP stacking is the most robust of all. Utilizing the redundancy in CMP recording, stacking can significantly suppress uncorrelated noise, thereby increasing the SNR. It can also attenuate a large part of the coherent noise in the data, such as guided waves and multiples.

6.1.2.6 Migration

On a seismic section such as that illustrated in Figure 6.2, each reflection event is mapped directly beneath the midpoint. However, the reflection point lies beneath the midpoint only if the reflector is horizontal. With a dip along the survey line, the actual reflection point is displaced in the up-dip direction; with a dip across the survey line, the reflection point is displaced out of the plane of the section. Migration is a process that moves dipping reflectors into their true subsurface positions and collapses diffractions, thereby depicting detailed subsurface features. In this sense, migration can be viewed as a form of spatial deconvolution that increases spatial resolution.

FIGURE 6.2 The NMO geometry of a single horizontal reflector.

6.1.3 Interpretation

The goal of seismic processing and imaging is to extract the reflectivity function of the subsurface from the seismic data. Once the reflectivity is obtained, it is the task of the seismic interpreter to infer the geological significance of a certain reflectivity pattern.

6.2 Factor Analysis Framework

Factor analysis (FA), a branch of multivariate analysis, is concerned with the internal relationships of a set of variates [3]. Widely used in psychology, biology, chemometrics¹ [4], and social science, the latent variable model provides an important tool for the analysis of multivariate data. It offers a conceptual framework within which many disparate methods can be unified, and a base from which new methods can be developed.

6.2.1 General Model

In FA the basic model is

x = As + n    (6.4)

where x = (x1, x2, ..., xp)^T is a vector of observable random variables (the test scores), s = (s1, s2, ..., sr)^T is a vector of r < p unobserved or latent random variables (the common factor scores), A is a (p × r) matrix of fixed coefficients (the factor loadings), and n = (n1, n2, ..., np)^T is a vector of random error terms (the unique factor scores, of order p). The means are usually set to zero for convenience, so that E(x) = E(s) = E(n) = 0. The random error term consists of errors of measurement and the unique individual effects associated with each variable x_j, j = 1, 2, ..., p.

¹ Chemometrics is the use of mathematical and statistical methods for handling, interpreting, and predicting chemical data.

For the present model we assume that A is a matrix of constant parameters and s is a vector of random variables. The following assumptions are usually made for the factor model [5]:

- rank(A) = r < p, and E(x|s) = As.

- E(xx^T) = S, E(ss^T) = V, and

  Ψ = E(nn^T) = diag(σ_1^2, σ_2^2, ..., σ_p^2)    (6.5)

  That is, the errors are assumed to be uncorrelated. The common factors, however, are generally correlated, and V is therefore not necessarily diagonal. For the sake of convenience and computational efficiency, the common factors are usually assumed to be uncorrelated and of unit variance, so that V = I.

- E(sn^T) = 0, so that the errors and common factors are uncorrelated.

From the above assumptions, we have

E(xx^T) = S = E[(As + n)(As + n)^T]
         = E[Ass^T A^T + Asn^T + ns^T A^T + nn^T]
         = A E(ss^T) A^T + A E(sn^T) + E(ns^T) A^T + E(nn^T)
         = AVA^T + E(nn^T)
         = G + Ψ    (6.6)

where G = AVA^T and Ψ = E(nn^T) are the true and error covariance matrices, respectively. In addition, postmultiplying Equation 6.4 by s^T, taking expectations, and using the assumptions above, we have

E(xs^T) = E(Ass^T + ns^T) = A E(ss^T) + E(ns^T) = AV    (6.7)
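Equation 6.6 is easy to verify numerically: simulate the model of Equation 6.4 with V = I and compare the sample covariance of x with AA^T + Ψ. All sizes and values below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
p, r, n = 6, 2, 200_000

A = rng.normal(size=(p, r))                   # factor loadings
psi = np.diag(rng.uniform(0.1, 0.5, size=p))  # diagonal error covariance Ψ
s = rng.normal(size=(r, n))                   # V = E(ss^T) = I
noise = np.sqrt(np.diag(psi))[:, None] * rng.normal(size=(p, n))
x = A @ s + noise                             # Eq. 6.4

S_empirical = x @ x.T / n                     # sample E(xx^T)
S_model = A @ A.T + psi                       # Eq. 6.6 with V = I
max_err = float(np.abs(S_empirical - S_model).max())
```

With a large sample, the elementwise discrepancy between the two matrices shrinks at the usual 1/sqrt(n) rate.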

For the special case of V = I, the covariance between the observed and latent variables simplifies to E(xs^T) = A. A special case arises when x is multivariate Gaussian: the second moments in Equation 6.6 then contain all the information concerning the factor model. The factor model of Equation 6.4 is linear, and given the factors s, the variables x are conditionally independent. Let s ~ N(0, I); the conditional distribution of x is

x | s ~ N(As, Ψ)    (6.8)


or

p(x|s) = (2π)^(−p/2) |Ψ|^(−1/2) exp[ −(1/2) (x − As)^T Ψ^(−1) (x − As) ]    (6.9)

with conditional independence following from the diagonality of Ψ. The common factors s therefore reproduce all covariances (or correlations) between the variables, but account for only a portion of the variance. The marginal distribution for x is found by integrating over the hidden variables s:

p(x) = ∫ p(x|s) p(s) ds = (2π)^(−p/2) |Ψ + AA^T|^(−1/2) exp[ −(1/2) x^T (Ψ + AA^T)^(−1) x ]    (6.10)

The calculation is straightforward because both p(s) and p(x|s) are Gaussian.

6.2.2 Within the Framework

Many methods have been developed for estimating the model parameters in the special case of Equation 6.8. The unweighted least squares (ULS) algorithm [6] is based on minimizing the sum of squared differences between the observed and estimated correlation matrices, not counting the diagonal. The generalized least squares (GLS) algorithm [6] adjusts ULS by weighting the correlations inversely according to their uniqueness. Another method, the maximum likelihood (ML) algorithm [7], uses a linear combination of variables to form factors, where the parameter estimates are those most likely to have produced the observed correlation matrix. More details on the ML algorithm can be found in Appendix 6.B. These methods are all of second order: they find the representation using only the information contained in the covariance matrix of the test scores. In most cases, the mean is also used, in the initial centering. The reason for the popularity of second-order methods is that they are computationally simple, often requiring only classical matrix manipulations. Second-order methods stand in contrast to most higher-order methods, which try to find a meaningful representation using information on the distribution of x that is not contained in the covariance matrix. The distribution of x must then not be assumed to be Gaussian, because all the information about Gaussian variables is contained in the first- and second-order statistics, from which all the higher-order statistics can be generated. For more general families of density functions, however, the representation problem has more degrees of freedom, and much more sophisticated techniques may be constructed for non-Gaussian random variables.

6.2.2.1 Principal Component Analysis

Principal component analysis (PCA) is also known as the Hotelling transform or the Karhunen–Loève transform.
It is widely used in signal processing, statistics, and neural computing to find the most important directions in the data in the mean-square sense. It is the solution of the FA problem with minimum mean-square error and an orthogonal weight matrix. The basic idea of PCA is to find the r < p linearly transformed components that account for the maximum possible amount of variance. During the analysis, the variables in x are transformed linearly and orthogonally into an equal number of uncorrelated new variables in e. The transformation is obtained by finding the latent roots and vectors of either the covariance or the correlation matrix. The latent roots, arranged in descending order of magnitude, are equal to the variances of the corresponding variables in e. Usually the first few components account for a large proportion of the total variance of x and may then be used to reduce the dimensionality of the original data for further analysis; however, all components are needed to reproduce accurately the correlation coefficients within x. Mathematically, the first principal component e1 corresponds to the line on which the projection of the data has the greatest variance:

e_1 = arg max_{||e||=1} Σ_{t=1}^{T} (e^T x(t))^2    (6.11)

The other components are found recursively by first removing the projections onto the previous principal components:

e_k = arg max_{||e||=1} Σ_t [ e^T ( x(t) − Σ_{i=1}^{k−1} e_i e_i^T x(t) ) ]^2    (6.12)
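A minimal numerical illustration of Equations 6.11 and 6.12 on synthetic two-dimensional data (all values invented): the leading eigenvector of the covariance matrix carries almost all of the variance, and the variance of each projection equals its eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(2)

# Correlated 2D data with one dominant direction along (1, 1).
n = 10_000
base = rng.normal(size=n)
x = np.stack([base + 0.1 * rng.normal(size=n),
              base + 0.1 * rng.normal(size=n)])     # shape (p, n), zero mean

cov = x @ x.T / n
eigvals, eigvecs = np.linalg.eigh(cov)              # ascending eigenvalues
order = np.argsort(eigvals)[::-1]                   # sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

e = eigvecs.T @ x                 # principal components (projections)
explained = float(eigvals[0] / eigvals.sum())
```

Here the first component points along (1, 1)/sqrt(2) and explains well over 90% of the total variance, so keeping it alone already gives a good one-dimensional summary of the data.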

In practice, the principal components are found by calculating the eigenvectors of the covariance matrix S of the data, as in Equation 6.6. The eigenvalues are positive, and they correspond to the variances of the projections of the data on the eigenvectors. The basic task in PCA is to reduce the dimension of the data. In fact, it can be proven that the representation given by PCA is an optimal linear dimension-reduction technique in the mean-square sense [8,9]. This kind of reduction in dimension has important benefits [10]. First, the computational complexity of the further processing stages is reduced. Second, noise may be reduced, as the data not contained in the retained components may be mostly noise. Third, projecting into a subspace of low dimension is useful for visualizing the data.

6.2.2.2 Independent Component Analysis

The independent component analysis (ICA) model originates from multi-input and multi-output (MIMO) channel equalization [11]. Its two most important applications are blind source separation (BSS) and feature extraction. The mixing model of ICA is similar to that of FA, but in the basic case there is no noise term. The data are generated from the latent components s through a square mixing matrix A by

x = As    (6.13)

In ICA, all the independent components, with the possible exception of one component, must be non-Gaussian. The number of components is typically the same as the number of observations. A matrix A is sought that makes the components s = A^(−1)x as independent as possible. In practice, independence can be maximized, for example, by maximizing the non-Gaussianity of the components or by minimizing mutual information [12]. ICA can be approached from different starting points. In some extensions, the number of independent components can exceed the dimension of the observations, making the basis overcomplete [12,13]. The noise term can also be taken into the model. ICA can be viewed as a generative model when the one-dimensional distributions of the components are modeled with, for example, mixtures of Gaussians (MoG). The problem with ICA is that it has scaling and permutation ambiguities [12]; that is, the variances and the order of the independent components are indeterminate.
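As a sketch of how such components can be recovered in practice, the following implements a small symmetric FastICA iteration with the cube nonlinearity (one of several standard choices); the mixing matrix and sources below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000

# Two non-Gaussian sources: uniform (sub-Gaussian) and Laplacian
# (super-Gaussian), normalized to zero mean and unit variance.
s = np.stack([rng.uniform(-1, 1, n), rng.laplace(0, 1, n)])
s = (s - s.mean(1, keepdims=True)) / s.std(1, keepdims=True)
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])                      # square mixing matrix
x = A @ s                                       # Eq. 6.13 (noise-free case)

# Whiten x so its covariance becomes the identity.
d, E = np.linalg.eigh(x @ x.T / n)
z = (E / np.sqrt(d)) @ E.T @ x

# Symmetric FastICA with the cube nonlinearity (maximizes non-Gaussianity):
# w <- E[z (w^T z)^3] - 3 w, followed by symmetric decorrelation.
W = rng.normal(size=(2, 2))
for _ in range(200):
    g = (W @ z) ** 3
    W = g @ z.T / n - 3 * W
    u, _, vt = np.linalg.svd(W)
    W = u @ vt                                  # (W W^T)^(-1/2) W
y = W @ z                                       # recovered components

# Up to sign and permutation, each row of y matches one source.
match = np.abs(y @ s.T / n)
```

The scaling and permutation ambiguities mentioned above are visible here: the recovered components match the sources only up to sign and ordering, which is why the comparison uses absolute correlations.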

6.2.2.3 Independent Factor Analysis

Independent factor analysis (IFA) was formulated by Attias [14]. It aims to describe p generally correlated observed variables x in terms of r < p independent latent variables s and an additive noise term n. The proposed algorithm derives from ML estimation, and more specifically from the expectation–maximization (EM) algorithm. The IFA model differs from the classic FA model in the properties of the latent variables involved. The noise variables n are assumed to be normally distributed, but not necessarily uncorrelated. The latent variables s are assumed to be mutually independent but not necessarily normally distributed; their densities are indeed modeled as mixtures of Gaussians. The independence assumption allows the density of each s_i to be modeled separately in the latent space. There are some problems with the EM–MoG algorithm. First, approximating source densities with MoGs is not straightforward, because the number of Gaussians has to be adjusted. Second, EM–MoG is computationally demanding: the complexity of the computation grows exponentially with the number of sources [14]. Given a small number of sources, the EM algorithm is exact and all the required calculations can be done analytically, whereas it becomes intractable as the number of sources in the model increases.

6.3 FA Application in Seismic Signal Processing

6.3.1 Marmousi Data Set

Marmousi is a 2D synthetic data set generated at the Institut Français du Pétrole (IFP). The geometry of this model is based on a profile through the North Quenguela trough in the Cuanza basin [15,16]. The geometry and velocity model were created to produce complex seismic data that require advanced processing techniques to obtain a correct Earth image. Figure 6.3 shows the velocity profile of the Marmousi model. Based on the profile and the geologic history, a geometric model containing 160 layers was created. Velocity and density distributions were defined by introducing realistic horizontal and vertical velocity and density gradients. This resulted in a 2D density–velocity grid with dimensions of 3000 m in depth by 9200 m in offset.

FIGURE 6.3 Marmousi velocity model.


The data were generated by a modeling package that can simulate a seismic line by computing the different shot records successively. The line was "shot" from west to east. The first and last shot points were, respectively, 3000 and 8975 m from the west edge of the model. The distance between shots was 25 m. The initial offset was 200 m and the maximum offset was 2575 m.

6.3.2 Velocity Analysis, NMO Correction, and Stacking

Given the Marmousi data set, after the conventional processing steps described in Section 6.1.2, the results of velocity analysis and normal moveout are shown in Figure 6.4. The left-most plot is a CMP gather; there are 574 CMP gathers in the Marmousi data set, each including 48 traces. On the second plot, a velocity spectrum is generated: the CMP gather is NMO-corrected and stacked using a range of constant velocity values, and the resulting stack traces for each velocity are placed side by side on a plane of velocity vs. two-way zero-offset time. By selecting the peaks on the velocity spectrum, an initial rms velocity function can be defined, shown as a curve on the left of the second plot. The interval velocity can be calculated using the Dix formula [17] and is shown on the right side of the plot. Given the estimated velocity profile, the moveout correction can be carried out, as shown in the third plot. Compared with the first plot, the hyperbolic curves are flattened out after NMO correction. Usually another procedure, called muting, is carried out before stacking because, as can be seen in the middle of the third plot, there are great distortions caused by the approximation. That part is eliminated before stacking all 48 traces together. The fourth plot shows a different way of highlighting the muting procedure; for details, see Ref. [1].

FIGURE 6.4 Velocity analysis and stacking of the Marmousi data set.

After completing the velocity analysis, NMO correction, and stacking for 56 of the CMPs, we get the section of the subsurface image shown on the left of Figure 6.5. There are two reasons why only 56 of the 574 CMPs are stacked: the velocity analysis is too time-consuming on a personal computer, and although 56 CMPs are only about one tenth of the 574, they cover nearly 700 m of the profile, which is enough to compare processing differences. The right plot is the same image as the left one, except that it is shown after automatic amplitude adjustment, which stresses the weak events so that both weak and strong events in the image appear with approximately the same amplitude. The algorithm consists of three simple steps:

1. Compute the Hilbert envelope of a trace.
2. Convolve the envelope with a triangular smoother to produce the smoothed envelope.
3. Divide the trace by the smoothed envelope to produce the amplitude-adjusted trace.

By comparing the two plots, we can see that the weak events at the top and bottom of the image are indeed stressed. In the following sections, we mainly use the automatic amplitude-adjusted image to illustrate results.

FIGURE 6.5 Stacking of 56 CMPs.

It needs to be pointed out that, due to NMO stretching and the lack of data at small offsets after muting, events before 0.2 sec in Figure 6.5 are distorted and do not provide useful information. In the following sections, when we compare results, we mainly consider events after 0.2 sec.
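The three amplitude-adjustment steps above translate directly into code; the sketch below (window length and test signal are invented) uses the Hilbert transform to compute the envelope:

```python
import numpy as np
from scipy.signal import hilbert

def amplitude_adjust(trace, smoother_len=101, eps=1e-8):
    """The three-step automatic amplitude adjustment described above."""
    envelope = np.abs(hilbert(trace))                 # 1. Hilbert envelope
    tri = np.bartlett(smoother_len)
    tri /= tri.sum()
    smooth = np.convolve(envelope, tri, mode="same")  # 2. triangular smoother
    return trace / (smooth + eps)                     # 3. divide by envelope

# A decaying sinusoid: late, weak "events" are boosted so that early and
# late amplitudes become comparable.
t = np.linspace(0.0, 3.0, 751)
trace = np.exp(-1.5 * t) * np.sin(2 * np.pi * 10 * t)
adjusted = amplitude_adjust(trace)

raw_ratio = np.abs(trace[600:700]).max() / np.abs(trace[50:150]).max()
adj_ratio = np.abs(adjusted[600:700]).max() / np.abs(adjusted[50:150]).max()
```

On this example the late events are roughly 30 times weaker than the early ones before adjustment and of comparable amplitude afterwards, which is exactly the effect visible in the right plot of Figure 6.5.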

6.3.3 The Advantage of Stacking

Stacking is based on the assumption that all the traces in a CMP gather correspond to a single depth point. After NMO correction, the zero-offset traces should contain the same signal embedded in different random noises, which are caused by the different raypaths. Adding the traces together in this manner can increase the SNR by summing the signal components while canceling the noise among the traces. To see what stacking can do to improve the subsurface image quality, let us compare the image obtained from a single trace with that obtained from stacking the 48 muted traces. In Figure 6.6, the single-trace result without stacking is shown in the right plot: for every CMP (or CDP) gather, only the trace with the smallest offset is NMO-corrected, and these traces are placed side by side to produce the image. In the stack result in the left plot, 48 NMO-corrected and muted traces are stacked and placed side by side. Clearly, after stacking, the main events at 0.5, 1.0, and 1.5 sec are stressed, and the noise in between is canceled out. Noise at 0.2 sec is effectively removed, and noise caused by multiples from 2.0 to 3.0 sec is significantly reduced. However, due to NMO stretching and muting, there are not enough data to depict events from 0 to 0.25 sec in either plot.
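The noise suppression that stacking provides can be checked on synthetic data: averaging 48 noisy copies of a common signal should raise the SNR by about 10 log10(48) ≈ 17 dB. All values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

# 48 NMO-corrected traces: identical signal plus independent noise
# (the stacking model of Eq. 6.15, x = s + n).
ns, ntr = 1000, 48
t = np.linspace(0.0, 3.0, ns)
signal = np.sin(2 * np.pi * 5 * t)
traces = signal + rng.normal(0.0, 1.0, size=(ntr, ns))
stacked = traces.mean(axis=0)

def snr_db(x):
    """SNR of a trace against the known signal, in decibels."""
    resid = x - signal
    return 10 * np.log10((signal**2).mean() / (resid**2).mean())

gain_db = snr_db(stacked) - snr_db(traces[0])   # expect about 10*log10(48)
```

The measured gain sits close to the theoretical 16.8 dB, confirming the sqrt(N) amplitude advantage of CMP stacking over a single trace.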

6.3.4 Factor Analysis vs. Stacking

Now we suggest an alternative way of obtaining the subsurface image, using FA instead of stacking. As presented in Appendix 6.A, FA can extract one unique common factor from the traces with maximum correlation among them. This fits well with what is expected of zero-offset traces: after NMO correction, they contain the same signal embedded in different random noises.

FIGURE 6.6 Comparison of stacking and single-trace images.

There are two reasons why FA works better than stacking. First, the FA model includes the scaling factor A, as in Equation 6.14, while stacking assumes no scaling, as in Equation 6.15:

Factor analysis: x = As + n    (6.14)

Stacking: x = s + n    (6.15)

When the scaling information is lost, simple summation does not necessarily increase the SNR. For example, if one scaling factor is 1 and another is −1, summation will simply cancel out the signal component completely, leaving only the noise component. Second, FA explicitly uses second-order statistics as the criterion to extract the signal, while stacking does not. Therefore, the SNR improves more with FA than with stacking. To illustrate the idea, observations x(t) are generated using the equation

x(t) = As(t) + n(t) = A cos(2πt) + n(t)

where s(t) is a sinusoidal signal and n(t) consists of 10 independent Gaussian noise terms. The matrix of factor loadings A is also generated randomly. Figure 6.7 shows the results of stacking and FA. The top plot is one of the ten observations x(t), the middle plot is the result of stacking, and the bottom plot is the result of FA using the ML algorithm presented in Appendix 6.B. Comparing the two plots suggests that FA outperforms stacking in improving the SNR.
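The experiment can be reproduced in outline with a compact EM loop for a one-factor model. This is a sketch standing in for the ML algorithm of Appendix 6.B, not the authors' exact implementation, and every value below is invented:

```python
import numpy as np

rng = np.random.default_rng(5)

# Ten observed traces: random loadings times one sinusoid, plus noise.
# Some loadings come out negative, which is exactly where plain stacking
# loses signal.
T = 2000
t = np.linspace(0.0, 4.0, T)
s_true = np.cos(2 * np.pi * t)
A_true = rng.normal(size=(10, 1))
x = A_true * s_true + 0.5 * rng.normal(size=(10, T))

stacked = x.mean(axis=0)                      # stacking ignores the loadings

# One-factor FA fitted by a compact EM loop.
a = rng.normal(size=(10, 1))
psi = np.ones(10)
for _ in range(100):
    beta = a.T @ np.linalg.inv(a @ a.T + np.diag(psi))   # (1, 10)
    Ez = beta @ x                                        # E[s | x]
    Ezz = float(1.0 - beta @ a) * T + float(Ez @ Ez.T)
    a = (x @ Ez.T) / Ezz                                 # update loadings
    psi = np.maximum(((x * x).sum(1) - (a * (x @ Ez.T))[:, 0]) / T, 1e-6)
factor = (a.T @ np.linalg.inv(a @ a.T + np.diag(psi)) @ x)[0]

def abs_corr(u, v):
    u = u - u.mean(); v = v - v.mean()
    return abs(float(u @ v)) / (np.linalg.norm(u) * np.linalg.norm(v))

fa_corr = abs_corr(factor, s_true)
stack_corr = abs_corr(stacked, s_true)
```

Because the posterior factor estimate weights each trace by its estimated loading and noise level, it tracks the underlying sinusoid more faithfully than the plain average does.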

FIGURE 6.7 Comparison of stacking and FA.


FIGURE 6.8 Factor analysis of Scheme no. 1.

6.3.5 Application of Factor Analysis

The simulation result in Section 6.3.4 suggests that FA can be applied to the NMOcorrected seismic data. One problem arises, however, when we inspect the zero-offset traces. They need to be muted because of the NMO stretching, which means almost all the traces will have a segment set to zero (mute zone), as is shown in Figure 6.8. Is it possible to just apply FA to the muted traces? Is it possible to have other schemes that make full use of the information at hand? In the following sections, we try to answer these questions by discussing different schemes to carry out FA. 6.3.5.1 Factor Analysis Scheme No. 1 Let us start with the easiest one. The scheme is illustrated in Figure 6.8. We will set the mute zone to zero and apply FA to a CMP gather using ML algorithm. Extracting one single common factor from the 48 traces, and placing all the resulting factors from 56 CMP gathers side by side, the right plot in Figure 6.9 is obtained. Compared with the result of stacking shown on the left, events from 2.2 to 3.0 sec are more smoothly presented instead of the broken dashlike events after stacking. However, at near offset, from 0 to 0.7 sec, the image is contaminated with some vertical stripes. 6.3.5.2 Factor Analysis Scheme No. 2 In this scheme, the muted segments in each trace are replaced by segments of the nearest neighboring traces as is illustrated by Figure 6.10. Trace no. 44 borrows Segment 1

Application of Factor Analysis in Seismic Profiling

117 FA result of Scheme no. 1

0.5

0.5

1

1 Time (sec)

Time (sec)

Stack

1.5

1.5

2

2

2.5

2.5

3

3000

3200 3400 CDP (m)

3

3000

3200 3400 CDP (m)

FIGURE 6.9 Comparison of stacking and FA result of Scheme no. 1.

Segment 1 Segment 2 Segment 3 0.5

Time (sec)

1

1.5

2

2.5

3 0

5

10

FIGURE 6.10 Factor analysis of Scheme no. 2.

15

20

25 30 Trace number

35

40

45

Signal and Image Processing for Remote Sensing

118

FA result of Scheme no. 2

0.5

0.5

1

1 Time (sec)

Time (sec)

Stack

1.5

1.5

2

2

2.5

2.5

3

3000

3200 3400 CDP (m)

3

3000

3200 3400 CDP (m)

FIGURE 6.11 Comparison of stacking and FA result of Scheme no. 2.

from Trace no. 45 to fill out its muted segment. Trace no. 43 borrows Segments 1 and 2 from Trace no. 44 to fill out its muted segment, and so on. As a result, Segment 1 from Trace no. 45 is copied to all the muted traces, from Trace no. 1 to 44, and Segment 2 from Trace no. 44 is copied to Traces no. 1 to 43. After the mute zone is filled out, FA is carried out to produce the result shown in Figure 6.11. Compared to the stacking result shown on the left, there is no improvement in the obtained image. Actually, the result is worse: some events are blurred. Therefore, Scheme no. 2 is not a good scheme.

6.3.5.3 Factor Analysis Scheme No. 3

In this scheme, instead of copying the neighboring segments into the mute zone, we copy the segments obtained by applying FA to the traces included in the nearest neighboring box. In Figure 6.12, we first apply FA to the traces in Box 1 (Traces no. 45 to 48); Segment 1 is then extracted from the result and copied to Trace no. 44. Segment 2, obtained by applying FA to the traces in Box 2 (Traces no. 44 to 48), is copied to Trace no. 43. When done, the image obtained is shown in Figure 6.13. Compared with Scheme no. 2, the result is better, but compared to stacking there is still some contamination from 0 to 0.7 sec.

6.3.5.4 Factor Analysis Scheme No. 4

In this scheme, as illustrated in Figure 6.14, Segment 1 is extracted by applying FA to all the traces in Box 1 (Traces no. 1 to 48), and Segment 2 is extracted by applying FA to the trace segments in Box 2 (Traces no. 2 to 48). Note that the data are not muted before FA. In this manner, for every segment, all the available data points are fully utilized.
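The per-segment bookkeeping of Schemes no. 3 and no. 4 can be sketched as follows. This is only an illustration: the dominant-singular-vector extraction below is a crude numpy stand-in for the ML single-factor extraction of Appendix 6.B, and the inputs `seg_bounds` and `first_live` are hypothetical descriptors of the segment edges and mute geometry, not quantities defined in the text.

```python
import numpy as np

def common_factor_trace(traces):
    """Crude stand-in for single-common-factor extraction: the dominant
    left singular vector of a (time samples x traces) gather, scaled by
    its singular value. Sign is ambiguous, as with any factor model."""
    U, s, Vt = np.linalg.svd(traces - traces.mean(axis=0), full_matrices=False)
    return U[:, 0] * s[0]

def scheme4_trace(gather, seg_bounds, first_live):
    """Scheme no. 4 bookkeeping: for each time segment, extract the common
    factor only from the traces that are live (unmuted) there, then splice
    the segments into one output trace."""
    out = np.zeros(gather.shape[0])
    for i in range(len(seg_bounds) - 1):
        t0, t1 = seg_bounds[i], seg_bounds[i + 1]
        out[t0:t1] = common_factor_trace(gather[t0:t1, first_live[i]:])
    return out
```

The same splicing loop, with the boxes chosen from neighboring traces instead, gives Scheme no. 3.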

FIGURE 6.12 Factor analysis of Scheme no. 3.

FIGURE 6.13 Comparison of stacking and FA result of Scheme no. 3.

FIGURE 6.14 Factor analysis of Scheme no. 4.

In the generated result, we noticed that events from 0 to 0.16 sec are distorted. Their amplitude is so large that it overshadows the other events. Comparing all the results obtained above, we conclude that both stacking and FA are unable to extract useful information from 0 to 0.16 sec. To better illustrate the FA result, we mute the distorted trace segments; the final result is shown in Figure 6.15. Comparing the results of FA and stacking, we can see that events at around 1 and 1.5 sec are strengthened, and events from 2.2 to 3.0 sec are presented more smoothly instead of as the broken, dash-like events in the stacked result. Overall, the SNR of the image is improved.

6.3.6 Factor Analysis vs. PCA and ICA

The results of PCA and ICA (discussed in Subsections 6.2.2.1 and 6.2.2.2) are placed side by side with the result of FA for comparison in Figure 6.16 and Figure 6.17. As we can see from the plots on the right side of both figures, important events are missing and the subsurface images are distorted. The reason is that the criteria used by PCA and ICA to extract the signals are inappropriate for this particular scenario. In PCA, the traces are transformed linearly and orthogonally into an equal number of new traces that have the property of being uncorrelated, and the first component, which has the maximum variance, is used to produce the image. In ICA, the algorithm tries to extract components that are as independent of each other as possible, and the obtained components suffer from scaling and permutation ambiguities.

FIGURE 6.15 Comparison of stacking and FA result of Scheme no. 4.

FIGURE 6.16 Comparison of FA and PCA results.

FIGURE 6.17 Comparison of FA and ICA results.

6.4 Conclusions

Stacking is one of the three most important and robust processing steps in seismic signal processing. By utilizing the redundancy of the CMP gathers, stacking can effectively remove noise and increase the SNR. In this chapter we proposed replacing stacking with FA to obtain better subsurface images, and applied the FA algorithm to the synthetic Marmousi data set. Comparisons with PCA and ICA show that FA indeed has advantages over these techniques in this scenario. Note that the conventional seismic processing steps adopted here are very basic and for illustrative purposes only. Better results may be obtained in velocity analysis and stacking if careful examination and iterative procedures are incorporated, as is often the case in real situations.

References

1. Ö. Yilmaz, Seismic Data Processing, Society of Exploration Geophysicists, Tulsa, 1987.
2. E.A. Robinson, S. Treitel, R.A. Wiggins, and P.R. Gutowski, Digital Seismic Inverse Methods, International Human Resources Development Corporation, Boston, 1983.
3. D.N. Lawley and A.E. Maxwell, Factor Analysis as a Statistical Method, Butterworths, London, 1963.
4. E.R. Malinowski, Factor Analysis in Chemistry, 3rd ed., John Wiley & Sons, New York, 2002.
5. A.T. Basilevsky, Statistical Factor Analysis and Related Methods: Theory and Applications, 1st ed., Wiley-Interscience, New York, 1994.


6. H. Harman, Modern Factor Analysis, 2nd ed., University of Chicago Press, Chicago, 1967.
7. K.G. Jöreskog, Some contributions to maximum likelihood factor analysis, Psychometrika, 32:443–482, 1967.
8. I.T. Jolliffe, Principal Component Analysis, Springer-Verlag, Heidelberg, 1986.
9. M. Kendall, Multivariate Analysis, Charles Griffin, London, 1975.
10. A. Hyvärinen, Survey on independent component analysis, Neural Computing Surveys, 2:94–128, 1999.
11. P. Comon, Independent component analysis, a new concept? Signal Processing, 36:287–314, 1994.
12. J. Karhunen, A. Hyvärinen, and E. Oja, Independent Component Analysis, John Wiley & Sons, New York, 2001.
13. T. Lee, M. Girolami, M. Lewicki, and T. Sejnowski, Blind source separation of more sources than mixtures using overcomplete representations, IEEE Signal Processing Letters, 6:87–90, 1999.
14. H. Attias, Independent factor analysis, Neural Computation, 11:803–851, 1998.
15. R.J. Versteeg, Sensitivity of prestack depth migration to the velocity model, Geophysics, 58(6):873–882, 1993.
16. R.J. Versteeg, The Marmousi experience: velocity model determination on a synthetic complex data set, The Leading Edge, 13:927–936, 1994.
17. C.H. Dix, Seismic velocities from surface measurements, Geophysics, 20:68–86, 1955.
18. R. Bellman, Introduction to Matrix Analysis, McGraw-Hill, New York, 1960.

APPENDICES

6.A Upper Bound of the Number of Common Factors

Suppose that there is a unique Ψ; the matrix Σ − Ψ must then be of rank r. This is the covariance matrix for x in which each diagonal element represents the part of the variance that is due to the r common factors instead of the total variance of the corresponding variate; it is known as the communality of the variate. When r = 1, A reduces to a column vector of p elements. It is unique, apart from a possible change of sign of all its elements. With 1 < r < p common factors, it is not generally possible to determine A and Ψ uniquely, even in the case of a normal distribution. Although every factor model specified by Equation 6.8 leads to a multivariate normal, the converse is not necessarily true when 1 < r < p. The difficulty is known as the factor identification or factor rotation problem. Let H be any (r × r) orthogonal matrix, so that HH^T = H^T H = I. Then

x = AHH^T s + n = A*s* + n

where A* = AH and s* = H^T s. Thus, s and s* have the same statistical properties, since

E(s*) = H^T E(s)

cov(s*) = H^T cov(s) H = H^T H = I

Assume there exist 1 < r < p common factors such that Σ − Ψ = AVA^T is Grammian, with Ψ diagonal. The covariance matrix Σ has

C(p, 2) + p = (1/2) p(p + 1)

distinct elements, which equals the total number of normal equations to be solved. However, the number of solutions is infinite, as can be seen from the following derivation. Since V is Grammian, its Cholesky decomposition exists; that is, there exists a nonsingular (r × r) matrix U such that V = U^T U and

Σ = AVA^T + Ψ = AU^T UA^T + Ψ = (AU^T)(AU^T)^T + Ψ = A*A*^T + Ψ    (6A.1)


Apparently both the factorization in Equation 6.6 and that in Equation 6A.1 leave the same residual error and therefore must represent equally valid factor solutions. Also, we can substitute A* = AB and V* = B^{-1} V (B^T)^{-1}, which again yields a factor model that is indistinguishable from Equation 6.6. Therefore, no sample estimator can distinguish between such an infinite number of transformations. The coefficients A and A* are thus statistically equivalent and cannot be distinguished from each other or identified uniquely; that is, both the transformed and untransformed coefficients, together with Ψ, generate Σ in exactly the same way and cannot be differentiated by any estimation procedure without the introduction of additional restrictions. To resolve the rotational indeterminacy of the factor model we require restrictions on V, the covariance matrix of the factors. The most straightforward and common restriction is to set V = I. The number m of free parameters implied by the equation

Σ = AA^T + Ψ    (6A.2)

is then equal to the total number pr + p of unknown parameters in A and Ψ, minus the number of zero restrictions placed on the off-diagonal elements of V, which is equal to (1/2)(r² − r) since V is symmetric. We then have

m = (pr + p) − (1/2)(r² − r) = p(r + 1) − (1/2)(r² − r)    (6A.3)

where the columns of A are assumed to be orthogonal. The number of degrees of freedom d is then given by the number of equations implied by Equation 6A.2, that is, the number of distinct elements in Σ, minus the number of free parameters m. We have

d = (1/2) p(p + 1) − [p(r + 1) − (1/2)(r² − r)] = (1/2)[(p − r)² − (p + r)]    (6A.4)

which for a meaningful (i.e., nontrivial) empirical application must be strictly positive. This places an upper bound on the number of common factors r that may be obtained in practice, a number which is generally somewhat smaller than the number of variables p.
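The bound implied by Equation 6A.4 is easy to tabulate. A small sketch (the function names are ours, not from the text):

```python
def fa_degrees_of_freedom(p, r):
    """Equation 6A.4: d = (1/2)[(p - r)^2 - (p + r)]."""
    return ((p - r) ** 2 - (p + r)) / 2

def max_common_factors(p):
    """Largest r with d > 0: the upper bound on the number of common
    factors identifiable from p observed variates."""
    return max(r for r in range(p + 1) if fa_degrees_of_freedom(p, r) > 0)

# For p = 10 variates, d > 0 holds only up to r = 5, well below p,
# illustrating that the bound is "generally somewhat smaller than p"
```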

6.B Maximum Likelihood Algorithm

The maximum likelihood (ML) algorithm presented here was proposed by Jöreskog [7]. The algorithm uses an iterative procedure to compute a linear combination of variables to form factors. Assume that the random vector x has a multivariate normal distribution as defined in Equation 6.9. The elements of A, V, and Ψ are the parameters of the model to be estimated from the data. From a random sample of N observations of x we can find the mean vector and the estimated covariance matrix S, whose elements are the usual estimates of the variances and covariances of the components of x:


m_x = (1/N) Σ_{i=1}^{N} x_i

S = (1/(N − 1)) Σ_{i=1}^{N} (x_i − m_x)(x_i − m_x)^T = (1/(N − 1)) [ Σ_{i=1}^{N} x_i x_i^T − N m_x m_x^T ]    (6B.1)

The distribution of S is the Wishart distribution [3]. The log-likelihood function is given by

log L = −(1/2)(N − 1)[log|Σ| + tr(SΣ^{-1})]

However, it is more convenient to minimize

F(A, V, Ψ) = log|Σ| + tr(SΣ^{-1}) − log|S| − p

instead of maximizing log L [7]. They are equivalent because log L is a constant minus (1/2)(N − 1) times F. The function F is regarded as a function of A and Ψ. Note that if H is any nonsingular (k × k) matrix, then

F(AH^{-1}, HVH^T, Ψ) = F(A, V, Ψ)

which means that the parameters in A and V are not independent of one another, and to make the ML estimates of A and V unique, k² independent restrictions must be imposed on A and V. To find the minimum of F we shall first find the conditional minimum for a given Ψ and then find the overall minimum. The partial derivative of F with respect to A is

∂F/∂A = 2Σ^{-1}(Σ − S)Σ^{-1}A

See details in Ref. [7]. For a given Ψ, the minimizing A is to be found among the solutions of

Σ^{-1}(Σ − S)Σ^{-1}A = 0

Premultiplying by Σ gives

(Σ − S)Σ^{-1}A = 0

Using the following expression for the inverse Σ^{-1} [3],

Σ^{-1} = Ψ^{-1} − Ψ^{-1}A(I + A^T Ψ^{-1}A)^{-1}A^T Ψ^{-1}

the left side may be further simplified [7] so that

(Σ − S)Ψ^{-1}A(I + A^T Ψ^{-1}A)^{-1} = 0    (6B.2)


Postmultiplying by I + A^T Ψ^{-1}A gives

(Σ − S)Ψ^{-1}A = 0    (6B.3)

which after substitution of Σ from Equation 6A.2 and rearrangement of terms gives

SΨ^{-1}A = A(I + A^T Ψ^{-1}A)

Premultiplying by Ψ^{-1/2} finally gives

(Ψ^{-1/2} S Ψ^{-1/2})(Ψ^{-1/2}A) = (Ψ^{-1/2}A)(I + A^T Ψ^{-1}A)    (6B.4)

From Equation 6B.4, we can see that it is convenient to take A^T Ψ^{-1}A to be diagonal, since F is unaffected by postmultiplication of A by an orthogonal matrix and A^T Ψ^{-1}A can be reduced to diagonal form by orthogonal transformations [18]. In this case, Equation 6B.4 has standard eigendecomposition form: the columns of Ψ^{-1/2}A are latent vectors of Ψ^{-1/2} S Ψ^{-1/2}, and the diagonal elements of I + A^T Ψ^{-1}A are the corresponding latent roots. Let e1 ≥ e2 ≥ ... ≥ ep be the latent roots of Ψ^{-1/2} S Ψ^{-1/2} and let ê1, ê2, ..., êk be a set of latent vectors corresponding to the k largest roots. Let Λk be the diagonal matrix with e1, e2, ..., ek as diagonal elements and let Ek be the matrix with ê1, ê2, ..., êk as columns. Then

Ψ^{-1/2} Â = Ek (Λk − I)^{1/2}

Premultiplying by Ψ^{1/2} gives the conditional ML estimate of A as

Â = Ψ^{1/2} Ek (Λk − I)^{1/2}    (6B.5)

Up to now, we have considered the minimization of F with respect to A for a given Ψ. Now let us examine the partial derivative of F with respect to Ψ [3],

∂F/∂Ψ = diag[Σ^{-1}(Σ − S)Σ^{-1}]

Substituting Σ̂^{-1} from Equation 6B.2 and using Equation 6B.3 gives

∂F/∂Ψ = diag[Ψ^{-1}(Σ̂ − S)Ψ^{-1}]

which by Equation 6.6 becomes

∂F/∂Ψ = diag[Ψ^{-1}(ÂÂ^T + Ψ̂ − S)Ψ^{-1}]

Setting this to zero, we get

Ψ̂ = diag(S − ÂÂ^T)    (6B.6)

By iterating Equation 6B.5 and Equation 6B.6, the ML estimation of the FA model of Equation 6.4 can be obtained.
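The alternation of Equation 6B.5 and Equation 6B.6 can be sketched as follows. This is a minimal illustration of the two update equations only; the initialization heuristic and the floor on Ψ (guarding against Heywood cases) are our own safeguards, not part of Jöreskog's full algorithm.

```python
import numpy as np

def ml_factor_analysis(S, k, n_iter=500, tol=1e-10):
    """Alternate Equation 6B.5 (conditional ML loadings from the
    eigendecomposition of Psi^{-1/2} S Psi^{-1/2}) with Equation 6B.6
    (uniqueness update Psi = diag(S - A A^T))."""
    psi = 0.5 * np.diag(S).copy()              # heuristic starting values
    for _ in range(n_iter):
        d = 1.0 / np.sqrt(psi)
        M = d[:, None] * S * d[None, :]        # Psi^{-1/2} S Psi^{-1/2}
        evals, evecs = np.linalg.eigh(M)       # ascending eigenvalues
        e = evals[::-1][:k]                    # k largest latent roots
        E = evecs[:, ::-1][:, :k]
        # Equation 6B.5: A = Psi^{1/2} E_k (Lambda_k - I)^{1/2}
        A = np.sqrt(psi)[:, None] * E * np.sqrt(np.maximum(e - 1.0, 0.0))
        # Equation 6B.6: Psi = diag(S - A A^T), floored to stay positive
        psi_new = np.maximum(np.diag(S) - np.sum(A**2, axis=1), 1e-12)
        if np.max(np.abs(psi_new - psi)) < tol:
            psi = psi_new
            break
        psi = psi_new
    return A, psi
```

When S has exactly the factor structure Σ = AA^T + Ψ, the iteration recovers A (up to sign) and Ψ, since that pair is a fixed point of both updates.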

7 Kalman Filtering for Weak Signal Detection in Remote Sensing

Stacy L. Tantum, Yingyi Tan, and Leslie M. Collins

CONTENTS
7.1 Signal Models ... 131
  7.1.1 Harmonic Signal Model ... 131
  7.1.2 Interference Signal Model ... 131
7.2 Interference Mitigation ... 132
7.3 Postmitigation Signal Models ... 133
  7.3.1 Harmonic Signal Model ... 134
  7.3.2 Interference Signal Model ... 134
7.4 Kalman Filters for Weak Signal Estimation ... 135
  7.4.1 Direct Signal Estimation ... 136
    7.4.1.1 Conventional Kalman Filter ... 136
    7.4.1.2 Kalman Filter with an AR Model for Colored Noise ... 137
    7.4.1.3 Kalman Filter for Colored Noise ... 139
  7.4.2 Indirect Signal Estimation ... 140
7.5 Application to Landmine Detection via Quadrupole Resonance ... 142
  7.5.1 Quadrupole Resonance ... 143
  7.5.2 Radio-Frequency Interference ... 143
  7.5.3 Postmitigation Signals ... 144
    7.5.3.1 Postmitigation Quadrupole Resonance Signal ... 145
    7.5.3.2 Postmitigation Background Noise ... 145
  7.5.4 Kalman Filters for Quadrupole Resonance Detection ... 145
    7.5.4.1 Conventional Kalman Filter ... 145
    7.5.4.2 Kalman Filter with an Autoregressive Model for Colored Noise ... 146
    7.5.4.3 Kalman Filter for Colored Noise ... 146
    7.5.4.4 Indirect Signal Estimation ... 146
7.6 Performance Evaluation ... 147
  7.6.1 Detection Algorithms ... 147
  7.6.2 Synthetic Quadrupole Resonance Data ... 147
  7.6.3 Measured Quadrupole Resonance Data ... 148
7.7 Summary ... 150
References ... 151


Remote sensing often involves probing a region of interest with a transmitted electromagnetic signal, and then analyzing the returned signal to infer characteristics of the investigated region. It is not uncommon for the measured signal to be relatively weak or for ambient noise to interfere with the sensor’s ability to isolate and measure only the desired return signal. Although there are potential hardware solutions to these obstacles, such as increasing the power in the transmit signal to strengthen the return signal, or altering the transmit frequency, or shielding the system to eliminate the interfering ambient noise, these solutions are not always viable. For example, regulatory constraints on the amount of power that may be radiated by the sensor or the trade-off between the transmit power and the battery life for a portable sensor may limit the power in the transmitted signal, and effectively shielding a system in the field from ambient electromagnetic signals is very difficult. Thus, signal processing is often utilized to improve signal detectability in situations such as these where a hardware solution is not sufficient. Adaptive filtering is an approach that is frequently employed to mitigate interference. This approach, however, relies on the ability to measure the interference on auxiliary reference sensors. The signals measured on the reference sensors are utilized to estimate the interference, and then this estimate is subtracted from the signal measured by the primary sensor, which consists of the signal of interest and the interference. When the interference measured by the reference sensors is completely correlated with the interference measured by the primary sensor, the adaptive filtering can completely remove the interference from the primary signal. 
When there are limitations in the ability to measure the interference, that is, the signals from the reference sensors are not completely correlated with the interference measured by the primary sensor, this approach is not completely effective. Since some residual interference remains after the adaptive interference cancellation, signal detection performance is adversely affected. This is particularly true when the signal of interest is weak. Thus, methods to improve signal detection when there is residual interference would be useful. The Kalman filter (KF) is an important development in linear estimation theory. It is the statistically optimal estimator when the noise is Gaussian-distributed. In addition, the Kalman filter is still the optimal linear estimator in the minimum mean square error (MMSE) sense even when the Gaussian assumption is dropped [1]. Here, Kalman filters are applied to improve detection of weak harmonic signals. The emphasis in this chapter is not on developing new Kalman filters but, rather, on applying them in novel ways for improved weak harmonic signal detection. Both direct estimation and indirect estimation of the harmonic signal of interest are considered. Direct estimation is achieved by applying Kalman filters in the conventional manner; the state of the system is equal to the signal to be estimated. Indirect estimation of the harmonic signal of interest is achieved by reversing the usual application of the Kalman filter so the background noise is the system state to be estimated, and the signal of interest is the observation noise in the Kalman filter problem statement. This approach to weak signal estimation is evaluated through application to quadrupole resonance (QR) signal estimation for landmine detection. Mine detection technologies and systems that are in use or have been proposed include electromagnetic induction (EMI) [2], ground penetrating radar (GPR) [3], and QR [4,5]. 
Regardless of the technology utilized, the goal is to achieve a high probability of detection, PD, while maintaining a low probability of false alarm, PFA. This is of particular importance for landmine detection since the nearly perfect PD required to comply with safety requirements often comes at the expense of a high PFA, and the time and cost required to remediate contaminated areas is directly proportional to PFA. In areas such as a former battlefield, the average ratio of real mines to suspect objects can be as low as 1:100, thus the process of clearing the area often proceeds very slowly.


QR technology for explosive detection is of crucial importance in an increasing number of applications. Most explosives, such as RDX, TNT, PETN, etc., contain nitrogen (N). Some of its isotopes, such as 14N, possess electric quadrupole moments. When compounds with such moments are probed with radio-frequency (RF) signals, they emit unique signals defined by the specific nucleus and its chemical environment. The QR frequencies for explosives are quite specific and are not shared by other nitrogenous materials. Since the detection process is specific to the chemistry of the explosive and therefore is less susceptible to the types of false alarms experienced by sensors typically used for landmine detection, such as EMI or GPR sensors, the pure QR of 14N nuclei supports a promising method for detecting explosives in the quantities encountered in landmines. Unfortunately, QR signals are weak, and thus vulnerable to both the thermal noise inherent in the sensor coil and external radio-frequency interference (RFI). The performance of the Kalman filter approach is evaluated on both simulated data and measured field data collected by Quantum Magnetics, Inc. (QM). The results show that the proposed algorithm improves the performance of landmine detection.

7.1 Signal Models

In this chapter, it is assumed that the sensor operates by repeatedly transmitting excitation pulses to investigate the potential target and acquires the sensor response after each pulse. The data acquired after each excitation pulse are termed a segment, and a group of segments constitutes a measurement. In general, for each potential target there are multiple measurements with each measurement containing many segments.

7.1.1 Harmonic Signal Model

The discrete-time harmonic signal of interest, at frequency f0, in a single segment can be represented by

s(n) = A0 cos(2πf0 n + φ0),  n = 0, 1, ..., N − 1    (7.1)

The measured signal may be demodulated at the frequency of the desired harmonic signal, f0, to produce a baseband signal s̃(n),

s̃(n) = A0 e^{jφ0},  n = 0, 1, ..., N − 1    (7.2)

Assuming the frequency of the harmonic signal of interest is precisely known, the signal of interest after demodulation and subsequent low-pass filtering to remove any aliasing introduced by the demodulation is a DC constant.
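As a quick numerical check of this demodulation step (the parameter values below are arbitrary illustrations, not from the text): multiplying s(n) by e^{-j2πf0n} and low-pass filtering leaves a DC constant. Note the bare mixer output carries a factor of 1/2 relative to Equation 7.2, presumably absorbed into the definition of A0.

```python
import numpy as np

# Illustrative values (not from the text): one clean harmonic segment
N, f0, A0, phi0 = 4096, 0.12, 1.0, 0.7
n = np.arange(N)
s = A0 * np.cos(2 * np.pi * f0 * n + phi0)          # Equation 7.1

# Complex demodulation at f0; the segment mean acts as a crude low-pass
# filter that suppresses the image component at 2*f0
baseband = s * np.exp(-1j * 2 * np.pi * f0 * n)
dc = baseband.mean()
# dc is approximately (A0/2) * exp(j*phi0): a DC constant, as stated above
```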

7.1.2 Interference Signal Model

A source of interference for this type of signal detection problem is ambient harmonic signals. For example, sensors operating in the RF band could experience interference due to other transmitters operating in the same band, such as radio stations. Since there may be many sources transmitting harmonic signals simultaneously, the demodulated interference signal measured in each segment may be modeled as the sum of the contributions from M different sources, each operating at its own frequency fm,

Ĩ(n) = Σ_{m=1}^{M} Ãm(n) e^{j(2πfm n + φm)},  n = 0, 1, ..., N − 1    (7.3)

where the tilde denotes a complex value and we assume the frequencies are distinct, meaning fi ≠ fj for i ≠ j. The amplitudes Ãm(n) form a discrete time series. Although the amplitudes are not restricted to constant values in general, they are assumed to remain essentially constant over the short time intervals during which each data segment is collected. For time intervals of this order, it is reasonable to assume Ãm(n) is constant within each data segment but may change from segment to segment. Therefore, the interference signal model may be expressed as

Ĩ(n) = Σ_{m=1}^{M} Ãm e^{j(2πfm n + φm)},  n = 0, ..., N − 1    (7.4)

This model represents all frequencies even though the harmonic signal of interest exists in a very narrow band. In practice, only the frequency corresponding to the harmonic signal of interest needs to be considered.

7.2 Interference Mitigation

Adaptive filtering is a widely applied approach for noise cancellation [6]. The basic approach is illustrated in Figure 7.1. The primary signal consists of both the harmonic signal of interest and the interference. In contrast, the signal measured on each auxiliary antenna or sensor consists of only the interference. Adaptive noise cancellation utilizes the measured reference signals to estimate the noise present in the measured primary signal. The noise estimate is then subtracted from the primary signal to find the signal of interest. Adaptive noise cancellation, such as the least mean square (LMS) algorithm, is well suited for those applications in which one or more reference signals are available [6].

FIGURE 7.1 Interference mitigation based on adaptive noise cancellation.

Kalman Filtering for Weak Signal Detection in Remote Sensing

133

In this application, the adaptive noise cancellation is performed in the frequency domain by applying the normalized LMS algorithm to each frequency component of the frequency-domain representation of the measured primary signal. Denote the primary measured signal at time n by d(n) and the measured reference signal by u(n). The tap input vector may be represented by u(n) = [u(n) u(n − 1) ... u(n − M + 1)]^T and the tap weight vector by ŵ(n); both are of length M. Given these definitions, the filter output at time n, e(n), is

e(n) = d(n) − ŵ^H(n)u(n)    (7.5)

The quantity ŵ^H(n)u(n) represents the interference estimate. The tap weights are updated according to

ŵ(n + 1) = ŵ(n) + μ0 P^{-1}(n) u(n) e*(n)    (7.6)

where the parameter μ0 is an adaptation constant that controls the convergence rate and P(n) is given by

P(n) = βP(n − 1) + (1 − β)|u(n)|²    (7.7)

with 0 < β < 1 [7]. The extension of this approach to utilize multiple reference signals is straightforward.
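The recursion of Equations 7.5 through 7.7 can be sketched as follows. This is a plain time-domain sketch (the chapter applies the same recursion per frequency bin), and the default parameter values are illustrative choices of ours.

```python
import numpy as np

def nlms_cancel(d, u, M=8, mu0=0.1, beta=0.9, eps=1e-8):
    """Adaptive noise cancellation via normalized LMS (Equations 7.5-7.7).
    d: primary signal (signal of interest plus interference);
    u: reference signal (interference only).
    Returns e(n), the interference-mitigated signal estimate."""
    w = np.zeros(M, dtype=complex)              # tap weights w_hat(n)
    P = eps                                     # smoothed reference power
    e = np.zeros(len(d), dtype=complex)
    for n in range(len(d)):
        un = np.zeros(M, dtype=complex)         # tap input vector u(n)
        k = min(M, n + 1)
        un[:k] = u[n::-1][:k]                   # [u(n), u(n-1), ...]
        e[n] = d[n] - np.conj(w) @ un           # Eq. 7.5: e = d - w^H u
        P = beta * P + (1 - beta) * abs(u[n]) ** 2          # Eq. 7.7
        w = w + mu0 * (1.0 / P) * un * np.conj(e[n])        # Eq. 7.6
    return e
```

When the reference is fully correlated with the interference in d, the residual e(n) converges toward the signal of interest alone, matching the perfect-cancellation case described in Section 7.3.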

7.3 Postmitigation Signal Models

Under perfect circumstances the interference present in the primary signal is completely correlated with the reference signals, and all interference can be removed by the adaptive noise cancellation, leaving only the Gaussian noise associated with the sensor system. Since the interference often travels over multiple paths and the sensing systems are not perfect, however, the adaptive interference mitigation rarely removes all the interference, so some residual interference remains after the adaptive noise cancellation. In addition, the adaptive interference cancellation process alters the characteristics of the signal of interest. The real-valued observed data prior to interference mitigation may be represented by

x(n) = s(n) + I(n) + w(n),  n = 0, 1, ..., N − 1    (7.8)

where s(n) is the signal of interest, I(n) is the interference, and w(n) is Gaussian noise associated with the sensor system. The baseband signal after demodulation becomes

x̃(n) = s̃(n) + Ĩ(n) + w̃(n)    (7.9)

where Ĩ(n) is the interference, which is reduced but not completely eliminated by adaptive filtering. The signal remaining after interference mitigation is

ỹ(n) = s̃(n) + ṽ(n),  n = 0, 1, ..., N − 1    (7.10)


where s̃(n) is now the altered signal of interest and ṽ(n) is the background noise remaining after interference mitigation, both of which are described in the following sections.

7.3.1 Harmonic Signal Model

Due to the nonlinear phase effects of the adaptive filter, the demodulated harmonic signal of interest is altered by the interference mitigation; it is no longer guaranteed to be a DC constant. The postmitigation harmonic signal is modeled with a first-order Gaussian–Markov model,

s̃(n + 1) = s̃(n) + ε̃(n)    (7.11)

where ε̃(n) is zero-mean white Gaussian noise with variance σε̃².

7.3.2 Interference Signal Model

Although interference mitigation can remove most of the interference, some residual interference remains in the postmitigation signal. The background noise remaining after interference mitigation, ṽ(n), consists of the residual interference and the remaining sensor noise, which is altered by the mitigation. Either an autoregressive (AR) model or an autoregressive moving average (ARMA) model is appropriate for representing the sharp spectral peaks, valleys, and roll-offs in the power spectrum of ṽ(n). An AR model is a causal, linear, time-invariant discrete-time system with a transfer function containing only poles, whereas an ARMA model is a causal, linear, time-invariant discrete-time system with a transfer function containing both zeros and poles. The AR model has a computational advantage over the ARMA model in the coefficient computation: the AR coefficients are obtained by solving a system of linear equations known as the Yule–Walker equations, whereas the ARMA coefficient computation is significantly more complicated because it requires solving systems of nonlinear equations. Estimating and identifying an AR model for real-valued time series is well understood [8]. However, for complex-valued AR models, few theoretical and practical identification and estimation methods can be found in the literature. The most common approach is to adapt methods originally developed for real-valued data to complex-valued data. This strategy works well only when the complex-valued process is the output of a linear system driven by white circular noise, whose real and imaginary parts are uncorrelated and white [9]. Unfortunately, circularity is not guaranteed for most complex-valued processes in practical situations. Analysis of the postmitigation background noise for the specific QR signal detection problem considered here showed that it is not a pure circular complex process.
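The Yule–Walker route mentioned above amounts to a single linear solve once the sample autocorrelation is in hand; a sketch for complex-valued data follows (the function name is ours, and Burg's method, which the chapter prefers, is more involved):

```python
import numpy as np

def yule_walker_complex(v, P):
    """Fit a complex AR(P) model by solving the Yule-Walker equations
    R a = r, the linear system referred to in the text."""
    N = len(v)
    # Biased sample autocorrelation r(k) = E[v(n+k) v*(n)]
    r = np.array([np.vdot(v[:N - k], v[k:]) / N for k in range(P + 1)])
    # Hermitian Toeplitz system matrix with entries r(i - j)
    R = np.array([[r[i - j] if i >= j else np.conj(r[j - i])
                   for j in range(P)] for i in range(P)])
    return np.linalg.solve(R, r[1:])
```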
However, since the cross-correlation between the real and imaginary parts is small compared to the autocorrelation, we assume that the noise is a circular complex process and can be modeled as a $P$-th order complex AR process. Thus,

$$\tilde{v}(n) = \sum_{p=1}^{P} \tilde{a}_p \tilde{v}(n-p) + \tilde{\varepsilon}(n) \qquad (7.12)$$

where the driving noise $\tilde{\varepsilon}(n)$ is white and complex-valued. Conventional modeling methods extended from real-valued time series are used for estimating the complex AR parameters. The Burg algorithm, which estimates the AR parameters by determining reflection coefficients that minimize the sum of the forward and backward residuals, is the preferred estimator among the various estimators for AR parameters [10]. Furthermore, the Burg algorithm has been reformulated so that it can be used to analyze data containing several separate segments [11]. The usual way of combining the information across segments is to take the average of the models estimated from the individual segments [12], such as the average AR parameters (AVA) and the average reflection coefficients (AVK). We found that for this application, the model coefficients are stable enough to be averaged. Although averaging reduces the estimate variance, the estimate still contains a bias that is proportional to $1/N$, where $N$ is the number of observations [13]. Another extension of the Burg algorithm to segments, the segment Burg algorithm (SBurg), proposed in [14], estimates the reflection coefficients by minimizing the sum of the forward and backward residuals of all the segments taken together; that is, a single model is fit to all the segments simultaneously.

An important aspect of AR modeling is choosing the appropriate order $P$. Although there are several criteria for determining the order of a real AR model, no formal criterion exists for a complex AR model. We adopt the Akaike information criterion (AIC) [15], a common choice for real AR models. The AIC is defined as

$$\mathrm{AIC}(p) = \ln\big(\hat{\sigma}_{\varepsilon}^2(p)\big) + \frac{2p}{T}, \qquad p = 1, 2, 3, \ldots \qquad (7.13)$$

where $\hat{\sigma}_{\varepsilon}^2(p)$ is the prediction error power at order $p$ and $T$ is the total number of samples. The prediction error power for a given order is simply the variance of the difference between the true signal and the estimate of the signal using a model of that order. For signal measurements consisting of multiple segments, i.e., $S$ segments and $N$ samples per segment, $T = NS$. The model order for which the AIC has the smallest value is chosen. The AIC method tends to overestimate the order [16] and is good for short data records.
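To make the procedure concrete, the following sketch implements a single-segment complex Burg recursion and AIC order selection (Equation 7.13). The variable names and the AR(2) test coefficients are illustrative, not taken from the chapter:

```python
import numpy as np

def arburg(x, order):
    """Burg estimate of a complex AR model x(n) = sum_p a[p-1] x(n-p) + eps(n).

    Returns the coefficients in the chapter's sign convention and the
    driving-noise power estimate.
    """
    x = np.asarray(x, dtype=complex)
    a = np.array([1.0 + 0j])              # prediction polynomial, a[0] = 1
    e = np.mean(np.abs(x) ** 2)           # zeroth-order error power
    f, b = x.copy(), x.copy()             # forward/backward residuals
    for _ in range(order):
        fp, bp = f[1:], b[:-1]
        # Reflection coefficient minimizing forward + backward residual power.
        k = -2.0 * np.vdot(bp, fp) / (np.vdot(fp, fp).real + np.vdot(bp, bp).real)
        a = np.append(a, 0) + k * np.conj(np.append(a, 0)[::-1])
        e *= 1.0 - abs(k) ** 2
        f, b = fp + k * bp, bp + np.conj(k) * fp
    return -a[1:], e                      # flip sign to match Equation 7.12

def aic_order(x, max_order):
    """Order minimizing AIC(p) = ln(e_p) + 2p/T (Equation 7.13)."""
    T = len(x)
    aics = [np.log(arburg(x, p)[1]) + 2.0 * p / T for p in range(1, max_order + 1)]
    return int(np.argmin(aics)) + 1

# Illustrative complex AR(2) process (coefficients are not from the chapter).
rng = np.random.default_rng(1)
a_true = np.array([0.8 + 0.3j, -0.5 + 0.1j])
N = 20_000
eps = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
v = np.zeros(N, dtype=complex)
for m in range(2, N):
    v[m] = a_true[0] * v[m - 1] + a_true[1] * v[m - 2] + eps[m]

a_hat, e_hat = arburg(v, 2)
print(np.round(a_hat, 2))    # close to a_true
print(aic_order(v, 8))
```

The multi-segment AVA and SBurg variants described above would average coefficients over segments or pool residuals across segments, respectively, on top of this per-segment recursion.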

7.4 Kalman Filters for Weak Signal Estimation

Kalman filters are appropriate for discrete-time, linear, dynamic systems whose output can be characterized by the system state. To establish notation and terminology, this section provides a brief overview consisting of the problem statement and the recursive solution for each of the Kalman filter variants applied to weak harmonic signal detection. Two approaches to estimating a weak signal in nonstationary noise, both based on Kalman filtering, are proposed [17]. The first approach directly estimates the signal of interest; here the Kalman filter is used in the traditional manner, in that the signal of interest is the state to be estimated. The second approach indirectly estimates the signal of interest; it uses the Kalman filter in an unconventional way, in that the noise is the state to be estimated, and the noise estimate is then subtracted from the measured signal to obtain an estimate of the signal of interest. The problem statements and recursive Kalman filter solutions for each of the Kalman filters considered are provided along with their application to weak signal estimation. These approaches have an advantage over the more intuitive approach of simply subtracting a measurement of the noise recorded in the field: because the measured background noise is nonstationary, and the residual background noise after interference mitigation is also nonstationary, the noise must be measured and subtracted in real time. This is exactly what the adaptive frequency-domain interference mitigation attempts to do, and the interference mitigation does significantly improve harmonic signal detection. There are, however, limits to the accuracy with which the reference background noise can be measured, and these limits make the interference mitigation insufficient for noise reduction. Thus, further processing, such as the Kalman filtering proposed here, is necessary to improve detection performance.

7.4.1 Direct Signal Estimation

The first approach assumes the demodulated harmonic signal is the system state to be estimated. The conventional Kalman filter was designed for white noise; since the measurement noise in this application is colored, it is necessary to investigate modified Kalman filters. Two modifications are considered: utilizing an autoregressive (AR) model for the colored noise, and designing a Kalman filter for colored noise that can be modeled as a Markov process. In applying each of the following Kalman filter variants to directly estimate the demodulated harmonic signal, the system is assumed to be described by

state equation: $\tilde{s}_k = \tilde{s}_{k-1} + \tilde{\varepsilon}_k$ \qquad (7.14)

observation equation: $\tilde{y}_k = \tilde{s}_k + \tilde{v}_k$ \qquad (7.15)

where $\tilde{s}_k$ is the postmitigation harmonic signal, modeled using a first-order Gaussian–Markov model, and $\tilde{v}_k$ is the postmitigation background noise.

7.4.1.1 Conventional Kalman Filter

The system model for the conventional Kalman filter is described by two equations: a state equation and an observation equation. The state equation relates the current system state to the previous system state, while the observation equation relates the observed data to the current system state. Thus, the system model is

state equation: $\mathbf{x}_k = \mathbf{F}_k \mathbf{x}_{k-1} + \mathbf{G}_k \mathbf{w}_k$ \qquad (7.16)

observation equation: $\mathbf{z}_k = \mathbf{H}_k^H \mathbf{x}_k + \mathbf{u}_k$ \qquad (7.17)

where the $M$-dimensional vector $\mathbf{x}_k$ represents the state of the system at time $k$, the $M \times M$ matrix $\mathbf{F}_k$ is the known state transition matrix relating the states of the system at times $k$ and $k-1$, and the $N$-dimensional vector $\mathbf{z}_k$ represents the measured data at time $k$. The $M \times 1$ vector $\mathbf{w}_k$ represents process noise, and the $N \times 1$ vector $\mathbf{u}_k$ is measurement noise. The $M \times M$ diagonal coefficient matrix $\mathbf{G}_k$ modifies the variances of the process noise. Both $\mathbf{u}_k$ and $\mathbf{w}_k$ are independent, zero-mean, white noise processes with $E\{\mathbf{u}_k \mathbf{u}_k^H\} = \mathbf{R}_k$ and $E\{\mathbf{w}_k \mathbf{w}_k^H\} = \mathbf{Q}_k$. The initial system state $\mathbf{x}_0$ is a random vector, with mean $\bar{\mathbf{x}}_0$ and covariance $\mathbf{\Sigma}_0$, independent of $\mathbf{u}_k$ and $\mathbf{w}_k$. The Kalman filter determines the estimates of the system state, $\hat{\mathbf{x}}_{k|k-1} = E\{\mathbf{x}_k \mid \mathbf{z}_{k-1}\}$ and $\hat{\mathbf{x}}_{k|k} = E\{\mathbf{x}_k \mid \mathbf{z}_k\}$, and the associated error covariance matrices $\mathbf{\Sigma}_{k|k-1}$ and $\mathbf{\Sigma}_{k|k}$. The recursive solution proceeds in two steps. The first step predicts the current state and the error covariance matrix using the previous data:

$$\hat{\mathbf{x}}_{k|k-1} = \mathbf{F}_{k-1} \hat{\mathbf{x}}_{k-1|k-1} \qquad (7.18)$$

$$\mathbf{\Sigma}_{k|k-1} = \mathbf{F}_{k-1} \mathbf{\Sigma}_{k-1|k-1} \mathbf{F}_{k-1}^H + \mathbf{G}_{k-1} \mathbf{Q}_{k-1} \mathbf{G}_{k-1}^H \qquad (7.19)$$

The second step first determines the Kalman gain $\mathbf{K}_k$ and then updates the state and the error covariance prediction using the current data:

$$\mathbf{K}_k = \mathbf{\Sigma}_{k|k-1} \mathbf{H}_k \big(\mathbf{H}_k^H \mathbf{\Sigma}_{k|k-1} \mathbf{H}_k + \mathbf{R}_k\big)^{-1} \qquad (7.20)$$

$$\hat{\mathbf{x}}_{k|k} = \hat{\mathbf{x}}_{k|k-1} + \mathbf{K}_k \big(\mathbf{z}_k - \mathbf{H}_k^H \hat{\mathbf{x}}_{k|k-1}\big) \qquad (7.21)$$

$$\mathbf{\Sigma}_{k|k} = \big(\mathbf{I} - \mathbf{K}_k \mathbf{H}_k^H\big) \mathbf{\Sigma}_{k|k-1} \qquad (7.22)$$

The initial conditions on the state estimate and error covariance matrix are $\hat{\mathbf{x}}_{1|0} = \bar{\mathbf{x}}_0$ and $\mathbf{\Sigma}_{1|0} = \mathbf{\Sigma}_0$. In general, the system parameters $\mathbf{F}$, $\mathbf{G}$, and $\mathbf{H}$ are time-varying, which is denoted in the preceding discussion by the subscript $k$. In the discussions of the Kalman filters implemented for harmonic signal estimation, the system is assumed to be time-invariant. Thus, the subscript $k$ is omitted and the system model becomes

state equation: $\mathbf{x}_k = \mathbf{F} \mathbf{x}_{k-1} + \mathbf{G} \mathbf{w}_k$ \qquad (7.23)

observation equation: $\mathbf{z}_k = \mathbf{H}^H \mathbf{x}_k + \mathbf{u}_k$ \qquad (7.24)
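Equation 7.18 through Equation 7.22 translate directly into code. The sketch below assumes complex-valued, time-invariant $\mathbf{F}$, $\mathbf{G}$, and $\mathbf{H}$, and exercises the filter on the scalar random-walk signal model of Equation 7.14 and Equation 7.15; all numerical values are illustrative:

```python
import numpy as np

def kalman_step(x_hat, Sigma, z, F, G, H, Q, R):
    """One predict/update cycle of Equations 7.18 through 7.22 (z = H^H x + u)."""
    # Prediction (7.18)-(7.19).
    x_pred = F @ x_hat
    S_pred = F @ Sigma @ F.conj().T + G @ Q @ G.conj().T
    # Gain (7.20).
    Hh = H.conj().T
    K = S_pred @ H @ np.linalg.inv(Hh @ S_pred @ H + R)
    # Update (7.21)-(7.22).
    x_new = x_pred + K @ (z - Hh @ x_pred)
    S_new = (np.eye(len(x_hat)) - K @ Hh) @ S_pred
    return x_new, S_new

# Scalar demo: slowly drifting "DC" signal in white noise (F = G = H = [1]).
rng = np.random.default_rng(2)
F = G = H = np.eye(1, dtype=complex)
Q, R = np.array([[0.02]]), np.array([[1.0]])
steps = 500
drive = 0.1 * (rng.standard_normal(steps) + 1j * rng.standard_normal(steps))
s = 5.0 + np.cumsum(drive)
z = s + (rng.standard_normal(steps) + 1j * rng.standard_normal(steps)) / np.sqrt(2)

x_hat, Sigma = np.array([z[0]]), 10.0 * np.eye(1, dtype=complex)
est = []
for zk in z:
    x_hat, Sigma = kalman_step(x_hat, Sigma, np.array([zk]), F, G, H, Q, R)
    est.append(x_hat[0])
est = np.array(est)
```

Once the gain settles, the filtered error power falls well below the raw measurement-noise power, which is the behavior exploited throughout this section.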

7.4.1.2 Kalman Filter with an AR Model for Colored Noise

A Kalman filter for colored measurement noise, based on the fact that colored noise can often be simulated with sufficient accuracy by a linear dynamic system driven by white noise, is proposed in [18]. In this approach, the colored noise vector is included in an augmented state vector, and the observations then contain only linear combinations of the augmented state variables. The state equation is unchanged from Equation 7.23; however, the observation equation is modified so that the measurement noise is colored. The system model is thus

state equation: $\mathbf{x}_k = \mathbf{F} \mathbf{x}_{k-1} + \mathbf{G} \mathbf{w}_k$ \qquad (7.25)

observation equation: $\mathbf{z}_k = \mathbf{H}^H \mathbf{x}_k + \tilde{v}_k$ \qquad (7.26)

where $\tilde{v}_k$ is colored noise modeled by the complex-valued AR process in Equation 7.12. Expressing the $P$-th order AR process $\tilde{v}_k$ in state-space notation yields

state equation: $\mathbf{v}_k = \mathbf{F}_v \mathbf{v}_{k-1} + \mathbf{G}_v \tilde{\varepsilon}_k$ \qquad (7.27)

observation equation: $\tilde{v}_k = \mathbf{H}_v^H \mathbf{v}_k$ \qquad (7.28)

where

$$\mathbf{v}_k = \begin{bmatrix} \tilde{v}(k-P+1) \\ \tilde{v}(k-P+2) \\ \vdots \\ \tilde{v}(k) \end{bmatrix}_{P \times 1} \qquad (7.29)$$

$$\mathbf{F}_v = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ \tilde{a}_P & \tilde{a}_{P-1} & \tilde{a}_{P-2} & \cdots & \tilde{a}_1 \end{bmatrix}_{P \times P} \qquad (7.30)$$

$$\mathbf{G}_v = \mathbf{H}_v = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}_{P \times 1} \qquad (7.31)$$

and $\tilde{\varepsilon}_k$ is the white driving process in Equation 7.12 with zero mean and variance $\sigma_{\tilde{\varepsilon}}^2$. Combining Equation 7.25, Equation 7.26, and Equation 7.28 yields a new Kalman filter formulation whose dimensions have been extended,

state equation: $\bar{\mathbf{x}}_k = \bar{\mathbf{F}} \bar{\mathbf{x}}_{k-1} + \bar{\mathbf{G}} \bar{\mathbf{w}}_k$ \qquad (7.32)

observation equation: $\mathbf{z}_k = \bar{\mathbf{H}}^H \bar{\mathbf{x}}_k$ \qquad (7.33)

where

$$\bar{\mathbf{x}}_k = \begin{bmatrix} \mathbf{x}_k \\ \mathbf{v}_k \end{bmatrix}, \quad \bar{\mathbf{w}}_k = \begin{bmatrix} \mathbf{w}_k \\ \tilde{\varepsilon}_k \end{bmatrix}, \quad \bar{\mathbf{H}}^H = \begin{bmatrix} \mathbf{H}^H & \mathbf{H}_v^H \end{bmatrix}, \quad \bar{\mathbf{F}} = \begin{bmatrix} \mathbf{F} & \mathbf{0} \\ \mathbf{0} & \mathbf{F}_v \end{bmatrix}, \quad \bar{\mathbf{G}} = \begin{bmatrix} \mathbf{G} & \mathbf{0} \\ \mathbf{0} & \mathbf{G}_v \end{bmatrix} \qquad (7.34)$$

In the estimation literature, this is termed the noise-free [19] or perfect measurement [20] problem. The process noise $\mathbf{w}_k$ and the colored-noise state driving noise $\tilde{\varepsilon}_k$ are assumed to be uncorrelated, so

$$\bar{\mathbf{Q}} = E\{\bar{\mathbf{w}}_k \bar{\mathbf{w}}_k^H\} = \begin{bmatrix} \mathbf{Q} & \mathbf{0} \\ \mathbf{0} & \sigma_{\tilde{\varepsilon}}^2 \end{bmatrix} \qquad (7.35)$$

Since there is no noise in Equation 7.33, the covariance matrix of the observation noise is $\mathbf{R} = \mathbf{0}$. The recursive solution for this problem, defined by Equation 7.32 and Equation 7.33, is the same as for the conventional Kalman filter given in Equation 7.18 through Equation 7.22. When estimating the harmonic signal with the Kalman filter with an AR model for the colored noise, the coefficient matrices are

$$\bar{\mathbf{x}}_k = \begin{bmatrix} \tilde{s}(k) \\ \tilde{v}(k-P+1) \\ \tilde{v}(k-P+2) \\ \vdots \\ \tilde{v}(k) \end{bmatrix}_{(P+1) \times 1}, \qquad \bar{\mathbf{H}} = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}_{(P+1) \times 1} \qquad (7.36)$$

$$\bar{\mathbf{F}} = \begin{bmatrix} 1 & \mathbf{0}^T \\ \mathbf{0} & \mathbf{F}_v \end{bmatrix}_{(P+1) \times (P+1)} \qquad (7.37)$$

$$\bar{\mathbf{w}}_k = \begin{bmatrix} \tilde{w}_k \\ \tilde{\varepsilon}_k \end{bmatrix}, \qquad \bar{\mathbf{G}} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ \vdots & \vdots \\ 0 & 1 \end{bmatrix}_{(P+1) \times 2} \qquad (7.38)$$

and

$$\bar{\mathbf{Q}} = E\{\bar{\mathbf{w}}_k \bar{\mathbf{w}}_k^H\} = \begin{bmatrix} \sigma_{\tilde{w}}^2 & 0 \\ 0 & \sigma_{\tilde{\varepsilon}}^2 \end{bmatrix} \qquad (7.39)$$
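As an illustration, the augmented matrices of Equation 7.36 through Equation 7.39 can be assembled from a set of AR coefficients as follows; the coefficients and variances shown are placeholders, not values from the chapter:

```python
import numpy as np

def augmented_system(a, sigma_w2, sigma_eps2):
    """Assemble the (P+1)-dimensional matrices of Equations 7.36 through 7.39
    from colored-noise AR coefficients a = [a_1, ..., a_P]."""
    P = len(a)
    # Companion matrix F_v of Equation 7.30.
    F_v = np.zeros((P, P), dtype=complex)
    F_v[:-1, 1:] = np.eye(P - 1)
    F_v[-1, :] = a[::-1]                 # bottom row: [a_P, ..., a_1]
    F_bar = np.zeros((P + 1, P + 1), dtype=complex)
    F_bar[0, 0] = 1.0                    # the harmonic (DC) signal state
    F_bar[1:, 1:] = F_v
    G_bar = np.zeros((P + 1, 2), dtype=complex)
    G_bar[0, 0] = 1.0                    # process noise drives the signal state
    G_bar[-1, 1] = 1.0                   # AR driving noise enters the last state
    H_bar = np.zeros((P + 1, 1), dtype=complex)
    H_bar[0, 0] = 1.0                    # observation = signal ...
    H_bar[-1, 0] = 1.0                   # ... plus current noise sample
    Q_bar = np.diag([sigma_w2, sigma_eps2]).astype(complex)
    return F_bar, G_bar, H_bar, Q_bar

# Illustrative P = 2 coefficients (not estimated from any data).
F_bar, G_bar, H_bar, Q_bar = augmented_system(np.array([0.5 + 0.1j, -0.2 + 0j]), 0.01, 0.1)
print(F_bar.shape, G_bar.shape, H_bar.shape)  # (3, 3) (3, 2) (3, 1)
```

These matrices can be passed straight to a conventional Kalman recursion, subject to the caveat about $\mathbf{R}$ discussed next.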

Since the theoretical value $\mathbf{R} = \mathbf{0}$ turns off the Kalman filter so that it does not track the signal, $\mathbf{R}$ should be set to a positive value in practice.

7.4.1.3 Kalman Filter for Colored Noise

The Kalman filter has been generalized to systems in which both the process noise and the measurement noise are colored and can be modeled as Markov processes [21]. A Markov process is a process for which the probability density function of the current sample depends only on the previous sample, not on the entire process history. Although the Markov assumption may not be accurate in practice, this approach remains applicable. The system model is

state equation: $\mathbf{x}_k = \mathbf{F} \mathbf{x}_{k-1} + \mathbf{G} \mathbf{w}_k$ \qquad (7.40)

observation equation: $\mathbf{z}_k = \mathbf{H}^H \mathbf{x}_k + \mathbf{u}_k$ \qquad (7.41)

where the process noise $\mathbf{w}_k$ and the measurement noise $\mathbf{u}_k$ are both zero-mean and colored, with arbitrary covariance matrices

$$\mathbf{Q}_{ij} = \mathrm{cov}(\mathbf{w}_i, \mathbf{w}_j), \qquad i, j = 0, 1, 2, \ldots$$
$$\mathbf{R}_{ij} = \mathrm{cov}(\mathbf{u}_i, \mathbf{u}_j), \qquad i, j = 0, 1, 2, \ldots$$

The initial state $\mathbf{x}_0$ is a random vector with mean $\bar{\mathbf{x}}_0$ and covariance matrix $\mathbf{P}_0$, and $\mathbf{x}_0$, $\mathbf{w}_k$, and $\mathbf{u}_k$ are independent. The prediction step of the Kalman filter solution is

$$\hat{\mathbf{x}}_{k|k-1} = \mathbf{F} \hat{\mathbf{x}}_{k-1|k-1} \qquad (7.42)$$

$$\mathbf{\Sigma}_{k|k-1} = \mathbf{F} \mathbf{\Sigma}_{k-1|k-1} \mathbf{F}^H + \mathbf{G} \mathbf{Q}_{k-1} \mathbf{G}^H + \mathbf{F} \mathbf{C}_{k-1} + (\mathbf{F} \mathbf{C}_{k-1})^H \qquad (7.43)$$

where $\mathbf{C}_{k-1} = \mathbf{C}^{k-1}_{k-1|k-1}$ and $\mathbf{C}^{k-1}_{i|i}$ is given recursively by

$$\mathbf{C}^{k-1}_{i|i-1} = \mathbf{F} \mathbf{C}^{k-1}_{i-1|i-1} + \mathbf{G} \mathbf{Q}_{i-1,k-1} \mathbf{G}^H \qquad (7.44)$$

$$\mathbf{C}^{k-1}_{i|i} = \mathbf{C}^{k-1}_{i|i-1} - \mathbf{K}_i \mathbf{H}^H \mathbf{C}^{k-1}_{i|i-1} \qquad (7.45)$$

with the initial value $\mathbf{C}^k_{0|0} = \mathbf{0}$ and $\mathbf{Q}_0 = \mathbf{0}$. The $k$ in the superscript denotes time $k$. The subsequent update step is

$$\hat{\mathbf{x}}_{k|k} = \hat{\mathbf{x}}_{k|k-1} + \mathbf{K}_k \big(\mathbf{z}_k - \mathbf{H}^H \hat{\mathbf{x}}_{k|k-1}\big) \qquad (7.46)$$

$$\mathbf{\Sigma}_{k|k} = \mathbf{\Sigma}_{k|k-1} - \mathbf{K}_k \mathbf{S}_k \mathbf{K}_k^H \qquad (7.47)$$

$$\mathbf{K}_k = \big(\mathbf{\Sigma}_{k|k-1} \mathbf{H} + \mathbf{V}_k\big) \mathbf{S}_k^{-1} \qquad (7.48)$$

where $\mathbf{K}_k$ and $\mathbf{S}_k$ are given by

$$\mathbf{S}_k = \mathbf{H}^H \mathbf{\Sigma}_{k|k-1} \mathbf{H} + \mathbf{H}^H \mathbf{V}_k + (\mathbf{H}^H \mathbf{V}_k)^H + \mathbf{R}_k \qquad (7.49)$$

and $\mathbf{V}_k = \mathbf{V}^k_{k|k-1}$ is given recursively by

$$\mathbf{V}^k_{i|i-1} = \mathbf{F} \mathbf{V}^k_{i-1|i-1}, \qquad \mathbf{V}^k_{i|i} = \mathbf{V}^k_{i|i-1} + \mathbf{K}_i \big(\mathbf{R}_{i,k} - \mathbf{H}^H \mathbf{V}^k_{i|i-1}\big) \qquad (7.50)$$

with $\mathbf{R}_k$ being the new "data" and the initial value $\mathbf{V}^k_{0|0} = \mathbf{0}$. When the noise is white, $\mathbf{V} = \mathbf{0}$ and $\mathbf{C} = \mathbf{0}$, and this Kalman filter reduces to the conventional Kalman filter. It is worth noting that this filter is optimal only when $E\{\mathbf{w}_k\}$ and $E\{\mathbf{u}_k\}$ are known.

7.4.2 Indirect Signal Estimation

A central premise in Kalman filter theory is that the underlying state-space model is accurate. When this assumption is violated, the performance of the filter can deteriorate appreciably. The sensitivity of the Kalman filter to signal modeling errors has led to the development of robust Kalman filtering techniques based on modeling the noise. The conventional point of view in applying Kalman filters to a signal detection problem is to assign the system state, $\mathbf{x}_k$, to the signal to be detected. Under this paradigm, the hypotheses "signal absent" ($H_0$) and "signal present" ($H_1$) are represented by the state itself. When this conventional point of view is applied to the present application, the demodulated harmonic signal (a DC constant) is the system state to be estimated. Although a model for the desired signal can be developed from a relatively pure harmonic signal obtained in a shielded laboratory environment, that signal model often differs from the harmonic signal measured in the field, sometimes substantially, because the measured signal may be a function of system and environmental parameters, such as temperature. Therefore the harmonic signal measured in the field may deviate from the assumed signal model, and the Kalman filter may produce a poor state estimate due to inaccuracies in the state equation. However, the background noise in the field can be measured, and from it a reliable noise model can be developed, even though the measured noise is not exactly the same as the noise corrupting the measurements. The background noise in the postmitigation signal (Equation 7.10), $\tilde{v}(n)$, can be modeled as a complex-valued AR process as previously described. The Kalman filter is applied to estimate the background noise in the postmitigation signal, and the background noise estimate is then subtracted from the postmitigation signal to indirectly estimate the postmitigation harmonic signal (a DC constant).
Thus, the state in the Kalman filter, $\mathbf{x}_k$, is the background noise $\tilde{v}(n)$:

$$\mathbf{x}_k = \begin{bmatrix} \tilde{v}(k-P+1) \\ \tilde{v}(k-P+2) \\ \vdots \\ \tilde{v}(k) \end{bmatrix} \qquad (7.51)$$

and the measurement noise in the observation, $u_k$, is the demodulated harmonic signal $\tilde{s}(n)$:

$$u_k = \tilde{s}_k \qquad (7.52)$$
(7:52)


The other system parameters are

$$\mathbf{F} = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ \tilde{a}_P & \tilde{a}_{P-1} & \tilde{a}_{P-2} & \cdots & \tilde{a}_1 \end{bmatrix} \qquad (7.53)$$

$$\mathbf{G} = \mathbf{H} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}_{P \times 1} \qquad (7.54)$$

and $w_k = \tilde{\varepsilon}_k$. Therefore, the measurement equation becomes

$$z_k = \tilde{v}_k + \tilde{s}_k \qquad (7.55)$$

where $z_k$ is the measured data corresponding to $\tilde{y}(k)$ in Equation 7.10. A Kalman filter assumes complete a priori knowledge of the process and measurement noise statistics $\mathbf{Q}$ and $\mathbf{R}$. These statistics, however, are inexactly known in most practical situations. The use of incorrect a priori statistics in the design of a Kalman filter can lead to large estimation errors, or even to a divergence of errors. To reduce or bound these errors, an adaptive filter is employed by modifying or adapting the Kalman filter to the real data. Approaches to adaptive filtering fall into four categories: Bayesian, maximum likelihood, correlation, and covariance matching [22]. The last technique has been suggested for situations in which $\mathbf{Q}$ is known but $\mathbf{R}$ is unknown. The covariance matching algorithm ensures that the residuals remain consistent with their theoretical covariance. The residual, or innovation, is defined by

$$\boldsymbol{\nu}_k = \mathbf{z}_k - \mathbf{H}^H \hat{\mathbf{x}}_{k|k-1} \qquad (7.56)$$

which has a theoretical covariance of

$$E\{\boldsymbol{\nu}_k \boldsymbol{\nu}_k^H\} = \mathbf{H}^H \mathbf{\Sigma}_{k|k-1} \mathbf{H} + \mathbf{R} \qquad (7.57)$$

If the actual covariance of $\boldsymbol{\nu}_k$ is much larger than the covariance obtained from the Kalman filter, $\mathbf{R}$ should be increased to prevent divergence. This has the effect of increasing $\mathbf{\Sigma}_{k|k-1}$, thus bringing the actual covariance of $\boldsymbol{\nu}_k$ closer to that given in Equation 7.57. In this case, $\mathbf{R}$ is estimated as

$$\hat{\mathbf{R}}_k = \frac{1}{m} \sum_{j=1}^{m} \boldsymbol{\nu}_{k-j} \boldsymbol{\nu}_{k-j}^H - \mathbf{H}^H \mathbf{\Sigma}_{k|k-1} \mathbf{H} \qquad (7.58)$$

Here, a two-step adaptive Kalman filter using the covariance matching method is proposed. First, the covariance matching method is applied to estimate $\hat{\mathbf{R}}$. Then, the conventional Kalman filter is implemented with $\mathbf{R} = \hat{\mathbf{R}}$ to estimate the background noise. In this application, there are several measurements of data, with each measurement containing tens to hundreds of segments. For each segment, the covariance matching method is employed to estimate $\hat{\mathbf{R}}_k$, where the subscript $k$ denotes the sample index. Since it is an adaptive procedure, only the steady-state values of $\hat{\mathbf{R}}_k$ ($k \geq m$) are retained and averaged to find the value of $\hat{\mathbf{R}}_s$ for each segment,

$$\hat{\mathbf{R}}_s = \frac{1}{N-m} \sum_{k=m}^{N-1} \hat{\mathbf{R}}_k \qquad (7.59)$$

where the subscript $s$ denotes the segment index. Then, the average is taken over all the segments in each measurement. Thus,

$$\hat{\mathbf{R}} = \frac{1}{L} \sum_{s=1}^{L} \hat{\mathbf{R}}_s \qquad (7.60)$$

is used in the conventional Kalman filter in this two-step process. A block diagram depicting the two-step adaptive Kalman filter strategy is shown in Figure 7.2.

FIGURE 7.2 Block diagram of the two-step adaptive Kalman filter strategy: an adaptive Kalman filter estimates $\mathbf{R}$, a conventional Kalman filter then estimates $\tilde{v}(n)$, which is subtracted from $\tilde{y}(n) = \tilde{s}(n) + \tilde{v}(n)$ before the detector decides between $H_0$ and $H_1$. (From Tan et al., IEEE Transactions on Geoscience and Remote Sensing 43(7), 1507–1516, 2005. With permission.)
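For a scalar observation, the covariance matching steps of Equation 7.58 and Equation 7.59 reduce to a windowed innovation power minus the filter-predicted term. A minimal sketch, with synthetic innovations standing in for real filter output:

```python
import numpy as np

def covariance_match_R(innov, hsh, m=20):
    """Scalar covariance matching: Equation 7.58 over a sliding window of m
    innovations, then Equation 7.59's average over the steady-state values.
    `hsh` holds the filter term H^H Sigma_{k|k-1} H at each sample."""
    innov = np.asarray(innov)
    R_k = np.array([np.mean(np.abs(innov[k - m:k]) ** 2) - hsh[k]
                    for k in range(m, len(innov))])
    return R_k.mean()

# Synthetic check: innovations with power R + H^H Sigma H should return R.
rng = np.random.default_rng(3)
true_R, hsh_val, n = 1.0, 0.2, 5_000
innov = np.sqrt((true_R + hsh_val) / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
R_hat = covariance_match_R(innov, np.full(n, hsh_val))
print(round(R_hat, 1))   # close to 1.0
```

In the actual two-step procedure these per-segment values would be further averaged over segments as in Equation 7.60 before being handed to the conventional Kalman filter.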

7.5 Application to Landmine Detection via Quadrupole Resonance

Landmines are a form of unexploded ordnance, usually emplaced on or just under the ground, designed to explode in the presence of a triggering stimulus such as pressure from a foot or vehicle. Generally, landmines are divided into two categories: antipersonnel mines and antitank mines. Antipersonnel (AP) landmines are usually designed to be triggered by a relatively small amount of pressure, typically 40 lbs, and generally contain a small amount of explosive, so that the explosion maims or kills the person who triggers the device. In contrast, antitank (AT) landmines are specifically designed to destroy tanks and vehicles; they explode only if compressed by an object weighing hundreds of pounds. AP landmines are generally small (less than 10 cm in diameter) and are usually more difficult to detect than the larger AT landmines.

7.5.1 Quadrupole Resonance

When compounds with quadrupole moments are excited by a properly designed EMI system, they emit unique signals characteristic of the compound's chemical structure. The signal for a compound consists of a set of spectral lines, where the spectral lines correspond to the QR frequencies for that compound, and every compound has its own set of resonant frequencies. This phenomenon is similar to nuclear magnetic resonance (NMR); although there are several subtle distinctions between QR and NMR, in this context it is sufficient to view QR as NMR without the external magnetic field [4]. The QR phenomenon is applicable to landmine detection because many explosives, such as RDX, TNT, and PETN, contain nitrogen, and one of nitrogen's isotopes, $^{14}$N, has an electric quadrupole moment. Because of the chemical specificity of QR, the QR frequencies for explosives are unique and are not shared with other nitrogenous materials. In summary, landmine detection using QR is achieved by observing the presence, or absence, of a QR signal after applying a sequence of RF pulses designed to excite the resonant frequency or frequencies for the explosive of interest [4]. RFI presents a problem since the frequencies of the QR response fall within the commercial AM radio band. After the QR response is measured, additional processing is often utilized to reduce the RFI, which is usually a non-Gaussian colored noise process. Adaptive filtering is a common method for cancelling RFI when RFI reference signals are available. The frequency-domain LMS algorithm is an efficient method for extracting the QR signal from the background RFI. Under perfect circumstances, when the RFI measured on the main antenna is completely correlated with the signals measured on the reference antennas, all RFI can be removed by RFI mitigation, leaving only the Gaussian noise associated with the QR system.
Since the RFI travels over multiple paths and the antenna system is not perfect, however, the RFI mitigation cannot remove all of the non-Gaussian noise. Consequently, more sophisticated signal processing methods must be employed to estimate the QR signal after the RFI mitigation, and thus improve the QR signal detection. Figure 7.3 shows the basic block diagram for QR signal detection. The data acquired during each excitation pulse are termed a segment, and a group of segments constitutes a measurement. In general, for each potential target there are multiple measurements, with each measurement containing several hundred segments. The measurements are demodulated at the expected QR resonant frequency. Thus, if the demodulation frequency equals the QR resonant frequency, the QR signal after demodulation is a DC constant.
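The demodulation step in Figure 7.3 can be sketched as mixing with $e^{-j2\pi F_0 n}$ followed by a low-pass filter; a tone at exactly $F_0$ then appears as a DC constant. The filter and tone parameters below are illustrative:

```python
import numpy as np

def demodulate(x, f0, taps=64):
    """Mix x(n) down by f0 (cycles/sample), then low-pass with a crude
    moving-average filter; a spectral line at exactly f0 becomes DC."""
    n = np.arange(len(x))
    baseband = x * np.exp(-2j * np.pi * f0 * n)
    h = np.ones(taps) / taps
    return np.convolve(baseband, h, mode='valid')

f0 = 0.21                      # illustrative line location, cycles/sample
x = 3.0 * np.cos(2 * np.pi * f0 * np.arange(4096))
y = demodulate(x, f0)
print(round(np.mean(y.real), 2))   # ≈ 1.5: half the tone amplitude lands at DC
```

A real system would use a properly designed low-pass filter rather than a moving average, but the structure of the front end is the same.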

7.5.2 Radio-Frequency Interference

FIGURE 7.3 Signal processing block diagram for QR signal detection: the measurement is mixed with $e^{-j2\pi F_0 n}$, low-pass filtered, RFI-mitigated, filtered, and passed to a detector that decides between $H_0$ and $H_1$. $F_0$ is the resonant QR frequency for the explosive of interest, and the tilde denotes a complex-valued signal. (From Tan et al., IEEE Transactions on Geoscience and Remote Sensing 43(7), 1507–1516, 2005. With permission.)

Although QR is a promising technology due to its chemical specificity, it is limited by the inherently weak QR signal and its susceptibility to RFI. TNT is one of the most prevalent explosives in landmines, and also one of the most difficult explosives to detect. TNT possesses 18 resonant frequencies, 12 of which are clustered in the range of 700–900 kHz. Consequently, AM radio transmitters strongly interfere with TNT-QR detection in the

field, and are the primary source of RFI. Since there may be several AM radio transmitters operating simultaneously, the baseband (after demodulation) RFI signal measured by the QR system in each segment may be modeled as

$$\tilde{I}(n) = \sum_{m=1}^{M} \tilde{A}_m(n)\, e^{j(2\pi f_m n + \phi_m)}, \qquad n = 0, \ldots, N-1 \qquad (7.61)$$

where the tilde denotes a complex value and we assume the frequencies are distinct, meaning $f_i \neq f_j$ for $i \neq j$. For the RFI, $\tilde{A}_m(n)$ is the discrete time series of the message signal from an AM transmitter, which may be a nonstationary speech or music signal from a commercial AM radio station. The statistics of this signal, however, can be assumed to remain essentially constant over the short time intervals during which data are collected. For time intervals of this order, it is reasonable to assume $\tilde{A}_m(n)$ is constant within each data segment, but may change from segment to segment. Therefore, Equation 7.61 may be expressed as

$$\tilde{I}(n) = \sum_{m=1}^{M} \tilde{A}_m\, e^{j(2\pi f_m n + \phi_m)}, \qquad n = 0, \ldots, N-1 \qquad (7.62)$$

This model represents all frequencies even though each of the QR signals exists in a very narrow band. In practice, only the frequencies corresponding to the QR signals need be considered.
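Equation 7.62 is straightforward to simulate. The sketch below draws new amplitudes for each segment while keeping the carrier offsets fixed; the Rayleigh amplitude distribution and all numerical values are illustrative assumptions, not taken from the chapter:

```python
import numpy as np

def rfi_segment(amps, freqs, phases, N):
    """One segment of the baseband RFI model of Equation 7.62: constant
    complex amplitudes within the segment, distinct offsets f_m (cycles/sample)."""
    n = np.arange(N)
    return sum(A * np.exp(1j * (2 * np.pi * f * n + phi))
               for A, f, phi in zip(amps, freqs, phases))

rng = np.random.default_rng(4)
freqs = [0.01, 0.037, -0.08]           # illustrative, distinct offsets
segments = []
for _ in range(5):                     # amplitudes change segment to segment
    amps = rng.rayleigh(1.0, size=3)   # hypothetical AM message levels
    phases = rng.uniform(0, 2 * np.pi, size=3)
    segments.append(rfi_segment(amps, freqs, phases, N=256))
print(len(segments), segments[0].shape)  # 5 (256,)
```

Such synthetic segments are useful for exercising the RFI mitigation and the AR noise-modeling steps before real data are available.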

7.5.3 Postmitigation Signals

The applicability of the postmitigation signal models described previously is demonstrated by examining measured QR and RFI signals.


7.5.3.1 Postmitigation Quadrupole Resonance Signal

An example simulated QR signal before and after RFI mitigation is shown in Figure 7.4. Although the QR signal is a DC constant prior to RFI mitigation, this is no longer true after mitigation. The first-order Gaussian–Markov model introduced previously is an appropriate model for the postmitigation QR signal. The value $\sigma_{\tilde{\varepsilon}}^2 = 0.1$ is estimated from the data.

7.5.3.2 Postmitigation Background Noise

Although RFI mitigation can remove most of the RFI, some residual RFI remains in the postmitigation signal. The background noise remaining after RFI mitigation, $\tilde{v}(n)$, consists of the residual RFI and the remaining sensor noise, which is altered by the mitigation. An example power spectrum of $\tilde{v}(n)$ derived from experimental data is shown in Figure 7.5. The power spectrum contains numerous peaks and valleys; thus, the AR and ARMA models discussed previously are appropriate for modeling the residual background noise.

7.5.4 Kalman Filters for Quadrupole Resonance Detection

7.5.4.1 Conventional Kalman Filter

For the system described by Equation 7.14 and Equation 7.15, the variables in the conventional Kalman filter are $x_k = \tilde{s}(k)$, $w_k = \tilde{w}(k)$, $u_k = \tilde{v}(k)$, and $z_k = \tilde{y}(k)$, and the coefficient and covariance matrices in the Kalman filter are $\mathbf{F} = [1]$, $\mathbf{G} = [1]$, $\mathbf{H} = [1]$, and $\mathbf{Q} = [\sigma_{\tilde{w}}^2]$. The covariance $\mathbf{R}$ is estimated from the data, and the off-diagonal elements are set to 0.

FIGURE 7.4 Example realization of the simulated QR signal before and after RFI mitigation (real and imaginary parts, amplitude versus samples). (From Tan et al., IEEE Transactions on Geoscience and Remote Sensing 43(7), 1507–1516, 2005. With permission.)

FIGURE 7.5 Power spectrum of the remaining background noise $\tilde{v}(n)$ after RFI mitigation. The x-axis units are normalized frequency ($[-\pi, \pi]$). (From Tan et al., IEEE Transactions on Geoscience and Remote Sensing 43(7), 1507–1516, 2005. With permission.)

7.5.4.2 Kalman Filter with an Autoregressive Model for Colored Noise

The multi-segment Burg algorithms are used to estimate the AR parameters, and the simulation results do not show significant differences among the AVA, AVK, and SBurg algorithms. The AVA algorithm was chosen to estimate the AR coefficients $\tilde{a}_p$ and error power $\sigma_{\tilde{\varepsilon}}^2$ describing the background noise for each measurement. The optimal order, as determined by the AIC, is 6.

7.5.4.3 Kalman Filter for Colored Noise

For this Kalman filter, the coefficient and covariance matrices are the same as for the conventional Kalman filter, with the exception of the observation noise covariance $\mathbf{R}$. In this filter, all elements of $\mathbf{R}$ are retained, as opposed to the conventional Kalman filter, in which only the diagonal elements are retained.

7.5.4.4 Indirect Signal Estimation

For the adaptive Kalman filter in the first step, all coefficient and covariance matrices, $\mathbf{F}$, $\mathbf{G}$, $\mathbf{H}$, $\mathbf{Q}$, $\mathbf{R}$, and $\mathbf{\Sigma}_0$, are the same under both $H_1$ and $H_0$. The practical value of $\mathbf{x}_0$ is given by

$$\mathbf{x}_0 = [\tilde{y}(0) \;\; \tilde{y}(1) \;\; \cdots \;\; \tilde{y}(P-1)]^T \qquad (7.63)$$

where $P$ is the order of the AR model representing $\tilde{v}(n)$. For the conventional Kalman filter in the second step, $\mathbf{F}$, $\mathbf{G}$, $\mathbf{H}$, $\mathbf{Q}$, and $\mathbf{\Sigma}_0$ are the same under both $H_1$ and $H_0$; however, the estimated observation noise covariance, $\hat{\mathbf{R}}$, depends on the postmitigation signal and is therefore different under the two hypotheses.

7.6 Performance Evaluation

The proposed Kalman filter methods are evaluated on experimental data collected in the field by QM. A typical QR signal consists of multiple sinusoids where the amplitude and frequency of each resonant line are the parameters of interest. Usually, the amplitude of only one resonant line is estimated at a time, and the frequency is known a priori. For the baseband signal demodulated at the desired resonant frequency, adaptive RFI mitigation is employed. A 1-tap LMS mitigation algorithm is applied to each frequency component of the frequency-domain representation of the experimental data [23]. First, performance is evaluated for synthetic QR data. In this case, the RFI data are measured experimentally in the absence of an explosive material, and an ideal QR signal (complex-valued DC signal) is injected into the measured RFI data. Second, performance is evaluated for experimental QR data. In this case, the data are measured for both explosive and nonexplosive samples in the presence of RFI. The matched filter is employed for the synthetic QR data, while the energy detector is utilized for the measured QR data.
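The 1-tap LMS canceller applied per frequency bin can be sketched as follows, with a synthetic reference channel standing in for the reference antenna data; the step size and channel gain are illustrative:

```python
import numpy as np

def lms_1tap(d, r, mu=0.05):
    """1-tap complex LMS canceller: remove the component of d(n) that is
    correlated with the reference r(n); returns the residual e(n)."""
    w = 0j
    e = np.empty_like(d)
    for k in range(len(d)):
        e[k] = d[k] - w * r[k]           # cancel current RFI estimate
        w += mu * np.conj(r[k]) * e[k]   # complex LMS weight update
    return e

rng = np.random.default_rng(5)
n = 2_000
r = rng.standard_normal(n) + 1j * rng.standard_normal(n)          # reference RFI
d = (0.8 - 0.3j) * r + 0.1 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
e = lms_1tap(d, r)
residual = np.mean(np.abs(e[500:]) ** 2) / np.mean(np.abs(d[500:]) ** 2)
print(residual < 0.1)   # True: most RFI power is cancelled
```

In the actual system this scalar update runs independently in each bin of the segment's frequency-domain representation, which is what makes the mitigation adaptive to nonstationary RFI.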

7.6.1 Detection Algorithms

After an estimate of the QR signal has been obtained, a detection algorithm must be applied to determine whether or not the QR signal is present. For this binary decision problem, both an energy detector and a matched filter are applied. The energy detector simply computes the energy in the QR signal estimate, $\hat{\tilde{s}}(n)$:

$$E_s = \sum_{n=0}^{N-1} \big|\hat{\tilde{s}}(n)\big|^2 \qquad (7.64)$$

As a simple detection algorithm that does not incorporate any prior knowledge of the QR signal characteristics, it is easy to compute. The matched filter computes the detection statistic

$$\lambda = \mathrm{Re}\left\{ \sum_{n=0}^{N-1} \hat{\tilde{s}}(n)\, \tilde{S}^*(n) \right\} \qquad (7.65)$$

where $\tilde{S}$ is the reference signal, which, in this application, is the known QR signal. The matched filter is optimal only if the reference signal is precisely known a priori. Thus, if there is uncertainty regarding the resonant frequency of the QR signal, the matched filter will no longer be optimal. QR resonant frequency uncertainty may arise due to variations in environmental parameters, such as temperature.
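Both detection statistics (Equation 7.64 and Equation 7.65) are one-liners. The sketch below compares the matched-filter statistic under the two hypotheses for an illustrative DC reference and noise level:

```python
import numpy as np

def energy_detector(s_hat):
    """Equation 7.64: energy of the complex QR signal estimate."""
    return np.sum(np.abs(s_hat) ** 2)

def matched_filter(s_hat, s_ref):
    """Equation 7.65: correlate the estimate against the known reference
    (written here with an explicit conjugate on the reference)."""
    return np.real(np.sum(s_hat * np.conj(s_ref)))

rng = np.random.default_rng(6)
N = 64
s_ref = np.full(N, 1.0 + 0j)           # demodulated QR line: a DC constant
noise = 0.5 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))

lam_h1 = matched_filter(s_ref + noise, s_ref)   # signal present
lam_h0 = matched_filter(noise, s_ref)           # signal absent
print(lam_h1 > lam_h0)                          # True at this SNR
```

Thresholding either statistic yields the binary $H_0$/$H_1$ decision; the threshold choice trades detection probability against false-alarm rate.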

7.6.2 Synthetic Quadrupole Resonance Data

Prior to detection using the matched filter, the QR signal is estimated directly. Three Kalman filter approaches to estimating the QR signal are considered: the conventional Kalman filter, the extended (augmented-state) Kalman filter with an AR model for the colored noise, and the Kalman filter for arbitrary colored noise. Each of these filters requires the initial value of the system state, $\mathbf{x}_0$, and the selection of the initial value may affect the estimate of $\mathbf{x}$. The sample mean of the observation $\tilde{y}(n)$ is used to set $\mathbf{x}_0$. Since only the steady-state output of the Kalman filter is reliable, the first 39 points are removed from the data used for detection to ensure the system has reached steady state.

Although the measurement noise is known to be colored, the conventional Kalman filter (conKF) designed for white noise is applied to estimate the QR signal. This algorithm has the benefit of being simpler, with a lower computational burden than either of the other two Kalman filters proposed for direct estimation of the QR signal. Although its assumptions regarding the noise structure are recognized as inaccurate, its performance can be used as a benchmark for comparison. For a given process noise covariance $\mathbf{Q}$, the covariance of the initial state $\mathbf{x}_0$, $\mathbf{\Sigma}_0$, affects the speed with which $\mathbf{\Sigma}$ reaches steady state in the conventional Kalman filter. As $\mathbf{Q}$ increases, it takes less time for $\mathbf{\Sigma}$ to converge; however, the steady-state value of $\mathbf{\Sigma}$ also increases. In these simulations, $\mathbf{\Sigma}_0 = 10$ in the conventional Kalman filter. The second approach applied to estimate the QR signal is the Kalman filter with an AR model for the colored noise (extKF). Since $\mathbf{R} = 0$ shuts down the Kalman filter so that it does not track the signal, we set $\mathbf{R} = 10$. The detection performance is shown not to decrease when $\mathbf{Q}$ increases. Finally, the Kalman filter for colored noise (arbKF) is applied to the synthetic data. Compared to the conventional Kalman filter, the Kalman filter for arbitrary noise has a smaller Kalman gain and, therefore, slower convergence; consequently, its error covariances are larger. Since only the steady state is used for detection, $\mathbf{\Sigma}_0 = 50$ is chosen for $\mathbf{Q} = 0.1$, and $\mathbf{\Sigma}_0 = 100$ is chosen for $\mathbf{Q} = 1$ and $\mathbf{Q} = 10$. Results for each of these Kalman filtering methods for different values of $\mathbf{Q}$ are presented in Figure 7.6. When $\mathbf{Q}$ is small ($\mathbf{Q} = 0.1$ and $\mathbf{Q} = 1$), all three methods have similar detection performance.
However, when greater model error is introduced in the state equation (Q = 10), both the conventional Kalman filter and the Kalman filter for colored noise have poorer detection performance than the Kalman filter with an AR model for the noise. Thus, the Kalman filter with an AR model for the noise shows robust performance. In terms of computational efficiency, the Kalman filter for colored noise is the poorest because it recursively estimates C and V for each k. In this application, although the measurement noise ṽk is colored, the diagonal elements of the covariance matrix dominate. Therefore, the conventional Kalman filter is preferable to the Kalman filter for colored noise.
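The trade-off described above between the process noise covariance Q and the error covariance S can be illustrated with a minimal scalar sketch of the conventional Kalman filter covariance recursion. This is not the chapter's multivariate QR signal model; the random-walk state model, the scalar R = 10, and S0 = 10 are illustrative values chosen to match the numbers quoted in the text.

```python
# Scalar conventional Kalman filter sketch: x_k = x_{k-1} + w_k, y_k = x_k + v_k,
# with var(w_k) = Q and var(v_k) = R. Illustrates the text's observation that a
# larger Q makes the error covariance S converge faster but to a larger
# steady-state value. (Hypothetical scalar model, not the chapter's QR model.)

def error_covariance_trace(Q, R=10.0, S0=10.0, steps=200):
    """Return the sequence of a posteriori error covariances S_k."""
    S = S0
    trace = []
    for _ in range(steps):
        S_pred = S + Q              # time update: S_k|k-1 = S_{k-1} + Q
        K = S_pred / (S_pred + R)   # Kalman gain
        S = (1.0 - K) * S_pred      # measurement update: S_k = (1 - K) S_k|k-1
        trace.append(S)
    return trace

small_q = error_covariance_trace(Q=0.1)
large_q = error_covariance_trace(Q=10.0)
# large_q converges in fewer steps, but its steady-state value is larger.
```

The steady-state value solves the scalar Riccati equation for this model, which is why only the post-transient output (after the first 39 points in the text's simulations) is used for detection.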

7.6.3 Measured Quadrupole Resonance Data

Indirect estimation of the QR signal is validated using two groups of real data collected both with and without an explosive present. The two explosives for which measured QR data are collected, denoted Type A and Type B, are among the more common explosives found in landmines. It is well known that the Type B explosive is the more challenging of the two to detect. The first group, denoted Data II, contains Type A explosive, and the second group, denoted Data III, contains both Type A and Type B explosives. For each data group, a 50–50% training–testing strategy is employed: 50% of the data are used to estimate the coefficient and covariance matrices, and the remaining 50% are used to test the algorithm. The AVA algorithm is utilized to estimate the AR parameters from the training data. Table 7.1 lists the four training–testing strategies considered. For example, if there are 10 measurements and measurements 1–5 are used for training while measurements 6–10 are used for testing, this is termed "first 50% training, last 50% testing." If measurements 1, 3, 5, 7, 9 are used for training and the remaining measurements for testing, this is termed "odd 50% training, even 50% testing."
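The four splits of Table 7.1 can be sketched in a few lines; the list of measurement indices and the function name are illustrative, not from the chapter.

```python
# Sketch of the four 50-50% training-testing strategies (Table 7.1), applied
# to a hypothetical list of 1-based measurement indices.

def split(measurements, strategy):
    """Return (training, testing) halves for one of the four strategies."""
    n = len(measurements)
    first, last = measurements[: n // 2], measurements[n // 2 :]
    odd = measurements[0::2]   # measurements 1, 3, 5, ... (1-based)
    even = measurements[1::2]  # measurements 2, 4, 6, ... (1-based)
    return {
        "first/last": (first, last),
        "last/first": (last, first),
        "odd/even": (odd, even),
        "even/odd": (even, odd),
    }[strategy]

train_set, test_set = split(list(range(1, 11)), "odd/even")
# train_set -> [1, 3, 5, 7, 9], test_set -> [2, 4, 6, 8, 10]
```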

Kalman Filtering for Weak Signal Detection in Remote Sensing

[Figure 7.6 appears here: three ROC panels, (a), (b), and (c), plotting probability of detection against probability of false alarm for Pre KF, Post conKF, Post arbKF, and Post extKF at Q = 0.1, Q = 1, and Q = 10, respectively.]
FIGURE 7.6 Direct estimation of QR signal tested on Data I (synthetic data) with different noise model variance Q. (From Tan et al., IEEE Transactions on Geoscience and Remote Sensing 43(7), 1507–1516, 2005. With permission.)

Figure 7.7 and Figure 7.8 present the performance of indirect QR signal estimation followed by an energy detector. The data are measured for both explosive and nonexplosive samples in the presence of RFI. The two explosive compounds are referred to as Type A explosive and Type B explosive; Data II and Data III-1 contain Type A explosive, and Data III-2 contains Type B explosive. Indirect QR signal estimation provides almost perfect

TABLE 7.1
Training–testing strategies for Data II and Data III

Training     Testing
First 50%    Last 50%
Last 50%     First 50%
Odd 50%      Even 50%
Even 50%     Odd 50%

[Figure 7.7 appears here: three ROC panels, (a) Data II-1, (b) Data II-2, and (c) Data II-3, plotting probability of detection against probability of false alarm for the 2 Step KF under the First/Last, Last/First, Odd/Even, and Even/Odd training–testing strategies, together with the Pre KF result.]

FIGURE 7.7 Indirect estimation of QR signal tested on Data II (true data). Four different training–testing strategies are plotted together. (From Tan et al., IEEE Transactions on Geoscience and Remote Sensing 43(7), 1507–1516, 2005. With permission.)

detection for the Type A explosive. Although detection is not near-perfect for the Type B explosive, the detection performance following the application of the Kalman filter is better than the performance prior to applying the Kalman filter.

7.7 Summary

The detectability of weak signals in remote sensing applications can be hindered by the presence of interference signals. In situations where it is not possible to record the measurement without the interference, adaptive filtering is an appropriate method to mitigate the interference in the measured signal. Adaptive filtering, however, may not remove all the interference from the measured signal if the reference signals are not

[Figure 7.8 appears here: two ROC panels, (a) Data III-1 and (b) Data III-2, plotting probability of detection against probability of false alarm for the 2 Step KF under the four training–testing strategies, together with the Pre KF result.]

FIGURE 7.8 Indirect estimation of QR signal tested on Data III (true data). Four different training–testing strategies are plotted together. (From Tan et al., IEEE Transactions on Geoscience and Remote Sensing 43(7), 1507–1516, 2005. With permission.)

completely correlated with the primary measured signal. One approach for subsequent processing to detect the signal of interest when residual interference remains after adaptive noise cancellation is Kalman filtering. An accurate signal model is necessary for Kalman filters to perform well; even small deviations from the model may cause very poor performance. The harmonic signal of interest may be sensitive to the external environment, which in turn restricts the achievable signal model accuracy. To overcome this limitation, an adaptive two-step algorithm employing Kalman filters is proposed to estimate the signal of interest indirectly. The utility of this approach is illustrated by applying it to QR signal estimation for landmine detection. QR technology provides promising explosive detection efficiency because it can detect the "fingerprint" of explosives. In applications such as humanitarian demining, QR has proven to be highly effective if the QR sensor is not exposed to RFI. Although adaptive RFI mitigation removes most of the RFI, additional signal processing algorithms applied to the post-mitigation signal are still necessary to improve landmine detection. Indirect signal estimation is compared to direct signal estimation using Kalman filters and is shown to be more effective. The results of this study indicate that indirect QR signal estimation provides robust detection performance.

References

1. B.D.O. Anderson and J.B. Moore, Optimal Filtering, Prentice-Hall, Englewood Cliffs, NJ, 1979.
2. S.J. Norton and I.J. Won, Identification of buried unexploded ordnance from broadband electromagnetic induction data, IEEE Transactions on Geoscience and Remote Sensing, 39(10), 2253–2261, 2001.
3. K. O'Neill, Discrimination of UXO in soil using broadband polarimetric GPR backscatter, IEEE Transactions on Geoscience and Remote Sensing, 39(2), 356–367, 2001.
4. N. Garroway, M.L. Buess, J.B. Miller, B.H. Suits, A.D. Hibbs, G.A. Barrall, R. Matthews, and L.J. Burnett, Remote sensing by nuclear quadrupole resonance, IEEE Transactions on Geoscience and Remote Sensing, 39(6), 1108–1118, 2001.
5. J.A.S. Smith, Nuclear quadrupole resonance spectroscopy, general principles, Journal of Chemical Education, 48, 39–49, 1971.
6. S. Haykin, Adaptive Filter Theory, Prentice-Hall, Englewood Cliffs, NJ, 1991.
7. C.F. Cowan and P.M. Grant, Adaptive Filters, Prentice-Hall, Englewood Cliffs, NJ, 1985.
8. G.E.P. Box and G.M. Jenkins, Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco, CA, 1970.
9. B. Picinbono and P. Bondon, Second-order statistics of complex signals, IEEE Transactions on Signal Processing, SP-45(2), 411–420, 1997.
10. P.M.T. Broersen, The ABC of autoregressive order selection criteria, Proceedings of SYSID SICE, 231–236, 1997.
11. S. Haykin, B.W. Currie, and S.B. Kesler, Maximum-entropy spectral analysis of radar clutter, Proceedings of the IEEE, 70(9), 953–962, 1982.
12. A.A. Beex and M.D.A. Raham, On averaging Burg spectral estimators for segments, IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-34, 1473–1484, 1986.
13. D. Tjostheim and J. Paulsen, Bias of some commonly-used time series estimates, Biometrika, 70(2), 389–399, 1983.
14. S. de Waele and P.M.T. Broersen, The Burg algorithm for segments, IEEE Transactions on Signal Processing, SP-48, 2876–2880, 2000.
15. H. Akaike, A new look at the statistical model identification, IEEE Transactions on Automatic Control, AC-19(6), 716–723, 1974.
16. M. Wax and T. Kailath, Detection of signals by information theoretic criteria, IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-33(2), 387–392, 1985.
17. Y. Tan, S.L. Tantum, and L.M. Collins, Kalman filtering for enhanced landmine detection using quadrupole resonance, IEEE Transactions on Geoscience and Remote Sensing, 43(7), 1507–1516, 2005.
18. J.D. Gibson, B. Koo, and S.D. Gray, Filtering of colored noise for speech enhancement and coding, IEEE Transactions on Signal Processing, 39(8), 1732–1742, 1991.
19. A.P. Sage and J.L. Melsa, Estimation Theory with Applications to Communications and Control, McGraw-Hill, New York, 1971.
20. P.S. Maybeck, Stochastic Models, Estimation, and Control, Academic Press, New York, 1979.
21. X.R. Li, C. Han, and J. Wang, Discrete-time linear filtering in arbitrary noise, Proceedings of the 39th IEEE Conference on Decision and Control, 1212–1217, 2000.
22. R.K. Mehra, Approaches to adaptive filtering, IEEE Transactions on Automatic Control, AC-17, 693–698, 1972.
23. Y. Tan, S.L. Tantum, and L.M. Collins, Landmine detection with nuclear quadrupole resonance, in IGARSS 2002 Proceedings, 1575–1578, July 2002.

8 Relating Time-Series of Meteorological and Remote Sensing Indices to Monitor Vegetation Moisture Dynamics

J. Verbesselt, P. Jönsson, S. Lhermitte, I. Jonckheere, J. van Aardt, and P. Coppin

CONTENTS
8.1 Introduction
8.2 Data
8.2.1 Study Area
8.2.2 Climate Data
8.2.3 Remote Sensing Data
8.3 Serial Correlation and Time-Series Analysis
8.3.1 Recognizing Serial Correlation
8.3.2 Cross-Correlation Analysis
8.3.3 Time-Series Analysis: Relating Time-Series and Autoregression
8.4 Methodology
8.4.1 Data Smoothing
8.4.2 Extracting Seasonal Metrics from Time-Series and Statistical Analysis
8.5 Results and Discussion
8.5.1 Temporal Analysis of the Seasonal Metrics
8.5.2 Regression Analysis Based on Values of Extracted Seasonal Metrics
8.5.3 Time-Series Analysis Techniques
8.6 Conclusions
Acknowledgments
References

8.1 Introduction

The repeated occurrence of severe wildfires, which affect various fire-prone ecosystems of the world, has highlighted the need to develop effective tools for monitoring fire-related parameters. Vegetation water content (VWC), which influences the biomass burning processes, is an example of one such parameter [1–3]. The physical definitions of VWC vary from water volume per leaf or ground area (equivalent water thickness) to water mass per mass of vegetation [4]. VWC could therefore also be used to infer vegetation water stress and to assess drought conditions that are linked with fire risk [5]. Decreases in VWC due to the seasonal decrease in available soil moisture can induce severe fires in


most ecosystems. VWC is particularly important for determining the behavior of fires in savanna ecosystems because the herbaceous layer becomes especially flammable during the dry season when the VWC is low [6,7]. Typically, VWC in savanna ecosystems is measured using labor-intensive vegetation sampling. Several studies, however, indicated that VWC can be characterized temporally and spatially using meteorological or remote sensing data, which could contribute to the monitoring of fire risk [1,4]. The meteorological Keetch–Byram drought index (KBDI) was selected for this study. This index was developed to incorporate soil water content in the root zone of vegetation and is able to assess the seasonal trend of VWC [3,8]. The KBDI is a cumulative algorithm for the estimation of fire potential from meteorological information, including daily maximum temperature, daily total precipitation, and mean annual precipitation [9,10]. The KBDI also has been used for the assessment of VWC for vegetation types with shallow rooting systems, for example, the herbaceous layer of the savanna ecosystem [8,11]. The application of drought indices, however, presents specific operational challenges. These challenges are due to the lack of meteorological data for certain areas, as well as spatial interpolation techniques that are not always suitable for use in areas with complex terrain features. Satellite data provide sound alternatives to meteorological indices in this context. Remotely sensed data have significant potential for monitoring vegetation dynamics at regional to global scale, given the synoptic coverage and repeated temporal sampling of satellite observations (e.g., SPOT VEGETATION or NOAA AVHRR) [12,13]. These data have the advantage of providing information on remote areas where ground measurements are impossible to obtain on a regular basis. 
Most research in the scientific community using optical sensors (e.g., SPOT VEGETATION) to study biomass burning has focused on two areas [4]: (1) the direct estimation of VWC and (2) the estimation of chlorophyll content or degree of drying as an alternative to the estimation of VWC. Chlorophyll-related indices are related to VWC based on the hypothesis that the chlorophyll content of leaves decreases proportionally to the VWC [4]. This assumption has been confirmed for selected species with shallow rooting systems (e.g., grasslands and understory forest vegetation) [14–16], but cannot be generalized to all ecosystems [4]. Therefore, chlorophyll-related indices, such as the normalized difference vegetation index (NDVI), can only be used in regions where the relationship among chlorophyll content, degree of curing, and water content has been established. Accordingly, a remote sensing index that is directly coupled to the VWC is used to investigate the potential of hyper-temporal satellite imagery to monitor the seasonal vegetation moisture dynamics. Several studies [4,16–18] have demonstrated that VWC can be estimated directly through the normalized difference of the near infrared reflectance (NIR, 0.78–0.89 µm) ρNIR, influenced by the internal structure and the dry matter, and the shortwave infrared reflectance (SWIR, 1.58–1.75 µm) ρSWIR, influenced by plant tissue water content:

NDWI = (ρNIR − ρSWIR) / (ρNIR + ρSWIR)    (8.1)
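Equation 8.1 is a simple band ratio, sketched below as a small function; the function name and the example reflectance values are illustrative only.

```python
# Sketch of Equation 8.1: NDWI computed from NIR and SWIR surface reflectances.

def ndwi(rho_nir, rho_swir):
    """Normalized difference water index from two reflectances in [0, 1]."""
    return (rho_nir - rho_swir) / (rho_nir + rho_swir)

# A well-watered canopy reflects strongly in the NIR relative to the SWIR,
# giving a positive NDWI; equal reflectances give zero.
ndwi(0.40, 0.20)  # approximately 0.333
```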

The NDWI or normalized difference infrared index (NDII) [19] is similar to the global vegetation moisture index (GVMI) [20]. The relationship between NDWI and KBDI time-series, both related to VWC dynamics, is explored. Although the value of time-series data for monitoring vegetation moisture dynamics has been firmly established [21], only a few studies have taken serial correlation into account when correlating time-series [6,22–25]. Serial correlation occurs when data collected through time contain values at time t, which are correlated with observations at

time t − 1. This type of correlation in time-series, when related to VWC dynamics, is mainly caused by the seasonal variation (dry–wet cycle) of vegetation [26]. Serial correlation can be used to forecast future values of the time-series by modeling the dependence between observations, but it affects correlations between variables measured in time and violates the basic regression assumption of independence [22]. Correlation coefficients of serially correlated data cannot be used as indicators of goodness-of-fit of a model, as the correlation coefficients are artificially inflated [22,27]. The study of the relationship between NDWI and KBDI is a nontrivial task due to the effect of serial correlation. Remedies for serial correlation include sampling or aggregating the data over longer time intervals, as well as further modeling, which can include techniques such as weighted regression [25,28]. However, it is difficult to account for serial correlation in time-series related to VWC dynamics using extended regression techniques. The time-series related to VWC dynamics often exhibit high non-Gaussian serial correlation and are more significantly affected by outliers and measurement errors [28]. A sampling technique is therefore proposed, which accounts for serial correlation in seasonal time-series, to study the relationship between different time-series. The serial correlation effect in time-series is assumed to be minimal when extracting one metric per season (e.g., start of the dry season). The extracted seasonal metrics are then utilized to study the relationship between time-series at a specific moment in time (e.g., start of the dry season).
The aim of this chapter is to address the effect of serial correlation when studying the relationship between remote sensing and meteorological time-series related to VWC by comparing nonserially correlated seasonal metrics from time-series. This chapter therefore has three defined objectives. Firstly, an overview of time-series analysis techniques and concepts (e.g., stationarity, autocorrelation, ARIMA, etc.) is presented, and the relationship between time-series is studied using cross-correlation and ordinary least squares (OLS) regression analysis. Secondly, an algorithm for the extraction of seasonal metrics is optimized for satellite and meteorological time-series. Finally, the temporal occurrence and values of the extracted nonserially correlated seasonal metrics are analyzed statistically to define the quantitative relationship between NDWI and KBDI time-series. The influence of serial correlation is illustrated by comparing results from cross-correlation and OLS analysis with the results from the investigation of correlation between extracted metrics.

8.2 Data

8.2.1 Study Area

The Kruger National Park (KNP), located between latitudes 23°S and 26°S and longitudes 30°E and 32°E in the low-lying savanna of the northeastern part of South Africa, was selected for this study (Figure 8.1). Elevations range from 260 to 839 m above sea level, and mean annual rainfall varies between 350 mm in the north and 750 mm in the south. The rainy season within the annual climatic season can be confined to the summer months (i.e., November to April), and over a longer period can be defined by alternating wet and dry seasons [7]. The KNP is characterized by an arid savanna dominated by thorny, fine-leafed trees of the families Mimosaceae and Burseraceae. An exception is the northern part of the KNP, where the Mopane, a broad-leafed tree belonging to the Caesalpiniaceae, almost completely dominates the tree layer.


FIGURE 8.1 The Kruger National Park (KNP) study area with the weather stations used in the analysis (Punda Maria, Shingwedzi, Letaba, Satara, Pretoriuskop, and Onder Sabie). South Africa is shown with the borders of the provinces and the study area (top left).

8.2.2 Climate Data

Climate data from six weather stations in the KNP with similar vegetation types were used to estimate the daily KBDI (Figure 8.1). KBDI was derived from daily precipitation and maximum temperature data to estimate the net effect on the soil water balance [3]. Assumptions in the derivation of KBDI include a soil water capacity of approximately 20 cm and an exponential moisture loss from the soil reservoir. KBDI was initialized during periods of rainfall events (e.g., the rainy season) that result in soils at maximized field capacity and KBDI values of zero [8]. The preprocessing of KBDI was done using the method developed by Janis et al. [10]. Missing daily maximum temperatures were replaced with interpolated values of daily maximum temperatures, based on a linear interpolation function [30]. Missing daily precipitation, on the other hand, was assumed to be zero. A series of error logs were automatically generated to indicate missing precipitation values and associated estimated daily KBDI values, because zeroing missing precipitation may lead to an increased fire potential bias in KBDI. The total percentage of missing data gaps in the rainfall and temperature series was at most 5% during the study period for each of the six weather stations. The daily KBDI time-series were transformed into 10-daily KBDI series, similar to the SPOT VEGETATION S10 dekads (i.e., 10-day periods), by taking the maximum of

[Figure 8.2 appears here: NDWI (left axis) and −KBDI (right axis) plotted as time-series over 1999–2003.]
FIGURE 8.2 The temporal relationship between NDWI and KBDI time-series for the "Satara" weather station (Figure 8.1).

each dekad. The negative of the KBDI time-series (i.e., –KBDI) was analyzed in this chapter such that the temporal dynamics of KBDI and NDWI were related (Figure 8.2). The –KBDI and NDWI are used throughout this chapter. The Satara weather station, centrally positioned in the study area, was selected to represent the temporal vegetation dynamics. The other weather stations in the study area demonstrate similar temporal vegetation dynamics.
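The daily-to-dekad aggregation described above (taking the maximum within each 10-day period) can be sketched as follows. For simplicity the sketch uses fixed 10-day blocks; the actual SPOT VGT S10 calendar lets the third dekad of each month absorb the 29th to 31st days, so the block boundaries here are an illustrative assumption.

```python
# Sketch of aggregating a daily KBDI series into 10-daily (dekad) values by
# taking the maximum within each dekad. Fixed 10-day blocks are assumed; a
# final partial block is aggregated as-is.

def dekad_maxima(daily_values, dekad_length=10):
    """Maximum of each consecutive block of `dekad_length` daily values."""
    return [
        max(daily_values[i : i + dekad_length])
        for i in range(0, len(daily_values), dekad_length)
    ]

dekad_maxima([float(d) for d in range(30)])  # -> [9.0, 19.0, 29.0]
```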

8.2.3 Remote Sensing Data

The data set used is composed of 10-daily SPOT VEGETATION (SPOT VGT) composites (S10 NDVI maximum value syntheses) acquired over the study area for the period April 1998 to December 2002. SPOT VGT can provide local to global coverage on a regular basis (e.g., daily for SPOT VGT). The syntheses result in surface reflectance in the blue (0.43–0.47 µm), red (0.61–0.68 µm), NIR (0.78–0.89 µm), and SWIR (1.58–1.75 µm) spectral regions. Images were atmospherically corrected using the simplified method for atmospheric correction (SMAC) [30]. The geometrically and radiometrically corrected S10 images have a spatial resolution of 1 km. The S10 SPOT VGT time-series were preprocessed to detect data that erroneously influence the subsequent fitting of functions to time-series, necessary to define and extract metrics [6]. The image preprocessing procedures performed were:

• Data points with a satellite viewing zenith angle (VZA) above 50° were masked out, as pixels located at the very edge of the image swath (VZA > 50.5°) are affected by re-sampling methods that yield erroneous spectral values.

• The aberrant SWIR detectors of the SPOT VGT sensor, flagged by the status mask of the SPOT VGT S10 synthesis, also were masked out. A data point was classified as cloud-free if the blue reflectance was less than 0.07 [31]. The developed threshold approach was applied to identify cloud-free pixels for the study area.

• NDWI time-series were derived by selecting savanna pixels, based on the land cover map of South Africa [32], for a 3 × 3 pixel window centered at each of the meteorological


stations to reduce the effect of potential spatial misregistration (Figure 8.1). Median values of the 9-pixel windows were then retained instead of single pixel values [33]. The median was preferred to average values as it is less affected by extreme values and therefore is less sensitive to potentially undetected data errors.
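The preference for the window median over the mean can be shown with a small sketch; the 3 × 3 reflectance values below, including the single outlier, are hypothetical.

```python
# Sketch of retaining the median of a 3 x 3 pixel window instead of a single
# pixel value: the median resists an undetected data error that would pull
# the mean upward. (Hypothetical reflectance values.)
from statistics import median

window = [
    [0.11, 0.12, 0.13],
    [0.12, 0.95, 0.12],   # 0.95 is a hypothetical undetected data error
    [0.13, 0.11, 0.12],
]
pixels = [v for row in window for v in row]
robust_value = median(pixels)     # -> 0.12, unaffected by the outlier
mean_value = sum(pixels) / 9      # pulled upward by the outlier
```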

8.3 Serial Correlation and Time-Series Analysis

Serial correlation affects correlations between variables measured in time and violates the basic regression assumption of independence. Techniques that are used to recognize serial correlation are therefore discussed by applying them to the NDWI and −KBDI time-series. Cross-correlation analysis is illustrated and used to study the relationship between time-series of −KBDI and NDWI. Fundamental time-series analysis concepts (e.g., stationarity and seasonality) are introduced, and a brief overview is presented of the most frequently used method for time-series analysis to account for serial correlation, namely autoregression (AR).

8.3.1 Recognizing Serial Correlation

This chapter focuses on discrete time-series, which contain observations made at discrete time intervals (e.g., the 10-daily time steps of the −KBDI and NDWI time-series). A time-series is defined as a set of observations, xt, recorded at a specific time, t [26]. Time-series of −KBDI and NDWI contain a seasonal variation, which is illustrated in Figure 8.2 by a smooth increase or decrease of the series related to vegetation moisture dynamics. The gradual increase or decrease of the graph of a time-series is generally indicative of the existence of a form of dependence or serial correlation among observations. The presence of serial correlation systematically biases regression analysis when studying the relationship between two or more time-series [25]. Consider the OLS regression line with a slope and an intercept:

Y(t) = a0 + a1 X(t) + e(t)    (8.2)

where t is time, a0 and a1 are the respective OLS regression intercept and slope parameters, Y(t) the dependent variable, X(t) the independent variable, and e(t) the random error term. The standard error (SE) of each parameter is required for any regression model to define the confidence interval (CI) and derive the significance of parameters in the regression equation. The parameters a0, a1, and the CIs, estimated by minimizing the sum of the squared "residuals," are valid only if certain assumptions related to the regression and e(t) are met [25]. These assumptions are detailed in statistical textbooks [34] but are not always met or explicitly considered in real-world applications. Figure 8.3 illustrates the biased CIs of the OLS regression model at a 95% confidence level. The SE term of the regression model is underestimated due to serially correlated residuals, which explains the biased confidence interval, where CI = mean ± 1.96 SE. The Gauss–Markov theorem states that the OLS parameter estimate is the best linear unbiased estimate (BLUE); that is, all other linear unbiased estimates will have a larger variance, if the error term, e(t), is stationary and exhibits no serial correlation. The Gauss–Markov theorem consequently points to the error term, and not to the time-series themselves, as the critical consideration [35]. The error term is defined as stationary when

[Figure 8.3 appears here: scatter plot of −KBDI against NDWI with the fitted OLS regression line and its confidence intervals.]

FIGURE 8.3 Result of the OLS regression fit between KBDI and NDWI as dependent and independent variables, respectively, for the Satara weather station (n = 157). Confidence intervals (- - -) at a 95% confidence level are shown, but are "narrowed" due to serial correlation in the residuals.

it does not present a trend and the variance remains constant over time [27]. It is possible that the residuals are serially correlated if one of the dependent or independent variables also is serially correlated, because the residuals constitute a linear combination of both types of variables. Both dependent and independent variables of the regression model are serially correlated (KBDI and NDWI), which explains the serial correlation observed in the residuals. A sound practice used to verify serial correlation in time-series is to perform multiple checks by both graphical and diagnostic techniques. The autocorrelation function (ACF) can be viewed as a graphical measure of serial correlation between variables or residuals. The sample ACF is defined when x1, . . . , xn are observations of a time-series. The sample mean of x1, . . . , xn is [26]:

x̄ = (1/n) Σ_{t=1}^{n} x_t    (8.3)

The sample autocovariance function with lag h is

γ̂(h) = (1/n) Σ_{t=1}^{n−|h|} (x_{t+|h|} − x̄)(x_t − x̄),   −n < h < n    (8.4)

The sample ACF is

ρ̂(h) = γ̂(h)/γ̂(0),   −n < h < n    (8.5)

Figure 8.4 illustrates the ACF for the −KBDI and NDWI time-series from the Kruger park data. The ACF clearly indicates a significant autocorrelation in the

[Figure 8.4 appears here: sample ACF plotted against lag for (a) −KBDI and (b) NDWI.]

FIGURE 8.4 The autocorrelation function (ACF) for (a) KBDI and (b) NDWI time-series for the Satara weather station. The horizontal lines on the graph are the bounds ±1.96/√n (n = 157).

time-series, as more than 5% of the sample autocorrelations fall outside the significance bounds ±1.96/√n [26]. There are also formal tests available to detect autocorrelation, such as the Ljung–Box test statistic and the Durbin–Watson statistic [25,26].
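Equations 8.3 through 8.5 and the significance bounds can be sketched directly; the four-point series below is a toy example, not the chapter's data.

```python
# Sketch of the sample ACF of Equations 8.3-8.5, with the +/-1.96/sqrt(n)
# significance bounds used in Figure 8.4. (Toy data, not the KNP series.)
import math

def sample_acf(x, max_lag):
    """Sample autocorrelations rho_hat(0..max_lag); gamma(h) is symmetric in h,
    so only non-negative lags are computed."""
    n = len(x)
    mean = sum(x) / n                                   # Equation 8.3
    def gamma(h):                                       # Equation 8.4
        return sum((x[t + h] - mean) * (x[t] - mean) for t in range(n - h)) / n
    g0 = gamma(0)
    return [gamma(h) / g0 for h in range(max_lag + 1)]  # Equation 8.5

x = [1.0, 2.0, 3.0, 4.0]
acf = sample_acf(x, 2)             # acf[0] is always 1.0
bounds = 1.96 / math.sqrt(len(x))  # significance bounds for this n
```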

8.3.2 Cross-Correlation Analysis

The cross-correlation function (CCF) can be derived between two time-series using a technique similar to the ACF applied to one time-series [27]. Cross-correlation is a measure of the degree of linear relationship existing between two data sets and can be used to study the connection between time-series. The CCF, however, can only be used if the time-series are stationary [27]. For example, when all variables are increasing in value over time, cross-correlation results will be spurious and subsequently cannot be used to study the relationship between time-series. Nonstationary time-series can be transformed to stationary time-series by implementing one of the following techniques:

- Differencing the time-series by a period d can yield a series that satisfies the assumption of stationarity (e.g., x_t − x_{t−1} for d = 1). The differenced series will contain one point less than the original series. Although a time-series can be differenced more than once, one difference is usually sufficient.

- Lower-order polynomials can be fitted to the series when the data contain a trend or seasonality that needs to be subtracted from the original series. Seasonal time-series can be represented as the sum of a specified trend, and seasonal and random terms. For statistical interpretation of the results, it is important to recognize the presence of seasonal components and remove them to avoid confusion with long-term trends. Figure 8.5 illustrates the seasonal trend decomposition method using locally weighted regression for the NDWI time-series [36].

- The logarithm or square root of the series may stabilize the variance in the case of a nonconstant variance.
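The effect of first differencing can be sketched on a synthetic trend-plus-seasonality series (the coefficients and period here are arbitrary, chosen only for illustration):

```python
import numpy as np

t = np.arange(120, dtype=float)                  # time in dekads
x = 0.05 * t + np.sin(2 * np.pi * t / 36)        # trend + seasonality: nonstationary
dx = np.diff(x)                                  # first difference, d = 1

slope_x = np.polyfit(t, x, 1)[0]                 # clear upward trend in the raw series
slope_dx = np.polyfit(t[1:], dx, 1)[0]           # trend essentially removed
assert len(dx) == len(x) - 1                     # one point less than the original
assert abs(slope_dx) < abs(slope_x)
```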

FIGURE 8.5 The results of the seasonal trend decomposition (STL) technique for the NDWI time-series of the Satara weather station. The panels show the data, seasonal, trend, and remainder components; the original series can be reconstructed by summing the seasonal, trend, and remainder. The y-axes indicate the NDWI values, and the x-axis is time in dekads. The gray bars at the right-hand side of the plots illustrate the relative data range of the time-series.

Figure 8.6 illustrates the cross-correlation plot for the stationary –KBDI and NDWI series. The –KBDI and NDWI time-series became stationary after differencing with d = 1. The stationarity was confirmed using the augmented Dickey–Fuller test [26,29] at a confidence level of 95% (p < 0.01; with stationarity as the alternative hypothesis). Note that approximately 95% confidence limits are shown for the autocorrelation plots of an independent series. These limits must be regarded with caution, since there exists an a priori expectation of serial correlation for time-series [37].
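A minimal additive decomposition in the spirit of Figure 8.5 can be sketched as follows. This is a naive moving-average stand-in, not the STL procedure of [36], and the series is synthetic:

```python
import numpy as np

def decompose(x, period):
    """Naive additive decomposition (a simplified stand-in for STL):
    centered moving-average trend, per-phase seasonal means, remainder."""
    n, k = len(x), period // 2
    trend = np.full(n, np.nan)
    for i in range(k, n - k):
        win = x[i - k:i + k + 1]
        # for an even period, use a centered MA with half weights at the ends
        trend[i] = win.mean() if period % 2 else (win.sum() - 0.5 * (win[0] + win[-1])) / period
    detrended = x - trend
    seasonal = np.array([np.nanmean(detrended[p::period]) for p in range(period)])
    seasonal -= seasonal.mean()                  # seasonal component sums to zero
    seasonal_full = np.resize(seasonal, n)
    remainder = x - trend - seasonal_full
    return trend, seasonal_full, remainder

t = np.arange(144, dtype=float)                  # four years of dekads
x = 0.01 * t + np.sin(2 * np.pi * t / 36)        # trend + seasonal cycle
trend, seasonal, remainder = decompose(x, 36)
ok = ~np.isnan(trend)
assert np.allclose(x[ok], (trend + seasonal + remainder)[ok])   # series = sum of components
```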

FIGURE 8.6 The cross-correlation plot between stationary –KBDI and NDWI time-series of the Satara weather station, where CCF indicates results of the cross-correlation function. The horizontal lines on the graph are the bounds (±1.96/√n) of the approximate 95% confidence interval.

Table 8.1 lists the coefficients of determination (i.e., multiple R2) of the OLS regression analysis, with serially correlated residuals, using –KBDI as the dependent and NDWI as the independent variable for all six weather stations in the study area. The Durbin–Watson statistic indicated that the residuals were serially correlated at a 95% confidence level (p < 0.01). These results will be compared with the method presented in Section 8.5. Table 8.1 also indicates the time lags at which the correlation between the time-series was maximal, as derived from the cross-correlation plot. A negative lag indicates that –KBDI reacts prior to NDWI, for example, in the cases of the Punda Maria and Shingwedzi weather stations, and subsequently can be used to predict NDWI. This is logical, since weather conditions, for example, rainfall and temperature, change before vegetation reacts. NDWI, which is related to the amount of water in the vegetation, consequently lags behind –KBDI. The major vegetation type in savanna is the herbaceous layer, which has a shallow rooting system. This explains why the vegetation in the study area quickly follows climatic changes and why NDWI did not lag behind –KBDI for the other four weather stations.
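The lag of maximum cross-correlation reported in Table 8.1 can be found with a sketch like the following. The three-dekad shift and the random series are fabricated for illustration, and the sign convention (positive peak lag means the first series leads) is an assumption of this sketch:

```python
import numpy as np

def sample_ccf(x, y, max_lag):
    """Sample cross-correlation r(h) pairing x[t] with y[t + h];
    a positive peak lag h means x leads y by h steps."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    n = len(x)
    out = {}
    for h in range(-max_lag, max_lag + 1):
        if h >= 0:
            out[h] = np.sum(x[:n - h] * y[h:]) / n
        else:
            out[h] = np.sum(x[-h:] * y[:n + h]) / n
    return out

rng = np.random.default_rng(3)
base = rng.normal(size=160)
driver = base[3:]        # e.g., a climate signal
response = base[:-3]     # the same signal delayed by three dekads
r = sample_ccf(driver, response, 6)
best = max(r, key=r.get)
assert best == 3         # the driver leads the response by three steps
```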

8.3.3 Time-Series Analysis: Relating Time-Series and Autoregression

A remedy for serial correlation, apart from applying variations in sampling strategy, is modeling of the time dependence in the error structure by autoregression (AR). AR is most often used for purposes of forecasting and modeling of a time-series [25]. The simplest AR model for Equation 8.2, where $r$ is the result of the sample ACF at lag 1, is

$$e_t = r\,e_{t-1} + \varepsilon_t \qquad (8.6)$$

TABLE 8.1
Coefficients of Determination of the OLS Regression Model between –KBDI and NDWI (n = 157)

Station         R2      Time Lag
Punda Maria     0.74    −1
Letaba          0.88     0
Onder Sabie     0.72     0
Pretoriuskop    0.31     0
Shingwedzi      0.72    −1
Satara          0.81     0

Note: The time lag, expressed in dekads, of maximum correlation of the cross-correlation between –KBDI and NDWI is also indicated.

where $\varepsilon_t$ is a series of serially independent numbers with mean zero and constant variance. If $r$ is not zero, the Gauss–Markov theorem cannot be applied and therefore OLS is not an efficient estimator of the model parameters [35]. Many different AR models are available in statistical software systems that incorporate time-series modules. One of the most frequently used models to account for serial correlation is the autoregressive integrated moving average (ARIMA) model [26,37]. Briefly stated, ARIMA models can have an AR term of order p, a differencing (integrating) term (I) of order d, and a moving average (MA) term of order q. The notation for specific models takes the form (p,d,q) [27]. The order of each term in the model is determined by examining the raw data and plots of the ACF of the data. For example, a second-order AR term (p = 2) would be appropriate if a series has significant autocorrelation coefficients between $x_t$ and $x_{t-1}$, and between $x_t$ and $x_{t-2}$. ARIMA models that are fitted to time-series data using AR and MA parameters p and q have coefficients $\Phi$ and $\theta$ to describe the serial correlation. An underlying assumption of ARIMA models is that the series being modeled is stationary [26,27].
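Equation 8.6 can be illustrated by simulating the AR(1) error process and recovering r from the lag-1 sample ACF (a Yule–Walker estimate for an AR(1) model). The coefficient 0.7, the series length, and the seed are arbitrary choices for this sketch:

```python
import numpy as np

# Simulate the error model of Equation 8.6, e_t = r * e_{t-1} + eps_t,
# and estimate r as the sample ACF at lag 1.
rng = np.random.default_rng(1)
r_true, n = 0.7, 2000
e = np.zeros(n)
for t in range(1, n):
    e[t] = r_true * e[t - 1] + rng.normal()

d = e - e.mean()
r_hat = np.sum(d[1:] * d[:-1]) / np.sum(d * d)   # lag-1 sample autocorrelation
assert abs(r_hat - r_true) < 0.1
```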

8.4 Methodology

The TIMESAT program is used to extract nonserially correlated metrics from remote sensing and meteorological time-series [38,39]. These metrics are utilized to study the relationship between time-series at specific moments in time. The relationship between time-series, in turn, is evaluated using statistical analysis of extracted nonserially correlated seasonal metrics from time-series (–KBDI and NDWI).

8.4.1 Data Smoothing

It often is necessary to generate smooth time-series from noisy satellite sensor or meteorological data to extract information on seasonality. The smoothing can be achieved by applying filters or by function fitting. Methods based on Fourier series [40–42] or least-squares fits to sinusoidal functions [43–45] are known to work well in most instances. These methods, however, are not capable of capturing the sudden, steep rises or decreases of remote sensing or meteorological data values that often occur in arid and semiarid environments. Alternative smoothing and fitting methods have been developed to overcome these problems [38]. An adaptive Savitzky–Golay filtering method, implemented in the TIMESAT processing package developed by Jönsson and Eklundh [39], is used in this chapter. The filter is based on local polynomial fits. Suppose we have a time-series $(t_i, y_i)$, $i = 1, 2, \ldots, N$. For each point $i$, a quadratic polynomial

$$f(t) = c_1 + c_2 t + c_3 t^2 \qquad (8.7)$$


is fitted to all $2k+1$ points in a window from $n = i - k$ to $m = i + k$ by solving the system of normal equations

$$A^{\mathsf T} A\,c = A^{\mathsf T} b \qquad (8.8)$$

where

$$A = \begin{pmatrix} w_n & w_n t_n & w_n t_n^2 \\ w_{n+1} & w_{n+1} t_{n+1} & w_{n+1} t_{n+1}^2 \\ \vdots & \vdots & \vdots \\ w_m & w_m t_m & w_m t_m^2 \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} w_n y_n \\ w_{n+1} y_{n+1} \\ \vdots \\ w_m y_m \end{pmatrix} \qquad (8.9)$$

The filtered value is set to the value of the polynomial at point i. Weights are designated as w in the above expression, and weights are assigned to all of the data values in the window. Data values that were flagged in the preprocessing are assigned weight zero in this application and thus do not influence the result; the clean data values all have weight one. Residual negatively biased noise (e.g., clouds) may occur in the remote sensing data, and accordingly the fitting was performed in two steps [6]. The first fit was conducted using weights obtained from the preprocessing. Data points above the resulting smoothed function from the first fit are regarded as more important, and in the second step the normal equations are solved with the weights of these data values increased by a factor of 2. This multistep procedure leads to a smoothed function that is adapted to the upper envelope of the data (Figure 8.7). Similarly, the ancillary metadata of the meteorological data from the preprocessing were also used in the iterative fitting to the upper envelope of the –KBDI time-series [6]. The width of the fitting window determines the degree of smoothing, but it also affects the ability to follow a rapid change. It is sometimes necessary to locally tighten the window even when the global setting of the window performs well. A typical situation occurs in savanna ecosystems, where vegetation, associated remote sensing, and meteorological indices respond rapidly to vegetation moisture dynamics. A small fitting window can be used to capture the corresponding sudden rise in data values.

FIGURE 8.7 The Savitzky–Golay filtering of NDWI (——) is performed in two steps. Firstly, the local polynomials are fitted using the weights from the preprocessing (a). Data points above the resulting smoothed function (– – –) from the first fit are attributed a greater importance. Secondly, the normal equations are solved with the weights of these data values increased by a factor 2 (b).
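The weighted local quadratic fit of Equations 8.8 and 8.9 can be sketched with NumPy's least-squares solver. The synthetic series and the flagged outlier below are invented for the example, and this is a sketch of the idea, not the TIMESAT implementation:

```python
import numpy as np

def weighted_quadratic_fit(t, y, w, i, k):
    """Solve the weighted normal equations A^T A c = A^T b (Equations 8.8 and 8.9)
    for a quadratic f(t) = c1 + c2*t + c3*t^2 over the window i-k..i+k,
    and return the filtered value f(t[i])."""
    lo, hi = max(0, i - k), min(len(t), i + k + 1)
    tw, yw, ww = t[lo:hi], y[lo:hi], w[lo:hi]
    A = ww[:, None] * np.column_stack([np.ones_like(tw), tw, tw ** 2])
    b = ww * yw
    c, *_ = np.linalg.lstsq(A, b, rcond=None)        # least-squares solution of Ac = b
    return c[0] + c[1] * t[i] + c[2] * t[i] ** 2

t = np.arange(21, dtype=float)
y = 0.5 * t + 1.0                                    # clean underlying signal
y[10] = 99.0                                         # a cloud-contaminated value
w = np.ones_like(y)
w[10] = 0.0                                          # flagged data get weight zero
assert abs(weighted_quadratic_fit(t, y, w, 10, 5) - 6.0) < 1e-8
```

Because the flagged point carries zero weight, its row contributes nothing to the normal equations, so the filtered value at that point is recovered from its clean neighbors.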

FIGURE 8.8 The filtering of NDWI (——) in (a) is done with a window that is too large to allow the filtered data (– – –) to follow sudden increases and decreases of underlying data values. The data in the window are scanned and if there is a large increase or decrease, an automatic decrease in the window size will result. The filtering is then repeated using the new locally adapted size (b). Note the improved fit at rising edges and narrow peaks.

The data in the window are scanned, and if a large increase or decrease is observed, the adaptive Savitzky–Golay method applies an automatic decrease in the window size. The filtering is then repeated using the new, locally adapted size. Savitzky–Golay filtering with and without the adaptive procedure is illustrated in Figure 8.8, which shows that the adaptation of the window improves the fit at the rising edges and at narrow seasonal peaks.

8.4.2 Extracting Seasonal Metrics from Time-Series and Statistical Analysis

Four seasonal metrics were extracted for each of the rainy seasons. Figure 8.9 illustrates the different metrics per season for the NDWI and –KBDI time-series. The beginning of a season, that is, the 20% left point of the rainy season, is defined from the final function fit as the point in time for which the index value has increased by 20% of the distance between the left minimum level and the maximum. The end of the season is defined in a similar way as the point 20% right of the rainy season. The 80% left and right points are defined as the points for which the function fit has increased to 80% of the distance between, respectively, the left and right minimum levels and the maximum.

FIGURE 8.9 The final fit of the Savitzky–Golay function (– – –) to the NDWI (a) and –KBDI (b) series (——), with the four defined metrics, that is, 20% left and right, and 80% left and right (&), overlaid on the graph. Points with flagged data errors (+) were assigned weights of zero and did not influence the fit. A dekad is defined as a 10-day period.

The technique used here to define the metrics is also used by Verbesselt et al. [6] to define the beginning of the fire season. The temporal occurrence and the value of each metric were extracted for further exploratory statistical analysis to study the relationship between time-series. The SPOT VGT S10 time-series consisted of four seasons (1998–2002), from which four occurrences and values per metric type were extracted. Twenty-four occurrence–value combinations per metric type were ultimately available for further analysis, since six weather stations were used. Serial correlation that occurs in remote sensing and climate-based time-series invalidates inferences made by standard parametric tests, such as the Student's t-test or the Pearson correlation. All extracted occurrence–value combinations per metric type were therefore tested for autocorrelation using the Ljung–Box autocorrelation test [26]. Robust nonparametric techniques, such as Wilcoxon's signed rank test, were used in the case of non-normally distributed data. The normality of the data was verified using the Shapiro–Wilk normality test [29]. Firstly, the distribution of the temporal occurrence of each metric was visualized and evaluated based on whether or not there was a significant difference between the temporal occurrences of the four metric types extracted from the –KBDI and NDWI time-series. Next, the strength and significance of the relationship between –KBDI and NDWI values of the four metric types were assessed with an OLS regression analysis.
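Extracting the 20% and 80% metrics from a fitted seasonal curve can be sketched as follows. The Gaussian-shaped season is synthetic, the single-peak assumption is an assumption of this sketch, and edge cases (multiple peaks, ties) are ignored:

```python
import numpy as np

def season_metrics(t, f):
    """20%/80% left/right metrics of a single-peak season from a fitted curve f:
    the times where f crosses min + frac*(max - min) on each side of the peak."""
    i_peak = int(np.argmax(f))
    left, right = f[:i_peak + 1], f[i_peak:]
    out = {}
    for frac in (0.2, 0.8):
        lev_l = left.min() + frac * (f[i_peak] - left.min())
        lev_r = right.min() + frac * (f[i_peak] - right.min())
        out[f"{int(frac * 100)}% left"] = t[int(np.argmax(left >= lev_l))]
        j = len(right) - 1 - int(np.argmax(right[::-1] >= lev_r))
        out[f"{int(frac * 100)}% right"] = t[i_peak + j]
    return out

t = np.arange(36, dtype=float)                   # one season, in dekads
f = np.exp(-0.5 * ((t - 18.0) / 5.0) ** 2)       # synthetic single-peak season
m = season_metrics(t, f)
assert m["20% left"] < m["80% left"] < m["80% right"] < m["20% right"]
```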

8.5 Results and Discussion

Figure 8.9 illustrates the optimized function fit and the defined metrics for the –KBDI and NDWI. Notice that the Savitzky–Golay function could properly describe the behavior of the different time-series. The function was fitted to the upper envelope of the data by using the uncertainty information derived during the preprocessing step. The results of the statistical analysis based on the extracted metrics for –KBDI and NDWI are presented below. The Ljung–Box statistic indicated that the extracted occurrences and values were not significantly autocorrelated at a 95% confidence level: all p-values were greater than 0.1, failing to reject the null hypothesis of independence.

8.5.1 Temporal Analysis of the Seasonal Metrics

Figure 8.10 illustrates the distribution of the temporal occurrence of the metrics extracted from the –KBDI and NDWI time-series. The occurrences of the extracted metrics were significantly non-normally distributed at a 95% confidence level (p > 0.1), indicating that Wilcoxon's signed rank test can be used. The Wilcoxon's signed rank test showed that the –KBDI and NDWI occurrences of the 80% left and right, and 20% right metrics were not significantly different from each other at a 95% confidence level (p > 0.1). This confirmed that –KBDI and NDWI were temporally related. It also corroborated the results of Burgan [11] and Ceccato et al. [4], who found that both –KBDI and NDWI were related to the seasonal vegetation moisture dynamics, as measured by VWC.

FIGURE 8.10 Box plots of the temporal occurrence of the four defined metrics, that is, 20% left and right, and 80% left and right, extracted from time-series of –KBDI and NDWI. The dekads (10-day periods) are shown on the y-axis and are indicative of the temporal occurrence of the metric. The upper and lower boundaries of the boxes indicate upper and lower quartiles. The median is indicated by the solid line (—) within each box. The whiskers connect the extremes of the data, which were defined as 1.5 times the inter-quartile range. Outliers are represented by (o).

Figure 8.10, however, also illustrates that the start of the rainy season (i.e., the 20% left occurrence), derived from the –KBDI and NDWI time-series, was different. The Wilcoxon's

signed rank test confirmed that the –KBDI and NDWI differed significantly from each other at a 95% confidence level (p < 0.01). This phenomenon can be explained by the fact that vegetation in the study area starts growing before the rainy season starts, due to an early change in air temperature (N. Govender, Scientific Service Kruger National Park, South Africa, personal communication). This explained why the NDWI reacted before the change in climatic conditions as measured by the –KBDI, given that the NDWI is directly related to vegetation moisture dynamics [4].

8.5.2 Regression Analysis Based on Values of Extracted Seasonal Metrics

The assumptions of the OLS regression models between values of metrics extracted from the –KBDI and NDWI time-series were verified. The Wald test statistic showed nonlinearity to be not significant at a 95% confidence level (p > 0.15). The Shapiro–Wilk normality test confirmed that the residuals were normally distributed at a 95% confidence level (p < 0.01) [6,29]. Table 8.2 illustrates the results of the OLS regression analysis between values of metrics extracted from the –KBDI and NDWI time-series. The values extracted at the 20% right position of the –KBDI and NDWI time-series showed a significant relationship at a 95% confidence level (p < 0.01). The other extracted metrics did not exhibit significant relationships at a 95% confidence level (p > 0.1). A significant relationship between the –KBDI and NDWI time-series was thus observed only at the moment when the savanna vegetation was completely cured (i.e., the 20% right-hand side). The savanna vegetation therefore reacted differently to changes in climate parameters such as rainfall and temperature, as measured by the KBDI, depending on the phenological growing cycle. This phenomenon can be explained by the fact that a living plant uses defense mechanisms to protect itself from drying out, while a cured plant responds directly to climatic conditions [46]. These results consequently indicate that the relationship between extracted values of –KBDI and NDWI was influenced by seasonality. This corroborates the results of Ji and Peters [23], who indicated that seasonality had a significant effect on the relationship between vegetation, as measured by a remote sensing index, and a drought index. These results further illustrate that the seasonal effect needs to be taken into account when regression techniques are used to quantify the relationship between time-series related to vegetation moisture dynamics. The seasonal effect can also be accounted for by utilizing autoregression models with seasonal dummy variables, which take the effect of serial correlation and seasonality into account [23,26].
However, the proposed method of accounting for serial correlation by sampling at specific moments in time had an additional advantage: besides taking serial correlation into account, it allowed the influence of seasonality to be studied by extracting metrics at the specified moments. Furthermore, a comparison of the results in Table 8.1 and Table 8.2 showed that serial correlation caused an overestimation of the correlation coefficients. All the coefficients of determination (R2) in Table 8.1 were significant, with an average value of 0.7, while in Table 8.2 only the coefficient of determination at the end of the rainy season (20% right-hand side) was significant (R2 = 0.49). This confirmed the importance of accounting for serial correlation and seasonality in the residuals of a regression model when studying the relationship between two time-series.
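The coefficients of determination in Table 8.1 and Table 8.2 are standard OLS R² values. A minimal sketch on synthetic data (the sample size n = 24 matches the metric samples; the coefficients and seed are arbitrary):

```python
import numpy as np

def ols_r2(x, y):
    """Coefficient of determination (R^2) of a simple OLS fit y = a + b*x."""
    b, a = np.polyfit(x, y, 1)
    ss_res = np.sum((y - (a + b * x)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(7)
x = rng.normal(size=24)                          # n = 24, as for each metric type
y = 0.8 * x + rng.normal(scale=0.5, size=24)
r2 = ols_r2(x, y)
# for simple OLS, R^2 equals the squared Pearson correlation
assert np.isclose(r2, np.corrcoef(x, y)[0, 1] ** 2)
```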

8.5.3 Time-Series Analysis Techniques

TABLE 8.2
Coefficients of Determination of the OLS Regression Models (NDWI –KBDI) for the Four Extracted Seasonal Metric Values between –KBDI and NDWI Time-Series (n = 24 per Metric)

Metric        R2      p-Value
20% left      0.01    0.66
20% right     0.49    <0.01
80% left      0.00    0.97
80% right     0.01    0.61

Time-series analysis models are most often used for purposes of describing current conditions and forecasting [25]. The models use the serial correlation in time-series as a tool to relate temporal observations. Future observations can be predicted by modeling the serial correlation structure of a time-series [26]. Time-series analysis techniques (e.g., ARIMA) can be used to model a time-series by using other, independent time-series [27]. ARIMA subsequently can be used to study the relationship between –KBDI and NDWI. There are, however, constraints that have to be considered before ARIMA models can be applied to a time-series (e.g., a drought index) by using other predictor time-series (e.g., a satellite index). Firstly, the goodness-of-fit of an ARIMA model will not be significant when changes in the satellite index precede or coincide with those in the drought index. The CCF can be used in this context to verify how the time-series are related to each other. The time-lag results in Table 8.1 indicate that –KBDI (the drought index) precedes or coincides with the NDWI time-series. This illustrates that ARIMA models cannot directly be used to predict –KBDI with NDWI as the predictor variable. Consequently, other, more advanced time-series analysis techniques are needed to model vegetation dynamics, because these dynamics will precede or coincide with the dynamics monitored by remote sensing indices in most cases. Such advanced techniques, however, are not discussed here, since they are outside the scope of this chapter. Secondly, the availability of data for time-series analysis is limited, namely, to the period from 1998 to 2002. This is an important constraint because two separate data sets are needed to parameterize and evaluate an ARIMA model. One set is needed for parameterization, while the other is used to forecast and validate the ARIMA model through comparison of the observed and expected values.
Accordingly, it is necessary to interpolate missing satellite data that were masked out during preprocessing to ensure adequate data are available for parameterization. Thirdly, the proposed sampling strategy made investigation of the time lag and correlation at a defined instant in time possible, as opposed to ARIMA or cross-correlation analysis, through which only the overall relationship between time-series can be studied [25]. The applied sampling strategy is thus ideally suited to study the relationship between time-series of climate and remote sensing data, characterized by seasonality and serial correlation. The sampling of seasonal metrics minimized the influence of serial correlation, thereby making the study of seasonality possible.

8.6 Conclusions

Serial correlation problems are not unknown in the field of statistical or general meteorology. However, the presence of serial correlation, found during analysis of a variable sampled sequentially at regular time intervals, seems to be disregarded by many agricultural meteorologists and remote sensing scientists. This is true despite abundant documentation available in the traditional meteorological and statistical literature. Therefore, an overview of the most important time-series analysis techniques and concepts was presented, namely, stationarity, autocorrelation, differencing, decomposition, autoregression, and ARIMA. A method was proposed to study the relationship between a meteorological drought index (KBDI) and remote sensing index (NDWI), both related to vegetation moisture dynamics, by accounting for the serial correlation effect. The relationship between –KBDI and NDWI was studied by extracting nonserially correlated seasonal metrics, for example, 20% and 80% left- and right-hand side metrics of the rainy season, based on a Savitzky–Golay fit to the upper envelope of the time-series. Serial correlation between the


extracted metrics was shown to be minimal and seasonality was an important factor influencing the relationship between NDWI and –KBDI time-series. Statistical analysis using the temporal occurrence of the extracted metrics revealed that NDWI and –KBDI time-series are temporally connected, except at the beginning of the rainy season. The fact that the savanna vegetation starts re-greening before the start of the rainy season explains this inability to detect the beginning of the rainy season. The values of the extracted seasonal metrics of NDWI and –KBDI were significantly related only at the end of the rainy season, namely, at the 20% right-hand side value of the fitted curve. The savanna vegetation at the end of the rainy season was cured and responded strongly to changes in climatic conditions monitored by the –KBDI, such as rain and temperature. The relationship between –KBDI and NDWI consequently changes during the season, which indicates that seasonality is an important factor that needs to be taken into account. Moreover, it was shown that correlation coefficients estimated by OLS regression analysis were overestimated due to the influence of serial correlation in the residuals. This confirmed the importance of taking serial correlation of the residuals into account by sampling nonserially correlated seasonal metrics when studying the relationship between time-series. The serial correlation effect consequently was taken into account by the extraction of seasonal metrics from time-series. The seasonal metrics in turn could be used to study the relationship between remote sensing and ground-based time-series, such as meteorological or field measurements. A better understanding of the relationship between remote sensing and in situ observations at regular time intervals will contribute to the use of remotely sensed data for the development of an index that represents seasonal vegetation moisture dynamics.

Acknowledgments

The SPOT VGT S10 data sets were generated by the Flemish Institute for Technological Development. The climate data were provided by the Weather Services of South Africa, whereas the National Land Cover Map (1995) was supplied by the Agricultural Research Centre of South Africa. We acknowledge the support of L. Eklundh, as well as the Crafoord and Bergvall foundations. The authors wish to thank N. Govender from the Scientific Services of Kruger National Park for scientific input.

References

1. Camia, A. et al., Meteorological fire danger indices and remote sensing, in Remote Sensing of Large Wildfires in the European Mediterranean Basin, E. Chuvieco, Ed., Springer-Verlag, New York, 1999, p. 39.
2. Ceccato, P. et al., Estimation of live fuel moisture content, in Wildland Fire Danger Estimation and Mapping: The Role of Remote Sensing Data, E. Chuvieco, Ed., World Scientific Publishing, New York, 2003, p. 63.
3. Dennison, P.E. et al., Modeling seasonal changes in live fuel moisture and equivalent water thickness using a cumulative water balance index, Remote Sens. Environ., 88, 442, 2003.
4. Ceccato, P. et al., Detecting vegetation leaf water content using reflectance in the optical domain, Remote Sens. Environ., 77, 22, 2001.
5. Jackson, T.J. et al., Vegetation water content mapping using Landsat data derived normalized difference water index for corn and soybeans, Remote Sens. Environ., 92, 475, 2004.
6. Verbesselt, J. et al., Evaluating satellite and climate data derived indices as fire risk indicators in savanna ecosystems, IEEE Trans. Geosci. Remote Sens., 44, 1622–1632, 2006.
7. Van Wilgen, B.W. et al., Response of savanna fire regimes to changing fire-management policies in a large African national park, Conserv. Biol., 18, 1533, 2004.
8. Dimitrakopoulos, A.P. and Bemmerzouk, A.M., Predicting live herbaceous moisture content from a seasonal drought index, Int. J. Biometeorol., 47, 73, 2003.
9. Keetch, J.J. and Byram, G.M., A drought index for forest fire control, U.S.D.A. Forest Service, Asheville, NC, SE-38, 1988.
10. Janis, M.J., Johnson, M.B., and Forthun, G., Near-real time mapping of Keetch–Byram drought index in the south-eastern United States, Int. J. Wildland Fire, 11, 281, 2002.
11. Burgan, E.R., Correlation of plant moisture in Hawaii with the Keetch–Byram drought index, USDA Forest Service Research Note, PSW-307, 1976.
12. Chuvieco, E. et al., Combining NDVI and surface temperature for the estimation of live fuel moisture content in forest fire danger rating, Remote Sens. Environ., 92, 322, 2004.
13. Maki, M., Ishiahra, M., and Tamura, M., Estimation of leaf water status to monitor the risk of forest fires by using remotely sensed data, Remote Sens. Environ., 90, 441, 2004.
14. Paltridge, G.W. and Barber, J., Monitoring grassland dryness and fire potential in Australia with NOAA AVHRR data, Remote Sens. Environ., 25, 381, 1988.
15. Chladil, M.A. and Nunez, M., Assessing grassland moisture and biomass in Tasmania—the application of remote-sensing and empirical-models for a cloudy environment, Int. J. Wildland Fire, 5, 165, 1995.
16. Hardy, C.C. and Burgan, R.E., Evaluation of NDVI for monitoring live moisture in three vegetation types of the western US, Photogramm. Eng. Remote Sens., 65, 603, 1999.
17. Chuvieco, E. et al., Estimation of fuel moisture content from multitemporal analysis of LANDSAT thematic mapper reflectance data: applications in fire danger assessment, Int. J. Remote Sens., 23, 2145, 2002.
18. Fensholt, R. and Sandholt, I., Derivation of a shortwave infrared water stress index from MODIS near- and shortwave infrared data in a semiarid environment, Remote Sens. Environ., 87, 111, 2003.
19. Hunt, E.R., Rock, B.N., and Nobel, P.S., Measurement of leaf relative water content by infrared reflectance, Remote Sens. Environ., 22, 429, 1987.
20. Ceccato, P., Flasse, S., and Gregoire, J.M., Designing a spectral index to estimate vegetation water content from remote sensing data—Part 2: Validation and applications, Remote Sens. Environ., 82, 198, 2002.
21. Myneni, R.B. et al., Increased plant growth in the northern high latitudes from 1981 to 1991, Nature, 386, 698, 1997.
22. Eklundh, L., Estimating relations between AVHRR NDVI and rainfall in East Africa at 10-day and monthly time scales, Int. J. Remote Sens., 19, 563, 1998.
23. Ji, L. and Peters, A.J., Assessing vegetation response to drought in the northern Great Plains using vegetation and drought indices, Remote Sens. Environ., 87, 85, 2003.
24. De Beurs, K.M. and Henebry, G.M., Land surface phenology, climatic variation, and institutional change: analyzing agricultural land cover change in Kazakhstan, Remote Sens. Environ., 89, 497, 2004.
25. Meek, D.W. et al., A note on recognizing autocorrelation and using autoregression, Agricult. Forest Meteorol., 96, 9, 1999.
26. Brockwell, P.J. and Davis, R.A., Introduction to Time Series and Forecasting, 2nd ed., Springer-Verlag, New York, 2002, p. 31.
27. Ford, C.R. et al., Modeling canopy transpiration using time series analysis: a case study illustrating the effect of soil moisture deficit on Pinus taeda, Agricult. Forest Meteorol., 130, 163, 2005.
28. Montanari, A., Deseasonalisation of hydrological time series through the normal quantile transform, J. Hydrol., 313, 274, 2005.
29. R Development Core Team, R: a language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, Austria, ISBN 3-900051-07-0, URL http://www.R-project.org, 2005.
30. Rahman, H. and Dedieu, G., SMAC—a simplified method for the atmospheric correction of satellite measurements in the solar spectrum, Int. J. Remote Sens., 15, 123, 1994.
31. Stroppiana, D. et al., An algorithm for mapping burnt areas in Australia using SPOT-VEGETATION data, IEEE Trans. Geosci. Remote Sens., 41, 907, 2003.
32. Thompson, M.A., Standard land-cover classification scheme for remote-sensing applications in South Africa, South Afr. J. Sci., 92, 34, 1996.
33. Aguado, I. et al., Assessment of forest fire danger conditions in southern Spain from NOAA images and meteorological indices, Int. J. Remote Sens., 24, 1653, 2003.
34. von Storch, H. and Zwiers, F.W., Statistical Analysis in Climate Research, Cambridge University Press, Cambridge, 1999, p. 483.
35. Thejll, P. and Schmith, T., Limitations on regression analysis due to serially correlated residuals: application to climate reconstruction from proxies, J. Geophys. Res. Atmos., 110, 2005.
36. Cleveland, R.B. et al., STL: a seasonal-trend decomposition procedure based on Loess, J. Off. Stat., 6, 3, 1990.
37. Venables, W.N. and Ripley, B.D., Modern Applied Statistics with S, 4th ed., Springer-Verlag, New York, 2003, p. 493.
38. Jönsson, P. and Eklundh, L., Seasonality extraction by function fitting to time-series of satellite sensor data, IEEE Trans. Geosci. Remote Sens., 40, 1824, 2002.
39. Jönsson, P. and Eklundh, L., TIMESAT—a program for analyzing time-series of satellite sensor data, Comput. Geosci., 30, 833, 2004.
40. Menenti, M. et al., Mapping agroecological zones and time-lag in vegetation growth by means of Fourier-analysis of time-series of NDVI images, Adv. Space Res., 13, 233, 1993.
41. Azzali, S. and Menenti, M., Mapping vegetation–soil–climate complexes in Southern Africa using temporal Fourier analysis of NOAA-AVHRR NDVI data, Int. J. Remote Sens., 21, 973, 2000.
42. Olsson, L. and Eklundh, L., Fourier-series for analysis of temporal sequences of satellite sensor imagery, Int. J. Remote Sens., 15, 3735, 1994.
43. Cihlar, J., Identification of contaminated pixels in AVHRR composite images for studies of land biosphere, Remote Sens. Environ., 56, 149, 1996.
44. Sellers, P.J. et al., A global 1-degree-by-1-degree NDVI data set for climate studies. The generation of global fields of terrestrial biophysical parameters from the NDVI, Int. J. Remote Sens., 15, 3519, 1994.
45. Roerink, G.J., Menenti, M., and Verhoef, W., Reconstructing cloudfree NDVI composites using Fourier analysis of time series, Int. J. Remote Sens., 21, 1911, 2000.
46. Pyne, S.J., Andrews, P.L., and Laven, R.D., Introduction to Wildland Fire, 2nd ed., John Wiley & Sons, New York, 1996, p. 117.

9 Use of a Prediction-Error Filter in Merging High- and Low-Resolution Images

Sang-Ho Yun and Howard Zebker

CONTENTS
9.1 Image Descriptions ... 174
    9.1.1 TOPSAR DEM ... 174
    9.1.2 SRTM DEM ... 175
9.2 Image Registration ... 176
9.3 Artifact Elimination ... 177
9.4 Prediction-Error (PE) Filter ... 178
    9.4.1 Designing the Filter ... 178
    9.4.2 1D Example ... 179
    9.4.3 The Effect of the Filter ... 179
9.5 Interpolation ... 180
    9.5.1 PE Filter Constraint ... 180
    9.5.2 SRTM DEM Constraint ... 181
    9.5.3 Inversion with Two Constraints ... 181
    9.5.4 Optimal Weighting ... 181
    9.5.5 Simulation of the Interpolation ... 183
9.6 Interpolation Results ... 183
9.7 Effect on InSAR ... 185
9.8 Conclusion ... 187
References ... 188

A prediction-error (PE) filter is an array of numbers designed to interpolate missing parts of data such that the interpolated parts have the same spectral content as the existing parts. The data can be a one-dimensional time series, a two-dimensional image, or a three-dimensional quantity such as a subsurface material property. In this chapter, we discuss the application of a PE filter to recover missing parts of an image when a low-resolution image of the missing parts is available. One research issue with PE filters is improving the quality of image interpolation for nonstationary images, in which the spectral content varies with position. Digital elevation models (DEMs) are in general nonstationary; thus, a PE filter alone cannot guarantee the success of image recovery. However, the quality of the recovery of a high-resolution image can be improved with an independent data set, such as a low-resolution image that has valid pixels in the missing regions of the high-resolution image. Using a DEM as an example image, we introduce a systematic method to use a PE filter incorporating the low-resolution image as an additional constraint, and show the improved quality of the image interpolation.

High-resolution DEMs are often limited in spatial coverage; they also may possess systematic artifacts when compared to comprehensive low-resolution maps. We correct artifacts and interpolate regions of missing data in topographic synthetic aperture radar (TOPSAR) DEMs using a low-resolution shuttle radar topography mission (SRTM) DEM. PE filters are then used to interpolate and fill missing data so that the interpolated regions have the same spectral content as the valid regions of the TOPSAR DEM. The SRTM DEM is used as an additional constraint in the interpolation. Using cross-validation methods, one can obtain the optimal weighting for the PE filter and SRTM DEM constraints.

9.1 Image Descriptions

InSAR is a powerful tool for generating DEMs [1]. The TOPSAR and SRTM sensors are primary sources for the academic community for DEMs derived from single-pass interferometric data. Differences in system parameters such as altitude and swath width (Table 9.1) result in very different properties for derived DEMs. Specifically, TOPSAR DEMs have better resolution, while SRTM DEMs have better accuracy over larger areas. TOPSAR coverage is often not spatially complete.

9.1.1 TOPSAR DEM

TOPSAR DEMs are produced from cross-track interferometric data acquired with NASA's AIRSAR system mounted on a DC-8 aircraft. Although the TOPSAR DEMs have a higher resolution than other existing data, they sometimes suffer from artifacts and missing data due to roll of the aircraft, layover, and flight planning limitations. The DEMs derived from the SRTM have lower resolution, but fewer artifacts and missing data than TOPSAR DEMs. Thus, the former often provides information in the missing regions of the latter. We illustrate joint use of these data sets using DEMs acquired over the Galápagos Islands. Figure 9.1 shows the TOPSAR DEM used in this study. The DEM covers Sierra Negra volcano on the island of Isabela. Recent InSAR observations reveal that the volcano has been deforming relatively rapidly [2,3]. InSAR analysis can require use of a DEM to produce a simulated interferogram required to isolate ground deformation. The effect of artifact elimination and interpolation for deformation studies is discussed later in this chapter.

TABLE 9.1
TOPSAR Mission versus SRTM Mission

Mission              TOPSAR           SRTM
Platform             DC-8 aircraft    Space shuttle
Nominal altitude     9 km             233 km
Swath width          10 km            225 km
Baseline             2.583 m          60 m
DEM resolution       10 m             90 m
DEM coord. system    None             Lat/Long

FIGURE 9.1 The original TOPSAR DEM of Sierra Negra volcano in the Galápagos Islands (inset for location). The pixel spacing of the image is 10 m. The boxed areas are used for illustration later in this chapter. Note that there are a number of regions of missing data with various shapes and sizes. Artifacts are not identifiable due to the variation in topography. (From Yun, S.-H., Ji, J., Zebker, H., and Segall, P., IEEE Trans. Geosci. Rem. Sens., 43(7), 1682, 2005. With permission.)

The TOPSAR DEMs have a pixel spacing of about 10 m, sufficient for most geodetic applications. However, regions of missing data are often encountered (Figure 9.1), and significant residual artifacts are found (Figure 9.2). The regions of missing data are caused by layover of the steep volcanoes and by flight planning limitations. The artifacts are large-scale and systematic, and most likely due to uncompensated roll of the DC-8 aircraft [4]. Attempts to compensate for this motion include models of piecewise-linear imaging geometry [5] and estimation of imaging parameters that minimize the difference between the TOPSAR DEM and an independent reference DEM [6]. We use a nonparameterized direct approach, subtracting the difference between the TOPSAR and SRTM DEMs.

9.1.2 SRTM DEM

The recent SRTM mission produced nearly worldwide topographic data at 90 m posting. SRTM topographic data are in fact produced at 30 m posting (1 arcsec); however, high-resolution data sets for areas outside of the United States are not available to the public at this time, and only DEMs at 90 m posting (3 arcsec) are available for download. For many analyses, finer scale elevation data are required. For example, a typical pixel spacing in a spaceborne SAR image is 20 m. If the SRTM DEMs are used for topography removal in spaceborne interferometry, the pixel spacing of the final interferograms would be limited by the topography data to at best 90 m. Despite the lower resolution, the SRTM DEM is useful because it has fewer motion-induced artifacts than the TOPSAR DEM. It also has fewer data holes. The merits and demerits of the two DEMs are in many ways complementary. Thus, a proper data fusion method can overcome the shortcomings of each and produce a new DEM that combines the strengths of the two data sets: a DEM that has a

FIGURE 9.2 (See color insert following page 302.) (a) TOPSAR DEM and (b) SRTM DEM. The tick labels are pixel numbers. Note the difference in pixel spacing between the two DEMs. (c) Artifacts obtained by subtracting the SRTM DEM from the TOPSAR DEM. The flight direction and the radar look direction of the aircraft associated with the swath with the artifact are indicated with long and short arrows, respectively. Note that the artifacts appear in one entire TOPSAR swath, while they are not as serious in other swaths.

resolution of the TOPSAR DEM and large-scale reliability of the SRTM DEM. In this chapter, we present an interpolation method that uses both TOPSAR and SRTM DEMs as constraints.

9.2 Image Registration

The original TOPSAR DEM, while in ground-range coordinates, is not georeferenced. Thus, we register the TOPSAR DEM to the SRTM DEM, which is already registered in a latitude–longitude coordinate system. The image registration is carried out between the DEM data sets using an affine transformation. Although the TOPSAR DEM is not georeferenced, it is already in a ground coordinate system; thus, scaling and rotation are the two most important components. We have found the skewing component to be negligible, and any higher-order transformation between the two DEMs would also be negligible. The affine transformation is as follows:

\begin{bmatrix} x_S \\ y_S \end{bmatrix} =
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
\begin{bmatrix} x_T \\ y_T \end{bmatrix} +
\begin{bmatrix} e \\ f \end{bmatrix} \qquad (9.1)

where (x_S, y_S) and (x_T, y_T) are tie points in the SRTM and TOPSAR DEM coordinate systems, respectively. Since [a b e] and [c d f] are estimated separately, at least three tie points are required to uniquely determine them. We picked 10 tie points from each DEM based on topographic features and solved for the six unknowns in a least-squares sense. Given the six unknowns, we choose new georeferenced sample locations that are uniformly spaced, such that every ninth sample location corresponds to a sample location of the SRTM DEM. Those sample locations are mapped from (x_S, y_S) to (x_T, y_T); then the nearest TOPSAR DEM value is selected and put into the corresponding new georeferenced sample location. The intermediate values are filled in from the TOPSAR map to produce the georeferenced 10-m data set. It should be noted that it is not easy to determine tie points in DEM data sets; enhancing the contrast of the DEMs facilitated the process. In general, fine registration is important for correctly merging different data sets. The two DEMs in this study have different pixel spacings, and it is difficult to pick tie points with higher precision than the pixel spacing of the coarser image. In our method, however, the SRTM DEM (the coarser image) is treated as an averaged image of the TOPSAR DEM (the finer image): in our inversion, only the 9-by-9 averaged values of the TOPSAR DEM are compared with the pixel values of the SRTM DEM. Thus, fine registration is less critical in this approach than in a case where a one-to-one match is required.
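The two rows of Equation 9.1 can be fit independently by linear least squares. A minimal sketch in Python; the function name and the synthetic tie points are illustrative, not from the authors' implementation:

```python
import numpy as np

def fit_affine(tiepoints_T, tiepoints_S):
    """Least-squares affine fit mapping TOPSAR (x_T, y_T) to SRTM (x_S, y_S).

    tiepoints_T, tiepoints_S: (N, 2) arrays of matched coordinates, N >= 3.
    Returns the rows (a, b, e) and (c, d, f) of Equation 9.1.
    """
    T = np.asarray(tiepoints_T, dtype=float)
    S = np.asarray(tiepoints_S, dtype=float)
    # Design matrix [x_T, y_T, 1]; the two rows are estimated separately,
    # as in the text.
    G = np.column_stack([T[:, 0], T[:, 1], np.ones(len(T))])
    abe, *_ = np.linalg.lstsq(G, S[:, 0], rcond=None)
    cdf, *_ = np.linalg.lstsq(G, S[:, 1], rcond=None)
    return abe, cdf

# Synthetic check: recover a known transform from four tie points.
true_abe = np.array([1.1, 0.02, 5.0])
true_cdf = np.array([-0.01, 0.9, -3.0])
T = np.array([[0, 0], [100, 0], [0, 100], [100, 100]], dtype=float)
S = np.column_stack([T @ true_abe[:2] + true_abe[2],
                     T @ true_cdf[:2] + true_cdf[2]])
abe, cdf = fit_affine(T, S)
```

With the 10 tie points used in the chapter, the same call simply has N = 10 and the fit is overdetermined rather than exact.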

9.3 Artifact Elimination

Examination of the georeferenced TOPSAR DEM (Figure 9.2a) shows motion artifacts when compared to the SRTM DEM (Figure 9.2b). The artifacts are not clearly discernible in Figure 9.2a because their magnitude is small in comparison to the overall data values. The artifacts are identified by downsampling the registered TOPSAR DEM and subtracting the SRTM DEM. Large-scale anomalies that fluctuate periodically over an entire swath are visible in Figure 9.2c. The periodic pattern is most likely due to uncompensated roll of the DC-8 aircraft. Spaceborne data are less likely to exhibit similar artifacts, because the spacecraft is not greatly affected by the atmosphere. Note that the width of the anomalies corresponds to the width of a TOPSAR swath. Because the SRTM swath is much larger than that of the TOPSAR system (Table 9.1), a larger area is covered under consistent conditions, reducing the number of parallel tracks required to form an SRTM DEM. The maximum amplitude of the motion artifacts in our study area is about 20 m. This would result in substantial errors in many analyses if not properly corrected. For example, if this TOPSAR DEM were used for topography reduction in repeat-pass InSAR using ERS-2 data with a perpendicular baseline of about 400 m, the resulting deformation interferogram would contain one fringe (= 2.8 cm) of spurious signal. To remove these artifacts from the TOPSAR DEM, we upsample the difference image with bilinear interpolation by a factor of 9 so that its pixel spacing matches the TOPSAR DEM. The difference image is then subtracted from the TOPSAR DEM. This process is described with a flow diagram in Figure 9.3. Note that the lower branch

Signal and Image Processing for Remote Sensing

178

FIGURE 9.3 The flow diagram of the artifact elimination: the TOPSAR DEM is 9-by-9 averaged and the SRTM DEM subtracted; the difference is upsampled by bilinear interpolation (factor 9) and subtracted from the original TOPSAR DEM to give the corrected TOPSAR DEM. (From Yun, S.-H., Ji, J., Zebker, H., and Segall, P., IEEE Trans. Geosci. Rem. Sens., 43(7), 1682, 2005. With permission.)

undergoes two low-pass filter operations when averaging and bilinear interpolation are implemented, while the upper branch preserves the high frequency contents of the TOPSAR DEM. In this way we can eliminate the large-scale artifacts while retaining details in the TOPSAR DEM.
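The Figure 9.3 flow can be sketched as follows, assuming NumPy/SciPy, a registered TOPSAR grid exactly 9 times finer than the SRTM grid, and an illustrative function name:

```python
import numpy as np
from scipy.ndimage import zoom

def remove_roll_artifacts(topsar, srtm):
    """Sketch of the Figure 9.3 flow: 9-by-9 block-average the TOPSAR DEM,
    subtract the SRTM DEM, bilinearly upsample the (low-pass) difference
    by a factor of 9, and subtract it from the full-resolution TOPSAR DEM."""
    r, c = srtm.shape
    # 9-by-9 block average of the registered TOPSAR DEM (lower branch).
    avg9 = topsar[:9 * r, :9 * c].reshape(r, 9, c, 9).mean(axis=(1, 3))
    diff = avg9 - srtm                   # large-scale motion artifacts
    artifacts = zoom(diff, 9, order=1)   # bilinear upsampling, factor 9
    return topsar[:9 * r, :9 * c] - artifacts

# A flat 100 m SRTM tile and a TOPSAR tile with a constant 20 m roll bias:
srtm = np.full((4, 4), 100.0)
topsar = np.full((36, 36), 120.0)
corrected = remove_roll_artifacts(topsar, srtm)
```

Because only the smoothed difference is subtracted, high-frequency TOPSAR detail passes through unchanged, which is the point of the upper branch.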

9.4 Prediction-Error (PE) Filter

The next step in the DEM process is to fill in missing data. We use a PE filter operating on the TOPSAR DEM to fill these gaps. The basic idea of the PE filter constraint [7,8] is that missing data can be estimated so that the restored data yield minimum energy when the PE filter is applied. The PE filter is derived from training data, which are normally valid data surrounding the missing regions. The PE filter is selected so that the missing data and the valid data share the same spectral content. Hence, we assume that the spectral content of the missing data in the TOPSAR DEM is similar to that of the regions with valid data surrounding the missing regions.

9.4.1 Designing the Filter

We generate a PE filter such that it rejects data with statistics found in the valid regions of the TOPSAR DEM. Given this PE filter, we solve for data in the missing regions such that the interpolated data are also nullified by the PE filter. This concept is illustrated in Figure 9.4. The PE filter, f_PE, is found by minimizing the following objective function,

\| f_{PE} * x_e \|^2 \qquad (9.2)

where x_e is the existing data from the TOPSAR DEM, and * represents convolution. This expression can be rewritten in linear algebraic form using the following matrix operation:

\| F_{PE}\, x_e \|^2 \qquad (9.3)

or equivalently

\| X_e\, f_{PE} \|^2 \qquad (9.4)

where F_{PE} and X_e are the matrix representations of f_PE and x_e for the convolution operation. These matrix and vector expressions are used to indicate their linear relationship.

FIGURE 9.4 Concept of the PE filter: x_e * f_PE = 0 + ε₁ and x_m * f_PE = 0 + ε₂. The PE filter is estimated by solving an inverse problem constrained with the remaining part, and the missing part is estimated by solving another inverse problem constrained with the filter. The ε₁ and ε₂ are white noise with small amplitude.

9.4.2 1D Example

The procedure of acquiring the PE filter can be explained with a 1D example. Suppose that a data set x = [x_1, ..., x_n] (where n ≥ 3) is given, and we want to compute a PE filter of length 3, f_PE = [1 f_1 f_2]^T. Then we form a system of linear equations as follows:

\begin{bmatrix}
x_3 & x_2 & x_1 \\
x_4 & x_3 & x_2 \\
\vdots & \vdots & \vdots \\
x_n & x_{n-1} & x_{n-2}
\end{bmatrix}
\begin{bmatrix} 1 \\ f_1 \\ f_2 \end{bmatrix} \approx 0 \qquad (9.5)

The first element of the PE filter should be equal to one to avoid the trivial solution, f_PE = 0. Note that Equation 9.5 is the convolution of the data and the PE filter. After simple algebra, and with

d = \begin{bmatrix} x_3 \\ \vdots \\ x_n \end{bmatrix}
\quad \text{and} \quad
D = \begin{bmatrix} x_2 & x_1 \\ \vdots & \vdots \\ x_{n-1} & x_{n-2} \end{bmatrix}

we get

D \begin{bmatrix} f_1 \\ f_2 \end{bmatrix} \approx -d \qquad (9.6)

and its normal equation becomes

\begin{bmatrix} f_1 \\ f_2 \end{bmatrix} = (D^T D)^{-1} D^T (-d) \qquad (9.7)

Note that Equation 9.7 minimizes Equation 9.2 in a least-squares sense. This procedure can be extended to 2D problems; more details are described in Refs. [7] and [8].
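The 1D procedure of Equation 9.5 through Equation 9.7 can be transcribed directly; this is a sketch in which `pe_filter_1d` is a hypothetical helper and the AR(2) test signal is synthetic:

```python
import numpy as np

def pe_filter_1d(x, order=2):
    """Estimate a length-(order+1) PE filter [1, f1, ..., f_order] from
    1D data via the normal equations (Equation 9.7)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x[order:]                                 # x_{order+1}, ..., x_n
    # Columns of D: the data delayed by 1, 2, ..., order samples.
    D = np.column_stack([x[order - 1 - j:n - 1 - j] for j in range(order)])
    f = np.linalg.solve(D.T @ D, D.T @ (-d))
    return np.concatenate([[1.0], f])

# A signal obeying x_t = 1.5*x_{t-1} - 0.9*x_{t-2} is annihilated by
# the PE filter [1, -1.5, 0.9], so that is what the fit should return.
x = np.zeros(200)
x[0], x[1] = 1.0, 1.5
for t in range(2, 200):
    x[t] = 1.5 * x[t - 1] - 0.9 * x[t - 2]

f_pe = pe_filter_1d(x)
residual = np.convolve(x, f_pe, mode='valid')   # near-zero everywhere
```

Convolving the data with the estimated filter leaves only a tiny residual, the 1D analog of the "white noise" output discussed next.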

9.4.3 The Effect of the Filter

Figure 9.5 shows the characteristics of the PE filter in the spatial and Fourier domains. Figure 9.5a is the sample DEM chosen from Figure 9.1 (numbered box 1) for demonstration. It contains various topographic features and has a wide range of spectral content (Figure 9.5d). Figure 9.5b is the 5-by-5 PE filter derived from Figure 9.5a by solving the inverse problem in Equation 9.3. Note that the first three elements in the first column of the filter coefficients are 0, 0, and 1. This is the PE filter's unique constraint that ensures the filtered output to be white noise [7]. In the filtered output (Figure 9.5c) all the variations in the DEM were effectively suppressed. The size (order) of the PE filter is based on the complexity of the spectrum of the DEM. In general, as the spectrum becomes more complex, a larger filter is required. After testing various sizes of the filter, we found a 5-by-5 size appropriate for the DEM used in our study. Figure 9.5d and Figure 9.5e show the spectra of the DEM and the PE filter, respectively. These illustrate the inverse relationship of the PE filter to the corresponding DEM in the Fourier domain, such that their product is minimized (Figure 9.5f). This PE filter constrains the interpolated data in the DEM to have spectral content similar to the existing data. All inverse problems in this study were solved using the conjugate gradient method, where forward and adjoint functional operators are used instead of the explicit inverse operators [7], saving computer memory.

FIGURE 9.5 The effect of a PE filter. (a) original DEM; (b) a 2D PE filter found from the DEM; (c) DEM filtered with the PE filter; and (d), (e), and (f) the spectra of (a), (b), and (c), respectively, plotted in dB. (a) and (c) are drawn with the same color scale. Note that in (c) the variation of image (a) was effectively suppressed by the filter. The standard deviations of (a) and (c) are 27.6 m and 2.5 m, respectively. (From Yun, S.-H., Ji, J., Zebker, H., and Segall, P., IEEE Trans. Geosci. Rem. Sens., 43(7), 1682, 2005. With permission.)
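The whitening effect can be reproduced on a toy image by applying a PE filter with a 2D convolution. Here the filter and image are synthetic, chosen so that every row follows one known autoregressive model and annihilation is exact; a real DEM only approximately satisfies such a model, which is why Figure 9.5c retains a small residual:

```python
import numpy as np
from scipy.signal import convolve2d

# A toy "DEM" whose rows all share one spectral model:
# x_t = 1.5*x_{t-1} - 0.9*x_{t-2}, so the 1-by-3 PE filter
# [1, -1.5, 0.9] whitens every row (up to rounding error).
rows, cols = 32, 120
rng = np.random.default_rng(0)
dem = np.zeros((rows, cols))
dem[:, 0] = rng.normal(size=rows)
dem[:, 1] = rng.normal(size=rows)
for t in range(2, cols):
    dem[:, t] = 1.5 * dem[:, t - 1] - 0.9 * dem[:, t - 2]

f_pe = np.array([[1.0, -1.5, 0.9]])            # 2D (1-by-3) PE filter
filtered = convolve2d(dem, f_pe, mode='valid')  # variation suppressed
```

The standard deviation of `filtered` collapses to essentially zero while `dem` retains its full variation, the idealized version of the 27.6 m versus 2.5 m reduction reported for Figure 9.5.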

9.5 Interpolation

9.5.1 PE Filter Constraint

Once the PE filter is determined, we next estimate the missing parts of the image. As depicted in Figure 9.4, interpolation using the PE filter requires that the norm of the filtered output be minimized. This procedure can be formulated as an inverse computation minimizing the following objective function:

\| F_{PE}\, x \|^2 \qquad (9.8)


where F_PE is the matrix representation of the PE filter convolution, and x represents the entire data set, including the known and the missing regions. In the inversion process we update only the missing region, without changing the known region. This guarantees seamless interpolation across the boundaries between the known and missing regions.

9.5.2 SRTM DEM Constraint

As previously stated, 90-m posting SRTM DEMs were generated from 30-m posting data. This downsampling was done by calculating three "looks" in both the easting and northing directions. To use the SRTM DEM as a constraint to interpolate the TOPSAR DEM, we posit the following relationship between the two DEMs: each pixel value in a 90-m posting SRTM DEM can be considered equivalent to the averaged value of a 9-by-9 pixel window in a 10-m posting TOPSAR DEM centered at the corresponding pixel in the SRTM DEM. The solution using the constraint of the SRTM DEM to find the missing data points in the TOPSAR DEM can be expressed as minimizing the following objective function:

\| y - A\, x_m \|^2 \qquad (9.9)

where y is an SRTM DEM, expressed as a vector, that covers the missing regions of the TOPSAR DEM; A is an averaging operator generating nine looks; and x_m represents the missing regions of the TOPSAR DEM.
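A sketch of the nine-look averaging operator A and its adjoint; the adjoint is what a conjugate-gradient solver needs alongside the forward operator, and the names and sizes here are illustrative:

```python
import numpy as np

def A_forward(x, r, c):
    """Nine-look averaging A of Equation 9.9: 9-by-9 block means that map
    an (9r, 9c) fine grid onto an (r, c) coarse grid."""
    return x.reshape(r, 9, c, 9).mean(axis=(1, 3))

def A_adjoint(y):
    """Adjoint of A: spread each coarse value uniformly over its 9-by-9
    block, weighted by 1/81."""
    return np.repeat(np.repeat(y, 9, axis=0), 9, axis=1) / 81.0

# Dot-product test <A x, y> == <x, A' y> confirms the adjoint pairing.
rng = np.random.default_rng(1)
x = rng.normal(size=(18, 18))
y = rng.normal(size=(2, 2))
lhs = np.sum(A_forward(x, 2, 2) * y)
rhs = np.sum(x * A_adjoint(y))
```

The dot-product test is the standard check that a forward/adjoint pair is consistent before handing it to an iterative solver.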

9.5.3 Inversion with Two Constraints

By combining two constraints, one derived from the statistics of the PE filter and one from the SRTM DEM, we can interpolate the missing data optimally with respect to both criteria. The PE filter guarantees that the interpolated data will have the same spectral properties as the known data. At the same time the SRTM constraint forces the interpolated data to have average height near the corresponding SRTM DEM. We formulate the inverse problem as a minimization of the following objective function:

\lambda^2 \| F_{PE}\, x_m \|^2 + \| y - A\, x_m \|^2 \qquad (9.10)

where λ sets the relative effect of each criterion. Here x_m has the dimensions of the TOPSAR DEM, while y has the dimensions of the SRTM DEM. If regions of missing data are localized in an image, the entire image does not have to be used for generating a PE filter. We implement the interpolation in subimages to save time and computer memory. An example of such a subimage is shown in Figure 9.6. The image is a part of Figure 9.1 (numbered box 2). Figure 9.6a and Figure 9.6b are examples of x_e in Equation 9.3 and y, respectively. The multiplier λ determines the relative weight of the two terms in the objective function. As λ → ∞, the solution satisfies the first constraint only; if λ = 0, the solution satisfies the second constraint only.
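A 1D analog of Equation 9.10 makes the two-constraint inversion concrete. This sketch solves the stacked least-squares system directly rather than by the conjugate-gradient method the chapter uses; the filter, signal, gap, and block sizes are all illustrative. The unknowns are only the missing samples, so the known samples are moved to the right-hand side:

```python
import numpy as np

n = 40
f1, f2 = -1.5, 0.9
f_pe = np.array([1.0, f1, f2])

# Synthetic profile: an AR(2) signal that the PE filter annihilates exactly.
x_true = np.zeros(n)
x_true[0], x_true[1] = 1.0, 0.9
for t in range(2, n):
    x_true[t] = -f1 * x_true[t - 1] - f2 * x_true[t - 2]

missing = np.zeros(n, dtype=bool)
missing[15:25] = True
x_known = np.where(missing, 0.0, x_true)

# Convolution matrix F: row p-2 evaluates x[p] + f1*x[p-1] + f2*x[p-2].
F = np.zeros((n - 2, n))
for p in range(2, n):
    F[p - 2, [p, p - 1, p - 2]] = f_pe

# Coarse "SRTM-like" constraint: means of two 5-sample blocks over the gap.
A = np.zeros((2, n))
A[0, 15:20] = 0.2
A[1, 20:25] = 0.2
y = A @ x_true

# Stack the two weighted constraints (Equation 9.10); solve for x_m only.
lam = 0.3
Fm, Fk = F[:, missing], F[:, ~missing]
G = np.vstack([lam * Fm, A[:, missing]])
d = np.concatenate([-lam * (Fk @ x_known[~missing]), y])
xm, *_ = np.linalg.lstsq(G, d, rcond=None)

x_filled = x_known.copy()
x_filled[missing] = xm
```

Because the synthetic data satisfy both constraints exactly, the gap is recovered to machine precision; with real DEMs the two terms conflict, which is why the weight λ matters.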

9.5.4 Optimal Weighting

We used the cross-validation sum of squares (CVSS) [9] to determine the optimal weights for the two terms in Equation 9.10. Consider a model x_m that minimizes the following quantity:



FIGURE 9.6 Example subimages of (a) TOPSAR DEM showing regions of missing data (black), and (b) SRTM DEM of the same area. These subimages are engaged in one implementation of the interpolation. The grayscale is altitude in meters. (From Yun, S.-H., Ji, J., Zebker, H., and Segall, P., IEEE Trans. Geosci. Rem. Sens., 43(7), 1682, 2005. With permission.)

\lambda^2 \| F_{PE}\, x_m \|^2 + \| y^{(k)} - A^{(k)} x_m \|^2 \qquad (k = 1, \ldots, N) \qquad (9.11)

where y^{(k)} and A^{(k)} are the y and the A in Equation 9.10 with the k-th element and the k-th row omitted, respectively, and N is the number of elements of y that fall into the missing region. Denote this model x_m^{(k)}(\lambda). Then we compute the CVSS, defined as follows:

\mathrm{CVSS}(\lambda) = \frac{1}{N} \sum_{k=1}^{N} \left( y_k - A_k\, x_m^{(k)}(\lambda) \right)^2 \qquad (9.12)

where y_k is the omitted element of the vector y and A_k is the omitted row vector of the matrix A when x_m^{(k)}(λ) was estimated. Thus, A_k x_m^{(k)}(λ) is the prediction based on the other N − 1 observations. Finally, we minimize CVSS(λ) with respect to λ to obtain the optimal weight (Figure 9.7). In the case of the example shown in Figure 9.6, the minimum CVSS was obtained for λ = 0.16 (Figure 9.7).

FIGURE 9.7 Cross-validation sum of squares. The minimum occurs when λ = 0.16. (From Yun, S.-H., Ji, J., Zebker, H., and Segall, P., IEEE Trans. Geosci. Rem. Sens., 43(7), 1682, 2005. With permission.)

The effect of varying λ is shown in Figure 9.8. It is apparent (see Figure 9.8) that the optimal weight gives a more "plausible" result than either of the end members, preserving aspects of both constraints. In Figure 9.8a the interpolation uses only the PE filter constraint. This interpolation does not recover the continuity of the ridge running across the DEM in the north–south direction, which is observed in the SRTM DEM (Figure 9.6b). This follows from the PE filter having been obtained such that it eliminates the overall variations in the image; those variations include not only the ridge but also the accurate topography in the DEM. The other end member, Figure 9.8c, shows the result of applying zero weight to the PE filter constraint. Since the averaging operator A in Equation 9.10 is applied independently for each 9-by-9 pixel group, this is equivalent to simply filling the regions of missing data with 9-by-9 blocks of identical values equal to the corresponding SRTM DEM pixels (Figure 9.6b).

9.5.5 Simulation of the Interpolation

The quality of cross-validation in this study is itself validated by simulating the interpolation process with known subimages that do not contain missing data. For example, if a known subimage is selected from Figure 9.1 (numbered box 3), we can remove some data and apply our recovery algorithm. The subimage is similar in topographic features to the area shown in Figure 9.6. The process is illustrated in Figure 9.9. We introduce a hole as shown in Figure 9.9b and calculate the CVSS (Figure 9.9d) for each λ ranging from 0 to 2. Then we use the estimated λ, which minimizes the CVSS, in the interpolation process to obtain the image in Figure 9.9c. For each value of λ we also calculate the RMS error between the known and the interpolated images. The RMS error is plotted against λ in Figure 9.9e. The CVSS is minimized for λ = 0.062, while the RMS error has a minimum at λ = 0.065. This agreement suggests that minimizing the CVSS is a useful way to balance the constraints. Note that the minimum RMS error in Figure 9.9e is about 5 m. This value is smaller than the relative vertical height accuracy of the SRTM DEM, which is about 10 m.
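The leave-one-out loop of Equation 9.11 and Equation 9.12 can be sketched on the same kind of 1D analog; the function name, filter, signal, and block layout are illustrative. With exact synthetic data the CVSS is near zero for every λ; with real, noisy data it dips at an intermediate λ, as in Figure 9.7:

```python
import numpy as np

def cvss(lam, F, A, y, x_known, missing):
    """Cross-validation sum of squares (Equation 9.12): omit one coarse
    observation at a time, re-solve the two-constraint problem
    (Equation 9.11), and average the squared prediction errors."""
    Fm, Fk = F[:, missing], F[:, ~missing]
    rhs_pe = -lam * (Fk @ x_known[~missing])
    N = len(y)
    total = 0.0
    for k in range(N):
        keep = np.arange(N) != k
        G = np.vstack([lam * Fm, A[keep][:, missing]])
        d = np.concatenate([rhs_pe, y[keep]])
        xm, *_ = np.linalg.lstsq(G, d, rcond=None)
        total += (y[k] - A[k, missing] @ xm) ** 2   # prediction error
    return total / N

# 1D setup: AR(2) signal, a 10-sample gap, two 5-sample block averages.
n = 40
f_pe = np.array([1.0, -1.5, 0.9])
x_true = np.zeros(n)
x_true[0], x_true[1] = 1.0, 0.9
for t in range(2, n):
    x_true[t] = 1.5 * x_true[t - 1] - 0.9 * x_true[t - 2]
missing = np.zeros(n, dtype=bool)
missing[15:25] = True
x_known = np.where(missing, 0.0, x_true)
F = np.zeros((n - 2, n))
for p in range(2, n):
    F[p - 2, [p, p - 1, p - 2]] = f_pe
A = np.zeros((2, n))
A[0, 15:20] = 0.2
A[1, 20:25] = 0.2
y = A @ x_true

scores = {lam: cvss(lam, F, A, y, x_known, missing)
          for lam in (0.05, 0.3, 1.0)}
```

In practice one evaluates `cvss` on a grid of λ (0 to 2 in the chapter) and keeps the minimizer.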

9.6 Interpolation Results

The method presented in the previous section was applied to the entire image of Figure 9.1. The registered TOPSAR DEM contains missing data in regions of various sizes. Small subimages were extracted from the DEM. Each subimage is interpolated, and the


FIGURE 9.8 The results of interpolation applied to the DEMs in Figure 9.6, with various weights: (a) λ → ∞, (b) λ = 0.16, and (c) λ = 0. Profiles along A–A′ are shown in plot (d). (From Yun, S.-H., Ji, J., Zebker, H., and Segall, P., IEEE Trans. Geosci. Rem. Sens., 43(7), 1682, 2005. With permission.)

results are reinserted into the large DEM. The locations and sizes of the subimages are indicated with white boxes in Figure 9.10a. Note the largest region of missing data in the middle of the caldera. This region is not only a simple large gap but also a gap between two swaths. The interpolation is an iterative process and fills up regions of missing data starting from the boundary. If valid data along the boundary (boundaries of a swath for example) contain edge effects, error tends to propagate through the interpolation process. In this case, expanding the region of missing data by a few pixels before interpolation produces better results. If there is a large region of missing data, the spectral content information of valid data can fade out as the interpolation proceeds toward the center of the gap. In this case, sequentially applying the interpolation to parts of the gap is one solution. Due to edge effects along the boundary of the large gap, the interpolation result does not produce topography that matches the surrounding terrain well. Hence, we expand the gap by three pixels to eliminate edge effects. We divided the gap into multiple subimages, and each subimage was interpolated individually.
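Expanding a region of missing data by a few pixels before interpolation, as done for the large gap above, can be expressed as a morphological dilation of the gap mask. A sketch; the authors' exact procedure may differ:

```python
import numpy as np
from scipy.ndimage import binary_dilation

# Grow a missing-data mask by three pixels so that edge-affected samples
# along the gap boundary are treated as missing too.
mask = np.zeros((9, 9), dtype=bool)
mask[4, 4] = True                          # a one-pixel gap
expanded = binary_dilation(mask, iterations=3)
```

With the default cross-shaped structuring element, three iterations grow the gap by three pixels in each axis direction; samples inside `expanded` would then be re-estimated by the interpolation instead of being trusted as boundary data.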


FIGURE 9.9 The quality of the CVSS: (a) a sample image that does not have a hole; (b) a hole was made; (c) interpolated image with the optimal weight; (d) CVSS as a function of λ, with a minimum at λ = 0.062; and (e) RMS error between the true image (a) and the interpolated image (c), with a minimum at λ = 0.065. (From Yun, S.-H., Ji, J., Zebker, H., and Segall, P., IEEE Trans. Geosci. Rem. Sens., 43(7), 1682, 2005. With permission.)

9.7 Effect on InSAR

Finally, we can investigate the effect of the artifact elimination and the interpolation on simulated interferograms. It is often easier to see differences in elevation in simulated interferograms than in conventional contour plots. In addition, simulated interferograms provide a measure of how sensitive the interferogram is to the topography. Figure 9.11 shows georeferenced simulated interferograms from three DEMs: the registered TOPSAR DEM, the TOPSAR DEM after the artifact elimination, and the TOPSAR DEM after the interpolation. In all interferograms, a C-band wavelength is used, and we assume a 452 m perpendicular baseline between two satellite positions. This perpendicular baseline is realistic [2]. The fringe lines in the interferograms are approximately height contour lines. The interval of the fringe lines is inversely proportional to the perpendicular baseline [10], and in this case one color cycle of the fringes represents about 20 m. Note in Figure 9.11a that the fringe lines are discontinuous across the long region of missing data inside the caldera. This is due to artifacts in the original TOPSAR DEM. After eliminating these artifacts the discontinuity disappears (Figure 9.11b). Finally, the missing data regions are interpolated in a seamless manner (Figure 9.11c).
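Given that one fringe corresponds to about 20 m of height here, a simulated interferogram amounts to wrapping the DEM-scaled phase. A sketch; the helper name and the fixed altitude of ambiguity are illustrative, and a full simulation would compute the phase from the complete imaging geometry (wavelength, baseline, range, and incidence angle):

```python
import numpy as np

def simulate_interferogram(dem, h_a=20.0):
    """Simulated topographic phase: each h_a meters of elevation (the
    altitude of ambiguity, about 20 m for the 452 m baseline used in
    the text) adds one full 2*pi fringe cycle."""
    return np.angle(np.exp(2j * np.pi * dem / h_a))   # wrap to (-pi, pi]

dem = np.linspace(0.0, 60.0, 7)    # a 0-60 m ramp: three full fringes
phase = simulate_interferogram(dem)
```

Plotting `phase` with a cyclic colormap reproduces the contour-like fringe pattern of Figure 9.11, where every color cycle marks 20 m of relief.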

Signal and Image Processing for Remote Sensing

FIGURE 9.10 The original TOPSAR DEM (a) and the reconstructed DEM (b) after interpolation with PE filter and SRTM DEM constraints. The grayscale is altitude in meters, and the spatial extent is about 12 km across the image. (From Yun, S.-H., Ji, J., Zebker, H., and Segall, P., IEEE Trans. Geosci. Rem. Sens., 43(7), 1682, 2005. With permission.)

Use of a Prediction-Error Filter in Merging High- and Low-Resolution Images


FIGURE 9.11 Simulated interferograms from (a) the original registered TOPSAR DEM, (b) the DEM after the artifact was removed, and (c) the DEM interpolated with PE filter and the SRTM DEM. All the interferograms were simulated with the C-band wavelength (5.6 cm) and a perpendicular baseline of 452 m. Thus, one color cycle represents 20 m height difference. (From Yun, S.-H., Ji, J., Zebker, H., and Segall, P., IEEE Trans. Geosci. Rem. Sens., 43(7), 1682, 2005. With permission.)

9.8 Conclusion

The aircraft roll artifacts in the TOPSAR DEM were eliminated by subtracting the difference between the TOPSAR and SRTM DEMs. A 2D PE filter derived from the existing data and the SRTM DEM for the same region were then used as interpolation constraints.

Solving the inverse problem constrained with both the PE filter and the SRTM DEM produces a high-quality interpolated map of elevation. Cross-validation works well to select optimal constraint weighting in the inversion. This objective criterion results in less biased interpolation and guarantees the best fit to the SRTM DEM. The quality of many other TOPSAR DEMs can be improved similarly.
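The cross-validation selection of the constraint weight can be sketched as a simple grid search over λ. The `cvss` function below is only a stand-in stub: in the actual method it would hide part of the known data, re-run the constrained interpolation with weight λ, and measure the misfit on the held-out samples (as in Figure 9.9d), so its body and the grid spacing are illustrative assumptions:

```python
import numpy as np

# Hedged sketch: choose the constraint weight lam by minimizing a
# cross-validation sum of squares (CVSS).
def cvss(lam):
    # Toy surrogate with its minimum placed near lam = 0.062; a real
    # implementation would solve the constrained inversion here.
    return (lam - 0.062) ** 2 + 5.0

lams = np.linspace(0.0, 1.0, 201)          # candidate weights
best = lams[np.argmin([cvss(lam) for lam in lams])]
print(f"optimal weight: {best:.3f}")       # close to 0.062
```

The point of the sketch is only the selection logic: the weight is chosen by an objective, data-driven minimum rather than by hand.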

References

1. H.A. Zebker and R.M. Goldstein, Topographic mapping from interferometric synthetic aperture radar observations, Journal of Geophysical Research, 91(B5), 4993–4999, 1986.
2. F. Amelung, S. Jónsson, H. Zebker, and P. Segall, Widespread uplift and 'trapdoor' faulting on Galápagos volcanoes observed with radar interferometry, Nature, 407(6807), 993–996, 2000.
3. S. Yun, P. Segall, and H. Zebker, Constraints on magma chamber geometry at Sierra Negra volcano, Galápagos Islands, based on InSAR observations, Journal of Volcanology and Geothermal Research, 150, 232–243, 2006.
4. H.A. Zebker, S.N. Madsen, J. Martin, K.B. Wheeler, T. Miller, Y.L. Lou, G. Alberti, S. Vetrella, and A. Cucci, The TOPSAR interferometric radar topographic mapping instrument, IEEE Transactions on Geoscience and Remote Sensing, 30(5), 933–940, 1992.
5. S.N. Madsen, H.A. Zebker, and J. Martin, Topographic mapping using radar interferometry: processing techniques, IEEE Transactions on Geoscience and Remote Sensing, 31(1), 246–256, 1993.
6. Y. Kobayashi, K. Sarabandi, L. Pierce, and M.C. Dobson, An evaluation of the JPL TOPSAR for extracting tree heights, IEEE Transactions on Geoscience and Remote Sensing, 38(6), 2446–2454, 2000.
7. J.F. Claerbout, Earth Sounding Analysis: Processing versus Inversion, Blackwell, 1992 [Online], http://sepwww.stanford.edu/sep/prof/index.html.
8. J.F. Claerbout and S. Fomel, Image Estimation by Example: Geophysical Soundings Image Construction (Class Notes), 2002 [Online], http://sepwww.stanford.edu/sep/prof/index.html.
9. G. Wahba, Spline Models for Observational Data, ser. No. 59, Applied Mathematics, Philadelphia, PA, SIAM, 1990.
10. H.A. Zebker, P.A. Rosen, and S. Hensley, Atmospheric effects in interferometric synthetic aperture radar surface deformation and topographic maps, Journal of Geophysical Research, 102(B4), 7547–7563, 1997.

10 Blind Separation of Convolutive Mixtures for Canceling Active Sonar Reverberation

Fengyu Cong, Chi Hau Chen, Shaoling Ji, Peng Jia, and Xizhi Shi

CONTENTS
10.1 Introduction
10.2 Problem Description
10.3 BSCM Algorithm
10.4 Experiments and Analysis
    10.4.1 Backward Matrix of Active Sonar Data for Approximating the BSCM Model
    10.4.2 Examples of Canceling Real Sea Reverberation
        10.4.2.1 Example 1: Simulation 1
        10.4.2.2 Example 2: Simulation 2
        10.4.2.3 Example 3: Simulation 3
        10.4.2.4 Example 4: Real Target Experiment
        10.4.2.5 Example 5: Simulation 4
10.5 Conclusions
References

10.1 Introduction

Under heavy oceanic reverberation, it can be difficult to detect a target echo accurately with an active sonar. To resolve the problem, researchers apply two methods that use reverberation models and specialized processing techniques [1]. For the first method, receivers with differing range resolutions may encounter different statistics for a given waveform. Furthermore, a given receiver may encounter different statistics at different ranges [2]. Researchers have used Weibull, log-normal, Rician, multi-modal Rayleigh, and non-Rayleigh distributions to describe sonar reverberation [3,4]. For the second method, it is usually assumed that reverberation is a sum of returns issued from the transmitted signal. Under this assumption, a data matrix is first generated from the data received by the active sonar [5,6]. The principal component inverse (PCI) [7–13] is primarily used to separate reverberation and target echoes from the data matrix. However, important prior knowledge, such as the target power, must be provided [11]. In Refs. [11–13], PCI and other methods have performed very well in Doppler cases, and the authors have also shown that PCI still performs well when the Doppler effect is not introduced. Provided that prior knowledge is

hard to obtain and the Doppler effect does not exist, it becomes very desirable to cancel reverberation with easily obtainable but minimal prior knowledge, even in more complicated undersea situations. This chapter focuses on that case. The essence of PCI is separation by rank reduction. Nevertheless, the separation problem is an old one in electrical engineering, and many algorithms exist depending on the nature of the signals. Blind signal separation (BSS) [14] is a significant statistical signal processing method developed over the past 15 years. The advantage of BSS is that it does not need much prior knowledge and makes full use of simple and apparent statistical properties, such as non-Gaussianity, nonstationarity, colored character, uncorrelatedness, independence, and so on. We studied BSS for canceling reverberation in Ref. [15]. From the perspective of BSS, the data received by active sonar are convolutive mixtures [16], whereas only the instantaneous mixture model was discussed in Ref. [15]. Consequently, in this contribution we perform blind separation of convolutive mixtures (BSCM) to nullify oceanic reverberation. In Ref. [15], the data waveform was described only in the time domain; here, we provide more illustrations of the matched filter outputs under different signal-to-reverberation ratios (SRR), along with more examples for better explanation. The rest of this chapter is organized as follows: Section 10.2 presents the problem description; Section 10.3 introduces the BSCM algorithm, which is based on the reverberation characteristics; Section 10.4 provides examples of canceling real sea reverberation as the main content; and finally, Section 10.5 summarizes the above contents.

10.2 Problem Description

In this chapter, we assume that reverberation is a sum of returns generated from the transmitted signal. In Ref. [16], the active sonar data series d(t) is expressed as

$$d(t) = \underbrace{h_1(t - \tau_1) * e(t)}_{\text{target}} + \underbrace{\sum_{k=2}^{K} h_k(t - \tau_k) * e(t)}_{\text{reverberation}} + \underbrace{n(t)}_{\text{noise}} \tag{10.1}$$

where τ_k is the propagation delay, h_k(t) is the path impulse response, and e(t) is the transmitted signal. In the reverberation-dominant circumstance, we can decompose the active sonar data into the target echo component d̃(t) and the reverberation component r(t), as defined in Equation 10.2. Extracting the target and nullifying the reverberation are done simultaneously:

$$\tilde{d}(t) = \underbrace{h_1(t - \tau_1) * e(t)}_{\text{target}} + \underbrace{n(t)}_{\text{noise}}, \qquad r(t) = \underbrace{\sum_{k=2}^{K} h_k(t - \tau_k) * e(t)}_{\text{reverberation}} + \underbrace{n(t)}_{\text{noise}} \tag{10.2}$$

It is apparent that d̃(t) and r(t) are of different non-Gaussianity. Next, we show an example of real sea reverberation. For simplicity, we do not introduce the experimental details in this section. The transmitted signal is a sinusoid with a frequency of 1750 Hz, a duration of 200 msec, and a sampling frequency of 6000 Hz. Figure 10.1 contains only the target echo and the background noise,

FIGURE 10.1 Target echo waveform.

and most reverberation generated by a sine wave is included in Figure 10.2, where no target echo exists and the background noise is embedded. The two waveforms in Figure 10.1 and Figure 10.2 are the time-domain descriptions of d̃(t) and r(t), with corresponding standard kurtosis [17] of 4.1 and 1.6, respectively. Here, the time-domain non-Gaussianity is enough to discriminate d̃(t) from r(t). It may appear that detection of d̃(t) must then be straightforward. However, in real-world problems it is not simple to cancel the reverberation to the degree shown in Figure 10.1. To explore the effects of reverberation on detection at different SRRs, we give several examples with real sea reverberation as follows. In Figure 10.3, the first (upper) plot is reverberation only; in the remaining three plots, different targets are simulated and added to the reverberation with SRR = 0 dB, −7 dB, and −14 dB, respectively, from the upper to the lower plots. The target echo is located between the 9,000th and 11,000th samples. The target does not move, hence no Doppler effect is produced. The matched filter outputs follow in the next figure. In Figure 10.4, the middle two plots show that the matched filter results are satisfactory even when the SRR is quite small, while the lowest plot gives too many false alarms. The task for us is to cancel the reverberation to the degree that detection becomes possible, as in the middle two plots. In Equation 10.2, d̃(t) does not contain any reverberation and covers only the target echo and the background noise. In the real-world case, that is impossible. However, some residual reverberation does not harm the detection. Figure 10.4 also reminds us that even if

$$\tilde{d}(t) = \underbrace{h_1(t - \tau_1) * e(t)}_{\text{target}} + \underbrace{\sum_{k=2}^{\tilde{K}} h_k(t - \tau_k) * e(t)}_{\text{reverberation}} + \underbrace{n(t)}_{\text{noise}} \tag{10.3}$$
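The standard kurtosis used above to discriminate d̃(t) (kurtosis about 4.1) from r(t) (about 1.6) can be computed directly from the samples. This sketch uses synthetic stand-ins rather than the chapter's sea data: Gaussian noise has kurtosis near 3, and a pure sinusoid near 1.5, which illustrates why the reverberation dominated by a sine wave sits well below the Gaussian value:

```python
import numpy as np

# Standard (non-excess) kurtosis: E[(x - mean)^4] / E[(x - mean)^2]^2.
def kurtosis(x):
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2

rng = np.random.default_rng(0)
k_noise = kurtosis(rng.standard_normal(100_000))       # ~3 for Gaussian
k_sine = kurtosis(np.sin(np.linspace(0, 100, 100_000)))  # ~1.5 for a sinusoid
print(k_noise, k_sine)
```

The discrimination rule in the chapter simply compares such values: a signal containing a strong, impulsive target echo is more peaked (higher kurtosis) than sinusoid-like reverberation.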


where K̃ < K, as long as the reverberation is reduced to some satisfactory level, the detection can still be reliable. After comparing the first and third plots in Figure 10.3 and Figure 10.4, respectively, we find that Figure 10.3 shows the reverberation waveform with a small-SRR target added to be nearly identical to the waveform of

FIGURE 10.2 Reverberation waveform.

FIGURE 10.3 Real reverberation with different simulated targets. Panels from top to bottom: reverberation only; SRR = 0 dB; SRR = −7 dB; SRR = −14 dB.

reverberation, but Figure 10.4 implies that the corresponding matched filter outputs are definitely of different non-Gaussianity. This information is useful to our method for distinguishing the target echoes from the reverberation echoes.

10.3 BSCM Algorithm

In the past several years, researchers have developed several BSCM algorithms according to signal properties. Generally speaking, for a given source–receiver pair, the waveform arrives at different travel times owing to varying path lengths [18]. All of this makes reverberation nonstationary. The transmitted signal is a sinusoid here, so the

FIGURE 10.4 Outputs of matched filter on the data in Figure 10.3. Panels from top to bottom: reverberation only; SRR = 0 dB; SRR = −7 dB; SRR = −14 dB.

reverberations must retain the colored feature. Therefore, we say reverberations are nonstationary and colored. In this section, we give only a brief introduction to the BSCM algorithm. Consider a speech scenario, where J microphones receive multiple filtered copies of statistically uncorrelated or independent signals. Mathematically, the received signals can be expressed as a convolution, that is,

$$x_j(t) = \sum_{i=1}^{I} \sum_{p=0}^{P-1} h_{ji}(p)\, s_i(t - p), \qquad j = 1, \ldots, J; \quad t = 1, \ldots, L \tag{10.4}$$
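The convolutive mixing model of Equation 10.4 can be sketched numerically. The sources, filters, and dimensions below are arbitrary random stand-ins chosen only to exercise the model:

```python
import numpy as np

# Convolutive mixing of Equation 10.4: J receivers, I sources,
# P-point mixing filters h_ji(p).
rng = np.random.default_rng(1)
I, J, P, L = 2, 2, 8, 1000
s = rng.standard_normal((I, L))            # source signals s_i(t)
h = rng.standard_normal((J, I, P)) * 0.3   # mixing filters h_ji(p)

x = np.zeros((J, L))
for j in range(J):
    for i in range(I):
        # x_j(t) += sum_p h_ji(p) * s_i(t - p); truncated to length L
        x[j] += np.convolve(h[j, i], s[i])[:L]
print(x.shape)  # (2, 1000)
```

Each receiver thus sees a sum of filtered copies of all sources, which is exactly the structure the unmixing filter W(q) of Equation 10.6 has to invert.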

where h_{ji}(p) models the P-point impulse response from source to microphone and L is the length of the received signal. In a more compact matrix–vector notation, Equation 10.4 can be stated as

FIGURE 10.5 Specific (2 × 2) blind separation of convolutive mixtures problem.

$$x(t) = \sum_{p=0}^{P-1} H(p)\, s(t - p) = H(p) * s(t) \tag{10.5}$$

where x(t) = [x_1(t), …, x_J(t)]^T is the received signal vector, s(t) = [s_1(t), …, s_I(t)]^T is the source vector, H(p) is the mixing filter matrix, * denotes convolution, and [·]^T denotes transposition. Without loss of generality, we assume J = I in this chapter. With this problem setup, the objective of the BSCM techniques is to find an unmixing filter W(q) of length Q for deconvolution, that is,

$$\hat{s}(t) = W(q) * x(t) \tag{10.6}$$

where ŝ(t) denotes the estimated sources. Figure 10.5 is a block diagram for J = I = 2. For nonstationary sources, a moving block is usually applied to the received signals:

$$x(t, m) = H(p) * s(t, m), \qquad t = 1, \ldots, N; \quad m = 1, \ldots, M \tag{10.7}$$

where m is the index of the blocks, M is the total number of blocks, and N is the length of each block. After the fast Fourier transform (FFT) is applied to the three components in Equation 10.7, we obtain

$$X(\omega, m) = H(\omega)\, S(\omega, m), \qquad \omega = 1, \ldots, N \tag{10.8}$$

where ω denotes the frequency bin. Thus, the convolutive mixtures in the time domain turn into instantaneous mixtures in the frequency domain. This is the basic idea of the frequency-domain method for BSCM. The separation is then done in each frequency bin by the unmixing filters:

$$Y(\omega, m) = W(\omega)\, X(\omega, m) \tag{10.9}$$
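The blocking-and-FFT step of Equations 10.7 and 10.8 can be sketched as follows; the signal and block length are arbitrary stand-ins (the chapter itself uses N = 200):

```python
import numpy as np

# Split each received signal into M blocks of length N and FFT each
# block, so the time-domain convolutive mixture becomes (approximately)
# an instantaneous mixture X(w, m) = H(w) S(w, m) in each bin w.
def to_freq_blocks(x, N):
    """x: (J, L) received signals -> X: (N, J, M) complex spectra."""
    J, L = x.shape
    M = L // N                               # number of whole blocks
    blocks = x[:, :M * N].reshape(J, M, N)   # (J, M, N)
    X = np.fft.fft(blocks, axis=2)           # FFT along time within block
    return np.transpose(X, (2, 0, 1))        # (bin w, receiver j, block m)

x = np.random.default_rng(2).standard_normal((2, 1000))
X = to_freq_blocks(x, N=200)
print(X.shape)  # (200, 2, 5)
```

After this step, a separate unmixing matrix W(ω) is estimated independently in each of the N frequency bins, which is what gives rise to the permutation and scaling indeterminacies discussed below.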

where W(ω) and Y(ω, m) are the unmixing filter and the unmixed signals in frequency bin ω, respectively. Since the envelope of the signal spectrum is correlated, X_j(ω, m) must be a colored signal. Under the independence assumption of the signals in the time and frequency domains, we apply the following cost function:

$$F = \frac{1}{4} \sum_{\tau=0}^{T-1} \left\| W(\omega, r)\, \Gamma_x(\omega, \tau)\, W^H(\omega, r) - I \right\|^2 \tag{10.10}$$

The algorithm for the unmixing matrix is given by

$$W(\omega, r+1) = W(\omega, r) + \mu(\omega) \sum_{\tau=0}^{T-1} \left[ W(\omega, r)\, \Gamma_x(\omega, \tau)\, W^H(\omega, r) - I \right] W(\omega, r)\, \Gamma_x(\omega, \tau) \tag{10.11}$$

where μ is the learning rate, r is the iteration index, T is the total number of delayed samples, [·]^H is the conjugate transpose operation, I is the identity matrix, and Γ_x(ω, τ) is the autocorrelation matrix of x(ω, m) with a delay of τ samples. BSS may introduce two indeterminacies: the permutation and scaling problems [14,17]. We apply the method in Ref. [19] to resolve the permutation; more methods may be found in Refs. [20–32]. The scaling ambiguity remains largely an open problem; some methods to deal with it are provided in Refs. [33,34]. Generally, normalizing the permuted estimated matrix is a fast and effective remedy, and it also helps maintain stable convergence [35]. We do not discuss this further here. After the inverse FFT is applied to W(ω), we get W(q), and then BSCM is performed according to Equation 10.6. For more information about BSCM, please refer to the literature cited above.
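One iteration of the per-bin update of Equation 10.11 can be sketched as below. The data are random stand-ins, the lag estimator is a plain sample autocorrelation, and the update follows the sign convention printed in the text (some formulations use a negative step instead), so treat this as an illustration of the algebra rather than a tuned implementation:

```python
import numpy as np

# One update step per frequency bin:
# W <- W + mu * sum_tau (W G(tau) W^H - I) W G(tau), as in Equation 10.11.
def update_W(W, G_list, mu):
    grad = np.zeros_like(W)
    for G in G_list:
        WG = W @ G
        grad += (WG @ W.conj().T - np.eye(W.shape[0])) @ WG
    return W + mu * grad

def autocorr(X, tau):
    """Lagged sample autocorrelation matrix of X with shape (J, M)."""
    M = X.shape[1]
    return X[:, tau:] @ X[:, :M - tau].conj().T / (M - tau)

# Stand-in complex spectra for one bin: 2 receivers, 500 blocks.
rng = np.random.default_rng(3)
X = rng.standard_normal((2, 500)) + 1j * rng.standard_normal((2, 500))
G_list = [autocorr(X, tau) for tau in range(4)]   # T = 4 lags

W = np.eye(2, dtype=complex)
W = update_W(W, G_list, mu=0.01)
print(W.shape)  # (2, 2)
```

In a full run this step would be iterated to convergence in every bin, followed by the permutation and scaling corrections described above.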

10.4 Experiments and Analysis

10.4.1 Backward Matrix of Active Sonar Data for Approximating the BSCM Model

In Section 10.3, BSCM was introduced under certain assumptions. However, in applications such as speech enhancement, remote sensing, communication systems, geophysics, and biomedical engineering, conditions may not meet the basic theoretical requirements of BSCM. Interestingly, as long as the algorithms converge, BSCM still performs signal deconvolution effectively after some approximations are made [14,17]. For active sonar, the target echo and the oceanic reverberation are all generated by the same transmitted signal, so they must nevertheless be somewhat correlated. Principal component analysis [36] is the method for decorrelation, and PCI has been applied to separating the target echo and reverberation with some prior knowledge provided [5–13]. As Neumann and Krawczyk have pointed out, the forward matrix model used is not a principal issue of the PCI algorithm, and any other forward model can be used [37]. Approximation implies that the processing result must be discounted in comparison to the corresponding theory. Fortunately, as we pointed out in Section 10.2, detection in the presence of reverberation may still be reliable with the matched filter even when the reverberation is not completely canceled, provided the SRR is within a certain limit. So, the approximation to BSCM is possible with the one-dimensional active sonar data. Intuitively, we generate the backward data matrix from the active sonar data d(t) as the received signals x(t) to approximate the classical BSCM model:

$$x(t) = \left[ x_1(t), \ldots, x_J(t) \right]^T, \qquad x_j(t) = d\left[ t + j(B - 1) - 1 \right] \tag{10.12}$$

where x_j(t) is the so-called jth received signal, J is the total number of received signals, and B is the size of a moving block, an empirical parameter. By adjusting J and B, we may produce different backward matrices. Also, x_j(t) will not miss the target because the two numbers J and B are relatively small in comparison to the whole length of d(t). The losses in x_j(t) are at the beginning of the received reverberation and are often not useful. These make the approximation entirely reasonable.
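The construction of the backward data matrix can be sketched as stacking shifted copies of the single sonar series. The exact index convention of Equation 10.12 is hard to recover from this extraction, so the fixed per-row shift used below is an assumption made only to show the shape of the construction:

```python
import numpy as np

# Backward data matrix: J pseudo-receivers built from shifted copies
# of the one-dimensional sonar series d(t).
def backward_matrix(d, J, B):
    L = len(d) - (J - 1) * B   # usable length once all rows are shifted
    # Row j is d shifted by j*B samples (a 0-based stand-in for the
    # shift in Equation 10.12, which is an assumption here).
    return np.stack([d[j * B : j * B + L] for j in range(J)])

d = np.arange(100.0)           # stand-in for the sonar series d(t)
x = backward_matrix(d, J=4, B=2)
print(x.shape)   # (4, 94)
print(x[:, 0])   # [0. 2. 4. 6.]
```

Each row then plays the role of one "received signal" x_j(t) in the BSCM model, even though only one physical channel was recorded.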

10.4.2 Examples of Canceling Real Sea Reverberation

To simplify the discussion in this chapter, we only outline the experimental setup briefly here. Further details about the underwater acoustic data and experiments are in Ref. [11]. The transmitted signal is a sinusoid with a frequency of 1750 Hz, the duration is 200 msec, and the sampling frequency is 6000 Hz. The examples all have different targets simulated and added into the real sea reverberation. Before the targets are added, the reverberation is passed through a band-pass filter from 1500 to 2000 Hz. The reverberation we use lasts over 6 sec. For an enlarged plot, we show only the first 4 sec, which has no effect on the result, as the reverberation is very light in the last 2 sec. The block size is N = 200 for Equation 10.7, and the total number of delayed samples is T = M for Equation 10.9. Since the energy varies across frequency bins, the learning rate μ(ω) should correspond to the various frequency bins, and it is therefore defined as in Ref. [38]:

$$\mu(\omega) = \frac{0.6}{\sqrt{\displaystyle\sum_{\tau=0}^{T-1} \left| \Gamma_x(\omega, \tau) \right|^2}} \tag{10.13}$$
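The frequency-dependent learning rate of Equation 10.13 can be sketched directly. Interpreting |Γ_x(ω, τ)|² as a Frobenius-type magnitude of the matrix is an assumption of this sketch, as is the toy input:

```python
import numpy as np

# mu(w) = 0.6 / sqrt(sum_tau |Gamma_x(w, tau)|^2): bins carrying more
# autocorrelation energy take proportionally smaller steps.
def learning_rate(G_list):
    """G_list: lagged autocorrelation matrices for one frequency bin."""
    power = sum(np.sum(np.abs(G) ** 2) for G in G_list)
    return 0.6 / np.sqrt(power)

# Toy stand-in matrices for two lags of one bin.
G_list = [np.eye(2) * 2.0, np.eye(2) * 1.0]
print(learning_rate(G_list))
```

The normalization keeps the update of Equation 10.11 comparably scaled across bins despite their very different energies.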

10.4.2.1 Example 1: Simulation 1

We take the example of the last plot in Figure 10.3. The corresponding plot in Figure 10.4 shows that the detection is poor. A sinusoid target echo of 1750 Hz is simulated and added to the reverberation with SRR = −14 dB. The target echo is located between the 9,001st and 11,000th samples. No Doppler effect is produced, and J = 38 and B = 1. After BSCM is performed on the backward matrix, J deconvolved signals are given; all of them are then passed through the matched filter. We compute the kurtosis of all the outputs of the matched filter and choose the 18th deconvolved signal, which has the highest kurtosis in Figure 10.6, as the BSCM result d̃(t). In the following simulations, the BSCM results follow this rule. Figure 10.7 shows the matched filter output for the BSCM result d̃(t). In contrast to the fourth plot in Figure 10.4, it is apparent that the detection is more reliable after BSCM. Next, we show an example with the Doppler effect.
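The matched filtering applied to every deconvolved signal above amounts to correlating the data against a replica of the transmitted sinusoid. The data below are synthetic (a weak echo buried in noise at a known position, rather than the chapter's sea recordings), so this only sketches the mechanism:

```python
import numpy as np

# Matched filter: correlate the data with a replica of the transmitted
# signal and take the envelope magnitude of the output.
fs = 6000.0
t = np.arange(0, 0.2, 1 / fs)                  # 200 msec replica
replica = np.sin(2 * np.pi * 1750.0 * t)       # 1750 Hz transmitted signal

rng = np.random.default_rng(5)
data = 0.1 * rng.standard_normal(6000)         # background noise
data[3000:3000 + len(replica)] += replica      # echo buried at sample 3000

mf = np.abs(np.correlate(data, replica, mode="valid"))
peak = int(np.argmax(mf))
print(peak)  # peak near sample 3000, where the echo starts
```

In the chapter's procedure, the kurtosis of each such matched filter output is then computed, and the deconvolved signal with the highest kurtosis (the most peaked output) is kept as the BSCM result.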

10.4.2.2 Example 2: Simulation 2

A sinusoid target echo of 1745 Hz is simulated and added to the reverberation with SRR = −19 dB. The target echo is located between the 12,001st and 14,000th samples. A Doppler effect exists, and J = 20 and B = 1. Figure 10.8 shows the matched filter outputs for d(t) and the BSCM result d̃(t). Figure 10.8 indicates that BSCM is more effective when the Doppler effect is introduced, because the reverberation and the target echo differ more. Figure 10.7 and Figure 10.8 show that the SRR is improved to the degree that the detection is more reliable than before. Since the transmitted signal and the target echo are all sinusoids of a single frequency or with very small bandwidth, obvious changes in the frequency domain between the original d(t) and the BSCM result d̃(t) are not found. So, we perform the third

FIGURE 10.6 Kurtosis of outputs from the matched filter on the deconvolved signals.

simulation. Although it is not entirely realistic, it does help to show the validity of the BSCM algorithm.

10.4.2.3 Example 3: Simulation 3

A hyperbolic frequency modulated (HFM) target echo is simulated and added to the reverberation with SRR = −28 dB. The center frequency is 1750 Hz and the bandwidth is 500 Hz. The target echo is located between the 12,001st and 18,000th samples, and J = 30 and B = 2. Figure 10.9 shows the matched filter outputs for d(t) and the BSCM result d̃(t). In Figure 10.9, the detection is improved greatly after BSCM. To compare the changes in the frequency domain, we take the 6000 samples containing the target echo, from the 12,001st to the 18,000th samples of d(t) and d̃(t), respectively, and then apply the FFT to the two groups of data. Figure 10.10 shows the changes. It is apparent that the SRR improved greatly after BSCM was applied.

FIGURE 10.7 Output of matched filter on the BSCM result on reverberation with a sinusoid simulated target (SRR = −14 dB, no Doppler effect).

FIGURE 10.8 Outputs of matched filter on a real reverberation with a sinusoid simulated target (SRR = −19 dB, Doppler effect exists).

FIGURE 10.9 Outputs of matched filter on reverberation with an HFM simulated target with SRR = −28 dB.

FIGURE 10.10 The changes in the frequency domain: reverberation with the simulated target (top) and the BSCM result (bottom).

FIGURE 10.11 Outputs of matched filter on the active sonar data and the BSCM output in real target experiment.

FIGURE 10.12 Waveforms of deep sea reverberation with two simulated targets and BSCM output—no overlap between the target echoes.

10.4.2.4 Example 4: Real Target Experiment

To simplify the discussion in this chapter, we only outline the experimental setup briefly here. Further details on the underwater acoustic data and experimental results are in Ref. [11]. The transmitted signal is HFM, the duration is 1 sec, and the bandwidth is 500 Hz. The target speed is about 2.0 m/sec, and the range is about 2500 m. The sampling frequency is again 6000 Hz. The data we use are taken after beamforming. Figure 10.11 shows the matched filter outputs of the original data and the BSCM output signal. The figure reveals the effectiveness of BSCM in canceling oceanic reverberation without any expensive prior knowledge.

10.4.2.5 Example 5: Simulation 4

The above examples all involve reverberation in shallow water. In this simulation, we give an example of canceling deep-sea reverberation. The sea depth is beyond 4000 m. The sampling frequency is 20,000 Hz. The transmitted signal is HFM and the bandwidth is 100 Hz. In the deep sea, more than one target echo may occur within the reverberation issued by one ping; that is, Equation 10.1 is now written as

$$d(t) = \underbrace{\sum_{k=1}^{K_1} h_k(t - \tau_k) * e(t)}_{\text{target}} + \underbrace{\sum_{k=K_1+1}^{K_2} h_k(t - \tau_k) * e(t)}_{\text{reverberation}} + \underbrace{n(t)}_{\text{noise}} \tag{10.14}$$

FIGURE 10.13 Waveforms of deep sea reverberation with two simulated targets and BSCM output—overlap between the target echoes.

In this example, two target echoes are simulated, and the SRR is 0 dB. In the first case, shown in Figure 10.12, the two echoes do not overlap and the duration is 0.05 sec. In the second case, shown in Figure 10.13, the echoes overlap by 66.7% and the duration is 0.3 sec. Both Figure 10.12 and Figure 10.13 show that the SRR has improved. This simulation implies that BSCM is also effective in canceling deep-sea reverberation. In all the simulations, we do not use BSS, but this does not imply that BSS is not useful. Compared with BSCM, BSS needs larger J and B, so the computation is more expensive. Sometimes BSS cannot provide sound results where BSCM can, because BSCM is much closer to the convolutive model of the active sonar data.

10.5 Conclusions

In this chapter, we applied three useful and easily obtainable statistical characteristics to remove reverberation by deconvolving and distinguishing the target echoes from the reverberation echoes: the nonstationarity and colored features of reverberation, and the non-Gaussianity of the matched filter outputs of the BSCM results. Apart from such statistical information, BSCM usually does not need other expensive prior knowledge, such as the target echo energy or the Doppler shift. This is the advantage of our

method. Though the active sonar data used in this chapter are one-dimensional convolutive mixtures, the elimination of real sea reverberation shows that the approximation to the BSCM model through the backward data matrix is effective. The examples show that BSCM results can improve the SRR to a degree of satisfactory detection. Finding a good criterion for choosing an appropriate data matrix, as in Ref. [15], remains open for exploration.

References

1. T.J. Barnard and F. Khan, Statistical normalization of spherically invariant non-Gaussian clutter, IEEE Journal of Oceanic Engineering, 29(2), 303–309, 2004.
2. J.M. Fialkowski, R.C. Gauss, and D.M. Drumheller, Measurements and modeling of low-frequency near-surface scattering statistics, IEEE Journal of Oceanic Engineering, 29(2), 197–214, 2004.
3. K.D. LePage, Statistics of broad-band bottom reverberation predictions in shallow-water waveguides, IEEE Journal of Oceanic Engineering, 29(2), 330–346, 2004.
4. J.R. Preston and D.A. Abraham, Non-Rayleigh reverberation characteristics near 400 Hz observed on the New Jersey shelf, IEEE Journal of Oceanic Engineering, 29(2), 215–235, 2004.
5. I.P. Kirsteins and D.W. Tufts, Adaptive detection using low rank approximation to a data matrix, IEEE Transactions on Aerospace and Electronic Systems, 30(1), 55–67, 1994.
6. I.P. Kirsteins and D.W. Tufts, Rapidly adaptive nulling of interference, IEEE International Conference on Systems Engineering, pp. 269–272, 24–26 August 1989.
7. B.E. Freburger and D.W. Tufts, Case study of principal component inverse and cross spectral metric for low rank interference adaptation, in Proceedings of ICASSP '98, Vol. 4, pp. 1977–1980, 12–15 May 1998.
8. B.E. Freburger and D.W. Tufts, Adaptive detection performance of principal components inverse, cross spectral metric and the partially adaptive multistage Wiener filter, Conference Record of the Thirty-Second Asilomar Conference on Signals, Systems & Computers, Vol. 2, pp. 1522–1526, 1–4 November 1998.
9. B.E. Freburger and D.W. Tufts, Rapidly adaptive signal detection using the principal component inverse (PCI) method, Conference Record of the Thirty-First Asilomar Conference on Signals, Systems & Computers, Vol. 1, pp. 765–769, 2–5 November 1997.
10. T.A. Palka and D.W. Tufts, Reverberation characterization and suppression by means of principal components, in OCEANS '98 Conference Proceedings, Vol. 3, pp. 1501–1506, 28 September–1 October 1998.
11. G. Ginolhac and G. Jourdain, Principal component inverse algorithm for detection in the presence of reverberation, IEEE Journal of Oceanic Engineering, 27(2), 310–321, 2002.
12. G. Ginolhac and G. Jourdain, Detection in presence of reverberation, OCEANS 2000 MTS/IEEE Conference and Exhibition, Vol. 2, pp. 1043–1046, 11–14 September 2000.
13. V. Carmillet, P.O. Amblard, and G. Jourdain, Detection of phase- or frequency-modulated signals in reverberation noise, JASA, 105(6), 3375–3389, 1999.
14. A. Cichocki and S. Amari, Adaptive Blind Signal and Image Processing: Learning Algorithms and Applications, John Wiley & Sons, New York, 2002.
15. F. Cong et al., Blind signal separation and reverberation canceling with active sonar data, in Proceedings of ISSPA, Australia, 2005.
16. G.S. Edelson and I.P. Kirsteins, Modeling and suppression of reverberation components, IEEE Seventh SP Workshop on Statistical Signal and Array Processing, pp. 437–440, 26–29 June 1994.
17. A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis, Wiley Interscience, New York, 2001.
18. D.A. Abraham and A.P. Lyons, Simulation of non-Rayleigh reverberation and clutter, IEEE Journal of Oceanic Engineering, 29(2), 347–362, 2004.
19. F. Cong et al., Approach based on colored character to blind deconvolution for speech signals, in Proceedings of ICIMA, pp. 396–399, 2004.
20. S. Amari, S.C. Douglas, A. Cichocki, and H.H. Yang, Multichannel blind deconvolution and equalization using the natural gradient, in Proceedings of IEEE Workshop on Signal Processing Advances in Wireless Communications, pp. 101–104, April 1997.
21. M. Kawamoto, K. Matsuoka, and N. Ohnishi, A method of blind separation for convolved nonstationary signals, Neurocomputing, 22, 157–171, 1998.
22. S.C. Douglas and X. Sun, Convolutive blind separation of speech mixtures using the natural gradient, Speech Communication, 39, 65–78, 2003.
23. P. Smaragdis, Blind separation of convolved mixtures in the frequency domain, Neurocomputing, 22, 21–34, 1998.
24. H. Sawada, R. Mukai, S. Araki, and S. Makino, Polar coordinate based nonlinear function for frequency domain blind source separation, IEICE Transactions on Fundamentals, E86-A(3), 590–596, 2003.
25. L. Schobben and W. Sommen, A frequency domain blind signal separation method based on decorrelation, IEEE Transactions on Signal Processing, 50, 1855–1865, 2002.
26. H. Sawada, R. Mukai, S. Araki, and S. Makino, A robust and precise method for solving the permutation problem of frequency-domain blind source separation, IEEE Transactions on Speech and Audio Processing, 12(5), 530–538, 2004.
27. W. Lu and J.C. Rajapakse, Eliminating indeterminacy in ICA, Neurocomputing, 50, 271–290, 2003.
28. S. Araki, R. Mukai, S. Makino, T. Nishikawa, and H. Saruwatari, The fundamental limitation of frequency domain blind source separation for convolutive mixtures of speech, IEEE Transactions on Speech and Audio Processing, 11(2), 109–116, 2003.
29. M.Z. Ikram and D.R. Morgan, Permutation inconsistency in blind speech separation: investigation and solutions, IEEE Transactions on Speech and Audio Processing, 13(1), 1–13, 2005.
30. A. Dapena and L.
Castedo, A novel frequency domain approach for separating convolutive mixtures of temporally white signals, Digital Signal Processing, 13(2), 301–316, 2003. 31. A. Dapena, M.F. Bugallo, and L. Catcdo, Separation of convolutive mixtures of temporallywhite signals: a novel frequency-domain approach, in Proceedings of Third ICA, San Diego, California, pp. 179–184, 2001. 32. C. Mejuto, A. Dapena, and L. Casteda, Frequency-domain infomax for blind source separation of convolutive mixtures, in Proceedings of ICA 2000, pp. 315–320, Helsinki, 2000. 33. K. Matsuoka and S. Nakashima, Minimal distortion principle for blind source separation, in Proceedings of ICA, pp. 722–727, December 2001. 34. N. Murata, S. Ikeda, and A. Ziehe, An approach to blind source separation based on temporal structure of speech signals, Neurocomputing, 41(1), 1–24, 2001. 35. W. Lu and J.C. Rajapakse, Eliminating indeterminacy in ICA, Neurocomputing, 50, 271–290, 2003. 36. K.I. Diamantaras and S.Y. Kung, Principal Component and Neural Network Theory and Application, John Wiley & Sons, New York, 1996. 37. A. Neumann and H. Krawczyk, Principal Component Inversion, Training Course on Remote Sensing of Ocean Color, Ahmedabad, India, February 2001. 38. M.Z. Ikram, Multichannel Blind Separation of Speech Signals in a Reverberant Environment, Ph.D thesis, Georgia Institute of Technology, 2001.

11 Neural Network Retrievals of Atmospheric Temperature and Moisture Profiles from High-Resolution Infrared and Microwave Sounding Data

William J. Blackwell

CONTENTS
11.1 Introduction
11.2 A Brief Overview of Spaceborne Atmospheric Remote Sensing
    11.2.1 Geophysical Parameter Retrieval
    11.2.2 The Motivation for Computationally Efficient Algorithms
11.3 Principal Components Analysis of Hyperspectral Sounding Data
    11.3.1 The PC Transform
    11.3.2 The NAPC Transform
    11.3.3 The Projected PC Transform
    11.3.4 Evaluation of Compression Performance Using Two Different Metrics
        11.3.4.1 PC Filtering
        11.3.4.2 PC Regression
    11.3.5 NAPC of Clear and Cloudy Radiance Data
    11.3.6 NAPC of Infrared Cloud Perturbations
    11.3.7 PPC of Clear and Cloudy Radiance Data
11.4 Neural Network Retrieval of Temperature and Moisture Profiles
    11.4.1 An Introduction to Multi-Layer Neural Networks
    11.4.2 The PPC–NN Algorithm
        11.4.2.1 Network Topology
        11.4.2.2 Network Training
    11.4.3 Error Analyses for Simulated Clear and Cloudy Atmospheres
    11.4.4 Validation of the PPC–NN Algorithm with AIRS/AMSU Observations of Partially Cloudy Scenes Over Land and Ocean
        11.4.4.1 Cloud Clearing of AIRS Radiances
        11.4.4.2 The AIRS/AMSU/ECMWF Data Set
        11.4.4.3 AIRS/AMSU Channel Selection
        11.4.4.4 PPC–NN Retrieval Enhancements for Variable Sensor Scan Angle and Surface Pressure
        11.4.4.5 Retrieval Performance
        11.4.4.6 Retrieval Sensitivity to Cloud Amount
    11.4.5 Discussion and Future Work
11.5 Summary
Acknowledgments
References

11.1 Introduction

Modern atmospheric sounders measure radiance with unprecedented resolution and accuracy in spatial, spectral, and temporal dimensions. For example, the Atmospheric Infrared Sounder (AIRS), operational on the NASA EOS Aqua satellite since 2002, provides a spatial resolution of 15 km, a spectral resolution of ν/Δν ≈ 1200 (with 2378 channels from 650 to 2675 cm⁻¹), and a radiometric accuracy on the order of ±0.2 K. Typical polar-orbiting atmospheric sounders measure approximately 90% of the Earth's atmosphere (in the horizontal dimension) approximately every 12 h. This wealth of data presents two major challenges in the development of retrieval algorithms, which estimate the geophysical state of the atmosphere as a function of space and time from the upwelling spectral radiances measured by the sensor. The first challenge concerns the robustness of the retrieval operator and involves making maximal use of the geophysical content of the radiance data with minimal interference from instrument and atmospheric noise. The second is to implement a robust algorithm within a given computational budget. Estimation techniques based on neural networks (NNs) are becoming more common in high-resolution atmospheric remote sensing largely because their simplicity, flexibility, and ability to accurately represent complex multi-dimensional statistical relationships allow both of these challenges to be overcome. In this chapter, we consider the retrieval of atmospheric temperature and moisture profiles (quantity as a function of altitude) from radiance measurements at microwave and thermal infrared wavelengths. A projected principal components (PPC) transform is used to reduce the dimensionality of, and optimally extract geophysical information from, the spectral radiance data, and a multi-layer feedforward NN is subsequently used to estimate the desired geophysical profiles. This algorithm is henceforth referred to as the "PPC–NN" algorithm.
The PPC–NN algorithm offers the numerical stability and efficiency of statistical methods without sacrificing the accuracy of physical, model-based methods. The chapter is organized as follows. First, the physics of spaceborne atmospheric remote sensing is reviewed. The application of principal components transforms to hyperspectral sounding data is then presented and a new approach is introduced, where the sensor radiances are projected into a subspace that reduces spectral redundancy and maximizes the resulting correlation to a given parameter. This method is very similar to the concept of canonical correlations introduced by Hotelling over 70 years ago [1], but its application in the hyperspectral sounding context is new. Second, the use of multi-layer feedforward NNs for geophysical parameter retrieval from hyperspectral measurements (first proposed in 1993 [2]) is reviewed, and an overview of the network parameters used in this work is given. The combination of the PPC radiance compression operator with an NN is then discussed, and performance analyses comparing the PPC–NN algorithm to traditional retrieval methods are presented.

11.2 A Brief Overview of Spaceborne Atmospheric Remote Sensing

The typical measurement scenario for spaceborne atmospheric remote sensing is shown in Figure 11.1. A sensor measures upwelling spectral radiance (intensity as a function of frequency) at various incidence angles. The sensor data is usually calibrated to remove measurement artifacts such as gain drift, nonlinearities, and noise. The spectral radiances measured by the sensor are related to geophysical quantities, such as the vertical temperature profile of the atmosphere, and therefore must be converted into a geophysical quantity of interest through the use of an appropriate retrieval algorithm. The radiative transfer equation describing the radiation intensity observed at altitude L, viewing angle θ, and frequency ν can be formulated by including reflected atmospheric and cosmic contributions and the radiance emitted by the surface as follows [3,4]:

$$
\begin{aligned}
R_\nu(L) = {} & \int_0^L k_\nu(z)\, J_\nu[T(z)] \exp\!\left[-\sec\theta \int_z^L k_\nu(z')\,dz'\right] \sec\theta \, dz \\
& + r_\nu\, e^{-\tau^* \sec\theta} \int_0^L k_\nu(z)\, J_\nu[T(z)] \exp\!\left[-\sec\theta \int_0^z k_\nu(z')\,dz'\right] \sec\theta \, dz \\
& + r_\nu\, e^{-2\tau^* \sec\theta}\, J_\nu(T_c) + \varepsilon_\nu\, e^{-\tau^* \sec\theta}\, J_\nu(T_s)
\end{aligned}
\tag{11.1}
$$
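In practice, Equation 11.1 is evaluated numerically on a discrete altitude grid. The sketch below, which is illustrative only (the absorption and temperature profiles are invented, and the Planck radiance is replaced by its Rayleigh–Jeans limit so the result is a brightness temperature), discretizes the first (direct atmospheric emission) term:

```python
import numpy as np

def emission_term(z, k_nu, T, sec_theta=1.0):
    """Discretization of the first (atmospheric emission) term of Equation 11.1.

    z         : altitude grid in km, ascending, with the sensor at z[-1]
    k_nu      : absorption coefficient profile k_nu(z), km^-1
    T         : physical temperature profile T(z), K
    sec_theta : secant of the viewing angle

    J_nu[T(z)] is approximated by T(z) (Rayleigh-Jeans limit), so the
    result is in kelvin.
    """
    dz = np.gradient(z)
    a = k_nu * dz                                  # layer opacities at nadir
    # Opacity of the atmosphere *above* each level: sec(theta) * int_z^L k dz'
    tau_above = sec_theta * (np.cumsum(a[::-1])[::-1] - a)
    transmittance = np.exp(-tau_above)
    # Each layer emits k*T*dz, attenuated by the transmittance above it
    return float(np.sum(k_nu * T * transmittance * sec_theta * dz))

# Hypothetical profiles, for illustration only
z = np.linspace(0.0, 50.0, 501)                    # km
k_nu = 0.5 * np.exp(-z / 8.0)                      # km^-1 (made-up absorber)
T = 290.0 - 6.5 * np.clip(z, 0.0, 12.0)            # K (simple lapse rate)
Tb = emission_term(z, k_nu, T)                     # brightness temperature, K
```

The reflected and surface terms of Equation 11.1 would be accumulated in the same fashion, with the appropriate attenuation factors.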

FIGURE 11.1 Typical measurement scenario for spaceborne atmospheric remote sensing. Electromagnetic radiation that reaches the sensor is emitted by the sun, cosmic background, atmosphere, surface, and clouds. This radiation can also be reflected or scattered by the surface, atmosphere, or clouds. The spectral radiances measured by the sensor are related to geophysical quantities, such as the vertical temperature profile of the atmosphere, and therefore must be converted into a geophysical quantity of interest through the use of an appropriate retrieval algorithm.


where ε_ν is the surface emissivity, r_ν is the surface reflectivity, T_s is the surface temperature, k_ν(z) is the atmospheric absorption coefficient, τ* is the atmospheric zenith opacity, T_c is the cosmic background temperature (2.736 ± 0.017 K), and J_ν(T) is the radiance intensity emitted by a blackbody at temperature T, which is given by the Planck equation:

$$
J_\nu(T) = \frac{h\nu^3}{c^2}\,\frac{1}{e^{h\nu/kT} - 1} \quad \mathrm{W\,m^{-2}\,sr^{-1}\,Hz^{-1}}
\tag{11.2}
$$
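A direct transcription of Equation 11.2 (as written, without a leading factor of 2) is straightforward; the constants are standard CODATA values, and the example frequency is merely illustrative:

```python
import numpy as np

# CODATA physical constants (SI units)
H = 6.62607015e-34    # Planck constant, J s
C = 2.99792458e8      # speed of light, m s^-1
KB = 1.380649e-23     # Boltzmann constant, J K^-1

def planck_radiance(nu, T):
    """Blackbody spectral radiance of Equation 11.2, W m^-2 sr^-1 Hz^-1.

    nu : frequency in Hz;  T : temperature in K.
    expm1 avoids loss of precision at low frequencies (h*nu << k*T).
    """
    return (H * nu**3 / C**2) / np.expm1(H * nu / (KB * T))

# Example: a channel near 667 cm^-1 in the 15-um CO2 band
nu_667 = 667.0e2 * C          # wavenumber (m^-1) times c gives frequency (Hz)
J_cold = planck_radiance(nu_667, 220.0)
J_warm = planck_radiance(nu_667, 300.0)
```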

The first term in Equation 11.1 can be recast in terms of a transmittance function 𝒯_ν(z):

$$
R_\nu(L) = \int_0^L \frac{d\mathcal{T}_\nu(z)}{dz}\, J_\nu[T(z)]\, dz
\tag{11.3}
$$

The derivative of the transmittance function with respect to altitude is often called the temperature weighting function,

$$
W_\nu(z) \triangleq \frac{d\mathcal{T}_\nu(z)}{dz}
\tag{11.4}
$$

and gives the relative contribution of the radiance emanating from each altitude. The temperature weighting functions for the Advanced Microwave Sounding Unit (AMSU) are shown in Figure 11.2.
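Equations 11.3 and 11.4 can be exercised numerically on a toy atmosphere. In this sketch the absorption profile is hypothetical; the weighting function is obtained by differencing the transmittance 𝒯_ν(z) = exp[−sec θ ∫_z^L k_ν dz′] on the grid:

```python
import numpy as np

def weighting_function(z, k_nu, sec_theta=1.0):
    """Numerical temperature weighting function W_nu(z) = dT_nu(z)/dz
    (Equation 11.4), with the sensor at the top of the altitude grid."""
    dz = np.gradient(z)
    a = k_nu * dz
    # Opacity above each level, then transmittance from z to the sensor
    tau_above = sec_theta * (np.cumsum(a[::-1])[::-1] - a)
    transmittance = np.exp(-tau_above)
    return np.gradient(transmittance, z)

# Hypothetical exponentially decaying absorber (illustrative only)
z = np.linspace(0.0, 50.0, 501)        # km
k_nu = 0.5 * np.exp(-z / 8.0)          # km^-1
W = weighting_function(z, k_nu)
peak_altitude = z[np.argmax(W)]        # roughly where the opacity above is ~1
```

For this profile the weighting function peaks near the altitude where the overlying opacity is unity, and it integrates (approximately) to the total transmittance change across the column.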

FIGURE 11.2 AMSU-A temperature profile weighting functions (left) and AMSU-B water vapor profile weighting functions (right).

11.2.1 Geophysical Parameter Retrieval

The objective of the geophysical parameter retrieval algorithm is to estimate the state of the atmosphere (represented by a parameter matrix X, say), given observations of spectral radiance (represented by a radiance matrix R, say). There are generally two approaches to this problem, as shown in Figure 11.3. The first approach, referred to here as the variational approach, uses a forward model (for example, the transmittance and radiative transfer models previously discussed) to calculate the sensor radiance that would be measured given a specific atmospheric state. Note that the inverse model typically does not exist, as there are generally an infinite number of atmospheric states that could give rise to a particular radiance measurement. In the variational approach, a "guess" of the atmospheric state is made (usually obtained from a forecast model or historical statistics), and this guess is propagated through the forward models, thereby producing an estimate of the at-sensor radiance. The measured radiance is compared with this estimated radiance, and the state vector is adjusted so as to reduce the difference between the measured and estimated radiance vectors. This methodology is discussed at length by Rodgers [5], to which the interested reader is referred for a more thorough treatment of the theory and implementation of variational retrieval methods. The second approach, referred to here as the statistical, or regression-based, approach, does not use the forward model explicitly to derive the estimate of the atmospheric state vector. Instead, an ensemble of radiance–state vector pairs is assembled, and a statistical characterization (p(X), p(R), and p(X,R)) is sought. In practice, it is difficult to obtain these probability density functions (PDFs) directly from the data, and alternative methods are often used.
Two of these methods are linear least-squares estimation (LLSE), or linear regression, and nonlinear least-squares estimation (NLLSE). NNs are a special class of NLLSEs, and will be discussed later.
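As a concrete, heavily simplified sketch of the statistical approach, the LLSE operator can be derived directly from an ensemble of radiance–state pairs. Here the "forward model" is just a random linear map and the ensemble is synthetic; all dimensions and the noise level are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ensemble of radiance-state pairs (purely illustrative):
# state X has p levels, radiance R has m channels, R = A X + noise
p, m, n = 20, 50, 5000
A = rng.standard_normal((m, p))              # stand-in linear forward model
X_ens = rng.standard_normal((n, p))
R_ens = X_ens @ A.T + 0.1 * rng.standard_normal((n, m))

# LLSE: X_hat = mu_X + C_XR C_RR^{-1} (R - mu_R), moments from the ensemble
mu_X, mu_R = X_ens.mean(0), R_ens.mean(0)
Xc, Rc = X_ens - mu_X, R_ens - mu_R
C_XR = Xc.T @ Rc / (n - 1)                   # cross-covariance, (p, m)
C_RR = Rc.T @ Rc / (n - 1)                   # radiance covariance, (m, m)
G = C_XR @ np.linalg.inv(C_RR)               # linear regression operator g(.)

def retrieve(R_obs):
    """Estimate the state from observed radiances using the LLSE operator."""
    return mu_X + (R_obs - mu_R) @ G.T

X_hat = retrieve(R_ens)
rms = np.sqrt(np.mean((X_hat - X_ens) ** 2))
```

An NN retrieval replaces the linear operator G with a nonlinear mapping trained on the same kind of ensemble, which is the subject of Section 11.4.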

FIGURE 11.3 Variational and statistical approaches to geophysical parameter retrieval. In the variational approach, a forward model is used to predict at-sensor radiances based on the atmospheric state, and a "guess" of the state is adjusted iteratively until the modeled radiance matches the observed radiance. In the statistical approach, an empirical relationship between at-sensor radiances and atmospheric state is derived using an ensemble of radiance–state vector pairs; examples include LLSE and neural networks.

11.2.2 The Motivation for Computationally Efficient Algorithms

The principal advantage of regression-based methods is their simplicity: once the coefficients are derived from "training" data, the calculation of atmospheric state vectors is relatively easy. Variational approaches require multiple calls to the forward models, which can be computationally prohibitive. The computational complexity of the forward models is usually nonlinearly related (often O(n²) or worse) to the number of spectral channels. As shown in Figure 11.4, the spectral and spatial resolution of infrared sounders has increased dramatically over the last 35 years, and the computation required for real-time operation with variational algorithms has outpaced Moore's law. There is, therefore, a motivation to reduce the computational burden of current and next-generation retrieval algorithms to allow real-time ingestion of satellite-derived geophysical products into numerical weather forecast models.

11.3 Principal Components Analysis of Hyperspectral Sounding Data

Principal components (PC) transforms can be used to represent radiance measurements in a statistically compact form, enabling subsequent retrieval operators to be substantially more efficient and robust (see Ref. [6], for example). Furthermore, measurement noise can be dramatically reduced through the use of PC filtering [7,8], and it has also been shown [9] that PC transforms can be used to represent variability in high-spectral-resolution radiances perturbed by clouds. In the following sections, several variants of the PC transform are briefly discussed, with emphasis on the ability of each to extract geophysical information from noisy radiance data.

FIGURE 11.4 Improvements in sensor spectral and spatial resolution over the last 35 years. The recent increase in the spectral resolution afforded by infrared sensors has far surpassed that available from microwave sensors. The trends in spatial resolution are similar for infrared and microwave sensors.

11.3.1 The PC Transform

The PC transform is a linear, orthonormal operator¹ Q_r^T that projects a noisy m-dimensional radiance vector, R̃ = R + Φ, into an r-dimensional (r ≪ m) subspace. The additive noise vector Φ is assumed to be uncorrelated with the radiance vector R, and is characterized by the noise covariance matrix C_ΦΦ. The "principal components" of R̃, that is, P̃ = Q_r^T R̃, have two desirable properties: (1) the components are statistically uncorrelated, and (2) the reduced-rank reconstruction error

$$
c_1(r) = E\left[(\tilde{R} - \hat{\tilde{R}}_r)^T (\tilde{R} - \hat{\tilde{R}}_r)\right]
\tag{11.5}
$$

where R̃̂_r ≜ G_r R̃ for some linear operator G_r with rank r, is minimized when G_r = Q_r Q_r^T. The rows of Q_r^T contain the r most-significant (ordered by descending eigenvalue) eigenvectors of the noisy data covariance matrix C_R̃R̃ = C_RR + C_ΦΦ.

11.3.2 The NAPC Transform

Cost criteria other than that in Equation 11.5 are often more suitable for typical hyperspectral compression applications. For example, it might be desirable to reconstruct the noise-free radiances and filter the noise. The cost function thus becomes

$$
c_2(r) = E\left[(\hat{R}_r - R)^T (\hat{R}_r - R)\right]
\tag{11.6}
$$

where R̂_r ≜ H_r R̃ for some linear operator H_r with rank r. The noise-adjusted principal components (NAPC) transform [10], where H_r = C_ΦΦ^{1/2} W_r W_r^T C_ΦΦ^{−1/2} and W_r^T contains the r most-significant eigenvectors of the whitened noisy covariance matrix C_W̃W̃ = C_ΦΦ^{−1/2}(C_RR + C_ΦΦ)C_ΦΦ^{−1/2}, maximizes the signal-to-noise ratio of each component and is superior to the PC transform for most noise-filtering applications where the noise statistics are known a priori.
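The NAPC noise filter can be sketched directly from this definition: whiten by the noise covariance, keep the leading eigenvectors of the whitened noisy covariance, and map back. The demonstration data below are synthetic (a low-rank "radiance" subspace plus white noise), and the noise-covariance symbol follows the notation assumed above:

```python
import numpy as np

def napc_filter(R_noisy, C_RR, C_NN, r):
    """Apply the rank-r NAPC filter H_r = C_NN^{1/2} W_r W_r^T C_NN^{-1/2}.

    R_noisy : (n, m) noisy radiance vectors (rows), assumed zero-mean
    C_RR    : (m, m) noise-free radiance covariance
    C_NN    : (m, m) noise covariance
    """
    # Symmetric square roots of the noise covariance
    w, V = np.linalg.eigh(C_NN)
    C_half = V @ np.diag(np.sqrt(w)) @ V.T
    C_half_inv = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    # Whitened noisy covariance and its r most-significant eigenvectors W_r
    Cw = C_half_inv @ (C_RR + C_NN) @ C_half_inv
    ew, W = np.linalg.eigh(Cw)
    Wr = W[:, np.argsort(ew)[::-1][:r]]
    Hr = C_half @ Wr @ Wr.T @ C_half_inv
    return R_noisy @ Hr.T

# Synthetic rank-r signal in m channels plus white noise (illustrative)
rng = np.random.default_rng(1)
m, n, r = 30, 4000, 3
B = rng.standard_normal((m, r))              # "radiance" subspace
S = rng.standard_normal((n, r)) @ B.T        # noise-free radiances
N = 0.3 * rng.standard_normal((n, m))
S_hat = napc_filter(S + N, B @ B.T, 0.09 * np.eye(m), r)
err_filtered = np.mean((S_hat - S) ** 2)
err_raw = np.mean(N ** 2)
```

Because the signal truly occupies an r-dimensional subspace here, the filter retains it while discarding most of the noise energy.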

11.3.3 The Projected PC Transform

It is often unnecessary to require that the principal components be uncorrelated, and linear operators can be derived that offer improved performance over PC transforms for minimizing cost functions such as that in Equation 11.6. It can be shown [11] that the optimal linear operator with rank r that minimizes Equation 11.6 is

$$
L_r = E_r E_r^T C_{RR} (C_{RR} + C_{\Phi\Phi})^{-1}
\tag{11.7}
$$

where E_r = [E_1 | E_2 | … | E_r] contains the r most-significant eigenvectors of C_RR(C_RR + C_ΦΦ)^{−1}C_RR. Examination of Equation 11.7 reveals that the Wiener-filtered radiances are projected onto the r-dimensional subspace spanned by E_r. It is this projection that motivates the name "PPC." An orthonormal basis for this r-dimensional subspace of the original m-dimensional radiance vector space is given by the r most-significant right eigenvectors, V_r, of the reduced-rank linear regression matrix L_r given in Equation 11.7. We then define the PPC of R̃ as

$$
\tilde{P} = V_r^T \tilde{R}
\tag{11.8}
$$

Note that the elements of P̃ are correlated, as V_r^T (C_RR + C_ΦΦ) V_r is not a diagonal matrix.

Another useful application of the PPC transform is the compression of spectral radiance information that is correlated with a geophysical parameter, such as the temperature profile. The rank-r linear operator that captures the most radiance information correlated with the temperature profile is similar to Equation 11.7 and is given by

$$
L_r = E_r E_r^T C_{TR} (C_{RR} + C_{\Phi\Phi})^{-1}
\tag{11.9}
$$

where E_r = [E_1 | E_2 | … | E_r] contains the r most-significant eigenvectors of C_TR(C_RR + C_ΦΦ)^{−1}C_RT, and C_TR is the cross-covariance of the temperature profile and the spectral radiance.

¹The following mathematical notation is used in this chapter: (·)^T denotes the transpose, (˜) denotes a noisy random vector, and (ˆ) denotes an estimate of a random vector. Matrices are indicated by bold upper case, vectors by upper case, and scalars by lower case.
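A minimal sketch of the temperature-correlated operator of Equation 11.9 follows. The covariances are estimated from a synthetic ensemble (the linear map, dimensions, and noise level are invented), and the orthonormal radiance basis V_r is obtained here from the SVD of L_r:

```python
import numpy as np

def ppc_operator(C_TR, C_RR_noisy, r):
    """Rank-r projected principal components operator of Equation 11.9.

    C_TR       : (p, m) cross-covariance of temperature profile and radiance
    C_RR_noisy : (m, m) noisy radiance covariance, i.e. C_RR + C_noise
    Returns (L_r, V_r): the rank-r operator and an orthonormal basis V_r
    for the radiance subspace correlated with the temperature profile.
    """
    Cinv = np.linalg.inv(C_RR_noisy)
    M = C_TR @ Cinv @ C_TR.T                 # symmetric, positive semidefinite
    w, E = np.linalg.eigh(M)
    Er = E[:, np.argsort(w)[::-1][:r]]       # r most-significant eigenvectors
    Lr = Er @ Er.T @ C_TR @ Cinv
    _, _, Vt = np.linalg.svd(Lr)             # right singular vectors span the
    return Lr, Vt[:r].T                      # informative radiance subspace

# Synthetic ensemble (illustrative only)
rng = np.random.default_rng(2)
m, p, n, r = 40, 10, 5000, 5
A = rng.standard_normal((m, p))
T_ens = rng.standard_normal((n, p))
R_ens = T_ens @ A.T + 0.2 * rng.standard_normal((n, m))
C_TR = (T_ens - T_ens.mean(0)).T @ (R_ens - R_ens.mean(0)) / (n - 1)
Lr, Vr = ppc_operator(C_TR, np.cov(R_ens.T), r)
P = R_ens @ Vr                               # r PPC coefficients per spectrum
```

The coefficients P = V_r^T R̃ are what a subsequent regression (or, later in the chapter, a neural network) consumes in place of the full radiance vector.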

11.3.4 Evaluation of Compression Performance Using Two Different Metrics

The compression performance of each of the PC transforms discussed previously was evaluated using two performance metrics. First, we seek the transform that yields the best (in the sum-squared sense) reconstruction of the noise-free radiance spectrum given a noisy spectrum; thus, we seek the optimal reduced-rank linear filter. The second performance metric is quite different and is based on temperature retrieval performance in the following way: a radiance spectrum is first compressed using each of the PC transforms for a given number of coefficients, and the resulting coefficients are then used in a linear regression to estimate the temperature profile. The results that follow were obtained using simulated, clear-air radiance intensity spectra from an AIRS-like sounder. Approximately 7500 1750-channel radiance vectors were generated, with spectral coverage from approximately 4 to 15 μm, using the NOAA88b radiosonde set. The simulated intensities were expressed in spectral radiance units (mW m⁻² sr⁻¹ (cm⁻¹)⁻¹).

11.3.4.1 PC Filtering

Figure 11.5a shows the sum-squared radiance distortion (Equation 11.5) as a function of the number of components used in the various PC decomposition techniques. The a priori level indicates the sum-squared error due to sensor noise. Results from two variants of the PC transform are plotted: the first variant (the "PC" curve) uses eigenvectors of C_R̃R̃ as the transform basis vectors, and the second variant (the "noise-free PC" curve) uses eigenvectors of C_RR as the transform basis vectors. Figure 11.5a shows that the PPC reconstruction of noise-free radiances (PPC(R)) yields lower distortion than both the PC and NAPC transforms for any number of components r. It is noteworthy that the "PC" and "noise-free PC" curves never reach the theoretically optimal level defined by the full-rank Wiener filter. Furthermore, the PPC distortion curves decrease monotonically with coefficient number, while all the PC distortion curves exhibit a local minimum, after which the distortion increases with coefficient number as noisy, high-order terms are included. The noise in the high-order PPC terms is effectively zeroed out, because it is uncorrelated with the spectral radiances.

FIGURE 11.5 Performance comparisons of the PC (where the components are derived from both noisy and noise-free radiances), NAPC, and PPC transforms for a hypothetical 1750-channel infrared (4–15 μm) sounder. Two projected principal components transforms were considered, PPC(R) and PPC(T), which provide, respectively, maximum representation of noise-free radiance energy and maximum representation of temperature profile energy. (a) Sum-squared error of the reduced-rank reconstruction of the noise-free spectral radiances. (b) Temperature profile retrieval error (trace of the error covariance matrix) obtained using linear regression with r components.

11.3.4.2 PC Regression

The PC coefficients derived in the previous example are now used in a linear regression to estimate the temperature profile. Figure 11.5b shows the temperature profile error (integrated over all altitude levels) as a function of the number of coefficients used in the linear regression, for each of the PC transforms. Reaching the theoretically optimal value achieved by linear regression with all channels requires approximately 20 PPC coefficients, 200 NAPC coefficients, or 1000 PC coefficients. Thus, the PPC transform yields a factor-of-ten improvement over the NAPC transform when compressing temperature-correlated radiances (20 versus 200 coefficients required), and approximately a factor-of-100 improvement over the original spectral radiance vector (20 versus 1750). Note that the first guess in the AIRS Science Team Level-2 retrieval uses a linear regression derived from approximately 60 of the most-significant NAPC coefficients of the 2378-channel AIRS spectrum (in units of brightness temperature) [6]. Results for the moisture profile are similar, although more coefficients (typically 35, versus 25 for temperature) are needed because of the higher degree of nonlinearity in the underlying physical relationship between atmospheric moisture and the observed spectral radiance. This substantial compression enables the use of relatively small (and thus very stable and fast) NN estimators to retrieve the desired geophysical parameters.

It is interesting to consider the two variants of the PPC transform shown in Figure 11.5, namely PPC(R), where a basis for the noise-free radiance subspace is desired, and PPC(T), where a basis for only the temperature profile information is desired. As shown in Figure 11.5a, the PPC(T) transform poorly represents the noise-free radiance space, because there is substantial information that is uncorrelated with temperature (and thus ignored by the PPC(T) transform) but correlated with the noise-free radiance. Conversely, the PPC(R) transform offers a significantly less compact representation of temperature profile information (see Figure 11.5b), because the transform represents information that is not correlated with temperature and is thus superfluous when retrieving the temperature profile.
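The second metric can be reproduced in miniature with synthetic data (the forward map, dimensions, and noise level below are invented): compress each spectrum to r PC coefficients, regress the profile on them, and record the error as r grows.

```python
import numpy as np

rng = np.random.default_rng(3)
m, p, n = 60, 12, 6000
A = rng.standard_normal((m, p))              # stand-in forward model
T_ens = rng.standard_normal((n, p))          # "temperature profiles"
R_ens = T_ens @ A.T + 0.5 * rng.standard_normal((n, m))

# PC basis from the (noisy) sample radiance covariance, descending variance
w, Q = np.linalg.eigh(np.cov(R_ens.T))
Q = Q[:, np.argsort(w)[::-1]]

def regression_error(r):
    """Compress to r PC coefficients, regress T on them, return RMS error."""
    X = np.hstack([R_ens @ Q[:, :r], np.ones((n, 1))])   # coefficients + bias
    G, *_ = np.linalg.lstsq(X, T_ens, rcond=None)
    return np.sqrt(np.mean((X @ G - T_ens) ** 2))

errs = {r: regression_error(r) for r in (2, 6, 12, 30)}
```

As in Figure 11.5b, the error drops steeply while r is below the effective dimensionality of the signal and then flattens; adding further components buys essentially nothing.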

11.3.5 NAPC of Clear and Cloudy Radiance Data

In the following sections we compute the NAPC (and associated eigenvalues) of clear and cloudy radiance data, the NAPC of the infrared radiance perturbations due to clouds, and the projected (temperature) principal components of clear and cloudy radiance data. The 2378 AIRS radiances were converted from spectral intensities to brightness temperatures using Equation 11.2, and were concatenated with the 20 microwave brightness temperatures from AMSU-A and AMSU-B into a single vector R of length 2398. The NAPC were computed as follows:

$$
P_{NAPC} = Q^T R
\tag{11.10}
$$

where Q contains the eigenvectors of C_W̃W̃, sorted in descending order by eigenvalue; C_W̃W̃ is the prewhitened covariance matrix discussed in Section 11.3. The eigenvalues corresponding to the top 100 NAPC are shown in Figure 11.6 for simulated clear-air and cloudy data, along with scatterplots of the first three NAPC. The eigenvalues of the 90 lowest-order terms are very similar; the principal differences occur in the three highest-order terms, which are dominated by channels with weighting function peaks in the lower part of the atmosphere. The eigenvalues associated with the clear-air and cloudy NAPC cluster into roughly five groups: 1, 2–3, 4–9, 10–11, and 12–100. The first 11 NAPC capture 99.96% of the total radiance variance for both the clear-air and cloudy data. The top three NAPC of both clear-air and cloudy data appear to be jointly Gaussian to a close approximation, with the exception of clear-air NAPC #1 versus NAPC #2.

11.3.6 NAPC of Infrared Cloud Perturbations

We define the infrared cloud perturbation ΔR_IR as

$$
\Delta R_{IR} \triangleq R_{IR}^{clr} - R_{IR}^{cld}
\tag{11.11}
$$

where R_IR^clr is the clear-air infrared brightness temperature and R_IR^cld is the cloudy infrared brightness temperature. The NAPC of ΔR_IR were calculated using the method described above. The results are shown in Figure 11.7.

FIGURE 11.6 Noise-adjusted principal components transform analysis of clear and cloudy simulated AIRS/AMSU data. The top plot shows the eigenvalues of each NAPC coefficient for clear and cloudy data. The middle row presents scatterplots of the three clear-air NAPC coefficients with the largest variance (shown normalized to unit variance). The bottom row presents scatterplots of the three cloudy NAPC coefficients with the largest variance (shown normalized to unit variance).

The six highest order NAPC of DRIR capture approximately 99.96% of the total cloud perturbation variance, which suggests that there are more degrees of freedom in the atmosphere than there are in the clouds. Furthermore, there is significant crosstalk between the cloud perturbation and the underlying atmosphere, and this crosstalk is highly nonlinear and non-Gaussian. Evidence of this can be seen in the scatterplot of NAPC #1 versus NAPC #2, shown in the lower left corner of Figure 11.7.

Signal and Image Processing for Remote Sensing


FIGURE 11.7 Noise-adjusted principal components transform analysis of the cloud impact (clear radiance–cloudy radiance) for simulated AIRS data. The top plot shows the eigenvalue of each NAPC coefficient of cloud impact, along with the NAPC coefficients of clear-air data (shown in Figure 11.6). The bottom row presents scatterplots of the three cloud-impact NAPC coefficients with the largest variance (shown normalized to unit variance).

The temperature weighting functions of NAPC #1 and NAPC #2 are shown in Figure 11.8. NAPC #1 consists primarily of surface channels and NAPC #2 consists primarily of channels that peak near 3–6 km and channels that peak near the surface. Therefore, NAPC #1 is sensitive principally to the overall cloud amount, while NAPC #2 is also sensitive to cloud altitude.

11.3.7 PPC of Clear and Cloudy Radiance Data

The PPC transform discussed previously was used to identify temperature information contained in the clear and cloudy radiances. Figure 11.9 shows the mean temperature profile retrieval error for the reduced-rank regression operator given in Equation 11.9 as a function of rank (the number of PPC coefficients retained) for clear-air and cloudy radiance data. Both curves have asymptotes near 15 coefficients, and clouds degrade the temperature retrieval by an average of approximately 0.3 K RMS.
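The reduced-rank experiment can be sketched as follows. This is an illustrative stand-in only: the projection here is built from an SVD of the profile-radiance cross-covariance, one common way of forming projected principal components, and may differ in detail from the chapter's Equation 11.9 (defined earlier in the chapter):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic linear problem: temperature profiles x (30 levels) map to
# radiances y (50 channels) through a random forward operator A.
n, p, m = 2000, 50, 30
A = rng.standard_normal((p, m))
X = rng.standard_normal((m, n))                 # profiles
Y = A @ X + 0.1 * rng.standard_normal((p, n))   # noisy radiances

def rms_error_vs_rank(X, Y, rank):
    """Regress X on the leading `rank` directions of the X-Y cross-covariance."""
    Cxy = X @ Y.T / X.shape[1]
    U, s, Vt = np.linalg.svd(Cxy, full_matrices=False)
    P = Vt[:rank]                               # projection onto `rank` radiance directions
    Z = P @ Y                                   # compressed radiances ("PPC coefficients")
    W = X @ Z.T @ np.linalg.inv(Z @ Z.T)        # least-squares regression operator
    err = X - W @ Z
    return np.sqrt(np.mean(err**2))

errors = [rms_error_vs_rank(X, Y, r) for r in (1, 5, 15, 30)]
print(errors)
```

As in Figure 11.9, the retrieval error generally falls as more coefficients are retained and then levels off once the informative directions are exhausted.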


FIGURE 11.8 Temperature weighting functions of the first two noise-adjusted principal components of AIRS cloud perturbations (clear radiance–cloudy radiance).

FIGURE 11.9 Projected principal components transform analysis of clear and cloudy simulated AIRS/AMSU data. The mean temperature profile retrieval error (K RMS) is shown as a function of the number of PPC coefficients used in a linear regression for simulated clear and cloudy data.


11.4 Neural Network Retrieval of Temperature and Moisture Profiles

An NN is an interconnection of simple computational elements, or nodes, with activation functions that are usually nonlinear, monotonically increasing, and differentiable. NNs are able to deduce input–output relationships directly from the training ensemble without requiring underlying assumptions about the distribution of the data. Furthermore, an NN with only a single hidden layer of a sufficient number of nodes with nonlinear activation functions is capable of approximating any real-valued continuous scalar function to a given precision over a finite domain [12,13].
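A single-hidden-layer network of the kind just described (nonlinear hidden nodes, here tanh, with a linear output) and a sum-squared-error cost can be sketched as follows. This is a toy illustration with random weights; all dimensions are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)

d, h, k, P = 4, 8, 3, 10   # inputs, hidden nodes, outputs, training patterns
W1, b1 = rng.standard_normal((h, d)), np.zeros(h)
W2, b2 = rng.standard_normal((k, h)), np.zeros(k)

def forward(x):
    """Single hidden layer of tanh nodes, linear output layer."""
    a = W1 @ x + b1      # a_j = sum_i w_ji x_i + b_j
    z = np.tanh(a)       # z_j = tanh(a_j)
    return W2 @ z + b2   # linear output

def cost(X, T):
    """Sum-squared error over all patterns and output nodes."""
    E = 0.0
    for x, t in zip(X, T):
        y = forward(x)
        E += 0.5 * np.sum((t - y) ** 2)
    return E

X = rng.standard_normal((P, d))
T = rng.standard_normal((P, k))
print(cost(X, T))
```

Training then amounts to adjusting W1, b1, W2, and b2 to minimize this cost over the training ensemble.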

11.4.1 An Introduction to Multi-Layer Neural Networks

Consider a multi-layer feedforward NN consisting of an input layer, an arbitrary number of hidden layers (usually one or two), and an output layer (see Figure 11.10). The hidden layers typically contain sigmoidal activation functions of the form z_j = tanh(a_j), where a_j = Σ_{i=1}^{d} w_ji x_i + b_j. The output layer is typically linear. The weights (w_ji) and biases (b_j) for the jth neuron are chosen to minimize a cost function over a set of P training patterns. A common choice for the cost function is the sum-squared error, defined as

E(w) = (1/2) Σ_p Σ_k (t_k^(p) − y_k^(p))²   (11.12)

where y_k^(p) and t_k^(p) denote the network outputs and target responses, respectively, of each output node k given a pattern p, and w is a vector containing all the weights and biases of the network. The "training" process involves iteratively finding the weights and biases that minimize the cost function through some numerical optimization procedure. Second-order methods are commonly used, where the local approximation of the cost function by a quadratic form is given by


FIGURE 11.10 The structure of the multi-layer feedforward NN (specifically, the multi-layer perceptron) is shown in (a), and the perceptron (or node) is shown in (b).

E(w + δw) ≈ E(w) + ∇E(w)ᵀ δw + (1/2) δwᵀ ∇²E(w) δw   (11.13)

where ∇E(w) and ∇²E(w) are the gradient vector and the Hessian matrix of the cost function, respectively. Setting the derivative of Equation 11.13 to zero and solving for the weight update vector δw yields

δw = −[∇²E(w)]⁻¹ ∇E(w)   (11.14)

Direct application of Equation 11.14 is difficult in practice, because computation of the Hessian matrix (and its inverse) is nontrivial, and usually needs to be repeated at each iteration. For sum-squared error cost functions, it can be shown that

∇E(w) = Jᵀe   (11.15)

∇²E(w) = JᵀJ + S   (11.16)

where J is the Jacobian matrix that contains first derivatives of the network errors with respect to the weights and biases, e is a vector of network errors, and S = Σ_{p=1}^{P} e_p ∇²e_p. The Jacobian matrix can be computed using a standard backpropagation technique [14] that is significantly more computationally efficient than direct calculation of the Hessian matrix [15]. However, an inversion of a square matrix with dimensions equal to the total number of weights and biases in the network is required. For the Gauss–Newton method, it is assumed that S is zero (a reasonable assumption only near the solution), and the update of Equation 11.14 becomes

δw = −[JᵀJ]⁻¹ Jᵀe   (11.17)

The Levenberg–Marquardt modification [16] to the Gauss–Newton method is

δw = −[JᵀJ + μI]⁻¹ Jᵀe   (11.18)

As μ varies between zero and infinity, δw varies continuously between the Gauss–Newton step and a steepest-descent step. The Levenberg–Marquardt method is thus an example of a model-trust-region approach in which the model (in this case the linearized approximation of the error function) is trusted only within some region around the current search point [17]. The size of this region is governed by the value of μ. The use of multi-layer feedforward NNs, such as the multi-layer perceptron (MLP), to retrieve temperature profiles from hyperspectral radiance measurements has been addressed by several investigators (see Refs. [18,19], for example). NN retrieval of moisture profiles from hyperspectral data is relatively new [20], but follows the same methodology used to retrieve temperature.
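A Levenberg–Marquardt iteration (Equation 11.18), together with the μ-adaptation schedule described later in Section 11.4.2.2, can be sketched on a small curve-fitting problem. This is an illustrative stand-in for NN training; the model function, starting point, and tolerances are our own:

```python
import numpy as np

def lm_fit(residual, jacobian, w, iters=50, mu=1e-3):
    """Minimize 0.5*||e(w)||^2 with the Levenberg-Marquardt method."""
    for _ in range(iters):
        e = residual(w)
        J = jacobian(w)
        E = 0.5 * e @ e
        while mu <= 1e10:
            # Equation 11.18: dw = -(J^T J + mu*I)^(-1) J^T e
            dw = -np.linalg.solve(J.T @ J + mu * np.eye(w.size), J.T @ e)
            e_new = residual(w + dw)
            if 0.5 * e_new @ e_new < E:
                w, mu = w + dw, mu / 10.0   # successful step: accept, decrease mu
                break
            mu *= 10.0                       # unsuccessful step: increase mu, retry
    return w

# Example: fit y = a*exp(b*x) to noiseless data with true (a, b) = (2.0, -0.5).
x = np.linspace(0, 3, 20)
y = 2.0 * np.exp(-0.5 * x)
res = lambda w: w[0] * np.exp(w[1] * x) - y
jac = lambda w: np.column_stack([np.exp(w[1] * x), w[0] * x * np.exp(w[1] * x)])
w = lm_fit(res, jac, np.array([1.0, 0.0]))
print(w)  # approximately [2.0, -0.5]
```

Small μ gives near-Gauss–Newton steps; large μ shrinks the step toward steepest descent, which is exactly the trust-region behavior described above.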

11.4.2 The PPC–NN Algorithm

A first attempt to combine the properties of both NN estimators and PC transforms for the inversion of microwave radiometric data to retrieve atmospheric temperature and moisture profiles is reported in Ref. [21], and a more recent study with hyperspectral data is presented in Ref. [20]. A conceptually similar approach is taken in this work by combining the PPC compression technique described in Section 11.3.3 with the NN estimator discussed in the previous section. PPC compression offers substantial performance


advantages over traditional principal components analysis (PCA) and is the cornerstone of the present work.

11.4.2.1 Network Topology

All MLPs used in the PPC–NN algorithm are composed of one or two hidden layers of nonlinear (hyperbolic tangent) nodes and an output layer of linear nodes. For the temperature retrieval, 25 PPC coefficients are input to six NNs, each with a single hidden layer of 15 nodes. Separate NNs are used for different vertical regions of the atmosphere; a total of six networks are used to estimate the temperature profile at 65 points from the surface to 50 mbar. For the water vapor retrieval, 35 PPC coefficients are input to nine NNs, each with a single hidden layer of 25 nodes. The water vapor profile (mass mixing ratio) is estimated at 58 points from the surface to 75 mbar. These network parameters were determined largely through empirical analyses. Work is underway to dynamically optimize these parameters as the NN is trained. Separate training and testing data sets are used and are discussed in more detail in Section 11.4.2.2.

11.4.2.2 Network Training

The weights and biases were initialized using the Nguyen–Widrow method [22]. This method reduces the training time by initializing the weights so that each node is "active" (in the linear region of the activation function) over the input range of interest. The NN was trained using the Levenberg–Marquardt backpropagation algorithm discussed in Section 11.4.1. For each epoch, the μ parameter was initialized to 0.001. If a successful step was taken (i.e., E(w + δw) < E(w)), then μ was decreased by a factor of ten. If the current step was unsuccessful, the value of μ was increased by a factor of ten until a successful step could be found (or until μ reached 10^10). The network training was stopped when the error on a separate data set did not decrease for 10 consecutive epochs. The sensor noise was changed on each training epoch to desensitize the network to radiance measurement errors.

11.4.3 Error Analyses for Simulated Clear and Cloudy Atmospheres

AIRS and AMSU-A/B clear and cloudy radiances were simulated for an ensemble of approximately 10,000 profiles. These profiles were produced using a numerical weather prediction (NWP) model, and are substantially smoother (vertically) than the NOAA88b profile set that is used in the following sections. The profiles were separated into mutually exclusive sets for training and testing of the PPC–NN algorithm. The RMS errors for the LLSE and NN temperature retrievals are shown in Figure 11.11 for clear and cloudy atmospheres. The NN estimator significantly outperforms the LLSE in both cases. The sensitivity of the retrieval to instrument noise is examined by repeating the retrieval with instrument noise set to zero. The difference in retrieval errors (with and without noise) is shown in the first panel of Figure 11.12. The atmospheric contribution to the retrieval error (i.e., the noise-free retrieval error) is shown in the second panel of Figure 11.12. Finally, the differences (net minus LLSE) in error contributions for the two methods are shown in the third panel of Figure 11.12. It is noteworthy that the NN is a much better filter of instrument noise than is the LLSE. As a final test of sensitivity to instrument noise, the LLSE and NN retrievals were repeated while varying the instrument noise between 10% and 1000% of its nominal value. The resulting retrieval errors are shown in Figure 11.13. The NN retrieval is significantly less sensitive to instrument noise than is the LLSE retrieval.
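The noise-scaling experiment can be sketched for a linear (LLSE-style) estimator as follows, with a synthetic linear forward model standing in for the AIRS/AMSU radiative transfer simulation. All dimensions and noise levels are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic linear forward model: radiance y = A x + n.
m, p, n_train, n_test = 20, 40, 5000, 1000
A = rng.standard_normal((p, m))
sigma = np.sqrt(0.05)   # nominal noise standard deviation

def llse_error(noise_factor):
    """Train a linear least-squares estimator at nominal noise, test with scaled noise."""
    Xtr = rng.standard_normal((m, n_train))
    Ytr = A @ Xtr + sigma * rng.standard_normal((p, n_train))
    W = Xtr @ Ytr.T @ np.linalg.inv(Ytr @ Ytr.T)   # regression operator
    Xte = rng.standard_normal((m, n_test))
    Yte = A @ Xte + noise_factor * sigma * rng.standard_normal((p, n_test))
    return np.sqrt(np.mean((Xte - W @ Yte) ** 2))

errors = [llse_error(f) for f in (0.1, 1.0, 10.0)]
print(errors)  # error grows as the noise amplification factor increases
```

Sweeping the noise factor from 0.1 to 10 reproduces the general shape of the linear curve in Figure 11.13; the NN's weaker noise sensitivity would require a trained nonlinear estimator to demonstrate.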


FIGURE 11.11 RMS temperature profile retrieval error for the NN and LLSE estimators. Results for clear air and clouds are shown. Surface emissivities were modeled as random variables with clipped-Gaussian PDFs; mean = 0.975, standard deviation = 0.025 (AIRS); mean = 0.95, standard deviation = 0.05 (AMSU).


FIGURE 11.12 Contributions to temperature profile retrieval error due to measurement noise and atmospheric noise for the NN and LLSE estimators.


FIGURE 11.13 Sensitivity of temperature profile retrieval to measurement noise. The mean RMS error over 15 km is shown as a function of the noise amplification factor.

11.4.4 Validation of the PPC–NN Algorithm with AIRS/AMSU Observations of Partially Cloudy Scenes Over Land and Ocean

In this section, the performance of the PPC–NN algorithm is evaluated using cloud-cleared AIRS observations (not simulations, as were used in the previous section) and colocated ECMWF (European Centre for Medium-Range Weather Forecasts) forecast fields. The cloud clearing is performed using both AIRS and AMSU data. The PPC–NN retrieval performance is compared with that obtained using the AIRS Level-2 algorithm. Both ocean and land cases are considered, including elevated surface terrain, and retrievals at all sensor scan angles (out to ±48°) are derived. Finally, sensitivity analyses of PPC–NN retrieval performance are presented with respect to scan angle, orbit type (ascending or descending), cloud amount, and training set comprehensiveness.

11.4.4.1 Cloud Clearing of AIRS Radiances

The cloud-clearing approach discussed in Susskind et al. [23] was applied to the AIRS data by the AIRS Science Team. Version 3.x of the algorithm was used in this work (see Table 11.1 for a detailed listing of the software version numbers). The algorithm seeks to estimate a clear-column radiance (the radiance that would have been measured if the scene were cloud-free) from a number of adjacent cloud-impacted fields of view.


TABLE 11.1
AIRS Software Version Numbers for the Seven Days Used in the Match-Up Data Set

Date          Cloud Clearing   Level-2
6 Sep 2002    3.7.0            3.0.8
25 Jan 2003   3.7.0            3.0.8
8 Jun 2003    3.7.0            3.0.8
21 Aug 2003   3.1.9            3.0.8
3 Sep 2003    3.1.9            3.0.8
12 Oct 2003   3.1.9            3.0.8
5 Dec 2003    3.7.0            3.0.10

11.4.4.2 The AIRS/AMSU/ECMWF Data Set

The performance of the PPC–NN algorithm was evaluated using 352,903 AIRS/AMSU observations and colocated ECMWF atmospheric fields collected on seven days throughout 2002 and 2003 (see Table 11.1). Various software version changes were made over the course of this work (see Table 11.1 for details), but these changes were primarily with regard to quality control and do not significantly affect the results presented here. However, the version 4.x release of the AIRS software, which was not available in time to be included in this work, should offer many enhancements over version 3.x, including improved cloud clearing, retrieval accuracies, quality control, and retrieval yield [24]. Reanalyses of the results presented in this section are therefore planned with the new AIRS software release. The 352,903 observations were randomly divided into a training set of 302,903 observations (206,061 of which were over ocean) and a separate validation set of 50,000 observations (40,000 of which were over ocean). The a priori RMS variations of the temperature and water vapor (mass mixing ratio) profiles contained in the validation set are shown in Figure 11.14. The observations in the validation set were matched with AIRS Level-2 retrievals obtained from the Earth Observing System (EOS) Data Gateway (EDG). As advised in the AIRS Version 3.0 L2 Data Release Documentation, only retrievals that met certain quality standards (specifically, RetQAFlag = 0 for ocean and RetQAFlag = 256 for land) were included in the analyses. There were 17,856 AIRS Level-2 retrievals (all within ±40° latitude) that met this criterion. Re-analysis with AIRS Level-2 version 4.x software is planned, as the version 4.x products have been validated over both ocean and land at near-polar latitudes. To facilitate comparison with results published in the AIRS v3.0 Validation Report [25], layer error statistics are calculated as follows.
First, layer averages are calculated in layers of approximately (but not exactly) 1-km width; the exact layer widths can be found in Appendix III of the AIRS v3.0 Validation Report. Second, weighted water vapor errors in each layer are calculated by dividing the RMS mass mixing ratio error by the RMS variation of the true mass mixing ratio (as opposed to dividing the mass mixing ratio error of each profile by the true mass mixing ratio for that profile and computing the RMS of the resulting ensemble).

11.4.4.3 AIRS/AMSU Channel Selection

Thirty-seven percent (888 of the 2378) of the AIRS channels were discarded for the analysis, as the radiance values for these channels frequently were flagged as invalid by the AIRS calibration software. A simulated AIRS brightness temperature spectrum is shown in Figure 11.15, which shows the original 2378 AIRS channels and the 1490 channels that were selected for use with the PPC–NN algorithm. All 15 AMSU-A channels were used. The algorithm automatically discounts channels that are excessively corrupted by sensor noise (for example, AMSU-A channel 7 on EOS Aqua) or other interfering


FIGURE 11.14 Temperature and water vapor profile statistics for the validation data set used in the analysis. See the text for details on how the statistics are computed at each layer.

Brightness temperature (K)

320 300

Availble AIRS channels Selected AIRS channels

280 260 240 220 200 500

1000

1500

2000

2500

Wavenumber (cm−1) FIGURE 11.15 A typical AIRS spectrum (simulated) is shown. 1490 out of 2378 AIRS channels were selected.

3000


signals (for example, the effects of nonlocal thermodynamic equilibrium) because the corruptive signals are largely uncorrelated with the geophysical parameters that are to be estimated.

11.4.4.4 PPC–NN Retrieval Enhancements for Variable Sensor Scan Angle and Surface Pressure

When dealing with global AIRS/AMSU data, a variety of scan angles and surface pressures must be accommodated. Therefore, two additional inputs were added to the NNs discussed previously: (1) the secant of the scan angle and (2) the forecast surface pressure (in mbar) divided by 1013.25. The resulting temperature and water vapor profile estimates were reported on a variable pressure grid anchored by the forecast surface pressure. Because the number of inputs to the NNs increased, the number of hidden nodes in the NNs used for temperature retrievals was increased from 15 to 20. For water vapor retrievals, the number of hidden nodes in the first hidden layer was maintained at 25, but a second hidden layer of 15 hidden nodes was added.

11.4.4.5 Retrieval Performance

We now compare the retrieval performance of the PPC–NN, linear regression, and AIRS Level-2 methods. For both the ocean and land cases, the PPC–NN and linear regression retrievals were derived using the same training set, and the same validation set was used for all methods. The temperature profile retrieval performance over ocean for the linear regression retrieval, the PPC–NN retrieval, and the AIRS Level-2 retrieval is shown in Figure 11.16, and the water vapor retrieval performance is shown in Figure 11.17. The error statistics were calculated using the 13,156 (out of 40,000) AIRS Level-2 retrievals that converged successfully. A bias of approximately 1 K near 100 mbar was found between the AIRS Level-2 temperature retrievals and the ECMWF data (ECMWF was colder). This bias was removed prior to computation of the AIRS Level-2 retrieval error statistics, which are shown in Figure 11.16.

The temperature profile retrieval performance over land for the linear regression retrieval, the PPC–NN retrieval, and the AIRS Level-2 retrieval is shown in Figure 11.18, and the water vapor retrieval performance is shown in Figure 11.19. The error statistics were calculated using the 4,700 (out of 10,000) AIRS Level-2 retrievals that converged successfully. There are several features in Figure 11.16 through Figure 11.19 that are worthy of note. First, for all retrieval methods, the performance over land is worse than that over ocean, as expected. The cloud-clearing problem is significantly more difficult over land, as variations in surface emissivity can be mistaken for cloud perturbations, thus resulting in improper radiance corrections. Second, the magnitude of the temperature profile error degradation for land versus ocean is larger for the PPC–NN algorithm than for the AIRS Level-2 algorithm. In fact, the temperature profile retrieval performance of the AIRS Level-2 algorithm is superior to that of the PPC–NN algorithm throughout most of the lower troposphere over land. Further analyses of this discrepancy suggest that the performance of the PPC–NN method over elevated terrain is suboptimal and could be improved. This work is currently underway.

11.4.4.6 Retrieval Sensitivity to Cloud Amount

A plot of the temperature retrieval error in the layer closest to the surface as a function of the cloud fraction retrieved by the AIRS Level-2 algorithm is shown in Figure 11.20.

FIGURE 11.16 Temperature retrieval performance of the PPC–NN, linear regression, and AIRS Level-2 methods over ocean. Statistics were calculated over 13,156 fields of regard.

Similar curves for the water vapor retrieval performance are shown in Figure 11.21. Both methods produce temperature and moisture retrievals with RMS errors near 1 K and 15%, respectively, even in cases with large cloud fractions. The figures show that the PPC–NN temperature and moisture retrievals are less sensitive than the AIRS Level-2 retrievals to cloud amount. Furthermore, it has been shown that the PPC–NN retrieval technique is relatively insensitive to sensor scan angle, orbit type, and training set comprehensiveness [26].
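The cumulative-error curves of Figure 11.20 and Figure 11.21 can be sketched as follows, assuming per-pixel retrieval errors and retrieved cloud fractions are available; the synthetic degradation model below is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic per-pixel cloud fractions (%) and retrieval errors (K) that
# degrade mildly with cloudiness.
cloud_frac = rng.uniform(0, 100, 10000)
error = 0.5 + 0.005 * cloud_frac + 0.1 * rng.standard_normal(10000)

# Rank pixels by increasing cloudiness; cumulative RMS error up to each pixel.
order = np.argsort(cloud_frac)
sorted_err = error[order]
cum_rms = np.sqrt(np.cumsum(sorted_err ** 2) / np.arange(1, sorted_err.size + 1))

# Cumulative RMS at 10% cloud fraction vs. at 80% cloud fraction.
i10 = np.searchsorted(cloud_frac[order], 10.0)
i80 = np.searchsorted(cloud_frac[order], 80.0)
print(cum_rms[i10 - 1], cum_rms[i80 - 1])  # the second value is larger
```

A flat cumulative curve indicates a retrieval that is insensitive to cloud amount, which is the behavior reported for the PPC–NN method above.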

11.4.5 Discussion and Future Work

Although the PPC–NN performance results presented in the previous section are very encouraging, several caveats must be mentioned. The ECMWF fields used for ‘‘ground truth’’ contain errors, and the NN will tune to these errors as part of its training process. Therefore, the PPC–NN RMS errors shown in the previous section may be underestimated, and the AIRS Level-2 RMS errors may be overestimated, as the ECMWF data is not an accurate representation of the true state of the atmosphere. Therefore, the ‘‘true’’ spread between the performance of the PPC–NN and AIRS Level-2 algorithms is

FIGURE 11.17 Water vapor (mass mixing ratio) retrieval performance of the PPC–NN, linear regression, and AIRS Level-2 methods over ocean. Statistics were calculated over 13,156 fields of regard.

almost certainly smaller than that shown here. Work is currently underway to test the performance of both the PPC–NN and AIRS Level-2 algorithms with additional ground-truth data, including radiosonde data, and ground- and aircraft-based measurements. It should be noted that the PPC–NN algorithm as implemented in this work is currently not a stand-alone system, as both AIRS cloud-cleared radiances and quality flags produced by the AIRS Level-2 algorithm are required. Future work is planned to adapt the PPC–NN algorithm for use directly on cloudy AIRS/AMSU radiances and to produce quality assessments of the retrieved products. Finally, assimilation of PPC–NN-derived atmospheric parameters into NWP models is planned, and the resulting impact on forecast accuracy will be an excellent indicator of retrieval quality.

11.5 Summary

The PPC–NN temperature and moisture profile retrieval technique combines a linear radiance compression operator with an NN estimator. The PPC transform was shown to be well suited for this application because information correlated to the geophysical

FIGURE 11.18 Temperature retrieval performance of the PPC–NN, linear regression, and AIRS Level-2 methods over land. Statistics were calculated over 4,700 fields of regard.

quantity of interest is optimally represented with only a few dozen components. This substantial amount of radiance compression (approximately a factor of 100) allows relatively small NNs to be used, thereby improving both the stability and computational efficiency of the algorithm. Test cases with observed partially cloudy AIRS/AMSU data demonstrate that the PPC–NN temperature and moisture retrievals yield accuracies commensurate with those of physical methods at a substantially reduced computational burden. Retrieval accuracies (defined as agreement with ECMWF fields) near 1 K for temperature and 25% for water vapor mass mixing ratio in layers of approximately 1-km thickness were obtained using the PPC–NN retrieval method with AIRS/AMSU data in partially cloudy areas. PPC–NN retrievals with partially cloudy AIRS/AMSU data over land were also performed. The PPC–NN retrieval technique is relatively insensitive to cloud amount, sensor scan angle, orbit type, and training set comprehensiveness. These results further suggest that the AIRS Level-2 algorithm that produced the cloud-cleared radiances and quality flags used by the PPC–NN retrieval is performing well. The high level of performance achieved by the PPC–NN algorithm suggests it would be a suitable candidate for the retrieval of geophysical parameters other than temperature and moisture from high-resolution spectral data. Potential applications include the retrieval of ozone profiles and trace gas amounts. Future work will involve further evaluation of the algorithm with simulated and observed partially cloudy data, including global radiosonde data and ground- and aircraft-based observations.


FIGURE 11.19 Water vapor (mass mixing ratio) retrieval performance of the PPC–NN, linear regression, and AIRS Level-2 methods over land. Statistics were calculated over 4,700 fields of regard.


FIGURE 11.20 Cumulative RMS temperature error in the layer closest to the surface. Pixels were ranked in order of increasing cloudiness according to the retrieved cloud fraction from the AIRS Level-2 algorithm. No retrievals were attempted by the AIRS Level-2 algorithm if the retrieved cloud fraction exceeded 80%.



FIGURE 11.21 Cumulative RMS water vapor error in the layer closest to the surface. Pixels were ranked in order of increasing cloudiness, according to the retrieved cloud fraction from the AIRS Level-2 algorithm. No retrievals were attempted by the AIRS Level-2 algorithm if the retrieved cloud fraction exceeded 80%.

Acknowledgments

The author would like to thank NOAA NESDIS, Office of Systems Development, and the National Aeronautics and Space Administration for financial support of this work. The colocated AIRS/AMSU/ECMWF data sets were provided by the AIRS Science Team, with assistance from M. Goldberg (NOAA NESDIS, Office of Research Applications). The author would also like to thank L. Zhou for help in obtaining and interpreting the AIRS/AMSU/ECMWF data sets, and D. Staelin and P. Rosenkranz for many useful discussions on all facets of this work. This work was supported by the National Oceanic and Atmospheric Administration under Air Force Contract F19628-00-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the author and are not necessarily endorsed by the United States Government.

References

1. H. Hotelling, The most predictable criterion, J. Educ. Psychol., vol. 26, pp. 139–142, 1935.
2. J. Escobar-Munoz, A. Chédin, F. Cheruy, and N. Scott, Réseaux de neurones multi-couches pour la restitution de variables thermodynamiques atmosphériques à l'aide de sondeurs verticaux satellitaires, Comptes-Rendus de l'Académie des Sciences, Série II, vol. 317, no. 7, pp. 911–918, 1993.


3. D.H. Staelin, Passive remote sensing at microwave wavelengths, Proc. IEEE, vol. 57, no. 4, pp. 427–439, 1969.
4. M.A. Janssen, Atmospheric Remote Sensing by Microwave Radiometry, New York, John Wiley & Sons, Inc., 1993.
5. C.D. Rodgers, Retrieval of atmospheric temperature and composition from remote measurements of thermal radiation, J. Geophys. Res., vol. 41, no. 7, pp. 609–624, 1976.
6. M.D. Goldberg, Y. Qu, L.M. McMillin, W. Wolff, L. Zhou, and M. Divakarla, AIRS near-real-time products and algorithms in support of operational numerical weather prediction, IEEE Trans. Geosci. Rem. Sens., vol. 41, no. 2, pp. 379–389, 2003.
7. H. Huang and P. Antonelli, Application of principal component analysis to high-resolution infrared measurement compression and retrieval, J. Appl. Meteorol., vol. 40, pp. 365–388, Mar. 2001.
8. F. Aires, W.B. Rossow, N.A. Scott, and A. Chédin, Remote sensing from the infrared atmospheric sounding interferometer instrument: 1. Compression, denoising, first-guess retrieval inversion algorithms, J. Geophys. Res., vol. 107, Nov. 2002.
9. W.J. Blackwell and D.H. Staelin, Cloud flagging and clearing using high-resolution infrared and microwave sounding data, IEEE Int. Geosci. Rem. Sens. Symp. Proc., June 2002.
10. J.B. Lee, A.S. Woodyatt, and M. Berman, Enhancement of high spectral resolution remote-sensing data by a noise-adjusted principal components transform, IEEE Trans. Geosci. Rem. Sens., vol. 28, pp. 295–304, May 1990.
11. W.J. Blackwell, Retrieval of cloud-cleared atmospheric temperature profiles from hyperspectral infrared and microwave observations, Ph.D. dissertation, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, June 2002.
12. K.M. Hornik, M. Stinchcombe, and H. White, Multilayer feedforward networks are universal approximators, Neural Netw., vol. 4, no. 5, pp. 359–366, 1989.
13. S. Haykin, Neural Networks: A Comprehensive Foundation, New York, Macmillan College Publishing Company, 1994.
14. D.E. Rumelhart, G. Hinton, and R. Williams, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations, D.E. Rumelhart and J.L. McClelland, Eds., Cambridge, MA, MIT Press, 1986.
15. M.T. Hagan and M.B. Menhaj, Training feedforward networks with the Marquardt algorithm, IEEE Trans. Neural Netw., vol. 5, pp. 989–993, Nov. 1994.
16. P.E. Gill, W. Murray, and M.H. Wright, The Levenberg–Marquardt method, in Practical Optimization, London, Academic Press, 1981.
17. C.M. Bishop, Neural Networks for Pattern Recognition, Oxford, Oxford University Press, 1995.
18. H.E. Motteler, L.L. Strow, L. McMillin, and J.A. Gualtieri, Comparison of neural networks and regression-based methods for temperature retrievals, Appl. Opt., vol. 34, no. 24, pp. 5390–5397, 1995.
19. F. Aires, A. Chédin, N.A. Scott, and W.B. Rossow, A regularized neural net approach for retrieval of atmospheric and surface temperatures with the IASI instrument, J. Appl. Meteorol., vol. 41, pp. 144–159, Feb. 2002.
20. F. Aires, W.B. Rossow, N.A. Scott, and A. Chédin, Remote sensing from the infrared atmospheric sounding interferometer instrument: 2. Simultaneous retrieval of temperature, water vapor, and ozone atmospheric profiles, J. Geophys. Res., vol. 107, Nov. 2002.
21. F.D. Frate and G. Schiavon, A combined natural orthogonal functions/neural network technique for the radiometric estimation of atmospheric profiles, Radio Sci., vol. 33, no. 2, pp. 405–410, 1998.
22. D. Nguyen and B. Widrow, Improving the learning speed of two-layer neural networks by choosing initial values of the adaptive weights, IJCNN, vol. 3, pp. 21–26, 1990.
23. J. Susskind, C.D. Barnet, and J.M. Blaisdell, Retrieval of atmospheric and surface parameters from AIRS/AMSU/HSB data in the presence of clouds, IEEE Trans. Geosci. Rem. Sens., vol. 41, no. 2, pp. 390–409, 2003.
24. J. Susskind et al., Accuracy of geophysical parameters derived from AIRS/AMSU as a function of fractional cloud cover, J. Geophys. Res., vol. 111, May 2006.

232

Signal and Image Processing for Remote Sensing


12 Satellite-Based Precipitation Retrieval Using Neural Networks, Principal Component Analysis, and Image Sharpening*

Frederick W. Chen

CONTENTS
12.1 Introduction ......................................................................... 233
12.2 Physical Basis of Passive Microwave Precipitation Sensing ............................ 234
12.3 Description of AMSU-A/B and AMSU/HSB ................................................ 240
12.4 Signal Processing .................................................................... 242
    12.4.1 Regional Laplacian Filtering ................................................... 242
    12.4.2 Principal Component Analysis ................................................... 243
        12.4.2.1 Basic PCA ................................................................ 244
        12.4.2.2 Constrained PCA .......................................................... 244
    12.4.3 Data Fusion .................................................................... 245
    12.4.4 Neural Nets .................................................................... 245
12.5 The Chen–Staelin Algorithm ........................................................... 247
    12.5.1 Limb-Correction of Temperature Profile Channels ................................ 249
    12.5.2 Detection of Precipitation ..................................................... 251
    12.5.3 Cloud-Clearing by Regional Laplacian Interpolation ............................. 252
    12.5.4 Image Sharpening ............................................................... 252
    12.5.5 Temperature and Water Vapor Profile Principal Components ....................... 254
    12.5.6 The Neural Net ................................................................. 254
12.6 Retrieval Performance Evaluation ..................................................... 255
    12.6.1 Image Comparisons of NEXRAD and AMSU-A/B Retrievals ............................ 255
    12.6.2 Numerical Comparisons of NEXRAD and AMSU-A/B Retrievals ........................ 256
    12.6.3 Global Retrievals of Rain and Snow ............................................. 259
12.7 Conclusions .......................................................................... 260
Acknowledgments ........................................................................... 261
References ................................................................................ 261

12.1 Introduction

Global estimation of precipitation is important to studies in areas such as atmospheric dynamics, hydrology, climatology, and meteorology. Improvements in methods for satellite-based estimation of precipitation can lead to improvements in weather forecasting, climate studies, and climate models. Precipitation presents many challenges because of its complicated physics and statistics. Existing models of clouds do not adequately account for all of the possible variations in clouds and precipitation that can be encountered. Instead of a physics-based approach, Chen and Staelin developed a method using a statistics-based approach. This method was developed for the Advanced Microwave Sounding Unit (AMSU) instruments AMSU-A/B on the NOAA-15, NOAA-16, and NOAA-17 satellites. The development process made use of the fact that although satellite-based passive microwave data cannot completely characterize the observed clouds and precipitation, the data can still yield useful information relevant to precipitation. This chapter discusses the methods used by Chen and Staelin to extract from the data information related to atmospheric state variables that are highly correlated with precipitation. Principal component analysis for signal separation and data compression, data fusion for resolution sharpening, neural nets, and regional filtering are among the signal and image processing methods used.

AMSU-A/B and the corresponding instruments AMSU/HSB (Humidity Sounder for Brazil) collect data near 23.8, 31.4, 54, 89, and 183 GHz. These frequency bands provide useful information about precipitation. In this chapter, the precipitation estimation algorithm developed by Chen and Staelin is discussed with an emphasis on the signal processing employed to sense important parameters about precipitation. Section 12.2 explains why it is possible to use the data from AMSU-A/B to estimate precipitation. Section 12.3 provides a more detailed description of AMSU-A/B and AMSU/HSB. Section 12.4 describes the signal processing methods used, and Section 12.5 describes the Chen–Staelin algorithm itself. Section 12.7 provides concluding remarks.

*The material in this chapter is taken from Refs. [66] and [67].

12.2 Physical Basis of Passive Microwave Precipitation Sensing

Matter radiates thermal energy depending on its physical temperature and characteristic properties. When a spaceborne radiometer observes a location on the Earth, the amount of energy it receives depends on contributions from the various atmospheric and topographical constituents within its field of view (Figure 12.1). One useful quantity describing the amount of thermal radiation emitted by a body is spectral brightness, a measure of how much energy a body radiates at a specified frequency per unit receiving area, per transmitting solid angle, and per unit frequency. The spectral brightness of a blackbody (W ster⁻¹ m⁻² Hz⁻¹) is a function of its physical temperature T (K) and frequency f (Hz) and is given by the following formula:

$$B(f, T) = \frac{2hf^3}{c^2\left(e^{hf/kT} - 1\right)} \qquad (12.1)$$

where h is Planck's constant (J·s), c is the speed of light (m/s), and k is Boltzmann's constant (J/K). For this chapter, f never exceeds 200 GHz and T never falls below 100 K, so hf/kT < 0.1. The Taylor series expansion of the exponential function can then be used to simplify Equation 12.1:

$$e^{hf/kT} - 1 = \left[1 + \frac{hf}{kT} + \frac{1}{2!}\left(\frac{hf}{kT}\right)^2 + \cdots\right] - 1 \qquad (12.2)$$

$$e^{hf/kT} - 1 \approx \frac{hf}{kT} \qquad (12.3)$$
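The low-frequency approximation is easy to verify numerically. The short sketch below (not from the chapter) evaluates hf/kT and the relative error of the linearization in Equation 12.3 at the worst case quoted above, f = 200 GHz and T = 100 K:

```python
import numpy as np

# Worst case considered in the chapter: highest frequency, lowest temperature.
h = 6.626e-34               # Planck's constant (J s)
k = 1.381e-23               # Boltzmann's constant (J/K)
f, T = 200e9, 100.0         # 200 GHz, 100 K

x = h * f / (k * T)         # expansion parameter hf/kT
# Relative error of the linearization e^x - 1 ~ x (Eq. 12.3)
rel_err = ((np.exp(x) - 1) - x) / (np.exp(x) - 1)
print(x, rel_err)
```

The expansion parameter comes out just below 0.1, and the approximation error stays at the few-percent level, which is why the Rayleigh–Jeans form is adequate throughout the microwave range used here.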

FIGURE 12.1 The major components of the radiative transfer equation (note: θ ≠ φ since the surface of the Earth is spherical).

Substituting Equation 12.3 into Equation 12.1 yields the Rayleigh–Jeans approximation:

$$B(f, T) = \frac{2kT}{\lambda^2} \qquad (12.4)$$

where λ is the wavelength (m) associated with the frequency f. For blackbodies, given an observation of spectral brightness, one can calculate the physical temperature as follows:

$$T = \frac{B(f, T)\,\lambda^2}{2k} \qquad (12.5)$$

Unlike blackbodies, gray bodies reflect some of the energy incident on them, so the intrinsic spectral brightness of a gray body is not equal to that of a blackbody. For a gray body,

$$B(f, T) = \varepsilon\,\frac{2kT}{\lambda^2} \qquad (12.6)$$

where ε is the emissivity of the gray body. This equation can be rewritten as follows:

$$B(f, T) = \frac{2k}{\lambda^2}\,(\varepsilon T) \qquad (12.7)$$

The quantity εT is called the brightness temperature of an unilluminated gray body; that is, the temperature of a blackbody radiating the same brightness. Another useful property of a gray body is its reflectivity r, the fraction of incident energy that is reflected. A body in thermal equilibrium emits the same amount of energy that it absorbs; therefore, r + ε = 1. For a blackbody, ε = 1 and r = 0. In an atmosphere without hydrometeors, the brightness temperature observed by an Earth-observing spaceborne radiometer can be divided into four components (Figure 12.1):

1. Ta, the brightness temperature due to radiation emitted by atmospheric gases that is not reflected off the surface


2. Tb, the brightness temperature due to radiation emitted by the surface
3. Tc, the brightness temperature due to radiation emitted by atmospheric gases that is reflected off the surface
4. Td, the brightness temperature due to cosmic background radiation that passes through the atmosphere twice and is reflected off the surface

Thermal radiation can be attenuated through absorption or reflected off the surface before being received by a radiometer. The observed brightness temperature is, therefore, a function of several variables such as the atmospheric temperature profile, the water vapor profile, the emissivity of the surface (as a function of satellite zenith angle), and the absorption coefficients of atmospheric gases (as a function of altitude z). Given the atmospheric absorption coefficients k_a(z) (in Np per unit length), the reflectivity of the surface r(θ), the altitude of the satellite H, the cosmic background temperature T_cosmic (in K), and the satellite zenith angle θ, the brightness temperature components can be computed as follows (assuming specular surface reflection and the absence of hydrometeors):

$$T_a = \sec\theta \int_{z_0}^{H} T(z')\,k_a(z')\,e^{-\tau(z',H)\sec\theta}\,dz' \qquad (12.8)$$

$$T_b = [1 - r(\theta)]\,T(z_0)\,e^{-\tau(z_0,H)\sec\theta} \qquad (12.9)$$

$$T_c = r(\theta)\sec\theta\,e^{-\tau(z_0,H)\sec\theta} \int_{z_0}^{H} T(z')\,k_a(z')\,e^{-\tau(z_0,z')\sec\theta}\,dz' \qquad (12.10)$$

$$T_d = r(\theta)\,T_{\mathrm{cosmic}}\,e^{-2\tau(z_0,H)\sec\theta} \qquad (12.11)$$

Then, the brightness temperature T_B measured by the radiometer is the sum of Ta, Tb, Tc, and Td:

$$T_B = \sec\theta \int_{z_0}^{H} T(z')\,k_a(z')\,e^{-\tau(z',H)\sec\theta}\,dz' + [1 - r(\theta)]\,T(z_0)\,e^{-\tau(z_0,H)\sec\theta}$$
$$\qquad +\, r(\theta)\sec\theta\,e^{-\tau(z_0,H)\sec\theta} \int_{z_0}^{H} T(z')\,k_a(z')\,e^{-\tau(z_0,z')\sec\theta}\,dz' + r(\theta)\,T_{\mathrm{cosmic}}\,e^{-2\tau(z_0,H)\sec\theta} \qquad (12.12)$$

where z₀ is the altitude of the surface and τ(z₁,z₂) is the integrated atmospheric absorption between altitudes z₁ and z₂:

$$\tau(z_1, z_2) = \int_{z_1}^{z_2} k_a(z)\,dz \qquad (12.13)$$

Equation 12.12 can be rewritten as follows (assuming that the contribution of thermal radiation from altitudes above H to Tc is negligible):

$$T_B \approx \int_{z_0}^{H} T(z')\,W(z')\,dz' + [1 - r(\theta)]\,T(z_0)\,e^{-\tau(z_0,H)\sec\theta} + r(\theta)\,T_{\mathrm{cosmic}}\,e^{-2\tau(z_0,H)\sec\theta} \qquad (12.14)$$
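The component equations can be checked with a simple discretization. The following sketch is illustrative only: the altitude grid, the isothermal profile, and the surface values are invented for the example, and the crude Riemann sums ignore refraction and the other effects a real retrieval code treats carefully. It evaluates Equations 12.8 through 12.12 on a discrete grid:

```python
import numpy as np

def brightness_temperature(z, T, ka, r, T_surf, theta, T_cosmic=2.73):
    """Sum the four components of Eqs. 12.8-12.11 on a discrete
    altitude grid z (km), with temperature profile T (K), absorption
    profile ka (Np/km), surface reflectivity r, surface temperature
    T_surf (K), and zenith angle theta (rad)."""
    sec = 1.0 / np.cos(theta)
    dz = np.gradient(z)
    n = len(z)
    tau_up = np.array([np.trapz(ka[i:], z[i:]) for i in range(n)])          # tau(z', H)
    tau_dn = np.array([np.trapz(ka[:i + 1], z[:i + 1]) for i in range(n)])  # tau(z0, z')
    tau0 = tau_up[0]                                                        # tau(z0, H)
    Ta = sec * np.sum(T * ka * np.exp(-tau_up * sec) * dz)                  # Eq. 12.8
    Tb = (1.0 - r) * T_surf * np.exp(-tau0 * sec)                           # Eq. 12.9
    Tc = r * sec * np.exp(-tau0 * sec) * np.sum(T * ka * np.exp(-tau_dn * sec) * dz)  # Eq. 12.10
    Td = r * T_cosmic * np.exp(-2.0 * tau0 * sec)                           # Eq. 12.11
    return Ta + Tb + Tc + Td                                                # Eq. 12.12

z = np.linspace(0.0, 50.0, 2001)
T = np.full_like(z, 260.0)
tb_clear = brightness_temperature(z, T, np.zeros_like(z), r=0.0, T_surf=288.0, theta=0.0)
tb_opaque = brightness_temperature(z, T, np.full_like(z, 0.2), r=0.0, T_surf=288.0, theta=0.0)
```

The two limiting cases behave as the equations predict: with k_a = 0 the radiometer sees only the surface emission, while a very opaque isothermal atmosphere radiates at roughly its own temperature.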

FIGURE 12.2 Zenith opacity (dB) for the microwave spectrum from 10 to 500 GHz, for the 1976 Standard Atmosphere and with water vapor removed, showing the 54-GHz, 118-GHz, and 425-GHz O2 bands and the 183-GHz H2O band.

where

$$W(z) = \sec\theta\,k_a(z)\,e^{-\tau(z,H)\sec\theta} + r(\theta)\sec\theta\,e^{-\tau(z_0,H)\sec\theta}\,k_a(z)\,e^{-\tau(z_0,z)\sec\theta} \qquad (12.15)$$

W(z) is called the weighting function or Jacobian. Equation 12.12 shows that the relationship between the thermal energy radiated and the physical temperature of a body is linear; that is, the brightness temperature can be expressed as a linear integral of the physical temperatures in the field of view, where these radiated signals are uncorrelated and superimposed [1]. The preceding equations show that the brightness temperatures seen by a satellite-borne radiometer depend on many variables in a highly complex and nonlinear fashion, so retrievals of temperature and water vapor profiles by direct inversion would be very difficult. However, the physics of the atmosphere still allows the extraction of useful information about the atmosphere from microwave frequency bands. Two of the most important determinants of precipitation rate are the temperature and water vapor profiles. The presence of oxygen and water vapor resonance frequencies in the microwave spectrum, and the presence of oxygen and water vapor in the atmosphere, result in frequency bands that are sensitive primarily to specific ranges of altitudes. Figure 12.2 shows the zenith opacity as a function of frequency from 10 to 500 GHz for a ground-based zenith-observing radiometer. There are absorption spikes around oxygen resonance frequencies, such as those in the neighborhood of 54, 118.75, and 424.76 GHz, and at water vapor resonance frequencies such as 183.31 GHz. A satellite-borne radiometer observing at these frequencies would sense only the highest layers of the atmosphere. On the other hand, a satellite-borne radiometer observing at frequencies away from the spikes would be sensitive to the surface. By observing at frequency bands that are near the resonance frequencies but still on the sides of the spikes, one can capture information on conditions at specific layers of the atmosphere. The AMSU focuses primarily on the 54-GHz oxygen band and the 183-GHz water vapor band.
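The shape of a weighting function, and hence the altitude a channel senses, can be explored numerically. The sketch below is illustrative: the exponential absorption profile, its 0.8 Np/km surface value, and the 8-km scale height are invented numbers, not an AMSU channel. It evaluates Equation 12.15 on a discrete grid and locates the altitude of peak sensitivity:

```python
import numpy as np

def weighting_function(z, ka, r, theta):
    """Discrete form of Eq. 12.15: direct plus surface-reflected terms."""
    sec = 1.0 / np.cos(theta)
    n = len(z)
    tau_up = np.array([np.trapz(ka[i:], z[i:]) for i in range(n)])          # tau(z, H)
    tau_dn = np.array([np.trapz(ka[:i + 1], z[:i + 1]) for i in range(n)])  # tau(z0, z)
    direct = sec * ka * np.exp(-tau_up * sec)
    reflected = r * sec * np.exp(-tau_up[0] * sec) * ka * np.exp(-tau_dn * sec)
    return direct + reflected

z = np.linspace(0.0, 50.0, 1000)          # altitude grid (km)
ka = 0.8 * np.exp(-z / 8.0)               # made-up exponential absorber (Np/km)
W = weighting_function(z, ka, r=0.3, theta=0.0)
peak_km = z[np.argmax(W)]                 # altitude of maximum sensitivity
```

For this fairly opaque profile the function peaks well above the surface (near 15 km), illustrating why channels of differing opacity sense different atmospheric layers; W(z) also integrates to roughly one, consistent with T_B in Equation 12.14 being approximately a weighted average of T(z).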
Figure 12.3 shows the weighting functions of the channels on AMSU. Opaque microwave channels have been used with great success to retrieve atmospheric conditions. Rosenkranz used AMSU-A and AMSU-B data from NOAA satellites to estimate temperature and water vapor profiles [2,3]. Shi used AMSU-A to estimate temperature

FIGURE 12.3 Weighting functions of AMSU-A and AMSU-B channels.

profiles [4]. Blackwell et al. used the 54-GHz and 118-GHz bands aboard the National Polar-Orbiting Environmental Satellite System (NPOESS) Aircraft Sounder Testbed-Microwave (NAST-M) [5]. Leslie et al. added the 183-GHz and 425-GHz bands to NAST-M [6–8].

Clouds and precipitation result from humid air that rises, cools, and condenses. Precipitation typically occurs in two forms: convective and stratiform. Stratiform precipitation occurs when one air mass slides under or over another, as in a cold or warm front, causing the upper air mass to cool and the water vapor within it to condense. Such precipitation is spread out. Convective precipitation is initiated by instabilities in the atmosphere caused by cold, dense air supported by warm, humid, and less dense air. Such instabilities result in the warm, humid air escaping upward and cooling. Water vapor in this ascending air mass condenses and releases latent heat. The latent heat warms the surrounding air, which pushes the air mass and the water and ice particles that have formed further upward. This cycle continues until the original warm, humid air mass has cooled to the temperature of the surrounding cooler air [9,10]. The tops of convective clouds spew forth ice particles and can reach more than 10 km above sea level [11]. Convective precipitation often occurs within stratiform precipitation.

Several factors affect the precipitation rate. Higher degrees of instability caused by large vertical temperature gradients force ice particles higher in the atmosphere, causing such particles to pick up more moisture and grow. Precipitation amounts are limited by the water vapor available in the atmosphere; therefore, higher concentrations of water vapor result in higher precipitation rates. Warmer surface air contributes to higher precipitation rates because warmer air holds more water vapor.
Channels in the 54-GHz and 183-GHz bands provide important clues about factors such as cloud-top altitude, temperature profile, water vapor profile, and particle size distribution. Because each channel is sensitive to a specific layer of the atmosphere


(Figure 12.3), it is possible to extract information about the cloud-top altitude. Precipitating clouds exhibit perturbations in channels whose weighting functions have significant values in the range of altitudes occupied by the cloud. For example, one would not expect to detect low-lying clouds below 3 km in AMSU-A channel 14 because the weighting function of channel 14 peaks at 40 km, far above the tops of nearly all precipitating clouds. The 54-GHz and 183-GHz bands together provide information about particle size distribution through their sensitivities to different ranges of ice particle diameters. The scattering of electromagnetic waves by spherical ice particles is described by Mie scattering coefficients. For a single spherical particle with radius r, given the power scattered by the particle Ps and the power density of the incident plane wave Si, the scattering cross-sectional area Qs of the particle is defined as follows:

$$Q_s = \frac{P_s}{S_i} \qquad (12.16)$$

Then, the Mie scattering coefficient is defined as the ratio of Qs to the geometric cross-sectional area of the particle:

$$\xi_s = \frac{Q_s}{\pi r^2} \qquad (12.17)$$

Figure 12.4 shows the Mie scattering coefficients for fresh-water ice spheres at 54 GHz and 183.31 GHz as a function of diameter. The permittivities of ice were calculated using formulas developed by Hufford [12], and the Mie scattering coefficients were calculated using an iterative computational procedure developed by Deirmendjian [1,13]. At 183.31 GHz, the diameter of a particle can increase to about 0.7 mm before its diameter can no longer be uniquely determined from the Mie scattering coefficient curve. At 54 GHz, this limit is about 2.4 mm. For diameters less than 0.7 mm, ξs at 54 GHz is smaller than that at 183.31 GHz by a factor of at least

FIGURE 12.4 Mie scattering coefficients for fresh-water ice spheres at 54 GHz and 183.31 GHz as a function of drop diameter, at a temperature of 218 K (−55°C).

100. These differences make it possible to use both bands together to extract information about particle size distributions. The Mie scattering coefficients in Figure 12.4 were calculated for a temperature of −55°C (218 K), near the temperature at an altitude of 10 km for the 1962 U.S. Standard Atmosphere [1]. Corresponding values for −10°C (263 K) were also calculated, but did not differ from those for −55°C by more than 7.2%.

In addition to particle size distribution, the 54-GHz and 183-GHz bands can also provide information about particle abundance. While the particle size distribution can be sensed from a comparison of the Mie scattering efficiencies in both bands, particle abundance can be sensed from absolute scattering over volumes. The cloud-top altitude is another important variable in precipitation. It is correlated with particle size because only higher updraft velocities are able to support larger particles and reach higher altitudes. The sensitivities of the 54-GHz and 183-GHz channels to specific layers of the atmosphere suggest that these channels are able to provide information about cloud-top altitude. Spina et al. used data from the opaque 118-GHz band to estimate cloud-top altitudes [14]. Gasiewski showed that the 54-GHz and 118-GHz bands together could be used to estimate cell-top altitude and hydrometeor density (in terms of mass per volume) [15,16].

Window channels contribute information about precipitation through their sensitivity to the warm emission signatures of precipitating clouds against a sea background and their sensitivity to scattering. In addition to opaque channels, AMSU also includes window channels at 23.8 GHz, 31.4 GHz, 89.0 GHz, and 150 GHz. Weng et al. and Grody et al. have developed a precipitation-retrieval algorithm for AMSU that relies primarily on these channels [17,18].
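The order-of-magnitude gap between the two curves in Figure 12.4 at small diameters follows from the Rayleigh limit, in which the scattering efficiency of a fixed small particle scales as the fourth power of frequency. The two-line check below is an approximation under that assumption, not a substitute for the full Mie computation used for the figure:

```python
# Rayleigh limit: scattering efficiency ~ x^4 with x = 2*pi*r/lambda,
# so for the same small ice sphere the 54-GHz efficiency is smaller
# than the 183.31-GHz efficiency by roughly (183.31/54)^4.
ratio = (54.0 / 183.31) ** 4   # xi_s(54 GHz) / xi_s(183.31 GHz)
factor = 1.0 / ratio           # comes out above 100, as stated in the text
```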
Some of the recent passive microwave instruments that have focused on window channels and have been used to study precipitation include the following:

1. The Special Sensor Microwave Imager (SSM/I) for the Defense Meteorological Satellite Program (DMSP) [19–21]
2. The Advanced Microwave Scanning Radiometer for EOS (AMSR-E) aboard the NASA Aqua satellite of the Earth Observing System [22,23]
3. The Tropical Rainfall Measuring Mission (TRMM) Microwave Imager (TMI) aboard the TRMM satellite [24,25]

One weakness of window channels is that they tend to be sensitive to the surface. Over land, surface signatures can obscure the emission signatures of precipitation. Also, window-channel-based precipitation-rate retrieval algorithms tend to use one method over ocean and another over land [26,27]. The opaque channels aboard AMSU enable the development of an algorithm that uses the same method over both land and sea.

12.3 Description of AMSU-A/B and AMSU/HSB

The AMSU has been aboard the NOAA-15, NOAA-16, and NOAA-17 satellites, launched in May 1998, September 2000, and June 2002, respectively. They are each equipped with the instruments AMSU-A and AMSU-B. AMSU-A has 15 channels: one each at 23.8 GHz, 31.4 GHz, and 89.0 GHz, and 12 channels in the 54-GHz oxygen absorption band (Table 12.1). AMSU-B has five channels at the following frequencies: 89.0 GHz, 150 GHz, 183.31 ± 1 GHz, 183.31 ± 3 GHz, and 183.31 ± 7 GHz (Table 12.2). AMSU-A and AMSU-B measure

TABLE 12.1
AMSU-A Channel Frequencies

Channel   Center Frequency (MHz)          Bandwidth (MHz)
1         23,800 ± 72.5                   2 × 125
2         31,400 ± 50                     2 × 80
3         50,300 ± 50                     2 × 80
4         52,800 ± 105                    2 × 190
5         53,596 ± 115                    2 × 168
6         54,400 ± 105                    2 × 190
7         54,940 ± 105                    2 × 190
8         55,500 ± 87.5                   2 × 155
9         57,290.344 ± 87.5               2 × 155
10        57,290.344 ± 217                2 × 77
11        57,290.344 ± 322.2 ± 48         4 × 35
12        57,290.344 ± 322.2 ± 22         4 × 15
13        57,290.344 ± 322.2 ± 10         4 × 8
14        57,290.344 ± 322.2 ± 4.5        4 × 3
15        89,000 ± 900                    2 × 1000

(Bandwidth entries give the number of passbands × the bandwidth per passband.)

brightness temperatures at 50 and 15 km nominal resolutions at nadir, respectively. AMSU-A has a 3.33°-diameter 3-dB beamwidth and observes at 30 angles spaced at 3.33° intervals up to 48.33° from nadir every 8.00 sec. AMSU-B has a 1.1°-diameter beamwidth and observes at 90 angles spaced at 1.1° intervals up to 48.95° from nadir every 2.67 sec (Figure 12.5b) [28–30]. AMSU covers a swath width of 2200 km. NOAA-15, NOAA-16, and NOAA-17 are sun-synchronous polar-orbiting satellites with equatorial crossing times of about 7 A.M./P.M., 2 A.M./P.M., and 10 A.M./P.M., respectively, so together they observe most locations approximately six times a day (Figure 12.6). The ascending local equatorial crossing times are 7 P.M., 2 P.M., and 10 P.M. for NOAA-15, NOAA-16, and NOAA-17, respectively. The 15-km, 89.0-GHz channel on AMSU-B was not used in this research so that the algorithm developed for AMSU-A/B could also be used with AMSU/HSB with minimal adjustment.

We also use data from the NASA Aqua satellite, which was launched in May 2002. It is equipped with AMSU, which is identical to AMSU-A aboard the NOAA satellites, and HSB, which is identical to AMSU-B but without the 89.0-GHz channel [31]. The nominal resolutions of AMSU and HSB are 40.5 and 13.5 km, respectively. Aqua has an equatorial crossing time of about 1:30 A.M./P.M. (Figure 12.6) [31–33]. The scan pattern of AMSU/HSB is slightly different from that of AMSU-A/B on the NOAA satellites in that the path traced during one AMSU scan is more parallel to that traced by a nearly coincident HSB scan (Figure 12.5).

TABLE 12.2
AMSU-B Channel Frequencies

Channel   Center Frequency (GHz)   Bandwidth (GHz)
1         150 ± 0.9                2 × 1
2         183.31 ± 1               2 × 0.5
3         183.31 ± 3               2 × 1
4         183.31 ± 7               2 × 2


FIGURE 12.5 Scan patterns of (a) Aqua AMSU/HSB and (b) AMSU-A/B on NOAA-15, NOAA-16, and NOAA-17. AMSU and AMSU-A spots are labeled with +'s and HSB and AMSU-B spots with dots.

12.4 Signal Processing

The preceding section provides an overview of the types of information available in data from AMSU-A/B. While developing an algorithm for estimating precipitation, it is important to process the data in a way that makes the information relevant to precipitation stand out as much as possible. This section provides an overview of the signal-processing methods used in the Chen and Staelin algorithm.

12.4.1 Regional Laplacian Filtering

Laplacian filtering is useful for clearing the effects of clouds over regions that are identified as cloudy. This enables computation not only of cloud-cleared brightness temperatures but also of the perturbation due to precipitation. Laplacian filtering gives the Chen–Staelin algorithm a spatial filtering component not found in other algorithms, which process the data in a manner that treats each pixel independently of any other pixel. Laplacian filtering of a rectangular field F is done by first determining the region over which filtering is to be done. Using this region, a set of boundary conditions is determined. Then, using these boundary conditions, values for pixels in the region of interest are computed so that the discrete Laplace's equation is satisfied. In a continuous two-dimensional (2D) domain, Laplace's equation is

$$\nabla^2 F = 0 \qquad (12.18)$$
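A minimal version of this interpolation can be written directly from the discrete Laplace equation: pixels flagged as cloudy are repeatedly replaced by the average of their four neighbors until convergence, with the surrounding cloud-free pixels acting as boundary conditions. The sketch below is illustrative (plain Jacobi relaxation on an interior mask; a production version would handle masks touching the image edge and use a faster solver):

```python
import numpy as np

def laplacian_interpolate(field, mask, n_iter=5000):
    """Fill the pixels where mask is True so that the discrete
    Laplace equation holds there, using the unmasked pixels as
    boundary conditions (simple Jacobi relaxation)."""
    f = field.astype(float).copy()
    f[mask] = f[~mask].mean()        # crude initial guess for the hole
    for _ in range(n_iter):
        # Four-neighbor average; at convergence the discrete
        # Laplacian is zero on the masked region.
        avg = 0.25 * (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
                      np.roll(f, 1, 1) + np.roll(f, -1, 1))
        f[mask] = avg[mask]
    return f
```

Because harmonic fields are reproduced exactly, filling a hole cut from a smooth brightness-temperature gradient recovers the original values, while a warm precipitation perturbation inside the hole is smoothed away, which is the cloud-clearing behavior wanted here.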

Satellite-Based Precipitation Retrieval Using Neural Networks

243

90⬚N 75⬚N 60⬚N 45⬚N

NOAA-15 7 A.M

30⬚N

NOAA-17

15⬚N 90⬚W

Aqua

10 A.M 60⬚W

0⬚

−16

90 k

30⬚W 15⬚S −2200

km

30⬚S

1:30 P.M

m

2 P.M

0⬚ 30⬚E

45⬚S

60⬚E NOAA-16

60⬚S

90⬚E

75⬚S 90⬚S FIGURE 12.6 Orbital patterns of the NOAA-15, NOAA-16, NOAA-17, and Aqua satellites.

12.4.2 Principal Component Analysis

Signal separation is an important concept in this chapter. Although precipitation is very complex, nonstationary, and sporadic, one can process the data in a way that adequately separates the different degrees of freedom that affect precipitation. Principal component analysis (PCA) is a linear method for reducing the dimensionality of a data set of interrelated variables. PCA transforms the data into a set of uncorrelated random variables that capture all of the variance of the original data set and assigns as much variance as possible to the fewest variables. PCA is also known by other names such as the singular value decomposition (SVD) and the Karhunen–Loève transform (KLT).

PCA is useful for data compression. This feature can be critical in problems related to the compression of satellite data, where a large amount of data must be downloaded from a satellite using a communications link with limited bandwidth, and in situations where computational resources are limited. Blackwell used the projected principal component transform, a variant of PCA, to compress data for use in estimating atmospheric temperature profiles. This reduced the number of inputs and, as a result, simplified the neural net, created a more stable neural net, and reduced the training time [34,35]. Cabrera-Mercader used noise-adjusted principal components (NAPC) to compress simulated NASA Atmospheric Infrared Sounder (AIRS) data [36].


PCA is also used to filter out noise in data. Several different versions or extensions of PCA have been used to eliminate various types of noise. A variant of PCA has been used to remove signatures of the surface from passive microwave data for the purpose of detecting and characterizing precipitating clouds [37]. PCA is also an important part of blind multivariate noise estimation and filtering algorithms such as iterative order and noise (ION) estimation and an extension of ION that is capable of estimating mixing matrices [38–40].

12.4.2.1 Basic PCA

Here, a definition of basic PCA is presented. For a random vector x of p random variables, the principal components of x can be defined inductively. The first principal component is the product a₁ᵀx, where a₁ is a unit-length vector that maximizes the variance of a₁ᵀx:

$$\mathbf{a}_1 = \arg\max_{\|\mathbf{a}\| = 1} \operatorname{Var}(\mathbf{a}^T \mathbf{x}) \qquad (12.19)$$

where Var(·) denotes the variance of a random variable. Each of the other principal components is defined as follows: the nth principal component is the product aₙᵀx, where aₙ is a unit-length vector that is orthogonal to a₁, a₂, . . . , aₙ₋₁ and maximizes the variance of aₙᵀx:

$$\mathbf{a}_n = \arg\max_{\|\mathbf{a}\| = 1;\ \mathbf{a} \perp \mathbf{a}_i,\ \forall i \in \{1, 2, \ldots, n-1\}} \operatorname{Var}(\mathbf{a}^T \mathbf{x}) \qquad (12.20)$$

a₁, a₂, . . . , aₚ are derived in Ref. [41]: the nth principal component is aₙᵀx, where aₙ is the eigenvector associated with the nth highest eigenvalue of the covariance matrix Λx of x. In this chapter, the definitions of PCA and the term principal component follow the convention of Ref. [41].
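The eigenvector characterization above translates directly into code. This illustrative NumPy sketch computes the directions a₁, . . . , aₚ from the sample covariance and the corresponding principal-component scores:

```python
import numpy as np

def principal_components(X):
    """Rows of X are observations.  Returns the matrix A whose columns
    are a_1, a_2, ..., a_p (eigenvectors of the sample covariance,
    ordered by decreasing eigenvalue) and the component scores Xc A."""
    Xc = X - X.mean(axis=0)                  # remove the sample mean
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1]        # eigh returns ascending order
    A = eigvecs[:, order]
    return A, Xc @ A
```

The returned scores are mutually uncorrelated and their variances (the eigenvalues) are non-increasing, matching the maximization in Equations 12.19 and 12.20.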

12.4.2.2 Constrained PCA

In addition to basic PCA, the algorithm developed in this chapter uses a variation of PCA known as constrained PCA. This form of PCA finds principal components that are constrained to be orthogonal to a given subspace [41]. It can be used to filter out noise in remote-sensing data. Constrained PCA has been used to filter out signatures of surface variations from microwave remote-sensing data [42]. Filtering out unwanted signatures involves the following steps:

1. Select a set of data that captures a good representation of the type of noise to be filtered without capturing too much of the variation of the signal of interest.
2. Apply basic PCA to the resulting subset of data.
3. Examine the resulting principal components for sensitivity to the type of noise to be filtered out.
4. Project the data onto the subspace orthogonal to the noise-sensitive principal components.
5. Apply basic PCA to the data resulting from the projection.

The principal components resulting from steps (2) and (5) are called preconstraint principal components and postconstraint principal components, respectively, as in Ref. [37].
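The five steps can be sketched in a few lines of NumPy. In this illustration, `noise_sample` stands for the subset chosen in step (1) and `n_noise` for the number of noise-sensitive components identified by inspection in step (3); both are assumptions of the example rather than anything prescribed by the chapter:

```python
import numpy as np

def constrained_pca(X, noise_sample, n_noise):
    """Postconstraint PCA: project out the leading directions of a
    noise-dominated data subset, then apply basic PCA to the result."""
    # Step 2: basic PCA on the noise subset (preconstraint components).
    Nc = noise_sample - noise_sample.mean(axis=0)
    w, V = np.linalg.eigh(np.cov(Nc, rowvar=False))
    noise_dirs = V[:, np.argsort(w)[::-1][:n_noise]]
    # Step 4: project the data onto the orthogonal complement.
    P = np.eye(X.shape[1]) - noise_dirs @ noise_dirs.T
    Xp = (X - X.mean(axis=0)) @ P
    # Step 5: basic PCA on the projected data (postconstraint components).
    w2, V2 = np.linalg.eigh(np.cov(Xp, rowvar=False))
    return V2[:, np.argsort(w2)[::-1]], Xp
```

With a synthetic data set whose unwanted variance lies along one axis, the leading postconstraint component aligns with the remaining signal axis, which is the intended effect of the constraint.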

12.4.3 Data Fusion

Data fusion is a very broad area involving the combination of information from different sources. A working group set up by the European Association of Remote Sensing Laboratories and the French Society for Electricity and Electronics has adopted the following definition of data fusion [42]:

    Data fusion is a formal framework in which are expressed means and tools for the alliance of data originating from different sources. It aims at obtaining information of greater quality; the exact definition of 'greater quality' will depend upon the application.

Review papers have referred to three levels of data fusion: measurement, feature, and decision [43–45]. The measurement level is sometimes called the pixel level [44]. The algorithm described in Section 12.5 involves the measurement and decision levels. In this chapter, nontrivial uses of data fusion occur only at the measurement level. Some of the applications of image fusion (or data fusion applied to 2D data) include image sharpening or enhancement [45], feature enhancement, and replacement of missing or faulty data [44]. For this research, nonlinear data fusion is applied to sharpen images. Rosenkranz developed a method for nonlinear geophysical parameter estimation through multi-resolution data fusion [46–48].
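A toy example of measurement-level image fusion for sharpening: upsample the coarse-resolution channel and add the high-spatial-frequency detail of a co-located fine-resolution channel. The 3× resolution ratio and the simple block-average operations are choices made for this illustration; this is not the Rosenkranz estimator cited above:

```python
import numpy as np

def sharpen(coarse, fine, ratio=3):
    """Upsample `coarse` by `ratio` and add the detail of `fine`,
    defined as `fine` minus its own block-averaged (coarse-scale)
    version, so low spatial frequencies come from `coarse` and high
    frequencies from `fine`."""
    up = np.kron(coarse, np.ones((ratio, ratio)))          # nearest-neighbor upsample
    h, w = fine.shape
    blocks = fine.reshape(h // ratio, ratio, w // ratio, ratio)
    fine_coarse = blocks.mean(axis=(1, 3))                 # fine channel at coarse scale
    detail = fine - np.kron(fine_coarse, np.ones((ratio, ratio)))
    return up + detail
```

If the fine channel is perfectly correlated with the scene, the block means of the sharpened image reproduce the coarse measurements while the fine-scale structure is restored.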

12.4.4 Neural Nets

Neural nets are computational structures that were developed to mimic the way biological neural nets learn from their environment; they are useful for pattern recognition and classification. Neural nets can be used to learn and compute functions for which the relationships between inputs and outputs are unknown or computationally complex. There are a variety of neural nets, such as feedforward neural nets (sometimes called multilayer perceptrons [49]), Kohonen self-organizing feature maps, and Hopfield nets [50,51]. The feedforward neural net is used in this chapter. The basic structural element of feedforward neural nets is called a perceptron. It computes a function of the weighted sum of inputs and a bias, as shown in Figure 12.7:

y = f\left( \sum_{i=1}^{n} w_i x_i + b \right)   (12.21)

where x_i is the ith input, w_i is the weight associated with the ith input, b is the bias, f is the transfer function of the perceptron, and y is the output.
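Equation (12.21) can be sketched directly in code; the function name and the default tanh transfer function (defined later in Equation (12.23)) are our choices:

```python
import numpy as np

def perceptron(x, w, b, f=np.tanh):
    """Single perceptron of Eq. (12.21): y = f(sum_i w_i * x_i + b).

    x, w: length-n input and weight vectors; b: bias; f: transfer function.
    """
    return f(np.dot(w, x) + b)
```

With an identity transfer function the perceptron reduces to an affine combination of its inputs, which is the linear output node used later in the chapter.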

FIGURE 12.7 The structure of a perceptron.

Signal and Image Processing for Remote Sensing

FIGURE 12.8 A two-layer feedforward neural net with one output node.

Perceptrons can be combined to form a multi-layer network, as shown in Figure 12.8. In Figure 12.8, x_i is the ith input, n is the number of inputs, w_{ij} is the weight associated with the connection from the ith input to the jth node in the hidden layer, b_j is the bias of the jth hidden node, m is the number of nodes in the hidden layer, f is the transfer function of the perceptrons in the hidden layer, v_j is the weight between the jth hidden node and the output node, c is the bias of the output node, g is the transfer function of the output node, and y is the output. Then,

y = g\left( \sum_{j=1}^{m} v_j \, f\left( \sum_{i=1}^{n} w_{ij} x_i + b_j \right) + c \right)   (12.22)

In this chapter, f and g are defined as follows:

f(x) = \tanh x = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}   (12.23)

g(x) = x   (12.24)

The function tanh x is approximately linear in the range −0.6 ≤ x ≤ 0.6, and approaches 1 as x → ∞ and −1 as x → −∞, so its nonlinearity is not too complex (Figure 12.9). This neural net topology is well suited to situations in which one wants to develop a simple nonlinear estimator whose output depends approximately monotonically on each input.
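Equations (12.22) through (12.24) combine into a short sketch of the two-layer network of Figure 12.8; the function name and argument shapes are our conventions:

```python
import numpy as np

def two_layer_net(x, W, b, v, c):
    """Two-layer feedforward net of Eq. (12.22):
    y = g(sum_j v_j * f(sum_i w_ij * x_i + b_j) + c),
    with f = tanh (Eq. 12.23) and linear output g(x) = x (Eq. 12.24).

    x: (n,) inputs; W: (n, m) hidden weights; b: (m,) hidden biases;
    v: (m,) output weights; c: scalar output bias.
    """
    hidden = np.tanh(np.asarray(x) @ np.asarray(W) + b)  # m tanh hidden nodes
    return float(np.dot(v, hidden) + c)                  # linear output node
```

Because each hidden node is monotonic in each input, the whole network inherits the approximately monotonic behavior described above when the weights share consistent signs.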

Satellite-Based Precipitation Retrieval Using Neural Networks


FIGURE 12.9 Neural net transfer functions: tanh(x) (solid) and x (dashed).

The neural nets for this chapter were trained using the Levenberg–Marquardt training algorithm. Marquardt developed an efficient algorithm (called the Marquardt–Levenberg algorithm in Ref. [51]) for nonlinear least-squares parameter estimation [52]. Hagan and Menhaj incorporated this algorithm into a backpropagation training algorithm for feedforward neural nets [52]. The weights of the neural net were initialized using the Nguyen–Widrow method to facilitate convergence of the neural net weights during training [53]. The vectors used to train and evaluate the neural nets were divided into three disjoint sets:

1. The training set, used to determine how the weights of the neural net should be adjusted during training
2. The validation set, used to determine when the training should stop
3. The testing set, used to evaluate the resulting neural net

These definitions are from Ref. [54]. One of the challenges encountered in the course of developing an estimator was dealing with an output range that covered several orders of magnitude. Chapter 4 in this volume describes how this was accomplished.
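The three-way split can be sketched as follows; the half/quarter/quarter proportions are those reported later in Section 12.5.6, and the function name and seed are our choices:

```python
import numpy as np

def split_sets(n_samples, seed=0):
    """Partition sample indices into disjoint training (1/2), validation (1/4),
    and testing (1/4) sets. The validation set is monitored during training to
    decide when to stop; the testing set evaluates the final net."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    n_tr, n_va = n_samples // 2, n_samples // 4
    return idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
```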

12.5 The Chen–Staelin Algorithm

The basic structure of the algorithm includes some signal-processing components and a neural net, as shown in Figure 12.10. The signal processing components process the data

FIGURE 12.10 Basic structure of the algorithm.

into forms that characterize the most important degrees of freedom related to the precipitation rate, such as atmospheric temperature profile, water vapor profile, cloud-top altitude, particle size distribution, and vertical updraft velocity. The neural net is trained to learn the nonlinear dependencies of precipitation rate on these variables. The dependence of the precipitation rate on these variables should be monotonic, so the neural net does not need to be complicated. A feedforward neural net with one hidden layer of tangent sigmoid nodes (with transfer function f(x) = tanh x) and one linear output node should be sufficient (Figure 12.8) [49]. The Chen–Staelin algorithm uses the signal-processing methods in the previous section to extract the most relevant information from AMSU data. Figure 12.11 shows a block diagram of the first part of the algorithm and Figure 12.12 shows the final part of the algorithm with a neural net. A neural net that takes the following sets of inputs is at the heart of the algorithm:

1. Inferred 15-km-resolution perturbations at 52.8 GHz, 53.6 GHz, 54.4 GHz, 54.9 GHz, and 55.5 GHz.

FIGURE 12.11 Block diagram of the algorithm, part 1.


2. 183 ± 1-, 183 ± 3-, and 183 ± 7-GHz 15-km HSB data.


FIGURE 12.12 Block diagram of the algorithm, part 2.

3. The leading three principal components characterizing the original five corrected 50-km AMSU-A temperature radiances.
4. Two surface-insensitive principal components that characterize the window channels at 23.8 GHz, 31.4 GHz, 50 GHz, and 89 GHz, along with the four HSB channels.
5. The secant of the satellite zenith angle u.

Each of these sets provides the neural net with information that is relevant to precipitation. The three principal components characterizing AMSU-A temperature radiances provide information on the atmospheric temperature profile, which is important because it determines how much water vapor can be precipitated. The secant of the satellite zenith angle u allows the neural net to account for variations in the data due to the scan angle. The 15-km cloud-induced perturbations provide information on the cloud-top altitude. The current AMSU/HSB precipitation retrieval algorithm is based on NOAA-15 AMSU comparisons with NEXRAD over the eastern United States during 38 orbits that exhibited significant precipitation and were distributed throughout the year. These orbits are listed in Table 12.3. The primary precipitation-rate retrieval products of AMSU/HSB are 15- and 50-km-resolution contiguous retrievals over the viewing positions of HSB and AMSU, respectively, within 43° of nadir. The two outermost 50-km and six outermost 15-km viewing positions on each side of the swath are omitted due to their grazing angles. The algorithm architectures for these two retrieval methods and the derivation of the numerical coefficients characterizing the neural network are described and presented below.

12.5.1 Limb-Correction of Temperature Profile Channels

AMSU observes at angles up to 49° away from nadir. For angles further away from nadir, the electromagnetic energy originating from a given altitude and atmospheric state

TABLE 12.3 List of Rainy Orbits Used for Training, Validation, and Testing

October 16, 1999, 0030 UTC; October 31, 1999, 0130 UTC; November 2, 1999, 0045 UTC; December 4, 1999, 1445 UTC; December 12, 1999, 0100 UTC; January 28, 2000, 0200 UTC; January 31, 2000, 0045 UTC; February 14, 2000, 0045 UTC; February 27, 2000, 0045 UTC; March 11, 2000, 0100 UTC; March 17, 2000, 0015 UTC; March 17, 2000, 0200 UTC; March 19, 2000, 0115 UTC; April 2, 2000, 0100 UTC; April 4, 2000, 0015 UTC; April 8, 2000, 0030 UTC; April 12, 2000, 0045 UTC; April 12, 2000, 0215 UTC; April 20, 2000, 0100 UTC; April 30, 2000, 1430 UTC; May 14, 2000, 0030 UTC; May 19, 2000, 0015 UTC; May 19, 2000, 0145 UTC; May 20, 2000, 0130 UTC; May 25, 2000, 0115 UTC; June 10, 2000, 0200 UTC; June 16, 2000, 0130 UTC; June 30, 2000, 0115 UTC; July 4, 2000, 0115 UTC; July 15, 2000, 0030 UTC; August 1, 2000, 0045 UTC; August 8, 2000, 0145 UTC; August 18, 2000, 0115 UTC; August 23, 2000, 1315 UTC; September 23, 2000, 1315 UTC; October 5, 2000, 0130 UTC; October 6, 2000, 0100 UTC; October 14, 2000, 0130 UTC

has to travel longer paths before reaching the radiometer and is therefore subject to more absorption and scattering. This results in scan-angle-dependent effects in brightness temperature images, as shown in Figure 12.13a. A limb and surface correction method for AMSU-A channels 4–8 brightness temperatures is needed to make precipitation-induced perturbations more apparent and to extract information about atmospheric conditions. AMSU-A channels 4 and 5 are corrected for surface variations, as they are sensitive to the surface. For these two channels, the brightness temperature for pixels over ocean is corrected to what might be observed for the same atmospheric conditions over land. AMSU-A channels 9–14 brightness temperatures are not corrected because they are not significantly perturbed by clouds and therefore are not used for anything other than limb correction. Limb and surface correction was done by training a neural net of the type shown in Figure 12.8 to estimate nadir-viewing brightness temperatures. For each pixel, the neural net used brightness temperatures from several channels at that pixel to estimate the brightness temperature seen at the pixel closest to nadir at a nearly identical latitude and at nearly the same time. It is assumed that the temperature field does not vary significantly over one scan. Limb and surface correction was done for AMSU-A channels 4–8. The data used to correct each of these channels are listed in Table 12.4. No attempt has been made to correct for the scan-angle-dependent asymmetry in the brightness temperatures. These neural nets were trained using data between 55°N and 55°S from seven orbits spaced over 1 year. Channels 4 and 5 are surface sensitive, so they were trained to estimate brightness temperatures that would be seen over land. Figure 12.13 shows a sample of (a) uncorrected and (b) limb and surface corrected 54.4-GHz brightness temperatures.
The shapes of precipitation systems over Texas and the Mexico–Guatemala border are more apparent after the limb correction. In Figure 12.13(a), the difference between brightness temperatures at nadir and the swath edge is as high as 18 K. In Figure 12.13(b), the angle-dependent variation is less than 3 K. One limitation of the training is that one cannot really know what the nadir-viewing brightness temperature is supposed to be when there is precipitation.

FIGURE 12.13 (See color insert following page 302.) NOAA-15 AMSU-A 54.4-GHz brightness temperatures for a northbound track on September 13, 2000. (a) Uncorrected and (b) limb and surface corrected.
12.5.2 Detection of Precipitation

The 15-km-resolution precipitation-rate retrieval algorithm, summarized in Figure 12.11 and Figure 12.12, begins with the identification of potentially precipitating pixels. The neural net operates only on data from FOVs labeled as potentially precipitating. This choice eliminates the need to exhaustively learn all of the conditions where the precipitation rate is exactly zero; the neural net would also be likely to have difficulty forcing precipitation rates in nonprecipitating FOVs to be exactly zero. This choice also reduces the time needed to train the neural net since, at any given time, precipitation falls over less than 10% of the Earth's surface. All 15-km pixels with brightness temperatures at 183 ± 7 GHz that are below a threshold T7 are flagged as potentially precipitating, where

TABLE 12.4 Data Used in Limb and Surface Correction of AMSU-A Channels

AMSU-A Channel | Inputs Used for Limb and Surface Correction
4 | AMSU-A channels 4–12, land/sea flag, cos w
5 | AMSU-A channels 5–12, land/sea flag, cos w
6 | AMSU-A channels 6–12, cos w
7 | AMSU-A channels 6–12, cos w
8 | AMSU-A channels 6–12, cos w
T_7 = 0.667\,(T_{53.6} - 248) + 262 + 6\cos u   (12.25)

and where u is the satellite zenith angle and T53.6 is the spatially filtered 53.6-GHz brightness temperature obtained by selecting the warmest brightness temperature within a 7 × 7 array of AMSU-B pixels. If, however, T53.6 is below 248 K, then the brightness temperature at 183 ± 3 GHz is compared, instead, to a different threshold T3, where

T_3 = 242.5 + 5\cos u   (12.26)
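The two-threshold test of Equations (12.25) and (12.26), together with the 242-K cutoff described in the following paragraph, can be sketched as follows (function and argument names are ours):

```python
import math

def flag_potential_precip(tb_183p7, tb_183p3, t_53p6, zenith_rad):
    """Flag a 15-km pixel as potentially precipitating (Eqs. 12.25-12.26).

    tb_183p7, tb_183p3: brightness temperatures (K) at 183 +/- 7 and +/- 3 GHz;
    t_53p6: locally filtered (warmest-in-7x7) 53.6-GHz brightness temperature (K);
    zenith_rad: satellite zenith angle u in radians.
    """
    if t_53p6 < 242.0:
        return False                               # assumed not precipitating
    if t_53p6 < 248.0:                             # very cold/dry atmosphere:
        t3 = 242.5 + 5.0 * math.cos(zenith_rad)    # use the 183 +/- 3 GHz test
        return tb_183p3 < t3
    t7 = 0.667 * (t_53p6 - 248.0) + 262.0 + 6.0 * math.cos(zenith_rad)
    return tb_183p7 < t7
```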

The 183 ± 3-GHz band is used to flag potential precipitation when the 183 ± 7-GHz flag could be erroneously set by low surface emissivity in very cold and dry atmospheres, as indicated by T53.6. These thresholds T7 and T3 are slightly colder than a saturated atmosphere would be, implying the presence of a microwave-absorbing cloud. If the locally filtered T53.6 is less than 242 K, then the pixel is assumed not to be precipitating.

12.5.3 Cloud-Clearing by Regional Laplacian Interpolation

Within regions flagged as potentially precipitating, strong precipitation is generally characterized by cold, cloud-induced perturbations of the AMSU-A tropospheric temperature sounding channels in the range of 52.5–55.6 GHz. Brightness temperature images approximately satisfy Laplace's equation in the absence of precipitation. Once the potentially precipitating FOVs have been identified, Laplacian interpolation can be performed to clear the brightness temperature image of the effects of precipitation, and the perturbations due to precipitation can be computed. Examples of 183 ± 7-GHz data and the corresponding 50-km cold perturbations at 52.8 GHz are illustrated in Figure 12.14a and Figure 12.14c. Physical considerations and aircraft data show that convective cells near 54 GHz typically appear slightly off-center and less extended relative to the 183-GHz images [55,56].
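A minimal sketch of Laplacian interpolation over a flagged region follows, using Jacobi relaxation; the iteration budget is an arbitrary choice, and the flagged pixels are assumed to lie in the image interior:

```python
import numpy as np

def laplacian_interpolate(tb, flagged, n_iter=500):
    """Replace flagged (potentially precipitating) pixels with values that
    approximately satisfy Laplace's equation, holding all other pixels fixed.
    The precipitation-induced perturbation is then tb - result at flagged pixels.

    tb: 2D brightness-temperature image (K); flagged: same-shape boolean mask.
    Assumes flagged pixels are interior (np.roll wraps at the image edges).
    """
    img = tb.astype(float).copy()
    for _ in range(n_iter):
        # Jacobi step: replace each flagged pixel by its four-neighbor mean
        nbr_mean = 0.25 * (np.roll(img, 1, 0) + np.roll(img, -1, 0) +
                           np.roll(img, 1, 1) + np.roll(img, -1, 1))
        img[flagged] = nbr_mean[flagged]
    return img
```

Because only flagged pixels are updated, the surrounding clear-air radiances act as fixed boundary conditions, which is what makes the cleared image a plausible no-precipitation background.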

12.5.4 Image Sharpening

The small interpolation errors in converting 54-GHz perturbations to 15-km resolution contribute to the total errors and discrepancies discussed in Section 12.3. The 50-km-resolution 52.8-GHz perturbations ΔT50,52.8 are used to infer the perturbations ΔT15,52.8 (Figure 12.14d) that might have been observed at 52.8 GHz with a 15-km resolution had those perturbations been distributed spatially in the same way as the cold perturbations observed at either 183 ± 7 or 183 ± 3 GHz, the choice between these two channels being the same as described above. This requires the bilinearly interpolated 50-km AMSU data

FIGURE 12.14 (See color insert following page 302.) Frontal system on September 13, 2000, 0130 UTC. (a) Brightness temperatures (K) near 183 ± 7 GHz. (b) Brightness temperatures (K) near 183 ± 3 GHz. (c) Brightness temperature perturbations (K) near 52.8 GHz. (d) Inferred 15-km-resolution brightness temperature perturbations (K) near 52.8 GHz.

to be resampled at the HSB beam positions. These inferred 15-km perturbations are computed for five AMSU-A channels using

\Delta T_{15,54} = 20 \tanh\left( \frac{\Delta T_{15,183}\,\Delta T_{50,54}}{20\,\Delta T_{50,183}} \right)   (12.27)

The perturbation ΔT15,183 near 183 GHz is defined to be the difference between the observed radiance and the appropriate threshold given by (12.25) or (12.26). The perturbation ΔT50,54 near 54 GHz is defined to be the difference between the observed radiance and the Laplacian-interpolated radiance based on the pixels surrounding the flagged region [58]. Any warm perturbations in the images of ΔT15,183 and ΔT50,54 are set to zero. Limb and surface-emissivity corrections to nadir for the five 54-GHz channels are produced by neural networks for each channel; they operate on nine AMSU-A channels above 52 GHz, the cosine of the viewing angle w from nadir, and a land–sea flag (Figure 12.12). They were trained on seven orbits spaced over 1 year for latitudes up to ±55°. Inferred 50- and 15-km precipitation-induced perturbations at 52.8 GHz are shown in Figure 12.14c and Figure 12.14d for a frontal system. Such estimates of 15-km perturbations near 54 GHz help characterize heavily precipitating small cells.
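The sharpening step can be sketched as below. Note one loud assumption: the factor of 20 inside the tanh is our reading of Equation (12.27), chosen so that the formula reduces to ΔT50,54 scaled by the 183-GHz perturbation ratio for small arguments, with a soft 20-K limit; the function and variable names are ours:

```python
import numpy as np

def sharpen_54ghz(d15_183, d50_54, d50_183):
    """Infer 15-km 54-GHz perturbations from 50-km ones, a sketch of Eq. (12.27):
    dT15,54 = 20 * tanh(dT15,183 * dT50,54 / (20 * dT50,183)),
    with warm (positive) perturbations zeroed first. The inner factor of 20 is
    an assumption about the equation's exact form (see lead-in text).
    """
    d15_183 = np.minimum(np.asarray(d15_183, float), 0.0)  # zero warm perturbations
    d50_54 = np.minimum(np.asarray(d50_54, float), 0.0)
    d50_183 = np.asarray(d50_183, float)
    # Where the 50-km 183-GHz perturbation vanishes, no sharpening is possible.
    safe = np.where(d50_183 < 0.0, d50_183, -np.inf)
    return 20.0 * np.tanh(d15_183 * d50_54 / (20.0 * safe))
```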

12.5.5 Temperature and Water Vapor Profile Principal Components

One important determinant of precipitation is the temperature profile. Warmer atmospheres can hold more water vapor and result in higher vertical updraft velocities. Therefore, inputs to the neural net in Figure 12.10 should include some that carry information about the temperature profile. For each of AMSU-A channels 4–8, the brightness temperatures were corrected for limb and surface effects and then processed to eliminate precipitation signatures with the methods described in Section 12.4.1 through Section 12.4.3. The corrected brightness temperatures from all five of these channels could have been inputs to the neural net in Figure 12.10, but a more compact representation of these channels proved sufficient. PCA was applied to these five channels, and the first three principal components were found to be sufficient for characterizing the temperature profile; adding the fourth and fifth principal components did not significantly improve the training of the neural net. The water vapor profile is another important determinant of precipitation. Higher concentrations of water vapor can result in higher precipitation rates. The water vapor principal components are computed using AMSU-A channels 1–3 and 15, and the AMSU-B 150-, 183 ± 7-, 183 ± 3-, and 183 ± 1-GHz channels. Some of these channels are sensitive to surface variations; it is therefore necessary to project the vector of these observations onto a subspace that is not significantly sensitive to surface variations. Constrained PCA, described in Section 12.4.2.2, was used to compute the water vapor principal components. A set of pixels without precipitation and with different types of surfaces was selected to compute surface-sensitive eigenvectors using PCA. The surface-sensitive eigenvectors were identified by visual inspection of the preconstraint principal components for correlation with surface features (e.g., land and sea boundaries).
Then, a set of data that also included precipitation was selected. The observations over this set were projected onto a linear subspace that was orthogonal to the subspace spanned by the surface-sensitive eigenvectors. Then, PCA was done on the resulting data set to determine the water vapor principal components. It was found that two water vapor principal components were adequate for characterizing the eight channels.
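The constrained-PCA procedure just described can be sketched as follows, assuming the surface-sensitive eigenvectors are orthonormal (names are ours):

```python
import numpy as np

def constrained_pca(X, surface_eigvecs, n_components=2):
    """Constrained PCA: remove surface-sensitive directions, then do PCA.

    X: (n_pixels, n_channels) observations; surface_eigvecs: (k, n_channels)
    orthonormal surface-sensitive eigenvectors. Returns the scores of the
    leading principal components of the projected (surface-blind) data.
    """
    S = np.asarray(surface_eigvecs, float)
    P = np.eye(X.shape[1]) - S.T @ S                 # projector onto complement of S
    Xp = (np.asarray(X, float) - np.mean(X, axis=0)) @ P
    _, _, Vt = np.linalg.svd(Xp, full_matrices=False)
    return Xp @ Vt[:n_components].T                  # leading component scores
```

By construction the returned scores are unchanged by any variation confined to the surface-sensitive subspace, which is the property the water vapor principal components need.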

12.5.6 The Neural Net

All 13 of the variables listed at the beginning of this section are fed into the neural net used for 15-km precipitation-rate retrievals, as shown in Figure 12.12. The relative insensitivity of these inputs to surface emissivity is important to the success of this technique over land, ice, and snow. This network was trained to minimize the rms value of the difference between the logarithms of the (AMSU + 1 mm/h) and (NEXRAD + 1 mm/h) retrievals; the use of logarithms reduced the emphasis on the heaviest rain rates, which were roughly three orders of magnitude greater than the lightest rates. Adding 1 mm/h reduced the emphasis on the lightest rain rates, which were more noise-dominated. These intuitive choices clearly impact the retrieval error distribution, and further studies should therefore enable algorithm improvements; however, retrievals with training optimized for low rain rates did not markedly improve that regime. NEXRAD precipitation retrievals with a 2-km resolution were smoothed to approximate Gaussian spatial averages that were centered on and approximated the view-angle-distorted 15- or 50-km antenna beam patterns. The accuracy of NEXRAD precipitation observations is known to vary with distance; therefore, only points beyond 30 km, but within 110 km, of each NEXRAD radar site were included in the data used to train and test the neural nets. Eighty different networks were trained using the Levenberg–Marquardt algorithm, each with different numbers of nodes


and water vapor principal components. A network with nearly the best performance over the testing dataset was chosen; it used two surface-blind water vapor principal components, although slightly better performance was achieved with five water vapor principal components that had increased surface sensitivity. The final network had one hidden layer with five nodes that used the tanh sigmoid function. These neural networks were similar to those described in Ref. [57]. The resulting 15-km-resolution precipitation retrievals were then smoothed to yield 50-km retrievals. The 15-km retrieval neural network was trained using precipitation data from the 38 orbits listed in Table 12.3. During this period, the radio interference to AMSU-B was negligible relative to other sources of retrieval error. Each 15-km pixel flagged as potentially precipitating using 183 ± 7- or 183 ± 3-GHz radiances (see Figure 12.11 and Figure 12.12) was used for training, validation, or testing of the neural network. For these 38 orbits over the United States, 15,160 15-km pixels were flagged and considered suitable for training, validation, and testing; half were used for training and one quarter for each of validation and testing, where the validation pixels were used to determine when the training of the neural network should cease. On the basis of the final AMSU and NEXRAD 15-km retrievals, approximately 14% and 38%, respectively, of the flagged 15-km pixels appeared to have been precipitating at less than 0.1 mm/h for the test set.
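The log-domain training target described in this subsection (whose inverse, 10^x − 1, appears at the output of the net in Figure 12.12) can be sketched as:

```python
import numpy as np

def log_target(rain_mm_per_h):
    """Map rain rate R to log10(R + 1 mm/h), compressing a dynamic range of
    roughly three orders of magnitude; the retrieval inverts this as 10**x - 1."""
    return np.log10(np.asarray(rain_mm_per_h, float) + 1.0)

def rms_log_error(amsu, nexrad):
    """RMS difference between log-domain AMSU and NEXRAD retrievals, the
    quantity minimized during training."""
    d = log_target(amsu) - log_target(nexrad)
    return float(np.sqrt(np.mean(d * d)))
```

Adding 1 mm/h before the logarithm keeps the target finite at zero rain and, as the text notes, de-emphasizes the noisiest light-rain pixels.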

12.6 Retrieval Performance Evaluation

This section presents three forms of evaluation for this initial precipitation-rate retrieval algorithm: (1) representative qualitative comparisons of AMSU and NEXRAD precipitation rate images, (2) quantitative comparisons of AMSU and NEXRAD retrievals stratified by NEXRAD rain rate, and (3) representative precipitation images at more extreme latitudes beyond the NEXRAD training zone.

12.6.1 Image Comparisons of NEXRAD and AMSU-A/B Retrievals

Each NEXRAD comparison at 15-km resolution occurred within 8 min of satellite overpass; such coincidence is needed to characterize single-pixel retrievals because convective precipitation evolves rapidly on this spatial scale. Although comparison with instruments such as TRMM and SSM/I would be useful, their orbits unfortunately overlap those of AMSU within 8 min so infrequently (if ever) that comparisons over precipitation are too rare to be useful until several years of data have been analyzed. This challenge of simultaneity and the sporadic character of rain have restricted most prior instrument comparisons (passive microwave satellites, radar, rain gauges) to dimensions over 100 km and to periods of an hour to a month [58–60]. The uniformity and extent of the NEXRAD network offer a unique degree of simultaneity on 15- and 50-km scales and also the ability to match the Gaussian shape of the AMSU antenna beams. Although these AMSU/HSB–NEXRAD comparisons are encouraging because they involve single pixels and independent physics and facilities, further extensive analyses are required for real validation. For example, comparisons of precipitation averages and differences over the same time/space units used to validate other precipitation measurement systems (e.g., SSM/I [61], ATOVS, TRMM, rain gauges) are needed to characterize variances and systematic biases based on the precipitation rate, type, location, or season.


These biases include any present in the NEXRAD data used to train the AMSU/HSB algorithm; once characterized, they can be diminished. Any excess variance experienced for rain cells too small to be resolved by AMSU/HSB can also eventually be better characterized, although it is believed to be modest for cells with microwave signatures larger than 10 km. Smaller cells contribute little to the total rainfall.

12.6.2 Numerical Comparisons of NEXRAD and AMSU-A/B Retrievals

Figure 12.15a and Figure 12.15b present 15-km-resolution precipitation retrieval images for September 13, 2000, obtained from NEXRAD and AMSU, respectively. On this occasion, both sensors yielded rain rates over 50 mm/h at similar locations and lower rain rates down to 0.5 mm/h over comparable areas. The revealed morphology is thus very similar, even though AMSU observes 6 min before NEXRAD, and it senses altitudes that are separated by several kilometers; rain falling at a nominal rate of 10 m/s takes 10 min to fall 6 km. Figure 12.16 shows the scatter between the 15-km AMSU and NEXRAD rain-rate retrievals for the test pixels not used for training or validation. Figure 12.17 shows the scatter between the 50-km AMSU and NEXRAD rain-rate retrievals over all points flagged as precipitating.

FIGURE 12.15 (See color insert following page 302.) Precipitation rates (mm/h) above 0.5 mm/h observed on September 13, 2000, 0130 UTC. (a) 15-km-resolution NEXRAD retrievals, (b) 15-km-resolution AMSU retrievals, (c) 50-km-resolution NEXRAD retrievals, and (d) 50-km-resolution AMSU retrievals.


FIGURE 12.16 Comparison of AMSU and NEXRAD estimates of rain rate at 15-km resolution. Both axes show rain rate (mm/h) + 0.1 mm/h on a logarithmic scale.

FIGURE 12.17 Comparison of AMSU and NEXRAD estimates of rain rate at 50-km resolution. Both axes show rain rate (mm/h) + 0.1 mm/h on a logarithmic scale.


The relative sensitivity of AMSU and NEXRAD to light and heavy rain can be seen in Figure 12.17. In general, these figures suggest that AMSU responds less to the highest radar rain rates, perhaps because AMSU is less sensitive to the bright-band or hail anomalies that affect the radar. They also suggest that the risk of false rain detections increases for AMSU retrievals below 0.5 mm/h at a 50-km resolution, although further study is required. Greater accuracy at these low rates requires more space-time averaging and careful calibration. The risk of overestimating rain rate also appears to be limited. Only 3.3% of the total AMSU-derived rainfall was in areas where AMSU saw more than 1 mm/h and NEXRAD saw less than 1 mm/h, and only 7.6% of the total NEXRAD-derived rainfall was in areas where NEXRAD saw more than 1 mm/h and AMSU saw less than 1 mm/h. These percentages can be compared with the total percentages of AMSU and NEXRAD rain that fell at rates above 1 mm/h, which were 94% and 97%, respectively. It is also interesting to see to what degree each sensor retrieves rain when the other does not, and how much rain each sensor misses. For example, of the 73 NEXRAD 15-km rain-rate retrievals in Figure 12.16 above 54 mm/h, none were found by AMSU to be below 3 mm/h, and of the 61 AMSU 15-km retrievals above 45 mm/h, none were found by NEXRAD to be below 16 mm/h. Also, of the 69 NEXRAD 50-km rain-rate retrievals in Figure 12.17 above 30 mm/h, none were found by AMSU to be below 5 mm/h, and of the 102 AMSU 50-km retrievals above 16 mm/h, none were found by NEXRAD to be below 10 mm/h. Perhaps the most significant AMSU precipitation performance metric is the rms difference between the NEXRAD and AMSU rain-rate retrievals, grouped by retrieved NEXRAD rain rate in octaves. The central 26 AMSU-A scan angles and central 78 AMSU-B scan angles were included in these evaluations; only the outermost two AMSU-A angles on each side were omitted. These comparisons used all 50-km pixels and only those 15-km pixels not used for training or validation. The results are listed in Table 12.5. The smoothing of the 15-km NEXRAD and AMSU results to a nominal 50-km resolution was consistent with an AMSU-A Gaussian beamwidth of 3.3°. The rms agreement between these two very different precipitation-rate sensors appears surprisingly good, particularly since a single AMSU neural network is used over all angles, seasons, and latitudes. The 3-GHz radar retrievals respond most strongly to the largest hydrometeors, especially those below the bright band near the freezing level, while AMSU interacts with the general population of hydrometeors in the top few kilometers of the precipitation cell, which may lie several kilometers above the freezing level. Much of the agreement between AMSU and NEXRAD rain-rate retrievals must therefore result from the statistical consistency of the relations between rain rate and its various electromagnetic signatures.

TABLE 12.5 RMS AMSU/NEXRAD Discrepancies (mm/h)

NEXRAD Range | 15-km Resolution | 50-km Resolution
<0.5 mm/h | 1.0 | 0.5
0.5–1 mm/h | 2.0 | 0.9
1–2 mm/h | 2.3 | 1.1
2–4 mm/h | 2.7 | 1.8
4–8 mm/h | 3.5 | 3.2
8–16 mm/h | 6.9 | 6.6
16–32 mm/h | 19.0 | 12.9
>32 mm/h | 42.9 | 22.1

It is difficult to say how much of the observed discrepancy is due to each sensor or how well each correlates with precipitation reaching the ground. Furthermore, this study provided an opportunity for evaluation of radar data. The rms discrepancies between AMSU and NEXRAD retrievals were separately calculated over all points at ranges from 110 to 230 km from any radar. For NEXRAD precipitation rates below 16 mm/h, these rms discrepancies were approximately 40% greater than those computed for test points in the 30- to 110-km range. At rain rates greater than 16 mm/h, the accuracies beyond 110 km were more comparable. Most points in the eastern United States are more than 110 km from any NEXRAD radar site.
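The octave grouping used for Table 12.5 can be sketched as follows (bin edges from the table; the function name is ours):

```python
import numpy as np

def rms_by_octave(amsu, nexrad, edges=(0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0)):
    """RMS AMSU-NEXRAD discrepancy grouped by NEXRAD rain rate in the octave
    bins of Table 12.5 (<0.5, 0.5-1, ..., >32 mm/h). Returns one rms value
    per bin, or NaN for empty bins."""
    amsu = np.asarray(amsu, float)
    nexrad = np.asarray(nexrad, float)
    which = np.digitize(nexrad, edges)              # bin index 0 .. len(edges)
    out = []
    for b in range(len(edges) + 1):
        d = amsu[which == b] - nexrad[which == b]
        out.append(float(np.sqrt(np.mean(d * d))) if d.size else float("nan"))
    return out
```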

12.6.3 Global Retrievals of Rain and Snow

Figure 12.18 illustrates precipitation-rate retrievals at points around the globe where radar confirmation data are scarce. Figure 12.18a shows precipitation retrievals in the tropics over a mix of land and sea, while Figure 12.18b shows a more intense tropical

FIGURE 12.18 AMSU precipitation-rate retrievals (mm/h) with 15-km resolution. (a) Philippines, April 16, 2000; (b) Indochina, July 5, 2000; (c) Canada, August 2, 2000; and (d) New England snowstorm, March 5, 2001. Precipitation-rate retrievals exceed 0.5 mm/h in the shaded regions, and contours are drawn for 0.5 mm/h, 2 mm/h, 8 mm/h, 32 mm/h, and 128 mm/h. The peak retrieved values are 47 mm/h, 143 mm/h, 30 mm/h, and 1.5 mm/h in (a), (b), (c), and (d), respectively.


event. Figure 12.18c illustrates strong precipitation near 72°N to 74°N, again over both land and sea. Finally, Figure 12.18d illustrates the March 5, 2001, New England snowstorm that deposited roughly a foot of snow within a few hours: an accumulation somewhat greater than is indicated by the retrieved rain rates of 1.2 mm/h. This applicability of the algorithm to snowfall rate should be expected because the observed radio emission originates exclusively at high altitudes. Whether the hydrometeors are rain or snow on impact depends only on air temperatures near the surface, far below those altitudes being probed. For essentially all of the pixels shown in Figure 12.18, the adjacent clear air exhibited temperature and humidity profiles (inferred from AMSU) within the range of the training set. Nonetheless, regional biases are expected and will require evaluation. For example, polar stratiform precipitation is expected to exhibit relatively weaker radiometric signatures in winter when the temperature lapse rates are lower, and snow-covered mountains in cold polar air can produce false detections.

12.7 Conclusions

In this chapter, the precipitation estimation method of Chen and Staelin for microwave radiometric data and the role of signal processing methods were described. The development of the Chen–Staelin algorithm shows that signal processing can play a useful role in satellite-based precipitation estimation. In this algorithm, which was developed for AMSU-A/B and AMSU/HSB, PCA was used to reduce the dimensionality of selected sets of channels and to separate the effects of surface variations from atmospheric variations. Data fusion was used to sharpen 50-km data from AMSU-A so that 15-km precipitation retrievals could be made. Laplacian filtering was applied to data from the 54-GHz band to quantify the effects of clouds. The signal processing components thus extract the most relevant information from the brightness temperature measurements, and neural nets were trained to learn the relationship between precipitation rate and those inputs. The Chen–Staelin algorithm represents one step in the ongoing development of microwave precipitation retrieval algorithms, and it can likely be improved by choosing more general signal processing methods or by fine-tuning those already in use. For example, variations or extensions of PCA, such as independent component analysis (ICA), could be used [63,64]. Additionally, methods like PCA and ICA can be improved by incorporating components from physics-based methods, and further gains can be made by making better use of the data from window channels. Future instruments also present opportunities for better precipitation retrievals.
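Two of the signal processing steps named above, PCA dimensionality reduction and Laplacian filtering, can be illustrated with a minimal sketch. This is not the published implementation; the array shapes and the 4-neighbor Laplacian kernel are assumptions.

```python
import numpy as np

def pca_components(tb, n_keep):
    """Project brightness temperatures tb (n_pixels, n_channels) onto the
    n_keep leading principal components of the channel covariance."""
    centered = tb - tb.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    leading = eigvecs[:, ::-1][:, :n_keep]   # top-n_keep eigenvectors
    return centered @ leading

def laplacian(image):
    """4-neighbor Laplacian of a 2-D image, with edge replication at borders;
    a simple stand-in for the filtering applied to 54-GHz imagery."""
    padded = np.pad(image, 1, mode="edge")
    return (padded[:-2, 1:-1] + padded[2:, 1:-1]
            + padded[1:-1, :-2] + padded[1:-1, 2:] - 4.0 * image)
```

The Laplacian of a smooth (e.g., clear-air) brightness field is near zero, so large filter outputs flag cloud-perturbed pixels; the PCA projections and filter outputs would then serve as neural-network inputs.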
The Advanced Technology Microwave Sounder (ATMS), to be launched aboard the NPOESS Preparatory Project (NPP) and NPOESS satellite series, could be considered a more advanced version of AMSU-A/B and AMSU/HSB because it has a set of channels very similar to that of AMSU-A/B and offers finer resolution and sampling for most channels [65,66]. Because of the similarity of channel sets, the algorithm of Chen and Staelin can be a starting point for the development of an algorithm for ATMS. The finer resolution and sampling of ATMS will likely lead to better image-sharpening methods and better temperature and water vapor profile characterization.


Acknowledgments

The author wishes to thank P.W. Rosenkranz, E. Williams, and M.M. Wolfson for helpful discussions, and S.P.L. Maloney and C. Lebell for assistance with the NEXRAD data. This work was supported by the National Oceanic and Atmospheric Administration under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the author and not necessarily endorsed by the United States Government.

References

1. F.T. Ulaby, R.K. Moore, and A.K. Fung, Microwave Remote Sensing: Active and Passive, Addison-Wesley, Reading, MA, 1981.
2. P.W. Rosenkranz, Retrieval of temperature and moisture profiles from AMSU-A and AMSU-B measurements, IEEE Transactions on Geoscience and Remote Sensing, 39(11), 2429–2435, 2001.
3. P.W. Rosenkranz, Rapid radiative transfer model for AMSU/HSB channels, IEEE Transactions on Geoscience and Remote Sensing, 41(2), 362–368, 2003.
4. L. Shi, Retrieval of atmospheric temperature profiles from AMSU-A measurement using a neural network approach, Journal of Atmospheric and Oceanic Technology, 18, 340–347, 2001.
5. W.J. Blackwell, J.W. Barrett, F.W. Chen, R.V. Leslie, P.W. Rosenkranz, M.J. Schwartz, and D.H. Staelin, NPOESS Aircraft Sounder Testbed-Microwave (NAST-M): instrument description and initial flight results, IEEE Transactions on Geoscience and Remote Sensing, 39(11), 2444–2453, 2001.
6. R.V. Leslie, W.J. Blackwell, P.W. Rosenkranz, and D.H. Staelin, 183-GHz and 425-GHz passive microwave sounders on the NPOESS Aircraft Sounder Testbed-Microwave (NAST-M), Proceedings of the 2003 IEEE International Geoscience and Remote Sensing Symposium, 1, 506–508, 2003.
7. R.V. Leslie, J.A. Loparo, P.W. Rosenkranz, and D.H. Staelin, Cloud and precipitation observations with the NPOESS Aircraft Sounder Testbed-Microwave (NAST-M) spectrometer suite at 54/118/183/425 GHz, Proceedings of the 2003 IEEE International Geoscience and Remote Sensing Symposium, 2, 1212–1214, 2003.
8. R.V. Leslie and D.H. Staelin, NPOESS Aircraft Sounder Testbed-Microwave (NAST-M) observations of clouds and precipitation at 54, 118, 183, and 425 GHz, IEEE Transactions on Geoscience and Remote Sensing, 42(10), 2240–2247, 2004.
9. J. Houghton, The Physics of Atmospheres, Cambridge University Press, New York, 2002.
10. Tropical Rainfall Measuring Mission, http://trmm.gsfc.nasa.gov/
11. R.A. Houze, Cloud Dynamics, Academic Press, New York, 1993.
12. G. Hufford, A model for the complex permittivity of ice at frequencies below 1 THz, International Journal of Infrared and Millimeter Waves, 12(7), 677–682, 1991.
13. D. Deirmendjian, Electromagnetic Scattering on Spherical Polydispersions, Elsevier, New York, 1969.
14. M.S. Spina, M.J. Schwartz, D.H. Staelin, and A.J. Gasiewski, Application of multilayer feedforward neural networks to precipitation cell-top altitude estimation, IEEE Transactions on Geoscience and Remote Sensing, 36(1), 154–162, 1998.
15. A.J. Gasiewski, Atmospheric Temperature Sounding and Precipitation Cell Parameter Estimation Using Passive 118-GHz O2 Observations, Ph.D. thesis, MIT Department of Electrical Engineering and Computer Science, December 1988.
16. A.J. Gasiewski, Microwave radiative transfer in hydrometeors, in Atmospheric Remote Sensing by Microwave Radiometry, Ed. M.A. Janssen, John Wiley & Sons, New York, 1993.
17. N. Grody, F. Weng, and R. Ferraro, Application of AMSU for obtaining hydrological parameters, in Microwave Radiometry and Remote Sensing of the Earth's Surface and Atmosphere, Eds. P. Pampaloni and S. Paloscia, pp. 339–351, 2000.


18. F. Weng, L. Zhao, R.R. Ferraro, G. Poe, X. Li, and N.C. Grody, Advanced microwave sounding unit cloud and precipitation algorithms, Radio Science, 38(4), 8068, 2003; doi: 10.1029/2002RS002679.
19. J.P. Hollinger, SSM/I instrument evaluation, IEEE Transactions on Geoscience and Remote Sensing, 28(5), 781–790, 1990.
20. C. Kummerow and L. Giglio, A passive microwave technique for estimating rainfall and vertical structure information from space. Part I: Algorithm description, Journal of Applied Meteorology, 33, 3–18, 1994.
21. D. Tsintikidis, J.L. Haferman, E.N. Anagnostou, W.F. Krajewski, and T.F. Smith, A neural network approach to estimating rainfall from spaceborne microwave data, IEEE Transactions on Geoscience and Remote Sensing, 35(5), 1079–1093, 1997.
22. T. Kawanishi, T. Sezai, Y. Ito, K. Imaoka, T. Takeshima, Y. Ishido, A. Shibata, M. Miura, H. Inahata, and R.W. Spencer, The Advanced Microwave Scanning Radiometer for the Earth Observing System (AMSR-E), NASDA's contribution to the EOS for global energy and water cycle studies, IEEE Transactions on Geoscience and Remote Sensing, 41(2), 184–194, 2003.
23. T. Wilheit, C.D. Kummerow, and R. Ferraro, Rainfall algorithms for AMSR-E, IEEE Transactions on Geoscience and Remote Sensing, 41(2), 204–213, 2003.
24. J. Ikai and K. Nakamura, Comparison of rain rates over the ocean derived from TRMM microwave imager and precipitation radar, Journal of Atmospheric and Oceanic Technology, 20, 1709–1726, 2003.
25. C. Kummerow, W. Barnes, T. Kozu, J. Shiue, and J. Simpson, The Tropical Rainfall Measuring Mission (TRMM) sensor package, Journal of Atmospheric and Oceanic Technology, 15, 809–817, 1998.
26. C. Kummerow and L. Giglio, A passive microwave technique for estimating rainfall and vertical structure information from space. Part I: Algorithm description, Journal of Applied Meteorology, 33, 3–18, 1994.
27. T. Wilheit, C.D. Kummerow, and R. Ferraro, Rainfall algorithms for AMSR-E, IEEE Transactions on Geoscience and Remote Sensing, 41(2), 204–213, 2003.
28. T.J. Hewison and R. Saunders, Measurements of the AMSU-B antenna pattern, IEEE Transactions on Geoscience and Remote Sensing, 34(2), 405–412, 1996.
29. T. Mo, AMSU-A antenna pattern corrections, IEEE Transactions on Geoscience and Remote Sensing, 37(1), 103–112, 1999.
30. P.W. Rosenkranz, Improved rapid transmittance algorithm for microwave sounding channels, Proceedings of the 1998 IEEE International Geoscience and Remote Sensing Symposium, 2, 728–730, 1998.
31. B.H. Lambrigtsen, Calibration of the AIRS microwave instruments, IEEE Transactions on Geoscience and Remote Sensing, 41(2), 369–378, 2003.
32. H.H. Aumann, M.T. Chahine, C. Gautier, M.D. Goldberg, E. Kalnay, L.M. McMillin, H. Revercomb, P.W. Rosenkranz, W.L. Smith, D.H. Staelin, L.L. Strow, and J. Susskind, AIRS/AMSU/HSB on the Aqua mission: design, science objectives, data products, and processing systems, IEEE Transactions on Geoscience and Remote Sensing, 41(2), 253–264, 2003.
33. C.L. Parkinson, Aqua: an Earth-observing satellite mission to examine water and other climate variables, IEEE Transactions on Geoscience and Remote Sensing, 41(2), 173–183, 2003.
34. W.J. Blackwell, Retrieval of Cloud-Cleared Atmospheric Temperature Profiles from Hyperspectral Infrared and Microwave Observations, Sc.D. thesis, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, June 2002.
35. W.J. Blackwell, Retrieval of atmospheric temperature and moisture profiles from hyperspectral sounding data using a projected principal components transform and a neural network, Proceedings of the 2003 IEEE International Geoscience and Remote Sensing Symposium, 3, 2078–2081, 2003.
36. C.R. Cabrera-Mercader, Robust Compression of Multispectral Remote Sensing Data, Ph.D. thesis, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 1999.
37. F.W. Chen, Characterization of Clouds in Atmospheric Temperature Profile Retrievals, M.E. thesis, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, June 1998.


38. J. Lee, Blind Noise Estimation and Compensation for Improved Characterization of Multivariate Processes, Ph.D. thesis, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, March 2000.
39. J. Lee and D.H. Staelin, Iterative signal-order and noise estimation for multivariate data, IEE Electronics Letters, 37(2), 134–135, 2001.
40. A. Mueller, Iterative Blind Separation of Gaussian Data of Unknown Order, M.E. thesis, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, June 2003.
41. I.T. Jolliffe, Principal Component Analysis, Springer-Verlag, New York, 2002.
42. L. Wald, Some terms of reference in data fusion, IEEE Transactions on Geoscience and Remote Sensing, 37(3), 1190–1193, 1999.
43. D.L. Hall and J. Llinas, An introduction to multisensor data fusion, Proceedings of the IEEE, 85(1), 6–23, 1997.
44. C. Pohl and J.L. van Genderen, Multisensor image fusion in remote sensing: concepts, methods, and applications, International Journal of Remote Sensing, 19(5), 823–854, 1998.
45. V.J.D. Tsai, Frequency-based fusion of multiresolution images, Proceedings of the 2003 IEEE International Geoscience and Remote Sensing Symposium, 6, 3665–3667, 2003.
46. P.W. Rosenkranz, Inversion of data from diffraction-limited multiwavelength remote sensors, 1. Linear case, Radio Science, 13(6), 1003–1010, 1978.
47. P.W. Rosenkranz, Inversion of data from diffraction-limited multiwavelength remote sensors, 2. Nonlinear dependence of observables on the geophysical parameters, Radio Science, 17(1), 245–256, 1982.
48. P.W. Rosenkranz, Inversion of data from diffraction-limited multiwavelength remote sensors, 3. Scanning multichannel microwave radiometer data, Radio Science, 17(1), 257–267, 1982.
49. R.P. Lippmann, An introduction to computing with neural nets, IEEE Acoustics, Speech, and Signal Processing Magazine, 4(2), 4–22, 1987.
50. S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan, New York, 1994.
51. M.T. Hagan and M.B. Menhaj, Training feedforward networks with the Marquardt algorithm, IEEE Transactions on Neural Networks, 5(6), 989–993, 1994.
52. D.W. Marquardt, An algorithm for least-squares estimation of nonlinear parameters, Journal of the Society for Industrial and Applied Mathematics, 11(2), 431–441, 1963.
53. D. Nguyen and B. Widrow, Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights, Proceedings of the International Joint Conference on Neural Networks, 3, 21–26, 1990.
54. Neural Network Toolbox User's Guide, The MathWorks, Inc., Natick, MA, 1998.
55. W.J. Blackwell et al., NPOESS Aircraft Sounder Testbed-Microwave (NAST-M): instrument description and initial flight results, IEEE Transactions on Geoscience and Remote Sensing, 39, 2444–2453, 2001.
56. A.J. Gasiewski, D.M. Jackson, J.R. Wang, P.E. Racette, and D.S. Zacharias, Airborne imaging of tropospheric emission at millimeter and submillimeter wavelengths, Proceedings of the 1994 IEEE International Geoscience and Remote Sensing Symposium, 2, 663–665, 1994.
57. D.H. Staelin and F.W. Chen, Precipitation observations near 54 and 183 GHz using the NOAA-15 satellite, IEEE Transactions on Geoscience and Remote Sensing, 38, 2322–2332, 2000.
58. P.A. Arkin and P. Xie, The global precipitation climatology project: first algorithm intercomparison project, Bulletin of the American Meteorological Society, 75(3), 401–420, 1994.
59. E.A. Smith et al., Results of WetNet PIP-2 project, Journal of the Atmospheric Sciences, 55(9), 1483–1536, 1998.
60. R.F. Adler et al., Intercomparison of global precipitation products: the third precipitation intercomparison project (PIP-3), Bulletin of the American Meteorological Society, 82(7), 1377–1396, 2001.
61. M.D. Conner and G.W. Petty, Validation and intercomparison of SSM/I rain-rate retrieval methods over the continental U.S., Journal of Applied Meteorology, 37(7), 679–700, 1998.
62. A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis, John Wiley & Sons, New York, 2001.
63. F.R. Bach and M.I. Jordan, Kernel independent component analysis, Journal of Machine Learning Research, 3, 1–48, 2002.


64. J.C. Shiue, The advanced technology microwave sounder, a new atmospheric temperature and humidity sounder for operational polar-orbiting weather satellites, Proceedings of the SPIE, 4540, 159–165, 2001.
65. C. Muth, P.S. Lee, J.C. Shiue, and W.A. Webb, Advanced technology microwave sounder on NPOESS and NPP, Proceedings of the 2004 IEEE International Geoscience and Remote Sensing Symposium, 4, 2454–2458, 2004.
66. F.W. Chen and D.H. Staelin, AIRS/AMSU/HSB precipitation estimates, IEEE Transactions on Geoscience and Remote Sensing, 41(2), 410–417, 2003.
67. F.W. Chen, Global Estimation of Precipitation Using Opaque Microwave Bands, Ph.D. thesis, Massachusetts Institute of Technology, 2004.

Part II

Image Processing for Remote Sensing

13 Polarimetric SAR Techniques for Remote Sensing of the Ocean Surface

Dale L. Schuler, Jong-Sen Lee, and Dayalan Kasilingam

CONTENTS
13.1  Introduction  268
13.2  Measurement of Directional Slopes and Wave Spectra  268
      13.2.1  Single Polarization versus Fully Polarimetric SAR Techniques  268
      13.2.2  Single-Polarization SAR Measurements of Ocean Surface Properties  269
      13.2.3  Measurement of Ocean Wave Slopes Using Polarimetric SAR Data  271
              13.2.3.1  Orientation Angle Measurement of Azimuth Slopes  271
              13.2.3.2  Orientation Angle Measurement Using the Circular-Pol Algorithm  271
      13.2.4  Ocean Wave Spectra Measured Using Orientation Angles  272
      13.2.5  Two-Scale Ocean-Scattering Model: Effect on the Orientation Angle Measurement  275
      13.2.6  Alpha Parameter Measurement of Range Slopes  277
              13.2.6.1  Cloude–Pottier Decomposition Theorem and the Alpha Parameter  277
              13.2.6.2  Alpha Parameter Sensitivity to Range Traveling Waves  279
              13.2.6.3  Alpha Parameter Measurement of Range Slopes and Wave Spectra  280
      13.2.7  Measured Wave Properties and Comparisons with Buoy Data  282
              13.2.7.1  Coastal Wave Measurements: Gualala River Study Site  282
              13.2.7.2  Open-Ocean Measurements: San Francisco Study Site  284
13.3  Polarimetric Measurement of Ocean Wave–Current Interactions  286
      13.3.1  Introduction  286
      13.3.2  Orientation Angle Changes Caused by Wave–Current Interactions  287
      13.3.3  Orientation Angle Changes at Ocean Current Fronts  291
      13.3.4  Modeling SAR Images of Wave–Current Interactions  291
13.4  Ocean Surface Feature Mapping Using Current-Driven Slick Patterns  293
      13.4.1  Introduction  293
      13.4.2  Classification Algorithm  297
              13.4.2.1  Unsupervised Classification of Ocean Surface Features  297
              13.4.2.2  Classification Using Alpha–Entropy Values and the Wishart Classifier  297
              13.4.2.3  Comparative Mapping of Slicks Using Other Classification Algorithms  300
13.5  Conclusions  300
References  302


13.1 Introduction

Selected methods that use synthetic aperture radar (SAR) image data to remotely sense ocean surfaces are described in this chapter. Fully polarimetric SAR radars provide much more usable information than conventional single-polarization radars. Algorithms, presented here, to measure directional wave spectra, wave slopes, wave–current interactions, and current-driven surface features use this additional information. Polarimetric techniques that measure directional wave slopes and spectra with data collected from a single aircraft, or satellite, collection pass are described here. Conventional single-polarization backscatter cross-section measurements require two orthogonal passes and a complex SAR modulation transfer function (MTF) to determine vector slopes and directional wave spectra. The algorithm to measure wave spectra is described in Section 13.2. In the azimuth (flight) direction, wave-induced perturbations of the polarimetric orientation angle are used to sense the azimuth component of the wave slopes. In the orthogonal range direction, a technique involving the alpha parameter from the well-known Cloude–Pottier entropy/anisotropy/averaged alpha (H/A/ᾱ) polarimetric decomposition theorem is used to measure the range slope component. Both measurement types are highly sensitive to ocean wave slopes and are directional. Together, they form a means of using polarimetric SAR image data to make complete directional measurements of ocean wave slopes and wave slope spectra. NASA Jet Propulsion Laboratory airborne SAR (AIRSAR) P-, L-, and C-band data obtained during flights over the coastal areas of California are used as wave-field examples. Wave parameters measured using the polarimetric methods are compared with those obtained using in situ NOAA National Data Buoy Center (NDBC) buoy products. In a second topic (Section 13.3), polarization orientation angles are used to remotely sense ocean wave slope distribution changes caused by ocean wave–current interactions.
The wave–current features studied include surface manifestations of ocean internal waves and wave interactions with current fronts. A model [1], developed at the Naval Research Laboratory (NRL), is used to determine the parametric dependencies of the orientation angle on internal wave current, wind-wave direction, and wind-wave speed. An empirical relation is cited to relate orientation angle perturbations to the underlying parametric dependencies [1]. A third topic (Section 13.4) deals with the detection and classification of biogenic slick fields. Various techniques, using the Cloude–Pottier decomposition and the Wishart classifier, are used to classify the slicks. An application utilizing current-driven ocean features, marked by slick patterns, is used to map spiral eddies. Finally, a related technique, using the polarimetric orientation angle, is used to segment slick fields from ocean wave slopes.

13.2 Measurement of Directional Slopes and Wave Spectra

13.2.1 Single Polarization versus Fully Polarimetric SAR Techniques

SAR systems conventionally use backscatter intensity-based algorithms [2] to measure physical ocean wave parameters. SAR instruments, operating at a single polarization, measure wave-induced backscatter cross section, or sigma-0, modulations that can be


developed into estimates of surface wave slopes or wave spectra. These measurements, however, require a parametrically complex MTF to relate the SAR backscatter measurements to the physical ocean wave properties [3]. Section 13.2.3 through Section 13.2.6 outline a means of using fully polarimetric SAR (POLSAR) data with algorithms [4] to measure ocean wave slopes. In the Fourier-transform domain, this orthogonal slope information is used to estimate a complete directional ocean wave slope spectrum. A parametrically simple measurement of the slope is made by using POLSAR-based algorithms. Modulations of the polarization orientation angle, θ, are largely caused by waves traveling in the azimuth direction. The modulations are, to a lesser extent, also affected by range traveling waves. A method, originally used in topographic measurements [5], has been applied to the ocean and used to measure wave slopes. The method measures vector components of ocean wave slopes and wave spectra. Slopes smaller than 1° are measurable for ocean surfaces using this method. An eigenvector/eigenvalue decomposition averaged alpha parameter ᾱ, described in Ref. [6], is used to measure wave slopes in the orthogonal range direction. Waves in the range direction cause modulation of the local incidence angle φ, which, in turn, changes the value of ᾱ. The alpha parameter is "roll-invariant," meaning that it is not affected by slopes in the azimuth direction. Likewise, for ocean wave measurements, the orientation angle θ is largely insensitive to slopes in the range direction. An algorithm employing both (ᾱ, θ) is, therefore, capable of measuring slopes in any direction. The ability to measure a physical parameter in two orthogonal directions within an individual resolution cell is rare. Microwave instruments, generally, must have a two-dimensional (2D) imaging or scanning capability to obtain information in two orthogonal directions.

Motion-induced nonlinear "velocity-bunching" effects still present difficulties for wave measurements in the azimuth direction using POLSAR data. These difficulties are dealt with by using the same proven algorithms [3,7] that reduce nonlinearities for single-polarization SAR measurements.
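As a rough illustration of turning two orthogonal slope fields into a directional slope spectrum in the Fourier-transform domain, the sketch below sums the 2-D periodograms of the azimuth and range slope maps. This is a minimal stand-in, not the published estimator: windowing, spectral smoothing, and physical wavenumber scaling are all omitted, and the function name is hypothetical.

```python
import numpy as np

def directional_slope_spectrum(azimuth_slope, range_slope):
    """Sum of the 2-D periodograms of two orthogonal slope maps, with the
    zero-wavenumber bin shifted to the array center."""
    s_az = np.abs(np.fft.fftshift(np.fft.fft2(azimuth_slope))) ** 2
    s_rg = np.abs(np.fft.fftshift(np.fft.fft2(range_slope))) ** 2
    return s_az + s_rg
```

A monochromatic azimuth-traveling wave, for example, produces a pair of symmetric peaks on the azimuth wavenumber axis of the output.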

13.2.2 Single-Polarization SAR Measurements of Ocean Surface Properties

SAR systems have previously been used for imaging ocean features such as surface waves, shallow-water bathymetry, internal waves, current boundaries, slicks, and ship wakes [8]. In all of these applications, the modulation of the SAR image intensity by the ocean feature makes the feature visible in the image [9]. When imaging ocean surface waves, the main modulation mechanisms have been identified as tilt modulation, hydrodynamic modulation, and velocity bunching [2]. Tilt modulation is due to changes in the local incidence angle caused by the surface wave slopes [10]. Tilt modulation is strongest for waves traveling in the range direction. Hydrodynamic modulation is due to the hydrodynamic interactions between the long-scale surface waves and the short-scale surface (Bragg) waves that contribute most of the backscatter at moderate incidence angles [11]. Velocity bunching is a modulation process that is unique to SAR imaging systems [12]. It is a result of the azimuth shifting of scatterers in the image plane, owing to the motion of the scattering surface. Velocity bunching is the highest for azimuth traveling waves. In the past, considerable effort had gone into retrieving quantitative surface wave information from SAR images of ocean surface waves [13]. Data from satellite SAR missions, such as ERS 1 and 2 and RADARSAT 1 and 2, had been used to estimate surface wave spectra from SAR image information. Generally, wave height and wave


slope spectra are used as quantitative overall descriptors of the ocean surface wave properties [14]. Over the years, several different techniques have been developed for retrieving wave spectra from SAR image spectra [7,15,16]. Linear techniques, such as those having a linear MTF, are used to relate the wave spectrum to the image spectrum. Individual MTFs are derived for the three primary modulation mechanisms. A transformation based on the MTF is used to retrieve the wave spectrum from the SAR image spectrum. Since the technique is linear, it does not account for any nonlinear processes in the modulation mechanisms. It has been shown that SAR image modulation is nonlinear under certain ocean surface conditions. As the sea state increases, the degree of nonlinear behavior generally increases. Under these conditions, the linear methods do not provide accurate quantitative estimates of the wave spectra [15]. Thus, the linear transfer function method has limited utility and can be used as a qualitative indicator. More accurate estimates of wave spectra require the use of nonlinear inversion techniques [15]. Several nonlinear inversion techniques have been developed for retrieving wave spectra from SAR image spectra. Most of these techniques are based on a technique developed in Ref. [7]. The original method used an iterative technique to estimate the wave spectrum from the image spectrum. Initial estimates are obtained using a linear transfer function similar to the one used in Ref. [15]. These estimates are used as inputs in the forward SAR imaging model, and the revised image spectrum is used to iteratively correct the previous estimate of the wave spectra. The accuracy of this technique is dependent on the specific SAR imaging model. Improvements to this technique [17] have incorporated closed-form descriptions of the nonlinear transfer function, which relates the wave spectrum to the SAR image spectrum. 
However, this transfer function also has to be evaluated iteratively. Further improvements to this method have been suggested in Refs. [3,18]. In this method, a cross-spectrum is generated between different looks of the same ocean wave scene. The primary advantage of this method is that it resolves the 180° ambiguity [3,18] of the wave direction. This method also reduces the effects of speckle in the SAR spectrum. Methods that incorporate additional a posteriori information about the wave field, which improves the accuracy of these nonlinear methods, have also been developed in recent years [19]. In all of the slope-retrieval methods, the one nonlinear mechanism that may completely destroy wave structure is velocity bunching [3,7]. Velocity bunching is a result of moving scatterers on the ocean surface either bunching or dilating in the SAR image domain. The shifting of the scatterers in the azimuth direction may, in extreme conditions, result in the destruction of the wave structure in the SAR image. SAR imaging simulations were performed at different range-to-velocity (R/V) ratios to study the effect of velocity bunching on the slope-retrieval algorithms. When the (R/V) ratio is artificially increased to large values, the effects of velocity bunching are expected to destroy the wave structure in the slope estimates. Simulations of the imaging process for a wide range of radar-viewing conditions indicate that the slope structure is preserved in the presence of moderate velocity-bunching modulation. It can be argued that for velocity bunching to affect the slope estimates, the (R/V) ratio has to be significantly larger than 100 s. The two data sets discussed here are designated "Gualala River" and "San Francisco." The Gualala River data set has the longest waves and it also produces the best results. The R/V ratio for the AIRSAR missions was 59 s (Gualala) and 55 s (San Francisco). These values suggest that the effects of velocity bunching are present, but are not sufficiently strong to significantly affect the slope-retrieval process. However, for spaceborne SAR imaging applications, where the (R/V) ratio may be greater than 100 s, the effects of velocity bunching may limit the utility of all methods, especially in high sea states.
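The first-order picture behind velocity bunching can be sketched numerically: each scatterer is displaced in the azimuth direction by (R/V) times its radial (line-of-sight) velocity, and the displaced positions are accumulated into image bins, so that spatially varying radial velocities bunch or dilate the scatterer density. The function below is a crude illustration only; the interface, bin sizes, and parameter values (apart from the R/V = 59 s quoted above) are assumptions.

```python
import numpy as np

def velocity_bunching_image(x, radial_velocity, r_over_v,
                            n_bins=64, extent=640.0):
    """Shift scatterers at azimuth positions x (m) to image positions
    x' = x + (R/V) * u_r, where u_r is the radial velocity (m/s), then
    histogram the shifted positions into an azimuth intensity profile."""
    x_image = (np.asarray(x, dtype=float)
               + r_over_v * np.asarray(radial_velocity, dtype=float))
    image, _ = np.histogram(x_image, bins=n_bins, range=(0.0, extent))
    return image
```

With zero radial velocity the scatterer density is unchanged; a wave-like radial velocity pattern produces alternating crowded and depleted bins, the "bunching" that modulates the SAR image.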

Polarimetric SAR Techniques for Remote Sensing of the Ocean Surface

13.2.3 Measurement of Ocean Wave Slopes Using Polarimetric SAR Data

In this section, the techniques that were developed for the measurement of ocean surface slopes and wave spectra using the capabilities of fully polarimetric radars are discussed. Wave-induced perturbations of the polarization orientation angle are used to directly measure slopes for azimuth-traveling waves. This technique is accurate for scattering from surface resolution cells where the sea return can be represented as a two-scale Bragg-scattering process.

13.2.3.1 Orientation Angle Measurement of Azimuth Slopes

It has been shown [5] that by measuring the orientation angle shift in the polarization signature, one can determine the effects of azimuth surface tilts. In particular, the shift in the orientation angle is related to the azimuth surface tilt, the local incidence angle, and, to a lesser degree, the range tilt. This relationship was derived in Ref. [20] and independently verified in Ref. [6] as

$$\tan\theta = \frac{\tan\omega}{\sin\phi - \tan\gamma\cos\phi} \qquad (13.1)$$

where $\theta$, $\tan\omega$, $\tan\gamma$, and $\phi$ are the shift in the orientation angle, the azimuth slope, the ground range slope, and the radar look angle, respectively. According to Equation 13.1, the azimuth tilts may be estimated from the shift in the orientation angle if the look angle and range tilt are known. The orthogonal range slope $\tan\gamma$ can be estimated using the value of the local incidence angle associated with the alpha parameter for each pixel. Together, the azimuth slope $\tan\omega$ and the range slope $\tan\gamma$ provide complete slope information for each image pixel. For the ocean surface at scales of the size of the AIRSAR resolution cell (6.6 m × 8.2 m), the averaged tilt angles are small, and the denominator in Equation 13.1 may be approximated by $\sin\phi$ for a wide range of look angle ($\phi$) and ground range slope ($\tan\gamma$) values. Under this approximation, the ocean azimuth slope $\tan\omega$ is written as

$$\tan\omega \simeq (\sin\phi)\tan\theta \qquad (13.2)$$
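Equations 13.1 and 13.2 can be checked numerically. The sketch below (illustrative values, assuming the small-tilt regime discussed above) inverts Equation 13.1 for the azimuth slope and compares it with the sin φ approximation:

```python
import math

def azimuth_slope_exact(theta, phi, tan_gamma):
    """Invert Eq. (13.1): tan(omega) = tan(theta) * (sin(phi) - tan(gamma) * cos(phi))."""
    return math.tan(theta) * (math.sin(phi) - tan_gamma * math.cos(phi))

def azimuth_slope_approx(theta, phi):
    """Eq. (13.2): tan(omega) ~ sin(phi) * tan(theta), valid for small tilts."""
    return math.sin(phi) * math.tan(theta)

theta = math.radians(1.0)                # measured orientation-angle shift
phi = math.radians(45.0)                 # radar look angle
tan_gamma = math.tan(math.radians(0.5))  # small ground-range slope

exact = azimuth_slope_exact(theta, phi, tan_gamma)
approx = azimuth_slope_approx(theta, phi)
print(exact, approx)  # agree to within about 1% for these small tilts
```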

The above equation is important because it provides a direct link between polarimetric SAR measurable parameters and physical slopes on the ocean surface. This estimation of ocean slopes relies only on (1) knowledge of the radar look angle (generally known from the SAR viewing geometry) and (2) measurement of the wave-perturbed orientation angle. In ocean areas where the average scattering mechanism is predominantly tilted-Bragg scatter, the orientation angle can be measured accurately for angular changes < 1°, as demonstrated in Ref. [20]. POLSAR data can be represented by the scattering matrix for single-look complex data and by the Stokes matrix, the covariance matrix, or the coherency matrix for multi-look data. An orientation angle shift causes a rotation of all of these matrices about the line of sight. Several methods have been developed to estimate the azimuth slope-induced orientation angles for terrain and ocean applications. The "polarization signature maximum" method and the "circular polarization" method have proven to be the two most effective. Complete details of these methods and the relation of the orientation angle to orthogonal slopes and radar parameters are given in Refs. [21,22].

13.2.3.2 Orientation Angle Measurement Using the Circular-Pol Algorithm

Image processing was done with both the polarization signature maximum and the circular polarization algorithms. The results indicate that for ocean images a significant improvement in wave visibility is achieved when a circular polarization algorithm is chosen. In addition to this improvement, the circular polarization algorithm is


computationally more efficient. Therefore, the circular polarization algorithm was chosen to estimate orientation angles. The most sensitive circular polarization estimator [21], which involves the RR (right-hand transmit, right-hand receive) and LL (left-hand transmit, left-hand receive) terms, is

$$\theta = \frac{\mathrm{Arg}\left(\langle S_{RR} S_{LL}^{*} \rangle\right) + \pi}{4} \qquad (13.3)$$

A linear-pol basis has a similar transmit-and-receive convention, but the terms (HH, VV, HV, VH) involve horizontally (H) and vertically (V) transmitted or received components. The known relations between a circular-pol basis and a linear-pol basis are

$$S_{RR} = (S_{HH} - S_{VV} + i2S_{HV})/2, \qquad S_{LL} = (S_{VV} - S_{HH} + i2S_{HV})/2 \qquad (13.4)$$

Using the above equation, the Arg term of Equation 13.3 can be written in linear-pol terms, giving

$$\theta = \frac{1}{4}\left[\tan^{-1}\!\left(\frac{-4\,\mathrm{Re}\left\langle (S_{HH}-S_{VV})S_{HV}^{*}\right\rangle}{-\left\langle |S_{HH}-S_{VV}|^{2}\right\rangle + 4\left\langle |S_{HV}|^{2}\right\rangle}\right) + \pi\right] \qquad (13.5)$$
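The estimator of Equation 13.3 through Equation 13.5 can be exercised on a synthetic, reflection-symmetric (Bragg-like) scatterer rotated about the line of sight by a known angle. The fragment below is an illustrative sketch (the ±45° folding step is handled explicitly) that recovers the imposed rotation:

```python
import numpy as np

def orientation_angle(shh, svv, shv):
    """Circular-pol estimator of Eq. (13.3)-(13.5), written in linear-pol terms."""
    num = -4.0 * np.mean((shh - svv) * np.conj(shv)).real
    den = -np.mean(np.abs(shh - svv) ** 2) + 4.0 * np.mean(np.abs(shv) ** 2)
    theta = 0.25 * (np.arctan2(num, den) + np.pi)
    if theta > np.pi / 4:          # fold into the principal interval (-45, +45] deg
        theta -= np.pi / 2
    return theta

def rotate_scatterer(shh, svv, theta0):
    """Rotate a reflection-symmetric scatterer (S_HV = 0) about the radar line of sight."""
    c, s = np.cos(theta0), np.sin(theta0)
    return (c * c * shh + s * s * svv,
            s * s * shh + c * c * svv,
            c * s * (shh - svv))

shh_r, svv_r, shv_r = rotate_scatterer(1.0 + 0j, 0.6 + 0j, 0.15)
print(orientation_angle(shh_r, svv_r, shv_r))  # recovers 0.15 rad
```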

The above equation gives the orientation angle, θ, in terms of three terms of the linear-pol coherency matrix. This algorithm was proven successful in Ref. [21]. An example of its accuracy is cited from related earlier studies involving wave–current interactions [1]. In those studies, it was shown that small wave-slope asymmetries could be accurately detected as changes in the orientation angle. These small asymmetries had been predicted by theory [1], and their detection indicates the sensitivity of the circular-pol orientation angle measurement.

13.2.4 Ocean Wave Spectra Measured Using Orientation Angles

NASA/JPL AIRSAR data were taken (1994) at L-band, imaging a northern California coastal area near the town of Gualala (Mendocino County) and the Gualala River. This data set was used to determine whether the azimuth component of an ocean wave spectrum could be measured using orientation angle modulation. The radar resolution cell had dimensions of 6.6 m (range direction) × 8.2 m (azimuth direction), and 3 × 3 boxcar averaging was applied to the data input into the orientation angle algorithms. Figure 13.1 is an L-band, VV-pol, pseudo color-coded image of the coastal area and the selected measurement study site. A wave system with an estimated dominant wavelength of 157 m is propagating through the site with a wind-wave direction of 306° (estimates from the wave spectra, Figure 13.4). The scattering geometry for a single average-tilt radar resolution cell is shown in Figure 13.2. Modulations in the polarization orientation angle induced by azimuth-traveling ocean waves in the study area are shown in Figure 13.3a, and a histogram of the orientation angles is given in Figure 13.3b. An orientation angle spectrum versus wave number for azimuth-direction waves propagating in the study area is given in Figure 13.4. The white rings correspond to ocean wavelengths of 50, 100, 150, and 200 m. The dominant 157-m wave is propagating at a heading of 306°. Figure 13.5a and Figure 13.5b give plots of spectral intensity versus wave number (a) for wave-induced orientation angle modulations and (b) for single-polarization (VV-pol) intensity modulations. The plots are of wave spectra taken in the direction that maximizes the dominant wave peak. The orientation angle–dominant wave peak


FIGURE 13.1 (See color insert following page 302.) An L-band, VV-pol AIRSAR image of northern California coastal waters (Gualala River data set), showing ocean waves propagating through a study-site box (512 × 512 pixels). The wave direction is 306°.

(Figure 13.5a) has a significantly higher signal-to-background ratio than the conventional intensity-based VV-pol dominant wave peak (Figure 13.5b). Finally, the orientation angles measured within the study site were converted into azimuth-direction slopes using an average incidence angle and Equation 13.2. From these values, the ocean rms azimuth slopes were computed; they are given in Table 13.1.


FIGURE 13.2 Geometry of scattering from a single, tilted resolution cell, showing the incidence plane, the surface normal N, and the ground-range and azimuth directions. In the Gualala River data set, the resolution cell has dimensions of 6.6 m (range direction) × 8.2 m (azimuth direction).

FIGURE 13.3 (a) Image of modulations in the orientation angle, θ, in the study site and (b) a histogram of the distribution of study-site θ values.



13.2.5 Two-Scale Ocean-Scattering Model: Effect on the Orientation Angle Measurement

In Section 13.2.3.2, Equation 13.5 was given for the orientation angle θ as a function of three terms of the polarimetric coherency matrix T. Scattering has so far been considered as occurring from a slightly rough, tilted surface equal to or greater than the size of the radar resolution cell (see Figure 13.2); the surface is planar and has a single tilt $\theta_s$. This section examines the effects of having a distribution of azimuth tilts, $p(w)$, within the resolution cell rather than a single averaged tilt. For single-look or multi-look processed data, the coherency matrix is defined as

$$T = \langle \mathbf{k}\mathbf{k}^{*T} \rangle = \frac{1}{2}
\begin{bmatrix}
\langle |S_{HH}+S_{VV}|^{2} \rangle & \langle (S_{HH}+S_{VV})(S_{HH}-S_{VV})^{*} \rangle & 2\langle (S_{HH}+S_{VV})S_{HV}^{*} \rangle \\
\langle (S_{HH}-S_{VV})(S_{HH}+S_{VV})^{*} \rangle & \langle |S_{HH}-S_{VV}|^{2} \rangle & 2\langle (S_{HH}-S_{VV})S_{HV}^{*} \rangle \\
2\langle S_{HV}(S_{HH}+S_{VV})^{*} \rangle & 2\langle S_{HV}(S_{HH}-S_{VV})^{*} \rangle & 4\langle |S_{HV}|^{2} \rangle
\end{bmatrix}
\qquad (13.6)$$

with

$$\mathbf{k} = \frac{1}{\sqrt{2}}
\begin{bmatrix} S_{HH}+S_{VV} \\ S_{HH}-S_{VV} \\ 2S_{HV} \end{bmatrix}$$
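For concreteness, the coherency matrix of Equation 13.6 can be formed directly from the Pauli scattering vector. The sketch below (arbitrary illustrative scattering amplitudes) confirms the two properties relied on throughout this section: Hermitian symmetry and a trace equal to the total backscattered power (span):

```python
import numpy as np

def coherency_matrix(shh, svv, shv):
    """Single-look coherency matrix T = k k*^T from the Pauli vector k (Eq. 13.6)."""
    k = np.array([shh + svv, shh - svv, 2.0 * shv]) / np.sqrt(2.0)
    return np.outer(k, k.conj())

T = coherency_matrix(1.0 + 0.2j, 0.7 - 0.1j, 0.05 + 0.02j)
# Trace of T equals the span |S_HH|^2 + |S_VV|^2 + 2|S_HV|^2.
span = abs(1.0 + 0.2j) ** 2 + abs(0.7 - 0.1j) ** 2 + 2 * abs(0.05 + 0.02j) ** 2
print(np.allclose(T, T.conj().T), np.isclose(np.trace(T).real, span))
```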

We now follow the approach of Cloude and Pottier [23]. The composite surface consists of flat, slightly rough "facets" that are tilted in the azimuth direction with a distribution of tilts, $p(w)$:

$$p(w) = \begin{cases} \dfrac{1}{2b} & |w - s| \le b \\[4pt] 0 & \text{otherwise} \end{cases} \qquad (13.7)$$

where $s$ is the average surface azimuth tilt of the entire radar resolution cell. The effect on T of having both (1) a mean bias azimuthal tilt $\theta_s$ and (2) a distribution of azimuthal tilts of half-width $b$ has been calculated in Ref. [22] as

$$T = \begin{bmatrix}
A & B\,\mathrm{sinc}(2b)\cos 2\theta_s & -B\,\mathrm{sinc}(2b)\sin 2\theta_s \\
B^{*}\,\mathrm{sinc}(2b)\cos 2\theta_s & 2C\left(\sin^{2}2\theta_s + \mathrm{sinc}(4b)\cos 4\theta_s\right) & C\left(1 - 2\,\mathrm{sinc}(4b)\right)\sin 4\theta_s \\
-B^{*}\,\mathrm{sinc}(2b)\sin 2\theta_s & C\left(1 - 2\,\mathrm{sinc}(4b)\right)\sin 4\theta_s & 2C\left(\cos^{2}2\theta_s - \mathrm{sinc}(4b)\cos 4\theta_s\right)
\end{bmatrix} \qquad (13.8)$$
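The first-row behavior in Equation 13.8 — roll-invariance of A and the sinc(2b) cos 2θs factor on B — can be verified by brute-force averaging over the uniform tilt distribution of Equation 13.7. The fragment below uses illustrative facet values and checks only those first-row elements:

```python
import numpy as np

def rotation(w):
    """Line-of-sight rotation of the coherency matrix by a tilt angle w."""
    c, s = np.cos(2 * w), np.sin(2 * w)
    return np.array([[1, 0, 0], [0, c, s], [0, -s, c]])

def tilt_averaged_T(T0, s_mean, b_half):
    """Average R(w) T0 R(w)^T over the uniform tilt distribution p(w) of Eq. (13.7)."""
    ws = np.linspace(s_mean - b_half, s_mean + b_half, 4001)
    return np.mean([rotation(w) @ T0 @ rotation(w).T for w in ws], axis=0)

sinc = lambda x: np.sinc(x / np.pi)   # np.sinc(x) is sin(pi x)/(pi x)

# Bragg-facet coherency matrix (real, S_HV = 0): only T11, T12, T22 are nonzero.
T0 = np.array([[2.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 0.0]])
s_mean, b_half = 0.1, 0.3
T = tilt_averaged_T(T0, s_mean, b_half)

print(np.isclose(T[0, 0], T0[0, 0]))  # the A term is roll-invariant
print(np.isclose(T[0, 1], T0[0, 1] * sinc(2 * b_half) * np.cos(2 * s_mean), atol=1e-4))
```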


FIGURE 13.4 (See color insert following page 302.) Orientation angle spectrum versus wave number for azimuth-direction waves propagating through the study site. The white rings correspond to wavelengths of 50, 100, 150, and 200 m. The dominant wave, of wavelength 157 m, is propagating at a heading of 306°.

where $\mathrm{sinc}(x) = \sin(x)/x$ and

$$A = \langle |S_{HH}+S_{VV}|^{2} \rangle, \qquad B = \langle (S_{HH}+S_{VV})(S_{HH}^{*}-S_{VV}^{*}) \rangle, \qquad C = 0.5\,\langle |S_{HH}-S_{VV}|^{2} \rangle$$

Equation 13.8 reveals the changes due to the tilt distribution ($b$) and the bias ($\theta_s$) that occur in all terms except $A = \langle |S_{HH}+S_{VV}|^{2} \rangle$, which is roll-invariant. In the corresponding expression for the orientation angle, all of the terms except the denominator term $\langle |S_{HH}-S_{VV}|^{2} \rangle$ are modified:

$$\theta = \frac{1}{4}\left[\tan^{-1}\!\left(\frac{-4\,\mathrm{Re}\left\langle (S_{HH}-S_{VV})S_{HV}^{*}\right\rangle}{-\left\langle |S_{HH}-S_{VV}|^{2}\right\rangle + 4\left\langle |S_{HV}|^{2}\right\rangle}\right) + \pi\right] \qquad (13.9)$$

From Equation 13.8 and Equation 13.9, it follows that exact estimation of the orientation angle θ becomes more difficult as the distribution of ocean tilts becomes stronger and wider, that is, as the ocean surface becomes progressively rougher.


FIGURE 13.5 Plots of spectral intensity versus azimuthal wave number (a) for L-band wave-induced orientation angle modulations and (b) for L-band VV-pol intensity modulations. The plots are taken in the propagation direction of the dominant wave (306°).

13.2.6 Alpha Parameter Measurement of Range Slopes

A second measurement technique is needed to remotely sense waves that have significant propagation-direction components in the range direction. The technique must be more sensitive than current intensity-based techniques, which depend on tilt and hydrodynamic modulations. Physically based POLSAR measurements of ocean slopes in the range direction are achieved using a technique involving the "alpha" parameter of the Cloude–Pottier polarimetric decomposition theorem [23].

13.2.6.1 Cloude–Pottier Decomposition Theorem and the Alpha Parameter

The Cloude–Pottier entropy, anisotropy, and alpha polarimetric decomposition theorem [23] introduces a parameterization of the eigenvectors of the 3 × 3 averaged coherency matrix $\langle T \rangle$ in the form

TABLE 13.1
Northern California: Gualala Coastal Results

Parameter                            Bodega Bay, CA           Point Arena, CA       Orientation Angle         Alpha Angle
                                     3-m Discus Buoy 46013    Wind Station          Method                    Method
-------------------------------------------------------------------------------------------------------------------------------
Dominant wave period (s)             10.0                     N/A                   10.03 (from dominant      10.2 (from dominant
                                                                                    wave number)              wave number)
Dominant wavelength (m)              156 (from period, depth) N/A                   157 (from wave spectra)   162 (from wave spectra)
Dominant wave direction (deg)        320 (est. from wind      284 (est. from wind   306 (from wave spectra)   306 (from wave spectra)
                                     direction)               direction)
rms slopes, azimuth direction (deg)  N/A                      N/A                   1.58                      N/A
rms slopes, range direction (deg)    N/A                      N/A                   N/A                       1.36
Estimate of wave height (m)          2.4 (significant         N/A                   2.16 (est. from rms       1.92 (est. from rms
                                     wave height)                                   slope, wave number)       slope, wave number)

Note: Date: 7/15/94; data start time (UTC): 20:04:44 (BB, PA), 20:02:98 (AIRSAR); wind speed: 1.0 m/s (BB), 2.9 m/s (PA), mean = 1.95 m/s; wind direction: 320° (BB), 284° (PA), mean = 302°; buoy: "Bodega Bay" (46013) = BB, location 38.23 N, 123.33 W, water depth 122.5 m; wind station: "Point Arena" (PTAC-1) = PA, location 38.96 N, 123.74 W; study-site location: 38°39.6′ N, 123°35.8′ W.

$$\langle T \rangle = [U_3]
\begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{bmatrix}
[U_3]^{*T} \qquad (13.10)$$

where

$$[U_3] = e^{j\phi}
\begin{bmatrix}
\cos\alpha_1 & \cos\alpha_2\, e^{j\phi_2} & \cos\alpha_3\, e^{j\phi_3} \\
\sin\alpha_1 \cos\beta_1\, e^{j\delta_1} & \sin\alpha_2 \cos\beta_2\, e^{j\delta_2} & \sin\alpha_3 \cos\beta_3\, e^{j\delta_3} \\
\sin\alpha_1 \sin\beta_1\, e^{j\gamma_1} & \sin\alpha_2 \sin\beta_2\, e^{j\gamma_2} & \sin\alpha_3 \sin\beta_3\, e^{j\gamma_3}
\end{bmatrix} \qquad (13.11)$$

The average estimate of the alpha parameter is

$$\bar{\alpha} = P_1\alpha_1 + P_2\alpha_2 + P_3\alpha_3 \qquad (13.12)$$

where

$$P_i = \frac{\lambda_i}{\sum_{j=1}^{3} \lambda_j} \qquad (13.13)$$
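In code, the mean alpha of Equation 13.10 through Equation 13.13 is a few lines of eigen-analysis. The sketch below (with an illustrative diagonal test matrix) checks it against a case whose answer is known by inspection:

```python
import numpy as np

def mean_alpha(T):
    """Mean alpha from the eigen-decomposition of <T> (Eq. 13.10-13.13):
    alpha_i = arccos(|first component of eigenvector i|), weighted by P_i."""
    lam, U = np.linalg.eigh(T)                  # eigenvalues in ascending order
    lam, U = np.clip(lam[::-1], 0.0, None), U[:, ::-1]
    P = lam / lam.sum()                         # pseudo-probabilities, Eq. (13.13)
    alphas = np.arccos(np.clip(np.abs(U[0, :]), 0.0, 1.0))
    return float(P @ alphas)                    # Eq. (13.12)

# diag(2, 1, 0) has eigenvectors along the Pauli axes, so alpha_1 = 0 and
# alpha_2 = alpha_3 = 90 deg; with P = (2/3, 1/3, 0), the mean alpha is 30 deg.
T = np.diag([2.0, 1.0, 0.0]).astype(complex)
print(np.degrees(mean_alpha(T)))  # 30.0
```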

The individual alphas, $\alpha_1$, $\alpha_2$, and $\alpha_3$, correspond to the three eigenvectors, and the $P_i$ are the probabilities defined with respect to the eigenvalues. In this method, the average alpha is used and is, for simplicity, denoted $\alpha$. For ocean backscatter, however, the contributions to the average alpha are dominated by the first eigenvalue/eigenvector term. The alpha parameter, developed from the Cloude–Pottier polarimetric scattering decomposition theorem [23], has desirable directional measurement properties: it is (1) roll-invariant in the azimuth direction and (2) highly sensitive in the range direction to wave-induced modulations of the local incidence angle φ. Thus, the


alpha parameter is well suited to measuring wave components traveling in the range direction and discriminates against wave components traveling in the azimuth direction.

13.2.6.2 Alpha Parameter Sensitivity to Range Traveling Waves

The alpha angle sensitivity to range traveling waves may be estimated using the small perturbation scattering model (SPSM) as a basis. For flat, slightly rough scattering areas that can be characterized by Bragg scattering, the scattering matrix has the form

$$S = \begin{bmatrix} S_{HH} & 0 \\ 0 & S_{VV} \end{bmatrix} \qquad (13.14)$$

Bragg-scattering coefficients SVV and SHH are given by

$$S_{HH} = \frac{\cos\phi_i - \sqrt{\varepsilon_r - \sin^{2}\phi_i}}{\cos\phi_i + \sqrt{\varepsilon_r - \sin^{2}\phi_i}}
\qquad \text{and} \qquad
S_{VV} = \frac{(\varepsilon_r - 1)\left(\sin^{2}\phi_i - \varepsilon_r(1 + \sin^{2}\phi_i)\right)}{\left(\varepsilon_r\cos\phi_i + \sqrt{\varepsilon_r - \sin^{2}\phi_i}\right)^{2}} \qquad (13.15)$$

The alpha angle is defined such that the eigenvectors of the coherency matrix T are parameterized by a vector k as

$$\mathbf{k} = \begin{bmatrix} \cos\alpha \\ \sin\alpha\cos\beta\, e^{j\delta} \\ \sin\alpha\sin\beta\, e^{j\gamma} \end{bmatrix} \qquad (13.16)$$

For Bragg scattering, one may assume that there is only one dominant eigenvector (depolarization is negligible), and this eigenvector is given by

$$\mathbf{k} = \begin{bmatrix} S_{VV} + S_{HH} \\ S_{VV} - S_{HH} \\ 0 \end{bmatrix} \qquad (13.17)$$

Since there is only one dominant eigenvector for Bragg scattering, $\alpha = \alpha_1$. For a horizontal, slightly rough resolution cell, the orientation angle $\beta = 0$, and $\delta$ may be set to zero. With these constraints, comparing Equation 13.16 and Equation 13.17 yields

$$\tan\alpha = \frac{S_{VV} - S_{HH}}{S_{VV} + S_{HH}} \qquad (13.18)$$
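Equations 13.15 and 13.18 together give alpha as a closed-form function of the incidence angle and the dielectric constant. A short numerical check follows (the sea-water value 80 − 70j is the one used for Figure 13.6; taking the magnitude of the complex ratio is an assumption made here for the complex-dielectric case):

```python
import cmath, math

def bragg_alpha(phi, eps_r):
    """Alpha angle for Bragg scatter from Eq. (13.15) and Eq. (13.18); phi in radians."""
    root = cmath.sqrt(eps_r - math.sin(phi) ** 2)
    shh = (math.cos(phi) - root) / (math.cos(phi) + root)
    svv = ((eps_r - 1) * (math.sin(phi) ** 2 - eps_r * (1 + math.sin(phi) ** 2))
           / (eps_r * math.cos(phi) + root) ** 2)
    return math.atan(abs((svv - shh) / (svv + shh)))

phi = math.radians(45.0)
alpha_pec = math.degrees(bragg_alpha(phi, 1e9))       # near-perfect conductor
alpha_sea = math.degrees(bragg_alpha(phi, 80 - 70j))  # sea water
print(alpha_pec, alpha_sea)
# alpha_pec ~ 26.57 deg, matching the eps -> infinity limit atan(sin^2 45 deg) = atan(0.5).
```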

For $\varepsilon_r \to \infty$, the Bragg coefficients reduce (to within a common factor) to

$$S_{VV} \propto 1 + \sin^{2}\phi_i \qquad \text{and} \qquad S_{HH} \propto \cos^{2}\phi_i \qquad (13.19)$$

which yields

$$\tan\alpha = \sin^{2}\phi_i \qquad (13.20)$$

Figure 13.6 shows the alpha angle as a function of incidence angle for $\varepsilon \to \infty$ (blue) and for $\varepsilon = 80 - 70j$ (red), a representative dielectric constant of sea water. The sensitivity of the alpha values to incidence-angle changes (this is effectively the polarimetric

FIGURE 13.6 (See color insert following page 302.) Small perturbation model dependence of alpha on the incidence angle. The red curve is for a dielectric constant representative of sea water (80 − 70j) and the blue curve is for a perfectly conducting surface.

MTF) matters because the range slope estimation depends on the derivative of α with respect to $\phi_i$. For $\varepsilon \to \infty$,

$$\frac{\Delta\alpha}{\Delta\phi_i} = \frac{\sin 2\phi_i}{1 + \sin^{4}\phi_i} \qquad (13.21)$$
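Equation 13.21 can be checked against a finite difference of Equation 13.20, and the MTF claim for the AIRSAR swath verified directly (a small numerical sketch, perfectly conducting case only):

```python
import math

def alpha_perfect(phi):
    """Eq. (13.20): alpha = atan(sin^2 phi) for a perfectly conducting surface."""
    return math.atan(math.sin(phi) ** 2)

def alpha_mtf(phi):
    """Eq. (13.21): d(alpha)/d(phi) = sin(2 phi) / (1 + sin^4 phi)."""
    return math.sin(2 * phi) / (1 + math.sin(phi) ** 4)

phi, h = math.radians(40.0), 1e-6
fd = (alpha_perfect(phi + h) - alpha_perfect(phi - h)) / (2 * h)
mtf_ok = all(alpha_mtf(math.radians(d)) > 0.5 for d in range(20, 61))
print(abs(fd - alpha_mtf(phi)) < 1e-8, mtf_ok)  # True True
```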

Figure 13.7 shows this curve (blue) and the exact curve for $\varepsilon = 80 - 70j$ (red). Note that for the typical AIRSAR range of incidence angles (20°–60°) across the swath, the effective MTF is high (>0.5).

13.2.6.3 Alpha Parameter Measurement of Range Slopes and Wave Spectra

Model studies [6] provide an estimate of what the parametric relation of α versus the incidence angle φ should be for an assumed Bragg-scatter model. The sensitivity (i.e., the slope of the curve of α(φ)) was large enough (Figure 13.6) to warrant investigation using real POLSAR ocean backscatter data. In Figure 13.8a, a curve of α versus the incidence angle φ is given for a strip of Gualala data in the range direction that has been averaged over 10 pixels in the azimuth direction. This curve shows a high sensitivity for the slope of α(φ). Figure 13.8b gives a histogram of the frequency of occurrence of the alpha values. The curve of Figure 13.8a was smoothed by a least-squares fit of the α(φ) data to a third-order polynomial function. This closely fitting curve was used to transform the α values into corresponding incidence-angle (φ) perturbations. Pottier [6] used a model-based approach and fitted a third-order polynomial to the α(φ) (red curve) of Figure 13.6 instead of using the smoothed, actual image α(φ) data. A distribution of φ values was formed, and the rms range slope value was determined. The rms range slope values for the data sets are given in Table 13.1 and Table 13.2.
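The smoothing-and-inversion step can be sketched as follows; the forward model below uses the perfectly conducting SPM curve of Equation 13.20 purely as an illustration (the actual processing fitted the measured α(φ) image data):

```python
import numpy as np

phi = np.radians(np.linspace(20.0, 60.0, 200))   # AIRSAR incidence-angle range
alpha = np.arctan(np.sin(phi) ** 2)              # illustrative forward model, Eq. (13.20)

# Third-order polynomial fit of phi as a function of alpha, mirroring the
# least-squares smoothing in the text; the fit then maps measured alpha values
# back to incidence-angle (and hence range-slope) perturbations.
phi_of_alpha = np.poly1d(np.polyfit(alpha, phi, 3))

test_phi = np.radians(37.0)
recovered = phi_of_alpha(np.arctan(np.sin(test_phi) ** 2))
print(np.degrees(recovered))  # ~37 deg round-trip
```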

FIGURE 13.7 (See color insert following page 302.) Derivative of alpha with respect to the incidence angle. The red curve is for a sea water dielectric and the blue curve is for a perfectly conducting surface.

Finally, to measure an alpha wave spectrum, an image of the study area is formed with the mean of α(φ) removed line by line in the range direction. An FFT of the study area then yields the wave spectrum shown in Figure 13.9. The spectrum of Figure 13.9 is an alpha spectrum in the range direction. It can be converted to a range-direction wave slope spectrum by transforming the slope values obtained from the smoothed alpha, α(φ), values.

FIGURE 13.8 Empirical determination of (a) the sensitivity of the alpha parameter to the radar incidence angle (for Gualala River data) and (b) a histogram of the alpha values occurring within the study site.


13.2.7 Measured Wave Properties and Comparisons with Buoy Data

The ocean wave properties estimated from the L- and P-band SAR data sets and the algorithms are (1) the dominant wavelength, (2) the dominant wave direction, (3) the rms slopes (azimuth and range), and (4) the average dominant wave height. The NOAA NDBC buoys provided data on (1) the dominant wave period, (2) wind speed and direction, (3) significant wave height, and (4) wave classification (swell or wind waves). Both the Gualala and the San Francisco data sets involved waves classified as swell. Estimates of the average wave period can be determined either from buoy data or from the SAR-determined dominant wave number and water depth (see Equation 13.22). The dominant wavelength and direction are obtained from the wave spectra (see Figure 13.4 and Figure 13.9). The rms slopes in the azimuth direction are determined from the distribution of orientation angles converted to slope angles using Equation 13.2. The rms slopes in the range direction are determined from the distribution of alpha angles converted to slope angles using values of the smoothed curve fitted to the data of Figure 13.8a. Finally, an estimate of the average wave height, $H_d$, of the dominant wave was made using the peak-to-trough rms slope in the propagation direction, $S_{rms}$, and the dominant wavelength, $\lambda_d$. The estimated average dominant wave height was then determined from $\tan(S_{rms}) = H_d/(\lambda_d/2)$. This average dominant wave height estimate was compared with the (related) significant wave height provided by the NDBC buoy. The results of the measurement comparisons are given in Table 13.1 and Table 13.2.
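The wave-height step is a one-line inversion of tan(S_rms) = H_d/(λ_d/2); applied to the Table 13.1 slope and wavelength values, it reproduces the tabulated height estimates to within rounding:

```python
import math

def dominant_wave_height(rms_slope_deg, wavelength_m):
    """Invert tan(S_rms) = H_d / (lambda_d / 2) for the average dominant wave height."""
    return (wavelength_m / 2.0) * math.tan(math.radians(rms_slope_deg))

h_orient = dominant_wave_height(1.58, 157.0)  # orientation-angle method, Table 13.1
h_alpha = dominant_wave_height(1.36, 162.0)   # alpha-angle method, Table 13.1
print(h_orient, h_alpha)  # ~2.17 m and ~1.92 m (Table 13.1 lists 2.16 and 1.92)
```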

13.2.7.1 Coastal Wave Measurements: Gualala River Study Site

For the Gualala River data set, parameters were calculated to characterize ocean waves present in the study area. Table 13.1 gives a summary of the ocean parameters that were determined using the data set as well as wind conditions and air and sea temperatures at the nearby NDBC buoy (‘‘Bodega Bay’’) and wind station (‘‘Point Arena’’) sites. The most important measured SAR parameters were rms wave slopes (azimuth and range directions), rms wave height, dominant wave period, and dominant wavelength. These quantities were estimated using the full-polarization data and the NDBC buoy data.

TABLE 13.2
Open Ocean: Pacific Swell Results

Parameter                            San Francisco, CA        Half Moon Bay, CA        Orientation Angle        Alpha Angle
                                     3-m Discus Buoy 46026    3-m Discus Buoy 46012    Method                   Method
-----------------------------------------------------------------------------------------------------------------------------
Dominant wave period (s)             15.7                     15.7                     15.17 (from dominant     15.23 (from dominant
                                                                                       wave number)             wave number)
Dominant wavelength (m)              376 (from period, depth) 364 (from period, depth) 359 (from wave spectra)  362 (from wave spectra)
Dominant wave direction (deg)        289 (est. from wind      280 (est. from wind      265 (from wave spectra)  265 (from wave spectra)
                                     direction)               direction)
rms slopes, azimuth direction (deg)  N/A                      N/A                      0.92                     N/A
rms slopes, range direction (deg)    N/A                      N/A                      N/A                      0.86
Estimate of wave height (m)          3.10 (significant        2.80 (significant        2.88 (est. from rms      2.72 (est. from rms
                                     wave height)             wave height)             slope, wave number)      slope, wave number)

Note: Date: 7/17/88; data start time (UTC): 00:45:26 (buoys SF, HMB), 00:52:28 (AIRSAR); wind speed: 8.1 m/s (SF), 5.0 m/s (HMB), mean = 6.55 m/s; wind direction: 289° (SF), 280° (HMB), mean = 284.5°; buoys: "San Francisco" (46026) = SF, location 37.75 N, 122.82 W, water depth 52.1 m; "Half Moon Bay" (46012) = HMB, location 37.36 N, 122.88 W, water depth 87.8 m.

FIGURE 13.9 (See color insert following page 302.) Spectrum of waves in the range direction using the alpha parameter from the Cloude–Pottier decomposition method. The wave direction is 306° and the dominant wavelength is 162 m.

The dominant wave at the Bodega Bay buoy during the measurement period is classified as a long-wavelength swell. The contribution from wind-wave systems or other swell components is small relative to the single dominant wave system. Using the surface gravity wave dispersion relation, one can calculate the dominant wavelength at this buoy location, where the water depth is 122.5 m. The dispersion relation for surface water waves at finite depth is

$$\omega_W^{2} = g\,k_W \tanh(k_W H) \qquad (13.22)$$

where $\omega_W$ is the wave frequency, $k_W$ is the wave number ($2\pi/\lambda$), and $H$ is the water depth. The calculated value for λ is given in Table 13.1. A spectral profile similar to Figure 13.5a was developed for the alpha parameter technique, and a dominant wave was measured having a wavelength of 156 m and a propagation direction of 306°. Estimates of the ocean parameters obtained using the orientation angle and alpha angle algorithms are summarized in Table 13.1.

13.2.7.2 Open-Ocean Measurements: San Francisco Study Site

AIRSAR P-band image data were obtained for an ocean swell traveling in the azimuth direction. The location of this image was to the west of San Francisco Bay. It is a valuable

FIGURE 13.10 P-band SPAN image of near-azimuth-traveling (265°) swell in the Pacific Ocean off the coast of California near San Francisco.

data set because its location is near two NDBC buoys ("San Francisco" and "Half Moon Bay"). Figure 13.10 gives a SPAN image of the ocean scene; the long-wavelength swell is clearly visible. The covariance matrix data were first Lee-filtered to reduce speckle noise [24] and then corrected radiometrically. A polarimetric signature was developed for a 512 × 512 segment of the image, and some distortion was noted. Measuring the distribution of the phase between the HH-pol and VV-pol backscatter returns eliminated this distortion: for the ocean, this distribution should have a mean nearly equal to zero, so the recalibration procedure set the mean to zero, and the distortion in the polarimetric signature was corrected. Figure 13.11a gives a plot of the spectral intensity (cross-section modulation) versus the wave number in the direction of the dominant wave propagation. Figure 13.11b presents a spectrum of orientation angles versus the wave number. The major peak, caused by the visible swell, in both plots occurs at a wave number of 0.0175 m⁻¹, or a wavelength of 359 m. Using Equation 13.22, the dominant wavelength was calculated at the San Francisco/Half Moon Bay buoy positions and depths. Estimates of the wave parameters developed from this data set using the orientation and alpha angle algorithms are presented in Table 13.2.
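The wavelength calculations from Equation 13.22 amount to a fixed-point solve for the wave number given the buoy period and depth. The sketch below reproduces the Gualala (Bodega Bay) value from Table 13.1, where the water is effectively deep:

```python
import math

def wavelength_from_period(period_s, depth_m, g=9.81, iterations=100):
    """Solve the dispersion relation of Eq. (13.22), omega^2 = g k tanh(k H),
    for the wavelength by fixed-point iteration from the deep-water guess."""
    omega = 2.0 * math.pi / period_s
    k = omega ** 2 / g                       # deep-water wave number as first guess
    for _ in range(iterations):
        k = omega ** 2 / (g * math.tanh(k * depth_m))
    return 2.0 * math.pi / k

# Bodega Bay: T = 10.0 s in 122.5 m of water gives the ~156 m wavelength of Table 13.1.
print(wavelength_from_period(10.0, 122.5))  # ~156 m
```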

FIGURE 13.11 (a) Wave number spectrum of P-band intensity modulations and (b) a wave number spectrum of orientation angle modulations. The plots are taken in the propagation direction of the dominant wave (265°).

13.3 Polarimetric Measurement of Ocean Wave–Current Interactions

13.3.1 Introduction

Studies have been carried out on the use of polarization orientation angles to remotely sense ocean wave slope distribution changes caused by wave–current interactions. The wave–current features studied here involve the surface manifestations of internal waves [1,25,26–29,30–32] and wave modifications at oceanic current fronts. Studies have shown that polarimetric SAR data may be used to measure bare surface roughness [33] and terrain topography [34,35]. Techniques have also been developed for measuring directional ocean wave spectra [36].


The polarimetric SAR image data used in all of these studies are NASA/JPL AIRSAR P-, L-, and C-band, quad-pol microwave backscatter data. AIRSAR images of internal waves were obtained from the 1992 Joint US/Russia Internal Wave Remote Sensing Experiment (JUSREX'92) conducted in the New York Bight [25,26]. AIRSAR data on current fronts were obtained during the NRL Gulf Stream Experiment (NRL-GS'90), which is described in Ref. [20]. Extensive sea-truth data are available for both of these experiments. These studies were motivated by the observation that strong perturbations occur in the polarization orientation angle θ in the vicinity of internal waves and current fronts. The remote sensing of orientation angle changes associated with internal waves and current fronts is an application that has only recently been investigated [27–29]. Orientation angle changes should also occur in the related SAR application involving surface expressions of shallow-water bathymetry [37]. In the studies outlined here, polarization orientation angle changes are shown to be associated with wave–current interaction features. Orientation angle changes are not, however, produced by all types of ocean surface features. For example, orientation angle changes have been successfully used here to discriminate internal wave signatures from other ocean features, such as surfactant slicks, which produce no mean orientation angle change.

13.3.2 Orientation Angle Changes Caused by Wave–Current Interactions

A study was undertaken to determine the effect that several important types of wave–current interactions have on the polarization orientation angle. The study involved both actual SAR data and an NRL theoretical model described in Ref. [38]. An example of a JPL/AIRSAR VV-polarization, L-band image of several strong, intersecting internal wave packets is given in Figure 13.12. Packets of internal waves are generated from a parent soliton as the soliton propagates into shallower water, in this case at the continental shelf break. The white arrow in Figure 13.12 indicates the propagation direction of a wedge of internal waves (bounded by the dashed lines). The packet members within the area of this wedge were investigated. Radar cross-section (σ0) intensity perturbations for the type of internal waves encountered in the New York Bight have been calculated in Refs. [30,31,39], among others. Related perturbations also occur in the ocean wave height and slope spectra. For the solitons often found in the New York Bight area, these perturbations become significantly larger for ocean wavelengths longer than about 0.25 m and shorter than 10–20 m. Thus, the study is essentially concerned with slope changes at meter-length wave scales. The AIRSAR slant-range resolution cell size for these data is 6.6 m, and the azimuth resolution cell size is 12.1 m. These resolutions are fine enough for the SAR backscatter to be affected by perturbed wave slopes at meter-length scales. The changes in the orientation angle caused by these wave perturbations are seen in Figure 13.13. The magnitude of these perturbations covers a range θ = [−1°, +1°]. The orientation angle perturbations have a large spatial extent (>100 m for the internal wave soliton width). The hypothesis assumed was that wave–current interactions make the meter-wavelength slope distributions asymmetric. A profile of orientation angle perturbations caused by the internal wave study packet is given in Figure 13.14a.
The values are obtained along the propagation vector line of Figure 13.12. Figure 13.14b gives a comparison of the orientation angle profile (solid line) and a normalized VV-pol backscatter intensity profile (dot–dash line) along the same interval.

Signal and Image Processing for Remote Sensing

FIGURE 13.12 AIRSAR L-band, VV-pol image of internal wave intersecting packets in the New York Bight. The arrow indicates the propagation direction for the chosen study packet (within the dashed lines). The angle α relates the SAR and packet coordinate systems. The image intensity has been normalized by the overall average.

Note that the orientation angle positive peaks (white stripe areas, Figure 13.13) align with the negative troughs (black areas, Figure 13.12). In the direction orthogonal to the propagation vector, each point along the profile is an average over 5 × 5 pixels. The ratio of the maximum of θ caused by the soliton to the average value of θ within the ambient ocean is quite large. The current-induced asymmetry creates a mean wave slope that is manifested as a mean orientation angle. The relation between the tangent of the orientation angle θ, the wave slopes in the radar azimuth and ground range directions (tan ω, tan γ), and the radar look angle φ from [21] is given by Equation 13.1, and for a given look angle φ the average orientation angle tangent is

Polarimetric SAR Techniques for Remote Sensing of the Ocean Surface

FIGURE 13.13 The orientation angle image of the internal wave packets in the New York Bight; the orientation angle color scale runs from −1.0° to +1.0°. The area within the wedge (dashed lines) was studied intensively.

⟨tan θ⟩ = ∫₀^π ∫₀^π tan θ(ω, γ) P(ω, γ) dγ dω    (13.23)

where P(ω, γ) is the joint probability distribution function of the surface slopes in the azimuth and range directions. If the slopes are zero-meaned but P(ω, γ) is skewed, then the mean orientation angle may be nonzero even though the mean azimuth and range slopes are zero. It is evident from the above equation that both the azimuth and the range

FIGURE 13.14 (a) The orientation angle profile along the propagation vector for the internal wave study packet of Figure 13.12 and (b) a comparison of the orientation angle profile (solid line) and a normalized VV-pol backscatter intensity profile (dot–dash line). The profiles are plotted in degrees against distance along the line (in pixels). Note that the orientation angle positive peaks (white areas, Figure 13.13) align with the negative troughs (black areas, Figure 13.12).

slopes have an effect on the mean orientation angle. The azimuth slope effect is generally larger because it is not reduced by the cos φ term, which affects only the range slope. If, for instance, the meter-wavelength waves are produced by a broad wind-wave spectrum, then both ω and γ change locally. This yields a nonzero mean for the orientation angle. Figure 13.15 gives a histogram of orientation angle values (solid line) for a box inside the black area of the first packet member of the internal wave. A histogram of the ambient-ocean orientation angle values for a similar-sized box near the internal wave is given by the dot–dash–dot line in Figure 13.15. Notice the significant difference in the mean values of these two distributions. The mean change in ⟨tan θ⟩ inferred from the bias for the perturbed area within the internal wave is 0.03 rad, corresponding to a θ value of 1.72°. The mean water-wave slope changes needed to cause such orientation angle changes are estimated from Equation 13.1. In the denominator of Equation 13.1, the term tan(γ) cos(φ) is small compared with sin(φ) for the look angle φ (= 51°) at the packet member location. Using this approximation, the ensemble average of Equation 13.1 provides the mean azimuth slope value,

FIGURE 13.15 Distributions of orientation angles for the internal wave (solid line) and the ambient ocean (dot–dash–dot line); the abscissa is the mean orientation angle tangent (radians) and the ordinate is the number of occurrences.

⟨tan(ω)⟩ ≅ sin(φ) ⟨tan(θ)⟩    (13.24)

From the data provided in Figure 13.15, ⟨tan(ω)⟩ = 0.0229, or ω = 1.32°. A slope value of this magnitude is in approximate agreement with the slope changes predicted by Lyzenga et al. [32] for internal waves in the same area during an earlier experiment (SARSEX, 1988).
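The two relations above can be exercised numerically. In the sketch below, Equation 13.1 is taken in the standard form tan θ = tan ω/(sin φ − tan γ cos φ) from Ref. [21]; the two-point joint slope distribution is a made-up illustration of a skewed P(ω, γ), and the final lines re-check the quoted ⟨tan θ⟩ = 0.03 and φ = 51° arithmetic.

```python
import math

PHI = math.radians(51.0)  # radar look angle at the packet location

def tan_theta(omega_deg, gamma_deg, phi=PHI):
    """Orientation angle tangent from azimuth/range slopes (Equation 13.1)."""
    t_om = math.tan(math.radians(omega_deg))
    t_ga = math.tan(math.radians(gamma_deg))
    return t_om / (math.sin(phi) - t_ga * math.cos(phi))

# Hypothetical two-point joint slope distribution: zero mean in both the
# azimuth (omega) and range (gamma) slopes, but with correlated signs,
# i.e. a skewed joint PDF P(omega, gamma).
states = [(+5.0, +10.0), (-5.0, -10.0)]            # (omega, gamma) in degrees
mean_omega = sum(s[0] for s in states) / 2.0        # -> 0.0 (zero-mean slopes)
mean_gamma = sum(s[1] for s in states) / 2.0        # -> 0.0
mean_tan_theta = sum(tan_theta(*s) for s in states) / 2.0
print(mean_omega, mean_gamma, mean_tan_theta)       # mean orientation is biased

# Equation 13.24 check with the values quoted in the text:
theta_deg = math.degrees(math.atan(0.03))           # ~1.72 deg
tan_omega = math.sin(PHI) * 0.03                    # ~0.023 (text: 0.0229)
omega_deg = math.degrees(math.atan(tan_omega))      # ~1.3 deg (text: 1.32)
print(round(theta_deg, 2), round(omega_deg, 2))
```

Even though both slope components average to zero, the correlated denominator makes the averaged tangent positive, which is exactly the skewness mechanism invoked above.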

13.3.3 Orientation Angle Changes at Ocean Current Fronts

An example of orientation angle changes induced by a second type of wave–current interaction, the convergent current front, is given in Figure 13.16a and Figure 13.16b. This image was created using AIRSAR P-band polarimetric data. The orientation angle response to this (NRL-GS'90) Gulf Stream convergent-current front is the vertical white linear feature in Figure 13.16a and the sharp peak in Figure 13.16b. The perturbation of the orientation angle at, and near, the front location is quite strong relative to the angle fluctuations in the ambient ocean. The change in the orientation angle maximum is ≈ 0.68°. Other fronts in the same area of the Gulf Stream have similar changes in the orientation angle.

13.3.4 Modeling SAR Images of Wave–Current Interactions

To investigate wave–current-interaction features, a time-dependent ocean wave model has been developed that allows for general time-varying currents, wind fields, and depths [20,38]. The model uses conservation of wave action to compute the propagation of a statistical wind-wave system. The action density formalism that is used and an outline of the model are both described in Ref. [38]. The original model has been extended [1] to include calculations of polarization orientation angle changes due to wave–current interactions.
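For reference, action-balance models of this general kind (the specific formulation is in Ref. [38]; the symbols below are the standard ones rather than those of that report) propagate an action density N(x, k, t) according to

∂N/∂t + ∇x · (ẋ N) + ∇k · (k̇ N) = S,   ẋ = ∂Ω/∂k,   k̇ = −∇x Ω,   Ω(k, x, t) = σ(k, h) + k · U,

where σ is the intrinsic frequency over water depth h, U(x, t) is the surface current, and S collects wind input, dissipation, and nonlinear transfer. The current enters through the Doppler term k · U, which refracts and strains the wind-wave system and produces the slope-spectrum perturbations discussed above.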

FIGURE 13.16 Current front within the Gulf Stream. An orientation angle image is given in (a), and the orientation angle values (in degrees) along the white line in (a) are plotted in (b) against distance (km).

Model predictions have been made for the wind-wave field, radar return, and perturbation of the polarization orientation angle due to an internal wave. A model of the surface manifestation of an internal wave has also been developed. The algorithm used in the model has been modified from its original form to allow calculation of the polarization orientation angle and its variation throughout the extent of the soliton current field at the surface.

FIGURE 13.17 The internal wave orientation angle tangent maximum variation as a function of ocean wavelength, as predicted by the model. The primary response is in the range of 0.25–10.0 m and is in good agreement with previous studies of sigma-0. (From Thompson, D.R., J. Geophys. Res., 93, 12371, 1988.)

The values of both the RCS (⟨σ0⟩) and ⟨tan θ⟩ are computed by the model. The dependence of ⟨tan θ⟩ on the perturbed ocean wavelength was also calculated by the model; this wavelength dependence is shown in Figure 13.17. The waves resonantly perturb ⟨tan θ⟩ for wavelengths in the range of 0.25–10.0 m. This result is in good agreement with previous studies of sigma-0 resonant perturbations for the JUSREX'92 area [39]. Figure 13.18a and Figure 13.18b show the form of the soliton current speed dependence of ⟨σ0⟩ and ⟨tan θ⟩. The potentially useful near-linear relation of ⟨tan θ⟩ with the current U (Figure 13.18b) is important in applications where determination of current gradients is the goal. The near-linear nature of this relationship means that the current magnitude can be estimated from the value of ⟨tan θ⟩. Examination of the model results has led to the following empirical model of the variation of ⟨tan θ⟩:

⟨tan θ⟩ = f(U, w, ψw) = (aU) (w² e^(−bw)) sin(α|ψw| + β ψw²)    (13.25)

where U is the surface-current maximum speed (in m/s), w is the wind speed (in m/s) at the standard 19.5 m height, and ψw is the wind direction (in radians) relative to the soliton propagation direction. The constants are a = 0.00347, b = 0.365, α = 0.65714, and β = 0.10913. The range of ψw is [−π, π]. Using Equation 13.25, the dashed curves in Figure 13.18 can be generated; they show good agreement with the complete model. The solid lines in Figure 13.18 represent results from the complete model and the dashed lines results from the empirical relation of Equation 13.25. This relation is much simpler than conventional estimates based on perturbation of the backscatter intensity. The scaling of the relationship is a relatively simple function of the wind speed and the direction of the locally wind-driven sea. If orientation angle and wind measurements are available, then Equation 13.25 allows the internal wave current maximum U to be calculated.
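Equation 13.25 is simple enough to implement directly. The sketch below assumes the exponential factor carries a negative exponent, w² e^(−bw) (the sign is lost in the printed equation), and the function and variable names are ours, not from Ref. [38].

```python
import math

# Constants fitted for Equation 13.25 as quoted in the text.
A_COEF, B_COEF = 0.00347, 0.365      # a, b
ALPHA, BETA = 0.65714, 0.10913       # alpha, beta

def mean_tan_theta(current_u, wind_w, psi_w):
    """Empirical <tan(theta)> of Equation 13.25.

    current_u : soliton surface-current maximum U (m/s)
    wind_w    : wind speed w at 19.5 m height (m/s)
    psi_w     : wind direction relative to soliton propagation (rad, in [-pi, pi])
    """
    return (A_COEF * current_u) * (wind_w ** 2 * math.exp(-B_COEF * wind_w)) \
        * math.sin(ALPHA * abs(psi_w) + BETA * psi_w ** 2)

# The relation is exactly linear in the current maximum U ...
t1 = mean_tan_theta(0.3, 6.0, math.radians(135.0))
t2 = mean_tan_theta(0.6, 6.0, math.radians(135.0))
print(t2 / t1)   # ratio ~ 2 (linear in U)

# ... so an observed <tan(theta)> can be inverted for U given wind data:
def invert_for_current(tan_theta_obs, wind_w, psi_w):
    return tan_theta_obs / mean_tan_theta(1.0, wind_w, psi_w)
```

For the Figure 13.18b conditions (wind 6.0 m/s at 135°), this formula gives ⟨tan θ⟩ on the order of 0.007 at U = 0.6 m/s, consistent with the magnitude plotted there.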

13.4 Ocean Surface Feature Mapping Using Current-Driven Slick Patterns

13.4.1 Introduction

Biogenic and man-made slicks are widely dispersed throughout the oceans. Current-driven surface features, such as spiral eddies, can be made visible by associated patterns of slicks [40]. A combined algorithm using the Cloude–Pottier decomposition and the Wishart classifier

FIGURE 13.18 (a and b) Model development of the current speed dependence of the maximum RCS and ⟨tan(θ)⟩ variations (L-band, VV-pol, wind = (6.0 m/s, 135°)). The dashed line in Figure 13.18b gives the values predicted by an empirical equation in Ref. [38].

[41] is utilized to produce accurate maps of slick patterns and to suppress the background wave field. This technique uses the classified slick patterns to detect spiral eddies. Satellite SAR instruments performing wave spectral measurements, or operating as wind scatterometers, regard the slicks as a measurement error term. The classification maps produced by the algorithm facilitate the flagging of slick-contaminated pixels within the image. Aircraft L-band AIRSAR data (4/2003) taken in California coastal waters provided data on features that contained spiral eddies. The images also included biogenic slick patterns, internal wave packets, wind waves, and long wave swell. The temporal and spatial development of spiral eddies is of considerable importance to oceanographers. Slick patterns are used as ‘‘markers’’ to detect the presence and extent of spiral eddies generated in coastal waters. In a SAR image, the slicks appear as black distributed patterns of lower return. The slick patterns are most prevalent during periods of low to moderate winds. The spatial distribution of the slicks is determined by local surface current gradients that are associated with the spiral eddies. It has been determined that biogenic surfactant slicks may be identified and classified using SAR polarimetric decompositions. The purpose of the decomposition is to discriminate against other features such as background wave systems. The

parameters entropy (H), anisotropy (A), and average alpha (ᾱ) of the Cloude–Pottier decomposition [23] were used in the classification. The results indicate that biogenic slick patterns, classified by the algorithm, can be used to detect the spiral eddies. The decomposition parameters were also used to measure small-scale surface roughness as well as larger-scale rms slope distributions and wave spectra [4]. Examples of slope distributions are given in Figure 13.8b and of wave spectra in Figure 13.9. Small-scale roughness variations that were detected by anisotropy changes are given in Figure 13.19. This figure shows variations in anisotropy at low wind speeds for a filament of colder, trapped water along the northern California coast. The air–sea stability has changed for the region containing the filament. The roughness changes are not seen in (a) the conventional VV-pol image but are clearly visible in (b) an anisotropy image. The data are from coastal waters near the Mendocino Co. town of Gualala. Finally, the classification algorithm may also be used to create a flag for the presence of slicks. Polarimetric satellite SAR systems (e.g., RADARSAT-2, ALOS/PALSAR, SIR-C) attempting to measure wave spectra, or scatterometers measuring wind speed and direction, can then avoid using slick-contaminated data. In April 2003, the NRL and the NASA Jet Propulsion Laboratory (JPL) jointly carried out a series of AIRSAR flights over the Santa Monica Basin off the coast of California. Backscatter POLSAR image data at P-, L-, and C-bands were acquired. The purpose of the flights was to better understand the dynamical evolution of spiral eddies, which are


FIGURE 13.19 (See color insert following page 302.) (a) Variations in anisotropy at low wind speeds for a filament of colder, trapped water along the northern California coast. The roughness changes are not seen in the conventional VV-pol image, but are clearly visible in (b) an anisotropy image. The data are from coastal waters near the Mendocino Co. town of Gualala.


generated in this area by the interaction of currents with the Channel Islands. Sea-truth was gathered from a research vessel owned by the University of California at Los Angeles (UCLA). The flights yielded significant data not only on the time history of spiral eddies but also on surface waves, natural surfactants, and internal wave signatures. The data were analyzed using a polarimetric technique, the Cloude–Pottier H/A/ᾱ decomposition given in Ref. [23]. In Figure 13.20a, the anisotropy is again mapped for a study site

FIGURE 13.20 (See color insert following page 302.) (a) Image of anisotropy values; the quantity 1 − A is proportional to small-scale surface roughness. (b) A conventional L-band, VV-pol image of the study area. The anisotropy color scale runs from 0.0 to 1.0, and the incidence angle ranges from φ = 23° to φ = 62° across the image.


east of Catalina Island, CA. For comparison, a VV-pol image is given in Figure 13.20b. The slick field is reasonably well mapped by the anisotropy, but the image is noisy because the anisotropy is computed from the difference of the two small (second and third) eigenvalues.
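The noise sensitivity follows directly from how the parameters are formed. A minimal sketch of the Cloude–Pottier parameters from a single 3 × 3 coherency matrix (illustrative numbers, not AIRSAR data; the eigenvector convention follows Ref. [23]):

```python
import numpy as np

def cloude_pottier(T):
    """Entropy H, anisotropy A and mean alpha (deg) from a 3x3 coherency matrix."""
    lam, vec = np.linalg.eigh(T)               # eigh returns ascending eigenvalues
    lam = lam[::-1].clip(min=1e-12)            # lambda1 >= lambda2 >= lambda3
    vec = vec[:, ::-1]
    p = lam / lam.sum()                        # pseudo-probabilities
    H = -(p * np.log(p) / np.log(3)).sum()     # entropy, base-3 logarithm
    A = (lam[1] - lam[2]) / (lam[1] + lam[2])  # anisotropy: the two SMALL eigenvalues
    alphas = np.degrees(np.arccos(np.abs(vec[0, :])))   # per-mechanism alpha angles
    alpha_mean = (p * alphas).sum()
    return H, A, alpha_mean

# Illustrative near-Bragg coherency matrix (dominant surface scattering):
T = np.diag([1.0, 0.08, 0.05])
H, A, alpha_mean = cloude_pottier(T)
print(round(H, 3), round(A, 3), round(alpha_mean, 1))
```

Because A divides the difference of the two smallest eigenvalues by their sum, speckle fluctuations in λ2 and λ3 map almost directly into A, which is why the anisotropy image in Figure 13.20a is noisy even where H and ᾱ are stable.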

13.4.2 Classification Algorithm

The overall purpose of the field research effort outlined in Section 13.4.1 was to create a means of detecting ocean features such as spiral eddies using biogenic slicks as markers, while suppressing other effects such as wave fields and wind-gradient effects. A polarimetric classification algorithm [41–43] was tested as a candidate means to create such a feature map.

13.4.2.1 Unsupervised Classification of Ocean Surface Features

Van Zyl [44] and Freeman–Durden [45] developed unsupervised classification algorithms that separate the image into four classes: odd bounce, even bounce, diffuse (volume), and an indeterminate class. For an L-band image, the ocean surface is typically dominated by the characteristics of odd (single) bounce Bragg scattering. City buildings and structures have the characteristics of even (double) bounce scattering, and heavy forest vegetation has the characteristics of diffuse (volume) scattering. Consequently, this classification algorithm provides information on the terrain scatterer type. For a refined separation into more classes, Pottier [6] proposed an unsupervised classification algorithm based on the Cloude–Pottier target decomposition theory. The medium's scattering mechanisms, characterized by entropy H, average alpha angle ᾱ, and later anisotropy A, were used for classification. The entropy H is a measure of the randomness of the scattering mechanisms, and the alpha angle characterizes the scattering mechanism. The unsupervised classification is achieved by projecting the pixels of an image onto the H–ᾱ plane, which is segmented into scattering zones. The zones for the Gualala study-site data are shown in Figure 13.21. Details of this segmentation are given in Ref. [6]. In the alpha–entropy scattering zone map of the decomposition, backscatter returns from the ocean surface normally occur in the lowest (dark blue) zone of both alpha and entropy.
Returns from slick-covered areas have higher entropy H and average alpha ᾱ values, and occur in both the lowest zone and higher zones.

13.4.2.2 Classification Using Alpha–Entropy Values and the Wishart Classifier

Classification of the image was initiated by creating an alpha–entropy zone scatterplot to determine the ᾱ angle and the level of entropy H for scatterers in the slick study area. Secondly, the image was classified into eight distinct classes using the Wishart classifier [41]. The alpha–entropy decomposition method provides good image segmentation based on the scattering characteristics. The algorithm used is a combination of the unsupervised decomposition classifier and the supervised Wishart classifier [41]. One uses the segmented image of the decomposition method to form training sets as input for the Wishart classifier. It has been noted that multi-look data are required to obtain meaningful results in H and ᾱ, especially in the entropy H. In general, 4-look processed data are not sufficient. Normally, additional averaging (e.g., a 5 × 5 boxcar filter), either of the covariance or of the coherency matrices, has to be performed prior to the H and ᾱ computation. This prefiltering is done on all the data. The filtered coherency matrix is then used to compute H and ᾱ. Initial classification is made using the eight zones. This initial classification map is then used to train the Wishart classification. The reclassified result shows improvement in retaining details.
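The train-and-reclassify loop described above can be sketched as follows. This is a simplified per-pixel version with invented array shapes and toy data, not NRL's code; the distance measure is the standard Wishart class distance d(T, Σm) = ln|Σm| + Tr(Σm⁻¹T) of Ref. [41].

```python
import numpy as np

def wishart_distance(T, sigma):
    """Wishart class distance d(T, Sigma) = ln|Sigma| + Tr(Sigma^-1 T)."""
    sign, logdet = np.linalg.slogdet(sigma)
    return logdet + np.trace(np.linalg.solve(sigma, T)).real

def wishart_iterate(T_pixels, labels, n_iter=2):
    """Refine an initial H/alpha zone labelling with the Wishart classifier.

    T_pixels : (N, 3, 3) array of (prefiltered) coherency matrices
    labels   : (N,) initial class labels from the H/alpha zones
    """
    for _ in range(n_iter):
        classes = np.unique(labels)
        # Training step: class centers = mean coherency matrix per class.
        centers = {c: T_pixels[labels == c].mean(axis=0) for c in classes}
        # Reclassification step: assign each pixel to the nearest center.
        d = np.array([[wishart_distance(T, centers[c]) for c in classes]
                      for T in T_pixels])
        labels = classes[np.argmin(d, axis=1)]
    return labels

# Toy example: two clusters of diagonal coherency matrices, a few mislabeled.
rng = np.random.default_rng(0)
T_a = [np.diag([1.0, 0.1, 0.05]) * (1 + 0.05 * rng.standard_normal())
       for _ in range(20)]
T_b = [np.diag([0.2, 0.15, 0.1]) * (1 + 0.05 * rng.standard_normal())
       for _ in range(20)]
T_pixels = np.array(T_a + T_b)
labels0 = np.array([0] * 18 + [1] * 4 + [0] * 2 + [1] * 16)  # imperfect start
labels = wishart_iterate(T_pixels, labels0)
print(np.bincount(labels))
```

Two passes suffice here for the same reason they did for the slick fields: once the class centers are dominated by the correct members, the distance minimization pulls the mislabeled pixels back to their clusters.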

FIGURE 13.21 (See color insert following page 302.) Alpha–entropy scatter plot (alpha angle versus the entropy parameter) for the image study area. The plot is divided into eight color-coded scattering classes for the Cloude–Pottier decomposition described in Ref. [6].

Further improvement is possible by using several iterations. The reclassified image is then used to update the cluster centers of the coherency matrices. For the present data, two iterations of this process were sufficient to produce good classifications of the complete biogenic fields. Figure 13.22 presents a completed classification map of the biogenic slick fields. Information is provided by the eight color-coded classes in the image in Figure 13.22. The returns from within the largest slick (labeled A) have classes that progressively increase in both average alpha and entropy along a path from clean water inward toward the center of the slick. Therefore, the scattering becomes less surface-like (ᾱ increase) and also becomes more depolarized (H increase) as one approaches the center of the slick (Figure 13.22, label A). The algorithm outlined above may be applied to an image containing large-scale ocean features. An image (JPL/CM6744) of classified slick patterns for two linked spiral eddies near Catalina Island, CA, is given in Figure 13.23b. An L-band, HH-pol image is presented in Figure 13.23a for comparison. The Pacific swell is suppressed in areas where there are no slicks. The waves do, however, appear in areas where there are slicks because the currents associated with the orbital motion of the waves alternately compress or expand the slick-field density. Note the dark slick patch to the left of label A in Figure 13.23a and Figure 13.23b. This patch clearly has strongly suppressed backscatter at HH-pol. The corresponding area of Figure 13.23b has been classified into three classes and colors (Class 7, salmon; Class 5, yellow; Class 2, dark green), which indicate progressive increases in scattering complexity and depolarization as one moves from the perimeter of the slick toward its interior. A similar change in scattering occurs to the left of label B near the center of Figure 13.23a and Figure 13.23b. In this case, as one moves

FIGURE 13.22 (See color insert following page 302.) Classification of the slick-field image into H/ᾱ scattering classes. The legend lists Classes 1–5, 7, and 8; labeled features include the clean surface, slicks, and slicks trapped within an internal wave packet.

FIGURE 13.23 (See color insert following page 302.) (a) L-band, HH-pol image of a second study image (CM6744) containing two strong spiral eddies marked by natural biogenic slicks and (b) classification of the slicks marking the spiral eddies. The image features were classified into eight classes using the H–ᾱ values combined with the Wishart classifier.


from the perimeter into the slick toward the center, the classes and colors (Class 7, salmon; Class 4, light green; Class 1, white) also indicate progressive increases in scattering complexity and depolarization.

13.4.2.3 Comparative Mapping of Slicks Using Other Classification Algorithms

The question arises whether or not the algorithm using entropy–alpha values with the Wishart classifier is the best candidate for unsupervised detection and mapping of slick fields. Two candidate algorithms were suggested as possibly competitive classification methods. These were (1) the Freeman–Durden decomposition [45] and (2) the (H/A/ᾱ)–Wishart segmentation algorithm [42,43], which introduces anisotropy into the parameter mix because of its sensitivity to ocean surface roughness. Programs were developed to investigate the slick classification capabilities of these candidate algorithms. The same amount of averaging (5 × 5) and speckle reduction was done for all of the algorithms. The results with the Freeman–Durden classification were poor at both L- and C-bands. Nearly all of the returns were surface, single-bounce scatter. This is expected because the Freeman–Durden decomposition was developed on the basis of scattering models of land features. This method could not discriminate between waves and slicks and did not improve on the results using conventional VV or HH polarization. The (H/A/ᾱ)–Wishart segmentation algorithm was investigated to take advantage of the small-scale roughness sensitivity of the polarimetric anisotropy A. The anisotropy is shown (Figure 13.20a) to be very sensitive to slick patterns across the whole image. The (H/A/ᾱ)–Wishart segmentation method expands the number of classes from 8 to 16 by including the anisotropy A. The best way to introduce information about A into the classification procedure is to carry out two successive Wishart classifier algorithms. The first classification involves only H/ᾱ. Each class in the H/ᾱ plane is then further divided into two classes according to whether the pixel's anisotropy value is greater than 0.5 or less than 0.5. The Wishart classifier is then employed a second time. Details of this algorithm are given in Refs. [42,43]. The results of using the (H/A/ᾱ)–Wishart method and iterating it twice are given in Figure 13.24.

Classification of the slick-field image using the (H/A/ᾱ)–Wishart method resulted in 14 scattering classes; two of the expected 16 classes were suppressed. Classes 1–7 corresponded to anisotropy A values from 0.5 to 1.0 and Classes 8–14 corresponded to anisotropy A values from 0.0 to 0.49. The two new lighter blue vertical features at the lower right of the image appeared in all images involving anisotropy and were thought to be a smooth slick of lower surfactant material concentration. This algorithm was an improvement relative to the H/ᾱ–Wishart algorithm for slick mapping. All of the slick-covered areas were classified well and the unwanted wave-field intensity modulations were suppressed.
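The anisotropy split that doubles the class count between the two Wishart passes is simple bookkeeping. A sketch (hypothetical per-pixel label and anisotropy arrays; the Wishart re-classification step itself is omitted):

```python
import numpy as np

def split_classes_by_anisotropy(class_labels, anisotropy, n_classes=8):
    """Double the class count before the second Wishart pass: pixels with
    A < 0.5 keep their H/alpha class index; pixels with A >= 0.5 are moved
    into a parallel set of classes (index + n_classes)."""
    high_a = anisotropy >= 0.5
    return class_labels + n_classes * high_a.astype(int)

# Toy per-pixel inputs: initial H/alpha classes (0-7) and anisotropy values.
labels = np.array([0, 3, 7, 3, 5])
aniso = np.array([0.2, 0.8, 0.4, 0.55, 0.1])
new_labels = split_classes_by_anisotropy(labels, aniso)
print(new_labels)   # pixels with A >= 0.5 are shifted into classes 8-15
```

Classes that attract no pixels after the second Wishart pass simply disappear, which is how the 16 nominal classes reduced to the 14 populated classes of Figure 13.24.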

13.5 Conclusions

Methods that are capable of measuring ocean wave spectra and slope distributions in both the range and azimuth directions were described. The new measurements are sensitive and provide nearly direct measurements of ocean wave spectra and slopes without the need for a complex MTF. The orientation modulation spectrum has a higher ratio of dominant wave peak to background than the intensity-based spectrum. The results

FIGURE 13.24 (See color insert following page 302.) Classification of the slick-field image into 14 H/A/ᾱ scattering classes. Classes 1–7 correspond to anisotropy A values from 0.5 to 1.0 and Classes 8–14 correspond to anisotropy A values from 0.0 to 0.49. The two lighter blue vertical features at the lower right of the image appear in all images involving anisotropy and are thought to be smooth slicks of lower concentration.

determined for the dominant wave direction, wavelength, and wave height are comparable to the NDBC buoy measurements. The wave slope and wave spectra measurement methods that have been investigated may be developed further into fully operational algorithms. These algorithms may then be used by polarimetric SAR instruments, such as ALOS/PALSAR and RADARSAT II, to monitor sea-state conditions globally. Secondly, this work has investigated the effect of internal waves and current fronts on the SAR polarization orientation angle. The results provide a potential (1) independent means for identifying these ocean features and (2) a method of estimating the mean value of the surface current and slope changes associated with an internal wave. Simulations of the NRL wave–current interaction model [38] have been used to identify and quantify the different variables such as current speed, wind speed, and wind direction, which determine changes in the SAR polarization orientation angle. The polarimetric scattering properties of biogenic slicks have been found to be different from those of the clean surface wave field and the slicks may be separated from this background wave field. Damping of capillary waves, in the slick areas, lowers all of the eigenvalues of the decomposition and increases the average alpha angle, entropy, and the anisotropy. The Cloude–Pottier polarimetric decomposition was also used as a new means of studying scattering properties of surfactant slicks perturbed by current-driven surface features. The features, for example, spiral eddies, were marked by filament patterns of


slicks. These slick filaments were physically smoother. Backscatter from them was more complex (three eigenvalues nearly equal) and more depolarized. Anisotropy was found to be sensitive to small-scale ocean surface roughness, but was not a function of large-scale range or azimuth wave slopes. These unique properties provided an achievable separation of roughness scales on the ocean surface at low wind speeds. Changes in anisotropy due to surfactant slicks were found to be measurable across the entire radar swath. Finally, the polarimetric SAR decomposition parameters alpha, entropy, and anisotropy were used as an effective means of classifying biogenic slicks. Algorithms using these parameters were developed for the mapping of both slick fields and ocean surface features. Selective mapping of biogenic slick fields may be achieved using either the entropy and alpha parameters with the Wishart classifier, or the entropy, anisotropy, and alpha parameters with the Wishart classifier. The latter algorithm gives the best results overall. Slick maps made using this algorithm are of use for satellite scatterometers and wave spectrometers in efforts aimed at flagging ocean surface areas that are contaminated by slick fields.

References

1. Schuler, D.L., Jansen, R.W., Lee, J.S., and Kasilingam, D., Polarisation orientation angle measurements of ocean internal waves and current fronts using polarimetric SAR, IEE Proc. Radar, Sonar Navigation, 150(3), 135–143, 2003.
2. Alpers, W., Ross, D.B., and Rufenach, C.L., The detectability of ocean surface waves by real and synthetic aperture radar, J. Geophys. Res., 86(C7), 6481, 1981.
3. Engen, G. and Johnsen, H., SAR-ocean wave inversion using image cross-spectra, IEEE Trans. Geosci. Rem. Sens., 33, 1047, 1995.
4. Schuler, D.L., Kasilingam, D., Lee, J.S., and Pottier, E., Studies of ocean wave spectra and surface features using polarimetric SAR, Proc. Int. Geosci. Rem. Sens. Symp. (IGARSS'03), Toulouse, France, IEEE, 2003.
5. Schuler, D.L., Lee, J.S., and De Grandi, G., Measurement of topography using polarimetric SAR images, IEEE Trans. Geosci. Rem. Sens., 34, 1266, 1996.
6. Pottier, E., Unsupervised classification scheme and topography derivation of POLSAR data on the <


14. Hasselmann, K., Raney, R.K., Plant, W.J., Alpers, W., Shuchman, R.A., Lyzenga, D.R., Rufenach, C.L., and Tucker, M.J., Theory of synthetic aperture radar ocean imaging: a MARSEN view, J. Geophys. Res., 90, 4659, 1985.
15. Lyzenga, D.R., An analytic representation of the synthetic aperture radar image spectrum for ocean waves, J. Geophys. Res., 93, 13859, 1988.
16. Kasilingam, D. and Shi, J., Artificial neural network-based inversion technique for extracting ocean surface wave spectra from SAR images, Proc. IGARSS'97, Singapore, IEEE, 1193–1195, 1997.
17. Hasselmann, S., Bruning, C., Hasselmann, K., and Heimbach, P., An improved algorithm for the retrieval of ocean wave spectra from synthetic aperture radar image spectra, J. Geophys. Res., 101, 16615, 1996.
18. Lehner, S., Schulz-Stellenfleth, Schattler, B., Breit, H., and Horstmann, J., Wind and wave measurements using complex ERS-2 SAR wave mode data, IEEE Trans. Geosci. Rem. Sens., 38(5), 2246, 2000.
19. Dowd, M., Vachon, P.W., and Dobson, F.W., Ocean wave extraction from RADARSAT synthetic aperture radar inter-look image cross-spectra, IEEE Trans. Geosci. Rem. Sens., 39, 21–37, 2001.
20. Lee, J.S., Jansen, R., Schuler, D., Ainsworth, T., Marmorino, G., and Chubb, S., Polarimetric analysis and modeling of multi-frequency SAR signatures from Gulf Stream fronts, IEEE J. Oceanic Eng., 23, 322, 1998.
21. Lee, J.S., Schuler, D.L., and Ainsworth, T.L., Polarimetric SAR data compensation for terrain azimuth slope variation, IEEE Trans. Geosci. Rem. Sens., 38, 2153–2163, 2000.
22. Lee, J.S., Schuler, D.L., Ainsworth, T.L., Krogager, E., Kasilingam, D., and Boerner, W.M., The estimation of radar polarization shifts induced by terrain slopes, IEEE Trans. Geosci. Rem. Sens., 40, 30–41, 2001.
23. Cloude, S.R. and Pottier, E., A review of target decomposition theorems in radar polarimetry, IEEE Trans. Geosci. Rem. Sens., 34(2), 498, 1996.
24. Lee, J.S., Grunes, M.R., and De Grandi, G., Polarimetric SAR speckle filtering and its implication for classification, IEEE Trans. Geosci. Rem. Sens., 37, 2363, 1999.
25. Gasparovic, R.F., Apel, J.R., and Kasischke, E., An overview of the SAR internal wave signature experiment, J. Geophys. Res., 93, 12304, 1988.
26. Gasparovic, R.F., Chapman, R., Monaldo, F.M., Porter, D.L., and Sterner, R.F., Joint U.S./Russia internal wave remote sensing experiment: interim results, Applied Physics Laboratory Report S1R-93U-011, Johns Hopkins University, 1993.
27. Schuler, D.L., Kasilingam, D., and Lee, J.S., Slope measurements of ocean internal waves and current fronts using polarimetric SAR, European Conference on Synthetic Aperture Radar (EUSAR'2002), Cologne, Germany, 2002.
28. Schuler, D.L., Kasilingam, D., Lee, J.S., Jansen, R.W., and De Grandi, G., Polarimetric SAR measurements of slope distribution and coherence changes due to internal waves and current fronts, Proc. Int. Geosci. Rem. Sens. Symp. (IGARSS'2002), Toronto, Canada, 2002.
29. Schuler, D.L., Lee, J.S., Kasilingam, D., and De Grandi, G., Studies of ocean current fronts and internal waves using polarimetric SAR coherences, Proc. Prog. Electromagnetic Res. Symp. (PIERS'2002), Cambridge, MA, 2002.
30. Alpers, W., Theory of radar imaging of internal waves, Nature, 314, 245, 1985.
31. Brandt, P., Alpers, W., and Backhaus, J.O., Study of the generation and propagation of internal waves in the Strait of Gibraltar using a numerical model and synthetic aperture radar images of the European ERS-1 satellite, J. Geophys. Res., 101, 14237, 1996.
32. Lyzenga, D.R. and Bennett, J.R., Full-spectrum modeling of synthetic aperture radar internal wave signatures, J. Geophys. Res., 93(C10), 12345, 1988.
33. Schuler, D.L., Lee, J.S., Kasilingam, D., and Nesti, G., Surface roughness and slope measurements using polarimetric SAR data, IEEE Trans. Geosci. Rem. Sens., 40(3), 687, 2002.
34. Schuler, D.L., Ainsworth, T.L., Lee, J.S., and De Grandi, G., Topographic mapping using polarimetric SAR data, Int. J. Rem. Sens., 35(5), 1266, 1998.
35. Schuler, D.L., Lee, J.S., Ainsworth, T.L., and Grunes, M.R., Terrain topography measurement using multi-pass polarimetric synthetic aperture radar data, Radio Sci., 35(3), 813, 2002.
36. Schuler, D.L. and Lee, J.S., A microwave technique to improve the measurement of directional ocean wave spectra, Int. J. Rem. Sens., 16, 199, 1995.


Signal and Image Processing for Remote Sensing

37. Alpers, W. and Hennings, I., A theory of the imaging mechanism of underwater bottom topography by real and synthetic aperture radar, J. Geophys. Res., 89, 10529, 1984.
38. Jansen, R.W., Chubb, S.R., Fusina, R.A., and Valenzuela, G.R., Modeling of current features in Gulf Stream SAR imagery, Naval Research Laboratory Report NRL/MR/7234-93-7401, 1993.
39. Thompson, D.R., Calculation of radar backscatter modulations from internal waves, J. Geophys. Res., 93(C10), 12371, 1988.
40. Schuler, D.L., Lee, J.S., and De Grandi, G., Spiral eddy detection using surfactant slick patterns and polarimetric SAR image decomposition techniques, Proc. Int. Geosci. Rem. Sens. Symp. (IGARSS), Anchorage, Alaska, September 2004.
41. Lee, J.S., Grunes, M.R., Ainsworth, T.L., Du, L.J., Schuler, D.L., and Cloude, S.R., Unsupervised classification using polarimetric decomposition and the complex Wishart classifier, IEEE Trans. Geosci. Rem. Sens., 37(5), 2249, 1999.
42. Pottier, E. and Lee, J.S., Unsupervised classification scheme of POLSAR images based on the complex Wishart distribution and the polarimetric decomposition theorem, Proc. 3rd Eur. Conf. Synth. Aperture Radar (EUSAR'2000), Munich, Germany, 2000.
43. Ferro-Famil, L., Pottier, E., and Lee, J.-S., Unsupervised classification of multifrequency and fully polarimetric SAR images based on the H/A/Alpha-Wishart classifier, IEEE Trans. Geosci. Rem. Sens., 39(11), 2332, 2001.
44. Van Zyl, J.J., Unsupervised classification of scattering mechanisms using radar polarimetry data, IEEE Trans. Geosci. Rem. Sens., 27, 36, 1989.
45. Freeman, A. and Durden, S.L., A three-component scattering model for polarimetric SAR data, IEEE Trans. Geosci. Rem. Sens., 36, 963, 1998.

14 MRF-Based Remote-Sensing Image Classification with Automatic Model Parameter Estimation

Sebastiano B. Serpico and Gabriele Moser

CONTENTS
14.1 Introduction ........................................ 305
14.2 Previous Work on MRF Parameter Estimation ........... 306
14.3 Supervised MRF-Based Classification ................. 308
  14.3.1 MRF Models for Image Classification ............. 308
  14.3.2 Energy Functions ................................ 309
  14.3.3 Operational Setting of the Proposed Method ...... 310
  14.3.4 The Proposed MRF Parameter Optimization Method .. 311
14.4 Experimental Results ................................ 314
  14.4.1 Experiment I: Spatial MRF Model for Single-Date Image Classification ........ 317
  14.4.2 Experiment II: Spatio-Temporal MRF Model for Two-Date Multi-Temporal Image Classification ........ 317
  14.4.3 Experiment III: Spatio-Temporal MRF Model for Multi-Temporal Classification of Image Sequences ........ 321
14.5 Conclusions ......................................... 322
Acknowledgments .......................................... 324
References ............................................... 324

14.1 Introduction

Within remote-sensing image analysis, Markov random field (MRF) models represent a powerful tool [1], due to their ability to integrate contextual information associated with the image data in the analysis process [2,3,4]. In particular, the use of a global model for the statistical dependence of all the image pixels in a given image-analysis scheme typically turns out to be an intractable task. The MRF approach offers a solution to this issue, as it allows expressing a global model of the contextual information by using only local relations among neighboring pixels [2]. Specifically, due to the Hammersley–Clifford theorem [3], a large class of global contextual models (i.e., the Gibbs random fields, GRFs [2]) can be proved to be equivalent to local MRFs, thus sharply reducing the related model complexity. In particular, MRFs have been used for remote-sensing image analysis, for single-date [5], multi-temporal [6–8], multi-source [9], and multi-resolution [10]


classification, for denoising [1], segmentation [11–14], anomaly detection [15], texture extraction [2,13,16], and change detection [17–19]. Focusing on the specific problem of image classification, the MRF approach allows one to express a "maximum a posteriori" (MAP) decision task as the minimization of a suitable energy function. Several techniques have been proposed to deal with this minimization problem, such as simulated annealing (SA), an iterative stochastic optimization algorithm converging to a global minimum of the energy [3] but typically involving long execution times [2,5]; iterated conditional modes (ICM), an iterative deterministic algorithm converging to a local (but usually good) minimum point [20] and requiring much shorter computation times than SA [2]; and the maximization of posterior marginals (MPM), which approximates the MAP rule by maximizing the marginal posterior distribution of the class label of each pixel instead of the joint posterior distribution of all image labels [2,11,21].

However, an MRF model usually involves the use of one or more internal parameters, thus requiring a preliminary parameter-setting stage before the application of the model itself. In particular, especially in the context of supervised classification, interactive "trial-and-error" procedures are typically employed to choose suitable values for the model parameters [2,5,9,11,17,19], while the problem of fast automatic parameter setting for MRF classifiers is still an open issue in the MRF literature. The lack of effective automatic parameter-setting techniques has represented a significant limitation on the operational use of MRF-based supervised classification architectures, although such methodologies are known for their ability to generate accurate classification maps [9].
On the other hand, the availability of the above-mentioned unsupervised parameter-setting procedures has contributed to an extensive use of MRFs for segmentation and unsupervised classification purposes [22,23]. In the present chapter, an automatic parameter optimization algorithm is proposed to overcome the above limitation in the context of supervised image classification. The method refers to a broad category of MRF models, characterized by energy functions expressed as linear combinations of different energy contributions (e.g., representing different typologies of contextual information) [2,7]. The algorithm exploits this linear dependence to formalize the parameter-setting problem as the solution of a set of linear inequalities, and addresses this problem by extending to the present context the Ho–Kashyap method for linear classifier training [24,25]. The well-known convergence properties of such a method and the absence of parameters to be tuned are among the good features of the proposed technique. The chapter is organized as follows. Section 14.2 provides an overview of the previous work on the problem of MRF parameter estimation. Section 14.3 describes the methodological issues of the method, and Section 14.4 presents the results of the application of the technique to the classification of real (both single-date and multi-temporal) remote sensing images. Finally, conclusions are drawn in Section 14.5.
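The contrast between the stochastic and deterministic minimization strategies mentioned above is easiest to see on a toy example. The sketch below is an illustrative Potts-only smoother, not the chapter's full method: the function name `icm_smooth`, the 4-neighborhood, and all parameter values are assumptions made for the example. It performs ICM-style deterministic updates, relabeling each pixel to the class that minimizes a local disagreement energy:

```python
import numpy as np

def icm_smooth(labels, n_classes, n_iter=5):
    """ICM-style sweeps over a toy Potts-only energy: each pixel takes the
    label that disagrees with the fewest of its 4-neighbours.
    (Illustrative only; the chapter's model also has a spectral term.)"""
    h, w = labels.shape
    lab = labels.copy()
    for _ in range(n_iter):
        for r in range(h):
            for c in range(w):
                neigh = [lab[rr, cc]
                         for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                         if 0 <= rr < h and 0 <= cc < w]
                # local energy of each candidate label = number of disagreements
                energies = [sum(1 for v in neigh if v != k) for k in range(n_classes)]
                lab[r, c] = int(np.argmin(energies))
    return lab

noisy = np.zeros((5, 5), dtype=int)
noisy[2, 2] = 1                          # an isolated, presumably mislabeled pixel
print(icm_smooth(noisy, n_classes=2))    # the isolated label is smoothed away
```

Each sweep is a greedy, deterministic descent, which is why ICM reaches a (local) energy minimum in few iterations, in contrast with the stochastic exploration performed by SA.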

14.2 Previous Work on MRF Parameter Estimation

Several parameter-setting algorithms have been proposed in the context of MRF models for image segmentation (e.g., [12,22,26,27]) or unsupervised classification [23,28,29], although often resulting in a considerable computational burden. In particular, the usual maximum likelihood (ML) parameter estimation approach [30] exhibits good theoretical consistency properties when applied to GRFs and MRFs [31], but turns out to be computationally very expensive for most MRF models [22] or even intractable [20,32,33], due to the difficulty of analytical computation and numerical maximization of the


normalization constants involved by the MRF-based distributions (the so-called ‘‘partition functions’’) [20,23,33]. Operatively, the use of ML estimates for MRF parameters turns out to be restricted just to specific typologies of MRFs, such as continuous Gaussian [1,13,15,34,35] or generalized Gaussian [36] fields, which allows an ML estimation task to be formulated in an analytically feasible way. Beyond such case-specific techniques, essentially three approaches have been suggested in the literature to address the problem of unsupervised ML estimation of MRF parameters indirectly, namely the Monte Carlo methods, the stochastic gradient approaches, or the pseudo-likelihood approximations [23]. The combination of the ML criterion with Monte Carlo simulations has been proposed to overcome the difficulty of computation of the partition function [22,37] and it provides good estimation results, but it usually involves long execution times. Stochastic-gradient approaches aim at maximizing the log-likelihood function by integrating a Gibbs stochastic sampling strategy into the gradient ascent method [23,38]. Combinations of the stochastic-gradient approach with the iterated conditional expectation (ICE) and with an estimation–restoration scheme have also been proposed in Refs. [12,29], respectively. Approximate pseudo-likelihood functionals have been introduced [20,23], and they are numerically feasible, although the resulting estimates are not actual ML estimates (except in the trivial noncontextual case of pixel independence) [20]. Moreover the pseudo-likelihood approximation may underestimate the interactions between pixels and can provide unsatisfactory results unless the interactions are suitably weak [21,23]. 
Pseudo-likelihood approaches have also been developed in conjunction with mean-field approximations [26], with the expectation–maximization (EM) [21,39], the Metropolis–Hastings [40], and the ICE [21] algorithms, with Monte Carlo simulations [41], or with multi-resolution analysis [42]. In Ref. [20] a pseudo-likelihood approach is plugged into the ICM energy minimization strategy, by iteratively alternating the update of the contextual clustering map and the update of the parameter values. In Ref. [10] this method is integrated with EM in the context of multi-resolution MRFs. A related technique is the "coding method," which is based on a pseudo-likelihood functional computed over a subset of pixels, although these subsets depend on the choice of suitable coding strategies [34]. Several empirical or ad hoc estimators have also been developed. In Ref. [33] a family of MRF models with polynomial energy functions is proved to be dense in the space of all MRFs and is endowed with a case-specific estimation scheme based on the method of moments. In Ref. [43] a least-squares approach is proposed for Ising-type and Potts-type MRF models [22], and it formulates an overdetermined system of linear equations relating the unknown parameters with a set of relative frequencies of pixel-label configurations. The combination of this method with EM is applied in Ref. [44] for sonar image segmentation purposes. A simpler but conceptually similar approach is adopted in Ref. [12] and is combined with ICE: a simple algebraic empirical estimator for a one-parameter spatial MRF model is developed, and it directly relates the parameter value with the relative frequencies of several class-label configurations in the image. On the other hand, the literature about MRF parameter setting for supervised image classification is very limited.
Any unsupervised MRF parameter estimator can also naturally be applied to a supervised problem, by simply neglecting the training data in the estimation process. However, only a few techniques have been proposed so far, effectively exploiting the available training information for MRF parameter-optimization purposes as well. In particular, a heuristic algorithm is developed in Ref. [7] aiming to optimize automatically the parameters of a multi-temporal MRF model for a joint supervised classification of two-date imagery, and a genetic approach is combined in Ref. [45] with simulated annealing [3] for the estimation of the parameters of a spatial MRF model for multi-source classification.
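To make the pseudo-likelihood approximation reviewed above concrete, the sketch below scores an isotropic Potts smoothness parameter beta by the product of local conditionals P(s_k | neighbours), which sidesteps the intractable partition function entirely. The function name, the toy label map, and the grid search are assumptions for illustration, not any of the specific estimators cited above:

```python
import numpy as np

def log_pseudo_likelihood(labels, beta, n_classes):
    """Log pseudo-likelihood of an isotropic Potts model: the sum over pixels
    of log P(s_k | 4-neighbours). Only local normalizations are needed, so the
    global partition function never has to be computed (the approximation
    discussed in the text)."""
    h, w = labels.shape
    lpl = 0.0
    for r in range(h):
        for c in range(w):
            neigh = [labels[rr, cc]
                     for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                     if 0 <= rr < h and 0 <= cc < w]
            # local energy of class k: -beta * (number of agreeing neighbours)
            u = np.array([-beta * sum(1 for v in neigh if v == k)
                          for k in range(n_classes)])
            p = np.exp(-u)
            p /= p.sum()                      # local normalization only
            lpl += np.log(p[labels[r, c]])
    return lpl

# a spatially smooth two-class map should favour beta > 0
labels = np.zeros((6, 6), dtype=int)
labels[:, 3:] = 1
betas = np.linspace(0.0, 2.0, 21)
best = max(betas, key=lambda b: log_pseudo_likelihood(labels, b, 2))
print(best)
```

As the text cautions, the maximizer of this functional is not a true ML estimate, and on very smooth maps it tends to push the interaction parameter toward the upper end of whatever range is scanned.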


14.3 Supervised MRF-Based Classification

14.3.1 MRF Models for Image Classification

Let I = {x1, x2, ..., xN} be a given n-band remote-sensing image, modeled as a set of N identically distributed n-variate random vectors. We assume M thematic classes ω1, ω2, ..., ωM to be present in the image, and we denote the resulting set of classes by Ω = {ω1, ω2, ..., ωM} and the class label of the k-th image pixel (k = 1, 2, ..., N) by sk ∈ Ω. Operating in the context of supervised image classification, we assume a training set to be available, and we denote the index set of the training pixels by T ⊂ {1, 2, ..., N} and the corresponding true class label of the k-th training pixel (k ∈ T) by sk*. Collecting all the feature vectors of the N image pixels in a single (Nn)-dimensional column vector¹ X = col[x1, x2, ..., xN] and all the pixel labels in a discrete random vector S = (s1, s2, ..., sN) ∈ Ω^N, the MAP decision rule (i.e., the Bayes rule for minimum classification error [46]) assigns to the image data X the label vector S̃ that maximizes the joint posterior probability P(S|X), that is,

$$\tilde{S} = \arg\max_{S \in \Omega^N} P(S|X) = \arg\max_{S \in \Omega^N} \left[ p(X|S)\,P(S) \right] \qquad (14.1)$$

where p(X|S) and P(S) are, respectively, the joint probability density function (PDF) of the global feature vector X conditioned to the label vector S and the joint probability mass function (PMF) of the label vector itself. The MRF approach offers a computationally tractable solution to this maximization problem by passing from a global model for the statistical dependence of the class labels to a model of the local image properties, defined according to a given neighborhood system [2,3]. Specifically, for each k-th image pixel, a neighborhood Nk ⊂ {1, 2, ..., N} is assumed to be defined such that, for instance, Nk includes the four (first-order neighborhood) or the eight (second-order neighborhood) pixels surrounding the k-th pixel (k = 1, 2, ..., N). More formally, a neighborhood system is a collection {Nk} (k = 1, 2, ..., N) of subsets of pixels such that each pixel is outside its own neighborhood (i.e., k ∉ Nk for all k = 1, 2, ..., N) and neighboring pixels are always mutually neighbors (i.e., k ∈ Nh if and only if h ∈ Nk for all k, h = 1, 2, ..., N, k ≠ h). This simple discrete topological structure attached to the image data is exploited in the MRF framework to model the statistical relationships between the class labels of spatially distinct pixels and to provide a computationally affordable solution to the global MAP classification problem of Equation 14.1. Specifically, we assume the feature vectors x1, x2, ..., xN to be conditionally independent and identically distributed with PDF p(x|s) (x ∈ R^n, s ∈ Ω), that is [2],

$$p(X|S) = \prod_{k=1}^{N} p(x_k|s_k) \qquad (14.2)$$

and the joint prior PMF P(S) to be a Markov random field with respect to the above-mentioned neighborhood system, that is [2,3]:

- the probability distribution of each k-th image label, conditioned to all the other image labels, is equivalent to the distribution of the k-th label conditioned only to the labels of the neighboring pixels (k = 1, 2, ..., N):

$$P\{s_k = \omega_i \mid s_h : h \neq k\} = P\{s_k = \omega_i \mid s_h : h \in N_k\}, \qquad i = 1, 2, \ldots, M \qquad (14.3)$$

- the PMF of S is a strictly positive function on Ω^N, that is, P(S) > 0 for all S ∈ Ω^N.

The Markov assumption expressed by Equation 14.3 allows restricting the statistical relationships among the image labels to the local relationships inside the predefined neighborhood, thus greatly simplifying the spatial-contextual model for the label distribution as compared with a generic global model for the joint PMF of all the image labels. However, as stated in the next subsection, thanks to the Hammersley–Clifford theorem, a large class of contextual models can be accommodated under the MRF formalism despite this strong analytical simplification.

¹ All the vectors in the chapter are implicitly assumed to be column vectors. We denote by ui the i-th component of an m-dimensional vector u ∈ R^m (i = 1, 2, ..., m), by "col" the operator of column-vector juxtaposition (i.e., col[u, v] is the vector obtained by stacking the two vectors u ∈ R^m and v ∈ R^n in a single (m + n)-dimensional column vector), and by the superscript "T" the matrix transpose operator.
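On a lattice small enough for brute force, Equation 14.1 together with the conditional-independence assumption of Equation 14.2 can be checked directly: the negative log posterior decomposes into pixelwise data terms plus prior interaction terms, and the MAP labeling can be found by enumerating Ω^N. The sketch below assumes Gaussian class-conditional likelihoods and a pairwise Potts-style prior on a tiny 1-D "image"; all numeric values are illustrative:

```python
import itertools
import numpy as np

x = np.array([0.1, 0.2, 0.9, 0.15])   # one feature per pixel (toy 1-D image)
means = {0: 0.0, 1: 1.0}              # assumed class-conditional Gaussian means
sigma, beta = 0.3, 0.5                # assumed likelihood std and prior weight

def posterior_energy(S):
    """Negative log of p(X|S)P(S), up to additive constants."""
    # -log p(X|S): sum of pixelwise squared residuals (Eq. 14.2 factorization)
    data = sum((x[k] - means[s]) ** 2 / (2 * sigma ** 2) for k, s in enumerate(S))
    # prior: Potts penalty on adjacent-pixel cliques
    prior = sum(beta * (S[k] != S[k + 1]) for k in range(len(S) - 1))
    return data + prior

# brute-force MAP over Omega^N (Eq. 14.1); tractable only on toy problems
S_map = min(itertools.product((0, 1), repeat=len(x)), key=posterior_energy)
print(S_map)   # -> (0, 0, 1, 0): pixel 3 keeps label 1 despite the prior
```

The exponential cost of this enumeration (2^N configurations here) is precisely the intractability that motivates the local, neighborhood-based MRF machinery developed in the following subsections.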

14.3.2 Energy Functions

Given the neighborhood system {Nk}, we denote by "clique" a set Q of pixels (Q ⊂ {1, 2, ..., N}) such that, for each pair (k, h) of pixels in Q, k and h turn out to be mutually neighbors, that is,

$$k \in N_h \iff h \in N_k \qquad \forall k, h \in Q, \; k \neq h \qquad (14.4)$$

Denoting the collection of all the cliques in the adopted neighborhood system by $\mathcal{Q}$ and the vector of the pixel labels in the clique Q ∈ $\mathcal{Q}$ by SQ (i.e., SQ = col[sk : k ∈ Q]), the Hammersley–Clifford theorem states that the label configuration S is an MRF if and only if, for any clique Q ∈ $\mathcal{Q}$, there exists a real-valued function VQ(SQ) (usually named "potential function") of the pixel labels in Q, so that the global PMF of S is given by the following Gibbs distribution [3]:

$$P(S) = \frac{1}{Z_{\mathrm{prior}}} \exp\left[-\frac{U_{\mathrm{prior}}(S)}{\theta}\right], \quad \text{where } U_{\mathrm{prior}}(S) = \sum_{Q \in \mathcal{Q}} V_Q(S_Q) \qquad (14.5)$$

θ is a positive parameter, and Z_prior is a normalizing constant. Because of the formal similarity between the probability distribution in Equation 14.5 and the well-known Maxwell–Boltzmann distribution introduced in statistical mechanics for canonical ensembles [47], U_prior(·), θ, and Z_prior are usually named "energy function," "temperature," and "partition function," respectively. A very large class of statistical interactions among spatially distinct pixels can be modeled in this framework by simply choosing a suitable function U_prior(·), which makes the MRF approach highly flexible. In addition, when coupling the Hammersley–Clifford formulation of the prior PMF P(S) with the conditional independence assumption stated for the conditional PDF p(X|S) (see Equation 14.2), an energy representation also holds for the global posterior distribution P(S|X), that is [2],

$$P(S|X) = \frac{1}{Z_{\mathrm{post},X}} \exp\left[-\frac{U_{\mathrm{post}}(S|X)}{\theta}\right] \qquad (14.6)$$

where Z_post,X is a normalizing constant and U_post(·) is a posterior energy function, given by:

$$U_{\mathrm{post}}(S|X) = -\theta \sum_{k=1}^{N} \ln p(x_k|s_k) + \sum_{Q \in \mathcal{Q}} V_Q(S_Q) \qquad (14.7)$$

This formulation of the global posterior probability allows addressing the MAP classification task as the minimization of the energy function U_post(·|X), which is locally defined according to the pixelwise PDF of the feature vectors conditioned to the class labels and to the collection of the potential functions. This makes the maximization problem of Equation 14.1 tractable and allows a contextual classification map of the image I to be feasibly generated. However, the (prior and posterior) energy functions are generally parameterized by several real parameters λ1, λ2, ..., λL. Therefore, the solution of the minimum-energy problem requires the preliminary selection of a proper value for the parameter vector λ = (λ1, λ2, ..., λL)^T. As described in the next subsections, in the present chapter we address this parameter-setting issue with regard to a broad family of MRF models employed in the remote-sensing literature, and to the ICM minimization strategy for the energy function.
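The Gibbs form of Equation 14.5 can be verified numerically on a lattice small enough to enumerate: with pairwise Potts potentials, Z_prior is computable by brute force, P(S) sums to one, and low-energy (smooth) configurations receive higher probability than high-energy ones. A sketch under these assumptions (the 2x2 lattice, the clique list, and the values of beta and theta are all illustrative):

```python
import itertools
import numpy as np

# Pairwise cliques of a 2x2 lattice with a first-order neighborhood:
# V_Q(S_Q) = beta * [labels differ] on horizontal and vertical pairs.
beta, theta = 1.0, 1.0
cliques = [((0, 0), (0, 1)), ((1, 0), (1, 1)),   # horizontal pairs
           ((0, 0), (1, 0)), ((0, 1), (1, 1))]   # vertical pairs

def U_prior(S):
    """Prior energy: sum of clique potentials (Eq. 14.5, right-hand side)."""
    return sum(beta * (S[a] != S[b]) for a, b in cliques)

sites = [(0, 0), (0, 1), (1, 0), (1, 1)]
configs = [dict(zip(sites, v)) for v in itertools.product((0, 1), repeat=4)]
# brute-force partition function Z_prior (tractable only on toy lattices)
Z = sum(np.exp(-U_prior(S) / theta) for S in configs)
P = {tuple(S.values()): np.exp(-U_prior(S) / theta) / Z for S in configs}

print(sum(P.values()))                      # sums to 1 (up to rounding)
print(P[(0, 0, 0, 0)] > P[(0, 1, 1, 0)])    # smooth beats checkerboard: True
```

The 2^4 = 16-term sum defining Z here is exactly the "partition function" that becomes intractable on real images, which is why the parameter-estimation literature surveyed in Section 14.2 works so hard to avoid it.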

14.3.3 Operational Setting of the Proposed Method

Among the techniques proposed in the literature to deal with the task of minimizing the posterior energy function (see Section 14.1), in the present chapter ICM is adopted as a trade-off between the effectiveness of the minimization process and the computation time [5,7], and it is endowed with an automatic parameter-setting stage. Specifically, ICM is initialized with a given label vector S⁰ = (s1⁰, s2⁰, ..., sN⁰)^T (e.g., generated by a previously applied noncontextual supervised classifier), and it iteratively modifies the class labels to decrease the energy function. In particular, denoting by Ck the set of the neighbor labels of the k-th pixel (i.e., the "context" of the k-th pixel, Ck = col[sh : h ∈ Nk]), at the t-th ICM iteration the label sk is updated according to the feature vector xk and to the current neighboring labels Ck^t, so that sk^{t+1} is the class label ωi that minimizes a local energy function U_λ(ωi|xk, Ck^t) (the subscript λ is introduced to stress the dependence of the energy on the parameter vector λ). More specifically, given the neighborhood system, an energy representation can also be proved for the local distribution of the class labels, that is (i = 1, 2, ..., M; k = 1, 2, ..., N) [9]:

$$P\{s_k = \omega_i \mid x_k, C_k\} = \frac{1}{Z_{k\lambda}} \exp\left[-\frac{U_\lambda(\omega_i|x_k, C_k)}{\theta}\right] \qquad (14.8)$$

where Z_kλ is a further normalizing constant and the local energy is defined by

$$U_\lambda(\omega_i|x_k, C_k) = -\theta \ln p(x_k|\omega_i) + \sum_{Q \ni k} V_Q(S_Q) \qquad (14.9)$$

Here we focus on the family of MRF models whose local energy functions U_λ(·) can be expressed as weighted sums of distinct energy contributions, that is,

$$U_\lambda(\omega_i|x_k, C_k) = \sum_{\ell=1}^{L} \lambda_\ell \, E_\ell(\omega_i|x_k, C_k), \qquad k = 1, 2, \ldots, N \qquad (14.10)$$

where E_ℓ(·) is the ℓ-th contribution and the parameter λ_ℓ plays the role of the weight of E_ℓ(·) (ℓ = 1, 2, ..., L). A formal comparison between Equation 14.9 and Equation 14.10 suggests that one of the considered L contributions (say, E_1(·)) should be related to the pixelwise conditional PDF of the feature vector (i.e., E_1(ωi|xk) ∝ −ln p(xk|ωi)), which formalizes the spectral information associated with each single pixel. From this viewpoint, postulating the presence of (L − 1) further contributions implicitly means that (L − 1) typologies of contextual information are modeled by the adopted MRF and are supposed to be separable (i.e., combined in a purely additive way) [7]. Formally speaking, Equation 14.10 represents a constraint on the energy function, but most MRF models employed in remote sensing for classification, change detection, or segmentation purposes belong to this category. For instance, the pairwise interaction model described in Ref. [2], the well-known spatial Potts MRF model employed in Ref. [17] for change-detection purposes and in Ref. [5] for hyperspectral data classification, the spatio-temporal MRF model introduced in Ref. [7] for multi-date image classification, and the multi-source classification models defined in Ref. [9] all belong to this family of MRFs.
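A minimal instance of the additive local energy of Equation 14.10, with L = 2 contributions, can be sketched as follows: a Gaussian spectral term standing in for −ln p(x_k|ω_i) and a Potts-style spatial term counting disagreeing neighbors, combined with weights (λ1, λ2) and minimized by a single ICM sweep. The image, class means, noise model, and weight values below are assumptions for illustration:

```python
import numpy as np

means = np.array([0.0, 1.0])   # assumed class-conditional means
sigma = 0.25                   # assumed likelihood standard deviation
truth = np.zeros((8, 8), dtype=int)
truth[:, 4:] = 1
x = means[truth].astype(float)
x[2, 1] = 0.9                  # two pixels with "wrong-looking" spectra
x[5, 6] = 0.1

def icm_sweep(lab, lam):
    """One ICM sweep over U = lam1*E1 + lam2*E2 (cf. Eq. 14.10):
    E1 = Gaussian data term (up to constants), E2 = disagreeing neighbours."""
    lam1, lam2 = lam
    out = lab.copy()
    h, w = lab.shape
    for r in range(h):
        for c in range(w):
            neigh = [out[rr, cc]
                     for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                     if 0 <= rr < h and 0 <= cc < w]
            e = [lam1 * (x[r, c] - means[k]) ** 2 / (2 * sigma ** 2)   # E1
                 + lam2 * sum(1 for v in neigh if v != k)              # E2
                 for k in range(2)]
            out[r, c] = int(np.argmin(e))
    return out

init = (x > 0.5).astype(int)                           # noncontextual start
print((init != truth).sum())                           # 2 errors before ICM
print((icm_sweep(init, (1.0, 2.0)) != truth).sum())    # 0 errors after one sweep
```

The outcome clearly depends on the relative weights: with λ2 too small the contextual term cannot override the two deviant spectral measurements, which is exactly the parameter-sensitivity that motivates the automatic setting procedure proposed next.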

14.3.4 The Proposed MRF Parameter Optimization Method

With regard to the class of MRFs identified by Equation 14.10, the proposed method employs the training data to identify a set of parameters for the purpose of maximizing the classification accuracy, by exploiting the linear relation between the energy function and the parameter vector. Focusing on the first ICM iteration, the k-th training sample (k ∈ T) is correctly classified by the minimum-energy rule if:

$$U_\lambda(s_k^*|x_k, C_k^0) \leq U_\lambda(\omega_i|x_k, C_k^0) \qquad \forall \omega_i \neq s_k^* \qquad (14.11)$$

or equivalently:

$$\sum_{\ell=1}^{L} \lambda_\ell \left[ E_\ell(\omega_i|x_k, C_k^0) - E_\ell(s_k^*|x_k, C_k^0) \right] \geq 0 \qquad \forall \omega_i \neq s_k^* \qquad (14.12)$$

We note that the inequalities in Equation 14.12 are linear with respect to λ; that is, the correct classification of a training pixel is expressed as a set of (M − 1) linear inequalities with respect to the model parameters. More formally, collecting the energy differences contained in Equation 14.12 in a single L-dimensional column vector:

$$\varepsilon_{ki} = \mathrm{col}\left[ E_\ell(\omega_i|x_k, C_k^0) - E_\ell(s_k^*|x_k, C_k^0) : \ell = 1, 2, \ldots, L \right] \qquad (14.13)$$

the k-th training pixel is correctly classified by ICM if ε_ki^T λ ≥ 0 for all the class labels ωi ≠ sk* (k ∈ T). Denoting by E the matrix obtained by juxtaposing all the row vectors ε_ki^T (k ∈ T, ωi ≠ sk*), we conclude that ICM correctly classifies the whole training set T if and only if²:

$$E\lambda \geq 0 \qquad (14.14)$$

Operatively, the matrix E includes all the coefficients (i.e., the energy differences) of the linear inequalities in Equation 14.12; hence, E has L columns and one row for each inequality, that is, (M − 1) rows for each training pixel (corresponding to the (M − 1) class labels that are different from the true label). Therefore, denoting by |T| the number of training samples (i.e., the cardinality of T), E is an (R × L)-sized matrix, with R = |T|(M − 1). Since R ≫ L, the system in Equation 14.14 presents a number of inequalities much larger than the number of unknown variables. In particular, a feasible approach would lie in formulating the matrix inequality in Equation 14.14 as an equality Eλ = b, where b ∈ R^R is a "margin" vector with positive components, and in solving such an equality by a minimum-square-error (MSE) approach [24]. However, MSE would require a preliminary manual choice of the margin vector b. Therefore, we avoid using this approach, and we note that this problem is formally identical to the problem of computing the weight parameters of a linear binary discriminant function [24]. Hence, we propose to address the solution of Equation 14.14 by extending to the present context the methods described in the literature to compute such a linear discriminant function. In particular, we adopt the Ho–Kashyap method [24]. This approach jointly optimizes both λ and b according to the following quadratic programming problem [24]:

$$\min_{\lambda, b} \; \|E\lambda - b\|^2, \qquad \lambda \in \mathbb{R}^L, \; b \in \mathbb{R}^R, \; b_r > 0 \text{ for } r = 1, 2, \ldots, R \qquad (14.15)$$

which is solved by iteratively alternating an MSE step to update λ and a gradient-like descent step to update b [24]. Specifically, the following operations are performed at the t-th step (t = 0, 1, 2, ...) of the Ho–Kashyap procedure [24]:

² Given two m-dimensional vectors u and v (u, v ∈ R^m), we write u ≥ v to mean ui ≥ vi for all i = 1, 2, ..., m.

- given the current margin vector b^t, compute a corresponding parameter vector λ^t by minimizing ‖Eλ − b^t‖² with respect to λ, that is, compute:

$$\lambda^t = E^\# b^t, \quad \text{where } E^\# = (E^T E)^{-1} E^T \qquad (14.16)$$

is the so-called pseudo-inverse of E (provided that E^T E is nonsingular) [48];

- compute the error vector e^t = Eλ^t − b^t ∈ R^R;

- update each component of the margin vector by minimizing ‖Eλ^t − b‖² with respect to b by a gradient-like descent step that allows the margin components only to be increased [24], that is, compute

$$b_r^{t+1} = \begin{cases} b_r^t + \rho\, e_r^t & \text{if } e_r^t > 0 \\ b_r^t & \text{if } e_r^t \leq 0 \end{cases} \qquad r = 1, 2, \ldots, R \qquad (14.17)$$

where ρ is a convergence parameter.

In particular, the proposed approach to the MRF parameter-setting problem allows exploiting the known theoretical results about the Ho–Kashyap method in the context of linear discriminant functions. Specifically, one can prove that, if the matrix inequality in Equation 14.14 has solutions and if 0 < ρ < 2, then the Ho–Kashyap algorithm converges to a solution of Equation 14.14 in a finite number t̄ of iterations (i.e., we obtain e^t̄ = 0 and consequently b^{t̄+1} = b^t̄ and Eλ^t̄ = b^t̄ ≥ 0) [24]. On the other hand, if the inequality in Equation 14.14 has no solution, it is possible to prove that each component e_r^t of the error vector (r = 1, 2, ..., R) either vanishes for t → +∞ or takes on nonpositive values, while the error magnitude ‖e^t‖ converges to a positive limit bounded away from zero; in the latter situation, according to the Ho–Kashyap iterative procedure, the algorithm stops, and one can conclude that the matrix inequality in Equation 14.14 has no solution [46]. Therefore, the convergence (either finite-time or asymptotic) of the proposed parameter-setting method is guaranteed in any case. However, it is worth noting that neither the number t̄ of iterations required to reach convergence in the first case nor the number of iterations needed to detect the nonexistence of a solution in the second case is known in advance. Hence, specific stop conditions are usually adopted for the Ho–Kashyap procedure, for instance, by stopping the iterative process when |λ_ℓ^{t+1} − λ_ℓ^t| < ε_stop for all ℓ = 1, 2, ..., L and |b_r^{t+1} − b_r^t| < ε_stop for all r = 1, 2, ..., R, where ε_stop is a given threshold (in the present chapter, ε_stop = 0.0001 is used). The algorithm has been initialized by setting unitary values for all the components of λ and b and by choosing ρ = 1.

Coupling the proposed Ho–Kashyap-based parameter-setting method with the ICM classification approach results in an automatic contextual supervised classification approach (hereafter denoted by HK–ICM), which performs the following processing steps:

- Noncontextual step: generate an initial noncontextual classification map (i.e., an initial label vector S⁰) by applying a given supervised noncontextual (e.g., Bayesian or neural) classifier;

- Energy-difference step: compute the energy-difference matrix E according to the adopted MRF model and to the label vector S⁰;

- Ho–Kashyap step: compute an optimal parameter vector λ* by running the Ho–Kashyap procedure (applied to the matrix E) up to convergence; and

- ICM step: generate a contextual classification map by running ICM (fed with the parameter vector λ*) up to convergence.
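The Ho–Kashyap step can be sketched directly from Equation 14.15 through Equation 14.17: an MSE step for λ via the pseudo-inverse, followed by a margin update that only ever increases components of b. The function name, the stop threshold, and the toy energy-difference matrix below (whose rows play the role of the vectors ε_ki of Equation 14.13) are assumptions for illustration:

```python
import numpy as np

def ho_kashyap(E, rho=1.0, n_iter=500, eps_stop=1e-4):
    """Solve E @ lam >= 0 by jointly optimizing lam and a positive margin
    vector b (cf. Eq. 14.15-14.17). Returns the parameter vector lam."""
    R, L = E.shape
    b = np.ones(R)                       # unitary initialization, as in the text
    E_pinv = np.linalg.pinv(E)           # E# = (E^T E)^{-1} E^T when full rank
    lam = E_pinv @ b                     # MSE step (Eq. 14.16)
    for _ in range(n_iter):
        e = E @ lam - b                  # error vector e_t
        b_new = b + rho * np.where(e > 0, e, 0.0)   # margins only increase (Eq. 14.17)
        lam_new = E_pinv @ b_new         # MSE step (Eq. 14.16)
        converged = (np.max(np.abs(lam_new - lam)) < eps_stop
                     and np.max(np.abs(b_new - b)) < eps_stop)
        lam, b = lam_new, b_new
        if converged:
            break
    return lam

# toy energy-difference matrix: a solution of E @ lam >= 0 exists
# (e.g., lam proportional to (1, 1)), so finite-time convergence is expected
E = np.array([[2.0, -0.5],
              [-0.5, 2.0],
              [1.0, 0.5]])
lam = ho_kashyap(E)
print(np.all(E @ lam >= -1e-6))   # True: all training inequalities satisfied
```

When Equation 14.14 is solvable, the returned λ satisfies every row inequality; when it is not, the error magnitude stalls at a positive value and the stop condition halts the iteration, matching the two convergence regimes discussed above.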

A block diagram of the proposed contextual classification scheme is shown in Figure 14.1. It is worth noting that, according to the role of the parameters as weights of energy contributions, negative values for the optimized parameters λ1*, λ2*, ..., λL* would be undesirable. To prevent this issue, we note that, if all the entries in a given row of the matrix E are negative, the corresponding inequality cannot be satisfied by a vector λ with positive components. Therefore, before the application of the Ho–Kashyap step, the rows of E containing only negative entries are cancelled, as such rows would represent linear inequalities that could not be solved by positive parameter vectors. In other words, each deleted row corresponds to a training pixel k ∈ T for which a class ωi ≠ sk* has a lower energy than the true class label sk* for all possible positive values of the parameters λ1, λ2, ..., λL (its classification is therefore wrong).

FIGURE 14.1 Block diagram of the proposed contextual classification scheme: the multiband image and the training map feed the noncontextual step (initial classification map S⁰), the energy-difference step (matrix E), the Ho–Kashyap step (optimal parameter vector λ*), and the ICM step, which outputs the contextual classification map.

14.4 Experimental Results

The proposed HK–ICM method was tested on three different MRF models, each applied to a distinct real data set. In all the cases, the results provided by ICM, endowed with the proposed automatic parameter-optimization method, are compared with those achievable by performing an exhaustive grid search for the parameter values yielding the highest accuracies on the training set; operationally, such parameter values can be obtained by a "trial-and-error" procedure (TE–ICM in the following). Specifically, in all the experiments, spatially distinct training and test fields were available for each class. We note that the highest training-set (and not test-set) accuracy is searched for by TE–ICM to perform a consistent comparison with the proposed method, which deals only with the available training samples. A comparison between the classification accuracies of HK–ICM and TE–ICM on such a set of samples aims at assessing the performance of the proposed method from the viewpoint of the MRF parameter-optimization problem. On the other hand, an analysis of the test-set accuracies allows one to assess the quality of the resulting classification maps. Correspondingly, Table 14.1 through Table 14.3 show, for the three experiments, the classification accuracies provided by HK–ICM and those given by TE–ICM on both the training and the test sets. In all the experiments, 10 ICM iterations were sufficient to reach convergence, and the Ho–Kashyap algorithm was initialized by setting a unitary value for all the MRF parameters (i.e., by initially giving the same weight to all the energy contributions).
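The overall accuracy (OA, percentage of correctly classified samples) and the average accuracy (AA, mean of the per-class accuracies) reported in Table 14.1 through Table 14.3 can be computed as in the following sketch (the function name is illustrative):

```python
import numpy as np

def overall_and_average_accuracy(y_true, y_pred, classes):
    """OA: fraction of correctly classified samples, in percent.
    AA: mean of the per-class accuracies, in percent."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    oa = np.mean(y_true == y_pred)
    # Per-class accuracy: fraction of samples of class c labeled c.
    per_class = [np.mean(y_pred[y_true == c] == c) for c in classes]
    aa = np.mean(per_class)
    return 100 * oa, 100 * aa
```

OA weights each sample equally, so it favors the most populated classes; AA weights each class equally, which is why the two figures differ in the tables whenever class accuracies are unbalanced.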

TABLE 14.1
Experiment I: Training- and Test-Set Cardinalities and Classification Accuracies Provided by the Noncontextual GMAP Classifier, by HK–ICM, and by TE–ICM

                  Training  Test     Test-Set Accuracy                   Training-Set Accuracy
Class             Samples   Samples  GMAP (%)  HK–ICM (%)  TE–ICM (%)    HK–ICM (%)  TE–ICM (%)
Wet soil          1866      750      93.73     96.93       96.80         98.66       98.61
Urban             4195      2168     94.10     99.31       99.45         97.00       97.02
Wood              3685      2048     98.19     98.97       98.97         96.85       96.85
Water             1518      875      91.66     92.69       92.69         98.16       98.22
Bare soil         3377      2586     97.53     99.54       99.54         99.20       99.20
Overall accuracy                     95.86     98.40       98.42         97.80       97.81
Average accuracy                     95.04     97.49       97.49         97.97       97.98

TABLE 14.2
Experiment II: Training- and Test-Set Cardinalities and Classification Accuracies Provided by the Noncontextual DTC Classifier, by HK–ICM, and by TE–ICM

                                Training  Test     Test-Set Accuracy                  Training-Set Accuracy
Image         Class             Samples   Samples  DTC (%)  HK–ICM (%)  TE–ICM (%)    HK–ICM (%)  TE–ICM (%)
October 2000  Wet soil          1194      895      79.29    92.63       91.84         96.15       94.30
              Urban             3089      3033     88.30    96.41       99.04         94.59       96.89
              Wood              4859      3284     99.15    99.76       99.88         99.96       99.98
              Water             1708      1156     98.70    98.70       98.79         99.82       99.82
              Bare soil         4509      3781     97.70    99.00       99.34         98.54       98.91
              Overall accuracy                     93.26    98.06       98.81         98.15       98.59
              Average accuracy                     92.63    97.30       97.78         97.81       97.98
June 2001     Wet soil          1921      1692     96.75    98.46       98.94         99.64       99.74
              Urban             3261      2967     89.25    92.35       94.00         98.25       99.39
              Wood              1719      1413     94.55    98.51       98.51         96.74       98.08
              Water             1461      1444     99.72    99.72       99.79         99.93       100
              Bare soil         3168      3052     97.67    99.34       99.48         98.77       98.90
              Agricultural      2773      2431     82.48    83.42       83.67         99.89       99.96
              Overall accuracy                     92.68    94.61       95.13         98.86       99.34
              Average accuracy                     93.40    95.30       95.73         98.87       99.34

TABLE 14.3
Experiment III: Training- and Test-Set Cardinalities and Classification Accuracies Provided by the Noncontextual DTC Classifier, by HK–ICM, and by TE–ICM

                                Training  Test     Test-Set Accuracy                  Training-Set Accuracy
Image      Class                Samples   Samples  DTC (%)  HK–ICM (%)  TE–ICM (%)    HK–ICM (%)  TE–ICM (%)
April 16   Wet soil             6205      5082     90.38    99.19       96.79         100         100
           Bare soil            1346      1927     91.70    92.42       95.28         99.93       100
           Overall accuracy                        90.74    97.33       96.38         99.99       100
           Average accuracy                        91.04    95.81       96.04         99.96       100
April 17   Wet soil             6205      5082     93.33    99.17       97.93         100         100
           Bare soil            1469      1163     99.66    100         100           99.52       100
           Overall accuracy                        94.51    99.33       98.32         99.91       100
           Average accuracy                        96.49    99.59       98.97         99.76       100
April 18   Wet soil             6205      5082     86.80    97.28       92.84         100         100
           Bare soil            1920      2122     97.41    96.80       98.26         98.70       99.95
           Overall accuracy                        89.92    97.14       94.43         99.69       99.99
           Average accuracy                        92.10    97.04       95.55         99.35       99.97

14.4.1 Experiment I: Spatial MRF Model for Single-Date Image Classification

The spatial MRF model described in Refs. [5,17] is adopted: it employs a second-order neighborhood (i.e., Ck includes the labels of the 8 pixels surrounding the k-th pixel; k = 1, 2, . . . , N) and a parametric Gaussian N(mi, Σi) model for the PDF p(xk|ωi) of a feature vector xk conditioned to the class ωi (i = 1, 2, . . . , M; k = 1, 2, . . . , N). Specifically, two energy contributions are defined, that is, a "spectral" energy term E1(·), related to the information conveyed by the conditional statistics of the feature vector of each single pixel, and a "spatial" term E2(·), related to the information conveyed by the correlation among the class labels of neighboring pixels, that is (k = 1, 2, . . . , N):

E1(ωi | xk) = (xk − m̂i)^T Σ̂i^−1 (xk − m̂i) + ln det Σ̂i,
E2(ωi | Ck) = −∑_{ωj ∈ Ck} δ(ωi, ωj)                                  (14.18)

where m̂i and Σ̂i are a sample-mean estimate and a sample-covariance estimate of the conditional mean mi and of the conditional covariance matrix Σi, respectively, computed using the training samples of ωi (i = 1, 2, . . . , M); δ(·,·) is the Kronecker delta function (i.e., δ(a, b) = 1 for a = b and δ(a, b) = 0 for a ≠ b). Hence, in this case, two parameters, λ1 and λ2, have to be set. The model was applied to a 6-band, 870 × 498 pixel Landsat-5 TM image acquired in April 1994 over an urban and agricultural area around the town of Pavia (Italy) (Figure 14.2a) and presenting five thematic classes (i.e., "wet soil," "urban," "wood," "water," and "bare soil"; see Table 14.1). The above-mentioned normality assumption for the conditional PDFs is usually accepted as a model for the class statistics in optical data [49,50]. A standard noncontextual MAP classifier with Gaussian classes (hereafter denoted simply by GMAP) [49,50] was adopted in the noncontextual step. The Ho–Kashyap step required 31716 iterations to reach convergence and selected λ1* = 0.99 and λ2* = 10.30. The classification accuracies provided by GMAP and HK–ICM on the test set are given in Table 14.1. GMAP already provides good classification performance, with accuracies higher than 90% for all the classes. However, as expected, the contextual HK–ICM classifier further improves the classification result, in particular yielding a 3.20% accuracy increase for "wet soil" and a 5.21% increase for "urban," which result in a 2.54% increase in the overall accuracy (OA, i.e., the percentage of correctly classified test samples) and a 2.45% increase in the average accuracy (AA, i.e., the average of the accuracies obtained on the five classes). Furthermore, for the sake of comparison, Table 14.1 also shows the results obtained by the contextual MRF–ICM classifier, applied with a standard "trial-and-error" parameter-setting procedure (TE–ICM).
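A minimal sketch of the two energy terms of Equation 14.18 may help; it assumes the spatial term is the negated count of neighbor-label agreements, so that agreement lowers the energy under the minimum-energy decision rule:

```python
import numpy as np

def spectral_energy(x, mean, cov):
    """E1 of Equation 14.18: Gaussian energy of feature vector x
    under sample estimates (mean, cov) of a class, up to constants."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)  # stable ln det of covariance
    return d @ np.linalg.inv(cov) @ d + logdet

def spatial_energy(label, neighbor_labels):
    """E2 of Equation 14.18: minus the number of neighbors (here the
    8-pixel second-order neighborhood) sharing the candidate label."""
    return -sum(label == lbl for lbl in neighbor_labels)
```

For a given pixel, a candidate class is then scored by λ1·E1 + λ2·E2, which is the weighted sum ICM minimizes.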
Specifically, fixing λ1 = 1 (according to the minimum-energy decision rule, this choice causes no loss of generality, as the classification result is affected only by the relative weight λ2/λ1 between the two parameters and not by their absolute values), the value of λ2 yielding the highest OA on the training set was searched for exhaustively in the range [0,20] with discretization step 0.2. The value selected by TE–ICM was λ2 = 11.20, which is quite close to the value computed automatically by the proposed procedure. Accordingly, the classification accuracies obtained by HK–ICM and TE–ICM on both the training and test sets were very similar (Table 14.1).

14.4.2 Experiment II: Spatio-Temporal MRF Model for Two-Date Multi-Temporal Image Classification

As a second experiment, we addressed the problem of supervised multi-temporal classification of two co-registered images, I0 and I1, acquired over the same ground area at

FIGURE 14.2 Data sets employed for the experiments: (a) band TM-3 of the multi-spectral image acquired in April 1994 and employed in Experiment I; (b) ERS-2 channel (after histogram equalization) of the multi-sensor image acquired in October 2000 and employed in Experiment II; (c) XSAR channel (after histogram equalization) of the SAR multi-frequency image acquired on April 16, 1994, and employed in Experiment III.

different times, t0 and t1 (t1 > t0). Specifically, the spatio-temporal mutual MRF model proposed in Ref. [7] is adopted, which introduces an energy function taking into account both the spatial context (related to the correlation among neighboring pixels in the same image) and the temporal context (related to the correlation between distinct images of the same area). Let us denote by Mj, xjk, and ωji the number of classes in Ij, the feature vector of the k-th pixel in Ij (k = 1, 2, . . . , N), and the i-th class in Ij (i = 1, 2, . . . , Mj), respectively

( j = 0, 1). Focusing, for instance, on the classification of the first-date image I0, the context of the k-th pixel includes two distinct sets of labels: (1) the set S0k of labels of the spatial context, that is, the labels of the 8 pixels surrounding the k-th pixel in I0, and (2) the set T0k of labels of the temporal context, that is, the labels of the pixels lying in I1 inside a 3 × 3 window centered on the k-th pixel (k = 1, 2, . . . , N) [7]. Accordingly, three energy contributions are introduced, that is, a spectral and a spatial term (as in Experiment I) and an additional temporal term, that is (i = 1, 2, . . . , M0; k = 1, 2, . . . , N):

E1(ω0i | x0k) = −ln p̂(x0k | ω0i),
E2(ω0i | S0k) = −∑_{ω0j ∈ S0k} δ(ω0i, ω0j),
E3(ω0i | T0k) = −∑_{ω1j ∈ T0k} P̂(ω0i | ω1j)                           (14.19)

where p̂(x0k|ω0i) is an estimate of the PDF p(x0k|ω0i) of a feature vector x0k (k = 1, 2, . . . , N) conditioned to the class ω0i, and P̂(ω0i|ω1j) is an estimate of the transition probability between class ω1j at time t1 and class ω0i at time t0 (i = 1, 2, . . . , M0; j = 1, 2, . . . , M1) [7]. A similar 3-component energy function is also introduced at the second date; hence, three parameters have to be set at date t0 (namely, λ01, λ02, and λ03) and three other parameters at date t1 (namely, λ11, λ12, and λ13). In the present experiment, HK–ICM was applied to a two-date multi-temporal data set consisting of two 870 × 498 pixel images acquired over the same ground area as in Experiment I in October 2000 and July 2001, respectively. At both acquisition dates, eight Landsat-7 ETM+ bands were available, and a further ERS-2 channel (C-band, VV polarization) was also available in October 2000 (Figure 14.2b). The same five thematic classes as considered in Experiment I were considered in the October 2000 scene, while the July 2001 image also presented a further "agricultural" class. For all the classes, training and test data were available (Table 14.2). Due to the multi-sensor nature of the October 2000 image, a nonparametric technique was employed in the noncontextual step. Specifically, the dependence tree classifier (DTC) approach was adopted, which approximates each multi-variate class-conditional PDF as a product of automatically selected bivariate PDFs [51]. As proposed in Ref. [7], the Parzen window method [30], applied with Gaussian kernels, was used to model such bivariate PDFs.
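In this experiment the transition probabilities P̂(ω0i|ω1j) appearing in the temporal term are estimated as relative frequencies on the two initial classification maps; a sketch of that estimate, assuming integer-coded label maps, is:

```python
import numpy as np

def transition_probabilities(map0, map1, M0, M1):
    """P_hat[i, j] = n_ij / m_j: among the pixels labeled j in the
    second-date map, the fraction labeled i in the first-date map."""
    map0, map1 = np.ravel(map0), np.ravel(map1)
    counts = np.zeros((M0, M1))
    # Joint counts n_ij over co-registered pixel pairs.
    np.add.at(counts, (map0, map1), 1)
    m_j = counts.sum(axis=0)  # total pixels assigned to class j in map1
    # Guard against classes absent from map1 (empty columns).
    return np.divide(counts, m_j, out=np.zeros_like(counts),
                     where=m_j > 0)
```

Each nonempty column of the returned matrix sums to one, since it is a conditional distribution over the first-date classes given the second-date class.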
In particular, to avoid the usual "trial-and-error" selection of the kernel-width parameters involved by the Parzen method, we adopted the simple "reference density" automatization procedure, which computes the kernel width by asymptotically minimizing the "mean integrated square error" functional according to a given reference model for the unknown PDF (for further details on automatic kernel-width selection for Parzen density estimation, we refer the reader to Refs. [52,53]). The "reference density" approach was chosen for its simplicity and short computation time; in particular, it was applied with a Gaussian reference density for the kernel widths of the ETM+ features [53] and with a Rayleigh density for the kernel width of the ERS-2 feature (for further details on this SAR-specific kernel-width selection approach, we refer the reader to Ref. [54]). The resulting PDF estimates were fed to an MAP classifier, together with prior-probability estimates (computed as relative frequencies on the training set), to generate an initial noncontextual classification map for each date. During the noncontextual step, transition-probability estimates were computed as relative frequencies on the two initial maps (i.e., P̂(ω0i|ω1j) was computed as the ratio nij/mj between the number nij of pixels assigned both to ω0i in I0 and to ω1j in I1 and the total number mj of pixels assigned to ω1j in I1; i = 1, 2, . . . , M0; j = 1, 2, . . . , M1). The energy-difference and the Ho–Kashyap steps were applied first to the October


2000 image, converging to (1.06, 0.95, 1.97) in 2632 iterations, and then to the July 2001 image, converging to (1.00, 1.00, 1.01) in only 52 iterations. As an example, the convergent behavior of the parameter values during the Ho–Kashyap iterations for the October 2000 image is shown in Figure 14.3. The classification accuracies obtained by ICM, applied to the spatio-temporal model of Equation 14.19 with these parameter values, are shown in Table 14.2, together with the results of the noncontextual DTC algorithm. Also in this experiment, HK–ICM provided good classification accuracies at both dates, yielding large accuracy increases for several classes as compared with the noncontextual DTC initialization (in particular, "urban" at both dates, "wet soil" in October 2000, and "wood" in June 2001). Also for this experiment, we compared the results obtained by HK–ICM with those provided by the MRF–ICM classifier with a "trial-and-error" parameter setting (TE–ICM, Table 14.2). Fixing λ01 = λ11 = 1 (as in Experiment I) at both dates, a grid search was performed to identify the values of the other parameters that yielded the highest overall accuracies on the training set at the two dates. In general, this would require an interactive optimization of four distinct parameters (namely, λ02, λ03, λ12, and λ13). To reduce the time taken by this interactive procedure, the restrictions λ02 = λ12 and λ03 = λ13 were adopted, thus assigning the same weight λ2 to the two spatial energy contributions and the same weight λ3 to the two temporal contributions, and performing a grid search in a two-dimensional (rather than four-dimensional) space. The adopted search range was [0,10] for both parameters, with discretization step 0.2: for each combination of (λ2, λ3), ICM was run until convergence and the resulting classification accuracies on the training set were computed.

A global maximum of the average of the two training-set OAs at the two dates was reached for λ2 = 9.8 and λ3 = 2.6; the corresponding accuracies (on both the training and test sets) are shown in Table 14.2. This optimal parameter vector turns out to be quite different from the solution automatically computed by the proposed method. The training-set accuracies of TE–ICM are better than those of HK–ICM; however, the difference between the OAs provided on the training set by the two approaches is only 0.44% in October 2000 and 0.48% in June 2001, and the differences between the AAs are only 0.17% and 0.47%, respectively. This suggests that, although identifying different parameter solutions, the proposed automatic approach and the standard interactive one achieve similar classification performance. A similarly small difference in performance can also be noted between the corresponding test-set accuracies.

FIGURE 14.3 Experiment II: Plots of the behavior of the "spectral" parameter λ01, the "spatial" parameter λ02, and the "temporal" parameter λ03 versus the number of Ho–Kashyap iterations for the October 2000 image.


14.4.3 Experiment III: Spatio-Temporal MRF Model for Multi-Temporal Classification of Image Sequences

As a third experiment, HK–ICM was applied to the multi-temporal mutual MRF model (see Section 14.2) extended to the problem of supervised classification of image sequences [7]. In particular, considering a sequence {I0, I1, I2} of three co-registered images acquired over the same ground area at times t0, t1, t2, respectively (t0 < t1 < t2), the mutual model is generalized by introducing an energy function with four contributions at the intermediate date t1, that is, a spectral and a spatial energy term (namely, E1(·) and E2(·)) expressed as in Equation 14.19 and two distinct temporal energy terms, E3(·) and E4(·), related to the backward temporal correlation of I1 with the previous image I0 and to the forward temporal correlation of I1 with I2, respectively [7]. For the first and last images in the sequence (i.e., I0 and I2), only one type of temporal energy is well defined (in particular, only the forward temporal energy E3(·) for I0 and only the backward energy E4(·) for I2). All the temporal energy terms are computed in terms of transition probabilities, as in Equation 14.19. Therefore, four parameters (λ11, λ12, λ13, and λ14) have to be set at date t1, while only three parameters are needed at date t0 (namely, λ01, λ02, and λ03) or at date t2 (namely, λ21, λ22, and λ24). Specifically, a sequence of three 700 × 280 pixel co-registered multi-frequency SAR images, acquired over an agricultural area near the city of Pavia on April 16, 17, and 18, 1994, was used in the experiment (the ground area is not the same as the one considered in Experiments I and II, although the two regions are quite close to each other). At each date, a 4-look XSAR band (VV polarization) and three 4-look SIR-C bands (HH, HV, and TP (total power)) were available.
The observed scene at the three dates presented two main classes, that is, "wet soil" and "bare soil," and the temporal evolution of these classes between April 16 and April 18 was due to the artificial flooding processes caused by rice cultivation. The PDF estimates and the initial classification maps were computed using the DTC-Parzen method (as in Experiment II), applied here with kernel widths optimized again according to the SAR-specific method developed in Ref. [54] and based on a Rayleigh reference density. As in Experiment II, the transition probabilities were estimated as relative frequencies on the initial noncontextual maps. Specifically, the Ho–Kashyap step was applied separately to set the parameters at each date, converging to (0.97, 1.23, 1.56) in 11510 iterations for April 16, to (1.12, 1.03, 0.99, 1.28) in 15092 iterations for April 17, and to (1.00, 1.04, 0.97) in 2112 iterations for April 18. As shown in Table 14.3, the noncontextual classification stage already provided good accuracies, but the use of contextual information allowed a sharp increase in accuracy at all three dates (i.e., +6.59%, +4.82%, and +7.22% in OA for April 16, 17, and 18, respectively, and +4.77%, +3.09%, and +4.94% in AA, respectively), thus suggesting the effectiveness of the parameter values selected by the developed algorithm. In particular, as shown in Figure 14.4 with regard to the April 16 image, the noncontextual DTC maps (e.g., Figure 14.4a) were very noisy, due to the impact of speckle in the SAR data on the classification results, whereas the contextual HK–ICM maps (e.g., Figure 14.4b) were far less affected by speckle, because of the integration of contextual information in the classification process. It is worth noting that no preliminary despeckling procedure was applied to the input SIR-C/XSAR images before classifying them.

Also for this experiment, Table 14.3 presents the results obtained by searching exhaustively for the MRF parameters yielding the best training-set accuracies. As in Experiment II, the following restrictions were adopted for the "trial-and-error" parameter optimization: λ01 = λ11 = λ21 = 1, λ02 = λ12 = λ22 = λ2, and λ03 = λ13 = λ14 = λ24 = λ3 (i.e., a unitary value was used for the "spectral" parameters, the same weight λ2 was assigned to all the spatial energy contributions employed at the three dates, and the same weight λ3 was associated with all the temporal energy terms). Therefore, the number of


FIGURE 14.4 Experiment III: Classification maps for the April 16 image provided by (a) the noncontextual DTC classifier and (b) the proposed contextual HK–ICM classifier. Color legend: white = "wet soil," grey = "dry soil," black = not classified (not-classified pixels are present where the DTC-Parzen PDF estimates exhibit very low values for both classes).

independent parameters to be set interactively was reduced from 7 to 2. Again, the search range [0,10] (discretized with step 0.2) was adopted for both parameters and, for each pair (λ2, λ3), ICM was run up to convergence. Then, the parameter vector yielding the best classification result was chosen. In particular, the average of the three training-set OAs at the three dates exhibited a global maximum for λ2 = 1 and λ3 = 0.4. The corresponding accuracies are given in Table 14.3 and denoted by TE–ICM. As noted with regard to Experiment II, although the parameter vectors selected by the proposed automatic procedure and by the interactive "trial-and-error" one were quite different, the classification performances achieved by the two methods on the training set were very similar (and very close or equal to 100%). On the other hand, in most cases the test-set accuracies obtained by HK–ICM are even better than those given by TE–ICM. These results can be interpreted as a consequence of the fact that, to keep the computation time of the exhaustive parameter search tractable, the above-mentioned restrictions on the parameter values were used when applying TE–ICM, whereas HK–ICM optimized all the MRF parameters without any need for additional restrictions. The resulting TE–ICM parameters were effective for the classification of the training set (as they were specifically optimized toward this end), but they yielded less accurate results on the test set.
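The "trial-and-error" grid search used by TE–ICM can be sketched as follows; `run_icm` and `training_oa` are hypothetical placeholders for running ICM with a given parameter pair and scoring the resulting maps on the training set:

```python
import numpy as np

def grid_search_te_icm(run_icm, training_oa, step=0.2, upper=10.0):
    """Exhaustive search over (lambda2, lambda3) in [0, upper],
    keeping the pair that maximizes the training-set score returned
    by the (placeholder) training_oa callable; lambda1 is fixed to 1."""
    best_score, best_pair = -np.inf, None
    for lam2 in np.arange(0.0, upper + step / 2, step):
        for lam3 in np.arange(0.0, upper + step / 2, step):
            maps = run_icm(1.0, lam2, lam3)  # ICM run to convergence
            score = training_oa(maps)
            if score > best_score:
                best_score, best_pair = score, (lam2, lam3)
    return best_pair, best_score
```

Even with the parameter restrictions, this loop runs ICM once per grid point (51 × 51 runs here), which is exactly the computational burden the Ho–Kashyap-based setting avoids.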

14.5 Conclusions

In the present chapter, an innovative algorithm has been proposed that addresses the problem of automating the parameter-setting operations involved in MRF


models for ICM-based supervised image classification. The method is applicable to a wide class of MRF models (i.e., models whose energy functions can be expressed as weighted sums of distinct energy contributions), and it expresses the parameter optimization as a linear discrimination problem, solved by using the Ho–Kashyap method. In particular, the theoretical properties known for this algorithm in the context of linear classifier design still hold for the proposed method, thus ensuring a good convergent behavior. The numerical results proved the capability of HK–ICM to automatically select parameter values that yield accurate contextual classification maps. These results have been obtained for several different MRF models, dealing both with single-date supervised classification and with multi-temporal classification of two-date or multi-date imagery, which suggests a high flexibility of the method. This is further confirmed by the fact that good classification results were achieved on different typologies of remote sensing data (i.e., multi-spectral, optical-SAR multi-source, and multi-frequency SAR) and with different PDF estimation strategies (both parametric and nonparametric). In particular, an interactive choice of the MRF parameters that selects, through an exhaustive grid search, the parameter values providing the best performance on the training set yielded results very similar to those obtained by the proposed HK–ICM methodology in terms of classification accuracy. Specifically, in the first experiment, HK–ICM automatically identified parameter values quite close to the ones selected by this "trial-and-error" procedure (thus providing a very similar classification map), while in the other experiments, HK–ICM chose parameter values different from the ones obtained by the "trial-and-error" strategy, although generating classification results with similar (Experiment II) or even better (Experiment III) accuracies.
The application of the "trial-and-error" approach in acceptable execution times required the definition of suitable restrictions on the parameter values to reduce the dimension of the parameter space to be explored. Such restrictions are not needed by the proposed method, which thus turns out to be a good compromise between accuracy and level of automation, as it achieves very good classification results while completely avoiding the time-consuming phase of "trial-and-error" parameter setting. In addition, in all the experiments, the adopted MRF models yielded a significant increase in accuracy as compared with the initial noncontextual results. This further highlights the advantages of MRF models, which effectively exploit the contextual information in remote sensing image classification, and further confirms the usefulness of the proposed technique in automatically setting the parameters of such models. The proposed algorithm allows one to overcome the usual limitation on the use of MRF techniques in supervised image classification, namely the lack of automation of such techniques, and supports a more extensive use of this powerful classification approach in remote sensing. It is worth noting that HK–ICM optimizes the MRF parameter vector according to the initial classification maps provided as an input to ICM: the resulting optimal parameters are then employed to run ICM up to convergence. A further generalization of the method would also adapt the MRF parameters during the ICM iterative process, by applying the proposed Ho–Kashyap-based method at each ICM iteration or according to a different predefined iterative scheme. This could allow a further improvement in accuracy, although it would require running the Ho–Kashyap algorithm not once but several times, thus increasing the total computation time of the contextual classification process. The effect of this combined optimization strategy on the convergence of ICM is an issue worth investigating.


Acknowledgments

This research was carried out within the framework of the PRIN-2002 project entitled "Processing and analysis of multitemporal and hypertemporal remote-sensing images for environmental monitoring," which was funded by the Italian Ministry of Education, University, and Research (MIUR). The support is gratefully acknowledged. The authors would also like to thank Dr. Paolo Gamba of the University of Pavia, Italy, for providing the SAR images employed in Experiments II and III.

References

1. M. Datcu, K. Seidel, and M. Walessa, Spatial information retrieval from remote sensing images: Part I. Information theoretical perspective, IEEE Trans. Geosci. Rem. Sens., 36(5), 1431–1445, 1998.
2. R.C. Dubes and A.K. Jain, Random field models in image analysis, J. Appl. Stat., 16, 131–163, 1989.
3. S. Geman and D. Geman, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration, IEEE Trans. Pattern Anal. Machine Intell., 6, 721–741, 1984.
4. A.H.S. Solberg, Flexible nonlinear contextual classification, Pattern Recogn. Lett., 25(13), 1501–1508, 2004.
5. Q. Jackson and D. Landgrebe, Adaptive Bayesian contextual classification based on Markov random fields, IEEE Trans. Geosci. Rem. Sens., 40(11), 2454–2463, 2002.
6. M. De Martino, G. Macchiavello, G. Moser, and S.B. Serpico, Partially supervised contextual classification of multitemporal remotely sensed images, in Proc. of IEEE-IGARSS 2003, Toulouse, 2, 1377–1379, 2003.
7. F. Melgani and S.B. Serpico, A Markov random field approach to spatio-temporal contextual image classification, IEEE Trans. Geosci. Rem. Sens., 41(11), 2478–2487, 2003.
8. P.H. Swain, Bayesian classification in a time-varying environment, IEEE Trans. Syst., Man, Cybern., 8(12), 880–883, 1978.
9. A.H.S. Solberg, T. Taxt, and A.K. Jain, A Markov random field model for classification of multisource satellite imagery, IEEE Trans. Geosci. Rem. Sens., 34(1), 100–113, 1996.
10. G. Storvik, R. Fjortoft, and A.H.S. Solberg, A Bayesian approach to classification of multiresolution remote sensing data, IEEE Trans. Geosci. Rem. Sens., 43(3), 539–547, 2005.
11. M.L. Comer and E.J. Delp, The EM/MPM algorithm for segmentation of textured images: Analysis and further experimental results, IEEE Trans. Image Process., 9(10), 1731–1744, 2000.
12. Y. Delignon, A. Marzouki, and W. Pieczynski, Estimation of generalized mixtures and its application to image segmentation, IEEE Trans. Image Process., 6(10), 1364–1375, 2001.
13. X. Descombes, M. Sigelle, and F. Preteux, Estimating Gaussian Markov random field parameters in a nonstationary framework: Application to remote sensing imaging, IEEE Trans. Image Process., 8(4), 490–503, 1999.
14. P. Smits and S. Dellepiane, Synthetic aperture radar image segmentation by a detail preserving Markov random field approach, IEEE Trans. Geosci. Rem. Sens., 35(4), 844–857, 1997.
15. G.G. Hazel, Multivariate Gaussian MRF for multispectral scene segmentation and anomaly detection, IEEE Trans. Geosci. Rem. Sens., 38(3), 1199–1211, 2000.
16. G. Rellier, X. Descombes, F. Falzon, and J. Zerubia, Texture feature analysis using a Gauss–Markov model in hyperspectral image classification, IEEE Trans. Geosci. Rem. Sens., 42(7), 1543–1551, 2004.
17. L. Bruzzone and D.F. Prieto, Automatic analysis of the difference image for unsupervised change detection, IEEE Trans. Geosci. Rem. Sens., 38(3), 1171–1182, 2000.
18. L. Bruzzone and D.F. Prieto, An adaptive semiparametric and context-based approach to unsupervised change detection in multitemporal remote-sensing images, IEEE Trans. Image Process., 40(4), 452–466, 2002.
19. T. Kasetkasem and P.K. Varshney, An image change detection algorithm based on Markov random field models, IEEE Trans. Geosci. Rem. Sens., 40(8), 1815–1823, 2002.
20. J. Besag, On the statistical analysis of dirty pictures, J. R. Statist. Soc., 68, 259–302, 1986.
21. Y. Cao, H. Sun, and X. Xu, An unsupervised segmentation method based on MPM for SAR images, IEEE Geosci. Rem. Sens. Lett., 2(1), 55–58, 2005.
22. X. Descombes, R.D. Morris, J. Zerubia, and M. Berthod, Estimation of Markov random field prior parameters using Markov chain Monte Carlo maximum likelihood, IEEE Trans. Image Process., 8(7), 954–963, 1999.
23. M.V. Ibanez and A. Simó, Parameter estimation in Markov random field image modeling with imperfect observations. A comparative study, Pattern Rec. Lett., 24(14), 2377–2389, 2003.
24. J.T. Tou and R.C. Gonzalez, Pattern Recognition Principles, Addison-Wesley, MA, 1974.
25. Y.-C. Ho and R.L. Kashyap, An algorithm for linear inequalities and its applications, IEEE Trans. Elec. Comp., 14, 683–688, 1965.
26. G. Celeux, F. Forbes, and N. Peyrand, EM procedures using mean field-like approximations for Markov model-based image segmentation, Pattern Recogn., 36, 131–144, 2003.
27. Y. Zhang, M. Brady, and S. Smith, Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm, IEEE Trans. Med. Imag., 20(1), 45–57, 2001.
28. R. Fiortoft, Y. Delignon, W. Pieczynski, M. Sigelle, and F. Tupin, Unsupervised classification of radar images using hidden Markov models and hidden Markov random fields, IEEE Trans. Geosci. Rem. Sens., 41(3), 675–686, 2003.
29. Z. Kato, J. Zerubia, and M. Berthod, Unsupervised parallel image classification using Markovian models, Pattern Rec., 32, 591–604, 1999.
30. K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd edition, Academic Press, New York, 1990.
31. F. Comets and B. Gidas, Parameter estimation for Gibbs distributions from partially observed data, Ann. Appl. Prob., 2(1), 142–170, 1992.
32. W. Qian and D.N. Titterington, Estimation of parameters of hidden Markov models, Phil. Trans.: Phys. Sci. and Eng., 337(1647), 407–428, 1991.
33. X. Descombes, A dense class of Markov random fields and associated parameter estimation, J. Vis. Commun. Image R., 8(3), 299–316, 1997.
34. J. Besag, Spatial interaction and the statistical analysis of lattice systems, J. R. Statist. Soc., 36, 192–236, 1974.
35. L. Bedini, A. Tonazzini, and S. Minutoli, Unsupervised edge-preserving image restoration via a saddle point approximation, Image and Vision Computing, 17, 779–793, 1999.
36. S.S. Saquib, C.A. Bouman, and K. Sauer, ML parameter estimation for Markov random fields with applications to Bayesian tomography, IEEE Trans. Image Process., 7(7), 1029–1044, 1998.
37. C. Geyer and E. Thompson, Constrained Monte Carlo maximum likelihood for dependent data, J. Roy. Statist. Soc. Ser. B, 54, 657–699, 1992.
38. L. Younes, Estimation and annealing for Gibbsian fields, Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, 24, 269–294, 1988.
39. B. Chalmond, An iterative Gibbsian technique for reconstruction of m-ary images, Pattern Rec., 22(6), 747–761, 1989.
40. Y. Yu and Q. Cheng, MRF parameter estimation by an accelerated method, Patt. Rec. Lett., 24, 1251–1259, 2003.
41. L. Wang, J. Liu, and S.Z. Li, MRF parameter estimation by MCMC method, Pattern Rec., 33, 1919–1925, 2000.
42. L. Wang and J. Liu, Texture segmentation based on MRMRF modeling, Patt. Rec. Lett., 21, 189–200, 2000.
43. H. Derin and H. Elliott, Modeling and segmentation of noisy and textured images using Gibbs random fields, IEEE Trans. Pattern Anal. Machine Intell., 9(1), 39–55, 1987.
44. M. Mignotte, C. Collet, P. Perez, and P. Bouthemy, Sonar image segmentation using an unsupervised hierarchical MRF model, IEEE Trans. Image Process., 9(7), 1216–1231, 2000.
45. B.C.K. Tso and P.M. Mather, Classification of multisource remote sensing imagery using a genetic algorithm and Markov random fields, IEEE Trans. Geosci. Rem. Sens., 37(3), 1255–1260, 1999.

326

Signal and Image Processing for Remote Sensing

46. R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification, 2nd edition, Wiley, New York, 2001. 47. M.W. Zemansky, Heat and Thermodynamics, 5th edition, McGraw-Hill, New York, 1968. 48. J. Stoer and R. Bulirsch, Introduction to Numerical Analysis, 2nd edition, Springer-Verlag, New York, 1992. 49. J. Richards and X. Jia, Remote sensing digital image analysis, 3rd edition, Springer-Verlag, Berlin, 1999. 50. D.A. Landgrebe, Signal Theory Methods in Multispectral Remote Sensing. Wiley-InterScience, New York, 2003. 51. M. Datcu, F. Melgani, A. Piardi, and S.B. Serpico, Multisource data classification with dependence trees, IEEE Trans. Geosci. Rem. Sens., 40(3), 609–617, 2002. 52. A. Berlinet and L. Devroye, A comparison of kernel density estimates, Publications de l’Institut de Statistique de l’Universite´ de Paris, 38(3), 3–59, 1994. 53. K.-D. Kim and J.-H. Heo, Comparative study of flood quantiles estimation by nonparametric models, J. Hydrology, 260, 176–193, 2002. 54. G. Moser, J. Zerubia, and S.B. Serpico, Dictionary-based stochastic expectation-maximization for SAR amplitude probability density function estimation, Research Report 5154, INRIA, Mar. 2004.

15 Random Forest Classification of Remote Sensing Data

Sveinn R. Joelsson, Jon A. Benediktsson, and Johannes R. Sveinsson

CONTENTS
15.1 Introduction ............ 327
15.2 The Random Forest Classifier ............ 328
  15.2.1 Derived Parameters for Random Forests ............ 329
    15.2.1.1 Out-of-Bag Error ............ 329
    15.2.1.2 Variable Importance ............ 329
    15.2.1.3 Proximities ............ 329
15.3 The Building Blocks of Random Forests ............ 330
  15.3.1 Classification and Regression Tree ............ 330
  15.3.2 Binary Hierarchy Classifier Trees ............ 330
15.4 Different Implementations of Random Forests ............ 331
  15.4.1 Random Forest: Classification and Regression Tree ............ 331
  15.4.2 Random Forest: Binary Hierarchical Classifier ............ 331
15.5 Experimental Results ............ 331
  15.5.1 Classification of a Multi-Source Data Set ............ 331
    15.5.1.1 The Anderson River Data Set Examined with a Single CART Tree ............ 335
    15.5.1.2 The Anderson River Data Set Examined with the BHC Approach ............ 337
  15.5.2 Experiments with Hyperspectral Data ............ 338
15.6 Conclusions ............ 343
Acknowledgment ............ 343
References ............ 343

15.1 Introduction

Ensemble classification methods train several classifiers and combine their results through a voting process. Many ensemble classifiers [1,2] have been proposed, including consensus-theoretic classifiers [3] and committee machines [4]. Boosting and bagging are widely used ensemble methods. Bagging (or bootstrap aggregating) [5] is based on training many classifiers on bootstrapped samples from the training set and has been shown to reduce the variance of the classification. In contrast, boosting uses iterative re-training, where the incorrectly classified samples are given more weight in


successive training iterations. This makes the algorithm slow (much slower than bagging), while in most cases it is considerably more accurate than bagging. Boosting generally reduces both the variance and the bias of the classification and has been shown to be a very accurate classification method. However, it has various drawbacks: it is computationally demanding, it can overtrain, and it is sensitive to noise [6]. Therefore, there is much interest in investigating methods such as random forests.

In this chapter, random forests are investigated in the classification of hyperspectral and multi-source remote sensing data. A random forest is a collection of classification trees or treelike classifiers. Each tree is trained on a bootstrapped sample of the training data, and at each node in each tree the algorithm searches only across a random subset of the features to determine a split. To classify an input vector with a random forest, the vector is submitted as an input to each of the trees in the forest. Each tree gives a classification, and the tree is said to vote for that class. The forest then chooses the class having the most votes over all the trees in the forest. Random forests have been shown to be comparable to boosting in terms of accuracy, but without the drawbacks of boosting. In addition, random forests are computationally much less intensive than boosting.

Random forests have recently been investigated for the classification of remote sensing data. Ham et al. [7] applied them to the classification of hyperspectral remote sensing data, Joelsson et al. [8] used random forests in the classification of hyperspectral data from urban areas, and Gislason et al. [9] investigated random forests in the classification of multi-source remote sensing and geographic data. All studies report good accuracies, especially when computational demand is taken into account.

The chapter is organized as follows. First, random forest classifiers are discussed. Then, two different building blocks for random forests, the classification and regression tree (CART) and the binary hierarchical classifier (BHC), are reviewed. In Section 15.4, random forests with the two different building blocks are discussed. Experimental results for hyperspectral and multi-source data are given in Section 15.5. Finally, conclusions are given in Section 15.6.

15.2 The Random Forest Classifier

A random forest classifier is a classifier comprising a collection of treelike classifiers. Ideally, a random forest classifier is an i.i.d. randomization of weak learners [10]. The classifier uses a large number of individual decision trees, all of which are trained (grown) to tackle the same problem. A sample is assigned to the class that occurs most frequently among the decisions of the individual trees. The individuality of the trees is maintained by three factors:

1. Each tree is trained using a random subset of the training samples.
2. During the growing process of a tree, the best split at each node is found by searching through m randomly selected features. For a data set with M features, m is selected by the user and kept much smaller than M.
3. Every tree is grown to its fullest to diversify the trees, so there is no pruning.

As described above, a random forest is an ensemble of treelike classifiers, each trained on a randomly chosen subset of the input data, where the final classification is based on a majority vote by the trees in the forest.
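The training scheme described above maps directly onto off-the-shelf implementations. The following is an illustrative sketch only (it uses scikit-learn rather than the FORTRAN code discussed later in this chapter, and a synthetic data set in place of real imagery):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a multi-channel remote sensing data set:
# M = 22 features and 6 classes (mirroring the Anderson River set-up below).
X, y = make_classification(n_samples=2000, n_features=22, n_informative=12,
                           n_classes=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators = number of trees; max_features = m, the size of the random
# feature subset searched at each node; bootstrap=True resamples the
# training set per tree; trees are grown fully (no pruning) by default.
forest = RandomForestClassifier(n_estimators=200, max_features=5,
                                bootstrap=True, random_state=0)
forest.fit(X_train, y_train)

# Each tree votes; predict() returns the majority class per sample.
accuracy = forest.score(X_test, y_test)
```

The three sources of tree individuality correspond directly to the `bootstrap`, `max_features`, and default no-pruning settings.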


Each node of a tree in a random forest looks at a random subset of features of fixed size m when deciding a split during training. The trees can thus be viewed as random vectors of integers (the features used to determine a split at each node). Two opposing effects govern the choice of the parameter m:

1. Increasing m increases the correlation between the trees in the forest, which increases the error rate of the forest.
2. Increasing m also increases the classification accuracy of every individual tree, which decreases the error rate of the forest.

An optimal interval for m lies between these somewhat fuzzy extremes. The parameter m is often said to be the only adjustable parameter to which the forest is sensitive, and the "optimal" range for m is usually quite wide [10].

15.2.1 Derived Parameters for Random Forests

There are three parameters that are derived from a random forest: the out-of-bag (OOB) error, the variable importance, and the proximities.

15.2.1.1 Out-of-Bag Error

To estimate the test set accuracy, the out-of-bag samples of each tree (the training set samples that are not in the bootstrap sample for that particular tree) can be run down through the tree, in a form of cross-validation. The OOB error estimate is derived from the classification error for the samples left out of each tree, averaged over the total number of trees. In other words, for all the trees for which case n was OOB, run case n down those trees and note whether it is correctly classified. The proportion of times the classification is in error, averaged over all cases, is the OOB error estimate. As an example, if each tree is trained on a random 2/3 of the sample population (training set), the remaining 1/3 is used to derive the OOB error rate for that tree; the OOB error rate is then averaged over all OOB cases, yielding the final or total OOB error. This error estimate has been shown to be unbiased in many tests [10,11].
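In scikit-learn terms (an illustrative sketch, not the implementation used in this chapter), the OOB estimate is available directly and can be compared against a held-out test set:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           n_classes=4, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# oob_score=True makes the forest score each training sample using only
# the trees for which that sample was out-of-bag.
forest = RandomForestClassifier(n_estimators=300, max_features=4,
                                oob_score=True, bootstrap=True,
                                random_state=1)
forest.fit(X_train, y_train)

oob_accuracy = forest.oob_score_              # internal, "free" estimate
test_accuracy = forest.score(X_test, y_test)  # external check
```

If the OOB estimate is indeed unbiased, the two accuracies should agree closely, which is why the chapter later argues that a separate validation set can sometimes be dispensed with.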

15.2.1.2 Variable Importance

To compute the importance of a variable, take a single tree, run its OOB cases down it, and count the votes for the correct class. Then repeat this after randomly permuting the values of that single variable in the OOB cases. Subtract the number of correctly cast votes for the permuted data from the number of correctly cast votes for the original OOB data. The average of this value over all trees in the forest is the raw importance score for the variable [5,6,11]. If the values of this score are independent from tree to tree, the standard error can be computed by a standard computation [12]. The correlations of these scores between trees have been computed for a number of data sets and proved to be quite low [5,6,11]. Therefore, standard errors are computed in the classical way: divide the raw score by its standard error to get a z-score, and assign a significance level to the z-score assuming normality [5,6,11].

15.2.1.3 Proximities

After a tree is grown, all the data are passed through it. If cases k and n end up in the same terminal node, their proximity is increased by one. The proximity measure can be used


(directly or indirectly) to visualize high-dimensional data [5,6,11]. As the proximities indicate the "distance" of a sample to all other samples, the measure can be used to detect outliers, in the sense that an outlier is "far" from all other samples.
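One way to realize this is sketched below. The outlyingness formula (class size divided by the sum of squared within-class proximities) follows Breiman's definition, and the scikit-learn `apply` call is an assumption of this sketch, not the chapter's FORTRAN code:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=3, random_state=2)
forest = RandomForestClassifier(n_estimators=100, random_state=2).fit(X, y)

# Leaf index of every sample in every tree: shape (n_samples, n_trees).
leaves = forest.apply(X)

# Proximity of cases k and n = fraction of trees placing them in the
# same terminal node.
n_samples, n_trees = leaves.shape
prox = np.zeros((n_samples, n_samples))
for t in range(n_trees):
    same_leaf = leaves[:, t][:, None] == leaves[:, t][None, :]
    prox += same_leaf
prox /= n_trees

# Outlier measure of case k within its class: class size divided by the
# sum of squared proximities to the other cases of the same class
# (a large value suggests an outlier).
outlier = np.empty(n_samples)
for k in range(n_samples):
    same_class = (y == y[k])
    same_class[k] = False
    denom = np.sum(prox[k, same_class] ** 2)
    outlier[k] = np.sum(same_class) / max(denom, 1e-12)
```

This is the kind of per-class outlier measure plotted in Figure 15.3 below.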

15.3 The Building Blocks of Random Forests

Random forests are made up of several trees, or building blocks. The building blocks considered here are CART trees, which partition the input data, and BHC trees, which partition the labels (the output).

15.3.1 Classification and Regression Tree

CART is a decision tree in which each split is made on the variable (feature or dimension) that yields the greatest decrease in impurity at that node of the tree [12]. The growing of a tree continues until the change in impurity stops or falls below some bound, or until the number of samples left to split is too small, according to the user. CART trees are easily overtrained, so a single tree is usually pruned to increase its generality. However, a collection of unpruned trees, where each tree is trained to its fullest on a subset of the training data to diversify the individual trees, can be very useful. When such trees are collected in a multi-classifier ensemble and trained using the random forest algorithm, the result is called RF-CART.
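The split criterion can be sketched as follows: an illustrative, self-contained implementation of a Gini-impurity split search over a random feature subset (not the chapter's code):

```python
import numpy as np

def gini(labels):
    """Gini impurity of a label array."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return 1.0 - np.sum(p ** 2)

def best_split(X, y, feature_subset):
    """Return (feature, threshold, impurity decrease) of the best split,
    searched only over the given (random) subset of features."""
    parent = gini(y)
    best = (None, None, 0.0)
    for j in feature_subset:
        for t in np.unique(X[:, j])[:-1]:  # candidate thresholds
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            # Weighted child impurity; decrease relative to the parent.
            child = (left.size * gini(left) + right.size * gini(right)) / y.size
            if parent - child > best[2]:
                best = (j, t, parent - child)
    return best

# Toy example: feature 1 separates the two classes perfectly.
X = np.array([[0.0, 1.0], [0.0, 2.0], [1.0, 8.0], [1.0, 9.0]])
y = np.array([0, 0, 1, 1])
feat, thr, gain = best_split(X, y, feature_subset=[1])
# A pure split removes all of the parent impurity (0.5 here).
```

In a random forest, `feature_subset` would be a fresh random draw of m feature indices at every node.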

15.3.2 Binary Hierarchy Classifier Trees

A binary hierarchy of classifiers, in which each node splits on the labels (the output) instead of the input as in the CART case, is naturally organized as a tree, and such trees can be combined, under rules similar to those for CART trees, to form an RF-BHC. In a BHC, the best split at each node is based on (meta-)class separability, starting with a single meta-class that is split into two meta-classes, and so on; the true classes are realized in the leaves. In parallel with the splitting process, the Fisher discriminant and the corresponding projection are computed, and the data are projected along the Fisher direction [12]. In this "Fisher space," the projected data are used to estimate the likelihood of a sample belonging to a meta-class; from there, the probabilities of a true class belonging to a meta-class are estimated and used to update the Fisher projection. The data are then projected using the updated projection, and so forth, until a user-supplied level of separation is acquired. This approach utilizes natural class affinities in the data; that is, the most natural splits occur early in the growth of the tree [13]. A drawback is the possible instability of the split algorithm: the Fisher projection involves the inverse of an estimate of the within-class covariance matrix, which can be unstable at some nodes of the tree, depending on the data being considered, and if this matrix estimate is singular (to numerical precision), the algorithm fails. As mentioned above, BHC trees can be combined into an RF-BHC, where the best splits on classes are performed on a subset of the features in the data, both to diversify the individual trees and to stabilize the aforementioned inverse. Since the number of leaves in a BHC tree equals the number of classes in the data set, the trees themselves can be very informative when compared to CART-like trees.
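The Fisher step for a single two-meta-class split can be sketched as follows (an illustrative NumPy computation; the pseudo-inverse stands in for the matrix inverse whose possible singularity is discussed above):

```python
import numpy as np

def fisher_direction(X, meta):
    """Fisher discriminant direction separating meta-class 0 from meta-class 1.

    X    : (n_samples, n_features) data matrix
    meta : (n_samples,) array of 0/1 meta-class labels
    """
    X0, X1 = X[meta == 0], X[meta == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-(meta-)class scatter; its inverse is the unstable quantity,
    # so a pseudo-inverse is used here for robustness.
    Sw = np.cov(X0, rowvar=False) * (len(X0) - 1) \
       + np.cov(X1, rowvar=False) * (len(X1) - 1)
    w = np.linalg.pinv(Sw) @ (m1 - m0)
    return w / np.linalg.norm(w)

# Two Gaussian blobs separated along the first feature.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 5)),
               rng.normal([4, 0, 0, 0, 0], 1.0, (100, 5))])
meta = np.repeat([0, 1], 100)
w = fisher_direction(X, meta)
```

In a full BHC, the projection onto `w` would feed the meta-class likelihood estimates, which in turn update the projection iteratively.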

15.4 Different Implementations of Random Forests

15.4.1 Random Forest: Classification and Regression Tree

The RF-CART approach is based on CART-like trees, grown to minimize an impurity measure. When trees are grown using the minimum Gini impurity criterion [12], the impurity of the two descendent nodes of a split is less than that of the parent. Adding up the decrease in Gini impurity for each variable over all trees in the forest gives a variable importance that is often very consistent with the permutation importance measure.
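Both flavors of importance are exposed by common implementations; as an illustrative sketch (the scikit-learn names here are an assumption of this sketch, not part of the chapter):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                           random_state=3)
forest = RandomForestClassifier(n_estimators=100, random_state=3).fit(X, y)

# Mean decrease in impurity: the total Gini decrease per feature, summed
# over the trees (normalized to sum to 1).
mdi = forest.feature_importances_

# Permutation importance: the accuracy drop when one feature is shuffled.
perm = permutation_importance(forest, X, y, n_repeats=5,
                              random_state=3).importances_mean
```

Comparing the ranking of `mdi` and `perm` illustrates the consistency claim made above.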

15.4.2 Random Forest: Binary Hierarchical Classifier

RF-BHC is a random forest based on an ensemble of BHC trees. In the RF-BHC, a split in the tree is based on the best separation between meta-classes. At each node, the best separation is found by examining m features selected at random. The value of m can be selected by trials to yield optimal results. In cases where the number of samples is small enough to induce the "curse of dimensionality," m is calculated from a user-supplied ratio R between the number of samples and the number of features: the supplied value of m is used unchanged, or a new value is calculated to preserve the ratio R, whichever is smaller at the node in question [7]. An RF-BHC is uniform regarding tree size (depth), because the number of nodes is a function of the number of classes in the data set.
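The rule for m can be written in a few lines (a sketch of the rule as described; the names `m_supplied` and `R` are illustrative, user-supplied inputs):

```python
def effective_m(m_supplied, n_samples, R):
    """Number of features examined at a node: the supplied m, capped so
    that the samples-to-features ratio at the node stays at least R."""
    m_from_ratio = max(1, n_samples // R)
    return min(m_supplied, m_from_ratio)

# With plenty of samples the supplied m is kept; with few samples it shrinks.
m_large = effective_m(22, 550, 5)   # 550 // 5 = 110 >= 22, so m stays 22
m_small = effective_m(22, 60, 5)    # 60 // 5 = 12 < 22, so m shrinks to 12
```

This cap is what stabilizes the within-class covariance inverse at small nodes.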

15.5 Experimental Results

Random forests have many important qualities, many of which apply directly to multi- or hyperspectral data. It has been shown that the volume of a hypercube concentrates in its corners and the volume of a hyper-ellipsoid concentrates in an outer shell, implying that, with limited data points, much of hyperspectral data space is empty [17]. Building a collection of trees is attractive when each tree seeks to minimize or maximize some information-content-related criterion on a subset of the features: the random forest can arrive at a good decision boundary without deleting or extracting features explicitly, while making the most of the training set. This ability to handle thousands of input features is especially attractive when dealing with multi- or hyperspectral data, because such data are more often than not composed of tens to hundreds of features and a limited number of samples. The unbiased nature of the OOB error rate can in some cases (if not all) eliminate the need for a validation data set, which is another advantage when working with a limited number of samples. In the experiments, the RF-CART approach was tested using a FORTRAN implementation of random forests supplied on a web page maintained by Leo Breiman and Adele Cutler [18].

15.5.1 Classification of a Multi-Source Data Set

In this experiment we use the Anderson River data set, which is a multi-source remote sensing and geographic data set made available by the Canada Centre for Remote Sensing


(CCRS) [16]. This data set is very difficult to classify due to a number of mixed forest-type classes [15]. Classification was performed on a data set consisting of the following six data sources:

1. Airborne multispectral scanner (AMSS) with 11 spectral data channels (ten channels from 380 nm to 1100 nm and one channel from 8 μm to 14 μm)
2. Steep-mode synthetic aperture radar (SAR) with four data channels (X-HH, X-HV, L-HH, and L-HV)
3. Shallow-mode SAR with four data channels (X-HH, X-HV, L-HH, and L-HV)
4. Elevation data (one data channel, in which the pixel value is the elevation in meters)
5. Slope data (one data channel, in which the pixel value is the slope in degrees)
6. Aspect data (one data channel, in which the pixel value is the aspect in degrees)

There are 19 information classes in the ground reference map provided by CCRS. In the experiments, only the six largest ones were used, as listed in Table 15.1. Training samples were selected uniformly, giving 10% of the total sample size; all other known samples were then used as test samples [15].

The experimental results for random forest classification are given in Table 15.2 through Table 15.4. Table 15.2 shows, line by line, how the parameters (the number of split variables m and the number of trees) were selected. First, a forest of 50 trees is grown for various numbers of split variables; then the number yielding the highest training (OOB) accuracy is selected, and more trees are grown until the overall accuracy stops increasing. The overall accuracy (see Table 15.2) was seen to be insensitive to variable settings on the interval of 10 to 22 split variables. Growing the forest beyond 200 trees improves the overall accuracy insignificantly, so a forest of 200 trees, each of which considers all the input variables at every node, yields the highest accuracy. The OOB accuracy in Table 15.2 seems to support the claim that overfitting is next to impossible when using random forests in this manner.
TABLE 15.1
Anderson River Data: Information Classes and Samples

Class No.  Class Description                          Training Samples  Test Samples
1          Douglas fir (31–40 m)                             971            1250
2          Douglas fir (21–40 m)                             551             817
3          Douglas fir + Other species (31–40 m)             548             701
4          Douglas fir + Lodgepole pine (21–30 m)            542             705
5          Hemlock + Cedar (31–40 m)                         317             405
6          Forest clearings                                 1260            1625
           Total                                            4189            5503

TABLE 15.2
Anderson River Data: Selecting m and the Number of Trees

Trees  Split Variables  Runtime (min:sec)  OOB acc. (%)  Test Set acc. (%)
  50          1              00:19            68.42           71.58
  50          5              00:20            74.00           75.74
  50         10              00:22            75.89           77.63
  50         15              00:22            76.30           78.50
  50         20              00:24            76.01           78.14
  50         22              00:24            76.63           78.10
 100         22              00:38            77.18           78.56
 200         22              01:06            77.51           79.01
 400         22              02:06            77.56           78.81
1000         22              05:09            77.68           78.87
 100         10              00:32            76.65           78.39
 200         10              00:52            77.04           78.34
 400         10              01:41            77.54           78.41
1000         10              04:02            77.66           78.25

Note: 22 split variables were selected as the "best" choice.

However, the "best" results were obtained using 22 variables, so there is no random selection of input variables at each node here, because all variables are considered at every split. This might suggest that a boosting algorithm using decision trees could yield higher overall accuracies. The highest overall accuracies achieved with the Anderson River data set, known to the authors at the time of this writing, were reached by boosting using j4.8 trees [17]: 100% training accuracy (vs. 77.5% here) and 80.6% accuracy on test data, which is not dramatically higher than the overall accuracy observed here (around 79.0%) with a random forest, a difference of about 1.6 percentage points. Therefore, even though m is not much less than the total number of variables (in fact equal), the

random forest ensemble performs rather well, especially when running times are taken into consideration. Here, each tree in the random forest is an expert on a subset of the data, but all the experts consider the same number of variables and so do not, in the strictest sense, utilize the strength of random forests. However, the fact remains that the results are among the best ones for this data set.

The training and test accuracies for the individual classes, using random forests with 200 trees and 22 variables at each node, are given in Table 15.3 and Table 15.4, respectively. From these tables, it can be seen that the random forest yields the highest accuracies for classes 5 and 6 but the lowest for class 2, which is in accordance with the outlier analysis below.

TABLE 15.3
Anderson River Data: Confusion Matrix for Training Data in Random Forest Classification (Using 200 Trees and Testing 22 Variables at Each Node). The % column gives the percentage of each class correctly classified.

Class No.     1     2     3     4     5     6       %
1           764    75    32    11     8    81    78.68
2           126   289    62     3     2    69    52.45
3            20    38   430    11     9    40    78.47
4            35     8    21   423    39    16    78.04
5             1     1     0    42   271     2    85.49
6            57    43    51    25    14  1070    84.92

TABLE 15.4
Anderson River Data: Confusion Matrix for Test Data in Random Forest Classification (Using 200 Trees and Testing 22 Variables at Each Node). The % column gives the percentage of each class correctly classified.

Class No.     1     2     3     4     5     6       %
1          1006    87    26    19     7   105    80.48
2           146   439    67    65     6    94    53.73
3            29    40   564    12     7    49    80.46
4            44    23    18   565    45    10    80.14
5             2     0     3    44   351     5    86.67
6            60    55    51    22    14  1423    87.57

A variable importance estimate for the training data can be seen in Figure 15.1, where each data channel is represented by one variable. The first 11 variables are multi-spectral data, followed by the four steep-mode SAR data channels, the four shallow-mode SAR channels, and then elevation, slope, and aspect measurements, one channel each. It is interesting to note that variable 20 (elevation) is the most important variable, followed by variable 22 (aspect) and spectral channel 6, when looking at the raw importance (Figure 15.1a), but slope when looking at the z-score (Figure 15.1b). The variable importance for each individual class can be seen in Figure 15.2, and some interesting conclusions can be drawn from it. For example, with the exception of class 6, topographic data

(channels 20–22) are of high importance, followed by the spectral channels (channels 1–11). In Figure 15.2, we can also see that the SAR channels (channels 12–19) seem almost irrelevant to class 5, but play a more important role for the other classes. They generally come third, after the topographic and multi-spectral variables, with the exception of class 6, the only class for which the topographic variables score lower than an SAR channel (shallow-mode SAR channel 17, or X-HV). These findings can be verified by classifying the data set using only the most important variables and comparing with the accuracy when all the variables are

FIGURE 15.1
Anderson River training data: (a) variable importance and (b) z-score on raw importance. (Bar plots over variable (dimension) numbers 1–22; not reproduced here.)


FIGURE 15.2
Anderson River training data: variable importance for each of the six classes. (Six per-class bar plots over variable (dimension) numbers 1–22; not reproduced here.)

included. For example, leaving out variable 20 should have less effect on the classification accuracy of class 6 than on that of the other classes.

A proximity matrix was computed for the training data to detect outliers. The results of this outlier analysis are shown in Figure 15.3, where it can be seen that the data set is difficult to classify, as there are several outliers. From Figure 15.3, the outliers are spread over all classes, with a varying degree. The classes with the smallest number of outliers (classes 5 and 6) are indeed those with the highest classification accuracies (Table 15.3 and Table 15.4). On the other hand, class 2 has the lowest accuracy and the highest number of outliers.

In the experiments, the random forest classifier proved to be fast. Using an Intel Celeron CPU 2.20-GHz desktop, it took about a minute to read the data set into memory, train, and classify the data set with the settings of 200 trees and 22 split variables, when the FORTRAN code supplied on the random forest web site was used [18]. The running times indicate a linear increase with the number of trees; they are shown, along with a least-squares fit to a line, in Figure 15.4.

15.5.1.1 The Anderson River Data Set Examined with a Single CART Tree

All of the 22 features were considered when deciding a split in the RF-CART approach above, so it is of interest to examine whether the RF-CART performs any better than a single CART tree. Unlike the RF-CART, a single CART is easily overtrained. Here we prune the CART tree to reduce or eliminate any overtraining and hence use three data sets: a training set, a test set (used to decide the level of pruning), and a validation set to estimate the performance of the tree as a classifier (Table 15.5 and Table 15.6).
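The prune-by-test-set procedure can be sketched with cost-complexity pruning; the scikit-learn names below are an assumption of this sketch, since the chapter's own implementation is not specified:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1500, n_features=22, n_informative=10,
                           n_classes=4, random_state=4)
X_trainval, X_val, y_trainval, y_val = train_test_split(X, y, test_size=0.3,
                                                        random_state=4)
X_train, X_test, y_train, y_test = train_test_split(X_trainval, y_trainval,
                                                    test_size=0.25,
                                                    random_state=4)

# Candidate pruning levels from the cost-complexity path of a full tree.
path = DecisionTreeClassifier(random_state=4).cost_complexity_pruning_path(
    X_train, y_train)

# The test set picks the pruning level; the validation set then estimates
# the performance of the chosen, pruned tree.
best_alpha, best_score = 0.0, -1.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=4)
    tree.fit(X_train, y_train)
    score = tree.score(X_test, y_test)
    if score > best_score:
        best_alpha, best_score = alpha, score

pruned = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=4)
pruned.fit(X_train, y_train)
val_accuracy = pruned.score(X_val, y_val)
```

The three-way split mirrors the training/test/validation roles in Table 15.5.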

Signal and Image Processing for Remote Sensing

336

FIGURE 15.3
Anderson River training data: outlier analysis for individual classes. In each case, the x-axis (index) gives the number of a training sample and the y-axis the outlier measure. (Plots for the full training set and for classes 1–6; not reproduced here.)

FIGURE 15.4
Anderson River data set: random forest running times for 10 and 22 split variables. (Running time versus number of trees with least-squares line fits; slopes of about 0.235 sec per tree for 10 split variables and 0.302 sec per tree for 22; not reproduced here.)


TABLE 15.5
Anderson River Data Set: Training, Test, and Validation Sets

Class No.  Class Description                          Training Samples  Test Samples  Validation Samples
1          Douglas fir (31–40 m)                             971             250            1000
2          Douglas fir (21–40 m)                             551             163             654
3          Douglas fir + Other species (31–40 m)             548             140             561
4          Douglas fir + Lodgepole pine (21–30 m)            542             141             564
5          Hemlock + Cedar (31–40 m)                         317              81             324
6          Forest clearings                                 1260             325            1300
           Total samples                                    4189            1100            4403

As can be seen in Table 15.6 and from the results of the RF-CART runs above (Table 15.2), the overall accuracy of the RF-CART is about 8 percentage points, or (78.8/70.81 − 1) × 100 ≈ 11.3%, higher than the overall accuracy for the validation set in Table 15.6. Therefore, a boosting effect is present even though all the variables are needed to determine a split in every tree of the RF-CART.

15.5.1.2 The Anderson River Data Set Examined with the BHC Approach

The same procedure as in the RF-CART case was used to select the variable m for the RF-BHC. However, for the RF-BHC the separability of the data set is an issue: when the number of randomly selected features was less than 11, a singular matrix was likely for the Anderson River data set. The best overall performance regarding realized classification accuracy turned out to be obtained for the same setting as in the RF-CART approach, that is, for m = 22. The R parameter was set to 5 but, given the number of samples per (meta-)class in this data set, the parameter is not necessary: 22 features is always at least 5 times smaller than the number of samples in a (meta-)class during the growing of the trees in the RF-BHC. Since all the trees were trained using all the available features, the trees are more or less the same; the only difference is that they are trained on different subsets of the samples, and thus the RF-BHC gives a very similar result to a single BHC. It can be argued that the RF-BHC is a more general classifier due to the nature of the error or accuracy estimates used during training, but as can be seen in Table 15.7 and Table 15.8 the differences are small, at least for this data set, and no boosting effect seems to be present when comparing the RF-BHC approach with a single BHC.

TABLE 15.6
Anderson River Data Set: Classification Accuracy (%) for Training, Test, and Validation Sets

Class Description                          Training    Test   Validation
Douglas fir (31–40 m)                        87.54    73.20      71.30
Douglas fir (21–40 m)                        77.50    47.24      46.79
Douglas fir + Other species (31–40 m)        87.96    70.00      72.01
Douglas fir + Lodgepole pine (21–30 m)       84.69    68.79      69.15
Hemlock + Cedar (31–40 m)                    90.54    79.01      77.78
Forest clearings                             90.08    81.23      81.00
Overall accuracy                             86.89    71.18      70.82

Signal and Image Processing for Remote Sensing

TABLE 15.7
Anderson River Data Set: Classification Accuracies in Percentage for a Single BHC Tree Classifier

Class Description                            Training    Test
Douglas fir (31-40 m)                           50.57    50.40
Douglas fir (21-40 m)                           47.91    43.57
Douglas fir + Other species (31-40 m)           58.94    59.49
Douglas fir + Lodgepole pine (21-30 m)          72.32    67.23
Hemlock + Cedar (31-40 m)                       77.60    73.58
Forest clearings                                71.75    72.80
Overall accuracy                                62.54    61.02

15.5.2 Experiments with Hyperspectral Data

The data used in this experiment were collected in the framework of the HySens project, managed by Deutsches Zentrum für Luft- und Raumfahrt (DLR) (the German Aerospace Center) and sponsored by the European Union. The optical sensor reflective optics system imaging spectrometer (ROSIS 03) was used to record four flight lines over the urban area of Pavia, northern Italy. The number of bands of the ROSIS 03 sensor used in the experiments is 103, with spectral coverage from 0.43 μm through 0.86 μm. The flight altitude was chosen as the lowest available for the airplane, which resulted in a spatial resolution of 1.3 m per pixel. The ROSIS data consist of nine classes (Table 15.9). The data comprise 43,923 samples, split into 3,921 training samples and 40,002 samples for testing. A pseudo color image of the area, along with the ground truth mask (training and testing samples), is shown in Figure 15.5. This data set was classified using a single BHC tree, an RF-BHC, a single CART, and an RF-CART. The forest parameters, m and R (for RF-BHC), were chosen by trials to maximize accuracies, and the growing of trees was stopped when the overall accuracy did not improve with additional trees, following the same procedure as for the Anderson River data set (see Table 15.2). For the RF-BHC, R was chosen to be 5, m to be 25, and the forest was grown to only 10 trees. For the RF-CART, m was set to 25 and the forest was grown to 200 trees. No feature extraction was done at individual nodes in the tree when using the single BHC approach.

TABLE 15.8
Anderson River Data Set: Classification Accuracies in Percentage for an RF-BHC, R = 5, m = 22, and 10 Trees

Class Description                            Training    Test
Douglas fir (31-40 m)                           51.29    51.12
Douglas fir (21-40 m)                           45.37    41.13
Douglas fir + Other species (31-40 m)           59.31    57.20
Douglas fir + Lodgepole pine (21-30 m)          72.14    67.80
Hemlock + Cedar (31-40 m)                       77.92    71.85
Forest clearings                                71.75    72.43
Overall accuracy                                62.43    60.37
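As a side illustration of the role of the forest parameter m (this sketch is not from the chapter's experiments; the synthetic data and the use of the Gini criterion here are assumptions), a random-forest node draws m candidate features at random and keeps the best split found among them:

```python
import numpy as np

def best_split_on_random_features(X, y, m, rng):
    """Pick the best (feature, threshold) split among m randomly drawn
    features, scoring candidate splits by weighted Gini impurity.
    This mirrors the random-forest node rule discussed in the text."""
    n, q = X.shape
    feats = rng.choice(q, size=min(m, q), replace=False)

    def gini(labels):
        if labels.size == 0:
            return 0.0
        _, counts = np.unique(labels, return_counts=True)
        p = counts / labels.size
        return 1.0 - np.sum(p ** 2)

    best = None  # (impurity, feature, threshold)
    for j in feats:
        for t in np.unique(X[:, j])[:-1]:  # candidate thresholds
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            score = (left.size * gini(left) + right.size * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, t)
    return best

rng = np.random.default_rng(0)
# two well-separated classes on feature 0; feature 1 is pure noise
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
X[:, 1] = rng.normal(size=100)
y = np.repeat([0, 1], 50)
score, j, t = best_split_on_random_features(X, y, m=2, rng=rng)
```

With m equal to the full feature count (as here), the node always sees the informative feature; a small m forces diversity among trees at the cost of occasionally splitting on noise.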

Random Forest Classification of Remote Sensing Data


TABLE 15.9
ROSIS University Data Set: Classes and Number of Samples

Class No.   Class Description         Training Samples    Test Samples
1           Asphalt                                548           6,304
2           Meadows                                540          18,146
3           Gravel                                 392           1,815
4           Trees                                  524           2,912
5           (Painted) metal sheets                 265           1,113
6           Bare soil                              532           4,572
7           Bitumen                                375             981
8           Self-blocking bricks                   514           3,364
9           Shadow                                 231             795
            Total samples                        3,921          40,002

Classification accuracies are presented in Table 15.11 through Table 15.14. As in the single CART case for the Anderson River data set, approximately 20% of the samples in the original test set were randomly sampled into a new test set used to select a pruning level for the tree, leaving 80% of the original test samples for validation, as seen in Table 15.10. All the other classification methods used the training and test sets as described in Table 15.9. From Table 15.11 through Table 15.14 we can see that the RF-BHC gives the highest overall accuracy of the tree methods, while the single BHC, single CART, and RF-CART methods yielded lower and comparable overall accuracies. These results show that using many weak learners as opposed to a few stronger ones is not always the best choice in classification; the better choice depends on the data set. In our experience, the RF-BHC approach is as accurate as or more accurate than the RF-CART when the data set consists of moderately to highly separable (meta-)classes, but for difficult data sets the partitioning algorithm used for the BHC trees can fail to converge (the inverse of the within-class covariance matrix becomes singular to numerical precision) and thus no BHC classifier can be realized. This is not a problem when using CART trees as the building blocks, which partition the input by simply minimizing an impurity measure given a split on a node.

FIGURE 15.5
ROSIS University: (a) reference data (ground truth) and (b) gray scale image.

TABLE 15.10
ROSIS University Data Set: Training, Test, and Validation Sets

Class No.   Class Description         Training Samples    Test Samples    Validation Samples
1           Asphalt                                548           1,261                 5,043
2           Meadows                                540           3,629                14,517
3           Gravel                                 392             363                 1,452
4           Trees                                  524             582                 2,330
5           (Painted) metal sheets                 265             223                   890
6           Bare soil                              532             914                 3,658
7           Bitumen                                375             196                   785
8           Self-blocking bricks                   514             673                 2,691
9           Shadow                                 231             159                   636
            Total samples                        3,921           8,000                32,002

The classification results for the single CART tree (Table 15.11), especially for the two classes gravel and bare soil, may be considered unacceptable when compared to the other methods, which seem to yield more balanced accuracies over all classes. The classified images for the results given in Table 15.12 through Table 15.14 are shown in Figure 15.6a through Figure 15.6d. Since BHC trees are of fixed size regarding the number of leaves, it is worth examining the tree in the single-BHC case (Figure 15.7). Notice the siblings on the tree (nodes sharing a parent): gravel (3)/shadow (9), asphalt (1)/bitumen (7), and finally meadows (2)/bare soil (6). Without too much stretch of the imagination, one can intuitively decide that these classes are related, at least asphalt/bitumen and meadows/bare soil. When comparing the gravel area in the ground truth image (Figure 15.5a) and the same area in the gray scale image (Figure 15.5b), one can see it has gray levels ranging from bright to relatively dark, which might be interpreted as an intuitive relation or overlap between the gravel (3) and the shadow (9) classes. The self-blocking bricks (8) are the class closest to the asphalt-bitumen meta-class, which again looks very similar in the pseudo color image. So the tree more or less seems to place "naturally"

TABLE 15.11
Single CART: Training, Test, and Validation Accuracies in Percentage for ROSIS University Data Set

Class Description          Training     Test    Validation
Asphalt                       80.11    70.74         72.24
Meadows                       83.52    75.48         75.80
Gravel                         0.00     0.00          0.00
Trees                         88.36    97.08         97.00
(Painted) metal sheets        97.36    91.03         86.07
Bare soil                     46.99    24.73         26.60
Bitumen                       85.07    82.14         80.38
Self-blocking bricks          84.63    92.42         92.27
Shadow                        96.10   100.00         99.84
Overall accuracy              72.35    69.59         69.98
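The 20%/80% split of the original test set described in the text can be sketched as follows (a minimal illustration; 40,002 is the test-set total from Table 15.9, and the random seed is arbitrary):

```python
import numpy as np

def split_test_validation(n_test_total, frac_test=0.2, seed=0):
    """Randomly sample ~20% of the original test indices as a new test
    set (used to select the pruning level) and keep the remaining ~80%
    for validation, as described for the single-CART experiments."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_test_total)
    n_new_test = int(round(frac_test * n_test_total))
    return idx[:n_new_test], idx[n_new_test:]

test_idx, val_idx = split_test_validation(40002)
```

The resulting sizes, 8,000 and 32,002, match the totals of Table 15.10.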


TABLE 15.12
BHC: Training and Test Accuracies in Percentage for ROSIS University Data Set

Class                      Training     Test
Asphalt                       78.83    69.86
Meadows                       93.33    55.11
Gravel                        72.45    62.92
Trees                         91.60    92.20
(Painted) metal sheets        97.74    94.79
Bare soil                     94.92    89.63
Bitumen                       93.07    81.55
Self-blocking bricks          85.60    88.64
Shadow                        94.37    96.35
Overall accuracy              88.52    69.83

TABLE 15.13
RF-BHC: Training and Test Accuracies in Percentage for ROSIS University Data Set

Class                      Training     Test
Asphalt                       76.82    71.41
Meadows                       84.26    68.17
Gravel                        59.95    51.35
Trees                         88.36    95.91
(Painted) metal sheets       100.00    99.28
Bare soil                     75.38    78.85
Bitumen                       92.53    87.36
Self-blocking bricks          83.07    92.45
Shadow                        96.10    99.50
Overall accuracy              82.53    75.16

TABLE 15.14
RF-CART: Training and Test Accuracies in Percentage for ROSIS University Data Set

Class                      Training     Test
Asphalt                       86.86    80.36
Meadows                       90.93    54.32
Gravel                        76.79    46.61
Trees                         92.37    98.73
(Painted) metal sheets        99.25    99.01
Bare soil                     91.17    77.60
Bitumen                       88.80    78.29
Self-blocking bricks          83.46    90.64
Shadow                        94.37    97.23
Overall accuracy              88.75    69.70


FIGURE 15.6
ROSIS University: image classified by (a) single BHC, (b) RF-BHC, (c) single CART, and (d) RF-CART. A colorbar indicates the colors of information classes 1 through 9.

related classes close to one another in the tree. That would mean that classes 2, 6, 4, and 5 are more related to each other than to classes 3, 9, 1, 7, or 8. On the other hand, it is not clear whether (painted) metal sheets (5) are "naturally" more related to trees (4) than to bare soil (6) or asphalt (1). However, the point is that the partition algorithm finds the "clearest" separation between meta-classes. Therefore, it may be better to view the tree as a separation hierarchy rather than a relation hierarchy. The single BHC classifier finds that class 5 is the most separable class within the first right meta-class of the tree, so it might not be related to the meta-class 2-6-4 in any "natural" way, but it is more separable together with these classes when the whole data set is split into two meta-classes.

FIGURE 15.7
The BHC tree used for classification of Figure 15.5, with left/right probabilities (%) at the internal nodes; the leaves, from left to right, correspond to classes 3, 9, 1, 7, 8, 2, 6, 4, and 5.

15.6 Conclusions

The use of random forests for classification of multi-source remote sensing data and hyperspectral remote sensing data has been discussed. Random forests should be considered attractive for classification of both data types. They are fast both in training and in classification, and they are distribution-free classifiers. Furthermore, the problem of the curse of dimensionality is naturally addressed by the selection of a low m, without having to discard variables and dimensions completely. The only parameter random forests are truly sensitive to is m, the number of variables that the nodes in every tree draw at random during training. This parameter should generally be much smaller than the total number of available variables, although selecting a high m can also yield good classification accuracies, as seen above for the Anderson River data (Table 15.2). In the experiments, two types of random forests were used: random forests based on the CART approach and random forests that use BHC trees. Both approaches performed well in the experiments; they gave excellent accuracies for both data types and were shown to be very fast.

Acknowledgment

This research was supported in part by the Research Fund of the University of Iceland and the Assistantship Fund of the University of Iceland. The Anderson River SAR/MSS data set was acquired, preprocessed, and loaned by the Canada Centre for Remote Sensing, Department of Energy, Mines and Resources, Government of Canada.




16 Supervised Image Classification of Multi-Spectral Images Based on Statistical Machine Learning

Ryuei Nishii and Shinto Eguchi

CONTENTS
16.1 Introduction ............................................................... 346
16.2 AdaBoost .................................................................. 346
     16.2.1 Toy Example in Binary Classification ............................ 347
     16.2.2 AdaBoost for Multi-Class Problems ............................... 348
     16.2.3 Sequential Minimization of Exponential Risk with Multi-Class .... 348
            16.2.3.1 Case 1 ................................................. 349
            16.2.3.2 Case 2 ................................................. 349
     16.2.4 AdaBoost Algorithm .............................................. 350
16.3 LogitBoost and EtaBoost .................................................. 350
     16.3.1 Binary Class Case ............................................... 350
     16.3.2 Multi-Class Case ................................................ 351
16.4 Contextual Image Classification .......................................... 352
     16.4.1 Neighborhoods of Pixels ......................................... 352
     16.4.2 MRFs Based on Divergence ........................................ 353
     16.4.3 Assumptions ..................................................... 353
            16.4.3.1 Assumption 1 (Local Continuity of the Classes) ......... 353
            16.4.3.2 Assumption 2 (Class-Specific Distribution) ............. 353
            16.4.3.3 Assumption 3 (Conditional Independence) ................ 354
            16.4.3.4 Assumption 4 (MRFs) .................................... 354
     16.4.4 Switzer's Smoothing Method ...................................... 354
     16.4.5 ICM Method ...................................................... 354
     16.4.6 Spatial Boosting ................................................ 355
16.5 Relationships between Contextual Classification Methods .................. 356
     16.5.1 Divergence Model and Switzer's Model ............................ 356
     16.5.2 Error Rates ..................................................... 357
     16.5.3 Spatial Boosting and the Smoothing Method ....................... 358
     16.5.4 Spatial Boosting and MRF-Based Methods .......................... 359
16.6 Spatial Parallel Boost by Meta-Learning .................................. 359
16.7 Numerical Experiments .................................................... 360
     16.7.1 Legends of Three Data Sets ...................................... 361
            16.7.1.1 Data Set 1: Synthetic Data Set ......................... 361
            16.7.1.2 Data Set 2: Benchmark Data Set grss_dfc_0006 ........... 361
            16.7.1.3 Data Set 3: Benchmark Data Set grss_dfc_0009 ........... 361
     16.7.2 Potts Models and the Divergence Models .......................... 361
     16.7.3 Spatial AdaBoost and Its Robustness ............................. 363
     16.7.4 Spatial AdaBoost and Spatial LogitBoost ......................... 365
     16.7.5 Spatial Parallel Boost .......................................... 367
16.8 Conclusion ............................................................... 368
Acknowledgment ................................................................ 370
References .................................................................... 370

16.1 Introduction

Image classification for geostatistical data is one of the most important issues in the remote-sensing community. Statistical approaches have been discussed extensively in the literature. In particular, Markov random fields (MRFs) are used for modeling distributions of land-cover classes, and contextual classifiers based on MRFs exhibit good performance. In addition, various other classification methods have been proposed. See Ref. [3] for an excellent review paper on classification. See also Refs. [1,4-7] for a general discussion on classification methods, and Refs. [8,9] for background on spatial statistics. In the paradigm of supervised learning, AdaBoost was proposed as a machine learning technique in Ref. [10] and has been widely and rapidly improved for use in pattern recognition. AdaBoost linearly combines several weak classifiers into a strong classifier; the coefficients of the classifiers are tuned by minimizing an empirical exponential risk. The classification method exhibits high performance in various fields [11,12]. In addition, fusion techniques have been discussed [13-15]. In the present chapter, we consider contextual classification methods based on statistics and machine learning. We review AdaBoost with binary class labels as well as multi-class labels. The procedures for deriving the coefficients of the classifiers are discussed, and the robustness of the loss functions is emphasized. Next, contextual image classification methods, including Switzer's smoothing method [1], MRF-based methods [16], and spatial boosting [2,17], are introduced, and relationships among them are pointed out. Spatial parallel boost by meta-learning for multi-source and multi-temporal data classification is proposed.

The remainder of the chapter is organized as follows. In Section 16.2, AdaBoost is briefly reviewed: a simple example with binary class labels is provided to illustrate AdaBoost, and we then proceed to the case with multi-class labels. Section 16.3 presents general boosting methods that give the classifier a robustness property. Then, contextual classifiers, including Switzer's method, an MRF-based method, and spatial boosting, are discussed. Relationships among them are shown in Section 16.5, where the exact error rate and the properties of the MRF-based classifier are given. Section 16.6 proposes spatial parallel boost, applicable to the classification of multi-source and multi-temporal data sets. The methods treated here are applied to a synthetic data set and two benchmark data sets, and their performances are examined in Section 16.7. Section 16.8 concludes the chapter and mentions future problems.

16.2 AdaBoost

We begin this section with a simple example to illustrate AdaBoost [10]. AdaBoost with multi-class labels is treated later.

16.2.1 Toy Example in Binary Classification

Suppose that a q-dimensional feature vector x ∈ R^q, observed for a supervised example labeled +1 or −1, is available. Furthermore, let f_k(x) be functions (classifiers) of the feature vector x into the label set {+1, −1} for k = 1, 2, 3. If these three classifiers are equally efficient, a new function, sign(f_1(x) + f_2(x) + f_3(x)), is a combined classifier based on a majority vote, where sign(z) denotes the sign of the argument z. Suppose instead that classifier f_1 is the most reliable, f_2 has the next greatest reliability, and f_3 is the least reliable. Then a new function sign(β_1 f_1(x) + β_2 f_2(x) + β_3 f_3(x)) is a boosted classifier based on a weighted vote, where β_1 > β_2 > β_3 are positive constants to be determined according to the efficiencies of the classifiers. The constants β_k are tuned by minimizing the empirical risk, which will be defined shortly.

In general, let y be the true label of feature vector x. Then label y is estimated by the signature, sign(F(x)), of a classification function F(x). Actually, if F(x) > 0, then x is classified into the class with label +1, otherwise into −1. Hence, if yF(x) < 0, vector x is misclassified. For evaluating classifier F, AdaBoost [10] takes the exponential loss function defined by

    L_exp(F | x, y) = exp{−yF(x)}   (16.1)

The loss function L_exp(t) = exp(−t) vs. t = yF(x) is shown in Figure 16.1. Note that the exponential function assigns a heavy loss to an outlying example that is misclassified; AdaBoost is apt to overlearn misclassified examples. Let {(x_i, y_i) ∈ R^q × {+1, −1} | i = 1, 2, ..., n} be a set of training data. The classification function, F, is determined to minimize the empirical risk:

    R_exp(F) = (1/n) Σ_{i=1}^{n} L_exp(F | x_i, y_i) = (1/n) Σ_{i=1}^{n} exp{−y_i F(x_i)}   (16.2)

In the toy example above, F(x) is β_1 f_1(x) + β_2 f_2(x) + β_3 f_3(x), and the coefficients β_1, β_2, β_3 are tuned by minimizing the empirical risk in Equation 16.2. A fast sequential procedure for minimizing the empirical risk is well known [11]. We will provide a new understanding of the procedure in the binary class case as well as in the multi-class case in Section 16.2.3. A typical classifier is a decision stump defined by a function d·sign(x_j − t), where d = ±1, t ∈ R, and x_j denotes the j-th coordinate of the feature vector x. Each decision stump on its own is a poor classifier. Nevertheless, a linearly combined function of many stumps is expected to be a strong classification function.
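The toy example and the empirical risk in Equation 16.2 can be sketched as follows (the particular stumps, weights, and data are illustrative assumptions, not from the chapter):

```python
import numpy as np

def stump(j, t, d):
    """Decision stump f(x) = d * sign(x_j - t) with d = +1 or -1."""
    return lambda X: d * np.sign(X[:, j] - t)

def exp_risk(F_values, y):
    """Empirical exponential risk (16.2): mean of exp{-y_i F(x_i)}."""
    return np.mean(np.exp(-y * F_values))

# three stumps combined with weights beta1 > beta2 > beta3 (toy values)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + 0.3 * X[:, 1] > 0, 1, -1)  # underlying true rule
f1, f2, f3 = stump(0, 0.0, 1), stump(1, 0.0, 1), stump(1, 0.5, -1)
betas = (1.0, 0.5, 0.2)
F = sum(b * f(X) for b, f in zip(betas, (f1, f2, f3)))
pred = np.sign(F)            # the boosted weighted vote
risk = exp_risk(F, y)
```

Here a lower empirical risk corresponds to a more confident and more accurate weighted vote; tuning the β_k to minimize it is exactly the task of the sequential procedure described next.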

FIGURE 16.1
Loss functions (loss vs. yF(x)): the exponential, logit, eta, and 0-1 losses.

16.2.2 AdaBoost for Multi-Class Problems

We will give an extension of the loss and risk functions to cases with multi-class labels. Suppose that there are g possible land-cover classes C_1, ..., C_g, for example, coniferous forest, broad leaf forest, and water area. Let D = {1, ..., n} be a training region with n pixels over some scene. Each pixel i in region D is supposed to belong to one of the g classes. We denote the set of all class labels by G = {1, ..., g}. Let x_i ∈ R^q be a q-dimensional feature vector observed at pixel i, and y_i its true label in the label set G. Note that pixel i in region D is a numbered small area corresponding to the observed unit on the earth.

Let F(x, k) be a classification function of feature vector x ∈ R^q and label k in the set G. We allocate vector x to the class with label ŷ_F ∈ G given by the following maximizer:

    ŷ_F = arg max_{k∈G} F(x, k)   (16.3)

Typical examples of strong classification functions are given by posterior probability functions. Let p(x | k) be the class-specific probability density function of the k-th class, C_k. The posterior probability of the label Y = k, given feature vector x, is defined by

    p(k | x) = p(x | k) / Σ_{ℓ∈G} p(x | ℓ)   with k ∈ G   (16.4)

which gives a strong classification function, where the prior distribution of class C_k is assumed to be uniform. Note that the label estimated by the posteriors p(k | x), or equivalently by the log posteriors log p(k | x), is just the Bayes rule of classification. Note also that p(k | x) is a measure of the confidence of the current classification and is closely related to logistic discriminant functions [18].

Let y ∈ G be the true label of feature vector x and F(x, k) a classification function. Then the loss due to misclassification into class label k is assessed by the following exponential loss function:

    L_exp(F, k | x, y) = exp{F(x, k) − F(x, y)}   for k ≠ y with k ∈ G   (16.5)

This is an extension of the exponential loss (Equation 16.1) of binary classification. The empirical risk is defined by averaging the loss functions over the training data set {(x_i, y_i) ∈ R^q × G | i ∈ D} as

    R_exp(F) = (1/n) Σ_{i∈D} Σ_{k≠y_i} L_exp(F, k | x_i, y_i) = (1/n) Σ_{i∈D} Σ_{k≠y_i} exp{F(x_i, k) − F(x_i, y_i)}   (16.6)
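A direct implementation of the empirical risk in Equation 16.6 (0-based labels are an implementation choice of this sketch):

```python
import numpy as np

def multiclass_exp_risk(F, y):
    """Empirical multi-class exponential risk, Equation 16.6.
    F : (n, g) array with F[i, k] = F(x_i, k)  (labels 0-based here)
    y : (n,) array of true labels."""
    n = F.shape[0]
    Fy = F[np.arange(n), y][:, None]        # F(x_i, y_i)
    losses = np.exp(F - Fy)                 # exp{F(x_i, k) - F(x_i, y_i)}
    losses[np.arange(n), y] = 0.0           # the sum runs over k != y_i
    return losses.sum() / n

# sanity example: a confident scorer puts F(x_i, y_i) = 5, the rest 0
F = np.zeros((4, 3))
y = np.array([0, 1, 2, 1])
F[np.arange(4), y] = 5.0
risk = multiclass_exp_risk(F, y)
```

For the uninformative function F ≡ 0 the risk equals g − 1, and it shrinks toward 0 as F(x_i, y_i) dominates the other labels.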

AdaBoost determines the classification function F to minimize the exponential risk R_exp(F), in which F is a linear combination of base functions.

16.2.3 Sequential Minimization of Exponential Risk with Multi-Class

Let f and F be fixed classification functions. Then, we obtain the optimal coefficient, β*, which gives the minimum value of the empirical risk R_exp(F + βf):

    β* = arg min_{β∈R} R_exp(F + βf)   (16.7)

Applying the procedure in Equation 16.7 sequentially, we combine the classifiers f_1, f_2, ..., f_T as

    F^(0) ≡ 0,  F^(1) = β_1 f_1,  F^(2) = β_1 f_1 + β_2 f_2,  ...,  F^(T) = β_1 f_1 + β_2 f_2 + ... + β_T f_T

where β_t is defined by the formula given in Equation 16.7 with F = F^(t−1) and f = f_t.

16.2.3.1 Case 1

Suppose that the function f(·, k) takes values 0 or 1, and that it takes the value 1 only once. In this case, the coefficient β* in Equation 16.7 is given in closed form as follows. Let ŷ_if be the label of pixel i selected by classifier f, and let D_f be the subset of D classified correctly by f. Define

    V_i(k) = F(x_i, k) − F(x_i, y_i)   and   v_i(k) = f(x_i, k) − f(x_i, y_i)   (16.8)

Then, we obtain

    R_exp(F + βf) = Σ_{i=1}^{n} Σ_{k≠y_i} exp[V_i(k) + β v_i(k)]
                  = e^{−β} Σ_{i∈D_f} Σ_{k≠y_i} exp{V_i(k)} + e^{β} Σ_{j∉D_f} exp{V_j(ŷ_jf)} + Σ_{j∉D_f} Σ_{k≠y_j, ŷ_jf} exp{V_j(k)}
                  ≥ 2 √( Σ_{i∈D_f} Σ_{k≠y_i} exp{V_i(k)} · Σ_{j∉D_f} exp{V_j(ŷ_jf)} ) + Σ_{j∉D_f} Σ_{k≠y_j, ŷ_jf} exp{V_j(k)}   (16.9)

The last inequality is due to the relationship between the arithmetic and the geometric means. The equality holds if and only if β = β*, where

    β* = (1/2) log[ Σ_{i∈D_f} Σ_{k≠y_i} exp{V_i(k)} / Σ_{j∉D_f} exp{V_j(ŷ_jf)} ]   (16.10)

The optimal coefficient, β*, can be expressed as

    β* = (1/2) log[(1 − ε_F(f)) / ε_F(f)]   with
    ε_F(f) = Σ_{j∉D_f} exp{V_j(ŷ_jf)} / [ Σ_{j∉D_f} exp{V_j(ŷ_jf)} + Σ_{i∈D_f} Σ_{k≠y_i} exp{V_i(k)} ]   (16.11)
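The closed-form coefficient (16.10) and its error-rate form (16.11) can be checked numerically; the following sketch (with a made-up four-pixel example and 0-based labels) computes β* both ways:

```python
import numpy as np

def beta_star(V, y, yhat):
    """Closed-form coefficient beta* of (16.10) for a 0-1 valued
    classifier f, together with the generalized error rate of (16.11).
    V    : (n, g) array with V[i, k] = F(x_i, k) - F(x_i, y_i)
    y    : (n,) true labels (0-based here)
    yhat : (n,) labels selected by the new classifier f."""
    n, g = V.shape
    correct = (yhat == y)
    expV = np.exp(V)
    expV_off = expV.copy()
    expV_off[np.arange(n), y] = 0.0            # sum over k != y_i only
    A = expV_off[correct].sum()                # correctly classified part
    wrong = np.where(~correct)[0]
    B = np.exp(V[wrong, yhat[wrong]]).sum()    # misclassified part
    eps = B / (A + B)                          # epsilon_F(f) of (16.11)
    return 0.5 * np.log(A / B), eps

# F identically zero (all V_i(k) = 0); f gets 3 of 4 pixels right
V = np.zeros((4, 3))
y = np.array([0, 1, 2, 0])
yhat = np.array([0, 1, 2, 1])
beta, eps = beta_star(V, y, yhat)
```

With A = 6 and B = 1 in this toy case, both (16.10) and (16.11) give β* = (1/2) log 6.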

In the binary class case, ε_F(f) coincides with the error rate of classifier f.

16.2.3.2 Case 2

If f(·, k) takes real values, there is no closed form for the coefficient β*. We must perform an iterative procedure for the optimization of the risk R_emp. Using a Newton-like method, we update the estimate β^(t) at the t-th step as follows:

    β^(t+1) = β^(t) − [ Σ_{i=1}^{n} Σ_{k≠y_i} v_i(k) exp[V_i(k) + β^(t) v_i(k)] ] / [ Σ_{i=1}^{n} Σ_{k≠y_i} v_i(k)^2 exp[V_i(k) + β^(t) v_i(k)] ]   (16.12)


where v_i(k) and V_i(k) are defined in Equation 16.8. We observe that the convergence of the iterative procedure starting from β^(0) = 0 is very fast; in the numerical examples in Section 16.7, the procedure converges within five steps in most cases.
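Equation 16.12 amounts to Newton steps on the risk R_exp(F + βf) viewed as a function of β. A minimal sketch (flattening the double sum over (i, k) pairs into 1-D arrays is an implementation convenience; the toy values are made up):

```python
import numpy as np

def newton_beta(V, v, beta0=0.0, n_steps=5):
    """Newton-type iteration (16.12) for the coefficient beta when the
    classifier f takes real values. V and v are 1-D arrays holding
    V_i(k) and v_i(k) over all pairs (i, k) with k != y_i."""
    beta = beta0
    for _ in range(n_steps):
        w = np.exp(V + beta * v)
        num = np.sum(v * w)          # gradient of the risk in beta
        den = np.sum(v * v * w)      # curvature
        beta -= num / den
    return beta

# toy check: V = 0 and v in {-1, +1} has the closed-form minimizer
# beta = 0.5 * log(3), since the risk is 3 e^{-beta} + e^{beta}
V = np.zeros(4)
v = np.array([-1.0, -1.0, -1.0, 1.0])
beta = newton_beta(V, v)
```

Starting from β^(0) = 0, the iteration reaches the minimizer to high precision within the five steps quoted in the text.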

16.2.4 AdaBoost Algorithm

Now, we summarize the iterative procedure of AdaBoost for minimizing the empirical exponential risk. Let F = {f : R^q → G} be a set of classification functions, where G = {1, ..., g} is the label set. AdaBoost combines classification functions as follows:

• Find the classification function f in F and the coefficient β that jointly minimize the empirical risk R_exp(βf) defined in Equation 16.6, say f_1 and β_1.

• Consider the empirical risk R_exp(β_1 f_1 + βf) with β_1 f_1 given by the previous step. Then find the classification function f ∈ F and coefficient β that minimize the empirical risk, say f_2 and β_2.

• This procedure is repeated T times, and the final classification function F_T = β_1 f_1 + ... + β_T f_T is obtained.

• A test vector x ∈ R^q is classified into the label maximizing the final function F_T(x, k) with respect to k ∈ G.

Substituting other risk functions for the exponential risk R_exp yields different classification methods. The risk functions R_logit and R_eta will be defined in the next section.
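Specialized to the binary class case with decision stumps and the closed-form coefficient of Equation 16.11, the algorithm above can be sketched as follows (the interval-shaped target and all parameter values are illustrative assumptions):

```python
import numpy as np

def fit_stumps(X, y, T=10):
    """Binary AdaBoost sketch with decision stumps: at each round the
    stump minimizing the weighted error is added with the coefficient
    beta = 0.5 * log((1 - err) / err), cf. Equation 16.11."""
    n, q = X.shape
    F = np.zeros(n)
    model = []
    for _ in range(T):
        w = np.exp(-y * F)
        w = w / w.sum()                       # current example weights
        best = None                           # (weighted error, j, t, d)
        for j in range(q):
            for t in np.unique(X[:, j]):
                for d in (1, -1):
                    pred = d * np.where(X[:, j] > t, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, t, d)
        err, j, t, d = best
        err = min(max(err, 1e-12), 1 - 1e-12)
        beta = 0.5 * np.log((1 - err) / err)
        F = F + beta * d * np.where(X[:, j] > t, 1, -1)
        model.append((beta, j, t, d))
    return model, F

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(120, 2))
y = np.where(np.abs(X[:, 0]) < 0.5, 1, -1)   # an interval-shaped target
model, F = fit_stumps(X, y, T=20)
acc = np.mean(np.sign(F) == y)
```

No single stump can represent the interval target, yet the weighted combination of a few stumps does, illustrating how weak learners become a strong classification function.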

16.3 LogitBoost and EtaBoost

AdaBoost was originally designed to combine weak classifiers into a strong classifier. However, if we combine strong classifiers with AdaBoost, the exponential loss assigns an extreme penalty to misclassified data. It is well known that AdaBoost is not robust, and in the multi-class case this seems even more serious than in the binary class case. This is confirmed by our numerical example in Section 16.7.3. In this section, we consider robust classifiers derived from loss functions that penalize misclassified data less severely than the exponential loss function.

16.3.1 Binary Class Case

Consider binary class problems such that a feature vector x with true label y ∈ {−1, +1} is classified into class label sign(F(x)). We take the logit and the eta loss functions defined by

    L_logit(F | x, y) = log[1 + exp{−yF(x)}]   (16.13)

    L_eta(F | x, y) = (1 − η) log[1 + exp{−yF(x)}] + η{−yF(x)}   for 0 < η < 1   (16.14)

The logit loss function is derived from the log posterior probability of a binomial distribution. The eta loss function, an extension of the logit loss, was proposed by Takenouchi and Eguchi [19].


The three loss functions shown in Figure 16.1 are defined as follows:

    L_exp(t) = exp(−t),   L_logit(t) = log{1 + exp(−2t)} − log 2 + 1,   and   L_eta(t) = (1/2) L_logit(t) + (1/2)(−t)   (16.15)
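The normalized losses in Equation 16.15 are easy to compare numerically; the following sketch evaluates them and confirms the ordering of penalties for a badly misclassified example (t = −3):

```python
import numpy as np

def loss_exp(t):
    """Exponential loss of (16.15)."""
    return np.exp(-t)

def loss_logit(t):
    """Logit loss of (16.15), normalized so that L(0) = 1."""
    return np.log(1 + np.exp(-2 * t)) - np.log(2) + 1

def loss_eta(t):
    """Eta loss of (16.15) (here with eta = 1/2)."""
    return 0.5 * loss_logit(t) + 0.5 * (-t)
```

At t = yF(x) = 0 the exponential and logit losses both equal 1, while for large negative t the exponential loss grows exponentially, the logit loss only linearly, and the eta loss even more slowly, which is the robustness property exploited by EtaBoost.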

We see that the logit and the eta loss functions assign a smaller penalty to misclassified data than the exponential loss function does. In addition, the three loss functions are convex and differentiable with respect to t. The convexity assures the uniqueness of the coefficient minimizing R_emp(F + βf) with respect to β, where R_emp denotes the empirical risk function under consideration, and it makes the sequential minimization of the empirical risk feasible. The corresponding empirical risks are defined as follows:

    R_logit(F) = (1/n) Σ_{i=1}^{n} log[1 + exp{−y_i F(x_i)}]   (16.16)

    R_eta(F) = ((1 − η)/n) Σ_{i=1}^{n} log[1 + exp{−y_i F(x_i)}] + (η/n) Σ_{i=1}^{n} {−y_i F(x_i)}   (16.17)

16.3.2 Multi-Class Case

Let y be the true label of feature vector x, and F(x, k) a classification function. We define the following function in a manner similar to that of the posterior probabilities:

    p_logit(y | x) = exp{F(x, y)} / Σ_{k∈G} exp{F(x, k)}

Using this function, we define the loss functions in the multi-class case as follows:

    L_logit(F | x, y) = −log p_logit(y | x)   and
    L_eta(F | x, y) = {1 − (g − 1)η}{−log p_logit(y | x)} + η Σ_{k≠y} log p_logit(k | x)

where η is a constant with 0 < η < 1/(g − 1). The empirical risks are then defined as the averages of the loss functions over the training data set {(x_i, y_i) ∈ R^q × G | i ∈ D}:

    R_logit(F) = (1/n) Σ_{i=1}^{n} L_logit(F | x_i, y_i)   and   R_eta(F) = (1/n) Σ_{i=1}^{n} L_eta(F | x_i, y_i)   (16.18)

LogitBoost and EtaBoost aim to minimize the logit risk function R_logit(F) and the eta risk function R_eta(F), respectively. These risk functions are expected to be more robust than the exponential risk function; indeed, EtaBoost is more robust than LogitBoost in the presence of mislabeled training examples.
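To make the difference between the penalties concrete, the following sketch evaluates the three pointwise losses of Equation 16.15 on a few margin values t = yF(x); the function names and the fixed choice η = 1/2 (the value used in Equation 16.15) are ours, not the chapter's.

```python
import math

# Pointwise losses as functions of the margin t = y*F(x), following Eq. (16.15).
def loss_exp(t):
    return math.exp(-t)

def loss_logit(t):
    # log{1 + exp(-2t)} - log 2 + 1, normalized so that loss_logit(0) = 1
    return math.log(1.0 + math.exp(-2.0 * t)) - math.log(2.0) + 1.0

def loss_eta(t):
    # eta loss with eta = 1/2, as in Eq. (16.15)
    return 0.5 * loss_logit(t) + 0.5 * (-t)

# For a badly misclassified sample (large negative margin) the exponential
# loss explodes, while the logit and eta losses grow only linearly in |t|.
for t in (-3.0, -1.0, 0.0, 1.0):
    print(f"t={t:+.1f}  exp={loss_exp(t):8.3f}  logit={loss_logit(t):6.3f}  eta={loss_eta(t):6.3f}")
```

At t = −3 the exponential loss is already above 20, while the logit and eta losses stay below 7; this is the gentler penalty for misclassified data referred to above.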

Signal and Image Processing for Remote Sensing

352

16.4 Contextual Image Classification

Ordinary classifiers proposed for independent samples can of course be utilized for image classification. However, contextual classifiers are known to show better performance than noncontextual classifiers. In this section, three contextual classifiers are discussed: the smoothing method of Switzer [1], MRF-based classifiers, and spatial boosting [17].

16.4.1 Neighborhoods of Pixels

In this subsection, we define notations related to observations and two sorts of neighborhoods. Let D = {1, …, n} be an observed area consisting of n pixels. A q-dimensional feature vector and its observation at pixel i are denoted by X_i and x_i, respectively, for i in area D. The class label covering pixel i is denoted by random variable Y_i, where Y_i takes an element of the label set G = {1, …, g}. All feature vectors are expressed in vector form as

$$ X = (X_1^T, \ldots, X_n^T)^T : nq \times 1 \tag{16.19} $$

In addition, we define random label vectors as

$$ Y = (Y_1, \ldots, Y_n)^T : n \times 1 \quad \text{and} \quad Y_{-i} = Y \text{ with } Y_i \text{ deleted} : (n-1) \times 1 \tag{16.20} $$

Recall that class-specific density functions are defined by p(x | k) with x ∈ R^q for deriving the posterior distribution in Equation 16.4. In the numerical study in Section 16.7, the densities are fitted by homoscedastic q-dimensional Gaussian distributions, N_q(m(k), Σ), with common variance–covariance matrix Σ, or by heteroscedastic Gaussian distributions, N_q(m(k), Σ_k), with class-specific variance–covariance matrices Σ_k.
Here, we define neighborhoods to provide contextual information. Let d(i, j) denote the distance between the centers of pixels i and j. Then, we define two kinds of neighborhoods of pixel i as follows:

$$ U_r(i) = \{ j \in D \mid d(i, j) = r \} \quad \text{and} \quad N_r(i) = \{ j \in D \mid 1 \le d(i, j) \le r \} \tag{16.21} $$

where r = 1, √2, 2, …, which denotes the radius of the neighborhood. Note that subset U_r(i) constitutes an isotropic ring region. Subsets U_r(i) with r = 0, 1, √2, 2 are shown in Figure 16.2. Here, we find that U_0(i) = {i}, N_1(i) = U_1(i) is the first-order neighborhood, and N_{√2}(i) = U_1(i) ∪ U_{√2}(i) forms the second-order neighborhood of pixel i. In general, we have N_r(i) = ∪_{1 ≤ r′ ≤ r} U_{r′}(i) for r ≥ 1.

FIGURE 16.2 Isotropic neighborhoods U_r(i) with center pixel i and radius r: (a) r = 0, (b) r = 1, (c) r = √2, (d) r = 2.

16.4.2 MRFs Based on Divergence

Here, we will discuss the spatial distribution of the classes. A pairwise-dependent MRF is an important model for specifying the field. Let D(k, ℓ) > 0 be a divergence between two classes, C_k and C_ℓ (k ≠ ℓ), and put D(k, k) = 0. The divergence is employed for modeling the MRF. In the Potts model, D(k, ℓ) is defined by D_0(k, ℓ) = 1 if k ≠ ℓ, and 0 otherwise. Nishii [18] proposed to take the squared Mahalanobis distance between homoscedastic Gaussian distributions N_q(m(k), Σ), defined by

$$ D_1(m(k), m(\ell)) = \{m(k) - m(\ell)\}^T \Sigma^{-1} \{m(k) - m(\ell)\} \tag{16.22} $$

Nishii and Eguchi (2004) proposed to take the Jeffreys divergence ∫ {p(x | k) − p(x | ℓ)} log{p(x | k)/p(x | ℓ)} dx between the densities p(x | k). These models are called divergence models.
Let D̄_i(k) be the average of the divergences in the neighborhood N_r(i) defined by Equation 16.21:

$$ \bar{D}_i(k) = \begin{cases} \frac{1}{|N_r(i)|} \sum_{j \in N_r(i)} D(k, y_j), & \text{if } |N_r(i)| \ge 1 \\ 0, & \text{otherwise} \end{cases} \quad \text{for } (i, k) \in D \times G \tag{16.23} $$

where |S| denotes the cardinality of set S. Then, random variable Y_i conditional on all the other labels Y_{-i} = y_{-i} is assumed to follow a multinomial distribution with the probabilities

$$ \Pr\{Y_i = k \mid Y_{-i} = y_{-i}\} = \frac{\exp\{-\beta \bar{D}_i(k)\}}{\sum_{\ell \in G} \exp\{-\beta \bar{D}_i(\ell)\}} \quad \text{for } k \in G \tag{16.24} $$

Here, β is a non-negative constant called the clustering parameter, or the granularity of the classes, and D̄_i(k) is defined by the formula given in Equation 16.23. Parameter β characterizes the degree of the spatial dependency of the MRF; if β = 0, the classes are spatially independent. The radius r of neighborhood U_r(i) denotes the extent of spatial dependency. Of course, β, as well as r, are parameters that need to be estimated.
Due to the Hammersley–Clifford theorem, the conditional distribution in Equation 16.24 is known to specify the distribution of the test label vector, Y, under mild conditions. The joint distribution of the test labels, however, cannot be obtained in closed form. This causes difficulty in estimating the parameters specifying the MRF. Geman and Geman [6] developed a method for the estimation of test labels by simulated annealing, but the procedure is time consuming. Besag [4] proposed the iterated conditional modes (ICM) method, which is reviewed in Section 16.4.5.
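A minimal sketch of the conditional label distribution in Equations 16.23 and 16.24, using the Potts divergence and assuming the sign convention that a larger average divergence lowers the probability; the function names and toy values are hypothetical.

```python
import math

def conditional_probs(neighbor_labels, labels, beta, D):
    """Eqs. (16.23)-(16.24): average divergence per class, then a softmax in -beta."""
    n = len(neighbor_labels)
    weights = {}
    for k in labels:
        d_avg = sum(D(k, y) for y in neighbor_labels) / n if n >= 1 else 0.0
        weights[k] = math.exp(-beta * d_avg)
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items()}

potts = lambda k, l: 0.0 if k == l else 1.0   # Potts-model divergence D0

# First-order neighborhood with three neighbors labeled 1 and one labeled 2:
p = conditional_probs([1, 1, 1, 2], labels=(1, 2), beta=2.0, D=potts)
print(p[1] > p[2])   # True: the majority label gets the larger probability
```

With β = 0 the two probabilities are equal (spatial independence); increasing β sharpens the preference for the majority label.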

16.4.3 Assumptions

Now, we make the following assumptions for deriving classifiers.

16.4.3.1 Assumption 1 (Local Continuity of the Classes)
If the class label of a pixel is k ∈ G, then the pixels in its neighborhood have the same class label k. Furthermore, this is true for any pixel.

16.4.3.2 Assumption 2 (Class-Specific Distribution)
A feature vector of a sample from class C_k follows a class-specific probability density function p(x | k) for label k in G.

16.4.3.3 Assumption 3 (Conditional Independence)
The conditional distribution of vector X in Equation 16.19 given label vector Y = y in Equation 16.20 is given by ∏_{i∈D} p(x_i | y_i).

16.4.3.4 Assumption 4 (MRFs)
Label vector Y defined by Equation 16.20 follows an MRF specified by a divergence (quasi-distance) between the classes.

16.4.4 Switzer's Smoothing Method

Switzer [1] derived a contextual classification method (the smoothing method) under Assumptions 1–3 with homoscedastic Gaussian distributions N_q(m(k), Σ). Let

$$ c(x \mid k) = (2\pi)^{-q/2} |\Sigma|^{-1/2} \exp\{-D_1(x, m(k))/2\} $$

be the corresponding probability density function, where D_1(·, ·) stands for the squared Mahalanobis distance in Equation 16.22. Assume that Assumption 1 holds for neighborhoods N_r(·). Then, he proposed to estimate label y_i of pixel i by maximizing the joint probability density

$$ c(x_i \mid k) \prod_{j \in N_r(i)} c(x_j \mid k) $$

with respect to label k ∈ G. The maximization problem is equivalent to minimizing the quantity

$$ D_1(x_i, m(k)) + \sum_{j \in N_r(i)} D_1(x_j, m(k)) \tag{16.25} $$

Obviously, Assumption 1 does not hold for the whole image. However, the method still exhibits good performance, and the classification is performed very quickly. Thus, the method is a pioneering work of contextual image classification.
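The minimization in Equation 16.25 can be sketched in a few lines. For simplicity the common covariance is taken as the identity (so the squared Mahalanobis distance D_1 reduces to the squared Euclidean distance); the means and feature values below are hypothetical toy data.

```python
def d1(x, m):
    # squared Mahalanobis distance with identity covariance (Eq. 16.22)
    return sum((a - b) ** 2 for a, b in zip(x, m))

def switzer_label(x_center, x_neighbors, means):
    # Eq. (16.25): minimize the summed distances over {i} and N_r(i)
    scores = [d1(x_center, m) + sum(d1(xj, m) for xj in x_neighbors)
              for m in means]
    return scores.index(min(scores))

means = [(0.0, 0.0), (2.0, 0.0)]                 # class means m(0), m(1)
x_i = (1.2, 0.0)                                 # alone, slightly closer to class 1
neighbors = [(0.1, 0.0), (-0.2, 0.1), (0.3, -0.1)]
print(switzer_label(x_i, [], means))             # 1: noncontextual rule
print(switzer_label(x_i, neighbors, means))      # 0: neighbors smooth the label
```

The neighbors, all lying near m(0), pull the isolated pixel back to class 0, which is exactly the smoothing behavior described above.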

16.4.5 ICM Method

Under Assumptions 2–4 with the conditional distribution in Equation 16.24, the posterior probability of Y_i = k given feature vector X = x and label vector Y_{-i} = y_{-i} is expressed by

$$ p_i(k \mid r, \beta) \equiv \Pr\{Y_i = k \mid X = x, Y_{-i} = y_{-i}\} = \frac{\exp\{-\beta \bar{D}_i(k)\}\, p(x_i \mid k)}{\sum_{\ell \in G} \exp\{-\beta \bar{D}_i(\ell)\}\, p(x_i \mid \ell)} \tag{16.26} $$

Then, the posterior probability Pr{Y = y | X = x} of label vector y is approximated by the pseudo-likelihood

$$ PL(y \mid r, \beta) = \prod_{i=1}^{n} p_i(y_i \mid r, \beta) \tag{16.27} $$

where posterior probability p_i(y_i | r, β) is defined by Equation 16.26. The pseudo-likelihood in Equation 16.27 is used for accuracy assessment of the classification as well as for parameter estimation. Here, the class-specific densities p(x | k) are estimated using the training data.
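One ICM sweep over a one-dimensional strip of pixels can be sketched as follows (a toy setup with two classes, the Potts divergence, scalar Gaussian densities, and hypothetical parameter values); each pixel is relabeled to maximize the numerator of Equation 16.26.

```python
import math

def icm_sweep(labels, feats, means, beta, sigma=1.0):
    """One in-place ICM sweep on a 1-D strip with first-order neighbors."""
    g = len(means)
    for i in range(len(labels)):
        nbrs = [labels[j] for j in (i - 1, i + 1) if 0 <= j < len(labels)]
        best_k, best_score = labels[i], -math.inf
        for k in range(g):
            d_avg = sum(1.0 for y in nbrs if y != k) / len(nbrs)  # Potts average
            log_density = -((feats[i] - means[k]) ** 2) / (2.0 * sigma ** 2)
            score = -beta * d_avg + log_density   # log numerator of Eq. (16.26)
            if score > best_score:
                best_k, best_score = k, score
        labels[i] = best_k
    return labels

feats = [0.1, -0.2, 0.3, 2.2, 0.2]   # one outlying feature at index 3
labels = [0, 0, 0, 1, 0]             # noncontextual start labels pixel 3 as class 1
print(icm_sweep(labels, feats, means=[0.0, 2.0], beta=3.0))  # → [0, 0, 0, 0, 0]
```

With a large enough clustering parameter β, the spatially isolated label at index 3 is smoothed away; with β = 0 the sweep simply reproduces the noncontextual labels.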


When radius r and clustering parameter β are given, the optimal label vector y, which maximizes the pseudo-likelihood PL(y | r, β) defined by Equation 16.27, is usually estimated by the ICM procedure [4]. At the (t+1)-st step, ICM finds the optimal label y_i(t+1) given the current labels y_{-i}(t) for all test pixels i ∈ D. This procedure is repeated until the label vector converges, say to y = y(r, β) : n × 1. Furthermore, we must optimize the pair of parameters (r, β) by maximizing pseudo-likelihood PL(y(r, β) | r, β).

16.4.6 Spatial Boosting

As shown in Section 16.2, AdaBoost combines classification functions defined over the feature space; of course, such classifiers give noncontextual classification. We extend AdaBoost to build contextual classification functions, which we call spatial AdaBoost. Define the averaged logarithm of the posterior probabilities (Equation 16.4) in neighborhood U_r(i) (Equation 16.21) by

$$ f_r(x, k \mid i) = \begin{cases} \frac{1}{|U_r(i)|} \sum_{j \in U_r(i)} \log p(k \mid x_j), & \text{if } |U_r(i)| \ge 1 \\ 0, & \text{otherwise} \end{cases} \quad \text{for } r = 0, 1, \sqrt{2}, \ldots \tag{16.28} $$

where x = (x_1^T, …, x_n^T)^T : nq × 1. The averaged log posterior f_0(x, k | i) with radius r = 0 is equal to the log posterior log p(k | x_i) itself. Hence, classification by function f_0(x, k | i) is equivalent to noncontextual classification based on the maximum a posteriori (MAP) criterion. If the spatial dependency among the classes is not negligible, then the averaged log posteriors f_1(x, k | i) in the first-order neighborhood may have information for classification. If the spatial dependency becomes stronger, then f_r(x, k | i) with a larger r is also useful. Thus, we adopt the averages of the log posteriors f_r(x, k | i) as classification functions of center pixel i. The efficiency of the averaged log posteriors as classification functions would be intuitively arranged in the following order:

$$ f_0(x, k \mid i), \; f_1(x, k \mid i), \; f_{\sqrt{2}}(x, k \mid i), \; f_2(x, k \mid i), \ldots, \quad \text{where } x = (x_1^T, \ldots, x_n^T)^T \tag{16.29} $$

The coefficients for the above classification functions can be tuned by minimizing the empirical risk given by Equation 16.6 or Equation 16.18. See Ref. [2] for possible candidates for contextual classification functions. The following is the contextual classification procedure based on the spatial boosting method.

• Fix an empirical risk function, R_emp(F), of classification function F evaluated over the training data set {(x_i, y_i) ∈ R^q × G | i ∈ D}. Let f_0(x, k | i), f_1(x, k | i), f_{√2}(x, k | i), …, f_r(x, k | i) be the classification functions defined by Equation 16.28.
• Find the coefficient b that minimizes empirical risk R_emp(bf_0). Put the optimal value to b_0.
• If coefficient b_0 is negative, quit the procedure. Otherwise, consider empirical risk R_emp(b_0f_0 + bf_1) with b_0f_0 obtained in the previous step. Then, find the coefficient b that minimizes the empirical risk. Put the optimal value to b_1.
• If b_1 is negative, quit the procedure. Otherwise, consider empirical risk R_emp(b_0f_0 + b_1f_1 + bf_{√2}). This procedure is repeated, and we obtain a sequence of positive coefficients b_0, b_1, …, b_r for the classification functions.


Finally, the classification function is derived by

$$ F_r(x, k \mid i) = b_0 f_0(x, k \mid i) + b_1 f_1(x, k \mid i) + \cdots + b_r f_r(x, k \mid i), \quad x = (x_1^T, \ldots, x_n^T)^T \tag{16.30} $$

Test label y* of test vector x* ∈ R^q is estimated by maximizing the classification function in Equation 16.30 with respect to label k ∈ G. Note that the pixel is classified using the feature vector at the pixel as well as the feature vectors in neighborhood N_r(i) of the test area only. There is no need to estimate the labels of neighbors, whereas the ICM method requires estimated neighbor labels and needs an iterative procedure for the classification. Hence, we claim that spatial boosting provides a very fast classifier.
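The sequential tuning of the coefficients b_0, b_1, … can be sketched as follows. The multi-class exponential risk below follows the chapter's exponential-loss idea in spirit (losses exp{−(F(x, y) − F(x, k))} summed over wrong labels k); the crude grid search, function names, and toy posteriors are our own illustrative choices.

```python
import math

def exp_risk(F, y):
    # multi-class exponential risk: average over samples of the summed
    # losses exp{-(F_i[true] - F_i[k])} for wrong labels k
    n, g = len(F), len(F[0])
    return sum(math.exp(-(F[i][y[i]] - F[i][k]))
               for i in range(n) for k in range(g) if k != y[i]) / n

def tune_coefficients(fs, y, grid=None):
    """Greedily fit b_r for each classification function f_r in turn."""
    grid = grid or [b / 100.0 for b in range(-200, 201)]
    n, g = len(fs[0]), len(fs[0][0])
    F = [[0.0] * g for _ in range(n)]
    coefs = []
    for f in fs:                       # f[i][k] plays the role of f_r(x, k | i)
        best_b = min(grid, key=lambda b: exp_risk(
            [[F[i][k] + b * f[i][k] for k in range(g)] for i in range(n)], y))
        if best_b < 0:
            break                      # negative coefficient: stop combining
        coefs.append(best_b)
        F = [[F[i][k] + best_b * f[i][k] for k in range(g)] for i in range(n)]
    return coefs, F

# Toy data: two samples, two classes; f0[i][k] = log posterior log p(k | x_i)
f0 = [[math.log(0.8), math.log(0.2)], [math.log(0.3), math.log(0.7)]]
y = [0, 1]
coefs, F = tune_coefficients([f0], y)
print(coefs[0] > 0)   # True: the noncontextual posterior gets a positive weight
```

Once the coefficients are fixed, a test pixel is labeled by a single argmax over F, with no iteration, which is the speed advantage claimed above.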

16.5 Relationships between Contextual Classification Methods

The contextual classifiers discussed in this chapter can be regarded as extensions of Switzer's method from a unified viewpoint; cf. [16] and [2].

16.5.1 Divergence Model and Switzer's Model

Let us consider the divergence model in Gaussian MRFs (GMRFs), where feature vectors follow homoscedastic Gaussian distributions N_q(m(k), Σ). The divergence model can be viewed as a natural extension of Switzer's model. The image with center pixel 1 and its neighbors is shown in Figure 16.3. The first-order and second-order neighborhoods of the center pixel are given by the sets of pixel numbers N_1(1) = {2, 4, 6, 8} and N_{√2}(1) = {2, 3, …, 9}, respectively. We focus our attention on center pixel 1 and its neighborhood N_r(1) of size 2K in general, and discuss the classification of center pixel 1 when the labels y_j of the 2K neighbors are observed.
Let β̂ be a non-negative estimate of the clustering parameter β. Then, label y_1 of center pixel 1 is estimated by the ICM algorithm. In this case, the estimate is derived by maximizing the conditional probability (Equation 16.26) with p(x | k) = c(x | k). This is equivalent to finding the label Ŷ_Div defined by

$$ \hat{Y}_{\text{Div}} = \arg\min_{k \in G} \Big\{ D_1(x_1, m(k)) + \frac{\hat{\beta}}{K} \sum_{j \in N_r(1)} D_1(m(y_j), m(k)) \Big\}, \quad |N_r(1)| = 2K \tag{16.31} $$

where D_1(s, t) is the squared Mahalanobis distance (Equation 16.22). Switzer's method [1] classifies the center pixel by minimizing the formula given in Equation 16.25 with respect to label k. Here, the method can be slightly extended by changing the coefficient of Σ_{j∈N_r(1)} D_1(x_j, m(k)) from 1 to β̂/K. Thus, we define the estimate due to Switzer's method as follows:

FIGURE 16.3 Pixel numbers (left) and pixel labels (right).

$$ \hat{Y}_{\text{Switzer}} = \arg\min_{k \in G} \Big\{ D_1(x_1, m(k)) + \frac{\hat{\beta}}{K} \sum_{j \in N_r(1)} D_1(x_j, m(k)) \Big\} \tag{16.32} $$

The estimates in Equation 16.31 and Equation 16.32 of label y_1 are given by the same formula except for m(y_j) and x_j in the respective last terms. Note that x_j itself is a primitive estimate of m(y_j). Hence, the classification method based on the divergence model is an extension of Switzer's method.

16.5.2 Error Rates

We derive the exact error rates due to the divergence model and Switzer's model in the previous local region with two classes, G = {1, 2}. In the two-class case, the only positive quasi-distance is D(1, 2). Substituting βD(1, 2) for β, we note that the MRF based on the Jeffreys divergence reduces to the Potts model. Let d be the Mahalanobis distance between the distributions N_q(m(k), Σ) for k = 1, 2, and let N_r(1) be a neighborhood consisting of 2K neighbors of center pixel 1, where K is a fixed natural number. Furthermore, suppose that the number of neighbors with label 1 or 2 is randomly changing. Our aim is to derive the error rate of pixel 1 given features x_1, x_j, and labels y_j of neighbors j in N_r(1). Recall that Ŷ_Div is the estimated label of y_1 obtained by the formula in Equation 16.31. Then, the exact error rate, Pr{Ŷ_Div ≠ Y_1}, is given by

$$ e(\hat{\beta}; \beta, d) = p_0 \Phi(-d/2) + \sum_{k=1}^{K} p_k \left\{ \frac{\Phi(-d/2 - k\hat{\beta}d/g)}{1 + e^{-k\beta d^2/g}} + \frac{\Phi(-d/2 + k\hat{\beta}d/g)}{1 + e^{k\beta d^2/g}} \right\} \tag{16.33} $$

^ is an where F(x) is the cumulative standard Gaussian distribution function, and b estimate of clustering parameter b. Here, pk gives a prior probability such that the number of neighbors with label 1, W1, is equal to K þ k or K k in the neighborhood Nr(1) for k ¼ 0, 1, . . . , K. In Figure 16.3, first-order neighborhood N1(1) is given by {2, 4, 6, 8} with (W1, K) ¼ (2, 2), and second-order neighborhood Npﬃﬃ2 (1) is given by {2, 3, . . . ,9} with (W1, K) ¼ (3, 4); see Ref. [16]. If prior probability p0 is equal to one, K pixels in neighborhood Nr(1) are labeled 1 and the remaining K pixels are labeled 2 with probability one. In this case, the majority vote of the neighbors does not work. Hence, we assume that p0 is less than one. Then, we have ^ ; b, d); see Ref. [16]. the following properties of the error rate, e(b .

P1. e(0; β, d) = Φ(−d/2), and lim_{β̂→∞} e(β̂; β, d) = p_0 Φ(−d/2) + Σ_{k=1}^K p_k / (1 + e^{kβd²/g}).
P2. As a function of β̂, e(β̂; β, d) takes its minimum at β̂ = β (Bayes rule), and the minimum e(β; β, d) is a monotonically decreasing function of the Mahalanobis distance, d, for any fixed positive clustering parameter β.
P3. The function e(β̂; β, d) is a monotonically decreasing function of β for any fixed positive constants β̂ and d.
P4. We have the inequality e(β̂; β, d) < Φ(−d/2) for any positive β̂ if the inequality β ≥ (1/d²) log [{1 − Φ(−d/2)}/Φ(−d/2)] holds.

Note that the value e(0; β, d) = Φ(−d/2) is simply the error rate due to Fisher's linear discriminant function (LDF) with uniform prior probabilities on the labels. The asymptotic value p_0Φ(−d/2) + Σ_{k=1}^K p_k/(1 + e^{kβd²/g}) given in P1 is the error rate due to the vote-for-majority rule when the number of neighbors, W_1, is not equal to K. Property P2 recommends that we use the true parameter β if it is known, which is quite natural. P3 means that the classification becomes more efficient when d or β becomes large; note that d is a distance in the feature space and β is a distance in the image. P4 implies that the use of spatial information always improves the performance of noncontextual discrimination even if estimate β̂ is far from the true value β.
The error rate due to Switzer's method is obtained in the same form as Equation 16.33 by replacing d with d* = d/√(1 + 4β̂²/g). In Figure 16.4a, the parameters do not meet the condition β ≥ log[{1 − Φ(−d/2)}/Φ(−d/2)]/d² of P4; actually, the error rate exceeds the y-intercept Φ(−d/2) for large β̂. On the other hand, the parameters of Figure 16.4b meet the condition because β = 1 and d is the same. Hence, the error rate is always less than Φ(−d/2) for any β̂, as shown in Figure 16.4b.

16.5.3 Spatial Boosting and the Smoothing Method

We consider the proposed classification function in Equation 16.30. When the radius is equal to zero (r = 0), the classification function in Equation 16.28 gives

$$ F_0(x, k \mid i) = b_0 \log p(k \mid x_i) = b_0 \log p(x_i \mid k) + C, \quad x = (x_1^T, \ldots, x_n^T)^T : nq \times 1 \tag{16.34} $$

where p(k | x_i) is the posterior probability given in Equation 16.4, and C denotes a constant independent of label k. Constant b_0 is positive, so the maximization of function F_0(x, k | i) with respect to label k is equivalent to that of the class-specific density p(x_i | k). Hence, classification function F_0 yields the usual noncontextual Bayes classifier.

FIGURE 16.4 Error rates due to the MRF and Switzer's method (x-axis: β̂): (a) d = 1.5, β = 0.5; (b) d = 1.5, β = 1.0.

When the radius is equal to one (r = 1), the classification function in Equation 16.30 is given by

$$ F_1(x, k \mid i) = b_0 \log p(k \mid x_i) + \frac{b_1}{|U_1(i)|} \sum_{j \in U_1(i)} \log p(k \mid x_j) = b_0 \log p(x_i \mid k) + \frac{b_1}{|U_1(i)|} \sum_{j \in U_1(i)} \log p(x_j \mid k) + C' \tag{16.35} $$

where C′ denotes a constant independent of k, and set U_1(i), defined by Equation 16.21, is the first-order neighborhood of pixel i. Hence, the label of pixel i is estimated by a convex combination of the log-likelihood of the feature vector x_i and those of the x_j with pixels j in neighborhood U_1(i). In other words, the label is estimated by a majority vote based on a weighted sum of the log-likelihoods. Thus, the proposed method is an extension of the smoothing method proposed by Switzer [1] for multivariate Gaussian distributions with an isotropic correlation function; however, the derivation is completely different.

16.5.4 Spatial Boosting and MRF-Based Methods

Usually, the MRF-based method uses the training data to estimate the parameters that specify the class-specific densities p(x | k). Clustering parameter β appearing in the conditional probability in Equation 16.24 and the test labels are jointly estimated: the ICM method maximizes the conditional probability in Equation 16.26 sequentially for a given clustering parameter β, and β is in turn chosen to maximize the pseudo-likelihood in Equation 16.27 calculated over the test data. This procedure requires a great deal of CPU time.
On the other hand, spatial boosting uses the training data to estimate the coefficients b_r and the parameters of the densities p(x | k). The estimation cost for the parameters of the densities is the same as that of the MRF-based classifier. Although estimation of the coefficients requires an iterative procedure (Equation 16.12), the number of iterations is small. After estimation of the coefficients and the parameters, the test labels can be estimated without any iterative procedure. Thus, the required computational resources, such as CPU time, should differ significantly between the two methods, which needs further scrutiny. Even when the training data and the test data are numerous, spatial AdaBoost classifies the test data very quickly.

16.6 Spatial Parallel Boost by Meta-Learning

Spatial boosting is now extended for the classification of spatio-temporal data. Suppose we have S sets of training data over regions D(s) ⊂ D for s = 1, 2, …, S. Let {(x_i(s), y_i(s)) ∈ R^q × G | i ∈ D(s)} be a set of training data with n(s) = |D(s)| samples. Parameters specifying the class-specific densities are estimated from each training data set. Then, we estimate the posterior distribution in Equation 16.4 according to each training data set, and thus derive the averaged log posteriors (Equation 16.28). We assume that an appropriate calibration of the feature vectors x_i(s) across the training data sets has been performed as preprocessing.


Fix loss function L, and let F(s) be the set of classification functions of the s-th training data set. Now, we propose spatial boosting based on S training data sets (parallel boost). Let F be a classification function applicable to all training data sets. Then, we propose the empirical risk function of F:

$$ R_{\text{emp}}(F) = \sum_{s=1}^{S} R_{\text{emp}}^{(s)}(F) \tag{16.36} $$

where R_emp^(s)(F) denotes the empirical risk given in Equation 16.6 or Equation 16.18 due to loss function L evaluated over the s-th training data set. The following steps describe parallel boost:

• Let {f_0(s), f_1(s), …, f_r(s)} be the set of averages of the log posteriors (Equation 16.28) evaluated through the s-th training data set for s = 1, …, S.
• Stage for radius 0:
(a) Obtain the optimal coefficient and training data set defined by (a_01, b_01) = arg min_{(s, b) ∈ {1, …, S} × R} R_emp(b f_0(s)).
(b) If coefficient b_01 is negative, quit the procedure. Otherwise, find the minimizer of empirical risk R_emp(b_01 f_0(a_01) + b f_0(s)). Let s = a_02 and b = b_02 ∈ R be the solution.
(c) If coefficient b_02 is negative, quit the procedure. Repeat this at most S times and define the convex combination F_0 ≡ b_01 f_0(a_01) + ⋯ + b_0m0 f_0(a_0m0) of the classification functions, where m_0 denotes the number of replications of the substeps.
• Stage for radius 1 (first-order neighborhood):
(a) Let F_0 be the function obtained in the previous stage. Then, find the classification function f ∈ F1 and coefficient b ∈ R that minimize the empirical risk R_emp(F_0 + b f_1(s)); say s = a_11 and b = b_11.
(b) If coefficient b_11 is negative, quit the procedure. Otherwise, minimize the empirical risk R_emp(F_0 + b_11 f_1(a_11) + b f_1(s)) given F_0 and b_11 f_1(a_11). This is repeated, and we define the convex combination F_1 ≡ b_11 f_1(a_11) + ⋯ + b_1m1 f_1(a_1m1). The function F_0 + F_1 constitutes a contextual classifier.
• The stage is repeated up to radius r, and we obtain the classification function F ≡ F_0 + F_1 + ⋯ + F_r.

Note that spatial parallel boost is based on the assumption that training data sets and test data are not significantly different, especially in the spatial distribution of the classes.
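A single substep of the stage for radius 0 can be sketched as follows. For brevity the classifiers are represented only by their binary margins on each training set, and the summed risk of Equation 16.36 is taken as the sum of per-set average exponential losses; all names and numbers are hypothetical toy values.

```python
import math

def risk_sum(margin_sets, b):
    # Eq. (16.36): sum over training sets of the average exponential loss
    return sum(sum(math.exp(-b * m) for m in ms) / len(ms)
               for ms in margin_sets)

def best_pair(candidate_margins, grid):
    # candidate_margins[s]: margins of candidate f0^(s) evaluated on ALL sets
    best = min(((s, b) for s in range(len(candidate_margins)) for b in grid),
               key=lambda sb: risk_sum(candidate_margins[sb[0]], sb[1]))
    return best if best[1] > 0 else None   # negative coefficient: stop

cands = [
    [[0.5, -0.2], [0.4, 0.1]],   # margins of f0^(0) on set 0 and set 1
    [[0.9, 0.8], [0.7, 0.6]],    # f0^(1) separates both sets better
]
grid = [b / 10.0 for b in range(-10, 11)]
print(best_pair(cands, grid))    # → (1, 1.0): candidate 1 with positive weight
```

The key design point is that a candidate trained on one region is scored on every region, so the selected function must transfer across the S data sets, which is exactly the assumption stated above.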

16.7 Numerical Experiments

Various classification methods discussed here will be examined by applying them to three data sets. Data set 1 is a synthetic data set, and Data sets 2 and 3 are real data sets offered in Ref. [20] as benchmarks. Legends of the data sets are given in the following


section; then, the classification methods are applied to the data sets for evaluating the efficiency of the methods.

16.7.1 Legends of Three Data Sets

16.7.1.1 Data Set 1: Synthetic Data Set

Data set 1 is constituted by multi-spectral images generated over the image shown in Figure 16.5a. There are three classes (g = 3), and the labels correspond to black, white, and gray. The sample sizes from these classes are 3330, 1371, and 3580 (n = 8281), respectively. We simulate four-dimensional spectral images (q = 4) at each pixel of the true image of Figure 16.5a following independent multivariate Gaussian distributions with mean vectors m_1 = (0, 0, 0, 0)^T, m_2 = (1, 1, 0, 0)^T/√2, and m_3 = (1.0498, −0.6379, 0, 0)^T and with common variance–covariance matrix σ²I, where I denotes the identity matrix. The mean vectors were chosen to maximize the pseudo-likelihood (Equation 16.27) of the training data based on the squared Mahalanobis distance used as the divergence in Equation 16.22. Test data of size 8281 are similarly generated over the same image. Gaussian density functions with the common variance–covariance matrix are used to derive the posterior probabilities.

16.7.1.2 Data Set 2: Benchmark Data Set grss_dfc_0006

Data set grss_dfc_0006 is a benchmark data set for supervised image classification. The data set can be accessed from the IEEE GRS-S Data Fusion reference database [20]; it consists of samples acquired by ATM and SAR (six and nine bands, respectively; q = 15) observed at Feltwell, UK. There are five agricultural classes: sugar beets, stubble, bare soil, potatoes, and carrots (g = 5). The sample sizes of the training data and test data are 5072 = 1436 + 1070 + 341 + 1411 + 814 and 5760 = 2017 + 1355 + 548 + 877 + 963, respectively, in a rectangular region of size 350 × 250. The data set was analyzed in Ref. [21] through various methods based on statistics and neural networks. In this case, the labels of the test and training data are observed sparsely.

16.7.1.3 Data Set 3: Benchmark Data Set grss_dfc_0009

Data set grss_dfc_0009 is also taken from the IEEE GRS-S Data Fusion reference database; it consists of examples acquired by Landsat 7 TM+ (q = 7) with 11 agricultural classes (g = 11) in Flevoland, Netherlands. The training data and test data sizes are 2891 and 2890, respectively, in a square region of size 512 × 512. We note that the labels of the test and training data of Data sets 2 and 3 are observed sparsely in the region.

16.7.2 Potts Models and the Divergence Models

We compare the performance of the Potts and the divergence models applied to Data sets 1 and 2 (see Table 16.1 and Table 16.2). Gaussian distributions with a common variance–covariance matrix (homoscedastic), and Gaussian distributions with class-specific matrices (heteroscedastic) are employed for class-specific distributions. The former corresponds to the LDF and the latter to the quadratic discriminant function (QDF). Test errors due to GMRF-based classifiers with several radii r are listed in the tables. Numerals in boldface denote the optimal values in the respective columns. We see that the divergence models give the minimum values in Table 16.1 and Table 16.2.

FIGURE 16.5 (a) True labels of data set 1; (b) labels estimated by the LDF; (c)–(f) labels estimated by spatial AdaBoost with b_0f_0 + b_1f_1 + ⋯ + b_rf_r for r = 1, √2, √8, √13.

In the homoscedastic cases of Table 16.1 and Table 16.2, the divergence model gives a performance similar to that of the Potts model. In the heteroscedastic case, the divergence model is superior to the Potts model to a small degree. Classified images with σ² = 1 are given in Figure 16.5b through Figure 16.5f. Figure 16.5b is an image classified by the LDF, which exhibits very poor performance. As radius r becomes larger, the images classified by

TABLE 16.1
Test Error Rates (%) Due to Homoscedastic Gaussian MRFs with Variance–Covariance Matrix σ²I with σ² = 1, 2, 3, 4 for Data Set 1

              σ² = 1            σ² = 2            σ² = 3            σ² = 4
Radius r²   Potts  Jeffreys   Potts  Jeffreys   Potts  Jeffreys   Potts  Jeffreys
0           41.66  41.66      41.57  41.57      52.63  52.63      54.45  54.45
1            9.35   9.25       9.97   9.68      25.60  27.41      30.76  31.07
2            3.90   3.61       3.90   4.46      13.26  12.23      17.49  15.58
4            3.47   2.48       3.02   3.12       9.82   9.20      11.83  14.70
5            3.19   2.72       3.32   2.96       8.03   7.75      10.72   9.12
8            3.39   3.06       3.35   3.37       7.76   8.57      10.14  10.61
9            3.65   3.30       3.51   4.06       8.28   8.20      10.58  11.00
10           3.89   3.79       3.86   4.09       8.88  10.08      10.57  12.66
13           4.46   4.13       4.36   4.66      10.12  10.20      13.25  14.06
16           4.49   4.17       5.00   4.99      11.16  11.68      14.25  14.95
17           4.96   4.67       5.30   5.54      12.04  11.65      15.34  15.76
18           5.00   4.94       5.92   5.83      12.64  12.67      16.07  15.89
20           5.68   6.11       6.50   6.56      14.89  13.55      17.18  17.00


TABLE 16.2
Test Error Rates (%) Due to Homoscedastic and Heteroscedastic Gaussian MRFs with Two Sorts of Divergences for Data Set 2

            Homoscedastic      Heteroscedastic
Radius r²   Potts  Jeffreys    Potts  Jeffreys
0            8.61   8.61       14.67  14.67
4            6.44   6.09       12.93  11.44
9            6.16   5.69       13.09  10.19
16           5.99   5.35       13.28   9.67
25           6.16   5.69       12.80   9.39
36           6.49   5.64       13.16   9.55
49           6.37   6.18       12.90   9.90
64           6.30   6.65       12.92   9.76

spatial AdaBoost become clearer. Therefore, we see that spatial AdaBoost improves the noncontextual classification result (r = 0) significantly.

16.7.3 Spatial AdaBoost and Its Robustness

Spatial AdaBoost is applied to the three data sets. The error rates due to spatial AdaBoost and the MRF-based classifier for Data set 1 are listed in Table 16.3. Here, homoscedastic Gaussian distributions are employed for the class-specific distributions in both classifiers. Hence, classification with radius r = 0 indicates noncontextual classification by the LDF. We used the GMRF with the divergence model discussed in Section 16.4.2.

TABLE 16.3
Error Rates (%) Due to Homoscedastic GMRF and Spatial AdaBoost for Data Set 1 with Variance–Covariance Matrix σ²I

              σ² = 1              σ² = 4
Radius r²   GMRF   AdaBoost    GMRF   AdaBoost
0           41.66  41.65       54.45  54.45
1            9.25  20.40       31.07  40.88
2            3.61  12.15       15.58  33.79
4            2.48   8.77       14.70  28.78
5            2.72   5.72        9.12  22.71
8            3.06   5.39       10.61  20.50
9            3.30   4.82       11.00  18.96
10           3.79   4.38       12.66  16.58
13           4.13   4.34       14.06  14.99
16           4.17   4.31       14.95  14.14
17           4.67   4.19       15.76  13.02
18           4.94   4.34       15.89  12.57
20           6.11   4.43       17.00  12.16

From Table 16.3, we see that the minimum error rate due to AdaBoost is higher than that due to the

FIGURE 16.6 Estimated coefficients for the log posteriors in the ring regions of data set 1 (left) and data set 2 (right).

GMRF-based method. The left panel of Figure 16.6 is a plot of the estimated coefficients for the averaged log posteriors f_r vs. r². All the coefficients are positive; therefore, the problem of when to stop combining classification functions arises. Training and test errors as radius r becomes large are given in Figure 16.7. The test error takes its minimum value at r² = 17, whereas the training error is stable for large r.
The same study is applied to Data set 2 (Table 16.4), and we obtain a result similar to that observed in Table 16.3. The right panel of Figure 16.6 gives the estimated coefficients. We see that coefficient b_r with r = √72 is −0.012975, a negative value. This implies that the averaged log posterior f_r with r = √72 is not appropriate for the classification. Therefore, we must exclude functions of radius √72 or larger.
The application to Data set 3 leads to a quite different result (Table 16.5). We fit homoscedastic and heteroscedastic Gaussian distributions to the feature vectors. Hence, classification with radius r = 0 is performed by the LDF and the QDF. The error rates due to the noncontextual classification function log p(k | x) and the coefficients are tabulated in Table 16.5. The test error due to the QDF is 1.66%, which is excellent. Unfortunately, the coefficients b_0 of functions log p(k | x) in both cases are estimated to be negative.


FIGURE 16.7 Error rates with respect to squared radius, r², for training data (left) and test data (right) due to classifiers b0 f0 + b1 f1 + ··· + br fr for data set 1.

Supervised Image Classification of Multi-Spectral Images


TABLE 16.4 Error Rates (%) Due to GMRF and Spatial AdaBoost for Data Set 2 (Homoscedastic)

Radius r²   GMRF   AdaBoost
0           8.61   8.61
4           6.09   7.08
9           5.69   6.11
16          5.35   5.95
25          5.69   5.83
36          5.69   5.85
49          6.13   5.99
64          6.65   6.01

Therefore, the classification functions b0 log p(k | x) are not applicable. This implies that the exponential loss defined by Equation 16.5 is too sensitive to misclassified data. Error rates for spatial AdaBoost b0 f0 + b1 f1 are listed in Table 16.6 when the coefficient b0 is predetermined and b1 is tuned by minimizing the empirical risk Remp(b0 f0 + b1 f1). The LDF and QDF columns correspond to the posterior probabilities based on the homoscedastic and the heteroscedastic Gaussian distributions, respectively. In both cases, information in the first-order neighborhood (r = 1) improves the classification slightly. We note that the homoscedastic and heteroscedastic GMRFs have minimum error rates of 4.01% and 0.52%, respectively.
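The one-dimensional tuning of b1 for a fixed b0 can be sketched as follows. We use a binary margin formulation (labels in {−1, +1}) for brevity; the chapter's multi-class exponential risk (Equation 16.5) has the same structure, and the plain grid search here merely stands in for whichever one-dimensional minimizer is preferred.

```python
import numpy as np

def empirical_exp_risk(margins):
    """Empirical exponential risk: mean of exp(-y_i * F(x_i))."""
    return np.mean(np.exp(-margins))

def tune_b1(f0, f1, y, b0, grid=None):
    """Choose b1 minimizing R_emp(b0*f0 + b1*f1) with b0 held fixed.

    f0, f1 : per-sample values of the two classification functions;
    y      : labels in {-1, +1}.
    A plain grid search stands in for the one-dimensional minimizer."""
    if grid is None:
        grid = np.linspace(0.0, 5.0, 501)
    risks = [empirical_exp_risk(y * (b0 * f0 + b1 * f1)) for b1 in grid]
    return grid[int(np.argmin(risks))]
```

Because the grid includes b1 = 0, the tuned classifier is never worse (in empirical risk) than using b0 f0 alone.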

16.7.4 Spatial AdaBoost and Spatial LogitBoost

We apply the classification functions F0, F1, . . . , Fr with r = √20 to the test data, where the function Fr is defined by Equation 16.30, and the coefficients b0, b1, . . . , br with r = √20 are sequentially tuned by minimizing the exponential risk (Equation 16.6) or the logit risk (Equation 16.18). Error rates due to GMRF-based classifiers and the spatial boosting methods for error variances s² = 1 and 4 for Data set 1 are compared in Table 16.7. We see that GMRF is superior to the spatial boosting methods to a small degree. Furthermore, spatial AdaBoost and spatial LogitBoost exhibit similar performances for Data set 1. Note that spatial boosting is very fast compared with GMRF-based classifiers. This observation also holds for Data set 2 (Table 16.8). GMRF and spatial LogitBoost were then applied to Data set 3 (Table 16.9). Spatial LogitBoost still works well in spite of the failure of spatial AdaBoost. Note that spatial AdaBoost and spatial LogitBoost use the same classification functions f0, f1, . . . , fr; they differ only in the loss function.

TABLE 16.5 Optimal Coefficients Tuned by Minimizing Exponential Risks Based on LDF and QDF for Data Set 3

                      LDF           QDF
Error rate by f0      10.69%        1.66%
Tuned b0 to f0        −0.006805     −0.000953
Error rate by b0 f0   Nearly 100%   Nearly 100%


TABLE 16.6 Error Rates (%) Due to Contextual Classification Function b0 f0 + b1 f1 Given b0 for Data Set 3 (b1 Is Tuned by Minimizing the Empirical Exponential Risk)

b0       Homoscedastic   Heteroscedastic
10⁻⁷     9.97            1.35
10⁻⁶     10.00           1.35
10⁻⁵     10.00           1.35
10⁻⁴     10.00           1.25
10⁻³     9.10            1.00
10⁻²     9.45            1.38
10⁻¹     10.24           1.66
1        10.69           1.66
10       10.69           1.66

TABLE 16.7 Error Rates (%) Due to GMRF and Spatial Boosting for Data Set 1 with Variance–Covariance Matrix s²I

                       s² = 1                            s² = 4
Radius r²   GMRF    AdaBoost   LogitBoost     GMRF    AdaBoost   LogitBoost
0           41.66   41.66      41.66          54.45   54.45      54.45
1           9.25    20.40      19.45          31.07   40.88      40.77
2           3.61    12.15      11.63          15.58   33.79      33.95
4           2.48    8.77       8.36           14.70   28.78      28.74
5           2.72    5.72       5.93           9.12    22.71      22.44
8           3.06    5.39       5.43           10.61   20.50      20.55
9           3.30    4.82       4.96           11.00   18.96      19.01
10          3.79    4.38       4.65           12.66   16.58      16.56
13          4.13    4.34       4.54           14.06   14.99      15.08
16          4.17    4.31       4.43           14.95   14.14      14.12
17          4.67    4.19       4.31           15.76   13.02      13.22
18          4.94    4.34       4.58           15.89   12.57      12.69
20          6.11    4.43       4.66           17.00   12.16      12.11

(GMRF denotes the homoscedastic GMRF; AdaBoost and LogitBoost denote the spatial boosting methods.)

TABLE 16.8 Error Rates (%) Due to GMRF and Spatial Boosting for Data Set 2

Radius r²   Homoscedastic GMRF   AdaBoost   LogitBoost
0           8.61                 8.61       8.61
4           6.09                 7.08       7.93
9           5.69                 6.11       7.33
16          5.35                 5.95       6.74
25          5.69                 5.83       6.35
36          5.69                 5.85       6.13
49          6.13                 5.99       6.32
64          6.65                 6.01       6.48


TABLE 16.9 Error Rates (%) Due to GMRF and Spatial LogitBoost for Data Set 3

             Homoscedastic           Heteroscedastic
Radius r²   GMRF    LogitBoost     GMRF    LogitBoost
0           10.69   10.69          1.66    1.66
1           8.58    9.48           1.97    1.31
4           7.06    6.51           1.69    0.93
8           3.81    5.43           0.55    0.76
9           4.01    5.40           0.52    0.76
10          4.08    5.29           0.52    0.83
13          4.01    5.16           0.52    0.87
16          4.01    5.26           0.52    0.90
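The contrast between the two losses is easy to see numerically: for a margin of −4 (a badly misclassified sample) the exponential loss exceeds 50, while the logit loss stays near 4. A small illustration, using a binary margin formulation of our own for brevity:

```python
import numpy as np

def exp_loss(m):
    """AdaBoost loss: exponential in the negative margin."""
    return np.exp(-m)

def logit_loss(m):
    """LogitBoost loss: grows only linearly for badly misclassified samples."""
    return np.log1p(np.exp(-m))

for m in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"margin {m:+.1f}:  exp {exp_loss(m):8.3f}   logit {logit_loss(m):6.3f}")
```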

LogitBoost assigns less penalty to misclassified data. Hence, spatial LogitBoost appears to be robust.

16.7.5 Spatial Parallel Boost

We examine parallel boost by applying it to Data set 1. The three rectangular regions shown in Figure 16.8a give the training areas. We consider the following four subsets of the training data set:

. Near the center of Figure 16.8a with sample size 75 (D(1))
. North-west of Figure 16.8a with sample size 160 (D(2))


FIGURE 16.8 True labels, training regions, and classified images. n denotes the size of the training sample. (a) True labels with three training regions. (b) Parallel boost with r = 0, i.e., noncontextual classification (error rate = 42.29%, n = 435). (c) Spatial boost by D(1) with r = √13 (error rate = 10.26%, n = 75). (d) Spatial boost by D(2) with r = √13 (error rate = 6.29%, n = 160). (e) Spatial boost by D(3) with r = √13 (error rate = 7.46%, n = 200). (f) Parallel boost by D(1) ∪ D(2) ∪ D(3) with r = √13 (error rate = 4.96%, n = 435).


TABLE 16.10 Error Rates (%) Due to Spatial AdaBoost Based on Four Training Data Sets and Spatial Parallel Boost Based on the Union of Three Subsets in Data Set 1

                    Spatial AdaBoost                     Parallel Boost
Radius r²   D(1)     D(2)     D(3)     Whole       D(1) ∪ D(2) ∪ D(3)
0           43.13    41.67    45.88    41.75       42.29
1           23.84    20.23    26.51    20.40       21.18
2           17.16    12.84    18.92    12.15       13.03
4           13.88    9.44     15.25    8.77        9.46
5           10.88    7.34     11.21    5.72        6.39
8           10.49    6.97     10.26    5.39        5.78
9           10.48    6.64     9.26     4.82        5.41
10          10.41    6.35     8.08     4.38        5.19
13          10.26    6.29     7.46     4.34        4.96

. South-east of Figure 16.8a with sample size 200 (D(3))
. All training data with sample size 8281

The test data set also has sample size 8281. The training region D(1) is relatively small, so we only consider radii r up to √13. Gaussian density functions with a common variance–covariance matrix are fitted to each training data set, and we obtain four types of posterior probabilities for each pixel in the training data sets and the test data set. Spatial AdaBoost derived from the four training data sets and spatial parallel boost derived from the region D(1) ∪ D(2) ∪ D(3) are applied to the test region shown in Figure 16.2a. Error rates due to the five classifiers examined on the test data of 8281 samples are compared in Table 16.10. Each row corresponds to spatial boost based on a radius r in {0, 1, . . . , √13}. The second to fifth columns correspond to spatial AdaBoost based on the four training data sets, and the last column gives the error rate due to spatial parallel boost. We see that spatial AdaBoost exhibits a better performance as the training data size increases. The classifier based on D(3) seems exceptional, but it gives better results for large r. Spatial parallel boost exhibits a performance similar to that of spatial AdaBoost based on all the training data. Here, we note again that our method uses only 435 samples, whereas all the training data comprise 8281 samples. The noncontextual classification result based on spatial parallel boost, given in Figure 16.8b, is very poor. An excellent classification result is obtained as the radius r becomes larger. The result with r = √13, which is better than the results (c), (d), and (e) due to spatial AdaBoost using the three training data subsets, is given in Figure 16.8f. The error rates corresponding to figures (b), (d), (e), and (f) are given in Table 16.10 in bold.
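The first stage of this experiment, fitting Gaussian class densities with a common variance–covariance matrix to a training subset and turning them into posterior probabilities (i.e., the LDF), can be sketched as follows. Equal class priors and the function names are our simplifying assumptions; this is only the posterior-estimation step that the boosting machinery consumes.

```python
import numpy as np

def fit_lda_posterior(X, y):
    """Fit Gaussian class densities sharing a pooled covariance matrix and
    return a function x -> posterior probabilities p(k | x) (equal priors).

    X : (n, d) feature vectors;  y : integer labels 0..K-1."""
    classes = np.unique(y)
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    centered = X - means[y]                      # within-class deviations
    cov = centered.T @ centered / (len(X) - len(classes))
    prec = np.linalg.inv(cov)

    def posterior(x):
        # With a shared covariance the discriminant is linear in x (the LDF).
        logits = np.array([m @ prec @ x - 0.5 * m @ prec @ m for m in means])
        p = np.exp(logits - logits.max())        # stabilized softmax
        return p / p.sum()

    return posterior
```

Fitting one such posterior per training region D(1), D(2), D(3) yields the per-region posterior probabilities that parallel boost then combines.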

16.8 Conclusion

We have considered three contextual classifiers: Switzer's smoothing method, MRF-based methods, and spatial boosting methods. We showed that the last two can be regarded as extensions of Switzer's method. The MRF-based method is known to be efficient, and it exhibited the best performance in the numerical studies given here. However, it requires extra computation time for (1) the estimation of the parameters


specifying the MRF and (2) the estimation of test labels, even if the ICM procedure is utilized for (2). Both estimations are accomplished by iterative procedures. The method has already been fully discussed in the literature. The following are our remarks on spatial boosting and its possible future development. Spatial boosting can be derived as long as posterior probabilities are available; hence, the posteriors need not be defined by statistical models. Posteriors due to support vector machines (SVM) [22,23] could be utilized. Spatial boosting thus requires fewer assumptions than an MRF-based method. Features of spatial boosting are as follows: (a) Various types of posteriors can be implemented. Pairwise coupling [24] is a popular strategy for multi-class classification that involves estimating posteriors for each pair of classes and then coupling the estimates together. (b) Directed neighborhoods can be implemented (see Figure 16.9); these would be suitable for the classification of textured images. (c) The method is applicable to situations with multi-source and multi-temporal data sets. (d) Classification is performed non-iteratively, and the performance is similar to that of MRF-based classifiers. The following new questions arise:

. How do we define the posteriors? (This is closely related to feature (a).)
. How do we choose the loss function?
. How do we determine the maximum radius for incorporating the log posteriors into the classification function?

This last point is an old question in boosting: how many classification functions are sufficient for classification? Spatial boosting could also be used in cases where estimation of the coefficients is performed through the test data only; this would be a first step toward a possible development into unsupervised classification. In addition, the method will be useful for unmixing. Originally, the classification function F defined by Equation 16.30 is given by a convex combination of averaged

FIGURE 16.9 Directed subsets with center pixel i: (a), (b) r = 1; (c), (d) r = √2.


log posteriors. Therefore, F would be useful for estimation of posterior probabilities, which is directly applicable to unmixing.

Acknowledgment

Data sets grss_dfc_0006 and grss_dfc_0009 were provided by the IEEE GRS-S Data Fusion Committee.

References

1. Switzer, P. (1980), Extensions of linear discriminant analysis for statistical classification of remotely sensed satellite imagery, Mathematical Geology, 12(4), 367–376.
2. Nishii, R. and Eguchi, S. (2005), Supervised image classification by contextual AdaBoost based on posteriors in neighborhoods, IEEE Transactions on Geoscience and Remote Sensing, 43(11), 2547–2554.
3. Jain, A.K., Duin, R.P.W., and Mao, J. (2000), Statistical pattern recognition: a review, IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-22, 4–37.
4. Besag, J. (1986), On the statistical analysis of dirty pictures, Journal of the Royal Statistical Society, Series B, 48, 259–302.
5. Duda, R.O., Hart, P.E., and Stork, D.G. (2000), Pattern Classification (2nd ed.), John Wiley & Sons, New York.
6. Geman, S. and Geman, D. (1984), Stochastic relaxation, Gibbs distributions, and Bayesian restoration of images, IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6, 721–741.
7. McLachlan, G.J. (2004), Discriminant Analysis and Statistical Pattern Recognition (2nd ed.), John Wiley & Sons, New York.
8. Chilès, J.P. and Delfiner, P. (1999), Geostatistics, John Wiley & Sons, New York.
9. Cressie, N. (1993), Statistics for Spatial Data, John Wiley & Sons, New York.
10. Freund, Y. and Schapire, R.E. (1997), A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences, 55(1), 119–139.
11. Friedman, J., Hastie, T., and Tibshirani, R. (2000), Additive logistic regression: a statistical view of boosting (with discussion), Annals of Statistics, 28, 337–407.
12. Hastie, T., Tibshirani, R., and Friedman, J. (2001), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, New York.
13. Benediktsson, J.A. and Kanellopoulos, I. (1999), Classification of multisource and hyperspectral data based on decision fusion, IEEE Transactions on Geoscience and Remote Sensing, 37, 1367–1377.
14. Melgani, F. and Bruzzone, L. (2004), Classification of hyperspectral remote sensing images with support vector machines, IEEE Transactions on Geoscience and Remote Sensing, 42(8), 1778–1790.
15. Nishii, R. (2003), A Markov random field-based approach to decision level fusion for remote sensing image classification, IEEE Transactions on Geoscience and Remote Sensing, 41(10), 2316–2319.
16. Nishii, R. and Eguchi, S. (2002), Image segmentation by structural Markov random fields based on Jeffreys divergence, Res. Memo., Institute of Statistical Mathematics, No. 849.
17. Nishii, R. and Eguchi, S. (2005), Robust supervised image classifiers by spatial AdaBoost based on robust loss functions, Proc. SPIE, Vol. 5982.
18. Eguchi, S. and Copas, J. (2002), A class of logistic-type discriminant functions, Biometrika, 89, 1–22.
19. Takenouchi, T. and Eguchi, S. (2004), Robustifying AdaBoost by adding the naive error rate, Neural Computation, 16, 767–787.


20. IEEE GRSS Data Fusion reference database, Data sets GRSS_DFC_0006 and GRSS_DFC_0009 [Online]. http://www.dfc-grss.org/, 2001.
21. Giacinto, G., Roli, F., and Bruzzone, L. (2000), Combination of neural and statistical algorithms for supervised classification of remote-sensing images, Pattern Recognition Letters, 21, 385–397.
22. Kwok, J.T. (2000), The evidence framework applied to support vector machines, IEEE Transactions on Neural Networks, 11(5), 1162–1173.
23. Wahba, G. (1999), Support vector machines, reproducing kernel Hilbert spaces and the randomized GACV, in Advances in Kernel Methods: Support Vector Learning, Schölkopf, Burges, and Smola, eds., MIT Press, Cambridge, MA, 69–88.
24. Hastie, T. and Tibshirani, R. (1998), Classification by pairwise coupling, Annals of Statistics, 26(2), 451–471.

17 Unsupervised Change Detection in Multi-Temporal SAR Images

Lorenzo Bruzzone and Francesca Bovolo

CONTENTS
17.1 Introduction ... 373
17.2 Change Detection in Multi-Temporal Remote-Sensing Images: Literature Survey ... 376
     17.2.1 General Overview ... 376
     17.2.2 Change Detection in SAR Images ... 379
            17.2.2.1 Preprocessing ... 379
            17.2.2.2 Multi-Temporal Image Comparison ... 380
            17.2.2.3 Analysis of the Ratio and Log-Ratio Image ... 381
17.3 Advanced Approaches to Change Detection in SAR Images: A Detail-Preserving Scale-Driven Technique ... 383
     17.3.1 Multi-Resolution Decomposition of the Log-Ratio Image ... 385
     17.3.2 Adaptive Scale Identification ... 387
     17.3.3 Scale-Driven Fusion ... 388
17.4 Experimental Results and Comparisons ... 390
     17.4.1 Data Set Description ... 390
     17.4.2 Results ... 392
17.5 Conclusions ... 396
Acknowledgments ... 397
References ... 397

17.1 Introduction

Recent natural disasters (e.g., tsunamis, hurricanes, eruptions, and earthquakes) and the increasing amount of anthropogenic change (e.g., due to wars and pollution) have given prominence to topics related to environmental monitoring and damage assessment. The study of the environmental variations due to the time evolution of the above phenomena is of fundamental interest from a political point of view. In this context, the development of effective change-detection techniques capable of automatically identifying land-cover


variations occurring on the ground by analyzing multi-temporal remote-sensing images assumes an important relevance for both the scientific community and the end-users. The change-detection process considers images acquired at different times over the same geographical area of interest. Images acquired from repeat-pass satellite sensors are an effective input for addressing change-detection problems. Several different Earth-observation satellite missions are currently operative, with different kinds of sensors mounted on board (e.g., MODIS and ASTER on board NASA's TERRA satellite, MERIS and ASAR on board ESA's ENVISAT satellite, Hyperion on board NASA's EO-1 satellite, SAR sensors on board CSA's RADARSAT-1 and RADARSAT-2 satellites, and the Ikonos and Quickbird satellites that acquire very high resolution panchromatic and multi-spectral (MS) images). Each sensor has specific properties with respect to the image-acquisition mode (e.g., passive or active) and its geometrical, spectral, and radiometric resolutions. In the development of automatic change-detection techniques, it is mandatory to take the properties of the sensors into account to properly extract information from the considered data. Let us discuss the main characteristics of the different kinds of sensors in detail (Table 17.1 summarizes some advantages and disadvantages of different sensors for change-detection applications according to their characteristics). Images acquired from passive sensors are obtained by measuring the land-cover reflectance on the basis of the energy emitted from the sun and reflected from the ground.¹ Usually, the measured signal can be modeled as the desired reflectance (measured as a radiance) altered by additive Gaussian noise. This noise model enables relatively easy processing of the signal when designing data-analysis techniques. Passive sensors can acquire two different kinds of images, panchromatic (PAN) images and MS images, by defining different trade-offs between geometrical and spectral resolutions according to the radiometric resolution of the adopted detectors. PAN images are characterized by poor spectral resolution but very high geometrical resolution, whereas MS images have medium geometrical resolution but high spectral resolution. From the perspective of change detection, PAN images should be used when the expected size of the changed area is too small for adopting MS data. For example, in the analysis of changes in urban areas, where detailed urban studies should be carried out, change detection in PAN images requires the definition of techniques capable of capturing the richness of information present both in the spatial-context relations between neighboring pixels and in the geometrical shapes of objects. MS data should be used

TABLE 17.1 Advantages and Disadvantages of Different Kinds of Sensors for Change-Detection Applications

Multispectral (passive)
  Advantages: characterization of the spectral signature of land-covers; the noise has an additive model
  Disadvantages: atmospheric conditions strongly affect the acquisition phase

Panchromatic (passive)
  Advantages: high geometrical resolution; high content of spatial-context information
  Disadvantages: atmospheric conditions strongly affect the acquisition phase; poor characterization of the spectral signature of land-covers

SAR (active)
  Advantages: not affected by sunlight and atmospheric conditions
  Disadvantages: complexity of data preprocessing; presence of multiplicative speckle noise

¹ Also, the emission of the Earth affects the measurements in the infrared portion of the spectrum.


when a medium geometrical resolution (i.e., 10–30 m) is sufficient for characterizing the size of the changed areas and a detailed modeling of the spectral signature of the land-covers is necessary for identifying the change investigated. Change-detection methods in MS images should be able to properly exploit the available MS information in the change-detection process. A critical problem related to the use of passive sensors in change detection is the sensitivity of the image-acquisition phase to atmospheric conditions. This problem has two possible effects: (1) atmospheric conditions (e.g., the presence of clouds) may prevent the measurement of land-cover spectral signatures; and (2) variations in illumination and atmospheric conditions at different acquisition times may be a potential source of errors, which should be taken into account to avoid the identification of false changes (or the missed detection of true changes). The working principle of active synthetic aperture radar (SAR) sensors is completely different from that of the passive ones and allows overcoming some of the drawbacks that affect optical images. The signal measured by active sensors is the Earth backscattering of an electromagnetic pulse emitted from the sensor itself. SAR instruments acquire different kinds of signals that result in different images: medium- or high-resolution, single-frequency or multi-frequency, and single-polarimetric or fully polarimetric images. As for optical data, the proper geometrical resolution should be chosen according to the size of the expected investigated changes. The SAR signal has different geometrical resolutions and a different penetration capability depending on the signal wavelength, which usually lies between band X and band P (i.e., between 2 and 100 cm).
Accordingly, shorter wavelengths should be used for measuring vegetation changes, and longer, more penetrating wavelengths for studying changes that have occurred on or under the terrain. None of the wavelengths adopted for SAR sensors suffers from atmospheric and sunlight conditions or from the presence of clouds; thus multi-temporal radar backscattering does not change with atmospheric conditions. The main problem related to the use of active sensors is the coherent nature of the SAR signal, which results in a multiplicative speckle noise that makes the acquired data intrinsically complex to analyze. A proper handling of speckle requires both an intensive preprocessing phase and the development of effective data-analysis techniques. The different properties and statistical behaviors of signals acquired by active and passive sensors require the definition of different change-detection techniques capable of properly exploiting the specific data peculiarities. In the literature, many different techniques for change detection in images acquired by passive sensors have been presented [1–8], and many applications of these techniques have been reported. This is because of both the amount of information present in MS images and the relative simplicity of data analysis, which results from the additive noise model adopted for MS data (the radiance of natural classes can be approximated with a Gaussian distribution). Less attention has been devoted to change detection in SAR images. This is explained by the intrinsic complexity of SAR data, which require both an intensive preprocessing phase and the development of effective data-analysis techniques capable of dealing with multiplicative speckle noise. Nonetheless, in the past few years the remote-sensing community has shown more interest in the use of SAR images in change-detection problems, due to their independence from atmospheric conditions, which results in excellent operational properties.
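The multiplicative nature of speckle can be illustrated numerically with the standard L-look gamma model (the parameter values below are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(42)

# "True" backscatter of a homogeneous region (arbitrary value).
reflectivity = 0.5 * np.ones(100_000)

# L-look intensity speckle: intensity = reflectivity * s,
# with s ~ Gamma(shape=L, scale=1/L), so E[s] = 1 and Var[s] = 1/L.
L = 4
speckle = rng.gamma(shape=L, scale=1.0 / L, size=reflectivity.size)
intensity = reflectivity * speckle

# The noise scales with the signal: the coefficient of variation is ~ 1/sqrt(L).
print("coefficient of variation:", intensity.std() / intensity.mean())

# After a log transform the noise becomes additive and signal-independent.
log_intensity = np.log(intensity)
```

Because the noise multiplies the signal, its standard deviation grows with the local mean; taking logarithms converts it into additive, signal-independent noise, which is one motivation for the log-ratio operator discussed later in this chapter.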
The recent technological developments in sensors and satellites have resulted in the design of more sophisticated systems with increased geometrical resolution. Apart from the active or passive nature of the sensor, the very high geometrical resolution images acquired by these systems (e.g., PAN images) require the development of specific techniques capable of taking advantage of the richness of the geometrical information they contain. In particular, both the high correlation


between neighboring pixels and the object shapes should be considered in the design of data-analysis procedures. In the above-mentioned context, two main challenging issues of particular interest in the development of automatic change-detection techniques are: (1) the definition of advanced and effective techniques for change detection in SAR images, and (2) the development of proper methods for the detection of changes in very high geometrical resolution images. A solution for these issues lies in the definition of multi-scale and multi-resolution change-detection techniques, which can properly analyze the different components of the change signal at their optimal scale.² On the one hand, multi-scale analysis allows one to better handle the noise present in medium-resolution SAR images, resulting in the possibility of obtaining accurate change-detection maps characterized by a high spatial fidelity. On the other hand, multi-scale approaches are intrinsically suitable for exploiting the information present in very high geometrical resolution images according to an effective modeling (at different resolution levels) of the different objects present in the scene. According to the analysis mentioned above, after a brief survey on change detection and on unsupervised change detection in SAR images, we present, in this chapter, a novel adaptive multi-scale change-detection technique for multi-temporal SAR images. This technique exploits a proper scale-driven analysis to obtain a high sensitivity to geometrical features (i.e., details and borders of changed areas are well preserved) and a high robustness to noisy speckle components in homogeneous areas. Although explicitly developed and tested for change detection in medium-resolution SAR images, this technique can easily be extended to the analysis of very high geometrical resolution images. The chapter is organized into five sections.
Section 17.2 defines the change-detection problem in multi-temporal remote-sensing images and focuses attention on unsupervised techniques for multi-temporal SAR images. Section 17.3 presents a multi-scale approach to change detection in multi-temporal SAR images recently developed by the authors. Section 17.4 gives an example of the application of the proposed multi-scale technique to a real multi-temporal SAR data set and compares the effectiveness of the presented method with those of standard single-scale change-detection techniques. Finally, in Section 17.5, results are discussed and conclusions are drawn.

17.2 Change Detection in Multi-Temporal Remote-Sensing Images: Literature Survey

17.2.1 General Overview

A very important preliminary step in the development of a change-detection system, based on automatic or semi-automatic procedures, consists in the design of a proper phase of data collection. The phase of data collection aims at defining: (1) the kind of satellite to be used (on the basis of the repetition time and on the characteristics of the sensors mounted on-board), (2) the kind of sensor to be considered (on the basis of the desired properties of the images and of the system), (3) the end-user requirements (which are of basic importance for the development of a proper change-detection

² It is worth noting that these kinds of approaches have been successfully exploited in image classification problems [9–12].


technique), and (4) the kinds of available ancillary data (all the available information that can be used for constraining the change-detection procedure). The outputs of the data-collection phase should be used for defining the automatic change-detection technique. In the literature, many different techniques have been proposed. We can distinguish between two main categories: supervised and unsupervised methods [9,13]. When performing supervised change detection, in addition to the multi-temporal images, multi-temporal ground-truth information is also needed. This information is used for identifying, for each possible land-cover class, spectral signature samples for performing supervised data classification and also for explicitly identifying what kinds of land-cover transitions have taken place. Three main general approaches to supervised change detection can be found in the literature: postclassification comparison, supervised direct multi-data classification [13], and compound classification [14–16]. Postclassification comparison computes the change-detection map by comparing the classification maps obtained by classifying independently two multi-temporal remote-sensing images. On the one hand, this procedure avoids data normalization aimed at reducing atmospheric conditions, sensor differences, etc. between the two acquisitions; on the other hand, it critically depends on the accuracies of the classification maps computed at the two acquisition dates. As postclassification comparison does not take into account the dependence existing between two images of the same area acquired at two different times, the global accuracy is close to the product of the accuracies yielded at the two times [13]. Supervised direct multi-data classification [13] performs change detection by considering each possible transition (according to the available a priori information) as a class and by training a classifier to recognize the transitions. 
Although this method exploits the temporal correlation between images in the classification process, its major drawback is that training pixels should be related to the same points on the ground at the two times and should accurately represent the proportions of all the transitions in the whole images. Compound classification overcomes the drawbacks of the supervised multi-date classification technique by removing the constraint that training pixels should be related to the same area on the ground [14–16]. In general, the approach based on supervised classification is more accurate and detailed than the unsupervised one; nevertheless, the latter approach is often preferred in real-data applications. This is due to the difficulties in collecting proper ground-truth information (necessary for supervised techniques), which is a complex, time-consuming, and expensive process (in many cases this process is not consistent with the application constraints). Unsupervised change-detection techniques are based on the comparison of the spectral reflectances of multi-temporal raw images and a subsequent analysis of the comparison output. In the literature, the most widely used unsupervised change-detection techniques are based on a three-step procedure [13,17]: (1) preprocessing, (2) pixel-by-pixel comparison of two raw images, and (3) image analysis and thresholding (Figure 17.1). The aim of the preprocessing step is to make the two considered images as comparable as possible. In general, preprocessing operations include co-registration, radiometric and geometric corrections, and noise reduction. From the practical point of view, co-registration is a fundamental step as it allows obtaining a pair of images where corresponding pixels are associated with the same position on the ground.³ Radiometric corrections reduce differences between the two acquisitions due to sunlight and atmospheric conditions.
These procedures are applied to optical images, but they are not necessary for SAR images (as SAR data are not affected by atmospheric conditions). It is worth noting that usually it is not possible to obtain a perfect alignment between temporal images; this may considerably affect the change-detection process [18]. Consequently, if the amount of residual misregistration noise is significant, proper techniques aimed at reducing its effects should be used for change detection [1,4].

FIGURE 17.1
Block scheme of a standard unsupervised change-detection approach: the SAR images acquired at dates t1 and t2 are preprocessed, compared, and the resulting log-ratio image is analyzed to produce the change-detection map (M).

Also noise reduction is performed differently according to the kind of remote-sensing images considered: in optical images common low-pass filters can be used, whereas in SAR images proper despeckling filters should be applied.

The comparison step aims at producing a further image in which differences between the two considered acquisitions are highlighted. Different mathematical operators (see Table 17.2 for a summary) can be adopted for performing image comparison; this choice gives rise to different kinds of techniques [13,19–23]. One of the most widely used operators is the difference. The difference can be applied to: (1) a single spectral band (univariate image differencing) [13,21–23], (2) multiple spectral bands (change vector analysis) [13,24], and (3) vegetation indices (vegetation index differencing) [13,19] or other linear (e.g., tasselled cap transformation [22]) or nonlinear combinations of spectral bands. Another widely used operator is the ratio operator (image ratioing) [13], which can be successfully used in SAR image processing [17,25,26]. A different approach is based on the use of principal component analysis (PCA) [13,20,23]. PCA can be applied separately to the feature spaces at the single times or jointly to both images. In the first case, the comparison should be performed in the transformed feature space before performing change detection; in the second case, the minor components of the transformed feature space contain the change information.

TABLE 17.2
Summary of the Most Widely Used Comparison Operators (fk is the feature considered at time tk, which can be: (1) a single spectral band Xk^b, (2) a vector of m spectral bands [Xk1, . . . , Xkm], (3) a vegetation index Vk, or (4) a vector of n features [Pk1, . . . , Pkn] obtained after PCA; XD and XR are the images after comparison with the difference or ratio operators, respectively)

Technique                       Feature Vector fk at Time tk    Comparison Operator
Univariate image differencing   fk = Xk^b                       XD = f2 - f1
Vegetation index differencing   fk = Vk                         XD = f2 - f1
Image ratioing                  fk = Xk^b                       XR = f2 / f1
Change vector analysis          fk = [Xk1, . . . , Xkm]         XD = ||f2 - f1||
Principal component analysis    fk = [Pk1, . . . , Pkn]         XD = ||f2 - f1||
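The operators summarized in Table 17.2 can be sketched on toy arrays (the pixel values and the two stacked bands below are hypothetical, chosen only for illustration):

```python
import numpy as np

# Toy single-band images at the two dates (hypothetical values).
X1 = np.array([[10.0, 20.0], [30.0, 40.0]])
X2 = np.array([[12.0, 20.0], [15.0, 80.0]])

XD = X2 - X1   # univariate image differencing
XR = X2 / X1   # image ratioing

# Change vector analysis: magnitude of the multi-spectral change vector.
# Here m = 2 hypothetical bands are stacked along the last axis.
f1 = np.stack([X1, 0.5 * X1], axis=-1)
f2 = np.stack([X2, 0.5 * X2], axis=-1)
XD_cva = np.linalg.norm(f2 - f1, axis=-1)
```

The PCA-based variant follows the same pattern, with fk replaced by the components of the transformed feature space.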

Unsupervised Change Detection in Multi-Temporal SAR Images


The performance of the above-mentioned techniques can be degraded by several factors (such as differences in illumination at the two dates, in atmospheric conditions, and in sensor calibration) that make a direct comparison between raw images acquired at different times difficult. These problems related to unsupervised change detection disappear when dealing with SAR images instead of optical data. Once image comparison is performed, the decision threshold can be selected either with a manual trial-and-error procedure (according to the desired trade-off between false and missed alarms) or with automatic techniques (e.g., by analyzing the statistical distribution of the image obtained after comparison, by fixing the desired false-alarm probability [27,28], or by following a Bayesian minimum-error decision rule [17]).

Since the remote-sensing community has devoted more attention to passive sensors [6–8] than to active SAR sensors, in the following section we focus our attention on change-detection techniques for SAR data. Although change-detection techniques based on different architectures have been proposed for SAR images [29–37], we focus on the most widely used techniques, which are based on the three-step procedure described above (see Figure 17.1).

17.2.2 Change Detection in SAR Images

Let us consider two co-registered intensity SAR images, X1 = {X1(i,j), 1 ≤ i ≤ I, 1 ≤ j ≤ J} and X2 = {X2(i,j), 1 ≤ i ≤ I, 1 ≤ j ≤ J}, of size I × J, acquired over the same area at different times t1 and t2. Let Ω = {ωc, ωu} be the set of classes associated with changed and unchanged pixels. Let us assume that no ground-truth information is available for the design of the change-detection algorithm, i.e., the statistical analysis of the change and no-change classes should be performed only on the basis of the raw data. The change-detection process aims at generating a change-detection map representing changes on the ground between the two considered acquisition dates. In other words, one of the possible labels in Ω should be assigned to each pixel (i,j) in the scene.

17.2.2.1 Preprocessing

The first step for properly performing change detection based on direct image comparison is image preprocessing. This procedure aims at generating two images that are as similar as possible, except in changed areas. As SAR data are not corrupted by differences in atmospheric and sunlight conditions, preprocessing usually comprises three steps: (1) geometric correction, (2) co-registration, and (3) noise reduction.

The first procedure aims at reducing distortions that are strictly related to the active nature of the SAR signal, such as layover, foreshortening, and shadowing due to ground topography. The second step is very important, as it aligns the temporal images to ensure that corresponding pixels in the spatial domain are associated with the same geographical position on the ground. Co-registration of SAR images is usually carried out by maximizing the cross-correlation between the multi-temporal images [38,39]. The major drawback of this process is the need for interpolating backscattering values, which is time-consuming. Finally, the last step is aimed at reducing the speckle noise. Many different techniques have been developed in the literature for reducing speckle. One of the most attractive techniques for speckle reduction is multi-looking [25]. This procedure, which is used for generating images with the same resolution along the azimuth and range directions, reduces the effect of the coherent speckle components. However, a further filtering step is usually applied to make the images suitable for the desired analysis. Usually, adaptive despeckling procedures are applied. Among these procedures we mention the following filtering techniques:


Frost [40], Lee [41], Kuan [42], Gamma Map [43,44], and Gamma WMAP [45] (i.e., the Gamma MAP filter applied in the wavelet domain). As the description of despeckling filters is outside the scope of this chapter, we refer the reader to the literature for more details.
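As an illustration of the adaptive despeckling step, here is a minimal sketch of a Lee-type filter, assuming multiplicative speckle on a multi-look intensity image with squared coefficient of variation 1/ENL. This is a simplified textbook form, not the implementation referenced in [41]:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def lee_filter(img, win=3, enl=4):
    """Simplified adaptive Lee filter for multi-look intensity images.

    Pixels are pulled toward the local mean where the window looks
    homogeneous (variance explained by speckle alone) and left almost
    untouched on strong structures.
    """
    pad = win // 2
    padded = np.pad(img, pad, mode="reflect")
    windows = sliding_window_view(padded, (win, win))
    local_mean = windows.mean(axis=(-2, -1))
    local_var = windows.var(axis=(-2, -1))
    sigma_v2 = 1.0 / enl  # squared coefficient of variation of speckle
    # Portion of the local variance attributed to the speckle-free signal.
    signal_var = np.maximum(
        (local_var - local_mean**2 * sigma_v2) / (1.0 + sigma_v2), 0.0)
    gain = np.divide(signal_var, local_var,
                     out=np.zeros_like(local_var), where=local_var > 0)
    return local_mean + gain * (img - local_mean)

# A homogeneous area is returned unchanged; pure speckle is smoothed.
flat = lee_filter(np.full((9, 9), 5.0))
rng = np.random.default_rng(1)
speckled = 5.0 * rng.gamma(4.0, 0.25, size=(32, 32))  # 4-look speckle
despeckled = lee_filter(speckled, win=5, enl=4)
```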

17.2.2.2 Multi-Temporal Image Comparison

As described in Section 17.1, pixel-by-pixel image comparison can be performed by means of different mathematical operators. In general, the most widely used operators are the difference and the ratio (or log-ratio). Depending on the selected operator, the image resulting from the comparison exhibits different behaviors with respect to the change-detection problem and to the signal statistics. To analyze this issue, let us consider two multi-look intensity images. It is possible to show that the measured backscattering of each image follows a Gamma distribution [25,26], that is,

p(X_k) = \frac{L^L X_k^{L-1}}{\mu_k^L (L-1)!} \exp\!\left(-\frac{L X_k}{\mu_k}\right), \qquad k = 1, 2        (17.1)

where Xk is a random variable that represents the value of the pixels in image Xk (k = 1, 2), μk is the average intensity of a homogeneous region at time tk, and L is the equivalent number of looks (ENL) of the considered image. Let us also assume that the intensity images X1 and X2 are statistically independent. This assumption, even if not entirely realistic, simplifies the analytical derivation of the pixel statistical distribution in the image after comparison. In the following, we analyze the effects of the use of the difference and ratio (log-ratio) operators on the statistical distributions of the signal.

17.2.2.2.1 Difference Operator

The difference image XD is computed by subtracting the image acquired before the change from the image acquired after it, i.e.,

X_D = X_2 - X_1        (17.2)

Under the stated conditions, the distribution of the difference image XD is given by [25,26]:

p(X_D) = \frac{L^L \exp\!\left(-\frac{L X_D}{\mu_2}\right)}{(L-1)!\,(\mu_1+\mu_2)^L} \sum_{j=0}^{L-1} \frac{(L-1+j)!}{j!\,(L-1-j)!}\, X_D^{L-1-j} \left[\frac{\mu_1 \mu_2}{L(\mu_1+\mu_2)}\right]^{j}        (17.3)

where XD is a random variable that represents the values of the pixels in XD. As can be seen, the difference-image distribution depends both on the relative change between the intensity values in the two images and on a reference intensity value (i.e., the intensity at t1 or t2). It is possible to show that the variance of the distribution of XD increases with the reference intensity level. From a practical point of view, this leads to a higher change-detection error for changes that have occurred in high-intensity regions of the image than in low-intensity regions. Although in some applications the difference operator has been used with SAR data [46], this behavior is an undesired effect that renders the difference operator intrinsically unsuited to the statistics of SAR images.
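The behavior described above can be checked empirically under the Gamma model of Eq. (17.1): drawing unchanged areas at two reference intensity levels shows the spread of the difference image growing with the intensity, while the spread of the ratio image does not (a toy simulation with arbitrary values, not an experiment from this chapter):

```python
import numpy as np

rng = np.random.default_rng(42)
L, n = 8, 100_000  # number of looks and number of simulated pixels

def no_change_spreads(mu):
    # Two independent acquisitions of the same unchanged area:
    # Gamma(shape=L, scale=mu/L) has mean mu and variance mu^2 / L.
    x1 = rng.gamma(L, mu / L, n)
    x2 = rng.gamma(L, mu / L, n)
    return (x2 - x1).std(), (x2 / x1).std()

diff_low, ratio_low = no_change_spreads(mu=10.0)
diff_high, ratio_high = no_change_spreads(mu=100.0)
# diff_* scales with mu (analytically mu * sqrt(2/L)); ratio_* does not.
```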

17.2.2.2.2 Ratio Operator

The ratio image XR is computed by dividing the image acquired after the change by the image acquired before it (or vice versa), i.e.,

X_R = X_2 / X_1        (17.4)

It is possible to prove that the distribution of the ratio image XR can be written as follows [25,26]:

p(X_R) = \frac{(2L-1)!}{[(L-1)!]^2}\, \frac{\bar{X}_R^{\,L}\, X_R^{\,L-1}}{(\bar{X}_R + X_R)^{2L}}        (17.5)

where XR is a random variable that represents the values of the pixels in XR and X̄R is the true change in the radar cross section. The ratio operator shows two main advantages over the difference operator. The first one is that the ratio-image distribution depends only on the relative change X̄R = μ2/μ1 in the average intensity between the two dates and not on a reference intensity level; thus changes are detected in the same manner in both high- and low-intensity regions. The second advantage is that ratioing reduces common multiplicative error components (which are due both to multiplicative sensor-calibration errors and to the multiplicative effects of the interaction of the coherent signal with the terrain geometry [25,47]), insofar as these components are the same for images acquired with the same geometry.

It is worth noting that, in the literature, the ratio image is usually expressed in a logarithmic scale. With this operation the distributions of the two classes of interest (ωc and ωu) in the ratio image can be made more symmetrical, and the residual multiplicative speckle noise can be transformed into an additive noise component [17]. Thus the log-ratio operator is typically preferred when dealing with SAR images, and change detection is performed by analyzing the log-ratio image XLR defined as:

X_{LR} = \log X_R = \log \frac{X_2}{X_1} = \log X_2 - \log X_1        (17.6)
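A one-line check of Eq. (17.6): under a multiplicative change, the log-ratio responds identically at every intensity level (toy values):

```python
import numpy as np

x1 = np.array([10.0, 100.0, 1000.0])   # hypothetical intensities
x2 = 2.0 * x1                          # uniform doubling of backscatter
xlr = np.log(x2) - np.log(x1)          # log-ratio image of Eq. (17.6)
# Every pixel maps to log 2, regardless of its reference intensity.
```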

Based on the above considerations, the ratio and log-ratio operators are more widely used than the difference operator in SAR change-detection applications [17,26,29,47–49]. It is worth noting that, for keeping the changed class on one side of the histogram of the ratio (or log-ratio) image, a normalized ratio can be computed pixel by pixel, i.e.,

X_{NR} = \min\!\left\{ \frac{X_1}{X_2}, \frac{X_2}{X_1} \right\}        (17.7)

This operator allows all changed areas (independently of the increasing or decreasing value of the backscattering coefficient) to play a similar role in the change-detection problem.
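Eq. (17.7) can be sketched as follows; increases and decreases of backscattering both map into (0, 1], so a single threshold on the left tail of the histogram isolates the changed class (toy values):

```python
import numpy as np

def normalized_ratio(x1, x2):
    """Pixel-by-pixel normalized ratio of Eq. (17.7), valued in (0, 1]."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return np.minimum(x1 / x2, x2 / x1)

x1 = np.array([[10.0, 40.0], [5.0, 7.0]])
x2 = np.array([[20.0, 10.0], [5.0, 70.0]])
xnr = normalized_ratio(x1, x2)
# A strong increase (7 -> 70) and a strong decrease (40 -> 10) both
# yield small values; an unchanged pixel (5 -> 5) yields 1.
```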

17.2.2.3 Analysis of the Ratio and Log-Ratio Image

The most widely used approach to extracting change information from the ratio and log-ratio image is based on histogram thresholding (for simplicity, in the following, we refer to the log-ratio image). In this context, the most difficult task is to

properly define the th