LECTURES
ON
PROBABILITY AND SECOND ORDER RANDOM FIELDS
Series on Advances in Mathematics for Applied Sciences - Vol. 30

LECTURES ON
PROBABILITY AND SECOND ORDER RANDOM FIELDS

Diego Bricio Hernandez

World Scientific
Singapore • New Jersey • London • Hong Kong
Published by World Scientific Publishing Co. Pte. Ltd. P O Box 128, Farrer Road, Singapore 9128 USA office: Suite IB, 1060 Main Street, River Edge, NJ 07661 UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
Library of Congress Cataloging-in-Publication Data Bricio Hernandez, Diego. Lectures on probability and second order random fields / Diego Bricio Hernandez. p. cm. — (Series on advances in mathematics for applied sciences, vol. 30) Includes bibliographical references and index. ISBN 9810219083 1. Random fields. 2. Probabilities. I. Title. II. Series. QA274.45.B75 1995 519.2-dc20 94-30544 CIP
Copyright © 1995 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form orbyanymeans, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher. For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 27 Congress Street, Salem, MA 01970, USA.
Printed in Singapore by Uto-Print
Foreword

This book was written by Diego Bricio Hernandez during his stay at the University of Padua, Italy, in 1991, but was sent to the publisher only after the author's premature death in November 1993, at age 48. Diego spent extensive periods of time in Padua as a Visiting Professor, and maintained regular contacts with many others in the Italian scientific community. Those who knew Diego could not help but be touched by his warmth, generosity, and selflessness, and by his enthusiasm in helping and collaborating with others. In publishing this monograph, Diego's many friends and colleagues join me in paying an affectionate tribute to his memory: Giovanni Andreatta, Paolo Dai Pra, Giovanni Di Masi, Susi Dulli, A. Gombani, Giovanni Marchesini, Luigi Mariani, Claudio Paniconi, Michele Pavon, Giorgio Picci, Stefano Pinzoni, Mario Pitteri, Mario Putti, Andrea Rinaldo, Wolfgang Runggaldier, Flavio Sartoretto and Marco Vianello (in Padua); S. Bittanti, L. Galgani and A. Locatelli (in Milan); Nicola Bellomo and R. Monaco (in Turin).

Padua, June 23, 1994
Renato Spigler
Preface This report originated from a series of eight weekly lectures delivered by the author at the DMMMSA 1 during the Spring Term of 1991. The target audience was a mix of mathematicians and engineers, recruited from the Universities of Padua and Trento. The purpose of delivering these seminars was twofold, namely to provide: • the theoretical background material required for the computer generation of random fields, of interest in various fields of Applied Mathematics, and • the necessary probabilistic background suitable for applied work in Water Resources Engineering as well as Signal and Image Processing. As to the first of these two goals, the main mathematical tools are the various representation theorems for second order random fields. The Karhunen-Loeve expansion is proposed as the main ingredient of a simulation algorithm for mean square continuous random fields defined on compact sets. Analogously, follow ing [17] the Cramer representation formula for random fields as an integral with respect to a random orthogonal measure is made the basis of a simulation algo rithm for random fields defined on a Euclidean space. This material is presented in Chapters 7 and 8, and relies heavily on the earlier chapters. In turn, Chapters 1 through 7 develop the mathematical notions referred to in the second of the two points mentioned above. The introductory chapter of the book by Dagan [7] was selected in order to provide some guidance in the selection of topics. In fact, these seven chapters simply develop the concepts required in Dagan's book, trying to present them in accordance with the spirit of modern Probability Theory and in somewhat greater depth. 1
¹ Dipartimento di Metodi e Modelli Matematici per le Scienze Applicate, University of Padua (Italy)
Chapters 1 and 2 constitute a brief recollection of the fundamental concepts of classical probability theory. For later reference, a brief discussion on Monte Carlo methods has been included in Chapter 3. In turn, chapters 4 and 5 contain the main mathematical tools required for the study of random fields from the "second order" point of view, namely the L2 theory of random variables and the Fourier transform on Euclidean spaces, respectively. With these tools at hand, chapters 6 and 7 develop the second order theory of random fields, including the representation formulas referred to above. Finally, chapter 8 discusses the practical issues involved in generating random fields on a computer, as well as modelling them from experimental data.

The selection of topics is such that concepts are introduced only if they are required later on, thus guaranteeing the minimality of the presentation. The degree of difficulty of this material is intermediate, and thus it should be accessible to an upper level engineering student interested in applying these concepts to her/his discipline. The very basics of Lebesgue integration should be a part of the mathematical toolkit of these students. It is only hoped that these lecture notes will serve to lure some young engineers and applied mathematicians into the study of these probabilistic topics.

Full proofs of theorems are given only in some special cases. Instead, the style of this presentation seeks to motivate the reader before introducing the various concepts, and then to illustrate at least some of the connections among them. However, a modern, measure theoretical presentation of probability is adopted, and no attempt has been made to hide the concepts away from the reader. This is believed to be important if engineers are to ready themselves for a profitable consultation of the modern literature on the subject.

These lectures on second order random fields were delivered following the invitation to do so received from F. Sartoretto and G. Gambolati. Indeed, both the lectures and these notes owe a lot to Sartoretto's initiative. My deepest recognition for his contribution. Besides, M. Putti made me aware of both [7] and [17], both of which constituted the guiding light in the choice of topics covered here. In addition, C. Paniconi translated Chapter 1 from the Italian original into English. Their contribution is gratefully acknowledged. Needless to say, the contents of these notes remain my exclusive responsibility.
I also want to thank the other participants in the seminar on random fields, namely R. Rigon and A. Rinaldo (Trento) as well as P. Salandin, R. Spigler, M. Takagi and M. Vianello (Padua), all of whom provided valuable interventions both at lecture time and afterwards. Partial support was received from DMMMSA, as well as from the Faculty of Engineering of the University of Padua and from The Venice Institute.
DBH, La Stanga, Fall of 1991
Contents

Preface

1 Random Variables
  1.1 The Concept of Probability
  1.2 Random Variables
  1.3 Distributions
  1.4 Expected Value

2 Random Vectors
  2.1 Joint Distributions
  2.2 Independence
  2.3 Transformations of Random Vectors

3 Sampling Random Variables
  3.1 Designing a Sampling Technique
  3.2 Monte Carlo Methods
  3.3 Error Bounds

4 Second Order Properties
  4.1 Orthogonality
  4.2 Orthogonal Projections
  4.3 Conditional Expectation
  4.4 Optimal Linear Estimation

5 The Fourier Transform
  5.1 Characteristic Functions
  5.2 The Fourier Transform
  5.3 The Plancherel Theorem

6 Second Order Random Fields
  6.1 Covariance Functions
  6.2 Construction of Random Fields
  6.3 Analytical Properties of Random Fields
  6.4 Spectral Representation of Covariances

7 Spectral Representation of Random Fields
  7.1 Random Measures
  7.2 Stochastic Integrals
  7.3 The Spectral Representation

8 Sampling and Modeling Random Fields
  8.1 Sampling Orthogonal Random Measures
  8.2 Fast Fourier Computations
  8.3 Simulation and Orthogonal Expansions
  8.4 Mean and Covariance Estimation

Bibliography

A The Sources

Index
Chapter 1

Random Variables

This introductory chapter presents the basic probabilistic ideas that will be required in the sequel. It is not meant to replace a good introductory book (preferably [10]). Instead, it should only be regarded as a friendly introduction to the subject, written as a handy reference and for purposes of motivation. Section 1.1 introduces the concept of probability, from its empirical foundations to the fully axiomatic presentation of our days. The main mathematical object of study in Probability is the Probability Space - consisting of the space of results (Ω), the family of all events (A), and the probability measure (P) - and this is clearly emphasized. The remaining three sections deal with random variables, and with the accompanying concepts of distribution functions, density functions, expected value, mean, variance, etc. Characteristic functions will be presented in section 5.1.
1.1 The Concept of Probability
A general model for an experiment could be constructed in the following manner:

1. Take a non-empty set Ω, in such a way that each ω ∈ Ω represents a possible outcome of the experiment.

2. To the "greatest possible number" of subsets A ⊂ Ω, each subset A representing a possible event, associate a number P(A) ∈ [0,1], called the probability of such event.

Ω is called the sample space of the experiment and each A ⊂ Ω to which we can associate a probability is called an event associated with the experiment.
We see that event A has been observed when we perform a realization of the experiment and an outcome ω ∈ A is obtained.

What is the probability associated with an event? There are various answers to this question, depending on the field of application. For example, an experimental researcher can decide to proceed in the following way: repeat the experiment a large number of times, say N, noting each time whether event A has occurred. Let N_A be the number of times that event A has occurred and let f_A := N_A/N be the corresponding relative frequency. We can take this value f_A as the probability of A: the larger N is, the more valid is the assumption. This manner of proceeding is called the frequency approach: the probability of event A is defined as

    P(A) = lim_{N→∞} N_A / N.    (1.1)

A related approach is one used frequently in fields such as Signal Theory, in which we study phenomena which vary in time in a stationary manner. We define the time average of a signal f as

    ⟨f⟩ := lim_{T→∞} (1/T) ∫_0^T f(t) dt.

For an experiment in which the outcomes are non-negative real numbers (Ω = [0, ∞)), for each A ⊂ Ω we can define the special signal

    I_A(t) = 1 if t ∈ A,   I_A(t) = 0 if t ∉ A.

Then

    ⟨I_A⟩ = lim_{T→∞} (1/T) ∫_0^T I_A(t) dt    (1.2)

can be identified with the probability of having an outcome in A, that is with the probability of A. Clearly equations (1.1) and (1.2) are analogous and both correspond to an experimental approach to the problem of knowledge acquisition.

A radically different approach, because it is a priori, is one which we call Laplacian: there are N possible outcomes, or rather #Ω = N; if an event A can
be given in N_A different ways, or rather #A = N_A, we define the probability of A as

    P(A) = N_A / N.

This approach corresponds to assigning to each event of the form {ω} - called an elementary event since it consists of a single outcome - the same probability 1/N. A composite event A will have as probability the sum of the probabilities of its component elementary events. This approach is called Laplacian because it was proposed by the French mathematician Pierre Simon de Laplace in his famous treatise on the subject [25]. It is also called the equiprobabilistic approach for obvious reasons. In spite of its being limited to experiments with a finite number of outcomes, the approach has found numerous applications in the physical sciences (see [10]).

Finally a third viewpoint, this one also a priori, is the gambler's approach: the probability of a given event is the fraction of my capital which I consider to be a fair bet on the possibility of seeing that event realized. Apart from gamblers, this approach is used by anyone needing to make a decision based on the information at his/her disposal: should I take the umbrella or leave it at home? In the philosophical literature on probability this approach is called the subjectivist approach, see [8].

Much has been written on the relationship between these various approaches to the assignment of probability. In particular the author believes that it is possible to reconcile the apparent discrepancies. At any rate, the question is irrelevant from a mathematical point of view, as was clarified by A. N. Kolmogorov in his celebrated treatise [23]; what is actually important is to clarify the following points:

1. What are the properties of the family of events A? If A is this family, the probability is a function P : A → [0,1].

2. What are the properties of this function?

Obviously Ω must be an event, called a certain event, since it can surely be verified, and the condition P(Ω) = 1 must be true. It is also clear that if A and B are events and A ⊂ B, then P(A) ≤ P(B) must hold. Analogously, if A and B
are events with no outcomes in common, that is A ∩ B = ∅ (we say that A and B are incompatible events), then

    P(A + B) = P(A) + P(B).    (1.3)

Note that A + B means A ∪ B when A ∩ B = ∅. In particular, if A is an event, it is convenient that its complement A^c also be an event, and so P(A^c) = 1 − P(A). For technical reasons it is convenient that the union of countably many events in A also belongs to A. In view of the set theoretic identity (called De Morgan's)

    ∩_{n=1}^∞ A_n = ( ∪_{n=1}^∞ A_n^c )^c,

A is then necessarily closed under the formation of countable intersections.

Definition 1.1.1 We say that a family A of subsets of Ω is a σ-algebra if

i) Ω ∈ A,
ii) A ∈ A ⇒ A^c ∈ A,
iii) A_1, A_2, ... ∈ A ⇒ ∪_{n=1}^∞ A_n ∈ A.

The elements of A are called events.

Definition 1.1.2 Let A be a σ-algebra of subsets of Ω. A function P : A → [0,1] is a probability if

i) P(Ω) = 1;
ii) P( Σ_{n=1}^∞ A_n ) = Σ_{n=1}^∞ P(A_n) for every family {A_n} of events for which A_n ∩ A_m = ∅ if n ≠ m.

The ordered triple

    (Ω, A, P)    (1.4)

is called a probability space if

• Ω is a non-empty set (the space of outcomes),
• A is a σ-algebra of subsets of Ω (the family of events),
• P : A → [0,1] is a probability.

The structure (1.4) constitutes the basis for a mathematical model of an experiment, keeping in mind the non-reproducibility which is often encountered in empirical sciences. For instance, in a coin toss the outcomes are "heads" or "tails" and we can take

    Ω = {H, T}.

All possible events are elements of

    A = { ∅, {H}, {T}, Ω }.

Lastly, if the coin is "fair", the probability P is given by

    ∅ ↦ 0,   {H} ↦ 1/2,   {T} ↦ 1/2,   Ω ↦ 1,

according to the Laplacian approach. All the properties of probability, without exception, are derived from the mathematical model (1.4).
1.2 Random Variables
A much more interesting (and more complex) example of a probabihty space is as follows: • We wish to study the steady state response of a confined heterogeneous aquifer, under conditions of water withdrawal. We measure the hydraulic conductivity at various points in the aquifer for confidence in the results. Hence it is convenient to consider modeling the aquifer as a medium with random characteristics. The scope of the modeling problem will be to construct a probabihty space {£l,A, P) adapted to this situation. The possible outcomes in our experiment of measuring hydraulic conductivity (i.e. the corresponding profiles) are functions which assume a value at each point in the space which we are working in. Let a set S C M be the mathematical representation of this space, illustrated in figure 1.1. Then the elements of £1 are the functions ui : S -> JR, where
Figure 1.1: The aquifer as our physical space. ui(x) = hydraulic conductivity at point x. We shall ask that the space 5 be open, for technical motives, and that the func tions be regular, say continuously differentiable. In fact, let us take
n = c1(5). A more classical notation for the hydraulic conductivity at point x when the profile is ui would be D(x,ui)
=
Lj(x).
Q is made up of continuous functions, thereby we can specify A in a natural way, using the properties of the space of continuous functions with respect to uniform convergence 1 . Let u : S —> IR be the piezometric level, which we assume to be a twice continuously differentiable function: u g C2(S), for each data configuration in the medium. The advantages of this degree of regularity will become apparent below. The confined nature of the medium is manifested in the boundary conditions £
- 0 in dS.
(1.5)
Let us consider the space of all continuous functions on S, with the topology of uniform convergence on compact subsets of S. Then Si is a subspace of this space, from which it inherits the relative topology. Let us take A to be the smallest tr-algebra containing all the open subsets of f2 i.e. A consists of the corresponding Borel subsets of (I. See [31].
The continuity equation asserts that V(D(x,w)u)
=
{x) in S
(1.6)
for each w £ Q, (j>(x) being the local extraction rate per unit volume, that is
I If {x)dv ^! J J JB sec of water is extracted from the zone B, for any open set B C S. determine the function
u: S xil^-
Hence we
St
solving the boundary value problem (1.5), (1.6). For each point x £ S we determine a function w H4 U(X,U) which represents the piezometric level at the point x if the hydraulic conductivity profile in the medium is u> £ fi. In other words the piezometric level at a point is a random variable (see the definition below). In general we are interested in real functions X defined in the space of out comes, for a given probability space (U,A,P). We are interested in associating a probability with all the events of the type • The value of X is larger than b £ St. • The value of X is not larger than a £ IR. • The value of X falls in the interval [a, b], etc. These events will be denoted by means of the symbols (X > b), (X < a), (a < X < b), etc. In general {X £ B) = {w £ 0 : X(w) £ B} for each B C St. For this purpose we introduce the following concept: Definition 1.2.1 A r a n d o m variable in (fi, A, P) is a function X : il —► St such that (X < x) £ A for each x £ St. For a random variable X the probability P(X < x) is always defined, since (X < x) is always an event. It can be shown that (A' > a), (a < X < b) are always events if A is a random variable. In fact (A £ B) is always an event -
and hence P(X ∈ B) is always defined - provided that B is an open or closed set in IR, or the countable union or countable intersection of such sets, or the complement of such a union or intersection, etc. The sets B ⊂ IR which can be obtained starting from the intervals by means of set operations such as countable complementation, union, and intersection are called Borel subsets of IR, in honor of the French mathematician Emile Borel. The set of the Borel sets of IR - denoted by B(IR) or simply B - is a σ-algebra.

Let X and Y be random variables, α, β ∈ IR, and φ : IR → IR a continuous function. Then

i) αX + βY is a random variable,
ii) XY is a random variable,
iii) φ ∘ X is a random variable.

In particular |X|, X², cos(X), sin(X), etc. are random variables if X is. We can also consider complex random variables: if X and Y are real random variables, Z := X + iY is a complex random variable, by definition. Hence e^{iY} := cos Y + i sin Y is a complex random variable and likewise e^Z := e^X e^{iY}.
1.3 Distributions
Let X be a random variable in the probability space (Ω, A, P). We can determine a function F_X : IR → IR (called the cumulative distribution function - CDF - or simply distribution function of X) given by

    F_X(x) := P(X ≤ x).    (1.7)

Clearly if x ≤ y then (X ≤ x) ⊂ (X ≤ y) and hence F_X(x) ≤ F_X(y), that is, F_X is monotonically increasing. Furthermore, when x → +∞ the event (X ≤ x) tends to Ω, and when x → −∞ the event (X ≤ x) tends to ∅. Hence it is reasonable that

    lim_{x→−∞} F_X(x) = 0,    lim_{x→+∞} F_X(x) = 1.    (1.8)

We can prove that this occurs, along with the less intuitive fact that F_X is always right-continuous, that is the following result holds:
Proposition 1.3.1 The cumulative distribution function of a random variable X is monotonically increasing, right-continuous, and satisfies (1.8).

The plots of cumulative distribution functions have the general form shown in figure 1.2 or 1.3.

Figure 1.2: A continuous CDF.
Figure 1.3: A discrete CDF.

How can we determine the cumulative distribution function of a given random variable X? We can sample values of X, repeating N times the corresponding experiment, obtaining for each x ∈ IR the table of results:

    Repetition   Observation   x_i ≤ x ?
    1            x_1           yes
    2            x_2           no
    ...          ...           ...
    N            x_N           yes

Clearly #{i ≤ N : x_i ≤ x}/N ≈ P(X ≤ x). The function

    F_e(x) := #{i ≤ N : x_i ≤ x} / N

is called the empirical cumulative distribution function of X. If x_1 < ... < x_N, it has a plot of the form represented in figure 1.4.

Figure 1.4: An empirical CDF.

The plot is a step function
with all steps equal to 1/N. It is certainly monotone and right-continuous. One would like to determine the theoretical distribution function F from the empirical distribution F_e, and do so with the greatest possible confidence level. In practice, this is achieved by applying a variety of goodness of fit statistical tests, i.e. tests designed for testing the hypothesis

    H_0 : F_e = F

where F is a given distribution function. The hypothesis should be tested in such a way that a wrong rejection of H_0 is made in no more than, say, 100α% of cases². An important statistical test of this kind is the Kolmogorov-Smirnov test, which can briefly be described as follows:

1. Determine the greatest absolute discrepancy between F and F_e, i.e.

    D := sup_x |F_e(x) − F(x)|.

2. Determine a rejection level δ := δ_{N,α}, and

3. Reject the hypothesis H_0 only if D > δ.

See [26] for greater detail on this test, as well as for tables giving δ for various combinations of N and α. A very important example of a continuous cumulative distribution function is given by

    F(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−ξ²/2} dξ,    (1.9)

called the standard Gaussian distribution. We say that X is a standard Gaussian random variable if P(X ≤ x) = F(x), where F(x) is defined in (1.9). For such a random variable we have

    F'(x) = (1/√(2π)) e^{−x²/2}    (1.10)

but it is not the case that any cumulative distribution function is differentiable - see for example Fig. 1.4.

² α is the so called significance level, typically a small positive number.
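The empirical CDF and the Kolmogorov-Smirnov statistic above are easy to compute. A minimal sketch in C, assuming the hypothesised F is the standard Gaussian CDF (evaluated through the C math library's erf) and using a made-up sample; the rejection level δ would still have to be looked up in tables such as those of [26]:

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Standard Gaussian CDF, via the C math library's erf(). */
    static double phi(double x) { return 0.5 * (1.0 + erf(x / sqrt(2.0))); }

    /* Empirical CDF F_e(x) = #{i <= N : x_i <= x} / N. */
    static double empirical_cdf(const double *s, int N, double x)
    {
        int c = 0;
        for (int i = 0; i < N; i++) if (s[i] <= x) c++;
        return (double)c / N;
    }

    static int cmp(const void *a, const void *b)
    {
        double d = *(const double *)a - *(const double *)b;
        return (d > 0) - (d < 0);
    }

    /* D = sup_x |F_e(x) - F(x)|.  Since F_e is a step function, the
       supremum is attained at the sample points (just before or just
       after each jump). */
    static double ks_statistic(double *x, int N)
    {
        qsort(x, N, sizeof(double), cmp);
        double D = 0.0;
        for (int i = 0; i < N; i++) {
            double Fi = phi(x[i]);
            double d1 = fabs((double)(i + 1) / N - Fi);
            double d2 = fabs(Fi - (double)i / N);
            if (d1 > D) D = d1;
            if (d2 > D) D = d2;
        }
        return D;
    }

    int main(void)
    {
        double sample[] = { -0.3, 1.1, 0.4, -1.7, 0.9, 0.2, -0.6, 1.4 }; /* made-up data */
        int N = sizeof(sample) / sizeof(sample[0]);
        printf("F_e(0) = %.3f\n", empirical_cdf(sample, N, 0.0));
        printf("D      = %.3f\n", ks_statistic(sample, N));
        return 0;
    }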
Definition 1.3.1 A random variable X is said to be continuous if its cumulative distribution function has the form

    F_X(x) = ∫_{−∞}^{x} f_X(ξ) dξ.

f_X is called the probability density function (briefly, density) of X. Clearly

    f_X(x) ≥ 0,    ∫_{−∞}^{+∞} f_X(ξ) dξ = 1

for any density function f_X. Besides the standard Gaussian density function (1.10), an interesting example is that of the exponential density function

    f_T(t) = 0 if t < 0,    f_T(t) = λ e^{−λt} if t ≥ 0.

The random variable T has the cumulative distribution shown in figure 1.5.

Figure 1.5: Exponential distribution.

Random variables with exponential density occur frequently in the study of waiting lines and related topics, see [10], [11].

An important class of random variables consists of those which take on a finite or infinite number of values, say x_0 < x_1 < .... Let p_k = P(X = x_k), k = 0, 1, 2, .... We say that the sequence

    p = (p_0, p_1, ...)
(1.11)
is the discrete density of the random variable X. Clearly (set x_{−1} := −∞)

    P(X ≤ x) = 0 if x < x_0,

    P(X ≤ x) = Σ_{k=0}^{m−1} P(x_{k−1} < X ≤ x_k) if x_{m−1} ≤ x < x_m.

Hence

    F_X(x) = Σ_{k=0}^{m−1} p_k if x_{m−1} ≤ x < x_m,

as shown in figure 1.6.

Figure 1.6: A discrete CDF.

Important examples are

a) the discrete binomial density, with parameters n and p: it is said that X ~ B(n, p) if

    P(X = k) = C(n, k) p^k (1 − p)^{n−k},    k = 0, 1, ..., n;

b) the Poisson density with parameter λ: X ~ P(λ) if

    P(X = k) = e^{−λ} λ^k / k!,    k = 0, 1, ....
In general, a sequence (1.11) is a discrete density if and only if

• p_k ≥ 0 for every k, and
• Σ_k p_k = 1.
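As a quick illustration, the following sketch evaluates the binomial and Poisson densities by the term ratios P(X=k+1)/P(X=k) = (n−k)p/((k+1)(1−p)) and λ/(k+1), and checks that each sequence is non-negative and sums to one; the parameter values are made up:

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        int    n = 10;                 /* illustrative parameters */
        double p = 0.3, lambda = 2.5;

        double bk = pow(1.0 - p, n);   /* binomial term for k = 0 */
        double sum_b = 0.0;
        for (int k = 0; k <= n; k++) {
            sum_b += bk;
            bk *= (double)(n - k) / (k + 1) * p / (1.0 - p);
        }

        double qk = exp(-lambda);      /* Poisson term for k = 0 */
        double sum_p = 0.0;
        for (int k = 0; k < 200; k++) { /* truncate the infinite sum */
            sum_p += qk;
            qk *= lambda / (k + 1);
        }

        printf("binomial mass sums to %.12f\n", sum_b);  /* 1 up to rounding */
        printf("Poisson  mass sums to %.12f\n", sum_p);  /* ~1, tiny truncation error */
        return 0;
    }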
1.4 Expected Value
A random variable can take on a large, often infinite, number of values. We are interested in substituting, in place of the possible values of the random variable, a representative value which takes into account the large or small probabilities associated with the above-mentioned values. The following construction gives a three-step definition (1.12) - (1.13) - (1.15) of this representative value of a random variable, to be termed its expected value. It is given in the spirit of abstract integration theory, and follows closely the first chapter of [31].

Suppose X is a non-negative random variable. For each n ≥ 1, consider the points

    x_k = k / 2^n,    k = 0, 1, 2, ...,

and let

    X_n(ω) = k / 2^n    if k / 2^n ≤ X(ω) < (k+1) / 2^n.

Note that X_n → X when n → ∞ a.s. Observe that X_n assumes at most countably many values, depending on the observed outcome ω, i.e. X_n is simple. Let Σ consist of all non-negative simple random variables S such that S ≤ X a.s.; we know now that Σ ≠ ∅. For each S ∈ Σ, let σ_1, σ_2, ... be its values, say S(ω) = σ_k if x_k ≤ X(ω) < x_{k+1}. The weighted average of S is then defined as

    ∫ S dP := Σ_{k=1}^∞ σ_k P(x_k ≤ X < x_{k+1}),    (1.12)

and it is called the integral of S with respect to P. It is a non-negative number, possibly +∞. Define the integral of X with respect to P by means of

    ∫ X dP := sup_{S ∈ Σ} ∫ S dP.    (1.13)
Again, / X dP is a non-negative number, possibly +00. Observe that
/ XdP < J Y dP if X
(1.14)
For a general random variable X, \X\ is a non-negative random variable, and so are X+ and X~, with
x±^\x\±x 2 Observe that
x = x+-x~, \x\ = x+ + x~. Then, the integrals
jx-dP are always well denned, although they can assume +00 as a value. Definition 1.4.1 The integral of a real random variable X with respect to the probability P is denned by
J XdP := j X+dP - J X'dP,
(1.15)
provided at least one of the two integrals on the right is finite. Note that, for integrable X, J X dP can be a real number, +00 or —00. Definition 1.4.2 A random variable X has e x p e c t a t i o n if J X dP is finite. The expected value of X is then EX := /
XdP.
Clearly X has expectation if and only if \X\ has expectation. Moreover, it can be easily verified that E(aX
+ (3Y) = aEX
+ (3EY
for every choice of a, /? 6 IR. Observe that, for a given S £ S, r
hp{S)dP
=
°°
Y,
(1.16)
hence it is natural to expect that, for continuous ip, = [+°°
sup [
•'-oo
the integral on the right-hand side being understood in the sense of RiemannStieltjes. In fact, it is true that E
(117)
J — oo
for a continuous ip, and the expected value exists if and only if the integral on the right hand side exists. See [6]. For instance, • If X has a discrete density {po,Pi, ■ ■ •), i.e. P(X then
= xk) = Pk, « = 0 , 1 , . . .
oo
E
k=0
• If X has a continuous density fx then E
[+°°
J — oo
as follows from the properties of the Riemann-Stieltjes integral, see [6]. N o t a t i o n : If X has expected value, we write X € Li(Q,A, there is no possibility of confusion. Example 1.4.1 a) If X ~ JV(0,1), that is X has density 1
-.e-**l\
then a simple computation shows that EX = 0. b) If X is Cauchy, that is
dy Fx{x) -- 1 f* n J-co 1 + y2 then X £ Lx.
P), or X £ L\ if
c) If A" is binomial, then X £ Lx and
EX=£k(nk^p\l-Pr-k
= np.
d ) If X is Poisson, then X £ Lx and EX = T, ke—±-
= A.
to «
P r o p o s i t i o n 1.4.1 X e i i i / a n d oraZy t/|Jf| € l i and tfien |JBA"| < £ | X | . P r o p o s i t i o n 1.4.2 Let X,Y
£ Lx, a,fl £ 1R, cf> a convex function.
a) aX +/3Y £ Lx and E{aX
+ /3Y) = aEX
Then
+ J3EY.
b ) <j>{EX) < E(cf> o X) (Jenssen's inequality).
In particular if X £ L\ then
E(
+ fi,
hence E(aX
+ H) = l* if X ~ JV(0,1).
Jenssen's inequality has numerous consequences. For example taking <j>(x) = x2 we obtain {EXf < EX2, if X2 £ Lx. In this case we say that X is square summable and we write X £ L2(£l, A, P), or X £ L2 if there is no possibility of confusion. That is L2 = {X £ L, : \X\2 e i i } . If X is square summable then EX2=
f+°°y2dFx(y),
(1.18)
as can be easily verified. If there exists a density (continuous or discrete) the preceding formula becomes
EX2= [+°° y2 fx(y)dy J—oo
CHAPTER
18 or
1. RANDOM
VARIABLES
EX2 = Y^x\pk k
accordingly. If X £ L2, it is possible to estimate its discrepancy with respect to the expected value. In fact the difference X — EX is a new random variable. Since {X - EX)2
= X2 - 2{EX)X
{EX)2,
+
it follows that E(X - EX)2
= EX2 - 2{EX)2
+ {EX)2
= EX2 -
{EX)2.
Definition 1.4.3 If X € L2, we define its variance by VarX := E{X -
EX)2.
Example 1.4.2 a) If X ~ JV(0,1), then X € L2 and
EX2= f , ' i = . ^ = L Hence VarX = 1, since EX = 0. b) Let Y = aX + fi, EX = 0, VarX = 1. Then EY = (j. and VarF = E{a2{X
- EX)2}
= a2.
Example 1.4.3 (The uniform distribution in [0,1]). A random variable U has uniform distribution in [0,1] with density shown in figure 1.7. Hence its cumulative density is as represented in figure 1.8. A simple calculation shows that U € L2 and EU=l/2,
Varf/ = 1/12.
(1.19)
Figure 1.7: Uniform density in [0,1].
Figure 1.8: Uniform CDF in [0,1].
Chapter 2

Random Vectors

This chapter supplements the first, and it has been written in the same spirit. Roughly speaking, here we study those concepts which involve more than one (but no more than countably many) random variables. It lies well within the limits of Classical Probability Theory. The main topics discussed are:

• joint densities and distributions, especially the multivariate Gaussian case (section 2.1),
• independence, convolutions and the classical limit theorem for sums of random variables (section 2.2),
• transformations of random vectors, including the transformations of the corresponding joint densities (section 2.3).

2.1 Joint Distributions
Consider the example on the confined aquifer in section 1.2. An infinite family F of random variables

    {Y_x, x ∈ S}    (2.1)

was constructed there, where

    Y_x(ω) := u(x, ω)

is the piezometric head at x corresponding to a hydraulic conductivity profile ω.
Suppose finitely many points pi,... ,pn G 5 are selected for measurement and let the random variables Xi,..., X„ be defined by X{ := YPi,
i=l,...,n.
Then, X := (Xi,... ,Xn) constitutes a r a n d o m vector: it is a vector valued random quantity. Abstractly, given a probability space (Q,A,P), an n-dimensional r a n d o m vector is a function X : Cl —> lRn such that each Xi is a random variable. Here X{ is the i-th coordinate of X, defined by X(u>) =
(X1(u,),...,Xn(u,)).
Just as in the scalar case, the probabilistic properties of a random vector X are embodied in its distribution function Fx : IRn —> IR, defined as follows: FXx
x„(*i, ...,*•.)•■= P ( * i < »i, • • ■ , * „ < »„),
where the various "," stand for "fl", i.e. (A, B ) E i f l B ,
for A, 5 € A.
Clearly
n(*<*ot "nV.-<*o when i n —► +oo, therefore J^+oo^1
^ O " * ' ' • • > * » ) = F *>
X._ t (!Bl,-..,!Bn-l)-
(2.2)
In addition i=l
3=1
for any permutation { i i , . . . , i n } of { 1 , . . . , re}, hence the invariance property FxH,...,xin(z>i,
■ ■ ■ ,xin)
= FXl
Analogously
n(Ai
x„{xi, . . . , » „ ) .
(2.3)
when any of the as,-'s goes to —oo. Thus '$2mF*i
xn{xu...,xn)
= Q, i = l,...,n.
(2.4)
Clearly n?=i(-^; < z«) increases with any of the x;'s if the remaining coordinates remain fixed. Therefore, each of the n functions Xi^FXl
X„(xu...,xn)
i = l,...,n,
(2.5)
is m o n o t o n i c non-decreasing. Finally, a more technical property of multidimensional (joint) distribution functions is that each of the n functions *i^FXl
Xn(x%,...,xn)
i = l,..., n
(2.6)
is right-continuous. See e.g. [6] for a proof of these properties of joint distribu tions. On the other hand, experiments can at most furnish the joint distribution function of Xi,... ,Xn and not the random variables themselves, and even less so the probability space ( 0 , - 4 , P), which is the actual model one would Uke to have. In fact, experiment furnishes even less, given the practical Umitation on the number of measurements that can be performed. Indeed finitely many (say N) values of Xj,..., Xn are measured £ii, •■•,&!»,
i =
l,---,N,
from which the quotient #{t<W:frj<»i, ...,&»<*„} N can be evaluated for each given ( i i , . . . , i „ ) £ IRn, at least in principle. This determines the empirical distribution of X^,..., X„, from which a multidi mensional distribution function F : lRn -4 St can be inferred. The m o d e l i n g problem is: Find t a probability space (fl, A, P) and • a random vector X : Q —> lRn such that FXl
x„{xu
...,xn)
= F{xu...,
xn).
CHAPTER 2. RANDOM
24
VECTORS
Clearly, the above problem has a solution only if F is both right continuous and monotonic non decreasing in each variable separately. Let us assume that such is the case. For each non empty ordered subset {*i, • ■ ■ ,h} of { 1 , . . . , n } , i.e. for each fc-tuple ( l i , . . . ,ik) of elements of { 1 , . . . ,n}, define the corresponding marginal distribution Fxu,...,Xn[ti, ■ ■ ■, tfc) = lim F(xu . . . , x„) where "lim" means: 1. Take ]imXj_>+00 for each j $ {ii,..
.,ik}.
2. Set Xj = tj, j G {*!,.. . , 4 } . For instance F1(t)= ^1
Mm
F(t,x2,...,xn)
n(h, ■ ■ ■ ,tn)
= F(ti,
...
,tn),
etc. Consider the fc-tuple {»i,.. . ,ik} and the /-tuple { j i , . . . ,ji}- If k = I and both fc-tuples are just permutations of each other, then one should have Fn
H
= Fn
A-
(2-7)
On the other hand, if fc < / and ji — i 1 ; . . . , j ^ = i^, then one should have F
H
>t( =
.
f
l f ■ I**)
i"11-^^!
'»^*+»
(2-8)
JI(*I> - - - »**.*fc+i. - ••,<()■
tj-H-oo,j>*;
Compare (2.7) with (2.3) and (2.8) with (2.2), respectively. The famous Kolmogorov Extension Theorem states that if both (2.7) and (2.8) hold, then the modeling problem has a solution. Conditions (2.7) and (2.8) are referred to in the literature as Kolmogorov consistency conditions. See chapter 1 of [24] for a proof of Kolmogorov's Theorem. Just as in the scalar case, a joint distribution may have a density. In such a case ■ ■■ ■oo
J — oo
fxi,,..,xn{ti,-.',t;n)dii
...d£n
and
'* *■ = o^t:-
(2 9)
-
An example of joint density is the standard multidimensional Gaussian dis tribution; a random vector X has such a distribution - X ~ JV(0, / ) - if its density is fx(x)
= -^=e-IWI 3 / 2 ,
(2.10)
where ||-|| denotes the ordinary EucUdean norm in IR". In general a continuous function / : JRn —> 2R will be a probabihty density only if • f(x)
> 0 for all x G IR,
• SlR" f(x)dx
= 1-
If the components X i , . . . , X n of a random vector X have expectation (i.e. if Xi G £ i , i = 1 , . . . , n ) , then we say that .X G Z• ••,£*»)r-
(2-11)
Analogously we say that X G £2 if -Xi G £2, i = 1, ■ • •, ra. In such a case, each of the expectations EXiXj exist, in virtue of Schwarz's inequality (4.14). Thus the random matrix (X — EX)(X — EX)T has expectation. The covariance matrix of X is defined as - EX)T'.
Cov(X) := E(X - EX){X The (i,j)-th
(2.12)
element of Cov(X) is ea := E{Xi - EXi)(Xj
-
EXjf,
and it is referred to as the covariance of Xi and Xj. A simple computation shows that if X ~ N(0,1), then X has mean 0 G IRn and its covariance matrix is the identity matrix / G IRnxn.
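A small sketch of how the mean (2.11) and covariance matrix (2.12) might be estimated from repeated observations of a random vector; the data and the plain 1/N normalization are illustrative choices rather than anything prescribed by the text:

    #include <stdio.h>

    #define N_OBS 4   /* number of observations (illustrative) */
    #define DIM   2   /* dimension of the random vector */

    /* Sample analogues of (2.11)-(2.12):
       mean[i]   = (1/N) sum_k x[k][i]
       cov[i][j] = (1/N) sum_k (x[k][i]-mean[i])(x[k][j]-mean[j]) */
    int main(void)
    {
        double x[N_OBS][DIM] = { {1.0, 2.0}, {2.0, 1.5}, {0.5, 3.0}, {1.5, 2.5} }; /* made-up data */
        double mean[DIM] = {0}, cov[DIM][DIM] = {{0}};

        for (int k = 0; k < N_OBS; k++)
            for (int i = 0; i < DIM; i++)
                mean[i] += x[k][i] / N_OBS;

        for (int k = 0; k < N_OBS; k++)
            for (int i = 0; i < DIM; i++)
                for (int j = 0; j < DIM; j++)
                    cov[i][j] += (x[k][i] - mean[i]) * (x[k][j] - mean[j]) / N_OBS;

        for (int i = 0; i < DIM; i++)
            printf("mean[%d] = %.3f   cov row: %.3f %.3f\n",
                   i, mean[i], cov[i][0], cov[i][1]);
        return 0;
    }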
2.2 Independence
In courses on Elementary Probability one defines the conditional probability of A given B as p{AnB)
P(A\m-
P{AlB)
P(B) ' provided both A and B are events and P(B) / 0. One can say that A and B are independent if P{A\B) = P(A) and P(B\A) = P{B), if both conditional probabilities are defined. Thus A and B are indepen dent if and only if P(AnB) = P(A)P(B). (2.13) Clearly, this last condition makes sense even if either P(A) or P(B) hence can be (and usually is) taken as the definition of independence B. Two random variables X and Y are independent if the events (X (V < y) are independent in the sense of (2.13) for each choice of x,y € P(X <x,Y
= P(X<
x)P{Y
vanishes, of A and < x) and 1R, i.e. if
< y).
In other words X and Y are independent if and only if Fx,Y(x,y)
= Fx{x)FY{y).
(2.14)
More generally the random vector X £ 1R" has i n d e p e n d e n t c o m p o n e n t s if all its marginals Fx^^.^Xn have the multiplicative property **.
xik = FXil---FXit,
(2.15)
for each ordered fc-tuple ( i j , . . . ,ifc) with { i j , . . . ,ifc} C { 1 , . . . ,n} with k < n. N.B. It does not suffice to ask that FXl Xn have the multiplicative property for k = n only. If X and Y have a joint density fxy, then both X and Y have a density (fx and / y , respectively), namely the marginals +oo
fxy{x,y)dy
/
-oo /+oo
fxy{x,y)dx. -OO
By (2.14), if X and Y are independent
f
r
fxr&vWdr,
= f
J—OO J — OO
fxtiWF
Ml)*l
J— OQ
=
/"
J—OO
f
fx(()fY(r))dZdr,.
J — oo J—oo
Apply Fubini's theorem and (2.9) to obtain fxv(x,y)
= fx[x)fY{y)
(2.16)
if X and Y are independent random variables with a joint density fx,Y- Con versely, if the joint density factors into the marginal densities, then (2.14) holds. Thus (2.16) is a necessary and sufficient condition for independence, provided the joint density exists. Let X and Y be independent and let <j> and V> be bounded and continuous. Then (f>(X)if>{Y) € Lu and EftXMY)
=
[+aa [+X J—OO
=
4>(x)l>(y)dFxY(x,y)
J—OO
[+°° 4>{x)dFx(x) [+°° j>{y)dFy(y), J—oo
J — oo
i.e. under independence of X and Y, E{X)j>(Y) = E(X)E4,{Y).
(2.17)
In particular, if X, Y € L2 are independent, then
w
, = (-f>Vat°(r)).
In fact, by (2.17) E{X - EX)(Y - EY) = E(X - EX)E{Y
- EY) = 0.
Moreover Var(aX + bY) = a 2 Var(X) + 2abE{X - EX){Y - EY) + b2Va.r{Y), i.e. Var(aX + bY) = a 2 Var(X) + 6 2 Var(y) if X and Y are independent, for any choice of a, 6 € IR. The above results can be easily generalized:
CHAPTER
28
2. RANDOM
VECTORS
Proposition 2.2.1 If an n-dimensional random vector X has independent ponents, then Cov(X) = (fe»0(Var(Xi),..., Var(X„))
com
and Var cTX = c^Var(Xx) + ■ • ■ + c* Var(X„) for any c £ Mn. Let us apply the above results to the following particular situation: sampling a given random variable, like when a measurement is repeated a certain number of times. Suppose X is a given random variable. A sample of length n from X is a finite sequence of independent random variables Xi,..., Xn, each of them hav ing the same distribution as X. The components of the sample are said to be independent and identically distributed ("iid" for short). Given a sample, the sample mean --' + X " n is computed in order to estimate the "value" of X. Suppose X £ L2, with Mn :=
Xl +
EX = m,
(2.18)
VarX = a2.
Then 2
EMn = m,
VarM„ = — (2.19) n and the advantage of forming the sample average becomes apparent: While the mean is unaltered, the variance reduces when the number n of observations in creases. Assuming there is a density, the situation is as depicted in figure 2.1.
Thus, forming the arithmetic mean of a sample of measurements higher precision of the estimate. One feels tempted to assert that M„ —> m as n —> oo.
results in a
Figure 2.1: Density of the sample mean when the number of observations n increases.

In fact it is true that

    P( lim_{n→∞} M_n = m ) = 1    (2.20)

if X_1, X_2, ... ∈ L_2 are iid random variables with mean m. This result is known as the Strong Law of Large Numbers. See [24] for a proof of this important theorem, whose practical value cannot be over-emphasized. In particular, let A be an event and let

    X = 1 if ω ∈ A,    X = 0 if ω ∉ A,

so that EX = P(A). Note that X_1 + ... + X_n = #{i ≤ n : X_i ∈ A} and (2.20) specializes into

    P( lim_{n→∞} #{i ≤ n : X_i ∈ A} / n = P(A) ) = 1,
which provides an a posteriori justification of the Principle of Statistical Regularity and thereby of the whole Frequency Approach to Probability. On the other hand, we know that

    E M_n = m,    Var M_n = σ²/n,

but we know nothing about the distribution of M_n. It is true that E Z_n = 0, Var Z_n = 1, where

    Z_n := (M_n − m) √n / σ.    (2.21)

The Central Limit Theorem states that if X_1, ..., X_n ∈ L_2 are iid with mean m and variance σ², then

    lim_{n→∞} P(Z_n ≤ x) = (1/√(2π)) ∫_{−∞}^{x} e^{−ξ²/2} dξ    (2.22)

uniformly in x ∈ IR. Thus, the sample mean is given by

    M_n = m + (σ/√n) Z_n,

where the normalized errors Z_n are "asymptotically Gaussian". This explains at least partly the ubiquitous character of the Gaussian distribution.
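The Law of Large Numbers is easy to watch numerically. A minimal sketch, assuming the C library's rand() as the underlying random number generator (seed and sample sizes are arbitrary); the printed sample means settle near 1/2, with fluctuations shrinking like 1/√n:

    #include <stdio.h>
    #include <stdlib.h>

    /* Sample mean of n pseudo-random uniforms on [0,1]: by (2.19) it has
       mean 1/2 and variance 1/(12 n). */
    static double sample_mean(int n)
    {
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += rand() / (double)RAND_MAX;
        return s / n;
    }

    int main(void)
    {
        srand(1991);  /* arbitrary seed */
        for (int n = 10; n <= 100000; n *= 10)
            printf("n = %6d   M_n = %.5f\n", n, sample_mean(n));
        return 0;
    }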
2.3 Transformations of Random Vectors
Let X ~ JV(0, / ) and define Y = AX + b, n
(2.23)
nxn
T
where b G R , A e IR . Then EY = b and Cov(F) = AA , as can be easily verified. But, which is the distribution of Yl In general, let X be an n-dimensional random vector with distribution Fx, and let g : IRn —> IR" be a continuous transformation. Define the new random vector Y : O -> iR n by setting y(«) =
ff(A-H),
wtfl,
-- briefly Y = g(X). g{X). What is the distribution of YI Y? Does Y have a density if jX does? Let X have a continuous density and suppose g is continuously differentiablt differentiate, with a non vanishing Jacobian. Then it is given by the equations yi
=
gi(xi,.. ■ » » ! . )
=
Sfi(a:i,...,ajB)
yn = Vn =
gn(xi,. ■ » * n ) gn{xu...,xn)
Vi
whereas its inverse is given by equations ss ii
= =
2x„n
=
hi(yu•...,y .yn) n) fti(yi). ). hn(yhx,.n(yu■...,y > nyn).
We know that J:_8(x u...,xn) J:=8(xu...,xn)
oo{yu---,yn) (yi,..-,yn)
Let Let
(( -- 00 00 ,, yy] ] := := {77 {77 ee 2R" 2R" :r?i :r?i < < yi, yi,
= ll ,, .. .. .. ,, nn }} .. tt =
Then P(Y
= =
/■■■/ /*, yJ J»-»((-OO,V]) Jg-m-oo,y])
/[ - ■- ■[ /
J
./(—00,y] J(—oo ty]
fXt /*,
X„(®l,--r*n)d£l---d£n
Xa{*l(Vl,---,Vn),--->*n(VU---iVn))\J\dll"'
by the theorem for change of variables in a multiple integral [1]. Then Y has a density if X does, and ...,Yn(Vu--,Vn) fru...,Yn(Vu••»!&») =
/*, fx,
d(Xi x,xn1) d(xu... .■ jr„(3l(yi, ■ ■ • > Vn), • ■ • . »n(yi, ■ ■ d(y^...[yl) ■ > Vn)) T,(gl(Vl,---.yn),.--,»~(Vl,---,Vn)) d(yi,...,yn)
2 24 ((2.24) - )
In (2.23), assume A is nonsingular. Then the Jacobian needed is det(A ^/det{AAT)~\
) =
Since 1
/
II
Il2
fx{x) =
exv
v^
1
-rf ii^m^ri^m
by (2.24) we obtain
/.a-) , my>
^/(2,r)»det(^)
P
l
2
/
This is the multivariate Gaussian distribution with mean b and covariance matrix AA^T, and we say that Y ~ N(b, AA^T). In general, the multivariate Gaussian distribution with mean m and covariance K > 0 corresponds to the random vector Y with density

    f_Y(y) = (1/√((2π)^n det K)) exp( −(1/2)(y − m)^T K^{−1} (y − m) ).    (2.25)

The factorization

    K = L L^T    (2.26)

can be carried out efficiently in an algorithmic fashion, with L a lower triangular matrix (Cholesky). Thus, a random vector Y ~ N(m, K) can be realized from a Gaussian vector X ~ N(0, I) by simply transforming it according to

    Y = L X + m.
(2.27)
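A compact sketch of the recipe (2.26)-(2.27): factor K = LL^T and set Y = LX + m. The 2×2 covariance, the mean vector and the use of Marsaglia's polar method (described in Chapter 3) for the standard normals are illustrative choices, not requirements of the text:

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N 2  /* dimension, kept small for illustration */

    /* One standard Gaussian variate (polar method, cf. Chapter 3). */
    static double gaussian(void)
    {
        double u, v, s;
        do {
            u = 2.0 * rand() / RAND_MAX - 1.0;
            v = 2.0 * rand() / RAND_MAX - 1.0;
            s = u * u + v * v;
        } while (s >= 1.0 || s == 0.0);
        return u * sqrt(-2.0 * log(s) / s);
    }

    /* Cholesky factorization K = L L^T for a symmetric positive definite K. */
    static void cholesky(const double K[N][N], double L[N][N])
    {
        for (int i = 0; i < N; i++)
            for (int j = 0; j <= i; j++) {
                double s = K[i][j];
                for (int k = 0; k < j; k++) s -= L[i][k] * L[j][k];
                L[i][j] = (i == j) ? sqrt(s) : s / L[j][j];
            }
    }

    int main(void)
    {
        double K[N][N] = { {2.0, 0.6}, {0.6, 1.0} };  /* illustrative covariance */
        double m[N]    = { 1.0, -1.0 };               /* illustrative mean */
        double L[N][N] = {{0}}, x[N], y[N];

        cholesky(K, L);
        for (int i = 0; i < N; i++) x[i] = gaussian();   /* X ~ N(0, I) */
        for (int i = 0; i < N; i++) {                    /* Y = L X + m  (2.27) */
            y[i] = m[i];
            for (int j = 0; j <= i; j++) y[i] += L[i][j] * x[j];
        }
        printf("sample of N(m,K): (%.3f, %.3f)\n", y[0], y[1]);
        return 0;
    }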
As another example, let us obtain the distribution of the sum of a pair of independent random variables X, Y with marginal densities f_X and f_Y, respectively. Note that, by the independence assumption, f_{X,Y}(x,y) = f_X(x) f_Y(y). Consider the transformation u = x + y, v = y, with unit Jacobian. Then

    f_{U,V}(u,v) = f_X(u − v) f_Y(v)

and the desired marginal density is

    f_{X+Y}(u) = ∫_{−∞}^{+∞} f_X(u − v) f_Y(v) dv.

In other words, f_{X+Y} = f_X * f_Y if X and Y are independent, where "*" denotes the convolution operation, defined as

    (f * g)(x) = ∫_{−∞}^{+∞} f(x − y) g(y) dy    (2.28)

for f, g ∈ L_1(IR).
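Definition (2.28) can be checked numerically. The sketch below convolves the two uniform densities on [0,1] by a Riemann sum (grid size and evaluation points are arbitrary choices); the result approaches the triangular density of the sum, 1 − |x − 1| on [0,2]:

    #include <stdio.h>

    #define M 200   /* grid points per unit interval */

    static double unif(double t) { return (t >= 0.0 && t <= 1.0) ? 1.0 : 0.0; }

    /* Numerical convolution (f*g)(x) = ∫ f(x-y) g(y) dy, step h = 1/M. */
    int main(void)
    {
        double h = 1.0 / M;
        for (double x = 0.25; x <= 1.75; x += 0.5) {
            double s = 0.0;
            for (int j = 0; j < M; j++) {
                double y = (j + 0.5) * h;
                s += unif(x - y) * unif(y) * h;
            }
            printf("f_{X+Y}(%.2f) ~ %.4f\n", x, s);   /* exact value: 1-|x-1| */
        }
        return 0;
    }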
Certainly the distribution of X + Y coincides with that of Y + X, and therefore one should expect that

    f * g = g * f.    (2.29)

Moreover X + (Y + Z) = (X + Y) + Z, hence one should have

    f * (g * h) = (f * g) * h.    (2.30)

Both properties (2.29) and (2.30) can be readily derived from the definition (2.28). It is possible to consider sums X_1 + ⋯ + X_n of n independent random variables: their distribution is given by the density function

    f_{X_1} * ⋯ * f_{X_n},

without ambiguity because of (2.30). In particular, if the X_i's are iid, with common density f_X, then X_1 + ⋯ + X_n has density f_X^{*n}, where

    f_X^{*n} := f_X * ⋯ * f_X    (n times).

Further, the sample mean M_n of (2.18) has density

    f_{M_n}(x) = n f_X^{*n}(n x).

Moreover, the normalized mean (2.21) has density

    f_{Z_n}(z) = σ √n f_X^{*n}(n m + σ √n z).

In this notation, the Central Limit Theorem can be restated as follows:
Proposition 2.3.1 Let X ∈ L_2, with EX = m, Var(X) = σ², and suppose it has a density f_X. Then

    lim_{n→∞} σ √n ∫_{−∞}^{x} f_X^{*n}(n m + σ √n z) dz = (1/√(2π)) ∫_{−∞}^{x} e^{−ζ²/2} dζ

uniformly in x ∈ IR. Of course, equation (2.22) is richer in probabilistic meaning, and therefore is preferable, from our viewpoint.
Chapter 3

Sampling Random Variables

This third chapter concludes our tour into elementary Probability. It deals with the more practical aspects associated with probabilistic models. This subject matter is to probability theory what the various numerical methods are for linear algebra, differential equations, etc. The main topics touched upon in this chapter are:

• design of sampling methods suitable for simulating a given random variable or vector on a computer, assuming a random number generator is available,
• design of Monte Carlo techniques for the solution of deterministic problems via simulation. Applications are given to evaluation of integrals and solving systems of linear equations,
• analysis of the error incurred when using Monte Carlo techniques, and some ideas on variance reduction that can help in order to achieve greater accuracy in computation.

3.1 Designing a Sampling Technique

Let U_1, ..., U_n be iid, each of them uniformly distributed in the unit interval. By (1.19) and (2.19), the sample mean (U_1 + ⋯ + U_n)/n has mean 1/2 and variance 1/(12n), and therefore

    V_n := (U_1 + ⋯ + U_n − n/2) / √(n/12)

has mean 0 and variance 1. In addition, by the Central Limit Theorem the distribution of V_n approaches N(0,1) for large n. Assume a sample u_1, ..., u_n (with n "large") is available, and suppose it was drawn from the uniform distribution in [0,1]. Then

    v_n := (u_1 + ⋯ + u_n − n/2) / √(n/12)
(3.1)
yields an individual sample which can be assumed to have been drawn from N(0,1), provided n is sufficiently large. Thus, generating a single sample from N(0,1) involves the following:

i) generating n individual samples from the uniform distribution¹,
ii) performing n sums, two divisions and one multiplication, plus
iii) extracting one square root.

If a large number m of individual samples from N(0,1) is required, then the three steps above must be repeated m times, and doing so may be computationally expensive when n is large. The corresponding overhead can be drastically reduced if n = 12, in which case (3.1) specializes into

    v = u_1 + ⋯ + u_12 − 6.
(3.2)
The three steps above reduce to

i) generating 12 independent random numbers,
ii) performing 11 sums and one subtraction.

If a very large number m of standard Gaussian variates are to be generated, using (3.2) instead of (3.1) with other values of n may result in substantial savings. Random numbers are generated on a computer by calling a built-in routine, implemented in most high level compilers. Underlying such routines there are very interesting deterministic (surely chaotic) algorithms, whose consideration would take us too far afield. The interested readers are referred to [22].

¹ Also called random numbers.
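A minimal sketch of recipe (3.2), assuming the C library's rand() stands in for the random number generation routine mentioned above (seed and output count are arbitrary):

    #include <stdio.h>
    #include <stdlib.h>

    /* Approximate standard Gaussian variate by (3.2):
       add 12 independent uniforms on [0,1] and subtract 6. */
    static double approx_gaussian(void)
    {
        double s = 0.0;
        for (int i = 0; i < 12; i++)
            s += rand() / (double)RAND_MAX;
        return s - 6.0;
    }

    int main(void)
    {
        srand(12);                     /* arbitrary seed */
        for (int i = 0; i < 5; i++)
            printf("%f\n", approx_gaussian());
        return 0;
    }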
However, (3.2) still requires 12 calls to the random number generation routine to generate just one (approximately) standard Gaussian variate. The following algorithm requires 2 calls to the random number generation routine to produce two independent samples from N(0,1). It is called the polar method, or Marsaglia's, and it can be described as follows, using a C-like syntax:

    do {
        generate u, v;                       /* two fresh random numbers in [0,1) */
        u = 2.0*u - 1.0;  v = 2.0*v - 1.0;   /* map them to the square [-1,1) x [-1,1) */
        s = u*u + v*v;
    } while (s >= 1.0);
    r = sqrt(-2.0 * log(s) / s);
    x = r * u;  y = r * v;

The purely arithmetic part of this algorithm generates a pair of numbers x, y for every pair of random numbers. Moreover, it requires 3 multiplications, 1 addition, 1 division, 1 evaluation of a logarithm and one extraction of a square root. In addition, it may waste one or more pairs of calls to the random number generation routine, until the condition s < 1 is obtained. So it is costly, too. However, in our experience the polar method is quite reliable and does not cost much more than the method outlined earlier, on a per-Gaussian-variate basis. Moreover, it is exact, and this circumstance may lead to more reliable results. Indeed, it is a theorem that x, y in the polar method are two independent standard Gaussian variates. The proof, incidentally, is based upon the change of variable formula (2.24) and can be found in [22].

Observe that n independent standard Gaussian variates constitute a sample of an N(0, I) random vector. Indeed,

    (1/√((2π)^n)) e^{−||x||²/2} = Π_{i=1}^{n} (1/√(2π)) e^{−x_i²/2}.
Therefore, one sample of N(m, K), with K > 0, can be computed by proceeding as follows:

i) Factorize K into two lower triangular factors L and L^T (Cholesky).
ii) Generate n independent samples from N(0, 1).
iii) Apply the transformation Lx + m to such a sample.

Figure 3.1: Inverting a given distribution.

In general, one may want to sample a given random variable X, not necessarily Gaussian. Suppose its distribution function is strictly monotonic. If U is uniformly distributed in [0,1],

    P( F_X^{−1}(U) ≤ x ) = P( U ≤ F_X(x) ) = F_X(x).
That is, Y := F_X^{−1} ∘ U has the same distribution as X. This observation gives a general method for sampling a given distribution: if u is a random number and x is the unique solution of

    F_X(x) = u    (3.3)

(see the sketch in figure 3.1) then x is an individual sample of X. As an illustration, consider the exponential distribution with parameter λ, for which

    F_X(x) = 0 if x < 0,    F_X(x) = 1 − e^{−λx} if x ≥ 0.

Then

    x = −(1/λ) log(1 − u)
gives a sample from X if u is a random number. So does

    y = −(1/λ) log u,    (3.4)

by the way, with the added advantage of saving one subtraction. If equation (3.3) is difficult to solve, an open possibility is devising an approximation φ to F_X^{−1} and using

    x = φ(u)

instead of solving (3.3). See [37] for additional tricks for sampling random variables.
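A small sketch of the inversion method for the exponential distribution, using (3.4); the rate λ and the sample size are arbitrary, and rand() again stands in for the random number generator:

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Inversion method (3.3): solve F_X(x) = u.  For the exponential
       distribution this gives x = -log(u)/lambda, as in (3.4). */
    static double exponential(double lambda)
    {
        double u = (rand() + 1.0) / ((double)RAND_MAX + 2.0); /* keep u strictly in (0,1) */
        return -log(u) / lambda;
    }

    int main(void)
    {
        double lambda = 2.0;   /* illustrative rate */
        double sum = 0.0;
        int    n = 100000;
        srand(7);
        for (int i = 0; i < n; i++)
            sum += exponential(lambda);
        printf("sample mean = %.4f  (theoretical 1/lambda = %.4f)\n",
               sum / n, 1.0 / lambda);
        return 0;
    }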
3.2 Monte Carlo Methods

The Monte Carlo method for solving problems consists of

• devising a probabilistic model which contains the problem's solution as a parameter, and
• sampling the relevant random variables in order to estimate such parameter.

Typically, the parameter to be estimated is the mean of a certain random variable X. By the Law of Large Numbers, the sample mean constitutes a good estimate of EX, for large n. Hence the solution to the original problem can be approximated by means of a probabilistic mechanism, even if the original problem itself was deterministic in character. For instance, consider the problem of evaluating an integral

    I = ∫_0^1 f(x) dx,    (3.5)

for continuous f. If U is a random variable uniformly distributed in [0,1], then f(U) is a random variable, and Ef(U) = I. Given a sample U_1, ..., U_n, the Monte Carlo estimate for the above integral is

    I ≈ ( f(U_1) + ⋯ + f(U_n) ) / n,
the larger n the better.

In general, let α : IR → IR be a bounded, right continuous, non-decreasing function. Consider the problem of evaluating the Stieltjes integral

    I = ∫_{−∞}^{+∞} f dα

with f as above. Let m := inf α, M := sup α, and define F := (α − m)/(M − m), so that

    I = (M − m) ∫_{−∞}^{+∞} f dF.

Observe that 0 ≤ F(x) ≤ 1 and, moreover,

    lim_{x→−∞} F(x) = 0,    lim_{x→+∞} F(x) = 1,

so that F is indeed the distribution function of a random variable X. Therefore

    I = (M − m) Ef(X).

In theory, in order to evaluate I approximately, it suffices to draw a sample X_1, ..., X_n of X and form the estimate

    I ≈ (M − m) ( f(X_1) + ⋯ + f(X_n) ) / n.
As an example, let

    I = ∫_0^{+∞} e^{−λx} f(x) dx.

Then

    I = ( ∫_0^{+∞} f(x) λ e^{−λx} dx ) / λ = (1/λ) Ef(X),

where X is exponentially distributed with parameter λ. Using equation (3.4), a sample U_1, ..., U_n from the uniform distribution in [0,1] can be converted into a sample Y_1, ..., Y_n from the exponential distribution. Then

    I ≈ (1/λ) ( f(Y_1) + ⋯ + f(Y_n) ) / n

for large n.
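A minimal sketch of the Monte Carlo estimate of (3.5); the integrand f(x) = exp(−x²) is a made-up example, not one from the text:

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Monte Carlo estimate of I = ∫_0^1 f(x) dx ≈ (1/n) Σ f(U_i),
       with U_i iid uniform on [0,1]. */
    static double f(double x) { return exp(-x * x); }

    int main(void)
    {
        int    n = 1000000;
        double sum = 0.0;
        srand(42);
        for (int i = 0; i < n; i++) {
            double u = rand() / (double)RAND_MAX;
            sum += f(u);
        }
        printf("I ~ %.5f\n", sum / n);   /* about 0.7468 for this particular f */
        return 0;
    }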
This idea is by no means confined to the approximate evaluation of one dimensional integrals. Indeed, let R be a region in IR^d, and let f, φ : R → IR be two continuous functions. Suppose

    φ(x_1, ..., x_d) ≥ 0 in R

and let

    M := ∫ ⋯ ∫_R φ(x_1, ..., x_d) dx_1 ⋯ dx_d < +∞.

Then, p := φ/M is the density of a certain random vector X ∈ IR^d. Therefore,

    I := ∫ ⋯ ∫_R f(x_1, ..., x_d) φ(x_1, ..., x_d) dx_1 ⋯ dx_d = M Ef(X),
thus allowing for a Monte Carlo evaluation of I, once a sample X_1, ..., X_n of X is obtained, with large n. For instance, let

    I_ω := ∫ ⋯ ∫ e^{−(ω_1 x_1² + ⋯ + ω_d x_d²)} f(x_1, ..., x_d) dx_1 ⋯ dx_d,

where ω := (ω_1, ..., ω_d), with ω_i > 0, i = 1, ..., d. Letting

    y_i = √(2 ω_i) x_i,    i = 1, ..., d,

it follows that

    I_ω = ( π^{d/2} / √(ω_1 ⋯ ω_d) ) E f( Y_1/√(2ω_1), ..., Y_d/√(2ω_d) ),    (3.6)

where Y_1, ..., Y_d are independent standard Gaussian random variables. We have seen how to sample this distribution, and we can use such knowledge to compute a Monte Carlo estimate of I_ω based on (3.6).

As a last illustration of the Monte Carlo idea, consider the problem of solving a system of linear algebraic equations

    Ax = b,
(3.7)
which is assumed to have a unique solution x̄ ∈ IR^d. Let us define the quadratic form

    Φ(x) = ||Ax − b||²_Q := (Ax − b)^T Q (Ax − b),    (3.8)

with Q a positive definite and symmetric matrix. Then, solving (3.7) is equivalent to minimizing Φ. Define the ellipsoid E = {x ∈ IR^d : Φ(x) ≤ 1}. Observe that the centre of the ellipsoid is precisely x̄ := A^{−1} b. Moreover,

    Φ(x) = (Ax − b)^T Q (Ax − b) = (x − x̄)^T A^T Q A (x − x̄),

so that e^{−Φ(x)} differs from the N(x̄, (A^T Q A)^{−1}) density by a mere normalizing factor, namely ∫_{IR^d} e^{−Φ(x)} dx. Therefore

    x̄ = ( ∫_{IR^d} x e^{−Φ(x)} dx ) / ( ∫_{IR^d} e^{−Φ(x)} dx ) = EX,    (3.9)

with X ~ N(x̄, (A^T Q A)^{−1}). In principle, it suffices to sample N(x̄, (A^T Q A)^{−1}) and estimate x̄ by the corresponding sample mean, but this may not be the most efficient Monte Carlo technique we can think of. Instead, we can approximate the two integrals appearing in (3.9) using the Monte Carlo ideas presented above. For, observe that E is bounded. In fact, let it be contained in, say, the parallelepiped P = [a_1, b_1] × ⋯ × [a_d, b_d]. Let V := (V_1, ..., V_d) be uniformly distributed in P, i.e. each V_i is uniformly distributed in the corresponding interval [a_i, b_i], i = 1, ..., d, and the various V_i's are independent. We know how to sample V. Let V^1, ..., V^n be a sample drawn from the distribution of V. Then

    ∫_{IR^d} e^{−Φ(x)} dx ≈ (vol(P)/n) Σ_{k=1}^{n} e^{−Φ(V^k)}

(and similarly for the numerator of (3.9)), and therefore

    x̄_i ≈ ( Σ_{k=1}^{n} V_i^k e^{−Φ(V^k)} ) / ( Σ_{k=1}^{n} e^{−Φ(V^k)} ),    i = 1, ..., d,
gives the components of the solution vector x. Observe that each component of the solution can be estimated independently, thus resulting in a naturally parallel algorithm. Additional examples of the Monte Carlo technique can be found e.g. in [32] and [37].
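A sketch of this last idea for a small 2×2 system, with Q = I and a made-up bounding box P; the ratio of weighted sums approximates the solution components as in the display above:

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Monte Carlo solution of Ax = b via the ratio of integrals (3.9):
       x_i ~ sum_k V_i^k exp(-Phi(V^k)) / sum_k exp(-Phi(V^k)),
       with Phi(v) = ||Av - b||^2 (Q = I) and V^k uniform on a box
       containing the ellipsoid E.  The system and the box are made up. */
    static const double A[2][2] = { {3.0, 1.0}, {1.0, 2.0} };
    static const double b[2]    = { 5.0, 5.0 };   /* exact solution: x = (1, 2) */

    static double phi(const double v[2])
    {
        double r0 = A[0][0]*v[0] + A[0][1]*v[1] - b[0];
        double r1 = A[1][0]*v[0] + A[1][1]*v[1] - b[1];
        return r0*r0 + r1*r1;
    }

    int main(void)
    {
        double num[2] = {0, 0}, den = 0.0;
        int    n = 2000000;
        srand(1);
        for (int k = 0; k < n; k++) {
            double v[2];
            v[0] = 4.0 * rand() / RAND_MAX - 1.0;   /* uniform on [-1,3] */
            v[1] = 4.0 * rand() / RAND_MAX;         /* uniform on [0,4]  */
            double w = exp(-phi(v));
            num[0] += v[0] * w;
            num[1] += v[1] * w;
            den    += w;
        }
        printf("x ~ (%.3f, %.3f)\n", num[0]/den, num[1]/den);
        return 0;
    }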
3.3 Error Bounds

Monte Carlo computations are based upon the Strong Law of Large Numbers, which is recalled here for ease of reference:

    X̄^{(n)} := (X_1 + ⋯ + X_n) / n → μ    with probability 1.    (2.20)

In order to estimate the error |X̄^{(n)} − μ| as a function of the sample length n, let us begin by establishing the following result.

Proposition 3.3.1 (Chebyshev's inequality) Let Y ∈ L_2. Then, for ε > 0,

    P(|Y − EY| ≥ ε) ≤ Var Y / ε².    (3.10)

PROOF:
It suffices to observe that

    ε² I_{(|Y−EY| ≥ ε)} ≤ |Y − EY|² I_{(|Y−EY| ≥ ε)} ≤ |Y − EY|²,

where I_A denotes the indicator function of A, and then take expectation to obtain

    ε² P(|Y − EY| ≥ ε) ≤ E(Y − EY)²,

from which the inequality (3.10) follows.
□
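Chebyshev's bound is usually far from tight, which is easy to check by simulation. A small sketch for the sample mean of 100 uniforms (all numerical choices are illustrative):

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Empirical check of (3.10) for Y = mean of 100 uniforms on [0,1]:
       Var Y = 1/1200, so P(|Y - 1/2| >= eps) <= (1/1200)/eps^2. */
    int main(void)
    {
        int    trials = 100000, n = 100, exceed = 0;
        double eps = 0.05, varY = 1.0 / (12.0 * n);
        srand(3);
        for (int t = 0; t < trials; t++) {
            double s = 0.0;
            for (int i = 0; i < n; i++) s += rand() / (double)RAND_MAX;
            if (fabs(s / n - 0.5) >= eps) exceed++;
        }
        printf("observed  P       = %.4f\n", (double)exceed / trials);
        printf("Chebyshev bound   = %.4f\n", varY / (eps * eps));
        return 0;
    }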
Chebyshev's inequality yields a snappy proof of the Weak Law of Large Numbers: letting σ² stand for the common variance of the X_i's, we get

    0 ≤ P(|X̄^{(n)} − μ| > ε) ≤ σ² / (n ε²).

Letting n → ∞, it follows that

    X̄^{(n)} → μ    in probability,    (3.11)
which is the content of the weak law of large numbers. Standard theorems on the convergence of random variables imply that (2.20) then holds automatically for a subsequence.² On the other hand, letting
$$\delta := \frac{\sigma^2}{n\varepsilon^2},$$
it follows that
$$P\left(\left|\bar X^{(n)} - \mu\right| \le \frac{\sigma}{\sqrt{n\delta}}\right) \ge 1 - \delta,$$
i.e.
$$\left|\bar X^{(n)} - \mu\right| \le \frac{\sigma\,\delta^{-1/2}}{\sqrt{n}} \quad \text{w.p. at least } 1 - \delta. \tag{3.12}$$
Given $\delta$ and $\sigma$, the error indeed goes to zero, at a rate $O(1/\sqrt{n})$, which is sometimes rather too slow in practice. This seems to be an inherent limitation of this method. However, in practice one has got some freedom in the choice of $X$, the random variable whose expectation yields the answer to the original problem to be solved. Since there are infinitely many random variables with a given expectation, it is advisable to choose $X$ with the smallest possible variance $\sigma^2$. Let us now review some of the techniques recommended to reduce the variance in Monte Carlo computations.

For the sake of illustration, let us go back to the problem of evaluating $I$ in (3.5). Let $g$ be a continuous function such that
$$\sup_{0 \le x \le 1} |f(x) - g(x)| \le \varepsilon \tag{3.13}$$
for a given $\varepsilon > 0$, and suppose
$$J := \int_0^1 g(x)\,dx$$
can be evaluated exactly. For instance, Weierstrass' Approximation Theorem [2] states that a polynomial $g$ satisfying (3.13) can always be found for any $\varepsilon > 0$. Besides, there are constructive algorithms giving one such polynomial [2], hence it is practical to evaluate $J$ instead of $I$. Then $I = J + (I - J)$, where
$$I - J = \int_0^1 [f(x) - g(x)]\,dx.$$
² It holds indeed for the full sequence, as we know.
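A minimal sketch of this residual (control variate) strategy for $I = \int_0^1 f(x)\,dx$ follows; the choice $f(x) = e^x$ together with its Taylor polynomial $g$ is hypothetical and serves only to illustrate the variance reduction.

```python
import numpy as np

def mc_control_variate(f, g, J, n=100_000, rng=None):
    """Estimate I = int_0^1 f(x) dx as J + Monte Carlo estimate of int_0^1 (f - g),
    where J = int_0^1 g(x) dx is known exactly (control variate idea)."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(0.0, 1.0, n)
    return J + np.mean(f(u) - g(u))

if __name__ == "__main__":
    f = np.exp                        # f(x) = e^x, so I = e - 1
    g = lambda x: 1.0 + x + x**2 / 2  # Taylor polynomial of f, easy to integrate
    J = 1.0 + 1/2 + 1/6               # int_0^1 g(x) dx
    crude = np.mean(f(np.random.default_rng(0).uniform(0, 1, 100_000)))
    print("control variate:", mc_control_variate(f, g, J, rng=np.random.default_rng(0)))
    print("crude MC:       ", crude, "   exact:", np.e - 1)
```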
Figure 3.2: A bounded region surrounded by a rectangle.

Taking (3.13) into account, it is clear that $\operatorname{Var}(f(U) - g(U)) \le 4\varepsilon^2$. Therefore, (3.12) specializes into
$$\text{Error in the computation of } I - J \;\le\; \frac{2\varepsilon\,\delta^{-1/2}}{\sqrt{n}}$$
with probability at least $1 - \delta$. Thus, it is recommendable to use a conventional (i.e. analytical or numerical) method to approximately solve the given problem, and then use Monte Carlo on the "residual problem".

In fact, let $R$ be a bounded region in $\mathbb{R}^2$, and $Q$ be a rectangle containing $R$, say $Q = [a,b] \times [c,d]$ (see figure 3.2). A possible way to estimate $A(R)$ is the following. Consider the uniform distribution in $Q$. Then, the probability of a point of $Q$ lying in $R$ is $p := A(R)/A(Q)$, hence $A(R) = p\,A(Q)$, and it only remains to estimate $p$. Let
$$\chi_R(w) = \begin{cases} 1 & \text{if } w \in R, \\ 0 & \text{otherwise}, \end{cases}$$
so that
$$E\chi_R = p, \qquad \operatorname{Var}\chi_R = p(1-p).$$
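A short sketch of the resulting hit-or-miss estimator is given below; the region $R$ (a disc) and the rectangle $Q$ are illustrative choices only.

```python
import numpy as np

def mc_area(indicator, a, b, c, d, n=200_000, rng=None):
    """Hit-or-miss Monte Carlo: A(R) ~ (n_R / n) * A(Q), with Q = [a,b] x [c,d]."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.uniform(a, b, n)
    y = rng.uniform(c, d, n)
    p_hat = np.mean(indicator(x, y))        # n_R / n
    return p_hat * (b - a) * (d - c)        # p_hat * A(Q)

if __name__ == "__main__":
    # R = unit disc, enclosed in Q = [-1,1] x [-1,1]; the exact area is pi.
    in_disc = lambda x, y: x**2 + y**2 <= 1.0
    print(mc_area(in_disc, -1, 1, -1, 1), "vs", np.pi)
```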
Let $P_1,\ldots,P_n$ be $n$ points chosen at random³ within $Q$, and let $n_R$ be the number of those lying within $R$. Then
$$\left|\frac{n_R}{n} - p\right| \le \frac{\delta^{-1/2}\sqrt{p(1-p)}}{\sqrt{n}}$$
with probability $\ge 1 - \delta$, which shows that $n_R/n$ is a good estimate of $p$ for large $n$. However, the relative error is
$$\frac{\left|\frac{n_R}{n} - p\right|}{p} \le \sqrt{\frac{1-p}{np\delta}} \le \frac{1}{\sqrt{np\delta}}.$$
This inequality shows that measuring $p$ in terms of $n_R/n$ may give an unreliable result if $p$ is small, and Monte Carlo should be applied taking this into consideration. Indeed, Monte Carlo techniques often fail to guarantee a high level of precision, unless very large values of $n$ are used. In practice this may or may not be acceptable, depending on the application. A somewhat rosier picture emerges upon a closer look at (3.12), however: the error bound depends solely on the variance $\sigma^2$ and the number of trials $n$, for a given $\delta$. In particular, it does not depend on dimensionality or regularity, as most other estimates do. Therefore, Monte Carlo techniques may be expected to provide workable approximations where conventional numerical methods become uneconomical, say for solving high dimensional problems.

As a second instance of this variance reduction technique, consider the problem of measuring areas of plane regions. Suppose for instance that the contour $C := \partial R$ is given in parametric form as
$$x = \phi(t), \quad y = \psi(t), \qquad 0 \le t \le 1,$$
with $\phi$ and $\psi$ piecewise continuously differentiable. Then
$$A(R) = \frac{1}{2}\oint_C (x\,dy - y\,dx) = \int_0^1 f(t)\,dt,$$
³ If $P = (x,y)$, $x$ is chosen at random within $[a,b]$ and $y$ is chosen at random within $[c,d]$.
Figure 3.3: A sample region surrounded by a polygonal.

where
$$f(t) = \frac{1}{2}\bigl[\phi(t)\psi'(t) - \psi(t)\phi'(t)\bigr], \qquad 0 \le t \le 1.$$
Therefore, an approximation scheme like the one in (3.13) could be used again in order to reduce the variance. For instance, one could approximate $C$ by means of a polygonal of sufficiently many sides (see figure 3.3). Then Monte Carlo should be used in order to estimate the residual area only, since the area of the polygonal figure can be evaluated exactly by triangulation. However, we now know that measuring very small areas may lead to additional loss of precision. The foregoing approach to reducing variance in Monte Carlo computations is known as control variate, whereby a random variable with large variance is replaced by another random variable of small variance. An alternative to this approach is known as importance sampling, which requires choosing the random variable in such a way that samples are drawn at the values where "interesting things" happen. To make this precise, consider again the problem of evaluating
$$I = \int_a^b f(x)\,dx.$$
Instead of sampling a random variable which is uniformly distributed in $[a,b]$ and then estimating the mean of $f(V)$, let us introduce a second probability density
$p$, with $\int_a^b p(x)\,dx = 1$, and write
$$I = \int_a^b \frac{f(x)}{p(x)}\,p(x)\,dx = E\,F_p(X),$$
where
$$F_p(x) := \frac{f(x)}{p(x)}$$
and $X$ is a random variable with density $p$. Then
$$\operatorname{Var} F_p(X) = E\,F_p(X)^2 - \bigl(E\,F_p(X)\bigr)^2 = \int_a^b \frac{f(x)^2}{p(x)}\,dx - I^2.$$
Problem: Choose $p^*$ in the class of all $p(x) \ge 0$ with $\int_a^b p(x)\,dx = 1$ such that
$$\operatorname{Var} F_{p^*}(X) \le \operatorname{Var} F_p(X)$$
for any other such $p$. This problem can be solved by variational techniques (see a discrete analog below), its solution being
$$p^*(x) = \frac{|f(x)|}{\int_a^b |f(s)|\,ds}. \tag{3.14}$$
Clearly, this theoretical solution is of little practical use, since computing the optimal density requires evaluating $\int_a^b |f(x)|\,dx$, which is essentially as hard as the original problem. In practice one can proceed as before: for a given $\varepsilon > 0$, find $g$ "easy to integrate" such that
$$|f(x) - g(x)| \le \varepsilon, \qquad a \le x \le b,$$
and then compute
$$p(x) = \frac{|g(x)|}{\int_a^b |g(s)|\,ds}.$$
Then $p$ should be a reasonable approximation of $p^*$ as in (3.14); a small numerical sketch of this recipe is given at the end of the chapter.

Divertissement. Minimize $\sum_{i=1}^n \frac{a_i^2}{x_i}$ subject to $\sum_{i=1}^n x_i = 1$, $x_i > 0$, $i = 1,\ldots,n$. Solution using Lagrange multipliers: consider
$$L(x,\lambda) = \sum_{i=1}^n \frac{a_i^2}{x_i} + \lambda\left(\sum_{i=1}^n x_i - 1\right).$$
Then
$$\frac{\partial L}{\partial x_j} = -\frac{a_j^2}{x_j^2} + \lambda, \qquad \frac{\partial L}{\partial \lambda} = \sum_{i=1}^n x_i - 1.$$
At a critical point of $L$, $\lambda > 0$ and
$$x_j = \frac{|a_j|}{\sqrt{\lambda}}, \qquad 1 = \sum_{j=1}^n x_j = \frac{1}{\sqrt{\lambda}}\sum_{j=1}^n |a_j|, \quad \text{i.e.} \quad \sqrt{\lambda} = \sum_{j=1}^n |a_j|.$$
Then
$$x_k = \frac{|a_k|}{\sum_{j=1}^n |a_j|}, \qquad k = 1,\ldots,n. \tag{3.15}$$
Observe the similarity of this solution with (3.14), of which (3.15) is a discrete analog.
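To close the chapter, here is the small numerical sketch of the approximate importance sampling recipe around (3.14) announced above; the integrand $f$, the proxy $g(x) = e^{-x}$ and the interval $[0,5]$ are hypothetical choices, picked so that the density $p \propto |g|$ can be sampled exactly by inversion.

```python
import numpy as np

def importance_sampling_exp(f, b, n=100_000, rng=None):
    """Estimate I = int_0^b f(x) dx using the density p(x) = e^{-x} / (1 - e^{-b}),
    i.e. p proportional to |g| with g(x) = e^{-x}, in the spirit of (3.14)."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(0.0, 1.0, n)
    x = -np.log(1.0 - u * (1.0 - np.exp(-b)))   # exact inverse-CDF sampling from p
    p = np.exp(-x) / (1.0 - np.exp(-b))
    return np.mean(f(x) / p)

if __name__ == "__main__":
    # Hypothetical integrand whose mass is concentrated near 0, like g.
    f = lambda x: np.exp(-x) * np.sin(x) ** 2
    uniform_mc = 5.0 * np.mean(f(np.random.default_rng(0).uniform(0, 5, 100_000)))
    print("importance sampling:", importance_sampling_exp(f, 5.0, rng=np.random.default_rng(0)))
    print("uniform sampling:   ", uniform_mc)
```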
Chapter 4
Second Order Properties

Consider the following problem, about the situation encountered at the beginning of section 1.2: An aquifer has been sampled at finitely many points, thereby obtaining local values of the hydraulic conductivity. It is desired to estimate this property at other points within the aquifer, without performing any other experiments.

Let $S \subset \mathbb{R}^3$ represent the portion of physical space occupied by the aquifer, and let $p_1,\ldots,p_n \in S$ be the points at which sampling was conducted. Then, the values obtained are samples from the random variables
$$X_i := D(p_i,\cdot), \qquad i = 1,\ldots,n,$$
all of them defined on a common probability space $(\Omega,\mathcal{A},P)$. Let $Y$ be a random variable representing the hydraulic conductivity at a given point $p \in S$. It is desired to obtain the best possible estimate $\hat Y$ of $Y$, and base the estimation on the available information only.

In the following sections we shall study some mathematical concepts that will enable us to set up and eventually solve this estimation problem. Among the topics dealt with in this chapter are:

• the space of complex random variables having second moment, together with its geometrical and analytical structure,

• in particular, the important Orthogonal Projection Theorem, and

• conditional expectation and probability, and its application to solving estimation problems such as the one posed above.
4.1 Orthogonality
Let $(\Omega, \mathcal{A}, P)$ be a fixed probability space, and let $X, Y, Z,\ldots$ denote complex random variables. All the random variables encountered in this chapter will be assumed to have second moment, i.e. $X, Y, Z, \ldots \in L_2(\Omega,\mathcal{A},P) =: L_2$. The inequality
$$|z + w|^2 \le 2\,(|z|^2 + |w|^2), \qquad z, w \in \mathbb{C},$$
is easily established, and it shows that
$$X, Y \in L_2 \Rightarrow X + Y \in L_2.$$
Moreover, clearly $X \in L_2 \Rightarrow aX \in L_2$, for any complex number $a$. Hence $L_2$ is a complex vector space. To illustrate, consider the following examples:

Example 4.1.1 a) Let $\Omega = \mathbb{R}$, let $\mathcal{A}$ stand for all the Borel subsets of $\mathbb{R}$, and let
$$P(A) = \frac{1}{\sqrt{2\pi}}\int_A e^{-x^2/2}\,dx.$$
Then $L_2$ consists of all functions $f : \mathbb{R} \to \mathbb{C}$ satisfying
$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{-x^2/2}\,|f(x)|^2\,dx < +\infty. \tag{4.1}$$
We call this instance of $L_2$ Hermite's space. Analogously, Laguerre's, Legendre's and Chebyshev's $L_2$ spaces can be defined by choosing:

b) Laguerre: $\Omega = [0,\infty)$, and
$$\int_0^{+\infty} \lambda e^{-\lambda x}\,|f(x)|^2\,dx < +\infty \tag{4.2}$$
instead of (4.1). Usually $\lambda = 1$ in this example.
c) Legendre: $\Omega = [-1,+1]$, and
$$\frac{1}{2}\int_{-1}^{+1} |f(x)|^2\,dx < +\infty, \tag{4.3}$$
instead of (4.1).

d) Chebyshev: $\Omega = [-1,+1]$, and
$$\frac{1}{\pi}\int_{-1}^{+1} \frac{|f(x)|^2}{\sqrt{1-x^2}}\,dx < +\infty, \tag{4.4}$$
instead of (4.1). These four examples of Li spaces are very important in both classical and modern Applied Mathematics, see for example [9] and [20]. Other examples of Li spaces (or rather, subspaces of a given Li space), can be constructed by considering sub-cr-algebras B of .A: Let B be a cr-algebra contained in A, i.e. a sub-cr-algebra of A. For instance, B could be the smallest possible cr-algebra, namely {0,f2}. A more interesting example is the smallest cr-algebra containing a given event B 6 A, let us call it again B. This cr-algebra must also contain Bc, besides containing 0 and U.. Since {0, B, Bc, Q,} is already a cr-algebra, it must coincide with B. A second example can be obtained as follows: let B\,...,Bn be pairwise disjoint events in A, and suppose 1
n = f; Bh. k=l
Then V := {Bi,... ,Bn} is said to constitute a (finite) partition of Q. Any cr-algebra containing V must necessarily contain all the unions of elements of V, of which there are 2 n , i.e. all sets of the form Bh + ■ ■ ■ +>k^B, foik> 1, with { i i , . . . , in} C {1, •. •, n}, plus the empty set. Since this collection of sets is already a cr-algebra (prove it!), it is the minimal cr-algebra B containing V. Clearly, V = {B, Bc} yields the second example. Recall that both + and £] operating on sets mean "disjoint union".
In general, any collection C C A of events generates a sub-cr-algebra of A, namely the minimal sub-cr-algebra containing C. The bigger C, the richer the sub-cr-algebra of A that it generates. Let PB denote the restriction of P : A -> [0,1] to a given sub-cr-algebra B of A. Then (fi, B, PB) is a probability space on its own right. In fact, let us simplify the notation by dropping the super-index B. Consider S := L2(Q,B, P). Then, hnear combinations of random variables in 5 are random variables in S. Hence S is a subspace of L2. In other words: L2(£l, B, P) is a subspace of ^ ( f i , A, P) if B is a sub-cralgebra of .4. A new operation can be defined on L2 as follows: given X, Y £ L2, define their inner product by (X,Y) := E(XY). (4.5) In view of the easily established inequality 2\zw\ < \z\2 + H 2 ,
z,w€W,
the above operation is well defined, i.e. $(X,Y)$ is a complex number for any $X, Y \in L_2$. The inner products in the Hermite, Laguerre, Legendre and Chebyshev instances of $L_2$ space are
$$(f,g) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{-x^2/2}\, f(x)\,\overline{g(x)}\,dx, \tag{4.6}$$
$$(f,g) = \int_0^{+\infty} \lambda e^{-\lambda x}\, f(x)\,\overline{g(x)}\,dx, \tag{4.7}$$
$$(f,g) = \frac{1}{2}\int_{-1}^{+1} f(x)\,\overline{g(x)}\,dx, \tag{4.8}$$
$$(f,g) = \frac{1}{\pi}\int_{-1}^{+1} \frac{f(x)\,\overline{g(x)}}{\sqrt{1-x^2}}\,dx, \tag{4.9}$$
respectively. It is easy to verify that $(\cdot,\cdot)$ possesses the following properties:

i) Bilinearity:
$$(\alpha X + \beta Y, Z) = \alpha(X,Z) + \beta(Y,Z), \qquad (X, \alpha Y + \beta Z) = \bar\alpha\,(X,Y) + \bar\beta\,(X,Z).$$

ii) Hermitian character: $(X,Y) = \overline{(Y,X)}$.

iii) Positive non-degeneracy: $(X,X) \ge 0$, and $(X,X) = 0$ only if $X = 0$.
Definition 4.1.1 Two complex random variables $X, Y \in L_2$ are said to be orthogonal if $(X,Y) = 0$.

An important consequence of orthogonality is the following: any finite collection of pairwise orthogonal random variables in $L_2$ is linearly independent. Indeed, if these random variables are $X_1,\ldots,X_n$ and
$$\alpha_1 X_1 + \cdots + \alpha_n X_n = 0,$$
then forming the inner product of both sides with $X_i$ yields $\alpha_i = 0$ (why?), $i = 1,\ldots,n$. Therefore $X_1,\ldots,X_n$ are linearly independent.

It is not true that linearly independent elements are necessarily orthogonal. Nevertheless any finite set of linearly independent elements $X_1,\ldots,X_n \in L_2$ can be replaced by orthogonal elements $Y_1,\ldots,Y_n \in L_2$ in such a way that $Y_k$ is a linear combination of $X_1,\ldots,X_k$, for each $k = 1,\ldots,n$. Indeed, let $Y_1 := X_1$. Proceeding inductively, assume $Y_1,\ldots,Y_k$ have been obtained, with

a) $(Y_i, Y_j) = 0$ if $i \ne j$,

b) $Y_j$ depends linearly only on $X_1,\ldots,X_j$, $j = 1,\ldots,k$.

Let
$$Y_{k+1} = X_{k+1} + \alpha_1 Y_1 + \cdots + \alpha_k Y_k,$$
so that ($i = 1,\ldots,k$)
$$(Y_{k+1}, Y_i) = (X_{k+1}, Y_i) + \sum_{j} \alpha_j (Y_j, Y_i) = (X_{k+1}, Y_i) + \alpha_i (Y_i, Y_i).$$
It suffices to choose
$$\alpha_i = -\frac{(X_{k+1}, Y_i)}{(Y_i, Y_i)},$$
and $\{Y_1,\ldots,Y_k, Y_{k+1}\}$ will have the desired properties.
Yx
:=
Xu
« - x>-t{MY- *=2 -
CHAPTER
56
4. SECOND
ORDER
PROPERTIES
generates the desired collection { Y i , . . . , Yn}. This procedure is known as GramSchmidt's orthogonalization procedure. For instance, consider the polynomials l,x,x\...
.
(4.10)
Since do + axx + .. . + anxn = 0 for a < x < /3 implies a0 = ai = • • • = o.n = 0, it follows that 1, x, x2,... are linearly independent in any space containing them. Applying the Gram-Schmidt orthogonalization procedure to (4.10) will result in a sequence of polynomials Po(x),pi{x),p2{x),...
(4.11)
with i) deg pk(x) < k, " ) {Pi,Pj) = 0
if i^ j .
In other words, (4.11) consists of orthogonal polynomials. If (-, ■) is as in (4.6), they will be called H e r m i t e polynomials, with the corresponding definition for Laguerre, Legendre and Chebyshev polynomials.
4.2
Orthogonal Projections
Definition 4.2.1 The norm of X G L2 is | | * | | := y/{X,X).
(4.12)
This definition makes sense for any X £ L2 by the positive character of the inner product (■, •}. Clearly | | * | | > 0, and by the non-degeneracy of (•, •) there results | | * | | = 0 if and only if * = 0. In addition, by the bilinearity of (•, •), it follows that \\aX\\ = ,/\a\'{X,X)
= \a\\\X\\.
A third property of || • || is the so-called triangle inequality, \\X + Y\\<\\X\\
+ \\Y\\,
(4.13)
which can be obtained as a corollary of Schwarz's inequality
K*.m
(4-14)
Indeed \\X + Y\\2
= (X + Y,X + Y)< \\X\\2 + (X,Y) + (Y,X) + \\Y\\2 =
||X|| 2 + 2 R e ( X , r ) + ||y|| 2
< ||x||2 + 2|(x,r)| + ||y||2 < ||x||2 + 2||x||||y|| + ||r||2 = (ll* + *l) 2 In turn, to establish (4.14) one can proceed as follows. Note that the inequality is trivially true if Y = 0. If Y / 0, let A := re'B £(D be given and observe that
||x + Ay||2 = ||x||2 + A(x,y) + A 0 , f o r a U r > 0 . Thus, the polynomial on the left hand side vanishes for at most one real value of the argument, hence its discriminant cannot be positive, and thereby we obtain Schwarz's inequality. The advantage of having a norm || • || on the linear space L2 is that we can now define a m e t r i c there: given AT, Y G L2, the distance between them is d(X:Y)
= \\X-Y\\.
(4.15)
Clearly, the distance d : L2 x L2 —► IR possesses the following properties, valid for X,Y,Z € L2: i) Symmetry:
d(X,Y)
=
d{Y,X).
ii) Positive non degeneracy: d(X, Y) > 0, and d(X, Y) = 0 only if X = Y.
iii) Triangle's inequality: d[X, Z) < d(X, Y) + d(Y, Z). These properties are immediate consequences of the corresponding properties of the norm, given the definition (4.15). In turn, a metric in L2 allows us to introduce a convergence there: Given a sequence {Xn} in L2 and X € L2, we say that X„ converges to X (or tends) to X, and write Xn ~» X if d(Xn, X) -► 0, when n -> oo. In other words, Xn -> X if and only if given e > 0 there exists N > 1 such that \\Xn-X\\<e
itn>N.
(4.16)
This type of convergence of random variables is known as convergence in quadratic mean. It is customary to write X=
l.i.m Xn,
(4.17)
n—*oo
where "l.i.m" is read "limit in the mean". m , n > 1 be given. Then
Suppose Xn
—¥ X in L2, and let
\\xn-xm\\ < \\xn-x\\ + \\x-xm\\ e
e
if m,n > N. Thus {Xn}
converges in L2 => \\Xn — Xm\\ —> 0 when m , n —¥ oo.
The amazing fact is that the converse is also true, i.e. \\Xn — Xm\\ —> 0 when m, n —>■ oo => There exists X €. L2 such that Xn —> X when n —> oo. This property of L2 is far from trivial, and its proof is not easy (see for instance [31]). In the usual terminology, the property in (4.2) is called c o m p l e t e n e s s . S u m m i n g up: The space L2 defined in the preceding sections is a complex vector space. It also carries an inner product defined in (4.5), in terms of which both a norm (4.12) and the convergence (4.17) can be defined. Moreover, the resulting space is complete. Thus, L2 is a Hilbert space. The importance of completeness is trivially this: it is possible to define an iterative process and prove the existence of a limit for the resulting sequence
w i t h o u t knowing the hmit a priori. A glance at (4.16) suffices to convince ourselves that the mere definition of convergence does require knowing the hmit in order to verify it. Let B be a sub-<7-algebra of .A and let {X„} be a sequence in L2(U,B,P). The sequence may converge in quadratic mean to a random variable X £ Li. An important property of the subspace L2(Q,B, P) is that X must necessarily belong to L2(Q,, B, P). Because of this, it is said that L2(£l, B, P) is a closed subspace of L2(Q,A, P), if B C A (see [31]). This fact has many desirable consequences. In fact, a fundamental property of closed subspaces in a Hilbert space is the celebrated Orthogonal P r o j e c t i o n T h e o r e m , whose proof can be found in [31] = T h e o r e m 4.2.1 Let H be a complex Hilbert space, and let S C H be a closed subspace. Then, for any x £ H there is a unique x £ S with the following properties: i) x is the point of S closest to x, i.e. \\x-x\\=unn\\x-z\\.
(4.18)
ii) The difference x — x is orthogonal to S, i.e. (x — x, z) = 0,
for every z £ S.
(4-19)
The content of this theorem can be clarified by looking at Fig. 4.1.
4.3
Conditional Expectation
A straightforward application of the Orthogonal Projection Theorem to the Hilbert space L2(Q, A,P) and the closed subspace L2(Q,B,P), with B C A, reveals the following facts: 1. Given X £ L2, there is a unique X £ L2(Q,B,P) function z i-> \X — Z\\, for z £ S.
which minimizes the
2. Such random variable X satisfies the condition (X, Z) = (X, Z) for any z £ L s (fi, B, P).
(4.20)
Figure 4.1: Orthogonal projection onto a closed subspace. Let us now verify that the orthogonality condition (4.20) suffices to obtain X. For, consider the following examples: E x a m p l e 4.3.1 B = {0,ft}. Then L2(Q,B,P) consists of constant functions. Let c := X G(P, and let d := Z e
= dEX = cd,
and therefore c = EX (choose d ^ 0). Thus, the constant value giving the least error in quadratic mean with respect to a given random variable X is its expected value EX. Example 4.3.2 Let B £ A, with P(B) > 0, P(BC) > 0, and let B the
(4.21)
Take X = I A , with A £ A. Then the orthogonality condition specializes to E{IAnB)
= bEIB
for Z = IB, i.e. P{Ar\B)
= bP(B),
or b = P(i4|B). Analogously, c = P ( A | B C ) . Thus, the piecewise constant random variable that best approximates IA (in the sense of quadratic mean) is iA = P(A\B)IB
P(A\Bc)IBc.
+
It is natural to call this random variable the conditional probability of A given B, denoted P(A\B), i.e.
ifw e B, if w e s c .
P(A\B), P{A\BC)
P(A\B){u)
E x a m p l e 4.3.3 Let B be as in the preceding example, and let X £ L2 be arbi trary, so that (4.21) holds. Just as before, the orthogonality condition specializes into E{XIS) E{XIBc)
= =
bP{B) cP(Bc).
However, E(XIB P(B)
= T^-J^BdP -
= E(X\B).
P(B)
Analogously, E{x n iBc) P{BC)
E{X\BC).
Thus X = E{X\B)IB
+
E(X\Bc)IBc,
and it is natural to call this random variable the conditional e x p e c t a t i o n of X given the (7-algebra B, denoted E(X\B). Thus E(X\B)(w)
E(X\B), E{X\BC),
if w € B , ifweBc.
Example 4.3.4 Let $\mathcal{V} = \{B_1,\ldots,B_n\}$ be a finite partition of $\Omega$, with $B_k \in \mathcal{A}$, $P(B_k) > 0$, $k = 1,\ldots,n$, and let $\mathcal{B}$ be the sub-$\sigma$-algebra of $\mathcal{A}$ generated by $\mathcal{V}$. For any $X \in L_2$, the random variable in $L_2(\Omega,\mathcal{B},P)$ which approximates it best is the conditional expectation $E(X|\mathcal{B})$, given by
$$E(X|\mathcal{B})(\omega) = E(X|B_k) \quad \text{if } \omega \in B_k.$$
This is proved by the same argument used in the preceding example. The details are left to the reader.

In general, given any sub-$\sigma$-algebra $\mathcal{B}$ of $\mathcal{A}$ and any $X \in L_2$, the conditional expectation $E(X|\mathcal{B})$ is the unique random variable in $L_2(\Omega,\mathcal{B},P)$ determined by the orthogonality conditions
$$E(XZ) = E\{E(X|\mathcal{B})\,Z\}$$
for any $Z \in L_2(\Omega,\mathcal{B},P)$. The conditional expectation $E(X|\mathcal{B})$ minimizes the residual $\|X - Z\|$, $Z \in L_2(\Omega,\mathcal{B},P)$. Thus, $E(X|\mathcal{B})$ is the random variable in $L_2(\Omega,\mathcal{B},P)$ which approximates $X$ best in quadratic mean. Let us apply this knowledge to various concrete situations in the following section.
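As a toy illustration of Example 4.3.4, the following sketch computes $E(X|\mathcal{B})$ for the $\sigma$-algebra generated by a finite partition, in the special (and purely illustrative) case of a finite sample space with equally likely outcomes, where $E(X|B_k)$ reduces to the average of $X$ over the cell $B_k$.

```python
import numpy as np

def conditional_expectation(x, labels):
    """E(X | B) for the sigma-algebra generated by a finite partition.
    x[i] is X(omega_i); labels[i] identifies the cell B_k containing omega_i.
    All outcomes are assumed equally likely (uniform P)."""
    x = np.asarray(x, float)
    labels = np.asarray(labels)
    out = np.empty_like(x)
    for cell in np.unique(labels):
        mask = labels == cell
        out[mask] = x[mask].mean()      # E(X | B_k) = average of X over B_k
    return out

if __name__ == "__main__":
    x = np.array([1.0, 4.0, 2.0, 8.0, 5.0, 3.0])
    labels = np.array([0, 0, 1, 1, 1, 2])        # partition {B_0, B_1, B_2}
    print(conditional_expectation(x, labels))    # piecewise-constant best L2 approximation
```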
4.4
Optimal Linear Estimation
Collecting data amounts to gathering information: the more data, the more in formation. Now, information is required in order to make rational decisions, and it is reasonable to expect decisions to improve as the underlying information gets richer and richer. On the other hand, real data are numbers. However, we know that repeating observations does not result in the same data. This variability can be accounted for by assuming the observations collected as samples from underlying random variables. Thus, information can be represented by a set of random variables. In the estimation problem presented in the opening part of this chapter, the relevant random variables are Xi,..., Xn (the m e a s u r e m e n t s ) and Y (the ob served value): it is required to estimate Y in terms of the Xks only, because they represent the available information. Indeed, estimators we are interested in are of the form Y =
(4.22)
where is a sufficiently smooth function, called the estimator. A set of actual observations •ci j ■ - ■ >
x
n>
consists of values of Xit... ,Xn, i.e. Xk = Xfc(uj), for some w £ 0 . The corre sponding estimate is evaluated using y =
<j>(x1,...,x7l),
and Y(w) = y. The goal of estimation is to specify the estimator > in the best possible way. Let X £ Li be a given random variable, and suppose Y £ L-i is to be estimated in terms of X, say by mean of an estimate Y = <j>(X).
The estimate values are Y(w) = t(X(u)),
wGfl
i.e. Y = <j) o X. Estimates sought are of a very special kind: they are functions of the observation X. If X has a density and <£ is smooth, with a smooth inverse V», we have seen that Y has a density, too:
h(y) = hWy)M'(y)\In general
(y G B) = { o x e B) = {x e
=
X-1(r1(B))
and the distribution of Y is indeed determined by that of X: Fy(y) = P(X £ r l ( ( - o o , y ] ) ) = /
/F^(x).
On the other hand, the collection S ^ := {X~1{B) : B is a Borel set in(T} is readily seen to be a cr-algebra of subsets of f2, i.e. it is a sub-cr-algebra of A. It is in fact the smallest cr-algebra determined by X, and L2(£l,Bx,P) consists
of all the square integrable random variables which are functions of X. Given Y G L2, we write E(Y\X) instead of E(Y\BX)- From what we know, E(Y\X) is the best approximation to Y among all square integrable random variables which are functions of X. The best estimate of Y among those random variables is then Y = E(Y\X).
(4.23)
If, instead of one random variable X as observation we have n of them, the situation does not change. Indeed, if BXl x„ denotes the £r-algebra generated by Xu. ..,Xn, then we write E(Y\XU.. '. ,Xn) instead of E{Y\BXl x„)- The best estimator is given by the conditional expectation Y:=E(Y\Xu...,Xn).
(4.24)
Situations arise in which infinitely many random variables are needed to represent the information pattern. For instance, suppose a signal evolves in time and its values f(s) are measured for 0 < s < T. Then each value f(s) can be thought of as a sample of a random variable X,, and there is one such random variable for each s £ [0,T]. The collection of random variables X. := {X„ 0 < a < T} is termed as a stochastic process. Let a second stochastic process Y. := {Yt, 0 < s < T} somehow related to X. be known. The s m o o t h i n g problem requires estimating Yt in terms of the past infor mation {X,,0 < s < T } , for some t < T. The filtering problem requires estimating YT in terms of {Xa,0 < s < T}. Finally, the prediction problem asks for an estimate of YT in terms of {X,, 0<s
E(Yt\X„Q<s
b) the optimal filter is specified by Yt =
E(Yt\X„0<s
c) the optimal predictor is given by Yt =
E{YT\X.,0<3
Clearly, E^F), when T is a family of random variables, means conditional ex pectation with respect to the (Xi,.. . ,Xn), and computing the conditional expectation amounts to determin ing the function <j>. Thus, even in this finite observation case the problem is infinite dimensional. To stay on the feasible side, let us restrict ourselves to linear estimates of the form <j,{X1,...,Xn) = 81X1 + ... + 8nXn. The best linear e s t i m a t e can be specified in the form Y = 61X1 + ... + 6nXn,
(4.25)
where $\theta_1,\ldots,\theta_n$ are scalars to be determined. For, let $S$ denote the set of all linear combinations $\theta_1 X_1 + \cdots + \theta_n X_n$, with $\theta_1,\ldots,\theta_n \in \mathbb{C}$. Clearly $S$ is a closed linear subspace of $L_2$. By the Orthogonal Projection Theorem, the best linear estimate, i.e. the one with the minimum residual $Y - \theta_1 X_1 - \cdots - \theta_n X_n$, is determined by the orthogonality conditions
$$E(\hat Y\,\overline{X_i}) = E(Y\,\overline{X_i}), \qquad i = 1,\ldots,n,$$
i.e. by the set of equations
$$E(X_1\overline{X_i})\,\theta_1 + \cdots + E(X_n\overline{X_i})\,\theta_n = E(Y\,\overline{X_i}),$$
for $i = 1,\ldots,n$. If the random variables $X_1,\ldots,X_n$ are orthogonal, then the above system reduces to
$$E(|X_i|^2)\,\theta_i = E(Y\,\overline{X_i}), \qquad i = 1,\ldots,n,$$
and the solution to the linear estimation problem is given by
$$\hat\theta_i = \frac{E(Y\,\overline{X_i})}{E(|X_i|^2)}. \tag{4.26}$$
If the data Xi,... ,X„ are not orthogonal random variables, provided they are linearly independent it is always possible to orthogonalize them using the GramSchmidt method. The resulting orthogonal set Zlt...,Z„ will also generate S and (4.26) still applies with Zt instead of X{. If Xi,...,Xn are not linearly independent, a smaller subset Xit,... ,X{m, m < n, can always be extracted in such a way that they are linearly independent and generate the same subspace S. The G r a m - S c h m i d t procedure is applied to this independent subset. The foregoing developments outline a possible solution to the estimation prob lem posed in the introduction to this chapter. We actually need E(Y\Xi,..., Xn), but a second best solution is given by (4.25) with the coefficients given by (4.26) assuming the necessary orthogonalization steps have been carried out. See [27] for the design of efficient estimation procedures applicable to this situation. In practice, neither Xi,..., Xn nor Y are known, hence computing with (4.26) may not be possible. However, N independent repetitions of the measurements can always be conducted, obtaining the values Xij, . . . , Xnj,X/j,
J = 1, . . . ,iV
from which both $E(Y\,\overline{X_i})$ and $E|X_i|^2$ can be estimated in terms of the corresponding sample means, see [26].
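The following sketch carries out this last remark on synthetic data (the data and the noise level are illustrative assumptions): the moments appearing in the normal equations are replaced by sample means over the $N$ repetitions, and the resulting linear system is solved for the coefficients $\theta$.

```python
import numpy as np

def linear_estimator(X, Y):
    """Estimate theta in Y_hat = theta_1 X_1 + ... + theta_n X_n from N repetitions.
    X has shape (N, n) (rows are repetitions), Y has shape (N,).
    The normal equations E(X_j conj(X_i)) theta_j = E(Y conj(X_i)) are replaced
    by their sample-mean estimates."""
    X = np.asarray(X); Y = np.asarray(Y)
    G = (X.conj().T @ X) / len(Y)        # sample estimate of E(X_j conj(X_i))
    r = (X.conj().T @ Y) / len(Y)        # sample estimate of E(Y conj(X_i))
    return np.linalg.solve(G, r)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N = 10_000
    X = rng.standard_normal((N, 3))
    theta_true = np.array([0.5, -1.0, 2.0])
    Y = X @ theta_true + 0.1 * rng.standard_normal(N)   # synthetic observations
    print(linear_estimator(X, Y))                        # close to theta_true
```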
Chapter 5 The Fourier Transform This chapter contains some purely mathematical developments which will find important applications in the remaining chapters. For instance, chapter 8 deals with the computer simulation of random fields, and an important technique for that relies on the Fast Fourier Transform (FFT), which is an algorithm for the efficient evaluation of Fourier transforms on high speed computers. In addition, the Fourier Transform plays a central role in chapters 6 and 7, where spectral representations of both covariance functions and random fields are worked out. Indeed, the role played by the Fourier Transform in the linear theory of random fields cannot be overemphasized. The reader is urged to read this chapter with much attention, and return to it for reference while working on the remaining chapters.
5.1
Characteristic Functions
Let X be any real random variable and consider the complex valued random variable ei(X, for fixed £ € M. This new random variable is bounded and therefore belongs to L\. Define
MO = E{JtXY
(5-1)
The complex valued function thus defined is termed the characteristic function of X, and is defined for any £ € St. Explicitly,
4>x{i)= T°° eiiXdFx{x) J — OO
67
(5.2)
CHAPTER
68
5. THE FOURIER
TRANSFORM
and, if X has a density, < M 0 = f+°°
eiiXfx{v)dx.
J — oo
Notice from (5.1) that
• Mo) = i, • \4>x{t)\ < 1 for all £ G /R. An easy computation shows that **(£) = ""***,
i£X~N{0,l).
(5.3)
Also, from (5.1) it follows immediately that
and therefore Mt)
= **"-£*,
XX~N(V,CT2).
(5.4)
Notice that, for X, Y independent - see (2.17), Eei({X+Y)
= Eei£Xeity
=
Eei(XEe«V)
and therefore 4>X+Y{£)
= 4>X{£)4>Y((),
if X and Y are independent.
(5-5)
Characteristic functions are necessarily continuous 1 as shown by the following argument:
< E\ei(x
(l - eihx)\
= E\eihx
- ll.
Clearly \eihx - l | < 2 and eihx ->■ 1 when /i -> 0, therefore £|e i h J f - l | - f 0 by the D o m i n a t e d Convergence T h e o r e m [31]. Thus <j>x{£ + /i) —>• $x(£) when /i —> 0, and the assertion is proved. Indeed, the regularity of a given random variable is mirrored by the smooth ness properties of the corresponding characteristic function. In what follows, "X e Ln" means ll\X\n G Li", 'Indeed, they are uniformly continuous.
5.1.
CHARACTERISTIC
FUNCTIONS
69
T h e o r e m 5.1.1 If X £ Ln, then x € Cn and
#>(fl = inE (Xne«x) PROOF: Recall that emA - 1 T h
. > iX, when h —> 0.
Thus, for n = 1
h h
= -> ->
Ee
—h— i ^ ^iE(Xe* e * *x)
by Dominated Convergence. Therefore
4?At) =
iExSx.
An induction argument completes the proof.
r-]
In particular, observe that EXn
= i~n(£)(0)
forl€l„,
which shows that moments can be easily computed from the derivatives of the characteristic function at the origin. In fact, this is one of the main applications of characteristic functions in Elementary Probability, see [10]. For instance, if X ~ N(n,cr2), then
Mi) =
m) =
(f/*-«r»0^(0,
-*-2<M£) + (v--*20&(fl*(fl.
and therefore EX2 = cr2 + fM2.
EX - p,
The interested reader could easily develop expressions for EX3, EX*, etc. Let Xi,... ,Xn be independent, identically distributed random variables, with common mean fi and common variance a2. Let X^n> denote their sample mean, and set =
K") - 1 a/^/n
CHAPTER
70
5. THE FOURIER
TRANSFORM
common characteristic function of the normalized random Let <j> stand for the variables Xi- p Xn- (i o- ' • • '
Moreover, by Taylor' s theorem
♦(£)-'-=+-G)hence
b
s*(^)~-£ (*"°°)
and
log &?„(£) — — , i.e. si lim (/>z„U) = e ■> .
(5.6)
S u m m i n g u p : For large n, the characteristic function of the normalized error Zn resemble that of an N(0,1) random variable. This is the Central Limit T h e o r e m , in the light of the following T h e o r e m 5.1.2 A sequence {Fn} of distribution functions converges to a distri bution function F if and only if the sequence of their characteristic functions {n} converges to a continuous limit . In this case, is the characteristic function of F, and the convergence (j>n —> is uniform in every compact interval. For a proof, see [11]. See section 6.4 for additional material on characteristic functions. Let us now consider the Fourier Transform, a purely analytical concept generalizing that of characteristic function.
5.2.
THE FOURIER
5.2
TRANSFORM
71
T h e Fourier Transform
In what follows all functions will be complex valued, defined on a Euclidean space IR". Integration will be understood with respect to n-dimensional Lebesgue measure dx, and often the integration limits will be omitted for the sake of brevity. One such function / : IR" —>
/(*) =
if x < 1 , if x > 1 ,
it follows that / £ L2, f $. L\. Therefore L2 <£. L\. Fix i £ IR" and let / £ Lx. Then | e - 2 , r " £ : 7 ( z ) | = | / ( x ) | , so that
f{t):=fe-**~f{*)d* is well defined. This new function will be called the Fourier Transform of / . We have normalized with — 2TT in the exponential, following the usage in [9] and [34], for instance. Other choices are equally valid: —1 as in [38], + 1 as in [15], +27T as in [33], etc. Needless to say, some numerical constants appearing in the formulas below change accordingly. Clearly, \f{()\ < J \f{x)\dx, for all ( £ Lx. Thus • / £ Lx, for each / £ L\. • The mapping / >-» / is a bounded linear operator from Lx into L „ , with
The norms above are defined by 2 l l / I L := sup | / ( x ) | , xeZR" 2
The norm IHI,,, should actually be defined in terms of the essential s u p r e m u m . This is necessary because the elements of Lx, L2, L^, etc. are denned u p t o a set of m e a s u r e zero, see [31].
72
CHAPTER
5. THE FOURIER
TRANSFORM
H/ll, := J \f(x)\dx. The inner product in Li is
if,g) = J f{v)g{x)dx, and a norm arises in the usual way, i.e.
Il/lli -
\fifJ)-
The natural symmetries of the n-dimensional Euclidean space IRn are the so called rigid m o t i o n s : IR" —> IR", namely the compositions of a) t r a n s l a t i o n s T : IR" -> 7Rn, given by x >-> x + ft, with ft G IR", and b) o r t h o g o n a l t r a n s f o r m a t i o n s x i-» iix, with i2i? T = RT R = I. Thus (j> is a rigid motion if and only if (j> = T o R, with T a translation and iZ an orthogonal transformation of IR". A straightforward computation based on the T h e o r e m of C h a n g e of Variables in a multiple integral shows that
= J1 €-***■" e-2wi(,cf{Rx f[Rm++h)dx h)d* //"o>(£) °
ez«ij£.A
/
e-2«iRtyf(y)dy,
Therefore • If <£ = T o .R is a rigid motion in iRd with Tx = x + ft and fl£r = RTR = I, then h
W ( 0 = e^^ f(R().
More general symmetries of .ffi" are the afRne t r a n s f o r m a t i o n s , namely the compositions of translations T with nonsingulax hnear transformations x — i > Sx, with S invertible. Letting <j> = T o S be an affine transformation, its inverse is 4>1(y) = S~1(y — ft). The reader is invited to establish the following generaliza tion of the foregoing property of the Fourier Transform:
5.2.
THE FOURIER
TRANSFORM
73
• If d> = T o S is an affine transformation of IRd, with Tx = x + ft, then
,
K%h-{ST)
i r 1 '°«fl = 4ss\detS\ W >- 0-
(^)
Observe that fit
/ ( £ ) = j e-2"** [e-2"ih-x - l]
+ h)-
f(x)dx.
The integrand is bounded by 2 | / ( x ) | , which is integrable. Moreover, it goes to zero as ft -^ 0. Therefore / ( £ + ft) - / ( £ ) -^ 0 as ft -^ 0 by the Dominated Convergence Theorem. Thus • For each / £ L\, f is uniformly continuous. Moreover L e m m a 5.2.1 ( R i e m a n n - L e b e s g u e ' s Lemma,) For each f G L\, / ( £ ) -* 0 as |£| —¥ + o o .
PROOF: This result can be easily established for special choices of / . For instance, let Q = [oi, b%\ X . . .X [a n , fen] be a parallelepiped in ffin, and let f(x) = 1 on Q, / ( B ) = 0 outside . Then /(« = ~~
£
•••£
e-*"t&^+-+<-») < fa J ...
_>_„-2iri(£iii)|i>i
_ i _
-2?ri(£na;„)|6n
2ir£i
2TT(„
I "n
loi
i.e.
1/(°1£ (2^161--ie,r°
as|
^+°°-
If Qi, • • •, Qk are parallelepipeds in EC and / = Y!k=i "fc/^with / A (x) = 1 on Qk, fk(x) = 0 outside Qk, then / = £ i = i a*/fc a n d therefore / ( £ ) -> 0 as |£| —► oo. A generic / 6 Li can be approximated arbitrarily well in the ||-Hj norm by a sequence of functions of the form considered above (see [31]). That is, for / £ Li and each e > 0 there is a function g = Y,<*kfk such that ||/— g\\ < | , and l<7(01 -> 0 as |f| -* oo. Therefore
|/(o| < ||/-5|L+i^)i <
\\f-g\\x e
<
2
+
e
2 =
+ £
\m)\
74
CHAPTER
5. THE FOURIER
TRANSFORM
if |£| > M. Therefore /(£) -> 0 as |e| -> +oo.
□ By the Riemann-Lebesgue Lemma, not every complex continuous function can be the Fourier Transform of an integrable function: if it does not vanish asymptotically for large values of the argument it is not a Fourier transform. For instance, A constant function is not the Fourier transform of an integrable
function.
Suppose both / and Xkf{x) are integrable. Let e'*' denote the fc-th unit vector in the usual canonical basis of IRn. Then
/tt + *CW)-/(fl = j ^ ^ /e-^-lj f{x)dx Notice that
-1
h By Dominated Convergence
—> — 2irixk
as h —> 0.
Therefore • If / S i i and Xkf(x) is also integrable, then J^- exists. Moreover, it is the Fourier transform of —27ria;fc/(x). Thus, better integrability properties of / give rise to greater regularity of / . As an example, if / G L\ and |x| / ( i ) is also integrable, then the Laplacian A / exists: it is the Fourier transform of —(27r) 2 |i| 2 /(a;). Correspondingly, suppose / G L1 is differentiable, with J^- G L\. M o r e o v e r , suppose f{x) —> 0 as \x\ —> +oo. Then, using integration by parts,
gtt, . / , — « ( .,* = In other words
0- j
f(x)e-2"iex{-2Tvi£k)dx.
5.2.
THE FOURIER
75
TRANSFORM
• If / £ Li, | £ e i i and / ( i ) -> 0 when | i | -> +oo, then
^(0=2«6/(e)Assuming / € i i vanishes at infinity and A / is integrable, then = -(2TT)2|£|7(0-
A/tf)
Observe the symmetry between the expressions for A / and A / . All the results given so far refer to linear aspects of the Fourier transform. In addition, there is a nonlinear operation in L\ which behaves well in connection with Fourier transformation. Given f,g£ L\, consider the new function (as,y) •-> f{y)g{x — y)- By Fubini's theorem (see [31]), the function x
*->■ J f(y)g{x
- y)dy
is integrable. Define the convolution of / and g by f*g{x) = jf(y)g(x-y)dy.
(5.8)
By Fubini's theorem, it follows easily that f*g (f*g)*h
= g*f = f *(g*h)
whenever f,g,h £ Lx. Again, by Fubini's theorem f^9ii)
= jjp jjp e-2^xf(y)g(x - y)dxdy.
Apply the transformation of IRn x IR" u = y,
v = x -y,
to obtain
fT9{ ]
t = {Jm"e~2^'uf{u]du)
• If / , g S Li, then
Unre~**~9{v)dv)
_____
f*g(t) = f(ZMi)-
(5-9)
At this point the reader should stop and work out the connection between this material on the Fourier transform and the first section of this chapter.
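Before moving on, a small numerical sanity check of the normalization adopted here (the factor $-2\pi$ in the exponential) may be helpful; the sketch below approximates the transform of a one-dimensional Gaussian by a Riemann sum and compares it with the known closed form. The grid and the test frequencies are arbitrary illustration choices.

```python
import numpy as np

def fourier_transform(f, xs, xis):
    """Riemann-sum approximation of f_hat(xi) = int exp(-2*pi*i*xi*x) f(x) dx
    on the grid xs, evaluated at the frequencies xis (the convention of this chapter)."""
    dx = xs[1] - xs[0]
    kernel = np.exp(-2j * np.pi * np.outer(xis, xs))   # e^{-2 pi i xi x}
    return kernel @ f(xs) * dx

if __name__ == "__main__":
    xs = np.linspace(-20, 20, 8001)
    xis = np.linspace(-3, 3, 13)
    f = lambda x: np.exp(-np.pi * x**2)        # Gaussian; self-dual for this convention
    approx = fourier_transform(f, xs, xis).real
    exact = np.exp(-np.pi * xis**2)
    print(np.max(np.abs(approx - exact)))      # small discretization error
```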
76
CHAPTER
5.3
5. THE FOURIER
TRANSFORM
The Plancherel Theorem
We shall shift our emphasis from functions in L\ to square summable functions and develop the Fourier inversion theory - for functions in L2. Let us begin by noting an important property of L2: square summable func tions may not be necessarily integrable, but they can be approximated in the mean by square integrable functions which are integrable. For, let f £ L2 and, for m > 1, let
/»W - I 0)
if |x|
>
m, m.
Clearly, fm £ L2. To prove the integrability of fm, let C m denote the n-volume of the sphere of radius m. Then
f\fm(x)\dx
= f
J
\f(X)\dx
J\X\
< cin/ii 2 <+oo, so that fm £ Ly. Moreover, H/-/»llJ=/
|/(a)|*
asm->oo
J\X\>m
by Dominated Convergence. Thus • L\ fl L2 is dense in L2. L e m m a 5.3.1 If f £ Lx fl L2 then f £ L2 and \f\
= ||/|| 2 .
We cannot give a full proof of this fact but we can nonetheless outline its main points. For the details, refer to [34]. Observe that
\\fl = J\fw\2dz = Jf((;)mn for / £ Li fl L2. Also,
W) = J e2*i(xW)dx = J e-W>J(=y-)dy,
5.3.
THE PLANCHEREL
THEOREM
77
so that / is the Fourier transform of /(—x). Consequently, / ( £ ) / ( £ ) is the Fourier transform of the convolution
h(x) := J
f(y)f(y-x)dy
||/t = /M£KOn the other hand
\\f\\l = J\f(x)\2dx = h(0).
To complete the proof, it only remains to prove that fc(o) = / f c ( * K ,
(5.10)
which is precisely Corollary 1.2 of [34], to which the reader is referred for the details. The foregoing lemma is most important, because it allows us to extend the Fourier transform / — i > f to the whole of Li. Let us see how. Let / 6 Li be given. Because Li n L2 is dense in Li, there is a sequence {/*.} in Li fl Li such that ||/* — / | | 2 —> 0 as k —> oo. Consequently [[/*. — /(|| 2 —>■ 0 as fe,/ —► oo, and by the previous lemma \\fk~ fl\\2 = \\fk-fl\\2-+0
a.sk,l^oo.
By the completeness of L2, there is a unique element g 6 L2 such that /*. —> g as fc —>■ oo. Define f := g. Clearly
/IL =
lim fk
& 1 1 Kfc s
fc->oo
fc—^00 I'
= lim[|/4 2 = H/ll
11 -=
for / 6 i 2 Thus we have established the important T h e o r e m 5.3.1 (Plancherel's a linear isometry, i.e.
Theorem) The Fourier transform $ : L2 —> L2 is
I l * / L = II/II2
M every f e L a .
(5.11)
78
CHAPTER
5. THE FOURIER
TRANSFORM
Let us examine some important consequences of Plancherel's theorem. • $ preserves the inner product in Li . For, observe that («. v) = 4 (ll u + v\\\ ~ II" - I'll* + *|[t» + iv\\\ - i\\u - ix>||*) . Apply it to u = $ / , v = $ y , with f,g € L2, and recall (5.11) to get 4 (9f,ig)
=
\\*(f + g)\\'-
\\*(f - g)\\>
+ i\mf +
ig)f-i\\$(f-ig)\\2
= ll/ + 5l| 2 -||/-ffl| 2 2 2 + i\\f + ig\\ -i\\f-ig\\ =
4 (/,«,),
thus proving the assertion. • The image of the Fourier transform is a closed subspace of Li. I m $ is clearly a subspace of Li. Let {/„} be a sequence in I m $ , with /n ~~ 9 —► 0 for some g € L2- By (5.11), ||/n - /m|| = | / n - / m | -> 0
asm,n->(»,
hence {/„} converges to some / € £2- Again by (5.11), ll*/»-*/|| = ||/»-/l|->0
asn^oo,
therefore $ / = g, i.e. y € Im<J>, so that I m $ is closed. • The Fourier transform is onto L2. Suppose the closed subspace I m $ is not all of L2. Then, its orthogonal complement Im^ - 1 contains a nonzero element, say g £ L%. By Fubini's theorem, for f £ L2 jg(x)f(x)dx
= I (J e-2™»g(y)dy}
=
f{x)dx
Jg(y)(yJe-2^"f(x)dx^dy
= jg{y)f{z)dy=(j,g)
= 0,
5.3.
THE PLANCHEREL since f(()
THEOREM
79
= / ( - f l and therefore Q,g}
= J f{-i)g(t)dt
= (g, / ( - • ) ) ■
Therefore (g, / ) = 0
for all / G L2
and g = 0. But, by (5.11)
which is a contradiction. This shows that Im<$ = L 2 , i-e. $ is onto. Recall that the adjoint of a hnear operator T : L2 —> L2 is the unique linear operator T* : L2 —¥ L2 satisfying (Tf,g)
=
(f,T*g)
for each choice of / , g 6 L2 (see e.g. [14]). Linear isometries of L2 (like $, for instance) satisfy the condition T'T = I. Moreover, if they are onto (like $ ) , then they also satisfy TT* = / . For a proof of these statements, see [14], Theorem 3.1. In general, a linear operator T : L2 -4 L2 is said to be u n i t a r y if T'T = TT* = I.
(5.12)
Thus, a unitary operator is necessarily invertible and its inverse coincides with its adjoint, i.e. T " 1 = T*. • The Fourier transform is a unitary operator of L2. In fact, $ preserves the inner product, and therefore
for every f,g€. L2. Thus $*$ = / . Let / = $ u , which is possible because $ is onto. Therefore
II#VB; = II***«II; = H ;
= ii/iu,
hence $* is also an isometry. Since $** = <$, it follows that $**$* = $ $ * = / . This proves that $ is unitary.
CHAPTER
80
5. THE FOURIER
TRANSFORM
In view of what has been said so far, to invert Fourier transforms it suffices to compute the adjoint $* of $. For, by Fubini's theorem
(*f,g)
=
j(KJe-^'f{x)dx^W)^
=
j'f{x)(fe**-'g{t)di)dx
with
$tg(x) = Je2"«*g(£)dt The Fourier inversion formula
f{*) = j
(513)
should now be obvious. The Li theory of the Fourier transform yields a very elegant solution to the Fourier inversion problem. Nonetheless in probability applications characteristic functions are essentially Fourier transforms of densities, which are L\ functions. Thus, the Li theory provides a more natural setting, although it is much harder. The reader is invited to read it in Chapter 1 of [34].
Chapter 6 Second Order Random Fields This chapter contains the basic material pertaining the main topic of this book. Special emphasis is laid upon the construction of random fields from information available from experiment, namely covariance functions. The Karhunen-Loeve expansion is given a central role as method for the construction of random fields on compact sets, see section 6.2. The mean square calculus of random fields (continuity, integration and differentiation) is presented in section 6.3, and the Karhunen-Loeve expansion is shown to be a general method for constructing mean square continuous random fields on compact sets. Section 6.1 contains some notation and generalities on second order random fields. Finally, section 6.4 contains material related to chapter 5, including spectral measures and Bochner's Theorem on the representation of covariance functions as Fourier Transforms.
6.1
Covariance Functions
In what follows we shall always assume all random elements are defined on a given probability space (£l,A, P). Random fields will be defined on subsets of IRd, with d > 1. The Euclidean norm in ffi.d will be denoted by |-|. Definition 6.1.1 A second order random field over S C IRd is a function Z : S —» L2(£l,A, P ) . If d = 1, Z is said to be a second order stochastic process. In other words, a second order (complex) random field has been specified over 5 if a random variable Z(x) has been specified for each x E S, with .E|Z(x)| < 81
82
CHAPTER
6. SECOND
ORDER RANDOM
FIELDS
+00. Thus, we can alternatively say that a second order random field over S is a family {Z(x), x £ 5 } of square integrable random variables. Examples of random fields of interest in Water Resources Engineering are given in section 1.2 above, namely the hydraulic conductivity field {D(x), x £ 5 } and the piezometric height field {u(x), x £ S}. An example of random field arising in image analysis is {Z(x), x £ S}, where the points of S £ St represents the individual pixels1. An image is just a realization of the given field. As to the values of the field, black (B) and white (W) image analysis requires that _, „ I 0, if pixel x is B in image w, Z{x,ui) = < '., . , . . . v ' I I , 11 pixel x is W in image w.
/„ ,\ (6.1)
In modern practice, S is digitalized into a finite set and images are represented as matrices with elements in {B,W}. If color images are to be considered, then each random variable Z{x) takes values in the three dimensional space measuring the intensity of red (R), green (G) and blue (B) at pixel a;. Thus, color images are really 3-vectors of real random fields: the R field, the G field and the B field. Definition 6.1.2 A real random field Z is G a u s s i a n if for each n > 1 and each choice of points xl,...,xn £ S, the corresponding random vector (Z(x1),..., T Z(x")) is Gaussian. Let Z be a real Gaussian random field and let x1,. .., xn £ S. Let m £ lRn, C £ iR n x n be given by m; dj
=
EZ(x'),
=
{
E{Z{x )
t = 1 , . . . ,7i - rn^Zix^
-
mj),
i,j =
l,...,n
Clearly C is symmetric. Given u £ lRn, n
n
J2 CijUiUj =
E Y^ (z(xi)
•,i=i
- m i )u i (2(!B i ) -
i.j=l 2
=
E
>0 i=l
hence C > 0. 1
Pixel = picture element.
mj)uj
6.1.
COVARIANCE
FUNCTIONS
83
Suppose C > 0 and let C = LLT be its Cholesky factorization 2 . Define the random vector W := L~\Z
- m),
1
where Z stands for (^(a; ),.. ., Z(x"))T. Clearly W is a Gaussian random vector, with mean 0 and covariance matrix given by EWWT
L-\E{Z-m)(Z-m)T)L-T
=
In other words, W ~
1
T
T
L- (LL )L-
= I.
N(0,1 ) and Z = LW + m.
We know that
Mi) = e-*l(?
and therefore
M^ = e*"*-*^'
as can be easily verified, see (5.7). Thus, m and C determine the characteristic function of {Z(xx),. . ., Z(xn))T, thereby determining its distribution. Thus, the finite dimensional distributions of Z are determined by the pair of functions m(x) C(x,y)
-: ==
EZ{x) E[Z{x)-m(x)}[Z(y)-m(y)}T,
called the m e a n and covariance function. By Kolmogorov's Extension Theorem - see section 2.1 - the whole random field is thus determined by these two functions. Definition 6.1.3 Let Z be a (complex) second order random field over S. The m e a n and covariance function of Z are mz ■ S —> (D and Czz '■ S x S —t <E, given by
2
mz(x)
=
Czz{x,y)
=
Any factorization will do.
EZ(x) E(Z(x)-m{x))(Z(y)-m(y)).
84
CHAPTER
6. SECOND
ORDER RANDOM
FIELDS
We have seen that the mean and covariance functions of a Gaussian random field characterize it. Indeed, mz and Czz determine the finite dimensional distrib utions of Z. In general, this is not the case. Nevertheless, a surprisingly vast set of properties of an arbitrary second order random field depends only on mz and Czz- We shall concern ourselves in what follows with these properties only, which constitute the so called second order theory. Without any loss of generality, let us assume that mz = 0, i.e. that Z is centered. If such is not the case, Z — m is always centered, its covariance function being precisely CzzDefinition 6.1.4 A second order random field is said to be h o m o g e n e o u s , if there is a function Rz : IRd —¥
= Cz{u + x, x) = EZ{u +
x)Z{x),
for any x 6 IRd. In particular Rz{0) =
E\Z{x)\\
for any x € IRd. In general, let C : S x S —¥ (U be the covariance function of a certain random field Z over S. Then Proposition 6.1.1 A covariance function satisfies the following a) C{x,y) b)
=
conditions:
C{y,x).
\C(x,y)\<^E\Z(x)\2y/E\Z(y)\2.
c) For each choice ofn>l,
of x11...,
xn £ S and of u E
n
Y, Cix^x^mu]^
d) In particular, C(x,x)
> 0.
0.
(6.2)
6.1.
COVARIANCE
FUNCTIONS
85
PROOF: Property a) is easily established from the definition. Property b) is a consequence of the Schwartz inequality. Property c) has already been established for Gaussian random variables. The reader can no doubt supply the remaining details and thus complete this proof. r—i In particular, let C be the covariance function of a homogeneous second order random field Z over IR , and let R be the corresponding correlation function. Then a) R(x) = b) \R(x)\ <
R(-x). jE\Z(x)\2jE\Z(0)\2.
c) R satisfies n
Y^ R{x{ - xj)uiUj > 0 for any choice of n > 1, x1,. .. , xd £ ]Rd and u G Wn. Definition 6.1.5 A function / : IRd -+(D is said to be of positive t y p e if n
Y, f{x{ - x^uolj for each choice of n > 1, a s 1 , . . . , xd 6 Md and u
>0 £$".
Thus, correlation functions of homogeneous random fields are of positive type. A characterization of correlation functions based upon this property is given in section 6.4. Let us close this section with a few notational remarks which will prove useful later on in the sequel. Given two scalar centered second order random fields Z and W, their cross covariance function is Cz,w(x,y)
=
EZ{x)W{$.
This concept will be required in section 6.3. Theorem 6.3.3 deals with cross covariances between a scalar random field and a vector valued one. More generally, let us assume that both Z and W are vector valued random fields, say Z(x) is a random p x 1 complex matrix and W^(s) is a random q x 1 complex matrix, for
CHAPTER
86
each x 6 S. Then Z(x)W(x)
6. SECOND
ORDER RANDOM
FIELDS
is a random p x q complex matrix, its elements T
being L2 random variables. It is natural to define EZ(x)W(x) therefore Cw,z{x,y)=\\Czi,wj{x,y)\\.
6.2
elementwise, and (6.3)
Construction of R a n d o m Fields
Let K be a compact set in M and let Z be a second order random field over K. Let C be the covariance function of Z, which will be assumed to be square integrable over K x K in what follows, i.e. \C(x,y)\ Let A : L2(K)
—> L2(K)
dxdy < +oo.
be the integral operator A<j>(x)= f
C{x,y)4>{y)dy.
JK
Then, A is a compact, self-adjoint linear operator on L2(K) (see [14]). Let Ai, X2,. . ., be the eigenvalues of A, with orthonormal eigenfunctions rj)i, tf}2,. .., i.e. Ai/>n = A n ^„, re = 1 , 2 , . . . , {i>n,i>m) = Snm, re,TO = 1 , 2 , . . . , where (f,g)=
[ f(x)g(x)dx
(6.4)
iovf,geL2(K). Suppose moreover that C is continuous. Then (Af,f)>0,
feL2(K)
(6.5)
as can be easily derived from (6.2). Thus, the eigenvalues are non-negative (A„ > 0, re = 1,2,...), besides converging to zero as re —> oo. In addition, Af is continuous for each / £ L2(K), so that each ipn is continuous if An > 0. Indeed,
6.2.
CONSTRUCTION
OF RANDOM
87
FIELDS
Besides, Mercer's theorem holds, i.e. C{x,y)
= £
(6.6)
Afc^fc(x)V>fc(j/),
convergence being absolute and uniform over K x K. See Chapter IV of [14] for the above background material. Let Z\, Z 2 , . . . , be a sequence of (complex) random variables, with a) EZn = 0 , n = 1 , 2 , . . . b) EZnZm
= XnSnm, n,m = 1,2,
T h e o r e m 6.2.1 Let K, {A n }, {^n}, {Z„} be as above. In particular, assume C is continuous. Then the series (6.7)
Z(«) := £ > B ( * ) Z n converges in quadratic mean for any x £ K. field over K. Moreover a) EZ{x)Tn
= Xn^n(x),n
b) l i m ^ o E\Z(x)
= 2
- Z(x°)\
Thus Z is a second order random
l,2,... = 0, for any x° £ K.
PROOF: For each n > 1, let
Sn{x) = YJi>k{x)Zk. For n,m > 1, n > rn, E\Sn(x)
- Sm(x)\2
=
E fc=m+l
= £ A^o-Of^o fc=m+l
in virtue of (6.6). This proves the convergence in quadratic mean of the series in (6.7), thereby proving that Z is a second order random field. Moreover, ESn{x)Tk
= j^^^EZkZt k=l
=
XiMz)-
88
CHAPTER
6. SECOND
ORDER RANDOM
FIELDS
Property b) follows upon letting n —> oo. Finally, observe that \Z{x)-Z(x°)\
= \Z(x)\2-Z(x)Z{x°)-Z{x0)Z(x)+\Z{x°)\
(6.8)
and therefore E\z(x)
- Z(x°)\2
=
C{x,x)-C{x,x0)-C{x°,x)
-¥
C{x°,xo)-2C{xo,xo)
C{x°,x0)
+ + C(xo,x°)
=0
as x —> x°.
□
For the sake of illustration, consider the following, most important one di mensional example. E x a m p l e 6.2.1 Consider the function C(t, s) = min(t, s),
0
clearly continuous on [0,1] x [0,1]. Moreover, C(t,s) ing compact, self-adjoint integral operator is
1, = C(s,t).
The correspond
Acj>(t) = I scj>(s)ds + t I (j>(s)ds. Its eigenfunctions are continuous and they satisfy f s4>(s)ds + t f <j>(s)ds = \(/>{t),
0<(<1
for suitable real scalars A. In particular, necessarily <^(0) = 0. The left hand side is differentiable, then so is <j>. Differentiating, get
£ <j>(s)ds = Wit), from which it follows in particular that >'(l) = 0. Differentiating again, get the differential eigenvalue problem \<j>" + (j, = #0)
=
0 0,
on (0,1) *'(!) = 0.
6.2.
CONSTRUCTION
OF RANDOM
89
FIELDS
An elementary argument shows that the above problem has no non-negative eigenvalues. Let A = l/u>2. Then <j>(t) = A cos tot + B sin uit and the boundary conditions translate into A = Q,
a;Scosw = 0.
The solutions to cos w = 0 are ujn = (n + |)7r, re = 0 , 1 , . . . , hence the eigenvalues of this problem are ^"
=
7 — , ii2 2'
The eigenfunctions are of the form Asmuint.
n =
0,l,....
Since
r1 1 / sin 2 wntdt = - ,
Jo
2
the normalized eigenfunctions are V'n(t) = v 2 sin Ire+ - ) 7ri, re = 0 , 1 , . . . , Let ZQ, ZI, ... be a sequence of independent iV(0,1) random variables. Then
Wn(t)^±^±^Zh defines a sequence of Gaussian random variables for each t 6 [0,1]. EW„{t) = 0 and
Clearly
^(w.)^E^(t+'';l?,lt+i)". By the preceding theorem W{t) := Mm Wn(<) exists in quadratic mean, and therefore {W(t),0 < t < 1} is a second order stochastic process, clearly Gaussian and centered.
90
CHAPTER
6. SECOND
ORDER RANDOM
FIELDS
By Mercer's theorem .,
.
2 ~ sm(k + ±)irtsm(k + \)ns
hence EW(t)W(s) = min(i,s). The Gaussian stochastic process constructed above is called the standard Brownian motion or W i e n e r process and it is given by k
k=0
*
+ 2
Clearly W(0) = 0 a.s. Moreover, for t > s the Brownian increment W(t) — W(s) is Gaussian, with zero mean and variance E [W{i) - W(s)f
= t-2s
+ s = t-s.
Let n > 1 and pick instants 0 = t0 <
W(tn) = JT(W(tk) - W{th-t)). Let >k be the characteristic function of the fc-th Brownian increment, and let
Ee«w^.
4(() = Then
m = exP{-|e2}, MO = expj-^"^- 1 ^ 2 },
k = l,2,...,n.
Noting that *n = z2{tk — tk-i), k=l
it follows that
m = n MO, k=\
thus proving the independence of the Brownian increments.
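The series above translates directly into a simulation recipe for the Wiener process: truncate the Karhunen-Loeve expansion after finitely many terms and draw the coefficients as independent $N(0,1)$ variables. A minimal sketch follows; the truncation level and the time grid are illustration choices only.

```python
import numpy as np

def brownian_motion_kl(t, n_terms=500, rng=None):
    """Approximate the Wiener process on [0,1] by truncating its Karhunen-Loeve series
    W(t) = sum_k sqrt(2) * sin((k+1/2)*pi*t) / ((k+1/2)*pi) * Z_k,  Z_k ~ N(0,1) i.i.d."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.asarray(t, float)
    k = np.arange(n_terms)
    omega = (k + 0.5) * np.pi                      # frequencies; the eigenvalues are 1/omega^2
    Z = rng.standard_normal(n_terms)
    basis = np.sqrt(2.0) * np.sin(np.outer(t, omega)) / omega
    return basis @ Z

if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 1001)
    W = brownian_motion_kl(t, rng=np.random.default_rng(1))
    # One sample path; for the exact process E[W(t)^2] = t and W(0) = 0.
    print(W[0], W[-1])
```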
6.3. ANALYTICAL
PROPERTIES
OF RANDOM
91
FIELDS
This is probably the most important example of a stochastic process in con tinuous time. All of its properties follow from the independence and Gaussian character of the Brownian increments, for instance. Among its most noteworthy features one should mention the following, see [24]: With probability one, the trajectories t H-» W{t,u)) everywhere continuous functions which are nowhere
6.3
of the Wiener process are differentiable.
Analytical Properties of Random Fields
Definition 6.3.1 A second order random field Z over S C IRd is m e a n square (m.s.) continuous at x° £ S if E\Z(x) — Z(x°)\ -4 0 as x —> x°. Z is said to be m e a n square (m.s.) continuous if it is continuous at every point of S. Theorem 6.2.1 gives a method for constructing m.s. continuous random fields over compact sets. We shall prove below that it is indeed a general method for constructing everywhere m.s. continuous random fields on a compact set (see Theorem 6.3.2). Proposition 6.3.1 A random field Z is m.s. continuous at x° if and only if its covariance function is continuous at (a; 0 ,a; 0 ). PROOF: Observe that \z{x) - Z{x°)\2 = \Z(x)\2 - 2ReZ(x)Zjx^)
\Z{x°)\2,
+
hence E\Z(x)
- Z{x°)\2 = C{x,x)
- 2ReC{x,x°)
0
+
C{x°,x°).
Clearly the continuity of C at (x°,a; ) imphes that E\Z(x)
- Z(x°)\
-> 0 as
x —>• x°.
Conversely, suppose Z is m.s. continuous at a;0, and let a; —)• x°, y —> x . Then C(x,y) = EZ(x)Z{y) -> EZ{x°)Z(x°) = C(x°,x°).
□ Let Z be the m.s. continuous random field
Z(x) = £ £„Vn(z) n=l
92
CHAPTER
6. SECOND
ORDER RANDOM
FIELDS
of equation (6.7). Formally, the Zn are the Fourier coefficients of Z with respect to the orthonormal system {V>n}, i e .
Zn = j Z{x)JJx)dx.
(6.9)
But, what is the meaning of this integral? Let us clarify this issue in what follows. It suffices for our purposes to consider integrals of the form / f{x)Z{x)dx,
(6.10)
JG
where a) G is a bounded region in St , b) / : G —>(D is continuous. Integral (6.10) will be defined in the sense of Riemann, with convergence in mean square. Let { Q n } ^ ! be a partition of ttt into cubes with edges parallel to the coor dinate axes, and let S := sup diam(<3„), where diam(5) = sup{|s — y\ : x,y £ 5 } , for any set S C IR ■ Number the cubes in such a way that m *=i
for some m > 1; this is possible because G is bounded. That is, G Pi Qk / 0 for fc = 1 , . . . , m and G H Ql = 0 for / > m. Let \Qk\ denote the d-dimensional volume of the fc-th cube, and pick xk € G fl Qk, for k = 1 , . . . , m . Form the Riemann sum m
Sm:=Y,f{*h)Z{*k)\Qk\
(6.11)
fc=i
whose value is a random variable in L2($l,A, P), and depends on both the par tition {„} and on the choice of xk G G f) Qk, for k = 1,. . . , m . If the above
6.3. ANALYTICAL
PROPERTIES
OF RANDOM
FIELDS
93
sum converges in mean square when S —> 0, the Umit is denoted by the symbol in (6.10). In other words
JG
f(x)Z(x)dx
:=
jrf(zh)Z(xk)\Qk\
l.i.m S->O
k=1
if and when the limit exists. Observe that
E\Smf = E/(**W*fc)Z(*')/(*')IQ*ll««l k,l=l m
k = £ /(**)c(*V)/(*')ie*n
If /(x)C(a;, y)f(y) to
is Riemann integrable over G xG, then the last sum converges jGjGf{x)C{x,y)W)dxdy.
(6.12)
The converse is also true: T h e o r e m 6.3.1 Integral (6.10) exists in the mean square sense if and only if integral (6.12) exists in the Riemann sense, and then
E [ f(x)Z{x)dx JG
E\[
f{x)Z(x)dx
= 0, =
f
\JG
f
f{x)C{x,y)Jty)dxdy.
JGJG
PROOF: Indeed, for m > n,
Sm-Sn=
£ f(xk)Z(xh)\Qk\ k=n+l
and
E\Sm-Sn\z=
m
£
m
£ /(**)(**,*')/(*')l
fe=n+l ( = n + l
which spells out the assertion of the theorem, in view of the completeness of both W and L2(Sl,A,P). □
94
CHAPTER
6. SECOND
ORDER RANDOM
FIELDS
Given a compact K C M and a continuous function C : K x K —±(D, the corresponding eigenfunctions {V>n} a r e continuous. Therefore the double integrals /
/
i>m{x)C(x,y)rl)n(y)dxdy
JK JK
always exist in the Riemann sense. If Z is the m.s. random field constructed in (6.7), then the integrals (6.9) exist for each n. T h e o r e m 6.3.2 [Karhunen-Loeve Representation] A random field Z with con tinuous covariance function C on a compact set K C IR is m.s. continuous if and only if oo
z(x) = £ z„y>n(x), n=l
convergence being in mean square. Here {Vv} is the sequence of eigenfunctions associated with the kernel C(x,y), with corresponding eigenvalues {A„}. Moreover, {Z„} is a sequence of orthogonal random variables, with zero mean and EZkZ~l=Skl\l!
k,l = 1,2,....
(6.13)
Moreover EZ{x)Zk
= \ki>k{x),
A: = 1 , 2 , . . . .
(6.14)
In fact Zk{w) = (Z(.,u),i,k)
,
k=l,2,...
the inner product being as in (6-4). PROOF: Given a m.s. continuous random field Z over K, the sequence of its Fourier coefficients {Zn} is well defined if C is continuous. Then EZkZt
=
E [ 4>k{x)Z{x)dx j
=
/
/
rPl{y)Z{y)dy
ipk{x)C(x,y)i>i(y)dxdy ipk(x)dx
=
Xi{i>i,i>k),
thus proving (6.13). Condition (6.14) is proved in an analogous manner. The converse statement has already been established in Theorem 6.2.1.
r—I
In what follows, let D be a domain in (i.e. an open, connected subset of) IRd.
6.3. ANALYTICAL
PROPERTIES
OF RANDOM
FIELDS
95
Definition 6.3.2 A (complex) random field Z on D is m.s. differentiable at x° G D and its derivative there is the random vector U G L2(£l, A, P) if Z{x° + h) = Z{x°) + Uh + e{x°, h),
(6.15)
where E\e(x°,h)\2
0
\hf
as h —> 0.
The derivative of Z at x° is normally denoted by By Jenssen's inequality Ee{x°,h)\2
Z'(x°).
<E\e(x0,h)\\
hence Q<E\e{x\h)\
E\e(x°,h)\2
K
0. \h\4 \ If Z is differentiable at x°, take expected value in (6.15) to get EZ{x° + h) = EZ(x°)
+ EU-h
(6.16)
+ Ee{x°, h),
which, together with (6.16) implies the differentiability of EZ at x°. Moreover {EZ)'(x°)
=
EZ'{x°).
Observe that E\Z{X° + h) - Z{x°) - U ■ hf = C{x° + h,x° + h)-
C(x° + h, x°) - C{x°, x° + h) + C{x°, x°)
- [E (Z{x° + h) - Z{x0))
U + E (Z{x° + h)-
Z(x0))
U) ■ h
+ E\U -h\2 = C{x° + h,x° + h)-
C(x° + h, x°) - C{x\ x° + h) + C{x°, x°) -E\U
■ h\2 + Ree{x°,h)U
■ h,
upon recalling the differentiability condition (6.15). Then C{x° + h,x° + h) - C{x° + h, x°) - C{x°, x° + h) + C{x°, x°) E\U-h\ \h\
2
<
E\Z{x° + h) - Z(x°) - U ■ h\2
+
E\e{x°,h)\ \U\ \h\
96
CHAPTER
6. SECOND
ORDER RANDOM
FIELDS
Observe that \U • h\ is a quadratic form in the coordinates of h. Let Q be its matrix, necessarily hermitian. Definition 6.3.3 The correlation function C admits a generalized derivative at x° € D if there is a hermitian matrix Q G
+ T){x°;h),
where
\hf If such is the case, Q is denoted by the symbol d2C{x°,x°) dxdy Thus we have the following characterization of mean square differentiabihty for second order random fields. The notation below is as in (6.3). T h e o r e m 6.3.3 Let D be a domain in Md. The random field Z is m.s. entiable at x £ D if and only if the generalized derivative d2Czz{x,x) dxdy exists. If such is the case a) EZ'{x)
=
(EZ)'{x°),
b) Czz,{x,y)=d-C^,
and
c) the generalized derivative — g g
exists and
Cz
—— dxdy
.
differ-
6.3. ANALYTICAL PROOF:
PROPERTIES
OF RANDOM
97
FIELDS
Only b) and c) require a proof. For, notice that Czz,(x,y)
=
EZ(x)Z'{y),
therefore CZz{x + h,y)= E[Z(x + = EZ\x)
CZz{x,y)
h)~Z(x)}Z(y)
■ hZ{y) +
= EZ'{x)Z(y)
Ee{x,h)Z(y)
■h+
Ee{x,h)Z(y)
i.e.
CZz{x + h,y) - CZz(x, y) = Cz>z{x,y)
■ h + 5(x y\h),
where \h\ This proves b). As to c), observe that \C(x + h,y + k)-
C{x + h,y)-
C{x,y + k) + C{x,y)
- {U-h){U
-k)
1*11*1 tends to zei o as h, k —> 0, as can be easily established. Recallthat (U ■ h){U ■k) n is a bilinear form in h and k, hence there is a matrix B € Vn> such that (U-h)(U-k) thus proving that Moreover
g}g
=
Bh-k,
exists. Indeed, it equals B.
E {Z(x + h) ■ Z(x)} {Z{y + k) ■ Z{y)} = (Z'(x)-h)(Z'(y)-k)
+ o(\h\\k\)
-*%&*■>+-mm. which proves c).
□
These differentiability criteria simplify considerably if the random field is homogeneous. Indeed, the following is true:
98
CHAPTER
6. SECOND
ORDER RANDOM
FIELDS
Corollary 6.3.1 Let Z be a homogeneous random field over the domain D C IR ■ Then, Z is m.s. differentiable if and only if the generalized second derivative ^y(O) exists. If this condition holds, then ^£ exists for all x £ D, and
6.4
CZz'{x,x
+ y)
=
Czlz,(x,x
+ y)
=
-j^{y), a?C —-j^(v)-
Spectral Representation of Covariances
The characteristic function of a d-dimensional random vector X has been denned as and therefore
4>x{i) = je*XdFx{x). Also
Mt)=
Je^^idx),
where fj,x is the measure denned on the Borel subsets of M by (ix(B)
= P(X e B).
The following theorem says that the distribution fix of X is determined by the characteristic function of X. T h e o r e m 6.4.1 Let \x, v, be two finite Borel measures such that I ei(Xfi(dx)
= J e*xv{dx).
(6.17)
Then fi = v. PROOF: Let M be the class of all complex valued bounded measurable functions / for which j f{z)p{dx) Observe that
= J f{x)w{dx).
(6.18)
6.4. SPECTRAL
REPRESENTATION
OF
COVARIANCES
99
a) M. is not empty and is closed under linear combinations. b) M. is closed with respect to the operation of taking limits of uniformly bound ed convergent sequences of measurable functions. Hence it contains all trigonometric polynomials such as N
Ev**.
(6-19)
By the Weierstrass approximation theorem, any bounded continuous function is the uniform limit of a uniformly bounded sequence of trigonometric polynomials like (6.19). Hence M. contains all continuous bounded functions. If / is any bounded measurable function, there is a uniformly bounded se quence {/ n } of continuous functions such that fn —> f pointwise. Therefore f £ M. Taking / = XB in (6.18) for an arbitrary B £ B yields fi(B) = v(B), hence fi = v. r—|
Definition 6.4.1 Let <j> : IRd —HZ? be a Borel measurable function. A spectral measure for cj> is a finite measure fj, : B —¥ [0, oo) such that
0(*) = | e * > K ) . The preceding theorem says that, if a spectral measure exists, then it is unique. If a function > has a spectral measure, then
4>(y-x) = (x-y), hence 4> is s y m m e t r i c . Moreover, for k > 1 and wi, . . . , « * G(F £ Z,m=l
{xl - xm)mum
=
^ (,m=l
/e^'-^Vt^ufc J
2
/
E u;e
im'-i
/*(«) > o
i.e. <j> is of positive type. Thus <j> could very well be the correlation function of a homogeneous random field. Indeed,
CHAPTER
100
6. SECOND
ORDER RANDOM
FIELDS
Proposition 6.4.1 Let (p : Mtd —>• W have a spectral measure. Then, is the correlation function of a homogeneous m.s. continuous random field. PROOF: Observe that <j> is continuous, since
W«)-#y)||«*-~e*"|ji(dfl and e'*'1 — ei('v\ < 2. Letting y -> x, get <j>(y) —► >{x), hence continuity. In addition, > is known to be symmetric and of positive type. Let M := n(IRd). Let U be a random vector in IRd, with distribution P(U G B) = ^ B M
)
Let V be uniformly distributed on [—JT, W], independent of U. For each x let
d
e m.
Z{x) = VMt,i{V-*+V) Then EZ(x)
--
M 1-K
^)(4,e^K))=0, 7 VJiR" V v
\J-K 2TTM \J-«
EZ(x)Z{y)
= =
=
lU
y)
MEe ^-
M* Mje J i f f «*■<—> "(de) M
4>(x-y)-
Hence Z is a m.s. continuous random field with correlation function <£.
□
T h e o r e m 6.4.2 [Bochner] A continuous Junction of positive type has a unique spectral measure. PROOF: See [13], pp. 209-211, also [14], Chapter 5.
□ d
If the spectral measure fi of <j> has a density g € Li(IR ),
M#) = J MM,
i.e.
6.4. SPECTRAL REPRESENTATION
OF COVARIANCES
101
then
-f<"'Mm.
>(*) =
Function g is called the spectral density of >. If there is a spectral density g, then
\*i*)\
II £ L\. For a proof, see [13], p. 211. The boxed remark at the end of section 6.2 is closely related to the one in sec tion 5.2, concerning the impossibility of having a constant function as the Fourier Transform of an integrable function. Indeed, a random field with a constant spec tral density is called white noise in the Electrical Engineering literature. The use of white in this terminology reflects the fact that all wave numbers £ are equally present in a constant Fourier transform, same as all frequencies are present in white light. Thus, white noise has infinite frequency band. An easy derivation shows that a constant spectral density is the Fourier trans form of an integrable function C if and only if C is the covaxiance function of the derivative W of the Wiener process -which we know does not exist! In prac tice one works with wide band approximations to white noise, so this apparent contradiction creates no real problems. Therefore White noise cannot be realized as a stochastic process. The white noise used in engineering practice is wide band approximation to real white noise. Nevertheless, white noise becomes a bona fide mathematical object provided distributions are accepted as possible trajectories of a stochastic process. In fact, d-dimensional white noise is a generalized random variable on IR with its Borel subsets, for d > 1. See e.g. [12] for the theory of generalized stochastic processes.
Chapter 7 Spectral Representation of Random Fields The spectral measure associated with every covariance function in section 6.4 becomes the basic ingredient of a representation formula for m.s. continuous, homogeneous random fields defined on a finite dimensional Euclidean space. The concept of an orthogonal random measure associated with a given positive mea sure is studied in section 7.1. In turn, the stochastic integral of a deterministic integrand with respect to an orthogonal random measure is introduced in section 7.2. Finally, the important representation formula for m.s. continuous homoge neous random fields is given in section 7.3. Every such random field is shown to be the Fourier Transform of the random orthogonal measure associated with the spectral measure of its covariance function.
7.1
Random Measures
In what follows, let S C lRd and let B = B(S) denote the family of its Borel sets. Let a finite positive measure A : B —>W be given (for instance, A could be Lebesgue's measure). Moreover, let (f2, A, P) be a probability space. Definition 7.1.1 A r a n d o m measure in S is a function v : B x fl —>
104 CHAPTER
7. SPECTRAL
REPRESENTATION
OF RANDOM
FIELDS
Thus, a random measure in S is both a) a random field on B, i.e. a family of random variables {v(B, ■), B S B}, and b) a measure valued random element ui i-> u(-,w) on (£l,A,P). Of particular interest are random measures for which each v(B, ■) is a square integrable random variable. For such random measures, the inner products E»[BrMCr),
(B,CeB)
are well defined. Definition 7.1.2 An orthogonal random measure in S is a function v : B —» L2[Q,A,P) such that i) "{Y^i
Bk) = HtLi v(Bk) for every pairwise disjoint sequence in B, and
ii) there is a complex measure A on B such that, for every B,C € B, Ev{B)v{C) = X(B PI C), for B,C £ B. In such case, A is called the structure measure of v. Note that X(B) = E\u(B)\2
> 0,
hence A is a finite positive measure. In particular = "A(0) \vl ~
2 E\u(%)\ - \"\v)\ ~= 0, "I
± ,
hence hence u(%) u(%) = = 0. 0. A commonly used notation is A commonly used notation is
E\v{di)\2 = \{di), E\v{dt,)\2
meaning meaning
E\u{B)\2 = X(B),
= \{di),
forallBeS.
E\u{B)\2 = X(B), forallBeS. If a spectral density exists, If a spectral density exists, it it is is customary customary to to write write A(d£) = A(d£) = g{t)dt, g{t)dt, hence
E\v{d£)\*== g(t)dt. E\»{di)\* g(tW-
(7.1) (7.1)
7.1. RANDOM
MEASURES
105
Let Z be a second order random field on 5 , and let L2(Z) denote the closed subspace of L2{Q,,A, P) generated by {Z(x),x £ S}. Namely, L2{Z) is the closure of the set of all linear combinations of the form
for n > l,x\...,xn£
S, c\...,cn
<E
Definition 7.1.3 An orthogonal random measure v is said to be s u b o r d i n a t e d to the random function Z if v[S) £ L2(Z) for each B £ B. E x a m p l e 7.1.1 Let {W(t),0 < t < 1} be the Wiener process, constructed in Example 6.2.1. Recall that its increments over disjoint intervals are independent, Gaussian random variables, with W{t + h) — W{t) ~ N(0,h). Let B stand for the Borel subsets of [0,1]. Define a random measure on B by setting uw([a,b}) = W{b)-W(a) for any interval [a, b] C [0,1]. The set function uw is clearly
b-c.
Again, a limiting argument shows that EWBWC
=
\BDC\,
hence vw is an orthogonal random measure with Lebesgue's one dimensional measure as structure measure. It is clearly subordinated to the Wiener process. E x a m p l e 7.1.2 Let S C Md and let Z be a homogeneous random field over IR with spectral density g £ Li(IRd) fl L2{IRd)- Then, the correlation of Z is
R(x) = Je2™- tgftdt,
106 CHAPTER
7. SPECTRAL
REPRESENTATION
OF RANDOM
FIELDS
by Bochner's Theorem 6.4.2. Let us construct an orthogonal random measure subordinated to Z, with the spectral measure of Z, i.e.
fi(B) = [ g(£)dt JB
as structure measure. For, recall that
/(*) = y>"V~(£R for every / £ L2{Md).
In particular, the closed subspace generated by the system {e 2 ™ 4 , x G Bd}
is the whole of L2(lRd). form
Its elements are the limits of hnear combinations of the (7.2)
Yl c*e
where n > 1, I 1 , • • •, I " 6 St , c 1 , . . ., c" 6
t,ckZ{tk). Given two linear combinations like (7.2), an easy computation shows that
-/(?*""•0 fe^eW
E[J2ckZ{x *)J [YldtZiy ■>)■ \ k
'
)
■
?(««■
In particular, E
/
Z!0*1
,a™*-«
(£R,
which shows that the mapping
Ylcke2™kt^J2ckZ(xk) is a hnear isometry. It can thus be extended uniquely to a hnear isometry ip L2(g) -> Lt(Z). Note that L2(g) = {>: Rd ^(C\ J \M)\ag{e)dt}.
7.2. STOCHASTIC
INTEGRALS
Define u : B ^ L2(Q,A,P)
107
by u{B) = V-(Xfl)-
Let Bi,Bj,...,
be a pairwise disjoint family of sets in B; clearly
*£*
k
hence
An* V «: \ k
1/
*k
Moreover,
Eu{B)u{C) Eu{B)U(C)
=■=
EII>( B)II>{XC)
£ V ( X BX ) ^ ( X C )
■ f x {t)xc(t)g{t)dt
B = J XB{t)M$)g{Z)di =- fI 9(tH g(tn JBnC JBnC
i.e. jgi/(5)i^)=/i(BnC). Therefore v is a random orthogonal measure whose structure measure is the spectral measure of the random field Z. Moreover, v is subordinated to Z by construction.
7.2
Stochastic Integrals
Let v be an orthogonal random measure with structure measure A on (S,B). We shall construct a linear isometry from L2(S, B, A) into L2(£l,A, P), denoted by
f^Jfdv,
(7.3)
E\J fdJ\ = f \f(z)\*X{dx).
(7.4)
i.e. a linear mapping such that
The mapping in (7.3) is called the stochastic integral with respect to the random orthogonal measure v. Let us begin by considering simple functions in L 2 (5', B,A), and define (7.3) for these functions only.
108 CHAPTER
7. SPECTRAL
REPRESENTATION
Definition 7.2.1 A function / e L2{S,B,X) form
OF RANDOM
FIELDS
is said to be simple if it is of the
n
where n > 1 , . c i , . . . , c n e <E and B, i , ■■k=l ■, Bn constitute a partition of 5. where n > 1, C j , . . . , c n £(D and B\,...,Bn constitute a partition of S. ,'s are the stochastic To suppose all To avoid avoid ambiguities, ambiguities, suppose all the the c* c^'s are distinct. distinct. Define Define the stochastic integral integral of of such such // by by means means of of
(fdu:=J2ckV{Bk). J
(7.5)
*=i
This definition associates with every simple / a well defined random variable in L2{n,A,P). Clearly J{af
+ l3g)du = ocJfdu+pJ9du,
(7.6)
if both / and g are simple. Moreover, E J fdujgdu
= j f{x)g(x)X{dx).
(7.7)
The following concept is useful in order to establish both (7.6) and (7.7). Definition 7.2.2 Given two partitions {Bi,..., Bm} and { C j , . . . , C n } of 5, the collection of sets {Bk fl Ci, k = 1, , m, I = 1 , . . . , n } is called the c o m m o n refinement of the given partitions. Clearly the common refinement of two given partitions of 5 is a partition of S. Thus, given two simple functions m
n
k=l
1=1
we can write f = ^Z ckXBhnC,,
9 = Y1, ^'XBkriC, •
k,l
k,l
In particular af + Pg = Y,(ack k.l
+
WxBknC„
7.2. STOCHASTIC
INTEGRALS
109
from which (7.6) follows readily. Analogously,
Effdufgdu = J
fiJ^lEW)
J
k
i
J2ckdiMBkHc7)
=
k,i
k,l
= J f(x)gjx~)\(dx) and (7.7) follows. As a consequence
E\] fdvf = j\f{x)\l\{dx). Recall that any / £ L2(S,B,X) functions, in the sense that
(7.8)
can be approximated by a sequence of simple
11/ " Ml = ( / {/(*) - Mx)}2 A(tfa)) * -> 0.
(7.9)
As a consequence, | | / m — / n | | —► 0, and, by (7.6) and (7.7) we have
E\J Udv - j fndu
1/m - /nil2 -»■ 0,
thus proving that {/ fndv} converges in L2{£l,A, P). However, the limit might certainly depend on the sequence {/ n }- To prove that such is not the case, let {/„} and {fll} be two approximating sequences, in the sense of (7.9). Then 11/„ — /„'11 —¥ 0 as n —> oo and therefore 2
E hence {/ fndi/}
/ / > - / / >
= II/: - /:ir ^ o,
converges in L2(Sl, A, P) to a limit which depends only on / .
Definition 7.2.3 Given / € L2(S,B,X), let {/„} be any sequence of simple functions such that ||/ n — / | | —► 0. The stochastic integral of / with respect to an orthogonal random measure v on (S, B) is given by
[fdu=
l.i.m [fndu.
(7.10)
110 CHAPTER
7. SPECTRAL
REPRESENTATION
OF RANDOM
FIELDS
Clearly relations (7.6) and (7.7) can be applied to any pair of simple functions / „ and gn, with ||/ n - f\\ -> 0, \\gn - g\\ -> 0, for f,g € L2(S,B, A). Letting n -> oo we obtain (7.6) and (7.7) for arbitrary f,g & L2(S,B,X). The preceding developments can be summarized in the following statement. T h e o r e m 7.2.1 Let v be an orthogonal random measure on (S,B), with struc ture measure A. The stochastic integral with respect to v is defined for simple functions f in (7.5) and for arbitrary functions f € L2(S, B, A) in (7.10). The correspondence f i-f / fdu is a linear isometry from L2(S, B, P) into L2(il, A, P). E x a m p l e 7.2.1 Let vw be the orthogonal random measure associated with the Wiener measure in example 7.1.1. Then, / ►-+ f fduw is called Wiener's inte gral. In a more customary notation the Wiener integral is denoted by / fdW. Clearly
E\j fdW^ = j\f(t)\2dt.
(7.11)
Moreover, from the definition of the Wiener integral it is clear that
E f fdW = 0. Indeed, the random variable J fdW is Gaussian for each / € Lj[0,1], with zero mean and covariance given by (7.11). Let us now see that the stochastic integral defined above allows us to construct new orthogonal random measures from a given one. For, note that for arbitrary g S L2(S, B, A) and b £ B, the function gxs is in L2(S, B,X) and therefore
J gdv := I XBgdv is well defined. Define Definition 7.2.4
JB
for each B € B.
7.2. STOCHASTIC
111
INTEGRALS
Clearly (7.2.4) defines a function vB : B -4 L2(Q, A, P) and
V fc /
k
Moreover, Eug{B)v„{C)
= E (J xsgdi^j =
(J'xcgd")
\g(x)\2X(dx).
f
JBnC
Define a new set function A9 : B —¥ [0, oo) by \g(B)
\g(x)\2X(dx),
:= f JB
clearly a finite measure on (S,B).
Therefore,
Eug{B)~UgJC) =
Xg{BnC)
and vg is an orthogonal random measure with Xg as a structure measure. The foEowing result relates integration with respect to vg to integration with respect to the original random measure v. P r o p o s i t i o n 7.2.1 If f & L2(S,B,Xg),
then fg £ L2{S,B,X)
I fdug = j
and
jgdv.
PROOF: For simple / , say / = £ * ckXBk,
J idv9 = Ylc* / xugdv = J IY, CkXBkj gdv and the assertion is true. For general / , let {/„} be a sequence of simple functions in L2(S,B,Xg) converging to / . Then / fndvg
= I fngdv,
n = 1,2,... .
Since E\Jfngdu =
- J fmgdv\2
= j \fn(x) - fm(x)\2
J \fn{x)g(x)-fm{x)g{x)\2X{dx)
-*0
the assertion foEows upon taking Emits when n —¥ oo.
Xg(dx)
asn,m->oo, p
112 CHAPTER
7.3
7. SPECTRAL
REPRESENTATION
OF RANDOM
FIELDS
The Spectral Representation
Recall the Karhunen-Loeve representation Z(x) = £
Vn(*)2n
(712)
n=l
for a m.s. continuous random field Z, given in theorem 6.3.2. A nice aspect of (7.12) is that all the local properties of Z are determined by a single sequence of orthogonal random variables {Zn}: it suffices to choose the coefficients appropri ately. Let us proceed now to obtain a family of representations for Z having this nice property. For, replace the orthogonal sequence of random variables {Zn} by an orthogo nal random measure v on (5, B) with structure measure 1 v. Suppose Eu(B) = 0 for each B 6 B. Next, replace the system of eigenfunctions {^n} by a family of Borel functions
W.,0,£€S}. Assume ip(x,.) £ L2(S,B,X) the stochastic integral
for each x £ S. Then replace the series in (7.12) by
j^{x,Z)V{d£)
:=
jrl>{x,.)dv.
By analogy with the Karhunen-Loeve representation, define the random field
Z(x):= Ji>(x,Z)»(d£).
(7.13)
Clearly Z is a centered second order random field. Moreover its covariance func tion is
C(x,y) =
EJ^{x,i)V{di)Ji,{y,i)V{di),
i.e.
C(x,y) = fi,(x,OWT)Hd().
(7.14)
Observe the clear analogy between this representation of the covariance function and Mercer's theorem (6.6). 'The underlying set S does not have to be compact.
7.3. THE SPECTRAL
REPRESENTATION
113
Note from (7.13) that each Z{x) is the limit in mean square of a sequence of linear combination of the form
Thus, necessarily Z(x) belongs to the closed subspace of L2(S,B,X) by the system {+{*,tU e sy.
generated (7.15)
This system is said to be c o m p l e t e in L2(S,B,X) if the closed subspace it gen erates coincides with L2(S,B,X). If such is the case, then every second order random field Z can be expected to be representable as in (7.13), where v is an orthogonal random measure determined by Z. All relations involved being linear, v must be a linear function of Z, i.e. v must be subordinated to Z. Let us formalize the foregoing remarks by proving the following result: L e m m a 7.3.1 Let S C IRd, B = B(S). Let X be a finite positive measure on (S,B), and let if) : S x S —±(D be given, with system (7.15) complete. Let Z be a m.s. continuous random field over S, whose covariance function satisfies (7.14). Then, there is an orthogonal random measure v, subordinated to Z and having X as a structure measure, such that
z{x) = Js^{x,i)v{di), x e s.
(7.16)
PROOF: The construction of v proceeds along the lines of Example 7.1.2, hence it will be merely outlined here. Indeed, consider the set of all linear com binations n
YJckip(xk,i), fc=i
where n > 1, x1,..., xn € S, C j , . . . , c n e G. The closure in L2{S, B, X) of the set of these linear combinations is the whole space. An easy computation shows that E
X>*Z(a
/
5>^(x*,fl
Therefore the linear mapping
J2cklP(xk,t)^Y,CkZ(2 k
k
X{d().
114 CHAPTER
7. SPECTRAL
REPRESENTATION
OF RANDOM
FIELDS
is a linear isometry from a dense subset of L^S, B, A) into Li{Z). It can be extended by continuity to yield a linear isometry $ : L2(S,B,X) —¥ L2(Z) such that Z(x) — $ ( K , . ) for each x £ S. Define
«/(£) = *(xs),
B6B.
The set function v can be shown to be an orthogonal random measure with A as structure measure just as in Example 7.1.2. It is subordinated to Z by construction. Define
W{x):=jsi,{x,Z)v[d£). Then E\Z[x) -
W(x)\2
= E\Z(x)\2
- EZ{x)W(x)
- EW{x)Z(x)
+
E\W(x)\2.
Observe that E\Z(x)\2
= C(x,x)
= I M*,t)?\(dt)
E\W(x)\2
=
and EZ{x)W(x)
= j
i>{x,i)EZ{x)v{dt)
EZ{xM§)
=
EZ{xj*fa)
=
E*MZ,.)J*UB)
where
= f 1,(z,£)\(dt), JB
hence EZ(x)W(x~)
= =
Jj^l)iKx,t)\(d() J\iP(x,t)\2\(dt)
=
Therefore Z(x) = W(x) in L2(Sl, A, P).
E\W(x)\2. a
The preceding lemma can be applied to homogeneous m.s. continuous random fields Z over 2R . For these random fields
C(x,y) = f
e2^'y)^(dC),
7.3. THE SPECTRAL
REPRESENTATION
115
where /x is the spectral measure, by Bochner's theorem. As pointed out in example 7.1.2, the system is complete in L2(IRd). The hypotheses of Lemma 7.3.1 are thus verified, and therefore the following result holds. T h e o r e m 7.3.1 Let Z be a homogeneous m.s. continuous random field over IR , with spectral measure (i. Suppose EZ(x) = 0, x £ iR d . Then, there is an orthogonal random measure v in IR , subordinated to Z and having \i as structure measure, such that Z(x) = f e2lrix^{d(),
x € Md.
(7.17)
In the above theorem,
E\u{dt)\* = tfdt) and, assuming there is a spectral density g, it follows that E\v(dt)\2
=
g(t)dt
in the notation of (7.1). On the other hand, it is instructive also to look into the case of discrete A in Lemma 7.3.1. Suppose A is concentrated in a countable set {£"} in S, with positive masses {A n }. Since A(S) = EJJLj ^n < + ° ° j it follows that An —> 0 as n —> oo. Then (7.14) transforms into
n=l
strongly reminiscent of (6.6). Consequently, v satisfies Ev{B)U{C)
= £{Afc
:(k£BnC},
and thus
E\»(B)\2 = J2{\k-.eeB}, so that u(B) = 0 if B contains no element of {£"}. In other words, v is concen trated in {£"}, with weights {u((n)}. Observe that
EW({C})\2 Eu({t}HUn})
= K, = 0 if m / n ,
116 CHAPTER hence the v({£n})
7. SPECTRAL
REPRESENTATION
OF RANDOM
FIELDS
=■ Zn are orthogonal. Thus, (7.16) yields
Z(x) = -£i;(x,r)Zn, n=l
which reminds us of the Karhunen-Loeve expansions (6.7). Indeed, the above situation obtains if S is compact, as established in Theorem 6.3.2: {A„} is the spectrum of the integral operator (7.17) and the {ip{x,£n)} are the corresponding eigenfunctions, which constitute a complete orthonormal sys tem. Moreover, the series expansion of C(x, y) given above is Mercer's theorem.
Chapter 8 Sampling and Modeling Random Fields Recall the representation formula
Z{x) = j e2"i(*u{d£)
(8.1)
(see Theorem 7.3.1) for a given homogeneous random field Z : IRd —> Zy2(Q, A, P), with EZ(x) = 0, x £ IRd (hence Ev(B) = 0 , B e B). This important formula furnishes the necessary tools for the simulation of Z, i.e. for computing values of each Z(x). Indeed, if are N independent samples of u, then it suffices to evaluate the N integrals
yv^-VK), k = i,...,N, to have a sample of length N from Z(x). In practical terms, the problem is solved if we know how to a) sample the orthogonal random measure u, and b) evaluate Fourier transforms efficiently. Sections 1 and 2 below will be devoted to these two tasks, respectively. Alternatively, consider the Karhunen-Loeve expansion Z{x) = £
ZJinix),
n=l
117
xeK,
(8.2)
118
CHAPTER
8. SAMPLING
AND MODELING
RANDOM
FIELDS
where K C Md is compact (see Theorem 6.3.2). Again, this last formula furnishes an algorithm for simulating Z. It suffices to devise efficient methods to a) sample the orthogonal sequence {Zn}, b) compute the eigenfunctions {V"n}, together with the associated eigenvalues {A„}, and c) sum the series for Z(x). These items are touched upon in section 8.3 below. Finally, both representation (8.1) and (8.2) are obtained from the second order properties of the random field Z. In practice, these properties must result from actual measurements, i.e. from sampling Z at sufficiently many points. Indeed, the required expectations can then be estimated by involving the Law of Large Numbers. Refer to section 8.4 below for these questions.
8.1
Sampling Orthogonal R a n d o m Measures
Let B stand for the family of Borel subsets of St , and let A be a finite positive measure on B. Consider an orthogonal random measure v with A as structure measure, in the sense of Definition 7.1.2. Suppose Eu(B) = 0, for all B S B. An important particular instance of orthogonal random measure obtains in the d i s c r e t e case described as follows. Let {£ x , £ 2 , . . . , } be an at most countable subset of ]Rd, and let {A lt A 2 , . . . } be positive weights, with J2k^k < +oo. Suppose A is discrete, concentrated at {£\Z2,-}, with weights {A 1 ,A 2 ,...} , i.e. * ( « * } ) = A», fc = l , 2 , . . . . Then
E\u(B)\2 =
J2{^-XkeB},
which implies that v is concentrated on {^ 1 ,^ 2 ,-..}, too. Define the random variables Zh •■=»[{?}),
*: = 1 , 2 , . . .
Then BZn EZnZm
= 0, n = l , 2 , . . . , = SnmXm, m,n — 1 . 2 , . . . .
8.1.
SAMPLING
ORTHOGONAL
RANDOM
MEASURES
119
To sample the orthogonal sequence {Zn} one may proceed as follows: Let itj, vi, 112, v2,... be a sequence of independent samples from a real random variable with mean 0 and variance 1/2. Then * * : = \f\k(uk
+ ivk),
A; = 1 , 2 , . . . ,
(8.3)
defines a sequence of independent samples from a complex random variable with mean 0 and variance A*., fc = 1,2,.... In particular, they are a sample from the orthogonal sequence {Zn} constructed above. Thus 00
yields a sample from Z(x), as follows from (8.1). If the representation (8.2) holds instead, then a sample from Z(x) is given by oo
Y^ zki>k(x). k=l
A second, equally important particular case obtains when A has a density g € Li(IRd) (the continuous case) i.e. when a spectral density exists. Then
Eu(B)^C) = I
g(()dt
JBnC
and in fact
EW&)\* = g{t)dt. Let us deal with the continuous case in the best Calculus tradition, i.e. by discretizing. For, let Qo,Qi, ■ ■ ■ ,QN be a finite partition of Md, like in figure 8.1, and suppose each Qk has finite Lebesgue measure |Q*|, for k = 1, 2 , . . . , N. Then,
k=ijQt
can be used as an approximation to Z(x). approximated by the finite sum
In turn, this last integral can be
fy^v(<w, k=l
120
CHAPTER
8. SAMPLING
AND MODELING
RANDOM
FIELDS
Figure 8.1: Partition of IR*. where £* g Qh, k = 1 , . . ., JV. Here
and this last integral can be approximated by the product g{£k)\Qk\ ='■ ^kLetting Zk:=u{Qk), k = l,2,...,N, Zk constitutes a finite sequence of orthogonal random variables with 0 mean and variance Ajt, k = 1,2,..., N, which we know how to sample. Indeed *k~Tjgtfh)\Qh\(ui.+ivk),
k=l,...,N
(8.4)
gives one such sample, where ui,vx,... ,uN,vN are 2N independent samples from a real random variable having mean 0 and variance 1/2. The sum £>***-*
(8.5)
fc=i
gives an appropriate sample from Z(x). Now, if w is a random number generated on a computer, it is a sample from the uniform distribution on [0,1], which has mean 1/2 and variance 1/12. Therefore
'H-zJ^w-i)
(8.6)
8.2. FAST FOURIER
COMPUTATIONS
121
yields a sample from a real random variable having mean 0 and variance 1/2. If *"ii • •• ,fV2N result from 2N calls to the random number generator, the transfor mation uk := a(w2k),vk := o-(w2k-i), k = 1,..., N yields the required random variates. Thus, computing one sample from Z(x) requires a) generating 2N random numbers 1 , b) transforming each of them as in (8.6), c) computing the coefficients zk as in (8.4), d) computing the sum in (8.5). To guarantee a modicum of precision, this must be done for large N (the larger the better). Moreover, if Z(x) is being sampled in order to compute some statistic using the Law of Large Numbers, the whole samphng procedure should better be repeated a "large" number of times (and, again, the larger the better). To top it all, one may be interested in samphng at many points. Thus, samphng random fields using the technique described above may be prohibitively expensive. Indeed, computing the sum in (8.5) requires N complex multiplications and N — 1 complex additions. If N values of x are to be sampled, the total of complex operations is 0(N2), which indeed grows very fast with N. Following [17], let us achieve computational efficiency in step d) above by using the Fast Fourier Transform, to be briefly described below. As to steps a), b) and c), not much can be done to render them more efficient, but worst alternatives could certainly be proposed.
8.2
Fast Fourier Computations
Let us address the problem of computing sums of the form (8.5) in an efficient way. Let us begin by renumbering the points {£*} and the complex numbers {zk}, assuming that the basic domain is the parallelepiped Qm ■=
[-mh,+mh]d,
1 This figure decreases to N when real random fields are to be generated, because of the symmetry relations (8.11).
122
CHAPTER
8. SAMPLING
AND MODELING
RANDOM
FIELDS
where h > 0 is given and m is "large". The following notation is convenient: Define a d-dimensional multi-index k := (ki, ...,fe
h kJd
if both k and I are multi-indices. Define the height \k\ of a multi-index k as
1*1 ~|feil + - -- + NDivide each of the factor intervals of Qm according to the mesh —mh, — (TO — l)h,..
., —h, 0, h,...,
(m — l)h, mh
and consider the product mesh in Qm, consisting of all points of the form hk, where A: is a multi-index and \k\ < TO. Assume the £*'s are precisely these mesh points, i.e. (k = hk, \k\ < m. (8.7) The sums to be evaluated are of the form 2^ e
zk,
x G In .
\k\<m
To achieve computational efficiency, restrict x to be of the form x' := SI,
\l\ < m,
(8.8)
where SH = ± .
^
(8.9)
This last equation can be termed as uncertainty principle, since it relates un certainties in both space (x) and wave number (£): they cannot both be reduced independently. However, the larger TO, the smaller the combined uncertainty. With these choices of mesh points the sums to be evaluated take the form
E e2,ri£U,
\l\<m.
\k\<m
Define
id
o m := em,
m = 1,2....
8.2.
FAST FOURIER
COMPUTATIONS
123
the main m-th root of unity, and consider the linear transformation aklzk,
wi = E
\l\ < m,
(8.10)
|Jfc|<m
where the subindex in a m has been suppressed in order to simplify notation. If (8.10) is to be used for the evaluation of (8.5), while modelling a real random field Z, then u>i must be real, for |/| < m. Therefore a kl
wi ~ E
ak-'-zZ^ = io,,
~ ~Zk = E
|fc|<m
|fc|<m
i.e. o.k\zZ-k - zh) = 0,
E
|/| < m,
\k\<m
hence Z-k = zjc, \k\<m.
(8.11)
Let us restrict ourselves to the case d — 1 in what follows. Suppose the 2m + 1 dimensional vector ( z _ m , . . . , z m ) is periodic with period 2m, i.e. Z—m
=
%m-
Since a~ m = a m , it follows that
W_m = £ a'"**!* = E " " ^ = w™ k
k
and the resulting 2m + 1 dimensional vector ( w _ m , . . . ,wm) will also be period ic, with period 2m. Redefine accordingly the basic transformation (8.10) and consider m-l
wi=
E
<£**.
/ = -»»,...,0,...,m-l
instead. L e m m a 8.2.1 If m is even, then 2
E
2
2
kl
*=-? '
I
x
\—*
*=-f
kl
(8.12)
124
CHAPTER
8. SAMPLING
AND MODELING
RANDOM
FIELDS
PROOF: Split the elements of 2—771) 2—771+1 j ■ ■ ■ j ^ o , . . . , Zrn—1
into the "even" and "odd" parts 2-771) 2-771 + 2! • ■ • 7
*0t •■•|*B»—2
2-m+li • • • i2 - l i
2i,...,Zm_l
and rewrite (8.12) accordingly, i.e.
vi = E a f r ^ - T ^ - + E oir*+*-1,,*-»+w-i. 771 — 1
771
i=o
i=i
Letting j = y + fc in the first sum, j = y + k + 1 in the second sum, we obtain
«*=
f+i E
a
f+i m'22*+at„ E a -'22fc+l.
The assertion follows upon realizing that a m = a™.
□ The foregoing lemma is of fundamental importance in rendering efficient the computation of sums like that in (8.12). Indeed, it implies that, for a given sequence of length 2m, such sum can be reduced to evaluating two sums, each of length TO = 2(TO/2). A tremendous simplification is obtained if m is a power of two, say m = 2", for some n > 1, because in such a case the process can be continued iteratively until sequences of very short length have to be considered, say rra = 2 in (8.12). In particular, for TO = 2 this equation reduces to U;_2
=
2 _ 2 + 2 - 1 + 2 0 + 2X =
W_i
=
2 _ 2 — 2 _ i + 2 0 - 2l = W i ,
W0,
which can be carried out with virtually zero computational cost.
8.3.
SIMULATION
AND ORTHOGONAL
EXPANSIONS
125
The main overhead associated with this recursive computation (known as a Fast Fourier Transform, briefly F F T ) concerns the bookkeeping required in order to keep trace of the various stages in which the sequence length is pro gressively halved. Otherwise, the arithmetic effort is reduced to one complex multiplication at every stage. For a computational implementation of this algo rithm, see [29]. In the notation of (8.5), N = 2m, with m = 2 n and the number of multipli cations is nm. In other words, the number of complex multiplications has been reduced from 0(N2) to 0(Nlog2N). See [18] for a formal proof of this fact, and for more general F F T ' s and applications thereof. In dimension d > 1, the computations can be arranged recursively, as in
«i, u =
E ^
"
^
*,
fci ,...,fc(j m-l
with zu
—
kd
—
z
V 2—1 kx
a
fl*i'i+~+'w-i',i-i2.
Z
, *l.
,
-Md-iMi-
l...,k
Thus, implementing a multidimensional F F T is achieved by combining a) a one dimensional F F T , with b) a suitable bookkeeping strategy. This last item is necessary because of the large arrays generated while apply ing the above iterative procedure. Again, see [29] for an implementation of a multidimensional F F T .
8.3
Simulation and Orthogonal Expansions
Let us consider now the simulation of a (non homogeneous) random field over a compact set K C IRd, which can be based upon the Karhunen-Loeve representa tion (8.2). The question of sampling the orthogonal sequence {Zk} has been addressed in section 8.1. Let us now concentrate on the two remaining issues connected with the practical application of (8.2), namely:
126
CHAPTER
8. SAMPLING
AND MODELING
RANDOM
FIELDS
a) computing the eigenfunctions {V>fc} and eigenvalues {A*} of the integral oper ator A in (7.17) and b) summing the series for z[x) in oo
where {z*.} is as in (8.3). Clearly this last series can only very seldom (if ever) be computed exactly. In fact, in practice only a few of the eigenfunctions will be available, because solving eigenvalue problems Aj,n = \n4,n, n = l,2,... (8.13) is not easy. Therefore, the question of "summation" boils down to "accelerating convergence" of the sequence of partial sums £«tM*),n=l,2,...
(8-14)
k=i
for each given value of x £ K. Following [4] let us consider a numerical se quence { a n } , which we know converges to a. The idea is to convert it into a second sequence { a ^ } which also converges to a, but does it faster. One such transformation is Shanks', defined by 2
a„ =
^—,n = 2,3,-..
(8.15)
ln+1 + O n _i — lan
Clearly this transformation can be iterated, giving rise to derived sequences { ° i }i { a n }' which converge even faster. The Shanks transformation (8.15) is known to often result in a substantial improvement in convergence speed, see [4] for examples. On the practical side however, loss of precision may occur because of roundoff errors in the denominator of (8.15): if a„ is "close" to a then a n + 1 + o n _i — 2a n may be close to 0. Hence the use of high precision arithmetic is mandatory. An alternative to Shanks transformation is R i c h a r d s o n e x t r a p o l a t i o n , which consists in applying the transformation a{n1] = t(-l)jan:j{n~y,
"
n = 2,3,....
(8.16)
8.3.
SIMULATION
AND ORTHOGONAL
EXPANSIONS
127
As v and n increase, a^ converges to a faster than a„. Again, see [4] for further details. Let us now consider the solution of the eigenproblem (8.13). Let us recall the variational characterization of the spectrum of a compact non-negative operator, like A, [14], p. 122: An = min m a x i j ^ f ^ ) , (817) dimS=n— 1 (f>±S
where RA : L2(K) namely
— {0} —> IR is the Rayleigh quotient associated with A,
XAW
= ^ #
(8-18)
\\
with ||.| = (., . ) 2 and (4>, ip) = JK cj>(x)ip(x)dx. In particular, A! =maxRA
U)
(8.19)
and it is known that An =
max
RA(4>),
n > 1.
(8.20)
*±V J -,j
Following [35], the R i t z m e t h o d consists of selecting a linearly indepen dent set >i,... ,4>N 6 L2(K) and performing the required maximization on the subspace Sp/ generated by this set, i.e. minimizing on
Sw:= {££*&:&.•••>& E H An easy computation shows that
fl (£&&)=-Mf), f e ^ where
*»«> = r f s and the matrices ATJV = {hi) and Mjv = (my) have the elements kij = (A4>j,4>i),
i,j =
rnij = {j,4>i) i
l,...,N
i , i = 1 , . - . ,iV
(8 21)
-
128
CHAPTER 8. SAMPLING AND MODELING RANDOM FIELDS
respectively. Note that both KN and MN are hermitian matrices, as follows directly from the self-adjoint character of A. They are also positive, and MN is in fact positive definite, 2 due to the linear independence of <j>i,..., 4>N- Thus, the discrete Rayleigh quotient RNH) is well denned in (8.21) and is a non-negative real number for each nonzero £ 6 <E . Letting
\hN
= maxfi N (fl
it follows that Allff
N>1.
On the other hand, minimizing the discrete Rayleigh quotient is readily seen to be equivalent to solving the algebraic eigenvalue problem KNt
= AMJV£
(8.22)
for the largest eigenvalue A^jy Let £' £ (D be an approximation to an eigenvector corresponding to A^jv for (8.22). As to the remaining eigenvalues, they can be approximated by resorting to (8.17), namely by solving the problem minmax.Rjv(£), where a) "max" stands for maximizing over the set defined by the equalities
7 7 i -e = o , . . . , 7 7 " - i ^ = o, for a given linearly independent set r}1,. .. , T?" - 1 EW . b) "min" stands for minimizing the above maxima with respect to all choices of Note however that minmaxiiflr(f) <
max RN(£) —: A„ w, f*-£=0,fc=l,...,n-l
where (,,.-■ , £n-i &® constitute approximations to eigenvectors corresponding to the previously computed eigenvalues X1:N,..., A n _ liJV , see (8.20). Thus, Xn^ 2 KN is called the stiffness matrix and MN is referred to as the mass matrix in the current finite element literature, see [35].
8.3.
SIMULATION
AND ORTHOGONAL
EXPANSIONS
approximates XnjN from above. If (i,...,£"_1 An,JV =
129
are the exact eigenvectors, then
^n,N-
The optimization problems to be solved for computing the approximate spec trum and the corresponding eigenvectors
are a) iaax.(^Q Rff(():
this yields both A^jy and £',
b) max.Rw(£), where the maximization is subject to f • £ = 0 , . . . , £ n _ 1 ■ £ = 0: this yields both A,,^ and £ n . These problems can be conveniently solved on a computer by means of the con j u g a t e gradient algorithm, see [19]. As an alternative, the eigenvalue problem (8.22) could be solved directly using numerical linear algebraic techniques, as suggested for instance in section 6.4 of [35]. Be it as it may, the approximate eigenfunctions are 4N)
= (I4>I + --- + ^UN,
k=l,...,N
and these are plugged into the partial sums (8.14) to perform the actual compu tations. Clearly the choice of basis functions <j>i,..., (j>n is crucial in the application of the Ritz method. From the point of view of convergence as N —¥ oo, things improve if ^ > j , . . . , tpp! can indeed be well approximated by elements of 5jv- From a practical viewpoint, the matrices MN and KN must be easily computable. The finite e l e m e n t technique for choosing a basis works along the following lines (see [35] for further information): Discretize K into a finite partition Qi,..., QN of parallelepipeds of diameter h. For each k, pick xk g Qk D K and let a neighborhood 14 of xh be given, where 14 consists of Qk plus "a few" other adjacent parallelepipeds. Finally, for each k pick 4>k £ L2(K) which vanishes outside 14- Then (A4>j,<j>i)
=
/
C(x,y)j{x)<j>i(x)dxdy,
JVjC\Vi
(jAi) =
/ JVjHVi
<j>j{x)4>i(x)dxdy.
130
CHAPTER
8. SAMPLING
AND MODELING
RANDOM
FIELDS
The advantage of this kind of basis functions is that Vj ("I Vi = 0 for "a large number" of pairs [i,j), hence Mn will be sparse (actually banded if the numbering is right).
8.4
Mean and Covariance Estimation
Let us limit ourselves to homogeneous random fields Z, defined on the whole space IRd. Let m S (E be its correlation function, i.e. m
=
EZ(x),
R{u)
=
EZ{x +
u)Z(x).
The record available for the estimation of these two quantities will be assumed to be {Z(x),x £ B}, where B is a Borel subset of M . Moreover, increasing information patterns will be represented by monotonically increasing families of such sets B, in such a way that B t IRd■ To fix notation, suppose B
2'
+
2
for T > 0. The volume of such B is \B\ = Td. Common estimators of m and R(u) are given by the spatial average
mT =
' ]F\Lz{x)dx'
(8.23)
and the spatial correlation
Rr(v)
Cr-ltuD-cr-im!
/
Z(x + u)Z{x)dx,
JB(U)
respectively. Here B(u) := Ii X ■ ■ ■ x Id, with , if uk < 0, T
, T
, if uk > 0.
(8.24)
8.4. MEAN AND COVARIANCE
131
ESTIMATION
Figure 8.2: The coordinate transformation changes a square into a rhombus. h — 1 , . . . , d. If there is a spectral density it can be estimated by
9T(0 = Je-2"i(»RT(u)du, the required computations being trusted to an F F T implementation. Observe that all these estimators are random variables. Taking expectation in (8.23), we obtain EmT
= —
/ EZ(x)dx
= m,
hence m? is unbiased. Analogously, ERT{u)
= ——— [ EZ(x + u)Z(x)dx \B{uj\ JB
= R{u),
hence RT is unbiased, too. Consequently, so is QT- Let us now address the question of c o n s i s t e n c y of these estimates. For instance, from (8.23) E\mT
-m\2
= -— I
I R(x -
y)dxdy.
L e m m a 8.4.1 If d — 1, then
±JJBR(x-y)dxdy
=
±J_+TTR(u)(l-^)du.
PROOF: Introduce the coordinate transformation u = x -y,
v = y
132
CHAPTER
8. SAMPLING
AND MODELING
RANDOM
FIELDS
and observe that the rectangle [—T, + T ] 2 is transformed into the rhombus in (u,v) space described by (see Figure 8.2) f
T
2
T
T
T}
{(«.«) e m .-^
v < +-}.
Change the order of integration to obtain
JlllR^-yy>dxdy
{Lll-u+JoJ^u)R{u]dvdu
=
= fTR(u)(T-\u\)du, hence the result.
□
In general, Z?|mr — m| is given by yw ( / V ''" / _ / ) =
j^d \J_T
R Xl
(
-yu--,xd-
Vd)dxx . . . dxddyi
...dyd
R x
{ i-yu--,xd-yd)dx1dy1...dxddyd
J_T j
1 f+f /•+? TpiJ_T
i_T R(xd -
Vd)dxddyd,
where 1
/ /•+¥ y+fN^ - 1
Proceeding recursively, the previous lemma can be applied d times, giving E\mT~m\2 +T r+T
1 r Yd, j _ T
(8.25) r+T
■■■J_T
d
/
I
i\
R{u1,...,ud)'[[h-l-1^\du1...dud.
Note that the integrand, on the right-hand side is bounded by |-ft(«i, ■ ■ • >ud)\ on [-T, +T] . Assuming R is integrable, obtain by Dominated Convergence that YimjrdE\mT
- m\2 = j R(u)du = ||i?||,
thus estabhshing the following result, from which consistency in mean square follows:
8.4.
MEAN AND COVARIANCE
ESTIMATION
P r o p o s i t i o n 8.4.1 If R £ Lx{IRd),
133
then
£|mr-m|2 = M i
+ 0(l),
asT^oo.
If R £ L1{IRd), then Boch ner's theorem 6.4.2 can still be applied in order to obtain necessary and sufficient conditions for consistency. Indeed, substitute
R(u) = I e^'X^) into (8.25) to obtain E\mT
- m| 2
d
r
=
r+T
y n j_
P2*^*'"1 / T
—f—
\UL\\ l
[I ~
-YJ
du^(dO-
An elementary computation shows that
TJ-Te
[
1
- Y )
d u
= \ 2 ^ ^ p ,
if^O.
Define the new measure p, by setting MB) = fi(B - {0}) for each Borel set B C IR ■ For each k = l,...,d hyperplane Hk := {i e lRd : 6 = 0}
consider the coordinate
and let
#:=ur=1#fc. Then £ | m r - m| 2 =
M ({0})
+ / ^ ^
/ T ( f Jig,
where fa can be described as follows: outside H, / r ( £ ) coincides with the product d
ii
l-cos(27r&r)
(2-6T)2 ■
On the hyperplanes i/*., it is a product of a smaller number of factors of the same type. Therefore,
\M0\ < # .
134
CHAPTER
where g £ Li(IRd), get
8. SAMPLING
AND MODELING
RANDOM
FIELDS
and v > 1. Apply the Dominated Convergence Theorem to
lim
E\mT-m\2=fM({0}),
T-yoo
thus proving the following characterization of m.s. consistency: T h e o r e m 8.4.1 Let Z be a homogeneous random field with spatial measure [t. Then, E\mr — m\ —¥ 0 as T —¥ oo if and only » / M ( { 0 } ) = 0. Observe that consistency ofTOTmeans that (for d = 1)
1 f+f
T'4» J,jJ
Z{r)dr = m = EZ{x),
i.e. time averages converge to the sample average. This property is commonly called ergodicity, and clearly lies at the root of modelling homogeneous random fields. Consistency of the estimator (8.24) is investigated (for d = 1, Gaussian case) in [3].
Bibliography [1] T . M . APOSTOL, Mathematical 1965. [2] R. G. BARTLE, The Elements York, 1964.
Analysis,
Addison Wesley, Reading MA,
of Real Analysis, John Wiley k Sons, New
[3] J. S. BENDAT AND A. G. PlERSOL, Random Data: Analysis and Measure ment Procedures, John Wiley and Sons, New York, 1986. [4] C. M. BENDER AND S. A. ORSZAG, Advanced Mathematical Scientists and Engineers, McGraw Hill, New York, 1978.
Methods for
[5] A. BLANC-LAPIERRE AND R. FORTET, Theorie des Fonctions Aleatoires, Masson et Cie. Eds., Paris, 1953. English version published as Theory of Random Functions, Gordon and Breach, New York, 1968. [6] H. CRAMER, Mathematical Press, Princeton, 1947.
Methods
of Statistics,
Princeton University
[7] G. DAGAN, Flow and Transport in Porous Media, Springer, New York, 1990. [8] B . DE FlNETTl, Teoria della Probability (2 vol.), Einaudi, Torino, 1970. Eng lish translation published as Theory of Probability: A Critical Introductory Treatment (2 vols.), (Wiley Classics Library) J. Wiley, New York, 1990. [9] H. DYM AND H. P . McKEAN, Fourier Series and Integrals, Academic Press, New York, 1972. [10] W . FELLER, An Introduction to Probability Theory and its vol. 1, John Wiley and Sons, New York, 1957. 135
Applications,
136
BIBLIOGRAPHY
[11]
, An Introduction to Probability Theory and its Applications, vol. 2, John Wiley and Sons, New York, 1966.
[12] I. M. GEL'FAND AND N. Y. VlLENKlN, Generalized Functions, vol. IV, Academic Press, New York, 1964. Translation from the 1961 Russian original. [13] I. I. GlHMAN AND A. V. SKOROHOD, The Theory of Stochastic vol. 1, Springer Verlag, Berlin, 1974. [14] I. GOHBERG AND S. GOLDBERG, Basic Boston, 1971.
Operator Theory,
Processes, Birkhauser,
[15] R. R. G O L D B E R G , Fourier Transforms, Cambridge University Press, Cam bridge, 1962. [16] G. G O L U B AND C. F . VAN LOAN, Matrix Computation, The Johns Hopkins University Press, Baltimore, second ed., 1989. [17] A. L. GUTJAHR, Fast Fourier transform for random field generation, project report for Los Alamos grant, Department of Mathematics, New Mexico Tech., Socorro, NM, 1989. [18] P . HENRICI, Fast Fourier methods in computational complex analysis, SIAM Review, 21 (1975), pp. 481-527. [19] M. HESTENES, Conjugate Direction Methods in Optimization, lag, New York, 1980. [20] H. HOCHSTADT, Special Functions of Mathematical Wilson, New York, 1961. [21] Y. KATZNELSON, An Introduction tions Inc., New York, 1976.
Springer Ver
Physics, Reinhart and
to Harmonic Analysis, Dover Publica
[22] D. A. KNUTH, The Art of Scientific Programming, vol. 2: Seminumerical Algorithms, Addison Wesley, Reading MA, 1973. [23] A. N. KOLMOGOROV, Grundbegriffe der Wahrscheinleichskeitstheorie, Springer, Berlin, 1933. English translation published as Foundations of Prob ability, Chelsea, New York, 1956. [24] J. LAMPERTI, Probability, Benjamin Co., New York, 1966.
BIBLIOGRAPHY
137
[25] P . S. LAPLACE, Essai philosophique sur les probabilites, Gauthier-Villars, Paris, 1921. Reprinted from the original edition of 1795. [26] B . W . LlNDGREN, Statistical Theory, The Macmillan Co., New York, 1962. [27] S. MEIER AND W . KELLER, Geo Statistik, Springer Verlag, Wien, 1990. [28] A. PAPOULIS, Probability, Random McGraw-Hill, New York, 1965. [29]
W.
H.
P R E S S , B.
P.
Variables and Stochastic
F L A N N E R Y , S. A. T E U K O L S K Y , AND W .
TERLING, Numerical Recipes: The Art of Scientific University Press, Cambridge, 1988.
Processes,
T.
VET-
Computing, Cambridge
[30] H. L. ROYDEN, Real Analysis, The Macmillan Co., New York, 1968. [31] W . RUDIN, Real and Complex Analysis, McGraw Hill, New York, 1966. [32] Y. A. SHREIDER, The Monte Carlo Method, Pergamon Press, London, 1966. [33] E . M. STEIN, Harmonic analysis on Mn, in Studies in Mathematics, J. M. Ash, ed., vol. 13, The Mathematical Association of America, Providence, RI, 1976. [34] E . M . STEIN AND G. WEISS, Introduction to Fourier Analysis on Euclidean Spaces, Princeton University Press, Princeton, 1971. [35] G. STRANG AND G. J. F i x , An Analysis of the Finite Element Prentice Hall, Englewood Cliffs NJ, 1973. [36] A. A. SVESHNIKOV, Applied Methods of the Theory of Random Pergamon Press, Oxford, 1966. [37] S. YAKOWITZ, Computational Reading MA, 1978.
Probability and Simulation,
Method,
Functions,
Addison Wesley,
[38] A. H. ZEMANIAN, Distribution Theory and Transform Analysis, Hill Book Co., New York, 1965.
McGraw
This page is intentionally left blank
Appendix A The Sources Basic Random Function Theory is amply covered in [13], and also in [5] and [36], with a more applied twist in [28]. For the basics of Probability and Statistics, refer to [10], [11], [24] and [6], also [28]. Basic Analysis at an intermediate level can be found for instance in [1]. More specifically, the measure theoretic aspects receive ample coverage in [30], and [31], and [14] is our source for Hilbert Space and the relevant Operator Theory. In particular, our presentation of Fourier Analysis is just a rendering of (parts of) the first chapter of [34]. The LaplaceStieltjes transform has not been discussed in the text, but it can be found in [21]. For a more accessible presentation of the basic Fourier analytic concepts, see [9]. The basic theory and its applications to the Earth Sciences are covered in the German volume [27]. Applications to classical Signal Analysis are hinted at in [28], and [7] applies the material in these lecture notes to problems of water flow in porous media. Measurement and estimation issues are discussed in [3], also in the last part of [36]. Monte Carlo simulation is discussed in [32] and [37], the main source for random number generation being the algorithms in [22]. The report [17] contains an efficient implementation of an algorithm for sampling homogeneous random fields based on their spectral representation. The Fast Fourier Transform is presented in [29], including an efficient computer implementation. Sources for the approximation techniques required for simulating random fields from their Karhunen-Loeve representation are: [4] for summation techniques, [35] for the computation of eigenfunctions and eigenvalues, and [16] for the associated linear algebraic problems.
139
This page is intentionally left blank
Index space, 47-48 Chebyshev's inequality, 38 Cholesky factorization, 29 common refinement, 100 complete system, 104 completeness
— A — adjoint of a linear operator, 72 affine transformations, 66
of Z,2, 52 conditional expectation, 54-55 probability, 54 conjugate gradient algorithm, 120 consistency, 122 Kolmogorov conditions, 22 continuous mean square random field, 84 at a point, 84 continuous, random variable, 11 control variate, 42 convergence
— B — bilinearity, 48 Bochner theorem, 93 Borel set, 56 Borel subsets, 7 Brownian motion, standard, 83 Bunyakovsky-Cauchy-Schwarz inequality, 50
—c— Cauchy-Schwarz inequality, 50 Central Limit Theorem, 27, 30, 64 change of variables theorem, 28 characteristic function, 61 Chebyshev polynomials, 50
in L2, 51 in quadratic mean, 51 convolution, 30, 69 correlation function of a random field, 78
141
INDEX
142 covariance, 23 cross, 79 function of a random field, 77 cumulative distribution function, 8
event, 1, 4 certain, 3 elementary, 2 events incompatible, 3 independent, 23
— D — density discrete, 11 exponential, 11 spectral, 93 distance properties of, 51 distribution empirical, 21 function, 8 cumulative, 8 empirical cumulative, 9 marginal, 21 standard Gaussian, 11 standard multidimensional Gaussian, 22 Dominated Convergence Theorem, 62, 125
— E — elementary event, 2 empirical cumulative distribution function, 9 distribution, 21 equiprobabilistic approach, 3 ergodicity, 125
— F — Fast Fourier computations, 113 FFT, 116-117, 122 filtering problem, 57 finite element technique, 121 Fourier Transform, 65 Fast, 116 Fourier transform, 72-73 frequency approach, 2 function of positive type, 79
— G — gambler, 3 Gaussian distribution standard, 11 standard multidimensional, 22 generalized derivative of a correlation function, 89 Gram-Schmidt orthogonalization procedure, 49 Gram-Schmidt procedure, 58
INDEX
143
— H — height of a multi-index, 114 Hermite polynomials, 50 space, 46, 48 Hermitian character, 48 homogeneous random field, 78
— I — identically distributed components of a sample, 25 importance sampling, 42 incompatible events, 3 independent components, 24 components of a sample, 25 events, 23 random variables, 23 inner product, 48 integrable function, 64 integral stochastic, 101 Wiener's, 102
— K — Karhunen-Loeve Theorem, 87 Kolmogorov consistency conditions, 22 Kolmogorov-Smirnov test, 10
— L — Laguerre polynomials, 50 space, 46, 48 Laplacian approach, 3 Law of Large Numbers, 110 Strong, 26 Weak, 39 Legendre polynomials, 50 space, 46, 48 lemma Riemann-Lebesgue's, 67 level of rejection, 10 linear estimate best, 58
— M — marginal distribution, 21 Marsaglia method, 32 mass matrix, 119 mean of a random field, 77 of a sample, 25 mean square continuous random field, 84 at a point, 84 differentiable random field, 88 measure orthogonal random, 96 spectral, 92 structure, 96
144
INDEX
method Gram-Schmidt, 49 Monte Carlo, 35 metric, 51 modelling problem, 21 Monte Carlo, 36-37 estimate, 35 method, 35 multi-index, 113 height of a, 114
— N — Noise white, 93 norm, 50
— P — partition, 47 Plancherel Theorem, 69, 71 polar method, 32 positive non degeneracy, 48 prediction problem, 57 probability, 1, 3 conditional, 23 density function, 11 space, 4 problem filtering, 57 prediction, 57 smoothing, 57 process Wiener's, 83
— o— observation, 1 optimal filter, 57 predictor, 57 smoother, 57 orthogonal projection theorem, 52 random measure, 96 continuous, 111 discrete, 110 random variables, 48 Orthogonal Projection Theorem, 58 orthogonal random measure, 110 outcome, 1
— R — random measure, 95 subordinated orthogonal, 97 numbers, 32 variable, 7 square summable, 16 variables independent, 23 vector, 20 random field centered, 77 correlation function of a, 78 Gaussian, 76 homogeneous, 78
INDEX
145
mean square continuous, 84 at a point, 84 second order, 75 weakly stationary, 78 random measure orthogonal, 110 orthogonal discrete, 110 Rayleigh quotient, 118-119 refinement common, 100 rejection level, 10 relative frequency, 2 Richardson extrapolation, 118 Riemann-Lebesgue's Lemma, 67 rigid motions, 65 Ritz method, 119
—s— sample, 25 space, 1 sample mean, 25 sampling technique, 31 Schwarz inequality, 50 second order theory, 77 Shanks' transformation, 118 cr-algebra, 4
Hilbert, 52 Laguerre's, 46, 48 Legendre's, 46, 48 probability, 4 spatial average, 121 correlation, 122 spectral density, 93 measure, 92 square integrable function, 64 summable random variable, 16 stiffness matrix, 119 stochastic integral, 99, 101 process, 57 second order, 75 weakly stationary, 78 structure measure, 96 subjectivist approach, 3 subordinated orthogonal random measure subspace closed, 52
— T — Theorem Bochner's, 93 Central Limit, 27, 30, 64 Change of Variable, 66
INDEX
146 Dominated Convergence, 62, 125 Karhunen-Loeve's, 87 Orthogonal Projection, 52, 58 Plancherel's, 69, 71 Weierstrass' Approximation, 40 time average, 2 triangle inequality, 50
—u— uncertainty principle, 114 uniform distribution, 17 unitary operator, 73
— V — variance, 17
—w — weakly stationary random field, 78 stochastic process, 78 Weierstrass' Approximation Theorem, 40 White noise, 93 Wiener integral, 102 Wiener process, 83