This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
)). +E';
(3.50) D
4
Nonconstant Diffusion Coefficient and the Complex Measure Conditions
In this section we explore the complex measure condition on the lower order coefficients when the highest order diffusion coefficient is not necessarily constant. Since the results are somewhat exploratory, we restrict to one dimension and consider the Cauchy problem
au at = Lu + g,
u(O, x) = uo(x)
(4.51)
for
Lf(x) = (a(x)f(x))xx Letting A(~) = E + a~) in the mild form
+ (b(x)f(x))x + c(x)f(x) -
Ef(x).
(4.52)
e, the Fourier transformed equation may be expressed
where a is the complex measure defined by Q:=
~(a( {O} )60 -
v 21r
a).
(4.54)
For later notational convenience let
a({O}) ry =
v'21T'
(4.55)
On Ito's Complex Measure Condition
76
To make the probabilistic construction in Fourier frequency space under the complex measure condition on the lower order coefficients we will require a condition of the following form on the leading order coefficient a( x). CONDITION A: Assume that a is a complex measure and
a( {O}) > lal (R\ {O}),
(4.56)
where lal denotes the corresponding total variation measure. One may note that in the case of constant coefficient a(x) = a, Condition A is merely the condition that a> O. The stochastic jump Markov process {~( t) : t ;::: O} and multiplicative functional X in this setting are defined as follows. First let q, Q denote the measure and probability distribution defined by the coefficients b, c exactly as in (3.18)(3.19) with dimension n = l. Similarly, one defines
de ro = dQ'
rl =
db dQ
(4.57)
precisely as in (3.20)-(3.21). In addition, let ao = IQlt(~) be the probability defined by normalizing the total variation measure of the complex measure a defined above in (4.55). Now define (4.58) Next let {Ji : i ;::: I} and {Ki : i ;::: I} be mutually independent sequences of i.i.d. symmetric Bernoulli 0-1 random variables as defined earlier for (3.25) with n = l. Additionally, let {ai : i ;::: I} be a sequence of independent Bernoulli 0-1 random variables, independent of {Ji : i ;::: I} and {Ki : i ;::: I}, and distributed according to the law
P(ai = 1) = p
lal(R\{O}) a( {O}) E (0,1).
=
(4.59)
For future reference, one should also note that p = ,-llal(R). Now the increments {TJi : i ;::: I} of the jump Markov process are i.i.d. and independent of the above coin tossing sequences {Ji }, {Kd with (4.60) Accordingly the skeletal jump process starting at
~o = ~
is given by
k
~k = ~ -
L TJi,
k;::: l.
(4.61 )
i=l
Conditionally on the spatial random walk {~d the holding times {Sk : k ;::: I} for the jump Markov process may be defined by specifying infinitesimal rates ..\(~k)' (e.g. see Blumenthal and Getoor (1968), Bhattacharya and Waymire (1990)), where (4.62)
Chen, Dobson, Guenther, Orum, Ossiander, Thomann, Waymire
°
Recall that E > is a parameter of (4.52). Finally, the multiplicative times functional factors
mj(~) =
Xis recursively defined with scale
(l-P)~>'(~)' if j = 0, (l-P)A:rr>.(~)' if j = 1
{
e
rp(t,~)
(4.63)
'f J. -- 2 ,
p>.W
and rescaled forcing term stochastic recursion
77
1
2((1 - p)A(~))-lg(t,~), by the following if if if if
Se Se Se Se
2: t < t, "'e = 0, < t, "'e = 1, ae = 0, Je = j < t,"'e = 1,ae = 1,Je =j
E {O, I} E {0,1}
(4.64) We are now ready to state the theorem in this setting. Theorem 4.1. Assume that the diffusion coefficient a satisfies Condition A. If band c are complex measures and if there is a number B such that luo(~)1 :::; B, and Ig(t,~)1 :::; BA(~)/2,~ E Rn,t 2: 0. Then a mild solution u(t,~) for the Fourier transformed equation is given by the stochastic representation
Proof. The focus is on the implied convergence of the expectation. We leave it to the reader to use the strong Markov property of the underlying jump process to verify the equation from the stochastic recursion defined by the times functional X. To establish integrability first note that
Ird1]) I :::;
lal(R) = lal(R\{O})
and
Thus (4.65) Now with counting random variables K, Nt, and K t defined exactly as in (3.27), where Nt is again the number of jumps by time t, one has upon iteration of the stochastic recursion that Kt
u(t,~)
= E~lFd(t, 0) = E~IF~
II
r J i - 1 (1]i)mJi-l (~i_l))l-Ui-l i=l . (r2(1]i)m2(~i-l)ti-l "'i-l {UO(~Nt )l[Nt :::; K] + rp(t - So - ... - SKtl ~Kt)l[Nt > K]} (4.66)
where an empty product is assigned value one. Therefore, letting q = q(R), one has k
lu(t,~) I :::; B
L E~e=~ II IqmJi_ (~i_lW-Ui-l1[K 1\ Nt = k]. 1
k;::O
0
On Ito's Complex Measure Condition
78
For each k it is helpful to introduce the mutually dependent pair of binomial distributed random variables k-l
k-l
= ~(1- O"i)l[Ji = 1],
Xk
Yk = ~(1- O"i)l[Ji = 0].
i=O
(4.67)
i=O
Also in the case that 0"0 = ... = O"k-l = 0 set hk = 1, else let hk denote the density of 2.:~-1 O"iSi conditional on the O"i'S, J/s, and ~i'S. Then, proceeding similarly as in the proof of Theorem 3.1, consider k-l
Ak :=Et,e=t,
II IqmJi(~i)ll-ai1[K
1\ Nt
= k]
o k-l
<
Et,e=t,
II IqmJi (~iW-ai 1[Nt 2: k]P(K 2: k) o
< Elf
+qmo(O)(-) 2e,
Setting (3 =
(1=:p)
1/2
l[Xk
max{(l_~)er'
= 1, Y k = 0] + pk }
vk-}, we have
x
2- k ((3t)---.k+Yk X A < -E _ 2 1[~ kt t,e-t, f( ~k + Yk) 2
k 2~ 2
+ y; > 1] + _q_(P.)k-l + (P.)k k-
2·
Chen, Dobson, Guenther, Orum, Ossiander, Thomann, Waymire
79
Thus we obtain
< B{~ 2- k (t /\ 1) E _ ({3t)~+Yk l[Xk ~ k?O
1;0-1; r( X k
t
2
+ Yik)
2
+
Yi
> 1]
k_
2 + y0Y(2 - p)2 +-2- P 2q
BC1(t /\ 1)
L L
2-k({3t)~
r(n/2) P(Xk
+ 2Yk = n)
k?12::;n::;2k
2B q + 2 - p (y0Y(2 _ p)
< 2B(t /\ 1)C 1
+ 1)
({3t/2) ~ 2B q L + -( n?2 r(n/2) 2 - p y0Y(2 -
< B(t /\ 1){3(1 + J2{3t/7r)e~t + 2B (
p)
t )+
2-p y0Y2-p
This establishes the desired convergence.
5
+ 1) 1). D
Acknowledgments
The authors are grateful to Professor V. N. Kolokoltsov for providing additional references and comments on a draft of this paper. This work was partially supported by a Focussed Research Group grant DMS-0073865 from the National Science Foundation.
Bibliography [1]
Albeverio, S., R. H¢egh-Krohn (1976): Mathematical theory of Feynman path integrals, Lecture Notes in Mathematics 523, Springer-Verlag, NY
[2]
Bhattacharya, R. and E. Waymire (1990): Stochastic Processes with Applications, Wiley, NY.
[3]
Bhattacharya, R., L Chen, S. Dobson, R. Guenther, C. Orum, M. Ossiander, E. Thomann, E. Waymire (2002): Majorizing Kernels & Stochastic Cascades With Applications To Incompressible Navier-Stokes Equations, Trans. Amer. Math. Soc. (in press).
[4]
Blumenthal, R.M. and R.K. Getoor (1968): Markov Processes and Potential Theory, Academic Press, NY
[5]
Feller, W. (1971): An Introduction to Probability Theory and its Applications, Vol II, 2nd ed., Wiley, NY
[6]
Folland, Gerald B. (1992) Fourier Analysis and its Applications Brooks/Cole Publishing Company, Pacific Grove California
[7]
Ito, K.(1965): Generalized uniform complex measures in the Hilbertian metric space with the application to the Feynman integral, Proc. Fifth Berkeley Symp. Math. Stat. Probab. II, 145-161.
80
On Ito's Complex Measure Condition
[8]
Kolokoltsov, V.N. (2000): Semiclassical analysis for diffusions and stochastic processes, Springer Lecture Notes in Mathematics, v. 1724, SpringerVerlag, NY.
[9]
Kolokoltsov, V.N. (2002): A new path integral respresentation for the solutions of the Schrodinger equations, Math.Proc. Camb.Phil.Soc 132 353-375
[10] LeJan, Y. and A.S. Sznitman (1997). Stochastic cascades and 3-dimensional Navier-Stokes equations, Prob. Theory and Rel. Fields 109 343-366. [11] Podlubny, I. (1999): Fractional Differential Equations, Academic Press, San Diego, CA.
Variational formulas and explicit bounds of Poincare-type inequalities for one-dimensional processes Mu-Fa Chen l Beijing Normal University
Abstract This paper serves as a quick and elementary overview of the recent progress on a large class of Poincare-type inequalities in dimension one. The explicit criteria for the inequalities, the variational formulas and explicit bounds of the corresponding constants in the inequalities are presented. As typical applications, the Nash inequalities and logarithmic Sobolev inequalities are examined.
AMS 2000 Subject classification: 49R50, 34L15, 26DIO, 60J27 Keywords: Variational formula, Poincare inequality, Nash inequality, logarithmic Sobolev inequality, Orlicz space, one-dimensional diffusion, birthdeath process.
1
Introduction
The one-dimensional processes in this paper mean either one-dimensional diffusions or birth-death Markov processes. Let us begin with diffusions. Let L = a(x)d 2 /dx 2 + b(x)d/dx be an elliptic operator on an interval (0, D) (D :S (0) with Dirichlet boundary at a and Neumann boundary at D when D < 00, where a and b are Borel measurable functions and a is positive everywhere. Set C(x) = bfa, here and in what follows, the Lebesgue measure dx is often omitted. Throughout the paper, assume that
J;
Z:=
lD
eC / a <
00.
(1.0)
Hence, dJ-l := a-1ecdx is a finite measure, which is crucial in the paper. We are interested in the first Poincare inequality
where Cd is the set of all continuous functions, differentiable almost everywhere and having compact supports. When D = 00, one should replace [0, D] by [0, D) but we will not mention again in what follows. Next, we are also interested in the second Poincare inequality (1.2) where 7r(f) = J-l(f)/Z = J JdJ-l/Z. To save the notations, we use the same A (resp., A) to denote the optimal constant in (1.1) (resp., (1.2)). The aim of the study on these inequalities is looking for a criterion under which (1.1) (resp., (1.2)) holds, i.e., the optimal constant A < 00 (resp., A < (0), 1 Research
supported in part by NSFC (No. 10121101), RFDP and 973 Project.
81
Poincare- type Inequalities
82
and for the estimations of A (resp., A). The reason why we are restricted in dimension one is looking for some explicit criteria and explicit estimates. Actually, we have dual variational formulas for the upper and lower bounds of these constants. Such explicit story does not exist in higher dimensional situation. Next, replacing the L2-norm on the right-hand sides of (1.1) and (1.2) with a general norm 11·lllffi in a suitable Banach space (the details are delayed to §3), respectively, we obtain the following Poincare-type inequalities IIf211lffi :S; AlffiD(f),
f
II (f - 7f(f) )211 :S; AlffiD(f),
f
0.
(1.3)
CCd[O, DJ.
(1.4)
f(O)
E CCd[O, D], E
=
For which, it is natural to study the same problems as above. The main purpose of this paper is to answer these problems. By using this general setup, we are able to handle with the following Nash inequalities[23j Ilf - 7f(f)112+ 4 / v :S; AND(f)llflli/ v
(1.5)
in the case of v > 2, and the logarithmic Sobolev inequality[18j: (1.6) To see the importance of these inequalities, define the first Dirichlet eigenvalue AO and the first Neumann eigenvalue A1, respectively, as follows.
AO = inf{D(f): f E C 1 (0,D) nC[O,D], f(O) = 0, 7f(12) = I}, A1 = inf{D(f) : f E C 1(0, D) n C[O, D], 7f(f) = 0, 7f(12) = I}. Then, it is clear that AO
(1.7)
= 1/A and A1 = 1/ A. Furthermore, it is known that
The second Poincare inequality
¢==::}
Var(Ptf) :S; Var(f) e- 2A1t .
Logarithmic Sobolev inequality
¢==::}
Ent(Ptf) :S; Ent(f) e-2t/ALS,
Nash inequality
¢==::}
Var(Ptf) :S; Cllfllr c
(1.8)
v,
where IlfilT is the LT(ft)-norm (cf., [8], [13], [18J and references within). It is clear now that the convergence in the first line is also equivalent to the exponential ergodicity for any reversible Markov processes with density (cf. [10]), and C(x), where i.e., IIPt(x,·) - 7fIIVar :S; C(x)e- ct for some constants E > Pt (x, .) is the transition probability. The study on the existence of the equilibrium 7f and on the speed of convergence to equilibrium, by Bhattacharya and his cooperators, consists a fundamental contribution in the field. See for instance [2J-[6J and references within. The second line in (1.8) is correct for diffusions but incorrect in the discrete situation. In general, one has to replace "¢==::}" by "====?". Here are three examples which distinguish the different inequalities.
°
° °
b(x) = a(x) = x, b(x) = a=x 2 Iog' x a(x) = 1 b(x) = -b
Ergodicity
2nd Poincare
LogS
L--r- exp .
Nash
,>1
,2::2
,>2
,>2
,>2
J
,2::0
,2::1
,>1
x
J
J
x
x
x
Table 1.1, Examples: Diffusions on [0, (0)
Mu-Fa Chen
83
Here in the first line, "LogS" means the logarithmic Sobolev inequality, "L1_ exp." means the L1-exponential convergence which will not be discussed in this paper. "J" means always true and "x" means never true, with respect to the parameters. Once known the criteria presented in this paper, it is easy to check Table 1.1 except the L1-exponential convergence. The remainder of the paper is organized as follows. In the next section, we review the criteria for (1.1) and (1.2), the dual variational formulas and explicit estimates of A and A. Then, we extend partially these results to Banach spaces first for the Dirichlet case and then for the Neumann one. For a very general setup of Banach spaces, the resulting conclusions are still rather satisfactory. Next, we specify the results to Orlicz spaces and finally apply to the Nash inequalities and logarithmic Sobolev inequality. Since each topic discussed subsequently has a long history and contains a large number of publications, it is impossible to collect in the present paper a complete list of references. We emphasize on recent progress and related references only. For the applications to the higher dimensional case and much more results, the readers are urged to refer to the original papers listed in References, and the informal book [13], in particular.
2
Ordinary Poincare inequalities
In this section, we introduce the criteria for (1.1) and (1.2), the dual variational formulas and explicit estimates of A and A.
To state the main results, we need some notations. Write x /\ y = min {x, y} and similarly, x V y = max { x, y}. Define
F = {f
j
=
O[O,D]
E
n C 1(O,D):
{f E 0[0, D] : f(O) f = f(·/\xo), f
= {f j' = {f
F'
f
= E
f(O) = 0,
f'1(O,D)
> O},
0, there existsxo E (0, DJso that
C1(O,xo)andf'l(o,xo) >
O},
(2.1)
E 0[0,
D] : f(O) = 0, fl(o,D) > O},
E 0[0,
D] : f(O) = 0, there existsxo E (0, D]so that
=
f(·/\ xo)andfl(o,x o) >
O}.
Here the sets F and F' are essential, they are used, respectively, to define below the operators of single and double integrals, and are used for the upper bounds. The sets j and j' are less essential, simply the modifications of F and F', respectively, to avoid the integrability problem, and are used for the lower bounds. Define
I(f)(x)
=
e-G(x) lD f'(x) x [feG la] (u)du,
fEF,
(2.2)
II(f)(x)
=
(X lD f(x) Jo dye-G(y) y [feG la] (u)du, 1
f
E
F'.
The next result is taken from [12; Theorems 1.1 and 1.2]. The word "dual" below means that the upper and lower bounds are interchangeable if one exchanges the orders of "sup" and "inf" with a slight modification of the set F (resp., F') of test functions.
Poincare-type Inequalities
84
= sup
Theorem 2.1. Let (1.0) hold. Define
xE(O,D)
e: .
Then, we have the following assertions.
(1) Explicit criterion: A <
00
iff B <
00.
(2) Dual variational formulas:
A::; inf
sup II(f)(x) = inf
JEP XE(O,D)
A
~ sup
inf
sup I(f)(x),
JeF xE(O,D)
II(f)(x) = sup
JEFf xE(O,D)
inf
I(f)(x).
(2.3)
JEF XE(O,D)
The two inequalities all become equalities whenever both a and b are continuous on [0, D].
(3) Approximating procedure and explicit bounds: (a) Define h = yTP, fn = fn-1II(fn-l) and Dn = SUPxE(O,D) II(fn) (x). Then Dn is decreasing in n and A ::; Dn ::; 4B for all n ~ 1. (b) Fix Xo E (0, D). Define fi xo ) =
= f~~{(· Axo)II(J~~{(.
Axo))
and en = sUPxoE(O,D) infxE(o,D) II(J~xo)(. A xo))(x). Then en is increasing in n and A ~ en ~ B for all n ~ 1. We mention that the explicit estimates "B ::; A ::; 4B" were obtained previously in the study on the weighted Hardy's inequality by [22]. We now turn to study A, for which it is natural to assume that
1D
e-C(s)ds
1 s
a(u)-leC(u)du
Theorem 2.2. Let (1.0) and (2.4) hold and set f the following assertions.
(1) Explicit criterion: A <
00
iff B <
00,
=
=
00.
(2.4)
f - 7r(f). Then, we have
where B is given by Theorem 1.l.
(2) Dual variational formulas: sup
inf
JEF xE(O,D)
IU)(x)::; A::; inf
sup I(f)(x).
(2.5)
JEF XE(O,D)
The two inequalities all become equalities whenever both a and b are continuous on [0, D].
(3) Approximating procedure and explicit bounds: (a) Define h = yTP, fn = fn-1II(fn-l) and Dn Then A ::; D n ::; 4B for all n ~ l.
= SUPXE(O,D) IIUn) (x).
(b) Fixxo E (O,D). Define fi xo ) =
Mu-Fa Chen
85
Part (1) of the theorem is taken from [11; Theorem 3.7J. The upper bound in (2.5) is due to [16J. The other parts are taken from [12; Theorems 1.3 and l.4J. Finally, we consider inequality (1.2) on a general interval (p, q) (-00 S; p < q S; (0). When p (resp., q) is finite, at which the Neumann boundary condition is endowed. We adopt a splitting technique. The intuitive idea goes as follows: Since the eigenfunction corresponding to A, if exists, must change signs, it should vanish somewhere in the present continuous situation, say B for instance. Thus, it is natural to divide the interval (p, q) into two parts: (p, B) and (B, q). Then, one compares A with the optimal constants in the inequality (1.1), denoted by Ale and A 2e , respectively, on (B, q) and (p, B) having the common Dirichlet boundary at B. Actually, we do not care about the existence of the vanishing point B. Such B is unknown, even if it exists. In practice, we regard B as a reference point and then apply an optimization procedure with respect to B. We now redefine C (x) = b/ a. Again, since it is in the ergodic situation, we assume the following (non-explosive) conditions:
J:
l' fe
q
e-C('lds e-C(s)ds
l'
e Cfa
~ 00
fes e C /a =
00
if P ~
-00
if q =
00
and
(2.6)
for some (equivalently, all) B E (p, q). Corresponding to the intervals (B, q) and (p, B), respectively, we have constants B le and B 2 (h given by Theorem 1.1. Theorem 2.3. Let (2.6) hold. Then, we have
(2) Let B be the medium of j.-l, then (Ale V A 2e )/2 S; A S; Ale V A 2e .
In particular, A <
00
iff Bl() V B 2e <
00.
Comparing the variational formulas (2.3) and (2.5) with the classical variational formulas given in (1.7), one sees that there are no common points. This explains why the new formulas (2.3) and (2.5) have not appeared before. The key here is the discover of the formulas rather than their proofs, which are usually simple due to the advantage of dimension one. As an illustration, here we present parts of the proofs. Proof of the upper bound in (2.5). Originally, the assertion was proved in [16J by using the coupling methods. Here we adopt the analytic proof given in [9J. Let 9 E C[a, DJ n Cl(a, D), 7["(g) = a and 7["(g2) = 1. Then, for every f E :F
Poincare-type Inequalities
86
with 7r(f)
~
0, we have
~
r 7r(dx)7r(dy) [g(y) - g(X)]2 2 Jo = r 7r(dX)7r(dy )(l g'(~dU)2 J{x~y} x f (u)
1=
D
Y
~
r 7r(dx)7r(dy) lx J{x~y}
Y
gf'~(U))2 du u
l
f'(~)d~
Y
x
(by Cauchy-Schwarz inequality)
= =
r 7r(dX)7r(dy )lx g'(u)2eC(U)e;,~(~) du[J(y) J{x~y} Y
lD
f(x)]
U
l u lD l lD
a(u)g'(u)27r(du) z;~(:;u)
~ D(g)
Ze-C(u)
f'()
sup uE(O,D)
~ D(g)
U
7r(dx)
u
7r(dy) [J(y) - f(x)]
7r(dx)
u
0
(since7r(f) ~
sup I(f)(x)
7r(dy) [f(y) - f(x)]
0).
XE(O,D) 1
Thus, D(g)- ~
-
SUPxE(O,D)
I(f)(x), and so
A=
sup
D(g)-l
~
sup I(J)(x). xE(O,D)
g: 7r(g)=O, 7r(g2)=1
This gives us the required assertion:
A ~ inf
sup I(J)(x).
JEF XE(O,D)
The proof of the sign of the equality holds for continuous a and b needs more work, since it requires some more precise properties of the corresponding eigenfunctions. D
Proof of the explicit upper bound "A
~
4B".
As mentioned before, this result is due to [22]. Here we adopt the proof given in [11], as an illustration of the power of our variational formulas. Recall that B = SUPxE(O,D) e- c e C la. By using the integration by parts formula, it follows that
J;
J:
(2.1)
Hence
lD
e-C(x) J'Pe c I(ViP) (x) = (J'P)'(x) x -aas required.
D
~
e-C(x)v'P(x) (1/2)e- C(x) .
2B
~ = 4B
Mu-Fa Chen
3
87
Extension; Banach spaces
Starting from this section, we introduce the recent results obtained in [14] and [15], but we will not point out time by time subsequently. In this section, we study the Poincare-type inequality (1.3). Clearly, the Banach spaces used here can not be completely arbitrary since we are dealing with a topic of hard mathematics. l,From now on, let (lB, I . IIIB' p,) be a Banach space of functions f: [0, D] --+ IR satisfying the following conditions: (1)
1ElB;
(2)
lBis ideal: Ifh E lBandlfl ~ Ihl, thenf E lB;
(3)
IlfilIB = sup gEQ
(4)
(3.1)
D r io Iflgdp"
Q:3 gowithinfgo
> 0,
where Q is a fixed set, to be specified case by case later, of non-negative functions on [0, D]. The first two conditions mean that lB is rich enough and the last one means that Q is not trivial, it contains at least one strictly positive function. The third condition is essential in this paper, which means that the norm I . IIIB has a "dual" representation. A typical example of the Banach space is lB = L' (p,), then Q = the unit ball in L~ (p,), l/r + l/r' = 1. The optimal constant A in (1.3) can be expressed as a variational formula as follows.
AIB -_ sup
{ IID(f)' f211IB . f
E Cd[O, D], f(O) -_ 0, 0< D(f) <
00
}.
(3.2)
Clearly, this formula is powerful mainly for the lower bounds of A. However, the upper bounds are more useful in practice but much harder to handle. Fortunately, for which we have quite complete results. Define
BIB DIB
= sup
=
sup xE(O,D)
Ilfo
1\
CIB =
sup xE(O,D)
11'P(x 1\ ·?IIIB 'P(X)
YII IB
(3.3)
J
Theorem 3.1. Let (1.0) and (3.1) hold. Then we have the following assertions.
(1) Explicit criterion: AIB <
00
iff BIB <
00.
(2) Variational formulas for the upper bounds: AIB~ inf, sup f(x)-lllf'P(x 1\ ')IIIB JEF xE(O,D) e-C(x) ~ inf sup f'() IlfI(x,D)IIIB' JEF xE(O,D) X
(3.4)
Poincare-type Inequalities
88
(3) Approximating procedure and explicit bounds: Let BlB < 00. Define fo = y"P, fn(x) = [[fn-l
(3.5) for all n 2: 1. We are now going to sketch the proof of the second variational formula in (3.4), from which the explicit upper bound AlB ::; 4BlB follows immediately, as we did at the end of the last section. The explicit estimates "BlB ::; AlB ::; 4BlB " were previously obtained in [7] in terms of the weighted Hardy's inequality [22]. The lower bounds follows easily from (3.2).
Sketch of the proof of the second variational formula in (3.4). The starting point is the variational formula for A (cf. (2.3)):
e-C(x) A < inf sup - IEF xE(O,D) f'(x) Fix g
lD
fec -x a
= inf sup
e-C(x)
IEF xE(O,D) f'(x)
lD x
fdp.
> 0 and introduce a transform as follows. b ---+ b/ g,
a
---+
a/g > O.
(3.6)
Under which, C(x) is transformed into
l°
x
Cg(x) =
big
- / = C(x). a g
This means that the function C is invariant of the transform, and so is the Dirichlet form D (f). The left-hand side of (1.1) is changed into
faD f2ge C/a
= faD f2gdp.
At the same time, the constant A is changed into
lD
e-C(x) Ag ::; inf sup f' ( ) f gdp. x x IEF XE(O,D) Making supremum with respect to g E Q, the left-hand side becomes
and the constant becomes
e-C(x) AlB = sup Ag ::; sup inf sup f' ( ) 9 glx x e-C(x)
= inf sup l' ( ) sup 1
x
= i~f s~p
x 9 e-C(x) f'(x)
lD °
lD x
f gdp ::; inf sup sup Ig x
f I(x,DWdp.
IlfI(x,D) IllB'
Mu-Fa Chen
89
We are done! Of course, more details are required for completing the proof. For instance, one may use 9 + lin instead of 9 to avoid the condition" 9 > 0" and then pass limit. 0 The lucky point in the proof is that "sup inf ::::; inf sup", which goes to the correct direction. However, we do not know at the moment how to generalize the dual variational formula for lower bounds, given in the second line of (2.3), to the general Banach spaces, since the same procedure goes to the opposite direction.
4
Neumann Case; Orlicz Spaces
In the Neumann case, the boundary condition becomes J'(O) = 0, rather than J(O) = O. Then Ao = 0 is trivial. Hence, we study Al (called spectral gap of L), that is the inequality (1.2). We now consider its generalization (1.4). Naturally, one may play the same game as in the last section extending (2.5) to the Banach spaces. However, it does not work this time. Note that on the left-hand side of (1.4), the term 7r(f) is not invariant under the transform (3.6). Moreover, since 7r(J) = 0, it is easy to check that for each fixed J E F, I(J)(x) is positive for all x E (0, D). But this property is no longer true when dJ-.l is replaced by gdJ-.l. Our goal is to adopt the splitting technique explained in Section 2. Let () E (p, q) be a reference point and let A;o, B~o, C~o, D;o (k = 1,2) be the constants defined in (3.2) and (3.3) corresponding to the intervals ((), q) and (p, ()), respectively. By Theorem 3.1, we have
k = 1,2. Theorem 4.1. Let (2.6) and (3.1) hold. Then, we have the Jo llo wing assertions.
(1) Explicit criterion: AB <
00
iff B~o V B't/ <
00.
(2) Estimates:
10 20 - B -
= 1,
bo ... bn -
J-.ln =
Consider a Banach space (IB, satisfying (3.1). Define i
1
I . liB, J-.l)
of functions E .-
{O, 1, 2, ... }
--+
lR
1
L-t J-.l'a.' i>l' - ,
'P'-'\;'"'t -
j=1
J
J
Clearly, the inequalities (1.3) and (1.4) are meaningful with a slight modification.
Poincare-type Inequalities
90
Theorem 4.2. Consider birth-death processes with state space E. Assume that
Z <
00.
(1) Explicit criterion for (1.3): Alffi
< 00 iff Blffi <
00.
(2) Explicit bounds for A lffi : Blffi ::::; Alffi ::::; 4Blffi· (3) Explicit criterion for (1.4): Let the birth-death process be non- explosive: 1
<Xl
i
"""" L....J -/I·b """" L....J /-L j =
(4.1)
00.
i=O ,...,~ ~ j=O
Then Alffi
< 00 iff Blffi <
00.
(4) Estimates for A lffi : Let E1 = {I, 2, ... } and let C1 and C2 be two constants such that 11r(f) I ::::; c111flllffi and 11r(f IEI)I ::::; c211fIEI ll lffi for all f E Iffi. Then, max {11111;I, (1::::; Alffi ::::; (1
V 2(1- 1r0)1111Ilffi )2}Alffi C
(4.2)
+ VcIil 1 Illffi ) 2Alffi.
Similarly, one can handle the birth-death processes on Z. An interesting point here is that the first lower bound in (4.2) is meaningful only in the discrete situation.
Orlicz spaces. The results obtained so far can be specialized to Orlicz spaces. The idea also goes back to [7]. A function : JR --+ JR is called an N - function if it is non-negative, continuous, convex, even (i.e., <1>( -x) = (x )) and satisfies the following conditions:
(x) =0 iff x=O,
lim (x) / x = 0,
lim (x) / x =
In what follows, we assume the following growth condition (or for <1>: sup (2x) /<1> (x ) x»l
<
00
00.
x-><Xl
x->O
~2-condition)
(¢::::::? sup x
where ~ is the left derivative of <1>. Corresponding to each N-function, we have a complementary N -function:
Y E JR. Alternatively, let 'Pc be the inverse function of ~, then
[24]). Given an N-function and a finite measure /-L on E := (p, q) Orlicz space as follows:
Ilfll1>
= sup gEt;}
c JR, define an
JEr Iflgd/-L,
(4.3)
Mu-Fa Chen
91
where 9 = {} 2:: , : JE EBJ (} )d/-l :::;
00 },
which is the set of non-negative functions
in the unit ball of L
(resp., (2.6)) holds, then Theorem 3.1 ( resp., 4.1) is available for the Orlicz space (L
5
N ash inequality and Sobolev-type inequality
It is known that when v
> 2, the Nash inequality (1.5):
is equivalent to the Sobolev-type inequality: Ilf - 7f(J)II~/(v-2) :::; AsD(J), where II· Ilr is the Lr(/-l)-norm. Refer to [1], [8] and [26]. This leads to the use of the Orlicz space L
(5.1) The results in this section were obtained in [19], based on the weighted Hardy's inequalities. Define C(x) = bfa, /-l(m, n) = eC /a and
J:
J;
l i
x
e- c
B~fJ
= sup
fJ
e- c
B~fJ
= sup
Here B~fJ (k = 1,2) is specified from BJB given in (3.3) with IBl = L
> 2.
(1) Explicit criterion: Nash inequality (equivalently, (5.1)) holds on (p,q) iff B~fJ V B~fJ
<
00.
(2) Explicit bounds:
~ (BlfJ 1\ B2fJ)
max { 2
v
v'
[1 _(ZlfJ V Z2fJ ) 1/2+I/V] 2 (BlfJ V B2fJ) } ZlfJ + Z2fJ v v
:::; Av :::; 4(B~fJ V B~fJ). In particular, if () is the medium of /-l, then
(5.2)
92
Poincare-type Inequalities
We now consider birth-death processes with state space {O, 1,2" .. }. Define 1
i
00
1!1'="\;"""'t't ~ ,i>l' _,
Bv
j=l/J-jaj
= sup 'Pi
(
t>l -
)
(v-2)/v
L/J-j ..
J=t
Theorem 5.2. For birth-death processes, let (4.1) hold and assume that Z < Then, we have max {(
2 )2/V , 1[ - (Z_1)1/2+1/V]2} _ vzv/2-1 -----zBv :s; Av :s; 16Bv.
Hence, when v > 2, the Nash inequality holds iff Bv <
6
00.
(5.3)
00.
Logarithmic Sobolev inequality
The starting point of the study is the following observation.
~II(J -
7r(J))211
:s; £(J) :s;
~~ II(J -
7r(J))211
(6.1)
where
(6.2)
Again, here B~o (k
= 1,2) is specified from BM, given in (3.3).
Theorem 6.1. Let (2.6) hold.
(1) Explicit criterion: The logarithmic Sobolev inequality on (p, q) c lR holds iff sup /J-(x, q) log XE(O,q)
sup /J-(p,x)log xE(p,O)
(1 ) /J- X, q (1 ) /J- p, x
l 1 x
e- c
<
00
and
0
0
x
hold for some (equivalently, all) () E (p, q).
(6.3) e- c
< 00
Mu-Fa Chen
93
(2) Explicit bounds: Let (j be the root of B~(}
= B~(}, () E [p, q]. Then, we have (6.4)
By a translation if necessary, assume that () = 0 is the medium of J-L. Then, we have
We now consider birth-death processes with state space {O, 1,2,· .. }. Define
Bq, = sUP'PiM(J-L[i,oo)), i~l
where J-L[i,oo)
=
Lj~iJ-Lj and
M(x) is defined in (6.2).
Theorem 6.2. For birth-death processes, let (4.1) hold and assume that Z < Then, we have 2 {J4Z+1-1 - max 5 2' ~
A LS
~
00.
( 1 - ZlW-1(Zll))2} Bq, ZW- 1(Z-l)
551( 1 + w- 1( z- 1))2 Bq"
where Zl = Z - 1 and w- I is the inverse function of w: w(x) In particular, A LS < 00 iff
1 ) sup 'Pi J-L[i, 00) log [. i~1 J-L'l,00
= x 2 log(1 + x 2).
< 00.
Acknowledgement. This paper is based on the talks given at "Stochastic analysis on large scale interacting systems", Shonan Village Center, Hayama, Japan (July 17-26, 2002) and "Stochastic analysis and statistical mechanics", Yukawa Institute, Kyoto University, Japan (July 29-30, 2002). The author is grateful for the kind invitation, financial support and the warm hospitality made by the organization committee: Profs. T. Funaki, H. Osada, N. Yosida, T. Kumagai and their colleagues and students.
Mu-Fa Chen Department of Mathematics, Beijing Normal University Beijing 100875, The People's Republic of China E-mail: [email protected] Home page: http) /www.bnu.edu.cn;-chenmf/main_eng.htm
94
Poincare-type Inequalities
Bibliography [1] Bakry, D., Coulhon, T., Ledoux, M., Saloff-Coste, L., Sobolev inequalities in disguise, Indiana Univ. Math. J. 44(4), 1033-1074 (1995) [2] Bhattacharya, R N., Criteria for recurrence and existence of invariant measures for multidimensional diffusions, Ann. Probab. 3, 541-553 (1978). Correction, ibid. 8, 1194-1195 (1980) [3] Bhattacharya, R N., Multiscale diffusion processes with periodic coefficients and an application to solute transport in porous media, Ann. Appl. Probab. 9(4), 951-1020 (1999) [4] Bhattacharya, R N., Denker, M. and Goswami, A., Speed of convergence to equilibrium and to normality for diffusions with multiple periodic scales, Stoch. Proc. Appl. 80, 55-86 (1999) [5] Bhattacharya, R N. and G6tze, F. Time-scales for Gaussian approximation and its break down under a hierarchy of periodic spatial heterogeneities, Bernoulli 1, 81-123 (1995) [6] Bhattacharya, R N. and Waymire, C. Iterated random maps and some classes of Markov processes, in "Handbook of Statistics", Vol. 19, pp.145170, Eds. Shanbhag, D. N. and Rao, C. R, Elsevier Sci. B. V., 200l. [7] Bobkov, S. G., G6tze, F., Exponential integrability and transportation cost related to logarithmic Sobolev inequalities, J. Funct. Anal. 163, 1-28 (1999) [8] Carlen, E. A., Kusuoka, S., Stroock, D. W., Upper bounds for symmetric Markov transition functions, Ann. Inst. Henri Poincare 2, 245-287 (1987) [9] Chen, M. F., Analytic proof of dual variational formula for the first eigenvalue in dimension one, Sci. Chin. (A) 42(8), 805-815 (1999) [10] Chen, M. F., Equivalence of exponential ergodicity and L2-exponential convergence for Markov chains, Stoch. Proc. Appl. 87, 281-297 (2000) [11] Chen, M. F., Explicit bounds of the first eigenvalue, Sci. Chin. (A) 43(10), 1051-1059 (2000) [12] Chen, M. F., Variational formulas and approximation theorems for the first eigenvalue in dimension one, Sci. Chin. (A) 44(4), 409-418 (2001) [13] Chen, M. F., Ergodic Convergence Rates of Markov Processes - Eigenvalues, Inequalities and Ergodic Theory, Collection of papers, 1993-200l. http://www.bnu.edu.cnrchenmf/main_eng.htm [14] Chen, M. F., Variational formulas of Poincare-type inequalities in Banach spaces of functions on the line, Acta Math. Sin. Eng. Ser. 18(3), 417-436 (2002) [15] Chen, M. F., Variational formulas of Poincare-type inequalities for birthdeath processes, preprint (2002), submitted to Acta Math. Sin. Eng. Ser. [16] Chen, M. F. and Wang, F. Y., Estimation of spectral gap for elliptic operators,r Trans. Amer. Math. Soc. 349(3), 1239-1267 (1997)
Mu-Fa Chen
95
[17] Deuschel, J. D. and Stroock, D. W., Large Deviations, Academic Press, New York, 1989 [18] Gross, L., Logarithmic Sobolev inequalities, Amer. J. Math. 97, 1061-1083 (1976) [19] Mao, Y. H., Nash inequalities for Markov processes in dimension one, Acta Math. Sin. Eng. Ser. 18(1), 147-156 (2002) [20] Mao, Y. H., The logarithmic Sobolev inequalities for birth-death process and diffusion process on the line, Chin. J. Appl. Prob. Statis. 18(1), 94-100 (2002) [21] Miclo, L., An example of application of discrete Hardy's inequalities, Markov Processes Relat. Fields 5, 319-330, (1999) [22] Muckenhoupt B., Hardy's inequality with weights, Studia Math. XLIV, 31-38 (1972) [23] Nash, J., Continuity of solutions of parabolic and elliptic equations, Amer. J. Math. 80, 931-954 (1958) [24] Rao, M. M. and Ren, Z, D., Theory of Orlicz Spaces, Marcel Dekker, Inc. New York, 1991 [25] Rothaus, O. S., Analytic inequalities, isoperimetric inequalities and logarithmic Sobolev inequalities, J. Funct. Anal. 64, 296-313 (1985) [26] Varopoulos, N., Hardy-Littlewood theory for semigroups, J. Funct. Anal. 63, 240-260 (1985)
96
Poincare-type Inequalities
Brownian Motion and the Classical Groups Anthony D' Aristotile
Persi Diaconis
SUNY at Plattsburgh
Stanford University
Charles M. Newman Courant Inst. of Math. Sciences Abstract Let r be chosen from the orthogonal group On according to Haar measure, and let A be an n x n real matrix with non-random entries satisfying Tr AA t = n. We show that Tr Ar converges in distribution to a standard normal random variable as n ---t 00 uniformly in A. This extends a theorem of E. Borel. The result is applied to show that if entries {31, . . . , {3k n are selected from r where k n ---t 00 as n ---t 00, then
/if 'L,1~ltl {3j, 0 :s:
t
:s:
1 converges to Brownian motion. Partial results
in this direction are obtained for the unitary and symplectic groups.
Keywords: Brownian motion; sign-symmetry; classical groups; random matrix; Haar measure
1
Introduction
Let On be the group of n x n orthogonal matrices, and let r be chosen from the uniform distribution (Haar measure) on On. There are various senses in which the elements of for behave like independent standard Gaussian random variables to good approximation when n is large. To begin with, a classical theorem of Borel [6] shows that P{ for 11 ::; x} ---t
vk I-oo eX
t2
dt. Theorems 2.1 and 2.2 below refine this, showing that an arbitrary linear combination of the elements of r is approximately normal: as n ---t 00, =
sup A#O
-oo<x
2
IP{ Tr(Ar) ::; x} -
I ---t o.
(1.1)
Here A ranges over all non-zero n x n matrices and IIAII = Tr(AAt); thus the normal approximation result is uniform in A. Borel's theorem follows by taking A to have a one in the one-one position and zeros elsewhere. When A above is the identity matrix, Diaconis and Mallows (see [11]) proved that Trr is approximately normal; this follows by taking A as the identity. As A varies, it follows that linear combinations of elements of r are also approximately normal. Interpolating between these facts and Borel's result, we prove that linking appropriately normalized entries from r yields in the limit standard Brownian motion. This is stated precisely in Theorem 3 below. We give a little history. Borel's result is usually stated thus: Let X be the first entry of a point randomly chosen from the n-dimensional unit sphere. Then P{ foX ::; x} ---t
97
Brownian Motion and the Classical Groups
98
micro canonical ensemble (uniform on the sphere) are captured by the canonical ensemble (product Gauss measure). These results are often mistakenly attributed to Poincare. See [15] for a careful history, rates of convergence, and applications to de Finetti type theorems for orthogonally invariant processes. The present project may be seen in the same light: the conditional distribution of an n x n matrix M with independent standard Gaussian coordinates, conditioned on M MT = I is Haar measure on the orthogonal group. Borel also studied the joint distribution of several coordinates of fo f. His work was extended by Levy [24, 25, 26]' Olshanski and Vershik [33] and 1 1 Diaconis-Eaton-Lauritzen [13]. These last authors show that any n a x n a block of fof converges to product Gauss measure in total variation. They also give applications to versions of deFinettti's theorem suitable for regression and the analysis of variance. Extensions by McKean to infinite dimensions are in [30]; he writes that "It is fruitful to think of Wiener space as an infinite-dimensional Our Theorem 3 gives one rigorous version of this fantastic sphere ofradius statement. These ideas were developed by Hida [21]; see Kuo [23] for a recent account. The study of global functionals such as the trace is carried out in [14, 16, 33]. In particular, the joint limiting distribution of Tr(f), Tr(f2), . .. , Tr(fk) is determined as that of independent normal variables. This turns out to be equivalent to a celebrated theorem of Szego and allows further study of the eigenvalues of f; see [7]. The eigenvalues of such random matrix models arise in dozens of situations and are currently being intensely studied. Mehta [32] gives a book length treatment. The area is in active development; see [12] for a recent survey. Interestingly, the eigenvalues of a Gaussian matrix have very different behavior from the eigenvalues of a random orthogonal matrix. In the first case they fill out the inside of the unit circle with order fo of them on the real axis [2, 17]; in the second case the eigenvalues lie on the unit circle. Brownian limits for partial traces are established in [10] and by Rains [34]. This last paper does much more, establishing results for partial traces of random matrices with law invariant under conjugation by On. This includes powers of Haar distributed matrices. One recent global result of Jiang [22] shows that the maximum entry of fo f has the same limiting distribution as the maximum of n 2 standard normal variables. His method of proof gives an approximate coupling between the first J columns of rand J columns of standard normals for J of order n/ (log n)2. The uniform Gaussian limit for linear combinations of the entries of a random orthogonal matrix is proved in Section 2. This is used to prove Brownian motion limits in Section 3. The unitary and symplectic groups are treated in Sections 4 and 5. While we cannot prove completely parallel results, we can show that the sequences of partial sums along the diagonal, suitably normalized, converge to complex Brownian motions.
roo."
2
A Refinement of a Theorem of Borel
Our main tool will be obtained by extending a theorem of Borel [6]. A key to the analysis is that a Haar distributed element of the orthogonal group has entries that are invariant under the sign-change group. If r is an n x n orthogonal
Anthony D 'Aristotile, Charles M. Newman and Persi Diaconis
99
matrix and M is a random diagonal matrix with ±1 chosen uniformly down the diagonal, then the diagonal entries of Mf are ±fii . Under mild conditions on f ii, sums of such entries are close to Gaussian by classical theory. If f is uniform on On, then Mf has the same law as f. The following result both makes this precise and more general. Theorem 2.1. For each positive integer n, choose any n x n real nonrandom matrix A with IIAII = n (Here IIAII = TrAAt j, and let f be a random Haar distributed n x n orthogonal matrix. Then Tr Af converges in distribution to N(O,l) as n ~ 00. Remark. The matrix A above depends on n. We have suppressed this in the notation. See Mallows [27] for further discussion of this method quantifying joint convergence of a growing vector to a vector of independent normals.
Proof. By singular value decomposition [20], there are orthogonal n x n matrices U and V such that U AV = W where W = Diag(al' ... , an) and al 2;> a2 2;> ••• 2;> an 2;> O. Now
TrAf
TrAVV-If
TrU(AVV-If)U- 1 Tr(U AV)(V- I fU- I ). =
(2.2)
However, U AV is diagonal with non-negative, non-increasing entries and V-I fU- 1 is random orthogonal by the invariance of Haar measure. We thus assume for the rest of the proof that A is diagonal with nonincreasing entries aj and I All = n. If we write Xj for fjj, then we have TrAf = I: 1a j X j which we may also write as Sn. We will show that IE(e irSn ) - e- r; I converges to O. To do this, it is enough to demonstrate that for each real r there is a constant L 2;> 0 such that, for each E in (0,1), lim sup IE(e irsn - e-,~2)1 ::; LE.
(2.3)
n-+oo
This last assertion will hold if, given any subsequence nz of the positive integers, there is a further subsequence nzu such that IE(eirs"lu) - e-r:)1 is eventually less than or equal to LE. 2
Given E > 0 , choose a positive integer m 2;> }2 so that ~ ::; E2 for j > m. This is possible since by induction one can show that for all j and all n (recall that ai is non-increasing in i). Since aj ::; Vn, it is possible to choose nlu which satisfies
a; : ; y
~ ~ cy. as u ~ ~J v'vlu
Here 0 ::;
CYj ::;
00
for j = 1, ... , m.
(2.4)
1. We must consider E( eirS"lu ) but shall henceforth replace ne u .
7,2
by n to simplify notation. Now IE(e~rSn) - e- 2 sum of the following 3 terms:
1
is less than or equal to the
(2.5) and (2.7)
Brownian Motion and the Classical Groups
100
To bound (2.5) , first of all note that n
II
E( eirSn ) = E( eir 2:.7'=1 ajX j
eirajXj)
(2.8)
j=m+l n
=
II
E(e ir 2:.7'=l ajXj
(cos(rajX j ) + isin(rajXj )))
(2.9)
j=m+l n
=
II
E(e ir 2:.7'=l ajXj
cos(rajX j )).
(2.10)
j=m+l
To pass from (2.9) to (2.10), one should keep in mind the sign-symmetry of the X j . In addition, n
II
cos(rajXj ) - e- 2:..f=rn+1
T22
a;E(X]l1
(2.11)
j=m+1
j=m+l
le-2 L...j=rn+1 a r2 " , n
2 x2
j
2
n
L
~ r4
ajXf
j
_
r2 " , n e-2 L...j=rn+1 a2j E(X2) j I
(2.12)
n
L
+ r2 I
a;(X] - E(X])) I.
(2.13)
j=m+l
j=m+1
To see that (2.12) is bounded above by (2.13), first take notice that for complex numbers ZI,' . " Zn, WI,' . " Wn of modulus less than or equal to 1, we have n
n
n
j=1
j=1
j=1
This is easily proved by induction. Also, it is not hard to show that
for all real numbers t. Finally, one observes that le- a - e-bl < la non-negative a, b. In view of (2.8)-(2.10), (2.5) is equal to
J
I (e ir 2:.~1 ajX j
n
II
cos(rajX j )
j=m+1
-e- T22 2:..f==+1 a;E(X;) e ir 2:.7'=1 ajX j ) dPI
bl
for
Anthony D 'Aristotile, Charles M. Newman and Persi Diaconis
(as we saw earlier that E(XJ)
= r4 ~
L
Using (11)-(13), this is bounded by
n
+ r2
ajE(XJ)
j=m+l n
r4
1;).
2/ I 2:
n
L
=
2
n
j=m+l
L
a;X] - E(
j=m+l n
2:
+ r2 (Var(
ajE(Xj)
101
a;XJ) I dP
j=m+l
(2.14)
a;X]))!.
j=m+l
To obtain (2.14), which is our initial bound for (2.5), keep in mind that / IY - EYI dP
~ (/ IY -
EYI 2 dP)!
= (VarY)!
by Holder's inequality. We will return to (14) but first we claim that (2.7) converges to zero. Since
which converges to 1 -
L.";=1
0:;, our assertion is clear. It is also the case that
r2 1 2:n a2 2.6 ) converges to zero. To see this, first note that since e- Tn j==+l j is 2 a X ) bounded, it is enough to verify that E ( eir 2:= j=l j j converges to e- Tr2 2: j=l a j . (
111
But this immediately follows from the fact [13J that the entries of the block matrix [y'nrijhsi,jSm are in the limit independent, each with the standard normal distribution. From (2.14), and the previous paragraph, we have
~
n
r4
L
2
ajE(XJ)
+ r2 (Var(
j=m+l
n
L
a;X]))!
+ Bn
j=m+l
where Bn ---+ 0 as n ---+ 00. Since X/ has a beta distribution with parameters 1, 1 and thus E(Xj) = (n)(~+2) ~ and ~ L.:+l ~ 1, we have
n-
:2
a;
(2.15) Therefore
Brownian Motion and the Classical Groups
102
Furthermore, 11
11
= 2:: ajVar(X])
Var( 2:: a;X]) j=m+l
m+l 11
2:: a;a~ Cov(X], X~).
+
j,k=rn+l
#k
Now
=
11 2:: a·J4
11 4 4 1 - -n2 2:: aE(X.) J J
j=m+l
3
j=m+l
11
< - _ n2 ""' L-
a4 J
_
1 n2
11
4 L- aJ
_
""'
j=m+l
2
=
j=m+l
11
n2
aj
2:: j=m+l
(2.16)
:::; 2E2.
To obtain (2.16) we can appeal to (2.15). By expanding and taking expectations of both sides of 11
11
1 = (2::rL)(Lr~j)' j=l
j=l
it follows that Thus
Therefore, for j
i=
k,
Cov(X], X~) = 1
< - n(n - 1)
E(X]X~) - ~ n
1 n 2'
One then easily verifies that for n 2 2 2
2
Cov(Xj' X k )
:::;
2 3' n
Thus 11
L j,k=rn+l
j#k
a;a~ Cov(X], X~)
Anthony D'Aristotile, Charles M. Newman and Persi Diaconis
103
which converges to zero. We have
where En ---+ 0 as n ---+ 00. This yields (2.3) for some L depending on r, as desired, and we are done.
D
Our next result shows that the convergence in Theorem 2.2 is uniform in A. We only work with diagonal matrices A here but singular value decomposition says that this suffices. We then find it convenient to think of A as a point of a sphere of radius fo. Theorem 2.2. Let
r, Xj
=
rjj be as in Theorem 2.1 , and let An be the
surface of the sphere of radius fo in lRn. For v = (aI, ... , an) E An, write Sn(v) for 2.:.]=1 ajXj . Then Sn converges in distribution to N(O, 1) uniformly on An, z.e., as n ---+ 00, sup IP(Sn(v)::::; x) - (x) I ---+ O. xElft,vEAn
Proof. We first verify that the family F = {Sn(v): v E An, n = 1,2, ... } is tight. Corresponding to any sequence S of F, either there is a positive integer Y such that S is contained in the family {Sj (Vj): Vj E A j , 1 ::::; j ::::; Y} or S has a sub-sequence Snl (v n1 ) where nl ---+ 00. In the first case, S has a subsequence of the form Sk(Pku) where k is a fixed positive integer, 1 ::::; k ::::; Y, and Pku = (al u , ... , aku) E Ak for u = 1,2, .... Choose a sub-sequence Ul of the positive integers such that a ru1 ---+ br for 1 ::::; r ::::; k. Plainly Sk(PkuJ => Sk(W) where w = (h, ... , bk ). In the second case, the argument of Theorem 1 shows that SnJ vnJ => N(O, 1). Thus F is tight. It is easy to see that because of tightness, it suffices to show, as we now do, that for any interval [a, bJ S;;; lR lim n--=
sup
IP(Sn(v) ::::; x) - (x) I = O.
xE[a,b], vEAn
If false, there exists an EO > 0, a sub-sequence nz elements vn1 E Anl such that
---+ 00,
points x nl E [a, b], and
Now X n1 has a non-increasing or non-decreasing sub-sequence xn1u which converges to x E [a, bJ. We assume without loss of generality that xnzu is nondecreasing. We henceforth work with n1u but suppress the subsequence notation. Note that
Since Sn(v n ) => N(O,l), it is clear that P(x n < Sn(v n ) ::::; x) ---+ 0 and hence that P(Sn(v n ) ::::; xn) ---+ (x). Since (xn) ---+ (x), we obtain a contradiction which proves our claim. D
Brownian Motion and the Classical Groups
104
3
Orthogonal Matrices
We use the results of Section 2 to prove the main theorem of this section. This shows that if any growing selection of entries of a random orthogonal matrix are linked together in the classical way, a limiting standard Brownian motion results. To set up our notation, let r = (r ij )i,j=l be an n x n orthogonal matrix distributed by Haar measure. Choose a subset of size k n from among the entries of r. Suppose the entries are f31, f32, ... ,f3kn with f3j corresponding to e.g. lexicographic order ofr Ts : (r,s) < (x,y) ifr < x or ifr = x and s < y. To denote this ordering we write f31 r ll , f32 r 12 , ... , f3n+l r 21 , etc. ('oJ
('oJ
('oJ
Theorem 3.1. Let f31, f32, ... ,f3kn be entries of a Haar distributed random matrix in On, as above. Assume that k n / 00. If for £ in {I, ... ,kn } and t in
[0, 1],
then Xn ==> W, a standard Brownian motion, as n
-+ 00.
Proof. We first prove that the finite-dimensional distributions of Xn converge to the corresponding distributions of W. For a single time point t, we must prove that Xn(t) ==> N(O, t) = W t as n -+ 00. However, this is equivalent to
For each n, let A = (aij) i,j = 1 be the n x n real matrix defined as follows : if f3i
('oJ
r
ST,
for some i, 1 if f3[kntJ+l otherwise Note that
('oJ
r
:s: i :s: [knt] ST
IIAII = nand
which converges to N(O, 1) in distribution by Theorem 2.1.
However,
n - [kk':ln ::; k:t and so, by [5], it suffices to show that ji5,f3[k ntJ+l in probability, which folows from k n
-+ 00
° :s: -+
°
and the fact [13] that
We now consider two time points sand t with s < t. By the Cramer-Wold device [5], it is enough to show that
Anthony D'Aristotile, Charles M. Newman and Persi Diaconis for any (a, b) E
]R2.
105
However, this is equivalent to showing that
where This can be shown by choosing an appropriate sequence of matrices A, as follows, and again applying Theorem 2.1. First note that
[k n s]a 2 + ([knt] - [k n s])b2 ~ (kn s - 1) a2 + (( k n t - 1) - k n s) b2 = k n sa2 - a2 + k n tb 2 - b2 - k n sb 2 = k nC 2(s, t) - (a 2 + b2)
Also observe that
[k n s]a 2 + ([knt] - [k n s])b 2 :::; k n sa 2 + k ntb 2 - (kns - 1)b 2 = kn sa 2 + k ntb 2 - k n sb2 + b2 = k n C 2 (s,
t)
+ b2
Combining these facts, we have
k n C2(S, t)
< (n _ [kns]na 2 _ ([knt] - [kns])nb 2 ) -
k nC2(s, t)
k nC2(S, t)
n(a 2 + b2 ) :::; k n C2(s, t) With these preliminaries, we define the matrix A in two cases. If n ([k n t]-[k n s])nb k n C2(s,t)
2
~~C;J(s~:) -
> 0 let A = (a·~,].)T}. be defined as follows: z,]=l
-,
if f3i
rv
r vu,
for some i, 1 :::; i :::; [kns] if f3i
rv
i, [kns]
r vu,
for some
+ 1 :::; i :::;
if f3[k n t]+l otherwise
rv
r vu
[knt]
Brownian Motion and the Classical Groups
106 s]na 2 O n the 0 ther h an d ,1'f n - k[kn nC2(s,t) we define A = (ai,j)i,j=l by:
if
/3i
r-v
r vu,
for some i, 1 ::s; i ::s; [knsJ if
/3i
r-v
r vu,
for some i, [knsJ if /3[k n t] r-v otherwise
+ 1 ::s; i ::s;
r vu
Note that in either case IIAII = n and so Tr(Ar)) => N(O, 1) by Theorem 2.1. However, it is plain that C(~,t) S[kns] + C(~,t) (S[knt] - S[kns]) differs from
ji[:,
Tr(Ar) by a quantity in absolute value bounded by 1f[!]; where 1 is an entry of the random n x n random r. Thus, as before, what remains is to show that ~
ji[:, converges to zero in probability. Thus, given
P( I via2 + b2 C (s, t)
I V~ kn 1 <
) = P( Iyin I n1 <
E
E
>0
c (s, t) ~) nE
via2 + b2
which converges to 1 as n ---+ 00. A similar argument shows that the higher order finite dimensional distributions behave properly. We next show that Xn is tight. According to Theorem 15.6 of [5], it is enough to show that for sufficiently large n
for K independent of n, h, t, and t 2 . The left member of the above expression is
where [kntlJ < i, j ::s; [kntJ and [kntJ < k, I ::s; [knt2J. Put [knt]- [knhJ = ml and [knt2J - [kntJ = m2· The left member of (3.17) is bounded from above by 2
~2 (aE(ri1 r i2) + bE(rilr~2)) n
where a and b are both less than or equal to ml m2. Here we have used the fact that for distinct entries 6, /3, Q and (J of a random orthogonal matrix, E(6 2 /3Q) = 0 and E(6/3Q(J) is non-positive. The first assertion uses the fact that for any (nonrandom) diagonal sign matrix M, the random matrices r M and Mr are equidistributed with r; the second assertion uses that and also the fact that Ebll 112121/22) = - (n-l)~(n+2) [36]. However, both n 2E(q1 q2) and n 2 E(ri1 r~2) converge to 1 , and so for all n, both expectations are less than for some positive constant L. Combining all of this information, we
:2
([kntJ - 1)
Anthony D'Aristotile, Charles M. Newman and Persi Diaconis
107
have that
If k~ -s; t2 - t l
,
then
ml::,m2
-s;
2(t2 -
td while if k~ > t2 - t l , then either D
Xn(t) - Xn(h) or X n (t2) - Xn(t) is zero. Thus our claim is established.
4
Unitary Matrices
We first thought that obtaining unitary analogues of Theorems 1, 2, 3 would be straightforward but then encountered difficulties in translating to the complex case because of the lack of a singular value decomposition. This led us to carefully redo the preliminaries. Our main results are Theorems 5 and 6 below. For the proof of Theorem 6, it will be necessary to first establish Theorem 4, which is the analogue of Theorem 15.1 of [5]. To this end, let nk, V and £ denote respectively the Borel sets of IRk, D and D x D, where D is the Skorokhod space of right-continuous real-valued functions on [0,1] with left limits. For t l ," ., tk in [0,1], define by 7rtl, .. ,tk (x) = (x(td,"" X(tk)) for xED. Following Billingsley [5], sets of the form 7r~~... ,tk (H) where H E nk are subsets of D and called finite-dimensional sets. If To is a subset of [0,1], let :Fro be the collection of sets 7r~~... ,tk (H) where k 2:: 1, ti E To, and H E nk. Then :Fro is an algebra of sets, i.e., :FTo is closed under finite unions and finite intersections and the empty set 0 E :FTo ' See Royden [35] for more details. Obviously, :F[0,1] is the class of finite-dimensional sets. Billingsley has shown (Theorem 14.5 of [5]) that if To contains 1 and is dense in [0,1], then :FTo generates V. Extending these ideas, for Sl,' .. , Sk and tl,' .. , tz in [0,1]' define
by sending (x, y) to (x(st},· .. , X(Sk); y(td,' .. , y(tl))' Subsets of D x D of the form where HEn k, KEn I are called finite-dimensional sets (of D x D). If To and T1 are subsets of [0,1], let :FTo,Tl be the class of sets
where Si E To, tj E T 1, k 2:: I, l 2:: I, H E n k, and K E nl. One can easily verify that :FTo,Tl is a semi-algebra of sets, i.e., the intersection of any two members of FTo,Tl is again in :FTo,TJ and the complement of any set in :FTo,Tl is a finite disjoint union of elements of :FTo,TJ . If we let A be all finite disjoint unions
Brownian Motion and the Classical Groups
108
of members of FTo,T1 , then A is an algebra of sets in D x D (any semialgebra generates an algebra in this way [35]). Suppose To and Tl are both dense subsets of [0,1] and that 1 E To n T 1 . Let L be the a-algebra of subsets of D x D generated by FTo,Tl' Sets of the form
n
where H is in k and 81,' . " 8k E To are in FTo,T1 and may be identified with FTo ' Since FTo generates V, it is clear that G x DEL for all open sets G of D. Similarly D x L E L for all L open in D and so L contains all sets G x L where G, L are open in D. It is now plain that £ <::: L. On the other hand, Billingsley has shown that
is a measurable mapping. In a completely analogous way, it can be shown that
is also measurable (here n k x nl is the a-algebra of subsets of]Rk x]Rl generated by "measurable rectangles" of the form H x K where H E n k , K E nl). This a-algebra is precisely the a-algebra of Borel sets of]Rk x ]Rl (see [4]). It follows that the finite-dimensional subsets of D x D lie in £ by definition of measurable mapping. Thus L <::: £ and so we have L = £. Suppose P and Q are two probability measures on (D x D, £) which agree on FTo,T1 • Then they clearly agree on the a-algebra A generated by FTo,T1 • Since A generates £, it follows that P = Q on £ by Theorem 3. 2 of [4]. In the language of Billingsley [5], for To,T1 dense in [0,1] with 1 E TonTI, FTo,T1 is a "determining class." If P is a probability measure on (D, V), let Tp be the set of all points t E [0,1] such that 1ft is continuous except on a subset of D which has P':'measure O. Billingsley [5] has shown that Tp contains 0 and 1 and its complement in [0,1] is at most countable. Now let P be a probability measure on (D x D,£) with marginals Rl and R 2 . If 81,' .. , 8k E TRI and t 1 ,' . " tl E T R2 , then 7rs1 ,"',Sk is continuous except on a subset A of D of R1-measure zero. Similarly 1ftl, ... ,tl is continuous except on a subset B of D of R 2 -measure zero. Now (A x D)U(D x B) has P-measure 0 and off this set 7rs1 , ... ,Sk;tl, ... ,tl is continuous. We will need the following: Theorem 4.1. Let P n , n = 1,2, "', and P be probability measure8 on (DxD, E). Suppose Rl and R2 are the marginal probability measures of P. If {Pn } is tight and if Pn1f~: ... ,Sk;tl, ... ,tl =} P7r~: ... ,Sk;tl, ... ,tl holds whenever all the 8i are in TRl and all the tj are in T R2 , then Pn =} P. Proof. Since {Pn } is tight, each subsequence {Pn '} contains a further subsequence {Pn "} converging weakly to some limit Q. By Theorem 2 of [5], it suffices to show that each such Q is equal to P. Suppose Q1 and Q2 are the marginals of Q. If 81,' . " 8k all lie in TRI n TQ! and t 1 , •. " tz all lie in TR2 n TQ2 , then
Anthony D'Aristotile, Charles M. Newman and Persi Diaconis
109
by hypothesis. Also 1[' s l, .. ,Sk;h, .. ,tl is continuous except on a subset of D x D of Q-measure zero by comments preceding the statement of the theorem. Since Pn" =? Q, it follows by Theorem 5.1 of [5] that
p n 111['-1 Sl,···,Sk;t1, .. ,t1
=?
Q1['-l
Sl,···,Sk;t1,··,tI·
Thus whenever each Si E TR1 n TQ1 and each tj E TR2 n TQ2' Let T1 = TR1 n TQ1 and T2 = TR2 nTQ2 . Each ofT1 and T2 is dense in [0,1] and 1 E Tl nT2 and so as we have seen above, FT1 ,T2 is a determining class. The above equality says that P and Q agree on F T1 ,T2 and we are done. D We are now in a position to establish the complex analogue of Theorem 2.1 (for diagonal A). Theorem 4.2. Let A = Diag(al, ... ,an ) and B = Diag(b1, ... ,bn ) where al 2: a2 2: ... 2: an and b1 2: b2 2: ... 2: bn and IIAII = IIBII = n, and let ~ = f + iA be an n x n unitary matrix distributed by Haar measure. Then (TrAf, TrBA) =? ~(Zl' Z2) as n --+ 00, where Zl and Z2 are i.i.d. standard normal (i.e., Tr Af + iTr Bf converges in distribution to a complex standard normal distribution).
Proof. By the Cramer-Wold device [5], it suffices to prove that xTrAf
1
1
+ yTrBA =? x J2Z1 + y J2Z2
for arbitrary (x, y) E ~2. Write of f and A. We will show that
Xj
for
Ijj
and Yj for
Ajj
with lij,
Aij
the entries (4.18)
converges to zero. We follow the proof of Theorem 2.1 and show that there is a constant L > such that, for each E > 0, the lim sup of (4.18) is less or equal to LE. Given E > 0, choose a positive integer m 2: so that ~ ::; E2 and ~ ::; E2 for j > m and all n. Given any subsequence nz of the positive integers, choose a subsequence nzs which satisfies
°
;2
aj
- - --+ a'l,
yInl;
bj
- - --+
yInl;
(3
as /1
j,
--+ 00
for j = 1,2, ... m.
(4.19)
As before, we will suppress the subsequence notation. The quantity (4.18) is less than or equal to the sum of the following three terms
IE( eir(x 2::=j=l ajXj+y 2::=j=l bj Y j )
_
e-
2~ (x 2 2::=j=Tn+1 a;+y2 2::=j==+l b;)) (4.20)
2 7"2
.E(eir(x2::=';=l a j X j +y2::=';=l bjYj))I,
le- r; 2~ (x 2 2::=j==+l a;+y2 2::=;'==+1 b;) E( eir(x 2::=';=1 ajXj+y 2::=';=1 bj Y j )) r2 1 2 "n 2 ,..2 2 "Tn _e-22nx L..,j==+l a j e-Tx L..,j=l
Oi
2
,..2
1
2 "n
j e-22nY L..,j==+l
b2 j
,.2
(4.21) 2
,,= jl,
e-TY L..,j=l
{32
(4.22)
Brownian Motion and the Classical Groups
110
Since n
1 n
~n
~ ~
m
a2 -----t J
1-
j=m+1 n
L
~ 0: 2 ~ J
and
j=1
m
bJ -----t 1 -
j=m+1
L (3], j=l
the term (4.22) converges to zero. By a known result (see, e.g., Lemma 5.3 of [33]), 1
(foX I , ... , fox m , foYI , ... , foYm ) =* y'2(ZI' Z2, ... , Z2m) where the Zi are i.i.d. N(O,l). Thus
and so
and hence (4.21) converges to zero. To bound (4.20), we first claim that n
n
j=m+1
j=m+l
n
n
To see this, let and note that eir(xL'j=lajXj+YL'j=lbjYj)
= G(
II j=m+1
cos(rxajXj )
II
cos(rybjYy))
j=m+l
plus a sum of products of the form GJ where J is a product of sines and cosines involving at least one sine term. To establish our claim, it is enough to verify that the expectation of any such GJ is zero. First suppose J contains the factor sin(rxajXj ) but not the factor sin(rybjYy). Then E(GJ) = 0 by the sign-symmetry of the diagonal elements of~. Next consider a product GJ containing a factor sin(rxajX j ) sin(rybjYy). The diagonal elements of ~ are also exchangeable, and so we can assume j = m + 1. Write
Anthony D 'Aristotile, Charles M. Newman and Persi Diaconis
111
where Un is the unitary group and P,n is Haar measure. For 0 E [0,27fJ, let D(O) be the n x n diagonal matrix Diag(l, 1, ... , 1, e iO , 1, ... , 1) where eiO is in position m + 1. By the invariance of Haar measure, D(O)Ll has the same distribution as Ll, and so
1
Hsin(rxam+l(SCOS(r+O))) sin(rybm +1 (ssin(r+O))) dP,n =f.
Un
Thus
f27r 10
1 Un
H sin(rxa m+l (scos(r + 0))) sin(rybm +1(ssin(r + 0))) d/1n dO = 27ff.
By Fubini's Theorem [35], we have
1 Un
H
f27r sin(rxam+l(SCOS(r + 0))) 10
sin(rybm + 1 (ssin(r + 0))) dO dP,n
= 27ff.
Next let l(O) = sin(rxam+1(scosO)) sin(rybm +1(ssinO)). Now, l is periodic with period 27f and shifting l by '"Y units yields a functions whose integral over [0,27f] coincides with the integral of lover that same interval. Thus
1
H
Un
f27r l( 0) 10
dO dP,n = 27f f.
However, l is an odd function and so
10f27r l(O) dO = J7r -7r l(O)
dO
=
o.
It follows that f = 0 and our claim is established. Using this fact and arguing as we did in the proof of Theorem 2.1 , we have that the expression in (4.20) does not exceed the value
1 I IT Un
-e
IT
cos( rxaj X j )
j=m+l
"n
_ r2x2 2 E(X2) 2 L..j=m+l a j j
n
e
_ r2y2
L:nj=m+l b2j E(y2) j I dP,n n
L
a;E(Xf) + r ; (Var(
j=m+l n
+r4y4
2
2 2
L
:::; r 4 x 4
cos( rybj lj)
j=m+l
L j=m+l
a;XJ))~
j=m+l 2 2
b;E(Y/) + r ~ (Var(
n
L
b;Yl))~.
j=m+l
We can bound this last expression as in the proof of Theorem 1, which leads us to a proper choice of L and completes the proof of Theorem 4.2. 0 It is natural to ask if Theorem 3.1 has complex and symplectic analogues. We believe this is the case but thus far, like in the case of Theorem 2.1, we are able to prove a result of this type only for elements of the diagonals of these classes of matrices. In doing so, we obviously lean heavily on the preceding theorem.
Brownian Motion and the Classical Groups
112
Theorem 4.3. Let On = Un be the unitary group of n x n complex matrices, and let ~ = r + iA be an element of On distributed according to Haar measure. Let dj = "ijj + iAjj and let SJ:: = L~=1 dj . If Zn(t,w)
= S[ntJ(w), t
E
[0,1]'
then Zn =} TV converges to TV where TV is standard complex-valued Brownian motion (TV = WP) + iWP) where W(I) and W(2) are independent onedimensional Brownian motions with drift 0 and diffusion coefficient ~). Proof. We appeal to Theorem 5. One can easily adapt the argument for tightness given in Theorem 3.1 to show that ReZn is tight. Here Ebfl) = 2~ and Ebrr"issAuuAvv) = 0 for distinct r, s, u, and v. Similarly, ImZn is tight and hence Pn is tight where Pn is the law of (ReZn , ImZn ). By Theorem 4.1, it remains to show that (4.23) where P is the law of (W(I), W(2)). We consider time points 81, 82, it, and t2 where 81 < 82 and tl < t2, and one may easily verify that the general case can be handled analogously. Letting Xn = ReZn and Yn = ImZn, we wish to prove that
However, this statement would follow if
converges in distribution to
Appealing as before to the Cramer-Wold device [5], it suffices to show that
converges in distribution to
for any (a, b, c, d) E ]R4. The remainder of the proof follows by applying Theorem 4.2 in essentially the same way as Theorem 2.1 is applied in the proof of Theorem 3.1. D
5
Symplectic matrices
Recall (see [8] ) that the group of symplectic matrices Sp(n) may be identified with the subgroup of U(2n) of the form
[~
-1]
E
U (2n),
(5.24)
Anthony D'Aristotile, Charles M. Newman and Persi Diaconis
113
where A, B are complex n x n matrices. The trace of random matrices from this group is studied in [14, 16]. As shown there, if 8 is chosen according to Haar measure in Sp(n), then Tr(8) , Tr(8 2 ) , ... , Tr(8 k ) are asymptotically independent normal random variables. We now study the extent to which the diagonal entries of a random symplectic matrix generate Brownian motion. Random matrices in Sp( n) can be generated in the following way. Fill the real and imaginary entries of A and B in with real, standard normal i.i.d. random variables. Apply the Gram-Schmidt process to the n complex column vectors of dimension 2n which result. We now have a new A and B and we complete the right half of our matrix by following the pattern of (5.24). The matrix obtained in this way is distributed according to Haar measure in Sp(n). To see this, one can adapt the argument given for the construction of a random orthogonal matrix. See for example Proposition 7.2 of [17]. We now have
Theorem 5.1. Let Sp(n) be the symplectic group of 2n x 2n complex matrices of the form (5.24) , and let 8 be an element of Sp(n) chosen according to Haar measure /-Ln. Let A = (aij)r,j=l be the upper left n x n block of 8, and let d i = aii, 1 :::; i :::; n , and let SI: =
then Zn ::::} ~ TV where
TV
2:7=1 di ·
If
is standard complex-valued Brownian motion.
Proof. We are working with complex matrices and so we can follow the arguments of Theorems 4.2 and 4.3. We first need the symplectic analogue of Theorem 4.2. To accomplish this, only one change in the proof of Theorem 4.2 is required. In place of the diagonal matrix D(O), we use instead the 2n x 2n diagonal matrix D1 (0) = Diag(l, ... , 1, ei(J, 1, ... , 1, e-i(J, 1, ... , 1) where ei(J and e-i(J occur in positions number m + 1 and n + m + 1 respectively. The rest of the arguments for the analogues of Theorems 4.2 and 4.3 are clear. D
It should be noted that we cannot link all 2n diagonal entries to obtain Brownian motion. If we were to try, note that Zn(~) and Zn(l) - Zn(~) would tend to limits which are complex conjugates of one another and hence dependent.
Acknowledgement. The authors thank Harry Kesten for explaining how sign-symmetry could be used to show that the trace of a random orthogonal matrix converges to a standard normal distribution at the Bowdoin Conference on random matrices in 1985. They also thank Francis Comets for comments on earlier drafts of this paper. The first author thanks the Department of Statistics of Stanford University for warm hospitality extended to him during the summers of 1994-1997. He also thanks Jeff Rosenthal and Patrick Billingsley for some useful conversations. In addition, he acknowledges support from the Research Foundation of the State University of New York in the form of a PDQ Fellowship. The second and third authors acknowledge research support from the Division of Mathematical Sciences of the National Science Foundation.
Brownian Motion and the Classical Groups
114 Anthony D' Aristotile Dept. of Mathematics SUNY at Plattsburgh Plattsburgh, NY 12901
Persi Diaconis Depts. of Mathematics and Statistics Stanford University Stanford, CA 94305
Charles M. Newman Courant Inst. of Math. Sciences New York University 251 Mercer Street New York, NY 10012
Bibliography [1] Arratia, R., Goldstein, L., and Gordon, L., Poisson Approximation and the Chen-Stein Method, Stat. Science 5, 403-434, 1990. [2] Bai, Z.D., Methodologies in Special Analysis of Large Dimensional Random Matrices. A review, Statist. Sinica 9, 611-677, 1994. [3] Bhattacharya, R. and Waymire, E., Stochastic Processes with Applications, John Wiley and Sons, 1990. [4] Billingsley, P. J., Probability and Measure, Second Edition, John Wiley and Sons, 1986. [5] Billingsley, P. J., Convergence of Probability Measures, John Wiley and Sons, 1968. [6] Borel, E., Sur les principes de la theorie cinetique des gaz, Annales de l'ecole normale sup. 23, 9-32, 1906. [7] Bump, D. and Diaconis, P., Toeplitz minors, Jour. Combin. Th. A. 97, 252-271, 2001. [8] Brackner, T. and tom Dieck, J., Representation of Compact Lie Groups, Springer Verlag, 1985. [9] Daffer, P., Patterson, R., and Taylor, R., Limit Theorems for Sums of Exchangeable Random Variables, Rowman and Allanhold, 1985. [10] D'Aristotile, A., An Invariance Principle for Triangular Arrays, Jour. Theoret. Probab. 13, 327-342, 2000. [11] Diaconis, P., Application of the method of moments in probability and statistics. In H.J. Landau, ed., Moments in Mathematics, 125-142, Amer. Math. Soc., Providence, 1987. [12] Diaconis, P., Patterns in eigenvalues, To appear Bull. Amer. Math. Soc., 2002. [13] Diaconis, P., Eaton, M., and Lauritzen, 1., Finite de Finetti theorems in linear models and multivariate analysis, Scand. J. Stat. 19, 289-315, 1992.
Anthony D'Aristotile, Charles M. Newman and Persi Diaconis
115
[14] Diaconis, P. and Evans, S., Linear functions of eigenvalues of random matrices, Trans. Amer. Math. Soc. 353, 2615-2633, 2001. [15] Diaconis, P. and Freedman, D., A dozen de Finetti-style results in search of a theory, Ann. Inst. Henri Poincare Sup au n. 2 23, 397-423, 1987. [16] Diaconis, P. and Shahshahani, M., On the eigenvalues of random matrices, J. Appl. Prob. 31A, 49-62, 1994. [17] Eaton, M., Multivariate Statistics, John Wiley and Sons, 1983. [18] Edelman, A., Kostlan, E., and Shub, M., How many eigenvalues of a random matrix are real?, Jour. Amer. Math. Soc. 7, 247-267, 1999. [19] Feller, W., An Introduction to Probability Theory and Its Applications, Vol. II, John Wiley and Sons, 1971. [20] Golub, R. and Van Loan, C., Matrix Computations, 2nd Ed., Johns Hopkins Press, 1993. [21] Hida, T., A role of Fourier transform in the theory of infinite dimensional unitary group, J. Math. Kyoto Univ. 13, 203-212, 1973. [22] Jiang, T.F., Maxima of entries of Haar distributed matrices, Technical Report, Dept. of Statistics, Univ. of Minnesota., 2002. [23] Kuo, H.H., White Noise Distribution Theory, CRC Press, Boca Raton, 1996. [24] Levy, P., Le£;ons d'Analyse Fonctionnelle, Gauthiers-Villars, Paris, 1922. [25] Levy, P., Analyse Fonctionnelle, Memorial des Sciences Mathematiques, Vol. 5, Gauthier-Villars, Paris, 1925. [26] Levy, P., Problemes Concrets d'Analyse Fonctionnelle, Gauthier-Villars, Paris, 1931. [27] Mallows, C., A Note on asymptotic joint normality, Ann. Math. Statist. 43, 508-515, 1972. [28] Maxwell, J.C., Theory of Heat, 4th ed., Longmans, London, 1875. [29] Maxwell, J.C., On Boltzmann's theorem on the average distribution of energy in a system of material points, Cambridge. Phil. Soc. Trans. 12, 547-575, 1878. [30] McKean, H.P., Geometry of Differential Space, Ann. Prob. 1, 197-206, 1973. [31] Mehler, F.G., Ueber die Entwicklung einer Function von beliebig vielen. Variablen nach Laplaschen Functionen hoherer Ordnung, Grelle's Journal 66, 161-176, 1866. [32] Mehta, M., Random Matrices, Academic Press, 1991.
116
Brownian Motion and the Classical Groups
[33] Olshanski, G., Unitary representations of infinite-dimensional pairs (G, K) and the formalism of R. Howe. In A.M. Vershik and D.P. Zhelobenko, eds. Representation of Lie Grops and Related Topics, Adv. Studies in Contemp. Math. 7, 269-463, Gordon and Breach, New York, 1990. [34] Rains, E., Normal limit theorems for asymmetric random matrices, Probab. Th. Related Fields 112, 411-423, 1998. [35] Royden, H. L., Real Analysis, 2nd Edition, The Macmillan Company, 1968. [36] Stein, C., The accuracy of the normal approximation to the distribution of the traces of powers of random orthogonal matrices, Technical Report No. 470, Stanford University, 1995.
Transition Density of a Reflected Symmetric Stable Levy Process in an Orthant Amites Dasgupta and S. Ramasubramanian Indian Statistical Institute Abstract Let {Z(s,x)(t) : t 2': s} denote the reflected symmetric a-stable Levy process in an orthant D (with nonconstant reflection field), starting at (s, x). For 1 < a < 2,0 :::; s < t, x E D it is shown that Z(s,x) (t) has a probability density function which is continuous away from the boundary, and a representation given.
1
Introduction
Due to their applications in diverse fields, symmetric stable Levy processes have been studied recently by several authors; see [4], [5] and the references therein. In the meantime reflected Levy processes have been advocated as heavy traffic models for certain queueing/stochastic networks; see [14]. The natural way of defining a reflected/regulated Levy process is via the Skorokhod problem as in [9], [3], [11], [1]. In this article we consider reflected/regulated symmetric a-stable Levy process in an orthant, show that transition probability density function exists when 1 < a < 2 and is continuous away from the boundary; the reflection field can have fairly general time-space dependencies as in [11]. It may be emphasized that unlike the case of reflected diffusions (see [10]) powerful tools/methods of PDE theory are not available to us. To achieve our purpose we use an analogue of a representation for transition density (of a reflected diffusion) given in [2]. Section 2 concerns preliminary results on symmetric a-stable Levy process in JRd, its transition probability density function and the potential operator. In Section 3, corresponding reflected process with time-space dependent reflection field at the boundary is studied. A major effort goes into proving that the distribution of the reflected process at any given time t > 0 gives zero probability to the boundary.
2
Symmetric stable Levy process
Let (O,F,{Ft},P) be a filtered probability space, d 2 2,0 < a < 2. Let {B(t) : t 2 O} be an Fradapted d-dimensional symmetric a-stable Levy process. That is, {B(t)} is an JRd-valued homogeneous Levy process (with independent increments) with r.c.I.I. sample paths; it is roation invariant and
E[exp{ i(u, B(t) - x) }IB(O) = x] = exp{ -tlul a } 117
(2.1 )
Reflected Levy Process
118
for t 2:: 0, U E IR d, x E IRd. It is a pure jump strong Markov process. Using LevyIto theorem and Ito's formula, it can be shown that the (weak) infinitesimal generator of B(·) is given by the fractional Laplacian
J
~a/2 f(x) = ~ft}C(d, ex)
f(x
~~fl+~ f(x) d~
(2.2)
1~I>r
whenever the right side makes sense, where C(d,ex) = r(dta)/[2-a7fd/2Ir(~)I]; the measure v(d~) = C(d, ex) I~I}+Q d~ is called the Levy measure of B(·). Also, for any t > 0,
P(B(t)
i= B(t-)) = o.
(2.3)
See [4], [5], [7], [8] for more information.
= 8g(X)/8xi,gij(X) = 82g(X)/8xi8xj, 1::;
For afunctiong onIRd,gi(X)
i,j::; d.
Lemma 2.1. If f E C~(IRd) then ~a/2 f E Cb(IR d). Proof: For 0
< r < s, ~~,~2 is defined by (2.4) r
Let f E C~(IRd). For any x E IRd observe that
If(x
+~)
- f(x)1
1~Id+a
1
l(1,CXl)(I~1)
::; 21IfIICXlI~Id+a l(1,CXl)(I~I)
(2.5)
and that as ex > 0
J
CXl
1 de - C 1~Id+a <" -
J
r
-(a+l)d
r<
(2.6)
00.
1
1~1>1
So continuity of f and dominated convergence theorem imply that ~~~ ' well defined, bounded and continuous. Next, Taylor expansion gives
f(x +~) - f(x)
dId
=
L Ii(x)~i + "2 L i=l
fij(Y)~i~j
f
is
(2.7)
i,j=l
where Y is point on the line segment joining x and x + ~. Since function for each i
J ~i 1~1;+a d~ = o.
~
1-+
~i
is an odd
(2.8)
r
d
Note that
L
i,j=l
lij (Y)~i~j
= O(I~12)
and 1
J 1~121~1;+a d~ J =
O
r-a+1dr <
C
0
00
(2.9)
Amites Dasgupta and S. Ramasubramanian
119
as a > 2. Since fij (-) E Cb(IR d) it is now easily seen that lim .6.~{2 f is well rIO
'
defined, bounded and continuous. Since
.6.0./2 f(x) = .6.~~f(x) ,
+ lim.6.~{2 f(x) rIO'
(2.10)
the lemma now follows.
D
It is known that the process B(·) has a transition density function; we now give a representation for it.
Theorem 2.2. The transition probability density function of B(·) is given by
p(s, x; t, z)
J
00
=(47f)-d/2(t - s)-d/a
o
g(r) exp {1 ~Iz rd 4(t - s)2/a r2
-
x12} dr (2.11)
°
for :s: s < t < 00, x, z E IR d, where g(.) is the density function of the square root of an ~ -stable positive random variable. Proof: By homogeneity enough to consider s = 0, x = 0. Let t > 0. By (2.1) and Proposition 2.5.5 (on pp. 79-80) of [13] it follows that B(t) = (BI (t), ... ,Bd(t)) is sub-gaussian and that there exist independent one-dimensional random variables 8, U1 , ... ,Ud such that Ui rv N(O, 2t 2 / a ), 1 :s: i :s: d,8 is ~stable positive random variable and (Bl (t), ... ,Bd(t)) rv (8~ U I , 8~ U2, ... ,8~ Ud). Denoting by g(-) the density of 8 1 / 2 , the joint density of (UI , ... , Ud, 8 1 / 2 ) is given by
h(6, ... ,~d,r)=
( 1) (1)t 47f
d/2
d/ a
g(r)exp
{1 -4t2/a8~; d
}
.
Using the invertible transformation (6, ... ,~d,r) f---t (r6, ... ,r~d,r) on IR d x (0,00) the joint density of (Bl (t), ... ,Bd(t), 8 1 / 2 ) is given by 1d h r
(~Yb ... , ~Yd' r) r r
(4~) d/2
m
d/a :d9(r) exp {
~ 41;/" :2
t
yf } .
Now integrating w.r.t. r we get (2.11).
D 00
J
rlkg(r)dr < 00 o for k = 2, 3, ... Indeed note that g(.) depends only on a; so if we consider kdimensional symmetric a-stable Levy process then the transition density will be given by (2.11) with d replaced by k; and as the density is well defined at x = z the claim follows. Remark 2.3. From the preceding theorem it follows that
Proposition 2.4. Denote Po(s, x; t, z) = f)p(s, x; t, z)/f)s, Pi(S, x; t, z) = f)p(s, x; t, Z)/f)xi, Pij(S, x; t, z) = f)2p(s, x; t, Z)/f)xif)Xj, 1 :s: i,j :s: d.
Reflected Levy Process
120
(i) Fix t > O,Z E JRd. Let to < t; then P,Po,Pi,pij,l ::; i,j ::; d are bounded contin uous functions of (s, x) on [0, to] X JRd. (ii) For any t > 0,8 > 0 sup{I'V xp(s, x; t, z)1 : 0::; s < t, Iz -
xl
~
8} ::; K(d, 8)
(2.12)
where K (d, 8) is a constant depending only on d, 8 and 'V x denotes gradient w. r. t. x-variables. 2
2
2
Proof: (i) Since ye- Y ,y e- Y are bounded, using Remark 2.3 and dominated convergence theorem, the assertion can be proved by differentiating w.r.t. s, x under the integral in (2.11). (ii) Since yd+2 e- y2 is bounded, differentiating under the integral in (2.11) we get for all 0 ::; s < t, Iz - xl ~ 8
l'Vxp(s,x;t,z)1
< K(d)
J (2) oo
g(r)
Iz _ xl
I ) d+2
2r(: ~ s~l/a
d+l ( I
exp
{
4r2~t -=- :)2/a I
-
12}
dr
o
< K(d)
(~)
d+l
00
j g(r)dr
=
K(d, 8).
o
o The following result indicates a connection between the transition density and the generator; though it is not unexpected, a proof is given for the sake of completeness.
Theorem 2.5. For fixed t > 0, z E JRd the function (s, x) the Kolmogorov backward equation
Po(s, x; t, z)
+ 11~/2p(s, x; t, z) = 0, s < t, x
E
1-+
p( s, x; t, z) satisfies
JRd
(2.13)
where Po is as in the preceding proposition and x in 11~/2 signifies that 11 a/2 is applied to p as a function of x. Proof: By the preceding proposition and Lemma 2.1 11~/2p(s, x; t, z) is a bounded continuous function. Put u(s, x) = p(s, x; t, z), s < t, x E JRd. Using Ito's formula (see [7]) for 0 ::; s < c < t, x E JRd c
E{u(c, B(c)) - u(s, B(s)) - j[uo(r, B(r))
+ 11a/2u(r, B(r))]drIB(s)
s
That is
j p(c, y; t, z)p(s, x; c, y)dy - p(s, x; t, z) IRd c
j j [po(r,y;t,z) S
IRd
+ 11~/2p(r,y;t,z)]p(s,x;r,y)dy
dr.
=
x} = O.
Amites Dasgupta and S. Ramasubramanian
121
By Chapman-Kolmogorov equation, l.h.s. of the above is zero. As the above holds for all c > s and the quantity within double brackets is bounded continuous in (r, y), by Feller continuity one can obtain (2.13) from the above letting c 1 s. D
We next look at the O-resolvent (or potential operator) associated with the process B (.). For a measurable function rp on JRd, x E JRd define
J J 00
Grp(x) =
p(O,x;t,z)dt dz =
rp(z)
IRd
JJ 00
0
rp(z)p(O,x;t,z)dz dt
0
whenever the r.h.s. makes sense. Since difficult to see that
(2.14)
IRd
°< a < 2 ::::;; d, using (2.11) it is not
00
Jp(O,x;t,z)=Clz_~ld_<x,zi-x
(2.15)
o
which is the so called Riesz kernel.
Theorem 2.6. Let rp E C;(JRd) and rp,r.pi,r.pij,l::::;; i,j::::;; d be integrable w.r.t.
the d-dimensional Lebesgue measure. Then (aj Gr.p E C;(JRd), (bj (Gr.p)i(X) = Gr.pi(X), (Gr.p)ij(X) Gr.pij(X), x E JR d,l < Z,] < d (c) f:..<X/2Gr.p(x) = -rp(x),x E JRd. D We need a lemma
Lemma 2.7. If f E L 1 (JRd) n LaO (JRd) then Gf is well defined, bounded and
continuous. Proof: Let {Tt} be the contraction semi group associated with B(·). Observe that
J
JJ
o
1
1
Gf(x) =
Td(x)dt+
00
f(z)p(O,x;t,z)dz dt.
(2.16)
IRd
°
Since Td is continuous for each t > and ITdUI ::::;; Ilflloo it is clear that the first term on r.h.s. is bounded and continuous. By (2.11)
r
If(z)p(O,x;t,z)l(1,oo)(t)1 ::::;; K
°
d/ a lf(z)11(1,00)(t)
which is integrable as < a < 2 ::::;; d. So continuity of p in x now implies that the second term on r.h.s. of (2.16) is bounded and continuous. D
Proof of Theorem 2.6: By Lemma 2.6 we get Gr.p, Gr.pi, Gr.pij are bounded continuous. A simple change of variables yields
JJ 00
r.p(z + he~) - r.p(z) p(O, x; t, z)dz dt
o IRd
JJ 00
rpi(Z)P(O, x; t, z)dz dt
o
IRd
Reflected Levy Process
122
by dominated convergence theroem; thus (G
A
u
cx/2G
lfft?
1·1m TtG
tlO
t [J= J "'( o
lfft?
=
z )p(O, X; t
+ s, z )dz ds -
IR d
z
0
t [-- / J
",(z )p(O, x; s, z )dz
o
J= J "'( )p(0;
dS]
IR d
dS] = -
IR d
for each x E IR d , completing the proof.
3
X; s, z )dz
D
Reflected process
Let D = {x E IR d : Xi > 0, 1 ::::: i ::::: d} be the d-dimensional positive orthant. The reflection field is a function R : [0,00) X IR d X IR d ~ M d(IR) where M d(IR) is the space of (d x d) matrices with real entries. We write R(t,y,z) = (rij(t,y,z)). We assume the following Assumptions (AI) The function (y, z) uniformly in t, for 1 ::::: i, j ::::: d.
I---t
rij(t, y, z) is Lipschitz continuous,
(A2) For i =I- j, there exist Vij such that Irij(t,y,z)1 ::::: Vij for all t,y,z. Set V = (( Vij)) with Vii = 0. We assume spectral radius of V = o-(V) < 1. (A3) Take rii(·,·,·) == 1,1 ::::: i ::::: d. (A2) is a uniform Harrison-Reiman condition that has proved useful in queueing networks; (A3) is just a suitable normalization. Let s;::: O,x E D. The Skorokhod problem in D corresponding to {B(t) : t 2:: s} and R consists in finding Ft-adapted r.e.l.l. processes y(s,x)(t), Z(s,x)(t), t 2:: s such that (i) Z(s,x) (t) E D for all t ;::: s; (ii) ~(s,x)(s) = 0, ~(s,x)(.) is nondeereasing, 1 ::::: i ::::: d; (iii) ~(s,x) (-) can increase only when Z;S,x) (.)
J
= 0;
that is, for 1 ::::: i ::::: d, t 2:: s,
t
~(s,x) (t)
=
l{o} (Zi(s,x)
s
(r))d~(s,x) (r), a.s.
(3.1)
Amites Dasgupta and S. Ramasubramanian
123
(iv) Skorokhod equation holds, viz. for 1 ~ i ~ d, t :2: s
Xi
+ Bi(t) -
+L j#i
Bi(S)
+ ~(s,x)(t)
J t
rij(u, y(s,x)(u_), Z(s,x)(u_ ))dlj(s,x)(u)
(3.2)
s
or in vector notation
J t
Z(s,x)(t) = X + B(t) - B(s)
+
R(u, y(s,x)(u_), Z(s,x)(u_ ))dY(s,x)(u). (3.3)
s
Solving the deterministic Skorokhod problem path by path one can solve the above stochastic problem. Indeed the following result is given in [11]. Proposition 3.1. Assume (Ai) - (A3). For each s :2: 0, xED there is a unique
pair Z(s,x)(.), y(s,x)(.) solving the above problem; also ~(s,x)(t) ~ ((1 - V)-l L(s,x»)i(t), a.s.
(3.4)
for t ~ s where L(s,x) (-) is given by Lis,x)(t)
=
sup max{O, -[Xi
+ Bi(t) -
Bi(S)]}.
s~u~t
Moreover {(Z(s,x)(t), y(s,x)(t)) : t :2: s} is an Ft-adapted 15 x 15-valued Feller continuous strong Markov process. Any discontinuity ofY(s,x) (., w) or Z(s,x) (-, w) has to be a discontinuity of B(·,w). If R is a function only oft, z then {Z(s,x)(t) : t :2: s} is a 15-valued Feller continuous strong Markov process. 0 The z-part of the above viz. {Z(s,x) (t) : t ~ s} may be called the reflected (or
regulated) symmetric a-stable Levy process. Proposition 3.2. Assume (Ai) - (A3) and let 1
< a < 2.
Then E[var (y(s,x)(.); [s, t])] < 00 for all t > s :2: 0, xED, where var (g(.); [a, b]) denotes the total variation of 9 over [a, b]. Proof: As ~(s,x) (-) is nondecreasing for each i it is enough to show that EI~(s,x)t)1
<
also we may take s = 0, x = 0. Since a > 1 note that EIBi(t)Ia:' < 00 for all 1 ~ a' < a. As B(-) is symmetric note that it is a martingale. (3.4) of the preceding proposition implies 00;
EI~(O,O)(t)la' ~ C E [sup
O~r~t
IBi(r)l] a'
~ 6 EIBi(t)Ia:' <
00
by Doob's maximal inequality for any 1 < a' < a. The required conclusion now ~~.
0
Note: In the context of reflected processes, the reflection terms are usually specified only for z on the boundary. However, no matter how the reflection
ReBected Levy Process
124
field is extended to D or JRd, only the values on the boundary determine the process; Theorem 4.5 of [12] and its proof can be easily adapted to our situation. The next result concerns expected occupation time at the boundary.
Theorem 3.3. Assume (AJ) - (A3); let 1 < s
< 2. Thenfors 2': O,X
0:
E D,t
>
(3.5)
Proof: We consider only s=O. Note that aD = {x E JRd : Xi =0 for some i}. Let H = {x E JRd : min IXil :::; I}. Let
aD =
~
{
For 0 < E :::; 1 define CPt on JRd by CPt(z) = cp(zIE). Note that d d CPEl CPt,i, CPt,ij ECb(JR ) n Ll(JR ); also they are supported on EH ~ H. Clearly
(3.6) Next define gt on JRd by CXl
gt(x)
j -
=
E: CPt(x) j 0
IRd
By Theorem 2.6, !:1 0'./2 g €
=
(3.7)
p(O, x; t, z)dt dz.
t~
< E :::;
1. We now claim that
supEO'.lg€(X)I-----t 0 as
E
1 o.
(3.8)
x
Putting S = tlEO'. in (3.7) and as l
EO'.lg€(x)1
:::;
EO'.j jP(O,X;EO'.S,Z)dzds o IR d CXl
+EO'. j Icp€(z)1 j IR d
h (x; E)
p(O,X; EO'.S, z)ds dz
1
+ h(x; E).
As p(O,X;EO'.S,·) is a probability density suplh(x;E)1 :::; EO'. -----t O. As cP is intex
grable, by (2.11) sup Ih(x; E)I x
<
fa
J1
IR d
a
ds dz
1
C EO'.-d j cp( ~z)dz = C EO'. j cp(z)dz IR d
CEO'. -----t 0
IRd
Amites Dasgupta and S. Ramasubramanian
125
whence (3.8) follows. We next show that supEaIVg",(x)l-t 0 as E 1 o.
(3.9)
x
By Theorem 2.6, and putting s = t/Ea gives
J J
J (~) ~ J 00
p(O,x;Eas,z)ds dz
-
IRd
0
00
-
p(O,x;Eas,z)ds dz.
IRd
Since
:s: i :s:
supEaIVgE(x)1 x
because
()i
>
0
d, an argument similar to the derivation of
:s: C
Ea - l -t 0 as E 1 0
1; this proves (3.9).
Now applying Ito's formula to Eag",(Z(O,x) (.)), denoting Z(O,x) (.) by Z(.), y(O,x) (.) by Y (.) and taking expectations we get t
EkagE(Z(t)) - Eag",(X)]
=
E
°
J t
+E
J
(R(u, Y(u-), Z(u- ))EaVg",(Z(u)), dY(u)).
(3.10)
°
By (3.8) l.h.s. of (3.10) tends to zero as E -t o. As R is bounded, Proposition 3.2 and (3.9) imply that the last term in (3.10) goes to zero as E -t O. Finally, D as l
Remark 3.4. A function
D
Using Theorem 3.3 we now improve on it!
Theorem 3.5. Assume (AJ) - (A 3), 1 <
()i
P(Z(s,x) (t) E aD)
< 2. Thenfors;::: O,X =
O.
E D,t
>s
(3.11)
Reflected Levy Process
126
Proof: Let ((z)
(;J + ... + zl~)}, where K
= K exp {h(r)
>2
e exp { -1!r2} , Irl:S: 1 o , Irl ~ 1
= {
f > 0 define fE(Z) = h(((Z/f)), Z E JRd. Clearly fE E C~(JRd) and afE(Z)/aZi = 0 for any Z E aD, 1 :s: i :s: d. It is not difficult to see that
For
lim fE(Z) dO
= 1cw(z), Z E JRd
(3.12)
(for Z ~ aD note that Zi > c for all i for some c > 0; hence ((Z/f) > 1 for all small f). Next, an argument as in Lemma 2.1 gives for f > 0 (3.13) for suitable constants C 1, C2 . Now we claim that for
Z
E
D\aD,
t1 Ct / 2 fE(Z)
---+
0 as flO.
(3.14)
Indeed let Z rt aD; there exist ro > 0, c > 0 such that (Zi + ~i) > c, 1 :s: i :s: d for I~I < ro· Choose fa > 0 so that for all f < fa, (((Z+~)/f) > K exp{ -df2 /c 2} > 1 for I~I < ro· Therefore fE(Z +~) = 0 = fE(Z) for all I~I < ro, f < fa and hence
t1 Ct / 2fE(Z)
fE(Z+~)I~I~+Ctd~.
j
=
(3.15)
1~I>ro
Since 1~ll+Q l(ro,oo)(I~I) is integrable and Ad(aD) = 0, by (3.12), (3.15) now the claim (3.14) follows. To prove the theorem we consider only the case s = O. Denote by Z(·), Y(·). We want to prove that for xED, t > 0,
Z(O,x) (.), y(O,x) (-)
t
limEjt1 Ct / 2fE(Z(r))dr dO
=
(3.16)
O.
a
By Theorem 3.3 and (3.13) for each
f
> 0,
t
E j 1aD(Z(r))t1Ct/2 fE(Z(r))dr a For c> 0, put Dc prove that
= (2c,oo)d.
=
o.
(3.17)
In view of (3.17), to prove (3.16) it is enough to
t
l!reE j 1DcCZ(u))t1Ct/2 fE(Z(u))du = 0 a
(3.18)
Amites Dasgupta and S. Ramasubramanian
127
for any fixed c > O. If Z E Dc, I~I < c note that Zi + ~i > c, 1 ~ i ~ d. So one can choose EO > 0 such that fE(Z +~) = 0 for all I~I < c, Z E Dc, E < EO. Hence for any E < EO 11DJZ(u))~a /2 fE(Z(u))1 ~
J
1 1~Id+ad~ ~ C ac1a '
1~I>e
The required assertion (3.18) and hence (3.16) now follows by (3.14) and dominated convergence theorem. Now to prove (3.11) (with s = 0), first consider the case x tJ- an. Since afE(-)/aZi = 0 on an, and y(.) can increase only when Z(·) E an, by Ito's formula
J~a/2 t
E[fE(Z(t))]- fE(X)
=
E
fE(Z(r))dr.
a By (3.12), (3.16) letting
lOin the above we get (3.11).
E
Next let x E an; for c > 0 let TJ - TJ~x) = inf{r ~ 0 : Z(r) E Dc}. By strong Markov property and the preceding case
E[l[o,tj(TJ)laD(Z(t))]
= O.
Note that {TJ~x) ~ t} i 0 (modulo null set) as c 1 0; otherwise we will get a contradiction to Theorem 3.3. Letting c lOin the above we get the required conclusion. This completes the proof. 0
Note: It may be interesting to compare the proofs of Theorems 3.3, 3.5 with those of their analogues for reflected Brownian motion given in [6]. In the following \1 2p(r, y; t, z) = \12P(r,'; t, z), ~~/2p(r, y; t, z) = ~~/2p(r,.; t, z) denote respectively the operators \1, ~ a/2 applied as function of y-variables. Our main result is
Theorem 3.6. Assume (AJ) - (A3); let 1 < a < 2. For 0 ~ s < t < oo,x E fJ, zED define
p(s,x;t,z)
J t
+E
(R(u, Y(u-), Z(u- ))\1 2p(u, Z(u); t, z), dY(u)) (3.19)
s
where Y(-) = y(s,x)(-),Z(-) = Z(s,x)(.). ForO ~ s < t,x E fJ,z E an take pR(s, x; t, z) = O. Then (i) pR is continuous on {O ~ s < t < 00, x E fJ, ZED}, it is also differntiable in (t, z); (ii) for any Borel set A S;;; fJ, s < t, x E fJ P(Z(s,x) (t) E A)
=
J
pR(s, x; t, z)dz.
(3.20)
A
In case R is independent of y-variables, pR is the transition probability density function of the Markov process Z(·). 0
Reflected Levy Process
128
We need a lemma Lemma 3.7. Hypotheses and notation as in the Proposition 3.2.
If
(sn,x n ) - t (s,x) then for a.a. w, forT> s var (Y(Sn,Xn)(.,w) - y(s,x)(.,w); [s,T]) sup IZ(Sn,Xn)(t,w) - Z(s,x)(t,w)1
-t
0
-t
o.
s~t~T
Proof: Denote z(n)(.) = Z(Sn,X n )(.), y(n)(.) = Y(Sn,X n )(.), Z(.) = Z(s,x)(.), Y(·) = y(s,x)(-). We first consider the case Sn < s for all n. Clearly z(n)(t,w), yen) (t, w), t :2 s is the solution to the Skorokhod problem corresponding to z(n)(s,w) + B(·,w) - B(s,w). For any T > s note that var ([Be, w) - B(s, w)
=
Iz(n)(s,w) -
+ Zen) (s, w)]
- [B(·, w) - B(s, w)
+ xl; [s, T])
xl.
For any w such that B(·, w) is continuous at s we have Xn +B(s, w) - B(sn' w) x. Boundedness of Rand (3.4) imply
J
-t
S
R(u, y(n)(u-),z(n)(u-))dy(n)(u,w)
Thus Iz(n)(s,w) [11].
xl
-t
-t
0 as n
-t
00.
0, and hence the result follows by Proposition 3.9 of D
Next let Sn > S for all n. For any n,Z(t,w),Y(t,w),t:2 Sn is the solution to the Skorokhod problem corresponding to Z(sn,w) + B(·,w) - B(sn,w). Clearly var ([xn
+ B(·, w) -
B(sn, w)]- [Z(sn, w)
+ B(·, w) -
B(sn, w)]); [sn, T])
IZ(sn,w) - xnl· So by the arguments as in [11] var (y(n)(.,w) - Y(·,w); [sn' T]) sup
Iz(n)(t,w)-Z(t,w)1
< CIZ(sn'w) - xnl < CIZ(sn,w)-xnl.
sn~t~T
Note that for s ::;: t ::;: Sn we may take z(n)(t,w) = Xn , y(n)(t,w) = O. Clearly var (Y(·, w); [s, sn]), sup IX n - Z(t, w)l, IZ(sn, w) - xnl all tend to 0 as Sn - t S s~t~sn
by right continuity. The required conclusion is now immediate. Proof of Theorem 3.6: Since dY(s,x)(.) can charge only when Z(s,x)(.) E aD and d(z, aD) > 0 for z tI- aD, well definedness of (3.19) follows from (2.12) and Proposition 3.2. Assertion (i) now follows from properties of p (viz. (2.11), (2.12), Proposition 2.4), boundedness and continuity of R and Lemma 3.7. To prove assertion (ii), in view of Theorem 3.5, it is enough to establish (3.20) when A c D.
Amites Dasgupta and S. Ramasubramanian
129
Fix t > s; let E > o. Apply Ito's formula to p(r, Z(s,x)(r); t, z), s :::; r :::; (t - E) corresponding to the semimartingale Z(s,x)(.) and use Theorem 2.5 to get
= p(s, x; t, z)
p(t - E, Z(t - E); t, z)
t-E
+
J
(R(r, Y(r-), Z(r- ))\l2p(r, Z(r); t, z), dY(r))
s
+ a stochastic integral. Let any
(3.21)
J be a continuous function with compact support KeD. By (3.21) for E > 0
J J J E
J(z)p(t - E, Z(t - E); t, z)dz
D
=
J
J(z)p(s, x; t, z)dz
D
t-E
+E
J(z)
D
(R(r, Y(r-), Z(r- ))\l2p(r, Z(r); t, z), dY(r))dz
(3.22)
s
For any w, note that p(t - E, Z(t - E, w); t, z)dz =? 6Z(t-,w)(dz) as since P(Z(t) =I- Z(t-)) = 0 it now follows that lim[l.h.s. of (3.22)] = E[J(Z(s,x)(t))]. dO
E
1 o.
And (3.23)
As d(K, aD) > 0, by (2.12), Proposition 3.2 and boundedness of J(-), R(·) lim[r.h.s. of (3.22)] = dO
J
J(z)pR(s, x; t, z)dz.
(3.24)
E[J(Z(s,x)(t))]
(3.25)
D
Thus
J
J(z)pR(s, x; t, z)dz
=
D
for any continuous function
J with compact support in D.
Next for any open set FeD, let {In} be a sequence of continuous functions with compact support in D such that In rlF pointwise. Clearly lim E[Jn(Z(s,x)(t))] = E[I F(Z(s,x)(t))].
(3.26)
n--+CXl
Taking expectation in (3.21) and letting
E
lOwe get
pR(s, x; t, z) = lim E[P(t - E, Z(t - E); t, z)] 2:: O. dO
Therefore by monotone convergence theorem
nl~~
J
J
D
D
In(Z)pR(s, x; t, z)dz =
IF(Z)pR(s, x; t, z)dz.
(3.27)
Now (3.25), (3.26), (3.27) imply that (3.20) holds for any open FeD, and hence for any Borel set A cD.
Reflected Levy Process
130
Finally, the last assertion is immediate from (ii); this completes the proof.
0
We conclude with the following questions. 1. Can (x, z)
r--t
pR(s, x; t, z) given by (3.19) be extended continuously to
2. Is pR(s,x;t,z)
> 0 for s < t,x,z
D x D?
E D?
3. When is pR symmetric in x, z?
Acknowledgement: The authors thank B. Rajeev and S. Thangavelu for some useful discussions; and Siva Athreya for bringing [1] to their notice while the work was in progress. Amites Dasgupta Stat.-Math. Unit Indian Statistical Institute 203 B.T. Road Kolkata - 700 108
S. Ramasubramanian Stat.-Math. Unit Indian Statistical Institute 8th Mile Mysore Road Bangalore - 560 059
Bibliography [1]
R. Atar and A. Budhiraja: Stability properties of constrained jumpdiffusion processes. Preprint, 2001.
[2]
S. Balaji and S. Ramasubramanian : Asymptotics of reflecting diffusions in an orthant. Proc. Internat. Conf. Stochastic Processes, December'96, pp. 57-81. Cochin University of Science and Technology, Kochi, 1998.
[3]
I. Bardhan:
[4]
K. Bogdan: The boundary Harnack principle for the fractional Laplacian. Studia Math. 123 (1997) 43-80.
[5]
K. Bogdan and T. Byczkowski Potential theory for the a-stable Schrodinger operator on bounded Lipschitz domains. Studia Math. 133 (1999) 53-92.
[6]
J. M. Harrison and R. J. Williams: Brownian models of open queueing networks with homogeneous customer populations. Stochastics 22 (1987) 77-115.
[7]
N. Ikeda and S. Watanabe: Stochastic differential equations and diffusion processes. North-Holland, Amsterdam, 1981.
[8]
K. Ito : Lectures on Stochastic Processes. Tata Institute of Fundamental Research, Bombay, 1961.
Further applications of a general rate conservation law. Stochastic Process. Appl. 60 (1995) 113-130.
Amites Dasgupta and S. Ramasubramanian
[9]
131
O. Kella: Concavity and reflected Levy process. J. Appl. Probab. 29 (1992) 209-215.
[10] S. Ramasubramanian: Transition densities of reflecting diffusions. Sankhya Ser. A 58 (1996) 347-381.
[11] S. Ramasubramanian : A subsidy-surplus model and the Skorokhod problem in an orthant. Math. Oper. Res. 25 (2000) 509-538. [12] S. Ramasubramanian: Reflected backward stochastic differential equations in an orthant. Proc. Indian Acad. Sci. (Math. Sci') 112 (2002) 347-360. [13] G. Samorodnitsky and M. S. Taqqu: Stable non-gaussian random processes : stochastic models with infinite variance. Chapman and Hall, New York, 1994. [14] W. Whitt: An overview of Brownian and non-Brownian FCLT's for the single-server queue. Queueing Systems Theory Appl. 36 (2000) 39-70.
132
Reflected Levy Process
On Conditional Central Limit Theorems For Stationary Processes Manfred Denkerl Universitiit Gottingen and Mikhail Gordin V.A. Steklov Institute of Mathematics Abstract The central limit theorem for stationary processes arising from measure preserving dynamical systems has been reduced in [6] and [7] to the central limit theorem of martingale difference sequences. In the present note we discuss the same problem for conditional central limit theorems, in particular for Markov chains and immersed filtrations.
1
Introduction
Let ((khcl = ((~k' 'r/k)hE'l, be a two-component strictly stationary random process. Every measurable real-valued function f on the state space of the process defines another stationary sequence (f ((k) ) kE'Z. Various questions in stochastic control theory, modeling of random environment among many other applications lead to the study of conditional distributions of the sums l:~:~ f ((k) given 'r/O, ... ,'r/n-l. In particular, the asymptotic behaviour of these conditional distributions is of interest, including the case when the limit distribution is normal. We shall prove conditional central limit theorems in the slightly more abstract situation of measure preserving dynamical systems (X, F, P, T), where (X, F, P) is a probability space and T : X --t X is P-preserving. Let f be a measurable function and Ji be a sub-O"-algebra. f is said to satisfy the conditional central limit theorem with respect to Ji (CCLT(Ji)), if P a.s. the conditional distributions of n-l
1 ""' Vn L..;foT k ,
k=O
given Ji, converge weakly to a normal distribution with some non-random variance 0"2 2': o. This leads to the identification problem for L2(P)-subspaces consisting of functions satisfying a CCLT. Following [6], an elegant way to describe such subclasses IThis paper is partially supported by the DFG-RFBR grant 99-01-04027. The second named author was also supported by the RFBR grants 00-15-960l9 and 02-0l-00265.
133
134
On Conditional Central Limit Theorems For Stationary Processes
uses T-filtrations, i.e. increasing sequences of o--fields Fn = T- 1 Fn+l' n E Z. Here we need to consider a pair of T-filtrations (Fn)nEZ and (Qn)nEZ satisfying 9n C Fn for every n E Z. For example, in case of a strictly stationary random process (~khEZ as above the o--field Fn (or 9n) is generated by ((k)k~n (or (T/kh~n' respectively). First of all, the conditional distributions in CCLT(1i) are determined by
1i
=
V 9k V V Fk·
kEZ
k~O
Secondly, a general condition describing the class of functions f for which the CCLT(1i) holds is given by the coboundary equation f = h + g - goT with a (Fn)nEz-martingale difference sequence h 0 Tk (i.e. h is UT 1i-measurable and EH f := EUI1i) = 0). The coboundary equation is implicit ely also used in [10] and [9]. In [10], sufficient conditions for CCLT(1i) are obtained, when 1i is replaced by 1i = VkEZ 9k, and our Proposition 3.1 contains this result as a special case. This proposition also specializes in case of skew products T(x,y) = (T(x),Tx(Y)) as in [9], where 9n is a T-filtration, and where 1i is also replaced by 1i.
It is hardly possible to verify this coboundary condition using properties of the o--fields (Fn)nEZ and (Qn)nEZ without making assumptions about their interaction. It has been noticed in [5] that conditional independence plays a fundamental role when studying conditional measures and their properties in connection with thermodynamic formalism. This additional property of conditional independence has been called immersion in [1], and we shall adopt this terminology. It means that for every n E Z the o--fields Fn and 9n+l are conditionally independent given 9n. The property of immersion is an essential simplification, although it seems to be rather strong. However, it looks quite natural in several situations (see e.g. [5]), in particular, when both ((khEZ and (T/khEZ are Markovian. Indeed, if the sequence (T/k)kEZ models the time evolution of a random environment influencing the process (~k)kEZ' the condition just means that there is no interaction between the process (~k)kEZ and the environment (T/k)kEZ, The same picture arises when (~khEZ models the outcome of non-anticipating observations over the process (T/khEZ, mixed with noise. If the sequence ((khEZ is a Markov chain, there is a natural assumption in terms of transition probabilities to guarantee that the corresponding filtrations are immersed (see Section 4). The notion of immersed filtrations was first recognized as an important concept in connection with the classification problem of filtrations (see [1] and references therein). A closely related notion, regular factors, was introduced in [5]. The latter paper also contains some examples of regular factors originating in twodimensional complex dynamics. In more general situations (like in control theory) some form of the feed-back between the two processes may be present, and we cannot expect that the corresponding filtrations are immersed. In this case more general concepts and results (like Theorem 3.7 of the present paper) have to be developed. In particular, we study the CCLT-problem for functions of Markov chains. We
Manfred Denker and Mikhail Gordin
135
follow the ideas in [7] closely where a rather general and natural condition in terms of the transition operator was introduced for the CLT-problem. This condition means that the Poisson equation is solvable, and it avoids mixing assumptions and similar concepts (e.g. [9] contains results in this direction). There is a natural construction embedding the original Markov chain into another one, for which the Poisson equation has to be solved. We give some comments how this verification can be done, in particular, in the context of fibred dynamical systems [5]. However, we do not go into much of details. As a consequence we obtain the functional form of the CLT for fluctuations of a random sequence around the conditional mean. Finally, we consider the case of immersed Markov chains. This property together with a solution of the Poisson equations for the original and extended Markov chains establishes an analogous result for conditional mean values of the original sequence, in addition to the CLT for fluctuations. The present paper arose from an attempt to understand Bezhaeva's paper [2] from the viewpoint of martingales. Bezhaeva's article studies the same problem as in the present note in the special case of finite state Markov chains. We do not reproduce these results in detail and formulate the conclusions of our theorems in a way different from the viewpoint taken in [2]. However, we would like to sketch the differences in both approaches. There are two results on the CLT in [2]: Theorem 3 and Theorem 5 (the latter theorem seems to be the most important result of [2]). Our corresponding results are Theorem 3.7 and Theorem 4.4. Though, we do not verify here that the conditions of our Theorem 4.4 are satisfied for a class of Markov chains considered in [2] and arbitrary centered functions: this would be just a reproduction of a part of [2]. Its proof and the content of our Section 4 clearly show that even for finite state Markov chain we really deal with continuous state space when considering a conditional setup. In fact much more general chains than in Theorem 5 in [2] (for example, geometrically ergodic) can be considered on the basis of our Theorem 4.4. Our method of proving the CLT is quite different from that of [2] and, as was remarked above, is based on approximation by martingales. We assume in this paper that all probability spaces and (j-fields satisfy the requirements of Rokhlin's theory of Lebesgue spaces and measurable partitions. This does not imply any restriction to the joint distributions of random sequences we are considering; hence we may freely use conditional probability distributions given a (j-field. An alternative approach would be to reformulate the results avoiding conditional distributions. However, we do not think that the advantages given by such an approach justifies the complexity of such a description.
2
Immersed Filtrations
Throughout this paper, let (X, F, P) and T : X ---+ X be, respectively, a probability space and an automorphism of (X, F, P) (that is an invertible Ppreserving measurable transformation). An increasing sequence of (j-subfields
136
On Conditional Central Limit Theorems For Stationary Processes
(Fn) nEZ of F will be called a filtration and a T -filtration if, in addition, T- 1(Fn) = Fn+l for every n E Z. Any a-field E ~ F defines a natural T-filtration (En)nEZ = (T-nE)nEZ, whenever T-1E :2 E. A filtration (Qn)nEZ is said to be subordinated to a filtration (Fn)nEZ, if for every n E Z (2.1 ) and it is called immersed into the filtration (Fn)nEZ, if (Qn)nEZ is subordinated to (Fn)nEZ and for every n E Z the a-fields Fn and Qn+l are conditionally independent given Qn. We shall always assume that
F=
V Fn
(2.2)
nEZ
(V sES Es
denotes the smallest a-field containing all a-fields Es , s E S). Setting Q = VnEZ Qn it follows from the definition of a T-filtration that Q is completely invariant with respect to T (that is T-1(Q) = Q). Finally, define F- = nkEZ F k , and similarly Q- = nkEZ Qk· Throughout this paper (Qn)nEZ always denotes a T-filtration which is subordinated to the T-filtration (Fn)nEZ. We then set
'lin
=
Q V Fn·
The transformation T defines a unitary operator UT on L2 = L 2(X, F, P) by UT f = f 0 T, f E L 2 . Given a sub-a-field 'li c F, we denote its conditional expectation operator (on L 2 ) by EH and its conditional probability by P( ·I'li). Let II . 112 denote the L 2 -norm. As mentioned above, the notion of immersed filtrations arises naturally in the context of Gibbs measures in the thermodynamic formalism (see [5]) and of Markov chains (see e.g. [2]). In order to simplify our conditions in the CCLT for these applications we need the following lemma for immersed filtrations. Lemma 2.1. The T -filtration (QkhEZ is immersed into the T -filtration (Fk)kEZ, if for every n E Z (2.3)
or, equivalently,
(2.4) Conversely, if UhhEZ is immersed into (FkhEZ, then the following equalities hold for every n E Z and m ~ 1:
(2.5) Proof. We first show that (Qk)kEZ is immersed into (Fk)kEZ, if (2.3 ) holds. Let n E Z be fixed and let ~ and ry be bounded functions measurable with respect to Fn and Qn+l, respectively. It follows from (2.3 ) that EFnry = EYnry. Therefore we have
EYn EFn (~ry) = EYn (~EFnry) EYn (~EYnry) = EYn (OEYn (ry),
Manfred Denker and Mikhail Gordin
137
which implies the conditional independence of Fn and 9n+l given 9n. In a similar way (replacing Fn by 9n+l) one shows conditional independence assuming
(2). Conversely, we first show that conditional independence of Fn and 9n+l given 9n for some n E Z implies (2.3). Indeed, it suffices to verify (2.3 ) for all bounded Fn V 9n+l-measurable functions ofthe form ~TJ, where ~ and TJ are Fnand 9n+l-measurable, respectively. By conditional independence, for a 9n+lmeasurable, bounded function h,
whence EYn+l~ that
=
EYn~. Similarly one shows that gFnTJ
EFn (TJEYn+10 (EYn~)(EFnTJ)
= EYnTJ.
It follows
= EFn (TJEYn~) = (EYn~)(EYnTJ)
EYn (~TJ)· Since the equation (2.4 ) can be proved similarly, we obtain the equivalence of (2.3 ) and (2.4 ). Moreover, by induction one easily verifies (2.5 ). 0
3
A Conditional Central Limit Theorem
Let (Vk)k2:1 be a sequence of real-valued random variables. For every n E Z+ define a random function with values in the Skorokhod space D([O, 1]) ([3], [8]) in the standard way: it is piecewise constant, right continuous, equals in the interval [0, lin) and equals n- 1 / 2 Ll::;m::;[ntlVm for a point t E [lin, 1]. This random function will be denoted by Rn(Vl, ... , vn ) and has a distribution on D([O, 1]), denoted by Pn(Vl, ... , v n ). We write Wa for the Brownian motion on [0,1] with variance (J2 of w a (1) (we need not exclude (J2 = since Wo is the process which identically vanishes). The distribution of Wa in C([O, 1]) will be denoted by Wa.
°
°
Remark 3.1. In the sequel we deal with convergence in probability of a sequence of random probability distributions in D([O, 1]) to a non-random probability distribution. It is assumed here that the set of all probability distributions in D( [0, 1]) is endowed with the weak topology. It is well known that the piecewise constant random functions (in D([O, 1])) can be replaced by piecewise linear functions (in C([O, 1])) without changing the essence of the results formulated below.
3.1
A general CCLT
As mentioned in the introduction the conditional central limit theorems in [9] and [10] are proved using some martingale approximation. There are different
138
On Conditional Central Limit Theorems For Stationary Processes
versions of a martingale central limit theorem which may be used in the present context. They all are versions and extensions of Brown's martingale central limit theorem. It has been used in [10] directly, and is used in [9] and here in a modified form. We apply a corollary of Theorem 8.3.33 in [8] to obtain the following CLT for arrays of martingale difference sequences. Lemma 3.1. For n E Z+ let (nn, F n , (Fk,n) k?:.O, pn) be a probability space with filtration Fk,n C Fn (k ~ O), and let (vk,nh?:.l be a square integrable martingale difference sequence with respect to ((Fk,nk:::o, pn). If for every E > 0 and t ~ 0 we have (3.6) grk- 1 •n (v~,n1{lvk,nl>E}) -----+ 0 l:S;k:S;nt
L
and
(3.7) l:S;k:S;nt in probability as n
-----+
(Xl
then {Pn (Vl' ... , v n ) : n ~ I}, converges weakly to
Wa 2 • The following proposition is the key result in the martingale approximation method for the CCLT. Implicitly it also appears in [10], and its proof is analogous to that for the central limit theorem in [6] or [7]. Proposition 3.1. Let T be an ergodic automorphism and (Hn)nEZ be a Tfiltration. Assume that g, h E L2 and (3.8) If f is defined by f = h+ g - UTg,
(3.9)
then, with probability 1, the conditional distributions Pn(J, UT f, ... ,U:;,-l flHo) given Ho of the random functions Rn (J, UT f, ... ,U:;'-l 1) converge weakly to the (non-random) probability distribution W a , where (J = IIhl12 ~ O.
Remark 3.2. The equations in (3.8 ) say that the sequence (U¥h)nEz is a stationary martingale difference sequence with respect to the filtration (Hn)nEZ, Remark 3.3. The conclusion of Proposition 3.1 remains true if the (J-field Jio in the statement is changed to any coarser one. This follows easily from the definition of weak convergence and the non-randomness of the limit distribution. Proof of Proposition 3.1. By remark 3.2 the sequence of finite series Vk,n = n- l / 2U!;-l h, (1 ~ k ~ n), form a martingale difference sequence with respect to the filtrations (Hk)o:S;k:S;n' Assume first that (J > O. We show that the sequence {vk,nI1 ~ k ~ n, n E Z} with probability 1 satisfies the conditions 3.6 and 3.7 of Lemma 3.1 with respect to the conditional distribution given Ji o· Relative to this conditional distribution with probability 1 the sequence (U¥h)nEz is a (non-stationary) sequence of martingale differences with finite second moments. The ergodic theorem implies that with P-probability 1
Manfred Denker and Mikhail Gordin
139
as n - 00. It follows that with probability 1 the same relation holds almost surely with respect to the conditional probability given H a, establishing (3.7 ). We need to check (3.6). By the ergodic theorem again, for every E > 0 and A > 0 we have with P-probability 1 E ri k-l( Vk,n 2 1 {llIk,nl>E} ) lim sup n->oo l~k~nt
L
lim sup n- 1 Erik ((U~h)21{IU~hl>ml/2}) n->oo a~k~(n-l)t
L
< lim sup
n- 1
n->oo
L
Erik ((U~h)21{IU~hl>A})
a~k~(n-l)t
lim sup n- 1 Erik (U~h2U~1{lhl>A}) n->oo a~k~(n-l)t
L L
lim sup n- 1 U~(Erio (h 21{lhl>A})) n->oo a~k~(n-l)t EErio (h 21{lhl>A})
= E(h 21{lhl>A}),
and, choosing A large enough, the latter expression can be made arbitrarily small. Thus for every E > 0 with P-probability 1
L
Erik-l(v~,nl{llIk,nl>E}) - 0
l~k~nt
as n - 00. This implies that with probability 1 the same expression tends to zero with respect to the conditional probability given H a, proving (3.6 ). It follows from Lemma 3.1 that Pn(h, ... , U:;,-lhIH a ) converges weakly to Wa P-a.s. The same conclusion also holds if a = 0 (h = 0 in this case). Finally we need to show that the sequences (U¥h)nE7l, and (U¥!)nE7l, are stochastically equivalent. We have R n (n- 1 / 2!, ... , n- 1 / 2U:;'-1 J) - R n (n- 1/ 2h, ... , n- 1/ 2U:;'-lh) =
n - un-I)) 2 - U) R n (n -1/2(UTg - g ) ,n -1/2(UTg Tg,···, n -1/2(UTg T g . It is easy to see that the maximum (over the interval [0, 1]) of the modulus of the latter random function equals n- 1 / 2 maxl~k~n IU~g - gl and does not exceed n- 1 / 2 (lgl + maxl~k~n IU~gl). Since by the ergodic theorem n- 1 U¥g2 0, this expression tends to zero P-a.s. Thus we see that P-a.s. the distance in D([O, 1]) between Rn(h, ... , U:;,-lh) and Rn(J, ... , U:;,-l J) tends to zero as n - 00. This implies that, with probability 1, the conditional distributions Pn(J, ... ,U:;,-l J) IHa) in D([O, 1]) have the same weak limit as
D
140
3.2
On Conditional Central Limit Theorems For Stationary Processes
On Rubshtein's CCLT
Proposition 3.1 is in fact a general result which can be seen when compared to other theorems in the literature. We begin recalling Rubshtein's result in [10].
Theorem 3.4. Let (~n, TJn)nE'Z be an ergodic stationary process with ~ E L2 and E9~o = O. If
(3.10)
then, with probability 1, the conditional distributions P n (6, 6, ... , ~n I Q) of Rn (6,6, ... , ~n) converge weakly to the non-random probability W a , where
. -1 E (6 hm
n---->oo
n
+ ... + ~n )2
=
(J
2•
The proof of this result can be reduced to Proposition 3.1 observing that (3.10 ) implies a representation as in (3.9). The result in [9], Theorem 2.3 is of the same nature, but in the special situation of a skew product. Another special case of Proposition 3.1 is the following theorem, which is also a generalization of Theorem 2 in [6], when p = 2.
Theorem 3.5. Let T be an ergodic automorphism and (fin)nEZ be aT-filtration. If f E L2 is a real-valued function satisfying 00
2:(llf - Erik fl12
+ IIEri-k f112) < 00,
(3.11)
k=O
then Proposition 3.1 applies to f. In particular, there exists (J ~ 0 such that with probability 1 the conditional distributions Pn(f, UT f, ... ,U:;'-l fl fio) converge weakly to the probability distribution Wa.
Proof. The following explicit formula defines a function g which permits a representation as in (3.9 ), where we set h = f - goT + g: 00
00
k=l
k=O
(here and below the series are L 2 -norm convergent due to the assumption
Manfred Denker and Mikhail Gordin
141
(3.11 )). It follows that h
f - UTg+ g
I: Ui + (f -
f -
k
1
EHk 1)
+L
k=l
U;+l(EH_k 1)
k=O
k=l
k=O
k=O
k=l
k=l
k=O
k=l
I:
k=l
U!;(E H-k+l
-
EH-k)f
kEZ n
k=-n
This representation clearly shows that h satisfies (3.8 ) and the theorem follows 0 from Proposition 3.1.
3.3
The CCLT for subordinated filtrations
Let (Qn)nEZ and (Fn)nEZ be two subordinated T-filtrations as explained in section 2 on filtrations. We shall use Proposition 3.1 to obtain sufficient conditions that the CCLT holds together with the CLT for the conditional mean. We begin with the following reformulation of Proposition 3.1.
Proposition 3.2. Let T be an ergodic automorphism, (Qn)nEZ and (Fn)nEZ be a pair ofT-filtrations such that (Qn)nEZ is subordinated to (Fn)nE'Z,' For f E L2 define = EY f and = f Assume that and admit representations
J
1
J.
J
1
(3.12) and
f = h+ g - UTg, where
then
g, g E L 2 ,
(3.13)
On Conditional Central Limit Theorems For Stationary Processes
142
(1,
1, ...
1)
i) the distributions Pn UT ,U:;'-l of the random functions Rn(1, UT ,U:;'-l 1) converge weakly to the probability distribution Wa,
1, ...
where
a = IIhl12 ~ o.
ii) with probability 1, the conditional distributions Pn(i, UT i, . .. ,U:;.-l ilHo) given Ho of the random functions Rn(i, UT i, . .. ,U:;.-l 1) converge weakly to the (non-random) probability distribution Wo:, where (j = IIhl12 ~ o. Remark 3.6. The same proof as for Proposition 3.2 shows that the joint distribution of the partial sums of (1, converge to aGaussian law with covariance matrix (O"ij), where O"I = Ilhll§, O"~ = Ilhll§ and 0"1,2 = 0"2,1 = Jhhd/L. One easily deduces from this that also f is asymptotically normal with variance Ilh + hll§.
i)
Proof. The assertion ii) is a direct consequence of Proposition 3.1. The assertion i) also follows from the Proposition 3.1 (applied to the filtration (Qn)nEZ) and Remark 3.3. 0 Corollary 3.1. Under the assumptions of Proposition 3.2, with probability 1, the conditional distributions p;:(i, UT i, ... ,U:;.-l ill, UT u:;.-11) converge weakly to Wo:, where (j = IIhl12 ~ o.
1, ... ,
Proof. This follows from Remark 3.3, because the functions are Q-measurable and Q ~ Ho.
1, UT 1, ... , u:;.-11 0
Theorem 3.7. LetT be an ergodic automorphism, and let (Qn)nEZ and (Fn)nEZ be a pair of T -filtrations such that (Qn)nEZ is subordinated to (Fn)nEZ. Let f E L2 be a real-valued function satisfying (Xl
L
Ilf - EFk fl12 < 00,
(3.14)
k=O (Xl
LilEY f - E'H-k fl12 < 00
(3.15)
k=O (Xl
LilEY f - EYk fl12 < 00
(3.16)
k=O
and (Xl
L IIEY-k fl12 < 00. k=O
Setting ~
~
then f and f admit, respectively, the representations
and
(3.17)
Manfred Denker and Mikhail Gordin
143
where
Moreover,
i) the distributions Pn ([, UT [, ... ,U:;'-l [) of the random functions Rn( 1, UT [, ... ,U:;,-l [) converge weakly to the probability distribution W&, where (j = IIhl12 ~ O. ii) with probability 1, the conditional distributions Pn (1, UT 1, ... ,U:;,-l 1lfio) given fio of the random functions Rn (1, UT ,u:;,-11) converge weakly to the (non-random) probability distribution Wo=, where (j = IIhl12 ~ O.
1, ...
Remark 3.8. (1) Instead of (3.17 ) it is sometimes more convenient to verify the stronger condition CXJ
I: IIEF-k fl12 <
00.
k=O
(2) If then the class of functions satisfying the assumptions of Theorem 3.7 is dense in the subspace of the functions f E L2 satisfying EQ- f = O. A sufficient condition for this can be found in subsection 4.4. Proof of Theorem 3. '1. We apply Theorem 3.5 twice. Let us show first that f and (fin)nEZ satisfy the assumptions of Theorem 3.5. We have by (3.14 ) and (3.15 ) CXJ
CXJ
k=O
k=O
I: 111- EHk 1112
CXJ
k=O CXJ
<
I: Ilf - EFk fl12 <
00
k=O
and CXJ
I: IIEH-k 1112
CXJ
k=O CXJ
I: IIEQ f - EH-k fl12 <
00.
k=O
By (3.16 ) and (3.17 ) we can also apply Theorem 3.5 to [ and (Qn)nEZ (instead of f and (fin)nEZ), since CXJ
CXJ
CXJ
k=O
k=O
k=O
I: IIEQ-k [112 = I: IIEQ-k EQ fl12 = I: IIEQ-k fl12 <
00
144
and
On Conditional Central Limit Theorems For Stationary Processes
00
L
00
111 -
L
Egk 1112 =
k=O
IIEg f -
Egk fl12
<
00.
k=O
o Corollary 3.2. Let T be an ergodic automorphism, (Yn)nEZ and (:F'n)nEZ be a pair of T -filtrations such that (Yn)nEZ is immersed into (:F'n)nEZ, Assume that
(3.18) and that f E L2 is a real-valued function satisfying 00
L
Ilf -
EFk fl12
<
(3.19)
00,
k=O 00
L
IIEg f -
EH-k fl12
<
00
k=O
and
00
L
IIEg-k fl12 < 00.
k=O
Set
1=
Eg f
and
1=
f - Eg f,
0en (3;,.14 )-(3.17) of Theorem 3.7 are satisfied and its conclusion applies to f and f· Moreover, the class of functions satisfying the assumptions (3.14 )(3.17 ) is dense in {f E L2 : Eg- f = a}. Remark 3.9. In many applications we have :F'_ a-subfield. This obviously implies (3.18 ).
= N where N is the trivial
Proof of Corollary 3.2. We only need to verify (3.16). This can be deduced from (2.5 ) in the statement of Lemma 2.1 as follows:
IIEg f -
Egk fl12
IIEg f - Eg(Eh f)lb IIEgU - Eh f)112 < Ilf - EFk fib
and (3.16 ) follows from (3.19 ). By (3.18 ) Remark 3.8 (2) applies and the set of functions satisfying (3.14 )-(3.17 ) is dense in {f E L2 : Eg- f = a}. 0
4 4.1
Markov chains A general result
Let (Xn)nEZ be a stationary Markov chain with state space (Sx, Ax) (where Sx is a non-empty set and Ax a a-field in Sx), transition probability Qx : Sx x
Manfred Denker and Mikhail Gordin
145
Ax -----.. [0,1] and stationary probability measure /-Lx on Ax. We assume that the random sequence (Xk)kE71. is defined on some fixed sample space (X, F, P) where the probability measure P is the distribution of the Markov chain with initial distribution /-Lx, the stationary distribution. Then every Xk maps (X, F, P) onto (Sx, Ax, /-Lx) in a measurable and measure preserving way. For every n E Z denote by Kn the cr-field in X generated by Xn and by fin the cr-field generated by {Xk : k ::::: n}, i.e. fin = Vk
Proposition 4.1. Let (Xn )nE71. be an ergodic stationary Markov chain with stationary probability measure /-Lx and transition operator Qx. If F E L 2(/-Lx) has the representation (4.20) for some G E £2(/-Lx), then, with probability 1, the conditional distributions Pn (F 0 xo, F 0 Xl, ... , F 0 xn-llfio) given fio of the random functions Rn (F 0 xo, F 0 Xl, ... , F 0 Xn-l) converge weakly to the (non-random) probability distribution W u , where cr 2 = IIGII§ - IIQxGII§ 2:: o. Proof. We apply Proposition 3.1 to F has now the form
Fo Xo
0
Xo. Indeed, the representation (3.9 )
(G 0 Xl - (QxG) 0 xo) - Go Xl + G0 H + G 0 Xo - UT (G 0 xo),
where H = Go Xl - (QxG) sufficient to notice that
0
Xo
Xo satisfies (3.8). To complete the proof it is
IIHII~ EplG 0 xl1 2 - 2Ep(((G 0 xd . (QxG)
IIGII~
4.2
-
IIQxGII~·
0
xo))
+ Epl(QxG) 0
xOl2 0
Markov chains fibred over invertible transformations
We keep the notation as in the previous subsection. In addition, let (S-rr, A-rr) be a measurable space and 'ljJ : Sx -----.. S-rr a measurable map. 'ljJ defines a stationary sequence tr n = 'ljJ 0 Xn (n E Z) with one-dimensional marginal /-L-rr = /-Lx 0 'ljJ-I, the image of /-Lx under 'ljJ. We assume that there exists an invertible measurable transformation V of S-rr onto itself such that (4.21)
146
On Conditional Central Limit Theorems For Stationary Processes
Since (Xn)nEZ is a stationary sequence with one-dimensional distribution /-ix, it follows from (4.21 ) that V preserves /-i'Tr' Next, consider the following identity for the transition operator Qx, for all bounded, Ax-measurable functions F on Sx and all bounded, An-measurable functions G on S'Tr :
(Qx((G
0
1j;)F))(·) = G(V(1j;(·)))(QxF)(-)'
(4.22)
If Sx,z = 1j;-l(Z) denotes the fibre over Z E S'Tr) then property (4.22 ) means that the transition probability for an initial point x E Sx is concentrated on the fibre Sx,V('I/J(x))' In this case the transition operator Qx is fibred over the transformation V, and (S'Tr) A'Tr' /-L'Tr) and V are called the base probability space and the base transformation, respectively.
Fix some xESx- We are interested in the distribution of (xn)n;::::O conditioned by the constraints Xo = x, 1j; (xn) = vn (1j; (x) ) n E Z. In order to describe this behaviour let C be a a-field generated by some fixed random variables 'Trl. The following observation follows from Propsition 4.1 by passing from the a-field Ji o to the coarser a-field C.
Proposition 4.2. Let Qx be an ergodic transition probability with stationary probability measure /-LX) and assume that Qx is fibred over a transformation V with base probability space (S'Tr' A'Tr' /-i'Tr)'
If FE L 2 (/-ix) has a representation (4.20 )
F= G-QxG for some G E L2 (/-L x), then, with probability 1, the conditional distributions Pn(F 0 xo,F 0 x1, ... ,F 0 xn-1IC) of the random functions Rn(F 0 xo,F 0 Xl, ... , F 0 Xn-1) converge weakly to the (non-random) probability distribution W a , where a 2 = IIGII~ - IIQxGII~ ~ o. The same conclusion holds for
F=F
- EA~F, where A~
= 1j;-l(An).
Proof. First note that the first claim follows from Proposition 4.1. assumptions we have the identity
By the
~
which implies that both functions F and F defined by
F = EA~ F , F = F also satisfy (4.20 ), because EA~(G - Qx G) ~
=
-
F
EA~G - QxEA~G.
(4.23)
o
~
Remark 4.1. Only F defines a stationary process fox n (n ~ 0) with a possibly non-generate CLT, while F has the form F = Go V - G, hence is a coboundary and defines a stationary process with a degenerate limit in the CLT. For a function F with decomposition (4.23 ), we can always assume that the function G in a representation (4.20 ) has a decomposition of the form (4.23 ) as well, I.e.
Manfred Denker and Mikhail Gordin
147
and Under this condition (4.20 ) admits at most one solution. Functions satisfying (4.20 ) form a dense subset in LdjJx). This follows from the fact that their orthogonal complement in L 2 (jJx) is the space of Q:-invariant functions, whence are constant by ergodicity of Qx' They are also dense in the subspace of functions F, satisfying (4.24)
Remark 4.2. There are different strategies to obtain (4.20 ) for a given function. If Qx is a normal operator (in the sense that it commutes with its conjugate), very precise conditions for (4.20 ) to hold can be given in terms of the spectral decomposition of F relative to Qx ([4]). For a function F a solution G to the equation (4.20 ) can be written down as a formal power series: 00
G= LQ~F. n=O
(4.25)
In some cases this series converges with respect to an appropriate norm.
Remark 4.3. Fibration over the base space is of particular interest for fibred dynamical systems (see [5]). The fibres are given by Sx,z = 'l/J-l(z) and the measure jJx has a disintegration into probability measures jJx,z which are supported on the fibres Sx,z' Under (4.22 ) fibrewise transition probabilities are defined by Q~,n) : Sx,z x A(vn(z)) -+ [0,1], Q~,n)(x,A) = Q~(x,A),
x E Sx,z, A E A(vn(z)),
where z E S1f and A(z) is the restriction of the a-field Ax to the fibre Sx,z' The family (Q~,n»)zES",n::::O is measurable in z and satisfies the cocycle identity in n, l.e.
1
Q~k(z),l) (u, A)Q~,k) (x, du) = Q~,k+l)(x, A),
(4.26)
S",,vk(z)
for z E S1f' X E Sx,z, A E A(Vk+l(z)), k, I 2': O. The transition probability Q~,n) transports the conditional measure jJ(x,z) on the fibre Sx,z to the conditional measure jJ(x,vn(z» on Sx,vn(z)' The condition (4.24 ) means that F has vanishing integrals with respect to each fibre probability measure jJx,z, thus defining the family of function spaces on fibres Sx,z given by functions of vanishing integral with respect to jJ(x,z)' The family Q~,n) also defines a family of operators between these function spaces with the cocycle property (4.26 ) (the operator Q~,n) maps functions on the fibre Sx,Vn(z) to those on Sx,z). They also preserve integrals with respect to the conditional measures, in particular, the set of function with integral 0 is invariant with respect to these operators. Various conditions are known in the literature ensuring that this family of operators, restricted to spaces of functions with vanishing integrals over all fibres, are contractions with respect to an appropriate norm (provided
148
On Conditional Central Limit Theorems For Stationary Processes
n is sufficiently large). For example, in the case of immersed finite state Markov chains considered in Theorem 5 of [2] (we shall treat the immersed case in the next section avoiding such considerations) there are only finitely many types of finite fibres with finitely many types of transition probabilities between them. Under some additional assumptions the contraction property is ensured in the uniform norm. Alternatively, assuming that Sx is a metric space, we can use Holder norms to achieve the contraction property. This technique is often used in connection with thermodynamic formalism and its relativized version (see [5] and references therein). The transfer operator considered there is a generalization of the transition operator, because it does not need to preserve the space of constant functions; however, it is a specialization at the same time, because the "reversed process" is deterministic. Notice, that there is no need to apply the Hilbert projective norm technique because we assume the existence of a stationary probability measure (though this technique is very helpful in proving the existence of these measures).
4.3
Reduction of conditional Markov chains to chains with deterministic base
In this section we sketch the application of subsection 4.2 to the general problem mentioned in the introduction. Recall that we are interested in the asymptotic distribution of L~:~ !((k) given 'TJo, ... , 'TJn-l, where (k = (~k' 'TJk) is a two component strictly stationary homogeneous Markov chain. Let ((khEZ be a stationary homogeneous Markov chain. Its state space is denoted by (S(, Ad (where S( is a set and A( is a O"-field in Sd, its transition probability by Q( : S( x A( -----+ [0,1] and its stationary probability measure by /1< on A(, i.e. E(F((n+l)l(k, k ~ n) = (Q(F)((n). We assume that the random sequence ((k)kEZ is defined on some fixed probability space (X,F, P) where the probability measure P is derived from the stationary distribution f-t( (as in subsection 4.2). Then every (k maps (X, F, P) onto (5(, A(, f-td in a measurable and measure preserving way. For every n E Z denote by An the O"-field in X generated by (n and by Fn the O"-field generated by {(k : k ~ n}, i.e. Fn = Vk
... ) -: (... ,~o, ZI, Z2"")' The~et Sx consists of those ~airs (~, z)
S; ~ whlch satlsfy cp(xo) E
Srr WIth x - ( ... , X-I, xo), Z - ( ... , Z-I, ZO, ZI,"') Then we set 'ljJ((x, z)) = z. The random sequence (Xn)nEZ can be de-
Zoo
Manfred Denker and Mikhail Gordin
149
fined on the same probability space (X, F, P) as ((n)nEZ by setting Un = (( ... , (n-1, (n), ( ... , 'T/n-1, 'T/n, 'T/n+1,··· )), n E IZ, where 'T/n in the second coordinate marks the position 0 in the infinite string. It is obvious that Un oT = Un+1 and that (Un)nEZ generates F. Therefore (X, F, P) can be also considered as the path space of (Un)nEZ. Note that (Un)nEZ is a Markov chain, because (Un)nEZ is a random sequence for which the past can be reconstructed from the present. Now we see that we are essentially in the situation of subsection 4.2. The operator Qx can be defined correctly at least as an operator on £2(Sx, Ax, /1x), and Proposition 4.2 applies. Given a function F on S" the problem remains to check (4.20 ) for the function F' defined on S x by
F'(( ... , X-I, xo), (... , Z-l, ZO, Zl, ... )) = F(xo). First we need to subtract from F' the function Z f--+ J F'(u)/1x,z(du), the conditional expectation with respect to the base. Then we may prove, for example, convergence of the series (4.25 ) for the function F' - J F'(u)/1x,z(du). As to the behavior of the random sequence (E(F 0 ((n)I{'T/dkEZ)nEZ related to the function Z f--+ J F'(u)/1x,z(du), it requires some estimates showing that /1x,z is mainly determined by the finite part of the sequence z. In Bezhaeva [2] this is assured by condition (A).
4.4
Immersed Markov chains
We keep the notation of the previous subsection. Let Q be a transition probability on S x A and A' be a a-subfield of A. Then Q is said to be A'-compatible if the transition probability Q(., A) is a A'-measurable function for every A E A'. Let ((n)nEZ be a stationary Markov chain and ('T/n)nEZ be a random sequence defined by 'T/n = 'P((n), n E IZ. We say that ('T/n)nEZ is immersed into ((n)nEZ, if Q, is 'P- 1 (A'T/ )-compatible. Under this condition a straight forward calculation shows that the sequence ('T/n)nEZ is a Markov chain, and that the filtration 9n = Vk
Zo,
Q(z,l)
depends on Zo only where z =
Zl' ... );
ii) the conditional measure /1z is a function of Zo, Z-l, ... (here again z
=
( ... ,Z-l,ZO,Zl, ... )).
Recall that 9 is the a-field generated by ('T/n)nEZ and A~ = 'I/'-l(A1l") is the a-field on the state space of the Markov chain (Un)nEZ generated by the map '1/'. In other words it is generated by the map (x, z) f--+ z, where x = ( ... , X -1, xo) and z = ( ... , Z-l, Zo, Zl' ... ). Let A be the map sending (x, z) to Xo. In the following theorem we use the notations introduced above.
150
On Conditional Central Limit Theorems For Stationary Processes
Theorem 4.4. Let ((n)nEZ be a Markov chain and 7]n = tp((n) , n E Z. Assume, that (7]n)nEZ is immersed into ((n)nEZ. Let (Xn)nEZ denote the Markov chain associated to ((n)nEZ as in subsection 4.3. For a function F' = F 0 A on Sx, define p' = EA~ F', and p' = F' - P'.
If the functions F and
P'
admit representations
and
P' = G' -
QxG',
where G E L 2 (/-Lc,) and G' E L 2 (/-Lx), then [ the assumptions of Proposition 3.2. Thus,
= P' 0 Xo and 1 = P' 0 Xo satisfy
1, ...
i) the distributions Pn(1, UT ,u:;.-11) of the random functions Rn(1, UT ,u:;.-11) converge weakly to the probability distribution Wo:, where &2 = IIGII~ -IIQd~ ~ 0;
1,· ..
ii) with probability 1, the conditional distributions Pn ([, UT [, ... ,U:;'-I lIHo) given 110 of the random functions Rn(J, UT [ , ... ,U:;'-I[) converge weakly
to the (non-random) probability distribution W 0:, where ;:;2 =
II G' I ~ -
IIQxG'II~ ~ o. Proof. We apply Proposition 3.2 to the functions f and f· It is clear from the proof of Proposition 4.1 that [ satisfies the condition (3.13 ) of Proposition 3.2 with ?i = G' 0 Xo.
Setting pn = (... , 7]n-l, 7]n) we introduce a stationary Markov chain (Pn)nEZ with state space Sp, transition operator Qp and stationary measure /-Lp. Let X : Sx ----t Sp be the map sending (( ... , X-I, Xo), ( ... , Z-I, ZO, ZI' ... )) to ( ... , Z-I, zo) and by A" the a-field in Sx generated by x. Then by immersion it follows that EA~ (G 0 A) is A"-measurable. Therefore, it can be written in the form GoA with an appropriate function G on Sp, and, applying the immersion property again, we obtain This implies the representation
P'(xn) whence (3.12 ) holds for
=
G(Pn) - (QpG)(Pn),
1 with 9 =
G(Pn).
It follows that Proposition 3.2 applies to the function f
= f + f·
o
Bibliography [1] Beghdadi-Sakrani M., Emery M.: On certain probabilities equivalent to coin-tossing, d'apres Schachermayer. Sem. de Probab. XXXIII, Lect. Notes in Math. 1709 (1999), 240-256.
Manfred Denker and Mikhail Gordin
151
[2] Bezhaeva Z.l.: Limit theorems for conditional Markov chains. Theory of Probability and its Applications (translated from the russian original) 163 (1971), 428-437. [3] Billingsley P.: Convergence of Probability Measures. John Wiley & Sons, Inc. New York-London-Sydney-Toronto, 1968. [4] Borodin A.N., Ibragimov LA.: Limit Theorems for Functionals of Random Walks. Proc. of Steklov Inst. of Math. 195, Providence RI, 1995. [5] Denker M., Gordin M., Heinemann S.-M.: On the relative variational principle for fibre expanding maps. Ergodic Theory and Dynamical Systems 22 (2002), 757-782. [6] Gordin M.l.: The central limit theorem for stationary processes. Russian. Dokl. Akad. Nauk SSSR 1889 (1969), 739-741. [7] Gordin M. 1., LifSic B. A.: Central limit theorem for stationary Markov processes. Russian. Dokl. Akad. Nauk SSSR 2394 (1978), 766-767. [8] Jacod J., Shiryaev A. N.: Limit Theorems for Stochastic Processes. Grundlehren der math. Wiss. 288 Springer Verlag Berlin etc. 1987. [9] Kifer, Y.: Limit theorems for random transformations and processes in random environments. Trans. Amer. Math. Soc. 350 (1998), 1481-1518.
[10] Rubshtein, B.-Z.: A central limit theorem for conditional distributions. Convergence in Ergodic Theory and Probability (Bergelson, March, Rosenblatt eds.), Walter de Gruyter, Berlin 1996, 373-380.
152
On Conditional Central Limit Theorems For Stationary Processes
Polynomially Harmonizable Processes and Finitely Polynomially Determined Levy Processes A. Goswami Indiana University
and A. Sengupta Indian Statistical Institute
Abstract The sequence {Pk (t, x)} of two-variable Hermite polynomials are known to have the property that, if {Mt, t 2': O} denotes the standard Brownian motion, then Pk(t, M t ) is a martingale for each k 2': 1. This property of standard Brownian motion vis-a-vis Hermite polynomials motivated the general notion of "polynomially harmonizable processes". These are processes that admit sequences of time-space harmonic polynomials, that is, two-variable polynomials which become martingales when evaluated along the trajectory of the process. For Levy processes, this property is connected to certain properties of the associated Levy /Kolmogorov measures. Moreover, stochastic properties of the under lying processes (like independence, stationarity of increments) turn out to be equivalent to certain algebraic/analytic properties of the corresponding sequence of polynomials. We first present a brief survey of these recently obtained general results and then describe necessary and sufficient conditions for certain classes of Levy processes to be uniquely determined by a finite number of time-space harmonic polynomials.
AMS (1980) Subject Classification: Primary 60F05, Secondary 60J05 Keywords: Time-Space Harmonic Polynomials, p-Harmonizability, Levy Processes, Hermite Polynomials, Charlier Polynomials, Finitely Polynomially Determined Processes, Semi-Stable Markov Processes, Intertwinning Semigroups
1
Introduction: General Definitions
The sequence of two-variable Hermite polynomials {Pk , k ~ I} on [0, (0) x lR are defined via the classical one-variable Hermite polynomials {Pk, k ~ I} as follows:
where
Pk(X) = (_1)ke X2 /2 ::k (e- x2 / 2 ). Some of the well-known properties of the sequence {Pd are:
•
Pk(t, x) is a polynomial in the two variables t and x, for each k. 153
Polynomially Harmonizable Processes
154
•
Pk(t, .) has degree k in x, with the leading term having coefficient 1.
• •
& -Pk(t,x) =&t
(k)2 Pk- 2(t,X) = --12 &22Pk(t,x), for each ~
uX
k
> 2. -
For the last two properties, we take Po(t, x) _ 1. The first two properties simply tell us that we can write k
Pk(t,x)
=
LP;k)(t)x j , j=O
where the p;k) (t) are polynomials in t and pik)(t)
==
1.
The sequence {Pk } of Hermite polynomials as defined above is known to have some deep connections with the standard Brownian motion. One of these is the well-known fact that if {Mt, t ~ O} denotes the standard Brownian motion, then for each k, {Pk(t, M t ), t ~ O} is a martingale (for the natural filtration of {Md) and standard Brownian motion is the only process with this property. Moreover, if P(t, x) is any two-variable polynomial such that {P(t, M t )} IS a martingale, then P belongs to the linear span of the sequence {Pk }. A natural question that arises is: which stochastic processes admit such a sequence of 2-variable polynomials which when evaluated along the trajectory of the process are martingales and, if so, to what extent do these polynomials determine the process? Also, is it possible to get the sequences of polynomials so as to satisfy properties similar to those of the Hermite polynomials mentioned above? These questions were investigated in detail in Goswami and Sengupta [2] and Sengupta [6]. Following are some notations and definitions that were introduced in these works. Here we restrict ourselves only to continuous-time processes. Let M = {Mt, t ~ O} be a stochastic process on some probability space. The time-space harmonic polynomials for the process M are defined to be all those two-variable polynomials P(·,·) such that {P(t, M t )} is a martingale (always for the natural filtration of M). The two variables will be referred to as repectively the 'time' and the 'space' variables. The collection of all time-space harmonic polynomials for a process M will be denoted P(M). In other words,
P(M):= {P : P is a 2-variable polynomial and {P(t,Mt )} is a martingale} k
Any two-variable polynomial P can be written as P(t,x)
= L Pj(t)x j , for j=O
some k, where each Pj(t) is a polynomial in t. If in the above representation, Pk (t) =j:. 0, we say that P is of degree k in the 'space' variable x. For a stochastic process M = {Mt }, we define Pk (M) to be the collection of those time-space harmonic polynomials which are of degree k in the space variable, that is,
Pk(M)
:=
{P E P(M) : P is of degree k in the space variable x}.
A. Goswami and A. Sengupta
155
Clearly,
P(M)
=
UPk(M). k
Definition: A stochastic process M is said to be polynomially harmonizable (p-harmonizable, in short) if Pk (M) =1= 0, for all k :2: 1. In this terminology, standard Brownian motion is a p-harmonizable process. Indeed, Brownian motion is p-harmonizable in a somewhat stricter sense, to be understood below. For a process M, let us denote P k (M) to be the set of those time-space harmonic polynomials of degree k in x, for which the leading term in x is 'free' of t, that is, the coefficient of xk is a non-zero constant. In other words, k
PdM)
:=
{P E Pk(M) : P(t,x)
LPj(t)x j with Pk(-) a non-zero constant},
=
j=O
and we let,
P(M)
:=
UPk(M). k
Clearly, Pk(M)
c Pk(M)
'II k and so, P(M)
c P(M). Also, if Pk(M)
=1=
0,
k
then there is P(t, x) =
2:= Pj(t)x j
E Pk(M) with Pk(')
== 1.
j=O
Definition: A stochastic process M is said to be p-harmonizable in the strict sense if Pk(M) =1= 0, for all k :2: 1. The second property ofthe two-variable Hermite polynomials listed earlier shows that standard Brownian motion is actually p-harmonizable in the strict sense. The other classical example of a strict sense p-harmonizable process is the Poisson process. For a Poisson process, with intensity 1 for example, a sequence of time-space harmonic polynomials is given by the so-called two-variable Charlier polynomials
where {
~
} denote the Stirling numbers of the second kind. The Gamma
process is another example of a strict sense p-harmonizable process. In keeping with the special properties of the sequence of Hermite polynomials mentioned earlier, we introduce here a list of properties for a sequence of twovariable polynomials. Let {Pk, k :2: I} be a sequence oftwo-variable polynomials with Pk being of degree k in x. We define Po == 1. Let us write Pdt,x) = k
2:= p(k)(t)x j , where . 0 J
the pY)(t) are polynomials in t. We are going to refer to
J=
the following properties in the sequel.
(i) Strict sense property: For each k :2: 1, p~k)(.) == 1.
Polynomially Harmonizable Processes
156
aPk
(ii) The Appell property: For each k ?: 1, ax
=
kPk- 1 , that
.
. (k)
IS,
JPj (t) =
kPJ-l (k-l)(t) , 1< ·
(iii) The pseudo-type-zero property: There exists a real sequence {hk} such that aPk for each k ?: 1, - a t
=
z=k (k) hiPki
i=l
i,
that
.
IS,
d (k) -d Pj (t)
t
=
k-j (k)
z=
i
(k-i) hiPj (t),
i=l
1 :::; j :::; k.
(iv) Uniqueness property: For each k > 1, PdO,x) 0, 0:::; j :::; k - 1.
=
xk, that is, pr)(O)
The sequence of Hermite polynomials satisfies all the properties (i) - (iv); property (iii) holds here with h2 = -1 and hk = 0 for k =J- 2. It is easy to verify that the two variable Charlier polynomials satisfy these properties as well. Theorems 2.3 and 2.4 in the next section will establish that these are reflections of the fact that both Brownian motion and Poisson process are homogeneous Levy processes. Let us make some basic observations about the properties listed above. First of all, with the convention that Po - 1, property (i) will always imply property (ii). Secondly, in our applications, the sequence {Pk } will be arising as time-space harmonic polynomials of a process M. Now if, the process itself happens to be a martingale, we can always take PI = x, in which case property (ii) will actually imply a slightly stronger property than (i), namely, (i/) for each k ?: 1, Pdt, x) - xk has degree at most k - 2 in x, that is, p~k) == 1 (k) and Pk-I = O. Properties (ii) and (iii) for a sequence of polynomials were studied analytically in an entirely different context in Sheffer [7], which is the source of our terminolgy for these properties in this context. It turns out that for a stochastic process M, the properties (ii), (iii) and some other algebraic/analytic properties the corresponding sequence of time-space harmonic polynomials are intimately connected to some stochastic properties of M.
2
Levy Processes and p-Harmonizability
In this section, we describe some of the results on p-harmonizability of Levy processes. Details of these can be found in [6]. Discrete-time versions of many of these results were proved earlier in [2]. For us, a Levy process will mean a process M = {Mt, t ?: O} with independent increments and having no fixed times of discontinuity. A homogeneous Levy process is one which is homogeneous as a Markov process, that is, whose increments are stationary besides being independent. In the results that follow, we will often need to impose two conditions on the process M, to be referred to as the moment condition and support condition. They are as follows: • Condition (Mo)
For all t, M t has finite moments of all orders.
A. Goswami and A. Sengupta
157
i 00,
• Condition (Su) : There is a sequence tn Isupport (MtJ I > k for infinitely many tn.
such that, for all k 2:: 1,
The moment condition (Mo) is clearly necessary for the process to be p-harmonizable. The role of the condition (Su) is more technical in nature. However, it may be noted that any homogeneous Levy process always satisfies this condition (unless, of course, it is deterministic). For a general Levy process, a simpler condition that gurantees (Su) is that M t - Ms be non-degenerate for all 0 S; s < t, that is, the increments are all non-degenerate. We now state some of the main results from [6].
Theorem 2.1. Any homogeneous Levy process M = {Mt, t 2:: O} with Mo == 0 and satisfying the conditions (Mo) and (Su) is p-harmonizable in the strict sense. Moreover, there exists a unique sequence P k E P k (M), k 2:: 1 satisfying properties (i) - (iv) and such that P(M) is just the linear span of {Pk , k 2:: I}. Further, the process M is uniquely determined by the sequence {Pd upto all the moments of its finite-dimensional distributions. Remark: (i) The fact that P(M) equals the linear span of {Pk, k 2:: I} implies, in particular, that P(M) = P(M). This is actually a special case of a more general fact proved by Goswami and Sengupta in [2], namely, that for any process M satisfying (Su), if Pk(M) -=/=- 0 V k, then Pk(M) = Pk(M) V k. (ii) The property of M being determined by the sequence {Pd can be strengthened as follows. If we assume, for example, that for some t > 0 and E > 0, E (exp{ aMd) < 00 V Ia I < E, then the sequence {Pd completely determines the distribution of the process M. Theorem 2.2. Let M = {Mt, t 2:: O} be a Levy process with Mo == 0 and satisfying the conditions (Mo) and (Su). Then M is p-harmonizable if and only if for each k 2:: 1, E(Mtk ) is a polynomial in t. In this case, there exists a unique sequence P k E Pk(M), k 2:: 1 satisfying properties (i), (ii) and (iv) and such that P(M) is just the linear span of {Pk , k 2:: I}. Further, the process M is uniquely determined by the sequence {Pk } upto all the moments of its finite-dimensional distributions. Remark: Note the absence of the pseudo-type-zero property (iii) in this case. In fact, property (iii) would not hold unless the process is homogeneous. [see Theorem 2.4]. We now describe a characterization of p-harmonizability of a Levy process M in terms of the underlying Levy measure, or, equivalently the Kolmorov measure. Associated to any Levy process, there is a a-finite measure m on [0,00) x (JR \ {O}), called its Levy measure, such that, E{exp(iaMt)} exp [iafL(t) - !a 2 a2 (t)
+
J
(e iCW
-
1- 1
i~:2
)
m([O, t]
@
dU)] ,
where fL(') and a 2 (.) are the mean and variance functions of the 'gaussian part' of M. It can be shown that p-harmonizability of M is equivalent to requiring
Polynomially Harmonizable Processes
158
that all the following functions be polynomials in t:
and for k
> 2, hk(t)
=
J
ukm([O, t] ® du).
The above characterization takes on a slightly simpler form when expressed in terms of what is known as the Kolmogorov measure associated with the process. It is the unique Borel mesure L on [0, (0) x lR such that log E{ exp(io:Mt)} = io:v(t)
+
J(
eiaU -
1u2
iO:U) L([O, t] ® du),
where v(t) = EMt is the mean function of the process M. We refer to Ito [3] for the definition and the transformation that connects the Kolmogorov measure and the Levy measure. A necessary and sufficient condition for pharmonizability of the process M is that : v( t) as well as the functions hk (t) = Ju k - 2 L([0,t] ®du), k 2: 2 are all polynomials in t. We have seen that for any Levy process M satisfying the conditions (Mo) and (Su), we can get a sequence P k E Pk(M), k 2: 1, such that Appel property (ii) holds. Moreover, if M is homogeneous, then the sequence {Pk} can be chosen so as to satisfy the pseudo-type-zero property (iii). The next two results show that, under some conditions, the converse is also true. In both the following theorems, M = {Mt, t 2: O} will denote a continuous-time stochastic process with r.c.I.I. paths starting at Mo == and satisfying conditions (Mo) and (Su) and {Ft, t 2: O} will denote the natural filtration of M.
°
Theorem 2.3. If there exists a sequence Pk E Pk(M), k 2: 1, satisfying the Appel property (ii), then for each 0 ::; s < t, the conditional moments E((Mt-Ms)kIFs) are degenerate for all k. If moreover, for each t, the momentgenerating function of M t is finite on some open interval containing 0, then M is a Levy process. Theorem 2.4. If there exists a sequence P k E Pk(M), k 2: 1, satisfying both the Appel property (ii) and the pseudo-type-zero property (iii) and if for each t, the moment-generating function of M t is finite on some open interval containing 0, then M is a homogeneous Levy process. Remark: Under the hypothesis of either of the above theorems, it can further be shown that the sequence {Pd satisfies the properties (i) and (iv) as well and is the unique sequence to do so. Moreover, the sequence {Pd span all of P(M) and also determines the distribution of M.
Next, we briefly mention some connections between the time-space harmonic polynomials of a process and what is known as semi-stability property, as developed in Lamperti [5]. Recall that a process M with Mo _ 0 is called semi-stable of index (3 > 0 if for every c > 0, the processes {Met, t 2: O} and {c!3 M t , t 2: O} have the same distribution. It can be easily shown that if {Pk E Pk(M)} is
A. Goswami and A. Sengupta
159
a sequence of time-space harmonic polynomials of a semi-stable process M, of index (3, then each P k satisfies the following homogeneity property:
where Pk(·) is the one-variable polynomial Pk(l, .). In other words, each P k is homogeneous in t f3 and x. It can be shown that, under mild technical conditions, the converse is also true, that is, the existence of a sequence {Pk E Pk (M)} such that each P k is homogeneous in t f3 and x, for some (3 > 0, implies that the process M is semi-stable of index (3. It is also worthwhile to point out here that if a process M admits a sequence {Pk} of time-space harmonic polynomials which are homogeneous in t f3 and x, then 2(3 must be an integer and that in case 2(3 is odd, the finite dimensional distributions of M are all symmetric about o. Finally, let us mention how an intertwining relationship between two markov processes, as developed in Carmona et al [1] relates the time-space harmonic polynomials of the two processes. If M and N are two markov processes with semigroups (Pt ) and (Qt) respectively, one says that the two processes (or, the two semi groups ) are intertwined if there exists an operator A such that APt = QtA V t. In many cases, the operator A is given by the "multiplicative kernel" for a random variable Z, that is, AJ(x) = EJ(xZ). In such a case, it is k
easy to show that, if P(t, x) =
L
Pj(t)x j is a time-space harmonic polynomial
j=O k
for the process M, then P(t, x)
= AP(t, x) = L pj(t)E(Zjx j ) is time-space j=O
harmonic for N. This has proved to be very useful in that if one knows the time-space harmonic polynomials of a process M, then one can get those for other processes which are intertwined with M. This is illustrated with examples in Section 4.
3
Finitely Polynomially Determined Levy Processes
In this section, we address the main question of this article, which involves obtaining a characterization of Levy processes whose laws are determined by finitely many of its time-space polynomials. In a sense, this is an extension of Levy's characterization of standard Brownian motion, which says that, under the additional assumption of continuity of paths, standard Brownian motion is characterized by two of its time-space harmonic polynomials, namely, the first two 2-variable Hermite polynomials PI (t, x) = x and P 2 (t, x) = x 2 - t. One knows that the continuity of paths is a crucial assumption here, without which the characterization does not hold. In the results that follow, the only path property we will assume is the standard assumption of r.c.I.I. paths for Levy processes. Let us start with some general definitions. Let C be a given class of processes.
Polynomially Harmonizable Processes
160
Definition A process M E C will be called k-polynomially determined in C (in short, k-p.d. in C), if Pj(M) =J 0, V j ::; k, and, for any N E C, Pj(N) = Pj (M) V j ::; k =? N d M. (Here:1:: means equality in distribution.) Processes which are k-p.d. in C for some k ;::: 1 are called finitely polynomially determined in C (in short, f.p.d. in C).
Let us remark here that an f.p.d. process need not be p-harmonizable. A general question that we may address is: for what classes of processes C, can one get a complete characterization of the f.p.d. members of C? For two important classes of processes, such a complete characterzation has been obtained and are presented below. The first result characterizes the f.p.d. processes in the class of all homogeneous Levy processes. As mentioned in the previous section, for any Levy process M, one has the representaion log(E( eiaMt ))
io:v(t)
+J
io:f.-t(t) -
(e
iaU
-u~ - iO:U) L([O, t] ® du)
~o:2(J2(t) + J
(e iaU - 1 - 1
i::
2)
m([O, t] ® du),
where Land m are called respectively the Kolmogorov measure and the Levy measure associated to the process M. In case M is homogeneous, the measures Land m turn out to be the product measures
L(dt ® dx) = dt ® l(dx), m(dt ® dx) = dt ® 77(dx) , where I and 77 are (J-finite measures on lR and lR \ {O} respectively and the above representations take on the following special forms
io:vt + t
J(
eiaU - 1 -
u2
io:f.-tt - .lo:2(J2t + tJ 2
iO:U) l(du)
(e iau - 1 -
io:u ) 77(du).
1 +u 2
It may be pointed out in this connection that the relation between the measures land 77 is simply given by
An important property of I that will be used subsequently is that for all k ;::: 2, the k-th cumulant of Ml equals J u k - 21(du). the following theorem now gives a characterization of f.p.d processes in the class of all homogeneous Levy processes. Theorem 3.1. A process M is finitely polynomially determined in the class of all homogeneous Levy processes if and only if the associated measure l, or equivalently the measure 77, has finite support. Proof. It is immediate from the above relation between the measures land 77 that whenever one of them has finite support, so does the other. In the proof, we will work with l.
A. Goswami and A. Sengupta
161 n
Suppose first that
l
has finite support, say,
l
= L
(;Ii6{rd,
where (;Ii
>
0, ~
=
i=1
1, ... ,n and r/s are distinct real numbers. Here 6{r} denotes the 'dirac' mass at r. We show that M is k-p.d. among homogeneous Levy processes with k = 2n+2. Let N be any homogeneous Levy process with Pj(N) = Pj(M) \:j j :;
2n+2. We will show that VN = VM and IN = IM which will imply that N!!:... M. It is easy to see that P j (N) = Pj (M) \:j j :; 2n+ 2 implies the equality of the first 2n + 2 moments of NI and M I , which in turn implies the equality of their first 2n + 2 cumulants. This entails, first of all, that VN = VM and also, in view of the above mentioned property of l, that ujIN(du) = ujIM(du) \:j j = 0,1, ... , 2n. From these, one can easily deduce that for any choice of distinct real numbers
J
In particular, taking ai
=
J
ri, \:j i, one obtains that
JiDI
(u - ai)2IN(du)
= 0,
n
implying that IN is supported on {rl,'" ,rn}, that is, IN
= L
(;I~6{r;}' for
i=1
non-negative (;I~, 1 :; i :; n. Using the facts VN = VM and J u j l M (du) \:j j = 0, 1, ... ,2n, it is now easy to conclude that (;I~ is, IN = IM.
JujIN(du) = =
(;Ii \:j
i, that
To prove the converse, suppose that M is a homogeneous Levy process for which the associated measure I is not finitely supported. We show that M is not f.p.d. by exhibiting, for any k, a homogeneous Levy process N, different from M, such that P j (N) = P j (M) \:j j :; k. This is done as follows. Fix any k ~ 1. Since 1 is not finitely supported, we can get disjoint borel sets Ai C JR, i = 1, ... ,k such that l(Ai) > 0, \:j i. Consider the real vector space of signed measures on JR defined as V
=
{,u : ,u(.) =
linear map A : V
-+
it
cile
n Ai),
ci
E JR, 1 :; i :; k} and consider the
JRk-1 defined by
A being a linear map form a space of dimension k into a space of dimension k - 1, the nullity of A must be at least 1. Choose a non-zero ,u in the null-space of A. Further, we can and do choose ,u so that 1,u(Ai)1 < l(Ai), \:j i. If we now define i = l + ,u, then i is a positive measure with i f- l but J u j i( du) = J ujl(du), \:j j = 0"" ,k - 2. It is now easy to see that if N is the homogeneous Levy process with VN = VM and Kolmogorov measure L(dt ® dx) = dt ® i(dx) , d
then Pj(N) = Pj(M)
\:j
j :; k but N
f-
M. 0
Remarks: (i) A simple interpretation of the above therorem is that a homogeneous Levy process is f.p.d. if and only if its jumps, if and when they occur, are of sizes in a fixed finite set.
(ii) The proof of 'if' part of the theorem shows that if the measure l is supported on precisely k many points, then the process is determined by its first 2k + 2
Polynomially Harmonizable Processes
162
many time-space harmonic polynomials. A natural question is whether 2k + 2 is the minimum number of polynomials necessary. As we shall see in Section 4, that is indeed the case for the most common examples of homogeneous Levy processes. We conjecture that it is perhaps true in general. Our next reult will give a similar characterization of the f. p.d. property in a more general class of Levy processes than the homogeneous ones. To be specific, we consider the class of those Levy processes for which the Kolmogorov measure admits a 'disintegration' w.r.t. the Lebesgue measure on [0,(0). Formally, let us say that the Kolmogorov measure L of a Levy process M admits a 'derivative measure' I if L(dt,dx)
=
l(t,dx)dt,
where l(t, A), t E [0,(0), A E B is a transition measure on [0,(0) x B. Here B denotes the Borel a-field on R We denote C to be the class of all those Levy processes whose Kolmogorov measure admits such a derivative measure. Clearly, all homogeneous Levy processes belong to this class, since in that case l(t,·) == l(·). The class C is fairly large. For example, Gaussian Levy processes as well as non-homogeneous compound Poisson processes belong to this class. Since C is clearly a vector space, any Levy process that arises as the sum of independent Levy processes of class C also belong to this class. As expected, our characterization of f.p.d. processes among the class C will be in terms of the derivative measure l(t,·) defined above and the general idea of the proof runs along the same lines as in the case of homogeneous Levy processes. However, the actual argument becomes a little more technical. For example, we would show that a process M in the class C cannot be k-p.d. unless for almost all t, the derivative measure l(t,·) is supported on at most k points. This is the content of the following Lemma 3.1. The idea of the proof is analogous to that of the 'only if' part of Theorem 3.1 for homogeneous Levy processes. That is, assuming the contrary is true, we will have to define a new process N in class d
C such that Pj(N) = Pj(M) V j :s; k but N =I- M. However, getting hold of this process N or equivalently its derivative measure l( t, .) involves using an appropriate variant of a result of Descriptive Set Theory, known as Novikov's Selection Theorem, stated below as Lemma 3.2. We refer to Kechris [4] for details. Lemma 3.1. Suppose the process M is k-polynomially determined in class = {t >
C. Then for any version of l, the set T c [0, (0) defined by T Isupp(l (t, .)) I > k} is Borel and has lebesgue measure zero.
°:
We omit the proof of this lemma here. As mentioned above, the proof uses the following selection theorem (see [6] for details). Lemma 3.2. Suppose U is a standard Borel space and V is a a-compact subset of a Polish space. Let B c U x V be a Borel set whose projection to U is the whole of u. Suppose further that, for each x E U, the x-section of B is closed in V. Then there is a Borel measurable function 9 : U -+ V whose graph is contained in B.
A. Goswami and A. Sengupta
163
We now state and prove the characterization result for f.p.d.-processes in the class C.
Theorem 3.2. Let M be a Levy process of the class C. (a) If there exists an integer k :2: 1 and a measurable function (XI,··· ,Xk,PI,··· ,Pk) : [0,(0) --t]Rk X [O,oo)k such that (i) for each j k
0,1, ... ,2k,
2: Pi(t)(Xi(t))j
is a polynomial in t almost everywhere, and, (ii)
i=l k
l(t,·)
=
2: Pi(t)O{x;(t)}(-)
is a version of the derivative measure for M, then M
i=l
is finitely polynomially determined (indeed, (2k + 2)-polynomially determined) in C. (b) Conversely, if M is finitely polynomially determined in C, then there exists an integer k :2: 1 and a measurable function (Xl, ... ,Xb PI, ... ,Pk) : [0,(0) --t ]Rk X [0, oo)k such that a version of the derivak
tive measure associated with M is given by l(t,·)
=
2: Pi(t)O{x;(t)}(-). i=l
As mentioned above, the idea of the proof is similar to the homogeneous case except that it is a little more technical. One of the key observations used in the proof is that for a process M in the class C, Pj(M) =I- 0, 1 :s: j :s: k if and only if the first cumulant CI (t) of the process M is a polynomial in t and for all 2 :s: j :s: k, the functions t f-----+ J u j - 2l(t, du) are polynomials in t almost everywhere, where l is a version of the derivative measure associated to M. Using this, here is a brief sketch of the proof of the theorem.
Proof. (a) In view of the above observation, the conditions (i) and (ii) clearly imply that Pj(M) =I- 0, 1 :s: j :s: 2k + 2. If now N is another process of class C with Pj(N) = Pj(M) V 1 :s: j :s: 2k + 2, then it will follow that N has the same mean function as M and also for all O:S: j:S: 2k, JujlN(t,du) = JUjlM(t,du) for almost all t E [0, (0), where IN and lM denote (versions of) the derivative measures associated with Nand M respectively. Consequently, one will have k
J
IT (U- Xi(t))2lN(t,du) = i=l
k
J
IT (u- xi(t))2lM(t,du).
By the same argument as
i=l
in the proof of the 'if' part of Theorem 3.1, we get IN(t,·)
= lM(t,·)
for almost
d
every t, and hence N = M. (b) Suppose that Mis k-polynomially determined in C. Using Lemma 3.1, one can get a version l(t,·) of the derivative measure associated to M such that Isupp(l(t, ·))1 :s: k for all t E [0, (0). For each 1 :s: j :s: k, let T j = {t E [0, (0) : Isupp(l(t, ·))1 = j}. It can be shown that each T j and hence UjTj is a Borel set. For t E T j , order the elements of supp(l(t, .)) as XI(t) < ... < Xj(t) and denote the 1(t, .)- measures of these points by PI (t), ... ,Pj (t) respectively. Also, for j < i :s: k, set Xi(t) = xj(t)+l andpi(t) = 0. Finally, for t 1: UjTj , set Xi(t) == Yi and Pi (t) == for all 1 :s: i :s: k, where Yl, ... ,Yk is any arbitrarily chosen set of k points. With these notations, it is clear that l (t, .) has the form asserted. One can now show that the mapping t f-----+ (XI(t),··· ,Xk(t),PI(t),··· ,Pk(t)) as 0 defined above is measurable and that completes the proof.
°
Remark: In the next section, we will see some examples of possible forms of the functions Xi(t) and Pi(t). Let us remark here that it is possible to formulate
Polynomially Harmonizable Processes
164
the definition of the class C in terms of the Levy measures and then to give a characterization involving the 'derivative measure' arising out of the Levy measure. However, it is not clear how to go beyond the class C and to even formulate a condition that will, for example, characterize the f.p.d. processes among all Levy processes.
4
Some Examples
The most commonly known examples of polynomially harmonizable processes are the standard Brownian motion and the standard Poisson process. One can easily see that for a Brownian motion with fL and a 2 as its drift and diffusion coefficients respectively, a canonical sequence of time-space harmonic polynomials is given by Pk(t, x) = (at)k/2pk(x
;!t), at
where the Pk are the usual one-variable
Hermite polynomials as defined in Section 1. Similarly, for the Poisson process with intensity A, a sequence of of time-space harmonic polynomials is given by Pk(t,x)
=
jto
(~)xj:~
{\-j}
(At)i, where
{ ~} denote the Stirling numbers of the second kind. If M is a non-homogeneous compound Poisson process with intensity function A(') and jump-size distribution F, then it is not difficult to see, using the results described in Section 2, that M is polynomially harmonizable if and only if AU is a polynomial function and F has finite moments of all orders. It is possible, though cumbersome, to get an explicit sequence of time-space harmonic polynomials.
A not so well-known example of a p-harmonizable process is the process M = BES 2 (1), the square of the 1-dimensional Bessel process. It is well-known that
. a seml-sta . bl e markov process wh ose generator IS . gIven . by dx d + 2X dx d2 ' · IS t h IS 2 Using this, one can show that M is polynomially harmonizable and that a k
sequence of its time-space harmonic polynomials is given by P k (t, x) k! tk-j J (2j)!(k-j)! X
= L (- 2)j j=O
U sing the technique mentioned at the end of Section 2, we can now get other examples of p-harmonizable processes that arise as markov processes whose semigroups are intertwined with that of the process BES2(1). Some examples of random variables which lead to interesting semigroups intertwined with that of BES 2 (1), in the sense described in Section 2, are (i) Z = Z 1. b, having the beta distribution with parameters -21 and b, 2 ' and, (ii) Z = 2Zb+~' where Zb+~ has gamma distribution with parameter b+~. The first one leads to the process BES2 (2b ), the square of the Bessel process of
A. Goswami and A. Sengupta
165
dimension 2b, while the latter leads to a certain process detailed in Yor [8] with "increasing saw-teeth" paths. Another interesting example of a process intertwined with BES2 (1) in the same way is what is called Azema's martingale (see Yor [9]) defined as M t = sgn(Bt) ·-jt - gt, t 2.: 0, where B is the standard Brownian motion and gt denotes the last zero of B before time t. The multiplicative kernel here is given by the random variable ml, ther terminal value of " Brownian meander". In [9], Yor uses Chaotic Representation Property to give an alternative proof of pharmonizability of Azema's martingale as well as each member of the class of "Emery's martingales'. As an illustration of our method, we use the time-space harmonic polynomials of BES2(1) as obtained above and the intertwinning to describe time-space harmonic polynomials for two of the cases mentioned above. In the case of BES 2(2b), a sequence of time-space harmonic polynomials are
k
(-2)j(1)j t k- j
.
i) ( .) xJ, where (Y)k stands for the prodj=O (2J)!(b+ 2 j k-J!
given by Pk(t,x) = ~. k-l
uct
TI (y + i).
i=O
For the Azema's martingale, one uses the fact the ml has a Rayleigh distribution to obtain a sequence of time-space harmonic polynomials given by Pdt, x) = k.
(k)
+ 1)hj
(t)x j , where r(.) denotes the gamma function
EHk(t, mIX)
= ~ 2~r(~
and Hk(t,x)
= ~ hjk) (t)x j are the 2-variable Hermite polynomials.
j=O k
j=O
We now discuss some examples of f.p.d. processes. First of all, it is not difficult to see that the only 2-p.d. Levy processes are those that are deterministic, that is, M t is identically equal to a polynomial pet). Our first example of a non-trivial f.p.d. process is the standard Brownian motion, which is a homogeneous Levy process with l(du) = 6{0}(du). Thus, by our Theorem 3.1, standard Brownian motion is uniquely determined among homogeneous Levy processes by its first four time-space harmonic polynomials, for example, by the first four 2-variable Hrermite polynomials. This result should be contrasted with the well-known characterization due to Levy, which says that the first two Hermite polynomials suffice if one assumes continuity of paths in addition. In contrast, our result says that among all homogeneous Levy processes, standard Brownian motion is the only one for which the first four hermite polynomials are time-space harmonic. A natural question is whether we can do with less than four. The answer is an emphatic 'no'. An example of another homogeneous Levy process for which the first three Hermite polynomials are time-space harmonic is the mean zero process determined by the Kolmogorov measure L(dt, du) = dt o l(du) , where l(du) = ~ [6{ _l}(du) + 6{1}(du)]. It is not difficult to see that any gaussian Levy process, with mean and variance functions being polynomials, is also 4-p.d.
For the homogeneous Poisson process with intensity A, one has l(du) = A6{1} , so that once again it is 4-p.d. among all homogeneous Levy processes. Here
166
Polynomially Harmonizable Processes
also, four is the minimum number needed, since one can easily construct an example of a different homogeneous Levy process for which the first three Charlier polynomials are time-space harmonic. For the non-homogeneous compound Poisson process, it can easily be seen that it is f.p.d. if and only if the jump-size distribution is finitely supported and the intensity function is a polynomial function and that in this case, it is actually (2k + 2)-p.d. where k is the cardinality of the support of the jump-size distribution. We conclude with some examples of f.p.d. processes in the class C. We have a characterization of such processes in Theorem 3.2. Here are some examples of possible forms of the functions Xi(t) and Pi(t), that appear in that Thoerem. We consider only the case k = 2. The simplest possible case is that Xl (t), X2 (t) and PI(t) 2: 0,P2(t) 2: 0 are themselves polynomials. Another possibility is that
Xl(t) = a(t) + Jb(ij,X2(t) = a(t) - Jb(ij,PI(t) = c(t) + d(t) Jb(ij,P2(t) = c(t) - d(t)Jb(ij, where a, b, c, d are polynomials so chosen that c + dVb, c - dVb are both non-negative on [0, 00). One can similarly construct other examples. From Theorem 3.2, it follows that all these would lead to processes that are f.p.d (in fact, 6-p.d.) in the class C. A. Goswami Stat-Math Unit Indian Statistical Institute 203 B.T. Road Kolkata 700 108, India ago swami @indiana.edu
A. Sengupta Division of Theoretical Statistics and Mathematics Indian Statistical Institute 203 B.T. Road Kolkata 700 108, India
Bibliography [1] Carmona, P., Petit, F. and Yor, M. (1994). Sur les fonctionelles exponentielles de certain processus de levy. Stochastics and Stochastic Reports, 47, p. 71-101. [2] Goswami, A. and Sengupta, A. (1995). Time-Space Polynomial Martingales Generated by a Discrete-Parameter Martingale. Journal of Theoretical Probability, 8, no. 2, p. 417-431. [3] Ito, K. (1984). Lectures on Stochastic Processes. TIFR Lecture Notes, Tata Institute of Fundamental Research, Narosa, New Delhi. [4] Kechris, A.S. (1995). Classical Descriptive Set Theory, v. 156, Graduate Texts in Mathematics, Springer-Verlag. [5] Lamperti, J. (1972). Semi-Stable Markov Processes. Wahrscheinlichkeitstheorie Verw. Gebiete. 22, p. 205-225.
Zentrablatt
[6] Sengupta, A. (1998). Time-Space Harmonic Polynomials for Stochastic Processes, Ph. D. Thesis, Indian Statistical Institute, Calcutta, India.
A. Goswami and A. Sengupta
167
[7] Sheffer, I. M. (1939). Some Properties of Polynomial Sets of Type Zero. Duke Math. Jour., 5, p. 590-622. [8] Yor, M. (1989). Vne Extension Markovienne de l'algebre des Lois Betagamma. G.R.A.S. Paris, Serie I, 303, p. 257-260. [9] Yor, M. (1994). Some Aspects of Brownian Motion; Part II : Some Recent Martingale Problems. Lectures in Mathematics, ETH Zurich. Laboratoire de Probabilites, Vniversite Paris VI.
Effects of Smoothing on Distribution Approximations Peter Hall Australian National University
and Xiao-Hua Zhou Indiana University School of Medicine
Abstract We show that a number of apparently disparate problems, involving distribution approximations in the presence of discontinuities, are actually closely related. One class of such problems involves developing bootstrap approximations to the distribution of a sample mean when the sample includes both ordinal and continuous data. Another class involves smoothing a lattice distribution so as to overcome rounding errors in the normal approximation. A third includes kernel methods for smoothing distribution estimates when constructing confidence bands. Each problem in these classes may be modelled in terms of sampling from a mixture of a continuous and a lattice distribution. We quantify the proportion of the continuous component that is sufficient to "smooth away" difficulties caused by the lattice part. The proportion is surprisingly small - it is only a little larger than n-1logn, where n denotes sample size. Therefore, very few continuous variables are required in order to render a continuity correction unnecessary. The implications of this result in the problem of sampling both ordinal and continuous data are discussed, and numerical aspects are described through a simulation study. The result is also used to characterise bandwidths that are appropriate for smoothing distribution estimators in the confidence band problem. In this setting an empirical method for bandwidth choice is suggested, and a particularly simple derivation of Edgeworth expansions is given.
Keywords: Bandwidth, bootstrap, confidence band, confidence interval, continuity correction, coverage error, Edgeworth expansion, kernel methods, mixture distribution.
1 Introduction
1.1 Smoothing in distribution approximations
Rabi Bhattacharya has made very substantial contributions to our understanding of normal approximations in statistics and probability. None has been more important and influential than his exploration and application of smoothing as it is related to distribution approximations. For example, his development of ways of smoothing multivariate characteristic functions lies at the heart of his pathbreaking work on Berry-Esseen bounds and other measures of rates of
convergence in the multivariate central limit theorem (e.g. Bhattacharya 1967, 1968, 1970; Bhattacharya and Rao, 1976). His introduction of what has become known as the "smooth function model" (Bhattacharya and Ghosh, 1978), for describing properties of Edgeworth expansions of statistics that can expressed as smooth functions of means, has allowed wide-ranging asymptotic studies of statistical methods such as those based on the bootstrap. The present paper is a very small contribution, but nevertheless in a related vein - a small token of our appreciation of the considerable contribution that Rabi has made to distribution approximations in mathematical statistics. A key assumption in many distribution approximations in statistics is that the distribution being approximated is continuous. Without this property, not only are approximation errors likely to be large, but special features that the approximations are often assumed to enjoy can be violated. These include the property that the coverage error of a two-sided confidence interval is an order of magnitude less than that for its one-sided counterpart. In a range of practical problems the assumption of smoothness can be invalid, however. In such cases there may sometimes be enough "residual" smoothing present in other aspects of the problem for it to be unnecessary to smooth in an artificial way. Nevertheless, even in these circumstances it is important to know how much residual smoothing is required, so that the adequacy of the residual smoothing can be assessed. In other problems there is simply not enough smoothing to overcome the most serious discretisation errors; there, artificial smoothing, for example using kernel methods, can be efficacious. In the present paper, motivated by particular problems of both these types, we derive a general theoretical benchmark for the level of smoothing that is adequate in each case. In the first class of problem, encountered in several practical settings, we suggest an empirical method for assessing whether the benchmark has been attained. In the second class, related to smoothed distribution estimation, we introduce an empirical technique for determining how much smoothing should be provided. Both types of problem have a common basis, in that they represent mixture-type sampling schemes where a portion of the data are drawn from a smooth distribution and the remainder from a lattice distribution. It is shown that the sampling fraction of the smooth component can be surprisingly small before difficulties arise through the roughness of the other component. The threshold is approximately n-1logn, where n denotes sample size. In the case of the second problem this result may be interpreted as a prescription for bandwidth choice, which can be implemented in practice using a smoothed bootstrap method. For the first problem the result may be interpreted as defining a safeguard: only when the smooth component is present in a particularly small proportion will the unsmooth component cause difficulties. Next we introduce the two classes of problem.
1.2 First problem: bootstrap inference for distributions with both ordinal and continuous components
In some applications it is common to encounter a data distribution that is a mixture of an atom at the origin and a continuous component on the positive
half-line. Examples include the cost of health care (e.g. Zhou, Melfi and Hui, 1997) and the proportion of an account that an audit determines to be in error (e.g. Cox and Snell, 1979; Azzalini and Hall, 2000). In the second example, both 0 and 1 can be atoms of the sampled distribution. In both examples the mean of the mixture, rather than the mean of just the continuous component, is of interest. If all the data are ordinal and lie within a relatively narrow range, for example if the costs or proportions in the respective examples are distributed among only a half-dozen equally-spaced bins, then the lattice nature of the data needs careful attention if bootstrap methods are to be used to construct confidence intervals for the mean. Indeed, particular difficulties associated with this case were addressed in the first detailed theoretical treatment of bootstrap methods for distribution approximation; see Singh (1981). One way of alleviating these difficulties is to use smoothed bootstrap methods; see for example Hall (1987a). On the other hand, no special treatment is required if just the positive part of the sampled distribution is addressed, provided this portion of the distribution is smooth.
This begs the question of what should be done in the mixture case. Does the implicit smoothing provided by the continuous component overcome potential difficulties caused by the ordinal component? How does the answer to this question depend on the proportion of the ordinal component in the mixture? Our results on the effects of smoothing on distribution approximation allow us to answer both these questions; see sections 3.1 and 4.1. A related problem is that of smoothing a discrete distribution so as to construct a confidence interval for its mean. One approach is to blur each lattice-valued observation over an interval on either side of its actual value; see for example Clark et al. (1997, p. 12). For example, if a random variable Y with this distribution takes only integer values, we might replace an observed value Y = i by i + εZ, where ε > 0 and Z is symmetric on the interval [-1, 1]. How large does ε have to be in order to effectively eliminate rounding errors from an approximation to the distribution of the mean of n values of Y? In particular, can we allow ε to decrease with sample size, and if so, how fast? Answers will be given in sections 3.1 and 4.1. Of course, in this second aspect of the first problem it is the mean of Y, not the mean of X = Y + εZ, about which we wish to make inference. However, the mean of εZ is known, and so it is a trivial matter to progress from a confidence interval for E(X) to one for E(Y).
1.3 Second problem: confidence bands for distribution functions
Let U = {U_1, ..., U_n} denote a random sample drawn from a distribution F, and write F̂ for the empirical distribution function based on U. Then, with z_{α/2} denoting the upper ½α-level point of the standard normal distribution,
F̂ ± {n^{-1}F̂(1 − F̂)}^{1/2} z_{α/2} is a conventional confidence band for F founded on normal approximation, with nominal pointwise coverage 1 − α. In more standard problems, involving a mean of smoothly distributed random variables, the coverage accuracy of such a band would equal O(n^{-1}). In the present setting, however, owing to asymmetric rounding errors that arise in approximating the discrete Binomial distribution by a smooth normal one, coverage error of even a two-sided symmetric confidence band is in general no better than O(n^{-1/2}). A particularly simple way of smoothing in this setting, and potentially overcoming difficulties caused by rounding errors, is to use kernel methods. Let K, the kernel, be a bounded, symmetric, compactly supported probability density, write L for the corresponding distribution function, and let h be a bandwidth. Then
F̂_h(u) = n^{-1} Σ_{i=1}^{n} L{(u − U_i)/h}   (1.1)
is a smoothed kernel estimator of F. We may interpret E{Fh(u)} in at least two different ways: firstly, as the mean of a sample drawn from a mixture of two distributions, one taking only the values 0 and 1 (the latter with probability F(u - hc), where [-c, c] denotes the support of K), and the other having a smooth distribution (equal to that of L{ (u - Ui ) / h}, conditional on (u - Ui ) / h lying within the support of K); and secondly, as the distribution function of X = Y + hZ, where Y and Z have distribution functions F and L, respectively. Hence, this problem and those described in section 1.2 have identical roots. The bias of Fh(u), as an estimator of F(u), equals O(h2) provided F is sufficiently smooth. In relative terms its variance differs from that of F(u) by only O(h). See Azzalini (1981), Reiss (1981) and Falk (1983) for discussion of these and related properties. Together these results suggest that taking h as small as possible is desirable, since then h would have least effect on moment properties. Indeed, the moment properties suggest that h = O(n-1) might give the O(n-l) coverage error seen in conventional problems. However, it may be shown that this size of bandwidth is not adequate for removing difficulties caused by lack of smoothness of the distribution of F. With h = O(n- 1 ), rounding errors still contribute terms of order n- 1/ 2 to coverage error of two-sided confidence bands. Can we choose h large enough to overcome these problems, and yet small enough to give an order of coverage accuracy close to the "ideal" O(n- 1 )? And even if this problem has a theoretical solution, can good coverage accuracy be achieved empirically? These questions will be answered in sections 3.2 and 4.2, where we shall propose and describe an empirical bandwidth-choice method in the confidence band problem. Additionally we shall show that our approach to the problem of smoothed distribution estimation, via sampling from a mixture distribution, leads to particularly simple derivations of Edgeworth expansions. There is of course an extensive literature of the problem of bandwidth choice for kernel estimation of distribution functions. It includes both plug-in and cross-validation methods; see Mielniczuk, Sarda and Vieu (1989), Sarda (1993), Altman and Leger (1995), and Bowman, Hall and Prvan (1998). However, in all
these cases the bandwidths that are proposed are of asymptotic size n^{-1/3}, much larger than n^{-1}. They are appropriate only for estimation of the distribution function curve, not for confidence interval or band construction, and produce relatively high levels of coverage error if used for the latter purpose. The class of distribution and density estimation problems is characterised by an interesting hierarchy of bandwidth sizes: n^{-1/5} for estimating a density curve, n^{-1/3} for distribution curve estimation, and a still smaller size, approximately n^{-1} log n (as we shall show in section 3.2), for constructing two-sided confidence bands for a distribution function.
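To make the estimator at (1.1) concrete, here is a minimal numerical sketch (ours, not the authors'): it evaluates F̂_h on an illustrative sample, using the distribution function of the Epanechnikov kernel that the authors employ in section 4.2; the sample, bandwidth constant and evaluation points are arbitrary choices.

```python
import numpy as np

def epanechnikov_cdf(t):
    """Distribution function L of the Epanechnikov kernel, supported on [-1, 1]."""
    t = np.asarray(t, dtype=float)
    return np.where(t < -1.0, 0.0,
                    np.where(t > 1.0, 1.0, 0.75 * t - 0.25 * t**3 + 0.5))

def smoothed_ecdf(points, sample, h):
    """Kernel-smoothed distribution estimator (1.1): F_h(u) = n^{-1} sum_i L((u - U_i)/h)."""
    return np.array([epanechnikov_cdf((u - sample) / h).mean() for u in np.atleast_1d(points)])

rng = np.random.default_rng(0)
U = rng.normal(size=50)            # illustrative sample
n = len(U)
h = np.log(n) ** 2 / n             # a bandwidth of the order n^{-1}(log n)^{1+eps}, as in section 3.2
print(smoothed_ecdf([-1.0, 0.0, 1.0], U, h))
```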
2 Distribution-Approximation Difficulties Caused by Lack of Smoothness
Let X_1, ..., X_n be independent and identically distributed random variables with the distribution of X, and let X̄ = n^{-1} Σ_i X_i denote the sample mean. Many explanations for the small-sample performance of bootstrap approximations to the distribution of X̄ are based on properties of its Edgeworth expansion. A formal expansion exists under moment conditions alone. In particular, provided only that

E|X|^{k+2} < ∞,   (2.1)

the formal Edgeworth expansion up to terms in n^{-k/2} is well defined; it is

Q_k(x) = Φ(x) + Σ_{j=1}^{k} n^{-j/2} π_j(x) φ(x),   (2.2)

where Φ and φ denote the standard normal distribution function and density, respectively, and each π_j is a polynomial whose coefficients are determined by moments of X; in particular, π_1(x) = ⅙γ(1 − x²), with γ the standardised skewness of X.
If, in addition to the moment assumption (2.1), the distribution of X is smooth (for example if it is absolutely continuous), Q_k can provide an accurate approximation to the standardised distribution of X̄. For example, if the distribution of X has a bounded density, and if we define μ = E(X) and σ² = var(X), then

P{n^{1/2}(X̄ − μ)/σ ≤ x} = Q_k(x) + o(n^{-k/2}),   (2.3)

uniformly in x, as n → ∞. The performance of bootstrap methods rests heavily on this result, through the property that the bootstrap provides a particularly accurate estimate of the term Q_k on the right-hand side of (2.3). However, (2.3) fails if the sampled distribution is lattice. For example, if nX̄ has the Binomial Bi(n, q) distribution, where 0 < q < 1, then (2.3) holds only if we add, to the right-hand side, a continuity-correction term for each order from n^{-1/2} to n^{-k/2} inclusive. Such terms compensate for errors introduced by approximating the relatively rough Binomial distribution by a smooth function.
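The failure of (2.3) in the lattice case is easy to visualise numerically. The sketch below (our illustration, with an arbitrary choice of n and q) compares the exact distribution function of the standardised Binomial mean with the normal approximation and with the normal approximation plus the smooth first Edgeworth term; the residual error in the latter comparison is the rounding effect.

```python
import numpy as np
from scipy.stats import binom, norm

n, q = 40, 0.3                          # illustrative sample size and Bernoulli parameter
mu, sigma = q, np.sqrt(q * (1 - q))
gamma = (1 - 2 * q) / sigma             # standardised skewness of a single observation

x = np.linspace(-2.5, 2.5, 501)
# Exact distribution of n^{1/2}(Xbar - mu)/sigma, where n*Xbar ~ Binomial(n, q):
exact = binom.cdf(np.floor(n * mu + np.sqrt(n) * sigma * x), n, q)
normal = norm.cdf(x)
edgeworth = normal + gamma * (1 - x**2) * norm.pdf(x) / (6 * np.sqrt(n))

print("max |exact - normal|    :", np.abs(exact - normal).max())
print("max |exact - Edgeworth| :", np.abs(exact - edgeworth).max())
print("n^(-1/2)                :", n ** -0.5)
# Even after the smooth n^{-1/2} skewness correction, an error of order n^{-1/2}
# remains: this is the rounding (continuity-correction) effect of the lattice.
```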
In particular, if the sampling distribution is supported on the set of integers and has lattice span 1, and if we define

D_k(x) = Q_k(x) − (n^{1/2}σ)^{-1} Σ_{j: (j − nμ)/(n^{1/2}σ) ≤ x} Q_k′{(j − nμ)/(n^{1/2}σ)},

then the "corrected" form of (2.3) holds:

P{n^{1/2}(X̄ − μ)/σ ≤ x} = Q_k(x) − D_k(x) + o(n^{-k/2}),   (2.4)

uniformly in x. See for example pp. 237-241 of Bhattacharya and Rao (1976). Of course, (2.4) has analogues in the case of other lattice distributions. In these general cases we may express D_k(x) as an expansion with terms of size n^{-j/2}, for 1 ≤ j ≤ k. The term of size n^{-1/2}, D_{k1}(x) say, involves the function S(u) = ⟨u⟩ − u + ½, where ⟨u⟩ denotes the integer part of u. The well-known continuity correction, applied for example to normal approximations to the Binomial distribution, adjusts for D_{k1}(x). We shall show in section 5, however, that if the distribution of X is smoothed through being a mixture of only a small proportion of a continuous distribution, then all aspects of the continuity correction D_k(x) may be dispensed with. That is, D_k(x) may be dropped from (2.4), and (2.3) holds for all k ≥ 1. The implications of this result for coverage accuracy of confidence regions can be considerable. To appreciate this point, note that since π_1(x) at (2.2) is symmetric in x then, in the case of a smooth sampled distribution, potential coverage errors of size n^{-1/2} cancel from the formula for coverage of the two-sided confidence interval X̄ ± n^{-1/2}σz_{α/2}. As a result this interval has coverage error O(n^{-1}). However, since the correction term D_k(x) is not symmetric in x then this property fails when the sampled distribution is unsmooth, and there the order of coverage error is only O(n^{-1/2}), even for symmetric, two-sided confidence intervals. Moreover, a conventional continuity correction does not remove all the error of this size; taking that approach, the best that can generally be achieved is to produce a conservative confidence interval where the coverage error is dominated by, rather than equal to, the nominal level plus O(n^{-1}). See Hall (1982, 1987a) for discussion of this issue. Of course, these results have direct analogues in the Studentised case; in the discussion above we have treated the non-Studentised case, where σ is assumed known, only for convenience.
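The claim that two-sided coverage error remains of order n^{-1/2} for lattice data can be checked directly, since for Bernoulli observations the coverage of the interval X̄ ± 1.96 n^{-1/2}σ is an exact Binomial probability. The following short sketch (ours; q is an arbitrary illustrative value) rescales the coverage error by n^{1/2} to show that it does not settle down to zero.

```python
import numpy as np
from scipy.stats import binom

q = 0.3
for n in (25, 100, 400, 1600):
    mu, sigma = q, np.sqrt(q * (1 - q))
    lo = n * (mu - 1.96 * sigma / np.sqrt(n))   # the interval covers mu
    hi = n * (mu + 1.96 * sigma / np.sqrt(n))   # if and only if lo <= n*Xbar <= hi
    cover = binom.cdf(np.floor(hi), n, q) - binom.cdf(np.ceil(lo) - 1, n, q)
    print(n, round(cover, 4), round(abs(cover - 0.95) * np.sqrt(n), 3))
# The rescaled error |coverage - 0.95| * n^{1/2} typically stays bounded away
# from zero, consistent with a residual term of size n^{-1/2}.
```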
3 Overcoming Difficulties Caused by Lack of Smoothness
3.1 Solution to first problem
Suppose the distribution of X is obtained by mixing a smoothly distributed random variable Y (for example, one having a bounded probability density) with an arbitrary but nondegenerate random variable Z, in proportions p and 1 − p respectively, where p may depend on n. We wish to know the effect that any smoothing conferred by the distribution of Y has on the distribution of a mean X̄ of n independent random variables distributed as X. It will be shown in section 5 that if n^{-1} log n = o(p) then the discretisation-error term D_k is negligible, and in fact sup_x |D_k(x)| = o(n^{-k/2}). As a result, the distribution of X̄ is accurately approximated by its formal Edgeworth expansion, to any order that is permitted by the number of moments enjoyed by the distribution of X. This property applies equally to the distributions of Studentised and non-Studentised means; in both cases, the comparatively small amount of smoothing obtained when n^{-1} log n = o(p) is nevertheless sufficient to compensate for highly unsmooth features of the other component of the sampling distribution.
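A small simulation sketch of this mixture scheme (ours; the component distributions, N(0,1) for Y and Bernoulli(1/2) for Z, are arbitrary illustrative choices) shows how the quality of the normal approximation to the standardised mean improves as the mixing proportion p grows past the order n^{-1} log n.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

def mixture_means(n, p, reps):
    """Sample means of n i.i.d. draws from the mixture:
    Y ~ N(0,1) with probability p, Z ~ Bernoulli(1/2) with probability 1 - p."""
    use_y = rng.random((reps, n)) < p
    y = rng.normal(size=(reps, n))
    z = rng.binomial(1, 0.5, size=(reps, n)).astype(float)
    return np.where(use_y, y, z).mean(axis=1)

n, reps = 200, 20000
for c in (0.0, 2.0, 10.0):                 # p = c log(n)/n; c = 0 is the pure lattice case
    p = min(c * np.log(n) / n, 1.0)
    mu = (1 - p) * 0.5                     # mixture mean
    var = p * 1.0 + (1 - p) * 0.25 + p * (1 - p) * 0.25   # mixture variance
    t = np.sort((mixture_means(n, p, reps) - mu) / np.sqrt(var / n))
    ks = np.max(np.abs(norm.cdf(t) - np.arange(1, reps + 1) / reps))
    print(f"c = {c:4.1f}  p = {p:.3f}  Kolmogorov-type distance = {ks:.4f}")
```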
We shall also note in section 5 that these results extend to applications of the bootstrap. Indeed, all those properties of the bootstrap that are valid whenever a fixed sampled distribution is accurately approximated by its formal Edgeworth expansion (see e.g. Hall, 1992, Chapter 3), continue to hold for our mixture distribution, provided n^{-1} log n = o(p). Of course, these results are somewhat asymptotic in character, although the particularly small lower bound to the effective value of p suggests that in most cases the results will be available in practice. Numerical work in section 4.1 will bear this out. In a specific, practical problem an empirical method for determining whether p is sufficiently large is to explore the problem by Monte Carlo means: model the distribution of the smooth component of the sampled distribution, and, taking the mixing proportion equal to its naive estimate, simulate to ascertain the effect of discretisation error in the context of the model. In the case of specific component distributions (e.g. a normal smooth component and a Bernoulli lattice component) it can be shown that the constraint n^{-1} log n = o(p) is necessary as well as sufficient for formal Edgeworth approximation to be valid at all orders. In more general cases it is readily proved that the less stringent constraint n^{-1} = O(p) is not sufficient. Very similar results may be derived in the related problem of smoothing the distribution of an integer-valued random variable Y by adding to it, rather than mixing it with, a continuous component. That is, we replace Y by Y + εZ, where ε > 0 and Z has a continuous distribution. As long as ε = ε(n) decreases to 0 more slowly than n^{-1} log n, this modification allows us to approximate the distribution of the mean of Y + εZ by its formal Edgeworth expansion to any order;
see section 5.3. If the distribution of Z is symmetric then the distributions of both Y and Y + εZ have the same mean and skewness, and their variances differ only to order ε². Moreover, the "converse" results described in the previous paragraphs have direct analogues in the setting of additive smoothing of a discrete distribution.
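For the additive variant, a minimal illustration (ours; the Poisson data, the uniform jitter and the constant in ε are arbitrary) shows that jittering integer-valued observations by εZ, with ε decreasing only slightly more slowly than n^{-1}, leaves the first two sample moments essentially unchanged.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
Y = rng.poisson(3.0, size=n)                 # illustrative integer-valued data
eps = 2.0 * np.log(n) / n                    # decreases more slowly than n^{-1} log n requires
Z = rng.uniform(-1.0, 1.0, size=n)           # symmetric on [-1, 1], so E(eps * Z) = 0
X = Y + eps * Z                              # smoothed observations

print(Y.mean(), X.mean())                    # the two sample means agree to order eps
print(np.var(Y, ddof=1), np.var(X, ddof=1))  # the variances differ only at order eps^2
```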
3.2 Solution to second problem
Recall from section 1.3 that we seek a pointwise, (1 − α)-level confidence band for the distribution function F. We noted there that the standard normal-approximation band, F̂ ± {n^{-1}F̂(1 − F̂)}^{1/2} z_{α/2}, has only O(n^{-1/2}) coverage accuracy, owing to uncorrected discretisation errors. We suggest instead the smoothed band,

F̂_h ± {n^{-1}F̂_h(1 − F̂_h)}^{1/2} z_{α/2},   (3.1)

where F̂_h is as defined at (1.1). We shall show at the end of this section that by taking h = n^{-1}(log n)^{1+ε}, for any ε > 0, coverage error of this band is reduced to O(h). That is only a little worse than the O(n^{-1}) level encountered in related problems, where the sampled distribution is smooth. These properties are highly asymptotic in character, however. To achieve a good level of performance in practice we suggest the following approach. Using standard kernel methods, compute an estimator of the density f = F′ based on the sample U. For example, if employing the same kernel K as before, the estimator would be

f̂_{h_1}(u) = (nh_1)^{-1} Σ_{i=1}^{n} K{(u − U_i)/h_1},
where h_1 is a bandwidth the size of which is appropriate to density estimation. (In particular, h_1 would generally be computed using either cross-validation or a plug-in rule; it would be of size n^{-1/5}, in asymptotic terms.) Let F̂_{h_1}(u) = ∫_{-∞}^{u} f̂_{h_1}(v) dv denote the corresponding distribution estimator, and let F̂*_h denote the version of F̂_h computed from a resample drawn from the density f̂_{h_1}. Put

β_α(u, h) = P(F̂*_h(u) − [n^{-1}F̂*_h(u){1 − F̂*_h(u)}]^{1/2} z_{α/2} ≤ F̂_{h_1}(u) ≤ F̂*_h(u) + [n^{-1}F̂*_h(u){1 − F̂*_h(u)}]^{1/2} z_{α/2} | U).

Choose h = ĥ_α to render β_α(u, h) as close as possible to its nominal level over the interval I where we wish to construct the final confidence band. For example, we might select ĥ_α to minimise A_α(h), the integrated squared deviation of β_α(u, h) from the nominal level over I.
Our confidence band is that defined at (3.1), but with h = ĥ_α. If desired, an additional level of calibration can be incorporated by choosing (γ, h) = (γ̂, ĥ) simultaneously, to minimise A_γ(h), and taking the band to be that at (3.1) but with bandwidth ĥ and critical point z_{γ̂/2} (instead of z_{α/2}). Finally we outline a derivation of the theoretical properties claimed of the confidence band at (3.1). It will be shown in section 5.4 that if h decreases to 0 at a slower rate than n^{-1} log n, i.e. if
n h(n)/(log n) → ∞,   (3.2)
then the smoothed empirical distribution function estimator F̂_h, defined at (1.1), admits a formal Edgeworth expansion of any order k ≥ 1. That is, if Q_k = Q_{h,k} at (2.2) denotes the formal Edgeworth expansion of F̂_h(u) then the analogue of (2.3) holds for each k ≥ 1:

P(n^{1/2}[F̂_h(u) − F_h(u)]/σ_h(u) ≤ x) = Q_{h,k}(x) + o(n^{-k/2})   (3.3)

uniformly in x, where
F_h(u) = E{F̂_h(u)} = ∫ K(v) F(u − hv) dv,

σ_h(u)² = n var{F̂_h(u)} = ∫ L{(u − v)/h}² f(v) dv − F_h(u)².
If F″ exists and is bounded in a neighbourhood of u then F_h(u) = F(u) + O(h²) and σ_h(u)² = F(u){1 − F(u)} + O(h). Therefore, provided

n^{-1} log n ≪ h = O(n^{-1/2}),   (3.4)

(3.3) for k ≥ 2 implies that

P(n^{1/2}[F̂_h(u) − F(u)]/[F(u){1 − F(u)}]^{1/2} ≤ x) = Q_{h,k}(x) + O(h)

uniformly in x. It may be shown by Taylor expanding the argument of the probability that this implies

P(n^{1/2}[F̂_h(u) − F(u)]/[F_h(u){1 − F_h(u)}]^{1/2} ≤ x) = Q_{h,k}(x) + O(h).   (3.5)
Since the bandwidth h = n^{-1}(log n)^{1+ε} satisfies (3.4) then the claims made immediately below (3.1) follow from (3.5). Another advantage of our approach is that it leads to particularly simple derivations of detailed Edgeworth expansions. Indeed, once one appreciates that the problem can be posed in terms of sampling from a mixture, (3.3) immediately gives a simple form of the expansion, to arbitrarily high order. Deriving the expansion in more traditional form, with terms of orders n^{-i/2}h^j for i, j ≥ 0 (rather than simply n^{-i/2}), is only a matter of Taylor expanding the quantities σ_h and Q_{h,k} at (3.3). A different argument, based on intrinsic properties of the smoothed distribution function, was given by García-Soidán, González-Manteiga and Prada-Sánchez (1997). In addition to the complexity of that technique, it requires more severe conditions on the smoothness of K.
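The bandwidth-selection recipe described above can be prototyped roughly as follows. This is a sketch under simplifying assumptions of ours (a normal-scale rule and a Gaussian pilot kernel for f̂_{h_1}, a coarse grid of candidate bandwidths, a small bootstrap size B), not the authors' implementation.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

def L(t):
    """Epanechnikov kernel distribution function, as used in section 4.2."""
    t = np.asarray(t, dtype=float)
    return np.where(t < -1, 0.0, np.where(t > 1, 1.0, 0.75 * t - 0.25 * t**3 + 0.5))

def F_h(grid, sample, h):
    """Smoothed distribution estimator (1.1) evaluated on a grid of points."""
    return L((grid[:, None] - sample[None, :]) / h).mean(axis=1)

n, alpha, B = 50, 0.05, 200
z = norm.ppf(1 - alpha / 2)
U = rng.normal(size=n)                              # observed data (illustrative)

h1 = 1.06 * U.std(ddof=1) * n ** (-0.2)             # pilot bandwidth, normal-scale rule
grid = np.linspace(-1.5, 1.5, 31)                   # the interval I
F_h1 = norm.cdf((grid[:, None] - U[None, :]) / h1).mean(axis=1)   # integral of the pilot density

def coverage(h):
    hits = np.zeros(grid.size)
    for _ in range(B):
        # smoothed bootstrap: resample the data and add pilot-kernel noise
        star = rng.choice(U, size=n, replace=True) + h1 * rng.normal(size=n)
        Fs = F_h(grid, star, h)
        half = z * np.sqrt(Fs * (1 - Fs) / n)
        hits += (Fs - half <= F_h1) & (F_h1 <= Fs + half)
    return hits / B

candidates = (np.log(n) ** 2 / n) * np.array([0.5, 1.0, 2.0, 4.0])
A = [np.mean((coverage(h) - (1 - alpha)) ** 2) for h in candidates]
h_alpha = candidates[int(np.argmin(A))]
print("selected bandwidth:", h_alpha)
```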
4 Numerical Properties
4.1 Effects of different mixing proportions in the first problem
We conducted a simulation study to assess the effects of mixing proportions on coverage accuracy of two-sided confidence intervals based on either Studentised or non-Studentised means. We generated 1000 samples of sizes n = 10 and 20 from a mixture of a discrete Bernoulli distribution with probability of success 0.1 and different continuous distributions: chi-squared distributions with two, four and six degrees of freedom, and a standard normal distribution. Figure 1 graphs coverage probabilities for two-sided 95% confidence intervals in both Studentised and non-Studentised cases, where the endpoints of the intervals are taken to be X ± 1.96n- 1 / 2 (T and X ± 1.96n- 1 / 2 a, respectively, and a is the bootstrap standard deviation. Coverage accuracy in the non-Studentised case is high for even small proportions of continuous data, as argued in section 3.1. More difficulties are experienced in the Studentised case, however. There, increasing the proportion of continuous data has a more marked influence on coverage accuracy. Analogous results are obtained for one-sided confidence intervals, except that there the effect of the proportion of continuous data is confounded with the influence of skewness which now has a significant effect on coverage accuracy for different sample sizes.
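A condensed version of this experiment can be run as below. This is our own sketch: only the non-Studentised interval is simulated (so the true mixture variance replaces the bootstrap standard deviation), and the replication count is reduced.

```python
import numpy as np

rng = np.random.default_rng(4)

def coverage(n, prop, df, reps=2000):
    """Coverage of Xbar +/- 1.96*sigma/sqrt(n) for a Bernoulli(0.1)/chi-squared(df) mixture,
    where prop is the proportion of continuous (chi-squared) observations."""
    m = prop * df + (1 - prop) * 0.1                                   # mixture mean
    v = prop * 2 * df + (1 - prop) * 0.09 + prop * (1 - prop) * (df - 0.1) ** 2   # mixture variance
    cont = rng.random((reps, n)) < prop
    x = np.where(cont, rng.chisquare(df, (reps, n)), rng.binomial(1, 0.1, (reps, n)))
    xbar = x.mean(axis=1)
    return np.mean(np.abs(xbar - m) <= 1.96 * np.sqrt(v / n))

for prop in (0.1, 0.3, 0.5):
    print(prop, coverage(20, prop, df=2))
```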
4.2 Effect of different mixing proportions in the second problem
Numerical studies which are not detailed here show that for small bandwidths, before bias becomes a significant problem, coverages of smoothed confidence intervals for distribution functions increase monotonically with increasing bandwidth. This is a consequence of the variability of smoothed distribution estimators decreasing with increasing bandwidth. Confidence intervals usually, although not always, undercover when h = 0 and overcover when the bandwidth is taken to equal the value, h_MSE say, that gives least mean squared error for a given argument u of the distribution function. As the bandwidth is increased from h = 0 to h_MSE it typically passes through a value that, when used to construct a smoothed α-level confidence interval for F(u), gives zero coverage error. The bootstrap method suggested in section 3.2 produces an empirical approximation ĥ_α to this interval-optimal bandwidth. Table 1 gives numerical examples of the performance of ĥ_α. There we took F to be the standard normal distribution function, although results are similar in other cases; only u = 0, where the normal density has zero gradient and, consequently, the bias of a distribution estimator equals O(h⁴) rather than O(h²), is atypical. Columns of Table 1 give approximations to the true coverage of confidence intervals (obtained by averaging over 1000 samples, using B = 1500 bootstrap simulations) for different values of n. Rows express (a) the confidence interval using the bandwidth h = h_MSE that produces optimal pointwise
[Figure 1 comprises eight panels, for n = 10 and n = 20: chi-square continuous components with df = 2, 4 and 6, and an N(0,1) continuous component. Each panel plots coverage probability against the proportion of continuous data, from 0.1 to 0.5.]

Figure 1: Coverage probabilities of two-sided 95% confidence intervals. Solid and dotted lines show coverages of non-Studentised and Studentised intervals, respectively, for the mean of a mixture of a discrete Bernoulli distribution and a chi-squared or a normal distribution.
accuracy (PTWS); (b) the interval calculated using our bootstrap method (BOOT); and (c) the unsmoothed interval (UNSM). Except when u = 0 the coverage for the interval BOOT lies between its counterparts for PTWS and UNSM. In almost every case it is substantially closer to 0.95 than the coverages of either of the other two intervals. In our calculations we employed the distribution version of the Epanechnikov kernel, defined by L(t) = (3/4)t − (1/4)t³ + 1/2 for |t| ≤ 1, L(t) = 0 if t < −1 and L(t) = 1 if t > 1.
Method    u = 0.0            u = 0.75           u = 1.5
          n=20     n=50      n=20     n=50      n=20     n=50
BOOT      0.955    0.942     0.941    0.948     0.933    0.954
PTWS      0.990    0.983     0.987    0.986     0.965    0.980
UNSM      0.971    0.918     0.945    0.925     0.766    0.858

N(0,1): the standard normal distribution. Methods: BOOT, the interval using our bootstrap method; PTWS, the confidence interval using the bandwidth h = h_MSE that produces optimal pointwise accuracy; UNSM, the unsmoothed interval.

Table 1: Coverages of different confidence intervals for F(u). The distribution is standard normal, u denotes the argument at which F is estimated, and rows headed PTWS, BOOT and UNSM represent intervals using the pointwise-optimal bandwidth, the bandwidth ĥ_α suggested in section 3.2, and h = 0, respectively.
5 Technical Details
5.1 Mixture of discrete and continuous distribution
Let Y be a random variable with the property that its characteristic function ψ(t) = E(e^{itY}) satisfies Cramér's condition:

lim sup_{|t|→∞} |ψ(t)| < 1.   (5.1)
In particular, (5.1) holds if the distribution of Y is absolutely continuous. Let Z denote a random variable independent of Y and having any nondegenerate distribution, and let the distribution of X be a mixture of those of Y and Z in proportions p : 1 - p. We shall take p to be a function of sample size, since this allows us to explore the case where X = Z with very high probability. Thus,
X = Y with probability p = p(n), and X = Z with probability 1 − p.   (5.2)
Given this distribution of X, define the formal Edgeworth expansion Q_k as at (2.2), and put μ = E(X) and σ² = var(X). Note that all moments of X depend on n, through p(n).
Theorem 5.1. Assume the distribution of Y satisfies (5.1), and that the distribution of X is given by (5.2). Suppose too that the distribution of Z is nondegenerate, that

E(|Y| + |Z|)^{k+2} < ∞,   (5.3)

where k ≥ 1, and that

p(n) → 0 and lim_{n→∞} n p(n)/(log n) = ∞   (5.4)

as n → ∞. Then

P{n^{1/2}(X̄ − μ)/σ ≤ x} = Q_k(x) + o(n^{-k/2})   (5.5)

uniformly in x. Note particularly that (5.4) requires only a very small proportion, not much larger than O(n^{-1} log n), of the X_i's to be equal to the smoothly distributed Y_i's. Furthermore, the Edgeworth expansion at (5.5) involves no continuity-correction term. Therefore, "a small amount of smoothness goes a long way" in removing any effects of discreteness of the distribution of the sample mean.
5.2 Bootstrap form of Theorem 5.1
Let X* = {X_1*, ..., X_n*} denote a resample drawn by sampling randomly, with replacement, from X = {X_1, ..., X_n}. Let S² be the variance of X (defined using divisor n rather than n − 1), let X̄* denote the mean of X*, and let Q̂_k be the empirical form of Q_k, in which each population moment is replaced by its sample counterpart.
Theorem 5.2. Assume the conditions of Theorem 5.1. Then

P{n^{1/2}(X̄* − X̄)/S ≤ x | X} = Q̂_k(x) + o_p(n^{-k/2}),   (5.6)

uniformly in x. The first term in Q_k, of size n^{-1/2}, depends on only the first three moments of the distribution of X. Provided E(|X| + |Y|)^6 < ∞, these three moments differ from their sample counterparts only by order n^{-1/2}. Therefore, taking k ≥ 2 and subtracting (5.5) and (5.6), we deduce that

P{n^{1/2}(X̄* − X̄)/S ≤ x | X} − P{n^{1/2}(X̄ − μ)/σ ≤ x} = O_p(n^{-1}),

uniformly in x. This is the analogue of second-order correctness in the present setting: the bootstrap approximation to the distribution of the sample mean is accurate to order n^{-1}, not simply n^{-1/2} (as in a conventional normal approximation). Note particularly that this has been achieved through only a small amount of smoothing, by mixing a virtually arbitrary Z distribution with only a little more than proportion O(n^{-1} log n) of the relatively smooth Y distribution.
5.3 Variant of Theorem 5.1 for distribution smoothing
Let Y and Z be independent variables, as discussed in section 5.1, and in place of (5.2) put X = Y + εZ, where ε = ε(n) is nonrandom. For this definition of X let Q_k be the formal Edgeworth expansion as at (2.2).

Theorem 5.3. Assume the distributions of Y and Z satisfy (5.3), that X = Y + ε(n)Z, and that (5.4) holds with p(n) there replaced by ε(n). Then (5.5) holds.
5.4 Application to first and second problems
Application to the first problem is straightforward, provided the distribution of Z is nondegenerate. If the distribution is degenerate and the condition
p(n) is bounded away from 0   (5.7)

fails, then σ = σ(n) is not bounded away from 0, and this causes difficulties even in interpreting (5.5). In particular, if (5.7) fails then a formal Edgeworth expansion in powers of n^{-1/2} is no longer appropriate; it should instead be in powers of {np(n)}^{-1/2}. However, it is straightforward to show that if (5.7) holds then Theorems 5.1 and 5.2 remain valid when the condition that Z has a nondegenerate distribution is removed. Claims made in section 3.1, about properties of confidence intervals and bootstrap methods in the case of the "first problem" (see section 1.2), now follow directly from Theorems 5.1 and 5.2 and their counterparts for the Studentised mean, discussed in section 5.5. Next we consider allowing the distributions of Y and Z, and hence X, to vary with n. Theorems 5.1 and 5.2 continue to hold in this case, provided (a) the moment condition (5.3) is strengthened to

lim sup_{n→∞} E{|Y(n)| + |Z(n)|}^{k+2+ε} < ∞  for some ε > 0,   (5.8)

(b) the variance of Z is bounded away from 0 in the limit, i.e.

lim inf_{n→∞} var{Z(n)} > 0,   (5.9)

and (c) the smoothness condition (5.1) holds in a uniform sense, i.e.

lim sup_{|t|→∞} sup_{n≥1} |E[exp{itY(n)}]| < 1.   (5.10)
(The analogue of (5.9) for Y follows from (5.10).) Claims made in section 3.2, about performance of bootstrap methods in the case of the "second problem" (see sections 1.3 and 3.2), follow from Theorems 5.1 and 5.2 under these more general conditions. To appreciate why, note that if the kernel K whose integral equals L is compactly supported, and if the
Peter Hall and Xiao-Hua Zhou
183
distribution of the random variable U has a continuous density, then we may interpret X = L{(x - U)/h} as being of the form (5.2). In that representation, Y has the distribution of L{ (x - U) / h} conditional on x - h < U < x + h, and Z has a Bernoulli distribution with
P(Z = 1) = P{U < x − h | U ∉ (x − h, x + h)},  P(Z = 0) = 1 − P(Z = 1).
(Here we have assumed, without loss of generality, that the support of K equals [-1,1].) If in addition K is bounded and the distribution of U has a bounded density then (5.8)-(5.10) hold, and (5.4) is equivalent to (3.2).
5.5 Further generalisations and extensions
The theorems also apply to the case of the Studentised mean. There we should: (a) alter (5.5) to

P{n^{1/2}(X̄ − μ)/S ≤ x} = R_k(x) + o(n^{-k/2}),

where R_k is the formal Edgeworth expansion corresponding to the Studentised mean; (b) strengthen the moment condition (5.3) to

E(|Y| + |Z|)^{2k+4} < ∞;   (5.11)

and (c) change the smoothness assumption (5.1) to

lim sup_{|t|+|s|→∞} |E{exp(itY + isY²)}| < 1.   (5.12)
Alternatively, the original moment condition can be retained but a more restrictive smoothness assumption imposed; compare Hall (1987b). To clarify the differences between the formal Edgeworth expansions Q_k and R_k we note that R_k also admits a formula like (2.2), but with different polynomials π_j. In particular the polynomial π_1 now equals ⅙γ(2x² + 1), instead of ⅙γ(1 − x²). See Hall (1992, Chapter 2) for discussion of these issues.
Likewise, Theorems 5.1 and 5.2 can be extended to the so-called "smooth function model", where X is replaced by a smooth function of an r-vector of means. In this case the r-variate versions of (5.11) and (5.12) are sufficient. In each generalisation, condition (5.4) on the mixing proportion may be retained. Theorems 5.1 and 5.2 also continue to hold if, instead of defining X by (5.2), we take Xi = Yi for 1 ::; i ::; (np) , and Xi = Zi for (np) < i ::; n, where Y1 , Y 2, . .. and Z 1, Z2, . .. denote independent sequences of independent copies of Y and Z, respectively, where (np) denotes the integer part of np. None of the other assumptions needs to be altered; in particular, condition (5.4) on p = p(n) may be retained. However, these variants of the theorems appear to have relatively few statistical applications.
5.6 Outline proof of Theorem 5.1
The derivation is based on characteristic functions and Fourier inversion. It is similar to that in traditional cases (e.g. Petrov, 1975, Chapter 5), with
the exception of the method for bounding the difference, δ(t) say, between the characteristic functions of the left-hand side of (5.5) and of the term Q_k on the right-hand side. Using standard arguments one may obtain the bound |δ(t)| ≤ ξ n^{-k/2} exp(−ηt²) for |t| ≤ ζn^{1/2}, where ξ > 0 can be arbitrarily small and η, ζ > 0 depend on ξ but not on n. For |t| > ζn^{1/2} one may establish the bound C_2(1 − C_3)^{np(n)}, where C_2 > 0 and C_3 ∈ (0, 1) depend on ζ but not on n. Assuming p satisfies (5.4) we may deduce from these bounds, by taking ξ arbitrarily small, that the integral of |δ(t)| over the interval (−n^{C_4}, n^{C_4}), for any C_4 > 0, equals o(n^{-k/2}), as has to be shown in order to complete the proof.
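To see where a bound of the form C_2(1 − C_3)^{np(n)} comes from, the following elementary estimate (our sketch, using only the mixture structure (5.2) and Cramér's condition (5.1)) may help:
\[
  \bigl|E e^{iuX}\bigr| = \bigl|p\,\psi(u) + (1-p)\,E e^{iuZ}\bigr|
  \le 1 - p\bigl(1 - |\psi(u)|\bigr) \le 1 - p(1-\rho_\delta), \qquad |u| \ge \delta,
\]
where $\rho_\delta = \sup_{|u|\ge\delta}|\psi(u)| < 1$ by (5.1), together with the fact that $|\psi(u)| < 1$ for $u \ne 0$. The modulus of the characteristic function of $n^{1/2}(\bar X - \mu)/\sigma$ at the point $t$ equals $|E e^{iuX}|^{n}$ with $u = t/(n^{1/2}\sigma)$, so for $|t| > \zeta n^{1/2}$ it is at most
\[
  \{1 - p(1-\rho_{\zeta/\sigma})\}^{n} \le \exp\{-(1-\rho_{\zeta/\sigma})\,n\,p(n)\},
\]
which is $o(n^{-k/2})$ for every fixed $k$ whenever $n\,p(n)/\log n \to \infty$, as required in (5.4).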
The proof of Theorem 5.2 is similar, and may be based on arguments of Hall (1992, section 5.2). □

Peter Hall
Centre for Mathematics and its Applications
Australian National University
Canberra, ACT 0200, Australia

Xiao-Hua Zhou
Division of Biostatistics, Department of Medicine
Indiana University School of Medicine
RG/4th Floor Regenstrief Health Center
1050 Wishard Boulevard
Indianapolis, IN 46202, USA
Bibliography [1] Altman, N. and Leger, C. (1995). Bandwidth selection for kernel distribution function estimation. J. Statist. Plan. Infer. 46, 195-214. [2] Azzalini, A. (1981). A note on the estimation of a distribution function and quantiles by a kernel method. Biometrika 68, 326-328. [3] Azzalini, A. and Hall, P. (2000). Reducing variability using bootstrap methods with qualitative constraints. Biometrika, to appear. [4] Bhattacharya, R.N. (1967). Berry-Esseen bounds for the multi-dimensional central limit theorem. PhD Dissertation, University of Chicago. [5] Bhattacharya, R.N. (1968). Berry-Esseen bounds for the multi-dimensional central limit theorem. Bull. Amer. Math. Soc. 74, 285-287. [6] Bhattacharya, R.N. (1970). Rates of weak convergence for the multidimensional central limit theorem. Teor. Verojatnost. i Primenen 15, 69-85. [7] Bhattacharya, R.N. and Ghosh, J. K. (1978). On the validity of the formal Edgeworth expansion. Ann. Statist. 6, 434-451. [8] Bhattacharya, R.N. and Rao, R. Ranga (1976). Normal Approximation and Asymptotic Expansions. Wiley, New York. [9] Bowman, A.W., Hall, P. and Prvan, T. (1998). Cross-validation for the smoothing of distribution functions. Biometrika 85, 799-808.
[10] Clark, L.A., Cleveland, W.S., Denby, L. and Liu, C. (1997). Modeling customer survey data. Manuscript. [11] Cox, D.R. and Snell, E.J. (1979). On sampling and the estimation of rare errors. Biometrika 66, 125-132. Correction ibid 69 (1982), 491. [12] Falk, M. (1983). Relative efficiency and deficiency of kernel type estimators of smooth distribution functions. Statist. Neerl. 37, 73-83. [13] García-Soidán, P.H., González-Manteiga, W. and Prada-Sánchez, J.M. (1997). Edgeworth expansions for nonparametric distribution estimation with applications. J. Statist. Plann. Inf. 65, 213-231. [14] Hall, P. (1982). Improving the normal approximation when constructing one-sided confidence intervals for binomial or Poisson parameters. Biometrika 69, 647-652. [15] Hall, P. (1987a). On the bootstrap and continuity correction. J. Roy. Statist. Soc. Ser. B 49, 82-89. [16] Hall, P. (1987b). Edgeworth expansion for Student's t-statistic under minimal moment conditions. Ann. Probab. 15, 920-931. [17] Hall, P. (1992). The Bootstrap and Edgeworth Expansion. Springer, New York. [18] Mielniczuk, J., Sarda, P. and Vieu, P. (1989). Local data-driven bandwidth choice for density estimation. J. Statist. Plan. Infer. 23, 53-69. [19] Petrov, V.V. (1975). Sums of Independent Random Variables. Springer, Berlin. [20] Reiss, R.-D. (1981). Nonparametric estimation of smooth distribution functions. Scand. J. Statist. 8, 116-119. [21] Sarda, P. (1993). Smoothing parameter selection for smooth distribution functions. J. Statist. Plan. Infer. 35, 65-75. [22] Singh, K. (1981). On the asymptotic accuracy of Efron's bootstrap. Ann. Statist. 9, 1187-1195. [23] Zhou, X.H., Melfi, A. and Hui, S.L. (1997). Methods for comparison of cost data. Ann. Internal Med. 127, 752-756.
Survival Under Uncertainty in an Exchange Economy1 Nigar Hashimzade and Mukul Majumdar Cornell University
Abstract The paper explores a number of issues related to economic survival in market economies. An individual agent may fail to survive (may be ruined) if it faces a collapse of endowment or unfavorable terms of trade. The role of "intrinsic" and "extrinsic" uncertainty in triggering unfavorable terms of trade is examined in detail. In the presence of intrinsic uncertainty affecting the endowments, an important issue is the nature of stochastic dependence among the agents, particularly in a large economy.
1 Introduction
The last twenty years have witnessed a significant growth of the literature on the "survival problem" ([25], p.436), primarily in the context of the causes and remedies of famines. Once a subject essentially of empirical development economics, economic survival became an issue of analytical economics and, most recently, of general equilibrium theory. Considerable progress has been achieved in the theoretical analysis and empirical investigations of the causes of famines and policy measures to combat famines (see the collection edited by Dreze [10] and the detailed list of references). There has been a recognition that a partial equilibrium model, focusing on the food market, is unable to capture the complexity of events that result in famines, and may indeed render misleading policy prescriptions. It is better to turn to general equilibrium models with an explicit treatment of survival, for a better understanding of the relevant issues. Cast in a market economy framework, a formal analysis clearly indicates that an agent may fail to survive due to an "endowment failure" and/or "an adverse movement of the terms of trade" As Sen puts it in [25]' "... starvation is a matter of some people not having enough food to eat, and not a matter of there being not enough food to eat. While the latter can be a cause of the former, it is clearly one of many possible influences.,,2 The Ethiopian famine in 1972-74 and the famine in Bangladesh in 1974 provide striking examples of the "terms of trade" effect, examples in which a particular group of agents got "decimated by the market mechanism." (Sen [26]) The famine victims often belonged to the groups of non-food producers. These individuals had to acquire food in the market in exchange for their output (or labor), and, thus, were more vulnerable IThe paper is dedicated with affection and respect to Professor Rabi Bhattacharya. Thanks are due to Kaushik Basu, Steve Coate, David Easley, James Mirrlees, and Karl Shell for discussion and comments. All remaining errors are ours. 2 As a matter of fact, "Some of the worst famines have taken place with no significant decline in food availability per head." ([26], p.17)
to the shifts in the terms of trade affecting their food purchasing power (see also [20], p.14). Sen's entitlement approach elaborated in [24]- [26], as well as the model of Coles and Hammond [7] are examples of static, deterministic analysis of the survival problem in a general equilibrium framework. Uncertainty was formally introduced, and the survival probability was precisely defined in a static Walrasian model in Bhattacharya and Majumdar [4]. Here again, an agent may fail to survive ("is ruined") in a particular state of the environment for two reasons: a meager endowment in this state (a direct effect on the individual) and/or an "unfavorable" equilibrium price system at which the wealth of the agent falls short of the minimum expenditure (computed at the equilibrium price system) needed for survival (an indirect terms of trade effect involving the preferences and the endowments of all the agents). The main results of Bhattacharya and Majumdar [4] and Hashimzade [13] (reviewed briefly in Section 2) characterize the probability of survival in a "large" Walrasian economy, under alternative assumptions on the nature of dependence among economic agents, when the endowments depend on the state of the environment. In both these studies mentioned above the uncertainty is "intrinsic" , i. e. affects one of the "fundamental characteristics" (endowments) of an economy. But in a dynamic world, adverse term-of-trade effects may emerge from "extrinsic" uncertainty, which may influence current prices through self-fulfilling beliefs or expectations. Static models are obviously inadequate to deal with such a role of expectations. Risk-averse agents tend to smooth consumption over time, and their intertemporal consumption decisions depend on their expectations about future endowments and prices. These decisions, in their turn, typically affect current equilibrium prices, as well as the probability of survival. In Section 3 we explore the connection between survival and extrinsic uncertainty more formally by using the overlapping generations ("OLG") model (see [11] and [22]. A typical overlapping generations economy is an infinite horizon discrete-time economy with an infinite sequence of consumers, each living two periods. In every time period t there are "young" agents, born at t, and "old" agents, born at t -1. If young agents are endowed by consumption good(s), and old agents are endowed by nominal asset (fiat money), there is an opportunity for an inter-generational trade. We give an example of an overlapping generations economy in which an agent may be ruined even when the fundamentals (endowments and preferences) of the economy are not affected by uncertainty. Self-fulfilling beliefs of the agents based on "sunspots" may generate an adverse terms of trade, i. e. may lead to an equilibrium price system at which the consumption of old agents is below the minimum subsistence level. We note that there is already a vast literature on OLG models, following the seminal paper by Samuelson [22], and, in particular, on the role of extrinsic uncertainty (following the paper by Cass and Shell [5]), but neither this literature, nor the literature on the Arrow-Debreu model of complete markets treats the question of economic survival. In Section 4, we turn to the question of insurance against risk, and we explore the role of markets for securities in the survival problem. Lack of insurance and
financial markets and the very limited access to such markets for a vast number of agents characterize many developing countries. However, even the presence of complete markets for securities does not necessarily improve the chance of survival of an agent. Trade in securities allows us to achieve optimal allocation, when the set of securities is complete (an example is a complete set of Arrow securities [1]: suppose, there are two possible states of environment. Then a complete set of Arrow securities would be a set of two securities, each paying one monetary unit in one state and nothing in another). Even so, the optimal allocation can be such that the consumption of some agents falls below the survival threshold. We consider an economy where endowments of the agents are random, and the agents can trade a complete set of securities (in our example securities yield payoff denominated in a numeraire commodity, see [12]) to insure themselves against this type of intrinsic uncertainty. We show that trade in securities can, in fact, worsen survival prospects of the agents 3.
2 Equilibrium
In what follows, R_{++} is the set of positive real numbers, x = (x_k) ∈ R^l is nonnegative (written x ≥ 0) if x_k ≥ 0 for all k, and x is strictly positive (written x ≫ 0) if x ∈ R^l_{++}.
Consider, first, a deterministic Walrasian exchange economy with two goods. Assume that an agent i has an initial endowment e_i = (e_{i1}, e_{i2}) ≫ 0, and a Cobb-Douglas utility function

u_i(x_{i1}, x_{i2}) = x_{i1}^γ x_{i2}^{1−γ},   (2.1)

where 0 < γ < 1 and the pair (x_{i1}, x_{i2}) denotes the quantities of goods 1 and 2 consumed by agent i. Thus an agent i is described by a pair α_i = (γ, e_i). Let p be the price of the first good. In a Walrasian model with two goods, we can normalize prices so that (p, 1 − p) is the vector of prices accepted by all the agents. The typical agent solves the following maximization problem (P):

maximize u_i(x_{i1}, x_{i2})   (2.2)

subject to the "budget constraint" defined as p x_{i1} + (1 − p) x_{i2} ≤ W_i, where the income or wealth W_i of the i-th agent is defined as the value of its endowment computed at (p, 1 − p):

W_i = p e_{i1} + (1 − p) e_{i2}.   (2.3)

Solving the problem (P) one obtains the excess demand for the first good as:

ζ_{i1}(p, 1 − p) = γ W_i / p − e_{i1}.   (2.4)

³We are not addressing the issue of practical implementation of securities or insurance policies. The point of this exercise is to demonstrate that the traditional approach to the equilibrium in market economies fails to tackle the survival problem precisely because the usual concept of Pareto optimality ignores the notion of survival.
One can verify that (2.5) The total excess demand for the first good at the prices (p, 1 − p) in a Walrasian exchange economy with n agents is given by:

ζ_1(p, 1 − p) = Σ_{i=1}^{n} ζ_{i1}(p, 1 − p).   (2.6)
In view of (2.5) it also follows that (2.7) The "market clearing" Walrasian equilibrium price system is defined by

ζ_1(p_n*, 1 − p_n*) = 0,   (2.8)

and direct computation gives us the equilibrium price p_n* (we emphasize the dependence of the equilibrium price on the number of agents by writing p_n*) as:

p_n* = Σ_{i=1}^{n} X_i / [Σ_{i=1}^{n} X_i + Σ_{i=1}^{n} Y_i],   (2.9)

where

X_i ≡ γ e_{i2},  Y_i ≡ (1 − γ) e_{i1}.   (2.10)

To be sure, one can verify directly that demand equals supply in the market for the second good when the excess demand for the first good is zero. Finally, let us stress that a Walrasian economy is "informationally decentralized" in the sense that agent i has no information about (e_j) for i ≠ j. Thus it is not possible for agent i to compute the equilibrium price p_n*.
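The equilibrium price in this Cobb-Douglas economy can be computed and checked numerically. The sketch below is ours: the endowments and the value of γ are arbitrary illustrative choices, and the script simply verifies that total excess demand for the first good vanishes at the price implied by (2.9)-(2.10).

```python
import numpy as np

rng = np.random.default_rng(5)
gamma = 0.4
e = rng.uniform(0.5, 2.0, size=(100, 2))     # endowments (e_i1, e_i2) of 100 agents (illustrative)

# Equilibrium price of good 1, from aggregating the Cobb-Douglas excess demands (2.4):
A = gamma * e[:, 1].sum()
B = (1 - gamma) * e[:, 0].sum()
p_star = A / (A + B)

# Market clearing check: total excess demand for good 1 at (p_star, 1 - p_star) is ~ 0.
W = p_star * e[:, 0] + (1 - p_star) * e[:, 1]
excess_good1 = (gamma * W / p_star - e[:, 0]).sum()
print(p_star, excess_good1)
```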
2.1 Survival
In order to provide the motivation for our formal approach, we recall the basic elements of Amartya Sen's analysis ([26], Appendix A) in our notation. Let F_i be a (nonempty) closed subset of R²_{++}. We interpret F_i as the set of all combinations of the two goods that enable the i-th agent to survive. Now, given a price system (p, 1 − p), one can define a function m_i(p) as

m_i(p) = inf{p x_1 + (1 − p) x_2 : (x_1, x_2) ∈ F_i}.   (2.11)

Thus, m_i(p) is readily interpreted as the minimum expenditure needed for survival at prices (p, 1 − p). Example: Let (a_{i1}, a_{i2}) ≫ 0 be a fixed element of R²_{++}. Let

F_i = {(x_{i1}, x_{i2}) ∈ R²_{++} : x_{i1} ≥ a_{i1}, x_{i2} ≥ a_{i2}}.   (2.12)
In our approach we do not deal with the set F_i explicitly. Instead, let us suppose that, in addition to its utility function and endowment vector, each agent i is characterized by a continuous function m_i(p) : [0, 1] → R_{++}, and say that for an agent to survive at prices (p, 1 − p), its wealth W_i(p) (see (2.3)) must exceed m_i(p). Hence, the i-th agent fails to survive (or, is ruined) at the Walrasian equilibrium (p_n*, 1 − p_n*) if

W_i(p_n*) < m_i(p_n*),   (2.13)

or, using the definition (2.3),

p_n* e_{i1} + (1 − p_n*) e_{i2} < m_i(p_n*).   (2.14)

From (2.13) and (2.14) one can see that an agent may face ruin due to (a) a possible endowment failure or (b) the equilibrium price system adversely affecting its wealth relative to the minimum expenditure. This issue is linked to the literature on the "price" and "welfare" effects of a change in the endowment on a deterministic Walrasian equilibrium (see the review of the transfer problem by Majumdar and Mitra [17]). Observe that in our economy even with exact information on the total endowment (Σ_{i=1}^{n} e_{i1}) of the first good ("food"), it is not possible to figure out how many agents may starve in equilibrium, in the absence of detailed information on the pattern of (e_i, m_i) (and the formula (2.9)).
2.2 Intrinsic uncertainty: computing the probability of ruin
Let us introduce uncertainty. Suppose that the endowments e_i of the agents (i = 1, 2, ..., n) are random variables. In other words, each e_i is a (measurable) mapping from a probability space (Ω, F, P) into the non-negative orthant of R². One interprets Ω as the set of all possible states of environment, and e_i(ω) is the endowment of agent i in the particular state ω. The distribution of e_i(·) is denoted by μ_i [formally each μ_i is a probability measure on the Borel σ-field of R², its support being a nonempty subset of the strictly positive orthant of R²]. From the expression (2.9), the "market clearing" equilibrium price p_n*(ω) is random, i.e., depends on ω:

p_n*(ω) = Σ_{i=1}^{n} X_i(ω) / [Σ_{i=1}^{n} X_i(ω) + Σ_{i=1}^{n} Y_i(ω)].   (2.15)

The wealth W_i(p_n*(ω)) of agent i at p_n*(ω) is simply p_n*(ω)e_{i1}(ω) + [1 − p_n*(ω)]e_{i2}(ω). The event

R_n^i = {ω ∈ Ω : W_i(p_n*(ω)) < m_i(p_n*(ω))}

is the set of all states of the environment in which agent i does not survive. Again, from the definition of the event R_n^i it is clear that an agent may be ruined due to a meager endowment vector in a particular state of environment.
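The probability of the ruin event just defined is straightforward to approximate by simulation. The sketch below is purely illustrative (lognormal endowments, a constant survival expenditure m_i, and a particular γ are our assumptions, not part of the model above).

```python
import numpy as np

rng = np.random.default_rng(6)
gamma, n, reps = 0.4, 200, 5000

def ruin_probability(m_i):
    """Monte Carlo estimate of P(R_n^i) for agent 1: its wealth at the random
    equilibrium price falls below a survival expenditure m_i (taken constant here)."""
    e = rng.lognormal(mean=0.0, sigma=0.5, size=(reps, n, 2))   # random endowments, state by state
    A = gamma * e[:, :, 1].sum(axis=1)
    B = (1 - gamma) * e[:, :, 0].sum(axis=1)
    p = A / (A + B)                                             # equilibrium price (2.15)
    wealth_1 = p * e[:, 0, 0] + (1 - p) * e[:, 0, 1]            # agent 1's wealth
    return np.mean(wealth_1 < m_i)

print(ruin_probability(m_i=1.0))
```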
In what follows, we shall refer to this situation as a "direct" effect of endowment uncertainty or as an "individual" risk of ruin. But it is also possible for ruin to occur through an unfavorable movement of the equilibrium prices (terms of trade) even when there is no change (or perhaps an increase!) in the endowment vector. A Walrasian equilibrium price system reflects the entire pattern of endowment that emerges in a particular state of the environment. Given the role of the price system in determining the wealth of an agent and the minimum expenditure needed for survival, this possibility of ruin through adverse terms of trade can be viewed as an "indirect" ("terms of trade") effect of endowment uncertainty. To begin with let us make the following assumptions:
A2. {Xi} are uncorrelated, {Yi} are uncorrelated.
A3. [(1/n) Σ_i EX_i] converges to some π_1 > 0, and [(1/n) Σ_i EY_i] converges to some π_2 > 0, as n tends to infinity.
In the special case when the distributions of e_i are the same for all i (so that (1/n) Σ_i EX_i = π_1, where π_1 is the common expectation of all X_i; similarly for π_2), A3 is satisfied. Under A1-A3, if the number n of agents increases to infinity, as a consequence of the weak law of large numbers (see Lamperti [15], p.22) we have the following property of equilibrium prices p_n*:
Proposition 1. Under A1-A3, as n tends to infinity, p_n*(ω) converges in probability to the constant

p_0 = π_1 / (π_1 + π_2).   (2.16)
Proposition 2. function,
If POeil(w)
+ (1
- po)ei2(w) has a continuous distribution
(2.17)
Remark: The probability on the right side of (2.17) does not depend on n, and is determined by J-li, a characteristic of agent i, and PO. Our first task is to characterize P(R~) when n is large (so that the assumption that an individual agent accepts market prices as given is realistic). One is tempted to conjecture that the convergence property of Proposition 1 will continue to hold if correlation among agents becomes 'negligible' as the size of the economy increases. We shall indicate a 'typical' result that captures such intuition. Proposition 3. Let the assumptions (Ai) and (AS) hold. Moreover, assume
(A.2') There exist two non-negative sequences (£kh2:o, (£~)k2:0 both converging to zero such that for all i, k
ICOV(Xi,Xi+k)1 ICov(Yi, Yi+k)1 Then, as n tends to infinity,
p~ (w)
< £k <
£~
converges in probability to the constant
Po = 7r1/[ 7r 1 + 7r2]
2.3
Some comments on Walrasian equilibria
The analysis so far is deceptively simple for one primary reason. Once one dispenses with the Cobb-Douglas functional form, one loses the formula (2.15) in which a unique equilibrium in every w is conveniently computed. A more general treatment - unavoidably more technical - is in [4] which contains the proofs of Propositions 1-3 above, and Proposition 4 below. In a more general framework with 1 2 2 goods (see [2] and [9] for a classical exposition of the deterministic Walrasian equilibrium theory), we begin with the price simplex S
= {p = (Pk)
E Rl : p
> 0, tPk = I}. An agent i accepts k=1
the price system pES as given. It is described by a pair (Ii, ei), where the endowment vector ei E Rl, ei »0. The wealth of the agent i at P is I
P . ei == L Pkeik· The demand function fi is a continuous function from k=1 S x R++ to R~ such that for every (p, Wi) E S x R++, P . fi(P, Wi) = Wi
Wi
=
I
(where p. fi(P, Wi) - LPkfidp, Wi)). Usually the demand functions are de-
k=1
rived from a utility maximization problem of type (P) indicated above. For our analysis, the key concepts are the excess demand function of agent i, defined as (i(p) - Ii(p, Wi) - ei (compare to (2.4)). The excess demand function I
for the economy is ((p) =
L (i(p), i=1
a continuous function on S.
Note that
Survival under Uncertainty in an Exchange Economy
194 I
LPk(ik(P)
= 0;
hence, the excess demand function for the economy satisfies
k=l
the "Walras Law":
I
p. ((p)
==
LPk(k(P)
= O.
(2.18)
k=l
An equilibrium price system p* E S satisfies ((p*) = O. By Walras Law (2.18), iffor any fJ E S (k(fJ) = 0 for k = 1,··· ,l-1 then necessarily (dfJ) = 0 for k = l. The Walras Law (2.18) can be verified directly from (2.3) and (2.4) in our example, and when the equilibrium price (2.9) is derived for the first market, there is also equilibrium in the second market which can be directly checked. A detailed exposition of this model with l 2: 2 commodities is in Debreu [9]. In [3] the Debreu model was extended to introduce random preferences and endowments, and the implications of the law of large numbers and the central limit theorem were first systematically explored. Throughout this section we shall consider l = 2 to see the main results in the simplest form.
2.4
Dependence: Exchangeability
We shall now see that if dependence among agents does not "disappear" even when the economy is large, the risk of ruin due to the "indirect" terms of trade effect of uncertainty may remain significant. To capture this in a simple manner, let us say that /1 and v are two possible probability laws of {ei (.) h::O-l. Think of Nature conducting an experiment with two outcomes "H" and "T" with probabilities (e, 1 - e), 0 < e < 1. Conditionally, given that "H" shows up, the sequence {ei (.) h>l is independent and identically distributed with common distribution /1. On the other hand, conditionally given that "T" shows up, the sequence {ei (-) h::o- I is independent and identically distributed with comon distribution v. Let 7rlp, and 7rlv be the expected values of Xl under /1 and v respectively. Similarly, let 7r2p, and 7r2v be the expected values of YI under /1 and v. It follows that Pn (-) converges to Po (-) almost surely, where Po (-) = 7flp,/[7flp, + 7f2p,] = Pop, with probability and Po(-) = 7rlv/[7flv +7r2v] = POv with probability 1 - e. We now have a precise characterization of the probabilities of ruin as n tends to infinity. To state it, write
e
J ri (/1)
{(UI,U2) E R! : POp,UI
l
+
(1- POp,)U2 :::; mi(POp,)};
(2.19)
lL (dU I, dU2).
Similarly, define ri(v) obtained on replacing /1 by v in (2.19). Proposition 4. Assume that POeil (w) + (1- PO)ei2(w) had a continuous distribution function under each distribution /1 and v of ei = (eil' ei2).
(a) Then, as the number of agents n goes to infinity, the probability of ruin of the i-th agent converges to ri (/1), with probability to ri(v), with probability 1 when "T" occurs.
e
e,
when "H" occurs and
Nigar Hashimzade and Muku] Majumdar
195
(b) The overall, or unconditional, probability of ruin converges to
Here, the precise limit distribution is slightly more complicated, but the important distinction from the case of independence (or, "near independence") is that the limit depends not just on the individual uncertainties captured by the distributions j1 and v of an agent's endowments, but also on () that retains an influence on the distribution of prices even with large n.
2.5
Dependency neighborhoods
Dependency neighborhoods were introduced by Stein [28] and are defined in the following way. Consider a set of n random agents. A subset Si of the set of integers {I, 2, ... , n} containing an agent i is a dependency neighborhood of i if i is independent of all agents not in Si. The sets Si need not constitute a partition. Further, consider a dependency neighborhood of Si - a set Ni such that Si N i , and the collection of agents in Si is independent of the collection of agents not in N i . The latter can be viewed as the second-order dependency neighborhood of the agent i. In general,
c:
Ni =
U Sj
(2.20)
{jEsd need not be the case (this is related to the fact that pairwise independence does not imply mutual independence), although one might expect this relation to hold in non-exotic situations (see, for example, [21]). Consider now an economy En with dependency neighborhoods sin), ... ,S~n) for each of n agents. As above, the i-th agent is characterized by Qi = (/, ei), where ei = (eil' ei2). The Walrasian equilibrium price p~ is given by (2.9)-(2.10). The convergence property, similar to Proposition 3, holds under modified assumptions on the distribution of random endowments and an additional assumption on the size of the dependency neighborhood. Proposition 5. Let the assumptions (A 1) and (A 3) hold. Moreover, assume
max 1COV(Zi' Zj) 1< B < 00 , Z E {X, Y}, for every i = i#jESt) 1, ... ,n uniformly in n for some sufficienly large positive B. (A.2") Bni ==
(A.4) Sn - . max US;n):::; n 1 - c uniformly in n for some ~==1, ..
E
E (0,1).
. ,n
Then, as n tends to infinity, p~ (w) converges in probability to p lim p~ (w) =
7f1 7f1
+ 7f2
Using the results of Majumdar and Rotar [19], we can construct approximate distribution of equilibrium price in a large Walrasian economy.
.
Survival under Uncertainty in an Exchange Economy
196
Proposition 6. Let the assumptions (A.1), (A.2"), (A.3) and (A.4) hold. Let us also assume that (2.20) holds for the dependency neighborhoods structure. Then the distribution of P~ (w) can be approximated by normal distribution with mean Po and variance Vn defined as
Po
(2.21 ) (2.22)
(See [13] for proofs.)
3
Extrinsic uncertainty with overlapping generations: an example
In the previous section we assumed that endowments of the agents are different in different states of environment. This type of uncertainty, that affects the socalled fundamentals of the economy (endowments, preferences, and technology), is called the intrinsic uncertainty. When the uncertainty affects the beliefs of the agents (for example, the agents believe that market prices depend on some "sunspots") whereas the fundamentals are the same in all states, this type of uncertainty is called extrinsic uncertainty. Clearly, with respect to the probability of survival, the extrinsic uncertainty has no direct effect, because it does not affect the endowments. However, it may have an indirect effect: self-fulfilling beliefs of the agents regarding market prices affect their wealth, and some agents may be ruined in one state of environment and survive in some other state, even though the fundamentals of the economy are the same in all states. To study the indirect, or the adverse term-of-trade effect of extrinsic uncertainty on survival we turn to a dynamic economy. Consider a discrete time, infinite horizon OLG economy with constant population. We use Gale's terminology [11] wherever appropriate. For expository simplicity, and without loss of generality we assume that at the beginning of every time period t = 1,2, ... there are two agents: one "young" born in t, and one "old" born in t - 1. In period t = 1 there is one old agent of generation O. There is one (perishable) consumption good in every period. The agent born in t (generation t) receives an endowment vector et = (e¥, en and consumes a We consider the Samuelson case 4 and assume, without loss vector Ct = (c¥, of generality, et = (1,0). We assume that the preferences of the agent of generation t can be represented by expected utility function Ut (-) = E rut (Ct)] with Bernoulli utility ut (Ct), continuously differentiable and almost everywhere twice continuously differentiable, strictly concave and strictly monotone onn D, compact, convex subset of R~+. The old agent of generation 0 is endowed with one
cn.
4If a population grows geometrically at the rate ,,(, so that "(t agents is born in period t, and there is only one good in each period, the Samuelson case corresponds to marginal rate of intertemporal substitution of consumption under autarky, Ul (e Y , eO)/U2 (e Y , eO), being less than 'Y- In our case "( = 1.
Nigar Hashimzade and Mukul Majumdar
197
unit of fiat money, the only nominal asset in the economy. In every period the market for the perishable consumption good is open and accessible to all agents. Denote the nominal price of the consumption good at time t by Pt. Define a price system to be a sequence of positive numbers, p = {Pt} ~o, a consumption program to be a sequence of pairs of positive numbers c = {Ct} ~o, a feasible program to be a consumption program that satisfies cf + Cf-l ~ ef + ef-I = 1. The agent of generation t maximizes his lifetime expected utility in the beginning of period t. In period 1, the young agent gives its saving (sf) of the consumption good, to the old agent in exchange for one unit of money (the exchange rate is determined by PI)' Thus, PI Sl = 1. This unit of money is carried into period 2 (the old age of agent born in period 1) and is exchanged (at the rate determined by P2) for the consumption food saved by the young agent born in period 2 (s~). The process is repeated.
3.1
Perfect Foresight Equilibrium
If there is no uncertainty, with perfect foresight the price-taking young agent's optimization problem is the following:
maxU(cf,
cn
subject to
cf cf (0
~
sf
~
1-
sf
ptsf /PHI
1, t = 1,2, ... ).
Here, sf - ef - cf is savings of the young agent (this is the Samuelson case, in Gale's definitions [11]). A perfect foresight competitive equilibrium is defined as a feasible program and a price system such that (i) the consumption program c = {cd solves optimization problem of each agent given p = {j:Jt} : (cr, cf) E V, cr = 1 - St and cf = '[JtSt/PHl with St
= arg
max U
O~s¥~1
((1-
sf)
,sf _Pt
)
Pt+1
and (ii) the market for consumption good clears in every period:
c¥ + Cf-l PtSt
for t
=
1,2,···.
1 1
(demand (demand
supply for the consumption good) supply for money)
Survival under Uncertainty in an Exchange Economy
198
By strict concavity of the utility function U (c¥, cn, the young agent's optimization problem has a unique solution. Hence, we can express St as a single-valued function of pt! Pt+ 1, i. e. we write St = St (Pt! PH d. This function (called savings function) generates an offer curve in the space of net trades, as price ratios vary. In the perfect foresight equilibrium (3.1)
The stationary perfect foresight monetary equilibrium is a sequence of constant prices P and constant consumption programs (1 - s, s), where s = s(1).5
3.2
Sunspot equilibrium
Now consider an extrinsic uncertainty in this economy. There is no uncertainty in fundamentals, such as endowments and preferences, but the agents believe that market prices depend on realization of an extrinsic random variable (sunspot). We assume that there is one-to-one mapping from the sunspot variable to price of the consumption good. Because the agents cannot observe future sunspots, they maximize expected utility over all possible future realization of the states of nature. We examine the situation with two states of nature, (J" E {a, ,6}, that follow a first-order Markov process with stationary transition probabilities, (3.2)
where 7ffJfJl > 0 is the probability of being in state (J"' in the next period given that current state is (J", and 7ffJOi + 7f fJ (3 = 1. A young agent born in t observes price pf and solves the following optimization problem: max [7f fJOi U(c¥,fJ, c~'Oi)
+ 7f fJ (3U(c¥,fJ, c~,(3)]
subject to 1- sf
(0 ~
sf
~ 1, s(::::: 0,
(J",(J"'
E
{a,,6}).
We restrict our attention to stationary equilibria, in which prices depend on the current realization of the state of nature (J", and do not depend on the calendar time nor the history of (J". A stationary sunspot equilibrium, SSE, is a pair of feasible programs and nominal prices, such that for every (J" E {a,,6} (i) the consumption programs solve the agents' optimization problem: sfJ (pfJ / pfJ l ) =
arg max [7ffJOiU ((1 - sfJ), sfJpfJ /pOi) 0<80-<1
+ (3.3)
5Given our assumptions on preferences and endowments, the stationary perfect foresight monetary equilibrium exists and is optimal (see, for example, [16], Ch. 8).
Nigar Hashimzade and Mukul Majumdar
199
and (ii) markets clear in every period, in every state.
It is easy to see that a stationary sunspot equilibrium exists when the equation pO: sO: (po:) _ pf3 pf3
s(3
(pf3) = 0 pO:
(3.4)
has positive solutions for pO: /pf3 other than 1 . Solution pO: /Pf3 = 1 corresponds to the equilibrium in which uncertainty does not matter. It can be shown that, if sunspot equlibria exist in this economy, there is at least two of them, with pO: /pf3 > 1 and pO: /pf3 < 1 (see, for example, [6], [27]). This means that in the sunspot equilibrium consumption of old agents is above the certainty equilibrium consumption of olds in one state of nature and below in the other. Suppose, we introduce an exogenous minimal subsistence level of consumption (independent of a E {oo,,B}). It may be the case that in one of the states of nature consumption of old agents falls short of minimal subsistence level: old agents are ruined. Note that the endowments are not affected by the uncertainty, and, therefore, there is no direct effect of uncertainty on ruin. The event of ruin is caused purely by an indirect, or term-of-trade effect: the equilibrium price system is such that the wealth of old agents does not allow them to survive. The following numerical example illustrates this possibility for the case of quadratic utility.
3.3
Ruin In equilibrium
Let the preferences of the agents be represented by expected utility function with
U(c)
u(eY,e O )
-
v(eO )
1 1 2aveYeo + q eY + reo - -b( eY)2 - -d( CO? 2 2 2 ~ (A - eO) , 0 < CO ::::; A { 0, eO> A where a, b, e, q, r, 0, A are positive constants such that the utility function is increasing and jointly concave in its arguments in V. v(·) is the disutility of consuming less than A, the minimal subsistence level. 6 As above, agents in each generation receive identical positive endowments e = 1 of consumption good when young and zero endowments when old; the initial olds are endowed with one unit of money. 6It may seem odd that the disutility from starvation is finite, but this can be justified by the willingness of the agents to take a risk. Consider the following. In the continuous time, if the consumption of an old agent is above A, he lives to the end of the second period. If his consumption is below A, perhaps, he does not die immediately. Albeit low, the amount consumed allows him to live some time in the second period, and his lifespan in the second period is the longer, the closer is his consumption to A. In the discrete time this translates into probability of survival in the second period as a function of consumption. Thus, the
200
3.3.1
Survival under Uncertainty in an Exchange Economy
Benchmark case: perfect foresight
For the above preferences, savings function St (pt/ Pt+!) is implicitly defined by (3.5) where Pt == Pt/Pt+l. The offer curve is described by
In the stationary (deterministic) perfect foresight monetary equilibrium consumption plan of an agent is (x, 1 - x), where x solves
a(
3.3.2
J~ t : 1
x -
+ x (b+ d) + v'(I- x) + q -
x)
r - b= 0
(:1.7)
Stationary sunspot equilibria
Two states of nature, 0: and f3 evolve according to a stationary first-order Markov process. The states of nature do not affect the endowments. Agents can trade their real and nominal assets. In a stationary sunspot equilibrium with trade sa, sf3 solve the following system of equations: 7f Oo
7f oo
a(Jl-;~a +
aJ l~:a + (1 - 7f00) aJ l~:a +
r - ds o
-
V'(SO))
q - b (1 - SO) =
+ (1_7f 00 ) (aJl~r + r -
(3.8)
d s f3 - V'(Sf3)) ::
and
It is easy to see that one solution is sa = sf3 = 1- x, where x solves the equation for the perfect foresight above. This solution does not depend on the transition probabilities, prices and consumption are not affected by the uncertainty: sunspots do not matter in this equilibrium. However, there may be more solutions. For example, for a = 2, b = 0.5, d = 7, q = 0.02, r = 0.6 (J = 0.05 A = 0.3 and n CW = n f3f3 = 0.15 there are three stationary monetary equilibria in the economy: one coinciding with the perfect foresight equilibrium and two sunspot equilibria. Prices and consumption programs for these equilibria are given in the following table. old agent survives with probability 1 if CO ;:;:. A and with probability less than 1 if CO < A. Suppose, the objective of the agent is to maximize the probability of survival (or maximize his expected lifespan). Then it can be presented equivalently as the objective to minimize the disutility from consumption at the level below A. Clearly, this disutility can be finite, at least in the vicinity of A, if the agent is willing to take a risk. The authors are indebted to David Easley for this argument.
Nigar Hashimzade and Mukul Majumdar
201
State
PFE
1st SSE
2nd SSE
0:
(0.6670; 0.3330; 3.00)
(0.5973; 0.4027; 2.48)
(0.7518; 0.2482; 4.03)
(3
(0.6670; 0.3330; 3.00)
(0.7518; 0.2482; 4.03)
(0.5973; 0.4027; 2.48)
(In every entry, the first number is consumption of young, the second is consumption of old, and the third is nominal price of consumption good.) The consumption programs in sunspot equilibria are Pareto inferior to the program in the perfect foresight equilibrium. Furthermore, in two sunspot equilibria old agents survive in one state of nature and fail to survive in another with the same amount of resources, because equilibrium price is too high. (We intentionally considered the case where agents survive in the certainty equilibrium to demonstrate that survival is always feasible. Also, in this model young agents always survive, - otherwise, the overlapping generations structure collapses.)
4
Insurance and survival
The purpose of the following examples is to demonstrate that trade in securities does not guarantee survival of all agents. Furthermore, trade in securities can even deteriorate the survival chances of some agents. For expositionary simplicity, we consider a static Cobb-Douglas-Sen economy, similar to the one described in Section 2.
4.1
Static economy with two states: definitions
Let us first restate the definitions of a stochastic general equilibrium concept in a Cobb-Douglas-Sen economy with logarithmic preferences for a particular case of two possible states of environment. Consider a pure exchange economy with two goods, l E {1,2}, with good 1 being a numeraire. There are two states of nature, sEn = {o:, (3}, with 7r = P[s = 0:] = 1 - P[s = (3]. Two consumers, i E {1,2}, receive endowments ei(s) = (eil(s),ei2(s)) E Each consumer is characterized by the Cobb-Douglas logarithmic utility function:
Ri·
(4.1) In addition, each consumer is characterized by the minimum expenditure function, mi (P* (.)), the level of wealth at and below which consumer i fails to survive in the equlibrium with (random, normalized) equilibrium price vector (l,p*(·)). Consumers maximize utility in every state, taking price as given. A random equilibrium is defined as a set of vectors of allocations, {Xi (s ) }, and prices, p* (s) for each state of nature, such that
Survival under Uncertainty in an Exchange Economy
202
1. Given normalized price vector (l,p*(s)) in state s, consumption vector Xi(S) = (Xi1(S),Xi2(S)) maximizes utility of consumer i in state s subject to his budget constraint, xi1(s) + P*(S)Xi2(S) :::; eil(s) + p*(s)ei2(s), for every i and s; 2. Markets for consumtpion goods clear in every state. If we allow "( (the parameter in Cobb-Douglas preferences) vary across the consumers, the equilibrium price in state s will be
Hence, wealth of consumer i in state s is
Assume, for simplicity, that the minimum expenditure function is the same for all agents and has linear form:
m(p*(s))
=
ao
+ p*(s)al
for some positive constants ao and al. Then, consumer i is ruined in state s if
If this inequality holds for consumer i for s = Q' only, then consumer i is ruined with probability 7r. If it holds for s = f3 only, then i is ruined with probability (1 ~ 7f). If it holds for consumer i in both states, then i is ruined with probability 1. Suppose, consumers know 7f. The question is, if consumers could trade securities before s is realized, would this improve their chances to survive?
4.2
Arrow-type securities in a two-period economy
Assume now, that in the economy described in Section 4.1 there are two time periods, t = 0,1. Let the preferences of the consumers be described by von Neumann-Morgenstern expected utility function, with Bernoulli utility in the log Cobb-Douglas form (4.1), with "( varying across consumers. At t = 0 consumers can issue and trade contracts in real Arrow-type securities. At t = 1 consumers receive their endowments, execute the contracts and trade consumption goods. Markets for securities are complete: for every state of nature there is a security that promises to deliver at t = lone unit of numernire good if this particular state occurs, and nothing in other states (see [23] and [12] for a more general exposition). Denote the holdings of security that pays in state s by for consumer i; E R. Consumers know probability distribution
yt
yt
Nigar Hashimzade and Mukul Majumdar
203
of the states of nature. In time period t = 0 they choose holdings of securities, or portfolios, (yt, yf) to maximize expected utility of consumption in time period t = 1. We normalize price of the asset that pays in state a to unity and denote price of the asset that pays in state f3 by q. A random equilibrium with complete asset markets is a set of vectors of portfolios {(yt,yf)}, allocations {Xi(S)}, security prices (l,q) and consumption good prices (l,p(s)) for each state of nature, such that 1. Given asset prices (1, q) and normalized consumption good price vector
(1, p( S )) in state s, portfolio (yt, yf) and consumption vector Xi (s) = (Xil(S),Xi2(S)) maximize expected utility of consumer i at t = 0 subject to his budget constraints at t = 0, yt + qyf s: 0, at t = 1, XiI (s) + P*(S)Xi2(S) s: ei1(s) + p*(s)ei2(S) + Yi, for every i and s; 2. Asset markets clear at t = 0; 3. Markets for consumption goods clear at t
= 1 in every state.
Routine calculations give the following expressions for equilibrium prices: q
p(f3) and
_()
p a
=
1- 7r El(a) 7r EI (f3)
----
p(a) E2(a)El (f3) EI (a)E2(f3)
Li(l-7rl'i)eil(a) - (1-7r) Lil'ieil(f3)E1(a)/E1(f3) 7r Li l'i ei2(a) + (1 - 7r) Li l'i ei2(f3)E2 (a)/ E 2 (f3) .
Here, EI(S) == Li eli(s) is aggregate endowment of good I in state s. Wealth (in terms of the numeraire) of consumer i at t = 1 is then
-
El (f3) -
Note, that Wi (f3) = -(-)Wi(a), which means that if there is no aggregate E1a uncertainty in the endowment of numeraire, wealth is equalized across states. If there is no aggregate uncertainty in the endowments of both goods, relative price of consumption goods is also equalized across states. Then p = p( a) = p(f3) will be between p* (a) and p* (f3) and Wi = Wi (a) = Wi (f3) will be between Wi ( a) and Wi (f3). For the minimum expenditure function in the above form, we will also have that mi (p) = mi (p( a)) = mi (p(f3)) will be between mi (p* (a) ) and mi (P* (f3)). Could it happen that wealth of a consumer in a particular state falls below the minimum subsistence level in an economy with securities, whereas without securities his wealth in the same state is above the minimum subsistence level?
Survival under Uncertainty in an Exchange Economy
204
The following simple numerical examples demonstrate this possibility for the case with no aggregate uncertainty and for the case with aggregate uncertainty in endowments.
4.2.1
Example A: No Aggregate Uncertainty
Consider an economy with two consumers, i E {I,2}. Let the preferences of these two consumers and their endowments in two states be the following:
Let P[s = 0:] securities
Consumer i
ri
ei (0:)
ei ((3)
i = 1
1/2
(1,0)
(0,2)
i=2
1/3
(1,4)
(2,2)
1 - P[s = (3] =
1f
= 1/4. Then in the equilibrium without
7 8
p* (0:)
-
p* ((3)
-
4
5
and in the equilibrium with securities
p( 0:)
=
p((3)
=
31 38·
Suppose, both consumers have minimal expenditure function in he linear form, with the same parameters ao = 3/4 and al = 1. Then the survival threshold in the economy without securities is 1.625 in state 0: and 1.55 in state (3. It is easy to see that agent i = 1 is ruined in state s = 0: and survives in state s = (3; agent i = 2 survives in both states. With securities, the survival threshold in both states is ~ 1.5658, and agent i = 2 still survives in both states, but agent i = 1 is now ruined in both states.
4.2.2
Example B: Aggregate Uncertainty
Consider the same economy, now with aggregate uncertainty in the endowments:
ri
ei ( 0:)
ei ((3)
i=I
1/2
(1,0)
(0,2)
i = 2
1/3
(0,2)
(2,2)
Consumer i
Nigar Hashimzade and Mukul Majumdar
With
7r
205
= 1/4 the equilibrium price without securities is p*(a) p*({3)
3 4 4 5
-
and with securities
p(a) p({3)
15 19 30 19
Let the minimal expenditure function for both consumers be linear, with ao = 1/5 and al = 1. The survival threshold in an economy without securities is, then, 0.95 in state a and 1 in state {3. Both agents survive in both states. With securities, the survival threshold is ~ 0.990 in state a and ~ 1.779 in state {3. In that case, agent 2 still survives in both states, but agent 1 survives only in {3 and is ruined in a. These two examples demonstrate how trade in securities may worsen survival prospects of the agents with random endowments even when markets for securities are complete.
5
Concluding remarks
In this paper we introduced a formal general equilibrium approach to the problem of survival under uncertainty. The question of obvious practical importance is "how does one improve the chance of survival of an agent"? Clearly, when ruin is caused by market forces, the intervention of the government is desirable. The choice of the optimal policy is determined by the policy tools available to the government, and the sensitivity of the survival probability to the changes in policy variables. For the case of static economy with intrinsic uncertainty this problem was touched upon in [4]. In particular, under certain assumptions on the joint distribution of the endowments and linearity of the minimum expenditure function, the probability of survival of an agent increases as the limiting averages of the endowments increase. For the OLG economy with extrinsic uncertainty we showed elsewhere [14] that a lump-sum tax and transfer policy, with the amounts of taxes and transfers depending on equilibrium market price, can stabilize consumption at certainty equilibrium level (without affecting prices), thus eliminating the possiblity of ruin of the agents. In any case, the general equlibrium framework has to be used in order to accurately predict the outcomes of various policy measures. Another issue should be mentioned. Throughout this paper we assumed that the objective of an agent is to maximize his expected utility (as the traditional economic theory postulates). In a model with a single agent Majumdar and Radner [18] explored the implications for maximization of the probability of
206
Survival under Uncertainty in an Exchange Economy
survival. A systematic extension of this analysis to a framework with many interacting agents remains an important direction of research. Nigar Hashimzade and Mukul Majumdar Department of Economics, Cornell University, Ithaca, New York 14853
Bibliography [1] K. Arrow, The Role of Securites in the Optimal Allocation of Risk-bearing, Rev. Econ. Studies 31 (1964),pp. 91-96. [2] Y. Balasko and K. Shell, The Overlapping-Generations Model. II. The Case of Pure Exchange with Money, J. Econ. Theory 24 (1981), pp. 112-142. [3] R. N. Bhattacharya and M. Majumdar, Random Exchange Economies, J. Econ. Theory 6 (1973), pp. 37-67. [4] R. N. Bhattacharya and M. Majumdar, On Characterizing the Probability of Survival in a Large Competitive Economy, Review of Economic Design 6 (2001), pp. 133-153. [5] D. Cass and K. Shell, Do Sunspots Matter? J. Polito Economy 91 (1983), pp. 193-227. [6] S. Chattopadhyay and T.J. Muench, Sunspots and Cycles Reconsidered, Economic Letters 63 (1999), pp. 67-75. [7] J .L. Coles and P.J. Hammond, Walrasian Equilibrium Without Survival: Existence, Efficiency and Remedial Policy. In: Basu et al (eds.), "Choice, Welfare and Development." Oxford: Clarendon Press (1995), pp. 32-64. [8] G. Debreu, "Theory of Value; An Axiomatic Analysis of Economic Equilibrium", New Haven: Yale University Press (1959). [9] G. Debreu, Economies with a Finite Set of Equilibria, Econometrica 38 (1970), pp. 387-392. [10] Jean Dreze (ed.), "The Economics of Famine" , Cheltenham, Northampton, MA, USA: An Elgar Reference Collection (1999).
UK,
[11] D. Gale, Pure Exchange Equilibrium of Dynamic Economic Models, J. Econ. Theory 6 (1973), pp. 12-36. [12] J. D. Geanakoplos and H. M. Polemarchakis, Existence, Regularity, and Constrained Suboptimality of Competitive Allocations when the Asset Market is Incomplete. In: W. P. Heller, R. M. Starr, and D. A. Starrett (Eds. ), "Uncertainty, Information, and Communication: Essays in Honor of Kenneth J. Arrow." Cambridge, New York and Melbourne: Cambridge University Press (1986), Vol. III, pp. 65-95. [13] N. Hashimzade, Probability of Survival in a Random Exchange Economy with Dependent Agents, forthcoming in Economic Theory (2002).
Nigar Hashimzade and Mukul Majumdar
207
[14] N. Hashimzade, "Survival with Extrinsic Uncertainty: Some Policy Issues", Working Paper (2002), Cornell University. [15] J. Lamperti, "Probability." New York: Benjamin (1966). [16] L. Ljungqvist and T. Sargent, "Recursive Macroeconomic Theory". Cambridge, Massachusetts; London, England: The MIT Press (2000). [17] M. Majumdar and T. Mitra, Some Results on the Transfer Problem in an Exchange Economy. In: Dutta, B. et al (eds.), "Theoretial Issues in Development Economics." New Delhi: Oxford University Press (1983), pp. 221-244. [18] M. Majumdar and R. Radner, Linear Models of Economic Survival Under Production Uncertainty, Economic Theory 1 (1991), pp.13-30. [19] M. Majumdar and V. Rotar, Equilibrium Prices in a Random Exchange Economy with Dependent Agents, Economic Theory 15 (2000), pp. 531550. [20] M. Ravallion, "Markets and Famines," Oxford: Clarendon Press (1987). [21] Y. Rinott and V. Rotar, A Multivariate CLT for Local Dependence with n- 1 / 2 Iog n rate, and Applications to Multivariate Graph Related Statistics, J. Multivariate Analysis 56 (1996), pp. 333-350. [22] P. Samuelson, An Exact Consumption-Loan Model ofInterest with or without the Social Contrivance of Money, JPE 66 (1958), pp. 467-482. [23] W. Shafer, Equilibrium with Incomplete Markets in a Sequence Economy. In: M. Majumdar (ed.), "Organizations with Incomplete Information. Essays in Economic Analysis: A tribute to Roy Radner." Cambridge, New York and Melbourne: Cambridge University Press (1998), pp. 20-41. [24] A. Sen, Starvation and Exchange Entitlements: A General Approach and its Application to the Great Bengal Famine, Cambridge J. Econ. 1 (1977), pp. 33-60. [25] A. Sen, Ingredients of Famine Analysis: Availability and Entitlements, Quarterly Journal of Economics 96 (1981), pp. 433-464. [26] A. Sen, "Poverty and Famines: An Essay on Entitlement and Deprivation," Oxford: Oxford University Press (1981). [27] S. Spear, Sufficient Conditions for the Existence of Sunspot Equilibria, J. Econ. Theory 34 (1984), pp. 360-370. [28] C. Stein, Approximate Computation of Expectations. Harvard, CA: IMS (1986).
208
Survival under Uncertainty in an Exchange Economy
Singular Stochastic Control in Optimal Investment and Hedging in the Presence of Transaction Costs Tze Leung Lai Stanford University
and Tiong Wee Lim National University of Singapore Abstract In an idealized model without transaction costs, an investor would optimally maintain a proportion of wealth in stock or hold a number of shares of stock to hedge a contingent claim by trading continuously. Such continuous strategies are no longer admissible once proportional transaction costs are introduced. The investor must then determine when the stock position is sufficiently "out of line" to make trading worthwhile. Thus, the problems of optimal investment and hedging become, in the presence of transaction costs, singular stochastic control problems, characterized by instantaneous trading at the boundaries of a "no transactions" region whenever the stock position falls on these boundaries. In this paper, we review various formulations of the optimal investment and hedging problems and their solutions, with particular emphasis on the derivation and analysis of Hamilton-Jacobi-Bellman (HJB) equations using the dynamic programming principle. A particular numerical scheme, based on weak convergence of probability measures, is provided for the computation of optimal strategies in the problems we consider.
1
Introduction
The problems of optimal investment and consumption and of option pricing and hedging were initially studied in an idealized setting whereby an investor incurs no transaction costs from trading in a market consisting of a risk-free asset ("bond") with constant rate of return and a risky asset ("stock") whose price is a geometric Brownian motion with constant rate of return and volatility. For example, Merton (1969, 1971) showed that, for an investor acting as a pricetaker and seeking to maximize expected utility of consumption, the optimal strategy is to invest a constant proportion (the "Merton proportion") of wealth in the stock and to consume at a rate proportional to wealth. In the related problem of option pricing and hedging, arbitrage considerations of Black and Scholes (1973) demonstrated that, by setting up a portfolio of stock and option that is risk-free, the value of an option must equal the amount of initial capital required for this hedging. However, both the Merton strategy and the Black-Scholes hedging portfolio require continuous trading and result in an infinite turnover of stock in any finite
209
Singular Stochastic Control
210
time interval. In the presence of transaction costs proportional to the amount of trading, such continuous strategies are prohibitively expensive. Thus, there must be some "no transactions" region inside which the portfolio is insufficiently "out of line" to make trading worthwhile. In such a case, the problems of optimal investment and consumption and of option pricing and hedging involve singular stochastic control. As we shall see, Bellman's principle of dynamic programming can often be used to derive (at least formally) the nonlinear partial differential equation (PDE) satisfied by the value function of interest. The derived PDE will then suggest methods (analytic or numerical) to solve for the optimal policies. One such numerical scheme, based on weak convergence of probability measures, will be particularly useful to the problems described in this paper. It turns out that some of the resulting free boundary problems can be reduced to optimal stopping problems in ways suggested by Karatzas and Shreve (1984, 1985), thereby simplifying the solutions of the original optimal control problems. We will focus on the two-asset (one bond and one stock) setting which many authors consider. Besides simplifying the exposition, such a setting can be justified by the so-called "mutual fund theorems" whenever lognormality of prices is assumed; see, for example Merton (1971) in the absence of transaction costs and Magill (1976) in the presence of transaction costs. Specifically, the market consists of two investment instruments: a bond paying a fixed risk-free rate r > 0 and a stock whose price is a geometric Brownian motion with mean rate of return Q > 0 and volatility a > o. Thus, the prices of the bond and stock at time t ~ 0 are given respectively by
dE t
= rEt dt
and
(1.1 )
where {Wt : t ~ O} is a standard Brownian motion on a filtered probability space (0, F, {Fdt2::o, JP) with Wo = 0 a.s. The investor's position will be denoted by (Xt, yt) (in Section 2) or (Xt, Yt) (in Section 3), where
X t = dollar value of investment in bond,
= dollar value of investment in stock, Yt = number of shares held in stock.
yt
(1.2)
In particular, we note the relation yt = Yt St.
The rest of the paper is organized as follows. In Section 2, we consider optimal investment and consumption, beginning with a treatment of the "Merton problem" (no transaction costs) over a finite horizon, and then proceeding to the transaction costs problem considered by Magill and Constantinides (1976) and, more recently, by ourselves. We also consider the infinite-horizon case, drawing on results from Davis and Norman (1990) and Shreve and Soner (1994), and review the work of Taksar, Klass and Assaf (1988) on the related problem of maximizing the long-run growth rate of the investor's asset value. The problem of option pricing and hedging in the presence of transaction costs is considered in Section 3. Some concluding remarks are given in Section 4.
Tze Leung Lai and Tiong Wee Lim
2
211
Optimal Consumption and Investment with Transaction Costs
The investment and consumption decisions of an investor comprise three nonnegative {Ft}t>o-adapted processes C, L, and M, such that C is integrable on each finite time interval, and Land Mare nondecreasing and right-continuous with left-hand limits. Specifically, the investor consumes at rate Ct from the bond and L t (resp. M t ) represents the cumulative dollar value of stock bought (resp. sold) within the time interval [0, t], 0 ::; t ::; T. In the presence of proportional transaction costs, the investor pays fractions 0 ::; A < 1 and 0 ::; fJ, < 1 of the dollar value transacted on purchase and sale of stock, respectively. Thus, the investor's position (Xt, yt) satisfies dXt dyt
= (r X t - Ct) dt - (1 + A) dL t + (1 = ayt dt + ayt dWt + dL t - dMt .
fJ,) dMt ,
(2.1a) (2.1b)
The factor 1 + A (resp. 1 - fJ,) in (2.1a) reflects the fact that a transaction fee in the amount of A dL (resp. fJ, dM) needs to be paid from the bond when purchasing dL (resp. selling dM) dollar value of stock. We define the investor's wealth (or net worth) as Zt = X t
+ (1 -
fJ,)yt
if yt ~ 0;
By requiring that the investor remains solvent (i.e., has nonnegative net worth) at all times, the investor's position is constrained to lie in the solvency region D which is a closed convex set bounded by the line segments
= a~D = [h,D
{(x, y) : x
> 0,
y
{(x,y) : x::; 0, y
+ (1 + A)Y = O}, ~ 0 and x + (1- fJ,)y = O}.
< 0 and
x
We denote by A(t, x, y) the class of admissible policies, for the position (Xt, yt) = (x, y), satisfying (Xs, Y s ) E D for t ::; s ::; T, or equivalently, Zs ~ 0 for t ::; s ::; T. At time t, the investor's objective is to maximize over A(t, x, y) the expected utility
J(t, x, y)
~ IE [iT e-~('-') U (C,) ds + e~(T-') U2(ZT) Ix, ~ x, Y" ~ y1' j
where f3 > 0 is a discount factor and U1 and U2 are concave utility functions of consumption and terminal wealth. We assume that U1 is differentiable and that the inverse function (U{)-l exists. Often U1 and U2 are chosen from the so-called HARA (hyperbolic absolute risk aversion) class:
U (c)
= CI
/,y if r < 1,
r =I 0;
U(c) = loge
if r = 0,
which has constant relative risk aversion -cU" (c) jU' (c) = 1 value function by
V(t, x, y)
=
sup (C,L,M)EA(t,x,y)
J(t, x, y).
r.
(2.2)
We define the
(2.3)
Singular Stochastic Control
212
2.1
The Merton Problem (No Transaction Costs)
Before presenting the solution to the general transaction costs problem (2.3), we consider the case A = /1 = 0 (no transaction costs) analyzed by Merton (1969). In this case, by adding (2.1a) and (2.1b), the total wealth Zt = X t + yt can be represented as (2.4) where ()t = yt/(Xt + yt) is the proportion of the investment held in stock. Using the reparameterization z = x + y, the value function can be expressed as
V(t,z) =
sup
(C,L,M)EA(t,z)
lE [rT e-!3(s-t)U1(C s )ds + e-!3(T-t)U2(ZT) I Zt
it
where A(t, z) denotes all admissible policies (C, ()) for which Zs
= z] ,
> 0 for all
t :::;: s :::;: T. The Bellman equation for the value function is max{(8/8t C,o
+ £:)V(t, z) + U(C)
subject to the terminal condition VeT, z) generator of (2.4): £:=
- (3V(t, z)} = 0,
(2.5)
= U2 (z), where £: is the infinitesimal
a 2()2 z 2 8 2 8 2 8z2 +[rz+(0:-r)Oz- C18z '
Formal maximization with respect to C and () yields C = (Un- 1 (Vz ) and () -(Vz/Vzz) (0: - r)/a 2z (in which subscript denotes partial derivative, e.g., Vz 8V/8z). Substituting for C and () in (2.5) leads to the PDE
8V _ (0: - r)2 (8V/8z)2 8t 2a 2 8 2V/8z 2 where C*
+
( _ C*)8V rz 8z
+
U (C*) _ (3V = 0 1
,
= =
(2.6)
= C*(t, z) = (Un-l (Vz(t, z)). Let p=
0: - r
c
(1 -1)a2'
=
_1_
[(3 _ Ir _
1- I
Ci(t) = c/{l - q\ec(t-T)}
(i = 1,2),
1(0: - r) 2 ] 2(1 -1)a 2 '
(PI =
1,
(h =
(2.7)
1- c.
If U1 takes the form (2.2), then C* = (Vz)l/CY-l) and solving the PDE yields the optimal policy: (); == p and C; = C1(t)Zt when U2 == 0, or C; = C 2(t)Zt when U2 takes the form (2.2). Note that c = when I = O. Thus, in the Merton problem, the optimal strategy is to devote a constant proportion (the Merton proportion p) of the investment to the stock and to consume at a rate proportional to wealth. Furthermore, for i = 1 or 2 (corresponding to U2 == 0 or to (2.2)), the value function is
(3
z"Y
Vet, z) = - [Ci(t)P-l I
Vet, z)
=
ai(t)
1
+ Ci(t)
if I < 1, 1-=1=0;
log[Ci(t)z]
if I = 0,
Tze Leung Lai and Tiong Wee Lim
213
where ai(t) = ,6-2[r - ,6 + (a - r)2/2o- 2]{1 - e t3 (t-T) [1 +
2.2
Transaction Costs and Singular Stochastic Control
In the presence of transaction costs, analytic solutions are generally unavailable, even for HARA utility functions. One approach to the problem is to apply a discrete time dynamic programming algorithm on a suitable approximating Markov chain for the controlled process. This approach is based on weak convergence of probability measures, which will ensure that the discrete-time value function converges to its continuous-time counterpart as the discretization scheme becomes infinitely fine. Note that the optimal investment and consumption problem involves both singular control (portfolio adjustments) and continuous control (consumption decisions).
We begin with an analysis of the Bellman equation, which will subsequently suggest an appropriate Markov chain approximation for our problem. We can obtain key insights into the nature of the optimal policies by temporarily restricting Land M to be absolutely continuous with derivatives bounded by"" l.e., Lt
=
lt
Rsds
and
Mt
=
lt
(2.8)
msds,
Proceeding as before, the Bellman equation for the value function (2.3) is max {(a/at
C,C,m
+ £:)V(t, x, y) + U1 (C) - ,6V(t, x, y)} = 0,
(2.9)
subject to V(T, x, y) = U2(X + (1 - fJ,)y) if y ?:: 0; V(T, x, y) = U2(x + (1 + >.)y) if y < 0, where £: is the infinitesimal generator of (2.1a)-(2.1b):
(J"2y2 a 2
a
a
[a
a ]
[
a
a]
£: = -2- ay2 +(rx-C) ax +ay ay + ay - (1 + >.) ax R+ (1 - fJ,) ax - ay m.
(2.10) The maximum in (2.9) is attained by C = (Un- 1 (Vx ), R = "'TI{Vy~(1+A)Vx}' and m = ",TI{Vy ::;(l-p,)Vx}. Thus, it can be conjectured that buying or selling either takes place at maximum rate or not at all, and the solvency region 'D can be partitioned into three regions corresponding to "buy stock" (B), "sell stock" (S), and "no transactions" (N). Instantaneous transition from B to the buy boundary aB or from S to the sell boundary as takes place by letting '" ---+ 00 and moving the portfolio parallel to aA'D or ap,'D (i.e., in the direction of
Singular Stochastic Control
214
V
(-1, (1 + A)-l or (1, -(1- JL)-l)T, where T denotes transpose). This suggests that Vet, x, y) = Vet, x + (1 - JL)8y, y - by) for (t, x, y) E Sand Vet, x, y) = Vet, x - (1 + A)8y, y + t5y) for (t, x, y) E B. In the limit as t5y -----f 0, we have
Vy(t, x, y) = (1 - JL)Vx(t, x, y), Vy(t, x, y) = (1 + A)Vx(t, x, y), In
N
the value function satisfies (2.9) with R= m
(t,x,y) E S, (t,x,y) E B.
(2.11a) (2.11b)
= 0, leading to the PDE
av (J2y2 a 2v * av av * - + - - + ( r x - C )-+ay-+U1(C )-;3V=O, at 2 ay2 ax ay
(t,x,y) EN, (2.11c)
where C* = C*(t,x,y) = (Un- 1 (Vx (t,x,y)) as in (2.6). To solve (2.11a)-(2.11c), the first step is to find an approximating Markov chain which is locally consistent with the controlled diffusion (2.1a)-(2.1b). Following Kushner and Dupuis (1992), we will use the "finite difference" method to obtain the transition probabilities of the approximating Markov chain. Specifically, for a candidate consumption decision (i.e., continuous control) C, we make the following (standard) approximations to the derivatives in equation (2.11c):
Vi(t, x, y)
-----f
[Vet + 8, x, y) - Vet, x, y)]/8,
V( x t,x,y )
-----f
{
-----f
{[V(t+t5,X,Y+E) - V(t+t5,X,Y)]/E ify ~ 0, [Vet + t5,x,y) - Vet + 8,x,y - E)]/E if Y < 0,
-----f
[Vet + 8,x,y + E)
T7 (
Vy t,x,y
)
Vyy(t,x,y)
[V(t+8'X+E,Y)-V(t+t5,X,Y)l/E ifrx-C~O, [Vet + 8,x,y) - Vet + 8,x - E,y)l/E if rx - C < 0,
+ Vet + 8,x,y -
E) - 2V(t + 8,X,Y)]/E 2.
(2.12) Collecting terms and noting that C* in (2.11c) is the optimal control, we obtain the following backward induction equation for the "consumption step":
VO(t, x, y)
= e-!3 omsx
{~p(X' f) x,y
1
x, y)V(t + t5, X, f))
+ 8Ul(C)}
,
(2.13)
where only the following five transition probabilities are nonzero:
p(x ±
E,
Y 1 x, y) = (rx - C)±8/E,
p(x, y ± E1 x, y) = ay±b/E + ((J2y2 /2)t5/E 2 , p(x, y 1x, y) = 1 - (Irx -
CI + alyl)8/E -
((J2y2)8/E2.
Equation (2.13) is to be evaluated for t E 1[' = {O, 8, 2t5, ... , N 8} with 8 = T /N and (x, y) belonging to some grid X x 1{ made up of multiples of ±E. Given 8, the choice of E must ensure that p(x, y 1 x, y) ~ 0. Let Al = maxxEX,C Irx - CI and A2 = max y E1{ Iyl. Then one could set
Tze Leung Lai and Tiong Wee Lim
215
A similar treatment of equations (2.11a)-(2.11b) yields respective relations for the "sell step" and the "buy step" (singular controls):
VS(t, x, y)
=
pV(t, x, Y - E)
vh(t, x, y) = (1
+ (1 -
+ >.)-1 [>' V(t, x -
p)V(t, x + E, y - E),
E, y)
+ V(t, x -
E, Y + E)].
Since only one of buy, sell or no transactions can happen at each step, the dynamic programming equation for the (discrete-time) finite horizon value function is therefore
V(t, x, y)
=
max{VO(t, x, y), VS(t, x, y), Vh(t, x, y)},
with terminal condition V(T, x, y) = U2(x + (1 - p)y) if Y 2': 0; V(T, x, y) = U2 (x + (1 + >.)y) if Y < O. For a sufficiently fine grid 1l' x x: x Y, this gives good approximations to the value function (2.3) and the transaction regions: (t, x, y) E S if V(t, x, y) = VS(t, x, y) and (t, x, y) E S if V(t, x, y) = Vh(t, x, y). When U1 and U2 take the form (2.2), we find that V is concave and homothetic in (x, y): for 'r/ > 0,
V(t, 'r/X, 'r/y)
=
'r/'V(t, x, y)
V(t, 'r/X, 'r/y) = {;3-1 [1-
if, < 1, ,
e(J(t-T)]
-I 0;
+ e(J(t-T)} log'r/ + V(t, x, y)
if,
= O.
Homotheticity of V suggests that if equations (2.11a) and (2.11b) are satisfied for some (t, x, y) E as and as, respectively, then the same is true for any (t, 'r/X, 'r/y) with 'r/ > o. Thus, it can further be conjectured that the boundaries between the transaction and no transactions regions are straight lines (rays) through the origin for each t E [0, T]. Moreover, since C* = (Vx )l/ b -l), equation (2.11c) becomes
av O' 2y2 a 2v av av 1 - , (av)'/h-l) -+----+rx-+ay-+-- --;3V = 0 at 2 ay2 ax ay , ax '
(t,x,y) EN,
with the fifth term on the l.h.s. of (2.14) replaced by -(1 + log Vx )
(2.14) when, = O.
We can further exploit homotheticity of V to reduce the nonlinear PDE (2.14) to an equation in one state variable. Indeed, let 'I/J(x) = V(t,x,l) so that V(t, x, y) = Y''I/J(t, x/y). Then, for some functions A*(t), A*(t), and -(1- p) < x*(t) < x*(t) < 00, equations (2.11a)-(2.11b) and (2.14) are equivalent to the following when, < 1 and, -I 0:
'I/J(t,x)
=
'I/J(t, x)
=
,-I A*(t)(x + ,-I A*(t)(x +
1- p)',
x ::; x*(t),
(2.15a)
1 + >')"
x 2': x*(t),
(2.15b)
x
E
[x*(t),x*(t)], (2.15c)
where
b3
=
0'2/2. (2.16)
Singular Stochastic Control
216
A similar set of equations can also be obtained for 'Y = o. A simplified version of the numerical scheme described earlier in this section can be implemented to solve for 'lj;(t, x) as well as the boundaries x*(t) and x*(t). For details and numerical examples, see Lai and Lim (2002a). Hence, for HARA utility functions, the optimal policy for the transaction costs problem (2.3) is given by the triple (C*,L*,M*), where
and L; =
!at lI{xs/ys=x*(s)} dL:,
t E [0, T].
The introduction of transaction costs into Merton's problem in Section 2.1 has the following consequence. The investor should optimally maintain the proportion of investment in stock between B*(t) := [1 + x*(t)]-1 > 0 and B*(t) := [1 + x*(t)]-1 < J-L-I, i.e., B*(t) :::; B; :::; B*(t) in our earlier notation. Thus, the no transactions region N is a "wedge" in the solvency region D. Such an observation can be traced back to Magill and Constantinides (1976), who found that "the investor trades in securities when the variation in the underlying security prices forces his portfolio proportions outside a certain region about the optimal proportions in the absence of transaction costs." The foregoing analysis and solution of problem (2.3) can be extended to the case of more than one stock. While a straightforward application of the principle of dynamic programming would suffice to derive the Bellman equation, computational aspects of the problem become much more involved. As pointed out by Magill and Constantinides (1976), m stocks imply 3m possible partitions of the solvency region so even for moderately large m (e.g., 35 ~ 250, 3 10 ~ 60000) it is unclear how to systematically solve for the transaction regions. When the stock prices are geometric Brownian motions, Magill (1976) established a mutual fund theorem on the reduction of the optimal investment and consumption problem to the case consisting of a bond and only one stock.
2.3
Stationary Policies for Infinite-Horizon Problems
We can view the infinite-horizon optimal investment and consumption problem as the limiting case of the finite-horizon problem in Section 2.2. By setting t = 0 and letting T --+ 00, the finite-horizon value function (2.3) approaches the following infinite-horizon value function (dropping the subscript on U1 ): V(x, y)
=
sup (C,L,M)EA(x,y)
IE
roo e-(3tU(Ct ) dt,
Jo
(x,y) ED,
(2.17)
where A(x,y) denotes the set of all admissible policies (C,L,M) for an initial position (x, y) E D such that (Xt, yt) ED for all t ~ 0 a.s. Because the problem no longer depends on time t, the regions 5, 5, and N are stationary over time. The Bellman equation is given by (2.9) without a/at. The analysis of Section
Tze Leung Lai and Tiong Wee Lim
217
2.2 carries over, leading to analogs of equations (2.11a)-(2.11c) (i.e., without t and av/at). For a general utility function U, the numerical procedure described in Section 2.2 can be modified to give a solution of the infinite-horizon investment and consumption problem. With the finite difference approximations given by (2.12) but without t or t + <5, we obtain, after normalization, the following analog of (2.13):
VO(x,y) = mgxe-(j3+a 2y2 /c 2 ),5
{~P(X'YIX'Y)V(X'Y) + <5U1(C)} ,
(2.18)
x,y
where <5
= E/'L., 'L. = Irx - CI + alyl, and p(x ± E, Y I x, y) = (rx - C)±<5/E,
p(x, y ±
E
I x, y) = ay±<5/E.
Thus, proceeding as in Section 2.2, the dynamic programming equation is
V(x, y) = max{Vo(x, y), VS(x, y), Vb(x, y)}, where VS(x, y)
(2.19)
= p, V(x, y - E) + (1 - p,)V(x + E, y - E) and Vb(x, y) = (1 +
.\)-1 [.\ V(x - E, y) + (1- '\)V(x - E, Y + E). According to which value on the r.h.s. of (2.19) V(x, y) takes, the position (x, y) is classified as belong to N, S, or B.
We next specialize U to take the form (2.2) to simplify the dynamic programming equation. For future reference, we begin with some results for the case of no transaction costs (.\ = p, = 0). An analysis of the infinite-horizon analog of (2.5) (i.e., without a/at) yields (); -= p and C; = cZt for all t ;?: 0, where p and c are given by (2.7). The value function is
V(z) 1 [
z'Y = _c'Y- 1
'Y
V(z) = {32 r - (3 +
if'Y < 1, 'Y =I- 0;
(a-r)2] 20- 2
1 + fjlog({3z)
if'Y
= O.
These results can also be derived from those in Section 2.1 on the Merton problem by letting T ----7 00, since then Ci (0) ----7 C (i = 1, 2). In the presence of transaction costs, the control problem has been independently considered by Davis and Norman (1990) using the principle of smooth fit and by Shreve and Soner (1994) using the concept of viscosity solutions to second-order PDEs. Earlier Constantinides (1986) obtained an approximate solution of the problem under the restriction that the investor consumes at a rate proportion to his holding in bond. A general numerical procedure when there are m > 1 stocks has been developed by Akian, Menaldi and Sulem (1996). Because V is concave and homothetic, it is possible to reduce the problem to solving ordinary differential equations (ODEs). Indeed, the control problem can be solved by finding a C 2 function 'ljJ and constants 00 > x* > x* > -(1 - p,) and A*, A* satisfying equations (2.15a)-(2.15c) without time dependence. It can be shown that ()* ::; p ::; ()*, with ()* = (1 + X*)-1, ()* = (1 + x*)-1. Two sufficient conditions for finiteness of the value function
Singular Stochastic Control
218
V are f3 > 1r + 1(0: - r)2 /{2(1-1)a 2} and (f3 - 0:1)(1 + A) > (f3 - r1)(1- J1); see Shreve, Saner and Xu (1991). Interestingly, if lump-sum transaction costs proportional to portfolio value (e.g., portfolio management fees) are imposed in addition to proportional transaction costs, then portfolio selection and withdrawal for consumption are made optimally at regular intervals (as opposed to trading at randomly spaced instants of time), with the investor consuming deterministically between transactions, as shown by Duffie and Sun (1990). To find the constants x*, x*, A*, A*, and the function 'ljJ, the principle of smooth fit can be first applied to 'ljJ" at x* and x* to solve for A* and A * (which depend on x* and x* respectively). Next, the second order ODE (2.15c) (without t and 8'ljJ / 8t) can be written as a pair of first-order equations after a change of variables. Specifically, for 1 # 0 (so U(c) = c'Y /1), let Q(f) = -bI/1- b2f + (1-1)b 3f 2 and R(f) = -bI/1 + (b3 - b2)f -1b3f2, where b1 , b2, and b3 are defined in (2.16). Then there exist functions f(x) and h(x) satisfying the system of differential equations
l' =
1 [R(f) - h], -b
f(x*) =
3X
1*
:=
x*::
+ A'
(2.20a) hi = _1_ _h_[h - Q(f)], 1 -1 b3 xf
h(x*) = Q(f*),
h(x*) = Q(f*),
(2.20b)
such that 'ljJ(x) =
~
[1 h (X)]'Y1 1-1
1
[~]'Y f(x)
satisfies (2.15c) (without t and 8'ljJ/8t). In this case, the optimal consumption policy is Ct = C*(Xt, Yt), where C*(x, y) = 1(1 - 1)-lxh(x/y)/ f(x/y). The case 1 = 0 can be treated similarly. Davis and Norman (1990) suggested the following algorithm for the numerical solution of (2.20a)-(2.20b) (in which f, h, x*' x* need to be determined). The iterative procedure starts with an arbitrary value x* of x* > 1 - p, and the corresponding values j* = x* /(x* + 1 + A) and h* = Q(}*). It uses numerical integration to evaluate j(x)
= j*
-l x
x*
R(j(u~) -
h(u) du,
3U
h(x) = h* - _1_1 x* h(u)[h(u) -=- Q(j(u))] du 1-1 x b3uf(u) for a sequence of decreasing x values until the first value x* of x for which h(x*) ~ Q(}(x*)). At this point, we have a solution of (2.20a)-(2.20b) with J1 replaced by x* + 1 - x*/ f(x*). The iterative procedure continues by adjusting the initial guess x* and computing the resulting x*' terminating when x* + 1 x*/ f(x*) differs from J1 by no more than some prescribed error bound.
Tze Leung Lai and Tiong Wee Lim
2.4
219
Maximization of Long-Run Growth Rate
An alternative optimality criterion was considered by Taksar, Klass and Assaf (1988). Instead of maximizing expected utility of consumption as in (2.17), suppose the objective is to maximize, in the model (2.1a)-(2.1b) without consumption (i.e., C t == 0), the expected rate of growth of investor assets (equivalently the long-run growth rate). This optimality criterion can be reformulated in terms of R t = Yt/ X t alone so that the problem is to minimize the following limiting expected "cost" per unit time: (2.21 ) where .\
f-Lx f(x)=x+l'
a2x 2 2(x + 1)2
(
2
( ) o:-r+- X- . 2 x +1 (2.22) In (2.21), L t (resp. NIt) can be interpreted as the cumulative percentage of stock bought (resp. sold) within the time interval [0, t], and is related to L t (resp. M t ) via dL t = (1/ Xt) dL t (resp. dMt = Y;;-1 dMd. If'\ = f-L = 0 (no transaction costs), the second and third terms in (2.21) vanish and the optimal policy is to keep Rt equal to the optimal proportion obtained as the minimizer of h(x). This is tantamount to setting Ot (= Yt/(X t + Yt)) equal to p* := (0: - r)/a 2 + 1/2, which resembles the Merton proportion pin (2.7).
g(x)=x+l'
h(x)
=
We study the general problem of minimizing (2.21) under the condition Io:-rl < (]"2/2. (If this condition is violated, the optimal policy is to transfer all the investment to bond or stock at time 0 and to do no more transfer thereafter.) Since
an analysis of the value function V using the Bellman equation shows (in a manner similar to the previous section) that there exist constants x *, x*, A (optimal value) such that
(a 2 /2)x 2 V"(x)
+ (0: -
r
+ a 2 /2)xV'(x) + h(x)
- A
= 0,
x E [x*, x*], (2.23a)
V'(x)
=
F(x),
x ~ x*'
V'(x) = G(x),
x> x* , -
(2.23b)
where F(x) = -'\(I+x)-l(I+(I+.\)x)-l and G(x) = f-L(1+x)-l(I+(I-f-L)X)-l. Using the principle of smooth fit at x* and x*, we find that A = h((1 + A)X*) = h((1 - f-L)x*), from which it follows that either
or
x* = (_1_) (p* - 1/2)(1 + A)X* + p* 1 - f-L (1 - p*)(1 + A)X* - p*
(2.24) Hence, even though an alternative criterion (of maximizing long-run growth rate) is used to assess the optimality of investment policies, the above analysis shows that like Section 2.3 the investor should again optimally maintain the
Singular Stochastic Control
220
proportion of investment in stock between ()* := x*/(1 + x*) and ()* := x* /(1 + x*). The constants x* and x* can be computed by solving the second-order nonhomogeneous ODE V'( ) = 2 x a 2 x 2p*
l
x
[h(
x*
x*
) _ h( )] 2(p*-1) d + F* + x*F~ (x* Y Y Y 1 - 2P* x
)2 P* (2.25)
with initial conditions V'(x*) = F* := F(x*) and V"(x*) = F~ := F'(x*) at x*' which is obtained by differentiating (2.23a)-(2.23b). A search procedure can then be employed to find that value of x* for which x* given by (2.24) satisfies V'(x*) = G(x*) in view of (2.23b).
3
Option Pricing and Hedging
This section considers the problem of constructing hedging strategies which best replicate the outcomes from options (and other contingent claims) in the presence of transaction costs, which can be formulated as the minimization of some loss function defined on the replication error. In our recent work, we directly minimize the (expected) cumulative variance of the replicating portfolio in the presence of additional rebalancing costs due to transaction costs. As shown in Section 3.3, this leads to substantial simplification as the optimal hedging strategy can be obtained by solving an optimal stopping (instead of control) problem. In Sections 3.1 and 3.2 we review an alternative approach, developed by Hodges and Neuberger (1989), Davis, Panas and Zariphopoulou (1993) and Clewlow and Hodges (1997), which is based on the maximization of the expected utility of terminal wealth and which generally results in a free boundary problem in four-dimensional space. Instead of solving the free boundary problem, Constantinides and Zariphopoulou (1999) derived analytic bounds on option pnces.
3.1
Formulation via Utility Maximization
The utility-based approach adopts a paradigm the investor trades only in the underlying stock and proportional transaction costs are imposed Following the notation in (1.2), his holding of (number of shares) is given by dXt
similar to Section 2. Suppose on which the option is written on purchase and sale of stock. bond (dollar value) and stock
= r X t dt - (1 + ),,)St dL t + (1 -
(3.1a)
fL)St dMt ,
dYt = dL t - dMt ,
(3.1b)
where L t (resp. M t ) represents the cumulative number of shares bought (resp. sold) within the time interval [0, t]. Define the cash value of y shares of stock when the stock price is S by
Y(y, S) = (1
+ )..)yS
if y < 0;
Y(y, S)
= (1 - fL)yS if y
~
o.
Tze Leung Lai and Tiong Wee Lim
221
For technical reasons, the investor's position is constrained to lie in the region
v = ((x,y,S)
E lR 2 x lR+: x
+ Y(y,S) > -a}
(3.2)
for some prescribed positive constant a. We denote by A(t, x, y, S) the class of admissible trading strategies (L, M) for the position (x, y, S) E V at time t such that (Xs, Ys, Ss) E V for all s E [t, T]. The objective is to maximize the expected utility of terminal wealth, giving rise to the value functions Vi(t,x,y,S)
=
lE [U(Z~)],
sup
i
= 0, s, b,
(3.3)
(L,M)EA(t,x,y,S)
where U : lR -+ lR is a concave increasing function (so it is a risk-averse utility function). The terminal wealth of the investor (with or without an option position) is given by z~
=
Z~ =
Z~ =
+ Y(YT, ST) X T + Y(YT, ST) II{STSK} + [Y(YT - 1, ST) + K] II{ST>K} X T + Y(YT, ST) II{STSK} + [Y(YT + 1, ST) - Kj II{ST>K} XT
(no call), (sell a call), (buy a call),
in which we have assumed that the option is asset settled so that the writer delivers one share of stock in return for a payment of K when the holder chooses to exercise the option at maturity T. In the case of cash settled options, the writer delivers (ST - K)+ in cash, so Z~ = X T + Y(YT, ST) - (ST - K)+ and Z~ = X T + Y(YT, ST) + (ST - K)+. From the definition of the value functions (3.3), it is evident that an application of the principle of dynamic programming will yield the same PDE for each value function (i = 0, s, b), with the terminal condition governed by utility of the respective terminal wealth. By temporarily restricting Land M as in (2.8) (and then letting r;, -+ (0), the Bellman equation for Vi is max£,m(8/8t + £)Vi(t, x, y, S) = 0, where £ is the infinitesimal generator of (3.1a)-(3.1b) and dSt = St(adt + adWt ):
£
8 = rx-
8x
2
2
8 +a S2 8 + as- -2 + [ -8 8S
2
8S
8y
(1
8 ]f + + A)8x
[ (1 - JL)8 - -8 ] m. 8x
8y
Thus, once again, the state space can be partitioned into regions in which it is optimal to buy stock at the maximum rate, or to sell stock at the maximum rate, or not to do any transaction. Arguments similar to those in Section 2 show that there exist functions y*(t, x, S) (buy boundary) and y*(t, x, S) (sell boundary) for each i = 0, s, b such that V~(t, x, y, S) = (1
+ A)SV:(t, x, y, S),
V~(t,x,y,S) = (1- JL)SV:(t,x,y,S),
lI;;i + rxV:
+ aSV~ + (a 2 S2 /2)V~s
=
0,
y
s:;
y*(t, x, S),
(3.4a)
y::::: y*(t,x,S),
(3.4b)
y E [y* (t, x, S), y* (t, x, S)], (3.4c)
The optimal hedging strategy associated with (3.3) is given by the pair (L *, M*), where for each i = 0, s, b, L*t --
l
0
t
II {ys=y.(s,Xs,Ss)} dL*s'
t E
[O,Tj.
Singular Stochastic Control
222
Two different definitions of option prices have been proposed. In Hodges and Neuberger (1989) and subsequently in Clewlow and Hodges (1997), the reservation selling (resp. buying) price is defined as the amount of cash ps (resp. pb) required initially to provide the same expected utility as not selling (resp. buying) the option. Thus, ps and pb satisfy the following equations:
(3.5) An alternative definition is used by Davis, Panas and Zariphopoulou (1993). Assuming that U(O) = 0, define Xi
=
inf{x : Vi(O,
X,
0, S)
~
O},
°
i
= O,s, b,
°
so in particular, X O:s; because VO(O, 0, 0, S) ~ (investing in neither bond nor stock is admissible). Thus, an investor pays an "entry fee" -x o to trade in the market strictly on his own account. The selling price ps and buying price pb of the option are then constructed such that the investor is indifferent between going into the market with and without an option position: ps = X S - x O and pb = -(x b -x O ). Although they advocate this definition for the option writer's price, Davis, Panas and Zariphopoulou (1993, pp. 492-493) express reservations of using it to define the buyer's price.
3.2
Solution for Exponential Utility Functions
A reduction in dimensionality (from four to three) can be achieved by specializing to the negative exponential utility function U(z) = 1 - e--Yz (with constant index of risk aversion -U"(z)jU'(z) = ,). Using this utility function, the bond position can be managed through time independently of the stock holding and
Vi(t, X, y, S)
=
1 - exp { _,xer(T-t)} Hi(t, y, S),
i = 0, s, b,
where Hi(t, y, S) := 1 - Vi(t, 0, y, S). As a consequence, the free boundary problem (3.4a)-(3.4c) for each i = 0, s, b is transformed into the following problem:
H;(t, y, S)
= _,er(T-t) (1
H;(t,y,S) = H:
_,e
+ )")SHi(t, y, S),
y:S; y*(t, S),
(3.6a)
p)SHi(t,y,S),
y ~ y*(t,S),
(3.6b)
[y*(t, S), y*(t, S)].
(3.6c)
r (T-t)(1_
+ aSH1 + (a 2 S 2 /2)H1s =
0,
y
E
It is also straightforward to observe that the price definitions are equivalent to pS _ -1 -rTl [HS(O,O,S)] -, e og HO(O,O,S) ,
pb __ -
-1 -rT
,
e
1
[Hb(O, 0, S)]
og HO(O, 0, S)
. (3.7)
The solution of the free boundary problem (3.6a)-(3.6c) can be obtained by approximating dYt = dL t - dMt and dSt = St(a dt + a dWt ) with Markov chains and applying a that discrete-time dynamic programming algorithm as in
Tze Leung Lai and Tiong Wee Lim
223
Section 2.2. To this end, it is useful to note from (3.6a)-(3.6b) that
Hi(t, Yl, S) = Hi(t, Y2, S) exp {~l'er(T-t)(l
+ ),,)S(YI ~ Y2)},
Yl:::; Y2 :::; y*(t, S),
Hi(t, Yl, S) = Hi(t, Y2, S) exp { ~l'er(T-t)(l ~ P,)S(YI ~ Y2)}, Yl 2': Y2 2': y*(t, S). We discretize time t so that it takes values in 1f = {O, 15, 215, ... , N 15}, where 15 = T / N. The number of shares is also discretized so that Y is a multiple of E. Then we can approximate the stock price process using the following random walk: with probability p, with probability 1 ~ p, where u = Ja 215 + (a ~ a 2/2)215 2 and p = [1 + (a ~ a 2/2)15/u]/2. Let Y = {kE : k is an integer} and § = {e ku So : k is an integer} This discretization scheme leads to the following algorithm for (t, y, S) E 1f X Y X §:
Hi(t, y, S) = min {Hi(t, Y + E, S) exp Hi(t, Y ~
E,
ber(T-t) (1
+ ),,)SE] ,
S) exp [ ~ "(er(T-t) (1 ~ p,)SE] ,
pHi(t + 15, y, eUS)
+ (1 ~ p)Hi(t + 15, y, e-US) };
(3.8)
see Davis, Panas and Zariphopoulou (1993) and Clewlow and Hodges (1997) for details. Depending on which term on the r.h.s. of (3.8) is the smallest, the point (t, y, S) is classified as belonging to B, S, or N, respectively. We set Y* ( t, S) (resp. y* (t, S)) to be the largest (resp. smallest) value of Y for which (t, y, S) E B (resp. S).
3.3
A New Approach
The previous analysis shows that, in the presence of transaction costs, perfect hedging of an option is not possible and trading in options involves an element of risk. Indeed, if the region D defined in (3.2) is replaced by the solvency region of Section 2, Soner, Shreve and Cvitanic (1995) showed that "the least costly way of hedging the call option in a market with proportional transaction costs is the trivial one-to buy a share of the stock and hold it." By relaxing the requirement of perfect hedging, Leland (1985) and Boyle and Vorst (1992) demonstrated that discrete-time hedging strategies, for which trading takes place at regular intervals, can nearly replicate the option payoff at maturity. The option price is essentially the Black-Scholes value with an adjusted volatility. While hedging error can be reduced to zero as the time between trades approaches zero, the adjusted volatility approaches infinity and the option value approaches the value of one share of stock. A new approach has been recently proposed in Lai and Lim (2002b). The formulation is motivated by the original analysis of Black and Scholes (1973) in the following way: form a hedging portfolio that minimizes hedging error and price the option by the (expected) initial capital require to set up the hedge.
Singular Stochastic Control
224
For the hedging portfolio, the objective is to minimize the expected cumulative instantaneous variance and additional rebalancing costs due to transaction fees, given by
J(t, S, y) = IE
[iT iT
F(s,
+",
s" y,) ds +
(S,/K) dM,
>'iT
Is,
(S'/ K) dT",
=
S,y,
=
yj.
= a 2(S/ K)2[y - fl.(t, S)]2
for the option writer and F(t, S, y) = + ~(t,S)F for the option buyer. Here, fl.(t,S) = N(d1(t,S)) is the Black-Scholes delta (i.e., the number of shares in the option's perfectly replicating portfolio) with where F(t, S, y)
a 2(S/K)2[y
d1(t, S) = {log(S/K)
+ r(T - t)}/aJT - t + aJT - t/2.
Taking 0: = r, analysis of the Bellman equation for the value function V (t, S, y) minL,M J(t, S, y) leads to the following free boundary problem:
Vy(t, S, y) = -AS/ K Vy(t, S, y) = /-LS/ K
=
NC n {y < ~(t, S)}, in NC n {y > ~(t, S)}, in
inN. By working with Vy instead of directly with V, we deduce from the previous set of equations that Vy(t, S, y) satisfies another free boundary problem associated with an optimal stopping problem. It is this reduction to optimal stopping that greatly simplifies the hedging problem. Applying the transformations s = a 2(t - T) and z = log(S/ K) - (p - 1/2)s, where p = r / a 2, it suffices to work with v(s, z, y) = Vy(t(s), S(s, z), y). For each y, we obtain the following discrete-time dynamic programming equation for the option writer, utilizing a symmetric Bernoulli walk approximation to Brownian motion:
v(s, z, y) = min{/-Le z+!3s, v(s, z, y)}lI{y>D(s,z)} + max{ _Ae z+!3s, v(s, z, y) }1I{y
(3.9)
= [/-LII{y>D(o,z)} - AII{y
![v(s+r5,z+05,y)+v(s+r5,z-05,y)], g(s,z,y) = 2e 2(z+(p-l/2)s)[y-D(s,z)], D(s, z) = eO: Ps (z/ J=S + J=S), and s = -15, -215, .... Each point (s, z, y) E (-00,0] x lR x [0,1] can be classified as belonging to the sell region, buy region, or no transactions region, according to whether v(s,z,y) = /-Le z+f3s , v(s,z,y) = _Ae z+!3s, or -Ae z+!3s < v(s, z, y) < /-Le z+f3s , respectively. Since v(s, z, y) is nondecreasing in y, there exist sell and buy boundaries, denoted respectively by yS(s, z) and yb(s, z), such that if y > yS(s, z) (resp. y < yb(s, z)), the option writer must immediately sell y-yS(s, z) (resp. buy yb(s, z) -y) shares of stock to form an optimal hedge. The optimal hedging portfolio for the option buyer can also be obtained from (3.9) by symmetry: the optimal sell and buy boundaries for the option buyer with sell rate /-L and buy rate A are _yb(s, z) and -yS(s, z)
Tze Leung Lai and Tiong Wee Lim
225
respectively, where yS(s, z) and yb(s, z) are the optimal sell and buy boundaries for the option writer with sell rate A and buy rate J-L. Simulation studies have shown the approach to be efficient in the sense that it results in the smallest standard error of hedging error for any specified mean hedging error, where hedging error is defined to the difference between the Black-Scholes value and the initial capital needed to replicate the option payoff at maturity. For details and refinements, see Lai and Lim (2002b).
4
Conclusion
Optimal investment portfolios and hedging strategies derived in the absence of transaction costs involve continuous trading to maintain the optimal positions. Such continuous policies are at best approximations to what can be achieved in the real world, and a frequent practice is to execute the policies discretely so that transactions take place at regular (or predetermined) intervals. With appropriate adjustments, these policies can also be implemented in the presence of transaction costs since they do not lead to an infinite turnover of asset. However, in the absence of a clearly defined objective, it is difficult to argue that a discrete policy is optimal in any sense. This difficulty can be overcome in investment and consumption problems through utility maximization, and in option pricing and hedging problems through the minimization of hedging error. Many formulations of these problems lead naturally to singular stochastic control problems, in which transactions either occur at maximum rate ("bang-bang") or not at all. In the analysis of these singular control problems, the principle of dynamic programming is used to derive the Bellman equations, which are nonlinear PDEs whose solutions in the classical sense have posed formidable existence and uniqueness problems. The development of viscosity solutions to these PDEs in the 1980s is a major breakthrough that circumvents these difficulties; see Crandall, Ishii and Lions (1992). In contrast to discrete policies, singular control policies require trading to take place at random instants of time, when asset holdings fall too "out of line" from a "target." Besides being naturally intuitive, singular control policies lend further insight into optimal investor behavior when faced with investment decisions (with or without consumption). Efficient numerical procedures can be developed to solve for the singular control policies based on Markov chain approximations of the controlled diffusion process. In some instances, a reduction to optimal stopping reduces the computational effort considerably.
Tiong Wee Lim Dept. of Statistics and Appl. Prob. National University of Singapore Singapore 117546
Tze Leung Lai Department of Statistics Stanford University Stanford, CA 94305
226
Singular Stochastic Control
Bibliography [1] Akain, M., Menaldi, J. L. and Sulem, A. (1996). On an investmentconsumption model with transaction costs. SIAM J. Control Optim. 34 329-364. [2] Black, F. and Scholes, M. (1973). The pricing of options and corporate liabilities. J. Political Economy 81 637-654. [3] Boyle, P. P., and Vorst, T. (1992). Option replication in discrete time with transaction costs. J. Finance 47 271-293. [4] Clewlow, L. and Hudges, S. D. (1997). Optimal delta-hedging under transaction costs. J. Econom. Dynamics Control 21 1353-1376. [5] Constantinides, G. M. (1986). Capital market equilibrium with transaction costs. J. Political Economy 94 842-862. [6] Constantinides, G. M. and Zariphopoulou, T. (1999). Bounds on prices of contingent claims in an intertemporal economy with proportional transaction costs and general preferences. Finance Stoch. 3 345-369. [7] Cox, J. C. and Huang, C. F. (1989). Optimal consumption and portfolio policies when asset prices follow a diffusion process. J. Econom. Theory 49 33-83. [8] Crandall, M. G., Ishii, H. and Lions, P. L. (1992). User's guide to viscosity solutions of second order partial differential equations. Bull. Amer. Math. Soc. 27 1-67. [9] Davis, M. H. A. and Norman, A. R. (1990). Portfolio selection with transaction costs. Math. Oper. Res. 15 676-713. [10] Davis, M. H. A., Panas, V. G. and Zariphopoulou,T. (1993). European option pricing with transaction costs. SIAM J. Control Optim. 31 470493. [11] Duffie, D. and Sun, T. (1990). Transaction costs and portfolio choice in discrete-continuous-time setting. J. Econom. Dynamics Control 14 35-51. [12] Hodges, S. D. and Neuberger, A. (1989). Optimal replication of contingent claims under transactions costs. Rev. Futures Markets 8 222-239. [13] Karatzas, I., Lehoczky, J. P. and Shreve, S. E. (1987). Optimal portfolio and consumption decisions for a "small investor" on a finite horizon. SIAM J. Control Optim. 25 1557-1586. [14] Karatzas, I. and Shreve, S. E. (1984). Connections between optimal stopping and singular stochastic control I. Monotone follower problems. SIAM J. Control Optim. 22 856-877. [15] Karatzas, I. and Shreve, S. E. (1985). Connections between optimal stopping and singular stochastic control II. Reflected follower problems. SIAM J. Control Optim. 23 433-451.
Tze Leung Lai and Tiong Wee Lim
227
[16] Kushner, H. J. and Dupuis, P. G. (1992). Numerical Methods for Stochastic Control Problems in Continuous Time. Springer-Verlag, New York. [17] Lai, T. L. and Lim, T. W. (2002a). Optimal investment and consumption on a finite horizon with transaction costs. Technical Report, Department of Statistics and Applied Probability, National University of Singapore. [18] Lai, T. L. and Lim, T. W. (2002b). A new approach to pricing and hedging options with transaction costs. Technical Report, Department of Statistics, Stanford University. [19] Leland, H. E. (1985). Option pricing and replication with transactions costs. J. Finance 40 1283-1301. [20] Magill, M. J. P. (1976). The preferability of investment through a mutual fund. J. Econom. Theory 13 264-271. [21] Magill, M. J. P. and Constantinides, G. M. (1976). Portfolio selection with transaction costs. J. Econom. Theory 13 245-263. [22] Merton, R. C. (1969). Lifetime portfolio selection under uncertainty: The continuous-time case. Rev. Econom. Statist. 51 247-257. [23] Merton, R. C. (1971). Optimum consumption and portfolio rules in a continuous-time sense. J. Econom. Theory 3 373-413 [Erratum 6 (1973) 213-214]. [24] Shreve, S. E. and Soner, H. M. (1994). Optimal investment and consumption with transaction costs. Ann. Appl. Probab. 4 609-692. [25] Shreve, S. E., Soner, H. M. and Xu, G.-L. (1991). Optimal investment and consumption with two bonds and transaction costs. Math. Finance 153-84. [26] Soner, H. M., Shreve, S. E. and Cvitanic, J. (1995). There is no nontrivial hedging portfolio for option pricing with transaction costs. Ann. Appl. Probab. 5 327-355. [27] Taksar, M., Klass, M. J. and Assaf, D. (1988). A diffusion model for optimal portfolio selection in the presence of brokerage fees. Math. Oper. Res. 13 277-294.
228
Singular Stochastic Control
Parametric Empirical Bayes Model Selection Some Theory, Methods and Simulation Nitai Mukhopadhyay Eli Lilly and Company
and
J ayanta Ghosh Purdue University Abstract For nested models within the PEB framework of george and Foster (Biometrika,2000), we study the performance of AIC, BIC and several relatively new PEB rules under 0-1 and prediction loss, through asymptoties and simulation. By way of optimality we introduce a new notion of consistency for 0-1 loss and an oracle or lower bound for prediction loss. The BIC does badly, AIC does well for the prediction problem with least squares estimates. The structure and performance of PEB rules depend on the loss function. Properly chosen they rend to outperform other rules.
1
Introduction
Our starting point is a paper by George and Foster (2000), abbreviated henceforth as [6]. [6] propose a number of new methods using PEB (Parametric Empirical Bayes) ideas on model selection as a tool for selecting variables in a linear model. An attractive property of the new methods is that they use penalized likelihood rules with the penalty coefficient depending on data, unlike the classical AIC, due to Akaike (1973), and BIC, due to Schwartz (1978), which use constant penalty coefficients. The penalty for a model dimension q is usually Aq, where A is a penalty coefficient. [6] compare different methods through simulation. Our major contribution is to supplement this with some theoretical work for both prediction loss and 0-1 loss. The former is supposed to be relevant in soft science, where one only wants to make good prediction, and the latter is relevant in hard science, where one wants to know the truth. It is known in model selection literature that these different goals lead to different notions of optimality. Our theory is based on the assumption that we have nested , orthogonal models - a situation that would arise if one tries to fit an orthogonal polynomial of unknown degree. This special case receives special attention in [6]. Our paper is based on Chapter 4 of Mukhopadhyay (2000), subsequently referred to as [9]. A related paper is Berger, Ghosh and Mukhopadhyay, (2003), which shows the inadequacy of BIC in high dimensional problems. 229
230
Parametric Empirical Bayes Model Selection
The BIC was essentially developed as an approximation to the Bayesian integrated likelihood when all parameters in the likelihood have been integrated out. The model that maximizes this is the posterior mode, it minimizes the Bayes risk for 0-1 loss. It is shown in Berger, Ghosh and Mukhopadhyay, (2003) that BIC is a poor approximation to this in high dimensional problems. The optimality of AIC in high dimensional prediction problems has been proved in a series of papers, e.g., Shibata (1981), Li (1987) and Shao (1997). Both the BIC and AIC are often used in problems for which they were not developed. We examine the penalties of [6] in Section 2 and make some alternative recommendations. All the model selection rules are studied in Sections 3 and 4 from the point of view of consistency under 0-1 loss. In section 5 we follow the predictive approach, using the consistency results proved earlier. For the situation where least squares estimates are used for prediction after selection of a model, we define an oracle, a sort of lower bound, in the spirit of Shibata. In the PEB framework it is easy to calculate the limit of the oracle, namely, the function B(·) and show that the Bayes prediction rule and the AIC attain this lower bound asymptotically. This is not always the case for the PEB rules, which are Bayes rules for 0-1 loss. Section 5 ends with a study of the case where Bayes (shrinkage) estimates are used instead of least squares estimates. Then the PEB rules are asymptotically optimal and can do substantially better than AIC. However, the benefit comes from the better estimates rather than more parsimonious model selection. Simulations in Section 6, for both 0-1 and squared error prediction loss, bear out the validity of asymptotic results in finite samples, they also provide useful supplementary information. Results similar to those outlined above are studied, in the Frequentist setting of Shao (1997), in Mukhopadhyay and Ghosh (2002) and for Shibata's Frequentist setting of nonparametric regression in Berger, Ghosh, and Mukhopadhyay, (2003). The assumptions, priors, results and proofs differ in the three cases. The PEB formulation of [6] provides a PEB background for the simplest as well as cleanest results of this type.
2
PEB Model Section Rules for 0-1 Loss
The problem of variable selection in nested orthogonal models can be put in the following canonical form in terms of the regression coefficients. The data consist of independent r. v's Yij, i = 1, 2, ... ,p, j = 1, 2, ... ,r. There are p models M q , 1 ::; q ::; p. Hardly any change occurs if q = 0 is also allowed. Under M q , Yij = (3i + tij, 1::; i ::; q, j = 1,2, ... ,r
N. Mukhopadhyay and J.K. Ghosh
231
= tij, q+ 1:::; i :::;p, j = 1,2,·· ·r, with til'S i.i.d. N(0,0- 2 ). For simplicity we assume 0- 2 is known. If 0- 2 is unknown the same theory applies if 0- 2 is replaced by a consistent estimate of 0- 2 . If r > 1 and p is large, then a consistent estimate of 0- 2 is available from the residuals Yij - Yi. In our asymptotics r is held fixed and p -+ 00. The sample size is n = pro Clearly, the model Mq of dimension q specifies that {3q+l,··· ,{3p are all zero. In the PEB formulation, see e.g. Morris (1983), the dimension of parameter space is reduced by assigning the parameters a prior distribution with a few unspecified (hyper-)parameters which are estimated from data and integrating out original parameters. [6] assume, as in Morris (1983), that {3I,·· . ,{3q are i.i.d. N (0, C 0- 2 / r). In our work we have used C 0- 2 , both choices have validity see our discussion in Berger and Pericchi (2001). In any case in the simulations r = 1, so that our prior is the same as that of [6]. As indicated in Morris (1983), a PEB formulation is a compromise between a classical Frequentist approach and a full Bayesian approach. In many decision theoretic examples based on real or simulated data, Efron and Morris (1973), Morris (1983) and others have shown that the PEB formulation permits borrowing of strength from estimates of similar parameters, leading to estimates that substantially improve classical estimates even in a Frequentist sense. However, this does not follow from PEB theory. The PEB theory works well, i.e. provides better estimates than classical ones in the sense of cross-validation or being closer to a known true value, when the normality (or other prior) distributional assumption is checked by comparing the expected and empirical distribution of Yi's. If Mq is true, then YI , Y2 , ... ,Yq are i.i.d. N(O, co- 2 + 0- 2 / r ). In the PEB formulation here there are two unknown remaining parameters, namely c and the true q denoted as qo. The PEB solution adopted by us is to estimate c from data and put a prior 7r(q) on q. We make one final assumption that 0- 2 = 1 which can be ensured by a suitable scale transformation. Suppose c is known and 7r(q) is a prior on q. The Bayes solution is to maximize with respect to q. The likelihood with {3I,· .. ,(3q integrated out namely,
L(q, c)
=
A 7r(q)(l
+ rc)-q / 2 exp{
rc SSq} ... 1 +rc
(1)
q
where SSq =
r2....:Y? and A doesn't depend on q or c.
Since c is not known, one
1
choice - referred to as a conditional maximum likelihood estimate of c - is to maximize the expression in (1) with respect to c, giving ~
SS
rC q = max{--q - 1, O} q
(2)
We now take 7r(q) uniform on 1 :::; q :::; p. Then the PEB Bayes rule will choose Mq if q maximizes the expression in (1) after replacing c by cq. This amounts to maximizing with respect to q,
Parametric Empirical Bayes Model Selection
232
A(q)
= A(q, cq ) = 2 log L(q, c) =
rC q ~ SSq - q log(1 1 + rCq
= SSq - q(1 + log + SSq)
+ rc q ) (3)
q
If instead of estimating c, we put a prior on tion we should maximize
C
and then use Laplace approxima-
(4) Details are given in [9]. Later we provide some evidence that a single estimate of a C across all models is preferable. A natural PEB estimate is obtained by taking 7r(q) = lip, and summing the expressions of the likelihood in (1) over 1 ::; q ::; p and then maximizing with respect to c. This estimate C1[ is referred to as the Marginal Maximum Likelihood estimate in [6]. One then gets a third penalized (log) likelihood ~ )} A1[ ( q) = S S q - q { I +~rC1[ log+ (1 + rC1[ rC1[ In this paper C1[ will also stand for any estimate which converges a.s. to c as true qo --> 00. George and Foster [6] discuss the relative advantages and disadvantages of each estimate of C and refer to unpublished work of Johnstone and Silverman (2000). The new model selection rules are to be compared with AIC which maximizes SSq - 2q/r and BIC which maximizes SSq - q{log(pr)}/r. As indicated before both these classical rules are inappropriate for high dimensional problems with 0-1 loss. The rule based on A(q) is essentially due to [6] except that, instead of our uniform prior, they choose the "binomial" prior.
(4a) where, according to [6], w is to be estimated also by maximizing (1). For a given q, it is clear that w appears only in the prior 7r(q) and not on the likelihood of the data given M q . The maximizing w, namely,
Wq
=
q/p
(5)
can hardly be called a PEB estimate in the same spirit as cq . Also for q/p bounded away from zero an one, the penalty in (log) integrated likelihood due to this 7r(q) is O(q) whereas this part of the penalty vanishes at the end-points. In other words, irrespective of the data, the models in the middle range of q are being unduly penalized. The binomial prior seems more appropriate in the all 2P subsets model selection which is much problem, where the models in the middle have cardinality (P) q bigger than the cardinality of, say q = 1 or p.
N. Mukhopadhyay and J.K. Ghosh
233
Even for all subsets model selection, there is some confounding between wand c in the following sense. The Bayesian "non-centrality" parameter is p
E(~ j3i) = pwc
(6)
1
An estimate of this can only help determine the product wc. Separate estimation of wand c will require the use of the normal likelihood in a way that is not robust. We will return to this problem elsewhere.
3
Consistency
We first consider the case where c is known, so in the PEB criteria estimates Cq , c7r are to be replaced by c. It is clear that if MqO remains fixed (as p -----> 00), then the likelihood ratio of MqO with respect to any other fixed M q, remains bounded away from zero and infinity. Hence it would be impossible to discriminate one of them from the other with error probabilities tending to zero as p -----> 00. That can happen only when Iqo - qll -----> 00 as p -----> 00. The following definition is motivated by this fact.
Definition Let qo -----> 00 as p -----> 00. A penalized likelihood criterion A(q, ¥,p) for model selection is consistent at qo if given E > 0 and for sufficiently large p and qo, there exists a k, (depending on E, p, qo, such that
Pqo{A(qo, ¥,p) > A(q, ¥,p), Vlq - qol 2 k} > 1 -
E
(7)
Of course we could take fixed qo and examine consistency from the right only. The treatment is exactly similar. Let
A(q, ¥,p) for some).. >
o.
=
SSq - q)..
Then for ql > qo and ql - qo
(8)
-----> 00
q
A(qo, ¥,p) - A(q, ¥,p) = -r ~ 9i2
+ (ql
- qo) .. = (ql - qo)().. - 1 + op) (9)
qo+l
Similarly for q < ql and qo - ql
-----> 00,
A(qo, ¥,p) - A(q, ¥,p)
=
(ql - qo)(1
+ rc -).. + op(I))
(10)
We thus have Proposition 3.1. The penalized likelihood criterion A(q, ¥,p) with constant penalty coefficient ).. is consistent at all qo -----> 00 iff 1 < ).. < 1 + rc.
Parametric Empirical Bayes Model Selection
234
For AIC, >.. show that
=
2, so one would have consistency if rc > 1. If rc < 1, one can
A(1, Y-,p) - A(q, Y-,p)
- t 00
(11)
a.s.
if q - t 00, i.e., AIC chooses MI or models not far from MI' It is shown in section 5 that this is a good thing to do, if one wants to make predictions and least squares estimates are used. The usual BIC with>" = log n is inconsistent, this extremely high penalty also leads to poor performance in prediction. A modified version due to several people, see [9] or Mukhopadhyay, Berger and Ghosh (2002) for references, has log p instead of log n. That also is not consistent in general. For consistency one requires r :2: 3 and 1 + rc-log r > O. We now turn to the three PEB rules with estimates cq or c7r . It is easy to check that the rule based on A7r (q) is consistent if c7r is a consistent estimate for c. To prove this we need to show
1 +rc 1 < --log(1 +rc) < 1 +rc rc
(12)
The right hand inequality follows from
log(1
+ rc) < rc
which is proved by the fact that the second derivative of log(1 The left hand inequality follows from
(1
+ rc)log(1 + rc) > r
(13)
+ x)
is negative.
(14)
which is proved by the fact that the second derivative of (1 + x)log(1 + x) is positive. The other two PEB criteria differ from each other by a quantity which is op(q), hence they are either both consistent or both inconsistent. Since cq has undesirable properties as an estimate of c (vide Section 4) neither of these rules is consistent in our sense. This does have some effect on their performance in prediction problems. All one can show for these two cases is that A(qo, Y-,p) - A(q, Y-,p) - t 00 if \q - qo\ - t 00 and (qO/ql) is bounded away from zero. To prove this, one has to use the behavior of cq for q > qo which is studied in the next section.
4
Estimation of c.
By the law of large numbers, for large q,
N. Mukhopadhyay and J.K. Ghosh
235
q
rL:Y? Cq
=
1
q
qac
_
1 = c (approximately), for q ::; qo
.
= - (approxImately), q > qo
(15)
q
Clearly, for large incorrect models, cq decreases the penalty for each additional parameter, namely, l+log (1 + cq ). This is counterintuitive. Plots of cq for simulated data in [9] shows that cq tends to die out for large incorrect values of q. This is the main reason why consistency became a problem for A(q, cq ). If the true qo is fixed and not large, one cannot have a consistent estimate of c. If qo
-----t 00
at a rate faster than some known
q,
then a consistent estimate is
(16) However such knowledge of q is unlikely. A plot of information about both c and true qo.
cq
provides good visual
An estimate of c, which is easy to calculate and has a nice Bayesian interpretation is the model average
(17)
where
(18)
Asymptotic behavior of c7r is difficult to study. It is unlikely to be consistent in general for the following reason. For values of q much larger than qo, cq will be much smaller than c but such q's will have large weights itq inappropriately. The net effect of this will be to pull down the average c7r away from c. Some evidence of this based on simulation is provided in [9]. We now make two rather strong assumptions which ensure consistency of a slightly modified version of c7r • AI) As p
-----t 00,
qo/p is bounded away from zero
A2) There is a known positive number k such that c::; k.
Parametric Empirical Bayes Model Selection
236
The modified version, also denoted by the same symbol, is p
Cn =
I)·q min(cq, k)
(19)
1
Under our assumptions cn ---+ c a.s. We sketch a proof. For slight simplicity, we take r = I. For q :::; qo
(20) This can be used to show for all q < qo (1 - E), 0 < E < 1, b > 0 and sufficiently small,
A(q, cq )
-
A(qo, cqo ) < (qo - ql {log(1 +
cqo ) -
c + b} + ql {log(1 +
cqo ) -log(1 + cq )} (4.1) (21)
(where '"'I > 0) with probability> I-E. We have used the fact that log (l+c) < c. We can now show as in the proof of Proposition 5.1 that
L
exp{A(q, cq )
-
A(qo, cqo )} ---+ 0
(22)
q~qo(1-t)
with probability tending to one as p ---+
00.
For q 2:: qo (23)
where by the strong law, sup Irql---+ 0 in probability. So, by concavity oflog(x), q?qo
there exists b > 0, such that for p 2:: q 2:: qo (1 + E) (24) where rq is a generic term such that sup Irql is op (1). Then for p 2:: q > qo(I+E), q
where
sup
qollrql = op(l)
q>qo(1+t)
The expression in (25) is, by (24),
b
< -q2
Once again an analogue of (22) for q > qo(1 + E) is true. So the contribution to cn from q > qo(1 + E) and q < qo(1- E) is negligible. But for Iq - qol < E, cq can be made as close to c by choice of E. This proves the consistency of cn .
N. Mukhopadhyay and J.K. Ghosh
5
237
Bayes Rule for Prediction Loss and Asymptotic Performance
It is well-known (see, e.g., Shao (1997)) that the loss in predicting unobserved Y's, for an exact replicate of the given design, on the basis of given data is the q
sum of a term not depending on the model and the squared error loss 2: (Yi -;Ji? . 1
So in evaluating performance of a model selection rule it is customary to ignore the term not involving the model and focus on the squared error loss. We do so below. For a fixed c the Bayes rule is described in the following theorem. We need to first define a quantile model. A model Mq is a posterior a-quantile model if n(i + 1 ~ qlY) ~ a < n(i ~ qlY) or equivalently. Theorem 5.1. The Bayes rule selects the smallest dimensional model ifrc and the posterior r~~cl quantile model if rc > 1
~
1
Proof Let Mq stand for the true (random) model with prior n(q) The posterior distribution of ;Ji given Mq is
rc n(;Jilq, Y) = N(--Yj, c/(1 1 + rc =
+ rc)),
i ~q
point mass at zero, i > q
Hence -
Y
2
2
C
E{(Yj-;Ji) Iq'Y}={1+rc} +1+rc' i~q -2
= Yi, Similarly, E{(;Ji - a)2Iq, Y} = {;~~~P =
i>q
+ l~rc'
i
~q
a,i > q
Suppose we ignore the fact that we have to select from among nested models (i.e., we have to include all j < i if we include i in our model) and just try to decide whether to set ;Ji non zero or zero. The posterior risks of these two decisions are
W(i excluded IY) -
=
c {n(q 1 + rc
Hence inclusion of i is preferred iff
~ ilY)} + Y?{( ~ )2n(q ~ iIY)}. 1 + rc
Parametric Empirical Bayes Model Selection
238
which implies
1 + rc 2 rc
< n(i
::; qlY)
Suppose rc > 1. Then we choose all i such that n(i ::; qlY) > ~~~~. Given the obvious monotonicity of n(i ::; qIY), this means we choose the ~~~~ posterior quantile model. Clearly this is the Bayes rule. More formally if d( q1) is the decision to choose model M q , corresponding posterior risk b
ql
w(q1IY) =
L
w(i includedlY)
L
+
w(i excludedIY)·
i=l
rc-1 : : : L Min{w(i includedlY), W(i excludedlY)} = W(-2 -quantile modellY) rc p
1
Similarly if rc ::; 1, it is easy to see that the simplest model minimizes the posterior risk among all models. This completes the proof. To define asymptotic Empirical Bayes optimality, we define an oracle, I.e., a lower bound to the performance of any selection rule. Let MqO be the true (unknown) model and d(q1) the decision to select Given y, the PEB risk of d(qd under M qO ' after division by qo, is 1
M q1 .
P
ql
A(q1) = -lLE{(Yi qo t= . 1
f3d Iqo, Y} + 2
L
E{f3ilqo, Y}]
. +1 t=ql
for q1 ::; qo 1
c
=--+ 1 + rc for q1
(1
1
-
L yqo
+ rc)2 qo.t=l
2
+
t
q1 - qo 1 qo q1 - qo .
L yq
t=qo+1
2
t
> qo·
Using the strong law of large numbers we obtain a heuristic approximation to A(q1) namely
c + q ~ + qo - q c2r2 l+rc qo(1+rc)r qo (l+rc)r' c 1 1 q - qo 1 =--+ -+---,qo
f3(qd =
which reduces to c
c 2r2
q (1 - rc)
-- + +1 + rc r(l + rc) qo
r
q ::; qo
N. Mukhopadhyay and J.K. Ghosh
239
and e 1 + re
q - qo 1
1
+ r(l + re) + ----;;;- -:;
q > qo
Clearly qo(3(·) is a non-random approximation to the posterior risk under Mqo. Note that (3(.) is minimum at qo if re > 1 and at q = 1 if re = 1, then (3(.) does not depend on q.
Let A (.) and (3(.) be defined as above. Then
Theorem 5.2.
inf A(q) lim supIA(q)-(3(q)I=O and lim .qf(3() =1. qo-+= q qo-+= In q q
Proof We consider the case q
~
qo. The other case follows similarly.
2
A(q) - (3(q) =
qo
(1:
re
1
qo - q
q
L fi2 - (1 + re)} q
) {-
2 2
c r
1
1
qo
~
-2
+ -qo- (1 + re )2 {qo-- -q L..,. Yi -
(1
+ re)}
q+l
=
Tl (q)
We show that sup ITl (q) I
--+
+ T2(q) 0 a.s. One can show the other part --+ 0 in a
q
similar way. q
By SLLN, given
E
-
> 0, we choose a A such that for q > A, I L:: Yi 2 /q- (1 +e)1 <
E.
1
Since qo > q and (1+e)2 > 1, ITl(q)1 for q ~ A can be made smaller than
~ ql. The remaining ITl(qo)1 if we choose q sufficiently large.
< E for A < q E
By repeated application of this kind of elementary argument one proves the first part of the theorem. The first part implies lim Iinf A (q) - inf (3 (q) I = 0 qo-+= q q Since, inf (3 (q) q
=e =1
if e
<1
if e
~
1
is positive, the second part of the theorem follows.
Theorem 5.3. For known e, the optimal model Mqc is asymptotically equivalent to the oracle q minimizing A(q) in terms of posterior predictive loss, i.e.,
Parametric Empirical Bayes Model Selection
240
posterior predictive loss of Mqc under qo (qo inf A(q) )
---+
1a.s.
q
as qo
---+ 00
To prove this we need the following result, which has some independent interest.
Proposition 5.1 Let qo be the true model,. As qo for any b ---+ 00 such that b = o(qo).
---+ 00,
7r(lql -qol
> bl¥)
---+
0
This is in the spirit of posterior consistency at qo except that b is not fixed but goes to infinity at a relatively slow rate. Proof of Theorem 5.4 Without loss of generality take r = 1. If c < 1, the model Mqc always chooses the simplest model. Hence its posterior risk (under qo) is qoA(qc). Since f3(q) is minimized at q = qc in this case, we are done. For c > 1, inf f3(q)
= 1.
Also by Prop 5.1,
qc qo
q
a.s.
---+
1
We consider the cases where qc ::;: qo The other case is similar. The posterior risk of Mqc for qc < qo is
which
---+
1 a.s. since
qc ---+ qo
1 a.s.
Proof of Prop. 5.1. We take r = 1 as before and let >.(c) = l~C log(l + c). It has been proved before that 1 < >.(c) < 1 + c. Using the strong law, given E > 0, there exists k > 0 such that for q > qo + k, with probability tending to one I(A(q) - A(qo))I(q - qo) - (1 - >,(c))1 < E i.e. A(q) - A(qo)
< -(q - qo)'y, for some I > o.
Hence
7r(q> qo
+ kl¥)::;:
L
t(q-qo)
where t =
e-"(
q>qo+k
=
t k / (1 - t)
One can similarly show 7r(q < qo - kl¥)
---+
---+
0
0, using >.(c) < 1 + c
Remark 5.1. Theorem 5.1. holds for unknown c if c is a consistent estimate and we use qc of the Empirica Bayes model selection rules but replacing cq by c. The same result holds for AlC also, which is interesting since AlC does not need to estimate c consistently. We prove this below. One simply notes that in Section 3 we prove that for rc > 1, AlC is consistent for qo, if qo ---+ 00. Also for rc < 1, AlC (q)- AlC (1) ---+ -00, if q ---+ 00. Using
N. Mukhopadhyay and J.K. Ghosh
241
these facts one shows, as in the proof of Theorem 5.4., Ale attains the same risk as the oracle. So far we have been looking at several Bayesian model selection rules from the point of view of prediction or squared error loss in a situation where after selection of model least squares estimates are used. Results differ in a major way if least squares estimated are replaced by the Bayes estimates E((3ilq, y) = 1~~c Yi if Mq is chosen and i -::; q. Since the proofs are similar we merely state the main facts. For a known c, the Bayes rule becomes the posterior median rule. This is a special case of a general result of Barbieri and Berger (2000) but can also be derived like Theorem 5. To define a Bayesian oracle, we redefine
A(q)
=
1
~
1"C
-
~
2
2
-[L,..,E{((3i- -I-Ii) Iqo,y}+ L,..,{((3i -0 Iqo,y}] qo i=1 + 1"C q+l q C -qo 1 + 1"C
+
(qo - q) C {-qo 1 + 1"C
1"C
+
1 + 1"C
+
~qO
-
2
.
Ii} (qio - ql) If ql -::; qo
q+l
and C
-- + 1 + 1"C
(q - qo) C -qo 1 + 1"C
+
~q
1
-
2
( - - I i ) /(ql - q) if ql > qo
qo
1 + 1"C
The heuristic nonrandom approximation is
(3(q) = !l_C_ qo 1 + 1"C and C
1 + 1"C
+ (ql
+ qo -
q{
qo
C
1 + 1"C
- qo) { C qo 1 + 1"C
+
2 2 1" C
1 + 1"C
+ _1_} 1 + 1"C
}
q -::; qo
q > qo
inf(3(qd = l~rc' attained at qo, for all c. The posterior median Bayes rule as well as the PEB model selection rules followed by Bayes estimation attains the risk of the Bayesian oracle, namely q minimizing A(q), provided C is known or a consistent estimate of C is used. The advantage of using the (shrinkage) Bayes estimates can be seen comparing the inf (3(q) for the two cases, namely l~rc for Bayes estimates and ~ for least squares estimates. For all fully Bayes rules reduce the posterior risk per component in the model by It~c which can be very large if both 1" and care small.
Parametric Empirical Bayes Model Selection
242
c=0.5,qo=50
c=0.5,qo=20 4 3 2
0.8 0.6 0.4 0.2
1
50 100150200250 c=3,qo=50
c=3,qo=20
1
50 100150200250
50 100150200250
Figure 1: Behavior of cq in a nested sequence of models.
6
Simulations and Discussion
A plot of cq against q is a good Bayesian data analytic tool that provides information about both c and the true dimension qo. This is true of all the four graphs in Figure 1 but it is specially noticeable when c if not too small. The second set of simulations describe the performance of different model selection rules for 0-1 loss. We have taken r = 1 In addition to AIC, BIC and the three PEB rules defined in section 2, we consider the Conditional Maximum Likelihood rule (CML) of [6], in which both cq and Wq are used as indicated in Section 2, even though the binomial prior seems unintuitive in the nested case. In simulation c = 0.5 or 3. Higher values of c are considered in [9], the results are very similar to those for c = 3. It is clear from Tables 1 and 2 that the BIC and CML are disastrous, as expected. AIC does well for c = 3 but badly for c =0.5, again as expected from Section 3. However, inconsistency is preferable to consistency in the prediction problem, vide the proof of Theorem 5.1 and Proposition 5.2. This is borne out by the third set of simulations.
The third set of simulations (Tables 3 and 4) describes performance of these cri teria under prediction loss. Once again, A* ( q) seems to do substantially better than A(q) and An is somewhat worse than the other two. AIC is competitive for c > 1 and dramatically better than c < 1 This is because with least squares estimates neither of the three PEB rules are asymptotically optimal if c < 1. Of course the Bayes rule qc for prediction loss would have done much better and be comparable to AIC.
N. Mukhopadhyay and J.K. Ghosh
41 5 A(q)
5
1
BIG AIG GML
8 44 270
38 310 1 3
A*(q)
A7r(q)
10 I
7 12 136 475 1 1 1 1 1 3 1 1 999
28 199 2
3 10 14 102 444 1 1 1 1 1 3 1 1 999
4 10 16 80 384 1 1 1 1 2 5 1 1 999
243
20 I 15 26 104 2 8 20 20 48 242 1 1 1 1 2 8 1 1 999
500 I
40 I 26 40 78 3 21 40 34 50 150 1 1 1 1 3
476 498 515 475 497 510 478 498 516 1 1 1 1 3 10 999 999 999
11
1 999 999
800 774 796 810 771 795 809 775 796 812 1 1 1 1 4 12 999 999 999
I
900 II 873 895 908 873 895 908 874 895 908 1 1 1
1 3 9 999 999 999
Table 1: Quartiles of the dimensions selected by different criteria for c = 0.5, r
=
II
qo
1.
10 I 4
3 A(q)
5 12
5 22 2
2 A*(q)
8
6
4 4
4
1
8 10 14 1
1 1
2
1 2
2 3
3 6
4
4 4
9 5
1
1
GML
10
5 38
1
AIG
9 5
8 64
BIG
11
4
3
A7r(q)
10
1 999
10 1
1 999
2 999
20 I 17 20 21 16 19 20 17 20 21 1 1 3 16 19 20 1 999 999
40 37 39 41 36 39 40 37 39 41 1 1 3 36 39 40 999 999 999
I
500 497 500 500 497 500 500 497 500 500 1 1 3 497 499 500 999 999 999
I
800 797 799 800 797 799 800 797 799 800 1 1 3 797 799 800 999 999 999
I
900 897 899 900 897 899 900 897 899 900 1 1 3 896 899 900 999 999 999
II
Table 2: Quartiles of the dimensions selected by different criteria for c = 3, r = 1.
Parametric Empirical Bayes Model Selection
244
II
qo A(q)
A*(q) An (q)
BIG AIG GML
227.94
5 211.53
10 205.26
20 178.71
40 138.14
500 522.19
800 818.44
900 909.33
35.77 293.53 2.63 5.18 425.15
20.66 297.17 3.05 4.91 412.92
37.06 297.28 5.5 7.86 466.09
42.47 235.44 10.54 13.68 499.04
54.76 180.23 20.69 25.82 574.25
518.25 522.89 250.74 258.63 998.05
816.34 818.57 401.06 409.8 1000.51
908.1 909.44 450.62 457.95 998.57
4
Table 3: Prediction loss of the models selected by different criteria for c r = 1.
II
qo A(q) A*(q) An(q)
BIG AIG GML
4
94.92 14.36 146.83 6.85 6.56 331.95
5 113.44 19.09 145.46 9.51 7.09 371.34
10 39.42 15.6 53.61 21.74 13.13 446.24
20 31.23 24.45 31.21 50.36 23.23 635.56
I
40 44.6 44.22 44.6 108.76 43.62 847.6
500 503.71 503.65 503.71 1489.84 503.66 998.85
II
= 0.5,
800 804.03 804.08 804.03 2392.47 804.32 998.6
Table 4: Prediction loss of the models selected by different criteria for c r = 1.
I
900 904.4 904.36 904.35 2693.92 904.23 999.55
= 3,
We have not done any simulations on the posterior median Bayes rule, which uses PEB shrinkage Bayes estimates. It is expected to outperform AIC as seen from the comparison of j3(.), s for model selection followed by least squares and model selection followed by Bayes estimates. The three PEB criteria of Section 2, followed by Bayes estimates, are expected to do much better than evident in Tables 3 and 4 but not as well as the posterior median rule. It may be worth pointing out that there is a basic difference between the median Bayes rule and AIC. Whether c > 1 or < 1, the median Bayes rule is consistent at qo- a proof can be constructed using Proposition 5.1 But it then shrinks the estimates towards zero appropriately, depending on values of c. AIC doesn't have this option, it uses least squares estimates. So for critically small values of c, namely c < 1, it has to choose a much lower dimensional model to have some sort of shrinkage.
Bibliography [1]
Akaike, H. (1973) Information Theory and an Extension of the Maximum Likelihood Principle. In B. N. Petrov and F. Czaki, editors, Proceedings of the Second International Symposium on Information Theory, 267-271 Budapesk: Akad. Kiado.
[2]
Barbieri, M. and Berger, J. (2000) Optimal Predictive Model Selection, ISDS Discussion Paper, Duke University.
II
N. Mukhopadhyay and J.K. Ghosh
245
[3J
Berger, J.O., Ghosh J. K., and Mukhopadhyay, N. (2003) Approximations and consistency of Bayes factors as model dimension grows, Journal of Statistical Planning and inference, [112], 241-258.
[4J
Berger, J. O. and Pericchi, L. R. (2001) Objective Bayesian Methods for Model Selection: Introduction and Comparison, IMS Lecture Notes, (P. Lahiri editor) 38, 135-203.
[5J
Efron, B. and Morris, C. (1973) Stein's Estimation Rule and its Competitors an Empirical Bayes Approach, Journal of the American Statistical Association, 68, 117-130.
[6J
George, E. I. and Foster, D. F. (2000) Calibration and Empirical Bayes Variable Selection, Biometrika, 87, 731-747.
[7J
Li, K-C (1987) Asymptotic Optimality of cP ' Cl, cross Validation and Generalized cross Validation: Discrete Index Set, Annals of Statistics, 15, 958975.
[8J
Morris, C. (1983) Parametric Empirical Bayes Inference, Journal of the American Statistical Association, 78, 47-55.
[9J
Mukhopadhyay, N. (2000) Bayesian Model Selection for High Dimensional Models with Prediction Loss and 0-1 loss, thesis submitted to Purdue University.
[10J Mukhopdhyay, N. and Ghosh, J.K. (2002) Bayes Rules for Prediction Loss
and AIC, (submitted). [l1J Rissanen, J. (1983), A Universal Prior for Integers and Estimation by Minimum Description Length, Annals of Statistics, 11, 416-431. [12J Schwartz, G. (1978) Estimating the Dimension of a Model, The Annals of Statistics, 6, 461-464. [13J Shao, J. (1997) An Asymptotic Theory for Linear Model Selection, Statistica Sinica, 7, 221-264. [14J Shibata, R. (1981) An Optimal Selection of Regression Variables, Biometrika, 68, 45-54. [15J Shibata, R. (1983) Asymptotic Mean Efficiency of a Selection of Regression Variables, Annals of the Institute of Statistical Mathematics, 35, 415-423.
246
Parametric Empirical Bayes Model Selection
A Theorem of Large Deviations for the Equilibrium Prices in Random Exchange Economies Esa Nummel in University of Helsinki
Abstract We formulate and prove a theorem concerning the large deviations of equilibrium prices in large random exchange economies.
1
Introduction
We consider an economic system (shortly, economy) E, where certain commodities j = 1, ... , l are traded. Let R~ =def {p = (pI, ... ,pl) E Rl; pJ ~ 0 for all j = 1, ... , l}. The elements p of R~ are interpreted as price vectors (shortly, prices). (We will follow a convention, according to which superscripts always refer to the commodities whereas subscripts refer to the economic agents.) The total excess demand function Z (p) = (Zl (p), ... , Zl (p)) E Rl comprises the total excess demands on the l commodities in the economy at the prices p E R~. Its zeros p* are called the equilibrium prices: Z(p*) = O.
(In fact, according to Walras' law, we may regard money as an l+ 1 'st commodity [the numeraire] having price pl+l = 1 and total excess demand Zl+l(p) = -p. Z(p).) In the classical equilibrium theory the economic variables and quantities are supposed to be deterministic, see [2]. It is, however, realistic to allow uncertainty in an economic model. We assume throughout this paper that the total excess demand Z (p) is a random variable (for each fixed price p). In particular, it then follows that the equilibrium prices p* form a random set.
The seminal works concerning equilibria of random economies are due to Hildenbrand [5], Bhattacharya and Majumdar [lJ and Follmer [4J. The equilibrium prices in large random economic systems obey (under appropriate regularity conditions) classical statistical limit laws. The law of large numbers [lJ states that, as the number n of economic agents increases, the random equilibrium prices (r.e.p.'s) p~ become asymptotically equal to deterministic "expected" equilibrium prices: · Pn* 11m
n-+CXJ
= Pe'*
247
Random Exchange Economies
248
(The subscript n refers to the number of economic agents.) The central limit theorem (CLT) for the r.e.p. 's [lJ characterizes the "small deviations" of the r.e.p.'s from their expected values as asymptotically normal:
n ~ (p~ - p:)
----+
N in distribution,
where N denotes a multinormal random vector having mean zero. We argue in this article for the relevance of the theory of large deviations to random equilibrium theory. To this end, suppose that, an aposteriori observation of the equilibrium price is made, and let p denote the value of this observation. If the modeler is concerned with the estimation of the apriori probability of an aposteriori observation p of the equilibrium price in a large economy, the use of the CLT requires the apriori model to be "good" in the sense that the observation p ought to fall within a narrow range (having the asymptotically negligible order n - ~ = o( n)) from its expected value p;.
However, due to the fact that economics is concerned with the (economic) behaviour of human beings, any (predictive) economic model is always to some extent defective. It follows, in particular, that in a large economy an observed equilibrium price p may well represent a "large deviation" from its apriori predicted value p; (viz. fall outside the region of validity of the CLT). The main result of this paper is a theorem of large deviations (LD's) for the random equilibrium prices. It yields an exponential estimate for the (apriori small) probabilities of observations of r.e.p.'s "far away" from their expected values. Namely, we prove that, under appropriate regularity conditions, for an arbitrary fixed price p, there exists a constant i(p) 2: 0 such that
(1.1 )
In accordance with standard LD terminology (see [3]), we refer to the price depending constant i(p) as the entropy. In what follows we shall formulate and prove (1.1) as an exact mathematical theorem. LD theorems for random equilibrium prices were earlier presented in [7],[8J. The version here is of "local type" in that we are concerned with probabilities of observations of r.e.p.'s in small neighborhoods of a given fixed price. Because of this it turns out that the hypotheses of [7],[8J can be somewhat relaxed. Also it becomes possible to give a self-contained proof which does not lean on the general abstract LD theory. Therefore the proof ought to be accessible also to a reader who is not an LD specialist. The basic idea in the proof is to use a centering argument of a type which is commonly used in LD theory.
Esa N ummelin
2
249
Formulation of the LD theorem
We describe now the basic set-up and formulate the large deviation theorem in exact terms. We will be concerned with a sequence En, n = 1,2, ... , of economies. We assume that in the economy En there are N n economic agents labeled as i = 1, ... , N n . We assume that N n is of the order O(n); namely,
Nn
::;
An for some constant A <
00.
(2.1)
Let (0, P, F) be a probability space. We consider a double sequence of R1-valued maps (in: X R~ -----t Rl, n = 1,2, ... , i = 1, ... , N n , such that, for each fixed n, i and p, the function
°
is a random variable (viz. F-measurable). (in(P) is interpreted as the (random) individual excess demand by the i'th agent in En at the price p. Example 2.1. In a Cobb-Douglas exchange economy the individual excess demand by an agent i E En on commodity j is given by the formula I ;-j (
) _
"in p -
(,,-J)-1 jJ aj
in
"" k k ~p ein
- e jin ,
k=l
where the parameters a{n 2: 0 satisfy I
L a{n = 1 for all i
and n,
j=l
and e{n denotes the agent's initial endowment on the commodity j, see e.g. [1 OJ. In a random Cobb-Douglas exchange economy the parameters a;n and e{n are supposed to be random variables. The random total excess demand in the economy En is obtained as the sum of the random individual excess demands: Nn
Zn(P) =
L (in(P)· i=l
(In order to indicate its dependence on the size parameter n, we equip henceforth the total excess demand with the subscript n.) For a fixed economy En and for a fixed realization wE 0, a price p~(w) at which the total excess demand function vanishes, i.e., such that Zn(w;p~(w)) = 0, is called an equilibrium price for the realization w in the economy En. We denote by 7r~ (w) the set of equilibrium prices p~ for the realization w in the economy En·
Random Exchange Economies
250
Let
Cn(o:;p)
log Eea,zn(p), 0: E Rl,
=
denote the cumulant generating function (c.g.f.) of the random total excess demand Zn (p), P E R~, and let
e(o:;p) = limsupn-1Cn(0:;p). n--+oo
We denote
i(p)
=
-
inf e(o:;p) aERI
and call it the entropy (associated with the price p). Note that, due to the fact that c(O;p) == 0 it follows that i(p) ;::: 0 always. Recall that a c.g.f. is always a convex function. Consequently, Cn(o:;p) as well as the limit e( 0:; p) are convex functions (of the variable 0:). Thus in particular, if
Be
80: (o:(p); p) = 0 for some o:(p) E Rl, cf. the hypothesis (HI),
(2.2)
then it follows that
(2.3)
i(p) = -e(o:(p);p). The zeros pnces:
P:
of the entropy function i(p) will be called expected equilibrium
Under appropriate regularity conditions these are the same as the zeros of the mean excess demand function J-L(p) , defined by
Proposition 2.1. Suppose that
(2.4) there is a unique o:(p) such that ~~ (o:(p);p) = 0, and (2.5) e( 0:; p), 0: E Rl is differentiable at 0: = O.
Then (2.6) J-L(p)
=
~~ (O;p), and
(2.7) i(p) = 0 if and only if J-L(p) = O.
Proof of Proposition 2.1 That (2.5) implies (2.6) is a standard fact in LD theory (see e.g. [3]). In order to prove (2.7) assume first that i(p)
= 0,
i.e.,
c(o:(p);p) = min C(O:iP) = O. aERI
Esa Nummelin
Since c(O;p)
251
= 0, it follows from the uniqueness of a(p) that a(p) = O. p,(p)
Bc Ba(O;P)
=
O.
Bc p,(p) = Ba(O;P) =
o.
=
Therefore
Suppose conversely that
Again, due to uniqueness, a(p)
= 0 so
that
i(p) = -c(a(p);p) = -c(O;p) = 0,
o
indeed.
Example 2.2. Suppose that N n == nand (in(P) = (i(P) for i = 1, ... , n, where (i(P), i = 1,2, ... , is a sequence of i.i.d. random variables (fOT each fixed price p). In this case Cn(a;p) - nc(a;p), (2.8)
and therefore c(a;p)
=
log EeO:-(l (p)
is equal to the c.g.f. of the individual excess demand (l(P). Moreover, due to the classical LLN for i. i. d. random variables, the mean excess demand is equal to the expectation of the individual excess demand:
Let us now fix a price p E R~. We formulate the following set of hypotheses. (The abbreviation "w.p.l" means the same as "with probability 1", and the phrase" eventually" means "for all sufficiently big n" .) (HI) ::Ja = a(p) E Rl: g~ (a(p);p) = 0;
(H2) c(a(p);p) = lim n-1Cn(a(p);p); n---+CX)
(H3) ::JA1(p) <
00,
cl(P) > 0 : I(In(q)1 < Al(P) w.p.l, for all i and n, for
Iq - pi < cl(P);
< 00, c2(P) > 0 : Iq - pi < c2(P);
(H4) ::JA2(P)
(H5) ::JA-1(p)
< 00:
1(:~(q)1
::; A2(P) w.p.l, for all i and n, for
l(n-lZ~(p))-ll::; A-l(P) w.p.1., for all n.
Remarks. (i) Condition (H4) implies condition (H3).
Random Exchange Economies
252
(ii) Suppose that (in(P) = (i (p) , where (i(P), i = 1,2, ... , are i.i.d. as before. Now, due to (2.8), the hypothesis (H2) is trivially true. Also it turns out that in this case hypothesis (H5) can be replaced by the simpler hypothesis (H5 ') ,i (p) is non-singular, see [9].
(i) Suppose that the hypotheses (H1-3) hold true. Then there exists a constant Mo (p) < 00 such that
Theorem 2.1.
P(7r~
n U(p, c)
eventually, for all 0 < c <
0) < e-n(i(p)-Mo(p)E)
-=1=
Cl (p).
(ii) Suppose that the hypotheses (H1-2,4-5) hold true. constant Ml (p) < 00 such that P(7r~
eventually, for all c >
n U(p, c)
-=1=
Then there exists a
0) > e- n (i(p)+M 1 (p)E)
o.
Let us call a price p E R~ non-expected, if the entropy i(p) > o. Under the conditions (2.4-5) this is equivalent to p not being a zero of the mean excess demand fl(P):
fl(P)
-=1=
o.
By using Borel-Cantelli lemma we obtain the following corollary of part (i) of the LD theorem: Corollary 2.1. Suppose that the hypotheses (H1-3) hold true. Let p E R~ be a
non-expected price. Then 7r~
3
n U(p, c) = 0 eventually,
w.p.1, for all 0 < c < cl(P)·
Proof of the LD theorem
For the proof of the upper bound (i) we need two lemmas. standard type in LD theory.
The first is of
We define the following sequence of probability measures:
Pn;p(dw) = ea(p).zn(w;p)-Cn(a(p);p) P(dw), n = 1,2, .... Lemma 3.1. Suppose that hypotheses (H1-2) hold true. Then for each <5 > 0, there exists a constant 'T] = 'T]( <5; p) > 0 such that
Esa Nummelin
253
Proof of Lemma 3.1. Let t > 0 be arbitrary. By Chebyshev's inequality we have for the j'th component of the total excess demand:
p n,p . (zjn (p) > . etZ~(p) _ n&) < _ e- tn8 E n,p
where
ej
denotes the j'th unit vector in RI. Due to (HI) and (H2),
n-tCX)
= &(t)t where &(t)
---+
0 as t
---+
M
O. By choosing t small enough we thus see that
limsupn-llogPn;p(Z~(p) ~ n&)
< O.
n-tCX)
By symmetry, we have also limsupn-llogPn;p(Z~(p) ::::; -n&)
< 0,
n-tcx)
o
which completes the proof of Lemma 1.
Lemma 3.2. Suppose that the hypotheses (Hl-2) hold true. Then, for all &> 0, we have:
e- n (i(p)+2I a (p)18) < P(IZn(p)1 < n&) < e- n (i(p)-2I a (p)18) eventually.
Proof of Lemma 3.2. Recalling (2.3) we see that it suffices to prove that lim sup In- 1 logP(IZn(P)1
< n&) -c(o:(p);p)l::::; lo:(p)W
(3.1)
n-tCX)
Due to Lemma 1,
~ < 1- e- n'T)(8;p) < Pn;p(IZn(P) I < n&)
::::; 1 eventually,
and hence, in view of the definition of the probability measure Pn;p(}
Now clearly,
whence -log2 -lo:(p)ln& < logP(IZn(p)1 < n&) - Cn(o:(p);p)::::; lo:(p)ln& eventually, from which the claim (3.1) follows by letting n
---+ 00.
o
Random Exchange Economies
254
Now we are able to prove the upper bound inequality (i). To this end, note first that, due to the hypotheses (2.1), (H3) and the mean value theorem, we can conclude that the event 7r~
n U(p, E) =I- 0
implies the event
IZn(P) I ~ AA1(p)nE w.p.l, for all n;:::: 1, 0 < E < E1(P)· Thus, in view of Lemma 2, P(7r~
n U(p, E) =I- 0)
~ P(IZn(p)1
< AA1(p)nE) < e-n(i(p)-Mo(p)e:) eventually,
where the constant Mo(p) = 2AA1(p)la(p)l. For the lower bound we need the following lemma which is a straightforward corollary of Theorem XIV in [6].
Lemma 3.3. Suppose that f : R~ E-neighborhood of the price p:
---t
If"(q)1 ~ M <
Rl has bounded second derivative in an
00
for Iq - pi < E.
Moreover, suppose that the derivative f' (p)
E
Rl x I is non-singular, and
. E I} If '( p) -1 1< mm{2If(p)I' 4ME .
Then f(q) = 0 for some Iq - pi < E.
Proof of Lemma 3.3Let g(h) = j'(p)-l(J(p + h) - f(p)),
Ihl < E.
Then g(O) = 0, g'(O) = I (= the identity), and
It follows that
Let
z
~
- j'(p)-l f(p).
Then Izl = If'(p)-lllf(p)1 < ~ and hence by setting s = ~ in [L: Lemma XIV.1.3] we can conclude that there exists a unique Ihl < E satisfying g(h) = z, viz. f(p + h) = O.
Esa N ummelin
255
Now we are able to prove the lower bound inequality (ii). To this end, let
in Lemma 3. Due to (H4) and (H5), we have
and
Note that, by monotonicity, it suffices to prove the assertion for small only. Thus we may assume that
E < min {E2(P),
4A (P;A 2
-l(p)
E
> 0
},
where E2(P) is as in (H4). Now, in view of Lemma 3 it follows that, if
then n- 1 Zn(q) = 0 for some
Iq - pi < E,
VIZ.
1f~
n U(p, E)
=1=
0.
Finally, by Lemma 2
P(1f~ n U(p, E)
=1=
0) 2': P(ln- 1 Zn(P) I < A
> e-n(i(p)+Mt(p)c:)
E ( ))
P eventually, 2
-1
where the constant
This completes the proof of the theorem.
Acknowledgements I would like to thank Professor Krishna B. Athreya for the invitation to take part in this Festschrift in Honor of Professor Rabi Bhattacharya. I am indebted to Professor Mukul Majumdar for useful comments on the text.
Random Exchange Economies
256
Bibliography [1]
Bhattacharya, R.N. and Majumdar, M.: Random exchange economies. J. Economic Theory 6, 37-67 (1973).
[2]
Debreu, G.: Theory of Value. Wiley, 1959.
[3]
Dembo, A. and Zeitouni, 0.: Large Deviations and Applications. Jones & Bartlett, Boston, 1993.
[4]
Follmer, H.: Random economies with many interacting agents. J. Math. Economics 1, 52-62 (1974).
[5]
Hildenbrand, W.: Random preferences and equilibrium analysis. J. Economic Theory 3, 414-429 (1971).
[6]
Lang, S.: Real and Functional Analysis. Springer, New York, 1993.
[7]
Nummelin, E.: On the existence and convergence of price equilibria for random economies. The Annals of Applied Probability 10, 268-282 (2000).
[8]
Nummelin, E.: Large deviations of random vector fields with applications to economics. Advances in Applied Math. 24, 222-259 (2000).
[9]
Nummelin, E.: Manuscript, under preparation, 2003.
[10] Varian, H.: Microeconomic Analysis. Norton, New York, 1992.
Asymptotic estimation theory of change-point problems for time series regression models and its applications Takayuki Shiohama, Masanobu Taniguchi Osaka University, Japan
and Madan L. Puri Indiana University, USA Abstract It is important to detect the structural change in the trend of time series model. This paper addresses the problem of estimating change point in the trend of time series regression models with circular ARMA residuals. First we show the asymptotics of the likelihood ratio between contiguous hypotheses. Next we construct the maximum likelihood estimator (MLE) and Bayes estimator (BE) for unknown parameters including change point. Then it is shown that the proposed BE is asymptotically efficient, and that MLE is not so generally. Numerical studies and the applications are also given.
AMS subject classifications: 62MlO, 62M15, 62N99 Keywords: Change point, time series regression, asymptotic efficiency, Bayes estimator, maximum likelihood estimator.
1
Introduction
The change point problem for serially correlated data has been extensively studied in the literature. References on various time series models with change-point can be found in the book of Csorgo and Horvath (1997) and the review paper of Kokoszka and Leipus (2000). Focusing on a change point in the mean of linear process, Bai (1994) derived the limiting distribution of a consistent change-point estimator by least squares method. Later Kokoszka and Leipus (1998) studied the consistency of CUSUM type estimators of mean shift for dependent observations. Their results include long-memory processes. For a spectral parameter change in Gaussian stationary process, Picard (1985) addressed the problem of testing and estimation. Giraitis and Leipus (1990,1992) generalized Picard's results to the case when the process concerned is possibly non-Gaussian. For a structural change in regression model, a number of authors studied the testing and estimation of change point. It is important to detect the structural change in economic time series because parameter instability is common in this field. For testing structural changes in regression models with longmemory errors, Hidalgo and Robinson (1996) explored a testing procedure with 257
Asymptotic estimation theory
258
nonstochastic and stochastic regressors. Asymptotic properties of change-point estimator in linear regression models were obtained by Bai(1998), where the error process may include dependent and heteroskedastic observations. Despite the large body of literature on estimating unknown change-point in time series models, the asymptotic efficiency has been rarely discussed. For the case of independent and identically distributed observations, Ritov (1990) obtained an asymptotically efficient estimator of change point in distribution by a Bayesian approach. Also the asymptotic efficiency of Bayes estimator for change-point was studied by Kutoyants (1994) for diffusion-type process. Dabye and Kutoyants (2001) showed consistency for change-point in a Poisson process when the model was misspecified. The present paper develops the asymptotic theory of estimating unknown parameters in time series regression models with circular ARMA residuals. The model and the assumptions imposed are explained in Section 2. Also Section 2 discusses the fundamental asymptotics for the likelihood ratio process between contiguous hypotheses. Section 3 provides the asymptotics of the maximum likelihood estimator (MLE) and Bayes estimator (BE) for unknown parameters including change-point. Then it is shown that the BE is asymptotically efficient, and that the MLE is not so generally. Some numerical examples by simulations are given in Section 4. Section 5 is devoted to the investigation of some real time series data. All the proofs are collected in Section 6. Throughout this paper we use the following notations. A' denotes the transpose of a vector or matrix A and X(,) is the indicator function.
2
Asymptotics of likelihood ratio and some lemmas
Consider the following linear regression model Yt
where
= {o'x(t/n ::; T) + (3'x(t/n > T)}Zt + Ut, = Tt(O, (3, T) + Ut, (say), t = 1, ... , n
Zt = (Ztl, ... , Ztq)'
are observable regressors,
0
=
(Q1, ... ,
(2.1)
Qq)'
and (3 =
(/31, ... , /3q)' are unknown parameter vectors, and {ud is a Gaussian circular ARMA process with spectral density f()..) and E(ut) = O. Here T is an unknown change-point satisfying 0 < T < 1 and (0', (3', T) E E> c IRq x IRq x lR. Letting n-h
2:=
Zt+h,jZtk,
h
= 0,1, ...
t=l
n
2:=
ZHh,jZtk,
h = 0, -1, ... ,
t=l-h
we will make the following assumptions on the regressors {zd, which are a sort
Takayuki Shiohama, Masanobu Taniguchi, Madan L. Puri
259
of Grenander's conditions.
Assumption 2.1. l+p
(G.1) aii(O) = O(n),
i
= 1, ... , q,
and
L ZZi = O(p) for any (1 ~ l ~ n). t=l
(G.2) limn----;oo z~+l,daii(O)
= 0,
i = 1, ... , q.
(G.3) The limit lim
an (h)
_~_J_
n----;oo
n
= Pi ·(h) J
exists for every i, j = 1, ... ,q and h = 0, ±1, .... Let R(h) = {pij(h); i, j = 1, ... ,q}. (G.4) R(O) is nonsingular. From (G.3) there exists a Hermitian matrix function M(A) = {Mij(A); i,j = 1, ... ,q} with positive semidefinite increments such that (2.2)
Suppose that the stretch of series from model (1) Y n = (Yl,··· ,Yn)' is available. Denote the covariance matrix of Un = (Ul,···, un)' by 2: n , and let tn = (rl,··· ,rn)' with r t = r t ( a,{3, T). Then the likelihood function based on Y n is given by
Since we assume that {Ut} is a circular ARMA process, it is seen that the following representation
2: n =
U~diag{21T f(Ad,·
where Un = {n- 1 / 2 exp(21Tits/n); t, s son (1977)). Write
~n
has
.. ,21T f(An)} Un
= 1, ... ,n} and Ak = 21Tk/n (see Ander-
Then the likelihood function (2.3) is rewritten as
Define the local sequence for the parameters:
a n =a+n- 1 / 2 a,
f3n=f3+n- 1 / 2 b,
Tn=T+n-1p
(2.5)
Asymptotic estimation theory
260
where a, bE IRq and pER Under the local sequence (2.5) the likelihood ratio process is represented as
where dn(>'k)
= (27fn)-1/2 L~=1 UteitAk and A(>'k) = A1 + A2 + A3 with [Tn+p] Al
= (27f!(Ak))-1/2
L
((3 - o:)'zse-iSAk,
s=[Tn]+1 [m+p] A2 = -(27fn!(Ak))-1/2 a'zse-iSAk
L
8=1
and
s=[Tn+p]+1 Here note that dn(Ak), k = 1,2, ... are i.i.d. complex normal random variables with mean 0 and variance !(Ak) (c.f. Anderson (1977)). Henceforth we write the spectral representation of Ut by
Ut
=
i:
eitAdZu(A).
(2.7)
The asymptotic distribution of Zn (a, b, p) is given as follows.
Theorem 2.1. Suppose that Assumption 2.1 holds. Then for all (0:', (3', T) E the log-likelihood ratio has the asymptotic representation log Zn(a, b, p)
= ((3 - o:)'Wl 1 - 87f2
+ yTa'W2 + ~b'W3 [Tn+p]
<Xl
L
f(j)
j=-<Xl
=
log Z(a, b, p)
where
L
((3 - 0:)' zs+jZ~((3 - 0:)
s=[Tn]+1
+ op(l),
(say),
e,
Takayuki Shiohama, Masanobu Taniguchi, Madan L. Puri
261
and
Here WI, W 2 and W3 are asymptotically normal with mean 0 and covariance matrix VI, V2 and V3 , respectively, where
Next we present some fundamental lemmas which are useful in the estimation of change point.
Lemma 2.1. Suppose that Assumption 2.1 holds. Then for any compact set C C 8, we have sup Ea,(3,TZ~/2(a, b, p) ::; exp{ -g(a, b, pH a,{3,TEC
where g(a, b, p)
= (ai, b')K
(~) + cipi
with some positive definite matrix K and c >
o.
Lemma 2.2. Suppose that Assumption 2.1 holds. Then for any compact set C C 8, there exist Ii(C) = Ii, B(C) = B such that sup [iial - a211 2 + Ilb l (a,{3,T)EC!ai I
3
E a ,{3,T
[Z~/4( a2, b2, P2) - Z~/4(al' bl, PI)
-
r: ;
r
b2112 + IPI - P21 2 B(l
I
+ H"").
Estimation theory
We are interested in the behavior of maximum likelihood estimator (MLE) and Bayes estimator (BE). To introduce these estimators, we need a loss function w(y), y E ]Rd which is 1. nonnegative, continuous at point 0 and w(O)
= 0, but is not identically 0;
2. symmetric: w(y) = w( -y); 3. the sets {y : w(y) < c} are convex for all c > O.
Asymptotic estimation theory
262
We denote by W p the class of loss functions satisfying 1-3 with polynomial majorants. The example of such function is w(y) = lylP,p > O.
A'
=
The MLE 0ML
,A'
(Q ML , (3ML,TMd of 0
" = (0.', (3 ,T)
=
L(QML' i3ML,TML)
is defined by (3.1 )
L(o, (3, T)
max (a,{J,T)Ee
-,
-,
The Bayes estimator 0 E = (a~, (3 E, PE) with respect to the quadratic loss function l(x) = IIxl1 2 and a prior density 7rC) is of the form
OE
=
r Op(OIYn)dO
(3.2)
Je
where
=
p(OIYn )
7r(O)Ln(O)
Je 7r(v)Ln(v)dv
.
We suppose that the prior density is a bounded, positive and continuous function possessing a polynomial majorant on e. For Z (u), u = (a', b' , p)', in Theorem 1, define two random vectors
u'
A'
(ii', b , p) and iL'
=
Z(u) =
_,
= (ii', b
,p) by relations
Z(u),
sup
(3.3)
UElF!.2q+l
_
U=
flF!.2d 1
uZ(u)du
(3.4)
~----------
flF!.2Q+l
Z (v )dv
Theorem 3.1. Let the parameter set e be an open subset of jR2q+l. Then the MLE is uniformly on (0., (3, T) E e, consistent P -
lim OM L
n--+CXl
=0
and converges in distribution
£e( diag{ vITi, ...
,vITi, n})( 9M L
For any continuous loss function w E W
lim EOw((diag{ vITi,,,,
p,
-
O)} -----t £( u). d
we have
,vITi, n} )(9 ML -
n--+CXl
0)) = Ew(u).
A similar theorem for Bayes estimators can be stated as follows. Theorem 3.2. The Bayes estimator OE, uniformly on 0 E
e,
is consistent
Pe - lim OE = 0 n--+CXl
and converges in distribution
£e( diag{ vITi, ...
,vITi, n})(O E
-
O)}
-----t
d
£( iL).
For any continuous loss function w E W P' we have
lim EOw((diag{ vITi,,,,
n--+CXl
,vITi, n} )(OE -
0)) = Ew(iL).
Remark. From Theorem 3 and Theorem 1.9.1 oflbragimov and Has'minski(1981), we can see that the BE is asymptotically efficient such that
Ellul1 22: ElliLl12.
Takayuki Shiohama, Masanobu Taniguchi, Madan L. Puri
4
263
Numerical examples
In this section we report some Monte Carlo results for the MLE and BE of an unknown change point. We consider the following time series regression model:
_{a'{3
Yt -
,
+ Ut, t = 1, ... , [Tn] Zt + Ut, t = [Tn] + 1 ... ,n, Zt
(4.1)
where {ud is a Gaussian AR(I) process generated by Ut
=
~Ut-l
+ Ct,
Ct "-'
i.i.d.N(O, (}2).
To verify the theoretical results and for comparative purposes, we deal with the following regressors; Model (I) : Zt = 1 (scalar-valued), Model (II): Zt = cos(vt) (scalar-valued), Model (III): Zt = (1, cos(vt))'. For simplicity, we assume that the parameters a, (3, ~ and () are known and focus on the estimation of unknown change point T. The error term Ct'S are same across different combinations of parameters and models. The coefficients (a,{3) are taken to be (0,2), (1,3) and ((0,1),,(2,3),) for the corresponding Models (I), (II) and (III), respectively, and v = 7r/6. The MLE and BE with uniform prior of T are given by
k = inf{k:
max {Ln(i/n)}
l:::=;t
= Ln(k/n)},
and Ti
= i/n,
i
= 1, ... ,n - 1
respectively. Then we compute the mean and the square root of the mean square error (RMSE) for TM Land TB based on 100 replications. Table 4.1 summarizes the simulation results for ~ = 0.7,0.9 and n = 100,300. The change point T is fixed to be 0.5. A closer inspection of Table 1 reveals some interesting characteristics. First, we notice that, in each case, the RMSE of BE is smaller than that of MLE, however mean estimates are almost same for all cases. A change in a cosine trend function seems to increase the bias of a change point estimators, while for n = 300, the mean estimates lie in the vicinity of 0.5. The effect of large value of ~ (near unit root) for MLE is particularly significant for Model (I) in view of RMSE. The histogram of these results are plotted in Figures 4.1 and 4.2 for ~ = 0.7 and ~ = 0.9, respectively, when n = 100. A study of these figures facilitates understanding of the simulation results in Table 4.1. It is obvious that the shape of distributions for MLE and BE is different when ~ = 0.9. The former has a fatter tail in general, while the latter has high frequencies around 0.5. For Models (II) and (III), the distributions of MLE and BE are skewed to the right,
264
Asymptotic estimation theory
which causes an increase in bias of an estimator. These facts are verified by comparing the sample coefficient of skewness and the sample kurtosis which are listed in Figures 4.2 and 4.3 together. It is questioned how large the RMES becomes for different values of ~ and the cases when the change point locates the edge of samples. A perspective view of the result given in Table 1 for the RMSE of Model (I) is shown in Figures 4.3 over a grid of points T = 0.1, ... ,0.9 and ~ = 0.1, ... ,0.9. According to this figure, as it is expected, we observe that the RMSE increases as ~ increases. However it seems that the RMSE's are stable and unaffected by T even though T is close to 0.1 and 0.9 when ~ is from 0.1 to 0.7. The discrepancy of RMSE between MLE and BE is significantly large for ~ = 0.9 and T = 0.5. As it can be seen from this figure that the BE works better than MLE in terms of RMSE in all cases. Next, we investigate the effect of the selection of frequency v in Model (II). The autoregressive parameter ~ is fixed at 0.7. Table 4.2 presents the results . We observe that the precision of the change point estimates deteriorates when v is close to 0 when n = 100. While the consistency is convincing for large n, the RSEM of MLE and BE becomes large as the frequency v tends to O. We summarize the simulation results as follows. First, the performance of BE is better than MLE in terms of RMSE, which is consistent with the theoretical result given in the previous section. Even though we assumed that the parameters except for change point are known, it is expected that similar characteristics will be observed for the cases of unknown parameters. To see these, we will report some real data analysis in next section.
Takayuki Shiohama, Masanobu Taniguchi, Madan L. Puri
265
Model (I) 1il
iil
_.1..__
~
!il
0,0
0,2
0,4
0,6
0,8
0
'" g 1,0
0,0
0.2
--L 0.4
0,6
0,8
1.0
0,8
1.0
0,8
1.0
(b) BE
(a) MLE
Model (II) iil
iil
(')
0
g
g
g 0
0,0
0,2
0,4
0,6
0,8
1,0
0.0
0,2
(c) MLE
Model
0,4
0,6
(d) BE
(III)
iil
iil
0
g
g
g
'"
0
0
0.0
0,2
0,4
0,6
(e) MLE
0,8
1,0
0,0
0,2
0.4
0,6
(f) BE
Figure 4.1. Histograms for the results of Table 1 for ~ = 0.7 and n = 100. The sample coefficient of skewness PI and the sample kurtosis P2 are: (a) PI = 0.70, P2 = 7.12; (b) PI = -0.01, P2 = 4.68; (c) PI = 0.12, P2 = 4.74; (d) PI = 0.42,P2 = 3.56; (e) PI = 0.18,p2 = 5.55; (f) PI = 0.77,P2 = 5.10.
Asymptotic estimation theory
266 Table 4.1
Average estimates and RMSE of
T
when
T
0.5 RMSE
Mean
n = 100
n = 300
n = 100
n = 300
MLE
BE
MLE
BE
MLE
BE
MLE
BE
0.4955
0.4893
0.5032
0.4983
0.1121
0.0858
0.0497
0.0422
0.4726
0.4924
0.4998
0.5144
0.1981
0.1121
0.1840
0.1220
0.5197
0.5207
0.5000
0.4978
0.1187
0.0854
0.0394
0.0336
0.5081
0.5091
0.4984
0.4975
0.1348
0.1058
0.0425
0.0350
0.5311
0.5313
0.4932
0.4940
0.1100
0.0916
0.0337
0.0282
0.5314
0.5361
0.4900
0.4885
0.1597
0.1315
0.0538
0.0438
Model (I)
= 0.7 ~ = 0.9 ~
Model (II) ~ ~
= 0.7 = 0.9
Model (III)
= 0.7 ~ = 0.9 ~
5
Real data applications
This section is devoted to the application of change point estimation to three data sets (Nile data, U. S. quarterly unemployment rate and international airline ticket sales data) where a visible change point can be observed. Based on these data, we fit (4.1). The estimation procedure is as follows. First, we estimate the unknown parameters by a maximum likelihood method. For fixed k, q -s: k -s: n - q, the MLE of Q and f3 is given by
Then we can estimate the spectral density of the residual process {Ut = Yt -
{O:~X(t -s: k) + 13~X(t > k)}zd using the following nonparametric estimator
where M = n 2/ 5, w(·) is a weight function and rk,n(l) = n- 1 L~:; UtUt+z. Hence the likelihood function is calculated using this spectral estimates. The MLE's of unknown parameters are
k = inf{k: TML = kin,
max
q5Y5,n-q O:ML
L(oi,13i,iln) = L(Ok,13k' kin)}
= Ok and 13ML = 13 k .
(5.1)
Takayuki Shiohama, Masanobu Taniguchi, Madan L. Puri
267
Model (I) !i
g ~ ~
__L 0.0
02
0.4
0.6
g 0
OJ
S! 0.8
0.0
1.0
02
0.4
0.6
011
1.0
(b) BE
(a) MLE
Model (II) ~
til ~
0.0
.__L 02
0.4
0.6
:?l
_.L..~
til ~
0.8
0.0
1.0
0.2
(e) MLE
0.4
0.6
0.8
1.0
0.8
1.0
(d) BE
Model (III) g
g
til
~
~
g 0
0
0.0
0.2
0.4
0.6
(e) MLE
0.8
1.0
0.0
0.2
0.4
0.6
(f) BE
Figure 4.2. Histograms for the results of Table 1 for ~ = 0.9 and n = 100. The sample coefficient of skewness {11 and the sample kurtosis {12 are: (a) {11 = 0.11, {12 = 3.25; (b) {1l = 0.34, {12 = 2.71; (c) {11 = 0.93, {12 = 5.50; (d) JLl = 0.63, {12 = 4.00; (e) {1l = 0.54, {12 = 3.51; (f) {1l = 0.34, {12 = 2.95.
Asymptotic estimation theory
268
A
./'/ ,,11./ ~
0
m :::E
'
.
........p: .......o••_.--0'"
•••• '4,
t"!
0
0:::
.-: 0
~
Figure 4.3. RMSE of Model (I) when n
= 100.
Q---1J
MlE
"' •••••••• -1>.
BE
Takayuki Shiohama, Masanobu Taniguchi, Madan L. Puri
269
Table 4.2. Average and RMSE of MLE and BE for
T
when
T
= 0.5 for Model (II).
Mean
RMSE
n
= 100
n
= 300
n
= 100
n
= 300
v
MLE
BE
MLE
BE
MLE
BE
MLE
BE
7r /2
0.5028
0.5017
0.5005
0.5004
0.0250
0.0201
0.0074
0.0065
7r /4
0.4848
0.4849
0.4944
0.4947
0.0584
0.0496
0.0266
0.0211
7r /8
0.4840
0.4969
0.4857
0.4895
0.1361
0.1217
0.0551
0.0418
7r/16
0.5847
0.5710
0.5183
0.5161
0.2283
0.1629
0.0833
0.0697
7r/32
0.5434
0.5381
0.4613
0.4675
0.2141
0.1715
0.1285
0.1021
Next we compute the Bayes estimator. For simplicity of calculation, we postulate the result that the asymptotic distribution of aM Land i3 B are same as CxB and (c.f. Kutoyants (1994)). Therefore the Bayes change point estimator TB becomes
i3B _
TB =
L~::qq TiLn(&ML, i3ML' Ti) n-q Li=q Ln(D'.ML, J3ML' Ti) A
A
,
Ti
= i/n,
1,
= q, ... , n -
q.
Nile data
These data have been investigated by an i.i.d. framework, for details see e.g., Cobb (1978) and Hinkley and Schechtmann (1987). The data consist ofreadings of the annual flows of the Nile River at Aswan from 1871 to 1970. There was a shift in the flow levels in 1899, which was attributed partly to the weather changes and partly to the start of construction work for a new dam at Aswan. We apply a mean shift model for this data with Zt = 1. The MLE gives aML = 1097.75, ~ML = 849.97 and T = 0.28 (k = 28). On the other hand, the BE is TB = 0.2790(k = [TBn] = 27). The original series together with ML trend estimator are plotted in Figure 5.1. Figure 5.2 shows the posterior distribution of T, which shows strong evidence that the shift occurred in 1898. These results agree with those of the other authors. U. S. quarterly unemployment rates
This data set, (n = 184), is analyzed in Tsay (2002) by use of threshold AR model for first differenced series. Here we explain a seasonal trend by employing regression models with trigonometric functions and change point. The regression function is chosen to be Zt = (1, cos(vt))'. A Fisher's test for added deterministic periodic component rejects the Gaussian white noise at level .01. We have taken v = 47r /184 which gives the peak in the periodogram. The MLE detected the possible change point TML = 0.49(k = 90) and corresponding regression coefficients &ML = (4.65, -0.85)' and i3ML = (6.81, -0.94)'. The BE is TB = 0.49 which corresponds to k = [TBn] = 90. The estimated trend
Asymptotic estimation theory
270
~~----------------------------------------------------------~
o
20
40
60
100
80
Figure 5.1. Nile data with estimated mean and change point
k = 28
(MLE).
d-
~-
;;d-
~0
20
40
Figure 5.2. Posterior distribution of T.
60
80
100
Takayuki Shiohama, Masanobu Taniguchi, Madan L. Puri
271
function together with original data is shown in Figure 5.3. The posterior distribution for T is plotted in Figure 5.4. This analysis reveals that the mean level of an unemployment rate increased to about 2% in 3rd quarter of 1970, while the amplitude of long term cyclical trend stayed the same level throughout the period.
International airline ticket sales data This data have been investigated by fitting a seasonal ARIMA model (Box et. al. (1994) ). An alternative modeling is deterministic cyclical trend function modeling with a change point for once-differentiated data. The regression function given by z~ = (COS(VIt),COS(V2t),COS(V3t)) is selected by examining the periodogram. There are three frequencies which have comparably large spectrum, namely VI = 267r/143, V2 = 507r/143 and V3 = 747r/143. The ML estimators give the &ML = (-7.54,14.14, 1.43)',.6ML = (-35.76,37.01, -19.66)' and TML = 0.6319(k = 91). While Bayes estimator is TB = 0.6216(k = 89). As shown in the posterior probability of T, the change might have occurred from t = 80 to 100, which implies the possibility of multiple changes.
6
Proofs
Proof of Theorem 1. From (2.7), we have log Zn(a,,6, T) = -
2~
t
f(Ak)-1/2 {dn(Ak)A(Ak)
+ dn(Ak)
A(Ak)} -
k=1
2~
t
(6.1)
IA(Ak)12
k=1
First we evaluate the first term in (6.2). From (2.7) we have
-
2~
t
f(Ak)-1/2 {dn(Ak)A(Ak)
+ dn(Ak)
A(Ak)}
k=1
= __1_
2yfii
f(Ak)-1/2
k=l
+ dn (Ak)A2 + dn (Ak)A3 + dn(Ak)Al + dn(Ak)A2 + dn(Ak)A3} El + E2 + E3 + E4 + E5 + E6 (say). X
=
t
{dn(Ak)A I
Write the spectral density f(A) in the form
where Rf(j)'s satisfy 2.:;:-00 IjlmIRf(j)1 < 00 for any given mEN. Then, from Theorem 3.8.3 of Brillinger (1975) we may write
Asymptotic estimation theory
272
o
50
100
150
Figure 5.3. U. S. quarterly unemployment rates (1948-1993) with estimated trend and change point k = 90(MLE).
C>
~U">
~C>
~U">
tiC>
ciU">
ci-
:30
50
Figure 5.4. Posterior distribution of T.
100
150
Takayuki Shiohama, Masanobu Taniguchi, Madan L. Puri
o
20
40
60
80
100
273
120
140
Figure 5.5. The international airline ticket sales, once -differentiated data (dotted line) with estimated trend and change point k = 91 (black line).
<:0
ci~v
~-
gj ci
<:>
ci
0
20
40
60
Figure 5.6. Posterior probabilities of T.
80
100
120
140
Asymptotic estimation theory
274
where r(j) 's satisfy for any given mEN 00
~ Ijlmlr(j)1 <
00.
j=-oo
Then E1 can be written as El = -
1 ~ 1/2 2..jii ~ f()..k)dn()..k)Al k=l n
= -
~n+~
n
4~7r ~ f()..k)-l L k=l
1
1
n
= -4n7r - '" ~ 27r k=l
1
L
({3 - 0)' ZsUtei(t-S)Ak
t=l s=[Tn]+l
1
n
00
~ r(j)e- ijAk ' " ~
~
[Tn+p] '"
~
({3 - 0)' ZsUtei(t-S)Ak
t=l s=[Tn]+l
j=-oo
noon
[Tn+p]
.,
= -4n7r27r~ - - '" '" r(j) ' " ' " ({3 - 0)' ZsUtet(t-S-J)Ak ~ ~ ~ t=l s=[Tn]+l
k=l j=-oo
It is well known that
~ ei(t-s-j)Ak = ~
k=l
{n0 ifotherwise. t - s - j = 0 (mod n)
Since -[Tn + p] ~ t - s ~ [(1 - T)n] and r(j) satisfies any given m, we have
2: j
(6.2)
Ijlmlr(j)1 <
00
for
Hence we have only to evaluate E1 for 1 = 0 of t - s - j = In. Thus E1 is
1 1 E1
n
00
= ---
47r27r ~
~
({3 - 0)' ZsUt-
L
L
r(j)
j=-oo
n ' " ei(t-s-j)Ak
n~
t=l s=[Tn]+l
k=l
[Tn+p]
00
~ - 87r 2
'"
~
j=-oo
1
1
[m+p]
~ r(j) ' "
_
({3 - 0)' zs{ us+j} ==
El
(say).
s=[Tn]+l
Then _
1
El = -
1 47r ({3 - a)'
J 7T
7T
-
=
~({3 2
O)'WI
~
J7T
Zs
-7T
eijAeisAdZu()")
s=[Tn]+l
J=-OO
=-
[Tn+p]
00
87r 2 . ~ r(j)({3 - 0)' [m+p]
~
s=[Tn]+l
(say),
zse iSA f()..)-ldZ u ()")
(6.3)
Takayuki Shiohama, Masanobu Taniguchi, Madan L. Puri
275
where Zu(A) is the spectral measure of Ut defined by (2.7). Let L~::Y~~~l+l zse is ).. = A(A; h, p). we observe
Recalling that {Ut} is Gaussian, we have
WI
(0, 4:
r; N
2
i:
A(A; h, p)A *(A; h, P)f(A)-ldA))
(6.4)
Similarly we obtain
(6.5)
Next we calculate the second term E2 that is
E2 = -
t
2~
f(Ak)-1/2d n (Ak)A 2
k=l
n
n
~n+~
= _1_ ' " f(Ak)-1_1- ' " ' " Uta' zsei(t-S»)..k 4mf~
fo~ ~ t=l
k=l
n
=_1_~", 4mf 27f
'"
~j~<Xl <Xl
s=l
n
<Xl
[Tn+p]
r(j)e-ij)..k_1_", ' " a'Utzsei(t-S»)..k
fo f:-t ~
n
[Tn+p]
n
= ~~ '" f(j)_l_ ' " ' " a'Utzs.!. ' " ei)..dt-s-j). 47f27f
j~<Xl
fo8 ~
n~
Here note that n - 1 ;::::: t - S ;: : : -[Tn]. Because of (6.2) we have only to evaluate E2 for l = 0, 1 of t - s - j = In. Then
Asymptotic estimation theory
276
Similarly as in E 1 ,
.;Ta' = -2-W2
(say),
where
W 2 ---+ N Dt
(0, ~ j7r-7r 2f ()...)-ldM()"')) , 2n
(6.7)
which follows from the Riemann-Lebesgue theorem and Grenander's conditions (G.1) - (G.4). Similarly we obtain. (6.8) Next
Since [(1 - T)n] 2': t - s 2': 1 - n, we have only to evaluate E3 for I = 0, -1 of t - s - j = In. Hence
Takayuki Shiohama, Masanobu Taniguchi, Madan L. Puri
Similarly as in
E2
277
we have
(6.9)
- .;r=Tb' W 2 3, where (6.10) Similarly we obtain.
E 6 = .;r=Tb' W3 2
(6.11)
Hence from (6.4), (6.5), (6.7), (6.8), (6.10) and (6.11), we have
2~
t
jP..k)-1/2 { dn(Ak)A(Ak) + dn(Ak) A(Ak) } k=l c:::: (f3 - a)'Wl + JTa'W2 + .;r=Tb'W3 . -
(6.12)
Next we evaluate the second term in (6.2), which is 1 - 2n
n
L
JA(Ak)J2
k=l 1
n
= - - ~(Al 2n~
+ A2 + A3)(Al + A2 + A 3)
k=l
=
-~ ~(JAlJ2 + JA 2J2 + JA3J2 + AlA2 + AlA3 + A2A3 + +A2 A l + A3 A l + A3 A 2). 2n~ k=l
Asymptotic estimation theory
278 We have
(6.13)
n
00
= __1_ ~ ~ ~ r(j)e-~j>'k 4mf L..t 27r L..t k=1
1 1
00
47r 27r L..t
r(j)
L
L..t
~
({3 _ a)' Zt z :({3 _
a)ei(t-S»,k
[Tn+p] ~
1
[Tn+p] ~
L..t
L..t
({3 - a)' ZtZ' ({3 - a)-
n. . ~ e~(t-S-J»,k
n L..t
S
k=1
[Tn+p]
00
= - 47r 27r
L..t
t=[Tn]+1 s=[Tn]+1
j=-oo
1 1
[Tn+p]
~
t=[Tn]+1 s=[Tn]+1
j=-oo
~
= ---
[Tn+p]
L
r(j)
j=-oo
({3 - a)' zs+jz:({3 - a).
s=[Tn]+1
Next we have
= __1_~ 4n7r 27r
L 00
j=-oo
[Tn+p] [Tn+p]
r(j)a' ~ L..t t=1
~
L..t s=1
Zt Z ' S
a
{n
~ ~ ei(t-s-j»,k
}
n L..t
.
k=1
Note that [Tn] 2': t - s 2': -[Tn]. Similarly we have
(6.14)
1 ~ 7r 7r J=-OO .
T
= -4-2 L..t r(j)a' = -
T
47r a'
j7r e~J.. )' dM(A)a = --a' j7r -1 T
-7r
j7r f(A)-ldM(A)a -7r
47r
-7r
~ .. )' L..t r(j)e~J dM(A)a
27r J=-OO .
Takayuki Shiohama, Masanobu Taniguchi, Madan L. Puri
279
Also we obtain
(6.15) 1
n
A312 --2:I 2n k=l
=
~
1 1 --~-4mf k=l f(>"k)
(1-
..;n
1
~
~ t=[Tn+p]+l
loon = --~ r(j)a' ~
4mf 21r ~
=-
1-
T
= _1-
~
b' /71"
T
n
ZtZ' a
{I
f
n.
b' ze -is>''k ) S
.}
- ~ et(t-S-J)>"k n ~
8
t=[Tn+p]+18=[Tn+p]+1
-71" 21r J=-OO .
41r
..;n s=[Tn+p]+l
~ ~
~
j=-oo
In ~ , it>") bzte k ( -~
k=l
r(j)eij>"dM(>..)b
b' /71" f(>..)-ldM(>..)b.
-71"
41r
The fourth term becomes
1 -2n
n
2: AIA2 k=l
=-
4~1r t k=l
f
;1r
r(j)e-ij>"k ( [Tf] ((3 - 0)' Zt eit >..)
j=-oo 00
j=-oo
In
t=[Tn]+l [Tn+p] [Tn+p]
= ~~_1_ ~ r(j) ~ 41r 21r..;n ~
(-
~
t=h+1
From 1 - p ::; t - S ::; [Tn]
+p-
8=1 n
~ ((3 - 0)' ztz~a.!. ~ ei(t-s-j)>"k ~
8=1
1, t - s - j
ITfl a' z,e- i ' ' ' , )
n~ k=l
= 0, it is seen that
Asymptotic estimation theory
280
Similarly we observe
(6.17)
Now we evaluate
n = -~~~ '" r(j) '~ " '" a'zt z ' b~ 47r 27r n ~ ~ s n j=-oo t=l s=[Tn+p]+l ~n+~
00
n ' " ei(t-s-j)Ak. ~ k=l
Since -n + 1 :::; t - s :::; -1, we have only to evaluate for t - s - j
= 0, -no (6.18)
1 n -2n LA2 A 3 k=l
1 1
~ - - - J r ( l - r)
47r 27r
~
1 [Tn+p] 1 n 1 n. . r(j)L a'ztz:b- Le~(t-S-J)Ak j=-oo VITi t=l y'(1 - r)n s=[Tn+p]+l n k=l
L
L
00
f=
_ Jr(l - r) r(j)a' j7r eijAdM(A)b 47r . 27r J=-OO . -7r
= _ y'' ' --'r('' ' ---l--r----,-) a' j7r f(A)-ldM(A)b. 47r
-7r
Similarly we have
_~
t
2n k=l
A3A2 '" - Jr(l - r) a' j7r f(A)-ldM(A)b. 47r_7r
(6.19)
From the equations from (6.14) to (6.19) together with (6.4), (6.7), (6.10) and (6.13) complete the proof of Theorem l. Proof of Lemma 1. From Hannan (1970) and Anderson (1977) the joint density of dn(Al), ... ,dn(An) is given by k
p(dn(Al),··· ,dn(An)) = en
II exp( -dn(Ak)f(Ak)-ldn(Ak)) k=l
(6.20)
Takayuki Shiohama, Masanobu Taniguchi, Madan L. Puri
281
EZ~/2(a, b, p)
= Eexp
In [ - 4J7i ( ;
JI
Cn exp ( -
=
=
x exp ( -
4~
x exp ( -
4~
t.
t,
J... J
1
exp [ 16n
t,
d n(Ak)!(Ak)-l d n(Akl)
/(Ak)-1/2 { d,,(Ak)A(Ak)
t,
(f(,\k)-1/2 dn(Ak) 1
I: IA(Ak)12 n
4n
k=l
=
exp (
-l~n
t.
n
exp - 4n {; IA(Ak)12
+ dn(A.)
A(Ak) } )
IA(Ak)!') d(d,,(A1)" . dn(An))
C n exp [-
X
[1
] f(Ak)~1/2 {dn(Ak)A(Ak) + dn(Ad _ A(Ak)}
+ ~j;;))
I: IA(Ak)12 n
]
(mk)
1/2dn(Ak)
+ ~j;;l)
d(dn(AI)··· dn(An)
k=l
IA (Ak)1 2 )
.
Recalling the definition of likelihood process in (2.7), we have
From the proof of Theorem 1 and Assumption (G.1), the first term in (6.21) is bounded by 1
n
-16n
L (AlAI) k=l
3 1 [m+p] ~ -1681[2
(6.22)
[Tn+p]
I:
I:
({3 - a)' Ztr(t - s)zs({3 - a) t=[Tn]+1 s=[Tn]+1 3 1 [Tn+p] < - - --2 " " ' { ({3 - a)' zt} 2 x min f (A) ~ I 16 81[ ~ A t=[Tn]+1 = - [O(p)] for p> O. We have already shown in (6.17) and (6.18) that
1~n
t
{AI(A2
+ A 3)} = O(n~I/2)
k=l
and _1_
~ {AI (A2 + A3)} = O(n~I/2).
16n~
k=l
(6.23)
]
]
Asymptotic estimation theory
282
Furthermore, from the proof of Theorem 1 we can find a positive definite matrix K so that (6.24) Hence (6.23)-(6.24) implies the required result. Proof of Lemma 2. Let ()~ = (a~,,6~, 71)' and (); = (a;,,6; 72)' are some given values in 8, and are the forms of a1 = a + n- 1/ 2a1,,61 = ,6 + n- 1/ 2b2, 71 = 7+n- 1pl,a2 = a+n- 1/ 2a2,,62 = ,6+n- 1/ 2b1 and 72 = 7+n- 1p2. Denoting A(..\'k) under (}i as A(ai, bi , Pi; Ak) we set ~ln
= A(al,b1,Pl;Ak) - A(a2,b2,P2;Ak)
~2n
= IA(a1,b1,Pl;Ak)1 2 -IA(a2,b2,P2;Ak)1 2
and
The process Y n is written as (6.25) Then we observe
E a ,{3,T
\Z~/4(al' b 1 , PI) - Z~/4(a2' b2, P2)\1/4
= E a1 ,{31,Tl (1 - Yn)4
= E (1 - 4Yn + 6Y; - 4Y; + Y;) We have
Similarly, we obtain
6EY;
=
6exp(41] + 2,),
and
EY;
=
exp(161] + 4,).
Takayuki Shiohama, Masanobu Taniguchi, Madan L. Puri
283
Hence
(6.26) Using the following expansion for small y
we have E[l - Yn]4
= 1 - 4(1 + 1] + ,) + 6(1 + 41] + 2,) - 4(1 + 91] + 3,) + (1 + 161] + 4,) =
+0(1]2) + 0(/2) + 0(1],) 0 + 0(1]2) + 0(/2) + 0(1],)
which implies that the Taylor expansion of (6.26) starts with the linear combinations of second order terms of 1]2, ,2 and 1],. Here we need to evaluate the asymptotics of 1] and, in (6.26). Assume that without loss of generality PI 2 P2, then
Using the similar argument in proof of Lemma 1, we observe
which 'is written as
Analogously we have
which completes the proof.
Proof of Theorem 2. The proof follows from Theorem 1, Lemmas 1 and 2 of this paper and Theorem 1.10.1 of Ibragimov and Has'minski (1981). Proof of Theorem 3. The properties of the likelihood ratio Zn (a, b, p) established in Theorem 1, Lemmas 1 and 2 allow us to refer to Theorem 1.10.2 of Ibragimov and Has'minski (1981).
Bibliography [1] Anderson, T. W. (1977). Estimation for autoregressive moving average models in time and frequency domains. Ann. Statist. 5 842-865.
284
Asymptotic estimation theory
[2] Bai, J. (1994). Least squares estimation of a shift in linear processes. J. Time Ser. Anal. 15453-472. [3] Bai, J. (1997). Estimation of change point in multiple regression models. The Review of Economics and Statistics. 79 551-563. [4] Box, G. E. P., Jenkins, G. M., and Reinsel, G. C. (1994). Time Series Analysis Forcasting and Control, 3rd. ed. Prentice Hall, New Jersey. [5] Brillinger, D. R. (1981). Time Series: Data Analysis and Theory, expanded ed. San Francisco: Holden-day. [6] Cobb, G. W. (1978). The probrem of the Nile: Conditional solution to a change-point problem, Biometrika. 65 243-251. [7] Csorgo, M. and Horvath, L. (1997). Limit Theorems in Change-Point Analysis. Wiley, New York. [8] Dabye, Ali S. and Kutoyants, Yu. A. (2001). Misspecified change-point estimation problem for a Poisson process. J. Appl. Prob. 38A 701-709. [9] Giraitis, L. and Leipus, R. (1990). A functional CLT for nonparametric estimates of spectra and change-point problem for spectral function. Lietunos Mathematikos Rinkinys. 30674-697. [10] Giraitis, L. and Leipus, R. (1992). Testing and estimating in the change-point problem of the spectral function. Lietunos Mathematikos Rinkinys. 32 20-38. [11] Hidalgo, J. and Robinson, P. M. (1996). Testing for structural change in a long-memory environment. J. Econometrics. 70 159-174. [12] Hinkley, D. V. and Schechtman, E. (1987). Conditional bootstrap methods in the meanshift model. Biometrika. 74 85-93. [13] Hannan, E. J. (1970). Multiple Time Series. Wiley, New York. [14] Ibragimov, I. A. and Has'minski, R. Z. (1981). Statistical Estimation. New York: Springer-Verlag [15] Kokoszka, P. and Leipus, R. (1998). Change-point in the mean of dependent observations. Statist. and Probab. Letters. 40 385-393. [16] Kokoszka, P. and Leipus, R. (2000). Detection and estimation of changes in regime. Preprint. [17] Kutoyants, Yu. A. (1994). Identification of Dynamical System with Small Noise. Dordrecht: Kluwer Academic Publishers. [18] Picard, D. (1985). Testing and estimating change points in time series. Adv. in Appl. Probab. 17 841-867. [19] Ritov, y. (1990). Asymptotic efficient estimation of the change point with unknown distributions. Ann. Statist. 18 1829-1839. [20] Tsay, R. S. (2002). Analysis of Financial Time Series. Wiley, New York.
Fractional Brownian motion as a differentiable generalized Gaussian process Victoria Zinde-Walsh 1 McGill University
fj
CIREQ
and Peter C.B. Phillips 2 Cowles Foundation, Yale University University of Auckland fj University of York Abstract Brownian motion can be characterized as a generalized random process and, as such, has a generalized derivative whose covariance functional is the delta function. In a similar fashion, fractional Brownian motion can be interpreted as a generalized random process and shown to possess a generalized derivative. The resulting process is a generalized Gaussian process with mean functional zero and covariance functional that can be interpreted as a fractional integral or fractional derivative of the deltafunction.
Keywords: Brownian motion, fractional Brownian motion, fractional derivative, covariance functional, delta function, generalized derivative, generalized Gaussian process JEL Classification Number: C32, Time Series Models
1
Introd uction
Fractional Brownian motion, like ordinary Brownian motion, has almost everywhere continuous sample paths of unbounded variation and ordinary derivatives of the process do not exist. Gel'fand and Vilenkin (1964) provided an alternative characterization of Brownian motion as a generalized Gaussian process defined as a random functional on a space of well behaved functions. Interpreted as a generalized random process, Brownian motion is differentiable. A generalized Gaussian process is uniquely determined by its mean functional and the bivariate covariance functional. Correspondingly, the generalized derivative of a Gaussian process with zero mean functional is a generalized Gaussian process with zero mean functional and covariance functional that can be computed from the covariance functional of the original process. Gel'fand and Vilenkin provide a description of the generalized Gaussian process which represents the derivative of Brownian motion. This process has a covariance functional that can be interpreted in terms of the delta-function. 1 Zinde-Walsh thanks the Fonds Quebecois de la recherche sur la societ e et la culture (FQRSC) and the Social Sciences and Humanities Research Council of Canada (SSHRC) for support of this research. 2Phillips thanks the NSF for support under Grant No. SES 0092509.
285
Fractional Brownian Motion
286
The present paper considers fractional Brownian motion from the same perspective as a generalized process and shows how to characterize its generalized derivative. The resulting process is a generalized Gaussian process with mean functional zero and covariance functional that can be interpreted as a fractional integral or fractional derivative of the delta-function. Higher order derivatives can be similarly described.
2
Fractional Brownian motion as a generalized random process
The form of the fractional Brownian motion process considered here was introduced by Mandelbrot and Van Ness (1968). In Marinucci and Robinson(1999) it is called Type I fractional Brownian motion. This form of (standard) fractional Brownian motion for 0 < H < 1 is represented in integral form as
BH(r) = A(H)-l
[1:00 (r - s)H-~ dB(s) - [°00 (_s)H-~ dB(S)] ,
r 2: 0 (2.1)
with A(H)
=
[2k + fo
oo
{(I
+ s) H-~
1
- sH-!} ds]"2 and where B is standard
Brownian motion and H is the self similarity index. For H = ~ the process coincides with Brownian motion. Samorodnitsky and Taqqu (1994, ch.7.2) give the 'moving average' representation (2.1) as well as an alternative harmonizable representation of the fractional Brownian motion process. Bhattacharya and Waymire (1990) provide some background discussion of the Hurst phenomenon and subsequent theoretical developments that led to the consideration of stochastic processes of this type. The mean functional of (2.1) is E B H(r) is (Samorodnitsky and Taqqu, 1994)
V(rl' r2) = EB H(rl)B H(r2) = Note that BH(O)
= 0 and for V(rl,r2)
=
rl, r2
~
= 0 and the covariance kernel V (rl , r2 ) [h1 2H
+ Ir212H -lr2 -
rll2H] .
> 0 the covariance kernel becomes
~ [r~H +r~H -lr2 _rlI2H].
(2.2)
The usual covariance kernel of Brownian motion follows when H = ~. Following Gel'fand and Vilenkin (1964), define the space K of 'test functions' as follows. K is the space of infinitely continuously differentiable functions
(2.3) 30ther spaces of test functions can be chosen.
For example, the space S of infinitely
Victoria Zinde- Walsh and Peter G.B.Phillips
287
Integrals in linear functionals such as (2.3) are taken from 0 to 00 and they are convergent due to the fact that all q; E K have finite support. Test functions could differ at negative values of r without affecting the value of the functional (BH, q;) . Thus we can restrict ourselves to the subspace K+ of K of functions q;(r) with non-negative support. The representation (2.3) provides an interpretation of BH as a linear functional on the space K+. It is easily seen that this functional is continuous in the topology on K +. Since E (B H) = 0, the mean functional is zero. Next we derive the covariance functional of B H . This functional, which we denote by VH[q;, '¢] is given in terms of the covariance kernel V(rl' r2) of the process BH. For q;, '¢ E K+ we have
VH[q;, '¢] := (V, (q;(t), ,¢(s))) =
JJ
V(t, s)q;(t),¢(s)dtds.
Substituting the expression for V(t, s) from (2.2) we have 2VH
=
11 + 1 1 -1 [it -1 [1 00
00
00
=
[q;, '¢]
[t 2H
q;(t)dt
00
00
q;(t)
,¢(s)
00
S2H
-It -
s2H '¢(s)ds
s12H] q;(t)'¢(s)dtds
+
1
00
'¢(s)ds
1
00
t 2H q;(t)'¢dt
(t - S)(2H+l)-1,¢(S)dS] dt
8
(2.4)
(t - S)(2H+l)-1q;(t)dt] ds.
Denote the integral r(a) J~ (t - x )a-l f(x )dx by (fa f)(t) for a > O. This integral is the fractional integral (in the Liouville sense) of the function f· If g(t) = (fa f)(t) where a > 0, then f is the fractional derivative of 9 and we shall write f(t) = (f-ag)(t). We use these expressions to simplify (2.4) in what follows. Start by noting that since [t 2H
!too '¢(s)ds];;o = 0
which equals
(2H)
= (2H)
1 1 1
00
00
t 2H - 1
00
t 2H - 1 [(f'¢) (00) - (f'¢) (t)] dt.
'¢(s)dsdt
differentiable functions that go to zero at infinity faster than any power, or spaces of functions that are not infinitely differentiable. The number of continuous derivatives that the test functions possess will determine the number of generalized derivatives of the process that can be defined on that space.
Fractional Brownian Motion
288 Use this expression in (2.4) to get
2VH [<,b,1);]
[1 t 2H - [(I1);) (00) - (11);) (t)] dt] + (2H) (I1);)(00) [1 t 2H - 1 [(1<,b) (00) - (I<,b) (t)] dt] 00
= (2H) (1<,b) (00)
1
00
-f(2H + 1)
[1
00
<,b(t) (I2H+11);) (t)dt +
1
00
1);(t) (I2H+l<,b) (t)dt] . (2.5)
Now
1
00
=
<,b(t) (1 2H +1 1);) (t)dt
[(I<,b) (t) (I2H+11);)(t)];;o
= (I<,b) (00) (1 2H+1 1);) (00) =
1
00
-1 -1
00
00
(I<,b) (t) (I2H1);) (t)dt
(1<,b) (t)(12H1);) (t)dt
[(I<,b) (00) - (I<,b) (t)] (I2H1);)(t)dt,
(2.6)
and
1
00
t 2H - 1 [(11);) (00) - (11);) (t)] dt
1e 1 00
=
H- 1
00
1);(s)dsdt.
(2.7)
Using (2.6) and (2.7) in (2.5) gives the following expression for 2VH[<,b, 1);],
[1 t 2H - [(I1);) (00) - (I1);) (t)] dt] + (2H) (I1);)(00) [1 t 2H - [(I<,b) (00) - (I<,b) (t)] dt] 00
(2H) (I<,b)(oo)
1
00
-f(2H + 1) -f(2H + 1)
=
1 +1 00
1 1
00
00
1
[(1<,b) (00) - (I<,b) (t)] (I2H1);)(t)dt [(I1);) (00) - (11);) (t)] (I2H<,b)(t)dt
[(1<,b) (00) - (1<,b) (t)] [t 2H - 1 (2H) (I1);)(00) - f(2H
00
+ 1)(I2H1);)(t)] dt
[(11);) (00) - (I1);) (t)] [t 2H - 1 (2H) (1<,b)(oo) - f(2H
+ 1)(I2H<,b)(t)] dt,
so that
VH [<,b,1);]
=
1
roo [(I<,b) (00) -
(I<,b) (t)] [t 2H - 1 (2H) (I1);)(00) - f(2H
+ 1)(12H1);)(t)] dt +
1
roo [(I1);) (00) -
(I1);) (t)] [t 2H - 1 (2H) (1<,b) (00) - f(2H
+ 1) (12H<,b) (t)]
2 Jo 2 Jo
dt.(2.8)
Victoria Zinde- Walsh and Peter G.B.Phillips
289
Setting H = ~ in this expression, we find that (2.8) specializes to VI [¢, 1jJ] = 2
Joroo [(I¢) (00) -
(I¢) (t)] [(I1jJ)(oo) - (I1jJ) (t)] dt,
which is the covariance functional of Brownian motion as a generalized process, a formula given in Gel'fand and Vilenkin (1964, p. 259). Thus, as a generalized random process, fractional Brownian motion is a generalized Gaussian process with mean functional zero and covariance functional given by (2.8). Observe that (2.8) is a bilinear functional involving fractional integrals of the test functions 1jJ and ¢. This alternative approach provides a new description of fractional Brownian motion. In the conventional manner, fractional Brownian motion can be described by its randomly selected sample paths, so that one can think about this process as being indexed by a random element in the probability space where the process lives. In contrast, the new description of fractional Brownian motion as a generalized process indexes the process by deterministic functions belonging to the class K +. Its covariance properties are similarly indexed by these deterministic functions through the covariance functional VH[¢, 1jJ].
3
The generalized derivative of the fractional Brownian motion process
One advantage of this new description of fractional Brownian motion is that it is differentiable, and the process representing the derivative is also a generalized Gaussian process. The mean functional is zero for the derivative process and, according to Gel'fand and Vilenkin (1964, p. 257), its covariance functional Vk [¢, 1jJ] satisfies Vk[¢, VJ] = VH [¢', Vi]. Substituting ¢', 1jJ' for ¢ and 1jJ, respectively in (2.8), we get the expression
VH[¢' , 1jJ']
=~
2
roo [¢ (00) _ ¢(t)] [t 2H -1 (2H) 1jJ( (0) -
Jo
+~
2
=
roo [1jJ (00) -1jJ(t)] [t 2H -
Jo
r(2~ + 1)
{1°°
r(2H + 1)(I2H 1jJ')(t)] dt
(2H) ¢(oo) - r(2H + 1)(I2H¢')(t)] dt
1
¢(t) (I2H-11jJ)(t)dt
since (Ia+1 f')(t) = (Ia f)(t) and ¢(oo) of the test functions.
+
1
00
1jJ(t)(I 2H - 1¢)(t)dt}
(3.1)
= 1jJ(00) = 0, in view of the finite support
Next we interpret the bilinear functional ViI- First, for ordinary Brownian motion (H = ~) the functional Vk [¢, 1/J] has the simple form V{ [¢, 1/J] = 2
roo ¢(t)1jJ(t)dt,
Jo
Fractional Brownian Motion
290
which can be interpreted in terms of the delta-function 8(w), i.e.,
V{ [¢, 1j;] = (XJ ¢(t)1j;(t)dt. 2 Jo
1 I: 00
=
1 I: 11 00
=
00
=
00
8(w)¢(t)1j;(t + w)dtdw 8(s - t)¢(t)1j;(s)dtds (3.2)
8(s - t)¢(t)1j;(s)dtds.
Thus, the covariance kernel of the derivative of standard Brownian motion is the delta function, as shown in Gel'fand and Vilenkin (1964, p. 260). Similarly in the fractional case we can interpret ViI in terms of a generalized fractional integral/derivative of the delta-function. Treating w(t) = (Ia J)(t) as a generalized function on K, the functional (w, ¢) = J w(t)¢(t)dt is differentiable as a generalized function with derivative (w', ¢) = J w'(t)¢(t)dt = - J w(t)¢'(t)dt by definition of a generalized derivative (Gel'fand and Shilov, 1964). Using this relation in the expression for ViI [¢, 1j;] gives
ViI[¢,1j;]
=
r(2~ + 1)
{1°O ¢(t)(J2H - 1j;)(t)dt + 1 1
00
1j;(t)(I2H-l¢)(t)dt}.
As we see in what follows, this expression can be written in the form
Vk[¢,1j;] = r(2H + 1)
11 00
00
(J 2H - 1 8) (s - t)¢(t)1j;(s)dtds.
(3.3)
extending the representation (3.2) for the covariance functional of the first derivative of Brownian motion. So the covariance kernel of the derivative of fractional Brownian motion (treated as a generalized process) is the fractional derivative/integral (J 2H -18) of the delta function. For H > ~ this is a fractional integral, while for H < ~ it is a fractional derivative. We examine the two cases separately. In the case of a fractional integral with a = 2H - 1
> 0 and t > 0 we have
t
1
ta-
1
(r8) (t) = r(a) Jo (t - xt- 1 8(x)dx = r(a)· Then
1 1 [r~a) 1 1 [I 1 [I 1 1 o) 00
00
=
=
00
00
=
00
=
¢(t) (Ia1j;) (t)dt
t
¢(t) ¢(t) ¢(t)
00
(Ia
(t-x)a-l1j;(x)dx] dt
t
(I a8) (t - X)1j;(X)dX] dt
t
(I a8) (W)1j;(t - W)dW] dt (t - s)¢(t)1j;(s)dsdt,
(3.4)
291
Victoria Zinde- Walsh and Peter G.B.Phillips
and similarly
1=
'ljJ(t) (JU¢) (t)dt
so that
Vk[¢, 'ljJ]
=
r(2~ + 1)
=
r(2H
+ 1)
=
1= 1=
{1= 1= 1=
(JUb) (t - s)¢(t)'ljJ(s)dsdt,
¢(t) (I2H-1'ljJ) (t)dt
+
1=
'ljJ(t)(J 2H - 1¢) (t)dt }
(I2H- 1b) (t - s)¢(t)'ljJ(s)dtds,
giving the result (3.3). In the case of a fractional derivative with a = 2H - 1 < 0 (0 < H < ~) we write J2H -1 I = J2H I' and then
1= 1= [rta) 1 1= [1 1= [1 1= [1 1= 1= ¢(t) (Ia'ljJ') (t)dt
=
=
with a similar result for
r(2~ + 1)
= r(2~ + 1) r(2H
t
¢(t)
t
¢(t)
=
=
t
¢(t)
=
Vk[¢, 'ljJ] =
t
¢(t)
=
+ 1)
(t-x)a-1'ljJ'(x)dX] dt
(rb)(t - x)'ljJ'(X)dX] dt (rb)(W)'ljJ'(t-W)dW] dt (Ia- 1b) (w)'ljJ(t - W)dW] dt
(JU- 1b) (t - s)¢(t)'ljJ(s)dsdt,
Jo= ¢(t)(Ja¢')(t)dt. It follows that
{1= {1= 1= 1=
1= + 1=
¢(t) (I2H-1'ljJ) (t)dt ¢(t) (I2H'ljJ')(t)dt
+
'ljJ(t)(I2H-1¢) (t)dt }
'ljJ(t)(I2H ¢')(t)dt}
(J 2H - 1b) (t - s)¢(t)'ljJ(s)dsdt,
as required for (3.3). Clearly, one can proceed with further differentiation of the fractional process. Subsequent m-th order derivatives will provide gen,eralized Gaussian processes with mean functional zero and covariance functional expressed in terms of the generalized function (J 2H - m b) (t - s). Victoria Zinde Walsh Department of Economics McGill University & CIREQ
Peter C.B. Phillips Cowles Foundation, Yale University University of Auckland & University of York
292
Fractional Brownian Motion
Bibliography [1] Bhattacharya, R. N. and E. C. Waymire (1990). Stochastic Processes with Applications. New York: John Wiley. [2] Gel'fand I M. and G. E. Shilov (1964). Generalized Functions, Vol.4. New York: Academic Press. [3] Gel'fand I M. and N. Ya. Vilenkin (1964). Generalized Functions, Vol. 1. New York: Academic Press. [4] Mandelbrot, B.B. and J. W. Van Ness (1968). "Fractional Brownian Motions, Fractional Noises and Applications". SIAM Review, 10, 422-437. [5] Marinucci, D. and P. M. Robinson (1999). "Alternative Forms of Fractional Brownian Motion". Journal of Statistical Planning and Inference, 80, 111122. [6] Samorodnitsky, G. and M. S. Taqqu (1994). Stable Non-Gaussian Random Processes. London: Chapman & Hall.