Here we have used the facts that for distinct k's, the random variables max_{(k−1)Δ ≤ t ≤ kΔ} |X_t − l_t| are identically distributed and

\[ P\Bigl\{\max_{0\le t\le\Delta}|X_t-l_t|\ge\delta\Bigr\} = P\Bigl\{\max_{0\le t\le\Delta}\Bigl|\varepsilon w_t-\frac{t}{\Delta}\,\varepsilon w_\Delta\Bigr|\ge\delta\Bigr\} \le P\Bigl\{\max_{0\le t\le\Delta}|\varepsilon w_t|\ge\frac{\delta}{2}\Bigr\}. \]
Continuing estimation (2.6) and taking account of the inequality

\[ P\{w_\Delta > z\} \le \frac{\sqrt{\Delta}}{z\sqrt{2\pi}}\exp\{-z^2/2\Delta\}, \]

true for a normal random variable w_Δ with mean 0 and variance Δ, we have

\[ P\{\rho_{0T}(X^\varepsilon, l^\varepsilon) \ge \delta\} \le \frac{T}{\Delta}\,P\Bigl\{\max_{0\le t\le\Delta}|\varepsilon w_t| \ge \frac{\delta}{2}\Bigr\} \le \frac{4rT}{\Delta}\,P\Bigl\{w_\Delta > \frac{\delta}{2r\varepsilon}\Bigr\} \le \frac{4rT}{\Delta}\cdot\frac{2r\varepsilon\sqrt{\Delta}}{\delta\sqrt{2\pi}}\exp\Bigl\{-\frac{\delta^2}{8r^2\varepsilon^2\Delta}\Bigr\}. \tag{2.7} \]
Now it is sufficient to take Δ ≤ δ²/4r²s₀, and the right side of (2.7) will be smaller than ½ exp{−ε⁻²(s − γ)} for ε sufficiently small and s ≤ s₀. This and inequalities (2.4) and (2.5) imply the assertion of the theorem. □

We establish some properties of the functional S_{0T}(φ).

Lemma 2.1. (a) The functional S_{0T}(φ) is lower semicontinuous in the sense of uniform convergence, i.e., if a sequence φ^(n) converges to φ in C_{0T}, then S_{0T}(φ) ≤ lim inf_{n→∞} S_{0T}(φ^(n)).
3. Action Functional
(b) The set of functions φ_t, 0 ≤ t ≤ T, such that φ₀ belongs to some compact subset of R^r and S_{0T}(φ) ≤ s₀ < ∞, is compact.
Proof. (a) It is sufficient to consider the case where the finite limit lim_{n→∞} S_{0T}(φ^(n)) exists. We use the following fact (cf. Riesz and Sz.-Nagy [1], p. 86): a function φ_t is absolutely continuous and its derivative is square integrable if and only if

\[ \sup \sum_{i=1}^{N} \frac{|\varphi_{t_i}-\varphi_{t_{i-1}}|^2}{t_i-t_{i-1}} \tag{2.8} \]

(the supremum over all finite subdivisions 0 = t₀ < t₁ < ⋯ < t_N = T) is finite, and in this case the supremum is equal to ∫₀ᵀ |φ̇_t|² dt. Expression (2.8) is equal to

\[ \sup \sum_{i=1}^{N} \lim_{n\to\infty} \frac{|\varphi^{(n)}_{t_i}-\varphi^{(n)}_{t_{i-1}}|^2}{t_i-t_{i-1}} \le \varliminf_{n\to\infty}\, \sup \sum_{i=1}^{N} \frac{|\varphi^{(n)}_{t_i}-\varphi^{(n)}_{t_{i-1}}|^2}{t_i-t_{i-1}} = \varliminf_{n\to\infty} \int_0^T |\dot\varphi^{(n)}_s|^2\, ds. \]

This implies that φ is absolutely continuous and S(φ) ≤ lim_{n→∞} S(φ^(n)).

(b) From the estimation ∫₀ᵀ |φ̇_s|² ds = 2S_{0T}(φ) ≤ 2s₀ we obtain that

\[ |\varphi_t| = \Bigl|\varphi_0 + \int_0^t \dot\varphi_s\, ds\Bigr| \le |\varphi_0| + \sqrt{T\int_0^T |\dot\varphi_s|^2\, ds} \le |\varphi_0| + \sqrt{2Ts_0}. \]

Consequently, all functions of our set are uniformly bounded. This estimation also implies the equicontinuity of the functions φ:

\[ |\varphi_{t+h}-\varphi_t| = \Bigl|\int_t^{t+h} \dot\varphi_s\, ds\Bigr| \le \sqrt{h \int_t^{t+h} |\dot\varphi_s|^2\, ds} \le \sqrt{2S(\varphi)}\,\sqrt{h} \le \sqrt{2s_0}\,\sqrt{h}. \]
The compactness follows from Arzelà's theorem. □

Corollary. On every nonempty closed set in C_{0T} for which the initial values φ₀ are contained in some compact set, the functional S_{0T}(φ) attains its smallest value, and values close to the smallest value are assumed by the functional only near functions at which the minimum is attained.
Theorem 2.3. Let a function φ ∈ C_{0T} be such that φ₀ = 0 and S_{0T}(φ) < ∞. We have

\[ \lim_{\delta\downarrow 0}\varliminf_{\varepsilon\downarrow 0} \varepsilon^2 \ln P\{\rho_{0T}(X^\varepsilon,\varphi) < \delta\} = \lim_{\delta\downarrow 0}\varlimsup_{\varepsilon\downarrow 0} \varepsilon^2 \ln P\{\rho_{0T}(X^\varepsilon,\varphi) < \delta\} = -S_{0T}(\varphi). \]

For the proof of this theorem, it is sufficient to establish that for every γ > 0 and every sufficiently small δ₀ > 0 there exists an ε₀ > 0 such that

\[ \exp\{-\varepsilon^{-2}(S_{0T}(\varphi)-\gamma)\} \ge P\{\rho_{0T}(X^\varepsilon,\varphi) < \delta_0\} \ge \exp\{-\varepsilon^{-2}(S_{0T}(\varphi)+\gamma)\} \tag{2.9} \]

for 0 < ε < ε₀. The right-hand inequality constitutes the assertion of Theorem 2.1. We prove the left-hand inequality. We choose δ₀ > 0 so small that

\[ \inf\{S_{0T}(\psi)\colon \rho_{0T}(\varphi,\psi) \le \delta_0\} > S_{0T}(\varphi) - \gamma/4. \]

This can be done relying on Lemma 2.1. By the corollary to the same lemma, we have

\[ \delta_1 = \rho_{0T}\bigl(\{\psi\colon \rho_{0T}(\varphi,\psi) \le \delta_0\},\ \Phi(S_{0T}(\varphi)-\gamma/2)\bigr) > 0. \]

We apply Theorem 2.2:

\[ P\{\rho_{0T}(X^\varepsilon,\varphi) < \delta_0\} \le P\{\rho_{0T}(X^\varepsilon,\Phi(S_{0T}(\varphi)-\gamma/2)) \ge \delta_1\} \le \exp\{-\varepsilon^{-2}(S_{0T}(\varphi)-\gamma)\} \]

whenever ε is sufficiently small. By the same token, inequalities (2.9), and with them Theorem 2.3, are proved. □

The proofs of Theorems 2.1 and 2.2 reproduce, in the simplest case of the process X^ε = εw, the constructions used in the articles [1] and [4] by Wentzell and Freidlin. In this special case, analogous results were obtained in Schilder's article [1].
§3. Action Functional. General Properties

The content of §2 may be divided into two parts: one concerned with the Wiener process and the other relating in general to ways of describing the rough (to within logarithmic equivalence) asymptotics of families of measures in metric spaces. We consider these questions separately.
Let X be a metric space with metric ρ. On the σ-algebra of its Borel subsets let μ^h be a family of probability measures depending on a parameter h > 0. We shall be interested in the asymptotics of μ^h as h ↓ 0 (the changes which have to be made if we consider convergence to another limit or to ∞, or if we consider a parameter ranging not over the real line but over a more general set, are obvious). Let λ(h) be a positive real-valued function going to +∞ as h ↓ 0 and let S(x) be a function on X assuming values in [0, ∞]. We shall say that λ(h)S(x) is an action function for μ^h as h ↓ 0 if the following assertions hold:

(0) the set Φ(s) = {x: S(x) ≤ s} is compact for every s ≥ 0;

(I) for any δ > 0, any γ > 0 and any x ∈ X there exists an h₀ > 0 such that

\[ \mu^h\{y\colon \rho(x,y) < \delta\} \ge \exp\{-\lambda(h)[S(x)+\gamma]\} \tag{3.1} \]

for all h ≤ h₀;

(II) for any δ > 0, any γ > 0 and any s > 0 there exists an h₀ > 0 such that

\[ \mu^h\{y\colon \rho(y,\Phi(s)) \ge \delta\} \le \exp\{-\lambda(h)(s-\gamma)\} \tag{3.2} \]

for h ≤ h₀.
If ξ^h is a family of random elements of X defined on the probability spaces (Ω^h, ℱ^h, P^h), then the action function for the family of the distributions μ^h, μ^h(A) = P^h{ξ^h ∈ A}, is called the action function for the family ξ^h. In this case formulas (3.1) and (3.2) take the form

\[ P^h\{\rho(\xi^h, x) < \delta\} \ge \exp\{-\lambda(h)[S(x)+\gamma]\}, \tag{3.1′} \]
\[ P^h\{\rho(\xi^h, \Phi(s)) \ge \delta\} \le \exp\{-\lambda(h)(s-\gamma)\}. \tag{3.2′} \]

Separately, the functions S(x) and λ(h) will be called the normalized action function and the normalizing coefficient. It is clear that the decomposition of an action function into the two factors λ(h) and S(x) is not unique; moreover, λ(h) can always be replaced by a function λ₁(h) ∼ λ(h). Nevertheless, we shall prove below that for a given normalizing coefficient, the normalized action function is uniquely defined.
EXAMPLE 3.1. X = R¹, μ^h is the Poisson distribution with parameter h for every h > 0, and we are interested in the behavior of μ^h as h ↓ 0. Here λ(h) = −ln h; S(x) = x for every nonnegative integer x and S(x) = +∞ for the remaining x.
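As a quick numerical illustration (a sketch of ours, not part of the text): for the Poisson family, ln μ^h{x} = −h + x ln h − ln x!, so λ(h)⁻¹ ln μ^h{x} = ln μ^h{x}/ln h should approach S(x) = x as h ↓ 0.

```python
import math

def normalized_rate(x, h):
    """ln mu^h{x} / ln h for mu^h = Poisson(h); tends to S(x) = x as h -> 0."""
    log_p = -h + x * math.log(h) - math.lgamma(x + 1)
    return log_p / math.log(h)

for x in (0, 1, 3):
    print(x, [round(normalized_rate(x, h), 3) for h in (1e-4, 1e-8, 1e-12)])
```

The −ln x! term is of constant order, so convergence is slow for larger x, in agreement with the purely logarithmic ("rough") character of the asymptotics.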
If X is a function space, we shall use the term action functional. Hence for the family of random processes εw, where w_t is a Wiener process, t ∈ [0, T], w₀ = 0, a normalized action functional as ε → 0 is S(φ) = ½ ∫₀ᵀ |φ̇_t|² dt for absolutely continuous φ_t, 0 ≤ t ≤ T, with φ₀ = 0, and S(φ) = +∞ for all other φ; the normalizing coefficient is equal to ε⁻² (as the space X we take the space of continuous functions on the interval [0, T] with the metric corresponding to uniform convergence).
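The functional S(φ) = ½∫₀ᵀ|φ̇_t|² dt is easy to evaluate numerically. A small sketch (the grid and the two trial paths are our own choices): among paths from 0 to level c on [0, T], the straight line has action c²/2T, while a sine-shaped path ending at the same point costs more.

```python
import numpy as np

def action(phi, T):
    """Riemann-sum approximation of (1/2) * integral_0^T |phi'(t)|^2 dt."""
    n = len(phi) - 1
    dt = T / n
    return 0.5 * np.sum((np.diff(phi) / dt) ** 2) * dt

T, c, n = 1.0, 2.0, 100_000
t = np.linspace(0.0, T, n + 1)

line = c * t / T                        # straight line: S = c^2/(2T)
sine = c * np.sin(np.pi * t / (2 * T))  # same endpoint:  S = c^2 * pi^2/(16 T)

print(action(line, T), action(sine, T))
```

That the straight line minimizes ½∫|φ̇|² among paths with fixed endpoints is the usual Cauchy–Schwarz argument, and it is this minimizing path near which εw concentrates conditionally on reaching level c.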
We note that condition (0) implies that S(x) attains its minimum on every nonempty closed set. It is sufficient to consider only the case of a closed A ⊆ X with s_A = inf{S(x): x ∈ A} < ∞. We choose a sequence of points x_n ∈ A such that S(x_n) ↓ s_A. The nested compact sets A ∩ Φ(S(x_n)) are nonempty, and therefore their intersection is nonempty and contains a point x_A with S(x_A) = s_A.

It would be desirable to obtain immediately a large number of examples of families of random processes for which we could determine an action functional. The following result (Freidlin [7]) helps us with that.

Theorem 3.1. Let λ(h)S^μ(x) be the action function for a family of measures μ^h on a space X (with metric ρ_X) as h ↓ 0. Let φ be a continuous mapping of X into a space Y with metric ρ_Y, and let the measure ν^h on Y be given by the formula ν^h(A) = μ^h(φ⁻¹(A)). Then the asymptotics of the family of measures ν^h as h ↓ 0 is given by the action function λ(h)S^ν(y), where S^ν(y) = min{S^μ(x): x ∈ φ⁻¹(y)} (the minimum over the empty set is taken to be equal to +∞).
Proof. We introduce the following notation: Φ^μ(s) = {x: S^μ(x) ≤ s}, Φ^ν(s) = {y: S^ν(y) ≤ s}. It is easy to see that Φ^ν(s) = φ(Φ^μ(s)), from which we obtain easily that S^ν satisfies condition (0).

We prove condition (I). We fix an arbitrary y ∈ Y and a neighborhood of it. If S^ν(y) = ∞, then there is nothing to prove. If S^ν(y) < ∞, then there exists an x such that φ(x) = y and S^ν(y) = S^μ(x). We choose a neighborhood of x whose image is contained in the selected neighborhood of y and thus obtain the condition to be proved.

Now we pass to condition (II). The pre-image of the set {y: ρ_Y(y, Φ^ν(s)) ≥ δ} under φ is closed and does not intersect the compact set Φ^μ(s). Therefore, we can choose a positive δ′ such that the δ′-neighborhood of Φ^μ(s) does not intersect φ⁻¹{y: ρ_Y(y, Φ^ν(s)) ≥ δ}. From inequality (3.2) with ρ_X, Φ^μ and δ′ in place of δ, we obtain a similar inequality for ν^h, ρ_Y, Φ^ν and δ. □
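Theorem 3.1 can be illustrated numerically with the Poisson family of Example 3.1 (the mapping φ(x) = |x − 2| is our own choice for this sketch): the image point y has preimages 2 ± y, and the normalized action function of the image measure approaches min{S(2 + y), S(2 − y)}.

```python
import math

def log_poisson(x, h):
    return -h + x * math.log(h) - math.lgamma(x + 1)

def image_rate(y, h):
    """ln nu^h{y} / ln h for nu^h = image of Poisson(h) under phi(x) = |x - 2|."""
    preimages = {2 + y} | ({2 - y} if 2 - y >= 0 else set())
    p = sum(math.exp(log_poisson(x, h)) for x in preimages)
    return math.log(p) / math.log(h)

for y in (0, 1, 2):
    print(y, round(image_rate(y, 1e-12), 3))  # limits as h -> 0: 2, 1, 0
```

The cheaper preimage dominates the sum of the two point masses, which is exactly the mechanism behind S^ν(y) = min over φ⁻¹(y).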
Using this theorem in the special case where X and Y are the same space with distinct metrics, we obtain that if λ(h)S(x) is an action function for a family of measures μ^h as h ↓ 0 in a metric ρ₁, and another metric ρ₂ is such that ρ₂(x, y) → 0 whenever ρ₁(x, y) → 0, then λ(h)S(x) is an action function in the
metric ρ₂ as well. Of course, this simple assertion can be obtained directly. From this we obtain, in particular, that ε⁻²S_{0T}(φ) = (1/2ε²) ∫₀ᵀ |φ̇_t|² dt remains an action functional for the family of the processes εw_t, 0 ≤ t ≤ T, as ε ↓ 0 if we consider the metric of the Hilbert space L²_{0T}. The following examples are more interesting.

EXAMPLE 3.2. Let G(s, t) be a k times continuously differentiable function on
the square [0, T] × [0, T], k ≥ 1. In the space C_{0T} we consider the operator 𝒢 defined by the formula

\[ \mathscr{G}\varphi_t = \int_0^T G(s,t)\, d\varphi_s. \]

Here the integral is understood in the sense of Stieltjes, and by the assumed smoothness of G, integration by parts is allowed:

\[ \mathscr{G}\varphi_t = G(T,t)\varphi_T - G(0,t)\varphi_0 - \int_0^T \frac{\partial G(s,t)}{\partial s}\,\varphi_s\, ds. \]
This equality shows that 𝒢 is a continuous mapping of C_{0T} into the space C^(k−1)_{0T} of functions having k − 1 continuous derivatives, with the metric

\[ \rho^{(k-1)}(\varphi,\psi) = \max_{0\le i\le k-1}\,\max_{0\le t\le T}\,\Bigl|\frac{d^i(\varphi_t-\psi_t)}{dt^i}\Bigr|. \]
We calculate the action functional for the family of the random processes

\[ X_t^\varepsilon = \mathscr{G}(\varepsilon w)_t = \varepsilon \int_0^T G(s,t)\, dw_s \]

in C^(k−1)_{0T} as ε → 0. By Theorem 3.1, the normalizing coefficient remains the same, and the normalized action functional is given by the equality

\[ S^X(\varphi) = \min\{S_{0T}(\psi)\colon \mathscr{G}\psi = \varphi\} = \min\Bigl\{\tfrac12 \int_0^T |\dot\psi_s|^2\, ds\colon \mathscr{G}\psi = \varphi\Bigr\}; \]

if there are no ψ for which 𝒢ψ = φ, then S^X(φ) = +∞. We introduce an auxiliary operator G in L²_{0T}, given by the formula

\[ Gf(t) = \int_0^T G(s,t) f(s)\, ds, \]
and express S^X in terms of the inverse of G. This inverse will not be one-to-one in general, since G vanishes on some subspace L₀ ⊆ L²_{0T}, which may be nontrivial. We make the inverse operator one-to-one artificially, by setting G⁻¹φ = ψ, where ψ is the unique function in L²_{0T} orthogonal to L₀ and such that Gψ = φ. The operator G⁻¹ is defined on the range of G. If S^X(φ) < ∞, then there exists a function ψ ∈ C_{0T} such that 𝒢ψ = φ and S_{0T}(ψ) < ∞. Then ψ is absolutely continuous and 𝒢ψ = Gψ̇. Therefore,

\[ S^X(\varphi) = \min\Bigl\{\tfrac12 \int_0^T |\dot\psi_s|^2\, ds\colon \mathscr{G}\psi = \varphi\Bigr\} = \min\bigl\{\tfrac12 \|f\|^2\colon Gf = \varphi\bigr\}, \]

where ‖f‖ is the norm of f in L²_{0T}. Any element f for which Gf = φ can be represented in the form f = G⁻¹φ + f′, where f′ ∈ L₀. Taking into account that G⁻¹φ is orthogonal to L₀, we obtain ‖f‖² = ‖G⁻¹φ‖² + ‖f′‖² ≥ ‖G⁻¹φ‖². This means that S^X(φ) = ½‖G⁻¹φ‖² for φ in the range of G. For the remaining φ ∈ C^(k−1)_{0T} the functional assumes the value +∞.

EXAMPLE 3.3. We
consider a random process X_t^ε on the interval [0, T], satisfying the linear differential equation

\[ P\Bigl(\frac{d}{dt}\Bigr)X_t^\varepsilon = \sum_{k=0}^{n} a_k \frac{d^k X_t^\varepsilon}{dt^k} = \varepsilon \dot w_t, \]

where ẇ_t is a one-dimensional white noise process. In order to choose a unique solution, we have to prescribe n boundary conditions; for the sake of simplicity, we assume them to be homogeneous, linear, and nonrandom. We denote by G(s, t) the Green's function of the boundary value problem connected with the operator P(d/dt) and our boundary conditions (cf. Coddington and Levinson [1]). The process X_t^ε can be represented in the form X_t^ε = ε ∫₀ᵀ G(s, t) dw_s, i.e., X^ε = 𝒢(εw). In this case the corresponding operator has the single-valued inverse G⁻¹ = P(d/dt), with domain consisting of the functions satisfying the boundary conditions. From this we conclude that the action functional for the family of the processes X_t^ε as ε → 0 is ε⁻²S_{0T}(φ), where

\[ S_{0T}(\varphi) = \frac12 \int_0^T \Bigl| P\Bigl(\frac{d}{dt}\Bigr)\varphi_t \Bigr|^2\, dt, \]

and if φ does not satisfy the boundary conditions or the derivative d^{n−1}φ_t/dt^{n−1} is not absolutely continuous, then S_{0T}(φ) = +∞. The action functional has an analogous form in the case of a family of multidimensional random processes which are the solutions of a system of linear differential equations on whose right side there is a white noise multiplied by a small parameter.
In the next chapter (§1), by means of the method based on Theorem 3.1, we establish that for a family of diffusion processes arising as a result of a perturbation of a dynamical system ẋ_t = b(x_t) by the addition to the right side of a white noise multiplied by a small parameter, the action functional is equal to (1/2ε²) ∫₀ᵀ |φ̇_t − b(φ_t)|² dt.

We consider the conditions (0), (I), and (II) in more detail. In the case of a complete space X, condition (0) can be split into two: lower semicontinuity of S(x) on X (which is equivalent to the closedness of Φ(s) for every s) and relative compactness of Φ(s). Such a splitting is convenient in the verification of the condition (cf. Lemma 2.1). The condition of semicontinuity of S is not stringent: it is easy to prove that if the functions λ(h) and S(x) satisfy conditions (I) and (II), then so do λ(h) and the lower semicontinuous function S̃(x) = S(x) ∧ lim inf_{y→x} S(y). (The method of redefining the normalized action functional by semicontinuity is used in a somewhat more refined form in the proof of Theorem 2.1 of Ch. 5.)

Theorem 3.2. Condition (I), together with the condition of relative compactness of Φ(s), is equivalent to the following condition:

(I_eq) for any δ > 0, any γ > 0 and any s₀ > 0 there exists an h₀ > 0 such that inequality (3.1) is satisfied for all h ≤ h₀ and all x ∈ Φ(s₀).
Condition (II) implies the following:

(II_eq) for any δ > 0, any γ > 0 and any s₀ > 0 there exists an h₀ > 0 such that inequality (3.2) is satisfied for all h ≤ h₀ and all s ≤ s₀.

Conditions (0) and (II) imply the following:

(II⁺) for any δ > 0 and s ≥ 0 there exist γ > 0 and h₀ > 0 such that

\[ \mu^h\{y\colon \rho(y,\Phi(s)) \ge \delta\} \le \exp\{-\lambda(h)(s+\gamma)\} \tag{3.3} \]
for all h ≤ h₀.

Proof. We indicate the proof of the last implication. For A = {y: ρ(y, Φ(s)) ≥ δ}, condition (0) allows us to choose γ > 0 with inf{S(y): y ∈ A} > s + 2γ; then A ∩ Φ(s + 2γ) = ∅. We select a positive δ′ not exceeding ρ(A, Φ(s + 2γ)) (this distance is positive by virtue of the compactness of the second set) and use inequality (3.2) for δ′ in place of δ and s + 2γ in place of s. □
Conditions (I) and (II) were introduced in describing the rough asymptotics of probabilities of large deviations in the papers [1], [4] of Wentzell
and Freidlin. There are other methods of description; however, under condition (0) they are equivalent to the one given here.
In Varadhan's paper [1] the following conditions occur instead of conditions (I) and (II):

(I′) for any open A ⊆ X we have

\[ \varliminf_{h\downarrow 0} \lambda(h)^{-1} \ln \mu^h(A) \ge -\inf\{S(x)\colon x \in A\}; \tag{3.4} \]

(II′) for any closed A ⊆ X we have

\[ \varlimsup_{h\downarrow 0} \lambda(h)^{-1} \ln \mu^h(A) \le -\inf\{S(x)\colon x \in A\}. \tag{3.5} \]
Theorem 3.3. Conditions (I) and (I′) are equivalent. Condition (II′) implies (II), and conditions (0) and (II) imply (II′). Consequently, (I) ⇔ (I′) and (II) ⇔ (II′) under condition (0).

Proof. The implications (I′) ⇒ (I), (I) ⇒ (I′) and (II′) ⇒ (II) can be proved very simply. We prove, for example, the last one. The set A = {y: ρ(y, Φ(s)) ≥ δ} is closed and S(y) > s on it. Therefore, inf{S(y): y ∈ A} ≥ s. From (II′) we obtain lim sup_{h↓0} λ(h)⁻¹ ln μ^h(A) ≤ −s, which means that for every γ > 0 and h sufficiently small we have λ(h)⁻¹ ln μ^h(A) ≤ −s + γ, i.e., (3.2) is satisfied.
Now let (0) and (II) be satisfied. We prove (II′). Choose an arbitrary γ > 0 and put s = inf{S(y): y ∈ A} − γ. The closed set A does not intersect the compact set Φ(s). Therefore, δ = ρ(A, Φ(s)) > 0. We use inequality (3.2) and obtain that

\[ \mu^h(A) \le \mu^h\{y\colon \rho(y,\Phi(s)) \ge \delta\} \le \exp\{-\lambda(h)(s-\gamma)\} = \exp\bigl\{-\lambda(h)\bigl(\inf\{S(y)\colon y \in A\} - 2\gamma\bigr)\bigr\} \]

for sufficiently small h. Taking logarithms, dividing by the normalizing coefficient λ(h) and passing to the limit, we obtain that lim sup_{h↓0} λ(h)⁻¹ ln μ^h(A) ≤ −inf{S(y): y ∈ A} + 2γ, which implies (3.5), since γ > 0 is arbitrary. □
In Borovkov's paper [1] the rough asymptotics of probabilities of large deviations is characterized by one condition instead of the two conditions (I) and (II) (or (I′) and (II′)). We shall say that a set A ⊆ X is regular (with respect to the function S) if the infimum of S on the closure of A coincides with the infimum of S on the set of interior points of A:

\[ \inf\{S(x)\colon x \in [A]\} = \inf\{S(x)\colon x \in (A)\}. \]
We introduce the following condition:

(I=) for any regular Borel set A ⊆ X,

\[ \lim_{h\downarrow 0} \lambda(h)^{-1} \ln \mu^h(A) = -\inf\{S(x)\colon x \in A\}. \tag{3.6} \]
Theorem 3.4. Conditions (0), (I) and (II) imply (I=). Moreover, if A is a regular set and min{S(x): x ∈ [A]} is attained at a unique point x₀, then

\[ \lim_{h\downarrow 0} \frac{\mu^h(A \cap \{x\colon \rho(x_0,x) < \delta\})}{\mu^h(A)} = 1 \tag{3.7} \]

for every δ > 0. Conversely, conditions (0) and (I=) imply (I) and (II).
We note that in terms of random elements of X, (3.7) can be rewritten in the form

\[ \lim_{h\downarrow 0} P^h\{\rho(x_0,\xi^h) < \delta \mid \xi^h \in A\} = 1. \tag{3.7′} \]
Proof. We use the equivalences (I) ⇔ (I′) and (II) ⇔ (II′) already established (under condition (0)). That (I′) and (II′) imply (I=) is obvious. Moreover, if A is a non-Borel regular set, then (3.6) is satisfied if μ^h(A) is replaced by the corresponding inner and outer measures. To obtain relation (3.7), we observe that

\[ \min\{S(x)\colon x \in [A],\ \rho(x,x_0) \ge \delta\} > S(x_0). \]

We obtain from (3.5) that

\[ \varlimsup_{h\downarrow 0} \lambda(h)^{-1} \ln \mu^h\{x \in A\colon \rho(x_0,x) \ge \delta\} \le \varlimsup_{h\downarrow 0} \lambda(h)^{-1} \ln \mu^h\{x \in [A]\colon \rho(x_0,x) \ge \delta\} < -S(x_0). \]

This means that μ^h{x ∈ A: ρ(x₀, x) ≥ δ} converges to zero faster than μ^h(A) ≍ exp{−λ(h)S(x₀)}.

We show that (I=) implies (I′) and (II′). For any positive δ and any set A ⊆ X we denote by A₊δ the δ-neighborhood of A and by A₋δ the set of points which lie at a distance greater than δ from the complement of A. We set s(±δ) = inf{S(x): x ∈ A±δ} (the infimum over the empty set is assumed to be equal to +∞). The function s is defined for all real values of the argument; at zero we define it to be inf{S(x): x ∈ A}. It is easy to see that s is a nonincreasing function which is continuous except possibly at a countable number of points.
If A is an open set, then s is left continuous at zero. Therefore, for an arbitrarily small γ > 0 there exists a δ > 0 such that s(−δ) < s(0) + γ and s is continuous at −δ. The latter ensures the applicability of (3.6) to A₋δ. We obtain

\[ \varliminf_{h\downarrow 0} \lambda(h)^{-1} \ln \mu^h(A) \ge \varliminf_{h\downarrow 0} \lambda(h)^{-1} \ln \mu^h(A_{-\delta}) \ge -s(0) - \gamma. \]

Since γ is arbitrary, this implies (3.4).
In the case of a closed set A we use condition (0) to establish the right continuity of s at zero, and then we repeat the same reasoning with A₊δ replacing A₋δ. □

EXAMPLE 3.4. Let A be the exterior of a ball in L²_{0T}: A = {φ ∈ L²_{0T}: ‖φ‖ > c}. This set is regular with respect to the functional S_{0T} considered in Example 3.3. To verify this, we multiply all elements of A by a number q smaller than but close to one. The open set qA = q(A) absorbs [A], and the infimum of the functional, as well as all of its values, change insignificantly (they are multiplied by q²). Since A is regular, we have

\[ \lim_{\varepsilon\to 0} \varepsilon^2 \ln P\{\|X^\varepsilon\| > c\} = \lim_{\varepsilon\to 0} \varepsilon^2 \ln P\{\|X^\varepsilon\| \ge c\} = -\min\Bigl\{\frac12 \int_0^T \Bigl|P\Bigl(\frac{d}{dt}\Bigr)\varphi_t\Bigr|^2\, dt\Bigr\}, \tag{3.8} \]
where the minimum is taken over all functions φ satisfying the boundary conditions and equal to c in norm. We consider the special case where the operator P(d/dt) with the boundary conditions is self-adjoint in L²_{0T}. Then it has a complete orthonormal system of eigenfunctions e_k(t), k = 1, 2, …, corresponding to the eigenvalues A_k, k = 1, 2, … (cf., for example, Coddington and Levinson [1]). If a function φ in L²_{0T} is representable in the form Σ_k c_k e_k, then P(d/dt)φ = Σ_{k=1}^∞ c_k A_k e_k and ‖P(d/dt)φ‖² = Σ_k c_k² A_k². This implies that the minimum in (3.8) is equal to c²/2 multiplied by A₁², the square of the eigenvalue with the smallest absolute value. Consequently,

\[ \lim_{\varepsilon\to 0} \varepsilon^2 \ln P\{\|X^\varepsilon\| > c\} = -c^2 A_1^2/2. \tag{3.9} \]

The infimum of S(φ) on the sphere of radius c in L²_{0T} is attained on the eigenfunctions, multiplied by c, corresponding to the eigenvalue A₁. If this eigenvalue is simple (and −A₁ is not an eigenvalue), then there are only two such functions: ce₁ and −ce₁. Then for any δ > 0 we have

\[ \lim_{\varepsilon\downarrow 0} P\{\|X^\varepsilon - ce_1\| < \delta \ \text{or} \ \|X^\varepsilon + ce_1\| < \delta \mid \|X^\varepsilon\| > c\} = 1. \]
The same is true for the conditional probability under the condition ‖X^ε‖ ≥ c.
A concrete example: P(d/dt)φ = d²φ/dt², with boundary conditions φ₀ = φ_T = 0. This operator is self-adjoint. The equation φ″ = Aφ, φ₀ = φ_T = 0, for the eigenfunctions has the solutions A = A_k = −k²π²/T², e_k(t) = √(2/T) sin(kπt/T). The eigenvalues are simple. We have

\[ \lim_{\varepsilon\to 0} \varepsilon^2 \ln P\{\|X^\varepsilon\| > c\} = -\frac{c^2\pi^4}{2T^4}, \]

and for every δ > 0 the conditional probability that X^ε is in the δ-neighborhood of one of the functions ±c√(2/T) sin(πt/T), under the condition that ‖X^ε‖ is greater (not smaller) than c, converges to 1 as ε → 0. If our differential operator with certain boundary conditions is not self-adjoint, then A₁² in (3.9) must be replaced by the smallest eigenvalue of the product of the operator (with the boundary conditions) with its adjoint.
EXAMPLE 3.5. Let b(x) be a continuous function from R^r into R^r. On the space C_{0T} of continuous functions on the interval [0, T] with values in R^r, consider the functional S_{0T}(φ) equal to

\[ \frac12 \int_0^T |\dot\varphi_s - b(\varphi_s)|^2\, ds \]

if φ is absolutely continuous and φ₀ = x₀, and equal to +∞ on the rest of C_{0T}. Let an open set D ∋ x₀, D ⊆ R^r, be such that there exist interior points of the complement of D arbitrarily close to every point of the boundary ∂D of D (i.e., ∂D = ∂[D]). Let us denote by A_D the open set of continuous functions φ_t, 0 ≤ t ≤ T, such that φ_t ∈ D for all t ∈ [0, T]. We prove that Ā_D = C_{0T}\A_D is a regular set with respect to S_{0T}.

Let the minimum of S_{0T} on the closed set Ā_D be attained at the function φ_t, 0 ≤ t ≤ T, φ₀ = x₀ (Figure 2). This function certainly reaches the boundary at some point t₀ ≠ 0: φ_{t₀} ∈ ∂D. The minimum is finite, because there exist arbitrarily smooth functions issued from x₀ and leaving D within the time interval [0, T]; this implies that the function φ_t is absolutely continuous.
Figure 2.
For any δ > 0 there exists an interior point x^δ of R^r\D in the δ-neighborhood of the point φ_{t₀}. We put

\[ \varphi_t^\delta = \varphi_t + \frac{t}{t_0}(x^\delta - \varphi_{t_0}), \qquad 0 \le t \le T; \]

this function belongs to the interior of Ā_D. We prove that S_{0T}(φ^δ) → S_{0T}(φ) as δ ↓ 0. This implies the regularity of Ā_D. We have

\[ S_{0T}(\varphi^\delta) - S_{0T}(\varphi) = \frac12 \int_0^T \bigl[|\dot\varphi_t^\delta - b(\varphi_t^\delta)|^2 - |\dot\varphi_t - b(\varphi_t)|^2\bigr]\, dt \]
\[ = \int_0^T \bigl(\dot\varphi_t^\delta - b(\varphi_t^\delta) - \dot\varphi_t + b(\varphi_t),\ \dot\varphi_t - b(\varphi_t)\bigr)\, dt + \frac12 \int_0^T \bigl|\dot\varphi_t^\delta - b(\varphi_t^\delta) - \dot\varphi_t + b(\varphi_t)\bigr|^2\, dt. \]

On the other hand, φ̇_t^δ − b(φ_t^δ) − φ̇_t + b(φ_t) = t₀⁻¹(x^δ − φ_{t₀}) + b(φ_t) − b(φ_t^δ) → 0 uniformly in t ∈ [0, T] as δ ↓ 0. Consequently, the scalar product of this function with φ̇_t − b(φ_t), and its scalar square in L²_{0T}, also converge to zero.
In Ch. 4 we prove that ε⁻²S_{0T}(φ) is the action functional for the family of diffusion processes X_t^ε described by the equation Ẋ_t^ε = b(X_t^ε) + εẇ_t, X_0^ε = x₀ (provided that b satisfies a Lipschitz condition). Then as ε → 0 we have

\[ P\{X_t^\varepsilon \ \text{exits from} \ D \ \text{for some} \ t \in [0,T]\} \asymp \exp\bigl\{-\varepsilon^{-2}\min\{S_{0T}(\varphi)\colon \varphi \in \bar A_D\}\bigr\}. \]

If this minimum is attained at a unique function, then the trajectories of X^ε going out of D lie near this function with an overwhelming probability for small ε.
We note that if D does not satisfy the condition ∂D = ∂[D], then the corresponding set Ā_D may not be regular. If the boundary ∂D is smooth, then we can prove that Ā_D is regular with respect to the same functional.

The last remark is concerned with the notion of regularity: if the action function is continuous (which was not the case in several examples related to function spaces), then a sufficient condition for the regularity of the set A is the coincidence of ∂(A) with ∂[A].

Here is another form of the description of rough asymptotics (the integral description):

(III) if F(x) is a bounded continuous function on X, then

\[ \lim_{h\downarrow 0} \lambda(h)^{-1} \ln \int_X \exp\{\lambda(h)F(x)\}\, \mu^h(dx) = \max_x\,\{F(x) - S(x)\}. \tag{3.10} \]
This condition (under condition (0)) is also equivalent to conditions (I′) and (II′) (or (I) and (II)). A deduction of (III) (and even more complicated integral conditions) from (I′) and (II′) is contained in Varadhan's article [1]. We mention still another general assertion.

Theorem 3.5. The value of the normalized action function at an element x ∈ X can be expressed by either of the following two limits:
\[ S(x) = -\lim_{\delta\downarrow 0}\varliminf_{h\downarrow 0} \lambda(h)^{-1} \ln \mu^h\{y\colon \rho(x,y) < \delta\} = -\lim_{\delta\downarrow 0}\varlimsup_{h\downarrow 0} \lambda(h)^{-1} \ln \mu^h\{y\colon \rho(x,y) < \delta\}. \tag{3.11} \]
Proof. From condition (I) we deduce that lim_{δ↓0} lim inf_{h↓0} λ(h)⁻¹ ln μ^h{y: ρ(x, y) < δ} ≥ −S(x), and from conditions (II) and (0) that lim_{δ↓0} lim sup_{h↓0} λ(h)⁻¹ ln μ^h{y: ρ(x, y) < δ} ≤ −S(x). We prove the second assertion. For any γ > 0, the compact set Φ(S(x) − γ) does not contain x. We choose a positive δ smaller than one half of the distance of x from this compact set. We have μ^h{y: ρ(x, y) < δ} ≤ μ^h{y: ρ(y, Φ(S(x) − γ)) ≥ δ}. By (3.2), this does not exceed exp{−λ(h)(S(x) − 2γ)} for h sufficiently small. This gives us the necessary assertion. □
We note that we cannot replace conditions (I), (II) (under condition (0)) by relations (3.11): relations (3.11) do not imply (II), as the following example shows. X = R¹, μ^h is a mixture, with equal weights, of the normal distribution with mean 0 and variance h and the normal distribution with mean 0 and variance exp{e^{1/h}}, and λ(h) = h⁻¹. Here lim_{δ↓0} lim_{h↓0} λ(h)⁻¹ ln μ^h{y: ρ(x, y) < δ} = −x²/2 for all x; the function x²/2 satisfies (0), but (II) is not satisfied.

We formulate, in a general language, the methods used for the verification of conditions (I) and (II) in §2 (they will be used in §4 and in §§1 and 2 of Ch. 5 as well). To deduce (I), we select a function g_x(y) on X so that the family of measures μ̃^h(dy) = exp{λ(h)g_x(y)}μ^h(dy) converges to a measure concentrated at x; then we use the fact that
\[ \mu^h\{y\colon \rho(x,y) < \delta\} = \int_{\{y\colon \rho(x,y)<\delta\}} \exp\{-\lambda(h)g_x(y)\}\, \tilde\mu^h(dy). \]

On a part of the domain of integration with a sufficiently large μ̃^h-measure we estimate g_x(y) from above: g_x(y) ≤ g_x(x) + γ (we use Chebyshev's inequality), and we take g_x(x) as S(x). This method was used by Cramér [1] to obtain precise, rather than rough, results in the study of distributions of sums of independent random variables.
We obtained condition (II) in the following way: we chose a function χ(x) whose values belong to {y: S(y) < ∞} and a set A such that ρ(χ(x), x) < δ for x ∈ A. Further, we used the inequality

\[ \mu^h\{x\colon \rho(x,\Phi(s)) \ge \delta\} \le \mu^h(X\setminus A) + \mu^h\{x \in A\colon \chi(x) \notin \Phi(s)\}. \]

We estimated the first term by means of Kolmogorov's exponential inequality, and the second term by means of Chebyshev's exponential inequality:

\[ \mu^h\{x \in A\colon \chi(x) \notin \Phi(s)\} = \mu^h\{x \in A\colon S(\chi(x)) > s\} \le \int_A \exp\{(1-\varkappa)\lambda(h)S(\chi(x))\}\, \mu^h(dx)\cdot \exp\{-(1-\varkappa)\lambda(h)s\}. \]

Of course, the problem of estimating the integral over A is solved separately in each case.
Finally, we discuss the peculiarities arising in considering families of measures depending, besides the main parameter h, on another parameter x assuming values in a space X (x will have the meaning of the point from which the trajectory of the perturbed dynamical system is issued; we restrict ourselves to the case of perturbations which are homogeneous in time). First, we shall consider not just one space of functions and one metric; instead, for any T > 0, we shall consider a space of functions defined on the interval [0, T] and a metric ρ_{0T}. Correspondingly, the normalized action functional will depend on the interval: S = S_{0T}. (In the case of the perturbations considered in Chs. 4, 5, and 7, this functional turns out to have the form S_{0T}(φ) = ∫₀ᵀ L(φ_t, φ̇_t) dt, and it is convenient to define it analogously for all intervals [T₁, T₂], −∞ < T₁ < T₂ < +∞, on the real line.) Moreover, to every point x ∈ X and every value of the parameter h there will correspond its own measure in the space of functions on the interval [0, T]. We can follow two routes: we either consider a whole family of functionals depending on the parameter x (the functional corresponding to x is assumed to be equal to +∞ for all functions φ for which φ₀ ≠ x) or introduce a new definition of the action functional suitable for the situation. We follow the second route.

Let {Ω^h, ℱ^h, P_x^h} be a family of probability spaces depending on the parameter h > 0 and on x running through the values of a metric space X, and let X_t^h, t ≥ 0, be a random process on this space with values in X. Let ρ_{0T} be a metric in the space of functions on [0, T] with values in X and let S_{0T}(φ) be a functional on this space. We say that λ(h)S_{0T}(φ) is the action functional for the family of the random processes (X^h, P_x^h), uniformly in a class
𝒜 of subsets of X if we have:

(0_u) the functional S_{0T} is lower semicontinuous, and the union of the sets Φ_x(s) over x ∈ K is compact for any compact set K ⊆ X, where Φ_x(s) is the set of functions φ on [0, T] such that φ₀ = x and S_{0T}(φ) ≤ s;

(I_u) for any δ > 0, any γ > 0, any s₀ > 0 and any A ∈ 𝒜 there exists an h₀ > 0 such that

\[ P_x^h\{\rho_{0T}(X^h,\varphi) < \delta\} \ge \exp\{-\lambda(h)[S_{0T}(\varphi)+\gamma]\} \tag{3.12} \]

for all h ≤ h₀, all x ∈ A and all φ ∈ Φ_x(s₀);

(II_u) for any δ > 0, any γ > 0, any s₀ > 0 and any A ∈ 𝒜 there exists an h₀ > 0 such that

\[ P_x^h\{\rho_{0T}(X^h,\Phi_x(s)) \ge \delta\} \le \exp\{-\lambda(h)(s-\gamma)\} \tag{3.13} \]

for all h ≤ h₀, all x ∈ A and all s ≤ s₀.
§4. Action Functional for Gaussian Random Processes and Fields

Let X_t be a Gaussian random process defined for 0 ≤ t ≤ T, with values in R^r, having mean zero and correlation matrix a(s, t) = (a^{ij}(s, t)), a^{ij}(s, t) = M X_s^i X_t^j. The functions a^{ij}(s, t) will be assumed to be square integrable on [0, T] × [0, T]. We denote by A the correlation operator of X_t, acting in the Hilbert space L²_{0T} of functions on [0, T] with values in R^r:

\[ Af_t = \int_0^T a(s,t) f_s\, ds. \]

The scalar product in this space is (f, g) = ∫₀ᵀ Σ_{i=1}^r f^i(s)g^i(s) ds, and the norm is ‖f‖ = (f, f)^{1/2}. We retain the notation (f, g) for the integral ∫₀ᵀ (f_s, g_s) ds in the case where the functions do not belong to L²_{0T} but the integral is defined in some sense. For example, if ẇ_t is an r-dimensional white noise process, then (f, ẇ) = ∫₀ᵀ f(s) dw_s.

We assume that A has finite trace. As is known (Gikhman and Skorokhod [2], Ch. V, §5), this implies that the trajectories of X_t belong to L²_{0T}. We denote by A^{1/2} the nonnegative symmetric square root of A. The operator A^{1/2}, as well as A, is an integral operator with a square integrable kernel. In order to reconstruct this kernel, we consider the eigenfunctions e₁(t), e₂(t), … of A and the corresponding eigenvalues λ₁, λ₂, …. Since a(s, t) is a correlation function, A is nonnegative definite, i.e., λ_k ≥ 0. The operator A has finite trace Σ_k λ_k < ∞. We put G(s, t) = Σ_k λ_k^{1/2} e_k(s)e_k(t); the series is convergent in L²([0, T] × [0, T]).
The function G(s, t) is precisely the kernel of the integral operator A^{1/2}. In order to see this, it is sufficient to note that the eigenfunctions of the operator with kernel G(s, t) coincide with the eigenfunctions of A and its eigenvalues are the square roots of the corresponding eigenvalues of A.
The random process X_t admits an integral representation in terms of the kernel of A^{1/2}:

\[ X_t = \int_0^T G(s,t)\, dw_s, \]

where w_t is an r-dimensional Wiener process. We shall write this equality in the form X = A^{1/2}ẇ. In order to prove it, it is sufficient to note that X̃ = A^{1/2}ẇ is a Gaussian process with MX̃_t = 0 and

\[ M\tilde X_s \tilde X_t = \int_0^T G(s,u)G(t,u)\, du = a(s,t). \]
The operators A and A 112 nullify some subspace L0 s L02 T. This subspace
is nontrivial in general, and therefore, for a definition of the inverses A-' and A-1/2, additional agreements are needed. We define the inverses by = tfi2 if Aifi1 = cp, A'12tli2 = (p and 'P,, 2 are = O1, A setting A orthogonal to L0. Consequently, A-' and A- 112 are defined uniquely on the ranges of A and A 112, respectively. On LoT we consider the functional S(rp) equal to IIA"2rPII2; if A-112(p z which A-' is defined, is not defined, we set S((p) = + o. For functions cp for Z(A-'cp, (p), i.e., S((p) is an extension of the functional we have S(cp) =
'(A-'ap, (p)
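In discrete form, the representation X = A^{1/2}ẇ can be verified directly: approximate the kernel a(s, t) by its matrix on a grid, take the nonnegative symmetric square root through the spectral decomposition, and apply it to discretized white noise; the empirical covariance of the resulting samples recovers a(s, t). A numerical sketch (the exponential kernel and all grid parameters are illustrative choices of ours, not taken from the text):

```python
import numpy as np

# Grid approximation of the interval [0, T]
T, n = 1.0, 60
t = np.linspace(0.0, T, n)
dt = T / (n - 1)

# A sample correlation function a(s, t); any square integrable
# nonnegative definite kernel would do here.
a = np.exp(-np.abs(t[:, None] - t[None, :]))

# Matrix of the correlation operator A: (Af)_t = int a(s,t) f_s ds
A = a * dt

# Nonnegative symmetric square root A^{1/2} via the spectral decomposition;
# its kernel is G(s,t) = sum_k lam_k^{1/2} e_k(s) e_k(t).
lam, e = np.linalg.eigh(A)
lam = np.clip(lam, 0.0, None)           # remove tiny negative round-off
sqrtA = (e * np.sqrt(lam)) @ e.T

# X = A^{1/2} w-dot: apply the square root to discretized white noise.
rng = np.random.default_rng(0)
noise = rng.standard_normal((20000, n)) / np.sqrt(dt)   # white noise on the grid
X = noise @ sqrtA

# The empirical covariance of the samples should recover a(s, t).
emp_cov = X.T @ X / X.shape[0]
err = np.max(np.abs(emp_cov - a))
```

Since Cov(sqrtA·ẇ) = sqrtA·(I/dt)·sqrtA = A/dt = a on the grid, the error `err` is pure Monte Carlo noise.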
Theorem 4.1. The functional S(φ) is a normalized action functional for the Gaussian process X^ε_t = εX_t in L^2_{0T} as ε → 0. The normalizing coefficient is ε⁻².
Proof. Let G_N(s, t) be a symmetric positive definite continuously differentiable kernel on the square [0, T] × [0, T] such that

∫₀ᵀ∫₀ᵀ [G(s, t) − G_N(s, t)]² ds dt < 1/N,

where G(s, t) is the kernel of A^{1/2}. We denote by G_N the operator with kernel G_N(s, t). First we verify the inequality

P{‖εX − φ‖ < δ} ≥ exp{−ε⁻²(S(φ) + γ)}   (4.1)
for any δ, γ > 0 and all sufficiently small positive ε. If S(φ) = +∞, then (4.1) is obvious. Let S(φ) < ∞. There exists a ψ ∈ L^2_{0T} orthogonal to L₀ and such that A^{1/2}ψ = φ. We put φ_N = G_Nψ and X_N = G_Nẇ. Choose N₀ so large that ‖φ_N − φ‖ ≤ ‖ψ‖(∫₀ᵀ∫₀ᵀ [G(s, t) − G_N(s, t)]² ds dt)^{1/2} < δ/3 for N ≥ N₀. For such N we have

{‖εX − φ‖ < δ} ⊇ {‖εX − εX_N‖ < δ/3, ‖εX_N − φ_N‖ < δ/3}.

From this we obtain

P{‖εX − φ‖ < δ} ≥ P{‖εX_N − φ_N‖ < δ/3} − P{‖εX − εX_N‖ ≥ δ/3}.   (4.2)
Since G_N(s, t) is a continuously differentiable kernel, in estimating the first term on the right side we can use the result of Example 3.2:

P{‖εX_N − φ_N‖ < δ/3} ≥ exp{−ε⁻²(S(φ) + γ)}   (4.3)

for ε smaller than some ε₁. Here we have used the fact that

G_N⁻¹φ_N = ψ = A^{-1/2}φ,   S(φ) = ½‖ψ‖² = ½‖G_N⁻¹φ_N‖².

Moreover, using the exponential Chebyshev inequality, we obtain

P{‖εX − εX_N‖ ≥ δ/3} = P{∫₀ᵀ [∫₀ᵀ (G(s, t) − G_N(s, t)) dw_s]² dt ≥ δ²/9ε²}
   ≤ e^{−ε⁻²a} M exp{(9a/δ²) ∫₀ᵀ [∫₀ᵀ (G(s, t) − G_N(s, t)) dw_s]² dt}   (4.4)

for an arbitrary a > 0. For sufficiently large N, the mathematical expectation on the right side is finite. To see this, we introduce the eigenfunctions φ_k and corresponding eigenvalues μ_k, k = 1, 2, ..., of the symmetric square integrable kernel Γ_N(s, t) = G(s, t) − G_N(s, t). It is easy to verify the following equalities:

Γ_N(s, t) = Σ_k μ_k φ_k(s)φ_k(t),
Σ_k μ_k² = ∫₀ᵀ∫₀ᵀ Γ_N(s, t)² ds dt,
∫₀ᵀ Γ_N(s, t) dw_s = Σ_k μ_k φ_k(t) ∫₀ᵀ φ_k(s) dw_s,
∫₀ᵀ [∫₀ᵀ Γ_N(s, t) dw_s]² dt = Σ_k μ_k² [∫₀ᵀ φ_k(s) dw_s]².
The random variables ξ_k = ∫₀ᵀ φ_k(s) dw_s have normal distribution with mean zero and variance one, and they are independent for distinct k. Taking account of these properties, we obtain

M exp{(9a/δ²) ∫₀ᵀ [∫₀ᵀ Γ_N(s, t) dw_s]² dt} = M exp{(9a/δ²) Σ_k μ_k² ξ_k²} = ∏_{k=1}^∞ (1 − (18a/δ²)μ_k²)^{−1/2}.   (4.5)

The last equality holds if (18a/δ²)μ_k² < 1 for all k. Since

max_k μ_k² ≤ Σ_k μ_k² = ∫₀ᵀ∫₀ᵀ Γ_N(s, t)² ds dt → 0

as N → ∞, relation (4.5) is satisfied for all N larger than some N₁ = N₁(a, δ). The convergence of the series Σ_k μ_k² implies the convergence of the infinite product in (4.5). Consequently, if N > N₁, then the mathematical expectation on the right side of (4.4) is finite, and for a = S(φ) + γ and ε sufficiently small we obtain

P{‖εX − εX_N‖ ≥ δ/3} ≤ const · exp{−ε⁻²(S(φ) + γ)}.   (4.6)
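The product formula (4.5) rests on the one-dimensional identity M exp{cξ²} = (1 − 2c)^{−1/2} for a standard normal ξ and 2c < 1; here c plays the role of (9a/δ²)μ_k². A quick Monte Carlo confirmation of this identity (the value of c is an arbitrary choice of ours):

```python
import numpy as np

rng = np.random.default_rng(1)
xi = rng.standard_normal(400000)        # the xi_k of the proof: N(0, 1) variables

c = 0.2                                 # stands for (9a/delta^2) mu_k^2; 2c < 1
mc = np.mean(np.exp(c * xi ** 2))       # Monte Carlo estimate of M exp{c xi^2}
exact = (1.0 - 2.0 * c) ** -0.5         # the closed form behind (4.5)
rel_err = abs(mc - exact) / exact
```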
Combining estimates (4.2), (4.3), and (4.6), we obtain (4.1). Now we prove that for any s > 0, γ > 0, δ > 0 there exists an ε₀ such that

P{ρ(εX, Φ(s)) ≥ δ} ≤ exp{−ε⁻²(s − γ)}   (4.7)

for ε ≤ ε₀, where Φ(s) = {φ ∈ L^2_{0T}: S(φ) ≤ s}. Along with the image Φ(s) of the ball of radius (2s)^{1/2} in L^2_{0T} under the mapping A^{1/2}, we consider the image Φ_N(s) of the same ball under G_N. Let N > 18s/δ². Taking account of the definition of G_N(s, t), we obtain
sup_{ψ: ‖ψ‖ ≤ (2s)^{1/2}} ‖Gψ − G_Nψ‖² = sup_{ψ: ‖ψ‖ ≤ (2s)^{1/2}} ∫₀ᵀ [∫₀ᵀ (G(s, t) − G_N(s, t))ψ_s ds]² dt
   ≤ sup_{ψ: ‖ψ‖ ≤ (2s)^{1/2}} ∫₀ᵀ∫₀ᵀ (G(s, t) − G_N(s, t))² ds dt · ‖ψ‖² < δ²/9.

From this we obtain

P{ρ(εX, Φ(s)) ≥ δ} ≤ P{‖εX − εX_N‖ ≥ δ/3} + P{ρ(εX_N, Φ_N(s)) ≥ δ/3}   (4.8)
for N > 18s/δ². By virtue of (4.4), the first term on the right side can be made smaller than exp{−ε⁻²(s − γ/2)} for N sufficiently large. The estimate

P{ρ(εX_N, Φ_N(s)) ≥ δ/3} ≤ exp{−ε⁻²(s − γ/2)}

of the second term follows from Example 3.2. Relying on these estimates, we can derive (4.7) from (4.8). Theorem 4.1 is proved.

Remark. As was established in the preceding section, estimates (4.1) and (4.7) are satisfied for sufficiently small ε uniformly in all functions φ with S(φ) ≤ const and in all s ≤ s₀ < ∞, respectively.
In the above proof, taken from Freidlin [1], the main role was played by the representation X = A^{1/2}ẇ of the Gaussian process X_t. This representation enabled us to reduce the calculation of the action functional for the family of processes εX_t in L^2_{0T} to the estimates, obtained in §2, of the corresponding probabilities for the Wiener process (in the uniform metric). We are now going to formulate and prove a theorem very close to Theorem 4.1; nevertheless, we shall not rely on the estimates for the Wiener process in the uniform metric but rather reproduce the arguments of §3.2 in a Hilbert space setting (cf. Wentzell [4]). This enables us to write out an expression for the action functional for Gaussian random processes and fields in various Hilbert norms. Of course, a Gaussian random field X_z can also be represented in the form ∫ G(s, z) dw_s, and we could use the arguments of the proof of Theorem 4.1. Nevertheless, this representation is not as natural for a random field as it is for a random process.

Let H be a real Hilbert space. We preserve the notation (·,·) and ‖·‖ for
the scalar product and norm of H. In H we consider a Gaussian random element X with mean 0 and correlation functional B(f, g) = M(X, f)(X, g). This bilinear functional can be represented in the form B(f, g) = (Af, g), where A is a self-adjoint linear operator which turns out automatically to be nonnegative definite and completely continuous with finite trace (Gikhman and Skorokhod [2], Ch. V, §5). As earlier, we write S(φ) = ½‖A^{-1/2}φ‖²; if A^{-1/2}φ is not defined, we set S(φ) = +∞. In order to make A^{-1/2} single-valued, as A^{-1/2}φ we again choose the element ψ which is orthogonal to the null space of A and for which A^{1/2}ψ = φ.

Theorem 4.2. Let s, δ, and γ be arbitrary positive numbers. We have
P{IIEX - (p 11 < 6) > exp{-e-2(S((p) + y)}
(4.9)
fore > 0 sufficiently small. Inequality (4.9) is satisfied uniformly for all cp with S((p) < s < oo. If fi(s) = {cp e H: S((p) < s}, then
P{p(eX, (D(s)) >- 6) S exp{-e-2(s - y)}
(4.10)
for ε > 0 sufficiently small. Inequality (4.10) is satisfied uniformly in all s ≤ s₀ < ∞.
Proof. Let e_i, i = 1, 2, ..., be orthonormal eigenfunctions of A and let λ_i be the corresponding eigenvalues. We denote by X_i and φ_i the coordinates of X and φ in the basis e₁, e₂, .... Here X_i = (X, e_i), i = 1, 2, ..., are independent Gaussian random variables with mean zero and variance MX_i² = M(X, e_i)² = (Ae_i, e_i) = λ_i. The functional S(φ) can be represented in the following way: S(φ) = ½‖A^{-1/2}φ‖² = ½ Σ_i (φ_i²/λ_i). We assume that S(φ) < ∞. Then the joint distribution of the Gaussian random variables X_i − ε⁻¹φ_i, i = 1, 2, ..., has a density p with respect to the distribution of the variables X_i, i = 1, 2, ...:

p(x₁, ..., x_n, ...) = exp{−ε⁻¹ Σ_{i=1}^∞ λ_i⁻¹φ_i x_i − ε⁻²S(φ)}.

Therefore,

P{‖εX − φ‖ < δ} = P{Σ_i (X_i − ε⁻¹φ_i)² < (δ/ε)²}
   = M{‖X‖² < (δ/ε)²; p(X₁, X₂, ...)}
   = M{‖X‖² < (δ/ε)²; exp{−ε⁻¹ Σ_{i=1}^∞ λ_i⁻¹φ_i X_i − ε⁻²S(φ)}}.   (4.11)

Using Chebyshev's inequality, we find that

P{‖X‖² < (δ/ε)²} ≥ 1 − ε²δ⁻²M‖X‖² = 1 − ε²δ⁻² Σ_{i=1}^∞ λ_i ≥ 3/4

for ε ≤ (δ/2)(Σ_{i=1}^∞ λ_i)^{−1/2}, and

P{|Σ_i λ_i⁻¹φ_i X_i| < K} ≥ 1 − K⁻²M|Σ_i λ_i⁻¹φ_i X_i|² = 1 − K⁻² Σ_{i=1}^∞ λ_i⁻¹φ_i² = 1 − 2K⁻²S(φ) ≥ 3/4

for K ≥ (8S(φ))^{1/2}. It follows from these inequalities that with probability not smaller than 1/2 the random variable under the sign of mathematical expectation in (4.11) exceeds exp{−ε⁻²S(φ) − ε⁻¹K}. This implies inequality (4.9).
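In finitely many coordinates, identity (4.11) is the usual change of variables between a shifted and an unshifted Gaussian vector with independent N(0, λ_i) components, and it can be confirmed by simulation. A sketch (the dimension and the values of λ_i, φ_i, ε, δ below are arbitrary illustrative choices of ours):

```python
import numpy as np

rng = np.random.default_rng(2)
lam = np.array([1.0, 0.5, 0.25])        # eigenvalues lambda_i of A
phi = np.array([0.3, -0.2, 0.1])        # coordinates phi_i of phi
eps, delta = 0.8, 1.0
S = 0.5 * np.sum(phi ** 2 / lam)        # S(phi) = (1/2) sum phi_i^2 / lambda_i

n = 400000
X = rng.standard_normal((n, 3)) * np.sqrt(lam)

# Left side of (4.11): the probability P{||eps X - phi|| < delta} directly.
lhs = np.mean(np.sum((eps * X - phi) ** 2, axis=1) < delta ** 2)

# Right side: the same probability through the density p of X_i - eps^{-1} phi_i
# with respect to the distribution of X_i.
p = np.exp(-np.sum(phi * X / lam, axis=1) / eps - S / eps ** 2)
rhs = np.mean(np.where(np.sum(X ** 2, axis=1) < (delta / eps) ** 2, p, 0.0))
```

Both estimators target the same probability, so `lhs` and `rhs` agree up to Monte Carlo error.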
Now we prove the second assertion of the theorem. We denote by X̂ the random vector with coordinates (X₁, X₂, ..., X_{i₀}, 0, 0, ...). The choice of the index i₀ will be specified later. It is easy to verify that

P{ρ(εX, Φ(s)) ≥ δ} ≤ P{εX̂ ∉ Φ(s)} + P{ρ(X, X̂) ≥ δ/ε}.   (4.12)

The first probability is equal to

P{S(εX̂) > s} = P{S(X̂) > sε⁻²} = P{Σ_{i=1}^{i₀} λ_i⁻¹X_i² > 2sε⁻²}.   (4.13)

The random variable ξ_{i₀} = Σ_{i=1}^{i₀} λ_i⁻¹X_i² is the sum of squares of i₀ independent normal random variables with parameters (0, 1), and consequently has a χ²-distribution with i₀ degrees of freedom. Using the expression for the density of a χ²-distribution, we obtain

P{εX̂ ∉ Φ(s)} = P{Σ_{i=1}^{i₀} λ_i⁻¹X_i² > 2sε⁻²} = ∫_{2sε⁻²}^∞ [Γ(i₀/2)2^{i₀/2}]⁻¹ x^{i₀/2−1} e^{−x/2} dx ≤ const · ε^{−i₀} exp{−ε⁻²s}   (4.14)

for ε > 0 sufficiently small. The second probability in (4.12) can be estimated by means of the exponential Chebyshev inequality:

P{ρ(X, X̂) ≥ δ/ε} ≤ exp{−c(δ/ε)²} M exp{c·ρ(X, X̂)²}.   (4.15)

The mathematical expectation on the right side is finite for sufficiently large i₀. This can be proved in the same way as the finiteness of the mathematical expectation in (4.4); we have to take account of the convergence of the series Σ_{i=1}^∞ λ_i. Substituting c = 2sδ⁻² in (4.15), we have

P{ρ(X, X̂) ≥ δ/ε} ≤ const · exp{−2sε⁻²}   (4.16)

for sufficiently large i₀. Combining formulas (4.13), (4.14), and (4.16), we obtain the last assertion of the theorem.
This theorem enables us to calculate the action functional for Gaussian random processes and fields in arbitrary Hilbert norms. It is only required that the realizations of the process belong to the corresponding Hilbert space. In many problems, for example, in problems concerning the crossing of a level by a random process or field, it is desirable to have estimates in the uniform norm. We can use imbedding theorems to obtain such estimates.
Let D be a bounded domain in R^r with smooth boundary. Let us denote by W_2^l the function space on D obtained from the space of infinitely differentiable functions in D by completing it in the norm

‖u‖_{W_2^l} = (Σ_{|q| ≤ l} ∫_D |u^{(q)}(x)|² dx)^{1/2},

where q = (q₁, ..., q_r), |q| = q₁ + ··· + q_r, and u^{(q)} = ∂^{|q|}u/∂x₁^{q₁}···∂x_r^{q_r}. The space W_2^l with this norm is a separable Hilbert space (Sobolev [1]). Roughly speaking, W_2^l consists of functions having square integrable derivatives of order l. In order that the realizations of a Gaussian random field X_z, z ∈ D ⊂ R^r, with mean zero and correlation function a(u, v) = MX_u X_v belong to W_2^l(D), it is sufficient (cf., for example, Gikhman and Skorokhod [1]) that the correlation function have continuous derivatives up to order 2l inclusive. Let m be a multiindex (m₁, ..., m_r), m_i ≥ 0, and let |m| = m₁ + m₂ + ··· + m_r < l − r/2. For all x ∈ D we have the estimate

|∂^{|m|}u(x)/∂x₁^{m₁}···∂x_r^{m_r}| ≤ const · ‖u‖_{W_2^l}.

This inequality comprises the content of an imbedding theorem (cf. Sobolev [1] or Ladyzhenskaya and Ural'tseva [1], Theorem 2.1). It follows easily from this that if the correlation function of a random field has continuous derivatives of order 2l in D ∪ ∂D, then estimates (4.9) and (4.10) are satisfied in the metric of C_D^m for |m| < l − r/2. Moreover, the functional S(φ) is defined by the equality S(φ) = ½‖A^{-1/2}φ‖², where A is the correlation operator. As above, A^{-1/2} is made single-valued by requiring the orthogonality of A^{-1/2}φ to the null space of A. If A^{-1/2}φ is not defined, then S(φ) = +∞. In particular, for 2l > r, estimates (4.9) and (4.10) hold in the norm of C_D. If we apply sharper imbedding theorems to obtain estimates in C_D, we can reduce the requirements on the smoothness of the correlation function.

Lemma 4.1.
(a) The set Φ(s) = {φ ∈ H: S(φ) ≤ s}, s < ∞, is compact in H;
(b) the functional S(φ) is lower semicontinuous, i.e., if ‖φ_n − φ‖ → 0 as n → ∞, then S(φ) ≤ lim inf_{n→∞} S(φ_n).
Proof. First we prove (b). We consider the Hilbert space H₁ ⊆ H obtained by completing the domain D_{A^{-1/2}} in the norm ‖f‖₁ = ‖A^{-1/2}f‖. This expression indeed defines a norm, since A^{-1/2} is linear and does not vanish on nonzero elements. It is sufficient to prove our assertion for a sequence φ_n such that lim_{n→∞} S(φ_n) exists and is finite. For such a sequence, ‖φ_n‖₁ ≤ const < ∞ for all n. Since the set {φ_n} is bounded in H₁, it is weakly compact, i.e., there exists an element φ̄ ∈ H₁ such that some subsequence φ_{n_i} converges weakly to φ̄ in H₁: (φ_{n_i}, f)₁ = (A^{-1/2}φ_{n_i}, A^{-1/2}f) → (φ̄, f)₁ as i → ∞ for any f ∈ H₁. This implies that φ_{n_i} converges weakly to φ̄ in H as well. Indeed, let g ∈ H. We have Ag ∈ H₁ and

(φ_{n_i} − φ̄, g) = (A^{-1/2}(φ_{n_i} − φ̄), A^{-1/2}Ag) = (φ_{n_i} − φ̄, Ag)₁ → 0

as i → ∞. By virtue of the uniqueness of weak limits in H we obtain that φ̄ = φ. Now we obtain assertion (b) of the lemma from the lower semicontinuity of the norm in the weak topology. Assertion (a) of the lemma follows from (b) and from the fact that A, and thus A^{1/2}, are completely continuous.
Remark. It follows from the proof that S(φ) is lower semicontinuous in the topology of weak convergence in H as well. We indicate some more properties of S(φ), clarifying its probabilistic meaning.

Theorem 4.3. Let φ ∈ D_{A^{-1/2}}. We have

lim_{δ↓0} lim_{ε↓0} ε² ln P{‖εX − φ‖ < δ} = −S(φ);   (4.17)

lim_{δ↓0} P{‖εX − φ‖ < δ} / P{‖εX‖ < δ} = exp{−ε⁻²S(φ)}.   (4.18)
Proof. The first assertion was proved in a general setting in §3. We prove the second assertion only for φ ∈ D_{A⁻¹}. (Concerning a proof for φ ∈ D_{A^{-1/2}}, cf. Sytaya [1]; a sharpening of the first assertion of the theorem can also be found there.) Using the notation of Theorem 4.2, from equality (4.11) we obtain

P{‖εX − φ‖ < δ} = M{‖εX‖ < δ; exp{−ε⁻¹ Σ_{i=1}^∞ λ_i⁻¹φ_i X_i − ε⁻²S(φ)}}
   = exp{−ε⁻²S(φ)} M{‖εX‖ < δ; exp{−ε⁻¹ Σ_{i=1}^∞ λ_i⁻¹φ_i X_i}}.   (4.19)

If φ ∈ D_{A⁻¹}, then |Σ_i λ_i⁻¹φ_i X_i| = |(A⁻¹φ, X)| ≤ ‖A⁻¹φ‖·‖X‖. From this we conclude that

lim_{δ↓0} M{‖εX‖ < δ; exp{−ε⁻¹ Σ_i λ_i⁻¹φ_i X_i}} / P{‖εX‖ < δ} = 1.

The last equality and (4.19) imply the second assertion of the theorem.

Remark. Relation (4.18) is also interesting in problems not containing a small parameter. It can be interpreted as an assertion stating that for a Gaussian random field (process) X, the functional exp{−S(φ)} plays the role of a nonnormalized probability density with respect to the "uniform distribution" in the Hilbert space H.

Example 4.1. Let D be a bounded domain in R^r with smooth boundary ∂D. Let X_z be a Gaussian random field defined for z ∈ D ∪ ∂D and having mean zero and correlation function a(z₁, z₂). We assume that a(z₁, z₂) has continuous derivatives through order r + 2. Then the realizations of X_z have square integrable partial derivatives up to order [r/2] + 1 on D with probability 1. In other words, these realizations belong to W_2^{[r/2]+1}(D), and consequently, by the above imbedding theorem, X ∈ C_{D∪∂D} and ‖X‖ ≤ const · ‖X‖_{W_2^{[r/2]+1}}.
This implies that the functional S(φ), defined as ½‖A^{-1/2}φ‖² if A^{-1/2}φ is defined and as +∞ for the remaining φ, is a normalized action functional for the random field εX_z as ε → 0, not only in W_2^{[r/2]+1} but also in C_{D∪∂D}.

We put G = {φ ∈ C_{D∪∂D}: max_{z∈D∪∂D} |φ(z)| ≥ 1}. We study the behavior of P{εX ∈ G} as ε → 0. Every element of the boundary of the closed set G can be moved into the interior of G by multiplying it by a number arbitrarily close to 1. Since S(aφ) = a²S(φ), this implies the regularity of G with respect to S(φ). Therefore, by Theorem 3.4 we have

lim_{ε→0} ε² ln P{εX ∈ G} = −inf_{φ∈G} S(φ).

We calculate the infimum. We denote by e_k(z) and λ_k the eigenfunctions and the corresponding eigenvalues of the symmetric kernel a(z₁, z₂). It is known that a(z₁, z₂) = Σ_k λ_k e_k(z₁)e_k(z₂). We calculate the value of S at the function φ_{z₀}(z) = a(z₀, z) = Σ_k λ_k e_k(z₀)e_k(z). We have

S(φ_{z₀}) = ½‖A^{-1/2}φ_{z₀}‖² = ½ Σ_k λ_k⁻¹(φ_{z₀}, e_k)² = ½ Σ_k λ_k e_k(z₀)² = ½ a(z₀, z₀).
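The computation S(φ_{z₀}) = ½a(z₀, z₀) has a transparent finite-dimensional analogue: if A is a positive definite covariance matrix and φ is its z₀-th column (the discrete counterpart of a(z₀, ·)), then ½φᵀA⁻¹φ = ½A[z₀, z₀], and the normalized column minimizes the quadratic form among vectors equal to 1 at z₀. A numerical sketch (the matrix below is an arbitrary covariance built for the check):

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((5, 5))
A = B @ B.T + 0.1 * np.eye(5)          # an arbitrary positive definite "kernel"

z0 = 2
phi = A[:, z0]                         # discrete analogue of phi_{z0} = a(z0, .)

# S(phi) = (1/2) phi^T A^{-1} phi, the matrix form of (1/2)||A^{-1/2} phi||^2;
# since A^{-1} phi is the z0-th unit vector, this equals (1/2) a(z0, z0).
S_phi = 0.5 * phi @ np.linalg.solve(A, phi)

# The normalized column phi_hat = a(z0, .)/a(z0, z0) takes the value 1 at z0
# and minimizes S among all psi with psi[z0] = 1 (Cauchy inequality).
phi_hat = phi / A[z0, z0]
S_hat = 0.5 * phi_hat @ np.linalg.solve(A, phi_hat)     # = 1 / (2 a(z0, z0))
psi = phi_hat + np.array([0.3, -0.1, 0.0, 0.2, 0.1])    # competitor, psi[z0] = 1
S_psi = 0.5 * psi @ np.linalg.solve(A, psi)
```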
Let the function a(z, z) attain its maximum on D ∪ ∂D at the point ẑ. Put

φ̂(z) = a(ẑ, ẑ)⁻¹ a(ẑ, z).

At ẑ the function φ̂ assumes the value 1; S(φ̂) = ½a(ẑ, ẑ)⁻¹. We show that S(φ̂) is the infimum of S(φ) on G. Indeed, let φ(z) = Σ_k c_k e_k(z) be any function assuming a value not smaller than 1 in absolute value at some point z̃. Using the Cauchy inequality, we obtain

S(φ)·a(z̃, z̃) = ½ Σ_k λ_k⁻¹c_k² · Σ_k λ_k e_k(z̃)² ≥ ½ (Σ_k c_k e_k(z̃))² = ½ φ(z̃)² ≥ ½.

This implies S(φ) ≥ ½a(z̃, z̃)⁻¹ ≥ ½a(ẑ, ẑ)⁻¹. Hence

inf_{φ∈G} S(φ) = ½a(ẑ, ẑ)⁻¹.   (4.20)

If we assume that ẑ is the only absolute maximum point of the function a(z, z) on D ∪ ∂D, then the equality in (4.20) can be attained only for c_k = λ_k e_k(ẑ)a(ẑ, ẑ)⁻¹, k = 1, 2, ..., i.e., the infimum of S(φ) on G is attained only at φ = φ̂. Theorem 3.4 yields

lim_{ε→0} ε² ln P{εX ∈ G} = −½a(ẑ, ẑ)⁻¹ = −(2 max_{z∈D∪∂D} a(z, z))⁻¹.

If a(ẑ, ẑ) > a(z, z) for all remaining points z of D ∪ ∂D, then

lim_{ε→0} P{sup_z |εX_z − φ̂(z)| < δ | εX ∈ G} = 1
for every δ > 0. In the case of a homogeneous random field X_z there exists no function φ̂ near which the realizations of εX_z reach the level 1 with overwhelming probability as ε → 0, under the condition that they reach it at all. Nevertheless, in this case the functions a(z₀, ·) for various z₀ can be obtained from each other by shifts, and it can be proved that

lim_{ε→0} P{max_{z∈D∪∂D} |εX_z − a(z − z₀)/a(0)| < δ | εX_{z₀} ≥ 1} = 1,

where δ > 0 and a(z) = a(z₀, z₀ + z).
Chapter 4
Gaussian Perturbations of Dynamical Systems. Neighborhood of an Equilibrium Point
§1. Action Functional

In this chapter we shall consider perturbations of a dynamical system
ẋ_t = b(x_t),   x₀ = x,   b(x) = (b¹(x), ..., b^r(x))   (1.1)
by a white noise process or, more generally, by a Gaussian process. Unless otherwise stated, we shall assume that the functions b^i are bounded and satisfy a Lipschitz condition: |b(x) − b(y)| ≤ K|x − y|, |b(x)| ≤ K < ∞. Here we
pay particular attention to the case where the perturbed process has the form
Ẋ^ε_t = b(X^ε_t) + εẇ_t,   X^ε₀ = x,   (1.2)
where w_t is an r-dimensional Wiener process. In Ch. 2 we mentioned that as ε → 0, the processes X^ε_t converge in probability to the trajectories of the dynamical system (1.1), uniformly on every finite interval [0, T]. This result can be viewed as a version of the law of large numbers. In Ch. 2 there is also an assertion, concerning the processes X^ε_t, of the type of the central limit theorem: the normalized difference ε⁻¹(X^ε_t − x_t) converges to a Gaussian process. This result characterizes deviations of order ε from the limiting dynamical system. In this chapter we study the asymptotics of probabilities of large (of order 1) deviations for the family X^ε_t of processes and consider a series of problems relating to the behavior of the perturbed process on large time intervals. In the last section we consider large deviations for the processes X^ε_t which are defined by the equations
Ẋ^ε_t = b(X^ε_t, εζ_t),   X^ε₀ = x,   (1.3)
where ζ_t is a Gaussian process in R^l and b(x, y), x ∈ R^r, y ∈ R^l, is a continuous function for which b(x, 0) = b(x).
Let ψ_t be a continuous function on [0, T] with values in R^r. In C_{0T}(R^r) we consider the operator B_x: ψ → v, where v = v_t is the solution of the equation

v_t = x + ∫₀ᵗ b(v_s) ds + ψ_t,   t ∈ [0, T].   (1.4)
It is easy to prove that under the assumptions made concerning b(x), the solution of equation (1.4) exists and is unique for any continuous function ψ and any x ∈ R^r. The operator B_x has the inverse

(B_x⁻¹v)_t = ψ_t = v_t − x − ∫₀ᵗ b(v_s) ds.

Lemma 1.1. Suppose the function b(x) satisfies the Lipschitz condition |b(x) − b(y)| ≤ K|x − y|. Then the operator B_x in C_{0T}(R^r) satisfies the Lipschitz condition

‖B_xφ − B_xψ‖ ≤ e^{KT}‖φ − ψ‖,   φ, ψ ∈ C_{0T}(R^r).

Proof. By the definition of B_x, for u = B_xφ, v = B_xψ we have

u_t − v_t = ∫₀ᵗ (b(u_s) − b(v_s)) ds + (φ_t − ψ_t),   t ∈ [0, T].

Relying on Lemma 1.1 of Ch. 2, this implies the assertion of Lemma 1.1.
On C_{0T}(R^r) we consider the functional S(φ) = S_{0T}(φ) defined by the equality

S_{0T}(φ) = ½ ∫₀ᵀ |φ̇_s − b(φ_s)|² ds

for absolutely continuous functions φ, and we set S_{0T}(φ) = +∞ for the remaining φ ∈ C_{0T}(R^r).
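On a grid, S_{0T}(φ) = ½∫₀ᵀ|φ̇_s − b(φ_s)|² ds becomes a sum over increments, and it vanishes (up to discretization error) exactly on trajectories of the unperturbed system (1.1). A sketch for the illustrative scalar drift b(x) = −x (our choice, not taken from the text):

```python
import numpy as np

def action(phi, b, dt):
    # Discretized S_{0T}(phi) = 1/2 * int_0^T |phi' - b(phi)|^2 ds,
    # evaluating b at the left endpoint of each step.
    dphi = np.diff(phi) / dt
    return 0.5 * np.sum((dphi - b(phi[:-1])) ** 2) * dt

b = lambda x: -x                 # illustrative drift of the unperturbed system
T, n = 1.0, 2000
t = np.linspace(0.0, T, n + 1)
dt = T / n

traj = np.exp(-t)                # exact solution of x' = -x, x_0 = 1
line = 1.0 - t                   # a competing path with the same starting point

S_traj = action(traj, b, dt)     # ~ 0: the true trajectory costs nothing
S_line = action(line, b, dt)     # -> 1/2 * int_0^1 t^2 dt = 1/6 as dt -> 0
```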
Theorem 1.1. The functional ε⁻²S(φ) is the action functional for the family of processes X^ε_t defined by equation (1.2) in C_{0T}(R^r) as ε → 0, uniformly with respect to the initial point x ∈ R^r.
Proof. For every fixed x, the assertion follows from Theorem 3.1 of Ch. 3 and Lemma 1.1 if we take account of the form of the action functional for the family of processes εw_t.
Indeed, since B_x is continuous in C_{0T}(R^r) and has an inverse, the action functional for the family of processes X^ε = B_x(εw) has the form ε⁻²S(φ), where

S(φ) = ½ ∫₀ᵀ |d(B_x⁻¹φ)_t/dt|² dt = ½ ∫₀ᵀ |φ̇_t − b(φ_t)|² dt,
if the function (B_x⁻¹φ)_t = φ_t − x − ∫₀ᵗ b(φ_s) ds is absolutely continuous. It is clear that this function is absolutely continuous if and only if φ_t is. Now we have to prove uniformity in x. The fulfillment of formulas (3.11) and (3.12) of Ch. 3 for small values of the parameter for all x ∈ R^r follows immediately from the uniform continuity of B_x in x (a Lipschitz condition with a constant independent of x). It remains to prove condition (0) of §3 of Ch. 3: the lower semicontinuity of S_{0T} and the compactness of ∪_{x∈K} Φ_x(s) for any compact set K. This can be deduced from the fact that (x, ψ) → B_xψ is a homeomorphism if we consider only ψ with ψ₀ = 0.

This implies in particular that the infimum of S_{0T}(φ) on any bounded closed subset of C_{0T}(R^r) is attained, and S_{0T} assumes values close to its smallest value only near functions at which the minimum is attained. We note that if S_{0T}(φ) = 0, then the function φ defined on [0, T] is a trajectory of the dynamical system (1.1), since in this case φ is absolutely continuous and satisfies the condition φ̇_t = b(φ_t) almost everywhere on [0, T]. We consider some simple applications of Theorem 1.1. Let D be a domain in R^r, let ∂D be its boundary, let c(x) be a bounded continuous function on R^r, and let g(x) be a bounded continuous function defined on ∂D.
We write τ^ε = min{t: X^ε_t ∉ D}, where X^ε_t is the solution of equation (1.2), and

H_D(t, x) = {φ ∈ C_{0t}(R^r): φ₀ = x, φ_t ∈ D ∪ ∂D},
Ĥ_D(t, x) = {φ ∈ C_{0t}(R^r): φ₀ = x, φ_s ∉ D for some s ∈ [0, t]}.
Theorem 1.2. Suppose that the boundary of D coincides with the boundary of its closure. We have

lim_{ε→0} ε² ln P_x{X^ε_t ∈ D} = −min_{φ∈H_D(t,x)} S_{0t}(φ),   (1.5)

lim_{ε→0} ε² ln P_x{τ^ε ≤ t} = −min_{φ∈Ĥ_D(t,x)} S_{0t}(φ).   (1.6)
If the extremal φ̂ providing the minimum of S_{0t}(φ) on Ĥ_D(t, x) is unique and it assumes a value in ∂D at only one value s₁ ∈ [0, t], then

lim_{ε→0} M_x{τ^ε ≤ t; g(X^ε_{τ^ε}) exp{∫₀^{τ^ε} c(X^ε_s) ds}} / P_x{τ^ε ≤ t} = g(φ̂_{s₁}) exp{∫₀^{s₁} c(φ̂_s) ds}.   (1.7)
Proof. Since ε⁻²S_{0t}(φ) is the action functional for the family of processes X^ε_t, relations (1.5) and (1.6) follow from Theorem 3.4 of Ch. 3. In seeing this, we have to use the regularity of H_D(t, x) and Ĥ_D(t, x), which was proved in Example 3.5 of Ch. 3.

To prove (1.7), we consider the events A₁^δ = {τ^ε ≤ t, ρ_{0t}(X^ε, φ̂) < δ} and A₂^δ = {τ^ε ≤ t, ρ_{0t}(X^ε, φ̂) ≥ δ}. On the set A₁^δ we have

|g(φ̂_{s₁}) exp{∫₀^{s₁} c(φ̂_s) ds} − g(X^ε_{τ^ε}) exp{∫₀^{τ^ε} c(X^ε_s) ds}| ≤ λ_δ,   (1.8)

where λ_δ → 0 as δ↓0. This estimate follows from the circumstance that the time spent by the curve φ̂_s in the δ-neighborhood of the point φ̂_{s₁} ∈ ∂D converges to zero as δ↓0 and that after the time s₁ the extremal φ̂ does not hit ∂D. The relations A₁^δ ∪ A₂^δ = {τ^ε ≤ t} and (1.8) imply the inequality
[g(φ̂_{s₁}) exp{∫₀^{s₁} c(φ̂_s) ds} − λ_δ][P_x{τ^ε ≤ t} − P_x(A₂^δ)] − ‖g‖e^{t‖c‖}P_x(A₂^δ)
   ≤ M_x{τ^ε ≤ t; g(X^ε_{τ^ε}) exp{∫₀^{τ^ε} c(X^ε_s) ds}}
   ≤ [g(φ̂_{s₁}) exp{∫₀^{s₁} c(φ̂_s) ds} + λ_δ]P_x{τ^ε ≤ t} + ‖g‖e^{t‖c‖}P_x(A₂^δ),   (1.9)

where ‖c‖ = sup|c(x)|, ‖g‖ = sup|g(x)|. Since φ̂ is the only extremal of the action functional on Ĥ_D(t, x), the value of S_{0t} at functions reaching ∂D and being at a distance not smaller than δ from φ̂ is greater than S_{0t}(φ̂) + γ, where γ is some positive number. Using Theorem 1.1 (the upper estimate with γ/2 instead of γ), we obtain that P_x(A₂^δ) ≤ exp{−ε⁻²(S_{0t}(φ̂) + γ/2)} for ε sufficiently small. Taking account of relation (1.6), from this we conclude that

lim_{ε→0} P_x(A₂^δ)/P_x{τ^ε ≤ t} = 0.   (1.10)

If we now divide inequality (1.9) by P_x{τ^ε ≤ t} and take account of (1.10) and the fact that lim_{δ↓0} λ_δ = 0, we obtain (1.7).
Thus, the calculation of the principal term of the logarithmic asymptotics of probabilities of events concerning the process X^ε_t has been reduced to the solution of certain variational problems. These problems are of a standard character. For the extremals we have the ordinary Euler equation, and the minimum itself can be found conveniently by means of the Hamilton–Jacobi equation (cf., for example, Gel'fand and Fomin [1]). If we write

V(t, x, y) = min{S_{0t}(φ): φ₀ = x, φ_t = y},

then

min_{φ∈H_D(t,x)} S_{0t}(φ) = min_{y∈D∪∂D} V(t, x, y),
min_{φ∈Ĥ_D(t,x)} S_{0t}(φ) = min_{0≤s≤t} min_{y∈∂D} V(s, x, y).

The Hamilton–Jacobi equation for V(t, x, y) has the form

∂V(t, x, y)/∂t + ½|∇_y V(t, x, y)|² + (b(y), ∇_y V(t, x, y)) = 0,   (1.11)

where ∇_y is the gradient operator in the variable y. We have to add the conditions V(0, x, x) = 0, V(t, x, y) ≥ 0 to equation (1.11).

Concluding this section, we note that the functions

u^ε(t, x) = P_x{X^ε_t ∈ D},   v^ε(t, x) = P_x{τ^ε ≤ t},
w^ε(t, x) = M_x{τ^ε ≤ t; g(X^ε_{τ^ε}) exp{∫₀^{τ^ε} c(X^ε_s) ds}}

are the solutions of the following problems:
∂u^ε/∂t = (ε²/2)Δu^ε + (b(x), ∇_x u^ε),   x ∈ R^r, t > 0;
u^ε(0, x) = 1 for x ∈ D,   u^ε(0, x) = 0 for x ∉ D;

∂v^ε/∂t = (ε²/2)Δv^ε + (b(x), ∇_x v^ε),   x ∈ D, t > 0;
v^ε(0, x) = 0,   v^ε(t, x)|_{x∈∂D} = 1;

∂w^ε/∂t = (ε²/2)Δw^ε + (b(x), ∇_x w^ε) + c(x)w^ε,   x ∈ D, t > 0;
w^ε(0, x) = 0,   w^ε(t, x)|_{x∈∂D} = g(x);
so that Theorem 1.2 can be viewed as an assertion concerning the behavior, as the parameter converges to zero, of solutions of differential equations with a small parameter at the derivatives of the highest order.
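The variational problem for V(t, x, y) is easy to attack numerically: discretize a path from x to y and minimize the discretized action over the interior points. A sketch for the simplest case b ≡ 0 (our illustrative choice, for which V(t, x, y) = |y − x|²/2t is known in closed form and can be checked against equation (1.11) directly):

```python
import numpy as np

# Minimize the discretized action over paths from x0 to y0 in time t_end,
# with b = 0, and compare with the exact value V = |y0 - x0|^2 / (2 t_end).
x0, y0, t_end, n = 0.0, 1.5, 2.0, 50
dt = t_end / n

def action(p):
    # Discretized S_{0t}(p) = 1/2 * int |p'|^2 ds (b = 0)
    return 0.5 * np.sum((np.diff(p) / dt) ** 2) * dt

# Start from a deliberately distorted path with the correct endpoints.
tau = np.linspace(0.0, 1.0, n + 1)
phi = x0 + (y0 - x0) * tau + 0.3 * np.sin(np.pi * tau)

# Plain gradient descent on the interior points; the endpoints stay fixed.
for _ in range(5000):
    grad = np.zeros_like(phi)
    grad[1:-1] = (2.0 * phi[1:-1] - phi[:-2] - phi[2:]) / dt  # d(action)/d(phi_i)
    phi -= 0.01 * grad

V_num = action(phi)
V_exact = (y0 - x0) ** 2 / (2.0 * t_end)
```

The minimizing discrete path is the straight line (equal increments), whose discrete action equals V_exact exactly, so the descent converges to the closed-form value from above.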
§2. The Problem of Exit from a Domain

Let D be a bounded domain in R^r and let ∂D be its boundary, which we assume to be smooth for the sake of simplicity. If the trajectory x_t(x) of the system (1.1) issued from the point x ∈ D leaves D ∪ ∂D within finite time, then for small ε, the trajectories of the process X^ε_t issued from x leave D within the same time with probability close to one, and the first exit from D takes place near the exit point of x_t(x) from D with overwhelming probability (cf. Ch. 2).

In this section we shall assume that (b(x), n(x)) < 0 for x ∈ ∂D, where n(x) is the exterior normal to the boundary of D, so that the curves x_t(x) cannot leave D for x ∈ D. The trajectories of X^ε_t issued from a point x ∈ D leave D with probability 1 (and in this case for every ε ≠ 0). Nevertheless, for small ε the point of exit and the time necessary for reaching the boundary are no longer determined by the trajectory x_t(x) of the dynamical system, but rather depend, in general, on the form of the field b(x) in the whole domain D. Here we study the problem of exit from D for the simplest structure of b(x) in D which is compatible with the condition (b(x), n(x)) < 0. A more general case will be considered in Ch. 6.
Let O ∈ R^r be an asymptotically stable equilibrium position of system (1.1), i.e., for every neighborhood U of O let there exist a smaller neighborhood U' such that the trajectories of system (1.1) starting in U' converge to O without leaving U as t → ∞. We say that D is attracted to O if the trajectories x_t(x), x ∈ D, converge to the equilibrium position O without leaving D as t → ∞. The quasipotential of the dynamical system (1.1) with respect to the point O is, by definition, the function V(O, x) defined by the equality

V(O, x) = inf{S_{T₁T₂}(φ): φ ∈ C_{T₁T₂}(R^r), φ_{T₁} = O, φ_{T₂} = x}.

We note that the endpoints of the interval [T₁, T₂] are not fixed. The meaning of the term "quasipotential" will be clarified in the next section. It is easy to verify that V(O, x) ≥ 0, V(O, O) = 0, and that the function V(O, x) is continuous.

Theorem 2.1. Let O be a stable equilibrium position of system (1.1), and suppose that the domain D is attracted to O and (b(x), n(x)) < 0 for x ∈ ∂D. Suppose furthermore that there exists a unique point y₀ ∈ ∂D for which V(O, y₀) = min_{y∈∂D} V(O, y). Then

lim_{ε→0} P_x{ρ(X^ε_{τ^ε}, y₀) < δ} = 1

for every δ > 0 and x ∈ D, where τ^ε = inf{t: X^ε_t ∈ ∂D}.
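The concentration of the exit point asserted by Theorem 2.1 is easy to observe by simulation in one dimension. For the illustrative drift b(x) = −x on D = (−1, 2) (a gradient field, for which one can compute the quasipotential V(O, x) = x²), V is smaller at the boundary point y₀ = −1 than at 2, so almost all exits for small ε should occur near −1, even though the deterministic system never exits at all. A Monte Carlo sketch (all parameters are our own choices):

```python
import numpy as np

# Euler scheme for dX = b(X) dt + eps dw with b(x) = -x, started at the
# stable equilibrium O = 0, run until exit from D = (-1, 2); count where
# the exit happens.
rng = np.random.default_rng(4)
eps, dt, n_paths = 0.7, 0.01, 200
exits_left = 0

for _ in range(n_paths):
    x = 0.0
    while -1.0 < x < 2.0:
        x += -x * dt + eps * np.sqrt(dt) * rng.standard_normal()
    if x <= -1.0:
        exits_left += 1

frac_left = exits_left / n_paths
```

The ratio of exit probabilities at the two boundary points scales like exp{(V(O, 2) − V(O, −1))/ε²}, which already exceeds 400 for ε = 0.7, so essentially every path exits on the left.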
The proof of this theorem will proceed according to the following plan. First we show (by means of Lemma 2.1) that with probability converging to 1 as ε → 0, the trajectories of X^ε_t issued from any point x ∈ D hit a small neighborhood of the equilibrium position before they go out to ∂D (Fig. 3). Since X^ε_t is a strong Markov process, this implies that it is sufficient to study how the trajectories starting in a small neighborhood of O go out of the domain. Let Γ and γ be two small spheres of radii μ and μ/2 with their center at the
equilibrium position O (Fig. 4). We introduce an increasing sequence of

Figure 3.

Figure 4.

Markov times τ₀, σ₀, τ₁, σ₁, τ₂, ... in the following way: τ₀ = 0, σ_n = inf{t ≥ τ_n: X^ε_t ∈ Γ}, τ_n = inf{t ≥ σ_{n−1}: X^ε_t ∈ γ ∪ ∂D}. (If at a certain step the process X^ε_t does not reach the set Γ any more, we set the corresponding Markov time and all subsequent ones equal to +∞. We note that infinite τ_n's or σ_n's can be avoided if we change the field b(x) outside D in an appropriate way.) The sequence Z_n = X^ε_{τ_n} forms a Markov chain on the set γ ∪ ∂D (generally speaking, a nonconservative chain: Z_n is not defined if σ_{n−1} = ∞, but this can happen only after exit to ∂D). For small ε, this Markov chain passes from any point x ∈ γ to the set γ in one step with overwhelming probability. On the other hand, it turns out that if the chain does pass from x ∈ γ onto ∂D, then with probability converging to 1 as ε → 0, this passage takes place into a point lying in the δ-neighborhood of the minimum point y₀ of the quasipotential: for every δ > 0 we have

lim_{ε→0} P_z{|Z₁ − y₀| < δ | Z₁ ∈ ∂D} = 1   (2.1)

uniformly in z ∈ γ. The assertion of the theorem can be deduced easily from this.
The proof of (2.1) will, of course, use the action functional and its properties.
We formulate and prove the auxiliary assertions needed for the proof.

Lemma 2.1. Let F be a compact set in R^r and let T and δ be positive numbers. There exist positive numbers ε₀ and β such that

P_x{ρ_{0T}(X^ε, x(x)) ≥ δ} ≤ exp{−ε⁻²β}

for any x ∈ F and ε ≤ ε₀, where x_t(x) is the trajectory of the dynamical system issued from x.
Proof. Put

G(x) = {φ ∈ C_{0T}: φ₀ = x, ρ_{0T}(φ, x(x)) ≥ δ}.

By the corollary to Theorem 1.1, the infimum d of S_{0T}(φ) on the closed set ∪_{x∈F} G(x) is attained at some element of this set. The functional S_{0T} vanishes only on trajectories of the dynamical system. Therefore, d > 0. For any d' < d, the sets ∪_{x∈F} G(x) and ∪_{x∈F} Φ_x(d') are disjoint. Let us denote the distance between them by δ'. (δ' is positive because the first set is closed and the second is compact.) We use Theorem 1.1: for any γ > 0 we have

P_x{ρ_{0T}(X^ε, x(x)) ≥ δ} ≤ P_x{ρ_{0T}(X^ε, Φ_x(d')) ≥ δ'} ≤ exp{−ε⁻²(d' − γ)}

for sufficiently small ε and for all x ∈ F. Hence the assertion of the lemma holds with β = d' − γ (as β we can therefore choose any number smaller than d).
In what follows, we denote by E_δ(a) the δ-neighborhood of a point a ∈ R^r. We shall need the following lemma in §4 as well.

Lemma 2.2. Suppose that the point O is a stable equilibrium position of system (1.1), the domain D is attracted to O, and (b(x), n(x)) < 0 for x ∈ ∂D. Then for any α > 0 we have:

(a) there exist positive constants a and T₀ such that for any function φ_t assuming its values in the set D ∪ ∂D \ E_α(O) for t ∈ [0, T], we have the inequality S_{0T}(φ) ≥ a(T − T₀);

(b) there exist positive constants c and T₀ such that for all sufficiently small ε > 0 and any x ∈ D ∪ ∂D \ E_α(O) we have the inequality

P_x{ζ_α > T} ≤ exp{−ε⁻²c(T − T₀)},

where ζ_α = inf{t: X^ε_t ∉ D \ E_α(O)}.
Proof. (a) Let E_α(O) ⊂ E_δ(O) be a neighborhood of O such that the trajectories x_t(x) of the dynamical system issued from points of E_α(O) never leave E_δ(O). We denote by T(α, x) the time spent by x_t(x) until reaching E_α(O). Since D is attracted to O, we have T(α, x) < ∞ for x ∈ D ∪ ∂D. The function T(α, x) is upper semicontinuous in x (because x_t(x) depends continuously on x). Consequently, it attains its largest value T_0 = max_{x∈D∪∂D} T(α, x) < ∞. The set of functions in C_{0T_0} assuming their values in (D ∪ ∂D)\E_α(O) is closed in C_{0T_0}. By the corollary to Theorem 1.1, the functional S_{0T_0} attains its infimum A on this set. This infimum is different from zero, since otherwise some trajectory of the dynamical system would belong to this set.

Hence A > 0, and S_{0T_0}(φ) ≥ A for all such functions. By the additivity of S, for functions φ spending time T ≥ T_0 in (D ∪ ∂D)\E_α(O) we have S_{0T}(φ) ≥ A; for functions spending time T ≥ 2T_0 there, we have S_{0T}(φ) ≥ 2A, and so on. In general, we have

S_{0T}(φ) ≥ A[T/T_0] ≥ A(T/T_0 − 1) = c(T − T_0),

where c = A/T_0.
(b) From the circumstances that D is attracted to O and that (b(x), n(x)) < 0 on the boundary of D, it follows that the same properties are enjoyed by the δ'-neighborhood of D for sufficiently small δ' > 0. We shall assume that δ' is smaller than δ/2. By assertion (a), there exist constants T_0 and A > 0 such that S_{0T_0}(φ) ≥ A for all functions which do not leave the closed δ'-neighborhood of D and do not get into E_{δ/2}(O). For x ∈ D, the functions in the set {φ: φ_0 = x, S_{0T_0}(φ) < A} reach E_{δ/2}(O) or leave the δ'-neighborhood of D during the time from 0 to T_0; the trajectories of X^ε_t for which ζ_δ > T_0 are at a positive distance, not depending on x and ε, from this set. By Theorem 1.1, this implies that for small ε and all x ∈ D we have

P_x{ζ_δ > T_0} ≤ exp{−ε^{−2}(A − γ)}.

Then we use the Markov property:

P_x{ζ_δ > (n + 1)T_0} = M_x[ζ_δ > nT_0; P_{X^ε_{nT_0}}{ζ_δ > T_0}] ≤ P_x{ζ_δ > nT_0} · sup_{y∈D} P_y{ζ_δ > T_0},

and we obtain by induction that

P_x{ζ_δ > T} ≤ P_x{ζ_δ > [T/T_0] T_0} ≤ [sup_{y∈D} P_y{ζ_δ > T_0}]^{[T/T_0]} ≤ exp{−ε^{−2}(T/T_0 − 1)(A − γ)}.

Hence as c we may take (A − γ)/T_0, where γ is an arbitrarily small positive number. □
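Assertion (b) can be illustrated numerically. The sketch below is our own construction (not from the text): Euler–Maruyama paths of the one-dimensional system b(x) = −x, with D = (−1, 1) and E_δ(O) = (−0.1, 0.1); ζ_δ is then the first time the path either enters the middle interval or leaves D, and the tail P_x{ζ_δ > T} decays rapidly in T.

```python
import numpy as np

# Hypothetical 1-D illustration of Lemma 2.2(b): b(x) = -x, D = (-1, 1),
# E_delta(O) = (-0.1, 0.1). zeta is the first time X^eps leaves D \ E_delta(O).
def zeta_tail(eps, x0, T_values, delta=0.1, dt=1e-3, n_paths=2000, seed=0):
    rng = np.random.default_rng(seed)
    n_steps = int(max(T_values) / dt)
    zeta = np.full(n_paths, np.inf)          # exit times (inf = not yet exited)
    x = np.full(n_paths, x0)
    alive = np.ones(n_paths, dtype=bool)
    for k in range(1, n_steps + 1):
        dw = rng.normal(0.0, np.sqrt(dt), n_paths)
        x[alive] += -x[alive] * dt + eps * dw[alive]   # Euler-Maruyama step
        left = alive & ((np.abs(x) < delta) | (np.abs(x) > 1.0))
        zeta[left] = k * dt
        alive &= ~left
        if not alive.any():
            break
    return [float(np.mean(zeta > T)) for T in T_values]

p1, p2, p3 = zeta_tail(eps=0.5, x0=0.5, T_values=[1.0, 2.0, 4.0])
print(p1, p2, p3)   # tail probabilities at T = 1, 2, 4; nonincreasing in T
```

For smaller ε the decay rate of the tail grows like ε^{−2}, in agreement with the lemma.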
4. Gaussian Perturbations of Dynamical Systems
We formulate another simple lemma.

Lemma 2.3. There exists a positive constant L such that for any x and y ∈ R^r there exists a smooth function φ_t, 0 ≤ t ≤ T, with φ_0 = x, φ_T = y, T = |x − y|, for which S_{0T}(φ) ≤ L|x − y|.

Indeed, we may put φ_t = x + t(y − x)/|y − x|.

We now pass to the proof of the theorem. Let δ > 0. We write

d = min{V(O, y): y ∈ ∂D, |y − y_0| ≥ δ} − V(O, y_0).

Since y_0 is the only minimum point of V(O, ·) on ∂D, we have d > 0. We choose a positive number ρ < d/5L such that the sphere Γ of radius ρ and center O lies inside D (L is the constant from Lemma 2.3).

Lemma 2.4. For sufficiently small ε we have
P_x{Z_1 ∈ ∂D} ≥ exp{−ε^{−2}(V(O, y_0) + 0.45d)}

for all x ∈ γ. (We recall that γ is the sphere of radius ρ/2 and center O.)
Proof. We choose a point y_1 outside D ∪ ∂D at a distance not greater than ρ/2 from y_0. There exists T > 0 such that for any point x ∈ γ there exists a function φ^x_t, 0 ≤ t ≤ T, φ^x_0 = x, φ^x_T = y_1, with S_{0T}(φ^x) ≤ V(O, y_0) + 0.4d.

Indeed, first of all we choose a function φ^{(1)}_t, 0 ≤ t ≤ T_1, φ^{(1)}_0 = O, φ^{(1)}_{T_1} = y_0, such that S_{0T_1}(φ^{(1)}) ≤ V(O, y_0) + 0.1d. We cut off its initial portion up to the point x_1 = φ^{(1)}_{t_1} of the last intersection of φ^{(1)} with γ, i.e., we introduce the new function φ^{(2)}_t = φ^{(1)}_{t_1+t}, 0 ≤ t ≤ T_2 = T_1 − t_1. We have φ^{(2)}_0 = x_1, φ^{(2)}_{T_2} = y_0, and S_{0T_2}(φ^{(2)}) ≤ S_{0T_1}(φ^{(1)}) ≤ V(O, y_0) + 0.1d. Moreover, by Lemma 2.3 we choose a function φ^{(3)}_t, 0 ≤ t ≤ T_3 = ρ/2, leading from O to x_1, with S_{0T_3}(φ^{(3)}) ≤ 0.1d, and a function φ^{(4)}_t, 0 ≤ t ≤ T_4 ≤ ρ/2, leading from y_0 to y_1, with S_{0T_4}(φ^{(4)}) ≤ 0.1d. Finally, by the same lemma, for any x ∈ γ we choose a function φ^{(5)}_t, 0 ≤ t ≤ T_5 = ρ/2, φ^{(5)}_0 = x, φ^{(5)}_{T_5} = O, S_{0T_5}(φ^{(5)}) ≤ 0.1d, depending on x. We construct φ^x out of the pieces φ^{(5)}, φ^{(3)}, φ^{(2)}, φ^{(4)} (Fig. 5); by the additivity of the action functional, S_{0T}(φ^x) ≤ V(O, y_0) + 0.4d, where T = T_5 + T_3 + T_2 + T_4 does not depend on x.

Let δ' > 0 be smaller than half the distance of y_1 from D ∪ ∂D and smaller than ρ/2. By Theorem 1.1, for sufficiently small ε we have

P_x{ρ_{0T}(X^ε, φ^x) < δ'} ≥ exp{−ε^{−2}(V(O, y_0) + 0.4d + 0.05d)}.

On the other hand, if a trajectory of X^ε_t passes at a distance smaller than δ' from the curve φ^x, then it hits the δ'-neighborhood of y_1 and intersects ∂D
Figure 5.
on the way, without hitting γ after reaching Γ. Consequently, the probability that Z_1 belongs to ∂D is not smaller than exp{−ε^{−2}(V(O, y_0) + 0.45d)}. □
Lemma 2.5. For sufficiently small ε we have

P_x{Z_1 ∈ ∂D\E_δ(y_0)} ≤ exp{−ε^{−2}(V(O, y_0) + 0.55d)}

for all x ∈ γ.
Proof. We recall that Z_1 = X^ε_{τ_1}, where τ_1 = inf{t ≥ σ_0: X^ε_t ∈ γ ∪ ∂D}. We introduce the notation τ(γ ∪ ∂D) = inf{t ≥ 0: X^ε_t ∈ γ ∪ ∂D}. The random variable Z_1 is nothing else but the variable X^ε_{τ(γ∪∂D)} calculated for the segment of the trajectory after time σ_0, shifted by σ_0 on the time axis. Using the strong Markov property with respect to the Markov time σ_0, we obtain

P_x{Z_1 ∈ ∂D\E_δ(y_0)} = M_x[P_{X^ε_{σ_0}}{X^ε_{τ(γ∪∂D)} ∈ ∂D\E_δ(y_0)}].

Since X^ε_{σ_0} ∈ Γ, this probability does not exceed

sup_{z∈Γ} P_z{X^ε_{τ(γ∪∂D)} ∈ ∂D\E_δ(y_0)}

for any x ∈ γ. We estimate the latter probability. By Lemma 2.2, for any c > 0 there exists T such that P_z{τ(γ ∪ ∂D) > T} ≤ exp{−ε^{−2}c} for all z ∈ Γ and ε smaller than some ε_0. As c we take, say, V(O, y_0) + d. To obtain the estimate needed, it remains to estimate P_z{τ(γ ∪ ∂D) ≤ T, X^ε_{τ(γ∪∂D)} ∈ ∂D\E_δ(y_0)}. We obtain this estimate by means of Theorem 1.1.
We consider the closure of the (ρ/2)-neighborhood of ∂D\E_δ(y_0); we denote it by K. No function φ_t, 0 ≤ t ≤ T, with φ_0 ∈ Γ and S_{0T}(φ) ≤ V(O, y_0) + 0.65d hits K. Indeed, let us assume that φ_{t_1} ∈ K for some t_1 ≤ T. Then S_{0t_1}(φ) ≤ S_{0T}(φ) ≤ V(O, y_0) + 0.65d. By Lemma 2.3, we take functions φ^{(1)}_t, 0 ≤ t ≤ T_1, φ^{(1)}_0 = O, φ^{(1)}_{T_1} = φ_0, with S_{0T_1}(φ^{(1)}) ≤ 0.2d, and φ^{(2)}_t, 0 ≤ t ≤ T_2, leading from φ_{t_1} to a point of ∂D\E_δ(y_0), with S_{0T_2}(φ^{(2)}) ≤ 0.1d. Out of φ^{(1)}, φ and φ^{(2)} we build a new function: ψ_t = φ^{(1)}_t for 0 ≤ t ≤ T_1; ψ_t = φ_{t−T_1} for T_1 ≤ t ≤ T_1 + t_1; ψ_t = φ^{(2)}_{t−T_1−t_1} for T_1 + t_1 ≤ t ≤ T_1 + t_1 + T_2.

Then ψ_0 = O, ψ_{T_1+t_1+T_2} ∈ ∂D\E_δ(y_0), and S_{0,T_1+t_1+T_2}(ψ) ≤ 0.2d + V(O, y_0) + 0.65d + 0.1d < V(O, y_0) + d. This is smaller than the infimum of V(O, y) for y ∈ ∂D\E_δ(y_0), which is impossible. This means that all functions from Φ_x(V(O, y_0) + 0.65d), x ∈ Γ, pass at a distance not smaller than ρ/2 from ∂D\E_δ(y_0). Using Theorem 1.1, we obtain for sufficiently small ε and all x ∈ Γ that

P_x{τ(γ ∪ ∂D) ≤ T, X^ε_{τ(γ∪∂D)} ∈ ∂D\E_δ(y_0)} ≤ P_x{ρ_{0T}(X^ε, Φ_x(V(O, y_0) + 0.65d)) ≥ ρ/2} ≤ exp{−ε^{−2}(V(O, y_0) + 0.65d − 0.05d)},

and hence

P_x{X^ε_{τ(γ∪∂D)} ∈ ∂D\E_δ(y_0)} ≤ P_x{τ(γ ∪ ∂D) > T} + P_x{τ(γ ∪ ∂D) ≤ T, X^ε_{τ(γ∪∂D)} ∈ ∂D\E_δ(y_0)}
≤ exp{−ε^{−2}(V(O, y_0) + d)} + exp{−ε^{−2}(V(O, y_0) + 0.6d)} ≤ exp{−ε^{−2}(V(O, y_0) + 0.55d)}. □

It follows from Lemmas 2.4 and 2.5 that
P_x{Z_1 ∈ ∂D\E_δ(y_0)} ≤ P_x{Z_1 ∈ ∂D} exp{−ε^{−2}·0.1d}

for sufficiently small ε and all x ∈ γ. We denote by ν the smallest n for which Z_n ∈ ∂D. Using the strong Markov property, for x ∈ γ we find that

P_x{|X^ε_{τ^ε} − y_0| ≥ δ} = P_x{Z_ν ∈ ∂D\E_δ(y_0)}
= Σ_{n=1}^∞ P_x{ν = n, Z_n ∈ ∂D\E_δ(y_0)}
= Σ_{n=1}^∞ M_x{Z_1 ∈ γ, ..., Z_{n−1} ∈ γ; P_{Z_{n−1}}{Z_1 ∈ ∂D\E_δ(y_0)}}
≤ Σ_{n=1}^∞ M_x{Z_1 ∈ γ, ..., Z_{n−1} ∈ γ; P_{Z_{n−1}}{Z_1 ∈ ∂D}} · exp{−ε^{−2}·0.1d}
= Σ_{n=1}^∞ P_x{ν = n} · exp{−ε^{−2}·0.1d} = exp{−ε^{−2}·0.1d} → 0

as ε → 0. Consequently, the theorem is proved for x ∈ γ.
If x is an arbitrary point in D, then

P_x{|X^ε_{τ^ε} − y_0| ≥ δ} ≤ P_x{X^ε_{τ(γ∪∂D)} ∈ ∂D} + P_x{X^ε_{τ(γ∪∂D)} ∈ γ, |X^ε_{τ^ε} − y_0| ≥ δ}.

The first probability converges to zero according to Lemma 2.1. Using the strong Markov property, we write the second one in the form

M_x{X^ε_{τ(γ∪∂D)} ∈ γ; P_{X^ε_{τ(γ∪∂D)}}{|X^ε_{τ^ε} − y_0| ≥ δ}},

which converges to zero by what has already been proved. □

In the language of the theory of differential equations, Theorem 2.1 can be formulated in the following equivalent form.

Theorem 2.2. Let g(x) be a continuous function defined on the boundary ∂D of a domain D, and let us consider the Dirichlet problem

(ε²/2) Δu^ε(x) + Σ_{i=1}^r b^i(x) ∂u^ε(x)/∂x^i = 0, x ∈ D;
u^ε(x) = g(x), x ∈ ∂D,

in D. If the hypotheses of Theorem 2.1 are satisfied, then lim_{ε→0} u^ε(x) = g(y_0) for x ∈ D.

The proof follows easily from the formula u^ε(x) = M_x g(X^ε_{τ^ε}) (cf. §5, Ch. 1) if we take account of the continuity and boundedness of g(x). On the other hand, we can obtain Theorem 2.1 from Theorem 2.2 by means of the same formula. □

Under additional assumptions, we can obtain more accurate information on how a trajectory of X^ε goes out of D for small ε. It will now be more convenient to use the notation X^ε(t), φ(t) instead of X^ε_t, φ_t, etc.
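Theorem 2.2 can be checked directly in one dimension. The finite-difference sketch below is our own example (b(x) = −x, so U(x) = x²/2 and U(α_1) < U(α_2) for the interval (−1, 2)); it solves the Dirichlet problem for a moderate ε, and the interior values come out close to g(α_1).

```python
import numpy as np

# Solve (eps^2/2) u'' + b(x) u' = 0 on (a1, a2), u(a1) = g1, u(a2) = g2,
# with b(x) = -x. As eps -> 0, u^eps(x) -> g1 for interior x, since
# U(a1) = 1/2 < U(a2) = 2 (a hypothetical illustration of Theorem 2.2).
def solve_dirichlet(eps, a1=-1.0, a2=2.0, g1=5.0, g2=-3.0, n=600):
    x = np.linspace(a1, a2, n + 1)
    h = x[1] - x[0]
    b = -x[1:-1]
    lo = eps**2 / (2 * h**2) - b / (2 * h)   # coefficient of u_{i-1}
    di = -eps**2 / h**2 * np.ones(n - 1)     # coefficient of u_i
    up = eps**2 / (2 * h**2) + b / (2 * h)   # coefficient of u_{i+1}
    A = np.diag(di) + np.diag(lo[1:], -1) + np.diag(up[:-1], 1)
    rhs = np.zeros(n - 1)
    rhs[0] -= lo[0] * g1                     # boundary values moved to the rhs
    rhs[-1] -= up[-1] * g2
    return x[1:-1], np.linalg.solve(A, rhs)

x, u = solve_dirichlet(eps=0.3)
u_mid = float(u[np.searchsorted(x, 0.5)])
print(round(u_mid, 2))                       # close to g1 = 5
```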
We have defined V(O, y) as the infimum of S_{0T}(φ) over all functions φ(t), 0 ≤ t ≤ T, going from O to y. This infimum is usually not attained (cf. examples in the next section). However, it is attained for functions defined on a semiaxis infinite from the left: there exists a function φ̂(t), −∞ < t ≤ T, such that φ̂(−∞) = O, φ̂(T) = y, S_{−∞,T}(φ̂) = V(O, y). We shall not prove this but rather include it as a condition in the theorem we are going to formulate. (The assertion is contained in Wentzell and Freidlin [4] as Lemma 3.3 with the outlines of a proof.) The extremal φ̂(t) is not unique: along with it, any translate φ̂_a(t) = φ̂(t + a), −∞ < t ≤ T − a, will also be an extremal. We introduce the following definition. Let G be a neighborhood of O with smooth boundary ∂G. A curve φ(t) leading from O to the boundary ∂D of D necessarily intersects ∂G somewhere. Let us denote by θ_{∂G}(φ) the last moment of time at which φ(t) is on ∂G: θ_{∂G}(φ) = sup{t: φ(t) ∈ ∂G}. If for some
Figure 6.
α > 0 the function φ(t) assumes its values inside G for t ∈ [θ_{∂G}(φ) − α, θ_{∂G}(φ)), then we shall say that φ(t) leaves G in a regular manner.

Theorem 2.3. Suppose that the hypotheses of Theorem 2.1 are satisfied and that there exists an extremal φ̂(t), −∞ < t ≤ T, unique up to translations, going from O to ∂D (namely, to the point y_0; Fig. 6). Let us extend φ̂(t) for t > T by continuity in an arbitrary way. Let G ⊂ D be a neighborhood of the equilibrium position with smooth boundary, and suppose that the extremal φ̂(t) leaves this neighborhood in a regular manner. Let us denote by θ^ε_{∂G} the last moment of time at which the trajectory of X^ε(t) is on ∂G before the exit onto ∂D: θ^ε_{∂G} = max{t ≤ τ^ε: X^ε(t) ∈ ∂G}. Then for any δ > 0 and x ∈ D we have

lim_{ε→0} P_x{ max_{θ^ε_{∂G} ≤ t ≤ τ^ε} |X^ε(t) − φ̂(t − θ^ε_{∂G} + θ_{∂G}(φ̂))| < δ } = 1.
In other words, with probability converging to 1 as ε → 0, the last portion of the Markov trajectory, after the exit from G and until the exit to ∂D, is located in a small neighborhood of the extremal φ̂(t), translated in an appropriate way.

The proof of this theorem can be carried out according to the same plan as that of Theorem 2.1. We ascertain that for functions φ(t), 0 ≤ t ≤ T, leading from O to ∂D and lying at a distance greater than δ/2 from all translates of the extremal, we have S_{0T}(φ) ≥ V(O, y_0) + d, where d is a positive constant. Then we choose ρ < (d/5L) ∧ (δ/2) and the spheres γ and Γ. The functions φ(t), 0 ≤ t ≤ T, starting on Γ, lying farther than δ/2 from the translates of φ̂(t) and such that S_{0T}(φ) ≤ V(O, y_0) + 0.65d, do not reach the
(ρ/2)-neighborhood of ∂D. As in the proof of Theorem 2.1, from this we can conclude that

P_x{τ(γ ∪ ∂D) ≤ T, X^ε(τ(γ ∪ ∂D)) ∈ ∂D, max_{θ^ε_{∂G} ≤ t ≤ τ^ε} |X^ε(t) − φ̂(t − θ^ε_{∂G} + θ_{∂G}(φ̂))| ≥ δ} ≤ exp{−ε^{−2}(V(O, y_0) + 0.6d)}

for all x ∈ Γ, and hence

P_x{X^ε(τ(γ ∪ ∂D)) ∈ ∂D, max_{θ^ε_{∂G} ≤ t ≤ τ^ε} |X^ε(t) − φ̂(t − θ^ε_{∂G} + θ_{∂G}(φ̂))| ≥ δ} ≤ P_x{X^ε(τ(γ ∪ ∂D)) ∈ ∂D} · exp{−ε^{−2}·0.1d}

for all x ∈ γ, from which we obtain the assertion of the theorem for these x and then for all x ∈ D.

If the dynamical system (1.1) has more than one stable equilibrium position
in D, then before it leaves the domain, the trajectory of X^ε_t can pass from one stable equilibrium position to another. In this case, the point of exit from the domain depends on the initial point. We consider such a more complicated structure of a dynamical system in Ch. 6. Here we discuss the case where system (1.1) has a unique limit cycle in D. As a concrete example, let D be a planar domain homeomorphic to a ring. The boundary ∂D is assumed to be smooth, as before. We assume that system (1.1) has a unique limit cycle Π in D and that D is attracted to Π, i.e., the trajectory x_t(x) for x ∈ D converges to Π with increasing t, without leaving D. On the boundary ∂D we assume that the earlier condition (b(x), n(x)) < 0 is satisfied. The quasipotential of system (1.1) with respect to Π is, by definition, the function

V(Π, x) = inf{S_{T_1T_2}(φ): φ_{T_1} ∈ Π, φ_{T_2} = x}.
The following theorem can be proved in the same way as Theorem 2.1.

Theorem 2.4. Suppose that the domain D is attracted to a stable limit cycle Π and (b(x), n(x)) < 0 for x ∈ ∂D. Furthermore, suppose that there exists a unique point y_0 ∈ ∂D for which

V(Π, y_0) = min_{y∈∂D} V(Π, y).

Then

lim_{ε→0} P_x{|X^ε_{τ^ε} − y_0| < δ} = 1

for every δ > 0 and x ∈ D.
The assertion of Theorem 2.4 remains valid in the case where not the whole domain D is attracted to Π but only its part exterior to Π, provided that the interior part does not have an exit to ∂D. Then the cycle may be stable only from the outside. If the extremal of the functional S(φ) from Π to ∂D is unique up to translations, then an analogue of Theorem 2.3 can be proved.
§3. Properties of the Quasipotential. Examples

In the preceding section it was shown that if we know the function V(O, x), i.e., the quasipotential of the dynamical system, then we can find the point y_0 on the boundary of the domain near which the Markov trajectories of X^ε_t starting inside the domain reach the boundary for small ε. In the next section we show that this function is important in other problems as well. For example, the principal term of the asymptotics of the mean time M_x τ^ε spent by a trajectory of X^ε_t before reaching the boundary of a domain can be expressed in terms of V(O, x). The behavior of the invariant measure of X^ε_t as ε → 0 can also be described in terms of V(O, x).

In this section we study the problem of calculating the quasipotential V(O, x), establish some properties of the extremals of the functional S(φ), and consider examples. We shall only deal with the quasipotential of a dynamical system with respect to a stable equilibrium position; the case of the quasipotential with respect to a stable limit cycle can be considered analogously.
Theorem 3.1. Suppose that the vector field b(x) admits the decomposition

b(x) = −∇U(x) + l(x),   (3.1)

where the function U(x) is continuously differentiable in D ∪ ∂D, U(O) = 0, U(x) > 0 and ∇U(x) ≠ 0 for x ≠ O, and (l(x), ∇U(x)) = 0. Then the quasipotential V(O, x) of the dynamical system (1.1) with respect to O coincides with 2U(x) at all points x ∈ D ∪ ∂D for which U(x) ≤ U_0 = min_{y∈∂D} U(y). If U(x) is twice continuously differentiable, then the unique extremal of the functional S(φ) on the set of functions φ_s, −∞ < s ≤ T, leading from O to x is given by the equation

φ̇_s = ∇U(φ_s) + l(φ_s),  s ∈ (−∞, T),  φ_T = x.   (3.2)
Proof. If the function φ_s for s ∈ [T_1, T_2] does not exit from D ∪ ∂D, then the relation U(φ_{T_2}) − U(φ_{T_1}) = ∫_{T_1}^{T_2} (∇U(φ_s), φ̇_s) ds implies the inequality

S_{T_1T_2}(φ) = ½ ∫_{T_1}^{T_2} |φ̇_s − b(φ_s)|² ds
= ½ ∫_{T_1}^{T_2} |φ̇_s − ∇U(φ_s) − l(φ_s)|² ds + 2 ∫_{T_1}^{T_2} (φ̇_s, ∇U(φ_s)) ds
≥ 2[U(φ_{T_2}) − U(φ_{T_1})].   (3.3)
From this we can conclude that S_{T_1T_2}(φ) ≥ 2U(x) for any curve φ_s with φ_{T_1} = O, φ_{T_2} = x, where x is such that U(x) ≤ U_0. Indeed, if this curve does not leave D ∪ ∂D over the time from T_1 to T_2, then the assertion follows from (3.3) (we recall that U(O) = 0). If, on the other hand, φ_s leaves D ∪ ∂D, then it crosses the level surface U(x) = U_0 at some time τ ∈ (T_1, T_2). By the nonnegativity and additivity of S(φ), we obtain

S_{T_1T_2}(φ) ≥ S_{T_1τ}(φ) ≥ 2U(φ_τ) = 2U_0 ≥ 2U(x).

On the other hand, if φ̂_s is a solution of (3.2), then φ̂_{−∞} = O. This follows from the facts that dU(φ̂_s)/ds = |∇U(φ̂_s)|² > 0 for φ̂_s ≠ O and that O is the only zero of U(x). From this we obtain

S_{−∞,T}(φ̂) = 2 ∫_{−∞}^{T} (φ̂'_s, ∇U(φ̂_s)) ds = 2[U(x) − U(O)] = 2U(x).

Consequently, for any curve φ_s connecting O and x we have S(φ) ≥ 2U(x) = S(φ̂). Therefore, V(O, x) = inf S(φ) = 2U(x), and φ̂_s is an extremal. If U(x) is twice continuously differentiable, then the solution of equation (3.2) is unique, and consequently, the extremal from O to x normalized by the condition φ̂_T = x is also unique. □
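The identity V(O, x) = 2U(x) and the extremal equation (3.2) are easy to test numerically. In the sketch below (our own one-dimensional example, with l ≡ 0), b(x) = −U′(x) for U(x) = x²/2; the curve φ_t = x_0 e^t solves φ̇ = +U′(φ), climbs from a point x_0 near O to x = 1, and its action comes out close to 2U(1) = 1.

```python
import numpy as np

# Discretized action S(phi) = (1/2) * integral of |phi' - b(phi)|^2 dt.
def action(phi, dt, b):
    dphi = np.diff(phi) / dt
    mid = 0.5 * (phi[:-1] + phi[1:])
    return 0.5 * float(np.sum((dphi - b(mid)) ** 2) * dt)

b = lambda x: -x                      # b = -U' with U(x) = x^2/2
x0, x_target = 1e-4, 1.0              # start very near the equilibrium O = 0
T = np.log(x_target / x0)             # phi_t = x0 * e^t reaches x_target at T
t = np.linspace(0.0, T, 200001)
phi = x0 * np.exp(t)                  # time-reversed flow: phi' = +U'(phi)
S = action(phi, t[1] - t[0], b)
print(round(S, 3))                    # close to 2 U(1) = 1
```

Faster or slower parametrizations of the same curve give a larger value, in line with Lemma 3.1 below.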
It follows from the theorem just proved that if b(x) has a potential, i.e., b(x) = −∇U(x), then V(O, x) differs from U(x) only by a multiplicative constant. This is the reason why we call V a quasipotential. If b(x) has a decomposition (3.1), then from the orthogonality condition for ∇U(x) and l(x) = b(x) + ∇U(x) we obtain the equation for the quasipotential:

½ (∇V(O, x), ∇V(O, x)) + (b(x), ∇V(O, x)) = 0,   (3.4)

which is, in essence, Jacobi's equation for the variational problem defining the quasipotential. Theorem 3.1 says that a solution of this equation satisfying the conditions V(O, O) = 0, V(O, x) > 0 and ∇V(O, x) ≠ 0 for x ≠ O is the quasipotential. It can be proved that conversely, if the quasipotential V(O, x) is continuously differentiable, then it satisfies equation (3.4), i.e., we have decomposition (3.1). However, it is easy to find examples showing that V(O, x) may not be differentiable.
The function V(O, x) cannot be arbitrarily bad, however: it satisfies a Lipschitz condition; this can be derived easily from Lemma 2.3. Now we pass to the study of extremals of S(φ) going from O to x. For them we can write Euler's equations in the usual way:

φ̈^k_t − Σ_{j=1}^r (∂b^k/∂x^j(φ_t) − ∂b^j/∂x^k(φ_t)) φ̇^j_t − Σ_{j=1}^r b^j(φ_t) ∂b^j/∂x^k(φ_t) = 0,  1 ≤ k ≤ r.
If b(x) has the decomposition (3.1), then we can write the simpler equations (3.2) for the extremals. From these equations we can draw a series of qualitative conclusions on the behavior of extremals. At every point of the domain, the velocity of the motion along an extremal is equal to ∇U(x) + l(x) according to (3.2), whereas b(x) = −∇U(x) + l(x). From this we conclude that the velocity of the motion along an extremal is equal, in norm, to the velocity of the motion along trajectories of the system: |∇U(x) + l(x)| = |−∇U(x) + l(x)| = |b(x)|. The last assertion remains valid even if we do not assume the existence of the decomposition of b(x). This follows from the lemma below.

Lemma 3.1. Let S(φ) < ∞. Let us denote by ψ the function obtained from φ by a change of parameter such that |ψ̇_t| = |b(ψ_t)| for almost all t. Then S(ψ) ≤ S(φ), where equality is attained only if |φ̇_s| = |b(φ_s)| for almost all s.

Proof. For the substitution s = s(t), ψ_t = φ_{s(t)}, we obtain
S(φ) = ½ ∫_{T_1}^{T_2} |b(φ_s) − φ̇_s|² ds = ½ ∫_{t(T_1)}^{t(T_2)} |b(ψ_t) − ψ̇_t s'(t)^{−1}|² s'(t) dt
= ½ ∫_{t(T_1)}^{t(T_2)} (|b(ψ_t)|² s'(t) + |ψ̇_t|² s'(t)^{−1}) dt − ∫_{t(T_1)}^{t(T_2)} (b(ψ_t), ψ̇_t) dt
≥ ∫_{t(T_1)}^{t(T_2)} |b(ψ_t)| |ψ̇_t| dt − ∫_{t(T_1)}^{t(T_2)} (b(ψ_t), ψ̇_t) dt,   (3.5)

where t(s) is the inverse function of s(t). In (3.5) we have used the inequality a x² + a^{−1} y² ≥ 2xy, true for any positive a. We define the function s(t) by the equality

t = ∫_0^{s(t)} |φ̇_u| |b(φ_u)|^{−1} du.

This function is monotone increasing, and ψ_t = φ_{s(t)} is absolutely continuous in t. For this function we have |ψ̇_t| = |b(ψ_t)| for almost all t, and inequality (3.5) turns into the equality

S(ψ) = ∫_{t(T_1)}^{t(T_2)} |b(ψ_t)| |ψ̇_t| dt − ∫_{t(T_1)}^{t(T_2)} (b(ψ_t), ψ̇_t) dt.

This proves the lemma. □
Now we consider some examples.
Example 3.1. Let us consider the dynamical system ẋ_t = b(x_t) on the real line, where b(O) = 0, b(x) > 0 for x < 0 and b(x) < 0 for x > 0, and let D be an interval (α_1, α_2) ⊂ R¹ containing the point 0. If we write U(x) = −∫_0^x b(y) dy, then b(x) = −dU/dx, so that in the one-dimensional case every field has a potential. It follows from Theorems 2.1 and 3.1 that for small ε, with probability close to one, the first exit of the process X^ε_t = x + ∫_0^t b(X^ε_s) ds + εw_t from the interval (α_1, α_2) takes place through that endpoint α_i at which U assumes the smaller value. For the sake of definiteness, let U(α_1) < U(α_2). The equation for the extremal has the form φ̇_s = −b(φ_s); we have to take that solution φ̂_s, −∞ < s ≤ T, of this equation which converges to zero as s → −∞ and for which φ̂_T = α_1.
In the one-dimensional case, the function v^ε(x) = P_x{X^ε_{τ^ε} = α_1} can be calculated explicitly by solving the corresponding boundary value problem. We obtain

v^ε(x) = ∫_x^{α_2} exp{2ε^{−2}U(y)} dy / ∫_{α_1}^{α_2} exp{2ε^{−2}U(y)} dy.

Of course, it is easy to see from this formula that lim_{ε→0} P_x{X^ε_{τ^ε} = α_1} = 1 if U(α_1) < U(α_2). Using the explicit form of v^ε(x), it is interesting to study how X^ε_t exits from (α_1, α_2) if U(α_1) = U(α_2). In this case, Theorem 2.1 is not applicable. Using Laplace's method for the asymptotic estimation of integrals (cf., for example, Evgrafov [1]), we obtain that for x ∈ (α_1, α_2),

∫_x^{α_2} exp{2ε^{−2}U(y)} dy ~ (ε²/2) exp{2ε^{−2}U(α_2)} / |U'(α_2)|,
∫_{α_1}^{α_2} exp{2ε^{−2}U(y)} dy ~ (ε²/2) [exp{2ε^{−2}U(α_1)}/|U'(α_1)| + exp{2ε^{−2}U(α_2)}/|U'(α_2)|]

as ε → 0. From this we find that if U(α_1) = U(α_2) and U'(α_i) ≠ 0, i = 1, 2, then

P_x{X^ε_{τ^ε} = α_i} → |U'(α_i)| / (|U'(α_1)| + |U'(α_2)|).

Consequently, in this case the trajectories of X^ε_t exit from the interval through both ends with positive probability as ε → 0. The probability of exit through α_i is proportional to |U'(α_i)| = |b(α_i)|: the steeper the potential at an endpoint, the thinner the high-potential layer separating that endpoint from the well, and the likelier the exit there.
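This limit is easy to confirm by direct quadrature of the explicit formula for v^ε(x). Below is a sketch with a potential of our own choosing having equal barrier heights U(α_1) = U(α_2) = 1 but different boundary slopes; the computed exit probability approaches |U'(α_1)|/(|U'(α_1)| + |U'(α_2)|) = 2 − √2 ≈ 0.586.

```python
import numpy as np

# U(y) = 2 y^2 for y < 0 and U(y) = y^2 for y >= 0 on (a1, a2) = (-1/sqrt 2, 1):
# equal barrier heights U(a1) = U(a2) = 1, slopes |U'(a1)| = 2*sqrt(2), |U'(a2)| = 2.
def p_exit_left(eps, x0=0.0, a1=-2**-0.5, a2=1.0, n=400001):
    y = np.linspace(a1, a2, n)
    U = np.where(y < 0, 2 * y**2, y**2)
    w = np.exp(2 * (U - U.max()) / eps**2)   # integrand, rescaled to avoid overflow
    i0 = np.searchsorted(y, x0)
    # v(x0) = integral over [x0, a2] divided by integral over [a1, a2]
    return float(w[i0:].sum() / w.sum())

for eps in (0.4, 0.2, 0.1):
    print(round(p_exit_left(eps), 3))        # tends to 2 - sqrt(2) ~ 0.586
```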
Example 3.2. Let us consider a homogeneous linear system ẋ_t = Ax_t in R^r with constant coefficients. Assume that the matrix A is normal, i.e., AA* = A*A, and that the symmetric matrix A + A* is negative definite. In this case, the origin of coordinates O is an asymptotically stable equilibrium position. Indeed, the solution of our system can be represented in the form x_t = e^{At}x_0, where x_0 is the initial position. The normality of e^{At} follows from that of A. Using this observation and the relation e^{At}(e^{At})* = e^{(A+A*)t}, we obtain

|x_t|² = (e^{At}x_0, e^{At}x_0) = (e^{(A+A*)t}x_0, x_0) ≤ |x_0| · |e^{(A+A*)t}x_0|.

Since A + A* is negative definite, we have |e^{(A+A*)t}x_0| → 0 as t → ∞. Consequently, |x_t|² → 0 for any initial condition x_0. It is easy to verify by direct differentiation that the vector field Ax admits the decomposition

Ax = −∇(−¼((A + A*)x, x)) + ½(A − A*)x.   (3.6)

Moreover, the vector fields −∇(−¼((A + A*)x, x)) = ½(A + A*)x and ½(A − A*)x are orthogonal:

(½(A + A*)x, ½(A − A*)x) = ¼[(Ax, Ax) − (A*x, A*x)] = ¼[(A*Ax, x) − (AA*x, x)] = 0.

Let the right sides of our system be perturbed by a white noise: Ẋ^ε_t = AX^ε_t + εẇ_t.
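Both the decomposition (3.6) and the orthogonality of its two parts are easy to verify with a concrete normal matrix (the matrix below is our own test case).

```python
import numpy as np

# Test matrix: normal (A A^T = A^T A), with A + A^T negative definite.
A = np.array([[-1.0, 2.0],
              [-2.0, -1.0]])
assert np.allclose(A @ A.T, A.T @ A)

Sym = 0.5 * (A + A.T)        # equals -grad U, where U(x) = -((A + A^T)x, x)/4
Rot = 0.5 * (A - A.T)        # the field l(x) of decomposition (3.6)
x = np.array([0.7, -1.3])
assert np.allclose(A @ x, Sym @ x + Rot @ x)     # Ax = -grad U(x) + l(x)
print(float(np.dot(Sym @ x, Rot @ x)))           # orthogonality: prints 0.0
```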
We are interested in how the trajectories of X^ε_t exit from a bounded domain D containing the equilibrium position O. From Theorem 3.1 and formula (3.6) we conclude that the quasipotential V(O, x) of our dynamical system with respect to the equilibrium position O is equal to −½((A + A*)x, x). In order to find the point on the boundary ∂D of D near which the trajectories of X^ε_t first leave D with probability converging to 1 as ε → 0, we have to find the minimum of V(O, x) on ∂D. The equation for the extremals has the form

φ̇_t = −½(A + A*)φ_t + ½(A − A*)φ_t = −A*φ_t.

If y_0 ∈ ∂D is the unique point where V(O, x) attains its smallest value on ∂D, then the last piece of a Markov trajectory is near the extremal entering this point. Up to a shift of time, the equation of this extremal can be written in the form φ_t = e^{−A*t}y_0.
Figure 7.
For example, let the dynamical system

ẋ¹_t = −x¹_t + x²_t,  ẋ²_t = −x¹_t − x²_t

be given in the plane R². The matrix of this system is normal and the origin of coordinates is asymptotically stable. The trajectories of the system are logarithmic spirals winding onto the origin in the clockwise direction (Fig. 7). The quasipotential is equal to (x¹)² + (x²)², so that with overwhelming probability for small ε, X^ε_t exits from D near the point y_0 ∈ ∂D which is closest to the origin. It is easy to verify that the extremals are also logarithmic spirals, but this time unwinding from O in the clockwise direction.
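A Monte Carlo sketch (ours, not the book's) makes the exit-point concentration visible. For simplicity we take the special case A = −I, i.e. b(x) = −x, which has the same quasipotential V(O, x) = (x¹)² + (x²)² as the spiral system, together with a hypothetical off-center domain D = {|x − (0.3, 0)| < 1}; the minimum of V on ∂D is attained at (−0.7, 0), the boundary point closest to the origin.

```python
import numpy as np

center = np.array([0.3, 0.0])     # D = {|x - center| < 1}; O = (0, 0) lies inside

def exit_points(eps, n_paths=800, dt=2e-3, max_steps=300000, seed=7):
    rng = np.random.default_rng(seed)
    x = np.zeros((n_paths, 2))
    out = np.full((n_paths, 2), np.nan)
    alive = np.ones(n_paths, dtype=bool)
    for _ in range(max_steps):
        n = int(alive.sum())
        if n == 0:
            break
        x[alive] += -x[alive] * dt + eps * np.sqrt(dt) * rng.normal(size=(n, 2))
        exited = alive & (np.linalg.norm(x - center, axis=1) >= 1.0)
        out[exited] = x[exited]          # record exit locations
        alive &= ~exited
    return out[~np.isnan(out[:, 0])]

pts = exit_points(eps=0.4)
frac_near = float(np.mean(np.linalg.norm(pts - np.array([-0.7, 0.0]), axis=1) < 0.5))
print(len(pts), round(frac_near, 2))     # exits cluster near (-0.7, 0)
```

Decreasing ε sharpens the concentration, at the price of exponentially longer exit times.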
§4. Asymptotics of the Mean Exit Time and Invariant Measure for the Neighborhood of an Equilibrium Position

As in §2, let D be a bounded domain in R^r, let O ∈ D be a stable equilibrium position of system (1.1), and let τ^ε be the time of first exit of the process X^ε_t from D. Under the assumption that D is attracted to O, in this section we calculate the principal term of ln M_x τ^ε as ε → 0 and also that of ln μ^ε(R^r\D), where μ^ε is the invariant measure of the process X^ε_t. Since the existence of an invariant measure and its properties depend on the behavior of b(x) not only for x ∈ D, in the study of μ^ε(R^r\D) we have to make some assumptions concerning b(x) in the whole space R^r.
Theorem 4.1. Let O be an asymptotically stable equilibrium position of system (1.1), and assume that the domain D ⊂ R^r is attracted to O. Furthermore, assume that the boundary ∂D of D is a smooth manifold and (b(x), n(x)) < 0 for x ∈ ∂D, where n(x) is the exterior normal of the boundary of D. Then for x ∈ D we have

lim_{ε→0} ε² ln M_x τ^ε = min_{y∈∂D} V(O, y).   (4.1)

Here the function V(O, y) is the quasipotential of the dynamical system (1.1) with respect to O; in what follows we denote the minimum in (4.1) by V_0.
The proof of this theorem uses arguments, constructions and notation employed in the proof of Theorem 2.1. To prove (4.1), it is sufficient to verify that for any d > 0 there exists an ε_0 such that for ε < ε_0 we have

(a) ε² ln M_x τ^ε < V_0 + d;
(b) ε² ln M_x τ^ε > V_0 − d.
First we prove inequality (a). We choose positive numbers ρ, h, T_1, and T_2 such that the following conditions are satisfied: firstly, all trajectories of the unperturbed system starting at points x ∈ D ∪ ∂D hit the (ρ/2)-neighborhood of the equilibrium position before time T_1 and do not leave this neighborhood after that time; secondly, for every point x lying in the ball G = {x ∈ R^r: |x − O| ≤ ρ}, there exists a function φ^x_t such that φ^x_0 = x, φ^x reaches the exterior of the h-neighborhood of D at a time T(x) ≤ T_2, φ^x does not hit the (ρ/2)-neighborhood of O after its exit from G, and S_{0T(x)}(φ^x) ≤ V_0 + d/2.

The first of these assumptions can be satisfied because O is an asymptotically stable equilibrium position, D is attracted to O, and for x ∈ ∂D we have the inequality (b(x), n(x)) < 0. The functions φ^x occurring in the second condition were constructed in §2.
From the definition of the action functional, we obtain for y ∈ G that

P_y{ sup_{0≤t≤T(y)} |X^ε_t − φ^y_t| < h } ≥ exp{−ε^{−2}(S_{0T(y)}(φ^y) + d/2)} ≥ exp{−ε^{−2}(V_0 + d)},

whenever ε is sufficiently small. Since the point φ^y_{T(y)} does not belong to the h-neighborhood of D, from the last inequality we conclude that

P_y{τ^ε ≤ T_2} ≥ P_y{τ^ε ≤ T(y)} ≥ exp{−ε^{−2}(V_0 + d)}   (4.2)

for sufficiently small ε and y ∈ G.
We denote by σ the first entrance time of G: σ = min{t: X^ε_t ∈ G}. Using the strong Markov property of X^ε_t, for any x ∈ D and sufficiently small ε we obtain

P_x{τ^ε ≤ T_1 + T_2} ≥ M_x[σ ≤ T_1; P_{X^ε_σ}{τ^ε ≤ T_2}] ≥ P_x{σ ≤ T_1} exp{−ε^{−2}(V_0 + d)} ≥ ½ exp{−ε^{−2}(V_0 + d)}.   (4.3)

Here we have used inequality (4.2) for the estimation of P_{X^ε_σ}{τ^ε ≤ T_2} and the fact that the trajectories of X^ε_t converge in probability to x_t, uniformly on [0, T_1], as ε → 0. Then, using the Markov property of X^ε_t, from (4.3) we obtain

M_x τ^ε ≤ Σ_{n=0}^∞ (n + 1)(T_1 + T_2) P_x{n(T_1 + T_2) < τ^ε ≤ (n + 1)(T_1 + T_2)}
= (T_1 + T_2) Σ_{n=0}^∞ P_x{τ^ε > n(T_1 + T_2)}
≤ (T_1 + T_2) Σ_{n=0}^∞ [1 − min_{z∈D} P_z{τ^ε ≤ T_1 + T_2}]^n
= (T_1 + T_2)(min_{z∈D} P_z{τ^ε ≤ T_1 + T_2})^{−1}
≤ 2(T_1 + T_2) exp{ε^{−2}(V_0 + d)},

whenever ε is sufficiently small. This implies assertion (a).
Now we prove assertion (b). We introduce the Markov times τ_k, σ_k and the Markov chain Z_n defined in the proof of Theorem 2.1. The phase space of Z_n is the set γ ∪ ∂D, where γ = {x ∈ R^r: |x − O| = ρ/2}. For the one-step transition probabilities of this chain we have the estimate

P(x, ∂D) ≤ max_{y∈Γ} P_y{τ_1 = τ^ε} = max_{y∈Γ} [P_y{τ^ε = τ_1 ≤ T} + P_y{τ^ε = τ_1 > T}].   (4.4)

As follows from Lemma 2.2, T can be chosen so large that for the second probability we have the estimate

P_y{τ^ε = τ_1 > T} ≤ ½ exp{−ε^{−2}(V_0 − h)}.   (4.5)

In order to estimate the first probability, we note that the trajectories of X^ε_t, 0 ≤ t ≤ T, for which τ^ε = τ_1 ≤ T, are a positive distance from the set {φ ∈ C_{0T}(R^r): φ_0 = y, S_{0T}(φ) ≤ V_0 − h/2} of functions, provided that h > 0 is arbitrary and ρ is sufficiently small. By virtue of the properties of the action functional, we obtain from this that

P_y{τ^ε = τ_1 ≤ T} ≤ ½ exp{−ε^{−2}(V_0 − h)}

for y ∈ Γ and sufficiently small ε, ρ > 0. It follows from the last inequality and estimates (4.4) and (4.5) that

P(x, ∂D) ≤ exp{−ε^{−2}(V_0 − h)},   (4.6)

whenever ε, ρ are sufficiently small. As in Theorem 2.1, we denote by ν the smallest n for which Z_n = X^ε_{τ_n} ∈ ∂D. It follows from (4.6) that
P_x{ν ≥ n} ≥ [1 − exp{−ε^{−2}(V_0 − h)}]^{n−1}

for x ∈ γ. It is obvious that τ^ε = (τ_1 − τ_0) + (τ_2 − τ_1) + ⋯ + (τ_ν − τ_{ν−1}). Therefore, M_x τ^ε = Σ_{n=1}^∞ M_x{ν ≥ n; τ_n − τ_{n−1}}. Using the strong Markov property of X^ε_t with respect to the time σ_{n−1}, we obtain that

M_x{ν ≥ n; τ_n − τ_{n−1}} ≥ M_x{ν ≥ n; τ_n − σ_{n−1}} ≥ P_x{ν ≥ n} · inf_{z∈Γ} M_z τ(γ ∪ ∂D).

The last infimum is greater than some positive constant t_1 independent of ε. This follows from the circumstance that the trajectories of the dynamical system spend some positive time going from Γ to γ. Combining all the estimates obtained so far, we find that

M_x τ^ε ≥ t_1 Σ_n min_{z∈γ} P_z{ν ≥ n} ≥ t_1 Σ_n (1 − exp{−ε^{−2}(V_0 − h)})^{n−1} = t_1 exp{ε^{−2}(V_0 − h)}

for sufficiently small ε and ρ and x ∈ γ.
This implies assertion (b) for x ∈ γ. For an arbitrary x ∈ D, taking into account that P_x{τ^ε > τ_1} → 1 as ε → 0, we have

M_x τ^ε = M_x{τ^ε ≤ τ_1; τ^ε} + M_x{τ^ε > τ_1; τ^ε}
≥ M_x{τ^ε > τ_1; M_{X^ε_{τ_1}} τ^ε}
≥ t_1 exp{ε^{−2}(V_0 − h)} P_x{τ^ε > τ_1} ≥ (t_1/2) exp{ε^{−2}(V_0 − h)}.

By the same token, we have proved assertion (b) for any x ∈ D. □
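The logarithmic asymptotics (4.1) can be observed in a simulation, though convergence is slow. In the one-dimensional sketch below (our own example: b(x) = −x, D = (−1, 1), so V_0 = 2U(±1) = 1) the quantity ε² ln M_x τ^ε is already of the order of V_0 for moderate ε.

```python
import numpy as np

# Estimate M_0 tau^eps for b(x) = -x on D = (-1, 1) by Euler-Maruyama.
def mean_exit_time(eps, n_paths=300, dt=1e-3, max_steps=1_000_000, seed=3):
    rng = np.random.default_rng(seed)
    x = np.zeros(n_paths)
    t_exit = np.zeros(n_paths)
    alive = np.ones(n_paths, dtype=bool)
    for k in range(1, max_steps + 1):
        n = int(alive.sum())
        if n == 0:
            break
        x[alive] += -x[alive] * dt + eps * np.sqrt(dt) * rng.normal(size=n)
        exited = alive & (np.abs(x) >= 1.0)
        t_exit[exited] = k * dt
        alive &= ~exited
    return float(np.mean(t_exit[t_exit > 0]))

v = {}
for eps in (0.8, 0.5):
    v[eps] = float(eps**2 * np.log(mean_exit_time(eps)))
    print(eps, round(v[eps], 2))      # both values are of the order of V_0 = 1
```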
Remark. Analyzing the proof of this theorem, we can see that the assumptions that the manifold ∂D is smooth and (b(x), n(x)) < 0 for x ∈ ∂D can be relaxed. It is sufficient to assume instead of them that the boundary of D and that of the closure of D coincide and that for any x ∈ ∂D, the trajectory x_t(x) of the dynamical system is situated in D for all t > 0.

We mention one more result relating to the distribution of the random variable τ^ε, the time of first exit from D.
Theorem 4.2. Suppose that the hypotheses of Theorem 4.1 are satisfied. Then for every α > 0 and x ∈ D we have

lim_{ε→0} P_x{ exp{ε^{−2}(V_0 − α)} < τ^ε < exp{ε^{−2}(V_0 + α)} } = 1.
Proof. If

lim sup_{ε→0} P_x{τ^ε ≥ exp{ε^{−2}(V_0 + α)}} > 0

for some α > 0, then lim sup_{ε→0} ε² ln M_x τ^ε ≥ V_0 + α, which contradicts Theorem 4.1. Therefore, for any α > 0 and x ∈ D we have

lim_{ε→0} P_x{τ^ε < exp{ε^{−2}(V_0 + α)}} = 1.   (4.7)
Further, using the notation introduced in the proof of Theorem 2.1, we can write

P_x{τ^ε < exp{ε^{−2}(V_0 − α)}} ≤ Σ_{n=1}^∞ P_x{τ_1 < τ^ε, ν = n, τ^ε < exp{ε^{−2}(V_0 − α)}} + P_x{τ^ε = τ_1}.   (4.8)

The last probability on the right side of (4.8) converges to zero. We estimate the remaining terms. Let m_ε = [C exp{ε^{−2}(V_0 − α)}]; we choose the constant C later. For x ∈ γ we have

Σ_{n=1}^∞ P_x{ν = n, τ^ε < exp{ε^{−2}(V_0 − α)}}
≤ P_x{ν ≤ m_ε} + Σ_{n=m_ε}^∞ P_x{ν = n, τ_n < exp{ε^{−2}(V_0 − α)}}
≤ P_x{ν ≤ m_ε} + P_x{τ_{m_ε} < exp{ε^{−2}(V_0 − α)}}.   (4.9)
Using the inequality P(x, ∂D) ≤ exp{−ε^{−2}(V_0 − h)}, which holds for x ∈ γ, h > 0 and sufficiently small ε, we obtain that

P_x{ν ≤ m_ε} ≤ 1 − (1 − exp{−ε^{−2}(V_0 − h)})^{m_ε} → 0   (4.10)

as ε → 0, for any C, α > 0 and h sufficiently small. We estimate the second term on the right side of (4.9). There exists θ > 0 such that P_x{τ_1 > θ} ≥ 1/2 for all x ∈ γ and ε > 0. For the number S_m of successes in m Bernoulli trials with probability of success 1/2, we have the inequality P{S_m > m/3} ≥ 1 − δ for m ≥ m_0. Since τ_{m_ε} = (τ_1 − τ_0) + (τ_2 − τ_1) + ⋯ + (τ_{m_ε} − τ_{m_ε−1}), using the strong Markov property of the process, we obtain that

P_x{τ_{m_ε} < exp{ε^{−2}(V_0 − α)}} ≤ P_x{τ_{m_ε} < m_ε/C} ≤ P{S_{m_ε} ≤ m_ε/3} ≤ δ,   (4.11)

if θ/3 ≥ 1/C and m_ε is sufficiently large. Combining estimates (4.8)-(4.11), we arrive at the relation

lim_{ε→0} P_x{τ^ε < exp{ε^{−2}(V_0 − α)}} = 0,  x ∈ D.   (4.12)

The assertion of the theorem follows from (4.7) and (4.12). □
Now we pass to the study of the behavior, as ε → 0, of the invariant measure μ^ε of X^ε_t defined by equation (1.2). For the existence of a finite invariant measure, we have to make some assumptions on the behavior of b(x) in the neighborhood of infinity. If we do not make any assumptions, then the trajectories of X^ε_t may, for example, go out to infinity with probability 1; in this case no finite invariant measure exists. We shall assume that outside a sufficiently large ball with center at the origin, the projection of b(x) onto the direction of the position vector r(x) of the point x is negative and separated from zero, i.e., there exists a large number N such that (b(x), r(x)/|r(x)|) < −1/N for |x| > N. This condition, which will be called condition A in what follows, guarantees that X^ε_t returns to a neighborhood of the origin sufficiently fast, and thus there exists an invariant measure. A proof of this can be found in Khas'minskii [1]; the same book contains more general conditions guaranteeing the existence of a finite invariant measure. If there exists an invariant measure μ^ε of X^ε_t, then it is absolutely continuous with respect to Lebesgue measure, and the density m^ε(x) = dμ^ε/dx
satisfies the stationary forward Kolmogorov equation. In our case, this equation has the form 82
2 Am`(x) - Y ax; (b'(x)m`(x)) = 0.
(4.13)
t
Together with the additional conditions f Rr m`(x) dx = 1, m`(x) > 0, this equation determines the function m`(x) uniquely.
First we consider the case of a potential field b(x): b(x) = −∇U(x). In this case, the conditions for the existence of a finite invariant measure mean that the potential U(x) increases sufficiently fast with increasing |x|; for example, faster than some linear function α|x| + β. It turns out that if b(x) has a potential, then the density of the invariant measure can be calculated explicitly. An immediate substitution into equation (4.13) shows that

  m^ε(x) = c_ε exp{−2ε⁻²U(x)},  (4.14)

where c_ε is a normalizing factor determined by the normalization condition: c_ε = (∫_{R^r} exp{−2ε⁻²U(x)} dx)⁻¹. The convergence of the integral occurring here is a necessary and sufficient condition for the existence of a finite invariant measure in the case where a potential exists.

Let D be a domain in R^r. We have μ^ε(D) = c_ε ∫_D exp{−2ε⁻²U(x)} dx. Using this representation, we can study the limit behavior of μ^ε as ε → 0. Let U(x) ≥ 0 and assume that the potential vanishes at some point O: U(O) = 0. Then it is easy to verify that

  ε² ln μ^ε(D) = ε² ln c_ε + ε² ln ∫_D exp{−2ε⁻²U(x)} dx → −inf_{x∈D} 2U(x)  (4.15)
as ε → 0. By Laplace's method, we can find a more accurate asymptotics of μ^ε(D) as ε → 0 (cf. Bernstein [1] and Nevel'son [1]). If b(x) does not have a potential, then we cannot write an explicit expression for the density of the invariant measure in general. Nevertheless, it turns out that relation (4.15) is preserved if by 2U(x) we understand the quasipotential of b(x).
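The logarithmic asymptotics (4.15) is easy to check numerically in one dimension. The following sketch is our illustration, not part of the text: the quadratic potential U(x) = x² and the domain D = [1, 2] are arbitrary choices. It computes μ^ε(D) by Riemann-sum quadrature and compares ε² ln μ^ε(D) with −inf_D 2U = −2.

```python
import math

def mu_eps(D, eps, U, lo=-10.0, hi=10.0, n=100001):
    # Riemann-sum quadrature for mu_eps(D) = c_eps * \int_D exp{-2 U(x)/eps^2} dx
    h = (hi - lo) / (n - 1)
    xs = [lo + i * h for i in range(n)]
    w = [math.exp(-2.0 * U(x) / eps ** 2) for x in xs]
    Z = sum(w) * h                              # normalizing integral, 1/c_eps
    num = sum(wi for x, wi in zip(xs, w) if D[0] <= x <= D[1]) * h
    return num / Z

U = lambda x: x * x          # potential with minimum U(0) = 0
D = (1.0, 2.0)               # a domain bounded away from the minimum
target = -2.0 * U(1.0)       # -inf_{x in D} 2U(x) = -2
for eps in (0.5, 0.3, 0.2):
    val = eps ** 2 * math.log(mu_eps(D, eps, U))
    print(eps, round(val, 3), "target:", target)
```

The printed values approach the target from below as ε decreases; the gap is the prefactor correction that Laplace's method would make precise.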
Theorem 4.3. Let the point O be the unique stable equilibrium position of system (1.1) and let the whole space R^r be attracted to O. Furthermore, assume that condition A is satisfied. Then the process X_t^ε has a unique invariant measure μ^ε for every ε > 0 and we have

  lim_{ε→0} ε² ln μ^ε(D) = −inf_{x∈D} V(O, x)  (4.16)
4. Gaussian Perturbations of Dynamical Systems
Figure 8.
for any domain D ⊂ R^r with compact boundary ∂D common for D and the closure of D, where V(O, x) is the quasipotential of b(x) with respect to O:

  V(O, x) = inf{S_{0T}(φ): φ ∈ C_{0T}(R^r), φ_0 = O, φ_T = x, T > 0}

(Fig. 8).
We outline the proof of this theorem. As we have already noted, condition A implies the existence and uniqueness of a finite invariant measure. To prove (4.16), it is sufficient to verify that for any h > 0 there exists ε_0 = ε_0(h) such that for ε < ε_0 we have the inequalities

  (a) μ^ε(D) ≥ exp{−ε⁻²(V_0 + h)},
  (b) μ^ε(D) ≤ exp{−ε⁻²(V_0 − h)},

where V_0 = inf_{x∈D} V(O, x).

If V_0 = 0, then inequalities (a) and (b) are obvious. We discuss the case V_0 > 0. It is clear that V_0 > 0 only if ρ(O, D) = ρ_0 > 0. For the proof of inequalities (a) and (b), we use the following representation of the invariant measure. As earlier, let γ and Γ be the spheres of radii ρ/2 and ρ, respectively, with center at the equilibrium position O, where ρ < ρ_0. As in the proof of Theorem 2.1, we consider the increasing sequence of Markov times τ_0, σ_0, τ_1, σ_1, τ_2, …. Condition A implies that all these times are finite with probability 1.
The sequence X_{τ_0}^ε, X_{τ_1}^ε, …, X_{τ_n}^ε, … forms a Markov chain with compact phase space γ. The transition probabilities of this chain have positive density with respect to Lebesgue measure on γ. This implies that the chain has a unique normalized invariant measure l^ε(dx). As follows, for example, from Khas'minskii [1], the normalized invariant measure μ^ε(·) of X_t^ε can be expressed in terms of the invariant measure l^ε on γ in the following way:

  μ^ε(D) = c_ε ∫_γ M_x ∫_0^{τ_1} χ_D(X_s^ε) ds l^ε(dx),  (4.17)
where χ_D(x) is the indicator of D and the factor c_ε is determined from the normalization condition μ^ε(R^r) = 1. We set τ_D = min{t: X_t^ε ∈ D ∪ ∂D}. From (4.17) we obtain

  μ^ε(D) = c_ε ∫_γ M_x ∫_0^{τ_1} χ_D(X_s^ε) ds l^ε(dx)
     ≤ c_ε max_{x∈γ} P_x{τ_D < τ_1} · max_{y∈∂D} M_y τ_1.  (4.18)

It follows from condition A and the compactness of ∂D that

  max_{y∈∂D} M_y τ_1 < ∞  (4.19)

uniformly in ε ≤ ε_0 < 1. Moreover, it follows from the proof of Theorem 2.1 that for sufficiently small ρ and ε and x ∈ γ we have

  P_x{τ_D < τ_1} ≤ exp{−ε⁻²(V_0 − h/2)}.  (4.20)
Taking account of the relation

  c_ε = (∫_γ M_x τ_1 l^ε(dx))⁻¹,

it is easy to see that for sufficiently small ρ and ε,

  |ln c_ε| ≤ h/(2ε²).  (4.21)

We conclude from estimates (4.18)–(4.21) that for sufficiently small ε we have

  ln μ^ε(D) ≤ −(V_0 − h/2)/ε² + h/(2ε²) = −(V_0 − h)/ε².
This proves inequality (b). In order to prove (a), we introduce the set D_{−β} = {x ∈ D: ρ(x, ∂D) > β}. For sufficiently small β, this set is nonempty and inf_{x∈D_{−β}} V(O, x) ≤ V_0 + h/4. If x ∈ γ and ε, ρ > 0 are sufficiently small, then

  P_x{min{t: X_t^ε ∈ D_{−β}} < τ_1} ≥ exp{−ε⁻²(V_0 + h/2)},
  inf_{x∈D_{−β}} M_x ∫_0^{τ_1} χ_D(X_s^ε) ds ≥ a² > 0.

Combining these estimates, we obtain from (4.17) that for small ε,

  μ^ε(D) ≥ c_ε min_{x∈γ} P_x{min{t: X_t^ε ∈ D_{−β}} < τ_1} · inf_{x∈D_{−β}} M_x ∫_0^{τ_1} χ_D(X_s^ε) ds
     ≥ c_ε exp{−ε⁻²(V_0 + h/2)} a².

Taking account of (4.21), from this we obtain assertion (a).
We note that if b(x) has stable equilibrium positions other than O, then the assertion of Theorem 4.3 is not true in general. The behavior of the invariant measure in the case of a field b(x) of a more general form will be considered in Ch. 6.
§5. Gaussian Perturbations of General Form

Let the function b(x, y), x ∈ R^r, y ∈ R^l, with values in R^r, satisfy a Lipschitz condition:

  |b(x_1, y_1) − b(x_2, y_2)| ≤ K(|x_1 − x_2| + |y_1 − y_2|).

We consider a random process X_t^ε = X_t^ε(x), t ≥ 0, satisfying the differential equation

  Ẋ_t^ε = b(X_t^ε, εζ_t),  X_0^ε = x,  (5.1)
where ζ_t is a Gaussian random process in R^l. We shall assume that ζ_t has zero mean and continuous trajectories with probability 1. As is known, for the continuity it is sufficient that the correlation matrix a(s, t) = (a^{ij}(s, t)), a^{ij}(s, t) = M ζ_s^i ζ_t^j, have twice differentiable entries. We write b(x) = b(x, 0). The process X_t^ε can be considered as a result of small perturbations of the dynamical system x_t = x_t(x):

  ẋ_t = b(x_t),  x_0 = x.

It was proved in Ch. 2 that X_t^ε → x_t uniformly on every finite time interval as ε → 0. In the same chapter we studied the expansion of X_t^ε in powers of the small parameter ε. In the present section we deal with large deviations of X_t^ε from the trajectories of the dynamical system x_t. For continuous functions φ_s, s ≥ 0, with values in R^l we define an operator B_x(φ) = u by the formula

  u = u_t = ∫_0^t b(u_s, φ_s) ds + x.

In other words, u_t = B_x(φ) is the solution of equation (5.1), where εζ_t is replaced by φ_t, with the initial condition u_0 = x. In terms of B_x we can write X^ε = B_x(εζ).
We shall denote by ‖·‖_C and ‖·‖_{L²} the norms in C_{0T}(R^r) and L²_{0T}(R^r), respectively.

Lemma 5.1. Suppose that the function b(x, y) satisfies a Lipschitz condition with constant K and u = B_x(φ), v = B_x(ψ), where φ, ψ ∈ C_{0T}(R^l). Then

  ‖u − v‖_C ≤ K√T e^{KT} ‖φ − ψ‖_{L²}.
Proof. By the definition of B_x, we have

  |u_t − v_t| = |∫_0^t [b(u_s, φ_s) − b(v_s, ψ_s)] ds| ≤ K ∫_0^t |u_s − v_s| ds + K ∫_0^t |φ_s − ψ_s| ds.

We conclude from this by means of Lemma 1.1 of Ch. 2 that

  ‖u − v‖_C ≤ e^{KT} K ∫_0^T |φ_s − ψ_s| ds ≤ K√T e^{KT} ‖φ − ψ‖_{L²},

the last step by the Cauchy–Schwarz inequality. □
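The bound of Lemma 5.1 can be checked numerically. In the sketch below the function b(x, y) = sin x + y is our arbitrary choice (Lipschitz with constant K = 1); the two driving functions φ, ψ are also illustrative. The solutions u = B_x(φ), v = B_x(ψ) are approximated by the Euler scheme.

```python
import math

# b(x, y) = sin(x) + y satisfies |b(x1,y1)-b(x2,y2)| <= 1*(|x1-x2| + |y1-y2|)
K, T, n = 1.0, 1.0, 20000
dt = T / n

def b(x, y):
    return math.sin(x) + y

def solve(phi):
    # Euler scheme for u_t = x + \int_0^t b(u_s, phi(s)) ds, with x = 0.5
    u, traj = 0.5, [0.5]
    for i in range(n):
        u += b(u, phi(i * dt)) * dt
        traj.append(u)
    return traj

phi = lambda s: math.cos(3 * s)
psi = lambda s: math.cos(3 * s) + 0.2 * s       # a perturbed driving function

u, v = solve(phi), solve(psi)
sup_diff = max(abs(a - c) for a, c in zip(u, v))           # ||u - v||_C
l2 = math.sqrt(sum((phi(i * dt) - psi(i * dt)) ** 2 * dt for i in range(n)))
bound = K * math.sqrt(T) * math.exp(K * T) * l2            # K sqrt(T) e^{KT} ||phi - psi||_{L2}
print(sup_diff, "<=", bound)
```

As expected, the observed sup-norm distance is well below the Gronwall-type bound, since both the exponential estimate and the Cauchy–Schwarz step have slack.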
We denote by A the correlation operator of ζ_s. It acts in L²_{0T}(R^l) by the formula

  Aφ_t = ∫_0^T a(s, t) φ_s ds.

As has been shown in the preceding chapter, the action functional of the family of processes εζ_t in L²_{0T}(R^l) as ε → 0 has the form

  S_{0T}(φ) = ½ ‖A^{−1/2}φ‖²_{L²}.  (5.2)

If A^{−1/2}φ is not defined, then we set S_{0T}(φ) = +∞. We put

  S(φ) = S_{0T}(φ) = inf_{ψ: B_x(ψ)=φ} ½ ‖A^{−1/2}ψ‖²_{L²}.

Theorem 5.1. Let X_t^ε be the random process defined by equation (5.1). The functional S_{0T}(φ) is the normalized action functional for the family of processes X^ε in C_{0T}(R^r); the normalizing coefficient is f(ε) = ε⁻².

Proof. Since B_x acts continuously from L²_{0T}(R^l) into C_{0T}(R^r) and the action functional of the family εζ in L²_{0T}(R^l) has the form (5.2), by Theorem 3.1 in Ch. 3 the action functional of the family of processes X^ε = B_x(εζ) in C_{0T}(R^r) as ε → 0 is given by the formula

  S_{0T}(φ) = inf_{ψ: B_x(ψ)=φ} ½ ‖A^{−1/2}ψ‖²_{L²}. □
EXAMPLE 5.1. Let r = l = 1 and let system (5.1) have the form

  Ẋ_t^ε = arctan(εζ_t − X_t^ε),  X_0^ε = x.

In this case B_x has an inverse: B_x^{−1}(φ)_s = tan φ̇_s + φ_s. The action functional of the family X_t^ε can be written in the following way:

  S_{0T}(φ) = ½ ∫_0^T |A^{−1/2}(tan φ̇_s + φ_s)|² ds,

where A is the correlation operator of ζ_t. For example, if ζ_t is a Wiener process, then

  S_{0T}(φ) = ½ ∫_0^T (d/ds (tan φ̇_s + φ_s))² ds = ½ ∫_0^T (φ̈_s/cos² φ̇_s + φ̇_s)² ds.
Knowing the action functional, we can determine the rate of convergence to zero of probabilities of various events connected with the perturbed system on a finite time interval and thus obtain results analogous to Theorem 1.2 (cf. Nguen Viet Fu [1], [2]). If the perturbations are of a stationary character, we may hope also to obtain results, analogous to those of §§2, 4, concerning the most probable behavior, for small ε, of the trajectories X_t^ε(x) of the perturbed system on infinite time intervals or on time intervals growing with decreasing ε. For example, let O be an asymptotically stable equilibrium position of the system ẋ_t = b(x_t), inside a domain D on the boundary of which the field b(x) is directed strictly inside D. We consider the time

  τ^ε(x) = min{t ≥ 0: X_t^ε(x) ∉ D}

of first exit from D. We may try to prove the following analogue of Theorem 4.1:

  lim_{ε→0} ε² ln M τ^ε(x) = V_0 = inf{S_{0T}(φ): φ_0 = O, φ_T ∈ ∂D, T > 0}.  (5.3)

Nevertheless, it becomes clear immediately that this is not so simple. First of all, an analysis of the presupposed plan of proof shows that the role of the limit of ε² ln M τ^ε(x) may also presumably be played by

  V_0* = inf{S_{T_1 T}(φ): φ_{T_1} = O, φ_T ∈ ∂D, −∞ < T_1 < T < ∞},

where the infimum is taken over trajectories on intervals of arbitrary location, not only of the form [0, T]. In the case of Markov perturbations, V_0 and V_0* obviously coincide, but in the non-Markov case they may not. Moreover, in the proof of Theorems 2.1, 4.1, and 4.2, we have used a construction involving cycles, dividing a trajectory of the Markov process X_t^ε into parts, the dependence among which could be accounted for and turned out to be sufficiently small.
For an arbitrary stationary perturbation εζ_t, we do not have anything similar: we have to impose on the stationary process ζ_t conditions ensuring the weakening of dependence as time goes on. Since we are dealing with probabilities converging to zero (probabilities of large deviations), the strong mixing property

  sup{|P(A ∩ B) − P(A)P(B)|: A ∈ F_{≤t}, B ∈ F_{≥t+τ}} → 0 as τ → ∞,

where F_{≤t} and F_{≥t+τ} are the σ-algebras generated by ζ_s for s ≤ t and s ≥ t + τ, respectively, turns out to be insufficient; we need more precise conditions. These problems are considered in Grin's works [1], [2]; in particular, for a certain class of processes X_t^ε, the infima V_0 and V_0* coincide and (5.3) is satisfied.
Chapter 5
Perturbations Leading to Markov Processes
§1. Legendre Transformation

In this chapter we shall consider theorems on the asymptotics of probabilities of large deviations for Markov random processes. These processes can be viewed as generalizations of the scheme of summing independent random variables; the constructions used in the study of large deviations for Markov processes generalize constructions encountered in the study of sums of independent terms.
The first general limit theorems for probabilities of large deviations of sums of independent random variables are contained in Cramer's paper [1]. The basic assumption there is the finiteness of exponential moments; the
results can be formulated in terms of the Legendre transforms of some convex functions connected with the exponential moments of the random variables. The families of random processes we are going to consider are analogues of the schemes of sums of random variables with finite exponential moments,
so that Legendre's transformation turns out to be essential in our case as well. First we consider this transformation and its application to families of measures in finite-dimensional spaces. Let H(α) be a function of an r-dimensional vector argument, assuming its values in (−∞, +∞] and not identically equal to +∞. Suppose that H(α) is convex and lower semicontinuous. (We note that the condition of semicontinuity, and even continuity, is satisfied automatically for all α with the exception of the boundary of the set {α: H(α) < ∞}.) To this function the Legendre transformation assigns the function defined by the formula

  L(β) = sup_α [(α, β) − H(α)],  (1.1)

where (α, β) = Σ_{i=1}^r α_i β^i is the scalar product. It is easy to prove that L is again a function of the same class as H, i.e., it is convex, lower semicontinuous, assumes values in (−∞, +∞] and is not identically equal to +∞. The following properties of Legendre's transformation can be found in Rockafellar's book [1]. Legendre's transformation is its own inverse:

  H(α) = sup_β [(α, β) − L(β)]  (1.2)
(Rockafellar [1], Theorem 12.2). The functions L and H coupled by relations (1.1) and (1.2) are said to be conjugate, which we shall denote in the following way: H(α) ↔ L(β). At points α_0 interior to the set {α: H(α) < ∞} with respect to its affine hull, H is subdifferentiable, i.e., it has a (generally nonunique) subgradient, a vector β_0 such that for all α,

  H(α) ≥ H(α_0) + (α − α_0, β_0)  (1.3)

(Rockafellar [1], Theorem 23.4; geometrically speaking, a subgradient is the angular coefficient of a nonvertical supporting plane of the set of points above the graph of the function). The multi-valued mapping assigning to every point the set of subgradients of the function H at that point is the inverse of the same mapping for L, i.e., (1.3) holding for all α is equivalent to the inequality

  L(β) ≥ L(β_0) + (α_0, β − β_0)  (1.4)

holding for all β (Rockafellar [1], Theorem 23.5, Corollary 23.5.1). We have L(β) → ∞ as |β| → ∞ if and only if H(α) < ∞ in some neighborhood of α = 0. For functions H, L which are smooth inside their domains of finiteness, the determination of the conjugate function reduces to the classical Legendre
transformation: we have to find the solution α = α(β) of the equation ∇H(α) = β, and L(β) is determined from the formula

  L(β) = (α(β), β) − H(α(β));  (1.5)

moreover, we have α(β) = ∇L(β). If one of two functions conjugate to each other is continuously differentiable n ≥ 2 times and its matrix of second-order derivatives is positive definite, then the other function has the same smoothness, and the matrices of the second-order derivatives at corresponding points β_0 = ∇H(α_0) are inverses of each other:

  (∂²H/∂α_i ∂α_j (α_0)) = (∂²L/∂β_i ∂β_j (β_0))⁻¹.

EXAMPLE 1.1. Let H(α) = r(e^α − 1) + l(e^{−α} − 1), α ∈ R¹, where r, l > 0. Solving the equation H′(α) = re^α − le^{−α} = β, we find that

  α(β) = ln[(β + √(β² + 4rl))/(2r)],

and

  L(β) = β ln[(β + √(β² + 4rl))/(2r)] − √(β² + 4rl) + r + l.
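The closed form of Example 1.1 can be verified against a brute-force numerical Legendre transform; the particular values of r and l below are arbitrary.

```python
import math

r, l = 1.5, 0.7

def H(a):
    # H(alpha) = r(e^alpha - 1) + l(e^{-alpha} - 1)
    return r * (math.exp(a) - 1.0) + l * (math.exp(-a) - 1.0)

def L_closed(b):
    s = math.sqrt(b * b + 4 * r * l)
    return b * math.log((b + s) / (2 * r)) - s + r + l

def L_numeric(b, lo=-10.0, hi=10.0, n=400001):
    # sup_alpha [alpha*b - H(alpha)] over a fine grid of alpha
    h = (hi - lo) / (n - 1)
    return max(a * b - H(a) for a in (lo + i * h for i in range(n)))

for b in (-1.0, 0.0, 0.5, 2.0):
    assert abs(L_closed(b) - L_numeric(b)) < 1e-6
print("closed form matches the numeric Legendre transform")
```

The grid search suffices here because the maximizer α(β) stays well inside [−10, 10] for the tested values of β.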
It turns out that the rough asymptotics of families of probability measures in R^r can be connected with the Legendre transform of the logarithm of exponential moments. The following two theorems are borrowed from Gärtner [2], [3] with some changes. Let μ^h be a family of probability measures in R^r and put

  H^h(α) = ln ∫_{R^r} exp{(α, x)} μ^h(dx).

The function H^h is convex: by the Hölder inequality, for 0 < c < 1 we have

  H^h(cα_1 + (1 − c)α_2) = ln ∫_{R^r} exp{c(α_1, x)} exp{(1 − c)(α_2, x)} μ^h(dx)
    ≤ ln [(∫_{R^r} exp{(α_1, x)} μ^h(dx))^c (∫_{R^r} exp{(α_2, x)} μ^h(dx))^{1−c}]
    = cH^h(α_1) + (1 − c)H^h(α_2);
H^h is lower semicontinuous (this can be proved easily by means of Fatou's lemma), assumes values in (−∞, +∞] and is not identically equal to +∞, since H^h(0) = 0. Let λ(h) be a numerical-valued function converging to +∞ as h ↓ 0. We assume that the limit

  H(α) = lim_{h↓0} λ(h)⁻¹ H^h(λ(h)α)  (1.6)

exists for all α. This function is also convex and H(0) = 0. We stipulate that it be lower semicontinuous, nowhere assume the value −∞, and be finite in some neighborhood of α = 0. Let L(β) ↔ H(α).

Theorem 1.1. For the family of measures μ^h and the functions λ and L, condition (II) of §3, Ch. 3 holds, i.e., for any δ > 0, γ > 0, s > 0 there exists h_0 > 0 such that for all h ≤ h_0 we have

  μ^h{y: ρ(y, Φ(s)) ≥ δ} ≤ exp{−λ(h)(s − γ)},  (1.7)

where Φ(s) = {β: L(β) ≤ s}.
Proof. The set Φ(s) can be represented as an uncountable intersection of half-spaces:

  Φ(s) = ∩_α {β: (α, β) − H(α) ≤ s}.

This set is compact, because L is lower semicontinuous and converges to +∞ at infinity. We consider the boundary

  ∂Φ_{+δ}(s) = {y: ρ(y, Φ(s)) = δ}

of the δ-neighborhood of Φ(s). For every point y of this compact set, there exists α such that (α, y) − H(α) > s. Hence the open half-spaces {y: (α, y) − H(α) > s} cover the compact set ∂Φ_{+δ}(s). From these α we choose a finite number α_1, …, α_n. We obtain that the convex polyhedron

  ∩_{i=1}^n {y: (α_i, y) − H(α_i) ≤ s}

contains Φ(s) and does not intersect ∂Φ_{+δ}(s). This implies that the polyhedron lies in the δ-neighborhood of Φ(s). Using Chebyshev's exponential inequality, we obtain the estimate

  μ^h{y: ρ(y, Φ(s)) ≥ δ} ≤ μ^h(∪_{i=1}^n {y: (α_i, y) − H(α_i) > s})
    ≤ Σ_{i=1}^n μ^h{y: (α_i, y) − H(α_i) > s}
    ≤ Σ_{i=1}^n ∫_{R^r} exp{λ(h)[(α_i, y) − H(α_i) − s]} μ^h(dy)
    = Σ_{i=1}^n exp{λ(h)[λ(h)⁻¹H^h(λ(h)α_i) − H(α_i)]} × exp{−λ(h)s}.

We obtain (1.7) from this by taking account of (1.6). □
We shall say that a convex function L is strictly convex at a point β_0 if there exists α_0 such that

  L(β) > L(β_0) + (α_0, β − β_0)  (1.8)

for all β ≠ β_0. For a function L to be strictly convex at all points interior to the set {β: L(β) < ∞} with respect to its affine hull (with the notation of Rockafellar's book [1], §§4, 6, at the points of the set ri(dom L)), it is sufficient that the function H conjugate to L be sufficiently smooth, i.e., that the set {α: H(α) < ∞} have interior points, that H be differentiable at them, and that |∇H(α_i)| → ∞ whenever a sequence of points α_i converges to a boundary point of the set {α: H(α) < ∞} (cf. Rockafellar [1], Theorem 26.3).
Theorem 1.2. Let the assumptions imposed on μ^h, H^h and H earlier be satisfied. Moreover, let the function L be strictly convex at the points of a dense subset of {β: L(β) < ∞}. Then for the family of measures μ^h and the functions λ and L, condition (I) of §3 of Ch. 3 is satisfied, i.e., for any δ > 0, γ > 0 and x ∈ R^r there exists h_0 > 0 such that for h ≤ h_0 we have

  μ^h{y: ρ(y, x) < δ} ≥ exp{−λ(h)[L(x) + γ]}.  (1.9)
Proof. It is sufficient to prove the assertion of the theorem for points x at which L is strictly convex. Indeed, the fulfillment of the assertion for such x is equivalent to its fulfillment for all x with the same function λ and the function L̃(x) defined as L(x) if L is strictly convex at x and as +∞ otherwise. At the points where L(x) < L̃(x), we have to use the circumstance that L(x) = lim_{y→x} L̃(y) (the limit over points y at which L is strictly convex), and the remark made in §3 of Ch. 3.

Suppose that L is strictly convex at x. We choose α_0 so that L(β) > L(x) + (α_0, β − x) for β ≠ x. Then we have

  H(α_0) = sup_β [(α_0, β) − L(β)] = (α_0, x) − L(x).  (1.10)

Since H(α_0) is finite, H^h(λ(h)α_0) is also finite for sufficiently small h. For such h we consider the probability measure μ^{h,α_0} defined by the relation

  μ^{h,α_0}(Γ) = ∫_Γ exp{λ(h)(α_0, y) − H^h(λ(h)α_0)} μ^h(dy).

We use the mutual absolute continuity of μ^h and μ^{h,α_0}:

  μ^h{y: ρ(y, x) < δ} = ∫_{ρ(y,x)<δ} exp{−λ(h)(α_0, y) + H^h(λ(h)α_0)} μ^{h,α_0}(dy).  (1.11)

We put δ′ = δ ∧ γ/(3|α_0|) and estimate the integral (1.11) from below by the product of the μ^{h,α_0}-measure of the δ′-neighborhood of x with the infimum of the function under the integral sign:

  μ^h{y: ρ(y, x) < δ} ≥ μ^{h,α_0}{y: ρ(y, x) < δ′}
    × exp{−λ(h)[(α_0, x) − λ(h)⁻¹H^h(λ(h)α_0)]} × exp{−λ(h)γ/3}.

By (1.6) and (1.10), the second factor here is not smaller than

  exp{−λ(h)[L(x) + γ/3]}

if h is sufficiently small. If we prove that μ^{h,α_0}{y: ρ(y, x) < δ′} → 1 as h ↓ 0, then everything will be proved.
For this we apply Theorem 1.1 to the family of measures μ^{h,α_0}, h > 0. We calculate the characteristics of this family:

  H^{h,α_0}(α) = ln ∫_{R^r} exp{(α, y)} μ^{h,α_0}(dy) = H^h(α + λ(h)α_0) − H^h(λ(h)α_0);
  H^{α_0}(α) = lim_{h↓0} λ(h)⁻¹ H^{h,α_0}(λ(h)α) = H(α_0 + α) − H(α_0);
  L^{α_0}(β) = L(β) − [(α_0, β) − H(α_0)].

The function L^{α_0}(β) vanishes at β = x and is nonnegative everywhere (since H^{α_0}(0) = 0). L^{α_0} is strictly convex at x since L is. This implies that L^{α_0}(β) is strictly positive for all β ≠ x and

  γ_0 = min{L^{α_0}(β): ρ(β, x) ≥ δ′/2} > 0.

We use estimate (1.7) with δ′/2 instead of δ, positive γ < γ_0 and s ∈ (γ, γ_0). We obtain for sufficiently small h that

  μ^{h,α_0}{y: ρ(y, x) ≥ δ′} ≤ μ^{h,α_0}{y: ρ(y, Φ^{α_0}(s)) ≥ δ′/2} ≤ exp{−λ(h)(s − γ)},

which converges to zero as h ↓ 0. □
Consequently, if the hypotheses of Theorems 1.1 and 1.2 are satisfied, then λ(h)L(x) is the action function for the family of measures μ^h as h ↓ 0.
The following example shows that the requirement of strict convexity of L on a set dense in {β: L(β) < ∞} cannot be omitted.

EXAMPLE 1.2. For the family of Poisson distributions (μ^h(Γ) = Σ_{k∈Γ} h^k e^{−h}/k!) we have H^h(α) = h(e^α − 1). If we are interested in values of h converging to zero and put λ(h) = −ln h, then we obtain

  H(α) = lim_{h↓0} (−ln h)⁻¹ H^h(−α ln h) = lim_{h↓0} h(h^{−α} − 1)/(−ln h) = { 0 for α ≤ 1, +∞ for α > 1;

  L(β) = { β for β ≥ 0, +∞ for β < 0.

But the normalized action function found by us in §3, Ch. 3 is different from +∞ only for nonnegative integral values of the argument.

Another example: we take a continuous finite function S(x) which is not convex and is such that S(x)/|x| → ∞ as |x| → ∞ and min S(x) = 0. As μ^h we take the probability measure with density C(h) exp{−λ(h)S(x)}, where λ(h) → ∞ as h ↓ 0. Here the normalized action function will be S(x), but the Legendre transform of the function

  H(α) = lim_{h↓0} λ(h)⁻¹ ln ∫ exp{λ(h)(α, x)} μ^h(dx)

will be equal not to S(x) but rather to the convex hull L(x) of S(x). In those domains where S is not convex, L will be linear and consequently not strictly convex.
The following examples show how the theorems proved above can be applied to obtain rough limit theorems on large deviations for sums of independent random variables. Of course, they can also be derived from the precise results of Cramér [1] (at least in the one-dimensional case).

EXAMPLE 1.3. Let ξ_1, ξ_2, …, ξ_n, … be a sequence of independent identically distributed random vectors and let

  H_0(α) = ln M e^{(α,ξ_1)}

be finite for sufficiently small |α|. We are interested in the rough asymptotics as n → ∞ of the distribution of the arithmetic mean (ξ_1 + ⋯ + ξ_n)/n. We have

  H^n(α) = ln M exp{(α, (ξ_1 + ⋯ + ξ_n)/n)} = nH_0(n⁻¹α).

We put λ(n) = n. Then λ(n)⁻¹H^n(λ(n)α) not only converges to H_0(α) but coincides with it.

The function H_0 is infinitely differentiable at interior points of the set {α: H_0(α) < ∞}. If it also satisfies the condition |∇H_0(α_i)| → ∞ as the points α_i converge to a boundary point of the above set, then its Legendre transform L_0 is strictly convex and the asymptotics of the distribution of the arithmetic mean is given by the action function nL_0(x).

EXAMPLE 1.4. Under the hypotheses of the preceding example, we consider
the distributions of the random vectors (ξ_1 + ⋯ + ξ_n − nMξ_1)/B_n, where B_n is a sequence going to ∞ faster than √n but slower than n. We have

  H^n(α) = ln M exp{(α, (ξ_1 + ⋯ + ξ_n − nMξ_1)/B_n)}
    = n[H_0(B_n⁻¹α) − (B_n⁻¹α, ∇H_0(0))]
    = n[½ Σ_{i,j} (∂²H_0/∂α_i ∂α_j)(0) α_i α_j B_n⁻² + o(B_n⁻²)].

If as the normalizing coefficient λ(n) we take B_n²/n, then we obtain

  H(α) = lim_{n→∞} λ(n)⁻¹ H^n(λ(n)α) = ½ Σ_{i,j} (∂²H_0/∂α_i ∂α_j)(0) α_i α_j.

If the matrix of this quadratic form, i.e., the covariance matrix of the random vector ξ_1, is nonsingular, then the Legendre transform of H has the form

  L(β) = ½ Σ_{i,j} a_{ij} β^i β^j,

where (a_{ij}) = ((∂²H_0/∂α_i ∂α_j)(0))⁻¹. The action function for the family of the random vectors under consideration is (B_n²/n)L(x). In particular, this means that

  lim_{δ↓0} lim_{n→∞} (n/B_n²) ln P{|(ξ_1 + ⋯ + ξ_n − nMξ_1)/B_n − x| < δ} = −L(x).
§2. Locally Infinitely Divisible Processes

Discontinuous Markov processes which can be considered as a result of random perturbations of dynamical systems arise in various problems. We consider an example.

Let two nonnegative functions l(x) and r(x) be given on the real line. For every h > 0 we consider the following Markov process X_t^h on those points of the real line which are multiples of h: if the process begins at a point x, then over time dt it jumps distance h to the right with probability h⁻¹r(x) dt (up to infinitesimals of higher order as dt → 0) and to the left with probability h⁻¹l(x) dt (it jumps more than once with probability o(dt)). For small h, in first approximation the process can be described by the differential equation ẋ_t = r(x_t) − l(x_t) (the exact meaning of this is as follows: under certain assumptions on r and l it can be proved that as h ↓ 0, X_t^h converges in probability to the solution of the differential equation with the same initial condition). A more concrete version of this example is as follows: in a culture medium of volume V there are bacteria whose rates c_+ and c_− of division and death depend on the concentration of bacteria in the given volume. An appropriate mathematical model of the process of the variation of concentration of bacteria with time is a Markov process X_t^h of the form described above with h = V⁻¹, r(x) = x·c_+(x) and l(x) = x·c_−(x).

It is natural to consider the process X_t^h as a result of a random perturbation of the differential equation ẋ_t = r(x_t) − l(x_t) (a result of a small random perturbation for small h). As in the case of perturbations of the type of a white noise, we may be interested in probabilities of events of the form {ρ_{0T}(X^h, φ) < δ}, etc. (probabilities of "large deviations"). As we have already mentioned, the first approximation of X^h for small h is the solution of the differential equation; the second approximation will be a diffusion process with drift r(x) − l(x) and small local variance h(r(x) + l(x)).
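The law-of-large-numbers approximation by the differential equation is easy to see in simulation. The sketch below is our illustration: the rates r(x) = 2x and l(x) = x² are an arbitrary choice, for which the limiting equation ẋ = 2x − x² has a stable equilibrium at x = 2; the process is generated by the standard Gillespie algorithm.

```python
import random

random.seed(1)

# Illustrative birth-and-death rates (our choice): the limiting ODE
# xdot = r(x) - l(x) = 2x - x^2 has a stable equilibrium at x = 2.
r = lambda x: 2.0 * x
l = lambda x: x * x

def simulate(h, x0=0.5, T=5.0):
    # Gillespie algorithm: total jump intensity (r(x) + l(x))/h,
    # jump +h with probability r/(r+l), jump -h otherwise.
    x, t = x0, 0.0
    while t < T:
        total = (r(x) + l(x)) / h
        if total == 0.0:
            break                      # absorbed at x = 0
        t += random.expovariate(total)
        if random.random() < r(x) / (r(x) + l(x)):
            x += h
        else:
            x -= h
    return x

for h in (0.2, 0.05, 0.01):
    xs = [simulate(h) for _ in range(100)]
    print(h, round(sum(xs) / len(xs), 3))   # approaches the equilibrium 2
```

The empirical spread of X_T^h around the deterministic value shrinks like √h, in accordance with the diffusion approximation with local variance h(r(x) + l(x)).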
Nevertheless, this approximation does not work for large deviations: as we shall see, the probabilities of large deviations for the family of processes X_t^h can be described by means of an action functional not coinciding with the action functional of the diffusion processes.

We describe a general scheme which includes the above example. In r-space R^r let us be given: a vector-valued function b(x) = (b¹(x), …, b^r(x)); a matrix-valued function (a^{ij}(x)) (of order r, symmetric and nonnegative definite); and, for every x ∈ R^r, a measure μ_x on R^r∖{0} such that

  ∫_{R^r∖{0}} |β|² μ_x(dβ) < ∞.

For every h > 0 let (X_t^h, P_x^h) be a Markov process in R^r with right continuous trajectories and infinitesimal generator A^h defined for twice continuously differentiable functions with compact support by the formula

  A^h f(x) = Σ_i b^i(x) ∂f(x)/∂x^i + (h/2) Σ_{i,j} a^{ij}(x) ∂²f(x)/∂x^i ∂x^j
    + h⁻¹ ∫_{R^r∖{0}} [f(x + hβ) − f(x) − h Σ_i β^i ∂f(x)/∂x^i] μ_x(dβ).  (2.1)

If a^{ij}(x) ≡ 0 and μ_x is finite for all x, then X^h moves in the following way: it jumps a finite number of times over any finite time, and the density of jumps at x is h⁻¹μ_x(R^r∖{0}) (i.e., if the process is near x, then over time dt it makes a jump with probability h⁻¹μ_x(R^r∖{0}) dt up to infinitesimals of higher order as dt → 0); the distribution of the magnitude of a jump is given by the measure μ_x(R^r∖{0})⁻¹ μ_x(h⁻¹ dβ); between jumps the process moves in accordance with the dynamical system ẋ_t = b̃(x_t), where

  b̃(x) = b(x) − ∫_{R^r∖{0}} β μ_x(dβ).

On the other hand, if μ_x(R^r∖{0}) = ∞, then the process jumps infinitely many times over any finite time.

The process considered above is a special case of our scheme with r = 1, the measure μ_x concentrated at the points ±1, μ_x{1} = r(x), μ_x{−1} = l(x), and b(x) = r(x) − l(x).
If the measure μ_x is zero for every x, then the integral term in formula (2.1) vanishes and A^h turns into a differential operator of the second order. In this case (X_t^h, P_x^h) is a family of diffusion processes with a small diffusion coefficient, and the corresponding trajectories are continuous with probability one. In the general case (X_t^h, P_x^h) combines a continuous diffusion motion and jumps.

The scheme introduced by us is a generalization of the scheme of processes with independent increments, the continuous version of the scheme of sums of independent random variables. It is known that in the study of large deviations for sums of independent random variables, an important role is played by the condition of finiteness of exponential moments (cf., for example, Cramér [1]). We introduce this condition for our scheme as well: we shall assume that for all α = (α_1, …, α_r), the expression

  H(x, α) = Σ_i b^i(x)α_i + ½ Σ_{i,j} a^{ij}(x)α_i α_j + ∫_{R^r∖{0}} (e^{(α,β)} − 1 − (α, β)) μ_x(dβ)  (2.2)

is finite. The function H is convex and analytic in the second argument. It vanishes at α = 0.

The connection of H with the Markov process (X_t^h, P_x^h) can be described in the following way: if we apply the operator A^h defined by formula (2.1) to the function exp{Σ_i α_i x^i}, then we obtain h⁻¹H(x, hα) exp{Σ_i α_i x^i}. We denote by L(x, β) the Legendre transform of H(x, α) with respect to the second variable. The equality H(x, 0) = 0 implies that L is nonnegative; it vanishes at β = b(x). The function L may assume the value +∞; however, inside the domain where it is finite, L is smooth.

For the example considered above we have H(x, α) = r(x)(e^α − 1) + l(x)(e^{−α} − 1), and the function L has the form indicated in the preceding section, with r(x) and l(x) replacing r and l.
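The identity A^h e^{(α,x)} = h⁻¹H(x, hα) e^{(α,x)} can be verified directly for the birth-and-death example, where the generator reduces to A^h f(x) = h⁻¹[r(x)(f(x + h) − f(x)) + l(x)(f(x − h) − f(x))]. The rate functions below are arbitrary illustrative choices.

```python
import math

# Illustrative rates (our choice):
r = lambda x: 2.0 + math.sin(x)
l = lambda x: 1.0 + x * x

def Ah(f, x, h):
    # generator of the birth-and-death process on the h-lattice
    return (r(x) * (f(x + h) - f(x)) + l(x) * (f(x - h) - f(x))) / h

def H(x, a):
    # H(x, alpha) = r(x)(e^alpha - 1) + l(x)(e^{-alpha} - 1)
    return r(x) * (math.exp(a) - 1.0) + l(x) * (math.exp(-a) - 1.0)

h, a = 0.05, 0.8
f = lambda x: math.exp(a * x)
for x in (-1.0, 0.0, 2.0):
    lhs = Ah(f, x, h)
    rhs = H(x, h * a) / h * f(x)
    assert abs(lhs - rhs) < 1e-9 * abs(rhs)
print("A^h e^{ax} = h^{-1} H(x, ha) e^{ax} verified")
```

The identity is exact (the check only incurs floating-point roundoff), which is what makes the exponential change of measure in the proof of Theorem 2.1 below work.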
For a function φ_t, T_1 ≤ t ≤ T_2, with values in R^r, we define a functional by the formula

  S(φ) = S_{T_1 T_2}(φ) = ∫_{T_1}^{T_2} L(φ_t, φ̇_t) dt,  (2.3)

if φ is absolutely continuous and the integral is convergent; otherwise we put S_{T_1 T_2}(φ) = +∞. This functional will be a normalized action functional (and the normalizing coefficient will be h⁻¹). In particular, if the measure μ_x is zero and (a^{ij}) is the identity matrix, then, as follows from results of Ch. 4, the action functional has the form (2.3), where L(φ, φ̇) = ½|φ̇ − b(φ)|². In Wentzell and Freidlin [4] the action functional is computed for a family of diffusion processes with an arbitrary matrix (a^{ij}) (cf. the next section).
Families of infinitely divisible processes belonging to our scheme have been considered in Borovkov [1]. We return to these classes of random processes in the next section. Now we formulate a result which generalizes results of both Wentzell and Freidlin [4] and Borovkov [1]. In order that h⁻¹S_{0T}(φ) be the action functional for the family of processes (X_t^h, P_x^h), it is, of course, necessary to impose some restrictions on this family. We formulate them in terms of the functions H and L.

I. There exists an everywhere finite nonnegative convex function H̄(α) such that H̄(0) = 0 and H(x, α) ≤ H̄(α) for all x, α.

II. The function L(x, β) is finite for all values of the arguments; for any R > 0 there exist positive constants M and m such that L(x, β) ≤ M, |∇_β L(x, β)| ≤ M, Σ_{i,j} (∂²L/∂β^i ∂β^j)(x, β) c^i c^j ≥ m Σ_i (c^i)² for all x, c ∈ R^r and all β, |β| ≤ R.

The following requirement is between the simple continuity of H and L in the first argument, which is insufficient for our purposes, and uniform continuity, which is not satisfied even in the case of diffusion processes, i.e., for functions H and L quadratic in the second argument.

III. Δ_L(δ′) = sup_{|y−y′| ≤ δ′} sup_β [L(y′, β) − L(y, β)]/(1 + L(y, β)) → 0 as δ′ → 0.

Condition III implies the following continuity condition for H:

  H(y, (1 + Δ_L(δ′))⁻¹α) − H(y′, α) ≤ Δ_L(δ′)(1 + Δ_L(δ′))⁻¹  (2.4)

for all α and |y − y′| ≤ δ′, where Δ_L(δ′) → 0 as δ′ ↓ 0.
Theorem 2.1. Suppose that the functions H and L satisfy conditions I–III and the functional S(φ) is defined by formula (2.3). Then h⁻¹S(φ) is the action functional for the family of processes (X_t^h, P_x^h) in the metric

  ρ_{0T}(φ, ψ) = sup_{0≤t≤T} |φ_t − ψ_t|,

uniformly in the initial point, as h ↓ 0.

The proof (under somewhat more relaxed conditions) is contained in Wentzell [7] (see also Wentzell [9]).

First of all we have to prove that S_{0T} is lower semicontinuous and the elements of the set {φ: S_{0T}(φ) ≤ s} are equicontinuous. We do not include this purely analytic part of the proof here; we mention that similar results can be found in the book by Ioffe and Tikhomirov [1], Ch. 9, §1. We mention a lemma, also without proof, having to do with the properties of S_{0T} (for a proof, see Wentzell [7], [9]).
Lemma 2.1. Let condition III be satisfied. For every s_0 > 0 and γ > 0 there exists Δt > 0 such that for any partition of the interval from 0 to T by points 0 = t_0 < t_1 < ⋯ < t_{n−1} < t_n = T with max(t_{i+1} − t_i) ≤ Δt, and for any function φ_t, 0 ≤ t ≤ T, for which S_{0T}(φ) ≤ s_0, we have S_{0T}(l) ≤ S_{0T}(φ) + γ, where l is the polygon with vertices at the points (t_i, φ_{t_i}).
Now it has to be proved that for any δ > 0, γ > 0 and s_0 > 0 there exists a positive h_0 such that for all h ≤ h_0 we have

  P_x^h{ρ_{0T}(X^h, φ) < δ} ≥ exp{−h⁻¹[S_{0T}(φ) + γ]},  (2.5)

where φ is an arbitrary function such that S_{0T}(φ) ≤ s_0, φ_0 = x, and that

  P_x^h{ρ_{0T}(X^h, Φ_x(s)) ≥ δ} ≤ exp{−h⁻¹(s − γ)},  (2.6)

where Φ_x(s) = {φ: φ_0 = x, S_{0T}(φ) ≤ s}, x is any point of R^r, and s ≤ s_0.
where'. (s) = {(P: 90 = x, SOT(4) < s) and x is any point from R' ands < so. For the sake of simple notation we give the proof in the one-dimensional case. First we mention a few facts used in both parts of the proof. Up to the term J o b(X') ds of bounded variation, the process X, is a square integrable martingale. Therefore, the stochastic integral with respect to X"
is meaningful. Let f (t, (o), 0 < t < T be a random function bounded in absolute value by a constant C and progressively measurable with respect to the family of a-algebras .W5t = a{X5, s < t) (i.e., for any t, the function f (s, w) for s < t is measurable in the pair (s, co) with respect to the product of the a-algebra of Borel subsets of the interval [0, t] and the a-algebra ,F5t). Then we can define the stochastic integral T
\[
\int_0^T f(t - 0, \omega)\, dX^h_t
\tag{2.7}
\]
(cf. Kunita and Watanabe [1]). We need to integrate only random functions piecewise continuous in square mean; for them the integral (2.7) can be defined as the limit, in the sense of convergence in probability, of the Riemann sums
\[
\sum_{j=0}^{n-1} f(t_j, \omega)\,(X^h_{t_{j+1}} - X^h_{t_j})
\]
as the partition 0 = t₀ < t₁ < ⋯ < t_n = T becomes infinitely fine. If the realizations of X^h_t have bounded variation and the realizations of f(t, ω) have one-sided limits at every point, then the integral (2.7) can be viewed as an ordinary Lebesgue–Stieltjes integral of the left-hand limit f(t − 0, ω) of the function f(t, ω).
5. Perturbations Leading to Markov Processes
We need the following fact:
\[
M_x \exp\Bigl\{\int_0^T f(t - 0, \omega)\, dX^h_t - h^{-1}\int_0^T H(X^h_t, hf(t, \omega))\, dt\Bigr\} = 1
\tag{2.8}
\]
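Identity (2.8) can be checked by simulation in the simplest Gaussian case X^h_t = √h·w_t (so that H(x, α) = α²/2) with a constant integrand f ≡ c; the exponent then reduces to c√h·w_T − hc²T/2. A Monte Carlo sketch, with arbitrary parameter values:

```python
import math, random

random.seed(0)
h, c, T, n = 0.1, 1.0, 1.0, 200000
total = 0.0
for _ in range(n):
    w_T = random.gauss(0.0, math.sqrt(T))   # X^h_T = sqrt(h) * w_T
    # exponent of (2.8): int f dX^h = c*sqrt(h)*w_T,  h^{-1} int H(., hf) dt = h*c^2*T/2
    total += math.exp(c * math.sqrt(h) * w_T - h * c * c * T / 2.0)
mean = total / n
print(mean)   # close to 1
```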
(this can be derived from Itô's formula for stochastic integrals with respect to martingales, cf. Kunita and Watanabe [1]). The proof of (2.5) is first carried out for piecewise smooth functions φ. As in the case of diffusion, we use an absolutely continuous change of measures (the idea of using such a change of measures is due to Cramér [1]). We put α(t, x) = (∂L/∂β)(x, φ̇_t). By condition II and the piecewise smoothness of φ, the function α is bounded. Then we put
\[
\tilde P^h_x(A) = M_x\Bigl[A;\ \exp\Bigl\{h^{-1}\int_0^T \alpha(t - 0, X^h_{t-0})\, dX^h_t - h^{-1}\int_0^T H(X^h_t, \alpha(t, X^h_t))\, dt\Bigr\}\Bigr]
\]
for any event A ∈ ℱ_{≤T}. According to (2.8), P̃^h_x is a probability measure. With respect to this measure, X^h_t turns out to be a Markov process again (cf. Gikhman and Skorokhod [2], Ch. VII, §6), but now nonhomogeneous in time. Its infinitesimal generator is given by the formula (cf. Grigelionis [1])
\[
\tilde A^h_t f(x) = \tilde b(t, x) f'(x) + \frac{h\,a(x)}{2} f''(x) + h^{-1}\int_{R^1\setminus\{0\}} [f(x + h\beta) - f(x) - h\beta f'(x)]\, \tilde\mu_{t,x}(d\beta),
\]
where h·a(x) is the former diffusion coefficient, and the drift b̃ and the measure μ̃ can be expressed in terms of the former ones by the formulas
\[
\tilde b(t, x) = \frac{\partial H}{\partial \alpha}(x, \alpha(t, x)), \qquad \tilde\mu_{t,x}(d\beta) = \exp\{\alpha(t, x)\beta\}\,\mu_x(d\beta).
\]
By virtue of our choice of α(t, x) = (∂L/∂β)(x, φ̇_t), the coefficient b̃ is independent of x and equal to φ̇_t. This means, in particular, that with respect to P̃^h_x, the process X^h_t − φ_t is a martingale (with vanishing mean, since φ₀ = x) and its variance is given by the formula
\[
\tilde M^h_x (X^h_t - \varphi_t)^2 = h\, \tilde M^h_x \int_0^t \frac{\partial^2 H}{\partial \alpha^2}(X^h_s, \alpha(s, X^h_s))\, ds.
\tag{2.9}
\]
Since the density dP̃^h_x/dP^h_x is positive, the measure P^h_x can be expressed by integration with respect to P̃^h_x:
\[
P^h_x(A) = \tilde M^h_x\Bigl[A;\ \exp\Bigl\{h^{-1}\Bigl[-\int_0^T \alpha(t - 0, X^h_{t-0})\, dX^h_t + \int_0^T H(X^h_t, \alpha(t, X^h_t))\, dt\Bigr]\Bigr\}\Bigr].
\tag{2.10}
\]
We shall use this for the event A_δ = {sup_{0≤t≤T} |X^h_t − φ_t| < δ}. We put
\[
D_1 = T \sup_{t,y} \frac{\partial^2 H}{\partial \alpha^2}(y, \alpha(t, y)), \qquad
D_2 = T \sup_{t,y} \alpha(t, y)^2\, \frac{\partial^2 H}{\partial \alpha^2}(y, \alpha(t, y))
\]
(by condition II, these expressions are uniformly bounded for all functions φ for which the derivative φ̇_t is bounded by some constant). We introduce the random events
\[
A'_1 = \Bigl\{\sup_{0\le t\le T} |X^h_t - \varphi_t| < 2h^{1/2}D_1^{1/2}\Bigr\}, \qquad
A'_2 = \Bigl\{\Bigl|\int_0^T \alpha(t - 0, X^h_{t-0})\, d(X^h_t - \varphi_t)\Bigr| < 2h^{1/2}D_2^{1/2}\Bigr\}.
\]
In both cases, on the left side of the inequality we have a martingale (with respect to P̃^h_x) with mean zero, and on the right side a constant estimating from above the doubled square mean deviation of the martingale. By the Kolmogorov and Chebyshev inequalities we have
\[
\tilde P^h_x(A'_1) \ge 3/4, \qquad \tilde P^h_x(A'_2) \ge 3/4.
\]
For h ≤ (4D₁)^{-1}δ², the event A′₁ implies A_δ. Therefore, P^h_x(A_δ) is estimated from below by the P̃^h_x-probability of A′₁ ∩ A′₂ (which is not smaller than 1/2) multiplied by the infimum of the exponential expression in (2.10) over this intersection. The sum of integrals under the exponential sign can be reduced to the form
\[
-\int_0^T \alpha(t - 0, X^h_{t-0})\, d(X^h_t - \varphi_t) - \int_0^T [\alpha(t, X^h_t)\dot\varphi_t - H(X^h_t, \alpha(t, X^h_t))]\, dt.
\]
The first integral here does not exceed 2h^{1/2}D₂^{1/2} in absolute value for ω ∈ A′₁ ∩ A′₂, and the second one is equal to −∫₀ᵀ L(X^h_t, φ̇_t) dt. By virtue of
condition III, this integral (without the minus sign) does not exceed
\[
\int_0^T L(\varphi_t, \dot\varphi_t)\, dt\,\bigl(1 + \Delta_L(2h^{1/2}D_1^{1/2})\bigr) + \Delta_L(2h^{1/2}D_1^{1/2})\cdot T.
\]
Finally, for h ≤ (4D₁)^{-1}δ² we obtain the estimate
\[
P^h_x\{\rho_{0T}(X^h, \varphi) < \delta\} \ge \tfrac12 \exp\{-h^{-1}S_{0T}(\varphi) - 2h^{-1/2}D_2^{1/2} - h^{-1}\Delta_L(2h^{1/2}D_1^{1/2})[T + S_{0T}(\varphi)]\}.
\]
The factor 1/2 and all terms except −h^{-1}S_{0T}(φ) are absorbed by the term −h^{-1}γ for sufficiently small h, and we obtain inequality (2.5). In order to establish that (2.5) is satisfied uniformly not only for functions φ with |φ̇_t| ≤ const but for all functions with S_{0T}(φ) ≤ s₀, we use the equicontinuity of these functions and Lemma 2.1. We choose Δt > 0 such that the oscillation of each of these functions φ on any interval of length not exceeding Δt is smaller than δ/2. We decrease this Δt if necessary so that for any polygon l with vertices (t_i, φ_{t_i}) such that max(t_{i+1} − t_i) < Δt we have S_{0T}(l) ≤ S_{0T}(φ) + γ/2. We fix an equidistant partition of the interval from 0 to T into intervals of length not exceeding Δt. Then for all polygons l determined by functions φ with S_{0T}(φ) ≤ s₀ we have |l̇_t| ≤ δ/2Δt. For h smaller than some h₀, inequality (2.5) holds for these polygons with δ/2 instead of δ and γ/2 instead of γ. Using the circumstance that ρ_{0T}(φ, l) < δ/2, we obtain
\[
P^h_x\{\rho_{0T}(X^h, \varphi) < \delta\} \ge P^h_x\{\rho_{0T}(X^h, l) < \delta/2\} \ge \exp\{-h^{-1}[S_{0T}(\varphi) + \gamma]\}.
\]
Now we prove (2.6). Given a small positive number ϰ, we choose a positive
δ′ ≤ δ/2 such that the expression Δ_L(δ′) in condition III does not exceed ϰ. We consider a random polygon l^h_t, 0 ≤ t ≤ T, with vertices at the points (0, X^h_0), (Δt, X^h_{Δt}), (2Δt, X^h_{2Δt}), …, (T, X^h_T), where the choice of Δt = T/n (n an integer) will be specified later. We introduce the events
\[
A^h(i) = \Bigl\{\sup_{i\Delta t \le t \le (i+1)\Delta t} |X^h_t - X^h_{i\Delta t}| < \delta'\Bigr\}, \qquad i = 0, 1, \dots, n-1.
\]
If all A^h(i) occur, then ρ_{0T}(X^h, l^h) < δ; moreover, ρ_{0T}(X^h, Φ_x(s)) ≥ δ then implies l^h ∉ Φ_x(s). Therefore,
\[
P^h_x\{\rho_{0T}(X^h, \Phi_x(s)) \ge \delta\} \le P^h_x\Bigl(\bigcup_{i=0}^{n-1} \overline{A^h(i)}\Bigr) + P^h_x\Bigl(\bigcap_{i=0}^{n-1} A^h(i) \cap \{l^h \notin \Phi_x(s)\}\Bigr).
\tag{2.11}
\]
The first probability does not exceed
\[
n \sup_y P^h_y\Bigl\{\sup_{0\le t\le \Delta t} |X^h_t - y| \ge \delta'\Bigr\}.
\]
By virtue of a well-known estimate (cf. Dynkin [1], Lemma 6.3), the supremum of this probability does not exceed
\[
2 \sup_{t\le \Delta t}\ \sup_y P^h_y\{|X^h_t - y| \ge \delta'/2\}.
\]
We estimate this probability by means of the exponential Chebyshev inequality: for any C > 0 we have
\[
P^h_y\{|X^h_t - y| \ge \delta'/2\} \le \bigl[M^h_y \exp\{h^{-1}C(X^h_t - y)\} + M^h_y \exp\{-h^{-1}C(X^h_t - y)\}\bigr] e^{-h^{-1}C\delta'/2}.
\tag{2.12}
\]
It follows from (2.8) that the mathematical expectations here do not exceed
\[
\exp\{h^{-1} t\, \bar H(\pm C)\} \le \exp\{h^{-1}\Delta t\, \bar H(\pm C)\}.
\]
Now we choose C = 4s₀/δ′ and we choose Δt so that Δt·H̄(±C) ≤ s₀. Then the right side of (2.12) does not exceed 2 exp{−h^{-1}s₀} and the corresponding terms in (2.11) are negligible
compared to exp{−h^{-1}(s − γ)}. The second term in (2.11) is estimated by means of the exponential Chebyshev inequality:
\[
P^h_x\Bigl(\bigcap_{i=0}^{n-1} A^h(i) \cap \{S_{0T}(l^h) > s\}\Bigr) \le M^h_x\Bigl[\bigcap_{i=0}^{n-1} A^h(i);\ \exp\{h^{-1}(1+\varkappa)^{-2} S_{0T}(l^h)\}\Bigr] \times \exp\{-h^{-1}(1+\varkappa)^{-2} s\}.
\tag{2.13}
\]
Using the choice of δ′, we obtain
\[
S_{0T}(l^h) = \sum_{i=0}^{n-1} \int_{i\Delta t}^{(i+1)\Delta t} L\Bigl(l^h_t, \frac{X^h_{(i+1)\Delta t} - X^h_{i\Delta t}}{\Delta t}\Bigr)\, dt \le (1+\varkappa) \sum_{i=0}^{n-1} \Delta t\, L\Bigl(X^h_{i\Delta t}, \frac{X^h_{(i+1)\Delta t} - X^h_{i\Delta t}}{\Delta t}\Bigr) + \varkappa T
\]
for ω ∈ ⋂_{i=0}^{n−1} A^h(i). Taking account of this, we obtain from (2.13) that
\[
P^h_x\Bigl(\bigcap_{i=0}^{n-1} A^h(i) \cap \{l^h \notin \Phi_x(s)\}\Bigr) \le M^h_x\Bigl[\bigcap_{i=0}^{n-1} A^h(i);\ \prod_{i=0}^{n-1} \exp\Bigl\{h^{-1}(1+\varkappa)^{-1}\Delta t\, L\Bigl(X^h_{i\Delta t}, \frac{X^h_{(i+1)\Delta t} - X^h_{i\Delta t}}{\Delta t}\Bigr)\Bigr\}\Bigr] \times \exp\{h^{-1}[-(1+\varkappa)^{-2}s + (1+\varkappa)^{-2}\varkappa T]\}.
\tag{2.14}
\]
Using the Markov property with respect to the times Δt, 2Δt, …, (n − 1)Δt, we obtain that the mathematical expectation in this formula does not exceed
\[
\Bigl[\sup_y M^h_y\Bigl[A^h(0);\ \exp\Bigl\{h^{-1}(1+\varkappa)^{-1}\Delta t\, L\Bigl(y, \frac{X^h_{\Delta t} - y}{\Delta t}\Bigr)\Bigr\}\Bigr]\Bigr]^n.
\tag{2.15}
\]
Here the second argument of L does not exceed δ′/Δt. For fixed y, we approximate the function L(y, β), convex on the interval |β| ≤ δ′/Δt, by means of a polygon L′(y, β) circumscribed from below to an accuracy of ϰ. This can be done by means of a polygon with a number N of links not depending on y, for example, in the following way: we denote by β_{−i}, β_{+i} (i = 0, 1, 2, …) the points at which L(y, β) = iϰ (for i ≠ 0, there are two such points) and choose the polygon formed by the tangent lines to the graph of L(y, β) at these points, i.e., we put
\[
L'(y, \beta) = \max_{-i_0 \le i \le i_0} \Bigl[L(y, \beta_i) + \frac{\partial L}{\partial \beta}(y, \beta_i)(\beta - \beta_i)\Bigr].
\tag{2.16}
\]
Here the number N of points is equal to 2i₀ + 1, where i₀ is the integral part of ϰ^{-1} sup_y sup_{|β| ≤ δ′/Δt} L(y, β). It is easy to see that L(y, β) − L′(y, β) ≤ ϰ for |β| ≤ δ′/Δt (Fig. 9).

Figure 9.
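The tangent-line polygon (2.16) is easy to build numerically. The sketch below uses the sample choice L(β) = β²/2 on |β| ≤ 3 (any smooth convex L with minimum value 0 works the same way) and checks the stated accuracy 0 ≤ L − L′ ≤ ϰ:

```python
import math

L  = lambda b: b * b / 2.0   # sample convex L with minimum value 0 at beta = 0
dL = lambda b: b             # its derivative
kappa, beta_max = 0.2, 3.0

def level_point(level):
    # bisection on [0, beta_max] for the beta with L(beta) == level
    lo, hi = 0.0, beta_max
    for _ in range(80):
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if L(mid) < level else (lo, mid)
    return (lo + hi) / 2.0

i0 = int(L(beta_max) / kappa)
pts = [0.0]
for i in range(1, i0 + 1):
    b = level_point(i * kappa)
    pts += [b, -b]           # the two tangency points at level i*kappa

def L_poly(beta):            # the circumscribed-from-below polygon (2.16)
    return max(L(p) + dL(p) * (beta - p) for p in pts)

grid = [-beta_max + 2 * beta_max * k / 400 for k in range(401)]
gaps = [L(b) - L_poly(b) for b in grid]
print(min(gaps), max(gaps))  # gaps stay within [0, kappa]
```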
Using the definition of L, the expression (2.16) can be rewritten in the form
\[
L'(y, \beta) = \max_{-i_0 \le i \le i_0} [\alpha_i \beta - H(y, \alpha_i)],
\tag{2.17}
\]
where the α_i = (∂L/∂β)(y, β_i) depend on y. Taking account of this, we can see that the mathematical expectation in formula (2.15) does not exceed
\[
M^h_y\Bigl[A^h(0);\ \exp\Bigl\{h^{-1}(1+\varkappa)^{-1} \max_{-i_0\le i\le i_0} [\alpha_i(X^h_{\Delta t} - y) - \Delta t\, H(y, \alpha_i)] + h^{-1}(1+\varkappa)^{-1}\varkappa\,\Delta t\Bigr\}\Bigr]
\le \exp\{h^{-1}\varkappa(1+\varkappa)^{-1}\Delta t\} \sum_{i=-i_0}^{i_0} M^h_y\Bigl[A^h(0);\ \exp\{h^{-1}[(1+\varkappa)^{-1}\alpha_i(X^h_{\Delta t} - y) - \Delta t\,(1+\varkappa)^{-1} H(y, \alpha_i)]\}\Bigr].
\tag{2.18}
\]
By (2.8) we have
\[
M^h_y \exp\Bigl\{h^{-1}(1+\varkappa)^{-1}\alpha_i(X^h_{\Delta t} - y) - h^{-1}\int_0^{\Delta t} H(X^h_t, (1+\varkappa)^{-1}\alpha_i)\, dt\Bigr\} = 1.
\]
In order to use this, we subtract and add h^{-1}∫₀^{Δt} H(X^h_t, (1+ϰ)^{-1}α_i) dt under the exponential sign on the right side of (2.18). Using the choice of δ′ and inequality (2.4), we obtain that the i-th mathematical expectation on the right side of (2.18) does not exceed exp{h^{-1}Δt·ϰ(1+ϰ)^{-1}}. We combine the estimates obtained so far:
\[
P^h_x\{\rho_{0T}(X^h, \Phi_x(s)) \ge \delta\} \le 4n \exp\{-h^{-1}s_0\} + (2i_0+1)^n \exp\{h^{-1}[-(1+\varkappa)^{-2}s + \varkappa(1+\varkappa)^{-2}T + 2\varkappa(1+\varkappa)^{-1}T]\}.
\tag{2.19}
\]
Since ϰ > 0 can be chosen arbitrarily small, for h smaller than or equal to some h₀ we obtain estimate (2.6) for all s ≤ s₀.
§3. Special Cases. Generalizations

In this section we derive theorems on large deviations for certain families of Markov processes. They either follow from Theorem 2.1 or are close to it. First of all, the hypotheses of Theorem 2.1 are satisfied for the families of diffusion processes in R^r given by the stochastic equations
\[
\dot X^\varepsilon_t = b(X^\varepsilon_t) + \varepsilon\,\sigma(X^\varepsilon_t)\dot w_t
\tag{3.1}
\]
as the parameter ε converges to zero, provided that the drift coefficients b^i(x) and the diffusion coefficients a^{ij}(x) = Σ_{k=1}^r σ^i_k(x)σ^j_k(x), i, j = 1, …, r, are bounded and uniformly continuous in x and the diffusion matrix is uniformly nondegenerate: Σ_{ij} a^{ij}(x)c_i c_j ≥ μ Σ_i c_i², μ > 0. We calculate the characteristics of the family of processes (X^ε_t, P^ε_x): the infinitesimal generator for functions f ∈ C^{(2)} has the form
\[
A^\varepsilon f(x) = \sum_i b^i(x) \frac{\partial f}{\partial x^i}(x) + \frac{\varepsilon^2}{2} \sum_{ij} a^{ij}(x) \frac{\partial^2 f}{\partial x^i \partial x^j}(x).
\tag{3.2}
\]
The role of the parameter h of the preceding section is played by ε² and the normalizing coefficient is ε^{-2}. Moreover, we have
\[
H(x, \alpha) = \sum_i b^i(x)\alpha_i + \frac12 \sum_{ij} a^{ij}(x)\alpha_i\alpha_j;
\tag{3.3}
\]
the Legendre transform of this function is
\[
L(x, \beta) = \frac12 \sum_{ij} a_{ij}(x)(\beta^i - b^i(x))(\beta^j - b^j(x)),
\tag{3.4}
\]
where (a_{ij}(x)) = (a^{ij}(x))^{-1}; the normalized action functional has the form (for absolutely continuous φ_t, T₁ ≤ t ≤ T₂)
\[
S_{T_1T_2}(\varphi) = \frac12 \int_{T_1}^{T_2} \sum_{ij} a_{ij}(\varphi_t)(\dot\varphi^i_t - b^i(\varphi_t))(\dot\varphi^j_t - b^j(\varphi_t))\, dt.
\tag{3.5}
\]
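In one dimension the Legendre duality between (3.3) and (3.4) can be checked directly by maximizing αβ − H(x, α) over a grid; the coefficient values below are arbitrary:

```python
b, a = 0.7, 2.0      # drift and diffusion coefficient frozen at some point x
beta = 1.9

H = lambda al: b * al + 0.5 * a * al * al   # formula (3.3) in R^1
L_closed = (beta - b) ** 2 / (2.0 * a)      # formula (3.4) in R^1

# crude numerical sup_alpha [alpha*beta - H(alpha)]
L_num = max(al * beta - H(al)
            for al in (-10.0 + 20.0 * k / 200000 for k in range(200001)))
print(L_closed, L_num)
```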
We verify conditions I–III of Theorem 2.1. As the function H̄ we may choose
\[
\bar H(\alpha) = B|\alpha| + \tfrac12 A|\alpha|^2,
\]
where B and A are constants majorizing |b(x)| and the largest eigenvalue of the matrix (a^{ij}(x)). As the constant M = M(R) of condition II we may choose μ^{-1}(R + B + 1)², and as m we may take A^{-1}. Condition III is also satisfied: the function Δ_L(δ′) can be expressed in terms of the moduli of continuity of the functions b^i(x), a^{ij}(x) and the constants μ, A and B. The following theorem deals with a slight generalization of this scheme; in this case the proof of Theorem 2.1 needs only minor changes.
Theorem 3.1. Let the functions b^i(x) and a^{ij}(x) be bounded and uniformly continuous in R^r, let the matrix (a^{ij}(x)) be symmetric for every x and let
\[
\sum_{ij} a^{ij}(x) c_i c_j \ge \mu \sum_i c_i^2,
\]
where μ is a positive constant. Furthermore, suppose that b^{1,ε}(x), …, b^{r,ε}(x) converge uniformly to b^1(x), …, b^r(x), respectively, as ε → 0, that (X^ε_t, P^ε_x) is a diffusion process in R^r with drift b^ε(x) = (b^{1,ε}(x), …, b^{r,ε}(x)) and diffusion matrix ε²(a^{ij}(x)), and that the functional S is given by formula (3.5). Then ε^{-2}S_{0T}(φ) is the action functional for the family of processes (X^ε_t, P^ε_x) in the sense of the metric ρ_{0T}(φ, ψ) = sup_{0≤t≤T} |φ_t − ψ_t|, uniformly with respect to the initial point as ε → 0.
Moreover, this result can be carried over to diffusion processes on a differentiable manifold.
Theorem 3.2. Let M be an r-dimensional manifold of class C^{(2)} with metric ρ. Suppose that there exists a positive constant λ such that in the λ-neighborhood of every point x₀ ∈ M we can introduce a common coordinate system K_{x₀}, and the distance ρ differs from the corresponding Euclidean distance only by a numerical factor uniformly bounded for all K_{x₀}. For every ε > 0 let (X^ε_t, P^ε_x) be a diffusion process on M (cf. McKean [1]) and suppose that in the coordinate system K_{x₀} its infinitesimal generator can be written in the form
\[
\sum_i b^{i,\varepsilon}(x) \frac{\partial}{\partial x^i} + \frac{\varepsilon^2}{2} \sum_{ij} a^{ij}(x) \frac{\partial^2}{\partial x^i \partial x^j}.
\tag{3.6}
\]
(In passage to another coordinate system, the a^{ij}(x) are transformed as a tensor, and in the b^{i,ε}(x) there also occur terms of order ε² containing derivatives of the a^{ij}(x).) Suppose that b^{i,ε}(x) → b^i(x) uniformly in all x such that ρ(x₀, x) < λ and all coordinate systems K_{x₀} as ε → 0, that the functions b^i(x), a^{ij}(x) are bounded and uniformly continuous in all x with ρ(x₀, x) < λ and all K_{x₀}, and that
\[
\sum_{ij} a^{ij}(x) c_i c_j \ge \mu \sum_i c_i^2
\]
for all c₁, …, c_r, where μ = const > 0. Let the functional S be given by formula (3.5).

Then ε^{-2}S_{0T}(φ) is the action functional for the family of processes (X^ε_t, P^ε_x) with respect to the metric ρ_{0T}(φ, ψ) = sup_{0≤t≤T} ρ(φ_t, ψ_t), uniformly with respect to the initial point as ε → 0.
The proof of Theorem 3.2 (and of Theorem 3.1) is contained in §1 of Wentzell and Freidlin [4].
We consider some more families of Markov processes for which the action functional can be written out easily by using formulas of the preceding section, but for which the fact that the expression thus obtained is indeed the action functional does not follow from Theorem 2.1. Let n^{1,h}_t, …, n^{r,h}_t be independent Poisson processes with parameters λ¹h^{-1}, …, λ^r h^{-1}, respectively (λ¹, …, λ^r are positive constants).
We consider the process ξ^h_t = (ξ^{1,h}_t, …, ξ^{r,h}_t), where ξ^{i,h}_t = h·n^{i,h}_t. The infinitesimal generator of this Markov process is
\[
A^h f(x^1, \dots, x^r) = h^{-1}\lambda^1[f(x^1 + h, x^2, \dots, x^r) - f(x^1, x^2, \dots, x^r)] + \cdots + h^{-1}\lambda^r[f(x^1, \dots, x^{r-1}, x^r + h) - f(x^1, \dots, x^{r-1}, x^r)];
\tag{3.7}
\]
correspondingly,
\[
H(x, \alpha) \equiv H(\alpha) = \sum_i \lambda^i (e^{\alpha_i} - 1),
\tag{3.8}
\]
\[
L(x, \beta) \equiv L(\beta) = \begin{cases} \sum_i [\beta^i \ln(\beta^i/\lambda^i) - \beta^i + \lambda^i] & \text{for } \beta^1 \ge 0, \dots, \beta^r \ge 0, \\ +\infty & \text{if at least one of the } \beta^i < 0; \end{cases}
\tag{3.9}
\]
the functional S(φ) is defined as the integral
\[
S_{T_1T_2}(\varphi) = \int_{T_1}^{T_2} L(\dot\varphi_t)\, dt
\tag{3.10}
\]
for absolutely continuous φ_t = (φ¹_t, …, φ^r_t), and as +∞ for all remaining φ (the integral is equal to +∞ for every φ for which at least one coordinate is not a nondecreasing function). The hypotheses of Theorem 2.1 are not applicable, since L equals +∞ outside the set {β¹ ≥ 0, …, β^r ≥ 0}. Nevertheless, the following theorem holds.

Theorem 3.3. The functional given by formulas (3.10) and (3.9) is the normalized action functional for the family of processes ξ^h_t, uniformly with respect to the initial point as h ↓ 0 (as normalizing coefficient we take, of course, h^{-1}).
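The Legendre pair (3.8)–(3.9) can be verified numerically in the same way as before; for one coordinate, the supremum sup_α [αβ − λ(e^α − 1)] is attained at α = ln(β/λ). A sketch with arbitrary sample values:

```python
import math

lam, beta = 1.5, 2.3                                  # arbitrary sample values
H = lambda al: lam * (math.exp(al) - 1.0)             # one term of (3.8)
L_closed = beta * math.log(beta / lam) - beta + lam   # one term of (3.9)

L_num = max(al * beta - H(al)
            for al in (-5.0 + 10.0 * k / 100000 for k in range(100001)))
print(L_closed, L_num)
```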
This is a special case of Borovkov's results [1] (in fact, these results are concerned formally only with the one-dimensional case; for r > 1 we can refer to later work: Mogul'skii [1] and Wentzell [7]). The preceding results were related to the case where, as the parameter increases, the jumps of the Markov process become smaller exactly as many times as they become more frequent. We formulate a result concerning the case where this is not so. As above, let n¹_t, …, n^r_t be independent Poisson processes with parameters λ¹, …, λ^r. For a > 0 and h > 0 we put ζ^{i,a,h}_t = h(n^i_{at} − λ^i a t), i = 1, …, r.
Theorem 3.4. Put
\[
S_{T_1T_2}(\varphi) = \frac12 \int_{T_1}^{T_2} \sum_{i=1}^r (\lambda^i)^{-1} (\dot\varphi^i_t)^2\, dt
\tag{3.11}
\]
for absolutely continuous φ_t, T₁ ≤ t ≤ T₂, and S_{T₁T₂}(φ) = +∞ for the remaining φ. Then (h²a)^{-1}S_{0T}(φ) is the action functional for the family of processes ζ^{a,h}_t = (ζ^{1,a,h}_t, …, ζ^{r,a,h}_t), 0 ≤ t ≤ T, uniformly with respect to the initial point as ha → ∞, h²a → 0.
This is a special case of one of the theorems of the same article [1] by Borovkov (the multidimensional case can be found in Mogul'skii [1]).
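The normalization in Theorem 3.4 can be made plausible by simulation: the centered process ζ^{a,h}_T = h(n_{aT} − λaT) has variance h²aλT, so order-one deviations carry probabilities whose logarithms are of order −(h²a)^{-1}. A sketch using Knuth's Poisson sampler, with arbitrary parameter values:

```python
import math, random

random.seed(1)

def poisson(mean):
    # Knuth's method: count uniform factors until the product drops below e^{-mean}
    limit, prod, k = math.exp(-mean), random.random(), 0
    while prod >= limit:
        prod *= random.random()
        k += 1
    return k

a, h, lam, T, n = 50.0, 0.1, 2.0, 1.0, 20000
samples = [h * (poisson(lam * a * T) - lam * a * T) for _ in range(n)]
var = sum(z * z for z in samples) / n   # the centering makes the mean zero
print(var, h * h * a * lam * T)         # sample variance vs h^2 * a * lam * T
```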
§4. Consequences. Generalization of Results of Chapter 4

Let us see whether the results obtained by us in §§2, 3 and 4 of Ch. 4 for small perturbations of the type of a "white noise" of a dynamical system can be carried over to small jump-like perturbations (or to perturbations of the type of diffusion with varying diffusion coefficient).

Let ẋ_t = b(x_t) be a dynamical system with one stable equilibrium position O and let (X^h_t, P^h_x) be a family of Markov processes of the form described in §2. For this family, we may pose problems on the limit behavior as h ↓ 0 of the invariant measure μ^h, of the distribution of the point X^h_{τ^h} of exit from a domain, and of the mean exit time M_x τ^h from a domain. We may conjecture that the solutions will be connected with the function V(O, x) = inf{S_{T₁T₂}(φ) : φ_{T₁} = O, φ_{T₂} = x}. Namely, the invariant measure μ^h must be described by the action function h^{-1}V(O, x); the distribution of X^h_{τ^h} as h ↓ 0 must be concentrated near those points of the boundary at which min_{y∈∂D} V(O, y) is attained; M_x τ^h must be logarithmically equivalent to
\[
\exp\Bigl\{h^{-1}\min_{y\in\partial D} V(O, y)\Bigr\};
\]
and for small h, the exit from a domain must take place with overwhelming probability along an extremal of S(φ) leading from O to the boundary, etc. The proofs in §§2, 4 of Ch. 4 have to be changed in the following way. Instead of the small spheres γ and Γ about the equilibrium position, we have to take a small ball γ containing the equilibrium position and the exterior Γ of a sphere of a somewhat larger radius (a jump-like process may simply jump over a sphere). Instead of the chain Z_n on the set γ ∪ ∂D, we consider a chain on the sum of γ and the complement of D. A trajectory of X^h_t beginning at a point of γ is not necessarily on a sphere of small radius (the boundary of Γ) at the first entrance time of Γ. Nevertheless, the probability that at this time the process will be at a distance larger than some β > 0 from this sphere converges to zero faster than any exponential exp{−Kh^{-1}} as h ↓ 0. Theorems 2.1, 2.3, 2.4, 4.1 and 4.2 of Ch. 4 remain true for families (X^h_t, P^h_x) satisfying the hypotheses of Theorem 2.1. Of course, these theorems are also true for families of diffusion processes satisfying the hypotheses of Theorems 3.1 and 3.2. We give the corresponding
formulation in the language of differential equations, i.e., the generalization of Theorem 2.2 of Ch. 4.

Theorem 4.1. Let O be a stable equilibrium position of the dynamical system ẋ_t = b(x_t) on a manifold M and let D be a domain in M with compact closure and smooth boundary ∂D. Suppose that the trajectories of the dynamical system beginning at any point of D ∪ ∂D are attracted to O as t → ∞ and the vector b(x) is directed strictly inside D at every boundary point. Furthermore, let (X^ε_t, P^ε_x) be a family of diffusion processes on M with drift b^ε(x) (in local coordinates) converging to b(x) as ε → 0 and diffusion matrix ε²(a^{ij}(x)), and let the hypotheses of Theorem 3.2 be satisfied. For every ε > 0 let u^ε(x) be the solution of the Dirichlet problem
\[
\frac{\varepsilon^2}{2} \sum_{ij} a^{ij}(x) \frac{\partial^2 u^\varepsilon(x)}{\partial x^i \partial x^j} + \sum_i b^{i,\varepsilon}(x) \frac{\partial u^\varepsilon(x)}{\partial x^i} = 0, \quad x \in D; \qquad u^\varepsilon(x) = g(x), \quad x \in \partial D,
\]
with continuous boundary function g. Then lim_{ε→0} u^ε(x) = g(y₀), where y₀ is the point on the boundary at which min_{y∈∂D} V(O, y) is attained (it is assumed that this point is unique).
The proof of the theorem below is entirely analogous to that of Theorem 4.3 of Ch. 4.
Theorem 4.2. Let the dynamical system ẋ_t = b(x_t) in R^r have a unique equilibrium position O which is stable and attracts the trajectories beginning at any point of R^r. For sufficiently large |x| let the inequality (x, b(x)) ≤ −c|x|, where c is a positive constant, be satisfied. Suppose that the family of diffusion processes (X^ε_t, P^ε_x) with drift b^ε(x) → b(x) and diffusion matrix ε²(a^{ij}(x)) satisfies the hypotheses of Theorem 3.1. Then the normalized invariant measure μ^ε of (X^ε_t, P^ε_x) as ε → 0 is described by the action function ε^{-2}V(O, x).
The formulation and proof of the corresponding theorem for diffusion processes on a manifold are postponed until §4 of Ch. 6. The reason is that this theorem is quite simple for compact manifolds, but the trajectories of a dynamical system on such a manifold cannot all be attracted to one stable equilibrium position. For families of jump-like processes, the corresponding theorem will also be more complicated than Theorem 4.3 of Ch. 4 or Theorem 4.2 of this chapter. We indicate the changes in Theorem 3.1 of Ch. 4 enabling us to determine the quasipotential. Let A be a subset of the domain D with boundary ∂D. The problem R_A for a first-order differential equation in D is, by definition, the problem of finding a function U continuous in D ∪ ∂D, vanishing on A and positive
outside A, continuously differentiable and satisfying the equation in question in (D ∪ ∂D) \ A, and such that ∇U(x) ≠ 0 for x ∈ (D ∪ ∂D) \ A.

Theorem 4.3. Let H(x, α) and L(x, β) be strictly convex functions smooth in the second argument and coupled by the Legendre transformation. Let S_{T₁T₂}(φ) = ∫_{T₁}^{T₂} L(φ_t, φ̇_t) dt (for absolutely continuous functions φ, and +∞ otherwise). Let A be a compact subset of D. Let us put
\[
V(A, x) = \inf\{S_{T_1T_2}(\varphi) : \varphi_{T_1} \in A,\ \varphi_{T_2} = x;\ -\infty < T_1 < T_2 < \infty\}.
\tag{4.1}
\]
Suppose that U is a solution of problem R_A for the equation
\[
H(x, \nabla U(x)) = 0
\tag{4.2}
\]
in D ∪ ∂D. Then U(x) = V(A, x) for all x for which U(x) < min{U(y) : y ∈ ∂D}. The infimum in (4.1) is attained at the solution of the equation
\[
\dot\varphi_t = \nabla_\alpha H(\varphi_t, \nabla U(\varphi_t))
\tag{4.3}
\]
with the final condition φ_{T₂} = x for any T₂ < ∞ (the value of φ_{T₁} then automatically belongs to A for some T₁, −∞ ≤ T₁ < T₂).

Theorem 3.1 of Ch. 4 is a special case of this with H(x, α) = (b(x), α) + |α|²/2, A = {O}.
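For this special case the theorem can be checked by hand in R¹. With b(x) = −x, the function U(x) = x² solves H(x, U′(x)) = −x·2x + (2x)²/2 = 0, the extremal (4.3) is φ̇ = b(φ) + U′(φ) = φ, and the action along it, from a neighborhood of O = 0 up to x, reproduces U(x). A numerical sketch:

```python
import math

x, T1, T2, n = 1.3, -20.0, 0.0, 400000
dt = (T2 - T1) / n
S = 0.0
for k in range(n):
    t = T1 + (k + 0.5) * dt
    phi = x * math.exp(t - T2)            # solution of phi' = phi with phi(T2) = x
    S += 0.5 * (phi - (-phi)) ** 2 * dt   # L(phi, phi') = (phi' - b(phi))^2 / 2, phi' = phi
print(S, x * x)   # the action equals U(x) = x^2 up to a term of order e^{2*T1}
```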
Proof. Let φ_t, T₁ ≤ t ≤ T₂, be any curve connecting A with x and lying entirely in D ∪ ∂D. We use the inequality L(x, β) ≥ Σ_i α_i β^i − H(x, α), following from definition (1.1). Substituting φ_t, φ̇_t and ∇U(φ_t) for x, β and α, we obtain
\[
S_{T_1T_2}(\varphi) = \int_{T_1}^{T_2} L(\varphi_t, \dot\varphi_t)\, dt \ge \int_{T_1}^{T_2} \sum_i \frac{\partial U}{\partial x^i}(\varphi_t)\dot\varphi^i_t\, dt - \int_{T_1}^{T_2} H(\varphi_t, \nabla U(\varphi_t))\, dt.
\]
The second integral is equal to zero by (4.2). The first one is U(φ_{T₂}) − U(φ_{T₁}) = U(x). Hence S_{T₁T₂}(φ) ≥ U(x). If U(x) < min{U(y) : y ∈ ∂D}, then the consideration of curves going through the boundary before hitting x does not yield any advantage, and V(A, x) ≥ U(x). Now we show that the value U(x) can be attained. For this we construct an extremal. We shall solve equation (4.3) for t ≤ T₂ with the condition φ_{T₂} = x. A solution exists as long as φ_t does not go out of (D ∪ ∂D) \ A, since the right side is continuous (the solution may not be unique). We have
\[
\frac{d}{dt} U(\varphi_t) = \sum_i \frac{\partial U}{\partial x^i}(\varphi_t)\,\dot\varphi^i_t = L(\varphi_t, \dot\varphi_t) + H(\varphi_t, \nabla U(\varphi_t)).
\]
The second term is equal to zero. The first term is positive, since L(x, β) ≥ 0 and vanishes only for β = b(x) = ∇_α H(x, 0), which is not equal to
∇_α H(x, ∇U(x)) for x ∉ A (because ∇U(x) ≠ 0). Hence U(φ_t) decreases with decreasing t. Consequently, φ_t does not go out of D ∪ ∂D. We prove that for some T₁ ≤ T₂ (maybe for T₁ = −∞) the curve φ_t enters A. If this were not so, then the solution φ_t would be defined for all t ≤ T₂ and the finite limit lim_{t→−∞} U(φ_t) ≥ 0 would exist. However, this would mean that φ_t does not hit some neighborhood of the compactum A. Outside this neighborhood we have (d/dt)U(φ_t) = L(φ_t, φ̇_t) ≥ const > 0, which contradicts the existence of a finite limit of U(φ_t) as t → −∞. Now we determine the value of S at the function we have just obtained:
\[
S_{T_1T_2}(\varphi) = \int_{T_1}^{T_2} L(\varphi_t, \dot\varphi_t)\, dt = \int_{T_1}^{T_2} \frac{d}{dt} U(\varphi_t)\, dt = U(\varphi_{T_2}) - U(\varphi_{T_1}) = U(x).
\]
The theorem is proved.

The hypotheses of this theorem can be satisfied only in the case where all trajectories of the dynamical system ẋ_t = b(x_t) beginning at points of D ∪ ∂D are attracted to A as t → ∞ (or enter A). The same is true for Theorem 3.1 of Ch. 4.
As an example, we consider the family of one-dimensional jump-like processes introduced at the beginning of §2. If l(x) > r(x) for x > x₀ and l(x) < r(x) for x < x₀, then the point x₀ is a stable equilibrium position of the equation ẋ_t = r(x_t) − l(x_t). We solve the problem H(x, U′(x)) = 0, where H(x, α) = r(x)(e^α − 1) + l(x)(e^{−α} − 1). For given x, the function H(x, α) vanishes at α = 0 and at α = ln(l(x)/r(x)). For x ≠ x₀ we have to take the second value, because it is required that U′(x) ≠ 0 for x ≠ x₀. We find that
\[
U(x) = \int_{x_0}^x \ln\frac{l(y)}{r(y)}\, dy
\]
(this function is positive both for x > x₀ and for x < x₀). It is in fact the quasipotential. An extremal of the normalized action functional is nothing else but a solution of the differential equation φ̇_t = (∂H/∂α)(φ_t, U′(φ_t)) = l(φ_t) − r(φ_t). Now we obtain immediately the solution of a series of problems connected with the family of processes (X^h_t, P^h_x): the probability that the process leaves an interval (x₁, x₂) ∋ x₀ through the left endpoint converges to 1 as h ↓ 0 if U(x₁) < U(x₂) and to 0 if the opposite inequality holds (for U(x₁) = U(x₂) the problem remains open); the mathematical expectation of the first exit time is logarithmically equivalent to exp{h^{-1} min[U(x₁), U(x₂)]} (this result has been obtained by another method by Labkovskii [1]); and so on. The asymptotics of the invariant measure of (X^h_t, P^h_x) as h ↓ 0 is given by the action function h^{-1}U(x) (this does not follow from the results formulated by us; however, in this case, as in the potential case considered in §4 of Ch. 4, the invariant measure can easily be calculated explicitly).
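For concrete hypothetical rates, say r(x) ≡ 1 and l(x) = e^x (so that x₀ = 0), everything above can be checked numerically: U(x) = ∫₀ˣ ln(l/r) dy = x²/2, and H(x, U′(x)) vanishes identically:

```python
import math

r = lambda x: 1.0          # hypothetical rates chosen for illustration:
l = lambda x: math.exp(x)  # l > r for x > 0 and l < r for x < 0, so x0 = 0

def U(x, n=10000):
    # midpoint rule for int_0^x ln(l(y)/r(y)) dy
    step = x / n
    return step * sum(math.log(l((k + 0.5) * step) / r((k + 0.5) * step))
                      for k in range(n))

H = lambda x, al: r(x) * (math.exp(al) - 1.0) + l(x) * (math.exp(-al) - 1.0)

x = 0.8
dU = (U(x + 1e-5) - U(x - 1e-5)) / 2e-5   # numerical U'(x), equals ln(l/r) = x here
print(U(x), H(x, dU))                      # U(x) = x^2/2 and H(x, U'(x)) ~ 0
```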
Chapter 6
Markov Perturbations on Large Time Intervals
§1. Auxiliary Results. Equivalence Relation

In this chapter we consider families of diffusion processes (X^ε_t, P^ε_x) on a connected manifold M. We shall assume that these families satisfy the hypotheses of Theorem 3.2 of Ch. 5 and that the behavior of the probabilities of large deviations from the "most probable" trajectory, the trajectory of the dynamical system ẋ_t = b(x_t), can be described as ε → 0 by the action functional ε^{-2}S(φ) = ε^{-2}S_{T₁T₂}(φ), where
\[
S_{T_1T_2}(\varphi) = \frac12 \int_{T_1}^{T_2} \sum_{ij} a_{ij}(\varphi_t)(\dot\varphi^i_t - b^i(\varphi_t))(\dot\varphi^j_t - b^j(\varphi_t))\, dt.
\]
We set forth results of Wentzell and Freidlin [4], [5] and some generalizations thereof.
In problems connected with the behavior of X^ε_t in a domain D ⊆ M on large time intervals (problems of exit from a domain and, for D = M, the problem of the invariant measure; concerning special cases, cf. §§2, 4 of Ch. 4 and §4 of Ch. 5), an essential role is played by the function
\[
V_D(x, y) = \inf\{S_{0T}(\varphi) : \varphi_0 = x,\ \varphi_T = y,\ \varphi_t \in D \cup \partial D \text{ for } t \in [0, T]\},
\]
where the infimum is taken over intervals [0, T] of arbitrary length. For D = M we shall use the notation V_M(x, y) = V(x, y). For small ε, the function V_D(x, y) characterizes the difficulty of passage from x to a small neighborhood of y, without leaving D (or D ∪ ∂D), within a "reasonable" time. For D = M we can give this, for example, the following exact meaning: it can be proved that
\[
V(x, y) = \lim_{T\to\infty}\ \lim_{\delta\to 0}\ \lim_{\varepsilon\to 0} \bigl[-\varepsilon^2 \ln P^\varepsilon_x\{\tau_\delta \le T\}\bigr],
\]
where τ_δ is the first entrance time of the δ-neighborhood of y for the process X^ε_t.
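As a small numerical illustration (not from the text), take b(x) = −x and a ≡ 1 in R¹ with O = 0, for which V(O, y) = y². Along the near-extremal path φ_t = y·e^{t−T} the discretized action approaches y², while, e.g., a straight-line path from 0 to y over the same interval is strictly more expensive:

```python
import math

y, T, n = 1.0, 4.0, 100000
dt = T / n

def action(path):
    # discretized S = (1/2) * int (phi' - b(phi))^2 dt with b(x) = -x
    s, eps = 0.0, 1e-6
    for k in range(n):
        t = (k + 0.5) * dt
        phi = path(t)
        dphi = (path(t + eps) - path(t - eps)) / (2 * eps)
        s += 0.5 * (dphi + phi) ** 2 * dt
    return s

S_ext = action(lambda t: y * math.exp(t - T))   # starts near O at y*e^{-T}
S_line = action(lambda t: y * t / T)
print(S_ext, S_line)   # S_ext ~ y^2 = V(O, y), and S_ext < S_line
```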
It is easy to see that V_D is everywhere finite. Let us study its more complicated properties.
Lemma 1.1. There exists a constant L > 0 such that for any x, y ∈ M with ρ(x, y) < λ there exists a function φ_t, φ₀ = x, φ_T = y, T = ρ(x, y), for which S_{0T}(φ) ≤ L·ρ(x, y).
This is Lemma 2.3 of Ch. 4, adapted to manifolds. Here λ > 0 is the radius of the neighborhoods in which the coordinate systems K_{x₀} are defined (cf. §3, Ch. 5).

This lemma implies that V_D(x, y) is continuous for x, y ∈ D and, if the boundary is smooth, for x, y ∈ D ∪ ∂D.
Lemma 1.2. For any γ > 0 and any compactum K ⊆ D (in the case of a smooth boundary, for K ⊆ D ∪ ∂D) there exists T₀ such that for any x, y ∈ K there exists a function φ_t, 0 ≤ t ≤ T, φ₀ = x, φ_T = y, T ≤ T₀, such that S_{0T}(φ) ≤ V_D(x, y) + γ.
For the proof, we choose a sufficiently dense finite δ-net {x_i} of points in K; we connect them with curves at which the action functional assumes values differing from the infimum by less than γ/2, and we complete them with end sections by using Lemma 1.1: from x to a point x_i near x, then from x_i to a point x_j near y, and from x_j to y.
Let φ_t, 0 ≤ t ≤ T, be a continuous curve in M and let 0 = t₀ < t₁ < ⋯ < t_n = T be a partition of the interval from 0 to T such that all values φ_t for t ∈ [t_i, t_{i+1}] lie in one coordinate neighborhood K_{x_i}. We can define a "polygon" l inscribed in φ by defining l_t for t ∈ [t_i, t_{i+1}] by means of coordinates in K_{x_i}:
\[
l_t = \varphi_{t_i}\,\frac{t_{i+1} - t}{t_{i+1} - t_i} + \varphi_{t_{i+1}}\,\frac{t - t_i}{t_{i+1} - t_i}.
\]

Lemma 1.3. For any K > 0 and any γ > 0 there exists a positive h such that for any function φ_t, 0 ≤ t ≤ T, with T + S_{0T}(φ) ≤ K, and any partition of the interval from 0 to T with max(t_{i+1} − t_i) < h, we have S_{0T}(l) ≤ S_{0T}(φ) + γ.
The first step of the proof is the application of the equicontinuity of all such φ in order to establish that for sufficiently small h the values φ_t, t_i ≤ t ≤ t_{i+1}, lie in one coordinate neighborhood K_{x_i} and the number of different coordinate systems K_{x_i} is bounded from above by a number depending only on K. This is the assertion of Lemma 2.1 of Ch. 5, concerning which we referred the reader to Wentzell [7] (however, for a function L(x, β) quadratic in β, the proof is simpler). We have already seen in the proof of Theorem 2.1 of Ch. 4 that in our arguments we have to go out to a small distance from the boundary of D or, on the other hand, we have to consider only points lying inside D at a positive
distance from the boundary. Let ρ(x, y) be a Riemannian distance corresponding to a smooth tensor g_{ij}(x): (dρ)² = Σ_{ij} g_{ij}(x) dx^i dx^j. We denote by D_{+δ} the δ-neighborhood of D and by D_{−δ} the set of points of D at a distance greater than δ from the boundary. In the case of a compact twice continuously differentiable boundary ∂D, the boundaries ∂D_{−δ} and ∂D_{+δ} are also twice continuously differentiable for sufficiently small δ. For points x lying between ∂D_{−δ} and ∂D_{+δ} or on them, the following points are defined uniquely: (x)₀, the point on ∂D closest to x; (x)_{−δ}, the point on ∂D_{−δ} closest to x; (x)_{+δ}, the point on ∂D_{+δ} closest to x. The mappings x ↦ (x)₀, x ↦ (x)_{−δ} and x ↦ (x)_{+δ} are twice continuously differentiable.

Lemma 1.4. Let ∂D be compact and twice continuously differentiable. For any K > 0 and any γ > 0 there exists δ₀ > 0 such that for any positive δ < δ₀ and any function φ_t, 0 ≤ t ≤ T, with T + S_{0T}(φ) ≤ K, assuming its values in D ∪ ∂D, there exists a function ψ_t, 0 ≤ t ≤ T, assuming its values in D_{−δ} ∪ ∂D_{−δ} and having the following properties: the value ψ₀ coincides with φ₀ if φ₀ ∈ D_{−δ} ∪ ∂D_{−δ} and is equal to (φ₀)_{−δ} otherwise; the same is true for ψ_T and φ_T; and S_{0T}(ψ) ≤ S_{0T}(φ) + γ.

The same is true with D_{+δ} and D replacing D and D_{−δ}.
Proof. First of all we locally "straighten" the boundary: possibly decreasing λ, in neighborhoods covering the boundary we choose new local coordinates x¹, x², ..., x^r in which ∂D becomes the hyperplane {x¹ = 0}; for points x in a small neighborhood of the boundary outside D the coordinate x¹ is equal to ρ(x, ∂D), and for points near ∂D inside D, x¹ = -ρ(x, ∂D). In these coordinate systems the coefficients a^{ij}(x), b^i(x) will differ from those in the original standard systems K_{x₀} but will be uniformly bounded and continuous for all these systems (with constants and moduli of continuity depending on the curvature of ∂D). The fact that S(φ) is quadratic is insignificant now. We only use the circumstance that it can be represented in the form
$$S_{0T}(\varphi) = \int_0^T L(\varphi_t, \dot\varphi_t)\,dt,$$
where L(x, β) is convex and satisfies conditions II and III of Theorem 2.1 of Ch. 5 in every local coordinate system. We choose a positive λ' not exceeding λ/2 or γ/3L (L is the constant in Lemma 1.1). Using the equicontinuity of all functions φ under consideration, we choose h₀ > 0 such that the values of any of the φ on any interval of length not greater than h₀ lie in the λ'-neighborhood of the left endpoint. We decrease h₀ so that for the "polygon" l inscribed in φ with time step not exceeding h₀ we have S_{0T}(l) ≤ S_{0T}(φ) + γ/3 (cf. Lemma 1.3). Let us put n = [K/h₀] + 1.

For any function φ_t defined on an interval of length T ≥ h₀, we take the partition of the interval from 0 to T into n equal parts and consider the corresponding polygon l_t, 0 ≤ t ≤ T. In each of the local coordinate systems chosen by us, the modulus of the derivative of l_t does not exceed R = nλ'/h₀ at any point. We choose a positive δ₀ < λ' such that Δ_L(δ₀) < γ/(3K + γ) (cf. condition III of Theorem 2.1, Ch. 5) and δ₀ · sup{|(∂L/∂β¹)(x, β)|: |β| ≤ R} ≤ γ/3n (cf. condition II).
For every δ ≤ δ₀ and each of the functions φ_t under consideration we define a function ψ_t, 0 ≤ t ≤ T:

$$\psi_t = \begin{cases} l_t & \text{if } l_t \in D_{-\delta} \cup \partial D_{-\delta},\\ (l_t)_{-\delta} & \text{otherwise.} \end{cases}$$

This function satisfies the conditions concerning the initial and terminal points. We estimate S_{0T}(ψ). We have

$$S_{0T}(\psi) - S_{0T}(l) = \int_0^T \bigl[L(\psi_t, \dot\psi_t) - L(\psi_t, \dot l_t)\bigr]\,dt + \int_0^T \bigl[L(\psi_t, \dot l_t) - L(l_t, \dot l_t)\bigr]\,dt. \tag{1.1}$$
(The expression L(ψ_t, \dot l_t) is meaningful since, because of the choice δ₀ < λ' ≤ λ/2, on the whole interval from iT/n to (i + 1)T/n the points ψ_t and l_t are in a neighborhood where one of the local coordinate systems chosen by us acts.) The value ψ_t is different from l_t only when l_t is in the δ-strip along the boundary, i.e., on not more than n little intervals [t_i, t_{i+1}] ⊆ [iT/n, (i + 1)T/n]. Moreover, only the first coordinates may be different. In the first integral in (1.1) we have \dot ψ_t and \dot l_t; the first of these derivatives, just as the second one, does not exceed R in modulus (because it either coincides with \dot l_t or differs from it by a vanishing first coordinate). We have

$$\int_{t_i}^{t_{i+1}} \bigl[L(\psi_t, \dot\psi_t) - L(\psi_t, \dot l_t)\bigr]\,dt = \int_{t_i}^{t_{i+1}} \frac{\partial L}{\partial \beta^1}(\psi_t, \beta(t))\,\bigl(\dot\psi_t^1 - \dot l_t^1\bigr)\,dt. \tag{1.2}$$

Here β(t) is a point of the interval connecting \dot\psi_t with \dot l_t (which varies with t in general); \dot\psi_t^1 - \dot l_t^1 does not depend on t and is equal to (ψ¹_{t_{i+1}} - l¹_{t_{i+1}} - ψ¹_{t_i} + l¹_{t_i})/(t_{i+1} - t_i).
We pull this constant out of the integral. We use the facts that |∂L/∂β¹| ≤ γ/3nδ₀ and that 0 ≤ l_t¹ - ψ_t¹ ≤ δ for all t. We obtain that the integral (1.2) does not exceed γ/3n, and the first integral in (1.1) is not greater than γ/3. The second integral in (1.1) does not exceed

$$\Delta_L(\delta)\int_0^T \bigl(1 + L(l_t, \dot l_t)\bigr)\,dt \le \frac{\gamma}{3K + \gamma}\,\bigl(T + S_{0T}(l)\bigr) \le \frac{\gamma}{3}.$$
Finally, S_{0T}(ψ) ≤ S_{0T}(l) + 2γ/3 ≤ S_{0T}(φ) + γ.

On the other hand, if T ≤ h₀, then ρ(φ₀, φ_T) ≤ λ'; the distances of φ₀ and φ_T from the points ψ₀ and ψ_T do not exceed δ ≤ δ₀ < λ', so that ρ(ψ₀, ψ_T) ≤ 3λ' ≤ γ/L. Using Lemma 1.1, we obtain that these points can be connected by a curve ψ_t, 0 ≤ t ≤ T̃ = ρ(ψ₀, ψ_T), with S_{0T̃}(ψ) ≤ γ ≤ S_{0T}(φ) + γ; this curve does not leave D_{-δ} ∪ ∂D_{-δ}.
The proof can be repeated for curves φ_t in D_{+δ} ∪ ∂D_{+δ} and ψ_t in D ∪ ∂D. □

It follows from Lemma 1.4 that V_D(x, y) changes little as D is changed to D_{+δ} with small δ, uniformly in x and y varying within the boundaries of some compactum in D ∪ ∂D. It also changes little as D is changed to D_{-δ}; in this case, if x or y lies outside D_{-δ} ∪ ∂D_{-δ}, then they have to be replaced by (x)_{-δ} or (y)_{-δ}, respectively.
With V_D(x, y) there is associated an equivalence relation ~_D between points of D ∪ ∂D: x ~_D y if V_D(x, y) = V_D(y, x) = 0. This relation depends only on the dynamical system and does not change if the matrix (a^{ij}(x)) is changed to another matrix whose maximal and minimal eigenvalues differ from it by a bounded factor. For D = M we simply write x ~ y in place of x ~_M y. It is clear that x ~_D y implies x ~ y. It follows from Lemma 1.1 that in the case of a smooth boundary ∂D, the points equivalent to each other form a closed set.

Lemma 1.5. Let x ~_D y, y ≠ x. The trajectory x_t(x) of the dynamical system ẋ_t = b(x_t) beginning at x lies in the set of points z ~_D x.
Proof. There exists a sequence of functions φ_t^{(n)}, 0 ≤ t ≤ T_n, φ₀^{(n)} = x, φ_{T_n}^{(n)} = y, lying entirely in D ∪ ∂D and such that S_{0T_n}(φ^{(n)}) → 0. The T_n are bounded from below by a positive constant, say T. For the functions φ_t^{(n)} on the interval from 0 to T, the values of S_{0T} converge to 0. Therefore, some subsequence of these functions converges, uniformly on [0, T], to a function φ_t with S_{0T}(φ) = 0, i.e., to the trajectory x_t(x) of the dynamical system. The points x_t(x), 0 ≤ t ≤ T, are equivalent to x and y; this follows from the fact that V_D(x, φ_t^{(n)}) and V_D(φ_t^{(n)}, y) do not exceed S_{0T_n}(φ^{(n)}) → 0 as n → ∞.
We continue with x_T(x). We choose that one of the points x and y which is farther from x_T(x). If it is, say, x, then we have ρ(x_T(x), x) ≥ ½ρ(x, y). In the same way as earlier, we obtain that over some time interval x_t(x) goes within the set of points equivalent to x. These intervals are bounded from below by a positive constant, and thus we obtain our assertion for all t > 0 by successive application of the above argument. It is easy to see that the assertion is also true for t < 0. □
Figure 10.
Figure 11.
Any ω-limit set, i.e., any set of partial limits of a trajectory of the dynamical system as t → ∞, consists of equivalent points; this follows from Lemma 1.1. A maximal set of points equivalent to each other, containing some ω-limit set, may consist of one ω-limit set, it may be the union of a finite or infinite number of ω-limit sets, or it may contain further points not belonging to any ω-limit set. Moreover, such a set contains, with every point, a whole trajectory of the dynamical system.
We give some examples. Let the trajectories of a system be concentric circles (Fig. 10). In this case, for any two points x, y of a closed trajectory we have x ~ y. However, x ≁_D y for x ≠ y if D has the form indicated in the figure.
In Fig. 11, if the trajectories not drawn approach the trajectories drawn, there are six different ω-limit sets: the point 1, the point 2, the exterior curve (including the point 1), the union of the exterior curve and the intermediate curve, the union of the intermediate curve and the interior curve, and the interior curve; the union of these ω-limit sets forms a maximal set of points equivalent to each other. In the case where the trajectories not drawn move away from those drawn, the set of equivalent points remains the same, but there remain only two ω-limit sets: the points 1 and 2.
Lemma 1.6. Let all points of a compactum K ⊆ D ∪ ∂D be equivalent to each other but not equivalent to any other point in D ∪ ∂D. For any γ > 0, δ > 0 and x, y ∈ K there exists a function φ_t, 0 ≤ t ≤ T, φ₀ = x, φ_T = y, lying entirely in the intersection of D ∪ ∂D with the δ-neighborhood of K and such that S_{0T}(φ) < γ.

Proof. We connect the points x and y with curves φ_t^{(n)} in D ∪ ∂D, 0 ≤ t ≤ T_n, with S_{0T_n}(φ^{(n)}) → 0. If all curves φ_t^{(n)} left the δ-neighborhood of K, then they would have a limit point outside this δ-neighborhood, equivalent to x and y. □

Lemma 1.7. Let all points of a compactum K be equivalent to each other and let K ≠ M. Let us denote by τ_G the time of first exit of the process X_t^ε from the δ-neighborhood G of K. For any γ > 0 there exists δ > 0 such that for all sufficiently small ε and all x ∈ G we have

$$M_x \tau_G < e^{\gamma \varepsilon^{-2}}. \tag{1.3}$$
Proof. We choose a point z outside K such that ρ(z, K) ≤ γ/3L ∧ λ, where L and λ are the constants from Lemma 1.1. We put δ = ρ(z, K)/2 and consider the δ-neighborhood G ⊇ K. We denote by x' the point of K closest to x (or any of the closest points if there are several) and by y the point of K closest to z. According to Lemma 1.2, for every pair of points x', y ∈ K there exists a function φ_t, 0 ≤ t ≤ T, φ₀ = x', φ_T = y, such that S_{0T}(φ) ≤ γ/3, and T is bounded by some constant independent of the initial and terminal points. We complete the curve φ_t at the beginning and at the end with little segments leading from x to x' and from y to z, with the values of S not exceeding γ/6 and γ/3, respectively. Then the length of the time interval on which each of the resulting functions is defined is uniformly bounded for x ∈ G by a constant T₀, and the value of S at each of these functions does not exceed 5γ/6.

We extend the definition of each of these functions up to the end of the interval [0, T₀] as the solution of ẋ_t = b(x_t); this does not increase the value of S.

Now we use Theorem 3.2 of Ch. 5: for x ∈ G we have

$$P_x\{\tau_G \le T_0\} \ge P_x\{\rho_{0T_0}(X^\varepsilon, \varphi) < \delta\} \ge e^{-0.9\gamma\varepsilon^{-2}}.$$

Using the Markov property, we obtain that P_x{τ_G > nT₀} ≤ [1 - e^{-0.9γε^{-2}}]ⁿ. This yields

$$M_x \tau_G \le T_0 \sum_{n=0}^\infty \bigl[1 - e^{-0.9\gamma\varepsilon^{-2}}\bigr]^n = T_0\,e^{0.9\gamma\varepsilon^{-2}}.$$
Sacrificing 0.1γ in order to get rid of T₀, we obtain the required estimate. □

Lemma 1.8. Let K be an arbitrary compactum and let G be a neighborhood of K. For any γ > 0 there exists δ > 0 such that for all sufficiently small ε and all x belonging to the closed δ-neighborhood g ∪ ∂g of K we have

$$M_x \int_0^{\tau_G} \chi_g(X_t^\varepsilon)\,dt \ge e^{-\gamma\varepsilon^{-2}}. \tag{1.4}$$
Proof. We connect the point x ∈ g ∪ ∂g and the closest point x' of K with a curve φ_t with the value of S not greater than γ/3 (this can be done for sufficiently small δ). We continue this curve by a solution of ẋ_t = b(x_t) until exit from g ∪ ∂g, in such a way that the interval on which the curve is defined has length not greater than some T < ∞ independent of x. The trajectory of X_t^ε is at a distance not greater than δ/2 from φ_t with probability not less than e^{-2γε^{-2}/3} (for small ε); moreover, until exit from G it is in g, where it spends a time bounded from below by a constant t₀, so that the mathematical expectation in (1.4) is greater than t₀e^{-2γε^{-2}/3} ≥ e^{-γε^{-2}} for small ε. □
Now we prove a lemma which is a generalization of Lemma 2.2 of Ch. 4.
Lemma 1.9. Let K be a compact subset of M not containing any ω-limit set entirely. There exist positive constants c and T₀ such that for all sufficiently small ε and any T > T₀ and x ∈ K we have

$$P_x\{\tau_K > T\} \le e^{-\varepsilon^{-2} c\,(T - T_0)},$$

where τ_K is the time of first exit of X_t^ε from K.
Proof. Using the continuous dependence of a solution on the initial conditions, it is easy to see that for sufficiently small δ, the closed δ-neighborhood K_{+δ} of K does not contain any ω-limit set entirely, either. For x ∈ K_{+δ} we denote by T(x) the time of first exit of the solution x_t(x) from K_{+δ}. We have T(x) < ∞ for all x ∈ K_{+δ}. The function T(x) is upper semicontinuous, and consequently it attains its largest value max_{x∈K_{+δ}} T(x) = T₁ < ∞. We put T₀ = T₁ + 1 and consider all functions φ_t defined for 0 ≤ t ≤ T₀ and assuming values only in K_{+δ}. The set of these functions is closed in the sense of uniform convergence, and consequently S_{0T₀} attains its minimum A on this set. The minimum is positive, since there are no trajectories of the dynamical system among the functions under consideration. Then, using Theorem 3.2 of Ch. 5 in the same way as in the proof of Lemma 2.2 of Ch. 4, we obtain

$$P_x\{\tau_K > T_0\} \le \exp\{-\varepsilon^{-2}(A - \gamma)\},$$

$$P_x\{\tau_K > T\} \le \exp\Bigl\{-\varepsilon^{-2}\Bigl(\frac{T}{T_0} - 1\Bigr)(A - \gamma)\Bigr\}. \qquad \square$$
Corollary. It follows from Lemma 1.9 that for ε smaller than some ε₀ and for all x ∈ K we have M_x τ_K ≤ T₀ + ε²/c ≤ T₀' = T₀ + ε₀²/c.
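The corollary follows by integrating the tail bound of Lemma 1.9 over T; a worked version of this step (using only that τ_K ≥ 0 and the exponential tail for T > T₀):

```latex
M_x \tau_K = \int_0^\infty P_x\{\tau_K > T\}\,dT
\le T_0 + \int_{T_0}^\infty e^{-\varepsilon^{-2} c\,(T - T_0)}\,dT
= T_0 + \frac{\varepsilon^2}{c}
\le T_0 + \frac{\varepsilon_0^2}{c} = T_0'.
```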
§2. Markov Chains Connected with the Process (X_t^ε, P_x^ε)

In this section we shall assume that D is a domain with smooth boundary and compact closure on a manifold M. We impose the following restrictions on the structure of the dynamical system in D ∪ ∂D:
(A) In D there exist a finite number of compacta K₁, K₂, ..., K_l such that:

(1) for any two points x, y belonging to the same compactum we have x ~_D y;
(2) if x ∈ K_i and y ∉ K_i, then x ≁_D y;
(3) every ω-limit set of the dynamical system ẋ_t = b(x_t) lying entirely in D ∪ ∂D is contained in one of the K_i.

We have seen (§§2, 4, Ch. 4 and §4, Ch. 5) that in the case of a dynamical system with one (stable) equilibrium position O, for the study of the behavior of the process X_t^ε on large time intervals for small ε, an essential role is played by the Markov chain Z_n on the set γ ∪ ∂D, where γ is the boundary of a small neighborhood of O. We have also seen that the asymptotics of the transition
probabilities of this chain are almost independent of the initial point x ∈ γ, so that for small ε the chain Z_n behaves as a simple Markov chain with a finite number of states. The asymptotics of the transition probabilities of the chain were determined by the quantities V₀ = min{V(O, y): y ∈ ∂D} and min{V(O, y): y ∈ ∂D\𝒢_δ(y₀)}. For example, for small ε, the transition probability P(x, ∂D) for x ∈ γ lies between exp{-ε^{-2}(V₀ ± γ)}, where γ > 0 is small. We use an analogous construction in the case of systems satisfying condition (A).
We introduce the following notation:

$$\tilde V_D(K_i, K_j) = \inf\Bigl\{S_{0T}(\varphi)\colon \varphi_0 \in K_i,\ \varphi_T \in K_j,\ \varphi_t \in (D \cup \partial D)\setminus\bigcup_{s \ne i,j} K_s \text{ for } 0 \le t \le T\Bigr\}$$

(if there are no such functions, we set Ṽ_D(K_i, K_j) = +∞). For x, y ∈ D ∪ ∂D we set

$$\tilde V_D(x, K_j) = \inf\Bigl\{S_{0T}(\varphi)\colon \varphi_0 = x,\ \varphi_T \in K_j,\ \varphi_t \in (D \cup \partial D)\setminus\bigcup_{s \ne j} K_s \text{ for } 0 \le t \le T\Bigr\};$$

$$\tilde V_D(K_i, y) = \inf\Bigl\{S_{0T}(\varphi)\colon \varphi_0 \in K_i,\ \varphi_T = y,\ \varphi_t \in (D \cup \partial D)\setminus\bigcup_{s \ne i} K_s \text{ for } 0 \le t \le T\Bigr\};$$

$$\tilde V_D(x, y) = \inf\Bigl\{S_{0T}(\varphi)\colon \varphi_0 = x,\ \varphi_T = y,\ \varphi_t \in (D \cup \partial D)\setminus\bigcup_{s} K_s \text{ for } 0 \le t \le T\Bigr\}.$$
Finally,

$$\tilde V_D(K_i, \partial D) = \min_{y \in \partial D} \tilde V_D(K_i, y); \qquad \tilde V_D(x, \partial D) = \min_{y \in \partial D} \tilde V_D(x, y).$$

It is these quantities which determine the asymptotics of the transition probabilities of the Markov chain connected with the process X_t^ε.
We consider an example. Let M = R² and let the trajectories of the dynamical system have the form depicted in Fig. 12. Then there are four compacta consisting of equivalent points and containing the ω-limit sets. It is easy to see that Ṽ_D(K₁, y) = 0 for all y inside the cycle K₂ and, in particular, on K₂; hence Ṽ_D(K₁, K₂) = 0. Further, Ṽ_D(K₁, y) = ∞ for all y outside K₂, whence Ṽ_D(K₁, K₃) = Ṽ_D(K₁, K₄) = Ṽ_D(K₁, ∂D) = ∞. In the same way, Ṽ_D(K₃, K₁) = Ṽ_D(K₄, K₁) = ∞. Further, the values Ṽ_D(K₂, y) are finite and positive if y ∉ K₂, etc. The following matrix of Ṽ_D(K_i, K_j) is consistent with the structure, depicted in Fig. 12, of the trajectories of the dynamical system:
$$\begin{pmatrix} 0 & 0 & \infty & \infty \\ 1 & 0 & 9 & 9 \\ \infty & 6 & 0 & 6 \\ \infty & 0 & 0 & 0 \end{pmatrix} \tag{2.1}$$
(That Ṽ_D(K₂, K₄) is equal to Ṽ_D(K₂, K₃) and Ṽ_D(K₄, K₂) = Ṽ_D(K₄, K₃) can be proved easily.) Knowing Ṽ_D(K_i, K_j) for all i, j, it is easy to determine V_D(K_i, K_j).
Figure 12.
Namely,

$$\begin{aligned} V_D(K_i, K_j) = {} & \tilde V_D(K_i, K_j) \wedge \min_{s}\bigl[\tilde V_D(K_i, K_s) + \tilde V_D(K_s, K_j)\bigr] \\ & \wedge \min_{s_1, s_2}\bigl[\tilde V_D(K_i, K_{s_1}) + \tilde V_D(K_{s_1}, K_{s_2}) + \tilde V_D(K_{s_2}, K_j)\bigr] \\ & \wedge \cdots \wedge \min_{s_1, \ldots, s_{l-2}}\bigl[\tilde V_D(K_i, K_{s_1}) + \cdots + \tilde V_D(K_{s_{l-2}}, K_j)\bigr]. \end{aligned}$$
We can express V_D(K_i, y), V_D(x, K_j), V_D(x, y) similarly. This can be proved by using Lemma 1.6. In the example considered by us, the V_D(K_i, K_j) form the following matrix:

$$\begin{pmatrix} 0 & 0 & 9 & 9 \\ 1 & 0 & 9 & 9 \\ 7 & 6 & 0 & 6 \\ 1 & 0 & 0 & 0 \end{pmatrix} \tag{2.2}$$
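The chained minimization above is exactly a shortest-path computation in the (min, +) semiring, with the Ṽ_D(K_i, K_j) as arc weights. A minimal sketch (ours, not the book's; it assumes the matrix has zero diagonal and nonnegative entries, so that the unrestricted closure coincides with the minimum over chains of distinct intermediate compacta) reproduces matrix (2.2) from matrix (2.1):

```python
import math

INF = math.inf

def min_plus_closure(V):
    """Floyd-Warshall in the (min, +) semiring: W[i][j] becomes the minimum of
    V[i][s1] + V[s1][s2] + ... + V[sm][j] over all chains of intermediates."""
    l = len(V)
    W = [row[:] for row in V]
    for s in range(l):            # allow compactum number s+1 as intermediate
        for i in range(l):
            for j in range(l):
                W[i][j] = min(W[i][j], W[i][s] + W[s][j])
    return W

# one-step quantities \tilde V_D(K_i, K_j), matrix (2.1)
V_tilde = [[0,   0, INF, INF],
           [1,   0, 9,   9],
           [INF, 6, 0,   6],
           [INF, 0, 0,   0]]

V = min_plus_closure(V_tilde)
# V reproduces matrix (2.2); e.g. V_D(K_3, K_1) = 6 + 1 = 7, via K_2
```

The cheapest passage from K₃ to K₁ is the chain K₃ → K₂ → K₁ with cost 6 + 1 = 7, in agreement with matrix (2.2).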
Let ρ₀ be a positive number smaller than half the minimum of the distances between the K_i, K_j and between the K_i and ∂D. Let ρ₁ be a positive number smaller than ρ₀. We denote by C the set D ∪ ∂D from which we delete the ρ₀-neighborhoods of the K_i, i = 1, ..., l; by Γ_i the boundaries of the ρ₀-neighborhoods of the K_i; by g_i the ρ₁-neighborhoods of the K_i; and by g the union of the g_i. We introduce the random times τ₀ = 0, σ_n = inf{t ≥ τ_n: X_t^ε ∈ C}, τ_n = inf{t ≥ σ_{n-1}: X_t^ε ∈ ∂g ∪ ∂D} and consider the Markov chain Z_n = X_{τ_n}^ε. From n = 1 on, Z_n belongs to ∂g ∪ ∂D. As far as the times σ_n are concerned, X_{σ_0}^ε can be any point of C; all the following X_{σ_n}^ε until the time of exit of X_t^ε to ∂D belong to one of the surfaces Γ_i; and after exit to the boundary we have τ_n = σ_n = τ_{n+1} = ··· and the chain Z_n stops. The estimates of the transition probabilities of the Z_n are provided by the following two lemmas.
Lemma 2.1. For any γ > 0 there exists ρ₀ > 0 (which can be chosen arbitrarily small) such that for any ρ₂, 0 < ρ₂ < ρ₀, there exists ρ₁, 0 < ρ₁ < ρ₂, such that for any δ₀ smaller than ρ₀ and sufficiently small ε, for all x in the ρ₂-neighborhood G_i of the compactum K_i (i = 1, ..., l) the one-step transition probabilities of Z_n satisfy the inequalities

$$\exp\{-\varepsilon^{-2}(\tilde V_D(K_i, K_j) + \gamma)\} \le P(x, \partial g_j) \le \exp\{-\varepsilon^{-2}(\tilde V_D(K_i, K_j) - \gamma)\}; \tag{2.3}$$

$$\exp\{-\varepsilon^{-2}(\tilde V_D(K_i, \partial D) + \gamma)\} \le P(x, \partial D) \le \exp\{-\varepsilon^{-2}(\tilde V_D(K_i, \partial D) - \gamma)\}; \tag{2.4}$$
Figure 13.
for all y ∈ ∂D we have

$$\exp\{-\varepsilon^{-2}(\tilde V_D(K_i, y) + \gamma)\} \le P(x, \partial D \cap \mathscr{G}_{\delta_0}(y)) \le \exp\{-\varepsilon^{-2}(\tilde V_D(K_i, y) - \gamma)\}, \tag{2.5}$$

where 𝒢_{δ₀}(y) denotes the δ₀-neighborhood of y.
In particular, if D coincides with the whole manifold M (and M is compact), then ∂D = ∅, the chain Z_n has ∂g as its space of states, and (2.3) implies that for x ∈ ∂g_i, P(x, ∂g_j) lies between exp{-ε^{-2}(Ṽ(K_i, K_j) ± γ)}. We provide a figure for this and the next lemma (Fig. 13).

Proof. First of all, Ṽ_D(K_i, K_j) = +∞ (or Ṽ_D(K_i, ∂D) = +∞, Ṽ_D(K_i, y) = +∞) means that there is no smooth curve connecting K_i with K_j in D ∪ ∂D and not touching the other compacta (or connecting K_i with ∂D or with y on the boundary, respectively). From this it is easy to derive that they cannot be connected even with a continuous curve not touching the indicated compacta. This implies that for Ṽ_D(K_i, K_j) = ∞ (or Ṽ_D(K_i, ∂D) = ∞, Ṽ_D(K_i, y) = ∞), the transition probabilities in (2.3) (or (2.4), (2.5), respectively) are equal to zero. As far as the finite Ṽ_D(K_i, K_j), Ṽ_D(K_i, y) are concerned, they are bounded by some V₀ < ∞. We note that it is sufficient to prove (2.3) and (2.5); estimate (2.5) will imply (2.4), since ∂D can be covered by a finite number of δ₀-neighborhoods.

We choose a positive ρ₀ smaller than γ/10L, λ/2 (L and λ are the constants from Lemma 1.1), and a third of the minimum distance between the K_i, K_j and ∂D. Let ρ₂, 0 < ρ₂ < ρ₀, be chosen. By Lemma 1.4 there exists a positive δ ≤ ρ₀/2 such that for all i, j = 1, ..., l and y ∈ ∂D we have

$$\tilde V_{D+\delta}(K_i, K_j) \ge \tilde V_D(K_i, K_j) - 0.1\gamma, \qquad \tilde V_{D-\delta}(K_i, K_j) \le \tilde V_D(K_i, K_j) + 0.1\gamma,$$

$$\tilde V_{D+\delta}(K_i, y) \ge \tilde V_D(K_i, y) - 0.1\gamma, \qquad \tilde V_{D-\delta}(K_i, (y)_{-\delta}) \le \tilde V_D(K_i, y) + 0.1\gamma.$$
For every pair K_i, K_j for which Ṽ_D(K_i, K_j) < ∞ we choose a function φ_t^{K_i,K_j}, 0 ≤ t ≤ T = T(K_i, K_j), such that φ₀^{K_i,K_j} ∈ K_i, φ_T^{K_i,K_j} ∈ K_j, φ^{K_i,K_j} does not touch ⋃_{s≠i,j} K_s, does not leave D_{-δ} ∪ ∂D_{-δ} for 0 ≤ t ≤ T, and for which

$$S_{0T}(\varphi^{K_i,K_j}) \le \tilde V_D(K_i, K_j) + 0.2\gamma.$$

Further, on ∂D we choose a ρ₂-net y₁, ..., y_N. For every pair K_i, y_k for which Ṽ_D(K_i, y_k) < ∞, we choose a function φ_t^{K_i,y_k}, 0 ≤ t ≤ T = T(K_i, y_k), φ₀^{K_i,y_k} ∈ K_i, φ_T^{K_i,y_k} = (y_k)_{-δ}, which does not touch ⋃_{s≠i} K_s, does not leave D_{-δ} ∪ ∂D_{-δ}, and for which

$$S_{0T}(\varphi^{K_i,y_k}) \le \tilde V_D(K_i, y_k) + 0.2\gamma.$$

We fix a positive ρ₁ smaller than ρ₂, ρ₀/2,

min{ρ(φ_t^{K_i,K_j}, ⋃_{s≠i,j} K_s): 0 ≤ t ≤ T(K_i, K_j); i, j = 1, ..., l}

and

min{ρ(φ_t^{K_i,y_k}, ⋃_{s≠i} K_s): 0 ≤ t ≤ T(K_i, y_k); i = 1, ..., l; k = 1, ..., N}.
Let an arbitrary positive δ₀ < ρ₀ be chosen. We derive estimates (2.3) and (2.5).

We choose a positive δ' not exceeding δ, ρ₁, or ρ₀ - ρ₂ and such that the δ'-neighborhood of the segment of the normal passing through any point y ∈ ∂D intersects the boundary in 𝒢_{δ₀}(y). First we derive the lower estimates.

Let Ṽ_D(K_i, K_j) < ∞. In accordance with Lemma 1.1, for any x ∈ G_i we take a curve connecting x with a point x' ∈ K_i, for which the value of S does not exceed 0.1γ; the distance between this curve and the set C is not smaller than δ'. Then, according to Lemma 1.6, we find a curve in G_i which connects x' with φ₀^{K_i,K_j} ∈ K_i, with the value of S not greater than 0.1γ again. We combine these curves, complete them with the curve φ^{K_i,K_j}, and obtain a function φ_t, 0 ≤ t ≤ T (φ_t and T depend on x ∈ G_i and j), φ₀ = x, φ_T ∈ K_j,
such that S_{0T}(φ) ≤ Ṽ_D(K_i, K_j) + 0.4γ. For j = i we define φ_t so that it connects x ∈ G_i with a point x″ at distance ρ₀ + δ' from K_i and then with the closest point of K_i; then S_{0T}(φ) ≤ 0.6γ = Ṽ_D(K_i, K_i) + 0.6γ. The lengths of the intervals of definition of the functions φ_t constructed for all possible compacta K_i, K_j and points x ∈ G_i can be bounded from above by a constant T₀ < ∞ (cf. Lemmas 1.1 and 1.2). We extend all functions φ_t to the interval from T to T₀ as a solution of ẋ_t = b(x_t), which does not increase the value of S_{0T₀}(φ). If a trajectory of X_t^ε passes at a distance smaller than δ' from φ_t for 0 ≤ t ≤ T₀, then the trajectory intersects Γ_i and reaches the δ'-neighborhood of
K_j without getting closer than ρ₂ + δ' to any of the other compacta; moreover, X_{τ₁}^ε ∈ ∂g_j. Using Theorem 3.2 of Ch. 5, we obtain for ε not exceeding some ε₀ depending only on γ, V₀, T₀, and δ':

$$P(x, \partial g_j) \ge P_x\{\rho_{0T_0}(X^\varepsilon, \varphi) < \delta'\} \ge \exp\{-\varepsilon^{-2}(S_{0T_0}(\varphi) + 0.1\gamma)\} \ge \exp\{-\varepsilon^{-2}(\tilde V_D(K_i, K_j) + \gamma)\}.$$

The lower estimate in (2.3) has been obtained; we pass to estimate (2.5). Let x ∈ G_i, y ∈ ∂D, Ṽ_D(K_i, y) < ∞. We choose a point y_k from our net such that ρ(y_k, y) < ρ₀; then ρ((y_k)_{-δ}, (y)_{-δ}) < 2ρ₀. We connect x and x' ∈ K_i with a curve with the value of S not greater than 0.1γ, and x' and φ₀^{K_i,y_k} ∈ K_i with a curve with the value of S not greater than 0.1γ. We complete the curve thus obtained with φ^{K_i,y_k}. Then we reach (y_k)_{-δ}, and the total value of S is not greater than Ṽ_D(K_i, y) + 0.5γ. We connect (y_k)_{-δ} with (y)_{+δ} through (y)_{-δ}, increasing the value of S by no more than 0.3γ. Finally, we extend all functions constructed so far to the same interval [0, T₀] as a solution of ẋ_t = b(x_t), in such a way that S_{0T₀}(φ) ≤ Ṽ_D(K_i, y) + 0.8γ. Using Theorem 3.2 of Ch. 5 again, we obtain the lower estimate in (2.5).

Now we obtain the upper estimates. It is sufficient to prove estimates (2.3) and (2.5) for x ∈ Γ_i (this follows from the strong Markov property). By virtue of the choice of ρ₀ and δ', for
any curve φ_t, 0 ≤ t ≤ T, beginning on Γ_i, touching the δ'-neighborhood of ∂g_j (respectively, the (δ₀ + δ')-neighborhood of y ∈ ∂D), not touching the compacta K_s, s ≠ i, j, and not leaving D_{+δ} ∪ ∂D_{+δ}, we have S_{0T}(φ) ≥ Ṽ_D(K_i, K_j) - 0.3γ (respectively, S_{0T}(φ) ≥ Ṽ_D(K_i, y) - 0.4γ). Using Lemma 1.9, we choose T₁ such that for all sufficiently small ε > 0 and x ∈ (D ∪ ∂D)\g we have P_x{τ₁ > T₁} ≤ exp{-ε^{-2}V₀}.

Any trajectory of X_t^ε beginning at a point x ∈ Γ_i and being in ∂g_j (in ∂D ∩ 𝒢_{δ₀}(y)) at time τ₁ either spends time T₁ without touching ∂g ∪ ∂D or reaches ∂g_j (∂D ∩ 𝒢_{δ₀}(y)) within time T₁; in the latter case ρ_{0T₁}(X^ε, Φ_x(Ṽ_D(K_i, K_j) - 0.3γ)) ≥ δ' (respectively, ρ_{0T₁}(X^ε, Φ_x(Ṽ_D(K_i, y) - 0.4γ)) ≥ δ'). Therefore, for any x ∈ Γ_i we have

$$P_x\{X_{\tau_1}^\varepsilon \in \partial g_j\} \le P_x\{\tau_1 > T_1\} + P_x\{\rho_{0T_1}(X^\varepsilon, \Phi_x(\tilde V_D(K_i, K_j) - 0.3\gamma)) \ge \delta'\}; \tag{2.6}$$

$$P_x\{X_{\tau_1}^\varepsilon \in \partial D \cap \mathscr{G}_{\delta_0}(y)\} \le P_x\{\tau_1 > T_1\} + P_x\{\rho_{0T_1}(X^\varepsilon, \Phi_x(\tilde V_D(K_i, y) - 0.4\gamma)) \ge \delta'\}. \tag{2.7}$$
For small ε the first probability on the right side of (2.6) and (2.7) does not exceed exp{-ε^{-2}V₀}, and by virtue of Theorem 3.2 of Ch. 5 the second one is smaller than exp{-ε^{-2}(Ṽ_D(K_i, K_j) - 0.5γ)} (respectively, exp{-ε^{-2}(Ṽ_D(K_i, y) - 0.5γ)}) for all x ∈ Γ_i. From this we obtain the upper estimates in (2.3) and (2.5). □
Lemma 2.2. For any γ > 0 there exists ρ₀ > 0 (which can be chosen arbitrarily small) such that for any ρ₂, 0 < ρ₂ < ρ₀, there exists ρ₁, 0 < ρ₁ < ρ₂, such that for any δ₀, 0 < δ₀ < ρ₀, sufficiently small ε, and all x outside the ρ₂-neighborhoods of the compacta K_i and the boundary ∂D, the one-step transition probabilities of the chain Z_n satisfy the inequalities

$$\exp\{-\varepsilon^{-2}(\tilde V_D(x, K_j) + \gamma)\} \le P(x, \partial g_j) \le \exp\{-\varepsilon^{-2}(\tilde V_D(x, K_j) - \gamma)\}; \tag{2.8}$$

$$\exp\{-\varepsilon^{-2}(\tilde V_D(x, \partial D) + \gamma)\} \le P(x, \partial D) \le \exp\{-\varepsilon^{-2}(\tilde V_D(x, \partial D) - \gamma)\}; \tag{2.9}$$

$$\exp\{-\varepsilon^{-2}(\tilde V_D(x, y) + \gamma)\} \le P(x, \partial D \cap \mathscr{G}_{\delta_0}(y)) \le \exp\{-\varepsilon^{-2}(\tilde V_D(x, y) - \gamma)\}. \tag{2.10}$$
Proof. We choose ρ₀ as in the proof of the preceding lemma. Let ρ₂, 0 < ρ₂ < ρ₀, be given. Let us denote by C_{ρ₂} the compactum consisting of the points of D ∪ ∂D outside the ρ₂-neighborhoods of the K_i and ∂D. We choose δ, 0 < δ ≤ ρ₀/2, such that for all x ∈ C_{ρ₂}, j = 1, ..., l, and y ∈ ∂D we have

$$\tilde V_{D+\delta}(x, K_j) \ge \tilde V_D(x, K_j) - 0.1\gamma, \qquad \tilde V_{D-\delta}(x, K_j) \le \tilde V_D(x, K_j) + 0.1\gamma,$$

$$\tilde V_{D+\delta}(x, y) \ge \tilde V_D(x, y) - 0.1\gamma, \qquad \tilde V_{D-\delta}(x, (y)_{-\delta}) \le \tilde V_D(x, y) + 0.1\gamma.$$

We choose ρ₂-nets x₁, ..., x_M in C_{ρ₂} and y₁, ..., y_N on ∂D. For every pair x_i, K_j for which Ṽ_D(x_i, K_j) < ∞ we select a function φ_t^{x_i,K_j}, 0 ≤ t ≤ T = T(x_i, K_j), φ₀^{x_i,K_j} = x_i, φ_T^{x_i,K_j} ∈ K_j, which does not touch ⋃_{s≠j} K_s, does not leave D_{-δ} ∪ ∂D_{-δ}, and for which S_{0T}(φ^{x_i,K_j}) ≤ Ṽ_D(x_i, K_j) + 0.2γ. The remainder of the proof repeats the corresponding steps in the proof of Lemma 2.1. □
Lemma 2.2 has a very simple special case where there are no compacta K_i, i.e., there are no ω-limit sets of the dynamical system in D ∪ ∂D. In this case we obtain immediately the following theorem.

Theorem 2.1. Suppose that the family (X_t^ε, P_x^ε) of diffusion processes satisfies the hypotheses of Theorem 3.2 of Ch. 5 and the drift b(x) satisfies a Lipschitz condition. Let D be a domain with compact closure and smooth boundary, and assume that none of the ω-limit sets of the system ẋ_t = b(x_t) lies entirely in D ∪ ∂D. Then the asymptotics as ε → 0 of the distribution, at the time of exit to the boundary, of the process beginning at x can be described by the action function ε^{-2}V_D(x, y), y ∈ ∂D, uniformly in the initial point strictly inside D, i.e., we have

$$\lim_{\delta_0 \to 0}\ \lim_{\varepsilon \to 0}\ \varepsilon^2 \ln P_x\{X_{\tau^\varepsilon}^\varepsilon \in \mathscr{G}_{\delta_0}(y)\} = -V_D(x, y),$$

uniformly in x belonging to any compact subset of D and in y ∈ ∂D, where τ^ε = inf{t: X_t^ε ∉ D}.
In particular, with probability converging to 1 as ε → 0, the exit to the boundary takes place in a small neighborhood of the set of points at which the trajectory x_t(x), t > 0, touches the boundary until exit from D ∪ ∂D. (Of course, this simple result can be obtained more easily.) The formulation of Theorem 2.1 in the language of differential equations reads as follows:

Theorem 2.2. Suppose that u^ε(x) is the solution of the Dirichlet problem L^ε u^ε(x) = 0 in D, u^ε(x) = exp{ε^{-2}F(x)} on ∂D, where in local coordinates

$$L^\varepsilon = \sum_i b^{i,\varepsilon}(x)\,\frac{\partial}{\partial x^i} + \frac{\varepsilon^2}{2}\sum_{i,j} a^{ij}(x)\,\frac{\partial^2}{\partial x^i\,\partial x^j},$$

b^{i,ε}(x) → b^i(x) uniformly in x and in the local coordinate systems K_{x₀} as ε → 0, and F is a continuous function. If none of the ω-limit sets of the system ẋ_t = b(x_t) lies entirely in D ∪ ∂D, then

$$u^\varepsilon(x) \approx \exp\Bigl\{\varepsilon^{-2}\max_{y\in\partial D}\,\bigl(F(y) - V_D(x, y)\bigr)\Bigr\}$$

as ε → 0 (in the sense of logarithmic equivalence), uniformly in x belonging to any compact subset of D, where V_D(x, y) is defined by the coefficients b^i(x), a^{ij}(x) in the same way as in §1.

In the case where there are ω-limit sets in D ∪ ∂D (namely, in D) and condition (A) is satisfied, for the study of problems connected with the behavior of (X_t^ε, P_x^ε) we first have to study the limit behavior of Markov chains with exponential asymptotics of the transition probabilities.
§3. Lemmas on Markov Chains

For finite Markov chains an invariant measure, the distribution at the time of exit from a set, the mean exit time, etc., can be expressed explicitly as the ratio of some determinants, i.e., sums of products consisting of transition
probabilities (since the values of the invariant measure and of the other quantities involved are positive, these sums contain only terms with a plus sign). Which products appear in the various sums can be described conveniently by means of graphs on the set of states of the chain. For chains on an infinite phase space divided into a finite number of parts, for which there are upper and lower estimates of the transition probabilities (of the type obtained in the preceding section), in this section we obtain estimates of the values of an invariant measure, of the probability of exit to one set sooner than to another, and so on. In the application of these results to the chain Z_n, with the estimates of the transition probabilities obtained in the preceding section, in each sum we select one or several terms decreasing more slowly than the remaining ones. The constant characterizing the rate of decrease of the sum of products is, of course, determined as the minimum of the sums of constants characterizing the rates of decrease of each of the transition probabilities (cf. §§4, 5).

Let L be a finite set, whose elements will be denoted by the letters i, j, k, m, n, etc., and let a subset W be selected in L. A graph consisting of arrows m → n (m ∈ L\W, n ∈ L, n ≠ m) is called a W-graph if it satisfies the following conditions:

(1) every point m ∈ L\W is the initial point of exactly one arrow;
(2) there are no closed cycles in the graph.

We note that condition (2) can be replaced by the following condition:

(2') for any point m ∈ L\W there exists a sequence of arrows leading from it to some point n ∈ W.

We denote by G(W) the set of W-graphs; we shall use the letter g to denote graphs. If p_{ij} (i, j ∈ L, j ≠ i) are numbers, then ∏_{(m→n)∈g} p_{mn} will be denoted by π(g).
Lemma 3.1. Let us consider a Markov chain with set of states L and transition probabilities p_{ij}, and assume that every state can be reached from any other state in a finite number of steps. Then the stationary distribution of the chain is {(Σ_{j∈L} Q_j)^{-1} Q_i, i ∈ L}, where

$$Q_i = \sum_{g \in G(\{i\})} \pi(g). \tag{3.1}$$

Proof. The numbers Q_i are positive. It is sufficient to prove that they satisfy the system of equations

$$Q_i = \sum_{j \in L} Q_j\,p_{ji} \qquad (i \in L),$$

since it is well known that the stationary distribution is the unique (up to a multiplicative constant) solution of this system. In the ith equation we carry
the ith term from the right side to the left side; we obtain that we have to verify the equality

$$Q_i \sum_{k \ne i} p_{ik} = \sum_{j \ne i} Q_j\,p_{ji}. \tag{3.2}$$

It is easy to see that if we substitute the numbers defined by formula (3.1) into (3.2), then on both sides we obtain the sum Σ π(g) over all graphs g satisfying the following conditions:

(1) every point m ∈ L is the initial point of exactly one arrow m → n (n ≠ m, n ∈ L);
(2) in the graph there is exactly one closed cycle, and this cycle contains the point i. □
Lemma 3.2. Let us be given a Markov chain on a phase space X divided into disjoint sets X_i, where i runs over a finite set L. Suppose that there exist nonnegative numbers p_{ij} (j ≠ i; i, j ∈ L) and a number a > 1 such that

$$a^{-1} p_{ij} \le P(x, X_j) \le a\,p_{ij} \qquad (x \in X_i,\ i \ne j)$$

for the transition probabilities of our chain. Furthermore, suppose that every set X_j can be reached from any state x sooner or later (for this it is necessary and sufficient that for any j there exist a {j}-graph g such that π(g) > 0). Then

$$a^{2-2l}\Bigl(\sum_j Q_j\Bigr)^{-1} Q_i \le \nu(X_i) \le a^{2l-2}\Bigl(\sum_j Q_j\Bigr)^{-1} Q_i$$

for any normalized invariant measure ν of our chain, where l is the number of elements of L and the Q_i are defined by formula (3.1).
Proof. For any pair i, j there exists a number s of steps such that the transition probabilities P^{(s)}(x, X_j) for x ∈ X_i can be estimated from below by a positive constant. It follows from this that all ν(X_j) are positive. Let us consider the Markov chain on L with transition probabilities p̄_{ij} = (1/ν(X_i)) ∫_{X_i} ν(dx) P(x, X_j). The stationary distribution of this chain is {ν(X_i), i ∈ L}, which can be estimated by means of the expression given for it in Lemma 3.1. □
Now we formulate an assertion which we shall use in the study of exit to the boundary.
For i ∈ L\W, j ∈ W we denote by G_{ij}(W) the set of W-graphs in which the sequence of arrows leading from i into W (cf. condition (2')) ends at the point j.
X, n X; = 0 (i # j), and assume that the transition probabilities of the chain satisfy the inequalities
a - 'pij < P(x, X;) < api;
(x E Xi. j
i),
(3.3)
where a is a number greater than one. For x e X and B c Uke W Xk we denote by qw(x, B) the probability that at the first entrance time of Uke W Xk, the particle performing a random walk in accordance with our chain hits B provided that it starts f rom x. If the number of points in L\W is equal to r, then
a 4,
9E
I
n(g)
it(g)
G(W) n (g) gsG(W)
< qw(x, X j) :5 a4r
9E
w) n
(g)
gEG(W)
(xEX;,iEL\W,jE W), provided that the denominator is positive.
We shall prove this by induction on r. For r = 1 we have W = L\{i}. Using the Markov property, we obtain for x ∈ X_i that

\[ q_{L\setminus\{i\}}(x, X_j) = P(x, X_j) + \int_{X_i} P(x, dy)P(y, X_j) + \int_{X_i}\int_{X_i} P(x, dy_1)P(y_1, dy_2)P(y_2, X_j) + \cdots . \]
Let us denote by A_{ij} the infimum of P(x, X_j) over x ∈ X_i; let B_i = inf_{x∈X_i} P(x, X_i); let us denote by Ā_{ij}, B̄_i the corresponding suprema. By assumption, Σ_{k≠i} p_{ik} > 0. This means that B_i, B̄_i < 1. We have

\[ A_{ij} + A_{ij}B_i + A_{ij}B_i^2 + \cdots = \frac{A_{ij}}{1 - B_i} \le q_{L\setminus\{i\}}(x, X_j) \le \frac{\bar A_{ij}}{1 - \bar B_i}. \]
6. Markov Perturbations on Large Time Intervals
But, by assumption, a^{-1} p_{ij} ≤ A_{ij} ≤ Ā_{ij} ≤ a p_{ij}, and

\[ 1 - \bar B_i \ge a^{-1} \sum_{k \ne i} p_{ik}. \]

(Analogously, 1 − B_i ≤ a Σ_{k≠i} p_{ik}.) We obtain from this that (3.4) is satisfied even with a² instead of a⁴.
Now let (3.4) hold for all W such that L\W contains r elements, for all i ∈ L\W and for all j ∈ W. We prove inequalities (3.4) for a set W such that there are r + 1 points in L\W. Let i ∈ L\W, j ∈ W and put

\[ F = \bigcup_{k \in L\setminus W,\ k \ne i} X_k. \]
We may hit X_j immediately after exit from X_i; we may hit F first and then X_j; we may hit F first, then return to X_i and then hit X_j without calling on F, etc. In accordance with this, using the strong Markov property, we obtain

\[
\begin{aligned}
q_W(x, X_j) = {}& q_{L\setminus\{i\}}(x, X_j) + \int_F q_{L\setminus\{i\}}(x, dy_1)\, q_{W\cup\{i\}}(y_1, X_j) \\
&+ \int_F q_{L\setminus\{i\}}(x, dy_1) \int_{X_i} q_{W\cup\{i\}}(y_1, dx_1)\, q_{L\setminus\{i\}}(x_1, X_j) \\
&+ \int_F q_{L\setminus\{i\}}(x, dy_1) \int_{X_i} q_{W\cup\{i\}}(y_1, dx_1) \int_F q_{L\setminus\{i\}}(x_1, dy_2)\, q_{W\cup\{i\}}(y_2, X_j) \\
&+ \int_F q_{L\setminus\{i\}}(x, dy_1) \int_{X_i} q_{W\cup\{i\}}(y_1, dx_1) \int_F q_{L\setminus\{i\}}(x_1, dy_2) \int_{X_i} q_{W\cup\{i\}}(y_2, dx_2)\, q_{L\setminus\{i\}}(x_2, X_j) + \cdots .
\end{aligned}
\]
We introduce the notation

\[ C_{ij} = \inf_{x \in X_i} q_{L\setminus\{i\}}(x, X_j), \qquad D_{ij} = \inf_{x \in X_i} \int_F q_{L\setminus\{i\}}(x, dy)\, q_{W\cup\{i\}}(y, X_j), \]
\[ E_i = \inf_{x \in X_i} \int_F q_{L\setminus\{i\}}(x, dy)\, q_{W\cup\{i\}}(y, X_i), \]

and denote the corresponding suprema by C̄_{ij}, D̄_{ij} and Ē_i (as B̄_i, Ē_i is also smaller than 1). Using this notation, we can write

\[ C_{ij} + D_{ij} + (C_{ij}+D_{ij})E_i + (C_{ij}+D_{ij})E_i^2 + \cdots = \frac{C_{ij} + D_{ij}}{1 - E_i} \le q_W(x, X_j) \le \frac{\bar C_{ij} + \bar D_{ij}}{1 - \bar E_i}. \]
In order to make the formulas half as bulky, we shall only consider the upper estimate. Since (3.4) is proved for r = 1, we have C̄_{ij} ≤ a⁴ (p_{ij}/Σ_{k≠i} p_{ik}). By the induction hypothesis, we have

\[ \bar D_{ij} \le \sum_{\substack{k \in L\setminus W \\ k \ne i}} a^4\, \frac{p_{ik}}{\sum_{k' \ne i} p_{ik'}} \cdot a^{4^r}\, \frac{\sum_{g \in G_{kj}(W\cup\{i\})} \pi(g)}{\sum_{g \in G(W\cup\{i\})} \pi(g)} = a^{4+4^r}\, \frac{H_{ij}}{\sum_{k \ne i} p_{ik} \cdot K_i}. \]

The H_{ij} here is the sum of the products π(g) over those graphs in G_{ij}(W) in which the arrow beginning at i does not lead immediately to j, and K_i is the sum of the same products over all (W ∪ {i})-graphs. From this we obtain

\[ \bar C_{ij} + \bar D_{ij} \le a^{4+4^r}\, \frac{p_{ij}K_i + H_{ij}}{\sum_{k \ne i} p_{ik} \cdot K_i}, \]

where in the numerator we now have the sum of the π(g) over all graphs belonging to G_{ij}(W).
Now we estimate the denominator 1 − Ē_i. From the condition that the denominator in (3.4) does not vanish we obtain that if the chain begins at an arbitrary point, it will, with probability 1, hit ∪_{k∈W} X_k, and consequently ∪_{k∈W∪{i}} X_k. Therefore, q_{W∪{i}}(y, X_i) = 1 − q_{W∪{i}}(y, ∪_{k∈W} X_k) and

\[
\begin{aligned}
1 - \bar E_i &= \inf_{x \in X_i} \Big\{ 1 - \int_F q_{L\setminus\{i\}}(x, dy)\Big[1 - q_{W\cup\{i\}}\Big(y, \textstyle\bigcup_{k\in W} X_k\Big)\Big] \Big\} \\
&= \inf_{x \in X_i} \Big\{ q_{L\setminus\{i\}}\Big(x, \textstyle\bigcup_{k\in W} X_k\Big) + \int_F q_{L\setminus\{i\}}(x, dy)\, q_{W\cup\{i\}}\Big(y, \textstyle\bigcup_{k\in W} X_k\Big) \Big\}.
\end{aligned}
\]
The first term is not less than a^{-4}(Σ_{k∈W} p_{ik}/Σ_{k≠i} p_{ik}), and the second one is not less than

\[ \sum_{\substack{k \in L\setminus W \\ k \ne i}} a^{-4}\, \frac{p_{ik}}{\sum_{k' \ne i} p_{ik'}} \cdot a^{-4^r}\, \frac{\sum_{j' \in W}\sum_{g \in G_{kj'}(W\cup\{i\})} \pi(g)}{\sum_{g \in G(W\cup\{i\})} \pi(g)} = a^{-4-4^r}\, \frac{L_i}{\sum_{k \ne i} p_{ik} \cdot K_i}. \]

The L_i here is the sum of the products π(g) over those graphs in G(W) in which the arrow beginning at i leads to a point belonging to L\W. Bringing the estimates of the first and second terms to a common denominator, we obtain that

\[ 1 - \bar E_i \ge a^{-4-4^r}\, \frac{\sum_{k \in W} p_{ik} K_i + L_i}{\sum_{k \ne i} p_{ik} \cdot K_i}. \]

In the numerator here we have the sum of the π(g) over all graphs g ∈ G(W).
Finally, we obtain

\[ q_W(x, X_j) \le \frac{\bar C_{ij} + \bar D_{ij}}{1 - \bar E_i} \le a^{8+2\cdot 4^r}\, \frac{p_{ij}K_i + H_{ij}}{\sum_{k \in W} p_{ik} K_i + L_i} \le a^{4^{r+1}}\, \frac{\sum_{g \in G_{ij}(W)} \pi(g)}{\sum_{g \in G(W)} \pi(g)}, \]

which gives the upper estimate in (3.4). Performing analogous calculations for (C_{ij} + D_{ij})/(1 − E_i), we obtain that (3.4) is proved for the case where the number of elements in L\W is equal to r + 1. The lemma is proved. □

Lemma 3.4. Let us be given a Markov chain on a phase space

\[ X = \bigcup_{i \in L} X_i, \qquad X_i \cap X_j = \varnothing \ (i \ne j), \]

with the estimates (3.3) for the transition probabilities. We denote by m_W(x) the mathematical expectation of the number of steps until the first entrance of ∪_{k∈W} X_k, calculated under the assumption that the initial state is x. If the number of points in L\W is equal to r, then for x ∈ X_i, i ∈ L\W we have

\[ a^{-4^r}\, \frac{\sum_{g \in G(W\cup\{i\})} \pi(g) + \sum_{j \in L\setminus W,\, j \ne i}\ \sum_{g \in G_{ij}(W\cup\{j\})} \pi(g)}{\sum_{g \in G(W)} \pi(g)} \le m_W(x) \le a^{4^r}\, \frac{\sum_{g \in G(W\cup\{i\})} \pi(g) + \sum_{j \in L\setminus W,\, j \ne i}\ \sum_{g \in G_{ij}(W\cup\{j\})} \pi(g)}{\sum_{g \in G(W)} \pi(g)}. \tag{3.5} \]
If L\W consists of only one point i, then in the sum in the numerator we have only one graph, the empty one; the product π(g) is, of course, taken to be equal to 1. If L\W consists of more than one point, then the graphs over which the sum is taken in the numerator can be described as follows: they are the graphs without cycles, consisting of (r − 1) arrows m → n, m ∈ L\W, n ∈ L, m ≠ n, and not containing chains of arrows leading from i into W. We shall denote by G(i ↛ W) the set of these graphs.
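At a = 1, with one-point sets X_i, the two-sided estimate (3.5) collapses to an exact identity for the mean number of steps to absorption. The sketch below (the chain is invented for illustration) evaluates the graph expression of Lemma 3.4 and compares it with the answer from the standard linear system for mean hitting times:

```python
import itertools
import numpy as np

# a chain on {0, 1, 2}; we compute the mean number of steps to reach W = {2}
P = np.array([[0.0, 0.5, 0.5],
              [0.3, 0.0, 0.7],
              [0.0, 0.0, 1.0]])
L = [0, 1, 2]
W = {2}
i = 0  # starting state

def chain_end(g, m):
    """Follow arrows from m until leaving g's domain; None if a cycle is hit."""
    seen = set()
    while m in g:
        if m in seen:
            return None
        seen.add(m)
        m = g[m]
    return m

def graphs(nodes):
    """Arrow maps g on `nodes` (g[m] != m, values in L) with no cycles."""
    for t in itertools.product(*[[n for n in L if n != m] for m in nodes]):
        g = dict(zip(nodes, t))
        if all(chain_end(g, m) is not None for m in nodes):
            yield g

def pi(g):
    return np.prod([P[m][n] for m, n in g.items()])

den = sum(pi(g) for g in graphs([m for m in L if m not in W]))
num = sum(pi(g) for g in graphs([m for m in L if m not in W and m != i]))
for j in (m for m in L if m not in W and m != i):
    num += sum(pi(g) for g in graphs([m for m in L if m not in W and m != j])
               if chain_end(g, i) == j)
m_graph = num / den

# exact mean absorption time via linear algebra: m = (I - Q)^{-1} 1
T = [m for m in L if m not in W]
Q = P[np.ix_(T, T)]
m_lin = np.linalg.solve(np.eye(len(T)) - Q, np.ones(len(T)))[T.index(i)]
assert abs(m_graph - m_lin) < 1e-9
```

For this chain both routes give 30/17 ≈ 1.7647; the graph formula is what survives, up to the factor a^{±4^r}, when the transition probabilities are only known up to the bounds (3.3).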
The proof will be carried out by induction again. First let r = 1, i.e., we consider the first exit from X_i. The smallest number of steps until exit is equal to 1; we have to add one to this if we hit X_i again in the first step; we have to add one more if the same happens in the second step, etc. Using the Markov property, we obtain

\[ m_W(x) = m_{L\setminus\{i\}}(x) = 1 + P(x, X_i) + \int_{X_i} P(x, dx_1)P(x_1, X_i) + \int_{X_i}\int_{X_i} P(x, dx_1)P(x_1, dx_2)P(x_2, X_i) + \cdots . \]
This expression is between 1/(1 − B_i) and 1/(1 − B̄_i), where B_i and B̄_i were introduced in the proof of Lemma 3.3. We obtain (3.5) for r = 1 from the estimates of 1 − B_i and 1 − B̄_i.

Now let (3.5) hold for all sets L\W with r elements and for all i ∈ L\W. We prove (3.5) for L\W consisting of r + 1 points. As in the proof of Lemma 3.3, we put F = ∪_{k∈L\W, k≠i} X_k. The smallest value of the first entrance time of ∪_{k∈W} X_k is the first exit time of X_i; if at this time we hit F, then we have to add the time spent in F; if after exit from F we hit X_i again, then we also have to add the time spent in X_i at this time, etc. Using the strong Markov property, we can write this in terms of the functions m_{L\{i}}(x), m_{W∪{i}}(y) and the measures q_{L\{i}}(x, ·), q_{W∪{i}}(y, ·):

\[
\begin{aligned}
m_W(x) = {}& m_{L\setminus\{i\}}(x) + \int_F q_{L\setminus\{i\}}(x, dy_1)\, m_{W\cup\{i\}}(y_1) \\
&+ \int_F q_{L\setminus\{i\}}(x, dy_1) \int_{X_i} q_{W\cup\{i\}}(y_1, dx_1)\, m_{L\setminus\{i\}}(x_1) \\
&+ \int_F q_{L\setminus\{i\}}(x, dy_1) \int_{X_i} q_{W\cup\{i\}}(y_1, dx_1) \int_F q_{L\setminus\{i\}}(x_1, dy_2)\, m_{W\cup\{i\}}(y_2) \\
&+ \int_F q_{L\setminus\{i\}}(x, dy_1) \int_{X_i} q_{W\cup\{i\}}(y_1, dx_1) \int_F q_{L\setminus\{i\}}(x_1, dy_2) \int_{X_i} q_{W\cup\{i\}}(y_2, dx_2)\, m_{L\setminus\{i\}}(x_2) + \cdots .
\end{aligned}
\]

We put

\[ M_i = \inf_{x \in X_i} m_{L\setminus\{i\}}(x), \qquad N_i = \inf_{x \in X_i} \int_F q_{L\setminus\{i\}}(x, dy)\, m_{W\cup\{i\}}(y). \]
The symbols M̄_i and N̄_i denote the corresponding suprema. Using this notation and E_i, Ē_i (introduced earlier), we obtain for x ∈ X_i that

\[ \frac{M_i + N_i}{1 - E_i} \le m_W(x) \le \frac{\bar M_i + \bar N_i}{1 - \bar E_i}. \]

We have already estimated the denominators in this formula in the proof of the preceding lemma, and we have already proved that M_i and M̄_i are between a^{-1}/Σ_{k≠i} p_{ik} and a/Σ_{k≠i} p_{ik}. To estimate N_i and N̄_i, we use the estimates of q_{L\{i}}(x, X_k) and the estimates of m_{W∪{i}}(y) for y ∈ X_k, which hold by the induction hypothesis. We obtain that N_i and N̄_i are between
\[ \sum_{\substack{k \in L\setminus W \\ k \ne i}} a^{\mp 4}\, \frac{p_{ik}}{\sum_{k' \ne i} p_{ik'}} \cdot a^{\mp 4^r}\, \frac{Q_{ik} + R_{ik}}{K_i}, \]
where K_i has the same meaning as in the proof of the preceding lemma, Q_{ik} is the sum of π(g) over the graphs belonging to G(W ∪ {i} ∪ {k}), and R_{ik} is the sum of π(g) over those graphs without cycles consisting of (r − 1) arrows m → n, m ∈ L\W, m ≠ i, n ∈ L, n ≠ m, in which the chain of arrows beginning at the point k does not lead to W. Bringing the fractions to a common denominator and making the necessary simplifications, we obtain that for x ∈ X_i, m_W(x) is between

\[ a^{\mp 4^{r+1}}\, \frac{K_i + \sum_{k \in L\setminus W,\, k \ne i} p_{ik} Q_{ik} + \sum_{k \in L\setminus W,\, k \ne i} p_{ik} R_{ik}}{\sum_{k \in W} p_{ik} K_i + L_i} \tag{3.6} \]
(the notation L_i was introduced in Lemma 3.3). Here in the numerator we have the sum of products π(g) which in fact has to appear in the numerator of formula (3.5): namely, K_i is the sum of π(g) over the (W ∪ {i})-graphs, the second term is the sum over those graphs without cycles, consisting of (r − 1) arrows m → n, m ∈ L\W, n ∈ L, n ≠ m, in which an arrow i → k, k ∉ W, goes from i and there is no arrow going from k, and the third term is the sum of π(g) over the same graphs in which the chain beginning at i consists of more than one arrow and ends outside W. This, together with the denominator already computed, gives the assertion of the lemma. In order not to get confused with the empty graph, we have to consider separately the passage from r = 1 to r = 2. In this case, L\W consists of two elements i and k, and K_i = p_{ki} + Σ_{j∈W} p_{kj}, Q_{ik} + R_{ik} = 1. In the numerator in (3.6) we have p_{ki} + Σ_{j∈W} p_{kj} + p_{ik}, so that the assertion of the lemma is true in this case as well. □
§4. The Problem of the Invariant Measure

By means of the results of the preceding two sections, we solve here the problem of rough asymptotics of the invariant measure of a diffusion process with small diffusion on a compact manifold. We shall use the following formula, which expresses, up to a factor, the invariant measure μ^ε of a diffusion process (X_t^ε, P_x^ε) in terms of the invariant measure ν^ε of the chain Z_n on ∂g (we recall that D = M, ∂D = ∅):

\[ \mu^\varepsilon(B) = \int_{\partial g} \nu^\varepsilon(dy)\, M_y^\varepsilon \int_0^{\tau_1} \chi_B(X_t^\varepsilon)\, dt \tag{4.1} \]
(cf. Khas'minskii [1]). It is clear that for the determination of the asymptotics of sums of products π(g) consisting of the numbers exp{−ε^{-2} Ṽ(K_i, K_j)} we need the quantities

\[ W(K_i) = \min_{g \in G\{i\}} \sum_{(m \to n) \in g} \tilde V(K_m, K_n) \tag{4.2} \]

(here Ṽ(K_m, K_n) = Ṽ_M(K_m, K_n) is the infimum of the values of the normalized action functional over curves connecting the mth compactum with the nth one without touching the others).

Lemma 4.1. The minimum (4.2) can also be written in the form

\[ W(K_i) = \min_{g \in G\{i\}} \sum_{(m \to n) \in g} V(K_m, K_n). \tag{4.3} \]
Proof. It is clear that the minimum in (4.3) is not greater than that in (4.2) (because the inequality V ≤ Ṽ holds for the corresponding terms). It remains to prove the reverse inequality. Let the minimum in (4.3) be attained for the graph g. If m → n is an arrow from g and V(K_m, K_n) = Ṽ(K_m, K_n), we leave m → n as it is. If, on the other hand,

\[ V(K_m, K_n) = \tilde V(K_m, K_{i_1}) + \cdots + \tilde V(K_{i_s}, K_n) \]

(cf. §2), we replace it by the arrows m → i_1, ..., i_s → n. Then the sum does not change, but the {i}-graph may cease to be an {i}-graph. First of all, it may turn out that i coincides with one of the intermediate points i_j, j = 1, ..., s, and in the new graph there is an arrow i → i_{j+1} or i → n; in this case we omit it. Then in the graph there may be a cycle containing the point n:

\[ n \to i_j \to i_{j+1} \to \cdots \to i_s \to n. \]

This cycle did not appear in g; therefore, one of the points i_j, i_{j+1}, ..., i_s was the origin of an arrow leading to some other point; in this case we open the cycle by omitting the new arrow (i_j → i_{j+1} or i_{j+1} → i_{j+2} or i_s → n). Finally, there may still be points i_j from which two arrows issue. In this case we omit the old arrows; then the sum does not increase and the graph thus constructed will be an {i}-graph. Looking over all arrows in this way, we arrive at a graph g̃ for which

\[ \sum_{(m \to n) \in \tilde g} \tilde V(K_m, K_n) \le \sum_{(m \to n) \in g} V(K_m, K_n). \qquad \square \]
Theorem 4.1. Suppose that the system ẋ_t = b(x_t) on a compact manifold M satisfies condition A (cf. §2). Let μ^ε be the normalized invariant measure of the diffusion process (X_t^ε, P_x^ε), and assume that the family of these processes satisfies the hypotheses of Theorem 3.2 of Ch. 5. Then for any γ > 0 there exists ρ₁ > 0 (which can be chosen arbitrarily small) such that the μ^ε-measure of the ρ₁-neighborhood g_i of the compactum K_i is between

\[ \exp\{-\varepsilon^{-2}(W(K_i) - \min_j W(K_j) \pm \gamma)\} \]

for sufficiently small ε, where the W(K_i) are constants defined by formulas (4.2) and (4.3).
Proof. In accordance with Lemmas 2.1, 1.7 and 1.8, we choose small positive ρ₁ < ρ₂ < ρ₀ such that estimates (2.3), (1.3) and (1.4) are satisfied for small ε with γ/4l replacing γ. By Lemma 3.2, the values of the normalized invariant measure ν^ε of the chain Z_n lie between

\[ \exp\Big\{-\varepsilon^{-2}\Big(W(K_i) - \min_j W(K_j) \pm \tfrac{2l-2}{4l}\,\gamma\Big)\Big\}. \]

For the estimation of μ^ε(g_i) we use formula (4.1):

\[ \mu^\varepsilon(g_i) = \int_{\partial g} \nu^\varepsilon(dy)\, M_y^\varepsilon \int_0^{\tau_1} \chi_{g_i}(X_t^\varepsilon)\, dt = \int_{\partial g_i} \nu^\varepsilon(dy)\, M_y^\varepsilon \int_0^{\tau_1} \chi_{g_i}(X_t^\varepsilon)\, dt. \]

For small ε this does not exceed exp{−ε^{-2}(W(K_i) − min_j W(K_j) − [(2l−1)/4l]γ)} and is not less than exp{−ε^{-2}(W(K_i) − min_j W(K_j) + [(2l−1)/4l]γ)}. For the compactum at which min_j W(K_j) is attained, the lower bound equals exp{−ε^{-2}[(2l−1)/4l]γ}, from which we obtain

\[ \mu^\varepsilon(M) \ge \exp\Big\{-\varepsilon^{-2}\,\tfrac{2l-1}{4l}\,\gamma\Big\}. \]
In order to estimate μ^ε(M) from above, we use formula (4.1) again:

\[ \mu^\varepsilon(M) = \int_{\partial g} \nu^\varepsilon(dy)\, M_y^\varepsilon \tau_1 \le \int_{\partial g} \nu^\varepsilon(dy)\,\big[M_y^\varepsilon \sigma_0 + M_y^\varepsilon M_{X^\varepsilon_{\sigma_0}}^\varepsilon \tau_1\big] \le \sup_{y \in \partial g} M_y^\varepsilon \sigma_0 + \sup_x M_x^\varepsilon \tau_1, \]

where the second supremum is taken over the possible positions of X^ε_{σ₀}. The first mean does not exceed exp{ε^{-2}(γ/4l)} by virtue of (1.3), and the second is not greater than some constant by virtue of the corollary to Lemma 1.9. Normalizing μ^ε by dividing by μ^ε(M), we obtain the assertion of the theorem. □
If b(x) has a potential, i.e., it can be represented in the form b(x) = −∇U(x), where ∇ is the operator of taking the gradient in the metric ds² = Σ a_{ij}(x) dx^i dx^j, then we can write the following explicit formula for the density of the invariant measure: m^ε(x) = C_ε exp{−2ε^{-2} U(x)}, where C_ε is a normalizing factor. This can be verified by substitution into the forward Kolmogorov equation; for the case where (a_{ij}) is the identity matrix, cf. formula (4.14), Ch. 4. This representation of the invariant measure reduces the study of the limit behavior of the invariant measure to the study of the asymptotics of a Laplace type integral (cf. Kolmogorov [1]). Theorem 4.1 gives us an opportunity to study the limit behavior of the invariant measure when no potential exists, and therefore the explicit form of the solution of the Kolmogorov equation cannot be used. However, the results of this theorem are, of course, less sharp. It follows from Theorem 4.1, in particular, that as ε → 0, the measure μ^ε is concentrated in a small neighborhood of the union of those K_i at which min_j W(K_j) is attained. This result was obtained in Wentzell and Freidlin [2], [4]. In some cases, the character of the limit behavior of μ^ε can be given more accurately.
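The verification that m^ε(x) = C_ε exp{−2ε^{-2}U(x)} solves the forward Kolmogorov equation can be sketched in one dimension with (a_{ij}) the identity: the probability flux (ε²/2)m′ − bm vanishes identically. A numerical check, with an arbitrary smooth potential U invented for the test:

```python
import numpy as np

eps = 0.5
xs = np.linspace(-3.0, 3.0, 2001)
h = xs[1] - xs[0]
U = np.sin(xs) + xs**2 / 4        # an arbitrary smooth potential (assumption)
b = -(np.cos(xs) + xs / 2)        # b = -U', computed analytically
m = np.exp(-2.0 * U / eps**2)     # candidate invariant density, up to a constant

# the stationary forward Kolmogorov (Fokker-Planck) equation for the generator
# b d/dx + (eps^2/2) d^2/dx^2 says the flux (eps^2/2) m' - b m is constant;
# for this m it is identically zero (up to finite-difference error)
flux = 0.5 * eps**2 * np.gradient(m, h) - b * m
assert np.max(np.abs(flux)) < 1e-2 * np.max(m)
```

The residual is pure discretization error from `np.gradient`; with the derivative of m taken analytically the flux cancels exactly, which is the substitution argument referred to in the text.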
Theorem 4.2. Suppose that the hypotheses of the preceding theorem are satisfied, min_i W(K_i) is attained at a unique K_{i_0}, and there exists only one normalized invariant measure μ₀ of the dynamical system ẋ_t = b(x_t) concentrated in K_{i_0}. Then μ^ε converges weakly to μ₀ as ε → 0.

The proof is standard: of the facts related to our concrete family of processes (X_t^ε, P_x^ε), we have to use only that M_x^ε f(X_t^ε) → f(x_t(x)) uniformly in x as ε → 0 for any continuous function f (the corollary to Theorem 1.2 of Ch. 2).

Results concerning the limit behavior of μ^ε in the case where more than one invariant measure of the unperturbed dynamical system is concentrated in K_{i_0} have been obtained in two distinct situations. Let K_{i_0} be a smooth submanifold of M.
Kifer [2] considered the case where the dynamical system on K_{i_0} is a transitive Anosov system. (The class of Anosov systems is characterized by the condition that the tangent fibering can be represented as a sum of three invariant fiberings; in the first one the tangent vectors undergo an exponential expansion as translated along trajectories of the system, in the second one they undergo an exponential contraction, and the third one is a one-dimensional fibering induced by the vector b(x) at every point. These systems form a sufficiently large set in the space of all dynamical systems.) This case is close to that of a unique invariant measure. More precisely, in the case of Anosov systems, from the infinite set of normalized invariant measures we can select one, say μ*, connected in a certain manner with the smooth structure on the manifold under consideration; this measure has the property that the following condition of exponentially fast mixing is satisfied for smooth functions:

\[ \Big| \int \varphi_1(x_{t_1}(x))\, \varphi_2(x_{t_2}(x))\, \mu^*(dx) - \int \varphi_1(x)\, \mu^*(dx) \int \varphi_2(x)\, \mu^*(dx) \Big| \le \mathrm{const} \cdot \|\varphi_1\|_1 \|\varphi_2\|_1\, e^{-k|t_1 - t_2|}, \]

where ‖·‖₁ is the norm in the space C^{(1)} of continuously differentiable functions. The measure μ* appears in various limit problems concerning dynamical systems connected with smoothness. It also turns out to be the limit of the invariant measures μ^ε of the perturbed system, independently of the concrete characteristics of the perturbations.

The second class of examples considered relates to the case where the manifold K_{i_0} can be fibered into invariant manifolds on each of which the invariant measure of the unperturbed system is unique. The limit behavior of μ^ε depends on the structure of b(x) outside K_{i_0} and the concrete form of the perturbations; it can be determined if in the process X_t^ε we select a "fast" and a "slow" motion and use a technique connected with the averaging principle
(cf. §9, Ch. 7). In Khas'minskii's paper [3] the limit of μ^ε is found in the following special case: K_{i_0} coincides with the whole manifold M, which is the two-dimensional torus, and in natural coordinates the system has the form ẋ₁ = b¹(x), ẋ₂ = γ b¹(x), b¹(x) > 0. If γ is irrational, then the invariant measure of the dynamical system is unique: it can be given by the density C · b¹(x)^{-1}. If γ is rational, then the torus can be fibered into invariant circles with an invariant measure on each of them. The density of the limit measure can be calculated, and it does not coincide with C · b¹(x)^{-1} in general.

We introduce a definition helping to understand the character of the distribution of μ^ε among the neighborhoods of the K_i. We shall say that a set N ⊂ M is stable if for any x ∈ N, y ∉ N we have V(x, y) > 0. Similarly to the equivalence of points, the property of stability depends only on the structure of the system ẋ_t = b(x_t). The example illustrated in Fig. 11 shows that there may exist a stable compact set not containing any stable ω-limit
set (i.e., one such that any trajectory of the dynamical system beginning near the set does not leave a small neighborhood of it). In the example in Fig. 12, K₂ and K₄ are stable and K₁ and K₃ are unstable.

Lemma 4.2. If a compactum K_i is unstable, then there exists a stable compactum K_j such that V(K_i, K_j) = 0.

Proof. There exists x ∉ K_i such that V(K_i, x) = 0. We issue a trajectory x_t(x), t ≥ 0, from x. It leads us to its ω-limit set, contained in one of the compacta K_j; furthermore, V(K_i, K_j) ≤ V(K_i, x) + V(x, K_j) = 0. The compactum K_j does not coincide with K_i, since otherwise x would have to be in K_i; if K_j is unstable, in the same way we pass from it to another compactum, etc. Finally we arrive at a stable compactum. □

Lemma 4.3. (a)
Among the {i}-graphs for which the minimum (4.3) is attained there is one in which, from the index m (m ≠ i) of each unstable compactum, an arrow m → j issues with V(K_m, K_j) = 0 and with K_j stable.

(b) For a stable compactum K_i, the value W(K_i) can be calculated according to (4.3), considering graphs on the set of indices of only stable compacta.

(c) If K_j is an unstable compactum, then

\[ W(K_j) = \min \big[ W(K_i) + V(K_i, K_j) \big], \tag{4.4} \]

where the minimum is taken over all stable compacta K_i.
Proof. (a) For an {i}-graph at which the minimum (4.3) is attained we consider all m for which assertion (a) is not satisfied. Among these there are ones into which no arrow leads from the index of an unstable compactum.

If no arrow ends in m, we replace m → n by m → j with V(K_m, K_j) = 0. Then the {i}-graph remains an {i}-graph and the sum of the values of V corresponding to the arrows does not increase.

If the arrows ending in m are s₁ → m, ..., s_t → m, and K_{s₁}, ..., K_{s_t} are stable, we also replace m → n by m → j, V(K_m, K_j) = 0. If no cycle is formed, then we obtain an {i}-graph with a sum that is not larger. However, a cycle m → j → ⋯ → s_k → m may be formed. Then we replace s_k → m by s_k → n; we have

\[ V(K_{s_k}, K_n) \le V(K_{s_k}, K_m) + V(K_m, K_n), \]

so that the sum of the values of V corresponding to the arrows does not increase. Repeating this operation, we get rid of all "bad" arrows.

(b) That the minimum over {i}-graphs on the set of indices of stable compacta is not greater than the former minimum follows from (a). The reverse inequality follows from the fact that every {i}-graph on the set of indices of stable compacta can be completed to an {i}-graph on the whole set {1, ..., l} by adding arrows with vanishing V beginning at the indices of unstable compacta.

(c) For any i ≠ j we have W(K_j) ≤ W(K_i) + V(K_i, K_j). Indeed, on the right side we have the minimum of Σ V(K_m, K_n) over graphs in which every point is the initial point of exactly one arrow and there is exactly one cycle, i → j → ⋯ → i. Removing the arrow beginning at j, we obtain a {j}-graph without increasing the sum. If K_j is an unstable compactum, then we choose a stable compactum K_s such that V(K_j, K_s) = 0, and a {j}-graph g at which the minimum defining W(K_j) is attained and in which the indices of unstable compacta are initial points only of arrows with vanishing V. We add the arrow j → s to this graph. Then a cycle j → s → ⋯ → j is formed. In this cycle we choose the last index i of a stable compactum before j. To the arrow i → k issuing from i (k may be equal to j) there corresponds V(K_i, K_k) ≥ V(K_i, K_j). We throw this arrow out and obtain an {i}-graph g′ for which Σ_{(m→n)∈g′} V(K_m, K_n) ≤ W(K_j) − V(K_i, K_j). This implies that W(K_j) ≥ min[W(K_i) + V(K_i, K_j)], the minimum being taken over all stable compacta K_i. □

Formula (4.4) implies, among other things, that the minimum of the W(K_i) may be attained only at stable compacta.
We consider the example of a dynamical system on a sphere whose trajectories, illustrated in the plane, have the form depicted in Fig. 12. Of course, the system has another singular point, not shown in the figure, which is unstable; we have to introduce the compactum K₅ consisting of this point. If the values V(K_i, K_j), 1 ≤ i, j ≤ 4, are given by the matrix (2.1), then the values W(K_i) for the stable compacta are W(K₂) = 6 and W(K₃) = 9. We obtain that as ε → 0, the invariant measure μ^ε is concentrated in a small neighborhood of the limit cycle K₂ and converges weakly to the unique invariant measure, concentrated in K₂, of the system ẋ_t = b(x_t) (it is given by a density, with respect to arc length, inversely proportional to the length of the vector b(x)).

Theorem 4.3. For x ∈ M let us set
\[ W(x) = \min_i \big[ W(K_i) + V(K_i, x) \big], \tag{4.5} \]

where the minimum can be taken over either all compacta or only stable compacta. Let γ be an arbitrary positive number. For any sufficiently small neighborhood 𝓔(x) of x there exists ε₀ > 0 such that for ε < ε₀ we have

\[ \exp\{-\varepsilon^{-2}(W(x) - \min_i W(K_i) + \gamma)\} \le \mu^\varepsilon(\mathcal{E}(x)) \le \exp\{-\varepsilon^{-2}(W(x) - \min_i W(K_i) - \gamma)\}. \]
Proof. For a point x not belonging to any of the compacta K_i, we use the following device: to the compacta K₁, ..., K_l we add another compactum {x}. The system of disjoint compacta thus obtained continues to satisfy condition A of §2, and we can apply Theorem 4.1. The compactum {x} is unstable. Therefore, the minimum of the values of W is attained at a compactum other than {x}, and W({x}) can be calculated according to (4.5). For a point x belonging to some K_i, we have W(x) = W(K_i). The value of μ^ε(𝓔(x)) can be estimated from above by μ^ε(g_i), and a lower estimate can be obtained by means of Lemma 1.8. □
Theorem 4.3 means that the asymptotics as ε → 0 of the invariant measure μ^ε is given by the action function ε^{-2}[W(x) − min_i W(K_i)]. We consider the one-dimensional case, where everything can be calculated
to the end. Let the manifold M be the interval from 0 to 6, closed into a circle. Let us consider a family of diffusion processes on it with infinitesimal generators b(x)(d/dx) + (ε²/2)(d²/dx²), where b(x) = −U′(x) and the graph of U(x) is given in Fig. 14. The function U(x) has local extrema at the points 0, 1, 2, 3, 4, 5, 6, and its values at these points are 7, 1, 5, 0, 10, 2, 11, respectively. (This is not the case, considered in §3, Ch. 4, of a potential field b(x), since U is not continuous on the circle M.) There are six compacta containing ω-limit sets of ẋ_t = b(x_t): the points 0 (which is the same as 6), 1, 2, 3, 4 and 5; the points 1, 3 and 5 are stable.

Figure 14.

We can determine the values of V(1, x) for 0 ≤ x ≤ 2 by solving problem R₁ for the equation

\[ b(x)\, V_x'(1, x) + \tfrac{1}{2}\big(V_x'(1, x)\big)^2 = 0. \]

We obtain V(1, x) = 2[U(x) − U(1)]. Analogously, V(3, x) = 2[U(x) − U(3)] for 2 ≤ x ≤ 4 and V(5, x) = 2[U(x) − U(5)] for 4 ≤ x ≤ 6 (in all three cases it can be verified separately that for curves leading to a point x by a route different from the shortest one, the value of the functional is greater). Moreover, we find that V(1, 3) = V(1, 2) = 8, V(1, 5) = V(1, 0) = 12, V(3, 1) = V(3, 2) = 10, V(3, 5) = V(3, 4) = 20, V(5, 1) = V(5, 0) = V(5, 6) = 18, V(5, 3) = V(5, 4) = 16.

Figure 15.
On the set {1, 3, 5} we consider {i}-graphs and from them we select those which minimize the sums (4.3). For i = 1 this turns out to be the graph 5 → 3, 3 → 1; consequently W(1) = 26. For i = 3 the sum is minimized by the graph 1 → 3, 5 → 3, and W(3) = 24. The value W(5) = 22 is attained for the graph 3 → 1, 1 → 5. The function W(x) can be expressed in the following way:
\[ W(x) = \begin{cases} 24 + 2U(x) & \text{for } 0 \le x \le 3, \\ (24 + 2U(x)) \wedge 38 & \text{for } 3 \le x \le 4, \\ 18 + 2U(x) & \text{for } 4 \le x \le 5, \\ (18 + 2U(x)) \wedge 38 & \text{for } 5 \le x \le 6. \end{cases} \]
Subtracting its minimum W(5) = 22 from W, we obtain the normalized action function for the invariant measure μ^ε as ε → 0; its graph is given in Fig. 15 (we recall that the normalizing coefficient here is ε^{-2}). The reader may conjecture that as ε → 0, the invariant measure of a one-dimensional diffusion process with a small diffusion is concentrated at the bottom of the potential well with the highest walls; this conjecture is wrong.
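The minimization over {i}-graphs in this example is small enough to check by brute force. A sketch using the values of V listed above (the function and variable names are ours):

```python
import itertools

# V(i, j) for the stable points 1, 3, 5, as computed in the text
V = {(1, 3): 8, (1, 5): 12, (3, 1): 10, (3, 5): 20, (5, 1): 18, (5, 3): 16}
S = [1, 3, 5]

def leads_to(g, m, i):
    # the chain of arrows from m must reach i without cycling
    seen = set()
    while m != i:
        if m in seen:
            return False
        seen.add(m)
        m = g[m]
    return True

def W(i):
    """min over {i}-graphs g on S of the sum of V over the arrows of g"""
    rest = [m for m in S if m != i]
    best = float("inf")
    for t in itertools.product(*[[n for n in S if n != m] for m in rest]):
        g = dict(zip(rest, t))
        if all(leads_to(g, m, i) for m in rest):
            best = min(best, sum(V[m, n] for m, n in g.items()))
    return best

print({i: W(i) for i in S})  # {1: 26, 3: 24, 5: 22}
```

The minimum is attained at i = 5, in agreement with the concentration of μ^ε near the point 5 rather than at the deepest well (the point 3).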
§5. The Problem of Exit from a Domain

In this section we no longer assume that the manifold M is compact. Instead, we assume that a domain D with smooth boundary and compact closure is given on it, and that condition A of §2 is satisfied.
We consider graphs on the set of symbols {K₁, ..., K_l, x, y, ∂D}. For x ∈ D, y ∈ ∂D we put

\[ W_D(x, y) = \min_{g \in G_{x,y}\{y,\, \partial D\}} \sum_{(\alpha \to \beta) \in g} \tilde V_D(\alpha, \beta). \tag{5.1} \]

(We recall that G_{x,y}{y, ∂D} is the set of all graphs consisting of (l + 1) arrows emanating from the points K₁, ..., K_l, x and such that for each of these points there exists a chain of arrows leading from it to y or ∂D, and for the initial point x this chain ends at y.)
Lemma 5.1. The minimum (5.1) can also be written in the form

\[ W_D(x, y) = \min_{g \in G_{x,y}\{y,\, \partial D\}} \sum_{(\alpha \to \beta) \in g} V_D(\alpha, \beta). \tag{5.2} \]

The minimum value of W_D(x, y) over all y ∈ ∂D does not depend on x and is equal to

\[ W_D = \min_{g \in G\{\partial D\}} \sum_{(\alpha \to \beta) \in g} V_D(\alpha, \beta), \tag{5.3} \]

where either we consider {∂D}-graphs on the set {K₁, ..., K_l, ∂D} or from this set we delete the symbols K_i denoting unstable compacta. The minima (5.1), (5.2) can also be written in the form

\[ W_D(x, y) = [\tilde V_D(x, y) + W_D] \wedge \min_i [\tilde V_D(x, K_i) + W_D(K_i, y)] = [V_D(x, y) + W_D] \wedge \min_i [V_D(x, K_i) + W_D(K_i, y)], \tag{5.4} \]

where W_D(K_i, y) is defined as one of the following minima, in which there occur graphs on the set {K₁, ..., K_l, y, ∂D}:

\[ W_D(K_i, y) = \min_{g \in G_{K_i,y}\{y,\, \partial D\}} \sum_{(\alpha \to \beta) \in g} \tilde V_D(\alpha, \beta) = \min_{g \in G_{K_i,y}\{y,\, \partial D\}} \sum_{(\alpha \to \beta) \in g} V_D(\alpha, \beta). \tag{5.5} \]

In formula (5.3), V_D(α, β) can also be replaced by Ṽ_D(α, β) (here α = K₁, ..., K_l and β is one of the K_i or ∂D).
The proof is analogous to those of Lemmas 4.1 and 4.3. Lemma 4.2 is replaced by the following lemma, which can be proved in the same way:

Lemma 5.2. If α is an unstable compactum K_i or a point x ∈ D\∪_i K_i, then either there exists a stable compactum K_j such that V_D(α, K_j) = 0, or V_D(α, ∂D) = 0.
Theorem 5.1. Let τ^ε be the time of first exit of the process X_t^ε from D. For any compact subset F of D, any γ > 0 and any δ > 0 there exist δ₀, 0 < δ₀ ≤ δ, and ε₀ > 0 such that for all ε ≤ ε₀, x ∈ F and y ∈ ∂D we have

\[ \exp\{-\varepsilon^{-2}(W_D(x, y) - W_D + \gamma)\} \le P_x\{X^\varepsilon_{\tau^\varepsilon} \in \mathcal{O}_{\delta_0}(y)\} \le \exp\{-\varepsilon^{-2}(W_D(x, y) - W_D - \gamma)\}, \tag{5.6} \]

where 𝒪_{δ₀}(y) is the δ₀-neighborhood of y on ∂D, W_D is defined by formula (5.3) and W_D(x, y) by formulas (5.1), (5.2) or (5.5). In other words, for a process beginning at the point x, the asymptotics as ε → 0 of the distribution of the exit position on the boundary is given by the action function ε^{-2}(W_D(x, y) − W_D), uniformly for all initial points strictly inside D.
Proof. We put γ′ = γ · 4^{-l-1} and choose a corresponding ρ₀ according to Lemmas 2.1 and 2.2. We choose a positive ρ₂ smaller than ρ₀ and smaller than the distance between F and ∂D. According to the same lemmas, we choose a positive ρ₁ < ρ₂ and use the construction described in §2, involving the chain Z_n. For x ∈ ∪_i G_i we use Lemma 3.3 with L = {K₁, ..., K_l, y, ∂D}, W = {y, ∂D} and the following sets X_α, α ∈ L: G₁, ..., G_l, ∂D ∩ 𝒪_{δ₀}(y), ∂D\𝒪_{δ₀}(y) (beginning with n = 1, Z_n ∈ G_i implies Z_n ∈ ∂g_i). The estimates for the probabilities P(x, X_α) for x ∈ G_i are given by formulas (2.3)-(2.5); we have p_{αβ} = exp{−ε^{-2} Ṽ_D(α, β)} and a = exp{ε^{-2}γ′}. The sum Σ_{g∈G_{K_i,y}{y,∂D}} π(g) is equivalent to a positive constant N multiplied by exp{−ε^{-2} W_D(K_i, y)}. (N is equal to the number of graphs g ∈ G_{K_i,y}{y, ∂D} at which the minimum of Σ_{(α→β)∈g} Ṽ_D(α, β) is attained. The denominator in (3.4) is equivalent to a positive constant multiplied by exp{−ε^{-2} W_D}.) Taking into account that for x ∈ G_i, W_D(x, y) differs from W_D(K_i, y) by not more than γ′, this implies the assertion of the theorem for x ∈ ∪_i G_i.

If x ∈ F\∪_i G_i, we use the strong Markov property with respect to the Markov time τ₁:

\[ P_x\{X^\varepsilon_{\tau^\varepsilon} \in \mathcal{O}_{\delta_0}(y)\} = P_x\{X^\varepsilon_{\tau_1} \in \mathcal{O}_{\delta_0}(y)\} + \sum_i M_x\big\{X^\varepsilon_{\tau_1} \in \partial g_i;\ P_{X^\varepsilon_{\tau_1}}\{X^\varepsilon_{\tau^\varepsilon} \in \mathcal{O}_{\delta_0}(y)\}\big\}. \tag{5.7} \]

According to (2.10), the first probability is between exp{−ε^{-2}(Ṽ_D(x, y) ± γ′)}, and according to what has already been proved, the probability under the sign of mathematical expectation is between exp{−ε^{-2}(W_D(K_i, y) − W_D ± (4^l + 1)γ′)}. Using estimate (2.8), we obtain that the ith mathematical expectation in (5.7) falls between exp{−ε^{-2}(Ṽ_D(x, K_i) + W_D(K_i, y) − W_D ± (4^l + 2)γ′)}, and the whole sum (5.7) is between exp{−ε^{-2}(W_D(x, y) − W_D ± (4^l + 2)γ′)}, where W_D(x, y) is given by the first of formulas (5.4). This proves the theorem. □
Theorem 5.1 enables us to establish, in particular, the most probable place of exit of X_t^ε to the boundary for small ε.

Theorem 5.2 (Wentzell and Freidlin [3], [4]). For every j = 1, ..., l let Y_j be the set of points y ∈ ∂D at which the minimum of V_D(K_j, y) is attained. Let the point x be such that the trajectory x_t(x), t ≥ 0, of the dynamical system issued from x does not go out of D and is attracted to K_i. From the {∂D}-graphs on the set {K₁, ..., K_l, ∂D} we select those at which the minimum (5.3) is attained. In each of these graphs we consider the chain of arrows leading from K_i into ∂D; let the last arrow in the chain be K_j → ∂D. We denote by M(i) the set of all such j in all selected graphs. Then with probability converging to 1 as ε → 0, the first exit to the boundary of the trajectory of X_t^ε beginning at x takes place in a small neighborhood of the set ∪_{j∈M(i)} Y_j.

The assertion remains valid if all the V_D, including those in formula (5.3), are replaced by Ṽ_D.
We return to the example illustrated in Fig. 12. We reproduce this figure here (Fig. 16). Besides the V_D(K_i, K_j), let V_D(K₂, ∂D) = 8, V_D(K₃, ∂D) = 2, V_D(K₄, ∂D) = 1 be given; V_D(K₁, ∂D) is necessarily equal to +∞. On ∂D we single out the sets Y₂, Y₃ and Y₄. Now there will be two {∂D}-graphs minimizing Σ_{(α→β)∈g} V_D(α, β): the first one consists of the arrows K₁ → K₂, K₂ → ∂D, K₃ → ∂D and K₄ → K₂, and the second one is the same with K₄ → K₃ replacing K₄ → K₂. Consequently, M(1) = M(2) = {2}, M(3) = {3}, and M(4) = {2, 3}. The trajectories of the dynamical system emanating from a point x in the left-hand part of D (to the left of the separatrices ending at the point K₄) are attracted to the cycle K₂, with the exception of the unstable equilibrium position K₁. The points of the right-hand part are attracted to K₃ and the points on the separating line to K₄. Consequently, for small ε, from points of the left half of D, the process X_t^ε goes out to ∂D in a small neighborhood of

Figure 16.
Y₂, from points of the right half it hits ∂D near Y₃, and from points of the separating line near Y₂ or Y₃. If we increase the V_D(K_i, ∂D) so that V_D(K₂, ∂D) = 16, V_D(K₃, ∂D) = 10, V_D(K₄, ∂D) = 9, then again there are two {∂D}-graphs minimizing the sum, with K₄ → K₂ or K₄ → K₃; for all i we have M(i) = {3}. Consequently, for small ε, the exit to the boundary from all points of the domain will take place near Y₃.
Now we turn to the problem of the time spent by the process X_t^ε in D until exit to the boundary. We consider graphs on the set {K_1, …, K_l, x, ∂D}. We put

    M_D(x) = min_{g∈G(x↛∂D)} Σ_{(α→β)∈g} V_D(α, β).   (5.8)

The notation G(α ↛ W) was introduced in §3 after the formulation of Lemma 3.4.

Lemma 5.3. The minimum (5.8) can also be written as

    M_D(x) = min_{g∈G(x↛∂D)} Σ_{(α→β)∈g} V_D(α, β);   (5.9)

    M_D(x) = W_D ∧ min_i [V_D(x, K_i) + M_D(K_i)],   (5.10)

where M_D(K_i) is defined by the equality

    M_D(K_i) = min_{g∈G(K_i↛∂D)} Σ_{(α→β)∈g} V_D(α, β),   (5.11)

in which the minimum is taken over graphs on the set {K_1, …, K_l, ∂D} (and W_D is defined by formula (5.3)). In determining the minima (5.9) or (5.11), (5.10), one can omit all unstable compacta K_i.
The proof is analogous to that of Lemmas 4.1 and 4.3.

Theorem 5.3. We have

    lim_{ε→0} ε² ln M_x τ^ε = W_D − M_D(x)   (5.12)

uniformly in x belonging to any compact subset F of D.
Proof. We choose γ', ρ_0, ρ_1, ρ_2 as in the proof of the preceding theorem, but with the additional condition that the mean exit time from the ρ_0-neighborhoods of the K_i does not exceed exp{ε⁻²γ'} (cf. Lemma 1.7). We consider the Markov times τ_0 (= 0), τ_1, τ_2, … and the chain Z_n = X^ε_{τ_n}. We denote by ν the index of the step at which Z_n first exits to ∂D, i.e., the smallest n for which τ^ε = τ_n. Using the strong Markov property, we can write

    M_x τ^ε = Σ_{n=0}^∞ M_x{ν > n; M_{Z_n} τ_1}.

Lemmas 1.7, 1.8 and 1.9 yield that each summand M_{Z_n} τ_1 in this sum does not exceed 2 exp{ε⁻²γ'} but is greater than exp{−ε⁻²γ'} (for small ε). Hence, up to a factor lying between exp{−ε⁻²γ'} and 2 exp{ε⁻²γ'}, M_x τ^ε coincides with Σ_{n=0}^∞ P_x{Z_n ∉ ∂D} = M_x ν. This mathematical expectation can be estimated by means of Lemma 3.4.

First, for x ∈ ∪_i G_i we obtain, using estimates (2.3)–(2.5), that for small ε

    exp{ε⁻²(W_D − M_D(K_i) − (4l + 1)γ')} ≤ M_x τ^ε ≤ 2 exp{ε⁻²(W_D − M_D(K_i) + (4l + 1)γ')}.   (5.13)

Then for an initial point x ∈ F\∪_i G_i we obtain

    M_x τ^ε = M_x τ_1 + Σ_{i=1}^l M_x{Z_1 ∈ ∂g_i; M_{X_{τ_1}} τ^ε}.

Taking account of the inequality M_x τ_1 ≤ 2 exp{ε⁻²γ'} and estimates (2.8)–(2.10) and (5.13), we obtain that M_x τ^ε lies between exp{ε⁻²(W_D − M_D(x) − γ)} and exp{ε⁻²(W_D − M_D(x) + γ)}. This is true for x ∈ ∪_i G_i as well. Since γ > 0 was arbitrarily small, we obtain the assertion of the theorem.
We return to the example considered above (with V_D(K_i, K_j) given by the matrix (2.1) and V_D(K_2, ∂D) = 8, V_D(K_3, ∂D) = 2, V_D(K_4, ∂D) = 1). We calculate the asymptotics of the mathematical expectation of the time τ^ε of exit to the boundary for trajectories beginning at the stable equilibrium position K_3. We find that W_D = 10 (the minimum (5.3) is attained at the two graphs K_1 → K_2, K_2 → ∂D, K_3 → ∂D, K_4 → K_2 or K_4 → K_3) and M_D(K_3) = 6 (the minimum (5.11) is attained at the graphs K_1 → K_2, K_3 → K_4, K_4 → K_2 and K_1 → K_2, K_3 → K_2, K_4 → K_2). Hence the mathematical expectation of the exit time is logarithmically equivalent to exp{ε⁻²(W_D − M_D(K_3))} = exp{4ε⁻²}.

Considering Z_n, we can understand why we obtain this average exit time. Beginning near K_3, the chain Z_n makes a number of order exp{ε⁻²V_D(K_3, ∂D)} = exp{2ε⁻²} of steps on ∂g_3, spending an amount of time of the same order, with probability close to 1 for small ε. After this, with probability close to 1, it exits to ∂D, and with probability of order

    exp{−ε⁻²(V_D(K_3, K_2) − V_D(K_3, ∂D))} = exp{−4ε⁻²},
it passes to the stable cycle K_2 (it may be delayed for a relatively small number of steps near the unstable equilibrium position K_4). After this has taken place, the chain Z_n performs, with overwhelming probability, transitions within the limits of ∂g_2 over a time of order exp{ε⁻²V_D(K_2, ∂D)} = exp{8ε⁻²}, visiting ∂g_1 approximately exp{ε⁻²V_D(K_2, K_1)} = exp{ε⁻²} times less often. After this it exits to the boundary. Hence a mathematical expectation of order exp{4ε⁻²} arises due to the less likely values, of probability of order exp{−4ε⁻²}, which are of order exp{8ε⁻²}.

We recall that in the case of a domain attracted to one stable equilibrium position, the average exit time M_x τ^ε has the same order as the boundaries of the range of the most probable values of τ^ε (Theorem 4.2 of Ch. 4). In particular, any quantile of the distribution of τ^ε is logarithmically equivalent to the average. Our example shows that this is no longer so in general when there are several compacta K_i containing ω-limit sets in the domain D; the mathematical expectation may tend to infinity essentially faster than the median and the quartiles. This seriously restricts the value of the above theorem as a result characterizing the limit behavior of the distribution of τ^ε.
We mention the corresponding result formulated in the language of differential equations.

Theorem 5.4. Let g(x) be a positive continuous function on D ∪ ∂D and let v^ε(x) be the solution of the equation L^ε v^ε(x) = −g(x) in D with vanishing boundary conditions on ∂D. Then v^ε(x) is logarithmically equivalent to

    exp{ε⁻²(W_D − M_D(x))}

uniformly in x belonging to any compact subset of D as ε → 0.
§6. Decomposition into Cycles. Sublimit Distributions

In the problems to which this chapter is devoted there are two large parameters: ε⁻² and t, the time over which the perturbed dynamical system is considered. It is natural to study what happens when the convergence of these parameters to infinity is coordinated in one way or another. We shall be interested in the limit behavior of the measures P_x{X_t^ε ∈ Γ}; we restrict ourselves to the case of a compact manifold (as in §4). The simplest case is where first ε⁻² goes to infinity and then t does. Then everything is determined by the behavior of the unperturbed dynamical system. It is clear that lim_{t→∞} lim_{ε→0} P_x{X_t^ε ∈ Γ} = 1 if the open set Γ contains the whole
ω-limit set of the trajectory x_t(x) beginning at the point x_0(x) = x. This limit is equal to zero if Γ is at a positive distance from the ω-limit set. In §4 we considered the case where first t goes to infinity and then ε⁻² does. Theorem 4.1 gives an opportunity to establish that

    lim_{ε→0} lim_{t→∞} P_x{X_t^ε ∈ Γ} = 1

for open sets Γ containing all compacta K_i at which the minimum of W(K_i) is attained. In the case of general position this minimum is attained at one compactum. If on this compactum there is concentrated a unique normalized invariant measure μ_0 of the dynamical system, then

    lim_{ε→0} lim_{t→∞} P_x{X_t^ε ∈ Γ} = μ_0(Γ)

for all Γ with boundary of μ_0-measure zero.

We study the behavior of X_t^ε on time intervals of length t(ε⁻²), where t(ε⁻²) is a function monotone increasing with increasing ε⁻². It is clear that if t(ε⁻²) increases sufficiently slowly, then over time t(ε⁻²) the trajectory of X_t^ε cannot move far from the stable compactum in whose domain of attraction the initial point lies. Over larger time intervals there are passages from the neighborhood of this compactum to neighborhoods of others: first to the "closest" compactum (in the sense of the action functional) and then to ever "farther" ones. First of all we establish in which order X_t^ε enters the neighborhoods of the compacta K_i.

Theorem 6.1. Let L = {1, 2, …, l} and let Q be a subset of L. For the process X_t^ε consider the first entrance time τ_Q of the boundaries ∂g_j of the ρ-neighborhoods g_j of the K_j with indices in L\Q. Let the process begin in g_i ∪ ∂g_i, i ∈ Q. Then for sufficiently small ρ, with probability converging to 1 as ε → 0, X^ε_{τ_Q} belongs to one of the sets ∂g_j such that in one of the (L\Q)-graphs g at which the minimum

    A(Q) = min_{g∈G(L\Q)} Σ_{(m→n)∈g} V(K_m, K_n)   (6.1)

is attained, the chain of arrows beginning at i leads to j ∈ L\Q.
The proof can be carried out easily by means of Lemmas 2.1 and 3.3. In this theorem Ṽ(K_m, K_n) can be replaced by V(K_m, K_n), and in this case we can also omit all unstable compacta and consider only passages from one stable compactum to another.
We consider an example. Let K_i, i = 1, 2, 3, 4, 5, be stable compacta containing ω-limit sets, and let the values V(K_i, K_j) be given by the matrix

    ⎛ 0   4   9  13  12 ⎞
    ⎜ 7   0   5  10  11 ⎟
    ⎜ 6   8   0  17  15 ⎟
    ⎜ 3   6   8   0   2 ⎟
    ⎝ 5   7  10   3   0 ⎠
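The minimizing graphs quoted below can be checked by brute force for a matrix of this size. A minimal sketch in Python (the matrix is the one above; the function names are ours):

```python
from itertools import product

# V[i][j] = V(K_i, K_j), the matrix above (diagonal omitted).
V = {1: {2: 4, 3: 9, 4: 13, 5: 12},
     2: {1: 7, 3: 5, 4: 10, 5: 11},
     3: {1: 6, 2: 8, 4: 17, 5: 15},
     4: {1: 3, 2: 6, 3: 8, 5: 2},
     5: {1: 5, 2: 7, 3: 10, 4: 3}}
L = set(V)

def leads_out(g, Q):
    """True if every chain of arrows of g leads out of Q (no cycle inside Q)."""
    for start in Q:
        seen, i = set(), start
        while i in Q:
            if i in seen:
                return False
            seen.add(i)
            i = g[i]
    return True

def min_graph(Q):
    """Cheapest (L\\Q)-graph: one arrow out of each i in Q, chains ending in L\\Q."""
    Q = sorted(Q)
    best = None
    for targets in product(*([j for j in L if j != i] for i in Q)):
        g = dict(zip(Q, targets))
        if leads_out(g, set(Q)):
            cost = sum(V[i][j] for i, j in g.items())
            if best is None or cost < best[0]:
                best = (cost, g)
    return best

print(min_graph({1}))        # the first compactum visited from K_1 is K_2
print(min_graph({1, 2, 3}))  # from the group K_1, K_2, K_3 the exit goes to K_4
print(min_graph({4, 5}))     # the return from {K_4, K_5} goes to K_1
```

For Q = {1} this yields the arrow 1 → 2 with cost 4; for Q = {1, 2, 3}, the graph 3 → 1, 1 → 2, 2 → 4 with cost 20; for Q = {4, 5}, the graph 5 → 4, 4 → 1 with cost 6, in agreement with the discussion below.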
Let the process begin near K_1. We determine the {2, 3, 4, 5}-graph minimizing the sum of values of V. This graph consists of the single arrow 1 → 2. Therefore, the first of the compacta approached by the process will be K_2. Further, we see where the process goes from the neighborhood of K_2. We put Q = {2}. We find that the {1, 3, 4, 5}-graph 2 → 3 minimizes the sum (6.1). Consequently, with overwhelming probability the next compactum approached by the process will be K_3. Then the graph 3 → 1 shows that the process returns to K_1. Passages from K_1 to K_2, from K_2 to K_3 and from K_3 back to K_1 take place many times, but ultimately the process reaches one of the compacta K_4, K_5. In order to see which one exactly, we use Theorem 6.1 for Q = {1, 2, 3}. We find that the minimum (6.1) is attained for the {4, 5}-graph 3 → 1, 1 → 2, 2 → 4. Hence, with probability close to 1 for small ε, if the process begins near K_1, K_2 or K_3, then it reaches the neighborhood of K_4 sooner than that of K_5. (We are not saying that until then the process performs passages between the compacta K_1, K_2 and K_3 only in the most probable order K_1 → K_2 → K_3 → K_1. What is more, in our case it can be proved that with probability close to 1, passages will also take place in the reverse order before the neighborhood of K_4 is reached.) Afterwards the process passes from K_4 to K_5 and back (this is shown by the graphs 4 → 5 and 5 → 4). Then it returns to the neighborhoods of the compacta K_1, K_2 and K_3, most probably to K_1 first (as is shown by the graph 5 → 4, 4 → 1). On large time intervals passages will take place between the "cycles" K_1 → K_2 → K_3 → K_1 and K_4 → K_5 → K_4; they take place most often in the way described above. Consequently, we obtain a "cycle of cycles", a cycle of rank two.

The passages between the K_i can be described by means of a hierarchy of cycles in the general case as well (Freidlin [9], [10]). Let K_1, …, K_l be stable compacta and let Q be a subset of L = {1, 2, …, l}. We assume that there exists a unique (L\Q)-graph g* at which the minimum (6.1) is attained. We define R_Q(i), i ∈ Q, as that element of L\Q which is the terminal point of the chain of arrows going from i to L\Q in the graph g*. Now we describe the decomposition of L into a hierarchy of cycles. We begin with cycles of rank one. For every i_0 ∈ L we consider the sequence i_0, i_1, i_2, …, i_n, … in which i_k = R_{{i_{k−1}}}(i_{k−1}). Let n be the smallest index for which there is a repetition: i_n = i_m, 0 ≤ m < n, while all i_k with k < n are different. Then the cycles of rank one generated by the element
i_0 ∈ L are, by definition, the groups {i_0}, {i_1}, …, {i_{m−1}} and {i_m → i_{m+1} → ⋯ → i_{n−1} → i_m}, where the last group is considered with the indicated cyclic order. Cycles generated by distinct initial points i_0 ∈ L either do not intersect or coincide; in the latter case the cyclic order on them is one and the same. Hence the cycles of rank one have been selected (some of them consist of only one point).
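Given a successor map R(i) (the endpoint of the cheapest single arrow out of i, assumed unique), the construction of the rank-one cycles just described can be sketched as follows; `rank_one_cycles` is our own name, and the cyclic order inside each loop is dropped in this sketch:

```python
def rank_one_cycles(R):
    """Follow i -> R(i) from every starting point until the first repetition;
    the points passed before entering the loop give one-point cycles, and the
    loop itself gives one cycle of rank one."""
    cycles = []
    for i0 in R:
        path = [i0]
        while path[-1] not in path[:-1]:
            path.append(R[path[-1]])
        m = path.index(path[-1])                 # where the loop starts
        for c in [frozenset([k]) for k in path[:m]] + [frozenset(path[m:-1])]:
            if c not in cycles:
                cycles.append(c)
    return cycles

# Successor map of the five-compacta example above: 1->2->3->1 and 4->5->4.
print(rank_one_cycles({1: 2, 2: 3, 3: 1, 4: 5, 5: 4}))
```

For the example above this produces the two rank-one cycles {1, 2, 3} and {4, 5}; distinct starting points generate the same cycles, as stated.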
We continue the definition by recurrence. Let the cycles of rank (k − 1) (briefly, (k − 1)-cycles) π_1^{k−1}, π_2^{k−1}, … already be defined; they are sets of (k − 2)-cycles equipped with a cyclic order. Ultimately, every cycle consists of points, i.e., elements of L; we shall denote the set of points constituting a cycle by the same symbol as the cycle itself. We shall say that a cycle π_j^{k−1} is a successor of π_i^{k−1}, and write π_i^{k−1} → π_j^{k−1}, if R_{π_i^{k−1}}(m) ∈ π_j^{k−1} for m ∈ π_i^{k−1}. It can be proved that the function R_{π_i^{k−1}}(m) assumes the same value for all m ∈ π_i^{k−1}, so that this definition is unambiguous. Now we consider a cycle π_{i_0}^{k−1} and a sequence of cycles π_{i_0}^{k−1} → π_{i_1}^{k−1} → π_{i_2}^{k−1} → ⋯ beginning with it. In this sequence repetition begins from a certain index; let n be the smallest such index: π_{i_n}^{k−1} = π_{i_m}^{k−1}, 0 ≤ m < n. We shall say that the cycle π_{i_0}^{k−1} generates the cycles of rank k: {π_{i_0}^{k−1}}, …, {π_{i_{m−1}}^{k−1}} (cycles of rank k, each consisting of one cycle of the preceding rank) and {π_{i_m}^{k−1} → π_{i_{m+1}}^{k−1} → ⋯ → π_{i_{n−1}}^{k−1} → π_{i_m}^{k−1}}. Taking all initial (k − 1)-cycles, we decompose all cycles of rank (k − 1) into k-cycles. The cycles of rank zero are the points of L; for some k all (k − 1)-cycles participate in one k-cycle, which exhausts the whole set L.

For small ε, the decomposition into cycles completely determines the most probable order in which the trajectories of X_t^ε traverse the neighborhoods of the stable compacta (of course, all this concerns the case of "general position," where every minimum (6.1) is attained at only one graph). Now we turn our attention to the time spent by the process in one cycle or another.

Theorem 6.2. Let π be a cycle. Let us put
    C(π) = A(π) − min_{i∈π} min_{g∈G_π{i}} Σ_{(m→n)∈g} V(K_m, K_n),   (6.2)

where A(π) is defined by formula (6.1) and G_π{i} is the set of {i}-graphs over the set π. Then for sufficiently small ρ > 0 we have

    lim_{ε→0} ε² ln M_x τ_π = C(π)   (6.3)

uniformly in x belonging to the ρ-neighborhood of ∪_{i∈π} K_i, and for any γ > 0 we have

    lim_{ε→0} P_x{exp{ε⁻²(C(π) − γ)} < τ_π < exp{ε⁻²(C(π) + γ)}} = 1   (6.4)

uniformly in all indicated x.
Proof. Relation (6.3) can be derived from Theorem 5.3. We recall that for a set Q ⊂ L which is not a cycle, lim_{ε→0} ε² ln M_x τ_Q in general depends on the choice of the point x ∈ ∪_{i∈Q} g_i (cf. §5). The proof of assertion (6.4) is analogous to that of Theorem 4.2 of Ch. 4, and we omit it.
The decomposition into cycles and Theorem 6.2 enable us to answer the question of the behavior of X_t^ε on time intervals of length t(ε⁻²), where t(ε⁻²) has exponential asymptotics as ε → 0: t(ε⁻²) ≈ exp{Cε⁻²}. Let π, π′, …, π^{(s)} be the cycles of next-to-last rank, united into the last cycle, which exhausts L. If the constant C is greater than C(π), C(π′), …, C(π^{(s)}), then over a time exp{Cε⁻²} the process can traverse all these cycles many times (and all cycles of smaller rank inside them), and for it the limit distribution found in §4 is established. In terms of the hierarchy of cycles, this limit distribution is concentrated on that one of the cycles π, π′, …, π^{(s)} for which the corresponding constant C(π), C(π′), …, C(π^{(s)}) is the greatest; within this cycle, it is concentrated on that one of the subcycles for which the corresponding constant defined by (6.2) is the greatest, and so on, down to the points i and the compacta K_i corresponding to them. The same is true if we do not have t(ε⁻²) ≈ exp{Cε⁻²} but only

    lim_{ε→0} ε² ln t(ε⁻²) > max{C(π), C(π′), …, C(π^{(s)})}.
Over a time t(ε⁻²) of smaller order, the process cannot traverse all the cycles, and no limit distribution common for all initial points is established. Nevertheless, on the cycles of less than maximal rank which the process can traverse, sublimit distributions depending on the initial point are established.

We return to the example considered after Theorem 6.1. We illustrate the hierarchy of cycles in Fig. 17: the cycles of rank zero (the points) are united in the cycles {1 → 2 → 3 → 1} and {4 → 5 → 4} of rank one, and these are united in the single cycle of rank two. On the arrow beginning at each cycle π we indicate the value of the constant C(π). If lim_{ε→0} ε² ln t(ε⁻²) is between 0 and 2, then over time t(ε⁻²) the process does not succeed in moving away from the compactum K_i near which it began, and the sublimit
Figure 17.
distribution corresponding to the initial point x is concentrated on that K_i to whose domain of attraction x belongs. Over times t(ε⁻²) for which lim_{ε→0} ε² ln t(ε⁻²) is between 2 and 3, a moving away from the 0-cycle {4} takes place; i.e., the only thing that happens is that if the process was near K_4, then it passes to K_5. The sublimit distribution is then concentrated on K_5, both for initial points in the domain of attraction of K_4 and for those in the domain of attraction of K_5. If lim_{ε→0} ε² ln t(ε⁻²) is between 3 and 4, then over time t(ε⁻²) a limit distribution is established on the cycle {4 → 5 → 4}, but nothing else takes place. Since this distribution is concentrated at the point 5 corresponding to the compactum K_5, the result is the same as for lim_{ε→0} ε² ln t(ε⁻²) between 2 and 3.

If lim_{ε→0} ε² ln t(ε⁻²) is between 4 and 5, a moving away from the cycle {4 → 5 → 4} takes place: the process hits the neighborhood of K_1 and passes from there to K_2, but no farther; the sublimit distribution is concentrated on K_2 for initial points attracted to any compactum except K_3 (and on K_3 for points attracted to K_3). Finally, if lim_{ε→0} ε² ln t(ε⁻²) > 5, then a limit distribution is established (although a moving away from the cycle {1 → 2 → 3 → 1}, and even from the 0-cycle {3}, may not yet have taken place).

Now we can determine the sublimit distributions for initial points belonging to the domain of attraction of any stable compactum. For example, for x attracted to K_4 we have lim_{ε→0} P_x{X^ε_{t(ε⁻²)} ∈ Γ} = 1 for open Γ ⊃ K_4, provided that 0 < lim inf_{ε→0} ε² ln t(ε⁻²) ≤ lim sup_{ε→0} ε² ln t(ε⁻²) < 2, and lim_{ε→0} P_x{X^ε_{t(ε⁻²)} ∈ Γ} = 1 for open Γ ⊃ K_5, provided that 2 < lim inf_{ε→0} ε² ln t(ε⁻²) ≤ lim sup_{ε→0} ε² ln t(ε⁻²) < 4. If these lower and upper limits are greater than 4 and smaller than 5, then lim_{ε→0} P_x{X^ε_{t(ε⁻²)} ∈ Γ} = 1 for open sets Γ containing K_2. Finally, if lim_{ε→0} ε² ln t(ε⁻²) > 5, then the distribution of X^ε_{t(ε⁻²)} is attracted to K_3, independently of the initial point x.

A general formulation of the result and a scheme of proof can be found in Freidlin [10]. Sublimit distributions over an exponential time can also be established for a process X_t^ε stopped after reaching the boundary of the domain (the situation of §5). We note that the problem of the limit distribution of X^ε_{t(ε⁻²)} is closely connected with the problem of stabilization of solutions of parabolic differential equations with a small parameter (cf. the same article).
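The constants C(π) of Fig. 17 can be recomputed directly from formula (6.2) and the matrix of the example; a brute-force sketch (our function names; the value for the cycle {1 → 2 → 3 → 1} is our own computation, not stated in the text):

```python
from itertools import product

# V(K_i, K_j) from the example following Theorem 6.1.
V = {1: {2: 4, 3: 9, 4: 13, 5: 12},
     2: {1: 7, 3: 5, 4: 10, 5: 11},
     3: {1: 6, 2: 8, 4: 17, 5: 15},
     4: {1: 3, 2: 6, 3: 8, 5: 2},
     5: {1: 5, 2: 7, 3: 10, 4: 3}}
L = set(V)

def _exits(g, start, sinks):
    """Follow the chain of arrows from start; True if it ends in the sinks."""
    seen, i = set(), start
    while i in g:
        if i in seen:
            return False          # a cycle that never reaches the sinks
        seen.add(i)
        i = g[i]
    return i in sinks

def min_sum(sources, targets_of, sinks):
    """Minimal sum of V over graphs with one arrow out of each source and
    every chain of arrows ending in the set of sinks."""
    best = float('inf')
    for targets in product(*(targets_of[i] for i in sources)):
        g = dict(zip(sources, targets))
        if all(_exits(g, s, sinks) for s in sources):
            best = min(best, sum(V[i][j] for i, j in g.items()))
    return best

def A(Q):   # formula (6.1)
    srcs = sorted(Q)
    return min_sum(srcs, {i: [j for j in L if j != i] for i in srcs}, L - Q)

def C(pi):  # formula (6.2): the exit constant of the cycle pi
    inner = min(min_sum(sorted(pi - {i}),
                        {m: [j for j in pi if j != m] for m in pi - {i}},
                        {i})
                for i in pi)
    return A(pi) - inner

for pi in ({4}, {5}, {4, 5}, {1, 2, 3}):
    print(sorted(pi), C(pi))
```

This reproduces the constants 2, 3 and 4 used above for the cycles {4}, {5} and {4 → 5 → 4}, and yields 11 for the cycle {1 → 2 → 3 → 1}.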
§7. Eigenvalue Problems

Let L be an elliptic differential operator in a bounded domain D with smooth boundary ∂D. As is known, the smallest eigenvalue λ_1 of the operator −L with zero boundary conditions is real, positive, and simple. It admits the
following probability-theoretic characterization. Let (X_t, P_x) be the diffusion process with generator L and let τ be the time of first exit of X_t from D. Then λ_1 forms the division between those λ for which M_x e^{λτ} < ∞ and those for which M_x e^{λτ} = ∞ (cf. Khas'minskii [2]). The results concerning the action functional for the family of processes (X_t^ε, P_x^ε) corresponding to the operators

    L^ε = (ε²/2) Σ_{i,j} a^{ij}(x) ∂²/∂x^i ∂x^j + Σ_i b^i(x) ∂/∂x^i   (7.1)

can be applied to determine the asymptotics of the eigenvalues of −L^ε as ε → 0. Two qualitatively different cases arise, according as all trajectories of the dynamical system ẋ_t = b(x_t) leave D ∪ ∂D or there are stable ω-limit sets of the system in D.

The first, simpler, case was considered in Wentzell [6]. As has been established in §1 (Lemma 1.9), the probability that the trajectory of X_t^ε spends more than time T in D can be estimated from above by the expression exp{−ε⁻²c(T − T_0)}, c > 0. This implies that M_x e^{λτ^ε} < ∞ for λ < ε⁻²c, and therefore λ_1^ε ≥ ε⁻²c. The rate of convergence of λ_1^ε to infinity is given more accurately by the following theorem.

Theorem 7.1. As ε → 0 we have λ_1^ε = (c_1 + o(1))ε⁻², where

    c_1 = lim_{T→∞} T⁻¹ min{S_{0T}(φ): φ_t ∈ D ∪ ∂D, 0 ≤ t ≤ T}   (7.2)

and S_{0T} is the normalized action functional for the family of diffusion processes (X_t^ε, P_x^ε).
Proof. First of all we establish the existence of the limit (7.2). We denote the minimum appearing in (7.2) by a(T). Firstly, it is clear that a(T/n) ≤ a(T)/n for any natural number n (because the value of S on at least one of the intervals [kT/n, (k + 1)T/n] is less than or equal to S_{0T}(φ)/n). From this we obtain that for any T, T′ > 0,

    a(T′) ≥ [T′/T]a(T).   (7.3)

In order to obtain the opposite estimate, we use the fact that there exist positive T_0 and A such that any two points x and y of D ∪ ∂D can be connected by a curve φ_t(x, y), 0 ≤ t ≤ T(x, y) ≤ T_0, such that S_{0,T(x,y)}(φ(x, y)) ≤ A.
For some large T let the minimum a(T) be attained at a function φ̄. We put x = φ̄_T, y = φ̄_0 and construct φ from pieces: φ_t = φ̄_t for 0 ≤ t ≤ T; φ_t = φ_{t−T}(x, y) for T ≤ t ≤ T + T(x, y); and we extend φ periodically with period T + T(x, y). For any positive T′ we obtain

    S_{0T′}(φ) ≤ ([T′/(T + T(x, y))] + 1)(S_{0T}(φ̄) + S_{0,T(x,y)}(φ(x, y))),   (7.4)

    a(T′) ≤ ([T′/T] + 1)(a(T) + A).   (7.5)

Dividing (7.3) and (7.5) by T′ and passing to the limit as T′ → ∞, we obtain

    T⁻¹a(T) ≤ lim inf_{T′→∞} T′⁻¹a(T′) ≤ lim sup_{T′→∞} T′⁻¹a(T′) ≤ T⁻¹(a(T) + A).   (7.6)

Passing to the limit as T → ∞, we obtain that the limit (7.2) exists (and is finite).
Now let τ^ε be the time of first exit of X_t^ε from D. We prove that M_x exp{ε⁻²bτ^ε} = ∞ for sufficiently small ε if b > c_1. It is easy to see that

    lim_{T→∞} T⁻¹ sup_{x,y∈D} min{S_{0T}(φ): φ_0 = x, φ_T = y, φ_t ∈ D ∪ ∂D, 0 ≤ t ≤ T}

is also equal to c_1. We choose a positive κ smaller than (b − c_1)/3, and T such that

    T⁻¹ sup_{x,y∈D} min{S_{0T}(φ): φ_0 = x, φ_T = y, φ_t ∈ D ∪ ∂D, 0 ≤ t ≤ T} < c_1 + κ.   (7.7)

Now we choose δ > 0 such that

    min{S_{0T}(φ): φ_0 = x, φ_T = y, φ_t ∈ D_{−δ} ∪ ∂D_{−δ}, 0 ≤ t ≤ T} ≤ T(c_1 + 2κ)   (7.8)

for all x, y ∈ D_{−δ}, where D_{−δ} is the set of points of D at a distance greater than δ from the boundary. This can be done according to Lemma 1.4 (the functions φ_t being defined on the same interval [0, T]; cf. the proof of Lemma 1.4). For arbitrary x ∈ D_{−δ}, y ∈ D_{−2δ} we choose a curve φ_t, 0 ≤ t ≤ T, connecting them inside D_{−δ} ∪ ∂D_{−δ} and such that
S_{0T}(φ) ≤ T(c_1 + 2κ). By Theorem 3.2 of Ch. 5 we obtain that for sufficiently small ε and all x ∈ D_{−δ},

    P_x{τ^ε > T, X_T^ε ∈ D_{−δ}} ≥ P_x{ρ_{0T}(X^ε, φ) < δ} ≥ exp{−ε⁻²[S_{0T}(φ) + κT]} ≥ exp{−ε⁻²T(c_1 + 3κ)}.

Successive application of the Markov property (n − 1) times gives

    P_x{τ^ε > nT} ≥ exp{−nε⁻²T(c_1 + 3κ)}.   (7.9)

From this we obtain, for small ε and all n,

    M_x exp{ε⁻²bτ^ε} ≥ exp{ε⁻²bnT}P_x{τ^ε > nT} ≥ exp{nε⁻²T(b − c_1 − 3κ)},

which tends to infinity as n → ∞. Therefore, M_x exp{ε⁻²bτ^ε} = ∞.

Now we prove that if b < c_1, then M_x exp{ε⁻²bτ^ε} < ∞ for small ε. Again we choose 0 < κ < (c_1 − b)/3, and T and δ > 0 such that

    min{S_{0T}(φ): φ_t ∈ D_{+δ} ∪ ∂D_{+δ}, 0 ≤ t ≤ T} > T(c_1 − 2κ)   (7.10)

(we have applied Lemma 1.4 once more). The distance between the set of functions ψ_t lying entirely in D for 0 ≤ t ≤ T and any of the sets

    Φ_x(T(c_1 − 2κ)) = {φ: φ_0 = x, S_{0T}(φ) ≤ T(c_1 − 2κ)}

is not less than δ. Therefore, by the upper estimate in Theorem 3.2 of Ch. 5, for sufficiently small ε and all x ∈ D we have

    P_x{τ^ε > T} ≤ P_x{ρ_{0T}(X^ε, Φ_x(T(c_1 − 2κ))) ≥ δ} ≤ exp{−ε⁻²T(c_1 − 3κ)}.   (7.11)

The Markov property gives P_x{τ^ε > nT} ≤ exp{−nε⁻²T(c_1 − 3κ)}, and

    M_x exp{ε⁻²bτ^ε} ≤ Σ_{n=0}^∞ exp{ε⁻²b(n + 1)T} P_x{nT < τ^ε ≤ (n + 1)T}
        ≤ Σ_{n=0}^∞ exp{ε⁻²b(n + 1)T} P_x{τ^ε > nT}
        ≤ exp{ε⁻²bT} Σ_{n=0}^∞ exp{nε⁻²T(b − c_1 + 3κ)} < ∞.

The theorem is proved. □
The following theorem helps us obtain a large class of examples in which the limit (7.2) can be calculated.

Theorem 7.2. Let the field b(x) have a potential with respect to the Riemannian metric connected with the diffusion matrix, i.e., let there exist a function U(x), smooth in D ∪ ∂D, such that

    b^i(x) = −Σ_j a^{ij}(x) ∂U(x)/∂x^j.   (7.12)

Then c_1 is equal to

    min_{x∈D∪∂D} (1/2) Σ_{ij} a_{ij}(x) b^i(x) b^j(x),   (7.13)

where (a_{ij}(x)) = (a^{ij}(x))⁻¹.
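As a worked instance (our own, not from the text): take D = (1, 2) ⊂ R¹ with a ≡ 1 and b(x) = −x, so that U(x) = x²/2 satisfies (7.12) and every trajectory of ẋ = −x leaves D through the left endpoint. Formula (7.13) then gives

```latex
c_1 = \min_{x \in [1,2]} \tfrac12\, b(x)^2
    = \min_{1 \le x \le 2} \tfrac12\, x^2
    = \tfrac12,
\qquad\text{so by Theorem 7.1,}\quad
\lambda_1^\varepsilon = \bigl(\tfrac12 + o(1)\bigr)\varepsilon^{-2}.
```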
Proof. We denote the minimum (7.13) by c_1*. Let the minimum a(T) be attained at the function φ:

    a(T) = ∫_0^T (1/2) Σ_{ij} a_{ij}(φ_t)(φ̇_t^i − b^i(φ_t))(φ̇_t^j − b^j(φ_t)) dt.   (7.14)

We transform the integral in the following way:

    a(T) = ∫_0^T (1/2) Σ_{ij} a_{ij}(φ_t) φ̇_t^i φ̇_t^j dt − ∫_0^T Σ_{ij} a_{ij}(φ_t) b^i(φ_t) φ̇_t^j dt + ∫_0^T (1/2) Σ_{ij} a_{ij}(φ_t) b^i(φ_t) b^j(φ_t) dt.

The first integral here is nonnegative, the second is equal to U(φ_0) − U(φ_T) by (7.12), and the third is not smaller than c_1* multiplied by T. This implies that T⁻¹a(T) ≥ c_1* − T⁻¹C, where C is the maximum of U(x) − U(y) over all x, y ∈ D ∪ ∂D. Passing to the limit as T → ∞, we obtain that c_1 ≥ c_1*. The inequality c_1 ≤ c_1* follows from the fact that for the function φ_t ≡ x_0, where x_0 is a point at which the minimum (7.13) is attained, we have T⁻¹S_{0T}(φ) = c_1*.

Now we consider the opposite case: the case where D contains ω-limit sets of the system ẋ_t = b(x_t). We assume that condition A of §2 is satisfied (there exist a finite number of compacta K_1, …, K_l consisting of equivalent points and containing all ω-limit sets). As we have already said, in this case our process with a small diffusion can be approximated well by a Markov
chain with (l + 1) states corresponding to the compacta K_i and the boundary ∂D, having transition probabilities of order

    exp{−ε⁻²V_D(K_i, K_j)},  exp{−ε⁻²V_D(K_i, ∂D)}   (7.15)

(the transition probabilities from ∂D to the K_i are assumed to be equal to 0, and the diagonal entries of the matrix of transition probabilities are such that the row sums are equal to 1). It seems plausible that the condition of finiteness of M_x e^{λτ^ε} is close to the condition that the mathematical expectation of e^{λν^ε} be finite, where ν^ε is the number of steps of the chain with the indicated transition probabilities until the first entrance into ∂D. The negative of the logarithm of the largest eigenvalue of the matrix of transition probabilities after the eigenvalue 1 turns out to be the divisor between those λ for which this mathematical expectation is finite and those for which it is infinite. The asymptotics of the eigenvalues of a matrix with entries of order (7.15) can be found in Wentzell's note [3]. The justification of the passage from the chain to the diffusion process X_t^ε is discussed in Wentzell [2].

Theorem 7.3. Let λ_1^ε, λ_2^ε, …, λ_l^ε be the eigenvalues, except the eigenvalue 1, indexed in decreasing order of their absolute values, of a stochastic matrix with entries logarithmically equivalent to the expressions (7.15) as ε → 0. We define constants V^{(k)}, k = 1, …, l, l + 1, by the formula

    V^{(k)} = min_{g∈G(k)} Σ_{(α→β)∈g} V_D(α, β),   (7.16)

where G(k) is the set of W-graphs over the set L = {K_1, …, K_l, ∂D} such that W contains k elements.

(We note that for k = l + 1 we have W = L; in this case there is exactly one W-graph, namely the empty graph, and the sum in (7.16), just as V^{(l+1)}, is equal to zero.) Then as ε → 0 we have

    Re(1 − λ_k^ε) ≍ exp{−(V^{(k)} − V^{(k+1)})ε⁻²}.   (7.17)

The proof is analogous to those of Lemma 3.1 and other lemmas of §3: the coefficients of the characteristic polynomial of the matrix obtained by subtracting the identity from the matrix of transition probabilities can be expressed as sums of products π(g) over W-graphs. From this we obtain the asymptotics of the coefficients, which in turn yields the asymptotics of the roots.
We note that the assertion of the theorem implies that V^{(1)} − V^{(2)} ≥ V^{(2)} − V^{(3)} ≥ ⋯ ≥ V^{(l)} − V^{(l+1)}. In the case of "general position," where these inequalities are strict, the eigenvalue λ_k^ε turns out to be real and simple for small ε (the principal eigenvalue λ_1^ε is always real and simple).

As far as the justification of the passage from the asymptotic problem for the Markov chain to the problem for the diffusion process is concerned, it is divided into two parts: a construction, independent of the presence of the small parameter ε, which connects diffusion processes with certain discrete chains, and estimates connected with the Markov chains Z_n introduced in §2. For the first part, the following lemma turns out to be essential.

Suppose that (X_t, P_x) is a diffusion process with generator L in a bounded domain D with smooth boundary, ∂g is a surface inside D, and Γ is the surface of a neighborhood of ∂g separating ∂g from ∂D. Let us introduce the random times

    σ_0 = min{t: X_t ∈ Γ},  τ_1 = min{t > σ_0: X_t ∈ ∂g ∪ ∂D}.

For a > 0 we introduce the operators q^a acting on bounded functions defined on ∂g:

    q^a f(x) = M_x{X_{τ_1} ∈ ∂g; e^{aτ_1} f(X_{τ_1})},  x ∈ ∂g.   (7.18)

Lemma 7.1. The smallest eigenvalue λ_1 of −L with zero boundary conditions on ∂D is the division between those a for which the largest eigenvalue of q^a is smaller than 1 and those for which it is larger than 1.

The proof of this assertion and a generalization of it to other eigenvalues can be found in Wentzell [5].
In order to apply this to diffusion processes with small diffusion, we choose ∂g = ∪_i ∂g_i and Γ = ∪_i Γ_i in the same way as in §2, and for the corresponding operators q^a_ε we prove the following estimates.

Lemma 7.2. For any γ > 0 one can choose the radii of the neighborhoods of the K_i so small that for all sufficiently small a and sufficiently small ε we have

    1 + a e^{−γε⁻²} ≤ M_x e^{aτ_1} ≤ 1 + a e^{γε⁻²},  x ∈ ∂g_i;   (7.19)

    exp{−(V_D(K_i, K_j) + γ)ε⁻²} ≤ q^a_ε χ_{∂g_j}(x) = M_x{X_{τ_1} ∈ ∂g_j; e^{aτ_1}} ≤ exp{−(V_D(K_i, K_j) − γ)ε⁻²},  x ∈ ∂g_i;   (7.20)

    exp{−(V_D(K_i, ∂D) + γ)ε⁻²} ≤ M_x{X_{τ_1} ∈ ∂D; e^{aτ_1}} ≤ exp{−(V_D(K_i, ∂D) − γ)ε⁻²},  x ∈ ∂g_i.   (7.21)
The proof is analogous to that of Lemmas 2.1 and 1.7; the upper estimates are obtained with somewhat more work. In conclusion, we obtain the following result.

Theorem 7.4. The smallest eigenvalue λ_1^ε of −L^ε is logarithmically equivalent to

    exp{−ε⁻²(V^{(1)} − V^{(2)})}   (7.22)

as ε → 0, where V^{(1)}, V^{(2)} are defined by formula (7.16).

In particular, if D contains a unique asymptotically stable equilibrium position O, then

    λ_1^ε ≈ exp{−ε⁻² min_{y∈∂D} V(O, y)},   (7.23)

where V is the quasipotential.

Theorems 7.1 and 7.4 do not give anything interesting in the case where there are only unstable sets inside D or where the stable limit sets lie on the boundary of the domain. Some results concerning this case are contained in the paper [1] by Devinatz, Ellis, and Friedman.

The results contained in Theorems 7.3 and 7.4 are related to the results of §6. In particular, the constants which must be crossed by lim_{ε→0} ε² ln t(ε⁻²) as one sublimit distribution changes to another are nothing else but the differences V^{(k)} − V^{(k+1)}.

We clarify the mechanism of this connection. Let (X_t^ε, P_x) be a diffusion
We clarify the mechanisms of this connection. Let (X;, Px) be a diffusion
process with small diffusion on a compact manifold M or a process in a closed domain D u OD, stopped after exit to the boundary and let f (x) be a continuous function vanishing on aD. We assume that the function u(t, x) = Mx f(X;)-the solution of the corresponding problem for the equation au'/at = LEu`-can be expanded in a series in the eigenfunctions e;(x) of - L`: ue(t x) _
c7e
'e(x)
(7.24)
with coefficient c obtained by integrating f multiplied by the corresponding
eigenfunction if of the adjoint operator. (Since L` is not self-adjoint, the existence of such an expansion is not automatic.) We assume that the first to eigenvalues are real, nonnegative and decrease exponentially fast: A.;
xexp{-Cis-z},
Let the function t(e 2) be such that C;+t < lirE...o eZ In t(e-z) < Him, -o e2 In t(e-2) < C,. We have lim5.o
Z) = 1 for 1 < i 5 j and this
limit is equal to zero for i > j. If the infinite remainder of the sum (7.24) does not interfere, we obtain

    lim_{ε→0} u^ε(t(ε⁻²), x) = Σ_{i=1}^j lim_{ε→0} [∫ f(y) ẽ_i^ε(y) dy] · lim_{ε→0} e_i^ε(x).   (7.25)

In other words, the character of the limit behavior of u^ε(t(ε⁻²), x) changes as lim_{ε→0} ε² ln t(ε⁻²) crosses C_j. This outlines the route by which conclusions concerning the behavior of a process on exponentially growing time intervals can be derived from results on exponentially small eigenvalues and the corresponding eigenfunctions. The same method can be used for the more difficult inverse problem.
Chapter 7
The Averaging Principle. Fluctuations in Dynamical Systems with Averaging
§1. The Averaging Principle in the Theory of Ordinary Differential Equations

Let us consider the system
    Ż_t^ε = εb(Z_t^ε, ξ_t),  Z_0^ε = x,   (1.1)

a system of ordinary differential equations in R^r, where ξ_t, t ≥ 0, is a function assuming values in R^l, ε is a small numerical parameter, and b(x, y) = (b^1(x, y), …, b^r(x, y)).

If the functions b^i(x, y) do not increase too fast, then the solution of equation (1.1) converges to Z_t^0 ≡ x as ε → 0, uniformly on every finite time interval [0, T]. However, the behavior of Z_t^ε on time intervals of order ε⁻¹ or of larger order is usually of main interest, since it is only over times of order ε⁻¹ that significant changes, such as exit from the neighborhood of an equilibrium position or of a periodic trajectory, take place in system (1.1). In the study of the system on intervals of the form [0, Tε⁻¹], it is convenient to pass to new coordinates so that the time interval does not depend on ε. We set X_t^ε = Z_{t/ε}^ε. Then the equation for X_t^ε assumes the form

    Ẋ_t^ε = b(X_t^ε, ξ_{t/ε}),  X_0^ε = x.   (1.2)

The study of this system on a finite time interval is equivalent to the study of system (1.1) on time intervals of order ε⁻¹.
Let the function $b(x, y)$ be bounded, continuous in $x$ and $y$, and let it satisfy a Lipschitz condition in $x$ with a constant independent of $y$:
$$|b(x_1, y) - b(x_2, y)| \le K|x_1 - x_2|.$$
We assume that
$$\lim_{T \to \infty} \frac{1}{T}\int_0^T b(x, \xi_s)\, ds = \bar b(x) \tag{1.3}$$
for $x \in R^r$. If this limit exists, then the function $\bar b(x)$ is obviously bounded and satisfies a Lipschitz condition with the same constant $K$. Condition (1.3) is satisfied, for example, if $\xi_s$ is periodic or is a sum of periodic functions.

The displacement of the trajectory $X_t^\varepsilon$ over a small time $\Delta$ can be written in the form
$$X_\Delta^\varepsilon - x = \int_0^\Delta b(X_s^\varepsilon, \xi_{s/\varepsilon})\, ds = \int_0^\Delta b(x, \xi_{s/\varepsilon})\, ds + \int_0^\Delta [b(X_s^\varepsilon, \xi_{s/\varepsilon}) - b(x, \xi_{s/\varepsilon})]\, ds = \Delta \cdot \frac{\varepsilon}{\Delta}\int_0^{\Delta/\varepsilon} b(x, \xi_s)\, ds + \rho^\varepsilon(\Delta).$$
The coefficient of $\Delta$ in the first term of the last expression converges to $\bar b(x)$ as $\varepsilon/\Delta \to 0$, according to (1.3). The second term satisfies the inequality $|\rho^\varepsilon(\Delta)| \le K\Delta^2$. Consequently, the displacement of the trajectory $X_t^\varepsilon$ over a small time differs from the displacement of the trajectory $\bar x_t$ of the differential equation
$$\dot{\bar x}_t = \bar b(\bar x_t), \qquad \bar x_0 = x, \tag{1.4}$$
only by a quantity infinitely small compared with $\Delta$ as $\Delta \to 0$, $\varepsilon/\Delta \to 0$. If we assume that the limit in (1.3) is uniform in $x$, then from this we obtain a proof of the fact that the trajectory $X_t^\varepsilon$ converges to the solution of equation (1.4), uniformly on every finite time interval as $\varepsilon \to 0$. The assertion that the trajectory $X_t^\varepsilon$ is close to $\bar x_t$ is called the averaging principle.
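The statement can be illustrated numerically. In the sketch below (an assumed example, not from the text), the right side is $b(x, \xi_{t/\varepsilon}) = -2\sin^2(t/\varepsilon)\,x$; the time average of the coefficient is $-1$, so the averaged equation (1.4) is $\dot{\bar x}_t = -\bar x_t$:

```python
import math

def solve_fast(eps, T=1.0, x0=1.0, substeps=200):
    """Euler integration of dX/dt = -2 sin^2(t/eps) X; the time-averaged
    coefficient is -1, so the averaged equation is dx/dt = -x."""
    h = eps / substeps              # step size resolving the fast oscillation
    x, t = x0, 0.0
    for _ in range(int(T / h)):
        x += h * (-2.0 * math.sin(t / eps) ** 2) * x
        t += h
    return x

x_eps = solve_fast(eps=1e-3)        # solution of the full equation at T = 1
x_avg = math.exp(-1.0)              # solution of the averaged equation at T = 1
```

Here the exact solution is $X_T^\varepsilon = \exp\{-(T - (\varepsilon/2)\sin(2T/\varepsilon))\}$, so the deviation from the averaged solution is of order $\varepsilon$, in line with the deterministic case discussed in §2.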
An analogous principle can also be formulated in a more general situation. We consider, for example, the system
$$\dot X_t^\varepsilon = b_1(X_t^\varepsilon, \xi_t^\varepsilon), \qquad X_0^\varepsilon = x, \qquad \dot\xi_t^\varepsilon = \varepsilon^{-1} b_2(X_t^\varepsilon, \xi_t^\varepsilon), \qquad \xi_0^\varepsilon = y, \tag{1.5}$$
where $x \in R^r$, $y \in R^l$, and $b_1$ and $b_2$ are bounded, continuously differentiable functions on $R^r \otimes R^l$ with values in $R^r$ and $R^l$, respectively. The velocity of the motion of the variables $\xi_t^\varepsilon$ has order $\varepsilon^{-1}$ as $\varepsilon \to 0$. Therefore, the $\xi_t^\varepsilon$ are called fast variables, the space $R^l$ the space of fast motion, and the $X_t^\varepsilon$ slow variables.

We consider the fast motion $\xi_t(x)$ for fixed slow variables $x \in R^r$:
$$\dot\xi_t(x) = b_2(x, \xi_t(x)), \qquad \xi_0(x) = y,$$
and assume that the limit
$$\lim_{T \to \infty} \frac{1}{T}\int_0^T b_1(x, \xi_s(x))\, ds = \bar b_1(x) \tag{1.6}$$
exists. For the sake of simplicity, let this limit be independent of the initial point $y$ of the trajectory $\xi_s(x)$. The averaging principle for system (1.5) is the assertion that under certain assumptions, the displacement in the space of slow motion can be approximated by the trajectory of the averaged system
$$\dot{\bar x}_t = \bar b_1(\bar x_t), \qquad \bar x_0 = x.$$
For equation (1.2), the role of the fast motion is played by $\xi_t^\varepsilon = \xi_{t/\varepsilon}$. In this
case the velocity of fast motion does not depend on the slow variables. Although the averaging principle has long been applied to problems of celestial mechanics, oscillation theory and radiophysics, no mathematically rigorous justification of it had existed for a long time. The first general result
in this area was obtained by N. N. Bogolyubov (cf. Bogolyubov and Mitropol'skii [1]). He proved that if the limit (1.3) exists uniformly in x, then
the solution $X_t^\varepsilon$ of equation (1.2) converges to the solution of the averaged system (1.4), uniformly on every interval. Under certain assumptions, the rate of convergence was also estimated and an asymptotic expansion in powers of the small parameter was constructed. In Bogolyubov and Zubarev [1] (cf. also Bogolyubov and Mitropol'skii [1]), these results were extended to some cases of system (1.5), namely to systems in which the fast motion is one-dimensional and the equation for $\xi_t^\varepsilon$ has the form $\dot\xi_t^\varepsilon = \varepsilon^{-1} b_2(X_t^\varepsilon)$, and to some more general systems. V. M. Volosov obtained a series of results concerning the general case of system (1.5) (cf. Volosov [1]). Nevertheless, in the case of multidimensional fast motions, the requirement of uniform convergence to the limit in (1.6), which is usually imposed, excludes a series of interesting problems, for example, problems arising in perturbations of Hamiltonian systems. In a sufficiently general situation it can be proved that for every $T > 0$ and $\rho > 0$, the Lebesgue measure of the set $F_\rho^\varepsilon$ of those initial conditions in problem (1.5) for which $\sup_{0 \le t \le T}|X_t^\varepsilon - \bar x_t| \ge \rho$ converges to zero with $\varepsilon$. This result, obtained in Anosov [1], was later sharpened for systems of a special form (cf. Neishtadt [1]).

Consequently, if a system of differential equations is reduced to the form (1.2) or (1.5), then it is clear, at least formally, what the equation of zeroth approximation looks like. In some cases a procedure can be found for the determination of higher approximations. In the study of concrete problems, first we have to choose the variables in such a way that the fast and slow motions are separated. As an example, we consider the equation
$$\ddot x_t + \omega^2 x_t = \varepsilon f(x_t, \dot x_t, t), \qquad x \in R^1. \tag{1.7}$$
If $f(x, \dot x, t) = (1 - x^2)\dot x$, then this equation turns into the so-called van der Pol equation describing oscillations in a lamp generator. For $\varepsilon = 0$ we obtain the equation of a harmonic oscillator. In the phase plane $(x, \dot x)$ the solutions of this equation are the ellipses $x = r\cos(\omega t + \theta)$, $\dot x = -r\omega\sin(\omega t + \theta)$, on which the phase point rotates with constant angular velocity $\omega$. Without
perturbations ($\varepsilon = 0$), the amplitude $r$ is determined by the initial conditions and does not change with time, and for the phase we have $\varphi_t = \omega t + \theta$. If $\varepsilon$ is now different from zero, but small, then the amplitude $r$ and the difference $\varphi_t - \omega t$ are not constant in general. Nevertheless, one may expect that their rate of change is small provided that $\varepsilon$ is small. Indeed, passing from the variables $(x, \dot x)$ to the van der Pol variables $(r, \theta)$, we obtain
$$\frac{dr_t^\varepsilon}{dt} = \varepsilon F_1(\omega t + \theta_t^\varepsilon, r_t^\varepsilon, t), \qquad r_0^\varepsilon = r_0, \qquad \frac{d\theta_t^\varepsilon}{dt} = \varepsilon F_2(\omega t + \theta_t^\varepsilon, r_t^\varepsilon, t), \qquad \theta_0^\varepsilon = \theta_0, \tag{1.8}$$
where the functions $F_1(s, r, t)$ and $F_2(s, r, t)$ are given by the equalities
$$F_1(s, r, t) = -\frac{1}{\omega} f(r\cos s, -r\sin s, t)\sin s, \qquad F_2(s, r, t) = -\frac{1}{r\omega} f(r\cos s, -r\sin s, t)\cos s.$$
Consequently, in the van der Pol variables $(r, \theta)$, the equation can be written in the form (1.1). If $f(x, \dot x, t)$ does not depend explicitly on $t$, then $F_1(\omega t + \theta, r)$ and $F_2(\omega t + \theta, r)$ are periodic in $t$ and condition (1.3) is satisfied. Therefore,
the averaging principle is applicable to system (1.8). In the case where f (x, )i, t) depends periodically on the last argument, condition (1.3) is also satisfied. If the frequency w and the frequency v of the function f in t are incommensurable, then the averaged right sides of system (1.8) have the form (cf., for example, Bogolyubov and Mitropol'skii [1])
$$\bar F_1(r) = -\frac{1}{4\pi^2\omega}\int_0^{2\pi}\!\!\int_0^{2\pi} f(r\cos\psi, -r\sin\psi, \nu t)\sin\psi\, d\psi\, dt, \qquad \bar F_2(r) = -\frac{1}{4\pi^2 r\omega}\int_0^{2\pi}\!\!\int_0^{2\pi} f(r\cos\psi, -r\sin\psi, \nu t)\cos\psi\, d\psi\, dt. \tag{1.9}$$
It is easy to calculate the averaged right sides of system (1.8) for commensurable $\omega$ and $\nu$ as well. Relying on the averaging principle, we can derive that as $\varepsilon \to 0$, the trajectory $(r_t^\varepsilon, \theta_t^\varepsilon)$ can be approximated by the trajectory of the averaged system
$$\dot{\bar r}_t = \varepsilon \bar F_1(\bar r_t), \qquad \bar r_0 = r_0, \qquad \dot{\bar\theta}_t = \varepsilon \bar F_2(\bar r_t), \qquad \bar\theta_0 = \theta_0,$$
uniformly on the interval $[0, T\varepsilon^{-1}]$. Using this approximation, we can derive a number of interesting conclusions on the behavior of solutions of equation (1.7). For example, let $\bar F_1(r_0) = 0$ and let the function $\bar F_1(r)$ be positive to the left of $r_0$ and negative to the right of it, i.e., let $r_0$ be the amplitude of
an asymptotically stable periodic solution of the averaged system. It can be shown by means of the averaging principle that oscillations with amplitude close to $r_0$ and frequency close to $\omega$ occur in the system described by equation (1.7) with arbitrary initial conditions, provided that $\varepsilon$ is sufficiently small. In the case where $\bar F_1(r)$ has several zeros at which it changes from positive to negative, the amplitude of oscillations over a long time depends on the initial state. We shall return to the van der Pol equation in §8.

Equation (1.7) describes a system which is obtained as a result of small perturbations of the equation of an oscillator. Perturbations of a mechanical system of a more general form can also be considered. Let the system be conservative, let us denote by $H(p, q)$ its Hamiltonian, and let the equations of the perturbed system have the form
$$\frac{dp_t^{\varepsilon,i}}{dt} = -\frac{\partial H}{\partial q_i}(p_t^\varepsilon, q_t^\varepsilon) + \varepsilon f_{p_i}(p_t^\varepsilon, q_t^\varepsilon), \qquad \frac{dq_t^{\varepsilon,i}}{dt} = \frac{\partial H}{\partial p_i}(p_t^\varepsilon, q_t^\varepsilon) + \varepsilon f_{q_i}(p_t^\varepsilon, q_t^\varepsilon), \qquad i = 1, 2, \ldots, n.$$
If $n = 1$, $p = \dot x$, $q = x$ and $H(p, q) = \frac12(p^2 + \omega^2 q^2)$, $f_q = 0$, $f_p = f(q, p)$, then this system is equivalent to equation (1.7). In the case of one degree of freedom
($n = 1$), if the level sets $H(p, q) = C = \text{const}$ are smooth curves homeomorphic to the circle, new variables can also be introduced in such a way that the fast and slow motions are separated. As such variables, we can take the value $H$ of the Hamiltonian $H(p, q)$ and the angular coordinate $\varphi$ on a level set. Since $\dot H(p, q) = 0$ for the unperturbed system, $H(p_t^\varepsilon, q_t^\varepsilon)$ will change slowly. It is clear that if we choose $(H', \varphi)$ as variables, where $H'$ is a smooth function of $H$, then $H'$ also changes slowly. For example, in the van der Pol variables, the slow variable $r$ is such a function of $H$ (for the oscillator, $H = \omega^2 r^2/2$). In order that the system preserve the Hamiltonian form, in place of the variables $(H, \varphi)$, one often considers the so-called action-angle variables $(I, \varphi)$ (cf., for example, Arnold [1]), where $I = H/\omega$. In the multidimensional case the slowly varying variables will be first integrals of the unperturbed system. In some cases (cf. Arnold [1], Ch. 10), in Hamiltonian systems with $n$ degrees of freedom we can also introduce action-angle variables $(I, \varphi)$; in these variables the fast and slow motions are separated and the system of equations preserves the Hamiltonian form.
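For the van der Pol case mentioned above, the averaging can be carried out explicitly: with $f(x, \dot x) = (1 - x^2)\dot x$ and $\omega = 1$, averaging $F_1(s, r) = -f(r\cos s, -r\sin s)\sin s$ over a period gives $\bar F_1(r) = \frac r2\bigl(1 - \frac{r^2}4\bigr)$, which vanishes at the limit-cycle amplitude $r_0 = 2$. A sketch checking this by quadrature (the grid size is arbitrary):

```python
import math

def F1_bar(r, n=20000):
    """Average over one period of F1(s, r) = -f(r cos s, -r sin s) sin s
    for the van der Pol nonlinearity f(x, v) = (1 - x^2) v, omega = 1."""
    total = 0.0
    for k in range(n):
        s = 2.0 * math.pi * k / n
        f = (1.0 - (r * math.cos(s)) ** 2) * (-r * math.sin(s))
        total -= f * math.sin(s)
    return total / n

def F1_bar_exact(r):
    # closed form from the averages of sin^2 (= 1/2) and sin^2 cos^2 (= 1/8)
    return 0.5 * r * (1.0 - r * r / 4.0)
```

$\bar F_1$ is positive for $r < 2$ and negative for $r > 2$, exactly the stability pattern used in the text to infer oscillations of amplitude close to $r_0$.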
§2. The Averaging Principle when the Fast Motion is a Random Process

As has already been mentioned, condition (1.3), which implies the averaging principle, is satisfied if the function $\xi_s$ is periodic. On the other hand, this condition can be viewed as a certain kind of law of large numbers: (1.3) is
satisfied if the values assumed by $\xi_s$ at moments of time far away from each other are "almost independent". In what follows we shall assume that the role of the fast variables is played by a random process. First we discuss the simpler case of (1.2), where the fast motion does not depend on the slow variables. Hence let $\xi_s$ be a random process with values in $R^l$. We shall assume that the function $b(x, y)$ satisfies a Lipschitz condition:
$$|b(x_1, y_1) - b(x_2, y_2)| \le K(|x_1 - x_2| + |y_1 - y_2|).$$
Concerning the process $\xi_s$ we assume that its trajectories are continuous with probability one, or that on every finite time interval they have a finite number of discontinuities of the first kind and there are no discontinuities of the second kind. Under these assumptions, the solution of equation (1.2) exists with probability 1 for any $x \in R^r$ and it is defined uniquely for all $t \ge 0$. If condition (1.3) is satisfied with probability 1 uniformly in $x \in R^r$, then the ordinary averaging principle implies that with probability 1, the trajectory of $X_t^\varepsilon$ converges to the solution of equation (1.4), uniformly on every finite interval ($\bar b(x)$ and $\bar x_t$ may depend on $\omega$ in general).
Less stringent assumptions can be imposed concerning the type of convergence in (1.3). Then we obtain a weaker result in general.
We assume that there exists a vector field $\bar b(x)$ in $R^r$ such that for any $\delta > 0$ and $x \in R^r$ we have
$$\lim_{T \to \infty} P\left\{\left|\frac{1}{T}\int_t^{t+T} b(x, \xi_s)\, ds - \bar b(x)\right| > \delta\right\} = 0, \tag{2.1}$$
uniformly in $t \ge 0$. It follows from (2.1) that $\bar b(x)$ satisfies a Lipschitz condition (with the same constant as $b(x, y)$). Therefore, there exists a unique solution of the problem
$$\dot{\bar x}_t = \bar b(\bar x_t), \qquad \bar x_0 = x. \tag{2.2}$$
The random process $X_t^\varepsilon$ can be considered as a result of random perturbations of the dynamical system (2.2), small on the average. Relation (2.1) is an assumption on the smallness, on the average in time, of the random perturbations.

Theorem 2.1. Suppose that condition (2.1) is satisfied and $\sup_{x,s} M|b(x, \xi_s)|^2 < \infty$. Then for any $T > 0$ and $\delta > 0$ we have
$$\lim_{\varepsilon \to 0} P\left\{\sup_{0 \le t \le T} |X_t^\varepsilon - \bar x_t| > \delta\right\} = 0.$$
The assertion of this theorem follows easily from Theorem 1.3 of Ch. 2. For this we need to put $b(\varepsilon, s, x, \omega) = b(x, \xi_{s/\varepsilon}(\omega))$ and note that condition
(2.1) can be written in the following form: for any $T, \delta > 0$ and $x \in R^r$ we have
$$\lim_{\varepsilon \to 0} P\left\{\left|\int_t^{t+T} b(\varepsilon, s, x, \omega)\, ds - T\bar b(x)\right| > \delta\right\} = 0,$$
uniformly in $t$. This is exactly the condition in Theorem 1.3 of Ch. 2.

We note that our arguments repeat, in essence, the proof of the averaging principle in the deterministic case, which is contained in Gikhman [1] and Krasnosel'skii and Krein [1]. A similar result is contained in Khas'minskii [4].

Condition (2.1), which is assumed in Theorem 2.1, is satisfied under quite relaxed assumptions on the process $\eta_t^x = b(x, \xi_t)$ ($x \in R^r$ is a parameter). For example, if $\eta_t^x$ is stationary in the wide sense, then it is sufficient that the diagonal entries of its correlation matrix $(R^{ij}(\tau))$ converge to zero as $\tau \to \infty$; in this case, $\bar b(x) = Mb(x, \xi_s)$. In the nonstationary case it is sufficient that there exist a function $r(x, t)$ such that $\lim_{t \to \infty} r(x, t) = 0$ and
$$|M(b(x, \xi_s) - \bar b(x),\; b(x, \xi_{s+t}) - \bar b(x))| \le r(x, t).$$
We postpone examples until §8 and now study the difference $X_t^\varepsilon - \bar x_t$ in more detail. In the deterministic case, where, for example, $\xi_t$ is a periodic function, this difference is of order $\varepsilon$ and we can write down the other terms of the asymptotic expansion in integral powers of $\varepsilon$. In the study of probability theoretical problems we apparently have to consider as typical the situation where the random process $\xi_t$ satisfies some condition of weak dependence, i.e., a condition that the dependence between the variables $\xi_t$ and $\xi_{t+\tau}$ becomes weaker in some sense with increasing $\tau$. It turns out that in this case the deviation of $X_t^\varepsilon$ from $\bar x_t$ is of another character. The difference $X_t^\varepsilon - \bar x_t$ is of order $\sqrt\varepsilon$, but no other terms of the asymptotic expansion can be written down: as $\varepsilon \to 0$, the expression $(1/\sqrt\varepsilon)(X_t^\varepsilon - \bar x_t)$ does not converge to any limit in general and only has a limit distribution. In other words, whereas the averaging principle itself, the assertion of Theorem 2.1, can be considered as a result of the type of the laws of large numbers, the behavior of the normalized difference $(1/\sqrt\varepsilon)(X_t^\varepsilon - \bar x_t)$ can be described by an assertion of the type of the central limit theorem. In order to clarify this, we consider the simplest system, in which the right side does not depend on $X_t^\varepsilon$: $\dot X_t^\varepsilon = b(\xi_{t/\varepsilon})$, $X_0^\varepsilon = x$. If $\xi_t$ satisfies the strong mixing condition (cf. below), then under weak additional restrictions, the distribution of the normalized difference $\zeta_t^\varepsilon = (1/\sqrt\varepsilon)(X_t^\varepsilon - \bar x_t)$ converges to a normal distribution as $\varepsilon \to 0$ (cf., for example, Ibragimov and Linnik [1]). In the next section we show that under some additional assumptions, the distributions of the normalized differences converge to Gaussian distributions in the case of systems of general form, as well. What is more, following Khas'minskii [4], we show that not only do the distributions of the variables $\zeta_t^\varepsilon$ converge to Gaussian distributions for
every fixed $t$, but as $\varepsilon \to 0$, the processes $\zeta_t^\varepsilon$ also converge, in the sense of weak convergence, to a Gaussian Markov process, and we also determine the characteristics of the limit process. In the remaining sections we also study large deviations of order 1 of $X_t^\varepsilon$ from $\bar x_t$ and large deviations of order $\varepsilon^\varkappa$, where $\varkappa \in (0, \frac12)$.
§3. Normal Deviations from an Averaged System

We pass to the study of the difference between the solution $X_t^\varepsilon$ of system (1.2) and the solution $\bar x_t$ of the averaged system. It has been shown in the preceding section that with probability close to 1 for $\varepsilon$ small, the trajectory of $X_t^\varepsilon$ is situated in a small neighborhood of the function $\bar x_t$ for $t \in [0, T]$, $T < \infty$. Therefore, if we take the smoothness of the field $b(x, y)$ into account, we may hope that the difference $X_t^\varepsilon - \bar x_t$ can be approximated by the deviation from $\bar x_t$ of the solution of the system obtained from (1.2) by linearization in the neighborhood of the trajectory of the averaged system. Consequently, the study of the normalized difference $\zeta_t^\varepsilon = \varepsilon^{-1/2}(X_t^\varepsilon - \bar x_t)$ may be carried out according to the following plan: firstly, we study the normalized deviation in the case of a linearized system; secondly, we verify that the trajectory of the original system differs from that of the linearized one by a quantity infinitely small compared with $\sqrt\varepsilon$ as $\varepsilon \to 0$. In implementing the first part of our plan, we have to introduce notions and carry out arguments very similar to those usually employed in the proof of the central limit theorem for random processes. Moreover, since we would like to prove the weak convergence of the processes rather than only convergence of the finite-dimensional distributions, we also need to verify the weak compactness of the family of processes $\zeta_t^\varepsilon$.

We note that the study of large deviations of order $\varepsilon^\varkappa$, where $\varkappa \in (0, \frac12)$, can also be reduced to the study of deviations of the same order for the linearized system. As to the probabilities of deviations of order 1 of $X_t^\varepsilon$ from $\bar x_t$, for system (1.2) and the linearized system they have essentially different asymptotics.
We pass to the implementation of the above program of the study of deviations of order $\sqrt\varepsilon$. For this we recall the notion of strong mixing and some properties of random processes satisfying the strong mixing condition. In a probability space $\{\Omega, \mathscr F, P\}$ let us be given an increasing family of $\sigma$-algebras $\mathscr F_s^t$: $\mathscr F_{s_1}^{t_1} \subseteq \mathscr F_{s_2}^{t_2}$ for $0 \le s_2 \le s_1 \le t_1 \le t_2 < \infty$. We say that this family satisfies the strong mixing condition with coefficient $\alpha(\tau)$ if
$$\sup_t \sup |M\xi\eta - M\xi\, M\eta| = \alpha(\tau) \downarrow 0 \quad \text{as } \tau \to \infty, \tag{3.1}$$
where the inner supremum is taken over all $\mathscr F_0^t$-measurable $\xi$, $|\xi| \le 1$, and $\mathscr F_{t+\tau}^\infty$-measurable $\eta$, $|\eta| \le 1$.
It can be proved (cf. Rozanov [1]) that if $\xi$ is an $\mathscr F_0^t$-measurable and $\eta$ is an $\mathscr F_{t+\tau}^\infty$-measurable random variable and $M|\xi|^{2+\delta} < \infty$, $M|\eta|^{2+\delta} < \infty$, then
$$|M\xi\eta - M\xi\, M\eta| \le 7[\alpha(\tau)]^{\delta/(2+\delta)}\bigl(M|\xi|^{2+\delta}\bigr)^{1/(2+\delta)}\bigl(M|\eta|^{2+\delta}\bigr)^{1/(2+\delta)}. \tag{3.2}$$
If $0 \le s_1 < t_1 \le s_2 < t_2 \le \cdots \le s_m < t_m$ are arbitrary numbers, $\Delta = \min_k(s_{k+1} - t_k)$, and the $\eta_k$ are $\mathscr F_{s_k}^{t_k}$-measurable random variables with $|\eta_k| \le 1$, then
$$\left|M\prod_{k=1}^m \eta_k - \prod_{k=1}^m M\eta_k\right| \le (m-1)\alpha(\Delta), \tag{3.3}$$
where $\alpha(\Delta)$ is the mixing coefficient for the $\sigma$-algebras $\mathscr F_s^t$. We say that a random process $\xi_t$, $t \ge 0$, satisfies the condition of strong mixing with mixing coefficient $\alpha(\tau)$ if the $\sigma$-algebras $\mathscr F_s^t$ generated by the values $\xi_u$ of the process for $u \in [s, t]$ satisfy condition (3.1) of strong mixing.

Let $\eta_t = (\eta^1(t), \ldots, \eta^{2k}(t))$ be a random process satisfying the condition
of strong mixing with coefficient $\alpha(\tau)$. Suppose that for some $m > 2$ we have
$$M|\eta^i(t)|^{m(2k-1)} \le C, \qquad \int_0^\infty t^{n-1}[\alpha(t)]^{(m-2)/m}\, dt = A_n \quad \text{for } n = 1, 2, \ldots, k;$$
$$\left|\int_{t_0}^{t_0+T} M\eta^i(t)\, dt\right| \le B\sqrt T$$
for $i = 1, 2, \ldots, 2k$, where $C$, $B$, and $A_1, \ldots, A_k$ are positive constants. Then there exists a constant $C^{(2k)}$, determined only by the constants $C$, $B$, $A_1, \ldots, A_k$, such that
$$\left|M\int\cdots\int_{D_{2k}} \eta^1(s_1)\eta^2(s_2)\cdot\ldots\cdot\eta^{2k}(s_{2k})\, ds_1 \cdots ds_{2k}\right| \le C^{(2k)} T^k, \tag{3.4}$$
where $D_{2k} = \{(s_1, \ldots, s_{2k}): s_i \in [t_0, t_0+T] \text{ for } i = 1, \ldots, 2k\}$. If $D$ is the direct product of two-dimensional convex domains $D^{(1)}, \ldots, D^{(k)}$ such that each of them can be enclosed in a square with side $T$, and for all $s \in [t_0, t_0+T]$ we have the inequality $\left|\int_{t_0}^s M\eta^i(t)\, dt\right| \le C$, then
$$\left|M\int\cdots\int_D \eta^1(s_1)\eta^2(s_2)\cdot\ldots\cdot\eta^{2k}(s_{2k})\, ds_1 \cdots ds_{2k}\right| \le C^{(2k)} T^k. \tag{3.5}$$
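The content of (3.4) can be seen in the simplest case $k = 2$: for a mean-zero process with decaying correlations, the fourth moment of $\int_{t_0}^{t_0+T}\eta_s\,ds$ grows like $T^2$, not $T^4$. For $\eta_s$ equal to independent $\pm1$ values on unit intervals (an assumed toy process), the integral over $[0, T]$ is a simple random walk with $MS_T^2 = T$ and $MS_T^4 = 3T^2 - 2T$. A seeded Monte Carlo check (sample sizes arbitrary):

```python
import random

def integral_moments(T, n_paths=10000, seed=0):
    """Monte Carlo estimates of the 2nd and 4th moments of the integral of a
    +-1 process that is constant on unit intervals, i.e. of a random walk."""
    rng = random.Random(seed)
    m2 = m4 = 0.0
    for _ in range(n_paths):
        s = sum(rng.choice((-1, 1)) for _ in range(T))
        m2 += s * s
        m4 += s ** 4
    return m2 / n_paths, m4 / n_paths

m2, m4 = integral_moments(100)      # exact values: 100 and 29800
```

The fourth moment stays near $3T^2$, two orders of magnitude below the trivial bound of order $T^4$ that would hold without any mixing.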
The proof of estimates of the type of (3.4) takes its origin in the first publications on limit theorems for weakly dependent random variables (cf. Bernstein
[2]). In the form presented here, estimates (3.4) and (3.5) are proved in Khas'minskii [4].

Now we formulate the fundamental result of this section.

Theorem 3.1. Let the functions $b^i(x, y)$, $x \in R^r$, $y \in R^l$, $i = 1, \ldots, r$, have bounded continuous first and second partial derivatives on the whole space. Suppose that the random process $\xi_t$ with values in $R^l$ has piecewise continuous trajectories with probability 1 and satisfies the condition of strong mixing with coefficient $\alpha(\tau)$ such that $\int_0^\infty \tau[\alpha(\tau)]^{1/3}\, d\tau < \infty$ and $\sup_{x,s} M|b(x, \xi_s)|^3 < N < \infty$. Moreover, let the following conditions be satisfied:

1. The limits
$$\lim_{T \to \infty} \frac{1}{T}\int_{t_0}^{t_0+T} Mb(x, \xi_s)\, ds = \bar b(x), \qquad \lim_{T \to \infty} \frac{1}{T}\int_{t_0}^{t_0+T}\!\!\int_{t_0}^{t_0+T} A^{kl}(x, s, t)\, ds\, dt = A^{kl}(x)$$
exist uniformly in $x \in R^r$, $t_0 \ge 0$, where
$$A^{kl}(x, s, t) = M[b^k(x, \xi_s) - Mb^k(x, \xi_s)][b^l(x, \xi_t) - Mb^l(x, \xi_t)].$$

2. For some $C < \infty$ we have
$$\left|\int_0^t [Mb(\bar x_s, \xi_{s/\varepsilon}) - \bar b(\bar x_s)]\, ds\right| \le C\varepsilon, \qquad \max_{k,i}\left|\int_0^t M\left[\frac{\partial b^i}{\partial x^k}(\bar x_s, \xi_{s/\varepsilon}) - \frac{\partial \bar b^i}{\partial x^k}(\bar x_s)\right] ds\right| \le C\varepsilon$$
for all $t \in [0, T_0]$.

Then as $\varepsilon \to 0$, the process $\zeta_t^\varepsilon = \varepsilon^{-1/2}(X_t^\varepsilon - \bar x_t)$ converges weakly on the interval $[0, T_0]$ to a Gaussian Markov process $\zeta_t^0$ satisfying the system of linear differential equations
$$\dot\zeta_t^0 = \dot w_t^0 + B(\bar x_t)\zeta_t^0, \qquad \zeta_0^0 = 0, \tag{3.6}$$
where $w_t^0$ is a Gaussian process with independent increments, vanishing mathematical expectation and correlation matrix $(R^{kl}(t))$,
$$Mw_t^{0,k} w_t^{0,l} = R^{kl}(t) = \int_0^t A^{kl}(\bar x_s)\, ds,$$
and $B(x) = (B_j^i(x)) = (\partial\bar b^i/\partial x^j(x))$.
Proof. We introduce the notation
$$B(x, y) = (B_j^i(x, y)) = \left(\frac{\partial b^i}{\partial x^j}(x, y)\right), \qquad \lambda_t^\varepsilon = \frac{1}{\sqrt\varepsilon}\int_0^t [b(\bar x_s, \xi_{s/\varepsilon}) - \bar b(\bar x_s)]\, ds,$$
$$\Phi(s, \varepsilon, \omega) = B(\bar x_s, \xi_{s/\varepsilon}) - B(\bar x_s), \qquad \Psi(s, \varepsilon, \omega) = \frac{1}{\sqrt\varepsilon}\bigl[b(\bar x_s + \sqrt\varepsilon\,\zeta_s^\varepsilon, \xi_{s/\varepsilon}) - b(\bar x_s, \xi_{s/\varepsilon}) - B(\bar x_s, \xi_{s/\varepsilon})\sqrt\varepsilon\,\zeta_s^\varepsilon\bigr].$$
It follows from the definition of $X_t^\varepsilon$ and $\bar x_t$ that we have the following relation for the normalized difference $\zeta_t^\varepsilon$:
$$\zeta_t^\varepsilon = \frac{1}{\sqrt\varepsilon}\int_0^t [b(X_s^\varepsilon, \xi_{s/\varepsilon}) - \bar b(\bar x_s)]\, ds = \lambda_t^\varepsilon + \int_0^t B(\bar x_s)\zeta_s^\varepsilon\, ds + \int_0^t \Phi(s, \varepsilon, \omega)\zeta_s^\varepsilon\, ds + \int_0^t \Psi(s, \varepsilon, \omega)\, ds. \tag{3.7}$$
It is easy to see (it follows from the existence of bounded second derivatives of $b^i(x, y)$) that $\Psi(s, \varepsilon, \omega)$ is of order $\sqrt\varepsilon\,|\zeta_s^\varepsilon|^2$. In general, the linearized equation must contain a term corresponding to the third term on the right side of (3.7). Nevertheless, as will follow from the discussion below, this term has a vanishing effect as $\varepsilon \to 0$, and therefore, we do not include it in the linearized system from the very beginning. Hence we consider the simplified linearized equation
$$Z_t^\varepsilon = \lambda_t^\varepsilon + \int_0^t B(\bar x_s) Z_s^\varepsilon\, ds. \tag{3.8}$$
In accordance with our plan, we first have to prove that as $\varepsilon \to 0$, $Z_t^\varepsilon$ converges to a process $\zeta_t^0$ satisfying equation (3.6). For this we need the following lemma.

Lemma 3.1. The process $\lambda_t^\varepsilon$ converges weakly to the process $w_t^0$ defined in the formulation of Theorem 3.1 as $\varepsilon \to 0$.

We divide the proof of the lemma into several steps.

1. First of all, we prove that the family of processes $\lambda_t^\varepsilon$ is weakly compact in $C_{0T_0}$. For this it is sufficient to show (cf. Prokhorov [1]) that for any $s, s+h \in [0, T_0]$ we have
$$M|\lambda_{s+h}^\varepsilon - \lambda_s^\varepsilon|^4 \le Ch^2, \tag{3.9}$$
where $C$ is a constant independent of $\varepsilon$. It is obviously sufficient to establish an analogous estimate for every component of the process $\lambda_t^\varepsilon = (\lambda^{\varepsilon,1}(t), \ldots, \lambda^{\varepsilon,r}(t))$. From the assumption concerning the mixing coefficient of $\xi_t$ and condition 2 it follows that estimate (3.4) is applicable to the process $\eta^{\varepsilon,k}(s) = b^k(\bar x_{\varepsilon s}, \xi_s) - \bar b^k(\bar x_{\varepsilon s})$:
$$M|\lambda^{\varepsilon,k}(s+h) - \lambda^{\varepsilon,k}(s)|^4 = \varepsilon^2 M\int_{s/\varepsilon}^{(s+h)/\varepsilon}\cdots\int_{s/\varepsilon}^{(s+h)/\varepsilon}\prod_{i=1}^4 [b^k(\bar x_{\varepsilon s_i}, \xi_{s_i}) - \bar b^k(\bar x_{\varepsilon s_i})]\, ds_1\, ds_2\, ds_3\, ds_4 \le \varepsilon^2 C_1\left(\frac h\varepsilon\right)^2 = C_1 h^2.$$
This implies (3.9) and weak compactness.

2. The hypotheses of Theorem 3.1 imply the relations
$$\lim_{\varepsilon \to 0} M\lambda_t^\varepsilon = 0; \qquad \lim_{\varepsilon \to 0} M\lambda^{\varepsilon,k}(t)\lambda^{\varepsilon,l}(t) = \int_0^t A^{kl}(\bar x_s)\, ds. \tag{3.10}$$
The first of these equalities follows from condition 2 of Theorem 3.1. From the same condition we obtain that
$$M\lambda^{\varepsilon,k}(t)\lambda^{\varepsilon,l}(t) = \frac{1}{\varepsilon}\int_0^t\!\!\int_0^t g^{kl}(u, s, \varepsilon)\, ds\, du + o(1), \tag{3.11}$$
where we have used the notation
$$g^{kl}(u, s, \varepsilon) = M[b^k(\bar x_u, \xi_{u/\varepsilon}) - Mb^k(\bar x_u, \xi_{u/\varepsilon})][b^l(\bar x_s, \xi_{s/\varepsilon}) - Mb^l(\bar x_s, \xi_{s/\varepsilon})].$$
In order to derive the second equality in (3.10) from this, we put $\Pi = \{(s, u): 0 \le s \le t,\ 0 \le u \le t\}$, $\Delta = t/n$, $\Delta_i = \{(s, u): i\Delta \le s \le (i+1)\Delta,\ i\Delta \le u \le (i+1)\Delta\}$, where $n$ is an integer, $A = \bigcup_{i=0}^{n-1}\Delta_i$ and $B = \Pi \setminus A$. From (3.2) with $\delta = 1$ we obtain the estimate
$$|g^{kl}(u, s, \varepsilon)| \le C_2\left[\alpha\left(\frac{|u - s|}{\varepsilon}\right)\right]^{1/3}.$$
From this we obtain
$$\frac{1}{\varepsilon}\iint_B |g^{kl}(u, s, \varepsilon)|\, du\, ds \le \frac{2C_2}{\varepsilon}\,\varepsilon^2\sum_{i=0}^{n-1}\int_0^{i\Delta/\varepsilon} du\int_{i\Delta/\varepsilon}^{(i+1)\Delta/\varepsilon} ds\,[\alpha(s-u)]^{1/3} \le 4C_2\,\varepsilon n\int_0^\infty u[\alpha(u)]^{1/3}\, du. \tag{3.12}$$
Taking account of the boundedness of the derivatives of $b^k(x, y)$ and condition 1 of Theorem 3.1, we see that
$$\frac{1}{\varepsilon}\sum_{i=0}^{n-1}\iint_{\Delta_i} g^{kl}(u, s, \varepsilon)\, du\, ds = \varepsilon\sum_{i=0}^{n-1}\int_{i\Delta/\varepsilon}^{(i+1)\Delta/\varepsilon} du\int_{i\Delta/\varepsilon}^{(i+1)\Delta/\varepsilon} ds\, M[b^k(\bar x_{i\Delta}, \xi_s) - Mb^k(\bar x_{i\Delta}, \xi_s)][b^l(\bar x_{i\Delta}, \xi_u) - Mb^l(\bar x_{i\Delta}, \xi_u)] + O(n\Delta^3)$$
$$= \sum_{i=0}^{n-1} A^{kl}(\bar x_{i\Delta})\Delta + o_\varepsilon(1) + O(1/n^2). \tag{3.13}$$
This equality holds as $\varepsilon n \to 0$ and $n \to \infty$. From the boundedness of the derivatives of $b(x, y)$, condition 1 of Theorem 3.1 and the condition of strong mixing it follows that the functions $A^{kl}(x)$ are continuous. Taking account of this continuity and relations (3.11)-(3.13), we find that
$$M\lambda^{\varepsilon,k}(t)\lambda^{\varepsilon,l}(t) = \int_0^t A^{kl}(\bar x_s)\, ds + \gamma_{\varepsilon,n},$$
where $\gamma_{\varepsilon,n} \to 0$ as $\varepsilon n \to 0$ and $\varepsilon n^2 \to \infty$.

3. In conclusion, we show that $\lambda_t^\varepsilon$ converges weakly to the process $w_t^0$ defined in the formulation of Theorem 3.1. From the weak compactness of the family of measures corresponding to the processes $\lambda_t^\varepsilon$ in the space $C_{0T_0}$ it follows that every sequence of such processes contains a subsequence converging to some process $\lambda_t$. If we show that the distribution of the limit process $\lambda_t$ does not depend on the choice of the subsequence, then weak convergence will be proved.
It follows from step 2 that $M\lambda_t = 0$ and the entries of the covariance matrix of the process $\lambda_t = (\lambda^1(t), \ldots, \lambda^r(t))$ have the form $M\lambda^k(t)\lambda^l(t) = \int_0^t A^{kl}(\bar x_s)\, ds$. Moreover, $\lambda_t$ is a process with independent increments. Indeed, let $s_1 < t_1 \le s_2 < t_2 \le \cdots \le s_m < t_m$ be arbitrary nonnegative numbers and write $\Delta = \min_{2 \le i \le m}(s_i - t_{i-1})$. Using (3.3), we obtain for arbitrary $z_1, \ldots, z_m \in R^r$:
$$\left|M\exp\left\{i\sum_{j=1}^m (z_j, \lambda_{t_j}^\varepsilon - \lambda_{s_j}^\varepsilon)\right\} - \prod_{j=1}^m M\exp\{i(z_j, \lambda_{t_j}^\varepsilon - \lambda_{s_j}^\varepsilon)\}\right| \le (m-1)\alpha(\Delta/\varepsilon).$$
Taking into account that $\lim_{\tau \to \infty}\alpha(\tau) = 0$, we conclude from this that for the limit process $\lambda_t$, the multivariate characteristic function of the vector $(\lambda_{t_1} - \lambda_{s_1}, \ldots, \lambda_{t_m} - \lambda_{s_m})$ is equal to the product of the characteristic functions of the separate increments. Consequently, the limit process $\lambda_t$ has independent increments.

Therefore, the limit process $\lambda_t$ has continuous trajectories, independent increments, mean zero and the given covariance matrix $(M\lambda^k(t)\lambda^l(t))$. As is
known, these properties determine $\lambda_t$ uniquely; $\lambda_t$ is necessarily Gaussian (cf. Skorokhod [1]) and coincides with $w_t^0$ (in the sense of distributions). The weak compactness of the family of measures corresponding to the processes $\lambda_t^\varepsilon$ and the fact that this family has a unique limit point imply the weak convergence of $\lambda_t^\varepsilon$ to $w_t^0$. Lemma 3.1 is proved.
Now it is very easy to prove the weak convergence of the measure corresponding to the process $Z_t^\varepsilon$ to the measure corresponding to $\zeta_t^0$. Indeed, equation (3.8) defines a continuous mapping $G: \lambda^\varepsilon \to Z^\varepsilon$ of $C_{0T_0}$ into itself. It is clear that if the measure corresponding to $\lambda^\varepsilon$ converges weakly to the measure corresponding to $w^0$, then the measure corresponding to $Z^\varepsilon = G(\lambda^\varepsilon)$ converges weakly to the measure corresponding to $G(w^0) = \zeta^0$.

Hence we have carried out the first part of our plan. Now we estimate the difference $\zeta_t^\varepsilon - Z_t^\varepsilon = U_t^\varepsilon$. From (3.7) and (3.8), for $U_t^\varepsilon$ we obtain the relation
$$U_t^\varepsilon - \int_0^t B(\bar x_s, \xi_{s/\varepsilon}) U_s^\varepsilon\, ds = \int_0^t \Phi(s, \varepsilon, \omega) Z_s^\varepsilon\, ds + \int_0^t \Psi(s, \varepsilon, \omega)\, ds. \tag{3.14}$$
Since the entries of the matrix $B(x, y)$ are bounded, relying on Lemma 1.1 of Ch. 2, we conclude from the last equality that
$$|U_t^\varepsilon| \le e^{ct}\left[\left|\int_0^t \Phi(s, \varepsilon, \omega) Z_s^\varepsilon\, ds\right| + \left|\int_0^t \Psi(s, \varepsilon, \omega)\, ds\right|\right], \tag{3.15}$$
where $c$ is a constant. If we show that the family of measures induced by the processes $\zeta_t^\varepsilon$ in $C_{0T_0}$ is weakly compact and the right side of the last inequality converges to zero in probability, then the proof of the theorem will be completed.
First we prove that the right side of (3.15) converges to zero in probability as $\varepsilon \to 0$. It follows from the boundedness of the partial derivatives of the functions $b^i(x, y)$ that for some $C_1$ we have
$$|\Psi(s, \varepsilon, \omega)| \le C_1\sqrt\varepsilon\,|\zeta_s^\varepsilon|^2. \tag{3.16}$$
Taking account of the boundedness of $M|\lambda_t^\varepsilon|^2$, it is easy to derive from (3.7) that $M|\zeta_s^\varepsilon|^2 \le C_2 < \infty$ for $s \in [0, T_0]$. This and (3.16) imply the estimate
$$M\left|\int_0^t \Psi(s, \varepsilon, \omega)\, ds\right| \le C_3 t\sqrt\varepsilon. \tag{3.17}$$
Now we estimate the first term on the right side of (3.15). We use the following representation of the solution of problem (3.8) in terms of the Green's function $K(t, s) = (K_j^i(t, s))$:
$$Z_t^\varepsilon = \lambda_t^\varepsilon + \int_0^t K(t, s)\lambda_s^\varepsilon\, ds. \tag{3.18}$$
As is known, $K(t, s)$ is continuously differentiable in the triangle $\{(t, s): 0 \le s \le t \le T_0\}$. For the norm of the matrix $K(t, s)$ we have the estimate
$$\|K(t, s)\| \le \|B(\bar x_s)\|\exp\{\|B(\bar x_s)\|\,|t - s|\} \le C_4.$$
Using the representation (3.18), we find that
$$M\left|\int_0^t \Phi(s, \varepsilon, \omega) Z_s^\varepsilon\, ds\right|^2 \le 2M\left|\int_0^t [B(\bar x_s, \xi_{s/\varepsilon}) - B(\bar x_s)]\lambda_s^\varepsilon\, ds\right|^2 + 2M\left|\int_0^t [B(\bar x_s, \xi_{s/\varepsilon}) - B(\bar x_s)]\int_0^s K(s, u)\lambda_u^\varepsilon\, du\, ds\right|^2. \tag{3.19}$$
Let us denote by $I_1$ and $I_2$ the first and second terms on the right side of (3.19), respectively. Let us put
$$\varphi^k(s/\varepsilon, \varepsilon, \omega) = b^k(\bar x_s, \xi_{s/\varepsilon}) - \bar b^k(\bar x_s), \qquad \varphi_j^k(s/\varepsilon, \varepsilon, \omega) = \frac{\partial b^k}{\partial x^j}(\bar x_s, \xi_{s/\varepsilon}) - \frac{\partial\bar b^k}{\partial x^j}(\bar x_s).$$
Taking account of the definition of $\lambda_t^\varepsilon$ and applying estimate (3.5), we obtain
$$I_1 \le C(r)\max_{k,j} M\left|\int_0^t \varphi_j^k(s/\varepsilon, \varepsilon, \omega)\left(\frac{1}{\sqrt\varepsilon}\int_0^s \varphi^k(u/\varepsilon, \varepsilon, \omega)\, du\right) ds\right|^2$$
$$= \varepsilon^3 C(r)\max_{k,j}\int_0^{t/\varepsilon} ds_1\int_0^{t/\varepsilon} ds_2\int_0^{s_1} du_1\int_0^{s_2} du_2\, M[\varphi_j^k(s_1, \varepsilon, \omega)\varphi_j^k(s_2, \varepsilon, \omega)\varphi^k(u_1, \varepsilon, \omega)\varphi^k(u_2, \varepsilon, \omega)] \le C(r)C_5\varepsilon^3(t/\varepsilon)^2 = C_6\varepsilon t^2. \tag{3.20}$$
Here $C(r)$ is a constant depending on the dimension of the space. In order to estimate $I_2$, we note that the differentiability of the entries $K_j^i(t, s)$ of the matrix-valued Green's function and condition 2 of the theorem imply the estimate
$$\left|\int_{u/\varepsilon}^{t/\varepsilon} M\varphi_j^k(s, \varepsilon, \omega) K_j^i(\varepsilon s, u)\, ds\right| \le C_7 < \infty$$
for $0 \le u \le t \le T_0$. Using this estimate and inequality (3.5), we obtain
$$M\left|\int_0^t [B(\bar x_s, \xi_{s/\varepsilon}) - B(\bar x_s)]\int_0^s K(s, u)\lambda_u^\varepsilon\, du\, ds\right|^2 \le \varepsilon^3 C(r)\max_k\int_0^t du_1\int_0^t du_2\int_{u_1/\varepsilon}^{t/\varepsilon} ds_1\int_{u_2/\varepsilon}^{t/\varepsilon} ds_2\int_0^{u_1/\varepsilon} dv_1\int_0^{u_2/\varepsilon} dv_2\, M[\theta(\varepsilon, s_1, u_1, \omega)\theta(\varepsilon, s_2, u_2, \omega)\varphi^k(v_1, \varepsilon, \omega)\varphi^k(v_2, \varepsilon, \omega)] \le C_8\varepsilon t^4,$$
where $\theta(\varepsilon, s, u, \omega) = \varphi_j^k(s, \varepsilon, \omega) K_j^i(\varepsilon s, u)$. These inequalities imply the estimate $I_2 \le C_9\varepsilon$. Taking account of (3.19) and (3.20), we arrive at the inequality
$$M\left|\int_0^t \Phi(s, \varepsilon, \omega) Z_s^\varepsilon\, ds\right|^2 \le C_{10}\varepsilon, \tag{3.21}$$
which holds for $t \in [0, T_0]$. It follows from (3.15), (3.17) and (3.21) that $M|U_t^\varepsilon| \to 0$ as $\varepsilon \to 0$.

In order to prove the weak compactness of the family of measures corresponding to the $\zeta_t^\varepsilon$, we note that $\zeta_t^\varepsilon$ and $\lambda_t^\varepsilon$ are connected by the relation
$$\zeta_t^\varepsilon = \lambda_t^\varepsilon + \frac{1}{\sqrt\varepsilon}\int_0^t [b(X_s^\varepsilon, \xi_{s/\varepsilon}) - b(\bar x_s, \xi_{s/\varepsilon})]\, ds.$$
Taking account of estimate (3.9) and the boundedness of $b(x, y)$, we can easily obtain an estimate for $\zeta_t^\varepsilon$ analogous to (3.9):
$$M|\zeta_{t+h}^\varepsilon - \zeta_t^\varepsilon|^4 \le Ch^2.$$
This estimate guarantees the weak compactness of the family of the processes $\zeta_t^\varepsilon$, $t \in [0, T_0]$. The weak compactness and the convergence of the finite-dimensional distributions imply the weak convergence of $\zeta_t^\varepsilon$ to $\zeta_t^0$. This completes the proof of Theorem 3.1.
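Theorem 3.1 can be checked by simulation in the simplest setting mentioned in §2: $\dot X_t^\varepsilon = b(\xi_{t/\varepsilon})$ with $b(y) = y$ and $\xi$ equal to independent $\pm1$ values on unit intervals (an assumed toy process), so that $\bar b = 0$ and $\zeta_T^\varepsilon = \varepsilon^{-1/2}(X_T^\varepsilon - x)$ should be approximately $N(0, T)$. A seeded Monte Carlo sketch (sizes illustrative):

```python
import random

def zeta_samples(eps=1e-3, T=1.0, n=2000, seed=42):
    """Samples of zeta_T = (X_T - x)/sqrt(eps) for dX/dt = xi(t/eps), where
    xi is +-1 independently on unit fast-time intervals; then
    X_T - x = eps * (sum of T/eps independent +-1 variables)."""
    rng = random.Random(seed)
    m = int(T / eps)                # number of unit blocks of fast time
    root_eps = eps ** 0.5
    return [root_eps * sum(rng.choice((-1, 1)) for _ in range(m))
            for _ in range(n)]

zs = zeta_samples()
mean = sum(zs) / len(zs)
var = sum(z * z for z in zs) / len(zs) - mean * mean
```

For this process the limit covariance is $R(T) = \int_0^T A\,ds = T$ with $A = 1$ (the unit blocks contribute unit covariance), so the sample variance should be close to $T = 1$.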
In §8 we shall consider some examples of the application of this theorem, and now we only make one remark. According to Theorem 3.1 we have
$$\lim_{\varepsilon \to 0} MF(\zeta^\varepsilon) = MF(\zeta^0) \tag{3.22}$$
if the functional $F(\varphi)$ is bounded and continuous on $C_{0T_0}$. For discontinuous functionals, this passage to the limit is impossible in general. Nevertheless, if for the limit process $\zeta^0$, the set of points of discontinuity of $F$ has probability zero, then, as is easy to prove, relation (3.22) is preserved. For example, let us be given a domain $D \subset R^r$ with smooth boundary $\partial D$ and let $F(\varphi) = 1$ if $\tau(\varphi) = \inf\{t: \varphi_t \in \partial D\} \le T$, and $F(\varphi) = 0$ for the remaining functions in $C_{0T}(R^r)$. This functional is discontinuous at those $\varphi$ which reach $\partial D$ but do not leave $D \cup \partial D$ until time $T$ and at those $\varphi$ for which $\tau(\varphi) = T$. If the matrix $(A^{kl}(\bar x_s))$ is nonsingular, then for the limit process $\zeta^0$, the set of trajectories reaching $\partial D$ until time $T$ but not leaving $D$ has probability zero. This follows from the strong Markov property of the process and from the fact that a nondegenerate diffusion process beginning at a point $x \in \partial D$ hits both sides of the smooth surface $\partial D$ before any time $t > 0$ with probability 1. The vanishing of the probability $P\{\tau(\zeta^0) = T\}$ follows from the existence of the transition probability density of $\zeta_t^0$. Consequently, $P\{\tau(\zeta^\varepsilon) \le T\}$ converges to $P\{\tau(\zeta^0) \le T\}$ as $\varepsilon \to 0$. In particular, choosing the ball of radius $\delta$ with center at the point 0 as $D$, we obtain
$$\lim_{\varepsilon \to 0} P\left\{\sup_{0 \le t \le T} |X_t^\varepsilon - \bar x_t| > \delta\sqrt\varepsilon\right\} = P\{\tau(\zeta^0) \le T\}.$$
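For a one-sided level the limit probability on the right is explicit: when $\zeta^0$ is a standard one-dimensional Brownian motion (as for $\dot X_t^\varepsilon = \xi_{t/\varepsilon}$ with $\xi = \pm1$ on unit intervals, a toy case rather than the two-sided ball of the text), the reflection principle gives $P\{\sup_{t \le T}\zeta_t^0 > a\} = 2(1 - \Phi(a/\sqrt T))$. A seeded Monte Carlo sketch of the prelimit probability (parameters illustrative):

```python
import math
import random

def exit_prob(a=1.0, eps=1e-3, T=1.0, n_paths=4000, seed=7):
    """Fraction of paths of zeta_t = sqrt(eps) * (+-1 random walk) whose
    running maximum exceeds the level a by fast time T/eps."""
    rng = random.Random(seed)
    m = int(T / eps)
    root_eps = math.sqrt(eps)
    hits = 0
    for _ in range(n_paths):
        s = top = 0
        for _ in range(m):
            s += rng.choice((-1, 1))
            if s > top:
                top = s
        if root_eps * top > a:
            hits += 1
    return hits / n_paths

def norm_cdf(x):                    # standard normal CDF via math.erf
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

p_limit = 2.0 * (1.0 - norm_cdf(1.0))   # reflection-principle value, about 0.317
```

The simulated exceedance probability approaches the reflection-principle value, in line with the convergence $P\{\tau(\zeta^\varepsilon) \le T\} \to P\{\tau(\zeta^0) \le T\}$.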
The last probability can be calculated by solving the corresponding Kolmogorov equation.

We mention another situation in which we encounter approximation by diffusion processes for deviations from trajectories of the averaged system. If we return to the "slow" time in which equation (1.1) is written, then the averaging principle contained in Theorem 2.1 can be formulated in the following way: If condition (2.1) is satisfied, then for any δ > 0 we have

lim_{ε→0} P{ sup_{0≤t≤T/ε} |Z^ε_t − x_{εt}| > δ } = 0,

where Z^ε_t is the solution of equation (1.1) and x_t is the solution of the averaged system (1.4). In the case b̄(x) ≡ 0 this theorem implies that in the time interval [0, T/ε] the process Z^ε_t does not move away noticeably from the initial position. It turns out that in this case displacements of order 1 take place over time intervals of order ε^{−2}. Apparently, it was Stratonovich [1] who first called attention to this fact. On the level of rigor of physics, he established (cf. the same publication) that under certain conditions the family of processes Z^ε_{t/ε²} converges to a diffusion process, and he computed the characteristics of the limit process. A mathematically rigorous proof of this result was given by Khas'minskii [5]. A proof is given in Borodin [1] under essentially less stringent assumptions. Without precisely formulating the conditions, which, in addition to the equality b̄(x) ≡ 0, contain some assumptions concerning the boundedness of the derivatives of b(x, y), the sufficiently good mixing of ξ_t, and the existence of moments, we include the result here.
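The scaling just described, displacements of order 1 over times of order ε^{−2} when b̄ ≡ 0, can be illustrated numerically. The sketch below is an illustration only: it assumes the slow-time equation has the form Ż^ε_t = ε b(Z^ε_t, ξ_t), and the choices b(x, ξ) = ξ with ξ_t a ±1 telegraph process of unit flip intensity are invented for the example, not taken from the text. The displacement variance at time t/ε² should be close to t·a with a = 2∫_0^∞ Mξ_0ξ_s ds = 1, whatever the value of ε.

```python
import numpy as np

def displacement(eps, t, n_paths=1000, dt=0.02, flip_rate=1.0, seed=0):
    """Integrate dZ/dt = eps * xi_t up to time t/eps**2, where xi_t is a
    +/-1 telegraph process with flip intensity flip_rate (a toy model with
    b(x, xi) = xi, so the averaged drift vanishes)."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(t / eps**2 / dt))
    xi = np.where(rng.random(n_paths) < 0.5, 1.0, -1.0)   # stationary start
    z = np.zeros(n_paths)
    for _ in range(n_steps):
        z += eps * xi * dt                                # Euler step
        xi = np.where(rng.random(n_paths) < flip_rate * dt, -xi, xi)
    return z

# M xi_0 xi_s = exp(-2*flip_rate*s), so a = 2*int_0^inf M xi_0 xi_s ds
# = 1/flip_rate, and Var Z_{t/eps^2} should be close to t/flip_rate.
z = displacement(eps=0.1, t=1.0)
print(np.mean(z), np.var(z))
```

Rerunning with a smaller ε (and correspondingly more steps) leaves the variance at time t/ε² essentially unchanged, which is the content of the diffusion approximation stated next.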
229
Normal Deviations from an Averaged System
Let us introduce the notation

a^{ik}(x, s, t) = M b^i(x, ξ_s) b^k(x, ξ_t),

B(x, y) = (B^i_j(x, y)) = (∂b^i(x, y)/∂x^j),

K^i(x, s, t) = Σ_{j=1}^r M B^i_j(x, ξ_s) b^j(x, ξ_t).

Suppose that the limits

a^{ik}(x) = lim_{T→∞} (1/T) ∫_{t₀}^{t₀+T} ∫_{t₀}^{t₀+T} a^{ik}(x, s, t) ds dt,

K^i(x) = lim_{T→∞} (1/T) ∫_{t₀}^{t₀+T} ∫_{t₀}^{t₀+T} K^i(x, s, t) ds dt

exist, uniformly in t₀ > 0 and x ∈ R^r. Then on the interval [0, T], the process ζ^ε_t = Z^ε_{t/ε²} converges weakly to the diffusion process with generating operator

L = (1/2) Σ_{i,k=1}^r a^{ik}(x) ∂²/∂x^i ∂x^k + Σ_{i=1}^r K^i(x) ∂/∂x^i

as ε → 0. A precise formulation and proof can be found in Khas'minskii [5] and Borodin [1].

A natural generalization of the last result is the limit theorem describing the evolution of first integrals of a dynamical system. Consider equation (1.2), and assume that the corresponding averaged system (1.4) has a first integral H(x): H(X̄_t) = H(x) = constant for t ≥ 0. Let H(x) be a smooth function with compact connected level sets C(y) = {x ∈ R^r: H(x) = y}. Since X^ε_t → X̄_t as ε ↓ 0 uniformly on any finite time interval [0, T] in probability, max_{0≤t≤T} |H(X^ε_t) − H(x)| → 0 in probability as well. Consider the rescaled process X̃^ε_t = X^ε_{t/ε}, 0 ≤ t ≤ T, which satisfies dX̃^ε_t/dt = ε^{−1} b(X̃^ε_t, ξ_{t/ε²}), X̃^ε_0 = x ∈ R^r. We have:

H(X̃^ε_t) − H(x) = (1/ε) ∫_0^t (∇H(X̃^ε_s), b(X̃^ε_s, ξ_{s/ε²})) ds
= (1/ε) ∫_0^t (∇H(X̃^ε_s), b(X̃^ε_s, ξ_{s/ε²}) − b̄(X̃^ε_s)) ds.  (3.23)
Here we used the equality (∇H(x), b̄(x)) = 0, x ∈ R^r, which holds since H(x) is a first integral of system (1.4).

Let ξ_t be a stationary stochastic process with good enough mixing properties. Then the limit in (1.3) is equal to b̄(x) = Mb(x, ξ_t), and, due to the central limit theorem,

(1/ε) ∫_0^t (∇H(x), b(x, ξ_{s/ε²}) − b̄(x)) ds

converges as ε ↓ 0 in distribution to a Gaussian random variable. Of course, the characteristics of this Gaussian variable depend on x ∈ R^r. Taking into account that the rates of change of X̃^ε_t and ξ_{t/ε²} have different orders, and that X^ε_t converges to X̄_t as ε ↓ 0, one can expect that, if the dynamical system X̄_t has some ergodic properties on the level sets C(y), the characteristics of the limit of H(X̃^ε_t) as ε ↓ 0 depend only on H(X̃^ε_t) but not on the position of X̃^ε_t on C(H(X̃^ε_t)). This means that the process H(X̃^ε_t), 0 ≤ t ≤ T, should converge to a diffusion process Y_t:

dY_t = σ(Y_t) dW_t + B(Y_t) dt,  Y_0 = H(x),
where W_t is a standard one-dimensional Wiener process. Thus, the convergence to a diffusion process is the result not just of averaging with respect to the fast oscillating process ξ_{t/ε²}, but also of the ergodic behavior of the averaged dynamical system X̄_t on the level sets of the first integral. The result of Stratonovich, Khas'minskii, and Borodin is a special case of the limit theorem for first integrals: if b̄(x) ≡ 0, then each coordinate x^i is a first integral of the averaged system, and the evolution of X̃^ε_t is, actually, the evolution of the first integrals. No assumptions concerning the ergodicity on the level sets are needed here, since each level set consists of one point.

To formulate the exact result, we restrict ourselves to the two-dimensional case. Consider equation (3.23) for r = 2. Let ξ_t, −∞ < t < ∞, be a stationary process with values in R^m. Assume that the trajectories of ξ_t have at most a finite number of simple discontinuities on each finite time interval with probability 1. Equation (1.2) is satisfied at the points where ξ_{t/ε²} is continuous. The solution is assumed to be continuous for all t ≥ 0. We assume also that the vector field b(x, u), x ∈ R², u ∈ R^m, is Lipschitz continuous and |b(x, u)| is bounded or grows slowly; the function b̄(x) = Mb(x, ξ_t) then is also Lipschitz continuous. Introduce the following conditions.

1. Let H(x), x ∈ R², be a first integral of averaged system (1.4). Assume that H(x) is three times differentiable and the sets C(y) = {x ∈ R²: H(x) = y} are closed connected curves in R² without self-intersections. This means, in particular, that H(x) has a unique extremum point. We can assume, without loss of generality, that this point is the origin
0 ∈ R², and that H(0) = 0, H(x) > 0 for x ≠ 0. The trajectory X̄_t performs periodic motion along C(y), y = H(X̄_0), with a period T(y). Assume that T(y) ≤ c(1 + y²) for y ≥ 0 and some c > 0.

2. Let

|H(x)| ≤ c(|x|² + 1),  |∇H(x)| ≤ c(|x| + 1),  |∂²H(x)/∂x^i∂x^j| ≤ c,  |∂³H(x)/∂x^i∂x^j∂x^k| ≤ c

for some c > 0 and for i, j, k ∈ {1, 2}, x ∈ R². Let |x|^μ ≤ c(H(x) + 1) for some 1 < μ ≤ 2.
3. Put g(x, z) = b(x, z) − b̄(x). Suppose a positive function q(z) and a constant C > 0 exist such that

|b̄(x)| ≤ C(|x| + 1),  |∇b̄(x)| ≤ C,  |g(x, z)| ≤ Cq(z),  |∂g(x, z)/∂x^i| ≤ C q(z)/(1 + |x|),

i, j ∈ {1, 2}, x ∈ R², z ∈ R^m. Let, for some p > 4(μ + 1)/(μ − 1), Mq^p(ξ_t) < ∞.
The next assumption concerns the mixing properties of the process ξ_t(ω), ω ∈ Ω, −∞ < t < ∞. Let F_{≤t} be the σ-field generated by the process ξ_s for −∞ < s ≤ t, and let F_{≥t} be the σ-field generated by ξ_s for s ≥ t. Define

β(τ) = sup_B |P_{0,τ}(B) − (P_0 × P_τ)(B)|,

where the measures P_{0,τ} and P_0 × P_τ on the space Ω × Ω are defined by the relations

P_{0,τ}(A₁ × A₂) = P(A₁ ∩ A₂),  (P_0 × P_τ)(A₁ × A₂) = P(A₁)P(A₂),

for A₁ ∈ F_{≤0}, A₂ ∈ F_{≥τ}.
We say that the process ξ_t satisfies the absolute regularity mixing condition with the coefficient β(τ) if β(τ) ↓ 0 as τ → ∞. It is known that the absolute regularity mixing condition is stronger than mixing condition (3.1) (the strong mixing condition). Note that since ξ_t is a stationary process, the supremum in t in (3.1) can be omitted. Some sufficient conditions for these types of mixing properties and bounds for the coefficients α(τ) and β(τ) can be found in Ibragimov and Rozanov [1].

4. Assume that the stationary process ξ_t satisfies the absolute regularity mixing condition with a coefficient β(τ) such that

∫_0^∞ τ³ β^{(p−8)/p}(τ) dτ < ∞,  (3.24)

β^{(p−8)/p}(τ) ≤ c min(1, τ^{−4})

for some c > 0 and the constant p defined in condition 3. If

b(x, z) = Σ_{k=1}^n u_k(x) v_k(z),  n < ∞,

one can replace the absolute regularity condition by the strong mixing condition with the coefficient α(τ) satisfying (3.24) with β(τ) replaced by α(τ). Define
g(x, z) = b(x, z) − b̄(x),  F(x, z) = (∇H(x), g(x, z)),

D(x, s) = M F(x, ξ_s)F(x, ξ_0),  Q(x, s) = M(∇F(x, ξ_s), g(x, ξ_0)),

where ∇ is the gradient in x ∈ R². Let

D̄(x) = 2 ∫_0^∞ D(x, s) ds,  Q̄(x) = 2 ∫_0^∞ Q(x, s) ds,

σ²(y) = (1/T(y)) ∮_{C(y)} (D̄(x)/|b̄(x)|) dl,  B(y) = (1/T(y)) ∮_{C(y)} (Q̄(x)/|b̄(x)|) dl,
where dl is the length element on C(y) = {x ∈ R²: H(x) = y} and T(y) = ∮_{C(y)} dl/|b̄(x)| is the period of the rotation along C(y).

5. Assume that the functions σ²(y) and B(y) are Lipschitz continuous.

Theorem 3.2 (Borodin and Freidlin [1]). Let Y^ε_t = H(X̃^ε_t), where X̃^ε_t is the solution of (3.22) and H(x) is the first integral of the averaged equation (1.4). Suppose that conditions 1–5 hold. Then for any 0 < T < ∞, the processes Y^ε_t, 0 ≤ t ≤ T, converge weakly in the space of continuous functions φ: [0, T] → R¹ to the diffusion process Y_t defined by the equation

dY_t = σ(Y_t) dW_t + B(Y_t) dt,  Y_0 = H(x).
The proof of this theorem and some of its modifications and generalizations can be found in Borodin and Freidlin [1]. Some examples related to Theorem 3.2 are considered in §8.

We assumed in Theorem 3.2 that the first integral H(x) has just one critical point. This assumption is essential. If for some y the level sets C(y) consist of more than one connected component, the processes Y^ε_t = H(X̃^ε_t) will not, in general, converge to a Markov process. To have a Markov process in the limit, one should extend the phase space. We consider such questions in the next chapter for white noise perturbations of dynamical systems. The situation is similar in the case of fast oscillating perturbations.
§4. Large Deviations from an Averaged System

We have established that for small ε, over the time [0, T], the process X^ε_t is near the trajectory x_t of the averaged system with overwhelming probability, and the normalized deviations (1/√ε)(X^ε_t − x_t) form a random process which converges weakly to a Gaussian Markov process as ε → 0. If the averaged dynamical system has an asymptotically stable equilibrium position or limit cycle and the initial point X^ε_0 = x is situated in the domain of attraction of this equilibrium position or cycle, then it follows from the above results that with probability close to 1 for small ε, the trajectory of X^ε_t hits a neighborhood of the equilibrium position or the limit cycle and spends an arbitrarily long time T in it provided that ε is sufficiently small. By means of Theorem 3.1 we can estimate the probability that over a fixed time [0, T] the trajectory of X^ε_t does not leave a neighborhood D^ε of the equilibrium position if this neighborhood has linear dimensions of order √ε. However, the above results do not enable us to estimate, in a precise way, the probability that over the time [0, T] the process X^ε_t leaves a given neighborhood, independent of ε, of the equilibrium position. We can only say that this probability converges to zero. Theorems 2.1 and 3.1 do not enable us to study events determined by the behavior of X^ε_t on time intervals increasing
with ε^{−1}. For example, by means of these theorems we cannot estimate the time spent by X^ε_t in a neighborhood D of an asymptotically stable equilibrium position until the first exit time from D. Over a sufficiently long time, X^ε_t goes from the neighborhood of one equilibrium position of the averaged system to neighborhoods of others. These passages take place "in spite of" the averaged motion, due to prolonged deviations of ξ_t from its "typical" behavior. In a word, the situation here is completely analogous to that confronted in Chs. 4–6: in order to study all these questions, we need theorems on large deviations for the family of processes X^ε_t. For this family we are going to introduce an action functional, and by means of it we shall study probabilities of events having small probability for ε ≪ 1 and also the behavior of the process on time intervals increasing with decreasing ε. These results were obtained in Freidlin [9], [11].

In what follows we assume for the sake of simplicity that not only are the partial derivatives of the b^i(x, y), x ∈ R^r, y ∈ R^l, bounded but so are the b^i(x, y) themselves:

sup_{x,y,i,j} [ |b^i(x, y)| + |∂b^i(x, y)/∂x^j| ] < ∞.
The assumption that |b(x, y)| is bounded could be replaced by an assumption on the finiteness of some exponential moments of |b(x, ξ_t)|, but this would lengthen the proofs.

We shall say that condition F is satisfied if there exists a numerical-valued function H(x, α), x ∈ R^r, α ∈ R^r, such that

lim_{ε→0} ε ln M exp{ ε^{−1} ∫_0^T (α_s, b(φ_s, ξ_{s/ε})) ds } = ∫_0^T H(φ_s, α_s) ds  (4.1)

for any step functions φ_s, α_s on the interval [0, T] with values in R^r. If as φ_s and α_s we choose constants φ, α ∈ R^r, then we obtain from (4.1) that

lim_{T→∞} (1/T) ln M exp{ ∫_0^T (α, b(φ, ξ_s)) ds } = H(φ, α).  (4.2)
Lemma 4.1. The function H(x, α) is jointly continuous in its variables and convex in the second argument.

Indeed, the convexity in α follows from (4.2) if we take account of the convexity of the exponential function and the monotonicity and concavity of the logarithmic function; the continuity follows from (4.2) and the boundedness and Lipschitz continuity of b(x, y), which allow one to estimate |H(x + Δ, α + δ) − H(x, α)| in terms of |Δ| and |δ|.

We define the function L(x, β) as the Legendre transform of H(x, α) in the variables α:

L(x, β) = sup_α [(α, β) − H(x, α)].

The function L(x, β) is convex in β; it assumes nonnegative values, including +∞. It follows from the boundedness of b(x, y) that L(x, β) = +∞ outside some bounded set in the space of the variables β.

The function L(x, β) is jointly lower semicontinuous in all variables. Indeed, it follows from the definition of L(x, β) that for any x, β ∈ R^r and n > 0 there exists α_n = α_n(x, β) such that

L(x, β) < (α_n, β) − H(x, α_n) + 1/n.

Taking account of the continuity of H(x, α), from this we obtain that for some δ_n = δ_n(α_n, β) and |x − x′| < δ_n, |β − β′| < δ_n, we have

(α_n, β) − H(x, α_n) < (α_n, β′) − H(x′, α_n) + 1/n ≤ L(x′, β′) + 1/n.

Consequently, L(x, β) < L(x′, β′) + 2/n if |x − x′| < δ_n and |β − β′| < δ_n, i.e., L(x, β) is jointly lower semicontinuous in all variables.
Remark 1. Condition F is equivalent to the assumption that the limit (4.1) exists for every continuous φ_s, α_s.

Remark 2. In general, the variables (x, α) and (x, β) vary in different spaces. If equation (1.2) is considered on a manifold G, then x is a point of the manifold, α is an element of the cotangent space at x, (x, α) is a point of the cotangent bundle, and (x, β) is a point of the tangent bundle.

On C_{0T}(R^r) we introduce a functional S_{0T}(φ):

S_{0T}(φ) = ∫_0^T L(φ_s, φ̇_s) ds

if φ_s is absolutely continuous; we put S_{0T}(φ) = +∞ for the remaining elements φ of C_{0T}(R^r).
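A small numerical sketch may help fix these definitions. Everything in it is an illustrative assumption, not the text's general setting: we take r = 1, the quadratic toy Hamiltonian H(x, α) = b̄(x)α + σ²α²/2 with b̄(x) = −x, compute L(x, β) by brute-force maximization over a grid (the closed form here is (β − b̄(x))²/(2σ²)), and approximate S_{0T}(φ) by a Riemann sum.

```python
import numpy as np

SIGMA2 = 0.5

def b_bar(x):                     # assumed averaged drift for the toy example
    return -x

def H(x, a):                      # toy Hamiltonian (a quadratic assumption)
    return b_bar(x) * a + 0.5 * SIGMA2 * a**2

ALPHA_GRID = np.linspace(-50.0, 50.0, 20001)

def L(x, beta):
    """Legendre transform L(x, beta) = sup_alpha [alpha*beta - H(x, alpha)],
    computed on a grid; closed form here: (beta - b_bar(x))**2 / (2*SIGMA2)."""
    return float(np.max(ALPHA_GRID * beta - H(x, ALPHA_GRID)))

def action(phi, dt):
    """Riemann-sum approximation of S_0T(phi) = int_0^T L(phi_s, dphi_s/ds) ds."""
    dphi = np.diff(phi) / dt
    return sum(L(x, b) for x, b in zip(phi[:-1], dphi)) * dt

t = np.linspace(0.0, 1.0, 201)
averaged = np.exp(-t)             # solves dx/dt = b_bar(x) = -x, x_0 = 1
print(action(averaged, t[1] - t[0]))   # nearly 0 along the averaged trajectory
```

Replacing `averaged` by any other path gives a strictly positive value, consistent with the vanishing of the action functional precisely on trajectories of the averaged system.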
Lemma 4.2. For any compactum F₀ ⊂ R^r and any s < ∞, the set Φ_{F₀}(s) = {φ ∈ C_{0T}(R^r): φ₀ ∈ F₀, S_{0T}(φ) ≤ s} is compact in C_{0T}(R^r). The functional S_{0T}(φ) is lower semicontinuous on C_{0T}(R^r).
Proof. Since L(x, β) is equal to +∞ outside a bounded set in the space of the variables β, the set Φ_{F₀}(s) may only contain functions whose derivatives are uniformly bounded. Taking account of the compactness of F₀, it follows from this that all functions in Φ_{F₀}(s) are uniformly bounded and equicontinuous. Consequently, for the proof of the compactness of Φ_{F₀}(s) we only have to show that Φ_{F₀}(s) is closed. The closedness of Φ_{F₀}(s) obviously follows from the lower semicontinuity of S_{0T}. As in §2, Ch. 5, concerning semicontinuity we refer to the book [1] by Ioffe and Tichomirov.
Theorem 4.1 (Freidlin [9], [11]). Let condition F be satisfied and let H(x, α) be differentiable with respect to α. The functional S_{0T}(φ) is the normalized action functional in C_{0T}(R^r) for the family of processes X^ε_t as ε → 0, the normalizing coefficient being f(ε) = ε^{−1}, i.e., the set

Φ_x(s) = {φ ∈ C_{0T}(R^r): φ₀ = x, S_{0T}(φ) ≤ s}

is compact in C_{0T}(R^r), and for any s, δ, γ > 0 and φ ∈ C_{0T}(R^r), φ₀ = x, there exists ε₀ > 0 such that for ε ≤ ε₀ we have

P{ρ_{0T}(X^ε, φ) < δ} ≥ exp{−ε^{−1}(S_{0T}(φ) + γ)},  (4.3)

P{ρ_{0T}(X^ε, Φ_x(s)) > δ} ≤ exp{−ε^{−1}(s − γ)},  (4.4)

where X^ε_t is the solution of equation (1.2) with the initial condition X^ε_0 = x.
We postpone the proof of this theorem until the next section; now we discuss some consequences of it and the verification of the conditions of the theorem for some classes of processes. First of all we note that for any set A ⊆ C^x_{0T}(R^r) = {φ ∈ C_{0T}(R^r): φ₀ = x} we have

−inf_{φ∈(A)} S(φ) ≤ lim inf_{ε→0} ε ln P{X^ε ∈ A} ≤ lim sup_{ε→0} ε ln P{X^ε ∈ A} ≤ −inf_{φ∈[A]} S(φ),  (4.5)

where [A] is the closure of A and (A) is the interior of A in C^x_{0T}(R^r). Estimates (4.5) follow from general properties of an action functional (cf. Ch. 3). If the infima in (4.5) over [A] and (A) coincide, then (4.5) implies the relation

lim_{ε→0} ε ln P{X^ε ∈ A} = −inf_{φ∈A} S_{0T}(φ).  (4.6)

We would like to mention that because of the boundedness of |b(x, ξ_s)| and the possible degeneracy of the random variables b(x, ξ_s), the condition of coincidence of the infima of S_{0T}(φ) over the sets [A] and (A) is more
stringent than, say, in the case of a functional corresponding to an additive perturbation of the type of white noise (cf. Ch. 4).

It follows from the compactness of Φ_x(s) that there exists ε₀ > 0 such that estimate (4.3) holds for ε ≤ ε₀ for any function φ ∈ Φ_x(s). Since b(x, y) satisfies a Lipschitz condition, for any φ^x ∈ C^x_{0T}(R^r), φ^y ∈ C^y_{0T}(R^r), and δ > 0 we have

{ω: ρ_{0T}(X^ε, φ^x) < δ} ⊆ {ω: ρ_{0T}(Y^ε, φ^y) < (e^{KT} + 2)δ}

if ρ_{0T}(φ^x, φ^y) < δ, where X^ε = X^ε_t and Y^ε = Y^ε_t are solutions of equation (1.2) with initial conditions x and y, respectively, and K is the Lipschitz constant. Relying on estimates (4.3) and (4.5), from this we obtain that if ρ(φ^x, φ^y) < δ, then

inf{ S_{0T}(φ): φ ∈ C^y_{0T}(R^r), ρ_{0T}(φ, φ^y) < δ′ } ≤ S_{0T}(φ^x),

where δ′ = (e^{KT} + 2)δ. Taking account of the last inequality, one can easily see that

{ρ(X^ε, Φ_x(s)) < δ} ⊆ {ρ(Y^ε, Φ_y(s)) < δ′}

for |x − y| ≤ δ. With the use of these estimates and inclusions, it is easy to derive the following "uniform" version of Theorem 4.1.

Theorem 4.1′. Suppose that condition F is satisfied and H(x, α) is differentiable with respect to α. The functional ε^{−1}S_{0T}(φ) is the action functional for the family of processes X^ε_t, uniformly with respect to the initial point x in any compactum Q ⊂ R^r, as ε → 0. This means that the assertions of Theorem 4.1 hold and for any s, δ, γ > 0 and any compactum Q ⊂ R^r there exists ε₀ > 0 such that inequalities (4.3) and (4.4) hold for every initial point x ∈ Q and every φ ∈ Φ_x(s).
For a large class of important events, the infimum in (4.6) can be expressed in terms of the function u^x(t, z) = inf{S_{0t}(φ): φ₀ = x, φ_t = z}. As a rule, the initial point φ₀ = x is assumed to be fixed, and therefore we shall omit the subscript x in u^x(t, z). As is known, the function u(t, z) satisfies the Hamilton–Jacobi equation. Since the Legendre transformation is involutive, the Hamilton–Jacobi equation for u(t, z) has the form (cf., for example, Gel'fand and Fomin [1])

∂u/∂t (t, z) = H(z, ∇_z u(t, z)).  (4.7)
The functional S_{0T}(φ) vanishes on trajectories of the averaged system. Indeed, from the concavity of ln x it follows that

H(x, α) = lim_{T→∞} (1/T) ln M exp{ ∫_0^T (α, b(x, ξ_s)) ds }
≥ lim_{T→∞} (1/T) M ∫_0^T (α, b(x, ξ_s)) ds = ( α, lim_{T→∞} (1/T) ∫_0^T M b(x, ξ_s) ds ) = (α, b̄(x)),

and consequently, L(φ, φ̇) = sup_α [(φ̇, α) − H(φ, α)] = 0 for φ̇ = b̄(φ). It follows from the differentiability of H(x, α) with respect to α that the action functional vanishes only at trajectories of the averaged system.

We take for the set A = {φ ∈ C^x_{0T}(R^r): sup_{0≤t≤T} |φ_t − x_t| ≥ δ}. Then we conclude from (4.5) that for any δ > 0 we have

P{ sup_{0≤t≤T} |X^ε_t − x_t| ≥ δ } ≤ exp{−c ε^{−1}}

for sufficiently small ε, where c is an arbitrary number less than inf_{φ∈[A]} S(φ).
This infimum is positive, since S(φ) > 0 for φ ∈ [A] and S is lower semicontinuous. Consequently, if the hypotheses of Theorem 4.1 are satisfied, then the probability of deviations of order 1 from the trajectory of the averaged system is exponentially small. In §8 we consider some examples of the application of Theorem 4.1; now we discuss the fulfilment of the hypotheses of Theorem 4.1 in the case where ξ_t is a Markov process.

Lemma 4.3. Suppose that ξ_t is a homogeneous Markov process with values in D ⊆ R^l and for any x, α ∈ R^r let

lim_{T→∞} (1/T) ln M_y exp{ ∫_0^T (α, b(x, ξ_s)) ds } = H(x, α)  (4.8)

uniformly in y ∈ D. Then condition F is satisfied.
Proof. Let α_s and z_s be step functions, and let α_k and z_k be their values on [t_{k−1}, t_k), where 0 = t₀ < t₁ < t₂ < ⋯ < t_n = T. Using the Markov property, we can write

lim_{ε→0} ε ln M_y exp{ ε^{−1} ∫_0^T (α_s, b(z_s, ξ_{s/ε})) ds }
= lim_{ε→0} ε ln M_y [ exp{ ε^{−1} ∫_0^{t₁} (α₁, b(z₁, ξ_{s/ε})) ds }
× M_{ξ_{t₁/ε}} [ exp{ ε^{−1} ∫_0^{t₂−t₁} (α₂, b(z₂, ξ_{s/ε})) ds } × ⋯
× M_{ξ_{t_{n−1}/ε}} exp{ ε^{−1} ∫_0^{t_n−t_{n−1}} (α_n, b(z_n, ξ_{s/ε})) ds } ⋯ ] ].

From (4.8) it follows that

| ε ln M_y exp{ ε^{−1} ∫_0^{t_k−t_{k−1}} (α_k, b(z_k, ξ_{s/ε})) ds } − H(z_k, α_k)(t_k − t_{k−1}) | < δ_k,  (4.9)

where δ_k → 0 uniformly in y ∈ D as ε → 0. Repeating this estimation on every interval [t_{k−1}, t_k), we obtain from (4.9) that

ε ln M exp{ ε^{−1} ∫_0^T (α_s, b(z_s, ξ_{s/ε})) ds } → Σ_{k=1}^n H(z_k, α_k)(t_k − t_{k−1}).

This relation implies condition F.
For Markov processes we can formulate general conditions of Feller type and on the positivity of transition probabilities which guarantee the validity of condition F and the differentiability of H(x, α) with respect to α. We shall not discuss the general case but rather the case where ξ_t is a Markov process with a finite number of states. In §9 we consider the case of a diffusion process ξ_t.

Let ξ_t, t ≥ 0, be a homogeneous stochastically continuous Markov process with N states {1, 2, …, N}, let p_{ij}(t) be the probability of passage from i to j over time t, and let P(t) = (p_{ij}(t)). We denote by Q = (q_{ij}) the matrix consisting of the derivatives dp_{ij}(t)/dt at t = 0; as is known, these derivatives exist.

Theorem 4.2. Suppose that all entries of Q are different from zero. Let us denote by Q^{α,x} = (q^{α,x}_{ij}) the matrix whose entries are given by the equalities q^{α,x}_{ij} = q_{ij} + δ_{ij}(α, b(x, i)), where δ_{ij} = 1 for i = j and δ_{ij} = 0 for i ≠ j. Then Q^{α,x} has a simple real eigenvalue λ = λ(x, α) exceeding the real parts of all other eigenvalues. This eigenvalue is differentiable with respect to α. Condition F is satisfied and H(x, α) = λ(x, α).
Proof. If ξ_t is a Markov process, then the family of operators T_t, t ≥ 0, acting on the set of bounded measurable functions on the phase space of ξ_t according to the formula

T_t f(z) = M_z [ f(ξ_t) exp{ ∫_0^t (α, b(x, ξ_s)) ds } ],

forms a positive semigroup. In our case the phase space consists of a finite number of points and the semigroup is a semigroup of matrices acting in the N-dimensional space of vectors f = (f(1), …, f(N)). It is easy to calculate the infinitesimal generator A of this semigroup:

A = lim_{t↓0} (T_t − E)/t = (q_{ij} + δ_{ij}(α, b(x, i))) = Q^{α,x}.

By means of the infinitesimal generator the semigroup T_t can be represented in the form T_t = e^{tQ^{α,x}}. Since by assumption q_{ij} ≠ 0, the entries of the matrix T_t are positive if t > 0. By Frobenius' theorem (Gantmakher [1]), the eigenvalue with largest absolute value μ = μ(t, x, α) of such a matrix is real, positive, and simple. To it there corresponds an eigenvector e(t, x, α) = (e₁, …, e_N), Σ_{k=1}^N e_k = 1, all of whose components are positive. It is easy to derive from the semigroup property of the operators T_t that e(t, x, α) does not actually depend on t and is an eigenvector of the matrix Q^{α,x}, i.e., Q^{α,x} e(x, α) = λ(x, α) e(x, α). The corresponding eigenvalue λ(x, α) is real, simple, exceeds the real parts of all other eigenvalues of Q^{α,x}, and μ(t, x, α) = exp{t λ(x, α)}. The differentiability of λ(x, α) with respect to α follows from the differentiability of the entries of Q^{α,x} and the simplicity of the eigenvalue λ(x, α) (cf. Kato [1]). By Lemma 4.3, in order to complete the proof of the theorem it is sufficient to show that

lim_{T→∞} (1/T) ln M_i exp{ ∫_0^T (α, b(x, ξ_s)) ds } = λ(x, α)

for i = 1, 2, …, N. This equality can obviously be rewritten in the following equivalent form:

lim_{t→∞} (1/t) ln(T_t 1)(i) = λ(x, α),  (4.10)

where 1 is the vector with components equal to one.
To prove (4.10) we use the fact that all components of the eigenvector e = e(x, α) = (e₁, …, e_N), Σ_{k=1}^N e_k = 1, are positive: 0 < c ≤ min_i e_i. Therefore,

c(T_t 1)(i) ≤ (T_t e)(i) = e^{tλ(x,α)} e_i ≤ (T_t 1)(i)

for i = 1, 2, …, N. We take the logarithm of this relation and divide by t:

(1/t) ln c + (1/t) ln(T_t 1)(i) ≤ λ(x, α) + (1/t) ln e_i ≤ (1/t) ln(T_t 1)(i).

Letting t tend to ∞ in this chain of inequalities, we obtain (4.10). Theorem 4.2 is proved.
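Theorem 4.2 gives a completely computable recipe for H(x, α) in the finite-state case. The following sketch is illustrative only: the 3-state generator Q and the values r_i standing for (α, b(x, i)) are invented for the example. It forms Q^{α,x} = Q + diag(r), takes the eigenvalue of largest real part, and checks the growth-rate characterization (4.10) by computing (1/t) ln(T_t 1)(i) for a large t.

```python
import numpy as np

def principal_eigenvalue(Q, r):
    """Eigenvalue of largest real part of Q^{alpha,x} = Q + diag(r),
    where r_i stands for (alpha, b(x, i))."""
    w = np.linalg.eigvals(Q + np.diag(r))
    return w[np.argmax(w.real)]

# A 3-state generator: off-diagonal rates positive, rows summing to zero
Q = np.array([[-3.0,  1.0,  2.0],
              [ 2.0, -5.0,  3.0],
              [ 1.0,  1.0, -2.0]])
r = np.array([0.4, -0.2, 0.1])        # assumed values of (alpha, b(x, i))

lam = principal_eigenvalue(Q, r)       # equals H(x, alpha) by Theorem 4.2

# Check (4.10): (1/t) ln (T_t 1)(i) -> lambda, with T_t = exp(t * Q^{alpha,x})
A = Q + np.diag(r)
w, V = np.linalg.eig(A)
t = 500.0
T_t = (V @ np.diag(np.exp(t * w)) @ np.linalg.inv(V)).real
growth = np.log(T_t @ np.ones(3)) / t  # one growth rate per starting state i
print(lam.real, growth)
```

The growth rates are the same for every starting state, reflecting the fact that the principal eigenvector has strictly positive components.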
We note that in the case considered in Theorem 4.2, an equation can be written for the function u(t, z) introduced above without determining the eigenvalue λ(x, α) of Q^{α,x} which has the largest real part. Indeed, λ(x, α) = λ is a root of the characteristic equation

det( q_{ij} + δ_{ij}[(α, b(x, i)) − λ] ) = 0.  (4.11)

Since by (4.7) and Theorem 4.2, u(t, z) satisfies the Hamilton–Jacobi equation (∂u/∂t)(t, z) = λ(z, ∇_z u), the function u(t, z) also has to satisfy the equation

det( q_{ij} + δ_{ij}[ (∇_z u(t, z), b(z, i)) − ∂u/∂t (t, z) ] ) = 0,

where we have to choose that solution of the equation for which ∂u/∂t is the root of equation (4.11) for α = ∇_z u(t, z) having the largest real part.
§5. Large Deviations Continued

In this section we prove Theorem 4.1.

We choose a small number Δ > 0 such that T/Δ = n is an integer. Let ψ_s: [0, T] → R^r be a piecewise constant right continuous function having points of discontinuity only at points of the form kΔ, k = 1, 2, …, n − 1. Let us consider the family of random processes

X^{ε,ψ}_t = x + ∫_0^t b(ψ_s, ξ_{s/ε}) ds

and write S^ψ_{0T}(φ) = ∫_0^T L(ψ_s, φ̇_s) ds if φ_s is absolutely continuous and S^ψ_{0T}(φ) = +∞ for the remaining φ ∈ C_{0T}(R^r). The functional S^ψ_{0T}(φ) is lower semicontinuous; the set Φ^ψ_x(s) = {φ ∈ C_{0T}(R^r): S^ψ_{0T}(φ) ≤ s, φ₀ = x} is compact in C_{0T}(R^r). This can be proved in exactly the same way as in Lemma 4.2. Moreover, we note that the functional S^ψ_{0T}(φ) is lower semicontinuous in ψ in the topology of uniform convergence for every φ. This follows easily from Fatou's lemma and the joint lower semicontinuity of L(x, β) in all variables: if ψ^{(n)} → ψ, then

lim inf_{n→∞} S^{ψ^{(n)}}_{0T}(φ) = lim inf_{n→∞} ∫_0^T L(ψ^{(n)}_s, φ̇_s) ds ≥ ∫_0^T lim inf_{n→∞} L(ψ^{(n)}_s, φ̇_s) ds ≥ ∫_0^T L(ψ_s, φ̇_s) ds = S^ψ_{0T}(φ).
Lemma 5.1. Let condition F be satisfied and let the function H(x, α) be differentiable with respect to the variables α. The functional S^ψ_{0T}(φ) is the normalized action functional in C_{0T}(R^r) for the family of processes X^{ε,ψ}_t as ε → 0, with normalizing coefficient f(ε) = ε^{−1}.
Proof. Let α₁, α₂, …, α_n ∈ R^r. We denote by α(s) the piecewise constant function on [0, T] which assumes the value Σ_{k=i}^n α_k for s ∈ ((i−1)Δ, iΔ], i = 1, 2, …, n. The function h^ε(α₁, α₂, …, α_n), where x is the initial condition X^{ε,ψ}_0 = x, is defined by the equality

h^ε(α₁, …, α_n) = ε ln M exp{ ε^{−1} Σ_{k=1}^n (α_k, X^{ε,ψ}_{kΔ}) }.

Taking account of the definition of X^{ε,ψ}_t, we may write

h^ε(α₁, …, α_n) = ε ln M exp{ ε^{−1} ∫_0^T (α(s), b(ψ_s, ξ_{s/ε})) ds } + Σ_{k=1}^n (α_k, x).

This and condition F imply the existence of the limit h(α₁, …, α_n) = lim_{ε→0} h^ε(α₁, …, α_n) and the equality

h(α₁, …, α_n) = ∫_0^T H(ψ_s, α(s)) ds + Σ_{k=1}^n (α_k, x).

It is easy to see that the function h(α₁, …, α_n) is convex in the variables α₁, …, α_n. The differentiability of H(x, α) with respect to the second argument implies the differentiability of h(α₁, …, α_n).
We denote by l(β₁, …, β_n), β_k ∈ R^r, the Legendre transform of h(α₁, …, α_n). The function l(β₁, …, β_n) can be expressed in terms of the Legendre transform L(x, β) of H(x, α) in the following way:

l(β₁, …, β_n) = ∫_0^T L(ψ_s, β̇(s)) ds,  (5.1)

where β(s) is the piecewise linear function on [0, T] having corners at the multiples of Δ and assuming the value β_k at kΔ, β₀ = x. Indeed, if x = 0, then by the definition of the Legendre transformation we have
l(β₁, …, β_n) = sup_{α₁,…,α_n} [ Σ_{k=1}^n (α_k, β_k) − Δ Σ_{k=1}^n H( ψ_{(k−1)Δ}, Σ_{i=k}^n α_i ) ].

Since Σ_{k=1}^n (α_k, β_k) = ( Σ_{i=1}^n α_i, β₁ ) + Σ_{k=2}^n (α_k, β_k − β₁), and since for fixed α₂, …, α_n the sum Σ_{i=1}^n α_i runs over all of R^r as α₁ does, taking the supremum with respect to α₁ gives

l(β₁, …, β_n) = Δ L(ψ₀, β₁/Δ) + sup_{α₂,…,α_n} [ Σ_{k=2}^n (α_k, β_k − β₁) − Δ Σ_{k=2}^n H( ψ_{(k−1)Δ}, Σ_{i=k}^n α_i ) ].

Taking the supremum with respect to α₂ in the last term, as it was taken with respect to α₁ earlier, and then taking the supremum with respect to α₃, and so on, we arrive at the relation

l(β₁, …, β_n) = Σ_{k=1}^n Δ L( ψ_{(k−1)Δ}, (β_k − β_{k−1})/Δ ),  β₀ = 0,

which is equivalent to (5.1) for x = 0. Taking into account that l(β₁, …, β_n) = l⁰(β₁ − x, …, β_n − x), where l⁰ corresponds to the initial condition 0, we obtain (5.1) for an arbitrary initial condition x ∈ R^r.
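Identity (5.1) is easy to verify numerically in a toy case. Assume the quadratic Hamiltonian H(x, a) = a²/2, independent of x, so ψ plays no role and L(x, b) = b²/2; these choices are illustrative assumptions, not the text's general setting. Then h is the quadratic form below, and its Legendre transform must coincide with Σ_k Δ L(ψ_{(k−1)Δ}, (β_k − β_{k−1})/Δ):

```python
import numpy as np

n, delta = 5, 0.2                 # T = n * delta = 1, scalar case r = 1

# h(alpha) = Delta * sum_k H(psi, sum_{i>=k} alpha_i) with H(x, a) = a**2/2
# is the quadratic form alpha^T M alpha / 2 with M as follows:
U = np.triu(np.ones((n, n)))      # (U @ alpha)[k] = sum_{i>=k} alpha_i
M = delta * U.T @ U

beta = np.array([0.3, 0.1, -0.2, 0.4, 0.0])

# Legendre transform of a positive definite quadratic form a -> a^T M a / 2
# is b -> b^T M^{-1} b / 2 (the supremum is attained at alpha = M^{-1} beta).
l_numeric = 0.5 * beta @ np.linalg.solve(M, beta)

# Right-hand side of (5.1): sum_k Delta * L(psi, (beta_k - beta_{k-1})/Delta)
# with L(x, b) = b**2/2 and beta_0 = x = 0.
incr = np.diff(np.concatenate(([0.0], beta)))
l_formula = np.sum(incr**2) / (2 * delta)

print(l_numeric, l_formula)       # the two values agree
```

The change of variables σ_k = Σ_{i≥k} α_i used in the derivation corresponds exactly to the matrix U here, which is why the inverse quadratic form decomposes into the increments β_k − β_{k−1}.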
Let us put η^ε_k = X^{ε,ψ}_{kΔ}. If X^{ε,ψ}_t is a process in the r-dimensional space, then η^ε = (η^ε_1, …, η^ε_n) is a random variable with values in (R^r)^n, the product of n copies of R^r. It follows from the definition of h^ε(α₁, …, α_n) that h(α₁, …, α_n) = lim_{ε→0} ε ln M exp{ε^{−1}(α, η^ε)}. We write B_Δ(s) = {e ∈ (R^r)^n: l(e₁, …, e_n) ≤ s} for s < ∞ and ρ(e, g) = max_k |e_k − g_k|, where e = (e₁, …, e_n) and g = (g₁, …, g_n) are points of (R^r)^n. It follows from Theorem 1.2 of Ch. 5 that for any s, δ, γ > 0 and e ∈ (R^r)^n we have

P{ρ(η^ε, e) < δ} ≥ exp{−ε^{−1}(l(e₁, …, e_n) + γ)},

P{ρ(η^ε, B_Δ(s)) > δ} ≤ exp{−ε^{−1}(s − γ)}  (5.2)

for sufficiently small ε.

Let φ ∈ C_{0T}(R^r), φ₀ = x, S^ψ_{0T}(φ) < ∞, δ > 0. We write φ^Δ = (φ_Δ, φ_{2Δ}, …, φ_{nΔ}) ∈ (R^r)^n. For sufficiently small δ′ = δ′(δ) and Δ = Δ(δ) we have

P{ρ_{0T}(X^{ε,ψ}, φ) < δ} ≥ P{ρ(η^ε, φ^Δ) < δ′}.

This inequality follows from the fact that the trajectories of X^{ε,ψ} and any function φ for which S^ψ(φ) < ∞ satisfy a Lipschitz condition. Estimating the right side of the last inequality by means of the first of the inequalities (5.2), we obtain

P{ρ_{0T}(X^{ε,ψ}, φ) < δ} ≥ exp{ −ε^{−1}( l(φ_Δ, …, φ_{nΔ}) + γ ) } = exp{ −ε^{−1}( ∫_0^T L(ψ_s, φ̇^Δ_s) ds + γ ) }  (5.3)

for every γ > 0 and sufficiently small ε, where φ^Δ_s is the piecewise linear function having corners at the multiples of Δ and coinciding with φ_s at these points.
Taking account of the absolute continuity of φ_s and the convexity of L(x, β) in β, we obtain

∫_0^T L(ψ_s, φ̇^Δ_s) ds = Σ_{k=1}^n Δ L( ψ_{(k−1)Δ}, (1/Δ) ∫_{(k−1)Δ}^{kΔ} φ̇_s ds )
≤ Σ_{k=1}^n ∫_{(k−1)Δ}^{kΔ} L(ψ_{(k−1)Δ}, φ̇_s) ds = ∫_0^T L(ψ_s, φ̇_s) ds = S^ψ_{0T}(φ).  (5.4)

Relations (5.3) and (5.4) imply the first of the two inequalities in the definition of the action functional:

P{ρ_{0T}(X^{ε,ψ}, φ) < δ} ≥ exp{−ε^{−1}(S^ψ_{0T}(φ) + γ)}.  (5.5)

To prove the second inequality, we note that we have the inclusion

{ω ∈ Ω: ρ_{0T}(X^{ε,ψ}, Φ^ψ_x(s)) > δ} ⊆ {ω ∈ Ω: ρ(η^ε, B_Δ(s)) > δ′}
for sufficiently small Δ(δ) and δ′(δ), where Φ^ψ_x(s) = {φ ∈ C_{0T}(R^r): φ₀ = x, S^ψ_{0T}(φ) ≤ s}. This inclusion and the second estimate in (5.2) imply the required inequality

P{ρ_{0T}(X^{ε,ψ}, Φ^ψ_x(s)) > δ} ≤ exp{−ε^{−1}(s − γ)}.  (5.6)

As has been remarked, the compactness of Φ^ψ_x(s) can be proved in the same way as in Lemma 4.2. Lemma 5.1 is proved.

As usual, this lemma and the lower semicontinuity of S^ψ_{0T}(φ) imply that

−inf_{φ∈(A)} S^ψ(φ) ≤ lim inf_{ε→0} ε ln P{X^{ε,ψ} ∈ A} ≤ lim sup_{ε→0} ε ln P{X^{ε,ψ} ∈ A} ≤ −inf_{φ∈[A]} S^ψ(φ)  (5.7)

for every A ⊆ C^x_{0T}(R^r). Here (A) is the set of points of A interior with respect to the space C^x_{0T}(R^r).

Lemma 5.2. Suppose that the function H(x, α) is differentiable with respect to the variables α. Let ψ^{(n)}: [0, T] → R^r be a sequence of step functions uniformly converging to some function φ ∈ C_{0T}(R^r) as n → ∞. Then there exists a sequence φ^{(n)} ∈ C_{0T}(R^r), uniformly converging to φ, such that

lim sup_{n→∞} ∫_0^T L(ψ^{(n)}_s, φ̇^{(n)}_s) ds ≤ S_{0T}(φ).
Proof. For an arbitrary partition 0 = t₀ < t₁ < t₂ < ⋯ < t_n = T we have

∞ > S_{0T}(φ) = ∫_0^T L(φ_s, φ̇_s) ds = ∫_0^T sup_α [(φ̇_s, α) − H(φ_s, α)] ds
≥ Σ_{k=1}^n ∫_{t_{k−1}}^{t_k} sup_α [(φ̇_s, α) − H(φ_s, α)] ds
≥ Σ_{k=1}^n sup_α [ (φ_{t_k} − φ_{t_{k−1}}, α) − ∫_{t_{k−1}}^{t_k} H(φ_s, α) ds ].  (5.8)
We put

γ_k(α) = ∫_{t_{k−1}}^{t_k} H(φ_s, α) ds,  l_k(β) = sup_α [(α, β) − γ_k(α)].

It follows from the hypotheses of the lemma that the function γ_k(α) is convex and differentiable. The function l_k(β) is convex, nonnegative, and lower semicontinuous. Relation (5.8) can be rewritten in the form

∞ > S_{0T}(φ) ≥ Σ_{k=1}^n l_k( φ_{t_k} − φ_{t_{k−1}} ).

It is known (cf. Rockafellar [1]) that if l_k(β) is a lower semicontinuous convex function and β* ∈ {β: l_k(β) < ∞} = A_k, then l_k(β*) = lim_{β→β*} l_k(β), where the points β belong to ri A_k, the interior of A_k with respect to its affine hull.¹ Therefore, for every δ > 0 there exists a function φ̃_t: [0, T] → R^r such that

sup_{0≤t≤T} |φ̃_t − φ_t| < δ,  φ̃₀ = φ₀,  φ̃_{t_k} − φ̃_{t_{k−1}} ∈ ri A_k

and

Σ_{k=1}^n l_k( φ_{t_k} − φ_{t_{k−1}} ) ≥ Σ_{k=1}^n l_k( φ̃_{t_k} − φ̃_{t_{k−1}} ) − δ.

For such a function we have

S(φ) ≥ Σ_{k=1}^n l_k( φ̃_{t_k} − φ̃_{t_{k−1}} ) − δ.  (5.9)
For points β ∈ ri A_k, the supremum in the definition of l_k(β) is attained (cf. Rockafellar [1]). By virtue of this, there exist α_k ∈ R^r such that l_k(φ̃_{t_k} − φ̃_{t_{k−1}}) = (α_k, φ̃_{t_k} − φ̃_{t_{k−1}}) − γ_k(α_k); the α_k satisfy the relation

φ̃_{t_k} − φ̃_{t_{k−1}} = ∇γ_k(α_k).  (5.10)

Let ψ^{(m)} be any sequence of step functions uniformly converging to φ as m → ∞. We choose φ^{(m)} according to the conditions: φ^{(m)}_0 = φ₀, φ̇^{(m)}_s = ∇_α H(ψ^{(m)}_s, α_k) for s ∈ (t_{k−1}, t_k]. Then φ^{(m)} converges to φ̃ at the points 0, t₁, t₂, …, t_n = T as m → ∞. Indeed,

φ^{(m)}_{t_k} − φ^{(m)}_{t_{k−1}} = ∫_{t_{k−1}}^{t_k} ∇_α H(ψ^{(m)}_s, α_k) ds → ∫_{t_{k−1}}^{t_k} ∇_α H(φ_s, α_k) ds = ∇γ_k(α_k) = φ̃_{t_k} − φ̃_{t_{k−1}}.

¹ The affine hull (aff A) of a set A ⊆ R^r is defined by the equality

aff A = { γ₁x₁ + ⋯ + γ_m x_m : x₁, …, x_m ∈ A, Σ_{k=1}^m γ_k = 1 }.
Here we have used the fact that the convergence of the convex functions $\gamma^{(m)}_k(\alpha) = \int_{t_{k-1}}^{t_k} H(\psi^{(m)}_s, \alpha)\, ds$ to the differentiable function $\gamma_k(\alpha)$ as $m \to \infty$ implies the convergence $\nabla\gamma^{(m)}_k(\alpha) \to \nabla\gamma_k(\alpha)$ (Rockafellar [1]). Since $\dot\varphi^{(m)}_s = \nabla_\alpha H(\psi^{(m)}_s, \alpha_k)$ for $s \in (t_{k-1}, t_k]$, we have

\[
L(\psi^{(m)}_s, \dot\varphi^{(m)}_s)
= \sup_\alpha \bigl[(\dot\varphi^{(m)}_s, \alpha) - H(\psi^{(m)}_s, \alpha)\bigr]
= (\dot\varphi^{(m)}_s, \alpha_k) - H(\psi^{(m)}_s, \alpha_k).
\tag{5.11}
\]

It follows from (5.10) and (5.11) that

\[
\int_0^T L(\psi^{(m)}_s, \dot\varphi^{(m)}_s)\, ds
= \sum_{k=1}^n \int_{t_{k-1}}^{t_k} \sup_\alpha \bigl[(\dot\varphi^{(m)}_s, \alpha) - H(\psi^{(m)}_s, \alpha)\bigr]\, ds
= \sum_{k=1}^n \Bigl[\int_{t_{k-1}}^{t_k} (\dot\varphi^{(m)}_s, \alpha_k)\, ds - \int_{t_{k-1}}^{t_k} H(\psi^{(m)}_s, \alpha_k)\, ds\Bigr]
\longrightarrow \sum_{k=1}^n l_k(\tilde\varphi_{t_k} - \tilde\varphi_{t_{k-1}}) \le S_{0T}(\varphi) + \delta
\tag{5.12}
\]

as $m \to \infty$. Choosing $\delta$ and the intervals between the points $t_k$ sufficiently small, we can bring $\varphi^{(m)}_t$ and $\varphi_t$ arbitrarily close to each other. This and (5.12) imply the assertion of Lemma 5.2.
Now we pass directly to the proof of Theorem 4.1. Let $\varphi \in C_{0T}(R^r)$ and $S_{0T}(\varphi) < \infty$. We choose a step function $\psi_s$ and a function $\varphi'$ such that

\[
\rho_{0T}(\varphi', \varphi) < \lambda, \qquad
\sup_{0 \le t \le T} |\psi_t - \varphi_t| < \lambda, \qquad
\int_0^T L(\psi_s, \dot\varphi'_s)\, ds \le S_{0T}(\varphi) + \gamma.
\]

This can be done according to Lemma 5.2. It is easy to derive from the boundedness of the functions $b^k(x, y)$ and their derivatives that

\[
\{\omega: \rho_{0T}(X^\varepsilon, \varphi) < \delta\} \supseteq \{\omega: \rho_{0T}(X^{\varepsilon,\psi}, \varphi') < \delta'\}
\]

for any $\delta > 0$, provided that $\lambda = \lambda(\delta)$ and $\delta' = \delta'(\delta)$ are sufficiently small. This inclusion and estimate (5.5) imply the inequality

\[
P\{\rho_{0T}(X^\varepsilon, \varphi) < \delta\}
\ge P\{\rho_{0T}(X^{\varepsilon,\psi}, \varphi') < \delta'\}
\ge \exp\{-\varepsilon^{-1}(S^{\psi}_{0T}(\varphi') + \gamma)\}
\ge \exp\{-\varepsilon^{-1}(S_{0T}(\varphi) + 2\gamma)\}
\tag{5.13}
\]

for sufficiently small $\varepsilon$.
Now we prove the second inequality in the definition of the action functional. First of all we note that since $|b(x, y)|$ is bounded, the trajectory of $X^\varepsilon_t$, just as well as that of $X^{\varepsilon,\psi}_t$, issued from a point $x \in R^r$, belongs to some compactum $F \subset C_{0T}(R^r)$ for $t \in [0, T]$. It follows from the lower semicontinuity of the functional $S_{0T}(\varphi)$ in $\varphi$ that for any $\gamma > 0$ there exists $\delta_\gamma = \delta_\gamma(\varphi)$ such that $S_{0T}(\tilde\varphi) > s - \gamma/2$ whenever $\rho_{0T}(\varphi, \tilde\varphi) < \delta_\gamma(\varphi)$ and $S_{0T}(\varphi) \ge s$. Relying on the semicontinuity of $S_{0T}(\varphi)$ in $\varphi$, we conclude that the functional $\delta_\gamma(\varphi)$ is lower semicontinuous in $\varphi \in C_{0T}(R^r)$. Consequently, $\delta_\gamma(\varphi)$ attains its infimum on every compactum. Let $F_1$ be the compactum obtained from $F$ by omitting the $\delta/2$-neighborhood of the set $\Phi_x(s) = \{\varphi \in C_{0T}(R^r): \varphi_0 = x,\ S_{0T}(\varphi) \le s\}$. Let us write $\delta_\gamma = \inf_{\varphi \in F_1} \delta_\gamma(\varphi)$, $\delta' = \delta_\gamma/(4KT + 2)$, where $K$ is a Lipschitz constant of $b(x, y)$.

Let us choose a finite $\delta'$-net in $F$ and let $\varphi^{(1)}, \ldots, \varphi^{(N)}$ be the elements of this net not belonging to $\Phi_x(s)$. It is obvious that

\[
P\{\rho_{0T}(X^\varepsilon, \Phi_x(s)) > \delta\} \le \sum_{i=1}^N P\{\rho_{0T}(X^\varepsilon, \varphi^{(i)}) < \delta'\}
\tag{5.14}
\]

if $\delta' < \delta$. From the Lipschitz continuity of $b(x, y)$ we obtain, for step functions $\psi$ with $\rho_{0T}(\psi, \varphi) < \delta'$, the inclusion

\[
\{\omega: \rho_{0T}(X^\varepsilon, \varphi) < \delta'\} \subseteq \{\omega: \rho_{0T}(X^{\varepsilon,\psi}, \varphi) < (2KT + 1)\delta'\}.
\tag{5.15}
\]
We choose step functions $\psi^{(1)}, \psi^{(2)}, \ldots, \psi^{(N)}$ such that $\rho_{0T}(\psi^{(i)}, \varphi^{(i)}) < \delta'/2$ for $i = 1, 2, \ldots, N$. Relations (5.14) and (5.15) imply the estimate

\[
P\{\rho_{0T}(X^\varepsilon, \Phi_x(s)) > \delta\} \le \sum_{i=1}^N P\{\rho_{0T}(X^{\varepsilon,\psi^{(i)}}, \varphi^{(i)}) < (2KT + 1)\delta'\}.
\tag{5.16}
\]

Every term on the right side of the last inequality can be estimated by means of consequence (5.7) of Lemma 5.1. For every $i = 1, \ldots, N$ we have the estimate

\[
P\{\rho_{0T}(X^{\varepsilon,\psi^{(i)}}, \varphi^{(i)}) < \delta_\gamma/2\}
\le \exp\bigl\{-\varepsilon^{-1}\bigl[\inf\{S^{\psi^{(i)}}(\varphi): \rho_{0T}(\varphi, \varphi^{(i)}) < \delta_\gamma/2\} - \gamma/4\bigr]\bigr\}
\]

for sufficiently small $\varepsilon$. It follows from the definition of $\delta_\gamma$ that the infimum on the right side of the last inequality is not less than $s - \gamma/2$, and consequently,

\[
P\{\rho_{0T}(X^{\varepsilon,\psi^{(i)}}, \varphi^{(i)}) < \delta_\gamma/2\} \le \exp\{-\varepsilon^{-1}(s - \gamma)\}.
\tag{5.17}
\]

From (5.16) and (5.17) we obtain the upper estimate

\[
P\{\rho_{0T}(X^\varepsilon, \Phi_x(s)) > \delta\} \le N \exp\{-\varepsilon^{-1}(s - \gamma)\}
\]

for sufficiently small $\varepsilon$; since $\gamma > 0$ is arbitrary, this is the required bound. Theorem 4.1 is completely proved.
§6. The Behavior of the System on Large Time Intervals

As we have seen in Chapters 4 and 6, the behavior, on large time intervals, of a random process obtained as a result of small perturbations of a dynamical system is determined to a great extent by the character of large deviations. In this section we discuss a series of results concerning the behavior of the process $X^\varepsilon_t$, the solution of system (1.2). These results are completely analogous to those expounded in Chapters 4 and 6. Because of this, we only give brief outlines of proofs, concentrating on the differences. We shall consider the case where the fast motion, i.e., $\xi_t$, is a Markov process with a finite number of states.

Let $O \in R^r$ be an asymptotically stable equilibrium position of the averaged system (2.2) and let $D$ be a bounded domain containing $O$ whose boundary $\partial D$ has a continuously rotating normal. We assume that the trajectories $\bar x_t$ beginning at points $\bar x_0 = x \in D \cup \partial D$ converge to $O$ without leaving $D$ as $t \to \infty$. We put $\tau^\varepsilon = \inf\{t: X^\varepsilon_t \notin D\}$. In contrast with the case considered in Ch. 4, $\tau^\varepsilon$ may be infinite with positive probability in general. We introduce the function

\[
V(x, y) = \inf_{\varphi \in H_{x,y}} S(\varphi),
\]

where $H_{x,y}$ is the set of functions $\varphi$ with values in $R^r$, defined on all possible intervals $[0, T]$, $T > 0$, for which $\varphi_0 = x$, $\varphi_T = y$. The function $V(x, y)$ can be expressed in terms of the function $u_x(t, y)$ appearing in §4: $V(x, y) = \inf_{t > 0} u_x(t, y)$. It follows from the lower semicontinuity of $S(\varphi)$ that $V(x, y)$ is lower semicontinuous; it can be equal to $+\infty$. We write $V(x, \partial D) = \inf_{y \in \partial D} V(x, y)$.
Theorem 6.1 (Freidlin [11]). Let the hypotheses of Theorem 4.2 be satisfied. Suppose that for every $y \in \partial D$ we have

\[
(b(y, i), n(y)) > 0
\tag{6.1}
\]

for some $i = i(y)$, where $n(y)$ is the exterior normal vector to $\partial D$ at $y$. If $V(O, \partial D) < \infty$, then

\[
\lim_{\varepsilon \downarrow 0} \varepsilon \ln M_x \tau^\varepsilon = V(O, \partial D)
\]

for any $x \in D$. If $V(O, \partial D) = +\infty$, then

\[
P_O\{\tau^\varepsilon = \infty\} = 1
\tag{6.2}
\]

for any $\varepsilon > 0$.

For the proof of this theorem we need the following lemma.
250
7. The Averaging Principle
Lemma 6.1. Suppose that the hypotheses of Theorem 6.1 are satisfied. If $V(x_0, \partial D) < \infty$ for some $x_0 \in D$, then $V(x, \partial D)$ is continuous at $x_0$. If $V(O, \partial D) < \infty$, then $V(x, \partial D) \le V(O, \partial D) < \infty$ for all $x \in D$ and $V(x, \partial D)$ is continuous.
Proof. Let $X^\varepsilon_t$ and $Z^\varepsilon_t$ be solutions of equation (1.2) issued from $x_0$ and $z$, respectively, at time zero. For the difference $X^\varepsilon_t - Z^\varepsilon_t$ we have the inequality

\[
|X^\varepsilon_t - Z^\varepsilon_t| \le |x_0 - z| + K \int_0^t |X^\varepsilon_s - Z^\varepsilon_s|\, ds, \qquad t \ge 0.
\]

From this, by means of Lemma 1.1 of Ch. 2 we obtain

\[
|X^\varepsilon_t - Z^\varepsilon_t| \le e^{KT}|x_0 - z|, \qquad 0 \le t \le T.
\tag{6.3}
\]
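Estimate (6.3) is a Gronwall-type bound, and it survives discretization exactly: two Euler polygons driven by the same fast variable and started at nearby points stay within $e^{Kt}|x_0 - z|$ of each other when the right-hand side is Lipschitz in $x$ with constant $K$. A sketch with an illustrative right-hand side $b(x, y) = -x + \sin x + y$ (Lipschitz constant $K = 2$ in $x$; this $b$ is a stand-in, not taken from the text):

```python
import math

def b(x, y):
    # illustrative right-hand side; Lipschitz in x with constant K = 2
    return -x + math.sin(x) + y

K, h, T = 2.0, 1e-3, 1.0
xi = [math.cos(3.7 * k) for k in range(int(T / h))]  # a frozen "fast" driving sequence

def euler(x0):
    x, path = x0, [x0]
    for k in range(int(T / h)):
        x += h * b(x, xi[k])
        path.append(x)
    return path

X = euler(0.5)
Z = euler(0.6)
# discrete Gronwall bound: |X_t - Z_t| <= e^{K t} |x0 - z0| at every step
ok = all(abs(xk - zk) <= math.exp(K * k * h) * 0.1 + 1e-12
         for k, (xk, zk) in enumerate(zip(X, Z)))
assert ok
```

The induction behind the check is one line: each Euler step multiplies the gap by at most $1 + hK \le e^{hK}$.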
If $V(x_0, \partial D) < \infty$, then taking account of condition (6.1), for any $\gamma > 0$ and sufficiently small $\delta > 0$ we can construct a function $\varphi_t$, $t \in [0, T]$, $\varphi_0 = x_0$, $\rho(\varphi_T, D) > \delta$, for which $S_{0T}(\varphi) < V(x_0, \partial D) + \gamma/4$. Relying on Theorem 4.1, from this we conclude that

\[
P\{\tau_{\delta/2} \le T\} \ge \exp\{-\varepsilon^{-1}(V(x_0, \partial D) + \gamma/2)\}
\tag{6.4}
\]

for sufficiently small $\varepsilon > 0$, where $\tau_{\delta/2} = \inf\{t: \rho(X^\varepsilon_t, D) \ge \delta/2\}$. Let $|z - x_0| < e^{-KT}\delta/4$ and let $A$ be the event that the trajectory of $Z^\varepsilon_t$ leaves $D$ before time $T$. It follows from (6.3) and (6.4) that

\[
P(A) \ge \exp\{-\varepsilon^{-1}(V(x_0, \partial D) + \gamma/2)\}.
\]

On the other hand, $P(A)$ can be estimated from above according to Theorem 4.1: we have

\[
P(A) \le \exp\Bigl\{-\varepsilon^{-1}\Bigl(\inf_{\varphi \in A} S_{0T}(\varphi) - \gamma/4\Bigr)\Bigr\}
\]

for sufficiently small $\varepsilon > 0$, where $A$ is the set of functions belonging to $C_{0T}(R^r)$ such that $\varphi_0 = z$ and $\varphi_t$ leaves $D$ before time $T$. We obtain from the last two estimates that

\[
V(z, \partial D) \le \inf_{\varphi \in A} S_{0T}(\varphi) \le V(x_0, \partial D) + \gamma
\]

for $|x_0 - z| < e^{-KT}\delta/4$. On the other hand, the lower semicontinuity of $V(x, \partial D)$ implies the inequality

\[
V(z, \partial D) > V(x_0, \partial D) - \gamma,
\]
provided that b is sufficiently small. Therefore, I V(z, OD) - V(xo, 3D) I < y for sufficiently small 6. The continuity at xo is proved. If V(O, OD) < oo, then V(x, aD) is continuous at O. Let S' be such that V(z, OD) - V (O, 0) I < y for I z - O 1 < S'. Since the trajectory of the averaged system, issued from a point x e D, necessarily hits the 6'-neighborhood of 0 and the action functional vanishes at trajectories of the averaged system, we have V(x, OD) < V(O, 3D) + y. By virtue of the arbitrariness of y, this implies that V(x, OD) < oo, and consequently, V(x, OD) is continuous everywhere in D. Lemma 6.1 is proved.
Now we outline the proof of Theorem 6.1. Due to the continuity of V(x, 3D), the proof of the first assertion of the theorem is completely analogous to that of Theorem 2.1 in Ch. 4. We only have to take into account that the process X; which we are now considering is not a Markov process in general, and therefore, we need to consider the pair (X', This pair forms a Markov process.
We prove the last assertion of the theorem. We assume the contrary. Without loss of generality, we may assume that (6.2) is not satisfied for e = 1:
the trajectories of X; issued from the point Xo = 0 leave D with positive probability. Then by condition (6.1) the trajectories of X, leave some 6neighborhood D+a of D, S > 0, with positive probability: for some T we have
Po{ra < T I > a > 0,
(6.5)
where r. = inf{t: X, * D+a}.
Let 0 < t, <
0.
io, i t, ... , i,_ 1
be a sequence of
integers. We consider the following set of step functions on [0, T]: ii(s) = ik for se [Sk, Sk+1),
k =0,...,n- 1; SO
0,ISk-tk1 <6
fort0, moments of time ti,..., t,- 1, integers io, i,, ... , in _ 1 and 6' > 0 such that the solution of the equation
Y, = b(y,, 0(t)),
yo = 0,
goes out of D+a/2 before time T for any function iIi e A;o;;((5'). It is easy to derive from the Markov property of , that for some c < oo we have PJx lbf/E
t.,-1
e - EE
t
7. The Averaging Principle
252
Taking account of this estimate, we arrive at the conclusion:
P{T`e-"-' for any > 0. On the other hand, by Theorem 4.1 we have
hme In PIT' < T} < - inf S0T(cp),
(6.7)
(peliDr
CIO
where H;' = {cp E CoT(R'): epo = 0, cps e OD for some s e [0, T]). Since V(0, 3D) < S0T(cp), we obtain from (6.6) and (6.7) that V(O, OD) < inf S(cp) < (QE IIT
P
< T} < c < x,
CIO
which contradicts the condition V(O, OD) = + x. The contradiction thus obtained proves the last assertion of Theorem 6.1. We note that for V(O, 3D) to be finite it is sufficient, for example, that for
some i = 1, 2, ..., N the solution of the equation z, = b(x,. i),
xo = 0,
leave D. Analogously to §6, Ch. 6, for small a we can solve the problem of exit of X`
from a domain containing several equilibrium positions or limit sets of a more general form. We only have to take into account that in the situation considered in this section V(x, y) may be equal to + x and the corresponding passages may not be possible in general. In particular, as we have seen in Theorem 6.1, there may exist absorbing limit sets, i.e., limit sets with the
property that if a trajectory hits the domain of attraction of the set, then it never leaves it. Analogously, there may exist absorbing cycles if, as was done in Ch. 6, we consider the decomposition of all limit sets into a hierarchy of cycles.
If the condition of connectedness: for any indices i, j of the limit sets K...... K, of system (2.2), V(K;, K) = V(x, y)1x E K,. Y E A; < a . is satisfied, then all results of §6, Ch. 6 on fibering into cycles remain true. In conclusion, we briefly treat invariant measures of X and their behavior as a 10. If the projection of the averaged vector field b(x) onto the position vector connecting the origin of coordinates with the point x is negative and separated from zero uniformly in x 0 F, where F is a compactum in R', then it can be proved that every process X; has an invariant measure, at least for sufficiently small E. This invariant measure is not unique in general. If we do
not make any additional assumptions, then in general, the family p of invariant measures of X; has many cluster points (in the topology of weak
253
17. Not Very Large Deviations
convergence of measures) as E 10. If we assume that the condition of con-
nectedness is satisfied, then there is only one cluster point in the case of general position. It can be described by means of the construction of {i}graphs, considered in the preceding chapter (cf. Freidlin [11]).
§7. Not Very Large Deviations We have already considered deviations of order E'/2 and of order I of X; from x,. Here we discuss briefly deviations of order E", where x e (0, 2) (cf. Baier and Freidlin [1]). These deviations have common features with both deviations of order I (their probabilities converge to zero) and deviations of order 011 (they are determined by the behavior of the functions b(x, y) near the averaged trajectory). In connection with this, deviations of order E", x e (0, 2), are governed, as in the case of normal approximation, by the system obtained from (1.2) by linearization near r,. For the sake of brevity, in this section we restrict ourselves to deviations from an equilibrium position of the averaged system, i.e., we shall assume that the initial condition in equation (1.2) is an equilibrium position of the
averaged system and it coincides with the origin of coordinates: X' = 0, 5(0) = 0. We assume that this equilibrium position is asymptotically stable in the first approximation. We introduce the notation Ob'
b(0, y) =.1(y),
(0, y) = B}(y),
- B;(cs) ds = B,, T-.T J0
B(y) = (B'(y)),
T
lim
B = (B;).
We assume that the limits exist in probability and B. = (8b'; axi)(0). We shall say that condition F" is satisfied if for any step function as: [0, T] R' we have lim
Et-2" In M
exp{E"-'
T(as,fds} = 1 f T(C;, as) ds, 2
Jo
610
(7.1)
0
where C is a constant symmetric matrix of order r. It is easy to see that (7.1) for as = a implies the equality rT
lim T 2"-' In M exp T T
co
l
(a, f (55)) ds } = 2 Ca, a). J0
}}}
(7.2)
254
7. The Averaging Principle
Under certain assumptions, which we shall not specify, C is the matrix of second derivatives of the function H(x, a) introduced in §4 with respect to the
variables a evaluated at x = 0, a = 0. In what follows, for simplicity, we shall assume that det C # 0. If C is singular but has a nonzero eigenvalue, then all results can essentially be preserved; only their formulation becomes more complicated. For functions belonging to CoT(R') we define the functional rT S(W) = SOT((P) = 2
J0
(C-'(SOS
- &p:), GPs - B(PS) ds,
if gyp, is absolutely continuous; for the remaining 9 E COT(R') we put S((p) = + oo. We have already encountered this functional in Ch. 4; S((p) is the normalized action functional for the family of Gaussian processes t;; defined by the differential equation It
= Bt;; + AC112u,1,
(7.3)
where w, is a white noise process. Below we clarify the relationship between our problem and equation (7.3)
Theorem 7.1. Suppose that condition F" is satisfied and det C # 0. For some
y > I - 2x let lim e'' In P10:5sup
I
fe1s-f,1ef(
e-x ft(B(/E) - B)
1ST
ff/E) duds
0
I
> S} _ -00 J
(7.4)
for any b > 0. Moreover, suppose that there exist to e (0, T] and a function 6(t), 6(t) > 0, lim,lo 6(t) = 0, such that ex-1
HE
sup
110
ESf<tp
e'-Zx
In M exp±
OShST-t
h+f J(,/E) ds
t) fh
< ao.
Then e2x-'SOT(cp) is the action functional for the family of processes Z; _ e-"X;, x e (0, 2) in the space COT(R') as e 10, where X; is the solution of equation (1.2) with the initial condition Xo = 0 (we recall that b(0) = 0).
The proof of this theorem is a combination of some arguments applied in the proofs of Theorems 3.1 and 4.1. Therefore, we only outline it. Firstly, it
can be proved that e2"-'S(ip) is the action functional for the family of processes Z, if e2"-'S((p) is the action functional for the processes Z; where . is the solution of the linearized system
f=(
t/E) + Blbt/E)"t,
0=
0.
255
§7. Not Very Large Deviations
Then, using (7.4), it can be proved that the estimates of Z;, appearing in the definition of the action functional are satisfied if the same estimates hold for Z, = E-"Xr, where X; is the solution of the equation X, = .1(
(7.5)
) + R91 E*
To obtain estimates for k;, first we calculate the action functional for the family of processes il; = E-" f o f (bs/e) ds. By means of Theorem 3.1 of Ch. 3, from this we calculate the action functional of 2;. Finally, the calculation of the action functional for the processes 1l; can be carried out in the same way as it was done in Lemma 4.3 (cf. also Gartner [1]). In the course of this, we use the last condition of the theorem.
Consequently, the functional S(cp) and the normalizing coefficient f(E) = s2i-' characterize deviations of order E", xe(0,1). In accordance with Theorem 3.1, the process E-'12X; converges to a Gaussian process l;° as E -s 0. As is easy to see, in our case this Gaussian process C° satisfies equation
E-1/2X1 (7.3) for A = 1. The ratio X' s' can be written in the form E1/2 and we can interpret Theorem 3.1 in the following way: the probabilities of deviations of order Ex, x e (0, 1), have the same asymptotics as deviations of order I caused by the Gaussian noise ACrv, A = e112-x. Of course, this circumstance is in perfect accordance with the fact that large but not very large deviations for sums of independent terms have the same asymptotics (in the principal terms) as the corresponding deviations of the normal approximation (cf. Ibragimov and Linnik [1]). Now we consider some random processes for which the hypotheses of Theorem 7.1 can be verified and the matrix C can be calculated.
Theorem 7.2. Let S, be a stochastically continuous Markov process with a finite number of states, let pi,(t) be its transition probabilities, and let qij = dpi,4t)/dt1t=0. Suppose that all qij are different from zero. Let us denote by ,1(a) the eigenvalue of the matrix Q' = (q, -), q7 = q, + S,/ct, b(0, i)) with the largest real part. Then the hypotheses of Theorem 7.1 are satisfied for x e (0, 1) and C = (C''), where C" = a2A(a)/aaiaaj Iz = 0.
The proof of this theorem uses the construction applied in the proof of Theorem 4.2. First, in the same way as in Lemma 4.3, it can be proved that in the case of the process S condition F" follows from relation (7.2). To prove (7.2), we note that M1
exp{t-x
l
J(f(s)) ds}
(Ti X'I)(i),
(7.6)
= where T; is the semigroup of operators acting in the space of functions g(i) defined on the phase space of c, according to the formula (T2g)(i) =
o
J
eXP{ ,(a, o
f
dsl}. J
256
7. The Averaging Principle
Using the notation introduced in the proof of Theorem 4.2, we obtain t2x- I In c + t2x- 1 ln(T;
1)(i) <
t2x- I ln(T,'-'2e)(i) t2x2(t-xa) + t2x-1 In ei
=
5
t2x- I ln(T;
21)(i).
We note that I is the largest eigenvalue in absolute value of T, for a = 0. Therefore, A(O) = 0. Taking into account that 6(0) = 0, it is easy to see that all first derivatives of A(a) vanish at a = 0. Therefore,
2(a) =
1I
02A
2 iJ aaiaaJ
(O)a;aJ + o(Ia12)
as I a I -+ 0. On account of the last equality, it follows from (7.7) that lim t2x-1 in(T; '21)(i) _ 1_00
'
1
01).
2 J., aaiaaJ
(0)ataJ = I(C c, a).
This and (7.6) imply (7.2) and thus condition P. The verification of the remaining conditions of Theorem 7.1 is left to the reader. An analogous result holds, of course, for some other Markov processes, for example, if , is a nondegenerate diffusion process on a compact space. Similar arguments enable us to verify conditions F and P for some nonMarkov processes with good mixing properties, as well (cf. Sinai [1]). We write out the Hamilton-Jacobi equation for the function u(t, x) _ inf {So,((p): tpa = 0, (p, = x). As has been said, the principal terms of the asymptotics of many quantities of interest can be expressed in terms of this function. By calculating the Legendre transform of the function L(x, /f) = J(C- 1(/f - Dx), /3 - Bx), we arrive at the following equation for u(t, x): au
at = Z(COx u, Vx u) + (Dx, V u).
It is also possible to write out Euler's equations for the extremals; in the case
under consideration, these equations are linear. Theorem 7.7 implies in particular that -min{u(t, x): IxI = d, t c- [0, T]}
lim eI -2x In P10:stsT sup I X' I > ex d E-0
JJ
for x E (0, 1).
The deviations of order ex determine the average time needed by the trajectories of the process X;, Xo = 0, to exit from a domain D` containing 0
257
p. Examples
if DE is
obtained from a given domain D by a stretching with coefficient
e: DE = Ex . D. Let TE = inf {t: X; a DE},
V(x) = inf u(t, x). tao
Theorem 7.3. Suppose that the hypotheses of Theorem 7.2 are satisfied, the matrix C is nonsingular and DE = Ex - D, x e (0, z), where D is a bounded domain with smooth boundary. Let us put VO = min..OD V(x). Then lim
E'-2x In MOTE
= V0,
E10
lim
P0{eL2K t(v--Y)
< TE <
er2rc-''vo+Y)} = 1
E1O
for any y > 0.
The proof of this theorem is analogous to those of the corresponding results of Chapter 4 and we omit it. We say a few words concerning the calculation of V(x). Analogously to §3, Ch. 4, it can be proved that V(x) is a solution of problem Ro for the equation
'(CVV(x), VV(x)) + (Fix, VV(x)) = 0.
An immediate verification shows that if C-'B is symmetric, then V(x) _
-(C-'Bx, x). Concluding this section, we note that analogous estimates can be obtained for not only deviations of order e', x e (0, 1) from an equilibrium position but also deviations of order ex from any averaged trajectory.
§8. Examples EXAMPLE 8.1. First we consider the case where the right sides of equations
(1.2) do not depend on x and the process , is stationary. Then X; can be expressed in the form
X; = x +
1/E
rl
b( SAE) ds = x + t(tAE)-' J0
b( 5) ds. 0
We write m = Mb(t;y), B"(T) = m') and assume that J; =1 B"(T) - 0 as T oo. By means of the Chebyshev inequality, we
258
7. The Averaging Principle
obtain from this for any S > 0 that T
> S}
b(as) ds - m
T
<
t+T
T 262 i
r
FT
M(b'( ) - m')(b`( ) - m') ds du
f t+T
r
uniformly in t > 0 as T increases. Consequently, the hypotheses of Theorem 2.1 are satisfied in this case and supo <_ t s T I X t' - x - mt 1 - 0 in probability as e - 0. If the process , has a mixing coefficient a(T) decreasing sufficiently
fast and the functions b(y) increase not too fast (for example if they are bounded) then Theorem 3.1 is applicable. In the case under consideration, this theorem is the assertion that the family of processes ct =
1
(X; - mt - x) = f
tit;
m] ds J
converges weakly, as E - 0, to a Gaussian process C, having mean zero and correlation matrix (t n s) K = (t n s)(K''), where
K" =
00
J
B''(r) dz.
00
It is obvious that (, has independent increments. The assertion that the distribution of Cr converges to a Gaussian distribution (for a given t) constitutes the content of the central limit theorem for random processes (cf., for example, Rozanov [1], Ibragimov and Linnik [1]). If condition F of §4 is satisfied for the process X;, then we can apply Theorem 4.1, which enables us to estimate large (or order 1) deviations of
of X, from the linear function mt + x, t E [0, T]. Since the right side of equation (1.2) does not depend on x, neither does the function H(x, a) defined by condition F, and the normalized action functional S0T((p) has the form S0T(cp) = f a L(4) ds, where L(/J) is the Legendre transform of H(a). As has
been explained earlier, it is important to be able to determine the function u(t, x) = inf{S0,((p): cpo = x0, (p; = x) .
for the calculation of the asymptotics of the probabilities of many interesting events. In the case considered here, the infimum and the extremal for which
259
§g. Examples
the infimum is attained can be calculated easily. We shall assume that L(J3) is strictly convex. Then Euler's equations
dt VL(O) = 0 for the functional S((p)show that only the straight lines 0 = c, c e R' are extremals. Using conditions at the endpoints of the interval [0, t], we obtain that the infimum is attained for the function ¢S = xo + [(x - xo)/t]s, and
u(t, x) = tL((x - xo)/t) Suppose we would like to determine the asymptotics of In P{X, a D} as E 10, where Xf = $o b(51) ds, and D is a bounded set in R' with boundary aD. It follows from Theorem 4.1 that
lim a In P{X E D} _ -t
inf
CIO
provided that the infimum coincides with the infimum taken over all interior points of D. If, moreover, the infimum is attained only at one point A e D u 8D, then it is easy to show that
lim P{IX1, - A I
for any S > 0. For example, let , be defined by the equality ' =
for t e [i, i + 1), i is an integer,
ni
where rio, 1 1 , ... , ryR,...
(8.1)
is a sequence of independent variables with a
common distribution function F(x). Then condition F is satisfied and
H4(a) = In
f-00
eca.bcyu dF(y),
00
provided that the integral under the logarithm is convergent. In this case Theorem 4.1 is close to theorems on large deviations for sums of independent terms. Theorem 4.1 is concerned with the rough, logarithmic, asymptotics of probabilities of large deviations, while theorems on large deviations for sums usually contain sharp asymptotics. On the other hand, Theorem 4.1 can be used to estimate probabilities of events concerning the course of a process X, on a whole interval t E [0, T] and not only for events related to a given moment of time.
260
7. The Averaging Principle
EXAMPLE 8.2. Now let equation (1.2) have the form
Xo = x e R',
.k': = b(Xt) + a(X;) rll,
(8.2)
where b(x) = (b' (x), . . . , b'(x)), a(x) = (a;(x)), and , is an r-dimensional random process with 0. The functions b'(x), aj(x) are assumed to be bounded and sufficiently smooth. If the diagonal entries of the correlation matrix B(s, t) of , converge to zero as I t - s I - oc, then by virtue of Theorem 2.1 we can conclude that X' converges in probability to the solution of the differential equation X0 = x,
X,
(8.3)
uniformly on the interval 0 < t < T as E J. 0. If , has good mixing properties, then by means of Theorem 3.1 we can estimate the normal deviations from x,: we can calculate the characteristics of the Gaussian process CO, the limit E-'12 (Xv of We now assume that condition F is satisfied for the process ,: there exists
a function H,(a): R' -> R' such that for any step function as: [0, T] - R' we have r
lim E In M exp E-' e10
rT
J0
rT
(as, bs/e) ds =
J0
H,(as) ds;
let H,(a) be differentiable with respect to a. As is easy to see, condition F is satisfied for equation (8.2) and H(x, a) = (b(x), a) + H4(a*(x)a).
(8.4)
The Legendre transform L(x, /J) of H(x, a) can be expressed simply in terms of the Legendre transform L,(/3) of H,(a):
L(x, fl) = L,(a-1(x)(fl - b(x))), provided that the matrix a(x) is nonsingular. For example, let , be a Markov process taking two values e1, e2 a R', let (p;;(t)) be the matrix of transition probabilities and let q;; = (dp;;/dt)(0). As is proved in §4, condition F is satisfied for 4, and H,(a) is equal to the largest eigenvalue of (qi1 + (a, e1)
q21
q12
q22 + (a,ez)
261
18 Examples
We consider the case where q = q22 = -q, e, _ -e2 = eeR'. Solving the characteristic equation, we find that
H(a) _ -q +
`q2
+ (a, e)2,
and by means of relation (8.4) we obtain the function H(x, a) for the family of processes X. We assume that 0 is an asymptotically stable equilibrium position for system (8.3). For the determination of the asymptotics of the mean exit time of a domain containing 0, of the point through which the exit takes place, of
the asymptotics of the invariant measure of X' and of other interesting characteristics, we have to calculate the function T
V(x) = inf f L(cp cps) ds: cpo = 0, (PT = x, T > 0 , I 0
as follows from §6. This function can be calculated as the solution of problem
Ro (cf. §4, Ch. 5) for the equation
(b(x), VV(x)) - q +
q2 + (a*(x)V V(x), e)2 = 0.
If system (8.3) has an asymptotically stable limit cycle F, then deviations from this cycle can be described by the quasipotential Vr(x), which can be determined as the solution of problem R. (cf. §4, Ch. 5) for the same equation. In this example we now consider deviations of order E", x e (0, z), from the equilibrium position 0. If condition F" is satisfied for ,, i.e., if the limit
lim E' -2" In M exp a"-' CIo
rT
Jo
1
(as,ds } = 2 )))
(Ccas, as) ds fo,
exists, where C4 is a symmetric matrix and as is any step function on [0, T], then, as is easy to see, condition P is also satisfied for the process X' defined
by equation (8.2) and we have C = a(0)C4a*(0). Let , be the Markov process with two states considered above. As follows from Theorem 7.2, condition F" is satisfied and C4 = (a2H4(a)/aa;aa;Io) = (1/q e'e'), where e', e2, ... , e' are the components of the vector e. Consequently, in this case for the family of processes X' we obtain C = (1/q)a(0)(e'e')a*(0). It is easy to prove that conditions F and P are satisfied for equation (8.2) and in the case of the processes c, defined by equality (8.1) EXAMPLE 8.3. Let us consider the van der Pol equation with random perturbations:
R + w2x = S[ f (x, X, Vt) + cp(x, z) ,].
(8.5)
262
7. The Averaging Principle
Here f (x, )i, vt) is a sufficiently smooth function, periodic in t with frequency
v, , is a stationary process in R' with vanishing mathematical expectation and correlation function K(r), and cp(x, z) is a smooth bounded function. As in the deterministic case, by introducing the van der Pol variables (r, 0), which are defined by the relations
x = r cos(wt + 0),
x = - rco sin(wt + 0),
and the "slow" times = Et, equation (8.5) can be rewritten as a system of two equations dr`
ds = d0`
Ts-
F 1(ws/E + 0`, vs/E, r`,
= F2(ws/E + 0`, vs/E, r`,
where F 1(-r, vt, r,
) _ - 1co [f (r cos r, - rw sin T, Vt) + cp(r cos T, -rw sin t)f] sin -r,
FAT, vt, r, ) _ - Yw [Pr cos t, - rw sin t, vt)
+ cp(r cos t, - rw sin T)f] cos T.
If K(T) decreases with the increase of T, then the right sides of equations (8.6) satisfy the hypotheses of Theorem 7.1. Moreover, as is easy to verify, rT
lim 1-
T--. T Jo
cp(r cos(ws + 0), - rw sin(ws +
= lim
rT
1J
T--W T o
cp(r cos(ws + 0), - rw sin(ws +
0) ds 0) ds
= 0,
so that the terms containing the factors Ss in the expressions for the functions F 1, F2 vanish upon averaging and the averaged equations have the same form as in the absence of stochastic terms. If the ratio w/v is irrational (this case is said to be nonresonant), then the averaged system has the form dF ds
=
F1(F),
A Ts
= F2(?),
where the functions F1(r), F2(r) are given by formulas (1.9).
263
gg. Examples
Now we assume, in addition, that , has finite moments up to order seven inclusive and satisfies the strong mixing condition with a sufficiently rapidly decreasing coefficient M(T). By Theorem 3.1 the family of random processes
E- 112(r, - r O - 8,), t e [0, T] converges weakly to the Gaussian Markov process (p 0) which corresponds to the differential operator
(Ah1(1i)- e + 2A12(r,)
Lu(p,
(r,)
+p
dr l
A22(r1) 002)
p + p rdF2(r,) 00,
where the functions A'i(r) are defined by the equalities f,2 ,t
A 1'(r) = 2
w3
ao
dt o
/_
dsK (t
J
to
s ) cp(r
cos t, - rw sin t)
x cp(r cos s, - rw sin s) sins sin t, z,t
f
A22(r) = 2nr2w3 f dt o
x cp(r cos s,
A
2rrrw3
52dt J o
/
- r(o sin s) cos t cos s, roo
1
_1
dsK t w s I cp(r cos t, - rw sin t)
(t
dsK \
w- sl/Jcp(r cos t, -rw sin t)
- 00
x cp(r cos s, - rw sin s) cos t sin s.
As was indicated in §1, if ro is the only root of the equation F1(r) = 0 and the function F1(r) changes from positive to negative in passing through ro, then independently of the initial conditions, periodic oscillations with amplitude close to ro and frequency close to co are established in the system
described by equation (8.5) without random perturbations ((p = 0) for sufficiently small E. If random perturbations are present, then, as follows from Theorem 7.1, with probability converging to I as E - 0, the phase trajectories (X,', approach a limit cycle of the form
I',o = {(x, z): x = ro cos(wt + 0), x = -row sin(wt + 0)),
over time of order E-' and they perform a motion close to the oscillations of the unperturbed system along deviating from these oscillations from time to time and returning to them again. We assume that at the initial moment the system was moving along on the cycle r,0. Then on time intervals
of order E-', deviations from periodic oscillations have order f with overwhelming probability, according to Theorem 3.1. By means of this theorem we can calculate various probabilistic characteristics of these deviations.
264
7. The Averaging
For example, to calculate the probability that the amplitude r_t^ε deviates from r₀ by more than h√ε at least once over the time [0, T/ε], we need to solve the boundary value problem

∂u/∂s (s, r) = ½ A¹¹(r₀) ∂²u/∂r² + r (dF₁/dr)(r₀) ∂u/∂r,  −h < r < h;
u(0, r) = 0,  u(s, −h) = u(s, h) = 1.

The desired probability will be close to u(T, 0) for ε ≪ 1. Deviations of order 1 take place over times of order greater than ε^{−1}. To study these deviations, we need to use the results of §4. For example, let ξ_t = η_k for t ∈ [α + πk/ω, α + π(k+1)/ω), where η₀, η₁, …, η_k, … are identically distributed independent random variables and α is a random variable independent of {η_i} and uniformly distributed in [0, 2π/ω]. It is easy to see that condition F is satisfied for system (8.6) and
H(r, θ, α₁, α₂) = H(r, α₁, α₂) = F₁(r)α₁ + F₂(r)α₂ + H⁰(α₁φ₁(r) + α₂φ₂(r)),

where

φ₁(r) = −(1/2πω) ∫₀^{2π} φ(r cos s, −rω sin s) sin s ds,

φ₂(r) = −(1/2πrω) ∫₀^{2π} φ(r cos s, −rω sin s) cos s ds,

H⁰(τ) = ln M e^{τη₁}. The functions F₁(r) and F₂(r) are defined by equalities (1.9). As is shown in §§4 and 6, the asymptotics of various probability-theoretic characteristics of large deviations from the unperturbed motion can be calculated by means of the function H(r, α₁, α₂). Let (a, b) ∋ r₀ and let τ^ε = min{t: r_t^ε ∉ (a, b)}. We calculate lim_{ε↓0} ε ln M_r τ^ε for r ∈ (a, b). It follows from the results of §6 that this limit is equal to min(u(a), u(b)), where the function u(r) is the quasipotential of random perturbations on the half-line r ≥ 0. It can be determined as the solution of problem R_{r₀} for the equation

F₁(r) du/dr + H⁰(φ₁(r) du/dr) = 0.  (8.7)
The solution of this problem obviously is reduced to the determination of the nonzero root of the equation F₁(r)z + H⁰(φ₁(r)z) = 0 and a subsequent integration. For example, let the variables η_k have a Gaussian distribution,
M η_k = 0 and D η_k = σ². In this case the results of §4 are applicable (cf. Grin' [3]). We have

H⁰(τ) = σ²τ²/2,  z(r) = −2F₁(r)/(σ²φ₁²(r)),

and

u(r) = −(2/σ²) ∫_{r₀}^{r} (F₁(ρ)/φ₁²(ρ)) dρ.
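As a small numerical illustration of the last formula, the quasipotential can be evaluated by quadrature. The choices F₁(r) = r(1 − r²), φ₁ ≡ 1, σ = 1, r₀ = 1 below are hypothetical and serve only to show the computation, not an example from the text.

```python
import numpy as np

# u(r) = -(2/sigma^2) * Integral_{r0}^{r} F1(rho)/phi1(rho)^2 d(rho),
# evaluated with the trapezoidal rule.  All model choices are illustrative.
def quasipotential(r, F1, phi1, sigma2, r0, n=2001):
    rho = np.linspace(r0, r, n)
    f = F1(rho) / phi1(rho) ** 2
    integral = np.sum((f[1:] + f[:-1]) * np.diff(rho)) / 2.0
    return -(2.0 / sigma2) * integral

F1 = lambda r: r * (1.0 - r ** 2)      # changes from + to - at r0 = 1
phi1 = lambda r: np.ones_like(r)
u = quasipotential(1.5, F1, phi1, sigma2=1.0, r0=1.0)
print(u)  # positive: the "cost" of deviating from the stable amplitude
```

For this F₁ the integral can be done in closed form, which makes the quadrature easy to check.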
Now we assume that the equation F₁(r) = 0 has several roots r₀ < r₁ < ⋯ < r_{2n} and the function F₁(r) changes sign from plus to minus at roots with even indices and from minus to plus at roots with odd indices. If the initial amplitude belongs to (r_{2k−1}, r_{2k+1}), then oscillations with amplitude close to r_{2k} are established in the system without random perturbations for small ε. In general, random perturbations lead to passages between stable limit cycles. Let u_{2k}(r) be the solution of problem R_{r_{2k}} for equation (8.7) on the interval (r_{2k−1}, r_{2k+1}); assume that u_{2k}(r) < ∞ and u_{2k}(r_{2k+1}) < u_{2k}(r_{2k−1}). Then with probability close to 1 for ε small, a passage takes place from the cycle Γ_{r_{2k}} = {(r, θ): r = r_{2k}} to the cycle Γ_{r_{2(k+1)}}, and the average time needed for the passage is logarithmically equivalent to exp{ε^{−1} u_{2k}(r_{2k+1})}. We define a function V(r) on (0, ∞) by the equalities

V(r) = u₀(r) for r ∈ (0, r₁];
V(r) = V(r_{2k−1}) + u_{2k}(r) − u_{2k}(r_{2k−1}) for r ∈ [r_{2k−1}, r_{2k+1}].
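The piecewise assembly of V(r) and the selection of the deepest well can be sketched numerically. The roots and the quadratic "wells" standing in for u₀, u₂, u₄ below are hypothetical:

```python
import numpy as np

# Hypothetical odd-indexed roots and quasipotential wells u_{2k}.
r1, r3 = 1.0, 2.0
u0 = lambda r: (r - 0.5) ** 2          # well around r_0 = 0.5
u2 = lambda r: 2.0 * (r - 1.5) ** 2    # well around r_2 = 1.5
u4 = lambda r: 0.5 * (r - 2.5) ** 2    # well around r_4 = 2.5

def V(r):
    if r <= r1:                        # V = u0 on (0, r1]
        return u0(r)
    if r <= r3:                        # V(r1) + u2(r) - u2(r1)
        return u0(r1) + u2(r) - u2(r1)
    return u0(r1) + u2(r3) - u2(r1) + u4(r) - u4(r3)

grid = np.linspace(0.01, 3.0, 3000)
vals = np.array([V(r) for r in grid])
r_star = grid[np.argmin(vals)]
print(r_star)  # amplitude of the "most stable" limit cycle
```

The point of absolute minimum of V identifies the cycle near which the trajectories spend most of the time.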
The function V(r) has local minima at the points r₀, r₂, …, r_{2n}. We assume that V(r) attains its absolute minimum at a unique point r_{2k*}. Then, as follows from results of Ch. 6, the limit cycle Γ_{r_{2k*}} is the "most stable": for small ε the trajectories (r_t^ε, θ_t^ε) spend most of the time in the neighborhood of Γ_{r_{2k*}}.

EXAMPLE 8.4. We consider the linear system
Ẋ_t^ε = A(ξ_{t/ε}) X_t^ε + b(ξ_{t/ε}),  X₀^ε = x.  (8.8)

The entries of the matrix A(y) = (A^i_j(y)) and the components of the vector b(y) = (b¹(y), …, b^r(y)) are assumed to be bounded. Concerning the process ξ_t, we assume that it possesses a sufficiently rapidly decreasing mixing coefficient and M A(ξ_t) = Ā = (Ā^i_j), M b(ξ_t) = b̄. Relying on Theorem 2.1 we conclude that X_t^ε converges in probability to the solution of the differential equation

ẋ_t = Ā x_t + b̄,  x₀ = x,

uniformly on the interval 0 ≤ t ≤ T as ε ↓ 0. The solution of this equation can be written as

x_t = ∫₀^t exp{Ā(t − s)} b̄ ds + e^{Āt} x.
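This closed-form solution is easy to verify against direct integration of the averaged equation. The diagonal Ā, the vector b̄, and the initial point below are illustrative choices (a diagonal Ā makes the matrix exponential elementwise):

```python
import numpy as np

# x_t = Integral_0^t e^{Abar(t-s)} bbar ds + e^{Abar t} x, compared with a
# forward-Euler solution of dx/dt = Abar x + bbar.
Abar = np.diag([-1.0, -0.5])
bbar = np.array([1.0, 2.0])
x0 = np.array([3.0, -1.0])
t_end = 2.0

lam = np.diag(Abar)
closed = (np.exp(lam * t_end) - 1.0) / lam * bbar + np.exp(lam * t_end) * x0

n = 100000
dt = t_end / n
x = x0.copy()
for _ in range(n):
    x = x + dt * (Abar @ x + bbar)
print(closed, x)  # agree up to the O(dt) Euler discretization error
```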
To estimate normal deviations of X_t^ε from x_t, we need to use Theorem 3.1. For the sake of simplicity we assume that ξ_t is stationary and b(y) ≡ 0. We denote by K^{ij}_{lm}(t) the joint correlation function of the processes A^i_l(ξ_t) and A^j_m(ξ_t) and write

K̄^{ij}_{lm} = ∫_{−∞}^{∞} K^{ij}_{lm}(t) dt,  G^{ij}(x) = Σ_{l,m} K̄^{ij}_{lm} x^l x^m.

Then by Theorem 3.1 the normalized difference ζ_t^ε = ε^{−1/2}(X_t^ε − x_t) converges weakly to the Gaussian process ζ_t = ∫₀^t e^{Ā(t−s)} dη_s as ε → 0, where η_s is a Gaussian process with independent increments and mean zero such that M η_t η_t* = ∫₀^t G(x_s) ds (cf. Khas'minskii [4]).
It is easy to give examples showing that the system obtained from (8.8) upon averaging may have asymptotically stable stationary points even in cases where the vector fields A(y)x + b(y) do not have equilibrium positions for any value of the parameter y, or have unstable equilibrium positions. In the neighborhood of such points the process X_t^ε, ε ≪ 1, spends a long time or is even attracted to them with great probability. For the sake of definiteness, let ξ_t be the Markov process with a finite number of states considered in Theorem 4.2, let b ≡ 0, and let all eigenvalues of the matrix Ā have negative real parts. Then the origin of coordinates 0 is an asymptotically stable equilibrium position of the averaged system. Let D be a bounded domain containing the origin and having boundary ∂D, and let V(x, ∂D) be the function introduced in §6. If V(0, ∂D) < ∞, then in accordance with Theorem 6.1, the trajectories X_t^ε, X₀^ε = x ∈ D, leave D over time τ^ε going to infinity in probability as ε → 0, and ln M τ^ε ∼ ε^{−1} V(0, ∂D). The case V(0, ∂D) = +∞ is illustrated by the following example. Let ξ_t be a process with two states and let A(y) be equal to

A₁ = diag(a, −α),  A₂ = diag(−α, a),  a, α > 0,

in these states, respectively. We consider the system Ẋ_t^ε = A(ξ_{t/ε}) X_t^ε. If ξ_t did not pass from one state to another, then for any initial state ξ₀, the origin of coordinates would be an unstable stationary point, a saddle. Let the matrix Q for the process ξ_t be such that the stationary distribution of this process is the uniform distribution (½, ½). The averaged system

ẋ¹_t = ½(a − α) x¹_t,  ẋ²_t = ½(a − α) x²_t
has an asymptotically stable equilibrium position at 0 if α > a. It is easy to prove that in this case the trajectories of X_t^ε converge to 0 with probability 1 as t → ∞ for any ε > 0, i.e., due to random passages of ξ_t, the system acquires stability. By means of results of the present chapter we can calculate the logarithmic asymptotics of P_x{τ^ε < ∞} as ε → 0, where τ^ε = min{t: X_t^ε ∉ D} (D is a neighborhood of the equilibrium position), as well as the asymptotics of this probability for x → 0, ε = const.
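The stabilization-by-switching effect can be seen in a simple deterministic caricature of the fast process: alternate rapidly between two saddle matrices. The parameter values a = 1 < α = 2 and the switching scheme below are illustrative, not from the text.

```python
import numpy as np

# Rapid alternation between two saddles; each state alone is unstable, but
# the switched system contracts like the averaged matrix (1/2)(a - alpha) I.
a, alpha = 1.0, 2.0
A1 = np.array([[a, 0.0], [0.0, -alpha]])
A2 = np.array([[-alpha, 0.0], [0.0, a]])

eps = 1e-3            # time spent in each state
dt = 1e-4
x = np.array([1.0, 1.0])
t, state = 0.0, 0
while t < 5.0:
    A = A1 if state == 0 else A2
    for _ in range(int(round(eps / dt))):
        x = x + dt * (A @ x)
    state = 1 - state
    t += eps

print(np.linalg.norm(x))  # far below the initial norm sqrt(2)
```

Because both matrices are diagonal here, one switching period multiplies each coordinate by exactly exp{(a − α)ε}, so the decay rate matches the averaged system.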
EXAMPLE 8.5. Consider again equation (8.2). Let r = 2, b(x) = ∇̄H(x) = (∂H(x)/∂x², −∂H(x)/∂x¹), and let ξ_t, −∞ < t < ∞, be a two-dimensional mean-zero stationary process with the correlation matrix B(τ). The averaged system is now a Hamiltonian one:

Ẋ̄_t = ∇̄H(X̄_t),  X̄₀ = x ∈ R²,  (8.9)

and the Hamiltonian function H(x) is a first integral for (8.9). Assume that the function H(x), the matrix σ(x), and the process ξ_t are such that conditions 1–5 of Theorem 3.2 are satisfied. Here g(x, z) = σ(x)z,

F(x, z) = (∇H(x), σ(x)z),  (8.10)
D(x, s) = M(∇H(x), σ(x)ξ_s)(∇H(x), σ(x)ξ₀),
Q(x, s) = M(∇(∇H(x), σ(x)ξ_s), σ(x)ξ₀),

x, z ∈ R², s ≥ 0.
Let B(τ) = (B^{ij}(τ)), B^{ij}(τ) = M ξ^i_τ ξ^j₀, and

B̄ = ∫₀^∞ B(τ) dτ.

The convergence of this integral follows from our assumptions. One can derive from (8.10) that

D̄(x) = 2 ∫₀^∞ D(x, s) ds = (σ(x)B̄σ*(x)∇H(x), ∇H(x)).

Since B(τ) is a positive definite function, B̄ and σ(x)B̄σ*(x) are also positive definite. Thus one can introduce σ̄(y) such that

σ̄²(y) = [ ∮_{C(y)} (σ(x)B̄σ*(x)∇H(x), ∇H(x)) |∇H(x)|^{−1} dl ] / [ ∮_{C(y)} |∇H(x)|^{−1} dl ],

where C(y) = {x: H(x) = y}.
To write down the drift coefficient for the limiting process, we need some
notation. For any smooth vector field e(x) in R², denote by ∇e(x) the matrix (e^j_i(x)), e^j_i(x) = ∂e^j(x)/∂x^i. Simple calculations show that

Q̄(x) = ∫₀^∞ Q(x, s) ds = tr(σ*(x) ∇(σ*(x)∇H(x)) B̄),

and we have the following expression for the drift:

B̄(y) = [ ∮_{C(y)} tr(σ*(x) ∇(σ*(x)∇H(x)) B̄) |∇H(x)|^{−1} dl ] / [ ∮_{C(y)} |∇H(x)|^{−1} dl ].
Let, for example, σ(x) be the unit matrix. Then

D̄(x) = (B̄∇H(x), ∇H(x)),  Q̄(x) = tr(Hess H(x) · B̄),

where Hess H(x) is the Hessian matrix for H(x): (Hess H(x))_{ij} = ∂²H(x)/∂x^i ∂x^j.
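Both of these expressions are easy to check numerically against finite differences. The Hamiltonian H(x) = x₁² + 0.5 x₂² and the matrix B̄ below are hypothetical test data:

```python
import numpy as np

# For sigma = I:  Dbar(x) = (Bbar grad H, grad H),  Qbar(x) = tr(Hess H * Bbar).
H = lambda x: x[0] ** 2 + 0.5 * x[1] ** 2
Bbar = np.array([[0.5, 0.1], [0.1, 0.3]])
x = np.array([1.0, -2.0])
h = 1e-4
e = np.eye(2)

grad = np.array([(H(x + h * e[i]) - H(x - h * e[i])) / (2 * h) for i in range(2)])
hess = np.array([[(H(x + h*e[i] + h*e[j]) - H(x + h*e[i] - h*e[j])
                   - H(x - h*e[i] + h*e[j]) + H(x - h*e[i] - h*e[j])) / (4*h*h)
                  for j in range(2)] for i in range(2)])

Dbar = grad @ Bbar @ grad      # (Bbar grad H, grad H)
Qbar = np.trace(hess @ Bbar)   # tr(Hess H * Bbar)
print(Dbar, Qbar)
```

For this quadratic H the gradient is (2x₁, x₂) and the Hessian is diag(2, 1), so both quantities have simple closed-form values.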
§9. The Averaging Principle for Stochastic Differential Equations

We consider the system of differential equations
Ẋ_t^ε = b(X_t^ε, Y_t^ε) + σ(X_t^ε, Y_t^ε) ẇ_t,  X₀^ε = x,
Ẏ_t^ε = ε^{−1} B(X_t^ε, Y_t^ε) + ε^{−1/2} C(X_t^ε, Y_t^ε) ẇ_t,  Y₀^ε = y,  (9.1)

where x ∈ R^r, y ∈ R^l, b(x, y) = (b¹(x, y), …, b^r(x, y)), B(x, y) = (B¹(x, y), …, B^l(x, y)), w_t is an n-dimensional Wiener process, and σ(x, y) = (σ^i_j(x, y)), C(x, y) = (C^i_j(x, y)) are matrices transforming R^n into R^r and R^l, respectively. The functions b^i(x, y), B^i(x, y), σ^i_j(x, y), C^i_j(x, y) are assumed to be bounded and satisfy a Lipschitz condition. By this example we illustrate equations of the type (1.5), where the velocity of the fast motion depends on the slow variables. We also note that in contrast to the preceding sections, the slow variables in (9.1) form a random process even for given Y_t^ε, t ∈ [0, T]. We introduce a random process Y_t^{x,y}, x ∈ R^r, y ∈ R^l, which is defined by the stochastic differential equation

Ẏ_t^{x,y} = B(x, Y_t^{x,y}) + C(x, Y_t^{x,y}) ẇ_t,  Y₀^{x,y} = y.  (9.2)

The solutions of this equation form a Markov process in R^l, depending on x ∈ R^r as a parameter.
First we formulate and prove the averaging principle in the case where the entries of the matrix σ(x, y) do not depend on y, and then we indicate the changes necessary for the consideration of the general case. We assume that there exists a function b̄(x) = (b̄¹(x), b̄²(x), …, b̄^r(x)), x ∈ R^r, such that for any t > 0, x ∈ R^r, y ∈ R^l we have

M | (1/T) ∫_t^{t+T} b(x, Y_s^{x,y}) ds − b̄(x) | ≤ κ(T),  (9.3)

where κ(T) → 0 as T → ∞.
Theorem 9.1. Let the entries of σ(x, y) = σ(x) be independent of y and let condition (9.3) be satisfied. Let us denote by X̄_t the random process determined in R^r by the differential equation¹

Ẋ̄_t = b̄(X̄_t) + σ(X̄_t) ẇ_t,  X̄₀ = x.

Then for any T > 0, δ > 0, x ∈ R^r and y ∈ R^l we have

lim_{ε→0} P{ sup_{0≤t≤T} |X_t^ε − X̄_t| > δ } = 0.
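Before the proof, a quick Euler–Maruyama sketch of the theorem with hypothetical coefficients may be helpful: the slow drift b(x, y) = −x + y, the fast Ornstein–Uhlenbeck process, and σ = 0 for the slow motion are all illustrative choices, so that b̄(x) = −x and the averaged trajectory is X̄_t = x e^{−t}.

```python
import numpy as np

# slow:  dX = (-X + Y) dt        (sigma = 0 for a crisp comparison)
# fast:  dY = -(1/eps) Y dt + eps^{-1/2} dW   -> averages to 0
rng = np.random.default_rng(0)
eps, dt, T = 1e-4, 1e-5, 1.0
X, Y = 1.0, 0.0
for _ in range(int(T / dt)):
    dW = rng.normal(0.0, np.sqrt(dt))
    X += dt * (-X + Y)
    Y += -(Y / eps) * dt + dW / np.sqrt(eps)
print(X, np.exp(-T))  # X^eps tracks the averaged trajectory e^{-t}
```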
Proof. We consider a partition of [0, T] into intervals of the same length Δ. We construct auxiliary processes Ỹ_t^ε and X̃_t^ε by means of the relations

Ỹ_t^ε = Y_{kΔ}^ε + ε^{−1} ∫_{kΔ}^t B(X_{kΔ}^ε, Ỹ_s^ε) ds + ε^{−1/2} ∫_{kΔ}^t C(X_{kΔ}^ε, Ỹ_s^ε) dw_s,  t ∈ [kΔ, (k+1)Δ];

X̃_t^ε = x + ∫₀^t b(X_{[s/Δ]Δ}^ε, Ỹ_s^ε) ds + ∫₀^t σ(X_s^ε) dw_s.

We show that the intervals Δ = Δ(ε) can be chosen such that ε^{−1}Δ(ε) → ∞ and

lim_{ε→0} M|Y_t^ε − Ỹ_t^ε|² = 0  (9.4)

uniformly in x ∈ R^r, y ∈ R^l and t ∈ [0, T]. It follows from the definition of Y_t^ε and Ỹ_t^ε that for t belonging to [kΔ, (k+1)Δ] we have
M|Y_t^ε − Ỹ_t^ε|² = M | ε^{−1} ∫_{kΔ}^t [B(X_s^ε, Y_s^ε) − B(X_{kΔ}^ε, Ỹ_s^ε)] ds + ε^{−1/2} ∫_{kΔ}^t [C(X_s^ε, Y_s^ε) − C(X_{kΔ}^ε, Ỹ_s^ε)] dw_s |²
¹ In this equation, w_t is the same Wiener process as in (9.1). Since b(x, y) satisfies a Lipschitz condition, b̄(x) also satisfies a Lipschitz condition, so that the solution of the equation exists and is unique.
≤ C₁ (Δ/ε²) ( ∫_{kΔ}^t M|X_s^ε − X_{kΔ}^ε|² ds + ∫_{kΔ}^t M|Y_s^ε − Ỹ_s^ε|² ds )
+ C₂ ε^{−1} ( ∫_{kΔ}^t M|X_s^ε − X_{kΔ}^ε|² ds + ∫_{kΔ}^t M|Y_s^ε − Ỹ_s^ε|² ds )
≤ C₃ (Δ/ε² + 1/ε) ∫_{kΔ}^t M|X_s^ε − X_{kΔ}^ε|² ds + C₄ (Δ/ε² + 1/ε) ∫_{kΔ}^t M|Y_s^ε − Ỹ_s^ε|² ds.  (9.5)
Here and in what follows, we denote by C_i constants depending only on the Lipschitz coefficients of b^i(x, y), B^i(x, y), σ^i_j(x, y), C^i_j(x, y), the maximum of the absolute values of these coefficients, and the dimension of the space.
It follows from the boundedness of the coefficients of the stochastic equation for X_s^ε that for Δ < 1 we have the estimate

M|X_s^ε − X_{kΔ}^ε|² ≤ C₅ Δ  (9.6)

for s ∈ [kΔ, (k+1)Δ]. We obtain from this inequality and (9.5) that

M|Y_t^ε − Ỹ_t^ε|² ≤ C₆ (Δ³/ε² + Δ²/ε) + C₆ (Δ/ε² + 1/ε) ∫_{kΔ}^t M|Y_s^ε − Ỹ_s^ε|² ds

for t ∈ [kΔ, (k+1)Δ], from which we arrive at the relation

M|Y_t^ε − Ỹ_t^ε|² ≤ C₆ (Δ³/ε² + Δ²/ε) exp{ C₆ (Δ²/ε² + Δ/ε) }.

From this we conclude that (9.4) is satisfied if we put

Δ = Δ(ε) = ε (ln ε^{−1})^{1/4}.
Now we show that for any δ > 0 we have

P{ sup_{0≤t≤T} |X_t^ε − X̃_t^ε| > δ } → 0  (9.7)

as ε → 0 and Δ = Δ(ε) = ε (ln ε^{−1})^{1/4}, uniformly in x ∈ R^r, y ∈ R^l. Indeed, it follows from the definition of X_t^ε and X̃_t^ε that

P{ sup_{0≤t≤T} |X_t^ε − X̃_t^ε| > δ } ≤ P{ ∫₀^T |b(X_s^ε, Y_s^ε) − b(X_{[s/Δ]Δ}^ε, Ỹ_s^ε)| ds > δ }.
Estimating the probability on the right side by means of Chebyshev's inequality and taking account of (9.4) and (9.6), we obtain (9.7). It is also easy to obtain from (9.4) and (9.6) that

sup_{0≤t≤T} M|X_t^ε − X̃_t^ε|² → 0.  (9.8)

Now we show that sup_{0≤t≤T} |X̃_t^ε − X̄_t| converges to zero in probability as ε → 0. The assertion of the theorem will obviously follow from this and (9.7).
First we note that it follows from the definition of Ỹ_t^ε that for s ∈ [0, Δ] the process Z_s = Ỹ_{kΔ+s}^ε coincides in distribution with the process Y_{s/ε}^{X_{kΔ}^ε, Y_{kΔ}^ε} defined by equation (9.2). We only have to choose the Wiener process w_t in (9.2) independent of X_{kΔ}^ε, Y_{kΔ}^ε. Taking into account that ε^{−1}Δ(ε) → ∞, we obtain, relying on (9.3), that

M | ∫_{kΔ}^{(k+1)Δ} b(X_{kΔ}^ε, Ỹ_s^ε) ds − Δ b̄(X_{kΔ}^ε) | = Δ M | (ε/Δ) ∫₀^{Δ/ε} [b(X_{kΔ}^ε, Z_{εs}) − b̄(X_{kΔ}^ε)] ds | ≤ Δ κ(Δ/ε).

Using this estimate, we arrive at the relation

M{ sup_{0≤t≤T} | ∫₀^t b(X_{[s/Δ]Δ}^ε, Ỹ_s^ε) ds − ∫₀^t b̄(X_{[s/Δ]Δ}^ε) ds | }
≤ Σ_{k=0}^{[T/Δ]} M | ∫_{kΔ}^{(k+1)Δ} [b(X_{kΔ}^ε, Ỹ_s^ε) − b̄(X_{kΔ}^ε)] ds | + C₇ Δ
≤ C₇ Δ + T κ(Δ/ε) → 0  (9.9)
as ε → 0, since Δ(ε) → 0 and κ(Δ(ε)/ε) → 0 as ε → 0. We estimate m^ε(t) = M|X̃_t^ε − X̄_t|². It follows from the definition of X̃_t^ε and X̄_t that

X̃_t^ε − X̄_t = ∫₀^t [b(X_{[s/Δ]Δ}^ε, Ỹ_s^ε) − b̄(X̃_s^ε)] ds + ∫₀^t [b̄(X̃_s^ε) − b̄(X̄_s)] ds + ∫₀^t [σ(X_s^ε) − σ(X̄_s)] dw_s.

Upon squaring both sides and using some elementary inequalities and the
Lipschitz condition, we arrive at the relation

m^ε(t) ≤ C₈ t ∫₀^t m^ε(s) ds + C₉ ∫₀^t m^ε(s) ds + 3M | ∫₀^t b(X_{[s/Δ]Δ}^ε, Ỹ_s^ε) ds − ∫₀^t b̄(X̃_s^ε) ds |².

We obtain from this relation that

m^ε(t) ≤ 3M | ∫₀^t [b(X_{[s/Δ]Δ}^ε, Ỹ_s^ε) − b̄(X̃_s^ε)] ds |² e^{C₁₀(T + T²)}

for t ∈ [0, T]. This implies by (9.9) that m^ε(t) → 0 uniformly on [0, T] as ε → 0. For δ > 0 we have the inequality

P{ sup_{0≤t≤T} |X̃_t^ε − X̄_t| > δ }
≤ P{ sup_{0≤t≤T} | ∫₀^t [b(X_{[s/Δ]Δ}^ε, Ỹ_s^ε) − b̄(X_{[s/Δ]Δ}^ε)] ds | > δ/6 }
+ P{ ∫₀^T |b̄(X_{[s/Δ]Δ}^ε) − b̄(X̃_s^ε)| ds > δ/6 }
+ P{ ∫₀^T |b̄(X̃_s^ε) − b̄(X̄_s)| ds > δ/6 }
+ P{ sup_{0≤t≤T} | ∫₀^t [σ(X_s^ε) − σ(X̃_s^ε)] dw_s | > δ/6 }
+ P{ sup_{0≤t≤T} | ∫₀^t [σ(X̃_s^ε) − σ(X̄_s)] dw_s | > δ/6 }.

The first term on the right side converges to zero by virtue of (9.9). To prove that the second and third terms also converge to zero, we need to use Chebyshev's inequality, relation (9.8) and the fact that m^ε(t) → 0 as ε ↓ 0. The fourth and fifth terms can be estimated by means of Kolmogorov's inequality and also converge to zero. Consequently, we obtain that sup_{0≤t≤T} |X̃_t^ε − X̄_t| converges to zero in probability, and the assertion of the theorem follows from this and (9.7). □

Now we briefly discuss the case where the entries of σ depend on x and y.
We assume that condition (9.3) is satisfied, and moreover, there exists a matrix a(x) = (a^{ij}(x)) such that for any t ≥ 0, x ∈ R^r and y ∈ R^l we have

max_{i,j} M | (1/T) ∫_t^{t+T} Σ_k σ^i_k(x, Y_s^{x,y}) σ^j_k(x, Y_s^{x,y}) ds − a^{ij}(x) | ≤ κ(T),  (9.10)
where κ(T) → 0 as T → ∞ and Y^{x,y} is the solution of equation (9.2). Let X̄_t be the solution of the stochastic differential equation

Ẋ̄_t = b̄(X̄_t) + σ̄(X̄_t) ẇ_t,  X̄₀ = x,

where σ̄(x) = (a(x))^{1/2} and w_t is an r-dimensional Wiener process. It turns out that in this case X_t^ε converges to X̄_t. However, the convergence has to be understood in a somewhat weaker sense. Namely, the measure corresponding to X^ε in the space of trajectories converges weakly to the measure corresponding to X̄ as ε ↓ 0. We shall not include the proof of this assertion here but rather refer the reader to Khas'minskii [6] for a proof. We note that in that work the averaging principle is proved under assumptions allowing some growth of the coefficients. Conditions (9.3) and (9.10) are also replaced by less stringent ones.
We consider an example. Let (r_t^ε, φ_t^ε) be the two-dimensional Markov process governed by the differential operator

L^ε = ½ ∂²/∂r² + b(r, φ) ∂/∂r + ε^{−1} [ ½ C²(r, φ) ∂²/∂φ² + B(r, φ) ∂/∂φ ].

This process can also be described by the stochastic equations

ṙ_t^ε = b(r_t^ε, φ_t^ε) + ẇ¹_t,
φ̇_t^ε = ε^{−1} B(r_t^ε, φ_t^ε) + ε^{−1/2} C(r_t^ε, φ_t^ε) ẇ²_t.

We assume that the functions b(r, φ), B(r, φ) and C(r, φ) are 2π-periodic in φ and C(r, φ) ≥ c₀ > 0. In this case the process φ̄_t^{(r₀)} obtained from the process

φ_t^{(r₀)} = φ₀^{(r₀)} + ∫₀^t B(r₀, φ_s^{(r₀)}) ds + ∫₀^t C(r₀, φ_s^{(r₀)}) dw_s

by identifying values differing by an integral multiple of 2π is a nondegenerate Markov process on the circle. This process has a unique invariant measure on the circle with density m(r₀, φ), and there exist C, λ > 0 such that for any bounded measurable function f(φ) on the circle

| M_{φ₀} f(φ̄_t^{(r₀)}) − ∫₀^{2π} f(φ) m(r₀, φ) dφ | ≤ C e^{−λt} sup_{0≤φ≤2π} |f(φ)|
(cf., for example, Freidlin [3]). This implies relation (9.3). Indeed, putting b̄(r) = ∫₀^{2π} b(r, φ) m(r, φ) dφ, we obtain

( M_{φ₀} | (1/T) ∫₀^T b(r, φ_s^{(r)}) ds − b̄(r) | )² ≤ M_{φ₀} | (1/T) ∫₀^T b(r, φ_s^{(r)}) ds − b̄(r) |²

= (2/T²) ∫₀^T ds ∫_s^T M_{φ₀} (b(r, φ_s^{(r)}) − b̄(r)) (b(r, φ_t^{(r)}) − b̄(r)) dt

≤ (2/T²) max_{r,φ} |b(r, φ)|² ∫₀^T ds ∫_s^T C e^{−λ(t−s)} dt → 0
as T → ∞. Hence by Theorem 9.1, r_t^ε converges in probability to the diffusion process r̄_t satisfying the differential equation

ṙ̄_t = b̄(r̄_t) + ẇ¹_t,  r̄₀ = r₀^ε = r,

uniformly on the interval 0 ≤ t ≤ T as ε ↓ 0. If a functional F[r_t, 0 ≤ t ≤ T] of the process r_t^ε is continuous in C_{0T}(R¹), then what has been said implies that F[r_t^ε, 0 ≤ t ≤ T] → F[r̄_t, 0 ≤ t ≤ T] in probability as ε ↓ 0. The convergence is preserved if F has discontinuities such that the set of functions at which F is discontinuous has measure zero with respect to the measure induced by the limit process r̄_t in the function space.
The results discussed in this section can be used for the study of the behavior, as ε ↓ 0, of solutions of some elliptic or parabolic equations with a small parameter. Consider, for example, the Dirichlet problem

L^ε u^ε(r, φ) = 0,  r ∈ (r₁, r₂);  u^ε(r₁, φ) = C₁,  u^ε(r₂, φ) = C₂  (9.11)

in the domain D = {(r, φ): 0 < r₁ < r < r₂}. If (r, φ) are interpreted as polar coordinates in the plane, then the above domain is the ring bounded by the concentric circles of radii r₁ and r₂, respectively, with center at the origin of coordinates. As is known, the solution of this problem can be written in the form

u^ε(r, φ) = C₁ P_{r,φ}{r^ε_{τ^ε} = r₁} + C₂ P_{r,φ}{r^ε_{τ^ε} = r₂},

where τ^ε = inf{t: r_t^ε ∉ (r₁, r₂)}. We write τ̄ = inf{t ≥ 0: r̄_t ∉ (r₁, r₂)}. It is easy to verify that max_{r₁≤r≤r₂} P_r{τ̄ > T} → 0 as T → ∞ and that the boundary points of [r₁, r₂] are regular for r̄_t in [r₁, r₂], i.e., that P_{r_i}{τ̄ = 0} = 1, i = 1, 2 (cf. Wentzell [1]). This and the uniform convergence in probability of r_t^ε to r̄_t on every finite interval [0, T] imply that

lim_{ε↓0} u^ε(r, φ) = u(r) = C₁ P_r{r̄_{τ̄} = r₁} + C₂ P_r{r̄_{τ̄} = r₂}.
The function u(r) can be determined as the solution of the problem

½ u''(r) + b̄(r) u'(r) = 0,  u(r₁) = C₁,  u(r₂) = C₂.

Solving this problem, we obtain that

lim_{ε↓0} u^ε(r, φ) = u(r) = C₁ + (C₂ − C₁) [ ∫_{r₁}^{r} exp{ −2 ∫_{r₁}^{y} b̄(x) dx } dy ] / [ ∫_{r₁}^{r₂} exp{ −2 ∫_{r₁}^{y} b̄(x) dx } dy ],  r ∈ (r₁, r₂).
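The limit formula can be checked by quadrature. The averaged drift b̄(r) = 1/r on (r₁, r₂) = (1, 2) below is a hypothetical choice for which the inner integral is ln(y/r₁), the integrand is (r₁/y)², and u(r) = (1/r₁ − 1/r)/(1/r₁ − 1/r₂) in closed form:

```python
import numpy as np

C1, C2, r1, r2 = 0.0, 1.0, 1.0, 2.0
bbar = lambda x: 1.0 / x

def trap(f, a, b, n=2001):
    t = np.linspace(a, b, n)
    v = f(t)
    return np.sum((v[1:] + v[:-1]) * np.diff(t)) / 2.0

def I(r):
    g = lambda y: np.array([np.exp(-2.0 * trap(bbar, r1, yi))
                            for yi in np.atleast_1d(y)])
    return trap(g, r1, r)

u = lambda r: C1 + (C2 - C1) * I(r) / I(r2)
val = u(1.5)
exact = (1 / r1 - 1 / 1.5) / (1 / r1 - 1 / r2)
print(val, exact)  # quadrature vs. closed form
```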
Some examples of a more general character can be found in Khas'minskii [6].
Now we consider large deviations in systems of the type (9.1). We restrict ourselves to the case where there is no diffusion in the slow motion and the fast motion takes place on a compact manifold and the diffusion coefficients with respect to the fast variables do not depend on the slow variables.
Let M, E be two Riemannian manifolds of class C^∞. Suppose that E is compact and dim M = r, dim E = l. We denote by TM_x and TE_y the tangent spaces of M and E at x ∈ M and y ∈ E, respectively. We consider a family of vector fields b(x, y) on M, depending on y ∈ E as a parameter, and a family B(x, y) of fields on E, depending on x ∈ M. On E we consider an elliptic differential operator L of the second order, mapping constants to zero. The functions b(x, y), B(x, y) as well as the coefficients of L are assumed to be infinitely differentiable with respect to their variables. On the direct product M × E we consider the family of Markov processes Z_t^ε = (X_t^ε, Y_t^ε) governed by the operators

𝔏^ε f(x, y) = (b(x, y), ∇_x f(x, y)) + ε^{−1} [ L_y f(x, y) + (B(x, y), ∇_y f(x, y)) ],

where ∇_x, ∇_y are the gradient operators on M and E, respectively. In coordinate form the trajectory of the process Z_t^ε = (X_t^ε, Y_t^ε) can be given by the system of stochastic equations

Ẋ_t^ε = b(X_t^ε, Y_t^ε),  (9.12)
Ẏ_t^ε = ε^{−1} [B(X_t^ε, Y_t^ε) + g(Y_t^ε)] + ε^{−1/2} C(Y_t^ε) ẇ_t,

where g(y) = (g¹(y), …, g^l(y)) are the coefficients of the first order derivatives in L, the matrix C(y) is connected with the coefficients a^{ij}(y) of the higher order differentiations in L by the relation C(y)C*(y) = (a^{ij}(y)), and w_t is an l-dimensional Wiener process.
If M is not compact, then we have to impose an assumption on b(x, y) that it does not grow too fast: for any T > 0 and x ∈ M there exists a compactum F ⊂ M, x ∈ F, such that P_{x,y}{X_t^ε ∈ F for t ∈ [0, T]} = 1 for every y ∈ E and ε > 0. Let α be an element of the dual T*M_x of TM_x. We introduce the differential
operator R = R(x, z, α) acting on functions f(y), y ∈ E, according to the formula

R(x, z, α) f(y) = L f(y) + (B(z, y), ∇_y f(y)) + (α, b(x, y)) f(y);

x, z ∈ M and α ∈ T*M_x are parameters. For all values of the parameters, R(x, z, α) is an elliptic differential operator in the space of functions defined on E. It is the restriction, to smooth functions, of the infinitesimal generator of a positive semigroup. Analogously to §4, we can deduce from this that R(x, z, α) has a simple eigenvalue µ(x, z, α) with largest real part. This eigenvalue is real and, by virtue of its simplicity, it is differentiable with respect to the parameters x, z, α. We introduce the diffusion process Y_t^z, z ∈ M, on E, governed by the operator 𝔏_z = L + (B(z, y), ∇_y).
Lemma 9.1. Let α ∈ T*M_x and let F be a compactum in M. The limit

lim_{T→∞} (1/T) ln M_y exp{ ∫₀^T (α, b(x, Y_s^z)) ds } = µ(x, z, α)

exists uniformly in x, z ∈ F and y ∈ E. The function µ(x, z, α) is convex downward in the variables α.
Proof. Let us write V(x, y, α) = (α, b(x, y)). The family of operators T_t^z acting in the space of bounded measurable functions on E according to the formula

T_t^z f(y) = M_y f(Y_t^z) exp{ ∫₀^t V(x, Y_s^z, α) ds }
forms a positive semigroup. The assertion of Lemma 9.1 can be derived from this analogously to the proof of Theorem 4.2. Let us denote by L(x, z, β) (x, z ∈ M, β ∈ TM_x) the Legendre transform of the function µ(x, z, α) with respect to the last argument:

L(x, z, β) = sup_α [(α, β) − µ(x, z, α)].
We shall sometimes consider L(x, z, β) with coinciding first two arguments. We write L(x, x, β) = L(x, β). This function is obviously the Legendre transform of µ(x, x, α). We note that L(x, z, β) is lower semicontinuous.
Theorem 9.2 (Freidlin [11]). Let X_t^ε be the first component of the Markov process Z_t^ε governed by the operator 𝔏^ε on M × E. Let us put

S₀T(φ) = ∫₀^T L(φ_s, φ̇_s) ds

for absolutely continuous functions φ ∈ C₀T(M); for the remaining φ ∈ C₀T(M) we set S₀T(φ) = +∞. The functional ε^{−1} S₀T(φ) is the action functional for the family of processes X_t^ε, t ∈ [0, T], in C₀T(M) as ε ↓ 0.
Proof. Together with the process Z_t^ε = (X_t^ε, Y_t^ε), we consider the process Ẑ_t^ε = (X̂_t^ε, Ŷ_t^ε), where

Ẋ̂_t^ε = b(X̂_t^ε, Ŷ_t^ε),  Ẏ̂_t^ε = ε^{−1} g(Ŷ_t^ε) + ε^{−1/2} C(Ŷ_t^ε) ẇ_t.

We write e(x, y) = C^{−1}(y) B(x, y). The process Ẑ_t^ε differs from Z_t^ε by a change of the drift vector in the variables in which there is a nondegenerate diffusion, so that the measures corresponding to these processes in the space of trajectories are absolutely continuous with respect to each other. Taking account of this observation, we obtain for any function φ: [0, T] → M and δ > 0 that

P_{x,y}{ρ₀T(X^ε, φ) < δ} = M_{x,y}{ ρ₀T(X̂^ε, φ) < δ; exp{ ε^{−1/2} ∫₀^T (e(X̂_s^ε, Ŷ_s^ε), dw_s) − (2ε)^{−1} ∫₀^T |e(X̂_s^ε, Ŷ_s^ε)|² ds } }.  (9.13)
Let ψ^{(n)}: [0, T] → M be a step function such that ρ₀T(φ, ψ^{(n)}) < 1/n. For any γ, C > 0 we have

P_{x,y}{ | ∫₀^T (e(X̂_s^ε, Ŷ_s^ε), dw_s) − ∫₀^T (e(ψ_s^{(n)}, Ŷ_s^ε), dw_s) | > γ√ε/4;  ρ₀T(X̂^ε, φ) < δ } ≤ exp{−C ε^{−1}}

for sufficiently small δ and 1/n. This estimate can be verified by means of the exponential Chebyshev inequality. We obtain from this and (9.13) that for
any φ ∈ C₀T(M) and γ > 0 we have

M_{x,y}{ ρ₀T(X̂^ε, φ) < δ; exp{ ε^{−1/2} ∫₀^T (e(ψ_s^{(n)}, Ŷ_s^ε), dw_s) − (2ε)^{−1} ∫₀^T |e(ψ_s^{(n)}, Ŷ_s^ε)|² ds − γ/(3ε) } }
≤ P_{x,y}{ρ₀T(X^ε, φ) < δ}
≤ M_{x,y}{ ρ₀T(X̂^ε, φ) < δ; exp{ ε^{−1/2} ∫₀^T (e(ψ_s^{(n)}, Ŷ_s^ε), dw_s) − (2ε)^{−1} ∫₀^T |e(ψ_s^{(n)}, Ŷ_s^ε)|² ds + γ/(3ε) } }  (9.14)
for sufficiently small δ and 1/n. We introduce still another process, Z̃_t^ε = (X̃_t^ε, Ỹ_t^ε), which is defined by the stochastic equations

Ẋ̃_t^ε = b(X̃_t^ε, Ỹ_t^ε),
Ẏ̃_t^ε = ε^{−1} g(Ỹ_t^ε) + ε^{−1} B(ψ_t^{(n)}, Ỹ_t^ε) + ε^{−1/2} C(Ỹ_t^ε) ẇ_t

in coordinate form. Taking account of the absolute continuity of the measures corresponding to Ẑ_t^ε and Z̃_t^ε, it follows from inequality (9.14) that

e^{−γ/2ε} P_{x,y}{ρ₀T(X̃^ε, φ) < δ} ≤ P_{x,y}{ρ₀T(X^ε, φ) < δ} ≤ e^{γ/2ε} P_{x,y}{ρ₀T(X̃^ε, φ) < δ}  (9.15)

for sufficiently small δ and 1/n.
Let t₁ < t₂ < ⋯ < t_m be the points where ψ^{(n)} has jumps, t₀ = 0, t_{m+1} = T, and ψ_t^{(n)} = ψ^{(k)} for t ∈ [t_k, t_{k+1}), k = 0, 1, …, m. The process X̃_t^ε satisfies the hypotheses of Theorem 4.1 on every interval [t_k, t_{k+1}). The role of the fast process is played by Ỹ_t^ε, which can be represented in the form Ỹ_t^ε = Y_{t/ε}, where Ẏ_t = g(Y_t) + B(ψ^{(k)}, Y_t) + C(Y_t) ẇ_t. The fulfilment of condition F follows from Lemmas 4.2 and 9.1. The corresponding functional has the form ∫_{t_k}^{t_{k+1}} L(ψ^{(k)}, φ_s, φ̇_s) ds for absolutely continuous functions. It follows from this and (9.15) that for any γ > 0 and sufficiently small δ and 1/n there exists ε₀ > 0 such that for ε < ε₀ we have the estimates

exp{ −ε^{−1} ( ∫₀^T L(ψ_s^{(n)}, φ_s, φ̇_s) ds + γ/2 ) } ≤ P_{x,y}{ρ₀T(X^ε, φ) < δ} ≤ exp{ −ε^{−1} ( ∫₀^T L(ψ_s^{(n)}, φ_s, φ̇_s) ds − γ/2 ) }.  (9.16)
The concluding part of the proof of this theorem can be carried out in the same way as the end of the proof of Theorem 4.1. We leave it to the reader.

Remark 1. The assertion of Theorem 9.2 remains true if the manifold E is replaced by a compact manifold E with boundary and, as the process Z_t^ε = (X_t^ε, Y_t^ε), we choose a process in M × E which is governed by the operator 𝔏^ε at interior points and, for y belonging to the boundary ∂E of E, satisfies some boundary conditions, for example, the condition of reflection along a field n(y), where n(y) is a vector field on ∂E not tangent to ∂E. The study of such a process can be reduced to the study of a process on the
manifold M × E', where E' is the manifold without boundary obtained by pasting together two copies of E along the boundary ∂E (cf. Freidlin [3]).

Remark 2. It is easy to write out the action functional characterizing deviations of order ε^κ, κ ∈ (0, ½), of the process X_t^ε defined by equations (9.12) from the averaged system. For example, let the origin of coordinates 0 be an equilibrium position of the averaged system. Then the action functional for the process Z_t^ε = ε^{−κ} X_t^ε has the same form as in Theorem 7.1; as the matrix C we have to take (∂²µ(0, 0, α)/∂α_i ∂α_j) for α = 0.

As with a majority of results related to diffusion processes, the theorem given here on large deviations is closely connected with some problems for differential equations of the second order with a nonnegative characteristic form. We consider an example. Let Y_t^ε = ζ_{t/ε}, where ζ_t is a Wiener process on the interval [−1, 1] with reflection at the endpoints. We define a process X_t^ε in R^r by the differential equation Ẋ_t^ε = b(X_t^ε, Y_t^ε). To write out the action functional for the family of processes X_t^ε, in accordance with Theorem 9.2 and Remark 1 (or according to Theorem 4.1; the fast and slow motions are separated here), we need to consider the eigenvalue problem
N u(y) = ½ d²u/dy² + (α, b(x, y)) u = λ u(y),  du/dy |_{y=±1} = 0.
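This Sturm–Liouville problem is easy to solve numerically by finite differences. The potential V(y) = (α, b(x, y)) = cos(πy) below is a hypothetical choice used only to exercise the computation:

```python
import numpy as np

# Finite-difference approximation of the principal eigenvalue of
#   N u = (1/2) u'' + V(y) u  on [-1, 1],  u'(+-1) = 0.
n = 400
y = np.linspace(-1.0, 1.0, n)
h = y[1] - y[0]
V = np.cos(np.pi * y)

# Symmetric second-difference matrix encoding the Neumann conditions.
D2 = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
      + np.diag(np.ones(n - 1), -1))
D2[0, 0] = D2[-1, -1] = -1.0
N = 0.5 * D2 / h ** 2 + np.diag(V)

lam = np.max(np.linalg.eigvalsh(N))
print(lam)  # approximates the principal eigenvalue lambda(x, alpha)
```

Since the operator is self-adjoint, the same value can be obtained from the variational representation given below, which provides a useful cross-check.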
Let λ = λ(x, α) be the eigenvalue of the operator N with largest real part and let L(x, β) be the Legendre transform of λ(x, α) with respect to the second argument. Then the normalized action functional for the processes X_t^ε as ε ↓ 0 has the form S₀T(φ) = ∫₀^T L(φ_s, φ̇_s) ds for absolutely continuous functions. As was explained earlier, to determine the asymptotics of P_{x,y}{X_t^ε ∈ D}, D ⊂ R^r, as ε ↓ 0 and the asymptotics of the probabilities of other events connected with X_t^ε, we need to calculate u(t, z) = inf{S₀t(φ): φ ∈ H_t(x, z)}, where H_t(x, z) is the set of functions φ such that φ₀ = x, φ_t = z. The initial point x is assumed to be given and we sometimes omit it in our notation. As follows from Theorem 4.1, the action functional vanishes for trajectories
of the averaged system and only for them. It can be proved that in our case the averaged system has the form

ẋ_t = b̄(x_t),  b̄(x) = ½ ∫₋₁¹ b(x, y) dy,  (9.17)

so that if the point z lies on the trajectory x_t of system (9.17) for which x₀ = x and x_{t₀} = z, then u(t₀, z) = 0. For the determination of u(t, z) we may use the Hamilton–Jacobi equation. In the case being considered it has the form

∂u/∂t = λ(z, ∇_z u).  (9.18)
Since N is a self-adjoint semibounded operator, for its largest eigenvalue λ(x, α) we have the representation

λ(x, α) = −min_{f∈𝔉} { ½ ∫₋₁¹ |f'(y)|² dy − ∫₋₁¹ (α, b(x, y)) f²(y) dy },

where 𝔉 = {f: ∫₋₁¹ |f(y)|² dy = 1, f'(1) = f'(−1) = 0}. It follows from this that the solutions of equation (9.18) satisfy the following nonlinear differential equation:

∂u/∂t (t, z) = −min_{f∈𝔉} { ½ ∫₋₁¹ [f'(y)]² dy − ∫₋₁¹ (∇_z u, b(z, y)) f²(y) dy }.
Consequently, an equation can be given for u(t, z) without determining the first eigenvalue of N. Suppose we would like to determine the asymptotics, as ε ↓ 0, of v^ε(t, x, y) = M_{x,y} f(X_t^ε), where f(x): R^r → R¹ is a smooth nonnegative function, different from zero on a set G ⊂ R^r. It is easy to see that v^ε(t, x, y) → f(x_t(x)), where x_t(x) is the solution of the averaged system (9.17) with initial condition x₀(x) = x. In particular, if x_t(x) ∉ G, then v^ε(t, x, y) → 0 as ε ↓ 0. The rate of convergence of v^ε(t, x, y) to zero can be estimated by means of the theorem on large deviations for the family of processes X_t^ε:

lim_{ε↓0} ε ln v^ε(t, x, y) = − inf_{z∈Ḡ} u_x(t, z).  (9.19)
As is known, v^ε(t, x, y) is a solution of the problem

∂v^ε/∂t = (1/2ε) ∂²v^ε/∂y² + Σ_i b^i(x, y) ∂v^ε/∂x^i,  x ∈ R^r, y ∈ (−1, 1), t > 0;
v^ε(0, x, y) = f(x),  ∂v^ε/∂y (t, x, y)|_{y=±1} = 0.  (9.20)
The function v̄(t, x) = f(x_t(x)) obviously satisfies the equation

∂v̄/∂t (t, x) = Σ_i b̄^i(x) ∂v̄/∂x^i,  t > 0, x ∈ R^r;  v̄(0, x) = f(x),  (9.21)

where b̄^i(x) = ½ ∫₋₁¹ b^i(x, y) dy. Therefore, the convergence of X_t^ε to the trajectories of system (9.17) implies the convergence of the solution of problem (9.20) to the solution of problem (9.21) as ε ↓ 0. Relation (9.19) enables us to estimate the rate of this convergence. As we have seen in Chapters 4 and 6, in stationary problems with a small parameter, large deviations may determine the principal term of the asymptotics,
not only the terms converging to zero with ε. We consider the stationary problem corresponding to equation (9.20). For the sake of brevity, we shall assume that r = 1:

L^ε w^ε(x, y) = (1/2ε) ∂²w^ε/∂y² + b(x, y) ∂w^ε/∂x = 0,  x ∈ (−1, 1), y ∈ (−1, 1),
∂w^ε/∂y (x, y)|_{y=±1} = 0,
w^ε(1, y)|_{y∈Γ₊} = ψ(1, y),  w^ε(−1, y)|_{y∈Γ₋} = ψ(−1, y),  (9.22)

where Γ₊ = {y ∈ [−1, 1]: b(1, y) > 0}, Γ₋ = {y ∈ [−1, 1]: b(−1, y) < 0}, and ψ(1, y), ψ(−1, y) are continuous functions defined for y ∈ [−1, 1]. A solution, at least a generalized solution, of problem (9.22) always exists but without additional assumptions it is not unique (cf. Freidlin [1], [3]). We require that for every x₀ ∈ [−1, 1] there exists y₀ = y₀(x₀) such that either b(x, y₀) > 0 for x ≥ x₀ or b(x, y₀) < 0 for x ≤ x₀. This condition ensures the uniqueness of the solution of problem (9.22). If Z_t^ε = (X_t^ε, Y_t^ε) is the process in the strip |y| ≤ 1 with reflection along the normal to the boundary, governed by the operator L^ε for |y| < 1, x ∈ (−∞, ∞), and τ^ε = min{t: |X_t^ε| = 1}, then the unique solution of problem (9.22) can be written in the form w^ε(x, y) = M_{x,y} ψ(X_{τ^ε}^ε, Y_{τ^ε}^ε).
Now we consider the averaged dynamical system (9.17) on the real line. We assume that the trajectories of the averaged equation, beginning at any point x ∈ [−1, 1], leave [−1, 1]. It is then obvious that b̄(x) does not change sign. For the sake of definiteness, let b̄(x) > 0 for x ∈ [−1, 1]. If we assume in
addition that b(1, y) ≥ 0 for y ∈ [−1, 1], then it is easy to prove that

lim_{ε↓0} w^ε(x, y) = [ ∫₋₁¹ ψ(1, y) b(1, y) dy ] / [ ∫₋₁¹ b(1, y) dy ].
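So the limit is the b(1, ·)-weighted average of the boundary data over the exit side x = 1. The data b(1, y) = 1 + y² and ψ(1, y) = y² below are hypothetical choices used to illustrate the formula:

```python
import numpy as np

# Weighted boundary average: lim w^eps = Int psi(1,y) b(1,y) dy / Int b(1,y) dy.
y = np.linspace(-1.0, 1.0, 20001)
w = y[1] - y[0]
b1 = 1.0 + y ** 2
psi1 = y ** 2

num = np.sum((psi1 * b1)[1:] + (psi1 * b1)[:-1]) * w / 2.0
den = np.sum(b1[1:] + b1[:-1]) * w / 2.0
limit = num / den
print(limit)  # = (16/15)/(8/3) = 0.4 for these choices
```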
If we do not assume that b(1, y) ≥ 0 for y ∈ [−1, 1], then the situation becomes much more complicated. Concerning this, cf. Sarafyan, Safaryan, and Freidlin [1]. This work also discusses the case where the trajectories of the averaged motion do not leave (−1, 1).
Chapter 8
Random Perturbations of Hamiltonian Systems
§1. Introduction

Consider the dynamical system in R^r defined by a smooth vector field b(x):
ẋ_t = b(x_t),  x₀ = x ∈ R^r.  (1.1)
In Chapters 2–6 we have considered small random perturbations of system (1.1) described by the equation

Ẋ_t^ε = b(X_t^ε) + ε W̃̇_t,  X₀^ε = x,  (1.2)

where W̃_t is a Wiener process, and ε a small parameter (we reserve the notation without the tilde for another Wiener process and another stochastic equation).
The long-time behavior of the process $\tilde X_t$ was described in Chapter 6 in terms that presupposed identification of points x, y of the space that are equivalent under the equivalence relation introduced in Section 1 of that chapter. But for some dynamical systems all points of the phase space are equivalent; e.g., if the trajectories of the dynamical system are of the form shown in Fig. 10. An important class of such systems is provided by Hamiltonian systems.
A dynamical system (1.1) in $R^{2n}$ is called a Hamiltonian system if there exists a smooth function $H(x)$, $x\in R^{2n}$, such that
$$b(x) = \bar\nabla H(x) = \left(\frac{\partial H(p,q)}{\partial q_1},\ldots,\frac{\partial H(p,q)}{\partial q_n};\; -\frac{\partial H(p,q)}{\partial p_1},\ldots,-\frac{\partial H(p,q)}{\partial p_n}\right),$$
$$x = (p,q) = (p_1,\ldots,p_n;\,q_1,\ldots,q_n)\in R^{2n}.$$
The function $H(p,q)$ is called the Hamiltonian, and n is the number of degrees of freedom. It is well known that $H(x)$ is an integral of motion: $H(x_t) = H(p_t,q_t) = H(p_0,q_0)$ is constant; and the flow $(p_t,q_t)$ preserves the Euclidean volume in $R^{2n}$. The invariant measure concentrated on a trajectory of the dynamical system is proportional to $dl/|b(x)|$, where $dl$ is the length along the trajectory.
Figure 18.
We consider Hamiltonian systems with one degree of freedom:
$$\dot x_t(x) = \bar\nabla H(x_t(x)), \qquad x_0(x) = x\in R^2. \tag{1.3}$$
We assume that $H(x)$ is smooth enough and $\lim_{|x|\to\infty} H(x) = +\infty$. A typical example is shown in Fig. 18(a). The trajectories of the corresponding system are shown in Fig. 18(b). These trajectories consist of five families of periodic orbits and separatrices dividing these families: three families of closed trajectories encircling exactly one of the stable equilibrium points $x_3$, $x_4$, or $x_5$ (the regions covered by the trajectories are denoted by $D_3$, $D_4$, and $D_5$, respectively); the family of closed trajectories encircling $x_3$ and $x_4$ (they cover the region $D_2$); and the family of closed trajectories encircling all equilibrium points (the orbits of this family cover the region $D_1$). These families are separated by two $\infty$-shaped curves with crossing points $x_1$ and $x_2$. Each $\infty$-shaped curve consists of an equilibrium point ($x_1$ or $x_2$) and two trajectories approaching the equilibrium point as $t\to\pm\infty$.
Figure 19.
Note that if the Hamiltonian has the shape shown in Fig. 19(a), the self-intersecting separatrix may look like that in Fig. 19(d). Still, in this case we loosely call it an $\infty$-shaped curve. Consider the perturbed system corresponding to (1.3):
$$\dot{\tilde X}_t = \bar\nabla H(\tilde X_t) + \varepsilon\,\dot{\tilde w}_t, \qquad \tilde X_0 = x\in R^2. \tag{1.4}$$
Here $\tilde w_t$ is a two-dimensional Wiener process, and $\varepsilon$ a small parameter. The
motion $\tilde X_t$, roughly speaking, consists of fast rotation along the nonperturbed trajectories and slow motion across them; so this is a situation where the averaging principle is to be expected to hold (see Chapter 7). To study the slow motion across the deterministic orbits it is convenient to rescale the time. Consider, along with $\tilde X_t$, the process $X^\varepsilon_t$ described by
$$\dot X^\varepsilon_t = \frac{1}{\varepsilon^2}\,\bar\nabla H(X^\varepsilon_t) + \dot w_t, \qquad X^\varepsilon_0 = x\in R^2, \tag{1.5}$$
where $w_t$ is a two-dimensional Wiener process. It is easy to see that the process $X^\varepsilon_t$ coincides, in the sense of probability distributions, with $\tilde X_{t/\varepsilon^2}$. We denote by $P_x$ the probabilities evaluated under the assumption that $\tilde X_0 = x$ or $X^\varepsilon_0 = x$. The diffusions $\tilde X_t$, $X^\varepsilon_t$ defined by (1.4) and (1.5) have generating differential operators
$$\tilde L^\varepsilon = b(x)\cdot\nabla + \frac{\varepsilon^2}{2}\,\Delta, \qquad L^\varepsilon = \frac{1}{\varepsilon^2}\,b(x)\cdot\nabla + \frac12\,\Delta,$$
where $b(x) = \bar\nabla H(x)$, and the Lebesgue measure is invariant for these processes, as it is for system (1.3). (The same is true for other forms of perturbations of system (1.3) that lead to diffusions with generators
$$\tilde L^\varepsilon = b(x)\cdot\nabla + \varepsilon^2 L_0, \qquad L^\varepsilon = \frac{1}{\varepsilon^2}\,b(x)\cdot\nabla + L_0,$$
where $L_0$ is a self-adjoint elliptic operator; but for simplicity we stick to the case $L_0 = \frac12\Delta$.)

If the diffusions $\tilde X_t$, $X^\varepsilon_t$ are moving in a region covered by closed trajectories $x_t(x)$ (in one of the regions $D_k$ in Fig. 18(b)), the results of Khas'minskii [6] can be applied, and the averaging principle holds: for small $\varepsilon$ the motion in the direction of these trajectories is approximately the same as the nonrandom motion along the same closed trajectory, and the process makes very many rotations before it moves to another trajectory at a significant distance from the initial one; the motion from one trajectory to another ("across" the trajectories) is approximately a diffusion whose characteristics at some point (trajectory) are obtained by averaging with respect to the invariant measure concentrated on this closed trajectory. This diffusion is "slow" in the case of the process $\tilde X_t$, but it has a "natural" time scale in the case of $X^\varepsilon_t$. As for the characteristics of this diffusion on the trajectories, let us use the function H as the coordinate of a trajectory (H can take the same value on different trajectories, so it is only a local coordinate). Let us apply Itô's formula to $H(X^\varepsilon_t)$:
$$dH(X^\varepsilon_t) = \nabla H(X^\varepsilon_t)\cdot\frac{1}{\varepsilon^2}\,b(X^\varepsilon_t)\,dt + \nabla H(X^\varepsilon_t)\cdot dw_t + \tfrac12\,\Delta H(X^\varepsilon_t)\,dt.$$
The dot product $\nabla H\cdot b = 0$ since $b(x) = \bar\nabla H(x)$, so
$$H(X^\varepsilon_t) = H(X^\varepsilon_0) + \int_0^t \nabla H(X^\varepsilon_s)\cdot dw_s + \int_0^t \tfrac12\,\Delta H(X^\varepsilon_s)\,ds. \tag{1.6}$$
Since $X^\varepsilon_t$ rotates many times along the trajectories of the Hamiltonian system before $H(X^\varepsilon_t)$ changes considerably, the integral of $\tfrac12\Delta H(X^\varepsilon_s)$ is approximately equal to the integral $\int_0^t B(H(X^\varepsilon_s))\,ds$, where
$$B(H) = \frac{\oint\big(\tfrac12\Delta H(x)/|b(x)|\big)\,dl}{\oint\big(1/|b(x)|\big)\,dl}, \tag{1.7}$$
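For a concrete Hamiltonian the averaged coefficient (1.7), together with the coefficient A(H) of the stochastic term introduced just below, can be evaluated by quadrature over a level curve. A sketch for the illustrative choice $H(p,q) = (p^2+q^2)/2$: the level curve $C(H)$ is the circle of radius $r = \sqrt{2H}$, on which $|b| = |\nabla H| = r$ and $\Delta H\equiv 2$, so that $B(H) = 1$ and $A(H) = 2H$ exactly.

```python
import math

def averaged_coeffs(h_level, n=4000):
    # A(H) = oint(|grad H|^2/|b|) dl / oint(1/|b|) dl,
    # B(H) = oint((1/2) Delta H / |b|) dl / oint(1/|b|) dl,
    # computed by the trapezoidal rule on the circle of radius r = sqrt(2H)
    r = math.sqrt(2.0 * h_level)
    dl = 2.0 * math.pi * r / n      # arc-length element
    num_a = num_b = den = 0.0
    for _ in range(n):
        num_a += r * dl             # |grad H|^2 / |b| = r on this circle
        num_b += (1.0 / r) * dl     # (1/2) * 2 / |b| = 1/r
        den += (1.0 / r) * dl       # 1 / |b| = 1/r
    return num_a / den, num_b / den

a1, b1 = averaged_coeffs(1.0)   # expect A = 2H = 2.0, B = 1.0
a3, b3 = averaged_coeffs(3.0)   # expect A = 6.0, B = 1.0
```

The integrands are constant on a circular level curve, so the quadrature is exact up to rounding; for a non-radial Hamiltonian one would parametrize the level curve numerically instead.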
the integrals being taken over the connected component of the level curve $\{x : H(x) = H\}$ lying in the region considered. As for the stochastic integral in (1.6), it is well known that it can be represented as
$$\int_0^t \nabla H(X^\varepsilon_s)\cdot dw_s = W\!\left(\int_0^t |\nabla H(X^\varepsilon_s)|^2\,ds\right), \tag{1.8}$$
where $W(\cdot)$ is a one-dimensional Wiener process, $W(0) = 0$ (see, e.g., Freidlin [15], Chapter 1). The argument in this Wiener process is approximately $\int_0^t A(H(X^\varepsilon_s))\,ds$, where
$$A(H) = \frac{\oint\big(|\nabla H(x)|^2/|b(x)|\big)\,dl}{\oint\big(1/|b(x)|\big)\,dl}, \tag{1.9}$$
the integrals being taken over the same curve as in (1.7) (we may mention that $|b(x)| = |\nabla H(x)|$, so the integrand in the upper integral is equal to $|\nabla H(x)|$). This suggests that the "slow" process on the trajectories is, for
small $\varepsilon$, approximately the same as the diffusion corresponding to the differential operator
$$Lf(H) = \tfrac12\,A(H)\,f''(H) + B(H)\,f'(H).$$
This describes the "slow" process in the space obtained by identifying all points on the same trajectory of (1.3); but only while the process $X^\varepsilon_t$ is moving in a region covered by closed trajectories. But this process can move from one region covered by closed trajectories to another, and such regions are separated by the components of level curves $\{x : H(x) = H\}$ that are not closed trajectories (see Fig. 18(b)). If we identify all points belonging to the same component of a level curve
$\{x : H(x) = H\}$, we obtain a graph consisting of several segments, corresponding: $I_1$, to the trajectories in the domain $D_1$ outside the outer $\infty$-curve; $I_2$, to the trajectories in $D_2$ between the outer and the inner $\infty$-curve; $I_3$ and $I_4$, to the trajectories inside the two loops of the inner $\infty$-curve (domains $D_3$ and $D_4$); and $I_5$, to those inside the right loop of the outer $\infty$-curve (domain $D_5$) (see Fig. 18). The ends of the segments are vertices $O_1$ and $O_2$, corresponding to the $\infty$-curves, and $O_3$, $O_4$, $O_5$, corresponding to the extrema $x_3$, $x_4$, $x_5$ (it happens that the curves corresponding to $O_1$, $O_2$ each contain one critical point of H, a saddle point). Let us complement our graph by a vertex $O_\infty$, the end of the segment $I_1$ that corresponds to the point at infinity. Let us denote the graph thus obtained by $\Gamma$.
Let $Y(x)$ be the identification mapping ascribing to each point $x\in R^2$ the corresponding point of the graph. We denote the function H carried over to the graph $\Gamma$ under this mapping also by H (we take $H(O_\infty) = +\infty$). The function H can be taken as the local coordinate on this graph. Couples (i, H), where i is the number of the segment $I_i$, define global coordinates on the graph. Several such couples may correspond to a vertex. The graph has the structure of a tree in the case of a system in $R^2$; but for a system on a different manifold it may have loops.
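The identification mapping can be made concrete in code. A sketch for a double-well Hamiltonian (an illustrative example with a single saddle, not the five-well picture of Fig. 18): the minima $(0,\pm 1)$ have $H = 0$, the saddle at the origin has $H = 1/4$, and $Y$ sends a point of the plane to the pair (segment number, H).

```python
def H(p, q):
    # illustrative double well: minima at (0, +1) and (0, -1), saddle at (0, 0)
    return 0.5 * p * p + 0.25 * (q * q - 1.0) ** 2

H_SADDLE = 0.25  # value of H on the figure-eight separatrix

def Y(p, q):
    # identification map R^2 -> graph: (segment index i, local coordinate H).
    # Segment 1: orbits outside the separatrix; 2: right well; 3: left well.
    # (Points exactly on the separatrix would map to a vertex; not handled here.)
    h = H(p, q)
    if h > H_SADDLE:
        return (1, h)
    return (2, h) if q > 0 else (3, h)

# i(x) is a discrete first integral: these two points have the same energy H
# but lie in different wells, so H alone does not determine the state
right = Y(0.0, 1.2)
left = Y(0.0, -1.2)
outside = Y(2.0, 0.0)
```

The pair (i, H) is exactly the global coordinate system on the graph described above; in this toy example the segment index is decided by the sign of q below the saddle level.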
Note that the function $i(x)$, $x\in R^2$, the number of the segment of the graph $\Gamma$ containing the point $Y(x)$, is preserved along each trajectory of the unperturbed dynamical system: $i(x_t) = i(x)$ for every t. This means that $i(x)$ is a first integral of the system (1.1), a discrete one. If the Hamiltonian has more than one critical point, then there are points $x, y\in R^2$ such that $H(x) = H(y)$ and $i(x)\ne i(y)$. In this case the system (1.1) has two independent first integrals, $H(x)$ and $i(x)$. This is, actually, the reason why $H(X^\varepsilon_t)$ does not converge to a Markov process as $\varepsilon\to 0$ in the case of several critical points. If there is more than one first integral, we have to take a larger phase space, including all first integrals as coordinates in it. In the present case, there are two coordinates (i, H). The couple $(i(X^\varepsilon_t), H(X^\varepsilon_t)) = Y(X^\varepsilon_t)$ is a stochastic process on the graph $\Gamma$, and it is reasonable to expect that as $\varepsilon\to 0$ this process converges in a sense to some diffusion process $Y_t$ on the graph $\Gamma$. The problem about diffusions on graphs arising as limits of fast motion of dynamical systems was first proposed in Burdzeiko, Ignatov, Khas'minskii, and Shakhgil'dyan [1]. Some results concerning random perturbations of Hamiltonian systems with one degree of freedom leading to random processes on graphs were considered by G. Wolansky [1], [2]. We present here the results of Freidlin and Wentzell [2]. Some other asymptotic problems for diffusion processes in which the limiting process is a diffusion process on a graph were considered in Freidlin and Wentzell [3]. What can we say about the limiting process $Y_t$? First of all, with each segment $I_i$ of the graph we can associate a differential operator acting on functions on this segment:
$$L_i f(H) = \tfrac12\,A_i(H)\,f''(H) + B_i(H)\,f'(H); \tag{1.10}$$
$A_i(H)$, $B_i(H)$ are defined by formulas (1.9) and (1.7) with the integrals taken over the connected component of the level curve $\{x : H(x) = H\}$ lying in the region $D_i$ corresponding to $I_i$. These differential operators determine the behavior of the limiting process $Y_t$ on $\Gamma$, but only as long as it moves inside one segment of the graph. What can we say about the process after it leaves the segment of the graph where it started?
The problem naturally arises of describing the class of diffusions on a graph that are governed by the given differential operators while inside its segments. This problem was considered by W. Feller [1] for one segment. It turned out that in order to determine the behavior of the process after it goes out of the interior of the segment, some boundary conditions must be given, but only for those ends of the segment that are accessible from the inside. Criteria of accessibility of an end from the inside, and also of reaching the inside from an end, were given. One of them: if the integral
$$\int \exp\left\{-\int_{H_0}^{H'}\frac{2B(H'')}{A(H'')}\,dH''\right\}dH' \tag{1.11}$$
diverges at the end $H_k$, then $H_k$ is not accessible from the inside. These criteria, and also the boundary conditions, are formulated in a simpler way if we represent the differential operator L as a generalized second derivative $(d/dv)((d/du)f)$ with respect to two increasing functions $u(H)$, $v(H)$ (see Feller [2] and Mandl [1]). For example, the condition of the integral (1.11) diverging at $H_k$ is replaced by unboundedness of $u(H)$ near $H_k$ (in fact, (1.11) is an expression for $u(H)$); the end $H_k$ is accessible from the inside and the inside is accessible from $H_k$ if and only if $u(H)$, $v(H)$ are bounded at $H_k$. Another condition for inaccessibility that we need: if $\int v(H)\,du(H)$ diverges at $H_k$, then the end $H_k$ is not accessible from the inside. Representing the differential operator in the form of a generalized second derivative is especially convenient in our case, because the operators (1.10) degenerate at the ends of the segment $I_i$: their coefficients $A_i(H)$, $B_i(H)$ given by formulas (1.9) and (1.7) have finite limits, but $A_i(H)\to 0$ at these ends (in the case of nondegenerate critical points, at an inverse logarithmic rate at an end corresponding to a level curve that contains a saddle point, and linearly at an end corresponding to an extremum). Feller's result can be carried over to diffusions on graphs: if some segments $I_i$ meet at a vertex $O_k$ (which we write as $I_i\sim O_k$) and $O_k$ is accessible from the inside of at least one segment, then some "interior boundary" conditions (or "gluing" conditions) have to be prescribed at $O_k$. (If the vertex $O_k$ is inaccessible from any of the segments, no condition has to be given.) If for all segments $I_i\sim O_k$ the end of $I_i$ corresponding to $O_k$ is accessible from the inside of $I_i$, and the inside of $I_i$ can be reached from this end, then the general interior boundary condition can be written in the form
$$\alpha_k\,Lf(O_k) = \sum_{i:\,I_i\sim O_k}(\pm\beta_{ki})\,\frac{df}{du_i}(O_k), \tag{1.12}$$
where $Lf(O_k)$ is the common limit at $O_k$ of the functions $L_i f$ defined by (1.10) for all segments $I_i\sim O_k$; $u_i$ is the function on $I_i$ used in the representation $L_i = (d/dv_i)(d/du_i)$; $\alpha_k\ge 0$, $\beta_{ki}\ge 0$, and $\beta_{ki}$ is taken with + if the function $u_i$ has its minimum at $O_k$, and with $-$ if it has its maximum there; and $\alpha_k + \sum_{i:\,I_i\sim O_k}\beta_{ki} > 0$ (otherwise the condition (1.12) reduces to
$0 = 0$). The coefficient $\alpha_k$ is not zero if and only if the process spends a positive time at the point $O_k$. (Theorem 2.1 of the next section is formulated only in the case $\alpha_k = 0$, and only as a theorem about the existence and uniqueness of a diffusion corresponding to given gluing conditions; in Freidlin and Wentzell [3] the formulation is more extensive.) Before considering our problem in a more concrete way, let us introduce some notations:
The indicator function of a set A is denoted by $\chi_A$; the supremum norm in the function space, by $\|\cdot\|$; the closure of a set A, by [A]. $D_i$ denotes the set of all points $x\in R^2$ such that $Y(x)$ belongs to the interior of the segment $I_i$;
$$C_k = \{x : Y(x) = O_k\}; \qquad C_{ki} = C_k\cap\partial D_i;$$
for H being one of the values of the function $H(x)$, $C(H) = \{x : H(x) = H\}$; for H being one of the values of the function $H(x)$ on $D_i$, $C_i(H) = \{x\in D_i : H(x) = H\}$; for two such numbers $H_1 < H_2$,
$$D_i(H_1,H_2) = D_i(H_2,H_1) = \{x\in D_i : H_1 < H(x) < H_2\};$$
for a vertex $O_k$ and a small $\delta > 0$, $D_k(\pm\delta)$ is the connected component of the set $\{x : H(O_k) - \delta < H(x) < H(O_k) + \delta\}$ containing $C_k$; $D(\pm\delta) = \bigcup_k D_k(\pm\delta)$; for a vertex $O_k$, a segment $I_i\sim O_k$, and a small $\delta > 0$,
$$C_{ki}(\delta) = \{x\in D_i : H(x) = H(O_k)\pm\delta\}$$
(the sets $C_{ki}(\delta)$ are the connected components of the boundary of $D_k(\pm\delta)$); if D with some subscripts and the like denotes a region in $R^2$, then $\tau^\varepsilon$ with the same subscripts and the like denotes the first time at which the process $X^\varepsilon_t$ leaves the region, for example, $\tau^\varepsilon_k(\pm\delta) = \min\{t : X^\varepsilon_t\notin D_k(\pm\delta)\}$; and $\tilde\tau^\varepsilon$ with the same subscripts and the like denotes the corresponding time for the process $\tilde X^\varepsilon_t$. The pictures of the domains $D_k(\pm\delta)$ and $D_i(H(O_k)\pm\delta, H)$ are different for a vertex corresponding to an extremum point $x_k$ and for one corresponding to a level curve containing a saddle point $x_k$; they are shown in Figs. 20 and 21.
In the notations we introduced, the coefficients A; (H), B. (H) can be written as
A((H) _ §c,(H)(IVH(x)IZ/Ib(x)I)dl fc,(H) (1 1 I b(x) I) dl
Bt(H) _ §QH)(1AH(x)/Ib(x)I)dl fc,(H)(11Ib(x)I)dl
Figure 20. Case of $x_k$ being an extremum.
Figure 21. Case of $x_k$ being a saddle point.
Integrals of the kind used in (1.13) can be expressed in another form, and differentiated with respect to the variable H, using the following lemma.

Lemma 1.1. Let f be a function that is continuously differentiable in the closed region $[D_i(H_1,H_2)]$. Then for $H, H_0\in[H_1,H_2]$,
$$\oint_{C_i(H)} f(x)\,|\nabla H(x)|\,dl = \oint_{C_i(H_0)} f(x)\,|\nabla H(x)|\,dl \pm \iint_{D_i(H_0,H)}\big[\nabla f(x)\cdot\nabla H(x) + f(x)\,\Delta H(x)\big]\,dx,$$
where the sign + or $-$ is taken according to whether the gradient $\nabla H(x)$ at $C_i(H)$ is pointing outside the region $D_i(H_0,H)$ or inside it; and
$$\frac{d}{dH}\oint_{C_i(H)} f(x)\,|\nabla H(x)|\,dl = \oint_{C_i(H)}\left[\frac{\nabla f(x)\cdot\nabla H(x)}{|\nabla H(x)|} + \frac{f(x)\,\Delta H(x)}{|\nabla H(x)|}\right]dl.$$
The integral in the denominator in (1.13) is handled by means of Lemma 1.1 with $f(x) = 1/|\nabla H(x)|^2$; those in the numerators, with $f(x)\equiv 1$ and with $f(x) = \tfrac12\Delta H(x)/|\nabla H(x)|^2$.

By Lemma 1.1, the coefficients $A_i(H)$, $B_i(H)$ are continuously differentiable at the interior points of the interval $H(I_i)$ at least $k-1$ times if $H(x)$ is continuously differentiable k times. Let us consider the behavior of these coefficients as H approaches the ends of the interval $H(I_i)$, restricting ourselves to the case of the function H having only nondegenerate critical points (i.e., with nondegenerate matrix $(\partial^2 H/\partial x^i\,\partial x^j)$ of second derivatives). As H approaches an end of the interval $H(I_i)$ corresponding to a nondegenerate extremum $x_k$ of the Hamiltonian, the integral $\oint_{C_i(H)}(1/|\nabla H|)\,dl$, which is equal to the period of the orbit $C_i(H)$, converges to
$$T_k = 2\pi\left[\frac{\partial^2 H}{\partial p^2}(x_k)\,\frac{\partial^2 H}{\partial q^2}(x_k) - \left(\frac{\partial^2 H}{\partial p\,\partial q}(x_k)\right)^2\right]^{-1/2} > 0;$$
$$\oint_{C_i(H)}|\nabla H|\,dl \sim \mathrm{const}\cdot|H - H(O_k)| \to 0;$$
and
$$\oint_{C_i(H)}\big(\Delta H/|\nabla H|\big)\,dl \to T_k\,\Delta H(x_k).$$
So $A_i(H)\to 0$ as $H\to H(O_k) = H(x_k)$, and $B_i(H)\to\tfrac12\Delta H(x_k)$ (positive if $x_k$ is a minimum, and negative in the case of a maximum). If H approaches $H(O_k)$, where $O_k$ corresponds to a level curve containing a nondegenerate saddle point, we have $\oint_{C_i(H)}(1/|\nabla H|)\,dl \sim \mathrm{const}\cdot\big|\ln|H - H(O_k)|\big|\to\infty$ and $\oint_{C_i(H)}|\nabla H|\,dl \to \oint_{C_{ki}}|\nabla H|\,dl > 0$. The coefficient $A_i(H)\to 0$ (at an inverse logarithmic rate); and it can be shown that in this case too $B_i(H)\to\tfrac12\Delta H(x_k)$ (which can be positive, negative, or zero). One can obtain some accurate estimates of the derivatives $A_i'(H)$, $B_i'(H)$ as H approaches the ends of $H(I_i)$ corresponding to critical points of the Hamiltonian, but we do not need such estimates. What we need, and what is easily obtained from Lemma 1.1, is that
$$|A_i'(H)|,\ |B_i'(H)| \le \frac{A_0}{|H - H(O_k)|} \tag{1.14}$$
for sufficiently small $|H - H(O_k)|$, where $A_0$ is some positive constant. A second corollary of Lemma 1.1 is that the derivatives of the functions $v_i(H)$, $u_i(H)$ in the representation $L_i = (d/dv_i)(d/du_i)$ can be chosen as follows:
$$v_i'(H) = \oint_{C_i(H)}\frac{1}{|\nabla H(x)|}\,dl, \tag{1.15}$$
$$u_i'(H) = 2\left(\oint_{C_i(H)}|\nabla H(x)|\,dl\right)^{-1}. \tag{1.16}$$
Indeed,
$$\frac{d}{dv_i}\left(\frac{df}{du_i}\right) = \frac{1}{v_i'(H)}\,\frac{d}{dH}\left(\frac{f'(H)}{u_i'(H)}\right) = \frac{1}{v_i'(H)\,u_i'(H)}\,f''(H) - \frac{u_i''(H)}{v_i'(H)\,u_i'(H)^2}\,f'(H);$$
the first coefficient is $\tfrac12 A_i(H)$, and, since by Lemma 1.1 the derivative of $\oint_{C_i(H)}|\nabla H|\,dl$ is equal to $\oint_{C_i(H)}\big(\Delta H/|\nabla H|\big)\,dl$, the second coefficient is $B_i(H)$.
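This identity is easy to sanity-check numerically for the illustrative oscillator $H = (p^2+q^2)/2$, where $\oint_{C(H)}(1/|\nabla H|)\,dl = 2\pi$ and $\oint_{C(H)}|\nabla H|\,dl = 4\pi H$, so that $v'(H) = 2\pi$, $u'(H) = 1/(2\pi H)$, $A(H) = 2H$, $B(H) = 1$; both forms of the generator then reduce to $Hf'' + f'$. A finite-difference sketch:

```python
import math

DH = 1e-4  # finite-difference step

def L_coeff_form(f, h):
    # (1/2) A(H) f'' + B(H) f' with A = 2H, B = 1 (oscillator case)
    f1 = (f(h + DH) - f(h - DH)) / (2 * DH)
    f2 = (f(h + DH) - 2 * f(h) + f(h - DH)) / (DH * DH)
    return h * f2 + f1

def L_scale_form(f, h):
    # (d/dv)(d/du) f with u'(H) = 1/(2*pi*H), v'(H) = 2*pi
    def df_du(s):
        # df/du = f'(s) / u'(s), f' by a centered difference
        return (f(s + DH / 2) - f(s - DH / 2)) / DH * (2 * math.pi * s)
    return (df_du(h + DH / 2) - df_du(h - DH / 2)) / DH / (2 * math.pi)

gap = max(abs(L_coeff_form(math.sin, h) - L_scale_form(math.sin, h))
          for h in (0.5, 1.0, 2.0))
```

The two discretizations differ only by higher-order finite-difference error, so `gap` is tiny; for a general Hamiltonian the quadratures for $u'$, $v'$ would be done numerically as in the earlier sketch.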
The function $v_i(H)$ can be taken equal to $\pm$ the area enclosed by $C_i(H)$.

Let us consider the vertices $O_k$ of the graph from the point of view of their accessibility. The functions $v_i(H)$ are bounded at all vertices $O_k$ except at $O_\infty$. Now,
$$\lim_{H\to H(O_k)} u_i'(H) = 2\left(\oint_{C_{ki}}|\nabla H(x)|\,dl\right)^{-1} \tag{1.17}$$
(if $u_i(H(O_k))$ is finite, this limit is the one-sided derivative of $u_i$ at $H(O_k)$). The limit (1.17) is finite for a vertex $O_k$ corresponding to a separatrix containing a saddle point, and the function $u_i$ is bounded at the end corresponding to $O_k$; the point $O_k$ is accessible. Since $u_i'(H)$ has a finite positive limit at the end $H(O_k)$ corresponding to $O_k$, we can rewrite the interior boundary condition (1.12) in the form
$$\alpha_k\,Lf(O_k) = \sum_{i:\,I_i\sim O_k}(\pm\beta_{ki})\,f_i'(H(O_k)), \tag{1.18}$$
where $f_i'$ denotes the derivative with respect to the local coordinate H on the ith segment; the coefficients $\beta_{ki}$ (which are different from those in formula (1.12)) are taken with + if $H\ge H(O_k)$ on $I_i$, and with $-$ if $H\le H(O_k)$ on $I_i$.
As for a vertex $O_k$ corresponding to an extremum $x_k$, we have $\oint_{C_i(H)}|\nabla H|\,dl \sim \mathrm{const}\cdot|H - H(x_k)|$ and $u_i'(H)\sim \mathrm{const}\cdot|H - H(x_k)|^{-1}$ as $H\to H(x_k)$, so that $u_i$ is unbounded near such an end: the end $O_k$ corresponding to an extremum $x_k$ is inaccessible. For the vertex $O_\infty$, the integral $\int v_i(H)\,du_i(H) = \int v_i(H)\,u_i'(H)\,dH$ diverges (one checks, using the growth assumptions on H of Theorem 2.2, that the integrand $v_i(H)\,u_i'(H)$ stays bounded away from zero for large H), and this vertex is inaccessible.
We prove (Theorem 2.2) that the process $Y(X^\varepsilon_t)$ converges to a diffusion process $Y_t$ on the graph $\Gamma$ in the sense of weak convergence of distributions in the space of continuous functions with values in the graph $\Gamma$; and we find the interior boundary conditions at the vertices $O_k$ that correspond to curves with saddle points on them. The coefficients $\alpha_k$ in the conditions (1.18) turn out to be 0.

Let us outline a plan of the proof. The proof consists of several parts ensuring that:
(1) a continuous limiting process $Y_t$ exists;
(2) it is the diffusion process corresponding to the operator $L_i$ before it leaves the interior of the segment $I_i$;
(3) it spends zero time at the vertices $O_k$;
(4) the behavior of the process after it reaches a vertex $O_k$ does not depend on where it came from (so that it has a strong Markov character also with respect to the times of reaching a vertex);
(5) the coefficients $\beta_{ki}$ in the interior boundary conditions (1.18) are such and such.
A more precise formulation is as follows.

(1) The family of distributions of $Y(X^\varepsilon_\cdot)$ in the space of continuous functions is tight (weakly precompact). This is the statement of Lemma 3.2.

We can reformulate the points (2) and (3) of our plan in terms of averaging in every finite time interval and spending little time in neighborhoods of $C_k$ in every finite time interval; but instead we use Laplace transforms:

(2) for every $\lambda > 0$ and every smooth function f on an interval $[H_1,H_2]$ consisting of interior points of the interval $H(I_i)$ we have
$$M_x\left\{e^{-\lambda\tau^\varepsilon(H_1,H_2)}\,f\big(H(X^\varepsilon_{\tau^\varepsilon(H_1,H_2)})\big) + \int_0^{\tau^\varepsilon(H_1,H_2)} e^{-\lambda t}\big[\lambda f(H(X^\varepsilon_t)) - L_i f(H(X^\varepsilon_t))\big]\,dt\right\} - f(H(x)) = O(\kappa(\varepsilon)), \tag{1.19}$$
uniformly with respect to $x\in D_i(H_1,H_2)$, where $\kappa(\varepsilon)\to 0$ as $\varepsilon\to 0$. This is the statement of Lemma 3.3. An intermediate stage in proving (1.19) is estimating
$$M_x\int_0^{\tau^\varepsilon} e^{-\lambda t}\,g(X^\varepsilon_t)\,dt \tag{1.20}$$
for functions g in the region $D_i(H_1,H_2)$ such that the integral $\oint g(x)/|b(x)|\,dl$ over each closed trajectory in the region is equal to 0 (Lemma 4.4).

(3) We prove that there exists a function $h(\delta)$, $\lim_{\delta\to 0}h(\delta) = 0$, such that for every fixed $\delta > 0$,
$$M_x\int_0^\infty e^{-\lambda t}\,\chi_{D_k(\pm\delta)}(X^\varepsilon_t)\,dt \le h(\delta) \tag{1.21}$$
for sufficiently small $\varepsilon$ and for all $x\in R^2$.
The form (1.21) of this property is easier to understand (little time is spent by the process $Y(X^\varepsilon_t)$ in a neighborhood of any vertex $O_k$), but it is easier to prove, and to use in the proof of our main result, in the form of an estimate of $M_x\tau^\varepsilon_k(\pm\delta)$ for all $x\in D_k(\pm\delta)$. Such estimates turn out to be different for vertices corresponding to extremum points and for those that correspond to curves containing saddle points; they are given in Lemmas 3.4 and 3.5.

(4) We prove that for every small $\delta > 0$ there exists $\delta'$, $0 < \delta' < \delta$, such that the probabilities $P_x\{X^\varepsilon_{\tau^\varepsilon_k(\pm\delta)}\in C_{ki}(\delta)\}$ almost do not depend on x as x changes on $\bigcup_i C_{ki}(\delta')$ (the beginning of the proof of Lemma 3.6).

(5) To find the coefficients $\beta_{ki}$ we use the fact that the invariant measure $\mu$ of the process $X^\varepsilon_t$ is the Lebesgue measure, so the limiting process for $Y(X^\varepsilon_t)$ must have the invariant measure $\mu\circ Y^{-1}$.

One can find a discussion of some generalizations of the problem described in this section in Section 7. In particular, there we consider briefly perturbations of systems with conservation laws. Perturbations of Hamiltonian systems on tori may lead to processes on graphs that spend a positive time at some vertices. Degenerate perturbations of Hamiltonian systems, in particular, perturbations of a nonlinear oscillator with one degree of freedom, are also considered in Section 7. In addition, we briefly consider there the multidimensional case and the case of several conservation laws.
§2. Main Results

Theorem 2.1. Let $\Gamma$ be a graph consisting of closed segments $I_1,\ldots,I_N$ and vertices $O_1,\ldots,O_M$. Let a coordinate be defined in the interior of each segment $I_i$; let $u_i(y)$, $v_i(y)$, for every segment $I_i$, be two functions on its interior that increase (strictly) as the coordinate increases, and let $u_i$ be continuous. Suppose that the vertices are divided into two classes: interior vertices, for which $\lim_{y\to O_k}u_i(y)$, $\lim_{y\to O_k}v_i(y)$ are finite for all segments $I_i$ meeting at $O_k$ (notation: $I_i\sim O_k$); and exterior vertices, such that only one segment $I_i$ enters $O_k$, and $\int(c + v_i(y))\,du_i(y)$ diverges at the end $O_k$ for some constant c. For each interior vertex $O_k$, let $\beta_{ki}$ be nonnegative constants defined for i such that $I_i\sim O_k$, with $\sum_{i:\,I_i\sim O_k}\beta_{ki} > 0$. Consider the set $D(A)\subseteq C(\Gamma)$ consisting of all functions f such that f has a continuous generalized derivative $(d/dv_i)(d/du_i)f$ in the interior of each segment $I_i$; finite limits $\lim_{y\to O_k}(d/dv_i)(d/du_i)f(y)$ exist at every vertex $O_k$, and they do not depend on the segment $I_i\sim O_k$; and for each interior vertex $O_k$,
$$\sum_{i:\,I_i\sim O_k}{}^{\prime}\ \beta_{ki}\,\lim_{y\to O_k}\frac{df}{du_i}(y)\;-\;\sum_{i:\,I_i\sim O_k}{}^{\prime\prime}\ \beta_{ki}\,\lim_{y\to O_k}\frac{df}{du_i}(y) = 0, \tag{2.1}$$
where the sum $\sum'$ contains all i such that the coordinate on the ith segment has a minimum at $O_k$, and $\sum''$, those for which it has a maximum.
Define the operator A with domain of definition $D(A)$ by $Af(y) = (d/dv_i)(d/du_i)f(y)$ in the interior of every segment $I_i$, and at the vertices, as the limit of this expression.

Then there exists a strong Markov process $(y_t, P_y)$ on $\Gamma$ with continuous trajectories whose infinitesimal operator is A. If we take the space $C[0,\infty)$ of all continuous functions on $[0,\infty)$ with values in $\Gamma$ as the sample space for this process, with $y_t$ being the value of a function of this space at the point t, such a process is unique. If $O_k$ is an exterior vertex and $y\ne O_k$, then with $P_y$-probability 1 the process never reaches $O_k$.

The proof can be carried out similarly to that of the corresponding result for diffusions on an interval. First the fulfillment of the conditions of the Hille-Yosida theorem is verified; then the existence of a continuous-path version of the Markov process is proved; and so on: see Mandl [1] or the original papers by Feller [1], [2]. Diffusion processes on graphs have been considered in some papers, in particular, in Baxter and Chacon [1]. The corresponding Theorem 3.1 in Freidlin and Wentzell [3] allows considering only interior vertices.

In the situation of a graph associated with a Hamiltonian system, the vertices corresponding to level curves of the Hamiltonian containing saddle points are interior vertices, and those corresponding to extrema and to the point at infinity, exterior.
Theorem 2.2. Let the Hamiltonian $H(x)$, $x\in R^2$, be four times continuously differentiable with bounded second derivatives; let $H(x)\ge A_1|x|^2$, $|\nabla H(x)|\ge A_2|x|$, and $\Delta H(x)\ge A_3$ for sufficiently large $|x|$, where $A_1$, $A_2$, $A_3$ are positive constants. Let $H(x)$ have a finite number of critical points $x_1,\ldots,x_N$, at which the matrix of second derivatives is nondegenerate. Let every level curve $C_k$ (see the notations in Section 1) contain only one critical point $x_k$.

Let $(X^\varepsilon_t, P^\varepsilon_x)$ be the diffusion process on $R^2$ corresponding to the differential operator $L^\varepsilon f(x) = \tfrac12\Delta f(x) + \varepsilon^{-2}\,\bar\nabla H(x)\cdot\nabla f(x)$. Then the distribution of the process $Y(X^\varepsilon_t)$ in the space of continuous functions on $[0,\infty)$ with values in $Y(R^2)$ ($=\Gamma$) with respect to $P^\varepsilon_x$ converges weakly to the probability measure $P_{Y(x)}$, where $(y_t, P_y)$ is the process on the graph whose existence is stated in Theorem 2.1, corresponding to the functions $u_i$, $v_i$ defined by formulas (1.15) and (1.16), and to the coefficients $\beta_{ki}$ given by
$$\beta_{ki} = \oint_{C_{ki}}|\nabla H(x)|\,dl. \tag{2.2}$$
The proof is given in Sections 3-6; now we give an application to partial differential equations.
Let G be a bounded region in $R^2$ with smooth boundary $\partial G$. Consider the Dirichlet problem
$$L^\varepsilon f^\varepsilon(x) = \tfrac12\Delta f^\varepsilon(x) + \frac{1}{\varepsilon^2}\,\bar\nabla H(x)\cdot\nabla f^\varepsilon(x) = -g(x),\quad x\in G; \qquad f^\varepsilon(x) = \psi(x),\quad x\in\partial G. \tag{2.3}$$
Here $H(x)$ is the same as in Theorem 2.2, $\psi(x)$ and $g(x)$ are continuous functions on $\partial G$ and on $G\cup\partial G$, respectively, and $\varepsilon$ is a small parameter.

It is well known that the behavior of $f^\varepsilon(x)$ as $\varepsilon\to 0$ depends on the behavior of the trajectories of the dynamical system $\dot x_t = \bar\nabla H(x_t)$: if the trajectory $x_t(x)$, $t\ge 0$, starting at $x_0(x) = x$ hits the boundary $\partial G$ at a point $z(x)\in\partial G$, and $\bar\nabla H(z(x))\cdot n(z(x))\ne 0$ ($n(z(x))$ is the outward normal vector to $\partial G$ at the point $z(x)$), then $f^\varepsilon(x)\to\psi(z(x))$ as $\varepsilon\to 0$ (see, e.g., Freidlin [15]; no integral of the function g is involved in this formula because, in the case we are considering, the trajectory $X^\varepsilon_t$ leaves the region G in a time of order $\varepsilon^2$). If $x_t(x)$ does not leave the region G, the situation is more complicated.
Let $\hat G$ be the subset of G covered by the orbits $C_i(H)$ that belong entirely to G: $\hat G = \bigcup_{i,H:\,C_i(H)\subset G} C_i(H)$; and let $\Gamma_G$ be the subset of the graph $\Gamma$ that corresponds to $\hat G\subset R^2$: $\Gamma_G = Y(\hat G) = \{(i,H) : C_i(H)\subset G\}$.

(i) Assume that the set $\Gamma_G$ is connected (otherwise one should consider its connected components separately). Suppose that the boundary of $\Gamma_G$ consists of the points $(i_k, H_k)$, $k = 1,\ldots,l$. Each of the curves $C_{i_k}(H_k)$, $(i_k,H_k)\in\partial\Gamma_G$, has a nonempty intersection with $\partial G$.

(ii) Assume that for each $(i_k,H_k)\in\partial\Gamma_G$ the intersection $C_{i_k}(H_k)\cap\partial G$ consists of exactly one point $z_k$. This should be considered as the main case. Later we discuss the case of $C_{i_k}(H_k)\cap\partial G$ consisting of more than one point.

(iii) Assume that $\partial\Gamma_G$ contains no vertices.

Let A be the set of all vertices that belong to $\Gamma_G$. Let $A = A_1\cup A_2$, where $A_1$ consists of interior vertices, and $A_2$, of exterior vertices.
Theorem 2.3. If for a point $x\in G$ the trajectory $x_t(x)$, $t\ge 0$, hits the boundary $\partial G$ at a point $z(x)\in\partial G$, and $\bar\nabla H(z(x))\cdot n(z(x))\ne 0$, then
$$\lim_{\varepsilon\to 0} f^\varepsilon(x) = \psi(z(x)).$$
If $x\in\hat G$, the Hamiltonian $H(x)$ satisfies the conditions of Theorem 2.2, and conditions (i), (ii), and (iii) are fulfilled, then
$$\lim_{\varepsilon\to 0} f^\varepsilon(x) = f(i(x), H(x)),$$
where $f(i,H)$ is the solution of the following Dirichlet problem on $\Gamma_G$:
$$\tfrac12 A_i(H)\,f''(i,H) + B_i(H)\,f'(i,H) = -\bar g(i,H), \qquad (i,H)\in\Gamma_G,$$
$$f(i_k,H_k) = \psi(z_k) \ \text{ for } (i_k,H_k)\in\partial\Gamma_G; \qquad f(i,H) \ \text{ is continuous on } \Gamma_G;$$
$$\sum_{i:\,I_i\sim O_k}(\pm\beta_{ki})\,f'(i,H(O_k)) = 0 \ \text{ for } O_k\in A_1. \tag{2.4}$$
Here
$$A_i(H) = \frac{\oint_{C_i(H)}|\nabla H(x)|\,dl}{\oint_{C_i(H)}|\nabla H(x)|^{-1}\,dl}, \qquad B_i(H) = \frac{\oint_{C_i(H)}\big(\Delta H(x)/2|\nabla H(x)|\big)\,dl}{\oint_{C_i(H)}|\nabla H(x)|^{-1}\,dl};$$
the $\beta_{ki}$ are defined by (2.2) and taken with + if $H\ge H(O_k)$ for $(i,H)\in I_i$, and with $-$ if $H\le H(O_k)$ for $(i,H)\in I_i$; and
$$\bar g(i,H) = \frac{\oint_{C_i(H)} g(x)\,|\nabla H(x)|^{-1}\,dl}{\oint_{C_i(H)}|\nabla H(x)|^{-1}\,dl}.$$
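On each segment this is a one-dimensional two-point problem that can be solved in quadratures (cf. formula (2.5) below). As a sanity check with the illustrative oscillator coefficients used earlier, $A(H) = 2H$, $B(H) = 1$, and $\bar g\equiv 1$: the particular solution vanishing at $H_0$ is $\bar f(H) = -(H - H_0) + H_0\ln(H/H_0)$, and it satisfies $\tfrac12 A\bar f'' + B\bar f' = -1$.

```python
import math

H0 = 0.5  # illustrative lower end of the segment

def f_bar(h):
    # closed form of -int_{H0}^{H} u'(z) [ int_{H0}^{z} g_bar * v'(y) dy ] dz
    # for u'(z) = 1/(2*pi*z), v'(y) = 2*pi, g_bar = 1
    return -(h - H0) + H0 * math.log(h / H0)

def residual(h, dh=1e-4):
    # (1/2) A f'' + B f' with A = 2H, B = 1; should equal -g_bar = -1
    d1 = (f_bar(h + dh) - f_bar(h - dh)) / (2 * dh)
    d2 = (f_bar(h + dh) - 2 * f_bar(h) + f_bar(h - dh)) / (dh * dh)
    return h * d2 + d1

worst = max(abs(residual(h) + 1.0) for h in (0.8, 1.0, 1.5, 2.0))
```

The general solution on the segment is this particular solution plus an affine function of u(H), which is exactly the structure of formula (2.5).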
The solution of problem (2.4) exists and is unique. If the functions $u_i(H)$, $v_i(H)$ are defined by (1.15), (1.16), then
$$f(i,H) = \bar f(i,H) + a_i^{(1)}u_i(H) + a_i^{(2)}, \tag{2.5}$$
where
$$\bar f(i,H) = -\int_{H(O_k)}^{H}\left[\int_{H(O_k)}^{z}\bar g(i,y)\,v_i'(y)\,dy\right]u_i'(z)\,dz,$$
$O_k$ being an end of the segment $I_i$. The constants $a_i^{(1)}$, $a_i^{(2)}$ are determined uniquely by the boundary conditions on $\partial\Gamma_G$, the continuity, and the gluing conditions (2.1) at the vertices belonging to $A_1$.

The proof of this theorem is based on Theorem 2.2 and includes the following statements.

1. The solutions of problems (2.3) and (2.4) can be represented as follows:
$$f^\varepsilon(x) = M_x\,\psi(X^\varepsilon_{\tau^\varepsilon}) + M_x\int_0^{\tau^\varepsilon} g(X^\varepsilon_t)\,dt, \qquad f(i,H) = M_{(i,H)}\,\psi(y_{\tau^0}) + M_{(i,H)}\int_0^{\tau^0}\bar g(y_s)\,ds,$$
where $\tau^\varepsilon$ is the first exit time from G for $X^\varepsilon_t$: $\tau^\varepsilon = \min\{t : X^\varepsilon_t\notin G\}$; $\tau^0$ is the first exit time from $\Gamma_G$ for the process $y_t$ of Theorem 2.2: $\tau^0 = \min\{t : y_t\notin\Gamma_G\}$; and $M_{(i,H)} = M_y$ is the expectation corresponding to the probability $P_y$ associated with the process $y_t$ on the graph.

2. $M_x\tau^\varepsilon\le A < \infty$ for every small $\varepsilon\ne 0$ and for every $x\in G$. The proof of this statement consists of two main parts. First we get a bound for the mean exit time from a region U such that the closure of $Y(U)$ contains no vertices, and this is done in the standard way. Another bound is obtained for the expected exit time from a region D such that $Y(D)$ belongs to a neighborhood of a vertex of $\Gamma$; for this, Lemmas 3.4 and 3.5 are used.

3. $A_{i_k}(H_k) > 0$ for all k, $(i_k,H_k)\in\partial\Gamma_G$, and thus every $(i_k,H_k)$ is a regular point of the boundary $\partial\Gamma_G$ for the process $(y_t, P_y)$.

4. If we denote $a(k,H) = \inf\{|\nabla H(x)| : x\in C_k(H)\}$, then for every positive $\delta$,
$$\lim_{h\to 0}\lim_{\varepsilon\to 0} P_x\left\{\left|\frac{1}{h}\int_0^h g(X^\varepsilon_s)\,ds - \bar g(i(x),H(x))\right| > \delta\right\} = 0$$
uniformly in x such that $a(Y(x))\ge a > 0$.

Example. Let us consider the boundary-value problem (2.3) with $g(x)\equiv 0$, $H(x)$ as shown in Fig. 18, and the region G as shown in Fig. 22(a). The region G has two holes. Its boundary touches the orbits at the points $z_1$, $z_2$, $z_3$. In the part of G outside the region bounded by the dotted line the trajectories of the dynamical system leave the region G. For $x\in G\setminus(\hat G\cup\partial\hat G)$ the limit of $f^\varepsilon(x)$ is equal to the value of the boundary function at the point
at which the trajectory $x_t(x)$ first leaves $G$. To evaluate $\lim_{\varepsilon\to 0} f^\varepsilon(x)$ for $x \in \tilde G$ ($\tilde G$ is shown separately in Fig. 22(c)) one must consider the graph $\Gamma$ corresponding to our Hamiltonian and its part $\Gamma_G \subset \Gamma$ corresponding to the region $G$ (see Fig. 22(b)). In our example $\partial\Gamma_G$ consists of three points $(4,H_1)$, $(5,H_2)$, and $(1,H_3)$. The boundary conditions at $\partial\Gamma_G$ are:

$$ f(4,H_1) = \psi(z_1), \qquad f(5,H_2) = \psi(z_2), \qquad f(1,H_3) = \psi(z_3). $$
Since the right-hand side of the equation in problem (2.4) is equal to 0, on every segment $I_i$ the limiting solution $f(i,H)$ is a linear function of the function $u_i(H)$. Since the functions $u_i(H)$ are unbounded at the vertices of the second class (in particular, at $O_3$), and the solution $f(i,H)$ is bounded, this solution is constant on the segment $I_3$. On the remaining segments the formulas (2.5) give the limiting solution up to two constants on each of the segments $I_1$, $I_2$, $I_4$, $I_5$. So we have nine unknown constants. To determine them we have three boundary conditions, four equations arising from the continuity conditions at $O_1$ and $O_2$, and two gluing conditions at these vertices. So we have a system of nine linear equations with nine unknowns. This system has a unique solution.
8. Random Perturbations of Hamiltonian Systems
Figure 22.
Another class of examples is that with $g(x) \equiv 1$ and $\psi(x) \equiv 0$. The solution of the corresponding problem (2.3) is the expectation $M_x\tau^\varepsilon$. So Theorem 2.3 provides a method of finding the limit of the expected exit time of the process $X^\varepsilon_t$ from a region $G$ (or the main term of the asymptotics of the expected exit time for the process $\tilde X^\varepsilon_t$ as $\varepsilon \to 0$).

Remark. We assumed in Theorem 2.3 that for each $(i_k,H_k) \in \partial\Gamma_G$ the set $C_{i_k}(H_k)\cap\partial G$ consists of exactly one point $z_k$. The opposite case is that in which this intersection is equal to $C_{i_k}(H_k)$. In this case one should replace the boundary condition $f(i_k,H_k) = \psi(z_k)$ by $f(i_k,H_k) = \bar\psi_k$, where

$$ \bar\psi_k = \frac{\displaystyle\oint_{C_{i_k}(H_k)} \psi(x)\,\frac{dl}{|\nabla H(x)|}}{\displaystyle\oint_{C_{i_k}(H_k)} \frac{dl}{|\nabla H(x)|}}. $$

If $\partial G\cap C_{i_k}(H_k)$ consists of more than one point but does not coincide with $C_{i_k}(H_k)$, the situation is more complicated.
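The averaged boundary value $\bar\psi_k$ is a curve integral that is easy to evaluate numerically in a model case. The sketch below is not from the book: it assumes the model Hamiltonian $H(x) = |x|^2/2$, whose level curves are circles of radius $r$ on which $|\nabla H| = r$ is constant, so that $\bar\psi_k$ reduces to the plain mean of $\psi$ over the circle; the function names are illustrative.

```python
import math

# Hedged sketch (not from the book): evaluate
#   psi_bar = (∮ psi / |∇H| dl) / (∮ 1 / |∇H| dl)
# over a level curve of H(x) = |x|^2 / 2, i.e. a circle of radius r.

def psi_bar(psi, r, n=200_000):
    """Trapezoid-rule quadrature of both curve integrals over the circle."""
    num = den = 0.0
    for k in range(n):
        th = 2 * math.pi * k / n
        x1, x2 = r * math.cos(th), r * math.sin(th)
        grad = math.hypot(x1, x2)        # |∇H(x)| = |x| = r on this curve
        dl = 2 * math.pi * r / n         # arc-length element
        num += psi(x1, x2) / grad * dl
        den += 1.0 / grad * dl
    return num / den

r = 1.7
# For psi(x) = x1^2 the mean over the circle of radius r is r^2 / 2.
print(psi_bar(lambda x1, x2: x1 * x1, r), r * r / 2)
```

Since $|\nabla H|$ is constant on these circles, the weighting drops out; on a non-circular level curve the factor $1/|\nabla H(x)|$ genuinely reweights $\psi$ toward the slow portions of the curve.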
§3. Proof of Theorem 2.2

Before proving Theorem 2.2 we formulate the necessary lemmas and introduce some notation. The proofs of the lemmas are given in Sections 4 and 5.
Lemma 3.1. Let $M$ be a metric space and $Y$ a continuous mapping of $M$ onto $Y(M)$, $Y(M)$ being a complete separable metric space. Let $(X^\varepsilon_t, P^\varepsilon_x)$ be a family of Markov processes in $M$, and suppose that the process $Y(X^\varepsilon_t)$ has continuous trajectories (in Theorem 2.2 the process $X^\varepsilon_t$ itself has continuous paths, but we want to formulate our lemma so that it can be applied in a wider class of situations). Let $(y_t, P_y)$ be a Markov process with continuous paths in $Y(M)$ whose infinitesimal operator is $A$ with domain of definition $D(A)$. Let us suppose that the space $C[0,\infty)$ of continuous functions on $[0,\infty)$ with values in $Y(M)$ is taken as the sample space, so that the distribution of the process in the space of continuous functions is simply $P_y$. Let $\Psi$ be a subset of the space $C(Y(M))$ such that for measures $\mu_1$, $\mu_2$ on $Y(M)$ the equality $\int f\,d\mu_1 = \int f\,d\mu_2$ for all $f\in\Psi$ implies $\mu_1 = \mu_2$. Let $D$ be a subset of $D(A)$ such that for every $f\in\Psi$ and $\lambda > 0$ the equation $\lambda F - AF = f$ has a solution $F\in D$. Suppose that for every $x\in M$ the family of distributions $Q^\varepsilon_x$ of $Y(X^\varepsilon_\cdot)$ in the space $C[0,\infty)$ corresponding to the probabilities $P^\varepsilon_x$ for all $\varepsilon$ is tight, and that for every compact $K \subseteq Y(M)$, for every $f\in D$ and every $\lambda > 0$,

$$ M^\varepsilon_x\int_0^\infty e^{-\lambda t}\,[\lambda f(Y(X^\varepsilon_t)) - Af(Y(X^\varepsilon_t))]\,dt \to f(Y(x)) \tag{3.1} $$

as $\varepsilon\to0$, uniformly in $x\in Y^{-1}(K)$. Then $Q^\varepsilon_x$ converges weakly as $\varepsilon\to0$ to the probability measure $P_{Y(x)}$.
The lemmas that follow are formulated under the assumption that the conditions of Theorem 2.2 are satisfied.

Lemma 3.2. The family of distributions $Q^\varepsilon_x$ (those of $Y(X^\varepsilon_\cdot)$ with respect to the probability measures $P^\varepsilon_x$ in the space $C[0,\infty)$) with small nonzero $\varepsilon$ is tight.

It can also be proved that the family $\{Q^\varepsilon_x\}$ with small $\varepsilon$ and $x$ changing in an arbitrary compact subset of $R^2$ is tight.

Lemma 3.3. For every fixed $H_1 < H_2$ belonging to the interior of the interval $H(I_i)$, for every function $f$ on $[H_1,H_2]$ that is three times continuously differentiable on this closed interval, and for every positive number $\lambda$,

$$ M^\varepsilon_x\Big[ e^{-\lambda\tau^\varepsilon_i(H_1,H_2)} f(H(X^\varepsilon_{\tau^\varepsilon_i(H_1,H_2)})) + \int_0^{\tau^\varepsilon_i(H_1,H_2)} e^{-\lambda t}\,[\lambda f(H(X^\varepsilon_t)) - L_i f(H(X^\varepsilon_t))]\,dt \Big] \to f(H(x)) \tag{3.2} $$

as $\varepsilon\to0$, uniformly with respect to $x\in D_i(H_1,H_2)$.
Lemma 3.4. Let $O_k$ be an exterior vertex (one corresponding to an extremum of the Hamiltonian). Then for every positive $\lambda$ and $\varkappa$ there exists $\delta_{3.4} > 0$ such that for sufficiently small $\varepsilon$, for all $x\in D_k(\pm\delta_{3.4})$,

$$ M^\varepsilon_x\int_0^{\tau_k(\pm\delta_{3.4})} e^{-\lambda t}\,dt < \varkappa. \tag{3.3} $$

Lemma 3.5. Let $O_k$ be an interior vertex (corresponding to a curve containing a saddle point). Then for every positive $\lambda$ and $\varkappa$ there exists $\delta_{3.5} > 0$ such that for $0 < \delta \le \delta_{3.5}$, for sufficiently small $\varepsilon$, for all $x\in D_k(\pm\delta)$,

$$ M^\varepsilon_x\int_0^{\tau_k(\pm\delta)} e^{-\lambda t}\,dt < \varkappa\,\delta. \tag{3.3'} $$
Remark. Formulas (3.3) and (3.3') are almost the same as $M^\varepsilon_x\tau_k(\pm\delta) < \varkappa$ or $\varkappa\,\delta$, and such inequalities can also be proved; but it is these formulas rather than the more natural estimates for $M^\varepsilon_x\tau_k(\pm\delta)$ that are used in the proof of Theorem 2.2.
Lemma 3.6. Define, for $I_i \sim O_k$,

$$ p_{ki} = \frac{\beta_{ki}}{\sum_{j:\,I_j\sim O_k}\beta_{kj}}, $$

where the $\beta_{ki}$ are defined by formula (2.2). For any positive $\varkappa$ there exists a positive $\delta_{3.6}$ such that for $0 < \delta \le \delta_{3.6}$ there exists $\delta'_{3.6} = \delta'_{3.6}(\delta) > 0$ such that for sufficiently small $\varepsilon$,

$$ |P^\varepsilon_x\{X^\varepsilon_{\tau_k(\pm\delta)} \in C_{ki}(\delta')\} - p_{ki}| < \varkappa \tag{3.4} $$

for all $x\in D_k(\pm\delta'_{3.6})$.
Proof of Theorem 2.2. We make use of Lemma 3.1. As the set $\Psi$ we take the subset of $C(\Gamma)$ consisting of all functions that are twice continuously differentiable with respect to the local coordinate $H$ inside each segment $I_i$; $D$ is the subset of $D(A)$ consisting of functions with four continuous derivatives inside each segment. The tightness required in Lemma 3.1 is the statement of Lemma 3.2; so what remains is to prove that the difference of the two sides of (3.1) is smaller than an arbitrary positive number $\eta$ for sufficiently small $\varepsilon$, for every $x\in R^2$.

Let a function $f\in D$, a number $\lambda > 0$, and a point $x\in R^2$ be fixed. Choose $H_0 > \max_k H(O_k)$ so that

$$ M^\varepsilon_x e^{-\lambda\tau^\varepsilon} < \frac{\eta}{2\,(\|f\| + \lambda^{-1}\|\lambda f - Af\|)} \tag{3.5} $$

for all sufficiently small $\varepsilon$, where $\tau^\varepsilon = \tau^\varepsilon(H_0) = \min\{t : H(X^\varepsilon_t) \ge H_0\}$ (this is possible by Lemma 3.2).
To prove that

$$ \Big| M^\varepsilon_x\int_0^\infty e^{-\lambda t}\,[\lambda f(Y(X^\varepsilon_t)) - Af(Y(X^\varepsilon_t))]\,dt - f(Y(x)) \Big| < \eta, $$

it is sufficient to prove that

$$ \Big| M^\varepsilon_x\Big[ e^{-\lambda\tau^\varepsilon} f(Y(X^\varepsilon_{\tau^\varepsilon})) + \int_0^{\tau^\varepsilon} e^{-\lambda t}\,[\lambda f(Y(X^\varepsilon_t)) - Af(Y(X^\varepsilon_t))]\,dt \Big] - f(Y(x)) \Big| < \eta/2. \tag{3.6} $$
Now we take small positive $\delta$ and $\delta' < \delta$ and consider cycles between the successive times of leaving the set $\bigcup_k D_k(\pm\delta)$ and reaching the set $\bigcup_{k,i} C_{ki}(\delta')$, before the process reaches the level curve $C(H_0)$: define Markov times $\tau_0 \le \sigma_1 \le \tau_1 \le \sigma_2 \le \cdots$ by $\tau_0 = 0$,

$$ \sigma_n = \min\Big\{ t \ge \tau_{n-1} : X^\varepsilon_t \notin \bigcup_k D_k(\pm\delta) \Big\}, \qquad \tau_n = \min\Big\{ t \ge \sigma_n : X^\varepsilon_t \in \bigcup_{k,i} C_{ki}(\delta') \cup C(H_0) \Big\} $$

(after the process reaches $C(H_0)$, all $\sigma_n$ and $\tau_n$ are equal to $\tau^\varepsilon$). The difference in (3.6) can be represented as the sum over the time intervals from $\tau_n$ to $\sigma_{n+1}$ and from $\sigma_n$ to $\tau_n$: it is equal to

$$ M^\varepsilon_x\sum_{n=0}^\infty\Big[ e^{-\lambda\sigma_{n+1}} f(Y(X^\varepsilon_{\sigma_{n+1}})) - e^{-\lambda\tau_n} f(Y(X^\varepsilon_{\tau_n})) + \int_{\tau_n}^{\sigma_{n+1}} e^{-\lambda t}\,[\lambda f(Y(X^\varepsilon_t)) - Af(Y(X^\varepsilon_t))]\,dt \Big] $$
$$ {}+ M^\varepsilon_x\sum_{n=1}^\infty\Big[ e^{-\lambda\tau_n} f(Y(X^\varepsilon_{\tau_n})) - e^{-\lambda\sigma_n} f(Y(X^\varepsilon_{\sigma_n})) + \int_{\sigma_n}^{\tau_n} e^{-\lambda t}\,[\lambda f(Y(X^\varepsilon_t)) - Af(Y(X^\varepsilon_t))]\,dt \Big] \tag{3.7} $$

(the formally infinite sums are finite for every trajectory for which $\tau^\varepsilon < \infty$). We can write the expectations of the sums as infinite sums of expectations:
$$ \sum_{n=0}^\infty M^\varepsilon_x\Big[ e^{-\lambda\sigma_{n+1}} f(Y(X^\varepsilon_{\sigma_{n+1}})) - e^{-\lambda\tau_n} f(Y(X^\varepsilon_{\tau_n})) + \int_{\tau_n}^{\sigma_{n+1}} e^{-\lambda t}\,[\lambda f(Y(X^\varepsilon_t)) - Af(Y(X^\varepsilon_t))]\,dt \Big] $$
$$ {}+ \sum_{n=1}^\infty M^\varepsilon_x\Big[ e^{-\lambda\tau_n} f(Y(X^\varepsilon_{\tau_n})) - e^{-\lambda\sigma_n} f(Y(X^\varepsilon_{\sigma_n})) + \int_{\sigma_n}^{\tau_n} e^{-\lambda t}\,[\lambda f(Y(X^\varepsilon_t)) - Af(Y(X^\varepsilon_t))]\,dt \Big] \tag{3.8} $$
if the sums $\sum_{n=0}^\infty M^\varepsilon_x\{\tau_n < \tau^\varepsilon;\,e^{-\lambda\tau_n}\}$, $\sum_{n=1}^\infty M^\varepsilon_x\{\sigma_n < \tau^\varepsilon;\,e^{-\lambda\sigma_n}\}$ are finite. So the first thing is to estimate these sums (which are approximately the same as the expected number of cycles; to be precise, of cycles that contribute a significant amount to the sum). Let us denote by $D(H_0,\delta')$ the region $\{x : H(x) < H_0\}\setminus\bigcup_k D_k(\pm\delta')$ ($D(H_0,\delta)$ is the same set with $\delta$ in lieu of $\delta'$). The random time $\tau_n$, $n\ge1$, is the first exit time from $D(H_0,\delta')$ after the time $\sigma_n$. Use the strong Markov property with respect to $\sigma_n$:

$$ M^\varepsilon_x\{\tau_n < \tau^\varepsilon;\,e^{-\lambda\tau_n}\} = M^\varepsilon_x\{\sigma_n < \tau^\varepsilon;\,e^{-\lambda\sigma_n}\phi^\varepsilon_1(X^\varepsilon_{\sigma_n})\}, \tag{3.9} $$

where

$$ \phi^\varepsilon_1(z) = M^\varepsilon_z\Big\{ X^\varepsilon_{\tau^\varepsilon(H_0,\delta')} \in \bigcup_{k,i} C_{ki}(\delta');\; e^{-\lambda\tau^\varepsilon(H_0,\delta')} \Big\} \tag{3.10} $$

(according to our system of notation, we use $\tau^\varepsilon(H_0,\delta')$ for the first exit time from the region $D(H_0,\delta')$). It is clear that $X^\varepsilon_{\sigma_n} \in \bar D(H_0,\delta)$ for every $n$, so

$$ M^\varepsilon_x\{\tau_n < \tau^\varepsilon;\,e^{-\lambda\tau_n}\} \le M^\varepsilon_x\{\sigma_n < \tau^\varepsilon;\,e^{-\lambda\sigma_n}\}\cdot\max\{\phi^\varepsilon_1(z) : z\in\bar D(H_0,\delta)\} \le M^\varepsilon_x\{\tau_{n-1} < \tau^\varepsilon;\,e^{-\lambda\tau_{n-1}}\}\cdot\max\{\phi^\varepsilon_1(z) : z\in\bar D(H_0,\delta)\}, $$

and by induction

$$ M^\varepsilon_x\{\tau_n < \tau^\varepsilon;\,e^{-\lambda\tau_n}\} \le [\max\{\phi^\varepsilon_1(z) : z\in\bar D(H_0,\delta)\}]^n. \tag{3.11} $$
Since in defining the set $D(H_0,\delta)$ we delete neighborhoods of all separatrices and of the extremum points, the set $D(H_0,\delta)$ splits into pieces, each lying within its own region $D_i$. Suppose the ends of the segment $I_i$ are the vertices $O_k$, $O_{k'}$, $H_{i1} = H(O_k) < H_{i2} = H(O_{k'})$. Define, for every small $\delta \ge 0$,

$$ H^{i\delta}_1 = H_{i1} + \delta, \qquad H^{i\delta}_2 = \begin{cases} H_{i2} - \delta & \text{if } H_{i2} < +\infty \ (\text{i.e., if } O_{k'} \ne O_\infty), \\ H_0 & \text{if one of the ends of } I_i \text{ is the vertex } O_\infty. \end{cases} $$

Then $D(H_0,\delta) = \bigcup_i D_i(H^{i\delta}_1, H^{i\delta}_2)$. Suppose the process $X^\varepsilon_t$ starts at a point $z\in D_i(H^{i\delta}_1, H^{i\delta}_2)$. In the region $D_i(H^{i\delta'}_1, H^{i\delta'}_2)$ averaging takes place: we can apply Lemma 3.3 to estimate the expectation

$$ \phi^\varepsilon_1(z) = M^\varepsilon_z\Big\{ X^\varepsilon_{\tau^\varepsilon(H_0,\delta')} \in \bigcup_{k,i} C_{ki}(\delta');\; e^{-\lambda\tau^\varepsilon(H_0,\delta')} \Big\}. $$
The idea is to evaluate the corresponding expectation for the limiting (averaged) process, and to use it as an approximation for $\phi^\varepsilon_1$. Let $U^{\delta'}_i(H)$ be the solution of the boundary-value problem

$$ L_i U^{\delta'}_i(H) - \lambda U^{\delta'}_i(H) = 0, \quad H^{i\delta'}_1 < H < H^{i\delta'}_2; \qquad U^{\delta'}_i(H^{i\delta'}_1) = 1, \quad U^{\delta'}_i(H^{i\delta'}_2) = \begin{cases} 1 & \text{if } O_{k'} \ne O_\infty, \\ 0 & \text{if } O_{k'} = O_\infty \ (\text{and } H^{i\delta'}_2 = H_0). \end{cases} $$

Applying Lemma 3.3 to this function, we obtain that

$$ \phi^\varepsilon_1(z) \to U^{\delta'}_i(H(z)) \tag{3.12} $$

uniformly in $z\in D_i(H^{i\delta'}_1, H^{i\delta'}_2)$. It is clear that the solution $U^{\delta'}_i(H)$ is strictly less than 1 for $H^{i\delta'}_1 < H < H^{i\delta'}_2$; and it is equal to 0 at the boundary point $H_0$ if one of the ends of $I_i$ is $O_\infty$. Therefore $\max\{U^{\delta'}_i(H) : H^{i\delta}_1 \le H \le H^{i\delta}_2\}$ is strictly less than 1. For sufficiently small $\varepsilon$ the maximum $\max\{\phi^\varepsilon_1(z) : z\in\bar D_i(H^{i\delta}_1, H^{i\delta}_2)\}$ is also strictly smaller than 1; so

$$ \sum_{n=0}^\infty M^\varepsilon_x\{\tau_n < \tau^\varepsilon;\,e^{-\lambda\tau_n}\} \le \sum_{n=0}^\infty[\max\{\phi^\varepsilon_1(z) : z\in\bar D(H_0,\delta)\}]^n = \big(1 - \max\{\phi^\varepsilon_1(z) : z\in\bar D(H_0,\delta)\}\big)^{-1} < \infty, $$
and

$$ \sum_{n=1}^\infty M^\varepsilon_x\{\sigma_n < \tau^\varepsilon;\,e^{-\lambda\sigma_n}\} \le \sum_{n=0}^\infty M^\varepsilon_x\{\tau_n < \tau^\varepsilon;\,e^{-\lambda\tau_n}\} $$

is estimated by the same quantity. So the transition from (3.7) to (3.8) is possible.
We need to know how these sums depend on $\delta$. By (3.12) we have for sufficiently small $\varepsilon$:

$$ \sum_{n=0}^\infty M^\varepsilon_x\{\tau_n < \tau^\varepsilon;\,e^{-\lambda\tau_n}\} \le 2\big(1 - \max\{U^{\delta'}_i(H) : H^{i\delta}_1 \le H \le H^{i\delta}_2\}\big)^{-1}. \tag{3.13} $$

If we let $\delta'\downarrow0$, the solution $U^{\delta'}_i(H)$ converges to $U^0_i(H)$, also a solution of the same equation on the interval $(H_{i1}, H^{i0}_2)$, with boundary values 1 at the ends that are interior vertices, and with limits $\le 1$ at any other end (0 at the right-hand end if this is $H_0$). The function $U^0_i(H)$ is downward convex with respect to the function $u_i(H)$; so

$$ \max\{U^0_i(H) : H^{i\delta}_1 \le H \le H^{i\delta}_2\} \le \max\big(U^0_i(H^{i\delta}_1),\,U^0_i(H^{i\delta}_2)\big). $$

If $H_{i1}$ is an end corresponding to an interior vertex, we have $1 - U^0_i(H^{i\delta}_1) \sim A'_i\,|u_i(H^{i\delta}_1) - u_i(H_{i1})|$ as $\delta\to0$, where $A'_i$ is a positive constant. Since the one-sided derivative of $u_i(H)$ with respect to $H$ exists and is positive at such ends (see formula (1.17)), we have $U^0_i(H^{i\delta}_1) \le 1 - A''_i\,\delta$ for sufficiently small $\delta$, where $A''_i > 0$. At all other ends we have $U^0_i(H^{i\delta}_2) \le 1 - B_i$ for small $\delta$, $B_i > 0$. So we obtain

$$ \max\{U^0_i(H) : H^{i\delta}_1 \le H \le H^{i\delta}_2\} \le 1 - A_i\,\delta, \qquad A_i > 0. $$

Now, for every small positive $\delta$ there exists $\delta_0 = \delta_0(\delta) > 0$ such that for all $\delta'$, $0 < \delta' \le \delta_0$, for sufficiently small $\varepsilon$ and for all $x$,

$$ \sum_{n=0}^\infty M^\varepsilon_x\{\tau_n < \tau^\varepsilon;\,e^{-\lambda\tau_n}\} \le C\,\delta^{-1}, \tag{3.14} $$

where $C$ is a constant. Now we return to the sums (3.8) (being the expression for the difference in (3.6)). Using the strong Markov property with respect to the Markov times
$\tau_n$, $\sigma_n$, we can write:

$$ M^\varepsilon_x\Big[ e^{-\lambda\tau^\varepsilon} f(Y(X^\varepsilon_{\tau^\varepsilon})) + \int_0^{\tau^\varepsilon} e^{-\lambda t}\,[\lambda f(Y(X^\varepsilon_t)) - Af(Y(X^\varepsilon_t))]\,dt \Big] - f(Y(x)) = \sum_{n=0}^\infty M^\varepsilon_x\{\tau_n < \tau^\varepsilon;\,e^{-\lambda\tau_n}\phi^\varepsilon_2(X^\varepsilon_{\tau_n})\} + \sum_{n=1}^\infty M^\varepsilon_x\{\sigma_n < \tau^\varepsilon;\,e^{-\lambda\sigma_n}\phi^\varepsilon_3(X^\varepsilon_{\sigma_n})\}, \tag{3.15} $$

where

$$ \phi^\varepsilon_2(z) = M^\varepsilon_z\Big[ e^{-\lambda\tau(\pm\delta)} f(Y(X^\varepsilon_{\tau(\pm\delta)})) + \int_0^{\tau(\pm\delta)} e^{-\lambda t}\,[\lambda f(Y(X^\varepsilon_t)) - Af(Y(X^\varepsilon_t))]\,dt \Big] - f(Y(z)) \tag{3.16} $$

($\tau(\pm\delta)$ being the time at which the process $X^\varepsilon_t$ leaves $D(\pm\delta) = \bigcup_k D_k(\pm\delta)$), and

$$ \phi^\varepsilon_3(z) = M^\varepsilon_z\Big[ e^{-\lambda\tau^\varepsilon(H_0,\delta')} f(Y(X^\varepsilon_{\tau^\varepsilon(H_0,\delta')})) + \int_0^{\tau^\varepsilon(H_0,\delta')} e^{-\lambda t}\,[\lambda f(Y(X^\varepsilon_t)) - Af(Y(X^\varepsilon_t))]\,dt \Big] - f(Y(z)). \tag{3.17} $$
The argument $X^\varepsilon_{\sigma_n}$ in $\phi^\varepsilon_3$, as that in $\phi^\varepsilon_1$ in (3.9), belongs to $\partial D(\pm\delta)$. As for $X^\varepsilon_{\tau_n}$, it is the initial point $x$ for $n = 0$, but for $n \ge 1$ it belongs to $\bigcup_{k,i} C_{ki}(\delta') \cup C(H_0)$. Clearly the function $\phi^\varepsilon_2(z) = 0$ for $z \notin \bigcup_k D_k(\pm\delta)$, in particular, on $C(H_0)$, so the absolute value of the expression (3.15) does not exceed

$$ |\phi^\varepsilon_2(x)| + \sum_{n=1}^\infty M^\varepsilon_x\{\tau_n < \tau^\varepsilon;\,e^{-\lambda\tau_n}\}\cdot\max\Big\{|\phi^\varepsilon_2(z)| : z\in\bigcup_{k,i} C_{ki}(\delta')\Big\} + \sum_{n=1}^\infty M^\varepsilon_x\{\sigma_n < \tau^\varepsilon;\,e^{-\lambda\sigma_n}\}\cdot\max\{|\phi^\varepsilon_3(z)| : z\in\bar D(H_0,\delta)\}. \tag{3.18} $$
By Lemma 3.3, $|\phi^\varepsilon_3(z)|$ is arbitrarily small for sufficiently small $\varepsilon$, uniformly in $z\in\bar D(H_0,\delta) \subset D(H_0,\delta')$, so

$$ \sum_{n=1}^\infty M^\varepsilon_x\{\sigma_n < \tau^\varepsilon;\,e^{-\lambda\sigma_n}\}\cdot\max\{|\phi^\varepsilon_3(z)| : z\in\bar D(H_0,\delta)\} < \eta/20 \tag{3.19} $$

for sufficiently small $\varepsilon$; and this is true for every function $f$ on $\Gamma$ that is smooth on the segments $I_i$, even if it does not satisfy the gluing conditions.
As for $\phi^\varepsilon_2$, we cannot see at once that it converges to 0 as $\varepsilon\to0$, but we can make it small by choosing $\delta$ and $\delta'$. This is not enough, because $\max\{|\phi^\varepsilon_2(z)| : z\in\bigcup_{k,i}C_{ki}(\delta')\}$ is multiplied by a sum that increases at the rate of $\delta^{-1}$ (see (3.14)) as $\delta\to0$; so we have to obtain an estimate for $\phi^\varepsilon_2(z)$ for an arbitrary $z\in D(\pm\delta)$, and a sharper one for $z\in\bigcup_{k,i}C_{ki}(\delta')$.

We estimate $\phi^\varepsilon_2(z)$ in different ways for an arbitrary $z$ belonging to $D(\pm\delta)$ (which is used to estimate the first summand in (3.18)), in the case of $z\in C_{ki}(\delta')$ where $O_k$ is an interior vertex, and in that of an exterior $O_k$. Clearly, for $z\in D_k(\pm\delta)$,

$$ |\phi^\varepsilon_2(z)| \le |M^\varepsilon_z f(Y(X^\varepsilon_{\tau_k(\pm\delta)})) - f(Y(z))| + [\lambda\|f\| + \|\lambda f - Af\|]\cdot M^\varepsilon_z\int_0^{\tau_k(\pm\delta)} e^{-\lambda t}\,dt. \tag{3.20} $$
Choose a positive $\delta_1$ so that for every vertex $O_k$

$$ |f(y) - f(O_k)| < \eta/20 \quad\text{for all } y\in D_k(\pm\delta_1) $$

(we are using the continuity of the function $f$), and so that

$$ M^\varepsilon_z\int_0^{\tau_k(\pm\delta_1)} e^{-\lambda t}\,dt < \frac{\eta}{20\,[\lambda\|f\| + \|\lambda f - Af\|]} $$

for sufficiently small $\varepsilon$ and all $z\in D_k(\pm\delta_1)$ (we are using Lemmas 3.4 and 3.5). Then for $\delta \le \delta_1$ and for sufficiently small $\varepsilon$,

$$ |\phi^\varepsilon_2(z)| < \eta/10 \quad\text{for all } z\in D(\pm\delta). \tag{3.21} $$
Next case: $z\in C_{ki}(\delta')$, the vertex $O_k$ an interior one. Using the fact that the one-sided derivatives $f'_i(O_k)$ exist, choose a positive $\delta_2$ so that

$$ |f(y) - f(O_k) - f'_i(O_k)\,(H(y) - H(O_k))| \le \frac{\eta}{20C}\,|H(y) - H(O_k)| $$

for $y\in I_i\cap D(\pm\delta_2)$, for all segments $I_i$ meeting at $O_k$, where $C$ is the constant of the estimate (3.14). Choose $\delta_3 > 0$, by Lemma 3.5, so that for $0 < \delta \le \delta_3$, for sufficiently small $\varepsilon$ and for all $z\in D_k(\pm\delta)$,

$$ M^\varepsilon_z\int_0^{\tau_k(\pm\delta)} e^{-\lambda t}\,dt \le \frac{\delta\,\eta}{20C\,[\lambda\|f\| + \|\lambda f - Af\|]}. \tag{3.22} $$
Choose a positive $\delta_4$, by Lemma 3.6, so that for $0 < \delta \le \delta_4$ there exists a $\delta'_4 = \delta'_4(\delta)$, $0 < \delta'_4 < \delta$, such that

$$ |P^\varepsilon_z\{X^\varepsilon_{\tau_k(\pm\delta)} \in C_{ki}(\delta)\} - p_{ki}| < \frac{\eta}{20\,C\sum_{i:\,I_i\sim O_k}|f'_i(O_k)|} $$

for sufficiently small $\varepsilon$ and for all $z\in D_k(\pm\delta'_4)$.

Using the continuity of $f$ again, choose, for every $\delta > 0$, a positive $\delta_5 = \delta_5(\delta) < \delta$ so that

$$ |f(y) - f(O_k)| < \delta\,\eta/20C \tag{3.23} $$

for all $y\in D_k(\pm\delta_5)$.
Now for $0 < \delta \le \min(\delta_2,\delta_3,\delta_4)$, $0 < \delta' \le \min(\delta'_4(\delta),\delta_5(\delta))$, for sufficiently small $\varepsilon$ and for all $z\in C_{ki}(\delta') \subset D_k(\pm\delta'_4)\cap D_k(\pm\delta_5)$ we have

$$ |M^\varepsilon_z f(Y(X^\varepsilon_{\tau_k(\pm\delta)})) - f(Y(z))| \le \Big|\sum_{i:\,I_i\sim O_k} p_{ki}\,[f(i,H(O_k)\pm\delta) - f(O_k)]\Big| + \sum_{i:\,I_i\sim O_k} |P^\varepsilon_z\{X^\varepsilon_{\tau_k(\pm\delta)}\in C_{ki}(\delta)\} - p_{ki}|\cdot|f(i,H(O_k)\pm\delta) - f(O_k)| + |f(O_k) - f(Y(z))|. \tag{3.24} $$
In the first summand on the right-hand side the quantity in the brackets differs from $\pm\delta\,f'_i(O_k)$ by not more than $\delta\,\eta/20C$, so the first summand does not exceed

$$ \Big|\sum_{i:\,I_i\sim O_k} p_{ki}\,(\pm\delta)\,f'_i(O_k)\Big| + \sum_{i:\,I_i\sim O_k} p_{ki}\,\delta\,\frac{\eta}{20C}. $$

The first summand here is equal to 0 because the gluing condition is satisfied at $O_k$, and the second one is equal to $\delta\,\eta/20C$. Using the inequality

$$ |f(i,H(O_k)\pm\delta) - f(O_k)| \le \delta\,|f'_i(O_k)| + \delta\,\eta/20C, $$

we see that the second summand on the right-hand side of (3.24) is not greater than

$$ \sum_{i:\,I_i\sim O_k}\frac{\eta}{20\,C\sum_{i:\,I_i\sim O_k}|f'_i(O_k)|}\cdot\delta\,|f'_i(O_k)| + 3\,\delta\cdot\eta/20C = \delta\cdot0.2\,\eta/C $$

($3\delta\cdot\eta/20C$ because the number of segments meeting at $O_k$ is equal to 3 if the level curves do not contain more than one critical point, and $|P^\varepsilon_z\{X^\varepsilon_{\tau_k(\pm\delta)}\in C_{ki}(\delta)\} - p_{ki}| \le 1$).
The third summand on the right-hand side of (3.24) is not greater than $\delta\,\eta/20C$ by (3.23). By (3.22), the second summand on the right-hand side of (3.20) is not greater than $\delta\,\eta/20C$, and

$$ |\phi^\varepsilon_2(z)| \le \delta\cdot0.3\,\eta/C \quad\text{for } z\in C_{ki}(\delta') \tag{3.25} $$

(for interior vertices $O_k$). The last case is that of $z\in C_{ki}(\delta')$, $O_k$ being an exterior vertex (remember, only one segment $I_i$ enters a vertex corresponding to an extremum point).
Let us use again the technique by which we obtained formulas (3.15)–(3.18). Choose a positive $\delta'' < \delta'$, and consider smaller cycles between reaching two sets, taking $\delta'$ instead of $\delta$, $\delta''$ instead of $\delta'$, and $C_{ki}(\delta)$ instead of $C(H_0)$ (everything in a neighborhood of an extremum point $x_k$). Namely, let us introduce Markov times $\sigma'_1 \le \tau'_1 \le \sigma'_2 \le \tau'_2 \le \cdots$ by $\sigma'_1 = 0$, $\tau'_n = \min\{t \ge \sigma'_n : X^\varepsilon_t \in C_{ki}(\delta'')\cup C_{ki}(\delta)\}$, $\sigma'_n = \min\{t \ge \tau'_{n-1} : X^\varepsilon_t \notin D_k(\pm\delta')\}$ (after the process reaches $C_{ki}(\delta)$, all $\tau'_n$ and $\sigma'_n$ are equal to $\tau_k(\pm\delta)$; the difference is that, since we are starting from a point $z\in C_{ki}(\delta')$, we have $\sigma'_1 = 0$, so we do not need to consider $\tau'_0$). We have

$$ \phi^\varepsilon_2(z) = \sum_{n=1}^\infty M^\varepsilon_z\{\tau'_n < \tau_k(\pm\delta);\,e^{-\lambda\tau'_n}\phi^\varepsilon_4(X^\varepsilon_{\tau'_n})\} + \sum_{n=1}^\infty M^\varepsilon_z\{\sigma'_n < \tau_k(\pm\delta);\,e^{-\lambda\sigma'_n}\phi^\varepsilon_5(X^\varepsilon_{\sigma'_n})\}, \tag{3.26} $$
where

$$ \phi^\varepsilon_4(z') = M^\varepsilon_{z'}\Big[ e^{-\lambda\tau_k(\pm\delta')} f(Y(X^\varepsilon_{\tau_k(\pm\delta')})) + \int_0^{\tau_k(\pm\delta')} e^{-\lambda t}\,[\lambda f(Y(X^\varepsilon_t)) - Af(Y(X^\varepsilon_t))]\,dt \Big] - f(Y(z')), \tag{3.27} $$

$$ \phi^\varepsilon_5(z') = M^\varepsilon_{z'}\Big[ e^{-\lambda\tau^\varepsilon_i(H(O_k)\pm\delta,\,H(O_k)\pm\delta'')} f(Y(X^\varepsilon_{\tau^\varepsilon_i(H(O_k)\pm\delta,\,H(O_k)\pm\delta'')})) + \int_0^{\tau^\varepsilon_i(H(O_k)\pm\delta,\,H(O_k)\pm\delta'')} e^{-\lambda t}\,[\lambda f(Y(X^\varepsilon_t)) - Af(Y(X^\varepsilon_t))]\,dt \Big] - f(Y(z')), \tag{3.28} $$

where $\tau^\varepsilon_i(H(O_k)\pm\delta, H(O_k)\pm\delta'')$, according to our system of notations, is the time at which the process leaves the region $D_i(H(O_k)\pm\delta, H(O_k)\pm\delta'')$ (this region is the same as the difference $D_k(\pm\delta)\setminus D_k(\pm\delta'')$); and for $z\in C_{ki}(\delta')$,
$$ |\phi^\varepsilon_2(z)| \le \sum_{n=1}^\infty M^\varepsilon_z\{\tau'_n < \tau_k(\pm\delta);\,e^{-\lambda\tau'_n}\}\cdot\max\Big\{|\phi^\varepsilon_4(z')| : z'\in\bigcup_{i:\,I_i\sim O_k} C_{ki}(\delta'')\Big\} + \sum_{n=1}^\infty M^\varepsilon_z\{\sigma'_n < \tau_k(\pm\delta);\,e^{-\lambda\sigma'_n}\}\cdot\max\Big\{|\phi^\varepsilon_5(z')| : z'\in\bigcup_{i:\,I_i\sim O_k} C_{ki}(\delta')\Big\}. \tag{3.29} $$
Again $\phi^\varepsilon_5(z')$ can be made arbitrarily small by Lemma 3.3; similarly to (3.19), for sufficiently small $\varepsilon$,

$$ \sum_{n=1}^\infty M^\varepsilon_z\{\sigma'_n < \tau_k(\pm\delta);\,e^{-\lambda\sigma'_n}\}\cdot\max\Big\{|\phi^\varepsilon_5(z')| : z'\in\bigcup_{i:\,I_i\sim O_k} C_{ki}(\delta')\Big\} < \delta\,\eta/20C \tag{3.30} $$

for all $z\in C_{ki}(\delta')$.
Similarly to the way we obtained (3.21), we can, for every $\delta' > 0$, choose a positive $\delta'_6 = \delta'_6(\delta') < \delta'$ so small that for sufficiently small $\varepsilon$, for all $z'\in D_k(\pm\delta'_6)$,

$$ |\phi^\varepsilon_4(z')| < \delta\cdot\eta/10C; \tag{3.31} $$

and similarly to (3.9)–(3.11), we obtain:

$$ M^\varepsilon_z\{\tau'_n < \tau_k(\pm\delta);\,e^{-\lambda\tau'_n}\} \le [\max\{\phi^\varepsilon_6(z') : z'\in C_{ki}(\delta')\}]^{n-1}, \tag{3.32} $$

where $\phi^\varepsilon_6(z') = M^\varepsilon_{z'}\{X^\varepsilon_{\tau^\varepsilon_i(H(O_k)\pm\delta,\,H(O_k)\pm\delta'')}\in C_{ki}(\delta'');\,e^{-\lambda\tau^\varepsilon_i(H(O_k)\pm\delta,\,H(O_k)\pm\delta'')}\}$ is the analogue of the function (3.10) for the smaller cycles.
Suppose the process $X^\varepsilon_t$ starts from a point $z\in C_{ki}(\delta')$. Let us consider a solution $V^\delta_i(H)$ of the differential equation $L_i V^\delta_i(H) - \lambda V^\delta_i(H) = 0$ with $V^\delta_i(H(O_k)\pm\delta) = 0$. This solution is determined uniquely up to a multiplicative constant. It does not change sign between $H(O_k)$ and $H(O_k)\pm\delta$, and it converges to $+\infty$ or to $-\infty$ as $H\to H(O_k)$ (since the function $u_i(H)$ has an infinite limit at $H(O_k)$). Applying Lemma 3.3 to the function $V^\delta_i(H)$, we obtain that

$$ \phi^\varepsilon_6(z') \to \frac{V^\delta_i(H(z'))}{V^\delta_i(H(O_k)\pm\delta'')} \tag{3.33} $$

as $\varepsilon\to0$, uniformly in $z'\in D_i(H(O_k)\pm\delta'', H(O_k)\pm\delta)$. Since $|V^\delta_i(H(O_k)\pm\delta'')| \to \infty$ as $\delta''\to0$, for every $\delta' > 0$ we can choose $\delta''_i$, $0 < \delta''_i < \delta'$, so that $V^\delta_i(H(O_k)\pm\delta')/V^\delta_i(H(O_k)\pm\delta'') < 1/3$ for $0 < \delta'' \le \delta''_i$. Then for $\delta'' \le \min_i\delta''_i$ and for sufficiently small $\varepsilon$, for $z'\in C_{ki}(\delta')$,

$$ \phi^\varepsilon_6(z') < 1/2. $$
Using this together with (3.29)–(3.31), we obtain that for sufficiently small $\delta$, for $\delta' \le \min_i(\delta'_i(\delta)) < \delta$, for $0 < \delta'' \le \delta'$, and for sufficiently small $\varepsilon$ we have:

$$ |\phi^\varepsilon_2(z)| < \delta\cdot0.25\,\eta/C \quad\text{for } z\in\bigcup_{i:\,I_i\sim O_k} C_{ki}(\delta'), $$

and together with (3.25) this yields $\max\{|\phi^\varepsilon_2(z)| : z\in\bigcup_{k,i}C_{ki}(\delta')\} \le \delta\cdot0.3\,\eta/C$. This proves the theorem.
§4. Proof of Lemmas 3.1 to 3.4

Proof of Lemma 3.1. The tool we use to establish weak convergence is the martingale problem. See Stroock and Varadhan [1], [2], where martingale problems are formulated in terms of the space $C^\infty$ of infinitely differentiable functions; we use some other sets of functions.

Let us denote by $\Delta^\varepsilon(x)$ the difference between the left-hand side of (3.1) and its limit on the right-hand side. Let $G(y_1,\ldots,y_n)$, $y_i\in Y(M)$, be a bounded measurable function. Then, by the Markov property of the process $(X^\varepsilon_t, P^\varepsilon_x)$, for any $0 \le t_1 < \cdots < t_n \le t_0$ we have

$$ M^\varepsilon_x G(Y(X^\varepsilon_{t_1}),\ldots,Y(X^\varepsilon_{t_n}))\Big[ \int_{t_0}^\infty e^{-\lambda t}\,[\lambda f(Y(X^\varepsilon_t)) - Af(Y(X^\varepsilon_t))]\,dt - e^{-\lambda t_0} f(Y(X^\varepsilon_{t_0})) \Big] = M^\varepsilon_x G(Y(X^\varepsilon_{t_1}),\ldots,Y(X^\varepsilon_{t_n}))\,e^{-\lambda t_0}\,\Delta^\varepsilon(X^\varepsilon_{t_0}). $$
We can represent this expectation as the sum of expectations taken over the event $\{Y(X^\varepsilon_{t_0})\in K\}$ and over its complement, obtaining

$$ M^\varepsilon_x\{Y(X^\varepsilon_{t_0})\in K;\;G(Y(X^\varepsilon_{t_1}),\ldots,Y(X^\varepsilon_{t_n}))\,e^{-\lambda t_0}\Delta^\varepsilon(X^\varepsilon_{t_0})\} + M^\varepsilon_x\{Y(X^\varepsilon_{t_0})\notin K;\;G(Y(X^\varepsilon_{t_1}),\ldots,Y(X^\varepsilon_{t_n}))\,e^{-\lambda t_0}\Delta^\varepsilon(X^\varepsilon_{t_0})\}. $$

The second expectation does not exceed

$$ \|G\|\,e^{-\lambda t_0}\,\sup_z|\Delta^\varepsilon(z)|\cdot P^\varepsilon_x\{Y(X^\varepsilon_{t_0})\notin K\} \le \|G\|\cdot e^{-\lambda t_0}\cdot(2\|f\| + \lambda^{-1}\|\lambda f - Af\|)\cdot P^\varepsilon_x\{Y(X^\varepsilon_{t_0})\notin K\}, $$

and can be made arbitrarily small for all $\varepsilon$ by choosing a compact $K\subseteq Y(M)$.
The first expectation is not greater than $\|G\|\,e^{-\lambda t_0}\sup_{z\in Y^{-1}(K)}|\Delta^\varepsilon(z)|$, and can be made small by choosing $\varepsilon$ sufficiently small. So we obtain

$$ M^\varepsilon_x G(Y(X^\varepsilon_{t_1}),\ldots,Y(X^\varepsilon_{t_n}))\Big[ \int_{t_0}^\infty e^{-\lambda t}\,[\lambda f(Y(X^\varepsilon_t)) - Af(Y(X^\varepsilon_t))]\,dt - e^{-\lambda t_0} f(Y(X^\varepsilon_{t_0})) \Big] \to 0 \quad (\varepsilon\to0). $$
We can rewrite this formula using the expectations with respect to the measure $Q^\varepsilon_x$ in the space $C[0,\infty)$, which we denote by $M^{Q^\varepsilon_x}$:

$$ M^{Q^\varepsilon_x} G(y_{t_1},\ldots,y_{t_n})\Big[ \int_{t_0}^\infty e^{-\lambda t}\,[\lambda f(y_t) - Af(y_t)]\,dt - e^{-\lambda t_0} f(y_{t_0}) \Big] \to 0. \tag{4.1} $$
Since the family of distributions $\{Q^\varepsilon_x\}$ is tight, there exists a sequence $\varepsilon_n\to0$ such that $Q^{\varepsilon_n}_x$ converges weakly to a probability measure $\tilde P$. Let us denote the expectation corresponding to this probability by $\tilde M$. If the function $G$ is continuous, the functional of $y_\cdot\in C[0,\infty)$ under the expectation sign in (4.1) is also continuous, and we can write:

$$ \tilde M\,G(y_{t_1},\ldots,y_{t_n})\Big[ \int_{t_0}^\infty e^{-\lambda t}\,[\lambda f(y_t) - Af(y_t)]\,dt - e^{-\lambda t_0} f(y_{t_0}) \Big] = 0. $$
Changing the order of integration and integrating by parts, we obtain

$$ \int_{t_0}^\infty \lambda e^{-\lambda t}\,\tilde M\,G(y_{t_1},\ldots,y_{t_n})\Big[ f(y_t) - f(y_{t_0}) - \int_{t_0}^t Af(y_s)\,ds \Big]\,dt = 0. $$
Since a continuous function is determined uniquely by its Laplace transform, this means that $\tilde M\,G(y_{t_1},\ldots,y_{t_n})\big[f(y_t) - f(y_{t_0}) - \int_{t_0}^t Af(y_s)\,ds\big] = 0$ for all $n$ and $0 \le t_1 < \cdots < t_n \le t_0 \le t$; so the random function $f(y_t) - \int_0^t Af(y_s)\,ds$ is a martingale with respect to the family of $\sigma$-algebras $\mathscr F_{[0,t]}$ generated by the functionals $y_\cdot\mapsto y_s$, $s\in[0,t]$, and the probability measure $\tilde P$.
In other words, the probability measure $\tilde P$ is a solution of the martingale problem corresponding to the restriction of the operator $A$ to the set $D\subseteq C(Y(M))$, with the starting point $Y(x)$ (which is clear because $Y(X^\varepsilon_0) = Y(x)$ almost surely with respect to each of the probabilities $P^\varepsilon_x$). Now, the condition of existence of a solution $F\in D$ of the equation $\lambda F - AF = f$ for $f\in\Psi$ ensures uniqueness of the solution of the martingale problem. This statement is a variant of Theorem 6.3.2 (with condition (ii)) of Stroock and Varadhan [2], only freed from references to the space $C^\infty$ of infinitely differentiable functions and adapted to time-homogeneous processes.
So $\tilde P = P_{Y(x)}$. The fact that not only the sequence $Q^{\varepsilon_n}_x$ converges weakly to $P_{Y(x)}$, but also $Q^\varepsilon_x$ as $\varepsilon\to0$, is proved in the usual way.
The remaining lemmas involve many estimates with various positive constants, which we denote by the letter $A$ with subscripts. We very often make use of Itô's formula applied to $e^{-\lambda t} f(H(X^\varepsilon_t))$, where $f$ is a smooth function and $\lambda \ge 0$:

$$ e^{-\lambda t} f(H(X^\varepsilon_t)) = f(H(X^\varepsilon_0)) + \int_0^t e^{-\lambda s} f'(H(X^\varepsilon_s))\,\nabla H(X^\varepsilon_s)\cdot dw_s + \int_0^t e^{-\lambda s}\Big[ -\lambda f(H(X^\varepsilon_s)) + \tfrac12 f''(H(X^\varepsilon_s))\,|\nabla H(X^\varepsilon_s)|^2 + \tfrac12 f'(H(X^\varepsilon_s))\,\Delta H(X^\varepsilon_s) \Big]\,ds, \tag{4.2} $$

$$ M^\varepsilon_x e^{-\lambda\tau} f(H(X^\varepsilon_\tau)) = f(H(x)) + M^\varepsilon_x\int_0^\tau e^{-\lambda s}\Big[ -\lambda f(H(X^\varepsilon_s)) + \tfrac12 f''(H(X^\varepsilon_s))\,|\nabla H(X^\varepsilon_s)|^2 + \tfrac12 f'(H(X^\varepsilon_s))\,\Delta H(X^\varepsilon_s) \Big]\,ds \tag{4.3} $$

for the time $\tau$ of going out of an arbitrary bounded region. In particular, for $\lambda = 0$, $f(H) = H$,

$$ M^\varepsilon_x H(X^\varepsilon_t) = H(x) + M^\varepsilon_x\int_0^t \tfrac12\Delta H(X^\varepsilon_s)\,ds, \tag{4.4} $$

$$ M^\varepsilon_x H(X^\varepsilon_\tau) = H(x) + M^\varepsilon_x\int_0^\tau \tfrac12\Delta H(X^\varepsilon_s)\,ds. \tag{4.5} $$
Proof of Lemma 3.2. Introducing the graph, we did not describe the metric on it. Now we do this: for any two points $y, y'\in Y(R^2)$, consider all paths on the graph that connect these points following the segments $I_i$:

$$ y = (i_0,H_0) \leadsto (i_0,H_1) = O_{k_1} = (i_1,H_1) \leadsto (i_1,H_2) = O_{k_2} = (i_2,H_2) \leadsto \cdots \leadsto (i_{l-1},H_{l-1}) = O_{k_{l-1}} = (i_l,H_{l-1}) \leadsto (i_l,H_l) = y', $$

and take

$$ \rho(y,y') = \min\sum_{j=0}^{l-1} |H_{j+1} - H_j|, $$

where the minimum is taken over all such paths. In order to prove the tightness it is sufficient to prove that:

(1) for every $T > 0$ and $\delta > 0$ there exists a number $H_0$ such that

$$ P^\varepsilon_x\Big\{\max_{0\le t\le T} H(X^\varepsilon_t) \ge H_0\Big\} < \delta \tag{4.6} $$

for all $\varepsilon$;
(2) for every compact subset $K\subset R^2$ and for every sufficiently small $\rho > 0$ there exists a constant $A_\rho$ such that for every $a\in K$ there exists a function $f^a_\rho(y)$ on $Y(R^2)$ such that $f^a_\rho(a) = 1$, $f^a_\rho(y) = 0$ for $\rho(y,a)\ge\rho$, $0\le f^a_\rho(y)\le1$ everywhere, and $f^a_\rho(Y(X^\varepsilon_t)) + A_\rho t$ is a submartingale for all $\varepsilon$ (see Stroock and Varadhan [2]).

As for (1), we can write, using formula (4.4),

$$ M^\varepsilon_x H(X^\varepsilon_t) \le H(x) + (A_4/2)\,t, \tag{4.7} $$

where $A_4 \ge \sup_x\Delta H(x)$ (the constant denoted $A_0$ was introduced in (1.14), and $A_1$, $A_2$, $A_3$ in the formulation of Theorem 2.2). Using Kolmogorov's inequality, we obtain for $H_0 > H(x) + (A_4/2)T$:

$$ P^\varepsilon_x\Big\{\max_{0\le t\le T} H(X^\varepsilon_t) \ge H_0\Big\} \le \frac{M^\varepsilon_x\int_0^T |\tfrac12\Delta H(X^\varepsilon_t)|\,dt}{H_0 - H(x) - (A_4/2)T} \le \frac{\int_0^T [A_5 + A_6\,M^\varepsilon_x H(X^\varepsilon_t)]\,dt}{H_0 - H(x) - (A_4/2)T}, $$

and using (4.7), we obtain (4.6).

Let us prove (2). Let $h(x)$ be a fixed smooth function such that $0\le h(x)\le1$, $h(x) = 1$ for $x\le0$, and $h(x) = 0$ for $x\ge1$. Let us take a positive $\rho$ smaller than half the length of the shortest segment of the graph. Let $a$ be a point of $Y(R^2)$. If the distance from $a$ to the nearest vertex of the graph is greater than $2\rho/5$, we put $f^a_\rho(y) = h(5\rho(y,a)/\rho)$; if $\rho(a,O_k)\le2\rho/5$, we put $f^a_\rho(y) = h(5\rho(y,O_k)/\rho - 2)$. In both cases the function $f^a_\rho(y) = 0$ outside the $\rho$-neighborhood of the point $a$.

Now let us consider the functions $f^a_\rho(Y(x))$, $x\in R^2$. These functions are expressed by the formulas $f^a_\rho(Y(x)) = h(5|H(x)-H(a)|/\rho)$, or $f^a_\rho(Y(x)) = h(5|H(x)-H(O_k)|/\rho - 2)$, or are identically 0 in different regions, and the regions in which the different formulas are valid overlap. The functions $|H(x)-H(a)|$, $|H(x)-H(O_k)|$ are smooth except on the lines where $H(x) = H(a)$ or $H(O_k)$; so the functions $f^a_\rho(Y(x))$ are smooth outside these lines. But they are also smooth in some neighborhoods of these lines, because $h'(0) = h''(0) = h'(-2) = h''(-2) = 0$, and the functions $h(|x|)$, $h(|x|-2)$ are smooth. So the functions $f^a_\rho(Y(x))$ are twice continuously differentiable everywhere, and their gradients are orthogonal to $\overline\nabla H(x)$. Apply Itô's formula:

$$ f^a_\rho(Y(X^\varepsilon_t)) = f^a_\rho(Y(X^\varepsilon_0)) + \int_0^t \nabla f^a_\rho(Y(X^\varepsilon_s))\cdot dw_s + \int_0^t \tfrac12\Delta f^a_\rho(Y(X^\varepsilon_s))\,ds, $$

and take

$$ A_\rho = \sup_{a\in K,\,x\in R^2}\tfrac12|\Delta f^a_\rho(Y(x))| \le \frac{5}{2\rho}\,\sup_{x\in[0,1]}|h'(x)|\cdot\sup_{x\in K}|\Delta H(x)| + \frac{25}{2\rho^2}\,\sup_{x\in[0,1]}|h''(x)|\cdot\sup_{x\in K}|\nabla H(x)|^2. $$
The lemma is proved.
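The path metric $\rho(y,y')$ used in this proof can be computed on a concrete graph with a shortest-path search. A hedged sketch (the Y-shaped graph, vertex heights, and all function names below are invented for illustration, not the graph of Fig. 22): each point is a pair (segment index, $H$), and each step along a segment costs $|\Delta H|$.

```python
import heapq

# Hedged sketch (not from the book): compute rho(y, y') on a small graph.
# Points on the graph are pairs (edge index, height H).

def dijkstra(adj, src):
    """Shortest |ΔH|-path distances from vertex src; adj: v -> [(w, cost)]."""
    dist = {v: float('inf') for v in adj}
    dist[src] = 0.0
    pq = [(0.0, src)]
    while pq:
        d, v = heapq.heappop(pq)
        if d > dist[v]:
            continue
        for w, c in adj[v]:
            if d + c < dist[w]:
                dist[w] = d + c
                heapq.heappush(pq, (d + c, w))
    return dist

def rho(p, q, edges, H):
    """p = (edge_index, h); edges[i] = (u, v); H[v] = height of vertex v."""
    (e1, h1), (e2, h2) = p, q
    if e1 == e2:
        return abs(h1 - h2)       # stay on one segment: direct |ΔH|
    adj = {v: [] for v in H}
    for u, v in edges:
        w = abs(H[u] - H[v])
        adj[u].append((v, w))
        adj[v].append((u, w))
    best = float('inf')
    for a in edges[e1]:           # endpoint used to leave edge e1
        d = dijkstra(adj, a)
        for b in edges[e2]:       # endpoint used to enter edge e2
            best = min(best, abs(h1 - H[a]) + d[b] + abs(H[b] - h2))
    return best

# Y-shaped graph: O1 (H=0) - O2 (H=1), with branches O2 - O3 (H=2), O2 - O4 (H=3).
H = {1: 0.0, 2: 1.0, 3: 2.0, 4: 3.0}
edges = [(1, 2), (2, 3), (2, 4)]
print(rho((0, 0.25), (1, 1.5), edges, H))   # 0.75 + 0.5 = 1.25
```

Moving between segments must pass through a shared vertex, which is exactly why the minimum over vertex-to-vertex paths captures the definition of $\rho$.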
Proof of Lemma 3.4. Let us use formula (4.5) with $\tau = \tau_k(\pm\delta)$. The boundary $\partial D_k(\pm\delta)$ consists of one component, and either $H(X^\varepsilon_{\tau_k(\pm\delta)})$ is always equal to $H(x_k)+\delta$, or always to $H(x_k)-\delta$ (depending on whether $x_k$ is a minimum or a maximum of the Hamiltonian). Since the point $x_k$ is a nondegenerate extremum point, we have $\Delta H(x_k)\ne0$. Of course, $|\Delta H(x)| \ge \tfrac12|\Delta H(x_k)|$ in a neighborhood of this point, so for sufficiently small $\delta > 0$,

$$ M^\varepsilon_x\tau_k(\pm\delta) \le \frac{|H(x_k)\pm\delta - H(x)|}{\inf\{|\tfrac12\Delta H(x)| : x\in D_k(\pm\delta)\}} \le \frac{4\delta}{|\Delta H(x_k)|} $$

for all $x\in D_k(\pm\delta)$ and for all $\varepsilon$ (and the expectation on the left-hand side of (3.3) is smaller still).

Lemmas 3.3 and 3.5 are formulated for domains $D_i(H_1,H_2)$, $D_k(\pm\delta)$ that do not depend on $\varepsilon$; but we obtain versions of these lemmas with domains depending on $\varepsilon$: $D_k(\pm\delta)$ with $\delta = \delta(\varepsilon) > 0$ decreasing with $\varepsilon$ (but not too fast), and $D_i(H_1,H_2)$ with one of the ends $H_1$, $H_2$, or both, possibly at a distance $\delta(\varepsilon)$ from an end of the interval $H(I_i)$. Lemma 4.2 is the $\varepsilon$-dependent version of Lemma 3.3, including this lemma as a particular case; and Lemma 5.1 is the $\varepsilon$-dependent version of Lemma 3.5, which does not include it as a particular case. In the proof of Lemma 3.5 both Lemmas 4.2 and 5.1 are used. The proofs rely on a rather complicated system of smaller lemmas.

Let us formulate two $\varepsilon$-dependent versions of Lemma 3.3. Let $\mathscr U_i(H_1,H_2)$ be the class of all functions $g$ defined in the closed region $\bar D_i(H_1,H_2)$, satisfying a Lipschitz condition $|g(x') - g(x)| \le \mathrm{Lip}(g)\cdot|x'-x|$, and such that

$$ \oint_{C_i(H)} g(x)\,\frac{dl}{|b(x)|} = 0 $$

for all $H\in(H_1,H_2)$.
for all He(H1iH2). Lemma 4.1. There exist (small) positive constants A7, A8 such that for every positive A9 for every H1, H2 < A9 in the interval H(11) at a distance not less than 6 = S(e) = eA7 from the ends of H(11), for every positive .l, for sufficiently
317
0. proof of Lemmas 3.1 to 3.4
small e, for every g e (,(HI, H2), and for every x e D,(H,, H2), ; (Hi,H) I MX
e-x'g(XX) dtl < (IIgII + Lip(g)) . eAs
f0
(4.8)
Lemma 4.2. Under the conditions of the previous lemma, there exist positive constants $A_{10}$, $A_{11}$ such that for sufficiently small $\varepsilon$, for every $H_1, H_2 \le A_9$ in the interval $H(I_i)$ at a distance not less than $\delta = \delta(\varepsilon) = \varepsilon^{A_{10}}$ from its ends, for every $x\in D_i(H_1,H_2)$, and for every smooth function $f$ on the interval $[H_1,H_2]$,

$$ \Big| M^\varepsilon_x\Big[ e^{-\lambda\tau^\varepsilon_i(H_1,H_2)} f(H(X^\varepsilon_{\tau^\varepsilon_i(H_1,H_2)})) + \int_0^{\tau^\varepsilon_i(H_1,H_2)} e^{-\lambda t}\,[\lambda f(H(X^\varepsilon_t)) - L_i f(H(X^\varepsilon_t))]\,dt \Big] - f(H(x)) \Big| \le (\|f\| + \|f'\| + \|f''\| + \|f'''\|)\,\varepsilon^{A_{11}}. \tag{4.9} $$
Lemma 3.3 is a particular case of this lemma. Lemma 4.2 is proved simply enough once Lemma 4.1 is proved. To prove Lemma 4.1 we use the fact that our process $X^\varepsilon_t$ moves fast along the trajectories of the Hamiltonian system (and more slowly across them). It is convenient for us to consider the "slow" process $\tilde X^\varepsilon_t$. The following lemma makes precise the statement about our process moving along the trajectories.

Lemma 4.3. For every positive $\eta$,

$$ P_x\Big\{\max_{0\le t\le T} |\tilde X^\varepsilon_t - x_t(x)| \ge \eta\Big\} \le 3\Big(\frac{e^{2LT}-1}{2L}\Big)^2\frac{\varepsilon^4}{\eta^4}, \tag{4.10} $$

where $L$ is the Lipschitz constant of the function $\overline\nabla H$ (the same as that of $\nabla H$); and

$$ M_x|\tilde X^\varepsilon_t - x_t(x)| \le \Big(\frac{e^{2Lt}-1}{2L}\Big)^{1/2}\varepsilon. \tag{4.11} $$
Proof. Applying Itô's formula to the random function $|\tilde X^\varepsilon_t - x_t(x)|^2$, we obtain:

$$ |\tilde X^\varepsilon_t - x_t(x)|^2 = \int_0^t\big[2(\tilde X^\varepsilon_s - x_s(x))\cdot(b(\tilde X^\varepsilon_s) - b(x_s(x))) + \varepsilon^2\big]\,ds + \int_0^t 2\varepsilon\,(\tilde X^\varepsilon_s - x_s(x))\cdot dw_s \le \int_0^t 2L\,|\tilde X^\varepsilon_s - x_s(x)|^2\,ds + \varepsilon^2 t + \int_0^t 2\varepsilon\,(\tilde X^\varepsilon_s - x_s(x))\cdot dw_s, $$

$$ M_x|\tilde X^\varepsilon_t - x_t(x)|^2 \le 2L\int_0^t M_x|\tilde X^\varepsilon_s - x_s(x)|^2\,ds + \varepsilon^2 t. $$
Using the Gronwall–Bellman inequality, we have:

$$ M_x|\tilde X^\varepsilon_t - x_t(x)|^2 \le \frac{e^{2Lt}-1}{2L}\,\varepsilon^2. $$

The estimate (4.11) follows from this at once. Applying Itô's formula to $|\tilde X^\varepsilon_t - x_t(x)|^4$, we obtain in the same way:

$$ |\tilde X^\varepsilon_t - x_t(x)|^4 \le \int_0^t\big[4L\,|\tilde X^\varepsilon_s - x_s(x)|^4 + 6\varepsilon^2\,|\tilde X^\varepsilon_s - x_s(x)|^2\big]\,ds + \int_0^t 4\varepsilon\,|\tilde X^\varepsilon_s - x_s(x)|^2\,(\tilde X^\varepsilon_s - x_s(x))\cdot dw_s, \tag{4.12} $$

$$ M_x|\tilde X^\varepsilon_t - x_t(x)|^4 \le 3\varepsilon^4\Big(\frac{e^{2Lt}-1}{2L}\Big)^2. $$

Putting the Markov time $\tau = \min\{t : |\tilde X^\varepsilon_t - x_t(x)| \ge \eta\}\wedge T$ in place of $t$ in (4.12) and taking the expectation (the idea of the Kolmogorov inequality), we obtain

$$ \eta^4\,P_x\Big\{\max_{0\le t\le T}|\tilde X^\varepsilon_t - x_t(x)| \ge \eta\Big\} \le M_x|\tilde X^\varepsilon_\tau - x_\tau(x)|^4 \le M_x\int_0^\tau\big[4L\,|\tilde X^\varepsilon_s - x_s(x)|^4 + 6\varepsilon^2\,|\tilde X^\varepsilon_s - x_s(x)|^2\big]\,ds \le \int_0^T M_x\big[4L\,|\tilde X^\varepsilon_s - x_s(x)|^4 + 6\varepsilon^2\,|\tilde X^\varepsilon_s - x_s(x)|^2\big]\,ds \le 3\varepsilon^4\Big(\frac{e^{2LT}-1}{2L}\Big)^2. $$
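The mean-square estimate behind (4.11) can be probed by simulation. A minimal Monte Carlo sketch (not from the book: the linear Hamiltonian $H(x)=|x|^2/2$ with $b(x)=\overline\nabla H(x)=(-x_2,x_1)$ and $L=1$, scalar noise acting on the first coordinate only, and all parameter values are assumptions of this illustration) compares an Euler–Maruyama path of $\tilde X^\varepsilon_t$ with the deterministic rotation $x_t(x)$ and checks $M_x|\tilde X^\varepsilon_t - x_t(x)|^2 \le \varepsilon^2(e^{2Lt}-1)/2L$:

```python
import math, random

# Hedged sketch (not from the book): Monte Carlo check of the Gronwall-type
# bound  M|X̃^ε_t − x_t(x)|² ≤ ε²(e^{2Lt} − 1)/(2L)  for a linear rotation
# field with a scalar Brownian perturbation of the first coordinate.

def mean_sq_deviation(eps, T, n_steps=1000, n_paths=400, seed=1):
    random.seed(seed)
    dt = T / n_steps
    sq = math.sqrt(dt)
    acc = 0.0
    for _ in range(n_paths):
        x, y = 1.0, 0.0          # perturbed process X̃^ε_t
        u, v = 1.0, 0.0          # unperturbed trajectory x_t(x)
        for _ in range(n_steps):
            dw = sq * random.gauss(0.0, 1.0)
            x, y = x - y * dt + eps * dw, y + x * dt
            u, v = u - v * dt, v + u * dt
        acc += (x - u) ** 2 + (y - v) ** 2
    return acc / n_paths

eps, T, L = 0.05, 1.0, 1.0
bound = eps ** 2 * (math.exp(2 * L * T) - 1) / (2 * L)
print(mean_sq_deviation(eps, T), bound)
```

For this linear field the rotation is an isometry, so the deviation grows essentially like $\varepsilon^2 t$, comfortably inside the exponential Gronwall envelope.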
Note that Lemma 4.3 implies not only convergence in probability $\tilde X^\varepsilon_t \to x_t(x)$ that is uniform in every finite time interval $[0,T]$, but also in the interval $[0, c|\ln\varepsilon|]$ that grows indefinitely as $\varepsilon$ decreases, where the constant $c < L^{-1}$.

The most likely behavior of the process $X^\varepsilon_t$ (or $\tilde X^\varepsilon_t$) in the region $D_i(H_1,H_2)$ is rotating many times along the closed trajectories filling this region. We have to devise some way to count these rotations, adopting some definition of what a rotation is. No problem arises in the case of rotations of a solution $x_t(x)$ of the unperturbed system $\dot x_t = \overline\nabla H(x_t)$: we can consider the first time at which $x_t(x)$ returns to the same point $x_0(x) = x$. But the diffusion process $\tilde X^\varepsilon_t$ never returns to $x$. Let us take a curve $\theta$ in $D_i(H_1,H_2)$ running across all trajectories $x_t(x)$ in this region so that every trajectory (level curve of the Hamiltonian) intersects $\theta$ at one point. Unfortunately, the time at which $\tilde X^\varepsilon_t$ returns to this curve cannot serve as the definition of a rotation time, because a trajectory $\tilde X^\varepsilon_t$ starting at some point of $\theta$ returns to this curve infinitely many times at arbitrarily small positive times. So to define a "rotation," we take two curves $\theta$ and $\theta'$ running across the trajectories at two different places, and define $\hat\tau_0 \le \hat\sigma_1 \le \hat\tau_1 \le \hat\sigma_2 \le \hat\tau_2 \le \cdots$ by $\hat\tau_0 = 0$,

$$ \hat\sigma_n = \min\{t \ge \hat\tau_{n-1} : \tilde X^\varepsilon_t \in \theta'\}, \qquad \hat\tau_n = \min\{t \ge \hat\sigma_n : \tilde X^\varepsilon_t \in \theta\}. \tag{4.13} $$

For the process $\tilde X^\varepsilon_t$ starting at the curve $\theta$, the times $\hat\tau_1, \hat\tau_2, \ldots$ (up to the time $\tilde\tau^\varepsilon_i(H_1,H_2)$ at which the process leaves $D_i(H_1,H_2)$) can be considered as the times at which the first rotation ends, the second one ends, and so on.
Let $T(x)$ be the period of the trajectory starting at the point $x$; $t(x) = \min\{t > 0 : x_t(x)\in\theta\}$, $t'(x) = \min\{t > 0 : x_t(x)\in\theta'\}$. For $x\in\theta$ we have $t'(x) < t(x) = T(x)$.

Lemma 4.4. Let $H_1 < H_1' < H_2' < H_2$ be some numbers in the interval $H(I_i)$. Let the closed region $\bar D_i(H_1',H_2')$ lie at a distance greater than some positive $d$ from the complement of the region $D_i(H_1,H_2)$; and let the distance between the curves $\theta$, $\theta'$ be greater than $2d$. Let there exist positive constants $B$ and $h$ such that for all $x\in\theta\cap D_i(H_1',H_2')$ the distance of $x_t(x)$ from the curve $\theta$ satisfies

$$ \rho(x_t(x),\theta) \ge \begin{cases} Bt & \text{for } 0\le t\le h, \\ d & \text{for } h\le t\le T(x)-h, \\ B\,(T(x)-t) & \text{for } T(x)-h\le t\le T(x). \end{cases} $$

Then for $x\in\theta\cap D_i(H_1',H_2')$ and $0 < \Delta t \le h$,

$$ P_x\{|\hat\tau_1 - T(x)| \ge \Delta t\} \le 3\Big(\frac{e^{2L(T(x)+h)}-1}{2L}\Big)^2\frac{\varepsilon^4}{B^4\,\Delta t^4}, \tag{4.14} $$
and for every $\lambda > 0$,

$$ M_x\Big|\int_{T(x)}^{\hat\tau_1} e^{-\lambda t}\,dt\Big| \le \frac{4\cdot3^{1/4}}{3}\Big(\frac{e^{2L(T(x)+h)}-1}{2L}\Big)^{1/2}\frac{\varepsilon}{B} + \frac{3}{\lambda}\Big(\frac{e^{2L(T(x)+h)}-1}{2L}\Big)^2\frac{\varepsilon^4}{B^4 h^4}. \tag{4.15} $$

For all $x\in D_i(H_1',H_2')$,

$$ |M_x e^{-\lambda\hat\tau_1} - e^{-\lambda T(x)}| \le \lambda\,\frac{4\cdot3^{1/4}}{3}\Big(\frac{e^{2L(T(x)+h)}-1}{2L}\Big)^{1/2}\frac{\varepsilon}{B} + 3\Big(\frac{e^{2L(T(x)+h)}-1}{2L}\Big)^2\frac{\varepsilon^4}{B^4 h^4}, \tag{4.16} $$
Figure 23.

\[ \mathsf P_x\{ \tilde\tau_1 > 2T(x) + h \} \le 3\left( \frac{e^{2L(2T(x)+h)} - 1}{2L} \right)^2 \frac{\varepsilon^4}{B^4 h^4}, \tag{4.17} \]

\[ \mathsf M_x \int_0^{\tilde\tau_1} e^{-\lambda\varepsilon^2 t}\,dt \le 2T(x) + h + 3\left( \frac{e^{2L(2T(x)+h)} - 1}{2L} \right)^2 \frac{\varepsilon^2}{\lambda B^4 h^4}. \tag{4.18} \]
Proof. Let us introduce two regions: D₁ = {x ∈ D_i(H₁, H₂) \ (∂ ∪ ∂′) : t′(x) < t(x)} and D₂ = {x ∈ D_i(H₁, H₂) \ (∂ ∪ ∂′) : t(x) < t′(x)} (see Fig. 23).

Let us consider a point x ∈ ∂ ∩ D_i(H₁′, H₂′). For 0 < t ≤ Δt the point x_t(x) lies in D₁; it lies at a distance not greater than BΔt ≤ d from ∂, and so at a distance greater than d from ∂′. For T(x) − Δt ≤ t < T(x) the point x_t(x) lies in D₂, and for T(x) < t ≤ T(x) + Δt again in D₁. For Δt ≤ t ≤ T(x) − Δt the point x_t(x) lies at a distance at least BΔt from the curve ∂.

Now suppose a trajectory X̃_t^ε starting at X̃₀^ε = x (remember, x ∈ ∂ ∩ D_i(H₁′, H₂′)) is such that max_{0≤t≤T(x)+h} |X̃_t^ε − x_t(x)| < BΔt. Since x_{Δt}(x) ∈ D₁ lies at a distance at least BΔt from ∂, we have also X̃_{Δt}^ε ∈ D₁. For t between Δt and T(x) − Δt, the trajectory X̃_t^ε does not intersect ∂. Since the points x_{T(x)−Δt}(x) ∈ D₂ and x_{T(x)+Δt}(x) ∈ D₁ lie at a distance at least BΔt from the curve ∂, we have also X̃_{T(x)−Δt}^ε ∈ D₂, X̃_{T(x)+Δt}^ε ∈ D₁. So between the times Δt and T(x) − Δt the trajectory X̃_t^ε passes from D₁ to D₂, intersecting ∂′ (and therefore σ̃₁ is between Δt and T(x) − Δt), and between T(x) − Δt and T(x) + Δt (but not between Δt and T(x) − Δt) the trajectory passes from D₂ to D₁, intersecting ∂. So, finally, T(x) − Δt < τ̃₁ < T(x) + Δt. So we have proved the inclusion of the events:

\[ \Bigl\{ \max_{0 \le t \le T(x)+h} |\tilde X_t^\varepsilon - x_t(x)| < B\Delta t \Bigr\} \subseteq \{ |\tilde\tau_1 - T(x)| < \Delta t \}; \]
and the inequality

\[ \mathsf P_x\{ |\tilde\tau_1 - T(x)| \ge \Delta t \} \le \mathsf P_x\Bigl\{ \max_{0 \le t \le T(x)+h} |\tilde X_t^\varepsilon - x_t(x)| \ge B\Delta t \Bigr\}, \]

which together with Lemma 4.3 yields (4.14). Now,

\[ \mathsf M_x\left| \int_0^{\tilde\tau_1} e^{-\lambda\varepsilon^2 t}\,dt - \int_0^{T(x)} e^{-\lambda\varepsilon^2 t}\,dt \right| \le \mathsf M_x\bigl\{ |\tilde\tau_1 - T(x)| < h;\ |\tilde\tau_1 - T(x)| \bigr\} + \mathsf M_x\Bigl\{ |\tilde\tau_1 - T(x)| \ge h;\ \int_0^\infty e^{-\lambda\varepsilon^2 t}\,dt \Bigr\}. \]

The second term is equal to (λε²)⁻¹ P_x{|τ̃₁ − T(x)| ≥ h} and is estimated using (4.14); the first one is also estimated by (4.14) and the following elementary lemma, applied to the random variable |τ̃₁ − T(x)| ∧ h.

Lemma 4.5. If a nonnegative random variable ξ is such that P{ξ ≥ x} ≤ C/x⁴ for every x > 0, then Mξ ≤ (4/3) C^{1/4}.
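Lemma 4.5 is the layer-cake formula Mξ = ∫₀^∞ P{ξ ≥ x} dx with the bound P{ξ ≥ x} ≤ min(1, C/x⁴): splitting the integral at a point a gives Mξ ≤ a + C/(3a³), which is minimized at a = C^{1/4} with value (4/3)C^{1/4}. A few lines of Python (an illustration, not part of the proof; the value of C is arbitrary) confirm the optimization:

```python
def upper_bound(a, C):
    """Split-point bound M xi <= a + C/(3 a^3), obtained from
    M xi = int_0^inf P{xi >= x} dx <= int_0^a 1 dx + int_a^inf C x^-4 dx."""
    return a + C / (3.0 * a ** 3)

C = 2.5
# crude grid search for the best split point a
best = min(upper_bound(0.01 * k, C) for k in range(1, 100000))
closed_form = (4.0 / 3.0) * C ** 0.25
```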
So we have (4.15). The estimate (4.16) follows from (4.14) and the fact that

\[ \mathsf M_x e^{-\lambda\varepsilon^2\tilde\tau_1} = \mathsf M_x\{ \tilde\tau_1 < T(x) - h;\ e^{-\lambda\varepsilon^2\tilde\tau_1} \} + \mathsf M_x\{ \tilde\tau_1 \ge T(x) - h;\ e^{-\lambda\varepsilon^2\tilde\tau_1} \}. \]

As for an initial point X̃₀^ε = x ∉ ∂, the trajectory X̃_t^ε that is close to x_t(x) need not cross the curves ∂′ and ∂ after this within the time T(x) + h, but it is bound to do so within two rotations plus h; the estimate (4.17) is proved the same way as the more precise (4.14), and (4.18) follows immediately.

Proof of Lemma 4.1. The expectation in (4.8) can be written as

\[ \varepsilon^2\, \mathsf M_x \int_0^{\tilde\tau_i(H_1,H_2)} e^{-\lambda\varepsilon^2 t}\, g(\tilde X_t^\varepsilon)\,dt. \]
Take two curves ∂, ∂′ in a slightly larger region D_i(H₁ − δ/2, H₂ + δ/2), and consider the sequence of Markov times τ̃_k defined by (4.13). Let ν = max{k : τ̃_k < τ̃_i(H₁, H₂)}. We have

\[ \int_0^{\tilde\tau_i(H_1,H_2)} e^{-\lambda\varepsilon^2 t} g(\tilde X_t^\varepsilon)\,dt = \sum_{k=0}^{\nu-1} \int_{\tilde\tau_k}^{\tilde\tau_{k+1}} e^{-\lambda\varepsilon^2 t} g(\tilde X_t^\varepsilon)\,dt + \int_{\tilde\tau_\nu}^{\tilde\tau_i(H_1,H_2)} e^{-\lambda\varepsilon^2 t} g(\tilde X_t^\varepsilon)\,dt. \tag{4.19} \]
We are going to apply Lemma 4.4 with D_i(H₁ − δ/2, H₂ + δ/2) instead of D_i(H₁, H₂) and D_i(H₁, H₂) instead of D_i(H₁′, H₂′). We have to estimate T(x) and B̄, choose the curves ∂, ∂′, and estimate the constants B, h, d.

We have B̄ ≤ max{|∇H(x)| : H(x) ≤ A₉}; the distance from D_i(H₁, H₂) to the complement of D_i(H₁ − δ/2, H₂ + δ/2) is not less than δ/2B̄. Let us take d = δ/2B̄ (so the first condition imposed on d is satisfied; the rest is taken care of later). The period T(x) of an orbit x_t(x) going through a point x ∈ D_i is positive; it has a finite limit as x approaches an extremum point x_k (such that the corresponding vertex O_k is one of the ends of the segment I_i), and it grows at the rate of const·|ln |H(x) − H(O_k)|| as x approaches the curve C_k, if this curve contains a saddle point of the Hamiltonian. In any case, we have A₁₂ ≤ T(x) ≤ A₁₃|ln δ| for x ∈ D_i(H₁, H₂).

Take two points x₀ and x₀′ on a fixed level curve C_i(H₀) in the region D_i, and consider solutions y_t(x₀), y_t(x₀′), −∞ < t < ∞, of ẏ_t = ∇H(y_t) going through the points x₀, x₀′; the curves described by these solutions are orthogonal to the level curves of the Hamiltonian. Let us take as ∂, ∂′ the parts of the curves described by y_t(x₀), y_t(x₀′) in the region D_i(H₁ − δ/2, H₂ + δ/2). The curves described by y_t(x₀), y_t(x₀′) may enter the same critical point x_k as t → ∞ or t → −∞ (they certainly enter the same critical point if one of the ends of the segment I_i corresponds to an extremum of the Hamiltonian), but from different directions. The distance from D_i(H₁ − δ/2, H₂ + δ/2) to a critical point is clearly not less than A₁₄δ, and it is easy to see that the distance between the curves ∂ and ∂′ is at least A₁₅δ. This distance is greater than 2d = δ/B̄ for sufficiently small δ (i.e., for sufficiently small ε). Also we can take B = A₁₆√δ. We could have taken h = const as far as it concerns the requirements ρ(x_t(x), ∂) ≥ Bt for x ∈ ∂ ∩ D_i(H₁, H₂), 0 ≤ t ≤ h, and ρ(x_t(x), ∂) ≥ B(T(x) − t) for the same x and T(x) − h ≤ t ≤ T(x); but we also have to ensure that Bh ≤ d, so we take h = A₁₇√δ. Now, if we choose the constant A₇ so that A₇(2LA₁₃ + 1) < 1/2, then for sufficiently small ε we obtain from (4.15), (4.16), and (4.18):
\[ \mathsf M_x\left| \int_0^{\tilde\tau_1} e^{-\lambda\varepsilon^2 t}\,dt - \int_0^{T(x)} e^{-\lambda\varepsilon^2 t}\,dt \right| \le \varepsilon^{A_{18}} \tag{4.20} \]

for x ∈ ∂ ∩ D_i(H₁, H₂), where A₁₈ < min(1 − A₇(LA₁₃ + 1/2), 2 − 4A₇(LA₁₃ + 1));

\[ \mathsf M_x e^{-\lambda\varepsilon^2\tilde\tau_1} \le 1 - \tfrac12\,\lambda A_{12}\,\varepsilon^2 \tag{4.21} \]

for x ∈ ∂ ∩ D_i(H₁, H₂); and

\[ \mathsf M_x \int_0^{\tilde\tau_1} e^{-\lambda\varepsilon^2 t}\,dt \le 2A_{13}A_7\,|\ln\varepsilon| + 1 \tag{4.22} \]

for all x ∈ D_i(H₁, H₂).
Now let us extend the function g to the whole plane so that ‖g‖ and the Lipschitz constant Lip(g) do not increase. Let us add to the right-hand side of (4.19), and subtract from it, the integral ∫ from τ̃_i(H₁, H₂) to τ̃_{ν+1} of e^{−λε²t} g(X̃_t^ε) dt; then the expectation of this right-hand side can be represented as

\[ \sum_{k=0}^{\infty} \mathsf M_x\Bigl\{ \tilde\tau_k < \tilde\tau_i(H_1,H_2);\ \int_{\tilde\tau_k}^{\tilde\tau_{k+1}} e^{-\lambda\varepsilon^2 t} g(\tilde X_t^\varepsilon)\,dt \Bigr\} - \mathsf M_x \int_{\tilde\tau_i(H_1,H_2)}^{\tilde\tau_{\nu+1}} e^{-\lambda\varepsilon^2 t} g(\tilde X_t^\varepsilon)\,dt \tag{4.23} \]

(the series converges absolutely, each summand being bounded by ‖g‖ times the corresponding expectation of ∫_{τ̃_k}^{τ̃_{k+1}} e^{−λε²t} dt).
The last integral does not exceed ‖g‖ times the integral of e^{−λε²t} from the time τ̃_i(H₁, H₂) until the end of the cycle starting at this time, i.e., until the process reaches ∂′ and ∂ after that. Applying the strong Markov property with respect to the Markov time τ̃_i(H₁, H₂) and making use of (4.22), we obtain:

\[ \left| \mathsf M_x \int_{\tilde\tau_i(H_1,H_2)}^{\tilde\tau_{\nu+1}} e^{-\lambda\varepsilon^2 t} g(\tilde X_t^\varepsilon)\,dt \right| \le (2A_{13}A_7\,|\ln\varepsilon| + 1)\,\|g\|. \tag{4.24} \]
Applying the strong Markov property with respect to τ̃_k to the kth summand in the infinite sum in (4.23), we obtain:

\[ \mathsf M_x\Bigl\{ \tilde\tau_k < \tilde\tau_i(H_1,H_2);\ \int_{\tilde\tau_k}^{\tilde\tau_{k+1}} e^{-\lambda\varepsilon^2 t} g(\tilde X_t^\varepsilon)\,dt \Bigr\} = \mathsf M_x\Bigl\{ \tilde\tau_k < \tilde\tau_i(H_1,H_2);\ e^{-\lambda\varepsilon^2\tilde\tau_k}\, \mathsf M_{\tilde X_{\tilde\tau_k}^\varepsilon} \int_0^{\tilde\tau_1} e^{-\lambda\varepsilon^2 t} g(\tilde X_t^\varepsilon)\,dt \Bigr\}. \]
So we can write an estimate for the infinite sum in (4.23) that is quite similar to the estimate (3.18):

\[ \left| \sum_{k=0}^{\infty} \mathsf M_x\Bigl\{ \tilde\tau_k < \tilde\tau_i(H_1,H_2);\ \int_{\tilde\tau_k}^{\tilde\tau_{k+1}} e^{-\lambda\varepsilon^2 t} g(\tilde X_t^\varepsilon)\,dt \Bigr\} \right| \le \left| \mathsf M_x \int_0^{\tilde\tau_1} e^{-\lambda\varepsilon^2 t} g(\tilde X_t^\varepsilon)\,dt \right| + \max_{z \in \partial \cap \bar D_i(H_1,H_2)} \left| \mathsf M_z \int_0^{\tilde\tau_1} e^{-\lambda\varepsilon^2 t} g(\tilde X_t^\varepsilon)\,dt \right| \cdot \sum_{k=1}^{\infty} \mathsf M_x\{ \tilde\tau_k < \tilde\tau_i(H_1,H_2);\ e^{-\lambda\varepsilon^2\tilde\tau_k} \}. \tag{4.25} \]
Similarly to (3.11), the sum on the right-hand side is not greater than

\[ \sum_{k=0}^{\infty} \Bigl[ \max_{z \in \partial \cap \bar D_i(H_1,H_2)} \mathsf M_z e^{-\lambda\varepsilon^2\tilde\tau_1} \Bigr]^k = \Bigl( 1 - \max_{z \in \partial \cap \bar D_i(H_1,H_2)} \mathsf M_z e^{-\lambda\varepsilon^2\tilde\tau_1} \Bigr)^{-1} \le \frac{2}{\lambda A_{12}\,\varepsilon^2} \tag{4.26} \]

for sufficiently small ε (see (4.21)).
As for the integral M_z ∫₀^{τ̃₁} e^{−λε²t} g(X̃_t^ε) dt, it is approximately equal to

\[ \int_0^{T(z)} g(x_t(z))\,dt = \oint_{C_i(H(z))} \frac{g(x)}{|\nabla H(x)|}\,dl. \]
Let us estimate the expectation of the difference: for z ∈ ∂ ∩ D_i(H₁, H₂),

\[ \left| \mathsf M_z \int_0^{\tilde\tau_1} e^{-\lambda\varepsilon^2 t} g(\tilde X_t^\varepsilon)\,dt - \int_0^{T(z)} g(x_t(z))\,dt \right| \le \|g\|\, \mathsf M_z\left| \int_0^{\tilde\tau_1} e^{-\lambda\varepsilon^2 t}\,dt - \int_0^{T(z)} e^{-\lambda\varepsilon^2 t}\,dt \right| + \|g\| \int_0^{T(z)} (1 - e^{-\lambda\varepsilon^2 t})\,dt + \operatorname{Lip}(g) \int_0^{T(z)} \mathsf M_z |\tilde X_t^\varepsilon - x_t(z)|\,dt. \]

The first expectation on the right-hand side is estimated by (4.20); the second integral is not greater than λε²T(z)²/2 ≤ (λ/2)A₁₃²A₇² ε² ln²ε ≤ ε^{A₁₈}; and the last integral, by (4.9), does not exceed

\[ \varepsilon \int_0^{T(z)} \left( \frac{e^{2Lt} - 1}{2L} \right)^{1/2} dt \le \varepsilon\, \frac{e^{LT(z)}}{L\sqrt{2L}} \le \mathrm{const}\cdot\varepsilon^{1 - LA_{13}A_7} \le \varepsilon^{A_{18}}. \]

So we have, for sufficiently small ε, for all z ∈ ∂ ∩ D_i(H₁, H₂):

\[ \left| \mathsf M_z \int_0^{\tilde\tau_1} e^{-\lambda\varepsilon^2 t} g(\tilde X_t^\varepsilon)\,dt - \int_0^{T(z)} g(x_t(z))\,dt \right| \le 2(\|g\| + \operatorname{Lip}(g))\,\varepsilon^{A_{18}}. \tag{4.27} \]
The first term in (4.25) is estimated by means of (4.22). Putting this estimate together with (4.24)–(4.27), and remembering to multiply the expectation by ε², we obtain for small ε:

\[ \left| \mathsf M_x \int_0^{\tau_i^\varepsilon(H_1,H_2)} e^{-\lambda t} g(X_t^\varepsilon)\,dt \right| \le (4A_{13}A_7 + 2)\,\varepsilon^2 |\ln\varepsilon|\,\|g\| + \frac{4}{\lambda A_{12}}\,(\|g\| + \operatorname{Lip}(g))\,\varepsilon^{A_{18}}. \]

So the estimate of the lemma is true if A₇ < 1/(4LA₁₃ + 2), and as A₈ one can take any positive number that is smaller than min(1 − A₇(LA₁₃ + 1/2), 2 − 4A₇(LA₁₃ + 1)).

Proof of Lemma 4.2. Apply formula (4.3) to τ_i^ε(H₁, H₂):

\[ \mathsf M_x e^{-\lambda\tau_i^\varepsilon(H_1,H_2)} f\bigl( H(X_{\tau_i^\varepsilon(H_1,H_2)}^\varepsilon) \bigr) - f(H(x)) = \mathsf M_x \int_0^{\tau_i^\varepsilon(H_1,H_2)} e^{-\lambda t} \Bigl[ -\lambda f(H(X_t^\varepsilon)) + \tfrac12 f''(H(X_t^\varepsilon))\,|\nabla H(X_t^\varepsilon)|^2 + \tfrac12 f'(H(X_t^\varepsilon))\,\Delta H(X_t^\varepsilon) \Bigr] dt. \]

The right-hand side differs from the expectation of the integral in (4.9) only by

\[ \mathsf M_x \int_0^{\tau_i^\varepsilon(H_1,H_2)} e^{-\lambda t}\, g(X_t^\varepsilon)\,dt, \]

where

\[ g(x) = \tfrac12 f''(H(x)) \bigl[ |\nabla H(x)|^2 - A_i(H(x)) \bigr] + \tfrac12 f'(H(x)) \bigl[ \Delta H(x) - B_i(H(x)) \bigr]. \]

The definition (1.13) of the coefficients A_i(H), B_i(H) implies that the integral of g with respect to dl/|∇H| over every level curve C_i(H), H₁ ≤ H ≤ H₂, is zero, so Lemma 4.1 is applicable to g. In order to apply it, we have to estimate the supremum norm of g and its Lipschitz constant. Since |∇H(x)|² and ΔH(x) are bounded in every bounded region, and the coefficients A_i(H), B_i(H) in every finite subinterval of H(I_i), we have ‖g‖ ≤ A₁₉(‖f′‖ + ‖f″‖). As for the Lipschitz condition, we have Lip(f^{(k)}(H)) ≤ A₂₀‖f^{(k+1)}‖, where A₂₀ = max{|∇H(x)| : H(x) ≤ A₉};

\[ \operatorname{Lip}\bigl( \tfrac12 f''(H)\,|\nabla H|^2 + \tfrac12 f'(H)\,\Delta H \bigr) \le \tfrac12 \bigl[ A_{20}^3 \|f'''\| + A_{21}\|f''\| + A_{20}A_{22}\|f''\| + A_{23}\|f'\| \bigr], \]
where

A₂₁ = max{|∇(|∇H(x)|²)| : H(x) ≤ A₉},  A₂₂ = max{|ΔH(x)| : H(x) ≤ A₉},  A₂₃ = max{|∇(ΔH(x))| : H(x) ≤ A₉};
\[ \operatorname{Lip}\bigl( \tfrac12 f''(H)A_i(H) + \tfrac12 f'(H)B_i(H) \bigr) \le \tfrac12 \Bigl[ A_{20}\|f'''\| \max_{H_1 \le H \le H_2} |A_i(H)| + \|f''\| \max_{H_1 \le H \le H_2} |A_i'(H)| \Bigr] + \tfrac12 \Bigl[ A_{20}\|f''\| \max_{H_1 \le H \le H_2} |B_i(H)| + \|f'\| \max_{H_1 \le H \le H_2} |B_i'(H)| \Bigr]. \]

Using the estimate (1.14), we obtain

\[ \operatorname{Lip}(g) \le A_{24}\,(\|f'\| + \|f''\| + \|f'''\|)\,\delta^{-A_0} = A_{24}\,(\|f'\| + \|f''\| + \|f'''\|)\,\varepsilon^{-A_0 A_{10}}. \]

Taking positive A₁₀ < min(A₈/A₀, A₇) and A₁₁ < A₈ − A₀A₁₀, and applying Lemma 4.1, we prove Lemma 4.2, and with it also Lemma 3.3.

Now let us formulate a lemma that follows from Lemma 4.2 and that is used in the proof of Lemma 3.6.
Lemma 4.6. Let [H₁, H₂] be a subinterval of the interval H(I_i). Let g be a continuous function on [H₁, H₂], and φ a function defined only at the points H₁, H₂. Then, for the time τ_i^ε(H₁, H₂) of leaving the region D_i(H₁, H₂),

\[ \lim_{\varepsilon \to 0} \mathsf M_x \Bigl[ \varphi\bigl( H(X_{\tau_i^\varepsilon(H_1,H_2)}^\varepsilon) \bigr) + \int_0^{\tau_i^\varepsilon(H_1,H_2)} g(H(X_t^\varepsilon))\,dt \Bigr] = f(H(x)) \tag{4.28} \]

uniformly in x ∈ D_i(H₁, H₂), where

\[ f(H) = \frac{u_i(H_2) - u_i(H)}{u_i(H_2) - u_i(H_1)} \Bigl[ \varphi(H_1) + \int_{H_1}^{H} \bigl( u_i(\bar H) - u_i(H_1) \bigr)\, g(\bar H)\, dv_i(\bar H) \Bigr] + \frac{u_i(H) - u_i(H_1)}{u_i(H_2) - u_i(H_1)} \Bigl[ \varphi(H_2) + \int_{H}^{H_2} \bigl( u_i(H_2) - u_i(\bar H) \bigr)\, g(\bar H)\, dv_i(\bar H) \Bigr]. \tag{4.29} \]
Proof. Lemma 4.2 has to do with functions of the value of the process multiplied by an exponentially decreasing factor. To be able to say anything about expectations without such a factor, we need an estimate for the expectation M_x τ_i^ε(H₁, H₂).
Lemma 4.7. For all x ∈ D_i(H₁, H₂),

\[ \mathsf M_x \tau_i^\varepsilon(H_1, H_2) \le \frac{2}{Bb} \exp\Bigl\{ \frac{2B}{b}\,(H_2 - H_1) \Bigr\}, \tag{4.30} \]

where

b = min{|∇H(x)| : x ∈ D_i(H₁, H₂)} > 0,  B = max{|ΔH(x)| : x ∈ D_i(H₁, H₂)}.

The proof relies on the test function f(H) = cosh((2B/b)(H − (H₁ + H₂)/2)).
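The mechanism behind Lemma 4.7 is a standard Lyapunov-function computation: for f(H) = cosh(κ(H − (H₁+H₂)/2)) with κ large compared with B/b², the term ½|∇H|² f″ in the generator applied to f(H) dominates the term ½ΔH f′, so the result is bounded below by a positive constant, and the exit time is bounded by max f divided by that constant. The following sketch (an illustration with hypothetical values b = 1, B = 2 and the choice κ = 2B/b²; the exact constants in (4.30) are those of the book) checks the pointwise inequality ½b²f″ − ½B|f′| ≥ (B²/b²) f, which is what yields an exit-time bound:

```python
import math

b, B = 1.0, 2.0          # hypothetical values: b = min|grad H|, B = max|Delta H|
kappa = 2.0 * B / b**2   # curvature parameter of the test function

def defect(x):
    """(1/2) b^2 f'' - (1/2) B |f'| - (B^2/b^2) f  for f = cosh(kappa*x);
    nonnegativity of this quantity is the Lyapunov inequality behind (4.30)."""
    f = math.cosh(kappa * x)
    fp = kappa * math.sinh(kappa * x)
    fpp = kappa**2 * math.cosh(kappa * x)
    return 0.5 * b**2 * fpp - 0.5 * B * abs(fp) - (B**2 / b**2) * f

worst = min(defect(-2.0 + 0.001 * k) for k in range(4001))
```

Since cosh dominates |sinh|, the defect stays positive on the whole grid.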
Now let us prove Lemma 4.6. The function f is the solution of the boundary-value problem

\[ L_i f(H) = -g(H), \quad H_1 < H < H_2; \qquad f(H_j) = \varphi(H_j), \quad j = 1, 2. \]

It is enough to prove (4.28) for smooth g and f (because a continuous g can be uniformly approximated by smooth functions, and M_x τ_i^ε(H₁, H₂) is uniformly bounded by Lemma 4.7). By Lemma 4.2,

\[ \Bigl| \mathsf M_x \Bigl[ e^{-\lambda\tau_i^\varepsilon(H_1,H_2)} \varphi\bigl( H(X_{\tau_i^\varepsilon(H_1,H_2)}^\varepsilon) \bigr) - f(H(x)) + \int_0^{\tau_i^\varepsilon(H_1,H_2)} e^{-\lambda t} \bigl( \lambda f(H(X_t^\varepsilon)) + g(H(X_t^\varepsilon)) \bigr)\,dt \Bigr] \Bigr| \le A_{26}\,\varepsilon^{A_{11}}, \tag{4.31} \]
where the constant A₂₆ may depend on λ > 0. The difference between the expectation in (4.28) and that in (4.31) does not exceed

\[ \|\varphi\|\, \mathsf M_x\bigl( 1 - e^{-\lambda\tau_i^\varepsilon(H_1,H_2)} \bigr) + \lambda\|f\|\, \mathsf M_x \int_0^{\tau_i^\varepsilon(H_1,H_2)} e^{-\lambda t}\,dt + \|g\|\, \mathsf M_x \int_0^{\tau_i^\varepsilon(H_1,H_2)} (1 - e^{-\lambda t})\,dt \le 2\|f\|\,\lambda\, \mathsf M_x \tau_i^\varepsilon(H_1,H_2) + \|g\|\,\lambda\, \mathsf M_x \bigl( \tau_i^\varepsilon(H_1,H_2) \bigr)^2 \le 2\lambda\bigl( \|f\|\,A_{25} + \|g\|\,A_{25}^2 \bigr), \]

where A₂₅ is the bound of Lemma 4.7, because M_x(τ_i^ε(H₁, H₂))²/2 ≤ M_x τ_i^ε(H₁, H₂) · sup_y M_y τ_i^ε(H₁, H₂).
Taking λ small enough, and then ε small enough, we make the difference between the expectation in (4.28) and f(H(x)) small. Lemma 4.6 is proved.
§5. Proof of Lemma 3.5

Let us formulate the ε-dependent version of Lemma 3.5 (not implying this lemma immediately).

Lemma 5.1. Let O_k be an interior vertex corresponding to a level curve C_k that contains a saddle point x_k. There exists a positive constant A₂₇ < 1 such that for δ = δ(ε) with δ(ε)|ln ε| → 0 as ε → 0 and δ(ε) ≥ ε^{A₂₇}, we have

M_x τ_k^ε(±δ) = O(δ²|ln ε|)
as ε → 0, uniformly in x ∈ D_k(±δ).

Proof. Let us return to formulas (1.6), (1.8):

\[ H(X_t^\varepsilon) = H(X_0^\varepsilon) + W\Bigl( \int_0^t |\nabla H(X_s^\varepsilon)|^2\,ds \Bigr) + \int_0^t \tfrac12\,\Delta H(X_s^\varepsilon)\,ds. \]

The time τ_k^ε(±δ) comes at the latest when |H(X_t^ε) − H(X_0^ε)| has reached the level 2δ. This occurs, in particular, if the integral ∫₀^t ½ΔH(X_s^ε) ds is small in absolute value, and the stochastic integral (the value W(∫₀^t |∇H(X_s^ε)|² ds)) is large. A precise formulation is as follows:
\[ \Bigl\{ \int_0^{t_0} \bigl| \tfrac12\,\Delta H(X_s^\varepsilon) \bigr|\,ds < \delta,\ \max_{t \le t_0} \Bigl| W\Bigl( \int_0^t |\nabla H(X_s^\varepsilon)|^2\,ds \Bigr) \Bigr| > 3\delta \Bigr\} \subseteq \{ \tau_k^\varepsilon(\pm\delta) < t_0 \}. \]

We are going to use the fact that if Cδ² ≤ ∫₀^{t₀} |∇H(X_s^ε)|² ds, we have

\[ \max_{t \le t_0} \Bigl| W\Bigl( \int_0^t |\nabla H(X_s^\varepsilon)|^2\,ds \Bigr) \Bigr| \ge |W(C\delta^2)|, \]

and that the random variable W(Cδ²) has a normal distribution with zero mean and variance Cδ². We have

\[ \{ \tau_k^\varepsilon(\pm\delta) > t_0 \} \subseteq \Bigl\{ \tau_k^\varepsilon(\pm\delta) > t_0,\ \int_0^{t_0} |\nabla H(X_s^\varepsilon)|^2\,ds < C\delta^2 \Bigr\} \cup \Bigl\{ \tau_k^\varepsilon(\pm\delta) > t_0,\ \int_0^{t_0} |\nabla H(X_s^\varepsilon)|^2\,ds \ge C\delta^2,\ |W(C\delta^2)| \ge 3\delta \Bigr\} \cup \Bigl\{ \tau_k^\varepsilon(\pm\delta) > t_0,\ \int_0^{t_0} |\nabla H(X_s^\varepsilon)|^2\,ds \ge C\delta^2,\ |W(C\delta^2)| < 3\delta \Bigr\}. \]
Suppose ∫₀^{t₀} |½ΔH(X_s^ε)| ds < δ for all trajectories X_t^ε such that τ_k^ε(±δ) ≥ t₀. Then the second event on the right-hand side cannot occur, and the third one is a part of the event {|W(Cδ²)| < 3δ}. Using the normal distribution, we have for all x ∈ D_k(±δ):

\[ \mathsf P_x\{ \tau_k^\varepsilon(\pm\delta) > t_0 \} \le \mathsf P_x\Bigl\{ \tau_k^\varepsilon(\pm\delta) > t_0,\ \int_0^{t_0} |\nabla H(X_s^\varepsilon)|^2\,ds < C\delta^2 \Bigr\} + \frac{1}{\sqrt{2\pi}} \int_{-3/\sqrt C}^{3/\sqrt C} e^{-y^2/2}\,dy. \tag{5.1} \]

It does not matter much for our proof what C we take in this estimate. Let us take C = 9; then the last integral is equal to 0.6826. Of course, |½ΔH| ≤ A₂₈ in some neighborhood of the curve C_k; so if, for small δ, we take t₀ ≤ δ/A₂₈, the inequality ∫₀^{t₀} |½ΔH(X_s^ε)| ds < δ is guaranteed. So we have to estimate, for some t₀ = t₀(δ, ε) ≤ δ/A₂₈, the first summand on the right-hand side of (5.1). To do this, we consider once again cycles between reaching two lines.
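The numerical value 0.6826 is just the probability that a standard normal variable falls within one standard deviation of 0: with C = 9, the threshold 3/√C equals 1. In Python (standard library only):

```python
import math

def normal_prob_within(z):
    """P{|N(0,1)| < z} via the error function: 2*Phi(z) - 1 = erf(z / sqrt(2))."""
    return math.erf(z / math.sqrt(2.0))

C = 9.0
p = normal_prob_within(3.0 / math.sqrt(C))   # = P{|N(0,1)| < 1}
```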
Just as in the proof of Lemma 4.1, introduce curves ∂, ∂′ in a slightly larger region D_k(±2δ):

∂ = {x ∈ D_k(±2δ) : |x − x_k| = A₂₉√δ},  ∂′ = {x ∈ D_k(±2δ) : |x − x_k| = A₃₀√δ}.

Just as in the proof of Lemma 4.1, let us consider the "slow" process X̃_t^ε, and define Markov times τ̃₀ ≤ σ̃₁ < τ̃₁ < σ̃₂ < ⋯ by τ̃₀ = 0,

σ̃_i = min{t ≥ τ̃_{i−1} : X̃_t^ε ∈ ∂′},  τ̃_i = min{t ≥ σ̃_i : X̃_t^ε ∈ ∂}.
For sufficiently large positive constants A₂₉ < A₃₀ the curves ∂, ∂′ each consist of four arcs: two across the "sleeves" of D_k(±2δ) in which the solutions of the Hamiltonian system go out of the neighborhood of the saddle point, and two across the "sleeves" through which the trajectories go into the central, cross-like part of D_k(±2δ). Let us denote the corresponding parts by ∂_out, ∂′_out, ∂_in, ∂′_in (they each consist of two arcs; see Fig. 24).

Before the time τ̃_k(±δ), the times σ̃_i are those at which the process X̃_t^ε goes out of the central part of the region D_k(±δ); and the τ̃_i, i > 0, are those at which it, after going through one of the two "handles" of this region, reaches its boundary. Lemmas 5.2 and 5.3 take care of the stretches between these times.

Let D_h be one of the two outer "handles" of the region D_k(±δ); e.g., its left part between the lines ∂′_out and ∂_in (or the right part); τ̃_h will denote the first time at which the process X̃_t^ε leaves this region.
Figure 24.
Lemma 5.2. There exist positive constants A₃₁, A₃₂, A₃₃ such that, for sufficiently small ε and small δ ≥ ε^{A₃₁},

\[ \mathsf M_x \tilde\tau_h \le A_{32}\,|\ln\delta| \tag{5.2} \]

for every x ∈ D_h;

\[ \mathsf P_x\{ \tilde X_{\tilde\tau_h}^\varepsilon \in \partial_{\mathrm{in}} \} \to 1 \quad (\varepsilon \to 0), \tag{5.3} \]

uniformly in x ∈ ∂′; and

\[ \mathsf P_x\Bigl\{ \int_0^{\tilde\tau_h} |\nabla H(\tilde X_s^\varepsilon)|^2\,ds > A_{33} \Bigr\} \to 1 \quad (\varepsilon \to 0), \tag{5.4} \]

uniformly in x ∈ ∂′_out.
Proof. The proof is similar to those of Lemmas 4.4 and 4.1, and relies on the same Lemma 4.3. As in the beginning of the proof of Lemma 4.1, we have B̄ = max{|b(x)| : x ∈ D_k(±2δ)} ≤ A₃₄; again we take d = δ/2B̄. The time T_h(x) in which the trajectory x_t(x) leaves the region D_h (through the line ∂_in, of course) is bounded by A₃₅|ln δ|, and the time the trajectory starting on ∂′ spends in the set {x : |x − x_k| ≥ A₃₆} is at least A₃₇ if we choose the constant A₃₆ small enough and if δ is small. We can take B = A₃₈√δ and h = A₃₉√δ so that for all x ∈ D_h,

\[ \rho(x_t(x), \partial) \ge \begin{cases} Bt & \text{for } 0 \le t \le h, \\ d & \text{for } h \le t \le T_h(x) - h, \\ B(T_h(x) - t) & \text{for } T_h(x) - h \le t \le T_h(x). \end{cases} \]

We obtain, by Lemma 4.3, the estimates (5.3), (5.4), and

\[ \sup\{ \mathsf P_z\{ \tilde\tau_h > A_{35}|\ln\delta| + 1 \} : z \in D_h \} \le \sup\{ \mathsf P_z\{ \tilde\tau_h > T_h(z) + h \} : z \in D_h \} \to 0 \quad (\varepsilon \to 0), \]
\[ \mathsf P_x\{ \tilde\tau_h > n\,(A_{35}|\ln\delta| + 1) \} \le \bigl[ \sup\{ \mathsf P_z\{ \tilde\tau_h > A_{35}|\ln\delta| + 1 \} : z \in D_h \} \bigr]^n, \]
\[ \mathsf M_x \tilde\tau_h \le \frac{A_{35}|\ln\delta| + 1}{1 - \sup\{ \mathsf P_z\{ \tilde\tau_h > A_{35}|\ln\delta| + 1 \} : z \in D_h \}} \le A_{32}\,|\ln\delta| \]

for sufficiently small ε.
Now let D_× be the cross-like part of D_k(±δ) between the lines ∂′: D_× = {x ∈ D_k(±δ) : |x − x_k| ≤ A₃₀√δ} (see Fig. 25). Let τ̃_× = min{t : X̃_t^ε ∉ D_×}.

Lemma 5.3. For some choice of positive constants A₂₉ < A₃₀, if δ = δ(ε) is such that δ(ε) → 0 and δ(ε)/ε² → ∞ as ε → 0, we have

\[ \mathsf M_x \tilde\tau_\times = O(\ln(\delta/\varepsilon^2)) \tag{5.5} \]

uniformly in x ∈ D_×;

\[ \mathsf P_x\{ \tilde\tau_\times < \tilde\tau_k(\pm\delta) \ \text{or}\ \tilde X_{\tilde\tau_\times}^\varepsilon \in \partial'_{\mathrm{out}} \} \to 1 \tag{5.6} \]

as ε → 0, uniformly in x ∈ ∂.
Proof. By the Morse lemma, one can introduce new coordinates ξ, η (in lieu of p, q) in a neighborhood of x_k so that the point x_k is the origin, and the two branches of the ∞-shaped curve are the axes ξ and η. Let the ξ-axis be the branch along which the saddle point x_k is unstable for the dynamical system. In the new coordinates the operator L̃^ε has the form

\[ \tilde L^\varepsilon f(x) = \frac{\varepsilon^2}{2} \Bigl[ a^{11}(x)\,\frac{\partial^2 f(x)}{\partial\xi^2} + 2a^{12}(x)\,\frac{\partial^2 f(x)}{\partial\xi\,\partial\eta} + a^{22}(x)\,\frac{\partial^2 f(x)}{\partial\eta^2} \Bigr] + B^1(x)\,\frac{\partial f(x)}{\partial\xi} + B^2(x)\,\frac{\partial f(x)}{\partial\eta} + \varepsilon^2 \Bigl[ b^1(x)\,\frac{\partial f(x)}{\partial\xi} + b^2(x)\,\frac{\partial f(x)}{\partial\eta} \Bigr], \tag{5.7} \]
Figure 25.
where the coefficients a^{ij}(x), B^i(x), b^i(x) are smooth, and the symmetric matrix (a^{ij}(x)) is positive definite. The second sum here is equal to ∇̄H(x)·∇f(x). Because x_k is a nondegenerate saddle point of the Hamiltonian, and because of our choice of the axes, we have

\[ B^1(0,\eta) = B^2(\xi,0) = 0; \qquad \frac{\partial B^1}{\partial\xi}(0,0) > 0, \qquad \frac{\partial B^2}{\partial\eta}(0,0) < 0. \]

The functions a^{ij}(ξ, η), B^i(ξ, η), b^i(ξ, η) are defined in a neighborhood of the point (0, 0); let us extend these functions to the whole plane so that a^{ij}, b^i are bounded,

\[ a^{11}(\xi,\eta) \ge a_0 > 0, \qquad a^{22}(\xi,\eta) \ge a_0 > 0, \qquad B^1(0,\eta) = B^2(\xi,0) = 0, \qquad \frac{\partial B^1}{\partial\xi} \ge B_0 > 0, \qquad \frac{\partial B^2}{\partial\eta} \le -B_0 \]

everywhere.
for the diffusion in the plane governed by the We use the notation operator (5.7) with coefficients extended as described above. Before the process k;' leaves a neighborhood of xk, l and $' are the c- and 1-coordinates of this process. The curves a, a' are no longer parts of circles in our new coordinates, but for small 6 they are approximately parts of ellipses the ratio of whose sizes is A30/A29. We can choose positive constants A29 < A30, A4f, and A41 < A42 < A40 f for x e a', 1?11 < A41 f for x e a, so that, for small positive 6, and ICI > A4205 for x e an Let us consider the times iout = min{ t : J fl = A40 f } and f _ min{ t : 1X1'1 = A420}- It is clear that iz < tout; and if the process X,' starts then it leaves the region D., either from a point Xo = x e a, and z`out < through one of its sides (that are trajectories of the system), or through aout It turns out that the time iout is of order of ln(6/e2), whereas z is of order at least 6/e2, i.e., infinitely large compared with iout (one can prove using large-deviation estimates similar to Theorem 4.4.2 that n is, in fact, of order exp{const6/e2}, but we do not need that). The proof is based on comparing the processes er, i' (that need not be Markov ones if taken separately) with one-dimensional diffusions. e_y Let us introduce the function F(x) = fox [J'4' e'2 dz] dy. Using l'Hopital's rule, we prove that F(x) - 1 In x as x -> oo. Let us consider the function. iyn,
2
2Bo 1 [F(
BoA4ov' /e) - F( Bole)].
This function is the solution of the boundary-value problem e2 d2ue
du`
2 d 2 +B0 d _ -1,
so
is
the expectation
ue(±A4of) = 0;
of the
time of leaving the interval
(-A4of, A4of) for the one-dimensional diffusion process with generator starting from a point e (-A4of, A4of) Let us apply the operator Le to the function u` on the plane, depending on the c-coordinate only: Leue(x) = a11(x) - (2(
- B1(x) 2 ( Bo
We have
Bole) - 1 2x
Bo/e)F'( Bo le) - o b1(x)F'( Bo le).
B1(0, q) + (aB1 /a )( ,,,), Leve(x) < - ao + (2e/C). - max I F' (x) I < - ao/2 for sufficiently small c, and we obtain by formula (5.1) of Chapter 1: sup lb 1(x) I
334
8. Random Perturbations of Hamiltonian Systems
Mxzout <
a
2 < A43 ln(8/s2).
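The logarithmic growth of F is what produces the ln(δ/ε²) bound: since F(x) ~ ½ ln x, the quantity u^ε(0) = (2/B₀)F(√B₀A₄₀√δ/ε) is of order B₀⁻¹ ln(√δ/ε) ≍ ln(δ/ε²). A quick numerical check of two properties of F used above (an illustration; the quadrature step and the test points are arbitrary choices):

```python
import math

def F_and_Fprime(x, n=20000):
    """F(x) = int_0^x exp(-y^2) * (int_0^y exp(z^2) dz) dy and its derivative
    F'(x) = exp(-x^2) * int_0^x exp(z^2) dz, via one cumulative trapezoid pass."""
    h = x / n
    inner = 0.0                   # running value of int_0^y exp(z^2) dz
    total = 0.0                   # running value of the outer integral
    prev_int = math.exp(0.0)      # inner integrand at z = 0
    prev_out = 0.0                # outer integrand e^{-y^2} * inner at y = 0
    for k in range(1, n + 1):
        y = k * h
        cur_int = math.exp(y * y)
        inner += 0.5 * (prev_int + cur_int) * h
        cur_out = math.exp(-y * y) * inner
        total += 0.5 * (prev_out + cur_out) * h
        prev_int, prev_out = cur_int, cur_out
    return total, math.exp(-x * x) * inner

F3, _ = F_and_Fprime(3.0)
F5, Fp5 = F_and_Fprime(5.0)
```

F(5) − F(3) is already close to ½ ln(5/3), and 2xF′(x) is close to 1 at x = 5, confirming the asymptotics F′(x) ~ 1/(2x).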
By Chebyshev's inequality,

\[ \mathsf P_x\{ \tilde\tau_{\mathrm{out}} > (\delta/\varepsilon^2)^{1/2} \} \le \frac{A_{43}\,\ln(\delta/\varepsilon^2)}{(\delta/\varepsilon^2)^{1/2}} \to 0 \]

as ε → 0, uniformly in x.
ass -+ 0, uniformly in x. Now let us consider the function v6(rj) = eBo(,/E)2. This function is a positive solution of the equation 2d2
E
E
2 d,72 -
Bon
BovE = 0,
dil
so the (-Bo) -exponential moment of the exit time for the one-dimensional diffusion with generator ((8 2/2)(d2/d)12)) - Boij(d/drj) can be expressed through it. Applying the operator LE to the function vE(q), we obtain: LEve(x)
=
[a22(x)
(Bo + 2BZ i2 f + B2(x) 2 !!01 + b2(x)
2Boll] eBo(n/E)z.
For sufficiently small InI we have Levi - BovE < 0, and by formula (5.2) of Chapter 1, vE(n)
We-Bot;, <
.
VE(A4zf) For x e 0 we have Mxe-Bore < PE{iEm <x
(8/e2)t/2} =
x
2 eBo(A421-A42)6/e2,
and by Chebyshev's inequality
PE{e-B0 ' > e-BO(,j/82)1/2} E
-BpTe
< Mxe
<
e-Bo(A4z-A41)alE2+Bo(a/e2)1/2
0
e-Bo((/E2) `n
ass
0, uniformly in x e 0. This proves Lemma 5.3.
Let us finish the proof of Lemma 5.1. Return to Fig. 24. The first probability on the right-hand side of (5.1), in which we have taken C = 9, is equal to

\[ \mathsf P_x\Bigl\{ \tilde\tau_k(\pm\delta) > t_0/\varepsilon^2,\ \int_0^{t_0/\varepsilon^2} |\nabla H(\tilde X_s^\varepsilon)|^2\,ds < 9\delta^2/\varepsilon^2 \Bigr\}. \tag{5.8} \]

For an arbitrary natural n this probability does not exceed

\[ \mathsf P_x\Bigl\{ \tilde\tau_n < t_0/\varepsilon^2 \le \tilde\tau_k(\pm\delta),\ \int_0^{\tilde\tau_n} |\nabla H(\tilde X_s^\varepsilon)|^2\,ds < 9\delta^2/\varepsilon^2 \Bigr\} + \mathsf P_x\{ \tilde\tau_n \wedge \tilde\tau_k(\pm\delta) > t_0/\varepsilon^2 \}. \tag{5.9} \]

The second probability is estimated by Chebyshev's inequality:

\[ \mathsf P_x\{ \tilde\tau_n \wedge \tilde\tau_k(\pm\delta) > t_0/\varepsilon^2 \} \le \frac{\mathsf M_x\bigl( \tilde\tau_n \wedge \tilde\tau_k(\pm\delta) \bigr)}{t_0/\varepsilon^2}. \tag{5.10} \]

Using the strong Markov property with respect to the Markov times τ̃_i, σ̃_i, we obtain, by Lemmas 5.2 and 5.3, for sufficiently small ε,

\[ \mathsf M_x\bigl( \tilde\tau_n \wedge \tilde\tau_k(\pm\delta) \bigr) \le n \bigl[ \sup\{ \mathsf M_z \tilde\tau_\times : z \in D_\times \} + \sup\{ \mathsf M_z \tilde\tau_h : z \in D_h \} \bigr] \le n\,A_{44}\,\ln(\delta/\varepsilon^2). \tag{5.11} \]
To estimate the first probability in (5.9), we use the exponential Chebyshev inequality: for a > 0,

\[ \mathsf P_x\Bigl\{ \tilde\tau_n < t_0/\varepsilon^2 \le \tilde\tau_k(\pm\delta),\ \int_0^{\tilde\tau_n} |\nabla H(\tilde X_s^\varepsilon)|^2\,ds < 9\delta^2/\varepsilon^2 \Bigr\} \le e^{9a\delta^2/\varepsilon^2}\, \mathsf M_x\Bigl\{ \tilde\tau_n < \tilde\tau_k(\pm\delta);\ \exp\Bigl\{ -a \int_0^{\tilde\tau_n} |\nabla H(\tilde X_s^\varepsilon)|^2\,ds \Bigr\} \Bigr\}. \tag{5.12} \]
Using the strong Markov property with respect to the times τ̃_i, we can write

\[ \mathsf M_x\Bigl\{ \tilde\tau_n < \tilde\tau_k(\pm\delta);\ \exp\Bigl\{ -a \int_0^{\tilde\tau_n} |\nabla H(\tilde X_s^\varepsilon)|^2\,ds \Bigr\} \Bigr\} \le \Bigl[ \sup_{z \in \partial} \mathsf M_z\Bigl\{ \tilde\tau_1 < \tilde\tau_k(\pm\delta);\ \exp\Bigl\{ -a \int_0^{\tilde\tau_1} |\nabla H(\tilde X_s^\varepsilon)|^2\,ds \Bigr\} \Bigr\} \Bigr]^{n-1}. \]
Let us use the strong Markov property with respect to the Markov time σ̃₁ (which, for the process starting at a point X̃₀^ε = z ∈ ∂, coincides with the time τ̃_× if the latter is smaller than τ̃_k(±δ)):

\[ \mathsf M_z\Bigl\{ \tilde\tau_1 < \tilde\tau_k(\pm\delta);\ \exp\Bigl\{ -a \int_0^{\tilde\tau_1} |\nabla H(\tilde X_s^\varepsilon)|^2\,ds \Bigr\} \Bigr\} = \mathsf M_z\Bigl\{ \tilde\tau_\times < \tilde\tau_k(\pm\delta);\ \exp\Bigl\{ -a \int_0^{\tilde\tau_\times} |\nabla H(\tilde X_s^\varepsilon)|^2\,ds \Bigr\}\, \phi_1^\varepsilon\bigl( \tilde X_{\tilde\tau_\times}^\varepsilon \bigr) \Bigr\}, \tag{5.13} \]
where

\[ \phi_1^\varepsilon(z') = \mathsf M_{z'}\Bigl\{ \tilde\tau_1 < \tilde\tau_k(\pm\delta);\ \exp\Bigl\{ -a \int_0^{\tilde\tau_1} |\nabla H(\tilde X_s^\varepsilon)|^2\,ds \Bigr\} \Bigr\}. \]

The argument of φ₁^ε always belongs to ∂′. For z′ ∈ ∂′_in we have φ₁^ε(z′) ≤ 1, and for z′ ∈ ∂′_out we use the inequality

\[ \phi_1^\varepsilon(z') \le \mathsf P_{z'}\Bigl\{ \int_0^{\tilde\tau_h} |\nabla H(\tilde X_s^\varepsilon)|^2\,ds < A_{33} \Bigr\} + \mathsf P_{z'}\Bigl\{ \int_0^{\tilde\tau_h} |\nabla H(\tilde X_s^\varepsilon)|^2\,ds \ge A_{33} \Bigr\} \cdot e^{-aA_{33}}. \]
Again, it does not matter much what a we take; let us take a = A3'3 . By Lemma 5.2, this expression, for sufficiently small e and for every z' a a', is smaller than a-1 plus an arbitrarily small quantity. Now, the expression (5.13) does not exceed PE z{ fk k < i kE (+- b)
+
f +8 Xzx E a'out} ' sup{07(E z' ) ' z' E a out } s E a'm} +PE z{ T` x
zx
and by Lemma 5.3, for sufficiently small e and for every z e a, is smaller than e-1 plus an arbitrarily small quantity.
Now let us take n = n(ε) → ∞ equivalent to a sufficiently large multiple of δ²/ε² as ε → 0. Then, by the estimate (5.12) and the one following it, the first summand in (5.9) goes to 0 as ε → 0, uniformly in x ∈ D_k(±δ). If we take t₀ = 100A₄₄A₃₃⁻¹δ²|ln ε|, the second summand in (5.9) is estimated by a quantity that converges to 0.1 as ε → 0 (see (5.10) and (5.11)), and the probability (5.8) is less than 0.11 for sufficiently small ε and all x ∈ D_k(±δ). Remember also that for (5.1) we postulated that t₀ should be smaller than δ/A₂₈: this is satisfied for sufficiently small ε, because we require that δ(ε)|ln ε| → 0. So by (5.1) we have P_x{τ_k^ε(±δ) ≥ t₀} ≤ 0.8 for sufficiently small ε and x ∈ D_k(±δ). Application of the Markov property yields P_x{τ_k^ε(±δ) ≥ nt₀} ≤ 0.8ⁿ, and

\[ \mathsf M_x \tau_k^\varepsilon(\pm\delta) \le \frac{t_0}{1 - 0.8} = 500\,A_{44}A_{33}^{-1}\,\delta^2 |\ln\varepsilon|. \]

Lemma 5.1 is proved.
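The decomposition (1.6), (1.8) that drives the proof above — H(X_t^ε) is a time-changed Wiener process plus a drift — can be checked by simulation. Below is a sketch under the assumption dX = ∇̄H dt + ε dW with H = (p² + q²)/2 (the slow time scale, for which the same identity reads: the quadratic variation of H equals ε²∫₀^t |∇H(X_s)|² ds); the parameters and seed are arbitrary illustration choices:

```python
import math, random

def qv_vs_time_change(eps=0.3, dt=1e-3, t_max=5.0, seed=7):
    """For dX = (q, -p) dt + eps dW and H = (p^2+q^2)/2, compare the realized
    quadratic variation of t -> H(X_t) with eps^2 * int |grad H|^2 ds; the two
    agree by the time-change representation used in (1.6), (1.8)."""
    random.seed(seed)
    p, q = 1.0, 0.0
    sq = math.sqrt(dt)
    qv, clock = 0.0, 0.0
    h_old = 0.5 * (p * p + q * q)
    for _ in range(int(t_max / dt)):
        clock += eps**2 * (p * p + q * q) * dt   # |grad H|^2 = p^2 + q^2
        p, q = (p + q * dt + eps * sq * random.gauss(0.0, 1.0),
                q - p * dt + eps * sq * random.gauss(0.0, 1.0))
        h_new = 0.5 * (p * p + q * q)
        qv += (h_new - h_old) ** 2               # squared increment of H
        h_old = h_new
    return qv, clock

qv, clock = qv_vs_time_change()
```

The two quantities agree up to sampling fluctuations of a few percent at this step size.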
Proof of Lemma 3.5. We are going to prove that there exist positive constants A₄₅, A₄₆, A₄₇ such that

\[ \mathsf M_x \int_0^{\tau_k^\varepsilon(\pm\delta)} e^{-\lambda t}\,dt \le A_{45}\,\delta^2 |\ln\delta| \tag{5.14} \]

for all sufficiently small ε, ε^{A₄₆} ≤ δ ≤ A₄₇, and x ∈ D_k(±δ).

Take δ₀ = δ₀(ε) = ε^{A₄₆}/3, and consider the cycles between the Markov times τ₀ ≤ σ₁ ≤ τ₁ ≤ σ₂ ≤ ⋯, where
τ₀ = 0, σ_n = min{t ≥ τ_{n−1} : X_t^ε ∉ D_k(±2δ₀)}, and τ_n = min{t ≥ σ_n : X_t^ε ∈ ⋃_i C_{ki}(δ₀) ∪ ⋃_i C_{ki}(δ)}. The expectation in (5.14) is equal to
\[ \sum_{n=0}^{\infty} \mathsf M_x\Bigl\{ \tau_n < \tau_k^\varepsilon(\pm\delta);\ \int_{\tau_n}^{\sigma_{n+1}} e^{-\lambda t}\,dt \Bigr\} + \sum_{n=1}^{\infty} \mathsf M_x\Bigl\{ \sigma_n < \tau_k^\varepsilon(\pm\delta);\ \int_{\sigma_n}^{\tau_n} e^{-\lambda t}\,dt \Bigr\} \]
\[ \le \sum_{n=0}^{\infty} \mathsf M_x\{ \tau_n < \tau_k^\varepsilon(\pm\delta);\ e^{-\lambda\tau_n} \} \cdot \sup\{ \mathsf M_z \tau_k^\varepsilon(\pm 2\delta_0) : z \in D_k(\pm 2\delta_0) \} + \sum_{n=1}^{\infty} \mathsf M_x\{ \sigma_n < \tau_k^\varepsilon(\pm\delta);\ e^{-\lambda\sigma_n} \} \cdot \sup\Bigl\{ \mathsf M_z \int_0^{\tau_1} e^{-\lambda t}\,dt : z \in \bigcup_i C_{ki}(2\delta_0) \Bigr\} \tag{5.15} \]

(the strong Markov property is used).
The first supremum is estimated by Lemma 5.1; it is O(δ₀²|ln ε|) ≤ A₄₈δ₀δ|ln δ|. For a path starting at a point z ∈ C_{ki}(2δ₀), the time τ₁ is nothing but τ_i^ε(H(O_k) ± δ₀, H(O_k) ± δ). To estimate its M_z-expectation, we apply Lemma 4.2 to the solution f_i of the boundary-value problem

\[ \lambda f_i(H) - L_i f_i(H) = 1, \qquad f_i(H(O_k) \pm \delta_0) = f_i(H(O_k) \pm \delta) = 0. \]

It is easy to see that ‖f_i‖ ≤ A₄₉δ²|ln δ| and f_i(H(O_k) ± 2δ₀) ≤ A₅₀δ₀δ|ln δ|. By Lemma 4.2 we have

\[ \mathsf M_z \int_0^{\tau_1} e^{-\lambda t}\,dt = \mathsf M_z \int_0^{\tau_i^\varepsilon(H(O_k)\pm\delta_0,\,H(O_k)\pm\delta)} e^{-\lambda t} \bigl[ \lambda f_i(H(X_t^\varepsilon)) - L_i f_i(H(X_t^\varepsilon)) \bigr]\,dt \le f_i(H(z)) + (\|f_i'\| + \|f_i''\| + \|f_i'''\|)\,\varepsilon^{A_{11}}. \]
The first term is not greater than A₅₀δ₀δ|ln δ|; the second is infinitely small compared with δ₀δ|ln δ| if we choose A₄₆ small enough. So both suprema in (5.15) are O(δ₀δ|ln δ|). Now, using the strong Markov property, we obtain the estimates:

\[ \sum_{n=1}^{\infty} \mathsf M_x\{ \sigma_n < \tau_k^\varepsilon(\pm\delta);\ e^{-\lambda\sigma_n} \} \le \sum_{n=0}^{\infty} \mathsf M_x\{ \tau_n < \tau_k^\varepsilon(\pm\delta);\ e^{-\lambda\tau_n} \} \le \sum_{n=0}^{\infty} \Bigl[ \sup\Bigl\{ \mathsf M_z\Bigl\{ X_{\tau_1}^\varepsilon \in \bigcup_i C_{ki}(\delta_0);\ e^{-\lambda\tau_1} \Bigr\} : z \in \bigcup_i C_{ki}(2\delta_0) \Bigr\} \Bigr]^n. \tag{5.16} \]
Let F_i be the solution of the boundary-value problem

\[ \lambda F_i(H) - L_i F_i(H) = 0, \qquad F_i(H(O_k) \pm \delta_0) = 1, \qquad F_i(H(O_k) \pm \delta) = 0. \]

For this solution F_i(H(O_k) ± 2δ₀) ≤ 1 − A_{ki}δ₀/δ; so, applying Lemma 4.2, we get for sufficiently small ε and for z ∈ C_{ki}(2δ₀),

\[ \mathsf M_z\Bigl\{ X_{\tau_1}^\varepsilon \in \bigcup_i C_{ki}(\delta_0);\ e^{-\lambda\tau_1} \Bigr\} \le 1 - \frac{A_{ki}\,\delta_0}{2\delta}, \]

and the sum (5.16) does not exceed 2δ/(δ₀ min_i A_{ki}). This yields (5.14).

Remark. It is easy to deduce from formula (5.14) that

\[ \mathsf M_x \tau_k^\varepsilon(\pm\delta) \le A_{51}\,\delta^2 |\ln\delta| \tag{5.17} \]

for all sufficiently small ε, ε^{A₅₂} ≤ δ ≤ A₅₃, and x ∈ D_k(±δ). To do this, we use the fact that τ_k^ε(±δ) ≤ (1 − e⁻¹)⁻¹ ∫₀^{τ_k^ε(±δ)} e^{−λt} dt for τ_k^ε(±δ) ≤ 1/λ, whence

\[ \mathsf P_x\{ \tau_k^\varepsilon(\pm\delta) \ge 1/\lambda \} \le \frac{\lambda A_{45}\,\delta^2 |\ln\delta|}{1 - e^{-1}}, \qquad \mathsf P_x\{ \tau_k^\varepsilon(\pm\delta) \ge n/\lambda \} \le \Bigl[ \frac{\lambda A_{45}\,\delta^2 |\ln\delta|}{1 - e^{-1}} \Bigr]^n. \]
§6. Proof of Lemma 3.6

We use a result in partial differential equations, which we give here without proof, reformulated to fit our particular problem.

Let ℒ^ε be a two-dimensional second-order differential operator in the domain D = (H₁, H₂) × (−∞, ∞) given by

\[ \mathscr L^\varepsilon = \varepsilon^{-2} B(H,\theta)\,\frac{\partial}{\partial\theta} + a^{11}(H,\theta)\,\frac{\partial^2}{\partial H^2} + 2a^{12}(H,\theta)\,\frac{\partial^2}{\partial H\,\partial\theta} + a^{22}(H,\theta)\,\frac{\partial^2}{\partial\theta^2} + b^1(H,\theta)\,\frac{\partial}{\partial H} + b^2(H,\theta)\,\frac{\partial}{\partial\theta}, \tag{6.1} \]

where

\[ \lambda(\xi^2 + \eta^2) \le a^{11}(H,\theta)\,\xi^2 + 2a^{12}(H,\theta)\,\xi\eta + a^{22}(H,\theta)\,\eta^2 \le \Lambda(\xi^2 + \eta^2) \]

for all (H, θ) ∈ D and real ξ, η, with 0 < λ ≤ Λ < ∞, and |b^i(H, θ)| ≤ B̄.

Lemma 6.1 (see Krylov and Safonov [1]). For every δ > 0 there exist constants α = α(λ, Λ) ∈ (0, 1) and N = N(λ, Λ, B̄, δ) < ∞ such that

\[ |u(H',\theta') - u(H,\theta)| \le N\,\bigl( |H' - H| + \varepsilon^2 \bigr)^\alpha \max_D |u| \tag{6.2} \]
for every bounded solution u of the equation ℒ^ε u = 0, for all H, H′ ∈ (H₁ + δ, H₂ − δ), and all θ, θ′. Neither α nor N depends on ε.
Using this result, we obtain a lemma that is important in the proof of Lemma 3.6 (in which we apply it to the numbers H(O_k) ± δ′/2, H(O_k) ± δ′, and H(O_k) ± 2δ′).

Lemma 6.2. For a segment I_i of our graph, let H₁ < H < H₂ be numbers in the interior of the interval H(I_i). Then

\[ \lim_{\varepsilon \to 0}\ \max_{x_1, x_2 \in C_i(H)}\ \max_{f : \|f\| \le 1} \bigl| \mathsf M_{x_1} f\bigl( X_{\tau_i^\varepsilon(H_1,H_2)}^\varepsilon \bigr) - \mathsf M_{x_2} f\bigl( X_{\tau_i^\varepsilon(H_1,H_2)}^\varepsilon \bigr) \bigr| = 0 \tag{6.3} \]

(the functions f over which the maximum is taken are defined on ∂D_i(H₁, H₂)).
Proof. The function u^ε(x) = P_x{X^ε_{τ_i^ε(H₁,H₂)} ∈ γ} is a solution of the equation L^ε u^ε(x) = 0 in the domain D_i(H₁, H₂). Introducing new coordinates in this annulus-shaped domain — the value of the Hamiltonian H and the periodic "angular" coordinate θ — we have L^ε = ε^{-2}B(H, θ)(∂/∂θ) + ½Δ with positive B(H, θ), which is of the form (6.1). The estimate (6.2) with H′ = H proves (6.3) (the rate of convergence is that of const·ε^{2α}, where the positive exponent α depends on H₁, H₂, and the constant also on H).

Now let us prove Lemma 3.6. We want to prove that for I_j ∼ O_k and for every δ > 0, for sufficiently small δ′, 0 < δ′ < δ, the difference

\[ \bigl| \mathsf P_x\{ X_{\tau_k^\varepsilon(\pm\delta)}^\varepsilon \in C_{kj}(\delta) \} - p_{kj} \bigr| \]

is small for sufficiently small ε and for all x ∈ D_k(±δ′), where the p_{kj} are given by p_{kj} = β_{kj} / Σ_{i:I_i∼O_k} β_{ki}.
The first step is proving that for every i such that I_i ∼ O_k and for every function f on ∂D_k(±δ) with ‖f‖ ≤ 1 (in particular, for f(z) = χ_{C_{kj}(δ)}(z)),

\[ \lim_{\varepsilon \to 0}\ \max_{x_1, x_2 \in C_{ki}(\delta')} |F^\varepsilon(x_1) - F^\varepsilon(x_2)| = 0, \tag{6.4} \]

where

\[ F^\varepsilon(x) = \mathsf M_x f\bigl( X_{\tau_k^\varepsilon(\pm\delta)}^\varepsilon \bigr). \tag{6.5} \]
Take δ″ and δ‴ so that 0 < δ″ < δ′ < δ‴ < δ. Let us apply the strong Markov property with respect to the time τ_i^ε(H(O_k) ± δ″, H(O_k) ± δ‴) ≤ τ_k^ε(±δ): for x_m ∈ C_{ki}(δ′), m = 1, 2, we have

\[ F^\varepsilon(x_m) = \mathsf M_{x_m} F^\varepsilon\bigl( X_{\tau_i^\varepsilon(H(O_k)\pm\delta'',\,H(O_k)\pm\delta''')}^\varepsilon \bigr). \]

Now (6.4) follows from Lemma 6.2.
The second step is proving that for every δ > 0 and every κ > 0 there exists δ′, 0 < δ′ < δ, such that for sufficiently small ε,

\[ \max_{x_1, x_2 \in D_k(\pm\delta')} |F^\varepsilon(x_1) - F^\varepsilon(x_2)| < \kappa. \tag{6.6} \]
There are exactly three regions, Di., Di and Di27 that are separated by the separatrix Ck, one of them, Di,,, adjoining the whole curve Ck, and Di,, D,2 adjoining only parts of it (so that Cki(, = Ck = Cki, u Cki2). We have either H(x) > H(Ok) for x c- D;(, and H(x) < H(Ok) for x c- Di, u Die, or all signs are opposite.
Let r be the first time at which the process Xr reaches Ck,, (8') or Since r < rk (±6) for the process starting at a point
Cki, (6) v Cki2 (S).
x e Dk(±S'), we can apply the strong Markov property with respect to T. We have for x E Dk(±8'), FE(x) = MX{Xi E Cki,(a) lJCki2(a);f(XX)} +Mx{XT E Ckbla)iFE(X,)},
inf
Px{X= E Cki, (6) V Cki2(8)}
f (Z) + PX{X= E Cki.(S)}
Z E (3Dk (±S)
inf
FE(Z)
Z E Cki, (a
< F8(x) < PX{X E Cki, (a) U
sup
Cki2(6)}
f(z) + PX{X E Ckio(6)}
Z E BDk (±J)
sup
FE(z),
Z E Ckio (S')
therefore for xi, x2 E Dk(±8'), IF' '(xi) - FE(x2) 15
sup
f (z) -
zeMk(±J)
inf P z)
ZEODk(±t5)
x max(P.,, {X E Cki, (a) u cki2 (a)}, Pz2 {x= E Cki, (a) u Cki2 (a)})
+ I sup LZ6
Ckb(S')
FE(z) -
inf
FE(z)
.
ZECkip(o')
The second difference, by (6.4), is arbitrarily small if e is small enough. Let us estimate Pz{X,, E Cki, (!S) u Cki2 (S)} for X E Dk(ta').
341
§6. Proof of Lemma 3.6
Apply formula (4.5):
MXH(XX) - H(x) = Mx
Jo
- AH(XX) di.
(6.7)
Here X, E Ck o (6') u Ck,, (6) u Ck,2 (6). The function H takes the value H(Ok) ± 6' on Ck,.(8'), and the value H(Ok) + 6 on Ck,, (6) v Ck,2(6). The left-hand side in (6.7) is equal to [H(Ok) ± 6'] [1- Px{Xz E Ck,, ((5) u Ck,2 (a)}]
+ [H(Ok) + 8] PX{Xi E Ck,, (S) U Cki2 (b)},
so (6.7) can be rewritten as (b + b') PX{X= E Cki1(b) U Cki2 (6)
=8'±[H(Ok)-H(x)]+MX
Jo
2AH(X,)dt
<26'+IMX J" ZAH(X,)dtl, 0
and the expectation of the integral does not exceed i sup JAHI MXr < 2 sup IAHI MXrk-(±b) < A54 8211n8j for sufficiently small e by (5.17). So for sufficiently small e, PX{X= E Cki, (b) u Ck,, (8)} <
+ A54811n6j,
which can be made arbitrarily small by choosing small positive 6, and then small positive b' < 6. So the second step of our proof is complete. It is true that we formulated our statement for an arbitrary positive 6, and finished with establishing it only for sufficiently small 6. But, first of all, small 6 are enough for our purpose of proving our main theorem; and secondly, this happened because we used the very rough inequality MXr < MXrk (±6), and it is very easy to verify (6.6) for arbitrary positive 6, making use of the strong Markov property.
So PX{X= (+) E Ck,(5)} has approximately the same value for all x E Dk(±6') (for 6' small enough and e small enough). What remains is to prove that this value is approximately pkj = Ail >i:I;.Ok Yki, where Yki are given by formula (2.2). To do this, we use the fact that the invariant measure .i for the process (XI, PX), which is the Lebesgue measure, can be written as an integral with respect to the invariant measure of the imbedded Markov chain. Let us adduce this result in our specific case.
Let us choose a number $H_0$ that is greater than all $H(x_k) + 2\delta$. Let $C_\delta = \bigcup_{k,i} C_{ki}(\delta) \cup C(H_0 - \delta)$, $C_{\delta'} = \bigcup_{k,i} C_{ki}(\delta') \cup C(H_0 - \delta')$. Introduce the random times $\tau_0 \le \sigma_0 \le \tau_1 \le \sigma_1 \le \cdots$ by $\tau_0 = 0$, $\sigma_k = \min\{t \ge \tau_k : X_t \in C_{\delta'}\}$, $\tau_k = \min\{t \ge \sigma_{k-1} : X_t \in C_\delta\}$. The sequence $X_{\tau_k}$, $k = 0, 1, 2, \ldots$, is a Markov chain (for $k \ge 1$, all $X_{\tau_k} \in C_\delta$); and if $X_0 \in C_{\delta'}$, the sequence $X_{\sigma_k}$, $k = 0, 1, 2, \ldots$, is also a Markov chain on $C_{\delta'}$. Every invariant measure $\mu$ of the process $(X_t, P_x)$ can be represented in the form
$$\mu(A) = \int_{C_\delta} \nu^\varepsilon(dx)\, M_x \int_0^{\tau_1} \chi_A(X_t)\,dt = \int_{C_{\delta'}} \nu'^\varepsilon(dx)\, M_x \int_0^{\tau_1} \chi_A(X_t)\,dt, \eqno(6.8)$$
where $\nu^\varepsilon$, $\nu'^\varepsilon$ are measures on $C_\delta$, $C_{\delta'}$ satisfying the system of integral equations
$$\nu^\varepsilon(B) = \int_{C_{\delta'}} \nu'^\varepsilon(dx)\, P_x\{X_{\tau_1} \in B\}, \qquad \nu'^\varepsilon(C) = \int_{C_\delta} \nu^\varepsilon(dx)\, P_x\{X_{\sigma_0} \in C\} \eqno(6.9)$$
(see Khas'minskii [7], also Wentzell [5]). These measures are invariant measures of the Markov chains $X_{\tau_k}$, $X_{\sigma_k}$. The measures $\nu^\varepsilon$, $\nu'^\varepsilon$ are nonzero (since $\mu \neq 0$), and for every $I_i \sim O_k$, $\nu^\varepsilon(C_{ki}(\delta)) > 0$, $\nu'^\varepsilon(C_{ki}(\delta')) > 0$, because starting from each of these sets, or from $C(H_0 - \delta)$ (or $C(H_0 - \delta')$), it is possible, with positive probability, to reach every other $C_{ki}(\delta)$, $C_{ki}(\delta')$ in a finite number of steps (cycles). So we can introduce the averages:
$$\bar p_{kj} = \frac{\int_{\bigcup_{i:I_i\sim O_k} C_{ki}(\delta')} \nu'^\varepsilon(dx)\, P_x\{X_{\tau_k(\pm\delta)} \in C_{kj}(\delta)\}}{\nu'^\varepsilon\bigl(\bigcup_{i:I_i\sim O_k} C_{ki}(\delta')\bigr)}\,. \eqno(6.10)$$
According to (6.6),
$$\bigl|P_x\{X_{\tau_k(\pm\delta)} \in C_{kj}(\delta)\} - \bar p_{kj}\bigr| < \kappa \eqno(6.11)$$
for all $x \in D_k(\pm\delta')$. If the process $X_t$ starts in the set $C_{\delta'}$, it cannot be in $C_{kj}(\delta)$ at time $\tau_1$ unless it started in $\bigcup_{i:I_i\sim O_k} C_{ki}(\delta')$. Therefore, by the first equation in (6.9), $\nu^\varepsilon\bigl(\bigcup_{i:I_i\sim O_k} C_{ki}(\delta)\bigr) = \nu'^\varepsilon\bigl(\bigcup_{i:I_i\sim O_k} C_{ki}(\delta')\bigr)$, and the numerator in (6.10) is nothing but $\nu^\varepsilon(C_{kj}(\delta))$. So (6.11) can be rewritten as
$$\Bigl|P_x\{X_{\tau_k(\pm\delta)} \in C_{kj}(\delta)\} - \frac{\nu^\varepsilon(C_{kj}(\delta))}{\sum_{i:I_i\sim O_k} \nu^\varepsilon(C_{ki}(\delta))}\Bigr| < \kappa \eqno(6.12)$$
for sufficiently small positive $\delta' < \delta$, sufficiently small $\varepsilon$, and $x \in D_k(\pm\delta')$.
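The averaging in (6.10)–(6.12) can be illustrated on a finite caricature of the two imbedded chains, with the curves $C_\delta$, $C_{\delta'}$ replaced by finite point sets and the hitting distributions by stochastic matrices. The matrices P and Q below are hypothetical stand-ins, not computed from any concrete process; the sketch only shows how the invariant measures of the chains are linked as in (6.9), and how the averaged probability (6.10) is formed.

```python
import numpy as np

# Hypothetical hitting kernels: P[i, j] = probability that, starting from the
# i-th point of "C_{delta'}", the process first hits "C_delta" at its j-th point;
# Q is the analogous kernel from C_delta back to C_{delta'}.
P = np.array([[0.7, 0.3],
              [0.2, 0.8]])
Q = np.array([[0.6, 0.4],
              [0.1, 0.9]])

def stationary(M):
    # invariant distribution of a stochastic matrix via its Perron eigenvector
    w, v = np.linalg.eig(M.T)
    vec = np.abs(np.real(v[:, np.argmax(np.real(w))]))
    return vec / vec.sum()

# One full cycle of the C_{delta'}-chain is P followed by Q.
nu_prime = stationary(P @ Q)
nu = nu_prime @ P            # analogue of the first equation in (6.9)

# Averaged transition probability as in (6.10): rows of P weighted by nu'.
p_bar = (nu_prime @ P) / nu_prime.sum()

print(nu)      # invariant measure of the "C_delta" chain
print(p_bar)   # averaged probabilities; equal to nu since nu' is normalized
```

The point of the sketch is exactly the identity used after (6.11): the $\nu'$-weighted numerator of (6.10) coincides with the invariant mass $\nu$ of the other chain.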
So the next step is to estimate $\nu^\varepsilon(C_{kj}(\delta))$ for small $\delta > 0$, small $\delta'$, $0 < \delta' < \delta$, and small $\varepsilon$.
We have not used formula (6.8) yet. This formula implies the corresponding equality for integrals with respect to the measure $\mu$ (the Lebesgue measure): if $G(x)$, $x \in R^2$, is integrable, we have
$$\int\!\!\int_{R^2} G(x)\,\mu(dx) = \int\!\!\int_{R^2} G(x)\,dx = \int_{C_\delta} \nu^\varepsilon(dx)\, M_x \int_0^{\tau_1} G(X_t)\,dt.$$
If the function $G(x)$ is equal to 0 in the region $\{x : H(x) > H_0 - \delta\}$ and in all regions $D_k(\pm\delta)$, the corresponding part of the expectation vanishes, and
$$\int\!\!\int_{R^2} G(x)\,dx = \int_{C_\delta} \nu^\varepsilon(dx)\, M_x \int_0^{\tau_1} G(X_t)\,dt. \eqno(6.13)$$
Let us consider an arbitrary segment $I_j$ of the graph with ends $O_{k_1}$ and $O_{k_2}$. Denote $H_{k_1} = H(O_{k_1})$, $H_{k_2} = H(O_{k_2})$, except if the vertex $O_{k_2}$ corresponds to the point at infinity, in which case we take $H_{k_2} = H_0$. Let $g(H)$ be an arbitrary continuous function on the interval $H(I_j)$ that is equal to 0 outside its subinterval $(H_{k_1} + \delta, H_{k_2} - \delta)$. Consider the function $G(x)$ that is equal to $g(H(x))$ in the domain $D_j$, and to 0 outside $D_j$. The left-hand side of (6.13) can be rewritten as
$$\int\!\!\int_{D_j} g(H(x))\,dx = \int_{H_{k_1}+\delta}^{H_{k_2}-\delta} g(H)\,dv_j(H),$$
where $v_j(H)$ is the function whose derivative is given by formula (1.15). For the expectation on the right-hand side of (6.13), formulas (4.28) and (4.29) hold, and we can write
$$\int_{H_{k_1}+\delta}^{H_{k_2}-\delta} g(H)\,dv_j(H) = \nu^\varepsilon(C_{k_1 j}(\delta)) \times \Bigl[\frac{u_j(H_{k_1}+\delta) - u_j(H_{k_1}+\delta')}{u_j(H_{k_2}-\delta) - u_j(H_{k_1}+\delta')} \int_{H_{k_1}+\delta}^{H_{k_2}-\delta} \bigl(u_j(H_{k_2}-\delta') - u_j(H)\bigr)\,g(H)\,dv_j(H) + o(1)\Bigr]$$
$$+\ \nu^\varepsilon(C_{k_2 j}(\delta)) \times \Bigl[\frac{u_j(H_{k_2}-\delta') - u_j(H_{k_2}-\delta)}{u_j(H_{k_2}-\delta) - u_j(H_{k_1}+\delta')} \int_{H_{k_1}+\delta}^{H_{k_2}-\delta} \bigl(u_j(H) - u_j(H_{k_1}+\delta')\bigr)\,g(H)\,dv_j(H) + o(1)\Bigr],$$
where the $o(1)$ go to 0 as $\varepsilon \to 0$.
In order for this to be true for an arbitrary continuous function $g$, it is necessary that the limits of $\nu^\varepsilon(C_{k_1 j}(\delta))$, $\nu^\varepsilon(C_{k_2 j}(\delta))$ exist as $\varepsilon \to 0$, namely,
$$\lim_{\varepsilon\to 0} \nu^\varepsilon(C_{k_1 j}(\delta)) = \bigl[u_j(H_{k_1}+\delta) - u_j(H_{k_1}+\delta')\bigr]^{-1}, \qquad \lim_{\varepsilon\to 0} \nu^\varepsilon(C_{k_2 j}(\delta)) = \bigl[u_j(H_{k_2}-\delta') - u_j(H_{k_2}-\delta)\bigr]^{-1}.$$
But by formula (1.17), $u_j(H') - u_j(H) \sim 2\bigl(\oint_{C_{kj}} |\nabla H(x)|\,dl\bigr)^{-1}(H' - H)$ as $H, H' \to H_{k_1}$ (we disregard the case $H_{k_2} = H_0$). This means that, for sufficiently small $\delta$ and $\varepsilon$,
$$\nu^\varepsilon(C_{kj}(\delta)) \approx \frac{\gamma_{kj}}{2(\delta - \delta')}\,, \qquad \gamma_{kj} = \oint_{C_{kj}} |\nabla H(x)|\,dl.$$
Together with formula (6.12), this yields
$$\bigl|P_x\{X_{\tau_k(\pm\delta)} \in C_{kj}(\delta)\} - p_{kj}\bigr| < 2\kappa,$$
and Lemma 3.6 is proved.
§7. Remarks and Generalizations

1. Let a system
$$\dot X_t = b(X_t), \qquad X_0 = x \in R^2, \eqno(7.1)$$
in the plane have a smooth first integral $H(x)$: $H(X_t) = H(x)$ for $t > 0$. Assume that $H(x)$ has a finite number of critical points and $\lim_{|x|\to\infty} H(x) = \infty$. Since $H(x)$ is a first integral, $\nabla H(x)\cdot b(x) = 0$, $x \in R^2$, and thus $b(x) = \beta(x)\bar\nabla H(x)$, where $\beta(x)$ is a scalar. Consider small white noise perturbations of such a system:
$$\dot X_t^\varepsilon = \beta(X_t^\varepsilon)\,\bar\nabla H(X_t^\varepsilon) + \varepsilon\dot W_t, \qquad X_0^\varepsilon = x. \eqno(7.2)$$
Here $W_t$ is the Wiener process in $R^2$, $0 < \varepsilon \ll 1$. If $\beta(x) \equiv \text{constant}$, system (7.1) is Hamiltonian. If $\beta(x) \not\equiv \text{constant}$ but does not change sign, the situation is similar to the Hamiltonian case. But if $\beta(x)$ changes sign, the problem becomes more complicated: even if $H(x)$ has just one critical point, the slow component of the perturbed process may not converge to a Markov process as $\varepsilon \to 0$. To have a Markov limit, it may be impossible to avoid consideration of processes on graphs. We restrict ourselves to just one example (see Freidlin [19]). Let $H(x)$, $x \in R^2$, have just one minimum at the origin, $H(0) = 0$, and
Figure 26.
let the level sets $C(y) = \{x \in R^2 : H(x) = y\}$, $y > 0$, be smooth curves homeomorphic to the circle (see Fig. 26). Suppose $\beta(x)$ is negative inside the loop $ABCDEFA$ and positive outside this loop. Here, $A$, $C$, $D$, $F$ are the points where the loop $\{x \in R^2 : \beta(x) = 0\}$ is tangent to the level curves of $H(x)$; $B$ is the point where the trajectory starting at $F$ crosses $AC$ for the first time, and $E$ is the point at which the trajectory starting at $A$ crosses $FD$ for the first time. The dynamical system on a level set $C(y)$ has a unique invariant measure if $y \notin (H(C), H(D))$. The density of this measure with respect to the length element on $C(y)$ is $\text{const}\times(|\beta(x)|\,|\nabla H(x)|)^{-1}$. But, for $y \in (H(C), H(D))$, the dynamical system has more than one equilibrium point on $C(y)$. An invariant measure is concentrated at each of these equilibrium points. The dynamical system has four rest points when $y \in (H(B), H(A))$: two stable and two unstable. This results in the formation of a new "stable" first integral independent of $H(x)$. To define this new first integral $k(x)$, denote by $x_0^{(1)}(y)$ the point of intersection of $C(y)$ and the arc $FED$ of the curve $\{x \in R^2 : \beta(x) = 0\}$, and denote the intersection point of $C(y)$ and of the arc $ABC$ by $x_0^{(2)}(y)$. We assume that there exists at most one such point $x_0^{(1)}(y)$ and at most one $x_0^{(2)}(y)$. It is clear that $x_0^{(1)}(y)$ exists for $H(F) \le y \le H(D)$, and $x_0^{(2)}(y)$ exists for $H(C) \le y \le H(A)$. Define $k(x)$, $x \in R^2$, as follows:
$$k(x) = \begin{cases} i, & \text{if } x_0^{(i)}(H(x)) \text{ exists and } x \text{ belongs to the domain of attraction of } x_0^{(i)}(H(x)),\ i = 1, 2;\\ 1, & \text{if } H(x) \ge H(D), \text{ or if } x \text{ belongs to the arc } CMD;\\ 2, & \text{if } H(x) \le H(C), \text{ or if } x \text{ belongs to the arc } ANF. \end{cases}$$
It is easy to see that $k(x)$ is a first integral for the system shown in Fig. 26(a).
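A minimal numerical sketch of the perturbed system (7.2) can be written with the Euler–Maruyama scheme. The functions H and beta below are hypothetical stand-ins (a single-minimum Hamiltonian and a sign-changing scalar), not the exact data behind Fig. 26; the point is that the drift $\beta(x)\bar\nabla H(x)$ is tangent to the level curves of $H$, so $H(X_t^\varepsilon)$ moves only through the noise.

```python
import numpy as np

# Hypothetical data for illustration only: H with a single minimum at 0,
# beta changing sign across the line x1 = 0.3.
def H(x):
    return 0.5 * (x[0]**2 + x[1]**2)

def grad_H(x):
    return np.array([x[0], x[1]])

def beta(x):
    return x[0] - 0.3

def skew_grad_H(x):
    # \bar\nabla H = (-H_{x2}, H_{x1}), orthogonal to grad H
    g = grad_H(x)
    return np.array([-g[1], g[0]])

def simulate(x0, eps=0.1, dt=1e-3, n_steps=20000, seed=0):
    # Euler-Maruyama for dX = beta(X) skew_grad_H(X) dt + eps dW
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    energies = [H(x)]
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=2)
        x = x + beta(x) * skew_grad_H(x) * dt + eps * dw
        energies.append(H(x))
    return x, np.array(energies)

x_final, energies = simulate([1.0, 0.0])
print(energies[-1])   # H has drifted only through the O(eps) noise
```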
Let now $\tilde X_t^\varepsilon$ be the perturbed process defined by (7.2) for the dynamical system shown in Fig. 26(a), and let $X_t^\varepsilon = \tilde X_{t/\varepsilon^2}^\varepsilon$ be the corresponding rescaled process. Consider the graph $\Gamma$ in Fig. 26(b). The pairs $(i, H)$, $H \ge H(O) = 0$, $i \in \{1, 2\}$, can be used as coordinates on $\Gamma$. The points $(1, H(F))$ and $(2, H(B))$, as well as $(1, H(E))$ and $(2, H(A))$, are identified.
Define the mapping $Y(x): R^2 \to \Gamma$ by $Y(x) = (k(x), H(x))$, and let $Y_t^\varepsilon = Y(X_t^\varepsilon)$. It is easy to see that, when $H(X_t^\varepsilon) \notin [H(C), H(D)]$, the fast component of the process $X_t^\varepsilon$ has distribution on $C(H(X_t^\varepsilon))$ close to the unique invariant measure of the nonperturbed system on the corresponding level set of $H(x)$. When $H(X_t^\varepsilon) \in (H(C), H(B)) \cup (H(A), H(D))$, the distribution of the fast component is close to the measure concentrated at the stable rest point of the
nonperturbed system on the corresponding level set. But when $H(X_t^\varepsilon) \in (H(B), H(A))$, the fast coordinate of $X_t^\varepsilon$ is concentrated near one of the points $x_0^{(1)}(H(X_t^\varepsilon))$ or $x_0^{(2)}(H(X_t^\varepsilon))$, depending upon the side from which $H(X_t^\varepsilon)$ last entered the segment $[H(B), H(A)]$. If $\varepsilon \ll 1$, the process $X_t^\varepsilon$ "jumps" from $F$ to $B$ along a deterministic trajectory. Similarly, $X_t^\varepsilon$ "jumps" from $A$ to $E$. Using these arguments, one can check that the processes $Y_t^\varepsilon$ converge weakly as $\varepsilon \to 0$ in the space of continuous functions on $[0, T]$, $T < \infty$, with values in $\Gamma$ to a diffusion process $Y_t$ on $\Gamma$. The process $Y_t$ is defined as follows. Let $T(y) = \oint_{C(y)} \bigl[|\beta(x)|\,|\nabla H(x)|\bigr]^{-1}\,dl$ and
$$A_1(y) = \begin{cases} [T(y)]^{-1} \oint_{C(y)} \dfrac{|\nabla H(x)|}{|\beta(x)|}\,dl, & y > H(D),\\[4pt] |\nabla H(x_0^{(1)}(y))|^2, & H(F) < y < H(D); \end{cases} \qquad B_1(y) = \begin{cases} (2T(y))^{-1} \oint_{C(y)} \dfrac{\Delta H(x)}{|\beta(x)|\,|\nabla H(x)|}\,dl, & y > H(D),\\[4pt] \tfrac12 \Delta H(x_0^{(1)}(y)), & H(F) < y < H(D); \end{cases}$$
$$A_2(y) = \begin{cases} [T(y)]^{-1} \oint_{C(y)} \dfrac{|\nabla H(x)|}{|\beta(x)|}\,dl, & y < H(C),\\[4pt] |\nabla H(x_0^{(2)}(y))|^2, & H(C) < y < H(A); \end{cases} \qquad B_2(y) = \begin{cases} (2T(y))^{-1} \oint_{C(y)} \dfrac{\Delta H(x)}{|\beta(x)|\,|\nabla H(x)|}\,dl, & y < H(C),\\[4pt] \tfrac12 \Delta H(x_0^{(2)}(y)), & H(C) < y < H(A). \end{cases}$$
Define the operators $L_i$, $i = 1, 2$:
$$L_i = \frac12\,A_i(y)\,\frac{d^2}{dy^2} + B_i(y)\,\frac{d}{dy}\,.$$
The process $Y_t$ on $\Gamma$ is governed by the operator $L_1$ on the upper part of the graph $\Gamma$ ($i = 1$), and by $L_2$ on the lower part ($i = 2$). The operator $L_2$ degenerates at the point $(2, 0) \in \Gamma$, so that this point is inaccessible for the process $Y_t$. To define the process $Y_t$ for all $t \ge 0$ in a unique way, one should add the gluing conditions at the point $(1, H(F))$ identified with $(2, H(B))$ and at $(1, H(E))$ identified with $(2, H(A))$. These gluing conditions define the behavior of the process at the vertices. As we have already seen, the gluing conditions describe the domain of definition of the generator of the process $Y_t$. Let $Y_t$ be the process on $\Gamma$ whose generator is defined for functions $f(i, y)$, $(i, y) \in \Gamma$, which are continuous on $\Gamma$, smooth on $\{(i, y) \in \Gamma : y \ge H(F), i = 1\}$ and on $\{(i, y) \in \Gamma : y \le H(A), i = 2\}$, and satisfy $L_1 f(1, H(F)) = L_2 f(2, H(B))$ and $L_1 f(1, H(E)) = L_2 f(2, H(A))$. Obviously, these gluing conditions have the form described earlier in this chapter, with some of the coefficients equal to 0 for $k = 1, 2$. The generator coincides with $L_1$ at the points of the set $\{(i, y) \in \Gamma : y > H(F), i = 1\}$ and with $L_2$ at the points of $\{(i, y) \in \Gamma : y < H(A), i = 2\}$. The limiting process $Y_t$ on $\Gamma$ is determined by these conditions in a unique way.

2. We have seen that the graphs associated with Hamiltonian systems in $R^2$ always have the structure of a tree. If we consider dynamical systems in
a phase space of a more complicated topological structure, the corresponding graph, on which the slow motion of the perturbed system should be considered, can have loops. Consider, for example, the dynamical system on the two-dimensional torus $T^2$ with Hamiltonian $H(x)$ defined as follows. Let the torus be imbedded in $R^3$, let the symmetry axis of the torus be parallel to a plane $\Pi$ (see Fig. 27(a)), and let $H(x)$ be equal to the distance of a point $x \in T^2$ from $\Pi$. The trajectories of the system then lie in the planes parallel to $\Pi$, as shown in Fig. 27(a). The set of connected components of the level sets of the Hamiltonian $H$ is homeomorphic to the graph $\Gamma$ shown in Fig. 27(b). This graph has a loop. One can describe the slow motion for the diffusion process resulting from small white noise perturbations of the dynamical system in the same way as in the case of systems in $R^2$. The dynamical system of this example is not a generic Hamiltonian system on $T^2$. Generic Hamiltonian systems on two-dimensional tori were studied in Arnold [2] and Sinai and Khanin [1]. In particular, it was shown there that a generic dynamical system on $T^2$ preserving the area has the following structure. There are finitely many loops in $T^2$ such that inside each
loop the system behaves as a Hamiltonian system in a finite part of the plane. Each trajectory lying in the exterior of all loops is dense in the exterior. These trajectories form one ergodic class. There are two such loops in the example shown in Fig. 28.
Figure 27.
Figure 28.
The Hamiltonian $H$ of such a system on $T^2$ is multivalued, but its gradient $\nabla H(x)$, $x \in T^2$, is a smooth vector field on the torus. Consider small white noise perturbations of such a system:
$$\dot{\tilde X}_t^\varepsilon = \bar\nabla H(\tilde X_t^\varepsilon) + \varepsilon\dot W_t, \qquad X_t^\varepsilon = \tilde X_{t/\varepsilon^2}^\varepsilon.$$
To study the slow component of the perturbed process, we should identify some points of the phase space. First, as we did before, we identify all the points of each periodic trajectory. But here we have an additional identification: since the trajectories of the dynamical system outside the loops are dense in the exterior of the loops, all the points of the exterior should be glued together. The set of all the connected components of the level sets of the Hamiltonian, provided with the natural topology, is homeomorphic in this case to a graph $\Gamma$ having the following structure: the exterior of the loops corresponds to one vertex $O_0$. The number of branches connected with $O_0$ is equal to the number of loops. Each branch describes the set of connected components of the Hamiltonian level sets inside the corresponding loop. For example, the left branch of the graph $\Gamma$ shown in Fig. 28(c), consisting of the edges $O_0O_2$, $O_2O_1$, $O_2O_3$, describes the set of connected components inside the loop $\gamma_1$. Let $H_1, H_2, \ldots, H_N$ be the values of the Hamiltonian function on the loops $\gamma_1, \ldots, \gamma_N$. Denote by $\tilde H(x)$, $x \in T^2$, the function equal to zero outside the loops and equal to $H(x) - H_k$ inside the $k$th loop, $k = 1, \ldots, N$. Let all the edges of $\Gamma$ be indexed by the numbers $1, \ldots, n$. The pairs $(k, y)$, where $k$ is the number of the edge containing a point $z \in \Gamma$ and $y$ is the value of $\tilde H(x)$ on the level set component corresponding to $z$, form a coordinate system on $\Gamma$. Consider the mapping $Y: T^2 \to \Gamma$, $Y(x) = (k(x), \tilde H(x))$. Then one can expect that the processes $Y_t^\varepsilon = Y(X_t^\varepsilon)$, $0 \le t \le T$, $T < \infty$, converge weakly in the space of continuous functions on $[0, T]$ with values in $\Gamma$ to a diffusion process $Y_t$ on $\Gamma$. One can prove, in the same way as in the case of Hamiltonian systems in $R^2$, that the limiting process is governed inside the edges by the operators obtained by averaging with respect to the invariant density of the fast motion.
The gluing conditions at all the vertices besides $O_0$ have the same form as before. But the gluing condition at $O_0$ is special. The uniform distribution on $T^2$ is invariant for any process $X_t^\varepsilon$, $\varepsilon > 0$. Thus, the time each of them spends in the exterior of the loops is proportional to the area of the exterior. Therefore, the limiting process $Y_t$ spends a positive amount of time at the vertex $O_0$, which corresponds to the exterior of the loops. The process $Y_t$ spends zero time at all other vertices. To calculate the gluing condition at $O_0$, assume that the Hamiltonian $H(x)$ inside each loop has just one extremum (as inside the loop $\gamma_2$ in Fig. 28). Since the gluing conditions at $O_0$ are determined by the behavior of the process $X_t^\varepsilon$ in a neighborhood of the loops, we can assume this without loss of generality. Let the area of the torus $T^2$ be equal to one, and the area of the exterior of the loops be equal to $a$. Denote by $\mu$ the measure on $\Gamma$ such that $\mu\{O_0\} = a$, $(d\mu/dH)(k, H) = T_k(H)$ inside the $k$th edge, where
$$T_k(H) = \oint_{C_k(H)} |\nabla H(x)|^{-1}\,dl,$$
and $C_k(H)$ is the component of the level set $C(H) = \{x \in T^2 : \tilde H(x) = H\}$ inside the $k$th loop. It is easy to check that $T_k(H) = |dS_k(H)/dH|$, where $S_k(H)$ is the area of the domain bounded by $C_k(H)$. Taking this into account, one can check that $\mu$ is the normalized invariant measure for the processes $Y_t^\varepsilon = Y(X_t^\varepsilon)$ on $\Gamma$ for any $\varepsilon > 0$, and it is invariant for the limiting process $Y_t$ as well.
On the other hand, a diffusion process on $\Gamma$ with the generator $A$ has an invariant measure $\mu$ if and only if
$$\int_\Gamma Au(z)\,d\mu = 0 \eqno(7.3)$$
for any continuous function $u(z)$, $z \in \Gamma$, that is smooth at the interior points of the edges and such that $Au(z)$ is also continuous. Taking into account that the operator $A$ for the process $Y_t$ coincides with the differential operator
$$L_k = \frac{1}{2T_k(H)}\,\frac{d}{dH}\Bigl(a_k(H)\,\frac{d}{dH}\Bigr), \qquad a_k(H) = \oint_{C_k(H)} |\nabla H(x)|\,dl,$$
inside any edge $I_k$ of the graph $\Gamma$, we conclude from (7.3) that the measure $\mu$ is invariant for the process $Y_t$ only if
$$a\,Au(O_0) = \frac12 \sum_{k=1}^N (\pm 1)\,a_k(0)\,\frac{du}{dH}(k, O_0). \eqno(7.4)$$
Here $(du/dH)(k, O_0)$ means the derivative of the function $u(z)$, $z \in \Gamma$, along
the $k$th edge $I_k \sim O_0$ at the point $O_0$. The sign "+" should be taken in the $k$th term if $\tilde H(x) < 0$ on $I_k$, and the sign "−" if $\tilde H(x) > 0$ on $I_k$. Equality (7.4) gives us the gluing condition at the vertex $O_0$.

3. Consider an oscillator with one degree of freedom,
$$\ddot p_t + f(p_t) = 0. \eqno(7.5)$$
Let the force $f(p)$ be a smooth generic function, $F(p) = \int_0^p f(z)\,dz$. Assume that $\lim_{|p|\to\infty} F(p) = \infty$. One can introduce the Hamiltonian $H(p, q) = \tfrac12 q^2 + F(p)$ of system (7.5) and rewrite (7.5) in the Hamiltonian form,
$$\dot p_t = \frac{\partial H}{\partial q}(p_t, q_t) = q_t, \qquad \dot q_t = -\frac{\partial H}{\partial p}(p_t, q_t) = -f(p_t). \eqno(7.6)$$
If we denote by $x$ the point $(p, q)$ of the phase space, equations (7.6) have the form $\dot x_t = \bar\nabla H(x_t)$. Consider now perturbations of (7.5) by white noise,
$$\ddot p_t^\varepsilon + f(p_t^\varepsilon) = \varepsilon\dot W_t. \eqno(7.7)$$
Here $W_t$ is the Wiener process in $R^1$, $0 < \varepsilon \ll 1$. One can rewrite (7.7) as a system:
$$\dot p_t^\varepsilon = q_t^\varepsilon, \qquad \dot q_t^\varepsilon = -f(p_t^\varepsilon) + \varepsilon\dot W_t. \eqno(7.8)$$
Now we have the degenerate process $(0, \varepsilon W_t)$ in $R^2$ as the perturbation. Although the general construction and the scheme of reasoning in this case are more or less the same as in the nondegenerate case, some additional estimates are needed, especially when we prove the Markov property and calculate the gluing conditions for the limiting process. On the other hand, we show that the characteristics of the limiting process, in the case under consideration, have a simple geometric sense. Here we follow Freidlin and Weber [1].
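A short simulation of (7.7)–(7.8) illustrates the fast–slow splitting. The force $f(p) = p$ is an assumed illustration (harmonic oscillator, $H(p, q) = \tfrac12 q^2 + \tfrac12 p^2$), not the generic force of the text; only the $q$-equation is perturbed, as in (7.8), and the energy changes on the slow $O(\varepsilon)$ scale while $(p, q)$ completes many rotations.

```python
import numpy as np

# Degenerately perturbed oscillator (7.8) with the hypothetical choice f(p) = p.
def simulate(eps=0.05, dt=1e-3, n_steps=50000, seed=1):
    rng = np.random.default_rng(seed)
    p, q = 1.0, 0.0
    H0 = 0.5 * q**2 + 0.5 * p**2
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt))
        # Euler step; the noise enters only the q-equation
        p, q = p + q * dt, q - p * dt + eps * dw
    H1 = 0.5 * q**2 + 0.5 * p**2
    return H0, H1

H0, H1 = simulate()
# Over a time of order 1 the phase point rotates ~ T / (2 pi) times,
# while the energy moves only by a small random amount.
print(H0, H1)
```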
As in the case of nondegenerate perturbations, the process $(p_t^\varepsilon, q_t^\varepsilon)$ in $R^2$ defined by (7.8) has, roughly speaking, fast and slow components. The fast component corresponds to the motion along the nonperturbed periodic trajectories, and the slow component corresponds to the motion transversal to the deterministic trajectories. The fast component can be characterized by the invariant density of the nonperturbed system on the corresponding periodic trajectory. To describe the slow component, it is convenient to rescale time: put $\tilde p_t^\varepsilon = p_{t/\varepsilon^2}^\varepsilon$, $\tilde q_t^\varepsilon = q_{t/\varepsilon^2}^\varepsilon$. Then the functions $\tilde p_t^\varepsilon$, $\tilde q_t^\varepsilon$ satisfy the equations
$$\dot{\tilde p}_t^\varepsilon = \frac{1}{\varepsilon^2}\,\tilde q_t^\varepsilon, \qquad \dot{\tilde q}_t^\varepsilon = -\frac{1}{\varepsilon^2}\,f(\tilde p_t^\varepsilon) + \dot{\tilde W}_t, \eqno(7.9)$$
where $\tilde W_t$ is a Wiener process in $R^1$. First, let us consider the case of the force $f(p)$ having just one zero, say
at $p = 0$. Then the slow component of the process defined by (7.9) can be characterized by how $H(\tilde p_t^\varepsilon, \tilde q_t^\varepsilon)$ changes. Using the Itô formula, we have
$$H(\tilde p_t^\varepsilon, \tilde q_t^\varepsilon) - H(\tilde p_0^\varepsilon, \tilde q_0^\varepsilon) = \int_0^t H_q(\tilde p_s^\varepsilon, \tilde q_s^\varepsilon)\,d\tilde W_s + \frac12 \int_0^t H_{qq}(\tilde p_s^\varepsilon, \tilde q_s^\varepsilon)\,ds.$$
We made use of the orthogonality of $\nabla H(p, q)$ and $\bar\nabla H(p, q)$. Applying the standard averaging procedure with respect to the invariant density of the fast motion with the frozen slow component, one can easily prove that for any $T > 0$ the processes $Y_t^\varepsilon = H(\tilde p_t^\varepsilon, \tilde q_t^\varepsilon)$ converge weakly in the space of continuous functions on $[0, T]$ to the diffusion process $Y_t$ on $[0, \infty)$ corresponding to the operator
$$L = \frac12\,A(y)\,\frac{d^2}{dy^2} + B(y)\,\frac{d}{dy}\,, \eqno(7.10)$$
$$A(y) = \frac{1}{T(y)} \oint_{C(y)} \frac{H_q^2(p, q)\,dl}{|\nabla H(p, q)|}\,, \qquad B(y) = \frac{1}{2T(y)} \oint_{C(y)} \frac{H_{qq}(p, q)\,dl}{|\nabla H(p, q)|}\,,$$
where $T(y) = \oint_{C(y)} (dl/|\nabla H(p, q)|)$ is the period of the nonperturbed oscillations with the energy $H(p, q) = y$. Here, as before, $C(y) = \{(p, q) \in R^2 : H(p, q) = y\}$; $dl$ is the length element on $C(y)$. The point 0 is inaccessible for the process $Y_t$ on $[0, \infty)$. The proof of this convergence can be carried out in the same way as in the nondegenerate case. Using the Gauss formula, one can check that
$$\frac{d}{dy}\bigl[A(y)T(y)\bigr] = 2B(y)T(y).$$
Thus,
$$L = \frac{1}{2T(y)}\,\frac{d}{dy}\Bigl(a(y)\,\frac{d}{dy}\Bigr),$$
where $a(y) = A(y)T(y)$. Now, since $H(p, q) = \tfrac12 q^2 + F(p)$, the contour $C(y)$ can be described as follows:
$$C(y) = \bigl\{(p, q) \in R^2 : q = \pm\sqrt{2(y - F(p))},\ \alpha(y) \le p \le \beta(y)\bigr\}, \eqno(7.11)$$
where $\alpha(y)$, $\beta(y)$ are the roots of the equation $F(p) = y$. Note that since $F'(p) = f(p)$ has just one zero at $p = 0$, and $\lim_{|p|\to\infty} F(p) = \infty$, the equation $F(p) = y$ has exactly two roots for any $y > 0$. Using (7.11), we calculate:
$$H_q^2(p, q) = q^2 = 2(y - F(p))$$
and $|\nabla H(p, q)| = \sqrt{f^2(p) + 2(y - F(p))}$ for $(p, q) \in C(y)$. Thus,
$$a(y) = T(y)A(y) = \oint_{C(y)} \frac{H_q^2(p, q)\,dl}{|\nabla H(p, q)|} = 2\int_{\alpha(y)}^{\beta(y)} \sqrt{2(y - F(p))}\,dp = S(y),$$
where $S(y)$ is the area of the domain bounded by $C(y)$. Taking into account that the period $T(y) = dS(y)/dy$, one can rewrite the operator defined by (7.10) in the form
$$L = \frac{1}{2S'(y)}\,\frac{d}{dy}\Bigl(S(y)\,\frac{d}{dy}\Bigr). \eqno(7.12)$$
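The identities behind (7.11)–(7.12) are easy to check numerically. The sketch below assumes the harmonic case $F(p) = p^2/2$ (an illustration, not the generic $f$ of the text): it computes $S(y) = 2\int_{\alpha(y)}^{\beta(y)} \sqrt{2(y - F(p))}\,dp$ by the trapezoid rule and compares it with the exact disk area $2\pi y$ and with the period $T(y) = S'(y) = 2\pi$.

```python
import numpy as np

# Assumed illustration: F(p) = p^2/2, so C(y) is the circle of radius sqrt(2y),
# S(y) = 2*pi*y, and the period T(y) = dS/dy = 2*pi.
def S(y, n=200001):
    a, b = -np.sqrt(2 * y), np.sqrt(2 * y)      # roots of F(p) = y
    p = np.linspace(a, b, n)
    g = 2.0 * np.sqrt(np.maximum(2.0 * (y - 0.5 * p**2), 0.0))
    # trapezoid rule for 2 * integral of sqrt(2 (y - F(p))) dp
    return np.sum((g[1:] + g[:-1]) / 2 * np.diff(p))

y = 0.7
S_num = S(y)
T_num = (S(y + 1e-4) - S(y - 1e-4)) / 2e-4      # numerical S'(y)
print(S_num, 2 * np.pi * y)   # the enclosed area agrees with 2*pi*y
print(T_num, 2 * np.pi)       # the period of the harmonic oscillator
```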
Now let $f(p)$ have more than one zero. Then the level sets of the Hamiltonian $H(p, q) = \tfrac12 q^2 + F(p)$ may have more than one component. Let $\Gamma = \{O_1, \ldots, O_m; I_1, \ldots, I_n\}$ be the graph homeomorphic to the set of connected components of the level sets of $H(p, q)$, provided with the natural topology. Let $Y: R^2 \to \Gamma$ be the mapping introduced earlier in this chapter, $Y(x) = (k(x), H(x))$, where $k(x) \in \{1, \ldots, n\}$ is the number of the edge containing the point of $\Gamma$ corresponding to the nonperturbed trajectory starting at $x \in R^2$. The pair $(k, H)$ forms the coordinates on $\Gamma$. If $f(p)$ has just one zero, the graph $\Gamma$ is reduced to one edge $[0, \infty)$. But if the number of zeros is greater than one, the function $k(x)$ is not a constant, and $k(x)$ is an additional first integral of the nonperturbed system, independent of $H(x)$. This implies that the process $H(\tilde p_t^\varepsilon, \tilde q_t^\varepsilon)$ in the case of many zeros no longer converges to a Markov process (if $f(p)$ has no special symmetries; but remember that we assume that $f(p)$ is generic). To have a Markov process in the limit, we should, as in the case of nondegenerate perturbations, extend the phase space by inclusion of the additional first integral. In other words, we should consider the processes $Y_t^\varepsilon = Y(\tilde p_t^\varepsilon, \tilde q_t^\varepsilon)$ on the graph $\Gamma$ related to the Hamiltonian $H(p, q)$. To describe the limiting process on $\Gamma$, define a function $S(z)$, $z = (k, y) \in \Gamma$: put $S(k, y)$ equal to the area bounded by the component $C_k(y)$ of the level set $C(y)$ corresponding to the point $z = (k, y) \in \Gamma \setminus \{O_1, \ldots, O_m\}$. If $z = (k, y)$ is a vertex corresponding to an extremum of $H(p, q)$ (an exterior vertex), then $S(z) = 0$. If $z = O_\ell$ is a vertex corresponding to a saddle point of $H(p, q)$ (an interior vertex), then $S(z)$ has a discontinuity at $O_\ell$. Each interior vertex is an end for three edges, say $I_{\ell_1}$, $I_{\ell_2}$, and $I_{\ell_3}$. It is easy to see that the limits $\lim_{z \to O_\ell,\ z \in I_{\ell_r}} S(z) = S_{\ell_r}(O_\ell)$, $r = 1, 2, 3$, exist. These limits $S_{\ell_r}(O_\ell)$, in general, are different for $r = 1, 2, 3$, and one of them is equal to the sum of the other two.

Theorem 7.1. Let $f(p) \in C^\infty$, $F(p) = \int_0^p f(s)\,ds$, $\lim_{|p|\to\infty} F(p) = \infty$. Assume that $f(p)$ has a finite number of simple zeros and the values of $F(p)$ at different
critical points are different. Then for any $T > 0$, the process $Z_t^\varepsilon = Y(\tilde p_t^\varepsilon, \tilde q_t^\varepsilon)$ on $\Gamma$ converges weakly in the space of continuous functions on $[0, T]$ with values in $\Gamma$ to a Markov diffusion process $Z_t$ as $\varepsilon \downarrow 0$. The limiting process $Z_t$ is governed by the operator
$$L_k = \frac{1}{2S_y'(k, y)}\,\frac{d}{dy}\Bigl(S(k, y)\,\frac{d}{dy}\Bigr)$$
inside the edge $I_k \subset \Gamma$, $k = 1, \ldots, n$, and by the gluing conditions at the vertices: a function $g(z)$, $z \in \Gamma$, bounded, continuous on $\Gamma$, and smooth inside the edges belongs to the domain of definition of the generator $A$ of the limiting process $Z_t$ if and only if $Ag$ is continuous on $\Gamma$, and at any interior vertex $O_\ell$ of $\Gamma$ (corresponding to a saddle point of $H(p, q)$) the following equality holds:
$$S_{\ell_1}(O_\ell)\,\frac{d_{\ell_1}g}{dy}(O_\ell) + S_{\ell_2}(O_\ell)\,\frac{d_{\ell_2}g}{dy}(O_\ell) = S_{\ell_3}(O_\ell)\,\frac{d_{\ell_3}g}{dy}(O_\ell).$$
Here $S_{\ell_k}(O_\ell) = \lim_{z\to O_\ell,\ z\in I_{\ell_k}} S(z)$, $k = 1, 2, 3$; we assume that either $H(Y^{-1}(z)) < H(Y^{-1}(O_\ell))$ or $H(Y^{-1}(z)) > H(Y^{-1}(O_\ell))$ simultaneously for all $z \in I_{\ell_1} \cup I_{\ell_2}$. The operators $L_k$, $k = 1, \ldots, n$, and the gluing conditions define the limiting process $Z_t$ in a unique way.
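The relation $S_{\ell_1}(O_\ell) + S_{\ell_2}(O_\ell) = S_{\ell_3}(O_\ell)$ between the one-sided limits at an interior vertex can be checked numerically. The double well $F(p) = p^4/4 - p^2/2$ below is a hypothetical example with a saddle at energy 0: the area inside the outer level loop just above the saddle is compared with the sum of the two well areas just below it.

```python
import numpy as np

# Hypothetical double well: wells at p = +/-1 (F = -1/4), saddle of
# H = q^2/2 + F(p) at (0, 0) with energy 0.
def F(p):
    return p**4 / 4 - p**2 / 2

def area(y, a, b, n=400000):
    # area enclosed by the level component over p in [a, b]:
    # 2 * integral of sqrt(2 (y - F(p))) dp, by the trapezoid rule
    p = np.linspace(a, b, n)
    g = 2.0 * np.sqrt(np.maximum(2.0 * (y - F(p)), 0.0))
    return np.sum((g[1:] + g[:-1]) / 2 * np.diff(p))

d = 1e-4   # small offset from the saddle energy

# Just above the saddle: a single loop; F(p) = d gives p^2 = 1 + sqrt(1 + 4d).
r_out = np.sqrt(1 + np.sqrt(1 + 4 * d))
S3 = area(d, -r_out, r_out)

# Just below the saddle: two loops; F(p) = -d gives p^2 = 1 -/+ sqrt(1 - 4d).
u_in, u_out = 1 - np.sqrt(1 - 4 * d), 1 + np.sqrt(1 - 4 * d)
S_right = area(-d, np.sqrt(u_in), np.sqrt(u_out))
S_left = area(-d, -np.sqrt(u_out), -np.sqrt(u_in))

print(S3, S_left + S_right)   # the two values nearly coincide near the saddle
```

Both quantities approach the separatrix area (here $8/3$) as the offset tends to zero, which is the additivity used in the gluing condition.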
The proof of this theorem is carried out, in general, using the same scheme as in the case of nondegenerate perturbations. The most essential difference arises in the proof of the Markov property (step 4 in the plan described in §8.1). We use here a priori bounds of Hörmander type instead of Krylov and Safonov [1]. The detailed proof of Theorem 7.1 can be found in Freidlin and Weber [1]. The degenerate perturbations of a general Hamiltonian system with one degree of freedom are considered in that paper as well.

4. Consider the diffusion process $(X_t, P_x)$ in $R^r$ corresponding to the operator
$$L = \frac12 \sum_{i,j=1}^r A^{ij}(x)\,\frac{\partial^2}{\partial x^i\,\partial x^j} + \sum_{i=1}^r B^i(x)\,\frac{\partial}{\partial x^i}\,.$$
We say that a function $H(x)$ is a first integral for the process $(X_t, P_x)$ if
$$P_x\{H(X_t) = H(x)\} = 1, \qquad x \in R^r.$$
If the function $H(x)$, $x \in R^r$, is smooth, then $H(x)$ is a first integral for the
process $X_t$ if and only if for any $x \in R^r$,
$$\sum_{i,j=1}^r A^{ij}(x)\,\frac{\partial H}{\partial x^i}(x)\,\frac{\partial H}{\partial x^j}(x) = 0, \qquad LH(x) = 0. \eqno(7.13)$$
This follows immediately from the Itô formula. Of course, only degenerate diffusion processes can have a nontrivial first integral. If all diffusion coefficients are equal to zero, the process turns into the dynamical system defined by the vector field $B(x) = (B^1(x), \ldots, B^r(x))$, and conditions (7.13) into the equality $B(x)\cdot\nabla H(x) = 0$, $x \in R^r$. Trajectories of the process governed by the operator $L$ can be described by the equation
$$dX_t = \sigma_0(X_t)\,dW_t + B(X_t)\,dt, \qquad X_0 = x \in R^r,$$
where $\sigma_0(x)\sigma_0^*(x) = (A^{ij}(x))$ and $W_t$ is the Wiener process. Consider now random perturbations of the process $X_t$:
$$dX_t^\varepsilon = \sigma_0(X_t^\varepsilon)\,dW_t + B(X_t^\varepsilon)\,dt + \varepsilon\,\sigma_1(X_t^\varepsilon)\,d\tilde W_t + \varepsilon^2\,b(X_t^\varepsilon)\,dt;$$
here $\sigma_1(x)\sigma_1^*(x) = (a^{ij}(x))$, $b(x) = (b^1(x), \ldots, b^r(x))$, and $\tilde W_t$ is a Wiener process independent of $W_t$. Let us rescale the time: $X_t^\varepsilon := X_{t/\varepsilon^2}^\varepsilon$. The generator of the new process $X_t^\varepsilon$ is
$$L^\varepsilon = \frac{1}{\varepsilon^2}\,L + L_1, \qquad L_1 = \frac12 \sum_{i,j=1}^r a^{ij}(x)\,\frac{\partial^2}{\partial x^i\,\partial x^j} + \sum_{i=1}^r b^i(x)\,\frac{\partial}{\partial x^i}\,.$$
Assume that the operator $L_1$ is uniformly elliptic with smooth bounded coefficients. Suppose the nonperturbed process $X_t$ has a smooth first integral $H(x)$. Assume that $H(x)$ is a generic function: it has a finite number of nondegenerate critical points, and each level set contains at most one critical point. Moreover, let $\lim_{|x|\to\infty} H(x) = \infty$, $\min_{x\in R^r} H(x) = 0$. Let $C(y) = \{x \in R^r : H(x) = y\}$, $y \ge 0$. The set $C(y)$ is compact and consists of a finite number $n(y)$ of connected components $C_k(y)$: $C(y) = \bigcup_{k=1}^{n(y)} C_k(y)$. The process $X_t$ is degenerate, since it has a nontrivial first integral. But we assume that $\sum_{i,j=1}^r A^{ij}(x)e_i e_j \ge a_k(y)|e|^2$ for any $e = (e_1, \ldots, e_r)$ with $e\cdot\nabla H(x) = 0$, $x \in C_k(y)$, with some $a_k(y) > 0$, if $C_k(y)$ contains no critical points. The last assumption means that the process $X_t$ is not degenerate if considered on a nonsingular manifold $C_k(y)$. The manifold $C_k(y)$ is compact. Thus, the process $X_t$ has on any nonsingular $C_k(y)$ a unique invariant density $M_k^y(x)$. If $C_k(y)$ contains a critical point $O$, we assume that the process $X_t$ considered on $C_k(y)$ has just one invariant probability measure, concentrated at the point $O$.
One can introduce fast and slow components of the perturbed process $(X_t^\varepsilon, P_x)$: the fast motion along the level sets of the first integral and the slow motion in the transversal direction. The slow motion can be described, at least in a neighborhood of a nonsingular level set $C_k(y)$, by the evolution of $H(X_t^\varepsilon)$. The fast component near the manifold $C_k(y)$ has a distribution close to the distribution with the density $M_k^y(x)$ if $0 < \varepsilon \ll 1$.
To describe the evolution of the slow component, let us apply the Itô formula to $H(X_t^\varepsilon)$:
$$H(X_t^\varepsilon) - H(x) = \frac{1}{\varepsilon}\int_0^t \nabla H(X_s^\varepsilon)\cdot\sigma_0(X_s^\varepsilon)\,dW_s + \frac{1}{\varepsilon^2}\int_0^t LH(X_s^\varepsilon)\,ds + \int_0^t \nabla H(X_s^\varepsilon)\cdot\sigma_1(X_s^\varepsilon)\,d\tilde W_s + \int_0^t L_1 H(X_s^\varepsilon)\,ds. \eqno(7.14)$$
Since $H(x)$ is a smooth first integral, condition (7.13) is fulfilled. Therefore, the first two integrals on the right-hand side of (7.14), which have large factors, vanish:
$$H(X_t^\varepsilon) - H(x) = \int_0^t \nabla H(X_s^\varepsilon)\cdot\sigma_1(X_s^\varepsilon)\,d\tilde W_s + \int_0^t L_1 H(X_s^\varepsilon)\,ds. \eqno(7.15)$$
Consider now the graph $\Gamma$ homeomorphic to the set of connected components of the level sets of the function $H(x)$. Let $\Gamma$ consist of the edges $I_1, \ldots, I_n$ and vertices $O_1, \ldots, O_m$. Let $Y: R^r \to \Gamma$ be the mapping defined earlier in this chapter in the case $r = 2$: $Y(x)$ is the point of $\Gamma$ corresponding to the connected component of $C(H(x))$ containing the point $x \in R^r$; $Y(x) = (k(x), H(x))$, where $(k, H)$ are the coordinates on $\Gamma$. The function $k(x)$, as well as $H(x)$, is a first integral for the process $X_t$. Consider the random processes $Y_t^\varepsilon = Y(X_t^\varepsilon)$ on $\Gamma$, $\varepsilon > 0$. These processes converge weakly to a diffusion process $Y_t$ on $\Gamma$ as $\varepsilon \downarrow 0$. To calculate the characteristics of the limiting process, consider first an interior point $(k, y)$ of the edge $I_k \subset \Gamma$, and let $C_k(y)$ be the corresponding level set component. Then, using (7.15) and the fact that the process $X_t$ on $C_k(y)$ is ergodic and $M_k^y(x)$ is its limiting density, one can prove that the process $Y_t$ inside $I_k$ is governed by the operator
$$L_k = \frac12\,a_k(y)\,\frac{d^2}{dy^2} + b_k(y)\,\frac{d}{dy}\,,$$
$$a_k(y) = \int_{C_k(y)} \sum_{i,j=1}^r a^{ij}(x)\,\frac{\partial H}{\partial x^i}(x)\,\frac{\partial H}{\partial x^j}(x)\,M_k^y(x)\,dx, \qquad b_k(y) = \int_{C_k(y)} L_1 H(x)\,M_k^y(x)\,dx.$$
To determine the limiting process $Y_t$ for all $t \ge 0$, the behavior of the process after touching the vertices should be described. One can prove that the exterior vertices are inaccessible. To calculate the gluing conditions at the interior vertices, one can sometimes use the same approach as we used before. Assume that the processes in $R^r$ corresponding to the operators $L$ and $L_1$ have the same invariant density $M(x)$. Then the density $M(x)$ is invariant for the process $(X_t^\varepsilon, P_x)$ for any $\varepsilon > 0$. Define the measure $\mu$ on $\Gamma$ as the projection of the measure $M(x)\,dx$: $\mu(\gamma) = \int_{Y^{-1}(\gamma)} M(x)\,dx$, where $\gamma$ is a Borel set in $\Gamma$. Then the measure $\mu$ will be invariant for the processes $Y_t^\varepsilon$, $\varepsilon > 0$, and for the limiting process $Y_t$. Now, if we know that $Y_t$ is a continuous Markov process on $\Gamma$, the gluing conditions at a vertex $O_k$ are described by a set of nonnegative constants $\alpha_k, \beta_{k1}, \ldots, \beta_{k\ell_k}$, where $\ell_k$ is the number of edges connected with $O_k$. These constants can be chosen in a unique way so that the process $Y_t$ has the prescribed invariant measure. For example, one can use such an approach if
$$L = \bar L + B(x)\cdot\nabla, \qquad L_1 = \bar L_1 + b(x)\cdot\nabla,$$
where $\bar L$ and $\bar L_1$ are self-adjoint operators and the vector fields $B(x)$ and $b(x)$ are divergence free. The Lebesgue measure is invariant for $L$ and $L_1$ in this case.
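The role of divergence-free drifts can be checked through the formal adjoint: for $L = \tfrac12\Delta + B\cdot\nabla$, the density $\rho \equiv 1$ is invariant iff $L^*\rho = \tfrac12\Delta\rho - \operatorname{div}(B\rho) = -\operatorname{div}B$ vanishes. A symbolic sketch with a hypothetical skew-gradient field:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
H = sp.exp(-(x1**2 + x2**2))          # hypothetical stream function

# Skew-gradient drift B = (-H_{x2}, H_{x1}); such fields are divergence free.
B = sp.Matrix([-sp.diff(H, x2), sp.diff(H, x1)])

divB = sp.simplify(sp.diff(B[0], x1) + sp.diff(B[1], x2))

# Formal adjoint of L = (1/2) Laplacian + B . grad applied to rho = 1:
rho = sp.Integer(1)
Lstar_rho = sp.simplify(
    sp.Rational(1, 2) * (sp.diff(rho, x1, 2) + sp.diff(rho, x2, 2))
    - sp.diff(B[0] * rho, x1) - sp.diff(B[1] * rho, x2))

print(divB, Lstar_rho)   # both 0: the Lebesgue measure is invariant
```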
Note that if the process $(X_t, P_x)$ degenerates into a dynamical system, then, to repeat our construction and arguments, we should assume that the dynamical system has a unique invariant measure on each connected component $C_k(y)$. This assumption, in the case when the dimension of the manifolds $C_k(y)$ is greater than 1, is rather restrictive. One can expect that the convergence to a diffusion process on the graph still holds under a less restrictive assumption, namely, that the dynamical system on $C_k(y)$ has a unique invariant measure stable with respect to small random perturbations.
5. The nonperturbed process $(X_t, P_x)$ (or the dynamical system) in $R^r$ governed by the operator $L$ may have several smooth first integrals $H_1(x), \ldots, H_\ell(x)$, $\ell < r$. Let
$$C(y) = \{x \in R^r : H_1(x) = y_1, \ldots, H_\ell(x) = y_\ell\}, \qquad y = (y_1, \ldots, y_\ell) \in R^\ell.$$
The set $C(y)$ may be empty for some $y \in R^\ell$. Let $C(y)$ consist of $n(y)$ connected components: $C(y) = \bigcup_{k=1}^{n(y)} C_k(y)$. Assume that at least one of the first integrals, say $H_1(x)$, tends to $+\infty$ as $|x| \to \infty$. Then each $C_k(y)$ is compact.
A point $x \in R^r$ is called nonsingular if the matrix $(\partial H_i(x)/\partial x^j)$, $1 \le i \le \ell$, $1 \le j \le r$, has the maximal rank $\ell$. Assume that the nonperturbed process at a nonsingular point $x \in R^r$ is nondegenerate if considered on the manifold $C(H_1(x), \ldots, H_\ell(x))$. This means that
$$\sum_{i,j} A^{ij}(x)e_i e_j \ge a(x)|e|^2, \qquad a(x) > 0,$$
for any $e = (e_1, \ldots, e_r)$ such that $e\cdot\nabla H_k(x) = 0$, $k = 1, \ldots, \ell$. Then the diffusion process $X_t$ on each $C_k(y)$ consisting of nonsingular points has a unique invariant measure. Let $M_k^y(x)$, $y \in R^\ell$, $x \in C_k(y)$, be the density of that measure. The collection of all connected components of the level sets $C(y)$, $y \in R^\ell$, provided with the natural topology, is homeomorphic to a set $\Gamma$ consisting of glued $\ell$-dimensional pieces. The interior points of these pieces correspond to the nonsingular components $C_k(y)$.
Define the mapping Y : R^r → Γ: Y(x), x ∈ R^r, is the point of Γ corresponding to the connected component of C(H_1(x), ..., H_ℓ(x)) containing x. One can expect that the stochastic processes Y_t^ε = Y(X_t^ε), where X_t^ε corresponds to L^ε = (1/ε²)L + L_1, converge weakly (in the space of continuous functions on [0, T] with values in Γ, provided with the uniform topology) to a diffusion process Y_t on Γ as ε → 0. Inside an ℓ-dimensional piece γ_k ⊂ Γ, the matrix (∂H_i(x)/∂x_j) has rank ℓ, and the values of the first integrals H_1(x), ..., H_ℓ(x) can be used as coordinates. The process Y_t in these coordinates is governed by an operator

L_k = \frac{1}{2} \sum_{i,j=1}^{\ell} a_k^{ij}(y) \frac{\partial^2}{\partial y_i \partial y_j} + \sum_{i=1}^{\ell} b_k^i(y) \frac{\partial}{\partial y_i}.
The coefficients a_k^{ij}(y) and b_k^i(y) can be calculated by the averaging procedure with respect to the density M_y^k(x):

a_k^{ij}(y) = \int_{C_k(y)} \sum_{m,n} A^{mn}(x) \frac{\partial H_i(x)}{\partial x_m} \frac{\partial H_j(x)}{\partial x_n} \, M_y^k(x) \, dx,

b_k^i(y) = \int_{C_k(y)} L_1 H_i(x) \, M_y^k(x) \, dx.
To determine the limiting process Y_t for all t ≥ 0, one should supplement the operators L_k, governing the process inside the ℓ-dimensional pieces, by gluing conditions at the places where several ℓ-dimensional pieces are glued together. These gluing conditions should be, in a sense, similar to the boundary conditions for the multidimensional process (see Wentzell [8]).
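For a concrete one-integral illustration, the averaging that produces the drift coefficient b_k(y) can be checked numerically. The harmonic-oscillator Hamiltonian, the damping perturbation L_1, and all parameter values below are illustrative assumptions, not taken from the text:

```python
import numpy as np

# Illustrative sketch (not from the text): r = 2, a single first integral
# H(x, p) = (x^2 + p^2)/2 (harmonic oscillator), and a perturbing generator
# L1 = -lam * p * d/dp (linear damping), so L1 H = -lam * p^2.  The averaged
# drift on the level set {H = y} is the average of L1 H over the invariant
# measure of the orbit.  Here the orbit is x = sqrt(2y) cos t,
# p = sqrt(2y) sin t, the average of p^2 over one period is y, and the
# averaged drift is b(y) = -lam * y.

def averaged_drift(y, lam, n=10_000):
    """Time-average of L1 H = -lam * p(t)^2 over one period of the orbit."""
    t = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
    p = np.sqrt(2 * y) * np.sin(t)
    return np.mean(-lam * p**2)

y, lam = 1.5, 0.3
b_num = averaged_drift(y, lam)
b_exact = -lam * y
print(b_num, b_exact)  # the two values agree closely
```

The time-average along the orbit equals the average with respect to the invariant density because the invariant measure of the one-dimensional flow on the level curve is the arc-time measure.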
§7. Remarks and Generalizations
Figure 29.
To conclude this section, we would like to mention one more asymptotic problem leading to a process on a graph.
6. Consider the Wiener process X_t^ε in the domain G^ε ⊂ R^r shown in Fig. 29(a), with normal reflection on the boundary. The domain G^ε consists of a ball of radius ρ^ε and of three cylinders of radii εr_1, εr_2, εr_3, respectively. Let the axes of these cylinders intersect at the center O of the ball. It is natural to expect that the projection of the process X_t^ε on the "skeleton" Γ of the domain G^ε, shown in Fig. 29(b), converges to a stochastic process on Γ as ε ↓ 0. One can check that, inside each edge of the graph, the limiting process will be the one-dimensional Wiener process. Assuming that near the points A, B, C the boundary of G^ε consists of a piece of the plane orthogonal to the axis of the corresponding cylinder, it is easy to see that the limiting process has instantaneous reflection at the points A, B, C ∈ Γ. To determine the limiting process for all t ≥ 0, we should describe its behavior at the vertex O. The gluing condition at O depends on the relation between ρ^ε and ε. If ε max_k r_k ≤ ρ^ε ≪ ε^{(r-1)/r}, the gluing condition has the form

\sum_{i=1}^{3} r_i^{\,r-1} \frac{du}{dy_i}(0) = 0,   (7.16)
where d/dyi means the differentiation along the ith edge and yi is the distance from 0. Condition (7.16) means that the limiting process spends time
zero at the vertex. If ρ^ε ≫ ε^{(r-1)/r}, then the point O is a trap for the process: after touching O, the trajectory stays there forever. The gluing condition in this case is u''(0) = 0, independently of the edge along which this derivative is calculated. If lim_{ε↓0} ρ^ε ε^{-(r-1)/r} = κ, 0 < κ < ∞, then the domain of the generator of the limiting process consists of continuous functions u(x), x ∈ Γ, for which u''(x) is also continuous and
\sum_{i=1}^{3} r_i^{\,r-1} \frac{du}{dy_i}(0) = \frac{\kappa^{\,r-1} \pi^{(r-1)/2}}{\Gamma\left(\frac{r+1}{2}\right)} \, u''(0).
The limiting process in this case spends a positive time at O. This problem and some other problems leading to processes on graphs were considered in Freidlin and Wentzell [4] and in Freidlin [21].
Chapter 9
Stability Under Random Perturbations
§1. Formulation of the Problem

In the theory of ordinary differential equations much work is devoted to the
study of stability of solutions with respect to small perturbations of the initial conditions or of the right side of an equation. In this chapter we consider some problems concerning stability under random perturbations. First we recall the basic notions of classical stability theory. Let the dynamical system

\dot{x}_t = b(x_t)   (1.1)

in R^r have an equilibrium position at the point 0: b(0) = 0. The equilibrium position 0 is said to be stable (Lyapunov stable) if for every neighborhood U_1 of 0 there exists a neighborhood U_2 of 0 such that the solutions of equation (1.1) with initial condition x_0 = x ∈ U_2 do not leave U_1 for positive t. If, in addition, lim_{t→∞} x_t = 0 for trajectories issued from points x_0 = x sufficiently close to 0, then the equilibrium position 0 is said to be asymptotically stable. With stability with respect to perturbations of the initial conditions there is closely connected the problem of stability under continuously acting perturbations. To clarify the meaning of this problem, along with equation (1.1), we consider the equation

\dot{X}_t = b(X_t) + \zeta_t,   (1.2)
where ζ_t is a bounded continuous function on the half-line [0, ∞) with values in R^r. The problem of stability under continuously acting perturbations can be formulated in the following way: under what conditions on the field b(x) does the solution of problem (1.2) with initial condition X_0 = x converge uniformly on [0, ∞) to the constant solution X_t ≡ 0 as |x − 0| + sup_{0≤t<∞} |ζ_t| → 0? It can be proved that if the equilibrium position is stable, in a sufficiently strong sense, with respect to small perturbations of the initial conditions, then it is also stable under continuously acting perturbations.
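This classical statement can be checked numerically on the simplest asymptotically stable field b(x) = −x; the example, the perturbation, and all constants below are illustrative assumptions, not from the text:

```python
import numpy as np

# Hypothetical illustration: b(x) = -x has an asymptotically stable
# equilibrium at 0.  With a continuously acting bounded perturbation
# zeta_t, |zeta_t| <= delta, the solution of X' = -X + zeta_t satisfies
# |X_t| <= |x0| e^{-t} + delta (1 - e^{-t}), so the trajectory never
# leaves the max(|x0|, delta)-neighborhood of 0.

def solve(x0, zeta, T=20.0, dt=1e-3):
    """Euler scheme for X' = -X + zeta(t); returns final value and sup |X_t|."""
    x = x0
    sup = abs(x0)
    for k in range(int(T / dt)):
        x += dt * (-x + zeta(k * dt))
        sup = max(sup, abs(x))
    return x, sup

delta = 0.05
xT, sup = solve(x0=0.1, zeta=lambda t: delta * np.cos(3 * t))
print(xT, sup)  # the whole trajectory stays within max(|x0|, delta)
```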
For example, if the equilibrium position is asymptotically stable, i.e.,

\lim_{t \to \infty} x_t = 0

uniformly in x_0 = x belonging to some neighborhood of 0, then 0 is also stable with respect to small continuously acting perturbations (cf. Malkin [1]).
The situation changes essentially if we omit the condition of boundedness of ζ_t on [0, ∞). In this case the solutions of equation (1.2) may, in general, leave any neighborhood of the equilibrium position even if the equilibrium position is stable with respect to perturbations of the initial conditions in the strongest sense. Actually, the very notion of smallness of continuously acting perturbations needs to be adjusted in this case. Now let ζ_t in (1.2) be a random process: ζ_t = ζ_t(ω). If the perturbations ζ_t(ω) are uniformly small in probability, i.e., sup_{0≤t<∞} |ζ_t(ω)| → 0 in probability, then the situation is not different from the deterministic case: if the point 0 is stable for system (1.1) in a sufficiently strong sense, then as

\sup_{0 \le t < \infty} |\zeta_t(\omega)|

converges to zero in probability and the initial condition x_0 = x converges to 0, we have P{sup_{0≤t<∞} |X_t − 0| ≥ δ} → 0 for any δ > 0. Nevertheless, the assumption that ζ_t converges to zero uniformly on the whole half-line [0, ∞) is too stringent in a majority of problems. An assumption of the following kind is more natural: sup_t M|ζ_t|² or some other characteristic of the process ζ_t converges to zero; this makes large values of |ζ_t| unlikely at every fixed time t but allows the functions |ζ_t| to assume large values at some moments of time depending on ω. Under assumptions of this kind, the trajectories of the process X_t may, in general, leave any neighborhood of the equilibrium position sooner or later. For example, if ζ_t is a stationary Gaussian process, then the trajectories of X_t have arbitrarily large deviations from the equilibrium position with probability one, regardless of how small M|ζ_t|² is.
To clarify the character of the problems involving stability under random perturbations to be discussed in this chapter, we consider an object whose state can be described by a point x ∈ R^r. We assume that in the absence of random perturbations, the evolution of this object can be described by equation (1.1), and random perturbations ζ_t(ω) lead to an evolution which can be described by the equation

\dot{X}_t = b(X_t, \zeta_t).   (1.3)

As ζ_t, we also allow some generalized random processes for which equation (1.3) is meaningful (there exists a unique solution of equation (1.3) for any
initial point X_0 = x ∈ R^r). For example, we shall consider the case where b(x, y) = b(x) + σ(x)y and ζ_t is a white noise process or the derivative of a Poisson process.
We assume that there exists a domain D ⊂ R^r such that as long as the phase point characterizing the state of our object belongs to D, the object does not undergo essential changes, and when the phase point leaves D, the object gets destroyed. Such a domain D will be called a critical domain. Let X_t^x be the solution of equation (1.3) with initial condition X_0^x = x. We introduce the random variable τ^x = min{t : X_t^x ∉ D}, the time elapsed until the destruction of the object. As the measure of stability of the system with respect to the random perturbations ζ_t relative to the domain D, it is natural to choose various theoretical probability characteristics of τ^x.¹ Hence if we are interested in our object only over a time interval [0, T], then as the measure of instability of the system we may choose P{τ^x < T}. If the time interval is not given beforehand, then stability can be characterized by Mτ^x. In cases where the sojourn of the phase point outside D does not lead to destruction of the object but is only undesirable, as the measure of stability we may choose the value of the invariant measure of the process for the complement of D (if there exists an invariant measure). This measure will characterize the proportion of time spent by the trajectory X_t^x outside D. However, a precise calculation of these theoretical probability characteristics is possible only rarely, mainly when X_t^x turns out to be a Markov process or a component of a Markov process in a higher dimensional space. As is known, in the Markov case the probabilities and mean values under
consideration are solutions of some boundary value problems for appropriate equations. Even in those cases where equations and boundary conditions can be written down, it is not at all simple to obtain useful information from the equations. So it is of interest to determine various kinds of asymptotics of the mentioned theoretical probability characteristics as one parameter or another, occurring in the equation, converges to zero. In a wide class of problems we may assume that the intensity of the noise is small in some sense compared with the deterministic factors determining the evolution of the system. Consequently, a small parameter appears in the
problem. If 0 is an asymptotically stable equilibrium position of the unperturbed system, then exit from a domain D takes place due to small random
perturbations, in spite of the deterministic constituents. As we know, in such situations estimates of P{τ^x < T} and Mτ^x are given by means of an
action functional. Although in this way we can only calculate the rough, logarithmic, asymptotics, this is usually sufficient in many problems. For example, the logarithmic asymptotics enables us to compare various critical
¹ We note that τ^x may turn out to be equal to ∞ with positive probability. If the random perturbations vanish at the equilibrium position 0 itself, then P{τ^x < ∞} may converge to 0 as x → 0. For differential equations with perturbations vanishing as the equilibrium position is approached, a stability theory close to the classical theory can be created (cf. Khas'minskii [1]).
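The two measures of stability just introduced, P{τ^x < T} and Mτ^x, can be estimated by straightforward Monte Carlo; the Ornstein–Uhlenbeck-type system, the domain D = (−1, 1), and the noise level below are illustrative assumptions, not from the text:

```python
import numpy as np

# Rough Monte Carlo sketch (illustrative, not the book's computation) of
# the two measures of stability for the perturbed system
#   dX_t = -X_t dt + eps dW_t,  critical domain D = (-1, 1):
# P{tau < T} and M tau are estimated from simulated trajectories.

rng = np.random.default_rng(0)

def exit_time(eps, x0=0.0, dt=1e-2, t_max=500.0):
    """First time |X_t| >= 1 along one Euler-Maruyama trajectory."""
    x, t = x0, 0.0
    while t < t_max:
        x += -x * dt + eps * np.sqrt(dt) * rng.standard_normal()
        t += dt
        if abs(x) >= 1.0:
            return t
    return t_max  # censored: no exit observed before t_max

taus = np.array([exit_time(eps=0.7) for _ in range(200)])
T = 5.0
p_exit = np.mean(taus < T)   # estimate of P{tau < T}
mean_tau = taus.mean()       # (censored) estimate of M tau
print(p_exit, mean_tau)
```

For small eps the mean exit time grows like exp{V(0, ±1)/eps²}, so direct simulation becomes impractical and the large-deviation asymptotics discussed below take over.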
domains and various vector fields b(x, y), and also enables us to solve some problems of optimal stabilization. This approach is developed in the article by Wentzell and Freidlin [5]. We introduce several different formulations of stability problems and outline methods of their solution. Let X_t^h, t ≥ 0, be a family of random processes in R^r obtained as a result of small random perturbations of system (1.1); the probabilities and mathematical expectations corresponding to a given value of the parameter h and a given initial point x will be denoted by P_x^h, M_x^h, respectively. Let λ(h)S_{0T}(φ) be the action functional for the family of processes X_t^h with respect to the metric ρ_{0T}(φ, ψ) = sup_{0≤t≤T} |φ_t − ψ_t| as h ↓ 0. It is clear that S_{0T} vanishes for trajectories of system (1.1) and only for them. Let 0 be a stable equilibrium position of system (1.1), let D be a domain
containing 0, and let τ_D = inf{t : X_t^h ∉ D} be the time of first exit of the process from D. We are interested in the stability of the system on a finite time interval [0, T]. We shall characterize stability by the asymptotics of P_x^h{τ_D < T} as h ↓ 0. It is appropriate to introduce the following measure of stability of our system with respect to the given random perturbations and domain D:

V_{D,x}^T = \inf\Big\{ S_{0T}(\varphi) : \varphi \in \bigcup_{0 \le t \le T} \bigcup_{y \notin D} H_{xy}(t) \Big\},

where the set H_{xy}(t) consists of all functions φ_s defined for s ∈ [0, T] such that φ_0 = x and φ_t = y. The sense of this measure of stability is the following: if the infima of S_{0T} over the closure and interior of \bigcup_{0≤t≤T} \bigcup_{y∉D} H_{xy}(t) coincide, then P_x^h{τ_D < T} is logarithmically equivalent to exp{−λ(h)V_{D,x}^T}.
We note that for the coincidence of the infima over the closure and the interior it is sufficient (in the case of a functional S_{0T} of the form considered in Chs. 4-5) that D coincide with the interior of its closure. The measure of stability V_{D,x}^T is given by inf_{0<t≤T, y∉D} u(t, x, y), where u(t, x, y) = inf{S_{0t}(φ) : φ ∈ H_{xy}(t)} can be determined from the Jacobi equation. The problem in which there is no fixed time interval [0, T] of observation is characterized by another measure of stability. We define µ_D as the infimum of the values of the functional S_{0T} over the functions φ_t, defined on intervals [0, T] of any length, such that φ_0 = 0, φ_T ∉ D. We may calculate µ_D as the infimum, over y ∉ D, of the quasipotential

V(0, y) = \inf\{ S_{0T}(\varphi) : \varphi_0 = 0,\ \varphi_T = y;\ 0 < T < \infty \},
which can be calculated (Theorem 4.3 of Ch. 5) as the solution of an appropriate problem for a partial differential equation of the first order. According to results of Chs. 4-7, under suitable assumptions on the processes X_t^h and the domain D, the mean exit time M_x^h τ_D from D is logarithmically equivalent to exp{λ(h)µ_D} for all points x ∈ D for which the trajectory of system (1.1) issued from x converges to the equilibrium position 0 without leaving D (Theorems 4.1 of Ch. 4, 5.3 of Ch. 6, and 6.1 of Ch. 7). In this case the mathematical expectation represents a typical value of the exit time to within logarithmic equivalence. Namely, for any γ > 0 we have

\lim_{h \downarrow 0} P_x^h\{\exp\{\lambda(h)[\mu_D - \gamma]\} < \tau_D < \exp\{\lambda(h)[\mu_D + \gamma]\}\} = 1

(Theorem 4.2 of Ch. 4). Further, the value of the normalized invariant measure of X_t^h for the set R^r \ D is logarithmically equivalent to exp{−λ(h)µ_D} (Theorems 4.3 of Ch. 4 and 4.2 of Ch. 5). Consequently, if the time interval is not fixed beforehand, then the constant µ_D is, in some sense, a universal measure of stability for perturbations and critical domains of the kinds being considered. If the critical domain is not given, then such a universal characteristic of stability of the equilibrium position is the quasipotential V(0, y) of the random perturbations.
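For white-noise perturbations of a gradient system, the quasipotential entering µ_D can be written in closed form. The following worked derivation is a standard example consistent with the definitions above; the potential U is an illustrative assumption, not taken from this page:

```latex
% Worked example (standard, illustrative): for
%   \dot X_t = -\nabla U(X_t) + \varepsilon \dot w_t, \qquad \lambda(h) = \varepsilon^{-2},
% the action functional is
%   S_{0T}(\varphi) = \tfrac12 \int_0^T |\dot\varphi_t + \nabla U(\varphi_t)|^2\,dt.
% Using |\dot\varphi + \nabla U|^2 = |\dot\varphi - \nabla U|^2 + 4(\dot\varphi, \nabla U):
S_{0T}(\varphi) = \frac12 \int_0^T \bigl|\dot\varphi_t - \nabla U(\varphi_t)\bigr|^2\,dt
  + 2\bigl[U(\varphi_T) - U(\varphi_0)\bigr] \;\ge\; 2\bigl[U(y) - U(0)\bigr],
% with equality along the time-reversed flow \dot\varphi = +\nabla U(\varphi).  Hence
V(0, y) = 2\,[\,U(y) - U(0)\,],
\qquad
\mu_D = \inf_{y \notin D} V(0, y) = 2 \inf_{y \in \partial D} [\,U(y) - U(0)\,].
```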
The "most dangerous point" on the boundary of the critical domain can be expressed in terms of V(O, y): under certain assumptions, the "destruction" of the object takes place, with overwhelming probability for small h, near the points y E OD where V(O, y) attains its infimum over OD.
Now we consider the problem of selecting an optimal critical domain. We assume that for domains D containing the equilibrium position 0 of the unperturbed system, a monotone functional H(D) is defined; for the sake of definiteness, we assume that this functional has the form ∫_D h(x) dx, where h(x) is a positive function. From the domains D with a given value H_0 of H(D) we try to select one with the smallest probability of exit from the domain
over a given time T or with the largest mathematical expectation of the exit time. The optimal critical domain depends on h in general. We shall seek an asymptotic solution of the problem, i.e., we shall try to construct a domain which is better than any other domain (independent of h) for sufficiently small h.
It is clear that the problem is reduced to maximization of the corresponding measure of stability, V_{D,x}^T or µ_D. It is easy to see that it has to be solved in the following way: we choose domains D_c of the form

\{ y : \inf_{0 < t \le T} u(t, x, y) < c \}
(or {y : V(0, y) < c}, respectively); from the increasing family of these domains we choose the one for which H(D_c) = H_0. If the function inf_{0<t≤T} u(t, x, y) (or V(0, y)) is smooth in y, then the good properties of D_c are guaranteed. Any domain whose boundary is not a level surface of the function inf_{0<t≤T} u(t, x, y) (V(0, y), respectively) can be made smaller with preservation of the value of the corresponding measure of stability inf_{y∉D} inf_{0<t≤T} u(t, x, y) (inf_{y∉D} V(0, y), respectively).
For example, for the system \dot{x}_t = Ax_t with a normal matrix A and perturbations of "white noise" type (i.e., \dot{X}_t^ε = AX_t^ε + ε\dot{w}_t), the optimal critical domain for the problem without a fixed time interval is an ellipsoid (cf. Example 3.2, Ch. 4).
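The linear example can be made concrete with a small numeric sketch. The symmetric matrix A below and the identification of the quasipotential with a quadratic form are illustrative assumptions (for symmetric A the drift is a gradient field, a special case of a normal matrix):

```python
import numpy as np

# Illustrative sketch (assumptions, not the book's computation): for
#   dX_t = A X_t dt + eps dW_t  with A symmetric and stable,
# the drift A x = -grad U(x) is a gradient field with U(x) = -x.A x / 2,
# so the quasipotential is V(0, y) = 2 U(y) = -y.A y, and its level sets
# {y : V(0, y) = c} are ellipsoids.  Cross-check against the stationary
# covariance Q of the unit-noise system, which solves the Lyapunov
# equation A Q + Q A^T = -I; for symmetric A, Q = -A^{-1}/2 and
# V(0, y) = y . Q^{-1} y / 2.

A = np.array([[-2.0, 0.5],
              [0.5, -1.0]])          # symmetric, both eigenvalues < 0

Q = -0.5 * np.linalg.inv(A)          # solves A Q + Q A^T = -I for symmetric A
assert np.allclose(A @ Q + Q @ A.T, -np.eye(2))

def V(y):
    """Quasipotential V(0, y) = -y.A y (= y . Q^{-1} y / 2 here)."""
    return -y @ A @ y

y = np.array([0.3, -0.7])
print(V(y), 0.5 * y @ np.linalg.inv(Q) @ y)  # the two expressions agree
```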
We pass to problems of optimal stabilization.
We assume that the perturbed equation (1.3) contains a parameter (or several parameters) which can be controlled. A choice of the way of control of the process consists of a choice of the form of dependence of the controlling parameter on the values of the controlled random process. We introduce the following restriction on the character of this dependence. To every form α of dependence of the controlling parameter on the values of the process let there correspond a family of random processes X_t^{α,h}. For all of them we assume that there exists an action functional λ(h)S^α(φ) in which the normalizing coefficient λ(h) does not depend on the choice of control. The solution of certain problems of optimal control of a process for small h is connected with the functional S^α(φ) = S^α_{0T}(φ) and the quasipotential V^α(x, y) = inf{S^α_{0T}(φ) : φ_0 = x, φ_T = y; 0 < T < ∞}. In particular, it is plausible that the problem of the choice of control maximizing the mean exit time from a domain D (asymptotically as h ↓ 0) is connected with the function

V(x, y) = \sup_\alpha V^\alpha(x, y),   (1.4)
where the supremum is taken over all admissible ways of control. We restrict ourselves to controlled Markov processes homogeneous in time. As admissible controls we shall consider those in which the value of the controlling parameter at a given time t is a definite function α(X_t) of the value of the process at the same moment of time. Such controls lead to Markov processes homogeneous in time (cf. Krylov [2]); at every point x their local characteristics depend on x and on the value of the controlling parameter α(x) at x.
Hence let the class of admissible controls consist of functions α(x) whose values at every point x belong to the set Π(x) of admissible controls at x (and which are subject to some regularity conditions). To every admissible function α there corresponds a Markov process (X_t^{α,h}, P_x^{α,h}) (the corresponding mathematical expectation, normalized action functional, and quasipotential will be denoted by M_x^{α,h}, S^α_{0T}(φ), and V^α(x, y)). We shall consider the following problem: we seek a function α which maximizes

\lim_{h \downarrow 0} \lambda(h)^{-1} \ln M_x^{\alpha,h} \tau_D.

As we have already mentioned, the solution of this problem is most likely connected with the function V(x, y) defined by formula (1.4). A series of questions arises here: How can the maximal quasipotential V(x, y) be determined? How can the optimal control problem be solved by means of it? In the following sections we consider these questions (for a certain class of controlled Markov processes) with examples.
§2. The Problem of Optimal Stabilization

The class of families of random processes for which we shall consider the problem is the family of locally infinitely divisible processes considered in Ch. 5. We assume that at every x ∈ R^r we are given a set Π(x) ⊆ R^l. To every pair x ∈ R^r, α ∈ Π(x) let there correspond: a vector b(x, α) = (b^1(x, α), ..., b^r(x, α)), a symmetric nonnegative definite matrix (a^{ij}(x, α)) of order r, and a measure µ(x, α, ·) on R^r \ {0} such that ∫ |β|² µ(x, α, dβ) < ∞. We define the process corresponding to the value h of the parameter (from (0, ∞)) and the choice α(x) of the control function (α(x) belongs to Π(x) for every x) to be the Markov process with infinitesimal
generator defined for twice continuously differentiable functions with compact support by the formula

A^{\alpha,h} f(x) = \sum_i b^i(x, \alpha(x)) \frac{\partial f(x)}{\partial x^i} + \frac{h}{2} \sum_{i,j} a^{ij}(x, \alpha(x)) \frac{\partial^2 f(x)}{\partial x^i \partial x^j}
+ h^{-1} \int_{R^r \setminus \{0\}} \Big[ f(x + h\beta) - f(x) - h \sum_i \frac{\partial f(x)}{\partial x^i} \beta^i \Big] \mu(x, \alpha(x), d\beta).   (2.1)
Of course, it may turn out that no Markov process corresponds to a given function α(x) for some h > 0; in this case we shall consider the control function inadmissible. We denote by 𝔄 the class of admissible controls.
A large number of publications is devoted to the problem of finding conditions guaranteeing that to a given set of local characteristics there corresponds a locally infinitely divisible process; this problem has been studied especially extensively in the case of diffusion processes, i.e., where
µ ≡ 0. For example, it is sufficient that the diffusion matrix be uniformly nondegenerate and continuous and the drift vector be bounded and measurable (cf. Krylov [1], Stroock and Varadhan [1]). The case of a µ different from zero has been considered in Komatsu [1], Stroock [1], Lepeltier and Marchal [1], etc.
For a varying choice of α(x), the probabilities of unlikely events for (X_t^{α,h}, P_x^{α,h}) are connected with the function of three variables

H(x, \alpha, a) = \sum_i b^i(x, \alpha) a_i + \frac{1}{2} \sum_{i,j} a^{ij}(x, \alpha) a_i a_j
+ \int_{R^r \setminus \{0\}} \Big[ \exp\Big\{ \sum_i \beta^i a_i \Big\} - 1 - \sum_i \beta^i a_i \Big] \mu(x, \alpha, d\beta),   (2.2)
which, together with its derivatives with respect to the a_i, we shall assume to be finite and continuous in all arguments. Under the hypotheses of Theorem 2.1 of Ch. 5, the action functional for the family of processes X_t^{α,h} as h ↓ 0 is given by the formula

h^{-1} S^\alpha_{0T}(\varphi) = h^{-1} \int_0^T L(\varphi_t, \alpha(\varphi_t), \dot{\varphi}_t)\, dt,   (2.3)

where L(x, α, β) is the Legendre transform of H(x, α, a) in the third argument. In Ch. 5 we required that the functions H(x, a), L(x, β) be continuous in x.
Therefore, it would be desirable to consider only continuous controlling functions α(x). However, it is known that in problems of optimal control we can only rarely get along with continuous controls. It turns out that the necessary results can be carried over to the case where the controls are discontinuous at one point (or a finite number of points), but the results cannot be preserved in general if discontinuity occurs on a curve (or on a surface). Therefore, we introduce the following class of functions. Let α(x) be defined on R^r with values in R^l. We shall write α ∈ Π if α is continuous for all x ∈ R^r except, maybe, one point, and for every x the value α(x) belongs to the set Π(x) of admissible controls at x. Theorems 2.1-2.3 for diffusion processes are contained in Wentzell and Freidlin [5].
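The Legendre transform relating H and L in (2.3) can be approximated on a grid; the two Hamiltonians below (Gaussian and Poisson-type) are standard illustrations, not the book's computation:

```python
import numpy as np

# Toy numeric Legendre transform (illustration only):
#   L(beta) = sup_a [ a*beta - H(a) ],
# computed by maximizing over a dense grid of a-values.

def legendre(H, beta, a_grid):
    return np.max(a_grid * beta - H(a_grid))

a = np.linspace(-10, 10, 200_001)

# Gaussian case: H(a) = a^2/2  =>  L(beta) = beta^2/2.
H_gauss = lambda a: a**2 / 2
print(legendre(H_gauss, 1.5, a))   # ~ 1.125

# Poisson-type case: H(a) = e^a - 1 - a
#   =>  L(beta) = (1 + beta) ln(1 + beta) - beta  for beta > -1.
H_pois = lambda a: np.exp(a) - 1 - a
print(legendre(H_pois, 0.7, a))    # ~ 1.7 * ln(1.7) - 0.7
```

The Gaussian H reproduces the familiar quadratic action of white-noise perturbations, while the Poisson-type H gives the entropy-like rate function typical of jump processes.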
Theorem 2.1. Let τ_D be the time of exit of the process X_t^{α,h} from a bounded domain D whose boundary coincides with the boundary of its closure. For any control function α ∈ Π ∩ 𝔄 we have

\lim_{h \downarrow 0} h \ln M_x^{\alpha,h} \tau_D \le V_0, \qquad x \in D,   (2.4)

where

V_0 = \max_{x_0 \in D} \min_{y \in \partial D} \sup_{\alpha \in \Pi} V^\alpha(x_0, y).
We note that V^α is in turn the infimum of the values of S^α(φ) over functions leading from x_0 to y, so that the right side of (2.4) contains a quite complicated combination of maxima and minima: max min sup inf. The proof of the theorem can be carried out in the following way. For any γ > 0 we choose T > 0 such that for any control function α ∈ Π and any x_0 ∈ D there exists a function φ_t, 0 ≤ t ≤ T, such that φ_0 = x_0, φ_t leaves D for some t ∈ [0, T], and S^α_{0T}(φ) < V_0 + γ. Then we prove the lemma below.
Lemma 2.1. For any α ∈ Π ∩ 𝔄 we have

P_x^{\alpha,h}\{\tau_D < T + 1\} \ge \exp\{-h^{-1}(V_0 + 2\gamma)\}   (2.5)

for sufficiently small h and all x ∈ D.
The proof can be carried out as that of part (a) of Theorem 4.1 of Ch. 4: the function φ_t is extended beyond D; we use the lower estimate, given by Theorem 2.1 of Ch. 5, of the probability of passing through a tube. The only difference is that the hypotheses of Theorem 2.1 of Ch. 5 are not satisfied in the neighborhood of the point x* where α(x) is discontinuous. Nevertheless, the corresponding segment of the function φ_t can be replaced by a straight line segment, and the lower estimate gives the following lemma.
Lemma 2.2. Let X_t^{α,h} be a family of locally infinitely divisible processes with infinitesimal generator of the form (2.1), and let the corresponding function H(x, α(x), a), along with its first and second derivatives with respect to a, be bounded in any finite domain of variation of a (but it may be arbitrarily discontinuous as a function of x). Then for any γ > 0 and any δ > 0 there exists ρ_0 > 0 such that

P_x^{\alpha,h}\{\rho_{0 t_0}(X^{\alpha,h}, \varphi) < \delta\} \ge \exp\{-\gamma h^{-1}\}   (2.6)

for sufficiently small h > 0 and for all x, y such that |y − x| ≤ ρ_0, where φ_t = x + t[(y − x)/|y − x|], 0 ≤ t ≤ t_0 = |y − x|.
The proof copies the corresponding part of the proof of Theorem 2.1 of Ch. 5, but without using the continuity of H or L. After the proof of (2.5), the proof of Theorem 2.1 can be completed by applying the Markov property:

P_x^{\alpha,h}\{\tau_D > n(T+1)\} \le [1 - \exp\{-h^{-1}(V_0 + 2\gamma)\}]^n;

M_x^{\alpha,h} \tau_D \le (T+1) \sum_{n=0}^{\infty} P_x^{\alpha,h}\{\tau_D > n(T+1)\} \le (T+1) \exp\{h^{-1}(V_0 + 2\gamma)\}.
370
9. Stability Under Random Perturbations
Now we put

H(x, a) = \inf_{\alpha \in \Pi(x)} H(x, \alpha, a).
Theorem 2.2. Let V(x) be the solution of problem R_{x_0} for the equation

H(x, \nabla V(x)) = 0   (2.7)

in a domain D. For any function α belonging to Π (i.e., α is continuous everywhere except one point and α(x) ∈ Π(x) for every x), for the quasipotential we have

V^\alpha(x_0, x) \le V(x)

for all x in the set

B = \{ x \in D \cup \partial D : V(x) < \inf_{y \in \partial D} V(y) \}.

Moreover, suppose that there exists a function \bar{\alpha}(x, a), continuous on the set {(x, a) : x ≠ x_0, a ≠ 0, H(x, a) = 0}, such that \bar{\alpha}(x, a) ∈ Π(x) and H(x, \bar{\alpha}(x, a), a) = H(x, a) = 0. Then

\sup_{\alpha \in \Pi} V^\alpha(x_0, x) = V(x)

for all x ∈ B, and the supremum is attained for the function α(x) = \bar{\alpha}(x, ∇V(x)).
Proof. Suppose that for some α ∈ Π, x ∈ B, and ε > 0 we have

V^\alpha(x_0, x) > \varepsilon + (1 + \varepsilon) V(x).

We may assume that ε ≠ ε* = [V^α(x_0, x*) − V(x*)]/[1 + V(x*)], where x* is the point where α(x) is discontinuous. We consider the set A = B ∩ {x : V^α(x_0, x) > ε + (1 + ε)V(x)}; this set is open in B. We put V_0 = inf{V(x) : x ∈ A}. The infimum V_0 is not attained; let x_∞ be a limit point of A on the level surface {x : V(x) = V_0}, and let x_1, ..., x_n, ... be a sequence of points of A converging to x_∞ (Fig. 30). It is clear that the points of the surface {x : V(x) = V_0} do not belong to A and V(x_n) → V_0. We note that by virtue of the choice of ε ≠ ε*, the point x_∞ does not coincide with the point x* of discontinuity of α.
Figure 30.
We consider the vectors ∇V(x_n) and β_∞ = ∇_a H(x_∞, α(x_∞), ∇V(x_∞)). By virtue of the properties of the Legendre transformation,

(\nabla V(x_\infty), \beta_\infty) = L(x_\infty, \alpha(x_\infty), \beta_\infty) + H(x_\infty, \alpha(x_\infty), \nabla V(x_\infty)).   (2.8)

The function L(x_∞, α(x_∞), β) is nonnegative everywhere and vanishes only at β = ∇_a H(x_∞, α(x_∞), 0). On the other hand, ∇V(x_∞) ≠ 0, since V is the solution of problem R_{x_0}. Therefore, the first term in (2.8) is positive. The second term is not less than H(x_∞, ∇V(x_∞)) = 0. Therefore, the scalar product (∇V(x_∞), β_∞) is positive, i.e., the vector β_∞ is directed outside the surface {x : V(x) = V_0} at x_∞. By virtue of the continuity of ∇V(x), the situation is the same at points close to x_∞. For every point x_n we define the function
rp,"'=x"+t/3,,,
t<0.
For x" sufficiently close to xm (i.e., for sufficiently large n), the straight line
gyp;"' intersects the surface {x: V(x) = Vol for a small negative value t", where I to l
V(xn) - Vo (V V(xx), fl.)
as n - oo. The denominator here is not less than L(x,,, a(x,,), Itnl
so that
V(xn)
- V0 (1 +00)) L(x., a(x.), P.)
as n → ∞. We estimate the value of the functional S^α of the function φ_t^{(n)} for t_n ≤ t ≤ 0:

S^\alpha_{t_n 0}(\varphi^{(n)}) = \int_{t_n}^0 L(\varphi_t^{(n)}, \alpha(\varphi_t^{(n)}), \beta_\infty)\, dt \sim |t_n|\, L(x_\infty, \alpha(x_\infty), \beta_\infty) \le (V(x_n) - V_0)(1 + o(1))   (2.9)
as n → ∞. By virtue of the definition of the quasipotential V^α we have

V^\alpha(x_0, x_n) \le V^\alpha(x_0, \varphi_{t_n}^{(n)}) + S^\alpha_{t_n 0}(\varphi^{(n)}).

The first term here does not exceed ε + (1 + ε)V(φ_{t_n}^{(n)}) = ε + (1 + ε)V_0; the second term can be estimated by formula (2.9). Consequently, the inequality

V^\alpha(x_0, x_n) \le \varepsilon + (1 + \varepsilon)V_0 + (V(x_n) - V_0)(1 + o(1)) \le \varepsilon + (1 + \varepsilon)V(x_n)

is satisfied for sufficiently large n. On the other hand, this contradicts the fact that x_n ∈ A, i.e., that V^α(x_0, x_n) > ε + (1 + ε)V(x_n). It follows that A is empty. The first part of the theorem is proved. For the proof of the second part it is sufficient to apply Theorem 4.3 of Ch. 5 to the functions H(x, α(x), a) and L(x, α(x), β), continuous for x ≠ x_0, where α(x) = \bar{\alpha}(x, ∇V(x)). □
Now we assume that there exists a function \bar{\alpha}(x, a) as mentioned in the hypothesis of Theorem 2.2, that for every x_0 ∈ D there exists a function V(x) = V_{x_0}(x) satisfying the hypotheses of Theorem 2.2, and that for every x_0 the function α_{x_0}(x) = \bar{\alpha}(x, ∇V_{x_0}(x)) belongs to the class 𝔄 of admissible controls. These conditions imply, in particular, that for any two points x_0, x_1 ∈ D sufficiently close to each other, there exists a control function α(x) such that there is a "most probable" trajectory from x_1 to x_0, a solution of the equation \dot{x}_t = b(x_t, α(x_t)). We introduce the additional requirement that 0 is an isolated point of the set {a : H(x, a) = 0} for every x. Then x_0 can be reached from x_1 over a finite time, and it is easy to prove that the same remains true for arbitrary x_0, x_1 ∈ D, not only for points close to each other.
Theorem 2.3. Let the conditions just formulated be satisfied. We choose a point x_0 for which min_{y∈∂D} V_{x_0}(y) attains its maximum (equal to V_0). We choose the control function α(x) in the following way: in the set

B_{x_0} = \{ x : V_{x_0}(x) < \min_{y \in \partial D} V_{x_0}(y) \}

we put α(x) = \bar{\alpha}(x, ∇V_{x_0}(x)); for the remaining x we define α(x) in an arbitrary way, only ensuring that α(x) is continuous and that from any point of D the solution of the system \dot{x}_t = b(x_t, α(x_t)) reaches the set B_{x_0} for positive t, remaining in D. Then

\lim_{h \downarrow 0} h \ln M_x^{\alpha,h} \tau_D = V_0   (2.10)

for any x ∈ D.
Together with Theorem 2.1, this theorem means that the function α is a solution of our optimal stabilization problem. The proof can be carried out in the following way: for the points of B_{x_0}, the function V_{x_0}(x) is a Lyapunov function for the system \dot{x}_t = b(x_t, α(x_t)). This, together with the structure of α(x) for the remaining x, shows that x_0 is the unique stable equilibrium position, which attracts the trajectories issued from points of the domain. Now (2.10) follows from Theorem 4.1 of Ch. 4, generalized as indicated in §4, Ch. 5.
§3. Examples

At the end of Chapter 5 we considered the example of calculating the quasipotential and the asymptotics of the mean exit time from a neighborhood of a stable equilibrium position and of the invariant measure for the family of one-dimensional processes jumping a distance h to the right and to the left with probabilities h^{-1} r(x) dt and h^{-1} l(x) dt over time dt. By the same token, we determined the characteristics of the stability of an equilibrium position. This example admits various interpretations; in particular, the process of division and death of a large number of cells (cf. §2, Ch. 5). The problem of stability of an equilibrium position of such a system, i.e., the problem of determining the time over which the number of cells remains below a given level, may be of great interest, especially if we consider, for example, the number of cells of a certain kind in the blood system of an organism rather than the number of bacteria in a culture. Another concrete interpretation of the same scheme is a system consisting of a large number N of elements, which go out of work independently of each other after an exponential time of service. These elements then start to be repaired; the maintenance time is exponential with coefficient µ depending on the ratio of the elements having gone out of work. The change of this ratio x with time is a process of the indicated form with h = N^{-1}, r(x) = (1 − x)λ (λ is the coefficient of the distribution of the time of service), and l(x) = xµ(x). A nonexponential maintenance time or a nonexponential time between subsequent cell divisions leads to other schemes; cf. Freidlin [8] and Levina, Leontovich, and Pyatetskii-Shapiro [1]. We consider examples of optimal stabilization.
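The repair model just described can be simulated directly. The following Gillespie-type sketch uses made-up parameter values (λ, µ, the critical level) purely for illustration:

```python
import numpy as np

# Illustrative simulation of the repair model above (parameter values are
# made up): the fraction x of failed elements jumps by +h with rate
# h^{-1} r(x), r(x) = (1 - x)*lam, and by -h with rate h^{-1} l(x),
# l(x) = x*mu(x).  We record the first time x exceeds a critical level.

rng = np.random.default_rng(1)

def first_passage(h, lam, mu, x0=0.0, level=0.35, t_max=50.0):
    """Gillespie simulation; time at which x first reaches `level`."""
    x, t = x0, 0.0
    while t < t_max:
        up, down = (1.0 - x) * lam / h, x * mu(x) / h
        total = up + down
        if total <= 0.0:
            return t_max
        t += rng.exponential(1.0 / total)
        x += h if rng.random() * total < up else -h
        if x >= level:
            return t
    return t_max  # censored: level not reached before t_max

# lam = 0.5, mu = 2 give the stable ratio x* = lam/(lam + mu) = 0.2,
# so reaching level 0.35 is a fluctuation against the drift.
tau = first_passage(h=0.01, lam=0.5, mu=lambda x: 2.0)
print(tau)
```

Averaging such first-passage times over many runs, and repeating for smaller h, shows the exponential growth exp{const/h} predicted by the quasipotential of this jump process.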
at all points; in other words, let the function H and the set of admissible controls at a given point be independent of x: H(x, a, a) _- H(a, a):
11(x) _- II.
In this case the function H is also independent of x:

H(x, a) = H(a) = \inf_{\alpha \in \Pi} H(\alpha, a),

and equation (2.7) turns into

H(\nabla V(x)) = 0.   (3.1)

The function H(a) vanishes only for one vector in every direction (except a = 0). The locus of the terminal points of these vectors will be denoted by A. Equation (3.1) may be rewritten in the form ∇V(x) ∈ A.
Equation (3.1) has an infinite set of solutions, namely the $r$-dimensional planes of the form $V_{x_0}(x) = (a_0, x - x_0)$, $a_0 \in A$; but none of them is a solution of problem $R_{x_0}$ for the equation. To find this solution, we note that the solution planes depend on the $(r-1)$-dimensional parameter $a_0 \in A$ and a one-dimensional parameter independent of $a_0$: the scalar product $(a_0, x_0)$. This family is a complete integral of equation (3.1). As is known
(cf. Courant [1], p. 111), the envelope of any $(r-1)$-parameter family of these solutions is also a solution. If for a fixed $x_0$, the family of planes $V_{x_0}(x)$ has a smooth envelope, then this envelope is the desired solution $V_{x_0}(x)$ of problem $R_{x_0}$.
This solution is given by a conic surface at any rate (i.e., $V_{x_0}(x)$ is a positively homogeneous function of degree one in $x - x_0$); it is convenient to define it by means of its $(r-1)$-dimensional level surface
$$U_1 = \{x - x_0 : V_{x_0}(x) = 1\}$$
(it is independent of $x_0$). The surface $U_1$ has one point $\beta_0$ on every ray emanating from the origin. We shall see how to find this point. Let the generator of the cone $V_{x_0}(x)$ corresponding to a point $\beta_0$ be a line of tangency of the cone with the plane $(a_0, x - x_0)$ ($a_0$ is determined uniquely, because the direction of this vector is given: it is the direction of the exterior normal to $U_1$ at $\beta_0$). The intersection with the horizontal plane at height 1 is the $(r-1)$-dimensional plane $\{\beta : (a_0, \beta) = 1\}$, which is tangent to $U_1$ at $\beta_0$ (Fig. 31). The point of this plane which is the closest to the origin is situated at distance $|a_0|^{-1}$ in the same direction from the origin as $a_0$, i.e.,
Figure 31.
§3. Examples
it is the point $\iota(a_0)$ obtained from $a_0$ by inversion with respect to the unit sphere with center at 0. Hence to find the level surface $U_1$, we have to invert the surface $A$; through every point of the surface $\iota(A)$ thus obtained we consider the plane orthogonal
to the corresponding radius and take the envelope of these planes. This geometric transformation does not always lead to a smooth convex surface:
"corners" or cuspidal edges may appear. Criteria may be given for the smoothness of $U_1$ in terms of the centers of curvature of the original surface $A$.
If the surface $U_1$ turns out to be smooth (continuously differentiable), then the solution of the optimal control problem may be obtained in the following way. In $D$ we inscribe the largest figure homothetic to $U_1$ with positive coefficient of homothety $c$, i.e., the figure $x_0 + cU_1$, where $x_0$ is a point of $D$ (not defined in a unique way in general). Inside this figure, the optimal control field $\alpha(x)$ is defined as follows: we choose the point
$\beta_0 \in U_1$ situated on the ray in the direction of $x - x_0$; we determine the corresponding point $a_0$ of $A$; as $\alpha(x)$ we choose that value $\alpha_0$ of the controlling
parameter in $\Pi$ for which $\min_{\alpha \in \Pi} H(\alpha, a_0)$ is attained (this minimum is equal to $H(a_0) = 0$; we assume that the minimum is attained and that $\alpha_0$ depends continuously on $a_0$ belonging to $A$).
Consequently, the value of the control parameter $\alpha(x)$ is constant on every radius issued from $x_0$. At points $x$ lying outside $x_0 + cU_1$, the field $\alpha(x)$ may be defined almost arbitrarily; it only has to drive all points inside the indicated surface. The mean exit time is logarithmically equivalent to $\exp\{ch^{-1}\}$ as $h \downarrow 0$.
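The inversion-and-envelope construction can be made concrete: the plane through $\iota(a_0) = a_0/|a_0|^2$ orthogonal to the radius is exactly the plane $\{\beta : (a_0, \beta) = 1\}$, so for a convex surface $A$ the envelope is the boundary of the polar dual of $A$. A minimal two-dimensional numerical sketch; the ellipse standing in for $A$ is a hypothetical choice (in the control problem $A$ is determined by the equation $H(a) = 0$):

```python
import math

# Hypothetical example: take A to be the ellipse with semi-axes (2, 1).
# The envelope of the planes {beta: (a, beta) = 1}, a in A, is the polar
# dual of A: here an ellipse with semi-axes (1/2, 1).
A_pts = [(2 * math.cos(2 * math.pi * k / 2000), math.sin(2 * math.pi * k / 2000))
         for k in range(2000)]

def support(e):
    # support function h_A(e) = max over a in A of (a, e)
    return max(a[0] * e[0] + a[1] * e[1] for a in A_pts)

def U1_radius(theta):
    # the point of U_1 on the ray with direction theta lies at distance
    # 1/h_A(e): the tangent plane there is {beta: (a_0, beta) = 1}
    e = (math.cos(theta), math.sin(theta))
    return 1.0 / support(e)

print(round(U1_radius(0.0), 3), round(U1_radius(math.pi / 2), 3))  # 0.5 1.0
```

The long semi-axis of $A$ becomes the short semi-axis of $U_1$, which matches the inversion picture; for nonconvex $A$ this construction yields only the convex envelope, and the corners or cuspidal edges mentioned above may appear.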
In Wentzell and Freidlin [5], a special case of this example was considered, where the subject was the control of a diffusion process with small diffusion by means of the choice of the drift.

EXAMPLE 3.2. We consider a dynamical system perturbed by small white noise; it can be subjected to control effects whose magnitude is under our control but which may themselves contain noise. The mathematical model of the situation is as follows:
$$\dot X_t = b(X_t) + \varepsilon\sigma\dot w_t + \alpha(X_t)[\bar b + \varepsilon\bar\sigma\dot{\tilde w}_t],$$
where $w_t$, $\tilde w_t$ are independent Wiener processes and $\alpha(x)$ is a control function, which is chosen within the limits of the set $\Pi(x)$. We consider the simplest case: the process is one-dimensional; $\bar b$, $\sigma$, and $\bar\sigma$ are positive constants; $b(x)$ is a continuous function; $D$ is the interval $(x_1, x_2)$; and the controlling parameter $\alpha(x)$ varies within the limits
$$\pm\Big[2\bar b^{-1}\max_{[x_1, x_2]}|b(x)| + \sigma\bar\sigma^{-1}\Big]$$
at every point (or within the limits of any larger segment $\Pi(x)$).
We have
$$H(x, \alpha, a) = (b(x) + \alpha\bar b)a + \tfrac{1}{2}(\sigma^2 + \alpha^2\bar\sigma^2)a^2;$$
$$H(x, a) = \min_{\alpha} H(x, \alpha, a) = b(x)a - \frac{\bar b^2}{2\bar\sigma^2} + \tfrac{1}{2}\sigma^2 a^2$$
for
$$|a| \ge \Big[2\bar\sigma^2\bar b^{-2}\max_{[x_1, x_2]}|b(x)| + \sigma\bar\sigma\bar b^{-1}\Big]^{-1}.$$
The minimum here is attained for $\alpha = -\bar b\bar\sigma^{-2}a^{-1}$. The function $H(x, a)$ vanishes for
$$a = a_1(x) = -\frac{b(x)}{\sigma^2} - \sqrt{\frac{b(x)^2}{\sigma^4} + \frac{\bar b^2}{\sigma^2\bar\sigma^2}} < 0$$
and for
$$a = a_2(x) = -\frac{b(x)}{\sigma^2} + \sqrt{\frac{b(x)^2}{\sigma^4} + \frac{\bar b^2}{\sigma^2\bar\sigma^2}} > 0,$$
and also for $a = 0$, which is, by the way, immaterial for the determination of the optimal quasipotential (it can be proved easily that $|a_1(x)|$ and $|a_2(x)|$ surpass the indicated bound for $|a|$). Problem $R_{x_0}$ for the equation $H(x, V_{x_0}'(x)) = 0$ reduces to the equation
$$V_{x_0}'(x) = \begin{cases} a_1(x) & \text{for } x_1 < x < x_0, \\ a_2(x) & \text{for } x_0 < x < x_2 \end{cases}$$
with the additional condition $\lim_{x \to x_0} V_{x_0}(x) = 0$. It is easy to see that $\min(V_{x_0}(x_1), V_{x_0}(x_2))$ attains its largest value for an $x_0$ for which $V_{x_0}(x_1)$ and $V_{x_0}(x_2)$ are equal to each other. This leads to the following equation for $x_0$:
$$\int_{x_1}^{x_0}\left[\sqrt{\frac{b(x)^2}{\sigma^4} + \frac{\bar b^2}{\sigma^2\bar\sigma^2}} + \frac{b(x)}{\sigma^2}\right] dx = \int_{x_0}^{x_2}\left[\sqrt{\frac{b(x)^2}{\sigma^4} + \frac{\bar b^2}{\sigma^2\bar\sigma^2}} - \frac{b(x)}{\sigma^2}\right] dx,$$
which has a unique solution. After finding the optimal equilibrium position $x_0$, we can determine the optimal control according to the formula
$$\alpha(x) = \begin{cases} -\bar b\bar\sigma^{-2}a_1(x)^{-1} & \text{to the left of } x_0, \\ -\bar b\bar\sigma^{-2}a_2(x)^{-1} & \text{to the right of } x_0. \end{cases}$$
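Under the reconstruction above, the balance equation for $x_0$ can be solved numerically: the difference of its two sides is increasing in $x_0$ (both integrands are positive), so bisection applies. A sketch with hypothetical data $b(x) = -x$, $\sigma = \bar b = \bar\sigma = 1$, $D = (-1, 2)$:

```python
import math

# Hypothetical data for illustration: b(x) = -x, sigma = bbar = sbar = 1,
# D = (x1, x2) = (-1, 2).
sigma, bbar, sbar = 1.0, 1.0, 1.0
x1, x2 = -1.0, 2.0
b = lambda x: -x

def g(x):
    # common term sqrt(b(x)^2/sigma^4 + bbar^2/(sigma^2 sbar^2)) of a_1, a_2
    return math.sqrt(b(x)**2 / sigma**4 + bbar**2 / (sigma**2 * sbar**2))

def integral(f, lo, hi, n=2000):
    # composite trapezoid rule
    h = (hi - lo) / n
    return h * (0.5 * (f(lo) + f(hi)) + sum(f(lo + i * h) for i in range(1, n)))

def F(x0):
    # V_{x0}(x1) - V_{x0}(x2); the optimal x0 is its unique zero
    left = integral(lambda x: g(x) + b(x) / sigma**2, x1, x0)
    right = integral(lambda x: g(x) - b(x) / sigma**2, x0, x2)
    return left - right

lo, hi = x1, x2                 # bisection: F(x1) < 0 < F(x2), F increasing
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if F(mid) < 0 else (lo, mid)
x0 = 0.5 * (lo + hi)
print(round(x0, 2))
```

Here the optimal equilibrium comes out near $x_0 \approx 1.33$, shifted toward the more distant endpoint $x_2$, so that exit through either endpoint becomes equally difficult.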
Chapter 10
Sharpenings and Generalizations
§1. Local Theorems and Sharp Asymptotics

In Chapters 3, 4, 5, and 7 we established limit theorems on large deviations involving the rough asymptotics of probabilities of the type $P\{X^h \in A\}$. There arises the following question: Is it possible to obtain subtler results for families of random processes (similar to those obtained for sums of independent random variables), namely local limit theorems on large deviations and theorems on sharp asymptotics? There is some work in this direction; we give a survey of the results in this section. If $X_t^h$ is a family of random processes, a local theorem on large deviations may involve the asymptotics, as $h \downarrow 0$, of the density $p_t^h(y)$ of the distribution of the value of the process at time $t$, where the point $y$ is different from the "most probable" value of $x(t)$ for small $h$ (we take into
account that the density is not defined uniquely in general; we have to indicate that we speak of, for example, the continuous version of the density). We may prove local theorems involving the joint density $p^h_{t_1, \dots, t_n}(y_1, \dots, y_n)$ of the random variables $X_{t_1}^h, \dots, X_{t_n}^h$. We may also consider the asymptotics of the density of the distribution of a functional $F(X^h)$ at points different from the "most probable" value of $F(x(\cdot))$.
However, obtaining these kinds of results for a large class of functionals $F$ and families of random processes $X^h$ is unrealistic for the time being, at least because we first need results on the existence of a continuous density of the distribution of $F(X^h)$. For the same reason, positive results in the area of local theorems on large deviations involving densities of values of a process at separate moments of time are up to now restricted to families of diffusion processes, for which the problem of the existence of a density is well studied and solved in the affirmative, under insignificant restrictions, in the case of a nonsingular diffusion matrix.
The transition probability density $p^h(t, x, y)$ for the value of a diffusion process $X_t^h$ under the assumption that $X_0^h = x$ has the meaning of the fundamental solution of the corresponding parabolic differential equation (or that of the Green's function for the corresponding problem in the case of a diffusion process in a domain with attainable boundary). This provides a basis both for an additional method of study of the density $p^h$ and for the area of possible applications.
We begin with Friedman's work [1] because its results are connected more directly with what has been discussed (although it appeared a little later than Kifer's work [1] containing stronger results). Let $p^h(t, x, y)$ be the fundamental solution of the equation $\partial u/\partial t = L^h u$ in the $r$-dimensional space, where
$$L^h u = \frac{h}{2} \sum a^{ij}(x) \frac{\partial^2 u}{\partial x^i \partial x^j} + \sum b^i(x) \frac{\partial u}{\partial x^i}; \qquad (1.1)$$
in other words, $p^h(t, x, y)$ is the continuous version of the transition probability density of the corresponding diffusion process $(X_t^h, P_x^h)$. (Here it is more convenient to denote the small parameter in the diffusion matrix by $h$ rather than $\varepsilon^2$, as had been done beginning with Ch. 4.) We put
$$V(t, x, y) = \min\{S_{0t}(\varphi) : \varphi_0 = x,\ \varphi_t = y\}, \qquad (1.2)$$
where $h^{-1}S_{0t}$ is the action functional for the family of processes $(X_t^h, P_x^h)$. The functional $S_{0t}$ is given, as we know, by the formula
$$S_{0t}(\varphi) = \frac{1}{2} \int_0^t \sum a_{ij}(\varphi_s)(\dot\varphi_s^i - b^i(\varphi_s))(\dot\varphi_s^j - b^j(\varphi_s))\, ds, \qquad (1.3)$$
where $(a_{ij}(x)) = (a^{ij}(x))^{-1}$. It can be proved that
$$\lim_{h \downarrow 0} h \ln p^h(t, x, y) = -V(t, x, y). \qquad (1.4)$$
This rough local theorem on large deviations is obtained by applying rough (integral) theorems of Wentzell and Freidlin [4] and estimates from Aronson [1]:
$$p^h(t, x, y) \le \frac{A_0}{(ht)^{r/2}} \exp\Big\{-\frac{c_0|y - x(t, x)|^2}{ht}\Big\},$$
$$p^h(t, x, y) \ge \frac{A_1}{(ht)^{r/2}} \exp\Big\{-\frac{c_1|y - x(t, x)|^2}{ht}\Big\} - \frac{A_2}{(ht)^{r/2 - a}} \exp\Big\{-\frac{c_2|y - x(t, x)|^2}{ht}\Big\}$$
for sufficiently small $t$, where the $A_i$, $c_i$, and $a$ are positive constants and $x(t, x)$
is the solution of the system $\dot x = b(x)$ with initial condition $x(0, x) = x$. Further, if $(X_t^h, P_x^h)$ is the diffusion process corresponding to $L^h$ in a domain $D$ with smooth boundary $\partial D$, vanishing upon reaching the boundary,
then its transition probability density $q^h(t, x, y)$ is the Green's function for
the equation $\partial u/\partial t = L^h u$ with the boundary condition $u = 0$ on $\partial D$. It is proved that
$$\lim_{h \downarrow 0} h \ln q^h(t, x, y) = -V_D(t, x, y), \qquad (1.5)$$
where $V_D(t, x, y)$ is defined as the infimum of $S_{0t}(\varphi)$ for curves $\varphi$ connecting $x$ and $y$ over time $t$ without leaving $D$. This implies in particular that
$$\lim_{h \downarrow 0} \frac{q^h(t, x, y)}{p^h(t, x, y)} = 0 \qquad (1.6)$$
if all extremals at which the minimum (1.2) is attained leave $D \cup \partial D$. The following opposite result can also be proved: if all extremals pass inside $D$, then
$$\lim_{h \downarrow 0} \frac{q^h(t, x, y)}{p^h(t, x, y)} = 1. \qquad (1.7)$$
(According to M. Kac's terminology, the process $X_t^h$ "does not feel" the boundary for small $h$.) In the scheme being considered we may also include, in a natural way (and almost precisely), questions involving the asymptotics, for small values of time, of the transition density of a diffusion process (of the fundamental
solution of a parabolic equation) not depending on a parameter. Indeed, let $p(t, x, y)$ be the transition probability density of the diffusion process $(X_t, P_x)$ corresponding to the elliptic operator
$$Lu = \frac{1}{2} \sum a^{ij}(x) \frac{\partial^2 u}{\partial x^i \partial x^j} + \sum b^i(x) \frac{\partial u}{\partial x^i}.$$
We consider the family of diffusion processes $\tilde X_t^h = X_{th}$, $h > 0$. The transition density $p(h, x, y)$ of $X_t$ over time $h$ is equal to the transition density $p^h(1, x, y)$ of $\tilde X_t^h$ over time 1. The process $(\tilde X_t^h, P_x)$ is governed by the differential operator
$$L^h u = hLu = \frac{h}{2} \sum a^{ij}(x) \frac{\partial^2 u}{\partial x^i \partial x^j} + h \sum b^i(x) \frac{\partial u}{\partial x^i}. \qquad (1.8)$$
The drift coefficients $hb^i(x)$ converge to zero as $h \downarrow 0$, so that the family of operators (1.8) is almost the same as the family (1.1) with the $b^i(x)$ replaced by zero. In any event, the action functional may be written out without difficulty: it has the form $h^{-1}S_{0t}(\varphi)$, where
$$S_{0t}(\varphi) = \frac{1}{2} \int_0^t \sum a_{ij}(\varphi_s)\dot\varphi_s^i\dot\varphi_s^j\, ds. \qquad (1.9)$$
The problem of finding the minimum of this functional can be solved in terms of the following Riemannian metric $\rho(x, y)$ connected with the matrix $(a_{ij})$:
$$\rho(x, y) = \min_{\varphi_0 = x,\ \varphi_t = y} \int_0^t \Big[\sum a_{ij}(\varphi_s)\dot\varphi_s^i\dot\varphi_s^j\Big]^{1/2} ds.$$
It is easy to prove (by the Cauchy-Schwarz inequality, with equality for the parametrization with constant Riemannian speed) that the minimum of the functional (1.9) over all parametrizations $\varphi_s$, $0 \le s \le t$, of a given curve is equal to the square of the Riemannian length of the curve multiplied by $(2t)^{-1}$. Consequently, the minimum (1.2) is equal to $(2t)^{-1}\rho(x, y)^2$. Application of formula (1.4) yields
$$\lim_{h \downarrow 0} h \ln p(h, x, y) = -\tfrac{1}{2}\rho(x, y)^2. \qquad (1.10)$$
Accordingly, for the transition density of the diffusion process which vanishes on the boundary of $D$ we obtain
$$\lim_{h \downarrow 0} h \ln q(h, x, y) = -\tfrac{1}{2}\rho_D(x, y)^2, \qquad (1.11)$$
where $\rho_D(x, y)$ is the infimum of the Riemannian lengths of the curves connecting $x$ and $y$ without leaving $D$. There are also results, corresponding to (1.6), (1.7), involving the ratio of the densities $q$ and $p$ for small values of the time argument. In particular, the "principle of not feeling the boundary" assumes the form
$$\lim_{t \downarrow 0} \frac{q(t, x, y)}{p(t, x, y)} = 1 \qquad (1.12)$$
if all shortest geodesics connecting $x$ and $y$ lie entirely in $D$. The results (1.10)-(1.12) were obtained in Varadhan [2], [3] (the result (1.12) in the special case of a Wiener process was obtained in Ciesielski [1]). In Kifer [1], [3] and Molchanov [1] sharp versions of these results were obtained. In obtaining them, an essential role is played by the corresponding
rough results (whether in local or integral form is insignificant): they provide an opportunity to exclude from consideration all but a neighborhood of an extremal (or extremals) of the action functional; after this, the problem becomes local. The sharp asymptotics of the transition probability density from $x$ to $y$ turns out to depend on whether or not $x$ and $y$ are conjugate on an extremal
connecting them (cf. Gel'fand and Fomin [1]). In the case where $x$ and $y$ are nonconjugate and the coefficients of the operator are smooth, not only can the asymptotics of the density be obtained up to equivalence but also an asymptotic
expansion in powers of the small parameter: for the family of processes corresponding to the operators (1.1) we have
$$p^h(t, x, y) = (2\pi h t)^{-r/2} \exp\{-h^{-1}V(t, x, y)\}[K_0(t, x, y) + hK_1(t, x, y) + \cdots + h^m K_m(t, x, y) + o(h^m)] \qquad (1.13)$$
as $h \downarrow 0$ (Kifer [3]); for the process with generator (1.8) we have
$$p(t, x, y) = (2\pi t)^{-r/2} \exp\{-\rho(x, y)^2/2t\}[K_0(x, y) + tK_1(x, y) + \cdots + t^m K_m(x, y) + o(t^m)] \qquad (1.14)$$
as $t \downarrow 0$ (Molchanov [1]). Methods in probability theory are combined with analytic methods in these publications. We outline the proof of expansion (1.13) in the simplest case, moreover for $m = 0$, i.e., the proof of the existence of the finite limit
$$\lim_{h \downarrow 0} (2\pi h t)^{r/2} p^h(t, x, y) \exp\{h^{-1}V(t, x, y)\}. \qquad (1.15)$$
Let the operator $L^h$ have the form
$$L^h u = \frac{h}{2}\Delta u + \sum b^i(x)\frac{\partial u}{\partial x^i}. \qquad (1.16)$$
The process corresponding to this operator may be given by means of the stochastic equation
$$\dot X_s^h = b(X_s^h) + h^{1/2}\dot w_s, \qquad X_0^h = x, \qquad (1.17)$$
where $w_s$ is an $r$-dimensional Wiener process. Along with $X_s^h$, we consider the diffusion process $Y_s^h$, nonhomogeneous in time, given by the stochastic equation
$$\dot Y_s^h = \dot\varphi_s + h^{1/2}\dot w_s, \qquad Y_0^h = x, \qquad (1.18)$$
where $\varphi_s$, $0 \le s \le t$, is an extremal of the action functional from $x$ to $y$ over time $t$ (we assume that it is unique). The density of the distribution of $Y_t^h$ can be written out easily: it is equal to the density of the normal distribution with mean $\varphi_t$ and covariance matrix $htE$; at $y$ it is equal to $(2\pi h t)^{-r/2}$. The ratio of the probability densities of $X_t^h$ and $Y_t^h$ is equal to the limit of the ratio
$$P\{X_t^h \in y + D\}/P\{Y_t^h \in y + D\} \qquad (1.19)$$
as the diameter of the neighborhood $D$ of the origin of coordinates converges to zero. We use the fact that the measures in $C_{0t}(R^r)$, corresponding to the random processes $X_s^h$ and $Y_s^h$, are absolutely continuous with respect to each other; the density has the form
$$\frac{d\mu_{X^h}}{d\mu_{Y^h}}(Y^h) = \exp\Big\{h^{-1/2}\int_0^t (b(Y_s^h) - \dot\varphi_s, dw_s) - (2h)^{-1}\int_0^t |b(Y_s^h) - \dot\varphi_s|^2\, ds\Big\}. \qquad (1.20)$$
The absolute continuity enables us to express the probability of any event connected with $X^h$ in the form of an integral of a functional of $Y^h$; in particular,
$$P\{X_t^h \in y + D\} = M\Big\{Y_t^h \in y + D;\ \frac{d\mu_{X^h}}{d\mu_{Y^h}}(Y^h)\Big\}.$$
Expression (1.19) assumes the form
$$M\Big\{\exp\Big\{h^{-1/2}\int_0^t (b(Y_s^h) - \dot\varphi_s, dw_s) - (2h)^{-1}\int_0^t |b(Y_s^h) - \dot\varphi_s|^2\, ds\Big\} \,\Big|\, Y_t^h \in y + D\Big\}$$
$$= M\Big\{\exp\Big\{h^{-1/2}\int_0^t (b(\varphi_s + h^{1/2}w_s) - \dot\varphi_s, dw_s) - (2h)^{-1}\int_0^t |b(\varphi_s + h^{1/2}w_s) - \dot\varphi_s|^2\, ds\Big\} \,\Big|\, w_t \in h^{-1/2}D\Big\}$$
(we use the fact that $w_0 = 0$ and $Y_s^h = \varphi_s + h^{1/2}w_s$ for $0 \le s \le t$). If $D$ (and together with it, $h^{-1/2}D$) shrinks to zero, we obtain the conditional mathematical expectation under the condition $w_t = 0$. Hence
$$p^h(t, x, y) = (2\pi h t)^{-r/2} M\Big\{\exp\Big\{h^{-1/2}\int_0^t (b(\varphi_s + h^{1/2}w_s) - \dot\varphi_s, dw_s) - (2h)^{-1}\int_0^t |b(\varphi_s + h^{1/2}w_s) - \dot\varphi_s|^2\, ds\Big\} \,\Big|\, w_t = 0\Big\}. \qquad (1.21)$$
(Formula (1.21) had already been obtained by Hunt [1].) To establish the existence of the limit (1.15) as $h \downarrow 0$, first of all we truncate the mathematical expectation (1.21) by multiplying the exponential expression by the indicator of the event $\{\max_{0 \le s \le t} |h^{1/2}w_s| \le \delta\}$. That the omitted part may be neglected as $h \downarrow 0$ may be established by means of rough estimates connected with the action functional. Then we
transform the exponent by taking account of the smoothness of $b$. In the first integral we take the expansion of $b(\varphi_s + h^{1/2}w_s)$ in powers of $h^{1/2}w_s$ up to terms of the first order and in the second integral up to terms of the second order. The principal term arising in the second integral is equal to
$$-(2h)^{-1}\int_0^t |b(\varphi_s) - \dot\varphi_s|^2\, ds,$$
and upon substitution into (1.15) it cancels with $h^{-1}V(t, x, y)$. If we verify that the terms of order $h^{-1/2}$ also cancel each other, it only remains to be proved that the terms of order 1 and the infinitely small terms as $h \downarrow 0$ do not
hinder convergence (this can be done in the case where $x$ and $y$ are not conjugate).
We discuss the terms of order $h^{-1/2}$, arising from the first integral in (1.21). We integrate by parts:
$$h^{-1/2}\int_0^t (b(\varphi_s) - \dot\varphi_s, dw_s) = h^{-1/2}(b(\varphi_t) - \dot\varphi_t, w_t) - h^{-1/2}\int_0^t \Big(w_s, \frac{d}{ds}(b(\varphi_s) - \dot\varphi_s)\Big)\, ds.$$
The integrated term vanishes by virtue of the condition $w_t = 0$. The expression $(d/ds)(b(\varphi_s) - \dot\varphi_s)$ can be transformed by taking account of Euler's equation for an extremal. It is easy to see that the integral cancels with the terms of order $h^{-1/2}$, arising from the second integral in (1.21). It can be seen from the proof outlined here that the coefficient $K_0(t, x, y)$
in the expansion (1.13) has the meaning of the conditional mathematical expectation of the exponential function of a quadratic functional of a Wiener process under the condition $w_t = 0$.
The situation is not more complicated in the case where $x$ and $y$ are connected by a finite number of extremals and are not conjugate to each other on any of these extremals. In the case of conjugate points the transition density over time $t$ can be expressed by means of the Chapman-Kolmogorov equation in terms of a transition density over a shorter time. In the integral thus obtained, the main role is played by densities at nonconjugate points and the asymptotics is obtained by applying Laplace's method (the finite-dimensional one). For the density $p(t, x, y)$ as $t \downarrow 0$, this is done in Molchanov [1]; in particular, for various structures of the set of minimal geodesics connecting $x$ and $y$, there arise asymptotic expressions for $p(t, x, y)$ of the form $\mathrm{const}\cdot t^{-a}\exp\{-\rho(x, y)^2/2t\}$ with varying $a$. Concerning sharp asymptotics in problems involving large deviations not reducing to one-dimensional or finite-dimensional distributions, little
has been done yet. The results obtained in this area do not relate to probabilities $P^h\{X^h \in A\}$ or densities but rather to mathematical expectations
$$M^h \exp\{h^{-1}F(X^h)\}, \qquad (1.22)$$
where $F$ is a smooth functional (the normalizing coefficient is assumed to be equal to $h^{-1}$). The consideration of problems of this kind is natural as a first step, since even in the case of large deviations for sums of independent two-dimensional random vectors, the sharp asymptotics of integrals analogous to (1.22) can be found much more easily than the sharp asymptotics of the probability of hitting a domain. The expression (1.22) is logarithmically equivalent to
$\exp\{h^{-1}\max[F - S]\}$, where $S$ is the normalized action functional. If the extremal $\hat\varphi$ providing this maximum is unique, then the mathematical expectation (1.22) differs from
$$M^h\{\rho(X^h, \hat\varphi) < \delta;\ \exp\{h^{-1}F(X^h)\}\} \qquad (1.23)$$
by a quantity which is exponentially small compared with (1.22) or (1.23). This enables us to localize the problem. The plan of further study is analogous, to a great degree, to what has been done in Cramér [1]: the generalized Cramér transformation is performed, which transforms the measure $P^h$ into a new probability measure $\tilde P^h$ such that the "most probable" trajectory of $X^h$ with respect to $\tilde P^h$ turns out to be the extremal $\hat\varphi$, for $h$ small. With respect to the new probability measure, the random process $h^{-1/2}[X_t^h - \hat\varphi_t]$ turns out to be asymptotically Gaussian with characteristics which are easy to determine. If the functionals $F$ and $S$ are twice differentiable at $\hat\varphi$ (the requirement of smoothness of $S$ can be reduced to smoothness requirements on the local characteristics of the family of
processes $X^h$) and the quadratic functional corresponding to the second derivative $(F'' - S'')(\hat\varphi)$ is strictly negative definite, then for the means (1.23) and (1.22) we obtain the following sharp asymptotics:
$$M^h \exp\{h^{-1}F(X^h)\} \sim K_0 \exp\{h^{-1}[F(\hat\varphi) - S(\hat\varphi)]\} \qquad (1.24)$$
as $h \downarrow 0$. The constant $K_0$ can be expressed as the mathematical expectation of a certain functional of a Gaussian random process. If $F$ and $S$ are $v + 2$ times differentiable, then for (1.22), (1.23) we obtain an asymptotic expansion of the form
$$\exp\{h^{-1}[F(\hat\varphi) - S(\hat\varphi)]\}(K_0 + K_1 h + \cdots + K_{[v/2]}h^{[v/2]} + o(h^{v/2})). \qquad (1.25)$$
Analogous results may be obtained for the mathematical expectations of functionals of the form $G(X^h)\exp\{h^{-1}F(X^h)\}$. This program has been realized in Schilder [1] for the family of random processes $X_t^h = h^{1/2}w_t$, where $w_t$ is a Wiener process, and in Dubrovskii [1], [2], [3] for the families of locally infinitely divisible Markov processes considered by us in Ch. 5. We note that in the case considered in Schilder [1], the part connected with the asymptotic Gaussianness of $h^{-1/2}[X_t^h - \hat\varphi_t]$
with respect to $\tilde P^h$ falls out of the scheme outlined here (because of the triviality of that part). In exactly the same way, in this simple situation we do not need to be aware of the connection of the method employed with H. Cramér's method.
§2. Large Deviations for Random Measures

We consider a Wiener process $\xi_t$ on the interval [0, 1] with reflection at the endpoints. For every Borel set $\Gamma \subseteq [0, 1]$ we may consider the random variable $\pi_T(\Gamma)$, the proportion of time spent by $\xi_t$ in $\Gamma$ for $t \in [0, T]$:
$$\pi_T(\Gamma) = \frac{1}{T}\int_0^T \chi_\Gamma(\xi_s)\, ds. \qquad (2.1)$$
The random variable $\pi_T(\Gamma) = \pi_T(\Gamma, \omega)$ is a probability measure in $\Gamma$. In the space of measures on [0, 1], let a metric $\rho$ be given, for example, the metric defined by the equality
$$\rho(\mu, \nu) = \sup_{0 \le x \le 1} |\mu[0, x] - \nu[0, x]|. \qquad (2.2)$$
It is known that for any initial point $\xi_0 = x \in [0, 1]$, the measure $\pi_T$ converges to Lebesgue measure $l$ on [0, 1] with probability 1 as $T \to \infty$. If $A$ is a set in the space of measures, at a positive distance from Lebesgue measure $l$, then $P_x\{\pi_T \in A\} \to 0$ as $T \to \infty$, i.e., the event $\{\omega : \pi_T \in A\}$ is related to large deviations.
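The convergence of $\pi_T$ to Lebesgue measure in the metric (2.2) is easy to watch in simulation; a sketch using an Euler scheme for the reflected Wiener process (the step size, horizon $T$, and histogram resolution are arbitrary choices):

```python
import math, random

# Occupation measure (2.1) of a Wiener process on [0, 1] reflected at the
# endpoints, compared with Lebesgue measure in the metric (2.2).
random.seed(1)
dt, T = 1e-3, 100.0
n = int(T / dt)
xi = 0.3
occ = [0] * 100                     # time spent in [k/100, (k+1)/100)
for _ in range(n):
    xi += math.sqrt(dt) * random.gauss(0.0, 1.0)
    xi %= 2.0                       # fold the line onto [0, 2] ...
    if xi > 1.0:
        xi = 2.0 - xi               # ... then onto [0, 1]: reflection
    occ[min(int(xi * 100), 99)] += 1

# rho(pi_T, l) = sup_x |pi_T[0, x] - x|, evaluated at the grid points k/100
cum, dist = 0.0, 0.0
for k in range(100):
    cum += occ[k] / n
    dist = max(dist, abs(cum - (k + 1) / 100))
print(dist)                         # small for large T
```

For this horizon the distance is already of the order of a few percent; deviations of $\pi_T$ that stay bounded away from zero as $T \to \infty$ are exactly the large-deviation events discussed in the text.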
Of course, formula (2.1) defines a random measure $\pi_T(\Gamma)$ for every measurable random process $\xi_t(\omega)$, $t \ge 0$. If this process is, for example, stationary and ergodic, then by Birkhoff's theorem $\pi_T(\Gamma)$ converges to a nonrandom measure $m(\Gamma)$ (to the one-dimensional distribution of the process) as $T \to \infty$, and the deviations of $\pi_T$ from $m$ which do not converge to zero belong to the area of large deviations. Families of random measures converging in probability to a nonrandom measure also arise in other situations, in particular, in problems concerning the crossing of a level. For example, let $\zeta_t$ be a stationary process with
sufficiently regular trajectories and let $\eta_t = \zeta_{tT}$ ($T > 0$). Along with $\eta_t$ we may consider the family of random measures $\pi_T$ on [0, 1], where $\pi_T(\Gamma)$ is
the number, normalized by dividing by $T$, of crossings of a given level by $\eta_t$ taking place for $t \in \Gamma \subseteq [0, 1]$. Under certain regularity conditions on $\zeta_t$, the measure $\pi_T$ converges, as $T \to \infty$, to Lebesgue measure multiplied by the average number of crossings per unit time. We may consider limit theorems of various kinds for random measures. Here we discuss some results concerning the behavior of probabilities of large deviations. The rough asymptotics of probabilities of large deviations for measures (2.1), connected with Markov processes, was studied in a series of publications by Donsker and Varadhan [1], [2]. Independently of them,
general results were obtained by Gärtner [2], [3]. Gärtner also applied these results to Markov and diffusion processes and considered a series of examples. Our exposition is close to Gärtner's work. Let $(E, \mathscr{B})$ be a measurable space. We denote by $B$ the space of bounded measurable functions on $E$ and by $V$ the space of finite countably additive
set functions (charges) on the $\sigma$-algebra $\mathscr{B}$. We introduce the notation $\langle \mu, f \rangle = \int_E f(x)\,\mu(dx)$. In $V$ we may consider the $B^*$-weak topology (cf. Dunford and Schwartz [1]) given by neighborhoods of the form
$$\{\mu : |\langle \mu - \mu_0, f_i \rangle| < \delta,\ i = 1, \dots, n\}, \qquad f_i \in B.$$
Along with it, we shall consider a metric $\rho$ and the corresponding topology in $V$. We shall only consider metrics given in the following way. A system $\mathfrak{M}$ of functions $f \in B$ bounded in absolute value by one is fixed and we put
$$\rho(\mu, \nu) = \sup_{f \in \mathfrak{M}} |\langle \mu - \nu, f \rangle|, \qquad \mu, \nu \in V. \qquad (2.3)$$
Of course, in order that equality (2.3) define a metric, we need to take a sufficiently rich supply of functions $f$: every charge $\mu \in V$ needs to be determined uniquely by the integrals $\langle \mu, f \rangle$ for $f \in \mathfrak{M}$. Let a finite measure $m$ be fixed on $(E, \mathscr{B})$. We shall say that a metric $\rho$, defined by (2.3) in $V$, satisfies condition (1) if for every $\delta > 0$ there exist finite systems $\mathfrak{A} = \mathfrak{A}_\delta$, $\mathfrak{B} = \mathfrak{B}_\delta$ of measurable functions on $E$, bounded in absolute value by one, such that $\langle m, w \rangle < \delta$ for $w \in \mathfrak{B}$ and for every $f \in \mathfrak{M}$ there exist $v \in \mathfrak{A}$ and $w \in \mathfrak{B}$ for which $|f - v| \le w$.
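The condition just defined can be checked concretely for the metric (2.2): with $f$ the indicator of $[0, x]$, take $v$ the indicator of $[0, k\delta]$, $k = \lfloor x/\delta \rfloor$, and $w$ the indicator of $[k\delta, (k+1)\delta]$; then $|f - v| \le w$ pointwise, and $\langle m, w \rangle$ equals $\delta$ for Lebesgue measure $m$ (so condition (1) holds up to an obvious rescaling of $\delta$). A quick sketch; the particular $x$ and $\delta$ are arbitrary:

```python
# Pointwise check of condition (1) for the metric (2.2): f = indicator of
# [0, x], v = indicator of [0, k*delta], w = indicator of the gap
# [k*delta, (k+1)*delta]; the Lebesgue integral of w is delta.
delta, x = 0.1, 0.537
k = int(x / delta)                       # k*delta <= x < (k+1)*delta

f = lambda t: 1.0 if t <= x else 0.0
v = lambda t: 1.0 if t <= k * delta else 0.0
w = lambda t: 1.0 if k * delta <= t <= (k + 1) * delta else 0.0

grid = [i / 10000 for i in range(10001)]
ok = all(abs(f(t) - v(t)) <= w(t) for t in grid)
mass = sum(w(t) for t in grid) / 10000   # Riemann sum approximating <m, w>
print(ok, round(mass, 2))                # True, about 0.1
```

The point of the decomposition is that $f$ is sandwiched between $v - w$ and $v + w$, which is what reduces a supremum over the infinite system $\mathfrak{M}$ to the finite systems $\mathfrak{A}_\delta$, $\mathfrak{B}_\delta$.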
Condition (1) enables us to reduce the study of large deviations for measures to the study of large deviations for random vectors in a finite-dimensional space and to use the results of §1, Ch. 5. The metric (2.2) in $V = V([0, 1])$ is a special case of the metric (2.3) with the system $\mathfrak{M}$ consisting of the indicators of all intervals $[0, x]$; it satisfies condition (1) if as $m$ we choose Lebesgue measure. As $\mathfrak{A}_\delta$ we may choose the indicators of the intervals of the form $[0, k\delta]$ and as $\mathfrak{B}_\delta$, the indicators of the intervals $[k\delta, (k+1)\delta]$. Another example of a metric satisfying condition
(1): $E$ is a compactum, $\mathscr{B}$ is the $\sigma$-algebra of its Borel subsets, and $\rho$ is the metric corresponding to the family $\mathfrak{M}$ of functions bounded in absolute value by one and satisfying a Lipschitz condition with constant 1 (this metric corresponds to $C^*$-weak convergence in the space $V$, considered usually in probability theory). If $m$ is any measure with $m(E) = 1$, then as $\mathfrak{A}_\delta$ we may choose a finite $\delta$-net in $\mathfrak{M}$ and as $\mathfrak{B}_\delta$ we may choose the singleton consisting of the constant function $\delta$. We fix a complete probability space $\{\Omega, \mathscr{F}, P\}$. A mapping $\pi : \Omega \times \mathscr{B} \to R^1$
is called a random measure if $\pi(\cdot, A)$ is a random variable for every $A \in \mathscr{B}$ and $\pi(\omega, \cdot)$ is a measure for almost all $\omega$. In what follows we shall often consider sets of the form $\{\omega : \rho(\pi(\omega, \cdot), \mu) < \delta\}$.
In order that these sets be measurable for any $\mu \in V$ and $\delta > 0$, it is sufficient to assume that in $\mathfrak{M}$ there exists a countable subset $\mathfrak{M}_0$ such that for every function $f \in \mathfrak{M}$ and every measure $\mu$ there exists a bounded sequence of elements $f_n$ of $\mathfrak{M}_0$, converging to $f$ almost everywhere with respect to $\mu$. We shall always assume that this condition is satisfied. Now let us be given a family of random measures $\pi^h$, depending on a
parameter $h$. For the sake of simplicity we assume that $h$ is a positive numerical parameter.
Our fundamental assumption consists in the following. There exists a function $\lambda(h)$ converging to $+\infty$ as $h \downarrow 0$ and such that the finite limit
$$H(f) = \lim_{h \downarrow 0} \lambda(h)^{-1} \ln M \exp\{\lambda(h)\langle \pi^h, f \rangle\} \qquad (2.4)$$
exists for every $f \in B$. As in §1, Ch. 5, it can be proved that $H(f)$ is a convex functional. We make the following assumptions concerning this functional:

A.1. The functional $H(f)$ is Gateaux differentiable, i.e., the function $h(y) = H(f + yg)$, $y \in R^1$, is differentiable for every $f, g \in B$.
A.2. If a bounded sequence $\{f_n\}$ of measurable functions on $E$ converges to an $f \in B$ in measure with respect to the measure $m$ ($m$ is the measure singled out on $(E, \mathscr{B})$), then $H(f_n) \to H(f)$.

We denote by $S(\mu)$ the Legendre transform of the functional $H(\cdot)$:
$$S(\mu) = \sup_{f \in B} [\langle \mu, f \rangle - H(f)], \qquad \mu \in V. \qquad (2.5)$$
It is easy to see that $S(\mu)$ is a convex functional on $V$, assuming nonnegative values, possibly including $+\infty$.
We list the basic properties of $S$; along the way we introduce some notation to be used in what follows.
B.1. In order that $S(\mu) < \infty$, it is necessary that the charge $\mu$ be a measure. If $\pi^h(E) = 1$ with probability 1, then $S(\mu) < \infty$ only for measures $\mu$ with $\mu(E) = 1$.
B.2. The functional $S$ is lower semicontinuous with respect to $B^*$-weak convergence. For any $s \ge 0$, the measures $\mu$ such that $S(\mu) \le s$ are uniformly bounded and uniformly absolutely continuous with respect to $m$.
B.3. If condition (1) is satisfied, then $S$ is lower semicontinuous in the topology induced by the metric $\rho$ and the set $\Phi(s) = \{\mu \in V : S(\mu) \le s\}$, $s < \infty$, is compact in this topology.
B.4. Let $\mathfrak{A}$ be a finite system of functions belonging to $B$. We write
$$\rho_{\mathfrak{A}}(\mu, \nu) = \sup_{v \in \mathfrak{A}} |\langle \mu - \nu, v \rangle|, \qquad S_{\mathfrak{A}}(\mu) = \sup_{f \in \mathscr{L}(\mathfrak{A})} [\langle \mu, f \rangle - H(f)],$$
where $\mathscr{L}(\mathfrak{A})$ is the linear hull of $\mathfrak{A}$. Moreover, we write $\Phi_{\mathfrak{A}}(s) = \{\mu \in V : S_{\mathfrak{A}}(\mu) \le s\}$, $s \in [0, \infty)$. (We note that $\rho_{\mathfrak{A}}(\mu, \nu) = 0$ does not necessarily imply that $\mu = \nu$.) For any $\delta > 0$ and $s \ge 0$ we have the inequality
$$\inf\{S(\nu) : \rho_{\mathfrak{A}}(\nu, \mu) < \delta\} \le S_{\mathfrak{A}}(\mu) \qquad (2.6)$$
and the inclusion
$$\Phi_{\mathfrak{A}}(s) \subseteq \{\Phi(s)\}^{+\delta} = \{\mu \in V : \rho_{\mathfrak{A}}(\mu, \Phi(s)) < \delta\}. \qquad (2.7)$$
We outline the proofs of these properties (complete proofs may be found in Gärtner [3]). B.1 may be proved very simply; we do not even have to use assumptions A.1, A.2. For example, if $\mu(A) < 0$ for some $A \in \mathscr{B}$, then
$$S(\mu) \ge \sup_{y \le 0} [\langle \mu, y\chi_A \rangle - H(y\chi_A)] \ge \sup_{y \le 0} y\mu(A) = \infty.$$
Here we have used the fact that $H(f) \le 0$ for $f \le 0$.
B.2. If $S(\mu) \le s$, then we use the fact that the supremum (2.5) is not smaller than the value of the expression in square brackets for $f = y\chi_A$, $y > 0$. We obtain
$$\mu(A) \le \frac{H(y\chi_A) + s}{y}.$$
The boundedness follows from this immediately. To verify the uniform absolute continuity, it is sufficient to prove that for any $\varepsilon > 0$ there exists $\delta > 0$ such that $m(A) < \delta$ implies $\mu(A) < \varepsilon$ for all measures $\mu \in \Phi(s)$. We choose $y$ such that $s/y$ is smaller than $\varepsilon/2$; condition A.2 implies that there exists $\delta > 0$ such that $H(y\chi_A) < y\varepsilon/2$ if $m(A) < \delta$.
B.3. The compactness of $\Phi(s)$ in the $B^*$-weak topology follows from B.1
and B.2. It only remains to be shown that $\mu_n, \mu \in \Phi(s)$ and $\mu_n \to \mu$ in the $B^*$-weak topology imply that $\rho(\mu_n, \mu) \to 0$. This can be deduced from (1) by using the uniform absolute continuity of the $\mu_n$ and $\mu$.
B.4. According to A.1 the derivative $H'(f)(h) = \lim_{y \to 0} y^{-1}(H(f + yh) - H(f))$ exists, which is a linear functional of $h \in B$. The equality $\mu_f(A) = H'(f)(\chi_A)$ defines a finite positive countably additive (by virtue of assumption A.2) measure on $(E, \mathscr{B})$. It follows from the properties of convex functions
(cf. Rockafellar [1]; we have already used these properties in the proof of Lemma 5.2 of Ch. 7) that it is sufficient to prove inequality (2.6) for measures $\mu$ for which the supremum in the definition of $S_{\mathfrak{A}}(\mu)$ is attained for some function $f_0 \in \mathscr{L}(\mathfrak{A})$. Then for all $h \in \mathscr{L}(\mathfrak{A})$ we have $\langle \mu, h \rangle = H'(f_0)(h) = \langle \mu_{f_0}, h \rangle$. It is easy to deduce from the definition of $\mu_{f_0}$ that the upward convex functional $G(f) = \langle \mu_{f_0}, f \rangle - H(f)$, $f \in B$, attains its maximum for $f = f_0$. From this we obtain
$$S(\mu_{f_0}) = \langle \mu_{f_0}, f_0 \rangle - H(f_0) = \langle \mu, f_0 \rangle - H(f_0) = S_{\mathfrak{A}}(\mu),$$
where $\rho_{\mathfrak{A}}(\mu_{f_0}, \mu) = 0$ ($\rho_{\mathfrak{A}}$ is a semimetric!). This proves inequality (2.6) and along with it, inclusion (2.7). Now we formulate the fundamental result of this section.
Theorem 2.1. Let the metric $\rho$ satisfy condition (1), let the limit (2.4) exist, and let the functional $H(f)$ satisfy conditions A.1 and A.2. Then $\lambda(h)S(\mu)$ is the action functional for the family of random measures $\pi^h$ in the metric space $(V, \rho)$ as $h \downarrow 0$; i.e., for any $\gamma, \delta, s > 0$ and $\mu \in V$ we have
$$P\{\rho(\pi^h, \mu) < \delta\} \ge \exp\{-\lambda(h)[S(\mu) + \gamma]\}, \qquad (2.8)$$
$$P\{\rho(\pi^h, \Phi(s)) \ge \delta\} \le \exp\{-\lambda(h)[s - \gamma]\} \qquad (2.9)$$
for sufficiently small $h$, where $\Phi(s) = \{\mu \in V : S(\mu) \le s\}$.
Proof. First we obtain estimate (2.8). If S(μ) = +∞, there is nothing to prove. Therefore, we assume that S(μ) < ∞. We use condition (1) for δ₁ > 0 and obtain the estimate

  ρ(π^h, μ) ≤ ρ_𝔄(π^h, μ) + max_{w∈𝔅} ⟨π^h, w⟩ + max_{w∈𝔅} ⟨μ, w⟩.   (2.10)

Since by B.2 the measure μ is absolutely continuous with respect to m, the last term on the right side of (2.10) is smaller than δ/4 for sufficiently small δ₁. Consequently,

  P{ρ(π^h, μ) < δ} ≥ P{ρ_𝔄(π^h, μ) < δ/2} − P{max_{w∈𝔅} ⟨π^h, w⟩ > δ/4}.   (2.11)
10. Sharpenings and Generalizations
Applying Theorem 1.2 of Ch. 5 to the family of finite-dimensional vectors η^h = (⟨π^h, v₁⟩, …, ⟨π^h, v_n⟩) as h ↓ 0, and taking into account that in a finite-dimensional space all norms are equivalent, we obtain the estimate

  P{ρ_𝔄(π^h, μ) < δ/2} ≥ exp{−λ(h)[S_𝔄(μ) + γ/2]} ≥ exp{−λ(h)[S(μ) + γ/2]}.   (2.12)

Now we estimate the subtrahend in (2.11). The exponential Chebyshev inequality yields

  P{⟨π^h, w⟩ > δ/4} ≤ exp{−κλ(h)δ/4} · M exp{κλ(h)⟨π^h, w⟩}
      = exp{−λ(h)[κδ/4 − λ(h)⁻¹ ln M exp{λ(h)⟨π^h, κw⟩}]}

for any κ > 0. The expression in square brackets converges to κδ/4 − H(κw) as h ↓ 0. If we choose κ sufficiently large and then decrease δ₁ so that H(κw) is sufficiently small for all w ∈ 𝔅 (this can be done by virtue of condition A.2), we obtain κδ/4 − H(κw) > S(μ) + γ for all w ∈ 𝔅. This implies that

  P{⟨π^h, w⟩ > δ/4} ≤ exp{−λ(h)[S(μ) + γ]}   (2.13)

for sufficiently small h > 0. Estimate (2.8) follows from (2.11)–(2.13).

Now we deduce (2.9). We use condition (1) again. By virtue of B.2 we have max_{w∈𝔅} ⟨μ, w⟩ < δ/4 for sufficiently small positive δ₁ and for all μ ∈ Φ(s). From this and (2.10) we conclude that

  P{ρ(π^h, Φ(s)) ≥ δ} ≤ P{ρ_𝔄(π^h, Φ(s)) ≥ δ/2} + P{max_{w∈𝔅} ⟨π^h, w⟩ > δ/4}.   (2.14)

Let the functions v₁, …, v_n ∈ 𝔄 form a basis of the linear space ℒ(𝔄). To estimate the first term on the right side of (2.14), we use inclusion (2.7) and apply Theorem 1.1 of Ch. 5 to the family of finite-dimensional vectors η^h = (π^h(v₁), …, π^h(v_n)):

  P{ρ_𝔄(π^h, Φ(s)) ≥ δ/2} ≤ P{ρ_𝔄(π^h, Φ_𝔄(s)) ≥ δ/4} ≤ exp{−λ(h)(s − γ/2)}.

The second term in (2.14) can be estimated by means of (2.13), and we obtain estimate (2.9) for small h. □

We consider an example. Let (ξ_t, P_x) be a diffusion process on a compact manifold E of class C^∞, governed by an elliptic differential operator L
with infinitely differentiable coefficients. Let us consider the family of random measures μ_T defined by formula (2.1). We verify that this family satisfies the condition that the limit (2.4) exists, where instead of h ↓ 0 the parameter T goes to ∞, and as the function λ we take T: for any function f ∈ B, the finite limit

  H(f) = lim_{T→∞} T⁻¹ ln M_x exp{∫₀ᵀ f(ξ_s) ds}   (2.15)

exists. As in the proof of Theorem 4.2 of Ch. 7, for this we note that the family of operators

  T_t^f g(x) = M_x g(ξ_t) exp{∫₀ᵗ f(ξ_s) ds}

forms a semigroup acting in B. If g(x) is a nonnegative function belonging to B and assuming positive values on some open set, then the function T_t^f g(x) is strictly positive for any t > 0. As in the proof of Theorem 4.2 of Ch. 7, we may deduce from this that the limit (2.15) exists and is equal to λ(f), the logarithm of the spectral radius of T_1^f (cf., for example, Kato [1]). The fulfilment of conditions A.1 and A.2 follows from results of perturbation theory. Moreover, it is easy to prove that the supremum in the definition of S(μ) may be taken not over all functions belonging to B but only over continuous ones:

  S(μ) = sup_{f∈C} [⟨μ, f⟩ − λ(f)].   (2.16)

For continuous functions f, the logarithm of the spectral radius of T_1^f coincides with that eigenvalue of the infinitesimal generator A^f of the semigroup T_t^f which has the largest real part; this eigenvalue is real and simple. For smooth functions, A^f coincides with the elliptic differential operator L + f. This implies that by λ(f) in formula (2.16) we may understand the maximal eigenvalue of L + f.

The random measure μ_T is a probability measure for every T > 0. According to B.1, the functional S(μ) is finite only for probability measures μ. Since λ(f + c) = λ(f) + c for any constant c, the supremum in (2.16) may be taken only over those f ∈ C for which λ(f) = 0. On the other hand, the functions f for which λ(f) = 0 are exactly those which can be represented in the form −Lu/u, u > 0. Consequently, for probability measures μ, the definition of S(μ) may be rewritten in the form

  S(μ) = −inf_{u>0} ⟨μ, Lu/u⟩.   (2.17)
For measures μ admitting a smooth positive density with respect to the Riemannian volume m induced by the Riemannian metric connected with the principal terms of L, we may write out Euler's equation for an extremal of problem (2.17) and obtain an expression for S(μ) not containing the infimum. In particular, if L is self-adjoint with respect to m, then the equation for an extremal can be solved explicitly: the infimum in (2.17) is attained at u = √(dμ/dm), and S(μ) can be written in the form

  S(μ) = −∫_E √(dμ/dm) · L(√(dμ/dm)) dm.
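For a finite state space, formula (2.17) and its self-adjoint specialization can be checked numerically. The sketch below is illustrative only: it invents a reversible 3-state generator L with stationary measure m and a measure μ, and verifies that the supremum of −⟨μ, Lu/u⟩ over u > 0 is attained at u = √(dμ/dm) and equals the Dirichlet-form expression above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Reversible 3-state generator L with stationary measure m (detailed balance:
# m_i L_ij = m_j L_ji); K is a symmetric matrix of "conductances".
m = np.array([0.2, 0.3, 0.5])
K = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.5],
              [2.0, 1.5, 0.0]])
L = K / m[:, None]                  # off-diagonal rates L_ij = K_ij / m_i
np.fill_diagonal(L, -L.sum(axis=1))  # rows of a generator sum to zero

mu = np.array([0.5, 0.2, 0.3])       # the probability measure to be rated

def F(u):
    """The functional -<mu, Lu/u>; S(mu) is its supremum over u > 0."""
    return -np.dot(mu, (L @ u) / u)

f = mu / m                           # density d(mu)/dm
u_star = np.sqrt(f)                  # maximizer u* = sqrt(d(mu)/dm)
S_dirichlet = -np.dot(m * u_star, L @ u_star)   # -<sqrt(f), L sqrt(f)>_m

assert abs(F(u_star) - S_dirichlet) < 1e-10
for _ in range(2000):                # u* dominates random positive competitors
    u = np.exp(rng.normal(size=3))
    assert F(u) <= S_dirichlet + 1e-9
print(S_dirichlet)
```

The check mirrors the Donsker–Varadhan form of the rate functional; the specific matrices and the measure μ are arbitrary choices.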
The functional S(μ) may be calculated analogously for the measures μ_T in the case where (ξ_t, P_x) is a process in a bounded domain with reflection at the boundary (Gärtner [2]).
§3. Processes with Small Diffusion with Reflection at the Boundary

In Chapters 2–6 we considered the application of methods of probability theory to the first boundary value problem for differential equations with a small parameter at the derivatives of highest order. There naturally arises the question of what can be done for the second boundary value problem, or for a mixed boundary value problem, where Dirichlet conditions are given on some parts of the boundary and Neumann conditions on other parts.

Let the smooth boundary of a bounded domain D consist of two components, ∂₁D and ∂₂D. We shall consider the boundary value problem

  L^ε u^ε(x) = (ε²/2) Σ_{i,j} a^{ij}(x) ∂²u^ε/∂x^i∂x^j + Σ_i b^i(x) ∂u^ε/∂x^i = 0,   x ∈ D,
  ∂u^ε/∂l |_{∂₁D} = 0,   u^ε(x)|_{∂₂D} = f(x),   (3.1)

where ∂/∂l is the derivative in some nontangential direction; the coefficients of the equation, the direction l, and the function f are assumed to be sufficiently smooth functions of x. This problem has a unique solution, which can be written in the form u^ε(x) = M_x f(X^ε_{τ^ε}), where (X^ε_t, P_x) is the diffusion process in D ∪ ∂₁D governed by L^ε in D and undergoing reflection in the direction l on the part ∂₁D of the boundary; τ^ε = min{t: X^ε_t ∈ ∂₂D} (Fig. 32). The asymptotics of the solution u^ε may be deduced from results involving
Figure 32.
the limit behavior of X^ε_t for small ε. We begin with results of the type of laws of large numbers.

Along with X^ε_t, we consider the dynamical system ẋ_t = b(x_t) in D, obtained from X^ε for ε = 0. It follows from results of Ch. 2 that with probability close to one for small ε, the trajectory of X^ε beginning at a point x ∈ D is close to the trajectory x_t(x) of the dynamical system until the time of exit of x_t(x) to the boundary (if this time is finite). From this we obtain the following: if x_t(x) goes to ∂₂D sooner than to ∂₁D and at the place y(x) of exit the field b is directed strictly outside the domain, then the value u^ε(x) of the solution of problem (3.1) at the given point x ∈ D converges to f(y(x)) as ε → 0.

If X^ε begins at a point x ∈ ∂₁D (or, moving near the trajectory of the dynamical system, reaches ∂₁D sooner than ∂₂D), then its most probable behavior depends on whether b(x) is directed strictly inside or outside the domain. In the first case, X^ε_t will be close to the trajectory, issued from x ∈ ∂₁D, of the same dynamical system ẋ_t = b(x_t). From this we obtain the following result.
Theorem 3.1. Let the field b be directed strictly inside D on ∂₁D and strictly outside D on ∂₂D. Let all trajectories of the dynamical system ẋ_t = b(x_t), beginning at points x ∈ D ∪ ∂₁D, exit from D (naturally, through ∂₂D). Then the solution u^ε(x) of problem (3.1) converges to the solution u⁰(x) of the degenerate problem

  Σ_i b^i(x) ∂u⁰/∂x^i = 0,   x ∈ D,   u⁰(x)|_{∂₂D} = f(x),   (3.2)
uniformly for x ∈ D ∪ ∂₁D as ε → 0.

If b is directed strictly outside D on ∂₁D, then the trajectory of the dynamical system goes out of D ∪ ∂₁D through ∂₁D, and X^ε_t is deprived of the opportunity of following it. It turns out that in this case X^ε_t begins to move along a trajectory of the system ẋ_t = b̃(x_t) on ∂₁D, where b̃ is obtained by projecting b, parallel to the direction l, onto the tangential direction. This result also admits a formulation in the language of partial differential equations, but in a situation different from the case of a boundary consisting of two components considered here (cf. Freidlin [2]).

In the case where the trajectories of the dynamical system do not leave D through ∂₂D but instead enter D through ∂₂D, results of the type of laws of large numbers are not sufficient for the determination of lim_{ε→0} u^ε(x). A study of large deviations for diffusion processes with reflection and of the asymptotics of solutions of the corresponding boundary value problems was carried out in Zhivoglyadova and Freidlin [1] and Anderson and Orey [1]. We present the results of these publications briefly.
We discuss Anderson's and Orey's [1] construction enabling one to obtain a diffusion process with reflection by means of stochastic equations, beginning with a Wiener process. We restrict ourselves to the case of a process in the half-plane R²₊ = {(x¹, x²): x¹ ≥ 0} with reflection along the normal direction at the boundary, i.e., along the x¹-direction. We introduce a mapping Γ of C_{0T}(R²) into C_{0T}(R²₊): for ζ ∈ C_{0T}(R²) we define the value η_t = Γ_t(ζ) of the function η = Γ(ζ) by the equalities

  η_t¹ = ζ_t¹ − min(0, min_{0≤s≤t} ζ_s¹),   η_t² = ζ_t².   (3.3)

It is clear that Γ(ζ) ∈ C_{0T}(R²₊), the mapping Γ is continuous, and |η_t − η̃_t| ≤ 2 max_{0≤s≤t} |ζ_s − ζ̃_s|. We would like to construct a diffusion process with reflection at the boundary of R²₊, with diffusion matrix (a^{ij}(x)) and drift b(x) = (b¹(x), b²(x)).
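The map Γ of (3.3) is a running-minimum construction, and its basic properties are easy to test on sampled paths. The sketch below is illustrative only (the sample paths are invented); it checks that Γ(ζ) stays in R²₊ and that Γ is the identity on paths that never cross the boundary.

```python
import numpy as np

def gamma(xi):
    """Discrete analogue of (3.3): reflect the first coordinate at 0.

    xi: array of shape (n, 2), a sampled path in R^2 with xi[0, 0] >= 0.
    """
    eta = xi.copy()
    running_min = np.minimum.accumulate(xi[:, 0])
    eta[:, 0] = xi[:, 0] - np.minimum(0.0, running_min)
    return eta

t = np.linspace(0.0, 1.0, 201)
# A path whose first coordinate dips below the boundary x^1 = 0:
xi = np.stack([np.cos(4 * t) - 0.5, np.sin(3 * t)], axis=1)
eta = gamma(xi)

assert (eta[:, 0] >= -1e-12).all()        # reflected path stays in R^2_+
assert np.allclose(eta[:, 1], xi[:, 1])   # second coordinate untouched
# Gamma is the identity on paths that never leave R^2_+:
xi_pos = np.stack([1.0 + 0.2 * np.sin(5 * t), t], axis=1)
assert np.allclose(gamma(xi_pos), xi_pos)
```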
We represent the diffusion matrix in the form (a^{ij}(x)) = σ(x)σ*(x), where σ(x) is a bounded matrix-valued function with entries satisfying a Lipschitz condition (such a matrix exists if the a^{ij}(x), together with their derivatives, are bounded and continuous). We define the functionals a_t(ζ) and b_t(ζ) on C_{0T}(R²) by putting

  a_t(ζ) = σ(Γ_t(ζ)),   b_t(ζ) = b(Γ_t(ζ)).

We consider the stochastic equation

  X̃_t = x + ∫₀ᵗ a_s(X̃) dw_s + ∫₀ᵗ b_s(X̃) ds,   x ∈ R²₊.
The existence and uniqueness theorem for such equations can be proved in the same way as for standard stochastic differential equations. It turns out that the random process X_t = Γ_t(X̃) is exactly the diffusion process with reflection in R²₊ having the given diffusion and drift coefficients.
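A hedged numerical sketch of this construction: an Euler scheme for the unreflected equation, with the coefficients evaluated along the reflected path (as in the functionals a_t, b_t), and the reflection applied through the running minimum of the first coordinate. The drift, diffusion coefficient, and all parameters below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def b(x):
    # illustrative drift pushing toward the boundary x^1 = 0 (x unused here)
    return np.array([-1.0, 0.5])

def sigma(x):
    # illustrative constant diffusion matrix
    return 0.3 * np.eye(2)

T, n = 1.0, 2000
dt = T / n
x_tilde = np.array([0.2, 0.0])   # unreflected path, started inside R^2_+
run_min = min(0.0, x_tilde[0])
x = x_tilde.copy()               # reflected state X_t = Gamma_t(tilde X)
path = []
for _ in range(n):
    dw = rng.normal(scale=np.sqrt(dt), size=2)
    # coefficients are functionals of the *reflected* path:
    x_tilde = x_tilde + sigma(x) @ dw + b(x) * dt
    run_min = min(run_min, x_tilde[0])
    x = np.array([x_tilde[0] - min(0.0, run_min), x_tilde[1]])
    path.append(x)
path = np.array(path)
assert (path[:, 0] >= 0).all()   # reflected process never leaves the half-plane
print(path[:, 0].min(), path[:, 0].max())
```

Despite the inward-pointing drift, the first coordinate stays nonnegative by construction.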
Now let us assume that there is a small parameter in the equation. For the sake of simplicity, we restrict ourselves to the case where σ is the identity matrix multiplied by ε:

  X̃^ε_t = x + εw_t + ∫₀ᵗ b_s(X̃^ε) ds;   (3.4)

the corresponding process with reflection will also be equipped with the index ε: X^ε_t = Γ_t(X̃^ε). It can be proved that the solution of equation (3.4) can be obtained by applying a continuous mapping B_x to the function εw ∈ C_{0T}(R²) (the proof is the same as that of Lemma 1.1 of Ch. 4). Hence the process X^ε with reflection can be obtained from εw by means of the composition of two continuous mappings: X^ε = Γ(B_x(εw)). Using general properties of the action functional (§3, Ch. 3), we obtain that the action functional for the family of processes X^ε_t in C_{0T} as ε → 0 has the form ε⁻²S_{0T}(φ), where

  S_{0T}(φ) = min_{χ: Γ(B_x(χ)) = φ} (1/2) ∫₀ᵀ |χ̇_s|² ds.   (3.5)
On the other hand, B_x is invertible:

  (B_x⁻¹(ψ))_t = ψ_t − x − ∫₀ᵗ b_s(ψ) ds;

taking account of this, expression (3.5) can be rewritten as

  S_{0T}(φ) = min_{ψ: Γ(ψ) = φ} (1/2) ∫₀ᵀ |ψ̇_s − b_s(ψ)|² ds   (3.6)

(we use the fact that b_s(ψ) = b(Γ_s(ψ)) = b(φ_s)). It is easy to verify that the minimum in (3.6) is equal to

  S_{0T}(φ) = (1/2) ∫₀ᵀ |φ̇_s − b̃(φ_s)|² ds,   (3.7)

where b̃(x) is the field coinciding with b(x) everywhere except at those points of the boundary at which b(x) is directed outside R²₊; at these points b̃(x) is defined as the projection of b(x) onto the direction of the boundary. The minimum is attained at the function ψ defined by the equalities

  ψ_t² = φ_t²,   ψ_t¹ = φ_t¹ + min(0, ∫₀ᵗ χ_{{0}}(φ_s¹) b¹(0, φ_s²) ds).   (3.8)
A formula analogous to (3.7) also holds for a varying matrix (a^{ij}(x)) and for processes in an arbitrary domain D with reflection along any smooth field l nontangent to the boundary as well.

Results involving the action functional imply, in particular, that as ε → 0 the trajectory of X^ε_t converges in probability to a function at which S_{0T} vanishes, i.e., to a solution of the system ẋ_t = b̃(x_t).
Now we present those results involving the asymptotics of solutions of problem (3.1) which follow from the above result (cf. Zhivoglyadova and Freidlin [1]). Let D be the ring in the plane having the form {(r, θ): 1 < r < 2} in the polar coordinates (r, θ). We consider the problem

  L^ε u^ε = (ε²/2)(∂²u^ε/∂r² + ∂²u^ε/∂θ²) + b_r(r, θ) ∂u^ε/∂r + b_θ(r, θ) ∂u^ε/∂θ = 0,   x ∈ D,
  ∂u^ε/∂r (1, θ) = 0,   u^ε(2, θ) = f(θ).
We shall say that condition 1 is satisfied if the trajectories, beginning in D, of the dynamical system ẋ_t = b(x_t) go to ∂₁D = {r = 1} sooner than to ∂₂D = {r = 2} and b_r(1, θ) < 0 for all θ. We assume that on the interval [0, 2π] the function b_θ(1, θ) has a finite number of zeros. Let K₁, K₂, …, K_l be those of them at which b_θ(1, θ) changes its sign from plus to minus as θ increases. We consider

  V(K_i, y) = inf{S_{0T}(φ): φ₀ = K_i, φ_T = y, T > 0}

and assume that for any i, the minimum of V(K_i, y) over y ∈ ∂₂D is attained at a unique point y_i = (2, θ̂_i). We put V(K_i, ∂₂D) = V(K_i, y_i).
Theorem 3.2. Let the above conditions be satisfied. Let g* be the unique {∂₂D}-graph over the set of symbols {K₁, …, K_l, ∂₂D} at which the minimum of Σ_{(α→β)∈g} V(α, β) over all {∂₂D}-graphs g is attained. Let the trajectory, beginning at a point x ∈ D, of the system ẋ_t = b(x_t) go out to the circle ∂₁D at the point (1, θ_x); for x ∈ ∂₁D, as θ_x we take the angular coordinate of x. Let K_i be the point to which the solution of the equation θ̇_t = b_θ(1, θ_t) with initial condition θ_x is attracted as t → ∞. Then lim_{ε→0} u^ε(x) = f(θ̂_k), where K_k → ∂₂D is the last arrow in the path leading from K_i to ∂₂D in g*.

The proof of this theorem can be carried out in the same way as that of Theorem 5.2 of Ch. 6.
§4. Wave Fronts in Semilinear PDEs and Large Deviations

Consider the Cauchy problem

  ∂u(t, x)/∂t = (D/2) ∂²u/∂x² + cu,   t > 0, x ∈ R¹,
  u(0, x) = χ⁻(x) = { 1, x < 0;  0, x ≥ 0 }.   (4.1)

Here c and D are positive constants. Let X_t be the process in R¹ corresponding to (D/2)(d²/dx²): X_t = x + √D W_t. Then we have

  u(t, x) = M_x χ⁻(X_t) e^{ct} = e^{ct} P{x + √D W_t < 0} = e^{ct} (1/√(2πt)) ∫_{−∞}^{−x/√D} e^{−y²/2t} dy.
One can easily derive from this equality that for any α > 0

  u(t, αt) ≍ exp{t(c − α²/2D)},   t → ∞.

If α* = √(2cD), then

  lim_{t→∞} sup_{x ≥ (α*+h)t} u(t, x) = 0,   lim_{t→∞} inf_{x ≤ (α*−h)t} u(t, x) = ∞   (4.2)
for any h > 0. This means that the region of large values of u(t, x) propagates, roughly speaking, with the speed α* = √(2cD).

Now let the constant c in (4.1) be replaced by a smooth function c(u) such that

  c(u) > 0 for u < 1,   c(u) < 0 for u > 1,   c = c(0) = max_{u≥0} c(u).   (4.3)
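Before turning to the nonlinear problem, the growth rates in (4.2) for the linear problem (4.1) can be checked directly from the explicit Gaussian formula; the parameter values in this sketch are arbitrary.

```python
import math

def log_u(t, x, c=1.0, D=1.0):
    # ln u(t, x) for the linear problem (4.1):
    # u(t, x) = e^{ct} P{x + sqrt(D) W_t < 0} = e^{ct} * Phi(-x / sqrt(D t))
    z = x / math.sqrt(D * t)
    return c * t + math.log(0.5 * math.erfc(z / math.sqrt(2.0)))

c, D = 1.0, 1.0
t = 300.0
for alpha in (0.5, 1.0, 2.0):
    rate = log_u(t, alpha * t) / t      # should approach c - alpha^2 / (2 D)
    assert abs(rate - (c - alpha ** 2 / (2.0 * D))) < 0.05

print(math.sqrt(2.0 * c * D))           # critical speed alpha* = sqrt(2cD)
```

For alpha below alpha* the rate is positive (growth), above alpha* it is negative (decay), matching the two limits in (4.2).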
Consider the problem

  ∂u(t, x)/∂t = (D/2) ∂²u/∂x² + f(u),   t > 0, x ∈ R¹,
  u(0, x) = χ⁻(x),   (4.4)

where f(u) = c(u)u.

Applying the Feynman–Kac formula, one can obtain the following equation for u(t, x):

  u(t, x) = M_x χ⁻(X_t) exp{∫₀ᵗ c(u(t − s, X_s)) ds}.   (4.5)

Since, according to (4.3), c(u) ≤ c(0) = c, we conclude from (4.5) that 0 ≤ u(t, x) ≤ e^{ct} P_x{X_t < 0}, and thus the first of the equalities (4.2) holds for the solution of the nonlinear problem. The second equality of (4.2), of course, does not hold for the solution of problem (4.4): since c(u) < 0 for u > 1, the solution never exceeds 1. But, as we show later, one can prove using the large deviation estimates that lim_{t→∞} inf_{x ≤ (α*−h)t} u(t, x) = 1 for any h > 0, where u(t, x) is the solution of (4.4).
Equation (4.4) appeared first in Fisher [1] and in Kolmogorov, Petrovskii, and Piskunov [1]. The rigorous results for nonlinearities f(u) = c(u)u satisfying (4.3) were obtained in the second of these papers; therefore, this nonlinearity is called KPP-type nonlinearity. It was shown in Kolmogorov, Petrovskii, and Piskunov [1] that the solution of problem (4.4) with a KPP-type nonlinear term behaves for large t, roughly speaking, as a running wave:

  u(t, x) ≈ v(x − α*t),   t → ∞,

where α* = √(2Dc), c = f′(0), and v(z) is the unique solution of the problem

  (D/2) v″(z) + α* v′(z) + f(v(z)) = 0,   −∞ < z < ∞,
  v(−∞) = 1,   v(+∞) = 0,   v(0) = 1/2.
Thus, the asymptotic behavior of u(t, x) for large t is characterized by the asymptotic speed α* and by the shape of the wave v(z). Instead of considering large t and x, one can introduce a small parameter ε > 0: put u^ε(t, x) = u(t/ε, x/ε). Then the function u^ε(t, x) is the solution of the problem

  ∂u^ε(t, x)/∂t = (εD/2) ∂²u^ε/∂x² + (1/ε) f(u^ε),
  u^ε(0, x) = χ⁻(x).   (4.6)

One can expect that u^ε(t, x) = u(t/ε, x/ε) ≈ v((x − α*t)/ε) ≈ χ⁻(x − α*t) as ε ↓ 0. This means that the first approximation has to do only with the asymptotic speed α* and does not involve the asymptotic shape v(z).

Consider now a generalization of problem (4.6) (see Freidlin [12], [14], [17], [18]):

  ∂u^ε(t, x)/∂t = (ε/2) Σ_{i,j=1}^r a^{ij}(x) ∂²u^ε/∂x^i∂x^j + (1/ε) f(x, u^ε),   t > 0, x ∈ R^r,
  u^ε(0, x) = g(x) ≥ 0.   (4.7)
A number of interesting problems, like the small diffusion asymptotics for reaction-diffusion equations (RDEs) with slowly changing coefficients, can be reduced to (4.7) after a proper time and space rescaling. We assume that the coefficients a^{ij}(x) and f(x, u) are smooth enough and that a|λ|² ≤ Σ a^{ij}(x)λ_iλ_j ≤ ā|λ|² for some 0 < a < ā; the function f(x, u) = c(x, u)u, where c(x, u) satisfies conditions (4.3) for any x ∈ R^r (in other words, f(x, u) is of KPP-type for any x ∈ R^r). Denote by G₀ the support of the initial function g(x). We assume that 0 ≤ g(x) ≤ ḡ < ∞ and that the closure [G₀] of G₀ coincides with the closure of the interior (G₀) of G₀: [(G₀)] = [G₀]. Moreover, assume that g(x) is continuous in (G₀).

Denote by (X^ε_t, P_x) the diffusion process in R^r governed by the operator

  L^ε = (ε/2) Σ_{i,j=1}^r a^{ij}(x) ∂²/∂x^i∂x^j.

The Feynman–Kac formula yields an equation for u^ε(t, x):

  u^ε(t, x) = M_x g(X^ε_t) exp{(1/ε) ∫₀ᵗ c(X^ε_s, u^ε(t − s, X^ε_s)) ds}.   (4.8)

First, assume that c(x, 0) ≡ c = constant. Then, taking into account that c(x, u) ≤ c(x, 0) = c, we derive from (4.8)

  0 ≤ u^ε(t, x) ≤ M_x g(X^ε_t) e^{ct/ε} ≤ ḡ e^{ct/ε} P_x{X^ε_t ∈ G₀}.   (4.9)
The action functional for the family of processes X^ε_s, 0 ≤ s ≤ t, in the space of continuous functions is equal to ε⁻¹S_{0t}(φ), where

  S_{0t}(φ) = (1/2) ∫₀ᵗ Σ_{i,j=1}^r a_{ij}(φ_s) φ̇^i_s φ̇^j_s ds

for absolutely continuous φ, with (a_{ij}(x)) = (a^{ij}(x))⁻¹, and

  lim_{ε↓0} ε ln P_x{X^ε_t ∈ G₀} = −inf{S_{0t}(φ): φ ∈ C_{0t}, φ₀ = x, φ_t ∈ G₀}.   (4.10)

Denote by ρ(x, y) the Riemannian distance in R^r corresponding to the form ds² = Σ_{i,j=1}^r a_{ij}(x) dx^i dx^j. One can check that the infimum in (4.10) is actually equal to (1/2t)ρ²(x, G₀). We derive from (4.9) and (4.10)

  0 ≤ u^ε(t, x) ≤ ḡ e^{ct/ε} P_x{X^ε_t ∈ G₀} ≈ ḡ exp{(1/ε)(ct − ρ²(x, G₀)/2t)}.

It follows from the last bound that

  lim_{ε↓0} u^ε(t, x) = 0,   if ρ(x, G₀) > t√(2c), t > 0.   (4.11)
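In one dimension with a ≡ 1 and G₀ = (−∞, 0], the probability in (4.10) is Gaussian, so the logarithmic asymptotics ε ln P_x{X^ε_t ∈ G₀} → −ρ²(x, G₀)/2t can be checked explicitly. A sketch with arbitrarily chosen t and x (the large-z branch uses the standard asymptotic expansion of the Gaussian tail to avoid underflow):

```python
import math

def log_phi(z):
    # ln Phi(-z) for z >= 0, switching to the tail expansion for large z
    if z < 20.0:
        return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))
    return -0.5 * z * z - math.log(z * math.sqrt(2.0 * math.pi))

t, x = 1.0, 2.0                  # rho(x, G0) = 2 for G0 = (-inf, 0]
target = -x * x / (2.0 * t)      # = -2
for eps in (1e-2, 1e-3, 1e-4):
    # X^eps_t = x + sqrt(eps) W_t, so P_x{X^eps_t <= 0} = Phi(-x / sqrt(eps t))
    val = eps * log_phi(x / math.sqrt(eps * t))
    assert abs(val - target) < 0.05
print(target)
```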
We now outline the proof of the equality

  lim_{ε↓0} u^ε(t, x) = 1,   if ρ(x, G₀) < t√(2c), t > 0.   (4.12)

Let us first prove that

  lim_{ε↓0} ε ln u^ε(t, x) = 0,   if ρ(x, G₀) = t√(2c), t > 0.   (4.13)

If K is a compact set in {t > 0, x ∈ R^r}, then the convergence in (4.13) is uniform in (t, x) ∈ K ∩ {ρ(x, G₀) = t√(2c)}. One can derive from the maximum principle that

  lim sup_{ε↓0} u^ε(t, x) ≤ 1,   t > 0, x ∈ R^r.   (4.14)

Therefore, to prove (4.13) it is sufficient to show that

  lim inf_{ε↓0} ε ln u^ε(t, x) ≥ 0,   if ρ(x, G₀) = t√(2c), t > 0.   (4.15)
Let φ̂_s, 0 ≤ s ≤ t, be the shortest geodesic connecting x and [G₀], with the parameter s proportional to the Riemannian distance from x. For any small δ > 0, choose z ∈ (G₀), |z − φ̂_t| < δ, and define

  ψ_s = x for 0 ≤ s ≤ δ,   ψ_s = φ̂_{s−δ} for δ < s ≤ t − δ,   ψ_s = h_s for t − δ < s ≤ t,

where h_s is the linear function such that h_{t−δ} = φ̂_{t−2δ} and h_t = z. Denote by E_μ, μ > 0, the μ-neighborhood in C_{0t} of the function ψ, and by χ_μ the indicator of this set in C_{0t}. Let μ > 0 be chosen so small that the ball in R^r of radius 2μ centered at z belongs to (G₀) and for any ψ̃ ∈ E_μ

  min_{δ≤s≤t−δ} d(ψ̃_s, G_{t−s}) = d₀ > 0,

where d(·,·) is the Euclidean distance in R^r and G_s = {x ∈ R^r: ρ(x, G₀) ≤ s√(2c)}. Such a choice of μ is possible since d(ψ_s, G_{t−s}) ≥ d₁ > 0 for δ ≤ s ≤ t − δ. Then (4.11) implies that u^ε(t − s, ψ̃_s) → 0 as ε ↓ 0 for δ ≤ s ≤ t − δ and ψ̃ ∈ E_μ, and thus c(ψ̃_s, u^ε(t − s, ψ̃_s)) → c(ψ̃_s, 0) = c as ε ↓ 0 for such ψ̃ and s. Since S_{0t}(φ̂) = ρ²(x, G₀)/2t, one can derive from the definition of ψ that for any h > 0 there exists δ > 0 so small that

  S_{0t}(ψ) − ρ²(x, G₀)/2t < h.

Let g₀ = min{g(x): |x − z| ≤ μ}. It follows from (4.8) that

  u^ε(t, x) ≥ M_x g(X^ε_t) χ_μ(X^ε) exp{(1/ε) ∫₀ᵗ c(X^ε_s, u^ε(t − s, X^ε_s)) ds}
       ≥ g₀ e^{c(t−3δ)/ε} P_x{X^ε ∈ E_μ}
       ≥ g₀ e^{c(t−3δ)/ε} exp{−(1/ε)(ρ²(x, G₀)/2t + 2h)}   (4.16)

for ε > 0 small enough. In (4.16) we used the lower bound from the definition of the action functional. Taking into account that ρ(x, G₀) = t√(2c), we derive (4.15) from (4.16).
We now prove (4.12). Suppose it is not true. Then, taking into account (4.14), there exist x₀ ∈ R^r and t₀ > 0 such that ρ(x₀, G₀) < t₀√(2c) and

  lim inf_{ε↓0} u^ε(t₀, x₀) < 1 − 2h   (4.17)
for some h > 0. Let U = U^ε be the connected component of the set

  {(t, x): t > 0, x ∈ R^r, ρ(x, G₀) < t√(2c), u^ε(t, x) < 1 − h}

containing the point (t₀, x₀). The boundary ∂U of the set U ⊂ R^{r+1} consists of the sets ∂U₁ and ∂U₂:

  ∂U₁ = ∂U ∩ {(t, x): t > 0, x ∈ R^r, u^ε(t, x) = 1 − h},
  ∂U₂ = ∂U ∩ ({(t, x): t > 0, ρ(x, G₀) = t√(2c)} ∪ {(t, x): t = 0, x ∈ G₀}).

Define τ^ε = min{s: (t₀ − s, X^ε_s) ∈ ∂U} and let χ^i, i ∈ {1, 2}, be the indicator function of the set of trajectories X^ε for which (t₀ − τ^ε, X^ε_{τ^ε}) ∈ ∂U_i. Let Δ₁ be the distance in R^{r+1} between (t₀, x₀) and the surface {(t, x): t > 0, ρ(x, G₀) = t√(2c)}, and put Δ = Δ₁ ∧ t₀. Let χ₀ denote the indicator function of the set of trajectories X^ε such that max_{0≤s≤τ^ε} |X^ε_s − x₀| ≤ Δ. Using the strong Markov property, one can derive from (4.8)

  u^ε(t₀, x₀) = M_x u^ε(t₀ − τ^ε, X^ε_{τ^ε}) exp{(1/ε) ∫₀^{τ^ε} c(X^ε_s, u^ε(t₀ − s, X^ε_s)) ds}
       ≥ Σ_{i=1}^{2} M_x χ₀ χ^i u^ε(t₀ − τ^ε, X^ε_{τ^ε}) exp{(1/ε) ∫₀^{τ^ε} c(X^ε_s, u^ε(t₀ − s, X^ε_s)) ds}.   (4.18)

The first term of the sum on the right-hand side of (4.18) can be bounded from below by (1 − h)M_{x₀}χ₀χ¹, since u^ε(t, x) = 1 − h on ∂U₁ and c(x, u) ≥ 0. To estimate the second term, note that τ^ε ≥ Δ and

  min_{0≤s≤τ^ε} c(X^ε_s, u^ε(t₀ − s, X^ε_s)) ≥ min{c(x, u): 0 ≤ u ≤ 1 − h, |x − x₀| ≤ Δ} = c₀ > 0

if (t₀ − τ^ε, X^ε_{τ^ε}) ∈ ∂U₂. Thus, the second term on the right-hand side of (4.18) is greater than

  M_{x₀} χ₀ χ² e^{c₀Δ/ε} u^ε(t₀ − τ^ε, X^ε_{τ^ε}).

Using bound (4.13), we conclude that the last expression is bounded from below by M_{x₀}χ₀χ² if ε > 0 is small enough. Combining these bounds, one can derive from (4.18):

  u^ε(t₀, x₀) ≥ (1 − h)M_{x₀}χ₀χ¹ + M_{x₀}χ₀χ² ≥ 1 − h − M_{x₀}(1 − χ₀)

for ε small enough. Since M_{x₀}χ₀ → 1 as ε ↓ 0, the last inequality implies that ε₀ > 0 exists such that
  u^ε(t₀, x₀) > 1 − 2h for 0 < ε < ε₀.

This bound contradicts (4.17), and thus (4.12) holds.

Equalities (4.11) and (4.12) mean that, in the case c(x, 0) ≡ c = constant, the motion of the interface separating the areas where u^ε(t, x) is close to 0 and to 1 for 0 < ε ≪ 1 can be described by the Huygens principle with a velocity field v(x, e), x ∈ R^r, e ∈ R^r, |e| = 1, which is homogeneous and isotropic when calculated with respect to the Riemannian metric ρ(·,·) corresponding to the form ds² = Σ_{i,j=1}^r a_{ij}(x) dx^i dx^j, (a_{ij}(x)) = (a^{ij}(x))⁻¹, and |v(x, e)| = √(2c). If G_t = {x ∈ R^r: lim_{ε↓0} u^ε(t, x) = 1}, t > 0, then

  G_{t+h} = {x ∈ R^r: ρ(x, G_t) ≤ h√(2c)},   h > 0.

In particular, if G₀ is a small ball with the center at a point z ∈ R^r and g(x) is the indicator function of G₀, then G_t is the Riemannian ball of radius t√(2c) with the center at z.

Now let c(x) = c(x, 0) = max_{u≥0} c(x, u) not be a constant. Put

  V(t, x) = sup{∫₀ᵗ [c(φ_s) − (1/2) Σ_{i,j=1}^r a_{ij}(φ_s) φ̇^i_s φ̇^j_s] ds : φ ∈ C_{0t}, φ₀ = x, φ_t ∈ G₀}.

One can check that V(t, x) is a Lipschitz continuous function increasing in t. It follows from (4.8) and (4.3) that

  u^ε(t, x) ≤ M_x g(X^ε_t) exp{(1/ε) ∫₀ᵗ c(X^ε_s) ds}.

The expectation on the right-hand side of the last inequality, as follows from §3.3, is logarithmically equivalent to exp{(1/ε)V(t, x)} as ε ↓ 0. Thus,

  lim_{ε↓0} u^ε(t, x) = 0,   if V(t, x) < 0.   (4.19)

If we could prove that

  lim_{ε↓0} u^ε(t, x) = 1,   if V(t, x) > 0,   (4.20)

then the equation V(t, x) = 0 would give us the position of the interface
(wavefront) separating the areas where u^ε(t, x) is close to 0 and to 1 at time t. But simple examples show that, in general, this is not true; the area where lim_{ε↓0} u^ε(t, x) = 0 can be larger than prescribed by (4.19). This, actually, depends on the behavior of the extremals in the variational problem defining V(t, x).

We say that condition (N) is fulfilled if for any (t, x) such that V(t, x) = 0,

  V(t, x) = sup{∫₀ᵗ [c(φ_s) − (1/2) Σ_{i,j=1}^r a_{ij}(φ_s) φ̇^i_s φ̇^j_s] ds : φ ∈ C_{0t}, φ₀ = x, φ_t ∈ [G₀], V(t − s, φ_s) < 0 for 0 < s < t}.
Theorem 4.1. Let u^ε(t, x) be the solution of problem (4.7) with a KPP-type nonlinear term f(x, u) = c(x, u)u. Then

  lim_{ε↓0} u^ε(t, x) = 0

uniformly in (t, x) changing in any compact subset of {(t, x): t > 0, x ∈ R^r, V(t, x) < 0}. If condition (N) holds,

  lim_{ε↓0} u^ε(t, x) = 1

uniformly in any compact subset of {(t, x): t > 0, x ∈ R^r, V(t, x) > 0}, and the equation V(t, x) = 0 describes the position of the wavefront at time t.
The first statement of this theorem is a small refinement of (4.19). The proof of the second statement is similar to the proof of (4.12) in the case c(x) ≡ c = constant; condition (N) allows us to check that lim_{ε↓0} ε ln u^ε(t, x) = 0 if t > 0 and V(t, x) = 0. The rest of the proof is based on (4.8), the strong Markov property, and condition (4.3), as in the case c(x) ≡ constant. For the detailed proof, see Freidlin [15].

The motion of the front in the case of variable c(x) = c(x, 0) has a number of special features which cannot be observed if c(x) ≡ c = constant. Consider, for example, the one-dimensional case

  ∂u^ε(t, x)/∂t = (ε/2) ∂²u^ε/∂x² + (1/ε) c(x) u^ε(1 − u^ε),   t > 0, x ∈ R¹,
  u^ε(0, x) = χ⁻(x − β).   (4.21)
Here c(x) is a smooth increasing positive function; β ≥ 0 is a constant. One can check that condition (N) is fulfilled in this case, and the equation V(t, x) = V_β(t, x) = 0 describes the position of the front at time t. If x > β, then

  V_β(t, x) = sup{∫₀ᵗ [c(φ_s) − φ̇²_s/2] ds : φ₀ = x, φ_t = β}.
Let, first, c(x) = 1 for x ≤ 0 and c(x) = 1 + x for x > 0. Then the function V_β(t, x) can be calculated explicitly:

  V_β(t, x) = t³/24 + t(1 + (β + x)/2) − (β − x)²/2t,   x > β.
The interface between the areas where u^ε(t, x) → 1 and u^ε(t, x) → 0 as ε ↓ 0 moves in the positive direction; solving the equation V_β(t, x) = 0, we find its position x_β(t) at time t > 0:

  x_β(t) = t²/2 + β + √(t⁴/3 + 2t²(1 + β)).
The interface motion in this case has some particularities which are worth mentioning. Evaluate x₀(1), x₀(2), and x_{x₀(1)}(1):

  x₀(1) = 1/2 + √(7/3) ≈ 2.0,
  x₀(2) = 2 + √(40/3) ≈ 5.6,
  x_{x₀(1)}(1) = 1/2 + x₀(1) + √(1/3 + 2(1 + x₀(1))) ≈ 5.1.

We see that x₀(2) > x_{x₀(1)}(1). This means that the motion of the front does not satisfy the Markov property (i.e., the semigroup property): given the position of the front at time 1, the behavior of the front before time 1 can influence the behavior of the front after time 1. The evolution of the function u^ε(t, x) satisfies, of course, the semigroup property: if u^ε(s, x) is known for x ∈ R¹,
then u^ε(t, x), t > s, can be calculated in a unique way. However, it turns out that for c(x) ≢ constant, the evolution of the main term of u^ε(t, x) as ε ↓ 0, which is a step function with values 0 and 1, already loses the semigroup property. To preserve this property, one must extend the phase space: it should include not just the point where V_β(t, x) is equal to zero, but the whole function v_β(t, x) = V_β(t, x) ∧ 0.
The loss of the semigroup property also shows that the motion of the front cannot be described by a Huygens principle. One more property of the front motion in this example: let u₀^ε(t, x) and u₁^ε(t, x) be the solutions of equation (4.21) with different initial functions u₀^ε(0, x) = χ⁻(x), u₁^ε(0, x) = 1 − χ⁻(x − 1). In the first case, the front moves from x = 0 in the positive direction and its position at time t is given by x₀(t). In the second case, the front moves from x = 1 in the negative direction of the x-axis. Condition (N) is not satisfied in the latter case, but one can prove that the position X̂(t) of the front at time t is the solution of the equation (Freidlin [15])

  dX̂(t)/dt = −√(2c(X̂(t))),   X̂(0) = 1.   (4.22)
In our example, c(x) = 1 + x for x > 0. Solving (4.22), one can find that the front comes to x = 0 at time t₁→₀ ≈ 1.22 in the second case. On the other hand, solving the equation x₀(t) = 1 (i.e., x_β(t) = 1 with β = 0), we see that in the first case the front needs time t₀→₁ ≈ 0.57 to come from 0 to 1. Thus, the motion in the direction in which c(x) increases is faster than in the opposite direction. This property is also incompatible with the Huygens principle.

We have seen in this example that if c(x) is linearly increasing, the front moves at an increasing, but finite, speed. It turns out that if c(x) increases fast enough on a short interval, remaining smooth, the front may have a jump. For example, consider problem (4.21) with β = 0 and take as c(x) a smooth increasing function which coincides with the step function c̃(x) = 1 + 3(1 − χ⁻(x − h)), h > 0, everywhere except in a δ-neighborhood of the point x = h. One can check that condition (N) is satisfied in this case. But if δ is small enough, the solution x₀ = x₀(t) of the equation V(t, x₀) = 0 has the form shown in Fig. 33. This means that for 0 < t < T₀, the front moves continuously. But at time t = T₀ (see Fig. 33) a new source arises at a point h′ close to h if δ is small enough. The region where u^ε(t, x) is close to 1 for 0 < ε ≪ 1 propagates from this source in both directions, so that u^ε(t, x) is close to 1 at time t₁ ∈ (T₀, T₁) on [0, x₀(t₁)) and on (y₁(t₁), y₂(t₁)). Outside the closure of these intervals, u^ε(t₁, x) → 0 as ε ↓ 0. At time T₁ both components meet. These and some other examples are considered in more detail in Freidlin [14].
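The closed-form expressions for V_β and x_β(t) in the example with c(x) = 1 + x, together with the non-Markov comparison x₀(2) > x_{x₀(1)}(1), can be verified in a few lines (tolerances are arbitrary):

```python
import math

def V(t, x, beta):
    # explicit V_beta(t, x) for c(x) = 1 + x (x > 0), valid for x > beta >= 0
    return t**3 / 24.0 + t * (1.0 + (beta + x) / 2.0) - (x - beta)**2 / (2.0 * t)

def front(t, beta):
    # the root x_beta(t) of V_beta(t, x) = 0 with x > beta
    return t**2 / 2.0 + beta + math.sqrt(t**4 / 3.0 + 2.0 * t**2 * (1.0 + beta))

for t, beta in ((1.0, 0.0), (2.0, 0.0), (1.0, 2.0), (0.7, 0.3)):
    assert abs(V(t, front(t, beta), beta)) < 1e-9   # closed form solves V = 0

x01 = front(1.0, 0.0)        # ~2.0
x02 = front(2.0, 0.0)        # ~5.6
x11 = front(1.0, x01)        # ~5.1: front restarted from its time-1 position
assert x02 > x11             # restarting loses ground: no Markov property
assert abs(front(0.57, 0.0) - 1.0) < 0.02   # t_{0->1} ~ 0.57
```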
The condition (N) is not always satisfied. For example, if in problem (4.21) c(x) is a decreasing function, then (N) is not fulfilled. To describe the behavior of the wavefront in the general case, introduce the function
Figure 33.
  W(t, x) = sup{min_{0≤a≤t} ∫₀^a [c(φ_s) − (1/2) Σ_{i,j=1}^r a_{ij}(φ_s) φ̇^i_s φ̇^j_s] ds : φ ∈ C_{0t}, φ₀ = x, φ_t ∈ G₀}.

One can prove that the function W(t, x) is Lipschitz continuous and nonpositive. If condition (N) is satisfied, then W(t, x) = V(t, x) ∧ 0.
Theorem 4.2. Let u^ε(t, x) be the solution of problem (4.7) with a KPP-type nonlinear term f(x, u) = c(x, u)u. Then:

(i) lim_{ε↓0} u^ε(t, x) = 0 uniformly in (t, x) ∈ F₁, where F₁ is a compact subset of the set {(t, x): t > 0, x ∈ R^r, W(t, x) < 0};
(ii) lim_{ε↓0} u^ε(t, x) = 1 uniformly in (t, x) ∈ F₂, where F₂ is a compact subset of the interior of the set {(t, x): t > 0, x ∈ R^r, W(t, x) = 0}.
The proof of this theorem and various examples can be found in Freidlin [17] and Freidlin and Lee [1].
Various generalizations of problem (4.7) for KPP-type nonlinear terms can be found in Freidlin [18], [20], [22], and Freidlin and Lee [1]. In particular, one can consider RDE-systems of KPP type, equations with nonlinear boundary conditions, and degenerate reaction-diffusion equations. An analytic approach to some of these problems is available at present as well; the corresponding references can be found in Freidlin and Lee [1]. Wavefront propagation in periodic and in random media is studied in Gärtner and Freidlin [1], Freidlin [15], and Gärtner [5].
Consider the one-dimensional dynamical system u̇ = f(u). If f(u) is of KPP type, the dynamical system has two equilibrium points: an unstable point at u = 0 and a stable point at u = 1. One can consider problem (4.7) with a nonlinear term such that the dynamical system has two stable rest points separated by a third, unstable one. Such nonlinear terms are called bistable. Problem (4.7) with a nonlinear term that is bistable for each x ∈ R^r is of great interest. One can prove that u^ε(t, x) also converges in this case to a step function as ε ↓ 0, and the evolution of this step function can be described by a Huygens principle. These results can be found in Gärtner [4].
§5. Random Perturbations of Infinite-Dimensional Systems

In the previous chapters of this book we considered random perturbations of finite-dimensional dynamical systems. Similar questions for infinite-dimensional systems are of interest as well. From a general point of view, we face in the infinite-dimensional case, again, problems of the law-of-large-numbers type, of the central-limit-theorem type, and of the large-deviation type. But many new, interesting, and rather delicate effects appear in this case. A rich class of infinite-dimensional dynamical systems and semiflows is described by evolutionary differential equations. For example, one can consider the dynamical system associated with a linear hyperbolic equation, or the semiflow associated with the reaction-diffusion equation

$$
\frac{\partial u(t,x)}{\partial t} = D\,\Delta u + f(x,u), \qquad t > 0,\ x \in G \subset R^r, \tag{5.1}
$$

$$
u(0,x) = g(x), \qquad \frac{\partial u(t,x)}{\partial n}\Big|_{\partial G} = 0.
$$
Here D is a positive constant, f(x, u) is a Lipschitz continuous function, and n = n(x) is the interior normal to the boundary ∂G of the domain G. The domain G is assumed to be regular enough. Problem (5.1) is solvable for any continuous bounded g(x). The corresponding solution u(t, x) will be differentiable in x ∈ G for any t > 0, even if g(x) is just continuous. This shows that problem (5.1) cannot be solved, in general, for t < 0. Therefore, equation (5.1) defines a semiflow T_t g(x) = u(t, x), t ≥ 0, but not a flow (dynamical system), in the space of continuous functions. Of course, there are many ways to introduce perturbations of the semiflow T_t g. One can consider various kinds of perturbations of the equation, perturbations of the domain, and perturbations of the boundary or initial conditions. Let us start with additive perturbations of the equation. We restrict ourselves to the case of one spatial variable and consider the periodic problem, so that we do not have to care about the boundary conditions:

$$
\frac{\partial u^\varepsilon(t,x)}{\partial t} = D\,\frac{\partial^2 u^\varepsilon}{\partial x^2} + f(x,u^\varepsilon) + \varepsilon\,\zeta(t,x), \qquad u^\varepsilon(0,x) = g(x). \tag{5.2}
$$
We assume that the functions f(x, u), g(x), ζ(t, x) are 2π-periodic in x, so that they can be considered for x belonging to the circle S¹ of radius 1, and we consider 2π-periodic in x solutions of problem (5.2). When ε = 0, problem (5.2) defines a semiflow in the space C_{S¹} of continuous functions on S¹ provided with the uniform topology. What kind of noise ζ(t, x) will we consider? Of course, the smoother ζ(t, x), the smoother the solution of (5.2). But on the other hand, smoothness makes the statistical properties of the noise and of the solution more complicated. The simplest (from the statistical point of view) noise ζ(t, x) is the space-time white noise
$$
\zeta(t,x) = \frac{\partial^2 W(t,x)}{\partial t\,\partial x}, \qquad t \ge 0,\ x \in [0, 2\pi).
$$
Here W(t, x), t ≥ 0, x ∈ R¹, is the Brownian sheet, that is, the continuous, mean zero Gaussian random field with the correlation function MW(s, x)W(t, y) = (s ∧ t)(x ∧ y). One can prove that such a random field exists, but ∂²W(t, x)/∂t∂x does not exist as a usual function. Therefore, we have to explain what is meant by a solution of problem (5.2) with ζ(t, x) = ∂²W(t, x)/∂t∂x. A measurable function u^ε(t, x), t ≥ 0, x ∈ S¹, is called a generalized solution of (5.2) with ζ(t, x) = ∂²W(t, x)/∂t∂x if

$$
\int_{S^1} u^\varepsilon(t,x)\phi(x)\,dx - \int_{S^1} g(x)\phi(x)\,dx = \int_0^t\!\!\int_{S^1} \big[ u^\varepsilon(s,x)\,D\phi''(x) + f(x,u^\varepsilon(s,x))\phi(x) \big]\,dx\,ds - \varepsilon \int_{S^1} \phi'(x)\,W(t,x)\,dx \tag{5.3}
$$
for any sufficiently smooth φ on S¹, with probability 1. It is proved in Walsh [1] that such a solution exists and is unique if f(x, u) is Lipschitz continuous and g ∈ C_{S¹}. Some properties of the solution were established in that paper as well. In particular, it was shown that the solution is Hölder continuous and defines a Markov process u^ε_t = u^ε(t, ·) in C_{S¹}. The asymptotic behavior of u^ε_t as ε ↓ 0 was studied in Faris and Jona-Lasinio [1] and in Freidlin [16].
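For numerical intuition about such generalized solutions, one can discretize (5.2) in the crudest way: on a periodic grid the space-time white noise contributes an independent N(0, dt/dx) increment to each cell per time step. The sketch below is our own illustrative discretization (not the construction from Walsh [1]); stability of the explicit scheme requires D·dt/dx² ≤ 1/2:

```python
import numpy as np

# Explicit Euler scheme for  du = (D u_xx + f(x,u)) dt + eps dW  on the circle.
# On a grid, space-time white noise acts per cell as an independent
# N(0, dt/dx) increment per time step.
def step(u, dx, dt, D, f, eps, rng):
    lap = (np.roll(u, -1) - 2.0 * u + np.roll(u, 1)) / dx**2   # periodic Laplacian
    noise = rng.normal(0.0, np.sqrt(dt / dx), size=u.shape)
    return u + dt * (D * lap + f(u)) + eps * noise

rng = np.random.default_rng(0)
m = 128
x = np.linspace(0.0, 2.0 * np.pi, m, endpoint=False)
dx = x[1] - x[0]
D, eps, dt = 1.0, 0.1, 1e-4
f = lambda u: u * (1.0 - u)        # a KPP-type term, as an example
u = np.cos(x) ** 2                 # initial condition g(x)

for _ in range(1000):
    u = step(u, dx, dt, D, f, eps, rng)

print(u.shape, float(u.mean()))
```

Refining the grid, one can expect convergence to the generalized solution in the sense of (5.3); for spatial dimension greater than 1 the analogous scheme does not stabilize, reflecting the absence of function-valued solutions discussed at the end of this section.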
Consider, first, the linear case,

$$
\frac{\partial v^\varepsilon(t,x)}{\partial t} = D\,\frac{\partial^2 v^\varepsilon}{\partial x^2} - a v^\varepsilon + \varepsilon\,\frac{\partial^2 W(t,x)}{\partial t\,\partial x}, \qquad t > 0,\ x \in S^1, \tag{5.4}
$$

$$
v^\varepsilon(0,x) = g(x).
$$
Here D, a > 0, and W(t, x) is the Brownian sheet. The solution of (5.4) with g(x) ≡ 0 is denoted by v_0^ε(t, x), and let v_g^0(t, x) be the solution of (5.4) with ε = 0; v_g^0(t, x) is not random. It is clear that v^ε(t, x) = v_g^0(t, x) + v_0^ε(t, x). Due to the linearity of the problem, v_0^ε(t, x) is a Gaussian mean zero random field. One can solve problem (5.4) using the Fourier method. The eigenvalues of the operator Lh(x) = Dh''(x) − ah(x), x ∈ S¹, are λ_k = Dk² + a, k = 0, 1, 2, .... The normalized eigenfunctions corresponding to λ_k, k ≥ 1, are π^{−1/2} cos kx and π^{−1/2} sin kx. Then
$$
v^\varepsilon(t,x) = v_g^0(t,x) + \frac{\varepsilon}{\sqrt{2\pi}}\,A_0(t) + \frac{\varepsilon}{\sqrt{\pi}} \sum_{k=1}^{\infty} \big( A_k(t)\sin kx + B_k(t)\cos kx \big), \tag{5.5}
$$

where A_k(t) and B_k(t) are independent Ornstein-Uhlenbeck processes satisfying the equations

$$
dA_k(t) = dW_k(t) - \lambda_k A_k(t)\,dt, \qquad A_k(0) = 0,
$$

$$
dB_k(t) = d\widetilde W_k(t) - \lambda_k B_k(t)\,dt, \qquad B_k(0) = 0.
$$
Here W_k(t) and W̃_k(t) are independent Wiener processes. The correlation function of the field v^ε(t, x) has the form

$$
M\big(v^\varepsilon(s,x) - Mv^\varepsilon(s,x)\big)\big(v^\varepsilon(t,y) - Mv^\varepsilon(t,y)\big) = \varepsilon^2 B(s,x,t,y) = \varepsilon^2 \sum_{k=0}^{\infty} \frac{\cos k(x-y)}{2\lambda_k}\,\big( e^{-\lambda_k(t-s)} - e^{-\lambda_k(t+s)} \big), \qquad x,y \in S^1,\ 0 \le s \le t. \tag{5.6}
$$
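The factor (e^{−λ_k(t−s)} − e^{−λ_k(t+s)})/(2λ_k) in (5.6) is exactly the covariance of an Ornstein-Uhlenbeck process started at 0: Cov(A_k(s), A_k(t)) = e^{−λ_k(t−s)} Var A_k(s), with Var A_k(s) = (1 − e^{−2λ_k s})/(2λ_k). This can be checked without simulation by propagating the variance through the exact one-step OU recursion (a hypothetical numerical check, not from the book):

```python
import math

# Cov(A(s), A(t)) for dA = dW - lam*A dt, A(0) = 0, matches the factor in (5.6).
def cov_formula(s, t, lam):
    return (math.exp(-lam * (t - s)) - math.exp(-lam * (t + s))) / (2.0 * lam)

def cov_recursion(s, t, lam, n=100000):
    # Exact one-step OU update: A_{k+1} = e^{-lam*dt} A_k + xi_k,
    # with Var xi_k = (1 - e^{-2*lam*dt}) / (2*lam); propagate Var A to time s.
    dt = s / n
    decay = math.exp(-lam * dt)
    q = (1.0 - math.exp(-2.0 * lam * dt)) / (2.0 * lam)
    var = 0.0
    for _ in range(n):
        var = decay * decay * var + q
    # A(t) = e^{-lam*(t-s)} A(s) + independent noise, hence:
    return math.exp(-lam * (t - s)) * var

lam, s, t = 2.0, 0.7, 1.3
assert abs(cov_formula(s, t, lam) - cov_recursion(s, t, lam)) < 1e-9
print(cov_formula(s, t, lam))
```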
One can view v^ε_t = v^ε(t, ·), t ≥ 0, as a Markov process in C_{S¹} (a generalized Ornstein-Uhlenbeck process). Equalities (5.5) and (5.6) imply that the process v^ε_t has a limiting distribution μ^ε as t → ∞. The distribution μ^ε in C_{S¹} is mean zero Gaussian with the correlation function

$$
\varepsilon^2 B(x,y) = \varepsilon^2 \sum_{k=0}^{\infty} \frac{\cos k(x-y)}{2\lambda_k}. \tag{5.7}
$$

μ^ε is the unique invariant measure of the process v^ε_t in C_{S¹}.
Consider now the random field v^ε(t, x), 0 ≤ t ≤ T, x ∈ S¹, for some T ∈ (0, ∞). It is easy to see that v^ε(t, x) converges to v_g^0(t, x) as ε → 0, but if ε > 0, then with a small probability v^ε(t, x) will be in a neighborhood of any function φ(t, x) ∈ C_{[0,T]×S¹} such that φ(0, x) = g(x). The probabilities of such deviations are characterized by an action functional. One can derive from Theorem 3.4.1 and (5.6) that the action functional for the family v^ε(t, x) in L₂([0, T] × S¹) as ε → 0 is ε^{−2}S^v(φ), where

$$
S^v(\phi) = \begin{cases} \dfrac{1}{2} \displaystyle\int_0^T\!\!\int_{S^1} \Big| \dfrac{\partial \phi(t,x)}{\partial t} - D\,\dfrac{\partial^2 \phi(t,x)}{\partial x^2} + a\phi(t,x) \Big|^2\,dt\,dx, & \phi \in W_2^{1,2}, \\[2mm] +\infty & \text{for the rest of } L_2([0,T]\times S^1). \end{cases}
$$
Here W_2^{1,2} is the Sobolev space of functions on [0, T] × S¹ possessing generalized square integrable derivatives of the first order in t and of the second order in x. Note that there is a continuous imbedding of W_2^{1,2} in C_{[0,T]×S¹}. We preserve the same notation for the restriction of S^v(φ) to C_{[0,T]×S¹}. Using Theorem 3.4.1 and Fernique's bound for the probability of exceeding a high level by a continuous Gaussian field (Fernique [1]), one can prove that ε^{−2}S^v(φ) is the action functional for v^ε(t, x) in C_{[0,T]×S¹} as ε → 0. Consider now equation (5.2) with ζ(t, x) = ∂²W(t, x)/∂t∂x. Let F(x, u) = −∫₀^u f(x, z) dz, and
$$
U(\phi) = \int_0^{2\pi} \Big[ \frac{D}{2} \Big( \frac{d\phi}{dx} \Big)^{2} + F(x,\phi(x)) \Big]\,dx, \qquad \phi \in W_2^1,
$$

where W_2^1 is the Sobolev space of functions on S¹ having square integrable first derivatives. It is readily checked that the variational derivative δU(φ)/δφ has the form

$$
\frac{\delta U(\phi)}{\delta \phi} = -D\,\frac{d^2\phi}{dx^2} - f(x,\phi).
$$
So the functional U(φ) is the potential for the semiflow in C_{S¹} defined by (5.2) with ε = 0:

$$
\frac{\partial u(t,x)}{\partial t} = -\frac{\delta U(u(t,\cdot))}{\delta u}.
$$
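To see why U is a potential here, perturb φ by a smooth 2π-periodic η and integrate by parts (the boundary terms vanish by periodicity):

```latex
\frac{d}{d\tau}\Big|_{\tau=0} U(\phi + \tau\eta)
= \int_0^{2\pi} \big[ D\,\phi'(x)\,\eta'(x) + F_u(x,\phi(x))\,\eta(x) \big]\,dx
= \int_0^{2\pi} \big[ -D\,\phi''(x) - f(x,\phi(x)) \big]\,\eta(x)\,dx,
```

since F_u(x, u) = −f(x, u). Hence δU/δφ = −Dφ'' − f(x, φ), and the right-hand side of (5.2) with ε = 0 is exactly −δU/δφ.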
We have seen in Chapter 4 that for finite-dimensional dynamical systems, potentiality leads to certain simplifications. In particular, the density m^ε(x) of the invariant measure of the process

$$
\dot X_t^\varepsilon = -\nabla U(X_t^\varepsilon) + \varepsilon\,\dot W_t
$$

in R^r can be written down explicitly:

$$
m^\varepsilon(x) = C_\varepsilon\,e^{-\frac{2}{\varepsilon^2} U(x)}, \qquad C_\varepsilon^{-1} = \int_{R^r} e^{-\frac{2}{\varepsilon^2} U(x)}\,dx, \qquad x \in R^r.
$$

Convergence of the last integral is the necessary and sufficient condition for the existence of a finite invariant measure. One could expect that a similar formula could be written down for the invariant density m^ε(φ) of the process u^ε_t:
$$
m^\varepsilon(\phi) = \mathrm{constant} \times \exp\Big\{ -\frac{2}{\varepsilon^2}\,U(\phi) \Big\}. \tag{5.8}
$$
The difficulty involved here, first of all, is due to the fact that there is no counterpart of Lebesgue measure in the corresponding space of functions. One can try to avoid this difficulty in various ways. First, for (5.8) to make sense, one can do the following. Denote by 𝒢_δ(φ) the δ-neighborhood of the function φ in the C_{S¹}-norm. If ν is the normalized invariant measure for the process u^ε_t, then one can expect that

$$
P_g\Big\{ \lim_{T\to\infty} \frac{1}{T} \int_0^T \chi_{\mathcal G_\delta(\phi)}(u^\varepsilon_t)\,dt = \nu(\mathcal G_\delta(\phi)) \Big\} = 1,
$$

where χ_{𝒢_δ(φ)} is the indicator of the set 𝒢_δ(φ). Then it is natural to call exp{−(2/ε²)U(φ)} the nonnormalized density function of the stationary distribution of the process u^ε_t (with respect to the nonexisting uniform distribution) provided

$$
\lim_{\delta\downarrow 0}\,\lim_{T\to\infty} \frac{\int_0^T \chi_{\mathcal G_\delta(\phi_1)}(u^\varepsilon_t)\,dt}{\int_0^T \chi_{\mathcal G_\delta(\phi_2)}(u^\varepsilon_t)\,dt} = \exp\Big\{ -\frac{2}{\varepsilon^2}\big( U(\phi_1) - U(\phi_2) \big) \Big\}. \tag{5.9}
$$

In (5.9), ε is fixed. It is intuitively clear from (5.9) that the invariant measure concentrates as ε → 0 near the points where U(φ) has its absolute minimum. One can try to give exact meaning not to (5.8) itself but to its intuitive implications, which characterize the behavior of u^ε_t for ε ≪ 1.
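The concentration near the minima of U is already visible in the finite-dimensional formula m^ε(x) = C_ε e^{−2U(x)/ε²}. A sketch with a hypothetical double-well potential (minima at ±1):

```python
import numpy as np

# Normalized density m_eps(x) ~ exp(-2 U(x)/eps^2) on a grid; as eps -> 0
# the mass concentrates near the minima of U (here at x = +-1).
U = lambda x: x**4 / 4.0 - x**2 / 2.0

x = np.linspace(-2.0, 2.0, 4001)
dx = x[1] - x[0]

def mass_near_minima(eps, delta=0.2):
    w = np.exp(-2.0 * U(x) / eps**2)
    m = w / (w.sum() * dx)                      # normalized density
    near = np.abs(np.abs(x) - 1.0) < delta      # delta-neighborhoods of +-1
    return float(m[near].sum() * dx)

for eps in (1.0, 0.5, 0.1):
    print(eps, mass_near_minima(eps))

assert mass_near_minima(0.1) > 0.99             # almost all mass at the minima
assert mass_near_minima(0.1) > mass_near_minima(1.0)
```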
It is also possible to try to write down a formula for the density of the invariant measure of the process u^ε_t with respect to an appropriate Gaussian measure correctly defined in C_{S¹}. Of course, the form of the density with respect to this reference measure will differ from (5.8). We commence with this last approach.
As the reference measure in C_{S¹}, we choose the Gaussian measure μ^ε which is the invariant measure for the process v^ε_t in C_{S¹} defined by (5.4). The measure μ^ε has zero mean and the correlation function (5.7); a is a positive constant. Denote by M^{μ^ε} the expectation with respect to the measure μ^ε.

Theorem 5.1. Assume that for some a > 0

$$
A_\varepsilon = M^{\mu^\varepsilon} \exp\Big\{ -\frac{2}{\varepsilon^2} \int_0^{2\pi} \Big[ F(x,\phi(x)) - \frac{a}{2}\,\phi^2(x) \Big]\,dx \Big\} < \infty.
$$

Denote by ν^ε the measure in C_{S¹} such that

$$
\frac{d\nu^\varepsilon}{d\mu^\varepsilon}(\phi) = A_\varepsilon^{-1} \exp\Big\{ -\frac{2}{\varepsilon^2} \int_0^{2\pi} \Big[ F(x,\phi(x)) - \frac{a}{2}\,\phi^2(x) \Big]\,dx \Big\}.
$$

Then ν^ε is the unique normalized invariant measure of the process u^ε_t in C_{S¹} defined by (5.3). For any Borel set Γ ⊂ C_{S¹} and g ∈ C_{S¹},

$$
P_g\Big\{ \lim_{T\to\infty} \frac{1}{T} \int_0^T \chi_\Gamma(u^\varepsilon_t)\,dt = \nu^\varepsilon(\Gamma) \Big\} = 1.
$$
The proof of this theorem can be found in Freidlin [16]. Equality (5.9) can be deduced from this theorem.
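Theorem 5.1 also suggests a simple numerical recipe: sample the reference Gaussian measure μ^ε mode-by-mode (by (5.5) and (5.7), independent centered Gaussian Fourier coefficients with variances ε²/(2λ_k)) and reweight each sample by exp{−(2/ε²)∫[F − (a/2)φ²]dx}. The following self-normalized importance-sampling sketch is our own illustration; the choice F(u) = u⁴/4, the truncation K, and the grid are assumptions, not from the book:

```python
import numpy as np

# Estimate a nu^eps-expectation by reweighting samples drawn from the Gaussian
# reference measure mu^eps (truncated Fourier expansion, Var = eps^2/(2*lam_k)).
rng = np.random.default_rng(1)
D, a, eps, K, m = 1.0, 1.0, 0.5, 32, 256
x = np.linspace(0.0, 2.0 * np.pi, m, endpoint=False)
dx = x[1] - x[0]
F = lambda u: u**4 / 4.0                     # hypothetical potential term F(x,u)

def sample_mu():
    phi = rng.normal(0.0, eps / np.sqrt(2.0 * a)) / np.sqrt(2.0 * np.pi) * np.ones(m)
    for k in range(1, K + 1):
        sd = eps / np.sqrt(2.0 * (D * k * k + a))
        phi += (rng.normal(0.0, sd) * np.sin(k * x)
                + rng.normal(0.0, sd) * np.cos(k * x)) / np.sqrt(np.pi)
    return phi

samples = [sample_mu() for _ in range(500)]
logw = [-(2.0 / eps**2) * np.sum(F(p) - 0.5 * a * p**2) * dx for p in samples]
w = np.exp(np.array(logw) - max(logw))
w /= w.sum()                                  # self-normalized weights

mean_sq = sum(wi * np.mean(p**2) for wi, p in zip(w, samples))
print(float(mean_sq))
```

The condition A_ε < ∞ of the theorem is what makes such reweighting meaningful: F must grow fast enough to dominate the (a/2)φ² term.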
One can use Theorem 5.1 to study the behavior of the invariant measure ν^ε as ε → 0. In particular, if the potential U(φ) has a unique point φ̂ ∈ C_{S¹} of absolute minimum, then the measure ν^ε is concentrated in a neighborhood of φ̂ if 0 < ε ≪ 1: for any δ > 0,

$$
\lim_{\varepsilon \downarrow 0} \nu^\varepsilon\{\phi : \|\phi - \hat\phi\| < \delta\} = 1.
$$

If the absolute minimum of U(φ) is attained at finitely many points of C_{S¹}, say m of them, then the limiting invariant measure is distributed among these m points, and under certain assumptions, the weight of each of these points can be calculated.
One can check that the field u^ε(t, x) can be obtained from v^ε(t, x) with the help of a continuous invertible transformation B of the space C_{[0,T]×S¹}: u^ε(t, x) = B[v^ε](t, x). Therefore, by Theorem 3.3.1, the action functional for the family u^ε(t, x) as ε → 0 in C_{[0,T]×S¹} has the form

$$
\varepsilon^{-2} S^u(\psi) = \varepsilon^{-2} S^v\big(B^{-1}(\psi)\big), \qquad \psi \in C_{[0,T]\times S^1},
$$
where ε^{−2}S^v is the action functional for v^ε(t, x) calculated above. This gives us the following expression for S^u(ψ):

$$
S^u(\psi) = \begin{cases} \dfrac{1}{2} \displaystyle\int_0^T\!\!\int_{S^1} \Big| \dfrac{\partial \psi}{\partial t} - D\,\dfrac{\partial^2 \psi}{\partial x^2} - f(x,\psi(t,x)) \Big|^2\,dx\,dt, & \psi \in W_2^{1,2}, \\[2mm] +\infty, & \psi \in C_{[0,T]\times S^1} \setminus W_2^{1,2}. \end{cases} \tag{5.10}
$$
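The functional S^u in (5.10) vanishes exactly on solutions of the unperturbed equation and is positive otherwise. A finite-difference evaluation (our own test case, with f ≡ 0 and the explicit heat-equation solution e^{−Dt} cos x) illustrates this:

```python
import numpy as np

# Discretize  S(psi) = 0.5 * int |psi_t - D psi_xx - f(x,psi)|^2 dx dt  with
# f == 0.  For psi(t,x) = exp(-D t) cos x (a heat-equation solution) the
# action is ~0; decaying twice as fast makes it strictly positive.
def action(psi, dt, dx, D):
    psi_t = (psi[1:, :] - psi[:-1, :]) / dt                  # forward diff in t
    psi_xx = (np.roll(psi, -1, axis=1) - 2.0 * psi + np.roll(psi, 1, axis=1)) / dx**2
    integrand = psi_t - D * psi_xx[:-1, :]                   # align time layers
    return 0.5 * np.sum(integrand**2) * dx * dt

D, T, m, n = 1.0, 1.0, 256, 2000
x = np.linspace(0.0, 2.0 * np.pi, m, endpoint=False)
t = np.linspace(0.0, T, n + 1)
dx, dt = x[1] - x[0], t[1] - t[0]

solution = np.exp(-D * t)[:, None] * np.cos(x)[None, :]      # solves psi_t = D psi_xx
too_fast = np.exp(-2.0 * D * t)[:, None] * np.cos(x)[None, :]

assert action(solution, dt, dx, D) < 1e-3
assert action(too_fast, dt, dx, D) > 1e-2
print(action(solution, dt, dx, D), action(too_fast, dt, dx, D))
```

For the second path the residual is −e^{−2Dt} cos x, so the exact action is π(1 − e^{−4DT})/8 ≈ 0.3855, which the discretization reproduces up to O(dt, dx²) error.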
One can use this result, for example, for the calculation of asymptotics in exit problems for the process u^ε_t. Consider now a system of reaction-diffusion equations perturbed by space-time white noise,

$$
\frac{\partial u_k^\varepsilon(t,x)}{\partial t} = D_k\,\frac{\partial^2 u_k^\varepsilon}{\partial x^2} + f_k(x; u_1^\varepsilon, \ldots, u_n^\varepsilon) + \varepsilon\,\frac{\partial^2 W_k(t,x)}{\partial t\,\partial x}, \tag{5.11}
$$

$$
u_k^\varepsilon(0,x) = g_k(x), \qquad t > 0,\ x \in S^1,\ k \in \{1, 2, \ldots, n\}.
$$
Here W_k(t, x) are independent Brownian sheets. If the vector field (f₁(x, u), ..., f_n(x, u)) = f(x, u), u ∈ R^n, is potential for every x ∈ S¹, that is, f_k(x, u) = −∂F(x, u)/∂u_k for some potential F(x, u), k ∈ {1, ..., n}, x ∈ S¹, then the semiflow u⁰_t = (u₁⁰(t, ·), ..., u_n⁰(t, ·)) in [C_{S¹}]^n is potential:

$$
\frac{\partial u_k^0}{\partial t} = -\frac{\delta U(u^0)}{\delta u_k}, \qquad U(\phi) = \int_0^{2\pi} \Big[ \sum_{k=1}^n \frac{D_k}{2} \Big( \frac{d\phi_k(x)}{dx} \Big)^2 + F(x,\phi(x)) \Big]\,dx,
$$

φ = (φ₁(x), ..., φ_n(x)). One can give a representation of the invariant measure in this case similar to the case of a single RDE. If the field f(x, u) is not potential, then the semiflow defined by (5.11) is not potential either. But even in the nonpotential case, one can calculate the action functional for the family u^ε(t, x) defined by (5.11). It has a form similar to (5.10). For example, this action functional allows one to
calculate the limiting behavior of the invariant measures as ε → 0 in the nonpotential case. If the spatial variable in (5.1) has dimension greater than 1, one cannot consider perturbations of (5.1) by a space-time white noise; the corresponding equation has no solution for such irregular perturbations. One can add to (5.1) the noise εζ(t, x), where ζ(t, x) is a mean zero Gaussian field with a correlation function R(s, x; t, y) = δ(t − s) r(x, y), where r(x, y) is smooth. Then the perturbed process exists, but its statistical properties are more complicated than in the white noise case. It is interesting to consider here the double asymptotic problem, when r(x, y) = (1/λ^d) Γ((x − y)/λ), where d is the dimension of the space and both ε and λ tend to zero; Γ(u) is assumed to be a smooth function such that lim_{|u|→∞} Γ(u) = 0. The additive perturbations of the equation do not, of course, exhaust all interesting ways to perturb a differential equation. Consider, for example, the following linear Cauchy problem:

$$
\frac{\partial u^\varepsilon(t,x)}{\partial t} = a^\varepsilon(x)\,\frac{\partial^2 u^\varepsilon}{\partial x^2} + b^\varepsilon(x)\,\frac{\partial u^\varepsilon}{\partial x} = L^\varepsilon u^\varepsilon, \qquad t > 0,\ x \in R^1, \tag{5.12}
$$

$$
u^\varepsilon(0,x) = g(x).
$$
Here a^ε(x) and b^ε(x) are random fields depending on a parameter ε > 0, and g(x) is a continuous bounded function. Let, for example, a^ε(x) = a(x/ε), b^ε(x) = b(x/ε), where (a(x), b(x)) is an ergodic stationary process. Then one can expect that the semiflow u^ε_t defined by problem (5.12) converges as ε ↓ 0 to the semiflow defined by the equation

$$
\frac{\partial u^0(t,x)}{\partial t} = \bar a\,\frac{\partial^2 u^0}{\partial x^2} + \bar b\,\frac{\partial u^0}{\partial x}, \qquad u^0(0,x) = g(x),
$$
where ā and b̄ are appropriate constants. This is an example of the homogenization problem. One can ask a more general question: what conditions on the coefficients (a^ε(x), b^ε(x)) in (5.12) ensure convergence of the corresponding semiflows? It is useful to consider a generalization of problem (5.12). Let u(x), v(x) be monotone increasing functions on R¹, and let u(x) be continuous. One can consider the generalized differential operator D_u defined as

$$
D_u h(x) = \lim_{\Delta \to 0} \frac{h(x+\Delta) - h(x)}{u(x+\Delta) - u(x)},
$$

and D_v is defined in a similar way. The operator L^ε in (5.12) can also be written in the form D_{v^ε} D_{u^ε} with
$$
u^\varepsilon(x) = \int_0^x \exp\Big\{ -\int_0^y \frac{b^\varepsilon(z)}{a^\varepsilon(z)}\,dz \Big\}\,dy, \qquad v^\varepsilon(x) = \int_0^x \frac{dy}{a^\varepsilon(y)} \exp\Big\{ \int_0^y \frac{b^\varepsilon(z)}{a^\varepsilon(z)}\,dz \Big\}.
$$
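For smooth coefficients these formulas recover L^ε directly: (u^ε)'(x) = exp{−∫₀^x b^ε/a^ε dz} and (v^ε)'(x) = (a^ε(x))^{−1} exp{∫₀^x b^ε/a^ε dz}, so

```latex
D_{u^\varepsilon} h = \frac{h'}{(u^\varepsilon)'} = h'\,e^{\int_0^x b^\varepsilon/a^\varepsilon\,dz},
\qquad
D_{v^\varepsilon}\big(D_{u^\varepsilon} h\big) = \frac{\big(D_{u^\varepsilon} h\big)'}{(v^\varepsilon)'}
= a^\varepsilon e^{-\int_0^x b^\varepsilon/a^\varepsilon\,dz}
\Big( h'' + h'\,\frac{b^\varepsilon}{a^\varepsilon} \Big) e^{\int_0^x b^\varepsilon/a^\varepsilon\,dz}
= a^\varepsilon h'' + b^\varepsilon h' = L^\varepsilon h.
```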
Note that the pairs (u, v), (u + c₁, v + c₂), and (αu, α^{−1}v), where c₁, c₂, α are constants, α > 0, correspond to the same operator D_v D_u. Consider the semiflow W^ε_t, ε ≥ 0, defined by the equation

$$
\frac{\partial W^\varepsilon(t,x)}{\partial t} = D_{v^\varepsilon} D_{u^\varepsilon} W^\varepsilon(t,x), \qquad W^\varepsilon(0,x) = g(x).
$$

Then lim_{ε↓0}(W^ε_t g − W^0_t g) = 0 for any continuous bounded g(x) if and only if u^ε(x) → u^0(x), v^ε(x) → v^0(x) as ε ↓ 0 at every continuity point of the limiting function (after a proper choice of the functions u^0, v^0, which is possible because of the above-mentioned nonuniqueness) (Freidlin and Wentzell [4]). In particular, all the homogenization results for equation (5.12) can be deduced from that statement. Such a result, which is complete, in a sense, is known only in the
case of one spatial variable. Results concerning homogenization in the multidimensional case were obtained by Kozlov [1] and Papanicolaou and Varadhan [1]. Some large deviation results for equation (5.12) with rapidly oscillating
coefficients, which are useful, in particular, when wavefronts in random media are studied, were obtained in Gärtner and Freidlin [1] (see also Freidlin [15]). The corresponding multidimensional problem is still open. Perturbations of semiflows defined by PDEs caused by random perturbations of the boundary conditions are considered in Freidlin and Wentzell [1]. Initial-boundary problems for domains with many small holes are considered in Papanicolaou and Varadhan [2].
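In the simplest special case of (5.12), b ≡ 0 and a^ε(x) = a(x/ε) with a positive, periodic, and bounded away from zero, the convergence criterion above gives u^ε(x) = x and v^ε(x) → x · M(1/a), so the homogenized coefficient is the harmonic mean ā = (M(1/a))^{−1}. A numerical sketch (the example coefficient is our own):

```python
import numpy as np

# For b == 0:  u_eps(x) = x  and  v_eps(x) = int_0^x dy / a(y/eps).
# As eps -> 0,  v_eps(x)/x -> mean(1/a), so the homogenized diffusion
# coefficient is the harmonic mean  a_bar = 1 / mean(1/a).
# Hypothetical example: a(y) = 1 / (2 + sin y), so mean(1/a) = 2, a_bar = 1/2.
a = lambda y: 1.0 / (2.0 + np.sin(y))

def v_eps(x, eps, n=200001):
    y = np.linspace(0.0, x, n)
    return float(np.mean(1.0 / a(y / eps)) * x)   # Riemann approximation

x = 1.0
for eps in (0.1, 0.01):
    print(eps, v_eps(x, eps) / x)                 # tends to mean(1/a) = 2

a_bar = 1.0 / (v_eps(x, 0.01) / x)                # harmonic mean of a
assert abs(a_bar - 0.5) < 0.01
```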
References
Anderson, R. F. and Orey, S. [1] Small random perturbations of dynamical systems with reflecting boundary. Nagoya Math. J., 60 (1976), 189-216. Anosov, D. V. [1] Osrednenie v sistemakh obyknovennykh differentsial'nykh uravnenii s bystro koleblyushchimisya resheniyami. Izv. Akad. Nauk SSSR Ser. Mat., 24, No. 5 (1960), 721-742.
Arnold, V. I. [1] Matematicheskie Metody Klassicheskoi Mekhaniki. Nauka: Moskva, 1975.
English translation: Mathematical Methods of Classical Mechanics, Graduate Texts in Mathematics, 60. Springer-Verlag: New York, Heidelberg, Berlin, 1978. [2]
Topological and ergodic properties of closed 1-forms with incommensurable periods. Funct. Analysis and Appl., 23, No. 2 (1991), 1-12.
Aronson, D. G.
[1]
The fundamental solution of a linear parabolic equation containing a small parameter. Ill. J. Math., 3 (1959), 580-619.
Baier, D. and Freidlin, M. I. [1] Teoremy o bol'shikh ukloneniyakh i ustoichivost' pri sluchainykh vozmushcheniyakh. Dokl. Akad. Nauk SSSR, 235, No. 2 (1977), 253-256. English translation: Baier, D. and Freidlin, M. I., Theorems on large deviations and stability under random perturbations. Soviet Math. Dokl., 18, No. 4 (1977), 905-909.
Baxter, J. R. and Chacon, R. V. [1] The equivalence of diffusions on networks to Brownian motion. Contemp. Math., 26 (1984), 33-49. Bernstein, S.
[1] Sur l'équation différentielle de Fokker-Planck. C. R. Acad. Sci. Paris, 196 (1933), 1062-1064.
[2] Sur l'extension du théorème limite du calcul des probabilités aux sommes de quantités dépendantes. Math. Ann., 97 (1926), 1-59.
Blagoveshchenskii, Yu. N. [1] Diffuzionnye protsessy, zavisyashchie ot malogo parametra. Teor. Veroyatnost. i Primenen., 7, No. 2 (1962), 135-152. English translation: Diffusion processes depending on a small parameter. Theory Probab. Appl., 7, No. 2 (1962), 130-146.
Blagoveshchenskii, Yu. N. and Freidlin, M. I. [1] Nekotorye svoistva diffuzionnykh protsessov, zavisyashchikh ot parametra. Dokl. Akad. Nauk SSSR, 138, No. 3 (1961), 508-511. English translation: Blagoveshchenskii, Yu. N. and Freidlin, M. I., Certain properties of diffusion processes depending on a parameter. Soviet Math. Dokl., 2, No. 3 (1961), 633-636.
Bogolyubov, N. N. and Mitropol'skii, Y. A. [1] Asymptotic Methods in the Theory of Nonlinear Oscillations, 2nd ed. Gordon & Breach: New York, Delhi, 1961.
Bogolyubov, N. N. and Zubarev, D. N. [1] Metod asimptoticheskogo priblizheniya dlya sistem s vrashchayushcheisya fazoi i ego primenenie k dvizheniyu zaryazhennykh chastits v magnitnom pole. Ukrain. Mat. Zh., 7, No. 7 (1955). Borodin, A. N. [1] Predel'naya teorema dlya reshenii differentsial'nykh uravnenii so sluchainoi pravoi chast'yu. Teor. Veroyatnost. i Primenen., 22, No. 3 (1977), 498-512. English translation: A limit theorem for solutions of differential equations with random right-hand side. Theory Probab. Appl., 22, No. 3 (1977), 482-497.
Borodin, A. N. and Freidlin, M. I. [1] Fast oscillating random perturbations of dynamical systems with conservation laws. Ann. Inst. H. Poincare, 31, No. 3 (1995), 485-525. Borovkov, A. A. [1] Granichnye zadachi dlya sluchainykh bluzhdanii i bol'shie ukloneniya v funktsional'nykh prostranstvakh. Teor. Veroyatnost. i Primenen., 12, No. 4 (1967), 635-654.
English translation: Boundary-value problems for random walks and large deviations in function spaces. Theory Probab. Appl., 12, No. 4 (1967), 575-595. Burdzeiko, V. P., Ignatov, F. F., Khas'minskii, R. Z., and Shakhgil'dyan, V. Z. [1] Statisticheskii analiz odnoi sistemy fazovoi sinkhronizacii. Proceedings of the Conference on Information Theory, Tbilisi, 1979, 64-67. Ciesielski, Z.
[1] Heat conduction and the principle of not feeling the boundary. Bull. Acad. Polon. Sci. Ser. Math., 14, No. 8 (1966), 435-440. Coddington, E. A. and Levinson, N. [1] Theory of Ordinary Differential Equations. McGraw-Hill: New York, 1955.
Courant, R. (Lax, P.) [1] Partial Differential Equations. New York University, Institute of Mathematical Sciences: New York, 1952. Cramér, H. [1] Sur un nouveau théorème limite de la théorie des probabilités. Actualités Sci. et Ind., 736 (1938).
Da Prato, G. and Zabczyk, J. [1]
Stochastic Equations in Infinite Dimensions. Cambridge University Press: New York, 1992.
Day, M. V. [1] Recent progress on the small parameter exit problem. Stoch. Stoch. Rep., 20 (1987), 121-150.
[2]
Large deviation results for exit problem with characteristic boundary. J. Math. Anal. App!., 147, No. 1 (1990), 134-153.
Dembo, A. and Zeitouni, O. [1] Large Deviations Techniques and Applications. Jones and Bartlett: Boston, 1992. Deuschel, J.-D. and Stroock, D. W. [1] Large Deviations. Academic Press, 1989. Devinatz, A., Ellis, R., and Friedman, A. [1] The asymptotic behavior of the first real eigenvalue of the second-order elliptic operator with a small parameter in the higher derivatives, II. Indiana Univ. Math. J., (1973/74), 991-1011.
Donsker, M. D. and Varadhan, S. R. S. [1] Asymptotic evaluation of certain Markov process expectations for large time, I: Comm. Pure Appl. Math., 28, No. 1 (1975), 1-47; II: ibid., 28, No. 2 (1975), 279-301; III: ibid., 29, No. 4 (1976), 389-461. [2] Asymptotics for the Wiener sausage. Comm. Pure Appl. Math., 28, No. 4 (1975), 525-565. Doob, J. L. [1] Stochastic Processes. Wiley: New York, 1953.
Dubrovskii, V. N. [1] Asimptoticheskaya formula laplasovskogo tipa dlya razryvnykh markovskikh protsessov. Teor. Veroyatnost. i Primenen., 21, No. 1 (1976), 219-222.
English translation: The Laplace asymptotic formula for discontinuous Markov processes. Theory Probab. Appl., 21, No. 1 (1976), 213-216. [2]
Tochnye asimptoticheskie formuly laplasovskogo tipa dlya markovskikh protsessov. Dokl. Akad. Nauk SSSR, 226, No. 5 (1976), 1001-1004. English translation: Dubrovskii, V. N., Exact asymptotic formulas of Laplace type for Markov processes. Soviet Math. Dokl., 17, No. 1 (1976), 223-227.
[3]
Tochnye asimptoticheskie formuly laplasovskogo tipa dlya markovskikh protsessov. Kandidatskaya dissertatsiya, Moskva, 1976.
Dunford, N. and Schwartz, J. T. [1] Linear Operators, Vols. 1-III. Wiley: New York, London, Sydney, Toronto, 1958-1971. Dynkin, E. B. [1] Osnovaniya Teorii Markovskikh Protsessov. Fizmatgiz: Moskva, 1959. German translation: Die Grundlagen der Theorie der Markoffschen Prozesse. Die Grundlehren der mathematischen Wissenschaften in Einzeldarstellungen, Band 108. SpringerVerlag: Berlin, Heidelberg, New York, 1961. [2] Markovskie Protsessy. Fizmatgiz: Moskva, 1963. English translation: Markov Processes. Die Grundlehren der mathematischen Wissenschaften in Einzeldarstellungen, Band 121-122. Springer-Verlag: Berlin, Heidelberg, New York, 1965. Evgrafov, M. A. [1] Asimptoticheskie Otsenki i Tselye Funktsii. Gostekhizdat: Moskva, 1957. English translation: Asymptotic Estimates and Entire Functions. Gordon & Breach: New. York, 1961.
Faris, W. and Jona-Lasinio, G. [1] Large deviations for a nonlinear heat equation with noise. J. Phys. A, 15 (1982), 3025-3055. Feller, W. [1] The parabolic differential equations and the associated semi-groups of transformations. Ann. Math., 55 (1952), 468-519. [2] Generalized second-order differential operators and their lateral conditions. Ill. J. Math., 1 (1957), 459-504.
Fernique, X., Conze, J., and Gani, J. [1] École d'été de Saint-Flour IV-1974. P. L. Hennequin, Ed., Lecture Notes in Math., 480, Springer, 1975.
Feynman, R. P. and Hibbs, A. R. [1] Quantum Mechanics and Path Integrals. McGraw-Hill: New York, 1965. Fisher, R. A. [1] The wave of advance of advantageous genes. Ann. Eugen., 7 (1937), 355-359. Freidlin, M. I.
[1] O stokhasticheskikh uravneniyakh Ito i vyrozhdayushchikhsya ellipticheskikh uravneniyakh. Izv. Akad. Nauk SSSR Ser. Mat., 26, No. 5 (1962), 653-676. [2] Smeshannaya kraevaya zadacha dlya ellipticheskikh uravnenii vtorogo poryadka s malym parametrom. Dokl. Akad. Nauk SSSR, 143, No. 6 (1962), 1300-1303. English translation: A mixed boundary value problem for elliptic differential equations of second order with a small parameter. Soviet Math. Dokl., 3, No. 2 (1962), 616-620. [3] Diffuzionnye protsessy s otrazheniem i zadacha s kosoi proizvodnoi na mnogoobrazii s kraem. Teor. Veroyatnost. i Primenen., 8, No. 1 (1963), 80-88.
English translation: Diffusion processes with reflection and problems with a directional derivative on a manifold with a boundary. Theory Probab. Appl., 8, No. 1 (1963),75-83. [4]
Ob apriornykh otsenkakh reshenii vyrozhdayushchikhsya ellipticheskikh uravnenii. Dokl. Akad. Nauk SSSR, 158, No. 2 (1964), 281-283. English translation : Freidlin, M. I., A priori estimates of solutions of degenerating elliptic equations. Soviet Math. Dokl., 5, No. 5 (1964), 1231-1234.
[5] O faktorizatsii neotritsatel'no opredelennykh matrits. Teor. Veroyatnost. i Primenen., 13, No. 2 (1968), 375-378. English translation: On the factorization of non-negative definite matrices. Theory Probab. Appl., 13, No. 2 (1968), 354-356.
[6] O gladkosti reshenii vyrozhdayushchikhsya ellipticheskikh uravnenii. Izv. Akad. Nauk SSSR Ser. Mat., 32, No. 6 (1968), 1391-1413. English translation: Freidlin, M. I., On the smoothness of solutions of degenerate elliptic equations. Math. USSR-Izv., 32, No. 6 (1968), 1337-1359.
[7] Funktsional deistviya dlya odnogo klassa sluchainykh protsessov. Teor. Veroyatnost. i Primenen., 17, No. 3 (1972), 536-541.
English translation: The action functional for a class of stochastic processes. Theory Probab. Appl., 17, No. 3 (1972), 511-515.
[8] O stabil'nosti vysokonadezhnykh sistem. Teor. Veroyatnost. i Primenen., 20, No. 3 (1975), 584-595. English translation: On the stability of highly reliable systems. Theory Probab. Appl., 20, No. 3 (1975), 572-583.
[9] Fluktuatsii v dinamicheskikh sistemakh s usredneniem. Dokl. Akad. Nauk SSSR, 226, No. 2 (1976), 273-276. English translation: Freidlin, M. I., Fluctuations in dynamical systems with averaging. Soviet Math. Dokl., 17, No. 1 (1976), 104-108.
[10] Subpredel'nye raspredeleniya i stabilizatsiya reshenii parabolicheskikh uravnenii s malym parametrom. Dokl. Akad. Nauk SSSR, 235, No. 5 (1977), 1042-1045. English translation: Freidlin, M. I., Sublimiting distributions and stabilization of solutions of parabolic equations with a small parameter. Soviet Math. Dokl., 18, No. 4 (1977), 1114-1118.
[11] Printsip usredneniya i teoremy o bol'shikh ukloneniyakh. Uspekhi Mat. Nauk, 33, No. 5 (1978), 107-160. English translation: The averaging principle and theorems on large deviations. Russian Math. Surveys, 33, No. 5 (1978), 117-176.
[12] Propagation of concentration waves due to a random motion connected with growth. Soviet Math. Dokl., 246 (1979), 544-548.
[13] On wave front propagation in periodic media. Stochastic Analysis and Applications, M. A. Pinsky, Ed., Marcel Dekker: New York, 1984, 147-166.
[14] Limit theorems for large deviations and reaction-diffusion equations. Ann. Probab., 13, No. 3 (1985), 639-676.
[15] Functional Integration and Partial Differential Equations. Princeton Univ. Press: Princeton, NJ, 1985.
[16] Random perturbations of RDE's. TAMS, 305, No. 2 (1988), 665-697.
[17] Coupled reaction-diffusion equations. Ann. Probab., 19, No. 1 (1991), 29-57.
[18] Semi-linear PDE's and limit theorems for large deviations. École d'été de Probab. de Saint-Flour XX-1990. Lecture Notes in Math. 1527, Springer, 1991.
[19] Random perturbations of dynamical systems: Large deviations and averaging. Math. J. Univ. of Sao Paulo, 1, No. 2/3 (1994), 183-216.
[20] Wave front propagation for KPP-type equations. In Surveys in Applied Mathematics, 2, J. B. Keller, D. M. McLaughlin, and G. Papanicolaou, Eds., Plenum: New York, 1995, 1-62.
[21] Markov Processes and Differential Equations: Asymptotic Problems. Birkhäuser, 1996.
Freidlin, M. I. and Lee, T.-Y. [ 1 ] Wave front propagation and large deviations for diffusion-transmutation processes. Probab. Theory Related Fields, 106, No. 1 (1996), 39-70. Freidlin, M. I. and Weber, M. [1] Random perturbations of nonlinear oscillator, Technical Report-1997, Dresden Technical University, 1997. Freidlin, M. I. and Wentzell, A. D. [1] Reaction-diffusion equations with randomly perturbed boundary conditions. Ann. Probab., 20, No. 2 (1992), 963-986. [2] Diffusion processes on graphs and the averaging principle, Ann. Probab., 21, No. 4 (1993), 2215-2245. [3] Random perturbations of Hamiltonian systems. In Memoirs of American Mathematical Society, No. 523, May 1994.
[4] Necessary and sufficient conditions for weak convergence of one-dimensional Markov processes. In The Dynkin Festschrift: Markov Processes and their Applications, M. I. Freidlin, Ed., Birkhäuser, 1994, 95-110.
Friedman, A. [1] Small random perturbations of dynamical systems and applications to parabolic equations. Indiana Univ. Math. J., 24, No. 6 (1974), 533-553; Erratum to this article, ibid., 24, No. 9 (1975).
Gantmakher, F. R. [1] Teoriya Matrits. Nauka: Moskva, 1967. English translation: Gantmakher, F. R., Theory of Matrices. Chelsea: New York, 1977.
Gärtner, J.
[1] Teoremy o bol'shikh ukloneniyakh dlya nekotorogo klassa sluchainykh protsessov. Teor. Veroyatnost. i Primenen., 21, No. 1 (1976), 95-106. English translation: Theorems on large deviations for a certain class of random processes. Theory Probab. Appl., 21, No. 1 (1976), 96-106.
[2] O logarifmicheskoi asimptotike veroyatnostei bol'shikh uklonenii. Kandidatskaya dissertatsiya, Moskva, 1976.
[3] O bol'shikh ukloneniyakh ot invariantnoi mery. Teor. Veroyatnost. i Primenen., 22, No. 1 (1977), 27-42. English translation: On large deviations from the invariant measure. Theory Probab. Appl., 22, No. 1 (1977), 24-39.
[4] Nonlinear diffusion equations and excitable media. Soviet Math. Dokl., 254 (1980), 1310-1314.
[5] On wave front propagation. Math. Nachr., 100 (1981), 271-296.
Gärtner, J. and Freidlin, M. I. [1] On the propagation of concentration waves in periodic and random media. Soviet Math. Dokl., 249 (1979), 521-525.
Gel'fand, I. M. and Fomin, S. V. [1] Variatsionnoe Ischislenie. Fizmatgiz: Moskva, 1961. English translation: Gel'fand, I. M. and Fomin, S. V., Calculus of Variations, revised English edition. Prentice-Hall: Englewood Cliffs, NJ, 1963. Gikhman, I. I. [1] Po povodu odnoi teoremy N. N. Bogolyubova. Ukrain. Mat. Zh., 4, No. 2 (1952), 215-218. Gikhman, I. I. and Skorokhod, A. V.
[1] Vvedenie v Teoriyu Sluchainykh Protsessov. Nauka: Moskva, 1965. English translation: Introduction to the Theory of Random Processes. W. B. Saunders: Philadelphia, 1969.
[2] Sluchainye Protsessy. Nauka: Moskva, 1971. English translation: Gikhman, I. I. and Skorokhod, A. V., The Theory of Stochastic Processes. Die Grundlehren der mathematischen Wissenschaften in Einzeldarstellungen, Band 210. Springer-Verlag: Berlin, Heidelberg, New York, 1974-1979.
Girsanov, I. V.
[1] O preobrazovanii odnogo klassa sluchainykh protsessov s pomoshch'yu absolyutno nepreryvnoi zameny mery. Teor. Veroyatnost. i Primenen., 5, No. 3 (1960), 314-330. English translation: On transforming a certain class of stochastic processes by absolutely continuous substitution of measures. Theory Probab. Appl., 5, No. 3 (1960), 285-301. Grigelionis, B. [1] O strukture plotnostei mer, sootvetstvuyushchikh sluchainym protsessam. Litovsk. Mat. Sb., 13, No. 1 (1973), 71-78. Grin', A. G.
[1] O vozmushcheniyakh dinamicheskikh sistem regulyarnymi gaussovskimi protsessami. Teor. Veroyatnost. i Primenen., 20, No. 2 (1975), 456-457. English translation: On perturbations of dynamical systems by regular Gaussian processes. Theory Probab. Appl., 20, No. 2 (1975), 442-443.
[2]
Nekotorye zadachi, kasayushchiesya ustoichivosti pri malykh sluchainykh vozmushcheniyakh. Kandidatskaya dissertatsiya, Moskva, 1976. [3] O malykh sluchainykh impul'snykh vozmushcheniyakh dinamicheskikh sistem. Teor. Veroyatnost. i Primenen., 20, No. 1 (1975), 150-158. English translation: On small random impulsive perturbations of dynamical systems. Theory Probab. Appl., 20, No. 1 (1975), 150-158. Gulinsky, O. V. and Veretennikov, A. Yu. [1] Large Deviations for Discrete-Time Processes with Averaging. VSP: Utrecht, 1993.
Holland, C. J. [1] Singular perturbations in elliptic boundary value problems. J. Differential Equations, 20, No. 1 (1976), 248-265.
Hunt, G. A. [1] Some theorems concerning Brownian motion. Trans. Amer. Math. Soc., 81 (1956), 294-319.
Ibragimov, I. A. and Linnik, Yu. V. [1] Nezavisimye i Statsionarno Svyazannye Velichiny. Nauka: Moskva, 1965. English translation: Independent and Stationary Sequences of Random Variables. Wolters-Noordhoff: Groningen, 1971.
Ibragimov, I. A. and Rozanov, Yu. A. [1] Gaussian Random Processes. Springer, 1978.
Ikeda, N. [1] On the construction of two-dimensional diffusion processes satisfying Wentzell's boundary conditions and its application to boundary value problems. Mem. Coll. Sci. Kyoto, Ser. A, 33, No. 3 (1961), 367-427.
Ioffe, A. D. and Tikhomirov, V. M. [1] Teoriya Ekstremal'nykh Zadach. Nauka: Moskva, 1974. English translation: Ioffe, A. D. and Tikhomirov, V. M., Theory of Extremal Problems. Studies in Mathematics and its Applications, Vol. 6. North-Holland: Amsterdam, New York, 1979.
Kato, T. [1] Perturbation Theory for Linear Operators, 2nd ed. Springer-Verlag: Berlin, Heidelberg, New York, 1976.
Khas'minskii, R. Z. [1] Ustoichivost' Sistem Differentsial'nykh Uravnenii pri Sluchainykh Vozmushcheniyakh Ikh Parametrov. Nauka: Moskva, 1969. English translation: Khas'minskii, R. Z., Stochastic Stability of Differential Equations, 2nd ed. Sijthoff & Noordhoff: Alphen aan den Rijn, Netherlands and Rockville, MD, 1980.
[2] O polozhitel'nykh resheniyakh uravneniya Au + Vu = 0. Teor. Veroyatnost. i Primenen., 4, No. 3 (1959), 332-341. English translation: On positive solutions of the equation Au + Vu = 0. Theory Probab. Appl., 4, No. 3 (1959), 309-318.
[3] O printsipe usredneniya dlya parabolicheskikh i ellipticheskikh differentsial'nykh uravnenii i markovskikh protsessov s maloi diffuziei. Teor. Veroyatnost. i Primenen., 8, No. 1 (1963), 3-25. English translation: Principle of averaging for parabolic and elliptic differential equations and for Markov processes with small diffusion. Theory Probab. Appl., 8, No. 1 (1963), 1-21.
[4] O sluchainykh protsessakh, opredelyaemykh differentsial'nymi uravneniyami s malym parametrom. Teor. Veroyatnost. i Primenen., 11, No. 2 (1966), 240-259. English translation: On stochastic processes defined by differential equations with a small parameter. Theory Probab. Appl., 11, No. 2 (1966), 211-228.
[5] Predel'naya teorema dlya reshenii differentsial'nykh uravnenii so sluchainoi pravoi chast'yu. Teor. Veroyatnost. i Primenen., 11, No. 3 (1966), 444-462. English translation: A limit theorem for the solutions of differential equations with random right-hand sides. Theory Probab. Appl., 11, No. 3 (1966), 390-406.
[6] O printsipe usredneniya dlya stokhasticheskikh differentsial'nykh uravnenii Ito. Kybernetika, 4, No. 3 (1968), 260-279.
[7] Ergodic properties of recurrent diffusion processes and stabilization of the solutions of the Cauchy problem for parabolic equations. Teor. Veroyatnost. i Primenen., 5 (1960), 179-196.
Kifer, Yu. I. [1] Nekotorye rezul'taty, kasayushchiesya malykh sluchainykh vozmushchenii dinamicheskikh sistem. Teor. Veroyatnost. i Primenen., 19, No. 2 (1974), 514-532. English translation: Certain results concerning small random perturbations of dynamical systems. Theory Probab. Appl., 19, No. 2 (1974), 487-505.
[2] O malykh sluchainykh vozmushcheniyakh nekotorykh gladkikh dinamicheskikh sistem. Izv. Akad. Nauk SSSR Ser. Mat., 38, No. 5 (1974), 1091-1115. English translation: Kifer, Ju. I., On small random perturbations of some smooth dynamical systems. Math. USSR-Izv., 8, No. 5 (1974), 1083-1107.
[3] Ob asimptotike perekhodnykh plotnostei protsessov s maloi diffuziei. Teor. Veroyatnost. i Primenen., 21, No. 3 (1976), 527-536. English translation: On the asymptotics of the transition density of processes with small diffusion. Theory Probab. Appl., 21, No. 3 (1976), 513-522.
[4] Large deviations in dynamical systems and stochastic processes. Invent. Math., 110 (1992), 337-370.
[5] Limit theorems on averaging for dynamical systems. Ergodic Theory Dynamical Syst., 15 (1995), 1143-1172.
Kolmogorov, A., Petrovskii, I., and Piskunov, N. [1] Étude de l'équation de la diffusion avec croissance de la matière et son application à un problème biologique. Moscow Univ. Bull. Math., 1 (1937), 1-25.
Kolmogorov, A. N. [1] Zur Umkehrbarkeit der statistischen Naturgesetze. Math. Ann., 113 (1937), 766-772.
Kolmogorov, A. N. and Fomin, S. V. [1] Elementy Teorii Funktsii i Funktsional'nogo Analiza. Nauka: Moskva, 1968. English translation: Elements of the Theory of Functions and Functional Analysis. Graylock Press: Rochester, NY, 1957.
Komatsu, T. [1] Markov processes associated with certain integro-differential equations, Osaka J. Math., 10 (1973), 271-303.
Kozlov, S. M. [1] The averaging of random operators. Math. USSR Sbornik, 37, No. 2 (1979), 167-180.
Krasnosel'skii, M. A. and Krein, S. G. [1] O printsipe usredneniya v nelineinoi mekhanike. Uspekhi Mat. Nauk, 10, No. 3 (1955), 147-152.
Krylov, N. V.
[1] O kvazidiffuzionnykh protsessakh. Teor. Veroyatnost. i Primenen., 11, No. 3 (1966), 424-443. English translation: On quasi-diffusional processes. Theory Probab. Appl., 11, No. 3 (1966), 373-389.
[2] Upravlyaemye Diffuzionnye Protsessy. Nauka: Moskva, 1977. English translation: Controlled Diffusion Processes. Springer-Verlag: New York, Heidelberg, Berlin, 1980.
Krylov, N. V. and Safonov, M. V. [1] On a problem suggested by A. D. Wentzell. In The Dynkin Festschrift. Markov Processes and their Applications, M. I. Freidlin, Ed., Birkhäuser, 1994, 209-220.
Kunita, H. and Watanabe, S. [1] On square integrable martingales. Nagoya Math. J., 30 (1967), 209-245.
Labkovskii, V. A. [1] Novye predel'nye teoremy o vremeni pervogo dostizheniya granitsy tsep'yu Markova. Soobshch. Akad. Nauk Gruzin. SSR, 67, No. 1 (1972), 41-44.
Ladyzhenskaya, O. A. and Ural'tseva, N. N. [1] Lineinye i Kvazilineinye Uravneniya Ellipticheskogo Tipa. Nauka: Moskva, 1964. English translation: Ladyzhenskaya, O. A. and Ural'tseva, N. N., Linear and Quasilinear Elliptic Equations. Mathematics in Science and Engineering, Vol. 46. Academic Press: New York, 1968.
Lepeltier, J. P. and Marchal, B. [1] Problème des martingales et équations différentielles stochastiques associées à un opérateur intégro-différentiel. Ann. Inst. H. Poincaré, B12, No. 1 (1976), 43-103.
Levina, L. V., Leontovich, A. M., and Pyatetskii-Shapiro, I. I. [1] Ob odnom reguliruemom vetvyashchemsya protsesse. Problemy Peredachi Informatsii, 4, No. 2 (1968), 72-83. English translation: A controllable branching process. Problems Inform. Transmission, 4, No. 2 (1968), 55-64.
Levinson, N.
[1] The first boundary value problem for εΔu + Au_x + Bu_y + Cu = D for small ε. Ann. Math., 51, No. 2 (1950), 428-445.
Liptser, R. S. [1] Large deviations for occupation measures of Markov processes. Theory Probab. Appl., 1 (1996), 65-88.
[2] Large deviations for two-scaled diffusion. Probab. Theory Related Fields, 1 (1996), 71-104.
Malkin, I. G. [1] Theory of Stability of Motion. United States Atomic Energy Commission: Washington, DC, 1952. Mandl, P. [1] Analytic Treatment of One-Dimensional Markov Processes. Springer: Prague, Academia, 1968.
McKean, H. P., Jr. [1] Stochastic Integrals. Probability and Mathematical Statistics, Vol. 5. Academic Press: New York, 1969.
Mogul'skii, A. A. [1] Bol'shie ukloneniya dlya traektorii mnogomernykh sluchainykh bluzhdanii. Teor. Veroyatnost. i Primenen., 21, No. 2 (1976), 309-323. English translation: Large deviations for trajectories of multi-dimensional random walks. Theory Probab. Appl., 21, No. 2 (1976), 300-315.
Molchanov, S. A. [1] Diffuzionnye protsessy i rimanova geometriya. Uspekhi Mat. Nauk, 30, No. 1 (1975), 3-59. English translation: Diffusion processes and Riemannian geometry. Russian Math. Surveys, 30, No. 1 (1975), 1-63.
Neishtadt, A. I. [1] Ob osrednenii v mnogochastotnykh sistemakh, I: Dokl. Akad. Nauk SSSR, 223, No. 2 (1975), 314-317; II: ibid., 226, No. 6 (1976), 1296-1298. English translation: Averaging in multifrequency systems, I: Soviet Physics Doklady, 20, No. 7 (1975), 492-494; II: ibid., 21, No. 2 (1976), 80-82.
Nevel'son, M. B. [1] O povedenii invariantnoi mery diffuzionnogo protsessa s maloi diffuziei na okruzhnosti. Teor. Veroyatnost. i Primenen., 9, No. 1 (1964), 139-146. English translation: On the behavior of the invariant measure of a diffusion process with small diffusion on a circle. Theory Probab. Appl., 9, No. 1 (1964), 125-131.
Nguen Viet Fu [1] K odnoi zadache ob ustoichivosti pri malykh sluchainykh vozmushcheniyakh. Vestnik Moskov. Univ. Ser. Mat. Mekh., 1974, No. 5, 8-13.
[2] Malye gaussovskie vozmushcheniya i uravneniya Eilera. Vestnik Moskov. Univ. Ser. Mat. Mekh., 1974, No. 6, 12-18.
Papanicolaou, G. and Varadhan, S. R. S. [1] Boundary-value problems with rapidly oscillating random coefficients. In Proceedings of the Conference on Random Fields, Esztergom, Hungary, Colloquia Math. Soc. János Bolyai, 27 (1981), 835-873.
[2] Diffusion in regions with many small holes. In Proceedings of the Conference Stochastic Differential Systems and Control, Vilnius, 1978, 1979.
Petrov, V. V.
[1] Summy Nezavisimykh Sluchainykh Velichin. Nauka: Moskva, 1972. English translation: Sums of Independent Random Variables. Ergebnisse der Mathematik und ihrer Grenzgebiete, 82. Springer-Verlag: Berlin, Heidelberg, New York, 1975.
Pontryagin, L. S., Andronov, A. A., and Vitt, A. A. [1] O statisticheskom rassmotrenii dinamicheskikh sistem. Zh. Eksper. Teoret. Fiz., 3, No. 3 (1933), 165-180.
Prokhorov, Yu. V. [1] Skhodimost' sluchainykh protsessov i predel'nye teoremy teorii veroyatnostei. Teor. Veroyatnost. i Primenen., 1, No. 2 (1956), 177-238. English translation: Convergence of random processes and limit theorems in probability theory. Theory Probab. Appl., 1, No. 2 (1956), 157-214.
Riesz, F. and Szőkefalvi-Nagy, B. [1] Functional Analysis. Translation of second French edition. Ungar: New York, 1955.
Rockafellar, R. T. [1] Convex Analysis. Princeton Mathematical Series, 28. Princeton University Press: Princeton, NJ, 1970.
Rozanov, Yu. A. [1] Statsionarnye Sluchainye Protsessy. Fizmatgiz: Moskva, 1963. English translation: Stationary Random Processes. Holden-Day: San Francisco, 1967.
Sarafyan, V. V., Safaryan, R. G., and Freidlin, M. I. [1] Vyrozhdennye diffuzionnye protsessy i diffuzionnye uravneniya s malym parametrom. Uspekhi Mat. Nauk, 33, No. 6 (1978), 233-234.
English translation: Degenerate diffusion processes and differential equations with a small parameter. Russian Math. Surveys, 33, No. 6 (1978), 257-260.
Schilder, M. [1] Some asymptotic formulas for Wiener integrals. Trans. Amer. Math. Soc., 125, No. 1 (1966), 63-85.
Sinai, Ya. G.
[1] Gibbovskie mery v ergodicheskoi teorii. Uspekhi Mat. Nauk, 27, No. 4 (1972), 21-64. English translation: Gibbs measures in ergodic theory. Russian Math. Surveys, 27, No. 4 (1972), 21-69.
Sinai, Ya. G. and Khanin, K. M. [1] Mixing for some classes of special flows over a circle rotation. Funct. Anal. Appl., 26, No. 3 (1992), 1-21.
Skorokhod, A. V. [1] Sluchainye Protsessy s Nezavisimymi Prirashcheniyami. Nauka: Moskva, 1964.
Sobolev, S. L.
[1] Nekotorye primeneniya funktsional'nogo analiza v matematicheskoi fizike. Izd. Len. gos. universiteta, Leningrad, 1950. English translation: Applications of Functional Analysis in Mathematical Physics. Translations of Mathematical Monographs, 7. American Mathematical Society: Providence, RI, 1963.
Stratonovich, R. L. [1] Uslovnye Markovskie Protsessy i Ikh Primenenie k Teorii Optimal'nogo Upravleniya. Izd. MGU: Moskva, 1966. English translation: Conditional Markov Processes and Their Application to the Theory of Optimal Control. Modern Analytic and Computational Methods in Science and Mathematics, 7. American Elsevier: New York, 1968.
Stroock, D. W. [1] Diffusion processes with Lévy generators. Z. Wahrsch. Verw. Gebiete, 32 (1975), 209-244.
Stroock, D. W. and Varadhan, S. R. S. [1] Diffusion processes with continuous coefficients, I: Comm. Pure Appl. Math., 22, No. 3 (1969), 345-400; II: ibid., 22, No. 4 (1969), 479-530.
[2] Multidimensional Diffusion Processes. Springer, 1979.
Sytaya, G. N. [1] O nekotorykh lokal'nykh predstavleniyakh dlya gaussovoi mery v gil'bertovom prostranstve. Tezisy dokladov Mezhdunarodnoi konferentsii po teorii veroyatnostei i matematicheskoi statistike, Vilnius, 1973, Vol. II, 267-268.
Varadhan, S. R. S. [1] Asymptotic probabilities and differential equations. Comm. Pure Appl. Math., 19, No. 3 (1966), 261-286.
[2] On the behavior of the fundamental solution of the heat equation with variable coefficients. Comm. Pure Appl. Math., 20, No. 2 (1967), 431-455.
[3] Diffusion processes in a small time interval. Comm. Pure Appl. Math., 20, No. 4 (1967), 659-685.
[4] Large Deviations and Applications. SIAM, CBMS-NSF Regional Conference Series in Applied Mathematics, 46 (1984).
Volosov, V. M. [1] Usrednenie v sistemakh obyknovennykh differentsial'nykh uravnenii. Uspekhi Mat. Nauk, 17, No. 6 (1962), 3-126. English translation: Averaging in systems of ordinary differential equations. Russian Math. Surveys, 17, No. 6 (1962), 1-126.
Walsh, J. [1] An introduction to stochastic partial differential equations. In École d'été de Probabilités de Saint-Flour XIV-1984, P. L. Hennequin, Ed., Lecture Notes in Mathematics, 1180, Springer, 1984, 265-439.
Wentzell, A. D. [1] Kurs Teorii Sluchainykh Protsessov. Nauka: Moskva, 1975. English translation: Wentzell, A. D., A Course in the Theory of Stochastic Processes. McGraw-Hill: New York, London, 1980.
[2] Ob asimptotike naibol'shego sobstvennogo znacheniya ellipticheskogo differentsial'nogo operatora s malym parametrom pri starshikh proizvodnykh. Dokl. Akad. Nauk SSSR, 202, No. 1 (1972), 19-21. English translation: Wentzell, A. D., On the asymptotic behavior of the greatest eigenvalue of a second-order elliptic differential operator with a small parameter in the higher derivatives. Soviet Math. Dokl., 13, No. 1 (1972), 13-17.
[3] Ob asimptotike sobstvennykh znachenii matrits s elementami poryadka exp{-V_ij/(2ε²)}. Dokl. Akad. Nauk SSSR, 202, No. 2 (1972), 263-266. English translation: Wentzell, A. D., On the asymptotics of eigenvalues of matrices with elements of order exp{-V_ij/(2ε²)}. Soviet Math. Dokl., 13, No. 1 (1972), 65-68.
[4] Teoremy, kasayushchiesya funktsionala deistviya dlya gaussovskikh sluchainykh funktsii. Teor. Veroyatnost. i Primenen., 17, No. 3 (1972), 541-544. English translation: Wentzell, A. D., Theorems on the action functional for Gaussian random functions. Theory Probab. Appl., 17, No. 3 (1972), 515-517.
[5] Formuly dlya sobstvennykh funktsii i mer, svyazannykh s markovskim protsessom. Teor. Veroyatnost. i Primenen., 18, No. 1 (1973), 3-29. English translation: Wentzell, A. D., Formulae for eigenfunctions and eigenmeasures associated with a Markov process. Theory Probab. Appl., 18, No. 1 (1973), 1-26.
[6] Ob asimptotike pervogo sobstvennogo znacheniya differentsial'nogo operatora vtorogo poryadka s malym parametrom pri starshikh proizvodnykh. Teor. Veroyatnost. i Primenen., 20, No. 3 (1975), 610-613. English translation: Wentzell, A. D., On the asymptotic behavior of the first eigenvalue of a second-order differential operator with small parameter in higher derivatives. Theory Probab. Appl., 20, No. 3 (1975), 599-602.
[7] Grubye predel'nye teoremy o bol'shikh ukloneniyakh dlya markovskikh sluchainykh protsessov, I: Teor. Veroyatnost. i Primenen., 21, No. 2 (1976), 231-251; II: ibid., 21, No. 3 (1976), 512-526. English translation: Wentzell, A. D., Rough limit theorems on large deviations for Markov stochastic processes, I: Theory Probab. Appl., 21, No. 2 (1976), 227-242; II: ibid., 21, No. 3 (1976), 499-512.
[8] On lateral conditions for multidimensional diffusion processes. Teor. Veroyatnost. i Primenen., 4, No. 2 (1959), 172-185.
[9] Limit Theorems on Large Deviations for Markov Stochastic Processes. Kluwer: Dordrecht, 1991.
Wentzell, A. D. and Freidlin, M. I. [1] Malye sluchainye vozmushcheniya dinamicheskoi sistemy s ustoichivym polozheniem ravnovesiya. Dokl. Akad. Nauk SSSR, 187, No. 3 (1969), 506-509. English translation: Wentzell, A. D. and Freidlin, M. I., Small random perturbations of a dynamical system with a stable equilibrium position. Soviet Math. Dokl., 10, No. 4 (1969), 886-890.
[2] O predel'nom povedenii invariantnoi mery pri malykh sluchainykh vozmushcheniyakh dinamicheskikh sistem. Dokl. Akad. Nauk SSSR, 188, No. 1 (1969), 13-16. English translation: Wentzell, A. D. and Freidlin, M. I., On the limiting behavior of an invariant measure under small perturbations of a dynamical system. Soviet Math. Dokl., 10, No. 5 (1969), 1047-1051.
[3] O dvizhenii diffundiruyushchei chastitsy protiv techeniya. Uspekhi Mat. Nauk, 24, No. 5 (1969), 229-230.
[4] O malykh sluchainykh vozmushcheniyakh dinamicheskikh sistem. Uspekhi Mat. Nauk, 25, No. 1 (1970), 3-55. English translation: Wentzell, A. D. and Freidlin, M. I., On small random perturbations of dynamical systems. Russian Math. Surveys, 25, No. 1 (1970), 1-55.
[5] Nekotorye zadachi, kasayushchiesya ustoichivosti pri malykh sluchainykh vozmushcheniyakh. Teor. Veroyatnost. i Primenen., 17, No. 2 (1972), 281-295. English translation: Wentzell, A. D. and Freidlin, M. I., Some problems concerning stability under small random perturbations. Theory Probab. Appl., 17, No. 2 (1972), 269-283.
Wolansky, G. [1] Stochastic perturbations to conservative dynamical systems on the plane. Trans. Amer. Math. Soc., 309 (1988), 621-639.
[2] Limit theorems for a dynamical system in the presence of resonances and homoclinic orbits. J. Differential Equations, 83 (1990), 300-335.
Zhivoglyadova, L. V. and Freidlin, M. I. [1] Kraevye zadachi s malym parametrom dlya diffuzionnogo protsessa s otrazheniem. Uspekhi Mat. Nauk, 31, No. 5 (1976), 241-242.
Index

Action function 80
  for families of distributions in a finite-dimensional space 138, 140
  for the invariant measure 190-191
  normalized 80
Action functional 73, 81, 413-414
  connection with Feynman's approach to quantum mechanics 73
  for diffusion processes with small diffusion 104, 154, 155
  for diffusion processes with small diffusion and reflection 395
  for a dynamical system with Gaussian perturbations 133
  for families of locally infinitely divisible processes 146
  for families of random measures 387, 391
  for Gaussian random processes and fields 92
  normalized 74
  for stochastic equations with averaging 277
  for systems with averaging 235-236, 254
  for the Wiener process with a small parameter 74
Asymptotic expansion
  for the solution of the perturbed system 2, 52
  for the solution of a stochastic equation with a small parameter 56
  for the time and place of the first exit from a domain 58
Averaging principle 213, 217, 269, 286
Brownian sheet 409
Cramér transformation 90, 148, 384
Diffusion processes on graphs 289
Eigenvalue problems 204
Equivalence relation connected with the action functional 169
Exterior vertex 295
First integral
  for a dynamical system 283
  for a Markov process 354
Gluing conditions 289, 295
Graph corresponding to a function 287
Hamiltonian system 283
Hierarchy of cycles 198
Homogenization 415
Huygens principle 403
Interior vertex 295
KPP-equation 398-399
Laplace method 70, 121, 383
Legendre transformation of convex functions 136
Limit behavior
  of the solution of the Dirichlet problem 64, 67, 69, 115, 158, 176, 198, 273-274, 281-282, 297-298
  of the solution of the mixed problem 392, 396
  of the solution of parabolic equations 60, 105-108, 280-281
Limit distribution. See Problem of the invariant measure
Logarithmic equivalence 11
Measure of stability 364-365
Normal deviations. See Results of the type of the central limit theorem
Normalizing coefficient 80
Perturbations of dynamical systems
  by a white noise 8, 45
  nonrandom 1
  random 3
Principle of not feeling the boundary 379
Problem of the asymptotics of the mean exit time 124, 127, 134, 160, 249
Problem of exit from a domain 108, 158, 160, 176, 192, 300
Problem of the invariant measure 128, 158, 160, 185, 412-413
Problem of optimal stabilization 366-367
Problem of selecting an optimal critical domain 365
Quasipotential 108, 118, 159
Reaction-diffusion equation (RDE) 408, 414
Regularity of a set with respect to a function (a functional) 85
Results of the type of laws of large numbers 5
  for perturbations of the white noise type 45
  for stochastic equations with averaging 269
  for systems with averaging 216
Results of the type of the central limit theorem 5
  for perturbations of the white noise type 57
  for systems with averaging 221
Results on large deviations 5, 377. See also Action functional
Sets stable with respect to the action functional 188
Sublimit distributions 198
Van der Pol equation 214, 261
W-graphs 177
Wave front 404
Grundlehren der mathematischen Wissenschaften
Continued from page ii

258. Smoller: Shock Waves and Reaction-Diffusion Equations, 2nd Edition
259. Duren: Univalent Functions
260. Freidlin/Wentzell: Random Perturbations of Dynamical Systems, Second Edition
261. Bosch/Güntzer/Remmert: Non-Archimedean Analysis: A Systematic Approach to Rigid Analytic Geometry
262. Doob: Classical Potential Theory and Its Probabilistic Counterpart
263. Krasnosel'skii/Zabreiko: Geometrical Methods of Nonlinear Analysis
264. Aubin/Cellina: Differential Inclusions
265. Grauert/Remmert: Coherent Analytic Sheaves
266. de Rham: Differentiable Manifolds
267. Arbarello/Cornalba/Griffiths/Harris: Geometry of Algebraic Curves, Vol. I
268. Arbarello/Cornalba/Griffiths/Harris: Geometry of Algebraic Curves, Vol. II
269. Schapira: Microdifferential Systems in the Complex Domain
270. Scharlau: Quadratic and Hermitian Forms
271. Ellis: Entropy, Large Deviations, and Statistical Mechanics
272. Elliott: Arithmetic Functions and Integer Products
273. Nikol'skii: Treatise on the Shift Operator
274. Hörmander: The Analysis of Linear Partial Differential Operators III
275. Hörmander: The Analysis of Linear Partial Differential Operators IV
276. Liggett: Interacting Particle Systems
277. Fulton/Lang: Riemann-Roch Algebra
278. Barr/Wells: Toposes, Triples and Theories
279. Bishop/Bridges: Constructive Analysis
280. Neukirch: Class Field Theory
281. Chandrasekharan: Elliptic Functions
282. Lelong/Gruman: Entire Functions of Several Complex Variables
283. Kodaira: Complex Manifolds and Deformation of Complex Structures
284. Finn: Equilibrium Capillary Surfaces
285. Burago/Zalgaller: Geometric Inequalities
286. Andrianov: Quadratic Forms and Hecke Operators
287. Maskit: Kleinian Groups
288. Jacod/Shiryaev: Limit Theorems for Stochastic Processes
289. Manin: Gauge Field Theory and Complex Geometry
290. Conway/Sloane: Sphere Packings, Lattices and Groups
291. Hahn/O'Meara: The Classical Groups and K-Theory
292. Kashiwara/Schapira: Sheaves on Manifolds
293. Revuz/Yor: Continuous Martingales and Brownian Motion
294. Knus: Quadratic and Hermitian Forms over Rings
295. Dierkes/Hildebrandt/Küster/Wohlrab: Minimal Surfaces I
296. Dierkes/Hildebrandt/Küster/Wohlrab: Minimal Surfaces II
297. Pastur/Figotin: Spectra of Random and Almost-Periodic Operators
298. Berline/Getzler/Vergne: Heat Kernels and Dirac Operators
299. Pommerenke: Boundary Behaviour of Conformal Maps
300. Orlik/Terao: Arrangements of Hyperplanes
301. Loday: Cyclic Homology
302. Lange/Birkenhake: Complex Abelian Varieties
303. DeVore/Lorentz: Constructive Approximation
304. Lorentz/v. Golitschek/Makovoz: Constructive Approximation. Advanced Problems
305. Hiriart-Urruty/Lemaréchal: Convex Analysis and Minimization Algorithms I. Fundamentals
306. Hiriart-Urruty/Lemaréchal: Convex Analysis and Minimization Algorithms II. Advanced Theory and Bundle Methods
307. Schwarz: Quantum Field Theory and Topology
308. Schwarz: Topology for Physicists
309. Adem/Milgram: Cohomology of Finite Groups
310. Giaquinta/Hildebrandt: Calculus of Variations I: The Lagrangian Formalism
311. Giaquinta/Hildebrandt: Calculus of Variations II: The Hamiltonian Formalism
312. Chung/Zhao: From Brownian Motion to Schrödinger's Equation
313. Malliavin: Stochastic Analysis
314. Adams/Hedberg: Function Spaces and Potential Theory
315. Bürgisser/Clausen/Shokrollahi: Algebraic Complexity Theory
316. Saff/Totik: Logarithmic Potentials with External Fields
317. Rockafellar/Wets: Variational Analysis
318. Kobayashi: Hyperbolic Complex Spaces