This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
e0 : 8o E F} .
Definition 10.3.1
=
We prove in this paper that, except for a case studied in § 10.4.3 where
either \i8o E F, e0 = 1 and then
•
or 38o E F, e 0 = 0 and then
=
0 (if He0 is accepted for at least
At the beginning of § 10.4 we shall explain that a has t o b e infinitesimal. We shall prove in § 10.5 that this NSLRT is uniformly more powerful than any "regular" (this term will be defined later) nonstandard test based on X. Convention: if E is an external event, and P R(E) its external probability and if T) is an external number, "the probability of E is equal to T)" means P R(E) = T) and "the probability of E is exactly equal to TJ" means P R(E) TJ · =
10.3.4
Large deviations for
X
In exponential families, J( · , >.(Bo)) can be regarded as the Cramer transform of Pe0 • rviore generally, classical literature establishes that
1

n nlim  ln P (X n E A) =  xinf C(x) + oo n
EA
where A is a Borel set of �k such that A C A0 and X n the mean of n i.i.d. ran dom variables taking values in � k , C the Cramer transform of their distribution (see [16] , section 4: the proof is based on the minimax theorem for compact convex sets) . Even more generally, if the i.i.d. random variables take values in some topological vector space E , classical literature ( cf. [3] ) establishes the same result if A is a finite union of convex Borel sets of E. For exponential families, it is possible to establish specific proofs by using halfspaces (cf. [10] , chap. 7 and exercises 7.5.1 to 7.5.6). In a more general setting, halfspaces are also used in [3] . section 2. Our propositions 10.3.2 and 10.3.3 give nonstandard results which imply (by transfer) the classical results. If the first part of the proofs is based on the law of large numbers as in the classical literature, the second part, based on infinitesimal pavings, seems to be original.
10.3. Exponential families 10.3.5
151
nregular sets
Definition 10.3.2 Let A is nregular in a iff 3b E
be a subset of A and a E 8A. We shall say that A hal(a), 3p E J Jn, 0] such that P l l >  1 (b) I I 0 and B(b, p) C A (where B(b, p) is the open ball of centrum b and radius p) .
Proposition 10.3.1 Let A hal(8A) = hal( a 0A); then A
=:
be a limited subset of A such that 0A = ( 0A )0 and is nregular in any point a E a A.
We first claim that A \ hal(8A) = 0A \ hal(8 °A) . Indeed, if x E A \ hal(8A) then °x E 0A and 0x tf. hal(8A) hal( a 0A) and so 0x E 0A \ hal(8 °A); consequently, 0x E (0A)0 and so x E (0A)0; finally, as x tf. hal(8A) = hal( a 0A) , x E 0A \ hal( a 0A) . Conversely, if x E 0A \ hal( a 0A), there exists y E A such that x y; then x E A for if not x E hal(8A) = hal(8 °A) . If a tf. hal(8A), choose an infinitesimal p such that p > Jn · Then for each b E A \ hal(8A) , B (b, p) C A: it is obvious if b is standard and if b is nonstan dard, with 0b its shadow, there exists a standard po such that B(0b, po) C A and then B(b, p) C B(0b, Po) C A. So Proof.
=
=:
Vc ?!:, 0, 3b EA, d(a, b) < c and B (b, p) C A. Then the internal set { c E � : 3b EA, d( a, b) < c 1\ B(b, p) C A} contains all appreciable c. Cauchy ' s principle yields that ::lc
=:
0, 3b E A, d(a, b) < c and B(b, p) C A;
as a tf. hal(8A) then b tf. hal(8A) and so >.  1 (b) is limited (because A is a standard diffeomorphism from 8 ° onto A0). Finally p 1 1 >  1 (b) l l ::: 0. If a E hal(8A) , fix p = n 1 14 ; if b E A\ (hal( 8A)Uhal( 8A)) then B (b, p) c A and, as >.  1 (b) is limited p 1 1 >  1 (b) l l < n 1 18 (for example). So the internal set
{ c E � : 3b EA. d(a., b) < c 1\ B(b, p) C A 1\ P l l > 1 (b) ll < n 1 } 18
contains all appreciable c. Cauchy 's principle yields that
::lc ::: 0, 3b E A, d(a, b) < c 1\ B(b, p) C A 1\ p I I >  1 (b) I I < n 1 18 .
D
For example, a standard set A such that A c A0 (e.g. an open standard set) is nregular in any boundary point. It is possible to prove that a limited convex set A such that 0A has a nonempty interior is also nregular in any boundary point (we shall not use this result in the following) .
10. Nonstandard likelihood ratio test in exponential families
152 Lemma 10.3.1
(i) Let (�o , 6) E A2 and let � E ]�o , 6 [ (the segment between �o and 6). Then J ( ( 6 ) < J ( �o , 6 ) . (ii) Let A be a nonempty subset of A and 6 E A0 \ A. Then J(A, 6) J(A, 6 ) = J (aA, 6 ) . (iii) Let E and F be subsets of A such that E is a compact set included in F 0 ; let � E A0 \ E. Then J(F, �) < J(E, 0 and J(�, F) < J (�, E) . =
Proof. (i) As A is convex, � E A. We set �o  � =: a(�  6 ) where a > 0; denoting e = >,  1 (�) and e 1 := >, 1 (6 ) , we have ( using corollary 2.5 in [10]) (ii) Let �o E A. If �o tf. aA, let � E aA n ].;o , 6 [. Using (i), we can write J(�, 6 ) :::; J(�o, 6 ) . So J(aA, 6 ) :::; J(A, 6 ) . Conversely, using the continuity of J ( , 6), we have ·
J (aA, 6 )
�
J (A, 6 ) = J(A, 6 ) .
(iii) If � E F, then J ( P �) = J (�, F) = 0 and J (E, �) /= 0 , J(�, E) /= 0 because � t/. E. We suppose now that � tf. F. Let �E E §E be such that J(E, �) = J (�E , �) ; there exists at least one �F E aF n ]�E , �[ and then, using (i), J (�F, �) < J(�E, �) and so J(F, �) < J(E, �) . Let now �k E §E be such that J (�, E) J (�, �k) ; there exists at least one �� E §F n J �k , �[. We set B E := A  1 (�k) and Bp := >.  1 (�� ). Then =
J (�, �k)  J (�, �� ) = =
1/J (BE)  1/J (B) + (e  e E 1 �)  1/J (BF) + 1/J (B)  (e  eF 1 �) = '1/J (e E )  1/J (eF) + ( eF  e E 1 �) = 1/J (BE)  1/J (BF) + (eF  e E 1 �� ) + (eF  eE 1 �  �� ) = 1 ( eF , e E) + ( eF  e E 1 .;  �� ) .
Then setting .;  �� = : (3( ��  �k) where (3 > 0, we can write according to corollary 2.5 in [10] .
D
10.3.
Exponential families
153
Proposition 10.3.2 Let A be a limited subset of A \ hal (8A) , 6 = >. (B1) be a limited point of A \ hal (8A) and �o E aA such that J(�o , 6 ) :: J(A, 6 ) . If A is nregular in �o , then
1

 ln Pl) (X E A) :: J(A, 6 ) , 1
n and if furthermore d(6 , A) 7: 0, then
1

 ln Pl) (X E A) 1
n
=
J (A, 6) ( 1 + 0)
=
@.
Proof. For all (el , e2) E 8 2 ' denoting X (xl , . . . ' Xn ) and X (where Xj = (xj ,i hs i s k E � k ) we can write =
(
=
� 'L.j= l Xj
)
PlJ; ( dx) = exp n ( ( el  e2 I x)  1/J ( el) + 1/J ( e2)) p� ( dx) .
Let eo := >.  1 ( �o) and let 6 =: >. (e2 ) and p > Jn be such that 6 E hal ( �o ) and B( 6 , p) c A. As >. is a standard diffeomorphism from 8° onto A0 , Bo , B1 , B2 are limited. Then
PlJ; ( X E A) > PlJ; (X E B ( 6 , p)) > c exp n ((el  e2 I X)  1/J(Bl ) + 1/J(B2)) dP� jXEB(b,p) > exp n c ((el  e2 I X)  '1/J (Bl ) + 1/J (B2)) dPlJ2 JXEB(b,p)
(
(
)
)
by Jensen ' s inequality. P� (Xi E 6 ,  p/ 2, 6,; + p/2]) = + 0 for According to (8.4) in each i = . . . k. Then the nonstandard law of large numbers yields P� (X E B ( 6 , p)) = + 0 and we can write ( denoting by 1:k the external set of limited vectors of �k )
[ 4] ,
1
1 1  ln P(j ( X E A) n
1
>
[
1
,
.!. ln( 1 + 0 )
(B 1  e2 1 6 + J:k p )  1/J(BI ) + 1jJ(B2) + n 0 ( e l  e2 1 6) + J:p  1/J(e 1 ) + 1/J(e2) + n  1 (e2 , e1 ) + 0.
As
I (B2 , B1 )  I(eo , B 1 ) = (eo  e1 1 6  �o ) + I (B2 , Bo), and I (B2 , Bo) = @ I I B2  Bo ll = @ 11 6  �o il with I I Bo  B 1 ll limited, 11 6  �o il we have ( cf. lemma 3. 2 . 2 in
[ 13] ),
::
0
 I (B2 , e1 ) =  I(eo , el ) + 0 =  J(A, 6) + 0 ,
10. Nonstandard likelihood ratio test in exponential families
154
and so

1 n
 ln P� (X E A)
2'
 J (A, 6 ) + 0.
Conversely, the limited set A is included in a hypercube [ p, p] k where p is a standard integer; pave it with n k (2p) k hypercubes ( T1 ) with side 6 = � · Among these hypercubes, eliminate the ones which do not intersect A and for the others choose t1 E A n Tt . In the aim of simplicity, we shall denote again the selected hypercubes by ( Tt h< t < N · Let Bt : = >. 1 ( tt) , Bt is limited because t1 is limited and t 1 tf. hal(8A) . F�om the relation N < nk (2p) k we deduce In N = In n £ . Then we can write ' n n N dPenl p� ( X E A ) :::: L r l=l Jp<E Tl } N L r X E T1 } exp n( ( el  el I X)  1/J (Bl ) + 1/J (Bt)) dP� l=l }{
(
< <
Thus
1
)
f:l=l exp (n ( (el  el I tt + �k )  1/J (Bl) + 1/J (Bt) )) J: k ax exp { n ( ( el  el I t l + )  ?jJ(Bl ) + 1/J (Bt) ) } . N l ,!?l...N n

{(
1
k
)
J:  ln Pe (X E A) <  ln N + :m ax el  el I tt +   1/J (Bl ) + 1/J (Bt) l  l...N n n n J: 1 <  ln N + max { (Bl  Bt l tt)  1/J (BI ) + ?/J (Bt) } + l=l...N n n 1
< <
Finally,
1
}
J:
max  I (Bt , Bl ) + n ln N + l=l...N n
J(A. 6 ) + 0. 1 n

 ln Pe (X E A ) 1
=
 J ( A, 6 ) + 0.
As we said previously, according to lemma 3.2.2 in [ 13] , if a and 6 are limited elements of A \ hal(A) , then J(a , 6 ) = @d(a, 6 ) . Consequently, if d ( A, 6 ) = @ then J ( A , 6) = @. Thus
1

 :;:;, ln P� (X E A) = J(A, 6 ) ( 1 + 0) = @.
D
In fact. the continuity of J(·, 6) allows us to generalize this result to limited subsets of A:
10.3. Exponential families
155
Proposition 10.3.3 Let A be a limited subset of A, 6 be a limited point of A \ hal(8A) and let �o E aA such that J(�o, 6 ) J(A, 6 ) . ff A is nregular in �o then 1 n ln Pl)1 (X E A) ::  J (A, 6). ::
Proof. Let 6 =: >.(B2 ) E hal(�o) and p > Jn be such that B ( 6 , P) C A and p IIB2 II :: 0. As in the proof of proposition 10.3.2, we can write 1 n ln Pl)1 (X E A) > (B 1  e2 1 6 + £kp)  1jJ (B 1 ) + 'ljJ (B 2 ) + _!_n ln (1 + 0 ) (e l  e2 1 6 ) + £p + 0  1/J (B 1 ) + 1/J (B2 ) + 0 .J( 6 , 6 ) + 0 = J( �o , 6) + 0 =  J(A, 6 ) + 0 because 116  �o il :: 0 and .J( · , 6 ) is Scontinuous. Conversely, as in the proof of proposition 10.3.2 , use a paving (T1h < l < N with side 6 = � where each T1 is closed and such that T1 n A /= 0 , but cho;se t1 in T1 (maybe outside A) such that: if B1 , z  el. z > 0 then t l , z is maximal in T1 if B1 , z  el. i � 0 then tl , i is minimal in T1 So, for X E Tl we can write (e l  el I X) :::.: (e l  el I tl) and then
Pe� ( X E A)
N
�
L Jrp<E Tz } dr'e 1
<
L Jr{x E T1 } exp ( n((el  el I X)  1/J (Bl ) + 1/J (Bl )) ) dP�
<
l=l N
l=l N :L: exp n((e l  Bl l tl )  1/J (B I ) + 'l/J (B I )) . l=l
(
Thus
1 n ln Pl)1 (X E A)
)
1
<
 ln N + =maxN {(el  el I t l )  1/J (B l ) + 1/J ( Bl ) } l l... n
<
 ln N + =max l...N {.J(t1 , 6 )} .
1 n
l As .J(· , 6 ) is Scontinuous, J(tl , 6 ) = J (0tl , 6) + 0 and then , as 1  ln Pe: (X E A) �  J( 0 A, 6 ) + 0. n
0tl
E
0A
10. Nonstandard likelihood ratio test in exponential families
156
We now claim that J(0A, 6) ::: J(A, 6). Indeed, if a E A, then °a E 0A and J(a, 6 ) ::: J(0a, 6); so J(0A, 6 ) :;, J(A, 6). Conversely, if a E 0A, there exists a' E A such that a ::: a' and thus such that J (a, 6 ) ::: J (a', 6); so J(A, 6) ;::_ J( 0A, 6 ) . D Finally � ln PJ: ( X E A ) :::; J(A, 6) + 0. 10.3.6
nregular sets defined by KullbackLeibler information
Let 8o be an internal subset of 8, included in a standard compact K included in the interior of e. Let co := co (8o) := sup {I(B, 8o) : e E 8} :::; oo . For c E ]O, co[, let Ac : = {� E A : J(�, Ao) :::; c} where Ao : = >.(8o). We shall see in lemma 10.3.2 that J(· , Ao) is continuous; then if c < c' < co , Ac is a proper subset of Ac' (for if not, { � E A : J(�, Ao) :::; c1c' } = { � E A : J(�, Ao) < c1c' } is a connected component of A which is a connected set.) Lemma 10.3.2
If c :::; co is limited, then aAc is a limited compact set.
= { � E A : J ( �, Ao ) = c} is closed in A since the function � J(�, Ao) is continuous on A as we prove now. By transfer, we just have to prove this continuity for a standard Ao: if 6 and 6 are such that 11 6  6 11 ::: 0, if �o E Ao is such that J(6 , �o) ::: J(6 , Ao) then, as J(, �o) is Scontinuous, J( 6 , �o) ::: J(6 , �o) and so J( 6 , Ao);;,J(6 , Ao) ; similarly J(6 , Ao);;,J( 6 , Ao). We prove now that if � is unlimited, then J(�, A0 ) is unlimited. For each �o E Ao , �o belongs to the standard compact K; then II �  �o il ::: oo and so J(�, �o) ::: oo (cf. [10, p. 177]). Thus J(�, Ao) is unlimited. Therefore { � E A : J(�, Ao) = c} is limited. D ____,
Proof. a Ac
Proposition 10.3.4 Let c :::; co be limited and B1 be a limited point of 8' such that 6 := >.(B 1 ) tf. hal (8A) . (i) If 6 tf. Ac, then � ln PJ: (X E Ac) = � ln PJ: (X E A�) :::  J( aAc, 6 ) .
(ii) If 6 E Ac, then � ln PJ: ( X E Ac)
:::
(iii) If 6 tf. Af , then � ln PJ: ( X E Af)
0.
=
� ln PJ: (X E Af) ::: J( aA c , 6 ) .
(iv) If 6 E Af , then � ln PJ: ( X E Af) ::: 0.
10.3. Exponential families
157
Proof. We first prove the both similar results (i) and (iii), and then the both similar results (ii) and (iv) which are obtained in a same way. (i) If 8o and c are standard, Ac is a standard compact set such that Ac A2 . Indeed, A� {� E A : J (�, Ao) < c}. If � is such that J (�, Ao) c , let >.. o E Ao (a compact set) be such that J(�, Ao) = J (�, Ao) = J (�, >..o ) = c. According to lemma 10.3.1 (i) , for any ( E ] ( >.. o [, J ((, >..o ) < J(�, >..o ) and so J((. Ao) < c. Thus ]�, >..o [ E A� and then � E A2. So, according to proposition 10.3. 1 , Ac and A� arc nregular in any point of 8Ac and then, ac cording to proposition 10.3.3, we can write � ln PlJ; (X E Ac) :::::  J (Ac, 6 ) and � ln PlJ; (X E A�) ::::: J(A� , 6 ) . Finally, lemma 10.3.1 (ii) yields J(Ac, 6 ) = J (A�. 6 ) J(aAc. 6 ) . Now, if c or 8o are not standard, the continuity of J (�, ) on >.. ( K) implies 0Ac(8o) Aoc(08o) and the hypothesis of proposition 10.3.1 is verified since =
=
=
=
·
=
hal (8Ac)
=
hal (aAc) U { � E 8A : J (�, Ao) � c}
and
hal( a 0Ac) = hal (a0Ac) u { � E aA : J (�, 0Ao) � °C} . Indeed, on one hand, the continuity of J (�, ) implies ·
and on the other hand, this continuity also implies hal( 8Ac)
{ � E A : J (�, >.. ( 8o)) ::::: c} { � E A : J(�, >.. ( 08o)) ::::: 0c}
=
hal (8 °Ac)
·
Then we can conclude as before, using proposition 10.3.1 , proposition 10.3.3 and lemma 10.3. 1. (iii) In the same way, Af is nregular in any point of aAf = aAc, but Af is not always limited. Meanwhile, aAc is limited and so the proof (in proposition 10.3.3) of � ln PlJ; (X E Af ) ?  J (8Ac, 6 ) + 0 remains valid. Suppose now that 6 E Ac. Ac is included in a standard hypercube Up : = {x E �k : Xi > p} [  p , p] k (where p is a standard integer) . If we set Sp,i and s�, 2 {X E �k : Xi <  p } then u:; u7= 1 (Sp,i u s�, i ) . I t i s clear that :=
=
:=
On one hand, as Up \ A c is limited and nregular in any boundary point, we can write
10. Nonstandard likelihood ratio test in exponential families
158
Using lemma 10.3.1 (iii) with E = §uP and F = A( , we have J(aUPu aAc, 6 ) = � � 6). J(8Ac, 6) and so n1 ln Pe 1 ( X E Up \ Ac) ::::: J(8Ac, O n the other hand, n
P8; ( X E Upc ) �
k
2._) P8; ( X E Sp,i) + P8; ( X E S�,i )) . i =1
We know that � ln P8; (X E Sp ,i) ::::: J(Sp,2 , 6 ) and � ln P8; (X E 3� ,2) :::::  J(S�,i ' 6) (it is a classical result concerning halfspaces: cf. chap. 7 in [10]); so 1
1  E U C ) � max max(  J( SP 6 ) , J( S , 6)) + 0  ln P8; (X ·"' P p,; n 2=l... k which is less than  J(aAc, 6 ) according to lemma 10.3. 1(i). Then � ln P8; (X E � J(8Ac, 6 ) + 0. 1 C � · Fmally, r;, ln Pe 1 ( X E Ac ) ::::: J(8A c, 6 ). (ii) If 6 E hal(aAc) , then exists �o E §Ac such that J (�o , 6 ) ::::: J(A, 6 ) ::::: 0 and it remains possible to use proposition 10.3. 1 , proposition 10.3.3 and lemma 10.3.1 as for the proof of (i). If 6 tf. hal(aAc) , then 6 E 0Ac \ hal(aAc) according to the proof of (i) and so there is a standard number p such that the ball B (�, p) is included in 0Ac \ hal(§Ac) and consequently included in Ac. Then the nonstandard law of large numbers yields P8; ( X E Ac) = 1 + 0 and finally ln P8; ( X E Ac) ::::: 0. (iv) is obtained in a similar way. D A
A cc )
n
Definition 10.3.3 Let e E 8'. Let he be the function defined on [0, co [ by h e( c ) := J(A(, >.(B)) and ge the function defined on [0, co [ by ge ( c) : = J(Ac, >.(B) ) . ___,
As c Ac is increasing, ge is a nonincreasing function and he is a non decreasing function. Let r(B) : = I(e, 8 0 ); >. ( e ) E Ac iff c ? r(B) and so ge ( c ) = 0 if and only if c E [r(B) , co [ and h e( c) = 0 if and only if c E ]o, ,(B)].
Proposition
10.3.5 ___,
___,
h e (c) are Scontinuous on (i) The functions (e, c) ge (c) and ( e, c) {e E 8 \ hal(88) : e limited } x ( [O , co [ n £). (ii)
The function ge is decreasing on [0 , /( B)] and he is increasing on [r(B) , co] .
10.4. The nonstandard likelihood ratio test
159
Proof. Let e be standard and e ' ::: e , then .>.. ( e ) ::: .A ( e ' ) ; let c and c' be such that c' ::: c. Then °Ac' = 0Ac and Similarly, J(Ac' ' .>.. ( e' )) ::: J(0Ac' ' .>.. ( e)). The monotonicity of ge is deduced from lemma 10.3.1 (iii) which shows that if c' < c and .A ( e ) tf. Ac' then J(Ac' , .>.. ( e)) > J(Ac, .>.. ( e)). D The continuity and monotonicity of he are proved in the same way. 10.4
The nonstandard likelihood ratio test
Recall that F := { 8o internal : So c 8o c hal(So) } , that the NSLRT is defined by
sup Pl) ( X E Ar (8o)) eEe o
:S:
a :S: sup Pl) ( X E Ar (8o)). eEeo
Consequently, Ar is the rejection set (relatively to X ) , A� the acceptation set and the test e 0 is randomized if X E aAc . As n is illimited, if d < co (8o) and if d is limited then, for e E 8o, proposition 10.3.4 yields
As supeE e o J(aAd, .>.. ( e ))
=
d by definition of Ad, we have d(a, 8o) ::: ln a .  
n
This is the nonstandard form of a classical result of Bahadur ( [1] , [2] ) . We shall study the NSLRT according as these d( a, 8o) are infinitesimal or appre ciable. In this latter case, we shall have to distinguish the cases d( a, 8o) < co (8o) and d(a, 8o) ? co (8o). In the following, Zo stands for co (So), Ao for ).. (So) and Ro J (X, Ao) for R e 0 ; Ac will denote Ac ( Ao); the notations he (c) and ge (c) will also refer =
10. Nonstandard likelihood ratio test in exponential families
160
to these Ac implies
:=
K
Ac (Ao). Note that if
8o E
F, then the Scontinuity of A on
�
Re0 Ro. The following lemmas will be used for the calculation of the risks of the first and the second kind. They are valid not only for exponential families, but also for any statistical family. Here 8" is a standard subset of 8. Lemma 10.4.1 Let T be a statistic, S a (perhaps external) subset of 811 and F : D ____, � + , ( e , c) ____, Fe (c) a standard continuous function defined on a domain D = {(B, c) E 8" x �+ : de � c � d� } such that e • •
the functions e + de and e + d� aTe continuous, for each e E S, Fe is decreasing on [ de , d�] , for each e in s and c nearly standard in l de ' d� [ ' 1 n ln Pe(T � c) � Fe ( c) .
Then, for each e nearly standard in S and Co standard in [de, d� [ Proof. PRIJ (T � Co) = supw =@ Pe (T > Co + w). Now write 1  ln Pe (T � Co + w) � Fe (Co + w) � o Fe (Co + w). n
So, as oFe = Fe0 ( where Bo we have
:=
oe) is a standard continuous decreasing function,
� ln PR(j (T� C0 ) = J oo, ��� Fe (Co + w) [ = ] oo, Fe (Co) + 0[ .
Similarly, 1 n ln PR e (T � C0 ) = ] oo, ° Fe (Co) + 0[ = ] oo, Fe (Co) + 0 [ , n
for if not, there exists T) < 0,
T)
�
0 such that
1
Vr:: E hal + ( o ) , n ln Pe(T � Co + r:: ) > o Fe (Co) + TJ
where hal + ( o )
:=
{c: E hal ( O ) : � 0} . r::
10.4. The nonstandard likelihood ratio test
161
Then { E � + : � ln PlJ (T ? Co + ) > ° Fe (Co) + 7]} is an internal set in cluding hal + ( o ) and consequently ( by Cauchy ' s principle ) including an interval [0, wo] where wo is standard. But c:
c
� ln Pe (T ? Co + wo) :: Fe (Co + wo) :: ° Fe (Co + wo) n with °Fe(Co + wo) < °Fe (Co) which is a contradiction, because both these numbers are standard. Finally D Lemma 10.4.2 Let T be a statistic, S a (perhaps external) subset of 811 and G : E + � + , ( e , c) + Ge (c) a standard continuous function defined on a domain E = { (B , c) E 8" x � + : e e ::; c ::; e� } such that •
the functions e
____,
ee
and e
____,
e� are continuous,
•
for each e E S, Ge is increasing on [ee , e�] ,
•
for each e in s and c nearly standard in l ee ' e� [ ' 1  ln PlJ (T ::; c) :: Ge (c) . n
Then, for each e nearly standard in S and Co standard in [e e , e � [ PRIJ (T � Co) = L: e nGe ( Co ) + n0 . Proof. PR� (T � Co)
=
inf t =©: PlJ (T ::; Co + t). Now write
� ln PlJ (T ::; Co + t) :: Ge (Co + t) :: 0Ge(Co + t). n So, as 0Ge is a standard continuous increasing function, we have
1 n ln n PRe (T � Co) =
]
 oo ,
]
. Ge (Co + t) = ]  oo , Ge (Co) + ] . 0 tmf =@
As in lemma 10.4.1 , one establishes that PR� (T � Co) finally PRIJ (T � Co) = L:enGe ( Co ) + n0 .
=
PRIJ (T � Co) and D
10. Nonstandard likelihood ratio test in exponential families
162
10.4.1
In a
n
Proposition (i) if Ro
�
infinitesimal
10.4.1
For l�a
0 then
=
�
0,
0 otherwise
=
1;
(ii) for e E hal(Go ) , the risk of the first kind is exactly equal to £e n @ nhe ( 0l ; (iii) for a limited e E 8 \ (hal( 88) u hal( Go))! the risk of the second kind is exactly equal to £ e  nge ( O ) + n° . Proof. (i) If Ro 7: 0 then \i8o E F, Re0 7: 0 and so \i8o E F, e0 = 1 because d( a , 8o)  1�a 0 and finally
�
�
=
=
=
=
:=
�
____,
:
p Rlf ( Ro � 0 )
=
£e n@ nhe ( O ) .
(iii) According to proposition 10.3.4 (i) and (ii) , for a limited c :::; co , we have � ln Pl) (Ro :S: c) ge (c) . For any limited e E 8 \ (hal(88) U hal(Go)) , there exists a standard compact set 811 C 8 \ hal(88). such that e E 8". According to proposition 10.3.5, (e, c) ge (c) is continuous on 8" [O, co (8")] and for e E 8",  ge is increasing on [o, , (e)] . We can now apply lemma 10.4.2 with D { (e, c) E 8" x � + : 0 :::; c :::; I ( e) } , S = 8" \ hal( Go) , T = Ro , Ge = ge , Co = 0, t o obtain P RIJ (Ro � 0) �
____,
X
=
£enge ( O ) + n0 .
0
Remark 10.4. 1 Let e E hal( Go); the nonstandard version of the law of large numbers yields PRIJ (X E hal(>.(e)))
=
1 + 0 ==? PRIJ (X E hal(Ao))
=
1 + 0.
As the reject criterion is X tf. hal ( Ao), it is not logical to take a appreciable, but infinitesimal.
10.4. The nonstandard likelihood ratio test 10.4.2 0 �
1
n
163
l in a l � co
If co oo , we shall suppose that I !:a l is limited because some preliminary results (in § 10.3) are no longer available (for example Ac is no longer limited if c 00 ) . =
:::
Proposition 10.4.2 For o ; l l:a l ; zo and l l:a l limited, p1Lt C 0 (  1�a ) . (i) If Ro � C then
=
=
Proof. (i) If Ro � C then V8o E F, R e0 �  1�a , so Re0 > d(a, 8o) and e 0 = 1 . Finally,
�
As J ( 0 X, Ao) and J ( 0 X, Ao) are both standard numbers, this contradicts R e � C and Ro :;, C. o (ii) The proof is similar to the proof of proposition 10.4. 1 (ii) , with C0 = C instead of Co = 0. We note that the risk of the first kind calculated in §10.4.1 is a particular case of this result. (iii) For the first result, the proof is similar to the proof of proposition 10.4.1 (iii) , with Co = C instead of Co = 0 and S = { e E 8" \ hal( So) : I ( e) � C} instead of S = 811 \ hal(So ) . For the second, we choose a standard compact set 8" c 8 \ hal( 8 8) . such that e E 8". In fact, the proof of (ii) is valid with S { e E 8" : I ( e) :;, C} and one obtains =
p R{f ( Ro
� C) = Len@nhe ( C)
D
10. Nonstandard likelihood ratio test in exponential families
164
10.4.3
1  l ln a l � co n
The number co is standard because 8o is standard and co /= oo ; A is a limited set for if not J(�, Ao) is unlimited because � is unlimited (see the proof of lemma 10.3.2) and then co (So ) � sup�E A J(�. Ao) is unlimited. We shall suppose here that for each subset 8o of e, the function e J(e, 8o) is continuous on 8. This assumption is verified by classical exponential families. Then for any 8o , co ( 8 o) = sup�E A J(�, A( 8 o)) . For any 8 o C K, if co ( 8 o) � � lln a l then d(a, 8 o) = co( 8 o) and so
Proposition
10.4.3
For � lln al ?; co,
(i)
Proof. As
e0 < 1. If not, \i8 o E F, max
Remark 10.4.2 Let e be a limited element of e \ (hal(88) u (So)); the non standard version of the law of large numbers yields P RIJ ( X E hal( A( B))) = 1 +0 and thus PRIJ ( X E A0 ) = 1 + 0. So, for this e, the risk of the second kind is equal to 1 + 0: if � lln al � co, the NSLRT is not consistent (in the classical
10.5. Comparison with nonstandard tests based on X
165
theory, a sequence of tests ( (f?n)n ::>no {where n is the size of the sample} is con sistent if for any e E 81 , lim17_,= Ee ( l  (f?17 ) = 0: then, for n "" oo, the risk of the second kind is infinitesimal). As the example of multinomial distributions shows (cf. {9}), it seems impossible to give general results about the value of
Remark 10.4.3
=
10. 5
Comparison with nonstandard tests based on X
In the classical theory, for testing e E 8o against e E 81 , one fixes the level a and one tries to find a test that has maximum power in any e E 8 1 . However, such uniformly most powerful (U.M.P.) tests exist only in a few exceptional cases: for example for !dimensional exponential families, if 8o : = { e E 8 : e :S: Bo } . If no U .M.P. exists, one studies an asymptotic approach, for sample sizes tending to infinity. One of the tools to compare asymptotically the power of two tests of the same hypothesis is the Bahadur relative efficiency. In a more general framework (not only for exponential families) Bahadur ([1] or [2] , see also [12]) has proved that the likelihood ratio test is at last as "efficient" as any test based on a rejection criterion of the form T > c (with randomization if T = c if necessary) , where T is a statistic built with the nsampling. However, it may be that to get the same power in a point e E 8 \ 8o the size of the sampling for the likelihood ratio test has to be larger than for the test based on T: thus was introduced the notion of "Bahadur deficiency" (cf. [13]). We shall prove that the NSLRT is uniformly the most powerful in a large family of nonstandard tests if a � 1 and I�a � co . The standard likelihood ratio test is not as powerful in the corresponding standard family. Let ((f?e0 ) be a family of tests of level a (where (f?e 0 tests He0 : e E 8o against e tt 8o) . The corresponding levela nonstandard test (f?O is defined by (f?O
=
: inf { (f?e 0
: 8o E F }
and when we shall refer to a "nonstandard test", it will be defined thusly. 10.5.1 R
Regular nonstandard tests
Suppose that for each 8o , R (a, 8o) such that
:=
y? e 0
is a test of size a defined by a rejection set
sup Pl) ( X E R) :::_: a :::_: sup Pl)( X E R u aR) .
liE8o
liE8o
10. Nonstandard likelihood ratio test in exponential families
166
If R is nregular in any point of aR and if aR is limited , rpe0 will be called "regular". The most famous example of such a test is the chisquared test since the relative frequency F is in fact the sample mean for the multivariate distribution. In order to get R /= 0 we shall suppose that � lln a l � co and so � lln a l � co(8o) for 8o E F. If 'P8 o is regular for any 8o E F, 'PO will be said to be "regular". We prove now that
Proposition 10.5.1 For a � 1 and � lln a l � co , if 'P O is a regular levela non standard test, then

=:
:=
n
=
,
•
A
�

,
•
�
10.5. Comparison with nonstandard tests based on X
167
But J( 0 �o , >. ( 0S o)) and J( 0 �o , >.(So)) /= 0 are both standard numbers; as >. ( So) c ( >. ( 0S o ) ) 0 we have J( 0.;o , >.( 0S o)) < J ( 0 �o , >.(So)) and so J (�o , >. ( Go)) ;:, c. 1 c: this contradicts On the other hand J(R(a, G 0 ) , >. (Go)) : ::o
�o E R(a, Go) .
10.5.2
Case when
80
a

::o
D
is convex
For any 8o E F and any e 1 E 8 \ hal ( 8o) , there is a levela test \[!de o which is the most powerful at e 1 (for any levela test cp testing e E 8o against e E 8 1 , Ee 1 cp ::; Ee 1 W�0 ) . This test is defined in the following way (see [13, p. 49]) let �
is the least favorable distribution. Then \[!�0 = 1 if f(X) < C and W�0 = 0 if f (X) > C where C is a constant depending on a, 8o , e1 . So, the rejection set (relatively to X) is { � E A : f ( �) < C}.
where
T
Definition
10.5.1
For e 1
E8
\ hal( So ) , let W� 1
:=
inf { W�0
: 8o E
F} .
We might think that W� 1 is "the most powerful at e1 nonstandard levela test of Ho"; but in fact, in some cases,
Proposition
10.5.2 Jf Go is
convex, then for any e1
Proof. Let e1
E 8 \ hal( So) .
We prove that the rejection sets
A := { � E A :
!(�)
E 8 \hal (So) ,
W�0 .
< C}
are nregular and then the proposition 10.5. 1 yields the result. The set So is convex and d ( e1 , Go) rj: 0 and so exists a standard cone � C �k and a standard positive number > 0 such that if u E � and ll u l l = 1 then ( eo e1 I u) < for any eo E So . Therefore, for any 8o E F, for any e E 8o and for any u E � \ {0} , (eo  e1 I u) < 0. Choose 8o E F. If u E � \ { 0} , f(x + u)
c
c

le
:= .
o
ex
p ( n ( eo e l I u)) ·
. exp

(
n
( ( eo

e1 I x ) ?jJ ( eo) + 1jJ (e1 )) 
)
T
( deo)
< f(x).
168
10. Nonstandard likelihood ratio test in exponential families =
Let now a E 8A, then f(a) C. For any u E � \ {0}, f(a + u) < C, so a + (� \ {0}) C A . It is now possible to choose a point b E a + � such that d(a, b) = � ( for example) and such that B(b, nf12 ) C a + �  If a t/. hal ( 8A ) , this proves that A is nregular in a. If E hal ( 8A) , we have to use Cauchy's principle like at the end of the proof of proposition 10.3. 1 D a,
References
[1] R. R. BAHADUR, An optimal property of likelihood ratio statistic, in Pmc. Fifth Berkeley Symp. Math. Statist. Prnb. , Vol. 1 , 1965. [2] R. R. BAHADUR Some limit theorems in statistics, SIAM, Philadelphia, 1971. [3] R.R. BAHADUR and S . L . ZABELL, "Large deviations of the sample mean in general vector spaces", Annals of Probability, 31 (1979) 58762 1. [4] I . VAN DEN BERG, An external probability order theorem with applica tions, in F. and M. Diener editors, Non Standard Analysis in Practice, SpringerVerlag, Universitext, 1995. [5] J . B oSGIRAUD, "Exemple de test statistique non standard, risque ex terne", Pub. Inst. Stat. Paris, 41 (1997) 8595. [6] .J . BoSGIRAUD, "Tests statistiques non standard sur des proportions", Pub. Inst. Stat. Paris, 44 (2000) 312. [7] .J . BoSGIRAUD , "Tests statistiques non standard pour les modeles a rap port de vraisemblance monotone ( premiere partie ) ", Pub. Inst. Stat . Paris, 44 (2000) 87101 [8] J . BoSGIRAUD, "Tests statistiques non standard pour les modeles a rap port de vraisemblance monotone ( seconde partie ) ", Pub. Inst. Stat. Paris, 45 (200 1) 6178. [9] .J . BoSGIRAUD , "Nonstandard chisquared test", Journal of Information & Optimization Sciences, 26 (2005) 443470 . [10] L . D . B ROWN, Fundamentals of Statistical Exponential Families, Institute of Mathematical Statistics, Hayward, California, 1986. [11] F. DIENER and G. REEB, Analyse non standard, Hermann, Paris, 1989. [12] P. G ROENEBOOM and .J . OoSTERHOFF, "Bahadur efficiency and proba bilities of large deviations", Statistica N eerlandica, 3 1 ( 1977) 124.
References
169
[13] W . C . M . KALLENBERG , Asymptotic Optimality of likehood ratio tests in exponential families, Mathematisch Centrum, Amsterdam, 1978. [14]
F. KouDJETI and I. VAN DEN B ERG, N eutrices, external numbers and external calculus, in F. et M. Diener editors: Non Standard Analysis in Practice, SpringerVerlag, Universitext, 1995.
[15]
Radically Elementary Probability Theory, Princeton Univer sity Press, 1987. E . NELSON ,
[16] S . R . S . VARADHAN , Large Deviations and Applications, SIAM, Philadel phia, 1984.
A finitary approach for the representation of the infinitesimal generator of a markovian sem1group Scherazade Benhabib *
Abstract Ill, where the central question was: under suitab le regularity conditions, what is t he form of the infinitesimal generator of a Markov semigroup?
This work is based on Nelson's paper
the elementary approach using 1ST [2] , the idea is to replace the ![{ with a finite state space X possibly containing an unlimited number of points. The topology on X arises
In
continuous state space, such as naturally
from
the
probability theory.
For :1: E X, let. 'I.r.
be
the set
of all
h E M vanishing at x where M is the multiplier algebra of the doma.iu 'D of the iufiui tcsimal generator. To desaihe the st.rneLm:e of the scmigroup
l::y EX\{x} a(a; , y) h(y) so that of the points far from x appears separately. A defini tion of the quantity O:ah ( x ) = l:yEF a.(;t, y) h(y) is given using the least upper bo und of the stun..<; on all intemal sets TV included in the external set F. Tl:ris leads to the characterization of the
generator A, we want
to
split Ah(a;)
the contribution of the external set Fx
global part of
11.1
=
the infiuitesimal generator.
Introduction
Let X be a finite state space possibly containing an uulimit.ed unrnbcr N of points. For all x in X, let a(x, y ) be a real number for each y on X such that a(x, y) is posi tive if y =/= x and L y a(x, y) = 0. Then A = (a(x, y)) is a finite N x N matrix, with positive off diagonal elements and row sums 0. L n t'��" exists and is a markovian semigroup with infinitesimal Thus p t generator A. =
•EIGSI, 17041 La Rochelle, France. scherazade .benhabib@eigs i . fr
11.1. Introduction
171
Then we can define externally the domain of definition of A and its multi plier algebra as follows. Definition 1 1 . 1 . 1
A is the external set
The domain D of definition of the infinitesimal generator
D = { f E � X / f limited and Af limited}
(11.1.1)
And its multiplier algebra is the external set M = { f E D/ Vg E D fg E D}
(11. 1.2)
The topology on X arises naturally from the probability theory and a proximity relation is defined on X, more particulary from the way the functions in the multiplier algebra M of the domain D of the infinitesimal generator act on X. Definition 1 1 . 1 .2
and only if
Let u and v be in X, we say that u is close to v (u Vh E M h(u)
�
�
v) if
h(v)
Thus, the elements of M have a macroscopic property of continuity for mally analogous to the Scontinuity. With this relation, we define the external sets :
�
Cx
=
X}
(11. 1.3)
Fx
= {y E X y rf. X}
( 11.1.4)
{y E X y
of points close to x and :
of those far from x. For each x in X, (a(x, y)) are positive real numbers if x /= y and can take unlimited values. Let Ix be the set of all h in M vanishing at x, and for h in Ix let us define ( 11.1.5 ) Ah(x) = L a(x, y) h (y) yE X \{x}
To describe the structure of the semigroup generator, we want to split Ah(x) so that the contribution of the points far from x in X. appears separately. One is tempted to define directly the quantity
aa h (x) = L a(x, y)h (y) yEFx
( 11.1.6 )
11. The infinitesimal generator of a markovian semigroup
172
which alas has no meaning because of the external character of the set of indices F:r Nevertheless, we will show in Section 1 1.2 how we can attach a definite meaning to it. That leads in Section 1 1 .3 to a definition for an axintegrable function. Theorem 11.3.1 establishes the representation of Ah(x) for functions in M which have a zero at x of the third order. This is the global or integral part of A. Theorem 11.3.2 and its corollary extend the notion of axintegrable to functions which have a zero of the first or the second order. 11.2
Construction of the least upper bound of sums in 1ST
Let E be an external subset of X and f a positive function defined on X. If there exists a standard natural number no such that for all internal sets W contained in E we have
Lemma 1 1 . 2 . 1
L f(y) :S: no
(11.2.1)
yEW
then there exists a unique up to an infinitesimal least number a( ! ) such that
L f(y) � a( ! )
(11.2.2)
yEW
for all internal sets W contained in E. Proof. Let no be a standard natural number such that for all internal sets W contained in E we have ( 11.2.1). By external induction, there is a least no such that (1 1.2.1) holds. Now for each standard natural k there is a largest natural number j with 0 :S: j :S: 2 k such that for all such W (11.2.3) L f(y) :S: no  k yEW
1
Let a(k) no  ik . Then for each standard k we have a standard a ( k ) , so there is a standard sequence k + a( k) . This sequence is decreasing, therefore it has a standard limit a (f) . Moreover, for all standard k and all such W , D ( a (f)  Ly Ew f(y)) :S: 21k · This proof uses very elementary tools. It can easily be formulated in the minimal nonstandard analysis introduced by Nelson in [3] in which "standard" =
1 1.2. Construction of the least upper bound of sums in 1ST
1 73
a(O)
Step 0:
no  1 Step 1:
no  1
/,� i)\ : : J)\ n
(2
Step 2:
no  1
no a(1) no a(2) no
Figure 11.1 applies only to natural numbers. On the contrary, if all the force of 1ST is used, one could check that s a (f) sup x E �+ ; 3 intw C E L f(y) :S x
=
{
yEW
}·
Figure 11.1 exhibits the way the sequence is built. Definition 1 1 . 2 . 1 1.
If f is a positive function defined on X, and if the conditions required for Lemma 1 1 . 2. 1 are verified the quantity (11.2.4) af = L f(y) yEE
is defined to be the standard limit a (f) . 2. If the conditions for Lemma 1 1 . 2. 1 are not verified, let af be the formal symbol x and say that (1 1 . 2. 2) is not defined. 3. In the general case, let f f +  f  be the decomposition of f into the difference of two positive functions. Say that af is defined in case both aJ + and a1 are defined, and let a f be their difference:
=
Notation 1 1 . 2 . 1
of af .
For each x in X, if f depends on x, write aJ (x) instead
11. The infinitesimal generator of a markovian semigroup
174
11.3
The global part of the infinitesimal generator
For x E X and h E M , let f = a(x, ·)h, then h is said to be axintegrable if and only if aJ (x) is defined.
Definition 1 1 . 3 . 1
Notation 1 1 . 3 . 1
In that case write aa h (x) instead of aJ (x) .
Definition 1 1 .3.2 1.
h is a function in Ix if and only if h E M and h(x)
2. h is a function in I� if and only if h E M and h limited and eb gk ·in Ix .
= 0.
=
I:f=l e k gk with p
3. h is a function in I� if and only if h E M and is of the form h Ll e k fk gk with e k , fk , gk E Ix and p limited. 4. We say that h has a zero at x of the first order when h E Ix, of the second order when h E I�, of the third order when h E I� . If h has a zero at x of the third order, then h is axintegrable and Ah ( x) is infinitely close to aah ( x) .
Theorem 1 1 . 3 . 1
Proof.
The proof goes through the following three steps:
1 . Let h be in I� , therefore it is of the form h and fk , gk in Ix .
=
I:i f� gk with q limited
,
2. I f f g E I.:r then P g is axintegrable and so is h. These steps are proved as follows:
1. Let h E I�, as 1 1 1 efg = 4 (e + f) 2 g  2 e 2 g  2 f 2g, therefore h is of the mentioned form.
I L a(x, y)f2 (y)g+ (y) I w
:::;
l l g i i A(f2 ) (x)
1 1.3. The global part of the infinitesimal generator
175
As l l g l l and A(f 2 ) are limited, they are both infinitely close to a standard number. Taking the integer part of o (l lgi iA(f2 ) (x)) plus 1, we get the existence of the natural number no required by Lemma 11.2.1. The same argument applies to f 2g  . Thus aa( J2 g + ) (x ) and aa(j 2 g) (x ) exist and as O!a( J 2 g) (x ) = aa( J 2 g + ) (x )  aa(j 2g) (x ) , then f2 g is ax integrable, and so is h. 3. Let (3 » 0. Since g E Ix there is an internal set W of Fx such that g + (y) :::; (3 for y E W 0 . We have
I A(f2 g + ) (x)  aa( J 2g+) I
=
IL
yE X \{x}
a (x , y) f2 (y)g + (y)  aa( J 2g + )
I
L a (x , y)f2 (y)g+ (y)  L a (x , y)f2 (y)g+ (y) L L a(x, y)f2 (y) :::; (3 L a (x , y) f 2 (y)
<
yEW
yE X \{x}
yEWr"\{x}
:::; {3
=
yEW0\{x} yE X \{x}
(3A(f2 ) (x)
Since A(f2 ) is limited and (3 » 0 is arbitrary, A(f 2 g + ) aa( J 2 g + ) (x ) . The same applies to f 2 g  . Then A(f 2 g  ) (x) ::: aa( J2 g) (x) and aa( J2 g) = aa( J2 g + ) (x) + aa( J2 g) (x) . D Thus A(f2 g) (x) ::: aa( J 2g) (x ) . Therefore A(h) (x) ::: a(ah) (x ) . :::
If h zs positive and has a zero at x of the first order, then h is axintegrable and ah is less or infinitely close to Ah(x) . Theorem 1 1 .3.2 Proof.
Let W be any internal subset of Fx and h a positive function in I� , so
0 :::;
L a (x, y)h (y) :::; A(h (x w
)
)
As A (h) (x) is limited, we argue as in Theorem 11.3.1 and apply Lemma 1 1.2.1. One concludes that h is axintegrable. As the inequality above holds for all internal subset W of Fx , we get that aah (x ) is less or infinitely close to A(h) (x) . D
176
11. The infinitesimal generator of a markovian semigroup
Corollary 1 1 . 3 . 1
If h has a zero at
integrable.
x
of the second order, then h is ax
Let h be a function in I� , as fg = 41 (e + g) 2  41 (e  g) 2 h is also of the form h = Lk = l Pk ff with Pk = ±1, q limited and f'f in I"} . Consequently we can apply Theorem 1 1 .3.2 and h is axintegrable. D
Proof.
1 1 .4
Remarks
The next goal will be to find an 1ST construction of the pure diffusion part of the generator of the semigroup and to complete a work already in progress concerning the rescaling of the time parameter for the markovian semigroup pt .
I would like to thank Professor E. Nelson for the initial suggestion of this problem and numerous enlightening discussions. I would also like to thank Professor G. Wallet of the University of la Rochelle for his support.
Acknowledgment
References
[1]
E. NELSON, " Representation of a Markovian Semigroup and Its Infinitesi mal Generator", Journal of Mathematics and Mechanics, 7 ( 1958) 997988.
[2]
E. NELSON, "Internal Set Theory: a new approach to nonstandard analy sis", Bulletin Amer. Math. Soc., 83 ( 1977) 11651 198.
[3]
E. NELSON,
[4]
Press, 1987 E.
B.
Radically Elementary Probability Theory, Princeton University
DYNKIN,
Markov Processes, SpringerVerlag, 2 vols .. 1965.
[5] W. FELLER, An Introduction to Pmbability Theory and Its Applications, John Wiley & Sons. [6] I . VAN DEN BERG, Non Standard Asymptotic Analysis, Lectures Notes in Mathematics, 1249 , SpringerVerlag, 1987.
On two recent applications of nonstandard analysis to the theory of financial markets * Frederik S. Herzberg
Abstract Suitable notions of "unfairness" that measure how far an empirical dis counted asset price process is fi·om being a martingale arc introduced for complete and incompletemarket settings. Several limit processes are i nvolved each tirue, prompting a nonfltandard approach to the this concept. This leads to
( rather
traded
than a
011
martingale
several stock
a
analysi
s of
existence result for a "fairest price measure"
measure ) for an
asset
exchanges. This approach
describing the impact of
12 . 1
an
that is simultaneously also proves useful when
currency tr ansaction tax.
Introduction
Nonstandard analysis is most successful when it concerns itself with
ematical concepts that. i nvolve several limit
math p ro ecss cs in a nontrivial way. One
such exampl e is the quantification of a stochastic process's distance from being a martingale.
between the existence of a martingale mea..sure for a discounted asset price process and the nonexistence of atbitrage opportunities, there is a natural economic interpretation to the compru·ison of two positive stochastic processes in regard to "how fru·" they actually ru·e from being a martingale. Vole will l'encler two economic questions related to such conceived "unfairness" mathematical : (1) Is there a "fairest price measure" (rather than a m�u·tingale measure ) for an asset that is sinmltaneously traded at mul t iple stock exchanges? (2) Does a currency transaction tax imposed on a currency that is sub j ect to herd behaviour among traders have a fairnessGiven the wellknown correspondence
· Mathematical Institute, University of Oxford, Oxford OXl :3LB, United Kingdom. herzberg@maths . ox . ac . uk / herzberg@wiener . iam . uni bonn . de
12.
178
Applications to the theory of financial markets
enhancing impact on the currency exchange rates preceding a financial crash? Both questions will be shown to allow for an affirmative answer. Techniques from nonstandard analysis have already been successfully ap plied to mathematical finance in the work of Cutland, Kopp and Willinger [5] . This contribution is based on two more recent papers by the author [8, 9] . 12.2
A fair price for a multiply traded asset
In this Section, we will en pass ant motivate notions of "unfairness" for both completemarket as well as incompletemarket settings. Theorem 12.2.1 Let n E N, p > 0 and c, N > 0 . Consider an adapted probability space (r, Q, IP') and B[O, 1 ] ® Y1measurable geometric Brownian mo tions with constant multiplicative drift g� : r x [0, 1 ] �d·n , i E { 1 , . . . , n}. Then there exist processes gi , i E { 1, . . . , n} on an adapted probability space (D, F, CQ) , equ.ivalent to the processes g� on r (in the sense of adapted equiv alence [7}), such that there is a probability measure Mm on n in the class of measure s Q probability measure, VA E F1 iJ · CQ(A) :S: Q(A) :S: N CQ(A) , > vz .../.r. J. E { 1 , . . . , n } 0 I( ) I s c minimising ____,
1 1 CovQ((g,)s, (gJ)Jd .t.Q 9t9J s
}
·
w·
_
�
in the class of Loeb extensions of finitely additive measures from (*C) (D, G) (G being an arbitrary lifting of g). Analogously, there is a probability measure Mn minimising
(where nn ( Q, g) is defined to be +oo if the derivative in 0 in this definition does not a.s. exist as a continuous function in t) in the class of Loeb extensions of finitely additive measures from (*C) (D, G) . Moreover, inf, mn (·, g) :S: inf mr ( , g) C(D g )
as well as
C(r,g)
·
inf nr (·, g ) . inf nn (, g) :S: C(r,g)
C(D,g)
12.2. A fair price for a multiply traded asset
1 79
The original paper [8} does not state the Theorem and the subs equent Lemmas as precisely as it is done here, although the exact result becomes clear from the proofs in that paper.
Remark 12.2. 1
The proof for this Theorem can be split into the following Lemmas which might also be interesting in their own right.
Using the notation of the previous Theorem hyperfinite adapted space n [12},
Lemma 12.2.1
12. 2. 1,
for any
inf mn (· , g) :::; inf mr (·. g) . C (r,g)
C(D.g)
Under the asswnptions of Theorem 12.2. 1 and choosing n to be any hyperfinite adapted space, the infimum of mn ( , g) in the class of Loeb extensions of finitely additive measures from (*C) (D, G) is attained by some measure Mm E C(D. g) . Proofs for both of these Lemmas can be found in a recent paper by the author [8] . Although they look very similar, it is technically slightly more demanding to prove the following two Lemmas ( which in turn obviously entail the second half of the Theorem, i.e. the assertions concerned with the map n) . Lemma 12.2.2
Using the notation of the previous Theorem hyperfinite adapted space n [12},
Lemma 12.2.3
12. 2. 1 ,
for any
nr (· , g ) . inf nn (· , g) :::; C inf (r,g)
C(D.g)
Under the assumptions of Theorem 12.2. 1 and choosing n to be any hyperfinite adapted space, the infimum of nn (·, g) on C(D, g) is attained by some measure Mn E C(D, g) .
Lemma 12.2.4
Easy results are Lemma 12.2.5 A semimartingale X is a ?martingale on n if and only if 1 mn (P, x ) = 0 . The function mn (P, · ) v on the space of measurable processes of n satisfies the triangle inequality and is 1homogeneous. For p = 2, it defines
an inner product on the space [.
:=
{x : n
x
[0, 1]
+
�d
:
x
measurable, rn(P, x )
which becomes a HilbeTt space by this construction.
<
+oo }
12. Applications to the theory of financial markets
180
A semimartingale X is a ?martingale on n if and only if The function nn (P, ·) on the space of measurable processes of n remains unchanged when multiplying the argument by constants (scaling invariance) .
Lemma 12.2.6 nn (P, x) = 0.
We can generalise this to the following Definition 12.2.1
Let D be an adapted probability space. Define
£ (Sl, �d ) :
:=
{ X : Sl
X
[0, 1] + �d :
X measurable } .
A function Y £ ( n, �d ) + [0, +oo] is an incompletemarket notion of un fairness if and only if it satisfies the triangle inequality, is 1homogeneous, and assigns 0 to a semimartingale y if and only if y is a martingale. Y is said to be a completemarket notion of unfairness if and only if it remains unchanged under multiplication by constants and Y vanishes exactly for those semimartin gales that are in fact martingales. 1
By Lemmas 12.2.5 and 12.2.6 respectively, the function mn (Q,  ) P (for a fixed probability measure Q) is an example for an incompletemarket notion of unfairness, whereas similarly nn (Q:  ) , again for a fixed probability measure Q, is an example for a completemarket notion of unfairness.
The distinction between notions of unfairness for complete and incomplete markets can be justified by the following reasoning: If it is, under assumption of completeness, possible to buy as much of an asset as one intends to, multiplication of the discounted price process by a constant does not enhance the arising arbitrage opportunities at all; therefore, for complete markets, a suitable notion of unfairness should be scaling invariant in that it does not change under multiplication of the argument  which is conceived as being a discounted price process  by constants.
Remark 12.2.2
12.3
Fairnessenhancing effects o f a currency transaction tax
In this Section, we shall analyse the empirical price of an asset that is subject to herdbehaviour preceding a financial crash and how the resulting distortion can be mitigated through imposing a transaction tax. This is of particular relevance if the crashing asset in question is a currency, given the massive (usually destabilising and therefore adverse) macroeconomic conse quences of such financial crises.
12.4. How to minimize "unfairness"
181
We shall restrict our attention to a model of currency prices where extrap olation from the observed behaviour of other traders, up to white noise and constant inflation, completely accounts for the evolution of the currency price. This is to say that, in analogy to a Nash equilibrium, we assume that every agent is acting in such a manner that he gains most if all other agents follow his pattern. In our model this pattern will consist in using some average value of past currency prices as a "sunspot", that is a proxy for a general perception that depreciation or appreciation of the particular currency in question is due. In order to introduce the model for the precrash discounted currency price, consider a � + � ' a piecewise constant function, such that a = a(1) > 0 on (O, + x) a = a( 1) < 0 on (oo, O). We assume that the logarithmic discounted price process x ( v ) is governed, given some initial condition that without loss of generality can be taken to be x ( v ) 0, by the stochastic differential equation :
=
(
x �v ) = a x �v )  L Pu . x ((�2 u ) vo uEI
d
J2
dt + J . dbt  2 '
)
X{
(v ) _ '>'
I xt
. (v)
dt I } (12.3. 1)
L.. n E I Pn x ( t n)VO ;:o:v
where r > 0 is the logarithmic discount rate (assumed to be constant) , b is the onedimensional Wiener process, (Pu)uEI is a convex combination  i.e. I C (0, + oo) is finite, Vu E I Pu > 0 and L uEI Pu = 1. The parameter v depends very much on the tax rate we assume. If p is the logarithmic tax rate and T the expected time during which one will hold the asset (that is the expected duration of the upward or downward tendency of the stock price) , one can compute v as follows: v = T · p. The equation (12.3.1), is of course first of all only a formal equation that thanks to the boundedness of a can be made rigorous using the theory of stochastic differential equations developed by Hoover and Perkins [11] or Albeverio et al. [2] for instance. We will introduce the following abbreviation: n;,(v)  X{ 1 · 1 2: v } a . 'f/ 
·
.
12.4
,
How to minimize "unfairness"
Intuitively, the map nn (P, · ) (for a fixed probability measure P on a proba bility space D) when applied to a discounted asset price process measures how
12. Applications to the theory of financial markets
182
often ( in terms of time and probability ) and how much it will be the case that one may expect to obtain a multiple ( or a fraction) of one's portfolio simply by selling or buying the stock under consideration. For the following, we will drop the first component of n ( indicating the probability measure ) if no ambiguity can arise; for internal hyperfinite adapted spaces and Loeb hyperfinite adapted spaces, we will assume that the canon ical measure ( the internal uniform counting measure and its Loeb measure, respectively) on the space is referred to. First of all we derive a formula that will make explicitly computing n easier in our specific setting: Lemma 12.4.1 If x ( v ) satisfies (12. 3. 1) for some v > 0 on an adapted prob ability space (r, g , CQ), then the discounted price process exp x� v) : t ::0: 0)
( ( )
is of finite unfairness. More specifically,
( ( ))
nr exp xl"l
[ IE [ 1 ,;,1" (x;"1 fr p;x:;1 i)vo) I] dt [ IE [ I d: l .,�o IE [ xi�¥,] + �' I ] dt I
�

The proof is more or less a formal calculation, provided one is aware of the pathcontinuity of our process and the fact that the filtrations generated by b and x(v ) are identical. For this implies that, given t > 0, the value Pro of.
'1/J ( v )
(
)
( ) ( v)u  ""' xt+ 6 Pi X ( t+ u � ) vo ( w ) i EI v
does not change within sufficiently small times u  almost surely for all those paths w where x�v ) (w)  L i E I p� x/��� ) vo(w) tf. { ±v }, this condition itself being satisfied with probability 1 . Now, using this result and the martingale property of the quotient of the exponential Brownian motion and its exponential bracket, we can deduce that for all t > 0 almost surely:
How to minimize "unfairness"
12.4.
183
Analogously, one may prove the second equation in the Lemma: for, one readily has almost surely
In order to proceed from these pointwise almost sure equations to the assertion of the Theorem, one will apply Lebesgue's Dominated Convergence Theorem, yielding
For finite hyperfinite adapted probability spaces, an elementary proof for the main Theorem of this Section can be contrived: Lemma 12.4.2 For any hyperfinite number H we will let for all v > 0 denote the solution to the hyperfinite inztial value problem
x6v) { x ( v )  x (v )  � } (x(v)  "" . x(v) ) X{
\it E
=
0, 0, . . . , 1 t
t+ J, 
x (v)
a
!

t
+ J . 1ft+ J,
6 Pu
uEI .
( tu) VO
2 1 J 1 (2H ! ) 1/'2 2 H !
I
(v )
xt

(u)
L Pu ·X (t u )V O
uEI
I
?: v
}
.� H!
(1 2.4. 1)
12.
184
Applications to the theory of financial markets
(where 1'1£jH ! : n { ± 1 } H ' + { ± 1 } is for all hyperfinite e <::.:: H! the projection to the f!th coordinate) which is just the hyperfinite analogue to (1 2. 3. 1) . Using this notation, and considering a hyperfinite adapted probability space of mesh size H!, one has for all k < H!, =
""' IE Xk(v)/H !  IE [X (v ) I FkjH !]  2J2 1 H! "±f
<::; 6
If
k
for all 2: v . As a consequence, m (xO! n) is monotonely decreasing for all hyperfinite adapted probability spaces Sl = { ±1 } H ! . Pro of of Lemma 12.4.2. It suffices to prove the result for finite ( rather than merely hypcrfinitc) adapted spaces. By transfer to the nonstandard universe, we will obtain the same result for infinite hyperfinite H as well. The proof of this Lemma for finite H relics on exploiting the assumption that a is piecewise constant, since a = a ( 1 )x(O,+=) + a (  1 )x(=.O) + a (O)x{o} yields Vk < H! Vv > 0 T
[ (v) I J v) (
J2 1
IE X kJ} Fk/ H !  X J'r, + 2 H!
v
(
 �a x( l  6 ""' pu kj H ! H!
_
1
_
 H!
(
·
u EI
a(1)x
{
(v
X ( �,l u)v o
(c, · J
c'·l
) {l
xk/H '  L Pu·X .J£. uEI HI
+ a(  1 ) x
{
C uJ
xk/H'  L Pu ·X uEl
u
x
1t
) VO
(C kJ  u ) JTf
(v) xk/H'
;:::v }
VO
::; v
" p " L
uEl
)} '
.x (v )
k  u) V O (w
l>v}
which immediately follows from the construction of Anderson's random walk [3] B. = v�Jr. H, and the recursive difference equation defining the process x(v) . However,2 this last equation implies Vk < H! Vv > 0
I
[ (v) I ] (v)
J2 1
IE IE X k}i,1 Fk/ H !  X J'r, + 2 H! 1
= 
H!
a ( 1)X
IE
{
(v )
I
((v)
:.:C: v } sv } )
xk/ lf '  L Pu ·X k m u) V O uEl
+ a(1)x
{
X
��� ,  uLE I Pu X ((vm�  u
VO
12.4. How to minimize "unfairness"
185
} "'"' IP' "'"' { 6 XkjH!  6 Pu · X �, u)vo ? v
Now all that remains to be shown is that (v)
k< H! and
(v)
(
uEI
"'"' IP' { XkjH!  "'"' 6 Pu . X );, u)vo 6 (v)
k< H!
(v)
(
uEI
<::;
v
}
are monotonely decreasing in v. The former statement is a consequence of the following assertion:
Vv' ::; v \If! < H!
IP' { xt�!  L Pu . xt�, u)vo v } L k�R uEI "'"' "'"' Pu . X ", u)vo 6 1P' { Xk/ H '  6 '2
H
<::::
(v' )
k�R
(v')
(H
uEI
2::
v
1
}
(12.4. 2)
One can prove this estimate by considering the minimal f! such that the point wise inequality
L k
(v)
(v)
xk/H'  L Pu ·X ( _k_ ;:o: v uEI HI u V O
)
} ::; L k
(v')
(v')
::?: v xk f H'  L Pu · X ( _b_ HI u VO uE I
)
fails to hold. Then one has an w E S1 such that Vk < f! (w ) (vJ CvJ
X{
xk/H '  L Pu · X ( k uEI HT  u
= X{
1 0
(v')
)
vo
::?:v
}
(v' )
xk/H'  L Pu · X ( k ::?: w u VO v uEI
X{ = X{
(v )
(v)
(v' )
(v')
)
' } (w) ,
)
::?:v xf! H '  L Pu · X ( £ uEI HI u VO
)
'}
::?: v XR f H'  L Pu ·X ( £ uEI HI u VO
(12.4.4 )
} (w) ,
( 12.4.5 ) ( 12.4.6)
) ' } (w .
But equation ( 12.4.4 ) implies, via the difference equation for inductively in k the relation Vk < f! Xf(w ) = X;;:' (w ) .
(12.4.3)
x ()
(12.4. 1),
12.
186
Applications to the theory of financial markets
If one combines this with
(which is equation (12.4.5)) and v ::; v' , one can derive  again via the recur sive difference equation (12.4. 1)  that x , :::: x ! applied in t = �! as well as the estimates
i;2
X { XR/H'  L (v ' )
uEI
Pu · X
(v ' ) k
( H!
u
) VO
2:v ' } (w)
> >
X{ X{ X{
i;1
(v ' )
xf/H'  L Pu · X uEI
(v )
xf/ H '  L Pu X uEI
(v )
·
xt/ H '  L Pu · X uE l
1.
This contradicts equation (12.4.6) . Hence, the estimate tablished for all k < H!, leading to (12.4.2). S imilarly, one can prove
(v )
(t
w u
(v )
(f
wu
(v )
w_  u
(t
)VO ) VO ) VO
(12.4.3)
2:v ' } (w) 2:v' } (w) 2:v } (w)
has been es
(v)  L Pu · X (v ),  O ::; v } L IP' { Xk/H' ( H u)V k�R uEI "" IP' { Xk/H' "" Pu . X((v')l,) u)vo ::; ' } (v' )  L._., ::; L._., k�R uEI which entails that Lk < H! IP' { Xk/�!  'I:, Pu · X (v1 _ u)vo ::;  v } must be monouEI ( H ! Vv' ::; v W <
H!
·
k
v
tonely decreasing in v.
D
Using nonstandard analysis and the model theory of stochastic processes as developed by Keisler and others [10, 12, 7] , we can prove the following result:
Suppose (y(v) : v > 0 ) is a family of stochastic processes on an adapted probability space r such that y(v) solves the stochastic differential equation (12. 3. 1) formulated above for all v > 0. Then the function J f+ nr (y(" l ) attains its minim·um on [0, S] in S.
Theorem 12.4.1
Pro of. B y the previous Lemmas 12.4.2 and the formula for n from Lemma 12.4. 1, the assertion of the Theorem holds true internally for hyperfi nite adapted spaces r, if we replace y(o) by a lifting y(o) and if we let nn when applied to internal processes denote the hyperfinite analogue of the standard nn . Now, according to results by Hoover and Perkins [11] as well as Albeverio
187
References
et al. [2] , the solution x(v ) of the hyperfinite initial value problem (12.4.1) is a lifting for the solution x(v ) of ( 12.3.1) on a hyperfinite adapted space for any v 2: 0. Now, y c+ nn ( y ) is the expectation of a conditional process in the sense of Fajardo and Keisler [7] . Therefore, due to the Adapted Lifting Theorem [7] , we must have Vv 2: 0 °nn x(v ) = nn ( x(v ) )
( )
(where we identify n with its internal analogue when applied to internal pro cesses) . Since the internal equivalent of the Theorem's assertion holds for internal hyperfinite adapted space, the previous equation implies that it is also true for Loeb hyperfinite adapted spaces. Now let (y (v ) : v > 0) be a family of processes on some (not necessarily hy perfinite) adapted probability space r with the properties as in the Theorem . Because o f the universality o f hyperfinite adapted spaces [7, 12] , we will find a process x (v ) on any hyperfinite adapted space n such that x and y are automor phic to each other. This implies [2] that x ( v ) satisfies ( 12.3.1) as well. Further more, as one can easily see using Lebesgue's Dominated Convergence Theorem, Vv
>
0 nn ( x ( v) )
=
nr ( y ( v ) ) .
Due to our previous remarks on the solutions of (12.3.1) on hyperfinite adapted spaces, this suffices to prove the Theorem. D Acknowledgements. The author gratefully acknowledges funding from the German National Academic Foundation ( Studienstiftung des deutschen Volkes ) and a predoctoral research grant of the German Academic Exchange Service ( Doktorandenstipendi1Lm des Dwtschen Akademischen A 1Lstauschdienstes ) . He is also indebted to his advisor, Professor Sergio Albeverio, as well as an anony mous referee for helpful comments on the first version of this paper.
References
[1]
S . ALBEVERIO, Some personal remarks on nonstandard analysis in prob ability theory and mathematical physics, in Prnceedings of the V!Ilth In ternational Congress on Mathematical Physics  Marseille 1986, eds. M . Mebkhout , R. Seneor, World Scientific, Singapore, 1987.
[2]
S . ALBEVERIO , J. FENSTAD , R. H0EGHKROHN and T . LINDSTR0M ,
Nonstandard methods in stochastic analysis and mathematical physics, 1986.
Academic Press, Orlando,
[3]
R. M . ANDERSON, "A nonstandard representation for Brownian motion and Ito integration", Israel .Journal of Mathematics. 25 (1976) 1546.
188
12. Applications to the theory of financial markets
[4] A. CoRCOS ET AL . , "Imitation and contrarian behaviour: hyperbolic bubbles, crashes and chaos", Quantitative Finance, 2 (2002) 264281. [5] N. CUTLAND, E . KoPP and W . WILLINGER, "A nonstandard treatment of options driven by Poisson processes", Stochastics and Stochastics Re ports, 42 ( 1 993) 1 1 5 133. [6] l\1 . D REHMANN, .J . 0ECHSSLER and A . RoiDER, Herding with and with out payoff externalities  an internet experiment , mimoe 2004. [7] S . FAJARDO and H. J. KEISLER, Model theory of stochastic Lecture Notes in Logic 14, A. K Peters, Natick (Mass . ) , 2002.
processes,
[8] F . S . HERZBERG, "The fairest price of an asset in an environment of tem porary arbitrage", International Journal of Pure and Applied Mathemat ics, 18 (2005) 121  131. [9] F . S . HERZBERG, On measures of unfairness and an optimal currency transaction tax, arXiv Preprint math. PR/0410543. http : //www . arxiv . org/pdf/math . PR/04 10543 [10] D . N . HOOVER and H . .J . KEISLER, "Adapted probability distributions", Transactions of the American Mathematical Society, 286 ( 1 984) 1 59201. [ 1 1] D . N . HooVER and E . PERKINS, " Nonstandard construction of the stochastic integral and applications to stochastic differential equations I, II", Transactions of the American Mathematical Society, 275 ( 1 983) 158. [12] H. J. KEISLER, lnfinitesimals in probability theory, in Nonstandard anal ysis and its applications, ed. N. Cutland, London 1Iathematical Society Student Texts 10, Cambridge University Press, Cambridge, 1988. [13] P . A . LOEB , " Conversion from nonstandard to standard measure and ap plications in probablility theory", Transactions of the American Mathe matical Society, 2 1 1 ( 1 974) 1 13122 . [14] E . NELSON, Radically elementary probability theory, Annals of Mathemat ics Studies 1 17. Princeton University Press, Princeton (NJ) , 1987. [15] E. PERKINS, Stochastic processes and nonstandard analysis , in Nonstan dard analysisrecent developments, ed. A.E. Hurd, Lecture Notes in Math ematics 983, SpringerVerlag, Berlin, 1983. [16] K . D . STROYAN and .J . M . B AYOD, Foundations of infinitesimal stochastic analysis, Studies in Logic and the Foundations of Mathematics 1 1 9, North Holland, Amsterdam, 1986.
Quantun1 Bernoulli exp eriments and quantum stochastic processes Manfred
\iVolff*
Based on a
W ...algebraic
Abstract
approach to quantmn probability theory we
construct basic discret e in terna l quantum stochastic pro cesses with in
d ep eiHlent increments. We ob tain Bernoulli
exp eriments
a oneparameter family of (classical)
as linear combinations of these basi c processes .
Then we use the nonstandard hull of the internal GN SHilbert space corresponding to the chosen state
T
( the
?{,.
un derlying quantum probabil
i ty measure) in o rder to derive nonstandard hulls of our internal pro cesses.
Finally continuity requirements lead to the specification of a
certain flUhf;pace £ of
nal pro cesses to
the
can
Hr
to which the nonstandard hulls of our inter
be restricted and
wh ch
i
turns out to be isomorphic
LoebGuicltardet space intro d uced by Lci tzMartini
space of £ the.n ifl shown to he isom orphic to
F+
(L2 ([0, 1] , >.))
1 101.
A sub space
flymmetric Fock
and om· basic processes agree with the processes of
Hudson and Pa.rtltasarathy on this
13. 1
t.he
subspace.
Introduction
In the early 1980's Hudson and Parthasarathy [5], Hudson and Lindsay [4] . and Hudson and Streater [6] started the theory of quantum stochastic pro cesses, in pcu·ticular of quantum Brownian motion. In [12[ Parthao:;arathy showed one way how to make the passage from quantum random walk to diffu sion. All theS(' topics are extensively treated in P.A. Meyer's compendium [ 1 1 [ . For another general introduction to this field see fl3] where an extensive mo tivation ti·om qua.ntum physics may be found. The HudsonPart.hasarathy ap proach is based on t.he classical symmetric Fock space over L2 (R+ , >.) where ·Mathematisches Institut, EberhardKarlsUniv. Tiibingen, manfred . wolff�unituebingen . de
190
13.
Quantum Bernoulli Experiments
A denotes the Lebesgue measure (see section 13.6 ) . Guichardet [3] gave an interesting representation of this Fock space as an L 2 direct sum of spaces L� = L 2 (Xn) where Xn is the space of all subsets of the unit interval [0, 1] having exactly n elements. The measure on Xn is defined via the conesponding Lebesgue measure on �f , This representation was extensively used by Maassen developing his kernel approach to general quantum stochastic processes. Journe [7] invented the so called toy Fock spaces (bebe Fock in French) as discrete approximations of the symmetric Fock space. Let Do = { 0, 1} be equipped with the uniform distribution fLO · Then the toy Fock space of order n is the space L 2 (D0, p, � n) . A rigorous discrete approximation of the Guichardet space and the basic quantum stochastic processes defined on it which uses the N space L2 (D � , p,� ) as a toy Fock space is given by Attal [2] . There is another approach based also on the Guichardet space to prove such approximation theorems using nonstandard analysis which was developed by LeitzMartini [10] . He discretized the Guichardet space in the following manner: let T = { � : k E *Z , 0 :::; k :::; N } be the hyperfinite time line where N E *N is infinitely large. Let r n be the set of all internal subsets of T of internal cardinality n and set r = u�=O rk . Now let ]1.1 be an internal subset of r and set where IAI denotes the internal cardinality of A. Then m is an internal measure on r with m(f) � e. The space L 2 (r, Lm) , where Lm denotes the Loeb measure associated to m, contains the Guichardet space as a Banach sublattice. In fact L 2 (r, Lm) is the nonstandard version of the Guichardct space, so to speak the Loeb Guichardet space over the time interval [0, 1] , Among other things LeitzMartini constructed all the relevant discrete quantum stochastic processes and showed that their nonstandard hulls exist in a precise manner and form the basic quantum stochastic processes: the time process, the creation and annihilation processes, and the number process. All approaches mentioned so far start with some kind of discrete approxi mation of the symmetric Fock space or the isomorphic Guichardet space over �+ , [0 , 1] respectively. In contrast to these approaches we present here a new one based on the more natural W* algebraic foundation of quantum stochastic as described e.g. in [8] , In [13] the author sketched a different though similar approach in the finitedimensional setting but he was not able to make the transition from the finitedimensional to the continuous infinitedimensional case. Our access is exactly the noncommutative version of Anderson's way [1] to Brownian mo tion. In particular we arc able to ju.s tify the usc of the symmetric Fock space (or equivalently the Guichardet space) in the context of quantum stochastic
191
13.2. Abstract quantum probability spaces
processes. We believe that by our approach the applications of nonstandard analysis as given by LeitzMartini [ 10] are also better understandable. We begin with the discrete quantum Bernoulli experiment modelled in the von Neumann algebra of all 2 n x 2 nmatrices over CC. Then we prove a l\loivreLaplace type theorem for quantum Bernoulli experiments, i.e. we show that these discrete stochastic processes converge to quantum Brownian motion in the same sense in which the classical approximation of the usual Brownian motion is treated by Anderson [ 1] . Furthermore we construct the basic quantum stochastic processes out of the corresponding discrete versions in the case of a vacuum state. Finally we show how the symmetric Fock space approach fits into our setting. Applications to the theory of stochastic processes are under preparation .
13 . 2
Abstract quantum probability spaces
Recall that almost all information about a given probability space (D, I:, JL ) is to be found in the pair (A, JL) with A = L= (n, I:, JL). Let us denote by [! ] the equivalence class modulo negligible functions in which the essentially f( w ) dJL( w ) = : bounded measurable function f is contained. Then [f] + JL(f) defines a positive, linear, order continuous normalized functional on A. Analogously a quantum probability space is a pair (A, T) where A denotes a W*algebra and T is a positive, linear order continuous normalized functional on A, or a normal state for short. The reader not familiar with the abstract theory of W*algebras may look at them as subalgebras of the algebra of all bounded operators on an appropriate Hilbert space which arc closed under involution * : a + a* and which are also closed with respect to the strong operator topology. In this context order continuity means the following: whenever a downward directed net (aa) of nonnegative selfadjoint operators from A converges to the operator 0 with respect to the strong operator topology then lima T(aa) = 0 holds. In particular one may interpret L= (n, l::JL ) as the W*algebra of all multiplication operators MJ : g + MJ (g) = fg (! E L= (n, I:, JL)) on the Hilbert space L 2 (D, I:, JL ) . Let (A, T) be a quantum probability space. Let A1 , A2 be W*subalgebras generating the W*subalgebra A3; moreover let Tk be the restriction of T to Ak (k = 1 , 2, 3). Then A1 and A2 are called stochastically independent if (A3 , T3) is isomorphic to (A1 ®A2 , Tl ®T2) where ® denotes the W*tensor product .
fn
Remark: Let us point out that this notion of independence agrees with the usual one in the classical (commutative) case A = L= (n, I:, JL) but that
13. Quantum Bernoulli Experiments
192
there are other notions of independence in the quantum stochastic setting also generalizing the classical one (e.g. free independence) . Two probability spaces (A. T) and (A', T1) are called equivalent if there exists an order continuous algebraic isomorphism rp : A + A' with T1 = T o rp . Now let us construct the L 2 space corresponding to the probability space (A, T). It is nothing else than the so called GNSspace (after Gelfand, Naimark, and Segal) . To this end we introduce the Cvalued mapping (al b) = T(a*b) on A x A. It is scsquilincar, that means, it is linear in the second argument, antilinear in the first argument, and it satisfies (a l a) 2 0 for all a E A. The space LT = {a : (a I a) = 0} is a left ideal, and on A/ LT there is welldefined a scalar product by (a l b) = ( a l b) where a = a + LT denotes the equivalence class in which a is contained. 7{T is the completion of A/ LT with respect to the associated scalar product norm 11 a11 = � = JT(a* a) . The representation JrT : A + J:(HT ) is given by JrT (a)(b + LT) = ab + LT which is welldefined since LT is a left ideal. The representation is injective whenever A is simple (i.e. has no nontrivial closed ideals) , e.g. A = Mn (C), the algebra of all n x nmatrices. In case A = L = (n, p,) the space 7{T turns out indeed to be the space 2 L (D, p,) and in this way L = (n, p,) is represented as the algebra of operators of multiplication by L =functions (see above) . 13 . 3
Quantum Bernoulli experiments
Let us begin with the easiest example of a quantum probability space: Let
A = M2 (C) be the W*algcbra of all complex 2 x 2 matrices a = (a;k ) and choose A E] 1/2, 1] . Then T.A (a) = >au + ( 1  >.)a22 defines a normal state on A. As in the case of classical probability we will often write IE ,\ (a) in place of a.
T.A (a) , and we call it the expectation of Now consider the following model of tossing a fair coin: choose Ao as the commutative W*algebra C { O . l } and let p,o : Ao + C be given by p,o(f) = � (! ( 0) + f ( 1)). The underlying classical probability space is to be rediscovered in Ao as the set of two idempotents 1 { o } and 1 { 1 } where 1 A denotes the indicator function of a subset A of a given set. Ao is spanned also by the elements 1r
We now construct all injective *algebra homomorphisms : Ao + A satisfy ing T,\ o = fLO · In the sense of section 13.2 this means that we will find all abstract probability spaces (B, T,\ 1 13 ) which arc equivalent to (Ao, p,o). An easy calculation shows that they are given by := with 1r
1r
1rz

z
1
) = : Qz
13.3.
Quantum Bernoulli experiments
and linear extension, where Pz  Q z =
( � 0 ) = : Xz .
z E lC
193
is a parameter with l z l
= 1. n2 (X) =
In summary we have seen that the simplest quantum probability space (A. T.\) contains a oneparameter family (Az , T.\ IAJ of models of classical coin tossing. For w = eis, z = e it ( n < s, t :<:: n) the commutator [Xw , X2] . = XwXz X2Xw is easily determined: [Xw , Xz] = 2 i
sin(s  t)
( � � ) .
So Xw and X2 commute if w = z or w = z. We obtain JE.\ ( [Xw , X2] ) = 2 i (2>.  1) sin(s  t). The absolute value of it attains its maximum at  t = ±n /2. The pair (Xw , X2) with z =  iw is called a quantum coin tossing. It is unique up to automorphisms. More precisely this means the following: choose w = e it (t E]  n, n] ) , and consider the automorphism 1.{) on A given by I.{)( a) = u*au where 0 e it/2 u. 0 e it/2 Then i.p(Xl ) = Xw ' i.p(X i ) = x iw and T.\ 1.{) = T.\ . So in the following we need only to consider the quantum coin tossing (X1 , X_ i ) . Notice that X1 = ax , X_i = ay and [Xw , X_,w]x = 2ia2 , where ax , a y, az denote the Pauli matrices. The following important matrices can be constructed by (ax , ay) : 1 0 1 . (13.3. 1 ) (ax + wy ) , b2 0 0 1 0 0 . (13.3. 2) b+ 2 (ax  wy ) , 1 0 1 0 0 ( 13.3.3) 1  [ax , ay] · bo 0 1 2i Obviously the following formulae hold: s
_
(
( ( (
0
)
) ) )
ax = b+ + b  , ay = i · (b+  b  ) , [b  , b+] = a2 = 1  2b0 , Xw = wb+ + wb  .
These formulae show that the random variables b+ and b are in a certain sense more basic than Xw though they are not selfadjoint. As in the classical case we describe the experiment of n ( E N) independent coin tossings by the nfold tensor product (A®n , T�n) . The algebra correspond ing to the kth trial is described by the elements 1 129 · · • 129 1 ®x 129 1 129 · · · 129 1 = : X k 'v" (k 1 ) times ( n k) times
'v"
194
13.
Quantum Bernoulli Experiments
where x E A is arbitrary. Describing k times repetition of our quantum coin tossing we obtain Sw. k := L;= l Xw1 . We have k [Sw ,k, s iw, k] = 2i k·
La
J =l
z,
The pair (Sw, k l S iw, k ) is called a quantum Bernoulli experiment. Setting Ak := A0 k ® 1 ® · · · ® 1 = {a ® 1 ® · · · ® 1 : a E A0 k } 'v" 'v" n  k times n  k times we obtain a natural filtering lC · 1 C A1 C · · · C An = A0n . Moreover there also exists a conditional expectation !Ek from An onto Ak which is given by IEk (a ® b) = T; ( n k ) (b) · a , where b E A0 ( n k ) , and linear extension.
13.4
The internal quantum processes
Now we consider a polysaturated model *V(JR:.) of the full structure V(JR:.) over JR:.. As usual we replace the standard integer above by an infinitely large integer N. Then all considerations in the previous section remain true for internal objects. We use the discrete time interval T = { � : 0 <:.:: k < N } denoting its elements by §., t, JJ . . . . Moreover we rescale also the family (Ak) by setting At := ANt · Set p � JN · Then we introduce n
a±t_ a'l_ •
at
pbf , b'j , p
2
(0 �). 1
( 13.4. 1 ) ( 13.4.2) (1 3.4.3)
These elements are the internal increments by which we construct the basic internal stochastic processes A£ = L: o :S;�
( A i ) t is called the internal creation process, (Ai )t the annihilation pro ( A! ) t the time process and finally (Ai)t th� �umber process. The internal Brownian motion corresponding to the parameter w E lC then is given by Bw,t wAt + wA_t ( l w l = 1 ) . It satisfies the internal difference equation dBw,t = w at+ + w at .
ce ss ,
13.4. The internal quantum processes
195
Like in the previous section we consider the state Tf: N obtain
T.\ ,N and we
JE.\,N ( Bw,t_ ) : = T,\,N ( Bw,t_ ) = 0, JE.\ ,N(B�,t. ) = t_.
(13.4.4)
The second equation follows from the facts that
((wa; wa:; ) 2 )
for §. # 11:, but T>,N  + The commutator is
= T>,N (1/N) =
1/N.
The socalled internal Itotable for the increments is easily computed:
a• a+o a§. a;,
pa2:a: I a0; I a0� I p2aa:;; p2 a; 0 0 p2 a� 0 a+§. a§.o 0 0 a• a;, 0
ptv =
A� + lvi B",t. + l v l 2 A; lvl
s
s
§.
Let 0 # v E *IC be arbitrary. Then the process (P.�_) given by is called the Poisson Process with parameter v . Now we compute the characteristic operator functions of these pro cesses. Whenever V = (ifi)t_ is a stochastic process its characteristic oper ator function is defined by fv (u, t) = exp( i uvt), where u E *IF!:.. Obviously A ( t ) l u= O = iifi and fv ( u, t) is unitary iff Vi is selfadjoint . The processes V = (Vi) we considered so far are all of the form u,
Vi =
I: d§.,
O:S;§.
where
d8 = 1 Q9 · · • Q9 1 Q9 d Q9 1 Q9 · · · Q9 1 'v"' 'v"' ' N13 1 times NN13 times and d E A is appropriately chosen. Since d'IJ,, dy_ commute for 11: # 1!. we obtain
fv (u, t)
=
@ (exp( i ud) ) 13• O:S:i3<.1.
13. Quantum Bernoulli Experiments
196
The most important formulas are the following ones:
@ (cos( pu)! + i sin(u)Xw).e_ O <;.s
13.5
From the internal to the standard world
The operator norm topology on A is too fine in order to obtain reason able nonstandard hulls. Moreover the processes discussed so far are internally bounded but not necessarily Sbounded. For example Bw, kjN is selfadjoint, hence its operator norm is equal to its spectral radius. This number is easily computed as )N, or in other words I I Bw.t. ll = VN · t In the following we only consider T;.. for >. = 1 . The other case will be treated elsewhere. We set T1 = : T and we construct the Hilbert space 7{T corresponding to (A, T) (A the algebra of 2 2matrices, see section 13.2) . To this end we denote the columns of the 2 2matrix a by a = (ai , a�) . Then T(a* b) = (ai I bi ) where the latter is the canonical scalar product on IC 2 . Hence LT = {a : a � = 0} and nT (a) (b + LT) = abi + LT. In particular 7{T S=' IC2 , where the isometric isomorphism is given by a + LT + ai. We have previously introduced the random variable X1 ax . Its equiva lence class in 7{T is denoted by xl = x l + LT We set eo = b�1 = ( 6 ) = xp and e 1 = b11 = ( ? ) = X1 . Now we consider TN = T® N Obviously 7{TN S=' (*IC2)® N holds. Let w E {0, 1 }T be an arbitrary internal function. Then we define ew by e w = @�=[} e w ( k/N ) · The set {e w : w E {0, 1 V } forms an orthonormal basis in 7{TN · 7{TN is an internal Hilbert space, so its nonstandard hull if.TN is a Hilbert space. The formulae (13.3.1) up to ( 13.4.3) above lead to the following equations, which result from the fact that 7{TN is the GNSHilbert space of (A N , TN ) · In order to make the formulae as simple as possible we denote the addition mod 2 by EB, i.e. we consider D = {0, 1 V as the cartesian product of the group (Z2 , EB). Then we set w + = 1r EB w and w  = w for w E {0, 1 }T . Moreover we write 1.t. in place of 1 {.1.} . x
x
=
13.5. From the internal to the standard world
197
With all theses conventions we obtain
w± (.t.) e
(13.5. 1)
w+ (.t.) ew  a  a + ew,
(13.5. 2)
VFJ wEB1t '
t t _;::;w (.t.) ew = N · ata; ew. 
13. 5 . 1
(13.5.3)
Brownian motion
First of all we show that our internal Brownian motion ( Bw,t) leads to the standard Brownian motion on a subspace of HTN . To this end we recall that the standard Brownian motion (f3t) on L 2 ( D , L'") (with D = {0, 1}r, fL fLN fL� N , L the corresponding Loeb measure) can be viewed as a family of multiplication operators f {3tf on L2 ( D L'") which are selfadjoint but unbounded (cf. the end of section 13.2) . From equation (13.4.4) it follows that Bw,t is Sbounded in 7{N . We claim that classical Brownian motion (or better to say: Anderson ' s Brownian motion) is equivalent (in the sense of section 13.2) to our CBw , ot) where (Bw.ot) denotes the family of selfadjoint operators on a subspace of HTN coming from the family (Bw,t_). To this end let l w l = 1 and =
=
'"
f+
,
Po = � 2
( w
w 1
l
).
For w E n we set Pw = @o::;. k < N pw ( kjN) · It follows TN (Pw) = 2  N = !L({w}) . We denote the internal W*algebra generated by {Pw : w E {0, 1}T} by B ( w ) . The formula p bw ,t .·    (P1   p1 ) w at+ + w at t  2 t , shows that ( Bw t) L o < s < t bw.t is contained in B ( w). The internal L= sp;e L= (n, fL) is nothing else than L=({0, 1}, fLo) 0 N . Thus it is not very hard to see that 11N := 11� N (see section 13.2) maps L= (n, fL) onto B(w) with TN 11N IB (w ) = fL. Now let D = {f E L= (n, fL) : f is Sbounded} . Then 
=
+
c
o
11N ( D ) C {a E AN : llall is Sbounded} C { x E 7{TN : llxll 2 = VT ( x* x ) is Sbounded}.
Moreover for j, g E D we have
13.
198
Quantum Bernoulli Experiments
because of 1rN (Jg) = 1rN (j)*nN (g) . Since L 2 (D, L'") is the completion of ef : f E D} the mapping 0/ + ;;;(}) can be extended to a linear isometry U from L2 (D, L'") onto a subspace M , say, of 7{TN · For t E [0, 1] set t min{§. E T : §. > t}. We show that U(f3t ) Bw, t · Defi ne =
Yt
(w)
=
{ 11
=
w(.t.) w(.t.)
=
1 0 =
and l'i = }N L O :S.£oc f3t li\1n = f3t holds in L2 (D, L'") . Therefore for every standard E > 0 there exists a standard no with llf3t(1D  1 Mn ) ll 2 < E for n 2 no standard. But this in turn implies 11Yt(1D  1 MJ ll 2 < E for n standard, n 2 no . Since 7rN is an isometry also for the internal L 2 norm on 7{TN we obtain IIBw, t7rN(1 D  1 Mn ) ll 2 < E for these n 2 no. I I Bw. t 7rN (1 Mn ) l l 2 = I I Yt1 Mn ll 2 � n implies that Bw, t 7rN ( 1 M, ) E 7{TN ,fin and U(f3t 1 MJ Bw ,t;;;{i AJJ . But this in turn gives U(f3t) = Bw, t · That the operator of multiplication by f3t is mapped onto the operator of multiplication by Bw ,t follows from an easy additional argument. =
13.5 . 2
The nonstandard hulls of the basic internal processes
The deeper problem is to find a joint subspace JC of HTN such that all pro cesses have standard parts as closable operators densely defined in JC. We begin the construction with some more notations: For w E { 0 , 1}T =: D we set Mw = {.t. : w(.t.) = 1 }, and lwl := I Mw l , that is the (internal) car dinality of lvlw . Using these notations we obtain the following formulae by means of ( 1 3.4.4)(13.5.2) . Let y LwED Yw ew be arbitrary. Then =
A_t y Aiy Aty Az y
L �L O:S,s.
) ( L( L ) ) L (� L ) L (� L w
+
( •)
(13.5.4)
Yw'w
wED
wED
tpED
O:S,s.
w (')
O:S,s.
(13.5.5)
Yw'w
11,an,'P ( •)
1!,an,'P
+
(,)
'"
(1 3 . 5 . 6 )
'"
(13.5. 7)
13.5. From the internal to the standard world
199
Consider now the projection IE.t. from 7{ onto 1l.t. which is the Hilbert space coming from A..t. (see section 13.3 ) . IE..t. is given by lEt ( ew ) Then we obtain IE_.t.A�
=
{ ew 0
w (�) = 0 for all � 2> t
else
A�IE..t. for each rt E family (A�h< 1 is ada�ted t� (H..t.h < 1 ·
{ • , o, + ,  }
=
or in other words the
We prove now some continuity properties. To this end we introduce 7{ ( n)
:=
{ L Yw ew : Yw E *lC} l w l= n
and we set 7{ : = 7{TN for short. Then 7{ = 1_;{= 0 7{ (n) where _i denotes the orthogonal direct sum. More over A; (H( n) ) C 7{ ( n) Ai (H( n) ) C 7{ ( n) Ai (H( n ) ) C 7{n + 1 , and finally Ai (H( ;;:) ) C 7{ ( n1 ) where 7f ( 1 ) := {0 } = : 7{ ( N+ 1 ) . For y = L l w l =n Yw ew and 0 :S §. < t we obtain ,
,
L I� L
II A� y  A� y ll 2
lw l =n
§."5cy
w+ ( s ) I 2 1Yw l 2
1 1 2 1 Yw l 2 L L I� "5cy <.t. w
<
=
( t  �) 2 II Y II 2 . (13.5. 8)
l l=n §. It follows that II A;  A: ll :S (t  §.) , in particular the mapping t Scontinuous from T to L(H) with respect to the operator norm. Let Y = L l w l = n Yw E 7{ ( n) , and 0 :S � < t
II Ai y  A� y ll 2
=
L(L
l w l=n §."5cY.<.t.
f+
A� is
w( �) r1Yw l 2 ·
It follows that II A� IJt(n) I :S but the mapping t A�y is not Scontinuous in general. So finally we have to choose an appropriate subspace in order to get Scontinuity of t + A�y. Next we show the following inequality: n,
Proposition 1
holds.
For 0 :S � < t
f+
(13.5.9 )
13. Quantum Bernoulli Experiments
200 Proof.
Let y = L l w l = n Ywew E Ji( n ) be arbitrary. Then
holds. If ip(u)
=
0 then
l iP
EB
t,,,l
=
<
n + 1 hence Y'Pffil:Yc
=
0. It follows
n+1 N
Let w E D be arbitrary with lwl = n . Then there exist at most N(t.  ..s.)  lw · 1 [� ,![ 1 different ip with ip EB 11' = w and .§. :S 1J < t_. This gives I (Ai  A! ) Y II2 :S D N (t;�) ( n + 1 ) II Y II2, and the asserted inequality follows. Corollary 1
( 13.5. 1 0)
Ai is the adjoint of Ai. By co�struction we have 'H( n) l_Ji( m ) for n # So the subspace JC= of H , generated by all the spaces i{( n ) ( n E N) is nothing else than the orthogonal direct sum JC= = l_ n ENo i{(n ) . Recall the definition t = min{JJ E T : > t.}. Proof.
m.
Then from the inequality (13.5.8) it follows that by
1J
A;,n Y (A[,ny ) /\ there is uniquely defined an operator on i{( n) bounded by t. So we obtain the operator A� = l_ n ENo At . n on JCoc which is selfadjoint and bounded by t. Moreover I I A�  A: II :S t  holds. D In order to define the nonstandard hull of one of the other operators A� (� E { +,  , 0}) we choose the following joint domain of definition: JC� : = { Y E }(00 : Y = l_n ENo Yn, Yn E i{(n ) , � nll :iJn ll § < 00} =
s
which is dense in JC=. From the inequalities (13.5.9) to (13.5.10) we obtain that by
13.5.
201
From the internal to the standard world
:
the operators A� K0 + K 00 are uniquely defined. Moreover the following proposition holds: Proposition 2
A�
is selfadjoint and Ai and Ai are adjoint to each other.
Proof. Set At n = A� I joint, whereas Atn and is obvious.
H17• These operators are bounded and A�, n is selfad A�( n+ l ) are adjoint to each other. Now the assertion D
The final problem we have to solve is to find an appropriate filtration with the necessary conditions of continuity. To this end we calculate the difference IE.t  IE.e_ for :<:: §. < t For y E 7{ ( n) we obtain
0
_e.
(IE_t  IE Let t. =
)y =
k/N, = l/N. It follows §.
l
1 Yw 2
l w l = n , w9 [o: ..t[
l
max 1 Yw 2
<
lwl = n
(k n l)
17 max 1 Yw 2 Nn (t_  §.)
l
lwl =n
•
n 1 1 (1  k l ) · · · ( 1  k  l ) . I
So the mapping y + IE.tY will be Scontinuous if max lwl = n Yw 1 2 N17 will be finite. This consideration leads to the internal measure m on D = given by m ( w } ) = NL . Then (D, m) is the direct sum of (D17, mn)n :5o N with Dn = = and mn = m bn · Notice that m(Dn) = (�) · Jn , so that N m(D) = L�= O ( � ) · = ( + it ) :::::J e. Moreover L�= L m(D k ) :::::J for all L :::::J oo. Let (D, Lm) be the Loeb measure space associated to (D, m) . Then D ) = e and D is an Lm  null set. Finally L 2 (D, Lm) � _l n N L (D17 , LmJ where Lmn is the Loeb measure associated to m17 • Consider the external subspace of all Sbounded internal *complex val ued functions on D . Then for E the complexvalued function : + is in L 2 (D. Lm) and the space E is dense in L 2 (D, Lm) . Mor eover = 0fk in the L2 sense, where = Now we map onto a subspace Y of 7{ by the mapping V defined as follows: ( )  I wl / 2 ew , V( ) =
{ {w E D : l w l
n}
Jk
Lm (Uk EN k E 2 0( j (w ))
0f
1
{0 . 1 }T 0
\ (U kE N Dk)
X f X
LkENo
X
ef : f X} fk f Ink .
f Lfw N wED
°f
w
13.
202
Quantum Bernoulli Experiments
and set
W(j) = 0]). Then Jr2 l f l 2 dm II V ( f ) ll 2 . Let .C b e the closure o f {W(j) : f E X} = : .[( b) in JC=. Then the last equation shows L2( D, Lm ) � .C. Moreover Xk = {f E X : f ( De ) = 0 for £ # k} is mapped onto a subspace of Ji ( k) ( k standard) . Let .C k be the closure in JC= of {W(f) : f E Xk } =: .C�b) . Then .C = ..lkENLk and .Ck � L 2( D b Lm k ) . The following facts arc easily seen to hold:
Theorem 1
_c k(b) .Ai• _c k(b) Aio _c k(b) .A �+ _c (b) k .A  _c k(b) �
c JC'ff for k E No c .C k c .C k c _c (b) k+l
c .c�bl_l .
Moreover .C n JC0 is dense in .C and it is the joint domain of definition of the operators A� , � E { +,  } which are closable and essentially selfadjoint in case � = and adjoint to each other in case � = +,  . •, o,
•, o,
Remark: We map an internal set !vi of T onto its indicator function This mapping 7/J is bijective and therefore we obtain an isometric isomorphism U of m ) onto the internal Guichardet space of LeitzMartini. This shows that our considerations concerning the continuity of the filtering leads in a very natural manner to the LoebGuichardet space for the basic quantum stochastic processes cf. section 13. 1).
1M E {0, 1 }T =
D.
L2( D,
(
13.6
The symmetric Fock space and its embedding into .C
Let 'H be a Hilbert space and let 'Hr:om be the nth tensor product of 'H . We denote the full symmetric group of n elements by Then by
§n .
there is uniquely defined an orthogonal projection, the range of which is called the nth symmetric tensor product The symmetric Fock space
JiO n .
13.6.
The symmetric Fock space and its embedding into .C
203
+ ( H ) over H is the orthogonal sum :F+ ( H ) = l_':= o H O n where as usual H O 0 lC. For the special case H = L 2 ( [0, 1], >.) (>.: Lebesgue measure) the tensor n
:F
=
product H® is nothing else than L2 ( [0, 1] 71 , .\71 ) where A 71 denotes the dimensional Lebesgue measure. and the projection Pn is given by
n
Now we give the more or less classical definition of the basis quantum processes. To this end we set
h o · · · fn : = Pn ( h Q9 · · · � fn ) · 0
For
0 :S t
<
1
the definition of the creation operator is given by
Ai (( h · · · fn ) : = 1 [o, t [ h · · · fn 0
0
0
0
0
(13.6. 1)
and linear extension t o the largest possible domain. Similarly the annihilation operator is given by
n Ai ( h o . . . o fn ) = L (1 [o, t [ l fJ ) h o . . . o jj o . . . o fn , J= l
(13.6. 2)
where the expression jj indicates that this factor is omitted. The number operator is given by
A� (fl o · · · o fn ) = 1 [o, t [ · fl o · · · o fn + · · · + fl o f2 o · · · o fn  l o 1 [o, t[ · fn , (13.6. 3) whereas the timeoperator is
A� (h o · · · o fn ) = t fl o · · · o fn ·
(13.6.4)
Remark: These operators are those ones introduced by Hudson and Parthasarathy [5] as the basic quantum processes. Consider now the set Xn = {(t 1 , t 2 , . . . , t n ) : 0 :S t 1 < · · · < t71 :S 1}. It is a measurable subset of [0, 1] 71 • Let f be an element of L 2 (X71, .\ 71 ) . Setting
f( t 1 , · · · , tn ) =
{ OJ (t 1 , . . . , tn)
(t 1 , . . . , t n ) E Xn
otherwise
13 . Quantum Bernoulli Experiments
204
we obtain an element j of L 2 ( [0 , l]n, An) . Then Pn (]) E L 2 ( [0, l] , A) O n and II Pn (]) ll 2 = fxn l l 2 An , i.e. llfll = II Pn (]) ll holds. On the other hand let E L2 ( [0, 1] , A )0 n be arbitrary. Then = Pn (jl xn ) (equality in the L2sense) . It follows that the mapping + Pn (]) is a linear isometry from L2 ( Xn , An) onto L 2 ( [0, 1] , A) O n. Its inverse is given by 91 xn =: Un (g ) . Now we embed L2 (Xn , An) isometrically into the space Ln constructed at the end of the previous section. To this end let w E Dn be arbitrary. Set (w) = min(� : w(�) = 1) and by induction �k+ 1 (w) = min(� > h : w(�) = 1). Then I(w) := (h (w) , . . . , �n (w)) E *Xn n Tn . Moreover if f := (kl /N. . . . , kn/N) E *Xn n Tn then f = f( w) for w = 1 M where M = {h (w) , . . . , �n (w) }. We consider the dense subspace Cn of L2 (Xn , An) consisting of the re strictions to Xn of continuous functions on the closure Xn . For E Cn we set S(f) (w) = * (f(w)) . This is an element of L2 (D n , mn ) satisfying l l f l l 2 :::::J II S(f) ll 2 · It follows that Cn is linearly and isometrically embedded into L2(D n , LmJ , and this embedding can obviously be extended to the whole of L2 (Xn , An) . Since L2 (D n , Lmn ) is linearly and isometrically embedded in Ln , we obtain that the symmetric tensor product L2 ( [0, l] , A) O n can be em bedded in Ln . An obvious extension then yields an embedding denoted by U of the symmetric Fock space F+ ( L2 ( [0 1 ] , A) into £. Call y the image of this embedding and set Yo = G n K0 .
fd
f
f
f n!
g n! >
t1
f
f
,
2 Yo is dense in y. Moreover the restrictions of A� to Yo yield closed densely defined operators which are selfadjoint for � E { •} and adjoint to each other in the other cases and which moreover satisfy
Theorem
on u 1 (Qo) where to (13. 6.!J
o,
A�
aTe the operators given by equations
(13. 6. 1)
'UP
The proof is obvious.
References [ 1 ] R. M . ANDERSON, "A nonstandard representation for Brownian motion and Ito integral", Israel .J. Mathern. , 25 1546. [2] S . ATTAL, Approximating the Fock space with the toy Fockspace, in .J. Azema, I\1 . Emery, M. Ledoux, I\L Yor (eds.), XXXVI, Lecture Notes in Mathematics 180L SpringerVerlag, Berlin New York, 2003.
Seminaire de Probabilites
References
[3] A.
GUICHARDET,
205
Symmetric Hilbert spaces and related topics, Lecture
Notes in Mathematics 26L SpringerVerlag, Berlin, 1972.
[4] R.
L . HUDSON and J . M. LINDSAY, "A noncommutative martingale rep resentation theorem for nonFock Brownian motion", .T . Funct. Anal., 6 1
(1985) 202221 .
[5] R.
R . PARTHASARATHY , "Quantum Ito ' s formula and stochastic evolutions", Comm. Math. Physics, 93 (1 984) 311323. L. HUDSON and K .
[6] R.
L. HUDSON and R. F. STREATER, "Ito ' s formula is the chain rule with Wick ordering", Phys. Letters, 86 ( 1982) 277279.
[7]
"Structure des cocycles markoviens sur respace de Fock", Prob. Theory Re. Fields, 75 (1987) 291 316.
.T . L . .T OURNE.
[8] B .
KDMMERER and H. MAASSEN , "Elements of quantum probability", Quantum Probab. Communications, QPPQ, 10 (1998) 73100.
[9]
H . MAASSEN, Quantum Markov processes on Fock space described by integral kernels, in L. Accardi, W. von Waldenfels, Lecture Notes in Math ematics 1136, SpringerVerlag, Berlin. New York, 1985.
Quantum probability and applications II, Proceedings Heidelberg 1984, [10] M. LEITZMARTINI. Quantum stochastic calculus using infinitesimals, Ph.D. Thesis, University of Tubingen, 2001 . [11] P . A . MEYER. Quantum probability for probabilists, Lecture Notes Mathematics 1 538, SpringerVerlag, Berlin, New York, 1993.
[12]
m
K. R. PARTHASARATHY, The passage from random walk to diffusion in quantum probability, (1988) , Ap plied Probability Trust, Sheffield, 231  245.
Celebration Volume in Appl. Probability [13] K. R. PARTHASARATHY, An introduction to quantum stochastic calculus, Birkhauser. Basel, 1992.
Applications of rich measure spaces formed from nonstandard models Peter Loeb*
Abstract \i'Ve review some recent work by Yeneng Sun and the author. Sun'i:l work shows that there
are
results, some used for decades without a rigourous
that arc only true for spaces with tho rich structmo of Loeb measure spacel:l. His j oint work with the author uses that structure to extend an important result on the purification of measure valued maps. foundat.ion,
14. 1 In
Introduction 1975 [ 1 1] ,
the author constructed a class of standard measure spaces
fanned on nonstandard models.
These spaces, nmv called "Loeb spaces" in
the literature, arc very close to the underlying internal spaces, and ru·c rich in
s t ruc t ure.
(Sec
1 121 by Yeneng Sun
the author's throe clmptcrs and Osswald's two chapters in
for background. ) 'vVe will briefly review here some recent work
that uses these measure spaces. Sun has shown that there are results, some
used in applications for decades without a rigorous foundation, that are only true for spaces with the rich structure of Loeb measure spaces. We will fol l ow the description of Sun'::; work with a detailed proof of a special case of t.hc recent
result in
[H] by Yeneng Sun and the author on A counterexample shows that the
valued maps.
the purification of measure
result we give is false if the
Loeb space we use is replaced by the unit interval with Lebesgue measure. We
note in passing hero that the result in
[13]
by Osswald, Sun, Zhang and the
author presents yet another example needing rich measure ::;paces.
*
Department
of
Mathematics, 'University
loeb@math. uiuc . edu
of
Illinois, Urbana, IL 61801.
14.2. Recent work of Yeneng Sun
14.2
207
Recent work of Yeneng Sun
We begin with some work in the late 1990's of Yeneng Sun ( [16] , [17] , [18] , and Chapter 7 of [12] ) . That work, Sun's alone, is published elsewhere, so what is said here is an invitation to read, and not a substitute for, the original articles. To set the background we consider the following question: Does it make sense to speak of an infinite number of independent individuals or random variables indexed by points in a uniform probability space? Can one, for example, reasonably consider an infinite number of independent tosses of a fair coin with the tosses indexed by a uniform probability space, and if so, does it make sense to say that half the tosses should be heads? Of course, there is no problem in speaking of independent random variables indexed by the first integers with each integer having probability 1/ An infinite index set with a uniform probability measure, however, must be uncountable; there is no uniform probability measure on the full set of natural numbers. The problem thus evolves to finding the possible meaning of an uncountable family of independent random variables. Whatever way one approaches this problem, the usual measuretheoretic tools fail. A natural attempt to generalize coin tossing replaces the natural numbers with points in the interval [0, 1], and thus replaces sequences of 1 ' s and  1 ' s with functions from [0, 1] to the twopoint set { 1, 1 } . This is a reasonable model for an uncountable family of independent coin tosses. In 1937, Doob [2] exhibited a problem with this and similar spaces of functions when standard techniques arc applied. To sec the problem, let denote the set of {  1, 1 } valued functions on [0, 1] , and let be the product measure on constructed from the measure taking the values 1 / 2 at 1 and 1/2 at  1 . In the usual construction of an appropriate aalgebra on measurable sets are formed from the algebra of cylinder sets, with functions in each cylinder set restricted at only a finite number of elements of [0, 1] to take either the value 1 or  1 . It follows that each measurable set in is determined in a way described below by a countable subset of [0, 1 ] , and so the following result holds.
n
n.
P
D
D
D,
D Proposition 14.2.1 Fix any h E D. Set Mh := {w E D : w(t) = h(t) except for countably many t E [0, 1 ] } . Then Mh has Pouter measure 1 . Proof. For each measurable B � D, there is a countable set C [0, 1] such that for all o:, (3 in D, if o:(t) = (3 (t) for all t E C, then E B if and only if (3 E B. Suppose Mh � B . Given w E D, Let w' agree with w on C and agree with h on [0, 1] \ C. Then w' E Mh, so w' E B. It then follows that w E B . Thus B = D, so the outer measure P*( Mh ) = 1 . a
C
D
14. Applications of rich measure spaces
208
Remark 14.2.2 Note that if w E lvh, then since Lebesgue measure A of a countable set is 0, w = Aa.e. on [0, 1 ] . This is true if is nonmeasurable, or if 1, or if  1 . Now outer measure when applied to the intersection of measurable sets with Mh is finitely additive, hence countably additive. Since the outer measure of Mh is 1 , one can trivially extend P to a measure P with P(Mh) = 1 . Thus, no matter what might be, one can claim that P almost every function is equal to at Aalmost very point of [0, 1]. This, and other highly questionable arguments have been used for decades to work around the measure theoretic problem indicated by Doob's example.
h
=
h
=
h
h
h
h
Another approach to representing a continuum of independent random vari ables is to consider a function w , called a process, where is an index from an uncountable probability space called the parameter space (the probability measure need not be uniform) and w is taken from a second probability space called the sample space. The question then is whether it makes sense to work with the usual product of these two probability spaces. Sun has shown in Proposition 7.33 of [12] that no matter what kind of measure spaces, even Loeb measure spaces, one might take as the parameter space and sample space of a process, independence and joint measurability with respect to the classical measuretheoretic product, i.e. , formed using measurable rectangles as in [15] , are never compatible with each other except for a trivial case. Here is the exact statement of that proposition.
f(i, )
i
Let (I, I, ) and (X, X, v) be any two probability spaces. Form the classical, complete product probability space (I x X, I 129 X, 129 v ) . Let f be a function from I x X to a separable metric space. If f is jointly measurable on the product probability space, and for 11almost all (i 1 , i 2 ) E !xi, h and fi2 are independent (call this almost sure pairwise in dependence), then, for ttalmost all i E I, f( i. ) is a constant f1m ction on X. Proposition 14.2.3 (Sun)
f1
f1
f1 129
·
Sun notes that Proposition 14.2.3 is still valid when f1 has an atom A . The almost sure pairwise independence condition implies the essential constancy of the random variables fi for almost all i E A. In the articles cited at the beginning of this section, Sun has shown that a construction overcoming these measuretheoretic problems is obtained by forming the internal product of internal factors, and then taking not just the Loeb space of each factor, but also the Loeb space of the internal product. This yields a rich extension of the usual product aalgebra formed from the Loeb factors, one that still has the Fubini Property equating the integral over the product space to the iterated integrals over the factor spaces forming that product. Here in more detail is that construction.
14.2. Recent work of Yeneng Sun
209
Sun's starts with internal spaces ( T, T, A) and ( D, A, P) . The space T may be a hyperfinite set with A given by uniform weights. He then forms the Loeb spaces (T, L>. (T) ) ) and (D, L!L (A) , P ) . He lets A Q9 P denote the internal product measure, while T Q9 A denotes the internal product aalgebra, and L>. (T) Q9 Lp (A) denotes the classical product aalgebra formed from L>. (T) and Lp (A) as in [15] . On the other hand, forming the Loeb space from the internal product T Q9 A produces a larger aalgebra L>.0P (T Q9 A) on T n . ( Sec the following propositions. ) Here arc some properties of what we shall call the big product space (T x D, L>.0p (T Q9 A) , �) . X
The big product space depends only on the Loeb factor spaces (T, L>. (T) , >:) and (D, Lp (A) , P) . Proposition 14.2.5 (Anderson [1] ) If E E L>. (T) Q9 Lp (A) , then E E L>.0p (T � A) and � (E) = ); ® P(E) . Proposition 14.2.6 (Hoover (first example) , Sun [ 1 7] ) The inclusion of L>. (T) Q9 Lp (A) 'in L>.0P (T Q9 A) is strict if and only if both ); and P have nonatomic parts. Proposition 14.2. 7 (Keisler [6] ) A Fubini Theorem holds for the big prod uct space, Proposition 14.2.4 (KeislerSun [7] )
In his articles, cited above, Sun shows that to work with independence in a continuum setting, as has been attempted in informal mathematical applica tions without rigor, and for decades, one needs a rich product aalgebra such as the aalgebra of a big product space. Here is that general result as stated in Proposition 7.4.1 of [12] and proved in Theorem 6.2 of [17] .
Let X be a complete, separable and metrizable topological space. Let M (X) be the space of Borel probability measures on X, wher·e M(X) is endowed with the topology of weak convergence of measures, Let be any Borel probability measure on the space M(X) . If both ); and P are atomless, then there is a process f from (T n , L>.0P (T Q9A), GP) to X such that the random variables ft = f ( t, ) are almost surely pairwise independent (i,e,, for �almost all (t1 , t2 ) E T x T, fh and ft2 are independent ), and the probability measure on M(X) induced by the function Pft 1 from T to M (X) is the given measure Remark 14.2.9 Here, for any given t E T, P ft 1 is the probability measure on X induced by the random variable ft D + X. It is the measurable mapping from T to M (X) taking the value P ft 1 at each t E T that induces Proposition 14.2.8 (Sun) fL
·
X
fL .
:
the measure fL on M (X).
14. Applications of rich measure spaces
210 Remark 14.2.10
(
A consequence of the proposition is that when the space
T. L>.. (T), :\) is a hyperfinite Loeb counting probability space and 'ji is a non
atomic Loeb probability measure, it makes sense to use the big product space as the underlying product space for an infinite number of equally weighted, independent random variables or agents. As part of his work with large product spaces, Sun has extended the law of large numbers. The usual strong law states that if random variables Xi , E N, are independent with the same distribution and finite mean m, then � L:i= l Xi tends almost surely to the constant random variable m. That is, for almost all samples the value of the sequence at tends to the constant m.
i
w, w Theorem 14.2. 1 1 (Sun) Let f be a realvalued integrable process on the big product space. If the random variables ft := f(t, ·) are almost surely pair wise independent, then for almost all samples w E D, the mean of the sample function fw := f(·,w) on the parameter space T is the mean off viewed as a random variable on the big product space. There is no requirement of identi cal distributions. Another facet of this same work deals with independence. It is well known that for a finite collection of random variables, pairwise independence is strictly weaker than mutual independence. Sun has shown that for processes on big product spaces, pairwise independent and mutual independent coalesce. and they coalesce with other notions of independence that are distinct for a finite number of random variables. This implies asymptotic results for finite families of random variables where the families are ordered by containment and have increasing cardinality.
14.3
Purification of measurevalued maps
We now turn to a special case of the joint work of Yeneng Sun with the author in [14] . That special case generalizes the following celebrated theorem of Dvoretzky, Wald and Wolfowitz (see [3] , [4] , [5] ) .
Let A be a finite set and M(A) the space of probability measures on A. Let (T, T) be a measurable space, and ILk ! k = 1 , · · · , finite, atomless signed measures on (T, T) . Given f : T + M (A) so that for each a E A, f( ·) ( {a}) is Tmeasurable, there is aTmeasurable map g : T A so that for each a E A, and each k l f(t)({a})dfLk (t) = fLk ({t E T : g(t) = a}). Theorem 14.3.1
m,
+
:<:: m,
14.3. Purification of measurevalued maps
211
This theorem justifies the elimination, i.e., purification, of randomness in some settings. For example, in games, T represents information available to the players, and A represents the actions players may choose, given E T. Every player�s objective is to maximize her own expected payoff, but that payoff depends on the actions chosen by all the players. For each player, a mapping from T to A is called a pure strategy. A mapping from T to M (A) is called a mixed strategy; in this case, each player chooses a "lottery on A". A Nash equilibrium is achieved if every player is satisfied with her own choice of strategy given the choices of the other players. In quite general settings, such an equilibrium can be achieved using mixed strategies. When results such as Theorem 14.3.1 apply, those mixed strategies can be purified yielding an equilibrium with the same expected payoffs. We re fer the reader to [14] for more information on the gametheoretic consequences of Theorem 14.3.1 and its extension. We will concentrate here on the proof of that extension, a proof simpler than that of the more general result in [14] . We extend the DWW Theorem from a finite set A to a complete, sepa rable metric space. There always exists a Borel bijection from such a space to a compact metric space. That is, if A is uncountable, then it follows from Kuratowski's theorem (see [15] , p. 406) that there is a Borel bijection from A to 1]. On the other hand, for a countable set A, one can use a bijection from A to {0, 1 , 1/2, . . , 1 / , . . . } . Therefore, it suffices to let A be a compact metric space. For simplicity we will work with a finite set of measures, but may extend to a countable collection.
t
[0,
. n
Let K be a finite set, and let A be a compact metric space. For each k E K, let be a nonatomic, finite, signed Loeb measure on a Loeb measurable space (T, T) . If f is a Tmeasurable mapping from T to M (A), then there is a T measurable mapping g from T to A such that for each k E K and for all Borel sets B in A, f f (t ) ( B) ( dt) = (g 1 [Bl) . This is equivalent to the condition that for each k E K and for any bounded Borel measurable function e on A, l l B( a)f(t)(da )fLk( dt) l B(g( t )) !Lk (dt). Theorem 14.3.2
fLk
r
fLk
fLk
=
Example 14.3.3 Sun and the author show by a counter example in [14] that the Loeb measures of the theorem cannot be replaced with measures absolutely continuous with respect to Lebesgue measure on [0, 1 ] . In their example, the compact set A is the interval [ 1, 1], and the measurevalued map is given by E [0, 1 ] . The example uses just two measures, : = (5t + L t )/2 for each /L l = >. on [0, 1 ] , and fL 2 = 2 >. on 1]. If there is a function that works, then it is not hard to see using the functions on A given by and
f(t)
t t [0 ,
g B(a) l a l B(a) a2 =
=
14. Applications of rich measure spaces
212
(t l g(t)1 ) 2 >.. ( dt) g
g(t )
that J01 must take the value t or  t >.. a.e. = 0. That is, on [0, 1 ] . It is also not hard to see that this is impossible. The moral is that a Lebesgue measurable can't switch values on [0, 1] fast enough to work.
fik
To set up the proof of the theorem. we let be the signed internal measure generating for each k E K. Also, we set Po = where is the internal total variation of and c = Then Po is an internal probability measure on and for each k E K, < < Po . Let The probability measu � P is the Loeb measure generated P=
fLk
fik
T,
st(c) L.kE K l fLk 1.
fLk f3k
�L.k EK 1/ikl , L.kEK 1/ik I (T). fik
f3k fLk
1/ikl
by P0 , and for each k E K, < < P. Let be the internal RadonNikodym derivative of with respect to Po . Since < < P, is Sintegrable with � respect to Po and := st is the RadonNikodym derivative of with respect to P. We let B denote the collection of Borel subsets of The following lemma is the only part of the proof that needs nonstandard analysis.
fik
f3k
{ik
A.
fLk
Let { ¢, : i E N} be a countable, dense (with respect to the sup norm topology) subcollection of the continuous realval1Led functions on A. As sume that there is a sequence ofTmeasurable mappings {gn , n E N} from T to A such that for each i E N and k E K, the sequence c/J2 (gn(t)) f3k (t)P(dt) converges; let c,,k E rn:. denote the limit. Then, there is aTfrmeasurable mapping g from T to A such that for each i E N and k E K, l c/J, (g(t)) f3k (t ) P (dt) ci k Proof. For each n E N, let hn : T + *A be a Tomeasurable lifting of gn with respect to the internal measure Po . Then for each and i E N and each k E K, Lemma 14.3.4
=
,
·
n
and so
�i�
n t CXJ
( l l *¢i(hn(t)){ik (t)Po(dt)  i,k l ) st
c
hn
= 0.
Using N 1 saturation, we may extend the sequence to an internal sequence and choose an unlimited integer H E *N so that for every E N and each k E K,
The desired function
t E T.
i
g is obtained by setting g(t) := st (hH ( )) E A at each t
D
14.3. Purification of measurevalued maps
213
( )
Now for the proof of Theorem 14.3.2, we note first that since M A is a compact metric space under the Prohorov metric p which induces the topol ogy of weak convergence of measures , there is a sequence of simple functions Un } �= l from T, T to M A such that
(
)
( )
(
)
Vt E T, nlim p (jn (t) , j(t)) = 0. t oo Assume we know that for each E N, there is a Tmeasurable mapping n
9n from T to A such that for each k E K, and any bounded Borel measurable function e on A ,
fr l e(a) fn(t) (da)p,k (dt) = fr e(gn(t))!Lk (dt) .
Let g = { cPi : i E N} be a countable dense subset of the continuous realvalued functions on A with the supnorm topology. For a given cPi E g and each t E T, A ¢� (a)fn (t) (da) + A cjJ, (a)j (t) (da) . Moreover, each integral is bounded by the maximum value of l¢i l on A , so by the bounded convergence theorem, for each k E K,
f
f
L cPi (gn (t) )fJk (t)P(dt) L cP2 (gn (t))tLk (dt) = L l cPi (a)fn (t) (da)p,k (dt) + L l cPi (a)j (t) (da)p,k (dt) =
By the lemma, there is a Tmeasurable mapping g from T to A such that for each k E K,
L cPi (g(t) )tLk (dt) = L cPi (g(t) )f3k (t) P(dt) = L l cPi (a) j(t) (da)tLk (dt) .
Since this is true for each cPi in Q, it is true with cPi replaced by an arbi trary bounded Borel measurable function e on A. Therefore, without loss of generality, we may assume that f : T + M A is simple. Next, we fix a sequence of Borel measurable, finite partitions pm { A , . . . , A } of A such that the diameter of each set in Pm is at most 1/2m , and Pm + l is a refinement of Pm . Also for each l, 1 ::; l ::; lm , we pick a point a[ E A . Since f is a simple function from T to M A , there is a Tmeasurable partition { Sj }f= l of T such that f = "'/j E M A on Sj . It follows that for
( )
1
=
Z:
[
( ) ( )
14. Applications of rich measure spaces
214
each B E B, and each k E K, fr f(t ) ( B ) p,k (dt) = Lf= l "(j ( B ) p,k(Sj ) . By Lyapunov's Theorem, each 51 can be decomposed by a finite, Tmeasurable partition {Tf' m , . . . , Tz:":'} so that for every l :::.; lm and k E K, f1k (Tf 'm ) = ,.YJ (A[ )p,k (Sj ) , whence
l j(t) (A[ ) p,k (dt) f; "(j (A[n ) !1k ( Sj ) =
N
N
=
( )
L /1k Tf 'm . j= l
(As an alternative to the above use of the Lyapunov theorem, one can with small modifications of the proof here use the author's Lyapunov theorem [10] , but then all of the simple functions must be modified on a Pnull set T0 so that for each of them, the corresponding partition sets Sj of T are internal. ) Now for each m 2: 1 , we define a Tmeasurable mapping 9m : T > A so that for each l :::.; lm and each j :::.; N, 9m ( t) = a[ on T/'m . For each continuous, realvalued 7/J on A , for each m _2: 1 and each k E K,
lm N
= L L ( 7/J (a [ )rj (A[ ) ) f1 k (Sj ) l=l J = l
lm N
( )
= L L 7/J (a[ ) !1 k T/'m l=l j = l
approximates as m > oo the integral
£ (l >/� (a)f(t) (da) ) Mk (dt)
�
·
·
t, £ (1r t, � 1, (1r
)
>,b (a) j(t) (da) �k (dt)
)
>,b(a) o, (da) Mk (dt ;
215
References so
1/J (a[ ) !Lk (Tzj,m ) . L (1 1/J (a)j(t)(da) ) fLk (dt) It follows from the lemma that there is a Tmeasurable mapping g from T to lm N =LL l=l j = l
A such that for each k E K and for each continuous realvalued function e on A from a countable dense set of such functions, and therefore for each bounded Borel measurable function e on A,
L B(g(t))tLk (dt) L l B(a)j(t)(da) fLk (dt). =
D
References [1] R. M . ANDERSON, "A nonstandard representation of Brownian motion and Ito integration", Israel J . Math . , 25 (1976) 1546. [2] .T . L . D ooB, " Stochastic processes depending on a continuous parameter", Trans. Amer. Math. Soc., 42 (1937) 107 140. [3] A. DVORETSKY, A. WALD and .J . WoLFOWITZ, "Elimination of random ization in certain problems of statistics and of the theory of games", Proc. Nat . Acad. Sci. USA, 36 (1950) 256260.
[4] A. DVORETSKY, A. WALD and .J . WoLFOWITZ, "Relations among certain ranges of vector measures", Pac . .J. Math. , 1 (1951) 59 74. [5] A . DVORETSKY, A . WALD and .J . WoLFOWITZ, ""Elimination of ran domization in certain statistical decision problems in certain statistical decision procedures and zerosum twoperson games", Ann. Math. Stat . , 22 (1951) 1 21.
[6] H . .J . KEISLER,
An infinitesimal approach to stochastic analysis, Memoirs
Amer. Math. Soc.
48, 1984.
[7] H .. T . KEISLER and Y . N . SUN, "A metric on probabilities, and products of Loeb spaces", .Jour. London Math. Soc . , 69 (2004) 258272.
14. Applications of rich measure spaces
216
[8] 1\ I . A . K HAN , K . P . RATH and Y. N. SuN. "The DvoretzkyWald Wolfowitz theorem and purification in atomless finiteaction games", In ternational Journal of Game Theory, 34 (2006) 91  104.
[9] M . A . KHAN and Y. N . Su N , "Noncooperative games on hyperfinite Loeb spaces", J. Math. Econ. , 31 (1999) 455492. [10] P . A . LOEB, "A combinatorial analog of Lyapunov's Theorem for infinites imally generated atomic vector measures". Proc. Amer. Math. Soc . , 39
(1973) 585586. [ 1 1] P . A . L OEB, " Conversion from nonstandard to standard measure spaces and applications in probability theory", Trans. Amcr. Math. Soc., 2 1 1
(1975) 1 1 3 122.
Nonstandard Analysis for the Working Mathematician. Kluwer Academic Publishers, Amsterdam, 2000.
[12] P . A . LOEB and M. WoLFF, cds.,
[13] P . A . L OEB, H . OsswALD, Y. SuN and Z . ZHANG, "Uncorrclatcdncss and orthogonality for vectorvalued processes", Tran. Amer. Math. Soc . , 356
(2004) 3209 3225. [14] P . A . LOEB and Y. Su N , "Purification of measurevalued maps", Doob Memorial Volume of the Illinois Journal of Mathematics, 50 (2006) 747 762. [15] H . L . RoYDEN,
Real Analyszs, third edition. Macmillan, New York, 1988.
[16] Y. SuN. "Hyperfinite law of large numbers", Bull. Symbolic Logic, 2 (1996) 189198. [17] Y. Su N , "A theory of hyperfinite processes: the complete removal of in dividual uncertainty via exact LLN". J. Math. Econ. , 29 (1998) 419503. [18] Y. SU N , "The almost equivalence of pairwise and mutual independence and the duality with exchangeability", Probab. Theory and Relat. Fields, 1 1 2 (1998) 425456.
More on Smeasures David A. Ross
15.1
*
Introduction
In their important. (but. often overlooked ) paper [1] . C. Ward Henson and Frank \iVattenberg introduced the notion of Smeasura.biz.ity, and showed that Smeasurable functions are "approximately standarcP' ( in a sense made precise in the next section) . In a recent paper ([3]), the author used this machinery to transform a well known nonstandard plausibility argument for the RadonNikodym theorem into a correct and complete nonstandard proof of the theorem. The problem of finding an "essentially nonstandard" proof for RadonNikodym had been one of long standing. A ltho ugh Luxemburg gave a nonstandard proof as long ago as 1972 ([2] ) , he obtained the result as a consequence of another equally deep theorem in analysis clue to Riesz. ( Beate Zimmer [6] has recently proved vectorvalued extensions of RadonNikoclym starting from the same plausibility argument, though using different standardizing machinery than that in [3] or the present paper. ) In this paper I use an Smeasure argument very like the one in [3J to give an intuitive nonstandard proof of the Riesz result used by Luxemburg. Along the way [ gi ve a new proof for the main technical result from [1] on Smeasures, a uew noustaudard proof for Egoroff's Theorem, and a nonst.
15.2
Loeb measures and Smeasures
I will assume that we work in a nonstandard model in the sense of Robin son , and that this model is as saturated as it needs to be to carry out all constructions; in particular, it is au enlargement. *Department of Mathematics, University of Hawaii, Honolulu, HI 96822. ross@math . hawai i . edu
218
15. More on Smeasures
Suppose X a set and A is an algebra on X. There are two natural algebras on *X : Ao = {*A : A E A} (the algebra of subsets of *X) , and *A. Note that except in the simplest cases, (i) neither Ao nor *A are aalgebras; (ii) Ao is external; and (iii) *A is vastly larger than Ao . These leads to two distinct aalgebras:
standard
1 . As
=
the smallest aalgebra containing Ao
2. AL
=
the smallest aalgebra containing *A
Recall that if IL is a (finitely or countablyadditive) finite measure on (X. A) then *IL maps *A to * [0 , oo ) , and in particular is not normally a measure, un less the range of IL is finite. However, 0*1L takes its values in [0 , oo ) , and so (*X, *A, 0*1L ) is an external, standard, finitelyadditive finite measure space. The following was first noticed by Loeb. It is an immediate consequence of N rsaturation and the Caratheodory Extension Theorem, or can be proved directly; sec [4] for details. Theorem 1 °*1L VE E AL
extends to a aadditive measure ILL on (*X , AL ) . Moreover, ILL(E)
=
=
inf{ILL (A) : E c:;; A E *A} sup{ILL(A) : A c:;; E, A E *A}
( 1 5 . 2. 1 ) ( 1 5 . 2 . 2)
Of course, the measure ILL remains aadditive when restricted to any a algebra contained in AL, in particular As; this corresponds to replacing *A by Ao as the generating algebra. However, it is not so obvious that the approx imation properties ( 1 5 . 2 . 1 ) and ( 15 . 2 . 2) hold with this replacement as well. The next lemma, which asserts that they do, guarantees that a set E c:;; *X has 0" in the sense of Henson and Wattenberg [1] precisely when E c:;; D for some D E As with ILL(D) = 0. For this lemma, and the duration of the paper, it will be convenient to assume that A is a aalgebra.
"Smeasure
Lemma 1 VE E As, �LL (E)
=
=
inf{ �L(A) : E c:;; *A, A E A} sup{IL(A) : *A c:;; E, A E A}
Proof. Let
'B
{ E E As : Vc > 0 ::JA, B E A such that *A c:;; E c:;; *B and IL(B \ A) < c} { E E As : Vc > 0 ::JA, B E A such that *A c:;; E c:;; *B and ILL (*B) < ILL(E) + c and ILL (*A) > ILL (E)  c } .
(1 5.2.3) ( 1 5 . 2 . 4)
219
15.2. Loeb measures and Smeasures
E
E
Evidently *A 'B for every A A. It suffices to show that 'B is a aalgebra, so that As � 'B. If A � E � B then B e � Ec � A c and fLL(B c \ A c ) = fLL (A \ B) , so 'B is closed under complements. It remains to show that 'B is closed under countable unions. Let E = U n En where En 'B ( n N) increases to E. and let c > 0. For some N, ILL( EN ) > fLL(E)  c/2, and for some A A, *A � EN and fLL (*A) > fLL (EN )  c/2; it follows that *A � E and fLL(*A) > fLL(E)  c. For the exterior approximation, there exists Bn A with En � *Bn and fLL (*Bn \ En ) < c2  (n + 1 ) . Put B = U n Bn , then E = U n En � U n *Bn � *B ' and
E
E
E
E
fL(B)
=
fLL ( *B)
Call a function
(
)
((
) )
/L L *B \ U *Bn + ILL U *Bn \ E + fLL(E) n n n 1 < 0 + L c T ( + ) + fLL(E) S fLL(E) + c. n S
f : *X + rn:.
D
approximately standard provided:
1. f is Asmeasurable;
f
2.
the restriction to JR:. ; and
g = f ix
3.
f :::::J *g almost
everywhere (with respect to
of
to X is an Ameasurable function from X
fLL) .
For example, suppose h : X + rn:. i s a bounded Ameasurable function. For r rn:. , 0*h  1 (  oo, = n nE N * h  1 (  oo , + 1/n) , so 0 *h is Asmeasurable. For X, = = 0*h(x) , so h = (0*h) lx. It follows that 0*h is approximately standard. The main result of this section is the theorem of Henson and Wattenberg, that Asmeasurable functions are approximately standard. The proof here differs from theirs. and makes it possible to prove some new results about integrability. Denote by the set of all approximately standard functions. The following enumerates some useful propert ies of
r] x EE h(x) *h(x)
r
all
APS
APS.
Lemma 2
APS is closed under finite linear combmations. 2. If f, g E APS then min{!, g} E APS and max{!, g} E APS. 3. If fn E APS, n E N, and fn mono tonically increases pointwise to f, then f E APS . 1.
220
4.
15. More on Smeasures
Let h : X rn:. be an Ameasurable function, and put E = {x E *X : l *h(x) l < oo}. Then (i) E E As, (ii) fLL(*X \ E) = 0, and (iii) 0(*hxE ) E APS (where XE is the characteristic function of E). +
Proof.
f, g *X rn:. and (af (3f) l x = (af l x f3f l x ) . 2. As in ( 1 ) , note that max{f, g} l x = max{ f l x , g l x } and min{!, g} l x = min{ f l x , g l x }. 3. Let 9n = fn l x , An = { x E *X : *gn (x) ';f:; fn(x)}, and A = U n EN A n. Since fLL( An) = 0 for each fLL( A) = 0. Let g = f i x and note g = upn 9n · Fix a standard c 0, and for E N let En = {x E X : 9n (x) g(x)  c }. Since X = Un EN En, fLL(*En) = fL(En) increases to fL(X) = fLL(*X), so fLL(A U (*X \ U nE N *En )) = 0 . Suppose x E ( *X \ A) UnEN *En . For some m E N, x E *E and fm(x) f(x)  c. It follows that *g (x) < *g (x) + c ::::::; fn (x) + c f(x) + c < fm(x) + 2c ::::::; *gm (x) + 2c *g(x) + 2r::n. Since c was arbitrary, *g (x) ::::::; f(x). 4. Put En = {x E X : l h (x) l < } , so E = Un EN *En E As . X = U n EN En, so fLL(*En) fL(En) increases to fL(X) fLL(*X), and fL L(*X \ E) 0. 1 . Follows immediately from the observation that if E rn:. then + +
a , (3
+
n,
s
>
n
>
n
n,
:S
:S
n
=
n
>
=
=
The proof of (iii) is now like the example preceding the statement of this lemma. D
Every Asmeasurable function is approximately standard. Proof. Note that if A E A then X*A = *xA = 0*XA is in APS by Lemma 2 ( 4) .
Theorem 2
By ( 1 ) and (3) of Lemma 2 and a suitable version of the Monotone Class Theorem (for example, Theorem 3 . 1 4 of [5] ) , every bounded Asmeasurable function is in APS. If 2 is Asmeasurable then = supn max{f, n so is in APS by (2) and (3) of Lemma 2. Any general Asmeasurable can be written = max{!,  max{ so is in APS by (1) and (2) of Lemma 2. D
f 0
f
f
f
},
0} f, 0} For example, suppose E E As ; denote by S(E) the set X n E of standard elements of E . If f is the characteristic function of E then g = f i x is the characteristic function of S(E) . By the theorem, S(E) is Ameasurable and fL(S(E)) = fLL(*S(E)) = f *g dfLL = f f dfLL = IL L( E) . Corollary 1 Let f : *X be approximately standard, and p 0. Put g = f i x . Then f E £P(ttL) if and only if g E £P(tt), in which case J fPdfLL = J gPdfL and l f l v = l g l v · + JR:.
>
221
15.3. Egoroff's Theorem
Proof. Without loss of generality, f 2: 0. Let s 71 : *X + rn:., n E N, be simple functions increasing pointwise to fP. Put t71 = s 71 l x, then t71 is a sequence of simple functions on X increasing pointwise to gP. Moreover, if s71 = L7:=1 CYk XEk then t n = L 7:= 1 CYk XS ( Ek ) , and j s n dfLL = L7:=1 CYk fLL(Ek ) = '5:."7:= 1 CYk fL(S(Ek )) = J t 71 dfL. Let n + oo, and it follows that J fPdfL L and J gPdfL arc either both infinite or arc both finite and equal. D The next result lets us extend the notion of approximate standardness to completionmeasurable functions.
Let f : *X + JR:.; the following are equivalent: (i) f is measurable with respect to the ILLcompletion of As; (ii) f i x is measurable with respect to the fLcompletion of A, and f :::::J * (f i x) off a fLLnullset in As .
Lemma 3
Proof. Recall (without proof) the standard fact that a function is completion measurable if and only if it agrees with a measurable function off a nullset . (i =? ii) There is an Asmeasurable f' and a fLLnullset A E As such that f = f' off A. Let g = f i x and g' = f' l x , then g' is Ameasurable, *g' :::::J f' off a fLLnullset B E As, and {x E X : g(x) # g'(x)} � S(A) . Since fLL(A) = 0, fL(S ( A)) = 0 by the example above, and so g' is measurable with respect to the fLcompletion of A. Note also that *fL(S(A)) = 0, so fLL (A U B U *S(A)) = 0. For x E *X , x rJ. A U B U *S(A) , f(x) = f'(x) :::::J g'(x) = g(x) , proving (ii) . (ii =? i) Let A E As be a fLLnullset with f :::::J * (fix) of!:" A. Let g = fix, g' Ameasurable such that g = g' off a fLnullset B E A, E = {x E *X : l *g'(x) l < oo }, and f' = 0(*g' XE ) · By Lemma 2, f' is in APS. For x rJ. E U *B U A, f'(x) :::::J *g ' (x) = *g(x) :::::J f(x) , so in fact f'(x) = f(x) . Since f' is As measurable, f is completion measurable. D
15.3
Egoroff's Theorem
The most striking application [1] made of the Smeasure construction was a proof of Egoroff's Theorem. Their proof relied on a nonstandard condition shown by Robinson to be equivalent to approximate uniform convergence. Here I use the machinery above to give an alternate proof of the theorem not depending on Robinson"s condition. :
Let (X, A, fL) be a finite meas1Lre space, j, fn X + rn:. measur able, fn + f a. e. Then Vc > 0 3A E A fL(A) > fL( X )  c & fn + f uniformly on A.
Theorem 3
222
15. More on Smeasures
Proof. For n E N put 9n = inf m> n fm and h n = supm > n fm · Note that for almost all x, hn (x)  gn (x) is nonnegative and nondecreasing in n with limn_, = hn(x)  9n(x) = 0. Put ?Jn = 0 *gn , h n = 0 *h n . Let E = {x E *X : limn _,= hn(x)  ?Jn (x) c/c 0}. Note E E As, so fLL(E) = p,(S(E)) = p,(0) = 0. It follows that for some A E A, *A c;;;; E C and p,(A) > p,(X)  c . It remains to show that fn converges to f uniformly on A; equivalently, that hn  9n converges to 0 uniformly on A. Fix 6 > 0. For x E *A there is a k E N such that hk(x)  gk(x) < 6. Note *h k( x )  *g k (x) < 6 as well. Let ¢(x) be the least k such that *h k (x)  *g k (x) < 6. The function ¢ : *A + N is internal and finitevalued, so has a standard upper bound N E N. Then *h k  *g k < 6 on *A for every k 2 N, so h k  9k < 6 on A; this completes the proof. D
15.4
A Theorem of Riesz
Let T be a continuous linear real functional on .C 2 (X, A, fL ); then there is a g such that for every f E .C 2 (X, A, p,), T(f) = J fg dp,.
Theorem 4
g is actually in .C 2 (X, A, p,) . By saturation let A be a *finite algebra with A0 c;;;; A c;;;; *A . There
After the proof we show that the function
Proof. is an internal *partition II of *X which corresponds to A in the sense that the latter is the internal closure of the former under hyperfinite unions. Now, define
i (x)
:=
{
*T( xv ) *p, (p )
0,
'
x E p E II and *p, (p )
>
otherwise.
Let a : II + *X be an internal choice function, that is, For any bounded, measurable f : X + rn:.,
T (f) = L * T (*f xv ) p p p
0;
ap E p for p E II. ( 1 5.4. 1 ) ( 1 5.4.2) ( 1 5.4.3) (1 5.4.4)
223
1 5 .4. A Theorem of Riesz
1p � L 1 *j(x yY (x) *df1 p p = J *f :Y *d/1 =
L *f(avYf (x) *d/1
(1 5.4. 5)
p
( 1 5.4. 6) ( 1 5.4. 7)
The steps from ( 1 5.4. 1 ) to ( 15.4.2) and (15.4.5) to ( 1 5.4.6) follow from boundedness of f and the definition of A. For (15.4.2) to ( 1 5 . 4.3) note that by continuity of T, if p E II and *!1(P ) = 0 then *T( xv ) = 0. Moreover, if *f is replaced in the above by any internal function h : *X + *JR:. with the property that h is constant on each p E II, then all but the first equality in the above still holds, and we obtain *T(h) = J hi d*f1 . In particular, if h = XA for some A E A then *T( x A ) = fA :Y d*f1. The proof now proceeds as follows : (i) Show :Y is finite almost every where (and therefore 1 =0 :Y exists almost everywhere) . (ii) Show that :Y is Sintegrable (and therefore 1 =0 :Y is integrable) . (iii) Put G = IE[riAs] (the conditional expectation of 1 ) . (iv) Let g be the restriction of G to X; by Theorem 2 g is Ameasurable. (v) Show that g works. For (i) , write [:Y < n] = {x E *X : :Y(x) < n } ( n E *N) , and [i < oo] = U n EN [:Y < n] . Suppose (for a contradiction) that f1L ( [:Y < oo] ) < 1  r for some standard r > 0. Then !1 ( [:Y < n ] ) < 1  r for each standard n E N, so !1 ( [:Y < H] ) < 1  r for some infinite H. But then oo > T(1) � *J1:Yd*f1 (by the note above) 2 *J [:Y ? H] :Yd*/1 2 H*!1 ( [:Y 2 H] ) > rH, which is infinite, a contradiction. For (ii) , the reader is referred to [4] for a discussion of Sintegrability. In particular, to show that :Y is Sintegrable, it suffices to show that VH in finite, *J :Y > H] :Yd*/1 � 0. (Indeed, this can be adopted as the definition of I Sintegrability. ) So, fix such an H. Note [:Y > H] E A; it follows that * J :Y > H] :Yd*/1 = I *T(X[:Y > HJ ) · Suppose (for a contradiction) that *T(X[:Y > HJ ) > r > 0, r stan dard. By (i) , *!1 ( [:Y > H] ) � 0. It follows from transfer that for every stan dard n E N there is a Bn E A with T(XBn ) > r and f1(Bn) < 1 / 2 n . Put B n := Um > n Bm . Then 11(B n ) < L m > n 2  m = 2  n + l , but T(XBn ) > r. As n + oo, X;;n + 0 in £ 2 but T(XBn ) > :;., contradicting continuity of T. It follows from Sintegrability of i that 1 =0 :Y is integrable, so its con ditional expectation with respect to As, IE[riAs] , exists; put G = IE[riAs] . (Conditional expectation is defined in the next section, where a simple non standard proof for its existence is given. The reader is referred to chapter 9 of [5] for properties of the conditional expectation.)
224
1 5 . More on Smeasures
Finally, let g G l x , which ( as noted above ) is Ameasurable by Theorem 2. It remains to show that g satisfies the conclusion of Theorem 4. Let f E J:} (X, A, p,) , and suppose first that f is bounded. This ensures that (*!)1 is Sintegrable, and (0*j)r E J:} (*X, AL, p,L) . Then =
J *FI d *p, � J (0*!) { dp,L = j IE W *f)r i A s ] dp,L j C *f) IE [riAs] dp,L = J (0*j)G dp,L = J fg dp,
T(j) �
=
( 1 5.4.8) ( 1 5.4.9) ( 15.4. 1 0) ( 15.4. 1 1 ) (15.4. 12) (15.4. 1 3)
The step from ( 15.4.8 ) to ( 15.4.9 ) is by Sintegrability, ( 1 5.4. 10 ) to ( 15.4. 1 1 ) is a standard property of conditional expectation ( using the fact that 0*j is Asmeasurable ) , and (15.4.12 ) to ( 1 5.4. 13 ) is Corollary 1 with p = 1 . For unbounded f E J:} (X, A, p, ) and n E N, let fn = max { n, min { !, n } } . By the result above , T(jn ) = J fn g dp,. By continuity of T and Lebesgue's Dominated Convergence Theorem ([ 5 ] , Theorem 5.9 ) , T(j) = J fg dp,. D The Riesz Theorem is usually stated in the following nominally stronger form.
Let T be a continuous linear real functional on .C 2 (X, A, p,) ; then there is a g E .C 2 (X, A, p,) such that for every f E .C 2 (X, A, p,), T(j) = J fg dp,.
Corollary 2
Proof. It suffices to show that any g satisfying the conclusion of Theorem 4 is already in .C 2 (X, A, p,). Since T is continuous, it is bounded, so there is a constant C such that for any f E .C 2 ( X, A, p,) , T(j) � CIIJII 2 · Put 9n = min { g, n } . Then ll9n ll� = Jg� dp, S fgn g dp, = T(gn ) S Cll9n l l 2 · Divide both sides of this inequality by ll9n ll 2 and square, obtain J g� dp, ll9nll � S C 2 . The result now follows by Fatou's Lemma ([5] , Lemma 5.4) . D
15.4. A Theorem of Riesz 1 5 .4 . 1
225
Conditional expectation
Theorem 5 Suppose (X, A, fL) zs a probabdzty measure, that 'B C:: A is another aalgebra, and that j E ,.C} (X, A, tt) . There is a 'B measurable function g : X ___, ffi. such that for any B E 'B, f dfL 9 dfL.
JB
=
JB
The function 9 is called the conditwnal expectation of f on 'B, and denoted by IE [f i'B] . To avoid circularity, it is necessary to show that IE [f i 'B] exists without use of the Riesz Theorem (or related results, such as the RadonNikodym Theorem) . Such a proof appears in [3] . This section presents a modification of that proof which is even more elementary, in that it does not require the Hahn decomposition. Without loss of generality f 2 0. Put 9 {9 E ..CJ (X, 'B, fL) : VB E 'B , 9 dfL <:.:: f dfL }. Note if 9 1 , 92 E 9 then max{9 1 , 92 , 0} E 9 . Let r = sup{fx gdfL : g E 9}. and for n E N let 9n E 9 with fx gn dfL > r  l/2 n ; we may assume that 0 <:.:: 9 1 <:.:: 92 <:.:: · · · . Put 9 supn 9n · Note that if B E 'B , 9 dfL = supn 9n dfL <:.:: f dfL, so 9 E 9. Moreover, fx 9 dfL = r . It remains to show that for any B E 'B. f dtt = 9 dfL. Suppose not; then there is a B E 'B and c > 0 with f dfL  9 dfL = c. For some 6 > 0, c 1 = JB (j  g)dfL > 0 where g = 9 + 6 · g almost witnesses a contradiction; the perturbation by 6 needs to be localized to a slightly smaller set. As in the proof of Theorem 4 let 23 be a * finite algebra with {*B : B E 'B } C:: 23 C:: *'B, and let II be the internal hyperfinite *partition of *X corresponding to 23 . Let B + = U{b E II : * (!  g )d *fL > 0}. Put s = o * (!  g )d *fL . Note that for every C E 'B , if C C:: B then Jc(f  g) dfL = * (!  g )d*fL <:.:: s : in particular. c1 <:.:: s. For n E N consider the statement,
JB
JB
JB
=
JB
JB
JB
=
JB
JB
JB+ f.c
Jb
As this holds in the nonstandard model (with B + transfer in the standard model. For m E N put Bm B00 n m B m . Note B= E 'B and B = C:: B, so g)dfL > s  2 n , the other hand. since always Ln >m 2  n = s  2 m , therefore . (j  g)dfL 2 s . Put 9 1 = 9 + If C E 'B, =
6XB=· _
for Bn) , it holds by U n >m Bn , and put (j  g)dfL <:.:: s. On
=
JB=
JBJJ JB
JB m (j  g)dfL > s
CX)
Jc\ Bx u
r u 9') dfL = r
Jc
JB XB
_
9) dfL +
.fCnB=u  g) dfL .
15. More on Smeasures
226
The first term in this sum is nonnegative since g E 9 . The second is nonnegative since otherwise g')dfL 2 0, so g' E fj )dfL > s. It follows that 9 . Since s > 0, !L(BcxJ) > 0, so g' dfL g dfL + 6fL(B00) > r. a contradiction.
JBx\C ( j 
J
=
J
Jc ( J 
References [1] C . WARD HENSON and FRANK WATTENBERG , "Egoroff ' s theorem and the distribution of standard points in a nonstandard model", Proc. Amer. Math. Soc., 81 ( 1981) 455461 . [2] W . A . J . LUXEMBURG, O n some concurrent binary relations occurring in analysis, in Contributions to nonstandard analysis (Sympos., Oberwol fach, 1970), Studies in Logic and Found. Math. , Vol. 69, NorthHolland, Amsterdam. 1972. [3] DAVID A. Ross, Nonstandard measure constructions  solutions and problems, in Nonstandard methods and applications in mathematics, Lec ture Notes in Logic, 25, A.K. Peters, 2006. [4] DAVID A . Ross. Loeb measure and probability, in Nonstandard analysis (Edmburgh, 1996), NATO Adv. Sci. Inst. Ser. C. Math. Phys. Sci., vol. 493, Kluwer Acad. Publ., Dordrecht, 1997. [5] DAVID WILLIAMS, Probability with martingales, Cambridge Mathematical Textbooks, Cambridge University Press, Cambridge, 1991. [6] B EATE ZIMMER. "A unifying RadonNikodym theorem through nonstan dard hulls", Illinois .Journal of Mathematics, 49 (2005) 873883.
A RadonNikodym theorem for a vectorvalued reference measure G. Beate Zimmer*
The conclusion of be represented
a
Abstract
RadonNikodym theorem iH that
as �m
that for all measurable sets Lebesgue)
a
measure f1. can
integral with respect to a reference measure
A, p(A)
such
JA .f1,. (x) d).. with a (Bochner or
=
integrable derivative or density .fw The mcasme ).. is usually a
countably additive <Jfinit.e measure on the given measure space and the
mcasmc JL is absolutely continuous with respect to ).. . Different theorems
have different range spaces for p., which could be the real numbers, or Banach s paces with or without the RadouNikodym property.
Iu this
paper we ge1ieralize to derivatives of vector valued measures with re spect a vectorvalued reference measure. theorem for vector
measures
Vvc present a R.adonNikodym
of bounded variation that arc absolutely
continuous with respect to another vector measme of bounded variation. W'hile it is easy in settings sud1 on the interval
[0, 1]
as
Jl < <
,\ , where ,\ is Lebesgue measure nonstan
and Jl· is vectorvalued to write down a.
clard RadouNikodym deri vative of
Lhe form t.p :
*[0, 1]
>
fiu ( *E) by
t.p1, (:1:) 2:!1 :���:\ lA, (.1; ) , a vector valued reference measure does uot. =
allow this approach.
as
the quotient of
two vectors
in different B anach
spaces is undefined. Furthermore, generalizing to a vector valued control mcmmre uecessita(;es the usc of a generalization of the Bartle
bilinem· vector integral.
16 . 1
integral , a
Introduction and notation
For noustandard notions and notatious not defined here we refer for exam
ple to the book by Albeverio, Fenstad, 1IoeghKrohn and Lindstrom
t.he survey on nonstandard hulls by Henson and 1\tloore *Department of Tvfathematics and Statistics, Texas Corpus
Christi, TX 781112.
[1]
and
[6] or the introduction
A&M Universit.y 
Corpus Christi,
16. A RadonNikodym theorem
228
to nonstandard analysis in [7] . Plenty of information about vector measures can be found in [5] . Throughout, D is a set and I: is a aalgebra of subsets of D . E, F and G denote Banach spaces. fL : I: + E and v : I: + F are countably additive vector measures of bounded variation. The variation of v is defined as I vI (A) = sup L 1rEII llv(A") II where A E I: and II is a finite partition of A into sets in I: . B y Proposition I . l . 9 i n [5] the variation lvl of a countably additive measure v is also a countably additive measure on I:. We think of the nonstandard model in terms of superstructures V(X) and V(*X) connected by the monomorphism * : V(X) + V(*X) and call an ele ment b E V(*X) internal, if it is an element of a standard entity, i.e. if there is an a E V (X) with b E *a. We assume that the nonstandard model is at least Nsaturated, where N is an uncountable cardinal number such that the cardinality of the aalgebra I: of I vimeasurable subsets of D is less than N. Let (*D, Llvi (*I: ) , lvl ) denote the Loeb space constructed from the nonstan dard extension of the measure space (D, I:, I v i ). This measure space is obtained by extending the measure 0*1vl from *I: to the aalgebra generated by *I:. _I he completion of this aalgebra is denoted by Llvi (*I: ) . The Loeb measure l v l is a standard countably additive measure. The functions we work with take their values in a Banach space E or its nonstandard hull E. The nonstandard hull of a Banach space E is defined as E = fin(* E)/ the quotient of the clements of bounded norm by clements of infinitesimal norm. The nonstandard hull is a standard Banach space and contains E as a subspace. We denote the quotient map from fin(*E) onto E by or 'if£ . Whenever we encounter products of Banach spaces, we equip them with the floo norm: l i e J l l = max(ll e l l , I I J I I ) . B y a result of Loeb in [8] , there exists an internal *finite partition of *D consisting of sets A1 , . . . , AH E *I: (H E *N ) such that the partition is finer than the image under * of any finite partition of D into sets in I:. The proof uses a concurrent relation argument. From this fine partition we discard all partition sets of *!v i measure zero. The remaining sets still form an internal collection, which we also denote by A1 , . . . , AH . ::::J ,
1r
16.2
X
The exist ing literature
Ross gives a nice detailed nonstandard proof of the RadonNikodym the orem for two realvalued afinite measures fL < < v on a measurable space in [9] . Previously we have studied nonstandard RadonNikodym derivatives of vector measures tt : I: + E with respect to Lebesgue measure on [0, 1 ] . We have shown that through nonstandard analysis a unifying approach to
16.2. The existing literature
229 or
RadonNikodym derivatives independent of the RadonNikodym property lack thereof of E can be found. The generalized derivatives we constructed are not necessarily essentially separably valued and therefore not always Bochner integrable. However, a generalization of the Bochner integral in [10] for func tions with values in the nonstandard hull of a Banach space allows one to integrate the generalized derivatives. In [ 1 1] with methods similar to those used by Ross [9] or Bliedtner and Loeb in [3] , and with the use of local reflex ivity we "standardize" the generalized derivatives from maps *D + E to maps D + E" with values in the second dual of the original Banach space E. The vector measures book [5] by Diestel and Uhl is still considered as the authoritative work on RadonNikodym theorems. The standard literature yields very little on vectorvector derivatives; only Bogdan [4] together with his student Kritt has made an initial foray into this field. In their article they assume that they have two vector measures fL : I: + E and v : I: + F with fL < < I v 1 . and they assume that the range space of fL has the RadonNikodym property and the range space of v is uniformly convex. They then define a derivative and integral in the form fL(A) = fA ��I (w) t l (w) dv, where tl is a function from D into F', the dual space of the range of v . The integrand is then of the form �� ( w) = e . !' ' a function from n into E X F'. A trilinear product on E x F x F' with values in E can be defined as e · J' · f f' (f) · e. This is needed to integrate simple functions of the form L�=1 e, . !I · 1 A , : n + E X F' with respect to the Fvalued measure v as follows: ·
=
L t e; · JI · 1A, dv = t e; · JI · v(A; ) = t e, · J; (v(A; )) E E . ,=1
,=1
,=1
The main difficulty is the definition of tl , the derivative of a realvalued measure with respect to a vectorvalued measure, the opposite setting from the ordinary RadonNikodym theorems. Bogdan uses his assumption that F is uniformly convex to find this derivative. Since v < < l v l and since uniform convex spaces are reflexive and hence have the RadonNikodym property, there is a Bochner integrable function d��l : D + F such that for all measurable sets A, the vector measure can be written as a Bochner integral v(A) = fA d�� l dl v l . A standard theorem ( e.g. Theorem 11.2.4 in [5]) about vector measures states that the norm of the RadonNikodym derivative is a RadonNikodym derivative for the total variation of the vector measure, i.e. for all A E I: l v i (A)
=
i l d�� � � d l v l .
This implies that I vialmost everywhere on D, II d��1 1 1 1 . For uniformly convex spaces F there is a continuous map g : Sp + SF' , where Sx denotes =
230
16 .
A RadonNikodym theorem
the unit sphere of X, such that ( !, g(f)) = 1 for all f E Sp. In this case, the composition g o d�� l is a map from D into the unit sphere of F'. The derivative of fL with respect to v is defined as �� � · (g o d�� l ) : D + E F', or, if one regards the product e f' as a rank one continuous linear operator from F to E, the derivative can be considered a map: �� : D + L(F, E) . Since d��l and d�� l arc both Bochner integrable and g is continuous and g o d�� l is almost everywhere of norm one, it is easy to show that the result is integrable with respect to the vector measure v with an integral that uses approximation by simple functions. =
x
16.3
x
The nonstandard approach
With nonstandard analysis we can significantly weaken the assumptions on the Banach spaces E and F. We use Bogdan's idea of writing the derivative dlvl _ djV[ dp · ;IV dfl . (g o djV[ dv ) . as djV[ The derivatives ��I and d��l are derivatives of a vector measure with respect to a real valued measure. They can be found and integrated with the methods described in [10] , [11] or [12] . In [10] we defined a Banach space M(lvi, E) of e�ended integrable func tions as the set of equivalence classes under equality lvlalmost everywhere of functions f : *D + E for which there� an internal, *simple, Sintegrable f l vlalmost everywhere on *D. Such a cp : *D + *E such that 71£ o cp cp is called a lifting of f. On M ( lvl, E) we defined an integral by setting JA f d!vl 71 ( fA cp d * lvl ) , for all A E *I:, where the integral of the internal simple function cp is defined in the obvious way. This integral can be extended to sets in the Loeb aalgebra and generalizes the Bochner integral in the sense that M ( lvl, E) contains L1 ( lvl, E) and the integrals agree on that subspace. However, .iH(Ivi, E) also contains functions which fail to be essentially separa bly valued and hence fail to be measurable. Lemma 1 in [11] translates to this situation as: =
=
Let E be any Banach space and let fL : I: + E be a ! v i absolutely continuous countably additive vector measure of bounded variation. Then the internal *simple function cp 11 : *D + *E defined by
Lemma 16.3 . 1
is S integrable, where A1 , . . . , AH is the fine partition of *D introduced above.
16.3. The nonstandard approach
231
Sintegrability implies that the function is l v l almost everywhere fin(*E) valucd. This allows us compose an Sintcgrable internal function with the quotient map 71E : fin(*E) + E to make it a nonstandard hull valued function defined on the Loeb space ( *D , L i v i (*L: ) , !vi) . Define f11 E M(lvi, E) by
f!l = 71£ o CfJw
Theorem 4 in [11] asserts that if the vector measure fL has a Bochner integrable RadonNikodym derivative, then the generalized derivative "is" the Radon Nikodym derivative in the following sense:
Let fL : I: + E be a countably additive vector measure with a Bochner integrable RadonNikodym derivative f : D + E. Define the gener alized RadonNikodym derivative f11 : *D + E as above. Then 71£ o *f = f11 on each set Ai in the fine partition of *D.
Theorem 16.3.2
Of course, the same arguments also work for finding a Fvalued generalized derivative fv E lVI(Ivi, F ) of the vector measure v : I: + F with respect to its total variation I v i . Differentiating v with respect to its total variation gives one extra feature of the derivative:
Let F be a Banach space and let v : I: + F be a countably additive vector measure of bounded variation. Define
Lemma 16.3.3
Then fv = 71F o C{Jv is I vi almost everywhere of norm one. Since v << lvl , the Sintegrability of fv follows from Lemma 16.3.1 . The norm condition follows from an adaptation of a basic standard result (Theorem 11.2.4 in [5] ) : if f is Bochner integrable and a vector measure F is defined by F(A) = fA f dlvl, then I F I (A) = fA ll f l l dl v l . By construction of the generalized derivative, Proof.
for all A E *I:. In this situation it implies that * l v i (A) = * d* l vl and hence * II CfJv (w) ll
=
l I d��� �
1 for lvlalmost every w E *D.
D
232
16.4
16. A RadonNikodym theorem
A nonstandard vectorvector integral
The next theorem allows us to generalize the generalized Bochner integral to a vectorvector integral which for any simple function obviously agrees with the Bartle integral developed in [2] . For the definition of the Bartle integral, E, F and G are Banach spaces equipped with a continuous bilinear multipli cation E x F + G and v : I: + F is a countably additive vector measure of bounded variation. Such a product can arise by taking E F G to be a C*algebra, or by taking E = F' and G = rn:., or by taking E = L ( F, G ) . A function f : D + E is Bartle integrable if there is a sequence of simple functions sn that converges to f almost everywhere and for which the se quence An ( A ) fA S n dv converges in the norm of G for all A E I:. Then fA f dv = limn_, = >n (A ) exists in norm uniformly for all A E I:. Essentially bounded measurable functions are Bartleintegrable. The Bartle integral is a countably additive set function. =
=
=
Let E, F and G be Banach spaces equipped with a continu ous bilinear multiplication E x F + G. Assume that v : I: + F is a countably additive vector measure of bmmded variation and f E Af (!vi, E) . The integrand f has an internal lifting i.p J and there is an internal *simple lifting i.pv of the generalized derivative of v with respect to I v i . Then for all A E *I:, the integral fA f dv defined by
Theorem 16.4 . 1
l f dv
= no
(l
i.pJ · i.pv
d* l v l
)
is a welldefined countably additive G valued integral. The function f E 1\!I (!vi, E) has a internal, Sintegrable and *simple lifting i.pJ : *D + fin ( *E ) . By Lemma 16.3.3 i.pv : *D + fin ( *F ) is internal, *simple, Sintegrable and of norm one almost everywhere. Continuity of the multiplication E x F + G implies that the product i.p j ' i.pv is an Sintegrable and internal *simple function from *D + fin ( *G) . This means that it is a lifting of an extended Bochner integrable fun�on in M(!vi, G ) and hence the Gvalued generalized integral with respect to l v l applies and gives a welldefined vector vector integral. Since the extended Bochner integral is countably additive, so is this integral. D Proof.
The use of nonstandard analysis simplifies standard arguments here, since in the standard setting this argument would only work if the Banach space F had the RadonNikodym property.
16.5. Uniform convexity
16.5
233
Uniform convexity
Bogdan needed the assumption that F is uniformly convex to convert the Fvalued derivative d�� l into the F'valued derivative tl . He composed the derivative d�� l with a continuous map g from the unit sphere of F to the unit sphere of its dual space F' such that for all f E Sp, (!, g(j)) 1 . The resulting map g o d�� l : D + F' is lvl measurable and inherits the integrability of d�� � · If the Banach space F happens to be uniformly convex, we compose the nonstandard extension of g with VJv. In this case, (!, g(j) ) 1 for all . f E SF t ransl ates . t o /\ **l vv i((AA,, )) , *g *l*vv (i (AA,,))  1 for .  1 , . . . , H 1.e. =
(
m
))
can

z 
=
Let E be any Banach space and let F be a uniformly convex vector space. If the vector measure fL : I: + E is absolutely continuous with respect to the total variation of the vector measure v : I: + F, then there is an extended Bochner integrable function f : *D + ExF' S1Lch that for all A E I:
Theorem 16. 5 . 1
fL(A) =
J*A f dv.
Define VJ J Yp · (*g o VJ v ) , where Yp is a lifting of the generalized derivative d��l , tpv is a lifting of the generalized derivative d�� l and g is the continuous map from Sp to Sp . This lifting is internal, Sintegrable, *simple and takes its values in fin(*E) *Sp . Define f 71ExF' o :PJ · Then
Proof.
=
fn f d v
x
=
71£ 7IE rrc
(!n tp (w)
=
) (�� *lvi*fL(Ai)(Ai) . *g ( *lv*v(Ai)i (Ai) ) *v(Ai) ) (f ,;��(�;) 'l vi (A;)) p
· *g ( VJv ( w ))
d *v
·
71£ ( *fL(*D))
fL(D) ,
and similarly for fL(A) , where (Ai) is replaced by (A; n *A) throughout. Since the fine partition refines the standard partition of D into A and its complement, Ai n *A is either the empty set or A,. D
16. A RadonNikodym theorem
234
If E has the RadonNikodym property, then this lifting coincides with the image of the derivative found by Bogdan in the nonstandard hull of the space of vectorvector integrable functions on D.
16.6
Vectorvector derivatives without uniform convexity
We can weaken the assumption on F: it follows from the HahnBanach theorem that for each nonzero f E F there is an f' E F' with 1 1 .!' 1 1 1 and f'(f) = llfll · Without uniform convexity, the choice of f' is not necessarily unique, and the map f f+ f' need not be continuous on the unit sphere of F. We can forego the use of a function defined on all of Sp. All we need is a derivative on each set A, in the fine partition of *D. =
g
Let F be any Banach space and let v : I: + F be a countably additive vector measure of bounded variation. Then there is a function 9i vl E M(lvi, P) such that Theorem 16.6. 1
1A 9ivl dv = o(*lv i ( *A)) = lvi (A)
for all A E I: .
Proof. As the collection of sets A, is internal, we can use the axiom of choice to get an internal collection of norm one functionals E *Sp with ( *v(A; )) = * lv i (A; ) for i = 1 , . . . , H.
g;
g;
Define an internal *simple Sintegrable function .
1/Ji vl
=
H
L i=l
:
1/J i vl *D + *Sp by
g; . A
1 ,·
This function is a lifting of an extended integrable function 9i vl E M(lvi, P) . We can define a continuous bilinear multiplication F' x F + JR; by f' · f = f' (f) and then use the integral from theorem 16.4. 1 . In this case the quotient map no becomes the standard part map o : fin (*JR;) + R By the choice of the fine partition. A; n *A is either the empty set or equals Ai. d*v dv = o
lA 9ivl
(lA 1/l i vl ) 0(1A t, g; ·1A, d*v)
16.6. Vectorvector derivatives without uniform convexity
(� o(� o
gi · *v(Ai n *A) *lvi (Ai n *A)
o ( * l v i ( *A ))
)
235
) =
for any set A E I:. This finishes the proof, as for all A E I:, lvi (A)
0 ( * 1 v i ( *A )) . D
The last theorem constructed a derivative for any countably additive vector measure of bounded variation with respect to its total variation as an element of M ( lvl, P) . This was the main difficulty in differentiation with respect to a vector measure. Now we can differentiate one vector measure with respect to another vector measure. :
Theorem 16.6.2 Let E and F be any Banach spaces, let fL I: + E and v I: + F be a countably additive vector measures of bounded variation such that fL < < lvl . Then there is a function h E M(lvi, ExF' ) such that for all A E I: fL ( A ) h dv. :
=
j
*A
x
x
The continuous trilinear multiplication E F' F + E defined by f' (f) · e extends to a continuous bilinear multiplication ExF' F + E defined by � f') n(f) 7rE(J'(f) · e). The extended integrable function h *D + E F' is defined through its lifting cp cp11 · 1P1v1 , the product of the lifting cp11 of d��l E M(lvi, E) and the lifting 1/J i v l of t l E M(lvi, P) found in theorem 16.6. 1 . Setting Proof. x
x
e f' f
= :
x
x
=
x
=
*
x
we obtain a simple internal function with values in fin ( *E ) *SF' . The Sintegrability of cp follows from the Sintegrability of cp11 and Lemma 16.3.3 which asserts that *111/l i v l ll :::::J 1 almost everywhere. Then for all A E I: 1rE ( *fL ( *A )) fL (A )
"" (f (�� 1rE
M ( A; n 'A )
'
)
*fJ (Ai n *A ) * l v i ( A n *A) * l vi (Ai n *A ) ·
2
)
16. A RadonNikodym theorem
236
:
and h *D . df.lv t 1ve d .
16.7
+
ExF' is an extended integrable generalized vectorvector derivaD
Remarks
All the constructions above depend on the choice of a fine partition A 1 , . . . , AH of *D that refines all finite standard partitions of D into measurable sets. A different choice of the partition would yield different derivatives, especially in the case of nonuniformly convex Banach space F. However, for the standard analyst there is no noticeable difference between different choices of fine partitions, as the integrals over any standard set will be the same, no matter which fine partition is used. Representing a vector measure v I: F as an integral of its generalized derivative d1� l simplifies some of the results on Loeb completions of internal measures [13] by Zivaljevic. a
:
+
References [1] S. ALBEVERIO, J . E . FENSTAD, R. H 0EGHKROHN and T. LINDSTR0M,
Nonstandard methods in stochastic analysis and mathematical physics,
Academic Press, Orlando, F L, 1986.
[2] R. G . BARTLE, "A general bilinear vector integral", Studia Math. , 15 ( 1956) 337352. [3]
.J . BLIEDTNER
and P . A . LOEB, "The Optimal Differentiation Basis and Liftings of L00", Trans. Amer. Math. Soc., 352 (2000) 46934710.
[4] W . M . BOGDANOVICZ and B. KRITT, "RadonNikodym Differentiation of one VectorValued Volume with Respect to Another", Bulletin De L' Academic Polonaise Des Sciences, Scric des sciences math. astr. ct phys. , XV ( 1967) 479486.
References
237
[5] J. DIESTEL and J . J . U HL , Vector Measures, Mathematical Surveys 1 5, American Mathematical Society, Providence, RI, 1977. [6] C .W . HENSON and L. C . MOORE, Nonstandard Analysis and the the ory of Banach spaces, in A.E. Hurd (ed.) Nonstandard AnalysisRecent Developments, Springer Lecture Notes in Mathematics 983, 1983. [7] A . E . HURD and P .A. LOEB, An Introduction to Nonstandard Real Anal ysis, Academic Press, Orlando, FL, 1985. [8] P . A . LOEB , "Conversion from nonstandard to standard measure spaces and applications to probability theory", Trans. Amer. Math. Soc., 2 1 1 (1 975) 1 13 122. [9] D. A. Ross, Nonstandard Measure Constructions  Solutions and Prob lems submitted to NS2002, Nonstandard Methods and Applications in Mathematics, June 1 016 2002, Pisa, Italy. [10] G . B . ZI M MER , "An extension of the Bochner integral generalizing the LoebOsswald integral", Proc. Cambr. Phil. Soc., 123 ( 1998) 1 1 9 131. [ 1 1] G . B. Zrl\IMER, "A unifying RadonNikodym Theorem through Nonstan dard Hulls", Illinois Journal of Mathematics, 49, (2005) 873883. [12] G . B. Zil\IMER, Nonstandard Vector Integrals and Vector Measures, Ph.D . Thesis, University of Illinois at UrbanaChampaign, October 1994. [13]
R.
T. Z IVALJEVIC, "Loeb completion of internal vectorvalued measures", Math. Scand. , 56 (1985) 276286.
Differentiability of Loeb measures Eva Aigner*
Abstract We introduce a general definition of Sdiffcrcntiability of m1 internal measure and compare different special cases. It will be shown how Sdifferentiability of an illtcrnal measure yielc Is differeutiability of the associated Loeb measure. vVc give some exmnplcs.
17. 1
Introduction
In this pa per we present some new results about differential properties of Loeb measures. The theory of differentiable measures was sugges ted by Fomin [9] as aJl infinite dimensional substitute for the SobolevSchwartz theory of distributions and has extended rapidly to a strong field of research . In partkular during the last ten years it has become the foundation for many applications in different fields such as quantum field t heory ( sec e.g. [10] or [14]) or stochastic analysis ( see e.g. [4] , [5] , [6] and [151 ) . There are many different. notions of measure differentiability. The following two are most, common (see e.g. [3[ , [6[ , [9[ , [13[ and [151). Let v be a Borel measure on a locally convex space E and y an element of E. l) The measure 11 is called Fom·indifferentiable along y if for all Borel sub sets B C E the limit lim
r�O, rElR
11(B + ·ry)  z; ( B) T
ex i sts . v is called Skorohoddifferentiable with respect to y, if there exists another Borel measure v1 on E, such that for all continuous real
2) The measure •
LudwigMaximilians Universitiit , Miincheu , Germany.
17.1. Introduction
239
valued bounded functions g on E limr
r + 0.
J g(x  ry) dv( x)  J g(x ) dv( x) ER r
=
J g(x ) dv'(x ) .
The relationships between these two approaches were studied by Averbukh, Smolyanov and Fomin [3] and Bogachev [6] . A very general definition of differentiability of a curve of measures is given by Weizsacker in [16] . In this paper we define measure differentiability as follows.
Let (D, :F, v) be a measure space, (vr)  c
=
=
J
Note that Definition 1 7. 1 . 1 covers the cases mentioned above, since for a locally convex space D and a fixed vector y E D a curve (vr )  c
=
:
1 7. Differentiability of Loeb measures
240
17.2
Sdifferent iability of internal measures
Standing Assumption
Throughout this paper let D be an internal set, A an internal *t'Yfield on D and fL :2: 0 an internal *t'Yadditive Sbounded measure on A. Let ( !Lt)tEJ be an internal curve of nonnegative *t'Yadditive Sbounded measures on A. Since we want to obtain an external curve of Loeb measures, we assume that for some c E rn:. + the internal parameter set J is either an interval of *JR containing the standard interval I =]  c; c [ or J is a discrete interval { Jl , r;l , . . . , ki/ , fl } with H E * N \ N, k E { 1 , . . . , H } and c :'S: .fl . Moreover, the internal curve ( !Lt)tEJ shall be Scontinuous in the following sense. If t, s E J with t :::::J s, then fLt( A) :::::J fLs (A) for each A E A. Finally we assume that fL = fLO · We now introduce Sdifferentiability for internal measures.
Suppose we have a (not necessarily internal) set C of in ternal *JR:.valued functions on D, each being Ameasurable and Sbounded. We say that the internal measure fL is S differentiable with respect to the set C if there exists an internal S bounded signed measure fL1 on A so that for all f E C and for all infinitesimals t E J, t # 0 J f(w) dfLt (w)  J f(w) dfL(w) :::::J f(w) dfL (w ) .
Definition 17.2.1
We call fL1 an (internal) derivative of fL ·
J
'
Note that a derivative is not uniquely determined by the above definition. If fL is Sdifferentiable with respect to a set C and if for some infinitesimal t E J the internal measure 11 ' ;11 has limited values, then 11 ' ;11 is a derivative of fL· In this section we will regard and compare Sdifferentiability for differ ent sets C of functions. Following the standard literature we define SFomin differentiability.
The internal measure fL is called S Fomindifferentiable if the differentiability is with respect to C = {1 A A E A} where 1 A is the indicator functwn. Note that in the case of SFomindifferentiability each measure 11 ' ;11 with t :::::J 0, t E J \ {0}, is a derivative of fL · The following proposition shows the
Definition 17.2.2
:
power of SFomindifferentiability.
If fL is SFomindifferentiable and fL1 is a derivative of fL, then fL is Sdifferentiable wzth respect to the set C of all Sbounded *JR:. valued Ameasurable functions. The Fominderivative 1/ is also a derivative with respect to C . Proposition 17.2.1
17.2. Sdifferentiability of internal measures
241
Let fL be SFomindifferentiable and let fL1 be a derivative of fL · Let t # 0 be an infinitesimal of J and set [L := 11 ' ;'" . Since fL1 :::::J [L on A we obtain J f(w)dfL'(w) :::::J J f(w)d[L(w) for all Sbounded Ameasurable functions f : D + *R D Proof.
When considering measures with Lebesgue densities, then the following lemma is useful. =
Let D *JR:., A the internal field of Borel subsets, y a fixed element of *JR:., fL an internal S bounded measure on A. Take J = *JR:. and for all t E J define fLt (A) = fL(A + ty) for all A E A. Assume that fL has an internal Lebesgue density f satisfying the following three conditions: 1) f 2 0, f = 0 only on a set of internal Lebesgue measure 0. 2) f is * differentiable in the direction ofy (with derivative f� (x) := f'(x) · y) . 3) If t 0, then for all x E *JR:. with f (x) # 0 f� (x) � f(x + ty) _ 1 . t f(x) f(x)
Lemma 1 7. 2 . 1
:::::J
(
)
:::::J
Now if the internal f1mction (3� : *JR:. + *JR:., defined by f� (x) . (3� (x) = f(x) if f(x) # 0 0 if f(x) 0,
{
=
is 511 integrable, then fL is SFomindifferentiable and if fL1 is a derivative, then for all A E A fL1 (A) (3� (x) dfL(x) . :::::J
l
Since (3� is 511 integrable, A f+ JA (3� (x)dfL(x) defines an Sbounded measure. Now let N = {x E *JR:. : f(x) = 0}. Then >(N) = 0, where A is the internal Lebesgue measure. If t 0, t # 0, then
Proof.
:::::J
I
!Lt (A) � fL(A)
 l (3� (x)dfL(x) l l l f(x + ty)  f(x)  f3J,(x) . f(x) l d>(x) r l � ( f(x + ty) _ 1 )  f� (x) l dfL(x) f(x) f(x) JA \ N t �
= :::::J
0.
Here we have used condition 3 ) and the fact that fL is Sbounded.
D
1 7. Differentiability of Loeb measures
242
Example 1 7. 2 . 1 Let D = *JR:., A be the internal field of Borel subsets and fL defined by fL(A) fA l;x2 d>.. ( x) . Fix an element y E *JR:. and define the curve by fLt (A) .JA + ty l ;x2 d>.. ( x) , t E *JR:.. If y is limited. then it ' s easy to see that the assumptions of Lemma 17.2.1 are satisfied. Hence the measure fL is SFomindifferentiable and if fL1 is a derivative, then fL'(A) :::::: fA ���¥ dfL(x) for each A E A. If y is an unlimited element of *JR:., then fL is not SFomin differentiable, because for t = � and for the internal interval A = * [ 0 , 1] C *JR:. the value flt ( A);p ( A) is unlimited. =
=
Note that Lemma 17.2.1 is also true for D *JR:.L , L E *N. In the author's thesis [1] this is used to show the SFomindifferentiability of a nonstandard representation of the Wiener measure introduced by Cutland and Ng in [8] . We now turn to other forms of Sdifferentiability. Again. following the standard literature, we define SSkorohoddifferentiability. =
Remark
Let D be a subset of *M where NJ is a metric space and let A be an internal *afield on D. The measure fL is called S Skorohod differentiable if it is S differentiable with respect to the set of all S bmmded, Ameasurable functions f : D *JR:. that are Scontinuous. This means, f ( w) :::::: f ( w) for w , w E D, w :::::: w . Definition 1 7.2.3
+
I f D is as described in Definition 17.2.3, then, as a consequence of Proposi tion 17.2. 1 , SFomindifferentiability implies SSkorohoddifferentiability. The following example shows that the converse is not true. =
Fix a natural number H E *N \ N and let D c *JR:., D * Z } . The measure fL is the counting measure defined on the field of internal subsets of n, i.e.
Example 1 7.2.2
{ 1£
·
z : z E
fL(A)
=
lA
n
[o H  1 ] I �H .
Here [0, Hif 1 ] {0, 1£ , . . . , Hii l } and I A I is the internal number of elements. Now let J c D, J = [ Jr , Jr] with l E *N, Jr :::::: � and let (fLt)tE J be defined by fLt (A) fL(A + t). Note that for any k E *N with fl E J we obtain: =
=
and
17.3. Differentiability of Loeb measures Now choose k E *N with fl
E
J and fl 0 and
::::::
243
0 and set A = [0, k}/ ] . Then fL k  fL Hk H
(A)
1.
Hence fL is not SFomindifferentiable. It is not hard to check that fL is S Skorohoddifferentiable with internal derivatives 1"1;'1 for all infinitesimals t E J \ {0} . We will give a standard application of this example in Example 17.3.1 . Finally we consider Sdifferentiability with respect to *continuous func tions. We will see that this is equivalent to SFomindifferentiability. The following proposition can be easily shown by transfer of Urysohn ' s Lemma.
Let D be an internal *normal space, A the internal field of Borel subsets. If ( !Lt ) t EJ is a curve of *regular measures, then fL is S differentiable with respect to the set C of all internal * continuous S bounded functions if and only if fL is S Fomindifferentiable. Proposition 1 7.2.2
17.3
Differentiability of Loeb measures
In this section we show how Sdifferentiability of an internal measure yields differentiability (in the sense of Definition 17.1 .1) of the corresponding Loeb measure. Recall the Standing Assumption on page 240. Now we can define a curve of Loeb measures in a unique way. Let c E rn:.+ as described in the Standing Assumption (page 240) , I =]  r:: , r:: [ . For each r E I choose t E J such that t :::::: r and set fLr := fLt· Let us denote the associated Loeb spaces by (D, L 11r (A) , (fLr )L) · Since the Loeb afields L 11r (A) are not necessarily identical we choose a joint afield :F c n rE I Lllr (A) . We now define the curve ((fLL)r)rE I of measures on :F by (fLL)r := (fLr)L restricted to :F. Let us mention some obvious connections between Sdifferentiability and differentiability.
Let fL be S differentiable with respect to a set C of internal *JR:. valued, Ameasurable and S bounded functions on D and let 1L' be an internal derivative of fL · 1) Then for all f E C
Lemma 1 7. 3 . 1
rlim > 0
J_ ) d_ fLr_(w_)')_ f (_ 0 (=J_f_ _ w_ ( w_)d_fL_ ( w )) 0 '(cc_ r
______
The convergence is uniform, if C is internal.
1 7. Differentiability of Loeb measures
244
2) Suppose C
:=
{g : D + ffi. : there is an f E C with 0 ( f (w ) )
and
:F.,, :=
=
g ( w ) for all w E D}
(nr L.,r (A) ) n £.,, (A) . EI
Then the Loeb measure ILL , restricted to :F.,, , is differentiable with respect to the set C and the Loeb extension ( tl ) L of o ( tl ) , restricted to :F1L' , is a derivative (ILL ) ' of ILL . This means that for all g E C : J g (w)d(ILL)r (w)  J g(w )d�LL (w ) g (W )d( 1L' ) L (W ) . l. r1m >0 r =
J
Note that, if IL' and 1 ' are two internal derivatives of IL, then J g(w)d(IL')L (w) = J g( w )d(r')L for all g E C, but the Loeb measures ( �L ')L and (r')L on A and hence also the afields £.,, (A) and £1, (A) may be different. In Example 17.2.2 we defined an internal counting measure IL· Since IL is SSkorohoddifferentiable with internal derivatives T, t E J \ { 0 } , t � 0, we can17apply Lemma 17 17.3 . 1 . But as we have seen in Example 17.2.2, for k E *N with E J and � 0 and A = [0, k}/]
Example 1 7. 3 . 1
( IL Hk 17 IL ) (A) L
=
IL =!5:.  IL 0 and ( HI'l ) (A) L
=
1.
Nevertheless, there exists a afield, on which the Loeb measures ("1 �" ) L coin cide for all infinitesimals t E J \ { 0 } . Let B(ffi.) be the ( standard ) afield of all Borel subsets of R For B E B(ffi.) let sC 1 [B] = {w E D : 0W E B} . According to the usual approach to standard Lebesgue measure by a nonstandard count ing measure ( see [ 7] or [ 12] ) , the external set st  1 [B] is an element of L.,r (A) for all r E I and (ILr ) L ( s C 1 [B ] ) = vr ( B ) , where ur ( B ) = JB +r 1 [o . l j (x ) d>. (x ) (A) with standard Lebesgue measure >.. But st  1 [B ] is also an element of L for all infinitesimals t E J \ { 0 } and PrP t
( ILt � IL )
L
( sC1 [B ] ) = 1B( 1)  1 B (O) .
Hence the Loeb measures ( "1�" h , restricted to the afield {sC1 [B ] : B E B(ffi.)}, coincide for all infinitesimals t E J \ { 0 } . If we define a measure v' on B(ffi.) by v'(B ) = 1 B ( 1)  1B(O), then the Sdifferentiability of the internal counting measure tt yields the Skorohoddifferentiability ( with respect to 1 E ffi.) of the standard measure v with derivative v'.
17.3. Differentiability of Loeb measures
245
We will now see that in the case of SFomindifferentiability the afield :F doesn ' t depend on the chosen internal derivative and the derivative (ILL)' of ILL is uniquely determined. 1\Ioreover, the differentiability of ILL is true not only with respect to standard parts of internal functions, but also with respect to all :Fmeasurable Sbounded realvalued functions on D. We gather these facts together into the following, which is the main theorem of this paper.
Let 1L be SFomindifferentiable and :F = n rE I L11r (A). Then ILL is differentiable on :F with respect to the set C = { 1 B B E :F} and the differentiability is uniform on C. The derivative (ILL )' is uniquely de termined and is absolutely continuous with respect to ILL . If IL' is an internal derivative of IL, then the Loeb extension (IL')L is defined on :F and coincides with (ILL)' . In particular this is true for all internal measures � , where t E J \ { 0} is infinitesimal. Theorem 1 7. 3 . 1
:
Let IL' be a derivative of IL· We will show the following statements: (A) For all internal sets A E A the limit
Proof.
(ILL) ' ( A) ' �'LL (A) '  'll. m "r'r> 0 r exists and is equal to ( �L')L ( A) . The convergence is uniform on the inter nal field A. (B) If N E :F is a ILLnullset, then (ILL) r (N)  �LL (N) lim = 0. r r>0 The convergence is uniform for all ILLnullsets of :F. Any ILLnullset of :F is also a (IL')Lnullset of :F. (C) If B E :F and if A E A is ILLequivalent to B, then (ILL) r (A)  ILL ( A) (ILL) r (B)  ILL ( B) = lim lim ' r>0 r> 0 r r in particular the left limit exists. The convergence is uniform on :F. (D) The Loeb extension (IL')L is defined on :F and for all B E :F
(A) follows from Lemma 17.3.1. To prove (B) let N b e a ILLnullset of :F, i.e. there exists a sequence (N� )n EN C A so that for all n E N we have N C N1_ , Nn� c:;; N� and 1
1 7. Differentiability of Loeb measures
246
p,(Nl ) :<:: � · Let N all r nE I
:=
n�=l N� . Then N E :F, p,L (N)
=
0 and we obtain for
Since p,L (N) = 0 we get
Therefore it is sufficient to show that lim0 (!n ); (JV) r+
=
0. Now since p,L (N)
=
0,
r
limn_, = ( !1L ) r ( N� )  limn_, = /1L ( N� ) r
( !1L) r (N� )  !1L (N� )
lim
" It follows from (A), that for each E N the limit limr+ O '\ exists and is equal to (p,')L (N� ). Since (p,')L is defined on the smallest afield containing A, the limit limn + = (p,')L (Nl ) (p,')L (N) also exists. Because of the uniform convergence stated in (A) and since (N� )n EN C A we can exchange the limits as follows: (11L)r( N 1 )pL( N 1 )
n
=
lim nlim +CXJ r+ 0
(p,L) r (Nl )  p,L (N1_ ) n
r
lim lim r+ 0 n+CXJ
(p,L) r (N� )  ttL (N� )
n
So the measure /1L is differentiable at N and the value of the derivative is (p,')L (N) . It remains to show that (p,')L (N) = 0. To this end we usc an argument due to Smolyanov and Weizsacker [15] . Let us consider the function
Then f is differentiable at t = 0 and f'(O) = (p,')L (N) . Since f is nonnegative and j(O) = 0, the first derivative of f in 0 must be 0. Hence (p,')L (N) = 0. The uniformity of the convergence can be seen by using again the uniform convergence on A. Of course (t/)L (N) = 0 since N C N. Hence the P,L nullsets of :F are also (p,')Lnullsets of :F.
17.3. Differentiability of Loeb measures
247
(C) Here we show the differentiability for an arbitrary element of :F. So let B E :F, A E A ILLequivalent to B and r E I. Since ILL (B ) = ILL (A) , it is sufficient to show that (ILL )r ( B )  ( ILL)r(A) O. rlim > 0 r Since ILL is nonnegative the following estimate is easy to verify. =
where A 6 B denotes the symmetric difference ( A \ B ) U (B \ A ) . Now (B) yields
I
(ILL )r(B)  (ILL)r( A ) rlim > 0 r
I
< lim  r> 0
(ILL)r( A 6 B) I I T
=
O.
The uniform convergence follows from (A) and (B). Hence (C) is proved. (D) follows from (A), (B) and (C).
D
Let IL be a measure with internal Lebesgue density satisfying the assumptions of Lemma 17.2.1 for a fixed y E *R Recall that the internal curve is given by ILt( A ) IL(A + ty) for all Borel subsets A C *R Let c E rn:. + , I =]  c, r:: [ . Since IL is SFomiudifferentiable, we can apply Theorem 17.3.1. Hence ILL is differentiable on :F = n r EI L ,lr (A) with respect t o c = { 1B B E :F}. Moreover, for the uniquely determined derivative (ILL)' we obtain: Example 1 7.3.2
=
for all B E :F, where (3� is defined in Lemma 17.2.1 The power of Fomindifferentiability is also shown in the last result. Corollary 1 7. 3 . 1 If IL is S Fomindifferentiable with an internal derivative IL' and :F is defined as in Theorem 1 7. 8. 1, then the Loeb measure ILL , restricted
to :F, is differentiable with respect to the set C of all :F measurable realvalued bounded functions on D. The Loeb measure ( �L')L, restricted to :F, is the deriva tive of (IL )L .
Proof.
This is routine integration theory using the uniform differentiability on
:F and the approximation of measurable functions by simple functions. D Remark An application of Fomindiffcrcntiability is given in [15] by Smolya
nov and Weizsacker. There, a nonnegative measure v on a locally convex space
1 7. Differentiability of Loeb measures
248
E is considered which is Fomindifferentiable along all elements of a Hilbert subspace of E. Using this measure differentiability, Smolyanov and Weizsacker introduce an operator on a subspace of £ 2 (E, v) . In the Gaussian case this operator is the derivative operator of the Malliavin calculus (see [11]). The question of the assumptions that are needed for differential properties of Loeb measures to yield such a derivative operator is the subject of ongo ing investigation. References
[1]
A IGNER , Differentiability of Loeb measures and applications, PhD the sis, Universitiit Munchen, (in prep.) E.
[2] S. ALBEVERIO, .J . E . FENSTAD, R. H 0EGH KROHN and T. LINDSTR0M, Nonstandard Methods in Stochastic Analysis and Mathematical Physics, Academic Press, New York, 1986. [3] V.I.
AVERBUKH, O . G . S MOLYANOV and S .V. FOl\1IN, "Generalized func tions and differential equations in linear spaces, I. Differentiable mea sures", Trudi of Moscow Math. Soc., 24 (1971) 140184.
[4] V . I .
" Differential properties of measures on infinite dimen sional spaces and the Malliavin calculus", Acta Univ. Carolinae, Math. Phys., 30 (1989) 930. BOGACHEV,
[5] V.I.
BOGACHEV, "Smooth Measures, the Malliavin Calculus and Approx imations in Infinite Dimensional Spaces", Acta Univ. Carolinae, Math. Phys., 3 1 (1990) 923.
[6] V . I .
BOGACHEV, "Differentiable Measures and the Malliavin Calculus", Journal of :tvlathematical Sciences, 87 (1997) 35773731 .
[7] N .
CuTLAN D,
Loeb Measures in Practice: Recent Advances, Lecture Notes
in Mathematics 1751 , SpringerVerlag, 2000.
[8] N .
CUTLAND and S .A. N G, A nonstandard approach to the Malliavin calculus, in Applications of NonstandardAnalysis to Analysis, Functional Analysis Propability Theory and Mathematical Physics (eds. S. Albeverio, W.A . .J. Luxemburg and M. Wolff) , D.ReidclKluwer, Dordrecht. 1995.
[9] S. V. FoMIN, Differential measures in linear spaces, in Proc. Int. Congr. of Mathematicians. Sec. 5, Izd. Mosk. Univ. , Moscow, 1966.
249
References
[10]
A .V. K IRILLOV , "Infinitedimensional analysis and quantum theory as semimartingale calculus", Russian Math. Surveys, No. 3 (1994) 4195.
[11]
D. NuALART, The Malliavin Calculus and and its Applications, SpringerVerlag, 1995.
[12]
H. O sswALD, Malliavin calculus tion, Book manuscript, 2001.
[13]
A.V. S KOROHOD, Integration in Hilbert Spaces, Ergebnisse der Mathe matik, SpringerVerlag, Berlin, NewYork, 197 4.
[14]
O . G. SMOLYANOV and H.V. WEIZSACKER, "Formulae with logarithmic derivatives of measures related to the quantization of infinitedimensional Hamilton systems", Russian Math. Surveys, 551 (1996 ) 357358.
[15]
O . G . S l\!OLYANOV and H . V . WEIZSACKER, " Smooth probability mea sures and associated differential operators", Inf. Dim. Analysis, Quantum Prob. and Rcl. Topics, 2 (1999) 5178.
[16]
H .V . WEIZSACKER, Differenzierbare Maj3e und Stochastische Analysis, 6 Vortragc am Graduiertenkolleg 'Stochastische Prozesse und probabilistis che Analysis' . Januar/Februar, Berlin, 1998.
Related Topics,
Probability
in abstract Wiener spaces. An introduc
The power of Gateaux differentiability
Vftor
N eves*
Abstract
The search for useful nou standard minimil:ation conditions on C1 func tiona]s defined on Banach spaces lead us to a very shows that
if a C1 function .f
:
E
+
simple argument
which
F between Banach spaces is actu
finite points along finite vectors, then it iS UJlllOrll!J.y Conti nUOUS On bounded SetS il' and ouJ.y if it is Jipschitziau
ally GatealL'C differentiable on
on bounded sets. The following is a development of these ideas starting from locally convex spaces.
18. 1
Preliminaries
This section consists of
an
informal description of Lools to fi·ame the dis
course and therefore contains only a small number of theorems and few proofs. Ccu·eful foundational treat me nts may be found in [3] , [9[, [14[ or [ 1 [ ; we shall also use Nelson's quantifiers vst an d 3st as in [ 8] . 'vVe assume the existence of two settheoretical structures A an d f3 and a function *(·) : A + B satisfying the properties we proceed to describe. Elements of eit:her of t he structures which are sets shall be called entities 1 •
(Pl)
A is a model of the Televant A nalysis in the sense that any object of Classical IVIathematical Analysis as well as the mathematical structures under s t udy have an interprctat.ion in A and theorems about them are true in A. In particul m the complete ordered field lR of real numbers or any other classical space cu·e elements of A.
•Departamento de Matemat;ica, Universidade de Aveiro. vneves®mat . ua . pt
Work for this article was partially supported by FCT via both the grant POCTI\ MAT\41683\01 and funds from the R&D unit CEOC. 1We shall also arlmit the existence of atoms i.e. objects wit.hout clements which arc not the empty set; for instance numbers are atoms.
18. The power of Gateaux differentiability
254
The image by *() of any given element a E A will be denoted *a and be called standard as well as a itself; elements of standard sets in B shall be called internal ; non internal sets in B are called external. A set of sets verifies the Finite Intersection Property, abbreviated f.i.p. , if all its finite subsets have nonempty intersection. Denote the cardinal (num ber of clements) of a set S by card(S) and its power set by P(S) . (P2) Polysaturation 2
Given E E A and C C *P(E), if C verifies the f. i.p. and card( C) card(A) then n C # 0
<
There is an encompassing formal first order language £ with equality = and E (to be read as the membership relation) , with the usual connectives ' for negation, 1\ for conjunction, V for disjunction, =? for implication, ¢? for equiva lence, universal V and existential ::J quantifiers and enough constants, predicate and function symbols to denote any elements, relations and functions under consideration either in A or in B. Say that a formula of £ is bounded if its quantified subformulae, hereby including the formula itself, are of the form
Vx [x E a =? ¢]
or
::Jx [x E a 1\ ¢]
for some constant a and formula ¢; a sentence is a formula without free variables. B contains a formal copy of A in the following sense. (P3) Transfer
If ¢( a1 , , an) is a bounded sentence with occurrences of constants ai ( 1 :<.:: i :<.:: n) and no more constants, then ¢( a 1 , , an) is true in A iff ¢(*a 1 , , *an) is true in B 3 . ·
·
·
·
·
·
·
·
·
It is always useful to keep in mind theorems 18. 1 . 1 and 18. 1.2 below. Say that a formula of £ is standard (resp. internal) if it is bounded and all its constants denote standard (resp. internal) elements of B. Theorem 18. 1 . 1 (Principles of Standard and of Internal Definition) A set B E B is standard (resp. internal) iff there exists a E A and a standard (resp. intemal) formula ¢ such that B = {x E *al ¢(x) } or, more informally,
a set in B is standard or internal iff it is definable by a respectively standard or internal formula. 2 Actually
formulate.
this is stronger than needed for our purposes, but it is powerful and easy to
38 is a kind of elementary extension of A with embedding *( · ) .
18.1.
255
Preliminaries
Call monad any intersection
n cEC *C with 0 # c E A.
Any internal set which contains a monad n cEC *C also contains one of the standard sets *C.
Theorem 1 8 . 1 . 2 (Cauchy's Principle) Hyperreal numbers
A E *JR:. are classified the following way
A is finite . A is

infinitesimal . A is infinite .

::lr E lR:. lxl .:::; *r Vr E lR:. [r > 0 =? lxl A is not finite.
.:::;
*r]
0 and fL denote respectively the set of finite and infinitesimal hyper real numbers. For any given ( real ) locally convex space S. let fs denote a gauge, i.e. a directed set, of seminorms defining the topology of S; fin(*S) and fL(*S) shall denote respectively the sets of finite and infinitesimal elements in *S, i.e. , for any given x E *S,
V1 E fs r(x ) E 0 V1 E fs r (x ) E fL
x E fin(*S) XE
fL(*S)
(18. 1 . 1) (18. 1 . 2)
When rs is unbounded, in the sense that Vx E
S fs (x) . { r (x ) l r E fs}
is unbounded,
(18.1.3)
it so happens that Vx E
*S [x E fL(*S)
¢?
vst l E fs r(x)
_:::;
1] .
(18.1 .4)
Condition (18.1 .4) has the syntactical advantage of dispensing with a quanti fier; as we will present strongly syntactical reasoning,
we assume from now on that condition ( 18. 1 .3) is always verified. Denote uc the set of standard elements of *C, i.e.,
uc
:=
{*xl x E C}.4
If E and F are locally convex spaces, L ( k ) (E, F) denotes the space of continuous klinear maps l : E k + F; a map l E *L ( k ) (E, F) is said to be finite ( resp. infinitesimal) if l(fin(*E) k ) � fin(*F) ( resp. l(fin(*E) k ) � tt(*F) ) . I t is useful to keep in mind the following results ( [4] is a most complete source; also see [14, chap. 10] ) . 4
We sometimes identify C with "C for the s ake of simplifying notation.
18.
256
The power of Gateaux differentiability
In a locally convex space S, 1. x E fL(*S) if and only if \f).. E 0 Ax E fL(*S) 2. x E fin (*S) if and only if \f).. E fL Ax E fL(*S)
Theorem 1 8 . 1 . 3
3.
(x E *S) .
( x E *S) . Vx E fL(*S) ::3 ).. E * ffi. \ 0 Ax E fL(*S), therefore i t is also true that Vx E *S [x E fL(*S)
¢?
::3 ).. E * ffi. \ 0 Ax E fL(*S)]
Let x ::::l y mean that x is infinitely near y, i.e., denote standard part whenever appropriate, i.e . ,
y
= st (x)
:=
y E us & x
::::l
x  y E fL( *S) ; also let st y:
ns (*S) will denote the set of nearstandard vectors of ns (*S)
*S, i.e. ,
: = us + fL( *S) .
f : *E + *F is Scontinuous on A c:;: *E if for all x, y E A , whenever x ::::l y ; by definition, any standard function f is Scontinuous if *f is. Theorem 18. 1 .2 implies that Scontinuity of internal functions is nothing else than continuity measured with standard tolerances; this and condition (18. 1 .3) imply: A function
f(x)
::::l
f(y).
Suppose A E *P(E) . An internal function f A + *F is Scontinuous at x E A iff Vv E U fp Vc E U ffi. ::lr E u rE ::36 E Uffi. Vy E A [r(x  y) < 6 =? v(f(y)  f(x)) < c] .
Lemma 18. 1 . 1
In particular
A function f : A c:;: E + F between locally convex spaces E and F is 1. continuous (on A) iff it is Scontinuous on *A n ns (*E) ; 2. uniformly continuous (on A) iff it is Scontinuous on *A.
Theorem 18.1.4
And
Given locally convex spaces E and F and an internal *k linear function l *E k + *F, the following conditions are equivalent 1. l is Scontmuous.
Theorem 1 8 . 1 . 5 :
18.2.
257
Smoothness
2. l ( fin (*E)) c:;: fin(*F) (i. e. the internal Scontinuous maps are the internal finite maps). 3. l (tL(*E) ) c:;: tL(*F) . It might also be interesting to notice the following Theorem 1 8 . 1 . 6 Let E and F and ¢ : (A x fin (*E)) + *F be an
be locally convex spaces, A be a subset of *E internal function which is linear in the second coordinate. If ¢ is Scontinuous in the sense that Vx. y E A Vu, v E fin( *E) [x :::::J y & u :::::J v =? cjJ(x, u) :::::J cjJ(y, v)] , then, for all x E A , ¢(x, ·) is a finite map. Proof. Assume ¢ : (A x fin (*E)) + *F is Scontinuous, x E A c:;: *E and that u E fin (*E) ; for all t E fL , tu :::::J 0, therefore t¢(x, u) ¢(x, tu) :::::J ¢(x, 0) 0, hence, by theorem 18.1 .3, ¢(x, u) E fin(*F) as required. D Observe that, when S is normed, an element x E *S is infinitesimal (resp. finite) iff ll x ll E fL (resp. llxll E 0) and the following holds. =
=
Corollary 18. 1 . 1 Given normed spaces E and F and a klinear function l : E k + F, the following conditions are equivalent 1. l is continuous 2. Vx E *S [ llxll :::::J 0 =? ll l (x) ll :::::J O J 3. Vx E *S [ ll xll E 0 =? ll l(x) ll E 0 ] . Of course theorem 18. 1.5 and corollary 18. 1.1 essentially say that , for linear maps continuity is equivalent to continuity at zero.
18.2
Smoothness
Let E and F be complete locally convex spaces, A be an internal subset of *E, f : A + *F and l (  ) : A + *L(E, F) be internal and write f (x + tv) = f(x) + tl x (v) + tL (x E A , v E *E, t E *ffi., L E *F) . (18.2. 1) :::::J
f i s GSdifferentiable at x with GSderivative lx, i f L 0 whenever v E fin (*E) and t :::::J 0; f is uniformly differentiable at a E *E with uniform derivative l () if f is GSdifferentiable at x and lx is finite, for all x :::::J a. We shall also call l : A x fin (*E) + *F the derivative of f and define (x, v E *E) . df(x, v) := l(x, v) : = lx (v) := dfx (v)
18.
258
The power of Gateaux differentiability
Observe that , when one takes x and v in E within equation (18.2. 1), the GS derivative df ( x. v) is just the Gateaux derivative ( Gderivative for short) of f at x along v. The following theorem is well known too (see [13] ) .
Theorem 18.2.1 If E and F are Banach spaces, the function f +
:
E
+
F
has Gderivative df(x, y), for all x E E, and x df(x, ) : E + L (E, F) is continuous for the uniform topology on L (E, F), then x + df(x, ) is actually a Fnichet derivative. ·
The following is a simple generalization of proposition
·
2.4 in [15] .
Theorem 18.2.2 If the internal map f : *E
at all points of an internal set A E 1.
*P( E)
+ *F is uniformly differentiable with derivative df, then
f is Scontinuous on A
2. df : A x fin(*E) + *F verifies (a) �f (A x fin(*E)) c:;: fin (*F) (b) df is Scontinuous in the sense that Vx, y E A Vu, v E fin ( *E) [x :::::J y & u :::::J v =? dfx (u) :::::J dfy (v)] . Proof. Borrowing from [15] : suppose that f : *E + *F is internal and uni formly differentiable at all points of the internal set A, that x, y E A and that x :::::J y; take A E *ffi. \ 0 such that .A(x  y) E fL ( *E) (theorem 18.1 .3) and define t := ±; t is infinitesimal and, for some L E fL(*F),
f(y + t.A(x  y))  f(y) tdfy (.A(x  y)) + tL dfy (x  y) + tL E fL(*F)
f(x)  f(y)
by theorem
18.1 .5,
and
1 is proven .
2 (a) is an immediate consequence of the fact that the maps dfx are finite and this is useful in proving Scontinuity of df in (b) : take x, y E A and u, v E fin(*E) such that x :::::J y & u :::::J v; observe that df(x, u)  df (y, v)
=
dfx(u  v) + dfx (v)  dfy (v).
As the maps dfx arc finite, by theorem 18.1 .5, df(x, u  v) E fL(*F) and it is enough to show that, under the present assumptions, dfx (v)  dfy (v) :::::J 0.
18.2.
259
Smoothness
Now, taking A and t as above and v E fin ( *E ) , there exist infinitesimal vectors L. rJ, J E *F such that
djy (x  y) + t L ( 18.2.2) f(x)  f(y) (18.2.3) tdfx(v) + tr] f(x + tv)  f(x) tdjy (v) + djy (x  y) + tJ. (18.2.4) f (y + t(v + .A(x  y)))  f(y) Summing (18.2.2)+( 18.2 .3)(18.2.4), we obtain t(dfx (v)  djy (v)) = t ( L + rJ  6) and dfx (v)  djy (v) :::::J 0 as required. D Moreover Stroyan also showed in [15] that uniform differentiability is a very good generalization of Gdifferentiability in that, among other properties, it overcomes the impossibility of topologizing L(E , F) in such a way that the eval uation map (x, y) f+ l(x) : L(E, F) x E + F be continuous when E and F are not nonnable locally convex spaces 5 ; we established the equivalence between uniform and Cqb differentiability of standard functions on complete locally con vex spaces, and discuss relations with a weaker kind of differentiability when fin ( *E ) = ns ( *E ) , in [12] . The main results in this context read as follows. Assume that U is a nonempty open subset of E, and k E N.
:
f U + F, a E 17U + fL (*E)
Definition 18.2.1 (Stroyan [15] ) The function f is kuniformly differen tiable at a if there are finite maps d ( i ) f() U + L( i ) ( E, F) , ( 1 <:.:: i <:.:: k)  the derivatives of f  such that, with d d ( l ) , for all x E fin ( *E ) and all t E fL(*IR'.) , there exist rJ E fL(*F) and infinitesimal maps T/z (2 <:.:: i <:.:: k), :
:=
so that
=
(18.2.5) f(a) + tdfa (x) + tr] f (a + tx) + l ( ( ) ) ) ( d z fa (·) + td z fa (x , · ) + tr]i (  ) (1 <:.:: i < k) . (18.2.6) d " fa + tx(·) f is kuniformly differentiable in U if it is kuniformly differentiable at all a E 17 U + fL(*E) . When f is 1uniformly differentiable we just say that it is uniformly differentiable. The following may also be found in [15] . Theorem 18.2.3 When f is kuniformly differentiable, not only .f itself is Scontinuous but also the derivatives d ( z ) f ns ( *E ) x fin ( *E ) + *F are S cordinuous: Vx, y E ns ( *E ) Vu, v E fin ( *E ) [x :::::J u & y :::::J v =? d (i ) fx (u) :::::J d( i ) fy (v)] (1 S: i <:.:: k) . =
:
5A
quite simple and clear explanation of this can be found in
[7,
page
2] .
18.
260
The power of Gateaux differentiability
Theorem 18.2.4 ( Taylor's formula [15]) f is kuniformly differentiable in U if there are maps dj ( i) U + L � '? (E, F) (1 <:.:: i <:.:: k), such that, whenever :
a E au + fL(*E), x E fin(*E), t E fL, there exists rJ E fL(*F) such that k i j(a + tx) = f(a) + L �t df�i ) (x, · · · , x) + t k rJ. i= l z.
Relations with Fdifferentiability on Banach spaces are actually studied in [14, chap. 5] . On what regards standard definitions of smoothness we take from [6] , whereto we refer the reader for details, namely on convergence
structures.
Definition 18.2.2 Let N denote the filter of neighborhoods of zero in JR:., B be
a filter on E and, for any filter :F in a (real) vector space, N:F denote the filter generated by { {rb l r E N & b E F} l N E N & F E F } ; also, for k E N, let B k be the filter in E k generated by {B 1 x · · · x Bk l B1 , · · · Bk E B} and, for any filter :F in L ( k) (E, F), F(B k ) be generated by {U 1 E F l (B) I F E :F & B E B}; finally, if¢ is afunction and :F is a filter on the domain of¢, ¢(F) is generated by {¢(F) I F E :F} . B is quasibounded if NB converges to zero; Aqb denotes the conver gence structure of quasibounded convergence; when a filter :F converges to x E E with respect to Aqb, we write :F E Aqb (x) . 2. A filter :F in L ( k) (E, F ) qbconverges to zero if :F(Bk ) converges to zero in F whenever B is quasibounded in E. 3 . A junction l : U + L ( k ) (E, F) is qbcontinuous if l(:F) E Aqb(l(a)) whenever a E U and :F is a filter in E converging to a. 1.
4.
Recall that U is an open subset of E and f : U + F. We say that f is
of class c;b if
(a) f is qbcontinuous. (b) There exist maps d'i( . ) : U + L ( z ) (E, F) (1 <:.:: i <:.:: k) such that i. d' i( . ) is qb continuous, for all i = 1 , · · , k ii. For all a E U, all x E E and all i = 0, · · · , k  1 ·
i = d + 1 fa( x, · ) lim t > 0 �t ( d'fa+ tx  Ja) ' for the topology of pointwise convergence on L i (E, F) .
18.3.
261
Smoothness and finite points
Theorem 18.2.5 ([12] ) The function f is of class c;b , with derivatives di i( . ) , ( 1 :S i :S k) iff it is kuniforrnly differentiable with the sarne derivatives. The following is also a consequence (for example see theorem ( 18.2.5) or of theorem 18.2.4.
[6] )
either of this
:
Theorem 18.2.6 The function f E + F, between Banach spaces E and F, is of Frechet class C k , with derivatives di fc ) , ( 1 :S i :S k) iff it is kuniforrnly
differentiable with the sarne derivatives.
18.3
Smoothness and finite points
Recall that E and F denote real locally convex spaces.
Lemma 18.3.1 Suppose A E *P(E) . The following conditions are equivalent for any internal rnap f A + *F which is GSdifferentiable at all x E A n fin(*E), with GSderivative df : (A n fin(*E)) X fin(*E) + *F. 1. f is uniformly differentiable at all x E A n fin (*E) 2. f is Scontinuous on A n fin (*E) 3. df( (A n fin(*E)) x fin(*E)) C:: fin(*F) :
Proof. For the sake of simplicity, we assume A = *E; the proof of the general case is an easy adaptation thereof. Suppose that df : fin(*E) x fin(*E) map f : *E + *F.
+
*F is the GSderivative of the internal
(1 =? 2) This is 1 in theorem 18.2.2 above. (2 =? 3) Assume that f is Scontinuous on fin(*E), let df() : fin(*E) + *L(E, F) be the GSderivative of f. Pick x and v in fin(*E); we must show that dfx (v) is finite; suppose this is not the case, take a seminorm 1 E fp such that r(dfx (v)) t/c 0 and let t := r(df: ( v ) ) ; t is infinitesimal and, by hypothesis, there exists L E fL(*F) such that f(x + tv) = f(x) + tdfx (v) + tL; but then 1 r(tdfx(v)) I (f(x + tv)  f(x)  tL) � 0 =
=
which is impossible; it follows that dfx (v) must be finite as required. (3 =? 1) In the presence of GSdifferentiability, 3 completes the definition of uniform differentiability. D Say that a standard function f : E + F is GS, or uniformly, differentiable at x E *E with derivative dfx if respectively *f is GS, or uniformly differentiable, at x with derivative *df.
18.
262
The power of Gateaux differentiability :
Theorem 18.3.1 If a standard function f E + F is GSdifferentiable with standard derivative dfx at all x E fin(*E), the following conditions are equivalent 1.
f is uniformly differentiable on bounded sets with derivative df .
2. f is umformly continuous on bounded sets. 3.
For any bounded subset B of E2 , df(B) is bounded in F.
4.
df is uniformly continuous on bounded subsets of E2 .
As we said before, the proof is easy. Start by keeping in mind that
Theorem 18.3.2 A subset B of the locally convex space S is bounded iff *B C:: fin ( *S ) ; in particular, if E and F are normed and L(E: F) is endowed with the usual uniform norm, a subset B of L(E, F) is bounded iff
V¢ E *B Vx E fin(*E) ¢(x) E fin(*F).
(18.3.1)
Proof of thm. 18.3 . 1 . Equivalences (1 ¢? 2 ¢? 3 ) are immediate conse quences of lemma 18.3.1 and theorem 18.3.2 (note that , by theorem 18.3.2 above. *B n fin(*E) = *B if and only if B is bounded) .
(3 =? 4) Follows from theorems 18.3.2 and 18.2.2. ( 4 =? 2) If we show that condition 4 implies that dfx is a finite map when ever B is a bounded subset of E and x E *B, we are done; but this follows from theorem 18.1 .6. D As an application to Banach spaces: :
Corollary 18.3.1 Let E and F be Banach spaces and f E + F be a S continuous and GS differentiable function at all x E fin (*E) with standard
denvatzve df, then 1.
df zs actually the Fnichet derivative of f and f is of class C 1 .
2. (Mean Value) Denotmg [x, y] the line segment from x to y, Vx, Y E E 3z E [x, y] ll f(x)  f(y) ll :::; ll dfz ll · ll x  Y ll ;
(18.3.2)
therefore f is Lipschitzian on bounded sets, i.e., for each bounded subset B of E, there exists K E rn:. such that Vx, y E B ll f(x)  f(y) ll :::; K ll x  Y ll .
(18.3.3)
263
18.4. Smoothness and the nonstandard hull
For all infinitely near finite vectors x, y, z E *E , there exists an infinites imal L E *F such that
3.
f(x)  f(y)
=
dfz (x  Y ) + ll x  Y IIL
( 18.3.4)
Proof. 1 . As f is Scontinuous on fin(*E) . it is uniformly continuous on bounded sets and hence, by theorem 18.3. 1 , of Frechet class C 1 . 2 . Condition (18.3.2) is true in view of part 1 above; therefore condi tion ( 18.3.3) is true (by theorems 18.3.1 and 18.3.2) because fin(*E) is *convex. 3 . We borrow from [14, page 97] : the Frechet derivative x r+ dfx is Scontinuous at finite points by 2b in theorem 18.2.2 and we may apply Trans fer of the Mean Value condition ( 18.3.2) to the function
g
='
V r+
J ( V )  dfz ( V  ) Z
,
whose derivative v r+ dgv = dfv  dfz is infinitesimal whenever therefore verifies, for some c E [x , y] , hence c :::::J z,
II J ( x )  J ( y )  dfz (x  y ) ll ll x  Y ll 18.4
=
ll g(x)  g(y) ll
s;
v :::::J
l dgc ( l l xx  yY l l ) I :::::J
z, and
0.
D
Smoothness and the nonstandard hull
This section evolves along main ideas due to Manfred Wolff. Recall that · :::::J is an equivalence relation on *S, with equivalence classes x := x + fL(*S) and let S be the nonstandard hull of *S with seminorms i' or •
S
:=
fin( *S ) ;
�
& ry (x)
:=
st ( r ( x ) ) (x E fin(*S) ) .
Nonstandard hulls are complete spaces, but may vary with the chosen model of analysis A: actually Banach spaces with invariant nonstandard hulls are finite dimensional, but this is not the case with locally convex nonnormable ones; spaces with invariant nonstandard hulls were characterized in [5] and were named HMspaces by Keith Stroyan (relevant data is summarized in detail in [14, chap. 10] ; also see [10, sec. 3.9] for nonstandard hulls of metric spaces) . Any internal function between locally convex spaces. f : A C:: *E r+ *F that is S continuous on fin(*E) n A and such that J (fin(*E) n A) C:: fin(*F) has a natural nonstandard hull j : C:: E + P too defined by
A
A ](x)
{x l x E fin(*E) n A } j(x) (x E A n fin(*E)) ;
18.
264
The power of Gateaux differentiability
note that j is not a standard function in the sense we have been considering, for it certainly is not a priori an element of the model of analysis A, nevertheless an application of Nelson's Algorithm ( [8] , [11]) will provide us with definitions of differentiability for functions without reference to nonstandard extensions. From now on f : *E + *F denotes an internal GSdifferentiable function with derivative df such that
f ( fin(*E))
C:::
(18.4. 1 )
fin(*F)
and
(18.4.2)
whenever this is a good definition, namely when df is Scontinuous on fin(*E) and x, v E fin(*E) . For any GSdifferentiable function h : E + F ,
t:.. ( h x , v, t) 18.4.1
:=
( � (h(x + tv)  h(x)  dh(x v) ) ) ,
(x, v E *E; t E * ffi. \ {0}) .
Strong uniform differentiability :
Definition 18.4 . 1 An internal function f *E + *F is Strongly Uni formly Differentiable (SUD for short) if it is uniformly differentiable on fin(*E) and f(fin(*E)) C::: fin(*F) . Let P and In these terms, ously verified
Q f
denote respectively gauges of seminorms for E and F. is SUD when the two following conditions are simultane
\i(x, v) E *E 2 [(x, v) E fin(*E) 2 =? df(x, v) E fin(*F)] \i(x, v) E *E2 \it E * ffi. [(x, v) E fin( *E) 2 1\ 0 cj. t E fL =? t:.. (j , x , v, t) E fL ( *F ) j ; these expand respectively to
\i(x, v) E *E2 [vstp E *P 3 8t m E *N p(x) + p(v) � m =? \i8t q E *Q3 8t n E *N q ( df(x , v)) � n] \i(x, v) E *E 2 \it E 'R vstp E *P 3 8t m E *N p(x)+p(v) � m 1\ \ist n E *N 0 < i t I � � =? \ist q E *Q q(t:.. ( f, x, v, t)) � 1 ;
[
]
which reduce respectively to the following, where we leave the domains implicit,
\i(x, v) vst q 3 8t (p, n) vst m [p(x) + p(v) � m =? q (df(x, v)) � n] \i(x, v, t) \i8t q 3 8t (p, n) \i8t m p (x) + p(v) � m
[
1\ 0 < l t l
�
� =? q(t:.. ( j , x , v , t)) � 1] ;
18.4.
265
Smoothness and the nonstandard hull
Nelson's algorithm then gives
\j8t q \18t m 3 st fzn p x N V(x, v) 3(p, n) E P x N [p(x) + p(v) � m(p, n) =? q (df(x, v)) � n ] \18t q \18t m 3 st fin p x N V(x, v, t) 3(p, n)
[
p(x) + p(v) � m(p, n)
1\
0 < ltl
(18.4.3)
PxN
E
1
� ; =?
q( !:::. (j, x, v, t)) � 1 ] .
(18.4.4)
Formulas (18.4.3 ) and (18.4.4) together define strong uniform differentia bility of an internal function and we actually proved the following
Theorem 18.4. 1 An internal GSdifferentiable function f : *E + *F is S UD
iff
\j8t q \18t m 3 st fin P x N V(x, v, t) 3(p, n)
[ [p(x) + p(v)
�
m(p, n)
1\
0 < ItI
�
E
PxN
� ] =?
[ q (df(x, v))
�
n
1\
q (!:::. ( j, x, v, t) )
�
]
(18.4.5)
1] :
When f is standard, transfer of ( 18.4.5) provides the definition, 18.4.2 below, of SU differentiability for functions between locally convex spaces in any "universe" A:
Definition 18.4.2 A standard function h : S + T between locally convex spaces S and T, with unbounded gauges of seminorms respectively I: and 8 , is Strongly Uniformly Differentiable, with derivative dh ( ) : S + L(S, T) , (S UD for short) when, leaving implicit that a E I:, T E e, m : I: X N N , x E S, v E S and t E JR \ { 0}, >
VT Vm
::J f z n c
[ [a (x) + a(v)
X
N c:;: I:
�
X
m(a, n)
e
V(x, v, t) 3 (p, n)
1\
0 < ltl
�
Ec
X
N
� ] =?
[T (df(x, v)) � n
1\
T( !:::. (h, x, v, t))
�
]
(18.4.6)
1] .
NB: variants of this formula where any inequality � is taken to be
strict, i.e., to be < , are equivalent.
18.
266
18.4.2
The power of Gateaux differentiability
The nonstandard hull
Theorem 18.4.2 Let E and F be locally convex spaces and f : *E + *F be
an internal GSdifferentiable function with derivative df such that f ( fin(*E))
( 18.4. 7)
� fin(*F) .
f is SUD iff j : E + F is uniformly differentiable with derivative dj : E2 + F given by d](x , v) := djr;;:; ) . Proof. Let us first assume that the internal function f : *E + *F is SUD with derivative df, and takes finite vectors of *E into finite vectors of *F. Define j3
Q := { I] I q E Q } ;
:= {P I p E P}
although there might exist continuous seminorms not of the type ,y, P and Q arc unbounded directed families of seminorms, so that condition ( 18. 1 .4) still holds with obvious adaptations. Also recall that
](x) = f(x) and observe that equation
defines dj well, in view of theorems 18. 1 . 5 and Take q E Q and rrL : Q x N + N . Define
M(p, n)
:=
m(p, n)
18. 1.6 and lemma 18.3. 1 .
(p E P, n E N) ;
M is a standard function. therefore we may deduce from exists a standard finite set P x N � *P x 'N such that
(18.4.5)
that there
V(x, v, t) E *E2 x 'R 3(p, n) E P x N
[p (x) + p(v) < M(p, n) 1\ 0 < l t l
:<::
�
=?
]
q(df(x, v)) :<:: n 1\ q( !:::. (f, x, v, t)) < 1 . P and N being standard and finite, N
:= N
'
p
define
:= {P I p E P} ,
267
18.4. Smoothness and the nonstandard hull
and take (x, v, t) ; whatever the representatives x and v, there is a correspond ing pair (p, n) E P x N, which may be vary with x and v, but always veri fies ( 18.4.3) ; suppose that
�; ( 18.4.8) n all the�bers involved here are standard, p(x) :::::J p(x) & p(v) :::::J p(v) as well as q(df(x, v)) q(df(x, v)); in particular, it follows p(x)
+ p(v) < M(p, n)
1\ 0
< ltl
�
:::::J
p(x) + p(v)
M(p, n )
m(p, n)
< so that
q(df(x, v))
n
< an thus

(j(df(x, v))
<
n
that is
(j(d}(x, v ) )
<
n
and the first factor in the consequent of ( 18.4.5) follows for j as required; for the remaining factor of the consequent, observe that, actually
p(x) + p(v) < M(p, n )
1\ 0
< l tl
1
� ;
n
implies
q
( � (f(x + tv)  j(x)  dj(x, v) ) )
too; t and n are standard, 0 # t and 18.3. 1 , thus 1 t finally
( f (X + tv)  j (X )

dj ( X, V )
)
= q(t:.(J, x, v, t)) <
1
f as well as df are Scontinuous by lemma :::::J
(
1 ' t j (X + t v )
A
 j (X)


dj ( X, V )
)
;
(j(t:.(}, , x, v, t) ) � 1 . The proof that j is SUD with derivative dj is finished. Now suppose that j is itself uniformly differentiable with derivative d} : E2 ) F. Take q E ()"Q and a standard function m : *(P X N) ) *N; since ( ) is 1  1 M (p, n) := m(p, n ) ·
268
18. The power of Gateaux differentiability
defines a function from P x N into N and we may apply (18.4.6) appropriately written in terms of j, and obtain the corresponding finite set P x N � P x N; pick x , v E E, t E ffi. \ {0} and use ( 1 8 . 4 . 6 ) , with < for <:.:: , in order t o obtain (p, n) E P x N so that
[iJ (x) + fj(v)
<:.::
M(fj, n) 1\ 0 < l t l
<:.::
[<1 ( d](x, v))
�]
=?
< n 1\ q(l� (j, x, v, t)) < 1 ] ;
therefore, if
<:.:: 
fj ( x) + fj ( v )
<:.::  ,
then hence or
1 n
p( x ) + p(v) < m(p, n) 1\ 0 < l t l <:.::
M (fj, n) 1\ 0 < l t l
( 18.4. 9)
( 18.4. 1 0)
1 n
q ( d](x, v)) < n 1\ q (!:::. ( ], x, v, t)) < 1 q ( dn;:;)) < n 1\ q(!:::. (j, x, v, t)) < 1 .
I t immediately follows that
q (df( x , v) )
<:.:: n,
because both q (ifr;;;:; ) ) and the n above are standard; moreover and when t >f, 0 q(!:::. ( f, x , v, t)) <:.:: 1 ;
f) is linear ( 18.4. 1 1 )
it so happens that t � q (!:::. (f, x , v , t ) ) i s always an internal function and we actually have shown that
[
Vq E uQ 3n E u N o < l t l < which is the same as
Vq E ()"Q [o # t � o
=*
� =? q(!:::. (f, x , v, t))
<:.::
1
]
q(!:::. (f, x , v, t ) ) � o]
and thus (18.4. 1 1 ) also holds when t � 0. Again because ( ) is 1 1 , we may define p : = {P I p E P} 
and we have shown that ( 18.4.5) holds.
D
269
References
References [1] S . ALBEVERIO, J E . FENSTAD, R. H 0EGHKROHN and T. LINDSTR0M ( eds. ) , Nonstandard Methods in Stochastic Analysis and Mathematical Physics. Academic Press, 1 986. [2] LEIF 0. A RKERYD, NIGEL .J. CUTLAND and C. WARD HENSON (cds . ) , Nonstandard Analysis, Theory and Applications, Kluwer, 1997. [3] C. WARD HENSON, A Gentle Introduction to Nonstandard Extensions, in Nonstandard Analysis, Theory and Applications, Arkeryd, Cutland & Henson (eds. ) , Kluwer, 1997. [4] C. WARD HENSON and L . C . MOORE, "The Nonstandard theory of topo logical vector spaces", Trans. of the AMS, 172 ( 1 972) 405435. [5] C . WARD HENSON and L . C . M ooRE , " Invariance of The Nonstandard Hulls of Locally Convex Spaces", Duke Math . .T . . 40 93 205. [6] H . H . KELLER, Differential Calculus in Math 417, SpringerVerlag, 1 974.
in Locally Convex Spaces, Le
e
. Notes
[7] ANDREAS KRIEGL and PETER W . MICHOR, The Convenient setting of Global Analysis, Math. Surveys and Monographs 53, AMS ( 1 997) 405435. [8] FRANCINE D IENER and KEITH STROYAN, Syntactical Methods in In finitesimal Analysis, in Cut land ( ed. ) , Nonstandard Analysis and its Ap plications, CUP, 1 988. [9] T oM LINDSTR0lvl , An invitation to Nonstandard Analysis. in Cutland (ed. ) , Nonstandard Analysis and its Applications, CUP, 1 988. [10] PETER LOEB and MANFRED WOLFF (eds . ) , the Working Mathematician. Kluwer, 2000.
Nonstandard Analysis for
[ 1 1] E. NELSON, " Internal Set Theory: a new approach to Nonstandard Anal ysis", Bull. AMS, 83 ( 1 977) . [12] VfTOR N EVES, " Infinitesimal Calculus in HM spaces", Bull. Soc. Math. Belgique, 40 ( 1 988) 1 77198. [13] JACOB T. ScHWARTZ, 1969.
Nonlinear Functional Analysis, Gordon & Breach,
270
18. The power of Gateaux differentiability
[14] KEITH D. STROYAN and W . A . J . LUXEI\IBURG : ory of Infinitesimals, Academic Press, 1976.
Introduction to the The
[15] KEITH D . STROYAN, " Infinitesimal Calculus in Locally convex Spaces: 1 . Fundamentals", Trans. Amer. Math. Soc . , 240 ( 1978) , 363383.
Nonstandard Palais Smale conditions Natalia ��Iartins* and Vftor N eves**
Abstract
the PalaisSmale con cli tiou (PS) be low, some of them generalizations, but still sufficient to prove Mountain Pass Theorems, which arc quite import ant in Critical Point Theory. \!lie
19. 1
present uonst.anclarcl versious of
Preliminaries
For notation, other preliminaries and basic references on Nonstandard Analysis we suggest section 1 in article [13] in this volume, namely we assume that we have two settheoretical structures A and B  the former a. model of "the relevant" Analysis  and a 11 function * ( · ) : A . B which satisfies the Transfer and Polysatmation Principles. Specific notation and results follow. Recall that *JR denotes the set of hyperreal numbers and 0 the set of finite hyperreal numbers; if x E "'ll.t is infinitesimal we write x � 0. If x, y E *JR are such that x  y � 0, we say that x is infinitely close to y and write 'E � y. Also remember that finite hyperreal numbers have a standard part: .
Theorem 1 9 . 1 . 1 (Standard Part Theorem)
E
0 ihe1·e exists a ·u.niq'lLe r E lR sv,ch that :r; � t·; ·r is called the standard part of :r and is denoted by st ( x ) or 0x. Moreover, for all x, y E 0, st(x + y) = st (x) + st(y) , st(xy) = st(x)st(y) and if x � y then st (x) � st(y) .
rf x
*Departamento de Matematica, Universidade de Aveiro. nataliam@mat . ua . pt
Work for !:his article was partially supported by the R&D unit Center for Rc :>em·ch in Optimi?.atiou and Control ( CEOC) of t.hc University of Jlwciro and grm1t. POCTijMAT/41683/2001 of the Portuguese Foundation for Science m1d Technology
·�
(FCT)
via
FEDER.
vneves@mat . ua . pt
272
19. Nonstandard PalaisSmale conditions
Next we restrict some definitions to normed spaces; just recall that a normed space is a locally convex space whose topology is defined by a sin gle norm and also recall that, for any set A E A, "A :=
{*a : a E A},
and often A and "A are identified for simplicity.
Definition 19. 1 . 1 Let (E, 11 ·11) be a real normed space and x E *E. 1. x E fin(*E), i. e., x is finite if llxl l E 0;
2. x
E fL(*E) , x E fL(*E);
i. e.,
x
is infinitesimal if llxl l
3. x E ns(*E) , i.e., x is nearstandard x :::::J a, which means x  a :::::J 0; 4. x E pns(*E), exists a E 17E
:::::J 0
in *JR; x
if there exists a
:::::J 0
E 17E
means
such that
i. e., x is prenearstandard if for all positive c E lR there such that ll x  a ll < c.
It follows from the above definitions that ns(*E) c;; pns(*E) and ns(*E) fin(*E) . In general, ns(*E) of: pns(*E) and ns(*E) of: fin(*E).
c;;
Theorem 1 9 . 1 . 2 Let (E, 11 · 11) be a real normed space. 1. E is complete if and only if pns(*E) ns (*E); =
=
2. E is finite dimensional if and only if fin(*E) ns(*E); 3. A c;; E is compact if and only if *A c;; ns(*E) and st(*A) = A . The nonstandard extension of N, *N, is the set of hypernatural numbers and *N= denotes the set of infinite hypernatural numbers. It is also useful to keep in mind the following properties of sequences and functions in a real normed space (E,
11 · 11) .
Proposition 19. 1 . 1 Suppose ( xn ) n EN is a sequence in E. Then 1. ( X n ) n E N is bounded if and only if Vn E *N= X n E fin(*E);
2. ( xn ) nEN converges to a E E if and only if Vn E *Noc X n :::::J a; 3. ( x n ) n EN has a convergent subsequence if and only if ::lm Xrn E ns(*E).
E *N=,
Theorem 1 9 . 1 . 3 Let E and F be two real normed spaces and f E + F . The function f is continuous o n a E E if and only if
Vx E *E [ x :::::J a =;. f(x)
:::::J
f(a)] .
273
19.2. The PalaisSmale condition
19.2
The PalaisSmale condition
:t\'Iany results in Critical Point Theory involve the following condition, orig inally introduced in 1 964 [10] by Palais and Smale:
If E is a real Banach space, the C 1 junctional j satisfies the PalaisSmale condition iffor all ( Xn ) n EN E EN ,
Definition 1 9 . 2 . 1
:
E + lR
(f(xn )) n EN is bounded and nlim HXJ j'(x n ) = 0 =? ( Xn ) n EN has a convergent subsequence.
(PS)
Suppose (E, 1 1 · 1 1 ) is a real Banach space with continuous dual E' and duality pairing ( ·, · ) . Let C 1 ( E, lR) denote the set of continuously Frechet differentiable functionals defined on E; a is critical point ( resp. almost critical point) of f if f'(a) = 0 (resp. f'(a) :::::J 0) , i.e . , if (j'(a), v) = 0 (resp. (j'(a), v) :::::J 0) for all v E E (resp. v E fin(*E) ) : j(a ) is a critical value of f if j 1 (J(a)) contains critical points. If f : E + lR is Frechet differentiable we will denote by K the set of all critical points of f, that is,
K = {x E E : j ' (x) = O} and, for each that is,
c
E R
Kc
will denote the set of critical points with value
Kc = {x E E : .f ' (x) = 0 1\ f(x) = c} = K n f 1 (c) .
The following is easy to prove.
Su,ppose that f E C1 (E, IR) satisfies (PS) . Then 1. If f is bounded, K is a compact set. 2. For each a, b E lR such that a ::; b, {u E E : a :s; j(u) :s; b 1\ j'(u) = O} = f 1 ( [a, b] ) n K
Proposition 19.2.1
is a compact set. Note that
f satisfies and
(PS) and
K is compact
K is compact and f is bounded
c;l?
as can be seen with the following two examples.
c;l?
f is bounded
f
satisfies (PS ) ,
c,
19.
274
Nonstandard PalaisSmale conditions
Example 1 9 . 2 . 1 The real function f( x ) = x3 ( x E IR) satisfies (PS), is compact and f is not bounded . Example 19.2.2 Let
{ 2;2
x2 if X E if X tt
[  1 , 1] [ 1 , 1]
f is bounded, K = {0} but f does not satisfies
(PS ) .
f (X)
=
K = {0}
Example 19.2.3 The real functions exp ( x ) , cos ( x ) , sin ( x ) and all the con stant functions defined on lR do not satisfy (PS ) . Now we will present an important class of functionals that satisfies ( PS ) . Proposition 19.2.2 C1 (E, IR) is coercive fies (PS ) .
If E is a finite dimensional real Banach space and f E (i. e. f(x) + +oo whenever ll x ll + +oo), then f satis
Proof. Let (x n ) n EN C:.: E be such that (f(x n ) ) n EN is bounded and limn _, = f'(x n ) = 0. Then, for all n E *Noc, f(x n ) E 0 ( Proposition 19. 1 . 1 ) . Since f is coercive, for all n E *Noc , X n E fin(*E) . E is finite dimensional then , by Theorem 19.1 .2, fin ( *E ) = ns ( *E ) and therefore, for all n E *N= , X n E ns ( *E ) . Finally, Proposition 19. 1 . 1 implies that (x n ) n EN has a convergent subsequence. D
19.3
Nonstandard PalaisS male conditions
The following definition often shortens statements
Suppose f E C1 (E, IR) . We say that a sequence (un ) n EN is a PalaisSmale sequence for f zf
Definition 19.3.1
(j(un )) n EN is bounded and nlim >CXJ j'(un ) = 0. Therefore,
f satisfies the PalaisSmale condition ( PS ) if every PalaisSmale se quence for f has a convergent subsequence.
275
19.3. Nonstandard PalaisSmale conditions
Suppose E is a real Banach space and f E C 1 (E, IR) . Next we present some nonstandard variations of (PS ) : (PSO)
( un ) n EN is a PalaisSmale sequence U::lm E *N= Um E ns (*E)
(PS1 )
f (u) E 0 1\ f'(u)
f(u) E 0 1\ f'(u) :::::J 0 Uu E fin(*E) 1\ st ( f (u ) ) is a critical value of f
(PS2)
(PS3)
(PS4)
(un ) n EN
( Un) n EN is a PalaisSmale sequence Uis bounded 1\ Vn E *N= st (f(un ) ) is a critical value of f
(un ) n EN
( un ) n EN is a PalaisSmale sequence Uis bounded 1\ ::l n E *N= st (f (un ) ) is a critical value of f
Proposition 19.3.1 If f E C 1 (E, IR) (PS1 )
:::::J 0 =? u E ns(*E)
<=?
[f (u) E
0 1\ j' (u)
:::::J
0
then
=?
u E ns(*E)
1\ st (f(u))
is a critical value
]
of f .
Proof. The implication ¢== is trivial. For the proof of the other implication, suppose that u E *E is such that f (u) E 0 and f'(u) :::::J 0. By (PS1 ) , there exists a E 17E such that u :::::J a and, therefore. from the continuity of f and f' it follows that (Theorem 1 9 . 1 .3) f (a) too, so that f (a)
=
:::::J f ( u)
and f' (a)
:::::J f' ( u) :::::J 0
st (f(u)) and f (a) is a critical value of f.
Nonstandard versions of (PS) and (PS) itself are related as follows.
D
276
19. Nonstandard PalaisSmale conditions
Theorem 19.3.1 (PS1)
For any real Banach space =;.
(PS)
¢?
(PSO)
=;.
(PS2)
E we have ¢?
(PS3)
=;.
(PS4) .
Proof. It is clear that (PS) ¢? (PSO), (PS1) =;. (PSO) and (PS3) therefore (PS 1 ) =;. (PS) ¢? (PSO) and (PS3) =;. (PS4)
*E
=;.
(PS4) ,
are true. First we prove that (PSO) =? (PS2 ) . Let u E such that f(u ) E 0 and For each f ' (u) :::::J 0. Fix M E JR+ such that l f (u) l < M. Suppose u � n E N, define the standard set
Hn { x E E : :=
Since, for each
fin(*E).
�
IJ ( x ) l < M 1\ ll f' ( x ) ll <
1\ l l x l l > n } ·
n E N,
*Hn = { x E *E : l f( x) l < M 1\ ll f' ( x ) ll < � 1\ ll x l l > n } . we conclude that *Hn # 0 and the Transfer Principle says that Hn # 0. For each n E N, take Xn E Hn. Then ( x n)n E N is a PalaisSmale sequence but, for all n * N oc , Xn � fin(*E), and hence. for all n * N = , Xn � ns(*E). This i s a contradiction with (PSO) and therefore u E fin(*E). u
E
E
E
Now we will prove that if f(u) critical value of f. Let a = st (f (u) ) and, for each n
Fn {X E E : :=
Since, for each
E
0 and
f ' (u) :::::J 0, then st (j(u)) i s a
E N, define
I J ( x ) l < M 1\ I I J' ( x ) ll <
�
1\ I a  f( x ) l
<
� }.
n E N,
*Fn {X E *E : IJ ( x ) l < M 1\ ll f' ( x ) ll < � 1\ lo:  f(x) l < �} then , *Fn # 0 and therefore Fn # 0 . For each n E N take Xn E Fn. Then ( x n)n E N is a PalaisSmale sequence u
E
=
and from (PSO) we conclude
Since f
E C 1 (E, IR) , a :::::J f( xm ) :::::J j( a ) and 0 :::::J j' ( xm )
:::::J
j' ( a )
.
277
19.3. Nonstandard PalaisSmale conditions Therefore a = j(a) , f'(a) is proven. Lets prove that ( P 52) Then,
=
0,
a
is a critical value of f and (PSO)
=? (PS2)
=? ( P 53) . Let ( un ) n EN be a PalaisSmale sequence.
From (PS2) we conclude that
Vn E *N= [ un E fin(*E) 1\ st (j(un ) ) so we may conclude that
is a critical value of f ]
( un ) n EN is bounded and
Vn E *N= st(j(u n ) )
is a critical value of
f.
Finally we will prove that ( P 53) =? ( P 52) . Let u E *E such that f ( u) E 0 and f'(u) :::::J 0. Let M E JR+ such that l f(u) l < M. Suppose u � fin(*E) ; we construct a PalaisSmale sequence ( Xn )nEN using the sets H71 , as in the proof of the first part of the proof that ( P SO) =? ( P 52) , in such a way that ( Xn ) n EN is not bounded, which contradicts (PS3 ) . Suppose now that f ( u) E 0 and f' ( u) :::::J 0 . We need t o prove that a = st (j ( u)) is a critical value of f. .Just as in the second part of the proof of (PSO) =? (PS2 ) , we can build a PalaisSmale sequence (x n ) n EN such that for all n E *N= , j(x n ) :::::J o:. From (PS3) we conclude that for all n E *N= , D st(j(xn ) ) = a is a critical value of f. More can be said when
E is separable:
When E is a separable Banach space, (PS) and (PS1 ) are equivalent; in other words: if E is a separable Banach space, a C1 junction f : E + lR verifies the PalaisSmale condition if and only if almost critical points where f is finite are nearstandard. Theorem 19.3.2
Proof. Suppose f satisfies (PS) and u E *E is such that f(u) E 0 and f'(u) :::::J 0. If u � ns(*E) , it follows from Theorem 19. 1 . 2 that u � pns(*E), that is, ::lc E JR+ Vy E uE ll u  Y ll > c. Let V : = { Vp : p E N} be dense in E. We will construct a PalaisSmale sequence (xn ) n EN in E such that for all N E *N = , X N � ns(*E) which contra dicts (PS ) . Let M E JR+ b e such that l f(u) l < M. For each n E N, define Cn
:=
{ x E E : IJ(x) l < M 1\ llf'(x) ll < � 1\ Vp E N [p :::; n =? llxvv ll > c ] } ·
278
19. Nonstandard PalaisSmale conditions
Since, for each n E N, u E *C71 , we conclude that *Cn # 0 and therefore Cn # 0 . For each n E N, take Xn E crl' Let N E *N = and v E 17E . Since v = E, there exists P o E N such that ll v  Vp0 II < � . Since we conclude that x N � ns(*E) which contradicts (PS ) .
D
It is not clear whether this equivalence is true in general. The following result is consequence of the fact that if E is finite dimensional, then E is separable and fin(*E) = ns(*E ) . Theorem 19.3.3 ( PS 1 )
If E is finite dimensional ¢?
(PS)
¢?
(PSO) ¢? (PS2) ¢? (PS3) ¢? (PS4) .
Another easy observation:
Any C1 functional in a Banach space which verifies (PS4) and admits a PalaisSmale sequence, has at least one critical point.
Proposition 19.3.2
Proof. Suppose that ( un ) nEN is a PalaisSmale sequence for the functional f E C 1 (E, IR) . Then there exist m E *N= such that st ( f ( um) ) is a critical value of f. Hence, there exists a E E such that j(a) st (j(um ) ) and f'(a) 0. D =
=
Next we present an example which shows that
(PS2) f? ( PS 1 ) . Example 19.3. 1 Let H be an infinite dimensional Hilbert space and define j : H + lR x f+ f ( x ) = g ( ll x ll 2  1 ) where g : lR + lR i s given by
g(t)
�
{
0 t 2 exp
_
__L t2
if t :<.:: 0 if t > 0
Observe that g is a C1 function and
g ' ( t) :::::: 0 Also,
¢}
h:H x
[ t :<.:: 0
v
t
+ lR f+
llxll2  1
::::::
0 ].
( 1 9.3 . 1 )
19.3.
279
Nonstandard PalaisSmale conditions
is a C1 functional and \fa E H
Vx E H
(h'(a) , x)
=
•
2a x
where • denotes the inner product in H. Therefore, f is a C1 functional. We will prove that f does not satisfy ( P 51) but satisfies ( P 52) . By Theorem 19.1.2 we can take u E *H such that u E fin(*H) \ ns(*H) and l l u l l = 1 . Hence, j(u) = 0 and f'(u) = 0, which shows that f does not satisfy (P51 ) . Note that v tf_ fin(*H) =? f(v) tf 0 ·
·
and
f'(v) :::::J 0
<=?
<=?
ll f'(v) l l :::::J 0 Vx E fin(*H) (f'(v), x)
=
(2v x)g'(l l v l l 2  1 ) :::::J 0. •
(19.3.2)
Next we will prove that
f'(v) :::::J 0 =? [ l lv l l
�
1
V
llvll :::::J 1 ] .
(19.3.3)
f'(v) :::::J 0 and v # 0, by (19.3.2) either 2v ll�l l :::::J 0, and thus ll v ll :::::J 0, or g '(llvll2  1) :::::J 0, so that, by ( 19.3. 1 ) , llvll � 1 or llvll :::::J 1; in all possible cases ( 19.3.3) holds. •
If
Since
[ llv ll
<:::
1
and 0 is a critical value of
j,
we may conclude that
f(v)
E 0 1\
proving that
V
ll v ll :::::J 1 ] =? f(v) :::::J 0
j'(v) :::::J 0 =? v E fin(*H) f
does satisfy
1\
st(f(v))
0 is a critical value of f
( P 52) .
Remark 19.3. 1 Other consequences of Example 1 . I f we assume H to b e separable, Theorem
(P52) 2.
=
Moreover, the fact that that the sets
19.3 . 1 :
19.3.2 shows that
f? (PS ) .
f E C1 (E, IR) and satisfies (P52), does not imply (a
are compact (see Proposition
19.2. 1 ) .
<::: b)
19.
280
Nonstandard PalaisSmale conditions
19.1.1
It follows easily from Proposition lent to
that condition (PS3) is equiva
( un ) n EN
(un ) n EN
is a PalaisSmale sequence Uis bounded 1\ all convergent subsequences of converge to a critical value of f
(j(un ) ) nEN
and (PS4) is equivalent to
( un ) n EN
( un ) n EN
is a PalaisSmale sequence Uis bounded 1\ there is a subsequence of (!( un ) ) n EN which converges to a critical value of f.
Moreover, in the finite dimensional case, these two standard conditions are equivalent .
19.4
PalaisSmale conditions per level
In this section we present a weaker compactness condition for C1 functionals introduced in 1980 [3] by Brezis, Caron and Nirenberg. In the survey book [7] the reader can find more variants of the (PS ) condition. c
Suppose f E C1 (E, IR) and E R We say that (un ) n E N is a PalaisSmale sequence of level {for f) if
Definition 19.4.1
c
lim f(un ) nj . CXJ
=
lim f'(un ) c and nj . CXJ
=
0.
f satisfies the PalaisSmale condition of level (PS )c, if every Palais Smale sequence of level has a convergent subsequence. c,
c
Remark 19.4.1 Suppose that
f satisfies
(PS ) , then
f E C1 (E, IR) .
f satisfies
Then
(PS) c for all
c
1.
If
2.
If f satisfies (PS) c, then the set of critical points of value compact.
E
R c,
Kc , is
Example 19.4. 1 The function exp(x) : lR ____, lR satisfies (PS ) c for all c except for c = 0. The real functions sin( x) and cos( x) defined in lR satisfy (PS )c for all c except for c = 1 and c =  1 .
19.5. 19.5
281
Nonstandard variants of PalaisSrnale conditions per level
Nonstandard variants of PalaisSmale conditions per level
As above, E is a real Banach space, Obvious adaptations of conditions following conditions per level:
(PSO)c
( P 51 )
( un ) n EN
c
(PS2)c Note that if
and
c
and
ER (PS2)
is a PalaisSmale sequence of level U::lm E *N= Um E ns(*E)
f(u) :::::J
c
1\
1\
c
=
c
1\ U
c
f'(u) :::::J 0
st (f(u) )
is a critical value of f
( un ) n EN is a PalaisSmale sequence of level
hence, the adaptation of conditions alent to the standard condition:
provide the
f'(u) :::::J 0 =? u E ns(*E)
f(u) :::::J u E fin(*E)
f E C1 (E, IR) (PSO) , (PS1)
c,
then
(PS3) and (PS4) to this context are equiv
is a PalaisSmale sequence of level c U( un ) n EN is bounded 1\ c is a critical value of f.
( un ) n E N
19.3.1, 19.3.2 and 19.3.3 can be easily proven. Theorem 1 9 . 5 . 1 For any real Banach space E we have (PS1)c =} (PS)c B (PSO)c =} (PS2)c• Theorem 19.5.2 Suppose E is a real separable Banach space and f E C 1 (E, IR) . Then f satisfies (PS)c if and only if f satisfies (PS1 )c. Theorem 19.5.3 If E has finite dzmension, then (PS1)c
=
19.
282
19.6
Nonstandard PalaisSmale conditions
Mountain Pass Theorems
Some important results in Critical Point Theory, such as the Mountain Pass Theorems and some variants of Ekeland's Variational Principle, can be obtained using a deformation technique. This technique was introduced in 1934 by Lusternik and Schnirelman [8] and consists in deforming a given C 1 functional outside the set of critical points. In 1983 Willcm proved in [14] the Quantitative Deformation Lemma; a very technical lemma that involves the concept of pseudogradient vector field (sec [14] or [9] ) . For
S s;;; E , a E JR+
and
c E IR,
r : = {x E E : f(x) :S c} where
we usc the following notations and
Sa := {x E E : dist(x, S) :S a}
dist(x, S) = inf { l l x  Y ll : y E S}.
Lemma 19.6.1 (Quantitative Deformation Lemma) S S: E, c E IR, c, 6 E JR+ be such that
\iy E f 1 ( [c  2c, c + 2r::l ) n S25 x
Let f E C1 (E, IR) ,
[ ll f' (y) ll 2 8; ] .
Then there exists rJ E C( [O, 1] E, E) such that 1. rJ(O, y) y; 2. rJ(t, y) = y if y It' f 1 ( [c  2r:: , c + 2r:: ] ) n S2s; =
3.
rJ(1, r+E n S)
s;;;
r E ,.
for all t E [0, 1] , rJ(t, ) : E + E is a homeomorphism; 5. \it E [0, 1] , \iy E E ll rJ (t, y)  Y l l :S 6 ; 6. for each y E E, f ( rJ( y)) is non increasing;
4.
·
·,
7. \it E ]0, 1] , \iy E r n Ss f(rJ(t, y))
< c.
An easy consequence of the Quantitative Deformation Lemma is the fol lowing variant of Ekeland's Variational Principle (see [9, page 14] ) . Corollary 19.6.1 Let f E C 1 (E, IR) b e bounded from below. Then, for any c E JR+ , there exists u E E such that
f(u)
:<:: inf
xE E
f(x) + c and ll f '( u) ll
< vfc.
19.6.
283
Mountain Pass Theorems
Applying the Transfer Principle to Corollary Corollary 19.6.2 Let f a point u E *E such that
19.6.1 we obtain
E C1 (E, IR) be bounded from below. Then there exists
f(u) :::::J inf f(x) and j'(u) :::::J 0. x EE Now we can deduce =
Let f E C 1 (E, IR) be bounded below and c infxE E f(x) . If f satisfies (PS2)c then c is a critical value of f. Proof. By Corollary 19.6.2 we conclude that there exists u E *E such that f(u) :::::J c and f'(u) :::::J 0. Since f satisfies (PS2)c, c st(f(u)) is a critical value of f. D Theorem 19.6.1 is a generalization of the classical result ( see [7, page 16] ) : Theorem 19.6.1
=
Theorem 19.6.2 Let f E C1 (E, IR) be bounded f satisfies ( PS )c then c zs a cntical value of f.
below and c
=
infxEE f(x) .
If
We now state the Mountain Pass Theorem introduced by Ambrosetti and Rabinowitz in 1973 [1]. Theorem 19.6.3 (Mountain Pass Theorem, AmbrosettiRabinowitz) Let E be a real Banach space and f E C 1 ( E, lR) . Suppose that 1.
there exists e E E and r E JR+ such that llell > r and max{f(O), f( e ) } < b inf f(x); :=
[[ x [[=r
:
2. f = {r E C( [0, 1], E) : 1 (0) = 0 A r(1) = e } and c = inf max f(r(t) ); 1Er tE[O,l] 3. f satisfies (PS) . Then c :2: b and c is a critical value of f . We usually say that if f : E + lR satisfies condition 1 of Theorem 19.6.3, then f satisfies the mountain pass geometry ( with respect to 0 and e) . Remark 19.6.1 Conditions 1 and 2 of Theorem 19.6.3 are not enough to imply that c is a critical value of f as we can see with the following example ( sec [7, page 36] ) . The function f : IR2 + lR defined by f( x, y) = x2 + ( x + 1 )3y 2 satisfies the mountain pass geometry with 0 = (0, 0), e = ( 2, 3) and r = ! · (0, 0) is a strict local minimizer and is the only critical point of f. Therefore, there is no z E IR2 such that f(z) = c > 0 and f'(z) = 0.
19.
284
Nonstandard PalaisSmale conditions
In 1980 Brezis, Caron and Nirenberg obtained in [3] a generalization of the Mountain Pass Theorem of AmbrosettiRabinowitz for functionals satisfying the (P S) c condition: Theorem 19.6.4 (Mountain Pass Theorem, Brezis CoronNirenberg) Let E be a real Banach space and f E C1 ( E, lR) . Suppose that 1.
there exists e E E and r E JR+ such that li e I I max{f(O) , f(e)}
2.
=
r {r E C( [O, 1] , E) : 1(0)
=
<
0 !\ 1(1)
b
:=
=
>
r and
inf f(x); ll x ll =r
e} and
c :=
inf max
1Er tE[ O .l]
f(r(t)) ;
f satisfies (PS)c · Then 2: b and is a critical value of f . 3.
c
c
This result and the Mountain Pass Theorem of AmbrosettiRabinowitz are easy consequences of the Quantitative Deformation Lemma, since it is possible to prove that for every C 1 functional that satisfies conditions 1 and 2 of both theorems. there exists a PalaisSmale sequence of level c (see [9, page 19]). We now deduce the following generalization of the Mountain Pass Theorem of BrczisCoronNirenberg. Theorem 19.6.5
pose that 1.
Let
E
be a real Banach space and f E
there exists e E E and r E JR+ such that li e I I max{f (O), f(e)}
2. f 3.
Then
=
{r E C( [O, 1] , E) : 1(0)
f satisfies c
2:
=
<
0 !\ 1(1)
b =
:=
>
C1 (E, IR) .
Sup
r and
inf f(x); ll x ll =r
e } and
c :=
inf max
1Er tE[O,l]
f(r(t) ) ;
(PS4) c · c
b and is a critical value of f.
Proof. From the Quantitative Deformation Lemma there exists a PalaisSmale sequence of level c . Since f satisfies (PS4) c, c is a critical value of f. D
285
References
References [1] A. A!VI BROSETTI and P . RABINOWITZ, " Dual vatiational methods in crit ical point theory and applications", J. Funct . Anal., 14 (1973) 34938 1 . [2] LEIF 0 . ARKERYD, NIGEL J . CUTLAND and C . WARD HENSON (eds . ) , Nonstandard Analysis, Theory and Applications, Kluwer, 1997. [3] H. B REZIS, J . M . C oRON and L. N IRENBERG, "Free vibrations for a non linear wave equation and a theorem of P. Rabinowitz", Comm. Pure Appl. Math. , 33 ( 1980) 667684. [4] NIGEL J. CUTLAND (ed.) , CUP, 1988.
Nonstandard Analysis and its Applications,
[5] NIGEL J . CUTLAND, VfTOR NEVES, FRANCO OLIVEIRA and J OSE S ousA PINTO (cds.), Developments in nonstandard mathematics, Long man, 1995. [6] IVAR EKELAND "Nonconvex minimization Problems", Bulletin (New Se ries) of the American l\Iathematical Society, 1 ( 1979) .
An In troduction to Minimax Theorems and their Applications to Differential Equations, Kluwer Academic Publishers, 200 1 . L . LUSTERNIK and L . SCHNIRELMANN, Methodes topologiques dans le problemes variationnels, Hermann, Paris, 1934. .JEAN MAWHIN. Crztical Point Theory and Applicatwns to Nonlinear Dif ferential Equations, VIGRE Minicourse on Variational Methods and Non
[7] MARIA DO RosARIO GROSSINHO and STEPAN AGOP TERSIAN,
[8] [9]
linear PDE, University of Utah, 2002 . [10] R . S . PALAIS and S . SMALE, "A generalized Morse theory", Bull. Amer. Math. Soc., 70 (1964) 165172.
Minimax Methods in Critical Point Theory with Applicatwns to Dzfferentwl Equations, CBl\IS Regional Conference, 65,
[ 1 1] PAUL H. RABINOWITZ,
AMS, Providence, Rhode Island, 1986. [12] K . D . STROYAN and W . A . .J . LUXEMBURG, Infimtesimals, Academic Press. 1976.
Introduction to the theory of
[13] V. NEVES, The power of Gateaux differentiability, this volume. [14] M . WILLEM, Lectures on Critical Point Theory, Trabalho de Matermitica 199, Fund. Univ. Brasilia, Brasilia. 1983.
Averaging for ordinary different ial equat ions and functional differential equat ions Tewfik Sari* Abstract A nonstauclard approach to averaging theory for ordiuary di fferential equations and functional differential equations is developed. We define a notion of perturbation and we obtain averaging results under weaker conditions than the results in the literature. The classical averaging the
orems apptoximate the solutions of the system by th e solutions of the averaged system, for L ipschitz continuous vector fields, and when the so lutions exist on the same interval as the �;olutions of the averaged system. vVe extend these resnlt.s to perturbations of vector fields which are uui
tormly continuous in the spatial variable with l'E'Spect to the time variable and without any restriction on the interval of existence of the solution.
20 . 1
Introduction
In the early seventies, Georges R.ceb, who learned about Abraham Robin son's Nonstandard A nalysis (NSA) [29[ , was convinced that NSA gives a lan guage which is well adapted to the study of perturbation theory of differen tial equations (see [6, p. 374[ or [251). The a,'domatic presentation Internal Set Theo1·y (IS T) [26[ of NSA given by E. Nelson corresponded more to the Reeb's dream and was in agreement with his con vi cti on "Les entie'f's na:i.fs ne remplissent pas N ". Indeed, no formalism can recover exactly all the actual phenomena, and nonslanda:rd objecls which may be considered as a. formal ization of non naive objects are already elements of our usual (standard) sets. We clo not need any use of sta.rs and enlarg ements. Thus, the Reebian school adopted IST. For more informations about Recb's dream and convictions sec the Recb's preface of Lut z and Gaze's book [25] , St.ewart,'s book [40] p. 72, or Lobry's book [23[. The Reebian school of nonstandard perturbation theory of differential eqv.a, tions produced various and numerous studies and new results as attested by a * univcrsiic de Haute Alsace, l'vlulhousc, Fra.11ce. T , Sari@uha . fr
20.2.
Deformations and perturbations
287
lot of books and proceedings (see [2, 3, 4, 7, 8, 9, 10, 12, 23, 25, 30, 37, 42] and their references) . It has become today a wellestablished tool in asymptotic theory, see, for instance, [17, 18, 20, 24, 39] and the fivedigits classification 34E18 of the 2000 Mathematical Subject Classification. Canards and rivers (or Ducks and Streams [7] ) are the most famous discoveries of the Reebian school. The classical perturbation theory of differential equations studies defor mations, instead of perturbations, of differential equations (see Section 2.1). Classically the phenomena arc described asymptotically, when the parameter of the deformation tends to some fixed value. The first benefit of NSA is a natural and useful notion of perturbation. A perturbed equation becomes a simple nonstandard object , whose properties can be investigated directly. This aspect of NSA was clearly described by E. Nelson in his paper Mathematical Mythologies [30] , p. 159, when he said "For me, the most exciting aspect of non
standard analysis is that concrete phenomena, such as ducks and streams, that classically can only be described awkwardly as asymptotic phenomena, become mythologized as simple nonstandard objects. "
The aim of this paper is to present some of the basic nonstandard techniques for averaging in Ordinary Differential Equations (ODEs) , that I obtained in [32, 36] . and their extensions, obtained by M. Lakrib [19] , to Functional Dif ferential Equations (FDEs) . This paper is organized as follows. In Section 20.2 we define the notion of perturbation of a vector field. The main problem of perturbation theory of differential equations is to describe the behavior of tra jectories of perturbed vector fields. We define a standard topology on the set of vector fields� with the property that f is a perturbation of a standard vector field fo if and only if f is infinitely close to fo for this topology. In Section 20.3 we present the Stroboscopic Method for ODEs and we show how to use it in the proof of the averaging theorem for ODEs. In Section 20.4 we present the Stroboscopic Method for FDEs and we show how to use it in the proofs of the averaging theorem for FDEs. The nonstandard approaches of averaging are rather similar in structure both in ODEs and FDEs. It should be noticed that the usual approaches of averaging make use of different concepts for ODEs and for FDEs: compare with [5, 31] for averaging in ODEs and [13, 14, 15, 22] for averaging in FDEs.
20.2 20. 2 . 1
Deformations and perturbations Deformations
The classical perturbation of differential equations
theory of differential equations x = F(x, c ) ,
studies families
(20.2. 1 )
20.
288
Averaging
where x belongs to an open subset U of JRn , called phase space, and c belongs to a subset B of JRk , called space of parameters. The family (20.2.1) of differential equations is said to be a kparameters deformation of the vector field P0 (x) = P(x, co), where co is some fixed value of c. The main problem of the perturbation theory of differential equations is to investigate the behavior of the vector fields P(x , c) when c tends to c0 . The intuitive notion of a perturbation of the vector field Po which would mean any vector field which is close to Po does not appear in the theory. The situation is similar in the theory of almost periodic functions which, classically, do not have almost periods. The nonstandard approach permits to give a very natural notion of almost period (see [16, 28, 33. 41] ) . The classical perturbation theory of differential equations considers deformations instead of perturbations and would be better called defoTmation theory of dijjeTential equations. Ac tually the vector field P(x, c) when c is sufficiently close to co is called a perturbation of the vector field P0 (x ) . In other words, the differential equation
x
=
Po (x)
(20.2.2)
is said to be the unperturbed equation and equation (20.2.1), for a fixed value of c, is called the perturbed equation. This notion of perturbation is not very satisfactory since many of the results obtained for the family (20.2. 1) of dif ferential equations take place in all systems that are close to the unperturbed equation (20.2.2). Noticing this fact, V. I. Arnold (see [1] , footnote page 157) suggested to study a neighborhood of the unperturbed vector field P0 (x) in a suitable function space. For the sake of mathematical convenience, instead of neighborhoods, one considers deformations. According to V. I. Arnold, the situation is similar with the historical development of variational concepts, where the directional derivative ( Gateaux differential) preceded the derivative of a mapping (Frcchet differential) . Nonstandard analysis permits to define a notion of perturbation. To say that a vector f is a perturbation of a stan dard vector field fo is equivalent to say that f is infinitely close to fo is a suitable function space, that is f is in any standard neighborhood of fo . Thus, studying perturbations in our sense is nothing than studying neighborhoods, as suggested by V. I. Arnold.
20.2.2
Perturbations
Let X be a standard topological space. A point x E X is said to be infinitely close to a standard point x0 E X, which is denoted by x c::: x0, if x is in any standard neighborhood of x0. Let A be a subset of X . A point x E X is said to be nearstandard in A if there is a standard x0 E A such that x c::: x0.
20.2.
289
Deformations and perturbations
Let us denote by
N SA
=
{x E X : :3st xo E A x
c:::
xo},
the externalset of nearstandard points i n A [34] . Let E b e a standard uniform space. The points x E E and y E E are said to be infinitely close, which is denoted by x c::: y, if (x, y) lies in every standard entourage. If E is a standard metric space, with metric d, then x c::: y is equivalent to d(x , y) infinitesimal.
Let X be a standard topological space X. Let E be a standard uniform space. Let D and Do be open subsets of X, Do standard. Let f : D + E and fo : Do + E be mappings, fo standard. The mapping f is said to be a perturbation of the mapping fo, which is denoted by f fo, if N SDo C D and f(x) fo (x) for all x E N SDo . Let :Fx,E be the set of mappings defined on open subsets of X to E : Definition 1
c:::
c:::
:Fx , E = { (!, D) : D open subset of X
and
f : D + E} .
Let us consider the topology on this set defined as follows. Let The family of sets of the form
(fo, Do) E :Fx,E .
{(!, D) E :Fx.E : K C D Vx E K (j(x), fo (x)) E U}, where K is a compact subset of Do, and U is an entourage of the uniform space E, is a basis of the system of neighborhoods of (!0, D0 ). Let us call this topology the topology of uniform convergence on compacta. If all the mappings are defined on the same open set D, this topology is the usual topology of uniform convergence on compacta on the set of functions on D to E.
Assume X is locally compact. The mapping f is a perturbation of the standard mapping fo if and only if f is infinitely close to fo for the topology of uniform convergence on compacta.
Proposition 1
Proof. Let f : D + E be a perturbation of fo : Do + E. Let K be a standard compact subset of D0. Let U be a standard entourage. Then K C D and f(x) c::: fo (x) for all x E K. Hence (j(x), fo (x)) E U. Thus f c::: fo for the topology of uniform convergence on compacta. Conversely, let f be infinitely close to fo for the topology of uniform convergence on compacta. Let x E N SDo . There exists a standard xo E Do such that x c::: xo . Let K be a standard compact neighborhood of x0, such that K C Do (such a neighborhood exists since X is locally compact) . Then x E K C D and (j(x) , j0 (x)) E U for all standard entourages U, that is N sDo C D and f ( x) c::: fo ( x) on N sDo . Hence f is a perturbation of fo. D
20.
290
Averaging
The notion of perturbation can be used to formulate Tikhonov's theorem on slow and fast systems whose fast dynamics has asymptotically stable equi librium points [24] , and Pontryagin and Rodygin's theorem on slow and fast systems whose fast dynamics has asymptotically stable cycles [39] . In the fol lowing section we use it to formulate the theorem of Krilov, Bogolyubov and Mitropolski of averaging for ODEs. All these theorems belong to the singular perturbation theory. In this paper, by a solution of an Initial Value Problem (IVP) associated to an ODE we mean a maximal (i.e. noncontinuable) solu tion. The fundamental nonstandard result of the regular perturbation theory of ODEs is called the Short Shadow Lemma. It can be stated as follows [36, 37] . Let g : D + IRd and g0 : Do + IRd be continuous vector fields, D, Do C 0 IR+ x IRd . Let ag and a be initial conditions. Assume that g0 and a8 are standard. The IVP =
dXjdT g(T, X), X(O)
=
a0
(20.2.3)
is said to be a perturbation of the standard IVP =
=
(20.2.4) dXjdT go (T, X), X(O) a8, 0 if g g0 and a ag. To avoid inessential complications we assume that equation dX/ dT g0 (T, X) has the uniqueness of the solutions. Let ¢0 be the solution of the IVP (20.2.4) . Let I be its maximal interval of definition. Then, by the following theorem any solution of problem (20.2.3) also exist on I and is infinitely close to ¢o . Theorem 1 (Short Shadow Lemma) Let problem (20. 2. 3) be a perturba tion of problem (20. 2.4). Every solution ¢ of problem (20. 2. 3) is a perturbation of the solution c/Jo of problem (20. 2.4), that is, for all nearstandard t in I, ¢(t) is defined and satisfies ¢(t) ¢o (t). Let us consider the restriction 1jJ of ¢ to NsI. By the Short Shadow Lemma, for standard t E I, it takes nearstandard values 1/J(t) ¢0 (t) . Thus its shadow, which is the unique standard mapping which associate to each standard t the standard part of 1jJ ( t) , is equal to ¢0 . In general the shadow of ¢ is not equal to ¢0 . Thus, the Short Shadow Lemma describes only the "short time behaviour" "='
"='
=
"='
"='
of the solutions.
20.3
Averaging In ordinary differential equations
The method of averaging is wellknown for ODEs. The fundamental re sult of this theory asserts that , fm small c > 0, the solutions of a nonau tonomous system
x = f (tjr:: , x, r:: ) ,
where
x = dxjdt,
(20.3. 1)
291
20.3. Averaging in ordinary differential equations are approximated by the solutions of the averaged autonomous system iJ
=
F (y) ,
where
F(x)
=
1 lim + T =T
1 f(t, x, T
0
0)
dt.
(20.3.2)
The approximation of the solutions of (20.3. 1) by the solutions of (20.3.2) means that if x ( t , c ) is a solution of (20.3. 1 ) and y(t) is the solution of the averaged equation (20.3.2) with the same initial condition, which is assumed to be defined on some interval [0, T] , then for c c:::: 0 and for all t E [0, T] , we have x(t, c ) c:::: y(t). The change of variable z ( T ) = x(u) transforms equation (20.3. 1) into equation 1 (20.3. 3) z = c f ( T, z , c ) . where z ' = dz j dT . Thus, the method of averaging can be stated for equation (20.3 .3), that is, if c is infinitesimal and 0 ::; T ::; T/c then z ( T, c) c:::: y(cT) . Classical results were obtained by Krilov, Bogolyubov, Mitropolski, Eck haus, Sanders, Verhulst (see [5, 31] and the references therein). The theory is very delicate. The dependence of f ( t, x, c ) in c introduces many complications in the formulations of the conditions under which averaging is justified. In the classical approach, averaging is justified for systems (20. 3 . 1 ) for which the vector field f is Lipschitz continuous in x. Our aim in this section is first to formulate this problem with the concept of perturbations of vector fields and then to give a theorem of averaging under hypothesis less restrictive than the usual hypothesis. In our approach, averaging is justified for all perturbations of a continuous vector field which is continuous in x uniformly with respect to t. This assumption is of course less restrictive than Lipschitz continuity with respect to x.
20.3.1
KBM vector fields
Let U0 be an open subset of IRd . The continuous vector field fo : IR+ x U0 + JRd is said to be a KrilovBogolyubovMitropolski (KBM) vector field if it satisfies the following conditions Definition 2
1.
The function x + fo ( t, x) is continuous in x uniformly with respect to the variable t.
2. For all x E Uo the limit F(x) 3.
=
=
limr+ = � J0T fo(t, x)
dt exists.
The averaged equation y(t) F(y(t)) has the uniqueness of the solution with prescribed initial condition.
20.
292
Averaging
Notice that, in the previous definition, conditions (1) and (2) imply that the function F is continuous, so that the averaged equation considered in condition (3) is well defined. In the case of non autonomous ODEs, the definition of a perturbation given in Section 20.2 must be stated as follows. Definition 3 Let U0 and U be open subsets of IRd . A continuous vector field f : IR+ x U + IRd is said to be a perturbation of the standard continuous vector x
field fo : IR+ Uo + JRd if U contains all the nearstandard points in Uo , and f(s, x) fo (s, x) for all s E IR+ and all nearstandard x in Uo. c:::
Theorem 2 Let fo : IR+ x U0 + IRd be a standard KBM ao E Uo be standard. Let y( t) be the solution of the IVP
y (t)
=
F(y(t)),
y(O)
=
vector field and let (20.3.4)
ao,
defined on the inteTval [0, w[, 0 < w < oo. Let f IR+ U + JRd be a perturbation of fo. Let c > 0 be infinitesimal and a ao. Then every solution x(t) of the IVP (20.3.5) x(O) = a, x(t) f (tjr:: , x(t)) ' is a perturbation of y ( t), that is, for all nearstandard t in [0, w [, x( t) is defined and satisfies x(t) y(t). c:::
:
x
=
c:::
The proof, in the particular case of almost periodic vector fields, is given in Section 20.3.4. The proof in the general case is given in Section 20.3.5.
20.3.2
Almost solutions
The notion of almost solution of an ODE is related to the classical notion of c almost solution.
function x( t) is said to be an almost solution of the standard differential equation x G(t, x) on the standard interval [0, L] if there exists a finite sequence 0 to < < t N + 1 L such that for n 0, , N we have
Definition 4 A
=
=
c:::
·
·
·
tn+ l tn, x(t) and x(tn+ l )  x(tn) tn+ l  tn
=
=
·
c:::
x(tn) for t E [t n , t n+ l ] ,
c:::
G(tn , x(tn )).
·
·
The aim of the following result is to show that an almost solution of a standard ODE is infinitely close to a solution of the equation. This result which was first established by .J. L. Callot (sec [1 1 , 27]) is a direct consequence of the nonstandard proof of the existence of solutions of continuous ODEs [26] .
293
20.3. Averaging in ordinary differential equations
Theorem 3 If x ( t) is an almost solution of the standard differential equation x = G(t, x) on the standard interval [0, L] , x(O) y0, with y0 standard, and the IVP y = G(t, y), y(O) = y0, has a unique solution y(t), then y(t) is defined at least on [0, L] and we have x(t) y(t), for all t E [0, L] . "='
"='
D
Proof. See [ 1 1 , 36]
Let us apply this theorem to obtain an averaging result for an ODE which does not satisfy all the hypothesis of Theorem 2. Consider the ODE (see [ 1 1 , 27. 36] )
tx . . X. ( t ) = Sln c
(20.3.6)
The conditions (2) and (3) in Definition 2 are satisfied with F(x) = 0. Thus, the solutions of the averaged equation arc constant. But condition ( 1 ) of the definition is not satisfied, since the function f(t, x) = sin (tx) is not continuous in x uniformly with respect to t. Hence Theorem 2 docs not apply. In fact the solutions of (20.3.6) are not nearly constant and we have the following result: Proposition 2 If c > 0 is infinitesimal then, in the region t 2 x > 0 solutions of (20. 8. 6) are infinitely close to hyperbolas tx = constant . In region x > t 2 0, they are infinitely close to the solutions of the ODE
the the
2 2 (20.3. 7) x = G(t, x) , where G(t, x) = vlx  t  x . t Proof. The isocline curves h = { ( t, x) : tx = 2k7rc} and I� = { ( t, x) : tx = (2k + �) 7rc} define, in the region t 2 x > 0, tubes in which the trajectories are trapped. Thus for t 2 x > 0 the solutions are infinitely close to the hyperbolas tx = constant. This argument does not work for x > t 2 0. In this region, we consider the microscope
T
=
t  tk
X
=
c ' where ( tk , X k ) are the points where a solution h· Then we have
X  Xk c
x( t)
' of (20.3.6) crosses the curve
X(O) = 0. By the Short Shadow Lemma (Theorem 1 ) , X(T) is infinitely close to a solution of dXjdT = sin (x k T + t k X). By straightforward computations we have
X X tk + l  tk
k k+l '
":::
G(t k , X k ) .
Hence, in the region x > t 2 0, the function x( t) is an almost solution of the ODE (20.3. 7) . By Theorem 3, the solutions of (20.3.6) are infinitely close to the solutions of (20.3.7). D
20.
294
20.3.3
Averaging
The stroboscopic method for ODEs
�+
In this section we denote by G : x D + function, where D is a standard open subset of function such that 0 E I C Definition 5 We say that respect to G if there exists
�+.
�d.�d
�d
a standard continuous Let x : I + be a
x satisfies the Strong Stroboscopic Property with > 0 such that for every positive limited t0 E I with x(to) nearstandard in D, there exists t1 E I such that fL < t1  to 0, [to , t1] c I, x(t) x(to) for all t E [to, t1], and fL
c::::
c::::
x(t l )  x(to) G ( t0, x t0 )) . ( t1  to The real numbers t0 and t1 are called successive instants of observation of the �
'''' _
stroboscopic method. Theorem 4 (Stroboscopic Lemma for ODEs) Let a0 E D be standard. Assume that the IVP y(t) G (t, y(t)), y(O) a0, has a unique solution y defined on some standard interval [0, L] . Assume that x(O) a0 and x satisfies the Strong Stroboscopic Property with respect to G. Then x is defined at least on [0, L] and satisfies x(t) y(t) for all t E [0� L] . =
=
c::::
c::::
Proof. Since x satisfies the Strong Stroboscopic Property with respect to G, it is an almost solution of the ODE x = G(t, x). By Theorem 3 we have x(t) c:::: y(t) for all t E [0, L] . The details of the proof can be found in [36] . D The Stroboscopic Lemma has many applications in the perturbation theory of differential equations (see [11, 32, 35, 36� 38 , 39]). Let us use this lemma to obtain a proof of Theorem 2.
20.3.4
Proof of Theorem 2 for almost periodic vector fields
·,
Suppose that fo is an almost periodic in t, then any of its translates fo (s + x0) is a nearstandard function, and fo has an average F which satis fies [16, 28, 33, 41]
s+Tfo (t, x) dt, 1 T>rx> T s s E �+ · F 1 1 s+ T F(x) T s fo (t, x) dt,
F(x) uniformly with respect to
=
1
lim 
Since
c:::: 
is standard and continuous, we have
(20.3.8)
20.3.
295
Averaging in ordinary differential equations
IR
for all s E + , all T ::::::' oo and all nearstandard x in U0 . Let x : I + U be a solution of problem (20.3.5) . Let t0 be an instant of observation: t0 is limited in I, and x0 = x(to) is nearstandard in U0 . The change of variables
X transforms
(20.3.5)
=
_x'(t o + cTc) _ x_o ' c _ _ __
_
into
dX/dT
=
f(s + T, xo + eX) ,
s
where
=
tojc.
By the Short Shadow Lemma ( Theorem 1 ) , applied to g(T, X) f(s + T, xo + eX) and go (T, X) fo ( s + T, xo) , for all limited T > 0, we have X(T) ::::::' J0Y fo (s + r, xo)dr. By Robinson's Lemma this property is true for some unlimited T which can be chosen such that cT 0. Define t1 t0 + cT. =
=
::::::'
Then we have
=
1
l
+ y 1 s Y ::::::' 1 0 fo( s + r, xo) dr = T s fo (t, xo) dt ::::::' F(xo). T T Thus x satisfies the Strong Stroboscopic Property with respect to F. Using the Stroboscopic Lemma for ODEs ( Theorem 4) we conclude that x(t) is infinitely close to a solution of the averaged ODE (20.3.4). x(t l )  x(to) t1  to
'''' =
20.3.5
X ( T)

Proof of Theorem 2 for KBM vector fields
Let fo be a KBM vector field. From condition (2 ) of Definition 2 we deduce + that for all s E + , we have F(x) = limy_, = � fss Y fo (t, x) dt, but the limit is not uniform on s . Thus for unlimited positive s , the property ( 20.3.8 ) docs not hold for all unlimited T, as it was the case for almost periodic vector fields. However, using also the uniform continuity of fo in x with respect to t we can show that (20.3.8 ) holds for some unlimited T which are not very large. This result is stated in the following technical lemma [36] .
IR
IR
IRd
Let g : + x M + be a standard continuous function where M is a standard metric space. We assume that g is continuous in m E M uniformly with respect to t E + and that g has an average G(m) limy_, = � J0Y g(t, m) dt. Let c > 0 be infinitesimal. Let t E + be limited. Let m be nearstandard in M . Then there exists a > c, a 0 such that, for all limited T 2 0 we have + 1 s YS g(r, m) dr ::::::' T G ( m ) , where s = tjc, S = o:jc. Lemma 1
S
1 8
IR
::::::'
IR
=
20.
296
2
The proof of Theorem found also in [36] .
Averaging
needs another technical lemma whose proof can be
Let g IR+ JRd JRd and h IR+ JRd be continuous functions. Suppose that g(T, T X) h(T) holds for all limited T E IR+ and all limited X E JRd , and J0 h ( r) dr is limited for all limited T E IR+ . Then, any solution X(T) of the IVP dXjdTT g(T, X), X(O) 0, is defined for all limited T E IR+ and satisfies X(T) "=' J0 h(r) dr. Lemma 2
:
:
+
x
"='
=
+
=
Proof of Theorem 2. Let x : I + U be a solution of problem (20.3.5). Let t0 E I be limited, such that x0 = x(t0) is nearstandard in U0. By Lemma 1 , applied to g = fo, G = F and m = x(to), there is a > 0, a "=' 0 such that for all limited T 2 0 we have
T 1 rs + S fo (r, xo) dr "=' TF (xo), s .}s
where
s = to/c, S = o:jc.
(20.3.9)
The change of variables
X(T) transforms
(20.3.5)
=
x (to + o:T)  xo 0:
into
dXjdT
=
f(s + ST, xo + o:X).
By Lemma 2, applied t o g(T, X) = f(s and (20.3.9) , for all limited T >
+ ST xo + o:X) and h(T) = fo (s + 0, we have T s +TS X(T) "=' fo (s + Sr, xo) dr = 1 fo (r, xo) dr ":' TF (xo). 0 s s Define the successive instant of observation of the stroboscopic method t1 by t1 = to + a. Then we have ST, xo),
lo
1
x(t l )  x(to) = X ( 1) "=' F(xo). t1  to Since t1  to = a > c and x(t)  x(to) = o:X(T) "=' 0 for all t E [to, t1] , we have proved that the function x satisfies the Strong Stroboscopic Property with respect to F. By the Stroboscopic Lemma, for any nearstandard t E [0 , w[, x(t) i s defined and satisfies x(t) "=' y(t). D
20.4.
297
Functional differential equations
20.4
Functional differential equations
Let C = C ([  r, 0] , IRd ) , where r > 0, denote the Banach space of continuous functions with the norm 11 ¢ 11 = sup{ II ¢(B) II : e E [  r, O] } , where 1 1 · 11 is a norm of JRd . Let L 2 t0. If x : [r, L] + IRd is continuous, we define Xt E C by setting Xt (B) = x(t + B), e E [  r, OJ for each t E [0, L] . Let g : JR + x C + lRd , (t, u) f+ g(t, v. ) , be a continuous function. Let ¢ E C be an initial condition. A Functional Differential Equation (FDE) is an equation of the form
x(t) = g (t, xt) ,
xo = ¢.
This type of equation includes differential equations with delays of the form =
x(t) G(t, x(t), x(t  r) ) , where G : IR+ IRd IRd + JRd . Here we have g(t, u ) G(t, u(O) , u(  r)). The method of averaging was extended [13, 22] to the case of FDEs x
x
=
the form where
of
(20.4. 1)
c is a small parameter.
In that case the averaged equation is the ODE
(20.4.2) where F is the average of of the form
f.
It was also extended
x(t)
=
f (t/c, x t ) .
[14]
to the case of FDEs
(20.4. 3)
In that case the averaged equation is the FDE
(20.4.4) Notice that the change of variables x(t) = z(t/c) does not transform equa tion (20.4. 1) into equation (20.4.3) , as it was the case for ODEs (20.3.3) and (20.3.1), so that the results obtained for (20.4. 1) cannot be applied to (20.4.3) . In the case of FDEs of the form (20.4. 1) or (20.4.3), the clas sical averaging theorems require that the vector field f is Lipschitz continuous in x uniformly with respect to t. In our approach, this condition is weak ened and we only assume that the vector field f is continuous in x uniformly with respect to t. Also in the classical averaging theorems it is assumed that the solutions z(T, c) of (20.4. 1) and y(T) of (20.4.2) exist in the same interval [0, T /c] or that the solutions x(t, c) of (20.4.3) and y(t) of (20.4.4) exist in the same interval [0, T] . In our approach, we assume only that the solution of the averaged equation is defined on some interval and we give conditions on the vector field f so that, for c sufficiently smalL the solution x(t, c) of the system exists at least on the same intervaL
298
20. Averaging
20.4.1
Averaging for FDEs in the form z'(r)
=
E:f (r, Zr )
c is a small parameter zo c/J . The change of variable x(t) z(t/c) transforms this equation in (20.4. 5) x(t) f (tjc, Xt ,c ) , x(t) c/J(t/c), t E [  cr, 0] , where Xt , E E C is defined by Xt ,E (B) x( t + cB) for B E [  r, 0] . Let f IR+ x C + JRd be a standard continuous function. We assume that (H1) The function f : u f (t, u) is continuous in u uniformly with respect to the variable t. (H2) For all u E C the limit F(u) limrHx> � J0T f (t, u) dt exist s. We identify IRd to the subset of constant functions in C , and for any vector c E JRd , we denote by the same letter, the constant function u E C defined by u(B) c, e E [  r, 0] . Averaging consists in approximating the solutions x(t, c) of (20.4. 5) by the solution y(t) of the averaged ODE (20.4. 6) y(t) F (y(t)) , y(O) ¢(0) . According to our convention , y(t) , in the righthand side of this equation. is the constant function u t E C defined by ut (e) y(t), e E [r, O] . Since F is We consider the IVP, where
=
=
=
=
=
:
f+
=
=
=
=
=
continuous, this equation is well defined. We assume that
( H 3)
The averaged ODE (20.4.6) has the uniqueness of the solution with pre scribed initial condition.
(H4) The function f is quasibounded in the variable u uniformly with respect to the variable t, that is, for every t E IR+ and every limited u E C, f (t, u) is limited in JRd . Notice that conditions (H1), (H2) and ( H 3) are similar to conditions (1 ) , (2) and (3) of Definition 2 . In the case of FDEs we need also condition (H4) . In classical words, the uniform quasi boundedness means that for every bounded subset B of C, f (IR+ x B) is a bounded subset of IRd . This property is strongly related to the continuation properties of the solutions of FDEs (see Sections 2 . 3 and 3. 1 o f [1 5] ) . Theorem 5 Let f : IR+ x C + IRd be a standard continuous function satisfying
the conditions (Hl )  (H4). Let cjJ be standard in C. Let L > 0 be standard and let y : [0, L] + IRd be the solution of (20.4. 6). Let c > 0 be infinitesimal. Then every solution x(t) of the problem {20. 4 . 5) is defined at least on [ cr. L] and satisfies x(t) "::: y(t) for all t E [0 , L] .
20.4.
299
Functional differential equations
20.4. 2
The stroboscopic method for ODEs revisited
In this section we give another formulation of the stroboscopic method for ODEs which is well adapted to the proof of Theorem 5. Moreover , this formulation of the Stroboscopic Method will be easily extended to FDEs ( see Section 20.4.4) . We denote by G : IR x IRd + IRd , a standard continuous function. Let x : I + JRd be a function such that 0 E I C IR .
+
+
We say that x satisfies the Stroboscopic Property with respect to G if there exists fL > 0 such that for every positive limited t0 E I, satisfying [0, t0] C I and x(t) is limited for all t E [0, t0] , there exists t1 E I such that fL < t1  to "::: 0, [to, t1] c I, x(t) "::: x(to) for all t E [to , t1], and
Definition 6
x(t l )  x(to) t l  to
'''.'
_
�
G ( t0, x ( to )) .
The difference with the Strong Stroboscopic Property with respect to G con sidered in Section 20.3.3 is that now we assume that the successive instant of observation t1 exists only for those values t0 for which x(t) is limited for all t E [0, t0] . In Definition 5, in which we take D = IRd , we assumed the stronger hypothesis that t1 exists for all limited to for which x(to) is limited.
Let ao E D be standaTd. Assume that the IVP y(t) G (t, y(t)), y(O) ao, has a unique solution y defined on some standard interval [0, L] . Assume that x(O) "::: ao and x satisfies the Stroboscopic Property with respect to G. Then x is defined at least on [0, L] and satisfies x(t) y(t) for all t E [0, L] .
Theorem 6 (Second Stroboscopic Lemma for ODEs) =
=
c:::
Proof. Since x satisfies the Stroboscopic Property with respect to G, it is an almost solution of the ODE x = G(t, x). By Theorem 3 we have x(t) "::: y(t) for all t E [0, L] . The details of the proof can found in [19] or [21] . D Proof of Theorem 5. Let x : I + JRd be a solution of problem (20.4.5). Let t0 E I be limited, such that x(t) is limited for all t E [0, t0] . By Lemma applied to g = f, G = F and the constant function m = x(to), there is a > a "::: 0 such that for all limited T 2: 0 we have
1 S
ls+TS f(r, x(to)) dr "::: TF(x(to)), s
where
s = to/c, S = o:jc.
(20.4. 7)
Using the uniform quasi boundedness of f we can show ( for the details see or [21]) that x(t) is defined and limited for all t "::: t0 . Hence the function e
X ( , T)
=
x(to + aT + cB)  x(to) , B E [r, O] . T E [0, 1] , 0:
1, 0,
[19]
20.
300
is well defined. In the variable
X (  , T)
(20.4.5)
system
Averaging
becomes
ax (0, T) = f(s + ST, x(to) + aX(  , T)). oT Using assumptions (H1 ) and (H4) together with (20.4. 7), we obtain after some computations that for all T E [0, 1] , we have
X (O, T)
"'='
{T f(s + Sr, x(to)) dr = 1 1s+TSf (r, x(t0)) dr
Jo
S
8
"'='
TF(x(t0)).
Define the successive instant of observation of the stroboscopic method t1 = to + a. Then we have
t1
by
x(t i )  x(to) X (O, 1 ) F(x(to)). t l  to Since t1  to a > c and x(t)  x(to) aX(O, T) 0 for all t E [to, t1 ] , we have proved that the function x satisfies the Stroboscopic Property with respect to F. By the Second Stroboscopic Lemma for ODEs, for any t E [0, L] , D x(t) is defined and satisfies x(t) y(t). "'='
=
=
"'='
=
"'='
20.4.3
Averaging for FDEs in the form x (t)
We consider the IVP, where
x(t)
=
c
=
is a small parameter
f (tj c , Xt ) ,
xo
=
f (tjE:, Xt )
(20.4.8 )
¢,
We assume that f satisfies conditions ( H1 ) , ( H2 ) and ( H4 ) of Section Now, the averaged equation is not the O D E (20.4.6), but the FDE
y(t) = F ( Yt ) ,
20.4. 1.
(20.4.9)
Yo = ¢.
Averaging consists in approximating the solutions x(t, c ) of ( 20.4.8 ) by the solution y(t) of the averaged FDE (20.4.9). Condition (H3) in Section 20. 4. 1 must be restated as follows
(H3)
The averaged FDE ( 20.4.9 ) has the uniqueness of the solution with pre scribed initial condition.
d Theorem 7 Let f : IR+ C + IR be a standard continuous function satisfying the conditionsd (Hl)(H4) . Let ¢ be standard in C. Let L 0 be standard and let 0 be infinitesimal. y : [0, L] IR be the solution of problem (20.4.9). Let x
+
>
c >
Then every solution x(t) of the problem {20.4.8) is defined at least on [  r, L] and satisfies x(t) y(t) for all t E [r, L] . "'='
20.4.
301
Functional differential equations
2 0.4.4
The stroboscopic method for FDEs
Since the averaged equation (20.4. 9) is an FDE, we need an extension of the stroboscopic method for ODEs given in Section 20.4.2. In this section we denote by G : IR x C + IRd , a standard continuous function. Let x : I + IRd be a function such that [  r, OJ C I C IR .
+
+
We say that x satisfies the Stroboscopic Property with respect to G if there exists fJ· > 0 S1Lch that for every positive limited to E I, satisfying [0, to] C I and x(t) and G(t , Xt) are limited for all t E [0 , to] , there exists t 1 E I such that fJ < t1  to 0, [to, t1] C I, x(t) x(to) for all t E [to, t1] , and
Definition 7
c:::
c:::
x(t x(to) ' l )'  ' '  G ( to, X to ) . t1  to �
Notice that now we assume that the successive instant of observation t1 exists for those values t0 for which both x ( t) and G ( t, xt ) are limited for all t E [ 0, t0 ] . In the limit case r = 0 , the Banach space C is identified with IRd and the function X t is identified with x(t) so that , G(t , X t ) is limited, for all limited x(t) . Hence the "Stroboscopic Property with respect to G " considered in the previous definition is a natural extension to FDEs of the "Stroboscopic Property with respect to G " considered in Definition 6.
Let ¢ E C be standard. A s sume that the IVP y(t) G (t, Yt ), Yo ¢, has a unique solution y defined on some standard interval [ r, L] . Assume that the function x satisfies the Stroboscopic Property with respect to G and x0 c::: ¢. Then x is defined at least on [r, L] and satisfies x(t) y(t) for all t E [r, L] .
Theorem 8 (Stroboscopic Lemma for FDEs) =
=
c:::
Proof. Since x satisfies the Stroboscopic Property with respect to G, it is an almost solution of the FDE x = G(t, Xt ) · For FDEs, we have to our disposal an analog of Theorem 3. Thus x ( t) c::: y ( t) for all t E [0, L] . The details of the proof can found in [19] or [21] . D Proof of Theorem 7. Let x : I + JRd be a solution of problem (20.4.8) . Let t0 E I be limited, such that both x(t) and F(xt ) are limited for all t E [0, t0] . From the uniform quasi boundedness of f we deduce that x( t) is Scontinuous on [0, t0] . Thus Xt is nearstandard for all t E [0, t0] . By Lemma 1 , applied to g f, G F and m Xt 0 , there is o: > 0, o: c::: 0 such that for all limited T 2 0 we have =
=
1
S
1s +TS f (r, X 8
=
t 0 ) dr
c:::
T F ( Xt0 ) ,
where
s
=
to/c , S
=
o:jc.
(20.4.10)
20.
302
Averaging
Using the uniform quasi boundedness of f we can show ( for the details see or [2 1]) that x(t) is defined and limited for all t , t0. Hence the function
[19]
X ( e , T ) = x(to + aT + B)  x(to + B) , B E [r, 0] , T E [0, 1] , a is well defined. In the variable X ( · , T) system (20.4.8) becomes ax · oT (0, T) = f(s + ST, X t0 + aX( , T) ). Using assumptions ( H 1) and (H4) together with (20.4.10), we obtain that for all T E [0, 1] , we have
{T
1 X(O, T) , J f (s + Sr, xt 0 ) dr = s o
1s+ TS f (r, x ) dr , TF (xt ) . 0 t0 s
Define the successive instant of observation of the stroboscopic method t1 = to + a. Then we have
t1
by
x(t l )  x(to) X (O, 1) "' F ) (x to t1  to Since t1  to a > c and x(t)  x(to) aX (O, T) 0 for all t E [to, h ] , we have proved that the function x satisfies the Stroboscopic Property with respect to F. By the Stroboscopic Lemma for FDEs, for any t E [0, L] , x ( t ) is defined and satisfies x(t) , y(t) . D =
=
=
,
References
[1]
V . I . ARNOLD ( ed. ) , Dynamical Systems Sciences, Vol. 5, SpringerVerlag, 1994.
V, Encyclopedia of Mathematical
[2]
H . BAR REAU and .J . HARTHONG (editeurs ) , dard, Editions du CNRS, Paris, 1989.
La mathematique Non Stan
[3]
E. BENOIT ( Ed. ) , Dynamic SpringerVerlag, Berlin, 1991.
Proceedings, Luminy
[4]
I . P . VAN DEN BERG , Nonstandard Asymptotic in Math. 1249, SpringerVerlag, 1987.
Bifurcations,
Analysis,
1990,
Lectures Notes
[5] N . N . BOGOLYUBOV and Yu. A . I\IITROPOLSKI, Asymptotic Methods in the Theory of Nonlinear Oscillations, Gordon and Breach, New York, 1961.
303
References
[6]
J . W . DAUBEN , Abraham Robinson, The Creation of Nonstandard Anal ysis, A Personal and Mathematical Odyssey, Princeton University Press, Princeton, New Jersey, 1995.
[7]
F. DIENER and M . DIENER (eds . ) , Universitext , SpringerVerlag, 1995.
[8]
F. DIENER and G . REEB,
[9]
M. DIENER and C . LOBRY (editeurs) , Analyse non tation du reel, OPU (Alger) , CNRS (Paris) , 1985.
Nonstandard Analysis in Practice,
Analyse non standard,
Hermann,
1989.
standard et repnisen
[10] M . DIENER and G. WALLET (editeurs) , Mathematiques finitaires et anal yse non standard, Publication mathematique de l'Universite de Paris 7, Vol. 311, 312, 1989. [11]
J. L. CALLOT and T. SARI , Stroboscopie et moyennisation dans les sys tcmes d'cquations diffcrcntielles a solutions rapidement oscillantes, in
Mathematical Tools and Models for Control, Systems Analysis and Sig nal Processing, vol. 3, CNRS Paris, 1983.
[12]
A . FRUCHARD and A. TROESCH (editeurs), Colloque Trajectorien a la memoire de G. Reeb et J. L. Callot, StrasbourgObernai, 1216 juin 1995, Prepublication de l'IRtvi A, Strasbourg, 1995.
[13]
J . K . HALE, "Averaging methods for differential equations with retarded arguments and a small parameter", .J. Differential Equations, 2 (1966)
57 73.
[14] [15]
J . K . HALE and S . M . VERDUYN LUNEL, "Averaging in infinite dimen sions", J. Integral Equations Appl . , 2 (1990) 463494. .T . K . HALE and S . M . VERDUYN LUNEL , lntmduction to Functional Dif Applied Mathematical Sciences 99, SpringerVerlag,
ferential Equations, New York, 1993.
[16]
L . D . KLUGLER, Nonstandard Analysis of almost periodic functions, in Applications of Model Theory to Algebra, Analysis and Probabzhty, W.A.J. Luxemburg ed. , Holt, Rinehart and Winston, 1969.
[17]
M. LAKRIB, "The method of averaging and functional differential equa tions with delay", Int. J . Math. Sci. , 26 (2001) 497511.
[18]
M . LAKRIB, " On the averaging method for differential equations with delay", Electron. J. Differential Equations, 65 (2002) 116.
304
20. Averaging
Stroboscopie et moyenmsation dans les equations differen tielles fonctionnelles a retard, These de Doctorat en Mathematiques de
[19] 1\ I . LAKRIB,
l' Universite de Haute Alsace, Mulhouse, 2004. [20] 1\ I . LAKRIB and T. SARI, "Averaging results for functional differential equations", Sibirsk. Mat. Zh. , 45 ( 2004) 375386; translation in Siberian Math. J . , 45 (2004) 3 1 1320. [2 1] 1\ I . LAKRIB and T. SARI , Averaging Theorems for Ordinary Differential Equations and Retarded Functional Differential Equations. http : //www . math . uha . fr/ps/20050 1 lakrib . pdf [22] B . LEHMAN and S . P . WEIBEL . "Fundamental theorems of averaging for functional differential equations", J . Differential Equations, 152 ( 1 999) 160190. [23] C. LOBRY, 1 989.
Et pourtant . . . ils ne remplissent pas N !,
Aleas Editeur, Lyon,
[24] C. LOBRY, T. SARI and S. TOUHAMI, " On Tykhonov's theorem for con vergence of solutions of slow and fast systems", Electron. J. Differential Equations, 19 ( 1998) 122. [25] R. LUTZ and M . GOZE, Nonstandard Analysis: a practical guide applications, Lectures Notes in Math. 881, SpringerVerlag, 1 982.
with
[26] E . NELSON, "Internal Set Theory", Bull. Amer. 1\Iath. Soc . , 83 ( 1 977) 1 165 1 1 98. [27] G. REEB, Equations differentielles et analyse non classique ( d'apres .J . L. Callot) , in Pmceedings of the 4th International Colloquium on Differ ential Geometry ( 1 978) , Publicaciones de la Universidad de Santiago de Compostela, 1 979. [28] A. RoBINSON , " Compactification of Groups and Rings and Nonstandard Analysis". Journ. Symbolic Logic, 34 ( 1 969) 576588. [29] A. ROBINSON. 1 974.
Nonstandard Analysis,
American Elsevier, New York,
[30] .J .l\1. SALANSKIS and H . SINACEUR (eds.) , Le Colloque de Cerisy, SpringerVerlag, Paris, 1992.
Labyrinthe du Continu,
[31] J . A . SANDERS and F. VERHULST, Averaging Methods in Nonlinear Dy namical Systems, Applied Mathematical Sciences 59, SpringerVerlag, New York, 1985.
305
References
[32]
T. SARI, "Sur la theorie asymptotique des oscillations non stationnaires", Asterisque 1091 10 (1983) 1411 58.
[33]
T. SARI , Fonctions presque periodiques, in Actes de l 'ecole d 'ete Analyse non standard et representation du reel. OranLes Andalouses 1984. OPU Alger  CNRS Paris, 1985.
[34]
T. SARI, General Topology, in Nonstandard Analysis in Practice, F. Di ener and M. Diener (Eds. ) , Universitext, SpringerVerlag, 1995.
[35]
T. SARI, Petite histoire de la stroboscopie, in Colloque Trajectorien a la Memoire de J. L. Callot et G. Reeb, StrasbourgObernai 1995. Publication IRMA, Univ. Strasbourg (1995) , 515.
[36]
T . SARI, Stroboscopy and Averaging, in Colloque Trajectorien a la Me moire de J. L. Callot et G. Reeb, StrasbourgObernai 1995, Publication IRMA, Univ. Strasbourg, 1995.
[37]
T. SARI, Nonstandard Perturbation Theory of Differential Equations, Ed inburgh, invited talk in International Congres in Nonstandard Analysis and its Applications, ICMS, Edinburgh, 1996. http : //www . math . uha . fr/sari/papers/i cms 1996 . pdf
[38]
T . SARI , Averaging in Hamiltonian systems with slowly varying parame ters, in Developments in Mathematzcal and Experimental Physics, Vol. C, Hydrodynamics and Dynamical Systems, Proceedings of the First Mexican Meeting on Mathematical and Experimental Physics, El Colegio Nacional, Mexico City, September 1014, 2001 , Ed. A. Macias, F. Uribe and E. Diaz, Kluwer Academic/Plenum Publishers, 2003.
[39]
T . SARI and K . YADI, "On PontryaginRodygin's theorem for convergence of solutions of slow and fast systems", Electron. J. Differential Equations, 139 (2004) 117.
[40]
I. STEWART,
[41]
K . D . STROYAN and W . A . .J . LUXEMBURG , infinitesimals, Academic Press, 1976.
[42]
I I I8
1987.
The Problems of Mathematics,
Oxford University Press,
Introduction to the theory of
rencontre de Geometrie du Schnepfenried, Feuilletages, Geometrie symplectzque et de contact, Analyse non standard et applications, Vol. 2, 1015 mai 1982 , Asterisque 109110, Societe Mathematique de France, 1983.
Pathspace measure for stochastic differential equation with a coefficient of polynomial growth Toru Nakamura
A a additive measure over
*
Abstract
a spa.ce of paths
is constructed to give the so lution to the FokkerPlanck equation associated with a st.oehastic differ ential equation with coefficient fuuction of polynomial growth by making use of nonstauda.rd analysis.
21.1
Heuristic arguments and definitions
Consider a stochastic differential equation,
dx(t)
=
f (x(t)) dt + db(t),
(21 . 1 .1 )
where f ( x) is a realvalued function and b(t) a Browni an motion vvith vari an ce 2Dt for a time interval t. vVe wish to construct a m easme over a space of paths for (21 . 1 .1 ) . To my knowledge, the coefficient. fund.ion has been assumed to have at most linear growth, that is [.f(x) [ .::::; const· [ x [ for sufficiently lar ge x, otherwise some paths explode to infinity in finite tunes. As an example, let .f(x) = [ x [ l+6 (8 > 0) and define explosion t ime e for each continuous path x(t) by limt>eo x(t) = ±oo. Then, it is proved that P(e oo) < 1 and more strongly P( e = oo ) 0. For general case, see Feller's test for explosion in [1]. Despite the explosion, we shall consider f(x) of polynomial growth of an arbitrary order and define a measure over a space of paths. vVe use nonstandard analysis because it has a very convenient theory, Loeb measure theory [2, 3] , which enables us t.o construct. a standard O"additive measure in a simple way. ·
=
=
*Department of Mat.hemat.ics. S undai Preparatory School, KandaS urugadai. Chiyoda kn, Tokyo 1 0 10062, .Japan.
21.1.
307
Heuristic arguments and definitions
Let us interpret (21 . 1 . 1) as a law for a particle momentum x = x(t) at time t with the force f ( x( t)) for drift and the random force db( t) / dt acting on the particle. By a time t > 0 some particles may disappear to infinity along the exploding paths, but others still exist with finite momentum so that they should make up a "probability" density of a particle momentum. We wish to usc the word "probability" though its total value may be less than 1 because of the disappearance of particles to infinity. Especially when f ( x( t)) is a repulsive force, that is, its sign is the same as that of x(t) , a particle can get larger momentum compared to the case where f (x(t)) is absent. However, once it gets very large momentum, it can hardly come back to the former one because the drift force f (x(t)) acts on the particle so as to increase its momentum. Thus, the particles which have gone to infinity by a time t > 0 could not contribute to the probability density at t. This consideration suggests that we can introduce a cutoff at an infinite number in momentum space if it is necessary in order to define a measure over a space of paths so that the probability density should be constructed by a path integral with respect to the measure. Eq. (21.1.1) gives the forward FokkerPlanck equation for the probability density U(t, x) of a particle momentum x at time t, 0
0 2 D o 2 U(t, x)  { J(x) U(t, x) } . (21 . 1 .2) ot ox ox We assume that the drift coefficient f(x) and the initial function U(O. x) satisfy
U(t. x)
=
the following conditions:
(A1)
For some natural number large x.
n
E N,
l f(x) l
:<::
const · l x l n
for sufficiently
(A2) f(x) E C 2 (IR) . (A3) f(x) 2 /(4D) + f'(x)/2 is bounded from below. We denote the min {f(x) 2 /(4D) + f'(x)/2 1 x E IR}.
bound by
c =
(A4) U(O, x) E C 2 (IR) and its support is a bounded set . Rewrite (21.1 .2) into a difference equation with infinitesimal timespacing c and momentumspacing 6 V'fDc using a forward difference quotient 0 1 0 U(t, x) ==?  { U(t + r:: , x)  U(t, x) } t c =
for timederivative, and central ones 0
ox
1 U(t, x) ==? 26 { U(t, x + 6)  U(t, x  6) }
21.
308
and
Pathspace measure for stochastic differential equation
oEJ2x2 U(t, x) =? 1)21 { U(t, x + 6) + U(t, x  6)  2 U(t, x) }
for spacederivative. The result is
U(t + c, x) � { 1 + .f (x  6)Jc/ (2D ) } U(t, x  6) ( 21.1.3 ) + � { 1 + .f (X + 6) ( JcI (2D ) ) } u ( t ' X + 6) ' which indicates that the coefficients H 1 + .f ( .T  6)Jc/( 2D )} should be as signed to the infinitesimal linesegment with end points ( t, x ( t) ) ( t, x ( t + c)  6) and ( t + c. x(t + c) ) of a *polygonal path x( s), and !{1 + .f (x + 6) ( Jc/ ( 2D ) ) } to the segment with end points ( t, x(t) ) (t, x(t + c) + 6) and ( t + c , x(t + c )) . Taking into account the approximation 1 .f (x(t) ) (±v 2Dc )  1 .f (x(t) ) 2 c log [ 1 + .f (x(t ) ) (± Jc/ ( 2D ) )] � 4D 2D 1 .f (x(t)) { x(t + c)  x(t)}  1 .f (x(t)) 2 c , 4D 2D =
=
=
�
=
let us interpret the coefficients as
� { 1 + .f (x(t) ) (±Jc/(2D) ) }
� � exp [lt+c 2�.f (x( s) ) db ( s)  1 t+c 4�.f ( x( s) ) 2 ds] .
The first integral on the righthand side is the Itointegral. Thus, we define *path w, *measure fL for each w, and U(t, by
x)
Definition 1 (1)
(2)
v be [t/c] with Gauss ' parenthesis. For each internal function {0, 1 , · , v  1} {  1 , 1} and y E IR, define Xk by Xk y + ��:� o: ( i )6, and w by the *polygonal path with vertices (0, y), ( c , x l ), (vc , xv)· Let a
:
·
·
=
+
·
·
·
,
Define * measure fL by fL (w)
=
1v exp [ rt 1 .f (w ( s) ) db (s )  rt 1 .f (w ( s) ) 2 ds ( 21.1. 4) Jo 2D Jo 4D 2 ]
with the first integral in the exponent being the Ito integral, and U (t. x) L U(O, w(O) ) tL(w) ( 21.1.5 ) =
w
where the sum is taken over all w satisfying w( vc )
=
x.
21.2.
309
Bounds for the *measure and the *Green function
We note that the *measure mula
[4].
21.2
( 21.1. 4)
corresponds to the Girsanov for
Bounds for the *measure and the *Green function
l\Iaking use of the I to formula t
1 f(w( s) ) db( s) 1 { F(w(t) )  F(w(O)) }  � {t f' (w(s) ) ds 2D 2 }0 }0 2D where F'(x) f(x), we obtain a bound for fL as fL(w) 21v exp [ 2� {F(w(t) )  F(w(O)) }  lot{ 4� J(w ( s) ) 2 + �j' (w(s)) }ds] (21.2.1) :::; 21v exp [ 2� {F(w (t))  F(w(O) )}  ct] . The constant c in the last line was given in the assumption ( A3 ) . Let us fix the end points of w at finite numbers y and x, and consider a {
=
=
=
space of *paths,
w(c[t/ cl) x } , and define the *Green function for the interval [0, t ] by (21.2.2 ) Q(t, x : 0, y ) 216 L fL (w) . wEP(t,x:O,y) Then, U(t, x) defined in ( 21. 1.5) is written as U (t, x) L U(O, y)Q(t, x : 0, y) 26. ( 21.2.3) y The infinitesimal spacing corresponding to dy is not 6 but 2 6 in ( 21.2.3 ) because only the paths that start every other point y could reach the end point x . Since 26 exp [ (x  y) 2 ] ( 1 + O (c l/2 ) ) "" 2_ _ (21.2. 4) v 4 l/ 4 D nD 2 2 t ( � t) wEP(t,x:O,y) P(t, x : 0, y )
=
{w
I
w(O) y =
and
=
=
=
'
the *Green function is bounded as
F(y) Q(t, x : O, y) :::; exp [ F(x) 2D  2D  ct ] 1 exp [ (x4;�) 2 ] ( 1 + 0 (c l/2 )) , (21.2.5) (4nD t) 1 /2 x
21.
310
Pathspace measure for stochastic differential equation
and hence
(21.2.6)
U(O , y )
Since the support of is assumed in (A4) to be a bounded set, the righthand side of (21.2.6) is nearstandard. To define a standard function as the standard part of we introduce two time scales of different orders, one for the Brownian motion, r:: , and the other, T, for the changes in the drift term in (21 . 1 . 1) ; we choose c = 0(T3), for example. The spacing c is finer and stands for the timespacing of *random walks, and T is long enough to cover many steps of the *random walks, yet short enough for the change in the drift term to be small. Define a standard function t, as the standard part of the value of at the coarsegrained lattice point of time !;_:
U,
U
U
U ( x)
U(t,
x) = st U(i, x)
where
i = T [t/T] ,
(21.2.7)
which we expect to be the solution to the FokkerPlanck equation
21.3
(2 1 . 1 .2) .
Solution t o the FokkerPlanck equation
x)
In order to prove that U(t, in (21.2.7 ) is the solution to (21 . 1 .2 ) , or more concretely to estimate h in (2 1.3.7 ) below, we should truncate the *paths at an infinite number as
A PA(i, x : O, y) = { w E P(t, x : O, y) l \is E [O, i] l w ( s)I < A } . The magnitude of A will be determined later in (21 .3.5) . Then the correspond ing *Green function YA and UA arc defined by ( 21.3.1 ) YA(t, x : 0, y) = 261 L fl (w) wEPA (t,x: O , y)
and
UA(L x) = L U(O, y)QA(i, x : 0, y) 26. y
( 21.3.2 )
Let us first calculate the difference between ( 21 .2.2) and ( 21.3.1 ) . Consider a *path 0, y) and put min { s 0, \ Define a *path by turning upside down the section of the path for
w E P(t, x : y) PA(t, x : w'
t(w) =
I l w ( s) l = A}. w( s)
311
2 1 .3 . Solution t o the FokkerPlanck equation
0 0. P(t, x : O, y) \PA(t, x : O, y)
2A  y 2A  y
the interval :<.:: s :<.:: t (w ) so that w' should start or at time s = The section of w'(s) for t(w ) :<.:: s :<.:: t is that of w (s) for the same interval. In this way, we can define a onetoone correspondence from wE to w' E Then by (2 1 .2.4) ,
P(t, x : 0, 2A  y) UP(t, x : 0.  2A  y).
and hence
F(y) ] 1 l9(t, x : O, y )  9A(t, x : O, y) l :<; exp [ F(x) 2D  2D  ct (4nDt) l/2 x (2 1 .3 . 3) { exp [ (2A  y  x) 2 ] + exp [ ( 2A  y  x) 2 J } x ( 1 + ( 1 ;2 ) ) 4D t
c
4D t
,
which implies
(21 .3.4)
x.
for any finite Now, we are ready to prove that wish to evaluate the standard part of
for an infinitesimal parameter as
A
a
=
� {U( t +
U(t, x)
a, X )
is the solution to (21 . 1 .2) . We
 U ( t, ) } X
kT (k E *N) given arbitrarily. Choose the truncation
(21.3.5 )
0,
where (3 > a standard constant, is just introduced to make the argument of logarithm dimensionless . Then by (21 .3.4) ,
UA (t, x)
for any standard natural number n, meaning that can be identified with up to negligible error. Therefore we shall hereafter deal with the truncated instead of Then the difference quotient we should calculate is
U(t, x) UA
U.
21.
312
Pathspace measure for stochastic differential equation
where
In order to calculate the sum in h , we wish to expand the summand as power series in � except for the exponential function, which is possible if � is sufficiently small. Since
(21.3.9) satisfies
YA (f
+ a, x : f, x + �) :S exp [ 2� { F(x)  F(x + �) }  ca] �2 1 exp [ 4D a ] ' ( 4n D a ) 1 / 2 x
(21.3.10)

the summand in (21.3. 7) contains exp [ 4b ] as a factor which enables us to u restrict � to be sufficiently small, 1 � 1 in rough estimation. I n reality, it is restricted to =
for some small
a
such as
O (a 112 ),
1/10 as shown in the following.
Note that
(21.3.11) for some m E N by the truncation at A (A1) . Then by (21.3. 10) ,
=
O ( l log
,Ba l) and the assumption
21.3.
313
Solution t o the FokkerPlanck equation
(D /{3) ! ({Ja) !a ,
e llog /3ul m ) (4Du . ] o (e l log/Jul m ) .
in the last line the infinitely large number If I� I :2: In fact, can be controlled by the half of the factor e  e I
1 [ 8Da ]  [ 8({3a)2 a
exp  £ < exp Hence, the sum 2.:::�; in canst · 1�;"12(D I
(21 .3.7)
=
for such � is estimated as
�2 4( nDa) 112;c [ 8Da ] d� o (aae  8(P") 2a ) o(an ) /3)1/2 (/3u ) 1/2a
J
1

exp
 
I
=
=
for any n E N, so that it can be neglected. Now, we have only to consider � of infinitesimal order equal or less than so that we can expand
F(x)  F(x + �)
=
1 , a2a U(x)  I�2 f'(x) + O (a23a ). 3
2.
Next, we wish t o replace the integral in
(21 .3.9)
(21 .3.12)
by
l t+u f(w( s ) ) 2 ds + 1 lt+u f' (w(s) ) ds '"' a J (x) 2 + a J' (x) . (21 .3.13) 4D ? 4D 1
2
t
t
It is justifiable if paths going very far at some time neglected. Let us take an infinitesimal
where a' is small but a ' > a , for example, Then, the sum over which satisfies bounded by
{ [ ( 4nDa)l l 2 exp
a
=
a
(2A'  �) 2 ] + exp [ ( 2A' + �) 2 ] } 4D a 4D a O (a l l 2 e (t3:)2a ) o (an ) =
_
can be
=
(21 .3.14)
=
for any n E N in the same way as used the fact � = = consider satisfying
w
[t , t + a]
2/1 0 if we take 1/1 0 . l w ( s)  x l :2: A' for some s E [t, t + a] is
w
canst
'
sE
(21.3.3), and hence negligible.
O (a! a ) o (A') because
a
'
>
a.
l w ( s )  xl :::; (D / f3) ! (f3a) !a'
Here, we have Thus, we have only to
21.
314
for all
s
E
Pathspace measure for stochastic differential equation
[t , t + a], and hence
[ 4D f(x) 2  �2 f'(x) J 1 exp [  1 {f ( w ( s)) 2  f ( x) 2 } ds 4D Ho � 1 { f' (w(s)) ds  f'(x) } ds ] exp [ � f ( x) 2  � j' ( x) J ( 1 + (a� a' ) ) . 4
= exp  _5!___ x
=
Putting
:t
o
(21.3. 15) =
0
(21 .3. 12), (21.3. 15) "'
� wEPA (t+ ,x:t,x+ l; ) o
into
H
x
and
_1:_ 2 v' 
[__r__ ] ( 1 + O (a3/ 2 ) ) exp / 4Da ( 4nD a) l 2 26
(2 1.3.9), we obtain e + a, x : f, x + �)  ( 4nDa1 ) l/2 exp [ 4Da J= { 2 2 Da � 2  2Da f(x) 2 o ( a ) =  � f(x)  � + f (x) + + } 4 D 8D2 2D 2 1 4�Da J · 4( nDa) 1 /2 exp [ 
YA (f
1
x
(21.3.16)
x
Lastly, we usc the expansion
(21.3.17) with for h . For h , we usc the expansion taken one step further,
with
21.3.
315
Solution to the FokkerPlanck equation
because (21.3.16) contains a factor of order O(a � a ) , whereas there is no such factor in h . We shall not give the proof of these expansions, for it requires elementary but a little long estimations. Estimations similar to this case can be found in [5, 6, 7] . Putting (21.3. 16) and (21.3. 17) into (21 .3.7) and replacing the sum by integral which is permitted because 6 = o(T 31 2 ), we obtain h
J
=
{  � j(x)  �2 + 2Da f (x) + �2  2Da f(x) 2 + o(a) }
� � � � ( D 1 ;9) 112 (iJu) 1/2a x
4D
2D
1
8D 2
� 2 d� {uA (t, x) + �Do UA (t, x) } (4nD1a) 112 exp [  4Da J
{ (  2D� f(x)  �2 +4D2Da f (x) + �2 8D 2Da 2) 2 j(x) UA (t . x) 1 2 1 � f(x)D ( x) o(a) } �2 d�. . UA t + o [ 1 2D 4Da J (4nDa) 1 2 J
1
�� � � ( D 6) 1/2 (;Ju ) 1/2a
exp

Since the domain of integration can be extended to *JR by the factor �2 1 (47rDu ) I /2 exp [ 4Du l '

h
=
 2Da f(x) 2 ) UA ( x) t J { (  2D� J(x)  �2 +4D2Da f (x) + f,2 8D 2 1 e f(x)Do UA (t , x) + o(a) } exp [  e J d�  2D 4Da (4n Da) 1 1 2 a { f'(x)UA (t, x) + f(x)Do UA (t , x) } + o(a) . 1
* !R
(21 .3.19)
= 
Similarly, putting
(21 .3. 18)
into
(21 .3.8),
we obtain
(21 .3.20) Therefore,
� {UA (t + a, x)  UA(t x) } DDJ UA (t, x)  { f' (x)UA (t, x) + j(x)Do UA (t, x) } + o(1 ) =
(21 .3.21)
holds for any infinitesimal a kT, which means that U ( t, x) st UA (t , .1: ) is differentiable with respect to t and its tderivative is the standard part of =
=
316
2 1 . Pathspace measure for stochastic differential equation
the righthand side of (2 1 .3.21 ) . Though we omit the proof, it is proved that U(t, x ) is twice differentiable with respect to x and their values are and Taking the standard part of the bothhand sides of (2 1 .3 . 2 1) , we obtain
(21.3.22) namely U (t, x ) is the solution to the FokkerPlanck equation (2 1 . 1 .2) . Let us finally note that, by Loeb measure theory, a standard aadditive measure over a space of paths can be derived from the *measure fL we have constructed in this paper so far.
References
[1]
H . P . M cK E AN , JR., London. 1969, (§ 3 . 6) .
Stochastic Integrals, Academic Press,
[2]
P . A . LOEB, "Conversion from nonstandard to standard measure spaces and applications in probability theory", Trans. Amer. lVIath. Soc . , 2 1 1 ( 1975) 1 131 22.
[3]
K. D. STROYAN and .J . l\1 . BAYOD , Foundations of infinitesimal stochastic Studies in Logic and the Foundations of Mathematics, North Holland. Amsterdam, 1986.
[4]
I. V. GrGRSANOV, " On transforming a certain class of stochastic processes by absolutely continuous substitution of measures", Theor. Prob. Appl . , 5
New York and
analysis,
( 1960) 285301 .
[5]
T . NAKAMURA, " A nonstandard representation of Feynman's path inte grals", J. Math. Phys. , 32 (1991) 457463.
[6]
T . NAKAMURA, "Path space measures for Dirac and Schri:idinger equations: Nonstandard analytical approach", J. Math. Phys., 38 (1997) 40524072.
[7]
T. NAKAMURA, "Path space measure for the 3+ 1dimensional Dirac equa tion in momentum space", J. Math. Phys . , 41 (2000) 52095222.
Optimal control for N avierStokes equations Nigel J. Cutland
*
and Katarzyna Grzesiak
**
Abstract We survey receut results on existence of optimal controls for stochastic
NavierStokes equations in 2 and 3 dimensions using Loeb space methods.
22.1
Introduction
this paper we give a brief survey of recent results 1 concerning the exis tence of optimal controls for the stochastic NavierStokes equations (NSE) in a bounded domain D in 2 and 3 space dimensions; that is, D C JRd with d 2 or 3. The controlled equations in th eir most general form are as follows ( sec the uext section for details) : In
=
·n (t) = no +
{t { .fo
+ .lot
g

IJ
A v, ( ., )

B ( 'U ( s)) + f ( s, n( s) , O (s , v.) ) } d.s
(22.1. 1)
(s, u(s)) dw(s)
Here the evolving velocity field ·u = u(L, w) is a stochastic process with values in the Hilbert space H � V1(D) of divergence free functions with domain D; t his gives the (random) velocity 'lt(l, x, w) E JR.d o f the fluid a t any time t and point. x E D. The most general kind of control 0 that we consider act.s through the external forcing term f, and takes the form 0 : [0, T] x 1l 7 M wh ere 1l is the space of paths in H and the control space Jv[ is a compact metric space. · Mathematics Department, University of York UK. ,
nc507@york . ac . uk
**Department of Finance, \;\/yzsza Szkola Biznesu, NationalLouis University, Nowy Sacz. Poland. j ermakowicz@wsbnlu . edu . pl
1The results reported here are developed f1·om the second author's PhD thesis [16] writ.t.en under the supervision of the first. author. Full details may be found in the papers 191 and 11 Oj.
318
22. Optimal control for NavierStokes equations
In certain settings however it is necessary to restrict to controls of the form : [0, T ] + lvl that involve no feedback, or those where the feedback only takes account of the instantaneous velocity u( t). The terms vA, B i n the equations are the classical terms representing the effect of viscosity and the interaction of the particles of fluid respectively; the term B is quadratic in u and is the cause of the difficulties associated with solving the NavierStokes equations (even in the deterministic case g = 0) . The final term in the equation represents noisy external forces, with w denoting an infinite dimensional Wiener process . The methods involve the Loeb space techniques that were employed in [4, 6] to solve the stochastic NavierStokes equations with general force and multi plicative noise (that is, with noise g (s, u(s)) involving feedback of the solu tion u) , combined with the nonstandard ideas used earlier in the study of optimal control of finite dimensional equations (see [7] for example) . For the 3d case it is necessary to utilize the idea of approximate solutions developed in [ 1 1 ] for the study of attractors. We assume a fixed time horizon for the problem, and then the aim is to establish the existence of an optimal control that minimizes a general cost J that has a running component and a terminal cost, modelled by
e
J(B , u) = JE
(lT h (t, u(t) , B(t, u)) dt + h (u(T) ) ) .
(22 . 1 .2)
The existence of an optimal control (in some cases, a generalized or relaxed control) can be established in a number of settings. The results for d = 2 are somewhat stronger than for d = 3, which is a reflection of the wellknown distinction between these two cases even for the uncontrolled equations: in di mension d = 2 there is uniqueness of solutions and the solutions are strong ( es sentially this means that the field u(t, x ) is differentiable in x for each t) whereas for d = 3 the solutions that exist for all time are weak and uniqueness is a ma jor open problem2 . Consequently even the formulation of the optimal control problem is more difficult, and optimal controls are obtained using our methods for more restricted classes of controls compared to the results for d = 2. The plan of the paper is as follows. First (Section 22.2) we provide some details of the Hilbert space setup for the equations and their solution, and then recall the basic nonstandard ideas concerning controls. The results for dimen sion d = 2 are then outlined in Section 22.3, and in the final section we do the same for d = 3. We omit proofs, referring the interested reader to [9] and [10] , although in some cases where new ideas are introduced we provided sketches. Alternative approaches to control theory for stochastic NavierStokes equa tions have been studied in [3, 13, 18] , where the controls are assumed to be 2
In fact this is o n e o f the Millennium problems. Uniqueness
but for d
=
�s
known for strong solutions
3 �uch �olution� can only be obtained for �hart timco.
319
22.2. Preliminaries stochastic processes
e
:
[o, T]
x
n
____, H
adapted to the information filter given by complete observations of the fluid velocity. In [3] for example, the author assumes that there is a constant p > 0 such that l e(t, w) l < p for all t E [O, T] and w E D, while in [13, 18] it is assumed that the controls satisfy the condition IE
(loT l e(t) l 2 dt)
< 00 .
In these papers the control is subject to the action of a linear operator. The stochastic minimum principle is derived providing a necessary condition for an optimal control, and a dynamic programming approach is presented to give a sufficient condition. It is shown that the minimum value function is a viscosity solution to the HamiltonJacobiBellman equation associated with the problem. In [13] the authors obtain smooth solutions to the H.TB equation which justifies the dynamic programming approach.
22.2 22.2.1
Preliminaries Nonstandard analysis
For optimal control theory, nonstandard analysis is a natural tool to use because one can always find a "nonstandard" optimal control  this is simply eN for any infinite N, where (en)nEN is a minimizing sequence of controls. The task then is to see whether e N can be transformed into a standard optimal control of some kind. For finite dimensional DEs and SDEs this idea was developed in [7] and related papers. In the study of the NavierStokes equations (particularly the stochastic ver sion) Loeb space techniques have proved very powerful, providing for example the first general existence proof for the stochastic NavierStokes equations in dimensions up to 4 (see [6] ) and more recently new results concerning the existence of attractors (see [5] , [ 1 1] and [ 1 2] ) . I n the work reported here the nonstandard techniques used in these two areas are combined. We work in a standard universe V V(S ) where S is a base set that contains all the objects of interest, and take an N 1 saturated extension *V(S ) C V(*S) . For ease of reference we gather together in an Appendix the most important facts about the nonstandard representation of the spaces used in the study of the NSE equations.
=
22.2.2
The stochastic Navier Stokes equations
The classical form of the uncontrolled stochastic NavierStokes equations with zero boundary condition on a time interval 0 :<:: t :<:: T is as follows:
22 . Optimal control for NavierStokes equations
320
{
du = { v �u  < div u = 0 u lan = 0 u(O, x)
=
u. \7 > u  \7p + j(t, u) } dt + g ( t, u)dw(t)
uo (x)
These equations are considered in a bounded domain D C IRd ( d = 2 , 3) which is fixed throughout the paper, with the boundary oD of class C 2 . Here u : [0, T] X D X n + JR3 is the random velocity field, v is the viscosity, p is the pressure, and f represents external forces; u0 is the initial condition. The diffusion term g together with the driving Wiener process w represents additional random external forces, or noise. Underlying this model is a filtered probability space 0 = (D, :F, (:Ft ) t >o , P) . For the conventional Hilbert space formulation, write L 2 ( D ) = ( L 2 ( D )) d and let V = { u E Crf' (D, IRd ) : div u = 0 } . Then H is the closure o f V i n L 2 ( D ) with the norm given by
(u, v)
lul2
=
(u, u) , where
J ui ( x )v i ( x)dx. L 2= 1 d
=
D
and V is the closure of V in the norm
l ui + ll u ll where ll u ll 2
(
)
=
((u, u)) and
ou o ((u, v)) = L ox ' oxv · J J ]=1 d
H and V are real Hilbert spaces, V dense in H. The dual space to V 1s denoted by V' with the duality extending the scalar product in H and V c H = H' c V ' . Write A for the Stokes operator on H (the selfadj oint extension of the pro jection of �) which is densely defined in H; it can be extended to A : V + V' by Au [v ] = (( u, v)) for u, v E V. The operator A has an orthonormal ba sis of eigenfunctions { ek h E N = [ C H with eigenvalues 0 < A k / oo . For u E H write u = 2...::: ukek. Write Hn for the finite dimensional subspace Hn = span{ e 1 , e 2 , . . . , en } and Prn for the projection onto Hn . A family of spaces Hr for r E lR is defined as follows: for r 2 0 Hr
=
=
{ U E H : kL Ak Uk < 00} =1
321
22.2. Preliminaries with the norm given by
=
l ui ; = L >.'k uk . k= l and H r is the dual of Hr . We may represent H  r by H r
=
{ (uk)kEN :
=
L x ;;r u k < 00} k=l
In terms of this family we have H0 = H, H 1 = V, and H  1 = V' with the norms l u i = l u l o and ll u ll = l u l l · The quadratic function B is given by (B(u), y) = b(u, u, y) , where
b(u, v, y)
=
d J u� (x) 80�j (x)y1 (x) dx i
L
2,J = l D
whenever the integral is defined. This describes the nonlinear inertia term in the equation. The trilinear form b has many properties [19] including the following that we need here.
b(u, v, v) 0, l b(u, v, y) l :::; c l u l i ll u ll � lvl i llvll � II Y II =
(22.2. 1) (22.2.2)
There are additional properties that hold only i n dimension d = 2. The inequality (22.2.2) gives the second of the following properties. Proposition 22. 2 . 1
For u E V
I Au lv' = ll u ll 1 3 I B(u) lv' :::; cl u l2 ll u P In the above setting the evolution form of the NavierStokes equations (without explicit control) in the space V' (the vector \lp 0 in this space) is given by =
lot vAu(s)  B (u(s)) + f (s, u(s))} ds t + lo g (s , u (s)) dw(s)
u(t) = uo +
{
(22.2.3)
with the initial condition u0 E H. The first integrals are Bochner integrals in V'. The driving noise process w is an Hvalued Wiener process with covari ance Q , a fixed nonnegative trace class operator (see [6, 14] for details) , and
322
22. Optimal control for NavierStokes equations
the stochastic integral is the extension of the Ito integral to Hilbert spaces due to Ichikawa [17] . I n the study of the NavierStokes equation there are several types of solu tion. The most general is a weak solution, but in the case d = 2 we also have strong solutions. The definitions are as follows (see [6] ) : Definition 22.2.2 ( a ) An adapted process u : [0. T] X n + H is a weak solution to the stochas tic N avierStokes equations (22.2.3) if (i) for Pa.a.
w the path u( , w)
has
u(, w) E L = ( [ O , T] ; H) n L2 ( [0, T] ; V ) n C( [O , T] ; Hweak ) , (ii) for all (iii)
tE
[0, T] the equation (22.2.3) holds as an identity in V',
u satisfies the energy inequality IE
By
(
T
)
l u(t) l 2 + l{o ll u (t) ll 2 dt < oo. t E[O, T] sup
solution henceforth in this
paper we mean
(22 . 2 . 4)
weak solution.
(b) A weak solution is strong if (i) for Pa.a.
w the path u(, w)
has
u( · , w) E L = ([ o , T] ; V ) n L 2 ( [0, T] ; H2 ) n C([ O . T] ; Vweak ) , u( , w ) E C( [O , T] : H)) ; w the path u ( , w ) has
(which implies that (ii) for Pa.a.
(22.2.5)
sup
tE[O .T]

T
ll u (t) ll 2 + l{o Au(t) 2 dt < oc .
(22 . 2 . 6)
When controls are introduced into the forcing term f the equation (22.2.3) takes the form
u ( t)
=
lot vAu (s)  B (u( s)) + t + lo g ( s , u(s)) dw(s) .
uo +
{
f ( s , u( s ), B(s, u))} ds
In the next section we discuss the types of control that we consider.
323
22.2. Preliminaries
The following general existence result for solutions to the sNSE was first proved in [4] ( see also [6] for an exposition ) , using Loeb space methods. First define Km = { v : I v i i <:.:: m } C::: V, with the strong topology of H . I n the theorem below, continuity o n each Km turns out t o b e the appropriate condition for the coefficients f, g; this is weaker than continuity on V in either the Hnorm or the weak topology of V.
Suppose that uo E H and
Theorem 22.2.3
f:
[0,
oc ) x V + V' ,
g:
[0, 00) X v
+ L ( H, H )
are jointly measurable functions with the following properties (i) f ( t, · ) E C(Km, V�cak) for all m, (ii) g(t, ) E C(Km, L ( H, H ) weak ) for all m, (iii) l f (t, u) l v' + l g(t, u) IH ,H <:.:: a(t) (l + l ui) where a E L 2 ( 0, T ) for all T. Then equation (22.2. 3) has a solution u on a filtered Loeb space. ·
In dimension d = 2 a stronger existence theorem ( and uniqueness, given Lipschitz coefficients ) has been established  this will be noted when required in Section 22.3.
22.2.3
Controls
The simplest controls considered in this paper are those with no feedback, as follows. A:o is customary in optimal control theory, it is often necessary to extend this class to its natural completion, which is the class of generalized or relaxed controls. These, and the topology on them are as defined thus ( as in [7] ) . The compact metric space lvl is fixed for the whole paper. Definition 22.2.4
(i) The class C of ordinary controls is the set of measurable functions e : [0, T] + AI, where M is a fixed compact metric space. (ii)
The class D of relaxed controls is the set of measurable functions cp [O, T] + M1 (M), where M1 (M) is the set of probability measures on M. (We regard C C::: D by identifying a E M with the Dirac measure 6a ) .
:
The weak or narrow topology on It is given by means of the set JC of bounded measurable functions z : [0, T] x M + lR with z(t, ) continuous for all t E [0 , T] . The action of B E It on z E }( is defined by ·
324
22. Optimal control for NavierStokes equations
B(z)
T
=
j z(t, B(t)) dt. 0
Then the topology on hoods the sets
11: is defined by specifying as subbase of open neighbour
{B : I B(z ) l < c} c>O , zE IC· The effect of a relaxed control o n a function z E JC is obtain by first extending the domain of each z E JC to [0, T] x M1 (M) as follows: for a probability measure fL E M 1 ( M) define z(t , fL )
=
j z(t, m) dfL (m) .
M
The weak (or narrow) topology is then extended to ::D by defining the action of cp E ::D on z E JC to be
where for brevity we use C{Jt := cp (t) . The fundamental results about controls that we shall usc arc summarized below; for details consult for example [7] , [15] , [2] or [20] (noting that a relaxed control is a particular case of a Young measure). Theorem 22.2.5
The set of relaxed controls ::D is compact.
This means that for an internal ("nonstandard") control (*ordinary or *relaxed) there is a well defined standard part 0 E ::D. (In [7] it is shown how this can be defined explicitly using Loeb measures.) Note that for a control cp E ::D we have 0 ( * cp) = cp. For the next result we define a uniform step control to be an ordinary control e such that there is a partition of [0, T] into intervals of constant length with e constant on each partition interval. Theorem 22.2.6
The set of uniform step controls 11:5 is dense in ::D .
To handle controls in a nonstandard setting the next definition and re sult is basic. Definition 22.2. 7 Z : * [0, T] x *M + *R is a bounded uniform lifting of z E JC if Z is an internal *measurable function such that (i) Z is finitely bounded
22.3. Optimal control for
(ii)
for a.a.
T E * [0, T] ,
d
=
325
2
for all
a
E *M : Z(T,
o:
) :::::J
z(0T, 00:).
Let
and Z be a bounded uniform lifting of z. Then
Theorem 22.2.8 ([7] )
:::::J
T
T
j Z(T, lj)T )dT j z(t, CfJt )dt :::::J
0
0
In later sections we will define wider classes of controls of the form e = where u() denotes the trajectory =
B(t, u) for u E H or even e B(t, u( · )) {u( s ) : O :<; s :<; t} up to time t. 22.3
Optimal control for d = 2
Throughout this section we take d 2 and utilize the stronger existence and uniqueness results that obtain in this case. =
22.3.1
Controls with no feedback
First we consider the simplest kind of optimal control problem, taking the set of controls to be the nofeedback controls 11: and its generalization :D. The controlled equation then takes the form
u( t )
=
lot { vAu(s)  B (u(s)) t + lo g ( s , u( s )) dw(s)
uo +
+ f (s, u(s) , cp ( s ))} ds
(22.3 . 1 )
for a control i n :D . Provided f and g are suitably Lipschitz then solutions to this equation are unique. Theorem 22.2.3 is refined as follows.
Let d 2. Consider the following conditions on the func tions f, g: there exist constants c > 0 and a() E L 2 (0, T) such that: ( ) f : [0, T] H x M + V' is jointly measurable and (i) l f(t, u, m )  f(t, v , m ) l v' :<:: c l u  v i (ii) f(t, u, ) is continuous (iii) for a. a. t E [0, T]
Theorem 22.3 . 1 a
=
x
·
for all u, v E H
l f ( t , u, m ) l v' and m E M.
:<::
a (t)(l + l ui)
326 (b)
22. Optimal control for NavierStokes equations
g:
is jointly measurable and (i) l g(t, u)  g(t, v)IH , H <::: c l u  v i (ii) for a. a. t E [0, T] [0, T]
x
H
+
L(H, H)
l g (t, u) IH,H <::: a ( t )(l + lui) for all u, v E H. There is a filtered probability space 0 (D, :F, (:Ft ) t >o, P) carrying a Wiener process with the univ ersal property that for any f, g satisfying the abo v e conditions (for any a( t) and c) and any ordinary or relaxed control the equa tion (22. 3. 1) has a unique solution on 0 satisfying the energy inequality =
w,
IE
l u(t) l 2 + { T ll u(t) ll 2 dt < E lo tE[O,TJ
(
sup
)
(22.3.2)
with the constant E uniform over the set of controls; in fact E E ( uo, a ()) . =
The uniqueness of solutions requires the property
l b(u, v, y)l <::: c l u l � ll u ll � lvl � llvll � I I Y I I
of the form b which is only valid in dimension 2. We now fix a space 0 as given by this theorem. This may be a Loeb space (as in [6, Theorem 6.4. 1 and 6.6.2] ) , but for the current purpose this is not essential3 : we simply assume that there is such a space 0 in the basic standard universe of discourse and do not care about its provenance. (In Section 22.3 . 1 1 it will b e necessary t o specify the space 0 more precisely.) For a given control cp and initial condition u0 we define
u'P
=
the unique solution for the control cp with
u(O)
=
u0
(Strictly we should write u'h to denote the underlying space, but where not mentioned this will be n.)
22.3.2
Costs
To formulate an optimal control problem it is necessary to consider the cost of a control. Here we consider a general cost comprising a running cost and a terminal cost, defined as follows. 3
Thi� i� i n contra�t to Section 22 . 3 . 1 1 and abo when working in d
necessary to specify the space 0 more precisely.
=
3 , where it will be
22.3. Optimal control for
d
=
327
2
Definition 22.3.2 We assume given (and fixed for the paper) two functions h : [0, T] x H x M + R and h : H + R with the following properties.
(i) h is jointly measurable, nonnegative, continuous in the second and the third variables and satisfies the following growth condition: for a.a. t E [0, T], all u E H and m E M l h (t, u, m) l :::; a(t) (l + l ui) .
(ii) h : H + R is nonnegative, continuous and has linear growth: that is, lh(u) l :::; c (l + lui) for all u E H . The cost for a control cp (ordinary o r relaxed) is J ( cp ) defined by T h (t, u'P( t ) , cp ( t )) dt + h (u'P( T )) .:1( cp) IE =
(lo
)
(Strictly we should write Jo,(cp) to denote the underlying space, but if the space is not mentioned then we mean D.) The minimal cost for ordinary controls is
.:Jo
=
inf{ .:J(e) : e E Q:}
and the minimum cost for relaxed controls is
Jo 22.3.3
=
inf { .:T( cp) :
cp E �}
Solutions for internal controls
In the present context we are not concerned with proving existence of solu tions to the stochastic NavierStokes equations (sNSE)  either by Loeb space methods or any other technique: we are simply assuming the basic existence result above. However, for the purposes of obtaining an optimal control we wish to transfer this result to give the existence of an internal solution U if> to the internal equation on the internal space * 0 controlled by an internal control
be the unique solution (which exists by transfer of Theorem 22. 3. 1) on * 0 to the internal equation * (22. 3. 1) with control ; that is U if> ( T )
=
loT { *AUif> (a)  *B (Uif> (a)) + *f (a, Uif> (a) , T + lo *g (a, U if> (a)) dW(a)
*uo +
v
( a)) } da + (22.3.3)
and satisfying the internal energy inequality
(22.3.4)
Then U if> has a standard part u o uif>, which is a solution on L ( *O ) to the standard equation (22. 3. 1) with control 0, i. e. u = uoq, and =
(22.3.5)
22.3.4
Optimal controls
The main theorem for controls with no feedback is now easy to derive. Theorem 22.3.4 Let uo E H and f, g, h and h satisfy the conditions of The orem 22. 3. 1 and Definition 22. 3. 2. Then there is an optimal relaxed control ipo,
i.e. such that J ( ipo)
=
Jo
Proof. Take a sequence of controls ( ek ) k E N c e: such that
K E * N; then *J• n (BK) Jo for the internal ordinary control BK : * [0, T] + *A1. Corresponding to this control is the unique internal solution Fix an infinite
::::::J
ueK to the internal controlled NavierStokes equation (22.3 .3) which obeys the energy inequality (22.3.4). Let ipo : = oeK : [0, T] + M 1 (1\1) and apply Theorem 22.3.3 to the control
so
22.3. Optimal control for
d
=
329
2
It only remains to show that ..JL ( *O ) ( IPo ) = ..J ( �Po ) . First notice that ..J ( �Po ) is a real number so ..J (ipo) = *..J• o (*ip0 ). Apply Theorem 22.3.3 to the internal control *'Po and the internal solution u *
so
ipo is optimal.
D
Actually the above theorem is easily derived from the following more gen eral consequence of Theorem 22.3.3. Theorem 22.3.5 topology on ::D .
The function ..J is continuous with respect to the narrow
Proof. This is routine using the nonstandard criterion for continuity, together with Theorem 22.3.3 and the argument in the final part of the above proof. D This result, together with the density of that Jo = ..Jo and so we have Theorem 22.3.6
of relaxed controls.
11: in
::D (Theorem 22.2.6) means
The optimal relaxed control ipo is also optimal in the class
As usual in optimal control theory, if the force term f is suitably convex then from an optimal relaxed control it is easy to obtain an optimal ordi nary control.
22.3.5
HOlder continuous feedback controls
(d
=
2)
The natural generalization of the control problem for the stochastic 2D NavierStokes equation is to allow e to depend on the instantaneous fluid ve locity as well as on time:
u(t)
=
uo +
lo { vAu(s)  B (u( s)) lo g (s, u(s)) dw( s ) . t
t
+
+ f (s, u(s), B( s , u(s))) } ds +
(22.3.6)
It is straightforward to extend the approach outlined above for nofeedback controls to the special situation when the controls considered are Holder con tinuous feedback controls with uniform constants, as follows.
330
22. Optimal control for NavierStokes equations
Definition 22.3. 7 For a given set of constants ( o:, ,6, 1, c1 , c2 ) with o:. ,6 E (0. 1 ) and 1 , C l , c2 > 0 the set of Holder continuous feedback controls C( o:, ,6, I: q , c2 ) is the set of jointly measurable functions
e [o, TJ :
x
H
____,
M
which are locally Holder continuous in both variables with these constants; that is: (22.3.7) for all
t, s E [0, T] such that I t  s l ::; 1 and all u, v E H such that l u  v i ::; f .
We will see that this definition guarantees the existence of an optimal ordinary control; the Holder continuity allows us to construct the standard part of an internal control as an ordinary control without the need to consider measurevalued (i.e. relaxed) controls. The cost of a control is as before: (22.3.8) where h and h. satisfy the conditions of Definition 22.3.2. The minimal cost is now defined as JO = inf { .J(B ) : B E Q: ( o:, ,6, / , c1 , c2) }
o
(of course, strictly this should be .J (o: , ,6, 1, c1 ; c2 )). The standard part of an internal control is defined in the obvious way:
Definition 22.3.8 Suppose that 8 *M. Define e E Q:(o:, ,6, / , cl , c2 ) by
E *Q: ( o: , ,6, { , c 1 , c2 ), so 8 : * [0 , T] *H + x
B(t, u) 08 ( t, u) 08 ( T, U) for any T :::::J t and u :::::J u in H (because of the condition (22.3.7) ) . =
=
Write e
=
08.
Now we have the counterpart o f Theorem 22.3.3 for Holder continuous controls.
Let 8 E *Q:(o:, ,B, ,, c1 , c2 ) be an internal control and j, g satisfy the conditions of Theorem 22. 3. 1. Let U8 be the unique solution on * 0 to the internal equation with control 8: U8 ( T ) *uo + + )0ro T{ v*AU8 (T)  *B (U8 (T)) + *j (T, U8 (T) , 8 (T, U8 (T))) } dT (22.3.9) T + *g (T, U8( T)) d W(T )
Theorem 22.3.9
=
i
22.3. Optimal control for
d
=
331
2
and satisfying the internal energy inequality T
)
I U8(T) I 2 + {o II U8 (T) II 2 dT < E. rE*[O ,T] J Then its standard part u = o ue is a solution on L (* O.) to the standard equa tion (22. 3. 6) with control 0 8 (z. e. u U0 8 ), and IE·p
(
sup
=
*J·o ( 8 ) � ..1L ( *Sl ) C8 ) .
This gives the following with proof almost identical to that of Theorem 22.3.4.
Let f, g satisfy the conditions of Theorem 22. 3. 1. There exists an optimal admissible control B0 E Q:(q , c2 , (3, / ) , i. e. such that
Theorem 22.3. 10
J(Bo )
=
22.3.6
o:,
Jo .
Controls based on digital observations
(d
=
2)
The most general results for optimal control o f 2 D stochastic N avierStokes equations involves feedback controls that depend on the entire past of the path of the process u through digital observations made at a fixed finite number of points of time. This model of control was discussed in [1] and [7] for finite dimensional stochastic equations; in the latter paper digital observations were made at random times, but we do not include that feature here. The controlled system takes the form
du(t) = { vAu( t )  B(u(t), u( t )) + f ( t, u, B(y(u), t))} dt (22 . 3 . 1 0) + g ( t, u)dw(t) with initial condition u(O) = u0 E H. In this equation u = u( ) denotes the path of the process u( t) up to the present time (which will be clear from the context) . The function y ( u) in the control e denotes the digital observations or readout given by the path u, described in the next sections. 22.3.7
The space
H
For this system the conditions on f, g will be strengthened slightly so that the solutions to (22.3. 10) are strong, and thus the paths of solutions to the equation belong to the space 7{ where Let
lui
7{
=
C([O, T] ; H) n L = ([O, T] ; V ) .
denote the uniform norm o n this space; that is
l ui = 0 sup l u( t ) l 9S T
and note that with this norm 7{ is separable.
332
22.3.8
22. Optimal control for NavierStokes equations
The observations
Controls will be based on the information received from digital observations of the solution path u E 'H made at observation t imes
0 < t1
< ... <
tv < T
that are fixed for the problem. For completeness we write t0 = 0 and tp+ l but observations are not made at these times. The digital observation (taking its value in N) at time ti is denoted by Yi where
T,
Y�
:
'H + N
(i
1, .
=
The complete set of observations for a path
u
. . , p)
is recorded as
y(u) = ( Yl ( u ) , . . . , Yv (u)) The i th observation y, is assumed to be nonanticipating in the sense that depends only on the past u I [0, t�J of the path u up to time t�. The digital observation functions y, are fixed for the whole problem.
22 . 3 .9
y� ( u )
Ordinary and relaxed feedback controls for digital observations
The feedback controls considered here are as follows. M is the fixed com pact metric space as before. Definition 22.3. 1 1 An ordinary feedback control based on digital ob servations is a measurable function
e NP :
x
[o, T]
____,
M
where e is nonanticipating in the sense that if t k :<.:: t < tk + l for k = 1 , . . . , p, then B(y, t) depends on t and only the first k components of y. namely (y 1 , . . . , Yk)· For t E [0, tl) a control e depends only on t. A relaxed feedback control based on digital observations is a mea surable function where 1111 ( M) is the space of probability measures on M and i.p is non anticipating in the same sense as for an ordinary control. Write 6 and fJ for these sets of ordinary and relaxed feedback controls.
22.3. Optimal control for
d
=
333
2
For brevity, in the current context , by a control (ordinary or relaxed) we mean a feedback control based on digital observations. The interpretation of a relaxed control is as before. Nonstandard (i.e. internal) controls have standard parts that are given as follows, using the compactness of nonfeedback controls (Theorem 22.2.5) . Let
X
* [0, T]
+
*M ( M)
It is sufficient to construct the standard part of
in the sense of the weak topology on ::D (sec the remark following Theorem 22 .2.5). For k = 1, . . . , p and n E I'JP write n I k = (n 1 , . . . , n k ) and notice that
=
Each
*[t k , t k+ l [ ), so,
From this we define the standard part r.p = o
for t E [t k, t k + l [ · (Note that for fixed n E I'JP if we write W (T) =
in 11: is dense in ::D .
is compact and the subset Q:S of uniform step controls
The following is the extension of Theorem 22.2.8 to show how the standard part of a digital feedback control acts.
334
22. Optimal control for NavierStokes equations =
Let
. If z : [0, T] lvl + lR is a bounded, measurable function continuous on A1 and Z : * [0, T] *M + * JR is a bounded uniform lifting (according to Defini tion 22. 2. 'l) then for any n E NP
Proposition 22.3 . 13 x
22.3.10
x
Costs for digitally observed controls
The cost of a control is defined in the expected way: for a control r.p ( ordi nary or relaxed) define (22.3 . 1 1 ) where u is any solution to the equation with control r.p . The cost functions h and lL are given, with properties set out below. As usual the aim is to find a control which minimizes the cost over the set of controls. We will see that the solution to the equation is unique for a given control so we can define the cost J(r.p) = J(r.p , u'P ) (where u'P is the solution for the control r.p) . Then the optimal control problem is reduced to finding a control such that Jo
:= inf{J(B) : e an ordinary feedback control }
is obtained. As is to be expected, in general the set of ordinary feedback controls docs not have sufficient closure properties and we prove existence of an optimal relaxed control.
22.3.11
Solution of the equations
As noted above the conditions on the functions f, g, h, h need to be strengthened (and modified to take account of the form of feedback) . We fix the following set of assumptions for the current discussion.
There exist constants a > 0 and L > 0 such that: f : [0, T] 7{ M + H is jointly measurable, bounded, nonanticipating, Lipschitz contmuous in the second variable and continuous in the third variable. Specifically (i) l f(t, u, m ) l :::; a (ii) u I t = v I t ===? f(t, u, m) = f ( t, v , m ) (iii) l f(t, u, m)  f(t, v, m) I H :::; L l u  v i
Conditions 22. 3 . 14 ( a)
x
x
22.3.
Optimal control for
·
x
+
=
x
(c)
(d)
x
·
(e)
335
f(t, u, ) continuous for a. a. t E [0, T] , for all u, v E 7{ and m E M . g : [0, T] 7{ + L(H, V) is jointly measurable, bounded, nonanticipating and Lipschitz continuous in the second variable. Specifically (i) l g(t, u) IH . v :::; a (ii) u I t = v I t ===? g(t, u) = g(t, v) (iii) l g(t, u)  g(t, v) IH . H :::; L lu  v i for a. a. t E [0, T] and all u, v E 7l . For convenience we also assume that g is diagonal, i. e. V n E N Sd(n) where Sd (n) denotes the gn Prn(g I Hn ) : [0, T] x 7{ space of n n diagonal matrices (that is, a21 f. 0 iff i f. j). g(t, u) is invertible and l g  1 (t, u)f(t, u, m) l :::; a for a. a. t E [0, T] , all u E 7{ and m E M . h : [0, T] 7{ fl,f + lR i s jointly measurable, nonnegative, bounded, nonanticipating in the second variable and continuous in the second and third variables. Specifically (i) l h(t, u, m) l :::; a (ii) u I t = v I t ===? h(t, u, m) = h(t, v, m) (iii) h(t, ) is continuous for a. a. t E [0, T] , all u, v E 7{ and m E M. h : 7{ + lR is nonnegative, continuous and bounded. Specifically l h(u) l :::; a for all u E 7l . (iv)
(b)
d=2
,
x
·
Exist ence and uniqueness o f solutions t o t h e equation With the in formation, control and cost structures now defined, we return to the equa tions (22.3. 10) under consideration. Since these equations involve feedback of the entire past of the process both directly in the coefficients f, g and through the control r.p there are new considerations concerning existence and unique ness. The first task therefore is to show how the basic existence theorem of [6] can be extended to prove the following.
There is an adapted Loeb space 0 = ( D , F, (Ft ) t E[O ,TJ , P) carrying a Wiener process of covariance Q, with the following property. For any f, g satisfying Conditions 22. 3. 14 {a, b, c), any initial condition u0 E H and any feedback control r.p (ordinary or relaxed) the equation (22. 3. 10) has a unique strong solution u u'P on n .
Theorem 22.3 . 15
w
=
22.
336
Optimal control for NavierStokes equations
The proof of this is rather long and somewhat complicated technically. The essence is to solve the internal hyperfinite dimensional Galerkin approximation in H N with the internal control *cp giving an internal solution u*'P ' and then take the standard parts. The latter involves transferring the Girsanov theorem to the hyperfinite setting, and the use of special topologies defined on the spaces C(H, H) and C(H x M, H) to make them separable so that Anderson's Luzin Theorem can be applied.
22.3.12
Optimal control
For the rest of this section the space n = ( D, :F, (:Ft ) tE[ O ,TJ ' P) and Wiener process w are fixed as those given by Theorem 22.3.15. The following is proved by taking the internal control in the proof of the previous theorem to be an arbitrary internal control
Suppose that
is the so lution to the following equation in H N on the internal space s1 ( n, .A, ( .AT ) T E * [O,T] , II) of the previous theorem:
Theorem 22.3.16
=
dU(T) { v'AU(T)  BN (U(T) , U(T)) + FN (T, U,
is the unique strong solution to the controlled equation (22. 3. 1 0) with cp 0 and =
=
=
where J(
) = * IE
(lor h (T, uif> ,
), T)) dt + *h (Uif> ) ) *
The existence of an optimal relaxed control now follows easily: Theorem 22. 3 . 1 7
The cost function J is continuous on ::D
Proof. Suppose that CfJn + cp in ::D . Theorem 22.3.16 shows that ] ( *cpn ) :::::J J ( cpn ) and so J( cpM ) :::::J *J ( cp M ) for small infinite AI . Again using Theorem 22.3.16 we have J( C{JM ) :::::J J ( 0CfJM ) = J ( cp ) . Thus *J ( cpM ) :::::J J ( cp ) for small infinite M and so J ( cpn ) + J ( cp ) as required. D The compactness of ::D and the density of stepcontrols now gives:
22.4.
Optimal control for
=
d 3
337 =
For feedback controls based on digital observations Jo Jo (the infimum for relaxed controls) and there is an optimal relaxed control ipo with J( !fo) Jo Jo.
Theorem 22. 3 . 1 8 =
=
Optimal control for d
22.4
=
3
The extra complications of the 3D stochastic N avierStokes equations mean that we are only able to consider the simplest kind of controls. The controlled equation considered takes the form
u(t)
=
lo. t { vAu(s)  B (u(s)) + f (s, u(s) , B( s))} ds + t + l g (s, u(s)) dw(s)
uo +
(22.4. 1)
which has the same appearance as equation (22.3.1) for nofeedback controls in dimension d = 2. The possible nonuniqueness of solutions means that the very definition of cost and optimality has to be refined  sec below  and this requires a space that is rich enough to support essentially all possible solutions for any given control. For this it is necessary to employ a Loeb space. First we fix the conditions on the coefficients f, g in the equation. These are the natural extension of the basic conditions for the fundamental existence of solutions in Theorem 22.2.3.
There exists an L 2 function a(t) 2 0 such that f [0, T] H M V ' is jointly measurable with linear growth, con tinuous in the second variable on each Kn and continuous in the third variable, i. e. (i) l f(t, u, m) lv ' :::; a(t) ( 1 + l u i ), (ii) f(t, , m) E C(Kn ; V�eak ) for all finite n, where Kn {u E V ll u ll :::; n} as before (with the strong Htopology). (iii) f (t, u, ) continuous for a. a. t E [0, T] , all u E H and all m E M . g [0, T] H L(H, H) is jointly measurable with of linear growth, and continuous in the second variable on each Kn , i. e. (i) l g(t, u) IH . H :::; a(t) (1 + l u i ) , (ii) g(t, ) E C( Kn ; L(H, H)weak) for all finite n, for a. a. t E [0, T] and all u E H.
Conditions 22.4 . 1 (a)
:
x
x
·
·
(b)
:
x
·
+
+
22.
338
22.4.1
Optimal control for NavierStokes equations
Existence of solutions for any control
Theorem 22.2.3 then gives the following, where as before the key feature to note is the uniform energy bound.
There is a filtered probability space 0 = (D, :F, (:Ft ) t >o, P) with the universal property that for any f, g satisfying Conditions 22. { 1 (a, b) and any ordinary or relaxed control, the equation (22. { 1) has a solution sat isfying the energy inequality
Theorem 22.4.2
IE with a constant E
..
..
=
(
T
)
l u(t) l 2 + { ll u(t) ll 2 dt < E lo t E[O ,TJ sup
(22.4.2)
E ( uo, a) uniform over the set of controls.
The space fulfilling this result in [6, Theorem 6.4. 1] is a particular Loeb space, which we need to specify later. Note that in the current 3D setting solutions may not be unique (this is an open problem) .
22.4.2
The control problem for 3D stochastic NavierStokes equations
In order to formulate the optimal control problem for the 3D stochastic N avierStokes equations it is necessary to have a space 0 as in Theorem 22.4.2 that has solutions for all controls. Fix such a space; then bearing in mind the possible nonuniqueness of solutions, the optimal control problem is formulated as follows. For a given control cp define
UP := { u : u is weak solution on n to (22.4. 1) for cp and satisfying (22.4.2) } Then, taking the functions J(cp, u)
= IE
(22.4.3)
h, h satisfying the conditions of Definition 22.3.2 let
(loT h (t, u(t) , cp (t)) dt +
h ( u ( T))
).
(22.4.4)
The cost for cp is then defined by J(cp)
:=
inf{J(cp. u)
:
u,
E UP} .
(22.4.5)
Now set Jo
=
inf { J(B, u)
: e E Q:, u E U8 }
=
inf{J(B)
: e E Q: }
22.4. Optimal control for
=
d 3
339
which is the optimal cost for ordinary controls. Likewise
Jo
=
inf { ..1 ( ip, u) : ip E
::D, u E U'P}
=
inf { ..1 ( ip) : ip E
::D }
is the optimal cost for relaxed controls. The optimal control problem is to find if possible an optimal control B and optimal solution u E U8 such that ..J(B , u)
=
..J(B)
=
..Jo
and similarly for relaxed controls
..l ( !f , u) = ..l ( IP ) = Jo . Remark The above definitions are of course relative to the space 0 so strictly this should be acknowledged by writing ..]0° , etc. However, the space n defined in the next section will be fixed for the rest of the paper. 
22.4.3
The space
n
In order to obtain an optimal control and optimal solution to the 3D stochastic N avicrStokcs equations the space must be rich enough to carry a good supply of solutions for each given control; we will see that the Loeb space used in [4] (sec also [6] ) to solve the general existence problem for the stochastic N avierStokes equations in dimensions ::; 4 is just what is needed. For the rest of this paper we take 0 to be this space, defined as follows. Fix an infinite natural number N and take the internal space where D is the canonical space of continuous functions * C ( * [0, T] ; HN ) , and II is the Wiener measure induced by the canonical Wiener process W(T) E HN having covariance operator QN ( = PrN * Q PrN ) . The filtration (A T ) TE * [o . T] is that generated by the internal process W ( T) . Take the Loeb measure and Loeb algebra P
:Ft
=
n a(AT )
<0
t T
v
=
L(II) , :F L (A) =
and put
N
where N denotes the family of Pnull sets. This gives the adapted Loeb space n
which carries the process covariance Q.
=
w
( D, F, (Ft ) t E[o .rJ , P )
=
0 W,
which is a Wiener process in H with
340
22. Optimal control for NavierStokes equations
The fundamental existence result for the 3D stochastic NavierStokes equa tions on the space 0 in [ 4] or [ 6] is proved by taking the internal solution U ( T, w) to the Galerkin approximation of dimension N, so U(T, w ) is an internal pro cess with values in HN . The existence of a uniform finite energy bound allows the definition of the standard part process u = o U in H, which is the required solution. We do not need to repeat this here in order to establish existence, but the proof of the key theorem below concerning approximate solutions uses similar techniques.
22.4.4
Approximate solutions
Recall the main idea for using nonstandard methods in deterministic op timal control theory used in 2dimensions. Take a minimizing sequence of controls Bn and consider a nonstandard control 8 = *BK for infinite K and a solution U = U 8 to the equation for this control. Then u = o u is a standard solution for the control 1.p = 08 which is thus optimal. The problem with this approach in the 3D stochastic setting is that the solution U associated with 8 would live in *H and be carried by the nonstandard space *0 (provided we can make sense of that ) . We showed in Section 22.3 how the uniqueness of solutions for d = 2 allows us to move back to the space n to complete the story, but here an alternative approach is needed. This is provided by the notion of approximate solutions, first introduced in [ 1 1] in order to establish results on attractors for 3D stochastic NavierStokes equations. The rather technical notion required in [ 1 1] has been adjusted below to suit the current needs  and is somewhat simpler than the corresponding notion in [ 1 1] . Definition 22.4.3 (Approximate solutions) Fix an initial condition uo E H and let E = E ( uo, a) . Let
as follows.
(a) For each j E *N denote by X1 the internal class of internal processes u : * [0, T] X n + HN that are * adapted to the filtration (AT ) ' with the following properties: (i) With ITprobability � 1 
� on n, for all T
E
* [0, T] and all
T I Uk(T)  Uk(O)  { vAUk(a)  Bk (U(a), U(a)) } da T T  Ff (a. U(a)) da  Gk (a, U(a)) dW(a) l :S Tj
lo
lo
where Bk = (*B, *e k), Gk = (*g, *e k) ·
lo
Fi: (a. U(a))
=
k ::; j : (22 .4.6)
( *f (a, U(a) ,
and
22.4.
Optimal control for
=
d 3
341
(ii)
(22.4.7) (iii)
I U(O)  uo l
1
<:.:: : J
(22.4.8)
(b) Define x = nj EN xf . This is the set of approximate solutions to the controlled stochastic N avierStokes equations for control . Remark The sequence of internal sets X1<�'> is decreasing with j and since the set X involves Xj only for finite j then we have X_k � X for all K E *N \ N . The theory o f [11] , adapted t o the modified definition o f approximate solu tion here gives the following key result. First we must define the internal cost j ( , U) for an internal process U E H N on Sl and control by
which is different from *..J•n (
Let be an internal control (that is, E *::D) and U E x . Then for Pa. a. w E D, IU(T, w) l is finite for all T E * [O, T] and U(, w) is weakly S continuous. The process defined by u(t, w )
=
0U(T, w )
for T :::::J t belongs to U o q, (where 0 is the standard part of q> in the Weak topology on ::D as described in Section 22. 2. 3) and
(b)
Let r.p E ::D be a standard control and u E UP. Then there exists with u = o U as defined in part (a), and hence J(*r.p, U)
:::::J
u
E x *'P
..J(r.p, u) .
Proof (Sketch) . The proof of (a) follows quite closely the proof of Theo rem 6.4. 1 in [6] . The main difference here compared with that proof is (i) the presence of the control in the drift term f, which is dealt with routinely using
22.
342
Optimal control for NavierStokes equations
Anderson's Luzin Theorem, and (ii) the fact that internally we only have an approximate equation. That is, we have :::::J instead of = in the internal equa tion for each finite coordinate of the solution. That is, from the approximate equality (22.4.6) with j = K for some infinite K, using overspill, we have
Uk (T)
:::::J
Uk (O) + +
lo { vAUk (a)  Bk (U(a) , U(a))}da lo Ff (a, U(a) )da lo Gk (a, U(a)) dW(a) T
T
+
T
for finite k. This gives equality after taking standard parts. For (b) the theory in [11] can be adapted easily to obtain suitable liftings of the solution u, giving an internal process U E X*.P such that u = 0U. The remaining fact about the cost follows from part (a) with
22.4.5
Optimal control
The results concerning optimal control of the stochastic 3D equation follow from the next more general theorem. Theorem 22.4.5
Let f, g, h and h satisfy Conditions 22.4. 1.
(a) For every control r.p there is an optimal solution u.P E U .P with J ( r.p, u.P) = J(r.p) (b) Let I.{Jn be a sequence of controls and Un E U.P n, with J ( I.{Jn , un) converging.
Then there is a control r.p and u E U.P with J( r.p, u) = nlim J( I.{Jn , un ) · >oo
Proof. (a) For a control r.p E ::D take a minimizing sequence ( U k ) k EN C U.P such that J(r.p, uk ) � J(r.p) . Using Theorem 22.4.4 (b) , for each k take an approximate solution uk E x *.p such that U k = 0 uk and .J (*r.p, Uk ) :::::J J ( r.p, Uk ) . Weakening this gives and for each finite k so by Nlsaturation there is an infinite K with uK E x; c:: x *.p and J(*r.p, UK) :::::J J(r.p, uK) · Then Theorem 22.4.4 (a) gives 0 UK E U.P with J(r.p, 0 UK) = J(r.p) , so we may take u.P = 0UK.
22.4. Optimal control for
=
d 3
343
(b) The proof is similar to the proof of (a) . For each k take an approximate solution uk E x *'P k such that U k = ouk and J(*r.pk , Uk ) :::::J J( yk , U k ) · Using N l saturation there is an infinite K with UK E x: K � x *'P K and J(*r.pK , UK) :::::J
J1 = l imn_, = J(r.pn, Un) · Now let r.p o ( *r.pK) and u J(r.p, u) = 0J(*r.pK, UK) = Jl =
=
0 UK;
Theorem 22.4.4 (a) gives
u
E
U'P
and D
Corollary 22.4.6
(a) There exists a relaxed control r.po (and solution u'P 0 E U 'PO ) that achieves
the mmimum cost for ordinary controls (i. e. such that J(r.po, u'P0 ) J( r.po) Jo) ; (b) There is an optimal relaxed control rpo and optimal solution u
=
=
Proof. For (a) take a minimizing sequence of ordinary controls (and solutions) and for (b) take a minimizing sequence of relaxed controls and solutions. Then apply (b) of the previous theorem. D Remark 22.4. 7 One consequence of the possible nonuniqueness of solutions is that unlike in dimension 2 we cannot prove that Jo = Jo, so this theory is rather less satisfactory than that for 2D.
22.4.6
Holder continuous feedback controls
(d
=
3)
Recall that in dimension d = 2 , the existence of optimal ordinary feedback controls in a general setting was ensured by taking Holder continuous feedback controls (Section 22.3.5) . The technique of approximate solutions allows the extension of this idea to the 3D stochastic setting. Controls take the form B(t, u(t)), and the controlled equation (22.4. 1) becomes
u(t)
=
lot {  vAu(s)  B (u(s)) + f (s, u(s), B(s, u(s))) } ds + t + lo g ( s , u(s)) dw(s)
uo +
(22.4.9)
with cost function
J(B, u) for
u E U'P
=
JE
(loT h (t, u(t) , B(t, u(t)))dt + h (u(T)) )
(22 .4. 1 0)
22.
344
We maintain the Conditions
Optimal control for NavierStokes equations
22.4. 1
on the coefficients
f, g, h, h.
The controls considered are the Holder continuous feedback controls 11:(o:, (3, 1, Cl , c2 ) given by Definition 22.3.7. The controls we consider are Holder continuous feedback controls defined as follows. So we fix constants (o:, f3, r. q , c2) and as before write e:H = Q:(o:, (3, , , C l , C2) · We continue to work with the space n as defined above (Sec tion 22.4.3) ; it is routine to see that for any control B E Q:H there is a solution to (22.4.9) having the energy bound (22.4.2) . As before write U8 for the set of all such solutions, and then the minimal cost is defined as expected: for B E Q:H extend the earlier definition to give: J ( B ) = inf { J(B, u) and then set
JoH
=
: u E U8 }
inf { J(B ) : B E Q:H } .
The standard part of an internal control is defined in the natural way: Definition 22.4.8 Suppose that 8 E * Q:H , so 8 : * [0, T] 08 = B E Q:H by B(t, u) = 08 ( t , u ) = 08(T, U)
for any T :::::J tion (22.3.7)).
2 2 . 4 .7
t
and
U
:::::J
u
x
*H
+
*M. Define
in H (this makes sense because of the condi
Approximate solutions for Holder continuous controls
For an initial condition u0 E H and an internal control 8 E * Q:H the sets X18 and X8 are defined as in Definition 22.4.3 with F/( modified in the obvious way:
F/; (a, U(a))
=
(*f (a, U(a) , 8(a, U(a))) , *e ) k
Then we have the counterpart of Theorem controls.
22.4.4
for Holder continuous
Theorem 22.4.9
(a) Let 8 be an internal Holder continuous control (that is, 8 E *Q: H ) and
8 uEX . U ( , w) is
,
Then for a. a. w E n I U(T, w) l is finite for all T E * [O, T] and weakly S continuous. The process defined by u(t, w) 0U ( T, w ) for T :::::J t belongs to U o e and

=
22.4. Optimal control for
d
=
3
345
(b) Let B E �H be a standard Holder continuous control and u E U8 . Then
there exists U E x *e with u = 0U as defined in part (a), and hence J(*B, U) ;::; :J(B, u) .
The proof is similar to the proof of Theorem 22.4.4 except for dealing with the terms that contain the control , and this is routine using the Holder continuity property. Finally we have the counterpart of Theorem 22.4.5.
Let f, g, h and h satisfy Conditions 22.4. 1. (a) For every Holder continuous control e there is an optimal solution u 8 E U 8 with :J(B, u8 ) :J(B) . (b) Let Bn be a sequence of controls in �H and Un E U8 n , with :J(Bn , u n ) converging. Then there is a control B E �H and u E U8 with :J(B, u) limn_, J ( Bn , Un )
Theorem 22.4.10
=
CXJ
=
·
Proof. Proved from Theorem 22.4.9 just as Theorem 22.4.5 followed from Theorem 22.4.4. D The existence of an optimal control now follows. Corollary 22.4 . 1 1 There is an optimal Holder continuous timal solution u 80 E U 80 with :J(Bo, u 80 ) = :J(Bo) = :J0H .
control Bo and op
Proof. Take a minimizing sequence Bn of controls in �H and solutions Un E U e with :J(Bn , u8n ) + :J0H and apply (b) of the previous theorem. D
Appendix: Nonstandard representations of the spaces Hr For ease of reference the most important facts about the nonstandard rep resentation of members of the spaces Hr are summarized here, together with other crucial nonstandard matters. For full details consult [6] . The space *H has a basis (*e n ) n eN = (En ) n eN given by the nonstandard extension *e of the function e : N + H. For each U E *H there is a unique internal sequence of hyperreals (Un ) n eN such that *=
22.
346
Optimal control for NavierStokes equations
and the subspace HN of *H defined by
HN
=
N
{ L Un En : n =l
U E *IRN , U internal }
O n H ( and also V or any of the spaces Hr) there are both weak and strong topologies. Here are the most important nonstandard facts.
Let U E *H (or H N ) Then U is (strongly) nearstandard to u in H (denoted U :::::J u ) if I U  *u l :::::J 0 (and then l UI :::::J l u i ). U is weakly nearstandard to u in H (denoted U :::::Jw u) if ( U, *v) :::::J ( u, v) for all v E H and l u i � 0 1 UI (allowing this to be oo). If U is strongly nearstandard then it is weakly nearstandard and the stan dard parts agree, so we write o U for the standard part in whichever topol ogy it may be nearstandard. If l UI is finite then U is weakly nearstandard and (0 U) n 0 (Un ) for fi nite n . If II U II is finite then U is strongly nearstandard in H. If I U I , I V I < oo then U :::::J w V � U� :::::J Vi V i E N.
Proposition 22.4. 1 2 (a) (a) (b)
(c) (d) (e)
=
Similar facts obtain for other spaces in the spectrum Hr. From (22.2.2) we have the following lemma ( a slight extension of the Crucial Lemma ( Lemma 2.7.7) of [6] ) , which is crucial for taking standard parts of the quadratic term B(U). which is the source of many of the difficulties when discussing the NavierStokes equations. Le mma 22.4 . 1 3 ( Crucial Lemma)
finite, and z E V then where u
=
ou
and v
=
If U, V E * V with II UII and II VII both
*b(U, V, *z) b(u, v, z) 0V (with u, v E V . ) Hence, if II U II < oo *B (U) :::::J b(u) in V ' (weakly) :::::J
For the proof consult any of [ 6. 9, 10] . Taking standard parts of the term AU needs the following observation. Le mma 22.4 . 14
If U E * V with IIUII finite with u *AU:::::J Au in V ' (weakly)
=
ou
then
347
References
References
[1]
S . ALBEVERIO, J . E . F ENSTAD , R. H OEGH KROHN and T. LINDSTROM,
Nonstandard Methods in Stochastic Analysis and Mathematical Physics, Academic Press, New York, 1986.
[2]
E . .J . BALDER, New fundamentals of Young measure convergence, in Cal culus of Variations and Differential Equations (Eds. A. Ioffe,S. Reich & I. Shafrir) , CRC Research Notes 410, CRC Press, Boca Raton, 1999.
[3]
H . I . B RECKNER, Approximatwn and Optimal control of the Stochastic NavierStokes Equation, PhD Thesis, Halle (Saale) , 1999.
http : //webdo c . sub . gwdg . de/ebook/e/2002/pub/mathe/ 99H10 1 /of _index . htm
[4]
M . CAPINSKI and N . J . CuTLAND. "Stochastic NavierStokes equations", Acta Applicanda Mathematicae, 25 ( 1991) 5985.
[5]
M. CAPINSKI and N . J . CUTLAND , " Existence of global stochastic flow and at tractors for N avierStokes equations", Probability Theory and Related Fields, 1 1 5 (1999) 121151.
[6]
M . CAPINSKI and N . J . C C TLAND , Nonstandard Scientific, 1997.
Fluid Mechanics, World
Methods for Stochastic
[7]
N . J CUTLAND , " Infinitesimal methods in control theory: deterministic and stochastic", Acta Appl. Math. , 5 (1986) 105 135.
[8]
N . .J . CUTLAND , Loeb measure theory, in Developments in nonstandard mathematics, Eds. N.J. Cutland, F. Oliveira, V. Neves, J . SousaPinto, Pitman Research Notes in Mathematics Vol. 336, Longman, 1995.
[9]
N . J . CUTLAND and K. GRZESIAK, "Optimal control for 2dimensional stochastic N avierStokes equations", to appear in Applied :tvlathematics and Optimization.
[10]
N . J . CUTLAND and K. GRZESIAK, "Optimal control for 3dimensional stochastic N avierStokes equations", Stochastics and Stochastic Reports, 77 (2005) 437454.
[11]
N . J . CUTLAND and H . J . KEISLER. " Global attractors for 3dimensional stochastic NavierStokes equations". Journal of Dynamics and Differential Equations", 16 (2004) 205266.
.
.
22.
348
[12]
Optimal control for NavierStokes equations
N . J . CuTLAND and H . J . KEISLER, "At tractors and neoattractors for 3D stochastic NavierStokes equations", Stochastics and Dynamics, 5 (2005)
487533.
[13] [14]
G . DAP RATO and A. DEBUSSCHE. " Dynamic programming for the stochastic N avierStokes equations", Mathematical Modelling and N umer ical Analysis, 34 (2000) 459 475. G . DAPRATO and J . ZABCZYK , Stochastic Equations Cambridge University Press, Cambridge, 1992.
sions,
in Infinite Dimen
[15]
R.V. GAMKRELIDZE,
[16]
K. G RZESIAK, Optimal Control for NavierStokes Equations using Non standard Analysis, PhD Thesis, University of Hull, UK, 2003.
[17]
A . ICHIKAWA , " Stability of semilinear stochastic evolution equations", Journal of f\Iathcmatical Analysis and Applications, 90 (1982) 1244.
[18]
S . S . SRITHARAN (ed. ) , Optimal Control of Viscous in Applied Mathematics, Philadelphia, 1998.
[19]
R. TEMAM, NavierStokes edition, 1984.
[20]
M. VALADIER, Young measures, in Methods of Nonconvex Analysis (A. Cellina, ed. ) , Lecture Notes in f\Iathematics 1446, SpringerVerlag, Berlin,
1978.
1990.
Principles of Optimal Control, Plenum, New York,
Equations,
Flow, SIAM Frontiers
NorthHolland, Amsterdam, third
Lo calintime existence of strong solutions of the ndimensional Burgers equation via discretizations Joao Paulo Teixeira*
Consider the u1.
=
Abstract
e quation :
111l11.
 (·u V')u + f ·
for
a:
E
[0, 1]"
and t E
(01
)
x ,
together v.rith periodic boundary conditions and initial condition ·u ( t.. 0)
_q(:c) .
=
This corresponds a NavierStoket; problem where t he incompress ibility condition has been dropp ed . The maj or difficulty iu existence proofs for this simplified problem is the unbo unded advection term,
(n · \7)'!1..
We present a proof of localintime existence of a smooth solution based on a di scr e ti zation by a suitable Euler scheme. It will be shown that this solution exists in an int erval with C depending only on n and tlte values of the Lipschitz constants of f and u a.t time The argument given is based directly on local estimates of the solutions of the discrctizcd problem.
[0. T) , where T ::; t;,
23 . 1
0.
Introduction
The Burgers equation 'Ut
 II.I::!..U + (u \l)·u = .{ ·
x E D c �" , t > O
provides an example of a model for flows that takes into account the interaction between diffusion and ( nonlinear ) advection. This is probably the simplest nonlinear physical model for turbulence. *
lllstitnto Superior Tecn.ico, Lisbon, Portugal . j teix@math . ist . utl . pt
350
23. Burgers equation via discretizations
This equation is a simplification of the NavierStokes equations where the incompressibility constraint on the flow has been dropped. The NavierStokes equations are usually studied by taking projected solutions of the Burgers equations onto a subspace of divergence free functions. The main difficulty in the analysis of nonlinear flows, the advection term, ( u · 'V)u, is present in both equations. It is a consequence of the incompressibility condition that there are only trivial NavicrStokes flows for n = 1 . This is not the case in Burgers flows, where the case n = 1 is already nontrivial. Cole [6] and Hopf [10] have studied the problem in the real line:
{
Ut  VUxx + UUx u(x, t) = uo (x)
=
0
X
E
X
ER
IR, t > 0
(23. 1 . 1 )
Using the transformation of variables
v u = 2v ,
(23 . 1 .2)
V.r
(23. 1 . 1) was reduced to and initial value problem for the heat equation:
{
Vt  VVxx = 0
v(x, t)
=
vo(x)
X
E
X
ER
IR, t > 0
(23 . 1 .3)
(Where u� = 2 v ..!jvo ) . This enabled them to get a formula for a solution of problem (23. 1 . 1 ) . in terms of a Gaussian integral. Thus, and under very mild conditions on u0 , a solution, u, exists for all x E lR and t > 0, and is smooth. It would be reasonable to expect that the n > 1 case should behave in a similar way. The usual standard theory for this type of equations uses a weak formulation of the problem in Sobolev spaces, a Galerkin approximation to show existence of weak solutions and, finally, regularity estimates. How ever, these methods only lead to partial results. For wellposed problems with generic initial and boundary data, the best that can be obtained is local in time existence of regular solutions. It can also be shown that regular (strong) solutions are unique (if they exist) . See [4, 13, 9, 12, 16] for accounts on these methods. To gain some new insight into this problem, we develop a hyperfinite ap proach to the following model problem on a compact domain.
Let 'll' n = IRn ;zn be an ndimensional torus. Assume that f is locally Lipschitz continuous on 'll'n X [0, oo ) and u0 E C2 •1(1!'n ) (that is, uo is twice differentiable and all its second partial derivatives are Lipschitz continuous on 'll'n ). Let D = 'll'n X (0, oo ) . Let E JR+ . Let f IRn ;zn + IR, Model Problem:
v
:
351
23. 1 . Diffusionadvection equations in the torus
u0 : JRn ;zn + JRn . Our task is to study the initial value problem for the Burgers equations:
{
Ut  vb.. u + (u · 'V)u = f u = uo
D on 'll'n X {0}. in
(23. 1 . 4)
A (strong) solution of this problem is a sufficiently smooth u satisfying (23 . 1 . 4) . By "sufficiently smooth u" we mean that u : 'll'n x [0, oo ) + lR is such that u(  , t) E C2 ( 1l'n ) for all t E [0, oo ) and u ( x , ·) E C 1 ( [0. oo ) ) for all x E 'll'n .
23 . 2
A discretization for the diffusionadvection equations in the torus
We now look for a discretized version of problem (23 . 1 .4) . Since we are mainly interested in the existence result, we will do this in the simplest way possible. First we work in the standard universe. To discretize 'll'n , we introduce an hspaced grid on 'll'n . Choose lvf E N 1 , and let h = Jv1 . Then, let: 11':\1
=
{ 0, h, 2h, . . . , (M  l ) h , 1 r
=
h (Z mod Mt.
Consistently with our interpretation of 'll'M as a discrete version of the toms, given any x ( m 1 , m2 , . . . , mn ) h and
'll'n , we define addition in 1!'�1 as follows: y = (h, l2 , . . . , ln )h in 11':\1 , let:
=
This makes addition welldefined in 'll'M ; furthermore, it will behave in a similar way to addition in 'll'n . In particular, the set of gridneighbors of any x E 'll'M ,
{x ± hei :
i = 1, 2, . . . , n }
is welldefined. As for the discretization of time, consider T and define:
lf:
=
{ 0, k, 2k, . . . , (K  l)k, T }
E JR+ and K E N 1 . =
k (N n [0, K) );
Let
k
=
J? ,
To each triple d = (M, K, T), with M , N E N 1 and T E JR+ , we associate a discretization as defined above . We will , later on, introduce some restrictions on the set of admissible d. For now, given any d = (lvf, K, T) , with M, N E N 1 and T E JR+ , we let :
352
23. Burgers equation via discretizations Dd
=
']['M
X
(II u {T})
The elements of Dd are called gridpoints. Any function U whose domain is a subset of Dd is called a gridfunction. The discrete Laplacian can be defined as a map b.. d : JR15d + JR15d given by:
(
)
1 n U(x + h e,, t)  2U(x, t) + U(x  h ei , t) . (23.2. 1 ) h2 L i=l The definition of addition in 'll'M makes this welldefined. The discrete version of the nonlinear parabolic operator, P, that occurs in the diffusionadvection equations is defined as the map Pd : JRDd + JRDd b.. d U(x, t)
=
given by:
" U(x, t + k)  U(x, t)  vudU ( x, t ) k U(x + he , , t)  U(x  he , , t) . L....., U · ( x, t ) +� 2h i= l '
With we get:
PdU(x, t) �
=
,�;,, (U(x, t + k)  ( 1  2nvA)U(x , t) 
 >.
� ((
( �
)
�
(23.2.2)
)
))
v  U; (x, t) U(x I he ; , t) + v + U; (x, t) U(x  he ; , t)
The discretized version of problem (23 . 1 .4) is, then:
{
PdU(x, t) f(x, t) U(x, 0) uo(x, 0) =
=
if
(x, t)
if
X E
E
Dd
1l'A1 ·
(23.2.3)
Let us look more closely at the finite difference equation in (23.2.3). If we solve for U(x, t + k), we get: =
U(x, t + k) (1  2nv.A) U(x, t) n (, h h t) U(x+hei , t) + v+ 2 Ui (x, t) U(x  he i . t)�) U (x, +A v i 2 \ + .Ah 2 f(x, t)
�(
)
(
)
(23.2.4)
23.2.
353
Standard estimates for the discrete problem
Equation (23.2.4), together with the initial condition in (23.2.3), gives a recursive formula for the unique solution of problem (23.2.3). Define a map
by:
+ + � (0  � v, (x t)) U(x+he, t) + 0+�V,(x, t)) U(x  he,, ,v,
=
(1  2nvA) U(x, t)
.x
,
whenever
(x, t)
E Dd . Equation
+
U ( x, t k)
=
(23.2.4)
(23.2.5)
can now be written as:
+
In what follows, we will require the righthand side of (23.2.5) to be in the form of a weighted average, that is, all coefficients multiplying U(x, t) and U(x ± he, , t) must be positive. For this, we will always assume that:
1 A<2m/
( 23.2.6)
We also require that: for all ( x, t ) E Dd ,
(23.2. 7)
where ud is the solution of (23.2.3), relative to d. This means that the set of admissible discretizations is:
{
(M, K, T) E I'h
x
I'h
x
JR+ :
1 A<2nv
1\
V( x, t )

E Dd
2v I Ud ( x. t ) l < h
}
Note that condition (23.2.7) is easily satisfied when we work in the nonstandard universe. Whenever h is infinitesimaL 2/: will be infinitely large; so, if U is kept finite, then condition (23.2.7) holds.
23.3
Some standard estimates for the solution of the discrete problem
In this section, we derive estimates that will not require a nonstandard discretization.
23.
354
A
Burgers equation via discretizations
Consider any ( standard) discretization, Dd , let:
c
I I U IIL� ( A)
max
(x,t)EA
[U] L� ( A) [[UlJL� ( A)
d.
Given
U
Dd +
IRn
and
I U(x, t) l
I U(x, t)  U(y, t) l lx  Y l I U(x, t + k)  U(x, t) l max k (x , t) , (x , t +k )EA max
(x,t) , (y , t)EA #y
=
Here, I · I denotes the Euclidean norm on ]RJ . We begin by showing some properties of
Let U, V E (IRn ) 15d . Let h > 0 be such that: 2v . h< V I II L'; (Dd ) Then, for all (x, t) E Dd : I
Lemma 1
(23.3. 1)
(x, t) E Dd: I <�>(U V) (x, t) l <::: I (1  2nvA) U(x, t) l +
Proof. For any ,
A
I
Let
� ( I (v  � V; (x, t) ) U(x + he; , t) H (v + � V; (x, t) ) U(x  he; , t) l )
M
=
II U II Lc; (l5d ) ·
By condition
(23.3.1):
h v ± 2 V(x, t) > O. Thus:
I <�>(U, V)(x, t) l +A Since
�((
<:::
�
(1  2nvA) I U(x, t) l +
( �
)
)
v  V; (x, t) I U(x + he; , t) l + v + V; (x, t) I U(x  he , t) l
I U(x, t) l
<:::
I
M,
we get:
)
� (v  h Vi(x, t) + v + h Vi(x, t) ) n
<:::
(1  2nvA) M + AM
<:::
(1  2nvA) M + AM2vn
2
=
M.
2
D
23.3.
355
Standard estimates for the discrete problem
Lemma 2
Let U, V, W, Z E (IRn y15 d . Then, for all (x, t) =
E Dd :
Proof. For all
(x, t)
(
)(
)
E Dd :
=
)
v( (v % (v %
(1  2n :A. ) U( x , t)  V(x, t) +
� ( (v % W, (x, t) ) U(x + he; , t)  z, (x, t) ) V(x + he, t) + + (v + � W; (x, t) ) U(x  he , t)  + z; (x , t) ) V(x  he ; , t) ) ( 1  2n:A. v ) ( U(x, t)  V(x, t) ) + n h + A � ( (v W,(x , t) ) ( U(x + he , , t)  V(x + he , , t) ) + + (v + � W, (x, t) ) ( U(x  h e, , t)  V(x  h e2 , t) )) + + >.
=
2
8(
Ah n +2 ( Zi (x, t)  Wi (x, t) ) V(x + he , , t) =
)(
Lemma 3
(
)
( Zi (x, t)  Wi (x, t) ) V(x  h e, , t)
Let d be a discretization such that: 2v h < uo ll ll u'c + T ll f ll u)O
If U is the solution of the discrete problem,
(23. 2. 3),
In partzcular, d is an admissible discretization.
then:
)
.
D
23.
356
Burgers equation via discretizations
Proof. Let M = ll uo ll uxo and L = l l f ll uxo . We show, by induction on 0, k, . . . . T, that for all T = 0, k, . . . , t and for all x E 'll'j\1 :
I U ( x, t) l
t
<::: M + tL.
By the initial condition in problem (23.2.3), the result holds for t = 0. As suming it is valid for some t E lJ: , we have, by the hypothesis on h and the induction hypothesis, that:
2v
2v < M + tL I U ( x, t) l ' x {0. k, 2k, . . . , t}. Hence, by Lemma 2: h<
for all ( x, t) E 'll'j\1
I U ( x, t + k) l 23.4
M + TL <
2v
<::: I
=
M + (t + k)L. D
Main estimates on the hyperfinite discrete problem
The nonstandard analytical setup we now use is as follows. We work in a superstructure (V(IR) , *V(IR), *) . We will omit the stars on all standard functions of one or several variables and usual binary relations. Any x E *IRn is called finite iff there exists m E N such that l xl < m. Each finite x E *JR can be uniquely decomposed as x = r + c: , where r E lR and c is an infinitesimal; r is called the standard part of x, and denoted by st x. If x, y E *IR are such that x  y is infinitesimal, then we say that x is infinitely close to y, and write x :::::J y. Similarly, if x, y E *IRn : x :::::J y
lx  Yl
iff
:::::J 0 iff
Xi
:::::J
Yi ,
for each i
=
1,
. . .
,
n.
Let j, l E N. If x E * ]RJ is finite then let:
If F : A
c
*JRJ
+
*IR1 is an Scontinuous function, define °F ( st x )
For sets A E ]RJ . let:
0A
=
{ st x :
=
op by:
st ( F (x )) Vx E A.
(23.4. 1)
}
"x is finite" and x E A .
Each "circle" map as introduced above i s sometimes called a standard part map.
23.4. Main estimates on the hyperfinite discrete problem
357
In this, and other similar definitions of this work, we consider *V, where V = { � d } . *V is an internal *family of linear maps indexed in the internal set of all admissible d; each �d E *U is a *linear map acting on the vector space of internal gridfunctions U : Dd + *R Similarly, we can consider *U, where
{
U = Pd : 111 E *N, T E *JR+ , K E *N and
}
TM 2 < 1 2nv , K

*U is now an internal family of linear maps, indexed on an internal set. Each HiiK E *U is then a *linear map acting on the vector space of internal grid functions. For the norms and seminorms introduced in the previous Section, we can proceed similarly. We get families of internal norms and seminorms which, by transfer, satisfy the following. Given U : Dd + * IRn and A C Dd :
I I U IIL� ( A) [U] L� ( A)
=
[[U]] L� ( A) Lemma 4
{
=
{
}
* max I U(x , t) l : (x, t) E A ,
} }
* max IU(x, t)  U(y, t)l : (x, t) , (y, t ) E A and x # y , =
lx Yl * max I U(x, t + k)  U(x, t) l : (x, t), (x, t + k) E A . k
{
_
Let d be an admissible discretization such that h is infinitesimal. Let
If U is the solution of the discrete problem
{23. 2. 3)
then, for T < 2n1Lo :
Lo [U] L� (Dd ) rv< 1 2nLoT 
To work internally, we will begin by requiring a weaker condition on h, namely:
Proof.
2v ,, ,h < ,l l uo l l u)C + T llfl l u)O ·
(23.4.2)
Lemma 3 shows that this is a sufficient condition for admissibility of d, which is all that is needed for now.
23. Burgers equation via discretizations
358
Fix z E 'll'M and write V(x, t)
U(x + z, t + k)  U(x, t + k) =
=
=
U(x + z, t) . Then, by Lemma 2:
.h2 (f(x + z, t)  f(x, t))
(
+ >.h 2 (J(x + z, t)  f(x, t))
)(
)
By *recursion, we will construct a function L : {0 , k, 2k, . . . , T} that, for all t 0 , k, . . . T and x, z E 'll'M :
+
*IR such
=
I U(x + z , t)  U(x, t) l :::; L(t) lzl
(23.4.3)
Let L (t) be given by:
For t
{
=
0,
Lo L(t) + nk(L(t)) 2 + nkL6
L(O) L(t + k)
we get:
I U(x + z, 0 )  U(x, 0 ) I
=
lu.o(x + z)  u.o(x) I :::; [ u.o ] Ld (1I''M) lzl :::; Lo I z l .
Now, we assume that for all T
=
0,
k, . . . , t and x E 'll'j\1
I U(x + z, T)  U(x, T) l :::; L(t) lzl . Since d is admissible ( and so I V(x, T) l that I
=
I U(x + z, T) l
<
2{), Lemma 1 implies
I U(x + z, t + k)  U(x, t + k) l :::; L(t) l zl Ah n + 2 L L(t) lziL(t)2h + Ah 2 [j]Lx lzl i= l :::; L(t) + nk(L(t)) 2 + nkL6 l z l .
(
)
This shows inequality (23.4.3) by *induction. Note that L( t) defines the Euler iterates for the standard ODE initial value problem:
{
y' y(O) Lo =
(23.4.4)
23.4. Main estimates on the hyperfinite discrete problem Since
y
'
359
2ny 2 , and thus: Lo Y (t) < 1  2n L0 t
2 0 , y (t) 2 L0 . Hence,
y
'
:<::
Now, use the fact that h is infinitesimal. By the convergence of the Euler iterates to the solutions of problem (23.4.4) , we conclude that: Lo L(t) :::::J y (t) :<:: 1  2n L t o This estimate is valid for t in any compact interval where problem (23.4.4) has solution; in particular, the estimate will be valid for t < 2n1Lo . D
Let d be an admissible discretization such that h is infinitesimal. Let Lo be as in Lemma 4 and
Lemma 5
Mo = max ([[uo lJL�(1I':\I ) ' � ( [[f]] L�(l5d) f12, Lo ) '
[[uolJL�(1I':\I ) = * max { v� uo (x)  (u · 'V)uo(x) + f(x, 0 ) :
with
x E 1!'%J }
If U is the solution of the discrete problem {23. 2.3) then, for T <
2n1Lo :
o [[ u]] L�(Dd) < 1  M 2n Lo T Proof. Again we begin by requiring that h satisfies the weaker statement: 2v (23.4.5) h < uo ll ll u " + T i l f i l L Write V(x, t) U(x, t + k) . Then, by Lemma 2: U(x, t + 2k)  U(x, t + k)
.h 2 (f(x, t + k)  f(x, t)) =
=
=
=
l=l
+ >.h2 (f(x, t + k)
 f(x, t))
)
)
By *recursion, we again construct a function !VI : {0 , k, 2k, . . . , T} that , for all t = 0 , k, . . . T and x E 1l'A1 :
I U(x, t + k)  U(x, t) l
:<::
M(t)k.
+
*JR such
(23.4.6)
23. Burgers equation via discretizations
360
Let M ( t) be given by:
{
Mo M(t) + nkM(t)L(t) + nkMJ Here, L is the function introduced in the proof of Lemma 4. For t 0 , we get: n n U(x, k)  U(x, 0 ) k :::::J V� UO ( X )  ( UO · \7 ) UO ( X ) + j ( X , 0 ) < [[uolJL:J ( 1I':\I ) < Mo. Now, we assume that for all T 0 , k, . . . , t and x E 1!'�1 M(O) M(t + k)
=
=
I U(x, T + k)  U(x, T) l <::: M(t)k. Since d is admissible ( and so I V(x, T) I I U(x, T + k) l < 2/: ) , Lemma 1 implies that I
i=l
<
(M(t) + nkM(t)L(t) + nkMJ ) k
M ( t + k)k.
This shows inequality (23.4.6) by *induction. Now, we note that M(t) defines the Euler iterates for the standard ODE initial value problem:
{
=
n z y + nMJ (23.4. 7) Mo. Here, y( t) is the solution of problem (23.4. 7) , as given in the proof of Lemma 4. Since z' 2 0 , z(t) 2 Mo, so nz(t)y(t) 2 nMoLo 2 nMJ. Hence, z'(t) <::: 2nz(t)y(t) , and thus: Mo z(t) < 1  2nLot z' z(O)
=
Now, use the fact that h is infinitesimal. By the convergence of the Euler iterates to the solutions of problem (23.4.4) , we conclude that:
M (t) :::::J z(t) <
Mo 1  2nL0t
23.5. Existence and uniqueness of solution
361
This estimate is valid for t in any compact interval where problem (23.4.4) has solution; in particular, the estimate will be valid for t < 2n1Lo . D
23 . 5
Existence and uniqueness of solution
By the results of section 23.4, the function U(x, t) is Scontinuous for t E 1 . Note that the set of admissible values for T < Lo 2n depends only on L0, that is, on the size of the Lipschitz constants of u0 and f. Therefore, u(st x, st t) st U(x, t) (23.5. 1) is welldefined and continuous on 'll'n x [0, T] . Also, Lemmas 4 and 5 imply that u is globally Lipschitz on 'll'n x [0, T] . To show that u solves (locally in time) the problem and also to prove a uniqueness result, we need the following:
{0, k, . . . , T}, with T
=
Let a : 'll'n x [0, T] + lR be Lipschitz continuous. Then, the problem, in D Vt  vb.. v + (a · 'V)v f (23.5.2) on 'll'n X {0}. uo has a unique solution v E C2 • 1 ( 1l'n X [0, T] ) . Furthermore, if v is a solution of problem (23. 5. 2) then (in the nonstandard universe) v satisfies: for all (x, t) E Dd , there exists an c :::::J 0 such that v(x, t + k)
.h 2 f(x, t) + r:: >. h 2 . Proof. Consider problem (23.5.2). Since a is given, the vectorial differential equation Vt v b.. v + (a 'V)v f is just an uncoupled system of n scalar second order parabolic equations. Since a is a Lipschitz continuous function on ']['n X [0, T] and Uo E C2 • 1 ('ll'n ), by the theory of parabolic equations (e.g. Friedman, [8]) , the problem has a unique (strong) solution, v E C2, 1 ( 1l'n X [0, T] ). By the regularity of v: f(x, t) Vt (X, t)  vb.. v (x, t) + (a(x, t) · 'V)v(x, t) Pd v(x, t) v(x, t + k)  v(x, t) vu" v x, t + � a" x, t v(x + h e2, t)  v(x  he i , t)  d ( ) 6 ( ) k 2h l=l Lemma 6
{
=
v =
=

_
• •
·
=
=
>,;,.' (v (x, t + k)  (1  2n v,\)n(x, t) 
:::::J
 ), � ( (v  �a; (x, t) ) n(x + he; , t) + (v + � a,(x t) ) n(x  he, , t) ) )
23. Burgers equation via discretizations
362
Consequently, for each (x, t) E Dd, there exists an c :::::J 0 such that:
,
v(x t + k) + >. =
=
v
(1  2n .A ) v(x , t) +
� ((v � a, (x, t) ) v(x + he; , t) + (v + � a; (x, t) ) v(x  he, , t)) +
+ .Ah 2 f(x , t) + r:: .A h 2
D
Here is our existence lemma: Lemma 7 Let u and T be as given by equation (23. 5. 1 ). Then u E C( 2 ; l ) ( 'll'n x [0, T] ) and u is a (strong) solution of the Burgers equation problem (23. L4) . Proof.
Consider the problem:
{
Vt
v
 Vb,. V + ( U · \7) V
=
=
j
in D on 'll'n
uo
X
{0}.
By Lemma 6 the problem has a unique (strong) solution, u E C2, 1 ( 1l'n X [0, T] ) and, in the nonstandard universe, for all (x, t) E Dd, there exists an c :::::J 0 such that: v(x , t + k)
=
=
T =
I U(x, t)  * v(x , t) l ::; rt. This implies that, for all (x, t) E Dd: u( st x, st t) st U(x, t) st *v(x, t) v( st x, st t), =
=
=
as wanted. (As usual, from this point on we omit stars on *v) . For t = 0 the statement follows from the initial conditions. Now, assuming the statement holds true for some t jk, j E *N: =
U(x, t + k)  v(x, t + k)
= =
(
)(
)
Ah n u (x, t)  U (x, t) +2 v(x + hei, t)  v(x  he 2 , t) . , L " i= l
23.5. Existence and uniqueness of solution
363
Since v(  , t) is a C2 function on the compact set 'll'n , there exists a finite C so that l v(x + he, , t)  v(x  hei , t) I :S 2hC. Also, by the definition and continuity of u , Ui (x, t)  Ui (x, t) = 6 :::::J 0. By the induction hypothesis and Lemma 3, I
I U(x, t + k)  v(x, t + k) l
rt + ck + )..2h n6 2hC = rt + k (c + nC6) :S r(t + k ) . :S
D
As consequence of the above Lemma, we get: Theorem 1
C2, 1 ('ll'n ) . Let
Let f be locally Lipschitz continuous on 'll'n
(
1
x
[0, oo ) and u0
1
E
)
L  max [uo] L=d ('II'nM ) n ( [f] £=d (75d ) ) 1 / 2 , [[uolJ Ud"' ( � nM ) n ( [[fll u"'d ( Dd ) ) 1 /2 · >
>
Here, [[uolJ Ud"' (1I'M" ) is an upperbound for the Lipschitz constant of the function �uo (x) + (uo · 'V)uo(x) + f(x, O) . Then, for any T < 2�L ' the problem
{
Ut = v�u  (u · 'V)u + f u = uo has a strong solution, u E C2, 1 (1l'n X [0, T] ) .
in D on 'll'n
X
{0}.
The uniqueness theorem follows from:
Let u and v be solutions of the Burgers equation problem [0, T] , for some T > 0. Let Lv = [v] L d (1I'n x [O , T] ) . Then:
Lemma 8
on 'll' n
X
u(x, t) = v(x, t)
for all
X
E
(23. 1..4)
{1, }
'll'n , 0 :S t < min T 2nLv
Proof. Using the nonstandard statement of Lemma 6, (note that u and v are given functions) , we conclude that for all (x, t) E Dd, there exist infinitesimal 61 and 62 such that:
u(x, t + k) =
v(x, t + k)
=
.h 2 .
23. Burgers equation via discretizations
364
}
{
Let 8 be the largest ik, i E *N, such that ik :S min 2n1L v , T . Let r E � + be arbitrary. We will show by induction on t 0, k, 2k, . . . . 8 that for all 0, k, 2k, . . . , t and x E 'll'j\;1 : =
T =
l * u(x, t)  * v(x, t) l
This implies that, for all (x, t)
E
:<::
rt.
Dd :
u(st x, st t)
=
s
v( t x, st t),
as wanted. (From hereon, we omit stars on *u and *v). For t = 0 the statement follows from the initial conditions. Now, assuming the statement holds true for some =
t jk, j E *N:
u(x, t + k )  v(x, t + k ) Let
c =
61  62
::::::
=
.h 2
0. Using Lemma 2: =
u(x, t + k )  v(x, t + k )
.h u(x, t + k )  v(x, t + k ) < rt + c:k + 2 nrt 2hLv ,
l
Since t :<:: e
:<::
l
rt + k ( c + rntLv ) .
2nL ' it follows that: lu(x, t + k )  v(x, t + k ) l
:<::
r(t + k ).
Let u and v be solutions of our Model Problem [0, T] , for some T > 0. Then u v.
Theorem 2
=
D
(23. 1..4)
Assume there exists t E [0, T] such that, for some x v(x, t) ; let B be the infimum of such t. Note that: u(x, B) v(x, B) , for all x E 'll' n .
Proof.
=
E
on 'll'n
x
'll'n , u(x, t) #
(23.5.3)
References
365
If e 0 then equality (23.5.3) follows from the initial conditions; otherwise, it follows from continuity of u and Consider the problem: =
{
v.
Wt
w
=
=
vD.. w  (w · V')w + g u(x , e)
in D on 1rn
X
(23.5.4)
{0}.
where g(x, t) f(x, e + t), for all (x, t) E 'll'n X [0, T  B] . Consider WI , W2 : 'll'n X [0, T  B] + lR given by WI (x, t) u(x, e + t) and WI (x, t) ( x , e + t). But W I and w2 are solutions of problem (23.5.4) such that, for arbitrarily small t > 0: =
This contradicts Lemma 8.
=
=
v
D
References [1] .T . M . BURGERS, "A Mathematical Model Illustrating the Theory of Tur bulence", Advances in Applied Mechanics, 1 ( 1948) 171  199. [2] M. CAPINSKI and N . J . CUTLAND, "A Simple Proof of Existence of Weak and Statistical Solutions of Navier Stokcs Equations", Proceedings of the Royal Society, London. Ser. A, 436 ( 1992) 1  1 1 . [3] R. COURANT and D . HILBERT. Methods of mathematical physics, Inter science Publishers, New York, 1953 1962. [4] P. CIARLET, The finite element method for elliptic problems, North Hol land, New York, 1978. [5] C . CHANG and H . J . KEISLER, Model Theory, North Holland, Amsterdam, 1990. [6] J. D . COLE, "On a Quasilinear Parabolic Equation Occuring in Aerody namics", Quant. Appl. Math., 9 ( 1951) 225236. [7] N. CUTLAND , Nonstandard Analysis and its Applications, London Math ematical Society, Cambridge, 1988. [8] A. FRIEDMAN , Partial differentzal equatwns of parabolic type, Prentice Hall, Inc., Englewood Cliffs, N . .T., 1 964. [9] D . HENRY, Geometric theory of semilinear parabolic equations, Springer Verlag, New York, 1981.
23. Burgers equation via discretizations
366
[10] E. HOPF, "The Partial Differential Equation Ut + UUx nications Pure and Applied Math. , 3 (1950) 201 230.
=
VUxx ",
Commu
[11] H . J KEISLER, Foundations of Infinitesimal Calculus, Prindle, Weber and Schmidt, 1976. .
.
[12] .J .L. LIONS, Quelques methodes de resolution des problems aux limites nonlineaires, Dunod, GauthierVillars, Paris, 1969. [13] J .L. LIONS, Control of systems governed by partial differencial equations, SpringerVerlag, New York, 1972. [14] A . J . MAJDA and A.L. BARTOZZI, Vorticity and Incompressible Flows, Cambridge Texts in Applied Math., Cambridge, 2002. [15] K . D . STROYAN , Introduction to the Theory of Infinitesimals, Academic Press, New York, 1976. [16] LUTHER W. WHITE, "A Study of Uniqueness for the Initialization Prob lem in Burgers ' Equations", .T. of Math. Analysis and Applications, 1 72 (1993) 41243 1.
C alculus with infinitesimals Keith D . Stroyan *
24. 1
Intuitive proofs with "s1nall" quantities
Abraham Robinson discovered a rigorous approach to calculus with in finitesimals in 1960 and published it in [9] . This solved a 300 yem· old problem elating to Lcibni z and Newton. Extending t he ordered field of (Declekind) "real" numbers Lo indude infiniLosinmls is noL difficult algebraically, buL calcu lus depends on approximations with transceudental functions. Robinson used mathematical logic to show how to extend a.ll real ftmctions in a way that preserves their properties in a precise sense. These properties cm1 be used to develop calculus with infinitesimals. Infinitesimal numbers have always fit basic intuitive approximation when certain quantities are "small enough," but Leibniz, Euler, and many others could not make the approach free of contra diction. Sec:tion 1 of this article uses some intuitive approximations to derive a few fundamental results of analysis. We usc approximate equality, x � y, only in an intuitive sense that " x is sufficiently close to y". II. Jerome Keisler developed simpler approaches to Robinson's logic and began using infinitesimals in begilming U.S. calculus courses in 1 969. The experimental and first edition of his hook were used widely iu th e 1970's. Sec tion 2 of this article completes the intuitive proofs of Section 1 nsiug Keisler's approach to infinitcsimals from [6J . 24 . 1 . 1
Continuity and extreme values
Theorem 1 The Extreme Value Theorem Suppose a function f [x] is continuous on a compact interval [a, b] . Then f[x] attains both a maximttm nnd minimum, that is, there are points Xl\.fAX and Xmin in [ a, b] , so that for eve1·y other x in [a, b] , f [xmin] :::; f [x] :::; f [XMAX] · ·university of Iowa,
USA.
keithstroyan@uiowa . edu
370
24. Calculus with infinitesimals
Formulating the meaning of "continuous" is a large part of making this result precise. We will take the intuitive definition that f[x] is continuous means that if an input value x1 is close to another, x2 , then the output values a.rc close. We summarize this as: ! [:1:] is continuous if and only if a. ::::; x1 g::: x2 ::::; b ==} .f[x r] g::: j [x2]. Given this property of f[x] , if we partition [a, b] into tiny increments,
a. < a +
k(b  a) 2(b  a) l (b  a) < ··· < b < ··· < a+ < a+ H H H
the maximum of the function on the finite partition occurs at one (or more) of the points :t:AJ = a + k ( a) . This means thai, for any other ptutit;iou point j (ba) rr : ' [X M ] 2: }· [.!.:  ( . _ X 1  Q, + J J Any point a ::::; x ::::; b is within f.J of a partition point x 1 a + j(bi7a) , so if H is very large, :x; g::: :r; 1 and
��
=
f[x u ] ;:::: f[x t ]
g:::
.f [x]
and we have found the approximate maximum. It is not hm·d t,o make t.his idea into a sequential argument where XJ\!J [II] depends on H, but there is quite some trouble to make the sequence X M[H] converge (using some form of compactness of [a, b] .) Robinson's theory simply shows that the hyperreal X lvJ chosen when 1/H is infinitesimal, is infinitely near an ordinary real number where the maximum occurs. We complete this proof as a simple example of Keisler's Axioms in Section 2. 24. 1 . 2
Microscopic tangency i n one variable
In beginning cakulus you learned that the derivc1bvo measures the slope of the line tangent to a curve y = f[x] at a particular point, (x, J[x] ) . We begin by setting up convenient "local variables" to use to discuss this problem. If we fix a particular (x, f [x] ) in the xycoordinates, we can define new parallel coordinates ( dx , dy) through this point. The (dx, dy)origin is the point of tangency to the curve. y
dy
m
dx
*� J;fx =
Figure 24. 1 . 1 : IVIicroscopic Tangency
24. 1 . Intuitive proofs with "small" quantities
371 =
A line in the local coordinates through the local origin has equation dy mdx for some slope m. Of course we seek the proper value of m to make dy = mdx tangent to y = f[x] . You probably learned the derivative from the approximation lim
.6 x > 0
f[x + �x]  f [x] �X
=
j ' [x] .
If we write the error in this limit explicitly, the approximation can be expressed as f[x +��f[x] = f' [x] + c or f[x + �x]  f [x] = f' [x] · �x + c �x. Intuitively we say f [x] is smooth if there is a function f'[x] that makes the error small, c 0, in the formula ·
:::::::
f [x + 6x]  f[x]
=
f'[x] · 6 + c · 6x
(24. 1 . 1)
:::::::
when the change in input is small, 6x 0. The main point of this article is to show that requiring c to be infinitesimal whenever 6x is infinitesimal and x is near standard gives an intuitive and simple direct meaning to C 1 smooth. Requiring only that c tend to zero only for fixed real x, or pointwise , is not sufficient to capture the intuitive approximations of our proofs. The nonlinear change on the left side of ( 24. 1 . 1 ) equals a linear change, m 6x, with m = f' [x] , plus a term that is small compared with the input change. The error c has a direct graphical interpretation as the error measured above x + 6x after magnification by 1j6x. This magnification makes the small change 6x appear unit size and the term c 6x measures c after magnification. ·
·
y
Figure 24. 1. 2: Magnified Error When we focus a powerful microscope at the point (x, f[x] ) we only sec the linear curve dy = m dx, because c 0 is smaller than the thickness of the line. The figure below shows a small box magnified on the right. ·
:::::::
372
24. Calculus with infinit,esimals dy
dy
X
Figure 24. 1.3: A !v!agnificd Tangent
24. 1 .3
The Fundamental Theorem of Integral Calculus
Now we usc the intuitive microscope approximation (24. 1 . 1 ) to prove:
Theorem 2 The Fundamental Theorem of Integral Calculus: Part I b Suppose we want to find fa f[x] dx . If we can find another function F[x] so that the derivative satisfies F'[x] = f[x] for every x , a :::; :c :::; b, then
1b
f [x] dx
=
F [b]  F[a]
The definition of the integral we use is the real number approximated by a sum of small slices, b6x
.l J[x] dx � f[x] b
�
·
8x, when r5x
st.C'p li.r
�
0
Figure 24.1 .4: Sum Approximations
24. 1 .4
Telescoping sums and derivatives
We know that if F[x] has derivative F' [x] = f [x] , the differential approxi mation above says,
F [x + r5x]  F[x] so we can smn both sides
=
f[x] · 8x + c: 8x •
24. L Intuitive proofs with "small" quantities bi5x
L F[x + 6x]  F[x]
x =a
,.,t('p O x
=
373 bi5x
b!5
L f [x] · 6x + L c · 6x
The telescoping sum satisfies, bi5x
L F[x + 6x]  F[x]
x=a s t e p Ox
=
F[b']  F[a]
so we obtain the approximation,
1ab f [x] dx ;::; x=a L f[x] · 6x = F[b']  F[a]  L c · 6x x=a bi5x
bi5x
"tep O x
, t c p O x
This gives, bi5x
bi5x
bi5x
x=a step Ox
x=a step O x
x=a step 6x
L f[x] · 6x  (F[b']  F[a] ) <
L c · 6x < L bi5x
< m ax [ l c l ] · L 6x x=a
or
I: .f[x] dx
step Ox
;::; 0 ;::;
=
lei
· 6x
max [ l c l ] · (b'  a )
� b;;;!� f[x] · 6x ;::; F[b']  F[a] . Since F[x] 1s continuous, step O x
F[b'] ;::; F[b] , so I: f [x] dx
=
F[b]  F[a] .
We need to know that all the epsilons above are small when the step size is small, c ;::; 0, when 6x ;::; 0 for all x a, a + 6x , a + 26, · · · . This is a uniform condition that has a simple appearance in Robinson's theory. There is something to explain here because the theorem stated above is false if we take the usual pointwise notion of derivative and the Riemann integral. ( There are pointwise differentiable functions whose derivative is not Riemann integrable. See [ 5 ] Example 35, Chapter 8, p.l07 ff. ) The condition needed to make this proof complete is natural geometrically and plays a role in the intuitive proof of the inverse function theorem in the next example. =
24 . 1 . 5
Continuity of the derivative
We show now that the differential approximation
f [x + 6x]  f[x]
=
f' [x] · 6 + c · 6x
374
24. Calculus with infinitesimals
forces the derivative function J' [x] to be continuous,
Let XI R:: x2 , but :r 1 =f x2 . Usc the differential approximation with x = X I and 15x = x2  :1:1 and also with x = x2 and 15x = x1  x2 , geometrically looking at the tangent approximation from both endpoints .
.f [x2]  f[x 1] = .f ' [xr] (.'1:2  x1 ) + c:1 (x2  x1) J [x1 ]  .f [x2] = .f' [x2] · (x1  x2 ) + c:2 · (x i  x2 ) ·
·
Adding these equations, we obtain
Dividing by the nonzero term (.1:2  x1 ) and adding .f' [x2] t o both sides, we obtain, j' [x2] = .f' [xi] + (c:1 £2 ) or j' [x2] R:: f' [xl] . since the difference between t,wo small errors is small. This fact can be used to prove:
Theorem 3 The Inverse Function Theorem {f f'[xo] =f 0 then f[x] has an inverse function in a small neighborhood of xo , that, is, if y R:: Yo = .f[x o] , then there is a wnique x R:: xo so that y = .f[x] . We saw above that the differential approximation makes a microscopic view of the graph look linear. If y R:: Yl the linear equation dy = m dx with m. f' [xr] can be inverted to find a first approximation to the inverse, ·
=
Yo = m (x1  xo) x 1 = xo + � (y  Yo) Y
·
y
1 X OX x ""'1"Xo
Figure 24. 1 .5: Approximate Inverse
24. 1 . Intuitive proofs with "small" quantities
375
=
We test to see if f [x 1 ] y. If not, examine the graph microscopically at (x 1 , yl ) = (x 1 , j[x 1 ] ) . Since the graph appears the same as its tangent to within c and since m = f'[x l ] :::::J J'[x 2 ] , the local coordinates at (x 1 , yl ) look like a line of slope m. Solving for the linear xvalue which gives output y, we get Y  Y l m (x 2  x l ) X2 = Xl + � (y  j[x 1 ]) =
·
Continue in this way generating a sequence of approximations, x 1 = xo + G[x n ] , where the recursion function is G[�] x + � (y f[�] ) . The distance between successive approximations is 1 lx 2  x 1 l I G[x l ]  G[xoJ I :<:: 2 lx 1  xo l 1 1 1 lx 3  x 2 l = I G[x 2 ]  G[x 1J I :<:: 2 · lx 2  x 1 l :<:: 2 · 2 · lx 1  xo l
� ( y  yo ) , Xn+ l
=
=
=
·
by the Differential Approximation for G [x] . Notice that G'[�] 0, for � :::::J x o , so IG'[�] I < 1/2 in particular, and
=
1  f'[�] /m :::::J
I G[x l ]  G[xoJ I :<:: 21 · lx 1  xo l I G[x2]  G[x 1J I :<:: 21 · l x2  x 1 l :<:: 212 · l x 1  xo l 1 lxn + l  Xnl < n · lx 1  xo l 2 lxn+ l  xo l < lxn+ l  Xn l + lxn  Xn  1 1 + · · · + lx 1  xo l < 1 + � + . . · + n · lx 1  xo l 2 2
(
�)
A geometric series estimate shows that the series converges, Xn + x :::::J x 0 and f[x] = y. To complete this proof we need to show that G[�] is a contraction on some noninfinitesimal interval. The precise definition of the derivative matters because the result is false if f' [x] is defined by a pointwise limit. The function f[x] = x + x 2 sin[n/x] with j[O] = 0 has pointwise derivative 1 at zero, but is not increasing in any neighborhood of zero. 24. 1 . 6
Trig, polar coordinates , and Holditch's formula
Calculus depends on small approximations with transcendental functions like sine, cosine, and the natural logarithm. Following arc some intuitive ap proximations with nonalgebraic functions.
24. Calculus with infinitesimals
376
Sine and cosine in radian measure give the :z; and y location of a poil1t on the unit circle measured a distance e along the circle. Now make a small change in the angle and magnify by 1/6@ to make the change appear unit size. y 1
Figure 24. 1 .6: Sine increment�; Since magnification does not change lines, the radial segments from the origin to the tiny increment of the circle meet the circle at right angles and appear under magnification to be parallel lines. Smoothness of the circle means that under powerful magnificat,ion, the circulc:Lr increment appears to be a line. The difference in values of the sine is the long vertical leg of the increment "triangle" above on the right. The apparent hypotenuse with length 6() is the circular increment. Since the radial lines meet the drclo at right angles tho large triangle on the unit circle at the left, with long leg cos[e] and hypotenuse 1 is similar to the increment triangle, giving
We write approximate similarity because the increment "triangle" actually has one circular side that is �straight. In any case, t his is a convincing argument that dd�n [e] = cos[e] . A similar geometric argument on the increment triangle shows that d([;;s [e] =  sin [B] . 24. 1 . 7 Tho
The p olar area differential
derivation of sine and eosine is related to the area differential in polar coordinates. If we take an anglo i ncrement of 6@ and magnify a view of the circular arc on a circle of radius r· , the length of Lhe circnlar increment is r· · c)(}, by similarity.
24. 1 . Intuitive proofs
377
with "small" quantities
y
Figure 24. 1 .7: Polar increment
A magnified view of circles of radii r and r · + 6r bct.ween t.he rays a t angles () and () + 6() appears to be a rect an gl e with sides of lengths or and r c)() If this were a t·.rue rec t angle , its area would be 7' • (j() 6r , but it is ouly an approximate rectangle Technically, we can show that the area of this region is ·r 6() 6r p l us a term that is s mall comp ared with thi s infinit esimal , ·
.
·
.
·
Keisler's Infinite Sum Theorem 16] assures us that error and integrate with respect to TdfJdr. Theorem 4 Holditch 's formula
The area swept out by a tangent of length con·uex curue in the plane is A = 7fR2 •
R
as
we
·it
·
can neglect this size
traverses an arbitmTY
Figure 24. 1 .8: pip coordin ates can sec t;his i nteres t i ng result by using a variation of pol ar coordinates and t h e infinitesimal polar area increment above. Since the curve is convex , ea,ch tangent, meets the cmve at, a unique angle, !p, and each poi nt in the region swept ou t by the tangents is a distance p along that ta n gent 'vVe
.
24. Calculus with infinitesimals
378
We look at an infinitesimal increment of the region in p
Figure 24 .1 .9: ponly increment Next, looking at. t.hc base point. of the tangent on the cnrvc, moving t.o the
correct
Figure 24.1 .10: rp and p increment of height p Orp and base op, or area oA ·
=
p Orp Op: ·
·
Figure 24. 1 . 1 1 : da = p Orp 8p ·
·
24. 1 . Intuitive proofs with "small" quantities
3 79
Integrating gives the total area of the region
{ R { 2 n pdcpdp = nR2 . Jo Jo 24. 1 .8
Leibniz's formula for radius of curvature r
The radius of a circle drawn through three infinitely nearby points on a curve in the (x, y ) plane satisfies
where s denotes the arclength. For example, if y = f[x] , so ds =
d dx �=1
(
)1 + (f' [x] ) 2 dx. then
)
f ' [x] = J1 + (f' [x] ) 2
If the curve is given parametrically, y Jx' [t] 2 + y' [t] 2 dt, then 1
r
24. 1 .9
( )
d dys  dxd
f" [x]
y[t] and x = x[t] , so ds
y ' [t]x" [t]  x ' [t] y " [t] (x'[t] 2 + y' [t] 2 ) 3/ 2
Changes
Consider three points on a curve lC with equal distances b.s between the points. Let a1 and a1 1 denote the angles between the horizontal and the segments connecting the points as shown. We have the relation between the changes in y and a: . [ al = b.y s1n (24. 1 . 2) 
b.s
The difference between these angles, b.a, is shown near PIII ( figure 24. 1 . 12). The angle between the perpendicular bisectors of the connecting segments is also b.a, because they meet the connecting segments at right angles. These bisectors meet at the center of a circle through the three points on the curve whose radius we denote The small triangle with hypotenuse gives r.
r
(24. 1.3)
24. Calculus with infinitesimals
380
s
Figure 24. 1 . 12: Changes in and 24 . 1 . 1 0
a
Small changes
Now we apply these relations when the distance between the successive points is an infinitesimal os. The change 0 sin[o:]
= 0
( ��)
= sin[o:]
 sin[ a  5o:] = cos [o:] . oa + {} . oa, (24. 1 . 4)
with {} :::::J 0, by smoothness of sine (see above) . Smoothness of sine also gives, .
[ ] 00:
00:
sm 2 = 2 + rJ . oa, with rJ :::::J 0. r
Combining this with formula (24. 1 .3) for the infinitesimal case (assuming # 0) , we get OS 00: = + 00: , with :::::J 0. 
r
L
L
•
Now substitute this in (24.1 .4) to obtain
o
( oyos )
By trigonometry, cos[o:]
=
=
os
cos[o:] ;:: + ( os, with ( :::::J 0. ·
oxjos, so
u�)
0 1 os 1  � = ;: + C ox :::::J ;: '
24. 1 . Intuitive proofs with "small" quantities
381
as long as g� is not infinitely large. Keisler's Function Extension Axiom allows us to apply formulas (24. 1.3) and (24. 1.4) when the change is infinitesimal, as we shall see. We still have a gap to fill in order to know that we may replace infinitesimal differences with differentials (or derivatives) , especially because we have a difference of a quotient of differences. First differences and derivatives have a fairly simple rigorous version in Robinson's theory, just using the differential approximation (24. 1 . 1 ) . This can be used to derive many classical differential equations like the tractrix, catenary, and isochrone, see: Chapter 5, Differential Equations from Increment Geometry in [11]. Second differences and second derivatives have a complicated history. See [2]. This is a very interesting paper that begins with a course in calculus as Leibniz might have presented it. 24. 1 . 1 1
The natural exponential
The natural exponential function satisfies y[O] = 1 dy =y dx We can use (24. 1. 1 ) to find an approximate solution, y[6x]
=
y[O] + y' [O] · 6
=
1 + 6x
Recursively, y[26x] y[36x] y[x]
y[6x] + y' [6x] · 6x y[6] ( 1 + 6x) (1 + 6x) 2 y[26x] + y' [26x] · 6x y[26x] · (1 + 6x) (1 + 6x) 3 =
( 1 + 6x t 1ox , for x
·
=
=
=
0, 6x, 26x, 36x,
=
·
·
·
This is the product expansion :::::J (1 + 6 ) 1 / ox , for 6x :::::J 0. No introduction to calculus is complete without mention of this sort of "in finite algebra" as championed by Euler as in [3] . A wonderful modern interpre tation of these sorts of computations is in [7] . W. A. J. Luxemburg's reformula tion of the proof of one of Euler's central formulas sin [ z] z TI%"= 1 (1  c::7r ) 2 ) appears in our monograph, [13] . e
=
24. Calculus with infinitesimals
382
24. 1 . 12
Concerning the history of the calculus
Chapter X of Robinson's monograph [10] begins:
The history of a subject is usually written in the light of later developments. For over half a century now, accounts of the history of the Differential and In tegral Calc1Ll1LS have been based on the belief that even though the idea of a number system containing infinitely small and infinitely large elements might be consistent, it is useless for the development of Mathematical Analysis. In consequence, there is in the writings of this period a noticeable contrast between the severity with which the ideas of Leibniz and his successors are treated and the leniency accorded to the lapses of the early proponents of the doctrine of lim its. We do not propose here to subject any of these works to a detailed criticism. However, it will serve as a starting point for our discussion to try to give a fair summary of the contemporary impression of the history of the Calculus . . . I recommend that you read Robinson's Chapter X. I have often wondered if mathematicians in the time of Weierstrass said things like , 'Karl ' s epsilondelta stuff isn ' t very interesting. All he does is reprove old formulas of Euler.' I have a nonstandard interest in the history of infinitesimal calculus. It really is not historical. Some of the old derivations like Bernoulli's derivation of Leibniz' formula for the radius of curvature seem to me to have a compelling clarity. Robinson's theory of infinitesimals offers me an opportunity to see what is needed to complete these arguments with a contemporary standard of rigor. Working on such problems has led me to believe that the best theory to use to underly calculus when we present it to beginners is one based on the kind of derivatives described in Section 2 and not the pointwise approach that is the current custom in the U.S. I believe we want a theory that supports intuitive reasoning like the examples above and pointwise defined derivatives do not.
24. 2
Keisler's axioms
The following presentation of Keisler's foundations for Robinson ' s Theory of Infinitesimals is explained in more detail in either of the ( free .pdf ) files [12] and the Epilog to Keisler's text [6] . 24.2 . 1
Small, medium , and large hyperreal numbers
A field of numbers is a system that satisfies the associative, commutative and distributive laws and has additive inverses and multiplicative inverses for nonzero elements. ( See the above references for details. ) Essentially this means the laws of high school algebra apply. The binomial expansion that follows is
24.2. Keisler's axioms
383
a consequence of the field axioms. Hence this formula holds for any pair of numbers x and b.x in a field. To compare sizes of numbers we need an ordering. An ordered field has a transitive order that is compatible with the field operations in the sense that if a < b, then a + c < b + c and if 0 < a and 0 < b, then 0 < a · b. (The complex numbers can't be ordered compatibly because i 2  1.) An infinitesimal is a number satisfying 1 6 1 < 1Im for any ordinary nat ural counting number, m 1 , 2, 3, Archimedes' Axiom is precisely the statement that the (Dcdckind) "real" numbers have no positive infinitcsimals. (We take 0 as infinitesimal.) Keisler's Algebra Axiom is the following: =
=
24.2 . 2
·
·
·
.
Keisler's algebra axiom
The hyperreal numbers are an ordered field extension of the real numbers. In particular, there is a positive hyperreal infinitesimal, 6.
Axiom 5
Any ordered field extending the reals has an infinitesimal, but we just include this fact in the axiom. There arc many different infinitcsimals. For example, the law a < b =? a + c < b + c applied to a 0 and b c 6 says 6 < 26. If k is a natural number, k6 < � ' for any natural m, because 6 < k ·� when 6 is infinitesimal. All the ordinary integer multiples of 6 are distinct infinitesimals, =
=
=
. . . < 36 < 26 < 6 < 0 < 6 < 26 < 36 < . . . c
Magnification at center with power 1 I 6 is simply the transformation x + ( x  c) I 6, so by laws of algebra, integer multiples of 6 end up the same integers apart after magnification by 1 I 6 centered at zero.
3
2
1
1
0
2
3
Figure 24.2. 1 : Magnification by i Similar reasoning lets us place � , � , on a magnified line at one half the distance to 6, one third the distance, etc. ·
·
·
24. Calculus with infinitesimals
384
0
c5 c5
;, 2 u 
c5
3 2
Figure 24.2.2: Fractions of 6 Where should we place the numbers 62 , 63 · · · ? On a scale of 6, they arc infinitely near zero, � ( 62 0) 6 :::::J 0. Magnification by 1 /62 reveals 6 2 , but moves 6 infinitely far to the right, b ( 6 0) J > m for all natural m = 1 , 2, 3. · · · =


=
Figure 24.2.3: 63 on a 62 scale Laws of algebra dictate many "orders of infinitesimal" such as 0 < · · · < 63 < 62 < 6. The laws of algebra also show that near every real number there are many hyperreals, say near 3.14159 · · · 1r =
... <
1f

36 <
1f

26 <
1f

6<
1f
1f
1f
1f
< + 6 < + 26 < + 36 < . . .
A hyperreal number x is called limited (or "finite in magnitude") if there is a natural number m so that lx l < m. If there is no natural bound for a hyperreal number it is called unlimited (or "infinite") . Infinitesimal numbers are limited, being bounded by 1. Theorem 6 Standard Parts of Limited Hyperreal Numbers
Every limited hyperreal number x differs from some real number by an in finitesimal, that zs, there is a real r so that x :::::J r. This number is called the "standard part" of x, r = st [x] . Define a Dedekind cut in the real numbers by A { s s :<:.:: x} and D B = {s : x < s}. st [x] is the real number defined by this cut. The fancy way to state the next theorem is to say the limited numbers are an ordered ring with the infinitesimals as a maximal ideal. This amounts to simple rules like "infsml limited infsml." Proof.
=
x
:
=
Theorem 7 Computation rules for small, medium, and large
( a) If p and q are limited, so are p + q and p · q
24.2. Keisler's axioms
385
( b)
Ij r:: and 6 are infinitesimal, so is c + 6.
( c)
If 6
;:::j
0
and q is limited, then q · 6
;:::j
0.
( d ) 1/0 is still undefined and 1/x is unlimited only when x ;:::j 0. Proof. These rules are easy to prove as we illustrate with (c) . If q is limited, there is a natural number with l q l < k. the condition 6 0 means 161 < k � , so l q · 6 1 < � proving that q · 6 ;:::j 0. D ;:::j
24.2 . 3
The uniform derivative o f x 3
Let's apply these rules to show that f[x] = x 3 satisfies the differential approximation with f' [x] 3x 2 when x is limited. We know by laws of alge bra that =
(x + 6x) 3  x 3 = 3x 2 6x + c · 6x, with c = ((3x + 6x) · 6x) If x is limited and 6x ;:::j 0, (a) shows that 3x is limited and that 3x + 6x is also limited. Condition (c) then shows that c = ( (3x + 6x) · 6x) ;:::j 0 proving that for all limited x f[x + 6x]  f [x] j' [x] · 6x + c · 6x =
with c ;:::j 0 whenever 6x ;:::j 0. Below we will see that this computation is logically equivalent to the state ment that lim 0 (x+ ��) 3 x:l 3x 2 , uniformly on compact sets of the real line. �x + It is really no surprise that we can differentiate algebraic functions using algebraic properties of numbers. You should try this yourself with f [x] = x n , f[x] 1/x. f[x] JX, etc. This does not solve the problem of finding sound foundations for calculus using infinitesimals because we need to treat transcen dental functions like sine, cosine , log. x
=
24.2.4
=
=
Keisler's function extension axiom
Keisler's Function Extension Axiom says that all real functions have exten sions to the hyperreal numbers and these "natural" extensions obey the same identities and inequalities as the original function. Some familiar identities are sin[ a + {3] = sin[ a] cos[f3] + cos[ a] sin[f3] log[x · y] = log[x] + log[ y]
24. Calculus with infinitesimals
386
The log identity only holds when x and y are positive. Keisler's Function Extension Axiom is formulated so that we can apply it to the Log identity in the form of the implication (x > 0 and y > 0) =? log[x] and log[y] are defined and log[x · y] = log[x] + log [y] The Function Extension Axiom guarantees that the natural extension of log[·] is defined for all positive hypcrrcals and its identities hold for hypcrrcal numbers satisfying x > 0 and y > 0. We can state the addition formula for sine as the implication (o: = a and (3 = {3) =? sin [a:] , sin[{3] , sin[ a: + {3] , cos[a] , cos[f3] are defined and sin[o: + {3] = sin[a:] cos[f3] + cos[a:] sin [f3] The addition formula is always true so we make the logical real statement (sec 24.2. 7 below) begin with ( a = a and {3 = {3) . 24.2.5
Logical real expressions
Logical real expressions are built up from numbers and variables using functions. (a) A real number is a real expression. (b) A variable standing alone is a real expression. (c) If e 1 , e 2 , · · , en are a real expressions and f [x 1 , x 2 , · · · , xn] is a real func tion of n variables. then f[ e1 , e 2 , · · , en] is a real expression. ·
·
24.2.6
Logical real formulas
A logical real formula is one of the following: (i) An equation between real expressions, E1 = E2 . (ii) An inequality between real expressions, E1 < E2 , E1 E1 2' E2 , or E1 # E2 .
<
E2 , E1 > E2 ,
(iii) A statement of the form .. E is defined" or of the form "E is undefined. "
24.2. Keisler ' s axioms 24. 2 . 7
387
Logical real statements
Let S and T be finite sets of real formulas. A logical real statement is an implication of the form, S =? T. The functional identities for sine and log given above are logical real statements. Axiom 8 Keisler's Function Extension Axiom Every real function f[x l , x 2 , · · , xn] has a "natural"
extension to the hy perreals such that every logical real statement that holds for all real numbers also holds for all hyperreal numbers when the real functions in the statement are replaced by their natural extensions. ·
There are two general uses of the Function Extension Axiom that under lie most of the theoretical problems in calculus. These involve extension of the discrete maximum and extension of finite summation. The proof of the Extreme Value Theorem below uses a hyperfinite maximum, while the proof of the Fundamental Theorem of Integral Calculus uses hyperfinite summation and a maximum. Theorem 9 Simple Equivalency of Limits and Infinitesimals Let f [x] be a real valued function defined for 0 < l x  a l < � with �
a fixed positive real number. Let b be a real number. Then the following are equivalent: (a) Whenever the hyperreal number x satisfies a # x :::::J a, the natural exten sion function satisfies f [x] :::::J b. (b) For every real accuracy tolerance e there is a sufficiently small positive real number 1 such that if the real number x satisfies 0 < l x a l < 1 , then lf[x]  b l < e . Condition ( b ) is the familiar Weierstrass "epsilondelta" condition ( written with e and I · ) Notice that the condition f[x] :::::J b is NOT a logical real statement because the infinitesimal relation is NOT included in the formation rules for forming logical real statements. Proof. We show that ( a) =? ( b ) by proving that not ( b ) implies not ( a) , the contrapositive. Assume ( b ) fails. Then there is a real e > 0 such that for every real 1 > 0 there is a real x satisfying 0 < l x  a l < 1 and l f[x]  b l 2: e . Let X [r] = x be a real function that chooses such an x for a particular f. Then we have the equivalence 1
> 0 ¢? (X [r] is defined, 0 < IX [r]  a l <
1,
l f[X[r]]  b l
2:
e)
24. Calculus with infinitesimals
388
By the Function Extension Axiom this equivalence holds for hyperreal numbers and the natural extensions of the real functions X[·] and f [·]. In particular, choose a positive infinitesimal 1 and apply the equivalence. We have 0 < IX [I ]  al < I and l f[X [r]]  b l > e and e is a positive real number. Hence, f[X [r]] is not infinitely close to b, proving not ( a ) and completing the proof that ( a) implies ( b ) . Conversely, suppose that ( b ) holds. Then for every positive real e, there is a positive real 1 such that 0 < l x  al < 1 implies lf[x]  b l < e. By the Function Extension Axiom, this implication holds for hyperreal numbers. If � :::::J a, then 0 < I�  a l < 1 for every real 1, so If[�]  b l < e for every real positive e. In other words, f [�] :::::J b, showing that ( b ) implies ( a) and completing the proof of the theorem. D Other examples of uses of the Function Extension Axiom to complete the proofs of the basic results of Section 1 follow. 24.2.8
Continuity and extreme values
We follow the idea of the proof in Section 1 for a real function f[x] on a real interval [a, b] . Coding our proof in terms of real functions. There is a real function x M [h] so that for each natural number h the max imum of the values f[x] for x = a + k6.. x , k = 1 , 2, , h and 6.. x = (b  a) /h occurs at x 111 [h] . We can express this in terms of real functions using a real function indicating whether a real number is a natural number, ·
J
[x]
=
·
·
{ 0,1 . 1f�f x = 11,, 2,2, 3,3, : : : X
=
When the natural extension of the indicator function satisfies I[k] 1 , we say that k is a hyperinteger. ( Every limited hyperinteger is an ordinary positive integer. As you can show with these functions. ) The maximum of the partition can be described by
We want to extend this function to unlimited "hypernatural" numbers. The greatest integer function Floor [x ] satisfies, I [Floor [x ]] 1, 0 :S x  Floor [x ] :S 1 . The unlimited number 1/6, for 6 :::::J 0 gives an unlimited H Floor [x] with I[H] 1 and =
=
=
24.2. Keisler ' s axioms
389
There is a greatest partition point of any number in [a , b] , P[h, x] 1 and 0 < a + Floor [ h �.:::� ]bha with a :::; P[h, x] :::; b & I [ h P [��� a ] x  P[h, x] :::; 1/h. When we take the unlimited hypernatural number H we have x  P[H, x] :::; 1 /H 0 and P[H, x] a partition point in the sense that (a :::; x :::; b & I [H P [��l a J = 1 ) , so we have =
;:::j
f[P[H, x]] :::; f[xM [H]]. Let rM st [x 11t [H]], the standard part. Since a :::; X M [H] :::; b, a :::; rM :::; b. Continuity of the function in the sense x 1 ;:::j x2 =? f [xl] ;:::j f [x2] gives =
f[x] ;:::j f[P[H, x]] :::; f[x M [H]] for any real x in [ a, b] .
;:::j
f[rM ] , so f[x] :::; f[rM ]
One important comment about the proof of the Extreme Value Theorem is this. The simple fact that the standard part of every hyperreal x satisfying a :::; x :::; b is in the original real interval [a. b] is the form that topological com pactness takes in Robinson's theory: A standard topological space is compact if and only if every point in its extension is near a standard point, that is, has a standard part and that standard part is in the original space. 24.2 . 9
Microscopic tangency i n one variable
Suppose f[x] and f' [x] are real functions defined on the interval (a, b), if we know that for all hyperreal numbers x with a < x < b and a :j::, x :j::, b, f[x + 6x]  f [x] J' [x] · 6x + c · 6x with c ;:::j 0 whenever 6x ;:::j 0, then arguments like the proof of the simple equivalency of limits and infinitesimals above show that f' [x] is a uniform limit of the difference quotient functions on compact subintervals [a, ,6] C (a, b). More generally, in [12] we show: =
Theorem 10 Uniform Differentiability
Suppose f [x] and J' [x] are real functions defined on the open real interval a, b) ( . The following are equivalent definitions of, "The function f[x] is smooth with continuous derivative f' [x] on (a, b) . " ( a) Whenever a hyperreal x satisfies a < x < b and x is not infinitely near a or b, then an infinitesimal increment of the extended dependent variable is approximately linear on a scale of the change, that is, whenever 6x 0 f[x + 6x]  f [x] j' [x] · 6x + c · 6x with c ;:::j 0. ;:::j
=
( b) For every compact subinterval [a, {3] C (a, b), the real limit f[x + � x]  f[x] lim j ' [x] uniformly for :::; x :::; {3 . �x + 0 � =
a
24. Calculus with infinitesimals
390
( c) For every pair of hyperreal x 1 f' [c] .
:::::!
x 2 with a < st [x i ]
(d) For every c in (a, b), the real double limit,
=
c < b,
lim
Xl + C, X2 + C
f [xt] f[xX22] Xl 
2 ]:__:f [xt] = f[xX2 Xl
:::::!
f' [c] .
f [x+�x� f [x] zs (e) The tmditional pointwise defined deTivative Dx f = � lim + 0 x continuous on (a, b) .
Continuity of the derivative follows rigorously from the argument of Sec tion 1 , approximating the increment f[ x 1 ]  f[x 2 ] from both ends of the interval [x 1 , x 2 ] . It certainly is geometrically natural to treat both endpoints equally, but this is a "locally uniform" approximation in realonly terms because the x values are hyperreal. In his Geneml Investigations of Curved Surfaces ( original in draft of 1825, published in Latin 1827, English translation by l\1orehead and Hiltebeitel, Princeton, N.J , 1902 and reprinted by the University of Michigan Library ) , Gauss begins as follows: A curved S1Lrface is said to possess continuous C1Lrvature at one of its points A, if the directions of all stmight lines drawn from A to points of the surface at an infinitely small distance fmm A are deflected infinitely little from one and the same plane passing through A. This plane is said to touch the surface at the point A. In [4] , Chapter A6, we show that this can be interpreted as C 1 embedded if we apply the condition to all points in the natural extension of the surface. 24. 2 . 10
The FUndamental Theorem of Integral Calculus
The definite integral J: f[x] dx is approximated in real terms by taking sums of slices of the form
f[a] · �x + f[a + �x] · �x + f[a + 2�x] · �x + · · · + f[b'] · �x, where b' a + h · �x and a + ( h + 1) · �x > b Given a real function f [x] defined on [a , b] we can define a new real function S[a , b, �x] by S[a, b, �x] f[a] · �x + f [a + �x] · �x + f [a + 2�x] · �x + · · · + f[b'] · �x, where b' = a + h · �x and a+ ( h + 1) · �x > b. This function has the properties =
=
of summation such as
I S[ a, b. �x J I :::; l f[a] l · �x + l f[a+�x] l · �x + l f[a+2�x] l · �x + . . · + l f[b'] l · � x I S[a, b, �x J I :::; M ax [ lf[xJ I : x = a , a + �x, a + 2 �x, . . . , b' · (b  a)
]
24.2. Keisler ' s axioms
39 1
We can say we have a sum of infinitesimal slices when we apply this function to an infinitesimal 6x,
1a f[x] dx � x=a L f [x] · 6x bo
b
step 5x
or
1b f [x] dx
=
[�
l
box
st
f[x] · 6x , when 6x � 0
,_,h,p O x
Officially, we code the various summations with the functions like S [ a , b, 6x] (in order to remove the function f[x] as a variable.) We need to show that this is welldefined. that is, gives the same real standard part for every in finitesimal, S [a , b, 6x] � S[a , b, L] and both are limited (so they have a common standard part.) When f [x] is continuous, we can show this "existence," but in the case of the Fundamental Theorem, if we know a real function F[x] with F' [x] = f[x] for all a ::; x ::; b, the proof in Section 1 interpreted with the extended summation functions and extended maximum functions proves this "existence" at the same time it shows that the value is F[b]  F [a] . The only ingredient needed to make this work is that max [ l �:[x, 6x] l : x = a, a + 6x . a + 26x, · · , b' ] = c[ a + k 6x, 6x] ·
�
0
This follows from the Uniform Differentiability Theorem above when we take one of the equivalent conditions as the definition of "F'[x] = j[x] for all a < x < b." Notice that c[x, �x] is the real function f[x +'t�f[x]  f' [x] , so we can define an infinite sum by extending the real function
Sc [ a , b � �x] 24. 2 . 1 1
=
b ox
L l ei · 6x
x=a .step 6x
The Local Inverse FUnction Theorem
In [1] Michael Behrens noticed that the inverse function theorem is true for a function with a uniform derivative even just at one point . (It is NOT true for a pointwise derivative. ) Specifically, condition (d) of the Uniform Differentiability Theorem makes the intuitive proof of Section 1 work.
24. Calculus with infinitesimals
392
Theorem 1 1 The Inverse Function Theorem If m is a nonzero real number and the real function x :::::J xo, a real xo with Yo = f[xo] and f[x] satisfies
f[x] is defined for all
f[x 2 ] _ ]  _::_ f[x___:_ l ' :::::J m whenever x 1 :::::J x 2 :::::J xo X2  X l then f[x] has an inverse function in a small neighborhood of x0, that is, there is a real number � > 0 and a smooth real function g[y] defined when I y  Yo I < � with f[g[y]] = y and there is a real c > 0 such that if lx  xo l < c, then l f[x]  Yo l < � and g[f [x]] = x. .:____:_ '
This proof introduces a "permanence principle." When a logical real formula is true for all infinitesimals, it must remain true out to some positive real number. We know that the statement " l x  x0 1 < 6 =? f [x] is defined" is true whenever 6 :::::J 0. Suppose that for every positive real number � there was a real point r with l r  x0 1 < � where f [r] was not defined. We could define a real function U[�] = r. Then the logical real statement Proof.
� > 0 =? (r = U[�] , l r  xo l < � ' f[r] is undefined ) is true. The Function Extension Axiom means it must also be true with �
=
6 :::::J 0, a contradiction, hence, there is a positive real � so that f[x] is defined whenever lx  xo l < �.
We complete the proof of the Inverse Function Theorem by a permanence principle on the domain of yvalues where we can invert f[x] . The intuitive proof of Section 1 shows that whenever I Y  Yo I < 6 :::::J 0, we have lx 1  xo l :::::J 0, and for every natural n and k,
l xn  xo l < 2 1 x 1  xo l , l xn+ k  x k l < 2 k1_1 l x 1  xo l , f [xn ] is defined, I Y  f[xn+ l l l < � � Y  f[xnJ I Recall that we refocus our infinitesimal microscope after each step in the recursion. This is where the uniform condition is used. Now by the permanence principle, there is a real � > 0 so that whenever I Y  Yo I < � ' the properties above hold, making the sequence Xn convergent. Define g[y] = n+ Limrx> Xn · D 24. 2 . 12
Second differences and higher order smoothness
In Section 1 we derived Leibniz' second derivative formula for the radius of curvature of a curve. We actually used infinitesimal second differences, rather than second derivatives and a complete justification requires some more work.
References
393
One way to restate the Uniform First Derivative Theorem above is: The curve y = f[x] is smooth if and only if the line through any two pairs of infinitely close points on the curve is near the same real line, X l ::::::: X2 =}
f[x l ]  f[x 2 ]
::::::: m
X l  X2
A natural way to extend this is to ask: What is the parabola through three infinitely close points? Is the (standard part) of it independent of the choice of the triple? In [8] , Vftor Neves and I show: Theorem 12 Theorem on Higher Order Smoothness
Let f[x] be a real function defined on a real open interval (a, w). Then f[x] is ntimes continuously differentiable on (a, w) if and only if the nth _order differences 6n f are Scontinuous on (a , w) . In this case, the coefficients of the interpolating polynomial are near the coefficients of the Taylor polynomial, whenever the interpolating points satisfy x 1
:::::::
·
·
·
::::::: Xn
::::::: b.
References
[1]
MICHAEL BEHRENS , A Local Inverse Function Theorem, in Victoria Sym Lecture Notes in Math. 369, Springer Verlag, 1974.
[2]
H . J . M . Bos, "Differentials, HigherOrder Differentials and the Derivative in the Leibnizian Calculus", Archive for History of Exact Sciences, 14 1974.
[3]
L . EULER, Introductio in Analysin Infinitorum, Tomus Primus, Lausanne, 17 48. Reprinted as L. Euler, Opera Omnia, ser. 1, vol. 8. Translated from the Latin by .J. D. Blanton, Introduction to Analysis of the Infinite, Book I, SpringerVerlag, New York, 1988.
[4]
JON BARWISE (editor) , The Handbook of Mathematical Logic, North Hol land Studies in Logic 90, Amsterdam, 1977.
[5]
BERNARD R. GELBAUl\1 and JOHN M . H. OLMSTED , in Analysis, HoldenDay Inc. , San Francisco, 1964.
[6]
H . JEROME KEISLER, Elementary Calculus: An Infinitesimal edition, PWS Publishers, 1986. Now available free at http : //www . math . wisc . edu/ ke i sler/calc . html
posium on Nonstandard Analysis,
2nd
Counterexamples Approach,
24.
394
[7]
Calculus with infinitesimals
l\ I ARK McKINZIE and CURTIS T C CKEY , "Higher Trigonometry, Hyper real Numbers and Euler's Analysis of Infinities". Math. Magazine, 74
( 2001) 339368.
[8]
VITOR NEVES and K. D . STROYAN, "A Discrete Condition for Higher Order Smoothness", Boletim da Sociedade Portugesa de Matematica, 3 5
(1996) 8194.
[9] ABRAHAM ROBINSON , "Nonstandard Analysis.. , Proceedings of the Royal Academy of Sciences, ser A, 64 ( 1961 ) 432440 [10] ABRAHAM
ROBINSON, Nonstandard Analysis, NorthHolland Publishing Co. , Amsterdam, 1966. Revised edition by Princeton University Press, Princeton, 1996.
[11]
K . D . STROYAN, Projects for Calculus: The Language of Change, on my website at http : //www . math . uiowa . edu/%7Estroyan/Proj e ct sCD/estroyan/ indexok . htm
[12]
K . D . STROYAN, Foundations of Infinitesimal Calculus. http : //www . math . uiowa . edu/%7Estroyan/backgndct lc . htm
[13]
K . D . STROYAN and W . A . .J . LUXEl\IBURG , Introduction to the Theory of Infinitesimals. Academic Press Series on Pure and Applied Math. 72, Academic Press, New York, 1976.
PreUniversity Analysis Richard 0 )Donovan *
This paper is
a
Abstract
Hrbacek's article showin g
how his
ap
Conceptual difficulties arise in elementary pedagogical approaches.
In
followup of
proach can be pedagogically tmiversity level.
K.
helpful
when introducing analysis at pre
most cases it remains difficult to explain at preuniversity level how the
derivative is calculated at nonstandard values or how
an
internal func
tion is clefiucd. 1Irbacek provides a modified version of 1ST lSI
(rather
P6raire's RIST) which seems to reduce all tlw,se difficulties. This system is hricHy
preR ented
here i11 its pedagogical form with
an
app licat io n to
the derivative. It must be understood as a stateoftheart report1 .
25.1
Introduction
Infinitesimals aTe interesting when teaching analysis because they give meaning to symbols s uch as dx, dy or df(x) and all related formulae. Also, symb ol ic manipulations are usually simpler thtn with limits. In secondary school, real numbers arc introduced with no formal justifica tion. It is sometimes sh ow n that v'2 tJ_
1
College Andr&.Chavanne, Geneve, richard . o donovan�edu . ge . ch
Switzerland.
The author has been teaching an alysis with infinitesimals at preuniversity level for been possible thanks to many exch anges with Karel Hrbacek and also many helpful remarks by Keith Stroyan.
several years. This paper has
25.
396
PreUniversity Analysis
Hoskins writes: "the logical basis of NSA needs to be made clear before it can be used safely" [6] . Most published books on the subject seem to confirm this view. Di Nasso, Benci and Forti state: "Roughly, nonstandard analysis consists of two fundamental tools: the starmap and the transfer principle" [2] . Foundational aspects are fundamental but they should not be a prerequisite to study infinitcsimals. Frcgc and Wittgcnstcin arc not studied in kindergarten prior to learning how to add natural numbers. Our motivations may have a more pedagogical origin than those exposed by Hrbacek in [8] , but globally the reasons why we feel unsatisfied with the avail able textbooks are the same. Because of these difficulties, some colleagues have been tempted to work with infinitesimals by ignoring the question of transfer altogether. But without transfer, analysis with infinitesimals is analysis in a nonarchimedean field, and these fields are known to have very special and unwanted properties as shown in [4] and [13]. Nonstandard analysis ensures that the properties of completeness arc transferred to a certain class of objects.
25.2
Standard part
A major difficulty in nonstandard analysis is the existence of external func tions. Consider the "updown" function: j:
X f+
2 St ( X )  X ·
At standard scale it is indistinguishable from the identity function but at the infinitesimal scale the function is everywhere decreasing with a rate of change (slope? ) equal to 1 and it is Scontinuous and satisfies the interme diate value property! Why does it not satisfy that if a continuous function has a negative slope everywhere on an interval , then the function is decreasing on that interval? It appears difficult to explain to students that statements which use the standardness predicate are "external" but that we can nonetheless use it safely for the definition of the derivative. How can a student understand when it is acceptable to use st[ ] and when is it not? Some students have pointed out that if the derivative is given by st (M�x) ) for all x, then f'(x + 6) = f' (x). The classical textbooks such as [9] , [14] or [5] use an indirect definition for the derivative at nonstandard points which makes a drastic distinction in nature between standard and nonstandard numbers: at standard points, the derivative is calculated using that point and other nonstandard neighbours but at nonstandard points some form of transfer is used. How can we be convincing when explaining why there are two different definitions?
25.3. Stratified analysis
397
Even Stroyan's uniform derivative [14] is not totally satisfying here because it requires that we recognise a "real function", which in turn means that we need a definition of what such a function is, and what an extension is.
25.3
Stratified analysis
Stratified analysis may be an answer. An interesting aspect is that the theory is adapted for elementary teaching at the same time as it is being developed. The interactions are mutual. In the adaptation for highschool , we have deliberately avoided references to the words "standard" and "nonstandard" for reasons already discussed in [10] . Instead of the binary relation C::: introduced by Hrbacek, we have suggested to use a concept of level, symbolised by v ( x ) . If a C::: b then a E v (b): a is relatively standard to b means that a is at the level of b. For highschool, we feel that this is a simpler concept. All of the adaptations have been discussed with Hrbacek so that they satisfy both the educational requirements and the theoretical consistency. The following is a possible description given to the students:
Levels We get to know the fa miliar natura l n u m bers 1 , 2, 3 , . . . when we learn to cou nt. But it would be a rare child that actually cou nts ( i n steps of 1) to more than a thousa nd or so. It is not necessary to keep going further, beca use somewhere at this point one gets the idea that the cou nti ng process never ends (this is cal l ed : potential infi n ity) , a n d one i s capable of forming the set of natural n u m bers N , i . e. it is possible to consider the col lection of all natural n u m bers as one "th i ng" ( this is ca lled: actual infinity) . It is neither necessary nor possible to cou nt all natura l n u m bers to do it! Also, sums, products, differences, quotients, powers, roots, etc. of these fa m i l iar n u m bers are or can be known at this stage. All these fa miliar n u m bers are at the sa me level. I f x is at the level of y, it means that x can be known when y ca n be know n ; written X E v(y) Fa miliar n u m bers are a l l at the coarsest level . v(O) = v ( 1 ) = v
(�)
= v ( v3)
398
25.
PreUniversity Analysis
But then there are natura l n u m bers u n known at this level , n u m bers that are so l a rge (we will say "u nlimited") that they do not belong to v(O). Let K be such an u n l imited natura l n u m ber, i.e. K � v ( O ) . It ta kes a higher level of knowledge to know K, but once we do, we also know 2K, K + 1 . . . and many other objects which all belong to v(K). I n increasing our knowledge about n u m bers, we do not lose former knowledge, so 1 , 2, 3, . . . are a lso at v(K) : they rema i n known. Also, the reciproca l : k is a n "infi nitesimal" that becomes known at the level of K. Hence a n u m ber such as, say, 3 + 15K is at v(K) (a finer level) 3 and is i nfi n itely close to 3. But aga i n , this level of knowledge does not exha ust the set N, so there are n u m bers not at v ( K )  hence not at v ( O ) , etc. The levels potenti a l ly expand forever and never fu l l y exha ust N. •
There are many levels, each making new n u m bers known.
•
At v ( O ) there are no i nfi nitesima ls.
Infinitesimals are defined relatively to a level. ainfinitesimals and a unlimited are numbers which are not at ( a) and which are, respectively, less in modulus, greater in modulus, than any nonzero number at ( a). For an alimited number � ' the ashadow (noted sha (�)) is the unique number at v(a) which is ainfinitesimally close to � The xshadow replaces the concept of standard part in this relativistic framework2 . We give a more restricted definition of levels than Hrbacck's. We only include numbers in the levels, not sets. This slight difference can be considered as a restriction of the definition given by Hrbacek not a contradiction: the only sets we use explicitly in our course are intervals, N or JR. One of the immediate advantages of stratified analysis is the possibility to easily identify "acceptable" statements  those that will eventually transfer. A statement about x is acceptable if it makes either no reference to levels or only to the level of x. The same holds if instead of x there is a list x x1 , x 2 , . . . A function f is defined by a rule that tells us what f ( x) is when x is in the domain of f. The rule may depend on some parameters, Pl , P2 , . . . A function is acceptable if the rule either does not refer to levels or refers only to v(x, p 1 , p2 , . . . ). (Although probably not necessary at this school level, transfer ensures that if there are two different rules using parameters from different levels which give the same value for all values of the variable, then they define 2 Again. for rca�on� related to politic� in the educational realm, �hadow ha� been preferred v
v
=
to "st andard p art". coar�cr lcvcb information.
It also yields a fairly intuitive image that numbers cast a shadow on
( provided
they arc finite with rc�pcct to that level ) ; �hadow� contain lc��
25.4.
Derivative
399
the same function and its level is the coarsest one where the function can be defined) . f : x f+ 2 sh0 (x)  x i s not an acceptable function because there is an absolute reference to v( 0) in the shadow predicate. If adapted as 2 shx ( x)  x then, as shx ( x) = x, it simplifies to the identity function. Theorems about continuity are for (acceptable) continuous functions and this function does not satisfy the conditions. The student can see this with no "external" knowledge. It seems reasonable to state that from then on, acceptable functions will be simply called functions (so the statements of theorems will look very much like the classical ones). ·
·
25.4
Derivative
+
The derivative of a function is defined by: Let f :]a; b [ lR be a function and x E]a; b [ f is differentiable at x iff there is an L E v(x, f) such that, for any < x, f > infinitesimal h , with x + h E ]a; b[, S
h < x ,j>
( f(x
+ h) h
f(x)
)
=
L
then the derivative is f' (x) = L It is a direct observation that the derivative is "acceptable". For f : x f+ x 2 at x = 2 the derivative is simple and works exactly as in any other nonstandard method. For x in general. the quotient simplifies to 2x + h. The only parameter of f is 2 E v(O) hence v(x, f) v(x). As h is xinfinitesinml, shx (2x + h) = 2x. For a direct calculation of f' (2 + c5), we have 2 + c) E v(c5) hence a [2 + 6]infinitesimal is also a 6infinitesimal. Let h be a 6infinitesimal.
=
Thus stratified analysis satisfies one of our major requirements: a single definition of the derivative which applies to all numbers. In [10] it is shown that using the definition of infinitesimals, the students could work out the rules of computation as exercises and find that for standard a and infinitesimal c), a · c) is infinitesimal. The same exercises adapted to ac ceptable statements yield that if 6 is ainfinitesimal, then a · 6 is ainfinitesimal.
25 . PreUniversity Analysis
400
25.5
Transfer and closure
Some definitions have been rewritten several times and the definitions we use today may not be final. All the proofs of theorems in the syllabus must be checked and crosschecked. The only principles that are needed are Hrbacek's closure principle: If f is an accept able function, then f ( x ) E v ( x, f) for all x in the domain of f. This extends the observation that operations with "familiar" numbers do not yield infinitesimals. The other is a simple form of transfer which states that an acceptable statement is true for v ( a ) iff it is true for all v ((J) with a E v ( (J ) . This adaptation to preuniversity level is not completed yet . We hope to finish a first version of a handout , with proofs, within a year or so. Our goal is still t he same: for most people, infinitesimals "are there", but can they be used to make maths easier to teach and learn and still remain rigorous? We hope to be able to contribute an answer to this question in a not too distant future.
References [1] Occam's razor, In
Encyclopaedia Britannica,
1 998. CD version.
[2] VIERI BENCI , MARC FORTI and MAURO Dr NASSO, The eightfoldpath to nonstandard analysis, Quaderni del Dipartimento di :t\1athematica Ap plicata "U. Dini", Pisa, 2004. [3] MARTIN BER Z, "Nonarchimedean analysis and rigorous computation", International .Journal of Applied Mathematics, 2 (2000) 889930. [4] MARTIN BERZ, " Cauchy theory on LeviCivita fields", Contemporary Mathematics, 319 (2003) 3952.
Analyse Non Standard, Hermann , 1989. Standard and Nonstandard Analysis, Ellis Horwood,
[5] F . DIENER and G . REEB , [6] RoY HOSKINS ,
1990.
[7] RoY HOSKINS, 2004. personal letter. [8] KAREL HRBACEK , Stratified analysis? , in this volume. [9] .JEROME KEISLER, Elementary Calculus, University of Wisconsin, 2000. www . math . wi s c . edu/ke isler/cal c . html [10] JOHN KIMBER and RICHARD O ' DONOVAN, Non standard analysis at preuniversity level, magnitude analysis. In D. A. Ross N . .J . Cutland, M . D i Nasso (eds.), Nonstandard Methods and Applications in Mathematics, Lecture Notes in Logic 25, Association for Symbolic Logic, 2006.
References
401
[ 1 1]
MAURO Dr NASSO and VIERI BENCI, Alpha theory, an elementary ax iomatics for nonstandard analysis, Quaderni del Dipartimento di Mathe matica Applicata "U. Dini", Pisa, 200 1 .
[ 1 2]
CAROL SCHUMACHER, Chapter Zero,
[13]
KHODR SHAMSEDDINE
[14]
K . D. STROYAN, Mathematical Background: Foundations of Infimtesimal Calculus, Academic Press, 1997.
AddisonWesley. 2000.
and MARTIN BERZ, "Intermediate values and inverse functions on nonarchimedean fields", IJMMS, (2002) 1651 76.
www . math . u iowa . edu/%7Estroyan/backgndctlc . html