: $V \to L^2(P)$, called generalized random functions or functionals. K. Itô and I. M. Gel'fand independently introduced these objects in the mid-1950s. In their seminal work, Gel'fand and Vilenkin (1964) gave various properties of these functionals and presented representation theorems for them. They also considered such functionals having independent values, i.e., if [...] for $t \in G$, [...] where $\int_G \Phi[\lambda |x(t)|]\,dt < \infty$ for some $\lambda > 0\}$. The gauge norm and the Orlicz norm are defined by $\|x\|_{(\Phi)}$ [...] $(\Omega)$ of $L^\Phi(\Omega)$ is defined to be the set $\{x \in L^\Phi [\ldots]\}$. In the same way we define the Orlicz sequence space $\ell^\Phi$.

An important parameter for analysis in an Orlicz space is the rate of growth of the underlying N-function. An N-function $\Phi(u)$ is said to satisfy the $\Delta_2$-condition for large $u$ (for small $u$, or for all $u \ge 0$), in symbols $\Phi \in \Delta_2(\infty)$ ($\Phi \in \Delta_2(0)$, or $\Phi \in \Delta_2$), if there exist $u_0 > 0$ and $K > 2$ such that $\Phi(2u) \le K\Phi(u)$ for $u \ge u_0$ (for $0 \le u \le u_0$, or for all $u \ge 0$). An N-function $\Phi(u)$ is said to satisfy the $\nabla_2$-condition for large $u$, in symbols $\Phi \in \nabla_2(\infty)$, if there exist $u_0 > 0$ and $a > 1$ such that $\Phi(u) \le \frac{1}{2a}\Phi(au)$ for $u \ge u_0$. Similarly we define $\Phi \in \nabla_2(0)$ and $\Phi \in \nabla_2$. The basic facts on Orlicz spaces can be found in [10], [11] and [14]. For instance, $L^\Phi[0,1]$ [...] $\Phi \in \Delta_2(\infty)$ ($\Phi \in \Delta_2$ or $\Phi \in \Delta_2(0)$); [...] $\Phi \in \Delta_2(\infty) \cap \nabla_2(\infty)$ ($\Phi \in \Delta_2 \cap \nabla_2$ or $\Phi \in \Delta_2(0) \cap \nabla_2(0)$). A new quantitative index of $\Phi(u)$ is provided by the following six constants: [...] exists. Then $r_\Phi = \lim_{u \to \infty} (\ldots)/\Phi^{-1}(2u)$ exists [...] defined by (52) exists, then [...]
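The $\Delta_2$- and $\nabla_2$-conditions can be probed numerically. The Python sketch below is an illustration added here, not taken from the text: it samples the ratio $\Phi(2u)/\Phi(u)$ and the inequality $\Phi(u) \le \frac{1}{2a}\Phi(au)$ on finite grids for three standard N-functions, $u^3/3$, $e^u - 1 - u$ and $(1+u)\log(1+u) - u$. A finite grid can only suggest, never prove, membership in $\Delta_2(\infty)$ or $\nabla_2(\infty)$; the helper names `delta2_ratio` and `nabla2_holds` are hypothetical.

```python
import numpy as np

def delta2_ratio(phi, u):
    """Largest sampled value of Phi(2u)/Phi(u); stays bounded when Phi is in Delta_2(infinity)."""
    u = np.asarray(u, dtype=float)
    return float(np.max(phi(2 * u) / phi(u)))

def nabla2_holds(phi, a, u):
    """True if Phi(u) <= Phi(a u) / (2a) at every sampled point u (a > 1)."""
    u = np.asarray(u, dtype=float)
    return bool(np.all(phi(u) <= phi(a * u) / (2 * a)))

power = lambda u: u**3 / 3                    # satisfies both Delta_2 and nabla_2
exp_type = lambda u: np.expm1(u) - u          # e^u - 1 - u: nabla_2, but not Delta_2(infinity)
llog = lambda u: (1 + u) * np.log1p(u) - u    # Delta_2, but not nabla_2(infinity)

grid = np.linspace(1.0, 40.0, 500)
print(delta2_ratio(power, grid))      # 2^3 = 8 up to rounding
print(delta2_ratio(exp_type, grid))   # grows like e^u as the grid is extended
print(delta2_ratio(llog, grid))       # stays below 4
print(nabla2_holds(power, 2.0, grid))
print(nabla2_holds(llog, 2.0, np.linspace(10.0, 1e6, 500)))
```

Extending the grid to larger $u$ makes the exponential ratio grow without bound, while the other two ratios stay put.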
1. G. Y. H. Chi (1969), "Nonlinear prediction and multiplicity of generalized random processes," Ph.D. thesis at Carnegie-Mellon Univ., Pittsburgh, PA.
2. I. M. Gel'fand and N. Ya. Vilenkin (1964), Generalized Functions, Vol. 4, Academic Press, Inc., New York.
3. M. M. Rao (1969), "Representation theory of multidimensional generalized random fields," Proc. Second Internat. Symp. Multivariate Analysis, Academic Press, Inc., New York, 411-436.
4. M. M. Rao (1971), "Local functionals and generalized random fields with independent values," Theor. Prob. Appl., 16, 457-473.
5. M. M. Rao (1981), "Local functionals on $C^\infty(G)$ and probability," J. Functional Anal., 39, 23-41.
6. R. J. Swift (1992), "Structure and sample path analyses of harmonizable random fields," Ph.D. thesis at UCR, Riverside, CA.
7. A. M. Yaglom (1987), Correlation Theory of Stationary and Related Random Functions, Vols. 1-2, Springer-Verlag, Berlin.
VII. Sufficiency studies

If $f(\cdot, \theta): \mathbb R^n \to \mathbb R^+$ is a probability density function of a random vector $X = (X_1, \dots, X_n)$, where $\theta \in I \subset \mathbb R$, one estimates $\theta$ by a Borel function of $X$, denoted $\hat\theta(X)$. If $\tilde\theta(X)$ is another such estimator, then R. A. Fisher (1922) suggested that $\hat\theta$ be called sufficient for $\theta$ whenever the conditional distribution of $\tilde\theta$ given $\hat\theta$ does not involve the parameter $\theta$, so that all the "information" about $\theta$ is contained in $\hat\theta$, which is thus preferable to $\tilde\theta$ in any statistical inference problem about $\theta$; and he presented a substantial theory of it. This useful concept was generalized, and an analytical expression obtained, by Halmos and Savage (1949) as follows. If $\{P_\theta, \theta \in \Lambda\}$ is a family of probability measures on $(\Omega, \Sigma)$, then a $\sigma$-algebra $\mathcal B \subset \Sigma$ is termed sufficient for that class iff for each bounded measurable function $f: \Omega \to \mathbb R$ there is a bounded $\mathcal B$-measurable $F_f$ such that
$$ E^{\mathcal B}_{\theta}(f) = F_f \quad \text{a.e. } [P_\theta], \ \theta \in \Lambda. $$
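Fisher's criterion can be checked by hand in the simplest case. The sketch below (an illustration, not from the text) takes $n$ i.i.d. Bernoulli observations, with the parameter written `p` in the code, and $\hat\theta = T = X_1 + \dots + X_n$. It verifies that the conditional distribution of the whole sample given $T$ equals $1/\binom{n}{T}$ whatever the parameter is. Consequently, the conditional law of any other estimator $\tilde\theta(X)$ given $T$ is also parameter free, so $T$ is sufficient in Fisher's sense. The function name `conditional_given_sum` is hypothetical.

```python
from itertools import product
from math import comb

def conditional_given_sum(p, n):
    """Return P_p(X = x | T = t) for every x in {0,1}^n, where t = x_1 + ... + x_n
    and X_1, ..., X_n are i.i.d. Bernoulli(p)."""
    joint = {x: p ** sum(x) * (1 - p) ** (n - sum(x)) for x in product((0, 1), repeat=n)}
    marginal = {}                                   # P_p(T = t)
    for x, prob in joint.items():
        marginal[sum(x)] = marginal.get(sum(x), 0.0) + prob
    return {x: prob / marginal[sum(x)] for x, prob in joint.items()}

n = 4
low, high = conditional_given_sum(0.2, n), conditional_given_sum(0.7, n)
for x in [(0, 0, 0, 1), (1, 0, 1, 0), (1, 1, 1, 0)]:
    # the two parameter values give the same conditional probability, 1 / C(n, t)
    print(x, low[x], high[x], 1 / comb(n, sum(x)))
```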
Then, among others, the following result was established by these authors:
Theorem 1. If $\{P_\theta, \theta \in \Lambda\}$ is dominated by some measure $\lambda: \Sigma \to \bar{\mathbb R}^+$ such that $\lambda|\mathcal B$ is $\sigma$-finite, then $\mathcal B$ is sufficient for the $P_\theta$-family iff $dP_\theta/d\lambda$ is $\mathcal B$-measurable for all $\theta \in \Lambda$.

If the ($\sigma$-finite) domination condition is simply omitted, then, as Burkholder (1961) showed, this result does not hold; he moreover discussed the many serious difficulties that can arise in the ensuing analysis. A more general condition than domination was then introduced by T. S. Pitcher (1965), and this was generalized by Robert Rosenberg (1970) using the theory of Orlicz spaces. His work leads to a study of martingales relative to families of measures. It was also found later that Pitcher's condition is essentially equivalent to the existence of a "localizable" (which is more general than $\sigma$-finite) dominating measure $\lambda$ for the given family. There are still many interesting problems in this area for investigation. The undominated case, for instance, is related to the Radon-Nikodým theory of vector valued measures which do not have finite variation. Using the order topology on real functions $M$ on $(\Omega, \Sigma)$, integrable relative to the family $\{P_\theta, \theta \in \Lambda\}$, one can prove some vector R-N theorems which are useful in this context. On the other hand, many basic unresolved questions also exist on proper constructions and evaluations of conditional expectations or probabilities. These are of central importance for a satisfactory analysis of such areas as Bayesian statistics and the calculation of transition probabilities in general Markov processes. A detailed account of these questions and of the available rigorous results has been given in my recent work (1993a,b).

1. D. L. Burkholder (1961), "Sufficiency in the undominated case," Ann. Math. Statist., 32, 1191-1200.
2. R. A. Fisher (1922), "On the mathematical foundations of theoretical statistics," Roy. Soc. Phil. Trans., Ser. A, 222, 309-368.
Problems of Analysis Arising from Applications
3. P. R. Halmos and L. J. Savage (1949), "Application of Radon-Nikodym theorem to the theory of sufficient statistics," Ann. Math. Statist., 20, 225-241.
4. T. S. Pitcher (1965), "A more general property than domination for sets of probability measures," Pacific J. Math., 15, 597-611.
5. M. M. Rao (1993a), "Exact evaluation of conditional expectations in the Kolmogorov model," Indian J. Math., 35, 57-70.
6. M. M. Rao (1993b), Conditional Measures and Applications, Marcel Dekker, Inc., New York.
7. R. L. Rosenberg (1970), "Orlicz spaces based on families of measures," Studia Math., 35, 15-49.
VIII. Stochastic processes and inference

The classical statistical inference questions for sequences of independent random variables with common distributions were first extended to stochastic processes by Grenander (1950), opening up a vast new and important area for research. He also solved many problems there and introduced new methods which employ deep results from functional analysis. For instance, nonlinear prediction based on the past and present observations leads to a structural analysis of prediction operators. If the index set of the process is one dimensional, usually identified as time, and the error criterion is a nonnegative convex function which need not be a square, then one has to consider the $L^p$ or even the Orlicz spaces $L^\Phi$ for this work. In the latter case, the structure of prediction and projection operators has been studied in detail (1974). On the other hand, if the process satisfies an $n$th order linear stochastic differential equation with a harmonizable process as the driving force, an unbiased prediction problem of the following type is important in some applications (1994). The work leads to systems of Fredholm integral equations, whose solutions have not yet been adequately analyzed. A somewhat simplified version of a typical statement will be given as an illustration.
Proposition 1. Let $\{X_t = Y_t + Z_t,\ t \in [a,b]\}$ be a process with $Y_t$ as a random signal and $Z_t$ as a noise process. Suppose both are of second order and orthogonal to each other, and $E(Y_t) = \sum_{j=1}^m \alpha_j g_j(t)$, $\alpha_j \in \mathbb R$. Suppose also that $r_Y$ is the covariance function of the $Y$-process, that $Z_t$ has orthogonal increments with $E(Z_t) = 0$ and variance function $H(\cdot)$, that $X_t$ is observed on $[a,b]$, and that the $g_j$ are smooth. Then an unbiased weighted linear least squares predictor $\hat X_{t_0}$ of $Y_{t_0}$ for $t_0 > b$, of the form
$$ \hat X_{t_0} = \int_a^b X_t \, d\rho(t, t_0), $$
where $\rho(\cdot, t_0)$ is a (weight) function of bounded variation on $[a,b]$, exists whenever $\rho(\cdot, t_0)$ is a solution of the Fredholm integral equation: [...]
Here $k(s,t) = r_Y(s,t) + H(\min(s,t))$, and the $\lambda_j(t_0)$ are the Lagrange multipliers for the system's constraints, [...]
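The role of the kernel $k(s,t) = r_Y(s,t) + H(\min(s,t))$ and of the Lagrange multipliers can be seen in a discretized analogue. The sketch below is a hypothetical set-up, not the Fredholm equation of the proposition, which is not recoverable here. It replaces $\int_a^b X_t\,d\rho(t,t_0)$ by $\sum_i w_i X_{t_i}$ on a grid and minimizes the mean square error subject to unbiasedness for every choice of the $\alpha_j$. The resulting stationarity equation $Kw - G\lambda = r$ is the discrete counterpart of an integral equation with multipliers. The covariance $e^{-|s-t|}$, $H(t) = 0.1\,t$, the linear mean and all function names are assumptions for illustration.

```python
import numpy as np

def unbiased_predictor(t, t0, r_Y, H, basis):
    """Discretized analogue of Proposition 1 (hypothetical set-up).

    Returns weights w (predictor sum_i w_i X_{t_i} of Y_t0), Lagrange multipliers lam,
    and the matrices used, from the linear system
        K w - G lam = r    (discrete counterpart of the integral equation)
        G^T w       = g0   (unbiasedness for every choice of the alpha_j)."""
    t = np.asarray(t, dtype=float)
    s, u = np.meshgrid(t, t, indexing="ij")
    K = r_Y(s, u) + H(np.minimum(s, u))            # k(s, t) = r_Y(s, t) + H(min(s, t))
    r = r_Y(t, t0)                                 # Cov(X_s, Y_t0) = r_Y(s, t0): Z is orthogonal to Y
    G = np.column_stack([g(t) for g in basis])     # G[i, j] = g_j(t_i)
    g0 = np.array([g(t0) for g in basis], dtype=float)
    n, m = G.shape
    kkt = np.block([[K, -G], [G.T, np.zeros((m, m))]])
    sol = np.linalg.solve(kkt, np.concatenate([r, g0]))
    return sol[:n], sol[n:], K, r, G, g0

def mse(w, K, r, var_y0):
    """E|sum_i w_i X_{t_i} - Y_t0|^2 for weights satisfying the unbiasedness constraint."""
    return float(w @ K @ w - 2 * w @ r + var_y0)

# Assumed ingredients (not from the text): exponential signal covariance,
# noise variance function H(t) = 0.1 t, and a linear mean alpha_1 + alpha_2 t.
r_Y = lambda s, t: np.exp(-np.abs(s - t))
H = lambda t: 0.1 * t
basis = [lambda t: np.ones_like(t), lambda t: t]
grid = np.linspace(0.1, 1.0, 10)                   # observation points in [a, b] = [0.1, 1]
w, lam, K, r, G, g0 = unbiased_predictor(grid, 1.5, r_Y, H, basis)
print("weights:", np.round(w, 3))
print("multipliers:", lam, "mse:", mse(w, K, r, r_Y(1.5, 1.5)))
```

Any other unbiased choice $w + z$ with $G^T z = 0$ increases the error by $z^T K z > 0$, which is why the constrained solution is the optimal one.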
Rao
If the index set is multidimensional, i.e., $X_t, Y_t, Z_t$ are random fields, the corresponding study involves multiple stochastic integrals. These problems lead to a general analysis of multi-parameter processes or fields. Here the basic results of Cairoli and Walsh (1975) have to be employed and extended. Some of this has been done by Michael Brennan (1979), and much further research is needed. The desired stochastic calculus involves many new questions, which are subjects of current research interest.

1. M. D. Brennan (1979), "Planar semi-martingales," J. Multivar. Anal., 9, 465-486.
2. R. Cairoli and J. B. Walsh (1975), "Stochastic integrals in the plane," Acta Math., 134, 111-183.
3. U. Grenander (1950), "Stochastic processes and statistical inference," Ark. Mat., 1, 195-277.
4. M. M. Rao (1974), "Inference in stochastic processes-IV," Sankhya, Ser. A, 36, 63-120.
5. M. M. Rao (1994), "Harmonizable processes and inference: unbiased prediction for stochastic flows," J. Statist. Plan. Inf., 39, 187-209.
IX. The existence problem for processes

In all the above work, the existence of various types of processes to be used in modeling is a nontrivial question to settle. The basic result that one invokes in such problems is due to Kolmogorov (1933). It says that if a family $\{F_{t_1,\dots,t_n},\ t_i \in T,\ n \ge 1\}$ of compatible finite dimensional distributions is given, then there exist a probability space $(\Omega, \Sigma, P)$ and a stochastic process $\{X_t, t \in T\}$ on it having the given family as its finite dimensional distributions. The compatibility condition, which is both necessary and sufficient, can be stated as follows:
(i) $\lim_{x_n \to \infty} F_{t_1,\dots,t_n}(x_1,\dots,x_n) = F_{t_1,\dots,t_{n-1}}(x_1,\dots,x_{n-1})$, and
(ii) $F_{t_{i_1},\dots,t_{i_n}}(x_{i_1},\dots,x_{i_n}) = F_{t_1,\dots,t_n}(x_1,\dots,x_n)$ for all permutations $(i_1,\dots,i_n)$ of $(1,\dots,n)$, and for all $n \ge 1$.
The mysterious condition (ii) above is better understood when it is recognized as part of the projective limit problem that is involved here. This identification, and the resulting structure of the basic theorem, are due to Bochner (1955), who then extended it. It was further generalized by Prokhorov (1956) and several others (e.g., Choksi, Métivier, Sion and his students). There is also a dual problem, called the direct limits of measure spaces. I have discussed various forms of both these problems in my book (1981). An application of the direct limit result was, moreover, found useful by Stephen Noltie (1975) in his thesis, to solve an open question of Whitney's in geometric integration theory and to generalize it to infinite dimensions. A comprehensive result for projective limits, which includes and refines several previous cases, has recently been given in collaboration with Vjaceslav Sazonov (1993). This has close relations with set martingales and other parts of analysis. The duality of direct and projective limit theory has interesting connections with topology. Further work is possible in this class of ideas.

1. S. Bochner (1955), Harmonic Analysis and the Theory of Probability, Univ. Calif. Press, Berkeley, CA.
2. A. N. Kolmogorov (1933), Grundbegriffe der Wahrscheinlichkeitsrechnung, Springer-Verlag, Berlin. [English translation (1956), Chelsea Publishing Co., New York.]
3. S. V. Noltie (1975), "Integral representations of chains and vector measures," Ph.D. thesis at UCR, Riverside, CA. (See also Chapter 4 of Real and Stochastic Analysis, Wiley, New York (1986), 211-248.)
4. Yu. V. Prokhorov (1956), "Convergence of random processes and limit theorems in probability," Theor. Prob. Appl., 1, 157-214.
5. M. M. Rao (1981), Foundations of Stochastic Analysis, Academic Press, Inc., New York.
6. M. M. Rao and V. V. Sazonov (1993), "A projective limit theorem for probability spaces and applications," Theor. Prob. Appl., 38, 307-315.
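For a centred Gaussian family the two compatibility conditions of Kolmogorov's theorem reduce to statements about covariance matrices. Letting $x_n \to \infty$ integrates out the last coordinate, which for a Gaussian law deletes the last row and column. Relabelling the time points permutes rows and columns simultaneously. The sketch below checks both for the Brownian covariance $\min(s,t)$, an illustrative choice; the helper names are hypothetical. Kolmogorov's theorem then yields a process with these finite dimensional distributions.

```python
import numpy as np

def gaussian_fdd(times, cov):
    """Mean vector and covariance matrix of F_{t_1,...,t_n} for a centred Gaussian family."""
    t = np.asarray(times, dtype=float)
    return np.zeros(len(t)), cov(t[:, None], t[None, :])

def drop_last(mean, C):
    """Condition (i): letting x_n -> infinity integrates out the last coordinate;
    for a Gaussian law this deletes the last entry and the last row and column."""
    return mean[:-1], C[:-1, :-1]

def permute(mean, C, perm):
    """Condition (ii): listing the time points in the order perm permutes the coordinates."""
    perm = np.asarray(perm)
    return mean[perm], C[np.ix_(perm, perm)]

cov = lambda s, t: np.minimum(s, t)       # Brownian motion covariance (illustrative choice)
times = [0.3, 1.2, 0.7, 2.0]
mean, C = gaussian_fdd(times, cov)

m3, C3 = gaussian_fdd(times[:-1], cov)
print(np.allclose(drop_last(mean, C)[1], C3))                              # condition (i)
perm = [2, 0, 3, 1]
mp, Cp = permute(mean, C, perm)
print(np.allclose(Cp, gaussian_fdd([times[i] for i in perm], cov)[1]))     # condition (ii)
```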
Quasi-periodic Solutions of Hamiltonian Evolution Equations
JEAN BOURGAIN
School of Mathematics, Institute for Advanced Study, Princeton, NJ 08540
The problem discussed here is the persistency of quasi-periodic solutions of linear or integrable equations after Hamiltonian perturbation. This subject is closely related to the well known "KAM theory" on invariant tori in smooth dynamical systems. Most of the research has been achieved within the last ten years by various authors (including J. Fröhlich, T. Spencer, E. Wayne, S. Kuksin and, more recently, W. Craig, E. Wayne and myself). The main difference with KAM is the fact that one considers (finite dimensional) tori in an infinite dimensional phase space. For PDEs in space dimension > 1, the most significant new difficulty is the presence of large clusters of normal frequencies. This feature, which seems incompatible with the use of a "standard" KAM technology as applied by E. Wayne [W] and S. Kuksin [K1], has led to the development of a new method to approach the persistency problems, also in finite dimensional phase space. The first PDE results obtained along these lines appear in the work of W. Craig and E. Wayne [C-W] on time-periodic solutions of 1D nonlinear wave equations. Later on, the author extended their method to cover the full setting of quasi-periodic solutions. This led in particular to a new proof of the KAM theorem and the Melnikov theorem in finite dimensional phase space under a weaker non-resonance hypothesis (not excluding multiplicities in the normal frequencies). This understanding led to significant progress on the PDE problems. Presently the persistency problem in 1D (1 space dimension) is understood in a satisfactory way (under Dirichlet and periodic boundary conditions). In higher space dimension several cases can be treated (especially the nonlinear Schrödinger equation (NSE) and the case of time periodic solutions). Many problems remain and are today an active research topic. The purpose of this exposé is to give a brief account of the model problems, the techniques and some of the results. Consider an equation of the form
$$ i u_t + A u + \varepsilon \frac{\partial H}{\partial \bar u} = 0 \qquad (1) $$
where $A$ is a selfadjoint operator and $H = H(u, \bar u)$ is polynomial (or real analytic). We assume that $A$ is diagonal in an eigenfunction basis which is "well localized" wrt the exponentials. The following are the main examples.
(i) $A = -\Delta + M_a$, obtained by adding to the Laplacian a Fourier multiplier $M_a$. In this case the eigenfunction basis consists of the exponentials, and the eigenvalues are given by (2). We assume
$$ \lim_{|n| \to \infty} a_n = 0 \qquad (3) $$
or
$$ \lim_{|n| \to \infty} \frac{|a_n|}{|n|^2} = 0. \qquad (4) $$
(ii) Consider the 1D Sturm-Liouville operator
$$ A = -\frac{d^2}{dx^2} + V(x) \qquad (5) $$
where $V$ is a real analytic 1-periodic potential. The periodic spectrum of $A$ is a sequence (6) which satisfies
$$ \lambda_{2n-1},\ \lambda_{2n} = \pi^2 n^2 + \int_0^1 V\,dx + O(n^{-2}) \qquad (7) $$
$$ \lambda_{2n} - \lambda_{2n-1} \to 0 \ \text{rapidly}. \qquad (8) $$
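The asymptotics (7) and (8) can be observed numerically. The sketch below is an illustration under stated assumptions: finite differences are swapped in as the numerical method, and the potential $V(x) = 2\cos 2\pi x$ (mean zero) is an arbitrary choice. It computes the periodic problem ($\psi(x+1) = \psi(x)$) and the anti-periodic problem ($\psi(x+1) = -\psi(x)$) on $[0,1]$ and merges the two spectra. The pairs $\lambda_{2n-1}, \lambda_{2n}$ then sit near $\pi^2 n^2$, and the gaps shrink very fast. The discretization adds an error of order $n^4 h^2$; the function names are hypothetical.

```python
import numpy as np

def twisted_spectrum(V, N=1000, sign=+1):
    """Finite-difference eigenvalues of -d^2/dx^2 + V on [0, 1] with psi(x + 1) = sign * psi(x):
    sign=+1 gives the periodic, sign=-1 the anti-periodic problem (N grid points)."""
    h = 1.0 / N
    x = np.arange(N) * h
    A = (np.diag(2.0 / h**2 + V(x))
         + np.diag(-np.ones(N - 1) / h**2, 1)
         + np.diag(-np.ones(N - 1) / h**2, -1))
    A[0, -1] = A[-1, 0] = -sign / h**2     # wrap-around coupling encoding the boundary condition
    return np.linalg.eigvalsh(A)

def periodic_spectrum(V, N=1000):
    """lambda_0 < lambda_1 <= lambda_2 < ...: periodic and anti-periodic eigenvalues merged."""
    return np.sort(np.concatenate([twisted_spectrum(V, N, +1), twisted_spectrum(V, N, -1)]))

V = lambda x: 2 * np.cos(2 * np.pi * x)    # real analytic, 1-periodic, integral 0 over [0, 1]
lam = periodic_spectrum(V)
for n in range(1, 6):
    print(n, lam[2 * n - 1], lam[2 * n], (np.pi * n) ** 2)
```

Only the first gap ($n = 1$) is of order one for this potential; from $n = 3$ on the two eigenvalues of each pair are numerically almost indistinguishable.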
The corresponding eigenfunctions $\varphi_{2n-1}, \varphi_{2n}$ are periodic or anti-periodic according to the parity of $n$. Expanding $\varphi_j$ in a Fourier series (9), there is the localization property wrt exponentials (10) for some constants $c > 0$, $C < \infty$ depending on $V$. Conversely, one has an expansion (11) with again (12).

Assume $V$ even, i.e. $V(x) = V(-x)$. Consider the spectrum of $A = -\frac{d^2}{dx^2} + V$ subject to Dirichlet boundary conditions on $[0,1]$. This gives a sequence $\{\mu_n\}$ interlacing the periodic spectrum (13), and the corresponding eigenfunctions $\{\psi_n\}$ form a basis for the 2-periodic odd functions. Thus
$$ \psi_n(x) = \sum_{m=1}^\infty \hat\psi_n(m) \sin \pi m x \qquad (14) $$
$$ \sin \pi n x = \sum_{m=1}^\infty \hat s_n(m)\, \psi_m(x) \qquad (15) $$
where again (16).

(iii) The analogue of the theory in 1D described above fails in higher dimension. One may, however, consider the special case of a potential of the form
$$ V(x) = V_1(x_1) + \dots + V_d(x_d). \qquad (17) $$
For periodic boundary conditions, one gets for eigenvalues and eigenfunctions (18).
In the Dirichlet case, one gets (19).

(iv)
$$ A = (-\Delta + p)^{1/2} \qquad (20) $$
with eigenvalues (21) and the exponentials as eigenfunctions. Observe that the sequence $\{(k + p)^{1/2} \mid k \in \mathbb Z_+\}$ consists of rationally independent numbers for typical $p$.

Coming back to (1), denote by $\{\varphi_n, \mu_n\}$ the eigenfunctions and corresponding eigenvalues of $A$, i.e. $A\varphi_n = \mu_n \varphi_n$. Fix some specific indices $n_1, n_2, \dots, n_b$ and denote $\lambda_j = \mu_{n_j}$ ($j = 1, \dots, b$). Then
$$ u_0(x,t) = \sum_{j=1}^b a_j e^{i\lambda_j t} \varphi_{n_j}(x) \qquad (22) $$
yields a solution of the linear equation
$$ i u_t + A u = 0. \qquad (23) $$
This solution is quasi-periodic and corresponds to a flow on a $b$-torus $\mathbb T^b = \mathcal T_0\{|a_j|\}$. The problem we study is which of the solutions (22) "persist" for the perturbed equation
$$ i u_t + A u + \varepsilon \frac{\partial H}{\partial \bar u} = 0. \qquad (24) $$
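The meaning of persistency is as follows. One has a quasi-periodic solution
$$ u_\varepsilon(x,t) = \sum_{n,k} \hat u_\varepsilon(n,k)\, e^{i\langle \lambda', k\rangle t} \varphi_n(x) \qquad (25) $$
of (24), where $k$ ranges over $\mathbb Z^b$ ($e_j$ denotes the $j$th unit vector), such that
$$ \hat u_\varepsilon(n_j, e_j) = a_j \quad (j = 1, \dots, b), \qquad (26) $$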
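$$ \sum_{(n,k) \notin S} w(n,k)\, |\hat u_\varepsilon(n,k)| < \sqrt\varepsilon \qquad (27) $$
$$ |\lambda' - \lambda| < C\varepsilon. \qquad (28) $$
In (27),
$$ S = \{(n_j, e_j) \mid j = 1, \dots, b\} \qquad (29) $$
is the "resonant set". We let $w$ be a weight function. For instance,
$$ w(n,k) = e^{c(|n| + |k|)} \qquad (30) $$
corresponds to analytic solutions. We will mostly consider weights of the form
$$ w(n,k) = e^{(|n| + |k|)^c} \qquad (31) $$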
for some $c > 0$ (because they lead to some technical simplifications later on). The new frequency $\lambda'$ depends on $\{a_j\}$ and on the perturbation $\varepsilon \frac{\partial H}{\partial \bar u}$. In fact, assuming $\lambda'_1, \dots, \lambda'_b$ rationally independent, a time shift permits us to assume $a_j$ ($1 \le j \le b$) real, and hence $\lambda'$ depends only on $\{|a_j|\}$. By (27), the solution (25) corresponds to a perturbed torus $\mathcal T_\varepsilon\{|a_j|\}$ of $\mathcal T_0\{|a_j|\}$ in the phase space. Consider $\lambda = (\lambda_1, \dots, \lambda_b)$ as a parameter taken in a parameter set $\Lambda$. Persistency of the quasi-periodic solutions (22) after the Hamiltonian perturbation as described above will occur for $\lambda$ in a large subset of the frequency set. More precisely, the persistency will hold provided
$$ \lambda \notin \Lambda_\varepsilon(|a_j|\ (1 \le j \le b), \text{perturbation}) \qquad (32) $$
where
$$ \operatorname{mes} \Lambda_\varepsilon \to 0 \quad \text{as } \varepsilon \to 0. \qquad (33) $$
The preceding deals with perturbations of linear equations. In order to obtain families of quasi-periodic solutions with $b$ frequencies, one needs to consider a parameter dependent linear equation with $b$ parameters $p = (p_1, \dots, p_b)$, such that if $\{\lambda_n(p)\}$ are the eigenvalues of $A(p)$, then
$$ \det\Big(\frac{\partial \lambda_{n_j}}{\partial p_k}\Big)_{1 \le j,k \le b} \ne 0. \qquad (34) $$
In the previous examples, such parameters are obtained via the Fourier multiplier $M_a$ or by letting $V$ range in some $b$-manifold of potentials. Recall here that if
$$ A(V) = -\frac{d^2}{dx^2} + V(x) \qquad (35) $$
then from first order variation (36), where
$$ \lambda' = \lambda + \varepsilon(\lambda, a) \qquad (37) $$
and assuming initial non-resonance conditions (first Melnikov condition)
$$ \sum_{j=1}^b k_j \lambda_j + \mu_n \ne 0 \quad (n \notin \{n_1, \dots, n_b\}) \qquad (38) $$
(expressing non-resonance of the normal frequencies $\mu_n$ with ($\mathbb Z$-combinations of) the tangential frequencies $\lambda_1, \dots, \lambda_b$), conditions appearing later on in the process may be ensured by restricting $|a|$ to an appropriate Cantor set. This requires the dependence (37) of $\lambda'$ on $|a|$ to be sufficiently non-degenerate (amplitude-frequency modulation). Secondly, one may sometimes extract parameters out of the nonlinearity by putting the Hamiltonian (39) in an appropriate Birkhoff normal form.
Assume that one may write (1) in the form (40), where [...]. Define for $j = 1, \dots, b$ the action variables (41). From (40) one obtains (42). Fix some values $\{I_j^0\}_{j=1}^b$ and write (43).
From (40), (42), (43) we get the system (44), which appears as an $(\varepsilon\sqrt{I^0})$-perturbation of the linear system
$$ \begin{cases} i\dot q_{n_j} + (\lambda_j + \varepsilon I_j^0)\, q_{n_j} = 0 & (j = 1, \dots, b) \\ i\dot I_j = 0 & (j = 1, \dots, b) \\ i\dot q_n + \mu_n q_n = 0 & (n \notin \{n_1, \dots, n_b\}) \end{cases} \qquad (45) $$
with a $b$-parameter set of frequencies $\{\lambda_j + \varepsilon I_j^0\}_{j=1}^b$ obtained by variation of $\{I_j^0\}$. Hence the problem is reduced to a perturbation of a linear system as above. A similar approach is applied in the study of perturbations of integrable systems. For instance, perturbations of the KdV equation
$$ u_t + u_{xxx} + u u_x = 0 \qquad (46) $$
of the form
$$ u_t + u_{xxx} + u u_x + \varepsilon f(u)_x = 0 \qquad (47) $$
($f(u)$ polynomial or real analytic). The process of extracting the normal form here is rather involved, since it is based on the Riemann surface correspondence (cf. [K2]).

Coming back to (1), we deal with the problem of persistency of finite dimensional tori in an infinite dimensional phase space. This is a generalization of the more classical KAM (Kolmogorov-Arnold-Moser) setting of persistency of $n$-tori in $2n$-dimensional phase space. As a first method to approach (1), one may, however, apply the basic KAM scheme, which consists in removing the perturbation $\varepsilon H$ in (39) by composing with canonical transformations. Given a Hamiltonian flow
$$ i q_t = \varepsilon \frac{\partial F}{\partial \bar q}, \qquad F = F(q, \bar q) \ \text{Hamiltonian}, \qquad (48) $$
denote by $q \to q'$ the time-1 shift, which is a symplectic transformation of phase space. Then the system (49) is transformed into
$$ i q'_t = \frac{\partial H'}{\partial \bar q'} \qquad (50) $$
where $H(q) = H'(q')$. Denote in complex coordinates the Poisson bracket (51). Then
$$ H' = H + \varepsilon\{H, F\} + \frac{\varepsilon^2}{2!}\{\{H, F\}, F\} + \cdots = H + \varepsilon\{H, F\} + O(\varepsilon^2). \qquad (52) $$
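The expansion (52) is the Lie series of the time-1 map of $\varepsilon F$, which equals the time-$\varepsilon$ map of $F$. The sketch below checks it on a hypothetical one-degree-of-freedom example in real coordinates $(q,p)$. The bracket convention $\{H,F\} = H_q F_p - H_p F_q$ is an assumption standing in for the complex-coordinate bracket (51). With $F = (q^2+p^2)/2$ the flow is an explicit rotation. The remainder $H(\text{flow}) - H - \varepsilon\{H,F\}$, divided by $\varepsilon^2$, should approach $\frac12\{\{H,F\},F\}$, which is $0.25$ at the chosen point (computed by hand for this example).

```python
import numpy as np

# Hypothetical example: H = q^4/4 + p^2/2 and F = (q^2 + p^2)/2 in real coordinates,
# with the assumed bracket convention {H, F} = H_q F_p - H_p F_q.
def H(q, p):
    return q**4 / 4 + p**2 / 2

def bracket_H_F(q, p):
    # F_q = q and F_p = p, so {H, F} = q^3 * p - p * q
    return q**3 * p - p * q

def flow_of_F(q, p, eps):
    # exact time-eps flow of F: dq/dt = F_p = p, dp/dt = -F_q = -q (a rotation)
    return q * np.cos(eps) + p * np.sin(eps), -q * np.sin(eps) + p * np.cos(eps)

q0, p0 = 1.0, 0.5
for eps in (1e-1, 1e-2, 1e-3):
    q1, p1 = flow_of_F(q0, p0, eps)
    remainder = H(q1, p1) - H(q0, p0) - eps * bracket_H_F(q0, p0)
    # remainder / eps^2 tends to (1/2){{H, F}, F}(q0, p0) = 0.25 as eps -> 0
    print(eps, remainder / eps**2)
```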
Hence, for $H$ given by (39), we find (54). The main idea is to choose $F$ in order to reduce $H_1 + \{H_0, F\}$ and iterate the process. The drawback of this method is that it requires the spectrum of $A$ to satisfy non-resonance conditions of the form (second Melnikov conditions)
$$ \sum_{j=1}^b k_j \lambda_j + \mu_n - \mu_{n'} \ne 0 \qquad (55) $$
besides (38). Condition (55) is violated in the case of multiplicities in the normal frequencies, which impairs the use of KAM in this situation (which, in the context of (1), appears in 1D under periodic boundary conditions and in higher dimension). To illustrate the appearance of (55), assume $F$ of the form
$$ F(q, \bar q) = (Bq, \bar q), \qquad B = B^*. \qquad (56) $$
Then, by (51),
$$ \{H_0, F\} = \frac{1}{2i} \sum_{n,n'} (\mu_n - \mu_{n'})\, b_{nn'}\, \bar q_n q_{n'} \sim ([A, B]q, \bar q) \qquad (57) $$
where $[A, B] = AB - BA$. If one rewrites (1) in the Fourier version (wrt $e^{i\langle\lambda',k\rangle t}\varphi_n(x)$), one gets
$$ (-\langle \lambda', k\rangle + \mu_n)\, \hat u(n,k) + \varepsilon \widehat{\frac{\partial H}{\partial \bar u}}(n,k) = 0. \qquad (58) $$
Hence (38) is the only type of condition which appears if one tries to solve (58) by a direct perturbative approach. If one considers such an $\varepsilon$-series, considerable problems appear due to the small divisors
$$ \mu_n - \langle \lambda', k\rangle. \qquad (59) $$
They yield approximative solutions by truncation, introducing only divisors such that
$$ |\mu_n - \langle \lambda', k\rangle| \gg \varepsilon. \qquad (60) $$
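The divisors (59) can be tabulated directly. The sketch below uses hypothetical frequencies $\lambda' = (\sqrt2, \sqrt3)$ and $\mu_n = n^2 + \frac12$, chosen only for illustration. It computes the smallest $|\mu_n - \langle\lambda', k\rangle|$ over a box of sites $(n,k)$. Enlarging the range of $k$ can only lower this minimum, so a truncated $\varepsilon$-series that keeps the divisors well above $\varepsilon$ can be carried only to a finite order.

```python
import itertools
import numpy as np

def smallest_divisor(lam, mu, K, n_max):
    """min |mu_n - <lam, k>| over |n| <= n_max and k in Z^b with max_j |k_j| <= K."""
    lam = np.asarray(lam, dtype=float)
    mus = np.array([mu(n) for n in range(-n_max, n_max + 1)], dtype=float)
    best = np.inf
    for k in itertools.product(range(-K, K + 1), repeat=len(lam)):
        best = min(best, float(np.min(np.abs(mus - np.dot(k, lam)))))
    return best

lam_prime = (np.sqrt(2.0), np.sqrt(3.0))    # hypothetical tangential frequencies
mu = lambda n: n**2 + 0.5                   # hypothetical normal frequencies
for K in (2, 4, 8, 16):
    print(K, smallest_divisor(lam_prime, mu, K, n_max=20))
```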
To perform their actual summation is a very delicate issue (leading to renormalization problems), and even in finite dimensional phase space (i.e. $n$ ranging in a finite set) it was only recently fully understood (*). The method followed here is to solve (58) by a Newton iteration scheme, which converges much faster (double exponentially fast) and is therefore less affected by the presence of small divisors. On the other hand, it requires control of the inverse of the linearization of (58), which is a non-diagonal operator. We first perform a Lyapunov-Schmidt type decomposition. Consider the $b$ equations
$$ (-\lambda'_j + \lambda_j)\, a_j + \varepsilon \widehat{\frac{\partial H}{\partial \bar u}}(n_j, e_j) = 0 \quad (j = 1, \dots, b) \qquad (61) $$
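obtained by taking $(n,k) \in S$ of (26), (29). They form the (finite) system of Q-equations. The remaining (infinite) system
$$ (-\langle \lambda', k\rangle + \mu_n)\, \hat u(n,k) + \varepsilon \widehat{\frac{\partial H}{\partial \bar u}}(n,k) = 0 \quad ((n,k) \notin S) \qquad (62) $$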
are called the P-equations. The general procedure is to determine $\hat u|_{(n,k) \notin S}$ from (62) (depending on $\lambda'$) and then substitute in (61) to obtain the new frequencies $\lambda' = (\lambda'_1, \dots, \lambda'_b)$. Now these frequencies $\lambda'_1, \dots, \lambda'_b$ need to be real. Thus (61) expresses, in fact, $2b$ conditions with $b$ parameters. The formal solvability is a consequence of the Hamiltonian nature of (1) (a result due to Poincaré). Assume $H(u, \bar u)$ is a sum of monomials $u^j \bar u^k$ with real coefficients. In proving the persistency result, we may assume $a_j \in \mathbb R$ ($j = 1, \dots, b$), considering time shifts as observed earlier. Hence the system $\hat u|_{(n,k) \notin S}$ produced from solving (62) will be real, and so will be $\widehat{\frac{\partial H}{\partial \bar u}}(n_j, e_j)$ in (61). Thus (61) yields a real solution in $\lambda'$, and the formal solvability is clear in this case. The system (62) cannot be treated by a standard implicit function theorem because of the appearance of small divisors. We denote
$$ v = \bar u. \qquad (63) $$
Assuming the eigenfunction basis for $A$ in (1) is given by exponentials, we thus have
$$ \hat v(n,k) = \overline{\hat u(-n,-k)} \qquad (64) $$
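(if one considered a real eigenfunction basis $\{\varphi_n\}$, then clearly $\hat v(n,k) = \overline{\hat u(n,-k)}$). Duplicate the equations (62), considering the system
$$ \begin{cases} (-\langle\lambda', k\rangle + \mu_n)\, \hat u(n,k) + \varepsilon \widehat{\frac{\partial H}{\partial v}}(n,k) = 0 & ((n,k) \notin S) \\ (\langle\lambda', k\rangle + \mu_{-n})\, \hat v(n,k) + \varepsilon \widehat{\frac{\partial H}{\partial u}}(n,k) = 0 & ((n,k) \notin -S) \end{cases} \qquad (65) $$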
We solve this system in $(\hat u|_{(n,k) \notin S}, \hat v|_{(n,k) \notin -S})$ by Newton's algorithm. The relation (64) will be preserved for the approximative solutions. Recall the formal scheme. Consider the equation
$$ F(y) = 0. \qquad (66) $$
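Starting from $y_0$ (here 0), the consecutive approximations are defined by (67) and (68). In our case (65), the linearized operator $T$ is given by
$$ T = D + \varepsilon S \qquad (69) $$
where $D$ is diagonal,
$$ D^+_{n,k} = -\langle\lambda', k\rangle + \mu_n, \qquad D^-_{n,k} = \langle\lambda', k\rangle + \mu_{-n}, \qquad (70) $$
and
$$ S = \begin{pmatrix} S_{\partial^2 H/\partial u\,\partial v} & S_{\partial^2 H/\partial v^2} \\ S_{\partial^2 H/\partial u^2} & S_{\partial^2 H/\partial u\,\partial v} \end{pmatrix}. \qquad (71) $$
Here $S_\phi$ expresses $\phi$-multiplication in Fourier, thus
$$ S_\phi\big((n,k), (n',k')\big) = \hat\phi(n - n', k - k'). \qquad (72) $$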
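The Newton scheme (66) to (68) with a linearization of the form $T = D + \varepsilon S$ can be sketched in finite dimension. The model below is hypothetical: a diagonal $D$ bounded away from zero (so no small divisors occur), a symmetric coupling and a cubic term, giving $T(y) = D + \varepsilon(S + \operatorname{diag}(3y^2))$. The printed residuals roughly square from one step to the next until they reach rounding level, which is the double exponential convergence referred to above. The real difficulty of the text, controlling $T^{-1}$ when diagonal entries are small, is deliberately absent here.

```python
import numpy as np

def newton(F, T, y0, tol=1e-12, max_iter=20):
    """Newton's scheme for F(y) = 0: y_{j+1} = y_j - T(y_j)^{-1} F(y_j), T = linearization of F."""
    y = np.array(y0, dtype=float)
    residuals = [float(np.linalg.norm(F(y)))]
    while residuals[-1] > tol and len(residuals) <= max_iter:
        y = y - np.linalg.solve(T(y), F(y))
        residuals.append(float(np.linalg.norm(F(y))))
    return y, residuals

# Hypothetical finite-dimensional model: diagonal part D plus an eps-small coupling
# (a symmetric matrix S and a cubic term), so that T(y) = D + eps*(S + diag(3 y^2)).
rng = np.random.default_rng(0)
n, eps = 6, 0.05
D = np.diag(np.linspace(1.0, 3.0, n))
S = rng.standard_normal((n, n))
S = (S + S.T) / 2
f = 0.5 * rng.standard_normal(n)

F = lambda y: D @ y + eps * (S @ y + y**3) - f
T = lambda y: D + eps * (S + np.diag(3 * y**2))

y, residuals = newton(F, T, np.zeros(n))   # start from y_0 = 0, as in the text
print(residuals)
```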
The operator $S$ is selfadjoint and depends on the given approximative solution $(u, v)$. Along the construction, we satisfy an estimate on their Fourier coefficients of the form
$$ |\hat u(n,k)| < C e^{-(|n| + |k|)^c} \qquad (73) $$
for some fixed $c > 0$. This fact clearly yields a corresponding off-diagonal decay estimate for each of the 4 matrices $S_\phi$ appearing in (71): (74). The main difficulty consists in controlling the inverse $T^{-1}$, due to the fact that the diagonal elements may be arbitrarily small. Fixing some $\varepsilon \ll \delta < 1$, call a site $(n,k)$ singular if
$$ |D^+_{n,k}| = |-\langle\lambda', k\rangle + \mu_n| < \delta \quad \text{or} \quad |D^-_{n,k}| = |\langle\lambda', k\rangle + \mu_{-n}| < \delta. \qquad (75) $$
The geometric structure of the singular sites plays an important role in the study of $T^{-1}$ (in particular, one essentially exploits their separation properties). Assume that one succeeds in ensuring for these operators $T$ bounds (76) on the inverse of the restriction (77), where, for instance, $B(N)$ grows at most polynomially,
$$ B(N) < N^C \qquad (78) $$
or at least much slower than exponentially. Assume $H(u, v)$ is a polynomial expression. At the $j$th step of the Newton iteration, an approximative solution $y$ is obtained, and we assume that $\operatorname{supp} \hat u_j$ is contained in an $(n,k)$-ball of radius $M^j$ (79) for some constant $M$. Letting $y_j = (u_j, v_j)$ and $F(y)$ be the left member of (65), assume (80). Since $F$ is expressed as a polynomial in $u, v$, we get (81). Put $N = M^{j+1}$ and define $y_{j+1}$ by (cf. (67)) (82).
Hence, by (78) and (80), we get (83), and by (82), (84). From (80), (83) and the off-diagonal decay of $T$ and $T_N^{-1}$ (cf. (74)), the first term in (84) may be bounded by (85). Hence, from (84), (86). Letting $c > 0$ be sufficiently small, one may conclude that (87)
$$ \|y_{j+1} - y_j\| < e^{-3(M^j)^c} < e^{-2(M^{j+1})^c}, \qquad (88) $$
which is compatible with (73), (79). The main difficulty is to obtain the estimate (76) and the off-diagonal estimate, say
$$ |T_N^{-1}(x, x')| < e^{-\frac12 |x - x'|^c} \quad \text{for } |x - x'| > \frac{N}{100}. \qquad (89) $$
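The second inequality in (88) is exactly where the smallness of $c$ enters. The following elementary step, added here for the reader, makes it explicit:
$$ e^{-3(M^j)^c} < e^{-2(M^{j+1})^c} \iff 3M^{jc} > 2M^{(j+1)c} \iff M^{c} < \tfrac32 \iff c < \frac{\log(3/2)}{\log M}, $$
so for a fixed $M > 1$ the bound holds at every step $j$ as soon as $c < \log(3/2)/\log M$, independently of $j$.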
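Coming back to (38), assume (except for $(n,k) \in S$)
$$ |-\langle k, \lambda\rangle + \mu_n| > (1 + |k|)^{-C} \qquad (90) $$
for some constant $C$. Assuming $\lambda'$ as in (28), it follows that also
$$ |-\langle k, \lambda'\rangle + \mu_n| > \tfrac12 (1 + |k|)^{-C} \quad \text{for } |k| < \varepsilon^{-1/2C}. \qquad (91) $$
Thus, for
$$ N < \varepsilon^{-1/2C} \qquad (92) $$
one may invert $T_N = D_N + \varepsilon S_N$ by a Neumann series, i.e.
$$ T_N^{-1} = D_N^{-1} + \sum_{j=1}^\infty (-1)^j \varepsilon^j D_N^{-1} (S_N D_N^{-1})^j, \qquad (93) $$
since $\varepsilon\|S_N D_N^{-1}\| < \sqrt\varepsilon$, and (78), (89) are clearly satisfied.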
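A minimal numerical check of the Neumann series (93): the hypothetical diagonal $D$ below plays the role of $D_N$, with entries bounded away from zero as (91) guarantees for $N < \varepsilon^{-1/2C}$, and $S$ is a symmetric coupling normalized to norm 1. With $\|\varepsilon S D^{-1}\| < 1$ the partial sums of $D^{-1}\sum_j (-\varepsilon S D^{-1})^j$ converge to $(D + \varepsilon S)^{-1}$. The sizes, the seed and `neumann_inverse` are illustrative assumptions.

```python
import numpy as np

def neumann_inverse(d, S, eps, terms=60):
    """Approximate (D + eps*S)^{-1}, D = diag(d), by the Neumann series
    D^{-1} sum_{j>=0} (-eps S D^{-1})^j (valid when ||eps S D^{-1}|| < 1)."""
    D_inv = np.diag(1.0 / np.asarray(d, dtype=float))
    M = -eps * S @ D_inv
    term = np.eye(len(d))
    total = np.eye(len(d))
    for _ in range(terms):
        term = term @ M
        total = total + term
    return D_inv @ total

rng = np.random.default_rng(0)
d = np.linspace(0.5, 2.0, 8)           # hypothetical diagonal D_N, smallest entry 0.5
S = rng.standard_normal((8, 8))
S = (S + S.T) / 2
S /= np.linalg.norm(S, 2)              # normalize so that ||S|| = 1
eps = 0.1
print(np.linalg.norm(eps * S @ np.diag(1 / d), 2))    # contraction factor, at most 0.2 here
approx = neumann_inverse(d, S, eps)
exact = np.linalg.inv(np.diag(d) + eps * S)
print(np.max(np.abs(approx - exact)))
```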
In order to fulfill (78), (89) at later scales, one is required to impose further conditions on $(\lambda, \lambda', a)$, $a = (a_1, \dots, a_b)$, considered as initial parameters in the Newton iteration process (recall that at each stage, $T$ depends on the previous approximative solution). The effect of these further conditions is to excise from the $(\lambda, \lambda', a)$-parameter set exceptional subsets whose measure tends rapidly to zero. Since $\lambda'$ will be determined by solving the Q-equations (61) as
$$ \lambda' = \lambda + \varepsilon(\lambda, a), \qquad (94) $$
our aim is to end up with conditions on $(\lambda, a)$ only, as in (32), (34); in particular, in a model where $a$ is fixed, $\lambda$ will be restricted to a Cantor set of positive measure. Essentially speaking, these conditions on $(\lambda, \lambda', a)$ consist in keeping certain expressions $E(\lambda, \lambda', a)$ away from zero, where $E(\lambda, \lambda', a)$ is differentiable with more sensitivity on $\lambda'$ than on $\lambda, a$. In order for $\hat u|_{(n,k) \notin S}$ to solve the P-equations, the previous conditions on $(\lambda, \lambda', a)$ need to be fulfilled. We assume, however, that $\hat u|_{(n,k) \notin S}$ is smoothly defined on the entire parameter set $(\lambda, \lambda', a)$. Substitution in (61) then allows us to solve in $\lambda'$, invoking the standard implicit function theorem. One gets (94), or more precisely (assuming $a_j \in \mathbb R$)
(95)
where the second term depends on $a = (a_1, \dots, a_b)$ only.
In carrying out this program, there is a distinction between the case of time periodic solutions ($b = 1$) and quasi-periodic solutions ($b > 1$). The case $b = 1$ turns out to be significantly easier. The arithmetic properties of the sequence $\{\mu_n\}$, more precisely the structure of clusters of the form
$$ \{n \in \mathbb Z^d \mid |\mu_n - \mu| < 1\} \qquad (96) $$
when $\mu \to \infty$, play a basic role. In the PDE context, the sequence $\{\mu_n\}$ is infinite and its properties depend on the space dimension $d$. It turns out, however, that conditions of the form (55) are unnecessary and appear as an artifact of the KAM approach.

Remarks (i) The method described above is more flexible than KAM. In fact, in the main body of the analysis, which consists in solving the P-equations, the Hamiltonian structure plays essentially no role. There are variants. For instance, one may use a truncated perturbation series to
obtain an approximative solution of the P-equations up to $\varepsilon^K$, for any power $K$, and then apply the more rapidly converging Newton method to get an actual solution, $\varepsilon^K$-close to the approximative one. (The Hamiltonian counterpart of this consists in achieving a normal form with nonresonant part $O(\varepsilon^K)$.) The condition (38)
$$ \sum_{j=1}^b k_j \lambda_j + \mu_n \ne 0 \quad (n \notin \{n_1, \dots, n_b\}) \qquad (97) $$
(first Melnikov condition) expresses non-resonance of the normal frequencies with the tangential ones. In the resonant case, one modifies the previous scheme as follows. Define now the resonant set $\hat S \supset S$ as
$$ \hat S = \{(n,k) \mid -\langle\lambda, k\rangle + \mu_n = 0\} \qquad (98) $$
and consider for the Lyapunov-Schmidt decomposition
$$ \begin{cases} (-\lambda'_j + \lambda_j)\, a_j + \varepsilon \widehat{\frac{\partial H}{\partial \bar u}}(n_j, e_j) = 0 & (j = 1, \dots, b) \\ (-\langle\lambda', k\rangle + \mu_n)\, \hat u(n,k) + \varepsilon \widehat{\frac{\partial H}{\partial \bar u}}(n,k) = 0 & \text{for } (n,k) \in \hat S \setminus S \end{cases} \qquad (99) $$
(Q-equations) and
$$ (-\langle\lambda', k\rangle + \mu_n)\, \hat u(n,k) + \varepsilon \widehat{\frac{\partial H}{\partial \bar u}}(n,k) = 0 \quad \text{for } (n,k) \notin \hat S \qquad (100) $$
(P-equations). Fix $(\lambda'_j)_{j=1,\dots,b}$, $\{a_j\}_{j=1,\dots,b}$ and $\hat u|_{(n,k) \in \hat S \setminus S}$
and solve $\hat u|_{(n,k) \notin \hat S}$ from (100). One then uses (99) for the determination of $\lambda'$ and $\hat u|_{\hat S \setminus S}$. A typical example of this appears in Theorem 131 below.

(ii) The Lyapunov-Schmidt approach to (1) yields a new method to solve stability problems in finite dimensional phase space as well. Observe that the Hamiltonian system
$$ \dot p_n = \frac{\partial H}{\partial q_n}, \qquad \dot q_n = -\frac{\partial H}{\partial p_n} \qquad (n = 1, \dots, N) \qquad (101) $$
with Hamiltonian
$$ H(p, q) = H(p_1, \dots, p_N, q_1, \dots, q_N) \qquad (102) $$
is equivalent to
$$ i\dot u = 2\frac{\partial H}{\partial \bar u} \qquad (103) $$
where $u = p + iq$. Thus one gets a new proof of the KAM theorem (stability of $N$-tori in $2N$-dimensional phase space) and of Melnikov's theorem (more generally, stability of tori of dimension $n \le N$) along the lines of Lyapunov's theorem (the periodic case $n = 1$). For Lyapunov's result, the non-resonance condition is (104), corresponding to the case $b = 1$ in (38). Hence, in the finite dimensional case, we prove a Melnikov theorem for (1) under the non-resonance assumption (38), without the need of the extra assumption (55). This result is new. From the previous remark, it appears that besides multiple normal frequencies one may also investigate the case of normal frequencies resonant with the tangential ones, i.e. when (104), (38) are partially violated. Presently there has been no systematic study of this.

(iii) Consider a NLW
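$$y_{tt} - \Delta y + V y + \varepsilon F'(y) = 0 \qquad (105)$$

and denote

$$B = \sqrt{-\Delta + V} \qquad (106)$$

(assuming this makes sense). Rewrite (105) in the following Hamiltonian form

$$\begin{cases} \dot y = B z\\ \dot z = -B y - \varepsilon B^{-1} F'(y) \end{cases} \qquad (107)$$

considering, instead of $L^2$, the Hilbert space $H^{1/2}$ with scalar product $(u, v)_{1/2} = (u, Bv)$. Denoting

$$u = y + i z, \qquad (108)$$

(107) yields

$$i \dot u - B u - \varepsilon B^{-1} F'(\operatorname{Re} u) = 0 \qquad (109)$$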
which is of the form (1) with $A$ replaced by $-B$. Thus the spectrum in 1D under periodic (resp. Dirichlet) boundary conditions is given by $-\sqrt{\mu_j}$ (resp. $-\sqrt{\tilde\mu_n}$) and behaves as

$$\sqrt{\mu_j} = \pi j + O\Big(\frac{1}{j}\Big), \qquad \sqrt{\tilde\mu_n} = \pi n + O\Big(\frac{1}{n}\Big). \qquad (110)$$

In the particular case $V = \rho$ (constant), we get for the periodic spectrum (111)
Bourgain
(cf. example (iv) above). Next, we state some concrete results.
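The equivalence between the real system (101) and the complex form (103) under $u = p + iq$ can be checked numerically. The sketch below is illustrative only: the two-degree-of-freedom Hamiltonian `H` is a hypothetical choice (not from the text), the Wirtinger derivative is taken as $\partial H/\partial \bar u = \tfrac12(\partial_p H + i\,\partial_q H)$, and both systems are integrated with a classical fourth-order Runge-Kutta scheme and compared.

```python
def H(p, q):
    # Hypothetical Hamiltonian (illustration only): two harmonic
    # oscillators with a quartic coupling.
    return 0.5 * sum(a * a + b * b for a, b in zip(p, q)) + 0.1 * (q[0] * q[1]) ** 2


def grad(p, q, h=1e-6):
    """Central-difference partials (dH/dp_n, dH/dq_n)."""
    dp, dq = [], []
    for n in range(len(p)):
        pp, pm, qp, qm = list(p), list(p), list(q), list(q)
        pp[n] += h
        pm[n] -= h
        qp[n] += h
        qm[n] -= h
        dp.append((H(pp, q) - H(pm, q)) / (2 * h))
        dq.append((H(p, qp) - H(p, qm)) / (2 * h))
    return dp, dq


def real_field(y):
    """(101) on the flat state y = p + q: p_n' = dH/dq_n, q_n' = -dH/dp_n."""
    n = len(y) // 2
    dp, dq = grad(y[:n], y[n:])
    return dq + [-d for d in dp]


def complex_field(u):
    """(103) solved for u': u' = -2i dH/d(u-bar), with u = p + iq and
    dH/d(u-bar) = (dH/dp + i dH/dq) / 2."""
    dp, dq = grad([z.real for z in u], [z.imag for z in u])
    return [-2j * 0.5 * (a + 1j * b) for a, b in zip(dp, dq)]


def rk4(f, y, dt, steps):
    """Classical fourth-order Runge-Kutta on a list of real or complex numbers."""
    for _ in range(steps):
        k1 = f(y)
        k2 = f([a + 0.5 * dt * b for a, b in zip(y, k1)])
        k3 = f([a + 0.5 * dt * b for a, b in zip(y, k2)])
        k4 = f([a + dt * b for a, b in zip(y, k3)])
        y = [a + dt / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4)]
    return y


N = 2
p0, q0 = [1.0, 0.3], [0.0, -0.5]
y = rk4(real_field, p0 + q0, 0.01, 1000)
u = rk4(complex_field, [complex(a, b) for a, b in zip(p0, q0)], 0.01, 1000)
gap = max(abs(u[n] - complex(y[n], y[N + n])) for n in range(N))
drift = abs(H(y[:N], y[N:]) - H(p0, q0))
print(gap, drift)  # both should be tiny
```

The two trajectories agree to rounding error, and the energy $H$ is conserved up to the integration error, as expected for a Hamiltonian flow.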
THEOREM 112. Consider a 1D NLS [K1], [B1]

$$i u_t - u_{xx} + V u + \varepsilon \frac{\partial H}{\partial \bar u} = 0 \qquad (113)$$

or

$$i u_t + M_\alpha u + \varepsilon \frac{\partial H}{\partial \bar u} = 0 \qquad (114)$$

where $H = H(u, \bar u)$ is polynomial or real analytic, $V$ a real analytic periodic potential and $M_\alpha$ a real Fourier multiplier (as discussed in the examples). Consider an unperturbed solution of (113), (114) with $\varepsilon = 0$

$$u_0(x, t) = \sum_{j=1}^{b} a_j\, e^{i\lambda_j t} \varphi_{n_j}(x) \qquad (115)$$

where $\lambda_j = \mu_{n_j}$ $(j = 1, \dots, b)$ and $\{\varphi_{n_j}\}$ are the corresponding eigenfunctions. Consider $\lambda = (\lambda_1, \dots, \lambda_b)$ as a $b$-parameter (for equation (113), this may be achieved by appropriate variation of $V$, cf. (35)-(36)). Assume the non-resonance condition (38)

$$|\mu_n + (k, \lambda)| \geq c\,(1 + |k|)^{-C} \qquad (116)$$

satisfied for $|k| < N$, with $N$ sufficiently large (depending on $H$). Then, for $\lambda \in \Lambda_\varepsilon(|a|)$, a subset of the parameter set of small complementary measure when $\varepsilon \to 0$, there is a perturbed solution $u_\varepsilon$ of (113), (114) with perturbed frequency $\lambda'$,

$$u_\varepsilon(x, t) = \sum_{n, k} \hat u_\varepsilon(n, k)\, e^{i(\lambda', k)t} \varphi_n(x) \qquad (117)$$

satisfying (26)-(28). Thus

(118)

$$\sum_{(n,k)\notin S} e^{(|n| + |k|)c}\, |\hat u_\varepsilon(n, k)| < \sqrt{\varepsilon} \qquad (119)$$

$$|\lambda' - \lambda| < C\varepsilon \qquad (120)$$

for some $c > 0$, and where $S = \{(n_j, e_j) \mid j = 1, \dots, b\}$.
In the case of (113) with Dirichlet boundary conditions, assume $V$ even and $H(u, \bar u)$ even (hence $\partial H/\partial \bar u$ odd); see the discussion in example (ii) above.
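To make the resonant set (98) and the small divisors in (116) concrete, the sketch below enumerates $\mu_n - (\lambda, k)$ over a finite box. It assumes, purely for illustration, a constant potential $V = m = 1/2$ on the $2\pi$-periodic circle, so that $\mu_n = n^2 + m$, and the hypothetical tangential sites $n_1 = 1$, $n_2 = 2$ with $\lambda_j = \mu_{n_j}$. Exact rational arithmetic is used so that resonance means exact equality.

```python
from fractions import Fraction
from itertools import product

m = Fraction(1, 2)        # hypothetical constant potential V = m
tangential = [1, 2]       # hypothetical tangential sites n_1, n_2 (b = 2)
b = len(tangential)


def mu(n):
    """Eigenvalue of -d^2/dx^2 + m on the 2*pi-periodic circle for e^{inx}."""
    return n * n + m


lam = [mu(n) for n in tangential]                      # lambda_j = mu_{n_j}
unit = [tuple(int(i == j) for i in range(b)) for j in range(b)]
S = {(n, e) for n, e in zip(tangential, unit)}         # S = {(n_j, e_j)}


def divisors(n_max, K):
    """Yield ((n, k), mu_n - (lambda, k)) for |n| <= n_max and |k|_1 <= K."""
    for k in product(range(-K, K + 1), repeat=b):
        if sum(abs(kj) for kj in k) > K:
            continue
        lk = sum(kj * lj for kj, lj in zip(k, lam))
        for n in range(-n_max, n_max + 1):
            yield (n, k), mu(n) - lk


# Resonant set (98), truncated to the box |n| <= 10, |k|_1 <= 2.
S_tilde = {nk for nk, d in divisors(10, 2) if d == 0}
# Smallest divisor |mu_n - (lambda, k)| away from the resonant set.
smallest = min(abs(d) for nk, d in divisors(10, 2) if nk not in S_tilde)
print(sorted(S_tilde - S), smallest)
```

In this example $\tilde S \setminus S = \{(-1, e_1), (-2, e_2)\}$: with a constant potential every $\mu_n$ with $n \neq 0$ is double ($\mu_{-n} = \mu_n$), which is exactly the multiple normal frequency situation that the modified scheme (99)-(100) is designed to handle. Away from $\tilde S$ the divisors stay bounded below (here by $1/2$).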
THEOREM 121. The analogue of Theorem 112 holds for 2D Schrödinger equations [B3], but in (113) the potential $V(x_1, x_2)$ should be assumed of the form $V_1(x_1) + V_2(x_2)$ (see the discussion in example (iii) above).
In 1D, a result similar to Theorem 112 may be stated for NLW equations [K1], [W]

$$u_{tt} - u_{xx} + V(x)u + \varepsilon f'(u) = 0 \qquad (122)$$

and even derivative NLW equations [B1]

$$u_{tt} - u_{xx} + V(x)u + \varepsilon B f'(u) = 0, \qquad B = \Big(-\frac{d^2}{dx^2}\Big)^{1/2}. \qquad (123)$$
In the next theorem, the role of outer parameters in the equation is replaced by amplitude-frequency modulation.
THEOREM 124. Consider a 1D NLW

$$u_{tt} - u_{xx} + \rho u + (u^3 + \text{higher order terms}) = 0 \qquad (125)$$

and assume $\rho \geq 0$ is a typical number in the sense of linear independence of the sequence (126). Fix a sequence (127) and consider the solution

$$u_0(x, t) = \sum_{j=1}^{b} a_j \cos n_j x \cdot \cos \lambda_j t \qquad (128)$$

of the linear equation

$$u_{tt} - u_{xx} + \rho u = 0. \qquad (129)$$

There is a Cantor set $C \subset \{a = (a_j)_{j=1,\dots,b} \mid a_j > 0\}$ of positive measure, in fact, of asymptotically full measure when $|a| \to 0$, such that for $a \in C$ the solution (128) of (129) persists for (125):

$$u(x, t) = \sum_{j=1}^{b} a_j \cos n_j x \cdot \cos \lambda'_j t + O(|a|^3). \qquad (130)$$
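Substituting $\cos n x \cos \lambda t$ into (129) gives $-\lambda^2 + n^2 + \rho = 0$, so the frequencies in (128) must be $\lambda_j = \sqrt{n_j^2 + \rho}$. The sketch below checks this with a finite-difference residual of (129); the amplitudes, modes and value of $\rho$ are hypothetical illustration data.

```python
import math


def u0(x, t, amps, modes, rho):
    """Linear solution (128) with lambda_j = sqrt(n_j^2 + rho)."""
    return sum(a * math.cos(n * x) * math.cos(math.sqrt(n * n + rho) * t)
               for a, n in zip(amps, modes))


def residual(x, t, amps, modes, rho, h=1e-3):
    """Second-order finite-difference value of u_tt - u_xx + rho*u at (x, t)."""
    f = lambda xx, tt: u0(xx, tt, amps, modes, rho)
    utt = (f(x, t + h) - 2 * f(x, t) + f(x, t - h)) / h ** 2
    uxx = (f(x + h, t) - 2 * f(x, t) + f(x - h, t)) / h ** 2
    return utt - uxx + rho * f(x, t)


amps, modes, rho = [0.3, 0.2, 0.1], [1, 2, 5], 0.7   # hypothetical data
worst = max(abs(residual(x, t, amps, modes, rho))
            for x in (0.1, 1.3, 2.9) for t in (0.0, 0.4, 3.7))
print(worst)  # small: only the O(h^2) finite-difference error remains
```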
The persistency problem for higher dimensional wave equations seems more difficult, due to the behavior of the frequencies $|n| = (n_1^2 + \dots + n_d^2)^{1/2}$ when $d \geq 2$. One may, however, treat the special case of time periodic solutions in any dimension (this is also the case for NLS).
THEOREM 131. Consider the periodic wave equation in dimension $d$ [B2]

$$u_{tt} - \Delta u + \rho u + (u^3 + \text{higher order terms}) = 0 \qquad (132)$$

where again $\rho > 0$ is a typical number. More precisely, we require $\rho$ to satisfy a condition of the form
I~ k j
tll
>
(L Ikjl)
r
-C
for all {k j
}
E
zr+1\{O}.
(133)
Fix $n_0 \in \mathbb{Z}^d \setminus \{0\}$. There is a Cantor set $C$ of positive measure in an interval $[0, \delta]$ and, for $\rho_0 \in C$, a solution of (132) of the form

$$u(x, t) = \rho_0 \cos\big((n_0, x) + \lambda t\big) + O(\rho_0^3) \qquad (134)$$

where (135)
The next two results are normal form reductions, bringing the problem back to perturbations of a linear problem with parameters (cf. the discussion (39)-(45)).
THEOREM 136. Consider a 1D NLS (137) where $f$ is polynomial or real analytic and satisfies a non-degeneracy condition

$$f'(0) \neq 0. \qquad (138)$$

Consider (137) with periodic boundary conditions, say. Fix a sequence of positive integers (139). Then for $a = \{a_j\}_{j=1,\dots,b}$ in a Cantor family $C$ of positive measure, there is a quasi-periodic solution

$$u(x, t) = \sum_{j=1}^{b} a_j\, e^{i(n_j x + \lambda'_j t)} + O(|a|^3) \qquad (140)$$
with frequencies $\lambda'_1, \dots, \lambda'_b$, where

$$\lambda'_j = n_j^2 + m + O(|a|^2) \qquad (j = 1, \dots, b). \qquad (141)$$
This result is due to S. Kuksin and J. Pöschel (under Dirichlet boundary conditions) [K-P].

THEOREM 142. Consider the 2D cubic NLS [B3]

$$i u_t - \Delta u + c\, u|u|^2 = 0 \qquad (c \neq 0) \qquad (143)$$

or, more generally, a 2D NLS (144) with $f$ as above in Theorem 136, with periodic boundary conditions. For the modes $n_1, \dots, n_b \in \mathbb{Z}^2$, we fix 2 lattice points $n_1, n_2$ on a same circle (145) (more complicated structures involving more than 2 points may be treated as well, but this is the simplest case). There is a Cantor family $C$ of positive measure such that for $a = \{a_j\}_{j=1,2} \in C$, (144) has a quasi-periodic solution

$$u(x, t) = \sum_{j=1}^{2} a_j\, e^{i((n_j, x) + \lambda'_j t)} + O(|a|^3) \qquad (146)$$

with frequencies $\lambda' = (\lambda'_1, \lambda'_2)$ (147).
Further Comments. It seems a natural program to extend the classical theory of smooth dynamical systems to the setting of infinite dimensional phase space, in particular in the context of Hamiltonian PDEs as discussed here. A subject closely related to persistency of invariant tori is that of Nekhoroshev stability [N]. This phenomenon may be roughly stated as follows. Consider in $2N$-dimensional phase space a perturbed Hamiltonian (148)
where $(I, \varphi)$ are action-angle variables. The unperturbed Hamiltonian is assumed to be of the form

$$H_0(I) = \sum_{j=1}^{N} \lambda_j I_j + c \sum_{j=1}^{N} I_j^2 + O(|I|^3) \qquad (149)$$

where either the linear part is non-resonant or $c \neq 0$ (i.e. $H_0$ is strictly convex). Denote by $I(t), \varphi(t)$ the evolution of $I, \varphi$ under the $H$-flow ($j = 1, \dots, N$). (150)
Then there is stability

$$|I(t) - I(0)| < \varepsilon^a \qquad (151)$$

for exponentially long time (152), where the relation between $a$ and $b$ depends on the dimension $N$. It is expected that typically, for sufficiently long time, there will be a "drift" of the action variables, known as Arnold diffusion. The question we address is to what extent such results may be formulated for Hamiltonian PDEs. There are two immediate problems arising here. (i) The fact that the phase space is infinite dimensional (in the proof of the previous result, at least in the convex case, the role of the dimension is significant). (ii) The choice of an appropriate topology on the phase space. Some results may be proved in the non-resonant region [B4]. For instance, consider a 1D NL wave equation

$$u_{tt} - u_{xx} + \rho u + \varepsilon f'(u) = 0 \qquad (153)$$

with typical $\rho$, i.e. such that $\{\sqrt{n^2 + \rho}\}$ have good Diophantine properties. Assume $f'(u)$ odd and $\varepsilon$ small. Then, for odd and smooth initial data $u(0), u_t(0)$, the corresponding solution to (153) will evolve close to a quasi-periodic function of time for times $t$, $|t| < T_\varepsilon$, where $T_\varepsilon$ may be taken to be any power of $\varepsilon^{-1}$. The key facts on which the argument is based are the following. (i) The possibility to perform a perturbative analysis up to any order and obtain an approximate solution $u_0$ of (153). (ii) The fact that the equation linearized at $u_0$ has zero Lyapunov exponents.
Presently, there do not seem to be satisfactory PDE counterparts of the Nekhoroshev theorem under a convexity assumption on the Hamiltonian, even endowing the phase space with the weak topology. On the other hand, let us also make the following qualitative comment. Global smooth solutions $u$ to (153) may be shown (in general) to satisfy estimates on higher derivatives of the form (154), hence with exponential time dependence. Recently, this crude estimate was improved to a power-like bound (see [B5])

$$\|u(t)\|_{H^s} < (1 + |t|)^{A(s-1)} \qquad (s > 1) \qquad (155)$$

yielding an upper bound on how fast energy may travel from low to high Fourier modes. Related examples show that this is essentially an optimal result, up to the value of the constant $A$. The approach used is fairly general and combines algebraic conservation properties with the local theory of the initial value problem. Similar results may be obtained for the nonlinear Schrödinger equation, for instance the 1D cubic NLS (156) (known to be non-integrable), and in infinite volume, i.e. on the line instead of with periodic boundary conditions.
References (strictly for the purpose of this exposé)

[B1] J. Bourgain: Construction of quasi-periodic solutions of Hamiltonian perturbations of linear equations and applications to nonlinear PDE, IMRN, Vol. 11 (1994).
[B2] J. Bourgain: Construction of time periodic solutions of nonlinear wave equations in higher dimension, preprint 1995, to appear in Geometric and Functional Analysis (GAFA).
[B3] J. Bourgain: Quasi-periodic solutions of Hamiltonian perturbations of 2D linear Schrödinger equations, preprint IHES/M/95/1.
[B4] J. Bourgain: Construction of approximative and almost periodic solutions of perturbed linear Schrödinger and wave equations, preprint.
[B5] J. Bourgain: On the behavior of higher Sobolev norms of smooth solutions of nonlinear Hamiltonian PDE, preprint.
[C-W] W. Craig, E. Wayne: Newton's method and periodic solutions of nonlinear wave equations, Comm. Pure and Applied Math. 46 (1993), 1409-1501.
[K1] S. Kuksin: Nearly Integrable Infinite-Dimensional Hamiltonian Systems, LNM 1556 (Springer).
[K2] S. Kuksin: Perturbation theory for quasi-periodic solutions of infinite dimensional Hamiltonian systems and its applications to the Korteweg-de Vries equation, Math. USSR Sbornik 64 (1989), 397-413.
[K-P] S. Kuksin, J. Pöschel: Annals of Math., to appear.
[N] N. Nekhoroshev: An exponential estimate of the time of stability of nearly-integrable Hamiltonian systems, Russian Math. Surveys, Vol. 32, No. 6 (1977), 1-66.
[W] E. Wayne: Periodic and quasi-periodic solutions of nonlinear wave equations via KAM theory, Comm. Math. Phys. 127 (1990), 479-528.
Scaling Limits for Lattice Gas Models

S.R.S. VARADHAN
Courant Institute of Mathematical Sciences, New York University, New York, NY 10012

H.T. YAU
Courant Institute of Mathematical Sciences, New York University, New York, NY 10012
1. INTRODUCTION
When we consider a large system of interacting particles evolving in time, one of the natural things to do is to provide a simplified description of the state of the system. Rather than provide a detailed "microscopic" picture, we describe the system by providing the values of certain "macroscopic" parameters. This is best illustrated by an example. The model considered here describes particles on sites in a $d$-dimensional periodic lattice. Let us take a small parameter $\epsilon$ and consider a periodic lattice in $\mathbb{R}^d$ of length $L = \epsilon^{-1}$. We have a certain number, $n_\epsilon$, of particles distributed over the sites in the lattice. The important restriction is that there may be at most one particle per site. We shall scale the lattice by $\epsilon$ so that the lattice is embedded in the standard $d$-dimensional torus for every value of $\epsilon$. We denote by $x$ points of the embedded lattice $\mathbb{Z}_L^d$ as well as points in the $d$-torus $T^d$. For each lattice site $x$ the variable $\eta(x)$, which can take the values zero or one, signifies
the absence or presence of a particle at site $x$. Clearly $\sum_x \eta(x) = n_\epsilon$. As $\epsilon$ gets small we say that the particles are distributed according to density $\rho(x)$, where $\rho(\cdot)$ is a nonnegative measurable function on $T^d$ with $0 \leq \rho(x) \leq 1$ for all $x$, if for every bounded continuous test function $J(x)$ on $T^d$ we have

$$\lim_{\epsilon \to 0} \epsilon^d \sum_x J(x)\, \eta(x) = \int_{T^d} J(x)\, \rho(x)\, dx.$$

Clearly this implies that the total number of particles, $n_\epsilon$, should be proportional to $L^d$ as $\epsilon \to 0$; the constant of proportionality is the total integral of $\rho(x)$ on $T^d$. Imagine now that the system of particles evolves in some stochastic manner. Particles may jump at some Poisson rates to nearby sites if any of them should be vacant. The Poisson rate could depend on the immediate environment of the current particle. This would of course happen simultaneously for all the particles. Although locally the system will be changing fast, the local "density" of particles will change rather slowly because no particles are created or destroyed. If we speed things up, then the "density" changes normally but locally the system changes its microscopic state very rapidly. Assuming that density is the only conserved quantity, there will be a one-parameter family of invariant measures for the evolution. Because of the rapid time scale, locally the system will be near the equilibrium dictated by the local density. The local density then is a function of space and time. One of the goals is to write down a partial differential equation that controls the evolution of this density function $\rho(t, x)$.
2. THE MODEL

In order to describe the dynamics mathematically we proceed in the following manner. Let us denote by $\Omega$ the space of all configurations on the periodic lattice $\mathbb{Z}_L^d$. $\Omega$ is only a finite set, even if it is a large one. Our evolution is a Markov process with $\Omega$ as state space. In order to describe our Markovian evolution on $\Omega$ we first define a probability measure $\mu$ on $\Omega$ and define a Dirichlet form relative to $\mu$. The infinitesimal generator, $L$, will be the operator corresponding to the Dirichlet form. Suppose $b$ is a bond connecting two nearest neighbor sites on the lattice. For any configuration $\eta$ we can define a new configuration $\eta^b$ obtained by interchanging the values of $\eta$ at the two ends of the bond. This corresponds to a particle moving from the occupied site at one end of the bond to the free site at the other end of the same bond. Of course, if both ends are free or both are occupied then nothing happens and $\eta^b = \eta$. For any function $f$ on $\Omega$ and any bond $b$ we can define
$$\nabla_b(f)(\eta) = f(\eta^b) - f(\eta).$$

The Dirichlet form $D_L(f)$ then takes the form

$$D_L(f) = \int_\Omega \sum_b \big[\nabla_b(f)(\eta)\big]^2\, d\mu.$$
The measure $\mu$ on the space $\Omega$ is specified as a Gibbs measure, which means that the probabilities for the various points $\eta$ in $\Omega$ are defined by

$$\mu(\eta) = \exp\Big[-\sum_{x \in \mathbb{Z}_L^d} F_x(\eta)\Big] \Big/ Z.$$

Here $Z$ is a normalization constant, $F(\eta)$ is a local function of $\eta$ depending on $\{\eta(y)\}$ for a few $y$ near the origin, and $F_x$ is the translation of $F$ by $x$, so that $F_x(\eta)$ is the same function but around the site $x$. The sum in the exponent is to be thought of as the energy, so that $\mu(\eta) = \exp[-H(\eta)]/Z$. The total energy $H$ is the sum of the local energies $F_x$. If $F$ depends only on $\eta(0)$, then $\mu$ is a product measure; otherwise $\mu$ has some dependence built into it. A simple class of examples are the Ising measures. Given $\mu$ through $F$, and given the Dirichlet form, we have a Markovian evolution. In this evolution the total number of particles is conserved. If we denote by $\Omega_{L,n}$ those configurations on our lattice of length $L$ which have $n$ particles, this subset will be an irreducible invariant subset of our Markov process, so that the restriction $\mu_{L,n}$ of $\mu$ to this set, correctly normalized, will be an invariant ergodic measure, and one should really think of the evolution taking place on any one of these irreducible components. One final comment is that we need to speed up the evolution by a factor of $\epsilon^{-2}$, and this means that the Dirichlet form gets a factor of $\epsilon^{-2}$ in front.
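For a very small lattice the objects of this section can be computed exactly by enumerating $\Omega$. The sketch below is illustrative only: it takes $d = 1$, the hypothetical size $L = 6$, and a hypothetical Ising-type local energy $F_x(\eta) = K\,\eta(x)\eta(x+1)$ together with a chemical potential term, builds the Gibbs measure, and evaluates the Dirichlet form $D_L(f)$ (without the $\epsilon^{-2}$ speed-up factor). It also shows that the particle number is invariant under every exchange $\eta \mapsto \eta^b$, which is why each $\Omega_{L,n}$ is an invariant subset.

```python
import math
from itertools import product

L = 6                                         # hypothetical lattice size (d = 1)
OMEGA = list(product((0, 1), repeat=L))       # all configurations eta
BONDS = [(x, (x + 1) % L) for x in range(L)]  # periodic nearest-neighbour bonds


def swap(eta, bond):
    """eta^b: interchange the values of eta at the two ends of bond b."""
    x, y = bond
    e = list(eta)
    e[x], e[y] = e[y], e[x]
    return tuple(e)


def F_nn(eta, x, K=0.8):
    """Hypothetical local energy F_x(eta) = K * eta(x) * eta(x+1)."""
    return K * eta[x] * eta[(x + 1) % L]


def gibbs(F, lam=0.0):
    """mu(eta) = exp[lam * sum eta - sum_x F_x(eta)] / Z on the finite lattice."""
    w = {eta: math.exp(lam * sum(eta) - sum(F(eta, x) for x in range(L)))
         for eta in OMEGA}
    Z = sum(w.values())
    return {eta: v / Z for eta, v in w.items()}


def dirichlet(f, mu):
    """D_L(f) = sum_eta mu(eta) * sum_b [f(eta^b) - f(eta)]^2."""
    return sum(p * sum((f(swap(eta, b)) - f(eta)) ** 2 for b in BONDS)
               for eta, p in mu.items())


mu = gibbs(F_nn, lam=0.2)
print(dirichlet(lambda e: sum(e), mu))  # 0.0: particle number is conserved
print(dirichlet(lambda e: e[0], mu))    # positive: occupation at 0 is not
```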
3. HYDRODYNAMIC SCALING

We make a qualitative assumption concerning the energy function $F$. We assume that it is such that it possesses a unique infinite volume Gibbs measure and, moreover, that certain stronger conditions, known as the mixing conditions of Dobrushin and Shlosman, are satisfied. This is the case if $F$ is small in some sense. According to the work of [LY, Y], such a condition implies a uniform estimate on the spectral gaps for our (speeded up) Markov generators on the various $\Omega_{L,n}$. Let us start with some initial configuration $\eta$ that has asymptotically $\rho_0(x)$ as macroscopic density, i.e.

$$\lim_{\epsilon \to 0} \epsilon^d \sum_x J(x)\, \eta(x) = \int_{T^d} J(x)\, \rho_0(x)\, dx.$$

Then, with respect to the measure $P_\epsilon$ representing our Markovian evolution, the following limit holds in probability, uniformly on any finite time interval:

$$\lim_{\epsilon \to 0} \epsilon^d \sum_x J(x)\, \eta(t, x) = \int_{T^d} J(x)\, \rho(t, x)\, dx.$$
Here the function $\rho(t, x)$ is given as the unique solution of the following nonlinear diffusion equation:

$$\frac{\partial \rho}{\partial t} = \frac{1}{2} \nabla \cdot \big(D(\rho) \nabla \rho\big), \qquad \rho(0, x) = \rho_0(x).$$

The diffusion matrix $D$ is to be determined. It can be specified by a Green-Kubo formula or, as we do here, through a variational formula, which makes it easier to work with. In the definition of $\mu(\eta)$ we can add to the exponent a term of the form $\lambda \sum_x \eta(x)$, so that the formula in a finite volume reads

$$\mu(\eta) = \exp\Big[\lambda \sum_x \eta(x) - \sum_{x \in \mathbb{Z}_L^d} F_x(\eta)\Big] \Big/ Z.$$

In the infinite volume limit this will produce a stationary process with density $\rho$, which is a function of the real number $\lambda$. As $\lambda$ varies from $-\infty$ to $\infty$, $\rho(\lambda)$ will go from zero to one, and $\lambda(\rho)$ is the inverse function. An infinite volume Gibbs measure $\mu$ can be specified by its density $\rho$ or by its chemical potential $\lambda$.
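In one dimension with a nearest-neighbour energy the map $\lambda \mapsto \rho(\lambda)$ can be computed exactly with a $2 \times 2$ transfer matrix, and then inverted. The sketch below assumes the hypothetical choice $F_x(\eta) = K\,\eta(x)\eta(x+1)$; the density is $\rho(\lambda) = \frac{d}{d\lambda}\log \Lambda(\lambda)$, where $\Lambda$ is the largest eigenvalue of $T(a, b) = \exp[\lambda(a+b)/2 - K ab]$, $a, b \in \{0, 1\}$. Since $\rho$ is strictly increasing in $\lambda$, the inverse $\lambda(\rho)$ is found by bisection.

```python
import math


def rho_of_lambda(lam, K, h=1e-5):
    """Infinite-volume density for weights exp[lam*sum eta - K*sum eta(x)eta(x+1)]
    (hypothetical F_x), as d/d lam of log(top eigenvalue of the transfer matrix)."""
    def log_top(l):
        a, c, d = 1.0, math.exp(l / 2), math.exp(l - K)  # T = [[a, c], [c, d]]
        return math.log((a + d + math.sqrt((a - d) ** 2 + 4 * c * c)) / 2)
    return (log_top(lam + h) - log_top(lam - h)) / (2 * h)


def lambda_of_rho(rho, K, lo=-40.0, hi=40.0, tol=1e-10):
    """Chemical potential lambda(rho): invert rho(lambda) by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rho_of_lambda(mid, K) < rho:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# For K = 0 the measure is product Bernoulli and rho(lambda) is the logistic map.
print(rho_of_lambda(0.3, 0.0), 1 / (1 + math.exp(-0.3)))
print(lambda_of_rho(rho_of_lambda(1.2, 0.5), 0.5))  # recovers 1.2
```

For $K = 0$ (the product-measure case) this reduces to $\rho(\lambda) = e^\lambda/(1 + e^\lambda)$, so $\lambda(\rho) = \log(\rho/(1-\rho))$.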
4. FORMULA FOR THE DIFFUSION MATRIX

For simplicity let us do the calculation in one dimension. We want to calculate the change in time of an object like $\epsilon \sum_x J(x)\eta(t, x)$, where $\eta(t, \cdot)$ is the configuration at time $t$. Computing the change in time gives us two terms. One of them is a martingale term, which is seen to be negligible for small $\epsilon$ by a mean square calculation. The other term is obtained by applying the infinitesimal generator to the quantity of interest and is of the form

$$-\sum_x J'(x)\, W_{x,x+1}(\eta(t, \cdot)).$$

Here $W_{x,x+1}$ is the current from $x$ to $x+1$. The system is called a non-gradient system because $W_{x,x+1}$ is not the difference $V_{x+1} - V_x$ for some local functional $V$ of the configuration. Here the functions $W_{x,x+1}$ and $V_x$ are basically the same function translated by shift to the point $x$ on the lattice. In a gradient system we can do a summation by parts and reduce the above quantity to an expression of the form

$$\sum_x J''(x)\, V_x(\eta(t, \cdot)).$$

We can calculate the expected value $v(\rho)$ of $V_x$ in the Gibbs state with density $\rho$, and this leads to the diffusion equation

$$\frac{\partial \rho}{\partial t} = \frac{\partial^2 v(\rho(t, x))}{\partial x^2}.$$
Our case is non-gradient, and we have to replace $W_{x,x+1}$ by a term of the form $D\,(\eta(x+1) - \eta(x))$ for a suitable constant $D$ depending on the local density $\rho$. This leads to the diffusion equation

$$\frac{\partial \rho(t, x)}{\partial t} = \frac{\partial}{\partial x}\Big(D(\rho(t, x))\, \frac{\partial \rho(t, x)}{\partial x}\Big).$$

The function $D(\rho)$ is determined in the following manner. We start in equilibrium with the Gibbs measure with density $\rho$ on the infinite lattice. If $f$ is a local function, then $g = Lf$, $W_{0,1}$ and $\eta(1) - \eta(0)$ are all mean zero functions. Their integrals over time satisfy a central limit theorem. Let us denote by

$$\sigma^2(f, D, \rho) = \lim_{\ell \to \infty} \frac{1}{2\ell + 1} \lim_{t \to \infty} \frac{1}{t} \operatorname{Var}\Big(\int_0^t \sum_{|x| \leq \ell} \big[W_{x,x+1}(\eta(s, \cdot)) - g_x(\eta(s, \cdot)) - D\,(\eta(s, x+1) - \eta(s, x))\big]\, ds\Big).$$

Then $D(\rho)$ is determined so that

$$\inf_f \sigma^2(f, D = D(\rho), \rho) = 0.$$

There is a multi-dimensional analog for replacing the current by $D(\rho)$ times $\nabla \rho$, and this determines the matrix $D(\rho)$.
REFERENCES

[LY] Lu, S. L. and Yau, H.-T.: Spectral gap and logarithmic Sobolev inequality for Kawasaki and Glauber dynamics, Commun. Math. Phys. 156, 399-433, 1993.
[Y] Yau, H.-T.: Logarithmic Sobolev inequality for lattice gases with mixing conditions, preprint.
Multivariate Distributions with Gaussian Conditional Structure

BARRY C. ARNOLD
Department of Statistics, University of California, Riverside, California

JACEK WESOLOWSKI
Mathematical Institute, Warsaw University of Technology, Warsaw, Poland

Key words: quasi-Gaussian distributions, classical normal distribution, normal conditionals distribution, elliptical contours, linear regression, mixtures, Kagan class.
ABSTRACT
Multivariate distributions exhibiting some features of the conditional structure associated with the classical normal model are investigated. Features considered include conditional distributions of subvectors and conditional moments. Our understanding of the classical normal model is enhanced by the study of such quasi-Gaussian distributions together with investigation of additional assumptions required to characterize the classical normal model. Special attention is paid to the class of distributions exhibiting Gaussian conditional structure of the second order, i.e. those in which the conditional moments of orders one and two match the Gaussian model.
1 THE CLASSICAL MULTIVARIATE NORMAL DISTRIBUTION
A random vector $\mathbf{X} = (X_1, X_2, \dots, X_k)$ is said to have a classical multivariate normal distribution if it admits a representation of the form

$$\mathbf{X} = \boldsymbol\mu + A\mathbf{Z},$$

where $Z_1, Z_2, \dots, Z_k$ are i.i.d. standard univariate normal random variables. In such a case we write $\mathbf{X} \sim N^{(k)}(\boldsymbol\mu, \Sigma)$, with $\Sigma = AA'$. Here $\boldsymbol\mu \in \mathbb{R}^k$ and $\Sigma$ is a non-negative definite $k \times k$ matrix. Such random variables have remarkable properties. For example:

1. All one dimensional marginals are normal.
2. All $\ell$-dimensional marginals, $\ell < k$, are $\ell$-variate normal.
3. All linear combinations are normal. In fact for any $\ell \times k$ matrix $B$ we have $B\mathbf{X} \sim N^{(\ell)}(B\boldsymbol\mu, B\Sigma B')$.
4. All conditionals are normal. Thus if we partition $\mathbf{X} = (\mathbf{X}^{(1)}, \mathbf{X}^{(2)})$, then the conditional distribution of $\mathbf{X}^{(1)}$ given $\mathbf{X}^{(2)} = \mathbf{x}^{(2)}$ is multivariate normal.
5. All regressions are linear. Thus for any $i$ and any $j_1, j_2, \dots, j_\ell$ $(\neq i)$, $E(X_i \mid X_{j_1}, \dots, X_{j_\ell})$ is a linear function of $X_{j_1}, \dots, X_{j_\ell}$.
6. All conditional variances are constant. Thus $\operatorname{var}(X_i \mid X_{j_1}, \dots, X_{j_\ell})$ is nonrandom for any $i$ and any $j_1, j_2, \dots, j_\ell$ $(\neq i)$.
7. If $\Sigma$ is positive definite, the joint density of $\mathbf{X}$ is elliptically contoured.
8. $\mathbf{X}$ has linear structure, i.e. $\mathbf{X}$ admits a representation of the form $\mathbf{X} = \boldsymbol\mu_0 + A\mathbf{Z}$, where the $Z_i$'s are independent random variables.
Most of these properties, taken individually, fail to characterize the classical multivariate normal distribution. Combinations of these properties can be used to characterize the classical model. Condition 3 does characterize the classical model. Condition 4 also will characterize the classical model provided $k > 2$. None of the others alone will do it. Conditions 7 and 8, together, will characterize the classical distribution. The present paper will focus mainly on two issues: the possibility of weakening the assumption of property 4 and still preserving a $k$-variate normal characterization (Section 2), and a discussion of models which, though not classical normal, mimic the conditional moment structure of the classical models (Sections 3 and 4). Additional conditions for such structures leading to multivariate normality are outlined in Section 5.
Some useful notational conventions follow. Suppose $\mathbf{X}$ denotes a $k$-dimensional random vector and $\mathbf{x} \in \mathbb{R}^k$. A partition of $\mathbf{X}$ into two subvectors of dimensions $k_1$ and $k_2$ with $k_1 + k_2 = k$ will be denoted by $(\mathbf{X}^{(1)}, \mathbf{X}^{(2)})$. The corresponding partition of $\mathbf{x}$ will be denoted $(\mathbf{x}^{(1)}, \mathbf{x}^{(2)})$. $X_i$ will denote the $i$th coordinate of $\mathbf{X}$. $\mathbf{X}_{(i)}$ is the $(k-1)$-dimensional vector obtained from $\mathbf{X}$ by deleting $X_i$; $\mathbf{X}_{(i,j)}$ is obtained from $\mathbf{X}$ by deleting $X_i$ and $X_j$. Analogously, real vectors $\mathbf{x}_{(i)}$ and $\mathbf{x}_{(i,j)}$ are defined.
2 CONDITIONAL CHARACTERIZATIONS OF THE CLASSICAL NORMAL MODEL
Suppose that for each $i$ and for each $\mathbf{x}_{(i)} \in \mathbb{R}^{k-1}$ the conditional distribution of $X_i$ given $\mathbf{X}_{(i)} = \mathbf{x}_{(i)}$ is normal with a mean and variance that may depend on $\mathbf{x}_{(i)}$, i.e.

$$X_i \mid \mathbf{X}_{(i)} = \mathbf{x}_{(i)} \sim N\big(\mu_i(\mathbf{x}_{(i)}),\ \sigma_i^2(\mathbf{x}_{(i)})\big). \qquad (1)$$

In this case, generalizing the early results of Bhattacharya (1943) and solving an appropriate set of functional equations, one may verify that $\mathbf{X}$ must have what we may call a $k$-variate normal conditionals distribution with density of the form:
(2)

where

(3)

There are necessary restrictions on the ranges of the $\gamma$'s in (2.3) in order to ensure integrability and to ensure that all expressions for conditional variances are uniformly positive. Of course $\gamma_{0\cdots0}$ is not really a parameter; it is a normalizing factor that is a function of the remaining $\gamma$'s, chosen to ensure that the integral of the joint density is 1. If $\mathbf{X}$, of dimension $k$, has a density of the form (2.2) we will write $\mathbf{X} \sim NC^{(k)}(\boldsymbol\gamma)$.
See Arnold, Castillo and Sarabia (1992) for a more detailed introduction to the normal conditionals model. The classical $k$-variate normal distribution is of course a special case of the normal conditionals model (2.2), since obviously it satisfies the required condition (2.1). It can be recognized by the fact that for such a distribution all coefficients $\gamma_{\mathbf{i}}$ for which $\sum_{j=1}^k i_j > 2$ must be zero since, in order for (2.2) to represent a classical normal model, $G(\mathbf{x})$ must be a quadratic form. Many characterization programs may be viewed as beginning with conditional normal requirements leading to the model (2.2), or some related submodel, and then imposing additional conditions to ensure vanishing of the "unwanted" coefficients (i.e. the $\gamma_{\mathbf{i}}$'s with $\sum_{j=1}^k i_j > 2$).
To begin with, we may recall that the classical normal distribution actually has far more conditional normal distributions associated with it than those alluded to in (2.1). In fact, if $\mathbf{X} \sim N^{(k)}(\boldsymbol\mu, \Sigma)$, then for any partition of $\mathbf{X}$ into subvectors $\mathbf{X}^{(1)}$ and $\mathbf{X}^{(2)}$ of dimensions $k_1$ and $k_2$ with $k = k_1 + k_2$, the conditional distribution of $\mathbf{X}^{(1)}$ given $\mathbf{X}^{(2)} = \mathbf{x}^{(2)}$ is $k_1$-variate normal for every $\mathbf{x}^{(2)}$. (4)

Since all subvectors of $\mathbf{X}$ are again classical normal, even more conditional distributions, analogous to those in (2.4) but now based on partitioning subvectors of $\mathbf{X}$, are again guaranteed to be normal. Assumption (2.1) is not enough to guarantee the classical model. Assumption (2.4) is more than enough (provided $k > 2$; otherwise (2.1) and (2.4) coincide and fail to characterize the classical normal model). In fact one may prove (see Arnold, Castillo and Sarabia (1994)) that, for $k > 2$, a sufficient condition to guarantee a classical multivariate normal model is an assumption that for each $i, j$ and each $\mathbf{x}_{(i,j)} \in \mathbb{R}^{k-2}$

$$(X_i, X_j) \mid \mathbf{X}_{(i,j)} = \mathbf{x}_{(i,j)} \ \text{ is classical bivariate normal.} \qquad (5)$$

The key observation is that (2.5) implies that for each $i$, $X_i \mid \mathbf{X}_{(i,j)} = \mathbf{x}_{(i,j)}$ is normal, since the classical bivariate normal has normal marginals. Consequently (2.5) is enough to guarantee that

$$\mathbf{X} \sim NC^{(k)}(\boldsymbol\gamma) \qquad (6)$$

and, for each $i$,

$$\mathbf{X}_{(i)} \sim NC^{(k-1)}(\boldsymbol\gamma_{(i)}). \qquad (7)$$
However, marginals of a normal conditionals distribution (2.2) can only be of the normal conditionals form if certain $\gamma$'s are zero. In fact (2.7) guarantees that all the "unwanted" $\gamma$'s are zero, and the fact that $\mathbf{X}$ must have a classical normal distribution is a consequence. Of course the conditional mean functions and conditional variance functions which are encountered in the normal conditionals model (2.2) are not the familiar linear regressions and constant conditional variances associated with the classical model. If we are willing to assume, in addition to the assumption that each $X_i$ given $\mathbf{X}_{(i)}$ is normal, that the conditional variances are constant, i.e. that

$$X_i \mid \mathbf{X}_{(i)} = \mathbf{x}_{(i)} \sim N\big(\mu_i(\mathbf{x}_{(i)}),\ \sigma_i^2\big), \qquad (8)$$

then the unwanted $\gamma$'s in (2.2) are forced to be zero and we must have $\mathbf{X} \sim N^{(k)}(\boldsymbol\mu, \Sigma)$, i.e. classical normal. An analogous alternative sufficient prescription is the requirement that in (2.1) each $\mu_i(\mathbf{x}_{(i)})$ be a linear function of $\mathbf{x}_{(i)}$. It is indeed well known that for a classical $k$-variate normal random vector $\mathbf{X}$ we can explicitly write the parameters in the conditional distribution of $\mathbf{X}^{(1)}$ given $\mathbf{X}^{(2)}$ in terms of the original parameters of the distribution of $\mathbf{X}$. Thus with $\mathbf{X} = (\mathbf{X}^{(1)}, \mathbf{X}^{(2)})$ and $\boldsymbol\mu = (\boldsymbol\mu^{(1)}, \boldsymbol\mu^{(2)})$ we have
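$$\mathbf{X}^{(1)} \mid \mathbf{X}^{(2)} = \mathbf{x}^{(2)} \sim N^{(k_1)}\big(\boldsymbol\mu^{(1)} + \Sigma_{12}\Sigma_{22}^{-1}(\mathbf{x}^{(2)} - \boldsymbol\mu^{(2)}),\ \Sigma_{11\cdot2}\big) \qquad (9)$$

where

$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$$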
and $\Sigma_{11\cdot2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$. The linear regressions and constant conditional variances are explicitly displayed in (2.9). Linear regressions are not that unusual in multivariate distributions. Constant conditional variances are unexpected. In some ways they are even counterintuitive. Taken together, the requirements of linear regressions and constant conditional variances seem potentially so restrictive as to possibly, alone, suffice to characterize the classical normal model. They don't. But verifying that they don't, and asking what additional requirements will lead to characterizations, is an interesting exercise that enriches our understanding of the real nature of the curious classical multivariate normal model. The topic will be addressed in the next section. Before leaving the study of conditional normality assumptions to focus on conditional moment assumptions, it is worth returning to the list of 8 properties of the classical model listed in Section 1. Which of these, in addition to (2.1) (i.e. $X_i$ given $\mathbf{X}_{(i)} = \mathbf{x}_{(i)}$ is normal for all $i$ and all $\mathbf{x}_{(i)}$), will guarantee classical multivariate normality? We have already considered properties 5 and 6. Property 1 has potential, since marginals of normal conditionals models are typically not of the normal conditionals form and a fortiori not (classical) normal. In fact, if all one dimensional marginals of $\mathbf{X}$ are normal and (2.1) holds, then the unwanted $\gamma$'s in (2.3) must disappear and the classical normal model is obtained. Turning to condition 2, it is a condition that subsumes 1 and consequently can be used to characterize the classical model. Actually far less is needed. For example, if in addition to (2.1) each $\mathbf{X}_{(i)}$ is classical $(k-1)$-variate normal, then $\mathbf{X}$ must be classical $k$-variate normal. Indeed any marginal normality statement sufficient to guarantee one dimensional normal marginals will obviously suffice. Turn next to condition 3.
If $k$ linearly independent linear combinations of the coordinates of $\mathbf{X}$ are normally distributed and if (2.1) holds then, by a suitable linear transformation, we have $\mathbf{Y} = B\mathbf{X}$ with normal conditionals (i.e. (2.1)) and normal one dimensional marginals. Then $\mathbf{Y}$, and consequently also $\mathbf{X}$, is a classical normal random vector. Next turn to condition 7, elliptical contours. This is easily dealt with. The contours of the normal conditionals density are determined by $G(\mathbf{x})$ in (2.3). Their form will be elliptical only if the unwanted $\gamma$'s are all zero, i.e. only in the classical normal case.
Finally consider condition 8. The assumption of linear structure turns out to be particularly fruitful in conjunction with certain conditional moment assumptions, as we shall see in the next section. In conjunction with the normal conditionals assumption, i.e. (2.1), the role of linear structure is less evident. Assumption 8 does, however, imply the existence of a linear transformation of $\mathbf{X}$ (with density (2.2)) that has a density which can be factored. This does imply that the unwanted $\gamma$'s in (2.3) must be zero and does indeed guarantee classical multivariate normality.
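Formula (2.9) is easy to evaluate numerically, and a simulation makes the two defining features of the classical model visible: the regression of $\mathbf{X}^{(1)}$ on $\mathbf{X}^{(2)}$ is linear and the residual variance does not depend on $\mathbf{x}^{(2)}$. The sketch below uses NumPy; the covariance matrix, sample size and seed are hypothetical illustration choices.

```python
import numpy as np


def conditional_normal(mu, Sigma, k1, x2):
    """Mean and covariance of X1 | X2 = x2 for X ~ N(mu, Sigma), where X1 holds
    the first k1 coordinates, following (2.9)."""
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    S11, S12 = Sigma[:k1, :k1], Sigma[:k1, k1:]
    S21, S22 = Sigma[k1:, :k1], Sigma[k1:, k1:]
    # Regression matrix Sigma_12 Sigma_22^{-1}, computed with a linear solve.
    A = np.linalg.solve(S22.T, S12.T).T
    mean = mu[:k1] + A @ (np.asarray(x2, dtype=float) - mu[k1:])  # linear in x2
    cov = S11 - A @ S21   # Sigma_{11.2}: does not depend on x2
    return mean, cov


Sigma = [[2.0, 1.0], [1.0, 2.0]]                 # hypothetical example
mean, cov = conditional_normal([0.0, 0.0], Sigma, 1, [1.0])
print(mean, cov)                                 # mean 0.5, variance 1.5

# Monte Carlo check: least-squares slope and residual variance of X1 on X2.
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], Sigma, size=200_000)
slope, intercept = np.polyfit(X[:, 1], X[:, 0], 1)
resid_var = np.var(X[:, 0] - (intercept + slope * X[:, 1]))
print(slope, resid_var)                          # close to 0.5 and 1.5
```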
3 GAUSSIAN CONDITIONAL STRUCTURE
What can we say about collections of random variables which exhibit linear regression functions and constant conditional variances? We will, following Wesolowski (1991), call
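this property Gaussian conditional structure of the second order. Formally we will say that a random element (or indexed collection of random variables) $\mathbf{X} = \{X_\alpha : \alpha \in A\}$ exhibits Gaussian conditional structure of the second order, and write $\mathbf{X} \in GCS_2(A)$, if for any $n = 2, 3, \dots$ and any $\alpha_1, \alpha_2, \dots, \alpha_n \in A$,

(i) $E(X_{\alpha_1} \mid X_{\alpha_2}, \dots, X_{\alpha_n})$ is a linear function of $X_{\alpha_2}, \dots, X_{\alpha_n}$, and
(ii) $\operatorname{var}(X_{\alpha_1} \mid X_{\alpha_2}, \dots, X_{\alpha_n})$ is non-random.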
At times it is convenient to use the term Gaussian conditional structure of the second order to refer to the distributions or probability measures associated with the random element rather than with the random element per se; we will do this at times without explanation and without fear of confusion. To avoid trivial examples we will usually implicitly assume that the $X_\alpha$'s are linearly independent and not uncorrelated. Collections of independent random variables could otherwise provide uninteresting examples of Gaussian conditional structure of the second order. Observe that $A$ could correspond to the natural numbers, the reals or the positive reals. Consequently time series will be subsumed in the class of random elements under consideration. Spatial processes can be viewed as being random elements associated with a set $A$ that is a subset of $\mathbb{R}^k$. Any normal process or, more generally, any Gaussian random element will obviously exhibit Gaussian conditional structure of the second order. Our main focus will be, however, on random vectors of dimension $k$, i.e. on random elements where $A = \{1, 2, \dots, k\}$. If $\mathbf{X} = (X_1, \dots, X_k)$ exhibits Gaussian conditional structure of the second order we will write
XEGCS 2(k). A remark is in order about the subscript 2 that appears in our definition of Gaussian conditional structure of the second order. One could obviously ask that the random element mimic the conditional moment structure of a Gaussian element with regard to more than the first two conditional moments. One could ask for the first j conditional moments to behave as they do for Gaussian elements. The class of random elements exhibiting such behavior would be denoted by GCSj(A) instead of GCS2(A). Our focus will be on GCS2(A). Only once will we briefly mention how we might construct non-Gaussian members of GCSj(A), for j > 2. If XEGCS 2(k), it is natural to ask whether X must necessarily be Gaussian. The question is already meaningful and reasonably challenging when k == 2; i.e. in the case of bivariate distributions. Kagan, Linnik and Rao (1973) provide the following lemma indicating the nature of characteristic functions associated with GC 8 2 (2) distributions. Lemma 3.1: In order that the two-dimensional random vector (..(X", Y) satisfy (i) E(y"IX) ==
Multivariate Distributions with Conditional Structure
51
a + j3X and (ii) var(YIX) == a 2 (a constant), it is necessary and sufficient that the characteristic function of (X, Y) satisfies
(1) and
2
2
2 2 . 8 d 2 d -a 2 q;(t 1 , t 2 ) It2=O == - (a + a )<;b(t 1 , 0) + 2zaj3-d q;(t 1 , 0) + j3 -d 2 r/>(t 1 , 0) . t t t 1
2
(2)
1
If one then assumes, as do Kagan, Linnik and Rao, that (X, Y) has linear structure (i.e. satisfies condition 8 of Section 1), then one may verify that (X, Y) must indeed have a classical bivariate normal distribution. Examples of non-Gaussian characteristic functions satisfying the conditions of Lemma 3.1 are not easy to visualize; working with characteristic functions is in fact probably an inappropriate way to verify that there exist random vectors in $GCS_2$ that are not classical normal random vectors. It is probably more fruitful to seek non-Gaussian density functions that exhibit the required conditional properties (and a fortiori have characteristic functions satisfying the conditions of the Lemma). The first example of this genre was provided by Kwapien some time prior to 1985; it was first reported in Bryc and Plucinska (1985). It was in fact presented in terms of the joint characteristic function: Kwapien considers a random vector (X, Y) whose joint characteristic function is given by
$$\phi_{X,Y}(s,t) = p\cos(s+t) + (1-p)\cos(s-t), \tag{3.3}$$
where $p \in (0,1)$ and, to avoid independence, $p \neq 1/2$. It is obvious that (3.3) does not correspond to a Gaussian random vector, and it is not hard to verify that conditions (3.1) and (3.2) hold, as do the parallel conditions obtained by interchanging the roles of X and Y. Consequently (X, Y) with characteristic function (3.3) does exhibit Gaussian conditional structure of order 2, i.e. $(X, Y) \in GCS_2(2)$. Where did (3.3) come from? And why does it work? The picture is clearer if we look at the following joint discrete density of a random vector (X, Y):
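$$f_{X,Y}(x,y):\qquad \begin{array}{c|cc} y \backslash x & -1 & 1\\ \hline -1 & \tfrac{p}{2} & \tfrac{1-p}{2}\\ 1 & \tfrac{1-p}{2} & \tfrac{p}{2} \end{array} \tag{3.4}$$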
where $p \in (0,1)$ and, to avoid independence, $p \neq 1/2$. It is readily verified that this is indeed Kwapien's example (the corresponding characteristic function is given by (3.3)). But the joint distribution (3.4) has marginals with only two possible values. This gives linear regression functions by default (any function with a two-point domain is linear!). Constant conditional variances are a consequence of the fact that $p(1-p) = (1-p)p$. The elegant simplicity of the Kwapien example would suggest ready extension to higher dimensions. However, only recently (Nguyen, Rempala and Wesolowski (1994)) have any other non-Gaussian examples (other than relabeled versions of the Kwapien example) been described, in either two or more dimensions.

Indeed, there were some disturbing indications that Gaussian conditional structure of the second order might be more restrictive than one would initially imagine. Bryc and Plucinska (1985) showed that if we consider a random element with $A = \{1, 2, \dots\}$ then, under mild regularity conditions, Gaussian conditional structure of order 2, i.e. $GCS_2(\{1, 2, \dots\})$, is sufficient to guarantee that X must be a normal process. Earlier, in a series of papers (Plucinska (1983), Wesolowski (1984) and Bryc (1985)), an analogous result was obtained in the case $A = \mathbb{R}^+$.

A first step towards finding non-Gaussian random vectors of dimension greater than 2 with Gaussian conditional structure of order 2 would focus on 3-dimensional examples and, following Kwapien's lead, on simple discrete examples in which the conditional moment conditions transform into relatively simple equations in the unknown cell probabilities. Thus, for example, we might seek a 3-dimensional discrete distribution whose second order conditional structure matches that of a 3-dimensional classical normal distribution of the form
$$X \sim N^{(3)}(\mathbf{0}, \Sigma), \qquad \Sigma = \begin{pmatrix} 1 & 1/2 & 1/2\\ 1/2 & 1 & 1/2\\ 1/2 & 1/2 & 1 \end{pmatrix}. \tag{3.5}$$
For such a distribution we have $E(X_i) = 0$ and $\operatorname{var}(X_i) = 1$, $i = 1, 2, 3$. The conditional moments are given by
$$E(X_i \mid X_j, X_k) = \tfrac{1}{3}(X_j + X_k), \quad \operatorname{var}(X_i \mid X_j, X_k) = \tfrac{2}{3}, \quad E(X_i \mid X_j) = \tfrac{1}{2}X_j, \quad \operatorname{var}(X_i \mid X_j) = \tfrac{3}{4} \tag{3.6}$$
for all choices of distinct i, j, k (since the distribution is clearly symmetric). Following the lead of the Kwapien example, we would seek a 3-dimensional random vector $(Y_1, Y_2, Y_3)$ with a discrete distribution, with possible values for each $Y_i$ being $-m, -m+1, \dots, -1, 1, \dots, m$ and with probabilities $p_{ijk} = P(Y_1 = i, Y_2 = j, Y_3 = k)$. The joint distribution should be exchangeable. The marginal means and variances should be 0's and 1's. The conditional means and variances of the $Y_i$'s should agree with those in (3.6). Our task is then to solve for the unknown values $\{p_{ijk} : -m \le i \le j \le k \le m\}$ subject to the given constraints. If $m \le 3$, there are more constraints than variables. For $m = 4$, there are 120 variables ($p_{ijk}$'s), which must be non-negative and satisfy 91 linear constraints: a promising situation, although a solution is not guaranteed. Unfortunately, efforts to solve such a system of equations have not proved successful. The search for a solution continues, since a simple discrete example may shed additional light on the nature of the class of distributions with Gaussian conditional structure. However, the problem of constructing non-Gaussian multivariate distributions (of finite dimension > 2) with Gaussian conditional structure was recently resolved by Nguyen, Rempala and Wesolowski (1994). The solution is ingenious but, in retrospect, obvious. Inspired by their examples, the following simple construction is possible.
Take $f_0(\mathbf{x})$ to be the joint density of a classical k-variate normal distribution with mean vector $\boldsymbol{\mu}$ and variance-covariance matrix $\Sigma$. We now construct a k-dimensional density which has the same conditional means and variances as $f_0(\mathbf{x})$. Pick two distinct bounded densities $g_1$ and $g_2$, each supported on the interval (-1, 1), each having mean 0, and having a common variance. There is, of course, a plethora of such densities. Now consider the new k-dimensional density defined by
$$f^*(\mathbf{x}) = f_0(\mathbf{x}) + C\prod_{i=1}^{k}\bigl[g_1(x_i) - g_2(x_i)\bigr], \tag{3.7}$$
where C is chosen small enough to guarantee that the expression in (3.7) is positive everywhere (possible since the $g_i$'s are bounded densities and $f_0$ is bounded away from zero on $[-1,1]^k$). Obviously $f^*(\mathbf{x})$ is non-Gaussian, but all of its marginals of dimension less than k are Gaussian, and it is readily verified that all of its first and second conditional moments match those of $f_0(\mathbf{x})$: since $\int (g_1 - g_2)(u)\,u^r\,du = 0$ for $r = 0, 1, 2$, integrating the product term against 1, $x_i$ or $x_i^2$ in any single coordinate annihilates it. The density $f^*(\mathbf{x})$ thus belongs to $GCS_2(k)$; in fact $GCS_2(k)$ is not just non-empty but contains an enormous variety of distributions constructed in a fashion analogous to that used to define (3.7). It is indeed possible, by putting additional higher moment conditions on the $g_i$'s (used in the construction of $f^*$), to find k-variate non-Gaussian distributions whose conditional moments up to the m-th order ($m > 2$) match those of a classical normal k-variate distribution.
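A minimal numerical sketch of (3.7) for k = 2, under assumptions of our own (not the authors'): $g_1$ is the uniform density on (-1, 1) and $g_2 = (1 + P_4)/2$ on (-1, 1), where $P_4$ is the Legendre polynomial of degree 4. Both have mean 0 and variance 1/3, and $g_1 - g_2 = -P_4/2$ on (-1, 1). The code checks that $E(X_2 \mid X_1 = x_1)$ and $\operatorname{var}(X_2 \mid X_1 = x_1)$ under $f^*$ agree with the Gaussian values $r x_1$ and $1 - r^2$, and that $f^*$ stays positive yet differs from $f_0$. The helper names and the values r = 0.5, C = 0.05 are illustrative only.

```python
import math

def p4(x):
    # Legendre polynomial P_4: orthogonal to 1, x, x^2 and x^3 on [-1, 1]
    return (35 * x**4 - 30 * x**2 + 3) / 8

def g_diff(x):
    # g1 - g2 for the assumed pair g1 = 1/2 and g2 = (1 + P_4)/2 on (-1, 1);
    # both have mean 0 and variance 1/3, and g2 >= 2/7 > 0 there
    return -0.5 * p4(x) if -1 < x < 1 else 0.0

def f0(x1, x2, r):
    # standard bivariate normal density with correlation r
    q = (x1 * x1 - 2 * r * x1 * x2 + x2 * x2) / (1 - r * r)
    return math.exp(-q / 2) / (2 * math.pi * math.sqrt(1 - r * r))

def f_star(x1, x2, r, c):
    # the perturbed density (3.7) with k = 2
    return f0(x1, x2, r) + c * g_diff(x1) * g_diff(x2)

def simpson(f, a, b, n=2000):
    # composite Simpson rule with an even number n of panels
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return s * h / 3

def conditional_moments(x1, r, c):
    """E(X2 | X1 = x1) and var(X2 | X1 = x1) under f_star."""
    moments = []
    for j in range(3):
        gauss = simpson(lambda y: y**j * f0(x1, y, r), -12.0, 12.0)
        # the perturbation lives on (-1, 1), where g1 - g2 = -P_4 / 2
        pert = c * g_diff(x1) * simpson(lambda y: y**j * (-0.5 * p4(y)), -1.0, 1.0)
        moments.append(gauss + pert)
    mean = moments[1] / moments[0]
    return mean, moments[2] / moments[0] - mean**2
```

The perturbation drops out because $P_4$ is orthogonal to 1, y and $y^2$ on [-1, 1], which is exactly the moment condition the construction relies on; any other admissible pair $g_1, g_2$ would behave the same way.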
4
THE STRUCTURE OF THE CLASS GCS2 (k)
From the discussion in Section 3, we know that the class $GCS_2(k)$ is quite extensive. Our goal in the present section is to identify characteristic properties of the class, and to identify conditions sufficient to guarantee that a member of the class is indeed a classical Gaussian distribution. For notational simplicity, some of the discussion is restricted to the bivariate case (i.e. $k = 2$). Suppose that $X \in GCS_2(k)$. Obviously any vector of the form $Y = (c_1X_1 + b_1, c_2X_2 + b_2, \dots, c_kX_k + b_k)$ with $c_1, \dots, c_k > 0$ and $\mathbf{b} \in \mathbb{R}^k$ will again belong to $GCS_2(k)$. Consequently there is no loss of generality in focusing on standardized members of $GCS_2(k)$. These are random vectors $Z \in GCS_2(k)$ with the property that $E(Z_i) = 0$ and $\operatorname{var}(Z_i) = 1$, $i = 1, 2, \dots, k$. Throughout this section we adopt the convention that the notation X refers to a general member of the class $GCS_2(k)$, while the notation Z refers to a standardized random vector in $GCS_2(k)$. Thus we are concerned with random vectors X such that, with $Z_i = (X_i - E(X_i))/\sqrt{\operatorname{var} X_i}$, Z satisfies: for any $i, j_1, \dots, j_\ell$ ($\ell \le k-1$, with i distinct from the $j_m$'s),
$$\text{(i)}\quad E(Z_i \mid Z_{j_1}, \dots, Z_{j_\ell}) = \sum_{m=1}^{\ell}\beta_{j,i,m}\,Z_{j_m} \tag{4.1}$$
and
$$\text{(ii)}\quad \operatorname{var}(Z_i \mid Z_{j_1}, \dots, Z_{j_\ell}) = \sigma^2_{j,i} \tag{4.2}$$
for constants $\beta_{j,i,m} \in \mathbb{R}$ and $\sigma^2_{j,i} \in \mathbb{R}^+$, where $j = (j_1, \dots, j_\ell)$.
A random vector Z satisfying (4.1) and (4.2) will have a corresponding variance-covariance matrix $\Sigma = R$ (with unit entries on the diagonal). To avoid trivial cases we assume that R is not a diagonal matrix. Clearly, quite complicated inter-relationships must hold among the coefficients appearing in (4.1) and (4.2), since they must all be consistent with a single correlation matrix R. Of course, for a given R there are many $GCS_2(k)$ distributions. It is convenient to introduce the notation $GCS_2(k, \Sigma)$ for the class of all random vectors X with Gaussian conditional structure of the second order and a given k-dimensional variance-covariance matrix $\Sigma$. Analogously, if we write $Z \in GCS_2(k, R)$ we mean that Z is a standardized vector with Gaussian conditional structure of the second order and correlation matrix R. If (4.1) and (4.2) hold, the joint characteristic function of Z is severely constrained: conditions analogous to (3.1) and (3.2) must hold for various first and second partial derivatives of the joint characteristic function. In the bivariate case, $Z \in GCS_2(2)$ iff
$$\text{(i)}\quad E(Z_1 \mid Z_2) = \rho Z_2 \quad\text{and}\quad E(Z_2 \mid Z_1) = \rho Z_1, \tag{4.3}$$
$$\text{(ii)}\quad \operatorname{var}(Z_1 \mid Z_2) = \operatorname{var}(Z_2 \mid Z_1) = 1 - \rho^2, \tag{4.4}$$
where $\rho = \operatorname{cov}(Z_1, Z_2) \in (-1, 1)$. Conditions (3.1) and (3.2) may be rewritten for such standardized variables as follows.

Lemma 4.1: $Z \in GCS_2(2)$ iff for some $\rho \in (-1,1)$ its joint characteristic function $\phi(t_1, t_2)$ satisfies
$$\frac{\partial}{\partial t_1}\phi(t_1,t_2)\Big|_{t_1=0} = \rho\,\frac{d}{dt_2}\phi(0,t_2), \tag{4.5}$$
$$\frac{\partial}{\partial t_2}\phi(t_1,t_2)\Big|_{t_2=0} = \rho\,\frac{d}{dt_1}\phi(t_1,0), \tag{4.6}$$
$$\frac{\partial^2}{\partial t_1^2}\phi(t_1,t_2)\Big|_{t_1=0} = (\rho^2-1)\,\phi(0,t_2) + \rho^2\,\frac{d^2}{dt_2^2}\phi(0,t_2) \tag{4.7}$$
and
$$\frac{\partial^2}{\partial t_2^2}\phi(t_1,t_2)\Big|_{t_2=0} = (\rho^2-1)\,\phi(t_1,0) + \rho^2\,\frac{d^2}{dt_1^2}\phi(t_1,0). \tag{4.8}$$
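A quick numerical sanity check of (4.5) to (4.8), using central finite differences, for two characteristic functions: the Kwapien function (3.3) with $\rho = 2p - 1$, and the standard bivariate normal with correlation $\rho$. Both laws are symmetric, so their characteristic functions are real-valued and plain floats suffice. The step size and the function names are our own choices for this sketch; the last check shows that a wrong $\rho$ produces a large residual, so the conditions are not vacuous.

```python
import math

def kwapien_cf(p):
    # characteristic function (3.3) of the Kwapien vector
    return lambda s, t: p * math.cos(s + t) + (1 - p) * math.cos(s - t)

def normal_cf(rho):
    # characteristic function of the standard bivariate normal with correlation rho
    return lambda s, t: math.exp(-(s * s + 2 * rho * s * t + t * t) / 2)

H = 1e-3  # finite-difference step (an arbitrary choice for this sketch)

def d1(f, x):
    # central first difference
    return (f(x + H) - f(x - H)) / (2 * H)

def d2(f, x):
    # central second difference
    return (f(x + H) - 2 * f(x) + f(x - H)) / (H * H)

def lemma_4_1_residual(phi, rho, t):
    """Largest absolute violation of (4.5)-(4.8) at the point t."""
    r45 = d1(lambda u: phi(u, t), 0.0) - rho * d1(lambda u: phi(0.0, u), t)
    r46 = d1(lambda u: phi(t, u), 0.0) - rho * d1(lambda u: phi(u, 0.0), t)
    r47 = d2(lambda u: phi(u, t), 0.0) - (
        (rho**2 - 1) * phi(0.0, t) + rho**2 * d2(lambda u: phi(0.0, u), t))
    r48 = d2(lambda u: phi(t, u), 0.0) - (
        (rho**2 - 1) * phi(t, 0.0) + rho**2 * d2(lambda u: phi(u, 0.0), t))
    return max(abs(r45), abs(r46), abs(r47), abs(r48))
```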
It is not hard to verify that a classical bivariate normal random vector with unit variances and correlation coefficient $\rho$ has a joint characteristic function which satisfies (4.5) to (4.8). Similarly, the joint characteristic function (3.3) of the Kwapien distribution clearly satisfies (4.5) to (4.8) with $\rho = 2p - 1$. The class $GCS_2(k)$ contains Gaussian distributions, non-Gaussian densities as in (3.7) and, when $k = 2$, even discrete distributions. The common features of all its members can be expressed in terms of properties of conditional moments or of derivatives of the joint characteristic function. The class is, however, diverse. Nevertheless, some closure properties are available for the class $GCS_2(k)$. For example, each subclass $GCS_2(k, \Sigma)$, for fixed $\Sigma$, is closed under mixtures.

Theorem 4.2: Suppose $\{X_\alpha : \alpha \in A\}$ is an indexed collection of random vectors with $X_\alpha \in GCS_2(k, \Sigma)$ for every $\alpha$. If we define Z to be a random vector with distribution function
$$F_Z(\mathbf{x}) = \int_A F_{X_\alpha}(\mathbf{x})\,dH(\alpha)$$
for any probability distribution H on A, then $Z \in GCS_2(k, \Sigma)$.
Proof: The bivariate case ($k = 2$) with A of cardinality 2 was reported by Bryc (1985). The general result is straightforward if we write the joint characteristic function as a mixture
$$\phi_Z(\mathbf{t}) = \int_A \phi_{X_\alpha}(\mathbf{t})\,dH(\alpha)$$
and observe that the conditions (4.5) to (4.8) (and their k-dimensional analogs), being linear in $\phi$, are preserved by mixtures, since the covariance structures (and hence the coefficients in (4.5) to (4.8)) are the same for every $\alpha$.

Linear combinations of independent random vectors in $GCS_2(k, \Sigma)$ yield random vectors in $GCS_2(k)$, but with a different covariance matrix. Specifically, we have

Theorem 4.3: Suppose that $X^{(1)}$ and $X^{(2)}$ are independent (not necessarily identically distributed) members of $GCS_2(k, \Sigma)$. Then for $(a, b) \neq (0, 0)$, $aX^{(1)} + bX^{(2)} \in GCS_2(k, (a^2+b^2)\Sigma)$. In particular, if $a^2 + b^2 = 1$, then $aX^{(1)} + bX^{(2)} \in GCS_2(k, \Sigma)$.

Proof: We provide a proof in the bivariate case; more extensive equations analogous to (4.5) to (4.8) must be verified in higher-dimensional cases. For simplicity and without loss of generality we assume that $X^{(1)}$ and $X^{(2)}$ have been standardized, and we denote them by $Z^{(1)}$ and $Z^{(2)}$. By assumption $Z^{(1)}$ and $Z^{(2)}$ have a common correlation coefficient $\rho$, and their joint characteristic functions satisfy (4.5) to (4.8). Denote the joint characteristic functions of $Z^{(1)}$, $Z^{(2)}$ and $aZ^{(1)} + bZ^{(2)}$ by $\phi_1$, $\phi_2$ and $\phi_3$. Because $Z^{(1)}$ and $Z^{(2)}$ are independent we have
$$\phi_3(t_1, t_2) = \phi_1(at_1, at_2)\,\phi_2(bt_1, bt_2).$$
Consequently, using (4.5) for $\phi_1$ and $\phi_2$ (and writing $\phi_j'(0,u) = \frac{d}{du}\phi_j(0,u)$),
$$\frac{\partial}{\partial t_1}\phi_3(t_1,t_2)\Big|_{t_1=0} = a\rho\,\phi_1'(0,at_2)\,\phi_2(0,bt_2) + b\rho\,\phi_1(0,at_2)\,\phi_2'(0,bt_2) = \rho\,\frac{d}{dt_2}\phi_3(0,t_2).$$
Thus (4.5) holds for $\phi_3$. Similarly (4.6) may be verified. Differentiating twice and using (4.7) for $\phi_1$ and $\phi_2$ we find
$$\frac{\partial^2}{\partial t_1^2}\phi_3(t_1,t_2)\Big|_{t_1=0} = (a^2+b^2)(\rho^2-1)\,\phi_3(0,t_2) + \rho^2\,\frac{d^2}{dt_2^2}\phi_3(0,t_2).$$
When $a^2 + b^2 = 1$ this implies that (4.7) continues to hold for $\phi_3$. In parallel fashion, since (4.8) holds for $\phi_1$ and $\phi_2$, it continues to hold for $\phi_3$ when $a^2 + b^2 = 1$. Since conditions (4.5) to (4.8) are sufficient for membership in $GCS_2(2)$, the conclusion of the theorem follows.

Naturally we can extend Theorem 4.3 to sums of more than two independent members of $GCS_2(k, \Sigma)$. Indeed we can consider infinite convolutions, since the class $GCS_2(k, \Sigma)$ is clearly closed under weak convergence (i.e. if $X^{(n)} \in GCS_2(k, \Sigma)$, $n = 1, 2, \dots$, and $X^{(n)} \Rightarrow X^{(\infty)}$, then $X^{(\infty)} \in GCS_2(k, \Sigma)$). Thus we may state

Theorem 4.4: Suppose $X^{(1)}, X^{(2)}, \dots$ are independent random vectors, each a member of $GCS_2(k, \Sigma)$ (the same $\Sigma$ for every $X^{(i)}$). Define $Y = \sum_{i=1}^{\infty} a_i X^{(i)}$, where $\sum_{i=1}^{\infty} a_i^2 = 1$. It follows that $Y \in GCS_2(k, \Sigma)$.
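A small exact check of Theorem 4.3 in the discrete setting, using rational arithmetic. Starting from the Kwapien pmf (3.4) with p = 4/5 (so $\rho = 3/5$), the sketch forms $aX^{(1)} + bX^{(2)}$ for independent copies and verifies at every atom that $E(Y_2 \mid Y_1) = \rho Y_1$ and $\operatorname{var}(Y_2 \mid Y_1) = (a^2 + b^2)(1 - \rho^2)$. The coefficient pairs are illustrative choices; (3/5, 4/5) satisfies $a^2 + b^2 = 1$ and yields a four-point marginal, i.e. another discrete member of $GCS_2(2)$.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def kwapien_pmf(p):
    # joint pmf (3.4): mass p/2 on the diagonal x == y, (1 - p)/2 off it
    return {(x, y): (p if x == y else 1 - p) / 2 for x, y in product((-1, 1), repeat=2)}

def combine(pmf1, pmf2, a, b):
    """Joint pmf of a*X + b*X' for independent bivariate X ~ pmf1 and X' ~ pmf2."""
    out = defaultdict(Fraction)
    for (x1, x2), q1 in pmf1.items():
        for (y1, y2), q2 in pmf2.items():
            out[(a * x1 + b * y1, a * x2 + b * y2)] += q1 * q2
    return dict(out)

def conditional_moments(pmf):
    """Map each value u of the first coordinate to (E(V | U = u), var(V | U = u))."""
    rows = defaultdict(list)
    for (u, v), q in pmf.items():
        rows[u].append((v, q))
    result = {}
    for u, cells in rows.items():
        mass = sum(q for _, q in cells)
        mean = sum(v * q for v, q in cells) / mass
        result[u] = (mean, sum(v * v * q for v, q in cells) / mass - mean * mean)
    return result
```

Because everything is a `Fraction`, the checks are equalities rather than tolerances, which makes the "linear regression, constant variance" structure visible exactly.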
Example (Uniform and Cantor marginals): Suppose that $X^{(i)}$, $i = 1, 2, \dots$ are i.i.d. Kwapien random vectors (with characteristic function (3.3) and joint density (3.4)). Consider a random vector $Y = \sum_{i=1}^{\infty} a_i X^{(i)}$, where $\sum_{i=1}^{\infty} a_i^2 < \infty$. Since each $X^{(i)} \in GCS_2(2)$ with correlation $2p - 1$, it follows that $Y \in GCS_2(2)$ with the same correlation, $2p - 1$. Particular choices of the $a_i$'s yield interesting examples. If we choose $a_i = 1/2^i$, $i = 1, 2, \dots$, then Y will have a continuous bivariate distribution with uniform (-1, 1) marginals (and Gaussian conditional structure). We conjecture, but are unable to prove, that this joint distribution is singular (unless $p = 1/2$, the uninteresting case of independent marginals). If we choose $a_i = 2/3^i$, $i = 1, 2, \dots$, then Y will have a singular joint distribution with Cantor (and thus clearly singular) marginals. Thus we have a singular continuous example with Gaussian conditional structure. It is well known that sums of independent Cantor-like singular random variables can have nonsingular (indeed uniform) distributions. Our present construction (using $a_i = 2/3^i$) allows us to give an example of dependent Cantor-like random variables whose sum is uniform. To do this, consider the special case $Y = \sum_{i=1}^{\infty} (2/3^i)\,X^{(i)}$, where the $X^{(i)}$'s are Kwapien random vectors with $p = 2/3$. Here $Y_1$ and $Y_2$ are singular (Cantor) distributed on (-1, 1), but $(Y_1 + Y_2)/2$ is uniform on (-1, 1) (as is easily proved by looking at the convergent infinite product representation of its characteristic function, obtained using the expression (3.3) for the Kwapien characteristic function).
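The infinite product argument can be checked numerically. For $(Y_1 + Y_2)/2$ with $a_i = 2/3^i$ and $p = 2/3$, (3.3) gives the factors $p\cos(2s/3^i) + 1 - p = (1 + 2\cos(2s/3^i))/3 = \sin(s/3^{i-1})/(3\sin(s/3^i))$, which telescope to $\sin s / s$, the characteristic function of the uniform law on (-1, 1). The sketch truncates the products (harmless here, since the neglected factors equal 1 to rounding accuracy) and also confirms that $a_i = 1/2^i$ gives uniform marginals while $a_i = 2/3^i$ does not. Function names are our own.

```python
import math

def kwapien_cf(s, t, p):
    # characteristic function (3.3)
    return p * math.cos(s + t) + (1 - p) * math.cos(s - t)

def series_cf(s, t, p, coeffs):
    """cf of Y = sum_i a_i X^(i) for i.i.d. Kwapien X^(i), truncated to the given a_i."""
    value = 1.0
    for a in coeffs:
        value *= kwapien_cf(a * s, a * t, p)
    return value

def uniform_cf(s):
    # characteristic function of the uniform law on (-1, 1)
    return 1.0 if s == 0 else math.sin(s) / s

HALVES = [0.5 ** i for i in range(1, 60)]    # a_i = 1/2^i
THIRDS = [2 / 3 ** i for i in range(1, 40)]  # a_i = 2/3^i

# marginal of Y_1 is series_cf(s, 0, ...); (Y_1 + Y_2)/2 is series_cf(s/2, s/2, ...)
```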
5
FROM GCS2 (k) TO CLASSICAL NORMAL
The examples of Section 4 clearly indicate that additional conditions, besides appropriate behavior of the conditional moments up to order 2, will be required to characterize the classical normal model. In this section we survey some known and some new results in this area. First, a result due to Szablowski (1989).

Theorem 5.1: If $X \in GCS_2(k)$ and X is elliptically contoured, then $X \sim N^{(k)}(\boldsymbol{\mu}, \Sigma)$.
Next, we consider the generalized independence models described by Kagan (1988), the so-called Kagan classes.

Definition 5.2: A k-dimensional random vector X belongs to the Kagan class $D_{k,j}(\mathrm{loc})$, $j = 1, 2, \dots, k$, $k = 1, 2, \dots$, if its characteristic function $\phi_X$, in some neighborhood V of the origin in $\mathbb{R}^k$, has the form
$$\phi_X(\mathbf{t}) = \prod_{1 \le i_1 < \dots < i_j \le k} R_{i_1,\dots,i_j}(t_{i_1}, t_{i_2}, \dots, t_{i_j}), \qquad \mathbf{t} \in V,$$
where each $R_{i_1,\dots,i_j}$ is a continuous complex function with $R_{i_1,\dots,i_j}(\mathbf{0}) = 1$.

It is plausible that any X in a Kagan class $D_{k,j}(\mathrm{loc})$ that exhibits Gaussian conditional structure of the second order might be classical k-variate normal. Some progress towards proving this is provided by the following result due to Wesolowski (1991).

Theorem 5.3: If $X \in GCS_2(k)$ for some $k > 2$ and $X \in D_{k,2}(\mathrm{loc})$, then $X \sim N^{(k)}(\boldsymbol{\mu}, \Sigma)$.
If we assume that X, in addition to having Gaussian conditional structure of the second order, is infinitely divisible, then it must be classical normal. This result is due to Wesolowski (1993). We refer the reader to the original paper for a general proof; here we provide a simple, illuminating proof for the bivariate case only.

Theorem 5.4: If $X \in GCS_2(k)$ and X is infinitely divisible, then $X \sim N^{(k)}(\boldsymbol{\mu}, \Sigma)$.

Proof: (In the bivariate case, $k = 2$; the case $k > 2$ was proved by another approach in Wesolowski (1993).) As usual, without loss of generality we assume zero means and unit variances. Since $X = (X_1, X_2)$ is infinitely divisible, the logarithm of its joint characteristic function, $\psi(\mathbf{t}) = \log\phi(\mathbf{t})$, admits a Levy-Khintchine type integral representation with respect to some measure K. It then follows that
$$\frac{\partial^2}{\partial t_1^2}\psi(\mathbf{t})\Big|_{t_1=0} = -\int_{\mathbb{R}^2} e^{it_2y}\,\frac{x^2}{x^2+y^2}\,dK(x,y)$$
and
$$\frac{d^2}{dt_2^2}\psi(0,t_2) = -\int_{\mathbb{R}^2} e^{it_2y}\,\frac{y^2}{x^2+y^2}\,dK(x,y).$$
However, since $X \in GCS_2(2)$, we know that (4.8) holds. Consequently we have
$$\int_{\mathbb{R}^2} \frac{x^2}{x^2+y^2}\,dK(x,y) = \rho^2\int_{\mathbb{R}^2} \frac{y^2}{x^2+y^2}\,dK(x,y) \tag{5.1}$$
(where $\rho^2 < 1$). Analogously, by considering $\frac{\partial^2}{\partial t_2^2}\psi(\mathbf{t})\big|_{t_2=0}$ and $\frac{d^2}{dt_1^2}\psi(t_1,0)$, we find that (5.1) again holds with the roles of x and y interchanged. Summing, we conclude that for $\rho^2 < 1$,
$$\int_{\mathbb{R}^2} dK(x,y) = \rho^2\int_{\mathbb{R}^2} dK(x,y),$$
i.e. $dK \equiv 0$. Consequently X must be classical bivariate normal.

6
REMARKS
Progress towards understanding the class of distributions with Gaussian conditional structure is accelerating, but many interesting questions remain open. Perhaps the most frustrating lacuna in the current inventory of examples is the absence of any discrete example with Gaussian conditional structure of dimension greater than 2 (as discussed in Section 3). Theorem 4.3, together with the celebrated Kwapien example, permits the construction of a plethora of two-dimensional discrete distributions with Gaussian conditional structure. The elusive 3-dimensional examples should appear soon.

REFERENCES
1. Arnold, B.C., Castillo, E. and Sarabia, J.M., Conditionally Specified Distributions, Lecture Notes in Statistics, Vol. 73, Springer-Verlag, Berlin, (1992).
2. Arnold, B.C., Castillo, E. and Sarabia, J.M., A conditional characterization of the multivariate normal distribution, Statistics and Probability Letters, 19, 313-315, (1994).
3. Bhattacharya, A., On some sets of sufficient conditions leading to the normal bivariate distribution, Sankhya, 6, 399-406, (1943).
4. Bryc, W., Some remarks on random vectors with nice enough behavior of conditional moments, Bull. Polish Acad. Sci. Math., 33, 677-684, (1985).
5. Bryc, W. and Plucinska, A., A characterization of infinite Gaussian sequences by conditional moments, Sankhya, A47, 166-173, (1985).
6. Kagan, A.M., New classes of dependent random variables and a generalization of the Darmois-Skitovich to several forms, Theory of Probability and Applications, 33, 286-295, (1988).
7. Kagan, A.M., Linnik, J.V. and Rao, C.R., Characterization Problems of Mathematical Statistics, Wiley, New York, (1973).
8. Nguyen, T.T., Rempala, G. and Wesolowski, J., Non-Gaussian measures with Gaussian structure, to appear in Probability and Mathematical Statistics, (1994).
9. Plucinska, A., On a stochastic process determined by the conditional expectation and the conditional variance, Stochastics, 10, 115-129, (1983).
10. Szablowski, P.J., Can the first two conditional moments identify a mean square differentiable process?, Comput. Math. Appl., 18, 329-348, (1989).
11. Wesolowski, J., A characterization of the Gaussian process based on properties of conditional moments, Demonstratio Math., 18, 795-808, (1984).
12. Wesolowski, J., Gaussian conditional structure of the second order and the Kagan classification of multivariate distributions, Journal of Multivariate Analysis, 39, 79-86, (1991).
13. Wesolowski, J., Multivariate infinitely divisible distributions with the Gaussian conditional structure of the second order. In Stability Problems for Stochastic Models (Kalashnikov, V.V. and Zolotarev, V.M. eds). Lecture Notes in Mathematics, Vol. 1546, 180-183, Springer-Verlag, Berlin, (1993).
The Minimal Projection from $L_1$ onto $\pi_n$
BRUCE L. CHALMERS
Department of Mathematics, University of California, Riverside, CA 92521

FREDERIC T. METCALF
Department of Mathematics, University of California, Riverside, CA 92521
ABSTRACT Simple equations are presented which are shown to be necessary and sufficient for the (unique) projection from $L_1[a,b]$ onto a finite-dimensional Haar subspace to be minimal. In particular, these equations provide the minimal $L_1[-1,1]$ projection onto the algebraic polynomials of degree n, for which numerical solutions are given for $n \le 5$.
62
Chalmers and Metcalf
1. Introduction and Preliminaries
In [1] are derived sufficient and necessary (assuming the subspace is "smooth") equations for finite-rank $L_1$ projections to be minimal (Theorem A below). (The existence of a minimal projection in this setting is proved in [5].) As an application of these conditions, [1] obtains a sufficient condition, labeled "Prescription," for determining a minimal projection from $L_1[-1,1]$ onto a finite-dimensional subspace V. In this paper we show that the Prescription is in fact also necessary whenever V is a Haar space; in addition, the projection (identity) action on V is generalized to an arbitrary non-singular action on V, and $L_1[-1,1]$ is generalized to $L_1([-1,1],\nu)$, where $\nu$ is an arbitrary finite nonatomic Borel measure on [-1, 1]. (Recall that V is an n-dimensional Haar space on [a, b] means that the elements of V are continuous functions on [a, b] and any nonzero $v \in V$ has no more than $n - 1$ zeros in [a, b], or, equivalently, that any n distinct point-evaluation functionals (supported in [a, b]) are independent on V.) The proof is based on an application of the classical Hobby-Rice theorem, a theorem of fundamental importance in the theory of best approximation in the $L_1$-norm (cf. [2]). As applications, we use the Prescription to find numerical solutions for the minimal projections from $L_1[-1,1]$ onto $V = \pi_{n-1}$, the space of algebraic polynomials of degree $n - 1$, for $n - 1 \le 5$.

More generally, for $(T, \Sigma, \nu)$ a complete measure space, let $P = \sum u_i \otimes v_i$ be a linear operator from $L_1(T,\nu)$ onto V with $u_i \in L_\infty$ and $v_i \in V$, satisfying
$$\int_T v_i(t)\,u_j(t)\,d\nu(t) = a_{ij} \qquad (i, j = 1, \dots, n),$$
with the matrix $A = (a_{ij})$ fixed, but non-zero. Additional notation which will be used in the following is $\mathbf{u} = (u_1, \dots, u_n)$ and $\mathbf{V} = (V_1, \dots, V_n)$, where
$$V_i(t) = \int_T v_i(s)\,\operatorname{sgn} k(t,s)\,d\nu(s), \qquad k(t,s) = \sum_{j=1}^{n} u_j(t)\,v_j(s).$$
Also, the Lebesgue function L(t) for P is given by
$$L(t) = \int_T |k(t,s)|\,d\nu(s) = \mathbf{u}(t)\cdot\mathbf{V}(t), \qquad t \in T;$$
note that $\|P\| = \operatorname{ess\,sup} L(t)$; see [3, Lemma 2]. Throughout this paper the notation $t \in T' \subseteq T$ will mean for almost all t in T' relative to the measure $\nu$. In the following, all statements and results refer to the operator P (and the associated "action" matrix A). Note that if P is a projection then A is the identity matrix.
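To make the definitions of $\mathbf{V}(t)$ and $L(t)$ concrete, here is a hypothetical example that is not taken from the paper: the orthogonal (Legendre) projection of $L_1[-1,1]$ onto $\pi_1$, with $\mathbf{v}(s) = (1, s)$ and $\mathbf{u}(t) = (1/2, 3t/2)$, so that $A = I$. The sketch evaluates $V_i(t)$ and $L(t)$ with the midpoint rule, checks the identity $L(t) = \mathbf{u}(t)\cdot\mathbf{V}(t)$, and compares with the closed form $L(t) = 1$ for $|t| \le 1/3$ and $L(t) = 3|t|/2 + 1/(6|t|)$ otherwise, which we derived for this example only. Its norm is $\|P\| = L(1) = 5/3$, larger than the minimal value 1.22040 reported in Application 1 below.

```python
N = 20000  # midpoint-rule panels on [-1, 1] (an arbitrary choice)

def u(t):
    # hypothetical example: orthogonal projection onto pi_1, int v_i u_j = delta_ij
    return (0.5, 1.5 * t)

def v(s):
    return (1.0, s)

def k(t, s):
    # kernel k(t, s) = sum_j u_j(t) v_j(s)
    return sum(ui * vi for ui, vi in zip(u(t), v(s)))

def midpoint(f):
    h = 2.0 / N
    return h * sum(f(-1.0 + (i + 0.5) * h) for i in range(N))

def sgn(z):
    return (z > 0) - (z < 0)

def V(t):
    # V_i(t) = int v_i(s) sgn k(t, s) ds
    return tuple(midpoint(lambda s: v(s)[i] * sgn(k(t, s))) for i in range(2))

def lebesgue(t):
    # L(t) = int |k(t, s)| ds
    return midpoint(lambda s: abs(k(t, s)))

def lebesgue_closed_form(t):
    # for |t| <= 1/3 the kernel keeps one sign; otherwise it changes sign at s = -1/(3t)
    a = abs(t)
    return 1.0 if a <= 1 / 3 else 1.5 * a + 1 / (6 * a)
```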
Minimal Projection from $L_1$ onto $\pi_n$
63
The following theorem, proved in [1], provides necessary and sufficient (equality) conditions for the operator P to be minimal.

Theorem A ([1]). Let $(T, \Sigma, \nu)$ be a complete measure space for which $\nu$ is strictly localizable. Let V be a finite-dimensional subspace of $L_1$, and let $P = \mathbf{u} \otimes \mathbf{v} = \sum_{i=1}^{n} u_i \otimes v_i$ be an operator mapping $L_1$ into V with $u_i \in L_\infty$ ($i = 1, \dots, n$), $\mathbf{v} = (v_1, \dots, v_n)$ a fixed basis for V, and the matrix $A = \bigl(\int_T v_i(t)\,u_j(t)\,d\nu(t)\bigr) = (a_{ij})$ ($i, j = 1, \dots, n$) fixed. In order that P be minimal, the following equality conditions are necessary and sufficient: there exists a non-zero $n \times n$ matrix M such that
(a) the Lebesgue function $L(t) = \|P\|$ on $T' = \operatorname{supp}(M\mathbf{v})$, and
(b) there exists a positive function $\phi$ such that
$$\phi(t)\,\mathbf{V}(t) = M\mathbf{v}(t), \qquad t \in T'. \tag{1}$$
(In fact, $\phi = \mathbf{u}\cdot M\mathbf{v}/\|P\|$.)

Notation. We denote by $P_{\min}$ a minimal operator given by Theorem A. Recall that, in an $L_1$ space, the subspace V is said to be smooth if and only if each member of $V \setminus \{0\}$ is almost everywhere different from 0.

Corollary A ([1]). If V is smooth, then the Lebesgue function for $P_{\min}$ in Theorem A is constant on T.

Theorem B ([1]). $P_{\min}$ is unique if V is smooth and $\mathbf{u}(t)\cdot\mathbf{v}$ is determined up to a scalar factor by its roots.
2. Main Theorem
Lemma 1 (Hobby-Rice Theorem [4]). Let V be an n-dimensional subspace of $L_1([-1,1],\nu)$, where $\nu$ is a finite nonatomic Borel measure. Then there exist points $-1 = t_0 \le t_1 \le \dots \le t_{n+1} = 1$ such that
$$\sum_{j=1}^{n+1} (-1)^j \int_{t_{j-1}}^{t_j} v\,d\nu = 0, \qquad v \in V. \tag{2}$$
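As an illustration of Lemma 1 (not part of the paper's argument): for $V = \pi_{n-1}$ with Lebesgue measure on [-1, 1], a classical choice of Hobby-Rice points is $t_j = -\cos(j\pi/(n+1))$, $j = 1, \dots, n$, the zeros of the Chebyshev polynomial of the second kind $U_n$; we assume this fact rather than derive it here. The sketch evaluates the alternating sum in (2) exactly for $v(s) = s^k$ and checks that it vanishes for $k < n$ (for instance $t_1, t_2 = -1/2, 1/2$ when $n = 2$) but not for $k = n$.

```python
import math

def hobby_rice_points(n):
    """-1 = t_0 < t_1 < ... < t_n < t_{n+1} = 1 with t_j = -cos(j*pi/(n+1))."""
    interior = [-math.cos(j * math.pi / (n + 1)) for j in range(1, n + 1)]
    return [-1.0] + interior + [1.0]

def alternating_moment(points, k):
    """sum_{j=1}^{n+1} (-1)^j * int_{t_{j-1}}^{t_j} s^k ds, evaluated in closed form."""
    total = 0.0
    for j in range(1, len(points)):
        lo, hi = points[j - 1], points[j]
        total += (-1) ** j * (hi ** (k + 1) - lo ** (k + 1)) / (k + 1)
    return total
```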
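Theorem 1 (Prescription). Let V be an n-dimensional Haar subspace of $L_1([-1,1],\nu)$, where $\nu$ is a finite nonatomic Borel measure. Then $P_{\min} = \mathbf{u} \otimes \mathbf{v} = \sum_{i=1}^{n} u_i \otimes v_i$ from $L_1([-1,1],\nu)$ into V with respect to the fixed action A is given uniquely by the following prescription (with $\mathbf{u}(t)$ written as a row vector):
$$\mathbf{u}(t) = \begin{pmatrix}\lambda a(t) & 0 & \cdots & 0\end{pmatrix}\begin{pmatrix} V_1(x(t)) & v_1(x_1(t)) & \cdots & v_1(x_{n-1}(t))\\ V_2(x(t)) & v_2(x_1(t)) & \cdots & v_2(x_{n-1}(t))\\ \vdots & \vdots & & \vdots\\ V_n(x(t)) & v_n(x_1(t)) & \cdots & v_n(x_{n-1}(t)) \end{pmatrix}^{-1}, \tag{3}$$
where
$$V_i(x) := \Bigl(\int_{-1}^{x_1} - \int_{x_1}^{x_2} + \dots + (-1)^{n-1}\int_{x_{n-1}}^{1}\Bigr) v_i(s)\,d\nu(s), \qquad i = 1, \dots, n, \tag{4}$$
and $x(t) = (x_1(t), \dots, x_{n-1}(t))$ are solutions to
$$\frac{V_1(x(t))}{m_1(t)} = \frac{V_2(x(t))}{m_2(t)} = \dots = \frac{V_n(x(t))}{m_n(t)}, \tag{5}$$
where $m_i(t) := \mathbf{m}_i\cdot\mathbf{v}(t)$, $i = 1, \dots, n$, for some $n \times n$ matrix
$$M := \begin{pmatrix}\mathbf{m}_1\\ \vdots\\ \mathbf{m}_n\end{pmatrix} = (m_{ij}), \tag{6}$$
with
$$a(t) := \operatorname{sgn}\Bigl[\frac{V_n(x(t))}{m_n(t)}\Bigr] \tag{7}$$
and
$$\lambda := \|P\|. \tag{8}$$
Note: The $n^2$ parameters $m_{ij}$ (normalized so that one of the $m_{ij}$ equals 1) and the norm parameter $\lambda$ are determined from the $n^2$ orthonormalization conditions
$$\int_{-1}^{1} v_i(t)\,u_j(t)\,d\nu(t) = a_{ij} \qquad (i, j = 1, \dots, n). \tag{9}$$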
Proof. Keeping in mind equation (b) of Theorem A, for a given matrix M with non-zero rows $\mathbf{m}_i$ ($i = 1, \dots, n$), we would like to solve
$$\frac{V_i(x(t))}{\mathbf{m}_i\cdot\mathbf{v}(t)} = \frac{V_n(x(t))}{\mathbf{m}_n\cdot\mathbf{v}(t)}, \qquad i = 1, \dots, n-1, \tag{10}$$
for $x(t) = (x_1(t), \dots, x_{n-1}(t))$, with $-1 < x_1(t) < \dots < x_{n-1}(t) < 1$ for each $t \in (-1, 1)$. Note that (10) can be rewritten as
$$\mathbf{m}_{i+1}\cdot\mathbf{v}(t)\,V_i(x(t)) - \mathbf{m}_i\cdot\mathbf{v}(t)\,V_{i+1}(x(t)) = \Bigl(\int_{-1}^{x_1} - \int_{x_1}^{x_2} + \dots + (-1)^{n-1}\int_{x_{n-1}}^{1}\Bigr)\bigl[m'_{i+1}(t)\,v_i(s) - m'_i(t)\,v_{i+1}(s)\bigr]\,d\nu(s) = 0,$$
$i = 1, \dots, n-1$,
where $m'_i := \mathbf{m}_i\cdot\mathbf{v}(t)$, $i = 1, \dots, n$. The existence of $x_i = x_i(t)$, $i = 1, \dots, n-1$, now follows directly from Lemma 1 (the Hobby-Rice Theorem). Next note that the invertibility of the $n \times n$ matrix in (3) is equivalent to the independence on V of the $n - 1$ point-evaluations $\delta_{x_i}$ ($i = 1, \dots, n-1$) and the functional
$$B(v) := \Bigl(\int_{-1}^{x_1} - \int_{x_1}^{x_2} + \dots + (-1)^{n-1}\int_{x_{n-1}}^{1}\Bigr) v(s)\,d\nu(s)$$
(so that $B(v_i) = V_i(x(t))$). But indeed, if $0 \neq v \in V$ is such that $v(x_i) = 0$, $i = 1, \dots, n-1$, then by the Haar assumption, for $x_0 := -1$ and $x_n := 1$, we have $\operatorname{sgn}(v) = \epsilon(-1)^i$ on $(x_{i-1}, x_i)$, $i = 1, \dots, n$, where $\epsilon = -\operatorname{sgn}(v(-1))$, and thus $|B(v)| > 0$, establishing the independence. Note that $a(t)V_i(x(t))$ will be $V_i(t)$ of Theorem A, $i = 1, \dots, n$. The homogeneity of the system (10) allows one of the $m_{ij}$ to be normalized to 1, leaving exactly $n^2$ equations in $n^2$ unknowns. Finally, by the Haar condition, for each $t \in (-1, 1)$, $k(t, \cdot) = \mathbf{u}(t)\cdot\mathbf{v}(\cdot)$, as constructed according to the above prescription, changes sign only at $s = x_i(t)$ ($i = 1, \dots, n-1$), and thus $\lambda = a(t)\,\mathbf{u}(t)\cdot\mathbf{V}(x(t)) = \int |k(t,s)|\,d\nu(s) = L(t) > 0$, where L(t) is the Lebesgue function for P. Thus Theorem A guarantees that we have $P_{\min}$. Finally, by Theorem B, $P_{\min}$ is unique. ∎
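REMARK. System (3) is equivalent to
$$\sum_{i=1}^{n-1} \psi_i(t)\,v_i(x_j(t)) = -v_n(x_j(t)), \qquad j = 1, \dots, n-1, \tag{3a}$$
where $\psi_i := u_i/u_n$ ($i = 1, \dots, n-1$), and
$$\frac{\lambda a(t)}{u_n(t)} = \sum_{i=1}^{n-1} \psi_i(t)\,V_i(x(t)) + V_n(x(t)). \tag{3b}$$
Thus, if $V_{n-1} := [v_1, \dots, v_{n-1}]$ is also Haar, then the $(n-1) \times (n-1)$ matrix in (3a) above is invertible. In particular, in the algebraic case, $V_{n-1} = [1, t, \dots, t^{n-2}]$ and the $(n-1) \times (n-1)$ matrix above is in fact the classical (invertible) Vandermonde matrix.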
3. Applications
All the applications in this section are directed towards the determination of the minimal projection from $L_1[-1,1]$ onto $\pi_{n-1}$ (i.e., the action $A = I$, the measure $\nu$ is standard Lebesgue measure, and $V = [1, t, \dots, t^{n-1}]$). The first two applications are repeated from [1] for the sake of completeness and as an aid to the reader in identifying the various parts of the Prescription.
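A minimal numerical sketch of Application 1, assuming $M = \operatorname{diag}(1, m)$ with $m > 0$ and Lebesgue measure: for each t it takes the admissible root x(t) of (1.5), forms a(t) from (7) and $(u_1, u_2)/\lambda$ from (3a) and (3b), and then bisects on m until the two conditions in (1.9) agree, after which $\lambda = 1/(2\int_0^1 u_1/\lambda\,dt)$. The function names, the bracket [0.5, 3] and the midpoint quadrature are our own choices. If the Prescription is being followed correctly, the returned $\lambda$ should reproduce the value $\|P_{\min}\| = 1.22040$ quoted for Application 1, and $\psi(1) = -x(1)$ should satisfy equation (12).

```python
import math

def sgn(z):
    return (z > 0) - (z < 0)

def u_over_lambda(t, m):
    """(u1, u2)/lambda at t for M = diag(1, m), following (1.5), (3a), (3b) and (7)."""
    w = m * t                                  # m_2 . v(t) = m t, while m_1 . v(t) = 1
    x = w - sgn(w) * math.sqrt(w * w + 1)      # admissible root of (1.5), inside (-1, 1)
    V1, V2 = 2 * x, x * x - 1                  # V_i(x) from (4) for v = (1, t)
    a = sgn(V2 / w)                            # (7); assumes m * t != 0
    psi = -x                                   # (3a): psi * v1(x) = -v2(x)
    u2 = a / (psi * V1 + V2)                   # (3b), divided through by lambda
    return psi * u2, u2

def half_moments(m, n=2000):
    """Midpoint values of int_0^1 u1/lambda dt and int_0^1 t*u2/lambda dt."""
    h = 1.0 / n
    i1 = i2 = 0.0
    for j in range(n):
        t = (j + 0.5) * h
        u1, u2 = u_over_lambda(t, m)
        i1 += u1 * h
        i2 += t * u2 * h
    return i1, i2

def solve_application_1(lo=0.5, hi=3.0, iters=50):
    """Bisect on m until the two conditions in (1.9) agree, then read off lambda."""
    def gap(m):
        i1, i2 = half_moments(m)
        return i1 - i2
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    m = 0.5 * (lo + hi)
    i1, _ = half_moments(m)
    return m, 1 / (2 * i1)                     # (1.9): lambda * i1 = 1/2
```

With the default settings the bisection settles near m = 1.36 and returns a norm close to 1.2204; the symmetric half-interval integrals are used because $u_1$ is even and $u_2$ is odd.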
In Application 1 ($n = 2$), for each M, the single function x(t), being the root of a quadratic, is determined explicitly, and then $M := \operatorname{diag}(1, m)$ and $\lambda$ are determined to meet the (two) remaining (after symmetry) orthonormality conditions (via a numerical method). In Application 2 ($n = 3$), for each M, the two functions $x_i(t)$, $i = 1, 2$, are also determined explicitly (in terms of the solution of a quartic), and then M and $\lambda$ are determined to meet the (five) remaining (after symmetry) orthonormality conditions (via a numerical method). In Applications $n - 1$ ($4 \le n \le 6$), for each M, determining the $n - 1$ functions $x_i(t)$, $i = 1, \dots, n-1$, involves solving (for each t) an $(n-1) \times (n-1)$ system of polynomial equations of degree n, so they must be determined entirely numerically (e.g. by Newton's method). M and $\lambda$ are then determined to meet the remaining (after symmetry) orthonormality conditions (via a numerical method). In all cases we give the M matrix (to 4 decimal places) and the projection norm $\lambda$ (to 5 decimal places).

APPLICATION 1. $V = [1, t]_{[-1,1]}$ (Franchetti-Cheney [3]). Consider the process described by the above prescription. In this case $n = 2$, $V_1(x) = 2x$ and $V_2(x) = x^2 - 1$. Using symmetry considerations, equation (5) becomes
$$\frac{2x(t)}{1} = \frac{x^2(t) - 1}{mt}, \tag{1.5}$$
from which the admissible solution is $x(t) = mt - \operatorname{sgn}(mt)\sqrt{m^2t^2 + 1}$. Equations (3a), (3b) and (7) are then
$$\psi(t) = -x(t) \quad \bigl(= u_1(t)/u_2(t)\bigr), \tag{1.3a}$$
$$\frac{\lambda a(t)}{u_2(t)} = 2x(t)\psi(t) + x^2(t) - 1, \tag{1.3b}$$
$$a(t) = \operatorname{sgn} x(t), \tag{1.7}$$
which give
$$u_1(t) = \frac{\lambda\,|x(t)|}{1 + x^2(t)}, \qquad u_2(t) = -\frac{\lambda\,\operatorname{sgn} x(t)}{1 + x^2(t)}.$$
Using the symmetry of $u_1$ and $u_2$, equations (9) become
$$\int_0^1 u_1(t)\,dt = \int_0^1 t\,u_2(t)\,dt = \frac{1}{2}, \tag{1.9}$$
which are identical to equations (8) and (10) of [3], and result in the equation
$$2\psi(1)\bigl[1 - \psi^2(1) + \psi(1)\bigr]\log|\psi(1)| + 1 - \psi^2(1) = 0 \tag{12}$$
for $\psi(1)$, and hence m. It then follows that $\lambda > 0$, and Theorem 1 guarantees that we have $P_{\min}$. In this case

and $\|P_{\min}\| = \lambda = 1.22040\ldots$.

APPLICATION 2. $V = [1, t, t^2]_{[-1,1]}$ (Chalmers-Metcalf [1]). In this case $n = 3$ and
$$V_1(x_1, x_2) = 2(x_1 - x_2 + 1), \qquad V_2(x_1, x_2) = x_1^2 - x_2^2, \qquad V_3(x_1, x_2) = \tfrac{2}{3}\,(1 + x_1^3 - x_2^3).$$
Using symmetry considerations, equations (5) become
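$$\frac{2[x_1(t) - x_2(t) + 1]}{x_1^2(t) - x_2^2(t)} = \frac{m_{11} + m_{13}t^2}{t}, \qquad \frac{\tfrac{2}{3}[x_1^3(t) - x_2^3(t) + 1]}{x_1^2(t) - x_2^2(t)} = \frac{m_{31} + m_{33}t^2}{t}. \tag{2.5}$$
Letting
$$\gamma_1 = \frac{m_{11} + m_{13}t^2}{2t} \quad\text{and}\quad \gamma_3 = \frac{3(m_{31} + m_{33}t^2)}{2t},$$
equations (2.5) may be rewritten
$$x_1 - x_2 + 1 = \gamma_1(x_1^2 - x_2^2), \qquad x_1^3 - x_2^3 + 1 = \gamma_3(x_1^2 - x_2^2).$$
Introducing the variables $y_1 = x_1 - x_2$ and $y_2 = x_1 + x_2$ leads to
$$y_1 + 1 = \gamma_1 y_1 y_2, \qquad \frac{y_1(3y_2^2 + y_1^2)}{4} + 1 = \gamma_3 y_1 y_2, \tag{2.5'}$$
which reduce (on eliminating $y_1 = 1/(\gamma_1 y_2 - 1)$) to a single quartic equation for $y_2$:
$$3y_2^2(\gamma_1 y_2 - 1)^2 + 4(\gamma_1 y_2 - 1)^3 + 1 = 4\gamma_3 y_2(\gamma_1 y_2 - 1)^2.$$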
This equation is then solved, yielding admissible $x_1$ and $x_2$ ($-1 \le x_1(t) < x_2(t) \le 1$). The function $a(t)$ (in (7)) is $-1$ for $|t| < t_0$ and $+1$ for $t_0 < |t|$, where $\pm t_0$ are the points where the admissible solutions of the quartic equation switch from one pair of roots to another. The values of $\lambda$, $m_{11}$, $m_{13}$, $m_{31}$, and $m_{33}$ are determined from the five non-trivial orthonormality conditions
$$1 = \int_{-1}^{1} u_1(t)\,dt = \int_{-1}^{1} t^2 u_3(t)\,dt = \int_{-1}^{1} t\,u_2(t)\,dt, \qquad 0 = \int_{-1}^{1} t^2 u_1(t)\,dt = \int_{-1}^{1} u_3(t)\,dt. \tag{2.9}$$
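For a fixed $t$ (hence fixed $\gamma_1, \gamma_3$), the root selection described above can be sketched as follows. Substituting $y_1 = 1/(\gamma_1 y_2 - 1)$ into the second equation of (2.5)$'$ and multiplying by $4(\gamma_1 y_2 - 1)^3$ gives the quartic in $y_2$; since $|x_1|, |x_2| \le 1$ forces $|y_2| \le 2$, it suffices to look for real roots in $[-2, 2]$. This sketch assumes (2.5)$'$ in the form given above, takes $\gamma_1, \gamma_3$ as plain numbers rather than computing them from $M$, and uses a grid scan plus bisection of our own choosing; it is not the iteration of §3 in [1], and it would miss a double root (no sign change). The names `quartic` and `admissible_nodes` are hypothetical.

```python
def quartic(y2, g1, g3):
    """(2.5)' with y1 = 1/(g1*y2 - 1) substituted, multiplied by 4*(g1*y2 - 1)**3."""
    w = g1 * y2 - 1.0
    return 3.0 * y2 * y2 * w * w + 4.0 * w ** 3 + 1.0 - 4.0 * g3 * y2 * w * w


def admissible_nodes(g1, g3, grid=20000, tol=1e-13):
    """Return pairs (x1, x2) with -1 <= x1 < x2 <= 1 that solve (2.5)'."""
    pairs = []
    a, fa = -2.0, quartic(-2.0, g1, g3)
    for i in range(1, grid + 1):
        b = -2.0 + 4.0 * i / grid
        fb = quartic(b, g1, g3)
        if fa == 0.0 or fa * fb < 0.0:  # a simple root lies in [a, b)
            lo, hi, flo = a, b, fa
            while hi - lo > tol:  # plain bisection
                mid = 0.5 * (lo + hi)
                fm = quartic(mid, g1, g3)
                if flo * fm <= 0.0:
                    hi = mid
                else:
                    lo, flo = mid, fm
            y2 = 0.5 * (lo + hi)
            w = g1 * y2 - 1.0
            if abs(w) > 1e-12:
                y1 = 1.0 / w
                x1, x2 = (y2 + y1) / 2.0, (y2 - y1) / 2.0
                if -1.0 <= x1 < x2 <= 1.0:  # admissibility as stated in the text
                    pairs.append((x1, x2))
        a, fa = b, fb
    return pairs


if __name__ == "__main__":
    # Manufacture gamma_1, gamma_3 from a known admissible pair, then recover it.
    x1, x2 = -0.5, 0.3
    y1, y2 = x1 - x2, x1 + x2
    g1 = (y1 + 1.0) / (y1 * y2)
    g3 = (y1 * (3.0 * y2 ** 2 + y1 ** 2) / 4.0 + 1.0) / (y1 * y2)
    print(g1, g3, admissible_nodes(g1, g3))
```

Manufacturing $\gamma_1, \gamma_3$ from a chosen pair is only a self-test; in the paper they come from $M$ and $t$, and $M$ is then adjusted until (2.9) holds.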
The solution of these equations (for example, by the iteration method of §3 in [1]) yields $t_0 = 0.45710\ldots$ and the values of $\lambda$ and $M$ given below. Hence, the $x_i(t)$ are specified, and $P_{\min} = \tilde u \otimes v$, with $\|P_{\min}\| = \lambda$.
$$M = \begin{pmatrix} 1 & 0 & -.1552 \\ 0 & .9336 & 0 \\ -.6675 & 0 & 1.2711 \end{pmatrix}$$
and
$$\|P_{\min}\| = \lambda = 1.35948\ldots.$$

APPLICATION 3. $V = [1, t, t^2, t^3]_{[-1,1]}$.
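$$M = \begin{pmatrix} 1 & 0 & -.0797 & 0 \\ 0 & 1.4760 & 0 & -.2648 \\ -.4726 & 0 & 1.0520 & 0 \\ 0 & -1.1095 & 0 & 1.3017 \end{pmatrix}$$
and
$$\|P_{\min}\| = \lambda = 1.46184\ldots.$$

APPLICATION 4. $V = [1, t, t^2, t^3, t^4]_{[-1,1]}$.
$$M = \begin{pmatrix} 1 & 0 & -.1590 & 0 & .2767 \\ 0 & 1.2925 & 0 & -.7046 & 0 \\ 2.3605 & 0 & -1.8019 & 0 & -.9250 \\ 0 & -.1523 & 0 & 1.1257 & 0 \\ -.0126 & 0 & -.1178 & 0 & 1.0642 \end{pmatrix}$$
and
$$\|P_{\min}\| = \lambda = 1.54874\ldots.$$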
APPLICATION 5. $V = [1, t, t^2, t^3, t^4, t^5]_{[-1,1]}$.
1
()
-.0460
o
.8893
o
-.0781
o
o
-.5593
o
-.0743
o
-.0675
o
-.1769
1.6151
o
o .9039
o 3.5525
o .6439
-.4955
o
o
-1.4609
-.9900
o .9124
o
o -2.6180
o .2685
and
$$\|P_{\min}\| = \lambda = 1.61031\ldots.$$
REFERENCES

[1] Chalmers, B. L. and F. T. Metcalf, The determination of minimal projections and extensions in $L_1$, Trans. Amer. Math. Soc. 329 (1992), 289-305.
[2] Cheney, E. W., Applications of fixed-point theorems to approximation theory, Theory of Approx. with Appl., Academic Press Inc., New York (1976), 1-8.
[3] Franchetti, C. and E. W. Cheney, Minimal projections in $L_1$-space, Duke Math. J. 43 (1976), 501-510.
[4] Hobby, Charles R. and John R. Rice, A moment problem in $L_1$ approximation, Proc. Amer. Math. Soc. 16 (1965), 665-670.
[5] Morris, P. D. and E. W. Cheney, On the existence and characterization of minimal projections, J. Reine Angew. Math. 270 (1974), 61-76.

"Proofs" and Proofs of the Eckart-Young Theorem

JOHN S. CHIPMAN, Department of Economics, University of Minnesota, Minneapolis, Minnesota 55455
INTRODUCTION

In 1936 Eckart and Young formulated the problem of approximating an $n \times k$ matrix $X$ of rank $k$ by an $n \times k$ matrix of rank $r < k$. Its solution has come to be known as the Eckart-Young theorem. It has important applications to factor analysis in psychometrics (for which it was originally developed by Eckart and Young), to clustering and aggregation in econometrics (cf. Fisher, 1962, 1969), and to quantum chemistry (cf. Goldstein and Levy, 1991; Aiken, Erdos and Goldstein, 1980), as well as to the theory of biased estimation in statistics (cf. Marquardt, 1970). Marquardt showed that if, in the regression model
$$y = X\beta + \varepsilon, \qquad \mathcal{E}\varepsilon = 0, \qquad \mathcal{E}\varepsilon\varepsilon' = \sigma^2 I,$$
where $\operatorname{rank} X = k$, the square of the normalized length of $\beta$ (i.e., $\beta'\beta/\sigma^2$) is less than the sum of the reciprocals of the $k - r$ smallest eigenvalues of $X'X$, then the estimator $\hat\beta^{(r)} = X_{(r)}^{+} y$, where $X_{(r)}$ is the best approximation of $X$ by a matrix of rank $r < k$ and $X_{(r)}^{+}$ is the Moore-Penrose generalized inverse of $X_{(r)}$, has lower mean-square error $\mathcal{E}(\hat\beta^{(r)} - \beta)'(\hat\beta^{(r)} - \beta)$ than the ordinary least-squares estimator $\hat\beta = X^{+} y$ of $\beta$. This result has been extensively applied to aggregation problems in econometrics by the present writer (Chipman, 1978, 1983, 1985).

Work supported by a Humboldt Research Award for Senior U.S. Scientists. I wish to thank John Eagon, Joel Roberts, and Paul Garrett of the University of Minnesota's School of Mathematics for their help. In particular, Lemma 2 and Theorem 1 were supplied by Roberts and the idea for Theorem 3 by Garrett, with whom I had many valuable discussions. Both of them declined coauthorship, but they deserve most of the credit for the results. Upon presentation of this paper at the Delhi Workshop on Generalized Inverses, 14 December 1992, George Styan drew my attention to an unpublished technical report by Rao and Styan (1976), some of the results of which were reported by Rao (1979, 1980); this report raised some of the same issues as the present paper and presented alternative proofs. I was also privileged to read some unpublished notes by Styan (1976). A relevant unpublished paper by Sondermann (1980) should also be mentioned. Finally, I wish to thank Renate Meyer for bringing to my attention the paper by Mirsky (1960) (see also Schmidt (1907), von Neumann (1937), Stewart & Sun (1990), and Meyer (1993, p. 67)), and Jerome Goldstein for stimulating conversations.

Eckart and Young (1936) stated their result without proof, although they presented a heuristic argument. A somewhat more elaborate argument appeared in Householder and Young (1938), but still with important gaps. Golub and Kahan (1965, p. 220) presented an alternative proof, but it contains a serious lacuna, analyzed below. A detailed proof is presented in Stewart (1973, pp. 322-3), but here again some of the crucial steps are omitted. It was subsequently pointed out by Stewart and Sun (1990, pp. 208-10) that the result had been proved by Mirsky (1960) for complex matrices and arbitrary unitarily invariant norms, and earlier still by Schmidt (1907) for integral operators and the Hilbert-Schmidt norm. They therefore call the result the Schmidt-Mirsky theorem. Mirsky's proof is based on von Neumann's (1937) theory of symmetric gauge functions. In this paper some simple proofs will be provided for the Frobenius norm that do not require this apparatus. While they will be stated for real matrices $X$ and orthogonal matrices $P$ and $Q$, the proofs go through without change for complex matrices $X$ and unitary matrices $P$ and $Q$.

A basic problem is that the set of $n \times k$ matrices of rank $r < k$ is not closed. Such a set is defined by the conditions that all minors of order greater than $r$ vanish and that at least one minor of order $r$ be nonvanishing. One wishes to find in this set a matrix that is closest to $X$ in the Frobenius norm; but since the set is not closed, the existence of such a matrix is not at all obvious. Stewart (1973, pp. 322-3) recognizes that there is an existence problem, but limits himself to stating: "We prove the theorem under the assumption that the minimum ... actually exists (this assumption can easily be established by analytical considerations)." Implicitly, what is done is to deal with the set of matrices of rank $\le r$; this set is closed, and its intersection with a suitable ball is compact, hence a matrix in this set exists that is closest to $X$; but on the face of it, this matrix might have rank $< r$. It has to be shown that it has rank exactly $r$.
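A small instance (our own illustration, not taken from the paper) shows why the set of matrices of rank exactly $r$ is not closed: with $n = k = 3$ and $r = 2$, the matrices $X_j = \operatorname{diag}(1, 1/j, 0)$ all have rank 2, yet they converge in the Frobenius norm (squared distance $1/j^2$) to $\operatorname{diag}(1, 0, 0)$, which has rank 1. The sketch computes ranks exactly with rational arithmetic; the helpers `rank` and `frobenius_sq` are hypothetical names.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a small matrix via exact Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def frobenius_sq(a, b):
    """Squared Frobenius distance between two matrices of the same shape."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


limit = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
for j in (1, 10, 100, 1000):
    x_j = [[1, 0, 0], [0, Fraction(1, j), 0], [0, 0, 0]]
    print(j, rank(x_j), frobenius_sq(x_j, limit))  # rank 2, squared distance 1/j^2
print("limit rank:", rank(limit))  # 1
```

Exact fractions are used so that the rank computation is not muddied by floating-point tolerances.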
FORMULATION AND SOLUTION OF THE PROBLEM

We denote by $\mathcal{X}$ the set of all real $n \times k$ matrices $X$. For $X, Y \in \mathcal{X}$ with entries $x_{ij}, y_{ij}$ we define the inner product
$$\langle X, Y \rangle = \operatorname{trace}(Y'X) = \sum_{i,j} x_{ij} y_{ij}.$$
Write $\|X\| = \langle X, X \rangle^{1/2}$; this is the Frobenius norm. The function $d(X, Y) = \|X - Y\|$ defines the Frobenius distance between $X$ and $Y$, and is a metric on $\mathcal{X}$. We shall denote by $\mathcal{X}_r$ the subset of $n \times k$ matrices of rank $\le r$. Lemma 1 below shows that $\mathcal{X}_r$ is closed, and that a suitable subset of it consisting of matrices close to the given $X$ is compact. Hence $X$ can be best approximated by a matrix of rank $\le r$. Lemma 2 is used to prove Theorem 1, which states that the matrix closest to $X$ actually has rank $r$. As far as existence of a closest
matrix of rank $r$ is concerned, this is the end of the matter. However, the Eckart-Young theorem states that the best approximation of $X$ by a matrix of rank $r$ can be obtained by replacing all but the $r$ largest singular values of $X$ by zeros. It must still be shown that this procedure provides the correct result, and that the resulting matrix actually has rank $r$. This is done in Theorem 2, which is based on Stewart (1973). Theorem 3 provides an alternative, streamlined proof. Finally, an extremely simple proof furnished to the author by Heinz Neudecker is presented in the Appendix.

LEMMA 1 Let $X$ be a given $n \times k$ matrix of rank $> r$, where $r < m = \min(n, k)$. Then within the class $\mathcal{X}_r$ of $n \times k$ matrices of rank $\le r$, there exists a matrix $\hat X$ closest to $X$ in the Frobenius norm, i.e., such that
$$\|X - \hat X\| = \min_{\tilde X \in \mathcal{X}_r} \|X - \tilde X\|. \tag{1}$$
Proof: The set $\mathcal{X}_r$ of $n \times k$ matrices of rank $\le r$ is defined by the condition that all minors of order $r + 1$ of such matrices are equal to zero. Since these minors are polynomials in the elements of the matrices, these equations define a closed set in the $nk$-dimensional space of matrices. Let $B$ be the closed ball of radius $\|X\|$ centered at $X$. Then $B \cap \mathcal{X}_r$ is compact, and is nonempty since it contains at least the zero matrix $0$. Therefore the continuous function $f(\tilde X) = \|X - \tilde X\|$ has a minimum, $\hat X$, on $B \cap \mathcal{X}_r$, which is clearly the minimum on $\mathcal{X}_r$. ■

LEMMA 2 Let $E_{ij}$ be the $n \times k$ matrix with 1 in the $(i, j)$th position and 0s elsewhere, and let $A$ be any $n \times k$ matrix. Then there exists a real number $\lambda \ne 0$ such that the $nk$ matrices
$$\lambda E_{ij} - A \qquad (i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, k) \tag{2}$$
form a basis in $nk$-dimensional space.

Proof: If the set of matrices (2) is linearly dependent, then the $nk \times nk$ matrix whose columns are the column vectors of successive columns of (2) is a matrix of the form $\lambda I_{nk} - M$, where $M$ has all its $nk$ columns equal to the column vector of columns of $A$, and $\lambda$ is an eigenvalue of $M$. But the eigenvalues of $M$ are 0, with multiplicity $nk - 1$, and just one other real number (namely, the sum of the elements of $A$). For any real number $\lambda$ other than this one or 0, the set (2) is therefore linearly independent. ■

THEOREM 1 Let $X$ be an $n \times k$ matrix of rank $\ge r$, and let $\hat X$ be a matrix of rank $\le r$ which is closest to $X$ (in the Frobenius norm) among all matrices in the set $\mathcal{X}_r$ of $n \times k$ matrices of rank $\le r$. Then $\operatorname{rank} \hat X = r$.

Proof: Suppose $\operatorname{rank} \hat X < r$, and let $\lambda$ be such as to satisfy Lemma 2 with $A = \hat X$. Since $\operatorname{rank} \lambda E_{ij} = 1$ for $\lambda \ne 0$ (multiplication of a matrix by a non-zero scalar does not affect its rank), we have
$$\operatorname{rank}\bigl[(1 - t)\hat X + t\lambda E_{ij}\bigr] \le \operatorname{rank} \hat X + \operatorname{rank} \lambda E_{ij} \le r \tag{3}$$
for any real number $t$, since the rank of the sum of two matrices is less than or equal to the sum of their ranks (because the column space of the sum of two matrices is contained in the sum of the column spaces of the two matrices, as is easily verified); and (3) holds trivially for $t = 0$ and $t = 1$. Thus, of all points on the line $(1 - t)\hat X + t\lambda E_{ij}$, the closest to $X$ will be $\hat X$, since $\hat X$ has been assumed to be a closest point to $X$ of all $\tilde X \in \mathcal{X}_r$. Thus, the matrix $X - \hat X$ is perpendicular to the matrix $\hat X - \lambda E_{ij}$, since the shortest distance from a point to
a line is along the perpendicular. But then $X - \hat X$ is perpendicular to all of $nk$-dimensional space, since the matrices $\hat X - \lambda E_{ij}$ form a basis for this space. Thus, $X - \hat X = 0$, i.e., $X = \hat X$, which is impossible since $\operatorname{rank} X \ge r > \operatorname{rank} \hat X$. ■

We now present our first proof of the Eckart-Young theorem. The proof presented here follows that of Stewart (1973, pp. 322-3), which is in turn based on that of Householder and Young (1938). In particular, the crucial Step 3a has been added, showing that the relevant matrix $D_{11}$ has rank $r$. Steps 4 and 5 provide some additional needed details.

THEOREM 2 (Eckart and Young) Let $X$ be a given $n \times k$ matrix of rank $p > r$, where $p \le m = \min(n, k)$. Then a matrix that minimizes $\|X - \tilde X\|$ over the set $\mathcal{X}_r = \{\tilde X : \operatorname{rank} \tilde X \le r\}$ is given by
$$\hat X = Q D^{(r)} P', \tag{4}$$
where
$$X = Q D P' \tag{5}$$
is a singular-value decomposition of $X$, and the $n \times k$ matrix $D^{(r)}$ is obtained from $D$ by replacing all but a set of its $r$ largest diagonal elements by 0s. Further, $\hat X$ is a minimizer only if it is obtained in this way.

Proof: From Lemma 1, there exists a matrix $\hat X$ of rank $\le r$ closest to $X$ in the Frobenius norm, i.e., such that (1) holds, and since $\operatorname{rank} X > r$, this matrix $\hat X$ has rank $r$, by Theorem 1. Let a singular-value decomposition of $\hat X$ be denoted
$$\hat X = \hat Q \hat D \hat P', \tag{6}$$
where $\hat D$ is an $n \times k$ diagonal matrix of the form
$$\hat D = \begin{bmatrix} \hat S & 0 \\ 0 & 0 \end{bmatrix}, \tag{7}$$
and $\hat S = \operatorname{diag}\{\hat s_1, \hat s_2, \ldots, \hat s_r\}$ is an $r \times r$ diagonal matrix with $\hat s_1 \ge \hat s_2 \ge \cdots \ge \hat s_r > 0$. The main task of the proof is to show that $\hat D = D^{(r)}$, where the latter is obtained from $D$ in the manner described in the statement of the theorem. Define
$$\bar D = \hat Q' X \hat P \tag{8}$$
and denote
$$\bar D = \begin{bmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{bmatrix}, \tag{9}$$
where $D_{11}$ is of order $r \times r$. $\bar D$ has rank $p$. Owing to the orthogonal invariance of the Frobenius norm, it follows from (6) and (8) that (1) is equivalent to
$$\|\bar D - \hat D\| = \min_{\operatorname{rank} \tilde D \le r} \|\bar D - \tilde D\|. \tag{10}$$
First we show that the matrix $\bar D$ of (8) and (9) must be of the form
$$\bar D = \begin{bmatrix} \hat S & 0 \\ 0 & D_{22} \end{bmatrix}. \tag{11}$$
This will be done in a series of three steps.
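Step 1. We show that $D_{21} = 0$. Suppose not; then define
$$\tilde D = \begin{bmatrix} \hat S & 0 \\ D_{21} & 0 \end{bmatrix}. \tag{12}$$
This has the same rank as $\hat S$, which is $r$; hence the matrix $\tilde X = \hat Q \tilde D \hat P'$ has rank $r$, and by the orthogonal invariance of the Frobenius norm,
$$\|X - \tilde X\| = \|X - \hat Q \tilde D \hat P'\| = \|\hat Q' X \hat P - \tilde D\| = \|\bar D - \tilde D\|.$$
But from (9) and (12),
$$\|\bar D - \tilde D\|^2 = \|\bar D - \hat D\|^2 - \|D_{21}\|^2 < \|\bar D - \hat D\|^2,$$
and from the orthogonal invariance of the Frobenius norm, $\|\bar D - \hat D\| = \|X - \hat X\|$. Therefore $\tilde X$, which has rank $r$, is closer to $X$ than $\hat X$, in contradiction to (10). This contradiction proves that $D_{21} = 0$.

Step 2. A completely similar proof shows that $D_{12} = 0$.

Step 3a. We show that $\operatorname{rank} D_{11} = r$. Suppose not; then, since $\bar D$ has rank $p > r$, and $D_{21}$ and $D_{12}$ have been shown to be zero, we can find a partition
$$D_{22} = \begin{bmatrix} D_{22,11} & D_{22,12} \\ D_{22,21} & D_{22,22} \end{bmatrix}$$
of $D_{22}$ such that $D_{22,11} \ne 0$ and the $n \times k$ matrix
$$\tilde D = \begin{bmatrix} D_{11} & 0 & 0 \\ 0 & D_{22,11} & 0 \\ 0 & 0 & 0 \end{bmatrix}$$
has rank $r$. Accordingly,
$$\|\bar D - \tilde D\| = \left\| \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & D_{22,12} \\ 0 & D_{22,21} & D_{22,22} \end{bmatrix} \right\| < \|D_{22}\| \le \|\bar D - \hat D\|.$$
This contradicts (10) as before; hence $\operatorname{rank} D_{11} = r$.

Step 3b. We show finally that $D_{11} = \hat S$. Suppose not; then, defining
$$\tilde D = \begin{bmatrix} D_{11} & 0 \\ 0 & 0 \end{bmatrix},$$
this matrix has rank $r$, and
$$\|\bar D - \tilde D\| = \|D_{22}\| < \bigl(\|D_{11} - \hat S\|^2 + \|D_{22}\|^2\bigr)^{1/2} = \|\bar D - \hat D\|,$$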
leading to a contradiction, as before. Therefore $\bar D$ must be of the form (11).

Step 4. Now let
$$D_{22} = Q_{22} R P_{22}' \tag{13}$$
be a singular-value decomposition of the $(n - r) \times (k - r)$ matrix $D_{22}$, where $Q_{22}$ and $P_{22}$ are, respectively, $(n - r) \times (n - r)$ and $(k - r) \times (k - r)$ orthogonal matrices, and $R$ is an $(n - r) \times (k - r)$ diagonal matrix of singular values of $D_{22}$. Define further the partitions $\hat P = [\hat P_1, \hat P_2]$ and $\hat Q = [\hat Q_1, \hat Q_2]$ of $\hat P$ and $\hat Q$ into their first $r$ and their last $k - r$ and $n - r$ columns, respectively. Finally, define the rectangular $n \times k$ diagonal matrix
$$D = \begin{bmatrix} \hat S & 0 \\ 0 & R \end{bmatrix} \tag{14}$$
and the $k \times k$ and $n \times n$ matrices
$$P = [\hat P_1, \hat P_2 P_{22}], \qquad Q = [\hat Q_1, \hat Q_2 Q_{22}], \tag{15}$$
which are readily verified to be orthogonal. Then we verify from (15), (14), (13), (11), and (8) that
$$Q D P' = \hat Q \bar D \hat P' = X; \tag{16}$$
thus, $QDP'$ is a singular-value decomposition of $X$, in accordance with (5); and from the orthogonal invariance of the Frobenius norm, $\|D\| = \|\bar D\| = \|X\|$. On the other hand, it is clear from (15), (7), and (6) that
$$Q \hat D P' = \hat Q \hat D \hat P' = \hat X, \tag{17}$$
so that $Q\hat D P'$ is a singular-value decomposition of $\hat X$. It remains to show that $\hat D = D^{(r)}$, establishing (4). From (16), (14), and (13) we have
$$\|X\|^2 = \|\hat S\|^2 + \|R\|^2 = \|\hat S\|^2 + \|D_{22}\|^2, \tag{18}$$
so that $X$ has the diagonal elements of $\hat S$ as $r$ of its singular values, and the sum of squares of its remaining $m - r$ singular values is equal to $\|D_{22}\|^2$. From (16), (17), (14), (7), and (13) we have
$$\|X - \hat X\|^2 = \|D - \hat D\|^2 = \|R\|^2. \tag{19}$$
Since by hypothesis (19) is a minimum (satisfying (1)), this can only be the case if, in (18), the diagonal elements of $\hat S$ are the $r$ largest singular values of $X$, and those of $R$ are the $m - r$ smallest (with possible ties). It follows that, if the singular values of $X$ are ordered as $s_1 \ge s_2 \ge \cdots \ge s_r \ge s_{r+1} \ge \cdots \ge s_m$, $\hat S$ must contain $s_1, s_2, \ldots, s_r$, and $R$ must contain $s_{r+1}, \ldots, s_m$. (If $s_r = s_{r+1}$, $\hat X$ is not unique.) Applying this requirement to (7) and (14) we have $\hat D = D^{(r)}$, and the main part of the theorem is proved.

Step 5. Finally, let $X = \tilde Q \tilde D \tilde P'$ be any other singular-value decomposition of $X$, and let $\tilde D^{(r)}$ be obtained from $\tilde D$ by replacing all but a set of its $r$ largest singular values by 0s. Define $\tilde X = \tilde Q \tilde D^{(r)} \tilde P'$. Then by the orthogonal invariance of the Frobenius norm we have
$$\|X - \tilde X\| = \|\tilde D - \tilde D^{(r)}\| = \|D - D^{(r)}\| = \|X - \hat X\|,$$
so that $\tilde X$ is also a minimizer. ■
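As a numerical sanity check of Theorem 2 in the smallest nontrivial case ($n = k = 2$, $r = 1$), one can search directly over rank-one matrices. Every matrix of rank $\le 1$ is $c\,uv'$ with unit vectors $u, v$; for fixed $u, v$ the best $c$ is $u'Xv$, which leaves the squared error $\|X\|^2 - (u'Xv)^2$. Theorem 2 predicts that the minimum equals $s_2^2$, the square of the smaller singular value, i.e., the smaller eigenvalue of $X'X$. The sketch compares a brute-force angular grid search with that prediction; the test matrix and the function names are our own illustrative choices, and the grid only approximates the minimum in general (here the optimum happens to lie on the grid).

```python
import math


def brute_force_rank1_error_sq(x, steps=400):
    """Min over a grid of unit u, v of ||X - c u v'||^2, using the optimal c = u'Xv."""
    (a, b), (c, d) = x
    norm_sq = a * a + b * b + c * c + d * d
    best = norm_sq
    # Angles in [0, pi) suffice: replacing u by -u (or v by -v) leaves (u'Xv)^2 unchanged.
    for i in range(steps):
        th = math.pi * i / steps
        u0, u1 = math.cos(th), math.sin(th)
        for k in range(steps):
            ph = math.pi * k / steps
            v0, v1 = math.cos(ph), math.sin(ph)
            uxv = u0 * (a * v0 + b * v1) + u1 * (c * v0 + d * v1)
            best = min(best, norm_sq - uxv * uxv)
    return best


def eckart_young_rank1_error_sq(x):
    """Smaller eigenvalue of X'X, i.e. the squared smaller singular value of X."""
    (a, b), (c, d) = x
    p, q, s = a * a + c * c, a * b + c * d, b * b + d * d  # X'X = [[p, q], [q, s]]
    return (p + s) / 2 - math.sqrt(((p - s) / 2) ** 2 + q * q)


X = [[2.0, 1.0], [1.0, 2.0]]  # singular values 3 and 1
print(brute_force_rank1_error_sq(X), eckart_young_rank1_error_sq(X))  # both close to 1
```

The brute-force search uses no singular-value machinery at all, so its agreement with the prediction is an independent (if very small-scale) check.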
This may be compared with the theorem as presented by Golub and Kahan (1965, p. 220), who proceed as follows¹ (where I have substituted the notation of the present paper):

"THEOREM" Let $X$ be an $n \times k$ matrix of rank $p \le k < n$ and let its singular-value decomposition be given by (5), where $D$ is an $n \times k$ diagonal matrix of the form
$$D = \begin{bmatrix} S & 0 \\ 0 & 0 \end{bmatrix}$$
and $S$ is a $p \times p$ diagonal matrix of singular values of $X$ in descending order $s_1 \ge s_2 \ge \cdots \ge s_p$. Let $\mathcal{X}_r$ be the set of all $n \times k$ matrices of rank $r < p$. Let $\hat D$ be the $n \times k$ diagonal matrix obtained from $D$ by setting all but its $r$ largest singular values equal to zero, and define $\hat X = Q\hat D P'$. Then $\|X - \hat X\| \le \|X - \tilde X\|$ for all $\tilde X \in \mathcal{X}_r$.

"Proof": From the orthogonal invariance of the Frobenius norm,
$$\|X - \tilde X\| = \|D - Q'\tilde X P\|.$$
Denote $\tilde D = Q'\tilde X P$. Since $\|X - \hat X\|^2 = \|D - \hat D\|^2 = \sum_{j=r+1}^{p} s_j^2$, it follows that $\|X - \tilde X\|$ is minimized when $\tilde d_{jj} = s_j$ for $j = 1, 2, \ldots, r$ and $\tilde d_{ij} = 0$ otherwise. ■

As noted above, the set $\mathcal{X}_r$ is not closed; and no use is made in the proof of the hypothesis that $\tilde X$, hence $\tilde D$, has rank $r$. However, the problem is simpler: the last sentence asserts in effect that
$$\sum_{j=1}^{k} (s_j - \tilde d_{jj})^2 \ge \sum_{j=r+1}^{k} s_j^2 \quad \text{for all } \tilde D = Q'\tilde X P \text{ such that } \tilde X \in \mathcal{X}_r.$$
Suppose $k = p = 3$ and $r = 2$, and let $s_1 = 3$, $s_2 = 2$, and $s_3 = 1$. Then a suitably chosen non-diagonal matrix $\tilde D$ of rank 2 provides a counterexample to the statement. (By setting all elements in the bottom row of $\tilde D$ equal to .5, one would obtain the same result but violate the rank condition.) Thus, a nondiagonal $\tilde D$ would have to be disposed of by a separate argument (cf. Styan, 1976).

THEOREM 3 Let $X$ be a fixed element of $\mathcal{X}$ and have rank $p \le m = \min(n, k)$, and for any $r < p$ let $\hat X$ be any matrix in $\mathcal{X}_r$ that is closest to $X$. Then there exists a singular-value
¹See also Problem 10 in Section 1f.3 of Rao (1965, p. 56; 1973, p. 70), and Section 21 of Chapter 6 of Ben-Israel & Greville (1974, pp. 246-9), as well as the revision in Ben-Israel & Greville (1980, pp. 246-9), where a few more references will also be found.
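decomposition $X = QDP'$ of $X$, where
$$D = \begin{bmatrix} \operatorname{diag}(d_1, \ldots, d_p) & 0 \\ 0 & 0 \end{bmatrix}$$
and $d_1 \ge d_2 \ge \cdots \ge d_p > 0$, such that, defining a matrix $\hat D$ by
$$\hat D = \begin{bmatrix} \operatorname{diag}(d_1, \ldots, d_r) & 0 \\ 0 & 0 \end{bmatrix},$$
where $d_r > 0$, we have $\hat X = Q\hat D P'$. Thus, $\hat X$ has rank $r$.

Proof: Let $\Delta \in \mathcal{X}$ be a matrix having one of the three patterns (block decompositions)
$$\begin{bmatrix} \ast & 0 \\ 0 & 0 \end{bmatrix}, \qquad \begin{bmatrix} 0 & \ast \\ 0 & 0 \end{bmatrix}, \qquad \begin{bmatrix} 0 & 0 \\ \ast & 0 \end{bmatrix},$$
where the northwest block is $r \times r$ and the southeast block is $(n - r) \times (k - r)$. Let $\hat X \in \mathcal{X}_r$ be a matrix closest to $X$ (which exists by Lemma 1), and let $\hat X = Q\hat D P'$ be a singular-value decomposition of $\hat X$ such that
$$\hat D = \begin{bmatrix} D_1 & 0 \\ 0 & 0 \end{bmatrix},$$
where $D_1$ has diagonal entries $d_1 \ge d_2 \ge \cdots \ge d_r \ge 0$. Define, for real $t$ and for $\Delta$ of one of the three patterns above,
$$\hat X(t) = \hat X + t\,Q\Delta P'.$$
Then $\hat X(t)$ is certainly still in $\mathcal{X}_r$ (for each pattern, $\hat D + t\Delta$ has at most $r$ nonzero rows or at most $r$ nonzero columns). Since $\hat X \in \mathcal{X}_r$ minimizes $\|X - \tilde X\|^2$, it follows that
$$\frac{d}{dt}\|X - \hat X(t)\|^2 \Big|_{t=0} = 0$$
for $\Delta$ of all three patterns above. Multiplying this out and taking the derivative, we obtain
$$\langle X - \hat X, Q\Delta P' \rangle = 0.$$
Rearranging, this is
$$\langle Q'XP - \hat D, \Delta \rangle = 0.$$
Since this holds for $\Delta$ of all three patterns above, it must be that $Q'XP - \hat D$ is of the form
$$Q'XP - \hat D = \begin{bmatrix} 0 & 0 \\ 0 & Z_{22} \end{bmatrix},$$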
where $Z_{22}$ is some $(n - r) \times (k - r)$ matrix. It follows then that
$$Q'XP = \begin{bmatrix} D_1 & 0 \\ 0 & Z_{22} \end{bmatrix}.$$
Let $A_{22}$ and $B_{22}$ be orthogonal matrices of orders $n - r$ and $k - r$ respectively, such that $Z_{22}$ has singular-value decomposition $Z_{22} = A_{22} D_2 B_{22}'$, where $D_2$ is an $(n - r) \times (k - r)$ diagonal matrix with diagonal entries $d_{r+1} \ge d_{r+2} \ge \cdots \ge d_m \ge 0$. Define
$$A = \begin{bmatrix} I_r & 0 \\ 0 & A_{22} \end{bmatrix}, \qquad B = \begin{bmatrix} I_r & 0 \\ 0 & B_{22} \end{bmatrix}, \qquad \bar D = \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix}.$$
Then $A'Q'XPB = \bar D$. But $QA$ and $PB$ are orthogonal matrices, so the singular values of $X$ are the diagonal elements of $D_1$ and $D_2$. But the singular values of $\hat X$ are the diagonal elements of $D_1$ plus adjoining zeros. Thus, defining $\bar P = PB$ and $\bar Q = QA$, we have
$$X = \bar Q \bar D \bar P', \qquad \hat X = \bar Q \hat D \bar P'.$$
Therefore, the distance between $X$ and $\hat X$ is
$$\|X - \hat X\| = \|\bar D - \hat D\| = \|D_2\| = \Bigl(\sum_{j=r+1}^{m} d_j^2\Bigr)^{1/2}.$$
The permutations of the singular values which minimize this distance are obviously those for which the $m - r$ singular values $d_{r+1}, \ldots, d_m$ are the $m - r$ smallest and the $r$ singular values $d_1, \ldots, d_r$ the $r$ largest. Since $\hat X$ has been defined as a matrix closest to $X$ in $\mathcal{X}_r$, $D_1$ must contain the $r$ largest singular values; and the singular-value decomposition $\hat X = Q\hat D P'$ was already chosen so that the singular values of $\hat X$, which by the above are the $r$ largest singular values of $X$, are in descending order. Likewise, the singular-value decomposition $Z_{22} = A_{22} D_2 B_{22}'$ was chosen so that the singular values $d_{r+1}, \ldots, d_m$ are in descending order. Hence, $\bar Q \bar D \bar P'$ arranges all the singular values of $X$ in descending order. Since $r < p$ and $d_p > 0$, clearly $d_r > 0$ and $\hat X$ has rank exactly $r$. ■

We conclude with an extremely simple proof of necessity of the Eckart-Young condition, furnished to me by Heinz Neudecker, which is contained in the Appendix following.²
²For the methodology followed, see Magnus and Neudecker (1991), pp. 358ff.
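The gap in the Golub-Kahan argument discussed above can be seen concretely. The rank-2 matrix $\tilde D$ below is our own choice, not necessarily the one the author had in mind: with $s = (3, 2, 1)$ and $r = 2$, it has diagonal $(3, 2, 1)$, so $\sum_j (s_j - \tilde d_{jj})^2 = 0 < 1 = s_3^2$, violating the displayed inequality, while the full squared Frobenius distance $\|D - \tilde D\|^2 = 5$ still respects the Eckart-Young bound. A matrix with a bottom row of .5's also violates the inequality but has rank 3. The exact `rank` helper is hypothetical and is repeated here so that the block runs on its own.

```python
from fractions import Fraction


def rank(rows):
    """Rank via exact Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


s = [3, 2, 1]  # singular values; r = 2
D = [[3, 0, 0], [0, 2, 0], [0, 0, 1]]
D_tilde = [[3, 0, 0], [0, 2, 2], [0, 1, 1]]  # illustrative rank-2 choice
half = Fraction(1, 2)
D_half = [[3, 0, 0], [0, 2, 0], [half, half, half]]  # bottom row of .5's


def diagonal_part(dt):
    """Left side of the 'Proof''s implicit inequality: only diagonal entries count."""
    return sum((s[j] - dt[j][j]) ** 2 for j in range(3))


def full_distance_sq(dt):
    """Squared Frobenius distance ||D - dt||^2, off-diagonal entries included."""
    return sum((D[i][j] - dt[i][j]) ** 2 for i in range(3) for j in range(3))


print(rank(D_tilde), diagonal_part(D_tilde), full_distance_sq(D_tilde))  # 2 0 5
print(rank(D_half), diagonal_part(D_half))  # 3 1/4
```

The point is that the off-diagonal entries of $\tilde D$, which the "Proof" ignores, are exactly what keep the full distance above $s_3^2$.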
APPENDIX: A Proof of the Eckart-Young Theorem

HEINZ NEUDECKER, University of Amsterdam, Amsterdam, The Netherlands

Let $\hat X$ be closest to the given $n \times k$ matrix $X$ in the Frobenius norm. Its rows may be expressed without loss of generality as linear combinations of $r$ orthonormal $1 \times k$ vectors, i.e.,
$$\hat X = AB', \qquad B'B = I_r,$$
where $A$ is $n \times r$ and $B$ is $k \times r$. We therefore wish to find $A$ and $B$ that solve the problem
$$\text{Minimize } \operatorname{tr}(X - AB')'(X - AB') \quad \text{subject to } B'B = I,$$
or equivalently,
$$\text{Maximize } \psi = 2\operatorname{tr} BA'X - \operatorname{tr} A'A \quad \text{subject to } B'B = I.$$
Setting up the Lagrangean expression
$$\phi = 2\operatorname{tr} BA'X - \operatorname{tr} A'A - \operatorname{tr} L(B'B - I),$$
we see without difficulty that, since $B'B$ is symmetric, without loss of generality the Lagrangean multiplier matrix $L$ may be taken to be symmetric. Using this symmetry we obtain, for variations in $A$ and $B$,
$$d\phi = 2\operatorname{tr}(XB - A)'\,dA + 2\operatorname{tr}(A'X - LB')\,dB.$$
Setting $d\phi = 0$ for arbitrary $dA$ and $dB$ yields, with the given constraint,
$$\text{(i) } XB = A, \qquad \text{(ii) } A'X = LB', \qquad \text{(iii) } B'B = I.$$
From these three equations we obtain
$$A'A = A'XB = LB'B = L,$$
whence $L$ is also positive definite. From the first two equations and the symmetry of $L$ we obtain $X'XB = X'A = BL' = BL$. From these it follows that
$$\psi = 2\operatorname{tr} BA'X - \operatorname{tr} A'A = 2\operatorname{tr} BLB' - \operatorname{tr} L = \operatorname{tr} L,$$
which is to be a maximum. Write $L = T\Lambda T'$, where $T$ is orthogonal and $\Lambda$ is diagonal, and define
$$\tilde A = AT, \qquad \tilde B = BT.$$
Then
$$\tilde A'\tilde A = T'A'AT = T'LT = \Lambda \quad \text{and} \quad \tilde B'\tilde B = T'B'BT = T'T = I.$$
Equations (i) to (iii) above then become
$$\text{(i$'$) } X\tilde B = \tilde A, \qquad \text{(ii$'$) } \tilde A'X = T'LB' = T'LT\tilde B' = \Lambda\tilde B', \qquad \text{(iii$'$) } \tilde B'\tilde B = I.$$
From these equations it follows that
$$X'X\tilde B = \tilde B\Lambda \quad \text{and} \quad \tilde B'\tilde B = I.$$
Thus, $\Lambda$, whose trace is to be maximized (being equal to the trace of $L$), is a diagonal matrix of $r$ eigenvalues of $X'X$, and $\tilde B$ is the matrix whose $r$ columns constitute an associated orthonormal set of $r$ eigenvectors of $X'X$. The trace of $\Lambda$ is maximized when these $r$ eigenvalues are a set of $r$ largest eigenvalues of $X'X$. ■
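For $r = 1$ the conclusion of the Appendix can be traced numerically: $\tilde B$ is a unit eigenvector of $X'X$ for its largest eigenvalue $\Lambda$, $\tilde A = X\tilde B$ by (i$'$), and $\operatorname{tr}\tilde A'\tilde A = \Lambda$. The sketch below finds $\tilde B$ by power iteration (our choice of method; the Appendix does not prescribe one, and power iteration assumes the starting vector is not orthogonal to the dominant eigenvector) for an illustrative $3 \times 2$ matrix, and confirms that $\|X - \tilde A\tilde B'\|^2 = \|X\|^2 - \Lambda$. The function names are hypothetical.

```python
import math


def top_eigvec_xtx(x, iters=200):
    """Power iteration for the dominant unit eigenvector of X'X (X given as a list of rows)."""
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    b = [1.0] + [0.0] * (k - 1)  # starting vector (assumed not orthogonal to the target)
    for _ in range(iters):
        w = [sum(xtx[i][j] * b[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(v * v for v in w))
        b = [v / norm for v in w]
    lam = sum(b[i] * xtx[i][j] * b[j] for i in range(k) for j in range(k))  # Rayleigh quotient
    return b, lam


def rank1_from_appendix(x):
    """Return (Lambda, tr A'A, ||X - A B'||^2) for the r = 1 solution of the Appendix."""
    b, lam = top_eigvec_xtx(x)
    a = [sum(row[j] * b[j] for j in range(len(b))) for row in x]  # (i'): A = X B
    x_hat = [[ai * bj for bj in b] for ai in a]                  # X^ = A B'
    err_sq = sum((x[i][j] - x_hat[i][j]) ** 2
                 for i in range(len(x)) for j in range(len(b)))
    return lam, sum(ai * ai for ai in a), err_sq


X = [[2.0, 1.0], [1.0, 2.0], [0.0, 0.0]]
lam, trace_ata, err_sq = rank1_from_appendix(X)
norm_sq = sum(v * v for row in X for v in row)
print(lam, trace_ata, err_sq, norm_sq - lam)  # approximately 9, 9, 1, 1
```

For larger $r$ one would need several eigenvectors (e.g., by deflation or a library eigensolver); the single-vector case is kept here only because it shows the three Appendix relations at once.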
REFERENCES

Aiken, John G., John A. Erdos and Jerome A. Goldstein (1980). Unitary Approximation of Positive Operators, Illinois Journal of Mathematics, 24 (Spring): 61-72.
Ben-Israel, Adi, and Thomas N. E. Greville (1974). Generalized Inverses: Theory and Applications. New York: John Wiley & Sons. Reprint edition with corrections, Huntington, New York: Robert E. Krieger Publishing Company, 1980.
Chipman, John S. (1978). Towards the Construction of an Optimal Aggregative Model of International Trade: West Germany, 1963-1975, Annals of Economic and Social Measurement, 6 (Winter-Spring): 535-554.
Chipman, John S. (1983). Dynamic Adjustment of Internal Prices to External Price Changes, Federal Republic of Germany, 1958-1979: An Application of Rank-Reduced Distributed-Lag Estimation by Spline Functions, Quantitative Studies on Production and Prices (Wolfgang Eichhorn, Rudolf Henn, Klaus Neumann, and Ronald W. Shephard, eds.), Würzburg: Physica-Verlag, Rudolf Liebing GmbH, pp. 195-230.
Chipman, John S. (1985). Testing for Reduction of Mean-Square Error by Aggregation in Dynamic Econometric Models, Multivariate Analysis - VI. Proceedings of the Sixth International Symposium on Multivariate Analysis (Paruchuri R. Krishnaiah, ed.), Amsterdam: North-Holland Publishing Company, pp. 97-119.
Eckart, Carl, and Gale Young (1936). The Approximation of One Matrix by Another of Lower Rank, Psychometrika, 1 (September): 211-218.
Eckart, Carl, and Gale Young (1939). A Principal Axis Transformation for Non-Hermitian Matrices, Bulletin of the American Mathematical Society, 45 (February): 118-121.
Fisher, Walter D. (1962). Optimal Aggregation in Multi-Equation Prediction Models, Econometrica, 30 (October): 774-769.
Fisher, Walter D. (1969). Clustering and Aggregation in Economics. Baltimore: The Johns Hopkins Press.
Goldstein, Jerome A., and Mel Levy (1991). Linear Algebra and Quantum Chemistry, American Mathematical Monthly, 98 (October): 710-718.
Golub, G., and W. Kahan (1965). Calculating the Singular Values and Pseudo-Inverse of a Matrix, Journal of the Society for Industrial and Applied Mathematics, Series B, Numerical Analysis, 2 (No. 2): 205-224.
Householder, A. S., and Gale Young (1938). Matrix Approximations and Latent Roots, American Mathematical Monthly, 45 (March): 165-171.
Magnus, Jan R., and Heinz Neudecker (1991). Matrix Differential Calculus with Applications in Statistics and Econometrics. Chichester and New York: John Wiley & Sons. Reprinted 1994.
Marquardt, Donald W. (1970). Generalized Inverses, Ridge Regression, Biased Linear Estimation, and Nonlinear Estimation, Technometrics, 12 (August): 591-612.
Meyer, Renate (1993). Matrix-Approximation in der multivariaten Statistik. Aachen: Verlag der Augustinus Buchhandlung.
Mirsky, L. (1960). Symmetric Gauge Functions and Unitarily Invariant Norms, Quarterly Journal of Mathematics, Oxford Second Series, 11 (March): 50-59.
von Neumann, John (1937). Some Matrix-Inequalities and Metrization of Matric-Space, Tomsk Univ. Rev., 1: 286-300.
Rao, C. Radhakrishna (1965). Linear Statistical Inference and Its Applications. New York: John Wiley & Sons. 2nd edition, 1973.
Rao, C. Radhakrishna (1979). Separation Theorems for Singular Values of Matrices and Their Applications in Multivariate Analysis, Journal of Multivariate Analysis, 9: 362-377.
Rao, C. Radhakrishna (1980). Matrix Approximations and Reduction of Dimensionality in Multivariate Statistical Analysis, Multivariate Analysis - V. Proceedings of the Fifth International Symposium on Multivariate Analysis (Paruchuri R. Krishnaiah, ed.), Amsterdam: North-Holland Publishing Company, pp. 3-22.
Rao, C. Radhakrishna, and George P. H. Styan (1976). Notes on a Matrix Approximation Problem and Some Related Matrix Inequalities, Indian Statistical Institute, Delhi Campus, Discussion Paper No. 137, March.
Schmidt, Erhard (1907). Zur Theorie der linearen und nichtlinearen Integralgleichungen. I. Theil: Entwicklung willkürlicher Funktionen nach Systemen vorgeschriebener, Mathematische Annalen, 63: 433-476.
Sondermann, Dieter (1980). Best Approximate Solutions to Matrix Equations under Rank Restrictions. Report No. 23/80, Institute for Advanced Studies, The Hebrew University, Mount Scopus, Jerusalem, Israel (August).
Stewart, G. W. (1973). Introduction to Matrix Computations. New York: Academic Press.
Stewart, G. W. and Ji-guang Sun (1990). Matrix Perturbation Theory. San Diego: Academic Press, Inc.
Styan, George P. H. (1976). "The Berlin Notes" (MS).
An Analytic Semigroup Associated to a Degenerate Evolution Equation

ANGELO FAVINI*, Dipartimento di Matematica, Università di Bologna, Piazza di Porta S. Donato, 5, 40126 Bologna (Italy)
JEROME A. GOLDSTEIN**, Department of Mathematical Sciences, University of Memphis, Memphis, TN 38152
SILVIA ROMANELLI*, Dipartimento di Matematica, Università di Bari, via E. Orabona, 4, 70125 Bari (Italy)
1. Introduction

It is well known that an important diffusion process is described with the help of the differential operator
$$Au(x) := x(1 - x)u''(x), \qquad x \in (0, 1),$$
whose domain $D(A)$ includes the so-called Wentzell boundary conditions, i.e.,
$$\lim_{x \to 0^+,\ x \to 1^-} Au(x) = 0.$$
The corresponding semigroup has been studied by many authors since Feller's work [9]. It arises in many ways in applications, for instance, in a diffusion approximation limit

*Supported by M.U.R.S.T. 60% and 40% and by G.N.A.F.A. of C.N.R.
**Partially supported by a USNSF grant.
for a sequence of Markov chains related to the Wright-Fisher model in genetics (see [8], Chapter 10). From the point of view of the generation problem, the results of Clément and Timmermans in [6] ensure that $A$ with domain
$$D(A) := \Bigl\{u \in C[0,1] \cap C^2(0,1) \Bigm| \lim_{x \to 0^+,\ x \to 1^-} Au(x) = 0\Bigr\}$$
is the generator of a $C_0$-contraction semigroup on $C[0,1]$ equipped with the sup-norm $\|\cdot\|_\infty$, and many interesting consequences are derived in approximation theory, as shown in the monograph [1]. A subsequent direct approach to the study of existence and uniqueness results concerning Cauchy problems associated to the partial differential equation
$$a(x)\frac{\partial^2 u}{\partial x^2}(x, t) - \frac{\partial u}{\partial t}(x, t) = 0, \qquad 0 < x < 1,\ t > 0,$$
with boundary conditions $u(0, t) = u(1, t) = 0$, where $a(x) := p(x)x(1 - x)$ with $p \in C[0,1]$ and $p(x) > 0$ for all $x \in [0,1]$, was given in the space $H_0^1(0,1)$ by Fichera in [10], highlighting also other properties concerning, in particular, the eigenvalues of $A$. Hence, in a natural way the question arose whether analyticity holds for the semigroup generated by $(A, D(A))$ in some of the above considered spaces. Stimulated by these investigations, in [2] Attalienti and Romanelli examined the more general problem of analyticity for $C_0$-semigroups generated by differential operators of the type $A_a u = au''$ on $C[0,1]$ with Wentzell boundary conditions, provided that $a \in C[0,1]$, $a(x) > 0$ for $x \in (0,1)$, and $a(0) = 0 = a(1)$. Unfortunately, the assumptions on $a$ leading to analyticity of the semigroup gave rise to some restrictions; these included the condition that
$$\int_{(0,1)} \frac{dx}{\sqrt{a(x)}} = +\infty,$$
which obviously fails when $a(x) := x(1 - x)$, since $\int_0^1 dx/\sqrt{x(1-x)} = \pi$. Recently, analyticity of the semigroups generated by operators of the type $A_a$, with or without Wentzell boundary conditions, in weighted $L^p$ spaces ($1 < p < \infty$), was studied in [3]. In particular, for $a(x) := x(1 - x)$, it follows that if $D(A_a)$ is defined as the completion of $C_0^\infty(0,1)$ in the norm
$$\|u\|_{2,a} := \bigl(\|u\|_{L^2_{1/a}}^2 + \|u'\|_{L^2}^2 + \|x(1 - x)u''\|_{L^2_{1/a}}^2\bigr)^{1/2},$$
then $(A_a, D(A_a))$ generates an analytic semigroup on
$$L^2_{1/a}(0,1) := \Bigl\{u \in L^2_{\mathrm{loc}}(0,1) \Bigm| \int_{(0,1)} |u(x)|^2 \frac{1}{a(x)}\,dx < +\infty\Bigr\}.$$
Our purpose here is to give an explicit description of the domain of $A$ in $H_0^1(0,1)$, which allows us to obtain the analyticity of the associated semigroup. Interesting consequences
are derived in connection with the adjoint problem (see [5]). It is also shown that the operator $A$ (with suitable domain) generates a holomorphic semigroup on $W^{1,p}(0,1)$, for $1 < p < \infty$.
This work was completed during the visit of J. A. Goldstein to the Universities of Bari and Bologna in May 1996. The authors are grateful to G.N.A.F.A. of C.N.R. for having supported this invitation, and J. A. Goldstein is most grateful for the exceptional hospitality of his two coauthors and Enrico Obrecht during this visit and a previous visit to Bologna and Bari in 1994, when preliminary insight into this research was initiated.

2. Main results in Hilbert spaces

Let us introduce the operator $A$ on $H_0^1(0,1)$ given by
$$D(A) := \bigl\{u \in H_0^1(0,1) \bigm| u'' \text{ exists (in the sense of distributions) with } x(1 - x)u'' \in H_0^1(0,1)\bigr\}$$
and
$$Au := x(1 - x)u'', \qquad u \in D(A).$$
We have
Theorem 1. $(A, D(A))$ generates a uniformly bounded semigroup on $H_0^1(0,1)$, analytic in the right half plane.

Proof. First of all, let us consider $H_0^1(0,1)$ endowed with the inner product
$$\langle u, v \rangle := \int_{(0,1)} u'(x)\overline{v'(x)}\,dx,$$
which is equivalent to the usual inner product
$$\int_{(0,1)} u(x)\overline{v(x)}\,dx + \int_{(0,1)} u'(x)\overline{v'(x)}\,dx$$
in view of the Poincaré inequality. To motivate our choice of the space, we observe that if $\operatorname{Re}\lambda > 0$ and
$$\lambda u - x(1 - x)u'' = f \in H_0^1(0,1), \qquad u \in D(A), \tag{1}$$
then
$$\lambda\frac{u}{x(1-x)} - u'' = \frac{f}{x(1-x)} \tag{1'}$$
implies necessarily
$$\int_{(0,1)} u''(x)\overline{u(x)}\,dx = -\int_{(0,1)} |u'(x)|^2\,dx, \tag{2}$$
that is, $\bigl[u'(x)\overline{u(x)}\bigr]_{x=0}^{x=1}$ vanishes.
Indeed,
[u'(x)u(x)]~~5
( u"(x)u(x) dx == l(o,l)
(3)
j
-
lu'(x)1 2 dx. (0,1)
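Moreover,
$$\lambda\int_{(0,1)} \frac{|u(x)|^2}{x(1-x)}\,dx - \int_{(0,1)} u''(x)\overline{u(x)}\,dx = \int_{(0,1)} \frac{f(x)\overline{u(x)}}{x(1-x)}\,dx.$$
Now $f \in H_0^1(0,1)$ gives
$$\int_{(0,1)} \frac{|f(x)|^2}{x(1-x)}\,dx = \int_{(0,\frac12)} \frac{1}{x(1-x)}\Big|\int_0^x f'(t)\,dt\Big|^2 dx + \int_{(\frac12,1)} \frac{1}{x(1-x)}\Big|\int_x^1 f'(t)\,dt\Big|^2 dx$$
$$\le \int_{(0,\frac12)} \frac{1}{x(1-x)}\Big(\int_0^x |f'(t)|\,dt\Big)^2 dx + \int_{(\frac12,1)} \frac{1}{x(1-x)}\Big(\int_x^1 |f'(t)|\,dt\Big)^2 dx$$
$$\le \int_{(0,\frac12)} \frac{1}{x(1-x)}\Big(\int_0^x 1\,dt\Big)\Big(\int_0^x |f'(t)|^2\,dt\Big) dx + \int_{(\frac12,1)} \frac{1}{x(1-x)}\Big(\int_x^1 1\,dt\Big)\Big(\int_x^1 |f'(t)|^2\,dt\Big) dx \le 2\|f\|_{H_0^1}^2, \tag{4}$$
and therefore $\int_{(0,1)} \frac{f(x)\overline{u(x)}}{x(1-x)}\,dx$ converges, since $u \in H_0^1(0,1)$ as well (apply (4) to $f$ and to $u$, and use the Cauchy-Schwarz inequality). This implies that $u''\overline{u}$ is summable on $(0,1)$ and (3) is verified.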
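Inequality (4) is a Hardy-type bound: for $f \in H_0^1(0,1)$ the weighted integral $\int_{(0,1)} |f|^2/(x(1-x))\,dx$ is controlled by $\|f'\|_{L^2}^2$. The Python sketch below is only an illustration, assuming NumPy; the helper `hardy_ratio` and the test functions are our own choices. It evaluates the ratio of the two sides with a midpoint rule for a few functions vanishing at 0 and 1. The splitting argument in (4) actually gives the slightly sharper constant $2\ln 2$, because $\int_0^{1/2} \frac{dx}{1-x} = \ln 2$.

```python
import numpy as np

def hardy_ratio(f, fprime, n=200_000):
    """Return (int_0^1 |f|^2 / (x(1-x)) dx) / (int_0^1 |f'|^2 dx) via the midpoint rule."""
    x = (np.arange(n) + 0.5) / n        # midpoints never hit the singular endpoints
    dx = 1.0 / n
    lhs = np.sum(np.abs(f(x)) ** 2 / (x * (1.0 - x))) * dx
    rhs = np.sum(np.abs(fprime(x)) ** 2) * dx
    return lhs / rhs

# Hypothetical test functions in H_0^1(0,1), each paired with its derivative.
examples = {
    "x(1-x)": (lambda x: x * (1 - x), lambda x: 1 - 2 * x),
    "sin(pi x)": (lambda x: np.sin(np.pi * x), lambda x: np.pi * np.cos(np.pi * x)),
    "x^2(1-x)": (lambda x: x**2 * (1 - x), lambda x: 2 * x - 3 * x**2),
}
ratios = {name: hardy_ratio(f, fp) for name, (f, fp) in examples.items()}
for name, r in ratios.items():
    print(f"{name:>10}: ratio = {r:.4f}   (bound in (4): 2; 2 ln 2 = {2 * np.log(2):.4f})")
```

For $x(1-x)$ the ratio is exactly $(1/6)/(1/3) = 1/2$, and for $x^2(1-x)$ it is $(1/20)/(2/15) = 3/8$, so the numbers can be checked by hand.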
On the other hand, this also implies that both limits
$$\lim_{x\to0^+} u'(x)\overline{u(x)}, \qquad \lim_{x\to1^-} u'(x)\overline{u(x)}$$
exist and belong to $\mathbb{C}$. In order to show that they vanish, we prove that, for all $u \in D(A)$, the limits
$$\lim_{x\to0^+} u'(x), \qquad \lim_{x\to1^-} u'(x)$$
exist and are in $\mathbb{C}$. To see this, observe that $x(1-x)u'' = g \in H_0^1(0,1)$, so that $g(x) = \int_0^x g'(t)\,dt$, $x \in (0,1)$, gives $|g(x)| \le C\sqrt{x}$ for a suitable constant $C$ depending on $u$. Analogously, $g(x) = -\int_x^1 g'(t)\,dt$ yields $|g(x)| \le C\sqrt{1-x}$, $x \in (0,1)$. Hence $0 \le y \le x \le \frac12$ implies
$$|u'(x) - u'(y)| = \Big|\int_y^x u''(t)\,dt\Big| = \Big|\int_y^x \frac{g(t)}{t(1-t)}\,dt\Big| \le \int_y^x \frac{|g(t)|}{t(1-t)}\,dt \le 2C\int_y^x \frac{dt}{\sqrt{t}} = 4C(\sqrt{x} - \sqrt{y}) \to 0$$
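as $x, y \to 0^+$. The same argument ensures that any function $u$ satisfying our equation admits $\lim_{x\to1^-} u'(x) \in \mathbb{C}$. Since $u(0) = u(1) = 0$, the boundary term in (3) vanishes, and (2) is proved. Multiplying (1') by $\overline{u(x)}$, integrating on $(0,1)$ and taking real and imaginary parts, we deduce that
$$\operatorname{Re}\lambda\int_{(0,1)} \frac{|u(x)|^2}{x(1-x)}\,dx + \|u'\|_{L^2}^2 = \operatorname{Re}\int_{(0,1)} \frac{f(x)\overline{u(x)}}{x(1-x)}\,dx,$$
$$|\operatorname{Im}\lambda|\int_{(0,1)} \frac{|u(x)|^2}{x(1-x)}\,dx = \Big|\operatorname{Im}\int_{(0,1)} \frac{f(x)\overline{u(x)}}{x(1-x)}\,dx\Big|.$$
This yields
$$(\operatorname{Re}\lambda + |\operatorname{Im}\lambda|)\int_{(0,1)} \frac{|u(x)|^2}{x(1-x)}\,dx + \|u'\|_{L^2}^2 \le 2\Big(\int_{(0,1)} \frac{|f(x)|^2}{x(1-x)}\,dx\Big)^{1/2}\Big(\int_{(0,1)} \frac{|u(x)|^2}{x(1-x)}\,dx\Big)^{1/2}.$$
Consequently
$$|\lambda|\Big(\int_{(0,1)} \frac{|u(x)|^2}{x(1-x)}\,dx\Big)^{1/2} \le 2\Big(\int_{(0,1)} \frac{|f(x)|^2}{x(1-x)}\,dx\Big)^{1/2} \le 4\|f\|_{H_0^1(0,1)}.$$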
On the other hand, multiplying (1) by $-\overline{u''(x)}$ and integrating on $(0,1)$ we have
$$-\lambda\int_{(0,1)} u(x)\overline{u''(x)}\,dx + \int_{(0,1)} x(1-x)|u''(x)|^2\,dx = -\int_{(0,1)} f(x)\overline{u''(x)}\,dx;$$
in view of (3), this reads
$$\lambda\int_{(0,1)} |u'(x)|^2\,dx + \int_{(0,1)} x(1-x)|u''(x)|^2\,dx = -\int_{(0,1)} f(x)\overline{u''(x)}\,dx. \tag{5}$$
Taking real and imaginary parts in (5), we easily obtain
$$(\operatorname{Re}\lambda + |\operatorname{Im}\lambda|)\|u'\|_{L^2}^2 + \int_{(0,1)} x(1-x)|u''(x)|^2\,dx \le 2\Big|\int_{(0,1)} f(x)\overline{u''(x)}\,dx\Big| = 2\Big|\int_{(0,1)} \frac{f(x)}{\sqrt{x(1-x)}}\sqrt{x(1-x)}\,\overline{u''(x)}\,dx\Big|$$
$$\le 4\|f\|_{H_0^1(0,1)}\Big(\int_{(0,1)} x(1-x)|u''(x)|^2\,dx\Big)^{1/2}.$$
Notice that $x(1-x)u'' \in H_0^1(0,1)$ implies
$$\int_{(0,1)} \frac{|x(1-x)u''(x)|^2}{x(1-x)}\,dx = \int_{(0,1)} x(1-x)|u''(x)|^2\,dx < +\infty,$$
by the above remark (4). Hence
$$\Big(\int_{(0,1)} x(1-x)|u''(x)|^2\,dx\Big)^{1/2} \le 4\|f\|_{H_0^1(0,1)}.$$
Moreover, since $\int_{(0,1)} f(x)\overline{u''(x)}\,dx$ is convergent by
$$\int_{(0,1)} f(x)\overline{u''(x)}\,dx = \int_{(0,1)} \frac{f(x)}{\sqrt{x(1-x)}}\sqrt{x(1-x)}\,\overline{u''(x)}\,dx$$
and the Cauchy-Schwarz inequality, we deduce that
$$\int_{(0,1)} f(x)\overline{u''(x)}\,dx = \big[f(x)\overline{u'(x)}\big]_{x=0}^{x=1} - \int_{(0,1)} f'(x)\overline{u'(x)}\,dx,$$
and then both limits $\lim_{x\to0^+} f(x)\overline{u'(x)}$ and $\lim_{x\to1^-} f(x)\overline{u'(x)}$ exist. Since $f \in H_0^1(0,1)$ and both limits $\lim_{x\to0^+} u'(x)$ and $\lim_{x\to1^-} u'(x)$ belong to $\mathbb{C}$, we conclude that for all $u$ verifying (1) we have
$$\int_{(0,1)} f(x)\overline{u''(x)}\,dx = -\int_{(0,1)} f'(x)\overline{u'(x)}\,dx.$$
Therefore, from (5), rewritten as
$$\lambda\|u'\|_{L^2}^2 + \int_{(0,1)} x(1-x)|u''(x)|^2\,dx = \int_{(0,1)} f'(x)\overline{u'(x)}\,dx,$$
we deduce the a priori bound
$$\|u\|_{H_0^1(0,1)} \le \frac{c}{|\lambda|}\,\|f\|_{H_0^1(0,1)}, \qquad \operatorname{Re}\lambda > 0,$$
for some absolute constant $c$. The preceding arguments show that $A$ is symmetric on $H_0^1(0,1)$ too, since
$$\langle Au, v\rangle = \int_{(0,1)} (x(1-x)u'')'(x)\,\overline{v'(x)}\,dx = -\int_{(0,1)} x(1-x)u''(x)\overline{v''(x)}\,dx = \int_{(0,1)} u'(x)\,\overline{(x(1-x)v'')'(x)}\,dx = \langle u, Av\rangle$$
for all $u, v \in D(A)$. Moreover
$$\langle Au, u\rangle = -\int_{(0,1)} x(1-x)|u''(x)|^2\,dx \le 0,$$
so that $A$ is nonpositive. On the other hand, we observe that, for all $u, v \in D(A)$,
$$\langle (I - A)u, v\rangle_{H_0^1} = \int_{(0,1)} u'(x)\overline{v'(x)}\,dx + \int_{(0,1)} x(1-x)u''(x)\overline{v''(x)}\,dx.$$
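The identities $\langle Au, v\rangle = \langle u, Av\rangle$ and $\langle Au, u\rangle \le 0$ have a transparent finite-difference analogue. In the sketch below (NumPy assumed; the discretization is our own illustrative choice, not taken from the paper) $A$ is replaced by $\operatorname{diag}(x_i(1-x_i))D_2$, where $D_2$ is the Dirichlet second-difference matrix, and the $H_0^1$ inner product $\int u'\bar v'$ by $u^T K v$ with $K = -hD_2$. Then $KA = -hD_2\operatorname{diag}(a)D_2$ is symmetric and negative definite, mirroring the computation above. Since $A$ is similar to the symmetric matrix $\operatorname{diag}(\sqrt a)\,D_2\operatorname{diag}(\sqrt a)$, its spectrum is real; its largest eigenvalue is exactly $-2$, with eigenvector $x(1-x)$, because $x(1-x)\,(x(1-x))'' = -2\,x(1-x)$ and second differences are exact on quadratics.

```python
import numpy as np

def discretize(n):
    """Interior grid x_i = i/n (u_0 = u_n = 0), weights a_i = x_i(1 - x_i),
    the matrix of u -> x(1-x)u'' and the Gram matrix K of <u, v> = int u' v'."""
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)[1:-1]
    a = x * (1.0 - x)
    m = n - 1
    D2 = (np.diag(-2.0 * np.ones(m)) + np.diag(np.ones(m - 1), 1)
          + np.diag(np.ones(m - 1), -1)) / h**2          # Dirichlet second differences
    A = np.diag(a) @ D2
    K = -h * D2        # u^T K v = sum_i (u_{i+1} - u_i)(v_{i+1} - v_i) / h
    return x, a, D2, A, K

x, a, D2, A, K = discretize(100)
G = K @ A                                             # matrix of the form <Au, v>
S = np.diag(np.sqrt(a)) @ D2 @ np.diag(np.sqrt(a))    # symmetric and similar to A
print("KA symmetric:", np.allclose(G, G.T))
print("max eigenvalue of KA:", np.linalg.eigvalsh(0.5 * (G + G.T)).max())
print("top of spectrum of A:", np.linalg.eigvalsh(S).max())
```

Working with $S$ instead of calling a general eigenvalue routine on $A$ avoids spurious imaginary parts from rounding, since $S$ is exactly symmetric.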
Let us introduce the Hilbert space $V$ defined by
$$V := \Big\{u \in H_0^1(0,1) \;\Big|\; \int_{(0,1)} x(1-x)|u''(x)|^2\,dx < \infty\Big\}.$$
It coincides with the completion of $C_c^\infty(0,1)$ with respect to the norm
$$\|u\|_V^2 := \int_{(0,1)} |u'(x)|^2\,dx + \int_{(0,1)} x(1-x)|u''(x)|^2\,dx.$$
To see this, note that for $u \in V$,
$$\int_{(0,1)} |u'(x)|^2\,dx = \operatorname{Re}\int_{(0,1)} -\overline{u(x)}\,u''(x)\,dx = \operatorname{Re}\int_{(0,1)} \frac{-\overline{u(x)}}{\sqrt{x(1-x)}}\sqrt{x(1-x)}\,u''(x)\,dx$$
$$\le \Big[\int_{(0,1)} \frac{|u(x)|^2}{x(1-x)}\,dx\Big]^{1/2}\Big[\int_{(0,1)} |u''(x)|^2\,x(1-x)\,dx\Big]^{1/2} < \infty,$$
by (4) and the Cauchy-Schwarz inequality. The sesquilinear form
$$a(u, v) := \int_{(0,1)} u'(x)\overline{v'(x)}\,dx + \int_{(0,1)} x(1-x)u''(x)\overline{v''(x)}\,dx$$
is continuous on $V \times V$ and coercive, because $\|u\|_V^2 = a(u,u)$. It follows that the operator $\mathcal{B}$ associated to $a(\cdot,\cdot)$ (see [13], Theorems 2.22 and 2.23, pp. 28-29) is an isomorphism from $V$ to its dual $V^*$, and the part $B$ of $\mathcal{B}$ in $H_0^1(0,1)$ is positive definite and self-adjoint. Since
$$D(B) = \{u \in V \mid \mathcal{B}u \in H_0^1(0,1)\},$$
the operator $B$ is precisely $I - A$, so that $B = I - A$ is onto $H_0^1(0,1)$.
□

Corollary 1. The operator $W$ defined by
$$D(W) := \{u \in H^1(0,1) \mid u'' \text{ exists in the sense of distributions and } x(1-x)u'' \in H_0^1(0,1)\}, \qquad Wu := x(1-x)u'', \quad u \in D(W),$$
generates an analytic semigroup on $H^1(0,1)$.

Proof. In order to solve the equation
$$\lambda w - x(1-x)w'' = f \in H^1(0,1) \tag{6}$$
with $w \in D(W)$, we notice that $f \in C[0,1]$ and, hence, we can introduce
$$h(x) := f(x) - (1-x)f(0) - xf(1), \qquad x \in [0,1],$$
which obviously belongs to $H_0^1(0,1)$. As a consequence of Theorem 1, we can affirm that, for all $\lambda$ with $\operatorname{Re}\lambda + |\operatorname{Im}\lambda| \ge \varepsilon_0 > 0$, there exists a unique $u \in D(A)$ such that
$$\lambda u(x) - x(1-x)u''(x) = h(x). \tag{7}$$
This means that $u \in H_0^1(0,1)$ and $x(1-x)u'' \in H_0^1(0,1)$. But (7) can be rewritten as
$$\lambda\Big(u(x) + \frac{1-x}{\lambda}f(0) + \frac{x}{\lambda}f(1)\Big) - x(1-x)\Big(u + \frac{1-x}{\lambda}f(0) + \frac{x}{\lambda}f(1)\Big)''(x) = f(x)$$
and, thus, $w := u + \frac{1-x}{\lambda}f(0) + \frac{x}{\lambda}f(1) \in H^1(0,1)$ solves precisely (6) with Wentzell boundary conditions. Moreover
$$\|w\|_{H^1} \le \|u\|_{H^1} + \Big\|\frac{1-x}{\lambda}f(0) + \frac{x}{\lambda}f(1)\Big\|_{H^1}$$
is then estimated via the Sobolev imbedding $H^1(0,1) \hookrightarrow C[0,1]$. Since uniqueness easily follows from Theorem 1 too, this concludes the proof.

□
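The reduction in the proof of Corollary 1 (subtract the linear interpolant of $f$, solve the Dirichlet problem (7), then add back $\frac{1-x}{\lambda}f(0) + \frac{x}{\lambda}f(1)$) can be mirrored numerically. In the sketch below (NumPy assumed; the grid, $\lambda$ and $f$ are arbitrary illustrative choices) the coefficient $x(1-x)$ vanishes at the endpoints, so the boundary rows of the discrete equation automatically read $\lambda w(j) = f(j)$, a discrete form of the Wentzell condition. Both routes give the same discrete solution, and the discrete maximum principle gives $\|w\|_\infty \le \|f\|_\infty/\lambda$.

```python
import numpy as np

def wentzell_matrix(lam, n):
    """Matrix of w -> lam*w - x(1-x) w'' on the grid x_i = i/n. Since x(1-x)
    vanishes at x = 0, 1, the boundary rows reduce to lam*w_j = f_j."""
    x = np.linspace(0.0, 1.0, n + 1)
    h = 1.0 / n
    a = x * (1.0 - x)
    M = lam * np.eye(n + 1)
    for i in range(1, n):
        c = a[i] / h**2
        M[i, i - 1] -= c
        M[i, i] += 2.0 * c
        M[i, i + 1] -= c
    return x, M

lam, n = 2.0, 400
x, M = wentzell_matrix(lam, n)
f = np.cos(3 * x) + 2.0                  # an arbitrary smooth f in H^1(0,1)

# Direct solve of (6): lam*w - x(1-x)w'' = f with the Wentzell condition.
w = np.linalg.solve(M, f)

# Route of the proof: h = f minus its linear interpolant lies in H_0^1;
# solve the Dirichlet problem (7) for u, then add ((1-x)f(0) + x f(1))/lam.
h_rhs = f - (1 - x) * f[0] - x * f[-1]
u = np.zeros_like(x)
u[1:-1] = np.linalg.solve(M[1:-1, 1:-1], h_rhs[1:-1])
w_proof = u + ((1 - x) * f[0] + x * f[-1]) / lam

print("max |w - w_proof|:", np.abs(w - w_proof).max())
print("lam*w at endpoints:", lam * w[0], lam * w[-1], "f at endpoints:", f[0], f[-1])
print("max principle holds:", np.abs(w).max() <= np.abs(f).max() / lam + 1e-9)
```

The two routes agree because second differences of a linear function vanish, exactly as $(\frac{1-x}{\lambda}f(0) + \frac{x}{\lambda}f(1))'' = 0$ in the proof.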
Remark 1. As a by-product, we derive the following regularity property for a related degenerate differential operator with Neumann boundary conditions. We could restrict ourselves here to $a(x) := x(1-x)$, but all subsequent arguments work in the case that $a \in C^1[0,1]$, $a > 0$ on $(0,1)$ and $a(0) = a(1) = 0$; so we assume these conditions on $a$ in what follows. Actually, in [5] a detailed study is given for more general operators. Here we merely give a simple, direct approach. Let us introduce the operator $(B, D(B))$ on $L^2(0,1)$ given by
$$D(B) := \{u \in L^2(0,1) \mid u \text{ is locally absolutely continuous in } (0,1),\ au' \in H_0^1(0,1)\}, \qquad Bu := \frac{d}{dx}\Big(a\frac{du}{dx}\Big).$$
Then $B$ is a closed densely defined operator on $L^2(0,1)$. From Corollary 1 we deduce that for all $f \in H^1(0,1)$ there exists a unique $u \in H^1(0,1)$ such that $au'' \in H_0^1(0,1)$ and
$$\lambda u(x) - a(x)u''(x) = f(x), \qquad x \in (0,1);$$
it follows that
$$\lambda u' - (au'')' = f' \in L^2(0,1).$$
Then $v(x) := u'(x) \in L^2(0,1)$ satisfies
$$\lambda v(x) - (av')'(x) = f'(x),$$
where $av' \in H_0^1(0,1)$, and $v$ is the unique solution to $\lambda v - Bv = f'$. Thus, let $g \in L^2(0,1)$, and observe that $f(x) := \int_0^x g(t)\,dt \in H^1(0,1)$ and that all functions $h \in H^1(0,1)$ with $g = h'$ reduce to $h = f + c$, with an arbitrary constant $c$. Hence, if
$$\lambda v - (av')' = g = (f + c)', \qquad v \in D(B),$$
then necessarily $v = (u + \frac{c}{\lambda})' = u'$, where $u + \frac{c}{\lambda}$ satisfies, for all $c$ and $\operatorname{Re}\lambda > 0$,
$$\lambda\Big(u + \frac{c}{\lambda}\Big) - a\Big(u + \frac{c}{\lambda}\Big)'' = f + c, \qquad a(x)\Big(u(x) + \frac{c}{\lambda}\Big)'' \to 0$$
for $x \to 0^+$, $x \to 1^-$. Therefore, taking into account that
$$|f(x)| \le \|g\|_{L^2}, \qquad x \in (0,1),$$
and, hence, …, we conclude that … This proves the following
Corollary 2. The operator $(B, D(B))$, defined as above, generates a uniformly bounded $C_0$-semigroup on $L^2(0,1)$, analytic in the right half plane.

3. Analytic semigroups in $W^{1,p}(0,1)$ ($1 < p < \infty$) and differentiable semigroups in $C^1[0,1]$

The technique of passing from $Au = (au')'$ (where $a \in C^1[0,1]$), with generalized Neumann boundary conditions of the form
$$a(x)u'(x) \to 0 \qquad (\text{as } x \to 0, 1),$$
to $Bu = au''$ with Wentzell boundary conditions, by replacing $u(x)$ by $v(x) := \int_0^x u(y)\,dy$, enables us to translate properties of $A$ ($= A_p$) on $L^p(0,1)$ investigated in [5] to corresponding properties of $B$ ($= B_p$) on $W^{1,p}(0,1)$. The following result is a particular case of [5], Theorem 2.9, reestablished here in a direct way.

Proposition 1. Let $a \in C^1[0,1]$ with $a > 0$ on $(0,1)$ and $a(0) = 0 = a(1)$. If $(A_p, D(A_p))$ is defined by
$$D(A_p) := \{u \in W^{1,p}_{loc}(0,1) \cap L^p(0,1) \mid au' \in W_0^{1,p}(0,1)\}, \qquad A_pu := (au')',$$
then $(A_p, D(A_p))$ generates a $C_0$-analytic semigroup on $L^p(0,1)$, for $1 < p < \infty$.
Proof. As a consequence of Corollary 2, the assertion is already true for $p = 2$. Now, let us examine the case $p > 2$. If $f \in L^p(0,1)$ and $\lambda \in \mathbb{C}$, $\operatorname{Re}\lambda > 0$, there exists a unique $u \in D(A_p)$ such that
$$\lambda u - (au')' = f. \tag{8}$$
Let us multiply (8) by $\overline{u}|u|^{p-2}$ and integrate from 0 to 1. Thus we obtain
$$\lambda\|u\|_p^p - \int_{(0,1)} (au')'(x)\,\overline{u(x)}\,|u|^{p-2}(x)\,dx = \int_{(0,1)} f(x)\overline{u(x)}\,|u|^{p-2}(x)\,dx. \tag{9}$$
Defining
$$\beta := \int_{(0,1)} f(x)\overline{u(x)}\,|u|^{p-2}(x)\,dx, \qquad I := -\int_{(0,1)} (au')'(x)\,\overline{u(x)}\,|u|^{p-2}(x)\,dx,$$
we can rewrite the equality (9) as follows:
$$\lambda\|u\|_p^p + I = \beta.$$
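By Hölder's inequality,
$$|\beta| \le \|f\|_p\,\|u\|_p^{p-1},$$
where $\frac1p + \frac1{p'} = 1$. Also, integration by parts yields
$$I = \int_{(0,1)} (au')(x)\big(\overline{u}(u\overline{u})^{\frac{p-2}{2}}\big)'(x)\,dx = \int_{(0,1)} a(x)|u'(x)|^2|u|^{p-2}(x)\,dx + \frac{p-2}{2}\int_{(0,1)} a(x)u'(x)\overline{u(x)}\,|u|^{p-4}(x)\big(u'(x)\overline{u(x)} + u(x)\overline{u'(x)}\big)\,dx$$
$$= \int_{(0,1)} a(x)|u'(x)|^2|u|^{p-2}(x)\,dx + (p-2)\int_{(0,1)} a(x)|u|^{p-4}(x)\,(u'\overline{u})(x)\,\operatorname{Re}(u'\overline{u})(x)\,dx.$$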
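Taking real and imaginary parts in (9), we deduce, respectively, that
$$(\operatorname{Re}\lambda)\|u\|_p^p + \int_{(0,1)} a(x)|u'|^2(x)|u|^{p-2}(x)\,dx + (p-2)\int_{(0,1)} a(x)|u|^{p-4}(x)\big(\operatorname{Re}(u'\overline{u})(x)\big)^2\,dx = \operatorname{Re}\int_{(0,1)} (f\overline{u}|u|^{p-2})(x)\,dx \le \|f\|_p\|u\|_p^{p-1}$$
and
$$|\operatorname{Im}\lambda|\,\|u\|_p^p + (p-2)\int_{(0,1)} a(x)|u|^{p-4}(x)\operatorname{sign}(\operatorname{Im}\lambda)\operatorname{Im}(u'\overline{u})(x)\operatorname{Re}(u'\overline{u})(x)\,dx = \operatorname{sign}(\operatorname{Im}\lambda)\int_{(0,1)} \operatorname{Im}(f\overline{u}|u|^{p-2})(x)\,dx \le \|f\|_p\|u\|_p^{p-1}.$$
Thus, for $0 < \varepsilon \le 1$, adding the first relation to $\varepsilon$ times the second and using $|\operatorname{Im}(u'\overline{u})\operatorname{Re}(u'\overline{u})| \le |u'|^2|u|^2$, it follows that
$$\varepsilon|\lambda|\,\|u\|_p^p \le (\operatorname{Re}\lambda + \varepsilon|\operatorname{Im}\lambda|)\|u\|_p^p \le \big(-1 + \varepsilon(p-2)\big)\int_{(0,1)} a(x)|u'|^2(x)|u|^{p-2}(x)\,dx + (1 + \varepsilon)\|f\|_p\|u\|_p^{p-1} \le 2\|f\|_p\|u\|_p^{p-1},$$
provided that $\varepsilon \le \frac{1}{p-2}$. Since $p > 2$, choosing $\varepsilon = \min\{1, \frac{1}{p-2}\}$ we obtain
$$|\lambda|\,\|u\|_p \le c_p\|f\|_p,$$
where $c_p = 2(p-2)$ for $p \ge 3$ (and $c_p = 2$ for $2 < p < 3$), hence $c_p \to \infty$ as $p \to \infty$; this holds for all $f \in L^2(0,1) \cap L^\infty(0,1)$ and all $\lambda$ with $\operatorname{Re}\lambda > 0$.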
Now, we assume that $1 < p < 2$ and observe that, by duality, we have $\langle Au, v\rangle = \langle u, Av\rangle$, where $Au = (au')'$ with boundary conditions $a(x)u'(x) \to 0$ as $x \to 0, 1$, and $u, v$ are in various spaces. Thus, formally, $A_p^* = A_{p'}$, where $A_p$ (resp. $A_p^*$) acts on $L^p(0,1)$ (resp. $L^{p'}(0,1)$) and $p^{-1} + (p')^{-1} = 1$, for $1 < p < \infty$. Since
$$\|(\lambda - A^*)^{-1}\| = \|(\overline{\lambda} - A)^{-1}\|,$$
this second case on $p$ can be deduced from the first case and the estimates used in its proof. Moreover, the fact that $R(\lambda I - A)$ (for $\operatorname{Re}\lambda > 0$) is dense in $L^p(0,1)$ follows from the $L^2$ case.
□

Let $1 < p < \infty$ and define $B_pu := au''$ for $u \in D(B_p)$, where
$$D(B_p) := \{u \in W^{1,p}(0,1) \cap W^{2,p}_{loc}(0,1) \mid B_pu \in W^{1,p}(0,1) \text{ and } \lim_{x\to0,1} a(x)u''(x) = 0\},$$
i.e., $B_p$ is equipped with the Wentzell boundary conditions. Thus, we can prove the following.

Theorem 2. $(B_p, D(B_p))$ (for $1 < p < \infty$) generates a $C_0$ analytic semigroup on $W^{1,p}(0,1)$.
Proof. Let $1 < p < \infty$ and $F \in W^{1,p}(0,1)$ with $F(0) = F(1) = 0$, and take $f = F' \in L^p(0,1)$. From Proposition 1, for every $\lambda \in \mathbb{C}$, $\operatorname{Re}\lambda > 0$, there is $v \in D(A_p)$ such that
$$\lambda v - (av')' = f \tag{10}$$
with $a(x)v'(x) \to 0$ as $x \to 0, 1$. Let $u(x) := \int_0^x v(s)\,ds$. Then $u' = v$, and integrating (10) from 0 to $x$ we deduce that $\lambda u - au'' = F$ and $(av')(x) = (au'')(x) \to 0$ as $x \to 0, 1$. Thus $u \in D(B_p)$ with $(\lambda - B_p)u = F$ and
$$\|u'\|_p = \|v\|_p \le \frac{c_p}{|\lambda|}\|f\|_p = \frac{c_p}{|\lambda|}\|F'\|_p.$$
Since $\|u'\|_p$ is equivalent to the usual norm of $u$ in $W_0^{1,p}(0,1)$, by the Poincaré inequality, it follows that $B_p$ (suitably restricted) generates an analytic semigroup on $W_0^{1,p}(0,1)$. The extension of this result from $W_0^{1,p}(0,1)$ to $W^{1,p}(0,1)$ follows by the same argument used to extend the $p = 2$ case from $H_0^1(0,1)$ to $H^1(0,1)$, so we may safely omit the details (see the proof of Corollary 1). □

Using the same type of approach, new results can also be obtained in the space $C^1[0,1]$, as the following theorem shows.
Theorem 3. Under the same assumptions as in Proposition 1, the operator $(B, D(B))$ given by
$$D(B) := \{w \in C^1[0,1] \cap C^2(0,1) \mid aw'' \in C^1[0,1]\}, \qquad Bw := aw'',$$
generates a $C_0$ differentiable semigroup on $C^1[0,1]$.

Proof. Let us observe that, in view of [5], Theorem 3.3, the operator $(A, D_\infty(A))$, where … and $Au := (au')'$, generates a $C_0$ differentiable contraction semigroup on $C[0,1]$. Now, let $F \in C^1[0,1]$, $\lambda > 0$, and consider the equation
$$\lambda u - (au')' = F'. \tag{11}$$
Since $F' \in C[0,1]$, it has a unique solution $u \in D_\infty(A)$, with
$$\|u\|_{C[0,1]} \le \frac{1}{\lambda}\|F'\|_{C[0,1]}.$$
Hence, integrating (11) from 0 to $x$, we deduce that
$$\lambda\int_{(0,x)} u(y)\,dy - a(x)u'(x) = F(x) - F(0),$$
namely
$$\lambda\Big[\int_{(0,x)} u(y)\,dy + \frac{F(0)}{\lambda}\Big] - a(x)\frac{d^2}{dx^2}\Big[\int_{(0,x)} u(y)\,dy + \frac{F(0)}{\lambda}\Big] = F(x).$$
Let
$$w(x) := \int_{(0,x)} u(y)\,dy + \frac{F(0)}{\lambda}.$$
Then $w \in C^1[0,1]$. Moreover, $a(x)w''(x) = a(x)u'(x) \to 0$ as $x \to 0^+, 1^-$ and $aw'' \in C^1[0,1]$; hence $w \in D(B)$. We also notice that, if
$$\lambda w - aw'' = 0, \qquad w \in D(B),$$
then
$$\lambda w' - \frac{d}{dx}\big(a(w')'\big) = 0, \qquad w' \in C[0,1],$$
and this implies that $w' = 0$, hence $w(x) = \text{const}$. Consequently $w'' = 0$ and therefore $w = 0$. Thus uniqueness holds. Now we must estimate the norm of $w$. To this aim, we observe that the norm
$$\|w\|_1 := \max\{|w(0)|, \|w'\|_{C[0,1]}\}$$
is equivalent to the usual norm
because obviously $\|w\|_1 \le \|w\|_{C^1}$ and, on the other hand,
$$w(x) = w(0) + \int_{(0,x)} w'(t)\,dt$$
implies that for every $x \in [0,1]$
$$|w(x)| \le |w(0)| + \int_{(0,x)} |w'(t)|\,dt \le |w(0)| + \|w'\|_{C[0,1]} \le 2\|w\|_1.$$
Hence $\|w\|_{C^1} \le \max\{2\|w\|_1, \|w'\|_{C[0,1]}\} \le 2\|w\|_1$. Let us come back to our estimate:
$$\|w\|_1 = \max\{|w(0)|, \|w'\|_{C[0,1]}\} = \max\Big\{\frac{|F(0)|}{\lambda}, \|u\|_{C[0,1]}\Big\} \le \frac{1}{\lambda}\max\{|F(0)|, \|F'\|_{C[0,1]}\} = \frac{1}{\lambda}\|F\|_1.$$
Then $(B, D(B))$ generates a $C_0$-contraction semigroup on $C^1[0,1]$ (with respect to the norm $\|\cdot\|_1$). Moreover, for $\lambda$ in a suitable region $\Sigma$ as described in [12], Theorem 4.7, p. 54, there exists $c > 0$ such that
$$\|u\|_{C[0,1]} \le c(1 + |\operatorname{Im}\lambda|)\|F'\|_{C[0,1]}.$$
On the other hand,
$$|w(0)| = \frac{1}{|\lambda|}|F(0)| \le c(1 + |\operatorname{Im}\lambda|)\|F\|_1.$$
Therefore
$$\max\{|w(0)|, \|w'\|_{C[0,1]}\} \le c(1 + |\operatorname{Im}\lambda|)\|F\|_1,$$
and the semigroup generated by $(B, D(B))$ is differentiable on $C^1[0,1]$, according to the above mentioned result in [12].
□

Let us observe that, if $u \in D_\infty(A)$ (resp. $u \in D(B)$), then $\lim_{x\to0^+,\,x\to1^-} a(x)u'(x) = 0$ (resp. $\lim_{x\to0^+,\,x\to1^-} a(x)u''(x) = 0$). In particular, all previous results hold for $a(x) := x(1-x)m(x)$, where $x \in [0,1]$ and $m \in C^1[0,1]$, with $m(x) > 0$ in $[0,1]$.

Final remarks.
The long-standing conjecture in this area concerns $Au := x(1-x)u''$ with Wentzell boundary conditions. By Clement and Timmermans [6], $A$ generates a $C_0$-contraction semigroup on $C[0,1]$. Is this semigroup analytic? After this work was done, but while the final revisions were being made, G. Metafune kindly provided us with a preprint [11], which states that on $C[0,1]$, $A$ generates a semigroup analytic in the right half plane. Thus, despite the boundary degeneracy, the operator $u \mapsto x(1-x)u''$ with Wentzell boundary conditions generates an analytic semigroup on many spaces of interest.

REFERENCES

1. F. Altomare, M. Campiti, Korovkin-type Approximation Theory and its Applications, de Gruyter Studies in Mathematics 17, Walter de Gruyter & Co., Berlin, New York, 1994.
2. A. Attalienti, S. Romanelli, On some classes of analytic semigroups on C([a,b]) related to R or r admissible mappings, in: Evolution Equations, G. Ferreyra, G.R. Goldstein, F. Neubrander (eds.), Lect. Notes in Pure and Applied Math. 168, M. Dekker, New York, Basel, Hong Kong, 1995, pp. 29-34.
3. V. Barbu, A. Favini, S. Romanelli, Degenerate evolution equations and regularity of their associated semigroups, Funkc. Ekvac. (to appear).
4. H. Brezis, W. Rosenkrantz, B. Singer, On a degenerate elliptic-parabolic equation occurring in the theory of probability, Comm. Pure Appl. Math. 24 (1971), 395-416.
5. M. Campiti, G. Metafune, D. Pallara, Degenerate self-adjoint evolution equations on the unit interval, Semigroup Forum (to appear).
6. Ph. Clement, C.A. Timmermans, On $C_0$-semigroups generated by differential operators satisfying Ventcel's boundary conditions, Indag. Math. 89 (1986), 379-387.
7. R. F. Curtain, H. Zwart, An Introduction to Infinite-Dimensional Linear Systems Theory, Springer, 1995.
8. S. N. Ethier, T. G. Kurtz, Markov Processes, Characterization and Convergence, Wiley Series in Probability and Mathematical Statistics, J. Wiley, 1986.
9. W. Feller, The parabolic differential equations and the associated semi-groups of transformations, Ann. of Math. (2) 55 (1952), 468-519.
10. G. Fichera, On a degenerate evolution problem, in: Partial Differential Equations with Real Analysis, H. Begehr, A. Jeffrey (eds.), Pitman Research Notes in Mathematics Series 263, Longman Scientific and Technical, 1992, pp. 15-42.
11. G. Metafune, Analyticity for some degenerate evolution equations on the unit interval, preprint (1996).
12. A. Pazy, Semigroups of Linear Operators and Applications to Partial Differential Equations, Springer Verlag, Berlin, Heidelberg, Tokyo, 1986.
13. H. Tanabe, Equations of Evolution, Pitman Monographs and Studies in Math., London, San Francisco, Melbourne, 1979.
Degenerate Nonlinear Parabolic Problems: The Influence of Probability Theory
JEROME A. GOLDSTEIN*, Department of Mathematics, Louisiana State University, Baton Rouge, LA 70803, USA
CHIN-YUAN LIN, Department of Mathematics, University of South Carolina, Columbia, S.C. 29208, and Department of Mathematics, National Central University, Chang-Li 320, Republic of China
KUNYANG WANG, Department of Mathematics, Louisiana State University, Baton Rouge, LA 70803, USA
1. INTRODUCTION

Of concern are mixed initial-boundary problems for the nonlinear equation
$$u_t = \varphi(x, u_x)u_{xx} + \psi(x, u, u_x) \tag{1}$$
for $x \in [0,1]$ and $t \ge 0$. Here $\varphi$ is continuous and positive on $(0,1) \times \mathbb{R}$, but $\varphi(x,\xi)$ may approach 0 as $x$ tends to either 0 or 1. Thus the diffusion coefficient may degenerate on the spatial boundary. Problems like this with nonlinear, degenerate diffusion coefficients arise in a variety of contexts in fluid dynamics and elsewhere. The particular example
$$\frac{\partial v}{\partial t} = \frac{1}{y^2}\frac{\partial}{\partial y}\Big[y^4\Big(v + \frac{\partial v}{\partial y} + v^2\Big)\Big] \tag{2}$$
(for $0 < y < \infty$ and $t \ge 0$) arises in physics and reduces to (1) when one sets $u(t,x) = v(t, \tan(\frac{\pi}{2}x))$. The theory of the Kompaneets equation (2) is far from complete; in particular, well-posedness for the Cauchy problem is not yet established. (But see Goldstein [11] for partial results and related references.) A systematic study of (1) was begun by Goldstein and Lin in [12] in 1987 and continued in [13]-[15], [17], [18]. Among the related articles we cite the interesting work of Dorroh and G. R. Goldstein [7], [5], [6], who allow $\varphi = \varphi(x, u, u_x)$ to depend on $u$ as well. But because this case does not admit a global quasi-dissipative estimate, only local existence is established in general; as global existence is our main concern, we restrict our attention to (1) here. Suppose $\varphi(x,\xi) \ge \varphi_0(x)$, where $\varphi_0 \in C(0,1)$ and $1/\varphi_0 \in L^1(0,1)$. Then a variety of (linear and nonlinear) boundary conditions can be associated with (1), and the resulting problem is governed by a (nonlinear) contraction semigroup on $C[0,1]$. (See [12], and also [8] and the remarks in [9] for extensions.) But for a very degenerate case, such as the Kompaneets equation (or its integrated version), where $\varphi(x,\xi)$ behaves like $x^2$ near $x = 0$, the appropriate boundary condition is the Wentzell boundary condition:

* Partially supported by an NSF grant.
Current affiliation: University of Memphis, Memphis, Tennessee.
$$\varphi(x, u_x)u_{xx} + \psi(x, u, u_x) \to 0 \quad \text{as } x \to 0, 1$$
for each $t \ge 0$. (See [13].) In other words, the Cauchy problem takes the form
$$du/dt = Au, \qquad u(0) = u_0,$$
for $u : \mathbb{R}^+ = [0, \infty) \to X = C[0,1]$. Here $f \in D(A)$ iff $f \in X \cap C^2(0,1)$ and $Af \in C_0(0,1)$, that is, $Af \in X$ and $(Af)(x) \to 0$ as $x \to 0, 1$ (where $Af = \varphi(x, f')f'' + \psi(x, f, f')$). This boundary condition is formally equivalent to $du/dt = 0$ on the spatial boundary $\{0, 1\}$, whence the Wentzell boundary condition can be viewed as an inhomogeneous Dirichlet boundary condition where the boundary value (for all $t \ge 0$) is that of the initial function $u_0$. In [13] it was shown that $A$ determined by (1) (with the Wentzell boundary condition) is m-dissipative on $X = C[0,1]$, provided $\psi = 0$ and $\varphi(x,\xi) \ge \varphi_0(x)$, $\varphi_0 \in C[0,1]$, and $\varphi_0 > 0$ on $(0,1)$. Thus $\varphi(x,\xi)$ may approach zero with arbitrary speed as $x \to 0$ or $1$. In [15], an extension was made to a special class of nonzero $\psi$. The motivation for the hypothesis on $\psi$ comes from a beautiful linear result of Clement and Timmermans [3], which can be viewed as the final sharp result in a theory begun by W. Feller in the 1950s. This result is as follows.
Let $\alpha, \beta \in C(0,1)$ with $\alpha > 0$ on $(0,1)$. On $X = C[0,1]$ define $Bu = \alpha u'' + \beta u'$, where $u \in D(B)$ if $u \in C^2(0,1) \cap X$ and $Bu \in C_0(0,1)$, so that $B$ has the Wentzell boundary condition. Then $B$ is densely defined and dissipative. The Clement-Timmermans theorem states that $B$ is m-dissipative (i.e. $\operatorname{Ran}(I - B) = X$) iff both $(A_0)$ and $(A_1)$ hold. Let
$$W(x) = \exp\Big\{-\int_{1/2}^x \frac{\beta(s)}{\alpha(s)}\,ds\Big\}.$$
$(A_0)$ $W \in L^1(0, \frac12)$ or $\int_0^{1/2} W(x)\int_x^{1/2} (\alpha(s)W(s))^{-1}\,ds\,dx = \infty$.
$(A_1)$ $W \in L^1(\frac12, 1)$ or $\int_{1/2}^1 W(x)\int_{1/2}^x (\alpha(s)W(s))^{-1}\,ds\,dx = \infty$.
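To see the criterion at work, take the illustrative coefficients $\alpha(x) = x(1-x)$ and $\beta(x) = b(1-2x)$ with a real parameter $b$ (our own example, not from the text). Since $\frac{d}{ds}\log(s(1-s)) = \frac{1-2s}{s(1-s)}$, one gets $W(x) = (4x(1-x))^{-b}$, so $W \in L^1(0, \frac12)$ exactly when $b < 1$. The Python sketch below (NumPy assumed; helper names are hypothetical) checks the closed form against direct quadrature and shows the partial integrals $\int_\varepsilon^{1/2} W\,dx$ settling down for $b = 1/2$ and blowing up for $b = 3/2$. It only probes the first alternative in $(A_0)$.

```python
import numpy as np

def W_quadrature(x, b, m=20_000):
    """W(x) = exp(-int_{1/2}^x beta/alpha ds) by the trapezoidal rule,
    for the illustrative choice alpha(s) = s(1-s), beta(s) = b(1-2s)."""
    s = np.linspace(0.5, x, m + 1)
    g = b * (1.0 - 2.0 * s) / (s * (1.0 - s))
    ds = (x - 0.5) / m                       # negative when x < 1/2: signed integral
    return np.exp(-ds * (g.sum() - 0.5 * (g[0] + g[-1])))

def W_closed(x, b):
    """Closed form: int_{1/2}^x b(1-2s)/(s(1-s)) ds = b log(4x(1-x))."""
    return (4.0 * x * (1.0 - x)) ** (-b)

def partial_L1(b, eps, m=200_000):
    """int_eps^{1/2} W(x) dx, computed in the variable s = log x to tame x -> 0."""
    s = np.linspace(np.log(eps), np.log(0.5), m + 1)
    g = W_closed(np.exp(s), b) * np.exp(s)  # dx = e^s ds
    ds = (s[-1] - s[0]) / m
    return ds * (g.sum() - 0.5 * (g[0] + g[-1]))

print(W_quadrature(0.1, 0.7), W_closed(0.1, 0.7))
for b in (0.5, 1.5):
    print(b, [partial_L1(b, eps) for eps in (1e-4, 1e-6, 1e-8)])
```

For $b = 3/2$ the partial integrals grow like $\frac14\varepsilon^{-1/2}$, so $W \notin L^1(0, \frac12)$ and the criterion then hinges on the second alternative.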
The idea is best explained in terms of the underlying Markov diffusion process. The drift coefficient $\beta$ is competing with the diffusion coefficient $\alpha$. The purpose of the boundary condition is to instruct the Markov particle how to proceed after it reaches a boundary point $j \in \{0, 1\}$. If the particle cannot reach $j$, then no boundary condition should be assigned at $j$, since doing so restricts the domain of $B$ too much and prevents $I - B$ from being surjective. Condition $(A_j)$ is what ensures that the Markov particles actually reach the boundary point $j$. In the case of (1), …
Here we want to discuss a new result of this type. Details will appear in [16], but it is still not strong enough to cover the Kompaneets equation. Thus, in his thesis [20], K. Wang has sought to approach the Kompaneets equation by its most general linear version. Below we indicate his extension of the Clement-Timmermans result to the context of a generalized linear version of equation (2), namely
$$\frac{\partial v}{\partial t} = \frac{1}{\beta(y)}\frac{\partial}{\partial y}\Big[\alpha(y)\Big(\frac{\partial v}{\partial y} + k(y)v\Big)\Big]. \tag{3}$$
The integrability conditions we impose give rather sharp results. They are sharp in the sense that when our nonlinear equation reduces to a linear one, the sufficient condition becomes necessary as well. This can be achieved by relaxing the hypotheses … conditions and hints on how to make hypotheses giving useful sufficient conditions in terms of integrability conditions. The final results make no mention of probability theory, either in the statements or the proofs. Still, this research could not have been done without the motivational influence of probability theory. It is a pleasure to dedicate this paper to Professor M.M. Rao on his 65th birthday. M.M. is the mathematical father of the first named author and the mathematical grandfather of the other two authors.
2. DEGENERATE NONLINEAR DIFFUSION WITH DRIFT: INTEGRABILITY CONDITIONS

We want to present a precise statement of a new existence theorem for (1). Details will appear in [16]. Comparing (1) with the linear equation
$$u_t = \alpha(x)u_{xx} + \beta(x)u_x$$
treated by Clement and Timmermans [3] (and discussed in Section 1), we view $\varphi$ as corresponding to $\alpha$ and $\psi$ as corresponding to $\beta u_x$. Since $\varphi(x,\xi) \ge \varphi_0(x)$, it may seem more appropriate to view $\varphi_0$ as the analogue of $\alpha$, but our result will emphasize $\varphi$ itself. For simplicity of presentation, we consider the simplified Clement-Timmermans criterion, namely that
$$x \mapsto W(x) = \exp\Big\{-\int_{1/2}^x \beta(s)\alpha(s)^{-1}\,ds\Big\}$$
is in $L^1(0,1)$. (Cf. $(A_0)$, $(A_1)$ of Section 1.) The operator $A$ is defined on $X = C[0,1]$ by
$$(Au)(x) = \varphi(x, u'(x))u''(x) + \psi(x, u(x), u'(x))$$
for $x \in [0,1]$ and $u \in D(A) = \{v \in C^2(0,1) \cap X : Av \in C_0(0,1)\}$, i.e., $Av$ should be in $X$ and should vanish at the endpoints $x = 0, 1$. Thus we view $A$ as being equipped with the Wentzell boundary condition. We now state two minimal sets of hypotheses on the coefficients $\varphi$ and $\psi$.

(B1) $\varphi \in C([0,1] \times \mathbb{R})$; $\varphi(x,\xi) \ge \varphi_0(x)$ and $\varphi_0(x) > 0$ for all $(x,\xi) \in (0,1) \times \mathbb{R}$; and $\varphi_0 \in C[0,1]$.

(B2) $\psi \in C([0,1] \times \mathbb{R}^2)$; $\psi(x,\eta,\xi)$ is non-increasing in $\eta$ for each fixed $(x,\xi)$; $\psi(x,0,0) = 0$ for all $x \in [0,1]$; and for all $r > 0$ there is a constant $K(r)$ such that
$$|\psi(x,\eta,\xi)| \le K(r)(1 + |\xi|)$$
for all $(x,\eta,\xi) \in [0,1] \times [-r,r] \times \mathbb{R}$.
The (non-increasing) monotonicity of $\eta \mapsto \psi(x,\eta,\xi)$ can be weakened to monotonicity of $\eta \mapsto \psi(x,\eta,\xi) - \omega\eta$ for some real $\omega$. The drift coefficient $\psi(x,\eta,\xi)$ can have arbitrary growth in $\eta$, but $\psi$ is restricted to have linear growth in $\xi$. The arbitrary growth in $\eta$ is, however, illusory. By the maximum principle, solutions $u$ should satisfy
$$\sup_{x,t}|u(x,t)| \le \sup_x|u(x,0)|,$$
and so the only relevant values of $\eta$ (for fixed $u(\cdot,0)$) correspond to a bounded interval.

Let (B1), (B2) hold. Suppose also that $\varphi \ge c$ for some $c > 0$, i.e., $\varphi_0(0) > 0$ and $\varphi_0(1) > 0$. Then $A$ is densely defined and m-dissipative, and so $A$ determines a semigroup $T = \{T(t) : t \ge 0\}$ by the Crandall-Liggett-Benilan theorem [4], [2], [1]. This implies that for all $f \in \overline{D(A)} = X$,
$$u(t) = T(t)f = \lim_{n\to\infty}\Big(I - \frac{t}{n}A\Big)^{-n}f$$
exists (for all $t \ge 0$) and defines the unique mild solution of
$$du(t)/dt = A(u(t)), \qquad u(0) = f.$$
This is the unique solution (in a suitable generalized sense) of
$$u_t = \varphi(x, u_x)u_{xx} + \psi(x, u, u_x), \qquad u(x,0) = f(x), \tag{4}$$
$$\varphi(x, u_x)u_{xx} + \psi(x, u, u_x) \to 0 \quad \text{as } x \to 0, 1. \tag{5}$$
Moreover, $\|u(t) - v(t)\|_\infty$ is non-increasing in $t$ for all solutions $u, v$ of the above problem (4), (5) corresponding to initial conditions $f, g$ respectively. The Wentzell boundary condition (5) means that $u_t = 0$ on the spatial boundary $\{0, 1\}$; hence $u(j,t) = f(j)$ for all $t \ge 0$ and $j = 0, 1$. To verify the hypotheses of the Crandall-Liggett theorem, we must check three conditions.
(C1) For some $\lambda > 0$ and all $h$ in some dense set in $X$, there is a $u$ in $D(A)$ satisfying $u - \lambda Au = h$.

(C2) If $u_i - \lambda Au_i = h_i$ holds for $i = 1, 2$, where $h_i \in X$ and $\lambda > 0$, then $\|u_1 - u_2\|_\infty \le \|h_1 - h_2\|_\infty$.

(C3) The graph of $A$ is closed in $X \times X$, and $D(A)$ is dense in $X$.

The last condition is easy and we will not discuss it further. (C2) is the dissipativity of $A$, and (C1) is the range condition. (C1) is the hard part, so we begin with (C2).
Let (B1), (B2) hold and let $h_i, \lambda, u_i$ be as in (C2). Let $u = u_1 - u_2$ (or $u_2 - u_1$ if necessary). Choose $x_0 \in [0,1]$ such that $u(x_0) = \|u\|_\infty$. If $0 < x_0 < 1$, then $u'(x_0) = 0$ and $u''(x_0) \le 0$, whence
$$\|u_1 - u_2\|_\infty = u(x_0) \le u(x_0) - \lambda\varphi(x_0, u_1'(x_0))u''(x_0) \qquad (\text{since } \lambda > 0,\ \varphi \ge 0,\ (u_1 - u_2)''(x_0) \le 0)$$
$$= (u_1 - \lambda Au_1)(x_0) - (u_2 - \lambda Au_2)(x_0) + \lambda\{\psi(x_0, u_1(x_0), u_1'(x_0)) - \psi(x_0, u_2(x_0), u_2'(x_0))\} \qquad (\text{since } u_1'(x_0) = u_2'(x_0))$$
$$\le (u_1 - \lambda Au_1)(x_0) - (u_2 - \lambda Au_2)(x_0) \qquad (\text{since } \psi(x_0, \eta, u_1'(x_0)) \text{ is nonincreasing in } \eta)$$
$$= (h_1 - h_2)(x_0) \le \|h_1 - h_2\|_\infty. \tag{6}$$
If $x_0 \in \{0, 1\}$, then equality holds in (6) since $Au_i(x_0) = 0$, and (C2) is verified in all cases. For the range condition, let $\lambda > 0$ and $h \in X$. We want to solve $u - \lambda Au = h$.
This is easy to do when $\varphi \ge c > 0$ on $[0,1] \times \mathbb{R}$. The boundary condition ($Au(j) = 0$) implies $u(j) = h(j)$ for $j = 0, 1$. Let $\ell(x) = ax + \beta$ be the linear function such that $k(x) = h(x) - \ell(x)$ vanishes at both endpoints $x = 0, 1$. Let $v(x) = u(x) - \ell(x)$. Then $u - \lambda Au = h$ is equivalent to $v - Bv = k$, where
$$(Bv)(x) = \lambda\{\varphi(x, v'(x) + a)v''(x) + \psi(x, v(x) + \ell(x), v'(x) + a)\}$$
for $v \in D(B) = \{w \in C^2(0,1) \cap X : Bw \in X,\ w(0) = w(1) = 0\}$. This can be rewritten as
$$-v'' = \frac{k - v + \lambda\psi(x, v + \ell, v' + a)}{\lambda\varphi(x, v' + a)}$$
with (homogeneous) Dirichlet boundary conditions. Using the Green's function $G$ for the Dirichlet Laplacian on $[0,1]$, this becomes the integral equation
$$v(x) = \int_0^1 G(x,y)\,\varphi(y, v'(y) + a)^{-1}\big[\lambda^{-1}(k(y) - v(y)) + \psi(y, v(y) + \ell(y), v'(y) + a)\big]\,dy,$$
which can be solved by a fixed point argument (cf. [12], [15]). For … $u_n - \lambda Au_n = h$. By the above argument $u_n$ exists. We want to show that, as $n \to \infty$, $u_n$ converges to the solution $u$ of $u - \lambda Au = h$.
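For the Dirichlet problem $-v'' = g$, $v(0) = v(1) = 0$ on $[0,1]$, the Green's function is $G(x,y) = x(1-y)$ for $x \le y$ and $G(x,y) = y(1-x)$ for $x \ge y$. The short Python sketch below (NumPy assumed; the midpoint quadrature and the helper names are illustrative choices) checks the representation $v(x) = \int_0^1 G(x,y)g(y)\,dy$ on two right-hand sides with known solutions: $g \equiv 1$ (so $v = x(1-x)/2$) and $g = \pi^2\sin(\pi y)$ (so $v = \sin(\pi x)$). In the fixed-point argument above, $g$ would be the bracketed expression divided by $\varphi$, which depends on $v$ itself.

```python
import numpy as np

def green_dirichlet(x, y):
    """Green's function of -d^2/dx^2 on [0,1] with v(0) = v(1) = 0."""
    X, Y = np.meshgrid(x, y, indexing="ij")
    return np.where(X <= Y, X * (1.0 - Y), Y * (1.0 - X))

def solve_by_green(g, x, m=20_000):
    """v(x) = int_0^1 G(x,y) g(y) dy, approximated by the midpoint rule in y."""
    y = (np.arange(m) + 0.5) / m
    return green_dirichlet(x, y) @ g(y) / m

x = np.linspace(0.0, 1.0, 11)
v1 = solve_by_green(lambda y: np.ones_like(y), x)                # exact: x(1-x)/2
v2 = solve_by_green(lambda y: np.pi**2 * np.sin(np.pi * y), x)   # exact: sin(pi x)
print(np.abs(v1 - x * (1 - x) / 2).max(), np.abs(v2 - np.sin(np.pi * x)).max())
```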
The proof uses a potential theory type argument (cf. [12], [15]). The following two hypotheses allow this to be done.

(B3) $\psi(x,\eta,\xi) = M_0(x,\eta,\xi) + \xi M_1(x,\eta,\xi)$, where $M_0, M_1 \in C^1([0,1] \times \mathbb{R}^2)$, and for each $R > 0$,
$$\sup\{|M_i(x,\eta,\xi)| : x \in [0,1],\ |\eta| \le R,\ |\xi| \le R,\ i = 0, 1\} < \infty,$$
$$\sup\Big\{\Big|\frac{\partial M_i}{\partial x}(x,\eta,\xi)\Big| : x \in [0,1],\ |\eta| \le R,\ |\xi| \le R,\ i = 0, 1\Big\} < \infty,$$
and $M_0(x,\eta,\xi) = 0$ for all $(\eta,\xi) \in \mathbb{R}^2$ and $x \in \{0, 1\}$.

The integrability hypothesis is as follows.

(B4) There exists a $\mu \in L^1(0,1)$ such that
$$\sup_{v \in C^1[0,1]}\Big\{\exp\Big[-\int_{1/2}^x \varphi(s, v'(s))^{-1}M_1(s, v(s), v'(s))\,ds\Big]\Big\} \le \mu(x) \quad \text{a.e. on } (0,1).$$

In the above integrability condition, $\varphi$ corresponds to $\alpha$ and $M_1$ to $\beta$ (cf. $(A_0)$, $(A_1)$).
Theorem 1 (Goldstein-Lin [16]). Let (B1)-(B4) hold. Then $A$ is densely defined and m-dissipative on $X = C[0,1]$. In particular, (C1)-(C3) hold and $A$ determines a strongly continuous contraction semigroup (given by the Crandall-Liggett exponential formula) which governs the well-posed Cauchy problem $u_t = Au$, $u(0) = f$ for $f \in X$.

For a specific example, let
$$\varphi(x,\xi) = \alpha(x)\varphi_1(x,\xi), \qquad M_0 = 0, \qquad M_1(x,\eta,\xi) = \beta(x)\psi_1(x,\eta,\xi),$$
where for some $c_1 > 0$ and all $(x,\eta,\xi)$, … In this case, ignoring regularity questions, (B4) holds if and only if $W \in L^1(0,1)$, where $W$ is defined using $\alpha$, $\beta$ in the usual way (cf. $(A_0)$, $(A_1)$). The hypotheses in Theorem 1, especially (B3) and (B4), are much less restrictive than the (sign restriction) hypotheses of [15].
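Numerically, the Crandall-Liggett exponential formula $T(t)f = \lim_{n\to\infty}(I - \frac{t}{n}A)^{-n}f$ amounts to $n$ backward-Euler steps of size $t/n$. The sketch below (NumPy assumed) applies it only to the simplest linear special case $\psi = 0$, $\varphi = \varphi(x) \ge c > 0$, discretized by finite differences, with the Wentzell condition imposed by leaving the boundary rows of the matrix equal to zero (so $(Au)(j) = 0$). The coefficient $\varphi(x) = 1 + x^2$, the grid and the data are illustrative assumptions; the nonlinear case would require a nonlinear solve at each step. The sketch shows the successive approximations settling down, the boundary values $u(j,t) = f(j)$ being preserved, and the sup-norm contraction $\|T(t)f - T(t)g\|_\infty \le \|f - g\|_\infty$.

```python
import numpy as np

def wentzell_generator(phi, n=100):
    """Finite-difference matrix for (Au)(x) = phi(x) u''(x) on [0,1] with the
    Wentzell condition (Au)(j) = 0 at j = 0, 1 (boundary rows left at zero)."""
    x = np.linspace(0.0, 1.0, n + 1)
    h = 1.0 / n
    A = np.zeros((n + 1, n + 1))
    for i in range(1, n):
        c = phi(x[i]) / h**2
        A[i, i - 1] = c
        A[i, i] = -2.0 * c
        A[i, i + 1] = c
    return x, A

def exponential_formula(A, f, t, steps):
    """Backward-Euler approximation (I - (t/steps) A)^(-steps) f of T(t)f."""
    resolvent = np.linalg.inv(np.eye(A.shape[0]) - (t / steps) * A)
    u = f.copy()
    for _ in range(steps):
        u = resolvent @ u
    return u

x, A = wentzell_generator(lambda s: 1.0 + s**2)   # phi >= 1 > 0
f = np.sin(5 * x) + x
g = np.cos(x)
t = 0.1
u100, u200, u400 = (exponential_formula(A, f, t, m) for m in (100, 200, 400))
Tg = exponential_formula(A, g, t, 400)
print("successive differences:", np.abs(u100 - u200).max(), np.abs(u200 - u400).max())
print("boundary values:", u400[0], f[0], u400[-1], f[-1])
print("sup-norm distance:", np.abs(u400 - Tg).max(), "<=", np.abs(f - g).max())
```

Each resolvent matrix here has nonnegative entries and row sums equal to 1, which is the discrete counterpart of the dissipativity condition (C2).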
The Generalized Kompaneets Equation

Of concern is
$$u_t = \frac{1}{\beta}[\alpha(u_x + ku + F(u))]_x \tag{7}$$
for $0 < t$, $x < \infty$, with initial condition $u(x,0) = f(x)$ and boundary conditions
$$\alpha(u_x + ku + F(u)) \to 0 \quad \text{as } x \to 0, \infty.$$
Here $\beta, \alpha, \alpha', k, k' \in C(0,\infty)$ with $\alpha, \beta > 0$ on $(0,\infty)$, and $F \in C^1(\mathbb{R})$. Define the operator $A$ by
$$Au = \beta^{-1}[\alpha(u' + ku + F(u))]', \qquad X = L^1((0,\infty); \beta(x)\,dx),$$
with domain $D(A) = \{v \in X : Av \in X,\ \alpha(v' + kv + F(v)) \to 0 \text{ as } x \to 0, \infty\}$. By standard nonlinear semigroup theory, it is not difficult to see that $A$ is dissipative on $X$ (i.e., (C2) holds) if and only if for all $u_1, u_2 \in D(A)$,
$$\langle Au_1 - Au_2, \operatorname{sign}_0(u_1 - u_2)\rangle = \int_0^\infty (Au_1 - Au_2)\operatorname{sign}_0(u_1 - u_2)\,\beta\,dx \le 0,$$
where $\operatorname{sign}_0 r = r/|r|$ or $0$, according as $r \ne 0$ or $r = 0$. Let $u = u_1 - u_2$ and write the open set $\{x \in (0,\infty) : u(x) \ne 0\}$ as a union of open intervals
$$\{x : u(x) \ne 0\} = \bigcup_{n=1}^\infty (a_n, b_n).$$
Then
$$\langle Au_1 - Au_2, \operatorname{sign}_0(u)\rangle = \sum_{n=1}^\infty \int_{a_n}^{b_n} (Au_1 - Au_2)\operatorname{sign}_0(u)\,\beta\,dx,$$
which is non-positive, provided that each term is. Consider $I_n := \int_{a_n}^{b_n} (Au_1 - Au_2)\,\beta\,dx$, where $0 < a_n < b_n < \infty$ and $u > 0$ on $(a_n, b_n)$. Then $u(a_n) = u(b_n) = 0$ and $u'(a_n) \ge 0$, $u'(b_n) \le 0$. Consequently
$$I_n = \int_{a_n}^{b_n} [\alpha(u' + ku + F(u_1) - F(u_2))]'\,dx = [\alpha(u' + ku + F(u_1) - F(u_2))]_{a_n}^{b_n} = 0 + \alpha(b_n)u'(b_n) - \alpha(a_n)u'(a_n) \le 0$$
by the above and since $u_1 = u_2$ at $a_n$, $b_n$. The other possible cases are similar, except that the boundary conditions must be used if $a_n = 0$ or if $b_n = \infty$.
Condition (C3) can be dispensed with by replacing $A$ by its closure. Condition (C1) is the stumbling block. We want to solve $u - \lambda Au = h$ (given $h \in X$ and $\lambda > 0$). Let
$$v(x) = \int_0^x u(s)\beta(s)\,ds, \qquad x \in [0,\infty].$$
Then $v \in C[0,\infty]$, $v(0) = 0$, and $v(\infty) = \int_0^\infty u\beta\,ds \in \mathbb{R}$ (since $u \in X$). Now replace $x \in [0,\infty]$ by $y = \frac{2}{\pi}\tan^{-1}(x) \in [0,1]$. Then $A$ induces an operator $B$ on $C[0,1]$ of the form
$$Bu = \gamma(y)u'' + \psi(y, u, u')$$
with Wentzell boundary conditions. For the classical Kompaneets equation (2), $\gamma(y) \sim \text{const}\cdot y^2$ near $y = 0$ and $\gamma(y) \to \infty$ as $y \to 1$. In particular, the previous work of Goldstein and Lin on (1) should be extended to cover the case when $\varphi(x,\xi)$, $\psi(x,\eta,\xi)$ are allowed to be discontinuous at $x = 0, 1$. This has been done by Wang [20]. But the resulting theory (and the extension of the theorem of Section 2 to this context) seems insufficient to produce a well-posedness result for (2).
Wang [19], [20] has extended the linear theory of (7) (cf. (4)) to the context of the Clement-Timmermans theorem. The nonlinear extensions are still under investigation, so we shall not report on them here. We make two hypotheses.

(D1) Let $\alpha, \alpha', k, k' \in C(0, \infty)$, with $\alpha, \beta > 0$ on $(0, \infty)$. Define
$$Au = \frac{1}{\beta}[\alpha(u' + ku)]'$$
for $u \in D(A) = \{v \in X = L^1((0, \infty); \beta(x)\,dx) : v \in C^2(0, \infty),\ Av \in X,\ \alpha(v' + kv) \to 0 \text{ as } x \to 0, \infty\}$.

(D2)
$$\int_0^\infty \beta(x)\exp\Big\{-\int_1^x k(s)\,ds\Big\}\,dx < \infty.$$
Theorem 2. (Wang [19]) Let (D1), (D2) hold. Then the closure of $A$ is densely defined and m-dissipative (i.e. (C1) holds for $\bar{A}$). Thus by the Hille-Yosida theorem (cf. [10]), $\bar{A}$ generates a strongly continuous contraction (linear) semigroup on $X$.
Theorem 3. (Wang [19]) Let (D1), (D2) hold. Let
$$Y = L^2\Big((0, \infty);\ \beta(x)\exp\Big\{\int_1^x k(s)\,ds\Big\}\,dx\Big).$$
Let $A_2$ be the operator $A$ but with its domain modified in the obvious way so that it acts on $Y$ rather than $X$. Then $A_2$ is a non-positive essentially selfadjoint operator.
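The symmetry behind Theorem 3 comes from integration by parts. Reading the operator as $Au = \beta^{-1}[\alpha(u' + ku)]'$ and taking $u, v$ compactly supported in $(0, \infty)$, one gets
$$\langle Au, v\rangle_Y = -\int_0^\infty \alpha(u' + ku)(v' + kv)\,e^{\int_1^x k}\,dx.$$
This expression is unchanged when $u$ and $v$ are exchanged, and it is negative when $u = v \neq 0$.

The sketch below checks this numerically. Its data are illustrative assumptions, not taken from the text:

- $\alpha(x) = x^4$ and $\beta(x) = x^2$, the choice made in the example that follows this theorem;
- constant $k = 0.5$;
- hypothetical test functions $u(x) = \sin^4(\pi(x - 1))$ and $v(x) = x\,u(x)$, both supported in $[1, 2]$.

$Au$ is approximated by a central difference of the flux $\alpha(u' + ku)$.

```python
import math

K = 0.5  # assumed constant coefficient k


def alpha(x):
    return x ** 4


def beta(x):
    return x ** 2


def weight(x):
    # Y-weight beta(x) * exp(int_1^x k ds) = beta(x) * exp(k (x - 1)) for constant k
    return beta(x) * math.exp(K * (x - 1.0))


def u(x):
    # hypothetical test function, vanishing with its derivative at x = 1 and x = 2
    return math.sin(math.pi * (x - 1.0)) ** 4


def du(x):
    s, c = math.sin(math.pi * (x - 1.0)), math.cos(math.pi * (x - 1.0))
    return 4.0 * math.pi * s ** 3 * c


def v(x):
    return x * u(x)


def dv(x):
    return u(x) + x * du(x)


def apply_A(f, df, x, h):
    """Central-difference value of Af = (1/beta) [alpha (f' + k f)]' at x."""
    flux = lambda t: alpha(t) * (df(t) + K * f(t))
    return (flux(x + h / 2) - flux(x - h / 2)) / (h * beta(x))


N = 4000
STEP = 1.0 / N
NODES = [1.0 + (i + 0.5) * STEP for i in range(N)]  # midpoints of [1, 2]


def y_inner(f, g):
    """Midpoint-rule approximation of the Y inner product restricted to [1, 2]."""
    return sum(f(x) * g(x) * weight(x) * STEP for x in NODES)


Au = lambda x: apply_A(u, du, x, STEP)
Av = lambda x: apply_A(v, dv, x, STEP)

lhs = y_inner(Au, v)   # <Au, v>_Y
rhs = y_inner(u, Av)   # <u, Av>_Y
quad = y_inner(Au, u)  # <Au, u>_Y, expected to be negative
# the integrated-by-parts form -int alpha (u' + k u)(v' + k v) e^{k (x - 1)} dx
ibp = -sum(alpha(x) * (du(x) + K * u(x)) * (dv(x) + K * v(x))
           * math.exp(K * (x - 1.0)) * STEP for x in NODES)
print(lhs, rhs, ibp, quad)
```

The first three values agree to discretization accuracy, and the last is negative. This is consistent with $A_2$ being symmetric and non-positive on such functions. It does not, of course, prove essential selfadjointness.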
Condition (D2) is the analogue of the Clement-Timmermans condition $W \in L^1(0, 1)$ (cf. (A0), (A1)). For the semigroup $T = \{T(t) : t \ge 0\}$ generated by $A$ (or $A_2$) on both $X$ and $Y$, $T$ is a positive semigroup in the sense that if $f \ge 0$ (for $f$ in $X$ or $Y$), then $T(t)f \ge 0$. Because of Theorem 3, $T$ is positive in two senses, namely $T(t)$ is a positive operator (as above) in the lattice sense and $T(t)$ is also a positive selfadjoint operator.
Let $\alpha(x) = x^4$, $\beta(x) = x^2$, and $k(x) = k$ = constant. When $k > 0$, (D2) holds and both Theorems 2 and 3 apply. When $k = 0$, $\bar{A}$ is m-dissipative and $\bar{A}_2$ is non-positive selfadjoint, but (D2) fails. (This corresponds to (A0), (A1) holding but $W \notin L^1(0, 1)$.) When $k < 0$, the closure of $A$ is not m-dissipative; the conclusions of Theorems 2 and 3 fail in this case.
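For this example, condition (D2) can be evaluated in closed form. With lower limit $1$ in the inner integral,
$$\int_0^\infty x^2 e^{-k(x - 1)}\,dx = \frac{2e^k}{k^3} \quad \text{for } k > 0.$$
For $k \le 0$ the integrand does not decay and the integral diverges.

The sketch below uses two hypothetical helpers. `d2_truncated` computes a midpoint approximation of the integral truncated at `cutoff`. `d2_closed_form` returns the closed form above. The sketch compares the two and shows how the truncated integral grows when $k = 0$.

```python
import math


def d2_truncated(k, cutoff, n=60000):
    """Midpoint approximation of int_0^cutoff x^2 exp(-k (x - 1)) dx,
    i.e. condition (D2) for beta(x) = x^2 and constant k, truncated at `cutoff`."""
    h = cutoff / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += x * x * math.exp(-k * (x - 1.0)) * h
    return total


def d2_closed_form(k):
    """Value of the full integral: 2 e^k / k^3 if k > 0, +infinity otherwise."""
    return 2.0 * math.exp(k) / k ** 3 if k > 0 else math.inf


for k in (1.0, 0.5):
    print(k, d2_truncated(k, 80.0), d2_closed_form(k))
for cutoff in (10.0, 20.0, 40.0):
    # with k = 0 the truncated integral is cutoff**3 / 3 and keeps growing
    print("k = 0, cutoff", cutoff, d2_truncated(0.0, cutoff))
```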
REFERENCES
1. V. Barbu, Nonlinear Semigroups and Differential Equations in Banach Space, Noordhoff, Leyden, 1976.
2. Ph. Benilan, Equations d'Evolution dans un Espace de Banach Quelconque et Applications, Thesis, Universite de Paris XI, Orsay, 1972.
3. Ph. Clement and C. A. Timmermans, On $C_0$-semigroups generated by differential operators satisfying Ventcel's boundary conditions, Indag. Math. 89 (1986), 379-386.
4. M. G. Crandall and T. M. Liggett, Generation of semigroups of nonlinear transformations on general Banach spaces, Amer. J. Math. 93 (1971), 265-298.
5. J. R. Dorroh and G. R. Goldstein, Quasilinear diffusions, in Evolution Equations, Control Theory and Biomathematics (ed. by Ph. Clement and G. Lumer), Dekker, New York (1994), 155-166.
6. J. R. Dorroh and G. R. Goldstein, A singular quasilinear parabolic problem in n dimensions, in preparation.
7. J. R. Dorroh and G. R. Rieder, A singular quasilinear parabolic problem in one space dimension, J. Diff. Equations 91 (1991), 1-23.
8. G. R. Goldstein, Nonlinear singular diffusion with nonlinear boundary conditions, Math. Meth. Appl. Sci. 16 (1993), 279-298.
9. G. R. Goldstein, J. A. Goldstein, and S. Oharu, The Favard class for a nonlinear parabolic problem, in Recent Development in Evolution Equations (ed. by A. C. McBride and G. F. Roach), Longman, Pitman Notes, Harlow (1995), 134-147.
10. J. A. Goldstein, Semigroups of Linear Operators and Applications, Oxford University Press, New York and Oxford, 1985.
11. J. A. Goldstein, The Kompaneets equation, in Differential Equations in Abstract Spaces (ed. by G. Dore, A. Favini, E. Obrecht, and A. Venni), Dekker, New York (1993), 115-123.
12. J. A. Goldstein and C.-Y. Lin, Singular nonlinear parabolic boundary value problems in one space dimension, J. Diff. Equations 68 (1987), 429-443.
13. J. A. Goldstein and C.-Y. Lin, Highly degenerate parabolic boundary value problems, Diff. Int. Eqns. 2 (1989), 215-227.
14. J. A. Goldstein and C.-Y. Lin, An $L^p$-semigroup approach to degenerate parabolic boundary value problems, Ann. Mat. Pura Appl. 159 (1991), 211-227.
15. J. A. Goldstein and C.-Y. Lin, Parabolic problems with strong degeneracy at the spatial boundary, in Semigroup Theory and Evolution Equations (ed. by Ph. Clement, E. Mitidieri, and B. de Pagter), Dekker (1991), 181-191.
16. J. A. Goldstein and C.-Y. Lin, in preparation.
17. C.-Y. Lin, Degenerate nonlinear parabolic boundary value problems, Nonlinear Anal. TMA 13 (1989), 1303-1315.
18. G. Lumer, R. Redheffer, and W. Walter, Estimates for solutions of degenerate second order differential equations and inequalities with application to diffusion, Nonlinear Anal. TMA 12 (1988), 1105-1121.
19. K. Wang, The linear Kompaneets equation, J. Math. Anal. Appl., to appear.
20. K. Wang, The Generalized Kompaneets Equation, Ph.D. Thesis, Louisiana State University, 1995.
An Application of Measure Theory to Perfect Competition
NEIL E. GRETSKY, Department of Mathematics, University of California, Riverside, CA 92521 (email: [email protected])
JOSEPH M. OSTROY, Department of Economics, University of California, Los Angeles, CA 90024 (email: [email protected])
WILLIAM R. ZAME, Department of Economics, University of California, Los Angeles, CA 90024 (email: [email protected])
Dedicated to Professor M.M. Rao on the occasion of his 65th birthday.
1 Introduction
This talk was given by the first author as part of the Festschrift held in November 1994 at the University of California, Riverside, in honor of M. M. Rao's 65th birthday. The work presented here is a complement to a much larger project; this paper constitutes an alternative approach to a problem discussed in (Gretsky, Ostroy, and Zame a). Other related material and background can be found in (Gretsky, Ostroy, and Zame 1992; Gretsky, Ostroy, and Zame b). Since one of Professor Rao's influences was to instill in his students a love of vector measures, the purpose of this talk is to show how a measure theoretic approach allows an appropriate description of non-manipulation in very large economies, and to give a sample result.
2 The Model
We start by describing a very large¹ assignment economy. There are many "stories" for which the assignment economy is an appropriate model; we choose one in which the commodities being traded are houses. As is the usual practice, an economy will be specified by listing the endowments and preferences of the agents who participate in the economy.

Our story may be summarized as follows. There are two kinds of agents: buyers and sellers. Each buyer wishes to buy one house and is assumed to be initially endowed with enough money to buy any house; his preferences consist of a schedule describing what he is willing to pay for each house. Each seller is initially endowed with one house, which she is willing to sell if the price is right. The equilibria of the resulting exchange economy are the subject of study in the present paper².

We give a mathematical model for this story. Let the set of houses be a compact metric space $H$. For convenience, we adjoin to $H$ a fictional house $0$ to indicate no trade, and denote $H \cup \{0\}$ by $H_0$. The set of buyers is given by $B = \{b : H \to [0,1] \mid b \text{ is continuous}\}$, and the set of sellers is given by $S = H \times [0,1]$. Denote the set of agents by $I = B \cup S$. We have assumed continuity for buyers partly for simplicity; a less restrictive condition on buyers leads to a much more complicated set of results, as discussed in (Gretsky, Ostroy, and Zame b).

We interpret the information for buyers as meaning that $b(h)$ is the reservation value that a buyer labeled as being of type $b$ places on house $h$, i.e. $b(h)$ is the maximum amount of money that a buyer of type $b$ is willing to pay for house $h$. For sellers, the interpretation is that a seller labeled as being of type $s = (h, r)$ owns house $h$ and places a reservation value $r$ on this house, i.e. a seller of type $(h, r)$ will not sell her house for an amount of money less than $r$. The reservation values are normalized to lie in the interval $[0,1]$. We can describe the preferences of an agent of type $i$ by a utility function defined on houses and money as:
$$v_i(h) + m, \qquad \text{where } v_i(h) = \begin{cases} b(h) & \text{if } i = b,\ h \in H, \\ -r & \text{if } i = s = (h, r),\ h \in H, \\ 0 & \text{if } h = 0. \end{cases}$$
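For a finite set of houses, the utility specification can be written down directly. Below is a minimal sketch under these assumptions:

- houses are represented by hashable labels;
- the fictional house $0$ is represented by a hypothetical label `NO_TRADE`;
- buyers carry their reservation schedule $b$, and sellers carry the pair $(h, r)$.

The seller case is only defined for her own house, matching the displayed formula. The money $m$ is supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Hashable

NO_TRADE = "0"  # hypothetical label for the fictional house 0


@dataclass(frozen=True)
class Buyer:
    b: Callable[[Hashable], float]  # reservation schedule b : H -> [0, 1]


@dataclass(frozen=True)
class Seller:
    house: Hashable  # the house h she owns
    r: float         # her reservation value r in [0, 1]


def v(agent, h):
    """v_i(h): b(h) for a buyer, -r for a seller and her own house, 0 for no trade."""
    if h == NO_TRADE:
        return 0.0
    if isinstance(agent, Buyer):
        return agent.b(h)
    if h != agent.house:
        raise ValueError("a seller is only ever matched with her own house")
    return -agent.r


def utility(agent, h, m):
    """Quasi-linear utility v_i(h) + m of receiving house h and money m."""
    return v(agent, h) + m


buyer = Buyer(b=lambda h: {"A": 0.8, "B": 0.6}[h])
seller = Seller(house="A", r=0.2)
print(utility(buyer, "A", -0.5))      # buyer pays p(A) = 0.5: about 0.3
print(utility(seller, "A", 0.5))      # seller receives p(A) = 0.5: about 0.3
print(utility(buyer, NO_TRADE, 0.0))  # no trade: 0.0
```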
Finally, the initial data of the economy consist of a population measure $\mu \in M_+(I)$ which is compactly supported. Such a measure specifies which types of agents are present, and in what relative quantities. The condition of compact support is no restriction on the collection of sellers, since the space $S$ is a priori compact. However, this condition ensures that the collection of continuous functions for the buyers who are actually present in the economy forms an equicontinuous family. The appropriate solution concept for such an exchange economy is Walrasian equilibrium, i.e. a price system on houses and a collection of trades among individuals such that individuals maximize utility, taking the price system as given, and such that markets clear.

¹ Such economies are called non-atomic assignment economies since, in the attempt to describe large economies, it is common to assume the measures involved are non-atomic. The results given here do not actually depend on such an assumption.

² The goal of the assignment problem is to pair up buyers and sellers so as to maximize the total profit. The equilibria of the exchange economy provide an alternative, but equivalent, formulation of the optimization problem. See (Gretsky, Ostroy, and Zame 1992) for this and other equivalences.
A price system in our context should be a bounded Borel measurable function on the space of houses. For technical reasons³ we agree to identify two price systems if they agree almost everywhere with respect to the natural measure $\mu_H$ induced on houses by the given population measure $\mu$; viz. the measure $\mu_H$ is the first marginal of the population measure $\mu_S$ defined on $S = H \times [0,1]$. Thus, a price system is a member of $L^\infty(\mu_H)$. The trading allocation of houses to buyers and sellers is described by a housing distribution measure $y \in M_+(I \times H_0)$.
If the support of the housing measure $\mu_H$ is a proper subset of $H$, there is an ambiguity about prices for houses in $H$ which are not in the support of $\mu_H$. It is convenient to insist that prices for such houses be identically $1$, so that no buyer ever strictly prefers to buy such a house. Moreover, for the fictional house $0$ we extend $p$ by setting $p(0) = 0$.

It was shown in (Gretsky, Ostroy, and Zame 1992) that, in the present case of an equicontinuous family of buyer functions, Walrasian price systems can be chosen to be continuous on the support of $\mu_H$. Given any Walrasian price system $p$, we will, without further mention, always reduce to the unique price system $p'$ which agrees with $p$ almost everywhere with respect to $\mu_H$, is continuous on the support of $\mu_H$, and is identically $1$ on the complement of the support of $\mu_H$.

In order to define Walrasian equilibrium, we introduce an auxiliary concept. Given a price system $p$ for houses, the indirect utility function at these prices is defined to be
$$v_i^*(p) = \begin{cases} \max\{\sup_{h \in H}\{b(h) - p(h)\},\ 0\} & \text{if } i = b, \\ \max\{p(h) - r,\ 0\} & \text{if } i = s = (h, r), \end{cases}$$
where the supremum in the first expression is the $\mu_H$-essential supremum, performed in the space $L^\infty(\mu_B)$, and the second expression is in $L^\infty(\mu_S)$. This gives the maximum utility attainable by an agent of type $i$ subject to the budget constraint
$$m = \begin{cases} -p(h) & \text{if } i = b, \\ p(h) & \text{if } i = s. \end{cases}$$
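With finitely many houses, each carrying positive $\mu_H$-mass, the essential supremum reduces to a maximum. Below is a minimal sketch under that assumption; the function names are hypothetical. Prices and buyer schedules are dictionaries keyed by house labels.

```python
def indirect_utility_buyer(b, p):
    """v_b^*(p) = max{ max_h (b(h) - p(h)), 0 } over finitely many houses h."""
    return max(max(b[h] - p[h] for h in p), 0.0)


def indirect_utility_seller(house, r, p):
    """v_s^*(p) = max{ p(h) - r, 0 } for a seller of type s = (h, r)."""
    return max(p[house] - r, 0.0)


prices = {"A": 0.5, "B": 0.9}
print(indirect_utility_buyer({"A": 0.8, "B": 0.95}, prices))  # best is house A, surplus about 0.3
print(indirect_utility_seller("B", 0.4, prices))              # sells B, surplus about 0.5
print(indirect_utility_buyer({"A": 0.1, "B": 0.2}, prices))   # 0.0: no trade is best
```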
DEFINITION: A Walrasian equilibrium for the assignment economy $\mu$ is a pair $(p, y)$, where $p \in L^\infty(\mu_H)$ is a price system and $y \in M_+(I \times H_0)$ is a housing distribution, such that
(i) $y_1 = \mu$,
(ii) $y(B \times G) = y(S \times G)$ for every Borel set $G \subseteq H$, and
(iii) $y(I \times H_0) = y\{(b, h) \in B \times H : v_b(h) - p(h) = v_b^*(p)\} + y\{(s, h) \in S \times H : v_s(h) + p(h) = v_s^*(p)\} + y\{(i, 0) : v_i(0) = v_i^*(p)\}$.
These conditions state, respectively, that $y$ is population consistent with $\mu$; that the houses received by the buyers equal those supplied by the sellers; and that, except possibly for a $\mu$-null set, all buyers and sellers are maximizing utility subject to the budget constraint defined by prices $p$.

³ If individual point functions are used for price systems, then there are several constructions the results of which cannot be guaranteed to be measurable. See (Gretsky, Ostroy, and Zame a) for more detail.
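Conditions (ii) and (iii) can be checked directly in a finite economy in which every agent has unit mass. Condition (i) then holds by construction, because each agent is assigned exactly one outcome. Below is a minimal sketch under those assumptions; all names are hypothetical. It checks a proposed pair $(p, y)$; it is not a method for computing equilibria.

```python
NO_TRADE = "0"  # hypothetical label for the fictional house 0


def is_walrasian(buyers, sellers, prices, assignment, tol=1e-12):
    """Check conditions (ii) and (iii) for a finite economy with unit masses.

    buyers:     {name: {house: b(house)}} (every priced house must appear)
    sellers:    {name: (own_house, r)}
    prices:     {house: p(house)}
    assignment: {name: house or NO_TRADE}; one outcome per agent, so (i) holds
    """
    # (ii) market clearing: each house is bought exactly as often as it is sold
    for h in prices:
        bought = sum(1 for n in buyers if assignment[n] == h)
        sold = sum(1 for n in sellers if assignment[n] == h)
        if bought != sold:
            return False
    # (iii) every buyer attains v_b^*(p)
    for n, b in buyers.items():
        h = assignment[n]
        got = 0.0 if h == NO_TRADE else b[h] - prices[h]
        best = max(max(b[k] - prices[k] for k in prices), 0.0)
        if abs(got - best) > tol:
            return False
    # (iii) every seller attains v_s^*(p); she can only trade her own house
    for n, (own, r) in sellers.items():
        h = assignment[n]
        if h not in (own, NO_TRADE):
            return False
        got = 0.0 if h == NO_TRADE else prices[own] - r
        best = max(prices[own] - r, 0.0)
        if abs(got - best) > tol:
            return False
    return True


buyers = {"b1": {"A": 0.8}}
sellers = {"s1": ("A", 0.2)}
print(is_walrasian(buyers, sellers, {"A": 0.5}, {"b1": "A", "s1": "A"}))            # True
print(is_walrasian(buyers, sellers, {"A": 0.9}, {"b1": NO_TRADE, "s1": NO_TRADE}))  # False
```

In this one-buyer, one-seller example, any price between the seller's reservation value $0.2$ and the buyer's value $0.8$ supports the trade.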
3 Misrepresentation
A natural question in the study of perfect competition is whether any agent or coalition of agents can favorably manipulate the Walrasian equilibria by misrepresenting their type. We consider a subpopulation of a given population $\mu$ to be a measure $\nu \in M_+(I)$ such that $0 \le \nu \le \mu$. Note that, in an assignment economy, trading really occurs only between pairs of matched individuals: one buyer and one seller. Consequently, without loss of generality, we may restrict attention to subpopulations consisting solely of buyers or solely of sellers. We first consider subpopulations of buyers only. Thus, we start with a population $\mu = (\mu_B, \mu_S)$ and a subpopulation $\nu$ of $\mu_B$ for which we will measure the results of misrepresentation. Denote by $\Delta$ the diagonal set of $B \times B$.

DEFINITION: An announcement of misrepresentations for the subpopulation $\nu$ of buyers is a measure $\alpha$ on $B \times B$ such that
(i) $\alpha_1 = \mu_B$,
(ii) $\|\alpha_2\| = \|\mu_B\|$, and
(iii) $(\alpha|_{\Delta^c})_1 \le \nu$.
The interpretation is that $\alpha$ is a distribution on $B \times B$ which describes announcements of types; here, the pair $(b, b')$ means that an agent of type $b$ announces himself to be of type $b'$. (Notice that an agent may misrepresent his preferences but not his endowment.) The three conditions are, respectively:
- that the population implicit in the announcement is consistent with the given population;
- that population mass is conserved;
- that the actual misrepresenters are contained in the subpopulation $\nu$.

The misrepresentation leads to the "new" economy given by the population $(\alpha_2, \mu_S)$ and to a resulting Walrasian equilibrium $(y^\alpha, p^\alpha)$.

DEFINITION: Given a population measure $\mu$, a subpopulation of buyers $\nu$, an announcement $\alpha$ of misrepresentations for $\nu$, and a choice $(y^\alpha, p^\alpha)$ of a Walrasian equilibrium for the economy given by the population measure $(\alpha_2, \mu_S)$, an outcome of the announcement is a measure $\eta^\alpha \in M_+(B \times B \times H_0)$ such that
(i) $\eta^\alpha_{1,2} = \alpha$, and
(ii) $\eta^\alpha_{2,3} = y^\alpha$.
The measure $\eta^\alpha$ is a distributional description of house assignments with respect to agents' true and announced types. The total utility of this outcome to all buyers is the aggregate of the outcomes that the agents receive in this "new" economy, each measured in terms of that agent's true preferences. This total utility can be expressed as
$$\int_{B \times B \times H_0} \{b(h) - p^\alpha(h)\}\,d\eta^\alpha(b, b', h).$$
Unfortunately, as it stands, the distribution $\eta$ is not very useful, for two reasons: (i) $\eta$ lacks descriptive power, in that the structure of the misrepresentation is not clear; (ii) $\eta$ lacks technical power, in that there is no convenient⁴ way to prove any characterization theorems about manipulation in this formulation. We recall a classical result in the theory of vector measures.
PROPOSITION 1 Let $(\Omega, \Sigma, \mu)$ be a finite complete measure space and let $X$ be a Banach space. If a countably additive vector measure $F : \Sigma \to X^*$ satisfies $\|F(E)\| \le K|\mu(E)|$ for all $E \in \Sigma$ for some constant $K$, then there exists a weak*-measurable function $f : \Omega \to X^*$ such that for all $x \in X$ and for all $E \in \Sigma$
$$\langle F(E), x\rangle = \int_E \langle f(\omega), x\rangle\,d\mu(\omega).$$
We call the function $f$ a Gelfand density (or a weak* density) for $F$ with respect to $\mu$. We write $F(E) = \int_E f\,d\mu$ weak* in $X^*$. This weak* representation theorem is equivalent to the fact that every bounded linear operator $T$ from $L^1(\Omega, \Sigma, \mu)$ to $X^*$ has a weak* density. In fact, the vector measure $F$ and the operator $T$ which correspond to each other under the map $F(\cdot) = T(\chi_{\cdot})$ have the same weak* density.

The Gelfand integral and its properties were introduced in (Gelfand 1936; Gelfand 1938). The representation theorem was proved by Gelfand in the special case that $X$ is a separable Banach space⁵. The general case is proved by means of a lifting theorem, as in (Dinculeanu and Uhl 1973). A more detailed discussion may be found in (Diestel and Uhl 1977).

In the present model, it has been assumed that the collection $H$ of houses is a compact metric space. Consequently, the Banach spaces of continuous functions $C(H)$, $C(B)$, and $C(S)$ are separable, and we may apply Gelfand's original theorem in these cases for misrepresentation outcomes. We start with housing allocations. For clarity, we will write the argument of a measure-valued density as a subscript; e.g. for $\psi : I \to M_+(H_0)$, we will write $\psi_i(E)$ instead of $\psi(i, E)$ or $(\psi(i))(E)$.
THEOREM 1 Let $y \in M_+(I \times H_0)$ be a Walrasian allocation for the economy given by the population measure $\mu$. Then there is a weak*-measurable function $\psi : I \to M_+(H_0)$ with $\|\psi_i\| = 1$ a.e.$[\mu]$ such that for any Borel set $A \subseteq I$
$$y(A, \cdot) = \int_A \psi\,d\mu \quad \text{weak* in } M_+(H_0).$$
In fact, $dy(i, h) = \psi_i(dh)\,d\mu(i)$.

Proof: Define $F$ on the Borel subsets of $I$, with values in $M_+(H_0)$, by $(F(E))(G) = y(E, G)$ for each Borel set $E \subseteq I$ and each Borel set $G \subseteq H_0$. Since $H_0$ is compact, $M(H_0)$ is the dual of the space of continuous functions $C(H_0)$. It follows easily from $y \in M_+(I \times H_0)$ that $F$ is a countably additive vector measure. Moreover, since $y$ is a Walrasian allocation for $\mu$, we have
$$\|F(E)\| = (F(E))(H_0) = y(E, H_0) = y_1(E) = \mu(E).$$
Consequently, by Gelfand's theorem there exists a weak* density $\psi : I \to M_+(H_0)$ such that for all continuous functions $x$ on $H_0$ and for all Borel $E \subseteq I$
$$\langle F(E), x\rangle = \int_E \langle \psi_i, x\rangle\,d\mu(i),$$
⁴ However, see (Gretsky, Ostroy, and Zame a) for an alternative approach, which was developed after this talk was given.

⁵ The speaker is grateful to J. J. Uhl, Jr. for pointing out this historical fact.
which can be written as
$$\int_{H_0} y(E, dh)\,x(h) = \int_E \int_{H_0} x(h)\,d\psi_i(h)\,d\mu(i)$$
or, equivalently,
$$dy(i, h) = \psi_i(dh)\,d\mu(i).$$
Note that for any Borel subset $A$ of $I$ we have
$$\mu(A) = y(A, H_0) = \int_A \|\psi_i\|\,d\mu(i),$$
so that $\|\psi_i\| = 1$ a.e.$[\mu]$. □
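For a finitely supported allocation, the density $\psi$ of Theorem 1 is simply the conditional distribution of houses given the agent:
$$\psi_i(\{h\}) = \frac{y(\{i\} \times \{h\})}{\mu(\{i\})}.$$
Below is a minimal sketch of this finite case; the names are hypothetical. The allocation in the usage example is made up, and the sketch does not require it to be Walrasian. The test checks $\|\psi_i\| = 1$ and $dy(i, h) = \psi_i(dh)\,d\mu(i)$.

```python
from collections import defaultdict


def disintegrate(y):
    """Split a finite allocation y[(i, h)] >= 0 into its agent marginal mu
    and conditional house distributions psi[i][h] = y[(i, h)] / mu[i]."""
    mu = defaultdict(float)
    for (i, _h), mass in y.items():
        mu[i] += mass
    psi = defaultdict(dict)
    for (i, h), mass in y.items():
        if mu[i] > 0:
            psi[i][h] = mass / mu[i]
    return dict(mu), {i: dict(d) for i, d in psi.items()}


y = {("b1", "A"): 0.5, ("b1", "0"): 0.25, ("s1", "A"): 0.5, ("s1", "0"): 0.25}
mu, psi = disintegrate(y)
print(mu)   # {'b1': 0.75, 's1': 0.75}
print(psi)  # each psi[i] is a probability vector on H_0
```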
With this theorem there is a complete description in terms of weak* densities for the outcome of a misrepresentation.
THEOREM 2 Let $(y^\alpha, p^\alpha)$ be any Walrasian equilibrium and $\eta^\alpha$ be any outcome for a population measure $\mu$, a buyer subpopulation $\nu$, and an announcement $\alpha$ of misrepresentations for $\nu$. Then there exist a weak*-measurable function $\psi^\alpha : B \to M_+(H_0)$ with $\|\psi^\alpha_{b'}\| = 1$ a.e.$[\mu_B]$ and a weak*-measurable function $a : B \to M_+(B)$ with $\|a_b\| = 1$ a.e.$[\mu_B]$ such that
$$y^\alpha(E, \cdot) = \int_E \psi^\alpha_{b'}\,d\mu_B(b') \quad \text{weak* in } M(H_0)$$
and
$$\alpha(E, \cdot) = \int_E a_b\,d\mu_B(b) \quad \text{weak* in } M(B).$$
Proof: Given the announcement $\alpha$ of misrepresentations, we apply Gelfand's theorem to the vector measure $Z(E) = \alpha(E, \cdot)$ and Theorem 1 to the Walrasian allocation $y^\alpha$. □

The total utility to all buyers in the outcome of the announced economy
$$\int_{B \times B \times H_0} \{b(h) - p^\alpha(h)\}\,d\eta^\alpha(b, b', h)$$
may be rewritten by Theorem 2 as
$$\int_B \int_B \int_{H_0} \{b(h) - p^\alpha(h)\}\,\psi^\alpha_{b'}(dh)\,a_b(db')\,d\mu_B(b).$$
Thus, the total utility to the misrepresenting subpopulation $\nu$ is
$$\int_B \int_B \int_{H_0} \{b(h) - p^\alpha(h)\}\,\psi^\alpha_{b'}(dh)\,a_b(db')\,d\nu(b).$$
With no manipulation allowed, $\alpha$ is $\mu_B$ normalized on the diagonal, i.e. $a_b$ is the point mass $\delta_b \in M(B)$. In this case, the utility to the subpopulation $\nu$ is
$$\int_B \int_B \int_{H_0} \{b(h) - p(h)\}\,\psi_{b'}(dh)\,\delta_b(db')\,d\nu(b),$$
which simplifies to
$$\int_B \int_{H_0} \{b(h) - p(h)\}\,\psi_b(dh)\,d\nu(b).$$
Consequently, the increase in utility to the misrepresenting subpopulation $\nu$ of buyers resulting from the announcement $\alpha$ in the economy $\mu$ is
$$U^\alpha_\mu(\nu) = \int_B \int_B \int_{H_0} \{b(h) - p^\alpha(h)\}\,\psi^\alpha_{b'}(dh)\,a_b(db')\,d\nu(b) - \int_B \int_{H_0} \{b(h) - p(h)\}\,\psi_b(dh)\,d\nu(b).$$
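With finitely many buyer types and houses, every integral in $U^\alpha_\mu(\nu)$ becomes a finite weighted sum. The sketch below (hypothetical function name `buyer_gain`) evaluates that sum. It assumes the equilibrium data $p$, $p^\alpha$, $\psi$ and $\psi^\alpha$ are already known and supplied as dictionaries; computing them is outside the scope of the sketch. The numbers in the usage example are made up, chosen only to exercise the formula.

```python
def buyer_gain(nu, a, psi_alpha, p_alpha, psi, p, value):
    """Finite analogue of U^alpha_mu(nu) for a subpopulation of buyers.

    nu[b]          mass of true type b in the misrepresenting subpopulation
    a[b][b2]       a_b({b2}): probability that type b announces type b2
    psi_alpha[b2]  house distribution of announced type b2 after misrepresentation
    psi[b]         house distribution of type b without misrepresentation
    p_alpha, p     prices after / before, with price 0 for the no-trade house
    value[b][h]    true reservation value b(h), with value 0 for the no-trade house
    """
    after = sum(nu[b] * a[b][b2] * w * (value[b][h] - p_alpha[h])
                for b in nu for b2 in a[b] for h, w in psi_alpha[b2].items())
    before = sum(nu[b] * w * (value[b][h] - p[h])
                 for b in nu for h, w in psi[b].items())
    return after - before


value = {"b": {"A": 0.8, "B": 0.6, "0": 0.0}}
nu = {"b": 0.01}
a = {"b": {"b'": 1.0}}          # every member of nu announces type b'
psi = {"b": {"A": 1.0}}         # truthful: type b buys house A
psi_alpha = {"b'": {"B": 1.0}}  # announced type b' ends up with house B
p = {"A": 0.5, "B": 0.4, "0": 0.0}
p_alpha = {"A": 0.5, "B": 0.2, "0": 0.0}
print(buyer_gain(nu, a, psi_alpha, p_alpha, psi, p, value))  # about 0.001
```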
Note that the increase in utility may be negative. Moreover, we have suppressed in the notation the dependence of $U^\alpha_\mu(\nu)$ on the choices of $y^\alpha$ and $\eta$; these need not be unique for a given $\mu$, $\nu$, and $\alpha$.

A similar expression can be developed for subpopulations of sellers. Now the announcement $\alpha$ is defined on $S \times S$, and this leads to a distributional description $\eta$ on $S \times S \times H_0$ of houses as a result of true and announced types. The utility to the misrepresenting subpopulation $\nu$ is
$$\int_{S \times S \times H_0} \{p^\alpha(h) - a(s)\}\,d\eta(s, s', h),$$
which can be rewritten in a similar manner to that of Theorem 2 as
$$\int_S \int_S \int_{H_0} \{p^\alpha(h) - a(s)\}\,\phi^\alpha_{s'}(dh)\,a_s(ds')\,d\nu(s).$$
Here $y^\alpha(F, \cdot) = \int_F \phi^\alpha_{s'}\,d\mu_S(s')$ weak* in $M(H_0)$, with $\|\phi^\alpha_{s'}\| = 1$ a.e.$[\mu_S]$, and $\alpha(E, \cdot) = \int_E a_s\,d\mu_S(s)$ weak* in $M(S)$, with $\|a_s\| = 1$ a.e.$[\mu_S]$.

A seller type can misrepresent only her preferences, viz. the reservation value for her own house. Hence the measure $a_s$, which describes how the type $s$ misrepresents, is a measure on $S = H \times [0, 1]$ supported on the set $\{\pi s\} \times [0, 1]$, where $\pi s = h$ denotes the house component of $s = (h, r)$. Consequently it can be described as a measure $\rho_s$ on $[0, 1]$.

Moreover, suppose a seller of type $s$ were to announce herself as being of type $s'$. Since the house endowment cannot be misrepresented, it would then follow that $s' = (h, r')$ with $h = \pi s$ and $r' \in [0, 1]$. Consequently, $\phi^\alpha_{s'} = u^\alpha(r')\delta_h + \{1 - u^\alpha(r')\}\delta_0$ is a measure on $H_0$. In it, the latter term represents no-trade activity and thus does not contribute to the utility integral. So we may rewrite the utility integral as
$$\int_S \int_{[0,1]} \{p^\alpha(\pi s) - a(s)\}\,u^\alpha(r)\,d\rho_s(r)\,d\nu(s).$$
We can compute the increase in utility to the misrepresenting subpopulation $\nu$ of sellers, resulting from the distribution $\alpha$ of announcements in the economy $\mu$, to be
$$U^\alpha_\mu(\nu) = \int_S \int_{[0,1]} \{p^\alpha(\pi s) - a(s)\}\,u^\alpha(r)\,d\rho_s(r)\,d\nu(s) - \int_S \{p(\pi s) - a(s)\}\,u(s)\,d\nu(s).$$
As we did in the case of subpopulations of buyers, we again suppress in the notation the dependence of $U^\alpha_\mu(\nu)$ on the choices of $y^\alpha$ and $\eta$.
4 Manipulation
DEFINITION: The economy $\mu$ is non-manipulable if there is a Walrasian price $p \in P(\mu)$ with the following property. Given $\varepsilon > 0$, there is $\delta > 0$ such that the inequality below holds for:
- any subpopulation $\nu$ of buyers or sellers satisfying $\|\nu\| < \delta$;
- any distribution $\alpha$ of announcements of misrepresentations for $\nu$;
- any Walrasian equilibrium $(y^\alpha, p^\alpha)$ for the misrepresented economy;
- any outcome $\eta$ of the announcement.

$$U^\alpha_\mu(\nu) \le \varepsilon\,\|\nu\|.$$
An economy being non-manipulable is a manifestation of its being perfectly competitive, in the sense that asymptotically small coalitions⁶ cannot favorably manipulate the economy by misrepresentation. This turns out to be equivalent to a number of other conditions, including stability of the Walrasian price correspondence and differentiability of the function totaling the gains from all trades. We give a sample result.

The Walrasian price correspondence is a correspondence (a possibly multiply-valued function) $P : M_+(I) \to L^\infty(\mu_H)$ which takes each economy described by a population measure to the collection of Walrasian prices for that economy. We will consider the stability question for the subset of population measures which are absolutely continuous with respect to a fixed measure $\mu$. If $\nu$ is such a measure, then it has Radon-Nikodym derivative $d\nu/d\mu \in L^\infty_+(\mu)$; moreover, $\nu$ is a subpopulation if and only if $d\nu/d\mu \le 1$ a.e.$[\mu]$. Our restriction gives the correspondence $P_\mu : L^\infty_+(\mu) \to L^\infty(\mu_H)$, where $P_\mu(h) = P(\nu)$ for $h = d\nu/d\mu$.
THEOREM 3 The economy $\mu$ is non-manipulable if and only if the (restricted) price correspondence $P_\mu : L^\infty_+(\mu) \to L^\infty(\mu_H)$ is norm-norm continuous at $1$.

Proof: For the sake of argument, we consider a subpopulation of buyers. Assume that $P_\mu$ is norm-norm continuous at $1$. We need to compare the expression
$$\int_B \int_B \int_{H_0} \{b(h) - p^\alpha(h)\}\,\psi^\alpha_{b'}(dh)\,a_b(db')\,d\nu(b)$$
to the expression
$$\int_B \int_{H_0} \{b(h) - p(h)\}\,\psi_b(dh)\,d\nu(b).$$
The latter may be written as
$$\int_B \int_B \int_{H_0} \{b(h) - p(h)\}\,\psi_{b'}(dh)\,\delta_b(db')\,d\nu(b).$$
Note that the indirect utility $v_b^*$ is continuous with respect to the sup norm topology on prices. Thus, although the Walrasian allocations for nearby prices need not be close in variation to the given allocation for $\mu$, the utility of such allocations is close to the utility of the given allocation, i.e.
$$\Big\| \int_{H_0} \{b(h) - p^\alpha(h)\}\,\psi^\alpha_{b'}(dh) - \int_{H_0} \{b(h) - p(h)\}\,\psi_{b'}(dh) \Big\|$$
is continuous at $0$ with respect to $\alpha$. Consequently,
$$\lim_{\|\nu\| \to 0} \frac{U^\alpha_\mu(\nu)}{\|\nu\|} = 0$$
as desired. Conversely, suppose $P_\mu$ is not norm-to-norm continuous at $1$. Then there exists some $\varepsilon > 0$ with the following property: for every $\delta > 0$ there are a subpopulation $\nu$ and an announcement $\alpha$ consistent with $\nu$ such that $\|\nu\| < \delta$ and $\|p^\alpha - p\| \ge \varepsilon$ for some $p^\alpha \in P_\mu(1 - d\nu/d\mu)$. Hence for $b$ in the support of $\nu$ it follows that
$$|v_b^*(p^\alpha) - v_b^*(p)| = \Big|\sup_h\{b(h) - p^\alpha(h)\} - \sup_h\{b(h) - p(h)\}\Big| \ge \|p^\alpha - p\|.$$
Consequently, $U^\alpha_\mu(\nu)/\|\nu\|$ need not converge to $0$. □

⁶ Small coalitions serve as proxies for the individual agents which would be the objects of concern in finite economies.
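The first half of the proof rests on the continuity of the buyer's indirect utility $v_b^*(p) = \max\{\sup_h\{b(h) - p(h)\}, 0\}$ in the sup norm on prices. In fact, $v_b^*$ moves by at most $\|q - p\|$ when prices change from $p$ to $q$. The sketch below is a quick randomized check of this Lipschitz bound. It assumes finitely many houses, so that the essential supremum is a maximum; the names are hypothetical.

```python
import random


def v_star_buyer(b, p):
    """v_b^*(p) = max{ max_h (b(h) - p(h)), 0 } for finitely many houses."""
    return max(max(bh - ph for bh, ph in zip(b, p)), 0.0)


def sup_distance(p, q):
    """Sup-norm distance ||p - q|| between two price vectors."""
    return max(abs(x - y) for x, y in zip(p, q))


random.seed(0)
worst = float("-inf")
for _ in range(10000):
    n = random.randint(1, 8)                 # number of houses
    b = [random.random() for _ in range(n)]  # reservation values in [0, 1]
    p = [random.random() for _ in range(n)]
    q = [random.random() for _ in range(n)]
    excess = abs(v_star_buyer(b, p) - v_star_buyer(b, q)) - sup_distance(p, q)
    worst = max(worst, excess)
print(worst)  # at most 0, up to rounding
```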
References
Diestel, J. and J. J. Uhl, Jr. (1977). Vector Measures. Number 15 in Mathematical Surveys. Providence: American Mathematical Society.
Dinculeanu, N. and J. J. Uhl, Jr. (1973). A unifying Radon-Nikodym theorem for vector measures. Journal of Multivariate Analysis 3, 184-203.
Gelfand, I. M. (1936). Sur un lemme de la théorie des espaces linéaires. Comm. Inst. Sci. Math. Méc. Univ. de Kharkoff et Soc. Math. Kharkoff (4) 13, 35-40.
Gelfand, I. M. (1938). Abstrakte Funktionen und lineare Operatoren. Matematicheskii Sbornik (New Series) 4 (46), 235-286.
Gretsky, N. E., J. M. Ostroy, and W. R. Zame (a). Perfect competition in the nonatomic assignment model: The continuous case. Forthcoming.
Gretsky, N. E., J. M. Ostroy, and W. R. Zame (b). Perfect competition in the nonatomic assignment model: The discontinuous case. Forthcoming.
Gretsky, N. E., J. M. Ostroy, and W. R. Zame (1992). The nonatomic assignment model. Economic Theory 2, 103-127.
Dilations of Hilbert-Schmidt Class Operator-Valued Measures and Applications
YUICHIRO KAKIHARA, Department of Mathematics, University of California, Riverside, Riverside, CA 92521-0135, U.S.A.
Dedicated to Professor M. M. Rao on the occasion of his 65th birthday
ABSTRACT The space of Hilbert-Schmidt class operators has a gramian structure, i.e., a trace class operator valued inner product. A gramian orthogonally scattered dilation of a Hilbert-Schmidt class operator valued measure is considered. Several new characterizations of it are given. An application to Hilbert space valued second order stochastic processes is made, where some equivalence conditions are given for a process to have an operator stationary dilation.
1. INTRODUCTION
The orthogonally scattered dilation of Hilbert space valued measures has been fully studied, and in Section 2 we shall state basic results on it. The purpose of this paper is to consider gramian orthogonally scattered dilation of Hilbert-Schmidt class operator valued measures and its application to Hilbert space valued second order stochastic processes. Let $H$, $K$ be a pair of complex Hilbert spaces. $B(H)$ denotes the algebra of all bounded linear operators on $H$, and $T(H)$ the Banach space of all trace class operators on $H$. $S(K, H)$ denotes the Hilbert space of all Hilbert-Schmidt class operators from $K$ into $H$. $S(K, H)$ has some nice properties, and among them is a gramian structure. That is,
$S(K, H)$ is a left $B(H)$-module with the operator multiplication from the left. If we define $[x, y] = xy^* \in T(H)$ for $x, y \in S(K, H)$, then $[\cdot, \cdot]$ satisfies:
(1) $[x, x] \ge 0$, and $[x, x] = 0$ if and only if $x = 0$;
(2) $[x + y, z] = [x, z] + [y, z]$;
(3) $[ax, y] = a[x, y]$;
(4) $[x, y]^* = [y, x]$,
where $x, y, z \in S(K, H)$ and $a \in B(H)$. The $T(H)$-valued inner product $[\cdot, \cdot]$ is called the gramian in $S(K, H)$. We say that $S(K, H)$ is a normal Hilbert $B(H)$-module.

In Section 3, we characterize those $S(K, H)$-valued measures which have gramian orthogonally scattered dilations. In Section 4, a new necessary and sufficient condition for an $H$-valued second order stochastic process on a locally compact abelian group to have an operator stationary dilation is given, together with known conditions. All the contents of this paper will be included in detail in the monograph [6].
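In finite dimensions every operator is Hilbert-Schmidt and trace class, so the four properties of the gramian can be checked with matrices. Below is a minimal NumPy sketch, assuming $H = \mathbb{C}^3$ and $K = \mathbb{C}^4$ as stand-ins; elements of $S(K, H)$ are then $3 \times 4$ complex matrices. It also checks that $\operatorname{tr}[x, x]$ equals the squared Hilbert-Schmidt norm of $x$.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM_H, DIM_K = 3, 4  # finite-dimensional stand-ins for H and K (assumption)


def rand_op(rows, cols):
    """A random complex rows x cols matrix."""
    return rng.standard_normal((rows, cols)) + 1j * rng.standard_normal((rows, cols))


def gram(x, y):
    """Gramian [x, y] = x y^* of x, y in S(K, H); the result is an operator on H."""
    return x @ y.conj().T


x, y, z = rand_op(DIM_H, DIM_K), rand_op(DIM_H, DIM_K), rand_op(DIM_H, DIM_K)
a = rand_op(DIM_H, DIM_H)  # an element of B(H), acting from the left

g = gram(x, x)
assert np.all(np.linalg.eigvalsh(g) >= -1e-12)               # (1) [x, x] >= 0
assert np.allclose(gram(x + y, z), gram(x, z) + gram(y, z))  # (2) additivity
assert np.allclose(gram(a @ x, y), a @ gram(x, y))           # (3) module property
assert np.allclose(gram(x, y).conj().T, gram(y, x))          # (4) [x, y]^* = [y, x]
# the trace of [x, x] is the squared Hilbert-Schmidt (Frobenius) norm of x
assert np.isclose(np.trace(g).real, np.linalg.norm(x, "fro") ** 2)
print("gramian properties hold for this sample")
```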
2. ORTHOGONALLY SCATTERED DILATION
Let $(\Theta, \mathfrak{A})$ be a measurable space. $ca(\mathfrak{A}, K)$ denotes the set of all $K$-valued bounded c.a. (countably additive) measures on $(\Theta, \mathfrak{A})$. The semivariation $\|\xi\|(A)$ of $\xi \in ca(\mathfrak{A}, K)$ at $A \in \mathfrak{A}$ is defined by
$$\|\xi\|(A) = \sup\Big\{\Big\|\sum_{\Delta \in \pi} \alpha_\Delta\,\xi(\Delta)\Big\|_K : |\alpha_\Delta| \le 1,\ \Delta \in \pi \in \Pi(A)\Big\}, \tag{2.1}$$
where $\Pi(A)$ denotes the set of all finite $\mathfrak{A}$-measurable partitions of $A$ and $\|\cdot\|_K$ the norm in $K$. $\xi$ is said to be orthogonally scattered (o.s.) if $(\xi(A), \xi(B))_K = 0$ for every disjoint pair $A, B \in \mathfrak{A}$, where $(\cdot, \cdot)_K$ is the inner product in $K$. $ca_{os}(\mathfrak{A}, K)$ denotes the set of all o.s. measures in $ca(\mathfrak{A}, K)$.
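DEFINITION 2.1 (1) $\xi \in ca(\mathfrak{A}, K)$ is said to have an orthogonally scattered dilation (o.s.d.) if there exist a Hilbert space $\mathfrak{K}$ containing $K$ as a closed subspace and an $\eta \in ca_{os}(\mathfrak{A}, \mathfrak{K})$ such that $\xi = J\eta$, where $J : \mathfrak{K} \to K$ is the orthogonal projection. The triple $\{\eta, \mathfrak{K}, J\}$ is also called an o.s.d. of $\xi$. (2) $\xi$ is said to have a spectral dilation if there exist a Hilbert space $\mathfrak{K}$, a (weakly c.a.) spectral measure $E(\cdot)$ in $\mathfrak{K}$, an operator $S \in B(\mathfrak{K}, K)$ and a vector $\psi \in \mathfrak{K}$ such that $\xi(\cdot) = SE(\cdot)\psi$, where $B(\mathfrak{K}, K)$ is the Banach space of all bounded linear operators from $\mathfrak{K}$ into $K$.

Let $\mathfrak{X}$ and $\mathfrak{Y}$ be two normed linear spaces. A linear operator $T : \mathfrak{X} \to \mathfrak{Y}$ is said to be absolutely 2-summing if $\pi_2(T) < \infty$, where $\pi_2(T) = \inf\{C > 0 : \text{(2.2) holds}\}$:
$$\Big(\sum_{j=1}^n \|Tx_j\|_{\mathfrak{Y}}^2\Big)^{1/2} \le C \sup\Big\{\Big(\sum_{j=1}^n |x^*(x_j)|^2\Big)^{1/2} : x^* \in \mathfrak{X}^*,\ \|x^*\|_{\mathfrak{X}^*} \le 1\Big\} \tag{2.2}$$
for any $n \ge 1$ and $x_1, \dots, x_n \in \mathfrak{X}$,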
where $\|\cdot\|_{\mathfrak{Y}}$ and $\|\cdot\|_{\mathfrak{X}^*}$ are the norms in $\mathfrak{Y}$ and $\mathfrak{X}^*$, respectively. Let $L^0(\Theta)$ be the set of all complex valued $\mathfrak{A}$-simple functions on $\Theta$. For $f \in L^0(\Theta)$ consider the sup norm $\|f\|_\infty = \sup_{t \in \Theta}|f(t)|$. Then $(L^0(\Theta), \|\cdot\|_\infty)$ becomes a normed linear space. Let $\xi \in ca(\mathfrak{A}, K)$, and define the integral of $f = \sum_{j=1}^n \alpha_j 1_{A_j} \in L^0(\Theta)$ w.r.t. (with respect to) $\xi$ over $A \in \mathfrak{A}$ in an obvious manner by
$$\int_A f\,d\xi = \sum_{j=1}^n \alpha_j\,\xi(A_j \cap A),$$
where $1_A$ is the indicator function of $A$. Then the following theorem is known (cf. Niemi [10, 11], Rao [13] and Rosenberg [14]):
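THEOREM 2.2 Let $\xi \in ca(\mathfrak{A}, K)$. Then the following conditions are equivalent:
(1) $\xi$ has an o.s.d.
(2) $\xi$ has a spectral dilation.
(3) There exists a constant $C > 0$ such that for any $n \ge 1$ and $f_1, \dots, f_n \in L^0(\Theta)$
$$\sum_{j=1}^n \Big\|\int_\Theta f_j\,d\xi\Big\|_K^2 \le C\,\Big\|\sum_{j=1}^n |f_j|^2\Big\|_\infty. \tag{2.3}$$
(4) The operator $S_\xi : (L^0(\Theta), \|\cdot\|_\infty) \to K$ is absolutely 2-summing, where $S_\xi$ is defined by $S_\xi f = \int_\Theta f\,d\xi$ for $f \in L^0(\Theta)$.
(5) There exists a positive finite measure $\nu \in ca(\mathfrak{A}, \mathbb{R}_+)$ such that
$$\Big\|\int_\Theta f\,d\xi\Big\|_K^2 \le \int_\Theta |f|^2\,d\nu, \quad f \in L^0(\Theta),$$
where $\mathbb{R}_+ = [0, \infty)$. In this case, $\nu$ is called a 2-majorant of $\xi$.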
It follows from Grothendieck's inequality (cf. Grothendieck [3] and Lindenstrauss and Pełczyński [7]) that the inequality (2.3) holds with $C = K_G\|\xi\|(\Theta)^2$, where $K_G$ is the Grothendieck constant and $\|\xi\|(\Theta)$ is the total semivariation of $\xi$ (cf. Rosenberg [14]). Therefore we have:

COROLLARY 2.3 Every $\xi \in ca(\mathfrak{A}, K)$ has an o.s.d.
3. GRAMIAN ORTHOGONALLY SCATTERED DILATION
We consider $S(K, H)$-valued c.a. measures on $(\Theta, \mathfrak{A})$ and assume that $H$ is separable, so that $H$ has a countable CONS (complete orthonormal system). As in the Introduction, the $T(H)$-valued gramian $[\cdot, \cdot]$ in $S(K, H)$ is considered.
DEFINITION 3.1 An $S(K, H)$-valued measure $\xi \in ca(\mathfrak{A}, S(K, H))$ is said to be gramian orthogonally scattered (g.o.s.) if $[\xi(A), \xi(B)] = 0$ for every disjoint $A, B \in \mathfrak{A}$.
$ca_{gos}(\mathfrak{A}, S(K, H))$ denotes the set of all g.o.s. measures in $ca(\mathfrak{A}, S(K, H))$. $\xi \in ca(\mathfrak{A}, S(K, H))$ is said to have a gramian orthogonally scattered dilation (g.o.s.d.) if there exist a Hilbert space $\mathfrak{K}$ containing $K$ as a closed subspace and an $\eta \in ca_{gos}(\mathfrak{A}, S(\mathfrak{K}, H))$ such that $\xi = P\eta$, where $P : S(\mathfrak{K}, H) \to S(K, H)$ is the gramian orthogonal projection, i.e., $P$ is an orthogonal projection which satisfies
$$[P^2x, y] = [Px, y] = [x, Py], \quad x, y \in S(\mathfrak{K}, H).$$
The triple $\{\eta, \mathfrak{K}, P\}$ is also called a g.o.s.d. of $\xi$. As we mentioned in Section 2, every Hilbert space valued bounded c.a. measure has an o.s.d. But not every Hilbert-Schmidt class operator valued measure has a g.o.s.d. Thus we shall give some necessary and sufficient conditions for the g.o.s.d. Let $\xi \in ca(\mathfrak{A}, S(K, H))$, and let $L^0(\Theta; B(H))$ be the set of all $B(H)$-valued $\mathfrak{A}$-simple functions on $\Theta$. The integral of $\Phi = \sum_{j=1}^n a_j 1_{A_j} \in L^0(\Theta; B(H))$ w.r.t. $\xi$ over $A \in \mathfrak{A}$ is defined by
$$\int_A \Phi\,d\xi = \sum_{j=1}^n a_j\,\xi(A_j \cap A),$$
which is in $S(K, H)$.
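Let $F \in ca(\mathfrak{A}, T(H))$. For $\Phi = \sum_{j=1}^n a_j 1_{A_j}$ and $\Psi = \sum_{k=1}^m b_k 1_{B_k}$ in $L^0(\Theta; B(H))$, the integral of $(\Phi, \Psi)$ w.r.t. $F$ over $A \in \mathfrak{A}$ is defined by
$$\int_A (\Phi, \Psi)\,dF = \int_A \Phi\,dF\,\Psi^* = \sum_{j=1}^n \sum_{k=1}^m a_j F(A_j \cap B_k \cap A)\,b_k^*.$$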
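The gramian and the two integrals just defined can be checked on a small finite model. The sketch below is a hypothetical real, finite-dimensional example (all names are ours): $\Theta = \{0, \dots, n-1\}$, operators in $S(K,H)$ are $d_H \times d_K$ matrices, $[x,y] = xy^T$, and each atom $\xi(\{i\})$ is a rank-one map whose row space is spanned by a different orthonormal vector of $K$, which makes $\xi$ g.o.s. in the sense of Definition 3.1. For such a measure and $F(\{i\}) = [\xi(\{i\}), \xi(\{i\})]$, the relation $[\int\Phi\,d\xi, \int\Phi\,d\xi] = \int\Phi\,dF\,\Phi^*$ holds with equality.

```python
import numpy as np


def gos_atoms(dH, dK, n, rng):
    """Hypothetical g.o.s. measure on Theta = {0, ..., n-1}: atom i is a rank-one
    dH x dK matrix (an operator K -> H) whose row space is spanned by the i-th
    vector of a random orthonormal basis of K, so [xi_i, xi_j] = 0 for i != j."""
    if n > dK:
        raise ValueError("need n <= dim K for mutually orthogonal row spaces")
    Q, _ = np.linalg.qr(rng.standard_normal((dK, dK)))   # orthonormal columns
    return [rng.standard_normal((dH, 1)) @ Q[:, i:i + 1].T for i in range(n)]


def gramian(x, y):
    """T(H)-valued gramian [x, y] = x y^* (real matrices, so y^* = y^T)."""
    return x @ y.T


def integral(a, atoms, A):
    """int_A Phi dxi = sum_{j in A} a_j xi({j}) for Phi = sum_j a_j 1_{{j}}, a_j in B(H)."""
    return sum(a[j] @ atoms[j] for j in A)


rng = np.random.default_rng(1)
atoms = gos_atoms(2, 4, 3, rng)
a = [rng.standard_normal((2, 2)) for _ in range(3)]        # B(H)-valued coefficients
x = integral(a, atoms, range(3))
F = [gramian(xi, xi) for xi in atoms]                      # F({i}) = [xi_i, xi_i]
rhs = sum(a[j] @ F[j] @ a[j].T for j in range(3))          # int Phi dF Phi^*
print(np.allclose(gramian(x, x), rhs))                     # equality for a g.o.s. measure
```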
Let us put $T^+(H) = \{a \in T(H) : a \ge 0\}$. With these preparations we prove the following proposition, which was mentioned without proof in Kakihara [4, 3.9. Proposition].
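PROPOSITION 3.2 Let $\xi \in ca(\mathfrak{A}, S(K,H))$. Then $\xi$ has a g.o.s.d. if and only if $\xi$ has a $T^+(H)$-valued 2-majorant $F \in ca(\mathfrak{A}, T^+(H))$. That is,
$$\left[\int_\Theta \Phi\,d\xi, \int_\Theta \Phi\,d\xi\right] \le \int_\Theta \Phi\,dF\,\Phi^*, \qquad \Phi \in L^0(\Theta; B(H)). \tag{3.1}$$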
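Proof: Suppose that $\xi$ has a g.o.s.d. $\{\eta, \mathfrak{K}, P\}$. Put $F(\cdot) = [\eta(\cdot), \eta(\cdot)] = \eta(\cdot)\eta(\cdot)^*$; then $F \in ca(\mathfrak{A}, T^+(H))$. For $\Phi = \sum_{j=1}^n a_j 1_{A_j} \in L^0(\Theta; B(H))$, where we may take $A_1, \dots, A_n$ mutually disjoint, we have that
$$\int_\Theta \Phi\,d\xi = \int_\Theta \Phi\,d(P\eta) = P\int_\Theta \Phi\,d\eta,$$
since $\xi = P\eta$ and $P$ commutes with the module action of $B(H)$, and hence
$$\left[\int_\Theta \Phi\,d\xi, \int_\Theta \Phi\,d\xi\right] = \left[P\int_\Theta \Phi\,d\eta, P\int_\Theta \Phi\,d\eta\right] \le \left[\int_\Theta \Phi\,d\eta, \int_\Theta \Phi\,d\eta\right] = \sum_{j=1}^n a_j F(A_j)\,a_j^* = \int_\Theta \Phi\,dF\,\Phi^*.$$
Dilations of Operator-Valued Measures
Conversely,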
suppose that (3.1) holds and define $M : \mathfrak{A} \times \mathfrak{A} \to T(H)$ by
$$M(A, B) = F(A \cap B) - M_\xi(A, B), \qquad A, B \in \mathfrak{A},$$
where $M_\xi(A, B) = [\xi(A), \xi(B)]$. Then we see that $M$ is a positive definite kernel on $\mathfrak{A} \times \mathfrak{A}$ in the sense that
$$\sum_{j,k=1}^n a_j M(A_j, A_k)\,a_k^* \ge 0$$
for any $n \ge 1$, $a_1, \dots, a_n \in B(H)$ and $A_1, \dots, A_n \in \mathfrak{A}$. Thus there exist a reproducing kernel normal Hilbert $B(H)$-module $Y$ of $M$ containing $S(K,H)$ as a closed submodule and an $\eta \in ca_{gos}(\mathfrak{A}, Y)$ such that $\xi = P\eta$, where $P : Y \to S(K,H)$ is the gramian orthogonal projection (cf. [4]). By the structure theorem (cf. Ozawa [12]) there exists a Hilbert space $\mathfrak{K}$ such that $Y \cong S(\mathfrak{K}, H)$, i.e., $Y$ and $S(\mathfrak{K}, H)$ are isomorphic as normal Hilbert $B(H)$-modules, and $K$ can be regarded as a closed subspace of $\mathfrak{K}$. Therefore, $\{\eta, \mathfrak{K}, P\}$ is a g.o.s.d. of $\xi$.

Let $\xi \in ca(\mathfrak{A}, S(K,H))$. The operator semivariation $\|\xi\|_o(A)$ of $\xi$ at $A \in \mathfrak{A}$ is defined by
$$\|\xi\|_o(A) = \sup\left\{\Big\|\sum_{\Delta \in \pi} a_\Delta\,\xi(\Delta)\Big\|_\sigma : a_\Delta \in B(H),\ \|a_\Delta\| \le 1,\ \Delta \in \pi,\ \pi \in \Pi(A)\right\},$$
where $\|\cdot\|_\sigma$ is the Hilbert-Schmidt norm.
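Now $\xi^*$ defined by $\xi^*(\cdot) = \xi(\cdot)^*$ is in $ca(\mathfrak{A}, S(H,K))$. The strong semivariation $\|\xi^*\|_s(A)$ of $\xi^*$ at $A \in \mathfrak{A}$ is defined by
$$\|\xi^*\|_s(A) = \sup\left\{\Big\|\sum_{\Delta \in \pi} \xi^*(\Delta)\phi_\Delta\Big\|_K : \phi_\Delta \in H,\ \|\phi_\Delta\|_H \le 1,\ \Delta \in \pi,\ \pi \in \Pi(A)\right\}.$$
Note that $\xi^*\phi$ defined by $(\xi^*\phi)(\cdot) = \xi^*(\cdot)\phi$ is in $ca(\mathfrak{A}, K)$ for each $\phi \in H$. Then we have the following proposition (cf. Kakihara [5, 5.2 Theorem, 5.3 Corollary and 5.7 Theorem]). The part (2) $\Rightarrow$ (1) was proved in Truong-van [17].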
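PROPOSITION 3.3 For $\xi \in ca(\mathfrak{A}, S(K,H))$ the following statements are equivalent:
(1) $\xi$ has a g.o.s.d.
(2) $\|\xi\|_o(\Theta) < \infty$.
(3) $\|\xi^*\|_s(\Theta) < \infty$.
(4) For some CONS $\{\phi_k\}_{k=1}^\infty$ in $H$ there exists a family $\{\eta_k, \mathfrak{K}_k, J_k\}_{k=1}^\infty$ of o.s.d.'s of $\{\xi^*\phi_k\}_{k=1}^\infty \subseteq ca(\mathfrak{A}, K)$ such that
$$\sum_{k=1}^\infty \|\eta_k(\Theta)\|_{\mathfrak{K}_k}^2 < \infty,$$
where $\|\cdot\|_{\mathfrak{K}_k}$ is the norm in $\mathfrak{K}_k$ for $k \ge 1$.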
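$L^0(\Theta; K)$ denotes the set of all $K$-valued $\mathfrak{A}$-simple functions on $\Theta$ and $fa(\mathfrak{A}, K)$ the Banach space of all $K$-valued finitely additive (f.a.) measures on $(\Theta, \mathfrak{A})$ with the total semivariation norm $\|\cdot\|(\Theta)$, where the semivariation $\|\zeta\|(\cdot)$ of $\zeta \in fa(\mathfrak{A}, K)$ is defined as in (2.1). For $\zeta \in fa(\mathfrak{A}, K)$ and $\varphi = \sum_{j=1}^n \psi_j 1_{A_j} \in L^0(\Theta; K)$ we define the integral of $\varphi$ w.r.t. $\zeta$ by
$$\int_\Theta (\varphi, d\zeta) = \sum_{j=1}^n (\psi_j, \zeta(A_j))_K.$$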
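Moreover, the norm $\|\varphi\|_*$ is defined by
$$\|\varphi\|_* = \sup\left\{\left|\int_\Theta (\varphi, d\zeta)\right| : \zeta \in fa(\mathfrak{A}, K),\ \|\zeta\|(\Theta) \le 1\right\}.$$
It is known (see e.g. Makagon and Salehi [8]) that the dual space of $(L^0(\Theta; K), \|\cdot\|_*)$ can be identified with the Banach space $fa(\mathfrak{A}, K)$, where the isomorphism $U : fa(\mathfrak{A}, K) \to L^0(\Theta; K)^*$ is given by
$$(U\zeta)(\varphi) = \int_\Theta (\varphi, d\zeta), \qquad \varphi \in L^0(\Theta; K).$$
If $\xi \in ca(\mathfrak{A}, S(K,H))$, then the integral of $\varphi$ w.r.t. $\xi$ is defined by
$$\int_\Theta d\xi\,\varphi = \sum_{j=1}^n \xi(A_j)\psi_j \in H.$$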
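Now we get other characterizations of g.o.s.d. as follows:

PROPOSITION 3.4 For $\xi \in ca(\mathfrak{A}, S(K,H))$ the following statements are equivalent:
(1) $\xi$ has a g.o.s.d.
(2) For every CONS $\{\phi_k\}_{k=1}^\infty$ in $H$ there exists a family $\{\eta_k, \mathfrak{K}_k, J_k\}_{k=1}^\infty$ of o.s.d.'s of $\{\xi^*\phi_k\}_{k=1}^\infty \subseteq ca(\mathfrak{A}, K)$ such that
$$\sum_{k=1}^\infty \|\eta_k(\Theta)\|_{\mathfrak{K}_k}^2 < \infty,$$
where $\|\cdot\|_{\mathfrak{K}_k}$ is the norm in $\mathfrak{K}_k$ for $k \ge 1$.
(3) $\sum_{k=1}^\infty \|\xi^*\phi_k\|(\Theta)^2 < \infty$ for every ON sequence $\{\phi_k\}_{k=1}^\infty$ in $H$.
(4) The operator $S_\xi : (L^0(\Theta; K), \|\cdot\|_*) \to H$ defined by $S_\xi\varphi = \int_\Theta d\xi\,\varphi$ for $\varphi \in L^0(\Theta; K)$ is absolutely 2-summing.
(5) The adjoint operator $S_\xi^* : H \to (fa(\mathfrak{A}, K), \|\cdot\|(\Theta))$ of $S_\xi$ defined in (4) is absolutely 2-summing.

Proof: (1) $\Rightarrow$ (2) can be shown in the same manner as in [5, 5.7 Theorem].

(2) $\Rightarrow$ (3): Let $\{\phi_k\}_{k=1}^\infty$ be any ON sequence in $H$ and $\{\psi_k\}_{k=1}^\infty$ be a CONS in $H$ containing $\{\phi_k\}_{k=1}^\infty$. Let $\{\eta_k, \mathfrak{K}_k, J_k\}_{k=1}^\infty$ be a family of o.s.d.'s of $\{\xi^*\psi_k\}_{k=1}^\infty$ such that
$$\sum_{k=1}^\infty \|\eta_k(\Theta)\|_{\mathfrak{K}_k}^2 < \infty,$$
which exists by assumption. Then, for each $k \ge 1$, we have
$$\|\xi^*\psi_k\|(\Theta) = \|J_k\eta_k\|(\Theta) \le \|\eta_k\|(\Theta) = \|\eta_k(\Theta)\|_{\mathfrak{K}_k},$$
since $\eta_k$ is o.s., and hence we obtain
$$\sum_{k=1}^\infty \|\xi^*\phi_k\|(\Theta)^2 \le \sum_{k=1}^\infty \|\xi^*\psi_k\|(\Theta)^2 \le \sum_{k=1}^\infty \|\eta_k(\Theta)\|_{\mathfrak{K}_k}^2 < \infty.$$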
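(3) $\Rightarrow$ (5): For any ON sequence $e = \{\phi_k\}_{k=1}^\infty$ in $H$ we have
$$\begin{aligned}
\|S_\xi^*\|_e &= \Big(\sum_{k=1}^\infty \|S_\xi^*\phi_k\|(\Theta)^2\Big)^{1/2}\\
&= \Big(\sum_{k=1}^\infty \sup\big\{|(S_\xi\varphi, \phi_k)_H|^2 : \varphi \in L^0(\Theta; K),\ \|\varphi\|_* \le 1\big\}\Big)^{1/2} \qquad (3.2)\\
&= \Big(\sum_{k=1}^\infty \sup\Big\{\Big|\int_\Theta (\varphi, d(\xi^*\phi_k))\Big|^2 : \varphi \in L^0(\Theta; K),\ \|\varphi\|_* \le 1\Big\}\Big)^{1/2}\\
&\le \Big(\sum_{k=1}^\infty \sup\big\{\|\varphi\|_*^2\,\|\xi^*\phi_k\|(\Theta)^2 : \varphi \in L^0(\Theta; K),\ \|\varphi\|_* \le 1\big\}\Big)^{1/2}\\
&= \Big(\sum_{k=1}^\infty \|\xi^*\phi_k\|(\Theta)^2\Big)^{1/2} < \infty
\end{aligned}$$
by assumption. By Slowikowski [16, Theorem 1] we see that $S_\xi^*$ is absolutely 2-summing.

(5) $\Rightarrow$ (4): Let $e = \{\phi_k\}_{k=1}^\infty$ be any ON sequence in $H$. Then $\sigma_e(S_\xi)$ is equal to the right-hand side of (3.2), so that $\sigma_e(S_\xi) = \|S_\xi^*\|_e$. Since $\sigma_e(S_\xi) = \|S_\xi^*\|_e < \infty$ for every ON sequence $e$ in $H$ by assumption, it follows from Slowikowski [16, Proposition 1] that there exist a Hilbert space $\mathcal{N}$, an operator $T_1 \in B(L^0(\Theta; K), \mathcal{N})$ with $\|T_1\| = 1$ and a Hilbert-Schmidt class operator $T_2 \in S(\mathcal{N}, H)$ such that $S_\xi = T_2T_1$. Since every Hilbert-Schmidt class operator is absolutely 2-summing, $S_\xi$ is also absolutely 2-summing.

(4) $\Rightarrow$ (1): Since $S_\xi$ is absolutely 2-summing, it follows from Makagon and Salehi [8, 5.4 Theorem] that $\xi$ has a weakly c.a. spectral dilation. Then $\xi^*$ also has a weakly c.a. spectral dilation, so that the proof of 5.2 Theorem in [5] can be applied to conclude the proof.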
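The step "$\|J_k\eta_k\|(\Theta) \le \|\eta_k\|(\Theta) = \|\eta_k(\Theta)\|$ since $\eta_k$ is o.s." in the proof of (2) $\Rightarrow$ (3) can be checked numerically on a finite example. The sketch below assumes a real measure with finitely many atoms (rows of an array); in that case the total semivariation is the maximum of $\|\sum_i \varepsilon_i\zeta(\{i\})\|$ over sign vectors $\varepsilon \in \{-1, 1\}^n$, because a convex function on the cube $[-1, 1]^n$ attains its maximum at a vertex. The complex case would need a supremum over the unit circle instead; the function name is hypothetical.

```python
import itertools

import numpy as np


def semivariation_real(atoms):
    """Total semivariation of a real vector measure with finitely many atoms.

    atoms: (n, d) array, row i = zeta({i}).  The supremum over partitions and
    scalars |alpha| <= 1 equals max ||sum_i eps_i zeta({i})|| over sign vectors,
    since a convex function on [-1, 1]^n attains its maximum at a vertex.
    """
    best = 0.0
    for eps in itertools.product((-1.0, 1.0), repeat=atoms.shape[0]):
        best = max(best, float(np.linalg.norm(np.asarray(eps) @ atoms)))
    return best


# Orthogonal atoms (an o.s. measure): the semivariation equals ||eta(Theta)||.
rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
s = np.array([1.0, 2.0, 0.5, 3.0])
eta_atoms = s[:, None] * Q.T              # row k = s_k q_k, mutually orthogonal
total = float(np.linalg.norm(eta_atoms.sum(axis=0)))
print(semivariation_real(eta_atoms), total)
# Projecting onto the first two coordinates (a stand-in for J_k) cannot increase it.
print(semivariation_real(eta_atoms[:, :2]) <= total + 1e-12)
```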
Combining Propositions 3.2, 3.3 and 3.4, we get the following characterization of g.o.s.d. for a measure in $ca(\mathfrak{A}, S(K,H))$.

THEOREM 3.5 Let $\xi \in ca(\mathfrak{A}, S(K,H))$. Then the following conditions are equivalent:
(1) $\xi$ has a g.o.s.d.
(2) $\xi$ has a $T^+(H)$-valued 2-majorant $F \in ca(\mathfrak{A}, T^+(H))$, i.e.,
$$\left[\int_\Theta \Phi\,d\xi, \int_\Theta \Phi\,d\xi\right] \le \int_\Theta \Phi\,dF\,\Phi^*, \qquad \Phi \in L^0(\Theta; B(H)).$$
(3) $\|\xi\|_o(\Theta) < \infty$.
(4) $\|\xi^*\|_s(\Theta) < \infty$.
(5) For every CONS $\{\phi_k\}_{k=1}^\infty$ in $H$ there exists a family $\{\eta_k, \mathfrak{K}_k, J_k\}_{k=1}^\infty$ of o.s.d.'s of $\{\xi^*\phi_k\}_{k=1}^\infty \subseteq ca(\mathfrak{A}, K)$ such that
$$\sum_{k=1}^\infty \|\eta_k(\Theta)\|_{\mathfrak{K}_k}^2 < \infty,$$
where $\|\cdot\|_{\mathfrak{K}_k}$ is the norm in $\mathfrak{K}_k$ for $k \ge 1$.
(6) For some CONS $\{\phi_k\}_{k=1}^\infty$ in $H$ the conclusion of (5) holds.
(7) $\sum_{k=1}^\infty \|\xi^*\phi_k\|(\Theta)^2 < \infty$ for every ON sequence $\{\phi_k\}_{k=1}^\infty$ in $H$.
(8) The operator $S_\xi : L^0(\Theta; K) \to H$ defined by $S_\xi\varphi = \int_\Theta d\xi\,\varphi$ for $\varphi \in L^0(\Theta; K)$ is absolutely 2-summing.
(9) The adjoint operator $S_\xi^* : H \to fa(\mathfrak{A}, K)$ of $S_\xi$ defined in (8) is absolutely 2-summing.
4. APPLICATION
In this section, we consider Hilbert space valued second order stochastic processes. So let $(\Omega, \mathfrak{F}, \mu)$ be a probability measure space and $L_0^2(\Omega; H)$ be the Hilbert space of all $H$-valued strong random variables $x(\cdot)$ on $\Omega$ such that $\int_\Omega x(\omega)\,\mu(d\omega) = 0$ and $\int_\Omega \|x(\omega)\|_H^2\,\mu(d\omega) < \infty$. We put $X = L_0^2(\Omega; H)$. The inner product and the norm in $X$ are given respectively by
$$(x, y)_X = \int_\Omega (x(\omega), y(\omega))_H\,\mu(d\omega), \qquad \|x\|_X = (x, x)_X^{1/2}, \qquad x, y \in X.$$
$X$ is a left $B(H)$-module with the natural action of $B(H)$ and has a $T(H)$-valued inner product, which we call a gramian, defined by
$$[x, y] = \int_\Omega x(\omega) \otimes y(\omega)\,\mu(d\omega), \qquad x, y \in X,$$
where $\otimes$ is in the sense of Schatten [15]. In fact, $X$ and $S(L_0^2(\Omega), H)$ are isomorphic as normal Hilbert $B(H)$-modules, where $L_0^2(\Omega) = \{f \in L^2(\Omega) : \int_\Omega f\,d\mu = 0\}$, and the correspondence between these spaces is given by $X \ni x \mapsto T_x \in S(L_0^2(\Omega), H)$, $T_x$ being defined by
$$T_x^*\phi = (x(\cdot), \phi)_H, \qquad \phi \in H. \tag{4.1}$$
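For a finite-dimensional $H = \mathbb{R}^d$ (an assumption made only for illustration), the gramian of a mean-zero random vector with itself is its covariance matrix, $[x, y] = E[x\,y^T]$, and $\operatorname{tr}[x, x] = \|x\|_X^2$. The sketch below estimates $[x, y]$ from samples; the sample-average estimator and the function name are ours, not from the text.

```python
import numpy as np


def empirical_gramian(xs, ys):
    """Sample estimate of the gramian [x, y] = E[x (x) y] for mean-zero random
    vectors in H = R^d: (1/S) sum_s x_s y_s^T, with xs, ys of shape (S, d)."""
    return xs.T @ ys / xs.shape[0]


rng = np.random.default_rng(3)
A = np.array([[1.0, 0.5], [0.0, 2.0]])
xs = rng.standard_normal((200_000, 2)) @ A.T   # samples of x = A z, z ~ N(0, I)
gram = empirical_gramian(xs, xs)               # should approximate A A^T
print(np.round(gram, 2))
# The trace of [x, x] is the squared norm ||x||_X^2 = E||x||_H^2.
print(np.isclose(np.trace(gram), np.mean(np.sum(xs ** 2, axis=1))))
```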
Let $G$ be a locally compact abelian group and $\hat G$ be its dual. $\mathfrak{B}_{\hat G}$ denotes the Borel $\sigma$-algebra of $\hat G$. A measure $\xi \in ca(\mathfrak{B}_{\hat G}, L_0^2(\Omega))$ is said to be regular if for any $\varepsilon > 0$ and $A \in \mathfrak{B}_{\hat G}$ there exist a compact set $C$ and an open set $O$ such that $C \subseteq A \subseteq O$ and $\|\xi\|(O \setminus C) < \varepsilon$. Regularity of $X$-valued measures is defined similarly. $rca(\mathfrak{B}_{\hat G}, L_0^2(\Omega))$, $rca_{os}(\mathfrak{B}_{\hat G}, L_0^2(\Omega))$, $rca(\mathfrak{B}_{\hat G}, X)$ and $rca_{gos}(\mathfrak{B}_{\hat G}, X)$ denote the sets of all regular measures in $ca(\mathfrak{B}_{\hat G}, L_0^2(\Omega))$, $ca_{os}(\mathfrak{B}_{\hat G}, L_0^2(\Omega))$, $ca(\mathfrak{B}_{\hat G}, X)$ and $ca_{gos}(\mathfrak{B}_{\hat G}, X)$, respectively. The strong semivariation $\|\xi\|_s(\cdot)$ of $\xi \in ca(\mathfrak{B}_{\hat G}, X)$ is defined to be that of $T^*_{\xi(\cdot)} \in ca(\mathfrak{B}_{\hat G}, S(H, L_0^2(\Omega)))$ (cf. (4.1)). Let $\{x(t)\}$ be an $L_0^2(\Omega)$-valued process on $G$, that is, $x(\cdot) : G \to L_0^2(\Omega)$ is a mapping. Then $\{x(t)\}$ is said to be weakly stationary if its (scalar) covariance function $\gamma(s, t) = (x(s), x(t))_2$ ($s, t \in G$) depends only on the difference $st^{-1}$ and if, putting $\tilde\gamma(st^{-1}) = \gamma(s, t)$, $\tilde\gamma$ is continuous on $G$, where $(\cdot,\cdot)_2$ is the inner product in $L_0^2(\Omega)$. In this case, there exists a unique $\xi \in rca_{os}(\mathfrak{B}_{\hat G}, L_0^2(\Omega))$ such that
$$x(t) = \int_{\hat G} \langle t, \chi\rangle\,\xi(d\chi), \qquad t \in G, \tag{4.2}$$
where $\langle\cdot,\cdot\rangle$ is the duality pair of $G$ and $\hat G$. $\xi$ is called the representing measure of the process $\{x(t)\}$. $\{x(t)\}$ is said to be weakly harmonizable if it has an integral representation (4.2) for some $\xi \in rca(\mathfrak{B}_{\hat G}, L_0^2(\Omega))$.
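The difference between weak stationarity and weak harmonizability can be seen in a toy case. The sketch below assumes $G = \mathbb{R}$, characters $\langle t, \chi\rangle = e^{it\chi}$, and a representing measure with finitely many atoms $\xi(\{\lambda_j\})$ represented as vectors in $\mathbb{C}^n$ (a hypothetical finite-dimensional stand-in for $L_0^2(\Omega)$). If the atoms are mutually orthogonal, i.e. $\xi$ is orthogonally scattered, then $\gamma(s, t) = \sum_j \|\xi(\{\lambda_j\})\|^2 e^{i(s-t)\lambda_j}$ depends only on $s - t$; otherwise the process is in general only weakly harmonizable.

```python
import numpy as np


def covariance(atoms, freqs, s, t):
    """gamma(s, t) = (x(s), x(t)) for x(t) = sum_j exp(i t lambda_j) xi({lambda_j}),
    with atoms[j] a vector in C^n standing in for xi({lambda_j}) (G = R)."""
    def x(u):
        return sum(np.exp(1j * u * lam) * np.asarray(a, dtype=complex)
                   for lam, a in zip(freqs, atoms))
    # (x(s), x(t)) = sum_k x(s)_k conj(x(t)_k); np.vdot conjugates its first argument
    return np.vdot(x(t), x(s))


# Orthogonal atoms (o.s. measure): the covariance depends only on s - t.
orth = [2.0 * np.eye(3)[0], 1.0 * np.eye(3)[1], 0.5 * np.eye(3)[2]]
freqs = [0.3, 1.1, 2.0]
print(np.isclose(covariance(orth, freqs, 0.0, 1.0), covariance(orth, freqs, 2.5, 3.5)))
# Non-orthogonal atoms: the variance gamma(t, t) changes with t.
same = [np.array([1.0]), np.array([1.0])]
print(covariance(same, [0.0, np.pi / 2], 0.0, 0.0), covariance(same, [0.0, np.pi / 2], 1.0, 1.0))
```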
DEFINITION 4.1 Let $\{x(t)\}$ be an $X$-valued process on $G$, i.e., $x(\cdot) : G \to X$ is a mapping.
(1) $\{x(t)\}$ is said to be scalarly weakly harmonizable if $\{(x(t), \phi)_H\}$ is an $L_0^2(\Omega)$-valued weakly harmonizable process for each $\phi \in H$.
(2) $\{x(t)\}$ is said to be weakly harmonizable if it has an integral representation (4.2) for some $\xi \in rca(\mathfrak{B}_{\hat G}, X)$.
(3) $\{x(t)\}$ is said to be weakly operator harmonizable if it has an integral representation (4.2) for some $\xi \in rca(\mathfrak{B}_{\hat G}, X)$ of bounded operator semivariation, i.e., $\|\xi\|_o(\hat G) < \infty$.
(4) $\{x(t)\}$ is said to be operator stationary if its operator covariance function $\Gamma(s, t) = [x(s), x(t)]$ ($s, t \in G$) depends only on the difference $st^{-1}$ and, putting $\tilde\Gamma(st^{-1}) = \Gamma(s, t)$, $\tilde\Gamma$ is a $T(H)$-valued weakly continuous function on $G$, i.e., $\operatorname{Tr}(a\tilde\Gamma(\cdot))$ is continuous for every $a \in B(H)$. Here $\operatorname{Tr}(\cdot)$ is the trace in $T(H)$. In this case, $\{x(t)\}$ has an integral representation (4.2) for a unique $\xi \in rca_{gos}(\mathfrak{B}_{\hat G}, X)$.

We are interested in stationary dilations of $L_0^2(\Omega)$-valued or $X$-valued processes on $G$. An $L_0^2(\Omega)$-valued process $\{x(t)\}$ is said to have a weakly stationary dilation if there exist a Hilbert space $\mathfrak{K}$ containing $L_0^2(\Omega)$ as a closed subspace and a $\mathfrak{K}$-valued weakly stationary process $\{y(t)\}$ on $G$ such that $x(t) = Jy(t)$ for $t \in G$, where $J : \mathfrak{K} \to L_0^2(\Omega)$ is the orthogonal projection. The triple $\{\{y(t)\}, \mathfrak{K}, J\}$ is also called a weakly stationary dilation of $\{x(t)\}$. Note that $\mathfrak{K}$ can be taken as an $L^2$-space $L_0^2(\tilde\Omega)$ for some probability measure space $(\tilde\Omega, \tilde{\mathfrak{F}}, \tilde\mu)$. Moreover, if $\eta \in rca_{os}(\mathfrak{B}_{\hat G}, \mathfrak{K})$ is the representing measure of $\{y(t)\}$, then $\{x(t)\}$ is weakly harmonizable with the representing measure $\xi \in rca(\mathfrak{B}_{\hat G}, L_0^2(\Omega))$, for which $\eta$ is an o.s.d. of $\xi$. Since every $\xi \in ca(\mathfrak{B}_{\hat G}, L_0^2(\Omega))$ has an o.s.d. by Corollary 2.3, every $L_0^2(\Omega)$-valued weakly harmonizable process has a weakly stationary dilation, which was proved by Niemi [9] (see also Chang and Rao [1] and Rao [13]). If $\{x(t)\}$ is an $X$-valued scalarly weakly harmonizable process, then there exists a family $\{\{y_\phi(t)\}, L_0^2(\Omega_\phi), J_\phi$
DEFINITION 4.2 An $X$-valued process $\{x(t)\}$ on $G$ is said to have an operator stationary dilation if there exist a normal Hilbert $B(H)$-module $Y = L_0^2(\tilde\Omega; H)$ containing $X$ as a closed submodule and a $Y$-valued operator stationary process $\{y(t)\}$ such that $x(t) = Py(t)$ for $t \in G$, where $P : Y \to X$ is the gramian orthogonal projection and $(\tilde\Omega, \tilde{\mathfrak{F}}, \tilde\mu)$ is a probability measure space. The triple $\{\{y(t)\}, Y, P\}$ is also called an operator stationary dilation of $\{x(t)\}$.
DEFINITION 4.3 An $L_0^2(\Omega)$-valued process $\{x(t)\}$ on $G$ is said to be V-bounded if
(a) $x(\cdot) : G \to L_0^2(\Omega)$ is norm continuous,
(b) $\{x(t)\}$ is bounded, i.e., $\sup\{\|x(t)\|_2 : t \in G\} < \infty$,
(c) there exists a constant $C > 0$ such that
$$\left\|\int_G \varphi(t)\,x(t)\,dt\right\|_2 \le C\|\hat\varphi\|_\infty, \qquad \varphi \in L^1(G), \tag{4.3}$$
where $L^1(G)$ is the $L^1$-group algebra of $G$ with the Haar measure $dt$, $\|\hat\varphi\|_\infty$ is the sup norm of the Fourier transform of $\varphi$, and the integral is in the sense of Bochner. An $X$-valued process $\{x(t)\}$ on $G$ is said to be scalarly V-bounded if, for each … 0 such that
where $L^1(G; B(H))$ is the Banach space of all $B(H)$-valued functions on $G$ which are Bochner integrable w.r.t. $dt$ and, for $\Phi \in L^1(G; B(H))$, $\hat\Phi \in C_0(\hat G; B(H))$ (the Banach space of all $B(H)$-valued norm continuous functions on $\hat G$ vanishing at infinity with the sup norm $\|\cdot\|_\infty$) is its Fourier transform, i.e.,
$$\hat\Phi(\chi) = \int_G \langle t, \chi\rangle\,\Phi(t)\,dt, \qquad \chi \in \hat G.$$
If $\{x(t)\}$ is an $X$-valued weakly harmonizable process with the representing measure $\xi \in rca(\mathfrak{B}_{\hat G}, X)$, then it has an operator stationary dilation if and only if $\xi$ satisfies one of the conditions (1)-(9) in Theorem 3.5. More generally, we have the following proposition.
PROPOSITION 4.4 Let $\{x(t)\}$ be an $X$-valued process on $G$. Then the following statements are equivalent:
(1) $\{x(t)\}$ has an operator stationary dilation.
(2) $\{x(t)\}$ is weakly operator harmonizable.
(3) $\{x(t)\}$ is operator V-bounded.
(4) For some CONS $\{\phi_k\}_{k=1}^\infty$ in $H$ there exists a family $\{\{y_k(t)\}, L_0^2(\Omega_k), J_k\}_{k=1}^\infty$ of weakly stationary dilations of $\{(x(t), \phi_k)_H\}_{k=1}^\infty$ such that
$$\sum_{k=1}^\infty \|y_k(t)\|_{2,k}^2 < \infty \quad \text{for } t \in G,$$
where $\|\cdot\|_{2,k}$ is the norm in $L_0^2(\Omega_k)$ for $k \ge 1$.
(5) For every CONS $\{\phi_k\}_{k=1}^\infty$ in $H$ the conclusion of (4) holds.
(6) $\{x(t)\}$ is scalarly V-bounded and $\sum_{k=1}^\infty C_{\phi_k}^2 < \infty$ for every ON sequence $\{\phi_k\}_{k=1}^\infty$ in $H$, where, for each $k \ge 1$, $C_{\phi_k} > 0$ is the constant in (4.3) for the $L_0^2(\Omega)$-valued process $\{(x(t), \phi_k)_H\}$.
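Proof: The equivalence (1) $\Leftrightarrow$ (2) was proved in [5, 5.5 Theorem], (2) $\Leftrightarrow$ (3) in [4, 3.7. Theorem] and (1) $\Leftrightarrow$ (4) in [5, 5.7 Theorem]. (1) $\Leftrightarrow$ (5) is proved similarly.

(1) $\Rightarrow$ (6): Suppose that $\{x(t)\}$ has an operator stationary dilation $\{\{y(t)\}, Y, P\}$. We may assume that $Y = L_0^2(\tilde\Omega; H)$ for some probability measure space $(\tilde\Omega, \tilde{\mathfrak{F}}, \tilde\mu)$. Let $\eta \in rca_{gos}(\mathfrak{B}_{\hat G}, Y)$ be the representing measure of $\{y(t)\}$ and put $\xi = P\eta \in rca(\mathfrak{B}_{\hat G}, X)$, which is of bounded operator semivariation and is the representing measure of $\{x(t)\}$. For $\phi \in H$ and $t \in G$ we put $x_\phi(t) = (x(t), \phi)_H$. Then $\{x_\phi(t)\}$ is an $L_0^2(\Omega)$-valued weakly harmonizable process with the representing measure $\xi_\phi(\cdot) = (\xi(\cdot), \phi)_H \in rca(\mathfrak{B}_{\hat G}, L_0^2(\Omega))$. It is well known that an $L_0^2(\Omega)$-valued process is weakly harmonizable if and only if it is V-bounded (see e.g. [13]). Hence $\{x(t)\}$ is scalarly V-bounded. Let $\{\phi_k\}_{k=1}^\infty$ be any ON sequence in $H$ and $\{\psi_k\}_{k=1}^\infty$ be a CONS in $H$ containing $\{\phi_k\}_{k=1}^\infty$. Then we get for each $k \ge 1$
$$\begin{aligned}
\sup\Big\{\Big\|\int_G \varphi(t)\,x_{\psi_k}(t)\,dt\Big\|_2 : \varphi \in L^1(G),\ \|\hat\varphi\|_\infty \le 1\Big\}
&= \sup\Big\{\Big\|\int_{\hat G} \hat\varphi(\chi)\,\xi_{\psi_k}(d\chi)\Big\|_2 : \varphi \in L^1(G),\ \|\hat\varphi\|_\infty \le 1\Big\}\\
&= \sup\Big\{\Big\|\int_{\hat G} f(\chi)\,\xi_{\psi_k}(d\chi)\Big\|_2 : f \in C_0(\hat G),\ \|f\|_\infty \le 1\Big\}\\
&= \|\xi_{\psi_k}\|(\hat G) = \|J\eta_{\psi_k}\|(\hat G) \le \|\eta_{\psi_k}\|(\hat G) = \|\eta_{\psi_k}(\hat G)\|_2,
\end{aligned} \tag{4.4}$$
where $C_0(\hat G)$ is the Banach space of all complex valued continuous functions on $\hat G$ vanishing at infinity, $J : L_0^2(\tilde\Omega) \to L_0^2(\Omega)$ is the orthogonal projection and $\eta_\phi(\cdot) = (\eta(\cdot), \phi)_H \in rca_{os}(\mathfrak{B}_{\hat G}, L_0^2(\tilde\Omega))$ for $\phi \in H$. Putting $C_{\phi_k} = \|\eta_{\phi_k}(\hat G)\|_2$ for $k \ge 1$, we see that
$$\sum_{k=1}^\infty C_{\phi_k}^2 = \sum_{k=1}^\infty \|\eta_{\phi_k}(\hat G)\|_2^2 \le \sum_{k=1}^\infty \|\eta_{\psi_k}(\hat G)\|_2^2 = \|\eta(\hat G)\|_Y^2 < \infty,$$
since $\eta(\cdot) = \sum_{k=1}^\infty (\eta(\cdot), \psi_k)_H\,\psi_k$, where $\|\cdot\|_Y$ is the norm in $Y$.

(6) $\Rightarrow$ (1): Let $\{\psi_k\}_{k=1}^\infty$ be a CONS in $H$. Since $\{x(t)\}$ is scalarly V-bounded, it is scalarly weakly harmonizable. For each $k \ge 1$ let $\xi_{\psi_k} \in rca(\mathfrak{B}_{\hat G}, L_0^2(\Omega))$ be the representing measure of the $L_0^2(\Omega)$-valued process $\{(x(t), \psi_k)_H\}$. Then we have by (4.4) that $\|\xi_{\psi_k}\|(\hat G) \le C_{\psi_k}$. Define $\xi(\cdot) = \sum_{k=1}^\infty \xi_{\psi_k}(\cdot)\,\psi_k$. Then $\xi$ is well defined and in $rca(\mathfrak{B}_{\hat G}, X)$, since (cf. Diestel and Uhl [2, p. 4])
$$\|\xi\|(\hat G)^2 \le 16\sup_{A \in \mathfrak{B}_{\hat G}} \|\xi(A)\|_X^2 = 16\sup_{A \in \mathfrak{B}_{\hat G}} \sum_{k=1}^\infty \|\xi_{\psi_k}(A)\|_2^2 \le 16\sum_{k=1}^\infty \|\xi_{\psi_k}\|(\hat G)^2 \le 16\sum_{k=1}^\infty C_{\psi_k}^2 < \infty$$
by assumption. Now we see that $\{x(t)\}$ is weakly harmonizable with the representing measure $\xi$, since
$$x(t) = \sum_{k=1}^\infty (x(t), \psi_k)_H\,\psi_k = \sum_{k=1}^\infty \int_{\hat G} \langle t, \chi\rangle\,\xi_{\psi_k}(d\chi)\,\psi_k = \int_{\hat G} \langle t, \chi\rangle \sum_{k=1}^\infty \xi_{\psi_k}(d\chi)\,\psi_k = \int_{\hat G} \langle t, \chi\rangle\,\xi(d\chi), \qquad t \in G.$$
Let $\{\phi_k\}_{k=1}^\infty$ be any ON sequence in $H$. Then
$$\sum_{k=1}^\infty \|\xi_{\phi_k}\|(\hat G)^2 \le \sum_{k=1}^\infty C_{\phi_k}^2 < \infty,$$
where $\xi_\phi(\cdot) = (\xi(\cdot), \phi)_H$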
References
[1] D. K. Chang and M. M. Rao, Bimeasures and nonstationary processes, in: Real and Stochastic Analysis, Ed. by M. M. Rao, John Wiley & Sons, New York, pp. 7-118, 1986.
[2] J. Diestel and J. J. Uhl, Jr., Vector Measures, Amer. Math. Soc., Providence, R. I., 1977.
[3] A. Grothendieck, Résumé de la théorie métrique des produits tensoriels topologiques, Bol. Soc. Mat. São Paulo 8 (1956), 1-79.
[4] Y. Kakihara, A note on harmonizable and V-bounded processes, J. Multivar. Anal. 16 (1985), 140-156.
[5] Y. Kakihara, A classification of vector harmonizable processes, Stochastic Anal. Appl. 10 (1992), 277-311.
[6] Y. Kakihara, Multidimensional Second Order Stochastic Processes, World Scientific, in preparation.
[7] J. Lindenstrauss and A. Pelczynski, Absolutely summing operators in $\mathcal{L}_p$-spaces and their applications, Studia Math. 29 (1968), 275-326.
[8] A. Makagon and H. Salehi, Spectral dilation of operator-valued measures and its application to infinite dimensional harmonizable processes, Studia Math. 85 (1987), 257-297.
[9] H. Niemi, On stationary dilations and the linear prediction of certain stochastic processes, Soc. Sci. Fenn. Comment. Phys.-Math. 45 (1975), 111-130.
[10] H. Niemi, On orthogonally scattered dilations of bounded vector measures, Ann. Acad. Sci. Fenn. Ser. A I Math. 3 (1977), 43-52.
[11] H. Niemi, Orthogonally scattered dilations of finitely additive vector measures with values in a Hilbert space, in: Prediction Theory and Harmonic Analysis, The Pesi Masani Volume, Ed. by V. Mandrekar and H. Salehi, North-Holland, New York, pp. 233-251, 1983.
[12] M. Ozawa, Hilbert B(H)-modules and stationary processes, Kodai Math. J. 3 (1980), 26-39.
[13] M. M. Rao, Harmonizable processes: structure theory, L'Enseign. Math. 28 (1982), 295-351.
[14] M. Rosenberg, Quasi-isometric dilations of operator-valued measures and Grothendieck's inequality, Pacific J. Math. 103 (1981), 135-161.
[15] R. Schatten, Norm Ideals of Completely Continuous Operators, Springer, New York, 1960.
[16] W. Slowikowski, Absolutely 2-summing mappings from and to Hilbert spaces and a Sudakov Theorem, Bull. Acad. Polon. Sci. Ser. Sci. Math. Astronom. Phys. 17 (1969), 381-386.
[17] B. Truong-van, Une généralisation du théorème de Kolmogorov-Aronszajn, processus V-bornés q-dimensionnels: domaine spectral, dilatations stationnaires, Ann. Inst. Henri Poincaré Sec. B 17 (1981), 31-49.
Transient Solution of the M/M/1 Queueing System via Randomization
ALAN KRINIK
Department of Mathematics, California State Polytechnic University, Pomona, California 91768
DANIEL MARCUS
Department of Mathematics, California State Polytechnic University, Pomona, California 91768
DAN KALMAN
Department of Mathematics and Statistics, American University, Washington D. C. 20016
TERRY CHENG
Department of Mathematics, Irvine Valley Community College, Irvine, California 92720
1 INTRODUCTION
A. K. Erlang's [3] single server queueing system M/M/1 is the prototype of modern queueing models. Students of queueing theory generally begin their studies of queueing systems by learning about the M/M/1 system. Mathematical and statistical research on M/M/1 is quite extensive (see, for example, [7,8,10,14,15]) and remains a rich source of active research interest today. The total number of customers in the M/M/1 queueing system is a birth-death process having constant birth (arrival) rates $\lambda$ and constant death (service) rates $\mu$. The state space is the set of nonnegative integers $\{0, 1, 2, \dots\}$, which represents the possible number of customers in the system at any point in time. Transitions within the M/M/1 queueing system are pictured according to the following state rate diagram.
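Diagram 1 (state rate diagram: states $0, 1, 2, \dots$; each transition $j \to j+1$ occurs at rate $\lambda$ and each transition $j \to j-1$, $j \ge 1$, at rate $\mu$)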
Krinik et al.
Alternatively, M/M/1 may be characterized as having independent and identically distributed interarrival times that are exponentially distributed with parameter $\lambda$ and independent and identically distributed service times that are exponentially distributed with parameter $\mu$. In this article we consider the fundamental problem of determining the transient probability functions $p_{i,j}(t)$, which represent the probability of going from state $i$ to state $j$ in time $t$. These functions are known [7,10] to satisfy the following infinite system of differential equations.
$$p_{i,0}'(t) = -\lambda p_{i,0}(t) + \mu p_{i,1}(t),$$
$$p_{i,j}'(t) = \lambda p_{i,j-1}(t) - (\lambda+\mu)p_{i,j}(t) + \mu p_{i,j+1}(t),$$
for $i = 0, 1, 2, 3, \dots$ and $j = 1, 2, 3, \dots$, where $\lambda$ and $\mu$ are positive constants. This problem for the M/M/1 queueing system was first solved in the early 1950's [1,4,6,13], decades after Erlang's pioneering work. Even after the solution was known, researchers have continued to develop new solution methods (cf. [8,12,15]) to achieve a better understanding of this process. Interest in the transient probabilities of M/M/1 is strong for a number of reasons: the central (and influential) role played by M/M/1 in queueing theory; the increasing realization of the importance of transient information; and the inherently complicated (and interesting) nature of existing solution methods, which challenges us to develop, when possible, simpler solution methods and alternative solution forms that are numerically stable and inviting to practitioners.

Interestingly, also starting in the early 1950's, a general method to numerically determine transient probability functions in Markov processes, known as randomization (or uniformization), was introduced; see [9] for history. Since its inception, randomization has been gaining recognition as an effective, numerically stable method to calculate transient probabilities. The importance of randomization lies in its use in numerically solving for transient probabilities in complicated queueing systems where analytic solutions are unknown or intractable. Essentially, randomization shifts the problem of determining transition probability functions in time $t$ to finding $n$-step transition probabilities of a suitable, embedded Markov chain (see [9,10,11,14]). Our purpose in this article is to report a solution of the transition probability functions of M/M/1 using the randomization method.
The key idea is to count certain sets of sample paths of the embedded Markov chain associated with M/M/1 by making a one-to-one correspondence with another class of sample paths, which can be counted using the reflection principle. Our solution has a different form than the usual solution of M/M/1 given in terms of Bessel functions (cf. [7,10]). In place of Bessel function theory, Laplace transforms, generating functions and complex analysis, we rely upon the randomization form of the solution and on counting sample paths via coding and the reflection principle.
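The randomization idea can be sketched numerically. Writing $Q$ for the generator of the M/M/1 process and $P = I + Q/(\lambda+\mu)$ for the transition matrix of the embedded (uniformized) chain, randomization gives $P(t) = \sum_{N \ge 0} e^{-(\lambda+\mu)t}((\lambda+\mu)t)^N P^N/N!$. The Python sketch below evaluates one row of this series. It is a generic numerical illustration, not the authors' combinatorial solution, and its two truncations (the state space cut at a hypothetical level `M`, where the chain holds with probability $p$, and the Poisson series cut a few standard deviations past its mean) are our assumptions.

```python
import numpy as np
from math import exp, lgamma, log, sqrt


def mm1_row_by_randomization(i, t, lam, mu, M=200, n_sd=10):
    """Approximate (p_{i,0}(t), ..., p_{i,M}(t)) for M/M/1 by randomization.

    Assumptions (ours, for illustration): the state space is truncated at M,
    where the chain holds with probability p instead of moving up, and the
    Poisson series is cut off n_sd standard deviations past its mean.
    """
    rate = lam + mu
    p, q = lam / rate, mu / rate
    # Embedded chain of Diagram 2, truncated at state M.
    P = np.zeros((M + 1, M + 1))
    P[0, 0], P[0, 1] = q, p            # loop at 0 with prob. q, up with prob. p
    for s in range(1, M):
        P[s, s - 1], P[s, s + 1] = q, p
    P[M, M - 1], P[M, M] = q, p        # truncation boundary (assumption)

    mean = rate * t                    # Poisson mean (lam + mu) t
    n_max = int(mean + n_sd * sqrt(mean)) + 20
    row = np.zeros(M + 1)
    row[i] = 1.0                       # e_i P^0
    result = np.zeros(M + 1)
    for N in range(n_max + 1):
        # Poisson weight e^{-mean} mean^N / N!, computed in log space
        w = exp(-mean + N * log(mean) - lgamma(N + 1)) if mean > 0 else float(N == 0)
        result += w * row              # accumulate w_N * (e_i P^N)
        row = row @ P                  # advance one step of the embedded chain
    return result


r = mm1_row_by_randomization(0, 200.0, lam=0.5, mu=1.0, M=100)
print(r[:4])
```

For $\lambda = 0.5$, $\mu = 1$ and a large time such as $t = 200$, the returned row should be close to the geometric stationary distribution $(1-\rho)\rho^j$ with $\rho = \lambda/\mu$. Computing the Poisson weights in log space avoids the underflow of $e^{-(\lambda+\mu)t}$ that a direct evaluation would hit for very large $t$.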
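2 SOLUTIONS OF M/M/1 VIA RANDOMIZATION
The governing system of differential equations for $p_{i,j}(t)$ may be written in infinite matrix notation as
$$P'(t) = P(t)\,Q,$$
where $P(t) = (p_{i,j}(t))$, $i, j = 0, 1, 2, \dots$, and
$$Q = \begin{pmatrix} -\lambda & \lambda & 0 & 0 & \cdots\\ \mu & -(\lambda+\mu) & \lambda & 0 & \cdots\\ 0 & \mu & -(\lambda+\mu) & \lambda & \cdots\\ \vdots & & \ddots & \ddots & \ddots \end{pmatrix}.$$
A solution of this system by randomization (cf. [10,14]) is given as
$$P(t) = \sum_{N=0}^\infty e^{-(\lambda+\mu)t}\,\frac{((\lambda+\mu)t)^N}{N!}\,P^N,$$
where
$$P = \frac{1}{\lambda+\mu}\begin{pmatrix} \mu & \lambda & 0 & 0 & \cdots\\ \mu & 0 & \lambda & 0 & \cdots\\ 0 & \mu & 0 & \lambda & \cdots\\ \vdots & & \ddots & \ddots & \ddots \end{pmatrix}$$
is a stochastic matrix corresponding to the following embedded Markov chain.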
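Diagram 2 (embedded Markov chain on the states $0, 1, 2, \dots$: from every state the chain moves up one state with probability $p$; from every state $j \ge 1$ it moves down one state with probability $q$; at state $0$ it stays with probability $q$)
with
$$p = \frac{\lambda}{\lambda+\mu} \quad\text{and}\quad q = \frac{\mu}{\lambda+\mu}.$$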
In particular,
$$p_{i,j}(t) = \sum_{N=0}^\infty e^{-(\lambda+\mu)t}\,\frac{((\lambda+\mu)t)^N}{N!}\,p_{i,j}^{(N)}$$
for $i, j = 0, 1, 2, \dots$, where $p_{i,j}^{(N)}$ is the probability of going from state $i$ to state $j$ in $N$ steps for the embedded Markov chain pictured in Diagram 2. Therefore, we have an explicit solution of $p_{i,j}(t)$ for the M/M/1 queueing system once we have an expression for $p_{i,j}^{(N)}$. This is accomplished in the following theorem.
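THEOREM 1 Suppose $i$ and $j$ are any two states: $i, j = 0, 1, 2, \dots$. Assume $N = 1, 2, 3, \dots$ is chosen and let $c = i + j$, $d = j - i$, $p = \frac{\lambda}{\lambda+\mu}$ and $q = \frac{\mu}{\lambda+\mu}$. Then $p_{i,j}^{(N)}$, the $N$-step transition probability from state $i$ to state $j$ for the embedded Markov chain in Diagram 2, is given by
$$p_{i,j}^{(N)} = \left\{\binom{N}{\frac{N+d}{2}} - \binom{N}{\frac{N+c}{2}+1}\right\} q^{\frac{N-d}{2}} p^{\frac{N+d}{2}} + \sum_{m=1}^{\frac{N-c}{2}} \left\{\binom{N}{\frac{N+c+2m}{2}} - \binom{N}{\frac{N+c+2m+2}{2}}\right\} q^{\frac{N-d}{2}+m} p^{\frac{N+d}{2}-m}$$
when $N + d$ is even and
$$p_{i,j}^{(N)} = \sum_{m=0}^{\frac{N-c-1}{2}} \left\{\binom{N}{\frac{N+c+2m+1}{2}} - \binom{N}{\frac{N+c+2m+3}{2}}\right\} q^{\frac{N-d+1}{2}+m} p^{\frac{N+d-1}{2}-m}$$
when $N + d$ is odd. Note, we adopt the convention that $\binom{N}{M} = 0$ if $M > N$ or $M < 0$.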
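Theorem 1 can be sanity-checked numerically. The sketch below codes the two cases of the theorem ($N + d$ even and odd) and compares them with the $N$-th power of the transition matrix of Diagram 2, truncated at a level the chain cannot reach in $N$ steps, so the comparison is exact up to rounding. The function names are ours.

```python
from math import comb

import numpy as np


def binom(n, k):
    """Binomial coefficient with the convention C(n, k) = 0 if k > n or k < 0."""
    return comb(n, k) if 0 <= k <= n else 0


def p_n_step(i, j, N, p):
    """p_{i,j}^{(N)} for the embedded chain of Diagram 2 via Theorem 1."""
    q = 1.0 - p
    c, d = i + j, j - i
    if abs(d) > N:
        return 0.0                              # j cannot be reached in N steps
    if (N + d) % 2 == 0:
        # no loops at 0 (paths in S(0) and R), then 2m loops, m = 1, ..., (N - c)/2
        total = (binom(N, (N + d) // 2) - binom(N, (N + c) // 2 + 1)) \
            * q ** ((N - d) // 2) * p ** ((N + d) // 2)
        for m in range(1, (N - c) // 2 + 1):
            total += (binom(N, (N + c) // 2 + m) - binom(N, (N + c) // 2 + m + 1)) \
                * q ** ((N - d) // 2 + m) * p ** ((N + d) // 2 - m)
    else:
        # 2m + 1 loops at 0, m = 0, ..., (N - c - 1)/2
        total = 0.0
        for m in range((N - c - 1) // 2 + 1):
            total += (binom(N, (N + c + 1) // 2 + m) - binom(N, (N + c + 3) // 2 + m)) \
                * q ** ((N - d + 1) // 2 + m) * p ** ((N + d - 1) // 2 - m)
    return total


def p_n_step_brute(i, j, N, p):
    """Same quantity from the N-th power of the embedded transition matrix,
    truncated at a level M that the chain cannot reach from i in N steps."""
    M = i + N + 1
    P = np.zeros((M + 1, M + 1))
    P[0, 0], P[0, 1] = 1.0 - p, p
    for s in range(1, M):
        P[s, s - 1], P[s, s + 1] = 1.0 - p, p
    P[M, M - 1], P[M, M] = 1.0 - p, p         # never reached within N steps
    return np.linalg.matrix_power(P, N)[i, j] if j <= M else 0.0


print(p_n_step(1, 0, 3, 0.3), p_n_step_brute(1, 0, 3, 0.3))
```

For example, `p_n_step(0, 0, 2, p)` evaluates to $pq + q^2$, matching the two paths up-down and loop-loop.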
PROOF. The number of sample paths in Diagram 2 from state $i$ to state $j$ in $N$ steps having either $2m$ or $2m+1$ loops at state zero is provided in the following lemma. This counting result allows one to determine the contribution to $p_{i,j}^{(N)}$ from sample paths having a prescribed number of loops at state zero. For any finite sequence of numbers $S = s_1 s_2 s_3 \cdots s_N$ we adopt the notation that $S_k$ means $\sum_{n=1}^k s_n$. The following definitions are motivated by classifying sample paths from $i$ to $j$ in Diagram 2 according to whether we loop at zero exactly $m$ times, merely touch state zero without looping, or never even visit state zero during our journey from $i$ to $j$.
DEFINITION 2 Suppose $i, j$ are two arbitrary states from $0, 1, 2, 3, \dots$ and assume $N = 1, 2, 3, \dots$ is given. A sample path of length $N$ from state $i$ to state $j$ in Diagram 2 may be represented by a finite sequence of $N$ elements $A = a_1 a_2 a_3 \cdots a_N$ such that $A_0 = 0$ by convention, $A_N = j - i$ and
$$a_n = \begin{cases} 0 \text{ or } 1 & \text{if } A_{n-1} = -i,\\ -1 \text{ or } 1 & \text{if } A_{n-1} \ne -i, \end{cases} \qquad n = 1, 2, \dots, N.$$
The set of all sample paths $A$ of length $N$ from $i$ to $j$ having exactly $m$ of its elements equal to zero (that is, $a_{n_k} = 0$ for $k = 1, 2, \dots, m$ for some subsequence of $a_n$) is denoted by $S^N_{i,j}(m)$, where $m \ge 1$. $S^N_{i,j}(0)$ will denote those sample paths $A$ of length $N$ from $i$ to $j$ for which $a_n \ne 0$ for each $n = 1, 2, \dots, N$ and yet $A_n = -i$ for some $n = 1, 2, \dots, N-1$. Finally, let $R^N_{i,j}$ represent all the remaining sample paths $A$ of length $N$ from $i$ to $j$; that is, those sample paths $A$ that go from $i$ to $j$ in $N$ steps such that $A_n \ne -i$ for each $n = 1, 2, \dots, N-1$.

The next proposition establishes that $S^N_{i,j}(m)$ and $S^N_{i,j+m}(0)$ have the same number of sample paths for $m = 0, 1, \dots, N-i-j$. The following graphs illustrate how this one-to-one correspondence may be visualized when $N = 9$, $i = 1$, $j = 2$ and $m = 0, 1$ or $2$. Start with a sample path in $S^9_{1,2}(0)$ such as shown in Figure 1. If we change the rightmost "zero to one" segment (dashed in Figure 1) into a horizontal segment and lower the graph to the right of this segment one unit, we obtain the sample path in $S^9_{1,1}(1)$ pictured in Figure 2. Repeat the process by changing the rightmost "zero to one" segment of Figure 2 (the dashed segment) into another horizontal segment and once again lowering the part of the graph to the right of the new horizontal segment one unit to obtain the sample path in $S^9_{1,0}(2)$ of Figure 3. These operations are easy to reverse by successively replacing horizontal segments by "zero to one" segments and raising the graph to the right of each replacement one unit. The reasoning as to why this procedure is a one-to-one correspondence is contained within the proof of the following proposition.
Figure 1 (sample path from $(0,1)$ to $(9,2)$)
Figure 2 (sample path from $(0,1)$ to $(9,1)$)
Figure 3 (sample path starting at $(0,1)$)
PROPOSITION 3 Suppose $i, j$ are arbitrary states of the embedded Markov chain in Diagram 2. Assume $N = 1, 2, 3, \dots$; then
(a) $S^N_{i,0}(k)$ and $S^N_{i,k}(0)$ have the same number of sample paths for $k = 0, 1, \dots, N-i$.
(b) $S^N_{i,j}(m)$ and $S^N_{i,j+m}(0)$ have the same number of sample paths for $m = 0, 1, \dots, N-i-j$.
PROOF. (a) Define functions $f$ and $g$ as follows. Let $f : S^N_{i,0}(k) \to S^N_{i,k}(0)$ be such that for $A \in S^N_{i,0}(k)$, $f(A) = B = b_1 b_2 \cdots b_N$, where
$$b_n = \begin{cases} -1 & \text{if } a_n = -1,\\ 1 & \text{otherwise}. \end{cases}$$
In words, $f$ changes all zero entries of $A$ into ones. Define $g : S^N_{i,k}(0) \to S^N_{i,0}(k)$ so that for $B \in S^N_{i,k}(0)$, $g(B) = A = a_1 a_2 \cdots a_N$, where
$$a_n = \begin{cases} -1 & \text{if } b_n = -1,\\ 0 & \text{if } b_n = 1 \text{ and } B_{n-1} < B_r \text{ for each } r = n, n+1, \dots, N,\\ 1 & \text{otherwise}. \end{cases}$$
The function $g$ changes $k$ one entries to zero and keeps everything else the same. To see that $f$ is one to one, argue by contradiction. That is, suppose there exists $A \ne A'$ with $f(A) = f(A')$. Let $A = a_1 a_2 \cdots a_N$ and $A' = a'_1 a'_2 \cdots a'_N$, and let $h$ be the largest subscript such that $a_h \ne a'_h$. Neither $a_h$ nor $a'_h$ can be $-1$, because $f(A) = f(A')$ and the definition of $f$ would force both to be $-1$, which contradicts $a_h \ne a'_h$. So $a_h = 0$ and $a'_h = 1$, or $a_h = 1$ and $a'_h = 0$. Without loss of generality assume $a_h = 0$ and $a'_h = 1$. However, $A_N = A'_N = -i$ and $a_n = a'_n$ for $n > h$, implying $A_h = A'_h$; moreover $A_{h-1} = -i$ by Definition 2 of $S^N_{i,0}(k)$. But this means $A'_{h-1} = -i-1$, an impossibility since $A'_{h-1} \ge -i$ by Definition 2. Therefore, $f(A) \ne f(A')$ and $f$ is one to one. Clearly $f(g(B)) = B$ for all $B \in S^N_{i,k}(0)$, which means $f$ is also onto. This completes the proof that $f$ is a one-to-one correspondence between $S^N_{i,0}(k)$ and $S^N_{i,k}(0)$.
(b)
Define $f$ and $g$ as in part (a). Again $f : S^N_{i,j}(m) \to S^N_{i,j+m}(0)$ changes zeroes to ones, and $g : S^N_{i,j+m}(0) \to S^N_{i,0}(j+m)$ is as before. The same argument given in part (a) applies to give that $f$ is one to one. To check that $f$ is also onto, define a new function $h : S^N_{i,0}(j+m) \to S^N_{i,j}(m)$ as follows. If $A \in S^N_{i,0}(j+m)$, then $A$ has $j+m$ zero entries. That is, there is a subsequence $n_1 < n_2 < \cdots < n_{j+m}$ such that $a_n = 0$ if and only if $n = n_r$ for some $r = 1, 2, \dots, j+m$. Define $h(A) = B$, where $B = b_1 b_2 \cdots b_N$ is such that
$$b_n = \begin{cases} 1 & \text{if } n = n_r \text{ where } r = 1+m, 2+m, \dots, j+m,\\ a_n & \text{otherwise}. \end{cases}$$
In other words, $h$ changes the last $j$ zeroes into ones. Then for any $B \in S^N_{i,j+m}(0)$, $f(h(g(B))) = B$, so $f$ is also onto. This establishes the one-to-one correspondence between $S^N_{i,j}(m)$ and $S^N_{i,j+m}(0)$.

REMARK 4 A related result to the preceding proposition appears in [8, pp. 10-17].

LEMMA 5 Suppose $i, j$ are arbitrary states for the embedded Markov chain drawn in Diagram 2 with $i, j = 0, 1, 2, 3, \dots$. Assume $N = 1, 2, 3, \dots$ and let $c = i + j$ and $d = j - i$; then
(a) $S^N_{i,j}(0) \cup R^N_{i,j}$ has
$$\binom{N}{\frac{N+d}{2}} - \binom{N}{\frac{N+c}{2}+1}$$
sample paths, provided $N + d$ is even.
(b) $S^N_{i,j}(2m)$ has
$$\binom{N}{\frac{N+c+2m}{2}} - \binom{N}{\frac{N+c+2m+2}{2}}$$
sample paths for $N + c$ even, where $m = 1, 2, \dots, \frac{N-c}{2}$, and $S^N_{i,j}(2m+1)$ has
$$\binom{N}{\frac{N+c+2m+1}{2}} - \binom{N}{\frac{N+c+2m+3}{2}}$$
sample paths for $N + c$ odd, where $m = 0, 1, \dots, \frac{N-c-1}{2}$.
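The counts in Lemma 5 can be confirmed by brute force for small $N$. The sketch below counts, by dynamic programming over (state, loops so far), all length-$N$ paths of Diagram 2 from $i$ to $j$, grouped by the number $m$ of loops at $0$, and compares them with the binomial expressions of Lemma 5, where $m = 0$ stands for $S^N_{i,j}(0) \cup R^N_{i,j}$. The function names are hypothetical.

```python
from math import comb


def binom(n, k):
    """C(n, k) with the convention C(n, k) = 0 if k > n or k < 0."""
    return comb(n, k) if 0 <= k <= n else 0


def count_paths_by_loops(i, j, N):
    """Dynamic-programming count of length-N paths from i to j in Diagram 2,
    returned as {m: number of paths with exactly m loops at state 0}."""
    counts = {(i, 0): 1}                     # (current state, loops so far) -> #paths
    for _ in range(N):
        nxt = {}
        for (s, m), c in counts.items():
            # from 0: go up or loop (one more loop); from s > 0: go up or down
            moves = [(s + 1, m), (s, m + 1) if s == 0 else (s - 1, m)]
            for key in moves:
                nxt[key] = nxt.get(key, 0) + c
        counts = nxt
    by_loops = {}
    for (s, m), c in counts.items():
        if s == j:
            by_loops[m] = by_loops.get(m, 0) + c
    return by_loops


def lemma5_count(i, j, N, m):
    """Number of paths with exactly m loops at 0 predicted by Lemma 5
    (m = 0 counts S(0) together with R)."""
    c, d = i + j, j - i
    if m == 0:
        if (N + d) % 2:
            return 0
        return binom(N, (N + d) // 2) - binom(N, (N + c) // 2 + 1)
    if (N + c + m) % 2:
        return 0                             # parity rules out m loops
    k = (N + c + m) // 2                     # covers both the 2m and 2m + 1 cases
    return binom(N, k) - binom(N, k + 1)


print(count_paths_by_loops(0, 0, 4), [lemma5_count(0, 0, 4, m) for m in range(5)])
```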
PROOF. Part (a) follows directly from the reflection principle. The numbers of sample paths within $S^N_{i,j}(2m)$ and $S^N_{i,j}(2m+1)$ equal the numbers of sample paths within $S^N_{i,j+2m}(0)$ and $S^N_{i,j+2m+1}(0)$, respectively, by Proposition 3, part (b). The results then follow using the reflection principle; see [5] for additional details.

REMARK 6 The use of lattice path combinatorics to determine the $n$-step transition probabilities of embedded Markov chains for certain birth-death processes is very promising. The preceding randomization/combinatoric approach also produces analytic solutions of the transition probabilities for the M/M/2 queueing system. These results and related matters will be presented elsewhere.

REFERENCES
1. Bailey, N. T. J. (1954). A continuous time treatment of a simple queue using generating functions, J. Roy. Statist. Soc. Ser. B 16: 288-291.
2. Bhattacharya, R. N. and Waymire, E. C. (1990). Stochastic Processes with Applications, Wiley, New York.
3. Brockmeyer, E., Halstrom, H. L. and Jensen, A. (1948). The life and works of A. K. Erlang, Transactions of the Danish Academy of Technical Sciences 2: 1-277.
4. Champernowne, D. G. (1956). An elementary method of solution of the queueing problem with a single server and continuous parameter, J. Roy. Statist. Soc. Ser. B 18: 125-128.
5. Cheng, T. (1994). Reflecting Paths in the Single Server Queueing System, Master's Thesis, California State Polytechnic University, Pomona.
6. Clark, A. B. (1953). The time dependent waiting line problem, Report M729-IR39, Univ. Michigan.
7. Cohen, J. W. (1982). The Single Server Queue, North-Holland Ser. Appl. Math. Mech. 8, North-Holland, Amsterdam, 2nd edition.
8. Conolly, B. (1975). Lecture Notes on Queueing Systems. Wiley, New York.
9. Grassmann, W. K. (1991). Finding transient solutions in Markovian event systems through randomization, Numerical Solution of Markov Chains (W. J. Stewart, ed.), Marcel Dekker, New York.
10. Gross, D. and Miller, D. R. (1984). The randomization technique as a modeling tool and solution procedure for transient Markov processes, Oper. Res. 32: 343-361.
11. Gross, D. and Harris, C. M. (1985). Fundamentals of Queueing Theory. Wiley, New York, 2nd edition.
12. Krinik, A. (1992). Taylor series solution of the M/M/1 queueing system, J. Comp. Appl. Math. 44: 371-380.
13. Ledermann, W. and Reuter, G. E. H. (1954). Spectral theory for the differential equations of simple birth and death processes, Phil. Trans. Roy. Soc. London Ser. A 246: 321-369.
14. Medhi, J. (1991). Stochastic Models in Queueing Theory. Academic Press, Boston.
15. Parthasarathy, P. R. (1987). A transient solution to an M/M/1 queue: a simple approach, Adv. Appl. Prob. 19: 997-998.
A Characterization of Hida Measures
HUI-HSIUNG KUO*
Department of Mathematics, Louisiana State University, Baton Rouge, LA 70803
1. WHAT IS A HIDA MEASURE?
Let S be the Schwartz space of real-valued rapidly decreasing functions on JR. It is a nuclear space with the topology given by the family of norms
where $A$ is the operator $A = -d^2/dx^2 + x^2 + 1$ and $|\cdot|_0$ is the $L^2(\mathbb{R})$-norm. Let $S'$ denote the dual space of $S$. The $\sigma$-fields on $S'$ generated by the weak topology, the strong topology, and the inductive limit topology are all the same; this common $\sigma$-field is regarded as the Borel field of $S'$. A Hida measure is a certain Borel measure on $S'$ to be defined below. For each $p \ge 1$, let $S_p$ denote the completion of $S$ with respect to the norm $|\cdot|_p$. The dual space $S'_p$ of $S_p$ can be identified with the completion of $L^2(\mathbb{R})$ with respect to the norm
$$|f|_{-p} = |A^{-p}f|_0,$$
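so that $S'$ is the inductive limit of $\{S'_p;\ p \ge 1\}$. Let $S'_c$ and $S'_{p,c}$ denote the complexifications of $S'$ and $S'_p$, respectively. For any $p \ge 1$ and $0 \le \beta < 1$, let $\mathcal{A}_{p,\beta}$ denote the space of all complex-valued functions $\varphi$ on $S'_{p,c}$ satisfying the following conditions: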
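(a) $\varphi$ is an analytic function on $S'_{p,c}$.
(b) There exists some constant $C \ge 0$ such that
$$|\varphi(x)| \le C\exp\Big[\tfrac12(1+\beta)|x|_{-p}^{2/(1+\beta)}\Big], \quad \forall x \in S'_{p,c}.$$
For $\varphi \in \mathcal{A}_{p,\beta}$, define $\|\varphi\|_{p,\beta}$ by
$$\|\varphi\|_{p,\beta} = \sup_{x \in S'_{p,c}} |\varphi(x)|\exp\Big[-\tfrac12(1+\beta)|x|_{-p}^{2/(1+\beta)}\Big].$$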
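Obviously, $\mathcal{A}_{p,\beta}$ is a Banach space with the norm $\|\cdot\|_{p,\beta}$. Moreover, $\mathcal{A}_{p,\beta} \subset \mathcal{A}_{q,\beta}$ for any $p \ge q \ge 1$ and the inclusion is continuous. For fixed $0 \le \beta < 1$, define
$$\mathcal{A}_\beta = \bigcap_{p \ge 1}\mathcal{A}_{p,\beta}.$$
We endow $\mathcal{A}_\beta$ with the projective limit topology, i.e., the coarsest topology on $\mathcal{A}_\beta$ such that the inclusion $\mathcal{A}_\beta \subset \mathcal{A}_{p,\beta}$ is continuous for each $p \ge 1$. It is easy to check that $\mathcal{A}_\beta$ is a Fréchet space, i.e., a complete metrizable locally convex space (see Reed and Simon [5]). Let $\mathcal{A}'_\beta$ and $\mathcal{A}'_{p,\beta}$ denote the dual spaces of $\mathcal{A}_\beta$ and $\mathcal{A}_{p,\beta}$, respectively. Note that we have
$$\mathcal{A}'_\beta = \bigcup_{p \ge 1}\mathcal{A}'_{p,\beta}.$$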
We call $\mathcal{A}_\beta$ the space of test functions on $S'$. Its dual $\mathcal{A}'_\beta$ is the space of generalized functions on $S'$. For the case $\beta = 0$, these spaces have been introduced by Lee in [4]. A function $\varphi$ in $\mathcal{A}_{p,\beta}$ is defined on $S'_{p,c}$. We will use the same notation $\varphi$ to denote its restriction to $S'$.
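Definition. A Borel measure $\nu$ on $S'$ is called a Hida measure of order $\beta$ if $\varphi \in L^1(\nu)$ for all $\varphi \in \mathcal{A}_\beta$ and the linear functional $\varphi \mapsto \int_{S'}\varphi\,d\nu$ is continuous on $\mathcal{A}_\beta$.

If $\nu$ is a Hida measure of order $\beta$, then it induces a generalized function, denoted by $\Phi_\nu$, in $\mathcal{A}'_\beta$ such that
$$\langle\!\langle \Phi_\nu, \varphi\rangle\!\rangle = \int_{S'}\varphi(x)\,d\nu(x), \quad \varphi \in \mathcal{A}_\beta.$$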
To give a simple example of a Hida measure, note that by the Minlos theorem there exists a unique probability measure $\mu$ on $S'$ such that
$$\int_{S'} e^{i\langle x,\xi\rangle}\,d\mu(x) = \exp\Big[-\tfrac12|\xi|_0^2\Big], \quad \xi \in S.$$
This measure $\mu$ is referred to as the standard Gaussian measure on $S'$. It is easy to check that $\mu$ is a Hida measure of order $\beta = 0$. A Hida measure of nonzero order has been introduced by Kondratiev and Streit [1] (see also Kuo [3]). Let $L_\lambda(t)$, $0 < \lambda \le 1$, be the Mittag-Leffler function
$$L_\lambda(t) = \sum_{n=0}^{\infty}\frac{(-t)^n}{\Gamma(1+\lambda n)}, \quad t \ge 0,$$
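where $\Gamma$ is the Gamma function. The function $L_\lambda(\tfrac12|\xi|_0^2)$, $\xi \in S$, is positive definite. Hence by the Minlos theorem there exists a unique probability measure $\nu_\lambda$ on $S'$ such that
$$\int_{S'} e^{i\langle x,\xi\rangle}\,d\nu_\lambda(x) = L_\lambda\big(\tfrac12|\xi|_0^2\big), \quad \xi \in S.$$
It can be shown that $\nu_\lambda$ is a Hida measure of order $1-\lambda$. For details, see Kondratiev and Streit [1] or Kuo [3].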
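As a quick numerical companion to the definition of $L_\lambda$, the series can be summed directly. For $\lambda = 1$ it reduces to $e^{-t}$, so $\nu_1$ is the standard Gaussian measure $\mu$, consistent with its order $1 - \lambda = 0$. This is a minimal sketch only: the function name and the truncation length are illustrative choices, and the plain alternating series is meant for moderate $t$, since it loses accuracy through cancellation when $t$ is large.

```python
import math

def mittag_leffler(lam: float, t: float, terms: int = 80) -> float:
    """Truncated series L_lam(t) = sum_{n>=0} (-t)**n / Gamma(1 + lam*n).

    Intended for 0 < lam <= 1 and moderate t >= 0; the alternating series
    suffers from cancellation when t is large.
    """
    total = 0.0
    for n in range(terms):
        total += (-t) ** n / math.gamma(1.0 + lam * n)
    return total

# For lam = 1 the series is exp(-t), so nu_1 is the standard Gaussian measure.
for t in (0.0, 0.5, 2.0):
    print(t, mittag_leffler(1.0, t), math.exp(-t))
```

For $\lambda = \tfrac12$ the same series can be compared with the classical closed form $e^{t^2}\operatorname{erfc}(t)$.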
2. A CHARACTERIZATION THEOREM
Note that the inclusion $L^2(\mathbb{R}) \subset S'_p$ is a Hilbert-Schmidt operator for any $p > \tfrac12$. Hence $(L^2(\mathbb{R}), S'_p)$ is an abstract Wiener space for such $p$ (see Kuo [2]). Thus the standard Gaussian measure $\mu$ is supported on $S'_p$ for any $p > \tfrac12$. Then by the Fernique theorem (see Kuo [2]), there exists some constant $a > 0$ such that
$$\int_{S'_p}\exp\big[a|x|_{-p}^2\big]\,d\mu(x) < \infty.$$
This type of integrability can be used to characterize Hida measures on $S'$.
THEOREM. A Borel measure $\nu$ on $S'$ is a Hida measure of order $\beta$ if and only if $\nu$ is supported in $S'_p$ for some $p \ge 1$ and
$$\int_{S'_p}\exp\Big[\tfrac12(1+\beta)|x|_{-p}^{2/(1+\beta)}\Big]\,d\nu(x) < \infty.$$
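This theorem has been proved for the case $\beta = 0$ by Lee in [4]. However, the proof for the necessity part in [4] cannot be adapted to the case $\beta \ne 0$. To prove the sufficiency of the above theorem, let $\varphi \in \mathcal{A}_\beta$. Then
$$\begin{aligned}\int_{S'_p}|\varphi(x)|\,d\nu(x) &= \int_{S'_p}\Big(|\varphi(x)|\exp\Big[-\tfrac12(1+\beta)|x|_{-p}^{2/(1+\beta)}\Big]\Big)\exp\Big[\tfrac12(1+\beta)|x|_{-p}^{2/(1+\beta)}\Big]\,d\nu(x)\\ &\le \|\varphi\|_{p,\beta}\int_{S'_p}\exp\Big[\tfrac12(1+\beta)|x|_{-p}^{2/(1+\beta)}\Big]\,d\nu(x).\end{aligned}$$
Since the inclusion $\mathcal{A}_\beta \subset \mathcal{A}_{p,\beta}$ is continuous, this implies that the linear functional $\varphi \mapsto \int_{S'}\varphi\,d\nu$ is continuous on $\mathcal{A}_\beta$. Hence there exists $\Phi_\nu \in \mathcal{A}'_\beta$ with $\langle\!\langle\Phi_\nu,\varphi\rangle\!\rangle = \int_{S'}\varphi\,d\nu$, and so $\nu$ is a Hida measure of order $\beta$.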
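For the necessity part, we only sketch the proof. For details, see Kuo [3]. Suppose $\nu$ is a Hida measure on $S'$ of order $\beta$. Then it induces a generalized function $\Phi_\nu$ in $\mathcal{A}'_\beta$. Since $\mathcal{A}'_\beta = \bigcup_{q \ge 1}\mathcal{A}'_{q,\beta}$, there exists some $q \ge 1$ such that $\Phi_\nu \in \mathcal{A}'_{q,\beta}$ and
$$\langle\!\langle\Phi_\nu,\varphi\rangle\!\rangle = \int_{S'}\varphi(x)\,d\nu(x), \quad \varphi \in \mathcal{A}_{q,\beta}. \tag{1}$$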
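Define a function $\psi$ on $S'_{q,c}$ by
$$\psi(x) = \sum_{n=0}^{\infty}\frac{1}{(n!)^{1+\beta}}\Big(\frac{\langle x,x\rangle_{-q}}{2^{1+\beta}}\Big)^{n}, \quad x \in S'_{q,c},$$
where $\langle\cdot,\cdot\rangle_{-q}$ is the bilinear extension to $S'_{q,c}$ of the inner product of $S'_q$, so that $\langle x,x\rangle_{-q} = |x|_{-q}^2$ for real $x$.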
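It is easy to check that $\psi$ is an analytic function on $S'_{q,c}$. By Lemma 6.6 in Kuo [3] we have the following inequality for any $\gamma \ge 1$:
$$\sum_{n=0}^{\infty}\frac{u^n}{(n!)^{\gamma}} \le \exp\big[\gamma u^{1/\gamma}\big], \quad u \ge 0.$$
We can use this inequality (with $\gamma = 1+\beta$) to obtain that, for $x \in S'_{q,c}$,
$$|\psi(x)| \le \sum_{n=0}^{\infty}\Big(\frac{1}{n!}\Big[\tfrac12|x|_{-q}^{2/(1+\beta)}\Big]^n\Big)^{1+\beta} \le \exp\Big[\tfrac12(1+\beta)|x|_{-q}^{2/(1+\beta)}\Big].$$
Therefore, $\psi \in \mathcal{A}_{q,\beta}$ and by equation (1) we have
$$\langle\!\langle\Phi_\nu,\psi\rangle\!\rangle = \int_{S'}\psi(x)\,d\nu(x). \tag{2}$$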
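On the other hand, by Lemma 15.16 in Kuo [3] we have the following inequality for any $r > 0$:
$$e^{ru/4} \le e^{3r/2}\sum_{n=0}^{\infty}\Big(\frac{u^n}{2^n\,n!}\Big)^{r}, \quad \forall u \ge 0.$$
By letting $r = 1+\beta$ and $u = |x|_{-q}^{2/(1+\beta)}$, we get
$$\exp\Big[\frac{1+\beta}{4}|x|_{-q}^{2/(1+\beta)}\Big] \le e^{3(1+\beta)/2}\,\psi(x), \quad x \in S'_q. \tag{3}$$
It follows from equations (2) and (3) that
$$\int_{S'}\exp\Big[\frac{1+\beta}{4}|x|_{-q}^{2/(1+\beta)}\Big]\,d\nu(x) \le e^{3(1+\beta)/2}\,\langle\!\langle\Phi_\nu,\psi\rangle\!\rangle. \tag{4}$$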
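The necessity argument squeezes a real series between two exponentials. Writing $r = 1+\beta$ and $s(u) = \sum_{n\ge0}\big(u^n/(2^n n!)\big)^{r}$ (the series on the right of Lemma 15.16, evaluated at $u = |x|_{-q}^{2/(1+\beta)}$), the two lemmas quoted above give $e^{-3r/2}e^{ru/4} \le s(u) \le e^{ru/2}$, the upper bound being the Lemma 6.6 inequality with $\gamma = r$. The sketch below checks this sandwich numerically on a few grid points, working with logarithms to avoid overflow. It is an illustration under these substitutions, not a proof; the helper names, grid and tolerance are hypothetical choices.

```python
import math

def log_s(u: float, beta: float, terms: int = 400) -> float:
    """log of s(u) = sum_{n>=0} (u**n / (2**n * n!))**(1 + beta), by log-sum-exp."""
    r = 1.0 + beta
    if u == 0.0:
        return 0.0                      # only the n = 0 term (equal to 1) survives
    logs = [r * (n * math.log(u / 2.0) - math.lgamma(n + 1)) for n in range(terms)]
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs))

def sandwich_holds(u: float, beta: float) -> bool:
    """Check e^{-3r/2} e^{r u/4} <= s(u) <= e^{r u/2}, r = 1 + beta, in logarithms."""
    r = 1.0 + beta
    lower = -1.5 * r + r * u / 4.0      # Lemma 15.16 with this r and u
    upper = r * u / 2.0                 # Lemma 6.6 with gamma = r
    s = log_s(u, beta)
    tol = 1e-9 * max(1.0, abs(upper))   # the upper bound is an equality when beta = 0
    return lower <= s <= upper + tol

for beta in (0.0, 0.5, 0.9):
    print(beta, all(sandwich_holds(u, beta) for u in (0.0, 0.3, 1.0, 5.0, 40.0, 200.0)))
```

When $\beta = 0$ the series is exactly $e^{u/2}$, which is why a small relative tolerance is allowed on the upper side.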
Now note that the inverse operator of $A = -d^2/dx^2 + x^2 + 1$ acting on $L^2(\mathbb{R})$ is continuous and its operator norm is given by $\|A^{-1}\| = 2^{-1}$. Hence for any $p \ge q \ge 1$ we have
$$|x|_{-p} = |A^{-p}x|_0 = |A^{-(p-q)}A^{-q}x|_0 \le 2^{-(p-q)}|A^{-q}x|_0 = 2^{-(p-q)}|x|_{-q}.$$
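The value $\|A^{-1}\| = 2^{-1}$ comes from the spectrum of $A$: its eigenfunctions are the Hermite functions, with eigenvalues $2n+2$, $n \ge 0$, so the smallest eigenvalue is $2$. As a small numerical illustration (not part of the original argument), the sketch below checks that the ground state $h_0(x) = \pi^{-1/4}e^{-x^2/2}$ satisfies $Ah_0 = 2h_0$, approximating $h_0''$ by a central difference; the step size `h` and the sample points are arbitrary choices.

```python
import math

def h0(x: float) -> float:
    """Ground-state Hermite function pi**(-1/4) * exp(-x**2 / 2)."""
    return math.pi ** -0.25 * math.exp(-x * x / 2.0)

def apply_A(f, x: float, h: float = 1e-3) -> float:
    """(A f)(x) = -f''(x) + (x**2 + 1) f(x), with f'' from a central difference."""
    second = (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)
    return -second + (x * x + 1.0) * f(x)

for x in (-1.5, 0.0, 0.7, 2.0):
    print(x, apply_A(h0, x) / h0(x))    # ratios close to the eigenvalue 2
```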
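We can choose $p$ so large that $p \ge q + \frac{1+\beta}{2}$ (any $p \ge q+1$ will do, since $\beta < 1$). Then, since $|x|_{-p} \le 2^{-(p-q)}|x|_{-q}$, we have $\tfrac12(1+\beta)|x|_{-p}^{2/(1+\beta)} \le \frac{1+\beta}{4}|x|_{-q}^{2/(1+\beta)}$, and hence
$$\int_{S'}\exp\Big[\tfrac12(1+\beta)|x|_{-p}^{2/(1+\beta)}\Big]\,d\nu(x) \le \int_{S'}\exp\Big[\frac{1+\beta}{4}|x|_{-q}^{2/(1+\beta)}\Big]\,d\nu(x).$$
Thus by equation (4) we have
$$\int_{S'}\exp\Big[\tfrac12(1+\beta)|x|_{-p}^{2/(1+\beta)}\Big]\,d\nu(x) \le e^{3(1+\beta)/2}\,\langle\!\langle\Phi_\nu,\psi\rangle\!\rangle < \infty.$$
But $|x|_{-p} = \infty$ for any $x \in S' \setminus S'_p$. Thus the last inequality implies that the measure $\nu$ is supported on $S'_p$ and
$$\int_{S'_p}\exp\Big[\tfrac12(1+\beta)|x|_{-p}^{2/(1+\beta)}\Big]\,d\nu(x) < \infty.$$
This completes the proof of the theorem.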
Example. The probability measure $\nu_\lambda$ in Section 1 is a Hida measure of order $1-\lambda$. Hence by the above theorem, it is supported in $S'_p$ for some $p \ge 1$ and we have
$$\int_{S'_p}\exp\Big[\tfrac12(2-\lambda)|x|_{-p}^{2/(2-\lambda)}\Big]\,d\nu_\lambda(x) < \infty.$$
REFERENCES
[1] Kondratiev, Yu. G. and Streit, L.: Spaces of white noise distributions: Constructions, Descriptions, Applications. I; Reports on Math. Phys. 33 (1993) 341-366
[2] Kuo, H.-H.: Gaussian Measures in Banach Spaces. Lecture Notes in Math. 463, Springer-Verlag, 1975
[3] Kuo, H.-H.: White Noise Distribution Theory. CRC Press, 1996
[4] Lee, Y.-J.: Analytic version of test functionals, Fourier transform and a characterization of measures in white noise calculus; J. Funct. Anal. 100 (1991) 359-380
[5] Reed, M. and Simon, B.: Methods of Modern Mathematical Physics I: Functional Analysis. Academic Press, 1972
New Results in the Simplex Method in Linear Programming
ROGER N. PEDERSEN
Department of Mathematics, Carnegie Mellon University, Pittsburgh, PA 15213
"Notation is important. It can even solve problems. But, at some point, you must do some work yourself." K. O. Friedrichs.
1. INTRODUCTION AND STATEMENT OF THE PROBLEM. Without using any symbols at all, we can give a precise statement of the problem by saying that it is to find the maximum, if it exists, of a linear function of a finite number of real variables on a convex polyhedron, bounded by planes, in the same variables. The simplex method of solving the problem is then to find a vertex of the polyhedron and then to proceed along edges from one vertex to the next, in such a manner that the linear function increases, until the maximum is reached. All the data needed to state and solve the problem can be stored in an $(m+1)\times(n+1)$ matrix $A$. The analytical statement of the problem then is to find the maximum of the objective function
$$\sum_{j=1}^{n} A_{m+1,j}x_j + A_{m+1,n+1} \tag{1.1}$$
subject to the constraints
$$\sum_{j=1}^{n} A_{ij}x_j + A_{i,n+1} \ge 0, \quad i = 1,\dots,m. \tag{1.2}$$
By defining $A$ to be the matrix comprising the first $m$ rows and first $n$ columns of the full matrix $A$, and $b$ to be the transpose of $(A_{1,n+1},\dots,A_{m,n+1})$, the constraint (1.2) takes the simpler form
$$Ax + b \ge 0, \tag{1.3}$$
meaning, of course, that each component of the column vector is non-negative. The vector $x$ is superfluous for the purpose of applying the simplex algorithm. But working only with the matrix $A$ can lead to misconceptions, as we shall see in the next section. But first, let us find another notation for the constraint set by using $A_i$ to denote the rows of $A$. Then (1.3) can be replaced by
$$L_i(x) \equiv \langle A_i, x\rangle + b_i \ge 0, \quad i = 1,\dots,m, \tag{1.4}$$
where $b_i$ is the $i$th coordinate of $b$ and $\langle\,,\rangle$ represents the canonical inner product.
2. BUT, THOSE SLACK VARIABLES ARE UNNECESSARY. Let us re-write (1.3) as
$$(AC)(C^{-1}x) + b \ge 0, \tag{2.1}$$
where $C$ is any non-singular $n\times n$ matrix, noting that this does not require converting the inequalities into equalities. Now, assuming $A$ has rank $n$, we may apply elementary column operations to bring $A$ to reduced echelon form. If $C$ is the product of the corresponding elementary column matrices and $y = C^{-1}x$, the first $n$ coordinates of (2.1) are
$$y_i + b_i \ge 0, \quad i = 1,\dots,n. \tag{2.2}$$
Then, by making the translation $z_i = y_i + b_i$, we may assume the constraint set to be in what is commonly called canonical form. Furthermore, if for one $j$, $1 \le j \le n$, we put $x_j = z_j - \beta_j$ in (1.1), (1.2), we see that this corresponds to multiplying the $j$th column of the full matrix $A$ by $\beta_j$ and subtracting it from the $(n+1)$th column; that is, it is an elementary column operation. I prefer doing elementary row operations on the transpose. Thus the simplex method reduces to transposing the matrix $A$ and applying elementary row operations until the first $n$ columns are in reduced echelon form, with the restriction that the pivots are to be picked from the first $n$ rows of $A^T$. The only question that remains is when to start using the simplex pivoting strategy. After the system is in canonical form, we must use the simplex strategy; before that we may instead use the standard Gaussian elimination strategy. Note that the simplex strategy requires picking the maximum positive element of the current column and hence is a partial pivoting strategy. We shall have more to say about this in Section 5.
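The reduction to canonical form can be written out directly. A minimal sketch, assuming (as the text does) that the first $n$ rows of $A$ are linearly independent, and taking $B$ to be that leading $n\times n$ block and $b_B$ the first $n$ entries of $b$: the substitution $z = Bx + b_B$ sends the first $n$ constraints to $z_i \ge 0$ and the others to $(AB^{-1})z + (b - AB^{-1}b_B) \ge 0$. The name `canonical_form` and the use of exact `Fraction` arithmetic are illustrative choices, not the author's program.

```python
from fractions import Fraction

def canonical_form(A, b):
    """Rewrite A x + b >= 0 in the variables z = B x + b_B (Section 2).

    A is m x n with its first n rows linearly independent, B = A[:n] and
    b_B = b[:n]. Returns the m x (n + 1) matrix [A B^{-1} | b - A B^{-1} b_B];
    its first n rows are [I | 0], i.e. the basic constraints z_i >= 0.
    """
    m, n = len(A), len(A[0])
    # Gauss-Jordan elimination on [B | I] with exact fractions gives B^{-1}.
    M = [[Fraction(A[i][j]) for j in range(n)] + [Fraction(int(i == j)) for j in range(n)]
         for i in range(n)]
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            raise ValueError("the first n rows of A are linearly dependent")
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    B_inv = [row[n:] for row in M]
    out = []
    for i in range(m):
        coeffs = [sum(Fraction(A[i][t]) * B_inv[t][j] for t in range(n)) for j in range(n)]
        const = Fraction(b[i]) - sum(coeffs[j] * Fraction(b[j]) for j in range(n))
        out.append(coeffs + [const])
    return out

print(canonical_form([[2, 1], [1, 1], [-1, -1]], [1, 0, 4]))
```

In the usage line the third constraint $-x_1 - x_2 + 4 \ge 0$ becomes $-z_2 + 4 \ge 0$ once the first two constraints are taken as the new coordinates.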
3. EMPTY SETS, REDUNDANT CONSTRAINTS AND LOWER DIMENSIONAL SETS. Let us now suppose that the normals of the first $n$ constraints form a linearly independent set. Then, for any $k > n$,
$$A_k = \sum_{i=1}^{n} a_{ki}A_i \tag{3.1}$$
and hence
$$L_k(x) = \sum_{i=1}^{n} a_{ki}L_i(x) + \Delta_k \tag{3.2}$$
with
$$\Delta_k = A_{k,n+1} - \sum_{i=1}^{n} a_{ki}A_{i,n+1}. \tag{3.3}$$
It follows from (1.4) and (3.2) that if $(a_{k1},\dots,a_{kn},\Delta_k)$ are all non-negative, the $k$th constraint is redundant, and that if they are all negative the set is empty. If for some $i \le n$, $a_{ki} > 0$, $a_{kj} \le 0$ for $j \ne i$ and $\Delta_k < 0$, then the $i$th constraint is redundant. In all other cases where none of the numbers $(a_{k1},\dots,a_{kn},\Delta_k)$ is zero, it is easily shown that the set formed by the first $n$ constraints and the $k$th is non-empty. The other important special case occurs when $\Delta_k = 0$ and $a_{ki} \le 0$ for $i = 1,\dots,n$. Then the entire constraint set is contained in the set where $L_k(x) = 0$. Hence, we may use this constraint to eliminate a variable and obtain a lower dimensional set. This means that, by reducing the number of dimensions, we may assume that this case does not occur. We note from (3.2) and (3.3) that, when the constraint set is in canonical form, $A_{i,n+1} = 0$, $i = 1,\dots,n$, so the $a_{ki}$'s and $\Delta_k$ are just the coefficients of the constraint equation. From this point on we shall assume that the set is in canonical form. The origin will be called the basic vertex, the first $n$ constraints the basic constraints, and the rest of the constraints the non-basic constraints.
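The sign tests above translate directly into a small classifier. A minimal sketch, assuming the set is already in canonical form (so the $a_{ki}$ and $\Delta_k$ are simply the stored coefficients of row $k$) and that exact zeros can be recognized; Section 6 explains why that is delicate in floating point. The function name and the returned labels are hypothetical.

```python
def classify(row):
    """Sign tests of Section 3 for one non-basic constraint of a set in canonical
    form; row = (a_k1, ..., a_kn, Delta_k). Exact zeros are assumed."""
    *a, delta = row
    if all(c >= 0 for c in row):
        return ("redundant", None)              # the k-th constraint itself
    if all(c < 0 for c in row):
        return ("empty", None)
    positive = [i for i, c in enumerate(a) if c > 0]
    if delta < 0 and len(positive) == 1:        # a_ki > 0, every other a_kj <= 0
        return ("basic constraint redundant", positive[0])  # 0-based index i
    if delta == 0 and all(c <= 0 for c in a):
        return ("lower dimensional", None)      # the set lies in L_k(x) = 0
    return ("no conclusion", None)

for row in ([1, 2, 3], [-1, -2, -3], [-1, 2, -3], [-1, -2, 0], [1, 2, -3]):
    print(row, classify(row))
```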
4. THE SIMPLEX ALGORITHM WITH A NON-DEGENERATE BASIC VERTEX.
A vertex which is the intersection of more than $n$ planes is called a degenerate vertex. This means that, when the basic vertex is non-degenerate, all of the non-basic constraints have non-zero constants. The simplex strategy then is to increase by one the number of positive constants among these until they are all positive, and then to increase the constant in the objective function. Let us assume that the constraints are ordered so that
$$A_{i,n+1} > 0,\quad i < p, \qquad\text{and, if } p \le m,\qquad A_{i,n+1} < 0,\quad p \le i \le m. \tag{4.1}$$
Our first objective is to increase $p$ by one when it is at most $m$. The first step is to choose $k$ to maximize
$$\{A_{p,j} : A_{p,j} > 0,\ 1 \le j \le n\}. \tag{4.2}$$
When $p \le m$, the results of Section 3 ensure that we may assume the above set to be non-empty; when $p = m+1$, it is only empty when we have found the maximum. Next, we choose $\ell$ to maximize the negative numbers
$$\Big\{\frac{A_{\nu,n+1}}{A_{\nu,k}} : \nu \le p-1,\ A_{\nu,k} < 0\Big\}. \tag{4.3}$$
Suppose that the above set is empty. If $p = m+1$ and $A_{m+1,k} > 0$ there is no maximum, while if $A_{m+1,k} < 0$ we may set $x_k = 0$ and continue in one less dimension. If $p \le m$ we simply set $\ell = p$, observing that the simplex method requires only one step. Next, we interchange the $\ell$th non-basic constraint with the $k$th basic constraint and put the constraint set back into canonical form. This requires applying Gaussian elimination to the $k$th column of $A^T$. The new elements of the matrix then are
$$A'_{\ell k} = \frac{1}{A_{\ell,k}}, \tag{4.4}$$
$$A'_{\ell j} = -\frac{A_{\ell j}}{A_{\ell,k}}, \quad j \ne k, \tag{4.5}$$
and when $i \ne \ell$,
$$A'_{i,k} = \frac{A_{i,k}}{A_{\ell,k}}, \tag{4.6}$$
$$A'_{ij} = A_{ij} - \frac{A_{ik}A_{\ell j}}{A_{\ell,k}}, \quad j \ne k. \tag{4.7}$$
In particular,
$$A'_{\ell,n+1} = -\frac{A_{\ell,n+1}}{A_{\ell,k}} > 0 \tag{4.8}$$
since, whether $\ell = p$ or $\ell < p$, $A_{\ell,n+1}$ and $A_{\ell k}$ have opposite signs. If $i \ne \ell$ and $i \le p-1$, we see from (4.7) that $A'_{i,n+1}$ is the sum of two positive numbers when $A_{ik} > 0$, and that when $A_{ik} < 0$ it is positive as a consequence of the choice (4.3) of $\ell$. Hence, in any case, the first $p-1$ constants remain positive, and if $\ell = p$, $A'_{p,n+1}$ is also positive and we have increased $p$ by one. But we also see from (4.7) that if $\ell < p$,
$$A'_{p,n+1} > A_{p,n+1}. \tag{4.9}$$
Since the constraint set has only a finite number of vertices, we shall, in a finite number of steps, either find the set to be empty, prove that $A'_{p,n+1} > 0$, or arrive at a degenerate vertex.
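The update rules (4.4)-(4.7), together with the choices (4.2) and (4.3), give a compact loop. The sketch below is a minimal illustration under explicit assumptions: the set is already in canonical form with all non-basic constants positive (so $p = m+1$ and only the objective row, stored last, drives the choice of $k$), the basic vertex stays non-degenerate, and arithmetic is exact via `Fraction`. The degenerate case of Section 5 and the phase that first makes the constants positive are omitted, and the names `pivot` and `simplex_max` are hypothetical. Rows of `T` are the non-basic constraints followed by the objective; the last column holds the constants.

```python
from fractions import Fraction

def pivot(T, l, k):
    """Interchange non-basic row l with basic column k, using (4.4)-(4.7)."""
    p = T[l][k]
    if p == 0:
        raise ZeroDivisionError("pivot element T[l][k] must be non-zero")
    new = []
    for i, row in enumerate(T):
        if i == l:   # (4.4), (4.5): the old basic constraint x_k >= 0 moves to row l
            new.append([1 / p if j == k else -row[j] / p for j in range(len(row))])
        else:        # (4.6), (4.7): every other row, including the objective
            new.append([row[k] / p if j == k else row[j] - row[k] * T[l][j] / p
                        for j in range(len(row))])
    return new

def simplex_max(T):
    """Repeat (4.2), (4.3), (4.4)-(4.7) until no objective coefficient is positive.

    Returns (maximum, final matrix), or (None, matrix) if the objective is unbounded.
    """
    T = [[Fraction(v) for v in row] for row in T]
    n = len(T[0]) - 1                      # number of variables; column n = constants
    while True:
        obj = T[-1]
        cands = [j for j in range(n) if obj[j] > 0]
        if not cands:
            return obj[n], T               # maximum = constant of the objective
        k = max(cands, key=lambda j: obj[j])                 # choice (4.2), p = m + 1
        rows = [v for v in range(len(T) - 1) if T[v][k] < 0]
        if not rows:
            return None, T                 # x_k can grow without bound
        l = max(rows, key=lambda v: T[v][n] / T[v][k])       # choice (4.3)
        T = pivot(T, l, k)

best, _ = simplex_max([[-1, -1, 4], [-1, 0, 3], [1, 1, 0]])
print(best)   # 4
```

The usage example maximizes $x_1 + x_2$ subject to $x_1, x_2 \ge 0$, $4 - x_1 - x_2 \ge 0$ and $3 - x_1 \ge 0$, and stops after two pivots. Applying `pivot` twice at the same position restores the original matrix, since interchanging the same pair of constraints twice undoes the interchange.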
5. THE CASE OF A DEGENERATE VERTEX. The case of a degenerate vertex occurs when there are zero constants $A_{i,n+1} = 0$. Suppose that we apply the previous strategy to the basic constraints and the non-basic constraints with non-zero constants. Then we see from (4.7) that when $A_{i,n+1} = 0$,
$$A'_{i,n+1} = -\frac{A_{ik}A_{\ell,n+1}}{A_{\ell k}}, \tag{5.1}$$
and since $A_{\ell,n+1} < 0$, $A_{\ell k} > 0$, we have $A'_{i,n+1} > 0$ whenever $A_{i,k} > 0$. There is no reason that this should be the case, but by applying the simplex strategy to the first $n$ columns of $A$, with the $k$th playing the role of the constants, we can use the simplex strategy to achieve this. Because the algorithm is slightly more complicated when the degeneracy is of higher order, it is convenient to introduce constants $\alpha_k$, $\beta_k$ satisfying, after reordering the constraints and variables,
$$A_{i,k} = 0,\ n+1 \le i < \alpha_k; \qquad A_{i,k} > 0,\ \alpha_k \le i < \beta_k; \qquad A_{i,k} < 0,\ \beta_k \le i < \alpha_{k+1}, \tag{5.2}$$
with $\alpha_{n+2} = m$. The cases $\alpha_k = n+1$, $\beta_k = \alpha_k$ and $\beta_k = \alpha_{k+1}$ are used to indicate that the corresponding set is empty. Now we apply the following algorithm to the constraint set in canonical form.
[1] $k = n+1$.
[2] Reorder the constraints so that (5.2) is satisfied. Now we are ready to pick the current constraint, indexed by $p$. The choice agrees with (4.1) when $k = n+1$.
[3] If $k = n+1$ or $\beta_k < \alpha_{k+1}$, set $p = \beta_k$ and proceed to [5].
Now, when we arrive at line [4], we have $k < n+1$ and $\beta_k = \alpha_{k+1}$. This means that the elements of the pivot column below the zeros in the $k$th row of $A^T$ are all zero, so we can take advantage of the remark preceding (5.1), noting that, because $k < n$, the current pivot row has already been chosen in line [5].
[4] Replace $k$ by $k+1$ and proceed to [7].
Now we are ready to pick the current pivot row of $A^T$.
[5] Reorder the variables so that $A_{p,k-1}$ maximizes the positive coefficients $A_{p,i}$, $i \le k-1$, when this set is non-empty. If it is empty, proceed to [10]. If $\alpha_k = n+1$, we are ready to begin the updating subroutine. Otherwise, we decrease $k$ by 1 and return to [2].
[6] If $\alpha_k > n+1$, decrease $k$ by 1 and return to [2].
When we arrive at line [7] we know that the $(k-1)$th row of $A^T$ is the current pivot row and, before updating, we must find the current pivot column.
[7] If the set $\{i < p : A_{i,k-1} < 0\}$ is non-empty, choose $\ell$ to maximize the ratios $A_{i,k}/A_{i,k-1}$. Otherwise set $\ell = p$. Now we are ready to interchange the constraints indexed by $k-1$ and $\ell$ and then put the matrix back into canonical form.
[8] Return the matrix to canonical form by applying Gaussian elimination to reduced echelon form to $A^T$, using the element indexed by $(\ell, k-1)$ as pivot. We note that, since the elements $A_{\ell,j}$, $j > k-1$, are all zero, the elementary row operations correspond to adding zero to the rows of $A^T$ indexed by $j > k$. Hence the $\alpha_j$'s and $\beta_j$'s, $j > k$, are unchanged. We now redefine the $\alpha_j$'s and $\beta_j$'s for $j \le k$, returning to [2].
[9] Return to [2]. The program will terminate at [10].
[10] The maximum is $A_{m+1,n+1}$.
We have tacitly assumed the maximum to exist, leaving to the reader the task of adding the lines, explained in Section 3, regarding empty sets, redundant constraints, lower dimensional problems and problems with no maximum.
6. SMALL PIVOTS AND DEGENERATE VERTICES. In running the above algorithm, it is crucial that one distinguish between non-zero numbers and zeros represented by round-off errors. The author has studied this problem extensively on the Radio Shack Color Computer and on the Tandy 1000, computing, respectively, to 9 and 16 places, base 10. The random number generator was used to supply the data and, computing to $p$ places base 10, the test for determining whether or not a number is zero was by comparison with $10^{-r}$, $2 \le r \le p/2$. In order to increase the probability that the set is not empty, the probability that the origin satisfies a constraint is set at $\pi$, $0 \le \pi \le 1$. With no other restriction, a degenerate vertex has never been found. By building in the condition of degeneracy, e.g. by applying a similarity transformation to a known degenerate situation and adding more constraints, the program seems to work as well as in the non-degenerate case. The problem, in each case, is checked by re-running the program on the constraints forming the final basic vertex and by evaluating the objective function at the intersection of their planes. We have also never found an ill-conditioned matrix with the random number generator. By putting in the Hilbert matrix ([2], prob. 169, p. 337), we find the obvious difficulty. However, by computing to a sufficient number of places, we have always been able to overcome the difficulty.
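The zero test and the Hilbert-matrix difficulty can be illustrated together. In the minimal sketch below, `is_zero` mimics the comparison with $10^{-r}$, and the pivots of Gaussian elimination on the $n\times n$ Hilbert matrix $H_{ij} = 1/(i+j-1)$, computed exactly with Python's `fractions.Fraction`, show how quickly genuine pivots shrink toward any fixed threshold. The helper names and the chosen sizes are illustrative, not taken from the author's programs.

```python
from fractions import Fraction

def is_zero(x: float, r: int) -> bool:
    """Treat x as zero when |x| < 10**(-r), as in the test described above."""
    return abs(x) < 10.0 ** (-r)

def hilbert(n: int):
    """Exact n x n Hilbert matrix; with 0-based indices H[i][j] = 1 / (i + j + 1)."""
    return [[Fraction(1, i + j + 1) for j in range(n)] for i in range(n)]

def pivots(M):
    """Pivots of Gaussian elimination without row exchanges, in exact arithmetic."""
    M = [row[:] for row in M]
    n = len(M)
    out = []
    for c in range(n):
        out.append(M[c][c])
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for j in range(c, n):
                M[r][j] -= factor * M[c][j]
    return out

for n in (3, 4, 6):
    smallest = min(pivots(hilbert(n)))
    print(n, smallest, is_zero(float(smallest), 4))
```

Because the Hilbert matrix is positive definite, no row exchanges are needed and every pivot is genuinely non-zero; for larger $n$ a fixed threshold such as $10^{-4}$ nevertheless declares the last pivot to be zero, which is why more places are needed.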
7. FURTHER METHODS OF SPEEDING UP THE PROGRAM. The simplest method of speeding up the program is to remove the redundant constraints using the test of Section 3, noting that the test requires only sign tests of quantities that are computed anyway. Its disadvantage is that a constraint that shows up as redundant in one coordinate system does not necessarily do so in another. The number of degenerate constraints can be increased by adding the condition that the objective function be greater than its value at the current basic vertex. Another method of possibly speeding up the program is to use the fact that once a vertex has been found we know that the constraint set is non-empty. Then we can eliminate a variable using any of the constraints. If the constraint used was redundant, the new set will be empty. Otherwise, we obtain the maximum on an $(n-1)$-dimensional face. The weakness of this method is that we lose time when we use a redundant constraint to eliminate a variable.
8. THE STATEMENT OF THE CONDITION THAT THE SET BE EMPTY OR CONTAIN A REDUNDANT CONSTRAINT. In this section we iterate the formulas (4.4)-(4.8) for the constraint set
$$\sum_{j=1}^{n} A_{ij}x_j + A_{i,n+1} \ge 0, \quad i = 1,\dots,m, \tag{8.1}$$
in canonical form, that is,
$$A_{ij} = \delta_{ij}, \quad i = 1,\dots,n, \quad j = 1,\dots,n+1. \tag{8.2}$$
Specifically, we generalize the condition that the set is empty when $A_{kj} < 0$ for all $j = 1,\dots,n+1$, and contains a redundant constraint when the set $\{A_{k1},\dots,A_{kn},A_{k,n+1}\}$ consists only of non-negative elements, or $A_{k,n+1} < 0$ and $A_{kj} > 0$ for exactly one $j \le n$. In this section we shall use the above stated condition to obtain a result for an appropriate union by obtaining explicit formulas for the coefficients in the constraints when the constraints
$$k_1,\dots,k_r \quad (n < k_i \le m) \tag{8.3}$$
have been interchanged with the constraints
$$\ell_1,\dots,\ell_r \quad (1 \le \ell_i \le n) \tag{8.4}$$
in the order $k_i, \ell_i$, $i = 1,\dots,r$, and the constraint set is returned to canonical form at each step. In order to state the formulas, we denote by
$$f_r(i_1,\dots,i_r : j_1,\dots,j_r) \tag{8.5}$$
the minor determinant of the original matrix $A^0_{ij}$, $i = 1,\dots,m$, $j = 1,\dots,n+1$, indexed by the rows $i_1,\dots,i_r$ and the columns $j_1,\dots,j_r$. Then, with $A^0_{ij}$ representing the original matrix and $A^r_{ij}$ the matrix after the constraints indexed by $k_1,\dots,k_r$ have replaced those indexed by $\ell_1,\dots,\ell_r$, and with
$$K_r = \{k_1,\dots,k_r\},\quad K_r^c = [1,m] - K_r,\quad L_r = \{\ell_1,\dots,\ell_r\},\quad L_r^c = [1,n+1] - L_r,\quad D_r = f_r(k_1,\dots,k_r : \ell_1,\dots,\ell_r), \tag{8.6}$$
we have the formulas, for $i \in K_r^c$,
$$A^r_{ij} = f_{r+1}(k_1,\dots,k_r,i : \ell_1,\dots,\ell_r,j)/D_r, \quad j \in L_r^c, \tag{8.7}$$
$$A^r_{i,\ell_j} = (-1)^{r-j} f_r(k_1,\dots,k_{j-1},k_{j+1},\dots,k_r,i : \ell_1,\dots,\ell_r)/D_r, \quad \ell_j \in L_r, \tag{8.8}$$
and, for $k_i \in K_r$,
$$A^r_{k_i,j} = (-1)^{r+1-i} f_r(k_1,\dots,k_r : \ell_1,\dots,\ell_{i-1},\ell_{i+1},\dots,\ell_r,j)/D_r, \quad j \in L_r^c, \tag{8.9}$$
$$A^r_{k_i,\ell_j} = (-1)^{i+j} f_{r-1}(k_1,\dots,k_{j-1},k_{j+1},\dots,k_r : \ell_1,\dots,\ell_{i-1},\ell_{i+1},\dots,\ell_r)/D_r, \quad \ell_j \in L_r. \tag{8.10}$$
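Formulas (8.7)-(8.10) express every entry of $A^r$ as a signed minor of the original matrix divided by $D_r = f_r(k_1,\dots,k_r : \ell_1,\dots,\ell_r)$, so they can be checked exactly against $r$ successive single interchanges. The sketch below does this with `Fraction` arithmetic. It is a minimal illustration under these assumptions: the rows of `A0` are the non-basic constraints only (the basic rows are implicit), its last column holds the constants, the lists `K` and `L` are 0-based row and variable-column indices, and the matrix and index lists are arbitrary. The function names are hypothetical; the signs in the comments translate the 1-based exponents of (8.8)-(8.10) to 0-based positions.

```python
from fractions import Fraction

def det(M):
    """Exact determinant by Gaussian elimination; the empty matrix has det 1 (f_0 = 1)."""
    M = [[Fraction(v) for v in row] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for j in range(c, n):
                M[r][j] -= factor * M[c][j]
    return d

def f(A, rows, cols):
    """f_r(rows : cols) of (8.5): the minor of A with the given ordered rows and columns."""
    return det([[A[i][j] for j in cols] for i in rows])

def pivot(T, l, k):
    """Single interchange of non-basic row l with basic column k, as in (4.4)-(4.7)."""
    p = T[l][k]
    return [[(1 / p if j == k else -row[j] / p) if i == l else
             (row[k] / p if j == k else row[j] - row[k] * T[l][j] / p)
             for j in range(len(row))] for i, row in enumerate(T)]

def check_section8(A0, K, L):
    """Pivot successively on (K[s], L[s]) and compare every entry with (8.7)-(8.10)."""
    A0 = [[Fraction(v) for v in row] for row in A0]
    A = A0
    for k_s, l_s in zip(K, L):
        A = pivot(A, k_s, l_s)
    r, D = len(K), f(A0, K, L)
    for i in range(len(A0)):
        for j in range(len(A0[0])):
            if i not in K and j not in L:                       # (8.7)
                want = f(A0, K + [i], L + [j]) / D
            elif i not in K:                                    # (8.8), j = L[b]
                b = L.index(j)
                want = (-1) ** (r - 1 - b) * f(A0, K[:b] + K[b + 1:] + [i], L) / D
            elif j not in L:                                    # (8.9), i = K[a]
                a = K.index(i)
                want = (-1) ** (r - a) * f(A0, K, L[:a] + L[a + 1:] + [j]) / D
            else:                                               # (8.10)
                a, b = K.index(i), L.index(j)
                want = (-1) ** (a + b) * f(A0, K[:b] + K[b + 1:], L[:a] + L[a + 1:]) / D
            if A[i][j] != want:
                return False
    return True

A0 = [[2, 1, -1, 3], [1, -2, 4, -1], [3, 1, 1, 2], [-1, 2, 3, 1]]
print(check_section8(A0, [3], [2]), check_section8(A0, [0, 2], [1, 0]),
      check_section8(A0, [0, 2, 1], [1, 0, 2]))
```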
Before stating the condition for redundant constraints or empty sets, we shall prove the following theorem.

Theorem 8.1. The formulas (8.7)-(8.10) are invariant under permutation of $k_1,\dots,k_r$ or $\ell_1,\dots,\ell_r$ in the sense that the signs of either (8.7), (8.8) or (8.9), (8.10) for fixed $i$ and $j = 1,\dots,n+1$ are invariant. This makes it possible to state the condition for empty sets or redundant constraints using only the pair (8.7), (8.8) in the order $r = 1,2,\dots,n$.

Proof. First let us note that we may assume that the $k$'s and $\ell$'s are in increasing order. This follows from the fact that when $k_1,\dots,k_r$ are permutations of the same set, then $k_1,\dots,k_{j-1},k_{j+1},\dots,k_r$, $j = 1,\dots,r$, are merely written down in a different order. To prove this by induction, let $\sigma = (k_1,\dots,k_r)$ and $\sigma_j = (k_1,\dots,k_{j-1},k_{j+1},\dots,k_r)$, and suppose that the largest element $\gamma$ of $\sigma$ is indexed by $\ell$. Then, after interchanging $\gamma$ with the last elements of $\sigma$ and $\sigma_j$, $j \ne \ell$, the sign of the ratio $\sigma_j/\sigma$ is retained when $j < \ell$, changes when $j > \ell$, and is multiplied by $(-1)^{r-\ell}$ when $j = \ell$. Hence, by moving the $\ell$th ratio to the end of the list and decreasing the order of those indexed by $k$, $\ell+1 \le k \le r$, we obtain a valid induction proof. Similarly for the $\ell$'s. ∎

Theorem 8.2. In applying the empty set or redundant constraint test, it is sufficient to scan (8.7), (8.8) for all permutations $(k_1,\dots,k_r)$ and $(\ell_1,\dots,\ell_r)$ in increasing order of $r$.

Proof. In proceeding from $r$ to $r+1$ we interchange the constraints indexed by $k_{r+1}$ and $\ell_{r+1}$. A simple computation shows that in an $(n+1)$-constraint set in canonical form, an interchange of the $(n+1)$st constraint with a basic constraint can't change the sign test indicating an empty set or redundant constraint.¹ But, by Theorem 8.1, we may assume that any $k_i$ and $\ell_i$ were interchanged. ∎

¹ See Section 13, #2.
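9. THE RECURSION FORMULA. Assuming that we have computed the matrix $A^r_{ij}$, the matrix $A^{r+1}_{ij}$ is obtained by interchanging the constraints indexed by $k_{r+1}$, $\ell_{r+1}$ and updating the matrix as in [1]. By (4.4)-(4.7), the result is
$$A^{r+1}_{k_{r+1},\ell_{r+1}} = \frac{1}{A^r_{k_{r+1},\ell_{r+1}}}, \tag{9.1}$$
$$A^{r+1}_{k_{r+1},j} = -\frac{A^r_{k_{r+1},j}}{A^r_{k_{r+1},\ell_{r+1}}}, \quad j \ne \ell_{r+1}, \tag{9.2}$$
and, for $i \ne k_{r+1}$,
$$A^{r+1}_{i,\ell_{r+1}} = \frac{A^r_{i,\ell_{r+1}}}{A^r_{k_{r+1},\ell_{r+1}}}, \tag{9.3}$$
$$A^{r+1}_{ij} = A^r_{ij} - \frac{A^r_{i,\ell_{r+1}}A^r_{k_{r+1},j}}{A^r_{k_{r+1},\ell_{r+1}}}, \quad j \ne \ell_{r+1}. \tag{9.4}$$
Note, in particular, that (9.4) is the ratio of a $2\times2$ minor and a $1\times1$ minor, and when $r = 0$ it agrees with (8.7). Also, when $r = 0$, (9.2) and (9.3) agree with (8.9) and (8.8), respectively. In order to make (9.1) agree with (8.10) we make the convention $f_0 = 1$. Before proving the general result, we shall develop some lemmas on determinants.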
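10. SOME LEMMAS ON DETERMINANTS. Let us use the usual convention that $B_{ij}$ is the co-factor of $b_{ij}$. Then our first and main lemma is:

Lemma 10.1. Let $B = (b_{ij})$ be a $k\times k$ matrix with $b_{kk} \ne 0$, and let $C$ be the $(k-1)\times(k-1)$ matrix
$$C = \Big(b_{ij} - \frac{b_{ik}b_{kj}}{b_{kk}}\Big), \quad 1 \le i,j \le k-1. \tag{10.1}$$
Then
$$\det C = \det B/b_{kk}. \tag{10.2}$$
Proof. Define
$$\varphi(\varepsilon) = \det\big(b_{ij} - \varepsilon\,b_{ik}b_{kj}\big)_{1 \le i,j \le k-1}. \tag{10.3}$$
Now we use the fact that the derivative of a determinant is the sum of the determinants obtained by differentiating one row of the matrix. When we differentiate the $i$th row of the matrix in (10.3), the new $i$th row is
$$(-b_{ik}b_{k1},\dots,-b_{ik}b_{k,k-1}). \tag{10.4}$$
Since this row is a multiple of $(b_{k1},\dots,b_{k,k-1})$, the $\varepsilon$-terms in the other rows may be removed by row operations. If we then interchange this row with each of those indexed by $i+1,\dots,k-1$, we have the matrix obtained by deleting the $i$th row from the first $k-1$ columns of $B$. Hence, when we take the determinant, we obtain
$$-b_{ik}(-1)^{k-1-i}(-1)^{i+k}B_{ik} = b_{ik}B_{ik}. \tag{10.5}$$
It follows that
$$\varphi'(\varepsilon) = \sum_{i=1}^{k-1} b_{ik}B_{ik}. \tag{10.6}$$
When we differentiate twice we obtain a sum of determinants of matrices having two proportional rows. Hence $\varphi''(\varepsilon) = 0$, so
$$\varphi(\varepsilon) = B_{kk} + \varepsilon\sum_{i=1}^{k-1} b_{ik}B_{ik}. \tag{10.7}$$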
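Putting $\varepsilon = 1/b_{kk}$ gives (10.2), since $b_{kk}B_{kk} + \sum_{i<k} b_{ik}B_{ik}$ is the expansion of $\det B$ along its $k$th column. ∎

The recursion formula (9.4) with $i > k_{r+1}$, $j > \ell_{r+1}$ can be rewritten
$$A^{r+1}_{ij} = \det\big(A^r_{\mu\nu}\big)/A^r_{k_{r+1},\ell_{r+1}}, \tag{10.8}$$
where the numerator is the determinant of the $2\times2$ matrix indexed by $\mu = k_{r+1}, i$ and $\nu = \ell_{r+1}, j$ and is, in fact, just Lemma 10.1 with $k = 2$ after a change of indices. More generally, we can use Lemma 10.1 to prove inductively that, for $1 \le \rho \le r+1$,
$$A^{r+1}_{ij} = \det\big(A^{r+1-\rho}_{\mu\nu}\big)/\det\big(A^{r+1-\rho}_{\mu'\nu'}\big), \tag{10.9}$$
where the numerator is the determinant of the $(\rho+1)\times(\rho+1)$ matrix indexed by $\mu = k_{r+2-\rho},\dots,k_{r+1},i$ and $\nu = \ell_{r+2-\rho},\dots,\ell_{r+1},j$, and the denominator is the $\rho\times\rho$ minor indexed by $\mu' = k_{r+2-\rho},\dots,k_{r+1}$ and $\nu' = \ell_{r+2-\rho},\dots,\ell_{r+1}$. In particular, when $\rho = r+1$, (10.9) reduces, in view of (8.5), to
$$A^{r+1}_{ij} = f_{r+2}(k_1,\dots,k_{r+1},i : \ell_1,\dots,\ell_{r+1},j)/D_{r+1} \tag{10.10}$$
for any $i \in K^c_{r+1}$, $j \in L^c_{r+1}$. In particular,
$$A^r_{k_{r+1},\ell_{r+1}} = D_{r+1}/D_r \tag{10.11}$$
or
$$D_{r+1} = D_r\,A^r_{k_{r+1},\ell_{r+1}}. \tag{10.12}$$
Since this is true for each $r$, we have proved (8.7) with $i \in K^c_{r+1}$, $j \in L^c_{r+1}$ as a consequence of (10.10) and (10.11) with $r$ replaced by $r-1$. By eliminating $A^{r+1}_{ij}$ between (10.9) and (10.10), setting $i = k_{r+2}$, $j = \ell_{r+2}$, and using (8.7) for $A^{r+1-\rho}_{\mu\nu}$, we obtain the interesting identity
$$f_{r+2}(k_1,\dots,k_{r+2} : \ell_1,\dots,\ell_{r+2})\,D_{r+1-\rho}^{\,\rho} = \det\big(f_{r+2-\rho}(k_1,\dots,k_{r+1-\rho},\mu : \ell_1,\dots,\ell_{r+1-\rho},\nu)\big), \tag{10.13}$$
with $\mu$ and $\nu$ ranging over the indices $k_{r+2-\rho},\dots,k_{r+2}$ and $\ell_{r+2-\rho},\dots,\ell_{r+2}$. The main use we make of this identity is: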
Theorem 10.2. Consider the identity (10.13) with $\rho = 1$. If three of the four minors comprising the determinant on the right have sign opposite to the fourth, then $D_r \ne 0$ and the sign of $f_{r+2}$ is determined by the identity.
162
Pedersen
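We shall also need the following identity:
$$\begin{aligned} &f_{r+1}(k_1,\dots,k_{r+1} : \ell_1,\dots,\ell_{i-1},\ell_{i+1},\dots,\ell_{r+1},j)\,f_r(k_1,\dots,k_r : \ell_1,\dots,\ell_r)\\ &\quad - f_{r+1}(k_1,\dots,k_{r+1} : \ell_1,\dots,\ell_r,j)\,f_r(k_1,\dots,k_r : \ell_1,\dots,\ell_{i-1},\ell_{i+1},\dots,\ell_r,\ell_{r+1})\\ &\quad + f_{r+1}(k_1,\dots,k_{r+1} : \ell_1,\dots,\ell_{r+1})\,f_r(k_1,\dots,k_r : \ell_1,\dots,\ell_{i-1},\ell_{i+1},\dots,\ell_r,j) = 0. \end{aligned} \tag{10.14}$$
If we suppress the dependence on $k_1,\dots,k_{r-1}$ and $\ell_1,\dots,\ell_{r-1}$, the left side of (10.14) is the $3\times3$ determinant of the matrix with rows indexed by $(k_r, k_r, k_{r+1})$ and columns indexed by $\ell_r, \ell_{r+1}, j$, expanded along its first row. Since the first two rows are equal, the determinant is zero.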
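Lemma 10.1, the identity $f_{r+2}(k_1,\dots,k_{r+2} : \ell_1,\dots,\ell_{r+2})\,D_{r+1-\rho}^{\rho} = \det\big(f_{r+2-\rho}(k_1,\dots,k_{r+1-\rho},\mu : \ell_1,\dots,\ell_{r+1-\rho},\nu)\big)$ (a form of Sylvester's determinant identity) and the three-term identity (10.14) are polynomial identities in the matrix entries, so they can be spot-checked exactly on an integer matrix. The sketch below does this with Python's `fractions.Fraction`. The function names, the seeded random matrix and the index lists are illustrative assumptions, and the checks are a sanity test, not a proof.

```python
from fractions import Fraction
import random

def det(M):
    """Exact determinant by Gaussian elimination; the 0 x 0 determinant is 1."""
    M = [[Fraction(v) for v in row] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for j in range(c, n):
                M[r][j] -= factor * M[c][j]
    return d

def f(A, rows, cols):
    """Minor of A with the given ordered rows and columns, as in (8.5)."""
    return det([[A[i][j] for j in cols] for i in rows])

def lemma_10_1(B):
    """det C == det B / b_kk, where C = (b_ij - b_ik b_kj / b_kk), i, j < k."""
    k = len(B) - 1
    bkk = Fraction(B[k][k])
    C = [[B[i][j] - Fraction(B[i][k] * B[k][j]) / bkk for j in range(k)] for i in range(k)]
    return det(C) == det(B) / bkk

def identity_10_13(A, K, L, rho):
    """f(K : L) * D**rho == det of the (rho + 1) x (rho + 1) matrix of bordered minors."""
    s = len(K) - rho - 1                      # size r + 1 - rho of the fixed block
    S = [[f(A, K[:s] + [mu], L[:s] + [nu]) for nu in L[s:]] for mu in K[s:]]
    return f(A, K, L) * f(A, K[:s], L[:s]) ** rho == det(S)

def identity_10_14(A, K, L, i, j):
    """Three-term identity (10.14); K, L have length r + 1, 1 <= i <= r, extra column j."""
    r = len(K) - 1
    Kr, Lr = K[:r], L[:r]
    Lr_wo_i = Lr[:i - 1] + Lr[i:]             # l_1..l_{i-1}, l_{i+1}..l_r
    t1 = f(A, K, L[:i - 1] + L[i:] + [j]) * f(A, Kr, Lr)
    t2 = f(A, K, Lr + [j]) * f(A, Kr, Lr_wo_i + [L[r]])
    t3 = f(A, K, L) * f(A, Kr, Lr_wo_i + [j])
    return t1 - t2 + t3 == 0

rng = random.Random(7)
A = [[rng.randint(-5, 5) for _ in range(7)] for _ in range(6)]
print(lemma_10_1([[2, 1, 3], [1, 4, -1], [5, 2, 7]]),
      identity_10_13(A, [0, 3, 1, 5], [2, 0, 6, 4], 1),
      identity_10_14(A, [0, 3, 1], [2, 0, 6], 2, 4))
```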
11. COMPLETION OF THE PROOFS OF THE IDENTITIES. We now have the main tools sufficient for the proofs of (8.7)-(8.10) by induction. Note that we have proved (8.7) for all $r$ and $i \notin K_r$, $j \notin L_r$, leaving the proofs of the cases (8.8), (8.9), (8.10). Hence, we may use $i = k_{r+1}$, $j = \ell_{r+1}$ in (8.7) to express (9.1) as
$$A^{r+1}_{k_{r+1},\ell_{r+1}} = \frac{f_r(k_1,\dots,k_r : \ell_1,\dots,\ell_r)}{D_{r+1}}. \tag{11.1}$$
This is the promoted version of (8.10) with $i = r+1$, $j = r+1$. By putting $i = k_{r+1}$, $j$ into (8.7) and substituting (11.1) into (9.2), we obtain
$$A^{r+1}_{k_{r+1},j} = -\frac{f_{r+1}(k_1,\dots,k_{r+1} : \ell_1,\dots,\ell_r,j)}{D_{r+1}}, \quad j \notin L_{r+1}, \tag{11.2}$$
which is the formula (8.9) corresponding to the pair $k_{r+1}, j$ with $j \notin L_{r+1}$. It follows from (9.4) that, for $k_i \in K_r$, $j \notin L_{r+1}$,
$$A^{r+1}_{k_i,j} = A^r_{k_i,j} - \frac{A^r_{k_i,\ell_{r+1}}A^r_{k_{r+1},j}}{A^r_{k_{r+1},\ell_{r+1}}}. \tag{11.3}$$
After substituting (8.7) and (8.9), and setting the result equal to (8.9) with $r$ replaced by $r+1$, we obtain the identity (10.14). This completes the proof of the remaining cases in (8.9). The proof of the promoted version of (8.8) is isomorphic. There remains the case indexed by $k_i \in K_r$ and $\ell_j \in L_r$. Substituting (8.8), (8.9), (8.10) and (10.11) into (9.4), we obtain
$$A^{r+1}_{k_i,\ell_j} = \frac{(-1)^{i+j}}{D_r}\Big\{f_{r-1}(k_1,\dots,k_{j-1},k_{j+1},\dots,k_r : \ell_1,\dots,\ell_{i-1},\ell_{i+1},\dots,\ell_r) + \frac{f_r(k_1,\dots,k_r : \ell_1,\dots,\ell_{i-1},\ell_{i+1},\dots,\ell_{r+1})\,f_r(k_1,\dots,k_{j-1},k_{j+1},\dots,k_{r+1} : \ell_1,\dots,\ell_r)}{f_{r+1}(k_1,\dots,k_{r+1} : \ell_1,\dots,\ell_{r+1})}\Big\}. \tag{11.4}$$
We now apply (10.13) in the form
$$\begin{aligned} f_{r+1}(k_1,\dots,k_{r+1} : \ell_1,\dots,\ell_{r+1})\,f_{r-1}(k_1,\dots,k_{j-1},k_{j+1},\dots,k_r : \ell_1,\dots,\ell_{i-1},\ell_{i+1},\dots,\ell_r) &= f_r(k_1,\dots,k_r : \ell_1,\dots,\ell_r)\,f_r(k_1,\dots,k_{j-1},k_{j+1},\dots,k_{r+1} : \ell_1,\dots,\ell_{i-1},\ell_{i+1},\dots,\ell_{r+1})\\ &\quad - f_r(k_1,\dots,k_{j-1},k_{j+1},\dots,k_{r+1} : \ell_1,\dots,\ell_r)\,f_r(k_1,\dots,k_r : \ell_1,\dots,\ell_{i-1},\ell_{i+1},\dots,\ell_{r+1}). \end{aligned} \tag{11.5}$$
After substituting (11.5) into (11.4), we have the promoted version of (8.10) for $k_i \in K_r$, $\ell_j \in L_r$. Since the case of $k_{r+1} \in K_{r+1}$, $\ell_{r+1} \in L_{r+1}$ has already been disposed of, the proof is complete.
12. DUPLICATIONS OF CONSTRAINTS.
The formulas (8.7)-(8.10) are derived under the assumption that the elements of $(k_1,\dots,k_r)$ and of $(\ell_1,\dots,\ell_r)$ are distinct. In particular the $\ell$'s are a subset of $(1,\dots,n)$, so we must have $r \le n$. On the other hand, it follows from the recursion formulas (9.1)-(9.4) that we may, at any time, start over with a new matrix and continue until there is a duplication in either the $k$'s or the $\ell$'s. In this section we resolve the question of such a duplication in the second step. The new matrix coefficients, after interchanging the $i$th non-basic constraint with the $\ell$th basic constraint and then returning to reduced echelon form by the use of elementary column operations, are
$$A'_{i\ell} = 1/A_{i\ell}, \tag{12.1}$$
$$A'_{ij} = -A_{ij}/A_{i\ell}, \quad j \ne \ell, \tag{12.2}$$
and, for $k \ne i$,
$$A'_{k\ell} = A_{k\ell}/A_{i\ell}, \tag{12.3}$$
$$A'_{kj} = A_{kj} - A_{k\ell}A_{ij}/A_{i\ell}, \quad j \ne \ell. \tag{12.4}$$
Now let us interchange the new $k$th constraint with the $\ell$th basic constraint. By analogy with (12.1), (12.2), the coefficients for the new $k$th constraint are
$$A''_{k\ell} = 1/A'_{k\ell} \tag{12.5}$$
and
$$A''_{kj} = -A'_{kj}/A'_{k\ell}, \quad j \ne \ell. \tag{12.6}$$
After substituting from (12.1)-(12.4), these become
$$A''_{k\ell} = A_{i\ell}/A_{k\ell} \tag{12.7}$$
and
$$A''_{kj} = A_{ij} - A_{i\ell}A_{kj}/A_{k\ell}, \quad j \ne \ell. \tag{12.8}$$
These are just the coefficients of the $i$th constraint obtained after interchanging the $k$th constraint with the $\ell$th in the original matrix and returning to reduced echelon form. But they are in the position of the $k$th. The new coefficients for the $i$th constraint are
$$A''_{i\ell} = A'_{i\ell}/A'_{k\ell}, \tag{12.9}$$
$$A''_{ij} = A'_{ij} - A'_{i\ell}A'_{kj}/A'_{k\ell}, \quad j \ne \ell. \tag{12.10}$$
Again, after substituting from (12.1)-(12.4) and taking into account cancellations, these become
$$A''_{i\ell} = 1/A_{k\ell}, \tag{12.11}$$
$$A''_{ij} = -A_{kj}/A_{k\ell}, \quad j \ne \ell. \tag{12.12}$$
They are the coefficients for the $k$th constraint after interchanging the $k$th constraint with the $\ell$th, and they are in the position of the $i$th. For $r \ne k, i$ with $r > n$, the new coefficients for the $r$th constraint are
$$A''_{r\ell} = A'_{r\ell}/A'_{k\ell}, \tag{12.13}$$
$$A''_{rj} = A'_{rj} - A'_{r\ell}A'_{kj}/A'_{k\ell}, \quad j \ne \ell. \tag{12.14}$$
After substituting from (12.1)-(12.4), these become
$$A''_{r\ell} = A_{r\ell}/A_{k\ell}, \tag{12.15}$$
$$A''_{rj} = A_{rj} - A_{r\ell}A_{kj}/A_{k\ell}, \quad j \ne \ell, \tag{12.16}$$
which are just the coefficients obtained after interchanging the $k$th constraint with the $\ell$th in the original matrix. This, together with the remarks following (12.8) and (12.12), yields a proof of the following theorem.

Theorem 12.1. Interchanging the $i$th non-basic constraint with the $\ell$th, updating, and then interchanging the $k$th with the $\ell$th and updating is equivalent to merely interchanging the $k$th with the $\ell$th in the original matrix, updating, and then interchanging the $i$th and $k$th constraints.

Now let us determine the effect of interchanging one non-basic constraint with two different basic constraints. If, after obtaining the formulas (12.1)-(12.4), we interchange the $i$th constraint with the $q$th basic constraint, $q \ne \ell$, the new parameters for the $i$th constraint are
$$A''_{iq} = 1/A'_{iq}, \tag{12.17}$$
$$A''_{i\ell} = -A'_{i\ell}/A'_{iq}, \tag{12.18}$$
and
$$A''_{ij} = -A'_{ij}/A'_{iq}, \quad j \ne q, \ell. \tag{12.19}$$
The formulas (12.17)-(12.19), after substituting from (12.1)-(12.4), are just the formulas obtained after interchanging the $i$th with the $q$th in the original matrix. For $k \ne i$,
$$A''_{kq} = A'_{kq}/A'_{iq}, \tag{12.20}$$
$$A''_{k\ell} = A'_{k\ell} - A'_{kq}A'_{i\ell}/A'_{iq}, \tag{12.21}$$
and
$$A''_{kj} = A'_{kj} - A'_{kq}A'_{ij}/A'_{iq}, \quad j \ne q, \ell. \tag{12.22}$$
Again, after substituting from (12.1)-(12.4), these are just the formulas for the $k$th constraint after interchanging the $i$th with the $q$th in the original matrix, except that the $q$th and $\ell$th variables have been interchanged.
Theorem 12.2. If we interchange the $i$th non-basic constraint with the $\ell$th basic constraint, update, and then interchange the new $i$th constraint with the $q$th, $q \ne \ell$, and update, this is equivalent to merely interchanging the $i$th with the $q$th, updating, and permuting the $q$th and $\ell$th variables.
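Theorems 12.1 and 12.2 can be spot-checked with the single-interchange update (12.1)-(12.4), which is the same rule as (4.4)-(4.7). In the minimal sketch below, rows of `T` are non-basic constraints and columns are the variables followed by the constant. The matrix and the 0-based indices `i, k, l, q` are arbitrary illustrative choices (they must give non-zero pivots), the helper names are hypothetical, and exact `Fraction` arithmetic keeps the equality tests free of round-off.

```python
from fractions import Fraction

def pivot(T, l, k):
    """Interchange non-basic row l with basic column k, as in (12.1)-(12.4)."""
    p = T[l][k]
    new = []
    for i, row in enumerate(T):
        if i == l:
            new.append([1 / p if j == k else -row[j] / p for j in range(len(row))])
        else:
            new.append([row[k] / p if j == k else row[j] - row[k] * T[l][j] / p
                        for j in range(len(row))])
    return new

def swap_rows(T, a, b):
    T = [row[:] for row in T]
    T[a], T[b] = T[b], T[a]
    return T

def swap_cols(T, a, b):
    T = [row[:] for row in T]
    for row in T:
        row[a], row[b] = row[b], row[a]
    return T

# Rows: non-basic constraints; columns: x_1, x_2, x_3, constant.
T = [[Fraction(v) for v in row] for row in
     [[2, -1, 3, 1], [1, 4, -2, -3], [-3, 1, 5, 2]]]
i, k, l, q = 0, 1, 0, 2

# Theorem 12.1: (i <-> l) then (k <-> l) equals (k <-> l) followed by swapping rows i, k.
print(pivot(pivot(T, i, l), k, l) == swap_rows(pivot(T, k, l), i, k))
# Theorem 12.2: (i <-> l) then (i <-> q) equals (i <-> q) followed by swapping columns q, l.
print(pivot(pivot(T, i, l), i, q) == swap_cols(pivot(T, i, q), q, l))
```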
13. THE CASE OF (n + 2) CONSTRAINTS.
Let us assume the constraint set to be in canonical form. If α is any subset of the non-basic indices, we shall denote by $S_\alpha$ the corresponding set of non-basic constraints and by $\bar S_\alpha$ the set $S_\alpha$ together with the basic constraints. For a single index i we define

$$S_i^+ = \{\, j \le n : A_{ij} > 0 \,\} \qquad (13.1)$$
and

$$S_i^- = \{\, j \le n : A_{ij} < 0 \,\}. \qquad (13.2)$$

The cardinality of a set S shall be denoted by |S|. We shall assume that our constraint set contains no degenerate vertices and that the minor determinants used in counting are always non-zero. Our (n + 1)-constraint set $\bar S_i$ is empty when $A_{i,n+1} < 0$ and $|S_i^-| = n$, hence $|S_i^+| = 0$. It contains a redundant constraint when $A_{i,n+1} > 0$ and $|S_i^+| = n$, or when $A_{i,n+1} < 0$ and $|S_i^+| = 1$. Now suppose that $|S_i^+| = \alpha$, $0 \le \alpha \le n$. If α < n, there exists an ℓ ≤ n such that $A_{i\ell} < 0$. If we interchange the ith and ℓth constraints and put the set back into canonical form, we obtain the constraint set $\bar S_i$ with $|S_i^+| = \alpha$. If α > 0, there exists an index ℓ' ≤ n with $A_{i\ell'} > 0$. After interchanging the ith and ℓ'th constraints and putting the set back into canonical form, the set $\bar S_i$ has $|S_i^+| = n + 1 - \alpha$.
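The counting argument can be checked on small examples. The sketch below uses hypothetical helper names and non-degenerate rows only. `status` applies the emptiness and redundancy criteria stated above to a single non-basic row, where the last entry is the constant term. `exchange_row` performs the interchange of that row with a basic constraint ℓ. For each sample row, the status should be unchanged by every interchange.

```python
from fractions import Fraction


def status(row):
    """Classify the (n+1)-constraint set formed by the basic constraints and one
    non-basic row (coefficients row[:-1], constant row[-1] != 0)."""
    *coeffs, const = row
    n = len(coeffs)
    plus = sum(1 for c in coeffs if c > 0)      # |S_i^+|
    if const < 0 and plus == 0:
        return "empty"
    if (const > 0 and plus == n) or (const < 0 and plus == 1):
        return "redundant"
    return "neither"


def exchange_row(row, l):
    """Coefficients of the row after interchanging it with basic constraint l
    and restoring canonical form (cf. (12.17)-(12.19))."""
    p = row[l]
    return [1 / p if j == l else -a / p for j, a in enumerate(row)]


# Hypothetical rows with n = 3 (last entry is the constant term).
examples = [[Fraction(x) for x in r] for r in
            ([-1, -2, -3, -4], [2, 1, 5, 3], [-1, 2, -3, -1], [1, -2, 3, 2], [-1, -2, 3, 4])]
for row in examples:
    after = [status(exchange_row(row, l)) for l in range(len(row) - 1)]
    print(status(row), after)
```

Pivoting on a negative entry leaves every sign unchanged. Pivoting on a positive entry flips all other signs and the constant term. This is exactly why $|S_i^+|$ becomes $n + 1 - \alpha$ in that case.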
It follows that interchanging two constraints in an (n + 1)-constraint set cannot change its status with respect to being empty or having a redundant constraint. Hence, if neither $\bar S_i$ nor $\bar S_j$ has this property, we can find an empty set or a redundant constraint in an (n + 1)-constraint set only by interchanging $S_i$ with a basic constraint and examining $S_j$, j ≠ i, or conversely. In particular, after interchanging the ith and ℓth constraints, the new constant term of the jth constraint is

$$A'_{j,n+1} = -\frac{A_{j\ell}}{A_{i\ell}}\left(A_{i,n+1} - \frac{A_{i\ell}}{A_{j\ell}}\,A_{j,n+1}\right). \qquad (13.3)$$

Hence, the constant term in the ith constraint after interchanging the jth and the ℓth has the same sign as, or the opposite sign to, the jth constant term after interchanging the ith and the ℓth, according to whether $A_{j\ell}$ and $A_{i\ell}$ have opposite signs or the same sign. Let us now study the jth constraint after interchanging the ith and the ℓth, with $A_{i\ell} < 0$ and $A_{j\ell} > 0$. This requires analyzing the signs of

$$A'_{j\ell} = \frac{A_{j\ell}}{A_{i\ell}} \qquad (13.4)$$

and

$$A'_{jk} = A_{jk} - \frac{A_{j\ell}}{A_{i\ell}}\,A_{ik}. \qquad (13.5)$$

Since $A_{j\ell}$ and $A_{i\ell}$ have opposite signs, it follows from (13.4) that

$$A'_{j\ell} < 0, \qquad (13.6)$$

and from (13.5) that

$$A'_{jk} > 0, \qquad k \in S_i^+ \cap S_j^+, \qquad (13.7)$$

and

$$A'_{jk} < 0, \qquad k \in S_i^- \cap S_j^-. \qquad (13.8)$$
For $k \in S_i^- \cap S_j^+ - \{\ell\}$, we may make the signs of

$$A'_{jk} = \left(\frac{A_{jk}}{A_{ik}} - \frac{A_{j\ell}}{A_{i\ell}}\right)A_{ik} \qquad (13.9)$$

all positive or all negative, without violating (13.6), by choosing ℓ to minimize or maximize, respectively, the ratios

$$\frac{A_{jk}}{-A_{ik}}, \qquad k \in S_i^- \cap S_j^+. \qquad (13.10)$$

Since, by (13.6), $A'_{j\ell} < 0$, we cannot achieve an empty set or a redundant constraint unless $A'_{j,n+1} < 0$. This rules out the possibility $A_{i,n+1} > 0$ and $A_{j,n+1} > 0$. When $A_{i,n+1} < 0$ and $A_{j,n+1} < 0$, this cannot be the case unless it is true for either $\bar S_i$ or $\bar S_j$. There remain the cases where $A_{i,n+1}$ and $A_{j,n+1}$ have opposite signs. By applying the results (13.6)-(13.10), we see that $\bar S_{ij}$ is empty or contains a redundant constraint if ℓ minimizes the ratios (13.10), (13.11) and either

$$A_{i,n+1} > 0, \qquad A_{j,n+1} < 0, \qquad A_{j,n+1} - \frac{A_{j\ell}}{A_{i\ell}}\,A_{i,n+1} < 0 \qquad (13.12)$$

or (13.13).
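The choice of ℓ in (13.9)-(13.10) amounts to a ratio test. The sketch below uses hypothetical rows i and j, with constant terms omitted. It picks ℓ in $S_i^- \cap S_j^+$ that minimizes or maximizes $A_{jk}/(-A_{ik})$. It then checks that the updated coefficients (13.5) on the remaining indices of that set are all positive or all negative, respectively. The helper names are illustrative only.

```python
from fractions import Fraction


def updated_row_j(Ai, Aj, l):
    """A'_{jk} = A_jk - (A_jl / A_il) * A_ik for k != l, as in (13.5)."""
    return {k: Aj[k] - Aj[l] / Ai[l] * Ai[k] for k in range(len(Ai)) if k != l}


def candidates(Ai, Aj):
    """Indices in S_i^- intersected with S_j^+."""
    return [k for k in range(len(Ai)) if Ai[k] < 0 and Aj[k] > 0]


# Hypothetical coefficient rows with n = 4.
Ai = [Fraction(x) for x in (-1, -2, -4, 3)]
Aj = [Fraction(x) for x in (3, 2, 2, 1)]
C = candidates(Ai, Aj)
ratio = {k: Aj[k] / -Ai[k] for k in C}          # the ratios (13.10)
l_min = min(C, key=ratio.get)
l_max = max(C, key=ratio.get)
signs_min = [updated_row_j(Ai, Aj, l_min)[k] > 0 for k in C if k != l_min]
signs_max = [updated_row_j(Ai, Aj, l_max)[k] < 0 for k in C if k != l_max]
print(C, l_min, l_max, signs_min, signs_max)
```

Writing $\rho_k = A_{jk}/(-A_{ik})$, (13.9) becomes $A'_{jk} = (-A_{ik})(\rho_k - \rho_\ell)$. Since $-A_{ik} > 0$ on this set, the sign of each updated entry is the sign of $\rho_k - \rho_\ell$.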
14. THE CASE OF THREE VARIABLES.

We shall focus our attention on the case of proving that the constraint set is non-empty by finding a vertex that satisfies all constraints. Thus, suppose that

$$A_{i,n+1} > 0, \quad n + 1 \le i \le p - 1, \qquad A_{p,n+1} < 0. \qquad (14.1)$$

Definition 14.1. For each $k \in S_p^+$ we define

$$P_k^+ = \left\{ i : A_{ik} < 0,\ A_{p,n+1} - \frac{A_{pk}}{A_{ik}}\,A_{i,n+1} > 0 \right\} \qquad (14.2)$$

and

$$P_k^- = \left\{ i : A_{ik} < 0,\ A_{p,n+1} - \frac{A_{pk}}{A_{ik}}\,A_{i,n+1} < 0 \right\}. \qquad (14.3)$$
The simplex strategy makes $A_{p,n+1}$ increase until either it is positive or the set has been shown to be empty. For this strategy to require more than one step, $P_k^-$ must be non-empty for each $k \in S_p^+$. Otherwise, if $P_k^- = \emptyset$, we may achieve our objective by interchanging the kth basic constraint with the pth and putting the matrix back into reduced echelon form.
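Definition 14.1 can be read as a prediction of which constraints a single simplex step on column k would violate. The following sketch uses a hypothetical tableau with the constants in the last column. It computes $P_k^+$ and $P_k^-$ and compares $P_k^-$ with the set of rows whose constant term becomes negative after interchanging the pth constraint with the kth basic constraint. The helper names are illustrative, and the degenerate case where the defining expression vanishes is assumed away, as in the text.

```python
from fractions import Fraction


def P_sets(A, p, k):
    """Split {i != p : A[i][k] < 0} into P_k^+ and P_k^- (Definition 14.1)."""
    c = len(A[0]) - 1                      # index of the constant column
    plus, minus = set(), set()
    for i, row in enumerate(A):
        if i == p or row[k] >= 0:
            continue
        s = A[p][c] - A[p][k] / row[k] * row[c]
        (plus if s > 0 else minus).add(i)
    return plus, minus


def constants_after_step(A, p, k):
    """Constant terms after interchanging the violated row p with basic column k."""
    c = len(A[0]) - 1
    return [-row[c] / A[p][k] if r == p else row[c] - row[k] * A[p][c] / A[p][k]
            for r, row in enumerate(A)]


# Hypothetical tableau, n = 3: rows 0-2 satisfied, row 3 (= p) violated.
A = [[Fraction(x) for x in row] for row in
     [[-1, 2, 1, 1], [-2, 1, -1, 5], [3, -2, 2, 4], [2, 1, -1, -4]]]
p = 3
for k in (j for j in range(3) if A[p][j] > 0):          # k in S_p^+
    plus, minus = P_sets(A, p, k)
    violated = {r for r, v in enumerate(constants_after_step(A, p, k)) if v < 0}
    print(k, sorted(plus), sorted(minus), sorted(violated))
```

In this example both $P_1^-$ and $P_2^-$ (columns 0 and 1 in zero-based indexing) are non-empty. One simplex step on either column therefore leaves a violated constraint, which is the situation the remainder of the section analyzes.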
From now on we shall assume the number of dimensions to be three. By making one simplex step and permuting the variables, we assume that

$$A_{p1} > 0, \quad A_{p2} > 0, \quad A_{p3} < 0, \quad A_{p4} < 0. \qquad (14.4)$$

If there is a constraint indexed by i < p that belongs to both $P_1^-$ and $P_2^-$, it follows from (13.11)-(13.13) that there is a redundant constraint. Let us then assume that there exist two non-basic constraints, indexed by i and j, for which

$$i \in P_1^- \cap P_2^+, \qquad j \in P_1^+ \cap P_2^-. \qquad (14.5)$$
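The constraints indexed by i, j, p have the following sign configuration:

$$\begin{array}{c|cccc} & 1 & 2 & 3 & 4 \\ \hline i & \ominus & - & + & + \\ j & - & \ominus & + & + \\ p & + & + & - & - \end{array} \qquad (14.6)$$

The circled and uncircled minuses refer to $P_k^-$ and $P_k^+$, respectively. We then have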
$$A_{p4} - \frac{A_{p1}}{A_{i1}}\,A_{i4} < 0, \qquad (14.7)$$

$$A_{p4} - \frac{A_{p2}}{A_{i2}}\,A_{i4} > 0, \qquad (14.8)$$

$$A_{p4} - \frac{A_{p2}}{A_{j2}}\,A_{j4} < 0, \qquad (14.9)$$

and

$$A_{p4} - \frac{A_{p1}}{A_{j1}}\,A_{j4} > 0. \qquad (14.10)$$

These are equivalent to

$$f_2(i,p:1,4) > 0, \quad f_2(i,p:2,4) < 0, \quad f_2(j,p:2,4) > 0, \quad f_2(j,p:1,4) < 0. \qquad (14.11)$$

It follows from (14.7), (14.8), (14.9) and (14.10) that

$$f_2(i,p:1,2) < 0, \qquad f_2(j,p:1,2) > 0. \qquad (14.12)$$

By writing (14.12) in the form

$$A_{p1} - \frac{A_{p2}A_{i1}}{A_{i2}} < 0, \qquad A_{p1} - \frac{A_{p2}A_{j1}}{A_{j2}} > 0, \qquad (14.13)$$

we have

$$f_2(i,j:1,2) > 0. \qquad (14.14)$$

Similarly, it is a consequence of (14.7), (14.10) and of (14.8), (14.9) that

$$f_2(i,j:1,4) < 0, \qquad f_2(i,j:2,4) > 0. \qquad (14.15)$$
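The step from (14.13) to (14.14) can be spelled out as follows. This assumes, consistently with (14.7) and (14.11), that $f_2(a,b:k,l)$ denotes the $2\times 2$ minor $A_{ak}A_{bl} - A_{al}A_{bk}$. It also uses $A_{p2} > 0$ from (14.4), and $A_{i2} < 0$, $A_{j2} < 0$ from (14.5) and Definition 14.1.

```latex
\begin{aligned}
\frac{A_{p2}A_{j1}}{A_{j2}} < A_{p1} < \frac{A_{p2}A_{i1}}{A_{i2}}
  &\;\Longrightarrow\; \frac{A_{j1}}{A_{j2}} < \frac{A_{i1}}{A_{i2}}
  && \text{(divide by } A_{p2} > 0\text{)} \\
  &\;\Longrightarrow\; A_{j1}A_{i2} < A_{i1}A_{j2}
  && \text{(multiply by } A_{i2}A_{j2} > 0\text{)} \\
  &\;\Longleftrightarrow\; f_2(i,j:1,2) = A_{i1}A_{j2} - A_{i2}A_{j1} > 0 .
\end{aligned}
```

The same pattern applied to (14.7) with (14.10), using $A_{p1} > 0$ and $A_{i1}A_{j1} > 0$, gives the first inequality of (14.15).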
After interchanging the ith and first constraints and updating, we have the coefficient matrix

$$\frac{1}{f_1(i:1)}\begin{pmatrix} 1 & -f_1(i:2) & -f_1(i:3) & -f_1(i:4) \\ f_1(j:1) & f_2(i,j:1,2) & f_2(i,j:1,3) & f_2(i,j:1,4) \\ f_1(p:1) & f_2(i,p:1,2) & f_2(i,p:1,3) & f_2(i,p:1,4) \end{pmatrix} \qquad (14.16)$$
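It follows from the imposed signs (14.6)-(14.15) that the matrix (14.16) has the sign configuration

$$\begin{array}{c|cccc} & 1 & 2 & 3 & 4 \\ \hline i & - & - & + & + \\ j & + & - & \pm & + \\ p & - & + & \pm & - \end{array} \qquad (14.17)$$

If the coefficient indexed by p, 3 were negative, the third basic constraint would be redundant. Therefore, we impose the sign

$$f_2(i,p:1,3) < 0, \qquad (14.18)$$

leaving the configuration

$$\begin{array}{c|cccc} & 1 & 2 & 3 & 4 \\ \hline i & - & - & + & + \\ j & + & - & \pm & + \\ p & - & + & + & - \end{array} \qquad (14.19)$$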
with only the 2,3 element having an arbitrary sign. In any case, the interchange of the jth and second constraints is admissible. After this interchange we have the matrix, with $D_2 = f_2(i,j:1,2) > 0$,

$$\frac{1}{D_2}\begin{pmatrix} f_1(j:2) & -f_1(i:2) & f_2(i,j:2,3) & f_2(i,j:2,4) \\ -f_1(j:1) & f_1(i:1) & -f_2(i,j:1,3) & -f_2(i,j:1,4) \\ -f_2(j,p:1,2) & f_2(i,p:1,2) & f_3(i,j,p:1,2,3) & f_3(i,j,p:1,2,4) \end{pmatrix} \qquad (14.20)$$
Now let the coefficients of the pth constraint be denoted by $A'_{pj}$. It follows from (14.12) that

$$A'_{p1} = -\frac{f_2(j,p:1,2)}{D_2} < 0, \qquad A'_{p2} = \frac{f_2(i,p:1,2)}{D_2} < 0. \qquad (14.21)$$

Hence, if $A'_{p4} < 0$, the set is either empty or there is a redundant constraint. If $A'_{p4} > 0$, this configuration does not contribute to the promoted version of $P_k^+$. Of course, this statement does not apply if the interchange is made with respect to some other constraint. Let us now examine the other admissible exchanges within the present matrix. From (14.19) it appears that the interchange of the ith and second variables is one such possibility. But since this follows the interchange of the ith and the first, it is, by Theorem 12.2, the interchange of the ith and second followed by a permutation. From (14.19) we see that the only other admissible interchange is the interchange of the jth and third constraints, under the condition

$$f_2(i,j:1,3) > 0. \qquad (14.22)$$
This interchange gives the matrix

$$\frac{1}{D_2}\begin{pmatrix} f_1(j:3) & -f_1(i:3) & f_2(i,j:3,2) & f_2(i,j:3,4) \\ -f_1(j:1) & f_1(i:1) & -f_2(i,j:1,2) & -f_2(i,j:1,4) \\ -f_2(j,p:1,3) & f_2(i,p:1,3) & f_3(i,j,p:1,3,2) & f_3(i,j,p:1,3,4) \end{pmatrix} \qquad (14.23)$$
with $D_2 = f_2(i,j:1,3)$, which by (14.22) is positive. By (14.18) we have $f_2(i,p:1,3) < 0$. This configuration appears to have insufficient information to resolve the sign of $f_2(j,p:1,3)$. However, if the assumption (14.22) leads to a legitimate simplex step, it does impose the additional sign

$$f_2(i,j:3,4) > 0. \qquad (14.24)$$

In any case, the previous configuration was sufficient to resolve the case of the constraints in three variables. When there are more constraints, the additional condition (14.24) may be helpful in analyzing the interaction of various sets of three non-basic constraints combined with the basic constraints. We also remark that if the same constraints i and j solve the maximum problem determining the next simplex step for two steps in a row, the analysis of (14.20) is sufficient either to produce a complete simplex step or to find a redundant constraint. If both maxima were achieved by the ith constraint, this would require the interchange of the ith and third constraints. By (14.19) this is impossible, since both the 1,3 and 3,3 elements are positive. Finally, we consider the sign configuration
234
1 8
j p
+ + + 8 + + + +
(14.25)
The interchange of the i th and first constraints leads to
+ + + + + +
(14.26)
instead of (14.19). Some of these signs are determined as before, and the others are consequences of Theorem 10.2. We now notice that the interchange of the jth constraint with the second is the only admissible simplex interchange. To apply the preceding analysis to (14.20), we need only (14.21), which is again a consequence of Theorem 10.2.
15. THE CASE OF SIX CONSTRAINTS IN THREE VARIABLES.

The analysis of the preceding section yields the following theorem.

Theorem 15.1. Consider a set of six constraints in three variables which is in canonical form and in which only one constraint does not satisfy the basic vertex. Suppose that completing a simplex step, finding a redundant constraint, or finding the set to be empty requires more than three steps. Then, up to a permutation of the first three columns, we may assume the configuration of the non-basic constraints to be
⊖
+ ± +
J:+8±+ p :
(15.1)
+ +
We leave open the question of whether the number of steps can be reduced from three to two by starting with the configuration + + + for the pth constraint. (15.2)
References

[1] Dantzig, Linear Programming and Extensions, Princeton Univ. Press.
[4] Polya, G., Szego, G., Problems and Theorems in Analysis, Springer-Verlag, New York, Heidelberg, Berlin, 1972.
[5] Strang, G., Linear Algebra and its Applications (3rd Ed.), Harcourt, Brace, Jovanovich, San Diego.
[6] Wu, S. and Coppins, R., Linear Programming and Extension, McGraw Hill.

Acknowledgement

I would like to thank Jenny Bourne Wahl for criticizing an earlier version of Sections 1-7 of this manuscript.
An Estimate of the Semi-Stable Measure of Small Balls in Banach Spaces
BALRAM S. RAJPUT
Department of Mathematics, The University of Tennessee, Knoxville, TN 37923

Abstract. Let $(\mathbb{E}, \|\cdot\|)$ be a separable Banach space. Let μ be a symmetric r-semi-stable probability measure of index 0 < α ≤ 2 on $\mathbb{E}$, and let 0 < q < α. It is proven that if $\int_{\mathbb{E}}\|x\|^q\,d\mu = 1$, then $\mu\{\|x\| \le t\} \le \text{const}\cdot t^{\alpha/2}$ for all t > 0, where the constant depends only on r, q and α (and not on $\mathbb{E}$ or μ). This result complements similar known results for symmetric Gaussian and α-stable probability measures on $\mathbb{E}$. Two other related results are also proved; these are needed for the proof of the above main result.
1. INTRODUCTION AND PRELIMINARIES

Let $(\mathbb{E}, \|\cdot\|)$ be a separable Banach space. Let μ be a symmetric Borel probability measure on $\mathbb{E}$. In a recent paper, M. Lewandowski, M. Ryzner, and T. Zak (1992) showed that, if μ is α-stable and satisfies $\int_{\mathbb{E}}\|x\|^q\,d\mu = 1$ with 0 < q < α, then $\mu\{\|x\| \le t\} \le \text{const}\cdot t$, where the constant depends only on α and q (and not on $\mathbb{E}$ or μ). In the case when μ is centered Gaussian, a similar result was proved earlier by S. Szarek (1991), and also by X. Fernique and by J. Sawa; Sawa requires in addition that $\mathbb{E}$ be a Hilbert space. (For a discussion of, and references for, the Fernique and Sawa contributions, we refer the reader to Lewandowski, Ryzner, and Zak (1992).)

This research is partially supported by the University of Tennessee Science Alliance, a State of Tennessee Center of Excellence.
The main effort of this paper is aimed at proving a version of the above result of Lewandowski, Ryzner, and Zak (1992) for the larger class of semi-stable probability measures. Specifically, we prove the result stated in the abstract. The proof of Lewandowski, Ryzner, and Zak (1992) in the stable case is based on two facts: every $\mathbb{E}$-valued symmetric α-stable random variable is conditionally Gaussian, and the well-known Anderson Inequality holds for Gaussian measures. Since a semi-stable random variable is in general not conditionally Gaussian (Rosinski (1991), p. 32), the methods used in Lewandowski, Ryzner, and Zak (1992) do not apply in the more general semi-stable case. A similar situation seems to prevail with regard to the methods of proof used by Szarek and Sawa. Our proof, like the one due to Fernique in the Gaussian case (see Lewandowski, Ryzner, and Zak (1992)), is based on the well-known Kanter Inequality. In the Gaussian case (α = 2), this approach yields the same upper bound for $\mu\{\|x\| \le t\}$ as the one obtained in Lewandowski, Ryzner, and Zak (1992) in the stable case (namely, const · t). In the proper semi-stable case, on the other hand, this approach yields the upper bound const · $t^{\alpha/2}$ for $\mu\{\|x\| \le t\}$. In the interesting case, i.e., when t is close to 0, this bound is worse than const · t. (For more on this point, see the Concluding Remark.) For our proof of the main result, in addition to the Kanter Inequality, we also need a lower bound for the tail of symmetric semi-stable probability measures on $\mathbb{E}$; this is obtained in Lemma 1. This lower bound is obtained by using the Paley-Zygmund Inequality together with another result, which compares the moments of a semi-stable probability measure with a related F-norm (Proposition 1). Throughout, r and α will denote real numbers satisfying 0 < r < 1 and 0 < α < 2, and the notation r-SS(α) will mean "r-semi-stable of index α".
Further, throughout, $\mathbb{E}$ will denote a real separable Banach space. By a measure on $\mathbb{E}$, we shall always mean a measure defined on its Borel σ-algebra. For the sake of brevity, we refer the reader to Chung, Rajput, and Tortrat (1982) and Rajput and Rama-Murthy (1987) for the definition and properties of $\mathbb{E}$-valued r-SS(α) random variables and r-SS(α) probability measures on $\mathbb{E}$. The following fact about these will be important for us. Let X be a symmetric $\mathbb{E}$-valued random variable and let $\mu = \mathcal{L}(X)$, the law of X. Then X is an r-SS(α) random variable (equivalently, μ is an r-SS(α) probability measure) if and only if μ is infinitely divisible and $\mu^{r^n} = r^{n/\alpha}\cdot\mu$ for all n = ±1, ±2, .... Here $a\cdot\mu = \mathcal{L}(aX)$ for a real number a, and $\mu^s$, s > 0, denotes the sth root of μ (see Chung, Rajput, and Tortrat (1982) and Rajput and Rama-Murthy (1987)). Note also that if μ is centered Gaussian, then μ is an r-SS(2) measure for all 0 < r < 1. Before we end this section, we introduce some more notation. Let 0 < p < q < α; then we set

$$C(r,\alpha,p,q) = \left(\frac{\alpha}{\alpha-q}\right)^{1/q}\left(\frac{2^{1+2\alpha/p}}{r}\right)^{1/\alpha}.$$

Let C = C(r, α, q/2, q); then we put

$$K(r,\alpha,q) = \frac{1}{\sqrt{r(1-r)}}\cdot\frac{3}{2}\cdot\frac{(2C)^{(\alpha+q)/2}}{2^{q/2}-1};$$

we note that both C(r, α, p, q) and K(r, α, q) are greater than 1. For a non-negative random variable ξ and p > 0, we shall use the notation $\Lambda_p(\xi)$ for $\sup_{t>0} t\,(P\{\xi > t\})^{1/p}$. For an $\mathbb{E}$-valued random variable X and q > 0, we shall use the notation $\|X\|_q$ for $(E\|X\|^q)^{1/q}$.
2. STATEMENTS AND PROOFS OF RESULTS

The main result of this paper is the following.

THEOREM 1. Let μ be a symmetric r-SS(α) probability measure on $\mathbb{E}$, and let 0 < q < α and K = K(r, α, q). If $\int_{\mathbb{E}}\|x\|^q\,d\mu = 1$, then $\mu\{\|x\| \le t\} \le K\,t^{\alpha/2}$ for all t > 0.
As noted in Section 1, the proof of the theorem depends on the following lemma. The proof of the lemma in turn depends on Proposition 1. The counterparts of the Proposition and of its Corollary (Corollary 1) for the stable case are well known and have played a useful role in the study of stable measures in Banach spaces (see Araujo and Gine (1980), Linde (1982), Linde (1986)). We state and prove these results and the lemma first. We have stated them in terms of $\mathbb{E}$-valued r-SS(α) random variables, rather than r-SS(α) probability measures on $\mathbb{E}$, because we found this notationally convenient. Even though we have assumed throughout that 0 < α < 2, the above and the following results and their proofs are also valid in the case when μ is symmetric Gaussian (the case r-SS(2)).

PROPOSITION 1. Let 0 < q < α, and let X be an $\mathbb{E}$-valued symmetric r-SS(α) random variable; then we have

$$\left(\frac{\alpha-q}{\alpha}\right)^{1/q}\|X\|_q \le \Lambda_\alpha(\|X\|) \le \left(\frac{2^{1+2\alpha/q}}{r}\right)^{1/\alpha}\|X\|_q.$$

COROLLARY 1. Let X be an $\mathbb{E}$-valued symmetric r-SS(α) random variable and let 0 < p < q < α; then we have

$$\|X\|_q \le C(r,\alpha,p,q)\,\|X\|_p.$$

LEMMA 1. Let 0 < q < α and ε > 0, and set C = C(r, α, q/2, q). Then, for any $\mathbb{E}$-valued symmetric r-SS(α) random variable X satisfying $\|X\|_q > C\varepsilon$, we have

$$P\{\|X\| > \varepsilon\} \ge \left(1 - \left(\frac{C\varepsilon}{\|X\|_q}\right)^{q/2}\right)^2 C^{-q};$$

in particular, if $\|X\|_q \ge 2C\varepsilon$, then

$$P\{\|X\| > \varepsilon\} \ge \left(1 - 2^{-q/2}\right)^2 C^{-q}.$$
Proof of Proposition 1: The nontrivial part here is the right inequality. A proof of this in the stable case was given by Linde (see Linde (1986), p. 137, and Linde (1982)). The proof given in Linde (1982) and Linde (1986) is based on functional analytic methods and uses certain results of De-Acosta (1977), in particular the fact that the counterpart of $\Lambda_\alpha(\|X\|)$ (in the stable case) is finite. Later, another proof of this inequality in the stable case was given by Gine, Marcus, and Zinn (1985). This proof is probabilistic and based on an idea of Pisier (see Gine, Marcus, and Zinn (1985)). We adapt this proof to the present semi-stable case. For every n = 1, 2, ..., let $k_n = [1/r^n]$, the integral part of $1/r^n$. Then $1/r^n = k_n + c_n$, $0 \le c_n < 1$. Fix n. Let $X_1, \dots, X_{k_n}$ be iid random variables with $\mathcal{L}(X_j) = \mu^{r^n}$, and let $Y_n$ be independent of the $X_j$'s with $\mathcal{L}(Y_n) = \mu^{1 - r^n k_n}$, where $\mu^s$, s > 0, denotes the sth root of $\mu = \mathcal{L}(X)$. Clearly, we have

$$\begin{aligned} P\{\max(\|Y_n\|, \|X_j\|, j = 1,\dots,k_n) > t\} &= 1 - P\{\max(\|Y_n\|, \|X_j\|, j = 1,\dots,k_n) \le t\} \\ &= 1 - \Big(\prod_{j=1}^{k_n} P\{\|X_j\| \le t\}\Big)\,P\{\|Y_n\| \le t\} \\ &\ge 1 - \left(1 - P\{\|X_1\| > t\}\right)^{k_n} \\ &= 1 - \left(1 - \mu^{r^n}\{\|x\| > t\}\right)^{k_n}. \end{aligned} \qquad (1)$$

Denote the left side of (1) by L(t), and let $v(t) = \mu^{r^n}\{\|x\| > t\} = \mu\{\|x\| > r^{-n/\alpha}t\}$. Then (1) becomes $L(t) \ge 1 - (1 - v(t))^{k_n}$; equivalently, $v(t) \le 1 - (1 - L(t))^{1/k_n}$. Using the fact that $1 - (1-x)^{1/m} \le \frac{1}{m}\cdot\frac{x}{1-x}$ for $0 \le x < 1$ and all m = 1, 2, ..., one finally gets

$$v(t) \le \frac{1}{k_n}\cdot\frac{L(t)}{1 - L(t)}. \qquad (2)$$
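The elementary inequality used to pass from (1) to (2) follows from $1 - y^{1/m} = \int_y^1 \frac{1}{m}s^{1/m-1}\,ds \le \frac{1-y}{m\,y}$ with $y = 1 - x$. The check below is a numerical sanity check on a grid, not a proof. Applying the inequality with x = L(t) and m = $k_n$ gives (2).

```python
def lhs(x, m):
    # 1 - (1 - x)^(1/m)
    return 1 - (1 - x) ** (1 / m)


def rhs(x, m):
    # (1/m) * x / (1 - x)
    return x / (m * (1 - x))


grid = [i / 1000 for i in range(1000)]          # 0 <= x < 1
worst = max(lhs(x, m) - rhs(x, m) for m in range(1, 51) for x in grid)
print(worst)
```

A non-positive value of `worst`, up to rounding, means the inequality held at every grid point.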
If t > 0 is such that $P\{\|X\| > t\} < 1/2$, then (2) and Lévy's Inequality (Araujo and Gine (1980), p. 57) yield

$$P\{\|X\| > r^{-n/\alpha}t\} \le \frac{1}{k_n}\cdot\frac{2P\{\|X\| > t\}}{1 - 2P\{\|X\| > t\}}. \qquad (3)$$

Set $t_q = 4^{1/q}\|X\|_q$. By Chebyshev's Inequality, we have $P\{\|X\| > t_q\} \le E\|X\|^q/t_q^q = 1/4$. Thus, by (3), we have

$$P\{\|X\| > r^{-n/\alpha}t_q\} \le \frac{1}{k_n} \qquad (4)$$

for every n = 1, 2, .... Set $s_n = t_q/r^{n/\alpha}$; then $(s_n/t_q)^\alpha = 1/r^n = k_n + c_n$. Hence, using (4), we get $s_n^\alpha P\{\|X\| > s_n\} = t_q^\alpha(k_n + c_n)P\{\|X\| > s_n\} \le 2t_q^\alpha$. Therefore, for every n = 1, 2, ..., $s_n^\alpha P\{\|X\| > s_n\} \le 2t_q^\alpha$. Now let $s_n < s < s_{n+1}$; then

$$s^\alpha P\{\|X\| > s\} \le s_{n+1}^\alpha P\{\|X\| > s_n\} = \frac{1}{r}\,s_n^\alpha P\{\|X\| > s_n\} \le \frac{2}{r}\,t_q^\alpha.$$

Thus, we have

$$\sup_{s > s_1} s^\alpha P\{\|X\| > s\} \le \frac{2}{r}\,t_q^\alpha. \qquad (5)$$

But clearly $s^\alpha P\{\|X\| > s\} \le s_1^\alpha = t_q^\alpha/r$ if $0 < s \le s_1$. Together with (5), this shows that

$$\Lambda_\alpha(\|X\|) \le \left(\frac{2}{r}\right)^{1/\alpha}t_q = \left(\frac{2^{1+2\alpha/q}}{r}\right)^{1/\alpha}\|X\|_q.$$
Another proof of this part of the inequality is also possible. It uses the analog of the above inequality for the stable case (Linde, 1986), together with a comparison result, due to Rosinski (1987), for the tail probabilities of symmetric stable and semi-stable $\mathbb{E}$-valued random variables. The proof of this comparison result in turn depends on the theory of single stochastic integrals (Rosinski, 1987). The above proof is direct and does not depend on any of these facts. The left inequality is standard; we include a simple proof for completeness. For simplicity of notation, set ξ = ‖X‖. From the above, we have $\Lambda = \Lambda_\alpha(\xi) < \infty$, and, by the definition of Λ, $P\{\xi > t\} \le \min\{\Lambda^\alpha/t^\alpha, 1\}$. Using this we get

$$E(\xi^q) = q\int_0^\infty u^{q-1}P(\xi > u)\,du \le q\int_0^\infty u^{q-1}\min\left\{\frac{\Lambda^\alpha}{u^\alpha}, 1\right\}du = q\int_0^\Lambda u^{q-1}\,du + q\int_\Lambda^\infty u^{q-1}\left(\frac{\Lambda}{u}\right)^\alpha du = \Lambda^q + \frac{q}{\alpha-q}\Lambda^q = \frac{\alpha}{\alpha-q}\Lambda^q.$$

This yields

$$\left(\frac{\alpha-q}{\alpha}\right)^{1/q}\|X\|_q \le \Lambda_\alpha(\|X\|).$$
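This completes the proof of the Proposition. The proof of the Corollary is immediate from the Proposition.

Proof of Lemma 1: Let $\xi = \|X\|^{q/2}$ and $\lambda = \varepsilon^{q/2}/E\xi$. By Corollary 1 with p = q/2, $\|X\|_{q/2} = (E\xi)^{2/q} \ge C^{-1}\|X\|_q > C^{-1}C\varepsilon = \varepsilon$, so 0 < λ < 1. Therefore, by the well-known Paley-Zygmund Inequality,

$$P\{\|X\| > \varepsilon\} = P\{\xi > \lambda E\xi\} \ge (1-\lambda)^2\,\frac{(E\xi)^2}{E\xi^2} \ge \left(1 - \left(\frac{C\varepsilon}{\|X\|_q}\right)^{q/2}\right)^2 C^{-q}.$$

The last step holds because $(E\xi)^2/E\xi^2 = (\|X\|_{q/2}/\|X\|_q)^q \ge C^{-q}$ and $\lambda = (\varepsilon/\|X\|_{q/2})^{q/2} \le (C\varepsilon/\|X\|_q)^{q/2}$. This proves the first inequality; the second is now immediate from the first.

Proof of Theorem 1: As in the definition of K, set C = C(r, α, q/2, q), and let Y = 2CX, where X is an $\mathbb{E}$-valued random variable with $\mathcal{L}(X) = \mu$. Let n and m be any positive integers satisfying $0 < r^n m \le 1$. Let $Y_1, \dots, Y_m$ be iid with $\mathcal{L}(Y_1) = \nu^{r^n} = r^{n/\alpha}\cdot\nu$, where, as before, $\nu^s$, s > 0, is the sth root of $\nu = \mathcal{L}(Y)$. Let $Z_m$ be independent of the $Y_j$'s with $\mathcal{L}(Z_m) = \nu^{1 - r^n m}$. Then, using $(\nu^{r^n})^{*m} * \nu^{1-r^n m} = \nu^{r^n m} * \nu^{1-r^n m} = \nu$, we have $\nu = \mathcal{L}(Y) = \mathcal{L}\left(\sum_{i=1}^m Y_i + Z_m\right)$. (Here * denotes the usual convolution.) Hence, it follows from Kanter's Inequality (Araujo and Gine (1980), p. 136) that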
$$P\{\|Y\| \le r^{n/\alpha}\} \le \frac{3}{2}\left[\sum_{i=1}^m P\{\|Y_i\| > r^{n/\alpha}\} + P\{\|Z_m\| > r^{n/\alpha}\}\right]^{-1/2} \le \frac{3}{2}\left[m\,P\{\|Y_1\| > r^{n/\alpha}\}\right]^{-1/2} = \frac{3}{2}\left[m\,P\{\|Y\| > 1\}\right]^{-1/2}; \qquad (6)$$

in the last step we have used the fact that $\nu^{r^n} = r^{n/\alpha}\cdot\nu$. Taking ε = 1 in the second inequality of Lemma 1, we have

$$P\{\|Y\| > 1\}^{-1/2} \le \frac{C^{q/2}}{1 - 2^{-q/2}},$$

provided $\|Y\|_q \ge 2C$. Therefore, (6) yields

$$P\{\|Y\| \le r^{n/\alpha}\} \le D\,m^{-1/2}, \qquad (7)$$

where

$$D = \frac{3}{2}\cdot\frac{C^{q/2}}{1 - 2^{-q/2}} = \frac{3}{2}\cdot\frac{(2C)^{q/2}}{2^{q/2} - 1}.$$

Let $k_n$ be the integral part of $1/r^n$; then $1/r^n = k_n + c_n$, $0 \le c_n < 1$. Taking $m = k_n$ in (7), we get

$$P\{\|Y\| \le r^{n/\alpha}\} \le D\,k_n^{-1/2} = D\left(\frac{r^n}{1 - r^n c_n}\right)^{1/2}.$$

Thus, since $1 - r^n c_n \ge 1 - r$ and n was arbitrary, we have $P\{\|Y\| \le r^{n/\alpha}\} \le D\,(r^{n/\alpha})^{\alpha/2}(1-r)^{-1/2}$ for all n = 1, 2, .... Now let t be any positive real number satisfying $r^{(n+1)/\alpha} < t \le r^{n/\alpha}$. The preceding inequality then yields

$$P\{\|Y\| \le t\} \le P\{\|Y\| \le r^{n/\alpha}\} \le \frac{D\,(r^{(n+1)/\alpha})^{\alpha/2}}{\sqrt{r(1-r)}} < \frac{D\,t^{\alpha/2}}{\sqrt{r(1-r)}},$$

if $\|Y\|_q \ge 2C$. We have thus proven that if $0 < t \le r^{1/\alpha}$ and $\|Y\|_q \ge 2C$, then $P\{\|Y\| \le t\} \le \left(D/\sqrt{r(1-r)}\right)t^{\alpha/2}$. If $t > r^{1/\alpha}$, then

$$\frac{D\,t^{\alpha/2}}{\sqrt{r(1-r)}} > \frac{D\,(r^{1/\alpha})^{\alpha/2}}{\sqrt{r(1-r)}} = \frac{D}{\sqrt{1-r}} > 1;$$

thus, the preceding probability inequality is valid for all t > 0. Recall that Y = 2CX and observe that $K(r,\alpha,q) = (2C)^{\alpha/2}D/\sqrt{r(1-r)}$. Therefore $P\{\|X\| \le t\} \le K\,t^{\alpha/2}$, provided $\|X\|_q \ge 1$. This completes the proof.
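Lemma 1 rests on the Paley-Zygmund inequality $P\{\xi > \lambda E\xi\} \ge (1-\lambda)^2 (E\xi)^2/E\xi^2$, valid for $\xi \ge 0$ and $0 \le \lambda < 1$, applied with $\xi = \|X\|^{q/2}$. The sketch below checks the inequality exactly, in rational arithmetic, for a hypothetical discrete distribution. It illustrates the inequality only and says nothing specific about semi-stable laws. The helper name is illustrative.

```python
from fractions import Fraction


def paley_zygmund_gap(values, probs, lam):
    """P(xi > lam * E xi) - (1 - lam)^2 (E xi)^2 / E xi^2 for a discrete xi >= 0."""
    m1 = sum(p * v for v, p in zip(values, probs))
    m2 = sum(p * v * v for v, p in zip(values, probs))
    tail = sum(p for v, p in zip(values, probs) if v > lam * m1)
    return tail - (1 - lam) ** 2 * m1 ** 2 / m2


# Hypothetical law: xi takes the values 0, 1, 2, 10 with probabilities .5, .2, .2, .1
values = [Fraction(v) for v in (0, 1, 2, 10)]
probs = [Fraction(p, 10) for p in (5, 2, 2, 1)]
lams = [Fraction(k, 20) for k in range(20)]      # 0 <= lam < 1
gaps = [paley_zygmund_gap(values, probs, lam) for lam in lams]
print(min(gaps))
```

In the lemma, the ratio $(E\xi)^2/E\xi^2$ equals $(\|X\|_{q/2}/\|X\|_q)^q$. This is where Corollary 1 with p = q/2 supplies the lower bound $C^{-q}$.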
CONCLUDING REMARK

Let $\mathbb{E} = \mathbb{R}$, the real line. Then it is easy to show that, for any symmetric r-SS(α) probability measure μ on $\mathbb{R}$ satisfying $\int_{\mathbb{R}}|x|^q\,d\mu = 1$ with 0 < q < α, $\mu\{|x| \le t\} \le \text{const}\cdot t$, where the constant depends only on r, α and q (and not on μ). To see this, we proceed as follows. Let σ be the (finite) symmetric spectral measure (on $\Delta = \{r^{1/\alpha} < |s| \le 1\}$) of the given measure μ. Then φ(y), the characteristic function of μ, is given by
where $k_\alpha(t) = |t|^{-\alpha}\sum_{n=-\infty}^{\infty} r^{-n}\left(1 - \cos r^{n/\alpha}t\right)$, t ≠ 0, $k_\alpha(0) = 0$ (Rajput and Rama-Murthy, 1987). Now let f denote the probability density function of μ. Using the fact that $k_\alpha(t) \ge d_\alpha(r,\alpha)$ for t ≠ 0, where the constant $d_\alpha$ depends only on r and α (Rajput and Rama-Murthy, 1987), we have, for t > 0,
Now recall the fact that $\left(\int_\Delta |s|^\alpha\,\sigma(ds)\right)^{1/\alpha} \ge d_1(r,\alpha,q)\cdot\left(\int_{\mathbb{R}}|x|^q\,d\mu\right)^{1/q}$, where $d_1$ depends only on r, α and q (Linde, 1986). We then get $\mu\{|x| \le t\} \le \text{const}(r,\alpha,q)\,t$, provided $\int_{\mathbb{R}}|x|^q\,d\mu = 1$, where

$$\text{const}(r,\alpha,q) = \frac{2\,\Gamma(1/\alpha)}{\pi\,\alpha\,d_\alpha^{1/\alpha}\,d_1}.$$

This fact notwithstanding, the question whether $t^{\alpha/2}$ can be replaced by t in the statement of Theorem 1 remains open. The bet here seems to be that the answer is affirmative!
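The function $k_\alpha$ above is even and satisfies $k_\alpha(r^{1/\alpha}t) = k_\alpha(t)$ (shift the summation index by one). A lower bound on the single interval $[r^{1/\alpha}, 1]$ therefore controls all t ≠ 0. The sketch below truncates the series at |n| ≤ N for the illustrative values r = 1/2 and α = 1.5; these values and the truncation level are assumptions, not taken from the text. It evaluates the truncated series on that interval and checks the scaling identity numerically. It does not compute the constant $d_\alpha$ of Rajput and Rama-Murthy (1987).

```python
import math


def k_alpha(t, r, alpha, N=400):
    """Truncated k_alpha(t) = |t|^(-alpha) * sum_{|n| <= N} r^(-n) (1 - cos(r^(n/alpha) t)),
    using 1 - cos x = 2 sin^2(x/2) to avoid cancellation for small x."""
    total = 0.0
    for n in range(-N, N + 1):
        x = r ** (n / alpha) * t
        total += r ** (-n) * 2.0 * math.sin(x / 2.0) ** 2
    return total / abs(t) ** alpha


r, alpha = 0.5, 1.5
period = r ** (1 / alpha)                                  # r^(1/alpha)
grid = [period + (1 - period) * i / 200 for i in range(201)]
values = [k_alpha(t, r, alpha) for t in grid]
shift_err = max(abs(k_alpha(period * t, r, alpha) - k_alpha(t, r, alpha)) / k_alpha(t, r, alpha)
                for t in grid)
print(min(values), shift_err)
```

For α < 2 the terms with large positive n behave like $r^{n(2/\alpha - 1)}$, and those with large negative n are bounded by $2r^{|n|}$. The truncation error at N = 400 is therefore negligible for these parameters.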
REFERENCES

1 A. Araujo and E. Gine (1980). The CLT for Real and Banach Valued Random Variables, J. Wiley, New York.
2 D. M. Chung, B. S. Rajput, and A. Tortrat (1982). Semi-stable laws on topological vector spaces, Z. Wahrsch. verw. Geb., 60: 209-218.
3 A. De-Acosta (1977). Asymptotic behavior of stable measures, Ann. of Probab., 5: 494-499.
4 E. Gine, M. B. Marcus, and J. Zinn (1985). A version of Chevet's Theorem for stable processes, J. Functional Anal., 63: 47-73.
5 M. Lewandowski, M. Ryzner, and T. Zak (1992). Stable measure of a small ball, Proc. Amer. Math. Soc., 489-494.
6 W. Linde (1982). Operators generating stable measures on Banach spaces, Z. Wahrsch. verw. Geb., 60: 171-184.
7 W. Linde (1986). Probability in Banach Spaces, J. Wiley, New York.
8 B. S. Rajput and K. Rama-Murthy (1987). Spectral representations of semi-stable processes, and semi-stable laws on Banach spaces, J. Multi. Anal., 21: 141-159.
9 J. Rosinski (1987). Bilinear random integrals, Dissertationes Mathematicae, CCLIX.
10 J. Rosinski (1991). On a class of infinitely divisible processes as mixtures of Gaussian processes, Stable Processes and Related Topics (S. Cambanis, et al.), Birkhauser, Boston, 27-41.
11 S. Szarek (1991). Condition numbers of random matrices, J. of Complexity, 7: 131-149.
Nonsquare Constants of Orlicz Spaces
ZHONGDAO REN
Department of Mathematics, University of California, Riverside, CA 92521

Dedicated to Professor M. M. Rao on the occasion of his 65th birthday.

Abstract. Estimates of the nonsquare constants, in the sense of James, of Orlicz spaces are given. Clarkson's inequalities for $L^p$ spaces are generalized to Orlicz spaces by using M. M. Rao's interpolation theorem. The exact values of the nonsquare constants of a class of reflexive Orlicz spaces are also obtained by using a new quantitative index of N-functions and the inequalities of Clarkson type for Orlicz spaces. 1993 Mathematical Subject Classification: 46B30.
1 Introduction

Let X be a Banach space and let $S(X) = \{x \in X : \|x\| = 1\}$ be the unit sphere of X. In 1964, James [9] called X uniformly nonsquare if there exists a δ > 0 such that, for any x, y ∈ S(X), either $\|\tfrac{1}{2}(x+y)\| \le 1 - \delta$ or $\|\tfrac{1}{2}(x-y)\| \le 1 - \delta$. In 1990, Gao and Lau [4] introduced the following.

Definition 1.1 The parameter J(X) of a Banach space X, which will be called the nonsquare constant in the sense of James in this paper, is defined by

$$J(X) = \sup\{\min(\|x+y\|, \|x-y\|) : x, y \in S(X)\}. \qquad (1)$$

Gao and Lau [4] proved that X is uniformly nonsquare in the sense of James if and only if J(X) < 2.
Remark 1.2 (See [4].) Schäffer [16] called X uniformly nonsquare if there exists an a > 1 such that $\max(\|x+y\|, \|x-y\|) \ge a$ for any x, y ∈ S(X). The nonsquare constant g(X) of a Banach space X, in the sense of Schäffer, is defined by

$$g(X) = \inf\{\max(\|x+y\|, \|x-y\|) : x, y \in S(X)\}.$$

If dim X ≥ 2, then $1 \le g(X) \le \sqrt{2} \le J(X) \le 2$ and g(X)J(X) = 2. Therefore, 1 < g(X) if and only if J(X) < 2. In other words, X is uniformly nonsquare in the sense of Schäffer if and only if X is uniformly nonsquare in the sense of James (see also Gao and Lau [5]). In this paper, we only deal with J(X) when X is an Orlicz space. Let

$$\Phi(u) = \int_0^{|u|}\varphi(t)\,dt \qquad\text{and}\qquad \Psi(v) = \int_0^{|v|}\psi(s)\,ds$$

be a pair of complementary N-functions (in particular, φ(t) ↗ ∞ as t ↗ ∞). The Orlicz function space $L_\Phi(\Omega)$ on Ω = [0,1] or [0,∞) is defined to be the set of x that are Lebesgue measurable on Ω and satisfy $\rho_\Phi(\lambda x) = \int_\Omega \Phi(\lambda|x(t)|)\,dt < \infty$ for some λ > 0. The gauge norm and the Orlicz norm are defined by

$$\|x\|_{(\Phi)} = \inf\{c > 0 : \rho_\Phi(x/c) \le 1\}$$

and

$$\|x\|_\Phi = \sup\left\{\int_\Omega |x(t)y(t)|\,dt : \rho_\Psi(y) \le 1\right\}.$$
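The norms are equivalent: $\|x\|_{(\Phi)} \le \|x\|_\Phi \le 2\|x\|_{(\Phi)}$.

A quantitative index of Φ(u) is provided by the following six constants:

$$\alpha_\Phi = \liminf_{u\to\infty}\frac{\Phi^{-1}(u)}{\Phi^{-1}(2u)}, \qquad \beta_\Phi = \limsup_{u\to\infty}\frac{\Phi^{-1}(u)}{\Phi^{-1}(2u)}, \qquad (2)$$

$$\alpha^0_\Phi = \liminf_{u\to 0}\frac{\Phi^{-1}(u)}{\Phi^{-1}(2u)}, \qquad \beta^0_\Phi = \limsup_{u\to 0}\frac{\Phi^{-1}(u)}{\Phi^{-1}(2u)}, \qquad (3)$$

and

$$\bar\alpha_\Phi = \inf\left\{\frac{\Phi^{-1}(u)}{\Phi^{-1}(2u)} : 0 < u < \infty\right\}, \qquad \bar\beta_\Phi = \sup\left\{\frac{\Phi^{-1}(u)}{\Phi^{-1}(2u)} : 0 < u < \infty\right\}. \qquad (4)$$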
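The constructions in the proof of Theorem 2.1 rely on the identity $\|c\,\chi_A\|_{(\Phi)} = c/\Phi^{-1}(1/\mu(A))$ for the gauge norm of a multiple of an indicator function. The sketch below computes gauge norms of step functions by bisection. For the illustrative N-function $\Phi(u) = u^3$ and $u_0 = 4$ (both assumptions, not from the text), it reproduces the value $\Phi^{-1}(2u_0)/\Phi^{-1}(u_0) = 2^{1/3}$ of $\|x \pm y\|_{(\Phi)}$ for functions x, y built as in the proof of (15). The helper names are hypothetical.

```python
def phi_inverse(Phi, u):
    """Solve Phi(x) = u for x >= 0 by bisection (Phi continuous, increasing, Phi(0) = 0)."""
    lo, hi = 0.0, 1.0
    while Phi(hi) < u:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if Phi(mid) < u:
            lo = mid
        else:
            hi = mid
    return hi


def gauge_norm(Phi, values, measures):
    """inf{c > 0 : sum_i Phi(|v_i| / c) * m_i <= 1} for the step function taking
    value v_i on disjoint sets of measure m_i (not all v_i zero)."""
    def rho(c):
        return sum(Phi(abs(v) / c) * m for v, m in zip(values, measures))
    lo, hi = 0.0, 1.0
    while rho(hi) > 1.0:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if rho(mid) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi


def Phi(u):
    return u ** 3                     # hypothetical N-function for illustration


u0 = 4.0
a = phi_inverse(Phi, 2 * u0)          # Phi^{-1}(2 u0)
m = 1 / (2 * u0)                      # mu(C1) = mu(C2) = 1 / (2 u0)
norm_x = gauge_norm(Phi, [a], [m])
norm_sum = gauge_norm(Phi, [a, a], [m, m])       # x + y: disjoint supports
norm_diff = gauge_norm(Phi, [a, -a], [m, m])     # x - y
predicted = phi_inverse(Phi, 2 * u0) / phi_inverse(Phi, u0)
print(norm_x, norm_sum, norm_diff, predicted)
```

Bisection is used because $\rho_\Phi(x/c)$ is non-increasing in c, so the set of admissible c in the infimum is an interval.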
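The following result will play the leading role in this paper.

Theorem 1.3 (i) $\Phi \notin \Delta_2(\infty) \Leftrightarrow \beta_\Phi = 1$, $\Phi \notin \nabla_2(\infty) \Leftrightarrow \alpha_\Phi = \tfrac12$; (ii) $\Phi \notin \Delta_2(0) \Leftrightarrow \beta^0_\Phi = 1$, $\Phi \notin \nabla_2(0) \Leftrightarrow \alpha^0_\Phi = \tfrac12$; (iii) $\Phi \notin \Delta_2 \Leftrightarrow \bar\beta_\Phi = 1$, $\Phi \notin \nabla_2 \Leftrightarrow \bar\alpha_\Phi = \tfrac12$.

The proof of Theorem 1.3 can be found in [14, p. 23] and [15]. Another quantitative index of Φ is well known and is provided by the following six constants:

$$A_\Phi = \liminf_{t\to\infty}\frac{t\varphi(t)}{\Phi(t)}, \qquad B_\Phi = \limsup_{t\to\infty}\frac{t\varphi(t)}{\Phi(t)}, \qquad (5)$$

$$A^0_\Phi = \liminf_{t\to 0}\frac{t\varphi(t)}{\Phi(t)}, \qquad B^0_\Phi = \limsup_{t\to 0}\frac{t\varphi(t)}{\Phi(t)}, \qquad (6)$$

and

$$\bar A_\Phi = \inf\left\{\frac{t\varphi(t)}{\Phi(t)} : 0 < t < \infty\right\}, \qquad \bar B_\Phi = \sup\left\{\frac{t\varphi(t)}{\Phi(t)} : 0 < t < \infty\right\}. \qquad (7)$$

It is also known that $\Phi \notin \Delta_2(\infty) \Leftrightarrow B_\Phi = \infty$, $\Phi \notin \nabla_2(\infty) \Leftrightarrow A_\Phi = 1$, $\Phi \notin \Delta_2(0) \Leftrightarrow B^0_\Phi = \infty$, $\Phi \notin \nabla_2(0) \Leftrightarrow A^0_\Phi = 1$, $\Phi \notin \Delta_2 \Leftrightarrow \bar B_\Phi = \infty$ and $\Phi \notin \nabla_2 \Leftrightarrow \bar A_\Phi = 1$. Furthermore, we have the following.

Proposition 1.4 Let Φ and Ψ be a pair of complementary N-functions. Then

$$\frac{1}{A_\Phi} + \frac{1}{B_\Psi} = \frac{1}{A^0_\Phi} + \frac{1}{B^0_\Psi} = \frac{1}{\bar A_\Phi} + \frac{1}{\bar B_\Psi} = 1. \qquad (8)$$

Proposition 1.5 Let Φ(u) be an N-function. Then (9), (10) and (11).

The proofs of Propositions 1.4 and 1.5 can be found in [14, p. 27], [11] and [15].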
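To see the indices at work, take the N-function $\Phi(u) = e^{u} - u - 1$, an illustrative choice that is not from the text. It fails the $\Delta_2$-condition for large u and behaves like $u^2/2$ near 0. Theorem 1.3 and the facts stated after (7) then predict the following. As u, t → ∞, $\Phi^{-1}(u)/\Phi^{-1}(2u)$ creeps up toward 1 and $t\varphi(t)/\Phi(t)$ is unbounded. Near 0, these quantities approach $2^{-1/2}$ and 2. The sketch evaluates both numerically; the helper names are hypothetical.

```python
import math


def Phi(u):
    # Phi(u) = e^u - u - 1, computed with expm1 to avoid cancellation near 0
    return math.expm1(u) - u


def phi(u):
    # right derivative of Phi: e^u - 1
    return math.expm1(u)


def Phi_inv(v):
    """Solve Phi(x) = v for x >= 0 by bisection."""
    lo, hi = 0.0, 1.0
    while Phi(hi) < v:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if Phi(mid) < v:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def ratio(u):
    """Phi^{-1}(u) / Phi^{-1}(2u), the quantity in (2)-(4)."""
    return Phi_inv(u) / Phi_inv(2 * u)


def t_phi_over_Phi(t):
    """t * phi(t) / Phi(t), the quantity in (5)-(7)."""
    return t * phi(t) / Phi(t)


for u in (1e-10, 1e-3, 1.0, 1e3, 1e12):
    print(f"u={u:g}  ratio={ratio(u):.6f}  t*phi/Phi={t_phi_over_Phi(Phi_inv(u)):.4f}")
```

Near infinity, $\Phi^{-1}(u) \approx \ln u$, so the ratio behaves like $\ln u / (\ln u + \ln 2)$. It tends to 1 only logarithmically, which is why very large u is needed to see $\beta_\Phi = 1$ numerically.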
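2 Lower Bounds of $J(L_{(\Phi)}(\Omega))$

We first consider the Orlicz space $L_{(\Phi)}(\Omega) = (L_\Phi(\Omega), \|\cdot\|_{(\Phi)})$, where $\|\cdot\|_{(\Phi)}$ is the gauge norm defined by an N-function Φ on Ω = [0,1] or [0,∞) with the usual Lebesgue measure μ. We also consider the Orlicz sequence space $\ell_{(\Phi)}$.

Theorem 2.1 Let Φ be an N-function. Then the nonsquare constants, in the sense of James, of $L_{(\Phi)}[0,1]$, $L_{(\Phi)}[0,\infty)$ and $\ell_{(\Phi)}$ satisfy, respectively,

$$\max\left(\frac{1}{\alpha_\Phi}, 2\beta_\Phi\right) \le J(L_{(\Phi)}[0,1]), \qquad (12)$$

$$\max\left(\frac{1}{\bar\alpha_\Phi}, 2\bar\beta_\Phi\right) \le J(L_{(\Phi)}[0,\infty)) \qquad (13)$$

and

$$\max\left(\frac{1}{\alpha^0_\Phi}, 2\beta^0_\Phi\right) \le J(\ell_{(\Phi)}). \qquad (14)$$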
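Proof To prove (12), we first show

$$\frac{1}{\alpha_\Phi} \le J(L_{(\Phi)}[0,1]). \qquad (15)$$

By (2), there exist $1 < u_i \nearrow \infty$ such that $\lim_{i\to\infty}\Phi^{-1}(u_i)/\Phi^{-1}(2u_i) = \alpha_\Phi$. For any given ε > 0, there is an $i_0 \ge 1$ such that $\Phi^{-1}(u_i)/\Phi^{-1}(2u_i) < \alpha_\Phi + \varepsilon$ for $i \ge i_0$. For simplicity, we set $u_0 = u_{i_0}$. Choose $C_1$ and $C_2$ in [0,1] such that $C_1 \cap C_2 = \emptyset$ and $\mu(C_1) = \mu(C_2) = 1/(2u_0)$. Put

$$x(t) = \Phi^{-1}(2u_0)\,\chi_{C_1}(t) \qquad\text{and}\qquad y(t) = \Phi^{-1}(2u_0)\,\chi_{C_2}(t),$$

where $\chi_{C_1}$ is the characteristic function of $C_1$. Note that $\rho_\Phi(x) = \rho_\Phi(y) = \Phi(\Phi^{-1}(2u_0))/(2u_0) = 1$. We have $\|x\|_{(\Phi)} = \|y\|_{(\Phi)} = 1$ and

$$\|x - y\|_{(\Phi)} = \|x + y\|_{(\Phi)} = \frac{\Phi^{-1}(2u_0)}{\Phi^{-1}(u_0)} > \frac{1}{\alpha_\Phi + \varepsilon}.$$

Since ε is arbitrary, we obtain (15) by (1). Next we prove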
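$$2\beta_\Phi \le J(L_{(\Phi)}[0,1]). \qquad (16)$$

For any given ε > 0, by (2), there exists a $v_0 > 1$ such that

$$\frac{\Phi^{-1}(v_0)}{\Phi^{-1}(2v_0)} > \beta_\Phi - \frac{\varepsilon}{2}.$$

Choose $E_1$ and $E_2$ in [0,1] such that $E_1 \cap E_2 = \emptyset$ and $\mu(E_1) = \mu(E_2) = 1/(2v_0)$. Put

$$x(t) = \Phi^{-1}(v_0)\,[\chi_{E_1}(t) + \chi_{E_2}(t)] \qquad\text{and}\qquad y(t) = \Phi^{-1}(v_0)\,[\chi_{E_1}(t) - \chi_{E_2}(t)].$$

Then $\|x\|_{(\Phi)} = \|y\|_{(\Phi)} = 1$ and $\|x + y\|_{(\Phi)} = \|x - y\|_{(\Phi)} = 2\Phi^{-1}(v_0)/\Phi^{-1}(2v_0) > 2\beta_\Phi - \varepsilon$. Since ε is arbitrary, we get (16) by (1). Hence, (12) follows from (15) and (16). The proof of (13) is similar to that of (12). Finally, we prove (14). We first show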
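$$\frac{1}{\alpha^0_\Phi} \le J(\ell_{(\Phi)}). \qquad (17)$$

For any given 1 > ε > 0, by (3), there exists $0 < u_0 < \varepsilon/2$ such that $\Phi^{-1}(u_0)/\Phi^{-1}(2u_0) < \alpha^0_\Phi + \varepsilon$, or, equivalently, $\Phi[(\alpha^0_\Phi + \varepsilon)\Phi^{-1}(2u_0)] > u_0$. Let $k_0 = [1/(2u_0)]$ be the integer part of $1/(2u_0)$. Then $k_0 \le 1/(2u_0) < k_0 + 1$. Choose c ≥ 0 such that $2k_0u_0 + \Phi(c) = 1$. Put

$$x = (\underbrace{\Phi^{-1}(2u_0), \dots, \Phi^{-1}(2u_0)}_{k_0}, \underbrace{0, \dots, 0}_{k_0}, c, 0, 0, \dots)$$

and

$$y = (\underbrace{0, \dots, 0}_{k_0}, \underbrace{\Phi^{-1}(2u_0), \dots, \Phi^{-1}(2u_0)}_{k_0}, 0, c, 0, \dots).$$

Then we have $\rho_\Phi(x) = \rho_\Phi(y) = 1$, $\|x\|_{(\Phi)} = \|y\|_{(\Phi)} = 1$ and

$$\rho_\Phi\left[\frac{(\alpha^0_\Phi + \varepsilon)(x \pm y)}{1 - \varepsilon}\right] \ge \frac{1}{1-\varepsilon}\,\rho_\Phi[(\alpha^0_\Phi + \varepsilon)(x \pm y)] = \frac{1}{1-\varepsilon}\left\{2k_0\,\Phi[(\alpha^0_\Phi + \varepsilon)\Phi^{-1}(2u_0)] + 2\Phi[c(\alpha^0_\Phi + \varepsilon)]\right\} > \frac{2k_0u_0}{1-\varepsilon} > \frac{1 - 2u_0}{1-\varepsilon} > 1.$$

Therefore, $\min(\|x - y\|_{(\Phi)}, \|x + y\|_{(\Phi)}) \ge (1-\varepsilon)/(\alpha^0_\Phi + \varepsilon)$. Since ε is arbitrary, (17) follows by (1). Secondly, we prove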
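$$2\beta^0_\Phi \le J(\ell_{(\Phi)}). \qquad (18)$$

For any given 1 > ε > 0, by (3), there exists $0 < v_0 < \varepsilon/2$ such that $\Phi^{-1}(v_0)/\Phi^{-1}(2v_0) > \beta^0_\Phi - \varepsilon/2$, or, equivalently, $\Phi\left[\frac{2\Phi^{-1}(v_0)}{2\beta^0_\Phi - \varepsilon}\right] > 2v_0$. Let $k_0 = [1/(2v_0)]$. Then $k_0 \le 1/(2v_0) < k_0 + 1$. Choose t ≥ 0 such that $2k_0v_0 + \Phi(t) = 1$. Put

$$x = (\underbrace{\Phi^{-1}(v_0), \dots, \Phi^{-1}(v_0)}_{k_0}, \underbrace{\Phi^{-1}(v_0), \dots, \Phi^{-1}(v_0)}_{k_0}, t, 0, 0, \dots)$$

and

$$y = (\underbrace{\Phi^{-1}(v_0), \dots, \Phi^{-1}(v_0)}_{k_0}, \underbrace{-\Phi^{-1}(v_0), \dots, -\Phi^{-1}(v_0)}_{k_0}, 0, t, 0, \dots).$$

Then $\|x\|_{(\Phi)} = \|y\|_{(\Phi)} = 1$, since $\rho_\Phi(x) = \rho_\Phi(y) = 1$, and

$$\rho_\Phi\left[\frac{x \pm y}{(1-\varepsilon)(2\beta^0_\Phi - \varepsilon)}\right] \ge \frac{1}{1-\varepsilon}\,\rho_\Phi\left[\frac{x \pm y}{2\beta^0_\Phi - \varepsilon}\right] = \frac{1}{1-\varepsilon}\left\{k_0\,\Phi\left[\frac{2\Phi^{-1}(v_0)}{2\beta^0_\Phi - \varepsilon}\right] + 2\Phi\left(\frac{t}{2\beta^0_\Phi - \varepsilon}\right)\right\} > \frac{2k_0v_0}{1-\varepsilon} > \frac{1 - 2v_0}{1-\varepsilon} > 1.$$

Therefore, $\min(\|x - y\|_{(\Phi)}, \|x + y\|_{(\Phi)}) \ge (1-\varepsilon)(2\beta^0_\Phi - \varepsilon)$. Since ε is arbitrary, we obtain (18). Finally, (14) follows from (17) and (18).

Next we deal with three other classical Orlicz spaces, equipped with the Orlicz norm.

Theorem 2.2 Let Φ be an N-function. Then the nonsquare constants, in the sense of James, of $L^\Phi[0,1] = (L_\Phi[0,1], \|\cdot\|_\Phi)$, $L^\Phi[0,\infty)$ and $\ell^\Phi$ satisfy, respectively,

$$\max\left(2\beta_\Psi, \frac{1}{\alpha_\Psi}\right) \le J(L^\Phi[0,1]), \qquad (19)$$

$$\max\left(2\bar\beta_\Psi, \frac{1}{\bar\alpha_\Psi}\right) \le J(L^\Phi[0,\infty)) \qquad (20)$$

and

$$\max\left(2\beta^0_\Psi, \frac{1}{\alpha^0_\Psi}\right) \le J(\ell^\Phi), \qquad (21)$$

where Ψ is the complementary N-function to Φ.

Proof We omit the proof of (19) and turn to the proof of (20). We first show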
(22) By the definition of
(3w in (4), for any given 1 >
f
> 0 there exists 0 <
'IT-I (vo) 'IT-I (2vo) > f3w Choose G l and G 2 in [0,00) such that G l
nG
2
and
Vo
< 00 such that
f
2"'
== 0 and
IL(G l )
==
tt(G 2 )
==
2~o' Put
Nonsquare Constants of Orlicl Spaces
185
Note that
IlxGlll
=
J1(Gl)'IJ~\},(~l/
Therefore, one has Ilxll
Since
E
is arbitrary, we obtain (22). Next we prove 1 -=::; J(£ [0, (0)).
(23)
Q'It
For any given
E
> 0, there is a Uo >
°such that w-I(uo)
_
W- I( 2u O) <
Choose El and E 2 in [0, (0) such that El
x(t)
Uo
= 'IJ~l('UO)
[XEl (t)
+ XE2(t)]
nE
Q'It
+ E.
== 0 and
2
== I1(E2 ) ==
1 -2 . UQ
Put
Uo
y(t)
and
I1(E l )
=
'IJ- 1('Uo) [XEl (t) - XE2(t)].
One has Ilxll
Ilx - yll
W-1(2uO) 1 'IJ~1( ) > --- . Uo Q'It + f
Since f is arbitrary, we obtain (23). Hence, (20) follows from (22) and (23). To prove (21), we first show
(24) For given 1 >
f
> 0, there exist
'Un
~
°such that for all n
W- I (v n ) W-I (2v n ) >
We may assume 2v n
::;
1 for all n ~ 1. Let kn
0
fJ'It -
==
1 -k- - < 2v n n +1 'It-I
(~ll
Since ~ /'
and
00
as v ~ 0, we have
~
f
2'
[2~n]' Then ::;
1 -k . n
1
Reo
186
Put
and Cn
Then bn
~
= 2(kn + 1)'11 -1
[
2(k
1] + 1) - 2k '1l n
-1 (
n
1)
2k
n
'
0 and 1 Cn < 2(k n + 1)W- 1 (-k ) - 2kn w- 1 2 n
as n --+ 00. Choose no ~ 1 such that bncn < k o == k no , Co == Cno and bo == bno ' Put
E
(_1_) == 2w- (_1_) --+ 0 2kn 2kn 1
for all n
~
no. For simplicity, we set Vo == v no '
ko ~
X == (b o, bo, ... , bo, 0, 0, ... , 0, ... ) and ko
ko
~~
Y == (0, 0, ... , 0, bo, bo, ... , bo, 0, 0, ... ). We have Ilxll«I> == Ilyll«I> == bok ow- 1 (t) == 1 and
Ilx + yll«I>
Ilx - yll«I>
bo2ko'1l-
1
C~J
bo { 2(k o +
1)'1I~1
[2(k 1+ 1)] - Co } o
> bo ['11-:~ Vo ) - co] > bo [l~O
~
((J~ -
D
1 '11- ( 2v o) - co]
i) boko'1l~l CJ - boco 2 (f1~ - i) - boco 2
((J~ -
> 2(/3~ - E). Since
E
is arbitrary, we obtain (24). Finally, we prove 1
-0
Gw
For any given 1 >
E
> 0, there exist
~
>
~
'Un
«I> J(€ ). ~ 0 such that for all n ~ 1
(25)
Nonsquare Constants of Orlicz Spaces
°
1) -
== (k n + 1) W-1 ( k + 1
Sn
Since t n ~ and Sn ~ and so, for n 2:: no
187
n
°
as n -+
00,
11,0
==
Uno'
to == t no and
So
==
(
1)
k
'
n
there is an no 2:: 1 such that E
E
2
1+f
2t n s n < - < - - < Let us set
k n W-1
-
tns n
<
~ for all n
2:: no
f
-0--' D:\lJ f
+
and define
Sno
ko
ko
~~
X
== (to,to,···,to,to,to,···,to,O,O,···)
and ko
ko
~,
Y
Then, we have 11:];111>
A
,
== (to, to,' ", to, --to, -to,"', -to, 0, 0,' .. ).
== Ily/l1> == t o2koW- 1 (2k o ) == 1 and
Ilx -
YII1>
Ilx + yll1> 2t okoql-l
(:J
2t o [(k o + 1)q1-1 (k
o
~ 1) - 8 0 ]
1 > 2t o [W- ( 2u o) _ so]
2uo
q1~I~'UO))
> 2t o[
2uo
D:\lJ
+f
-
so]
- W - 1 ( - 1 ) - 2toso > -io2ko D:~ + f 2ko 1-
t
> D:~ + f ' Since E is arbitrary, we have proved (25). Thus, (21) follows from (24) and (25). 0 Some exarnples will be given in Section 4. Remark 2.3 James[9] proved that every uniformly nonsquare Banach space is reflexive. For the above six classical Orlicz spaces, this can be easily proved. For instance, by Theorem 1.3 and Theorem 2.1 "ve have that
Moreover, Chen[l], Hudzik[7] and \\Tang and Chen[17] proved that uniform nonsquareness coincides with reflexivity for these ()rlicz spaces(see also [6]). Some relations between nonsquare constants and other geometric coefficients of Banach space can be found in [5, Theorem 5.4] and [18, Theorern 3.2].
188
3
Reo
A Generalization of Clarkson's Inequalities
Clarkson[2] is the first mathematician to study geometry of Banach space. His results, called Clarkson's inequalities in these later days, deal only with LP spaces(see also Corollary 3.4 in this section). In 1966, Rao[12] first obtained Riesz-Thorin type interpolation theorem between Orlicz spaces equipped with Orlicz norm(see also [14, p. 226]). In 1972, Cleaver[3] generalized Rao's interpolation theorem for fP-product of Orlicz spaces(see also [14, p. 240]). In 1985, the author proved that these theorems are still valid for Orlicz spaces equipped with gauge norm(see [14, p.226, p.256]). In this section, by using Rao's theorem with its generalization, we generalize Clarkson's inequalities for the case of Orlicz spaces. The main result of this section is Theorm 3.2, which will be used in Section 4. Let us start with the following. Lemma 3.1 Let
n
Then
_ _ l-s ( fJ~s = (fJ~) and
1) (1) v'2
v'2
s
~
s
<1
)s> (v'2)S 1 - > -. v ' 2-2 2
__ (_all> )l-S( - 1
all> s
Therefore, the conclusion follows from Theorem 1.3(iii). 0 Theorem 3.2 Let
[llx + Yllt~s) + Ilx - Yllt~,)] 2 ~ Similarly, we have for any x, y E
2-s
21 [llx ll e;:)
+ IIYIIC;:l]---'---
Lll>s (0) s
2-s
[lIx + Y111, + Ilx - Yllt] 2 ~ 21 [llxll~~s + IIYII~~s ] ---,--Proof Let $1
(28)
== (
T1
S oo.We define
(29)
Nonsquare Constants of Orlicz Spaces
where
11 (x, y) 11(~l),Tl
189
[llxll(~) IIYII(~)];:;-,
+ if 1 :s; Tl < max(lIxll(~), lIyll(~)), if Tl == 00.
== {
00
Similarly, we can define X[(Ql), t l ], X[(<j;2), T2] and X[(Q2), t 2] for Ql == (<1>, <1» and <j;2 == Q2 == (<1>0, <1>0)' Now let us choose Tl == 1, t l == 00 and T2 == t 2 == 2, and define a linear operator T : X[(
IIT(x, Y)II(Ql),tl
max(llx + yll(~), Ilx - yll(~)) ~
Ilxll(~)
+ lIyll(~)
Cl/l(x, y)II(~d,Tl and 1
[llx + YII(~o) + Ilx - YII(~o)] t2 1
[llx + yll~ + Ilx - YII~] 2 1
V2 [llxll~ + IIYII~] 2 C2 11 (x, y) 1l(
J2. 1
-
Ts
Let
Ts
1-
8
== - -
Tl
and t s be determined by S
+-
T2
1
1-
ts
tl
8
8
and
- == - - +-.
and
2 t s == -.
t2
Then 2
Ts
== - 2-8
(30)
8
In view of Rao's interpolation theorem and Cleaver's generalization(see also [14, pp.236-239)), we have T E {X[(<j;s), Ts ] ---+ ..X" [(Qs) , ts]} and, by ci-sc~ == 2i, (31) where <j;s == Qs == (<1>s' <1>s) with <1>s being the inverse of (26) for 0 < 8 ~ 1. Therefore, (30) implies that ..y[(<j;s), Ts ] == {(x, y) : x, Y E M(~8)(r2)} equipped with norm
II(x, y)II(~,},r,
2-8
=
[lIx11t,;:) + Ilyllt,;:}]'--
(32)
and that
(33)
Reo
190
It follows from (31), (32) and (33) that (28) holds for any :r, y E l\1(
where
lI(x
== { ["xll~l + Ilyll~l]~,
y)ll_ , •
max( Ilx 11
if 1 :S Tl < if T 1 == 00.
Ilvll <1>),
00
Hence, (29) holds by similar arguments. 0 Recall that the modulus of convexity and the IllOdulus of srnoothness of a Banach space X are b(X, E) defined on [0,2] and O(..Y, T) on [0,(0) respectively by
o(X, f) = inf { 1 and
Q(X,
T) =
sup
~11:r: + yll : :1:, y E S(X), II.T - yll
{~(II:r: + y\1 + 11:1: - yll) -
1 : :r: E S(X),
=
f}
Ilyll = T} .
We say that ..Y is uniforrnly convex if b(..Y, f) > 0 for every 2 2: f > 0 and that ..Y is uniformly smooth if limT--+o [O( ..Y, T) /T] == O. Corollary 3.3 Let be an N-function and let 1>8 be the inverse of (26). Suppose that o < s ~ 1 and that
"Ys
E
{L ( s)[0, 1], L (
00 ) ,
€
Then, X s is uniformly convex and uniformly SIYlooth. rvlore precicely, one has (34) and
(35)
- 1.
Proof We first deal \vith gauge norm. If :1:, y E 5("\8) and
II.T -
YI/(s) ~
E,
one has from
(28)
or, equivalentely,
1-
21 11 :r + yll(
~ 1-
2)1
1 ('2 2 2~
- f~
,
which implies (34). Therefore, b( ..Y s , E) > 0 if 0 < f :S 2, i. e., "\8 is uniformly convex. On the other hand, if Ilxll(s) == 1 and IIVII(
~ (11:1: + YII(,) + Ilx - YII(,l)
::; :S
[~(II: l: + YII~,) + 11:1: _ yll{,))] 1 '2
(
1 + T2-s
)
2;8 ,
Nonsquare Constants of Orlicl Spaces
191
which shows (35). Therefore, lim
T-+O
{!(X~, T) 1 [ ' ~ lirn - (1 T
+ T 2-8) 2;8 - 1] == 0, 2
T
T-+O
i. e., ..:Y"s is uniforrnly srnooth. Using (29), we can sho"v that (34) and (35) are still valid for another three ()rlicz spaces equipped with Orlicz norm. 0 From this result we can deduce the follo\ving. Corollary 3.4 (Clarkson 's inequalities) Suppose that 1 < ]J < 00, 1P + 1q 1 and x, Y E LP(~l), \vhere D is as in Theorem 3.2. Then, one has for 1 < P S 2 1
1
(I
24
[II.T + yll~ + 11:1: - YII~] ~ and for 2
~ p
<
1
Proof If 1 <
==
2(p-a) p(2-a) '
i. e., s(u)
(36)
00 1
[llx + yll~ + II.T - YII~]]I ~ .5
1
[llxll~ + IIYII~]]I
]J ~
we have 0
2, \ve choose 1 <
< .5 < 1 and for -
== iul P. Since
L(1)8)(~l)
==
CL
[llxll~ + IIYII~]
2];
~ 2. Putting
< P
LP(~1), 11· 11(1)8)
. 2 P hm - == - p- 1
2 lim -
== p
(37)
•
(u) == lul a , o(u) == u 2 and
==
2-8 lirn - 2
== q,
'a\.l
lip and
11 .
1
== -
.
2-8
00.
p-1
1
p
q'
Letting (u)
== - - == -
hm - 2
and
(38)
]J'
we obtain (36) by (28). If 2 ~ ]J < 00, we choose 2 ~ p < b < .') == ~i~=~j, again we have 0 < .') S 1 and s('u) == lul P . Since
b/oo .5
q
> 0 -
u
a\.l S
1
b/oo
== lul b and
(39)
we get (37), again by (28). 0 Remark 3.5 In view of (:34), (35), (38) and (39), one has cS ( LP (D),
and
E)
~
q
{ 1 - ~ (2 1 - ~ (2 P -
f.
q) : '
fP)
P,
(!(LP(D), T) S { (1 + TP)~ - 1, (1
+ Tq) 4 -
1,
~~
~
1< p 2 If 2 ~ p < 00
~f 1 < p ~ 2 If 2 ~ p <
00,
which imply that LP(D) is uniformly convex and uniformly smooth. Of course, the above two inequalities can be directly induced from (36) and (37). It should be noted that the Inodulus of convexity of some special Orlicz spaces was discussed by Rao[13, pp.307-308] and Hudzik[8], independently(see also [14, pp.289-303]).
192
Ren
Main Theorems
4
Now we can estimate upper bounds of nonsquare constants of some Orlicz spaces by using Clarkson type inequalities (28) and (29). Theorem 4.1 Let <1> be an N-function and <1>s be the inverse of (26). If 0 < s :::; 1, then nonsquare constants of L(
2 [min(llx
+ YII(
:2
YII(4)s))]; ::;
:2
Ilx + Ylll
Ylll
2~
or min(llx
+ YII(
YII(4)s)) ::; 21-~,
which implies (40). Similarly, we can prove (41) by using (29). 0
Example 4.2 If 1 u
2:
°
and for
°<
(X)
and <1>( u)
== lul 2p + 21u1 P , then· <1>-1 (u) == (JU+T - 1) ~ for
s ::; 1
It is easily seen that
.
O!,
aO
Q
==
a~s and !34>s
<1>;l(U)
(1) l:;S+~
= (3<1>, = J~~ <J>;1(2u) ="2
'
('u) (1) l;S+~ o - lim <1>-1 fJ
== f34>s' Therefore, we have from (40) and Theorem 2.1 that 21- i 2: J(L(
and 21-
~
2: J ( L ( s ) [0, (X) )) 2: { 2
7~_:"
2-(2jJ+"2),
Example 4.3(see[4]) If 1 < p <
00
if 1 < p
~~
if~:::;p
and LP(O) E {LP[O, 1J, LP[O, (0), £P}, then
J(LP(O)) == max (2~, 21-*) .
(42)
Nonsquare Constants of Orlicz Spaces
In fact, putting
== lulP,
193
we have (};cflp
== f3cfl p ==- (};~p ==f3g p ==
Qcfl p
== l3cfl p ==
2-* and
max (2~ 21-~) ~ J(LP(0.)). I
The opposite inequality of the above follows from (40), (38) and (39). Thus, (42) holds. Combining Theorem 2.1 and Theorem 4.1 with Theorem 1.3, we can find the exact values of nonsquare constants in the sense of J ames of a class of reflexive Orlicz spaces equipped with gauge norm. The first main theorem of this paper is as follows. Theorem 4.4 Let
n
J(L(cfl s)[0, 1]) == (ii) If
rt
6 2(0)
21-~.
(43)
n V2(0), then (44)
(45)
Proof (i) In virtue of (12) and (40), one has max (_1_, 2f3cfl s) Q:'cfl s
If
rt
6 2(00 L then ~ ~
(};cfl
~
(3cfl
==
~
J(L(cfls)[O, 1])
~ 21-~.
(46)
1 by Theorem 1. 3 (i). Therefore, we have from (27)
and so, (47) If
rt
V2(00), then ~
==
(};cfl
~
(3ep
~ 1 by Theorem 1.3(i). Since
_1_ == 21-i > 2(3
s
again, (47) holds. Finally, (43) follows from (46) and (47). (ii) and (iii) are similar to (i). 0 To find the exact values of nonsquare constants of a class of reflexive Orlicz spaces equipped with Orlicz norm we need the following two lemmas.
Reo
194
Lemma 4.5 Let <1> be an N-function and let <1>s be the inverse of (26). Then
(48)
(49) and
1 1- " --== ---+ -8
As
A
2'
1
1-
8
.'3
--- == --- +-. Bs
B
(50)
2
Proof Note that <1>~(t) == c/Js(t) a.e on (0, (0). Putting t == <1>;l(U) for 0 < U < 00 one has
U[<1>,;-1(U)]' <1>;I(U) (1- 8)[<1>-1 (u)]-s[<1>-I(1);)]'u 1+1 + [<1>-1 (u)]1- s u 1 (1- s)u[<1>-I(U)]' 8 ------+<1>-I(U) 2·
[<1>-I(u)P-s~u1
(51)
Therefore, 1 . <1>s(t) . U[<1>-I(U)]' -A == hmsup~() = (1- s)hmslip -1()
t-HX)
t'!! S t
U-tOO
U
S
+ -2 =
1- s
-A
s
+-. 2
Similarly, we can prove the others in (48), (49) and (50) by using (51).0 Lemma 4.6 Let <1>, W be a pair of complementary N-functions. Suppose that
C
r&
1 -;:;0-
== 2
C'I'
(52)
t'lj;(t)
~
and 2r~r& == 1.
Proof The assertions follow from Propositions 1.4 and 1.5.0 The second main theorem of this paper is as follows. Theorem 4.7 Let
n
J(L
(53)
Nonsquare Constants of Orlicz Spaces
(ii) If
195
tf- 62(0) n\72(0) and cg exists, then J ( f
n
(54)
(iii) If tf- 6 2 \72 and Cell exists in the case that the case that f/- 6 2(0) \72(0), then
n
tf- 62(00) n\72(00) or cg exists in
J(L
(55)
Proof (i) By (19) and (41), we have max (2/3 W+' s
_1_) : :; J(Lells[O, 1]) :S 21-1, n WJ
(56)
wt
where is the complementary N-function to 8' The conditions given in (i) imply that C == 00 or C
et-
s
(8) and so, in view of Lemma 4.6,
S
1
n WJ
== fJ WJ == 1 wJ == 2- c'lJJ == 2~-1.
Therefore,
(
max 2/3w+' In the case that C
== 1, one has
-Cl
s
== 1 -
1)
n W+
s == 2 1 -2".
(57)
s
8
-2 ,
1 /'lJ+ ==
-('1
8
-2
and n ws+
== /3 w+ == 1 w+ == 2-1. Again s
s
(57) holds. Hence, (53) follows from (56) and (57). (ii) The conditions given in (ii) imply that max
(
0 2/3wJ'
1)
-0-
n WJ
== 2 1-~2 •
(58)
Therefore, (54) follows from (21), (41) and (58). (iii) By (20) and (41), we have max
(2!3w+,~) :S J(£8[0, 00)) :::; 2 -1. nwt 1
(59)
s
(60) since max(/3wt, !3~t) :::; !3wt and Q'I1t :::; min( Q;t, n~t) always hold. Finally, (55) follows from (59), (60), (57) and (58). 0
196
Reo
Example 4.8 Let 1 < P < 00 and let M (u) be the inverse of
M-1(u) == [In(l
+ u)]tp ut,
u;::::
o.
Then (61) (62) and (63) In fact, putting (u) == e1u\P ~ 1 in Theorem 4.4 we have -l(U) == [In(l and ;- 1 (u) == [In (1 + u)] 1 ;8 U i .
+ u)]i
for u 2: 0
Therefore, M(u) == s(u)ls=~' Since lim -l (u) == 1 -l (2u)
and
U-tOO
it follows from Theorem 1.3 that rt. 6 2(00), et 6 2 and E 6 2(0) n \72(0). Therefore, (61) follows from (43) and (45). On the other hand, we have C
f- i
and Therefore, (63) follows from (14), (21), (40) and (41).
References [1] S. T. Chen, Non-squareness of Orlicz spaces(in Chinese), Chinese Ann. Math., 6A(1985), 619-624.(MR 87j: 46064) [2] J. A. Clarkson, Uniformly convex spaces, Trans. Amer. Math. Soc., 40(1936), 396-414. [3] C. E. Cleaver, On the extension of Lipschitz-Holder maps on Orlicz spaces, Studia Math., 42(1972), 195-204. [4] J. Gao and K. S. Lau, On the geometry of spheres in normed linear spaces, J. Austral. Math. Soc., 48A(1990), 101-112.
Nonsquare Constants of Orlicz Spaces
197
[5] J. Gao and K. S. Lau, On two classes of Banach spaces with uniform normal structure, Studia Math., 99(1991),41-56. [6] R. Grzaslewicz, H. Hudzik and W. Orlicz, Uniform non-f~l) property in some normed spaces, Bull. Pol. Acad. Soc. Math., 34(1986), 161-171. [7] H. Hudzik, Uniformly non-f~l) Orlicz spaces with Luxemburg norm, Studia Math., 81(1985), 41-54. [8] H. Hudzik, An estimation of the modulus of convexity in a class of Orlicz spaces, Math. Japon., 32(1987), 227-237. [9] R. C. James, Uniformly nonsquare spaces, Annals of Math., 80(1964), 542-550. [10] M. A. Krasnosel'skii and Ya. B. Rutickii, Convex Functions and Orlicz Spaces, P. Noordhoff Ltd. Groningen, 1961. [11] J. Lindenstrauss and L. Tzafriri~ Classical Banach Spaces, I and 11, Springer, Berlin, 1977 and 1979. [12] M. !'vI. Rao, Interpolation, ergodicity and martigales, J. Math. Mech., 16(1966), 543-567. [13] M. M. Rao, Measure Theory and Integration, Wiley-Interscience, New York, 1987. [14] M. M. Rao and Z. D. Ren, Theory ofOrlicz Spaces, Marcel Dekker, New York, 1991. [15] Z. D. Ren, Packing in Orlicz function spaces, Ph. D. Dissertation(Advisor: N. E. Gretsky), University of California, Riverside, 1994. [16] J. J. Schaffer, Geometry of Spheres in Normed Spaces, Marcel Dekker, New York and Basel, 1976. [17] Y. W. Wang and S. T. Chen, Non-squareness, B-convexity and flatness of Orlicz spaces, Comrnent. Math. Prace Mat., 28(1988), 155-165. [18] H. K. Xu, Measure of noncompactness and normal type structure in Banach spaces, Panamer. Math. J., 3(1993), No.2, 17-34.
Recursive Multiple Wiener Integral Expansion for Nonlinear Filtering of Diffusion Processes SERGEY LOTC)TSKyt Center for Applied rVlathematical Sciences, University of Southern California, Los Angeles, CA 90089-1113 BORIS L. RC)ZOVSKII+ Center for Applied Mathematical Sciences, University of Southern California, Los Angeles, CA 90089-1113
Abstract A recursive in time Wiener chaos representation of the optirnal nonlinear filter is derived for a time homogeneous diffusion model with uncorrelated noises. The existing representations are either not recursive or require a prior computation of the unnormalized filtering density, which is time consuming. An algorithm is developed for computing a recursive approxinlation of the filter, and the error of the approximation is obtained. When the parameters of the rnodel are known in advance, the on-line speed of the algorithm can be increased by performing part of the computations off line.
Key Words: nonlinear filtering, Wiener chaos, recursive filters.
1
INTRODUCTION
In a typical filtering model, a non-anticipative functional ft(x) of the unobserved signal process (:r(t))t>o is estimated from the observations y(s), s ::; t. The best mean square estimate is kno-wn to be the conditional expectation E[ft(x)ly(s), s ::; t], called the optimal filter. When the observation noise is additive, the Kallianpur-Striebel formula (Kallianpur (1980), Liptser and Shiryayev (1992)) provides the representation of the optimal filter as follows:
tThis work was partially supported by ONR Grant #N00014-95-1-0229. +This work was partially supported by ONR Grant #N00014-95-1-0229 and ARO Grant DAAH 04-951-0164.
199
Lotosky and Rosovskii
200
where
4>d·]
is a functional called the unnormJalized optirnal .filter.
In the particular case
ft(x) == f(x(t)), there are two approaches to computing 4>df]. In the first approach (Lo and Ng (1983), Mikulevicius and Rozovskii (1995), Ocone (1983)), the functional 4>df] is expanded in a series of multiple integrals with respect to the observation process. This approach can be used to obtain representations of general functionals, but these representations are not recursive in time. In fact, there is no closed form differential equation satisfied by rPt Lt]. In the second approach (Kallianpur (1980), Liptser and Shiryayev (1992), Rozovskii (1990)), it is proved that, under certain regularity assumptions, the functional 4>t[f] can be written as
4>tUJ =
Jj(x)u(t,x)dx
(1.1)
for some function u( t, x), called the unnormJalized .filtering density. Even though the computation of u( t, x) can be organized recursively in time, and there are many numerical algorithms to do this (Budhiraja and Kallianpur (1995), Elliott and Glowinski (1989), Florchinger and LeGland (1991), Ito (1996), Lototsky et al. (1996), etc.), these algorithms are time consumingbecause they involve evaluation of u(t, :r) at many spatial points. Moreover, computation of 4>df] using this approach requires subsequent evaluation of the integral (1.1). The objective of the current work is to develop a recursive in time algorithm for computing 4>df] without computing u(t, x). The analysis is based on the multiple integral representation of the unnormalized filtering density (Lototsky et al. (1996), Mikulevicius and Rozovskii (1995), Ocone, (1983)) with subsequent Fourier series expansion in the spatial domain. For simplicity, in this paper we consider a one-dimensional diffusion model with uncorrelated noises. In the proposed algorithm, the computations involving the parameters of the model can be done separately from those involving the observation process. If the parameters of the model are known in advance, this separation can substantially increase the on-line speed of the algorithm.
2
REPRESENTATION OF THE UNNORMALIZED OPTIMAL FILTER
Let (O,:F, P) be a complete probability space, on which standard one-dimensional Wiener processes (V(t))t~O and (W(t))t;::o are given. R,andom processes (x(t))t20 and (y(t))t~O are defined by the equations
t t r b(x(s))ds + r a(x(s))dV(s), lo ./0
x(t) == xo
+
y(t) =
h(;r;(s))ds + W(t).
l
(2.1)
In applications, x(t) represents the unobserved state process subject to estimation from the observations y(s), s ::s t. The a - algebra generated by y(s), s ::s t, will be denoted by:Ff. The following is assumed about the model (2.1):
(AI) The Wiener processes
(V(t))t~O
and (W(t))t~O are independent of xo and of each
other; (A2) The functions b( x), a (x), and h (x) are infinitely differentiable and bounded with all the derivatives;
Nonlinear Filtering of Diffusion Processes
201
Xo has a density p(x), x E R, so that the function p == p(x) is infinitely differentiable and, together with all the derivatives, decays at infinity faster than any power of x.
(A3) The random variable
Let j == j (x) be a measurable function such that (2.2) for some ko 2: 0 and L > O. A.ssumptions (A2) and (A3) imply that Elj(x(t))1 2 < 00 for all t 2: 0 (Liptser and Shiryayev, 1992). Suppose that T > 0 is fixed. It is known (Kallianpur (1980), Liptser and Shiryayev (1992)) that the best mean square estimate of j(x(t)) given y(s), S' :::; t :::; T, is j(x(t)) == E[f(x(t))\Ff], and this estimate can be written by the Kallianpur-Striebel forrnula as follows:
.f(x(t))
= E[j~x(t) )p(t) IFl] E[p(t)IFf]
where
p(t)
= exp { l h(x(s))dy(s)
(2.3)
,
~ ~ llh(x(s)Wds},
and E is the expectation with respect to measure P(.) :== J. (p( T) ) -1 dP. Moreover, under measure P, the observation process (y( t) )O~t~T is a \Vicner process independent of
(x(t))O
Th~ ~onditional expectation E[jCr(t))p(t)IFfJ \vill be referred to as the unnormalized optimal filter and will be denoted by cPt [fJ. In this section, a recursive in time representation of 4>df] is dprived for an arbitrary function f satisfying (2.2). It is kno\vn (RDzovskii, 1990) that, under assumptions (AI) (A3), there exists a random field u( t, x), t 2: 0, .r E R, called the unnormalized filtering density, such that
4Jtl!J =
.lR u(t, x)f(x)dx.
(2.4)
Denote by Pt cp(.r) the solution of the equation
8v (t, x)
at
1 8 2 ( a 2 ( X ) V ( t, .r) )
2
v(O, x) ==
8x 2
8(b(x)v(t, x)) 8x
t > 0;
and consider 0 == to < t 1 < ... < tM == T, a partition of [0, T] with steps ~i == t i - ti-l (this partition "rill be fixed hereafter). The following theorem gives a recursive representation of the un normalized filtering density at the points of the partition. THEOREM 2.1. (AI) -- (A3),
(cf. Mikulevicius and Rozovskii (1995), Ocone (1983)) Under assumptions
(2.5)
for i == 1, ... ,lvI, where y(i)(t) == y(t
+ t i- 1 )
-
y(ti-d, 0:::; t :::; ~i'
Lotosky and Rosovskii
202
Proof. This follows from Theorem 2.3 in Lototsky et al. (1996) and Theorem 3.1 in Ito (1951). To simplify the further presentation, the following notations are introduced. For an :F~_1 - measurable function 9 == g(x, w) and 0 ~ t ~ ~i,
FJi)(t, g)(x)
:==
F~i) (t, g)(x)
:=l{k...
Ptg(x),
{2g _skh .. . hPs1g(x)dy(i) (SI) ... dy(i)(Sk)'
k:::::
1.
(2.6)
With these notations, (2.5) becomes (2.7) It is known (Ladyzhenskaia et al. (1968), Rnzovskii, (1990)) that
1
(2.8)
1"1hen it follows by induction that for every t E [0, ~i], i == 1, ... ,M, and k 2 0, the operator 9 H F~i)(t,g) is linear and bounded from L 2 (O,P;L 2 (R)) to itself and EIIF~i)(t,g)116 ~ eCt[(Ct)k /k!]EllgI16. This, in particular, implies that u(t i ,') E L 2 (R) P- and P- a.s. In the following theorem, the unnormalized filtering density is expanded with respect to an orthonormal basis in L 2 (R). With a special choice of the basis, this expansion will be used later to construct the recursive representation of the unnormalized optimal filter. THEOREM 2.2.
If {en}n~O is an orthonormal basis in L 2 (R) and random variables
1/Jn (i), n 2:: 0, i == 0, ... ,M, are defined recursively by 1/Jn(O) == (p, en)o, Wn(i) == L (L(F~i)(~i,el),en)O(~)l(i -1)), i == 1, .. . ,M, k~O
(2.9)
l~O
then
u(t i , .)
==
L ~)n(i)en, P - a.s.
(2.10)
n~O
Proof. The proof is carried out by induction. Representation (2.10) is obvious for i == O. If it is assumed for some i-I 2:: 0, then (2.7) and the continuity of F~ i) (~i, .) imply that
~)n(i)
L(F~i)(f).i' U(ti-l, ')), en)o == k>O L (L(F~i)(~i,ed,en)o~~l(i - 1)),
k~O
:==
(u(t i , '), en)o
==
l~O
and (2.10) follows with 'l/Jn(i) given by (2.9). REMARK. Direct computations show that the infinite SUIns in (2.9) can be interchanged even though the double sum need not converge absolutely. The absolute convergence holds
1" .
110 and (', ')0 are the norm and the inner product in L 2 (R). The value of the constant C depends only on the parameters of the model and is usually different in different places.
Nonlinear Filtering of Diffusion Processes
203
if l:n VEI'l/)n(i) 2 < 00, which is the case when {en} is the Hermite polynomial basis (2.11). For practical computations, both infinite sums in (2.9) must be truncated. These truncations are studied in Section 3. 1
To get a representation of 4>df], it now seems natural, according to (2.4), to multiply both sides of (2.10) by f(x) and integrate, but this cannot be done in general because (2.10) is an equality in L 2 (R) and f need not he square integrable. The difficulty is resolved by choosing a special basis {en} so that integral JR 1(.r) en (x) dx can be defined for every function 1 satisfying (2.2). Specifically, let {en} be the Hermite basis in L 2 (R) (Gottlieb and Orszag (1977), Hille and Phillips (1957)):
en(x) ==
1
V2
n Jr1/2 n !
2
e- x /2Hn(x),
(2.11)
where Hn(x) is the nth Hermite polynornial defined by
Then the following result is valid. THEOREM 2.3. then
If assumptions (At) - (A3) and (2.2) hold and en is defined by (2.11),
1Jti Lt] ==
L In1Pn( i),
P - a.s.,
(2.12)
n20
where In == JR I(x)en(x)dx and 1f)n(i) is given by (2.9). Proof. Condition (2.2) and fast decay of en(.r) at infinity imply that In is well defined for all n. Then (2.12) will follow frorn (2.4) and (2.10) if the series l:n~o In'l/Jn(i) is P - a.s. absolutely convergent for all i == 0, ... , M. Since measures P and P are equivalent, it suffices to show that (2.13) IInl EI1/Jn(i) I < 00.
L
n20
Arguments sirnilar to those in Hille and Phillips (1957), paragraph (21.3.3), show that
which implies that
Ilk I ~
Cn( 2k o+1)/4.
(2.14)
On the other hand, it follows froIll the proof of Theorem 2.6 in Lototsky et al. (1996) that for every integer ! there exists a constant C (J) such that (2.15) Taking ! sufficiently large and cornbining (2.14) and (2.15 ) results in (2.13). REMARK. It is known (Hille and Phillips (1957), paragraph (21.3.2)) that su Px Ien (x) I ~ en -1/12. Together with (2.15), this inequality implies that, for the Hermite basis, the series in (2.10) converges uniformly in :r E R, P - a.s.
204
3
Lotosky and Rosovskii
RECURSIVE APPROXIMATION OF THE UNNORMALIZED OPTIMAL FILTER
It was already mentioned that the infinite sums in (2.9) must be approximated by truncating the number of terms, if the formula is to be used for practical computations. Multiple integrals in (2.6) must also be approximated. The effects of these approximations are studied below. For simplicity, it is assumed that the partition of [0, T] is uniform (~i == ~ for all i == 1, ... , M). With obvious modifications, the results remain valid for an arbitrary partition. Given a positive integer 1'\;, define random variables ljJn,K (i), n == 0, ... ,I'\;, i == 0, ... , M, by
1Pn,K(O)
==
'l/Jn,,,(i) =
(p, en)o,
t
((PLlCI, cn)o
+ (PLlhcl, cn)o[y(t i ) - y(ti~l)]+
(3.1)
[=0
(1/2)(P~h2e[,en)o[(y(t i ) - y(t i_d)2 - ~])7jJn,K(i - 1), i == 1, ... , M. Then the corresponding approximations to u (t i , x) and
UK(t i , x)
==
L
'~)n,K(i)en(x),
n=O
(3.2)
K
==
L ljJn,K(i)ln' n=O
The errors of these approximations are given in the following theorem. THEOREM 3.1. If assumptions (AI) - (A3) and (2.2) hold and the basis {en} is chosen according to (2.11) then max
lSiSM
VEllu,,(t i ,·) - U(ti, ·)116 S CD. +
C~;lD.'
1'\;'-
l~~~~ VEI(hAtJ -
u1(t o, x)
:==
p(x),
U1(t i , x)
:==
L
2
JEll' 115
(3.3) (3.4) will be used. All
F~i)(~, U(t i- 1, ·))(x).
k=O
It is proved in Lototsky et al. (1996), Theorem 2.4, that max IlIu(ti ,·)
°SiSM
-
U1(t i , ·)1110 ~ C~.
(3.5)
Step 2. Define
pJi) (~, g)(x) -(i)
:== P~g(x), ._
F1 (~,g)(x).- [y(t i ) - y(ti-l)]P~hg(x), F- 2(i) (~, g)(x) :== (1/2)[(y(t i ) - y(t i- 1)) 2 -
]
~ P~h
2
g(x),
Nonlinear Filtering of Diffusion Processes
205
and then by induction
11 1 (t o, x) 11 1(t i ,x)
:==
p(:L') , 2
==
L
P~i)(6, 11 1(ti_l' ·))(x).
k=O
Since y(t) - y(ti-d, t > t i- 1, is independent of :F~_l under measure
P,
it follows that
111111 (t i , .) - U1(t i , .) 1116 == III Ptl ( U 1 (t i- 1, .) - U1(t i- 1, .)) 1116 + 2
"' L
(3.6)
P~i)(6, U 1 (t i_ 1, .)) - F~i)U1(ti_1' ,))1116.
k=O
Next, 2
III L
P~i)(6, U1(t i_1 , .))
F~i)(6, U1(t i_1 , ·))1116
-
:::;
k=O 2
4
L
(1IIP~i)(~, U1 (t i _ 1 ,')
-
U1(t i _1, ,))1116+
(3.7)
k=l
IIIP~i)(6, U1 (t i_ 1, .)) - F~i)(6, U 1(t i_ 1 , ·))III~). It follows from (2.8) and the definition of P~i) that (3.8) The Taylor forrnula and the definition of Pt imply
where
11 . 1I H2
is the norm in the corresponding Sobolev space. Then
1(t _ , .)) - Pl(i\6, U 1(t _ , '))1116 i 1 i 1 IIIPii)(6, U 1(t i_1 , .)) - PJi)(6, U 1 (ti_l' '))1116 11 IP\(i) (6,
U
:::; CEllu 1 (ti_l' ')lIiI2~3; :::; CEllu 1(t i_1, ·)lIiI2~4.
(3.9)
Finally, the continuity of operator Pt in H 2 (R) (Ladyzhenskaia et al. (1968), Rozovskii, (1990)) implies (3.10) Ellu 1 (t i - 1 , .) IliF :::; eCTllplliF :::; C. Combining inequalities (3.6) - (3.10) results in
(at least for sufficiently small .6.), which, by the Gronwall Lemma, implies (3.11)
Step 3. The same arguments as in the proof of Theorem 2.6 in Lototsky et al. (1996) show that -1 C(,) (t i ,') - UK(t i , ·)1110:::; K:'Y-1/2~' (3.12)
'"u
Combining (3.5), (3.11), (3.12), and the triangle inequality results in (3.3).
206
Lotosky and Rosovskii
2. The natural way of proving (3.4) is to use (3.3) and the Cauchy inequality. To deal with the technical difficulty that f ~ L 2 (R), the following spaces are introduced (Rozovskii (1990), Sec. 4.3): for r E R, L 2(R,r) == {cp : I R cp2(.r)(1 + :r 2 )rd:r < oo}. The weighted Sobolev spaces Hn(R, r) are defined in a similar way. Then L 2 (R, r) is a Hilbert space with inner prodnct (!PI, !P2)r := JR !PI (:r )!P2(X) (1
+ x 2)' dx
and the corresponding norm 11·llr. If CPl EL 2(R,r) and CP2EL 2(R, -r), then JRCP1(X)CP2(X)dx is well defined and will be denoted by (CPl,CP2)O. Condition (2.2) implies that f E L 2 (R, -r) for all r > ko + 1/2. On the other hand, assumptions (A2) and (A3) imply that u(t,') E L 2 (R, r) for all r E R (Rozovskii, 1990, Theorem 4.3.2), and the same is true for U1(t i , '), U1(t i , '), and UK(t i , .). Fix an even integer r > k o + 1/2 and define iJ(x) :== (1 + x 2 )r/2. Notation Ill· IIlr :==
JEll' 11;
will also be used. By the Cauchy inequality,
JElcPti,K[fJ - cPtiLfJI2 == JE(uK(t i ,·)
J
u(t i , '), f)5 ~
-
11 f 11 ~ rill n t i , .) - U ( t i , .) Ill; ~ 11 f 11- r ( Illu(t i, .) Illu1(ti ,·) - U1(t i , ·)lllr + Illn,l(i-i") - UK(t i , ')lllr)' K (
t
'U 1 ( i, .)
II1 r +
(3.13)
Since the operator Pt is linear bounded from Hn(R, r) to itself (Ladyzhenskaia et al. (1968), Rozovskii (1990)), the arguments of steps 1 and 2 can be rcpeated to conclude that (3.14)
Next, it follows from the proof of Theorem 2.6 in Lototsky et al. (1996) that for every positive integer, there exists C (,) such that for all i == 0, ... , AI - -1 2 C(,) E(u (t i , '), en)o ~ n 2'Y+ r '
(3.15)
Similarly, by (3.12), there is C(,) so that
Illu1(ti ,·) ~ n,,;(t i , ')III~ :::; K2'Y~~!;fl2' On the other hand, repeated application of the relations e~ == (Viien-l - vn+1"e n +l) / V2 and -e~ + (1 + x 2 )e n == 2( n
+ 1 )e n
(3.16)
shows that
n+r/2
(g,e n );/2 ~ C
L
mr(g,em)~
m=n-r/2
(if m < 0, the corresponding term in the sum is set to be zero), and consequently
L (g, e
n
);/2 ~ C
n~O
L
n (g, e n )6· r
n~O
Combining the last inequality with the identities
I/gll; ==
Ilgj)ll~ == L(gj), e n )6 == L(g, e n );/2 n
n
Nonlinear Filtering of Diffusion Processes
207
results in
Illu 1 (t i ,') -
UK(t i , ·)111; ==
L
E(n~l(ti") - UK(t i , '), en );/2 :s;
n2::0
CL n 'ECu 1
K
1
(t i ,·) - 'lLK(t i , '), en )6 ==
n>O
C
I: nTE( ill (t
CL n ECu (t i ,·) T
1
UK(t i , '), en )6+
n=O i , '),
en )6·
n>,',
Now, (3.15) and (3.16) imply
Together with (3.13) and (3.14), the last inequality implies (3.4). REMARK. The constants in (3.3) and (3.4) are determined by the bounds on the functions b, a, 12, and p and their derivatives and by the length T of the time interval. The constants in (3.4) also depend on Land ko from (2.2). The error bounds in (3.3) and (3.4) involve two asymptotic parameters: ~ (the size of the partition of the time interval) and K (the number of the spatial basis functions). With the appropriate choice of these parameters, the errors can be made arbitrarily small. In Lototsky et al. (1996), the multiple integrals (2.6) were approximated using the Cameron-Martin version of the Wiener chaos decomposition. The analysis was carried out only for the unnormalized filtering density, but the results can be extended to the unnormalized optimal filter q>dfJ in the same way as it is done in the present work. The overall error of approximation from Lototsky et al. (1996) has the same order in ~ and K as (3.3), but the approximation formulas are more complicated. Formulas (3.1) and (3.2) provide an effective numerical algorithm for computing both the unnormalized filtering density u( t, x) and the unnormalized optimal filter 1Jt [1] independently of each other. If the ultimate goal is an estimate of i(x(t i )) (e.g. estimation of moments of x(t i )), it can b~ achieved with a given precision recursively in time without computing u(t i , x) as an intermediate step. This approach looks especially promising if the paranleters of the model (i.e. functions b, a, h and the initial density p) are known in advance. In this case, the values of (P~el, en)o, (P~hel, en)o, (1/2)(P~h2el, en)o, and in == (!, Cn)o, n,l == 1, ... ,K, can be pre-conlputed and stored. When the observations become available, the coefficients 1Pn(i) are computed according to (3.1) and'then 1J t il K [!J is COlllputed according to (3.2). 
As a result, the algorithm avoids performing on-line the time-consuming operations of solving partial differential equations and computing integrals. Moreover, only the increments of the observation process are required at each step of the algorithm.
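The offline/online split described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' formulas: since (3.1) and (3.2) are not reproduced in this section, we assume that one step of the recursion multiplies the coefficient vector $\psi(i-1)$ by a $K \times K$ matrix assembled from the three precomputed arrays as a second-order Hermite (Wiener chaos) expansion in the observation increment, $M(\Delta y) = P_0 + P_1\,\Delta y + P_2\,(\Delta y^2 - \Delta)$, and that the filter is recovered as $\sum_n \psi_n(i) f_n$. The names `P0`, `P1`, `P2`, `precompute` and `online_filter` are hypothetical, and `P2` is taken to already contain the factor $1/2$.

```python
import numpy as np


def precompute(P0, P1, P2, f_coeffs):
    """Offline stage: store the K x K matrices and the K coefficients f_n.

    Hypothetical layout (an assumption, see the lead-in):
      P0[n, l] ~ (P_Delta e_l, e_n)_0
      P1[n, l] ~ (P_Delta h e_l, e_n)_0
      P2[n, l] ~ (1/2) (P_Delta h^2 e_l, e_n)_0
    """
    return (np.asarray(P0, dtype=float), np.asarray(P1, dtype=float),
            np.asarray(P2, dtype=float), np.asarray(f_coeffs, dtype=float))


def online_filter(psi0, dy, delta, stored):
    """Online stage: uses only the observation increments dy[i].

    Assumed one-step update (second-order chaos truncation):
        psi(i) = [P0 + P1 * dy_i + P2 * (dy_i**2 - delta)] psi(i-1)
    Returns the estimates sum_n psi_n(i) f_n, i = 1, ..., len(dy).
    """
    P0, P1, P2, f = stored
    psi = np.asarray(psi0, dtype=float)
    estimates = []
    for d in dy:
        step = P0 + P1 * d + P2 * (d * d - delta)  # K x K, from stored arrays
        psi = step @ psi
        estimates.append(float(psi @ f))
    return np.array(estimates)


# Toy example with K = 1 and zero observation increments.
stored = precompute([[1.0]], [[0.5]], [[0.125]], [2.0])
print(online_filter([1.0], [0.0, 0.0], 0.1, stored))
```

With $K = 1$ and $\Delta y_i = 0$, each step multiplies $\psi$ by $1 - \Delta\,(P_2)_{11}$, which allows an easy hand check. Only matrix-vector products are performed online, which is the point of the precomputation.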
REFERENCES
Budhiraja, A. and Kallianpur, G. (1995). Approximations to the Solution of the Zakai Equations using Multiple Wiener and Stratonovich Integral Expansions, Technical Report 447, Center for Stochastic Processes, University of North Carolina, Chapel Hill, NC 27599-3260.
Elliott, R. J. and Glowinski, R. (1989). Approximations to solutions of the Zakai filtering equation, Stoch. Anal. Appl., 7(2):145-168.
Florchinger, P. and LeGland, F. (1991). Time discretization of the Zakai equation for diffusion processes observed in correlated noise, Stoch. and Stoch. Rep., 35(4):233-256.
Gottlieb, D. and Orszag, S. A. (1977). Numerical Analysis of Spectral Methods: Theory and Applications, CBMS-NSF Regional Conference Series in Applied Mathematics, Vol. 26.
Hille, E. and Phillips, R. S. (1957). Functional Analysis and Semigroups, Amer. Math. Soc. Colloq. Publ., Vol. XXXI.
Ito, K. (1951). Multiple Wiener integral, J. Math. Soc. Japan, 3:157-169.
Ito, K. (1996). Approximation of the Zakai equation for nonlinear filtering, SIAM J. Cont. Opt. (to appear).
Kallianpur, G. (1980). Stochastic Filtering Theory, Springer.
Ladyzhenskaia, O. A., Solonnikov, V. A., and Ural'tseva, N. N. (1968). Linear and Quasilinear Equations of Parabolic Type, American Mathematical Society, Providence, Rhode Island.
Liptser, R. S. and Shiryayev, A. N. (1992). Statistics of Random Processes, Springer.
Lo, J. T.-H. and Ng, S.-K. (1983). Optimal orthogonal expansion for estimation I: Signal in white Gaussian noise, Nonlinear Stochastic Problems (Bucy, R. and Moura, J., eds.), D. Reidel Publ. Company, pp. 291-309.
Lototsky, S., Mikulevicius, R., and Rozovskii, B. L. (1996). Nonlinear Filtering Revisited: A Spectral Approach, SIAM Journal on Control and Optimization, to appear.
Mikulevicius, R. and Rozovskii, B. L. (1995). Fourier-Hermite Expansion for Nonlinear Filtering, Festschrift in honor of A. N. Shiryayev.
Ocone, D. (1983). Multiple integral expansions for nonlinear filtering, Stochastics, 10:1-30.
Rozovskii, B. L. (1990). Stochastic Evolution Systems, Kluwer Academic Publishers.
A Berry-Esseen Type Estimate for Hilbert Space Valued U-Statistics and On Bootstrapping Von Mises Statistics
V. V. SAZONOV, Steklov Mathematical Institute, Moscow, and Hong Kong University of Science and Technology
This paper consists of two parts related to each other only by employing a common approach. This approach consists in using the technique developed for the proof of Berry-Esseen type estimates and Edgeworth type expansions for Hilbert space valued independent random variables. A number of researchers contributed to this area, and a rather complete account of the related work up to 1990 can be found in the survey paper by Bentkus et al. (1990). Here we will mention only the papers by Götze (1979), Yurinskii (1982), and Sazonov, Ulyanov and Zalesskii (1988, 1991), which are most closely related to the present work.

First consider Hilbert space valued U-statistics. Let $X_1, \ldots, X_n$ be independent identically distributed (i.i.d.) random variables with values in a measurable space $(\mathcal{X}, \mathfrak{X})$. Denote by $P$ the distribution of $X_1$: $P(A) = P(X_1 \in A)$, $A \in \mathfrak{X}$.
Let $\Phi$ be a map defined on $(\mathcal{X} \times \mathcal{X}, \mathfrak{X} \times \mathfrak{X})$ with values in a separable Hilbert space $H$ such that $\Phi(x_1, x_2) = \Phi(x_2, x_1)$, $x_1, x_2 \in \mathcal{X}$. The inner product and norm in $H$ will be denoted $(\cdot,\cdot)$ and $\|\cdot\|$, respectively. Assume that $E\Phi(X_1, X_2) = 0$ (this assumption is not essential and is made for simplicity) and $E\|\Phi(X_1, X_2)\| < \infty$.
The U-statistic with kernel $\Phi$ corresponding to the sequence $X_1, \ldots, X_n$ is defined as
$$U_n = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} \Phi(X_i, X_j).$$
The Hoeffding decomposition represents $U_n$ as
$$U_n = 2n^{-1}\sum_{i=1}^{n} g_1(X_i) + 2n^{-1}(n-1)^{-1}\sum_{1 \le i < j \le n} g_2(X_i, X_j),$$
where
$$g_1(x) = E\Phi(x, X_2), \qquad g_2(x, y) = \Phi(x, y) - g_1(x) - g_1(y).$$
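The Hoeffding decomposition is an exact algebraic identity once $g_1$ is known, so it can be checked numerically. The sketch below assumes $H = \mathbb{R}^d$, $EX_1 = 0$, and the illustrative symmetric kernel $\Phi(x, y) = x + y + x \odot y$ (componentwise product), for which $g_1(x) = x$ and $g_2(x, y) = x \odot y$; the kernel and the function names are our own choices, not taken from the text.

```python
import numpy as np


def kernel(x, y):
    """Illustrative symmetric H-valued kernel, H = R^d: Phi(x, y) = x + y + x*y."""
    return x + y + x * y


def u_statistic(X):
    """U_n = binom(n, 2)^{-1} * sum_{i<j} Phi(X_i, X_j)."""
    n = len(X)
    total = np.zeros(X.shape[1])
    for i in range(n):
        for j in range(i + 1, n):
            total += kernel(X[i], X[j])
    return total * 2.0 / (n * (n - 1))


def hoeffding_form(X):
    """2/n * sum g1(X_i) + 2/(n(n-1)) * sum_{i<j} g2(X_i, X_j).

    For this kernel and E X = 0: g1(x) = E Phi(x, X_2) = x and
    g2(x, y) = Phi(x, y) - g1(x) - g1(y) = x * y.
    """
    n = len(X)
    g1_sum = X.sum(axis=0)
    g2_sum = np.zeros(X.shape[1])
    for i in range(n):
        for j in range(i + 1, n):
            g2_sum += X[i] * X[j]
    return 2.0 / n * g1_sum + 2.0 / (n * (n - 1)) * g2_sum


rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))  # n = 20 observations in R^3
print(u_statistic(X), hoeffding_form(X))
```

The two printed vectors agree up to rounding for any sample, since the identity does not depend on the sample mean being zero.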
Without loss of generality we may assume that $g_1(x)$ is defined everywhere. We will also assume that $0 < \sigma^2 = E\|g_1(X_1)\|^2 < \infty$.
Clearly by the Fubini theorem
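Denote by $V$ the covariance operator of $g_1(X_1)$:
$$(Vx, y) = E\,(g_1(X_1), x)\,(g_1(X_1), y), \qquad x, y \in H,$$
and let $\sigma_1^2 \ge \sigma_2^2 \ge \cdots$ be the eigenvalues of $V$. Finally, let $Y$ be a Gaussian $H$-valued random variable with mean zero and covariance operator $\sigma^{-2}V$. By the central limit theorem in Hilbert space, under an appropriate moment condition on $\Phi$,
$$\Delta_n(a) = \sup_{r \ge 0}\Big|P\big(\|2^{-1}n^{1/2}\sigma^{-1}U_n - a\| \le r\big) - P\big(\|Y - a\| \le r\big)\Big| \to 0 \quad \text{as } n \to \infty$$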
(see Borovskikh (1986), Borovskikh and Koroljuk (1990)). We are interested in estimating $\Delta_n(a)$ in terms of $P$ and $n$. The first estimate of this type for real valued U-statistics was obtained by Serfling (1973), and since then a number of gradual improvements have been published (for details see Lee (1990), Bentkus, Götze and Zitikis (1992)). The best estimate known at present was proved by Koroljuk and Borovskikh (1986). It states that
$$\sup_{r}\Big|P\big(2^{-1}n^{1/2}\sigma^{-1}U_n \le r\big) - P(Y \le r)\Big| \le c\big(\sigma^{-3}E|g_1(X_1)|^3 + \sigma^{-5/3}E|g_2(X_1, X_2)|^{5/3}\big)n^{-1/2},$$
and it turns out that $5/3$ here cannot be replaced by any smaller number (see Bentkus, Götze, Zitikis (1992)). For $H$-valued U-statistics, it was shown in Borovskikh (1986), Borovskikh and Koroljuk (1990, 1991, 1994), that
In Puri and Sazonov (1991) an estimate of $\Delta_n(a)$ with an explicit dependence on the moments $E\|g_1(X_1)\|^3$, $E\|g_2(X_1, X_2)\|^2$ and on $a$ was obtained; however, this estimate is only of order $O(n^{-1/3})$. Our new result is the following estimate of $\Delta_n(a)$, obtained jointly with Yu. Borovskikh and Madan Puri, which has an explicit dependence on moments and is of order $O(n^{-1/2})$:
$$\Delta_n(a) \le c\,(1 + \|a\|^3)\,\sigma^4\Big(\prod_{j=1}^{9}\sigma_j^{-1}\Big)\sigma_9^{-1}\big(E\|\Phi(X_1, X_2)\|^3\big)^2 n^{-1/2}. \qquad (1)$$
The main tools in the proof of this estimate are an extended version of the Götze lemma on the estimation of the characteristic function of the square of the norm of a sum of independent Hilbert space valued random variables (see Götze (1979), Yurinskii (1982) and especially Sazonov, Ulyanov and Zalesskii (1988), Lemma 11), and moment inequalities for Hilbert space valued U-statistics (see Koroljuk and Borovskikh (1994)). A rather long proof of (1) will be published elsewhere.

Next consider the bootstrapping of von Mises $\omega^2$-statistics. Let $X_1, X_2, \ldots$ be i.i.d. real random variables defined on a probability space $(\Omega, \mathcal{B}, P)$. Assume that the distribution function $F(x) = P(X_1 \le x)$ is continuous, and denote by $F_n(x) = n^{-1}\sum_{j=1}^{n} 1_{\{X_j \le x\}}$ the empirical distribution function corresponding to $X_1, \ldots, X_n$, $n = 1, 2, \ldots$ (here and in what follows, $1_B$ denotes the indicator function of a set $B$, i.e. $1_B(b) = 1$ if $b \in B$ and $1_B(b) = 0$ if $b \notin B$). The von Mises $\omega^2$-statistics are defined as
$$\omega_n^2 = n\int_{-\infty}^{\infty}\big(F_n(x) - F(x)\big)^2\,dF(x).$$
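Because $F$ is continuous, $\omega_n^2$ is distribution-free: replacing $X_j$ by $F(X_j)$ reduces it to the uniform case. The sketch below (plain NumPy; the function names are hypothetical) evaluates $\omega_n^2$ in two ways, with the classical Cramér-von Mises computing formula and with a direct midpoint-rule integration, so the two can be compared.

```python
import numpy as np


def omega2_closed_form(u):
    """n * int_0^1 (F_n(x) - x)^2 dx for a sample u from U[0, 1].

    Classical computing formula: 1/(12 n) + sum_i (u_(i) - (2i - 1)/(2n))^2.
    """
    u = np.sort(np.asarray(u, dtype=float))
    n = len(u)
    i = np.arange(1, n + 1)
    return 1.0 / (12 * n) + float(np.sum((u - (2 * i - 1) / (2 * n)) ** 2))


def omega2_numeric(u, grid_size=200_000):
    """Midpoint-rule approximation of the same integral."""
    u = np.sort(np.asarray(u, dtype=float))
    n = len(u)
    x = (np.arange(grid_size) + 0.5) / grid_size
    Fn = np.searchsorted(u, x, side="right") / n  # #{u_j <= x} / n
    return n * float(np.mean((Fn - x) ** 2))


rng = np.random.default_rng(1)
u = rng.uniform(size=25)  # stands for F(X_1), ..., F(X_n)
print(omega2_closed_form(u), omega2_numeric(u))
```

The closed form is exact; the numerical value differs from it only by the discretization error of the midpoint rule at the jumps of $F_n$.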
To define the bootstrapped version of $\omega_n^2$, a resample sequence $X_1^*, \ldots, X_n^*$ of i.i.d. random variables with distribution function $F_n$ is introduced. The variables $X_1^*, \ldots, X_n^*$ may be defined on $(\Omega, \mathcal{B}, P)$, in which case they are assumed to be independent of $X_1, \ldots, X_n$, or on another probability space $(\Omega^*, \mathcal{B}^*, P^*)$. We prefer to suppose them to be defined on $\Omega^*$. If $F_n^*$ is the empirical distribution function corresponding to $X_1^*, \ldots, X_n^*$, the bootstrapped version of $\omega_n^2$ is
$$(\omega_n^2)^* = n\int_{-\infty}^{\infty}\big(F_n^*(x) - F_n(x)\big)^2\,dF_n(x).$$
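Since $dF_n$ puts mass $1/n$ at each observation, the integral defining $(\omega_n^2)^*$ reduces to a finite sum, $(\omega_n^2)^* = \sum_{j=1}^{n}(F_n^*(X_j) - F_n(X_j))^2$. Here is a minimal sketch of one bootstrap draw, assuming distinct observations (which holds with probability 1 for continuous $F$); the function name is hypothetical.

```python
import numpy as np


def bootstrap_omega2(x, rng):
    """One draw of (omega_n^2)^* = sum_j (F_n^*(X_j) - F_n(X_j))^2."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    resample = np.sort(rng.choice(x, size=n, replace=True))  # X_1^*, ..., X_n^*
    Fn = np.arange(1, n + 1) / n                # F_n(X_(k)) = k/n for distinct X's
    Fn_star = np.searchsorted(resample, x, side="right") / n  # F_n^*(X_(k))
    return float(np.sum((Fn_star - Fn) ** 2))


rng = np.random.default_rng(2)
x = rng.standard_normal(20)
draws = np.array([bootstrap_omega2(x, rng) for _ in range(4000)])
print(draws.mean(), (20**2 - 1) / (6 * 20**2))
```

A short calculation (each $nF_n^*(X_{(k)})$ is binomial with parameters $n$ and $k/n$) gives $E^*(\omega_n^2)^* = (n^2 - 1)/(6n^2)$, so the average of many draws gives a quick sanity check.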
We will also consider generalized Bayesian bootstrap versions of $\omega_n^2$. Let $U_1, U_2, \ldots$ be a sequence of real i.i.d. random variables defined on a probability space $(\Omega', \mathcal{B}', P')$ such that
$$P'(U_1 = 0) = 0, \qquad E'U_1 = 1, \qquad E'e^{t|U_1|} < \infty \ \text{for some } t > 0. \qquad (2)$$
Condition (2) is equivalent to the existence of $g > 0$, $T > 0$ such that
$$E'e^{t(U_1 - 1)} \le e^{gt^2/2} \quad \text{for all } t: |t| \le T \qquad (2')$$
(see Petrov (1975), Ch. 3, Lemma 5). The generalized Bayesian bootstrap version of $\omega_n^2$ is defined in terms of the random weights
$$\xi_j = U_j\Big(\sum_{l=1}^{n} U_l\Big)^{-1}, \qquad j = 1, \ldots, n.$$
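One concrete choice satisfying (2) is $U_j \sim \mathrm{Exp}(1)$, which yields Rubin-type Bayesian bootstrap (Dirichlet) weights; this choice is ours, for illustration only. For it, $E'e^{t(U_1 - 1)} = e^{-t}/(1 - t)$ for $t < 1$, and (2') holds, e.g., with $g = 2$, $T = 1/2$, since $-t - \log(1 - t) = \sum_{k \ge 2} t^k/k \le t^2$ for $|t| \le 1/2$. The weighted empirical distribution function $\sum_j \xi_j 1_{\{X_j \le x\}}$ is included only as a hypothetical building block; the exact form of the generalized Bayesian bootstrap statistic is not reproduced in this section.

```python
import numpy as np


def bayesian_bootstrap_weights(n, rng):
    """xi_j = U_j / sum_l U_l with U_j i.i.d. Exp(1) (illustrative choice).

    Exp(1) satisfies (2): P(U = 0) = 0, E U = 1, E exp(t|U|) < inf for t < 1.
    """
    u = rng.exponential(1.0, size=n)
    return u / u.sum()


def mgf_centered_exp(t):
    """E exp(t (U - 1)) for U ~ Exp(1), valid for t < 1."""
    t = np.asarray(t, dtype=float)
    return np.exp(-t) / (1.0 - t)


def weighted_edf(x, weights, points):
    """Hypothetical weighted EDF: sum_j xi_j * 1{X_j <= p} at each point p."""
    x = np.asarray(x, dtype=float)
    return np.array([weights[x <= p].sum() for p in points])


# Check condition (2') for Exp(1) with g = 2, T = 1/2 on a grid of t values.
ts = np.linspace(-0.5, 0.5, 1001)
holds = bool(np.all(mgf_centered_exp(ts) <= np.exp(2.0 * ts**2 / 2.0)))
print(holds)
```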
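Our aim is to estimate
$$\delta_n = \delta_n(\omega) = \sup_{y \ge 0}\Big|P(\omega_n^2 \le y) - P^*\big((\omega_n^2)^* \le y\big)\Big|$$
and the analogous quantity $\Delta_n$ for the generalized Bayesian bootstrap version of $\omega_n^2$. As far as we know, $\delta_n$ and $\Delta_n$ have not been estimated before. The following results have been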
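obtained jointly with Albert Lo and will be published with full proofs elsewhere.

1. With probability 1, for any $\varepsilon > 0$,
$$\delta_n(\omega) \le c(\varepsilon)\,n^{-1+\varepsilon}, \qquad n = 1, 2, \ldots, \qquad (3)$$
where $c(\varepsilon)$ depends only on $\varepsilon$.

2. With probability 1 the bound (4) holds, where $c(\beta, g, T)$ depends only on $\beta = \sigma^{-4}E'(U_1 - 1)^4$ and on $g$, $T$ from (2') (one can write down a $c(\beta, g, T)$ with an explicit dependence on $\beta$, $g$, $T$).

It follows from the proofs that (3) and (4) are true for all $\omega$ such that all $X_1(\omega), X_2(\omega), \ldots$ are different (the set of such $\omega$ has $P$-probability 1). To prove (3) we write
$$\delta_n \le \delta_{n1} + \delta_{n2} + \delta_{n3},$$
where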
$$\delta_{n1} = \sup_{y \ge 0}\big|P(\omega_n^2 \le y) - G(y)\big|, \qquad \delta_{n2} = \sup_{y \ge 0}\big|P^*\big((\omega_n^2)^* \le y\big) - G_n(y)\big|, \qquad \delta_{n3} = \sup_{y \ge 0}\big|G(y) - G_n(y)\big|;$$
here $G(y)$ may be represented as $G(y) = P(\|Y\|^2 \le y)$, $Y$ being a Hilbert space valued Gaussian random variable with $EY = 0$ and covariance operator $V$ having eigenvalues $(\pi j)^{-2}$, $j = 1, 2, \ldots$; $G_n(y)$ is defined below, see (5) and what follows it. It is well known that $\delta_{n1} \le cn^{-1}$, $n = 1, 2, \ldots$ (see e.g. Bentkus et al. (1990), Götze (1979)). To estimate $\delta_{n2}(\omega)$ we observe that $(\omega_n^2)^*$ can be represented as
$$(\omega_n^2)^* = \Big\|n^{-1/2}\sum_{l=1}^{n} Y_{nl}\Big\|^2,$$
where $Y_{nl}$, for each $n = 1, 2, \ldots$ and $l = 1, \ldots, n$, is an $L_2[0,1]$-valued random variable defined by
$$Y_{nl}(t) = 1_{\{z_l \le k/n\}} - k/n \quad \text{if } (k-1)/n < t \le k/n, \qquad k = 1, \ldots, n,$$
where $z_l = m_n(X_l^*)$ and $m_n$ is a map defined on $X_k(\omega)$, $k = 1, \ldots, n$, by $m_n(X_{(k)}(\omega)) = k/n$, the $X_{(k)}(\omega)$, $k = 1, \ldots, n$, being the values of the order statistics corresponding to $X_1, \ldots, X_n$ (the $X_{(k)}(\omega)$, $k = 1, \ldots, n$, are assumed to be different, and they are different with probability 1).
Then we apply a Berry-Esseen type estimate (Corollary 7 in Sazonov, Ulyanov and Zalesskii (1988)) to the sequence $Y_{nl}$, $l = 1, \ldots, n$, of $L_2[0,1]$-valued i.i.d. random variables to obtain
$$\sup_{y \ge 0}\Big|P^*\Big(\Big\|n^{-1/2}\sum_{l=1}^{n} Y_{nl}\Big\|^2 \le y\Big) - G_n(y)\Big| \le c(\varepsilon)\,n^{-1+\varepsilon}, \qquad (5)$$
where $G_n(y) = P(\|N_n\|^2 \le y)$ and $N_n$ is a Gaussian $L_2[0,1]$-valued random variable with $EN_n = 0$ and the same covariance operator $V_n$ as $Y_{nl}$. The eigenvalues of $V_n$ are
$$w_{nk} = \begin{cases} \big(4n^2\sin^2(\pi k/(2n))\big)^{-1}, & k = 1, \ldots, n-1, \\ 0, & k \ge n. \end{cases}$$
Finally, using explicit expressions for the eigenvalues of $V$ and $V_n$, one can prove that $\delta_{n3} \le cn^{-1}$, and this leads to the estimate of $\delta_n(\omega)$ stated above.
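The eigenvalues $w_{nk}$ can be checked numerically. On step functions constant on the intervals $((k-1)/n, k/n]$, the covariance operator $V_n$ acts as the matrix $n^{-1}C$, where $C_{kk'} = \min(k, k')/n - kk'/n^2$ is the covariance of $1_{\{z \le k/n\}} - k/n$ for $z$ uniform on $\{1/n, \ldots, 1\}$; on the orthogonal complement it vanishes. The sketch compares the eigenvalues of $n^{-1}C$ with the closed form and also shows that $w_{n1}$ approaches $\pi^{-2}$, the largest eigenvalue of $V$. Function names are hypothetical.

```python
import numpy as np


def covariance_matrix(n):
    """C[k, k'] = Cov(1{z <= k/n}, 1{z <= k'/n}), z uniform on {1/n, ..., 1}."""
    k = np.arange(1, n + 1)
    return np.minimum.outer(k, k) / n - np.outer(k, k) / n**2


def eigenvalues_Vn(n):
    """Eigenvalues of V_n: on step functions it acts as C / n (interval length 1/n)."""
    return np.sort(np.linalg.eigvalsh(covariance_matrix(n) / n))


def eigenvalues_closed_form(n):
    """w_nk = (4 n^2 sin^2(pi k / (2n)))^{-1}, k = 1..n-1, plus the zero eigenvalue."""
    k = np.arange(1, n)
    w = 1.0 / (4 * n**2 * np.sin(np.pi * k / (2 * n)) ** 2)
    return np.sort(np.append(w, 0.0))


for n in (5, 12, 40):
    print(n, np.max(np.abs(eigenvalues_Vn(n) - eigenvalues_closed_form(n))))
print(eigenvalues_closed_form(200)[-1], 1 / np.pi**2)  # w_{n1} vs (pi * 1)^{-2}
```

The trace of $V_n$ equals $(n^2 - 1)/(6n^2)$, while $\operatorname{tr} V = \sum_j (\pi j)^{-2} = 1/6$, so the traces differ by $1/(6n^2)$.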
The proof of (4) goes basically along the same lines. The main differences are these: in proving (4) we have to use a Berry-Esseen type estimate for non-identically distributed independent Hilbert space valued summands (see Ulyanov (1987)), and instead of $\|n^{-1/2}\sum_{l=1}^{n} Y_{nl}\|$ we have to deal with the norm of a similar normalized sum multiplied by the random factor $n/|\sum_{j=1}^{n} U_j|$. Due to the presence of this random factor, our approach leads to a lower order estimate (4) as compared to (3). It is appropriate to note that estimates (3) and (4) have only theoretical value, since when the distribution function $F$ is continuous there is no need to bootstrap $\omega_n^2$.
We are grateful to Andrew Rukhin for the information related to the operator $V_n$, in particular for the reference to Rutherford (1951), which made it possible to identify precisely the eigenvalues of $V_n$.
References
Bentkus, V., Götze, F., Paulauskas, V. and Rackauskas, V. (1990). The accuracy of Gaussian approximation in Banach spaces, SFB 343 "Diskrete Strukturen in der Mathematik", Preprint 90-100, Universität Bielefeld.
Bentkus, V., Götze, F. and Zitikis, R. (1992). Lower estimates of the convergence rate for U-statistics, SFB 343 "Diskrete Strukturen in der Mathematik", Preprint 92-075, Universität Bielefeld.
Borovskikh, Yu. V. (1986). Theory of U-statistics in Hilbert space, Preprint No. 86.78, Institute of Mathematics, Ukrain. Acad. of Sci., Kiev (Russian).
Borovskikh, Yu. V. and Koroljuk, V. S. (1990). UH-statistics, Sov. Math. Dokl., 40: 432-435.
Götze, F. (1979). Asymptotic expansions for bivariate von Mises functionals, Z. Wahrsch. Verw. Gebiete, 50: 333-355.
Koroljuk, V. S. and Borovskikh, Yu. V. (1986). Approximation of nondegenerate U-statistics, Theory Probab. Appl., 30: 439-450.
Koroljuk, V. S. and Borovskikh, Yu. V. (1991). Rate of convergence in the central limit theorem for UH-statistics, Theor. Probab. Math. Statist., 43: 79-85.
Koroljuk, V. S. and Borovskikh, Yu. V. (1994). Theory of U-statistics, Kluwer, Dordrecht.
Lee, A. J. (1990). U-statistics: Theory and Practice, M. Dekker, New York.
Petrov, V. V. (1975). Sums of Independent Random Variables, Springer-Verlag, Berlin.
Puri, M. L. and Sazonov, V. V. (1991). On Hilbert space valued U-statistics, Theory Probab. Appl., 36: 604-605.
Rutherford, D. E. (1951). Some continuant determinants arising in Physics and Chemistry, Proc. Roy. Soc. Edin. A, 63: 232-241.
Sazonov, V. V., Ulyanov, V. V. and Zalesskii, A. (1988). Normal approximation in Hilbert space I-II, Theor. Probab. Appl., 33: 207-227, 473-487.
Sazonov, V. V., Ulyanov, V. V. and Zalesskii, A. (1991). A precise estimate of the rate of convergence in the central limit theorem in Hilbert space, Math. USSR Sbornik, 68: 453-482.
Serfling, R. J. and Grams, W. E. (1973). Convergence rates for U-statistics and related statistics, Ann. Statist., 1: 153-160.
Ulyanov, V. V. (1987). Asymptotic expansions for distributions of sums of independent random variables in H, Theory Probab. Appl., 31: 25-39.
Yurinskii, V. V. (1982). On the accuracy of normal approximation of the probability of hitting a ball, Theory Probab. Appl., 27: 280-289.
On the Strong Form of the Faber Theorem BORIS SHEKTMAN, Department of Mathematics, University of South Florida, Tampa, FL 33620
ABSTRACT. Let $H_n^\infty$ be the space of polynomials of degree at most $n-1$ on the unit circle with the uniform norm. We show that the dual spaces $(H_n^\infty)^*$ cannot be embedded uniformly in any $L_1$-space. This result implies (but is not equivalent to) the famous theorem of Faber.
Let $H_n$ be the space of polynomials of degree at most $n-1$ and let $H_n^p$ be the $n$-dimensional Banach space $H_n$ considered as a subspace of $L_p(\mathbb{T})$, where $\mathbb{T}$ is the unit circle. We use $d(X, Y)$ to denote the Banach-Mazur distance, $\pi_1(T)$ to denote the absolutely summing norm of an operator $T$, and $\gamma_\infty(T)$ for the $L_\infty$-factorization norm of $T$ (cf. [4]).

Theorem 1. There exists a constant $C > 0$ such that, if $E_n$ is an $n$-dimensional subspace of an $L_1$-space, then $d((H_n^\infty)^*, E_n) \ge C\log n$.

Proof. Let $X$ be an $L_1$-space and let $E_n \subset X$ be an $n$-dimensional subspace. Let $T: (H_n^\infty)^* \to X$ be such that $T(H_n^\infty)^* = E_n$. We want to prove that
$$\|T\|\,\|T^{-1}\| \ge C\log n.$$
Assume without loss of generality that $\|T^{-1}\| = 1$. Then (cf. [3]) there are vectors $x_1^*, \ldots, x_n^* \in X^*$ such that $T^*x_k^* = e^{i(k-1)\theta}$, $k = 1, \ldots, n$, and $\|x_k^*\| \le 1$.
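Consider the following operators:
$J: H_n^\infty \to H_n^1$, $Je^{ik\theta} = e^{ik\theta}$;
$S: H_n^1 \to \ell_1^{(n)}$, $Se^{ik\theta} = \frac{1}{k+1}e_{k+1}$, where $e_1, \ldots, e_n$ is the canonical basis in $\ell_1^{(n)}$;
$A: \ell_1^{(n)} \to X^*$, $Ae_k = x_k^*$.
We illustrate these operators on the diagram
$$X^* \xrightarrow{T^*} H_n^\infty \xrightarrow{J} H_n^1 \xrightarrow{S} \ell_1^{(n)} \xrightarrow{A} X^*.$$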
By the Pietsch Factorization Theorem (cf. [4]),
$$\pi_1(J) = \|J\| = 1.$$
By the Hardy inequality (cf. [1]) there exists a constant $c > 0$, independent of $n$, such that $\|S\| \le c$. Finally, since $\|x_k^*\| \le 1$ for every $k$, we have $\|A\| \le 1$. Since $A$ maps an $\ell_1$-space into an $L_\infty$-space, we have
$$\gamma_\infty(A) = \|A\| \le 1.$$
Next observe that
$$\operatorname{tr}(ASJT^*) = \operatorname{tr}(T^*ASJ) = \sum_{k=0}^{n-1}\frac{1}{k+1} \ge \log n.$$
By trace duality (cf. [4]),
$$\log n \le \operatorname{tr}(ASJT^*) \le \pi_1(SJT^*)\,\gamma_\infty(A) \le \|S\|\,\pi_1(J)\,\|T^*\|\,\|A\| \le c\|T\|.$$
Thus
$$\|T\| = \|T^*\| \ge \frac{1}{c}\log n. \qquad \square$$
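The two scalar facts behind this proof can be illustrated numerically (this does not reproduce the operator argument): the trace is the harmonic number $\sum_{k=0}^{n-1} 1/(k+1) \ge \log n$, and Hardy's inequality bounds $\sum_k |a_k|/(k+1)$ by a constant times the $L_1$ norm of $f = \sum_k a_k e^{ik\theta}$. We assume the classical form of Hardy's inequality with constant $\pi$ and the normalized measure $d\theta/2\pi$; the $L_1$ norm is approximated by a Riemann sum computed with an inverse FFT. Function names are hypothetical.

```python
import numpy as np


def trace_sum(n):
    """tr(T* A S J) = sum_{k=0}^{n-1} 1/(k+1), the n-th harmonic number."""
    return float(np.sum(1.0 / np.arange(1, n + 1)))


def hardy_ratio(coeffs, grid_size=1 << 16):
    """(sum_k |a_k|/(k+1)) / ||f||_1 for f(theta) = sum_{k>=0} a_k e^{ik theta}.

    ||f||_1 uses the normalized measure d theta / (2 pi), approximated by a
    Riemann sum on an equispaced grid (values obtained from an inverse FFT).
    """
    coeffs = np.asarray(coeffs, dtype=complex)
    padded = np.zeros(grid_size, dtype=complex)
    padded[: len(coeffs)] = coeffs
    values = grid_size * np.fft.ifft(padded)  # f at theta_j = 2 pi j / grid_size
    l1_norm = float(np.mean(np.abs(values)))
    lhs = float(np.sum(np.abs(coeffs) / np.arange(1, len(coeffs) + 1)))
    return lhs / l1_norm


for n in (10, 100, 1000):
    print(n, trace_sum(n), np.log(n))
print(hardy_ratio(np.ones(64)), np.pi)  # Dirichlet-type analytic polynomial
```

For the Dirichlet-type polynomial the numerator grows like $\log n$ while the $L_1$ norm also grows like $\log n$, which is why the ratio stays bounded.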
Theorem 2. The results of Theorem 1 remain true if we replace the space $H_n^\infty$ by any of the following spaces:
(1) $\mathrm{span}[e^{i\lambda_k\theta}]_{k=1}^{n} \subset C(\mathbb{T})$, where the $\lambda_k$ are arbitrary distinct integers;
(2) $\mathrm{span}[\cos j\theta]_{j=1}^{n} \subset C[0, \pi]$;
(3) $\mathrm{span}[t^k]_{k=1}^{n} \subset C[a, b]$.

Proof. The proof is the same as that of Theorem 1 if we replace the use of the Hardy inequality by the generalized Hardy inequality (cf. [2]) for the trigonometric polynomials and by the Sidon inequality (cf. [6]) for the cosine polynomials. The isometry between the algebraic and cosine polynomials proves the remaining case. $\square$

Corollary 1 (Faber Theorem). Let $P_n: C(\mathbb{T}) \to H_n$ be an arbitrary sequence of projections onto the trigonometric polynomials. Then $\|P_n\| \ge C\log n$.
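Proof. Let $P_n: C(\mathbb{T}) \to H_n^\infty$. Then $P_n^*: (H_n^\infty)^* \to C^*(\mathbb{T})$. Moreover, $P_n^*(H_n^\infty)^*$ is an $n$-dimensional subspace of $C^*(\mathbb{T})$ and $\|(P_n^*)^{-1}\| = 1$. Since $C^*(\mathbb{T})$ is an $L_1$-space, we conclude from Theorem 1 that $\|P_n\| = \|P_n^*\|\,\|(P_n^*)^{-1}\| \ge d\big((H_n^\infty)^*, P_n^*(H_n^\infty)^*\big) \ge C\log n$. $\square$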
Remark 1. The result of Theorem 1 is sharp.
Proof. Let $Q_n$ be the Fourier projections from $C(\mathbb{T})$ onto $H_n$. Then $\|Q_n\| \sim \log n$ and
We will now mention two examples to show that Theorem 1 does not follow from the Faber Theorem.
Example 1. Let $A(\mathbb{T})$ be the disk algebra. Then, for any sequence of projections $P_n$ from $A(\mathbb{T})$ onto $H_n$, we have (cf. [6]) $\|P_n\| \ge C\log n$. Despite this fact, the spaces $(H_n^\infty)^*$ are uniformly isomorphic to subspaces of $A^*(\mathbb{T})$. Indeed, it was shown by Bourgain and Pełczyński (cf. [5]) that the $H_n^\infty$ are uniformly isomorphic to subspaces $V_n$ of $A(\mathbb{T})$ which are uniformly complemented. Using the reasoning of Corollary 1, we get the desired conclusion. $\square$

It may appear that the reason for this example lies in the fact that $A(\mathbb{T})$ is not an $L_\infty$-space and that the relative projectional constant of $H_n^\infty$ in $A(\mathbb{T})$ is not isomorphic-invariant. We therefore give a second example.
Example 2. Let $V_n \subset C(\mathbb{T})$ be such that $V_n$ is isometric to $\ell_2^{(n)}$. It is well known (cf. [4]) that for any sequence of projections $P_n$ from $C(\mathbb{T})$ onto $V_n$ we have $\|P_n\| \ge C\sqrt{n}$. Yet the $V_n^*$ are isometric to $\ell_2^{(n)}$ and hence (by Dvoretzky's Theorem) are uniformly isomorphic to subspaces of $C^*(\mathbb{T})$, which is an $L_1$-space. $\square$

I am grateful to B. Chalmers, V. Curaric and S. Kisliakov for many suggestions and for the encouragement to write up this note.
References
[1] Hoffman, K., Banach Spaces of Analytic Functions, Prentice-Hall, Englewood Cliffs, NJ, 1962.
[2] McGehee, O. C., Pigno, L. and Smith, B., Hardy's inequality and the $L^1$ norm of exponential sums, Ann. of Math., 113 (1981), 613-618.
[3] Rudin, W., Functional Analysis, McGraw-Hill Inc., 1973.
[4] Tomczak-Jaegermann, N., Banach-Mazur Distances and Finite-Dimensional Operator Ideals, Wiley, New York, 1985.
[5] Wojtaszczyk, P., Banach Spaces for Analysts, Cambridge Univ. Press, Cambridge, 1991.
[6] Zygmund, A., Trigonometric Series, Cambridge Univ. Press, Cambridge, 1959.
Nonlinear Filtering Theory for Stochastic Reaction-Diffusion Equations
S. L. HOBBS, Naval Command, Control and Ocean Surveillance Center, Code 784, San Diego, CA 92152-6040
S. S. SRITHARAN, Naval Command, Control and Ocean Surveillance Center, Code 574, San Diego, CA 92152-6040
Supported by the ONR Mathematical Sciences and Mechanics Divisions; Affiliations: University of Colorado and SDSU
This paper is dedicated to Professor M. M. Rao on the occasion of his sixty-fifth birthday.
Abstract. We examine the nonlinear filtering problem for the stochastic reaction-diffusion equation with additive noise. We derive the Zakai and Fujisaki-Kallianpur-Kunita filtering equations for the evolution of the un-normalized and normalized conditional expectation, and prove existence and uniqueness theorems for measure valued solutions.
1 INTRODUCTION
The reaction-diffusion equation is an elementary model in a number of physical applications, including combustion. The purpose of this article is to study the filtering problem for the stochastic reaction-diffusion equation with bounded nonlinearity and additive, trace class white noise. One of the important applications would be in combustion control with partial observations. In this paper we will derive the Zakai and Fujisaki-Kallianpur-Kunita (FKK) filtering equations for the evolution of the (un-normalized and normalized) conditional expectation (filter), and prove existence and uniqueness theorems for measure valued solutions. The filtering problem for finite dimensional systems has been studied for many years, but filtering for systems governed by partial differential equations has only recently begun to be developed [Sri94]. In [Ahm94] a nonlinear filtering result for semilinear equations, obtained by an entirely different approach, is announced. The reaction-diffusion equation governs the state of a system $u(t)$ which evolves in time according to a parabolic partial differential equation with a nonlinear interaction term. With stochastic forcing by a Wiener process $w(t)$, we have, on a complete probability space $(\Omega, \mathcal{F}, m)$,
$$du(t) = [Au + B(u)]\,dt + G\,dw(t) \qquad (1)$$
where $A$ is an elliptic partial differential operator, $B$ is a nonlinear function of $u$, and $G$ is a bounded linear operator (which could be the identity). To this stochastic evolution equation we add an observation vector $z(t) \in \mathbb{R}^m$ for each time $t$, which evolves according to its own stochastic ordinary differential equation,
$$dz(t) = h(u(t))\,dt + dw_2(t) \qquad (2)$$
where $h(\cdot)$ is an $m$-vector valued function and $w_2(\cdot)$ is an $m$-vector Wiener process which is not necessarily independent of $w(\cdot)$. The filtering problem is to estimate some (scalar) function $f(t, u(t))$ of the system state at time $t$ given the set of observations $\{z(s): 0 < s \le t\}$ up to time $t$. It is well known that the best (minimum variance) estimate of this quantity is the conditional expectation
$$\Pi_t(f) = E[f(t, u(t)) \mid \mathcal{F}_t] \qquad (3)$$
given the $\sigma$-field $\mathcal{F}_t = \sigma\{z(s): 0 < s \le t\}$ generated by the observations. For the time evolution of $\Pi_t(f)$ we will derive the following equation using the method of [FKK72]:
$$d\Pi_t(f) = \Pi_t(\partial_t f + \mathcal{L}f)\,dt + [\Pi_t(Mf) - \Pi_t(f)\Pi_t(h)]\cdot[dz - \Pi_t(h)\,dt] \qquad (4)$$
where
$$\mathcal{L}f = \frac12\operatorname{tr}(GQG^*\partial_{uu}f) + \langle Au + B(u), \partial_u f\rangle \qquad (5)$$
and
$$Mf = G_2^*\partial_u f + fh. \qquad (6)$$
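Here $Q$ is the covariance operator of the process $w(t)$ and $G_2$ is a linear operator (described below) from the observation space to the state Hilbert space. The Zakai equation [Zak69]
$$d\Theta_t(f) = \Theta_t(\partial_t f + \mathcal{L}f)\,dt + \Theta_t(Mf)\cdot dz(t) \qquad (7)$$
is satisfied by an un-normalized conditional expectation $\Theta_t(f)$ of $f$ given $\mathcal{F}_t$; the two are related by the equation
$$\Theta_t(f) = \Pi_t(f)\exp\Big\{\int_0^t \Pi_s(h)\cdot dz(s) - \frac12\int_0^t |\Pi_s(h)|^2\,ds\Big\} \qquad (8)$$
and its inverse
$$\Pi_t(f) = \frac{\Theta_t(f)}{\Theta_t(1)}. \qquad (9)$$
A measure valued solution to the FKK equation gives an evolving conditional probability measure $\Pi_t$ defined on the Borel subsets of the state space $H$ in which $u(t)$ takes its values, so that the conditional expectation may be computed as
$$\Pi_t(f) = \int_H f(t, u)\,\Pi_t(du), \qquad (10)$$
and an un-normalized measure $\Theta_t$ satisfies
$$\Theta_t(f) = \int_H f(t, u)\,\Theta_t(du) \qquad (11)$$
for any $f$ in a restricted class of test functions.

2 ASSUMPTIONS AND NOTATION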
We will make the following assumptions.
(A1) The linear (unbounded) operator $A: D(A) \subset H \to H$ is self-adjoint and generates a strongly continuous semigroup $S(t)$, $t \ge 0$, of bounded linear operators on $H$.
(A2) The nonlinear operator $B: H \to H$ is defined and continuous on all of $H$ and satisfies the following Lipschitz and linear growth conditions: there exists a constant $C > 0$ such that $\|B(u) - B(v)\| \le C\|u - v\|$ and $\|B(u)\|^2 \le C^2(1 + \|u\|^2)$ for all $u, v \in H$.
(A3) We assume the Wiener process $w(t)$ takes values in its own separable Hilbert space $U$.
(A4) $G: U \to H$ is a bounded linear transformation and does not depend on $t$ or $u$.
We will also assume that $w(t)$ has a trace class covariance operator ([DZ92], ch. 4.1). That is, $E\,w(s) \otimes w(t) = (t \wedge s)Q$, where $Q: U \to U$ is a positive definite, self-adjoint, bounded linear operator with finite trace. If we denote by $\nu_i$ and $v_i$ the eigenvalues and (complete orthonormal set of) eigenfunctions of $Q$, then the trace of $Q$ is $\sum_{i=1}^\infty \nu_i$ and we may write ([DZ92], ch. 4.1)
$$w(t) = \sum_{i=1}^\infty \sqrt{\nu_i}\,\beta_i(t)\,v_i \qquad (12)$$
as an expansion of $w(t)$, where the $\beta_i$ are independent standard real valued Wiener processes. The convergence is in $L^2$. In order to derive our results we will generally restrict the functions $f$ and $h$ to being of class $C_b^2$ (notation below), but explicit conditions will be given in the theorems below. For the purpose of expressing (and proving) the Zakai and FKK filtering equations, it is useful to model the noise process $w(t)$ (or, more precisely, $Gw(t)$) as the sum
$$G\,dw(t) = G_1\,dw_1(t) + G_2\,dw_2(t), \qquad (13)$$
where $G_2: \mathbb{R}^m \to H$ is a bounded, full rank linear operator. This is possible by setting $G_1\,dw_1(t)$ equal to the difference between the first and last terms. The Wiener process $w_1(t)$ takes its values in a separable Hilbert space $U_1$ ($U_1 = H$ is one possibility) and $G_1: U_1 \to H$ is bounded linear. If we use (13) and adjoin a random initial condition, we get
$$du(t) = [Au + B(u)]\,dt + G_1\,dw_1(t) + G_2\,dw_2(t), \qquad (14)$$
$$u(0) = u_0. \qquad (15)$$
We will fix $T > 0$ throughout and work on the time interval $[0, T]$. The results of the paper also hold on $[0, \infty)$, except that in this case convergence, e.g., in $C([0, \infty); H)$, is in the topology of uniform convergence on compact subsets of $[0, \infty)$. There will be several filtrations $\mathcal{F}_t$ of increasing $\sigma$-fields contained in $\mathcal{F}$. Usually these filtrations are generated by one or more processes with independent increments; for such processes $v(t)$ we will denote by $\mathcal{F}_t^v$ the completion of $\sigma\{v(s): 0 < s \le t\}$, and we include in every $\mathcal{F}_t^v$ all $P$-null sets.
3 SOLVABILITY OF THE STOCHASTIC REACTION-DIFFUSION EQUATION
We can take the following basic probability space:
$$\Omega = H \times C([0, T]; U), \qquad \mathcal{F} = \mathcal{B}(H \times C([0, T]; U)), \qquad m = \mu_0 \times \lambda,$$
where $\mathcal{B}(\cdot)$ is the Borel $\sigma$-algebra, $\mu_0$ is the distribution of the initial data and $\lambda$ is the Wiener measure. On the basic probability space $(\Omega, \mathcal{F}, m)$ given above we will take as our normal filtration $\mathcal{F}_t^{u_0,w}$ the complete $\sigma$-fields generated by $u_0$ and $w(\cdot)$ and all $P$-null sets ([DZ92], ch. 3.3 and 7.1). Since $u_0$ is $\mathcal{F}_0^{u_0,w}$-measurable and $\mathcal{F}_t^{u_0,w} \subset \mathcal{F}$, the solution $u(t)$ of (14) and (15) will be a predictable process with respect to this filtration.
Definition 1 For any $H$-valued $\mathcal{F}_0^{u_0,w}$-measurable random variable $u_0$, a predictable (with respect to $\mathcal{F}_t^{u_0,w}$) $H$-valued process $u(t)$, $0 \le t \le T$, is a mild solution of (14) and (15) if, for all $t \in [0, T]$,
i) $m\big\{\int_0^t \|u(s)\|^2\,ds < \infty\big\} = 1$, and
ii) $u(t) = S(t)u_0 + \int_0^t S(t-s)B(u(s))\,ds + \int_0^t S(t-s)G\,dw(s)$, $m$-a.s.
The condition that the process $u(t)$ be predictable with respect to the filtration $\mathcal{F}_t^{u_0,w}$ is important, for it will play a role in our main result. This means ([DZ92], ch. 3.3) that $u(t) = u(t, \omega)$ is measurable with respect to $\mathcal{P}_T$, the (completed) $\sigma$-field generated by all subsets of $[0, T] \times \Omega$ which have the form $(s, t] \times F$, where $0 \le s < t \le T$ and $F \in \mathcal{F}_s^{u_0,w}$.
Theorem 1 Let assumptions (A1) to (A4) hold and let the initial value $u_0$ be an $H$-valued random variable which is independent of $w(\cdot)$ and satisfies $E\|u_0\|^q < \infty$ for some $q \ge 2$. Then the initial value problem (14) and (15) has a unique (up to equivalence) mild solution $u(t)$. Further, $u(t)$ has a version whose trajectories are continuous a.s., i.e., $u(\cdot) \in C([0, T]; H)$, and there exists $C > 0$ (depending on $T$) such that $\sup_{t \in [0, T]} E\|u(t)\|^q \le C(1 + E\|u_0\|^q)$.

Denoting by $X_q$ the Banach space of $H$-valued predictable processes $v(t)$ such that the norm $\big(\sup_{0 \le t \le T} E\|v(t)\|^q\big)^{1/q} < \infty$, a proof of the existence of $u(t)$ is obtained by taking $u(t)$ as the limit of successive approximations of the mapping $K: X_q \to X_q$ defined by
$$Kv(t) = S(t)u_0 + \int_0^t S(t-s)B(v(s))\,ds + \int_0^t S(t-s)G\,dw(s). \qquad (16)$$
$K$ is a contraction mapping on sufficiently small subintervals of $[0, T]$ ([DZ92], Theorem 7.4, or [Ich82]). We note three things that we will need for Theorems 2 and 3, the main results of this paper. First, although the solution $u(t)$ only satisfies (14) in a mild sense (it need not take values in $D(A)$), it is $H$-valued and not just a distributional solution. Second, as a mild solution, $u(t)$ is predictable and hence adapted to the filtration $\mathcal{F}_t^{u_0,w}$ ([DZ92], ch. 7.1). An examination of the proof shows that $\mathcal{F}_t^{u_0,w}$ is the smallest $\sigma$-field that can be used for $\mathcal{F}_t$, so $u(t)$ is indeed $\mathcal{F}_t^{u_0,w}$-adapted. Third, $u(t)$ is a measurable function of $u_0$ and $w(s)$ for $s \le t$. Indeed, on the subinterval $(t_i, t_{i+1})$, $u(t)$ is the limit of a sequence of the form $u^i(t) = Ku^{i-1}(t)$, where $Kv(t)$ is the limit of sums of the form
$$S(t - t_i)v(t_i) + \sum_k S(t - s_k)B(v(s_k))(s_k - s_{k-1}) + \sum_k S(t - s_k)G\big(w(s_k) - w(s_{k-1})\big), \qquad (17)$$
and the sums are taken over a partition $\{s_k\}$ of the subinterval. Each term in these sums is clearly measurable with respect to the starting data ($S(t)$ is continuous) and the $\sigma$-field generated by $w(t)$ on the relevant subinterval only.
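The partition sums (17) suggest the exponential (semigroup) Euler scheme for the mild solution. Here is a minimal sketch, under assumptions that are ours and not the paper's: $H = L^2(0, \pi)$, $A = \partial^2/\partial x^2$ with Dirichlet boundary conditions (so $Ae_k = -k^2 e_k$ for $e_k = \sqrt{2/\pi}\sin kx$), $G = I$, $Q$ diagonal in the basis $e_k$ with eigenvalues $\nu_k$ as in (12), and $B(u) = \sin u$ pointwise, which is Lipschitz and satisfies the growth condition in (A2). The state is truncated to $K$ modes and $B(u)$ is projected by sine quadrature; the function name is hypothetical.

```python
import numpy as np


def simulate_reaction_diffusion(a0, T, n_steps, nu, rng, nonlinearity=np.sin):
    """Exponential-Euler sketch for du = [u_xx + B(u)] dt + dw on (0, pi).

    a0: coefficients of u(0) in e_k(x) = sqrt(2/pi) sin(k x), k = 1..K.
    nu: eigenvalues of the covariance Q in the same basis (trace class).
    B(u) = nonlinearity(u), applied pointwise on a grid.
    """
    a = np.asarray(a0, dtype=float).copy()
    nu = np.asarray(nu, dtype=float)
    K = len(a)
    k = np.arange(1, K + 1)
    x = np.pi * k / (K + 1)                                # interior grid points
    basis = np.sqrt(2 / np.pi) * np.sin(np.outer(x, k))    # basis[j, k-1] = e_k(x_j)
    weight = np.pi / (K + 1)                               # quadrature weight
    dt = T / n_steps
    decay = np.exp(-(k**2) * dt)                           # S(dt) e_k = exp(-k^2 dt) e_k
    for _ in range(n_steps):
        u_grid = basis @ a                                 # u on the grid
        b = weight * (basis.T @ nonlinearity(u_grid))      # coefficients of B(u)
        dw = np.sqrt(nu * dt) * rng.standard_normal(K)     # sqrt(nu_k) * increment of beta_k
        a = decay * (a + b * dt + dw)                      # u <- S(dt)[u + B(u) dt + G dw]
    return a


rng = np.random.default_rng(7)
a_T = simulate_reaction_diffusion(np.eye(8)[0], 1.0, 200, 1.0 / np.arange(1, 9) ** 2, rng)
print(a_T)
```

Each step applies $S(\Delta t)$ to $u + B(u)\Delta t + G\Delta w$, mirroring one term of (17); in the linear, noise-free case it reproduces $S(t)u_0$ exactly.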
4 DERIVATION OF THE FKK AND ZAKAI EQUATIONS

In this section we will derive evolution equations for the conditional expectation $\Pi_t(f) = E[f(u(\cdot, t)) \mid \mathcal{F}_t^z]$. Let us define the innovation process $Y(t) = \{Y_1(t), \ldots, Y_m(t)\}$ as
$$Y(t) = z(t) - \int_0^t \Pi_r(h)\,dr. \qquad (18)$$
Lemma 1 [FKK72] Let $u(\cdot)$ be the solution of (14)-(15) (Theorem 1), $z(t)$ be the observation process defined in (2), and $h(\cdot) \in C_b(H; \mathbb{R}^m)$. Then $(Y(t), \mathcal{F}_t, m)$ is an $m$-vector standard Wiener process. Moreover, the two $\sigma$-fields $\sigma\{Y(r) - Y(s),\ t \le s < r \le T\}$ and $\mathcal{F}_t$ are independent.

The following martingale representation result due to Fujisaki, Kallianpur and Kunita [FKK72] is the key to the derivation of the nonlinear filtering equation.
Lemma 2 Every square integrable martingale $(M(t), \mathcal{F}_t, m)$ is sample continuous and can be represented as a stochastic integral with respect to the innovation process:
$$M(t) = E[M(0)] + \int_0^t \Phi(s)\cdot dY(s), \qquad t \in [0, T], \qquad (19)$$
where
$$E\int_0^T |\Phi(t)|^2\,dt < +\infty \qquad (20)$$
and $\Phi(t)$ is jointly measurable in $(0, T) \times \Omega$ and adapted to $\mathcal{F}_t$.
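Definition 2 The class of cylindrical test functions $C_{cy}$ is defined as follows:
$$C_{cy} = \big\{f(\cdot,\cdot): [-a, T] \times H \to \mathbb{R};\ f(t, u) = \phi(t, (u, e_1), \ldots, (u, e_n)),\ e_i \in D(A),\ i = 1, \ldots, n;\ \phi \in C_0^\infty((-a, T) \times \mathbb{R}^n)\big\}, \quad a > 0. \qquad (21)$$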
We now define (22), where $f \in C_{cy}$ and $\mathcal{L}$ is given by (5).
Lemma 3 For all $f \in C_{cy}$, $(M_f(t), \mathcal{F}_t, m)$ is a square integrable martingale in $[0, T]$.
This follows from the fact that, for the mild solution, combining the results in [DZ92] and [Vio76], we can conclude that, for $f \in C_{cy}$,
$$M_f(t) := f(t, u(t)) - f(0, u(0)) - \int_0^t\Big(\frac{\partial f}{\partial s}(s, u(s)) + \mathcal{L}f(s, u(s))\Big)ds \qquad (23)$$
is a square integrable $\mathcal{F}_t^{u_0,w}$-martingale (see also [HSS95] for details). We now note the following estimate:
$$E\int_0^T |f(u(t))h(u(t))|^2\,dt < +\infty. \qquad (24)$$
Under condition (24) we can follow the method in [FKK72] to obtain the explicit form of $\Phi(t)$ in (19) using Lemmas 2 and 3:
$$\Phi(t) = \Pi_t(Mf) - \Pi_t(f)\Pi_t(h), \qquad (25)$$
where $M$ is defined in (6). We thus get the Fujisaki-Kallianpur-Kunita equation (4) for $f \in C_{cy}$.
Due to the boundedness of $h$, we have
$$E\int_0^T |\Pi_s(h)|^2\,ds < \infty. \qquad (26)$$
Define $\Theta_t(f)$ for $f \in C_{cy}$ as
$$\Theta_t(f) = \Pi_t(f)\exp\Big\{\int_0^t \Pi_s(h)\cdot dz(s) - \frac12\int_0^t |\Pi_s(h)|^2\,ds\Big\}. \qquad (27)$$
Then by the Itô formula (see [HSS95]) we get the Zakai equation (7) for $f \in C_{cy}$.
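The relation between the un-normalized and normalized filters, $\Pi_t(f) = \Theta_t(f)/\Theta_t(1)$, can be illustrated on a finite-state, discrete-time analogue; this is a simplification for illustration only, not the infinite-dimensional setting of this paper. The hidden state lives on a finite grid with a transition matrix, and each observation increment $dz$ multiplies the mass at state $x$ by the likelihood factor $\exp(h(x)\,dz - h(x)^2\,dt/2)$. All names and numbers below are hypothetical.

```python
import numpy as np


def unnormalized_filter(prior, transition, h, dz, dt):
    """Finite-state analogue of Theta_t: masses on a grid, never renormalized.

    transition[i, j] = P(next state = j | current state = i).
    Each step: predict with the transition matrix, then multiply by the
    likelihood factor exp(h * dz - h**2 * dt / 2) for the increment dz.
    """
    theta = np.asarray(prior, dtype=float).copy()
    for d in dz:
        theta = transition.T @ theta
        theta = theta * np.exp(h * d - 0.5 * h**2 * dt)
    return theta


def normalized_filter(prior, transition, h, dz, dt):
    """Finite-state analogue of Pi_t: same recursion, renormalized every step."""
    pi = np.asarray(prior, dtype=float).copy()
    for d in dz:
        pi = transition.T @ pi
        pi = pi * np.exp(h * d - 0.5 * h**2 * dt)
        pi = pi / pi.sum()
    return pi


prior = np.array([0.2, 0.5, 0.3])
transition = np.array([[0.9, 0.1, 0.0], [0.05, 0.9, 0.05], [0.0, 0.1, 0.9]])
h = np.array([-1.0, 0.0, 1.0])
dt = 0.01
dz = np.random.default_rng(8).normal(0.0, np.sqrt(dt), size=50)
theta = unnormalized_filter(prior, transition, h, dz, dt)
print(theta / theta.sum(), normalized_filter(prior, transition, h, dz, dt))
```

Normalizing at every step or only at the end gives the same probability vector, because each step of the un-normalized recursion is linear in the masses.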
5 KOLMOGOROV'S BACKWARD EQUATION
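The proof of the uniqueness of measure valued solutions in Theorem 3 will be based on having a unique solution of Kolmogorov's backward equation,
$$\frac{\partial\Phi}{\partial t}(t, v) + \frac12\operatorname{tr}\big(GQG^*D_v^2\Phi(t, v)\big) + \big\langle Av + B(v) + G_2\xi(t), D_v\Phi(t, v)\big\rangle + h(v)\cdot\xi(t)\,\Phi(t, v) = 0, \quad 0 \le t < T,\ v \in D(A), \qquad (28)$$
and
$$\Phi(T, v) = \Psi(v), \qquad v \in H. \qquad (29)$$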
Definition 3 A strict solution to (28)-(29) is a function $\Phi: [0, T] \times H \to \mathbb{R}$ such that
(i) $\Phi \in C_b([0, T] \times H)$,
(ii) $\Phi(t, \cdot) \in C_b^2(H)$ for all $t \ge 0$,
(iii) $\Phi \in C^1([0, T] \times D(A))$, and (28) is satisfied for all $v \in D(A)$ and $t \ge 0$.
Proposition 1 [DZ92] Let $h(\cdot), \Psi(\cdot) \in C_b^2(H)$. Then (28) and (29) have a unique strict solution for $0 \le t \le T$, and it is given by the Feynman-Kac formula
$$\Phi(t, v) = E\Big[\Psi\big(u^\xi(T, t, v)\big)\exp\Big(\int_t^T h\big(u^\xi(s, t, v)\big)\cdot\xi(s)\,ds\Big)\Big], \qquad (30)$$
where $u^\xi(s, t, v)$ is a solution of
$$du^\xi(s) = \big[Au^\xi + B(u^\xi) + G_2\xi(s)\big]\,ds + G\,dw(s), \qquad (31)$$
$$u^\xi(t, t, v) = v \in H. \qquad (32)$$
6 MEASURE VALUED SOLUTIONS AND SOLVABILITY OF THE FKK AND ZAKAI EQUATIONS
Let $\mathcal{M}(H)$ be the vector space of finite $\sigma$-additive measures on the Borel $\sigma$-field $\mathcal{B}(H)$; this is a subspace of the dual of $C_b(H)$ and can be given the inherited weak topology. Denote by $\mathcal{M}_+(H)$ the subset of positive measures and by $\mathcal{P}(H)$ the subset of probability measures on this Borel $\sigma$-field. In order to define measure valued solutions of the Zakai and FKK equations and to show the existence of such solutions, we again need the class of cylindrical test functions introduced earlier.
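Definition 4 An $\mathcal{M}_+(H)$-valued process $\Theta_t$ is called a measure valued solution of the Zakai equation on $[0, T]$ if the following five conditions hold:
(i) $\Theta_t$ is $\mathcal{F}_t$ adapted, i.e., $\Theta_t$ is $\mathcal{F}_t$ measurable for all $t \in [0, T]$,
(ii) $$E\int_0^T\int_H \|u\|^q\,\Theta_t(du)\,dt < \infty, \quad q \ge 2, \qquad (33)$$
(iii) $$E|\langle\Theta_t, 1\rangle|^2 < +\infty, \quad t \in [0, T], \qquad (34)$$
(iv) $$E\int_0^T |\langle\Theta_t, 1\rangle|^2\,dt < +\infty, \qquad (35)$$
(v) for all $f \in C_{cy}$ and $t \in [0, T]$ the weak Zakai equation holds:
$$\langle\Theta_t, f(t)\rangle = \langle\Theta_0, f(0)\rangle + \int_0^t \langle\Theta_s, \partial_s f(s) + \mathcal{L}f(s)\rangle\,ds + \int_0^t \langle\Theta_s, Mf(s)\rangle\cdot dz(s), \quad m\text{-a.s.} \qquad (36)$$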
Definition 5 A P(H)-valued process IT t is called a measure valued solution of the FKK equation on [0, T] if the following three conditions hold: (i) IT t is Ft adapted, , i.e., IT t is Ft measurable for all t E [0, T],
Stochastic Reaction-Diffusion Equations
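(ii) $$E \int_0^T \int_H \|u\|^q\, \Pi_t(du)\,dt < \infty, \qquad q \ge 2, \tag{37}$$
(iii) for all $f \in C_{cy}$ and $t \in [0,T]$ the weak FKK equation holds:
$$\langle \Pi_t, f(t)\rangle = \langle \Pi_0, f(0)\rangle + \int_0^t \langle \Pi_s, \partial_s f(s) + \mathcal{L} f(s)\rangle\,ds + \int_0^t \langle \Pi_s, \mathcal{M} f(s) - f(s)h(s)\rangle \cdot [dz(s) - \langle \Pi_s, h(s)\rangle\,ds], \qquad m\text{-a.s.} \tag{38}$$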
Point (ii) says that $\xi_t$ and $\Pi_t$ have at least finite second moments. In order to prove measure valued solvability for the FKK and Zakai equations, we will need the existence of conditional probability measures; these are the kernels in the following definition [Get75]. A kernel from the measurable space $(\Omega, \mathcal{A})$ to the measurable space $(H, \mathcal{B})$ is a real function $\mu(\omega, B)$ defined for each $\omega \in \Omega$ and $B \in \mathcal{B}$ such that $\omega \mapsto \mu(\omega, B)$ is $\mathcal{A}$-measurable for all $B \in \mathcal{B}$ and $B \mapsto \mu(\omega, B)$ is a positive finite measure for all $\omega \in \Omega$. We now come to one of our main results.
Theorem 2 Assume that the hypotheses of Theorem 1 hold and that $h \in C_b^2(H; \mathbb{R}^m)$. Assume also that $\Psi \in C_b^2(H)$. Then there exists a unique measure valued solution $\Pi_t$ of the FKK equation (38) on $[0,T]$, and there exists a unique measure valued solution $\xi_t$ of the Zakai equation (36) on $[0,T]$. Also, $\Pi_t$ and $\xi_t$ are related by
(39)
and its inverse
(40)
Proof: Since $\Pi_t$ and $\xi_t$ are related by the invertible transform (39), (40), it suffices to show existence and uniqueness for only one of them, besides the relation (39), (40). It will be convenient to show existence for $\Pi_t$ and uniqueness for $\xi_t$. The key step for the existence of $\Pi_t$ is the following lemma on the existence of kernels [Get75].
Lemma 4 ([Get75], Prop 4.1): Let $Y$ be homeomorphic to a Borel subset of a compact metric space ($Y$ is a Lusin space [Get75]), and denote by $B_b(Y)$ and $B_b(\Omega)$ the bounded Borel functions on $Y$ and $\Omega$, respectively. Suppose that $T : B_b(Y) \to B_b(\Omega)$ is linear a.e., positive a.e., and satisfies $0 \le f_n \uparrow f \Rightarrow T f_n \uparrow T f$ for any sequence of functions $\{f_n\}$ and $f$ in $B_b(Y)$. Then there exists a bounded kernel $\mu(\cdot,\cdot)$ from $(\Omega, \mathcal{A})$ to $(Y, \mathcal{B}(Y))$ such that $T f(\omega) = \int_Y f(u)\,\mu(\omega, du)$ for all $f \in B_b(Y)$. (Equality here is as elements of $B_b(\Omega)$.)

To prove the theorem we first use the lemma to obtain a kernel which is a candidate for our desired measure. Every complete separable metric space (Polish space) is a Lusin space (see [Get75] and the reference [3] contained therein, p. 201), so the Hilbert space $H$ satisfies the condition of the lemma. At any fixed $t \in [0,T]$, $f(t,\cdot)$ is bounded Borel on $H$. Now we set $\mathcal{A} = F_t$ as the σ-field on $\Omega$ and define $T$ in the lemma by
$$T f(t,\cdot)(\omega) = E[f(t,u(t)) \mid F_t](\omega).$$ The expectation is with respect to $m$ on $\Omega$, and $u(t) = u(t,\omega)$ is a measurable function on $\Omega$. It is easy to check that this $T$ satisfies the hypotheses of the lemma: linearity, positivity, and 'continuity' for bounded nondecreasing sequences. Thus there is a kernel $\mu_t$ (depending on $t$) such that
$$T f(t,\cdot)(\omega) = \int_H f(t,u)\,\mu_t(\omega, du).$$ We conclude that
$$\Pi_t(f) = E[f(t,u(t)) \mid F_t] = \int_H f(t,u)\,\mu_t(\cdot, du) \tag{41}$$
for all bounded Borel $f$. We now check that the kernel $\mu_t$ is indeed a measure valued solution of the FKK equation. Point (i) of the definition follows from the lemma, for the definition of 'kernel' implies that $\mu_t$ is measurable. Point (ii) follows from the bound given in Theorem 1 and the Monotone Convergence Theorem: we can apply (41) to the bounded functions $g_n(u) = \|u\|^q \wedge n$, $n \in \mathbb{N}$, to get $E[g_n(u) \mid F_t] = \int_H g_n(u)\,\mu_t(\cdot, du)$, and then take the limit as $n \uparrow \infty$.
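On a finite probability space the kernel of Lemma 4 can be written down directly, which may help in reading (41). The sketch assumes a finite $\Omega$, a σ-field $F_t$ generated by a partition, and a finite state set standing in for $H$; the kernel $\mu_t(\omega,\cdot)$ is then the conditional law of $u(t)$ on the partition cell containing $\omega$, and integrating $f$ against it reproduces the conditional expectation. The function names are hypothetical.

```python
from collections import defaultdict


def conditional_kernel(prob, block, u):
    """Regular conditional distribution of u given the partition sigma-field.

    prob:  dict outcome -> probability (finite Omega, all masses > 0)
    block: outcome -> label of its partition cell (the cells generate F_t)
    u:     outcome -> state in a finite set (stand-in for u(t) in H)
    Returns dict outcome -> {state: mass}; each inner dict is a probability
    measure, and it is constant on every partition cell (F_t-measurable).
    """
    cell_mass = defaultdict(float)
    joint = defaultdict(lambda: defaultdict(float))
    for w, p in prob.items():
        cell_mass[block(w)] += p
        joint[block(w)][u(w)] += p
    return {w: {y: m / cell_mass[block(w)] for y, m in joint[block(w)].items()}
            for w in prob}


def cond_expectation(prob, block, fu):
    """E[fu | partition](w), computed directly as a probability-weighted cell average."""
    num, den = defaultdict(float), defaultdict(float)
    for w, p in prob.items():
        num[block(w)] += p * fu(w)
        den[block(w)] += p
    return {w: num[block(w)] / den[block(w)] for w in prob}
```

For any bounded $f$, `sum(f(y) * m for y, m in kernel[w].items())` agrees with `cond_expectation(prob, block, lambda w: f(u(w)))[w]`, which is the finite analogue of (41).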
Verifying (38) is the main work in this argument. Our approach is simply to substitute (41) into (4). However, without further restrictions on $f \in B_b(H)$ the resulting expressions $\partial_t f + \mathcal{L} f$ and $\mathcal{M} f$ are not bounded Borel functions, and (38) does not follow from (4) and (41). For this reason we restrict our class of test functions to $f \in C_{cy}$, and we indicate how to make sense of the terms in (4) through (6). First, $\langle Au, \partial_u f(t,u)\rangle$ will mean $\langle u, A\,\partial_u f(t,u)\rangle$, using the self-adjointness of $A$ and noting again that $\partial_u f(t,u) \in D(A)$ for all $u \in H$, because $e_i \in D(A)$ for $i = 1, 2, \ldots$ by the definition of $C_{cy}$. This term is well defined for all $u \in H$ and, as a real valued function of $t$, it is $C^\infty$ on $[0,T]$; hence it is bounded Borel as required by (38) and (41). Next, $\langle B(u), \partial_u f(t,u)\rangle$ is well defined by the above comments on $\partial_u f$ and the hypothesis that $D(B) = H$ and $B$ has linear growth (assumption A2). Finally, $\mathrm{tr}(GQG^* \partial_{uu} f(t,u))$ is well defined for all $u \in H$ and is in fact a $C^\infty$ function of $t \in [0,T]$ (and therefore bounded Borel). Indeed, using (21) and choosing the same orthonormal set $\{e_i\}$ as there ([DZ92], p. 416), we have
$$\begin{aligned} \mathrm{tr}(GQG^*\partial_{uu} f(t,u)) &= \sum_{i=1}^\infty \langle GQG^* \partial_{uu} f\, e_i, e_i\rangle = \sum_{i=1}^\infty \Big\langle GQG^* \Big(\sum_{k=1}^n \sum_{\ell=1}^n \partial_{k\ell}\varphi(\ldots)\, e_k \otimes e_\ell\Big) e_i, e_i \Big\rangle \\ &= \sum_{k=1}^n \sum_{\ell=1}^n \partial_{k\ell}\varphi(t, \langle u, e_1\rangle, \ldots, \langle u, e_n\rangle)\, \langle GQG^* e_k, e_\ell\rangle, \end{aligned} \tag{42}$$
and this is clearly well defined and $C^\infty$ as a function of $t \in [0,T]$ for every $u \in H$. The term $G_2^* \partial_u f(t,u)$ arising in $\mathcal{M} f$ is easy to handle: as a function of $t$ it is in $C^\infty([0,T]; \mathbb{R}^m)$ since $G_2^*$ is bounded and $f \in C_{cy}$.

Our proof of the uniqueness of $\xi_t$ is adapted from [Sri94], which is an infinite dimensional generalization of a method of Rozovskii [Roz91]. We also point out that an analogous method was used by Vishik and Komech [VK84] for the uniqueness theorem of the direct Kolmogorov equation associated with the stochastic Navier-Stokes equation. Fix any $x \in C([0,T], \mathbb{R}^m)$ and define the following three processes on
$[0,T]$:
$$q_t = \exp\Big\{\int_0^t x(s)\cdot dz(s) - \frac{1}{2}\int_0^t |x(s)|^2\,ds\Big\},$$
$$\rho_t^{-1} = \exp\Big\{-\int_0^t h(u^x(s))\cdot dz(s) + \frac{1}{2}\int_0^t |h(u^x(s))|^2\,ds\Big\},$$
$$\gamma_t = q_t\,\rho_t^{-1}.$$
For any $f \in C_{cy}$, apply the Itô formula to $\langle \xi_t, f(t)\rangle \gamma_t$ to obtain
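$$\langle \xi_t, f(t)\rangle \gamma_t = \langle \xi_0, f(0)\rangle + \int_0^t \langle \xi_s, \partial_s f(s) + (\mathcal{L} + \mathcal{M}_x) f(s)\rangle \gamma_s\,ds + \int_0^t \gamma_s\big[\langle \xi_s, f(s)\rangle\,(x(s) - h(u^x(s))) + \langle \xi_s, \mathcal{M} f(s)\rangle\big]\cdot dW_2(s).$$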
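The last term is a martingale, so
$$E\,\langle \xi_t, f(t)\rangle \gamma_t = E\,\langle \xi_0, f(0)\rangle + E \int_0^t \langle \xi_s, \partial_s f(s) + (\mathcal{L} + \mathcal{M}_x) f(s)\rangle \gamma_s\,ds. \tag{43}$$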
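The generator term in (43) contains the trace computed in (42). In finite dimensions, with $H = \mathbb{R}^d$, $\{e_i\}$ the standard basis and a cylinder function $f(u) = \varphi(\langle u, e_1\rangle, \langle u, e_2\rangle)$, (42) says that only the $2 \times 2$ block of $C = GQG^*$ enters. The sketch below checks this numerically; the matrices $G$, $Q$ and the profile $\varphi(y_1, y_2) = \sin(y_1)\,y_2^2$ are hypothetical choices made for illustration.

```python
import math


def mat_mul(P, Q):
    """Plain-Python matrix product of nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(P):
    return [list(row) for row in zip(*P)]


def hess_phi(y1, y2):
    """Hessian of the hypothetical cylinder profile phi(y1, y2) = sin(y1) * y2**2."""
    return [[-math.sin(y1) * y2 ** 2, 2.0 * math.cos(y1) * y2],
            [2.0 * math.cos(y1) * y2, 2.0 * math.sin(y1)]]


def trace_full(C, u):
    """tr(C D^2 f(u)) with D^2 f written out as a d x d matrix (zero outside the 2 x 2 block)."""
    d = len(C)
    hp = hess_phi(u[0], u[1])  # <u, e_1> = u[0], <u, e_2> = u[1]
    D2f = [[hp[i][j] if i < 2 and j < 2 else 0.0 for j in range(d)] for i in range(d)]
    M = mat_mul(C, D2f)
    return sum(M[i][i] for i in range(d))


def trace_cylindrical(C, u):
    """Right-hand side of (42): sum over k, l of d_kl phi * <C e_k, e_l>, and <C e_k, e_l> = C[l][k]."""
    hp = hess_phi(u[0], u[1])
    return sum(hp[k][l] * C[l][k] for k in range(2) for l in range(2))
```

Forming `C = mat_mul(mat_mul(G, Q), transpose(G))` for any $d \times r$ matrix `G` and $r \times r$ matrix `Q`, the two functions agree up to rounding.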
Now let us take the unique solution $\Phi(t,v)$ (see (30)-(31)) of the backward Kolmogorov equation (28) corresponding to the initial data $\Psi(v)$. Here $\Psi(v)$ is a cylindrical test function in $v$. We will consider the smooth approximations
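$$f(t,v) = \Phi_{n,\varepsilon}(T - t, v) \tag{44}$$
and take the limits $\varepsilon \to 0$ and $n \to \infty$. Using the convergence properties of $\Phi_{n,\varepsilon}$, we obtain
$$E\{\langle \xi_T, \Psi\rangle \gamma_T\} = E\Big[\Psi(u^x(T,0,v)) \exp\Big(\int_0^T h(u^x(r,0,v))\cdot x(r)\,dr\Big)\Big], \tag{45}$$
where $u^x$ solves (31). Now, using Girsanov's transformation we get
$$E\Big[\Psi(u^x(T,0,v)) \exp\Big(\int_0^T h(u^x(r,0,v))\cdot x(r)\,dr\Big)\Big] = E\big[\Psi(u(T,0,v))\,q_T\big], \tag{46}$$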
where $u$ solves (14). To justify this step we need to use finite dimensional approximations of (31) and (14), apply the Girsanov transform to these finite dimensional diffusion processes, and then use the weak convergence of the probability distributions of $u$ and $u^x$ to obtain (46) in the limit [HSS95]. We now apply the absolutely continuous change of measure from $(\Omega, \Sigma, m)$ to $(\Omega, \Sigma, \tilde m)$ defined by
$$\frac{d\tilde m}{dm} = \rho_T^{-1}. \tag{47}$$
Then under the new measure we can write (45)-(46) as
$$\tilde E\{\langle \xi_T, \Psi\rangle q_T\} = \tilde E\big[\tilde E[\Psi(u(T,0,v))\,\rho_T \mid F_T]\, q_T\big]. \tag{48}$$
Since processes of the form $q_t$ defined above are dense in $L^2(\Omega, F_t, m)$ [Roz90], we conclude that
$$\langle \xi_T, \Psi\rangle = \tilde E\big[\Psi(u(T,0,v))\,\rho_T \mid F_T\big], \qquad m\text{-a.s.} \tag{49}$$
Since for an arbitrary measure valued solution $\Pi_t$ of the FKK equation, $\xi_t$ defined by (39) satisfies (36), we have thus established the uniqueness of $\Pi_t$ and $\xi_t$ on the interval $[0,T]$.
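The density process $q_t$ used in (43)-(49) is an exponential martingale. Assuming that $z$ is a standard scalar Brownian motion under the reference measure and that $x$ is deterministic, $E\,q_T = 1$, and weighting by $q_T$ shifts the mean of $z_T$ by $\int_0^T x(s)\,ds$; this is the Cameron-Martin-Girsanov mechanism behind (46). The simulation below illustrates only this scalar mechanism, not the finite dimensional approximation argument of [HSS95]; `simulate_qT` and its parameters are hypothetical.

```python
import math
import random


def simulate_qT(x, T=1.0, n_steps=20, n_paths=20_000, seed=1):
    """Return samples (z_T, q_T) where z is a standard Brownian motion and
    q_T = exp(sum_k x(s_k) dz_k - 0.5 * sum_k x(s_k)^2 dt) is the left-point
    discretization of q_t = exp{int x dz - 1/2 int |x|^2 ds} for scalar, deterministic x."""
    rng = random.Random(seed)
    dt = T / n_steps
    out = []
    for _ in range(n_paths):
        z, log_q = 0.0, 0.0
        for k in range(n_steps):
            xs = x(k * dt)
            dz = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment
            log_q += xs * dz - 0.5 * xs * xs * dt
            z += dz
        out.append((z, math.exp(log_q)))
    return out
```

With $x \equiv 1$ and $T = 1$, the sample mean of `q` should be close to 1 and the sample mean of `z * q` close to $\int_0^1 x(s)\,ds = 1$.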
References
[Ahm94] N. U. Ahmed. Nonlinear filtering for stochastic differential equations in Hilbert spaces. In W. F. Ames, editor, 14th IMACS World Conference on Computational and Applied Mathematics, pages 5-8, 1994.
[DZ92] G. Da Prato and J. Zabczyk. Stochastic Equations in Infinite Dimensions. Cambridge University Press, New York, 1992.
[FKK72] M. Fujisaki, G. Kallianpur, and H. Kunita. Stochastic differential equations for the nonlinear filtering problem. Osaka J. Math., 9:19-40, 1972.
[Get75] R. K. Getoor. On the construction of kernels. In P. A. Meyer, editor, Séminaire de Probabilités IX, Lecture Notes in Mathematics, vol. 465. Springer-Verlag, 1975.
[HSS95] S. L. Hobbs, G. Sobko, and S. S. Sritharan. Nonlinear filtering theory of stochastic semilinear partial differential equations. To be published, 1995.
[Ich82] A. Ichikawa. Stability of semilinear stochastic evolution equations. J. Math. Anal. Appl., 90:12-44, 1982.
[Roz90] B. L. Rozovskii. Lecture notes on linear stochastic partial differential equations. Lecture Notes 25, Dept. Math., University of North Carolina, 1990.
[Roz91] B. L. Rozovskii. A simple proof of uniqueness for Kushner and Zakai equations. In E. Mayer-Wolf, E. Merzbach, and A. Schwartz, editors, Stochastic Analysis, pages 449-458, 1991.
[Sri94] S. S. Sritharan. Nonlinear filtering of stochastic Navier-Stokes equation. In T. Funaki and W. A. Woyczyński, editors, Nonlinear Methods on Stochastic Partial Differential Equations: Burgers Turbulence and Hydrodynamic Limit. Springer-Verlag, 1994.
[Vio76] M. Viot. Solutions faibles d'équations aux dérivées partielles stochastiques non linéaires. Thèse, Université Pierre et Marie Curie, Paris, 1976.
[VK84] M. I. Vishik and A. I. Komech. On Kolmogorov's equations corresponding to the two dimensional stochastic Navier-Stokes system. Trans. Moscow Math. Soc., pages 1-42, 1984.
[Zak69] M. Zakai. On the optimal filtering of diffusion processes. Z. Wahrscheinlichkeitstheorie verw. Geb., 11:230-243, 1969.
An Operator Characterization of Oscillatory Harmonizable Processes RANDALL J. SWIFT Department of Mathematics, Western Kentucky University, Bowling Green, Kentucky
Dedicated to Professor M.M. Rao, advisor and friend, on the occasion of his 65th birthday.
1 INTRODUCTION
A class of nonstationary stochastic processes which are encountered in some applications is the class of modulated stationary processes $X(t)$. These processes are obtained when a stationary process $X_0(t)$ is multiplied by some nonrandom modulating function $A(t)$:
$$X(t) = A(t)\,X_0(t).$$
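For real $A$ and a stationary $X_0$ with covariance $r_0(s-t)$, the modulated process has covariance $r(s,t) = A(s)A(t)\,r_0(s-t)$, which is positive definite but in general no longer a function of $s - t$. The sketch below checks both facts for the hypothetical choices $A(t) = e^{-t^2}$ and $r_0(\tau) = e^{-|\tau|}$, using a plain Cholesky factorization as the positive definiteness test; the helper names are our own.

```python
import math


def modulated_cov(A, r0):
    """Covariance of X(t) = A(t) X0(t) for a real modulating function A and a
    stationary X0 whose covariance is r0(s - t)."""
    return lambda s, t: A(s) * A(t) * r0(s - t)


def is_positive_definite(M, tol=1e-12):
    """Attempt a Cholesky factorization of a real symmetric matrix M = L L^T;
    return True if every pivot exceeds tol."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True
```

For these choices $r(0,1) = e^{-2}$ while $r(1,2) = e^{-6}$, so the covariance is not shift invariant even though every matrix $[r(t_p, t_q)]$ on a grid is positive definite.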
This class of processes has been investigated by Joyeux (1987) and Priestley (1981). The book by Yaglom (1987) provides a nice treatment of these processes. In particular, if $A(t)$ admits a generalized Fourier transform, the class of oscillatory processes studied by Priestley (1981) is obtained. In some physical situations, the assumption of stationarity for the process $X_0(t)$ is unrealistic (Rao, 1982). If this condition is relaxed, so that $X_0(t)$ is assumed to be harmonizable, and if $A(t)$ admits a generalized Fourier transform, then the process $X(t)$ is not oscillatory, but is oscillatory harmonizable. This paper investigates the properties of oscillatory harmonizable processes. Section 2 recalls the basic theory of harmonizable processes required for the subsequent analysis. Section 3 introduces and develops the class of oscillatory harmonizable processes. In this section, the spectral representation of oscillatory harmonizable processes is obtained. This representation is used to deduce relationships between the oscillatory harmonizable processes and other classes of nonstationary processes. Section 4 obtains an important and useful operator characterization for oscillatory harmonizable processes.
2 PRELIMINARIES
In the following work, there is always an underlying probability space $(\Omega, \Sigma, P)$, whether this is explicitly stated or not. DEFINITION 2.1 For $p \ge 1$, define $L_0^p(P)$ to be the set of all complex valued $f \in L^p(\Omega, \Sigma, P)$ such that $E(f) = 0$, where $E(f) = \int_\Omega f(\omega)\,dP(\omega)$ is the expectation. In this paper, we will consider second order stochastic processes, more specifically, mappings $X : \mathbb{R} \to L_0^2(P)$. DEFINITION 2.2 A stochastic process $X : \mathbb{R} \to L_0^2(P)$ is stationary (stationary in the wide or Khintchine sense) if its covariance $r(s,t) = E(X(s)\overline{X(t)})$ is continuous and is a function of the difference of its arguments, so that $r(s,t) = \tilde r(s - t)$. An equivalent definition of a stationary process is one whose covariance function can be represented as
$$r(s,t) = \int_{\mathbb{R}} e^{i(s-t)\lambda}\,dF(\lambda) \tag{1}$$
for a unique non-negative bounded Borel measure $F(\cdot)$. This alternate definition is a consequence of a classical theorem of Bochner (Gihman and Skorohod, 1974), and motivates the following definition. DEFINITION 2.3 A stochastic process $X : \mathbb{R} \to L_0^2(P)$ is weakly harmonizable if its covariance $r(\cdot,\cdot)$ is expressible as
$$r(s,t) = \int_{\mathbb{R}}\int_{\mathbb{R}} e^{i\lambda s - i\lambda' t}\,dF(\lambda, \lambda') \tag{2}$$
where $F : \mathbb{R} \times \mathbb{R} \to \mathbb{C}$ is a positive semi-definite bimeasure, hence of finite Fréchet variation. The integrals in (2) are strict Morse-Transue integrals (Chang and Rao, 1986). A stochastic process $X(\cdot)$ is strongly harmonizable if the bimeasure $F(\cdot,\cdot)$ in (2) extends to a complex measure and hence is of bounded Vitali variation. In either case, $F(\cdot,\cdot)$ is termed the spectral bimeasure (or spectral measure) of the harmonizable process. Comparison of equation (2) with equation (1) shows that when $F(\cdot,\cdot)$ concentrates on the diagonal $\lambda = \lambda'$, both the weak and strong harmonizability concepts reduce to the stationary concept. Harmonizable processes retain the powerful Fourier analytic methods inherent in stationary processes, as seen in Bochner's theorem (1), but they relax the requirement of stationarity. The structure and properties of harmonizable processes have been investigated and developed extensively by M. M. Rao and others. The following sources are listed here to provide a partial summary of the literature. The papers by Rao (1978, 1982, 1989, 1991, 1994) provide a basis for the theory. Chang and Rao (1986) develop the necessary bimeasure theory. A study of sample path behavior for harmonizable processes is considered by Swift (1996b). Some results on moving average representations were obtained by Mehlman (1992). The structure of harmonizable isotropic random fields and some applications have been considered by Swift (1994, 1995, 1996a). Second order processes with harmonizable increments have also been investigated by Swift (1996c). The forthcoming book by Kakihara gives a general treatment of multidimensional second order processes which includes the harmonizable class.
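A discrete version of (2) makes the relation between (1) and (2) concrete. If $F$ is a Hermitian positive semi-definite matrix on finitely many distinct frequencies $\lambda_j$, then $r(s,t) = \sum_{j,k} F_{jk}\,e^{i\lambda_j s - i\lambda_k t}$ is a covariance (every quadratic form $\sum_{p,q} c_p \bar c_q\, r(t_p, t_q)$ is nonnegative), and it depends only on $s - t$ exactly when $F$ is diagonal, which is the discrete form of (1). The frequencies and matrices used below are hypothetical.

```python
import cmath


def harmonizable_cov(lams, F):
    """Discrete analogue of (2): r(s, t) = sum_{j,k} F[j][k] exp(i lam_j s - i lam_k t),
    where the bimeasure F is a Hermitian positive semi-definite matrix on the points lams."""
    n = len(lams)

    def r(s, t):
        return sum(F[j][k] * cmath.exp(1j * (lams[j] * s - lams[k] * t))
                   for j in range(n) for k in range(n))
    return r


def quad_form(r, times, c):
    """sum_{p,q} c_p conj(c_q) r(t_p, t_q); real and nonnegative for any covariance r."""
    m = len(times)
    return sum(c[p] * c[q].conjugate() * r(times[p], times[q])
               for p in range(m) for q in range(m))
```

A diagonal $F$ gives `r(s + h, t + h) == r(s, t)` up to rounding, while a nonzero off-diagonal entry picks up the factor $e^{i(\lambda_j - \lambda_k)h}$ under the shift.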
3 OSCILLATORY HARMONIZABLE PROCESSES
M. B. Priestley (1981) introduced and studied a generalization of the class of stationary processes. This generalization is given by: DEFINITION 3.1 A stochastic process $X : \mathbb{R} \to L_0^2(P)$ is oscillatory if it has a representation
$$X(t) = \int_{\mathbb{R}} A(t,\lambda)\,e^{i\lambda t}\,dZ(\lambda),$$
where $Z(\cdot)$ is a stochastic measure with orthogonal increments and
$$A(t,\lambda) = \int_{\mathbb{R}} e^{itx}\,H(\lambda, dx),$$
with $H(\cdot, B)$ a Borel function on $\mathbb{R}$, $H(\lambda,\cdot)$ a signed measure, and $A(t,\lambda)$ having an absolute maximum at $\lambda = 0$ independent of $t$. Using this representation, the covariance of an oscillatory process is
$$r(s,t) = \int_{\mathbb{R}} A(s,\lambda)\,\overline{A(t,\lambda)}\,e^{i\lambda(s-t)}\,dF(\lambda), \qquad F(B) = E|Z(B)|^2.$$
The idea of Definition 2.3 provides the motivation for the following definition: DEFINITION 3.2 A stochastic process $X : \mathbb{R} \to L_0^2(P)$ is oscillatory weakly harmonizable if its covariance has the representation
$$r(s,t) = \int_{\mathbb{R}}\int_{\mathbb{R}} A(s,\lambda)\,\overline{A(t,\lambda')}\,e^{i\lambda s - i\lambda' t}\,dF(\lambda,\lambda'),$$
where $F(\cdot,\cdot)$ is a function of bounded Fréchet variation, and
$$A(t,\lambda) = \int_{\mathbb{R}} e^{itx}\,H(\lambda, dx),$$
with $H(\cdot, B)$ a Borel function on $\mathbb{R}$, $H(\lambda,\cdot)$ a signed measure, and $A(t,\lambda)$ having an absolute maximum at $\lambda = 0$ independent of $t$. Note that if $A(t,\lambda) \equiv 1$, this class coincides with the weakly harmonizable processes. As Priestley's definition provides an extension of the class of stationary processes, Definition 3.2 provides an extension of the class of weakly harmonizable processes. Observe, further, that in this definition, for $F(\cdot,\cdot)$ concentrating on the diagonal $\lambda = \lambda'$, the oscillatory processes are obtained. Thus the oscillatory harmonizable processes also provide an extension of the class introduced by Priestley, which we will now term oscillatory stationary.
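Definition 3.2 can be sketched the same way with a discrete signed measure $H(\lambda,\cdot) = \sum_m w_m(\lambda)\,\delta_{x_m}$, so that $A(t,\lambda) = \sum_m w_m(\lambda)\,e^{itx_m}$. The code checks the two reductions stated above: $A \equiv 1$ recovers the weakly harmonizable covariance (2), and a diagonal $F$ recovers the oscillatory stationary covariance. The measure $H$ is hypothetical, and the sketch does not enforce the normalization that $A(t,\lambda)$ has an absolute maximum at $\lambda = 0$.

```python
import cmath


def amplitude(H, lam, t):
    """A(t, lam) = int e^{itx} H(lam, dx) for a discrete signed measure given as
    H(lam) = [(x_1, w_1), (x_2, w_2), ...], i.e. sum_m w_m * delta_{x_m}."""
    return sum(w * cmath.exp(1j * t * x) for x, w in H(lam))


def osc_harmonizable_cov(lams, F, H):
    """Discrete covariance of Definition 3.2:
    r(s, t) = sum_{j,k} A(s, lam_j) conj(A(t, lam_k)) exp(i lam_j s - i lam_k t) F[j][k]."""
    n = len(lams)

    def r(s, t):
        return sum(amplitude(H, lams[j], s) * amplitude(H, lams[k], t).conjugate()
                   * cmath.exp(1j * (lams[j] * s - lams[k] * t)) * F[j][k]
                   for j in range(n) for k in range(n))
    return r
```

For example, `H = lambda lam: [(0.0, 1.0), (0.3, -0.2 * lam)]` gives $A(t,\lambda) = 1 - 0.2\lambda e^{0.3it}$, while `H = lambda lam: [(0.0, 1.0)]` gives $A \equiv 1$.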
Using this definition, it is possible to obtain the spectral representation of an oscillatory harmonizable process X(·).
THEOREM 3.1 The spectral representation of an oscillatory weakly harmonizable stochastic process is
$$X(t) = \int_{\mathbb{R}} A(t,\lambda)\,e^{i\lambda t}\,dZ(\lambda),$$
where $Z(\cdot)$ is a stochastic measure satisfying
$$E\big(Z(B_1)\overline{Z(B_2)}\big) = F(B_1, B_2),$$
with $F(\cdot,\cdot)$ a function of bounded Fréchet variation.
Proof: Let $X(\cdot)$ be an oscillatory weakly harmonizable process. Then the covariance $r(\cdot,\cdot)$ has the representation
$$r(s,t) = \int_{\mathbb{R}}\int_{\mathbb{R}} A(s,\lambda)\,\overline{A(t,\lambda')}\,e^{i\lambda s - i\lambda' t}\,dF(\lambda,\lambda').$$
Applying a form of Karhunen's theorem (Yaglom, 1987, volume 2, pages 33-41) gives the spectral representation of $X(\cdot)$ as
$$X(t) = \int_{\mathbb{R}} A(t,\lambda)\,e^{i\lambda t}\,dZ(\lambda),$$
which is the desired result. □ The following condition on the signed measure $H$ for oscillatory strongly harmonizable processes shows that these processes are actually a subclass of the strongly harmonizable processes. A similar result was obtained by R. Joyeux (1987) for the oscillatory stationary processes. THEOREM 3.2 If $X(\cdot)$ is an oscillatory strongly harmonizable process with
$$\int_{\mathbb{R}} |H|(\lambda, dx) < \infty \quad \text{uniformly in } \lambda \in \mathbb{R},$$
then $X(\cdot)$ is strongly harmonizable.
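Proof: Let
$$\tilde Z(B) = \int_{\mathbb{R}} H(\lambda, B - \lambda)\,dZ(\lambda),$$
where $B$ is a Borel set of $\mathbb{R}$ and $B - \lambda = \{x - \lambda : x \in B\}$. Then $\tilde Z(\cdot)$ is a stochastic measure, since $H(\lambda,\cdot)$ is a signed measure whose total variation is uniformly bounded by some $K$. Now set
$$\tilde X(t) = \int_{\mathbb{R}} e^{i\omega t}\,d\tilde Z(\omega).$$
Claim: $\tilde X(\cdot)$ is a strongly harmonizable process. If one lets $\tilde F(B_1, B_2) = E(\tilde Z(B_1)\overline{\tilde Z(B_2)})$ for Borel sets $B_1, B_2$ of $\mathbb{R}$, it must be shown that
$$\int_{\mathbb{R}}\int_{\mathbb{R}} |\tilde F(d\omega, d\omega')| < \infty.$$
Now
$$E\big(\tilde Z(d\omega)\overline{\tilde Z(d\omega')}\big) = \int_{\mathbb{R}}\int_{\mathbb{R}} H(\lambda, d(\omega - \lambda))\,H(\lambda', d(\omega' - \lambda'))\,E\big(Z(d\lambda)\overline{Z(d\lambda')}\big) = \int_{\mathbb{R}}\int_{\mathbb{R}} H(\lambda, d(\omega - \lambda))\,H(\lambda', d(\omega' - \lambda'))\,F(d\lambda, d\lambda'),$$
where $F(B_1, B_2) = E(Z(B_1)\overline{Z(B_2)})$ is of finite Vitali variation since $X(\cdot)$ is oscillatory strongly harmonizable. Thus
$$\int_{\mathbb{R}}\int_{\mathbb{R}} |\tilde F(d\omega, d\omega')| \le \int_{\mathbb{R}}\int_{\mathbb{R}}\int_{\mathbb{R}}\int_{\mathbb{R}} |H|(\lambda, d(\omega - \lambda))\,|H|(\lambda', d(\omega' - \lambda'))\,|F|(d\lambda, d\lambda') \le K^2 \int_{\mathbb{R}}\int_{\mathbb{R}} |F|(d\lambda, d\lambda') < \infty,$$
since $|H|(\lambda, \mathbb{R}) \le K$ is bounded, proving the claim.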
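Finally,
$$\tilde X(t) = \int_{\mathbb{R}}\int_{\mathbb{R}} e^{i(\lambda + x)t}\,H(\lambda, dx)\,dZ(\lambda) = \int_{\mathbb{R}} A(t,\lambda)\,e^{i\lambda t}\,dZ(\lambda) = X(t).$$
So $X(t)$ is strongly harmonizable, which completes the proof of the theorem. □ An additional class of processes related to the oscillatory processes is given by: DEFINITION 3.3 An oscillatory weakly harmonizable stochastic process $X : \mathbb{R} \to L_0^2(P)$ is $\varepsilon$-slowly changing weakly harmonizable if
$$B(\lambda) = \int_{\mathbb{R}} |x|\,|H|(\lambda, dx) \le \varepsilon, \qquad \forall \lambda \in \mathbb{R}.$$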
Slowly changing stationary processes were first considered by Priestley (1981) and are of interest not only in engineering but also in economics. Priestley showed that it is possible to define a spectral measure for these processes. The class of slowly changing harmonizable processes introduced above extends the class of slowly changing stationary processes. The following corollary shows that it is possible to consider a similar concept for the slowly changing harmonizable class.
COROLLARY 3.1 Slowly changing strongly harmonizable processes form a subclass of strongly harmonizable processes.
Proof: The assumption is
$$\int_{\mathbb{R}} |x|\,|H|(\lambda, dx) \le \varepsilon, \qquad \forall \lambda \in \mathbb{R}.$$
Claim:
$$\int_{\mathbb{R}} |H|(\lambda, dx) < \infty.$$
In fact,
$$|H|(\lambda, \mathbb{R}) = \int_{|x| < 1} |H|(\lambda, dx) + \int_{|x| \ge 1} |H|(\lambda, dx) \le K + \int_{|x| \ge 1} |x|\,|H|(\lambda, dx) \le K + \varepsilon < \infty,$$
which is the claim. Now since $K$ is finite, by the theorem $X(t)$ is strongly harmonizable, proving the corollary. □
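The proofs of Theorem 3.2 and Corollary 3.1 can be traced in a discrete setting where $Z$ has finitely many atoms and each $H(\lambda,\cdot)$ is a finite signed measure. The sketch builds $\tilde Z$ by moving the mass $H(\lambda_j,\{x\})Z_j$ to the point $\lambda_j + x$, confirms $\tilde X(t) = X(t)$, checks the bound $\sum|\tilde F| \le K^2\sum|F|$ used in the proof, and checks the split $|H|(\lambda,\mathbb{R}) \le K + B(\lambda)$ from the corollary. All atoms, weights and matrices are hypothetical, and atom locations are rounded to 12 decimals so that coinciding shifts merge.

```python
import cmath
from collections import defaultdict


def shifted_measure(lams, Z, H):
    """Atoms of Z~(B) = int H(lam, B - lam) dZ(lam) when Z has atoms Z[j] at lams[j]
    and H(lam) lists the atoms (x, w) of the signed measure H(lam, .)."""
    Zt = defaultdict(complex)
    for lam, z in zip(lams, Z):
        for x, w in H(lam):
            Zt[round(lam + x, 12)] += w * z  # mass w * z lands at omega = lam + x
    return Zt


def X_osc(t, lams, Z, H):
    """X(t) = sum_j A(t, lam_j) e^{i lam_j t} Z_j with A(t, lam) = sum_x w e^{itx}."""
    return sum(sum(w * cmath.exp(1j * t * x) for x, w in H(lam)) * cmath.exp(1j * lam * t) * z
               for lam, z in zip(lams, Z))


def X_tilde(t, Zt):
    """X~(t) = int e^{i omega t} dZ~(omega) for the discrete Z~."""
    return sum(cmath.exp(1j * om * t) * m for om, m in Zt.items())


def variation_and_bound(lams, F, H):
    """Total variation of F~(om, om') = sum H H F (H real-valued) and the bound K^2 * sum |F|."""
    n = len(lams)
    Ft = defaultdict(complex)
    for j in range(n):
        for k in range(n):
            for x, w in H(lams[j]):
                for y, v in H(lams[k]):
                    Ft[(round(lams[j] + x, 12), round(lams[k] + y, 12))] += w * v * F[j][k]
    K = max(sum(abs(w) for _, w in H(lam)) for lam in lams)  # sup_lam |H|(lam, R)
    total_F = sum(abs(F[j][k]) for j in range(n) for k in range(n))
    return sum(abs(c) for c in Ft.values()), K * K * total_F


def slowly_changing_split(atoms):
    """Return (|H|(lam, R), K, B) with K the |H|-mass on |x| < 1 and B = int |x| |H|(lam, dx)."""
    total = sum(abs(w) for _, w in atoms)
    K = sum(abs(w) for x, w in atoms if abs(x) < 1)
    B = sum(abs(x) * abs(w) for x, w in atoms)
    return total, K, B
```

Because the shifted atoms $\lambda_j + x$ may coincide (here $-1.0 + 1.5 = 0.5 + 0.0$), the dictionary keys merge them, exactly as the integral defining $\tilde Z$ does.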
4 AN OPERATOR CHARACTERIZATION
Using oscillatory harmonizable processes, it is possible to obtain a representation of a broader class of processes on $\mathbb{R}$. DEFINITION 4.1 Let $S$ be a locally compact space with $\mathcal{B}_0$ the δ-ring generated by the bounded Borel sets of $S$. If $T$ is any index set, $\{X(t), t \in T\} \subset L_0^2(P)$ a second order process, $r$ its covariance, and $\beta : \mathcal{B}_0 \times \mathcal{B}_0 \to \mathbb{C}$ a bimeasure having locally bounded Fréchet variation, then $X(\cdot)$ is said to be (locally) weakly of class (C) when $\beta$ is positive definite and
$$r(s,t) = \int_S\int_S g_s(\lambda)\,\overline{g_t(\lambda')}\,\beta(d\lambda, d\lambda'), \qquad (s,t) \in T \times T \quad \text{(strict MT-integral)},$$
where $g_t : S \to \mathbb{C}$, $t \in T$, is a family of Borel functions for which the integral exists. If $\beta$ has locally finite Vitali variation, then the process is termed of class (C) relative to $\{g_s, s \in T\}$ and $\beta$. Weak class (C) processes are considered extensively in Chang and Rao (1986). Oscillatory harmonizable processes give this broad class of processes a simple representation on $\mathbb{R}$, as seen in the following PROPOSITION 4.1 The class of oscillatory weakly harmonizable processes $\{X(t), t \in \mathbb{R}\} \subset L_0^2(P)$ coincides with the class of weak class (C) processes indexed on $\mathbb{R}$. Proof: This follows by setting $g_s(\lambda) = e^{is\lambda}A(s,\lambda)$ in the definition of weak class (C) processes, since $F(\cdot,\cdot)$ always has finite Fréchet variation. □ Using this simple identification, an operator representation of weak class (C) processes indexed on $\mathbb{R}$ is possible. This result is an extension of that given in Chang and Rao (1988) for the oscillatory stationary class. THEOREM 4.1 $X(\cdot)$ is an oscillatory weakly harmonizable process iff it is representable as
$$X(t) = a(t)\,T(t)\,Y(0), \qquad t \in \mathbb{R},$$
where $Y_0 = Y(0)$ is some point in $H(X) = \overline{\mathrm{sp}}\{X(t), t \in \mathbb{R}\}$, with $a(t)$ a densely defined closed operator in $H(X)$ for each $t \in \mathbb{R}$ and $\{T(s), s \in \mathbb{R}\}$ a weakly continuous family of positive definite contractive operators in $H(X)$ which commutes with each $a(t)$, $t \in \mathbb{R}$. Proof: Suppose $X(t)$ is oscillatory weakly harmonizable; then
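$$X(t) = \int_{\mathbb{R}} A(t,\lambda)\,e^{i\lambda t}\,dZ(\lambda),$$
where $Z(\cdot)$ is a stochastic measure satisfying $E(Z(B_1)\overline{Z(B_2)}) = F(B_1, B_2)$, with $F(\cdot,\cdot)$ of bounded Fréchet variation. Let
$$Y(t) = \int_{\mathbb{R}} e^{i\lambda t}\,dZ(\lambda);$$
then $Y(\cdot)$ is weakly harmonizable. Now by a theorem of Rao (1982) there is a weakly continuous family of positive definite contractive operators $\{T(t), t \in \mathbb{R}\}$ on $H(X) = \overline{\mathrm{sp}}\{X(t), t \in \mathbb{R}\}$ such that $Y(t) = T(t)Y_0$. Using the spectral theorem for this family of operators (cf. Rao, 1982),
$$T(t) = \int_{\mathbb{R}} e^{i\mu t}\,E(d\mu), \qquad t \in \mathbb{R},$$
where $\{E(\cdot), \mathcal{B}\}$ is the resolution of the identity of $\{T(t), t \in \mathbb{R}\}$, with $\mathcal{B}$ the Borel σ-algebra of $\mathbb{R}$. So $Z(B) = E(B)Y_0$, $B \in \mathcal{B}$. Now define
$$a(t) = \int_{\mathbb{R}} A(t,\lambda)\,E(d\lambda), \qquad t \in \mathbb{R}.$$
It follows that $a(t)$ is closed and densely defined on $H(X)$, with its domain containing $\{Y(s), s \in \mathbb{R}\}$. Since $T(t)$ and $E(D)$ commute for all $t$ and $D$, $a(t)$ and $\{E(D), D \in \mathcal{B}\}$ commute, so that $a(t)$ and $\{T(s), s \in \mathbb{R}\}$ commute for each $t$.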
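Thus
$$a(t)\,T(t)\,Y_0 = \int_{\mathbb{R}} A(t,\lambda)\,e^{i\lambda t}\,E(d\lambda)\,Y_0 \tag{3}$$
$$= X(t),$$
where (3) follows since $a(t)$ and $T(t)$ are functions of the same resolution of the identity $E(\cdot)$, and the last equality uses $E(\cdot)Y_0 = Z(\cdot)$.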
Thus if $X(t)$ is oscillatory weakly harmonizable, then $X(t) = a(t)T(t)Y(0)$, where $Y_0 = Y(0)$ is some point in $H(X) = \overline{\mathrm{sp}}\{X(t), t \in \mathbb{R}\}$, $a(t)$ is a densely defined closed operator in $H(X)$ for each $t \in \mathbb{R}$, and $\{T(s), s \in \mathbb{R}\}$ is a weakly continuous family of positive definite contractive operators in $H(X)$ which commutes with each $a(t)$, $t \in \mathbb{R}$. Now suppose $X(t)$ can be represented as
$$X(t) = a(t)\,T(t)\,Y(0)$$ with $a(t)$, $T(t)$, and $Y(0)$ as stated in the theorem. Then, using a classical result of von Neumann and F. Riesz (Riesz and Sz.-Nagy, 1990), $a(t)$ is a function $g(t)$ of $T(t)$ and further
$$a(t) = \int_{\mathbb{R}} g(t,\lambda)\,E(d\lambda).$$
Thus
$$X(t) = a(t)\,T(t)\,Y(0) = \int_{\mathbb{R}} g(t,\lambda)\,E(d\lambda) \int_{\mathbb{R}} e^{i\omega t}\,E(d\omega)\,Y_0 = \int_{\mathbb{R}} g(t,\lambda)\,e^{i\lambda t}\,E(d\lambda)\,Y_0,$$
but this is the representation of an oscillatory weakly harmonizable process. □
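In finite dimensions the construction in the proof of Theorem 4.1 reduces to matrices. The sketch assumes $H(X) \cong \mathbb{C}^3$, a resolution of the identity given by the rank-one projections $E_k$ onto an orthonormal basis, $T(t) = \sum_k e^{i\lambda_k t}E_k$ and $a(t) = \sum_k A(t,\lambda_k)E_k$; it checks that $a(t)$ commutes with every $T(s)$ and that $a(t)T(t)Y_0 = \sum_k A(t,\lambda_k)e^{i\lambda_k t}Z(\{\lambda_k\})$ with $Z(\{\lambda_k\}) = E_kY_0$. In this finite setting $T(t)$ is unitary, a special case of the contractive family in the theorem; the basis, frequencies `LAMS` and amplitude `A` are hypothetical.

```python
import cmath
import math


def mat_mul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def rotation_basis(a=0.4, b=1.1):
    """Real orthogonal 3x3 matrix (product of two rotations); its columns form the
    hypothetical orthonormal eigenbasis shared by all T(t)."""
    ca, sa, cb, sb = math.cos(a), math.sin(a), math.cos(b), math.sin(b)
    Rz = [[ca, -sa, 0.0], [sa, ca, 0.0], [0.0, 0.0, 1.0]]
    Rx = [[1.0, 0.0, 0.0], [0.0, cb, -sb], [0.0, sb, cb]]
    return mat_mul(Rz, Rx)


def projections(U):
    """E_k = u_k u_k^T for the columns u_k of U: a finite resolution of the identity."""
    n = len(U)
    return [[[U[i][k] * U[j][k] for j in range(n)] for i in range(n)] for k in range(n)]


def spectral_integral(E, weights):
    """sum_k weights[k] E_k, the finite analogue of int g(lam) E(dlam)."""
    n = len(E[0])
    return [[sum(w * Ek[i][j] for w, Ek in zip(weights, E)) for j in range(n)]
            for i in range(n)]


LAMS = [-1.0, 0.5, 2.0]  # hypothetical spectral points lambda_k
E = projections(rotation_basis())


def A(t, lam):
    # hypothetical amplitude: H(lam, .) = delta_0 - 0.2 * lam * delta_{0.3}
    return 1.0 - 0.2 * lam * cmath.exp(0.3j * t)


def T_op(t):
    return spectral_integral(E, [cmath.exp(1j * lam * t) for lam in LAMS])


def a_op(t):
    return spectral_integral(E, [A(t, lam) for lam in LAMS])
```

Since all operators here are functions of the same projections $E_k$, the commutation and the product formula hold up to rounding, which is the finite shadow of the functional calculus used in the proof.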
ACKNOWLEDGEMENTS The author expresses his thanks to Professor M.M. Rao for his advice and encouragement during the work of this project. The author also expresses his gratitude to the Mathematics department at Western Kentucky University for release time during the Spring 1995 semester, during which this work was completed.
REFERENCES
1. D. K. Chang and M. M. Rao (1986). Bimeasures and Nonstationary Processes. Real and Stochastic Analysis, John Wiley and Sons, New York, p. 7.
2. D. K. Chang and M. M. Rao (1988). Special Representations of Weakly Harmonizable Processes. Stoch. Anal. and Appl., 6(2):169.
3. I. I. Gihman and A. V. Skorohod (1974). The Theory of Stochastic Processes I. Springer-Verlag, New York.
4. R. Joyeux (1987). Slowly Changing Processes and Harmonizability. J. Time Series Anal., 8, No. 4.
5. Y. Kakihara. Multidimensional Second Order Stochastic Processes. World Scientific, in preparation.
6. M. H. Mehlman (1992). Prediction and Fundamental Moving Averages for Discrete Multidimensional Harmonizable Processes. J. Multiv. Anal., 43, No. 1.
7. M. B. Priestley (1981). Spectral Analysis and Time Series, Vol. 1 and 2. Academic Press, London.
8. M. M. Rao (1978). Covariance Analysis of Nonstationary Time Series. Developments in Statistics, 1, p. 171.
9. M. M. Rao (1982). Harmonizable Processes: Structure Theory. L'Enseign. Math., 28, p. 295.
10. M. M. Rao (1989). Harmonizable Signal Extraction, Filtering and Sampling. Topics in Non-Gaussian Signal Processing (E. J. Wegman, S. C. Schwartz, J. B. Thomas, eds.), Springer-Verlag, New York.
11. M. M. Rao (1991). Sampling and Prediction for Harmonizable Isotropic Random Fields. J. Comb., Info. and Sys. Sci., 16, No. 2-3, p. 207.
12. M. M. Rao (1994). Harmonizable processes and inference: unbiased prediction for stochastic flows. J. Stat. Plan. and Infer., 39, p. 187.
13. F. Riesz and B. Sz.-Nagy (1990). Functional Analysis. Dover, New York.
14. R. Swift (1994). The Structure of Harmonizable Isotropic Random Fields. Stoch. Anal. and Appl., 12, No. 5, p. 583.
15. R. Swift (1995). Representation and Prediction for Locally Harmonizable Isotropic Random Fields. J. Appl. Math. and Stoch. Anal., VIII, p. 101.
16. R. Swift (1996a). A Class of Harmonizable Isotropic Random Fields. J. Comb., Info. and Sys. Sci. (to appear).
17. R. Swift (1996b). Almost Periodic Harmonizable Processes. Georgian Math. J. (to appear).
18. R. Swift (1996c). Stochastic Processes with Harmonizable Increments. J. Comb., Info. and Sys. Sci. (to appear).
19. A. M. Yaglom (1987). Correlation Theory of Stationary and Related Random Functions, Vol. 1 and 2. Springer-Verlag, New York.
Operator Algebraic Aspects for Sufficiency MAKATO TSUKADA, Department of Information Sciences, Toho University, Funabashi City, Chiba 274, Japan
0. Introduction. Sufficiency is one of the most important concepts in mathematical statistics. In the measure theoretic context ([Halmos and Savage, 1949]), it is specified by a measurable space $(\Omega, \mathcal{F})$, a set of probability measures $\mathcal{P}$ and a σ-subfield $\mathcal{G}$ of $\mathcal{F}$. However, for technical reasons it is often assumed that $\mathcal{P}$ is dominated by some σ-finite measure. Otherwise, several pathological difficulties occur (see for example [Burkholder, 1960]). A more general property than domination was introduced, for example, by [Pitcher, 1965]. On the other hand, [LeCam, 1964, 1986] discussed sufficiency in an abstract framework, namely, the theory of Banach lattices. Including these, several attempts have been made to remedy such difficulties. Some of these are also related to an axiom of set theory, namely, the existence of a measurable cardinal ([Ramamoorthi and Yamada, 1981], [Luschgy and Mussmann, 1985], etc.). See also [Fujii and Morimoto, 1986], [Luschgy, 1988], [Luschgy, Mussmann and Yamada, 1988]. In this note, we give another definition of sufficiency from the viewpoint of operator algebras and apply it to the theory of Gibbs states on countable sets. 1. Basic spaces. Let $(\Omega, \mathcal{F})$ be a measurable space. We denote by $ca(\Omega, \mathcal{F})$ (resp. $pr(\Omega, \mathcal{F})$) the set of all countably additive bounded complex-valued measures (resp. probability measures) on $(\Omega, \mathcal{F})$.
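Now let $\{(\Omega_\mu, \mathcal{F}_\mu) : \mu \in pr(\Omega,\mathcal{F})\}$ be a family of disjoint copies of $(\Omega,\mathcal{F})$. A bimeasurable bijection from $(\Omega,\mathcal{F})$ to $(\Omega_\mu,\mathcal{F}_\mu)$ is denoted by $\iota_\mu$ for each $\mu \in pr(\Omega,\mathcal{F})$. Put
$$\oplus\Omega = \bigcup_{\mu \in pr(\Omega,\mathcal{F})} \Omega_\mu,$$
$$\oplus\mathcal{F} = \{A \subseteq \oplus\Omega : A \cap \Omega_\mu \in \mathcal{F}_\mu \ (\forall \mu \in pr(\Omega,\mathcal{F}))\},$$
$$m(A) = \sum_{\mu \in pr(\Omega,\mathcal{F})} \mu(\iota_\mu^{-1}(A \cap \Omega_\mu)) \qquad (A \in \oplus\mathcal{F}).$$
Since $(\oplus\Omega, \oplus\mathcal{F}, m)$ is a direct sum of $\{(\Omega,\mathcal{F},\mu) : \mu \in pr(\Omega,\mathcal{F})\}$, it is a localizable measure space, and the Banach space $L^p(\oplus\Omega,\oplus\mathcal{F},m)$ of all $m$-equivalence classes of $p$-th power integrable complex-valued functions on $(\oplus\Omega,\oplus\mathcal{F},m)$ can be identified with the Banach space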
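$$\bigoplus_{\mu \in pr(\Omega,\mathcal{F})} L^p(\Omega,\mathcal{F},\mu) = \Big\{\{f_\mu\}_{\mu \in pr(\Omega,\mathcal{F})} : f_\mu \in L^p(\Omega,\mathcal{F},\mu) \ (\forall \mu \in pr(\Omega,\mathcal{F})) \text{ and } \sum_{\mu \in pr(\Omega,\mathcal{F})} \int |f_\mu|^p\,d\mu < \infty\Big\}$$
for all $1 \le p < \infty$, and $L^\infty(\oplus\Omega,\oplus\mathcal{F},m)$, the set of all $m$-equivalence classes of essentially bounded complex-valued functions on $(\oplus\Omega,\oplus\mathcal{F},m)$, with
$$\bigoplus_{\mu \in pr(\Omega,\mathcal{F})} L^\infty(\Omega,\mathcal{F},\mu) = \Big\{\{f_\mu\}_{\mu \in pr(\Omega,\mathcal{F})} : f_\mu \in L^\infty(\Omega,\mathcal{F},\mu) \ (\forall \mu \in pr(\Omega,\mathcal{F})) \text{ and } \sup_{\mu \in pr(\Omega,\mathcal{F})} \mu\text{-ess sup}\,|f_\mu| < \infty\Big\}.$$
$L^\infty(\oplus\Omega,\oplus\mathcal{F},m)$ is the dual Banach space of $L^1(\oplus\Omega,\oplus\mathcal{F},m)$. On the other hand, $L^\infty(\oplus\Omega,\oplus\mathcal{F},m)$ can also be identified with a commutative von Neumann algebra, namely the algebra of multiplication operators on $L^2(\oplus\Omega,\oplus\mathcal{F},m)$. The weak* topology and the weak operator topology on $L^\infty(\oplus\Omega,\oplus\mathcal{F},m)$ coincide because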
Let $B(\Omega,\mathcal{F})$ be the set of all bounded measurable complex-valued functions on $(\Omega,\mathcal{F})$. We define
$$\pi(f) = \{[f]_\mu\}_{\mu \in pr(\Omega,\mathcal{F})} \qquad (f \in B(\Omega,\mathcal{F})),$$
where $[f]_\mu$ denotes the $\mu$-equivalence class of $f$ in $L^\infty(\Omega,\mathcal{F},\mu)$. Let $M(\Omega,\mathcal{F})$ be the weak* closure of $\mathrm{Im}\,\pi$ in $L^\infty(\oplus\Omega,\oplus\mathcal{F},m)$. Proposition 1. $M(\Omega,\mathcal{F})$ is a von Neumann algebra and its predual is isometrically isomorphic to $ca(\Omega,\mathcal{F})$, equipped with the total variation norm.
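Proof. It is trivial that $M(\Omega,\mathcal{F})$ is a von Neumann algebra. Let $M_0$ be the polar of $M(\Omega,\mathcal{F})$, that is,
$$M_0 = \Big\{f \in L^1(\oplus\Omega,\oplus\mathcal{F},m) : \int f g\,dm = 0 \ (\forall g \in M(\Omega,\mathcal{F}))\Big\}.$$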
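$L^1(\oplus\Omega,\oplus\mathcal{F},m)/M_0$ can be identified with the predual of $M(\Omega,\mathcal{F})$. Suppose $F \in L^1(\oplus\Omega,\oplus\mathcal{F},m)/M_0$ and $f \in F$. Let
$$\nu_F(A) = \int f\,\pi(\chi_A)\,dm \qquad (\forall A \in \mathcal{F}).$$
Then $\nu_F$ does not depend on the choice of $f$ and is a countably additive bounded complex-valued measure on $(\Omega,\mathcal{F})$. Conversely, if $\nu \in ca(\Omega,\mathcal{F})$, then there exist $\alpha_1, \alpha_2, \alpha_3, \alpha_4 \ge 0$ and $\mu_1, \mu_2, \mu_3, \mu_4 \in pr(\Omega,\mathcal{F})$ such that
$$\nu = \alpha_1\mu_1 - \alpha_2\mu_2 + i(\alpha_3\mu_3 - \alpha_4\mu_4).$$
Put $c_1 = \alpha_1$, $c_2 = -\alpha_2$, $c_3 = i\alpha_3$, $c_4 = -i\alpha_4$ and
$$f_\mu = \begin{cases} c_j, & \text{if } \mu = \mu_j \text{ for some } j = 1,2,3,4; \\ 0, & \text{otherwise} \end{cases}$$
(summing the $c_j$ if several $\mu_j$ coincide). Then the class $F \in L^1(\oplus\Omega,\oplus\mathcal{F},m)/M_0$ containing $\{f_\mu\}_{\mu \in pr(\Omega,\mathcal{F})}$ satisfies $\nu_F = \nu$. It is also straightforward that the mapping $F \mapsto \nu_F$ is an isometric isomorphism. □ By the above proposition, for each $\nu \in ca(\Omega,\mathcal{F})$ there exists a unique weak* continuous linear functional $\varphi_\nu$ on $M(\Omega,\mathcal{F})$ such that
$$\varphi_\nu(\pi(f)) = \int f\,d\nu \qquad (\forall f \in B(\Omega,\mathcal{F})).$$
If $\nu \in pr(\Omega,\mathcal{F})$ and $\{f_\mu\}_{\mu \in pr(\Omega,\mathcal{F})} \in M(\Omega,\mathcal{F})$, then
$$\varphi_\nu(\{f_\mu\}_{\mu}) = \int f_\nu\,d\nu.$$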
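On a finite set the decomposition used in the proof of Proposition 1 is the Jordan decomposition of the real and imaginary parts. The sketch below assumes a nonempty finite $\Omega$ and represents measures as dictionaries; it returns $\alpha_1,\ldots,\alpha_4 \ge 0$ and probability measures $\mu_1,\ldots,\mu_4$ with $\nu = \alpha_1\mu_1 - \alpha_2\mu_2 + i(\alpha_3\mu_3 - \alpha_4\mu_4)$, and when some $\alpha_j = 0$ it uses the uniform measure as an arbitrary choice of $\mu_j$. The name `decompose` is hypothetical.

```python
def decompose(nu):
    """Split a complex measure on a finite set (dict point -> complex mass) as
    nu = a1*mu1 - a2*mu2 + 1j*(a3*mu3 - a4*mu4), a_j >= 0, mu_j probability measures."""
    parts = [
        {w: max(z.real, 0.0) for w, z in nu.items()},   # positive part of Re nu
        {w: max(-z.real, 0.0) for w, z in nu.items()},  # negative part of Re nu
        {w: max(z.imag, 0.0) for w, z in nu.items()},   # positive part of Im nu
        {w: max(-z.imag, 0.0) for w, z in nu.items()},  # negative part of Im nu
    ]
    alphas, mus = [], []
    for part in parts:
        a = sum(part.values())
        alphas.append(a)
        # normalize; if the part vanishes, any probability measure will do
        mus.append({w: (m / a if a > 0 else 1.0 / len(part)) for w, m in part.items()})
    return alphas, mus
```

The total variation $\|\nu\| = \sum_w |\nu(\{w\})|$ is at most $\alpha_1 + \cdots + \alpha_4$, so this particular representative need not attain the norm; the isometry in Proposition 1 is a statement about the quotient.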
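2. Experiments and observables. Let $\mathcal{P}$ be a nonempty subset of $pr(\Omega,\mathcal{F})$; we call it a set of experiments. Define
$$\mathcal{P}^0 = \{F \in M(\Omega,\mathcal{F}) : \varphi_\nu(|F|) = 0 \ (\forall \nu \in \mathcal{P})\}.$$
Then
$$\mathcal{P}^0 = \{\{f_\mu\}_{\mu \in pr(\Omega,\mathcal{F})} \in M(\Omega,\mathcal{F}) : f_\nu = 0 \ (\forall \nu \in \mathcal{P})\},$$
because
$$f_\nu = 0 \iff \int |f_\nu|\,d\nu = 0 \iff \varphi_\nu(\{|f_\mu|\}_{\mu \in pr(\Omega,\mathcal{F})}) = 0.$$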
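We can easily see that $\mathcal{P}^0$ is a weak* closed ideal of $M(\Omega,\mathcal{F})$. Therefore there exists $E \in M(\Omega,\mathcal{F})$ such that $E = E^* = E^2$ (i.e., $E$ is $\{0,1\}$-valued and it is an orthogonal projection as a multiplication operator) and $\mathcal{P}^0 = \{EF : F \in M(\Omega,\mathcal{F})\}$. Let $E$ be $\{e_\mu\}_{\mu \in pr(\Omega,\mathcal{F})}$. Put
$$\bar{\mathcal{P}} = \{\mu \in pr(\Omega,\mathcal{F}) : e_\mu = 0\}.$$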
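Lemma 2. $\bar{\mathcal{P}} = \{\mu \in pr(\Omega,\mathcal{F}) : \varphi_\mu(F) = 0 \ (\forall F \in \mathcal{P}^0)\}$.
Proof. ($\subseteq$) Let $\nu \in \bar{\mathcal{P}}$. Then
$$e_\nu = 0 \Rightarrow [f_\mu = 0 \ (\forall \mu \in \mathcal{P}) \Rightarrow f_\nu = 0] \Rightarrow [|f_\mu| = 0 \ (\forall \mu \in \mathcal{P}) \Rightarrow f_\nu = 0] \Rightarrow [\varphi_\mu(|F|) = 0 \ (\forall \mu \in \mathcal{P}) \Rightarrow \varphi_\nu(F) = 0] \Rightarrow \varphi_\nu(F) = 0 \ (\forall F \in \mathcal{P}^0).$$
($\supseteq$) Because $F \in \mathcal{P}^0$ implies $|F| \in \mathcal{P}^0$,
$$\varphi_\nu(F) = 0 \ (\forall F \in \mathcal{P}^0) \Rightarrow \varphi_\nu(|F|) = 0 \ (\forall F \in \mathcal{P}^0) \Rightarrow [\varphi_\mu(|F|) = 0 \ (\forall \mu \in \mathcal{P}) \Rightarrow \varphi_\nu(|F|) = 0] \Rightarrow [f_\mu = 0 \ (\forall \mu \in \mathcal{P}) \Rightarrow f_\nu = 0] \Rightarrow e_\nu = 0 \Rightarrow \nu \in \bar{\mathcal{P}}.$$
□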
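When $\Omega$ is finite and $\mathcal{F}$ is its power set, $M(\Omega,\mathcal{F})$ can be identified with the functions on $\Omega$, $\mathcal{P}^0$ with the functions vanishing on the union $S$ of the supports of the measures in $\mathcal{P}$, and $E$ with the indicator of $\Omega \setminus S$. Under these assumptions (our simplification, not the general setting of the note) Lemma 2 says that $\bar{\mathcal{P}}$ consists of the probability measures carried by $S$, and the domination condition that appears in Lemma 3 can be tested directly. The helper names are hypothetical.

```python
def support(mu):
    return {w for w, p in mu.items() if p > 0}


def union_support(P):
    S = set()
    for nu in P:
        S |= support(nu)
    return S


def in_bar_P(mu, P):
    """e_mu = 0 with E the indicator of Omega minus S, i.e. supp(mu) is contained in S."""
    return support(mu) <= union_support(P)


def annihilates_P0(mu, P, omega):
    """Right-hand side of Lemma 2: phi_mu(F) = 0 for every F in P^0. Here P^0 is the set of
    functions vanishing on S, spanned by point indicators off S, so testing those suffices."""
    S = union_support(P)
    return all(mu.get(w, 0.0) == 0.0 for w in omega if w not in S)


def domination_constant(nu, P):
    """Smallest c with |nu| <= c * sum_{mu in P} mu pointwise, or None if no such c exists."""
    c = 0.0
    for w, z in nu.items():
        mass = sum(mu.get(w, 0.0) for mu in P)
        if abs(z) > 0 and mass == 0:
            return None
        if mass > 0:
            c = max(c, abs(z) / mass)
    return c
```

In this finite case a measure is dominated by a combination of elements of $\mathcal{P}$ exactly when it lives on $S$, so no norm closure is needed.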
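Let $\mathcal{L}(\bar{\mathcal{P}})$ be the linear span of $\bar{\mathcal{P}}$. Then we have the following.
Lemma 3.
$$\mathcal{L}(\bar{\mathcal{P}}) = \{\nu \in ca(\Omega,\mathcal{F}) : \varphi_\nu(F) = 0 \ (\forall F \in \mathcal{P}^0)\} = \overline{\{\nu \in ca(\Omega,\mathcal{F}) : \exists \mu_1,\ldots,\mu_n \in \mathcal{P},\ \exists c_1,\ldots,c_n > 0,\ |\nu| \le c_1\mu_1 + \cdots + c_n\mu_n\}},$$
where $|\nu|$ means the total variation measure of $\nu$ and the bar denotes the closure in the norm topology. Proof. The first equality follows from the previous lemma and the Hahn-Jordan decomposition. From this, $\mathcal{L}(\bar{\mathcal{P}})$ is a closed subspace of $ca(\Omega,\mathcal{F})$. Put
$$X = \{\nu \in ca(\Omega,\mathcal{F}) : \exists \mu_1,\ldots,\mu_n \in \mathcal{P},\ \exists c_1,\ldots,c_n > 0,\ |\nu| \le c_1\mu_1 + \cdots + c_n\mu_n\}.$$
$\mathcal{L}(\bar{\mathcal{P}}) \supseteq \overline{X}$ is trivial. Now suppose that there exists $\nu \in \mathcal{L}(\bar{\mathcal{P}}) \setminus \overline{X}$. By the Hahn-Banach theorem and Proposition 1 there exists $F \in M(\Omega,\mathcal{F})$ such that
$$\varphi_\nu(F) \ne 0 \quad \text{and} \quad \varphi_\mu(F) = 0 \ (\forall \mu \in X).$$
This contradicts the first equality. □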
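Proposition 4. $M(\Omega,\mathcal{F})/\mathcal{P}^0$ is a von Neumann algebra and its predual is isometrically isomorphic to $\mathcal{L}(\bar{\mathcal{P}})$.
Proof. This is a direct consequence of
$$\mathcal{P}^0 = \{F \in M(\Omega,\mathcal{F}) : \varphi_\nu(F) = 0 \ (\forall \nu \in \mathcal{L}(\bar{\mathcal{P}}))\}.$$
□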
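We denote $M(\Omega,\mathcal F)/P^\circ$ by $M(\Omega,\mathcal F,P)$ and call it a set of observables. This space can also be constructed as follows. The direct sum
$$\bigoplus_{\mu\in P}L^\infty(\Omega,\mathcal F,\mu) = \Big\{\{f_\mu\}_{\mu\in P} : f_\mu\in L^\infty(\Omega,\mathcal F,\mu)\ (\forall\mu\in P)\ \text{and}\ \sup_{\mu\in P}\ \mu\text{-ess.sup}\,|f_\mu|<\infty\Big\}$$
is a von Neumann algebra and its predual is
$$\bigoplus_{\mu\in P}L^1(\Omega,\mathcal F,\mu) = \Big\{\{f_\mu\}_{\mu\in P} : f_\mu\in L^1(\Omega,\mathcal F,\mu)\ (\forall\mu\in P)\ \text{and}\ \sum_{\mu\in P}\int|f_\mu|\,d\mu<\infty\Big\}.$$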
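We define a mapping $\pi_P$ from $B(\Omega,\mathcal F)$ into $\bigoplus_{\mu\in P}L^\infty(\Omega,\mathcal F,\mu)$ by
$$\pi_P(f) = \{f\}_{\mu\in P}\quad(\forall f\in B(\Omega,\mathcal F)),$$
that is, $f$ is regarded as an element of each $L^\infty(\Omega,\mathcal F,\mu)$. Then $\pi_P$ is a *-homomorphism and
$$\ker\pi_P = \{f\in B(\Omega,\mathcal F) : f = 0\ \mu\text{-a.e.}\ (\forall\mu\in P)\}.$$
Since $\ker\pi_P$ is a closed ideal of $B(\Omega,\mathcal F)$, $B(\Omega,\mathcal F)/\ker\pi_P$ is a C*-algebra with norm
$$\|F\| = \inf_{f\in F}\|f\|\quad(F\in B(\Omega,\mathcal F)/\ker\pi_P).$$
Moreover, this space is *-isomorphic to $\operatorname{Im}\pi_P$. Hence
$$\|[f]_P\| = \sup_{\mu\in P}\|f\|_{\infty,\mu}\quad(\forall f\in B(\Omega,\mathcal F)),$$
where $[f]_P = \{g\in B(\Omega,\mathcal F) : f = g\ \mu\text{-a.e.}\ (\forall\mu\in P)\}$ and $\|\cdot\|_{\infty,\mu}$ is the norm of $L^\infty(\Omega,\mathcal F,\mu)$.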
Theorem 5. $M(\Omega,\mathcal F,P)$ is the closure of $\operatorname{Im}\pi_P$ in $\bigoplus_{\mu\in P}L^\infty(\Omega,\mathcal F,\mu)$ in the weak* topology. In particular, $\operatorname{Im}\pi_P$ is weak* closed if and only if there exists a localizable measure $\lambda$ on $(\Omega,\mathcal F)$ such that each $\mu\in P$ has a density $d\mu/d\lambda$.

Proof. The former assertion is easy, so we only prove the latter. Suppose $\operatorname{Im}\pi_P$ is weak* closed, namely, it is a commutative von Neumann algebra. Since commutative von Neumann algebras are semi-finite, there exists a faithful normal semi-finite trace $\tau$ on it. Put
$$\lambda(A) = \tau(\pi_P(1_A))\quad(A\in\mathcal F).$$
This is the localizable measure we want. Conversely, suppose there exists a localizable measure $\lambda$ on $(\Omega,\mathcal F)$ such that each $\mu\in P$ has a density $d\mu/d\lambda$. We define
$$I = \{f\in L^\infty(\Omega,\mathcal F,\lambda) : f = 0\ \mu\text{-a.e.}\ (\forall\mu\in P)\}.$$
It can then easily be seen that $I$ is a weak* closed ideal of $L^\infty(\Omega,\mathcal F,\lambda)$ and that $L^\infty(\Omega,\mathcal F,\lambda)/I$ is *-isomorphic to $B(\Omega,\mathcal F)/\ker\pi_P$, and hence to $\operatorname{Im}\pi_P$. $\square$

Remark. The topology on $B(\Omega,\mathcal F)/\ker\pi_P$ induced from the weak* topology coincides with the topology defined by [Pitcher, 1965]. An analogous theorem is also proved by [Luschgy and Mussmann, 1985].

Example 1. Let $\Omega$ be $[0,1]$ and $\mathcal F$ the Borel field. Suppose $P$ is the set of all Dirac measures on $(\Omega,\mathcal F)$. Then $\bigoplus_{\mu\in P}L^\infty(\Omega,\mathcal F,\mu)$ is identified with $\ell^\infty[0,1]$ (the set of all bounded complex-valued functions on $[0,1]$), and so is $M(\Omega,\mathcal F,P)$. It is strictly larger than $\operatorname{Im}\pi_P$. However, we can modify $\mathcal F$ so that $M(\Omega,\mathcal F,P)$ is identified with $\operatorname{Im}\pi_P$: namely, let $\mathcal F$ be the power set of $\Omega$.

Example 2. Let $(\Omega,\mathcal F)$ be as above. Suppose $P$ is the set consisting of all Dirac measures and the Lebesgue measure. Then
$$\bigoplus_{\mu\in P}L^\infty(\Omega,\mathcal F,\mu) = M(\Omega,\mathcal F,P).$$
In this example, no modification like the one above is possible.

3. σ-subfields and sufficiency. Let $\mathcal G$ be a σ-subfield of $\mathcal F$ and $P|_{\mathcal G} = \{\mu|_{\mathcal G} : \mu\in P\}$. If there is no ambiguity, however, $P|_{\mathcal G}$ is simply denoted by $P$, as in $M(\Omega,\mathcal G,P)$ rather than $M(\Omega,\mathcal G,P|_{\mathcal G})$. The same convention applies to $\mu\in pr(\Omega,\mathcal F)$, as in $L^p(\Omega,\mathcal G,\mu)$ rather than $L^p(\Omega,\mathcal G,\mu|_{\mathcal G})$. $M(\Omega,\mathcal G)$ and $M(\Omega,\mathcal G,P)$ are regarded as von Neumann subalgebras of $M(\Omega,\mathcal F)$ and $M(\Omega,\mathcal F,P)$, respectively. The conditional expectation of $f\in L^p(\Omega,\mathcal F,\mu)$ with respect to $\mathcal G$ is denoted by $E_\mu(f|\mathcal G)$ for each $\mu\in pr(\Omega,\mathcal F)$. The mapping $E_\mu(\cdot|\mathcal G)$ is a projection of norm one from $L^p(\Omega,\mathcal F,\mu)$ onto the subspace $L^p(\Omega,\mathcal G,\mu)$ for every $1\le p\le\infty$. Now we define
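Then $E(\cdot|\mathcal G)$ is a projection of norm one from $L^p(\oplus\Omega,\oplus\mathcal F,m)$ onto $L^p(\oplus\Omega,\oplus\mathcal G,m)$ for every $1\le p\le\infty$. $E(\cdot|\mathcal G)$ naturally induces a projection of norm one from $\bigoplus_{\mu\in P}L^p(\Omega,\mathcal F,\mu)$ onto $\bigoplus_{\mu\in P}L^p(\Omega,\mathcal G,\mu)$. Is the image of $M(\Omega,\mathcal F,P)$ under this projection contained in $M(\Omega,\mathcal G,P)$? This containment does not always hold. If it does, we say that $\mathcal G$ is sufficient for $P$. Namely, $\mathcal G$ is sufficient for $P$ if and only if $\{E_\mu(f|\mathcal G)\}_{\mu\in P}$ belongs to $M(\Omega,\mathcal G,P)$ for all $f\in B(\Omega,\mathcal F)$. In general, this condition is genuinely weaker than the requirement that for any $f\in B(\Omega,\mathcal F)$ there exist $g\in B(\Omega,\mathcal G)$ such that $E_\mu(f|\mathcal G) = g$ $\mu$-a.e. for all $\mu\in P$.

Let us consider Example 1 of the previous section and let $\mathcal G$ be the σ-subfield generated by all singletons contained in $\Omega$. Then $\mathcal G$ is sufficient for $P$, because $M(\Omega,\mathcal G) = M(\Omega,\mathcal F)$ and $E(\cdot|\mathcal G)$ is the identity on $M(\Omega,\mathcal F)$. For any $f\in B(\Omega,\mathcal F)$ we have $E_{\delta_x}(f|\mathcal G) = f(x)$, but if $f$ is not $\mathcal G$-measurable, no $g\in B(\Omega,\mathcal G)$ satisfies $f = g$. Note that $P$ is dominated by the semi-finite counting measure $\lambda$ and that $d\delta_x/d\lambda$ is $\mathcal G$-measurable. In Example 2, $\mathcal G$ is also sufficient for $P$, because $M(\Omega,\mathcal G,P)$ is equal to $\ell^\infty[0,1]\oplus\mathbb C1$ and $E(\cdot|\mathcal G)$ maps $f\in B(\Omega,\mathcal F)$ to $f\oplus\int_0^1 f(x)\,dx$.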
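On a finite $\Omega$ every family $P$ is dominated, and the classical condition quoted above (a single $\mathcal G$-measurable $g$ that serves as $E_\mu(f|\mathcal G)$ for every $\mu\in P$) can be checked by brute force. The sketch below is illustrative only and does not come from the paper. It assumes $\Omega = \{0,1\}^n$, a finite family $P$ of product Bernoulli measures, and $\mathcal G$ generated by a statistic. The helper names are hypothetical. Exact rational arithmetic (`fractions.Fraction`) is used so that the comparison of versions is exact.

```python
from fractions import Fraction
from itertools import product

def bernoulli_measure(theta, n):
    """Product Bernoulli(theta) measure on Omega = {0,1}^n, stored as {omega: mass}."""
    return {w: theta ** sum(w) * (1 - theta) ** (n - sum(w))
            for w in product((0, 1), repeat=n)}

def cond_exp(f, mu, statistic):
    """E_mu(f | G) for the sigma-field G generated by the blocks {statistic = k}.

    On each block the result is the mu-weighted average of f over that block.
    """
    num, den = {}, {}
    for w, p in mu.items():
        k = statistic(w)
        num[k] = num.get(k, 0) + f(w) * p
        den[k] = den.get(k, 0) + p
    return {w: num[statistic(w)] / den[statistic(w)] for w in mu}

def has_common_version(f, measures, statistic):
    """True if one G-measurable g equals E_mu(f | G) for every mu in the family."""
    versions = [cond_exp(f, mu, statistic) for mu in measures]
    return all(v == versions[0] for v in versions[1:])

n = 4
P = [bernoulli_measure(Fraction(k, 6), n) for k in (1, 2, 3)]
f = lambda w: w[0] * w[1]          # F-measurable but not G-measurable

print(has_common_version(f, P, sum))              # True: G generated by the number of ones
print(has_common_version(f, P, lambda w: w[0]))   # False: G generated by the first coordinate
```

The number of ones gives a common version, because given the sum every configuration in the block is equally likely. The first coordinate alone does not, since $E_\theta(w_0w_1|w_0) = \theta w_0$ depends on $\theta$.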
4. Gibbs states on a countable set. Let $S$ be a countable set and $\Omega$ the power set of $S$ (the set of all subsets of $S$). For each $s\in S$, we define a $\{0,1\}$-valued function $\sigma_s$ on $\Omega$ by
$$\sigma_s(X) = 1_X(s)\quad(X\in\Omega),$$
where $1_X$ is the indicator function of $X$ on $S$. It is well known that the weakest topology on $\Omega$ induced by $\{\sigma_s\}_{s\in S}$ is totally disconnected, compact and metrizable. The space $C(\Omega)$ of all complex-valued continuous functions on $\Omega$ is a C*-algebra. The Borel field on $\Omega$ is denoted by $\mathcal F$. It is the smallest σ-field on $\Omega$ generated by $\{\sigma_s\}_{s\in S}$, and it coincides with the Baire field on $\Omega$, which is generated by $C(\Omega)$. Every probability measure is identified with a state on $C(\Omega)$ (i.e., a positive linear functional $\mu$ with $\mu(1) = 1$). $\bigoplus_{\mu\in ca(\Omega,\mathcal F)}L^\infty(\Omega,\mathcal F,\mu)$ is known as the enveloping von Neumann algebra, which is the second dual Banach space $C(\Omega)^{**}$ of $C(\Omega)$. For any $\Lambda\subseteq S$, we denote by $\mathcal F_\Lambda$ the σ-subfield of $\mathcal F$ generated by $\{\sigma_s\}_{s\in\Lambda}$. Clearly $\mathcal F_\emptyset = \{\emptyset,\Omega\}$ and $\mathcal F_S = \mathcal F$. We put
$$[A,\Lambda] = \{Y\in\Omega : Y\cap\Lambda = A\}\quad(A\subseteq\Lambda).$$
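Then $\mathcal F_\Lambda$ is the smallest σ-subfield containing $\{[A,\Lambda] : A\subseteq\Lambda\}$. Let $\mathcal C$ be the set of all finite subsets of $S$. A subset $\{f_\Lambda\}_{\Lambda\in\mathcal C}$ of $C(\Omega)$ indexed by $\mathcal C$ is called a local specification if
$$f_\Lambda(X)\ge0\quad(X\in\Omega),\qquad \sum_{A\subseteq\Lambda}f_\Lambda(A\cup B) = 1\quad(B\subseteq\Lambda^c)$$
for all $\Lambda\in\mathcal C$, and
$$f_{\Lambda_2}(A\cup B) = f_{\Lambda_1}(A\cup B)\sum_{A'\subseteq\Lambda_1}f_{\Lambda_2}(A'\cup B)$$
for all $A\subseteq\Lambda_1\subseteq\Lambda_2\in\mathcal C$ and $B\subseteq\Lambda_1^c$. We say that $\mu\in pr(\Omega,\mathcal F)$ is a Gibbs state with specification $\{f_\Lambda\}_{\Lambda\in\mathcal C}$ if
$$E_\mu(1_{[A,\Lambda]}|\mathcal F_{\Lambda^c})(X) = f_\Lambda(A\cup(X\cap\Lambda^c))$$
for all $A\subseteq\Lambda\in\mathcal C$ and $X\in\Omega$. Let $P$ be the set of Gibbs states with specification $\{f_\Lambda\}_{\Lambda\in\mathcal C}$. It is known that $P$ is a non-empty compact convex subset of $pr(\Omega,\mathcal F)$ in the vague topology (see, for example, [Preston, 1974]). For any $\mu\in P$, if $\Lambda\subseteq\Lambda'$, then
$$\begin{aligned}
E_\mu(1_{[A,\Lambda']}|\mathcal F_{\Lambda^c})(X) &= E_\mu(1_{[A\cap\Lambda,\Lambda]\cap[A\cap(\Lambda'\setminus\Lambda),\Lambda'\setminus\Lambda]}|\mathcal F_{\Lambda^c})(X)\\
&= E_\mu(1_{[A\cap\Lambda,\Lambda]}1_{[A\cap(\Lambda'\setminus\Lambda),\Lambda'\setminus\Lambda]}|\mathcal F_{\Lambda^c})(X)\\
&= 1_{[A\cap(\Lambda'\setminus\Lambda),\Lambda'\setminus\Lambda]}(X)\,E_\mu(1_{[A\cap\Lambda,\Lambda]}|\mathcal F_{\Lambda^c})(X)\\
&= 1_{[A\cap(\Lambda'\setminus\Lambda),\Lambda'\setminus\Lambda]}(X)\,f_\Lambda((A\cap\Lambda)\cup(X\cap\Lambda^c))
\end{aligned}$$
for all $A\subseteq\Lambda'$ and $X\in\Omega$. The right-hand side does not depend on $\mu\in P$, so $\mathcal F_{\Lambda^c}$ is sufficient for $P$. We put $\mathcal F_\infty = \bigcap_{\Lambda\in\mathcal C}\mathcal F_{\Lambda^c}$. Since $M(\Omega,\mathcal F_\infty,P) = \bigcap_{\Lambda\in\mathcal C}M(\Omega,\mathcal F_{\Lambda^c},P)$, the martingale convergence theorem on von Neumann algebras (see [Tsukada, 1985]) yields the following theorem.

Theorem. $\mathcal F_\infty$ is sufficient for $P$.

REFERENCES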
1. P.R. Halmos & L.J. Savage, Application of the Radon-Nikodym theorem to the theory of sufficient statistics, Ann. Math. Statist. 20 (1949), 225-241.
2. F. Hiai, M. Ohya & M. Tsukada, Sufficiency, KMS condition and relative entropy in von Neumann algebras, Pacific J. Math. 96 (1981), 99-109.
3. F. Hiai, M. Ohya & M. Tsukada, Sufficiency and relative entropy in *-algebras with applications in quantum systems, Pacific J. Math. 107 (1983), 117-140.
4. L. LeCam, Asymptotic Methods in Statistical Decision Theory, Springer, 1986.
5. H. Luschgy & D. Mussmann, Equivalent properties and completion of statistical experiments, Sankhyā: Indian J. Stat. 47 (1985), 174-195.
6. T.S. Pitcher, A more general property than domination for sets of probability measures, Pacific J. Math. 15 (1965), 597-611.
7. D. Petz, Sufficient subalgebras and the relative entropy of states on a von Neumann algebra, Commun. Math. Phys. 105 (1986), 123-131.
8. C. Preston, Gibbs States on Countable Sets, Cambridge Univ. Press, 1974.
9. M. Tsukada, Convergence of closed convex sets and σ-fields, Z. Wahrsch. verw. Geb. 62 (1983), 137-146.
10. M. Tsukada, The strong limit of von Neumann subalgebras with conditional expectations, Proc. Amer. Math. Soc. 94 (1985), 259-264.
11. H. Umegaki, Conditional expectation in an operator algebra III, Kodai Math. Sem. Rep. 11 (1959), 51-64.
Nonlinear Parabolic Equations, Favard Classes, and Regularity

GISELE RUIZ GOLDSTEIN†
Department of Mathematics, Louisiana State University, Baton Rouge, LA 70803, and CERI and Department of Mathematical Sciences, University of Memphis, Memphis, TN 38152

†Partially supported by an NSF grant.

1. INTRODUCTION

Let $A$ be an m-dissipative operator (not necessarily linear) on a Banach space $X$. By the Crandall-Liggett theorem, $A$ determines a contraction semigroup $T$ on $\overline{D(A)}$. The Favard class (or generalized domain) $\hat D(A)$ is defined to be
$$\hat D(A) = \{f\in\overline{D(A)} : \sup_{\lambda>0}\|A_\lambda f\|<\infty\}.$$
Here $A_\lambda$ is the Yosida approximation of $A$, namely $A_\lambda = \lambda^{-1}((I-\lambda A)^{-1}-I)$ for $\lambda>0$. It is not difficult to show that the Favard class can be equivalently defined as
$$\begin{aligned}
\hat D(A) &= \{f\in\overline{D(A)} : \|T(t)f-f\|\le M_ft \text{ for some } M_f>0 \text{ and } 0<t<1\}\\
&= \{f\in\overline{D(A)} : \text{for some sequence } \{g_n\}\subset D(A) \text{ with } g_n\to f,\ \{Ag_n\} \text{ is bounded}\}.
\end{aligned}$$
Clearly, $D(A)\subset\hat D(A)\subset\overline{D(A)}$,
Goldstein
254
and one can show that $\hat D(A) = D(A)$ if $X$ is reflexive. From our perspective, the most important aspect of the Favard class is the property
$$T(t)(\hat D(A))\subset\hat D(A)\quad\text{for each } t>0, \tag{1.1}$$
that is, the Favard class is an invariant set for the semigroup. Hence the Favard class contains information on the spatial regularity of a problem. For example, if we can show that $W_0^{k,p}(\Omega)\subset\hat D(A)\subset W^{k,p}(\Omega)$, then (1.1) says that the solution $u(t)$ has spatial derivatives up to and including order $k$, each of which is in $L^p(\Omega)$. The drawback of this method is that $\hat D(A)$ is very difficult to compute explicitly. Our purpose in this paper is to calculate the Favard class explicitly for a nonlinear parabolic problem with degeneracy and to draw some conclusions about regularity.
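The Yosida approximation in the definition of $\hat D(A)$, and the inclusion $D(A)\subset\hat D(A)$, can be seen in a linear, finite-dimensional stand-in. This is a sketch under explicit assumptions and is not the nonlinear operator of Section 2. Here $A$ is the standard three-point discretization of $d^2/dx^2$ on $n$ interior grid points with zero boundary values, which is dissipative in the sup norm. For linear dissipative $A$ one has $A_\lambda f = (I-\lambda A)^{-1}Af$, hence $\|A_\lambda f\|\le\|Af\|$. The script checks this for $f(x) = x(1-x)$, whose discrete second difference is exactly $-2$. The helper names are illustrative.

```python
def solve_tridiagonal(sub, diag, sup, rhs):
    """Thomas algorithm; adequate for the strictly diagonally dominant systems used here."""
    n = len(diag)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = sup[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - sub[i] * cp[i - 1]
        cp[i] = sup[i] / m if i < n - 1 else 0.0
        dp[i] = (rhs[i] - sub[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def apply_A(u, h):
    """Discrete Dirichlet Laplacian: (Au)_i = (u_{i-1} - 2u_i + u_{i+1}) / h^2, u_0 = u_{n+1} = 0."""
    p = [0.0] + list(u) + [0.0]
    return [(p[i] - 2.0 * p[i + 1] + p[i + 2]) / h ** 2 for i in range(len(u))]

def yosida(u, h, lam):
    """A_lambda u = ((I - lam A)^{-1} u - u) / lam."""
    n, r = len(u), lam / h ** 2
    resolvent_u = solve_tridiagonal([-r] * n, [1.0 + 2.0 * r] * n, [-r] * n, u)
    return [(ru - ui) / lam for ru, ui in zip(resolvent_u, u)]

def sup_norm(v):
    return max(abs(t) for t in v)

n = 49
h = 1.0 / (n + 1)
f = [x * (1.0 - x) for x in ((i + 1) * h for i in range(n))]   # here Af = -2 at every node
norm_Af = sup_norm(apply_A(f, h))
lams = [1e-1, 1e-2, 1e-3, 1e-4]
norms = [sup_norm(yosida(f, h, lam)) for lam in lams]
print(norm_Af, norms)   # up to rounding, every entry of norms is at most ||Af|| = 2
```

In this example the norms also increase toward $\|Af\|$ as $\lambda\downarrow0$. In finite dimensions every vector lies in $D(A)$, so the sketch only illustrates the definition. Nontrivial Favard classes arise in infinite dimensions, as in Theorem 2.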
The problem of calculating the Favard class for this problem in the case $\psi = 0$, with either Dirichlet or nonlinear boundary conditions, was studied in [4], [5]. In this paper we consider a more general operator with several different types of boundary conditions, so this paper gives new results even when no lower order terms are present. The main result is Theorem 2. It is stated in Section 2 and proved in Section 3. Section 4 contains some extensions, while Section 5 contains concluding remarks and directions for future research.

2. A SINGULAR NONLINEAR PARABOLIC PROBLEM
We consider the problem
$$u_t = \varphi(x,u_x)u_{xx} + \psi(x,u,u_x) \tag{2.1}$$
for $x\in[0,1]$ and $t\in[0,\infty)$. Let $X = C[0,1]$; we assume the initial condition $u(0,x) = u_0(x)$.

We allow several types of boundary conditions at $j = 0,1$:
$$(\mathrm{BC}_j)_D:\quad u(t,j) = 0,$$
$$(\mathrm{BC}_j)_N:\quad (-1)^ju_x(t,j)\in\beta_j(u(t,j)).$$
Here $\beta_j$ is a strictly increasing maximal monotone graph in $\mathbb R^2$ containing the origin. Thus $0\in\beta_j(0)$, and if $y_i\in\beta_j(x_i)$ for $i = 1,2$ and $x_1<x_2$, then $y_1<y_2$. Note that $(\mathrm{BC}_j)_N$ includes the linear boundary conditions
255
Parabolic Equations, Favard Classes, Regularity
$$(-1)^ju_x(t,j) = \alpha_ju(t,j)\quad\text{for } \alpha_j>0.$$
We also allow periodic-like boundary conditions
$$(\mathrm{BC})_P:\quad u(t,0) = u(t,1),\qquad u_x(t,0) = u_x(t,1).$$
Regarding $\varphi$ and $\psi$ we assume that $\varphi(x,q)>0$ for $0<x<1$ and
$$\varphi(x,q)\ge\varphi_0(x),\ \text{where}\ \varphi_0\in C[0,1],\ \varphi_0(x)>0\ \text{for}\ x\in[0,1]\setminus S,\ \text{and}\ \varphi_0^{-1}\in L^1[0,1]. \tag{2.2}$$
Here $S = \{x\in[0,1] : \varphi_0(x) = 0\}$. (Hence $\operatorname{meas}S = 0$.) We also assume that there exist positive constants $L$ and $N$ and a continuous nondecreasing function $M : [0,\infty)\to[0,\infty)$ such that
$$|\psi(x,p,q)-\psi(x,\bar p,q)|\le L|p-\bar p|, \tag{2.3}$$
$$|\psi(x,p,q)|\le M(|p|)(1+\varphi(x,q))(1+|q|), \tag{2.4}$$
$$|\psi(x,p,0)|\le N(1+|p|). \tag{2.5}$$
In fact, the constant $L$ can be replaced by a continuous nondecreasing function $L(|q|)$, so that (2.3) holds only locally, and our theorems remain valid. For such extensions see [2]. Let $X$ be the Banach space $C[0,1]$ with the sup norm. We define the operator $A$ on $X$ by
$$Au = \varphi(\cdot,u')u'' + \psi(x,u,u').$$
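Choose one boundary condition at $j = 0$ and one at $j = 1$ from the conditions $(\mathrm{BC}_j)_D$, $(\mathrm{BC}_j)_N$, or choose $(\mathrm{BC})_P$. Then we define the set
$$Y_{BC} = \begin{cases}
C[0,1] & \text{if } (\mathrm{BC}_0)_D \text{ and } (\mathrm{BC}_1)_D \text{ hold},\\
C[0,1]\cap C^1[0,1) & \text{if } (\mathrm{BC}_0)_N \text{ and } (\mathrm{BC}_1)_D \text{ hold},\\
C[0,1]\cap C^1(0,1] & \text{if } (\mathrm{BC}_0)_D \text{ and } (\mathrm{BC}_1)_N \text{ hold},\\
C^1[0,1] & \text{if } (\mathrm{BC}_j)_N \text{ holds for } j = 0,1, \text{ or if } (\mathrm{BC})_P \text{ holds}.
\end{cases}$$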
We define the domain $D(A)$ of the operator $A$ by
$$D(A) = \{u\in Y_{BC}\cap C^2(0,1) : Au\in C[0,1] \text{ and } u \text{ satisfies the chosen boundary conditions at } x = 0,1\}.$$
Theorem 1: $A$ is m-dissipative on $X$.
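This result is due to J.A. Goldstein and C.Y. Lin [8] in the special case $\psi = 0$, $\varphi_0>0$. We further assume that there exist constants $C_0>0$ and $\varepsilon_0>0$ such that
$$\varphi(x,\xi+a)\le C_0\varphi(x,\xi),\qquad \psi(x,y,\xi+a)\le C_0\psi(x,y,\xi) \tag{2.6}$$
for all $x\in[0,1]$, all $\xi\in\mathbb R$ and all $a\in\mathbb R$ with $|a|<\varepsilon_0$.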
The following theorem is the main result.

Theorem 2: Define $A$ and $D(A)$ as in the preceding discussion, and assume (2.2)-(2.6). Then the Favard class of $A$ is
$$\hat D(A) = \{u\in C^2(0,1)\cap Y_{BC} : u'\in AC[0,1],\ \varphi(x,u')u''+\psi(x,u,u')\in L^\infty(0,1), \text{ and } u \text{ satisfies the boundary conditions at } j = 0,1\}.$$
3. PROOF OF THEOREM 2

Proof: Let
$$D = \{u\in C^2(0,1)\cap Y_{BC} : u'\in AC[0,1],\ Au\in L^\infty(0,1), \text{ and } u \text{ satisfies the boundary conditions at } j = 0,1\}.$$
Our goal is to show that $D$ is precisely the Favard class of $A$. First, we show that $\hat D(A)\subseteq D$. Let $u\in\hat D(A)$, and choose $u_n\in D(A)$ such that $\|u_n-u\|_\infty\to0$ and $\|Au_n\|_\infty\le M_1$ for some $M_1>0$ and all $n = 1,2,\dots$. Let $M_0 = \|\varphi_0^{-1}\|_1$. Then for all $n\ge1$,
(3.1)
Define two quantities
$$\omega_L(f;\delta) = \sup\Big\{\int_E|f(x)|\,dx : E \text{ is a subinterval of } [0,1] \text{ with } |E|<\delta\Big\}$$
and
$$\omega_C(f;\delta) = \sup\{|f(x)-f(y)| : x,y\in[0,1] \text{ with } |x-y|\le\delta\}.$$
It is not difficult to show (cf. [1]) that for $\delta\in(0,\tfrac12]$, if $f\in Y_{BC}\cap C^2(0,1)$ with $f''\in L^1[0,1]$, then
$$\|f'\|_\infty\le4\|f\|_\infty+\|f''\|_1 \tag{3.2}$$
and
$$\|f'\|_\infty\le\frac2\delta\|f\|_\infty+\omega_L(f'';\delta). \tag{3.3}$$
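The two moduli and the estimates (3.2)-(3.3) are easy to probe numerically. Below is a minimal sketch under stated assumptions. It uses a uniform grid with `M` = 1000 cells, the trapezoid rule for $\omega_L$, suprema taken only over grid intervals of length at most $\delta$, and the illustrative test function $f(x) = \sin 5x$, which does not come from the paper. Since $f'(x)-f'(y) = \int_y^x f''$, one also expects $\omega_C(f';\delta)\le\omega_L(f'';\delta)$. This is the link between the uniform-integrability and equicontinuity statements that follow.

```python
import math

M = 1000                                   # grid cells on [0, 1] (illustrative choice)
XS = [k / M for k in range(M + 1)]

def omega_L(g, delta):
    """Grid version of sup{ integral over E of |g| : E an interval in [0,1], |E| <= delta }."""
    cum = [0.0]
    for a, b in zip(XS, XS[1:]):           # cumulative trapezoid rule for |g|
        cum.append(cum[-1] + 0.5 * (b - a) * (abs(g(a)) + abs(g(b))))
    w = int(round(delta * M))
    return max(cum[k + w] - cum[k] for k in range(M - w + 1))

def omega_C(g, delta):
    """Grid version of sup{ |g(x) - g(y)| : x, y in [0,1], |x - y| <= delta }."""
    vals = [g(x) for x in XS]
    w = int(round(delta * M))
    return max(abs(vals[j] - vals[i])
               for i in range(M + 1) for j in range(i, min(i + w, M) + 1))

# Illustrative test function f(x) = sin(5x): ||f||_inf = 1 and ||f'||_inf = 5.
fp = lambda x: 5.0 * math.cos(5.0 * x)
fpp = lambda x: -25.0 * math.sin(5.0 * x)
norm_f, norm_fp = 1.0, 5.0

# (3.2): ||f'|| <= 4||f|| + ||f''||_1, where ||f''||_1 = omega_L(f''; 1).
print(norm_fp <= 4 * norm_f + omega_L(fpp, 1.0))
for delta in (0.5, 0.25, 0.1):
    rhs = (2 / delta) * norm_f + omega_L(fpp, delta)          # right-hand side of (3.3)
    print(delta, norm_fp <= rhs, omega_C(fp, delta) <= omega_L(fpp, delta) + 1e-3)
```

The small slack `1e-3` absorbs the quadrature error of the trapezoid rule near the zeros of $f''$.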
Also, notice that if $\{f_n\}\subseteq L^1[0,1]$, then the statement that for every $\varepsilon>0$ there is a $\delta>0$ such that $\omega_L(f_n;\delta)<\varepsilon$ for all $n$ is equivalent to saying that the sequence $\{f_n\}$ is uniformly integrable on $[0,1]$. Similarly, the statement that for each $\varepsilon>0$ there is a $\delta>0$ such that $\omega_C(f_n;\delta)<\varepsilon$ for all $n$ is equivalent to saying that the sequence $\{f_n\}$ is equicontinuous on $[0,1]$.

In our problem, writing $u_n''(x)$ in terms of $\varphi_0(x)>0$,
(3.4)
it follows that, given $\varepsilon>0$,
(3.5)
for some $\delta_\varepsilon>0$ and all $n$, since $\varphi_0^{-1}$ is integrable. Together, the estimates (3.1) and (3.2) show that $\{u_n'\}$ is a pointwise bounded sequence in $C[0,1]$, while (3.4) and (3.5) show that $\{u_n'\}$ is an equicontinuous sequence in $C[0,1]$. Hence, by the Arzelà-Ascoli theorem, there is a subsequence, which we again denote by $\{u_n\}$, that converges in $C^1[0,1]$ to a function $u\in C^1[0,1]$. Using the boundedness of $\{Au_n\}$, we have, at least for some subsequence,
$$u_n\to u \text{ in } X,\qquad u_n'\to u' \text{ in } X,\qquad u_n''\to u'' \text{ a.e.}$$
for some $\delta$ sufficiently small. It also follows that $Au\in L^\infty(0,1)$, where $Au = \varphi(x,u')u''+\psi(x,u,u')$. Thus it remains to show that the boundary conditions hold. For Dirichlet $(\mathrm{BC}_j)_D$ or periodic-like $(\mathrm{BC})_P$ boundary conditions, the result follows from the uniform convergence of $u_n$ to $u$ (and of $u_n'$ to $u'$). For the nonlinear boundary conditions $(\mathrm{BC}_j)_N$, the result follows from the closedness of the graph $\beta_j$.

This completes half of our proof. Next, we show that $D\subseteq\hat D(A)$. Let $u\in D$. Clearly, $u$ can be written uniquely in the form
$$u(x) = a+bx+\int_0^x\int_0^yu''(s)\,ds\,dy. \tag{3.6}$$
Since the continuous functions are dense in $L^1[0,1]$, we can choose a sequence $\{f_n\}\subseteq C[0,1]$ with
(a) $f_n\to u''$ a.e. and in $L^1(0,1)$;
(b) $|f_n(x)|\le2|u''(x)|+1$ a.e. for $n\ge1$;
(c) $\sup_{n\ge1}\|\varphi(x,u_n')u_n''+\psi(x,u_n,u_n')\|_\infty\le N<\infty$,
where $u_n\in C^2[0,1]$ is defined by
$$u_n(x) = a_n+b_nx+\int_0^x\int_0^yf_n(s)\,ds\,dy. \tag{3.7}$$
Clearly,
$$u_n'(x) = b_n+\int_0^xf_n(s)\,ds\quad\text{and}\quad u_n''(x) = f_n(x)\quad\text{for all } x\in(0,1).$$
The constants $a$ and $b$ in (3.6) are uniquely determined; specifically, $a = u(0)$ and $b = u'(0)$. In (3.7), the definition of $u_n$, choose $a_n$ and $b_n$ so that $(a_n,b_n)\to(a,b)$ as $n\to\infty$. The fact that $\sup_n\|\varphi(x,u_n')u_n''+\psi(x,u_n,u_n')\|_\infty<\infty$ follows from (2.3) and (2.4) and the assumption that $u\in D$. To complete the proof that $u\in\hat D(A)$, it remains to show that the boundary conditions hold for each $u_n$, so that $\{u_n\}\subseteq D(A)$. This amounts to choosing $a_n$ and $b_n$ appropriately. Recall that $a = u(0)$ and $b = u'(0)$, and define the constants $c_n$, $c$, $d_n$ and $d$ by
$$c_n = \int_0^1f_n(s)\,ds, \tag{3.8}$$
$$c = \int_0^1u''(s)\,ds, \tag{3.9}$$
$$d_n = \int_0^1\int_0^yf_n(s)\,ds\,dy, \tag{3.10}$$
$$d = \int_0^1\int_0^yu''(s)\,ds\,dy. \tag{3.11}$$
It follows from the choice of $\{f_n\}$ that $c_n\to c$ and $d_n\to d$ as $n\to\infty$.
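A worked restatement, not part of the original text, of the boundary values used in every case below. It follows by evaluating (3.6), (3.7) and their first derivatives at $x = 0$ and $x = 1$, using (3.8)-(3.11):
$$\begin{aligned}
u(0) &= a, & u'(0) &= b, & u(1) &= a+b+d, & u'(1) &= b+c,\\
u_n(0) &= a_n, & u_n'(0) &= b_n, & u_n(1) &= a_n+b_n+d_n, & u_n'(1) &= b_n+c_n.
\end{aligned}$$
For example, $(\mathrm{BC}_1)_D$ for $u_n$ reads $a_n+b_n+d_n = 0$.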
We consider the different cases according to the boundary conditions chosen.

Case 1: $(\mathrm{BC}_0)_D$, $(\mathrm{BC}_1)_D$. Under these boundary conditions $a = 0$ and $b = -d$. Hence the boundary conditions are satisfied by $u_n$ if we choose $a_n = 0$ and $b_n = -d_n$. (Then $b_n\to b$, since $d_n\to d$.)
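The construction (3.7) and the Case 1 choice of $(a_n,b_n)$ can be reproduced numerically. This is a minimal sketch, assuming the hypothetical integrable but unbounded choice $u''(s) = |s-\tfrac12|^{-1/2}$ and its continuous truncations $f_n = \min(n,u'')$, which satisfy (a) and (b). Condition (c) and the requirement $Au\in L^\infty$ are not checked here. For this choice $c_n = 2(\sqrt2-1/n)$ and $d_n = \sqrt2-1/n$. The double integral in (3.7) is rewritten by Fubini as $\int_0^x(x-s)f_n(s)\,ds$ and evaluated with Simpson's rule.

```python
import math

def simpson(g, a=0.0, b=1.0, m=2000):
    """Composite Simpson rule on [a, b] with an even number m of panels."""
    h = (b - a) / m
    s = g(a) + g(b)
    s += 4.0 * sum(g(a + k * h) for k in range(1, m, 2))
    s += 2.0 * sum(g(a + k * h) for k in range(2, m, 2))
    return s * h / 3.0

def make_fn(n):
    """Continuous truncation f_n = min(n, |s - 1/2|^(-1/2)) of the illustrative u''."""
    def fn(s):
        t = abs(s - 0.5)
        return float(n) if t <= 1.0 / n ** 2 else t ** -0.5
    return fn

def constants(fn):
    """c_n from (3.8) and d_n from (3.10), using the Fubini form of d_n = ∫_0^1 (1 - s) fn(s) ds."""
    return simpson(fn), simpson(lambda s: (1.0 - s) * fn(s))

def u_n(x, an, bn, fn):
    """(3.7) rewritten with Fubini: u_n(x) = a_n + b_n x + ∫_0^x (x - s) fn(s) ds."""
    return an + bn * x + simpson(lambda s: (x - s) * fn(s), 0.0, x)

for n in (2, 5, 10):
    fn = make_fn(n)
    cn, dn = constants(fn)
    an, bn = 0.0, -dn                      # Case 1 choice: a_n = 0, b_n = -d_n
    print(n, dn, math.sqrt(2) - 1 / n, u_n(0.0, an, bn, fn), u_n(1.0, an, bn, fn))
```

Because $u_n(1)$ and $d_n$ are computed with the same quadrature, $u_n(1) = a_n+b_n+d_n$ vanishes exactly once $b_n = -d_n$. This mirrors the algebra of Case 1.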
Case 2: $(\mathrm{BC})_P$. With periodic-like boundary conditions we see that $a = a+b+d$ and $b = b+c$, whence $c = 0$ and $b = -d$. Choosing $b_n = -d_n$ and $a_n = a$, we see that $(a_n,b_n)\to(a,b)$ as $n\to\infty$ and that $u_n(0) = u_n(1)$, $u_n'(0) = u_n'(1)$. Thus $u_n$ satisfies $(\mathrm{BC})_P$.
Case 3: $(\mathrm{BC}_0)_N$, $(\mathrm{BC}_1)_N$. This is the most difficult case. With these nonlinear boundary conditions, we must have $b\in\beta_0(a)$ and $-(b+c)\in\beta_1(a+b+d)$. For $u_n\in D(A)$, we need
$$b_n\in\beta_0(a_n),\qquad -(b_n+c_n)\in\beta_1(a_n+b_n+d_n)$$
to hold for all $n$. Define the maximal monotone graphs $\gamma_n$, $\gamma$ on $\mathbb R$ by
$$\gamma_n(s) := \beta_1((I+\beta_0)(s)+d_n)+\beta_0(s),\qquad \gamma(s) := \beta_1((I+\beta_0)(s)+d)+\beta_0(s).$$
By the strict monotonicity of $\beta_0$ and $\beta_1$, $\operatorname{Range}(\gamma) = \operatorname{Range}(\gamma_n) = J$, where $J$ is the open interval
$$J = (\inf\beta_1+\inf\beta_0,\ \sup\beta_1+\sup\beta_0).$$
Note that $J$ is independent of $n$. For both boundary conditions to hold we need $-c_n\in\gamma_n(a_n)$. Since $u$ satisfies the boundary conditions, we have $-c\in\gamma(a)$; in particular, $-c\in J$. Since $c_n\to c$ and $J$ is open, it follows that $-c_n\in J$ for $n$ sufficiently large. Hence, by strict monotonicity, there exists a unique $a_n$
with $-c_n\in\gamma_n(a_n)$. Also, for such $n$, there is a $b_n$ (uniquely determined if $\beta_1$ is single valued) with $b_n\in\beta_0(a_n)$ such that
$$-(b_n+c_n)\in\beta_1(a_n+b_n+d_n).$$
Even if $b_n$ is not uniquely determined, since $b\in\beta_0(a)$ and $a_n\to a$, we can choose $b_n\in\beta_0(a_n)$ for sufficiently large $n$ in such a way that
$$-(b_n+c_n)\in\beta_1(a_n+b_n+d_n)$$
and $b_n\to b$ as $n\to\infty$.
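When $\beta_0$ and $\beta_1$ are single valued and continuous, the choice of $(a_n,b_n)$ in Case 3 reduces to finding the unique root of the strictly increasing function $s\mapsto\gamma_n(s)+c_n$. This is a minimal sketch under several assumptions. It uses the hypothetical graphs $\beta_0(s) = 2s+s^3$ and $\beta_1(s) = s/2+\arctan s$, both onto $\mathbb R$, so $J = \mathbb R$. It takes illustrative limits $c$, $d$ with $c_n = c+1/n$ and $d_n = d-1/n$, and solves by plain bisection. Genuinely multivalued graphs would need a set-valued treatment, which is not attempted here.

```python
import math

# Hypothetical single-valued, strictly increasing graphs with beta_j(0) = 0
# and range R, so that the interval J in Case 3 is all of R.
def beta0(s):
    return 2.0 * s + s ** 3

def beta1(s):
    return 0.5 * s + math.atan(s)

def gamma(s, d):
    """gamma_n(s) = beta1((I + beta0)(s) + d_n) + beta0(s)."""
    return beta1(s + beta0(s) + d) + beta0(s)

def solve_increasing(g, target, lo=-1.0, hi=1.0, tol=1e-13):
    """Bisection for g(a) = target; g continuous, strictly increasing and onto R."""
    while g(lo) > target:
        lo *= 2.0
    while g(hi) < target:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def choose_an_bn(cn, dn):
    """Case 3: a_n solves -c_n = gamma_n(a_n); then b_n = beta0(a_n)."""
    an = solve_increasing(lambda s: gamma(s, dn), -cn)
    return an, beta0(an)

def boundary_defects(an, bn, cn, dn):
    """Defects in b_n = beta0(a_n) and -(b_n + c_n) = beta1(a_n + b_n + d_n)."""
    return bn - beta0(an), -(bn + cn) - beta1(an + bn + dn)

c, d = 0.3, -0.7                           # illustrative limits of c_n and d_n
a, b = choose_an_bn(c, d)
for n in (10, 100, 1000):
    an, bn = choose_an_bn(c + 1 / n, d - 1 / n)
    print(n, an - a, bn - b, boundary_defects(an, bn, c + 1 / n, d - 1 / n))
```

For these particular graphs, $\gamma_n'\ge\beta_0'\ge2$ and $|\partial\gamma/\partial d|\le3/2$. Hence $|a_n-a|\le1.25/n$, so the printed differences shrink like $1/n$, as the convergence $a_n\to a$, $b_n\to b$ in the text requires.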
Case 4: $(\mathrm{BC}_0)_D$, $(\mathrm{BC}_1)_N$. Case 5: $(\mathrm{BC}_0)_N$, $(\mathrm{BC}_1)_D$. The proofs in Cases 4 and 5 are similar; we omit the details.

4. FURTHER RESULTS
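Let $Y = L^\infty(0,1)$. We define the natural extension $\tilde A$ of $A$ from $C[0,1]$ to $L^\infty(0,1)$ by $(\tilde Au)(x) = \varphi(x,u'(x))u''(x)+\psi(x,u(x),u'(x))$ with $D(\tilde A) = \hat D(A)$; that is, the natural maximal domain of $\tilde A$ is the Favard class of $A$.

In [5] G. Goldstein, J. Goldstein and S. Oharu prove the following theorem in the case $\psi = 0$.

Theorem 3: The operator $\tilde A$ is dissipative, and for all $\lambda>0$, $\overline{D(\tilde A)}\subset R(I-\lambda\tilde A)$; that is, $\tilde A$ satisfies the hypotheses of the Crandall-Liggett theorem.

Thus $\tilde A$ generates a contraction semigroup $T = \{T(t) : t\ge0\}$ on $\overline{D(\tilde A)} = \overline{D(A)}\subset C[0,1]$. It can be shown that
$$\overline{D(A)} = \begin{cases}
C[0,1] & \text{if } (\mathrm{BC}_0)_N,\ (\mathrm{BC}_1)_N \text{ hold},\\
\{u\in C[0,1] : u(0) = 0\} & \text{if } (\mathrm{BC}_0)_D,\ (\mathrm{BC}_1)_N \text{ hold},\\
\{u\in C[0,1] : u(1) = 0\} & \text{if } (\mathrm{BC}_0)_N,\ (\mathrm{BC}_1)_D \text{ hold},\\
\{u\in C[0,1] : u(0) = u(1) = 0\} & \text{if } (\mathrm{BC}_0)_D,\ (\mathrm{BC}_1)_D \text{ hold},\\
\{u\in C[0,1] : u(0) = u(1)\} & \text{if } (\mathrm{BC})_P \text{ holds}.
\end{cases}$$
For each $u_0\in\overline{D(A)}$, the semigroup $T$ gives a unique mild solution $u(t,x) = (T(t)u_0)(x)$ of (2.1) satisfying $u(0,x) = u_0(x)$. For each $\lambda>0$,
$$\overline{D(\tilde A)} = \overline{\hat D(A)} = \overline{D(A)}\subseteq C[0,1] = R(I-\lambda A)\subseteq R(I-\lambda\tilde A),$$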
so the range condition in Theorem 3 follows easily from Theorem 1 and the fact that $\tilde A$ is an extension of $A$. The difficult part of Theorem 3 is the dissipativity estimate. One must find an analogue of the second derivative test on $L^\infty(0,1)$. Heuristically, evaluation at a point can be viewed as a linear functional on $L^\infty(0,1)$, but it is not a "good" linear functional. Applying the Hahn-Banach theorem requires a careful study of the duality map of $L^\infty(0,1)$. This leads quite naturally to finitely (but not countably) additive set functions on the Borel sets in $[0,1]$ that take values in $[0,1]$. The important facts about the duality map in $\ell^\infty$ and $L^\infty(0,1)$ are contained in [10] and [11], respectively. In [5] we prove Theorem 3 for $\psi = 0$, and in that case we also use Theorem 3 in a critical way to prove the next theorem. This theorem can be extended to the present situation. It is important because it gives information on the regularity in time of solutions of (2.1).
Theorem 4: Let $\tilde A$ be the extension of $A$ to $L^\infty(0,1)$ and let $\hat D(A)$ be as above. Then for all $u_0\in\hat D(A)$, there is a unique mild solution $u(t) = T(t)u_0$ of
$$\begin{cases}u'(t) = \tilde Au,\\ u(0) = u_0\end{cases}$$
satisfying
$$\text{wk*-}\frac{d}{dt}u(t) = \tilde Au(t)\quad\text{for } t\ge0. \tag{4.1}$$
The statement (4.1) means that for every $h\in L^1(0,1)$,
$$\langle u(t),h\rangle = \int_0^1u(t,x)h(x)\,dx\in AC_{loc}[0,\infty)$$
as a function of $t$, and
$$\frac{d}{dt}\langle u(t),h\rangle = \langle\tilde Au(t),h\rangle\quad\text{a.e.}$$
Notice that we cannot hope for a similar result on the space $C[0,1]$: $L^\infty(0,1)$ is a dual space, whereas $C[0,1]$ is not.

5. FUTURE DIRECTIONS
We plan to investigate Favard classes for operators of the type considered here, but with Wentzell boundary conditions instead of the ones used above. Let $A$ be the operator $u\mapsto\varphi(x,u')u''+\psi(x,u,u')$ acting on a subset of $C[0,1]$. The general Wentzell boundary condition associated with the operator $A$ at the endpoint $j$ ($j = 0,1$) is
$$a_jAu(j)+b_ju(j)+c_ju'(j) = 0,$$
where $V_j = (a_j,b_j,c_j)$ is a nonzero vector in $\mathbb R^3$. The case $V_0 = V_1 = (1,0,0)$ is treated in [9] as far as existence is concerned. That is, [9] shows that certain realizations of $A$ are m-dissipative, but no Favard classes are computed there. Here we give one
sample calculation. (Questions of this nature are being pursued in collaboration with Jerry Goldstein and Silvia Romanelli.) Consider the boundary conditions defined by $V_j = (1,b_j,0)$ for $j = 0,1$, where $b_0\ge0\ge b_1$ and $b_0-b_1>0$. Let $u_i-\lambda Au_i = h_i$ for $i = 1,2$, where $\lambda>0$. To prove dissipativity we must show $\|u_1-u_2\|_\infty\le\|h_1-h_2\|_\infty$. Choose $x_0\in[0,1]$ such that $\|u_1-u_2\|_\infty = (u_1-u_2)(x_0)$, interchanging $u_1$ and $u_2$ if necessary. When $0<x_0<1$, the proof proceeds as usual by the first and second derivative tests. Now consider the case $x_0 = 0$. (The case $x_0 = 1$ is similar.) Evaluate $u_i-\lambda Au_i = h_i$ at $0$ and use the boundary condition $Au_i(0)+b_0u_i(0) = 0$, where $b_0\ge0$. Then $u_i(0)(1+\lambda b_0) = h_i(0)$, whence
$$\|u_1-u_2\|_\infty = (u_1-u_2)(0) = (1+\lambda b_0)^{-1}(h_1(0)-h_2(0))\le\|h_1-h_2\|_\infty.$$
This implies the dissipativity of the operator. Favard classes associated with this type of boundary condition seem quite difficult to classify. We plan to study these objects in the future.
References
1. J.R. Dorroh and G.R. Rieder, A singular quasilinear parabolic problem in one space dimension, J. Diff. Eqns. 91 (1991), 1-23.
2. J.R. Dorroh and G.R. Goldstein, Existence and regularity for singular parabolic problems, in preparation.
3. G.R. Goldstein, Nonlinear singular diffusion with nonlinear boundary conditions, Math. Meth. Appl. Sci. 20 (1993), 1-20.
4. G.R. Goldstein, J.A. Goldstein and S. Oharu, The Favard class for a nonlinear parabolic problem, in Evolution Equations (ed. by A.C. McBride and G.F. Roach), Longman, Pitman Notes, Harlow (1995), 134-147.
5. G.R. Goldstein, J.A. Goldstein and S. Oharu, in preparation.
6. G.R. Goldstein, J.A. Goldstein and S. Romanelli, in preparation.
7. J.A. Goldstein, Semigroups of Nonlinear Operators, in preparation.
8. J.A. Goldstein and C.Y. Lin, Singular nonlinear parabolic boundary value problems in one space dimension, J. Diff. Eqns. 68 (1987), 429-43.
9. J.A. Goldstein and C.Y. Lin, Highly degenerate parabolic boundary value problems, Diff. Int. Eqns. 2 (1989), 216-227.
10. I. Rada, K. Hashimoto and S. Oharu, On the duality map of $\ell^\infty$, Tokyo J. Math. 2 (1979), 71-97.
11. K. Hashimoto and S. Oharu, On the duality mapping of $L^\infty(0,1)$, to appear.
12. G.R. Rieder, Spatially degenerate diffusion with periodic-like boundary conditions, in Differential Equations with Applications in Biology, Physics, and Engineering (J.A. Goldstein, F. Kappel, W. Schappacher, eds.), Lecture Notes in Pure and Applied Math., Marcel Dekker, New York (1991), 301-312.
Index
absolutely 2-summing 124
abstract Wiener space 149
agents 114
amplitude-frequency modulation 22
analytic semigroup 86
assignment economy 114
Banach function space 4
basic constraint 155, 163, 164
basic vertex 155
Bayesian bootstrap 211
Berry-Esseen type 209, 213
Besicovitch-Orlicz space 5
Birkhoff normal form 22
bond 40
bootstrap 209, 211
boundary conditions 254, 255
canonical form 158, 165
Cantor distributed 56
central limit theorem in Hilbert space 210
Clarkson inequality 188, 191
classical multivariate normal distribution 46, 49
classical normal model 57
Clément-Timmermans theorem 99, 103
conditional expectation 219, 220, 221
configuration 40
constant conditional variances 49
constraint 153, 154, 155, 158
convex plane polyhedron 153
coordinate 153, 154
Cramér-Rao inequalities 2
Crandall-Liggett theorem 105, 253
Crandall-Liggett-Benilan theorem 105
current constraint 156
current pivot column 157
current pivot row 156, 157
cylindrical test function 225, 232
Dj)«loc) 57
Δ2-condition 180
degenerate evolution equation 85
degenerate vertex 155, 156
determinant 160, 161, 164
differentiable semigroup 94
diffusion approximation 85
diffusion matrix 42
diffusion process 199
direct limit 14
Dirichlet form 40, 41
dissipative 106
Eckart-Young theorem 71
Edgeworth type expansion 209
elementary column matrix 154
elementary column operation 154
elementary row operation 154
elliptically contoured 57
embedded Markov chain 138
ε-slowly changing weakly harmonizable 239
evolution equation 85, 225
experiment 248
explosive process 8
Faber theorem 217
Favard class 253
Fenchel-Orlicz space 5
Fernique theorem 149
Feynman-Kac formula 227
filtering 219, 220
Fréchet space 148
Fréchet variation 237
Frobenius norm 72
Fujisaki-Kallianpur-Kunita (FKK) filtering equation 220, 226, 228, 229, 233
Gaussian conditional structure of second order 45, 50, 52, 54, 57, 58
Gaussian measure 171
GCS2(k) 50, 53
Gelfand density 117
Gelfand weak* density 117
generalized Cramér-Rao inequalities 2
generalized random fields 11
Gibbs measure 41
Gibbs state 251, 252
Girsanov transform 232, 233
Götze lemma 211
gramian 130
gramian orthogonally scattered 124
gramian orthogonally scattered dilation 125
Haar subspace 61
Hamiltonian perturbation 17
Hardy-Orlicz space 5
harmonizable 9, 237
harmonizable process 9, 10
Hida measure 147
Hilbert B(H)-module 124
Hilbert matrix 152
Hilbert-Schmidt operator 123
Hilbert space valued U-statistic 211
Hoeffding decomposition 209
hydrodynamic scaling 41
idempotent operator 2
ill-conditioned matrix 157
indirect utility function 115
inference 13
infinitely divisible 57
inner product 153
innovation process 225
interacting particles 39
invariant 159
Itô formula 232
James' constant 179, 180, 181, 184
Kagan class 57
Kallianpur-Kunita filtering equation 219
KAM theory 17
Kantor' inequality 176
KdV equation 23
Kolmogorov's backward equation 227, 232
Kompaneets equation 102
Kwapień's example 51, 58
L∞ factorization 215
lattice gas model 39
linear regressions 46, 48, 49
linear structure 46
local density function 40
local specification 250
Lusin space 230
Lyapunov-Schmidt decomposition 25
M/M/1 137
macroscopic parameter 39
majorant 125
manipulation 119
maximal monotone graph 254
measure-valued solution 219, 220, 228, 229, 234
Melnikov condition 22
Melnikov theorem 17
microscopic picture 39
mild solution 223, 224, 226
minimal L1 projection 61
minimal projection 61
Minlos theorem 148
misrepresentation 116
Mittag-Leffler function 148
mixing conditions 41
mixtures 55
module 124
modulus of convexity 190
modulus of smoothness 190
Morse-Transue 237
multiple Wiener integral 199
multivariate normal distribution 46
∇2-condition 180
nonbasic constraint 155, 156, 164, 169
nonlinear diffusion 42
nonlinear filtering 199, 219
nonlinear prediction 5
nonlinear Schrödinger equation 32
nonlinear wave equation 31
non-manipulable 119
nonsquare constant 179
normal conditionals distribution 48
normal conditionals model 47
normal Hilbert B-module 124
normals 154
n-step transition probabilities 138
objective function 153, 157
observable 249
operator semivariation 127
operator stationary 131
operator stationary dilation 131
Orlicz space 3, 179, 180
orthogonal invariance 74
orthogonally scattered 124
orthogonally scattered dilation 124
oscillatory stationary 239
oscillatory weakly harmonizable 23
parabolic 254
partial pivoting strategy 154
perfect competition 114
periodic lattice 39
permutation 159, 168, 169
persistency 17
population measure 114
predictable 223, 224
projective limit 14
projective limit topology 148
quasi-Gaussian distribution 45
quasi-periodic evolution 17
queueing system 137
random number generator 157
randomization 138
range condition 106
Rao's interpolation theorem 189
real variables 153
recursive filter 199
recursive formula 161
redundant constraint 158, 159, 164, 165, 166, 168, 169
reduced echelon form 163, 165
reflection principle 138
regularity 261
representing measures 114
reservation value 114
resonant set 21
Riesz space 4
r-semi-stable measure of index α 172
scalarly weakly harmonizable 131
scalarly weakly stationary dilation 131
Schäffer constant 180
Schwartz space 147
second order IIQ process 7
semi-stable 171
semivariation 127
simplex method 153, 155
simplex pivoting strategy 154
simplex strategy 154, 155, 156, 166
singular site 27
singular value decomposition 76
small balls 171
spectral bi-measure 237
spectral dilation 124
stable probability measure 171
stationary 236
stochastic reaction-diffusion equation 219
strongly continuous semigroup 85
strongly harmonizable 237
sufficiency 12, 245
symmetric semi-stable measure 171
trace class covariance 222
trace class operators 123
trace class white noise 220
transient probabilities 138
translation 154
transpose 154
two-majorant 125
uniform marginals 56
uniformization 138
uniformly nonsquare 179, 180
updating subroutine 157
U-statistic 209
utility function 114
vector measure 113
Vitali variation 237
von Mises ω² statistic 211
Walrasian allocation 117
Walrasian equilibrium 115
Walrasian prices 115
weak convergence 56
weakly harmonizable 131, 237
weakly of class (C) 240
weakly operator harmonizable 131
weakly stationary 130
weakly stationary dilation 131
well-localized 18
Wentzell boundary condition 85, 102
Wiener chaos 199
Wiener measure 224
Wiener process 220, 225
Yosida approximation 254
Zakai equation 221, 226, 228, 229
about the book . . . Covering the areas of modern analysis and probability theory, this exciting Festschrift presents a collection of papers given at the conference held in honor of the 65th birthday of M. M. Rao. His prolific published research includes the well-received Marcel Dekker, Inc. books Theory of Orlicz Spaces and Conditional Measures and Applications.
Featuring previously unpublished research articles by a host of internationally recognized scholars, Stochastic Processes and Functional Analysis offers contributions on themes such as persistency in Hamiltonian models . . . evolution equations . . . lattice gas . . . Banach space theory . . . deterministic and stochastic differential equations . . . operator theory . . . and more.
Furnished with over 300 references and 750 display equations and figures, Stochastic Processes and Functional Analysis is indispensable for stochastic and functional analysts, stochastic process researchers, research mathematicians, theoretical physicists and statisticians, and graduate students in these disciplines.
about the editors . . . JEROME A. GOLDSTEIN is a Professor of Mathematics at the University of Memphis, Tennessee. He is the author of over 100 research articles and one book, and coeditor of four books, including Differential Equations with Applications in Biology, Physics, and Engineering (Marcel Dekker, Inc.). Dr. Goldstein received the B.S. (1963), M.S. (1964), and Ph.D. (1967) degrees from Carnegie Mellon University, Pittsburgh, Pennsylvania.
NEIL E. GRETSKY is an Associate Professor of Mathematics at the University of California, Riverside. He received the B.S. degree (1962) from the California Institute of Technology, Pasadena, and the … (1964) and Ph.D. (1967) degrees from Carnegie Mellon University, Pittsburgh, Pennsylvania.
J. J. UHL, JR. is a Professor of Mathematics at the University of Illinois at Urbana-Champaign. He received the B.S. degree (1…2) from the College of William and Mary, Williamsburg, Virginia, and the M.S. (1964) and Ph.D. (19…) degrees from Carnegie Mellon University, Pittsburgh, Pennsylvania.
Printed in the United States of America
ISBN: 0-8247-9801-5
marcel dekker, inc. / new york • basel • hong kong