REAL AND STOCHASTIC ANALYSIS Current Trends
8940_9789814551274_tp.indd 1
17/10/13 11:37 AM
October 24, 2013
10:1
9in x 6in
Real and Stochastic Analysis: Current Trends
b1644-fm
Edited by:
M. M. Rao University of California at Riverside, USA
World Scientific
NEW JERSEY • LONDON • SINGAPORE • BEIJING • SHANGHAI • HONG KONG • TAIPEI • CHENNAI
Published by World Scientific Publishing Co. Pte. Ltd. 5 Toh Tuck Link, Singapore 596224 USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601 UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
Library of Congress Cataloging-in-Publication Data Real and stochastic analysis (World Scientific (Firm)) Real and stochastic analysis : current trends / Malempati Madhusudana Rao, University of California, Riverside, USA. pages cm Includes bibliographical references. ISBN 978-9814551274 (hard cover : alk. paper) 1. Stochastic analysis. I. Rao, M. M. (Malempati Madhusudana), 1929– editor of compilation. II. Title. QA274.2.R424 2014 519.2'2--dc23 2013027573
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
Copyright © 2014 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
In-house Editor: Angeline Fong
Printed in Singapore
CONTENTS
Preface
Introduction and Overview

1. Gaussian Measures on Infinite-Dimensional Spaces
   V. I. Bogachev
   0 Introduction
     0.1 Notation and terminology
   1 Gaussian Measures on $\mathbb{R}^d$
   2 Infinite-Dimensional Gaussian Distributions
   3 The Wiener Measure
   4 Radon Gaussian Measures
   5 The Cameron–Martin Space and Measurable Linear Operators
   6 Zero-one Laws and Dichotomies
   7 The Ornstein–Uhlenbeck Semigroup
   8 The Hermite–Chebyshev Polynomials
   9 Sobolev Classes over Gaussian Measures
   10 Transformations of Gaussian Measures
   11 Convexity
   12 Open Problems
   References

2. Random Fields and Hypergroups
   Herbert Heyer
   0 Introduction
   1 Commutative Hypergroups
     1.1 Definition and first examples
     1.2 Some harmonic analysis
     1.3 Basic constructions of hypergroups
   2 Random Fields over Hypergroups
     2.1 Second order random fields
     2.2 Translation and decomposition
     2.3 Harmonizability
   3 Generalized Random Fields over Hypergroups
     3.1 Segal algebras
     3.2 The extended Feichtinger algebra
     3.3 Covariance and duality
     3.4 Suggestions for further research
   References

3. A Concise Exposition of Large Deviations
   F. Hiai
   0 Introduction
   1 Definitions and Generalities
   2 The Cramér Theorem
   3 The Gärtner-Ellis Theorem
   4 Varadhan's Integral Lemma
   5 The Sanov Theorem
   6 Large Deviations for Random Matrices
   7 Quantum Large Deviations in Spin Chains
   8 Applications of Large Deviations
     8.1 Boltzmann-Gibbs entropy and mutual information
     8.2 Free entropy and orbital free entropy
   References

4. Quantum White Noise Calculus and Applications
   Un Cig Ji and Nobuaki Obata
   1 Introduction
   2 Elements of Gaussian Analysis
     2.1 Standard construction of countable Hilbert spaces
     2.2 Gaussian space
     2.3 Fock spaces and the Wiener–Itô decomposition
     2.4 Underlying spaces
   3 White Noise Distributions
     3.1 Standard CKS-space
     3.2 Brownian motion
     3.3 The S-transform
     3.4 Infinite dimensional holomorphic functions
   4 White Noise Operators
     4.1 White noise operators and their symbols
     4.2 Quantum white noise
     4.3 Integral kernel operators and Fock expansion
     4.4 Characterization of operator symbols
     4.5 Wick product and Wick multiplication operators
     4.6 Multiplication operators
     4.7 Convolution operators
   5 Quantum Stochastic Gradients
     5.1 Annihilation, creation and conservation processes
     5.2 Classical stochastic gradient
     5.3 Creation gradient
     5.4 Annihilation gradient
     5.5 Conservation gradient
   6 Quantum Stochastic Integrals
     6.1 The Hitsuda–Skorohod integral
     6.2 Creation integral
     6.3 Annihilation integral
     6.4 Conservation integral
   7 Quantum White Noise Derivatives
     7.1 Quadratic functions of quantum white noise
     7.2 Quantum white noise derivatives
     7.3 Wick derivations
     7.4 Quantum white noise differential equations of Wick type
     7.5 The implementation problem
   References

5. Weak Radon-Nikodým Derivatives, Dunford-Schwartz Type Integration, and Cramér and Karhunen Processes
   Yûichirô Kakihara
   1 Introduction
   2 Hilbert Space Valued Measures
     2.1 Radon-Nikodým property
     2.2 Weak Radon-Nikodým derivatives
     2.3 Existence and uniqueness
     2.4 Orthogonally scattered measures and dilation
     2.5 Dunford-Schwartz type integration
     2.6 Bimeasure integration
   3 Hilbert-Schmidt Class Operator Valued Measures
     3.1 The space of Hilbert-Schmidt class operators as a normal Hilbert module
     3.2 The space $L^1(\xi)$
     3.3 Weak Radon-Nikodým derivatives
     3.4 The spaces $L^1_{DS}(\xi)$ and $L^1_*(\xi)$
     3.5 The spaces $L^1_{DS}(\eta)$ and $L^2(F_\eta)$
     3.6 The spaces $L^1_*(\xi)$ and $L^2_*(M_\xi)$
   4 Cramér and Karhunen Processes
     4.1 Infinite dimensional second order stochastic processes
     4.2 Cramér processes
     4.3 Karhunen processes
     4.4 Operator representation
   References

6. Entropy, SDE-LDP and Fenchel-Legendre-Orlicz Classes
   M. M. Rao
   1 Introduction
   2 Error Estimation Problems from Probability Limit Theory
   3 Higher Order SDE and Related Classes
   4 Entropy, Action/Rate Functionals and LDP
   5 Vector Valued Processes and Multiparameter FLO Classes
   6 Evaluations and Representations of Conditional Means
   References

7. Bispectral Density Estimation in Harmonizable Processes
   H. Soedjak
   1 Introduction
   2 Assumptions and a Resampling Procedure
   3 The Limit Distribution of the Estimator
   4 Final Remarks and Suggestions
   References

Contributors
PREFACE
Just as in the case of the earlier volumes published in 1986, 1997 and 2004 under the general title Real and Stochastic Analysis, which exhibited the usefulness of real (also called functional analytic) methods in advancing stochastic analysis, the current volume again aims to exemplify these methods and to concentrate on several (related) areas of stochastic processes and fields that are deemed to be of considerable interest. The purpose is to present, through the work of active researchers, some interesting parts of stochastic analysis that crucially employ such functional analytic methods. Thus each chapter, written by an active researcher, highlights the current state of the subject considered, emphasizing what has been completed and what the current trends in the subject are. The material is not only a survey but also contains a considerable amount of new material. The seven chapters, by invited authors, deal with different parts of the subject. Particular focus is placed on the special role and motivation of Brownian motion and on some substantial extensions to infinite dimensional spaces. Also included are reasons for the need for infinite dimensional extensions and for abstract methods, together with some concrete illustrations. In this setting, applications to quantum stochastic analysis are discussed. Also treated are large deviations (LDP), random fields on general structures (hypergroups), representations of certain classes related to Cramér and Karhunen processes, and statistical estimation problems for families of harmonizable (nonstationary) classes. Some aspects of free random analysis extending the LDP results are considered as well. As usual, all articles were reviewed. I hope that the work stimulates both young and seasoned researchers. I would like to thank the authors for revising their articles into the desired format and for meeting the deadlines. Thanks are also due to my colleagues Dr. L. O. Ferguson and particularly Dr. Y. Kakihara for advice and help with the arrangements, to the UCR math department for the final format-related work, and to the publisher for the enthusiastic cooperation in bringing out the volume on schedule.
M. M. Rao, Editor
Riverside, CA
May 2013
INTRODUCTION AND OVERVIEW
M. M. Rao
The limit theory of (normalized) partial sums of (independent) random variables often leads to Gaussian measures, which prompted the great H. Poincaré (attributing the remark to Lippmann) to say that "there is something mysterious about this distribution (or measure) that mathematicians think it is a law of nature and physicists are convinced that it is a mathematical theorem". Motivated by this prevalent attitude, the volume starts with an extensive treatment of Gaussian measures by Bogachev, who presents several achievements and promising directions, discussing not only his own and his coworkers' contributions but also much of the current activity in this area, with extensive updated references and reviews. Since naive extensions of (consistent) Gaussian measures from finite dimensional spaces need not be σ-additive, conditions ensuring σ-additivity are treated, exemplified by the Wiener measure, with extensions of the Cameron–Martin analysis taking a prominent place. Most of the corresponding work of the Russian and French schools is discussed in this chapter in a unified form. Since Gaussian measures are generally more amenable to detailed analysis, their extension to infinite dimensional spaces is also prominent in much of the subsequent work. Applications involving the Ornstein–Uhlenbeck semigroup, the key role of the Hermite–Chebyshev polynomials in this analysis, and applications to Sobolev classes over Gaussian measures are illustrated. Many problems that active and new researchers in the subject can pursue are pointed out with helpful remarks.
The second chapter, by Herbert Heyer, treats these problems when the underlying spaces carry special (group-like) structures, particularly polynomial and other hypergroup structures that admit concrete analysis. Having spent a great deal of time on hypergroups in recent years, the author first succinctly presents some aspects of the structural analysis of hypergroups and later uses it when considering random fields. He presents the basic analysis in the first half of the chapter so that it can be studied independently, and then moves on to stationary and, more particularly, harmonizable random fields on hypergroups. The latter subject is not widely known, and he includes still unpublished research completed under his guidance at his university. It is thus a welcome feature that some new, not yet published material is included in detail, in his well-known expository style. The chapter concludes with suggestions for follow-up research on the subjects discussed.
Chapter 3, by Hiai, treats the modern analysis of large deviations for random matrices and spin systems, including some work done after the publication of his reference volume with D. Petz (2000), which treats free random variables and entropy as well as extensions of Varadhan's (by now) classical work on large deviations to the 'free' context. The basic analysis of Cramér, extended by Gärtner, Ellis and others, is presented and widened. Considerable space is devoted to Boltzmann-Gibbs and free entropy analysis. Some of the work appears here for the first time, and researchers should appreciate having a detailed and helpful treatment of this fast-growing subject from a broad perspective.
Chapter 4, by Ji and Obata, presents an extension of Hida's white noise calculus, namely a differential analysis in the sense of Schwartz distribution theory, with simplifications applicable to probabilistic methods of quantum calculus. This generalizes L. Gross's abstract Wiener space analysis and its extended applications, as expounded by Kuo (1996); the further adaptations and expansions made by the authors in a series of publications are included in detail. This will be helpful to newcomers with some knowledge of the basic exposition in Parthasarathy (1992).
The detailed treatment of quantum white noise calculus, using Wick products and the accompanying differential calculus, shows its affinity with the abstract Wiener process, and several new problems to be analyzed and extended are indicated.
Chapter 5, by Kakihara, discusses a weakened concept of the Radon–Nikodým derivative in Banach spaces. Even in Hilbert space, this allows one to study integral representations, with respect to operator valued stochastic measures, of random processes of Cramér and Karhunen types, which generalize the classical (Khintchine) stationary classes. It is the operator spaces that lead one to study weak Radon–Nikodým derivatives. To obtain applications, one needs to consider well-behaved operator spaces such as the Hilbert–Schmidt and trace classes. After developing the necessary analysis of these spaces, the author applies it to the two classes of Cramér and Karhunen types; the latter were originally considered by Karhunen (partly as an extension of Loève's work) and generalized by Cramér, both in the scalar case. The relevant analysis in the multivariate case is given here in detail by the author. This will be of interest to both new and seasoned workers in random analysis.
Chapter 6, on entropy and LDP (large deviations and related matters) as well as on some classes of higher order stochastic differential equations (SDEs), is written by me. It concentrates primarily on the LDP for several types of stochastic processes, using the entropy invariant originated by Shannon and perfected by Khintchine, which many researchers now also use as a key invariant in stochastic limit analysis. After considering integral representations with $L^{\varphi_1,\varphi_2}$-bounded integrators, introduced by Bochner, which extend both the Brownian and Itô integrals, the chapter discusses $n$th order SDEs along with their applications in T. S. Pitcher's deep stationary analysis involving their likelihood ratios under differential (admissible) measure functions. The chapter then considers the LDP for these processes, which leads to the use of Legendre–Orlicz classes. If the processes are vector valued, this leads to projective limit theory, and for the processes to take values in a given Banach space some restrictions are needed, which lead to the abstract Wiener class. Restricting next to the stationary case leads to Ornstein–Uhlenbeck processes. These topics are often treated as surveys to conserve space. The LDP analysis for random fields here leads to the Fenchel–Legendre–Orlicz classes. These spaces are outlined; in the multidimensional case they lead to some unresolved problems, some of which are pointed out for future research. These problems involve vector measure valued random elements and are currently open.
A major question in this analysis is the computation of conditional expectation operators. No really usable method of computation is available in the literature. Some procedures using Fourier analysis (essentially extensions of Lévy's inversion type formula) adapted to the present setup, and their multidimensional forms under some conditions, are available, but the area is not well developed and calls for immediate research efforts. The known work is discussed here. Because there are many unresolved questions, usually only sketches of the proofs are included, and newer workers will find plenty of problems to work on.
Finally, Chapter 7, by H. Soedjak, deals with a particular but important class of nonstationary processes, namely the harmonizable classes originally considered by Loève and by Bochner (and independently by Rozanov), now called the strong and weak classes, respectively. They are a natural first extension of the (weakly) stationary class (see also Chapter 2 above) and have immense application potential, although they are not treated much in the textbook literature. Statistical estimation problems for large samples or long segments of observations, and the consequent limit behavior, are essential for any real-world application, and such a theory is not yet available. Here Soedjak, who earlier announced some of the results with only hints of proof, presents a complete version of the basic limit theorems for the strongly harmonizable class. With these in hand, the weakly harmonizable case and the respective extensions to the Karhunen and Cramér classes (as discussed in Chapter 5 above) can be studied next.
It is thus evident that current trends in stochastic analysis use functional analytic methods and results as well as the classical basics and their modern counterparts. It is hoped that these works, covering different aspects of stochastic theory, will contribute to a sustained development of the subject and stimulate both its abstract (i.e., functional analytic) and stochastic aspects, and their mutual interactions, in current and future research.
References
[1] F. Hiai and D. Petz (2000), The Semi-circle Law, Free Random Variables and Entropy, AMS Surveys, Providence, RI.
[2] H. H. Kuo (1996), White Noise Distribution Theory, CRC Press, Boca Raton, FL.
[3] K. R. Parthasarathy (1992), An Introduction to Quantum Stochastic Calculus, Birkhäuser Verlag, Berlin.
CHAPTER 1 GAUSSIAN MEASURES ON INFINITE-DIMENSIONAL SPACES
V. I. BOGACHEV
0. Introduction
Gaussian distributions, along with certain discrete distributions, are the most important statistical distributions in science and technology. They have been known and used for two centuries. Thousands of research papers have been published on their properties, and the number of papers in applied fields dealing with Gaussian distributions can hardly be estimated; still, almost every decade some old problems are solved and interesting new ones arise. The aim of this survey is to give a modern and concise account of the theory of infinite-dimensional Gaussian distributions. We present a number of classical cornerstone achievements, some more recent results, and open problems with relatively short formulations. A detailed discussion of the general theory of Gaussian measures with an extensive bibliography can be found in the author's book [21] (see also [24]); various important directions in this theory are discussed in greater detail in other relatively recent books such as Fernique [60], Janson [87], Hida, Hitsuda [80], Ledoux, Talagrand [97], and Lifshits [101], [102] (see also Stroock [124] in Chapter 8), as well as in older monographs on the subject, among which one should mention Rozanov [120], Ibragimov, Rozanov [85], Badrikian, Chevet [14], and Kuo [90]. See also Adler [2] and Piterbarg [118], where Gaussian processes and fields are the central object. It should be noted that our discussion concerns mostly Gaussian measures rather than Gaussian processes, and the emphasis here is on analytical issues (and partly on measure-theoretic and topological ones). In particular, we do not discuss sample path properties of Gaussian processes at all, although this is a very important subject out of which many abstract concepts grew. The books
[60], Ledoux, Talagrand [97], and Lifshits [101] cited above describe this area in detail and give credit to key researchers in it, such as Dudley [56], [57], Fernique, Sudakov [125] and Talagrand [128]. For a discussion of the white noise calculus, which is related to Gaussian analysis, see [81]. Concerning recent applications, we refer the reader to [55], [103], and the survey [102], where additional references can be found. Over the years I have had a splendid opportunity to discuss problems related to Gaussian measures with outstanding experts, including R. Dudley, X. Fernique, L. Gross, I. A. Ibragimov, P. Malliavin, P.-A. Meyer, Yu. V. Prohorov, V. V. Sazonov, A. V. Skorohod, A. V. Sudakov and M. Zakai. This work was supported by the RFBR projects 12-01-33009 and 11-01-12104-ofi-m, the Simons-IUM fellowship of the Simons Foundation and the program SFB 701 at the University of Bielefeld.
0.1. Notation and terminology
The following notation and terminology will be used throughout. A probability space is a triple $(X,\mathcal{A},\mu)$, where $\mathcal{A}$ is a σ-algebra of subsets of some space $X$ and $\mu$ is a probability measure on $\mathcal{A}$. The symbol $\mathcal{A}_\mu$ denotes the completion of $\mathcal{A}$ with respect to $\mu$ (which is not assumed to be complete), i.e., the σ-algebra of sets of the form $A\cup Z$, where $A\in\mathcal{A}$ and $Z$ is a subset of a set of measure zero. The class of all $\mu$-integrable functions is denoted by $\mathcal{L}^1(\mu)$, and the corresponding Banach space of equivalence classes (where functions equal almost everywhere are identified) is denoted by $L^1(\mu)$. Similar notation $\mathcal{L}^p(\mu)$ and $L^p(\mu)$ is used for the classes of $\mu$-measurable functions $f$ with $|f|^p$ integrable, $p\in(1,\infty)$, and for the respective spaces of equivalence classes. Given another space $Y$ with a σ-algebra $\mathcal{B}$, a mapping $f\colon X\to Y$ is called measurable with respect to the pair $(\mathcal{A},\mathcal{B})$ if $f^{-1}(B)\in\mathcal{A}$ for each $B\in\mathcal{B}$.
The term "$\mu$-measurable" is applied in the case where $\mathcal{A}=\mathcal{A}_\mu$, and in that case $f$ may be defined only $\mu$-almost everywhere (abbreviated as $\mu$-a.e.), which means "everywhere except for a set of measure zero". Such a mapping $f$ transforms $\mu$ into a measure on $Y$, called the image measure or the induced measure and defined by the formula
$$(\mu\circ f^{-1})(B)=\mu\bigl(f^{-1}(B)\bigr),\qquad B\in\mathcal{B}.$$
Obviously, the measurability condition is needed to define the value on the right, and it is straightforward to verify that $\mu\circ f^{-1}$ is indeed a measure
on $\mathcal{B}$. Certainly, in the same way one can define images of measures that are not necessarily probability measures. It follows relatively easily from the definition that integrals with respect to the image measure can be evaluated by the following change of variables formula:
$$\int_Y \varphi(y)\,(\mu\circ f^{-1})(dy)=\int_X \varphi(f(x))\,\mu(dx), \tag{0.1}$$
where $\varphi$ is a $\mathcal{B}$-measurable function (it is integrable against $\mu\circ f^{-1}$ precisely when $\varphi\circ f$ is $\mu$-integrable). If a measure $\nu$ on $\mathcal{A}$ has the form $\nu=\varrho\cdot\mu$, where $\varrho$ is a $\mu$-integrable function, which means that
$$\nu(A)=\int_A \varrho(x)\,\mu(dx),\qquad A\in\mathcal{A},$$
then $\nu$ is called absolutely continuous with respect to $\mu$, which is denoted by $\nu\ll\mu$, and $\varrho$ is called its Radon–Nikodym density with respect to $\mu$. A necessary and sufficient condition for this, expressed by the Radon–Nikodym theorem, is that $\nu$ vanishes on all sets of $\mu$-measure zero. If also $\mu\ll\nu$, which is equivalent to $\varrho>0$ $\mu$-a.e., then the measures are called equivalent, which is denoted by $\nu\sim\mu$.
One σ-algebra plays a particularly important role in measure theory. This is the Borel σ-algebra $\mathcal{B}(X)$ of a topological space $X$ (for the purposes of this survey it is enough to be acquainted with the concept of a metric space), defined as the minimal σ-algebra containing all open sets. The term "Borel measure" means a bounded measure on the Borel σ-algebra. It is customary to use σ-algebras generated by classes of sets or functions. Given an arbitrary class $S$ of sets in a space $X$, there is a smallest σ-algebra in $X$ containing $S$; it is denoted by $\sigma(S)$. Similarly, given any class $F$ of functions on $X$, there is a smallest σ-algebra with respect to which all these functions are measurable; it is denoted by $\sigma(F)$. The same σ-algebra is generated by the class of sets $\{f<c\}$, where $f\in F$ and $c\in\mathbb{R}$. The norm on $\mathbb{R}^d$ will be denoted by $|\cdot|$. The space of continuous functions $C[a,b]$ is considered with its natural norm $\|f\|=\sup_{t\in[a,b]}|f(t)|$.
1. Gaussian Measures on $\mathbb{R}^d$
A Gaussian measure on the real line is a Borel probability measure which is either concentrated at some point $a$ (i.e., is Dirac's measure $\delta_a$ at $a$) or has density
$$(2\pi\sigma)^{-1/2}\exp\bigl(-(2\sigma)^{-1}(x-a)^2\bigr)$$
with respect to Lebesgue measure,
where $a\in\mathbb{R}^1$ is its mean and $\sigma>0$ is its dispersion. The measure for which $a=0$ and $\sigma=1$ is called standard Gaussian. Similarly, the standard Gaussian measure $\gamma$ on $\mathbb{R}^d$ is defined by its density $(2\pi)^{-d/2}\exp(-|x|^2/2)$ with respect to Lebesgue measure. Although a general concept of a Gaussian measure on a linear space is introduced in the next section, we define general Gaussian measures on $\mathbb{R}^d$ explicitly. These are measures that are concentrated on affine subspaces of $\mathbb{R}^d$ and are standard in suitable (affine) coordinate systems. In other words, these are images of the standard Gaussian measure under affine mappings of the form $x\mapsto Ax+a$, where $A$ is a linear operator and $a$ is a vector. A somewhat more explicit representation is provided by the Fourier transform of a bounded Borel measure $\mu$ on $\mathbb{R}^d$, defined by the formula
$$\widehat{\mu}(y)=\int_{\mathbb{R}^d}\exp(i(y,x))\,\mu(dx),\qquad y\in\mathbb{R}^d.$$
In these terms, a measure $\mu$ is Gaussian if and only if its Fourier transform has the form
$$\widehat{\mu}(y)=\exp\Bigl(i(y,a)-\tfrac12 Q(y,y)\Bigr),$$
where $Q$ is a nonnegative quadratic form on $\mathbb{R}^d$. The Fourier transform of the standard Gaussian measure is given by $\widehat{\gamma}(y)=\exp(-|y|^2/2)$. The change of variables formula (0.1) yields the following relation between $A$ and $Q$ if $\mu$ is the image of $\gamma$ under the affine mapping $x\mapsto Ax+a$:
$$\widehat{\mu}(y)=\int_{\mathbb{R}^d}\exp(i(y,Ax+a))\,\gamma(dx)=\exp(i(y,a))\int_{\mathbb{R}^d}\exp(i(A^*y,x))\,\gamma(dx)=\exp\bigl(i(y,a)-|A^*y|^2/2\bigr),$$
that is, $Q(y,y)=(AA^*y,y)$. It is readily verified that $\mu$ has a density on the whole space precisely when $A$ is invertible. The vector $a$ is called the mean of $\mu$ and is expressed by the equality
$$(y,a)=\int_{\mathbb{R}^d}(y,x)\,\mu(dx).$$
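As a numerical sanity check of the relation $Q(y,y)=(AA^*y,y)$, here is a minimal Monte Carlo sketch in Python, assuming NumPy is available (the text itself contains no code). The matrix $A$, the vector $a$, the test point $y$, the seed and the sample size are arbitrary illustrative choices; $A$ is deliberately singular (its third column is zero), so the resulting $\mu$ is concentrated on an affine plane and has no density on $\mathbb{R}^3$.

```python
import numpy as np

def empirical_char_fn(samples, y):
    """Monte Carlo estimate of the Fourier transform: mean of exp(i(y, x)) over the rows x."""
    return np.mean(np.exp(1j * (samples @ y)))

def gaussian_char_fn(y, A, a):
    """exp(i(y, a) - |A^T y|^2 / 2): Fourier transform of the image of gamma under x -> Ax + a."""
    return np.exp(1j * (y @ a) - 0.5 * np.sum((A.T @ y) ** 2))

rng = np.random.default_rng(0)
N, d = 200_000, 3
# Hypothetical example data; the zero third column makes A (and A A^T) singular.
A = np.array([[1.0, 0.5, 0.0],
              [0.0, 2.0, 0.0],
              [0.3, 0.0, 0.0]])
a = np.array([1.0, -1.0, 0.5])

x = rng.standard_normal((N, d))   # N draws from the standard Gaussian measure gamma on R^3
samples = x @ A.T + a             # each row is A x + a, i.e. a draw from mu

y = np.array([0.4, -0.2, 0.7])
print("empirical:", empirical_char_fn(samples, y))
print("formula:  ", gaussian_char_fn(y, A, a))
print("sample covariance:\n", np.cov(samples, rowvar=False))
print("A A^T:\n", A @ A.T)
print("rank of A:", np.linalg.matrix_rank(A))
```

The empirical quantities agree with the formulas only up to Monte Carlo error of order $N^{-1/2}$, so the sketch illustrates the identity rather than proving it; the rank of $A$ being 2 matches the statement that $\mu$ has a density precisely when $A$ is invertible.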
For the quadratic form $Q$ we have the equality
$$Q(y,y)=\int_{\mathbb{R}^d}(y,x-a)^2\,\mu(dx).$$
These equalities are verified directly (it suffices to check them in the one-dimensional case). A random variable $\xi$ on a probability space is called Gaussian if its distribution is a Gaussian measure; such random variables exist, for instance, on the unit interval with Lebesgue measure (take the inverse of a Gaussian distribution function). Gaussian distributions admit many diverse characterizations; see [43], [107], [108], [114], [130] and [21]. Only a few of them are mentioned here. The first was obtained by B. V. Gnedenko as a generalization of a theorem of S. Bernstein proved under the additional assumption that $\xi$ has a finite second moment.
Theorem 1.1. A random vector $\xi$ in $\mathbb{R}^n$ is Gaussian if and only if, for every random vector $\eta$ that is independent of $\xi$ and has the same distribution, the random vectors $\xi-\eta$ and $\xi+\eta$ are independent.
The next result is due to M. Kac.
Theorem 1.2. A random vector $\xi$ in $\mathbb{R}^n$ is centered Gaussian if and only if, for every pair $(\xi_1,\xi_2)$ of independent copies of $\xi$ and every real number $\varphi$, the random vectors $\xi_1\sin\varphi+\xi_2\cos\varphi$ and $\xi_1\cos\varphi-\xi_2\sin\varphi$ are independent copies of $\xi$.
The following useful characterization of the Gaussian property was found by G. Pólya.
Theorem 1.3. Let $\xi$ and $\eta$ be two independent random variables such that $\xi$, $\eta$, and $(\xi+\eta)/\sqrt{2}$ have equal distributions. Then $\xi$ and $\eta$ are centered Gaussian.
The factor $1/\sqrt{2}$ is put just for symmetry: the same is true for any combination $\alpha\xi+\beta\eta$, where $\alpha,\beta\in(0,1)$ and $\alpha^2+\beta^2=1$.
Another normality test is due to H. Cramér.
Theorem 1.4. Let $\xi$ and $\eta$ be two independent random variables such that $\xi+\eta$ is normal. Then $\xi$ and $\eta$ are normal.
In other words, if the convolution of two probability measures is Gaussian, then each of them is Gaussian as well. Let us also mention a characterization, proved by S. Kwapien, M. Pycia, and W. Schachermayer, in terms of inequalities for distribution functions.
Theorem 1.5. Let $\xi$ and $\eta$ be two independent random variables with a common symmetric distribution such that
$$P\Bigl(\frac{|\xi+\eta|}{\sqrt2}\ge t\Bigr)\le P(|\xi|\ge t)\qquad\forall\,t\ge0.\tag{1.1}$$
Then these random variables are Gaussian.
The next result is called the Darmois–Skitovich theorem. Observe that Theorem 1.1 and Theorem 1.2 follow from this result.
Theorem 1.6. Let $\xi_1,\dots,\xi_n$ be independent random variables and let $\alpha_i$ and $\beta_i$, $i=1,\dots,n$, be nonzero real numbers such that the random variables $\sum_{i=1}^n\alpha_i\xi_i$ and $\sum_{i=1}^n\beta_i\xi_i$ are independent. Then the variables $\xi_i$ are Gaussian.
Ramachandran [119] extended this result to a.e. convergent infinite series $\sum_{i=1}^\infty\alpha_i\xi_i$ and $\sum_{i=1}^\infty\beta_i\xi_i$ with uniformly bounded ratios $\alpha_i/\beta_i$ and $\beta_i/\alpha_i$. It has recently been shown by I. A. Ibragimov [84] that a uniform bound on one of these two ratios suffices. Gaussian distributions often appear as extremal cases in various inequalities; see [21] for examples and further references.
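Inequality (1.1) is easy to test on finite symmetric distributions. The sketch below (standard-library Python; the helper names and the test grid are our own illustrative choices) computes the law of $(\xi+\eta)/\sqrt2$ by convolution and lists the grid points $t$ at which (1.1) fails. For the symmetric Bernoulli law on $\{-1,1\}$, the normalized sum equals $0$ or $\pm\sqrt2$, so (1.1) fails exactly for $t\in(1,\sqrt2]$, as Theorem 1.5 requires for a non-Gaussian law.

```python
import math
from itertools import product


def tail(dist, t):
    """P(|X| >= t) for a finite distribution given as {value: probability}."""
    return sum(p for x, p in dist.items() if abs(x) >= t)


def normalized_sum(dist):
    """Law of (xi + eta)/sqrt(2) for independent xi, eta with law `dist`."""
    out = {}
    for (x, p), (y, q) in product(dist.items(), repeat=2):
        v = round((x + y) / math.sqrt(2), 12)  # merge values equal up to rounding
        out[v] = out.get(v, 0.0) + p * q
    return out


def violations(dist, grid):
    """Grid points t >= 0 at which inequality (1.1) fails."""
    s = normalized_sum(dist)
    return [t for t in grid if tail(s, t) > tail(dist, t) + 1e-12]


rademacher = {-1.0: 0.5, 1.0: 0.5}   # symmetric, non-Gaussian
grid = [i / 100 for i in range(201)]  # t in [0, 2]
bad = violations(rademacher, grid)
print(min(bad), max(bad))             # (1.1) fails on (1, sqrt(2)]
```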
2. Infinite-Dimensional Gaussian Distributions
We define here Gaussian measures on general linear spaces. The main definition requires no topology, just the concept of a Gaussian random variable, but very soon some limited knowledge of basic topology becomes useful (though not absolutely necessary).
Definition 2.1. Let $E$ be a linear space and let $F$ be some linear space of linear functions on $E$. A probability measure $\mu$ on the σ-algebra $\sigma(F)$ is called Gaussian if the induced measure $\mu\circ f^{-1}$ is Gaussian for every $f\in F$. If all these measures are centered, then $\mu$ is called centered. The Fourier transform (the characteristic functional) of $\mu$ is the complex-valued function on $F$ defined by
$$\hat\mu(f) = \int_E\exp(if)\,d\mu.$$
In the terminology of random processes this definition simply means that the random process {ξt }t∈T on (E, σ(F ), µ), where T = F , ξt = t(ξ),
$\xi\in E$, is Gaussian, which by definition means that all finite linear combinations of the $\xi_t$ have Gaussian distributions. Conversely, given a Gaussian process $\{\xi_t\}_{t\in T}$ with some parameter set $T$ on a probability space $(\Omega,\mathcal B,P)$, we take $E=\mathbb R^T$ (the space of all real functions on $T$), for $F$ we take the linear space of functionals of the form
$$x\mapsto c_1x(t_1)+\cdots+c_nx(t_n),\qquad t_i\in T,$$
and for $\mu$ we take the distribution of the process in $\mathbb R^T$. The latter is first defined on all cylindrical sets
$$C_{t_1,\dots,t_n,B} = \{x\in\mathbb R^T: (x(t_1),\dots,x(t_n))\in B\},\qquad t_i\in T,\ B\in\mathcal B(\mathbb R^n),$$
by setting $\mu(C_{t_1,\dots,t_n,B}) = P\bigl(\omega: (\xi_{t_1}(\omega),\dots,\xi_{t_n}(\omega))\in B\bigr)$. Then the classical Kolmogorov theorem yields a probability measure on the σ-algebra generated by all cylindrical sets (which coincides with $\sigma(F)$) that extends $\mu$; the assumption of this theorem is that $\mu(C_{t_1,\dots,t_n,B})$ is countably additive in $B\in\mathcal B(\mathbb R^n)$ for fixed $n$ and $t_1,\dots,t_n$, which holds in our case because $P$ is countably additive. By construction, this extension is Gaussian.
Gaussian measures are easily characterized by their Fourier transforms.
Proposition 2.2. A measure $\mu$ on $\sigma(F)$ is Gaussian precisely when its Fourier transform has the form
$$\hat\mu(f) = \exp\Bigl(iL(f)-\frac12Q(f,f)\Bigr),$$
where $L$ is a linear function on $F$ and $Q$ is a nonnegative quadratic form on $F$.
Proof. If $\mu$ is Gaussian, then by the change of variables formula (0.1) we have
$$\hat\mu(f) = \int_E e^{if}\,d\mu = \int_{\mathbb R}e^{it}\,\mu\circ f^{-1}(dt) = \exp(ia_f-b_f/2),$$
where
$$a_f = \int_{\mathbb R}t\,\mu\circ f^{-1}(dt) = \int_E f\,d\mu,\qquad b_f = \int_{\mathbb R}(t-a_f)^2\,\mu\circ f^{-1}(dt) = \int_E(f-a_f)^2\,d\mu = \int_E f^2\,d\mu-\Bigl(\int_E f\,d\mu\Bigr)^2.$$
Obviously, $L(f)=a_f$ is linear in $f$ and $Q(f,f)=b_f$ is a nonnegative quadratic form. Conversely, suppose that such $L$ and $Q$ exist. Then, again by (0.1), the Fourier transform of the measure $\mu\circ f^{-1}$ on $\mathbb R$ has the form
$$\widehat{\mu\circ f^{-1}}(y) = \int_{\mathbb R}e^{iyt}\,\mu\circ f^{-1}(dt) = \int_E\exp(iyf)\,d\mu = \exp\bigl(iL(yf)-Q(yf,yf)/2\bigr) = \exp\bigl(iyL(f)-y^2Q(f,f)/2\bigr),$$
which shows that $\mu\circ f^{-1}$ is Gaussian.
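Proposition 2.2 can be illustrated in finite dimensions. For a Gaussian measure $\mu$ on $E=\mathbb R^2$ with mean $m$ and covariance matrix $C$, and the linear functional $f(x)=f_1x_1+f_2x_2$, one has $L(f)=f_1m_1+f_2m_2$ and $Q(f,f)=f^{\mathsf T}Cf$. The sketch below (standard-library Python; the particular $m$, $C$, sample size, and seed are hypothetical choices) samples $\mu$ through a Cholesky factor of $C$ and compares a Monte Carlo estimate of $\hat\mu(f)$ with $\exp(iL(f)-Q(f,f)/2)$.

```python
import cmath
import math
import random

# Hypothetical Gaussian measure on E = R^2: mean m, covariance matrix C.
m = (1.0, -0.5)
C = ((2.0, 0.6), (0.6, 1.0))


def sample(rng):
    """Draw x = m + L z, where L is the Cholesky factor of C and z is standard Gaussian."""
    l11 = math.sqrt(C[0][0])
    l21 = C[1][0] / l11
    l22 = math.sqrt(C[1][1] - l21 ** 2)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (m[0] + l11 * z1, m[1] + l21 * z1 + l22 * z2)


def empirical_fourier(samples, f):
    """Monte Carlo estimate of the integral of exp(i f(x)) with f(x) = f1 x1 + f2 x2."""
    return sum(cmath.exp(1j * (f[0] * x[0] + f[1] * x[1])) for x in samples) / len(samples)


def gaussian_fourier(f):
    """exp(i L(f) - Q(f, f)/2) with L(f) = f . m and Q(f, f) = f^T C f."""
    lin = f[0] * m[0] + f[1] * m[1]
    quad = sum(f[i] * C[i][j] * f[j] for i in range(2) for j in range(2))
    return cmath.exp(1j * lin - quad / 2)


rng = random.Random(3)
samples = [sample(rng) for _ in range(100_000)]
for f in [(1.0, 0.0), (0.0, 1.0), (0.5, -0.3), (1.0, 1.0)]:
    print(f, abs(empirical_fourier(samples, f) - gaussian_fourier(f)))
```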
Note that the Fourier transform in infinite dimensions was introduced by Kolmogorov (he called it "the Laplace transform"). Now a bit of topology: recall that a linear space $X$ is called locally convex if it is equipped with a family $\mathcal P$ of seminorms such that for every $x\ne0$ there is $p\in\mathcal P$ with $p(x)>0$; a seminorm $p$ satisfies the conditions
$$p\ge0,\qquad p(\lambda x)=|\lambda|\,p(x),\qquad p(x+y)\le p(x)+p(y).$$
Such a family defines a topology as follows: we take basic neighborhoods of zero of the form
$$U_{p_1,\dots,p_n,\varepsilon} = \{x\in X: p_1(x)<\varepsilon,\dots,p_n(x)<\varepsilon\},\qquad p_i\in\mathcal P,\ \varepsilon>0,$$
and declare open those sets that are arbitrary unions of shifts of such basic neighborhoods of zero (with possibly varying $\varepsilon$ and $p_i$). The empty set is also counted as open. This is completely analogous to defining open sets in normed spaces; in place of balls we simply take their analogs. Clearly, normed spaces are locally convex.
The space of all continuous linear functions on a locally convex space $X$ is denoted by $X^*$ and is called the dual (or topological dual) space. The dual space generates on $X$ the so-called weak topology $\sigma(X,X^*)$, i.e., the topology generated by the family of seminorms $p_f(x)=|f(x)|$, $f\in X^*$. A standard exercise in the theory of locally convex spaces is to show that the topological dual of $X$ with the weak topology is again $X^*$. The main reason for introducing these topological concepts here is that we need the space $\mathbb R^\infty$, the countable product of copies of $\mathbb R$, whose elements are arbitrary real sequences $x=(x_1,x_2,\dots)$. There is a natural convergence in this space, namely coordinatewise convergence, but it cannot be defined by a norm. However, the countable family of seminorms $p_n(x)=|x_n|$ does
the job. One can also consider the metric
$$d(x,y) = \sum_{n=1}^\infty2^{-n}\min(|x_n-y_n|,1)$$
on $\mathbb R^\infty$, which generates the same convergence and makes $\mathbb R^\infty$ complete and separable. Note that the space $\mathbb R_0^\infty$ of finite sequences can be identified with the dual space of $\mathbb R^\infty$. If in Definition 2.1 the space $F$ separates points in $E$, that is, for each $x\ne0$ there is $f\in F$ with $f(x)\ne0$, then $E$ becomes a locally convex space with the seminorms $p_f(x)=|f(x)|$, $f\in F$. Many aspects of the theory of Gaussian measures do not involve topology, but in some important issues it becomes essential.
Definition 2.3. Let $X$ be a locally convex space and let $\mu$ be a Gaussian measure on $\sigma(X^*)$. Then $\mu$ is called a Gaussian measure on the locally convex space $X$.
Example 2.4. An important example of a Gaussian measure is the countable product $\gamma$ of standard Gaussian measures on the real line. This measure is defined on the space $E=\mathbb R^\infty$. The elements of $F=E^*$ are finite linear combinations of the coordinate functions, so $\gamma$ is Gaussian. As we shall see, this special example plays a very important role in the whole theory: in a certain sense (made precise below), it is the unique infinite-dimensional Gaussian measure up to isomorphism.
We recall that a countable product $\mu=\bigotimes_{n=1}^\infty\mu_n$ of probability measures $\mu_n$ on spaces $(X_n,\mathcal A_n)$ is defined on $X=\prod_{n=1}^\infty X_n$ as follows: first it is defined on sets of the form $A=A_1\times\cdots\times A_n\times X_{n+1}\times\cdots$ by $\mu(A)=\mu_1(A_1)\cdots\mu_n(A_n)$; then one verifies that $\mu$ is countably additive on the algebra of finite unions of such sets (called cylindrical sets), which yields a countably additive extension to the smallest σ-algebra $\mathcal A:=\bigotimes_{n=1}^\infty\mathcal A_n$ containing such cylindrical sets. Certainly, one can also consider arbitrary products of Gaussian measures. The product of an arbitrary family of probability measures $\mu_t$ indexed by a nonempty set $T$ is defined similarly, except that in place of the finite collections $\mu_1,\dots,\mu_n$ one takes all possible finite collections $\mu_{t_1},\dots,\mu_{t_n}$.
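The product construction can be made concrete for the countable product $\gamma$ of standard Gaussian measures. By definition, the cylinder $[-k,k]^n\times\mathbb R\times\mathbb R\times\cdots$ has measure $\gamma_1([-k,k])^n$, where $\gamma_1$ is the standard Gaussian measure on the line, and the cube $[-k,k]^\infty$ is the decreasing intersection of these cylinders, so its measure is the limit as $n\to\infty$. The sketch below (standard-library Python; the values of $k$ and $n$ are arbitrary) uses $\gamma_1([-k,k])=\operatorname{erf}(k/\sqrt2)$ to show how fast this limit reaches zero.

```python
import math


def interval_mass(k):
    """gamma_1([-k, k]) for the standard Gaussian measure gamma_1 on the real line."""
    return math.erf(k / math.sqrt(2))


def cylinder_mass(k, n):
    """Measure of [-k, k]^n x R x R x ... under the countable product gamma.

    By the definition of the product measure this is gamma_1([-k, k])
    multiplied by itself n times (the remaining factors are gamma_1(R) = 1).
    """
    return interval_mass(k) ** n


# The cube [-k, k]^infinity is the decreasing intersection of these cylinders,
# so its measure is the limit as n -> infinity, which is 0 since interval_mass(k) < 1.
for n in (1, 10, 100, 1000, 10_000):
    print(n, cylinder_mass(3, n))
```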
Example 2.5. Let $T$ be an index set such that for every $t\in T$ we are given a Gaussian measure $\gamma_t$ on a linear space $E_t$ with the corresponding space $F_t$ of linear functionals. Then the measure $\gamma=\bigotimes_{t\in T}\gamma_t$ on the product $E=\prod_{t\in T}E_t$ is Gaussian with respect to the space $F=\bigoplus_{t\in T}F_t$ consisting of functionals of the form $f(x)=f_{t_1}(x_{t_1})+\cdots+f_{t_n}(x_{t_n})$, $f_{t_i}\in F_{t_i}$, $t_i\in T$. It should be noted that if each $E_t$ is a locally convex space and $F_t=E_t^*$, then $F=E^*$.
Example 2.6. A general Gaussian measure $\mu$ on $\mathbb R^T$ (equivalently, a Gaussian process indexed by $T$) is completely determined by its mean $m(t)$ and its covariance function $K(t,s)$:
$$m(t) = \int_{\mathbb R^T}x(t)\,\mu(dx),\qquad K(t,s) = \int_{\mathbb R^T}\bigl(x(t)-m(t)\bigr)\bigl(x(s)-m(s)\bigr)\,\mu(dx).$$
Indeed, these two objects uniquely determine the finite-dimensional projections $\mu_{t_1,\dots,t_n}$ as Gaussian measures on $\mathbb R^n$ with means $m(t_1),\dots,m(t_n)$ and covariance matrices $\bigl(K(t_i,t_j)\bigr)_{i,j\le n}$; these projections are obviously consistent.
The standard Gaussian measure $\gamma$ on $\mathbb R^\infty$ can be restricted to many smaller linear subspaces of full measure. For example, taking any sequence of numbers $\alpha_n>0$ with $\sum_{n=1}^\infty\alpha_n<\infty$, we can restrict $\gamma$ to the weighted Hilbert space of sequences
$$E := \Bigl\{(x_n)\in\mathbb R^\infty: \sum_{n=1}^\infty\alpha_nx_n^2<\infty\Bigr\},$$
whose squared norm is given by this series. The fact that $\gamma(E)=1$ follows from the monotone convergence theorem, which shows that
$$\sum_{n=1}^\infty\alpha_nx_n^2<\infty$$
almost everywhere, since the integrals of the terms form a convergent series (the integral of $x_n^2$ is 1). Similarly, one can find non-Hilbert Banach spaces of full measure consisting of sequences $(x_n)$ with $\sup_n\beta_n|x_n|<\infty$ or $\lim_{n\to\infty}\beta_n|x_n|=0$ for suitable sequences $\beta_n\to0$; more precisely, the condition is
$$\sum_{n=1}^\infty\exp\Bigl(-\frac{C}{\beta_n^2}\Bigr)<\infty\qquad\forall\,C>0.$$
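The monotone convergence argument above predicts $\int\sum_n\alpha_nx_n^2\,\gamma(dx)=\sum_n\alpha_n$. The sketch below checks this by simulation; the weights $\alpha_n=n^{-2}$, the truncation to the first 200 coordinates, the sample size, and the seed are hypothetical choices made only for illustration.

```python
import math
import random


def weighted_square_sum(x, alpha):
    """sum_n alpha_n x_n^2, the squared norm of the weighted space E."""
    return sum(a * v * v for a, v in zip(alpha, x))


def mc_mean_weighted_norm(n_coords=200, n_samples=2000, seed=4):
    rng = random.Random(seed)
    # Hypothetical weights alpha_n = n^(-2), truncated to the first n_coords coordinates.
    alpha = [1.0 / n ** 2 for n in range(1, n_coords + 1)]
    total = 0.0
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_coords)]  # coordinates of a gamma-sample
        total += weighted_square_sum(x, alpha)
    return total / n_samples, sum(alpha)


estimate, expected = mc_mean_weighted_norm()
print(estimate, expected, math.pi ** 2 / 6)
```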
However, there is no minimal linear subspace of full measure. The point is that the intersection of all linear subspaces of positive (equivalently, of full) measure is the subspace $l^2$, which has measure zero, as one can verify directly (indirectly, this is seen from the fact that the series of the independent random variables $x_n^2$ diverges almost everywhere, since they are identically distributed); for example, one can use that $l^2$ is covered by the union of the cubes $[-k,k]^\infty$, which are obviously of measure zero. An interesting and useful example of a more general type of full measure set is given by the classical result that the set
$$\Lambda = \Bigl\{(x_n): \limsup_{n\to\infty}\frac{|x_n|}{\sqrt{2\log n}}=1\Bigr\}$$
has full measure. It is essential to take $\limsup$ here, not just $\lim$ (which does not exist a.e., since otherwise the space $E$ corresponding to $\alpha_n=n^{-1}|\log(1+n)|^{-2}$ would not have measure 1). Another nonlinear example: $n^{-1}(x_1^2+\cdots+x_n^2)\to1$ a.e. (the law of large numbers).
Typically, full measure sets arise via limit theorems, for example, as domains of convergence of random series as above. One of the simplest examples, which will be important below when we discuss measurable linear functionals, is the series $\sum_{n=1}^\infty c_nx_n$. It is known that the domain of convergence of this series has measure either zero or one (clearly, this domain is a linear subspace, so it is subject to the zero-one laws discussed below), and the latter case occurs precisely when $\sum_{n=1}^\infty c_n^2<\infty$. However, there is no explicit characterization of the domain of convergence; this domain is not the same as the domain of absolute convergence. Actually, $l^2$ is the intersection of the smaller class of full measure subspaces serving as convergence domains of such series: it follows from the Cameron–Martin formula below that $l^2$ is contained in every full measure linear subspace, and if $\sum_{n=1}^\infty h_n^2=\infty$, then one can find $(c_n)\in l^2$ such that the series $\sum_{n=1}^\infty c_nh_n$ diverges, i.e., $h=(h_n)$ lies outside the convergence domain of $\sum_{n=1}^\infty c_nx_n$.
Remark 2.7. We shall see in Section 1.5 that, for any Gaussian measure on a separable Banach space, every measurable linear functional is generated by a continuous linear functional on a Banach space of full measure continuously embedded into the original space. In the case of $\mathbb R^\infty$, such a space (even a Hilbert one) can be found more explicitly. However, it is not always a weighted $l^2$-space, i.e., it is not always possible to find a sequence of numbers $\alpha_n>0$ such that the corresponding space $E$ is of full measure and the functional $\sum_{n=1}^\infty c_nx_n$ is continuous on it. Indeed, the latter condition means that $\sum_{n=1}^\infty\alpha_n^{-1}c_n^2<\infty$. Since also $\sum_{n=1}^\infty\alpha_n<\infty$, the Cauchy–Schwarz inequality would give $\sum_{n=1}^\infty|c_n|\le\bigl(\sum_{n=1}^\infty\alpha_n^{-1}c_n^2\bigr)^{1/2}\bigl(\sum_{n=1}^\infty\alpha_n\bigr)^{1/2}<\infty$,
which is not always the case. However, another Hilbert space works. Let us take increasing natural numbers $N_k$ such that $\sum_{n=N_k}^\infty c_n^2<8^{-k}$ and introduce the space of sequences
$$Y = \Bigl\{(x_n): \sum_{k=1}^\infty2^{-k}x_k^2+\sum_{k=1}^\infty4^k\Bigl(\sum_{n=N_k+1}^{N_{k+1}}c_nx_n\Bigr)^2<\infty\Bigr\},$$
equipped with the norm $\|\cdot\|_Y$ whose square is the above series. Due to our choice of $N_k$, the space $Y$ (which is clearly a Hilbert space) has measure 1. The functional $f(x)=\sum_{n=1}^\infty c_nx_n$ is continuous on $Y$ with the indicated norm, since $\bigl|\sum_{n=N_k+1}^{N_{k+1}}c_nx_n\bigr|\le2^{-k}$ and $|x_n|\le2^{n/2}$ whenever $\|x\|_Y\le1$, which means that the norm of $f$ on $Y$ is at most $N_12^{N_1/2}\max_{n\le N_1}|c_n|+1$. The space $Y$ depends on $(c_n)$, and it is impossible to find a common $Y$ for all such series. A discussion of various ways of defining the concept of a Gaussian measure can be found in [126].
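The claim that the space $Y$ of Remark 2.7 has full measure can be verified by integrating $\|x\|_Y^2$ term by term. The derivation below is a sketch; it uses only monotone convergence for the nonnegative series, the relations $\int x_n^2\,d\gamma=1$ and $\int x_nx_m\,d\gamma=0$ for $n\ne m$, and the choice $\sum_{n\ge N_k}c_n^2<8^{-k}$.

```latex
\begin{aligned}
\int_{\mathbb R^\infty}\|x\|_Y^2\,\gamma(dx)
  &= \sum_{k=1}^\infty 2^{-k}\int x_k^2\,d\gamma
   + \sum_{k=1}^\infty 4^k\int\Bigl(\sum_{n=N_k+1}^{N_{k+1}}c_nx_n\Bigr)^2 d\gamma\\
  &= \sum_{k=1}^\infty 2^{-k}
   + \sum_{k=1}^\infty 4^k\sum_{n=N_k+1}^{N_{k+1}}c_n^2
   \;\le\; 1 + \sum_{k=1}^\infty 4^k\,8^{-k} \;=\; 2 .
\end{aligned}
```

Since the integral is finite, $\|x\|_Y<\infty$ for $\gamma$-almost every $x$, that is, $\gamma(Y)=1$. Likewise, $\|x\|_Y\le1$ forces $4^k\bigl(\sum_{n=N_k+1}^{N_{k+1}}c_nx_n\bigr)^2\le1$ for every $k$, which is the block bound $2^{-k}$ used in the continuity argument.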
3. The Wiener Measure
Another key example is the Wiener measure $P_W$ on the space $\mathbb R^{[0,T]}$ of all functions on $[0,T]$ or on the space of continuous functions $C[0,T]$.
Example 3.1. On the space $\mathbb R^{[0,T]}$ the Wiener measure is defined by means of its finite-dimensional projections $P_{t_1,\dots,t_n}$, $0<t_1<\cdots<t_n\le T$, whose densities $p_{t_1,\dots,t_n}(x_1,\dots,x_n)$ with respect to Lebesgue measure on $\mathbb R^n$ have the following form:
$$p_{t_1,\dots,t_n}(x_1,\dots,x_n) = \frac{1}{\sqrt{2\pi t_1}}\exp\Bigl(-\frac{x_1^2}{2t_1}\Bigr)\,\frac{1}{\sqrt{2\pi(t_2-t_1)}}\exp\Bigl(-\frac{(x_2-x_1)^2}{2(t_2-t_1)}\Bigr)\times\cdots\times\frac{1}{\sqrt{2\pi(t_n-t_{n-1})}}\exp\Bigl(-\frac{(x_n-x_{n-1})^2}{2(t_n-t_{n-1})}\Bigr).$$
In addition, it is required that $P_0=\delta_0$. Kolmogorov's theorem on consistent finite-dimensional distributions yields the existence of the Wiener measure on $\mathbb R^{[0,T]}$. Then one verifies that the set $C[0,T]$ of continuous functions has outer measure 1 (which is not straightforward and can be done in several different ways); hence the measure $P_W$ can also be defined on $C[0,T]$, and the resulting measure is called the classical Wiener measure.
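The product form of the density in Example 3.1 says that the increments $w(t_1)$, $w(t_2)-w(t_1)$, …, $w(t_n)-w(t_{n-1})$ are independent centered Gaussian variables with variances $t_1$, $t_2-t_1$, …, $t_n-t_{n-1}$. The sketch below (standard-library Python; the time points, sample size, and seed are arbitrary illustrative choices) samples the projection $P_{t_1,\dots,t_n}$ in exactly this way and compares the empirical second moments with $\min(t,s)$, the covariance of the Wiener measure.

```python
import math
import random


def sample_wiener_projection(times, rng):
    """Sample (w(t_1), ..., w(t_n)) for 0 < t_1 < ... < t_n.

    The product density of Example 3.1 says that the increments
    w(t_k) - w(t_{k-1}) (with t_0 = 0, w(0) = 0) are independent
    centered Gaussian with variances t_k - t_{k-1}.
    """
    values, prev_t, current = [], 0.0, 0.0
    for t in times:
        current += rng.gauss(0.0, math.sqrt(t - prev_t))
        values.append(current)
        prev_t = t
    return values


def empirical_second_moments(times, n_samples=20_000, seed=0):
    """Monte Carlo estimate of E[w(s) w(t)] for s, t in `times` (the mean is zero)."""
    rng = random.Random(seed)
    n = len(times)
    acc = [[0.0] * n for _ in range(n)]
    for _ in range(n_samples):
        w = sample_wiener_projection(times, rng)
        for i in range(n):
            for j in range(n):
                acc[i][j] += w[i] * w[j]
    return [[a / n_samples for a in row] for row in acc]


times = [0.25, 0.5, 1.0]
moments = empirical_second_moments(times)
for i, s in enumerate(times):
    for j, t in enumerate(times):
        print(f"E[w({s}) w({t})] ~ {moments[i][j]:.3f}   min = {min(s, t)}")
```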
There are also constructions of the Wiener measure directly on the space $C[0,T]$. One of them, going back to Wiener, is as follows: let $\{\varphi_n\}$ be an orthonormal basis in $L^2[0,1]$ and
$$\psi_n(t) = \int_0^t\varphi_n(s)\,ds,\qquad t\in[0,1].$$
It turns out (though this is not straightforward) that, given a sequence $\{\xi_n\}$ of independent standard Gaussian random variables on a probability space $(\Omega,\mathcal B,P)$, the series
$$w(t,\omega) = \sum_{n=1}^\infty\xi_n(\omega)\psi_n(t)$$
converges uniformly for almost all $\omega$. Then the Wiener measure $P_W$ can be defined by
$$P_W(B) = P\bigl(\omega: w(\cdot,\omega)\in B\bigr),\qquad B\in\mathcal B(C[0,1]).$$
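For a concrete instance of Wiener's series one may take the orthonormal basis $\varphi_n(s)=\sqrt2\cos\bigl((n-\tfrac12)\pi s\bigr)$, $n\ge1$, of $L^2[0,1]$; this particular basis is our choice for illustration, since the construction allows any orthonormal basis. Then $\psi_n(t)=\sqrt2\sin\bigl((n-\tfrac12)\pi t\bigr)/\bigl((n-\tfrac12)\pi\bigr)$, and Parseval's identity applied to the indicators of $[0,s]$ and $[0,t]$ gives $\sum_n\psi_n(s)\psi_n(t)=\min(s,t)$, which is why the series has the Wiener covariance. The sketch checks this for a truncated sum and evaluates a truncated random path; it does not prove uniform convergence.

```python
import math
import random


def psi(n, t):
    """psi_n(t) = integral of phi_n over [0, t] for phi_n(s) = sqrt(2) cos((n - 1/2) pi s)."""
    lam = (n - 0.5) * math.pi
    return math.sqrt(2.0) * math.sin(lam * t) / lam


def truncated_kernel(s, t, n_terms):
    """Partial sum of psi_n(s) psi_n(t); by Parseval the full series equals min(s, t)."""
    return sum(psi(n, s) * psi(n, t) for n in range(1, n_terms + 1))


def sample_path(grid, n_terms, seed=0):
    """Truncated series sum_{n <= n_terms} xi_n psi_n(t) with i.i.d. standard Gaussian xi_n."""
    rng = random.Random(seed)
    xi = [rng.gauss(0.0, 1.0) for _ in range(n_terms)]
    return [sum(x * psi(n, t) for n, x in enumerate(xi, start=1)) for t in grid]


for s, t in [(0.3, 0.7), (0.5, 0.5), (0.9, 0.2)]:
    print(s, t, round(truncated_kernel(s, t, 2000), 4), min(s, t))
print(sample_path([0.0, 0.25, 0.5, 0.75, 1.0], 500))
```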
In order to show that the right-hand side in the definition of $P_W$ makes sense, i.e., that the corresponding set of points $\omega$ belongs to $\mathcal B$, it suffices to observe that this is true for all cylindrical sets of the form $\{x: x(t)\le c\}$, where $t\in[0,1]$, $c\in\mathbb R$, and then to show that the smallest σ-algebra in $C[0,1]$ containing such sets is the Borel σ-algebra. The latter follows from the simple observation that in separable metric spaces the Borel σ-algebra is generated by closed balls and that every closed ball in $C[0,1]$ can be written as a countable intersection of the indicated cylindrical sets. It should be noted that Wiener's original construction (see [136], [137], [138] and [139]) was different: he first constructed the integral over the path space in the spirit of the Daniell integral. A straightforward calculation shows that the Wiener measure has mean zero and covariance function $K(t,s)=\min(t,s)$.
Behind the Wiener measure there is another important object: the Wiener process. One possible definition is this: a Wiener process $\{w_t\}_{t\ge0}$ on a probability space $(\Omega,\mathcal B,P)$ is a random process such that 1) $w_0=0$; 2) whenever $t_1\le t_2\le\cdots\le t_n$, the random variables $w_{t_n}-w_{t_{n-1}},\dots,w_{t_2}-w_{t_1}$ are independent, and $w_t-w_s$ is centered Gaussian with variance $t-s$ if $s\le t$; 3) the trajectory $t\mapsto w_t(\omega)$ is continuous for almost every $\omega$. One can show that the distribution of such a process in $C[0,1]$ is the Wiener measure. Conversely, having the Wiener measure $P_W$ on $C[0,1]$,
one can construct a Wiener process (for $t\in[0,1]$) by setting $w_t(\omega):=\omega(t)$, where $\omega\in\Omega=C[0,1]$ and $P=P_W$. Other definitions of a Wiener process, technically more involved but useful, impose additional requirements concerning filtrations on probability spaces. Such definitions will not be employed below. Useful information related to Brownian motion is collected in [40].
Certain other important Gaussian measures are introduced by using the Wiener measure or by modifying some of its properties, and similarly with Gaussian processes.
Example 3.2. (i) The fractional Brownian motion is defined as a centered Gaussian random process $\xi_\alpha$ on $[0,T]$ (where $T\in(0,\infty]$) with the covariance function
$$K(s,t) = \frac{s^\alpha+t^\alpha-|s-t|^\alpha}{2},\qquad\alpha\in(0,2].$$
It is also possible to add a positive factor in front; the general form of the covariance function of a fractional Brownian motion is
$$K(s,t) = \frac{c}{2}\bigl(s^{2H}+t^{2H}-|s-t|^{2H}\bigr),\qquad H\in(0,1],\ c>0,$$
where $H=\alpha/2$ is called the Hurst parameter. It is possible to express the fractional Brownian motion via stochastic integration with respect to a Wiener process. This class of processes was introduced by Kolmogorov [88] more than 70 years ago and has since accumulated a vast literature; see [54] and [116] for recent surveys.
(ii) The stationary Ornstein–Uhlenbeck process $\xi=(\xi_t)_{t\in T}$ on a parameter set $T\subset\mathbb R^1$ is defined as a centered Gaussian process with the covariance function $K(s,t)=e^{-|t-s|}$. It can be expressed by means of the Wiener process through the formula $\xi_t=e^{-t}w_{e^{2t}}$. For this process, every random variable $\xi_t$ is standard Gaussian.
(iii) Let $\beta>0$ and $\sigma>0$. The Ornstein–Uhlenbeck process with parameters $\beta$ and $\sigma$ and initial normal distribution with mean $m_0$ and variance $\sigma_0^2$ is defined as a Gaussian process with mean $m(t)=e^{-\beta t}m_0$ and the covariance function
$$\operatorname{cov}(\xi_t,\xi_s) = e^{-\beta(t+s)}\sigma_0^2+\frac{\sigma^2}{2\beta}e^{-\beta(t+s)}\bigl(e^{2\beta\min(t,s)}-1\bigr).$$
If $\sigma=\sigma_0=1$, $m_0=0$, and $\beta=1/2$, the covariance becomes $e^{-|t-s|/2}$, so we arrive at a stationary Ornstein–Uhlenbeck process, namely the process in (ii) with time scaled by the factor $1/2$ (the choice $\beta=1$, $\sigma=\sqrt2$, $\sigma_0=1$, $m_0=0$ gives exactly the covariance $e^{-|t-s|}$ of (ii)). The Ornstein–Uhlenbeck processes also admit representations by means of stochastic integrals with respect to Wiener processes.
(iv) The fractional Ornstein–Uhlenbeck process is a centered Gaussian process with the covariance function
$$K(t,s) = \exp\bigl(-|t-s|^\alpha\bigr),\qquad0<\alpha<2.$$
(v) The Brownian bridge is the process $w_t^0:=w_t-tw_1$ on $T=[0,1]$. Its covariance function is $\min(s,t)-st$.
(vi) The Wiener field (or Brownian sheet) is a centered Gaussian process on the index set $T=[0,1]^d$ with the covariance function
$$K(s,t) = \prod_{i=1}^d\min(s_i,t_i).$$
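The covariance functions of Example 3.2 can be compared numerically. The sketch below (standard-library Python; all function names are our own) encodes (i), (iii), (v) and (vi), checks that $e^{-t}w_{e^{2t}}$ has covariance $e^{-|t-s|}$ as in (ii), checks that the parameters $\sigma=\sigma_0=1$, $m_0=0$, $\beta=1/2$ in (iii) give $e^{-|t-s|/2}$, and uses a Cholesky factorization as a positive-definiteness test on small grids. The last check illustrates why the Hurst parameter is restricted to $H\le1$: for $H=1.5$ the $2\times2$ matrix at times 1 and 2 has negative determinant.

```python
import math


def fbm_cov(s, t, hurst):
    """Fractional Brownian motion, (i): (s^{2H} + t^{2H} - |s - t|^{2H}) / 2."""
    return 0.5 * (s ** (2 * hurst) + t ** (2 * hurst) - abs(s - t) ** (2 * hurst))


def ou_cov(t, s, beta, sigma, sigma0):
    """Ornstein-Uhlenbeck covariance, (iii), with initial variance sigma0**2."""
    decay = math.exp(-beta * (t + s))
    return decay * sigma0 ** 2 + sigma ** 2 / (2 * beta) * decay * (math.exp(2 * beta * min(t, s)) - 1)


def ou_via_wiener_cov(t, s):
    """Covariance of xi_t = exp(-t) w(exp(2t)), computed from K_W(a, b) = min(a, b)."""
    return math.exp(-t) * math.exp(-s) * min(math.exp(2 * t), math.exp(2 * s))


def bridge_cov(s, t):
    """Brownian bridge, (v)."""
    return min(s, t) - s * t


def sheet_cov(s, t):
    """Brownian sheet, (vi): product of min(s_i, t_i) over the coordinates."""
    return math.prod(min(a, b) for a, b in zip(s, t))


def is_positive_definite(matrix):
    """Cholesky factorization; returns False as soon as a pivot is not positive."""
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = matrix[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if acc <= 0.0:
                    return False
                low[i][i] = math.sqrt(acc)
            else:
                low[i][j] = acc / low[j][j]
    return True


grid = [i / 10 for i in range(1, 11)]
for hurst in (0.3, 0.5, 0.8):
    print(hurst, is_positive_definite([[fbm_cov(s, t, hurst) for t in grid] for s in grid]))
# For H = 1.5 the 2x2 matrix at times 1 and 2 is [[1, 4], [4, 8]], with negative determinant.
print(1.5, is_positive_definite([[fbm_cov(s, t, 1.5) for t in (1.0, 2.0)] for s in (1.0, 2.0)]))
print(ou_cov(1.0, 0.4, 0.5, 1.0, 1.0), math.exp(-0.3))
print(ou_via_wiener_cov(1.0, 0.4), math.exp(-0.6))
```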
Gaussian measures on separable Banach spaces can be introduced via the interesting concept of norms measurable in the sense of Gross. First we define the canonical cylindrical Gaussian measure on a separable Hilbert space $H$. A general cylindrical measure $\nu$ is a set function defined on the algebra of all cylindrical sets in $H$ of the form $C=\{x: Px\in C_0\}$, where $P$ is the orthogonal projection onto a finite-dimensional subspace $H_0$ in $H$ and $C_0$ is a Borel set in $H_0$, such that the induced measure $\nu_{H_0}\colon C_0\mapsto\nu(C)$ is countably additive on $\mathcal B(H_0)$. If every such measure is standard Gaussian on the finite-dimensional Euclidean space $H_0$, then $\nu$ is called the canonical cylindrical Gaussian measure on $H$. It is important that $\nu$ is not countably additive on the algebra of cylinders if $H$ is infinite-dimensional: otherwise it would have an extension to a Borel probability measure on $H$, which is impossible since the outer measure of every ball in $H$ is zero (indeed, identifying $H$ with $l^2$, such an extension would coincide with the restriction to $l^2$ of the countable product of standard Gaussian measures, which is zero). Cylindrical measures can also be defined on Banach or locally convex spaces. Namely, a cylindrical set in a Banach or locally convex space $X$ is a set of the form
$$C_{f_1,\dots,f_n,C_0} = \{x\in X: (f_1(x),\dots,f_n(x))\in C_0\},\qquad f_i\in X^*,\ C_0\in\mathcal B(\mathbb R^n).$$
Such sets form the cylindrical algebra $\mathrm{Cyl}(X)$; a cylindrical measure $\sigma$ is a function on $\mathrm{Cyl}(X)$ such that its images under the projections $x\mapsto(f_1(x),\dots,f_n(x))$ are countably additive (again, on the whole algebra $\mathrm{Cyl}(X)$ it may fail to be countably additive). Our previously defined cylindrical measures on a Hilbert space fit this more general framework, since any cylindrical set $C$ of the form $C=A^{-1}(C_0)$ in $H$ with a bounded operator $A$ with values in $\mathbb R^n$ and a Borel set $C_0\in\mathcal B(\mathbb R^n)$ can be written as $P^{-1}(C_1)$, where $P$ is the orthogonal projection onto the finite-dimensional subspace $H_0$ of $H$ that is the orthogonal complement of $A^{-1}(0)$, and $C_1$ is a Borel set in $H_0$ (namely, $C_1=A^{-1}(C_0)\cap H_0$).
Definition 3.3. Let $H$ be a separable Hilbert space and let $\mathcal P(H)$ be the set of all orthogonal projections in $H$ with finite-dimensional ranges. A seminorm $q$ on $H$ is called measurable in the sense of Gross (or Gross measurable) if, for every $\varepsilon>0$, there exists a finite-dimensional orthogonal projection $P_\varepsilon\in\mathcal P(H)$ such that
$$\nu\bigl(x\in H: q(Px)>\varepsilon\bigr)<\varepsilon\quad\text{if }P\in\mathcal P(H)\text{ and }P\perp P_\varepsilon,\tag{3.1}$$
where $\nu$ is the canonical cylindrical Gaussian measure on $H$.
For example, the standard norm $|\cdot|_H$ of $H$ is not Gross measurable unless $H$ is finite-dimensional. Indeed, let $\varepsilon=1/2$. Then, for any finite-dimensional projection $P_0$, we can find a projection $P$ onto a sufficiently large finite-dimensional subspace of $H$ orthogonal to the range of $P_0$ such that $\nu(x\in H: |Px|_H>1/2)>1/2$. On the other hand, one can show that every Gross measurable norm is continuous.
Definition 3.4. A triple $(i,H,B)$ is called an abstract Wiener space if $B$ is a separable Banach space, $H$ is a separable Hilbert space, $i\colon H\to B$ is a continuous linear embedding with dense range, and the norm $q$ of $B$ is measurable on $H$ in the sense of Gross (more precisely, $q\circ i$ is Gross measurable).
Clearly, given the cylindrical measure $\nu$ on $H$, we can define its image $\nu\circ i^{-1}$ on $B$ by
$$\nu\circ i^{-1}(C_{f_1,\dots,f_n,C_0}) := \nu(C_{f_1\circ i,\dots,f_n\circ i,C_0}).$$
For simplicity, the measure $\nu\circ i^{-1}$ on $B$ is denoted by the same symbol $\nu$ and is called the extension of $\nu$ to $B$ (of course, it is not the usual extension, for
two reasons: $H$ is not a cylindrical set in $B$, and even if $\nu\circ i^{-1}$ is countably additive, the set $i(H)$ may have zero measure with respect to it). The idea of Gross [73] was to characterize countably additive Gaussian measures as extensions of the standard cylindrical Gaussian measure (which is not countably additive) under embeddings generated by his measurable norms. The precise assertion is as follows (see [21] for a proof).
Theorem 3.5. Let $(i,H,B)$ be an abstract Wiener space. Then the extension to $B$ of the canonical cylindrical Gaussian measure $\nu$ on $H$ is countably additive. In addition, $H$ coincides with the Cameron–Martin space of $\nu$ on $B$. Conversely, let $\gamma$ be a centered Gaussian measure on a separable Banach space $X$ such that $H=H(\gamma)$ is dense in $X$, and let $i\colon H\to X$ be the natural embedding. Then $(i,H,X)$ is an abstract Wiener space.
In general, verifying that a given norm is Gross measurable may be difficult. For example, it is not straightforward to see that the sup-norm of $C[0,1]$ is Gross measurable on $H=W_0^{2,1}[0,1]$. However, it is relatively simple in the case of Euclidean norms. Namely, if a continuous norm $q$ on $H$ is generated by some inner product and $H_q$ is the completion of $H$ with respect to $q$, then $q$ is Gross measurable precisely when the natural embedding $H\to H_q$ is a Hilbert–Schmidt operator. For example, if $H=l^2$ and $q^2(x)=\sum_{n=1}^\infty q_n^2x_n^2$, then the condition is $\sum_{n=1}^\infty q_n^2<\infty$. In this case it is not difficult to estimate directly the Gaussian measures of the ellipsoids corresponding to $q_n$.
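For the diagonal example $q^2(x)=\sum_n q_n^2x_n^2$ on $H=l^2$, the sufficiency of $\sum_n q_n^2<\infty$ follows from a short Chebyshev argument. The sketch below assumes that $\{e_n\}$ is the standard basis of $l^2$ and writes $P_N$ for the orthogonal projection onto the span of $e_1,\dots,e_N$ (notation introduced here only for illustration); it covers only this direction of the criterion.

```latex
\begin{aligned}
&\text{Let } P\in\mathcal P(H),\ P\perp P_N.\ \text{Then } Pe_n = PP_Ne_n = 0 \text{ for } n\le N,\ \text{and } |Pe_n|\le 1;\\
&q(Px)^2 = \sum_{n=1}^\infty q_n^2\,(Px,e_n)^2 = \sum_{n=1}^\infty q_n^2\,(x,Pe_n)^2,
 \qquad (x,Pe_n)\sim N\bigl(0,|Pe_n|^2\bigr)\ \text{under } \nu;\\
&\int q(Px)^2\,d\nu = \sum_{n>N} q_n^2\,|Pe_n|^2 \le \sum_{n>N} q_n^2;\\
&\nu\bigl(q(Px)>\varepsilon\bigr) \le \varepsilon^{-2}\sum_{n>N} q_n^2 < \varepsilon
 \quad\text{once } N \text{ is chosen with } \sum_{n>N} q_n^2 < \varepsilon^3 .
\end{aligned}
```

Taking $P_\varepsilon:=P_N$ for such an $N$ gives condition (3.1).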
4. Radon Gaussian Measures
A nonnegative finite Borel measure $\mu$ on a topological space $X$ is called Radon if, for every set $B\in\mathcal B(X)$ and every $\varepsilon>0$, there is a compact set $K_\varepsilon\subset B$ such that $\mu(B\setminus K_\varepsilon)<\varepsilon$.
Theorem 4.1. Every finite Borel measure on a complete separable metric space $X$ is Radon. Moreover, this is true for any Souslin space $X$, i.e., the image of a complete separable metric space under a continuous mapping. In particular, this is true for the spaces $C[0,1]$, $\mathbb R^\infty$, and all separable Hilbert spaces.
There exist non-Radon Gaussian measures on locally convex spaces, but this is a very rare situation (practically never encountered in applications; see, however, Example 4.3).
Definition 4.2. A Radon probability measure $\gamma$ on a locally convex space $X$ is called Gaussian if, for every $f\in X^*$, the induced measure $\gamma\circ f^{-1}$ is Gaussian on the real line.
It is known that every Radon Gaussian measure $\gamma$ has a mean $m\in X$, i.e., a vector $m$ in $X$ such that
$$f(m) = \int_X f(x)\,\gamma(dx)\qquad\forall\,f\in X^*.$$
If $m=0$, i.e., the measures $\gamma\circ f^{-1}$, $f\in X^*$, have zero mean, then $\gamma$ is called centered. Every Radon Gaussian measure $\gamma$ is a shift of the centered Gaussian measure $\gamma_m$ defined by $\gamma_m(B):=\gamma(B+m)$. Hence, for many purposes, it suffices to consider only centered Gaussian measures.
Not all Gaussian measures are Radon.
Example 4.3. Let $\gamma$ be the product of an uncountable family (indexed by a set $T$) of standard Gaussian measures on the real line (see Example 2.5). Then $\gamma(K)=0$ for every compact set $K$ in $\mathbb R^T$. Indeed, every compact set $K$ is contained in a product of compact intervals $I_t=[-C_t,C_t]$. Since $T$ is uncountable, there exist $k\in\mathbb N$ and an infinite countable set $\{t_n\}$ such that $C_{t_n}\le k$ for all $n$. Clearly, the set $\prod_{n=1}^\infty I_{t_n}$ has measure zero with respect to the corresponding countable power of the standard Gaussian measure; hence $\gamma(K)=0$.
It is possible (but difficult) to prove that the measure in the previous example can be extended to a Borel measure on $X=\mathbb R^T$ (originally it is defined on the smaller σ-algebra generated by $X^*$). Under the continuum hypothesis, there is even a Borel Gaussian measure on a separable Euclidean space vanishing on all compact sets (see [21, pp. 150–151]). It is also worth noting that, in the case of a nonseparable Banach space $X$, a centered Gaussian measure on the σ-algebra $\sigma(X^*)$ may have no Borel extension.
Proposition 4.4. Let $\mu$ be a centered Gaussian measure on $l^\infty$ such that $\sigma_n^2=\int x_n^2\,\mu(dx)\to0$. If $\mu(c_0)=0$, then, assuming the continuum hypothesis, the measure $\mu$ cannot be extended to a measure on the Borel σ-field of $l^\infty$. In particular, this is the case if $\mu$ is the product of centered one-dimensional Gaussian measures with variances $\sigma_n^2=(\log(n+1))^{-1}$.
5. The Cameron–Martin Space and Measurable Linear Operators
For a centered Radon Gaussian measure $\gamma$ we denote by $X_\gamma^*$ the closure of $X^*$ in $L^2(\gamma)$. The elements of $X_\gamma^*$ are called $\gamma$-measurable linear functionals. There is an operator $R_\gamma\colon X_\gamma^*\to X$, called the covariance operator of the measure $\gamma$, such that
$$f(R_\gamma g) = \int_X f(x)g(x)\,\gamma(dx)\qquad\forall\,f\in X^*,\ g\in X_\gamma^*.$$
Set $\hat h:=g$ if $h=R_\gamma g$. Then $\hat h$ is called the $\gamma$-measurable linear functional generated by $h$. The following vector equality holds (if $X$ is a Banach space, then it holds in Bochner's sense):
$$R_\gamma g = \int_X g(x)\,x\,\gamma(dx)\qquad\forall\,g\in X_\gamma^*.$$
For example, if $\gamma$ is a centered Gaussian measure on a separable Hilbert space $X$, then there exists a nonnegative nuclear operator $K$ on $X$ for which $Ky=R_\gamma y$ for all $y\in X$, where we identify $X^*$ with $X$. Then we obtain
$$(Ky,z) = (y,z)_{L^2(\gamma)}\quad\text{and}\quad\hat\gamma(y) = \exp\bigl(-(Ky,y)/2\bigr).$$
Let us take an orthonormal eigenbasis $\{e_n\}$ of the operator $K$ with eigenvalues $\{k_n\}$. Then $\gamma$ coincides with the image of the countable power $\gamma_0$ of the standard Gaussian measure on $\mathbb R^1$ under the mapping
$$\mathbb R^\infty\to X,\qquad(x_n)\mapsto\sum_{n=1}^\infty\sqrt{k_n}\,x_ne_n.$$
This series converges $\gamma_0$-a.e. in $X$ because the series $\sum_{n=1}^\infty k_nx_n^2$ converges $\gamma_0$-a.e., which follows from the convergence of $\sum_n k_n$ and the fact that the integral of $x_n^2$ against the measure $\gamma_0$ equals 1. Here $X_\gamma^*$ can be identified with the completion of $X$ with respect to the norm $x\mapsto\|\sqrt K\,x\|_X$, i.e., the embedding $X=X^*\to X_\gamma^*$ is a Hilbert–Schmidt operator. The space
$$H(\gamma) = R_\gamma(X_\gamma^*)$$
is called the Cameron–Martin space of the measure $\gamma$. It is a Hilbert space with respect to the inner product
$$(h,k)_H := \int_X\hat h(x)\hat k(x)\,\gamma(dx).$$
The corresponding norm is given by the formula $|h|_H:=\|\hat h\|_{L^2(\gamma)}$. Moreover, it is known that $H(\gamma)$ with this norm is separable and its closed unit ball is compact in $X$. The same norm is also given by the formula $|h|_H=\sup\{f(h): f\in X^*,\ \|f\|_{L^2(\gamma)}\le1\}$. If $\dim H(\gamma)=\infty$, then $\gamma(H(\gamma))=0$. In terms of the inner product in $H$, the vector $R_\gamma g$ is determined by the identity
$$(j_H(f),R_\gamma g)_H = f(R_\gamma g) = \int_X fg\,d\gamma,\qquad f\in X^*,\ g\in X_\gamma^*.\tag{5.1}$$
The following equality is also worth noting:
$$\hat\gamma(l) = \exp\bigl(-|R_\gamma(l)|_H^2/2\bigr),\qquad l\in X^*.$$
In the above example of a Gaussian measure $\gamma$ on a Hilbert space, we have $H(\gamma)=\sqrt K(X)$. Let us observe that $H(\gamma)$ also coincides with the set of all vectors of the form
$$h = \int_X f(x)\,x\,\gamma(dx),\qquad f\in L^2(\gamma).$$
Indeed, letting $f_0$ be the orthogonal projection of $f$ onto $X_\gamma^*$ in $L^2(\gamma)$, we see that the integral of the difference $[f(x)-f_0(x)]x$ over $X$ vanishes, since the integral of $[f(x)-f_0(x)]l(x)$ vanishes for each $l\in X^*$.
Theorem 5.1. The mapping $h\mapsto\hat h$ establishes a linear isomorphism between $H(\gamma)$ and $X_\gamma^*$ preserving the inner product. In addition, $R_\gamma\hat h=h$. If $\{e_n\}$ is an orthonormal basis in $H(\gamma)$, then $\{\hat e_n\}$ is an orthonormal basis in $X_\gamma^*$, and the $\hat e_n$ are independent standard Gaussian random variables.
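In the Hilbert-space example, Theorem 5.1 becomes very concrete. If $K$ has eigenvalues $k_n$ and eigenvectors $e_n$, then the vectors $u_n=\sqrt{k_n}\,e_n$ form an orthonormal basis of $H(\gamma)=\sqrt K(X)$, and $\hat u_n(x)=(x,e_n)/\sqrt{k_n}$, since $R_\gamma(e_n/\sqrt{k_n})=Ke_n/\sqrt{k_n}=u_n$. The sketch below assumes $X=\mathbb R^d$ with the standard basis and the hypothetical eigenvalues $k_n=n^{-2}$ and coefficients $a_n=1/n$; it checks by simulation that $\int|x|^2\,d\gamma=\sum_nk_n$ (the trace of $K$), that $\|\hat h\|_{L^2(\gamma)}^2=|h|_H^2$ for $h=\sum_na_nu_n$, and that $\hat u_1$ and $\hat u_2$ are uncorrelated.

```python
import math
import random


def sample_gaussian_hilbert(eigenvalues, rng):
    """x = sum_n sqrt(k_n) xi_n e_n in R^d, with e_n the standard basis."""
    return [math.sqrt(k) * rng.gauss(0.0, 1.0) for k in eigenvalues]


def hat_u(x, eigenvalues, n):
    """Measurable functional generated by u_n = sqrt(k_n) e_n, i.e. x -> (x, e_n) / sqrt(k_n)."""
    return x[n] / math.sqrt(eigenvalues[n])


def check_theorem_5_1(d=10, n_samples=20_000, seed=1):
    rng = random.Random(seed)
    k = [1.0 / (n + 1) ** 2 for n in range(d)]  # hypothetical eigenvalues k_n = n^(-2)
    a = [1.0 / (n + 1) for n in range(d)]       # h = sum_n a_n u_n, so |h|_H^2 = sum_n a_n^2
    sq_norm = hat_h_sq = cross = 0.0
    for _ in range(n_samples):
        x = sample_gaussian_hilbert(k, rng)
        sq_norm += sum(v * v for v in x)
        hat_h = sum(a[n] * hat_u(x, k, n) for n in range(d))  # hat h = sum_n a_n hat u_n
        hat_h_sq += hat_h * hat_h
        cross += hat_u(x, k, 0) * hat_u(x, k, 1)
    return {
        "mean_sq_norm": sq_norm / n_samples, "trace_K": sum(k),
        "mean_hat_h_sq": hat_h_sq / n_samples, "cm_norm_sq": sum(v * v for v in a),
        "mean_cross": cross / n_samples,
    }


print(check_theorem_5_1())
```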
One can take an orthonormal basis in Xγ∗ consisting of all elements ξn ∈ X ∗ . The general form of an element l ∈ Xγ∗ is this: l=
∞
cn ξn ,
n=1
where the series converges in L2 (γ). Since ξn are independent Gaussian random variables, this series converges also γ-a.e. The domain of its convergence is a Borel linear subspace L of full measure (we even have L ∈ σ(X)). One can take a version of l which is linear on all of X in the usual sense; it is called a proper linear version. For example, let l on L be defined as the sum of the indicated series; then we extend l to all of X by linearity in an arbitrary way (e.g., taking a linear subspace L1 which algebraically complements L and setting l(x + y) = l(x) whenever x ∈ L, y ∈ L1 ). Such a version is not unique in the infinite-dimensional case, but any two proper linear versions coincide on the subspace H(γ) (although it has measure zero!). Thus, every γ-measurable linear functional f has a version f0 linear on the whole space (however, it is not always possible to find a Borel linear version, since all Borel linear functions on Banach spaces are continuous). It is easy to show that such a version is automatically continuous on H(γ) with the norm |·|H ; more precisely, f hdγ, h ∈ H. f0 (h) = (Rγ f, h)H = X
Indeed, the second equality holds by the definition of the inner product in H. To prove the first one, we take a sequence $\{f_n\} \subset X^*$ convergent to f in $L^2(\gamma)$. Passing to a subsequence we may assume that $f_n \to f$ a.e. The set L of convergence is a linear subspace of full measure, hence it contains H, and $f_0$ on H coincides with the pointwise limit of $f_n|_H$. Let $h = R_\gamma k$, $k \in X_\gamma^*$. Then $k = \hat h$ and
$$f_n(h) = f_n(R_\gamma k) = \int_X f_n k\,d\gamma \to \int_X f k\,d\gamma = \int_X f \hat h\,d\gamma.$$
Conversely, any continuous linear functional l on the Hilbert space H(γ) admits an extension, unique up to equality γ-a.e., to a γ-measurable proper linear functional $\hat l$ that coincides with l on H(γ). For every h ∈ H(γ), such an extension of the functional $x \mapsto (x, h)_H$ is exactly $\hat h$. If $h = \sum_{n=1}^\infty c_n e_n$, then $\hat h = \sum_{n=1}^\infty c_n \hat e_n$. Two γ-measurable linear functionals are equal almost everywhere precisely when their proper linear versions coincide on H(γ).
In the opposite direction, it is known that every proper linear γ-measurable function belongs to $X_\gamma^*$. The reader is warned, however, that the continuity of the restriction to H of a general linear function l on X does not imply that l is γ-measurable. In [21, Chapter 3] there is an example due to von Weizsäcker of a nonmeasurable linear function l on R∞ that is identically zero on H; of course, this restriction lifts to a measurable (in this case zero) functional, but the latter is not the original functional. If a measure γ on X = R∞ is the countable power of the standard Gaussian measure on the real line, then X∗ can be identified with the space of all sequences of the form $f = (f_1, \dots, f_n, 0, 0, \dots)$. Here we have
$$(f, g)_{L^2(\gamma)} = \sum_{i=1}^\infty f_i g_i.$$
Hence $X_\gamma^*$ can be identified with $l^2$: any element $l = (c_n) \in l^2$ defines an element of $L^2(\gamma)$ by the formula $l(x) := \sum_{n=1}^\infty c_n x_n$, where the series converges in $L^2(\gamma)$. Therefore, the Cameron–Martin space H(γ) coincides with the space $l^2$ with its natural inner product. An element l represents a continuous linear functional precisely when only finitely many numbers $c_n$ are nonzero. For the Wiener measure on C[0, 1] the Cameron–Martin space coincides with the class $W_0^{2,1}[0,1]$ of all absolutely continuous functions h on [0, 1] such that h(0) = 0 and $h' \in L^2[0,1]$; the inner product is given by the formula
$$(h_1, h_2)_H := \int_0^1 h_1'(t)\,h_2'(t)\,dt.$$
The general form of a measurable linear functional with respect to the Wiener measure is given by the stochastic integral
$$l(x) = \int_0^1 h'(t)\,dx(t).$$
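For the Wiener measure the isometry of Theorem 5.1 can be observed by simulation. The following Python sketch (ours; the choice h(t) = t², the grid and the sample sizes are arbitrary) approximates $l(x) = \int_0^1 h'(t)\,dx(t)$ by the left-point sums $\sum_i h'(t_i)(w_{t_{i+1}} - w_{t_i})$, i.e., by stochastic integrals of step functions, and compares their sample variance with $|h|_H^2 = \int_0^1 h'(t)^2\,dt$, which equals 4/3 for this h.

```python
import math
import random

def cm_norm_sq(h_prime, n_grid=10_000):
    """Midpoint-rule approximation of |h|_H^2 = int_0^1 h'(t)^2 dt."""
    dt = 1.0 / n_grid
    return sum(h_prime((i + 0.5) * dt) ** 2 for i in range(n_grid)) * dt

def sample_functional(h_prime, n_steps, rng):
    """One draw of sum_i h'(t_i) (w_{t_{i+1}} - w_{t_i}) on a uniform grid of [0, 1]."""
    dt = 1.0 / n_steps
    sd = math.sqrt(dt)                     # Wiener increments are N(0, dt)
    return sum(h_prime(i * dt) * rng.gauss(0.0, sd) for i in range(n_steps))

def mc_variance(h_prime, n_steps=100, n_paths=10_000, seed=1):
    """Sample variance of the discretized functional over independent Wiener paths."""
    rng = random.Random(seed)
    values = [sample_functional(h_prime, n_steps, rng) for _ in range(n_paths)]
    mean = sum(values) / n_paths
    return sum((v - mean) ** 2 for v in values) / (n_paths - 1)

def h_prime(t):
    """Derivative of h(t) = t^2, an element of W_0^{2,1}[0, 1] with |h|_H^2 = 4/3."""
    return 2.0 * t

print(cm_norm_sq(h_prime))   # approximately 4/3
print(mc_variance(h_prime))  # approximately 4/3, up to discretization and sampling error
```

The Monte Carlo value tends to sit slightly below 4/3 partly because of the discretization: the exact variance of the left-point sum with 100 steps is $4\sum_{i<100} i^2/100^3 \approx 1.313$.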
We recall that the stochastic integral
$$I(u) = \int_0^1 u(t)\,dw_t$$
of a function $u \in L^2[0,1]$ with respect to the Wiener process is defined as follows: if u is a step function assuming values $c_i$ on the intervals $[t_i, t_{i+1})$,
then
$$I(u) := \sum_i c_i\,(w_{t_{i+1}} - w_{t_i});$$
it is readily verified that I is linear and $\|I(u)\|_{L^2(P)} = \|u\|_{L^2[0,1]}$, hence I extends to an isometry from $L^2[0,1]$ onto a closed subspace of $L^2(P)$. This means that I(u) is the $L^2$-limit of the Gaussian random variables $I(u_k)$, where step functions $u_k$ converge to u in $L^2[0,1]$. The functional l defined by the above stochastic integral with h is continuous precisely when $u = h'$ has bounded variation. Indeed, if h′ has bounded variation, then l can be rewritten, by integration by parts, via the integral with respect to du. Conversely, if there is a continuous linear functional f on C[0, 1] that coincides with l almost everywhere with respect to the Wiener measure, then it has the form $f(x) = \int_0^1 x(t)\,\sigma(dt)$ with some bounded measure σ on [0, 1]. If we show that the restriction of the proper linear version of l to the Cameron–Martin space is given by the integral of $h'(t)x'(t)$, then we obtain that −σ is the derivative of h′ in (0, 1) in the sense of distributions, which yields that h′ is of bounded variation (alternatively, by considering piecewise constant x′ one can show that the variation of h′ is estimated by the variation of the measure σ). It remains to verify the general fact that the restriction of the proper linear version of the stochastic integral
$$l(x) = \int_0^1 \psi(t)\,dx(t),$$
where $\psi \in L^2[0,1]$, to the space $H = W_0^{2,1}[0,1]$ is given by the integral of $\psi x'$. This follows from two facts: (1) the claim is true if ψ is a step function (then l is continuous on C[0, 1], which is obvious from the definition of the stochastic integral); (2) if step functions $\psi_n$ converge to ψ in $L^2[0,1]$, then the corresponding stochastic integrals converge in $L^2(P_W)$ and their restrictions to H converge to the restriction of the proper linear version of l. It is worth mentioning that if a linear function on a Banach space is measurable with respect to all Radon Gaussian measures, then it is
necessarily continuous. Hence the space of measurable linear functionals depends on γ. The next classical result, called the Cameron–Martin formula, relates measurable linear functionals and vectors in the Cameron–Martin space to the Radon–Nikodym density for shifts of the Gaussian measure.
Theorem 5.2. The space H(γ) is the set of all h ∈ X such that $\gamma_h \sim \gamma$, where $\gamma_h(B) := \gamma(B + h)$, and the Radon–Nikodym density of the measure $\gamma_h$ with respect to γ is given by the following Cameron–Martin formula:
$$\frac{d\gamma_h}{d\gamma} = \exp\Big(-\hat h - \frac12 |h|_H^2\Big).$$
For every h ∉ H(γ) we have $\gamma \perp \gamma_h$. A centered Radon Gaussian measure is uniquely determined by its Cameron–Martin space (with the indicated norm!): if μ and ν are centered Radon Gaussian measures such that H(μ) = H(ν) and $|h|_{H(\mu)} = |h|_{H(\nu)}$ for all h ∈ H(μ) = H(ν), then μ = ν. The Cameron–Martin space is also called the reproducing kernel Hilbert space.
Definition 5.3. A Radon Gaussian measure γ on a locally convex space X is called nondegenerate if for every nonzero functional f ∈ X∗ the measure $\gamma\circ f^{-1}$ is not concentrated at a point. The nondegeneracy of γ is equivalent to the condition that γ(U) > 0 for all nonempty open sets U ⊂ X. This is also equivalent to the density of the Cameron–Martin space H(γ) in X. For every degenerate Radon Gaussian measure γ there exists a smallest closed linear subspace L ⊂ X for which γ(L + m) = 1, where m is the mean of the measure γ. Moreover, L + m coincides with the topological support of γ. If m = 0, then on L the measure γ is nondegenerate. The role of the countable power of the standard Gaussian measure is clear from the following important theorem due to B. S. Tsirelson [131] (a proof can also be found in [21]), who extended another classical result established earlier by Itô and Nisio for Banach spaces.
Theorem 5.4.
Let γ be a centered Radon Gaussian measure on a locally convex space X, let $\{e_n\}$ be an orthonormal basis in H(γ), and let $\{\xi_n\}$ be independent standard Gaussian random variables (for example, the sequence of coordinate functions on R∞ with the countable power of the standard Gaussian measure on the real line). Then the series $\sum_{n=1}^\infty \xi_n(\omega)\,e_n$ converges in X for a.e. ω and the distribution of its sum is γ. In particular,
this is true if $\xi_n = \hat e_n$. In addition, there is a Souslin linear subspace S ⊂ X with γ(S) = 1. This theorem shows that the countable power of the standard Gaussian measure on the real line is the main (and essentially unique) example of a centered Radon Gaussian measure, since every centered Radon Gaussian measure γ is the image of this countable product under a measurable linear mapping T (however, T need not be continuous); the mapping T is given by the series indicated in the theorem, and its restriction to $l^2$ is an isometry between $l^2$ and H(γ). By analogy with functionals, measurable linear operators are introduced.
Definition 5.5. A mapping T from X to a locally convex space Y is called a γ-measurable linear operator if it is measurable with respect to the pair of σ-algebras $B(X)_\gamma$ and B(Y) and has a version that is linear in the usual sense (this version is called proper linear). The general form of a measurable linear operator is given in the next theorem.
Theorem 5.6. Let γ be a centered Radon Gaussian measure on a locally convex space X with the Cameron–Martin space H and let T be a γ-measurable linear operator with values in a locally convex space Y such that the measure $\gamma\circ T^{-1}$ is Radon (which is automatically true if Y is a separable Fréchet space). Let $\{e_n\}$ be an orthonormal basis in H and let $T_0$ be a proper linear version of T. Then γ-a.e.
$$Tx = \sum_{n=1}^\infty \hat e_n(x)\,T_0 e_n,$$
where the series converges in Y a.e. In addition, the restriction of $T_0$ to H is a bounded operator from H to $H(\gamma\circ T^{-1})$. We observe that the case of two different spaces reduces to that of a single space by passing to the product X × Y with the product measure $\gamma\otimes\delta_0$ and the operator $S: (x, y) \mapsto (0, Tx)$, which takes $\gamma\otimes\delta_0$ to $\delta_0\otimes(\gamma\circ T^{-1})$. It turns out that, conversely, any bounded operator on H gives rise to a measurable linear operator.
Theorem 5.7. Let γ be a centered Radon Gaussian measure on a locally convex space X with the Cameron–Martin space H. Then, for every operator T ∈ L(H), there exists a γ-measurable proper linear mapping
$\hat T: X \to X$ with the following properties:
(i) the mapping $\hat T$ coincides with T on H,
(ii) the image of the measure γ under the mapping $\hat T$ is a centered Radon Gaussian measure μ with the Cameron–Martin space H(μ) = T(H).
Any two such mappings are equal γ-a.e. Moreover, if $\{e_n\}$ is an orthonormal basis in H(γ), then almost everywhere
$$\hat T x = \sum_{n=1}^\infty \hat e_n(x)\,T e_n,$$
where the series converges γ-a.e. A proper linear version can be defined by the sum of this series on its domain of convergence (and extended by linearity to the whole space). If the measure γ is the distribution of the series $\sum_{n=1}^\infty \xi_n(\omega)\,e_n$ from Theorem 5.4, then μ is the distribution of the series $\sum_{n=1}^\infty \xi_n(\omega)\,T e_n$, which converges a.e. Let us observe that the Fourier transform of the measure μ has the form
$$\hat\mu(l) = \exp\big(-|T^* R_\gamma l|_H^2/2\big). \qquad (5.2)$$
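Indeed, for any l ∈ X∗ we have
$$|T^* R_\gamma l|_H^2 = \sum_{n=1}^\infty (T^* R_\gamma l, e_n)_H^2 = \sum_{n=1}^\infty (R_\gamma l, T e_n)_H^2 = \sum_{n=1}^\infty |l(T e_n)|^2,$$
$$\hat\mu(l) = \int_X \exp\Big(i\sum_{n=1}^\infty \hat e_n(x)\,l(T e_n)\Big)\,\gamma(dx) = \exp\Big(-\sum_{n=1}^\infty |l(T e_n)|^2/2\Big).$$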
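A concrete instance of Theorems 5.4 and 5.7 and of formula (5.2), given here as our own illustration: for the Wiener measure on C[0, 1], the functions $e_n(t) = \sqrt2\,\sin((n - 1/2)\pi t)/((n - 1/2)\pi)$, n ≥ 1, form an orthonormal basis of $H = W_0^{2,1}[0,1]$, since their derivatives $\sqrt2\cos((n - 1/2)\pi t)$ form an orthonormal basis of $L^2[0,1]$. Taking T to be the identity on H (so that μ is the Wiener measure itself) and $l = \delta_t$, the evaluation at t, formula (5.2) gives $\hat\mu(\delta_t) = \exp(-\sum_n e_n(t)^2/2)$, and this sum must equal the variance t of $w_t$. The Python sketch below checks the partial sums and draws one approximate path from the truncated series $\sum_{n \le N}\xi_n e_n$; the truncation levels and the seed are arbitrary.

```python
import math
import random

def e_n(n, t):
    """n-th element (n >= 1) of an orthonormal basis of W_0^{2,1}[0, 1]."""
    a = (n - 0.5) * math.pi
    return math.sqrt(2.0) * math.sin(a * t) / a

def variance_partial_sum(t, n_terms):
    """sum_{n <= n_terms} |l(e_n)|^2 for l = delta_t, i.e. sum of e_n(t)^2."""
    return sum(e_n(n, t) ** 2 for n in range(1, n_terms + 1))

def truncated_path(ts, n_terms, seed=2):
    """Values at the points ts of sum_{n <= n_terms} xi_n e_n with xi_n iid N(0, 1)."""
    rng = random.Random(seed)
    xi = [rng.gauss(0.0, 1.0) for _ in range(n_terms)]
    return [sum(x * e_n(n, t) for n, x in enumerate(xi, start=1)) for t in ts]

t = 0.5
s = variance_partial_sum(t, 20_000)
print(s)                                   # close to t = 0.5
print(math.exp(-s / 2), math.exp(-t / 2))  # hat mu(delta_t) from (5.2) versus exp(-t/2)
print(truncated_path([0.0, 0.25, 0.5, 0.75, 1.0], n_terms=500))
```

The truncated path vanishes at t = 0, as every $e_n$ does; convergence of the series in C[0, 1] itself is what Theorem 5.4 guarantees, and the sketch does not attempt to verify it.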
By using this theorem one can obtain a somewhat more general result.
Corollary 5.8. Let γ and μ be centered Radon Gaussian measures on locally convex spaces X and Y, respectively, and let A: H(γ) → H(μ) be a continuous linear operator. Then A extends to a γ-measurable linear mapping $\hat A: X \to Y$ such that the image of γ under this mapping is a centered Radon Gaussian measure with the Cameron–Martin space A(H(γ)).
Corollary 5.9. Let X and Y be locally convex spaces and let γ be a centered Radon Gaussian measure on X. A continuous linear operator A: H(γ) → Y
is the restriction to H(γ) of a measurable linear operator $\hat A: X \to Y$ for which the measure $\gamma\circ\hat A^{-1}$ is Radon precisely when there exists a Radon Gaussian measure ν on Y such that A(H(γ)) ⊂ H(ν). If Y is a Hilbert space, then this is equivalent to the inclusion $A \in \mathcal{H}(H(\gamma), Y)$ (i.e., A is a Hilbert–Schmidt operator), hence to the existence of a Hilbert–Schmidt operator T on Y with A(H(γ)) ⊂ T(Y). If we have two bounded operators A and B in L(H), then we obtain two measurable operators $\hat A$ and $\hat B$. In addition, the operator AB has an extension $\widehat{AB}$. If $\hat B$ takes γ into an equivalent measure, then $\widehat{AB} = \hat A\circ\hat B$. The main point here is that the composition $\hat A(\hat B x)$ is well-defined, namely, it does not depend on the versions of $\hat A$ and $\hat B$: if we change both on a measure zero set Z, then this does not affect the equivalence class, due to the fact that $\hat B^{-1}(Z)$ has measure zero. As we have already seen in Section 1.2, measurable linear functionals on R∞ with the standard Gaussian measure are actually continuous linear functionals on compactly embedded Hilbert spaces of full measure. This is false in general, simply because continuously embedded Hilbert spaces of full measure need not exist.
Example 5.10. Let γ be a Gaussian measure on C[0, 1] such that the identity embedding of its Cameron–Martin space H into $L^2[0,1]$ is not a trace class operator. Then γ(E) = 0 for every Hilbert space E continuously embedded into C[0, 1]. In particular, this is true for the classical Wiener measure. Indeed, if E is a Hilbert space continuously embedded into C[0, 1], then one can show that it must be separable. If γ(E) > 0, then γ(E) = 1, so H ⊂ E and the embedding H → E is a Hilbert–Schmidt operator. It is known that the embedding $E \to L^2[0,1]$ is also a Hilbert–Schmidt operator (see [21]), so the composition of the two embeddings must be nuclear, contrary to our assumption. However, the situation changes if non-Hilbert spaces are allowed.
Proposition 5.11. Let γ be a centered Radon Gaussian measure on a Fréchet space X and let f be a γ-measurable linear functional.
Then there is a separable reflexive Banach space E of full γ-measure compactly embedded into X such that f coincides almost everywhere with a functional in E∗. More generally, given a γ-measurable linear operator T from X to a separable Fréchet space Y, one can find a separable reflexive Banach space E of full γ-measure compactly embedded into X such that T coincides almost everywhere with a continuous linear operator from E to Y.
Actually, this result has nothing to do with Gaussianity: as shown in [141], it is valid for general measures and measurable linear operators defined as almost everywhere limits of sequences of continuous linear operators. For Gaussian measures the construction can be made a bit more explicit. The existence of a reflexive Banach support is also a general phenomenon. However, it is not known whether the same conclusion holds for incomplete normed spaces with Radon Gaussian measures. Let us explain how the construction described in Section 1.2 should be changed; for simplicity, we consider measurable linear functionals. First we write f as $f = \sum_{n=1}^\infty c_n f_n$ almost everywhere with certain $(c_n) \in l^2$ and $f_n \in X^*$ such that $\|f_n\|_{L^2(\gamma)} = 1$. Next we find a compactly embedded separable reflexive Banach space $(B, \|\cdot\|_B)$ of full measure (which exists for any Radon measure on any Fréchet space, see [23, Theorem 7.12.4]). Finally, we set
$$q(x) = \|x\|_B + \Big(\sum_{k=1}^\infty 4^k\,\Big|\sum_{n=N_k+1}^{N_{k+1}} c_n f_n(x)\Big|^2\Big)^{1/2},$$
where the increasing numbers $N_k$ are such that $\sum_{n=N_k}^\infty c_n^2 < 8^{-k}$, and take for E the subspace of all vectors with finite norm q(x). The same reasoning as in Section 1.2 shows that E has full measure and $\sum_{n=1}^\infty c_n f_n$ is bounded on the unit ball of E. The completeness of E follows from the readily verified fact that its closed unit ball is closed in X (here the continuity of the functions $f_n$ is important). Since $q(x) \ge \|x\|_B$, the embedding E → X is compact. The reflexivity of E is verified by using the reflexivity of B, which yields the reflexivity of $Z = B \oplus l^2$, and the fact that E is isometric to a subspace of Z via the mapping $x \mapsto (x, Ax)$, $Ax = (2^k g_k(x))_{k=1}^\infty$, $g_k = \sum_{n=N_k+1}^{N_{k+1}} c_n f_n$; the space E is then automatically separable due to the separability of B. It is worth noting that E is Hilbert if B is Hilbert.
Remark 5.12. For every Gaussian measure γ on a separable Fréchet space X one can find a compactly embedded Banach space of full measure with a Schauder basis.
This follows from a more general result of Okazaki [113]. However, this fact does not extend to arbitrary measures on X, as shown in [65]. In relation to measurable linear operators we mention a useful result on Gaussian conditional expectations and measures. The following result is an infinite-dimensional version of the classical theorem about normal correlation.
Theorem 5.13. Let ξ and η be two Gaussian random vectors with values in locally convex spaces X and Y such that the vector (ξ, η) has a Radon Gaussian distribution in X ⊕ Y. Then the conditional expectation IE(ξ|η) of the vector ξ with respect to the σ-field generated by η exists and is a Gaussian vector. In addition, ξ = IE(ξ|η) + ζ, where ζ is a Gaussian random vector in X independent of η and having a Radon distribution. Moreover, denoting by $\gamma_\eta$ the distribution of η, one can find a $\gamma_\eta$-measurable linear operator A: Y → X such that IE(ξ|η) = Aη.
For a proof, see [21, p. 140], where Y = X, but this is not essential; the last assertion of the theorem is not explicitly mentioned there, but it is clear from the proof. We recall that the characteristic property of the conditional expectation IE(ξ|η) is that it is measurable with respect to the σ-field generated by η and for all l ∈ X∗ and all bounded Borel functions φ on Y one has IE[l(ξ)φ(η)] = IE[l(IE(ξ|η))φ(η)]. Let us now consider conditional measures. Let X and Y be locally convex spaces and let γ be a centered Radon Gaussian measure on the product Z = X × Y. Its projection $\gamma_Y$ on Y is a centered Radon Gaussian measure. For any set B ⊂ Z, let $B^y = \{x \in X : (x, y) \in B\}$. The general theory of conditional measures (see [23, Chapter 10]) gives a family of Radon probability measures $\gamma^y$, y ∈ Y, on X, called conditional measures, such that for every Borel set B ⊂ Z the function $y \mapsto \gamma^y(B^y)$ is $\gamma_Y$-measurable and
$$\gamma(B) = \int_Y \gamma^y(B^y)\,\gamma_Y(dy).$$
This means that the integral of a bounded Borel function f against γ can be evaluated as
$$\int_{X\times Y} f(x, y)\,\gamma(dx\,dy) = \int_Y \int_X f(x, y)\,\gamma^y(dx)\,\gamma_Y(dy).$$
However, in the Gaussian case we can get more: the conditional measures $\gamma^y$ are Gaussian; moreover, they can be found explicitly. The next result in the case of Radon measures answers the question raised in [89].
Theorem 5.14. There exist a centered Radon Gaussian measure μ on X and a $\gamma_Y$-measurable linear operator A: Y → X such that the shifts $\mu_{Ay}$ of μ by the vectors Ay serve as conditional measures $\gamma^y$.
Proof. We consider two Gaussian elements ξ = x and η = y on (Z, γ) and the distribution μ of the centered Gaussian vector ζ = ξ − Aη from the previous theorem. We have x = Ay + ζ(x, y), where the random elements y and ζ(x, y) on (Z, γ) are independent Gaussian. Let $\gamma_0$ be the measure on Z obtained by integrating $\mu_{Ay}$ in y with respect to $\gamma_Y$. It is a Radon Gaussian measure, which is centered, since $\gamma_Y$ is symmetric and A is linear. In order to identify γ and $\gamma_0$ it suffices to show that the integral of the square of every element l ∈ Z∗ is the same for both measures. Any such functional has the form $l(x, y) = l_1(x) + l_2(y)$, $l_1 \in X^*$, $l_2 \in Y^*$. The function $l_1(Ay)$ gives the conditional expectation of $l_1(x)$ with respect to the σ-field generated by the second coordinate; in particular, the integrals of $l_1(x)l_2(y)$ and $l_1(Ay)l_2(y)$ against γ coincide. Hence, using the independence of the random variables $l_1(Ay)$ and $l_1(\zeta(x, y))$ and the definition of μ (the law of ζ), we find that
$$\begin{aligned}
\int_{X\times Y} l^2\,d\gamma &= \int_{X\times Y}\big[l_1(x)^2 + 2l_1(x)l_2(y)\big]\,\gamma(dx\,dy) + \int_Y l_2^2\,d\gamma_Y\\
&= \int_{X\times Y} l_1\big(Ay + \zeta(x, y)\big)^2\,\gamma(dx\,dy) + \int_Y \big[2(l_1\circ A)\,l_2 + l_2^2\big]\,d\gamma_Y\\
&= \int_X l_1^2\,d\mu + \int_Y \big[(l_1\circ A)^2 + 2(l_1\circ A)\,l_2 + l_2^2\big]\,d\gamma_Y.
\end{aligned}$$
On the other hand,
$$\int_X \big(l_1(x) + l_2(y)\big)^2\,\mu_{Ay}(dx) = \int_X \big(l_1(x + Ay) + l_2(y)\big)^2\,\mu(dx) = l_1(Ay)^2 + 2l_1(Ay)l_2(y) + l_2(y)^2 + \int_X l_1^2\,d\mu,$$
which yields the same value as above after integration in $\gamma_Y$.
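In the simplest case X = Y = ℝ, Theorems 5.13 and 5.14 reduce to the classical normal correlation theorem: if (x, y) is a centered Gaussian vector with Var x = a, Cov(x, y) = b and Var y = c > 0, then Ay = (b/c)y, ζ = x − Ay is independent of y, and μ is the centered Gaussian law with variance a − b²/c. The Python sketch below (our own illustration; the mixing matrix m is an arbitrary choice) generates (x, y) as m applied to two independent standard Gaussians, computes A and Var ζ from the covariance, and checks by Monte Carlo that ζ is uncorrelated with y and has the predicted variance.

```python
import random

def covariance_from_matrix(m):
    """(Var x, Cov(x, y), Var y) for (x, y) = m (g1, g2), g1, g2 iid N(0, 1)."""
    (p, q), (r, s) = m
    return p * p + q * q, p * r + q * s, r * r + s * s

def normal_correlation(a, b, c):
    """A with E(x | y) = A y, and the variance of zeta = x - A y (requires c > 0)."""
    A = b / c
    return A, a - b * b / c

def empirical_check(m, n_samples=100_000, seed=3):
    """Monte Carlo estimates of Cov(zeta, y) and Var(zeta) for zeta = x - A y."""
    a, b, c = covariance_from_matrix(m)
    A, _ = normal_correlation(a, b, c)
    rng = random.Random(seed)
    s_zy = s_zz = 0.0
    for _ in range(n_samples):
        g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = m[0][0] * g1 + m[0][1] * g2
        y = m[1][0] * g1 + m[1][1] * g2
        zeta = x - A * y
        s_zy += zeta * y          # all means are zero, so no centering is needed
        s_zz += zeta * zeta
    return s_zy / n_samples, s_zz / n_samples

m = ((1.0, 2.0), (0.5, -1.0))
a, b, c = covariance_from_matrix(m)
A, var_zeta = normal_correlation(a, b, c)
print(a, b, c, A, var_zeta)   # A = -1.2 and Var(zeta) = 3.2, up to rounding
print(empirical_check(m))     # approximately (0.0, 3.2)
```

Since ζ and y are jointly Gaussian, vanishing covariance already gives independence, so the conditional law of x given y is the shift by Ay of the centered Gaussian law with variance Var ζ.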
Using characterizations of Gaussian random variables we obtain characterizations of measurable linear functionals. Theorem 5.15. Let γ be a centered Radon Gaussian measure on a locally convex space X (or a centered Gaussian measure on σ(X)) and let
$f: X \to \mathbb{R}^1$ be a γ-measurable function such that
$$f\Big(\frac{x + y}{\sqrt2}\Big) = \frac{f(x) + f(y)}{\sqrt2}$$
for γ⊗γ-a.e. pair (x, y) ∈ X × X. Then $f \in X_\gamma^*$.
Theorem 5.16. Let γ be a centered Radon Gaussian measure on a locally convex space X and let f be a γ-measurable function such that, for every fixed h ∈ H(γ), one has f(x + h) − f(x) = g(h) γ-a.e., where g(h) is some number. Then $f = c + f_0$ γ-a.e., where $c \in \mathbb{R}^1$ and $f_0 \in X_\gamma^*$.
Corollary 5.17. Let γ be a centered Radon Gaussian measure on a locally convex space X and let T be a γ-measurable mapping with values in a separable Fréchet space Y such that, for every fixed h ∈ H(γ), one has
$$T(x + h) - T(x) = g(h) \quad \gamma\text{-a.e.},$$
where g(h) is some element of Y. Then T has an affine modification.
Theorem 5.18. Let X be a locally convex space and let γ be a Gaussian measure on σ(X) (or a Radon Gaussian measure). Then, for any affine function f on X (which is not assumed to be measurable in advance), the following conditions are equivalent: (i) there exists a set A ∈ σ(X) (respectively, A ∈ B(X)) of positive measure such that f(A) does not intersect some compact set of positive Lebesgue measure; (ii) in (i) one can take an interval for such a compact set; (iii) f is a γ-measurable function; (iv) for some real number r, the function f − r belongs to the closure of X∗ in the space $L^2(\gamma)$. If these conditions hold, then f is a Gaussian random variable.
Proofs of these facts can be found in [21], where there are also references to many original works. In many cases, such results were first established for Hilbert or Banach spaces and later extended to locally convex spaces; Borell's paper [37] was one of the first systematic studies of general Radon Gaussian measures.
6. Zero-one Laws and Dichotomies
An important property of Gaussian measures is the so-called 0–1 law, which asserts that certain sets of special form may have measure only 0 or 1 (see [21] for proofs).
Theorem 6.1. Let γ be a Radon Gaussian measure on a locally convex space X. (i) For every γ-measurable affine subspace L ⊂ X we have either γ(L) = 0 or γ(L) = 1. (ii) Let $\{e_n\}$ be an orthonormal basis in H(γ) and let a γ-measurable set E be such that, for every n and every rational number r, the sets E and $E + re_n$ coincide up to a set of measure zero. Then either γ(E) = 0 or γ(E) = 1. In particular, this is true if a γ-measurable set E is invariant with respect to the shifts by the vectors $re_n$.
Another classical alternative in the theory of Gaussian measures is the Hájek–Feldman theorem on equivalence and singularity.
Theorem 6.2. Let μ and ν be Radon Gaussian measures on the same space. Then either μ ∼ ν or μ ⊥ ν.
One more important fact is the following Fernique theorem.
Theorem 6.3. Let γ be a centered Radon Gaussian measure and let a γ-measurable function q be a seminorm on a γ-measurable linear subspace of full measure. Then $\exp(\varepsilon q^2) \in L^1(\gamma)$ for some ε > 0.
7. The Ornstein–Uhlenbeck Semigroup
Let γ be a centered Radon Gaussian measure on a locally convex space X; as usual, one can assume that this is the standard Gaussian measure on R∞. The Ornstein–Uhlenbeck semigroup is defined by the formula
$$T_t f(x) = \int_X f\big(e^{-t}x + \sqrt{1 - e^{-2t}}\,y\big)\,\gamma(dy), \quad f \in L^p(\gamma). \qquad (7.1)$$
A simple verification of the fact that $\{T_t\}_{t\ge0}$ is a strongly continuous semigroup on all $L^p(\gamma)$, 1 ≤ p < ∞, can be found in [21]; the semigroup property means that
$$T_{t+s} f = T_t T_s f, \quad t, s \ge 0.$$
An important feature of this semigroup is that the measure γ is invariant for it, that is,
$$\int_X T_t f(x)\,\gamma(dx) = \int_X f(x)\,\gamma(dx).$$
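In dimension one, with γ the standard Gaussian measure on the real line, formula (7.1) is easy to evaluate numerically. The Python sketch below (our own, not from the text) approximates the integral in (7.1) by the trapezoid rule on [−10, 10] with 400 subintervals, an ad hoc choice that is ample for smooth test functions of moderate growth, and checks the semigroup property and the invariance of γ for one arbitrary test function.

```python
import math

def gauss_integral(g, half_width=10.0, n_nodes=400):
    """Trapezoid approximation of int g(y) gamma(dy), gamma the standard Gaussian on R."""
    step = 2.0 * half_width / n_nodes
    total = 0.0
    for k in range(n_nodes + 1):
        y = -half_width + k * step
        weight = 0.5 if k in (0, n_nodes) else 1.0
        total += weight * g(y) * math.exp(-0.5 * y * y)
    return total * step / math.sqrt(2.0 * math.pi)

def ou(t, f):
    """T_t f as a Python function, by formula (7.1) in dimension one."""
    a, b = math.exp(-t), math.sqrt(1.0 - math.exp(-2.0 * t))
    return lambda x: gauss_integral(lambda y: f(a * x + b * y))

def f(x):
    """An arbitrary smooth test function."""
    return math.cos(x) + x ** 3

t, s, x0 = 0.3, 0.7, 0.4
print(ou(t + s, f)(x0), ou(t, ou(s, f))(x0))        # semigroup property: equal values
print(gauss_integral(ou(t, f)), gauss_integral(f))  # invariance: both equal e^{-1/2}
```

The nested call `ou(t, ou(s, f))` costs roughly 400 × 400 evaluations of f, which is why the grid is kept coarse.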
Theorem 7.1. For every p ∈ [1, +∞) and $f \in L^p(\gamma)$ one has
$$\lim_{t\to0}\|T_t f - f\|_{L^p(\gamma)} = 0, \qquad \lim_{t\to+\infty}\Big\|T_t f - \int_X f\,d\gamma\Big\|_{L^p(\gamma)} = 0,$$
and if 1 < p < ∞, then also $\lim_{t\to0} T_t f(x) = f(x)$ a.e.
It is also known that in the finite-dimensional case $\lim_{t\to0} T_t f(x) = f(x)$ a.e. for all $f \in L^1(\gamma)$. It remains an open problem whether this is true in infinite dimensions. The generator L of the Ornstein–Uhlenbeck semigroup is called the Ornstein–Uhlenbeck operator (more precisely, for every p ∈ [1, +∞), there is such a generator on the corresponding domain in $L^p(\gamma)$; if p is not explicitly indicated, then usually p = 2 is meant). By definition, $Lf = \lim_{t\to0}(T_t f - f)/t$ if this limit exists in the norm of $L^p(\gamma)$. This operator will be important in the next section. On smooth functions $f(x) = f(x_1, \dots, x_n)$ of finitely many variables one can explicitly calculate that
$$Lf(x) = \Delta f(x) - (x, \nabla f(x)) = \sum_{i=1}^n \big[\partial_{x_i}^2 f(x) - x_i\,\partial_{x_i} f(x)\big].$$
This representation can also be extended to some functions of infinitely many variables. In the general case there is no explicit expression for Lf.
Theorem 7.2. The Ornstein–Uhlenbeck semigroup $(T_t)_{t\ge0}$ is hypercontractive, i.e., whenever p > 1, q > 1, one has $\|T_t f\|_q \le \|f\|_p$ for all t > 0 such that $e^{2t} \ge (q-1)/(p-1)$.
It is not obvious from the definition that the operators $T_t$ possess a common orthogonal basis of eigenvectors in $L^2(\gamma)$; this will be seen in the next section. More general Ornstein–Uhlenbeck semigroups have been studied in [32] and [104], where additional references can be found.
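For $f(x) = e^{\lambda x}$ on the real line both sides in Theorem 7.2 are explicit. Indeed, $\|f\|_p = e^{p\lambda^2/2}$, and by (7.1) $T_t f(x) = \exp\big(\lambda e^{-t}x + \lambda^2(1 - e^{-2t})/2\big)$, whence $\|T_t f\|_q = \exp\big(\lambda^2(1 - e^{-2t})/2 + q\lambda^2 e^{-2t}/2\big)$. Comparing exponents, $\|T_t f\|_q \le \|f\|_p$ holds exactly when $1 + (q-1)e^{-2t} \le p$, i.e., when $e^{2t} \ge (q-1)/(p-1)$, so exponential functions show that the time threshold in Theorem 7.2 cannot be lowered. The Python sketch below (ours; the values of p, q and λ are arbitrary) evaluates the logarithms of both sides below, at and above the critical time.

```python
import math

def log_norm_f(p, lam):
    """log of ||e^{lam x}||_{L^p(gamma)}, which equals p lam^2 / 2."""
    return p * lam * lam / 2.0

def log_norm_semigroup(q, t, lam):
    """log of ||T_t e^{lam x}||_{L^q(gamma)} = lam^2 (1 - e^{-2t}) / 2 + q lam^2 e^{-2t} / 2."""
    a2 = math.exp(-2.0 * t)
    return lam * lam * (1.0 - a2) / 2.0 + q * lam * lam * a2 / 2.0

def critical_time(p, q):
    """The t with e^{2t} = (q - 1) / (p - 1), assuming 1 < p < q."""
    return 0.5 * math.log((q - 1.0) / (p - 1.0))

p, q, lam = 2.0, 4.0, 1.3
t0 = critical_time(p, q)
for t in (0.5 * t0, t0, 2.0 * t0):
    gap = log_norm_f(p, lam) - log_norm_semigroup(q, t, lam)  # >= 0 iff ||T_t f||_q <= ||f||_p
    print(round(t, 4), gap)   # negative, then zero up to rounding, then positive
```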
8. The Hermite–Chebyshev Polynomials
In the theory of Gaussian measures an important role is played by the Hermite (or Chebyshev–Hermite) polynomials $H_n$ defined by the equalities
$$H_0 = 1, \qquad H_n(t) = \frac{(-1)^n}{\sqrt{n!}}\,e^{t^2/2}\,\frac{d^n}{dt^n}\big(e^{-t^2/2}\big), \quad n \ge 1.$$
They have the following properties:
$$H_n'(t) = \sqrt{n}\,H_{n-1}(t) = tH_n(t) - \sqrt{n+1}\,H_{n+1}(t).$$
In addition, the system of functions $\{H_n\}$ is an orthonormal basis in $L^2(\gamma)$, where γ is the standard Gaussian measure on the real line. For the standard Gaussian measure $\gamma_n$ on $\mathbb{R}^n$ (the product of n copies of the standard Gaussian measure on $\mathbb{R}^1$) an orthonormal basis in $L^2(\gamma_n)$ is formed by the polynomials of the form
$$H_{k_1,\dots,k_n}(x_1, \dots, x_n) = H_{k_1}(x_1)\cdots H_{k_n}(x_n), \quad k_i \ge 0.$$
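Rearranging $H_n' = \sqrt n\,H_{n-1} = tH_n - \sqrt{n+1}\,H_{n+1}$ gives the three-term recurrence $H_{n+1}(t) = \big(tH_n(t) - \sqrt n\,H_{n-1}(t)\big)/\sqrt{n+1}$, which generates all $H_n$ from $H_0 = 1$ and $H_1(t) = t$. The Python sketch below (our own illustration) evaluates the $H_n$ in this way and checks their orthonormality in $L^2(\gamma)$ with a trapezoid rule on [−12, 12]; the quadrature parameters are ad hoc choices.

```python
import math

def hermite_values(n_max, t):
    """[H_0(t), ..., H_{n_max}(t)] for the normalized Hermite polynomials."""
    vals = [1.0, t]
    for n in range(1, n_max):
        # H_{n+1} = (t H_n - sqrt(n) H_{n-1}) / sqrt(n + 1)
        vals.append((t * vals[n] - math.sqrt(n) * vals[n - 1]) / math.sqrt(n + 1))
    return vals[: n_max + 1]

def gram_matrix(n_max, half_width=12.0, n_nodes=2400):
    """Trapezoid approximation of the Gram matrix ((H_j, H_k)_{L^2(gamma)})."""
    step = 2.0 * half_width / n_nodes
    size = n_max + 1
    gram = [[0.0] * size for _ in range(size)]
    for i in range(n_nodes + 1):
        t = -half_width + i * step
        weight = 0.5 if i in (0, n_nodes) else 1.0
        w = weight * step * math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
        h = hermite_values(n_max, t)
        for j in range(size):
            for k in range(size):
                gram[j][k] += w * h[j] * h[k]
    return gram

g = gram_matrix(6)
deviation = max(abs(g[j][k] - (1.0 if j == k else 0.0)) for j in range(7) for k in range(7))
print(deviation)   # tiny: H_0, ..., H_6 are orthonormal in L^2(gamma)
```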
If γ is a centered Radon Gaussian measure on a locally convex space X and $\{l_n\}$ is an orthonormal basis in $X_\gamma^*$, then an orthonormal basis in $L^2(\gamma)$ is formed by the polynomials
$$H_{k_1,\dots,k_n}(x) = H_{k_1}(l_1(x))\cdots H_{k_n}(l_n(x)), \quad k_i \ge 0,\ n \in \mathbb{N}.$$
For example, for the countable power of the standard Gaussian measure on the real line such polynomials are $H_{k_1,\dots,k_n}(x_1, \dots, x_n)$. It is convenient to arrange the polynomials $H_{k_1,\dots,k_n}$ according to their degrees $k_1 + \cdots + k_n$. For k = 0, 1, ... we denote by $X_k$ the closed linear subspace of $L^2(\gamma)$ generated by the functions $H_{k_1,\dots,k_n}$ with $k_1 + \cdots + k_n = k$. The functions $H_{k_1,\dots,k_n}$ are mutually orthogonal and, for a fixed value $k = k_1 + \cdots + k_n$, form an orthonormal basis in $X_k$. The one-dimensional space $X_0$ consists of constants, and $X_1 = X_\gamma^*$. One can show that every element $f \in X_2$ can be written in the form
$$f = \sum_{n=1}^\infty \alpha_n (l_n^2 - 1),$$
where $\{l_n\}$ is some orthonormal basis in $X_\gamma^*$ (depending on f) and $\sum_{n=1}^\infty \alpha_n^2 < \infty$ (i.e., the series for f converges in $L^2(\gamma)$).
The spaces $X_k$ are mutually orthogonal and their orthogonal sum is the whole $L^2(\gamma)$:
$$L^2(\gamma) = \bigoplus_{k=0}^\infty X_k,$$
which means that, denoting by $I_k$ the operator of orthogonal projection onto $X_k$, we have an orthogonal decomposition
$$F = \sum_{k=0}^\infty I_k(F), \quad F \in L^2(\gamma).$$
In the case of the classical Wiener space such decompositions were first considered by Wiener [140]. One can check that $T_t H_{k_1,\dots,k_n} = e^{-(k_1 + \cdots + k_n)t} H_{k_1,\dots,k_n}$, which yields that
$$T_t F = \sum_{k=0}^\infty e^{-kt} I_k(F), \quad F \in L^2(\gamma).$$
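The relation $T_t H_k = e^{-kt}H_k$ can be seen numerically in dimension one by evaluating (7.1) for $f = H_k$. The Python sketch below (ours; the point x, the time t and the quadrature grid are arbitrary choices) compares $T_t H_k(x)$, computed by the trapezoid rule on [−10, 10], with $e^{-kt}H_k(x)$ for k = 0, ..., 5.

```python
import math

def hermite(n, t):
    """Normalized Hermite polynomial H_n(t), computed by the three-term recurrence."""
    if n == 0:
        return 1.0
    prev, cur = 1.0, t
    for k in range(1, n):
        prev, cur = cur, (t * cur - math.sqrt(k) * prev) / math.sqrt(k + 1)
    return cur

def ou_apply(f, t, x, half_width=10.0, n_nodes=400):
    """T_t f(x) by formula (7.1) in dimension one, using the trapezoid rule."""
    a, b = math.exp(-t), math.sqrt(1.0 - math.exp(-2.0 * t))
    step = 2.0 * half_width / n_nodes
    total = 0.0
    for i in range(n_nodes + 1):
        y = -half_width + i * step
        weight = 0.5 if i in (0, n_nodes) else 1.0
        total += weight * f(a * x + b * y) * math.exp(-0.5 * y * y)
    return total * step / math.sqrt(2.0 * math.pi)

t, x = 0.4, 0.9
for k in range(6):
    lhs = ou_apply(lambda u, k=k: hermite(k, u), t, x)
    rhs = math.exp(-k * t) * hermite(k, x)
    print(k, lhs, rhs)   # the two columns agree: T_t H_k = e^{-kt} H_k
```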
Given a separable Hilbert space E, one defines similarly the space $X_k(E)$ of polynomials with values in E as the closure in $L^2(\gamma, E)$ of the linear span of the mappings f·v, where $f \in X_k$, v ∈ E. Applying Theorem 7.2 we obtain a number of important results for polynomials.
Corollary 8.1. Let p ≥ 2. Then the operator $I_n: f \mapsto I_n(f)$ from $L^2(\gamma)$ to $L^p(\gamma)$ is continuous and
$$\|I_n(f)\|_p \le (p-1)^{n/2}\|f\|_2. \qquad (8.1)$$
In addition, for every p ∈ (1, ∞), the operators $I_n$ are continuous on $L^p(\gamma)$ and
$$\|I_n\|_{\mathcal{L}(L^p(\gamma))} \le (M-1)^{n/2}, \qquad (8.2)$$
where $M = \max(p, p(p-1)^{-1})$.
Corollary 8.2. Let $f \in X_d$. For any α ∈ (0, d/(2e)), there holds the inequality
$$\gamma\big(x : |f(x)| \ge t\|f\|_2\big) \le c(\alpha, d)\exp\big(-\alpha t^{2/d}\big), \quad \text{where } c(\alpha, d) = \exp\Big(\alpha + \frac{d}{d - 2e\alpha}\Big).$$
Corollary 8.3. The spaces $X_d$ are closed with respect to convergence in measure. Moreover, any sequence from $\bigoplus_{k=0}^d X_k$ that converges in measure is convergent in $L^p(\gamma)$ for every p ∈ [1, ∞). The same is true for the spaces $X_d(E)$ of mappings with values in any separable Hilbert space E.
Corollary 8.4. The norms from $L^p(\gamma)$, p ∈ [1, ∞), are equivalent on every $X_n$. In addition, for every p > 0, the topology on $X_n$ induced by the metric from $L^p(\gamma)$ coincides with the topology of convergence in measure. Finally, if q > p > 1, one has
$$\|f\|_p \le \|f\|_q \le \Big(\frac{q-1}{p-1}\Big)^{n/2}\|f\|_p \quad \forall f \in X_n. \qquad (8.3)$$
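Estimate (8.3) with p = 2 and q = 4 reads $\|f\|_4 \le 3^{n/2}\|f\|_2$ on $X_n$. For $f = H_n$ on the real line this can be checked exactly, as in the Python sketch below (our own illustration): to stay in integer arithmetic it works with $He_n = \sqrt{n!}\,H_n$, which satisfies $He_{n+1} = tHe_n - nHe_{n-1}$, and it uses the Gaussian moments $E\,t^{2m} = (2m-1)!!$. The inequality then becomes the integer comparison $E\,He_n^4 \le 9^n (n!)^2$.

```python
from math import factorial

def he_coeffs(n):
    """Integer coefficients (lowest degree first) of He_n = sqrt(n!) H_n."""
    if n == 0:
        return [1]
    prev, cur = [1], [0, 1]
    for k in range(1, n):
        nxt = [0] + cur                   # t * He_k
        for i, c in enumerate(prev):
            nxt[i] -= k * c               # minus k * He_{k-1}
        prev, cur = cur, nxt
    return cur

def poly_mul(p, q):
    """Product of two coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def gaussian_moment(m):
    """E t^m for t ~ N(0, 1): 0 for odd m and (m - 1)!! for even m."""
    if m % 2:
        return 0
    result = 1
    for k in range(m - 1, 0, -2):
        result *= k
    return result

def expectation(poly):
    """Exact E p(t) for t ~ N(0, 1) and an integer polynomial p."""
    return sum(c * gaussian_moment(m) for m, c in enumerate(poly))

for n in range(1, 7):
    sq = poly_mul(he_coeffs(n), he_coeffs(n))
    assert expectation(sq) == factorial(n)          # so ||H_n||_2 = 1
    fourth = expectation(poly_mul(sq, sq))          # E He_n^4, an integer
    holds = fourth <= 9 ** n * factorial(n) ** 2    # ||H_n||_4 <= 3^{n/2}, checked exactly
    ratio = (fourth / factorial(n) ** 2) ** 0.25 / 3 ** (n / 2)
    print(n, fourth, holds, round(ratio, 4))
```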
For $L^p$-estimates for Hermite polynomials and other related estimates for Gaussian chaos, see also [92], [94], and the recent book [115]. There is a completely different description of elements in $P_d := P_d(\gamma) := X_0 + \cdots + X_d$ in almost algebraic terms.
Theorem 8.5. A γ-measurable function f coincides a.e. with a function in $P_d(\gamma)$ precisely when it has a version $f_0$ such that, for every x ∈ X, the function $h \mapsto f_0(x + h)$ is a continuous polynomial of degree d on H(γ).
Below we prove a more general result. Let γ be the standard Gaussian measure on R∞. Let $\sigma_n$ denote the σ-algebra generated by the coordinate functions $x_j$ with j > n. Let $E(\psi|\sigma_n)$ be the conditional expectation of a γ-integrable function ψ with respect to $\sigma_n$, i.e.,
$$E(\psi|\sigma_n)(x_{n+1}, x_{n+2}, \dots) = \int_{\mathbb{R}^n} \psi(y_1, \dots, y_n, x_{n+1}, x_{n+2}, \dots)\,\gamma_1(dy_1)\cdots\gamma_1(dy_n).$$
We recall that $\sigma_n$-measurable functions are exactly the Borel functions on X independent of the variables $x_1, \dots, x_n$. Estimate (8.3) yields the following known properties of measurable polynomials, which we state with a proof for the reader's convenience.
Lemma 8.6. (i) For every d there exists ε(d) > 0 such that, for every function $f \in P_d$ with zero mean, the function $\varphi_f(t) = \int_X \exp(itf)\,d\gamma$ satisfies the estimate
$$1 - \mathrm{Re}\,\varphi_f(t) \ge \frac13\,t^2\|f\|_2^2 \quad \text{whenever } |t|\cdot\|f\|_2 \le \varepsilon(d). \qquad (8.4)$$
(ii) For every d there exists a number α(d) > 1 such that for all $f \in P_d$
$$\gamma\big(x : \|f\|_1/4 \le |f(x)| \le \alpha(d)\|f\|_1\big) \ge \frac{1}{2\alpha(d)}. \qquad (8.5)$$
Proof. (i) From the Taylor formula for the function $\mathrm{Re}\,\varphi_f$ at the point 0 we find
$$\mathrm{Re}\,\varphi_f(t) = 1 - \frac{t^2}{2}\|f\|_2^2 + r(t, f), \quad \text{where } |r(t, f)| \le 6^{-1}|t|^3\|f\|_3^3.$$
According to (8.3), we have $\|f\|_3^3 \le C(d,3)^3\|f\|_2^3$. Set $\varepsilon(d) := C(d,3)^{-3}$. Then
$$\frac{t^2}{2}\|f\|_2^2 - \frac{|t|^3}{6}\|f\|_3^3 \ge \frac{t^2}{3}\|f\|_2^2 \quad \text{if } C(d,3)^3\,|t|\cdot\|f\|_2 \le 1,$$
that is, whenever $|t|\cdot\|f\|_2 \le \varepsilon(d)$.
(ii) Let us represent $\|f\|_1$ as the sum of the integrals of |f| over the following three sets:
$$\Omega_1 := \{x : |f(x)| < \|f\|_1/4\}, \quad \Omega := \{x : \|f\|_1/4 \le |f(x)| \le \alpha\|f\|_1\}, \quad \Omega_2 := \{x : |f(x)| > \alpha\|f\|_1\}.$$
The integral over $\Omega_1$ does not exceed $\|f\|_1/4$. The integral over $\Omega_2$ is estimated as follows with the aid of the Cauchy–Bunyakovskii inequality, the Chebyshev inequality and (8.3):
$$\int_{\Omega_2} |f|\,d\gamma \le \gamma(\Omega_2)^{1/2}\|f\|_2 \le \alpha^{-1/2}C(d,2)\|f\|_1.$$
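Hence, whenever $\alpha^{-1/2}C(d,2) \le 1/4$, we obtain
$$\int_\Omega |f|\,d\gamma \ge \|f\|_1/2;$$
since $|f| \le \alpha\|f\|_1$ on Ω, this yields $\gamma(\Omega) \ge 1/(2\alpha)$, which gives the required estimate if we set $\alpha(d) := 16C(d,2)^2$.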
Theorem 8.7. Let a sequence of functions $f_n + g_n$, where $f_n \in P_d$ and a γ-measurable function $g_n$ depends only on the variables $x_i$ with i > n, have characteristic functionals $\varphi_{f_n+g_n}(t)$ equicontinuous at the origin. Then the sequence of functions $\psi_n := f_n - E(f_n|\sigma_n)$ is bounded in every $L^p(\gamma)$.
Proof. Let us represent the space X with the measure γ as the product of $\mathbb{R}^n$ with the standard Gaussian measure $\gamma_n$ and another copy of (X, γ). The points x ∈ X will be written as pairs x = (z, y), where $z \in \mathbb{R}^n$, y ∈ X. It is easily verified that the function $E(f_n|\sigma_n)$ is a measurable polynomial of degree d and does not depend on the variables $x_1, \dots, x_n$. For γ-a.e. fixed y, the function $z \mapsto f_n(z, y)$ has a version that is a polynomial on $\mathbb{R}^n$ of degree d. For such y, the function $z \mapsto \psi_n(z, y)$ is a polynomial of degree d with zero mean with respect to the measure $\gamma_n$. According to (8.4), whenever
$$t^2\int_{\mathbb{R}^n} \psi_n(z, y)^2\,\gamma_n(dz) \le \varepsilon(d)^2, \qquad (8.6)$$
we have
$$1 - \mathrm{Re}\int_{\mathbb{R}^n} \exp[it\psi_n(z, y)]\,\gamma_n(dz) \ge \frac13\,t^2\int_{\mathbb{R}^n} \psi_n(z, y)^2\,\gamma_n(dz). \qquad (8.7)$$
It is readily seen that the integral of $\psi_n(z,y)^2$ in z with respect to the measure $\gamma_n$ is a measurable polynomial of degree 2d in y. Hence, on account of (8.5), one has the inequality
$$\gamma\Bigl(y:\ 4^{-1}\|\psi_n\|_2^2 \le \int_{\mathbb{R}^n}\psi_n(z,y)^2\,\gamma_n(dz) \le \alpha(2d)\|\psi_n\|_2^2\Bigr) \ge \frac{1}{2\alpha(2d)}.$$
The set on the left-hand side of this inequality will be denoted by $\Omega_n$. Let
$$t^2\|\psi_n\|_2^2 \le \varepsilon(d)^2/\alpha(2d). \tag{8.8}$$
Then, for every $y \in \Omega_n$, one has (8.6), which by (8.7) yields the estimate
$$1 - \mathrm{Re}\int_{\mathbb{R}^n}\exp[it\psi_n(z,y)]\,\gamma_n(dz) \ge \frac{1}{12}t^2\|\psi_n\|_2^2.$$
By using the independence of $g_n$ of the first n variables and Fubini's theorem, we obtain
$$\varphi_{f_n+g_n}(t) = \int_X\exp[it(f_n+g_n)]\,d\gamma = \int_X\exp[itg_n(y) + itE(f_n|\sigma_n)(y)]\int_{\mathbb{R}^n}\exp[it\psi_n(z,y)]\,\gamma_n(dz)\,\gamma(dy).$$
Since $|\exp[itg_n(y) + itE(f_n|\sigma_n)(y)]| = 1$, the inequalities obtained above along with condition (8.8) give the estimate
$$1 - \mathrm{Re}\,\varphi_{f_n+g_n}(t) \ge \frac{t^2}{12}\|\psi_n\|_2^2\,\gamma(\Omega_n) \ge \frac{t^2\|\psi_n\|_2^2}{24\alpha(2d)}.$$
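The uniform boundedness of $\|\psi_n\|_2$ is obvious from this estimate: otherwise we could find $n_j \to \infty$ with $\|\psi_{n_j}\|_2 \to \infty$ and choose $t_j \to 0$ such that $t_j^2\|\psi_{n_j}\|_2^2 = \varepsilon(d)^2/\alpha(2d)$, which would give $1 - \mathrm{Re}\,\varphi_{f_{n_j}+g_{n_j}}(t_j) \ge \varepsilon(d)^2(24\alpha(2d)^2)^{-1}$; this is impossible by the equicontinuity of the characteristic functionals at the origin.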
Corollary 8.8. The conclusion of Theorem 8.7 remains true if the sequence of functions $f_n + g_n$ converges in distribution.
Corollary 8.9. Let a sequence of functions $\eta_n = f_n + g_n$, where $f_n \in P_d$ and a γ-measurable function $g_n$ depends only on the variables $x_i$ with $i > n$, have a finite limit η almost everywhere. Then $\eta \in P_d$.
Proof. Almost everywhere convergence implies convergence in distribution, so by Corollary 8.8 the sequence of functions $\psi_n$ is bounded in $L^2(\gamma)$. Replacing $g_n$ by $g_n + E(f_n|\sigma_n)$, we may assume that $f_n = \psi_n$. Then $\|f_n\|_2 \le C < \infty$. By the Banach–Saks theorem, there exists a subsequence $\{n_i\}$ such that the functions $F_k := (f_{n_1}+\dots+f_{n_k})/k$ converge in $L^2(\gamma)$. Passing to a further subsequence, we may assume that the functions $F_k$ converge almost everywhere. The functions $(\eta_{n_1}+\dots+\eta_{n_k})/k$ converge almost everywhere to the function η. Then the functions $w_k := (g_{n_1}+\dots+g_{n_k})/k$
also converge almost everywhere to a limit w. It remains to observe that w coincides almost everywhere with a constant. This follows from Kolmogorov's zero-one law (see [23, 10.10(iv)]), since for every finite permutation π of the coordinates we have $w(\pi(x)) = w(x)$ almost everywhere. Indeed, for any fixed m we have
$$w_k = k^{-1}(g_{n_1}+\dots+g_{n_m}) + k^{-1}(g_{n_{m+1}}+\dots+g_{n_k}),$$
where the first summand tends to zero as $k \to \infty$ and the second one is independent of the variables with indices at most m. Therefore, if π does not change the coordinates with indices larger than m, then $w(\pi(x)) = w(x)$ for all x at which $w(x) = \lim_{k\to\infty}w_k(x)$ exists.
We note the obvious fact that the condition of independence of $g_n$ of the variables $x_1,\dots,x_n$ can be replaced by the existence of a modification (that is, an almost everywhere equal function) with such a property. We recall that a function p on $\mathbb{R}^n$ is a polynomial of degree d precisely when, for all fixed vectors $a, b \in \mathbb{R}^n$, the function $t \mapsto p(a+tb)$ is a polynomial of degree d on the real line.
Corollary 8.10. Let a γ-measurable function f possess the following property: for some $d \ge 0$, for every $n \in \mathbb{N}$ and γ-a.e. x, the function
$$(t_1,\dots,t_n) \mapsto f(x_1+t_1, x_2+t_2,\dots,x_n+t_n, x_{n+1}, x_{n+2},\dots) \tag{8.9}$$
is a polynomial of degree d. Then $f \in P_d$. Moreover, in place of (8.9) it suffices that, for every finite collection of integers $b_1,\dots,b_n$ and γ-a.e. x, the function
$$t \mapsto f(x_1+tb_1, x_2+tb_2,\dots,x_n+tb_n, x_{n+1}, x_{n+2},\dots) \tag{8.10}$$
be a polynomial of degree d.
Proof. We first observe that the condition with functions (8.10) for almost all x yields the condition with functions (8.9) for almost all x, and also that for almost all x the condition with functions (8.9) is fulfilled for all n at once. Indeed, for any fixed n, we take in (8.10) the collections $b_1,\dots,b_n$ consisting of $n-1$ zeros and a single 1, which yields that, for almost all x, function (8.9) is a polynomial of degree at most nd. Now, taking all possible integer $b_i$, it is not difficult to deduce that the degree of such a polynomial is actually at most d. Next, using (8.9), we apply induction on d. For d = 0 the claim is true by the zero-one law. Suppose now that d > 0 and our claim is proven for all natural numbers less than d. In particular,
we obtain that the partial derivative $\partial_{x_1}f$ of the function f in $x_1$ (which exists almost everywhere due to our condition) is a measurable polynomial of degree $d-1$. Hence $\partial_{x_1}f(x_1,x_2,\dots)$ equals
$$x_1^{d-1}u_{d-1}(x_2,x_3,\dots) + \dots + x_1u_1(x_2,x_3,\dots) + u_0(x_2,x_3,\dots),$$
where $u_k \in P_{d-1-k}$. Therefore, we obtain the representation
$$f(x_1,x_2,\dots) = x_1^dv_d(x_2,x_3,\dots) + \dots + x_1v_1(x_2,x_3,\dots) + v_0(x_2,x_3,\dots),$$
where each $v_k$ with $k > 0$ is a measurable polynomial of degree $d-k$, and a measurable function $v_0$ is independent of the variable $x_1$. It is clear that $v_0$ satisfies the same conditions as f. Repeating this reasoning for every variable $x_2,\dots,x_n$, we obtain the representation $f = f_n + g_n$, where $f_n \in P_d$ and a measurable function $g_n$ is independent of the first n variables. By the previous corollary, $f \in P_d$.
Corollary 8.11. For a general centered Radon Gaussian measure γ with an orthonormal basis $\{e_i\}$ in its Cameron–Martin space, the previous corollary means that a γ-measurable function f belongs to $P_d(\gamma)$ precisely when it has a version such that, for a.e. x, the functions $(t_1,\dots,t_n) \mapsto f(x+t_1e_1+\dots+t_ne_n)$ are polynomials of degree d for all n.
Finally, we mention a very recent result obtained by Arutyunyan and Yaroslavtsev [13].
Theorem 8.12. Every function in $P_d(\gamma)$ admits a version that is a usual algebraic polynomial on X of degree d, i.e., a sum of functions $b_k(x,\dots,x)$, $0 \le k \le d$, where $b_k$ is a k-linear function on $X^k$ for $k \ge 1$ and $b_0 \in \mathbb{R}$. Similarly, any measurable k-linear function on $(X^k,\gamma^k)$, i.e., a function that is a measurable linear functional in each variable for almost all fixed other variables, has a modification that is algebraically k-linear. In particular, this result gives a positive answer to the question raised by H.
von Weizsäcker at the end of the 1980s, when the book [135] was in preparation: whether the stochastic integral $\int_0^1 u(t)\,dw(t)$ of a deterministic continuous function $u \in C[0,1]$ with respect to the Brownian path w admits a true bilinear version $B(u,w)$ on the square of $C[0,1]$ equipped with the square of the Wiener measure.
9. Sobolev Classes over Gaussian Measures
In this section we briefly discuss Sobolev spaces with respect to Gaussian measures. This is a very important analytical tool and one of the main directions of the modern theory. Such classes are important because many nonlinear functionals on infinite-dimensional spaces arising in applications have very poor differentiability or even continuity properties from the point of view of classical analysis (norm continuity, Fréchet or Gâteaux differentiability), but are Sobolev smooth. This effect is much stronger than in the finite-dimensional case (where it is also notable, e.g., in the theory of partial differential equations), and it was Paul Malliavin who invented special tools (now called the Malliavin calculus) to deal with such problems. It should be noted that important ideas closely connected with Gaussian Sobolev classes had already been developed by Gross [74], and the first definition of such classes was given by Frolov [66], [67]. As with the classical Sobolev spaces (see, e.g., [1]), there are essentially three different ways of introducing such spaces: as suitable completions of smooth functions, in terms of integration by parts, and through integral representations. We first consider the case of the standard Gaussian measure γ on $\mathbb{R}^d$. The classes $W^{p,1}(\gamma)$, $1 \le p < \infty$, are obtained as the completions of the class $C_0^\infty(\mathbb{R}^d)$ with respect to the Sobolev norms
$$\|f\|_{p,1} := \Bigl(\int|f|^p\,d\gamma\Bigr)^{1/p} + \Bigl(\int|\nabla f|^p\,d\gamma\Bigr)^{1/p}.$$
Similarly one defines the classes $W^{p,1}(\gamma,\mathbb{R}^m)$ of $\mathbb{R}^m$-valued Sobolev mappings. An extension to higher order derivatives is relatively straightforward, but there is a nuance in the choice of the norm on higher order derivatives: for many purposes it turns out to be reasonable to take Hilbert–Schmidt norms (rather than other matrix norms). In particular, the space $W^{p,2}(\gamma)$ is obtained by taking the norm
$$\|f\|_{p,2} := \|f\|_{p,1} + \Bigl(\int\Bigl(\sum_{i,j\le d}|\partial_{x_i}\partial_{x_j}f|^2\Bigr)^{p/2}d\gamma\Bigr)^{1/p}.$$
Continuing inductively, we obtain the spaces $W^{p,r}(\gamma)$, $r \in \mathbb{N}$. The same class $W^{p,r}(\gamma)$ is characterized as follows: it consists of all functions $f \in L^p(\gamma)$ that possess generalized partial derivatives $\partial_{x_{i_1}}\cdots\partial_{x_{i_r}}f$ belonging to $L^p(\gamma)$.
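As a small illustration (ours, not from the text), the norms $\|f\|_{2,1}$ and $\|f\|_{2,2}$ of $f(x) = \sin x$ with respect to the standard Gaussian measure on $\mathbb{R}^1$ can be evaluated by Gauss–Hermite quadrature. The function $\sin$ is not compactly supported, but it belongs to these completions, and the identities $\int\sin^2x\,\gamma(dx) = (1-e^{-2})/2$ and $\int\cos^2x\,\gamma(dx) = (1+e^{-2})/2$ give closed forms to compare with. The sketch assumes NumPy; the quadrature degree 60 is an arbitrary choice that is accurate for smooth integrands such as those arising for p = 2.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Probabilists' Gauss-Hermite rule: weight exp(-x^2/2); dividing the weights by
# sqrt(2*pi) turns weighted sums into expectations under N(0, 1).
NODES, WEIGHTS = hermegauss(60)
WEIGHTS = WEIGHTS / math.sqrt(2.0 * math.pi)


def gauss_mean(values: np.ndarray) -> float:
    """Approximate the integral against gamma from values at NODES."""
    return float(np.dot(WEIGHTS, values))


def sobolev_norm_p1(f, df, p: float = 2.0) -> float:
    """||f||_{p,1} = (int |f|^p dgamma)^{1/p} + (int |f'|^p dgamma)^{1/p} on R^1."""
    return (gauss_mean(np.abs(f(NODES)) ** p) ** (1.0 / p)
            + gauss_mean(np.abs(df(NODES)) ** p) ** (1.0 / p))


def sobolev_norm_p2(f, df, d2f, p: float = 2.0) -> float:
    """||f||_{p,2}: adds the second-derivative term; for d = 1 the
    Hilbert-Schmidt norm of the Hessian is just |f''|."""
    return sobolev_norm_p1(f, df, p) + gauss_mean(np.abs(d2f(NODES)) ** p) ** (1.0 / p)
```

For example, `sobolev_norm_p2(np.sin, np.cos, lambda x: -np.sin(x))` returns $2\sqrt{(1-e^{-2})/2} + \sqrt{(1+e^{-2})/2}$ up to rounding.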
The infinite-dimensional case, where γ is a centered Radon Gaussian measure on a locally convex space X with the Cameron–Martin space H, is completely analogous; the only difference is that now, in place of $C_0^\infty$, we take the class $\mathcal{F}C_b^\infty$ of all functions on X of the form
$$f(x) = f_0(l_1(x),\dots,l_n(x)),\quad l_i \in X^*,\ f_0 \in C_b^\infty(\mathbb{R}^n).$$
Let $\{e_i\}$ be an orthonormal basis in H. Set
$$\partial_hf(x) = \lim_{t\to 0}\frac{f(x+th) - f(x)}{t}.$$
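For all $p \ge 1$ and $r \in \mathbb{N}$, the Sobolev norm $\|\cdot\|_{W^{p,r}}$ is defined by the following formula, where $\partial_i := \partial_{e_i}$:
$$\|f\|_{W^{p,r}} = \sum_{k=0}^r\Bigl(\int_X\Bigl(\sum_{i_1,\dots,i_k\ge 1}\bigl(\partial_{i_1}\cdots\partial_{i_k}f(x)\bigr)^2\Bigr)^{p/2}\gamma(dx)\Bigr)^{1/p}. \tag{9.1}$$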
If $X = \mathbb{R}^\infty$ and $H = l^2$, then $\mathcal{F}C_b^\infty$ is just the space of functions of finitely many variables of class $C_b^\infty$, and if γ is the standard Gaussian measure on $\mathbb{R}^\infty$, then the Sobolev norms on such functions are the previously defined norms in the finite-dimensional case. Let $W^{p,r}(\gamma)$ denote the completion of $\mathcal{F}C_b^\infty$ with respect to the Sobolev norm $\|\cdot\|_{p,r} = \|\cdot\|_{W^{p,r}}$. Note that the same norm can be written as
$$\|f\|_{p,r} = \sum_{k=0}^r\|D_H^kf\|_{L^p(\gamma,\mathcal{H}_k)},$$
where $D_H^kf$ stands for the derivative of order k along H and $\mathcal{H}_k$ is the space of Hilbert–Schmidt k-linear forms on H, which can be defined inductively by setting $\mathcal{H}_k = \mathcal{H}(H,\mathcal{H}_{k-1})$, $\mathcal{H}_1 = H$, where $\mathcal{H}(H,E)$ is the space of Hilbert–Schmidt operators between Hilbert spaces H and E equipped with its natural norm defined by $\|T\|_{\mathcal{H}}^2 = \sum_{i=1}^\infty\|Te_i\|_E^2$ for an arbitrary orthonormal basis $\{e_i\}$ in H. After this completion procedure all elements of $W^{p,r}(\gamma)$ acquire Sobolev derivatives $D_H^kf$ of the respective orders. In particular, any $f \in W^{2,1}(\gamma)$ has a Sobolev gradient $D_Hf$ along H, which is the limit in $L^2(\gamma,H)$ of the H-gradients of smooth cylindrical functions converging to f in the norm $\|\cdot\|_{2,1}$. For example, in the case of the standard Gaussian measure on $\mathbb{R}^\infty$, the measurable linear functional $f(x) = \sum_{n=1}^\infty n^{-1}x_n$ belongs to all classes
$W^{p,r}(\gamma)$, since the sums $\sum_{n=1}^m n^{-1}x_n$ converge in each norm $\|\cdot\|_{p,r}$; more specifically, $D_Hf(x)$ is the constant vector $h = (n^{-1})_{n=1}^\infty$ and all higher derivatives vanish. Similarly, the function $f(x) = \sum_{n=1}^\infty n^{-2}x_n^2$ belongs to all classes $W^{p,r}(\gamma)$: $D_Hf(x) = 2(n^{-2}x_n)_{n=1}^\infty$, $D_H^2f(x)$ is constant and equals the diagonal operator with eigenvalues $2n^{-2}$, and higher order derivatives vanish. In a similar way one defines the Sobolev spaces $W^{p,r}(\gamma,E)$ of mappings with values in a Hilbert space E. The corresponding norms are denoted by the same symbol $\|\cdot\|_{p,r}$.
An equivalent description employs the concept of a Sobolev derivative. Let $p > 1$. We shall say that a function $f \in L^p(\gamma)$ has the generalized (or Sobolev) partial derivative $g \in L^1(\gamma)$ along a vector $h \in H$ if, for every $\varphi \in \mathcal{F}C_b^\infty$, one has the equality
$$\int_X\partial_h\varphi(x)f(x)\,\gamma(dx) = -\int_X\varphi(x)g(x)\,\gamma(dx) + \int_X\varphi(x)f(x)\hat h(x)\,\gamma(dx). \tag{9.2}$$
Set $\partial_hf := g$. Similarly one defines generalized partial derivatives for mappings with values in a separable Hilbert space E.
Definition 9.1. Let $p \in (1,+\infty)$. The class $G^{p,1}(\gamma,E)$ consists of all mappings $f \in L^p(\gamma,E)$ such that there is a mapping $Df \in L^p(\gamma,\mathcal{H}(H,E))$ with the property that, for every $h \in H$, the E-valued mapping $x \mapsto Df(x)h$ serves as a generalized partial derivative of f along h. The classes $G^{p,r}(\gamma,E)$ with $r \in \mathbb{N}$ are defined inductively as follows: the class $G^{p,r+1}(\gamma,E)$ consists of all mappings $f \in G^{p,1}(\gamma,E)$ such that Df belongs to $G^{p,r}(\gamma,\mathcal{H}_r(H,E))$, and the derivative of order $r+1$ is defined by $D_H^{r+1}f = D_HD_H^rf$.
Theorem 9.2. One has $G^{p,r}(\gamma,E) = W^{p,r}(\gamma,E)$ if $p \in (1,+\infty)$, $r \in \mathbb{N}$.
Remark 9.3. The case p = 1 requires a special examination, since in the definition of generalized derivatives we used the fact that $\hat hf \in L^1(\gamma)$, which is true by Hölder's inequality for any $f \in L^p(\gamma)$ with $p > 1$. The space $G^{1,1}(\gamma)$ can be defined as the space of all functions $f \in L^1(\gamma)$ such that $\hat hf \in L^1(\gamma)$ for all $h \in H$ and there is a mapping $D_Hf \in L^1(\gamma,H)$ for which the function $(D_Hf,h)_H$ serves as a generalized partial derivative along h for each $h \in H$. However, as we shall see below, the inclusion $\hat hf \in L^1(\gamma)$ is automatically fulfilled if f has a directional partial derivative $\partial_hf \in L^1(\gamma)$ in the sense considered below.
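The two examples given earlier in this section, $f(x) = \sum_{n=1}^\infty n^{-1}x_n$ and $f(x) = \sum_{n=1}^\infty n^{-2}x_n^2$ on $\mathbb{R}^\infty$ with the standard Gaussian measure, can be quantified explicitly. For the linear functional, $f - f_m$ (with $f_m$ the m-th partial sum) is a centered Gaussian variable with variance $\sigma_m^2 = \sum_{n>m}n^{-2}$, its H-gradient is a constant vector of H-norm $\sigma_m$, and higher derivatives vanish, so $\|f - f_m\|_{p,r} = \sigma_m\bigl((E|Z|^p)^{1/p} + 1\bigr)$ for every $r \ge 1$, where Z is standard normal. For the quadratic function, the Hilbert–Schmidt norm of $D_H^2f$ is $(\sum_n 4n^{-4})^{1/2} = (2\pi^4/45)^{1/2}$. The sketch below computes these quantities; the helper names are hypothetical.

```python
import math


def gaussian_abs_moment(p: float) -> float:
    """E|Z|^p for a standard normal Z: 2^{p/2} Gamma((p+1)/2) / sqrt(pi)."""
    return 2.0 ** (p / 2.0) * math.gamma((p + 1.0) / 2.0) / math.sqrt(math.pi)


def linear_tail_norm(m: int, p: float) -> float:
    """||f - f_m||_{p,r} (any r >= 1) for f = sum_n n^{-1} x_n and its partial sum f_m.

    f - f_m is centered Gaussian with variance
    sigma^2 = sum_{n>m} n^{-2} = pi^2/6 - sum_{n<=m} n^{-2};
    its H-gradient is the constant vector (n^{-1})_{n>m} of H-norm sigma,
    and all derivatives of order >= 2 vanish.
    """
    sigma = math.sqrt(math.pi ** 2 / 6.0 - sum(n ** -2 for n in range(1, m + 1)))
    return sigma * gaussian_abs_moment(p) ** (1.0 / p) + sigma


def quadratic_hs_norm(n_terms: int) -> float:
    """Hilbert-Schmidt norm of D_H^2 f = diag(2 n^{-2}) for f = sum_n n^{-2} x_n^2,
    truncated to the first n_terms diagonal entries."""
    return math.sqrt(sum((2.0 * n ** -2) ** 2 for n in range(1, n_terms + 1)))
```

The tail norms decrease to zero as m grows, which is the convergence of the partial sums in $\|\cdot\|_{p,r}$ asserted in the text.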
A closely related description focuses on directional properties of functions in the Sobolev classes. We present a typical result for r = 1 and $E = \mathbb{R}^1$; extensions to greater r and infinite-dimensional E are straightforward. Let us fix an orthonormal basis $\{e_i\}$ in H.
Theorem 9.4. A function f in $L^p(\gamma)$, $p \ge 1$, belongs to $W^{p,1}(\gamma)$ precisely when, for each $e_i$, it has a version $\tilde f$ such that the functions $t \mapsto \tilde f(x+te_i)$, where $x \in X$, are locally absolutely continuous and, setting
$$\partial_{e_i}f(x) := \frac{d}{dt}\tilde f(x+te_i)\Big|_{t=0},$$
we obtain a mapping $\nabla f = (\partial_{e_i}f)_{i=1}^\infty$ belonging to $L^p(\gamma,H)$. The same is true for the class $G^{p,1}(\gamma)$, so that $W^{p,1}(\gamma) = G^{p,1}(\gamma)$ also for p = 1.
Proof. The case p > 1 is considered in [21], where it is also shown that membership in $W^{1,1}(\gamma)$ or in $G^{1,1}(\gamma)$ implies the existence of the indicated versions. However, the coincidence of $W^{1,1}(\gamma)$ and $G^{1,1}(\gamma)$ is not discussed in [21]. Suppose now that $f \in L^1(\gamma)$ has a version that is locally absolutely continuous on the lines $x + \mathbb{R}^1h$ for some $h \in H$ and that $\partial_hf \in L^1(\gamma)$, where the partial derivative $\partial_hf$ is defined almost everywhere through the indicated version. According to the lemma proven below, $f\hat h \in L^1(\gamma)$. It follows that $W^{1,1}(\gamma) \subset G^{1,1}(\gamma)$. It should be added that the partial derivative $\partial_{e_i}f(x)$ exists almost everywhere, since $t \mapsto \tilde f(x+te_i)$ is almost everywhere differentiable on the real line (by a classical result of real analysis), which yields, through conditional measures, that the derivative at zero exists for almost every fixed x (of course, for a given x there might be no derivative at zero). The reader is warned that a version $\tilde f$ with the required properties depends in general on $e_i$, which is suppressed in our notation. This happens already in dimension 2: taking a function $f \in W^{2,1}(\gamma)$ such that every version of it is locally unbounded (it is easy to give an example), we see that f has no version continuous in each variable separately (such a version would have a point of continuity).
Lemma 9.5. There is a constant C with the following property: if a function $f \in L^1(\gamma)$ has a version that is locally absolutely continuous on the lines $x + \mathbb{R}^1h$ for some $h \in H$ and $\partial_hf \in L^1(\gamma)$, where $\partial_hf$ is defined almost everywhere through the indicated version, then
$$\int|\hat h||f|\,d\gamma \le C\bigl(\|f\|_{L^1(\gamma)} + \|\partial_hf\|_{L^1(\gamma)}\bigr). \tag{9.3}$$
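The lemma rests on the one-dimensional estimate (9.4) established in the proof. Here is a numerical sanity check of that estimate (ours, not from the text), with the explicit constant $C = 6 + C_1$, $C_1^{-1} = \min_{[0,T]}g = g(T)$ and $\gamma([0,T]) = 1/4$, as derived in the proof. The integrals are approximated by the trapezoid rule on $[-12, 12]$ (the truncation and step size are assumptions of the sketch). For $f(t) = e^t$ the left-hand side has the closed form $e^{1/2}E|Y|$ with $Y \sim N(1,1)$, and for $f(t) = 1 + t^2$ it equals $3\sqrt{2/\pi}$.

```python
import math


def gauss_density(t: float) -> float:
    """Standard Gaussian density g(t)."""
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)


def normal_cdf(t: float) -> float:
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def quarter_point() -> float:
    """T > 0 with gamma([0, T]) = 1/4, i.e. Phi(T) - 1/2 = 1/4, by bisection."""
    lo, hi = 0.0, 5.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if normal_cdf(mid) - 0.5 < 0.25:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gaussian_integral(h, a: float = -12.0, b: float = 12.0, n: int = 100_000) -> float:
    """Trapezoid rule for int_a^b h(t) g(t) dt (truncation to [a, b] is assumed harmless)."""
    step = (b - a) / n
    total = 0.5 * (h(a) * gauss_density(a) + h(b) * gauss_density(b))
    for k in range(1, n):
        t = a + k * step
        total += h(t) * gauss_density(t)
    return total * step


def lemma_sides(f, df):
    """Left side of (9.4) and C (||f||_{L^1} + ||f'||_{L^1}) with C = 6 + C_1,
    where C_1^{-1} = min over [0, T] of g = g(T)."""
    c1 = 1.0 / gauss_density(quarter_point())
    const = 6.0 + c1
    lhs = gaussian_integral(lambda t: abs(t) * abs(f(t)))
    rhs = const * (gaussian_integral(lambda t: abs(f(t)))
                   + gaussian_integral(lambda t: abs(df(t))))
    return lhs, rhs
```

Usage: `lemma_sides(math.exp, math.exp)` returns a pair whose first entry is close to $e^{1/2}E|Y| \approx 1.92$ and is well below the second.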
Proof. In the one-dimensional case the assertion is obvious, because the integral of $t|f(t)|$ over $[0,\infty)$ with respect to the standard Gaussian measure is estimated by $C(\|f\|_{L^1(\gamma)} + \|f'\|_{L^1(\gamma)})$ with some constant C as follows. Let us deal with a locally absolutely continuous version of f and denote by g the standard Gaussian density. Then $g'(t) = -tg(t)$, and by the integration by parts formula we have
$$\int_0^Rt|f(t)|g(t)\,dt = \int_0^R|f(t)|'g(t)\,dt - |f(t)|g(t)\Big|_0^R,$$
whence, taking into account that $\bigl||f(t)|'\bigr| = |f'(t)|$ a.e., we find that
$$\int_{-\infty}^{+\infty}|t||f(t)|g(t)\,dt \le 2\|f'\|_{L^1(\gamma)} + |f(0)|.$$
Let us estimate f(0). We may assume that f(0) > 0. Let us take T > 0 such that [0,T] has γ-measure 1/4. Next, we choose $\tau \in [0,T]$ such that $|f(\tau)| \le 4\|f\|_{L^1(\gamma)}$ (such τ exists, since otherwise the integral of $|f|$ over [0,T] with respect to γ would exceed $\|f\|_{L^1(\gamma)}$). Then, letting $C_1^{-1} := \min_{t\in[0,T]}g(t)$, we have
$$f(0) \le |f(\tau)| + \|f'\|_{L^1[0,\tau]} \le 4\|f\|_{L^1(\gamma)} + C_1\|f'\|_{L^1(\gamma)},$$
so that
$$\int_{-\infty}^{+\infty}|t||f(t)|g(t)\,dt \le C\bigl(\|f\|_{L^1(\gamma)} + \|f'\|_{L^1(\gamma)}\bigr), \tag{9.4}$$
where $C = 6 + C_1$ does not depend on f. The general case follows from this special one. Indeed, we can assume that $|h|_H = 1$. Then the conditional measures $\gamma^x$ on the straight lines $x + \mathbb{R}^1h$ are standard Gaussian, which yields estimate (9.3). In fact, this can be seen even without conditional measures: the claim reduces to the case where γ is the standard product measure and $h = e_1$, and then it suffices to use Fubini's theorem and (9.4) for the first coordinate with the other coordinates fixed.
The characterization in Theorem 9.4 is useful due to its local character (of course, it also involves $L^p$-membership, which is not a local condition). For example, by using this theorem one can show under appropriate assumptions that certain compositions $\psi(f)$ are Sobolev functions for Sobolev f.
Yet another description of Sobolev classes (even with fractional orders of differentiability) employs the Ornstein–Uhlenbeck semigroup $\{T_t\}$. Let r > 0. Set
$$V_rf := \Gamma(r/2)^{-1}\int_0^\infty t^{r/2-1}e^{-t}T_tf\,dt,\quad f \in L^p(\gamma),$$
where $\Gamma(\alpha) := \int_0^\infty t^{\alpha-1}e^{-t}\,dt$.
By the same formula we define $V_r$ on $L^p(\gamma,E)$, where E is any separable Hilbert space. For $p \ge 1$ and $r > 0$ let us consider the space
$$H^{p,r}(\gamma) := V_r(L^p(\gamma)),\qquad \|f\|_{H^{p,r}} = \|V_r^{-1}f\|_{L^p(\gamma)}.$$
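On one-dimensional Hermite polynomials everything here is explicit, which explains the link with the operator $(I-L)^{r/2}$ in Theorem 9.7. Assuming the standard Mehler form $T_tf(x) = \int f(e^{-t}x + \sqrt{1-e^{-2t}}\,y)\,\gamma(dy)$ of the Ornstein–Uhlenbeck semigroup, one has $T_tH_k = e^{-kt}H_k$ and $LH_k = -kH_k$ for the probabilists' Hermite polynomials $H_k$ (NumPy's HermiteE basis), hence
$$V_rH_k = \Gamma(r/2)^{-1}\int_0^\infty t^{r/2-1}e^{-(1+k)t}\,dt\;H_k = (1+k)^{-r/2}H_k,$$
so that on polynomials $V_r$ acts as $(I-L)^{-r/2}$ and $\|f\|_{H^{p,r}} = \|(I-L)^{r/2}f\|_{L^p(\gamma)}$. The NumPy sketch below checks these three facts numerically; the helper names are ours.

```python
import math

import numpy as np
from numpy.polynomial import hermite_e as He

# Probabilists' Gauss-Hermite rule, normalised to integrate against N(0, 1).
NODES, WEIGHTS = He.hermegauss(40)
WEIGHTS = WEIGHTS / math.sqrt(2.0 * math.pi)


def basis(k: int) -> np.ndarray:
    """Coefficient vector of He_k in the HermiteE basis."""
    c = np.zeros(k + 1)
    c[k] = 1.0
    return c


def ou_semigroup(c: np.ndarray, t: float, x: float) -> float:
    """T_t f(x) for f = sum_k c_k He_k via the Mehler formula (assumed form):
    T_t f(x) = int f(e^{-t} x + sqrt(1 - e^{-2t}) y) gamma(dy)."""
    a = math.exp(-t)
    b = math.sqrt(1.0 - a * a)
    return float(np.dot(WEIGHTS, He.hermeval(a * x + b * NODES, c)))


def ou_generator(c: np.ndarray) -> np.ndarray:
    """HermiteE coefficients of Lf = f'' - x f' (one-dimensional OU operator)."""
    return He.hermesub(He.hermeder(c, 2), He.hermemulx(He.hermeder(c)))


def v_r_factor(r: float, k: int, n: int = 200_000, t_max: float = 60.0) -> float:
    """Midpoint rule for Gamma(r/2)^{-1} int_0^inf t^{r/2-1} e^{-t} e^{-kt} dt,
    the factor by which V_r multiplies He_k (intended for r >= 2, where the
    integrand is bounded near 0)."""
    step = t_max / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * step
        total += t ** (r / 2.0 - 1.0) * math.exp(-(1.0 + k) * t)
    return total * step / math.gamma(r / 2.0)
```

For instance, `v_r_factor(4.0, 3)` is approximately $4^{-2} = 0.0625$, in agreement with $(1+k)^{-r/2}$.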
It is not difficult to show that this space is complete. Let us note a common useful property of the classes of all three types with p > 1: if functions $f_n$ belonging to one of them converge in measure to a function f and $\sup_n\|f_n\|_{p,r} < \infty$, then f belongs to the same class. Another common feature is the reflexivity of these spaces (which follows from the reflexivity of $L^p$ with $1 < p < \infty$). It is very important that the derivatives in these constructions are taken along H, so that the geometry of the space X carrying the measure γ is irrelevant. If X itself is a nice space (say, Hilbert or Banach), then smooth functions in the classical Fréchet or Gâteaux sense with appropriate bounds on derivatives become Sobolev differentiable. However, no values of p and r ensure continuity of elements of $W^{p,r}$.
Example 9.6. Let γ be the standard Gaussian measure on $\mathbb{R}^\infty$ restricted to the full measure Hilbert space E of sequences $(x_n)$ with $\sum_{n=1}^\infty n^{-2}x_n^2 < \infty$. Let
$$f(x) = \sum_{n=1}^\infty n^{-2/3}x_n.$$
Then the function f has no version continuous on E with its Hilbert norm, but $f \in W^{p,r}(\gamma)$ for all $r \in \mathbb{N}$ and $p \in [1,+\infty)$; moreover, $D_Hf$ is a constant vector and $D_H^kf = 0$ if $k \ge 2$.
A similar effect is seen in the case of the stochastic integral
$$f(w) = \int_0^1\psi(t)\,dw(t),$$
where $\psi \in L^2[0,1]$ has no version of bounded variation (say, just has no bounded version). We recall that such stochastic integrals, regarded as measurable linear functionals on $C[0,1]$, are given by continuous functionals on $C[0,1]$ (that is, are represented as integrals of paths with respect to bounded measures)
precisely when ψ is a function of bounded variation (as an equivalence class in $L^2[0,1]$, that is, has a modification of bounded variation).
For integer values of r the spaces $H^{p,r}(\gamma)$ can be compared with the previously defined classes. The following very important result is Meyer's equivalence.
Theorem 9.7. If $p \in (1,+\infty)$ and $r \in \mathbb{N}$, then $H^{p,r}(\gamma) = W^{p,r}(\gamma) = G^{p,r}(\gamma)$ and there exist positive constants $m_{p,r}$ and $M_{p,r}$ such that
$$m_{p,r}\|D_H^rf\|_{L^p(\gamma,\mathcal{H}_r)} \le \|(I-L)^{r/2}f\|_{L^p(\gamma)} \le M_{p,r}\bigl[\|D_H^rf\|_{L^p(\gamma,\mathcal{H}_r)} + \|f\|_{L^p(\gamma)}\bigr]. \tag{9.5}$$
The same is true for E-valued mappings, where E is a separable Hilbert space.
Let us observe that for any function $f \in W^{p,2}(\gamma)$ we have its second derivative $D_H^2f$ and the action Lf of the Ornstein–Uhlenbeck operator on it. In the case of the standard Gaussian measure on $\mathbb{R}^n$ one has
$$Lf(x) = \Delta f(x) - (x,\nabla f(x)) = \sum_{i=1}^n\bigl[\partial_{x_i}^2f(x) - x_i\partial_{x_i}f(x)\bigr],$$
where both parts $\Delta f(x)$ and $(x,\nabla f(x))$ exist separately. The same holds in the case of $\mathbb{R}^\infty$ for functions of finitely many variables. However, for general functions $f \in W^{2,2}(\gamma)$ in infinite dimensions this is not true. For example, let us consider a function $f \in \mathcal{X}_2$ given by $f(x) = \sum_{n=1}^\infty n^{-1}(x_n^2 - 1)$. Then $D_Hf(x) = 2(n^{-1}x_n)_{n=1}^\infty$ and $D_H^2f(x) = A$ is a constant Hilbert–Schmidt operator defined by the diagonal matrix with the numbers $2n^{-1}$ on the diagonal. We have
$$Lf(x) = 2\sum_{n=1}^\infty n^{-1}(1 - x_n^2),$$
where the series converges in $L^2(\gamma)$ and a.e. pointwise, but the part “Δf”, which would be the divergent series $\sum_{n=1}^\infty 2n^{-1}$, does not exist separately. Two other important inequalities central to Gaussian analysis are presented in the next theorem.
Theorem 9.8. Suppose that γ is a centered Radon Gaussian measure on a locally convex space X. Then, for any $f \in W^{2,1}(\gamma)$, one has the logarithmic
Sobolev inequality
$$\int_Xf^2\log|f|\,d\gamma \le \int_X|D_Hf|_H^2\,d\gamma + \frac12\int_Xf^2\,d\gamma\,\log\Bigl(\int_Xf^2\,d\gamma\Bigr). \tag{9.6}$$
In addition, there holds the Poincaré inequality
$$\int_X\Bigl(f - \int_Xf\,d\gamma\Bigr)^2d\gamma \le \int_X|D_Hf|_H^2\,d\gamma. \tag{9.7}$$
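In one dimension both inequalities can be tested with Gauss–Hermite quadrature. For $f(x) = e^{ax}$ both sides of (9.6) equal $2a^2e^{2a^2}$, so exponentials turn the logarithmic Sobolev inequality into an equality, while $f(x) = x$ does the same for (9.7); other test functions should give strict inequalities. The sketch assumes NumPy, γ = N(0,1) on $\mathbb{R}^1$ (so that $D_Hf = f'$) and test functions without zeros at the quadrature nodes; the function names are ours.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss

NODES, WEIGHTS = hermegauss(80)
WEIGHTS = WEIGHTS / math.sqrt(2.0 * math.pi)   # expectations under N(0, 1)


def mean(values: np.ndarray) -> float:
    """Quadrature approximation of the integral against gamma."""
    return float(np.dot(WEIGHTS, values))


def log_sobolev_sides(f, df):
    """Left and right sides of (9.6) on R^1 with gamma = N(0, 1), D_H f = f'.
    Assumes f has no zeros at the quadrature nodes (log|f| is evaluated there)."""
    fx, dfx = f(NODES), df(NODES)
    second_moment = mean(fx ** 2)
    lhs = mean(fx ** 2 * np.log(np.abs(fx)))
    rhs = mean(dfx ** 2) + 0.5 * second_moment * math.log(second_moment)
    return lhs, rhs


def poincare_sides(f, df):
    """Left and right sides of (9.7): the variance of f and int |f'|^2 dgamma."""
    fx, dfx = f(NODES), df(NODES)
    return mean((fx - mean(fx)) ** 2), mean(dfx ** 2)
```

For example, with $f(x) = 1 + x^2$ the Poincaré sides are $\mathrm{Var}(X^2) = 2$ and $E(2X)^2 = 4$.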
Several authors contributed to the discovery of these inequalities in various forms; Nash's paper [109] is the earliest one I know where the Poincaré inequality is explicitly given in the stated form with gradients (of course, when written in terms of Hermite expansions it becomes trivial); the paper of Gross [75] (where (9.6) was proved explicitly with gradients) became a starting point of intensive research related to characterizations and applications of logarithmic Sobolev inequalities (see references in [24] and also the recent paper [47]).
As an application of the logarithmic Sobolev inequality let us consider the following situation that often arises in stochastic analysis. Suppose that $\nu = \varrho\cdot\gamma$ is a probability measure. Its entropy (or the entropy of ϱ) is defined by
$$\mathrm{Ent}_\gamma\varrho := \int_X\varrho\log\varrho\,d\gamma$$
whenever $\varrho\log\varrho$ is integrable; otherwise we set $\mathrm{Ent}_\gamma\varrho := +\infty$. Since the function $t \mapsto t\log t$ is convex and the logarithm of the integral of ϱ vanishes, Jensen's inequality gives $\mathrm{Ent}_\gamma\varrho \ge 0$. Upper bounds for entropy are often of interest in applications. Suppose that $\sqrt{\varrho} \in W^{2,1}(\gamma)$. Then the logarithmic Sobolev inequality applied to $f = \sqrt{\varrho}$ yields the estimate
$$\mathrm{Ent}_\gamma\varrho \le \frac12I(\varrho),\qquad I(\varrho) := \int_X\frac{|\nabla\varrho|^2}{\varrho}\,d\gamma,$$
where $|\nabla\varrho|^2/\varrho := 0$ on the set $\{\varrho = 0\}$. To justify this we note that $\varrho \in W^{1,1}(\gamma)$ and that $\nabla\sqrt{\varrho} = 2^{-1}\varrho^{-1/2}\nabla\varrho$ with the above convention. Indeed, the integrability of ϱ and $|\nabla\varrho|^2/\varrho$ yields the integrability of $|\nabla\varrho|$ by the Cauchy inequality. Using Theorem 9.4, we can calculate the derivatives pointwise by using the corresponding versions.
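A one-dimensional computation (ours, not from the text) shows that the bound $\mathrm{Ent}_\gamma\varrho \le I(\varrho)/2$ cannot be improved. Take γ = N(0,1) on $\mathbb{R}^1$ and the density of the shifted measure N(a,1) with respect to γ:
$$\varrho(x) = e^{ax - a^2/2},\qquad \int_{\mathbb{R}}\varrho\,d\gamma = 1,$$
$$\mathrm{Ent}_\gamma\varrho = \int_{\mathbb{R}}\Bigl(ax - \frac{a^2}{2}\Bigr)\varrho(x)\,\gamma(dx) = a\cdot a - \frac{a^2}{2} = \frac{a^2}{2},$$
$$I(\varrho) = \int_{\mathbb{R}}\frac{|\varrho'|^2}{\varrho}\,d\gamma = \int_{\mathbb{R}}a^2\varrho\,d\gamma = a^2,$$
where we used that x has mean a under $\varrho\cdot\gamma = N(a,1)$. Thus $\mathrm{Ent}_\gamma\varrho = I(\varrho)/2$. The same computation with the Cameron–Martin density $\varrho = \exp(\hat h - |h|_H^2/2)$, $h \in H$, gives $\mathrm{Ent}_\gamma\varrho = |h|_H^2/2$ and $I(\varrho) = |h|_H^2$, since $\nabla\varrho = \varrho h$ and $\hat h$ has mean $|h|_H^2$ under $\varrho\cdot\gamma$.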
The logarithmic Sobolev inequality is a certain weak replacement for the missing analogs of the classical Sobolev inequalities in $\mathbb{R}^d$, which improve the initial integrability of a function on the basis of the integrability of its derivative. This is used, e.g., in the study of invariant measures of infinite-dimensional diffusions (see [31]). It is not difficult to see that membership in $W^{p,r}(\gamma)$ does not imply membership in $L^{p+\varepsilon}(\gamma)$. In addition, no continuity is ensured even by membership in all $W^{p,r}(\gamma)$, $p < \infty$, $r \in \mathbb{N}$. This is seen already in the simplest example of a measurable linear functional $f(x) = \sum_{n=1}^\infty n^{-1}x_n$ on $\mathbb{R}^\infty$ with the standard Gaussian measure γ. Indeed, let us show that there is no function g continuous on $\mathbb{R}^\infty$ and equal to f a.e. (the fact that f itself is not continuous is obvious, since every continuous linear function on $\mathbb{R}^\infty$ depends on finitely many variables). Otherwise there is a neighborhood of zero V such that $|f(x)| \le M$ a.e. in V for some M. Hence there exist k and c such that $|f(x)| \le M$ a.e. on the set $S = \{|x_i| < c,\ i = 1,\dots,k\}$. There is N such that $|\sum_{n=k+2}^\infty n^{-1}x_n| \le N$ with probability at least 1/2. Since the set $\{x: x_{k+1} > (k+1)(ck + N + M)\}$ has positive measure, we arrive at a contradiction: there is a positive measure set of points $x \in S$ such that $f(x) > M$.
Sobolev functions satisfy vector integration by parts formulas.
Theorem 9.9. Suppose that $v \in W^{p,1}(\gamma,H)$, where p > 1. Then there is a function $\delta v \in L^p(\gamma)$, called the divergence of v, such that
$$\int_X(D_H\varphi,v)_H\,d\gamma = -\int_X\delta v\,\varphi\,d\gamma,\quad \varphi \in W^{q,1}(\gamma),\ q = p/(p-1). \tag{9.8}$$
If $v = D_Hf$, where $f \in W^{p,2}(\gamma)$, then $\delta v = Lf$.
Riesz transforms and other operators related to Gaussian $L^p$- and Sobolev spaces are studied in [4], [42], [123]. Besov-type Gaussian spaces are studied in [117]. Spaces BV of functions of bounded variation with respect to Gaussian measures are considered in [7], [9], [10], [68], [82].
It is also possible to introduce Sobolev classes with respect to a Gaussian measure γ restricted to a domain. Suppose that we are given a Borel or γ-measurable set $V \subset X$ of positive γ-measure such that every straight line of the form $x + \mathbb{R}^1e_n$ intersects it in a convex set $V_{x,n}$. If the sets $(V - x)\cap H$ are open in H for all $x \in V$, then V is called H-open (this property is equivalent to the fact that $V - x$ contains a ball from H for every $x \in V$, and is weaker than openness of V in X), and if all such sets
are convex, then V is called H-convex. The latter property is weaker than the usual convexity. There are several natural ways of introducing Sobolev classes on V. The first one is to consider the class $W^{p,1}(V,\gamma)$ equal to the completion of $\mathcal{F}C_b^\infty$ with respect to the Sobolev norm $\|\cdot\|_{p,1,V}$ with the order of integrability p, evaluated with respect to the restriction of γ to V. This class is contained in the class $D^{p,1}(V,\gamma)$ consisting of all functions f on V belonging to $L^p(V,\gamma)$ and having the following property: for each $h \in H$ there is a version of f (denoted by the same symbol) absolutely continuous on each closed interval in $V_{x,n}$ and such that, defining $\partial_hf$ as $df(x+th)/dt|_{t=0}$ (which exists for γ-a.e. $x \in V$), the gradient $\nabla_Hf$ defined by $(\nabla_Hf,h)_H = \partial_hf$ belongs to $L^p(V,\gamma,H)$. The class $D^{p,1}(V,\gamma)$ is naturally equipped with the Sobolev norm $\|\cdot\|_{p,1,V}$ defined by the restriction of γ to V:
$$\|f\|_{p,1,V} = \Bigl(\int_V|f|^p\,d\gamma\Bigr)^{1/p} + \Bigl(\int_V|\nabla_Hf|_H^p\,d\gamma\Bigr)^{1/p}.$$
It is readily verified that the spaces $W^{p,1}(V,\gamma)$ and $D^{p,1}(V,\gamma)$ with the Sobolev norm are Banach spaces. Hino [83] used the class $D^{2,1}(V,\gamma)$ (denoted there by $W^{1,2}(V)$) and proved that a dense subset is formed by functions that possess extensions to the whole space in the class $W^{2,1}(\gamma)$. In the finite-dimensional case both classes coincide for convex V; the infinite-dimensional situation is not clear, but for H-convex H-open sets one has $W^{2,1}(V,\gamma) = D^{2,1}(V,\gamma)$, which follows from [83]. In [34], an H-convex and H-open set V is constructed such that, for each $p \in [1,+\infty)$, there is a function $f \in W^{p,1}(V,\gamma)$ without extensions in the class $W^{p,1}(\gamma)$. In the case of a Hilbert space, such a set V can be chosen convex and open. For some results related to extensions of mappings on the Wiener space, see [22].
10. Transformations of Gaussian Measures
There are several interesting classes of transformations of Gaussian measures arising in applications and deserving attention in any survey of general scope. The classical works by Cameron and Martin [44], [44] and Girsanov [70] considerably influenced this area. I would mention first linear transformations. This class has been thoroughly investigated and probably all major questions have been answered. Given a centered Radon Gaussian measure γ on a locally convex space X and a γ-measurable linear mapping $A: X \to Y$ with values in a locally
convex space Y, we obtain a Gaussian measure $\nu = \gamma\circ A^{-1}$ on Y. In typical cases (but not always) this is a Radon measure, which is automatically the case if Y is a complete separable metrizable space (or a Souslin space). If Y = X, then ν and γ are either equivalent or mutually singular, and it is useful to be able to decide this in terms of A. It is also of interest to know when A preserves γ, i.e., $\gamma\circ A^{-1} = \gamma$.
For any operator $A \in \mathcal{L}(H)$ let $\hat A$ denote the γ-measurable linear extension of A described in 1.4. The symbols $\mathcal{H}(H)$ or $\mathcal{H}$ denote the class of Hilbert–Schmidt operators on H. Throughout, when dealing with a Radon Gaussian measure γ, we denote by H its Cameron–Martin space H(γ); the complete notation will be used only when we deal with several measures.
Theorem 10.1. (i) Let $T: X \to X$ be a γ-measurable linear mapping such that $\gamma\circ T^{-1} = \gamma$. Let $T_0$ be a proper linear modification of T and let U be the restriction of $T_0$ to H. Then $U \in \mathcal{L}(H)$ and $U^*$ is an isometry (i.e., preserves distances). If U is injective, then U is an orthogonal operator.
(ii) Conversely, for every $U \in \mathcal{L}(H)$ such that $U^*$ is an isometry, there exists a γ-measurable proper linear mapping T that preserves the measure γ and coincides with U on H.
Let us explain why the measurable linear extension of an orthogonal operator U on H preserves γ. We recall that $T = \hat U$ is given by the formula $Tx = \sum_{n=1}^\infty\hat e_n(x)Ue_n$, where $\{e_n\}$ is an orthonormal basis in H. According to (5.2), we have for all $l \in X^*$
$$\widetilde{\gamma\circ T^{-1}}(l) = \exp\bigl(-|U^*R_\gamma(l)|_H^2/2\bigr) = \exp\bigl(-|R_\gamma(l)|_H^2/2\bigr) = \widetilde{\gamma}(l).$$
In infinite-dimensional spaces, an operator T may preserve the measure γ without being injective on H(γ). For example, let γ be the countable product of the standard Gaussian measures on the real line. Then the mapping $T: \mathbb{R}^\infty \to \mathbb{R}^\infty$,
$$Tx = (x_2,\dots,x_n,\dots),$$
takes γ into γ, but is not injective on $l^2$. Hence the isometry $U^*$ may fail to be a surjection: it is an isometry between H and $U^*(H)$.
Definition 10.2. A measurable linear automorphism of the space X is a γ-measurable linear mapping T with the following properties:
(i) there exists a set Ω such that γ(Ω) = 1, T maps Ω one-to-one onto Ω and $T(X\setminus\Omega) \subset X\setminus\Omega$;
(ii) for every $B \in \mathcal{B}(X)$ one has $T(B) \in \mathcal{B}(X)_\gamma$ and $\gamma(T^{-1}(B)) = \gamma(T(B)) = \gamma(B)$.
More generally, given two centered Radon Gaussian measures μ and ν on locally convex spaces X and Y, a μ-measurable linear operator $T: X \to Y$ is called a measurable linear isomorphism if there exist full measure sets $\Omega_1 \subset X$ and $\Omega_2 \subset Y$ such that $T: \Omega_1 \to \Omega_2 = T(\Omega_1)$ is one-to-one, $T(X\setminus\Omega_1) \subset Y\setminus\Omega_2$, and for all $B_1 \in \mathcal{B}(X)$ and $B_2 \in \mathcal{B}(Y)$ one has
$$T(B_1) \in \mathcal{B}(Y)_\nu,\qquad \nu(T(B_1)) = \mu(B_1),\qquad \mu(T^{-1}(B_2)) = \nu(B_2).$$
Proposition 10.3. Let $T: X \to X$ be a γ-measurable linear mapping. The following conditions are equivalent:
(i) the mapping T is a measurable linear automorphism;
(ii) the mapping T takes all sets of measure zero to sets of measure zero and its proper linear version is an orthogonal operator on H(γ);
(iii) the mapping T takes all measurable sets to measurable sets and $\gamma(B) = \gamma(T^{-1}(B)) = \gamma(T(B))$ for all $B \in \mathcal{B}(X)$.
Similar equivalences hold in the case of measurable linear isomorphisms. We recall that a mapping has Lusin's property (N), or satisfies Lusin's condition (N), if it takes all sets of measure zero to sets of measure zero.

Corollary 10.4. Suppose that a $\gamma$-measurable linear mapping $T$ satisfies Lusin's condition (N), its proper linear version is injective on $H(\gamma)$ and $\gamma\circ T^{-1}=\gamma$. Then $T$ is a measurable linear automorphism.

Corollary 10.5. Let $T$ be a measurable linear automorphism and let a mapping $S$ be such that $S\circ T=T\circ S=I$. Then $S$ is a measurable linear automorphism.

We observe that not every measurable linear mapping $T$ with $\gamma\circ T^{-1}=\gamma$ takes all sets of measure zero to sets of measure zero. For example, if $\gamma$ is the countable power of the standard Gaussian measure on the real line, then one can take a linear subspace $E\subset\mathbb{R}^\infty$ such that $\gamma(E)=1$, and an algebraic complement of $E$ will have a Hamel basis $\{v_\alpha\}$ of the cardinality of the continuum. Now we can redefine the identity operator on this algebraic complement by mapping $\{v_\alpha\}$ onto $\mathbb{R}^\infty$. This will give a linear
version of the identity operator which maps a set of measure zero onto the whole space (the image measure does not change, of course). One can show that if $\gamma\circ T^{-1}=\gamma$ and $H(\gamma)$ is infinite-dimensional, then $T$ has a version which maps some set of measure zero onto a set of full measure, but there is also a version which takes all sets of measure zero to sets of measure zero, i.e., has Lusin's property (N).

Proposition 10.6. Let $A,B\in L(H)$ and let $\widehat{B}$ transform the measure $\gamma$ into an equivalent measure. Then $\widehat{AB}=\widehat{A}\,\widehat{B}$ $\gamma$-a.e.

It follows from the above results that every orthogonal operator $U$ on $H=l^2$ defines a measurable linear operator on $\mathbb{R}^\infty$ that preserves the standard Gaussian measure $\gamma$. The previous proposition says that compositions are also preserved. In general, there is no Borel linear version, since Borel linear operators on $\mathbb{R}^\infty$ are continuous. However, one might try to construct a Borel pointwise action of the group of orthogonal operators; it is explained in [71] why this is also impossible. It is not straightforward to check that an operator is an automorphism, but there is a simple criterion for having a version that is an automorphism.

Proposition 10.7. A proper linear $\gamma$-measurable operator $T\colon X\to X$ admits a version that is a measurable linear automorphism precisely when $T|_H$ is an orthogonal operator. More generally, given two centered Radon Gaussian measures $\mu$ and $\nu$ on locally convex spaces $X$ and $Y$, a proper linear measurable operator $T\colon X\to Y$ admits a version that is a measurable linear isomorphism precisely when $T|_{H(\mu)}$ is an isometry between $H(\mu)$ and $H(\nu)$.

Proof. If $T$ is a linear automorphism, then Theorem 10.1 yields that $T|_H$ is an orthogonal operator. Conversely, if $T|_H$ is an orthogonal operator, then its measurable extension $\widehat{T|_H}=T$ preserves $\gamma$, and the same is true for the measurable extension of $S=(T|_H)^{-1}$. Let $\widehat{S}$ be a proper linear version of the measurable extension of $S$. By the previous proposition $\widehat{S}Tx=x$ for a.e. $x$. Similarly, $T\widehat{S}x=x$ for a.e. $x$. Hence we have a full measure linear subspace
$$
L=\{x\colon\ \widehat{S}Tx=T\widehat{S}x=x\}.
$$
The linear subspaces $T^n(L)$ and $\widehat{S}^n(L)$ also have full measure, and the same is true for
$$
\Omega=\bigcap_{n=0}^{\infty}\bigl(T^n(L)\cap\widehat{S}^n(L)\bigr).
$$
For each $x\in\Omega$ we have $T^kx,\ \widehat{S}^kx\in L$ for all $k$, so we have
$$
\widehat{S}T(T^nx)=T\widehat{S}T^nx=T^nx,\qquad \widehat{S}T(\widehat{S}^nx)=T\widehat{S}\widehat{S}^nx=\widehat{S}^nx
$$
for all $n$. Therefore, $T(\Omega)\subset\Omega$, $\widehat{S}(\Omega)\subset\Omega$, and $T\widehat{S}x=\widehat{S}Tx=x$ for all $x\in\Omega$. Finally, outside $\Omega$ the obtained version can be redefined in an arbitrary
way. The case of different spaces is similar, but it can also be reduced to the one considered by passing to the space $X\times Y$ with the Gaussian measure $\mu\otimes\nu$ and the operator $T_0(x,y)=(Sy,Tx)$, where $S$ is a measurable linear extension of the inverse of the isometry $T|_{H(\mu)}\colon H(\mu)\to H(\nu)$.

Combining this result with Tsirelson's Theorem 5.4, we arrive at the following conclusion.

Corollary 10.8. Let $\gamma$ be a centered Radon Gaussian measure on a locally convex space $X$ such that $H(\gamma)$ is infinite-dimensional. Then there is a measurable linear isomorphism between $(X,\gamma)$ and $(\mathbb{R}^\infty,\gamma_0)$, where $\gamma_0$ is the standard Gaussian product measure.

We shall say that an operator $A\in L(H)$ has property (E) if $A$ is invertible and $AA^*-I\in\mathcal{H}$.

Lemma 10.9. (i) An operator $A\in L(H)$ has property (E) precisely when $A=U(I+K)$, where $U$ is an orthogonal operator and $K$ is a symmetric Hilbert–Schmidt operator such that $I+K$ is invertible.
(ii) If $A\in L(H)$ has property (E), then $A^*$ and $A^{-1}$ have this property as well. In addition, the composition of two operators with property (E) has this property.
(iii) Let $A\in L(H)$ and $A(H)=H$. Then $AA^*-I$ is a Hilbert–Schmidt operator if and only if $A=(I+S)W$, where $S$ is a symmetric Hilbert–Schmidt operator, the operator $I+S$ is invertible and $W^*$ is an isometry.

Theorem 10.10. (i) Let $T\colon X\to X$ be a $\gamma$-measurable linear mapping, let $T_0$ be its proper linear version, and let $\gamma\circ T^{-1}\sim\gamma$. Then $A:=T_0|_H$ maps $H$ continuously onto $H$ and $AA^*-I\in\mathcal{H}$.
(ii) Conversely, for any operator $A\in L(H)$ satisfying the conditions
$$
A(H)=H\quad\text{and}\quad AA^*-I\in\mathcal{H},
$$
there is a $\gamma$-measurable proper linear mapping $T$ for which one has $T|_H=A$ and $\gamma\circ T^{-1}\sim\gamma$.
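In finite dimensions every operator is Hilbert–Schmidt, so property (E) reduces to invertibility, and the factorization $A=U(I+K)$ of Lemma 10.9(i) is simply the polar decomposition with $I+K=(A^*A)^{1/2}$. The NumPy sketch below illustrates only this algebraic step; the random test matrix and the helper name `polar_factors` are our own choices for illustration, not part of the text.

```python
import numpy as np

def polar_factors(A):
    """Factor an invertible real matrix as A = U (I + K), U orthogonal, K symmetric.

    I + K is the symmetric positive square root of A^T A (polar decomposition).
    """
    n = A.shape[0]
    w, V = np.linalg.eigh(A.T @ A)          # A^T A = V diag(w) V^T with w > 0
    P = V @ np.diag(np.sqrt(w)) @ V.T       # P = (A^T A)^{1/2} = I + K
    K = P - np.eye(n)
    U = A @ np.linalg.inv(P)                # U = A (I + K)^{-1}
    return U, K

rng = np.random.default_rng(0)
n = 5
A = np.eye(n) + 0.3 * rng.standard_normal((n, n))   # arbitrary invertible test matrix
U, K = polar_factors(A)

print(np.allclose(U.T @ U, np.eye(n)))               # U is orthogonal
print(np.allclose(K, K.T))                           # K is symmetric
print(np.allclose(U @ (np.eye(n) + K), A))           # A = U (I + K)
print(np.linalg.norm(A @ A.T - np.eye(n), "fro"))    # Hilbert-Schmidt norm of AA* - I
```

In infinite dimensions the substance of Lemma 10.9 is that $AA^*-I$ being Hilbert–Schmidt forces $K$ to be Hilbert–Schmidt, which a finite matrix cannot exhibit.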
Corollary 10.11. Let $T$ be a $\gamma$-measurable linear mapping. The following conditions are equivalent:
(i) a linear version of $T$ maps $H$ into $H$ and has property (E), and $T$ satisfies Lusin's condition (N);
(ii) there is a set $\Omega$ with $\gamma(\Omega)=1$ such that $T$ maps $\Omega$ one-to-one onto $\Omega$, $T(X\setminus\Omega)\subset X\setminus\Omega$ and $\gamma\circ T^{-1}\sim\gamma$.
In this case there is a $\gamma$-measurable linear mapping $S$ that is inverse to $T$, i.e., $TS=ST=I$.

Corollary 10.12. Suppose that a proper linear version of a $\gamma$-measurable linear mapping $T$ has property (E) on $H$ and $f$ is an $H$-Lipschitzian function. Then $f\circ T$ is also $H$-Lipschitzian and $D_H(f\circ T)(x)=T^*D_Hf(Tx)$. An analogous assertion is true for mappings $f$ to any separable Hilbert space.

Now we give formulas for the Radon–Nikodym densities of equivalent Gaussian measures. Recall the concept of a regularized Fredholm–Carleman determinant for operators of the form $I+K$, $K\in\mathcal{H}$. The main idea is seen in the case where the operator $K$ is diagonal and has eigenvalues $k_i$. Then the product
$$
\det(I+K):=\prod_{i=1}^{\infty}(1+k_i)
$$
may diverge if $K$ has no trace. However, as one can easily verify, the product
$$
\det\nolimits_2(I+K):=\prod_{i=1}^{\infty}(1+k_i)e^{-k_i}
$$
converges. Here we have $\det_2(I+K)=\det(I+K)\exp(-\operatorname{trace}K)$ if $K$ is a nuclear operator. Let $K\in\mathcal{H}$ be a finite-dimensional operator with range $K(H)$. Set
$$
\det\nolimits_2(I+K):=\det\bigl(I+K|_{K(H)}\bigr)\exp\bigl(-\operatorname{trace}K|_{K(H)}\bigr).
$$
Then the following Carleman inequality (see, e.g., Gohberg, Krein [72, Chapter IV, §2]) is fulfilled:
$$
|\det\nolimits_2(I+K)|\le\exp\Bigl(\frac12\|K\|_{\mathcal{H}}^2\Bigr).\tag{10.1}
$$
For finite-dimensional operators $A$ and $B$, letting $I+C=(I+A)(I+B)$, one has
$$
\det\nolimits_2(I+A)\det\nolimits_2(I+B)=\det\nolimits_2(I+C)\exp(\operatorname{trace}AB).\tag{10.2}
$$
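The divergence of $\det(I+K)$ versus the convergence of $\det_2(I+K)$ can be seen on the diagonal example $k_i=1/i$, chosen here for illustration: $K$ is Hilbert–Schmidt since $\sum 1/i^2<\infty$, but not nuclear since $\sum 1/i=\infty$. The partial products of $\det(I+K)$ telescope to $n+1$, while by the Weierstrass product for $1/\Gamma$ at $z=1$ the regularized product equals $e^{-C}\approx 0.5615$, where $C$ is Euler's constant. A minimal Python sketch:

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant

def partial_products(n):
    """Partial products of det(I + K) and det_2(I + K) for K = diag(1, 1/2, 1/3, ...)."""
    det_prod = 1.0
    log_det2 = 0.0
    for i in range(1, n + 1):
        k = 1.0 / i
        det_prod *= 1.0 + k            # factor (1 + k_i); the product telescopes to n + 1
        log_det2 += math.log1p(k) - k  # log of the regularized factor (1 + k_i) e^{-k_i}
    return det_prod, math.exp(log_det2)

for n in (10, 1_000, 100_000):
    d, d2 = partial_products(n)
    print(f"n={n:>6}  det partial={d:.1f}  det_2 partial={d2:.6f}")

print("exp(-Euler gamma) =", round(math.exp(-EULER_GAMMA), 6))
```

Summing logarithms with `log1p` keeps the regularized product numerically stable; the unregularized column grows without bound, as the text predicts.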
Now we can extend the function $\det_2$ to all Hilbert–Schmidt operators. If $K\in\mathcal{H}$ and the operator $I+K$ is not invertible (which corresponds to the existence of an eigenvalue $-1$ for $K$), then we put $\det_2(I+K):=0$.

Proposition 10.13. Let $K\in\mathcal{H}$ and let the operator $I+K$ be invertible. Then for every sequence of finite-dimensional operators $K_n$ convergent to $K$ in the Hilbert–Schmidt norm, the sequence $\det_2(I+K_n)$ converges to a limit denoted by $\det_2(I+K)$ and independent of the choice of the approximating sequence. The function $K\mapsto\det_2(I+K)$ on the space $\mathcal{H}$ is locally uniformly continuous on the set of operators whose spectra do not contain $-1$. Moreover, the function $\det_2$ satisfies (10.1) and (10.2). If $\{e_n\}$ is an arbitrary orthonormal basis in $H$, then for every $K\in\mathcal{H}$ we have
$$
\det\nolimits_2(I+K)=\lim_{n\to\infty}\det\bigl(\delta_{ij}+(Ke_i,e_j)\bigr)_{i,j=1}^{n}\exp\Bigl(-\sum_{i=1}^{n}(Ke_i,e_i)\Bigr).
$$
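For matrices, $\det_2(I+K)=\det(I+K)\exp(-\operatorname{trace}K)$, so the Carleman inequality (10.1) and the product rule (10.2) can be checked numerically. The sketch below uses arbitrary random test matrices of our choosing; the Hilbert–Schmidt norm of a matrix is its Frobenius norm. A numerical check of course illustrates, but does not prove, these relations.

```python
import numpy as np

def det2(K):
    """Regularized determinant det_2(I + K) = det(I + K) exp(-trace K) of a square matrix K."""
    n = K.shape[0]
    return np.linalg.det(np.eye(n) + K) * np.exp(-np.trace(K))

rng = np.random.default_rng(1)
n = 6
A = 0.5 * rng.standard_normal((n, n))   # arbitrary test matrices
B = 0.5 * rng.standard_normal((n, n))

# Carleman inequality (10.1); the Hilbert-Schmidt norm of a matrix is its Frobenius norm.
hs_sq = np.linalg.norm(A, "fro") ** 2
print(abs(det2(A)), "<=", np.exp(hs_sq / 2))

# Identity (10.2): I + C = (I + A)(I + B) means C = A + B + AB.
C = A + B + A @ B
lhs = det2(A) * det2(B)
rhs = det2(C) * np.exp(np.trace(A @ B))
print(lhs, rhs)
```

The identity (10.2) follows from $\operatorname{trace}C=\operatorname{trace}A+\operatorname{trace}B+\operatorname{trace}AB$, which is exactly the correction term $\exp(\operatorname{trace}AB)$.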
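Let $K\in\mathcal{H}$ and let the operator $T=I+K$ on $H$ be invertible. Set
$$
\Lambda_K(x):=|\det\nolimits_2(I+K)|\exp\Bigl(\delta K(x)-\frac12|Kx|_H^2\Bigr).
$$
Theorem 10.14. Let $S=(I+K)^{-1}$. Then
$$
\frac{d(\gamma\circ T^{-1})}{d\gamma}(x)=\frac{1}{\Lambda_K(Sx)},\qquad \frac{d(\gamma\circ S^{-1})}{d\gamma}(x)=\Lambda_K(x).\tag{10.3}
$$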
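Formula (10.3) can be sanity-checked in dimension one, where $K=k$ is multiplication by a scalar $k\ne-1$, $\gamma\circ T^{-1}=N(0,(1+k)^2)$ and $\gamma\circ S^{-1}=N(0,(1+k)^{-2})$. The sketch assumes the divergence convention $\delta K(x)=\operatorname{trace}K-(Kx,x)=k(1-x^2)$ (our assumption about the sign convention, chosen so that $\int(D f,v)\,d\gamma=-\int f\,\delta v\,d\gamma$); with it, both densities in (10.3) agree with the explicit ratios of normal densities.

```python
import math

def normal_pdf(x, s=1.0):
    """Density of N(0, s^2) at x."""
    return math.exp(-x * x / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

def Lambda_K(k, x):
    """Lambda_K(x) for the one-dimensional K = k (multiplication by a scalar k != -1).

    Assumed convention: delta K(x) = trace K - (Kx, x) = k (1 - x^2).
    """
    det2 = (1 + k) * math.exp(-k)          # det_2(I + K) in dimension one
    delta = k * (1 - x * x)
    return abs(det2) * math.exp(delta - 0.5 * (k * x) ** 2)

k = 0.3
for x in (-2.0, -0.5, 0.0, 1.0, 2.5):
    # gamma o S^{-1}, S = (1 + k)^{-1}, is N(0, 1/(1 + k)^2); gamma o T^{-1} is N(0, (1 + k)^2).
    dS = normal_pdf(x, 1 / (1 + k)) / normal_pdf(x)
    dT = normal_pdf(x, 1 + k) / normal_pdf(x)
    print(f"x={x:5.2f}  {dS:.6f} vs {Lambda_K(k, x):.6f}   "
          f"{dT:.6f} vs {1 / Lambda_K(k, x / (1 + k)):.6f}")
```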
Theorem 10.15. Two centered Radon Gaussian measures $\mu$ and $\nu$ on $X$ are equivalent precisely when $H(\mu)$ and $H(\nu)$ coincide as sets and there exists an invertible operator $C\in L(H(\mu))$ such that $CC^*-I\in\mathcal{H}(H(\mu))$ and
$$
|h|_{H(\nu)}=|C^{-1}h|_{H(\mu)}\quad\text{for all }h\in H(\mu).\tag{10.4}
$$
If $C-I\in\mathcal{H}(H(\mu))$, then
$$
\frac{d\nu}{d\mu}(x)=\frac{1}{\Lambda_{C-I}(C^{-1}x)}.\tag{10.5}
$$
Finally, if $\mu\sim\nu$, then there exists a symmetric operator $C$ on $H(\mu)$ such that $C-I\in\mathcal{H}(H(\mu))$ and (10.5) is fulfilled.

Corollary 10.16. Let $\mu$ and $\nu$ be two equivalent centered Radon Gaussian measures on $X$. Then there exist an orthonormal basis $\{e_n\}$ in $H(\mu)$ and a sequence $\{\lambda_n\}$ of real numbers distinct from $-1$ such that $\sum_{n=1}^{\infty}\lambda_n^2<\infty$
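and for every sequence of independent standard Gaussian random variables $\xi_n$ on a probability space $(\Omega,P)$ one has the equalities
$$
\mu=P\circ\Bigl(\sum_{n=1}^{\infty}\xi_ne_n\Bigr)^{-1}\quad\text{and}\quad\nu=P\circ\Bigl(\sum_{n=1}^{\infty}(1+\lambda_n)\xi_ne_n\Bigr)^{-1}.
$$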
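In the coordinates of Corollary 10.16 the density $d\nu/d\mu$ factorizes over $n$: each factor is the density of $N(0,(1+\lambda_n)^2)$ with respect to $N(0,1)$. The sketch below, truncated to finitely many coordinates and using a hypothetical sequence $\lambda_n=(-1)^n/(n+1)$ chosen only for illustration, rewrites each factor as $\exp(c_n+\alpha_n(\xi_n^2-1))$ with $\alpha_n=\frac12\bigl(1-(1+\lambda_n)^{-2}\bigr)<\frac12$, which is the shape of the exponent appearing in Corollary 10.18 below (without linear terms, since both measures are centered).

```python
import math

def coordinate_density(lam, xi):
    """d nu_n / d mu_n at xi, where mu_n = N(0, 1) and nu_n = N(0, (1 + lam)^2)."""
    s = 1.0 + lam
    return math.exp(xi * xi / 2 - xi * xi / (2 * s * s)) / abs(s)

def quadratic_coeffs(lam):
    """(c_n, alpha_n) such that the coordinate density equals exp(c_n + alpha_n (xi^2 - 1))."""
    alpha = 0.5 * (1.0 - 1.0 / (1.0 + lam) ** 2)   # alpha_n < 1/2 for every lam != -1
    c = alpha - math.log(abs(1.0 + lam))
    return c, alpha

# Hypothetical square-summable sequence lambda_n = (-1)^n / (n + 1), truncated to N terms.
N = 50
lams = [(-1) ** n / (n + 1) for n in range(1, N + 1)]
xis = [0.3 * math.sin(n) for n in range(1, N + 1)]   # an arbitrary point (xi_1, ..., xi_N)

density = math.prod(coordinate_density(l, x) for l, x in zip(lams, xis))
coeffs = [quadratic_coeffs(l) for l in lams]
F = sum(c for c, _ in coeffs) + sum(a * (x * x - 1) for (_, a), x in zip(coeffs, xis))
print(density, math.exp(F))
print(max(a for _, a in coeffs) < 0.5)
```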
Corollary 10.17. Two centered Radon Gaussian measures $\mu$ and $\nu$ on $X$ are equivalent precisely when there exists an invertible symmetric nonnegative operator $T$ on $H(\mu)$ such that $T-I\in\mathcal{H}(H(\mu))$ and
$$
(f,g)_{L^2(\nu)}=(TR_\mu f,R_\mu g)_{H(\mu)}\quad\forall\,f,g\in X^*.\tag{10.6}
$$
An equivalent condition: the norms $\|f\|_{L^2(\mu)}$ and $\|f\|_{L^2(\nu)}$ are equivalent on $X^*$ and the quadratic form $(f,f)_{L^2(\nu)}-(f,f)_{L^2(\mu)}$ on the space $X^*_\mu$ is generated by a Hilbert–Schmidt operator on $X^*_\mu$.

Corollary 10.18. Let $\mu$ and $\nu$ be two equivalent Radon Gaussian measures on $X$. Then $d\nu/d\mu=\exp F$, where $F$ is a $\mu$-measurable second order polynomial of the form
$$
F(x)=c+\sum_{n=1}^{\infty}c_n\xi_n(x)+\sum_{n=1}^{\infty}\alpha_n\bigl(\xi_n(x)^2-1\bigr)\quad\mu\text{-a.e.},\tag{10.7}
$$
where $c\in\mathbb{R}^1$, $\sum_{n=1}^{\infty}c_n^2<\infty$, $\sum_{n=1}^{\infty}\alpha_n^2<\infty$, $\alpha_n<1/2$, $\{\xi_n\}$ is an orthonormal basis in $X^*_\mu$, and both series converge a.e. and in $L^2(\mu)$. Conversely, if $F$ has such a form, then $\exp F\in L^1(\mu)$ and the measure with density $\|\exp F\|_{L^1(\mu)}^{-1}\exp F$ with respect to $\mu$ is Gaussian.

In the case where $X$ is a separable Hilbert space and $\mu$ and $\nu$ have covariance operators $K_\mu$ and $K_\nu$, we obtain $H(\mu)=\sqrt{K_\mu}(X)$. Assuming that $K_\mu$ and $K_\nu$ have dense ranges (which can be achieved by passing to the closure of $H(\mu)$ in $X$), we write $C$ in the form $C=\sqrt{K_\nu}\,\sqrt{K_\mu}^{\,-1}$. On the other hand, $C=\sqrt{K_\mu}\,C_0\sqrt{K_\mu}^{\,-1}$, where $C_0=\sqrt{K_\mu}^{\,-1}\sqrt{K_\nu}\in L(X)$ is an invertible operator. Here we have $C-I\in\mathcal{H}(H(\mu))$ precisely when $C_0-I\in\mathcal{H}(X)$. Therefore, the equivalence of the measures $\mu$ and $\nu$ is characterized by the continuity and invertibility of the operator $\sqrt{K_\mu}^{\,-1}\sqrt{K_\nu}$ and the inclusion $\sqrt{K_\mu}^{\,-1}\sqrt{K_\nu}-I\in\mathcal{H}(X)$. The latter condition can be written as
$$
\sqrt{K_\mu}^{\,-1}K_\nu\sqrt{K_\mu}^{\,-1}-I\in\mathcal{H}(X)\tag{10.8}
$$
since $A-I\in\mathcal{H}(X)$ precisely when $AA^*-I\in\mathcal{H}(X)$. Certainly, here one can interchange $\mu$ and $\nu$. In the considered Hilbert case there is a sufficient (but not necessary) condition for equivalence that does not require finding square roots of the covariance operators.

Example 10.19. Suppose that $H(\mu)=H(\nu)$ and $K_\nu=(I+Q)K_\mu$ for some operator $Q\in\mathcal{H}(X)$ such that the operator $I+Q$ is invertible. Then $\mu\sim\nu$.

Example 10.20. A Gaussian measure $\nu$ on $L^2[0,1]$ is equivalent to the Wiener measure $P^W$ if and only if $a_\nu\in H(P^W)$ and its covariance operator $R_\nu$ is an integral operator with a kernel $K_\nu$ of the following form:
$$
K_\nu(t,s)=\min(t,s)+\int_0^t\!\!\int_0^s Q(u,v)\,du\,dv,
$$
where $Q\in L^2([0,1]^2)$ is a symmetric function that generates an integral operator without eigenvalue $-1$. In this case for a.e. $(t,s)$ one has the equality $Q(t,s)=\partial_t\partial_sK_\nu(t,s)$.

Example 10.21. Let $\tau\in C^1[0,1]$, $\tau'(t)>0$, $\tau(0)=0$, $\tau(1)=1$. Set $Tx(t)=x(\tau(t))/\sqrt{\tau'(t)}$. The measure $\nu=P^W\circ T^{-1}$, i.e., the distribution of the process $w_{\tau(t)}/\sqrt{\tau'(t)}$, is equivalent to the Wiener measure $P^W$ precisely when the function $\tau'$ is absolutely continuous and $\tau''\in L^2[0,1]$.

Let us find the Radon–Nikodym density of the measure induced by the Ornstein–Uhlenbeck process $\xi$ on $[0,1]$ with $\xi_0=0$ with respect to the Wiener measure $P^W$. Girsanov's theorem [70] gives at once the equality
$$
\frac{d\mu_\xi}{dP^W}(w)=\exp\Bigl(-\frac12\int_0^1w_t\,dw_t-\frac18\int_0^1w_t^2\,dt\Bigr),\tag{10.9}
$$
where by the Itô formula we obtain
$$
\int_0^1w_s\,dw_s=\frac12\bigl(w_1^2-1\bigr).
$$
However, we shall derive the same expression from the previous results. The measure $\mu_\xi$ on $C[0,1]$ is the image of $P^W$ under the linear operator $T$
defined by the equation
$$
Tx(t)=x(t)-\frac12\int_0^tTx(s)\,ds.
$$
This equation is uniquely solvable. The inverse operator $S$ is given by the formula
$$
Sx(t)=x(t)+\frac12\int_0^tx(s)\,ds.
$$
It is readily seen that $Q=S-I$ is a nuclear operator on $H=H(P^W)$ and its complexification has no eigenvalues. In addition,
$$
4|Qx|_H^2=\|x\|_{L^2[0,1]}^2
$$
and
$$
\delta Q(x)=-\sum_{n=1}^{\infty}(Qx,e_n)_H\,\widehat{e}_n(x),
$$
which can be written as
$$
-\frac12\sum_{n=1}^{\infty}(x,e_n')_{L^2[0,1]}\int_0^1e_n'(s)\,dx(s)=-\frac12\int_0^1x(s)\,dx(s),
$$
where $\{e_n\}$ is any orthonormal basis in $H$ (i.e., $\{e_n'\}$ is a basis in $L^2[0,1]$). Now formula (10.9) follows from (10.5).

The next important class of transformations consists of mappings of the form $T(x)=x+F(x)$, where $F$ takes values in the Cameron–Martin space $H$. Integral and differential stochastic equations furnish examples of such transformations. For example, the famous Girsanov theorem [70] on the equivalence of distributions of diffusion processes with drifts to the Wiener measure can be stated in these terms. Not every transformation of the indicated form takes $\gamma$ to an absolutely continuous measure (examples can be easily constructed, see [21, Chapter 6]), but the converse is true in the following sense (see [23], [24]).

Theorem 10.22. If a probability measure $\nu$ is absolutely continuous with respect to $\gamma$, then one can find a Borel mapping $F\colon X\to H$ such that $\nu=\gamma\circ T^{-1}$, where $T(x)=x+F(x)$.

There are several sets of conditions ensuring that $\gamma\circ T^{-1}\ll\gamma$. We include two of them in the following theorem (the more difficult part (i) is essentially due to [91] and deserves further consideration with respect to a possible weakening of the assumption of continuity of the derivative). The hypotheses can be slightly relaxed (say, by passing to local Sobolev classes), but essentially they represent the best achievements so far.
It is known (see [21, Chapter 5]) that if $F$ is $\gamma$-measurable and there is a number $C$ such that for every $h\in H$ one has $|F(x+h)-F(x)|_H\le C|h|_H$ a.e., then $F$ has a version that is Lipschitzian along $H$ with constant $C$, and this version is almost everywhere Gâteaux differentiable along $H$, so that $D_HF(x)$ exists almost everywhere.

Theorem 10.23. Any of the following conditions implies that $\gamma\circ T^{-1}\ll\gamma$.
(i) One has $F\in W^{2,1}(\gamma,H)$ and, for almost every $x$, the mapping
$$
h\mapsto F(x+h),\quad H\to H,
$$
is Fréchet differentiable, the mapping $h\mapsto D_HF(x+h)$ with values in $\mathcal{H}$ is continuous, and $D_HF(x)$ has no eigenvalue $-1$.
(ii) There is a constant $\lambda<1$ such that, for every $h\in H$, one has
$$
|F(x+h)-F(x)|_H\le\lambda|h|_H\quad\text{a.e.},
$$
the operator $D_HF(x)$ (which exists a.e.) is a Hilbert–Schmidt operator, and for some constant $M$ one has $\|D_HF(x)\|_{\mathcal{H}}\le M$.

In the finite-dimensional case it suffices that $I+DF(x)$ be invertible almost everywhere. It is still unknown whether this is enough in infinite dimensions. Various families of such transformations (such as flows) have been an object of intensive study starting from [51]; typical results and references can be found in [8], [21], [24] and [29].

Yet another class of recently studied transformations of a similar type comes from the so-called Monge–Kantorovich problem on a Gaussian space (see [25], [26], [59], [63], [64] and [27]). Given a probability measure $\mu$ equivalent to $\gamma$ (or just absolutely continuous, but we assume equivalence for simplicity) and another probability measure $\nu$ of the same type, the problem is to find a Borel mapping $T$ taking $\mu$ to $\nu$ and minimizing
$$
M(\mu,\nu,T)=\int_X|T(x)-x|_H^2\,\mu(dx)
$$
in the class of mappings $T$ with $\nu=\mu\circ T^{-1}$. The very formulation suggests that $T(x)-x\in H$. To be more precise, this is an analog of the finite-dimensional Monge problem, while the Kantorovich problem is the minimization of the integral
$$
K(\mu,\nu,\sigma)=\int_{X\times X}|y-x|_H^2\,\sigma(dx\,dy)
$$
over all probability measures $\sigma$ on $X\times X$ with projections $\mu$ and $\nu$ on the factors. Formally speaking, of course, the infima are considered, but it turns out that under rather broad assumptions both are attained. Clearly, the Monge infimum cannot be smaller than the Kantorovich one, because for any transformation $T$ of $\mu$ into $\nu$ we obtain the image $\sigma$ of $\mu$ under the mapping $x\mapsto(x,T(x))$, the projections of $\sigma$ are exactly $\mu$ and $\nu$, and $K(\mu,\nu,\sigma)=M(\mu,\nu,T)$. Moreover, under broad assumptions the minima in the two problems coincide and the optimal mapping has quite a reasonable structure $T=I+D_H\varphi$, where $\varphi$ is a real function with some Sobolev-type regularity. In the recent paper [46] the existence of a solution to the Monge problem has been established under the sole assumption that both measures $\mu$ and $\nu$ are absolutely continuous with respect to $\gamma$. Interesting applications of optimal mappings to inequalities for Gaussian measures are discussed in [49].

The previous classes of transformations are defined without reference to any special form of the measure $\gamma$. However, if $\gamma$ is the standard Gaussian measure on $\mathbb{R}^\infty$, then a new interesting class of mappings arises. Let us call $T=(T_k)_{k=1}^{\infty}\colon\mathbb{R}^\infty\to\mathbb{R}^\infty$ a triangular mapping if, for each $k$, its component $T_k$ is a function of $x_1,\ldots,x_k$: $T_k(x)=T_k(x_1,\ldots,x_k)$, where $T_k$ is a function on $\mathbb{R}^k$ (denoted by the same symbol). If $T$ is differentiable (which is not assumed), then its derivative is represented by a triangular matrix. A triangular mapping $T$ is called increasing if each function $x_k\mapsto T_k(x_1,\ldots,x_k)$ is increasing (no monotonicity in the other variables is assumed). Therefore, in the case of a differentiable increasing triangular mapping its derivative has nonnegative diagonal elements. If no other conditions are imposed, the class of increasing triangular transformations is so large that the following result is true.

Theorem 10.24. Let $\mu$ be a probability measure absolutely continuous with respect to $\gamma$. Then, for an arbitrary Borel probability measure $\nu$ on $\mathbb{R}^\infty$, there is an increasing triangular Borel transformation $T$ with $\nu=\mu\circ T^{-1}$, and such a mapping is unique up to a modification.

The mapping $T$ described in this theorem is called the canonical triangular transformation and is denoted by $T_{\mu,\nu}$. Such mappings possess certain stability properties. For example, if absolutely continuous probability measures $\mu_j$ converge in variation to $\mu$ and Borel probability measures $\nu_j$ converge in variation to $\nu$, then the mappings $T_{\mu_j,\nu_j}$ converge to $T_{\mu,\nu}$ in measure $\mu$.
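A finite-dimensional Gaussian case, chosen here for illustration and not the general setting of Theorem 10.24, makes the increasing triangular mapping explicit: if $\nu=N(0,\Sigma)$ on $\mathbb{R}^3$, then $x\mapsto Lx$ with $L$ the lower-triangular Cholesky factor of $\Sigma$ (positive diagonal) is triangular, increasing in the last variable of each component, and pushes the standard Gaussian measure to $\nu$. By the uniqueness part of the finite-dimensional analogue, it should be the canonical map in this special case. The matrix `Sigma` below is a hypothetical example.

```python
import numpy as np

# Hypothetical target: nu = N(0, Sigma) on R^3 (finite-dimensional Gaussian case only).
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 0.5]])

# NumPy returns the lower-triangular factor L with Sigma = L L^T and positive diagonal.
L = np.linalg.cholesky(Sigma)

def T(x):
    """Triangular map: component k uses only x_1..x_k and increases in x_k (L[k, k] > 0)."""
    return L @ x

print(np.allclose(L, np.tril(L)), bool(np.all(np.diag(L) > 0)))

# Push the standard Gaussian measure forward and compare covariances.
rng = np.random.default_rng(2)
x = rng.standard_normal((200_000, 3))
y = x @ L.T                      # row i equals T(x[i])
print(np.round(np.cov(y, rowvar=False), 2))
```

For non-Gaussian targets the components $T_k$ are nonlinear (built from conditional distribution functions), so this linear picture should not be read as the general construction.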
More can be said if $\mu=\gamma$ and $\nu\ll\gamma$. For example, if $\nu$ has finite entropy, i.e., for $\varrho=d\nu/d\gamma$ we have $\varrho\log\varrho\in L^1(\gamma)$, then $T_{\mu,\nu}(x)=x+F(x)$, where $F(x)\in H$ a.e. It is not known whether this is true without finite entropy. In the case of finite entropy Talagrand [129] established the following useful transport inequality employed in the proof of the previous theorem:
$$
\int_X|T(x)-x|_H^2\,\gamma(dx)\le2\int_X\varrho\log\varrho\,d\gamma.
$$
Our previous discussion has been concerned with infinite-dimensional transformations. However, many important problems are connected with the images of Gaussian measures under nonlinear mappings to finite-dimensional spaces or even to $\mathbb{R}$. In typical problems it is of interest to have conditions that ensure the existence of densities of such images and certain properties of these densities such as boundedness, smoothness, positivity, and rate of decay. One of the most powerful tools in this area is the Malliavin calculus [105], [106], which can be regarded as an extension to infinite dimensions of the ideas and methods of the theory of Sobolev spaces and geometric measure theory. There are several books and surveys devoted to diverse aspects of the Malliavin calculus, see [16], [17], [18], [20], [21], [24], [35], [41], [52], [58], [86], [111], [112], [122] and [132].

Let us mention three results on finite-dimensional images of a centered Radon Gaussian measure $\gamma$ on a locally convex space $X$; as always, one can assume that this is the standard product Gaussian measure on $\mathbb{R}^\infty$.

Theorem 10.25. Let $F=(F_1,\ldots,F_n)\colon X\to\mathbb{R}^n$, where $F_i\in W^{1,1}(\gamma)$. Suppose that the mapping $D_HF(x)\colon H\to\mathbb{R}^n$ is surjective for $\gamma$-a.e. $x$. Then the measure $\gamma\circ F^{-1}$ is absolutely continuous.

The condition of surjectivity of $D_HF$ a.e. can be restated as a.e. nondegeneracy of the so-called Malliavin matrix $M=(M_{ij})_{i,j\le n}$ with entries
$$
M_{ij}=(D_HF_i,D_HF_j)_H,\quad i,j\le n.
$$
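The simplest case of the Malliavin matrix is that of linear functionals $F_i(x)=(a_i,x)$ on a finite-dimensional Gaussian space, chosen here purely for illustration: then $D_HF_i=a_i$ is constant, $M$ is the Gram matrix of the vectors $a_i$, and the image measure is $N(0,M)$. It has a density exactly when $M$ is nondegenerate, i.e., when $D_HF$ is surjective. The vectors below are hypothetical.

```python
import numpy as np

# Hypothetical example: standard Gaussian measure on R^4 and linear functionals
# F_i(x) = (a_i, x), so that D_H F_i = a_i is constant.
a = np.array([[1.0, 0.0, 2.0, 0.0],
              [0.0, 1.0, 1.0, -1.0]])   # rows a_1, a_2

M = a @ a.T                             # Malliavin matrix M_ij = (D_H F_i, D_H F_j)
print(M, np.linalg.det(M))              # M = [[5, 2], [2, 3]], det M = 11 (up to rounding)

rng = np.random.default_rng(3)
x = rng.standard_normal((200_000, 4))
F = x @ a.T                             # samples of (F_1, F_2)
print(np.round(np.cov(F, rowvar=False), 2))   # close to M: the image law is N(0, M)

# Degenerate case: a_2 = 2 a_1 gives det M = 0 and the image measure sits on a line.
a_deg = np.array([[1.0, 0.0, 2.0, 0.0],
                  [2.0, 0.0, 4.0, 0.0]])
print(np.linalg.det(a_deg @ a_deg.T))
```

For nonlinear $F$ the matrix $M$ depends on $x$, and Theorems 10.25 and 10.26 require its nondegeneracy (or integrability of $1/\det M$) almost everywhere rather than at a single point.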
Theorem 10.26. Suppose that in the previous theorem $F_i\in W^{p,r}(\gamma)$ for all $p\in[1,\infty)$ and $r\in\mathbb{N}$ and that the Malliavin matrix $M$ has the property that $1/\det M\in L^p(\gamma)$ for all $p\in[1,\infty)$. Then the density of the induced measure $\gamma\circ F^{-1}$ is infinitely differentiable and is rapidly decreasing with all its derivatives.

Weaker assumptions are needed to guarantee that the induced measure has a Sobolev density. For example, its density is in the class $W^{1,1}(\mathbb{R}^n)$ if $F_i\in W^{p,2}(\gamma)$ and $1/\det M\in L^p(\gamma)$ for all $p<\infty$ (some $p$ depending on $n$ suffices). Let us explain the main idea in the case $n=1$. We want to show
that $\mu=\gamma\circ F^{-1}$ has a Sobolev class density $\varrho$. To this end we show that the distributional derivative of $\mu$ is given by a function. For $\varphi\in C_0^\infty(\mathbb{R})$ we have
$$
\int_{\mathbb{R}}\varphi'\,d\mu=\int_X\varphi'\circ F\,d\gamma=\int_X\frac{(D_H(\varphi\circ F),D_HF)_H}{|D_HF|_H^2}\,d\gamma.
$$
By the integration by parts formula the right-hand side equals
$$
-\int_X\varphi(F)\Bigl[\frac{LF}{|D_HF|_H^2}-\frac{2(D_H^2F\cdot D_HF,D_HF)_H}{|D_HF|_H^4}\Bigr]\,d\gamma.
$$
If the function
$$
G:=\frac{LF}{|D_HF|_H^2}-\frac{2(D_H^2F\cdot D_HF,D_HF)_H}{|D_HF|_H^4}
$$
is in $L^1(\gamma)$, then we obtain that $\mu$ has a density $\varrho$ of bounded variation; in particular, $\varrho$ is bounded. Using the estimate
$$
|G|\le|LF|\cdot|D_HF|_H^{-2}+2\|D_H^2F\|_{L(H)}|D_HF|_H^{-2},
$$
we see that in addition to the inclusion $F\in W^{2,2}(\gamma)$ it is enough to have the inclusion $|D_HF|_H^{-4}\in L^1(\gamma)$ in order to guarantee that $\varrho\in W^{1,1}(\mathbb{R})$. If $G\in L^q(\gamma)$, then the obtained integral is estimated by $\|\varphi\|_{L^p(\mu)}\|G\|_{L^q(\gamma)}$ with $p=q/(q-1)$, which is further estimated by $C\|\varphi\|_{L^p(\mathbb{R})}$ (since $\varrho$ is bounded) and yields that $\varrho\in W^{q,1}(\mathbb{R})$. For example, $G\in L^q(\gamma)$ if
$$
F\in W^{2q,2}(\gamma)\quad\text{and}\quad|D_HF|_H^{-4q}\in L^1(\gamma),
$$
which can easily be restated in terms of $p$. Higher order smoothness of $\varrho$ is established inductively under the respective assumptions about $F$ (which can also be stated in terms of $G$). Let us consider some examples.

Example 10.27. (i) Let $F\in\mathcal{X}_2$, which means that $F$ is of the form
$$
F=\sum_{n=1}^{\infty}\alpha_n\bigl(\widehat{e}_n^{\,2}-1\bigr),
$$
where $\{e_n\}$ is an orthonormal basis in $H$ and $\sum_{n=1}^{\infty}\alpha_n^2<\infty$. Hence it suffices to consider the case where
$$
F(x)=\sum_{n=1}^{\infty}\alpha_n(x_n^2-1)
$$
on the space $\mathbb{R}^\infty$ with the standard Gaussian measure. Suppose that not all $\alpha_n$ vanish. Certainly, the distribution of $F$ can be written as an infinite
convolution, but its differentiability properties are more easily investigated through an analysis of $D_HF$ without explicit formulas. Clearly, $D_HF(x)=2(\alpha_nx_n)_{n=1}^{\infty}$. Hence
$$
|D_HF(x)|_H^2=4\sum_{n=1}^{\infty}\alpha_n^2x_n^2,
$$
so $|D_HF(x)|_H^2>0$ a.e. and $\gamma\circ F^{-1}$ has a density $\varrho$. However, this density need not be smooth (say, if $\alpha_1=1$ and $\alpha_n=0$ for $n>1$). If there are infinitely many nonzero $\alpha_n$, then one can show that
$$
\gamma\Bigl(x\colon\ \sum_{n=1}^{\infty}\alpha_n^2x_n^2\le\varepsilon\Bigr)=o(\varepsilon^k)\quad\forall\,k\in\mathbb{N},
$$
hence $|D_HF|_H^{-1}\in L^p(\gamma)$ for all $p<\infty$, which implies that $\varrho\in C_b^\infty(\mathbb{R})$.
(ii) Let $\psi\in C^\infty(\mathbb{R})$ and
$$
F(x)=\int_0^1\psi(x(t))\,dt
$$
on the classical Wiener space. Suppose that the derivatives of $\psi$ have at most polynomial growth. Then $F\in W^{p,r}(\gamma)$ for all finite $p$ and $r$. Assume also that $\psi'(0)\ne0$. It is readily verified that
$$
D_HF(x)h=\int_0^1\psi'(x(t))h(t)\,dt,\quad h\in W_0^{2,1}[0,1].
$$
In addition, $P^W\bigl(x\colon|D_HF(x)|_H^2\le\varepsilon\bigr)=o(\varepsilon^k)$ for every $k$. Indeed, if $\psi'(0)>0$, then there is $c>0$ such that $\psi'(s)>c$ whenever $s\in[-c,c]$. With probability $1-o(\varepsilon^k)$ we have $|x(t)|\le c$ for all $t\in[0,\varepsilon^{1/3}]$, since the $1/4$-Hölder norm of the Brownian path has all moments (which is true for any measurable seminorm), so
$$
P^W\Bigl(x\colon\sup_{s\in[0,1]}s^{-1/4}|x(s)|\ge R\Bigr)\le C_kR^{-k}
$$
with some constants $C_k$. Let $r=\varepsilon^{1/3}$. Taking $h$ with support in $[0,r]$ such that
$$
h(t)=\begin{cases}3r^{-1}t, & t\in[0,r/3],\\ 1, & t\in[r/3,2r/3],\\ 1-3r^{-1}(t-2r/3), & t\in[2r/3,r],\end{cases}
$$
we see that the norm of $r^{1/2}h$ in $H$ is estimated by some constant and that $\psi'(x(t))\,r^{1/2}h(t)\ge cr^{1/2}$ on $[r/3,2r/3]$ if $|x(t)|\le c$ for all $t\in[0,r]$. Hence for such $x$ we have $|D_HF(x)(r^{1/2}h)|\ge Cr^{3/2}=C\varepsilon^{1/2}$, where $C$ is a constant. It follows that the induced measure has a density in $C_b^\infty(\mathbb{R})$.

(iii) The situation is different for the stochastic integral
$$
F(x)=\int_0^1\psi(x(t))\,dx(t),
$$
where $\psi$ satisfies the same conditions as in (ii). For example, if $\psi(s)=s$, then $F(x)=x(1)^2/2-1/2$ and the distribution density is not smooth. However, letting
$$
\Psi(t)=\int_0^t\psi(s)\,ds,
$$
by the Itô formula we find that
$$
\Psi(x(1))=\int_0^1\Psi'(x(t))\,dx(t)+\frac12\int_0^1\Psi''(x(t))\,dt=F(x)+\frac12\int_0^1\psi'(x(t))\,dt.
$$
Therefore, if $h\in W_0^{2,1}[0,1]$ has support in $[0,1/2]$, we have
$$
\partial_hF(x)=-\frac12\int_0^1\psi''(x(t))\,h(t)\,dt,
$$
so under the assumption that $\psi''(0)\ne0$ we obtain, similarly to the previous example, that the distribution density is smooth.

Theorem 10.28. Let $F=(F_1,\ldots,F_n)\colon X\to\mathbb{R}^n$, where $F_i\in W^{p,2}(\gamma)$ for all $p<\infty$. Suppose that there is $\varepsilon>0$ such that the functions
$$
\exp\bigl(\varepsilon|\det M|^{-4}\bigr),\quad\exp\bigl(\varepsilon|D_HF_i|_H^{16n-12}\bigr),\quad\exp\bigl(\varepsilon|LF_i|^2\bigr),\quad\exp\bigl(\varepsilon\|D_H^2F_i\|_{L(H)}^4\bigr)
$$
are in $L^1(\gamma)$. Then $\gamma\circ F^{-1}$ admits a continuous density $\varrho$ such that
$$
\varrho(x)\ge\exp\bigl(-c_1\exp(c_2|x|)\bigr),\quad\text{where }c_1,c_2>0.\tag{10.10}
$$
If $n=1$, then it suffices that
$$
\exp\bigl(\varepsilon|D_HF|_H^{-4}\bigr),\quad\exp\bigl(\varepsilon|LF|^2\bigr),\quad\exp\bigl(\varepsilon\|D_H^2F\|_{L(H)}^2\bigr)\in L^1(\gamma).
$$
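Proof. Let us apply the following result proved in [33]. Let $\mu=\varrho\,dx$ be a probability measure on $\mathbb{R}^n$ such that $\varrho\in W^{1,1}_{loc}(\mathbb{R}^n)$ and
$$
\exp\bigl(\kappa|\nabla\varrho/\varrho|\bigr)\in L^1(\mu)\quad\text{with some }\kappa>0.\tag{10.11}
$$
Then the continuous version of $\varrho$ satisfies (10.10). Let us verify that (10.11) is fulfilled under our assumptions. It suffices to show that there is $C>0$ such that
$$
\|\partial_{x_i}\varrho/\varrho\|_{L^p(\mu)}\le Cp\quad\forall\,p\in\mathbb{N},
$$
where $\mu=\gamma\circ F^{-1}$, because then the integral of $\sum_{p=0}^{\infty}\varepsilon^p|\partial_{x_i}\varrho/\varrho|^p/p!$ against $\mu$ is estimated by $\sum_{p=0}^{\infty}\varepsilon^pC^pp^p/p!$, which is finite for sufficiently small $\varepsilon$ by Stirling's formula, so that $\exp(\varepsilon|\partial_{x_i}\varrho/\varrho|)$ is $\mu$-integrable for small $\varepsilon$, whence we obtain the integrability of $\exp(\varepsilon_0|\nabla\varrho/\varrho|)$ for yet smaller $\varepsilon_0$. Hence the desired estimate is equivalent to the estimate
$$
\Bigl|\int_{\mathbb{R}^n}\frac{\partial_{x_i}\varrho}{\varrho}\,\varphi\,d\mu\Bigr|\le Cp\,\|\varphi\|_{L^q(\mu)},\quad\varphi\in C_0^\infty,\quad q=\frac{p}{p-1}.
$$
The left side can be written as
$$
\int_{\mathbb{R}^n}\partial_{x_i}\varrho\,\varphi\,dx=-\int_{\mathbb{R}^n}\partial_{x_i}\varphi\,\varrho\,dx=-\int_X\partial_{x_i}\varphi\circ F\,d\gamma.\tag{10.12}
$$
As above, we consider the vector fields $v_j=D_HF_j$ on $X$, $j=1,\ldots,n$. Let $M^{-1}=(N^{ij})_{i,j\le n}$. We have
$$
\partial_{x_i}\varphi\circ F=\sum_{j,k\le n}N^{ij}M_{jk}\,\partial_{x_k}\varphi\circ F=\sum_{j=1}^{n}N^{ij}\partial_{v_j}(\varphi\circ F).
$$
Therefore, integrating by parts we obtain that the right-hand side of (10.12) equals
$$
\sum_{j=1}^{n}\int_X\varphi(F(x))\,d_{v_j}(N^{ij}\cdot\gamma)(dx),
$$
where $d_{v_j}(N^{ij}\cdot\gamma)$ is the derivative of the measure $N^{ij}\cdot\gamma$ along the vector field $v_j$, which is the measure given with respect to $\gamma$ by the density
$$
r_{ij}:=\partial_{v_j}N^{ij}-N^{ij}\delta v_j=(D_HF_j,D_HN^{ij})_H-N^{ij}LF_j.
$$
Now we have to estimate the $L^p$-norm of this function. The function $N^{ij}$ is estimated by $(\det M)^{-1}\max_{ij}|M_{ij}|^{n-1}$. The vector $D_HN^{ij}$ is a linear combination of the elements of the form $(\det M)^{-2}G\,D_HM_{kl}$, where $G$ is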
the product of $2n-2$ matrix elements of $M$. Finally, the $H$-norm of the term $D_HM_{kl}=D_H(D_HF_k,D_HF_l)_H$ is estimated by the following sum:
$$
\|D_H^2F_k\|_{L(H)}|D_HF_l|_H+\|D_H^2F_l\|_{L(H)}|D_HF_k|_H.
$$
Therefore, $r_{ij}$ can be estimated by
$$
C\Bigl[\max_j|LF_j|^2+(\det M)^{-4}+\max_j|D_HF_j|_H^{8n-8}+\max_j\|D_H^2F_j\|_{L(H)}^4+\max_j|D_HF_j|_H^{16n-12}\Bigr].
$$
It follows from our assumptions that there is $\varepsilon>0$ such that $\exp(\varepsilon r_{ij})$ is integrable. Hence there is $C>0$ such that $\|r_{ij}\|_{L^p(\gamma)}^p\le Cp!\le Cp^p$, which yields the desired conclusion.

In the case $n=1$ there are some simplifications due to the fact that there is only one field $v=D_HF$, so $M=(D_HF,D_HF)_H$ and
$$
\frac{d_v(M^{-1}\gamma)}{d\gamma}=M^{-1}\bigl[M^{-1}(D_H^2F\cdot D_HF,D_HF)_H-LF\bigr],
$$
which is estimated by $M^{-1}\|D_H^2F\|_{L(H)}+M^{-1}|LF|$.
For polynomials the following very interesting result has been recently proved in [110].

Theorem 10.29. Let $F_1,\ldots,F_n$ be measurable polynomials. If the measure on $\mathbb{R}^n$ induced by $(F_1,\ldots,F_n)$ is not absolutely continuous, then there is a nonzero polynomial $\psi$ on $\mathbb{R}^n$ such that $\psi(F_1,\ldots,F_n)=0$ a.e.

Distributions of nonlinear functionals are also related to surface measures in infinite dimensions, see [5], [21], [24], [61] and [82].

11. Convexity

Gaussian measures possess various convexity properties expressed by means of inequalities related to measures of specific sets or distributions of specific functionals. A number of useful theorems of this sort state that for some function $\varphi(\cdot,\cdot,\cdot)$ the inequality
$$
\gamma(\lambda A+(1-\lambda)B)\ge\varphi(\lambda,\gamma(A),\gamma(B))\quad\forall\,\lambda\in[0,1]
$$
holds true for all $A$ and $B$ from a certain class of sets. The following fundamental result is called Ehrhard's inequality; it was originally discovered by Ehrhard for convex sets, later extended in [93] to the case where only one of the two sets $A$ and $B$ is convex, and finally settled by Borell [38]
for arbitrary pairs of Borel sets. We recall that a set $A$ is called convex if $ta+(1-t)b\in A$ whenever $a,b\in A$ and $t\in[0,1]$. A set $A$ is called symmetric if $A=-A$. The formulation involves the normal distribution function
$$
\Phi(t)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t}e^{-s^2/2}\,ds.
$$
We set $\Phi^{-1}(0)=-\infty$ and $\Phi^{-1}(1)=+\infty$.

Theorem 11.1. Let $A$ and $B$ be two Borel sets in $\mathbb{R}^n$ and $\gamma_n$ the standard Gaussian measure on $\mathbb{R}^n$. Then one has for all $\lambda\in[0,1]$:
$$
\Phi^{-1}\bigl(\gamma_n(\lambda A+(1-\lambda)B)\bigr)\ge\lambda\Phi^{-1}\bigl(\gamma_n(A)\bigr)+(1-\lambda)\Phi^{-1}\bigl(\gamma_n(B)\bigr).\tag{11.1}
$$
In the next theorem we use the inner measure $\gamma_*$ defined as
$$
\gamma_*(S)=\sup\{\gamma(K)\colon K\subset S\text{ is compact}\}
$$
just because in the general case one cannot guarantee the measurability of the vector sum of two Borel sets. However, for all reasonable measures occurring in applications such measurability holds (e.g., if $X$ is separable complete metrizable or Souslin).

Theorem 11.2. Let $\gamma$ be a centered Radon Gaussian measure on a locally convex space $X$. Then, for arbitrary Borel sets $A$ and $B$ and all $\lambda\in[0,1]$, one has the Brunn–Minkowski inequality
$$
\gamma_*(\lambda A+(1-\lambda)B)\ge\gamma(A)^{\lambda}\gamma(B)^{1-\lambda}.\tag{11.2}
$$
If, in addition, $A$ and $B$ are convex, then
$$
\Phi^{-1}\bigl(\gamma_*(\lambda A+(1-\lambda)B)\bigr)\ge\lambda\Phi^{-1}\bigl(\gamma(A)\bigr)+(1-\lambda)\Phi^{-1}\bigl(\gamma(B)\bigr).\tag{11.3}
$$
If $X$ is separable complete metrizable (or Souslin), then $\lambda A+(1-\lambda)B$ is measurable, so that one can use $\gamma$ in place of $\gamma_*$.

Corollary 11.3. In the situation of the previous theorem, for every symmetric convex set $A$ and every vector $a$ such that the sets $A$ and $A+a$ are $\gamma$-measurable, one has $\gamma(A+a)\le\gamma(A)$. More generally, if $A+ta$ is $\gamma$-measurable for every $t\in[0,1]$, then
$$
\gamma(A+a)\le\gamma(A+ta)\quad\forall\,t\in[0,1].\tag{11.4}
$$
Furthermore, the function
$$t \mapsto \int_X f(x + ta)\,\gamma(dx)$$
is nondecreasing on $[0, +\infty)$ if $f$ is such that the sets $\{f \le c\}$, $c \in \mathbb{R}^1$, are symmetric and convex and $f(\cdot + ta)$ is $\gamma$-integrable for every $t \ge 0$.

Corollary 11.4. Let $A$ be a $\gamma$-measurable convex set of positive measure. Then the topological support of the restriction of $\gamma$ to $A$ is convex.

On Gaussian Brunn–Minkowski inequalities see also [15] and [69]. Our next aim is the following isoperimetric inequality due to Sudakov and Tsirel'son [127] and Borell [36].

Theorem 11.5. Let $\gamma_n$ be the standard Gaussian measure on $\mathbb{R}^n$ and let $U$ be the closed unit ball in $\mathbb{R}^n$ centered at the origin. For every measurable set $A \subset \mathbb{R}^n$ one has
$$\Phi^{-1}\bigl(\gamma_n(A + rU)\bigr) \ge \Phi^{-1}\bigl(\gamma_n(A)\bigr) + r \quad \forall\, r > 0. \tag{11.5}$$
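In dimension one, Theorems 11.1 and 11.5 can be sanity-checked on intervals, because $\lambda A + (1-\lambda)B$ and $A + rU$ (with $U = [-1,1]$) are again intervals whose Gaussian measure is explicit. The following Python sketch is only an illustration under that restriction, not part of the theory; the helper names `gauss_interval`, `ehrhard_gap` and `isoperimetric_gap` are hypothetical. It uses `statistics.NormalDist` for $\Phi$ and $\Phi^{-1}$ and returns the difference between the left and right sides of (11.1) and (11.5), which should be nonnegative up to rounding. Note that `inv_cdf` accepts only probabilities strictly between 0 and 1, so the intervals must have measure in $(0,1)$.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal: PHI.cdf is Phi, PHI.inv_cdf is Phi^{-1}


def gauss_interval(a: float, b: float) -> float:
    """Standard Gaussian measure gamma_1 of the interval [a, b]."""
    return PHI.cdf(b) - PHI.cdf(a)


def ehrhard_gap(A: tuple[float, float], B: tuple[float, float], lam: float) -> float:
    """Left side minus right side of (11.1) for intervals A, B in R^1.

    For intervals, lam*A + (1 - lam)*B is again an interval whose endpoints
    are the corresponding convex combinations of the endpoints of A and B.
    """
    (a1, b1), (a2, b2) = A, B
    mixed = gauss_interval(lam * a1 + (1 - lam) * a2, lam * b1 + (1 - lam) * b2)
    lhs = PHI.inv_cdf(mixed)
    rhs = (lam * PHI.inv_cdf(gauss_interval(a1, b1))
           + (1 - lam) * PHI.inv_cdf(gauss_interval(a2, b2)))
    return lhs - rhs


def isoperimetric_gap(A: tuple[float, float], r: float) -> float:
    """Left side minus right side of (11.5) for an interval A and U = [-1, 1]."""
    a, b = A
    enlarged = gauss_interval(a - r, b + r)  # A + rU = [a - r, b + r]
    return PHI.inv_cdf(enlarged) - (PHI.inv_cdf(gauss_interval(a, b)) + r)
```

For instance, `ehrhard_gap((-1.0, 0.5), (0.2, 2.0), 0.3)` comes out positive, and the gap vanishes when $A = B$. Intervals are only a spot check: the theorems concern arbitrary Borel sets, and equality in (11.5) is attained by half-spaces (half-lines when $n = 1$), which this sketch does not cover.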
Note that inequality (11.5) can be written as $\gamma_n(A + rU) \ge \Phi(a + r)$, where $a = \Phi^{-1}(\gamma_n(A))$. Here $\Phi(a + r)$ is the measure of the set $\Pi + rU$, where $\Pi$ is a half-space having the same measure as $A$; thus, among all sets of a given measure, half-spaces have $r$-neighborhoods of the smallest measure. If we define the surface measure of $A$ as the limit of the ratio $r^{-1}\bigl(\gamma_n(A + rU) - \gamma_n(A)\bigr)$ as $r \to 0$, then (11.5) shows that half-spaces have the minimal surface measure among all sets of a given positive measure.

Theorem 11.6. Let $\gamma$ be a Radon Gaussian measure on a locally convex space $X$, let $A$ be a $\gamma$-measurable set, and let $U_H$ be the closed unit ball in the Hilbert space $H = H(\gamma)$. Then
$$\gamma(A + tU_H) \ge \Phi(a + t) \quad \forall\, t \ge 0,$$
where $a$ is chosen such that $\Phi(a) = \gamma(A)$. If $\gamma(A) \ge 1/2$, then
$$\gamma(A + rU_H) \ge \Phi(r) \ge 1 - \frac{1}{2}\exp\Bigl(-\frac{r^2}{2}\Bigr). \tag{11.6}$$
It follows that for every $\alpha > 0$ there exist $r_0(\alpha) \ge 0$ and a real number $c(\alpha)$ such that, whenever $\gamma(A) = \alpha$, for all $r \ge r_0(\alpha)$ one has
$$\gamma(A + rU_H) \ge 1 - \exp\Bigl(-\frac{r^2}{2} + c(\alpha)\, r\Bigr). \tag{11.7}$$
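The numerical content of (11.6) and (11.7) is a Gaussian tail estimate. The Python sketch below (hypothetical helper names; the explicit constants are one admissible choice derived here for illustration, not taken from the source) checks the tail bound $1 - \Phi(r) \le \frac12 e^{-r^2/2}$ for $r \ge 0$ and shows that, for $0 < \alpha < 1$, one may take $r_0(\alpha) = \max(0, -\Phi^{-1}(\alpha))$ and $c(\alpha) = -\Phi^{-1}(\alpha)$ in (11.7): if $a = \Phi^{-1}(\alpha)$ and $a + r \ge 0$, then $1 - \Phi(a + r) \le \frac12 e^{-(a+r)^2/2} \le e^{-r^2/2 - ar}$, and $\gamma(A + rU_H) \ge \Phi(a + r)$ by Theorem 11.6.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution on R^1


def gaussian_tail(x: float) -> float:
    """1 - Phi(x), computed via erfc to avoid cancellation for large x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def tail_bound_11_6(r: float) -> float:
    """Bound from (11.6) written as a tail estimate: 1 - Phi(r) <= exp(-r^2/2) / 2."""
    return 0.5 * math.exp(-r * r / 2.0)


def constants_11_7(alpha: float) -> tuple[float, float]:
    """One admissible pair (r0(alpha), c(alpha)) for (11.7), assuming 0 < alpha < 1.

    With a = Phi^{-1}(alpha) and r >= r0 = max(0, -a) we have a + r >= 0, hence
    1 - Phi(a + r) <= exp(-(a + r)^2 / 2) / 2 <= exp(-r^2/2 - a*r),
    so c(alpha) = -a works.
    """
    a = PHI.inv_cdf(alpha)
    return max(0.0, -a), -a
```

The factor $\frac12$ and the term $-a^2/2$ are simply discarded in the second step, which is why $c(\alpha)$ can be read off directly from $\Phi^{-1}(\alpha)$; sharper constants are possible but not needed for (11.7).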
A measurable function $f$ on a locally convex space $X$ with a Radon Gaussian measure $\gamma$ is called a $\gamma$-measurable convex function if it has a modification $f_0 : X \to \mathbb{R}^1$ that is convex in the usual sense, i.e.,
$$f_0(\lambda x + (1-\lambda)y) \le \lambda f_0(x) + (1-\lambda) f_0(y) \quad \forall\, \lambda \in [0,1],\ \forall\, x, y \in X.$$
Concave functions are defined similarly with "≥" in place of "≤".

Theorem 11.7. Let $F(t) = \gamma(x : f(x) \le t)$, where $\gamma$ is a Radon Gaussian measure on a locally convex space $X$ and $f$ is a $\gamma$-measurable convex function on $X$. Then the function $G : t \mapsto \Phi^{-1}(F(t))$ is concave.

This result applies, in particular, to seminorms; it also yields the following fact (established in [48]).

Corollary 11.8. Suppose that the conditions of Theorem 11.7 are satisfied. Then:
(i) the function $F$ is continuous everywhere except possibly at the point $t_0 = \inf\{t : \gamma(f \le t) > 0\}$;
(ii) the function $F$ is absolutely continuous on the ray $(t_0, +\infty)$ and has a positive derivative $F'$ at all points of this ray except possibly at the points of an at most countable set $S$, at which $F'$ has one-sided limits and jumps down; moreover, $F'$ is continuous on the set $(t_0, +\infty) \setminus S$;
(iii) for every $t_1 > t_0$, the function $F'$, defined on $S$ as the left-hand limit, has bounded variation on $[t_1, +\infty)$; in particular, $F'$ is bounded on $[t_1, +\infty)$.

There exist examples showing that $F$ may have a jump at the point $t_0$ and that $F'$ may be unbounded in a neighborhood of $t_0$ and have jumps at the points of a countable set.

An important example of a measurable convex functional is a measurable seminorm.

Theorem 11.9. Let $\gamma$ be a centered Gaussian Radon measure on $X$. For every $\gamma$-measurable seminorm $q$ on $X$ there exist a sequence $\{f_n\} \subset X_\gamma^*$ and a sequence of numbers $\alpha_n \ge 0$ such that
$$q(x) = \sup_n\, [f_n(x) + \alpha_n] \quad \gamma\text{-a.e.} \tag{11.8}$$
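A concrete instance of Theorem 11.7: for the standard Gaussian measure $\gamma_2$ on $\mathbb{R}^2$ and the convex function $f(x) = |x|$ one has $F(t) = 1 - e^{-t^2/2}$ for $t > 0$ (the Rayleigh law), so $t_0 = 0$. The Python sketch below is an illustration with hypothetical helper names, not a proof: it evaluates $G = \Phi^{-1} \circ F$ on a grid in $(0, 4]$ and returns the largest second difference $G(t-h) - 2G(t) + G(t+h)$, which concavity forces to be nonpositive. In this example $F'(t) = t e^{-t^2/2} > 0$ on $(0, +\infty)$, in agreement with Corollary 11.8(ii).

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution on R^1


def F(t: float) -> float:
    """F(t) = gamma_2(x : |x| <= t) for the Euclidean norm on R^2."""
    if t <= 0.0:
        return 0.0
    return -math.expm1(-t * t / 2.0)  # 1 - exp(-t^2/2), accurate for small t


def G(t: float) -> float:
    """G(t) = Phi^{-1}(F(t)); evaluated only for t > t0 = 0, where 0 < F(t) < 1."""
    return PHI.inv_cdf(F(t))


def max_second_difference(lo: float, hi: float, h: float) -> float:
    """Largest G(t - h) - 2 G(t) + G(t + h) over the grid lo, lo + h, ..., hi.

    Concavity of G (Theorem 11.7) forces every second difference to be <= 0.
    """
    n = int(round((hi - lo) / h))
    values = [G(lo + k * h) for k in range(n + 1)]
    return max(values[k - 1] - 2.0 * values[k] + values[k + 1] for k in range(1, n))
```

Calling `max_second_difference(0.1, 4.0, 0.05)` gives a negative number; the grid stops at $t = 4$ because $F(t)$ approaches 1 rapidly and $\Phi^{-1}$ of numbers that close to 1 loses precision in floating point.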
A useful property of a $\gamma$-measurable function $f$ is its convexity along the Cameron–Martin space, i.e., convexity of the functions $h \mapsto f(x + h)$, $h \in H$, or even the slightly weaker property that the function $t \mapsto f(x + th)$ is convex for every $h \in H$ and $\gamma$-almost every $x$ (the exceptional set may depend on $h$). An important
generalization was introduced in [62], where a $\gamma$-measurable function $f$ is called 1-convex along the Cameron–Martin space if the mapping
$$h \mapsto F_x(h) := f(x + h) + \frac{1}{2}|h|_H^2$$
is convex on $H$, regarded as a mapping with values in the space $L^0(\gamma)$ of measurable functions with its natural ordering. In other words, given $h, k \in H$ and $\alpha \in [0,1]$, one has
$$F_x(\alpha h + (1-\alpha)k) \le \alpha F_x(h) + (1-\alpha) F_x(k) \quad \text{for } \gamma\text{-a.e. } x,$$
where the corresponding measure zero set may depend on $h$, $k$, $\alpha$. It is also possible to regard this mapping as taking values in the Hilbert space $L^2(\sigma)$ for the equivalent measure $\sigma = (f^2 + 1)^{-1}\cdot\gamma$. One can show that for every fixed vector $e_i$ of a given orthonormal basis $\{e_i\}$ there is a version of $f$ such that the functions $t \mapsto f(x + te_i) + t^2/2$ are convex.

Probably the convexity inequality with the most elementary formulation is the so-called Gaussian correlation inequality, which is still an open problem; the special case presented in the next theorem was proved by Hargé [76].

Theorem 11.10. Let $\gamma$ be a centered Radon Gaussian measure on a locally convex space $X$ and let $A$ and $B$ be symmetric convex Borel sets such that $B$ is an ellipsoid of the form $B = \{x : Q(x) \le 1\}$, where $Q$ is a continuous nonnegative quadratic form. Then $\gamma(A \cap B) \ge \gamma(A)\gamma(B)$.

In this direction the following result was proved in [99] (see also [121]).

Theorem 11.11. For each $\varepsilon \in (0,1)$ there is $k_\varepsilon > 0$ such that $\gamma(A \cap B) \ge \gamma((1-\varepsilon)A)\,\gamma(k_\varepsilon B)$ for every pair of symmetric convex Borel sets $A$ and $B$ and every centered Radon Gaussian measure $\gamma$.

Assertion (i) in the next theorem is due to [50]. Assertion (ii), previously known as the S-conjecture, was proved in [95].

Theorem 11.12. (i) Let $\gamma$ be a centered Radon Gaussian measure on a locally convex space $X$ and let $A$ be a symmetric convex Borel set. Then the function $t \mapsto \log \gamma(e^t A)$ is concave on $(0, \infty)$.
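(ii) Suppose, in addition, that $A$ is closed, and let $S = \{x : |l(x)| \le 1\}$, where $l \in X^*$ is such that $\gamma(S) = \gamma(A)$. Then
$$\gamma(tA) \ge \gamma(tS) \ \text{ if } t \ge 1 \quad \text{and} \quad \gamma(tA) \le \gamma(tS) \ \text{ if } 0 \le t \le 1.$$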
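Both assertions of Theorem 11.12 can be illustrated in $\mathbb{R}^2$ with $\gamma = \gamma_2$, taking for $A$ the disk $\{|x| \le \rho\}$ and for $S$ the strip $\{|x_1| \le s\}$ whose half-width $s$ is chosen so that $\gamma_2(S) = \gamma_2(A)$. The Python sketch below (hypothetical helper names; a numerical sanity check on one family of sets, not a proof) uses the closed forms $\gamma_2(tA) = 1 - e^{-t^2\rho^2/2}$ and $\gamma_2(tS) = 2\Phi(ts) - 1 = \operatorname{erf}(ts/\sqrt{2})$ to compare the two sides of assertion (ii), and computes second differences of $\tau \mapsto \log \gamma_2(e^\tau A)$ to test assertion (i) on a grid.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution on R^1


def disk_measure(t: float, rho: float) -> float:
    """gamma_2(tA) for the disk A = {x in R^2 : |x| <= rho}."""
    return -math.expm1(-((t * rho) ** 2) / 2.0)  # 1 - exp(-t^2 rho^2 / 2)


def strip_half_width(rho: float) -> float:
    """Half-width s of the strip S = {|x_1| <= s} with gamma_2(S) = gamma_2(A)."""
    p = disk_measure(1.0, rho)
    return PHI.inv_cdf((1.0 + p) / 2.0)  # solves 2*Phi(s) - 1 = p


def strip_measure(t: float, s: float) -> float:
    """gamma_2(tS) = 2*Phi(ts) - 1 = erf(ts / sqrt(2))."""
    return math.erf(t * s / math.sqrt(2.0))


def s_inequality_holds(rho: float, t: float, tol: float = 1e-12) -> bool:
    """Check assertion (ii) of Theorem 11.12 for the disk and its matching strip."""
    s = strip_half_width(rho)
    disk, strip = disk_measure(t, rho), strip_measure(t, s)
    if t >= 1.0:
        return disk >= strip - tol
    return disk <= strip + tol


def log_dilate(tau: float, rho: float) -> float:
    """log gamma_2(e^tau A); assertion (i) says this is concave in tau."""
    return math.log(disk_measure(math.exp(tau), rho))


def max_second_difference(rho: float, lo: float, hi: float, h: float) -> float:
    """Largest second difference of tau -> log_dilate(tau, rho) on a grid."""
    n = int(round((hi - lo) / h))
    v = [log_dilate(lo + k * h, rho) for k in range(n + 1)]
    return max(v[k - 1] - 2.0 * v[k] + v[k + 1] for k in range(1, n))
```

At $t = 1$ the two measures coincide by construction, so the comparison there is decided only up to the tolerance `tol`; away from $t = 1$ the disk is strictly larger for $t > 1$ and strictly smaller for $t < 1$ in this example.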
Interesting recent results related to inequalities for Gaussian measures have also been obtained in [39], [77], [78], [79], [98] and [142].

12. Open Problems

It is reasonable to end with a list of open problems related to Gaussian measures; some of these problems have already been mentioned above. In this list $\gamma$ always denotes the standard Gaussian measure on $\mathbb{R}^d$ in the finite-dimensional case and the countable product of such measures in the case of $\mathbb{R}^\infty$; $H = l^2$ is the Cameron–Martin space of the latter and $\{e_i\}$ is the usual orthonormal basis in $l^2$. Most of these problems are long-standing, but I do not give attributions (even in the few cases where I could), because many authors have worked on them, which has often resulted in modifications and refinements of the problems.

1. Let $A$ and $B$ be convex compact sets in $\mathbb{R}^d$ symmetric about the origin. Is it true that $\gamma(A \cap B) \ge \gamma(A)\gamma(B)$? It is known that this is true if $d \le 2$ or for certain special pairs $(A, B)$; the most general result obtained so far covers the case where one of the sets is an ellipsoid (Hargé [76]). I do not know for which convex symmetric polyhedra in $\mathbb{R}^3$ this is true (the number of vertices is of interest).

2. Let $L$ be a Borel linear subspace in $\mathbb{R}^\infty$ with $\gamma(L) = 1$. Does $L$ contain a convex compact set of positive $\gamma$-measure? A similar question makes sense for any convex set $L$ of positive measure. It is only known that this may fail for non-Gaussian measures.

3. Let $\{f_n\}$ be a sequence in $P_d(\gamma)$ such that the distributions $\gamma \circ f_n^{-1}$ converge weakly to some measure $\nu$ on the real line. Does there always exist a function $f \in P_d(\gamma)$ such that $\nu = \gamma \circ f^{-1}$? A positive answer is known for $d = 2$ (see [12]).

4. Let $f$ be a continuous polynomial on a Hilbert space $E \subset \mathbb{R}^\infty$ with $\gamma(E) = 1$ such that $f(x) > 0$. Is it true that $\gamma(x : f(x) < \varepsilon) = o(\varepsilon^n)$ as $\varepsilon \to 0$ for every $n$? A related problem: is it true that the induced measure $\gamma \circ f^{-1}$ has a smooth (or bounded) density if the set $\{x : D_H f(x) = 0\}$ is empty or finite? Some partial results in this direction are known (see [19], [21], [24]).
5. Let $\mu$ be a probability measure on $\mathbb{R}^\infty$ absolutely continuous with respect to $\gamma$. Is it always possible to find a triangular transformation $T$ of $\mathbb{R}^\infty$ of the form $T(x) = x + F(x)$ with $F(x) \in H$ such that $\mu = \gamma \circ T^{-1}$? Partial positive answers are known under various additional assumptions about the Radon–Nikodym density $d\mu/d\gamma$. If $T$ is not required to be triangular, then a positive solution is given in [28] (see also [23], [24]).

6. Let $F \in W^{2,1}(\gamma, H)$ be such that the operators $I + D_H F(x)$ on $H$ are invertible and let $T = I + F$. Is it true that the measure $\gamma \circ T^{-1}$ is absolutely continuous with respect to $\gamma$? (No continuity of $D_H F(x)$ as in Theorem 10.23(i) is assumed.) Partial results are known where the continuity of $D_H F$ is replaced by other additional assumptions, such as bounds on the norm of $D_H F(x)$ (see [21], [24], [132]).

7. Let $f$ be a $\gamma$-measurable function on $\mathbb{R}^\infty$ that is Lipschitzian and convex along $H$ (i.e., the functions $h \mapsto f(x + h)$ on $H$ have the respective properties). It is known that the Gâteaux derivative $D_H f(x)$ exists $\gamma$-a.e. Is it true that it is a Fréchet derivative $\gamma$-a.e.? If the convexity assumption is removed, a counterexample exists (see [30], [24]).

8. For $\gamma$ on $\mathbb{R}^\infty$, let $f \in L^1(\gamma)$. Is it true that $\lim_{t \to 0} T_t f(x) = f(x)$ a.e.? This problem is related to estimates of the form $\gamma(x : \sup_{t>0} |T_t f(x)| > R) \le c R^{-1}\|f\|_{L^1(\gamma)}$ for large $R$, i.e., the so-called weak (1,1)-estimates for the maximal function $T^*f(x) = \sup_{t>0} |T_t f(x)|$. In the finite-dimensional case the answer is positive; in addition, for every finite $d$ there is $c = c_d$ for which the latter estimate holds; in infinite dimensions the answer is positive if $f \in L^p(\gamma)$ with $p > 1$. The question is whether the best possible constants $c_d$ are uniformly bounded. A similar problem for another maximal function was solved negatively in [6], where it was shown that if $C_d$ is the minimal number such that $\gamma_d(x : Mf(x) > R) \le C_d R^{-1}\|f\|_{L^1(\gamma_d)}$ for every function $f \in L^1(\gamma_d)$, where $\gamma_d$ is the standard Gaussian measure on $\mathbb{R}^d$ and
$$Mf(x) = \sup_{r>0}\, \gamma_d(B(x,r))^{-1} \int_{B(x,r)} |f(y)|\,\gamma_d(dy),$$
then $C_d \to \infty$ as $d \to \infty$.

9. Let $\{w_t\}_{t \in [0,1]}$ be the usual $d$-dimensional Wiener process, i.e., its components are independent Wiener processes, and let $b : \mathbb{R}^1 \times \mathbb{R}^d \to \mathbb{R}^d$ be a
bounded Borel mapping. Let us consider the integral equation
$$x(t, \omega) = w_t(\omega) + \int_0^t b(s, x(s, \omega))\, ds,$$
which is understood in the usual sense for every fixed $\omega$. By the continuity of $w_t$ and the boundedness of $b$, a solution exists by a standard result (a corollary of the Schauder fixed point theorem). Note that we are not speaking of solutions in the sense of stochastic differential equations, where certain additional progressive measurability of solutions is required (such more restrictive solutions are known to be unique [133]). Solving a long-standing open problem, Davie [53] has recently shown that the integral equation above has a unique solution for almost every $\omega$ (in the one-dimensional case with continuous $b(t, x) = b(x)$ the result was proved long ago by Veretennikov and Kleptsyna [134]; see also [11, Chapter 2, Section 8]). An important ingredient of the proof is the estimate (for the one-dimensional process)
$$\mathbb{E}\Bigl(\int_0^1 \partial_x g(s, w_s(\omega))\, ds\Bigr)^{2p} \le C^p\, p!\, \sup_{s,x} |g(s, x)|^{2p}$$
for every $p \in \mathbb{N}$ and every smooth compactly supported function $g$ on $[0,1] \times \mathbb{R}^1$. The proof of this estimate and of the whole result is difficult, and it would be interesting to find a simpler one that would better explain the essence of the phenomenon. In particular, if $g(s, x) = g(x)$, then the latter estimate is closely related to the properties of occupation times of Brownian motion (if we write the integral in $s$ as the integral of $g'$ against the image of Lebesgue measure on $[0,1]$ under the mapping $s \mapsto w_s(\omega)$). The smallness of the set $Z$ of paths for which uniqueness fails can also be measured in terms of the Sobolev capacities $C_{p,r}$ (see [21] and [24]). Which capacities $C_{p,r}$ vanish on $Z$? Note that, given a countable set $H_0$ in the Cameron–Martin space, we obtain a full measure set $\Omega_0$ of paths with the uniqueness property such that $\Omega_0 + h = \Omega_0$ for all $h \in H_0$.

References

[1] R. A. Adams and J. J. F. Fournier, Sobolev Spaces, 2nd edition, Academic Press, New York, 2003.
[2] R. J. Adler, An introduction to continuity, extrema, and related topics for general Gaussian processes, Institute of Math. Stat. Lect. Notes, Monograph Ser., 12. Institute of Math. Stat., Hayward, California, 1990.
[3] B. V. Agafontsev and V. I. Bogachev, Asymptotic properties of polynomials in Gaussian random variables, Dokl. Ross. Akad. Nauk. 429(1), 151–154 (in Russian); English transl.: Dokl. Math. 80(3) (2009), 806–809.
[4] H. Aimar, L. Forzani and R. Scotto, On Riesz transforms and maximal functions in the context of Gaussian harmonic analysis, Trans. Amer. Math. Soc. 359(5) (2007), 2137–2154.
[5] H. Airault and P. Malliavin, Intégration géométrique sur l'espace de Wiener, Bull. Sci. Math. (2) 112(1) (1988), 3–52.
[6] J. M. Aldaz, Dimension dependency of the weak type (1, 1) bounds for maximal functions associated to finite radial measures, Bull. Lond. Math. Soc. 39(2) (2007), 203–208.
[7] L. Ambrosio and E. Durand-Cartagena, Metric differentiability of Lipschitz maps defined on Wiener spaces, Rend. Circ. Mat. Palermo (2) 58(1) (2009), 1–10.
[8] L. Ambrosio and A. Figalli, On flows associated to Sobolev vector fields in Wiener spaces: an approach à la DiPerna–Lions, J. Funct. Anal. 256(1) (2009), 179–214.
[9] L. Ambrosio, M. Miranda, S. Maniglia and D. Pallara, Towards a theory of BV functions in abstract Wiener spaces, Physica D: Nonlin. Phenom. 239(15) (2010), 1458–1469.
[10] L. Ambrosio, M. (Jr.) Miranda, S. Maniglia and D. Pallara, BV functions in abstract Wiener spaces, J. Funct. Anal. 258(3) (2010), 785–813.
[11] S. V. Anulova, A. Yu. Veretennikov, N. V. Krylov, R. Sh. Liptser and A. N. Shiryaev, Stochastic calculus, Current problems in mathematics. Fundamental directions, 45 (1989), 5–253. Itogi Nauki i Tekhniki, Akad. Nauk SSSR, Vsesoyuz. Inst. Nauchn. i Tekhn. Inform., Moscow (in Russian); English transl.: Stochastic calculus. Probability theory, III. Encyclopaedia Math. Sci., 45, Springer, Berlin, 1998; 253.
[12] M. A. Arcones, The class of Gaussian chaos of order two is closed by taking limits in distribution, In: Advances in Stochastic Inequalities (AMS special session on Stochastic inequalities and their applications, Georgia Institute of Technology, Atlanta, Georgia, USA, October 17–19, 1997), Th. P. Hill et al. (eds.), Contemp. Math. 234, Amer. Math. Soc., Providence, Rhode Island, 1999, pp. 13–19.
[13] L. M. Arutyunyan and I. S. Yaroslavtsev, On measurable polynomials on infinite-dimensional spaces, Dokl. Ross. Akad. Nauk. 449(6) (2013), 627–631 (in Russian); English transl.: Dokl. Math. 87(2) (2013), 214–217.
[14] A. Badrikian and S. Chevet, Mesures cylindriques, espaces de Wiener et fonctions aléatoires gaussiennes, Lecture Notes in Math., Vol. 379, Springer-Verlag, Berlin–New York, 1974.
[15] F. Barthe and N. Huet, On Gaussian Brunn–Minkowski inequalities, Studia Math. 191 (2009), 283–304.
[16] D. Bell, The Malliavin Calculus, Wiley and Sons, New York, 1987.
[17] D. R. Bell, Degenerate Stochastic Differential Equations and Hypoellipticity, Longman, Harlow, 1995.
[18] F. Biagini, Y. Hu, B. Øksendal and T. Zhang, Stochastic Calculus for Fractional Brownian Motion and Applications, Springer-Verlag, London, 2008.
[19] V. I. Bogachev, Functionals of random processes and infinite-dimensional oscillatory integrals connected with them, Izvest. Akad. Nauk SSSR. 156(2) (1992), 243–278 (in Russian); English transl.: Russian Sci. Izv. Math. 40(2) (1993), 235–266.
[20] V. I. Bogachev, Differentiable measures and the Malliavin calculus, J. Math. Sci. (New York) 87(4) (1997), 3577–3731.
[21] V. I. Bogachev, Gaussian Measures, Amer. Math. Soc., Providence, Rhode Island, 1998.
[22] V. I. Bogachev, Extensions of H-Lipschitzian mappings with infinite-dimensional range, Infin. Dim. Anal. Quantum Probab. Relat. Top. 2(3) (1999), 1–14.
[23] V. I. Bogachev, Measure Theory, Vols. 1, 2, Springer, Berlin, 2007.
[24] V. I. Bogachev, Differentiable Measures and the Malliavin Calculus, Amer. Math. Soc., Providence, Rhode Island, 2010.
[25] V. I. Bogachev and A. V. Kolesnikov, On the Monge–Ampère equation in infinite dimensions, Infin. Dim. Anal. Quantum Probab. Relat. Top. 8(4) (2005), 547–572.
[26] V. I. Bogachev and A. V. Kolesnikov, Sobolev regularity for the Monge–Ampère equation in the Wiener space, arXiv:1110.1822 (to appear in Kyoto J. Math.).
[27] V. I. Bogachev and A. V. Kolesnikov, The Monge–Kantorovich problem: achievements, connections, and perspectives, Russian Math. Surveys 67(5) (2012), 3–110.
[28] V. I. Bogachev, A. V. Kolesnikov and K. V. Medvedev, Triangular transformations of measures, Matem. Sbornik. 196(3) (2005), 3–30 (in Russian); English transl.: Sbornik Math. 196(3) (2005), 309–335.
[29] V. I. Bogachev and E. Mayer-Wolf, Absolutely continuous flows generated by Sobolev class vector fields in finite and infinite dimensions, J. Funct. Anal. 167(1) (1999), 1–68.
[30] V. I. Bogachev, E. Priola and N. A. Tolmachev, On Fréchet differentiability of Lipschitzian functions on spaces with Gaussian measures, Dokl. Ross. Akad. Nauk. 414(2) (2007), 151–155 (in Russian); English transl.: Dokl. Math. 75(3) (2007), 353–357.
[31] V. I. Bogachev and M. Röckner, Regularity of invariant measures on finite and infinite dimensional spaces and applications, J. Funct. Anal. 133(1) (1995), 168–223.
[32] V. I. Bogachev, M. Röckner and B. Schmuland, Generalized Mehler semigroups and applications, Probab. Theor. Relat. Fields. 105(2) (1996), 193–225.
[33] V. I. Bogachev, M. Röckner and S. V. Shaposhnikov, Lower estimates of densities of solutions of elliptic equations for measures, Dokl. Ross. Akad. Nauk. 426(2) (2009), 156–161 (in Russian); English transl.: Dokl. Math. 79(3) (2009), 329–334.
[34] V. I. Bogachev and A. V. Shaposhnikov, On extensions of Sobolev functions on the Wiener space, Dokl. Ross. Akad. Nauk. 448(4) (2013), 379–383 (in Russian); English transl.: Dokl. Math. 87(1) (2013), 58–61.
[35] V. I. Bogachev and O. G. Smolyanov, Analytic properties of infinite dimensional distributions, Uspehi Matem. Nauk. 45(3) (1990), 3–83 (in Russian); English transl.: Russian Math. Surveys. 45(3) (1990), 1–104.
[36] C. Borell, The Brunn–Minkowski inequality in Gauss space, Invent. Math. 30(2) (1975), 207–216.
[37] C. Borell, Gaussian Radon measures on locally convex spaces, Math. Scand. 38(2) (1976), 265–284.
[38] C. Borell, The Ehrhard inequality, C. R. Acad. Sci. Paris, Sér. I 337 (2003), 663–666.
[39] C. Borell, Inequalities of the Brunn–Minkowski type for Gaussian measures, Probab. Theory Related Fields 140(1–2) (2008), 195–205.
[40] A. N. Borodin and P. Salminen, Handbook of Brownian Motion: Facts and Formulae, Birkhäuser Verlag, Basel–Boston–Berlin, 1996.
[41] N. Bouleau and F. Hirsch, Dirichlet Forms and Analysis on Wiener Space, De Gruyter, Berlin–New York, 1991.
[42] B. Brandolini, F. Chiacchio and C. Trombetti, Hardy type inequalities and Gaussian measure, Commun. Pure Appl. Anal. 6(2) (2007), 411–428.
[43] W. Bryc, The Normal Distribution, Characterizations with Applications, Lecture Notes in Statistics, Springer-Verlag, New York, Vol. 100, 1995.
[44] R. H. Cameron and W. T. Martin, Transformation of Wiener integral under translation, Ann. Math. 45 (1944), 386–396.
[45] R. H. Cameron and W. T. Martin, Transformations of Wiener integrals under a general class transformation, Trans. Amer. Math. Soc. 58 (1945), 184–219.
[46] F. Cavalletti, The Monge problem in Wiener space, Calc. Var. Partial Diff. Equ. 45(1–2) (2012), 101–124.
[47] A. Cianchi and L. Pick, Optimal Gaussian Sobolev embeddings, J. Funct. Anal. 256(11) (2009), 3588–3642.
[48] B. S. Cirelson, I. A. Ibragimov and V. N. Sudakov, Norms of Gaussian sample functions, Lecture Notes in Math. 550 (1976), 20–41.
[49] D. Cordero-Erausquin, Some applications of mass transport to Gaussian-type inequalities, Arch. Ration. Mech. Anal. 161(3) (2002), 257–269.
[50] D. Cordero-Erausquin, M. Fradelizi and B. Maurey, The (B) conjecture for the Gaussian measure of dilates of symmetric convex sets and related problems, J. Funct. Anal. 214(2) (2004), 410–427.
[51] A.-B. Cruzeiro, Équations différentielles sur l'espace de Wiener et formules de Cameron–Martin non-linéaires, J. Funct. Anal. 54(2) (1983), 206–227.
[52] G. Da Prato, Introduction to stochastic analysis and Malliavin calculus, Edizioni della Normale, Scuola Normale Superiore di Pisa, Pisa, 2007.
[53] A. M. Davie, Uniqueness of solutions of stochastic differential equations, Int. Math. Res. Not. IMRN (24) (2007), Art. ID rnm124, 26 pp.
[54] L. Decreusefond and A. S. Üstünel, Stochastic analysis of the fractional Brownian motion, Potential Anal. 10 (1999), 177–214.
[55] S. Dereich, F. Fehringer, A. Matoussi and M. Scheutzow, On the link between small ball probabilities and the quantization problem for Gaussian measures on Banach spaces, J. Theoret. Probab. 16(1) (2003), 249–265.
[56] R. M. Dudley, Sample functions of the Gaussian processes, Ann. Probab. 1(1) (1973), 3–68.
[57] R. M. Dudley, The sizes of compact subsets of Hilbert space and continuity of Gaussian processes, J. Funct. Anal. 1(3) (1967), 290–330.
[58] S. Fang, Introduction to Malliavin Calculus, Beijing, 2004.
[59] S. Fang, J. Shao and K.-Th. Sturm, Wasserstein space over the Wiener space, Probab. Theory Related Fields 146(3–4) (2010), 535–565.
[60] X. Fernique, Fonctions aléatoires gaussiennes, vecteurs aléatoires gaussiens, Université de Montréal, Centre de Recherches Mathématiques, Montréal, 1997.
[61] D. Feyel and A. de La Pradelle, Hausdorff measures on the Wiener space, Potential Anal. 1(2) (1992), 177–189.
[62] D. Feyel and A. S. Üstünel, The notion of convexity and concavity on Wiener space, J. Funct. Anal. 176(2) (2000), 400–428.
[63] D. Feyel and A. S. Üstünel, Monge–Kantorovitch measure transportation and Monge–Ampère equation on Wiener space, Probab. Theory Related Fields 128(3) (2004), 347–385.
[64] D. Feyel, A. S. Üstünel and M. Zakai, The realization of positive random variables via absolutely continuous transformations of measure on Wiener space, Probab. Surv. 3 (2006), 170–205.
[65] V. P. Fonf, W. B. Johnson, G. Pisier and D. Preiss, Stochastic approximation properties in Banach spaces, Studia Math. 159(1) (2003), 103–119.
[66] N. N. Frolov, Embedding theorems for spaces of functions of countably many variables, I, Proceedings Math. Inst. of Voronezh Univ., Voronezh University No. 1, 1970, pp. 205–218 (in Russian).
[67] N. N. Frolov, Embedding theorems for spaces of functions of countably many variables and their applications to the Dirichlet problem, Dokl. Akad. Nauk SSSR, 203(1) (1972), 39–42 (in Russian); English transl.: Soviet Math. 13(2) (1972), 346–349.
[68] M. Fukushima and M. Hino, On the space of BV functions and a related stochastic calculus in infinite dimensions, J. Funct. Anal. 183(1) (2001), 245–268.
[69] R. J. Gardner and A. Zvavitch, Gaussian Brunn–Minkowski inequalities, Trans. Amer. Math. Soc. 362 (2010), 5333–5353.
[70] I. V. Girsanov, On transforming a certain class of stochastic processes by absolutely continuous substitution of measures, Teor. Verojatn. i Primen. 5(3) (1960), 314–330 (in Russian); English transl.: Theory Probab. Appl., 5 (1960), 285–301.
[71] E. Glasner, B. Tsirelson and B. Weiss, The automorphism group of the Gaussian measure cannot act pointwise, Israel J. Math. 148 (2005), 305–329.
[72] I. G. Gohberg and M. G. Krein, Introduction to the theory of linear nonselfadjoint operators, Nauka, Moscow, 1965 (in Russian); English transl.: Amer. Math. Soc., Providence, Rhode Island, 1969.
[73] L. Gross, Abstract Wiener spaces, In: Proc. 5th Berkeley Symp. Math. Stat. Probab., Part 1, pp. 31–41. University of California Press, Berkeley, 1965.
[74] L. Gross, Potential theory on Hilbert space, J. Funct. Anal. 1(2) (1967), 123–181.
[75] L. Gross, Logarithmic Sobolev inequalities, Amer. J. Math. 97(4) (1975), 1061–1083.
[76] G. Hargé, A particular case of correlation inequality for the Gaussian measure, Ann. Probab. 27(4) (1999), 1939–1951.
[77] G. Hargé, A convex/log-concave correlation inequality for Gaussian measure and an application to abstract Wiener spaces, Probab. Theory Relat. Fields 130(3) (2004), 415–440.
[78] G. Hargé, Characterization of equality in the correlation inequality for convex functions, the U-conjecture, Ann. Inst. Henri Poincaré Probab. Stat. 41(4) (2005), 753–765.
[79] G. Hargé, Reinforcement of an inequality due to Brascamp and Lieb, J. Funct. Anal. 254(2) (2008), 267–300.
[80] T. Hida and M. Hitsuda, Gaussian processes, Amer. Math. Soc., Providence, Rhode Island, 1993.
[81] T. Hida, H. Kuo, J. Potthoff and L. Streit, White Noise: An Infinite-Dimensional Calculus, Kluwer Acad. Publ., Dordrecht, 1993.
[82] M. Hino, Sets of finite perimeter and the Hausdorff–Gauss measure on the Wiener space, J. Funct. Anal. 258(5) (2010), 1656–1681.
[83] M. Hino, Dirichlet spaces on H-convex sets in Wiener space, Bull. Sci. Math. 135 (2011), 667–683.
[84] I. A. Ibragimov, On the Skitovich–Darmois–Ramachandran theorem, Teor. Verojatn. i Primen. 57(2) (2012) (in Russian); English transl.: Theory Probab. Appl. 57 (2013).
[85] I. A. Ibragimov and Y. A. Rozanov, Gaussian random processes, translated from the Russian, Springer-Verlag, New York–Berlin, 1978 (Russian ed.: Moscow, 1970).
[86] N. Ikeda and S. Watanabe, Stochastic Differential Equations and Diffusion Processes, North-Holland, Amsterdam, 1989.
[87] S. Janson, Gaussian Hilbert Spaces, Cambridge Univ. Press, Cambridge, 1997.
[88] A. N. Kolmogoroff, Wienersche Spiralen und einige andere interessante Kurven im Hilbertschen Raum, C. R. (Dokl.) Acad. Sci. URSS (N.S.) 26 (1940), 115–118.
[89] P. Krée and A. Tortrat, Désintégration d'une loi gaussienne µ dans une somme vectorielle, C. R. Acad. Sci. Paris 277 (1973), 695–697.
[90] H.-H. Kuo, Gaussian Measures in Banach Spaces, Springer-Verlag, Berlin–New York, 1975.
[91] S. Kusuoka, The nonlinear transformation of Gaussian measure on Banach space and its absolute continuity I, II, J. Fac. Sci. Univ. Tokyo, Sect. 1A 29(3) (1982), 567–598; 30(1) (1983), 199–220.
[92] L. Larsson-Cohn, L^p-norms of Hermite polynomials and an extremal problem on Wiener chaos, Ark. Mat. 40(1) (2002), 133–144.
[93] R. Latala, A note on the Ehrhard inequality, Studia Math. 118 (1996), 169–174.
[94] R. Latala, Estimates of moments and tails of Gaussian chaoses, Ann. Probab. 34(6) (2006), 2315–2331.
[95] R. Latala and K. Oleszkiewicz, Gaussian measures of dilatations of convex symmetric sets, Ann. Probab. 27(4) (1999), 1922–1938.
[96] M. Ledoux, A recursion formula for the moments of the Gaussian orthogonal ensemble, Ann. Inst. Henri Poincaré Probab. Stat. 45(3) (2009), 754–769.
[97] M. Ledoux and M. Talagrand, Probability in Banach Spaces. Isoperimetry and Processes, Springer-Verlag, Berlin–New York, 1991.
[98] J. Lehec, The symmetric property (τ) for the Gaussian measure, Ann. Fac. Sci. Toulouse Math. (6) 17(2) (2008), 357–370.
[99] W. V. Li, A Gaussian correlation inequality and its applications to small ball probabilities, Electr. Commun. Probab. 4 (1999), 111–118.
[100] W. V. Li and Q.-M. Shao, Gaussian processes: Inequalities, small ball probabilities, and applications, in Stochastic Processes: Theory and Methods, C. R. Rao and D. Shanbhag (eds.), Handbook of Statistics, North-Holland, Amsterdam, Vol. 19, 2001, pp. 533–597.
[101] M. A. Lifshits, Gaussian Random Functions, Kluwer Academic Publ., Dordrecht, 1995 (Russian ed.: Kiev, 1995).
[102] M. A. Lifshits, Lectures on Gaussian Processes, Springer, New York, 2012.
[103] H. Luschgy and G. Pagès, Sharp asymptotics of the Kolmogorov entropy for Gaussian measures, J. Funct. Anal. 212(1) (2004), 89–120.
[104] J. Maas and J. van Neerven, On analytic Ornstein–Uhlenbeck semigroups in infinite dimensions, Arch. Math. (Basel) 89(3) (2007), 226–236.
[105] P. Malliavin, Stochastic calculus of variation and hypoelliptic operators, Proc. Intern. Symp. on Stoch. Diff. Eq. (Res. Inst. Math. Sci., Kyoto Univ., Kyoto, 1976), pp. 195–263, Wiley, New York–Chichester–Brisbane, 1978.
[106] P. Malliavin, Stochastic Analysis, Springer-Verlag, Berlin, 1997.
[107] A. M. Mathai and G. Pederzoli, Characterizations of the Normal Probability Law, Wiley, New York, 1977.
[108] K. S. Miller, Multidimensional Gaussian Distributions, John Wiley and Sons, New York, 1964.
[109] J. Nash, Continuity of solutions of parabolic and elliptic equations, Amer. J. Math. 80 (1958), 931–954.
[110] I. Nourdin, D. Nualart and G. Poly, Absolute continuity and convergence of densities for random vectors on Wiener chaos, arXiv:1207.5115 (2013).
[111] I. Nourdin and G. Peccati, Normal Approximations Using Malliavin Calculus: From Stein's Method to Universality, Cambridge University Press, Cambridge, 2012.
[112] D. Nualart, The Malliavin Calculus and Related Topics, 2nd edition, Springer-Verlag, Berlin, 2006.
[113] Y. Okazaki, Stochastic basis in Fréchet space, Math. Ann. 274 (1986), 379–383.
[114] J. K. Patel and C. B. Read, Handbook of the Normal Distribution, 2nd edition, Marcel Dekker, New York, 1996.
[115] G. Peccati and M. S. Taqqu, Wiener Chaos: Moments, Cumulants and Diagrams, Springer, Berlin, 2011.
[116] J. Picard, Representation formulae for the fractional Brownian motion, In: Séminaire de Probabilités XLIII, C. Donati-Martin, A. Lejay and A. Rouault (eds.), Lecture Notes in Math., 2006, Springer, 2011, pp. 3–70.
[117] E. Pineda and W. Urbina, Some results on Gaussian Besov–Lipschitz spaces and Gaussian Triebel–Lizorkin spaces, J. Approx. Theory 161(2) (2009), 529–564.
[118] V. I. Piterbarg, Asymptotic methods in the theory of Gaussian processes and fields, Izdat. Moskovsk. Univ., Moscow, 1988 (in Russian); English transl.: Amer. Math. Soc., Providence, Rhode Island, 1996.
[119] B. Ramachandran, Advanced Theory of Characteristic Functions, Statist. Publ. Soc., Calcutta, 1967.
[120] Yu. A. Rozanov, Infinite-dimensional Gaussian distributions, Trudy Matem. Steklov Inst. 1968. V. 108, pp. 1–161 (in Russian); English transl.: Proc. Steklov Inst. Math. 108, American Math. Soc., Providence, Rhode Island, 1971.
[121] G. Schechtman, Th. Schlumprecht and J. Zinn, On the Gaussian measure of the intersection of symmetric, convex sets, Ann. Probab. 26 (1998), 346–357.
[122] I. Shigekawa, Stochastic Analysis, Amer. Math. Soc., Providence, Rhode Island, 2004.
[123] P. Sjögren and F. Soria, Sharp estimates for the non-centered maximal operator associated to Gaussian and other radial measures, Adv. Math. 181(4) (2004), 251–275.
[124] D. W. Stroock, Probability Theory: An Analytic View, 2nd edition, Cambridge University Press, 2011.
[125] V. N. Sudakov, Geometric problems of the theory of infinite-dimensional probability distributions, Trudy Mat. Inst. Steklov. 141 (1976), 1–190 (in Russian); English transl.: Proc. Steklov Inst. Math. (2) (1979), 1–178.
[126] V. N. Sudakov, The Weizsäcker phenomenon and the canonical determination of Lebesgue–Rokhlin Gaussian measures, Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI) 364 (2009), 200–234 (in Russian); English transl.: J. Math. Sci. (New York) 163(4) (2009), 430–445.
[127] V. N. Sudakov and B. S. Tsirel'son, Extremal properties of half-spaces for spherically invariant measures, Zap. Nauchn. Sem. Leningrad. Otdel. Mat. Inst. Steklov (LOMI) 41 (1974), 14–24 (in Russian); English transl.: J. Soviet Math. 9 (1978), 9–17.
[128] M. Talagrand, Regularity of Gaussian processes, Acta Math. 159(1–2) (1987), 99–149.
[129] M. Talagrand, Transportation cost for Gaussian and other product measures, Geom. Funct. Anal. 6 (1996), 587–600.
[130] Y. L. Tong, The Multivariate Normal Distribution, Springer-Verlag, Berlin–New York, 1990.
[131] B. S. Tsirelson, A natural modification of a random process, and its application to series of random functions and to Gaussian measures, Zap. Nauchn. Sem. Leningrad. Otdel. Mat. Inst. Steklov (LOMI) 55 (1976), 35–63 (in Russian); English transl.: J. Soviet Math. 16 (1981), 940–956; Addendum to the article on natural modification, Zap. Nauchn. Sem. Leningrad. Otdel. Mat. Inst. Steklov (LOMI) 72 (1977), 202–211 (in Russian); English transl.: J. Soviet Math. 23 (1983), 2363–2369.
[132] A. S. Üstünel and M. Zakai, Transformation of Measure on Wiener Space, Springer-Verlag, Berlin, 2000.
[133] A. Yu. Veretennikov, On strong solutions and explicit formulas for solutions of stochastic integral equations, Mat. Sbornik 111 (1980), 434–452 (in Russian); English transl.: Math. USSR Sb. 39 (1981), 387–403.
[134] A. Yu. Veretennikov and M. L. Kleptsyna, On the trajectory approach to stochastic differential equations, In: Statistics and Control of Stochastic Processes, A. N. Shiryaev (ed.), Nauka, Moscow, 1989, pp. 22–23 (in Russian).
[135] H. von Weizsäcker and G. Winkler, Stochastic Integrals: An Introduction, F. Vieweg, Braunschweig–Wiesbaden, 1990.
[136] N. Wiener, The average of an analytic functional, Proc. Nat. Acad. Sci. 7(9) (1921), 253–260.
[137] N. Wiener, The average value of an analytic functional and the Brownian movement, Proc. Nat. Acad. Sci. 7(10) (1921), 294–298.
[138] N. Wiener, Differential space, J. Math. and Phys. 2 (1923), 131–174.
[139] N. Wiener, The average value of a functional, Proc. London Math. Soc. 22 (1924), 454–467.
[140] N. Wiener, The homogeneous chaos, Amer. J. Math. 60 (1938), 879–936.
[141] E. V. Yurova, On continuous restrictions of measurable linear operators, Dokl. Ross. Akad. Nauk 443(3) (2012), 300–303 (in Russian); English transl.: Dokl. Math. 85(2) (2012), 229–232.
[142] A. Zvavitch, Gaussian measure of sections of dilates and translations of convex bodies, Adv. Appl. Math. 41(2) (2008), 247–254.
CHAPTER 2 RANDOM FIELDS AND HYPERGROUPS
Herbert Heyer
0. Introduction
There are two reasons that motivated the author to write an expository article on the significance of the notion of a hypergroup within the theory of random fields: the first is his reluctance to decline the invitation of the dedicated editor, who wishes to see this topic represented as a current trend of research; the second is the author's conviction that access should be provided to as yet unpublished material on generalized random fields over hypergroups, in order to promote further investigation of the subject. The reader of the present work will be either a harmonic analyst interested in applications to probability theory or a probabilist who does not hesitate to extend his knowledge of harmonic analysis while trying to deepen his understanding of the structure of random fields. Both of these aspects shape the exposition. Section 1 introduces the notion of a commutative hypergroup and provides basic constructions of this algebraic-topological object, which appears as a generalization of a locally compact Abelian group. In Section 2 we discuss second order random fields over a commutative hypergroup, stressing stationarity and harmonizability along the classical lines. One motivation for studying random fields over hypergroups, beyond groups, is that hypergroup stationarity applies to random fields that are not stationary in the traditional sense. Section 3 is central to the topic. Generalized random fields are defined as continuous linear mappings on a Segal algebra of functions on the given hypergroup, with values in a Hilbert space. This Segal algebra was introduced by H. G. Feichtinger [12] in the group case, as a generalization
of the classical algebra of test functions invented by L. Schwartz and I. M. Gelfand. The Feichtinger algebra was extended to large classes of hypergroups by M. Leitner [37] and H.-J. Neu [39], the latter having also provided applications to a general covariance theory of generalized random fields over hypergroups. For basic references the reader is invited to consult the monograph by W. R. Bloom and H. Heyer [1] for details on hypergroups, and the books by A. M. Yaglom [54], M. M. Rao [45] and Y. Kakihara [28] on the classical theory of random fields over groups and homogeneous spaces. The author pays tribute to the probabilists who have laid the foundations of the theory of random fields over algebraic-topological structures. Their relevant contributions will be discussed in the Bibliographical Notes at the end of the article. H.-J. Neu deserves special recognition for agreeing to have the results of his thesis included in the present exposition.
1. Commutative Hypergroups
1.1. Definition and first examples
In this introductory section we provide the standard notions of hypergroup theory and a few examples which show how certain classes of hypergroups originate from groups. Some notational conventions are in order. For a locally compact space K we use the abbreviations $C(K)$, $C^b(K)$, $C_0(K)$ and $C_c(K)$ for the spaces of continuous functions, bounded continuous functions, continuous functions vanishing at infinity, and continuous functions with compact support on K, respectively. With the symbol $B(K)$ for the space of Borel measurable functions on K we obtain the obvious chain of inclusions $C_c(K) \subset C_0(K) \subset C^b(K) \subset C(K) \subset B(K)$. Similarly we consider the inclusions $M_c(K) \subset M^b(K) \subset M(K)$ and $M^1(K) \subset M^b(K)$, where $M(K)$, $M^b(K)$, $M_c(K)$ and $M^1(K)$ denote the sets of all (Radon) measures, of bounded measures, of measures with compact support and of probability measures on K, respectively.
A lower + attached to C or M in the above spaces refers to the respective cones of nonnegative elements. For each $x \in K$ the symbol $\varepsilon_x$ stands for the point (Dirac) measure in x.
1.1.1. Definition A hypergroup $(K, *)$ is a locally compact space K together with a convolution $*$ in $M^b(K)$ such that $(M^b(K), *)$ becomes a Banach algebra and the following axioms are fulfilled:
(H1) The mapping $(\mu, \nu) \mapsto \mu * \nu$ from $M^b(K) \times M^b(K)$ into $M^b(K)$ is continuous with respect to the weak topology $\tau_w$ in $M^b(K)$.
(H2) For $x, y \in K$ the convolution product $\varepsilon_x * \varepsilon_y$ belongs to $M^1_c(K) := M^1(K) \cap M_c(K)$.
(H3) There exists a unit element $e \in K$ with $\varepsilon_e * \varepsilon_x = \varepsilon_x * \varepsilon_e = \varepsilon_x$ for all $x \in K$, and an involution $x \mapsto x^-$ in K such that $\varepsilon_{x^-} * \varepsilon_{y^-} = (\varepsilon_y * \varepsilon_x)^-$, and $e \in \operatorname{supp}(\varepsilon_x * \varepsilon_y)$ if and only if $x = y^-$, whenever $x, y \in K$.
(H4) The mapping $(x, y) \mapsto \operatorname{supp}(\varepsilon_x * \varepsilon_y)$ from $K \times K$ into the space $\mathcal{C}(K)$ of compact subsets of K, furnished with the Michael topology, is continuous.
A hypergroup $(K, *)$ is said to be commutative if the convolution $*$ is commutative. In this case $(M^b(K), *, {}^-)$ is a commutative Banach $*$-algebra with identity $\varepsilon_e$.
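The algebraic axioms can be checked mechanically on a concrete example. The Python sketch below is only an illustration under stated assumptions: it takes the discrete cosine hypergroup on $\mathbb{Z}_+$ of 1.3.3.2, with $\varepsilon_m * \varepsilon_n = \frac12(\varepsilon_{|m-n|} + \varepsilon_{m+n})$, unit $e = 0$ and the identity involution, represents finitely supported measures as dictionaries (a representation chosen here for convenience), and verifies (H2), the unit and support conditions of (H3), commutativity and associativity on a finite range of points. The topological axioms (H1) and (H4) are not tested, and the truncation N = 6 is arbitrary.

```python
from fractions import Fraction
from itertools import product

def conv_points(m, n):
    """eps_m * eps_n in the discrete cosine hypergroup, as {point: mass}."""
    out = {}
    for k in (abs(m - n), m + n):
        out[k] = out.get(k, Fraction(0)) + Fraction(1, 2)
    return out

def conv(mu, nu):
    """Bilinear extension of conv_points to finitely supported measures."""
    out = {}
    for (m, a), (n, b) in product(mu.items(), nu.items()):
        for k, c in conv_points(m, n).items():
            out[k] = out.get(k, Fraction(0)) + a * b * c
    return out

def delta(x):
    """Dirac measure eps_x."""
    return {x: Fraction(1)}

N = 6  # arbitrary truncation of Z_+ for the finite check
for x, y in product(range(N), repeat=2):
    p = conv_points(x, y)
    assert sum(p.values()) == 1           # (H2): probability measure with finite support
    assert (0 in p) == (x == y)           # (H3): e = 0 lies in the support iff x = y^- = y
    assert p == conv_points(y, x)         # commutativity
for x in range(N):
    assert conv_points(0, x) == delta(x)  # (H3): e = 0 is a unit
for x, y, z in product(range(N), repeat=3):
    left = conv(conv(delta(x), delta(y)), delta(z))
    right = conv(delta(x), conv(delta(y), delta(z)))
    assert left == right                  # associativity on Dirac measures
print("axiom checks passed for N =", N)
```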
Obviously the notion of a hypergroup deviates from that of a group in that on a hypergroup Dirac measures generally convolve to measures with compact support, not necessarily to measures with singleton support. Since the finitely supported measures are dense in $M^b(K)$ with respect to the weak topology, the convolution of a hypergroup $(K, *)$ is uniquely determined by its values on the Dirac measures, i.e.,
$$\mu * \nu(f) = \int_K \int_K T^x f(y)\, \mu(dx)\, \nu(dy)$$
for all $\mu, \nu \in M^b(K)$, where $T^x f(y) := f(x * y) := \varepsilon_x * \varepsilon_y(f)$ for all $x, y \in K$, $f \in C^b(K)$. $T^x$ is called the (x-left) translation operator on $C^b(K)$ for all $x \in K$. It can be extended to the spaces $B(K)$ and $M(K)$. For $A, B \subset K$ we define the set product
$$A * B := \bigcup \{\operatorname{supp}(\varepsilon_x * \varepsilon_y) : x \in A,\ y \in B\},$$
which enjoys properties compatible with the axioms of the hypergroup $(K, *)$. We note that every locally compact group G together with its natural convolution structure derived from the group translation forms a hypergroup $(G, *)$. In this case $\varepsilon_x * \varepsilon_y = \varepsilon_{xy}$ for all $x, y \in G$. An important class of hypergroups derived from a pair $(G, H)$ consisting of a locally compact group G and a compact subgroup H of G is described in the following example.
1.1.2. Example The Banach $*$-algebra
$$M^b(G\|H) := \{\mu \in M^b(G) : \varepsilon_x * \mu * \varepsilon_y = \mu \text{ for all } x, y \in H\}$$
admits the normalized Haar measure $\omega_H \in M^1(G)$ as an identity. On the other hand we consider the double coset space $G//H := \{HxH : x \in G\}$
and the canonical projection $p : G \to G//H$, which induces a probability preserving isometric isomorphism $\tilde p : M^b(G\|H) \to M^b(G//H)$ of Banach spaces. Via $\tilde p$ the Banach space $M^b(G//H)$ inherits a convolution from $M^b(G\|H)$; hence $(G//H, *)$ becomes a hypergroup with identity $H \in G//H$ and involution $HxH \mapsto (HxH)^- := Hx^{-1}H$. The canonical Banach space isomorphism $\tilde p$ then appears as a probability preserving isometric isomorphism of Banach $*$-algebras. It turns out that the convolution of $(G//H, *)$ has the form
$$\varepsilon_{HxH} * \varepsilon_{HyH} = \int_H \varepsilon_{HxtyH}\, \omega_H(dt)$$
whenever $HxH, HyH \in G//H$. The double coset hypergroup $(G//H, *)$ is commutative if and only if $(G, H)$ is a Gelfand pair, in the sense that the subalgebra of H-biinvariant functions in $L^1(G, \omega_G)$ is commutative. Special double coset hypergroups show the diversity of analytic applications of hypergroup theory.
1.1.2.1 Let $G := M(d)$ be the motion group of $\mathbb{R}^d$ and $H := SO(d)$ the special orthogonal group ($d \ge 1$). Then $G/H \cong \mathbb{R}^d$, and $G//H \cong \mathbb{R}_+$ carries the Bessel-Kingman convolution inherited from the group convolution on $M(d)$.
1.1.2.2 Let $G := SO_0(d)$ be the Lorentz group of dimension $d + 1$ and $H := SO(d)$. Then $G/H \cong \mathbb{H}^d$, where $\mathbb{H}^d$ denotes the hyperbolic space of dimension d, and $G//H \cong \mathbb{R}_+$. Here the convolution on $\mathbb{R}_+$ gives rise to a Jacobi hypergroup structure on $\mathbb{R}_+$.
1.1.2.3 For $G := SO(d + 1)$ and $H := SO(d)$ ($d \ge 1$) we obtain $G/H \cong S^d$, where $S^d$ denotes the d-dimensional sphere, and the ultraspherical or Gegenbauer hypergroup structure on $G//H \cong [-1, 1]$.
1.1.2.4 Let G be the automorphism group $\operatorname{Aut}(\Gamma)$ of a graph $\Gamma$ and $H := H_{t_0}$ the stabilizer of a fixed vertex $t_0 \in \Gamma$ under the action of G on $\Gamma$.
Then $G/H \cong \Gamma$ and $G//H \cong \mathbb{Z}_+$, where the convolution on $\mathbb{Z}_+$ provides a Cartier hypergroup structure. In the preceding examples we saw that continuous group actions can be applied in order to produce hypergroups. For the general view we add the following example.
1.1.3. Example Let G be a locally compact group and H a compact group. Suppose that the mapping $(x, s) \mapsto x^s$ from $G \times H$ into G is a continuous action of H on G such that each of the mappings $x \mapsto x^s$ is an automorphism of G. Then $G^H := \{x^H : x \in G\}$, where $x^H := \{x^s : s \in H\}$ for $x \in G$, is a hypergroup $(G^H, *)$ with identity $e^H = \{e\}$ and involution $x^H \mapsto (x^H)^- := (x^{-1})^H$. The convolution $*$ of $(G^H, *)$ has the form
$$\varepsilon_{x^H} * \varepsilon_{y^H} = \int_H \varepsilon_{(x^s y)^H}\, \omega_H(ds) = \int_H \varepsilon_{(x y^t)^H}\, \omega_H(dt)$$
whenever $x^H, y^H \in G^H$. $(G^H, *)$ is called the orbit hypergroup associated with the pair $(G, H)$. As a special case we note:
1.1.3.1 Example: the conjugacy hypergroup of a compact group G, defined by
$$G^G := \{x^G : x \in G\}, \quad \text{where } x^G := \{txt^{-1} : t \in G\}$$
for $x \in G$. $(G^G, *)$ is a commutative hypergroup admitting the identity $\{e\}$ and the involution $x^G \mapsto (x^G)^- := (x^{-1})^G$, the convolution being explicitly given by
$$\varepsilon_{x^G} * \varepsilon_{y^G} = \int_G \varepsilon_{(t^{-1}xty)^G}\, \omega_G(dt)$$
for all $x^G, y^G \in G^G$.
1.2. Some harmonic analysis
As in the group case, the analysis of hypergroups K depends on the existence of (left) Haar measures in $M_+(K)$, which by definition are nonvanishing and $T^x$-(left) invariant for each $x \in K$. Haar measures are known to exist on hypergroups that are discrete, compact or commutative. They are unique up to a positive constant and have full support. So, for a given hypergroup K its Haar measure $\omega_K$ will always be fixed. If K is a compact hypergroup, $\omega_K$ is necessarily bounded, hence it can be normalized, and it is then an idempotent measure (in the sense of convolution) in $M^1(K)$. Moreover, any hypergroup admitting a bounded Haar measure $\omega_K$ must be compact.
FROM NOW ON WE ASSUME $(K, *)$ TO BE COMMUTATIVE.
1.2.1. Definition For $\mu \in M^b(K)$ and $g \in B(K)$ the convolution of $\mu$ with g is given by
$$\mu * g(x) := \int_K g(y^- * x)\, \mu(dy)$$
for all $x \in K$. If $f, g \in B(K)$ with at least one of the functions f, g having σ-finite support, then the convolution of f with g is defined by
$$f * g(x) := \int_K f(x * y)\, g(y^-)\, \omega_K(dy)$$
whenever $x \in K$. The convolution of functions gives rise to the introduction of the spaces $L^p(K) := L^p(K, \omega_K)$ for $1 \le p \le \infty$.
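For a finite group G the normalized Haar measure $\omega_G$ is the uniform distribution, so the convolution of the conjugacy hypergroup in Example 1.1.3.1 can be evaluated exactly. The following Python sketch takes $G = S_3$ as an illustrative choice, stores permutations as tuples, and labels conjugacy classes by cycle type, which is valid for symmetric groups but not for finite groups in general; all helper names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def compose(p, q):
    """(p o q)(i) = p(q(i)) for permutations stored as tuples."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Inverse permutation."""
    inv = [0] * len(p)
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

def cycle_type(p):
    """Sorted cycle lengths; in S_n this labels the conjugacy class of p."""
    seen, lengths = set(), []
    for start in range(len(p)):
        if start not in seen:
            length, i = 0, start
            while i not in seen:
                seen.add(i)
                i = p[i]
                length += 1
            lengths.append(length)
    return tuple(sorted(lengths))

def class_convolution(x, y, group):
    """eps_{x^G} * eps_{y^G}: law of the class of t^{-1} x t y, t uniform on G."""
    out = {}
    weight = Fraction(1, len(group))
    for t in group:
        z = compose(compose(compose(inverse(t), x), t), y)
        c = cycle_type(z)
        out[c] = out.get(c, Fraction(0)) + weight
    return out

S3 = list(permutations(range(3)))
transposition = (1, 0, 2)
three_cycle = (1, 2, 0)
print(class_convolution(transposition, transposition, S3))
print(class_convolution(three_cycle, three_cycle, S3))
```

For $S_3$ the class of transpositions convolves with itself to mass 1/3 on the identity class and 2/3 on the class of 3-cycles, while the class of 3-cycles gives mass 1/2 on each of these two classes.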
As in the group case, $L^1(K)$ can be embedded into $M^b(K)$ as a Banach subalgebra.
1.2.2. Definition A function $\chi \in C(K)$ is called multiplicative if $\chi(e) = 1$ and $\chi(x * y) = \chi(x)\chi(y)$ for all $x, y \in K$. If, in addition, $\chi^-(x) := \chi(x^-) = \overline{\chi(x)}$ for all $x \in K$, then $\chi$ is a semicharacter of K. Bounded semicharacters are called characters of K. The sets of semicharacters and characters will be denoted by $K^*$ and $K^\wedge$, respectively. In what follows we shall also need the set $K^{*,p}$ of strictly positive semicharacters of K. The set $K^\wedge$ of characters of K, furnished with the compact open topology $\tau_{co}$, becomes a locally compact space; it serves as the dual space of K. In general $K^\wedge$ does not admit a convolution structure, hence cannot be a hypergroup. For $K^\wedge$ to become a hypergroup it is first required that for $\chi, \psi \in K^\wedge$ there exists a measure $\mu_{\chi,\psi} \in M^1(K^\wedge)$ satisfying
$$\chi(x)\psi(x) = \int_{K^\wedge} \rho(x)\, \mu_{\chi,\psi}(d\rho)$$
for all $x \in K$. In this situation a convolution $\hat *$ in $M^b(K^\wedge)$ can be defined by $\varepsilon_\chi \,\hat *\, \varepsilon_\psi := \mu_{\chi,\psi}$ whenever $\chi, \psi \in K^\wedge$. Next one can attempt to verify the axioms of a hypergroup with the unit character 1 as a unit and some involution. Hypergroups K with the property that $K^\wedge$ is also a hypergroup with respect to the convolution introduced above via pointwise multiplication of characters are called strong. For strong hypergroups K the double dual $K^{\wedge\wedge} := (K^\wedge)^\wedge$ can be formed, and $K \subset K^{\wedge\wedge}$. If, moreover, $K \cong K^{\wedge\wedge}$, then K is called a Pontryagin hypergroup (with reference to Pontryagin's duality theorem valid for any locally compact Abelian group). Pontryagin hypergroups are rare, although in the sequel
we shall describe examples of so-called self-dual hypergroups K, which are defined by the property that $K^\wedge \cong K$.
1.2.3. Definition For a measure $\mu \in M^b(K)$ the Fourier (Stieltjes) transform $\hat\mu$ is given by
$$\hat\mu(\chi) := \int_K \overline{\chi}\, d\mu$$
whenever $\chi \in K^\wedge$. If $f \in L^1(K)$, then $\hat f := (f \cdot \omega_K)^\wedge$ is the Fourier transform of f. The Fourier mapping $\mu \mapsto \hat\mu$ is a norm-decreasing involutive algebra isomorphism from $M^b(K)$ into $C^b(K^\wedge)$. The Plancherel-Levitan theorem states that there is a unique Plancherel measure $\pi_K \in M_+(K^\wedge)$ with the property that
$$\int_K |f|^2\, d\omega_K = \int_{K^\wedge} |\hat f|^2\, d\pi_K$$
for all $f \in L^1(K, \omega_K) \cap L^2(K, \omega_K)$. In particular, the Fourier mapping extends to $L^2(K)$, and $L^2(K)$ is isometrically embedded into $L^2(K^\wedge) := L^2(K^\wedge, \pi_K)$. Generally, $\operatorname{supp} \pi_K$ is a proper subset of $K^\wedge$, and 1 does not necessarily belong to $\operatorname{supp} \pi_K$. If, however, K is strong, then $\pi_K = \omega_{K^\wedge}$ and $\operatorname{supp} \pi_K = K^\wedge$. Again following the analysis on locally compact Abelian groups, one also introduces for commutative hypergroups K the inverse Fourier transforms $\sigma^\vee$ and $f^\vee$ of measures $\sigma \in M^b(K^\wedge)$ and functions $f \in L^1(K^\wedge)$, respectively, and proves the corresponding inversion and Plancherel-Levitan theorems.
1.2.4. Back to the examples
1.2.4.1 We consider the double coset hypergroup $K = G//H$ of Example 1.1.2 in the case that $(G, H)$ is a Gelfand pair, i.e., that K is commutative. The Haar measure $\omega_K$ of K is the image of the Haar measure $\omega_G$ of G under the projection $p : G \to G//H$.
Any H-biinvariant function $\varphi \in C(G)$ with $\varphi(e) = 1$ is by definition a spherical function of the pair $(G, H)$ provided
$$\varphi(x)\varphi(y) = \int_H \varphi(xhy)\, \omega_H(dh)$$
for all $x, y \in G$. Multiplicative functions $\chi$ of K are in one-to-one correspondence with the spherical functions on G via the mapping $\chi \mapsto \chi \circ p$. The Fourier transform of K turns out to be the spherical Fourier (Stieltjes) transform.
1.2.4.2 Returning to the orbit hypergroup $K := G^H$ of Example 1.1.3, we note that the Haar measure $\omega_K$ of K is the image of $\omega_G$ under the canonical projection $p : G \to G^H$. It is easily seen that, if G is assumed to be Abelian, the functions $\varphi_\chi$ on K for $\chi \in G^\wedge$, given by
$$\varphi_\chi(x^H) := \int_H \chi(x^s)\, \omega_H(ds)$$
for all $x^H \in K$, belong to the dual $K^\wedge$ of K. Actually, $K^\wedge = \{\varphi_\chi : \chi \in G^\wedge\}$, and $\varphi_\chi = \varphi_\psi$ if and only if $\chi$ and $\psi$ are contained in the same orbit under the dual action of H on $G^\wedge$ given by $\chi^h(x) := \chi(x^{h^{-1}})$ for all $\chi \in G^\wedge$, $h \in H$, $x \in G$.
1.2.4.2.1 In the special case that $G := \mathbb{R}^d$ and $H := SO(d)$ for $d \ge 1$, the resulting orbit hypergroup $K := G^H \cong \mathbb{R}_+$ is self-dual, i.e., there exists a homeomorphism $\psi : K \to K^\wedge$ satisfying
$$\psi(x)(z)\,\psi(y)(z) = \int_K \psi(w)(z)\, (\varepsilon_x * \varepsilon_y)(dw)$$
for all $x, y, z \in K$.
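In fact, $K^\wedge$ is a hypergroup, where $\varepsilon_{\psi(x)} \,\hat *\, \varepsilon_{\psi(y)} = \psi(\varepsilon_x * \varepsilon_y)$ and $\psi(\omega_K) = \omega_{K^\wedge} = \pi_K$. For $\alpha := \frac{d}{2} - 1$ a Haar measure of K is given by
$$\omega_\alpha(dr) = \frac{1}{2^{d/2}\,\Gamma(\frac{d}{2})}\, r^{d-1}\, dr,$$
and the characters of K have the form
$$x \mapsto \varphi_\lambda(x) = \frac{\Gamma(\alpha + 1)}{\sqrt{\pi}\,\Gamma(\alpha + \frac12)} \int_{-1}^{1} e^{i\lambda x t}\, (1 - t^2)^{\alpha - \frac12}\, dt = j_\alpha(\lambda x)$$
for $\lambda \in \mathbb{C}$, $x \in \mathbb{R}_+$, $j_\alpha$ denoting the normalized Bessel function of index $\alpha$. Here $\varphi_\lambda = \varphi_{\lambda'}$ if and only if $\lambda^2 = \lambda'^2$, and $K^\wedge = \{\varphi_\lambda : \lambda \in \mathbb{R}_+\} \cong \mathbb{R}_+$. The Fourier transform of the Bessel-Kingman hypergroup $(\mathbb{R}^d)^{SO(d)}$ is known as the Hankel transform.
1.2.5. Definition A function $f \in C(K)$ is said to be positive definite if for all finite sequences $\{x_1, \ldots, x_n\}$ in K and $\{c_1, \ldots, c_n\}$ in $\mathbb{C}$ ($n \ge 1$)
$$\sum_{i=1}^{n} \sum_{j=1}^{n} c_i \overline{c_j}\, f(x_i * x_j^-) \ge 0.$$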
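The Poisson-integral formula for the characters $\varphi_\lambda = j_\alpha(\lambda\,\cdot)$ in 1.2.4.2.1 can be checked numerically. The Python sketch below uses the composite Simpson rule (the step count is an arbitrary choice) and assumes $\alpha \ge \frac12$, so that the weight $(1-t^2)^{\alpha-\frac12}$ stays bounded at $t = \pm 1$; for $\alpha = \frac12$, i.e. d = 3, it compares the result with the closed form $j_{1/2}(z) = \sin z / z$.

```python
import math

def j_alpha_poisson(alpha, z, steps=2000):
    """Normalized Bessel function j_alpha(z) from the Poisson integral
        Gamma(alpha+1) / (sqrt(pi) Gamma(alpha+1/2)) * int_{-1}^{1} cos(z t) (1 - t^2)^(alpha - 1/2) dt.
    The sine part of e^{izt} integrates to zero by symmetry, so only cos(zt) is kept.
    Composite Simpson rule; assumes alpha >= 1/2 so the weight is bounded at t = +-1."""
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    const = math.gamma(alpha + 1) / (math.sqrt(math.pi) * math.gamma(alpha + 0.5))
    h = 2.0 / steps
    total = 0.0
    for k in range(steps + 1):
        t = -1.0 + 2.0 * k / steps
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * math.cos(z * t) * max(0.0, 1.0 - t * t) ** (alpha - 0.5)
    return const * total * h / 3.0

# alpha = 1/2 corresponds to d = 3; then j_{1/2}(z) = sin(z) / z
for z in (0.5, 2.0, 7.3):
    print(z, j_alpha_poisson(0.5, z), math.sin(z) / z)
```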
Since on hypergroups positive definite functions are not necessarily bounded, we shall prefer to work with the set $PD(K)$ of bounded positive definite functions on K.
1.2.6. Properties of the set $PD(K)$
1.2.6.1 A function $f \in C(K)$ belongs to $PD(K)$ if and only if
$$\int_K f\, (g * g^\sim)\, d\omega_K \ge 0$$
for all $g \in C_c(K)$, where $g^\sim(x) := \overline{g(x^-)}$ whenever $x \in K$.
1.2.6.2 $L^2(K) * L^2(K)^\sim \subset PD(K)$.
1.2.6.3 (Bochner's Theorem) A function $f \in C(K)$ belongs to $PD(K)$ if and only if there exists a (uniquely determined) Bochner measure $\sigma \in M^b_+(K^\wedge)$ such that $f = \sigma^\vee$.
As a consequence we note:
1.2.6.4 For $f \in L^1(K) \cap PD(K)$ one has $\hat f \ge 0$, $\hat f \in L^1(K^\wedge)$ and $(\hat f)^\vee = f$.
Next we discuss a method of modifying the convolution of a hypergroup K, which will be applied to the theory of random fields in Section 3.
1.2.7. Definition Let $\chi_0 \in K^{*,p}$. The convolution
$$\varepsilon_x \bullet \varepsilon_y := \frac{1}{\chi_0(x * y)}\, \chi_0 \cdot (\varepsilon_x * \varepsilon_y)$$
for all $x, y \in K$ (where $\chi_0(x * y) = \chi_0(x)\chi_0(y)$, since $\chi_0$ is multiplicative) extends uniquely to a bilinear, associative, probability preserving and locally continuous convolution $\bullet$ on $M^b(K)$ such that $(K, \bullet)$ becomes a commutative hypergroup with the same identity and involution as $(K, *)$. For $\mu, \nu \in M_c(K)$ one has $(\chi_0 \cdot \mu) \bullet (\chi_0 \cdot \nu) = \chi_0 \cdot (\mu * \nu)$; hence the mapping $\mu \mapsto \chi_0 \cdot \mu$ establishes an algebra isomorphism between $(M_c(K), *)$ and $(M_c(K), \bullet)$. Note that this isomorphism cannot be extended to $M^b(K)$ unless $\chi_0$ is bounded. The hypergroup $(K, \bullet)$ is called the $\chi_0$-modification of $(K, *)$.
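Whenever we deal with a modification of $(K, *)$ we add a dot (writing, e.g., $K_\bullet$, $K^*_\bullet$, $K^\wedge_\bullet$) in order to emphasize the $\chi_0$-modified convolution. Let
$$K^\wedge_{\chi_0} := \Big\{\chi \in K^* : \sup_{x \in K} \frac{|\chi(x)|}{\chi_0(x)} < \infty\Big\}$$
and
$$K^\wedge_{\bullet, 1/\chi_0} := \Big\{\rho \in K^*_\bullet : \sup_{x \in K_\bullet} |\rho(x)|\, \chi_0(x) < \infty\Big\}.$$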
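1.2.8. Properties of $\chi_0$-modifications
1.2.8.1 $\omega_{K_\bullet} = \chi_0^2 \cdot \omega_K$ is a Haar measure of $K_\bullet$.
1.2.8.2 The mapping $\chi \mapsto \frac{\chi}{\chi_0}$ is a homeomorphism between $K^\wedge_{\chi_0}$ and $K^\wedge_\bullet$, and $K^\wedge_{\chi_0} = \{\chi \in K^* : |\chi(x)| \le \chi_0(x) \text{ for all } x \in K\}$.
1.2.8.3 For the Plancherel measures $\pi_K$ and $\pi_{K_\bullet}$ we have
$$\operatorname{supp}(\pi_K) \subset K^\wedge \cap K^\wedge_{\chi_0} \quad\text{and}\quad \operatorname{supp}(\pi_{K_\bullet}) \subset K^\wedge_\bullet \cap K^\wedge_{\bullet, 1/\chi_0}.$$
Moreover, considering the homeomorphism $\chi \mapsto \Phi(\chi) := \frac{\chi}{\chi_0}$ between $K^\wedge \cap K^\wedge_{\chi_0}$ and $K^\wedge_\bullet \cap K^\wedge_{\bullet, 1/\chi_0}$, we have $\pi_{K_\bullet} = \Phi(\pi_K)$.
With the above listed tools it can be proved that:
1.2.8.4 For every hypergroup $(K, *)$ there exists exactly one strictly positive character $\chi_0 \in \operatorname{supp}(\pi_K)$.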
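Definition 1.2.7 and Property 1.2.8.1 can be illustrated on the discrete cosine hypergroup of 1.3.3.2 with $\chi_0(n) = \cosh n$, the modification considered at the end of 1.3.3.3. The Python sketch below assumes the discrete Haar-weight formula $\omega(\{n\}) = (\varepsilon_n * \varepsilon_n)(\{0\})^{-1}$ (stated in 1.3.1.1 for polynomial hypergroups with identity involution) and compares the Haar weights of the modified hypergroup with $\chi_0^2 \cdot \omega_K$ in floating point; it is a numerical sanity check, not a proof.

```python
import math

def cosine_conv(m, n):
    """eps_m * eps_n = (eps_|m-n| + eps_{m+n}) / 2 in the discrete cosine hypergroup."""
    out = {}
    for k in (abs(m - n), m + n):
        out[k] = out.get(k, 0.0) + 0.5
    return out

def chi0(n):
    """Strictly positive semicharacter n -> cosh n (an illustrative choice)."""
    return math.cosh(n)

def modified_conv(m, n):
    """eps_m . eps_n := chi0 * (eps_m * eps_n) / chi0(m * n), cf. Definition 1.2.7."""
    base = cosine_conv(m, n)
    norm = sum(chi0(k) * w for k, w in base.items())   # chi0(m * n) = integral of chi0
    return {k: chi0(k) * w / norm for k, w in base.items()}

def haar_weight(conv, n):
    """omega({n}) = (eps_n conv eps_n)({0})^{-1}, normalized by omega({0}) = 1."""
    return 1.0 / conv(n, n)[0]

for m in range(6):
    for n in range(6):
        # chi0 is multiplicative: chi0(m * n) = chi0(m) chi0(n)
        total = sum(chi0(k) * w for k, w in cosine_conv(m, n).items())
        assert math.isclose(total, chi0(m) * chi0(n))
        # the modified convolution is again probability preserving
        assert math.isclose(sum(modified_conv(m, n).values()), 1.0)

for n in range(6):
    # Property 1.2.8.1: omega_{K.} = chi0^2 * omega_K, compared pointwise on Z_+
    print(n, haar_weight(modified_conv, n), chi0(n) ** 2 * haar_weight(cosine_conv, n))
```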
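1.2.8.5 Let $f \in B(K)$. If $f = u^{\vee_\bullet}$ with $u \in L^1(K^\wedge_\bullet)$, then
$$f = \frac{1}{\chi_0}\, (u \circ \Phi)^\vee,$$
and if $g = w^\vee$ with $w \in L^1(K^\wedge)$, then $g = \chi_0\, (w \circ \Phi^{-1})^{\vee_\bullet}$.
1.2.8.6 If $\sigma \in M^b(K^\wedge)$ with $\operatorname{supp}(\sigma) \subset K^\wedge \cap K^\wedge_{\chi_0}$, then $\sigma^\vee = \chi_0\, (\Phi(\sigma))^{\vee_\bullet}$, and if $\tau \in M^b(K^\wedge_\bullet)$ with $\operatorname{supp}(\tau) \subset K^\wedge_\bullet \cap K^\wedge_{\bullet, 1/\chi_0}$, then
$$\tau^{\vee_\bullet} = \frac{1}{\chi_0}\, (\Phi^{-1}(\tau))^\vee.$$
Finally we add:
1.2.8.7 Given $f \in C_c(K)$, we obtain for $\chi \in K^\wedge \cap K^\wedge_{\chi_0}$ with $\chi = \chi_0 \rho$, where $\rho \in K^\wedge_\bullet \cap K^\wedge_{\bullet, 1/\chi_0}$, that
$$\hat f(\chi) = \Big(\frac{f}{\chi_0}\Big)^{\wedge_\bullet}(\rho).$$
Examples of $\chi_0$-modifications of commutative hypergroups will be given in the following subsection.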
1.3. Basic constructions of hypergroups
There are prominent constructions of hypergroup structures on the spaces $\mathbb{Z}^d_+$ and $\mathbb{R}^d_+$, where the convolutions are defined in terms of polynomials and special functions, respectively.
1.3.1. Discrete polynomial hypergroups
Let K be a countable discrete space, and let $d \in \mathbb{N}$. We consider a set $\{Q_x : x \in K\}$ of polynomials on $\mathbb{C}^d$ and, for $n \in \mathbb{Z}_+$, the set $\mathcal{P}_n$ of polynomials $Q \in \mathbb{C}[X_1, \ldots, X_d]$ with $\deg(Q) \le n$, as well as the set $K_n := \{x \in K : Q_x \in \mathcal{P}_n\}$.
For all $x, y \in K$ the product $Q_x Q_y$ admits a complex linearization of the form
$$Q_x Q_y = \sum_{w \in K} c(x, y, w)\, Q_w$$
with linearizing coefficients $c(x, y, w) \in \mathbb{C}$. K is said to be a discrete (d-variable) polynomial hypergroup if there exists a set $\{Q_x : x \in K\}$ of polynomials such that $\{Q_x : x \in K_n\}$ is a basis of $\mathcal{P}_n$ for each $n \in \mathbb{Z}_+$ and a convolution $*_{Q_x}$ in $M^b(K)$ is given by $\varepsilon_x *_{Q_x} \varepsilon_y(\{w\}) := c(x, y, w)$ for all $x, y, w \in K$, where the numbers $c(x, y, w)$ are the linearizing coefficients introduced above. For every discrete polynomial hypergroup $(K, *_{Q_x})$ we have $K^* = \{\chi_z : z \in \mathbb{C}^d\}$, where $\chi_z$ is the evaluation mapping given by $\chi_z(x) := Q_x(z)$ for all $x \in K$.
1.3.1.1 First examples of discrete polynomial hypergroups in one variable ($d = 1$) are constructed as follows. Let $(a_n)_{n \in \mathbb{Z}_+}$, $(b_n)_{n \in \mathbb{Z}_+}$ and $(c_n)_{n \in \mathbb{N}}$ be sequences in $\mathbb{R}^\times_+$, $\mathbb{R}_+$ and $\mathbb{R}^\times_+$, respectively, satisfying the relations $a_0 + b_0 = 1$ and $a_n + b_n + c_n = 1$ for all $n \in \mathbb{N}$. We introduce a sequence $(Q_n)_{n \in \mathbb{Z}_+}$ of polynomials on $\mathbb{R}$ by $Q_0(x) = 1$, $Q_1(x) = \frac{1}{a_0}(x - b_0)$ and
$$Q_1(x) Q_n(x) = a_n Q_{n+1}(x) + b_n Q_n(x) + c_n Q_{n-1}(x) \quad (n \in \mathbb{N}) \qquad (1.1)$$
for all $x \in \mathbb{R}$. By Favard's theorem there exists a measure $\pi \in M^1(\mathbb{R})$ with the property that
$$\int_{\mathbb{R}} Q_n(x) Q_m(x)\, \pi(dx) = s_m \delta_{mn}$$
with $s_m > 0$. Consequently the sequence $(Q_n)_{n \in \mathbb{Z}_+}$ is orthogonal with respect to $\pi$, but not orthonormal; instead it is normalized by $Q_n(1) = 1$ for all $n \in \mathbb{Z}_+$. For $n, m \in \mathbb{Z}_+$ one obtains a real linearization
$$Q_n(x) Q_m(x) = \sum_{k=|n-m|}^{n+m} g(n, m, k)\, Q_k(x) \quad (x \in \mathbb{R}), \qquad (1.2)$$
where the coefficients $g(n, m, k) \in \mathbb{R}$ can be described recursively in terms of the sequences $(a_n)_{n \in \mathbb{Z}_+}$, $(b_n)_{n \in \mathbb{Z}_+}$ and $(c_n)_{n \in \mathbb{N}}$. Now, let $(Q_n)_{n \in \mathbb{Z}_+}$ be an orthogonal sequence of polynomials defined by the recurrence relation (1.1), and assume that the linearization (1.2) is nonnegative in the sense that $g(n, m, k) \ge 0$ for all $n, m, k \in \mathbb{Z}_+$ with $|n - m| \le k \le n + m$. Then
$$\varepsilon_n *_{Q_n} \varepsilon_m := \sum_{k=|n-m|}^{n+m} g(n, m, k)\, \varepsilon_k$$
for $n, m \in \mathbb{Z}_+$ provides $M^b(\mathbb{Z}_+)$ with a convolution $*_{Q_n}$ such that $\mathbb{Z}_+$ becomes a hypergroup with unit element 0 and involution $n \mapsto n^- := n$. The Haar measure $\omega_{\mathbb{Z}_+}$ of $(\mathbb{Z}_+, *_{Q_n})$ takes the form
$$\omega_{\mathbb{Z}_+}(\{n\}) = (\varepsilon_n *_{Q_n} \varepsilon_n)(\{0\})^{-1} = g(n, n, 0)^{-1}$$
for all $n \in \mathbb{Z}_+$. Moreover, $\mathbb{Z}^\wedge_+ = \{\chi_x : x \in D\}$, where $D := \{x \in \mathbb{R} : |Q_n(x)| \le 1 \text{ for all } n \in \mathbb{Z}_+\}$ is a compact subset of the interval $[1 - 2a_0, 1]$, and the Plancherel measure $\pi_{\mathbb{Z}_+}$ coincides with the measure that orthogonalizes the defining polynomial sequence $(Q_n)_{n \in \mathbb{Z}_+}$. Subexamples of the discrete polynomial hypergroups in one variable are:
1.3.1.1.1 The discrete Jacobi (polynomial) hypergroups $(\mathbb{Z}_+, *_{Q^{(\alpha,\beta)}_n})$, where $\{Q^{(\alpha,\beta)}_n : n \in \mathbb{Z}_+\}$ denotes the set of normalized Jacobi polynomials.
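The passage from the recurrence (1.1) to the convolution can be traced in exact rational arithmetic. The Python sketch below builds $Q_0, \ldots, Q_N$ from given sequences $(a_n)$, $(b_n)$, $(c_n)$, recovers the linearization coefficients $g(n, m, k)$ of (1.2) by matching leading coefficients, and reads off the Haar weights $g(n, n, 0)^{-1}$. The two parameter choices (Chebyshev polynomials of the first kind, and Legendre polynomials normalized by $Q_n(1) = 1$) are standard examples supplied here for illustration, and nonnegativity of g is only checked on a finite range, not proved.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Product of polynomials given as coefficient lists (index = degree)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, u in enumerate(p):
        for j, v in enumerate(q):
            out[i + j] += u * v
    return out

def recurrence_polys(N, a, b, c):
    """Q_0, ..., Q_N from Q_0 = 1, Q_1 = (x - b_0)/a_0 and recurrence (1.1)."""
    Q = [[Fraction(1)], [-b(0) / a(0), 1 / a(0)]]
    for n in range(1, N):
        # a_n Q_{n+1} = Q_1 Q_n - b_n Q_n - c_n Q_{n-1}
        nxt = poly_mul(Q[1], Q[n])
        for i, u in enumerate(Q[n]):
            nxt[i] -= b(n) * u
        for i, u in enumerate(Q[n - 1]):
            nxt[i] -= c(n) * u
        Q.append([u / a(n) for u in nxt])
    return Q

def linearize(Q, n, m):
    """g(n, m, k) with Q_n Q_m = sum_k g(n, m, k) Q_k, peeling off the top degree first."""
    rest = poly_mul(Q[n], Q[m])
    g = {}
    for k in range(n + m, -1, -1):
        coeff = rest[k] / Q[k][k]      # Q_k has exact degree k because a_j > 0
        if coeff != 0:
            g[k] = coeff
            for i, u in enumerate(Q[k]):
                rest[i] -= coeff * u
    return g

# Chebyshev polynomials of the first kind: a_0 = 1, b_n = 0, a_n = c_n = 1/2 (n >= 1)
cheb = recurrence_polys(8, lambda n: Fraction(1) if n == 0 else Fraction(1, 2),
                        lambda n: Fraction(0), lambda n: Fraction(1, 2))
# Legendre polynomials with Q_n(1) = 1: a_n = (n+1)/(2n+1), b_n = 0, c_n = n/(2n+1)
leg = recurrence_polys(8, lambda n: Fraction(n + 1, 2 * n + 1),
                       lambda n: Fraction(0), lambda n: Fraction(n, 2 * n + 1))

for Q in (cheb, leg):
    for n in range(4):
        for m in range(4):
            g = linearize(Q, n, m)
            assert all(v >= 0 for v in g.values())           # nonnegative linearization
            assert all(abs(n - m) <= k <= n + m for k in g)  # support in [|n-m|, n+m]
    print([1 / linearize(Q, n, n)[0] for n in range(4)])     # Haar weights g(n, n, 0)^{-1}
```

With these choices the computed Haar weights are 1, 2, 2, 2 in the Chebyshev case and 1, 3, 5, 7 (that is, $2n + 1$) in the Legendre case.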
It turns out that $\mathbb{Z}^\wedge_+ \cong [-1, 1]$, and that for all
$$(\alpha, \beta) \in V := \Big\{(\alpha', \beta') \in \mathbb{R}^2 : \alpha' \ge \beta' > -1 \text{ with } \beta' \ge -\tfrac12 \text{ or } \alpha' + \beta' \ge 0\Big\}$$
$(\mathbb{Z}_+, *_{Q^{(\alpha,\beta)}_n})$ is a strong, hence a Pontryagin hypergroup. For $\alpha = \beta$ the discrete Jacobi hypergroups become the ultraspherical hypergroups; for $\alpha = -\frac12$, $\alpha = \frac12$ and $\alpha = 0$ they are called Chebyshev hypergroups of the first and second kind, and Legendre hypergroups, respectively. In the case of the parameters
$$\alpha = \beta = \frac{d - 3}{2}$$
with an integer $d \ge 2$ we obtain the identifications $\mathbb{Z}_+ \cong (SO(d)//SO(d - 1))^\wedge$ and $\mathbb{Z}^\wedge_+ \cong SO(d)//SO(d - 1)$ (compare Example 1.1.2.3).
1.3.2. Compact polynomial hypergroups
This class arises from a dual view of the polynomials defining discrete polynomial hypergroups, as we have seen in the above cited example of the double coset hypergroup $SO(d)//SO(d - 1)$. Here the compact interval $[-1, 1]$ becomes a compact hypergroup with a (dual) convolution $\varepsilon_x \,\hat *_{Q_n}\, \varepsilon_y \in M^1([-1, 1])$ given by
$$Q^{(\alpha,\beta)}_n(x)\, Q^{(\alpha,\beta)}_n(y) = \int_{[-1,1]} Q^{(\alpha,\beta)}_n\, d(\varepsilon_x \,\hat *_{Q_n}\, \varepsilon_y)$$
for all $x, y \in [-1, 1]$, $n \in \mathbb{Z}_+$. $(\mathbb{Z}^\wedge_+, \hat *_{Q_n})$ is called a dual Jacobi hypergroup. This procedure motivates an axiomatic introduction of general compact polynomial hypergroups $(K, *_{\mathcal{P}})$, where K is a compact subset of $\mathbb{R}^d$ ($d \ge 1$) and $\mathcal{P} \subset \mathbb{C}[X_1, \ldots, X_d]$ is a family of polynomials on K which
are orthogonal with respect to some measure $\pi \in M_+(K)$. For the family $\mathcal{P}$ we require that for $x, y \in K$ there is a measure $\mu_{x,y} \in M^1(K)$ satisfying
$$P(x) P(y) = \int_K P\, d\mu_{x,y}$$
whenever $P \in \mathcal{P}$, and we introduce a convolution $*_{\mathcal{P}}$ in $M^1(K)$ by $\varepsilon_x *_{\mathcal{P}} \varepsilon_y := \mu_{x,y}$ for all $x, y \in K$. If the convolution $*_{\mathcal{P}}$ fulfills the axioms (H1) to (H4) of a hypergroup, $(K, *_{\mathcal{P}})$ is said to be a compact polynomial hypergroup. Rather than citing the full axiomatic approach described in [2], we just quote a few analytic properties and give special examples. Since K is compact, $K^\wedge$ is discrete and can be identified with the countable set $\mathcal{P}$. The Fourier-Stieltjes transform $\hat\mu$ of $\mu \in M^b(K)$ is given by
$$\hat\mu(\{P\}) = \int_K P\, d\mu$$
for all $P \in \mathcal{P}$, and the mappings $\mu \mapsto \hat\mu$ and $f \mapsto \hat f$ are injective on $M^b(K)$ and $L^1(K)$, respectively. Identifying the Plancherel measure $\pi_K$ of K with the measure $P \mapsto \pi_K(\{P\}) = \|P\|_2^{-2}$ on $\mathcal{P}$, the Plancherel-Levitan formula provides an isometric isomorphism $f \mapsto \hat f$ from $L^2(K)$ onto $\ell^2(\mathcal{P}, \pi_K)$.
1.3.2.1 Geometrically interesting examples can be listed.
1.3.2.1.1 The disk hypergroup $(D, *_\alpha)$ with basic space $D := \{z \in \mathbb{C} : |z| \le 1\}$ is defined via the family $(Q^\alpha_{m,n})_{(m,n) \in \mathbb{Z}^2_+}$ of disk polynomials of order $\alpha > 0$ given by
$$Q^\alpha_{m,n}(z) := Q^{(\alpha, |m-n|)}_{m \wedge n}(2|z|^2 - 1)\, |z|^{|m-n|}$$
for all $z \in D$, $m, n \in \mathbb{Z}_+$. $(D, *_\alpha)$ admits $1 := (1, 0) \in \mathbb{C}$ as a unit element and $z \mapsto \bar z$ as an involution.
The normalized Haar measure $\omega_\alpha$ of $(D, *_\alpha)$ can be computed as
$$\omega_\alpha(d(x, y)) = \frac{\alpha + 1}{\pi}\, (1 - x^2 - y^2)^\alpha\, dx\, dy.$$
The dual $D^\wedge \cong \mathbb{Z}^2_+$ turns out to be a hypergroup, so D is strong and as such a Pontryagin hypergroup. For $\alpha = d - 2$ ($d \ge 3$) the disk hypergroup D can be identified with the double coset hypergroup $U(d)//U(d - 1)$.
1.3.2.1.2 The half-disk hypergroup $(D_+, *)$ with $D_+ := \{(x, y) \in \mathbb{R}^2 : x^2 + y^2 \le 1,\ y \ge 0\}$ appears as the dual $E^\wedge$ of a hypergroup E whose convolution is defined via half-disk polynomials, which are Koornwinder type III polynomials. One can show that $D_+$ is the quotient hypergroup $D/\{-1, 1\}$, where $\{-1, 1\}$ is a subgroup of D; hence $E \cong D^\wedge_+$ becomes a subhypergroup of $D^\wedge \cong \mathbb{Z}^2_+$.
1.3.2.1.3 Cone-embedded hypergroups. Applying square polynomials and Koornwinder type III and IV polynomials, one can establish compact polynomial hypergroup structures on the unit square $[-1, 1]^2$, on the parabolic triangle $\{(x_1, x_2) \in \mathbb{R}^2 : 0 \le x_2^2 \le x_1 \le 1\}$ and on the triangle $\{(x_1, x_2) \in \mathbb{R}^2 : 0 \le x_2 \le x_1 \le 1\}$, respectively. For details and the relevant references see [30].
1.3.3. Sturm-Liouville hypergroups
These hypergroup structures on $\mathbb{R}_+$ are introduced by means of a Sturm-Liouville operator. A function $A \in C(\mathbb{R}_+) \cap C^1(\mathbb{R}^\times_+)$ is said to be admissible if $A(x) > 0$ for all $x \in \mathbb{R}^\times_+$ and if there exist constants $\varepsilon > 0$, $\alpha_0 \ge 0$ and a function $\alpha_1 \in C^\infty(]-\varepsilon, \varepsilon[)$ satisfying
$$\frac{A'}{A}(x) = \frac{\alpha_0}{x} + \alpha_1(x)$$
whenever $x \in \,]0, \varepsilon[$. In the singular case $\alpha_0 > 0$ one assumes in addition that $\alpha_1$ is even. We consider the Sturm-Liouville operator $L_A$ associated with A, defined by
$$L_A f(x) := -\frac{1}{A(x)}\, \big(A(x) f'(x)\big)'$$
for all $f \in C^2(\mathbb{R}^\times_+)$, $x > 0$, and the differential operator $l_A$ on $C^2((\mathbb{R}^\times_+)^2)$ defined by
$$l_A(u)(x, y) := L^1_A u(x, y) - L^2_A u(x, y)$$
for all $x, y > 0$, where $L^j_A$ denotes the Sturm-Liouville operator acting on the j-th variable of the function (j = 1, 2). Now, a hypergroup $(\mathbb{R}_+, *)$ is called a Sturm-Liouville hypergroup if there exists an admissible function A such that for each even function $f \in C^\infty(\mathbb{R})$ the function
$$(x, y) \mapsto u_f(x, y) := \int_{\mathbb{R}_+} f\, d(\varepsilon_x * \varepsilon_y)$$
on $\mathbb{R}^2_+$ belongs to $C^2((\mathbb{R}^\times_+)^2)$ and satisfies
$$l_A(u_f) = 0$$
as well as $(u_f)_y(x, 0) = 0$ whenever $x \in \mathbb{R}_+$. For each Sturm-Liouville hypergroup $K := (\mathbb{R}_+, *_L)$ the unit element is 0 and the involution coincides with the identity. A Haar measure of K is given by $\omega_K := A \cdot \lambda_{\mathbb{R}_+}$, where $\lambda_{\mathbb{R}_+}$ denotes the restriction of Lebesgue measure to $\mathbb{R}_+$. It turns out that $K^* = \{\varphi_\lambda : \lambda \in \mathbb{R} \cup i\mathbb{R}\}$, where the functions $\varphi_\lambda$ are the solutions of the initial value problem
$$L_A \varphi_\lambda = (\lambda^2 + \rho^2)\, \varphi_\lambda, \quad \varphi_\lambda(0) = 1,$$
and $\varphi_\lambda'(0) = 0$ for $\lambda \in \mathbb{R} \cup i\mathbb{R}$. Here
$$\rho := \lim_{x \to \infty} \frac{A'(x)}{2A(x)} \ge 0$$
denotes the index of K, which exists under additional assumptions on A. Moreover, $K^\wedge = \{\varphi_\lambda : \lambda \in \mathbb{R}_+ \cup i[0, \rho]\}$. For $\lambda \in i\,]\rho, \infty[$ the semicharacters $\varphi_\lambda$ of K are strictly positive and strictly increasing. Hence, given $\varphi_\lambda \in K^{*,p}$ for some $\lambda \in i\,]\rho, \infty[$, the $\varphi_\lambda$-modification of K is again a Sturm-Liouville hypergroup with admissible function $A_{\varphi_\lambda} := \varphi_\lambda^2 \cdot A$ and Haar measure $A_{\varphi_\lambda} \cdot \lambda_{\mathbb{R}_+}$.
1.3.3.1 Bessel-Kingman hypergroups are special cases of Sturm-Liouville hypergroups; they appear for the choice $x \mapsto A(x) := x^{2\alpha+1}$ with $\alpha > -\frac12$. These hypergroups $K := (\mathbb{R}_+, *_\alpha)$ are self-dual, the characters of K being given by the normalized Bessel functions $x \mapsto \varphi_\lambda(x) := j_\alpha(\lambda x)$ of order $\alpha$. For $\alpha = \frac12$,
$$K \cong M(3)//SO(3) \cong (\mathbb{R}^3)^{SO(3)}$$
(compare with Examples 1.1.2.1 and 1.2.4.2.1). The $\varphi_\lambda$-modification of K for $\lambda := i$ yields the Naimark hypergroup. We note that any strong Sturm-Liouville hypergroup is necessarily a Bessel-Kingman hypergroup for a suitable parameter $\alpha \in \mathbb{R}_+$.
1.3.3.2 Let $G := \mathbb{Z}$ and $H := \{e, i\}$, where e denotes the identity of H and i the inversion of $\mathbb{Z}$. Then the orbit hypergroup $K := \mathbb{Z}^H \cong \mathbb{Z}_+$ carries the
hypergroup structure given by the convolution
$$\varepsilon_m * \varepsilon_n := \frac{1}{2}\bigl(\varepsilon_{|m-n|} + \varepsilon_{m+n}\bigr)$$
for $m, n \in \mathbb{Z}_+$. $K$ is called the discrete cosine hypergroup. $K$ can also be considered as the discrete polynomial hypergroup generated by the sequence of Chebyshev polynomials of the first kind. $K^\wedge$ consists of the characters $\varphi_\lambda$ for $\lambda \in [0, \pi]$ given by $\varphi_\lambda(n) := \cos(\lambda n)$ whenever $n \in \mathbb{Z}_+$. Hence $K^\wedge \cong [0, \pi]$. It should be noted that the discrete cosine hypergroup is a subhypergroup of the symmetric hypergroup of noncompact type.

1.3.3.3 The symmetric hypergroup of noncompact type is introduced as the Sturm-Liouville hypergroup $L := (\mathbb{R}_+, *_A)$ with $A(x) = 1$ for all $x \in \mathbb{R}_+$. For this hypergroup the convolution reads
$$\varepsilon_x *_A \varepsilon_y := \frac{1}{2}\bigl(\varepsilon_{|x-y|} + \varepsilon_{x+y}\bigr)$$
for all $x, y \in \mathbb{R}_+$, and the characters are given by $\varphi_\lambda(x) = \cos(\lambda x)$ ($\lambda \in \mathbb{R}_+$) whenever $x \in \mathbb{R}_+$. Further approaches to the symmetric hypergroup of noncompact type lead to the representation of $L$ as the double coset hypergroup $G//H$ with $G := \mathbb{R} \rtimes \mathbb{Z}_2$ and $H := \mathbb{Z}_2$, or as the orbit hypergroup $\mathbb{R}^H$ with $H := \{e, i\}$, where $i$ denotes the inversion of $\mathbb{R}$. Moreover, $L$ arises as the limiting case $\alpha = -\frac{1}{2}$ of the Bessel-Kingman hypergroups $(\mathbb{R}_+, *_\alpha)$, since then $A(x) = x^{2\alpha+1} = 1$ and $j_{-1/2}(x) = \cos x$. $L$ is a Pontryagin hypergroup which admits the discrete cosine hypergroup $K$ as a non-trivial subhypergroup. The $\chi_0$-modification of $L$ with $\chi_0 \in L^{*,p}$ defined by $\chi_0(x) = \cosh x$
for all $x \in \mathbb{R}_+$ yields the cosh hypergroup, which is the Sturm-Liouville hypergroup $M := (\mathbb{R}_+, *_A)$ with $A(x) := \cosh^2 x$ for all $x \in \mathbb{R}_+$. $M$ is not strong and admits the discrete cosh hypergroup as a subhypergroup. The discrete cosh hypergroup can also be obtained as the $\chi_0$-modification of the discrete cosine hypergroup $K$ if one chooses $\chi_0 \in K^{*,p}$ with $\chi_0(n) := \cosh n$ for all $n \in \mathbb{Z}_+$.

1.3.4. Higher rank Bessel hypergroups

This class of self-dual commutative hypergroups enlarges the concept of Bessel-Kingman hypergroups introduced in 1.3.3.1 and already touched upon in Examples 1.1.2.1 and 1.2.4.2.1. Let $\mathbb{F}$ denote one of the division algebras $\mathbb{R}$, $\mathbb{C}$ or $\mathbb{H}$ (the algebra of Hamilton quaternions), with real dimension $d = 1, 2$ or $4$ respectively. The conjugate, real part and norm of $t \in \mathbb{F}$ are $\bar t$, $\operatorname{Re} t := \frac{1}{2}(t + \bar t)$ and $|t| := (t\bar t)^{1/2}$ respectively. For $p, q \in \mathbb{N}$ we introduce the vector space $M_{p,q} = M(p, q) := M_{p,q}(\mathbb{F})$ of $p \times q$-matrices over $\mathbb{F}$, and put $M_q := M_{q,q}$. The set $H(q) = H(q, \mathbb{F}) := \{x \in M_q(\mathbb{F}) : x = x^*\}$ of hermitian $q \times q$-matrices over $\mathbb{F}$ is a Euclidean vector space with scalar product $(x, y) \mapsto \langle x, y\rangle := \operatorname{Re}\operatorname{tr}(x^* y)$, where $x^* := \bar x^t$, and norm $x \mapsto \|x\| := \langle x, x\rangle^{1/2}$. Clearly,
$$\dim_{\mathbb{R}} H(q) =: n = q + \frac{d}{2}\, q(q - 1).$$
We also introduce the set $\Pi(q) := \Pi(q, \mathbb{F}) = \{x^2 : x \in H(q)\} = \{x^* x : x \in H(q)\}$ of positive semidefinite matrices in $H(q)$, and the symmetric cone $\Omega(q) := \Omega(q, \mathbb{F})$ of (strictly) positive definite matrices. $\Omega(q)$ is an open convex self-dual cone whose linear automorphism group acts transitively on it. Now the unitary group $U(p) := U(p; \mathbb{F})$ acts on $M(p, q)$ by left multiplication $(u, x) \mapsto ux$. The orbit space $M(p, q)^{U(p)}$ can be identified with the space $\Pi(q)$ via the mapping
$$U(p)x \mapsto (x^* x)^{1/2} =: |x|.$$
We note that (for $p \ge q$) the Stiefel manifold $\Sigma_{p,q} := \{x \in M(p, q) : x^* x = I_q\}$ is the $U(p)$-orbit of the block matrix
$$\sigma_0 := \begin{pmatrix} I_q \\ 0 \end{pmatrix} \in M(p, q).$$
It can be shown (here we follow the exposition [47] of M. Rösler) that the convolution $*$ of the orbit space $M(p, q)^{U(p)}$, transferred to $\Pi(q)$, is given by
$$\varepsilon_r * \varepsilon_s(f) = \int_{U(p)} f(|\sigma_0 r + u\sigma_0 s|)\, du = \int_{\Sigma_{p,q}} f(|\sigma_0 r + \sigma s|)\, d\sigma$$
for all $r, s \in \Pi(q)$ and $f \in C_c(\Pi(q))$, where $d\sigma$ denotes the normalized $U(p)$-invariant measure on $\Sigma_{p,q}$. It follows that $(\Pi(q), *)$ is an orbit hypergroup as defined in Example 1.1.3, with unit element $0$ and the identity as involution.
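For $q = 1$ and $\mathbb{F} = \mathbb{R}$ (so $d = 1$) the Stiefel manifold $\Sigma_{p,1}$ is the unit sphere $S^{p-1} \subset \mathbb{R}^p$, $\Pi(1) = \mathbb{R}_+$, and the convolution above reduces to $\varepsilon_r * \varepsilon_s(f) = \int_{S^{p-1}} f(|r e_1 + s\sigma|)\, d\sigma$. The following Python sketch (assuming NumPy; the helper names `j_half` and `orbit_convolution` are ours) evaluates this for $p = 3$, where the cosine $t$ of the angle between $\sigma$ and $e_1$ is uniformly distributed on $[-1, 1]$, and checks numerically that $j_{1/2}(x) = \sin x / x$ is multiplicative for this convolution, i.e. $j_{1/2}(r)\, j_{1/2}(s) = \varepsilon_r * \varepsilon_s(j_{1/2})$. This is the case $q = 1$, $\alpha = \frac{p}{2} - 1 = \frac{1}{2}$ of the Bessel-Kingman hypergroups; the test function is not compactly supported, which does not matter for this illustration.

```python
import numpy as np

# Gauss-Legendre rule on [-1, 1] for the variable t = <sigma, e_1>.
T_NODES, T_WEIGHTS = np.polynomial.legendre.leggauss(200)


def j_half(x):
    """Normalized Bessel function j_{1/2}(x) = sin(x) / x, with j_{1/2}(0) = 1."""
    return np.sinc(np.asarray(x, dtype=float) / np.pi)  # np.sinc(y) = sin(pi y) / (pi y)


def orbit_convolution(f, r, s):
    """(eps_r * eps_s)(f) for F = R, p = 3, q = 1.

    Averages f(|r e_1 + s sigma|) over sigma uniform on S^2.  For such sigma,
    t = <sigma, e_1> is uniform on [-1, 1] and |r e_1 + s sigma| = sqrt(r^2 + s^2 + 2 r s t).
    """
    radius = np.sqrt(r**2 + s**2 + 2.0 * r * s * T_NODES)
    return 0.5 * np.sum(T_WEIGHTS * f(radius))  # the density of t is 1/2 on [-1, 1]


pairs = [(0.5, 1.2), (2.0, 3.7), (4.1, 4.1)]
errors = [abs(orbit_convolution(j_half, r, s) - j_half(r) * j_half(s)) for r, s in pairs]
total_mass = orbit_convolution(np.ones_like, 1.0, 2.0)  # eps_r * eps_s is a probability measure
print(max(errors), total_mass)
```

The integrand is an analytic function of $t$, so a Gauss rule in $t$ is far more accurate here than Monte Carlo sampling on the sphere.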
The dual space $\Pi(q)^\wedge$ of $\Pi(q)$ consists of the functions $\varphi_s$ ($s \in \Pi(q)$) given by
$$\varphi_s(r) = \int_{U(p)} e^{-i\langle u\sigma_0 r,\, \sigma_0 s\rangle}\, du = \int_{\Sigma_{p,q}} e^{-i\langle \sigma,\, \sigma_0 s r\rangle}\, d\sigma.$$
These functions are Bessel functions of a matrix argument, introduced by C. Herz in [19]. Indeed, for $x \in M(p, q)$
$$\int_{\Sigma_{p,q}} e^{-i\langle \sigma, x\rangle}\, d\sigma = I_\mu\Bigl(\frac{1}{4} x^* x\Bigr),$$
hence
$$\varphi_s(r) = I_\mu\Bigl(\frac{1}{4} r s^2 r\Bigr)$$
for all $r \in \Pi(q)$, where $\mu := \frac{pd}{2}$. Thus $(\Pi(q), *) =: (\Pi(q), *_\mu)$ is a self-dual hypergroup via the homeomorphism $s \mapsto \varphi_s$ from $\Pi(q)$ onto $\Pi(q)^\wedge$. Its Haar measure arises as the image, under the mapping $x \mapsto |x|$, of the Lebesgue measure
$$\Bigl(\frac{1}{2\pi}\Bigr)^{pqd/2} dx$$
of $M(p, q)$, and it can be computed explicitly. Because of the self-duality of $(\Pi(q), *_\mu)$ its Haar measure coincides with the Plancherel measure. It should be noted that the work of M. Rösler quoted above yields a generalization of the higher rank Bessel hypergroups $(\Pi(q), *_\mu)$ to a continuous series of commutative hypergroup structures on $\Pi(q)$ which interpolates those occurring as orbit hypergroups. The corresponding parameters $\mu$ are real numbers $> d(q - \frac{1}{2})$. In particular, for $\mu = \frac{pd}{2}$ with $p \ge 2q$ this generalization yields the higher rank Bessel hypergroups of orbit type, and for $q = 1$ the Bessel-Kingman hypergroups of the form $((\mathbb{R}^p)^{SO(p)}, *_\alpha)$ with $\alpha := \frac{p}{2} - 1$.
2. Random Fields over Hypergroups

2.1. Second order random fields

In the following discussion $K := (K, *)$ denotes a commutative hypergroup with convolution $*$. For a given probability space $(\Omega, \mathcal{F}, P)$ we consider the Hilbert space $L^2 = L^2(\Omega, \mathcal{F}, P; \mathbb{C})$ of all complex-valued, square $P$-integrable random variables on $(\Omega, \mathcal{F}, P)$ and its closed subspace $L_0^2 := \{\xi \in L^2 : E(\xi) = 0\}$ of centered elements of $L^2$. Given a locally compact space $E$ and a mapping $f : E \to L^2$, $L^2(f) := \overline{\operatorname{sp}}(\{f(x) : x \in E\})$ denotes the closed linear subspace generated by the set $\{f(x) : x \in E\}$. If $L^2(f)$ is separable, then for any measure space $(\Sigma, \mathcal{A}, \mu)$ with a bounded measure $\mu$ on $(\Sigma, \mathcal{A})$ such that $\dim L^2(\Sigma, \mathcal{A}, \mu; \mathbb{C}) = \dim L^2(f)$ we have $L^2(\Sigma, \mathcal{A}, \mu; \mathbb{C}) \cong L^2(f)$.

2.1.1. Definition A (second order) random field over $K$ is a mapping $X : K \to L^2$. The mapping $\rho_X : K \times K \to \mathbb{C}$ given by
$$\rho_X(a, b) := E\Bigl[\bigl(X(a) - E(X(a))\bigr)\,\overline{\bigl(X(b) - E(X(b))\bigr)}\Bigr]$$
for all $a, b \in K$ is called the covariance kernel of $X$. $X$ is said to be centered if $E(X(a)) = 0$ for all $a \in K$. IN WHAT FOLLOWS WE SHALL DEAL EXCLUSIVELY WITH CENTERED RANDOM FIELDS OVER $K$. Clearly, a random field $X$ over $K$ is bounded or continuous if and only if $\rho_X$ is bounded or continuous respectively.
2.1.2. Definition A random field $X$ over $K$ is called stationary (over $K$) if $\rho_X \in C^b(K \times K)$ and
$$\rho_X(a, b) = \int_K \rho_X(c, e)\, (\varepsilon_a * \varepsilon_{b^-})(dc)$$
whenever $a, b \in K$.

2.1.3. Consequences The covariance kernel $\rho_X$ of a stationary random field $X$ over $K$ is always positive definite. Conversely (Kolmogorov), for any continuous positive definite function $\rho : K \to \mathbb{C}$ there exists a stationary random field $X$ over $K$ such that $\rho(a) = \rho_X(a, e)$ for all $a \in K$. Since $\rho_X$ is positive definite, by Property 1.2.6.3 there exists a unique (Bochner) measure $\mu := \mu_X \in M_+^b(K^\wedge)$ such that
$$\rho_X(a, e) = \int_{K^\wedge} \chi(a)\, \mu(d\chi)$$
for all $a \in K$. Consequently,
$$\rho_X(a, b) = \int_{K^\wedge} \chi(a)\overline{\chi(b)}\, \mu(d\chi)$$
whenever $a, b \in K$, and hence $X$ belongs to the Karhunen class. The measure $\mu =: \mu_X$ representing $\rho_X$ is called the spectral measure of the random field $X$. If, in particular, $\mu = f \cdot \pi_K$, then $f =: f_X$ is said to be the (deterministic) spectral density of $X$. Besides the spectral representation, Cramér's stochastic representation for stationary random fields over groups also admits an analogue for hypergroups: a random field $X$ over $K$ is stationary if and only if there exists a unique orthogonal stochastic measure $Z := Z_X : \mathcal{B}(K^\wedge) \to L^2$ such that
$$X(a) = \int_{K^\wedge} \chi(a)\, Z(d\chi)$$
for all $a \in K$. Moreover, $\|Z(B)\|_2^2 = \mu(B)$ whenever $B \in \mathcal{B}(K^\wedge)$.
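To make 2.1.3 concrete, take the discrete cosine hypergroup $K = \mathbb{Z}_+$ of 1.3.3.2, whose characters are $n \mapsto \cos(\lambda n)$, $\lambda \in [0, \pi]$. The Python sketch below assumes, purely for illustration, a spectral measure $\mu$ with finitely many atoms $\lambda_j$ and masses $w_j$ (hypothetical values). It builds $X$ through Cramér's representation $X(n) = \sum_j \cos(\lambda_j n) Z_j$ with independent centered Gaussian $Z_j$ of variance $w_j = \mu(\{\lambda_j\})$, compares the empirical covariance with $\rho_X(m, n) = \sum_j w_j \cos(\lambda_j m)\cos(\lambda_j n)$, and checks the stationarity relation of 2.1.2, which for this hypergroup reads $\rho_X(m, n) = \frac{1}{2}\bigl(\rho_X(|m-n|, 0) + \rho_X(m+n, 0)\bigr)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical atomic spectral measure mu on the dual [0, pi] of the discrete cosine hypergroup.
lam = np.array([0.3, 1.1, 2.5])   # atoms lambda_j
w = np.array([1.0, 0.5, 0.25])    # masses mu({lambda_j})
N = 8                             # the field is examined on {0, ..., N}

n = np.arange(N + 1)
chi = np.cos(np.outer(n, lam))    # chi[n, j] = cos(lambda_j n)

# Covariance kernel from the spectral representation (the characters are real here).
rho = chi @ np.diag(w) @ chi.T


def stationarity_defect(rho):
    """max |rho(a, b) - (rho(|a-b|, 0) + rho(a+b, 0)) / 2| over indices inside the table."""
    size = rho.shape[0]
    return max(
        abs(rho[a, b] - 0.5 * (rho[abs(a - b), 0] + rho[a + b, 0]))
        for a in range(size) for b in range(size) if a + b < size
    )


# Cramer representation: X(n) = sum_j cos(lambda_j n) Z_j with independent Z_j ~ N(0, w_j).
n_samples = 200_000
Z = rng.normal(size=(n_samples, lam.size)) * np.sqrt(w)
X = Z @ chi.T                     # each row is one sample path (X(0), ..., X(N))
rho_emp = X.T @ X / n_samples     # empirical covariance kernel

print(stationarity_defect(rho), np.abs(rho_emp - rho).max())
```

The stationarity defect is at rounding level, while the empirical covariance agrees with $\rho_X$ only up to Monte Carlo error of order $n_{\text{samples}}^{-1/2}$.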
2.1.4. Examples of stationary random fields over $K$

2.1.4.1 Stationarity under the passage to real parts. Let $G$ be a locally compact Abelian group and $H := \{e, i\}$ a group acting on $G$ via inversion $t \mapsto i(t) := t^{-1}$. By $K := G^H$ we denote the orbit hypergroup associated with the pair $(G, H)$. Let $X$ be a stationary random field over $G$ which is symmetric in the sense that $X(t^{-1}) = \overline{X(t)}$ for all $t \in G$. Then the real part random field $Y$ over $K$ defined by $Y(a^H) := \operatorname{Re} X(a)$ for all $a \in G$ is stationary over $K$. In fact, applying the identities $(a^H)^- = (a^{-1})^H = a^H$ and
$$E\bigl[X(a)\overline{X(b)}\bigr] = E\bigl[X(ab^{-1})\overline{X(e)}\bigr],$$
one computes for all $a, b \in G$ that
$$\begin{aligned}
\rho_Y(a^H, b^H) &= E\Bigl[\tfrac{1}{2}\bigl(X(a) + X(a^{-1})\bigr)\,\overline{\tfrac{1}{2}\bigl(X(b) + X(b^{-1})\bigr)}\Bigr]\\
&= \tfrac{1}{2}\rho_Y\bigl((ab^{-1})^H, e\bigr) + \tfrac{1}{2}\rho_Y\bigl((ab)^H, e\bigr)\\
&= \int_{G^H} \rho_Y(z, e) \int_H \varepsilon_{(a\,t(b))^H}(dz)\, \omega_H(dt)\\
&= \int_{G^H} \rho_Y(z, e)\, (\varepsilon_{a^H} * \varepsilon_{(b^H)^-})(dz).
\end{aligned}$$
As an immediate application of this example, one obtains that the real part random fields arising from symmetric stationary random fields over $G := \mathbb{Z}$ or $\mathbb{R}$ are stationary over the discrete cosine hypergroup (1.3.3.2) or over the symmetric hypergroup of noncompact type (1.3.3.3) respectively. One just has to refer to the explicit forms of the respective convolutions.

2.1.4.2 Stationarity under modification. Let $X$ be a stationary random field over $K$. For $\chi_0 \in K^{*,p}$ we consider the $\chi_0$-modification $K_\bullet$ of $K$.
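Then the random field $Y$ over $K$ defined by
$$Y(a) := \frac{1}{\chi_0(a)}\, X(a)$$
for all $a \in K$ is stationary over $K_\bullet$. We now list two subexamples of 2.1.4.2 concerning mean value estimator random fields.

2.1.4.2.1 Let $X$ be a stationary random field over $\mathbb{Z}$. Then the random field $Y$ defined by
$$Y(n) := \frac{1}{2n+1}\sum_{k=-n}^{n} X(k)$$
for all $n \in \mathbb{Z}_+$ is no longer stationary in the classical sense, but it is stationary over the discrete Jacobi polynomial hypergroup $K := (\mathbb{Z}_+, *_{Q_n^{(\alpha,\beta)}})$ with $\alpha = \frac{1}{2}$, $\beta = -\frac{1}{2}$ (1.3.1.1.1). In fact, for all $n, m \in \mathbb{Z}_+$
$$\begin{aligned}
\rho_Y(n, m) &= \int_{-\pi}^{\pi} \Bigl(\frac{1}{2n+1}\sum_{k=-n}^{n} e^{ikt}\Bigr)\,\overline{\Bigl(\frac{1}{2m+1}\sum_{k=-m}^{m} e^{ikt}\Bigr)}\, \mu(dt)\\
&= \int_{-\pi}^{\pi} \frac{\sin[(2n+1)\frac{t}{2}]}{(2n+1)\sin\frac{t}{2}}\cdot\frac{\sin[(2m+1)\frac{t}{2}]}{(2m+1)\sin\frac{t}{2}}\, \mu(dt)\\
&= \int_{-\pi}^{\pi} Q_n^{(\frac12,-\frac12)}(\cos t)\, Q_m^{(\frac12,-\frac12)}(\cos t)\, \mu(dt)\\
&= \sum_{k=|n-m|}^{n+m} \varepsilon_n * \varepsilon_m(\{k\}) \int_{-\pi}^{\pi} Q_k^{(\frac12,-\frac12)}(\cos t)\, \mu(dt)\\
&= \sum_{k=|n-m|}^{n+m} \rho_Y(k, 0)\, \varepsilon_n * \varepsilon_m(\{k\}).
\end{aligned}$$
Now, for $\chi_t \in K^{*,p}$ ($t \in\, ]1, \infty[$) we consider the $\chi_t$-modification $K_\bullet$ of $K$ (which is again a discrete polynomial hypergroup). Then the random field $\tilde Y$ over $K_\bullet$ given by
$$\tilde Y(n) := \frac{1}{(2n+1)\chi_t(n)}\sum_{k=-n}^{n} X(k)$$
for all $n \in K_\bullet = \mathbb{Z}_+$ is stationary over $K_\bullet$.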
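The computation above can also be checked numerically without passing through the spectral measure. From $\sum_{|j| \le n} e^{ijt} \cdot \sum_{|l| \le m} e^{ilt} = \sum_{k=|n-m|}^{n+m} \sum_{|j| \le k} e^{ijt}$ one reads off (our own derivation, not stated in the text) the convolution of the Jacobi hypergroup with $\alpha = \frac{1}{2}$, $\beta = -\frac{1}{2}$: $(\varepsilon_n * \varepsilon_m)(\{k\}) = \frac{2k+1}{(2n+1)(2m+1)}$ for $|n - m| \le k \le n + m$. The Python sketch below takes a hypothetical real covariance $r(k) = \sum_j w_j \cos(\lambda_j k)$ of a stationary field over $\mathbb{Z}$, computes $\rho_Y(n, m)$ directly as the covariance of the averages $\frac{1}{2n+1}\sum_{|k| \le n} X(k)$, and compares it with $\sum_k \rho_Y(k, 0)\, (\varepsilon_n * \varepsilon_m)(\{k\})$.

```python
import numpy as np

# Hypothetical real covariance r(k) = sum_j w_j cos(lambda_j k) of a stationary field over Z.
lam = np.array([0.4, 1.7, 2.9])
w = np.array([0.6, 0.3, 0.1])


def r(k):
    """Covariance E[X(k + l) conj(X(l))]; works for scalar or array-valued k."""
    return np.sum(w * np.cos(np.multiply.outer(k, lam)), axis=-1)


def rho_Y(n, m):
    """Covariance of Y(n) = (2n+1)^{-1} sum_{|j|<=n} X(j) and Y(m), computed directly."""
    j = np.arange(-n, n + 1)
    l = np.arange(-m, m + 1)
    return r(np.subtract.outer(j, l)).sum() / ((2 * n + 1) * (2 * m + 1))


def conv_weight(n, m, k):
    """(eps_n * eps_m)({k}) for the Jacobi hypergroup with alpha = 1/2, beta = -1/2."""
    if abs(n - m) <= k <= n + m:
        return (2 * k + 1) / ((2 * n + 1) * (2 * m + 1))
    return 0.0


worst = 0.0
for n in range(6):
    for m in range(6):
        ks = range(abs(n - m), n + m + 1)
        assert abs(sum(conv_weight(n, m, k) for k in ks) - 1.0) < 1e-12  # probability measure
        rhs = sum(rho_Y(k, 0) * conv_weight(n, m, k) for k in ks)
        worst = max(worst, abs(rho_Y(n, m) - rhs))
print(worst)
```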
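2.1.4.2.2 Let $X$ be a stationary random field over $\mathbb{R}$. The random field $Y$ over $\mathbb{R}_+$ given by
$$Y(a) := \begin{cases} \dfrac{1}{2a}\displaystyle\int_{-a}^{a} X(s)\, ds & \text{if } a \in \mathbb{R}_+^\times,\\[1ex] X(0) & \text{if } a = 0 \end{cases}$$
is stationary over the Bessel-Kingman hypergroup $K := (\mathbb{R}_+, *_\alpha)$ with $\alpha = \frac{1}{2}$ (1.3.3.1). In fact, for all $a, b \in \mathbb{R}_+$
$$\begin{aligned}
\rho_Y(a, b) &= \int_{\mathbb{R}} \Bigl(\frac{1}{2a}\int_{-a}^{a} e^{is\lambda}\, ds\Bigr)\,\overline{\Bigl(\frac{1}{2b}\int_{-b}^{b} e^{ir\lambda}\, dr\Bigr)}\, \mu(d\lambda)
= \int_{\mathbb{R}} \frac{\sin(a\lambda)}{a\lambda}\cdot\frac{\sin(b\lambda)}{b\lambda}\, \mu(d\lambda)\\
&= \int_{\mathbb{R}} j_{1/2}(a\lambda)\, j_{1/2}(b\lambda)\, \mu(d\lambda)
= \int_{\mathbb{R}} \Bigl(\int_{\mathbb{R}_+} j_{1/2}(t\lambda)\, (\varepsilon_a * \varepsilon_b)(dt)\Bigr) \mu(d\lambda)\\
&= \int_{\mathbb{R}_+} \rho_Y(t, 0)\, (\varepsilon_a * \varepsilon_b)(dt),
\end{aligned}$$
where $j_{1/2}(x) = \frac{\sin x}{x}$ (with $j_{1/2}(0) = 1$). For $\varphi_\lambda \in K^{*,p}$ with $\lambda \in i\,]\rho, \infty[$ we consider the $\varphi_\lambda$-modification $K_\bullet$ of $K$. Then the random field $\tilde Y$ over $K_\bullet$ given by
$$\tilde Y(a) := \begin{cases} \dfrac{1}{2a\,\varphi_\lambda(a)}\displaystyle\int_{-a}^{a} X(s)\, ds & \text{if } a \in \mathbb{R}_+^\times,\\[1ex] X(0) & \text{if } a = 0 \end{cases}$$
is stationary over $K_\bullet$. The $\varphi_\lambda$-modification with $\lambda = i$ yields stationarity over the Naimark hypergroup (1.3.3.1).

2.1.4.2.3 Let $X$ be a symmetric stationary random field over $\mathbb{Z}$. By Example 2.1.4.1 the random field $Y := \operatorname{Re} X$ is stationary over the discrete cosine hypergroup $K := \mathbb{Z}_+$. Moreover, the random field $\tilde Y$ defined by
$$\tilde Y(n) := \frac{1}{\cosh n}\operatorname{Re} X(n)$$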
for all $n \in \mathbb{Z}_+$ is stationary over the discrete cosh hypergroup $K_\bullet$, which is the $\chi_0$-modification of $K$ with $\chi_0 := \cosh$.

2.1.4.2.4 Let $X$ be a symmetric stationary random field over $\mathbb{R}$ and $Y := \operatorname{Re} X$. Then $Y$ is stationary over the symmetric hypergroup $K$ of noncompact type by Example 2.1.4.1, and the random field $\tilde Y$ defined by
$$\tilde Y(a) := \frac{1}{\cosh a}\operatorname{Re} X(a)$$
for all $a \in \mathbb{R}_+$ is stationary over the cosh hypergroup $K_\bullet$, the modifying positive semicharacter of $K$ being $\cosh$.
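Examples 2.1.4.1 and 2.1.4.2.3 can be checked in the same spirit. Assume, for illustration, that $X$ is a symmetric stationary field over $\mathbb{Z}$ with a real covariance $r(k) = E[X(k)\overline{X(0)}]$, so that $r(-k) = r(k)$. Since $\operatorname{Re} X(n) = \frac{1}{2}(X(n) + X(-n))$, the covariance of $Y = \operatorname{Re} X$ is $\rho_Y(m, n) = \frac{1}{4}\bigl(r(m-n) + r(m+n) + r(-m-n) + r(-m+n)\bigr)$. The Python sketch below verifies the stationarity relation of the discrete cosine hypergroup for $\rho_Y$, and then that of the discrete cosh hypergroup for $\tilde Y(n) = \operatorname{Re} X(n)/\cosh n$. For the latter it uses the usual form of the $\chi_0$-modified convolution, $\varepsilon_m *_\bullet \varepsilon_n = (\chi_0(m)\chi_0(n))^{-1}\, \chi_0 \cdot (\varepsilon_m * \varepsilon_n)$, which we assume here rather than take from this section; the covariance values and helper names are hypothetical.

```python
import numpy as np
from collections import defaultdict

# Hypothetical real, even covariance r(k) = E[X(k) conj(X(0))] of a symmetric stationary field over Z.
lam = np.array([0.2, 0.9, 2.2])
w = np.array([0.5, 0.3, 0.2])


def r(k):
    return float(np.sum(w * np.cos(k * lam)))


def rho_Y(m, n):
    """Covariance of Y(n) = Re X(n) = (X(n) + X(-n)) / 2 for the symmetric field X."""
    return 0.25 * (r(m - n) + r(m + n) + r(-m - n) + r(-m + n))


def cosine_conv(m, n):
    """Discrete cosine hypergroup: eps_m * eps_n = (eps_{|m-n|} + eps_{m+n}) / 2."""
    out = defaultdict(float)
    out[abs(m - n)] += 0.5
    out[m + n] += 0.5
    return out


def cosh_conv(m, n):
    """Assumed chi_0-modification with chi_0 = cosh: weight p_k cosh(k) / (cosh(m) cosh(n))."""
    out = defaultdict(float)
    for k, p in cosine_conv(m, n).items():
        out[k] += p * np.cosh(k) / (np.cosh(m) * np.cosh(n))
    return out


def rho_Ytilde(m, n):
    """Covariance of Ytilde(n) = Re X(n) / cosh(n)."""
    return rho_Y(m, n) / (np.cosh(m) * np.cosh(n))


def defect(rho, conv, size=8):
    """max |rho(m, n) - sum_k rho(k, 0) conv(m, n)({k})| over 0 <= m, n < size."""
    return max(abs(rho(m, n) - sum(rho(k, 0) * p for k, p in conv(m, n).items()))
               for m in range(size) for n in range(size))


print(defect(rho_Y, cosine_conv), defect(rho_Ytilde, cosh_conv))
```

The `defaultdict` accumulates the two masses when $|m - n| = m + n$, that is when $m = 0$ or $n = 0$.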
2.1.5. Stationary random fields occurring in statistics

Primary sources for the subsequent examples are the papers [32] by R. Lasser and M. Leitner and [24], [25] by V. Hösel and R. Lasser. Let $(\mathbb{Z}_+, *_{Q_n})$ be a discrete polynomial hypergroup in one variable, with Haar measure $\omega_{\mathbb{Z}_+}$ and dual space $\mathbb{Z}_+^\wedge \cong D$ (see 1.3.1.1).

2.1.5.1 Oscillations. Let $(X_n)_{n \in \mathbb{Z}}$ be a sequence of orthogonal random variables $X_n$ on a probability space $(\Omega, \mathcal{A}, P)$ with mean $0$ and finite second moments $\sigma_n^2$. Let $(t_n)_{n \in \mathbb{Z}}$ be a fixed sequence of characters in $D$, and assume that
$$\sum_{k=-\infty}^{\infty} \sigma_k^2 < \infty.$$
Then
$$Y(n) := \sum_{k=-\infty}^{\infty} X_k\, Q_n(t_k)$$
for all $n \in \mathbb{Z}_+$ defines a stationary random field $Y$ over $\mathbb{Z}_+$; it is called an oscillation of the sequence $(X_n)_{n \in \mathbb{Z}}$. In fact, for the covariance kernel $\rho_Y$ of $Y$ we have
$$\rho_Y(n, m) = \sum_{k=-\infty}^{\infty} \sigma_k^2\, Q_n(t_k)\, Q_m(t_k) = \sum_{j=|n-m|}^{n+m} \rho_Y(j, 0)\, (\varepsilon_n *_{Q_n} \varepsilon_m)(\{j\})$$
whenever $n, m \in \mathbb{Z}_+$.
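The oscillation identity can be tested for a concrete discrete polynomial hypergroup. The Python sketch below uses the Legendre polynomials $P_n$, normalized by $P_n(1) = 1$ and orthogonal with respect to the probability measure $dx/2$ on $[-1, 1]$ (the Plancherel measure in this case, with $D = [-1, 1]$). It computes the convolution weights $(\varepsilon_n * \varepsilon_m)(\{k\}) = \int P_n P_m P_k\, d\pi / \int P_k^2\, d\pi$ by Gauss-Legendre quadrature, which is exact for the polynomial degrees involved. The points $t_k$ and variances $\sigma_k^2$ are hypothetical, with only finitely many $\sigma_k^2 \neq 0$. As a by-product it recovers the Haar weights $\omega_{\mathbb{Z}_+}(\{n\}) = (\varepsilon_n * \varepsilon_n)(\{0\})^{-1}$, which for the Legendre hypergroup equal $2n + 1$ and enter the white noise normalization of 2.1.5.2.

```python
import numpy as np
from numpy.polynomial import legendre

NODES, WEIGHTS = legendre.leggauss(60)   # exact for polynomials of degree <= 119
WEIGHTS = WEIGHTS / 2.0                   # probability measure dx/2 on [-1, 1]


def P(n, x):
    """Legendre polynomial P_n, normalized by P_n(1) = 1."""
    c = np.zeros(n + 1)
    c[n] = 1.0
    return legendre.legval(x, c)


def conv_weight(n, m, k):
    """(eps_n * eps_m)({k}) = int P_n P_m P_k d(pi) / int P_k^2 d(pi)."""
    num = np.sum(WEIGHTS * P(n, NODES) * P(m, NODES) * P(k, NODES))
    return num / np.sum(WEIGHTS * P(k, NODES) ** 2)


# Hypothetical oscillation: finitely many points t_k in D = [-1, 1] with variances sigma_k^2.
t = np.array([-0.8, 0.1, 0.55, 0.95])
sigma2 = np.array([0.4, 1.0, 0.3, 0.2])


def rho_Y(n, m):
    return float(np.sum(sigma2 * P(n, t) * P(m, t)))


haar = [1.0 / conv_weight(n, n, 0) for n in range(6)]   # Haar weights omega({n})
worst = 0.0
for n in range(8):
    for m in range(8):
        ks = range(abs(n - m), n + m + 1)
        g = [conv_weight(n, m, k) for k in ks]
        assert min(g) > -1e-12 and abs(sum(g) - 1.0) < 1e-10   # a probability measure
        rhs = sum(rho_Y(k, 0) * gk for k, gk in zip(ks, g))
        worst = max(worst, abs(rho_Y(n, m) - rhs))
print(np.round(haar, 10), worst)
```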
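2.1.5.2 White noise. A random field $W$ over $\mathbb{Z}_+ := (\mathbb{Z}_+, *_{Q_n})$ is called a white noise over $\mathbb{Z}_+$ if
$$E\bigl(W(n)\overline{W(m)}\bigr) = \delta_{n,m}\, \omega_{\mathbb{Z}_+}(\{n\})^{-1}$$
for all $n, m \in \mathbb{Z}_+$. Clearly, $W$ is stationary, and $\mu_W = \pi_{\mathbb{Z}_+}$.

2.1.5.3 Moving averages. Let $W$ be a white noise over $\mathbb{Z}_+$. For any random field $Y$ over $\mathbb{Z}_+$ we define
$$\varepsilon_n * Y(m) := \sum_{k=|n-m|}^{n+m} Y(k)\, (\varepsilon_n *_{Q_n} \varepsilon_m)(\{k\})$$
for all $m \in \mathbb{Z}_+$. A random field $X$ over $\mathbb{Z}_+$ is called a moving average over $\mathbb{Z}_+$ if there exists a sequence $(a_n)_{n \in \mathbb{Z}_+}$ in $L^2(\mathbb{Z}_+, \omega_{\mathbb{Z}_+})$ such that
$$X(n) = \sum_{k=0}^{\infty} a_k\, \varepsilon_n * W(k)\, \omega_{\mathbb{Z}_+}(\{k\})$$
whenever $n \in \mathbb{Z}_+$. It has to be shown that $X$ is well-defined and stationary. For that purpose we introduce, for a given stationary random field $Y$ over $\mathbb{Z}_+$, the sequence $(T_m)_{m \in \mathbb{Z}_+}$ of translation operators $T_m$ generated by $Y$, acting on $H := \overline{\operatorname{sp}}(\{Y(n) : n \in \mathbb{Z}_+\})$. In fact, the linear, symmetric and norm-decreasing mapping
$$\sum_{k=0}^{N} b_k\, Y(k) \mapsto \sum_{k=0}^{N} b_k\, \varepsilon_m * Y(k)$$
on $\operatorname{sp}(\{Y(n) : n \in \mathbb{Z}_+\})$ admits a continuous symmetric extension $T_m$ to $H$. Moreover,
$$T_m T_n = \sum_{k=|n-m|}^{n+m} T_k\, (\varepsilon_n *_{Q_n} \varepsilon_m)(\{k\})$$
for all $m, n \in \mathbb{Z}_+$. Now we return to the moving average $X$ over $\mathbb{Z}_+$. Since $X(0) \in L^2$, we have $X(m) = T_m X(0) \in L^2$,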
where $T_m$ denotes the translation operator generated by $W$. The stationarity of $X$ follows from the equations
$$\begin{aligned}
\rho_X(n, m) &= E\bigl[(T_n X(0))\,\overline{T_m X(0)}\bigr] = E\bigl[(T_m T_n X(0))\,\overline{X(0)}\bigr]\\
&= \sum_{k=|m-n|}^{m+n} E\bigl[(T_k X(0))\,\overline{X(0)}\bigr]\, \varepsilon_m *_{Q_n} \varepsilon_n(\{k\})
= \sum_{k=|m-n|}^{m+n} \rho_X(k, 0)\, \varepsilon_m *_{Q_n} \varepsilon_n(\{k\}),
\end{aligned}$$
valid for all $n, m \in \mathbb{Z}_+$.

2.1.5.4 Autoregression. Let $W$ be a white noise over $\mathbb{Z}_+ := (\mathbb{Z}_+, *_{Q_n})$. A stationary random field $X$ over $\mathbb{Z}_+$ is said to be autoregressive of order $q \ge 1$ if there exist $b_1, \ldots, b_q \in \mathbb{C}$ such that
$$X(n) = b_1\, \varepsilon_1 * X(n) + \cdots + b_q\, \varepsilon_q * X(n) + W(n)$$
for all $n \in \mathbb{Z}_+$. One can show that an autoregressive random field $X$ of order $1$ over $\mathbb{Z}_+$ with $\alpha := b_1$ and $|\alpha| < 1$ is in fact a moving average over $\mathbb{Z}_+$. The main argument in proving this assertion runs as follows. Let $Q$ be any polynomial admitting the representation
$$Q = \sum_{k=0}^{N} \alpha_k Q_k$$
with $\alpha_0, \alpha_1, \ldots, \alpha_N \in \mathbb{C}$. One defines
$$Q * Y(n) := \sum_{k=0}^{N} \alpha_k\, \varepsilon_k * Y(n)$$
for all $n \in \mathbb{Z}_+$, where $Y$ is a random field over $\mathbb{Z}_+$. By the composition property of the sequence $(T_m)_{m \in \mathbb{Z}_+}$ of translation operators introduced in 2.1.5.3 we obtain that
$$P * (Q * W(n)) = (PQ) * W(n)$$
holds for all $n \in \mathbb{Z}_+$ and all polynomials $P, Q$.
Now
$$Y := \sum_{k=0}^{\infty} \alpha^k\, \varepsilon_1^{k} * W,$$
where $\varepsilon_1^k$ denotes the $k$-th convolution power of $\varepsilon_1$, is a well-defined random field over $\mathbb{Z}_+$ satisfying the required regression. As in the classical theory one sees that $X = Y$.

2.2. Translation and decomposition

For discrete polynomial hypergroups we encountered translations of random fields already in 2.1.5.3. This notion was extended by M. Leitner to arbitrary commutative hypergroups $K$ and stationary random fields $X$ over $K$ having spectral measure $\mu$ and stochastic measure $Z$. The discussion of this section relies on his work in [34] and [35]. Let $H := L^2(X) \subset L^2$ and define
$$\varepsilon_a * X(b) := \int_{K^\wedge} \chi(a)\chi(b)\, Z(d\chi)$$
for all $a, b \in K$. For $a \in K$ the mapping $T_a : \operatorname{sp}(\{X(b) : b \in K\}) \to H$ given by
$$T_a\Bigl(\sum_{k=0}^{N} c_k\, X(b_k)\Bigr) := \sum_{k=0}^{N} c_k\, \varepsilon_a * X(b_k)$$
is well-defined, linear, continuous and contractive, and hence admits a continuous extension to $H$ (which we again denote by $T_a$). Moreover, $T_a^* = T_{a^-}$, $T_e = \mathrm{Id}$, $T_a$ is normal, the operators $T_a$ ($a \in K$) commute, and the mapping $a \mapsto T_a$ on $K$ is continuous. More significant properties, valid for all $a, b \in K$, are
(1) $T_a T_b = \int_K T_t\, (\varepsilon_a * \varepsilon_b)(dt)$,
(2) $T_a X(b) = \int_K X(t)\, (\varepsilon_a * \varepsilon_b)(dt)$, and
(3) $\operatorname{spec}(T_a) = \hat\varepsilon_{a^-}(\operatorname{supp} \mu)$.
$\{T_a : a \in K\}$ is said to be the family of translation operators associated with the random field $X$ over $K$.
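Via Cramér's representation, $L^2(X)$ is isometric to the closed span of the functions $\chi \mapsto \chi(b)$ in $L^2(K^\wedge, \mu)$, and $T_a$ becomes multiplication by $\chi \mapsto \chi(a)$. The Python sketch below assumes, for illustration, the discrete cosine hypergroup and a spectral measure $\mu$ with finitely many atoms (hypothetical values). Elements of $L^2(X)$ are then represented by vectors indexed by the atoms, with inner product $\langle f, g\rangle = \sum_j w_j f_j \overline{g_j}$, $X(b)$ by $(\cos(\lambda_j b))_j$, and $T_a$ by the diagonal matrix $\operatorname{diag}(\cos(\lambda_j a))$. The code checks properties (1) to (3) in the form $T_a T_b = \frac{1}{2}(T_{|a-b|} + T_{a+b})$, $T_a X(b) = \frac{1}{2}(X(|a-b|) + X(a+b))$ and $\operatorname{spec}(T_a) = \{\cos(\lambda_j a)\}$.

```python
import numpy as np

# Hypothetical atomic spectral measure on [0, pi] (dual of the discrete cosine hypergroup).
lam = np.array([0.25, 1.3, 2.0, 3.0])
w = np.array([0.4, 0.3, 0.2, 0.1])


def X(b):
    """X(b) as the function lambda -> cos(lambda b) on supp(mu), a vector in L^2(mu)."""
    return np.cos(lam * b)


def T(a):
    """Translation operator T_a = multiplication by the character value cos(lambda a)."""
    return np.diag(np.cos(lam * a))


def inner(f, g):
    """Inner product of L^2(K^dual, mu), i.e. E[U conj(V)] for the corresponding random variables."""
    return np.sum(w * f * np.conj(g))


errs = []
for a in range(6):
    for b in range(6):
        errs.append(np.abs(T(a) @ T(b) - 0.5 * (T(abs(a - b)) + T(a + b))).max())   # property (1)
        errs.append(np.abs(T(a) @ X(b) - 0.5 * (X(abs(a - b)) + X(a + b))).max())   # property (2)
spec_ok = np.allclose(np.sort(np.linalg.eigvals(T(3)).real), np.sort(np.cos(3 * lam)))  # property (3)

# Here a^- = a, so T_a is self-adjoint: <T_a f, g> = <f, T_a g>.
f, g = X(2) + 2 * X(5), X(1)
adj_err = abs(inner(T(4) @ f, g) - inner(f, T(4) @ g))
print(max(errs), spec_ok, adj_err)
```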
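For the proofs of properties (1) and (2) we observe that the vector integrals involved are well-defined, since $\operatorname{supp}(\varepsilon_a * \varepsilon_b)$ is compact for all $a, b \in K$ (by axiom (H4)). Passing from the translation operator $T_a$ to the corresponding multiplication operator $M_{\hat\varepsilon_{a^-}}$, defined by $M_{\hat\varepsilon_{a^-}} f := \hat\varepsilon_{a^-} f$ for all $f \in \operatorname{sp}(\{\hat\varepsilon_{b^-} : b \in K\})$, with values in $L^2(K^\wedge, \mu)$, yields property (1), since
$$\begin{aligned}
M_{\hat\varepsilon_{a^-}} M_{\hat\varepsilon_{b^-}} f &= (\varepsilon_{a^-} * \varepsilon_{b^-})^\wedge f = \Bigl(\int_K \varepsilon_{t^-}\, (\varepsilon_a * \varepsilon_b)(dt)\Bigr)^\wedge f\\
&= \Bigl(\int_K \hat\varepsilon_{t^-}\, (\varepsilon_a * \varepsilon_b)(dt)\Bigr) f = \int_K M_{\hat\varepsilon_{t^-}} f\, (\varepsilon_a * \varepsilon_b)(dt)
\end{aligned}$$
for every $f \in L^2(K^\wedge, \mu)$. With property (1) at hand one also obtains property (2), since for all $a, b \in K$
$$T_a X(b) = T_a T_b X(e) = \Bigl(\int_K T_t\, (\varepsilon_a * \varepsilon_b)(dt)\Bigr) X(e) = \int_K X(t)\, (\varepsilon_a * \varepsilon_b)(dt).$$
Finally, $\operatorname{spec}(T_a) = \operatorname{spec}(M_{\hat\varepsilon_{a^-}})$ equals the essential range $R_{\hat\varepsilon_{a^-}}$ of $\hat\varepsilon_{a^-}$. Since $\hat\varepsilon_{a^-}$ is continuous, we obtain $R_{\hat\varepsilon_{a^-}} = \hat\varepsilon_{a^-}(\operatorname{supp} \mu)$, and this proves property (3).

2.2.1. Oscillations over $K$. Let $K$ be a commutative hypergroup with dual space $K^\wedge$, and let $\chi$ be a fixed character in $K^\wedge$.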
For a given random variable $\xi \in L^2$ with $E\xi = 0$ we introduce the stationary random field $X := \chi\xi$ over $K$, i.e. $X(a) := \chi(a)\xi$ for all $a \in K$. $X$ is called an oscillation over $K$. It is left to the reader to compare this definition with that of an oscillation of a sequence of orthogonal random variables considered in 2.1.5.1.

2.2.2. Proposition For any nondegenerate stationary random field $X$ over $K$ with spectral measure $\mu$ and associated family $\{T_a : a \in K\}$ of translation operators the following statements are equivalent:
(i) $X$ is an oscillation over $K$.
(ii) $\mu$ is a Dirac measure.
(iii) $\{T_a : a \in K\}$ is an irreducible family (of operators).

Proof. The implications (i) $\Rightarrow$ (ii), (i) $\Rightarrow$ (iii) and (ii) $\Rightarrow$ (i) are obvious. Only (iii) $\Rightarrow$ (i) remains to be shown. Consider the norm-decreasing $*$-algebra homomorphism $D$ from $M^b(K)$ into the space $B(H)$ of bounded operators on the Hilbert space $H := L^2(X)$ defined by $D(\varepsilon_a) := T_a$ for all $a \in K$. $D$ is an irreducible $*$-representation of $M^b(K)$, and from the commutativity of the family $\{T_a : a \in K\}$ (Schur's lemma) we conclude that for every $a \in K$
$$T_a = r(a)\, T_e, \quad \text{where } r(a) \in \mathbb{C}.$$
Since the function $r$ on $K$ is continuous and bounded, the assertion follows from the equalities
$$r(a)r(b)\, T_e = T_a T_b = \int_K T_t\, (\varepsilon_a * \varepsilon_b)(dt) = r(a * b)\, T_e,$$
with $r(a * b) := \int_K r(t)\, (\varepsilon_a * \varepsilon_b)(dt)$, valid for all $a, b \in K$: together with $T_a^* = T_{a^-}$ they show that $r$ is a character of $K$, and $X(a) = T_a X(e) = r(a) X(e)$ for all $a \in K$.

2.2.3. In the case of a group $K$ the translation operators associated with a stationary random field $X$ over $K$ are unitary.
There is a converse for arbitrary commutative hypergroups $K$ in the sense that if $X$ is a stationary random field over $K$ with spectral measure $\mu$ and $\operatorname{supp}(\mu) = K^\wedge$, then unitarity of a translation operator $T_a$ associated with $X$ implies that $a$ belongs to the maximal subgroup or center $c(K) := \{x \in K : \varepsilon_x * \varepsilon_{x^-} = \varepsilon_e\}$ of $K$. In fact, from
$$\int_{K^\wedge} 1\, Z(d\chi) = X(e) = T_a^* T_a X(e) = T_{a^-} X(a) = \int_{K^\wedge} \chi(a^-)\chi(a)\, Z(d\chi)$$
one concludes that
$$1 = \chi(a^-)\chi(a) = \int_K \chi(t)\, (\varepsilon_a * \varepsilon_{a^-})(dt)$$
for all $\chi \in K^\wedge$, since $\operatorname{supp}(\mu) = K^\wedge$ and the mapping $\chi \mapsto \chi(a)$ is continuous on $K^\wedge$ for all $a \in K$. This implies $\hat\varepsilon_e = (\varepsilon_a * \varepsilon_{a^-})^\wedge$, hence $\varepsilon_a * \varepsilon_{a^-} = \varepsilon_e$, that is, $a \in c(K)$.

2.2.4. Proposition For the translation operators $T_a$ ($a \in K$) associated with a stationary random field $X$ over $K$ with spectral measure $\mu$ and $\operatorname{supp}(\mu) = K^\wedge$ the following statements are equivalent:
(i) Each $T_a$ is a point operator in the sense that for every $b \in K$ there is a $q(a, b) \in K$ such that $T_a X(b) = X(q(a, b))$.
(ii) Each $T_a$ is unitary.
(iii) $K$ is a group.

Proof. In view of the properties of the family $\{T_a : a \in K\}$ the equivalence (ii) $\Leftrightarrow$ (iii) is clear. For the remaining assertions we need only show the implication (i) $\Rightarrow$ (iii). By Cramér's stochastic representation theorem
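the functions $\chi \mapsto \chi(a)\chi(b)$ and $\chi \mapsto \chi(q(a, b))$ ($a, b \in K$) are equal in $L^2(K^\wedge, \mu)$, and hence we have
$$(\varepsilon_a * \varepsilon_b)^\wedge(\chi) = \hat\varepsilon_a(\chi)\,\hat\varepsilon_b(\chi) = \hat\varepsilon_{q(a,b)}(\chi)$$
for all $\chi \in K^\wedge$, since $\operatorname{supp}(\mu) = K^\wedge$ and the function $\chi \mapsto \chi(a)$ is continuous on $K^\wedge$. Hence $\varepsilon_a * \varepsilon_b = \varepsilon_{q(a,b)}$, and one sees that $K$ carries the group structure given by the operation $(a, b) \mapsto q(a, b)$.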
2.2.5. Orthogonal decompositions of random fields

Let $X$ be a stationary random field over a commutative hypergroup $K$, with spectral measure $\mu$ and stochastic measure $Z$. We assume $K^\wedge$ to be the disjoint union $A \cup B$ of sets $A, B \in \mathcal{B}(K^\wedge)$. The random fields
$$R := \int_A \chi\, Z(d\chi) \tag{2.1}$$
and
$$S := \int_B \chi\, Z(d\chi) \tag{2.2}$$
are stationary over $K$, and
$$X = R + S. \tag{2.3}$$
Moreover, the spectral measures $\mu_R$ and $\mu_S$ of $R$ and $S$ are $\operatorname{Res}_A \mu$ and $\operatorname{Res}_B \mu$ respectively, and
$$R \perp S. \tag{2.4}$$
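The decomposition (2.1) to (2.4) is easy to visualize when $\mu$ has finitely many atoms. In the hypothetical setting of the Python sketch below (discrete cosine hypergroup; $Z$ given by orthogonal centered amplitudes $Z_j = Z(\{\lambda_j\})$ with $E|Z_j|^2 = \mu(\{\lambda_j\})$), random variables in $L^2(X)$ are represented by their coefficient vectors with respect to the $Z_j$, so that $E[U\overline{V}] = \sum_j w_j u_j \overline{v_j}$. Splitting the atoms into a set $A$ and its complement $B$ produces $R$ and $S$; the code checks $X = R + S$, $R \perp S$, and that $R$ carries the spectral weights of $\mu$ restricted to $A$.

```python
import numpy as np

lam = np.array([0.3, 0.8, 1.6, 2.4, 3.0])   # hypothetical atoms of mu in [0, pi]
w = np.array([0.5, 0.2, 0.15, 0.1, 0.05])   # their masses mu({lambda_j}) = E|Z_j|^2
in_A = lam < 1.0                             # hypothetical Borel set A; B is the complement


def coeffs(b, mask=None):
    """Coefficients of sum_{j in mask} cos(lambda_j b) Z_j with respect to Z_1, ..., Z_J."""
    c = np.cos(lam * b)
    return c if mask is None else np.where(mask, c, 0.0)


def cov(u, v):
    """E[U conj(V)] for U = sum_j u_j Z_j and V = sum_j v_j Z_j (the Z_j are orthogonal)."""
    return np.sum(w * u * np.conj(v))


def X(b):
    return coeffs(b)


def R(b):  # R = int_A chi dZ, as in (2.1)
    return coeffs(b, in_A)


def S(b):  # S = int_B chi dZ, as in (2.2)
    return coeffs(b, ~in_A)


sum_err = max(np.abs(X(b) - R(b) - S(b)).max() for b in range(10))           # (2.3)
orth_err = max(abs(cov(R(a), S(b))) for a in range(10) for b in range(10))  # (2.4)
spec_err = max(abs(cov(R(a), R(0)) - np.sum(w[in_A] * np.cos(lam[in_A] * a)))  # mu_R = Res_A mu
               for a in range(10))
print(sum_err, orth_err, spec_err)
```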
Conversely, if $R$ and $S$ are stationary random fields over $K$ with $\{R(a) : a \in K\}$ and $\{S(a) : a \in K\}$ in $L^2(X)$ such that (2.3) and (2.4) hold, then there exist disjoint sets $A, B \in \mathcal{B}(K^\wedge)$ with $A \cup B = K^\wedge$ such that $R$ and $S$ admit the representations (2.1) and (2.2) respectively. The following decomposition theorem concerns an orthogonal decomposition of a stationary random field over $K$ whose maximal part is stationary over the double coset hypergroup $K//C := \{aC : a \in K\}$,
where $C := c(K)$ and $aC := C * \{a\} * C = \{a\} * C$ for every $a \in K$.

2.2.5.1 Theorem (Center Decomposition) Given a stationary random field $X$ over $K$ there exist stationary random fields $aC \mapsto R(aC)$ and $a \mapsto S(a) \in L^2(X)$ over $K//C$ and $K$ respectively such that
(i) $X = R + S$;
(ii) $R \perp S$;
(iii) if $S$ admits a decomposition $U + V$ with stationary random fields $U$ and $V$ over $K//C$ and $K$ respectively satisfying properties (i) and (ii), then $U = 0$, where $0$ denotes the degenerate random field;
(iv) $R$ and $S$ are uniquely determined by the properties (i) to (iii).

Proof. Let $\mu$ and $Z$ be the spectral and the stochastic measure of $X$. By $C^\perp$ we denote the orthogonal complement $\{\chi \in K^\wedge : \operatorname{Res}_C \chi = 1\}$ of $C = c(K)$. The random fields
$$\tilde R := \int_{C^\perp} \chi\, Z(d\chi)$$
and $R$ defined by $R(aC) := \tilde R(a)$ for all $aC \in K//C$ are (well-defined and) stationary over $K$ and $K//C$ respectively. Introducing the stationary random field
$$S := \int_{K^\wedge \setminus C^\perp} \chi\, Z(d\chi)$$
over $K$, one sees that assertions (i) and (ii) of the theorem are true. Suppose now that $S = U + V$ is a decomposition of $S$ as in (iii) of the theorem. Then the random field $\tilde U$ defined by
$$\tilde U(a) := U(aC)$$
for all $a \in K$ is stationary over $K$ with spectral measure $\mu_{\tilde U}$ and stochastic measure $Z_{\tilde U}$. One shows that $\operatorname{supp}(\mu_{\tilde U}) \subset C^\perp$. But Propositions 2.2.2 and 2.2.4 yield $\mu_{\tilde U} = 0$, hence assertion (iii). For the proof of assertion (iv) we assume that stationary random fields $R'$ and $S'$ over $K//C$ and $K$ respectively are given which satisfy statements (i) to (iii) of the theorem. Clearly, the random field $\tilde R'$ given by
$$\tilde R'(a) := R'(aC)$$
for all $a \in K$ is stationary over $K$ with $\operatorname{supp}(\mu_{\tilde R'}) \subset C^\perp$. The spectral measure $\mu_{S'}$ of $S'$ satisfies $\mu_{S'}(C^\perp) = 0$. In fact, if this were not true, $S'$ would admit a nontrivial stationary part over $K//C$ of the form
$$\int_{C^\perp} \chi\, Z_{S'}(d\chi),$$
where $Z_{S'}$ denotes the stochastic measure of $S'$. By the converse statement in 2.2.5 there exist disjoint sets $A, B \in \mathcal{B}(K^\wedge)$ with $K^\wedge = A \cup B$ such that
$$\tilde R' = \int_A \chi\, Z(d\chi) \quad \text{and} \quad S' = \int_B \chi\, Z(d\chi).$$
Consequently, $A = C^\perp$ [$\mu$] and $B = K^\wedge \setminus C^\perp$ [$\mu$], and the proof of the theorem is complete.

In order to prepare an analogue of the classical Wold decomposition we suppose again that $X$ is a stationary random field over a commutative hypergroup $K$. Let $\mathcal{A}$ denote a family of non-empty subsets of $K$. For $A \in \mathcal{A}$ we introduce the spaces
$$H_X(A) := \overline{\operatorname{sp}}(\{X(a) : a \in A\})$$
and
$$H_X := \bigcap_{A \in \mathcal{A}} H_X(A).$$
The random field $X$ is said to be $\mathcal{A}$-singular if $H_X = H_X(K)$, $\mathcal{A}$-regular if $H_X = \{0\}$, and $\mathcal{A}$-adapted if $H_X$ is invariant in the sense that $T_a H_X \subset H_X$ for all $a \in K$. Here $\{T_a : a \in K\}$ denotes the family of translation operators associated with $X$.

2.2.5.2 Theorem (Wold Decomposition) Let $X$ be a stationary random field over $K$ which is $\mathcal{A}$-adapted. Then there exists a unique decomposition $X = R + S$, where
(i) $R$ is an $\mathcal{A}$-regular stationary random field over $K$,
(ii) $S$ is an $\mathcal{A}$-singular stationary random field over $K$,
(iii) $R \perp S$, and
(iv) $H_R(A)$ and $H_S(A)$ are contained in $H_X(A)$ for all $A \in \mathcal{A}$.

Proof. We know from the discussion of translates of random fields at the beginning of this section that the random fields
$$a \mapsto R(a) := T_a\, \mathrm{pr}_{H_X^\perp} X(e) \quad \text{and} \quad a \mapsto S(a) := T_a\, \mathrm{pr}_{H_X} X(e)$$
are stationary over $K$. Obviously $X = R + S$. From
$$\langle T_a Y, U\rangle = \langle Y, T_{a^-} U\rangle = 0$$
for random variables $Y \in H_X^\perp$ and $U \in H_X$ we obtain that
$$T_a(H_X^\perp) \subset H_X^\perp$$
whenever $a \in K$, and (iii) holds true. Clearly, $H_S(A) \subset H_X(A)$ for all $A \in \mathcal{A}$. To complete the argument for (iv) we just observe that $R(a) = X(a) - S(a) \in H_X(A)$ for all $a \in A$, $A \in \mathcal{A}$. Since $H_R \subset H_X$, and $H_R \perp H_X$ by the definition of $R$, we have $H_R = \{0\}$, so (i) has been proved as well. For (ii) we return to the inclusions
$$H_X \subset H_X(A) \subset H_R(A) \oplus H_S(A),$$
valid for all $A \in \mathcal{A}$. This implies $H_X \subset H_S(A)$ for all $A \in \mathcal{A}$, since $H_X \perp H_R(A)$. Thus $H_X \subset H_S \subset H_S(K) \subset H_X$, and this implies (ii). The uniqueness of the decomposition follows in analogy to that of Theorem 2.2.5.1.

2.2.6. Examples

2.2.6.1 In the case of a discrete polynomial hypergroup $\mathbb{Z}_+$ it can be shown that every stationary random field over $\mathbb{Z}_+$ is $\mathcal{A}$-adapted for the family $\mathcal{A} := \{\{k, k+1, \ldots\} : k \in \mathbb{Z}_+\}$.

2.2.6.2 More generally, if $K$ is an arbitrary commutative hypergroup and $\mathcal{A} := \{K \setminus C : C \in \mathcal{C}(K)\}$, then any stationary random field over $K$ is $\mathcal{A}$-adapted.

2.2.6.3 Obviously any white noise over a discrete polynomial hypergroup $\mathbb{Z}_+$ is $\mathcal{A}$-regular for $\mathcal{A}$ as in Example 2.2.6.1.
2.2.6.4 Oscillations $X$ over a discrete polynomial hypergroup $(\mathbb{Z}_+, *_{Q_n})$ of the form
$$X(n) := Q_n(t)\,\xi \qquad (n \in \mathbb{Z}_+),$$
where $t \in D \cong \mathbb{Z}_+^\wedge$ and $\xi \in L^2$ are fixed, are $\mathcal{A}$-singular for $\mathcal{A}$ as in Example 2.2.6.1.

2.2.7. Returning to moving averages

In the following we restrict the discussion to discrete polynomial hypergroups $\mathbb{Z}_+ := (\mathbb{Z}_+, *_{Q_n})$ and families $\mathcal{A}$ as in Example 2.2.6.1, and consider stationary random fields $X$ over $\mathbb{Z}_+$. The first two of the subsequent conditions for regularity and singularity are quoted without proof.

2.2.7.1 If the spectral density $f$ of $X$ satisfies the condition
$$\frac{1}{f} \in L^1(\mathbb{Z}_+^\wedge),$$
then $X$ is $\mathcal{A}$-regular.

2.2.7.2 For $X$ with spectral density $f$ to be $\mathcal{A}$-singular it is necessary and sufficient that
$$\frac{Q}{f} \notin L^1(\mathbb{Z}_+^\wedge)$$
for all polynomials $Q \neq 0$.

2.2.7.3 Moving averages of the form
$$n \mapsto X(n) := \sum_{k \ge 0} a_k\, T_k W(n)\, \omega_{\mathbb{Z}_+}(\{k\})$$
and
$$n \mapsto X(n) := \sum_{k \ge 0} (\varepsilon_n * a)(k)\, W(k)\, \omega_{\mathbb{Z}_+}(\{k\})$$
over $\mathbb{Z}_+$, where $W$ is a white noise over $\mathbb{Z}_+$ and $a = (a_n)_{n \in \mathbb{Z}_+}$ is a fixed function in $L^2(\mathbb{Z}_+)$, admit the spectral density $|\hat a|^2$. It follows that if $X$ is any stationary random field over $\mathbb{Z}_+$ with spectral density $f \in L^1(\mathbb{Z}_+^\wedge)$ and if $f > 0$ [$\pi_{\mathbb{Z}_+}$] (where $\mu := f \cdot \pi_{\mathbb{Z}_+}$ is the spectral measure of $X$), then $X$ is a moving average.
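Proof. Let $\varphi \in L^2(\mathbb{Z}_+^\wedge)$ with $f = |\varphi|^2$, and let $Z$ denote the stochastic measure of $X$. We define a random field $Y$ over $\mathbb{Z}_+$ by
$$Y(n) := \int_D Q_n\, \varphi^{-1}\, dZ$$
for all $n \in \mathbb{Z}_+$. Clearly, $Y$ is a white noise over $\mathbb{Z}_+$, since for $m, n \in \mathbb{Z}_+$
$$E\bigl(Y(m)\overline{Y(n)}\bigr) = \int_D Q_m \overline{Q_n}\, |\varphi|^{-2}\, d\mu = \int_D Q_m Q_n\, d\pi_{\mathbb{Z}_+} = \delta_{mn}\, \omega_{\mathbb{Z}_+}(\{n\})^{-1}.$$
The mapping
$$B \mapsto \tilde Y(B) := \int_B \varphi^{-1}\, dZ$$
from $\mathcal{B}(D)$ into $L^2$ is an orthogonal stochastic measure with $\|\tilde Y(B)\|_2^2 = \pi_{\mathbb{Z}_+}(B)$ for all $B \in \mathcal{B}(D)$, and $Y(n) = \int_D Q_n\, d\tilde Y$. With the Fourier series
$$\varphi = \sum_{k \ge 0} \langle \varphi, Q_k\rangle\, Q_k\, \omega_{\mathbb{Z}_+}(\{k\})$$
in $L^2(\mathbb{Z}_+^\wedge)$ in mind we obtain for all $n \in \mathbb{Z}_+$
$$X(n) = \int_D Q_n\, \varphi^{-1}\varphi\, dZ = \int_D Q_n \sum_{k \ge 0} \langle \varphi, Q_k\rangle\, Q_k\, \omega_{\mathbb{Z}_+}(\{k\})\, d\tilde Y = \sum_{k \ge 0} \langle \varphi, Q_k\rangle\, T_n Y(k)\, \omega_{\mathbb{Z}_+}(\{k\}),$$
where $\{T_n : n \in \mathbb{Z}_+\}$ denotes the family of translation operators associated with the white noise $Y$ over $\mathbb{Z}_+$.

2.2.7.4 If $X$ is a stationary random field over $\mathbb{Z}_+$ with spectral density $f \in L^1(\mathbb{Z}_+^\wedge)$, where the Plancherel measure $\pi_{\mathbb{Z}_+}$ of $\mathbb{Z}_+$ is assumed to be continuous, and if $X$ is $\mathcal{A}$-regular, then $X$ is a moving average over $\mathbb{Z}_+$.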
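In fact, if $X(n) = 0$ for all $n \in \mathbb{Z}_+$, the assertion is trivial. So let $X \neq 0$. Being $\mathcal{A}$-regular, $X$ is then not $\mathcal{A}$-singular, and by 2.2.7.2 there exists a polynomial $Q \neq 0$ such that
$$\frac{Q}{f} \in L^1(\mathbb{Z}_+^\wedge).$$
Consequently, $f > 0$ [$\pi_{\mathbb{Z}_+}$], since $Q$ has only finitely many zeros and $\pi_{\mathbb{Z}_+}$ is continuous, and by 2.2.7.3 $X$ is indeed a moving average.

2.3. Harmonizability

In order to extend the notion of stationarity of random fields to the broader notion of harmonizability we need a short digression on bimeasures and their integrals. We follow the works of M. M. Rao [43], [44] and of D. K. Chang and Rao [6] for the classical background, and H.-J. Neu [39] for some extensions to hypergroups. Given measurable spaces $(\Sigma_1, \mathcal{A}_1)$ and $(\Sigma_2, \mathcal{A}_2)$, a mapping $\beta : \mathcal{A}_1 \times \mathcal{A}_2 \to \mathbb{C}$ is called a bimeasure on $\Sigma_1 \times \Sigma_2$ provided the mappings $\beta(A, \cdot)$ and $\beta(\cdot, B)$ are measures on $\Sigma_2$ and $\Sigma_1$ for every $A \in \mathcal{A}_1$ and $B \in \mathcal{A}_2$ respectively. A bimeasure $\beta$ on $\Sigma_1 \times \Sigma_1$ is said to be positive definite if for all $n \in \mathbb{N}$, $A_1, \ldots, A_n \in \mathcal{A}_1$ and $c_1, \ldots, c_n \in \mathbb{C}$
$$\sum_{i=1}^{n}\sum_{j=1}^{n} c_i \bar c_j\, \beta(A_i, A_j) \ge 0.$$
If measurable functions $f_1$ and $f_2$ on $\Sigma_1$ and $\Sigma_2$ are $\beta(\cdot, B)$-integrable for all $B \in \mathcal{A}_2$ and $\beta(A, \cdot)$-integrable for all $A \in \mathcal{A}_1$ respectively, then the integrals
$$\beta_{f_1}(B) := \int_{\Sigma_1} f_1(\sigma_1)\, \beta(d\sigma_1, B) \quad (B \in \mathcal{A}_2)$$
and
$$\beta_{f_2}(A) := \int_{\Sigma_2} f_2(\sigma_2)\, \beta(A, d\sigma_2) \quad (A \in \mathcal{A}_1)$$
exist, and $\beta_{f_1}$ and $\beta_{f_2}$ are measures on $\Sigma_2$ and $\Sigma_1$ respectively. If, moreover, $f_2$ is $\beta_{f_1}$-integrable and $f_1$ is $\beta_{f_2}$-integrable, then $(f_1, f_2)$ is said to be integrable in the sense of Morse and Transue provided
$$\int_{\Sigma_1} f_1(\sigma_1)\, \beta_{f_2}(d\sigma_1) = \int_{\Sigma_2} f_2(\sigma_2)\, \beta_{f_1}(d\sigma_2).$$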
In this case the two integrals define the Morse-Transue integral
$$\int_{\Sigma_1}\int_{\Sigma_2} f_1(\sigma_1) f_2(\sigma_2)\, \beta(d\sigma_1, d\sigma_2)$$
of $(f_1, f_2)$.

2.3.1. Definition A random field $X$ over a commutative hypergroup $K$ is called harmonizable if there exists a positive definite bimeasure $\beta_X$ on $K^\wedge \times K^\wedge$, where $(K^\wedge, \mathcal{B}(K^\wedge))$ is a Borel space, such that for the covariance kernel $\rho_X$ of $X$ one has
$$\rho_X(a, b) = \int_{K^\wedge}\int_{K^\wedge} \chi(a)\overline{\psi(b)}\, \beta_X(d\chi, d\psi)$$
whenever $a, b \in K$. Here the integral with respect to $\beta_X$ is understood in the sense of Morse and Transue, applied to $(f_1, f_2)$ with $f_1(\chi) := \chi(a)$ and $f_2(\psi) := \overline{\psi(b)}$ for all $\chi, \psi \in K^\wedge$, $a, b \in K$. If $\beta_X$ extends to a measure on $\mathcal{B}(K^\wedge) \otimes \mathcal{B}(K^\wedge)$, then $X$ is called strongly harmonizable. Every stationary random field $X$ over $K$ is strongly harmonizable. In fact, the mapping $(A, B) \mapsto \beta_X(A, B) := \mu_X(A \cap B)$ on $\mathcal{B}(K^\wedge) \times \mathcal{B}(K^\wedge)$ is a positive definite bimeasure (on $K^\wedge \times K^\wedge$), and
$$\rho_X(a, b) = \int_{K^\wedge}\int_{K^\wedge} \chi(a)\overline{\psi(b)}\, \beta_X(d\chi, d\psi) = \int_{K^\wedge} \chi(a)\overline{\chi(b)}\, \mu_X(d\chi)$$
whenever $a, b \in K$. Moreover, $\beta_X$ is concentrated on the diagonal $\sigma(K^\wedge)$ of $K^\wedge \times K^\wedge$.

In analogy to the case of a locally compact Abelian group one shows the following characterization of harmonizable random fields over a commutative hypergroup.

2.3.2. Theorem Let $X$ be a random field over $K$ with covariance kernel $\rho_X$.
The following statements are equivalent:
(i) $X$ is harmonizable with associated bimeasure $\beta_X$.
(ii) There exists a unique spectral stochastic measure $Z_X : \mathcal{B}(K^\wedge) \to L^2_0$ with $E\big(Z_X(A)\overline{Z_X(B)}\big) = \beta_X(A, B)$ for all $A, B \in \mathcal{B}(K^\wedge)$ such that
$$X = \int_{K^\wedge} \chi \, Z_X(d\chi).$$
(iii) The mapping $a \mapsto X(a)$ is continuous, norm-bounded, and satisfies the inequality
$$\Big\| \int_K f X \, d\omega_K \Big\| \le c\, \|\hat{f}\|_\infty$$
for all $f \in L^1(K)$, where $c > 0$ is a constant.
(iv) There exist a Hilbert space $L^2_0(\tilde{P}) \supset L^2_0$ on an enlarged probability space $(\tilde{\Omega}, \tilde{\mathcal{F}}, \tilde{P})$ and a stationary random field $\tilde{X} : K \to L^2_0(\tilde{P})$ (over $K$) such that $X = \mathrm{pr}_{L^2_0} \tilde{X}$, where $\mathrm{pr}_{L^2_0}$ denotes the orthogonal projection from $L^2_0(\tilde{P})$ onto $L^2_0$.

For a characterization of harmonizable random fields over $K$ in terms of variation boundedness we need some preparations from Functional Analysis. Let $(\Sigma, \mathcal{A})$ be a measurable space and $(L, \|\cdot\|)$ a Banach space. By the semivariation of a vector measure $Z : \mathcal{A} \to L$ we understand the number
$$\|Z\|_{sv}(A) := \sup \Big\| \sum_{i=1}^{n} \alpha_i Z(A_i) \Big\|,$$
where the supremum is taken over all finite sequences $\{\alpha_1, \dots, \alpha_n\}$ in $\mathbb{C}$ with $|\alpha_i| \le 1$ for $i = 1, \dots, n$ and all partitions $\{A_1, \dots, A_n\}$ of $A$ in $\mathcal{A}$. It turns out that for every vector measure $Z : \mathcal{A} \to L$ one has $\|Z\|_{sv}(A) < \infty$ whenever $A \in \mathcal{A}$.
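To make the difference between semivariation and total variation tangible, here is a small illustrative sketch for a vector measure on a finite set with values in the Hilbert space $\mathbb{R}^2$. As a simplifying assumption it uses real coefficients $|\alpha_i| \le 1$ instead of complex ones. For a fixed partition the norm is convex in $(\alpha_1, \dots, \alpha_n)$, so the supremum is attained at sign vectors; brute force over all partitions then agrees with the maximum over sign vectors on singletons, and both can be strictly smaller than the total variation $\sum_k \|Z(\{k\})\|$. The function names are hypothetical.

```python
import itertools
import math


def vec_sum(vectors, weights, dim):
    """Return sum_i weights[i] * vectors[i] in R^dim."""
    out = [0.0] * dim
    for w, v in zip(weights, vectors):
        out = [o + w * x for o, x in zip(out, v)]
    return out


def Z(A, z, dim):
    """Vector measure on a finite set: Z(A) = sum of the point masses z[k], k in A."""
    return vec_sum([z[k] for k in A], [1.0] * len(A), dim)


def set_partitions(items):
    """Yield all partitions of the list `items` into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):  # put `first` into an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part      # or into a new block of its own


def semivariation_bruteforce(A, z, dim):
    """sup over partitions {A_i} of A and signs alpha_i = +-1 of ||sum alpha_i Z(A_i)||."""
    best = 0.0
    for part in set_partitions(sorted(A)):
        blocks = [Z(B, z, dim) for B in part]
        for signs in itertools.product((-1.0, 1.0), repeat=len(blocks)):
            best = max(best, math.hypot(*vec_sum(blocks, signs, dim)))
    return best


def semivariation_singletons(A, z, dim):
    """Same supremum, using only the finest partition into singletons."""
    pts = [z[k] for k in sorted(A)]
    return max(math.hypot(*vec_sum(pts, s, dim))
               for s in itertools.product((-1.0, 1.0), repeat=len(pts)))


def total_variation(A, z):
    """sup over partitions of sum ||Z(A_i)||, attained at singletons."""
    return sum(math.hypot(*z[k]) for k in A)


z = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0], 3: [-1.0, 2.0]}
print(semivariation_bruteforce({0, 1}, z, 2), total_variation({0, 1}, z))  # sqrt(2) vs 2
```

For two orthogonal point masses the semivariation is $\sqrt{2}$ while the total variation is $2$, which is the typical gap that makes semivariation the right notion for Hilbert-space-valued measures.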
2.3.3. Corollary Any harmonizable random field $X$ over $K$ is continuous and bounded, hence $\rho_X \in C^b(K \times K)$.

The continuity of $X$ is part of statement (iii) of the theorem, and the boundedness of $X$ follows from statement (ii) of the theorem, since $\|X(a)\|_2 \le \|Z_X\|_{sv}(K^\wedge) < \infty$ for all $a \in K$. The remaining statement is implied by the assertion preceding Definition 2.1.2.

2.3.4. Definition A random field $X$ over $K$ is said to be variation-bounded ($V$-bounded) if the set
$$\Big\{ \int_K f X \, d\omega_K : \|\hat{f}\|_\infty \le 1,\ f \in L^1(K) \Big\}$$
is norm-bounded.

In connection with the equivalence (i) ⇔ (iii) in Theorem 2.3.2 we show:

2.3.5. Theorem For any random field $X$ over $K$ the following statements are equivalent:
(i) $X$ is harmonizable.
(ii) $X$ is weakly continuous and $V$-bounded.

Proof. (i) ⇒ (ii) Let $X$ be harmonizable. By (ii) of Theorem 2.3.2 there exists a spectral stochastic measure $Z : \mathcal{B}(K^\wedge) \to L^2_0$ such that
$$X = \int_{K^\wedge} \chi \, Z(d\chi).$$
For $l \in (L^2_0)'$,
$$l \circ X = \int_{K^\wedge} \chi \, l \circ Z(d\chi),$$
hence $X$ is weakly continuous (even continuous and bounded). We consider the Bochner integral $\int_K f X \, d\omega_K$
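and obtain for all $f \in L^1(K)$
$$l\Big(\int_K f X \, d\omega_K\Big) = \int_K f\, l \circ X \, d\omega_K = \int_K f \Big(\int_{K^\wedge} \chi\, l \circ Z(d\chi)\Big) d\omega_K = \int_{K^\wedge} \hat{f}^\sim(\chi)\, l \circ Z(d\chi) = l\Big(\int_{K^\wedge} \hat{f}^\sim(\chi)\, Z(d\chi)\Big).$$
Since $l \in (L^2_0)'$ had been chosen arbitrarily,
$$\int_K f X \, d\omega_K = \int_{K^\wedge} \hat{f}^\sim(\chi)\, Z(d\chi),$$
hence
$$\Big\| \int_K f X \, d\omega_K \Big\|_2 \le c\, \|\hat{f}\|_\infty$$
for all $f \in L^1(K)$, where $c := \|Z\|_{sv}(K^\wedge) < \infty$, and so $X$ is $V$-bounded.

(ii) ⇒ (i) Let $X$ be weakly continuous and $V$-bounded. The set $D := \{\hat{f}^\sim : f \in L^1(K)\}$ is a dense subalgebra of $C^0(K^\wedge)$. Consider the mapping $\phi : D \to L^2_0$, (well-)defined by
$$\phi(\hat{f}^\sim) := \int_K f X \, d\omega_K$$
for all $f \in L^1(K)$. From the assumption we know that the set $C := \{\phi(\hat{f}^\sim) : \|\hat{f}^\sim\|_\infty \le 1,\ f \in L^1(K)\}$ is norm-bounded, hence relatively weakly compact, since $L^2_0$ is reflexive. But then by a lemma of I. Kluvánek [29] there exists a unique spectral stochastic measure $Z : \mathcal{B}(K^\wedge) \to L^2_0$ such that
$$\int_K f X \, d\omega_K = \phi(\hat{f}^\sim) = \int_{K^\wedge} \hat{f}^\sim(\chi)\, Z(d\chi),$$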
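and for all $l \in (L^2_0)'$
$$\int_{K^\wedge} \hat{f}^\sim(\chi)\, l \circ Z(d\chi) = \int_K f\, l \circ X \, d\omega_K$$
whenever $f \in L^1(K)$. It follows that
$$\int_K f(a)\, l\Big(\int_{K^\wedge} \chi(a)\, Z(d\chi) - X(a)\Big)\, \omega_K(da) = 0$$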
for all $l \in (L^2_0)'$ and all $f \in L^1(K)$. Consequently,
$$l\Big(\int_{K^\wedge} \chi(a)\, Z(d\chi) - X(a)\Big) = 0$$
for $\omega_K$-a.a. $a \in K$ and all $l \in (L^2_0)'$. Since $X$ is weakly continuous by assumption,
$$X(a) = \int_{K^\wedge} \chi(a)\, Z(d\chi)$$
for all $a \in K$. Thus by Theorem 2.3.2, $X$ is harmonizable.

Let $X$ be a harmonizable random field over $K$ with spectral stochastic measure $Z_X : \mathcal{B}(K^\wedge) \to L^2_0$, and let $\mu$ be an arbitrary measure in $M(K^\wedge)$.

2.3.6. Definition A $\mu$-strongly integrable mapping $f : K^\wedge \to L^2_0$ with the property that $L^2(f)$ is separable is called a $\mu$-spectral stochastic density of $X$ if
$$Z_X(B) = \int_B f \, d\mu$$
for all $B \in \mathcal{B}(K^\wedge)$. If $\mu := \pi_K$, then $f =: f_X$ is simply called a spectral stochastic density of $X$.
One notes that for harmonizable random fields $X$ over $K$ with $\mu$-spectral stochastic density $f_X$ the associated bimeasure $\beta_X$ extends to the product $\sigma$-algebra $\mathcal{B}(K^\wedge) \otimes \mathcal{B}(K^\wedge)$, and thus $X$ is strongly harmonizable.

In what follows we consider a chain of subspaces of the space $C(K, L^2_0)$ of all continuous random fields $X$ over $K$ with separable $L^2(X)$. Here $C(K, L^2_0)$ carries the compact-open topology $\tau_{co}$ induced by the family $\{N_C : C \in \mathcal{C}(K)\}$ of seminorms, indexed by the compact subsets $C$ of $K$ and given by
$$N_C(X) := \max\{\|X(a)\|_2 : a \in C\}$$
for all $X \in C(K, L^2_0)$. In the chain of ascending inclusions
$$H_{dc}(K, L^2_0) \subset H_d(K, L^2_0) \subset H_{st}(K, L^2_0) \subset H(K, L^2_0)$$
the spaces consist, respectively, of the random fields with spectral stochastic density and compact support, the random fields with spectral stochastic density, the strongly harmonizable random fields, and the harmonizable random fields. Clearly, the space $S(K, L^2_0)$ of stationary random fields over $K$ is a subspace of $H_{st}(K, L^2_0)$.

2.3.7. It is easy to construct random fields in $H_d(K, L^2_0)$. Just let $X \in C(K, L^2_0)$ be an $\omega_K$-strongly integrable random field and let $\phi \in L^1(K) \cap PD(K)$. Then for each $a \in K$ the function $b \mapsto \phi(a^- * b)X(b)$ on $K$ is $\omega_K$-strongly integrable, the random field $Y$ over $K$ defined by
$$Y(a) := \int_K \phi(a^- * b)\, X(b)\, \omega_K(db)$$
for all $a \in K$ belongs to $H_d(K, L^2_0)$, and the spectral stochastic density $f_Y$ of $Y$ is given by
$$f_Y(\chi) = \hat{\phi}(\chi^-) \int_K \chi^- X \, d\omega_K$$
for all $\chi \in K^\wedge$.
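The formula for $f_Y$ can be checked numerically in the simplest possible setting. The sketch below is only an illustration under stated assumptions: $K = \mathbb{Z}_N$ (a finite cyclic group, hence a commutative hypergroup with $a^- = -a$), counting measure as $\omega_K$, characters $\chi_k(x) = e^{2\pi i k x/N}$, Plancherel measure $\pi_K = \frac{1}{N}\cdot$ counting measure, and $\mathbb{C}^d$-valued arrays as a finite-dimensional stand-in for $L^2_0$-valued fields. It builds $\phi = g * g^\sim$, which is positive definite with $\hat{\phi} = |\hat{g}|^2$, forms $Y$ by the defining integral, and compares it with $\int \chi(a) f_Y(\chi)\, \pi_K(d\chi)$.

```python
import cmath
import random

N, d = 8, 3
rng = random.Random(1)


def chi(k, x):
    """Character chi_k(x) = exp(2 pi i k x / N) of Z_N (k may be negative)."""
    return cmath.exp(2j * cmath.pi * k * x / N)


def hat(h, k):
    """Fourier transform w.r.t. counting Haar measure: sum_x h(x) conj(chi_k(x))."""
    return sum(h[x] * chi(k, x).conjugate() for x in range(N))


# phi = g * g~ with g~(t) = conj(g(-t)); then phi^(chi) = |g^(chi)|^2 >= 0.
g = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)]
phi = [sum(g[y] * g[(y - x) % N].conjugate() for y in range(N)) for x in range(N)]

# X(b) in C^d: a hypothetical finite-dimensional stand-in for an L^2_0-valued field.
X = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(d)] for _ in range(N)]

# Y(a) = sum_b phi(a^- * b) X(b), where a^- * b = b - a in Z_N.
Y = [[sum(phi[(b - a) % N] * X[b][j] for b in range(N)) for j in range(d)]
     for a in range(N)]

# f_Y(chi_k) = phi^(chi_k^-) * sum_b conj(chi_k(b)) X(b), with chi_k^- = chi_{-k}.
f_Y = [[hat(phi, -k) * sum(chi(k, b).conjugate() * X[b][j] for b in range(N))
        for j in range(d)] for k in range(N)]

# Spectral side: integral of chi(a) f_Y(chi) against pi_K = (1/N) * counting.
Y_spec = [[sum(chi(k, a) * f_Y[k][j] for k in range(N)) / N for j in range(d)]
          for a in range(N)]

err = max(abs(Y[a][j] - Y_spec[a][j]) for a in range(N) for j in range(d))
print("max discrepancy:", err)
```

Only the finite group case is covered here; on a genuine hypergroup the same bookkeeping applies with $a^- * b$ and $\chi^-$ interpreted through the hypergroup convolution and involution.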
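The function $\phi$ occurring in the definition of the random field $Y$ is called a harmonizing function, and a continuous linear operator $T$ on $C(K, L^2_0)$ with $T(C(K, L^2_0)) \subset H_{st}(K, L^2_0)$ will be called a harmonizing operator.

2.3.8. Theorem Let $\phi \in L^1(K) \cap PD(K)$ and $\psi \in B(K)$ with compact $\operatorname{supp}(\psi)$. Then the mapping $X \mapsto T_{\phi,\psi}(X)$ on $C(K, L^2_0)$ given by
$$T_{\phi,\psi}(X)(a) := \int_K \phi(a^- * b)\, \psi(b)\, X(b)\, \omega_K(db)$$
for all $a \in K$ is a harmonizing operator with values in $H_d(K, L^2_0)$.

Proof. Let $X \in C(K, L^2_0)$. The random field $\psi X$ over $K$ is $\omega_K$-strongly integrable, since its restriction to $C_\psi := \operatorname{supp}(\psi)$ is bounded. By 2.3.7, $T_{\phi,\psi}(X)$ is a random field in $H_d(K, L^2_0)$ with spectral stochastic density
$$\chi \mapsto \hat{\phi}(\chi^-) \int_K \chi^- \psi X \, d\omega_K$$
on $K^\wedge$. Obviously $T_{\phi,\psi}$ is a linear operator. To see that $T_{\phi,\psi}$ is continuous (with respect to $\tau_{co}$) we look at the inequalities
$$\|T_{\phi,\psi}(X)(a)\|_2 \le \int_K |\phi(a^- * b)|\, |\psi(b)|\, \|X(b)\|_2 \, \omega_K(db) \le \int_K |\phi(a^- * b)|\, \omega_K(db)\; \|\psi\|_\infty\, N_{C_\psi}(X) \le \|\phi\|_1\, \|\psi\|_\infty\, N_{C_\psi}(X),$$
valid for all $a \in K$, and see that $\|T_{\phi,\psi}(X)\|_\infty \le \|\phi\|_1 \|\psi\|_\infty N_{C_\psi}(X) < \infty$. In particular $N_C(T_{\phi,\psi}(X)) \le \|\phi\|_1 \|\psi\|_\infty N_{C_\psi}(X)$ for every compact $C \subset K$, which gives the continuity.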
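2.3.9. Corollary If $C_\phi := \operatorname{supp}(\phi)$ is compact, then the operator $T_{\phi,\psi}$ is harmonizing with values in $H_{dc}(K, L^2_0)$, with $\operatorname{supp}(T_{\phi,\psi}(X)) \subset C_\psi * C_\phi^-$ for all $X \in C(K, L^2_0)$.

Proof. From
$$\phi(a^- * b)\, \psi(b)\, X(b) \neq 0$$
for $a, b \in K$, $X \in C(K, L^2_0)$ one deduces that
$$(\{a^-\} * C_\psi) \cap C_\phi \neq \emptyset \iff \{a^-\} \cap (C_\psi^- * C_\phi) \neq \emptyset,$$
hence that $a \in (C_\psi^- * C_\phi)^- = C_\psi * C_\phi^-$.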
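Both the norm bound from the proof of Theorem 2.3.8 and the support inclusion of Corollary 2.3.9 can be observed on the discrete cosine hypergroup. The sketch below assumes its usual structure: $K = \mathbb{Z}_+$, $\varepsilon_m * \varepsilon_n = \frac{1}{2}(\varepsilon_{|m-n|} + \varepsilon_{m+n})$, trivial involution (so $a^- = a$), and Haar weights $\omega_K(\{0\}) = 1$, $\omega_K(\{n\}) = 2$ for $n \ge 1$. Random fields are replaced by $\mathbb{R}^2$-valued arrays, and $\phi$, $\psi$ are arbitrary finitely supported test functions (positive definiteness of $\phi$ plays no role for these two properties). Names are hypothetical.

```python
import math
import random

rng = random.Random(2)


def omega(n):
    """Haar weights of the discrete cosine hypergroup: 1 at 0, 2 elsewhere."""
    return 1.0 if n == 0 else 2.0


def at_conv(h, a, b):
    """h(a * b) := (h(|a - b|) + h(a + b)) / 2 for a finitely supported h (a dict)."""
    return 0.5 * (h.get(abs(a - b), 0.0) + h.get(a + b, 0.0))


phi = {c: rng.uniform(-1, 1) for c in (0, 1, 2)}   # C_phi = {0, 1, 2}
psi = {b: rng.uniform(-1, 1) for b in (4, 5, 6)}   # C_psi = {4, 5, 6}
X = {b: [rng.gauss(0, 1), rng.gauss(0, 1)] for b in range(30)}  # R^2-valued "field"


def T(a):
    """T_{phi,psi}(X)(a) = sum_b phi(a^- * b) psi(b) X(b) omega({b}), with a^- = a."""
    out = [0.0, 0.0]
    for b, pb in psi.items():
        w = at_conv(phi, a, b) * pb * omega(b)
        out = [o + w * x for o, x in zip(out, X[b])]
    return out


# Corollary 2.3.9: supp T(X) lies in C_psi * C_phi^- (here C_phi^- = C_phi).
support = {a for a in range(30) if any(abs(v) > 1e-12 for v in T(a))}
predicted = {m for b in psi for c in phi for m in (abs(b - c), b + c)}

# Proof of Theorem 2.3.8: ||T(X)(a)|| <= ||phi||_1 ||psi||_inf N_{C_psi}(X).
bound = (sum(abs(v) * omega(x) for x, v in phi.items())
         * max(abs(v) for v in psi.values())
         * max(math.hypot(*X[b]) for b in psi))
largest = max(math.hypot(*T(a)) for a in range(30))
print(sorted(support), largest, bound)
```

The step $\int_K |\phi(a^- * b)|\, \omega_K(db) \le \|\phi\|_1$ rests on the invariance of the Haar weights, $\sum_x \omega_K(\{x\})\, h(a * x) = \sum_x \omega_K(\{x\})\, h(x)$, which holds for the weights assumed here.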
From the general theory of hypergroups we recall the notion of $\alpha$-uniform continuity and observe that any random field $X \in C(K, L^2_0)$ is $\alpha$-uniformly continuous on compact subsets of $K$. This fact will be applied in the subsequent approximation theorem, whose origin goes back to the works [9], [10] by D. Dehay and R. Moché.

2.3.10. Theorem The space $H_{dc}(K, L^2_0)$ is ($\tau_{co}$-)dense in $C(K, L^2_0)$.

Proof. It has to be shown that every neighborhood of $X \in C(K, L^2_0)$ has a nonempty intersection with $H_{dc}(K, L^2_0)$. But each such neighborhood contains a neighborhood
$$W := \{Y \in C(K, L^2_0) : N_C(X - Y) \le \varepsilon\}$$
with $C \in \mathcal{C}(K)$ and $\varepsilon > 0$ properly chosen. So it suffices to show that $W \cap H_{dc}(K, L^2_0) \neq \emptyset$. Now let $C_e$ be a compact neighborhood of $e \in K$. We noted above that $X$ is $\alpha$-uniformly continuous on $C * C_e$. Hence there exists an open neighborhood $U$ of $e$ with $U \subset C_e$ such that for $a \in C$ and $b \in \{a\} * U$ one has $\|X(b) - X(a)\|_2 < \varepsilon$.
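Choosing a symmetric compact neighborhood $C_e'$ of $e$ such that $C_e' * C_e' \subset U$, the function $\phi_0 := 1_{C_e'} * 1_{C_e'}^\sim$ belongs to $L^1(K) \cap PD(K)$, is $\ge 0$ and has compact support $C_e' * C_e'$. The function $\phi := \alpha \phi_0$ with $\alpha \in \mathbb{R}$ such that
$$\int_K \phi \, d\omega_K = 1$$
is harmonizing, and together with $\psi := 1_{C * C_e}$ it provides a strongly harmonizable random field
$$a \mapsto T_{\phi,\psi}(X)(a) := \int_K \phi(a^- * b)\, \psi(b)\, X(b)\, \omega_K(db)$$
over $K$ by Theorem 2.3.8. In fact, Corollary 2.3.9 implies $T_{\phi,\psi}(X) \in H_{dc}(K, L^2_0)$. In order to show that $T_{\phi,\psi}(X) \in W$ we observe that for $a \in C$
$$T_{\phi,\psi}(X)(a) - X(a) = \int_K \phi(a^- * b)\, \psi(b)\, X(b)\, \omega_K(db) - \int_K \phi(a^- * b)\, X(a)\, \omega_K(db) = \int_K \phi(a^- * b)\, \big(\psi(b) X(b) - X(a)\big)\, \omega_K(db) = \int_{C * C_\phi} \phi(a^- * b)\, \big(X(b) - X(a)\big)\, \omega_K(db)$$
(here we used that $b \mapsto \phi(a^- * b)$ vanishes outside $\{a\} * C_\phi \subset C * C_\phi$, on which $\psi = 1$ since $C_\phi \subset U \subset C_e$) and obtain
$$\|T_{\phi,\psi}(X)(a) - X(a)\|_2 \le \int_{C * C_\phi} \phi(a^- * b)\, \|X(b) - X(a)\|_2 \, \omega_K(db) \le \varepsilon \int_{C * C_\phi} \phi(a^- * b)\, \omega_K(db) = \varepsilon.$$
Hence $N_C(T_{\phi,\psi}(X) - X) \le \varepsilon$, that is, $T_{\phi,\psi}(X) \in W$.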
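The mechanism of this proof (averaging $X$ against a normalized, nonnegative, positive definite kernel with small support) can be watched in the simplest group case $K = \mathbb{R}$, where $a^- * b = b - a$. The sketch below assumes a random field of the special form $X(t) = \sin(t)\,\xi$ with a fixed centered random variable $\xi$ of unit variance, so that all $L^2$-norms reduce to absolute values of deterministic functions; $\psi$ is taken equal to 1 on a neighborhood of $C = [0, 2\pi]$ large enough that it drops out. With $\phi_h := (4h^2)^{-1}\, 1_{[-h,h]} * 1_{[-h,h]}$ (a triangle function supported in $[-2h, 2h]$ with integral 1) the error on $C$ stays below the modulus of continuity $\sup_{|x| \le 2h} |\sin(a+x) - \sin a| \le 2h$, as in the proof. For this particular kernel the smoothed value also has the closed form $\sin(a)\,(\sin h / h)^2$.

```python
import math


def triangle_kernel(h):
    """phi_h = (1/(4h^2)) 1_[-h,h] * 1_[-h,h]: nonnegative, positive definite,
    supported in [-2h, 2h], with integral 1."""
    return lambda x: max(0.0, 2.0 * h - abs(x)) / (4.0 * h * h)


def smoothed(a, h, X, n=1000):
    """Midpoint rule for int phi_h(b - a) X(b) db (psi = 1 near a, so it is omitted)."""
    phi = triangle_kernel(h)
    dx = 4.0 * h / n
    total = 0.0
    for i in range(n):
        x = -2.0 * h + (i + 0.5) * dx
        total += phi(x) * X(a + x)
    return total * dx


grid = [2.0 * math.pi * j / 100 for j in range(101)]  # points of C = [0, 2*pi]
errors = {}
for h in (0.5, 0.1, 0.02):
    errors[h] = max(abs(smoothed(a, h, math.sin) - math.sin(a)) for a in grid)
    print(f"h = {h}: sup error on C = {errors[h]:.2e} (bound 2h = {2 * h})")
```

Shrinking the support of the kernel (the neighborhood $U$ in the proof) drives the sup error on the compact set to zero, which is exactly the approximation asserted by Theorem 2.3.10 in this special case.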
From Theorem 2.3.10 it is clear that the spaces $H_d(K, L^2_0)$, $H_{st}(K, L^2_0)$ and $H(K, L^2_0)$ are also dense in $C(K, L^2_0)$.

2.3.11. Generation of harmonizable random fields

2.3.11.1 (Truncation) Let $K$ be a discrete hypergroup, $X \in C(K, L^2_0)$, and let $A$ be a finite subset of $K$. Then the truncated random field $Y := 1_A X$
belongs to $H_d(K, L^2_0)$ and admits the representation $Z_Y = f_Y\, \pi_K$, where
$$f_Y(\chi) := \sum_{k \in A} \chi^-(k)\, X(k)\, \omega_K(\{k\})$$
for all $\chi \in K^\wedge$. Moreover, the bimeasure $\beta_Y$ associated with $Y$ is given by
$$\beta_Y(B, C) = \int_B \int_C \sum_{k \in A} \sum_{l \in A} \chi^-(k)\, \rho_X(k, l)\, \psi(l)\, \omega_K(\{k\})\, \omega_K(\{l\})\, \pi_K(d\chi)\, \pi_K(d\psi)$$
whenever $B, C \in \mathcal{B}(K^\wedge)$.

2.3.11.2 ($L^1$-harmonization) Let $X \in C(K, L^2_0)$. Then the integral
$$X(a * b) := \int_K X(z)\, (\varepsilon_a * \varepsilon_b)(dz)$$
exists for all $a, b \in K$, since $\operatorname{supp}(\varepsilon_a * \varepsilon_b)$ is compact by axiom (H4). Now let $X \in H(K, L^2_0)$ be $\omega_K$-strongly integrable, and let $\phi \in L^1(K)$. Then the random field $Y$ over $K$ defined by
$$Y(a) := \int_K \phi(b)\, X(a * b)\, \omega_K(db)$$
for all $a \in K$ also belongs to $H(K, L^2_0)$, and $Z_Y = f Z_X$ with
$$f(\chi) := \int_K \chi \phi \, d\omega_K$$
for all $\chi \in K^\wedge$.

2.3.11.3 (Averaging) Let $K := (\mathbb{R}_+, *)$ be the symmetric hypergroup of noncompact type introduced in 1.3.3.3, and let $X \in H(K, L^2_0)$ be $\omega_K$-strongly integrable with spectral stochastic measure $Z_X$. For $T \in \mathbb{R}_+$ we
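introduce the averaged random field $Y := Y_T$ over $K$ given by
$$Y(a) := \frac{1}{2} \int_0^T \big(X(|a - b|) + X(a + b)\big)\, db$$
for all $a \in K$. Then $Y \in H(K, L^2_0)$ admits the spectral representation
$$Y(a) = \int_{\mathbb{R}_+} \cos(\lambda a)\, Z_Y(d\lambda) \quad (a \in K),$$
where $Z_Y = f Z_X$ with
$$f(\lambda) := \frac{1}{\lambda} \sin(\lambda T)$$
for all $\lambda \in \mathbb{R}_+$ (with $f(0) := T$). Indeed, for all $a \in K$
$$Y(a) = \int_0^T \Big( \int_{\mathbb{R}_+} X(s)\, (\varepsilon_a * \varepsilon_b)(ds) \Big)\, db = \int_{\mathbb{R}_+} \phi(b)\, X(a * b)\, db,$$
where $\phi := 1_{[0,T]}$. By 2.3.11.2, $Y \in H(K, L^2_0)$, and
$$Y(a) = \int_{\mathbb{R}_+} \cos(\lambda a)\, Z_Y(d\lambda) = \int_{\mathbb{R}_+} \cos(\lambda a) \Big( \int_{\mathbb{R}_+} \cos(\lambda b)\, \phi(b)\, db \Big) Z_X(d\lambda) = \int_{\mathbb{R}_+} \cos(\lambda a)\, \frac{1}{\lambda} \sin(\lambda T)\, Z_X(d\lambda).$$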
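The two analytic facts behind 2.3.11.3 are easy to confirm numerically: $\int_0^T \cos(\lambda b)\, db = \sin(\lambda T)/\lambda$, and the product formula $\cos(\lambda a)\cos(\lambda b) = \frac{1}{2}\big(\cos(\lambda|a-b|) + \cos(\lambda(a+b))\big)$, which says that $\cos(\lambda\,\cdot)$ is a character of $(\mathbb{R}_+, *)$ for the convolution $\varepsilon_a * \varepsilon_b = \frac{1}{2}(\varepsilon_{|a-b|} + \varepsilon_{a+b})$ implicit in the averaging formula above. The sketch applies the averaging operator to a single character, assuming for illustration $X(s) = \cos(\lambda s)\,\xi$ with a fixed centered random variable $\xi$ (so only the deterministic factor needs to be computed), and recovers $Y(a) = \cos(\lambda a) f(\lambda)$, that is, the multiplier $f$ acting on a point mass of $Z_X$.

```python
import math


def f(lam, T):
    """Multiplier f(lambda) = sin(lambda T) / lambda, with f(0) = T."""
    return T if lam == 0 else math.sin(lam * T) / lam


def averaged(X, a, T, n=20000):
    """Y(a) = (1/2) int_0^T (X(|a - b|) + X(a + b)) db, by the midpoint rule."""
    db = T / n
    total = 0.0
    for i in range(n):
        b = (i + 0.5) * db
        total += 0.5 * (X(abs(a - b)) + X(a + b))
    return total * db


T = 2.5
for lam in (0.0, 0.7, 3.0):
    def character(s, lam=lam):
        return math.cos(lam * s)

    for a in (0.0, 1.3, 4.0):
        lhs = averaged(character, a, T)
        rhs = math.cos(lam * a) * f(lam, T)
        print(f"lambda={lam}, a={a}: Y(a)={lhs:.8f}, cos(lambda a) f(lambda)={rhs:.8f}")
```

The convention $f(0) := T$ is simply the limit of $\sin(\lambda T)/\lambda$ as $\lambda \to 0$, which is what the averaging of the constant character produces.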
2.3.11.4 ($PD$-harmonization) Let $\phi \in PD(K)$ with Bochner measure $\mu \in M^b_+(K^\wedge)$. Given an $\omega_K$-strongly integrable random field $X \in C(K, L^2_0)$, we introduce the random field $Y$ over $K$ defined by
$$Y(a) := \int_K \phi(a * b)\, X(b)\, \omega_K(db)$$
for all $a \in K$. Then $Y \in H_{st}(K, L^2_0)$, and $Z_Y = f_X \cdot \mu$ with a $\mu$-spectral stochastic density $f_X$ of the form
$$f_X(\chi) = \int_K \chi X \, d\omega_K$$
for all $\chi \in K^\wedge$.

2.3.11.5 (Inducing) Let $K = (\mathbb{Z}_+, *)$ be the discrete cosine hypergroup introduced in 1.3.3.2. For $n \in \mathbb{Z}_+$ we consider the subhypergroup $H_n := \{nk : k \in \mathbb{Z}_+\}$ of $K$. Then $1_{H_n} \in PD(K)$ with Bochner measure $\mu_n \in M^b_+(K^\wedge)$. For a given $\omega_K$-strongly integrable random field $X \in C(K, L^2_0)$ the induced random field $Y$ over $K$ defined by
$$Y(m) := \sum_{l \ge 0} \big(1_{H_n}(|m - l|) + 1_{H_n}(m + l)\big)\, X(l)$$
for all $m \in \mathbb{Z}_+$ belongs to $H_{st}(K, L^2_0)$, and $Z_Y = f_Y \mu_n$ with a $\mu_n$-spectral stochastic density $f_Y$ given by
$$f_Y(\lambda) = \sum_{k \ge 0} \cos(\lambda k)\, X(k)\, \omega_K(\{k\})$$
whenever $\lambda \in [0, \pi]$. In view of 2.3.11.4 the proof reduces to showing that $1_{H_n} \in PD(K)$ for $n \in \mathbb{Z}_+$. For the necessary arguments H-J. Neu [39] relies on results from [18] and [51] by P. Hermann and by M. Voit respectively. In fact, since the one-dimensional representation $\rho := 1_{H_n} \in H_n^\wedge$ can be induced to $K$,
$$\rho(P(f * f^\sim)) \ge 0,$$
where $P$ denotes the projection from $C^c(K)$ onto $C^c(H_n)$. But
$$\rho(P(f * f^\sim)) = \int_K (f * f^\sim)\, 1_{H_n} 1_{H_n} \, d\omega_{H_n} \ge 0$$
implies
$$\int_K (f * f^\sim)\, 1_{H_n} \, d\omega_K \ge 0$$
for all $f \in C^c(K)$, since for the discrete hypergroup $K$ one has $\omega_{H_n} = \mathrm{Re}_{H_n} \omega_K$ whenever $n \in \mathbb{Z}_+$. Property 1.2.6.1 yields the assertion.

3. Generalized Random Fields over Hypergroups

3.1. Segal algebras

3.1.1. A subalgebra $S(K)$ of $L^1(K)$ is called a Segal algebra on a commutative hypergroup $K$ if the following conditions are fulfilled:
(S1) $S(K)$ is a Banach algebra with respect to a norm $\|\cdot\|_S$.
(S2) $S(K)$ is dense in $L^1(K)$.
(S3) For all $f \in S(K)$ and $a \in K$, $L_a f := \varepsilon_a * f$ satisfies $\|L_a f\|_S \le \|f\|_S$.
(S4) For every $f \in S(K)$ the mapping $a \mapsto L_a f$ from $K$ into $S(K)$ is continuous.
A Segal algebra $S(K)$ is said to be character invariant if it fulfills two more conditions:
(S5) For $f \in S(K)$ and $\chi \in K^*$, $\chi f \in S(K)$ with $\|\chi f\|_S \le \|f\|_S$, and
(S6) for every $f \in S(K)$ the mapping $\chi \mapsto \chi f$ from $K^\wedge$ into $S(K)$ is continuous.

If condition (S2) is replaced by the relation
(S2′) $F_c(K) := \{f \in L^1(K) : \operatorname{supp}(\hat{f}) \text{ is compact}\} \subset S(K)$,
then Definition 3.1.1 provides a CK-algebra, introduced by A. K. Chilana and A. Kumar in [8]. We note that for strong hypergroups (S2′) implies (S2), hence every CK-algebra is a Segal algebra. Conversely, for strong hypergroups conditions (S1) to (S6) imply (S2′), which says that every character invariant Segal algebra is in fact a CK-algebra.

3.1.2. Obvious facts
3.1.2.1 Every Segal algebra $S(K)$ is an ideal in $L^1(K)$; hence
3.1.2.2 $h * f \in S(K)$ and $\|h * f\|_S \le \|h\|_1 \|f\|_S$ whenever $h \in L^1(K)$, $f \in S(K)$.
3.1.2.3 $\mu * f \in S(K)$ and $\|\mu * f\|_S \le \|\mu\|\, \|f\|_S$ for all $\mu \in M^b(K)$, $f \in S(K)$.

3.1.3. Examples of Segal algebras as they appear in the work of R. Bürger [5] and M. Leitner [37].
3.1.3.1 Let $k \in C^c_+(K)$ be fixed and define for $f \in C(K)$ the function $f^{(k)}$ on $K$ by
$$f^{(k)}(x) := \|(L_x k) f\|_\infty$$
for all $x \in K$. Then the Wiener algebra
$$W(K) := \{f \in C(K) : f^{(k)} \in L^1(K)\},$$
furnished with the norm $f \mapsto \|f\|_{W(K)} := \|f^{(k)}\|_1$, is a Segal algebra satisfying (S5). A related Segal algebra is
$$W_*(K) := \Big\{ f \in C(K) : f = \sum_{n \ge 1} h_n * g_n,\ h_n, g_n \in W(K),\ \sum_{n \ge 1} \|h_n\|_{W(K)} \|g_n\|_{W(K)} < \infty \Big\}$$
together with the norm
$$f \mapsto \|f\|_{W_*(K)} := \inf \sum_{n \ge 1} \|h_n\|_{W(K)} \|g_n\|_{W(K)},$$
where the infimum is taken over all representations of $f$ in terms of convolution products $h_n * g_n$ with $h_n, g_n \in W(K)$. If $K$ is a strong hypergroup, then
$$W_0(K) := \{f \in W(K) : \hat{f} \in W(K^\wedge)\},$$
together with the norm $f \mapsto \|f\|_{W_0(K)} := \|f\|_{W(K)} + \|\hat{f}\|_{W(K^\wedge)}$, is a Segal algebra satisfying (S5).
3.1.3.2 For a discrete hypergroup $K$ the algebra $L^1(K)$ is the only character invariant Segal algebra on $K$.
3.1.3.3 Let $p \in [1, \infty]$. Then $S(K) := L^1(K) \cap L^p(K)$
together with the norm $f \mapsto \|f\|_S := \|f\|_1 + \|f\|_p$ becomes a character invariant Segal algebra on $K$.
3.1.3.4 If $K$ is a strong hypergroup and $p \in [1, \infty[$, then
$$S(K) := \{f \in L^1(K) : \hat{f} \in L^p(K^\wedge)\},$$
furnished with the norm $f \mapsto \|f\|_S := \|f\|_1 + \|\hat{f}\|_p$, is a character invariant Segal algebra on $K$.

From Functional Analysis we borrow the notion of continuous embeddings of Banach spaces, which gives rise to a concept of minimality. The following definition was given for locally compact Abelian groups by H. G. Feichtinger in [12].

3.1.4. Definition A Feichtinger algebra on a commutative hypergroup $K$ is a character invariant Segal algebra on $K$ which is minimal in the sense that it is continuously embeddable in every other character invariant Segal algebra on $K$.

It turns out that the Feichtinger algebra on $K$ is uniquely determined. In the sequel we shall make use, for an arbitrary commutative hypergroup $K$, of the Banach space
$$A(K) := \{f \in C^0(K) : f = u^\vee \text{ for some } u \in L^1(K^\wedge, \pi_K)\},$$
furnished with the norm $f \mapsto \|f\|_{A(K)} := \|u\|_1$, where $u \in L^1(K^\wedge)$ with $u^\vee = f$, and, for a given subset $Q$ of $K$, of its subspace
$$A_Q(K) := \{f \in A(K) : \operatorname{supp}(f) \subset Q\}.$$
It is easy to show that $A(K) \cap C^c(K)$ is $\tau_{co}$-dense in $C^0(K)$.

3.1.5. The Feichtinger algebra in the special case of a locally compact Abelian group $G$.
For a fixed open, relatively compact neighborhood $Q$ of the neutral element $e$ of $G$ with $\omega_G(Q) = 1$ we consider the Banach space
$$S_0(G) := \Big\{ f = \sum_{i \ge 1} \varepsilon_{y_i} * f_i : y_i \in G,\ f_i \in A_Q(G)\ (i \ge 1),\ \sum_{i \ge 1} \|f_i\|_{A(G)} < \infty \Big\}$$
together with the norm
$$f \mapsto \|f\|_{S_0(G)} := \inf \sum_{i \ge 1} \|f_i\|_{A(G)},$$
where the infimum is taken over all representations of $f$ appearing in the definition of $S_0(G)$. Then $S_0(G)$ is the Feichtinger algebra of $G$. The proof of the character invariance of $S_0(G)$ depends on the validity of the equation
$$\chi(\varepsilon_y * f) = \chi(y)\, \varepsilon_y * (\chi f),$$
holding for all $y \in G$, $\chi \in G^\wedge$ and $f \in L^1(G)$. The equation, however, is true for an arbitrary commutative hypergroup $K$ if and only if all the characters of $K$ are unitary in the sense that $\chi(y * y^-) = 1$ for all $y \in K$. But this means that $K$ is a group. As a consequence of this remark, a different approach to the construction of the Feichtinger algebra is needed for hypergroups $K$. If $K$ is a strong hypergroup, then M. Leitner [37] accomplished a first generalization.

3.1.6. Definition Let $Q$ be an open, relatively compact neighborhood of $e$ with $\omega_K(Q) = 1$. By $S_0(K)$ we denote the set of all functions $f \in L^1(K)$ which admit an admissible representation as an $L^1(K)$-convergent sum of the form
$$f = f_0 + \mu_1^{(1)} * f_1 + a_2^{(1)\vee} \big(\mu_2^{(1)} * f_2\big) + \mu_3^{(2)} * \Big(a_3^{(1)\vee} \big(\mu_3^{(1)} * f_3\big)\Big) + a_4^{(2)\vee} \Big(\mu_4^{(2)} * \Big(a_4^{(1)\vee} \big(\mu_4^{(1)} * f_4\big)\Big)\Big) + \cdots,$$
where $f_i \in A_Q(K)$, $\mu_i^{(j)} \in M^1_c(K)$, $a_i^{(j)} \in M^1_c(K^\wedge)$ $(i, j \ge 1)$ such that
$$\sum_{i \ge 0} \|f_i\|_{A(K)} < \infty.$$
$S_0(K)$ becomes a Banach space once we introduce the norm
$$f \mapsto \|f\|_{S_0(K)} := \inf\Big\{ \sum_{i \ge 0} \|f_i\|_{A(K)} : f \text{ has an admissible representation} \Big\}.$$
Moreover, $(S_0(K), \|\cdot\|_{S_0(K)})$ is continuously embeddable into the Banach spaces $(L^1(K), \|\cdot\|_1)$, $(C^0(K), \|\cdot\|_\infty)$ and $(A(K), \|\cdot\|_{A(K)})$. In the proof of these statements the assumption that $K$ is a strong hypergroup seems inevitable, since one requires that any $f \in S_0(K)$ with an admissible representation involving $f_i = u_i^\vee$ for $u_i \in L^1(K^\wedge)$, $\mu_i^{(j)} \in M^1_c(K)$ and $a_i^{(j)} \in M^1_c(K^\wedge)$ $(i, j \ge 1)$ has the form
$$f = \big(u_0 + \mu_1^{(1)\wedge} u_1 + a_2^{(1)} * (\mu_2^{(1)\wedge} u_2) + \cdots\big)^\vee.$$

3.1.7. Theorem For strong hypergroups $K$ the Banach algebra $S_0(K)$ is the Feichtinger algebra on $K$.

3.1.8. Comments on the proof of Theorem 3.1.7 That $S_0(K)$ satisfies axioms (S1) to (S4) of a Segal algebra can be shown without the assumption that $K$ is strong. Let $f \in S_0(K)$ and $\chi \in K^\wedge$. Since $K$ is strong, one has $\chi a^\vee = (\varepsilon_\chi * a)^\vee$ whenever $a \in M^b(K^\wedge)$. But then the admissible representation of $f$ yields
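$$\chi f = \varepsilon_\chi^\vee f_0 + \varepsilon_\chi^\vee \big(\mu_1^{(1)} * f_1\big) + \big(\varepsilon_\chi * a_2^{(1)}\big)^\vee \big(\mu_2^{(1)} * f_2\big) + \cdots = f_0' + \varepsilon_\chi^\vee \big(\mu_1^{(1)} * f_1\big) + \big(\varepsilon_\chi * a_2^{(1)}\big)^\vee \big(\mu_2^{(1)} * f_2\big) + \cdots$$
with $f_0' := (\varepsilon_\chi * u_0)^\vee \in A_Q(K)$. Since $a_i^{(j)} \in M^1_c(K^\wedge)$, also $\varepsilon_\chi * a_i^{(j)} \in M^1_c(K^\wedge)$ $(i, j \ge 1)$; hence $\chi f$ has an admissible representation and belongs to $S_0(K)$. This proves (S5).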
In order to see (S6) one considers $f \in S_0(K)$ and $\chi_i, \chi \in K^\wedge$ $(i \ge 1)$ with $\chi_i \to \chi$ as $i \to \infty$. Without loss of generality one assumes that $W := \operatorname{supp}(f)$ is compact. Moreover, there exists a function $h \in S_0(K)$ with $\mathrm{Re}_W h = 1$. But then
$$\|\chi_i f - \chi f\|_{S_0(K)} \le c\, \big\|((\chi_i - \chi) f)^\wedge\big\|_1 \|h\|_{S_0(K)},$$
where $c > 0$ denotes a constant. Since $K^\wedge$ carries a convolution,
$$((\chi_i - \chi) f)^\wedge = (\varepsilon_{\chi_i} - \varepsilon_\chi) * f^\wedge \quad (i \ge 1),$$
hence $\|((\chi_i - \chi) f)^\wedge\|_1 \to 0$ as $i \to \infty$, and (S6) follows.

A detailed proof of Theorem 3.1.7 also shows that the definition of $S_0(K)$ does not depend on the initially chosen neighborhood $Q$ of $e$. Such a proof is contained in the generalization of the theorem presented in the following subsection.

3.2. The extended Feichtinger algebra

Generalizing the notion of the Feichtinger algebra means, in our context, extending it to a large class of hypergroups including all strong hypergroups. Technically this extension has been achieved by H-J. Neu [39] through the modification procedure cited in 1.2.7.

3.2.1. Definition A Segal algebra $S(K)$ on a commutative hypergroup $K$ is called $\chi_0$-character invariant for some $\chi_0 \in K^{*,p}$ if the following conditions are satisfied:
(S5•) For $f \in S(K)$ and $\chi \in \operatorname{supp}(\pi_K)$, $\frac{1}{\chi_0} \chi f \in S(K)$ and $\big\|\frac{1}{\chi_0} \chi f\big\|_{S(K)} \le \|f\|_{S(K)}$;
(S6•) For $f \in S(K)$ the mapping $\chi \mapsto \frac{1}{\chi_0} \chi f$ from $\operatorname{supp}(\pi_K)$ into $S(K)$ is continuous.
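The special choice of the semicharacter $\chi_0 := 1_K$ takes one back to the character invariance introduced in 3.1.1. For a $\chi_0$-character invariant Segal algebra $S(K)$ we quote two inequalities as useful tools.

3.2.2. If $f \in S(K)$ and $u \in L^1(K^\wedge)$, then $\frac{1}{\chi_0} u^\vee f \in S(K)$ and
$$\Big\| \frac{1}{\chi_0} u^\vee f \Big\|_{S(K)} \le \|u\|_1 \|f\|_{S(K)}.$$
Replacing $u$ by $a \in M^b(K^\wedge)$ with $\operatorname{supp}(a) \subset \operatorname{supp}(\pi_K)$ one obtains that $\frac{1}{\chi_0} a^\vee f \in S(K)$ and
$$\Big\| \frac{1}{\chi_0} a^\vee f \Big\|_{S(K)} \le \|a\|\, \|f\|_{S(K)}.$$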
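In the simplest case the mechanism behind 3.2.2 is the estimate $\|u^\vee\|_\infty \le \|u\|_1$. The sketch below is illustrative only and assumes $K = \mathbb{Z}_N$, $\chi_0 = 1_K$ (so $\operatorname{supp}(\pi_K)$ is all of $K^\wedge$), counting measure as Haar measure, Plancherel measure equal to $\frac{1}{N}$ times counting measure, and the character invariant Segal algebra $S(K) = L^1(K) \cap L^2(K)$ of 3.1.3.3 with $p = 2$. It checks the first inequality on random data and shows that equality occurs when $u^\vee$ is a single character, which is exactly the multiplication by a character appearing in (S5).

```python
import cmath
import random

N = 12
rng = random.Random(3)


def inverse_transform(u):
    """u^vee(x) = (1/N) sum_k u_k exp(2 pi i k x / N) (Plancherel measure 1/N on the dual)."""
    return [sum(u[k] * cmath.exp(2j * cmath.pi * k * x / N) for k in range(N)) / N
            for x in range(N)]


def norm_l1_dual(u):
    """||u||_1 with respect to the Plancherel measure (1/N) * counting."""
    return sum(abs(v) for v in u) / N


def norm_S(f, p=2):
    """||f||_S = ||f||_1 + ||f||_p on Z_N with counting Haar measure."""
    return sum(abs(v) for v in f) + sum(abs(v) ** p for v in f) ** (1 / p)


ratios = []
for _ in range(100):
    u = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)]
    f = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)]
    uv = inverse_transform(u)
    lhs = norm_S([a * b for a, b in zip(uv, f)])
    rhs = norm_l1_dual(u) * norm_S(f)
    ratios.append(lhs / rhs)
print("largest ratio ||u^vee f||_S / (||u||_1 ||f||_S):", max(ratios))
```

On a genuine hypergroup with $\chi_0 \ne 1_K$ the weight $1/\chi_0$ enters, and this toy model says nothing about that case.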
3.2.3. Supplements on Segal algebras
3.2.3.1 $S(K)$ admits an approximate unit $(f_U)_{U \in \mathcal{U}}$ for $L^1(K)$, where $\mathcal{U}$ denotes a basis of compact neighborhoods of $e$, $f_U \in C(K)$ with $f_U \ge 0$, $\operatorname{supp}(f_U) \subset U$ and
$$\int_K f_U \, d\omega_K = 1$$
for all $U \in \mathcal{U}$.
3.2.3.2 Let $Q \subset K$ and let $V$ be a relatively compact symmetric neighborhood of $e$. There exists an $f \in S(K) \cap C(K)$ with $0 \le f \le 1$, $f(y) = 1$ for all $y \in Q$ and $f(y) = 0$ for all $y \notin \overline{V * Q * V}$.
3.2.3.3 Let $Q \subset K$ be compact. Then $A_Q(K) \subset S(K)$, and $\|f\|_{S(K)} \le c_Q \|u\|_1$ for all $f \in A_Q(K)$ of the form $f = u^\vee$, $u \in L^1(K^\wedge)$, where $c_Q > 0$ is a constant.
3.2.3.4 If $K$ is strong, then $A(K) \cap C^c(K)$ is a dense subalgebra of $S(K)$.
For a proof of this statement one shows that $F_c(K)$ is a dense ideal of $S(K)$, which implies that $A_Q(K) * A(K)$ is also a dense ideal of $S(K)$. This fact together with 3.2.3.3 yields the assertion.

3.2.4. Definition Let $\chi_0 \in K^{*,p}$. A $\chi_0$-character invariant Segal algebra is said to be a $\chi_0$-Feichtinger algebra on $K$ if it can be continuously embedded into any other $\chi_0$-character invariant Segal algebra on $K$. For the choice $\chi_0 := 1_K$ we just speak of a Feichtinger algebra on $K$.

Our next aim is to describe the class of those hypergroups $K$ for which, by a suitable choice of $\chi_0 \in K^{*,p}$, a $\chi_0$-Feichtinger algebra exists. Let
$$A^1(K^\wedge) := \{u \in L^1(K^\wedge) : u^\vee \in C^c(K)\}.$$

3.2.5. Definition A commutative hypergroup $K$ is said to be $\chi_0$-admissible if there exists a $\chi_0 \in K^{*,p}$ such that the following conditions are fulfilled:
(F1) For all $u \in A^1(K^\wedge)$ and $a \in M^1_c(K^\wedge)$ with $\operatorname{supp}(a) \subset \operatorname{supp}(\pi_K)$,
$$\Big( \frac{1}{\chi_0} a^\vee u^\vee \Big)^\wedge \in A^1(K^\wedge);$$
(F2) For each $u \in A^1(K^\wedge)$ the mapping $\psi \mapsto \big( \frac{1}{\chi_0} \psi u^\vee \big)^\wedge$ from $\operatorname{supp}(\pi_K)$ into $A^1(K^\wedge)$ is $\|\cdot\|_1$-continuous.

3.2.6. Examples of admissible hypergroups
3.2.6.1 All discrete hypergroups $K$ are $1_K$-admissible. In fact, let $u \in A^1(K^\wedge)$ and $a \in M^1_c(K^\wedge)$. Then by the compactness of $K^\wedge$, $(a^\vee u^\vee)^\wedge \in L^1(K^\wedge)$ and $(a^\vee u^\vee)^{\wedge\vee} = a^\vee u^\vee \in C^c(K)$. This shows (F1).
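Again by the compactness of $K^\wedge$ we have for $u \in A^1(K^\wedge)$ and $\chi_i, \chi \in K^\wedge$ $(i \ge 1)$ that
$$\big\|((\chi_i - \chi) u^\vee)^\wedge\big\|_1 \le \big\|((\chi_i - \chi) u^\vee)^\wedge\big\|_\infty \le \|(\chi_i - \chi) u^\vee\|_1 \le \big\| |\chi_i - \chi|\, u^\vee \big\|_1 \to 0$$
for $\chi_i \to \chi$ as $i \to \infty$. But this proves (F2).
3.2.6.2 All strong hypergroups $K$ are $1_K$-admissible. In fact, for $u \in A^1(K^\wedge)$ and $a \in M^1_c(K^\wedge)$ the strength of $K$ implies that $(a * u)^{\vee\wedge} = a * u \in L^1(K^\wedge)$ and $(a^\vee u^\vee)^{\wedge\vee} = a^\vee u^\vee \in C^c(K)$, hence (F1). (F2) follows from
$$\big\|((\chi_i - \chi) u^\vee)^\wedge\big\|_1 \le \|(\varepsilon_{\chi_i} - \varepsilon_\chi) * u\|_1 \to 0,$$
valid for all $u \in A^1(K^\wedge)$ and $\chi_i, \chi \in K^\wedge$ $(i \ge 1)$ with $\chi_i \to \chi$ as $i \to \infty$.
3.2.6.3 Let $K$ be a hypergroup whose $\chi_0$-modification $K^\bullet$ for some $\chi_0 \in K^{*,p}$ is strong. Then $K$ is $\chi_0$-admissible. For a proof we take $u \in A^1(K^\wedge)$ and $a \in M^1_c(K^\wedge)$ with $\operatorname{supp}(a) \subset \operatorname{supp}(\pi_K)$ and compute, with the aid of 1.2.8,
$$\Big(\frac{1}{\chi_0} a^\vee u^\vee\Big)^\wedge = \big((\phi(a))^{\vee\bullet} \chi_0 (u \circ \phi^{-1})^{\vee\bullet}\big)^\wedge = \big(\chi_0 (\phi(a) \bullet (u \circ \phi^{-1}))^{\vee\bullet}\big)^\wedge = \big(((\phi(a) \bullet (u \circ \phi^{-1})) \circ \phi)^\vee\big)^\wedge = (\phi(a) \bullet (u \circ \phi^{-1})) \circ \phi \in L^1(K^\wedge),$$
and observe that
$$\Big(\frac{1}{\chi_0} a^\vee u^\vee\Big)^{\wedge\vee} = \frac{1}{\chi_0} a^\vee u^\vee \in C^c(K).$$
But this implies (F1).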
In order to see the validity of (F2) we pick $\chi_i, \chi \in \operatorname{supp}(\pi_K)$ $(i \ge 1)$ with $\chi_i := \chi_0 \psi_i$, $\chi := \chi_0 \psi$, where $\psi_i, \psi \in \operatorname{supp}(\pi_{K^\bullet})$. Since $\frac{1}{\chi_0}(\chi_i - \chi) u^\vee$ has compact support, property 1.2.8.7 yields the equalities
$$\Big(\frac{1}{\chi_0}(\chi_i - \chi) u^\vee\Big)^\wedge(\rho) = \Big(\frac{1}{\chi_0^2}(\chi_i - \chi) u^\vee\Big)^{\wedge\bullet}(\sigma) = \big((\psi_i - \psi)(u \circ \phi^{-1})^{\vee\bullet}\big)^{\wedge\bullet}(\sigma) = \big(((\varepsilon_{\psi_i} - \varepsilon_\psi) \bullet (u \circ \phi^{-1}))^{\vee\bullet}\big)^{\wedge\bullet}(\sigma) = \big((\varepsilon_{\psi_i} - \varepsilon_\psi) \bullet (u \circ \phi^{-1})\big)(\sigma),$$
valid for all $\rho \in \operatorname{supp}(\pi_K)$ with $\rho := \chi_0 \sigma$, $\sigma \in \operatorname{supp}(\pi_{K^\bullet})$. But then
$$\Big\|\Big(\frac{1}{\chi_0}(\chi_i - \chi) u^\vee\Big)^\wedge\Big\|_1 = \big\|(\varepsilon_{\psi_i} - \varepsilon_\psi) \bullet (u \circ \phi^{-1})\big\|_1^\bullet \to 0$$
for $\psi_i \to \psi$ as $i \to \infty$.
3.2.6.4 Direct products of hypergroups occurring in the examples 3.2.6.1 to 3.2.6.3 are admissible, since their defining properties are stable under forming direct products.
3.2.6.5 Concrete examples of admissible hypergroups are:
3.2.6.5.1 Among the discrete hypergroups: the discrete Jacobi polynomial hypergroups (1.3.1.1.1) and the discrete cosine hypergroup (1.3.3.2).
3.2.6.5.2 Among the strong hypergroups: the Bessel-Kingman hypergroups (1.3.3.1), the symmetric hypergroup of noncompact type (1.3.3.3), the disk hypergroups (1.3.2.1.1), and the higher rank Bessel hypergroups (1.3.4).
3.2.6.5.3 Among the modifications of strong hypergroups: the Naimark hypergroup (1.3.3.1) and the cosh hypergroups (1.3.3.3) as modifications of the symmetric hypergroup of noncompact type.

Now we extend the discussion following Definition 3.1.6 beyond strong hypergroups.
3.2.7. Definition Let $Q$ be a fixed open, relatively compact neighborhood of $e$ with $\omega_K(Q) = 1$, and let $\chi_0 \in K^{*,p}$. By $S_F(K)$ we denote the set of all functions $f \in L^1(K)$ which admit an admissible representation as an $L^1(K)$-convergent sum of the form
$$\begin{aligned} f = {} & f_0 + \mu_1^{(1)} * f_1 + \frac{1}{\chi_0} a_{2,1}^{(1)\vee} \cdots \frac{1}{\chi_0} a_{2,n_1}^{(1)\vee} \big(\mu_2^{(1)} * f_2\big) \\ & + \mu_3^{(2)} * \Big( \frac{1}{\chi_0} a_{3,1}^{(1)\vee} \cdots \frac{1}{\chi_0} a_{3,n_2}^{(1)\vee} \big(\mu_3^{(1)} * f_3\big) \Big) \\ & + \frac{1}{\chi_0} a_{4,1}^{(2)\vee} \cdots \frac{1}{\chi_0} a_{4,n_4}^{(2)\vee} \times \Big( \mu_4^{(2)} * \Big( \frac{1}{\chi_0} a_{4,1}^{(1)\vee} \cdots \frac{1}{\chi_0} a_{4,n_3}^{(1)\vee} \big(\mu_4^{(1)} * f_4\big) \Big) \Big) + \cdots, \end{aligned} \tag{3.1}$$
where $f_i \in A_Q(K)$, $\mu_i^{(j)} \in M^1_c(K)$, $a_{i,k}^{(j)} \in M^1_c(K^\wedge)$ with $\operatorname{supp}(a_{i,k}^{(j)}) \subset \operatorname{supp}(\pi_K)$ and
$$\sum_{i \ge 0} \|f_i\|_{A(K)} < \infty$$
for all $i, j, k, n_k \in \mathbb{N}$.

3.2.8. Properties of the set $S_F(K)$
3.2.8.1 $S_F(K)$ is a Banach space with the norm
$$f \mapsto \|f\|_{S_F(K)} := \inf \sum_{i \ge 0} \|f_i\|_{A(K)},$$
where the infimum is taken over all admissible representations of $f$ of the form above.
3.2.8.2 $(S_F(K), \|\cdot\|_{S_F(K)})$ is continuously embedded into $(L^1(K), \|\cdot\|_1)$.
If in addition $K$ satisfies condition (F1) of 3.2.5, then
3.2.8.3 $(S_F(K), \|\cdot\|_{S_F(K)})$ is continuously embedded into $(C^0(K), \|\cdot\|_\infty)$ and $(A(K), \|\cdot\|_{A(K)})$.

In order to indicate the proof of the first of these embeddings we pick $f \in S_F(K)$ with an admissible representation of the above form with $f_i = u_i^\vee$,
$u_i \in L^1(K^\wedge)$ $(i \in \mathbb{Z}_+)$, and observe that condition (F1) of 3.2.5 implies
$$\begin{aligned} f &= u_0^\vee + \mu_1^{(1)} * u_1^\vee + \frac{1}{\chi_0} a_{2,1}^{(1)\vee} \cdots \frac{1}{\chi_0} a_{2,n_1}^{(1)\vee} \big(\mu_2^{(1)} * u_2^\vee\big) + \cdots \\ &= u_0^\vee + \big(\mu_1^{(1)\wedge} u_1\big)^\vee + \frac{1}{\chi_0} a_{2,1}^{(1)\vee} \cdots \frac{1}{\chi_0} a_{2,n_1 - 1}^{(1)\vee}\, k^{\wedge\vee} + \cdots, \end{aligned}$$
where
$$k := \frac{1}{\chi_0} a_{2,n_1}^{(1)\vee} \big(\mu_2^{(1)\wedge} u_2\big)^\vee$$
and $k^\wedge \in A^1(K^\wedge)$. By repeating this procedure one obtains an $h \in L^1(K)$ with $h^\wedge \in A^1(K^\wedge)$ such that
$$f = u_0^\vee + \big(\mu_1^{(1)\wedge} u_1\big)^\vee + (h^\wedge)^\vee + \cdots = \big(u_0 + \mu_1^{(1)\wedge} u_1 + h^\wedge + \cdots\big)^\vee.$$
Hence $f \in C^0(K)$ and
$$\|f\|_\infty \le \sum_{i \ge 0} \|f_i\|_\infty \le \sum_{i \ge 0} \|f_i\|_{A(K)},$$
which provides the desired embedding.

3.2.9. Theorem Let $K$ be a $\chi_0$-admissible hypergroup for some $\chi_0 \in K^{*,p}$. Then the space $(S_F(K), \|\cdot\|_{S_F(K)})$ is the $\chi_0$-Feichtinger algebra on $K$.

Proof. Our task is to verify the axioms (S1) to (S4), (S5•) and (S6•) (the dot referring to $\chi_0$-character invariance), and to prove the minimality of the Banach algebra $(S_F(K), \|\cdot\|_{S_F(K)})$ within the class of all $\chi_0$-character invariant Segal algebras on $K$.

(S1) follows from the fact that $S_F(K)$ is an ideal in $L^1(K)$, together with the inequality $\|h * f\|_{S_F(K)} \le \|h\|_1 \|f\|_{S_F(K)}$ valid for all $f \in S_F(K)$ and $h \in L^1(K)$. Since $S_F(K)$ admits an approximate unit $(f_U)_{U \in \mathcal{U}}$ for $L^1(K)$ with $f_U \in A_Q(K) \subset S_F(K)$ by 3.2.3.1 and 3.2.3.3, the inclusion $C^c(K) * (f_U)_{U \in \mathcal{U}} \subset S_F(K)$ implies (S2).
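(S3) results from $\varepsilon_x * f \in S_F(K)$ and
$$\|\varepsilon_x * f\|_{S_F(K)} \le \sum_{i \ge 0} \|f_i\|_{A(K)}$$
for all $x \in K$ and for each admissible representation of $f \in S_F(K)$. In order to show (S4) we note that for $f \in S_F(K)$ and $h \in L^1(K^\wedge)$, $\frac{1}{\chi_0} h^\vee f \in S_F(K)$ with
$$\Big\| \frac{1}{\chi_0} h^\vee f \Big\|_{S_F(K)} \le 4 \|h\|_1 \|f\|_{S_F(K)}. \tag{3.2}$$
Now let $f \in S_F(K)$ with an admissible representation in terms of $f_i = u_i^\vee$, $u_i \in L^1(K^\wedge)$ $(i \ge 0)$. For $\varepsilon > 0$ there exists an $m \ge 1$ with
$$\sum_{i \ge m} \|f_i\|_{A(K)} < \frac{\varepsilon}{2}.$$
Let $F_m$ be the $m$-th partial sum of the admissible representation of $f$, ending with a summand involving $f_m$, for a properly chosen sequence $(n_i)_{i \ge 1}$ in $\mathbb{N}$. Then, for $x_i, x \in K$ such that $x_i \to x$ as $i \to \infty$, we obtain
$$\|\varepsilon_{x_i} * f - \varepsilon_x * f\|_{S_F(K)} \le \|(\varepsilon_{x_i} - \varepsilon_x) * F_m\|_{S_F(K)} + \varepsilon.$$
Without loss of generality we assume that $\operatorname{supp}(\varepsilon_{x_i} * f)$ $(i \ge 1)$ and $\operatorname{supp}(\varepsilon_x * f)$ are contained in a compact subset $W$ of $K$. Since $A_Q(K) \subset S_F(K)$, supplement 3.2.3.2 provides a function $h \in S_F(K)$ with $\mathrm{Re}_W h = 1$. But then
$$\|\varepsilon_{x_i} * f - \varepsilon_x * f\|_{S_F(K)} \le 4 \big\|((\varepsilon_{x_i} - \varepsilon_x) * f)^\wedge\big\|_1 \|h\|_{S_F(K)}$$
with the aid of (3.2), and the uniform convergence $((\varepsilon_{x_i} - \varepsilon_x) * f)^\wedge \to 0$ implies the continuity of the mapping $x \mapsto \varepsilon_x * f$ for all $f \in S_F(K)$.

(S5•) can be shown as follows. Let $f \in S_F(K)$ with an admissible representation of the form (3.1). For $\chi \in \operatorname{supp}(\pi_K)$
$$\frac{1}{\chi_0} \chi f = \frac{1}{\chi_0} \varepsilon_\chi^\vee f_0 + \frac{1}{\chi_0} \varepsilon_\chi^\vee \big(\mu_1^{(1)} * f_1\big) + \cdots$$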
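is an admissible representation of $\frac{1}{\chi_0}\chi f$, hence $\frac{1}{\chi_0}\chi f \in S_F(K)$. Moreover,
$$\Big\|\frac{1}{\chi_0}\chi f\Big\|_{S_F(K)} \le \sum_{i \ge 0} \|f_i\|_{A(K)},$$
hence $\big\|\frac{1}{\chi_0}\chi f\big\|_{S_F(K)} \le \|f\|_{S_F(K)}$.

As for the proof of (S6•) we consider $f \in S_F(K)$, $\chi_i, \chi \in \operatorname{supp}(\pi_K)$ $(i \ge 1)$ with $\chi_i \to \chi$ as $i \to \infty$. Again we assume that $f$ has a compact support $W$ and that there exists a function $h \in S_F(K)$ with $\mathrm{Res}_W h = 1$. Now (2) implies
$$\Big\|\frac{1}{\chi_0}\chi_i f - \frac{1}{\chi_0}\chi f\Big\|_{S_F(K)} \le 4\,\Big\|\Big(\frac{1}{\chi_0}(\chi_i - \chi) f\Big)^{\wedge}\Big\|_1 \|h\|_{S_F(K)},$$
and applying (F2) we obtain
$$\Big\|\frac{1}{\chi_0}(\chi_i - \chi) f\Big\|_{S_F(K)} \to 0$$
as $i \to \infty$. This shows that the mapping $\chi \mapsto \frac{1}{\chi_0}\chi f$ from $\operatorname{supp}(\pi_K)$ into $S_F(K)$ is continuous.

It remains to be shown that $S_F(K)$ can be continuously embedded into any other $\chi_0$-character invariant Segal algebra $S(K)$ on $K$. For this let $f \in S_F(K)$ with an admissible representation whose $m$-th partial sum $F_m$ ends with a summand involving $f_m$. Then $F_m \in C_c(K) \cap A(K)$ by 3.2.8.3, since (F1) is fulfilled, and $F_m \in S(K)$ by 3.2.3.3. From 3.2.2 and supplement 3.2.3.3 we deduce the inequalities
$$\|F_m\|_{S(K)} \le \sum_{i=0}^{m} \|f_i\|_{S(K)} \le c_Q \sum_{i=0}^{m} \|f_i\|_{A(K)}. \qquad (3)$$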
Thus, since $\sum_{i \ge 0}\|f_i\|_{A(K)} < \infty$, $(F_m)_{m \in \mathbb{N}}$ is a Cauchy sequence in $S(K)$, hence $f \in S(K)$, and the inclusion $S_F(K) \subset S(K)$ has been established. The inequality $\|f\|_{S(K)} \le c\,\|f\|_{S_F(K)}$ for all $f \in S_F(K)$ and a constant $c > 0$ is clear in view of the inequalities (3), and the desired continuous embedding of $S_F(K)$ has been achieved.
3.2.10. Corollary. The $\chi_0$-Feichtinger algebra $S_F(K)$ on $K$ is $\tau_{co}$-dense in $C_0(K)$.

Proof. We recall that $A(K) \cap C_c(K)$ is $\tau_{co}$-dense in $C_0(K)$. From the theorem together with supplement 3.2.3.3 and property 3.2.8.3 we therefore obtain
$$A(K) \cap C_c(K) \subset S_F(K),$$
and this implies the assertion.

3.2.11. Theorem. For Pontryagin hypergroups $K$,
$$S_F(K)^{\wedge} = S_F(K^{\wedge}),$$
the equality being understood as an equality of topological algebras, where $S_F(K)^{\wedge}$ carries the topology induced by the norm $\hat f \mapsto \|\hat f\|_{S_F(K)^{\wedge}} := \|f\|_{S_F(K)}$.

Proof. By the minimality of the Feichtinger algebra $S_F(K)$ (Theorem 3.2.9) it suffices to show that $S_F(K)^{\wedge}$ is a character invariant Segal algebra on $K^{\wedge}$. Since $S_F(K) \subset A(K)$ by property 3.2.8.3, $S_F(K)^{\wedge}$ is a closed subspace of $L^1(K^{\wedge})$. But for all $f, g \in A(K) \cap C_c(K)$,
$$f^{\wedge} * g^{\wedge} = (fg)^{\wedge},$$
which implies via 3.2.2 that $f^{\wedge} * g^{\wedge} \in S_F(K)^{\wedge}$ and
$$\|f^{\wedge} * g^{\wedge}\|_{S_F(K)^{\wedge}} \le \|f\|_{A(K)}\|g\|_{S_F(K)} \le \|f\|_{S_F(K)}\|g\|_{S_F(K)}.$$
For strong hypergroups $K$ we have by supplement 3.2.3.4 that $A(K) \cap C_c(K)$ is a dense subalgebra of any character invariant Segal algebra on $K$, in particular of $S_F(K)$. Thus $(S_F(K)^{\wedge}, \|\cdot\|_{S_F(K)^{\wedge}})$ is a Banach algebra and (S1) is satisfied. Again the strength of $K$ allows us to claim that every $f \in L^1(K)$ with compact $\operatorname{supp}(\hat f)$ belongs to $S_F(K)$. Since $A(K^{\wedge}) \cap C_c(K^{\wedge})$ is dense in $L^1(K^{\wedge})$, we obtain (S2).
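The identity $f^{\wedge} * g^{\wedge} = (fg)^{\wedge}$ used in this proof can be checked numerically in the simplest Pontryagin case, the finite cyclic group $\mathbb{Z}_n$. The sketch below is only an illustration under stated assumptions: counting measure on $\mathbb{Z}_n$, NumPy's transform $\hat f(k) = \sum_x f(x)e^{-2\pi i kx/n}$, and Haar measure of mass $1/n$ per point on the dual group (the Plancherel normalization). The helper `dual_convolution` is a hypothetical name introduced here, not notation from the chapter.

```python
import numpy as np

def dual_convolution(u, v):
    """Hypothetical helper: convolution on the dual group Z_n, whose Haar
    measure gives mass 1/n to each point (Plancherel normalization)."""
    n = len(u)
    return np.array([sum(u[j] * v[(k - j) % n] for j in range(n)) for k in range(n)]) / n

rng = np.random.default_rng(0)
n = 8
f = rng.standard_normal(n) + 1j * rng.standard_normal(n)
g = rng.standard_normal(n) + 1j * rng.standard_normal(n)

lhs = dual_convolution(np.fft.fft(f), np.fft.fft(g))  # f^ * g^
rhs = np.fft.fft(f * g)                                # (fg)^
print(np.allclose(lhs, rhs))  # True
```

With a different normalization of the dual Haar measure the factor $1/n$ moves elsewhere, which is why the normalization is fixed explicitly above.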
(S3) and (S4) are obviously true, since for strong hypergroups $K$, $\varepsilon_\chi * f^{\wedge} = (\chi f)^{\wedge}$ whenever $\chi \in K^{\wedge}$ and $f \in S_F(K)$. Now the Pontryagin property of $K$ enters. In this case every character $\rho \in K^{\wedge\wedge}$ is of the form $\rho = \tilde x$ with $\tilde x(\chi) := \chi(x)$ for all $\chi \in K^{\wedge}$. This implies
$$\rho f^{\wedge} = (\varepsilon_x * f)^{\wedge},$$
hence the validity of (S5) and (S6).
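For $K = \mathbb{Z}_n$ the relation $\rho f^{\wedge} = (\varepsilon_x * f)^{\wedge}$ says that translating $f$ multiplies its Fourier transform by a character of the dual group. The following minimal check assumes $(\varepsilon_x * f)(y) = f(y - x)$ and NumPy's sign convention for the transform; with that convention the multiplier is $k \mapsto \overline{\chi_k(x)} = e^{-2\pi i kx/n}$, while with the opposite sign convention it is $\chi_k(x)$ itself, as in $\tilde x(\chi) = \chi(x)$ above.

```python
import numpy as np

n, x = 12, 5
rng = np.random.default_rng(1)
f = rng.standard_normal(n)

# (eps_x * f)(y) = f(y - x): np.roll moves the entry at y to position y + x (mod n).
translated = np.roll(f, x)

k = np.arange(n)
rho = np.exp(-2j * np.pi * k * x / n)  # character of the dual group determined by x

print(np.allclose(np.fft.fft(translated), rho * np.fft.fft(f)))  # True
```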
An important tool for introducing the covariance distribution of generalized random fields over admissible hypergroups, to be discussed in the subsequent section, is the projective tensor product of Banach algebras. We recall that, given two Banach algebras $(E, \|\cdot\|_E)$ and $(F, \|\cdot\|_F)$, their projective tensor product is defined to be
$$E \hat{\otimes} F := \Big\{ f = \sum_{i \ge 0} f_i \otimes g_i : f_i \in E,\ g_i \in F\ (i \ge 0),\ \sum_{i \ge 0} \|f_i\|_E \|g_i\|_F < \infty \Big\}.$$
Together with the norm
$$f \mapsto \|f\|_{\hat{\otimes}} := \inf\Big\{ \sum_{i \ge 0} \|f_i\|_E \|g_i\|_F : f = \sum_{i \ge 0} f_i \otimes g_i \in E \hat{\otimes} F \Big\}$$
the space $(E \hat{\otimes} F, \|\cdot\|_{\hat{\otimes}})$ becomes a Banach algebra. An immediate application of this notion to Segal algebras $S(K)$ and $S(L)$ on hypergroups $K$ and $L$ respectively yields a Segal algebra
$S(K) \hat{\otimes} S(L)$ on $K \times L$. The following is easily proved.

3.2.12. Lemma. Let $K$ and $L$ denote hypergroups with $\chi_0^{(1)}$- and $\chi_0^{(2)}$-modifications $K^\bullet$ and $L^\bullet$ for $\chi_0^{(1)} \in K^{*,p}$ and $\chi_0^{(2)} \in L^{*,p}$, respectively. If $S(K)$ and $S(L)$ are $\chi_0^{(1)}$- and $\chi_0^{(2)}$-character invariant Segal algebras on $K$ and $L$ respectively, then $S(K) \hat{\otimes} S(L)$ is a $\chi_0$-character invariant Segal algebra on $K \times L$ for $\chi_0 := \chi_0^{(1)} \otimes \chi_0^{(2)} \in (K \times L)^{*,p}$.

For a proof of this lemma only axioms (S5•) and (S6•) have to be verified, and this is done similarly to the respective task in the proof of Theorem 3.2.9. The subsequent functorial property of the Feichtinger algebra is due to M. Leitner [37] for strong hypergroups and to H.-J. Neu [39] for the general case.

3.2.13. Theorem. Let $K$ and $L$ be $\chi_0^{(1)}$- and $\chi_0^{(2)}$-admissible hypergroups for $\chi_0^{(1)} \in K^{*,p}$ and $\chi_0^{(2)} \in L^{*,p}$, respectively. Then the $\chi_0$-Feichtinger algebra $S_F(K \times L)$ on $K \times L$ for $\chi_0 := \chi_0^{(1)} \otimes \chi_0^{(2)} \in (K \times L)^{*,p}$ can be written as
$$S_F(K \times L) = S_F(K) \hat{\otimes} S_F(L),$$
where the norms $\|\cdot\|_{S_F(K \times L)}$ and $\|\cdot\|_{\hat{\otimes}}$ are equivalent, and the spaces $S_F(K \times L)$ and $S_F(K) \hat{\otimes} S_F(L)$ are equal as topological algebras.

Proof. At first one observes that $K \times L$ is a $\chi_0$-admissible hypergroup. Theorem 3.2.9 together with Lemma 3.2.12 implies that
$$S_F(K \times L) \subset S_F(K) \hat{\otimes} S_F(L)$$
and that
$$\|f\|_{\hat{\otimes}} \le c\,\|f\|_{S_F(K \times L)}$$
for all $f \in S_F(K \times L)$, with a constant $c > 0$. For the inverse inclusion one has to show that every $f \in S_F(K) \hat{\otimes} S_F(L)$ has an admissible representation. Let
$$f = \sum_{i \ge 0} f_i \otimes g_i \in S_F(K) \hat{\otimes} S_F(L)$$
with $f_i \in S_F(K)$, $g_i \in S_F(L)$, satisfying $\sum_{i \ge 0} \|f_i\|_{S_F(K)} \|g_i\|_{S_F(L)} < \infty$.
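Next we look at admissible representations of $f_i$ and $g_i$ with choices $f_{i,k} \in A_{Q_1}(K)$ and $g_{i,l} \in A_{Q_2}(L)$, respectively $(i, k, l \ge 0)$. Here $f_{i,k}$ and $g_{i,l}$ have to be chosen such that
$$\sum_{i \ge 0}\sum_{k \ge 0}\sum_{l \ge 0} \|f_{i,k}\|_{A(K)} \|g_{i,l}\|_{A(L)} < \infty.$$
An explicit calculation involving properties 1.2.8 of
$$f(x, y) = \sum_{i \ge 0} (f_i \otimes g_i)(x, y) = \sum_{i \ge 0} f_i(x) g_i(y)$$
for $(x, y) \in K \times L$, in terms of the given admissible representations of $f_i$ and $g_i$ $(i \ge 0)$, yields
$$\sum_{i \ge 0}\sum_{k \ge 0}\sum_{l \ge 0} \|f_{i,k} \otimes g_{i,l}\|_{A(K \times L)} = \sum_{i \ge 0}\sum_{k \ge 0}\sum_{l \ge 0} \|f_{i,k}\|_{A(K)} \|g_{i,l}\|_{A(L)} < \infty,$$
and shows that the calculated representation of $f$ is admissible. Finally, let
$$f = \sum_{i \ge 0} f_i \otimes g_i \in S_F(K) \hat{\otimes} S_F(L)$$
with $f_i, g_i$ $(i \ge 0)$ as above. For the $m$-th partial sum
$$F_m := \sum_{i=0}^{m} f_i \otimes g_i$$
we obtain
$$\|F_m\|_{S_F(K \times L)} \le \sum_{i=0}^{m}\sum_{k \ge 0}\sum_{l \ge 0} \|f_{i,k} \otimes g_{i,l}\|_{A(K \times L)} = \sum_{i=0}^{m}\sum_{k \ge 0}\sum_{l \ge 0} \|f_{i,k}\|_{A(K)} \|g_{i,l}\|_{A(L)},$$
hence
$$\|F_m\|_{S_F(K \times L)} \le \sum_{i=0}^{m} \|f_i\|_{S_F(K)} \|g_i\|_{S_F(L)},$$
and consequently $\|f\|_{S_F(K \times L)} \le \|f\|_{\hat{\otimes}}$. This finishes the proof.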
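Theorem 3.2.13 identifies a projective tensor product with a space of functions on the product. Its simplest prototype, which we add only as an illustration and which is not part of the chapter, is the isometric identity $\ell^1(A) \hat{\otimes} \ell^1(B) = \ell^1(A \times B)$ for finite sets $A, B$. In the sketch an element of the tensor product is stored as a matrix; the representation by point masses attains $\sum_{a,b}|M_{ab}|$, and by the triangle inequality every other finite representation $M = \sum_i u_i \otimes v_i$ costs at least that much, so the infimum defining $\|\cdot\|_{\hat{\otimes}}$ is the $\ell^1$ norm. The helper `representation_cost` is a hypothetical name.

```python
import numpy as np

def representation_cost(us, vs):
    """Hypothetical helper: sum_i ||u_i||_1 ||v_i||_1 for M = sum_i u_i (x) v_i."""
    return sum(np.abs(u).sum() * np.abs(v).sum() for u, v in zip(us, vs))

rng = np.random.default_rng(2)
m, n, r = 4, 5, 3
us = [rng.standard_normal(m) for _ in range(r)]
vs = [rng.standard_normal(n) for _ in range(r)]
M = sum(np.outer(u, v) for u, v in zip(us, vs))   # the tensor as a function on A x B

# Representation by point masses: M = sum_{a,b} M[a, b] e_a (x) e_b.
point_us = [M[a, b] * np.eye(m)[a] for a in range(m) for b in range(n)]
point_vs = [np.eye(n)[b] for a in range(m) for b in range(n)]

l1_norm = np.abs(M).sum()                          # norm in l^1(A x B)
print(np.isclose(representation_cost(point_us, point_vs), l1_norm))   # True
print(representation_cost(us, vs) >= l1_norm - 1e-12)                 # True
```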
3.3. Covariance and duality

In the remaining discussion we shall always deal with a given admissible hypergroup $K$ on which the Feichtinger algebra $S_F(K)$ has been constructed. The dual space $S_F(K)'$ of $S_F(K)$ will be considered as the space of distributions on $K$. Since $S_F(K)$ is continuously embedded into $C_0(K)$ by property 3.2.8.3, one has $L^1(K) \subset M^b(K) \subset S_F(K)'$. Distributions $l \in S_F(K)'$, measures $\mu \in M^b(K)$ and functions $g \in L^1(K)$ can be identified if
$$l(f) = \int_K f \, d\mu = \int_K f g \, d\omega_K$$
for all $f \in S_F(K)$. We shall apply the notation $\langle \cdot, f \rangle$ for these three types of evaluations at $f$.

A first systematic use of the Feichtinger algebra in the theory of stochastic processes was made by W. Hörmann in his thesis [23]. There is an unpublished paper [13] by Feichtinger and Hörmann which develops a theory of generalized random fields for Feichtinger algebras on a locally compact Abelian group. A substantial part of this theory can be extended to commutative hypergroups.

3.3.1. Definition. Generalized random fields over $K$ are continuous linear mappings $X$ from $S_F(K)$ into the Hilbert space $L^2 := L^2(\Omega, \mathcal{F}, P)$. The mappings
$$f \mapsto E_X(f) := \int_\Omega X(f)\, dP$$
and
$$(f, g) \mapsto \langle \mathrm{Cov}_X, f \otimes g \rangle := \langle X(f) - E_X(f), X(\bar g) - E_X(\bar g) \rangle_{L^2}$$
on $S_F(K)$ and $S_F(K) \times S_F(K)$ are called the expectation functional and the covariance kernel of $X$, respectively. While it is obvious that $E_X \in S_F(K)'$, we need to show that
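3.3.2. Lemma. $\mathrm{Cov}_X$ admits a linear extension to an element of
$$(S_F(K) \hat{\otimes} S_F(K))' = S_F(K \times K)'.$$

Proof. We restrict our argument to generalized random fields $X$ from $S_F(K)$ into $L^2_0$. At first one extends $\mathrm{Cov}_X$ to the space $S_F(K) \otimes S_F(K) := \mathrm{sp}(\{f \otimes g : f, g \in S_F(K)\})$ furnished with the norm
$$h \mapsto \|h\|_{\otimes} := \inf\Big\{ \sum_{n=1}^{N} \|f_n\|_{S_F(K)} \|g_n\|_{S_F(K)} : h = \sum_{n=1}^{N} f_n \otimes g_n \ (N \in \mathbb{N}) \Big\}.$$
Next we define
$$\langle \mathrm{Cov}_X, h \rangle := \sum_{n=1}^{N} \langle X(f_n), X(\bar g_n) \rangle_{L^2}$$
for all
$$h = \sum_{n=1}^{N} f_n \otimes g_n \in S_F(K) \otimes S_F(K).$$
From the construction of general tensor products it follows that $\mathrm{Cov}_X$ is well-defined. It remains to be shown that $\mathrm{Cov}_X$ is bounded. Let $h \in S_F(K) \otimes S_F(K)$. By the Cauchy-Schwarz inequality we obtain
$$\begin{aligned} |\langle \mathrm{Cov}_X, h \rangle| &= \Big| \Big\langle \mathrm{Cov}_X, \sum_{n=1}^{N} f_n \otimes g_n \Big\rangle \Big| = \Big| \sum_{n=1}^{N} \langle X(f_n), X(\bar g_n) \rangle_{L^2} \Big| \\ &\le \sum_{n=1}^{N} \|X\|_{op} \|f_n\|_{S_F(K)}\, \|X\|_{op} \|g_n\|_{S_F(K)} = \|X\|_{op}^2 \sum_{n=1}^{N} \|f_n\|_{S_F(K)} \|g_n\|_{S_F(K)}, \end{aligned}$$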
hence that
$$|\langle \mathrm{Cov}_X, h \rangle| \le \|X\|_{op}^2 \inf\Big\{ \sum_{n=1}^{N} \|f_n\|_{S_F(K)} \|g_n\|_{S_F(K)} : h = \sum_{n=1}^{N} f_n \otimes g_n \Big\} = \|X\|_{op}^2 \|h\|_{\otimes}.$$
As a bounded linear functional on $S_F(K) \otimes S_F(K)$, $\mathrm{Cov}_X$ can be extended uniquely to the completion $S_F(K) \hat{\otimes} S_F(K)$ of $S_F(K) \otimes S_F(K)$, and by Theorem 3.2.13, $\mathrm{Cov}_X \in S_F(K \times K)'$.

3.3.3. Notation. With the knowledge of Lemma 3.3.2, $E_X$ and $\mathrm{Cov}_X$ are well named as the expectation distribution and the covariance distribution of the generalized random field $X$ over $K$, respectively.

In what follows we shall compare the notions of generalized random fields and classical random fields over admissible hypergroups, where the adjective "classical" means bounded continuous second order random fields as introduced in Subsection 2.1. Similar to the approach in Section 2 we shall exclusively deal with centered generalized random fields $X$ in the sense that $E_X(f) = 0$ for all $f \in S_F(K)$. Given a centered generalized random field $X$ over $K$, one says that the covariance distribution $\mathrm{Cov}_X$ of $X$ is represented by a function $h \in C^b(K \times K)$ provided
$$\langle \mathrm{Cov}_X, f \otimes g \rangle = \langle h, f \otimes g \rangle$$
for all $f, g \in S_F(K)$.

3.3.4. Theorem. Let $X_1$ be a classical random field over $K$ with covariance kernel $\rho_{X_1}$. Then
(i) $\rho_{X_1} \in C^b(K \times K)$,
(ii) $X_1$ can be extended uniquely to a bounded linear mapping $\tilde X : M^b(K) \to L^2_0$, and
(iii) $X := \mathrm{Res}_{S_F(K)} \tilde X$ is a generalized random field over $K$ such that $\mathrm{Cov}_X$ is represented by $\rho_{X_1}$.
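Proof. Property (i) follows directly from the definitions. In order to see (ii) and (iii) we consider
$$\langle \tilde X(\mu), l \rangle_{L^2} := \int_K \langle X_1(y), l \rangle_{L^2}\, \mu(dy)$$
for all $\mu \in M^b(K)$, $l \in L^2_0$. Since the mapping
$$l \mapsto \int_K \langle X_1(y), l \rangle_{L^2}\, \mu(dy)$$
on $L^2_0$ is linear and bounded, $\tilde X(\mu)$ is a well-defined element of $L^2_0$, and the bounded linear mapping $\tilde X : M^b(K) \to L^2_0$ is an extension to $M^b(K)$ of $X_1$. But then $X := \mathrm{Res}_{S_F(K)} \tilde X$ defines a generalized random field over $K$. From
$$\langle X(f), l \rangle = \int_K \langle X_1(y), l \rangle f(y)\, \omega_K(dy)$$
follows the remaining equality
$$\begin{aligned} \langle \mathrm{Cov}_X, f \otimes g \rangle &= \langle X(f), X(\bar g) \rangle_{L^2} = \int_K \langle X_1(y), X(\bar g) \rangle_{L^2} f(y)\, \omega_K(dy) \\ &= \int_K \int_K \langle X_1(y), X_1(z) \rangle_{L^2} f(y) g(z)\, \omega_K(dy)\, \omega_K(dz) = \langle \rho_{X_1}, f \otimes g \rangle \end{aligned}$$
whenever $l \in L^2_0$, $f, g \in S_F(K)$.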
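The identities $\langle \mathrm{Cov}_X, f \otimes g \rangle = \langle X(f), X(\bar g) \rangle_{L^2} = \langle \rho_{X_1}, f \otimes g \rangle$ of Theorem 3.3.4 and the bound $|\langle \mathrm{Cov}_X, h \rangle| \le \|X\|_{op}^2 \|h\|_{\otimes}$ of Lemma 3.3.2 can be watched in a finite toy model. The following are our simplifying assumptions, not the chapter's setting: $K$ is replaced by a finite set with counting measure, $\Omega$ by a finite probability space with weights $p_\omega$, test functions carry the Euclidean norm instead of the $S_F(K)$-norm, and the classical field $X_1$ is a matrix `A` whose column $y$ is the random variable $X_1(y)$. The names `inner`, `X` and `cov` are hypothetical helpers.

```python
import numpy as np

rng = np.random.default_rng(3)
n_omega, n_pts = 6, 4
p = rng.random(n_omega); p /= p.sum()          # weights of the finite probability space
A = rng.standard_normal((n_omega, n_pts)) + 1j * rng.standard_normal((n_omega, n_pts))
A -= p @ A                                     # center every X_1(y), so X maps into L^2_0

def inner(u, v):
    """<u, v>_{L^2} = E[u conj(v)] on the finite probability space."""
    return np.sum(p * u * np.conj(v))

def X(f):
    """Field induced by X_1 (hypothetical helper): X(f) = sum_y f(y) X_1(y)."""
    return A @ f

rho = A.T @ np.diag(p) @ np.conj(A)            # rho(y, z) = <X_1(y), X_1(z)>_{L^2}

def cov(f, g):
    return inner(X(f), X(np.conj(g)))          # <Cov_X, f (x) g>

f = rng.standard_normal(n_pts) + 1j * rng.standard_normal(n_pts)
g = rng.standard_normal(n_pts) + 1j * rng.standard_normal(n_pts)
print(np.isclose(cov(f, g), f @ rho @ g))      # True: Cov_X is represented by rho

# Bound of Lemma 3.3.2 for h = sum_n f_n (x) g_n, with Euclidean norms on test functions.
op_norm = np.linalg.norm(np.sqrt(p)[:, None] * A, 2)   # operator norm of X into L^2
fs = [rng.standard_normal(n_pts) for _ in range(3)]
gs = [rng.standard_normal(n_pts) for _ in range(3)]
lhs = abs(sum(cov(fn, gn) for fn, gn in zip(fs, gs)))
rhs = op_norm**2 * sum(np.linalg.norm(fn) * np.linalg.norm(gn) for fn, gn in zip(fs, gs))
print(lhs <= rhs + 1e-12)                      # True
```

The bilinear pairing `f @ rho @ g` (no conjugation) mirrors $\langle \rho_{X_1}, f \otimes g \rangle$; the conjugation is absorbed by evaluating $X$ at $\bar g$.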
3.3.5. Theorem. Let $X$ be a generalized random field over $K$ such that its covariance distribution $\mathrm{Cov}_X$ is represented by a function $h \in C^b(K \times K)$, and such that the set $\{X(f) : f \in S_F(K)\}$ is dense in $L^2_0$. Then
(i) $X$ extends uniquely to a $\tau_w$-$\|\cdot\|_2$-continuous linear operator $X_0$ from $M^b(K)$ into $L^2_0$,
(ii) the mapping $\tilde X : K \to L^2_0$ given by
$$\tilde X(y) := X_0(\varepsilon_y)$$
for all $y \in K$ defines a classical random field $\tilde X$ over $K$, and
(iii) $\rho_{\tilde X} = h$.

Proof. Under the assumption on $X$ one observes that for a net $(f_\alpha)_{\alpha \in A}$ in $S_F(K)$ such that $(f_\alpha \omega_K)_{\alpha \in A}$ $\tau_w$-converges to some $\mu \in M^b(K)$, $(X(f_\alpha))_{\alpha \in A}$ is a Cauchy net in $L^2_0$. From the denseness of $\{X(f) : f \in S_F(K)\}$ in $L^2_0$ we obtain a uniquely determined element $X_0(\mu)$ of $L^2_0$ satisfying
$$\langle X_0(\mu), X(g) \rangle_{L^2} = \lim_\alpha \langle X(f_\alpha), X(g) \rangle_{L^2}$$
for all $g \in S_F(K)$. Clearly, the mapping $X_0 : M^b(K) \to L^2_0$ is linear. As for the $\tau_w$-$\|\cdot\|_2$-continuity of $X_0$ we take a net $(\mu_\beta)_{\beta \in B}$ in $M^b(K)$ with
$$\tau_w\text{-}\lim_\beta \mu_\beta =: \mu \in M^b(K).$$
Since the space of all $\omega_K$-continuous bounded measures on $K$ is $\tau_w$-dense in $M^b(K)$, there exists a net $(f_{\beta\alpha})_{\beta \in B, \alpha \in A}$ in $S_F(K)$ such that
$$\tau_w\text{-}\lim_\alpha f_{\beta\alpha} \cdot \omega_K = \mu_\beta$$
for all $\beta \in B$. Cantor's diagonal procedure yields the existence of a net $(f_\gamma)_{\gamma \in G}$ in $S_F(K)$ with
$$\tau_w\text{-}\lim_\gamma f_\gamma \cdot \omega_K = \mu.$$
By the observation at the beginning of the proof, $(X(f_\gamma))_{\gamma \in G}$ is a Cauchy net in $L^2_0$ with limit $X_0(\mu)$, and by the construction of $(f_\gamma)_{\gamma \in G}$ we get
$$\lim_\beta X_0(\mu_\beta) = X_0(\mu),$$
hence the desired continuity of $X_0$. Now we define the mapping $\tilde X$ on $K$ by
$$\tilde X(y) := X_0(\varepsilon_y)$$
for all $y \in K$. Boundedness and continuity of $\tilde X$ follow from the respective properties of $X_0$. Thus $\tilde X$ is a classical random field over $K$.
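In order to obtain (iii) we just compute
$$\begin{aligned} h(y, z) &= \int_{K \times K} h(t)\, (\varepsilon_y \otimes \varepsilon_z)(dt) = \int_{K \times K} h(t)\, ((\varepsilon_y * \varepsilon_e) \otimes (\varepsilon_z * \varepsilon_e))(dt) \\ &= \lim_\alpha \int_{K \times K} h(t)\, ((\varepsilon_y * f_\alpha \cdot \omega_K) \otimes (\varepsilon_z * f_\alpha \cdot \omega_K))(dt) \\ &= \lim_\alpha \int_{K \times K} h(t)\, ((\varepsilon_y * f_\alpha) \otimes (\varepsilon_z * f_\alpha))(t)\, (\omega_K \otimes \omega_K)(dt) \\ &= \lim_\alpha \langle h, (\varepsilon_y * f_\alpha) \otimes (\varepsilon_z * f_\alpha) \rangle = \lim_\alpha \langle X(\varepsilon_y * f_\alpha), X(\varepsilon_z * f_\alpha) \rangle_{L^2} \\ &= \langle X_0(\varepsilon_y), X_0(\varepsilon_z) \rangle_{L^2} = \langle \tilde X(y), \tilde X(z) \rangle_{L^2} = \rho_{\tilde X}(y, z) \end{aligned}$$
for all $y, z \in K$, where we applied a net $(f_\alpha)_{\alpha \in A}$ in $S_F(K)$ with
$$\tau_w\text{-}\lim_\alpha f_\alpha \cdot \omega_K = \varepsilon_e.$$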
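The first steps of this computation, $h(y, z) = \lim_\alpha \langle h, (\varepsilon_y * f_\alpha) \otimes (\varepsilon_z * f_\alpha) \rangle$ for an approximate unit $(f_\alpha)$, can be observed numerically on the group $K = \mathbb{R}$. In this sketch the approximate unit consists of Gaussian densities of width $\sigma \to 0$, the kernel $h$ is an arbitrarily chosen bounded continuous function, and integrals are replaced by normalized sums on a grid; all of these are illustrative assumptions, and `smoothed_value` is a hypothetical helper name.

```python
import numpy as np

def h(s, t):
    """An arbitrarily chosen bounded continuous kernel on R x R."""
    return np.cos(s) * np.exp(-(s - t) ** 2)

grid = np.linspace(-6.0, 6.0, 1201)
S, T = np.meshgrid(grid, grid, indexing="ij")
H = h(S, T)

def smoothed_value(y, z, sigma):
    """Hypothetical helper: <h, (eps_y * f_sigma) (x) (eps_z * f_sigma)> with a Gaussian
    f_sigma, where (eps_y * f_sigma)(s) = f_sigma(s - y); weights have total mass 1."""
    wy = np.exp(-((grid - y) ** 2) / (2 * sigma**2)); wy /= wy.sum()
    wz = np.exp(-((grid - z) ** 2) / (2 * sigma**2)); wz /= wz.sum()
    return wy @ H @ wz

y, z = 0.3, -0.2
for sigma in (0.5, 0.2, 0.05):
    err = abs(smoothed_value(y, z, sigma) - h(y, z))
    print(f"sigma={sigma}: error={err:.4f}")
```

The error shrinks roughly like $\sigma^2$, reflecting the smoothness of the chosen $h$; for a merely continuous $h$ one only gets convergence without a rate.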
3.3.6. Résumé. Theorems 3.3.4 and 3.3.5 provide, under the respective assumptions, a 1-1 correspondence between the classes of generalized and classical random fields over an admissible hypergroup $K$. Our last aim in the present exposition will be the discussion of special properties of generalized random fields like boundedness, stationarity, filtering and harmonizability, at least for Pontryagin hypergroups $K$. These properties lead in a natural way to a duality for generalized random fields over $K$.

Stationarity and Boundedness

Given $a \in K$ we introduce the translations $L_a\tau$ for $\tau \in S_F(K)'$ by
$$\langle L_a\tau, f \rangle := \langle \tau, L_{a^-} f \rangle$$
for all $f \in S_F(K)$, and $L_{(a,a)}\tau$ for $\tau \in S_F(K \times K)'$ by
$$\langle L_{(a,a)}\tau, f \otimes g \rangle := \langle \tau, L_{a^-} f \otimes L_{a^-} g \rangle$$
for all $f, g \in S_F(K)$.

3.3.7. Definition. A generalized random field $X$ over $K$ is said to be stationary if
$$\langle X(f), X(g) \rangle_{L^2} = \langle X(L_a f), X(L_a g) \rangle_{L^2}$$
for all $a \in K$, $f, g \in S_F(K)$.

Clearly, a generalized random field $X$ over $K$ is stationary if and only if its covariance distribution $\mathrm{Cov}_X$ is diagonally invariant in the sense that $L_{(a,a)}\mathrm{Cov}_X = \mathrm{Cov}_X$ for each $a \in K$. This fact follows from the equalities
$$\langle \mathrm{Cov}_X, f \otimes g \rangle = \langle X(f), X(\bar g) \rangle_{L^2} = \langle X(L_{a^-} f), X(L_{a^-} \bar g) \rangle_{L^2} = \langle \mathrm{Cov}_X, (L_{a^-} f) \otimes (L_{a^-} g) \rangle = \langle L_{(a,a)}\mathrm{Cov}_X, f \otimes g \rangle,$$
valid for all $f, g \in S_F(K)$.

From now on it is assumed that $K$ is a Pontryagin hypergroup. For $\chi \in K^\wedge$ we introduce the multiplication operator $M_\chi$ on $S_F(K)$ by $M_\chi f := \chi f$ for all $f \in S_F(K)$, and for $a \in K$ the dual multiplication operator $M_a$ on $S_F(K^\wedge)$ by $M_a f(\chi) := \chi(a) f(\chi)$ for all $f \in S_F(K^\wedge)$, $\chi \in K^\wedge$. The relationship between $M_a$ and $L_a$ is made precise by the formula
$$L_a f^{\vee} = (M_{a^-} f)^{\vee},$$
valid for all $a \in K$, $f \in S_F(K^\wedge)$.
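On $K = \mathbb{Z}_n$ both stationarity and its dual counterpart (Definition 3.3.8 and property 3.3.9.1 below) become statements about matrices. The sketch assumes, for illustration only, a field given by the spectral representation $X(y) = \sum_k \chi_k(y) Z_k$; uncorrelated amplitudes $Z_k$ with variances $s_k$ give a covariance $\rho(y, z)$ depending only on $y - z$, i.e. a diagonally invariant kernel, while correlated amplitudes destroy this. For the dual field $\hat X(f) = X(f^\vee)$ we use Plancherel measure $1/n$ on the dual group; its Gram matrix is then diagonal and invariant under the dual multiplication operators $M_a$. The spectral weights, the perturbation and all helper names (`covariance`, `shift`, `stationary`, `dually_stationary`) are hypothetical choices.

```python
import numpy as np

n = 8
pts = np.arange(n)
chars = np.exp(2j * np.pi * np.outer(pts, pts) / n)   # chars[y, k] = chi_k(y)

def covariance(spectral_cov):
    """rho(y, z) for X(y) = sum_k chi_k(y) Z_k with E[Z_k conj(Z_l)] = spectral_cov[k, l]."""
    return chars @ spectral_cov @ chars.conj().T

def shift(M, a):
    """The kernel shifted along the diagonal: (y, z) -> M(y - a, z - a)."""
    return np.roll(np.roll(M, a, axis=0), a, axis=1)

def stationary(rho):
    return all(np.allclose(shift(rho, a), rho) for a in range(n))

s = np.linspace(1.0, 2.0, n)                 # variances of uncorrelated amplitudes
rho = covariance(np.diag(s))
C = np.diag(s).astype(complex)
C[0, 1] = C[1, 0] = 0.5                      # correlated amplitudes (still positive semidefinite)
print(stationary(rho), stationary(covariance(C)))   # True False

# Dual field X^(f) = X(f_check) with f_check = V @ f; then <X^(f), X^(g)> = f @ G @ conj(g).
V = chars / n

def dual_gram(rho):
    return V.T @ rho @ V.conj()

def dually_stationary(G):
    """Invariance under (M_a f)(k) = chi_k(a) f(k) for every a in Z_n."""
    return all(np.allclose(np.diag(chars[a]) @ G @ np.diag(chars[a]).conj(), G) for a in range(n))

G = dual_gram(rho)
print(np.allclose(G, np.diag(np.diag(G))))   # True: the dual Gram matrix is diagonal
print(dually_stationary(G), dually_stationary(dual_gram(covariance(C))))   # True False
```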
3.3.8. Definition. A generalized random field $X$ over $K$ is called dually stationary if
$$\langle X(f), X(g) \rangle_{L^2} = \langle X(M_\chi f), X(M_\chi g) \rangle_{L^2}$$
for all $f, g \in S_F(K)$, $\chi \in K^\wedge$.

Given generalized random fields $X$ and $Y$ over $K$ and $K^\wedge$ respectively, the dual field $\hat X$ of $X$ and the inverse-dual field $Y^{\vee}$ of $Y$ are defined by
$$\hat X(f) := X(f^{\vee})$$
for all $f \in S_F(K^\wedge)$ and
$$Y^{\vee}(f) := Y(\hat f)$$
for all $f \in S_F(K)$, respectively.

3.3.9. Properties of dualization

3.3.9.1 $X$ (over $K$) is stationary if and only if $\hat X$ is dually stationary. If $X$ is stationary, then for $f, g \in S_F(K^\wedge)$
$$\langle \hat X(f), \hat X(g) \rangle = \langle X(f^{\vee}), X(g^{\vee}) \rangle_{L^2} = \langle X(L_{a^-} f^{\vee}), X(L_{a^-} g^{\vee}) \rangle_{L^2} = \langle X((M_a f)^{\vee}), X((M_a g)^{\vee}) \rangle_{L^2} = \langle \hat X(M_a f), \hat X(M_a g) \rangle,$$
hence $\hat X$ is dually stationary.

It is easy to show that

3.3.9.2 for generalized random fields $X$ on $K$ and $Y$ on $K^\wedge$, $\hat X = Y$ if and only if $Y^{\vee} = X$.

Similar to property 3.3.9.1 one sees that

3.3.9.3 $X$ (over $K$) is dually stationary if and only if $\hat X$ is stationary.

3.3.9.4 From the previous properties it follows that the mappings $X \mapsto \hat X$, from generalized random fields over $K$ into generalized random fields over $K^\wedge$, and $Y \mapsto Y^{\vee}$, in the reverse direction, can be considered as the Fourier transform and the inverse Fourier transform of generalized random fields over $K$ and $K^\wedge$, respectively. Clearly, $(\hat X)^{\vee} = X$ for every generalized random field $X$ over $K$.
3.3.9.5 We recall that the Fourier transform of a distribution $\tau \in S_F(K)'$ is the distribution $\hat\tau \in S_F(K^\wedge)'$ defined by
$$\langle \hat\tau, g \rangle := \langle \tau, g^{\vee} \rangle$$
for all $g \in S_F(K^\wedge)$. With this terminology we assert that for a generalized random field $X$ over $K$ with dual field $\hat X$,
$$\langle \widehat{\mathrm{Cov}}_X, f \otimes g \rangle = \langle \mathrm{Cov}_{\hat X}, f \otimes g^* \rangle$$
whenever $f, g \in S_F(K^\wedge)$. The proof follows easily from the following equalities, valid for all $f, g \in S_F(K^\wedge)$:
$$\begin{aligned} \langle \widehat{\mathrm{Cov}}_X, f \otimes g \rangle &= \langle \mathrm{Cov}_X, (f \otimes g)^{\vee} \rangle = \langle X(f^{\vee}), X(\overline{g^{\vee}}) \rangle_{L^2} \\ &= \langle X(f^{\vee}), X((\bar g^*)^{\vee}) \rangle_{L^2} = \langle \hat X(f), \hat X(\bar g^*) \rangle_{L^2} = \langle \mathrm{Cov}_{\hat X}, f \otimes g^* \rangle. \end{aligned}$$

3.3.10. Definition. A generalized random field $X$ over $K$ is said to be U-bounded (uniformly bounded) or V-bounded (variation bounded) if there exists a constant $c > 0$ such that $\|X(f)\|_2 \le c\|f\|_\infty$ or $\|X(f)\|_2 \le c\|\hat f\|_\infty$ for all $f \in S_F(K)$, respectively. The dualization applied above immediately yields the property

3.3.10.1 $X$ (over $K$) is U-bounded if and only if $\hat X$ (over $K^\wedge$) is V-bounded; $X$ is V-bounded if and only if $\hat X$ is U-bounded.

Filtering and Boundedness

The filtering transformation of generalized random fields over a hypergroup $K$ suggested by classical linear filtering and discussed in a more general
framework by A. M. Yaglom [54] and by D. K. Chang and M. M. Rao [6] can be treated in a purely harmonic-analytic way. For fixed $k \in L^1(K)$ and $h \in A(K)$ we introduce the mappings $T_k$ and $M_h$ on $S_F(K)$ by $T_k(f) := k * f$ and $M_h(f) := hf$ for all $f \in S_F(K)$, respectively. Clearly, the linear operators $T_k$ and $M_h$ can also be applied to distributions in $S_F(K)'$ and to generalized random fields $X$ over $K$, the latter application being given by $T_k X(f) := X(T_k f)$ and $M_h X(f) := X(M_h f)$ for all $f \in S_F(K)$, respectively. The resulting generalized random fields are called filtered.

3.3.11. Properties of filtering

3.3.11.1 $(T_k X)^{\wedge} = M_{\hat k}\hat X$. In fact, for all $f \in S_F(K^\wedge)$,
$$(T_k X)^{\wedge}(f) = (T_k X)(f^{\vee}) = X(k * f^{\vee}) = X((\hat k f)^{\vee}) = \hat X(\hat k f) = M_{\hat k}\hat X(f).$$

3.3.11.2 $\mathrm{Cov}_{M_h X} = M_{h \otimes \bar h}\mathrm{Cov}_X$. The assertion follows from the equalities
$$\begin{aligned} \langle \mathrm{Cov}_{M_h X}, f \otimes g \rangle &= \langle M_h X(f), M_h X(\bar g) \rangle = \langle X(hf), X(h\bar g) \rangle_{L^2} \\ &= \langle \mathrm{Cov}_X, (hf) \otimes (\bar h g) \rangle = \langle M_{h \otimes \bar h}\mathrm{Cov}_X, f \otimes g \rangle \end{aligned}$$
for all $f, g \in S_F(K)$.

3.3.11.3 $\mathrm{Cov}_{T_k X} = T_{k \otimes \bar k}\mathrm{Cov}_X$. This follows from 3.3.9.5 and the previous two properties, since an obvious calculation yields
$$\langle \widehat{\mathrm{Cov}}_{T_k X}, f \otimes g \rangle = \langle (T_{k \otimes \bar k}\mathrm{Cov}_X)^{\wedge}, f \otimes g \rangle,$$
valid for all $f, g \in S_F(K^\wedge)$.

3.3.12. Theorem. The following three statements are equivalent:
(i) $X$ is U-bounded.
(ii) $T_k X$ is U-bounded for all $k \in L^1(K)$.
(iii) $M_h X$ is U-bounded for all $h \in A(K)$.

Proof. It suffices to show the equivalence (i) $\Leftrightarrow$ (ii). Let $X$ be U-bounded. Then for each $k \in L^1(K)$
$$\|T_k X(f)\|_2 = \|X(k * f)\|_2 \le c\|k * f\|_\infty \le c\|k\|_1\|f\|_\infty < \infty$$
whenever $f \in S_F(K)$, where $c > 0$ is the constant of U-boundedness of $X$. Thus $T_k X$ is U-bounded. Conversely, let $T_k X$ be U-bounded for each $k \in L^1(K)$. Then there exists for $k \in L^1(K)$ a constant $c_k > 0$ such that $\|T_k X(f)\|_2 \le c_k\|f\|_\infty$ for all $f \in S_F(K)$. At first we show that $\|T_k X(f)\|_2 \le c\|k\|_1\|f\|_\infty$ for all $k \in L^1(K)$, $f \in S_F(K)$ and some $c > 0$, which means that the mapping
$$\tau_X : L^1(K) \to \mathcal{L}(S_F(K), L^2_0) := \mathcal{L}((S_F(K), \|\cdot\|_\infty), L^2_0)$$
given by $\tau_X(k) := T_k X$ for all $k \in L^1(K)$ is a bounded linear operator. While the linearity of $\tau_X$ is clear, the boundedness requires further arguments. Let $(k_\alpha)_{\alpha \in A}$ be a net in $L^1(K)$ with $k_\alpha \to 0$ (in $L^1(K)$) and $T_{k_\alpha} X \to Y$ (in $\mathcal{L}(S_F(K), L^2_0)$). Then for $f, g \in S_F(K)$, $(T_{k_\alpha} f) \otimes g \to 0$ (in $L^1(K)$) and
$$\langle Y(f), X(g) \rangle_{L^2} = \lim_\alpha \langle T_{k_\alpha} X(f), X(g) \rangle_{L^2} = \lim_\alpha \langle X(k_\alpha * f), X(g) \rangle_{L^2} = \lim_\alpha \langle \mathrm{Cov}_X, (k_\alpha * f) \otimes \bar g \rangle = 0,$$
hence $Y = 0$. Now the closed graph theorem applies and yields the boundedness of $\tau_X$. Next, for $f \in S_F(K)$ we choose an approximate unit $(g_\alpha)_{\alpha \in A}$ in $L^1(K)$ with $\|g_\alpha\|_1 = 1$ for all $\alpha \in A$ such that
$$\lim_\alpha \|g_\alpha * f - f\|_{S_F(K)} = 0.$$
From the estimate above we obtain the existence of $c > 0$ such that $\|T_{g_\alpha} X(f)\|_2 \le c\|f\|_\infty$ for all $\alpha \in A$. But $T_{g_\alpha} X \to X$ (in the operator topology), hence $\|X(f)\|_2 \le c\|f\|_\infty$ for all $f \in S_F(K)$, and this is the U-boundedness of $X$. Similarly to Theorem 3.3.12 one proves the following result.
3.3.13. Theorem. The following statements are equivalent:
(i) $X$ is V-bounded.
(ii) $T_k X$ is V-bounded for all $k \in L^1(K)$.
(iii) $M_h X$ is V-bounded for all $h \in A(K)$.

One just looks at 3.3.10.1.

Harmonizability

As a preparation we introduce the notation
$$(V_0(K_1 \times K_2), \|\cdot\|_{V_0}) := (C_0(K_1) \hat{\otimes} C_0(K_2), \|\cdot\|_{\hat{\otimes}})$$
for Pontryagin hypergroups $K_1$ and $K_2$. One observes that $S_F(K_1 \times K_2)$ is $\tau_{co}$-dense in $V_0(K_1 \times K_2)$, and $V_0(K_1 \times K_2)$ is $\tau_{co}$-dense in $C_0(K_1 \times K_2)$. A reference to Theorem 3.2.13 and Corollary 3.2.10 suffices.

3.3.14. Definition. Bimeasures on $K_1 \times K_2$ are elements of the space $BM(K_1 \times K_2) := V_0(K_1 \times K_2)'$. Clearly,
$$M^b(K_1 \times K_2) \subset BM(K_1 \times K_2) \subset S_F(K_1 \times K_2)',$$
where these spaces are dense within the succeeding ones. The Fourier transform $\hat\beta$ of a bimeasure $\beta \in BM(K_1 \times K_2)$ is defined by
$$\hat\beta(\chi_1, \chi_2) := \beta(\bar\chi_1, \bar\chi_2)$$
for all $\chi_1 \in K_1^\wedge$, $\chi_2 \in K_2^\wedge$. It turns out that $\hat\beta \in C^b(K_1^\wedge \times K_2^\wedge)$. Details on this approach to bimeasures can be found in [16] by C. C. Graham and B. M. Schreiber.

3.3.15. Theorem. Let $X$ be a generalized random field over $K$.
(i) $X$ is U-bounded if and only if $\mathrm{Cov}_X$ is uniquely extendible to a bimeasure on $K \times K$.
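(ii) $X$ is V-bounded if and only if $\widehat{\mathrm{Cov}}_X$ is uniquely extendible to a bimeasure on $K^\wedge \times K^\wedge$.

Proof. As in the previous characterization of boundedness it is sufficient to prove (i). Let $X$ be U-bounded, and let $f \in S_F(K \times K)$. For
$$f = \sum_{n \ge 1} f_n \otimes g_n$$
with $(f_n)_{n \in \mathbb{N}}$ and $(g_n)_{n \in \mathbb{N}}$ in $S_F(K)$ we have
$$\begin{aligned} \Big| \Big\langle \mathrm{Cov}_X, \sum_{n \ge 1} f_n \otimes g_n \Big\rangle \Big| &\le \sum_{n \ge 1} |\langle \mathrm{Cov}_X, f_n \otimes g_n \rangle| = \sum_{n \ge 1} |\langle X(f_n), X(\bar g_n) \rangle_{L^2}| \\ &\le \sum_{n \ge 1} \|X(f_n)\|_2 \|X(\bar g_n)\|_2 \le c \sum_{n \ge 1} \|f_n\|_\infty \|g_n\|_\infty \end{aligned}$$
with $c > 0$. Since $S_F(K \times K)$ is $\tau_{co}$-dense in $V_0(K \times K)$, $\mathrm{Cov}_X$ is uniquely extendible to a bimeasure on $K \times K$. Conversely, if there exists a unique extension of $\mathrm{Cov}_X$ to $V_0(K \times K)$, then
$$\|X(f)\|_2^2 = \langle \mathrm{Cov}_X, f \otimes \bar f \rangle \le c\|f \otimes \bar f\|_{V_0} \le c\|f\|_\infty^2$$
for all $f \in S_F(K)$ and some $c > 0$, hence $X$ is U-bounded.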
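Returning to the filtering operators of 3.3.11: on $\mathbb{Z}_n$ with counting measure, $T_k$ is a circulant matrix, and the ingredients used above become checkable matrix facts. The sketch (illustrative only; random data, a real filter $k$ so that $\bar k = k$, and a stationary kernel built from an arbitrarily chosen spectrum) checks that convolution turns into multiplication by $\hat k$ (cf. 3.3.11.1), the estimate $\|k * f\|_\infty \le \|k\|_1\|f\|_\infty$ from the proof of Theorem 3.3.12, and that filtering preserves diagonal invariance. The helpers `circulant` and `shift` are hypothetical names.

```python
import numpy as np

def circulant(k):
    """Hypothetical helper: matrix of f -> k * f on Z_n, (k * f)(y) = sum_x k(y - x) f(x)."""
    n = len(k)
    return np.array([[k[(y - x) % n] for x in range(n)] for y in range(n)])

def shift(M, a):
    """The kernel shifted along the diagonal: (y, z) -> M(y - a, z - a)."""
    return np.roll(np.roll(M, a, axis=0), a, axis=1)

rng = np.random.default_rng(4)
n = 16
k = rng.standard_normal(n)          # a real filter, so conj(k) = k
f = rng.standard_normal(n)
Kk = circulant(k)

# Convolution becomes multiplication by k^ under the Fourier transform.
print(np.allclose(np.fft.fft(Kk @ f), np.fft.fft(k) * np.fft.fft(f)))   # True

# The estimate behind Theorem 3.3.12: ||k * f||_inf <= ||k||_1 ||f||_inf.
print(np.abs(Kk @ f).max() <= np.abs(k).sum() * np.abs(f).max())       # True

# Gram matrix of (f, g) -> <T_k X(f), T_k X(g)> for a stationary X is Kk^T rho conj(Kk).
pts = np.arange(n)
chars = np.exp(2j * np.pi * np.outer(pts, pts) / n)
rho = chars @ np.diag(np.linspace(1.0, 2.0, n)) @ chars.conj().T   # stationary kernel
rho_filtered = Kk.T @ rho @ Kk.conj()
print(all(np.allclose(shift(rho_filtered, a), rho_filtered) for a in range(n)))  # True
```

Shift invariance survives because products of circulant matrices are again circulant.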
3.3.16. Discussion. Given a V-bounded generalized random field $X$ over $K$, $\widehat{\mathrm{Cov}}_X$ can be uniquely extended to a bimeasure $\beta$ on $K^\wedge \times K^\wedge$ by Theorem 3.3.15(ii). Consequently $\hat\beta =: g \in C^b(K \times K)$, and $\mathrm{Cov}_X$ can be identified with a function in $C^b(K \times K)$. Then Theorem 3.3.5 implies that $X$ can be identified with a unique classical random field $\tilde X$ over $K$.

In summary we recognize that V-bounded generalized random fields over $K$ can be identified with V-bounded, hence with harmonizable classical
random fields over $K$, and harmonizability of generalized random fields does not require a separate definition.

3.3.17. Definition. A generalized random field $X$ over $K$ is called strongly harmonizable if $\widehat{\mathrm{Cov}}_X$ can be identified with a measure $\mu \in M^b(K^\wedge \times K^\wedge)$ in the sense that
$$\langle \widehat{\mathrm{Cov}}_X, f \otimes g \rangle = \langle \mu, f \otimes g \rangle$$
for all $f, g \in S_F(K^\wedge)$.

If $X$ is a strongly harmonizable classical random field over $K$, then its covariance kernel $\rho_X$ is identified with a bimeasure $\beta_X$ on $\mathcal{B}(K^\wedge) \otimes \mathcal{B}(K^\wedge)$ (see Definition 2.3.1). Consequently, the notions of strong harmonizability for classical and for generalized random fields coincide. But for classical random fields over $K$ strong harmonizability implies V-boundedness by Theorem 2.3.5. Therefore the following result is quite suggestive.

3.3.18. Theorem. For any generalized random field $X$ over $K$, strong harmonizability implies V-boundedness.

Proof. From the assumption it follows that $\widehat{\mathrm{Cov}}_X$ is extendible to a continuous linear mapping on $C_0(K^\wedge \times K^\wedge)$. We noted above that
$$S_F(K^\wedge \times K^\wedge) \subset V_0(K^\wedge \times K^\wedge) \subset C_0(K^\wedge \times K^\wedge),$$
where each of these spaces is $\tau_{co}$-dense in the succeeding one. Hence $\widehat{\mathrm{Cov}}_X$ is a continuous linear mapping also on $V_0(K^\wedge \times K^\wedge)$, but as such it can be extended to a bimeasure on $K^\wedge \times K^\wedge$. By Theorem 3.3.15(ii) this is equivalent to V-boundedness of $X$.

3.3.19. In summary, we can state that there is a 1-1 correspondence between strongly harmonizable generalized and strongly harmonizable classical random fields over $K$. In contrast to classical random fields, stationary generalized random fields over $K$ are not necessarily strongly harmonizable or V-bounded. Many questions concerning stationarity and harmonizability of generalized random fields over $K$ remain open, since supporting results from the
harmonic analysis of commutative hypergroups are yet to be established. There also remains the challenge of extending parts of the theory beyond Pontryagin hypergroups.

3.4. Suggestions for further research

Much work has been done in recent years on the analysis of hypergroups (Fourier algebra, amenability) and on their structure (induced representations, extensions). In the sequel we restrict ourselves to reporting only on those open problems which are related to random fields over hypergroups. In [20] the author formulated ideas for extending the works by J. J. Fournier and K. A. Ross [14] on random Fourier series on compact commutative hypergroups, and by G. Blower [3] on spectrally generated random fields over certain hypergroups with manifolds as their basic spaces. There is still the open problem of introducing a notion of white noise over a hypergroup. Although W. Hörmann [23] discussed quite efficiently properties of white noise over locally compact Abelian groups, an extension to commutative hypergroups seems to rely intrinsically on spectral synthesis results which still have to be established.

There are two current suggestions arising from recent work on random fields over algebraic-topological structures which could lead to new properties of random fields over hypergroups. We first mention the paper [22] by the author and M. M. Rao on infinite dimensional stationary random fields over locally compact Abelian groups $G$. In this work random fields over $G$ are considered as mappings $(x, g) \mapsto X(x, g)$ from $E \times G$ into the Hilbert space $L^2(\Omega, \mathcal{F}, P; \mathbb{C})$, where $E$ denotes some vector space. These random fields admit an integral representation of the form
$$X(x, g) = \int_{G^\wedge} \chi(g)\, Z(x, d\chi)$$
for all $(x, g) \in E \times G$, where $Z(x, \cdot)$ is a random measure on $\mathcal{B}(G^\wedge)$ for each $x \in E$. The investigations in [22] aim at random fields having a weak Markov property. This aspect would be worthwhile to pursue for random fields over commutative hypergroups. Another recent paper [38] by A. Malyarenko contains a useful notion of invariant random fields over vector bundles, inspired by problems in cosmology. Let $(E, \pi, T)$ be a vector bundle, where
$E$ and $T$ are topological spaces, $\pi : E \to T$ is a continuous mapping and the fiber $E_t := \pi^{-1}(t)$ is a finite dimensional vector space for each $t \in T$. The most prominent example of a vector bundle is the tangent bundle of a manifold $T$. A vector random field over a vector bundle $\Xi := (E, \pi, T)$ is a collection $X$ of random vectors $\omega \mapsto X(t, \omega)$ such that $X(t) := X(t, \cdot) \in E_t$ for each $t \in T$. In other words, a vector random field over $\Xi$ is a random section of $\Xi$. $X$ is called a second order vector random field over $\Xi$ if $E\|X(t)\|_{E_t}^2 < \infty$ for all $t \in T$, where it is assumed that every fiber $E_t$ carries an inner product and that the function $x \mapsto \|x\|_{E_t}^2$ on $E_t$ is continuous. With some additional effort one defines mean square continuity for vector random fields over $\Xi$ and introduces the notion of invariance with respect to a group action on $T$. The approach in [38] yields a spectral decomposition of a vector random field over a compact homogeneous space. It might be appealing to consider this problem also for hypergroups generated by group actions as cited in Example 1.1.3.
Bibliographical Notes

In the Introduction general references to the main sources of the exposition have been given. The text itself provides detailed information whenever the new results require justification "sur place". The present additional bibliographical notes are intended to supplement references of historical relevance and to widen the path to the immense literature on the subject.

Section 1. The aim of this introductory section is to present a selection of basic facts on hypergroups and their harmonic analysis. Apart from the references [1] by W. R. Bloom and the author and [21] by the author, the interested reader should also be directed to the pioneering papers [11], [27] and [48],
[49] by the founders of hypergroup theory, C. F. Dunkl, R. I. Jewett and R. Spector, respectively. A slightly more detailed report on the modification of hypergroups has been given in our exposition, since this analytic procedure, invented by M. Voit in [50], makes it possible to construct new hypergroups from given ones and to extend the notion of second order random fields (Section 2) to generalized random fields over hypergroups (Section 3). For a condensed treatment of discrete commutative hypergroups and important classes of examples, R. Lasser's expository paper [31] is recommended for supplementary reading. An interesting enlargement of hypergroups with Euclidean basic spaces is the class of local Sturm-Liouville hypergroups discussed by C. Rentzsch in [46]. A short description of higher rank Bessel hypergroups has been added, since these hypergroups, studied extensively by M. Rösler [47], M. Voit [52] and W. Hazod [17], are self-dual and hence interesting examples illustrating the study of generalized random fields (Section 3).

Section 2 is devoted to the generalization of second order random fields to commutative hypergroups. The first attempt to investigate these random fields within the general setting was made by R. Lasser and M. Leitner in various papers. In subsequent studies stationarity and harmonizability became central topics. Stationary random fields over discrete hypergroups and their applications to statistics are discussed in the publications [32], [33] by R. Lasser and M. Leitner, and in [24], [25] by V. Hösel and R. Lasser. In [35] M. Leitner generalized some results of L. Bruckner [4] to discrete polynomial hypergroups. Background material on harmonizability of random fields over hypergroups can be found in the work of M. M. Rao and coauthors, for instance in [43], [44] and [6], and also in Y. Kakihara's textbook [28], which provides an extensive list of references. The papers [9] and [10] by D. Dehay and R. Moché encouraged H.-J. Neu to obtain results on harmonizing functions and operators within the framework of commutative hypergroups [39].

Section 3 is based on the seminal work of I. M. Gelfand [15] and K. Ito [26], who considered random fields as mappings from the Schwartz space of test functions on Euclidean space into some Hilbert space, an idea which allows one to view covariance kernels as distributions. The extension of this approach to general locally compact Abelian groups goes back to
A. M. Yaglom [53], [54] and A. L. Ponomarenko [41], [42]. These authors replaced the Schwartz space by some more abstractly defined algebra of functions. For commutative hypergroups the notion of a Segal algebra introduced by A. K. Chilana and K. A. Ross [7], A. K. Chilana and A. Kumar [8], and by R. Bürger [5] proved to be fruitful. Following the investigations of H. Niemi [40], W. Hörmann [23] and H. G. Feichtinger and W. Hörmann [13], the theory of generalized random fields has been extended to commutative hypergroups in two steps: for strong hypergroups by M. Leitner [37] and for more general admissible hypergroups by H.-J. Neu [39]. In the survey [20] the author discusses the deficiencies of the present approach, especially the limitation arising from the restriction to Pontryagin hypergroups. Even for nondiscrete Pontryagin hypergroups the notion of white noise is yet to be defined.
References
[1] W. R. Bloom and H. Heyer, Harmonic Analysis of Probability Measures on Hypergroups, Walter de Gruyter, Berlin–New York, 1995.
[2] W. R. Bloom and H. Heyer, Polynomial hypergroup structures and applications to probability theory, Publ. Math. Debrecen 72(1–2) (2008), 199–225.
[3] G. Blower, Stationary processes for translation operators, Proc. London Math. Soc. (3) 72 (1996), 697–720.
[4] L. Bruckner, Interpolation of homogeneous random fields on discrete groups, Ann. Math. Statistics 40(1) (1969), 251–258.
[5] R. Bürger, Contributions to duality theory on groups and hypergroups, in Topics in Modern Harmonic Analysis, Proceedings of a Seminar held in Torino and Milano 1982, Vol. II, Istituto Nazionale di Alta Matematica Francesco Severi, Roma, 1983, pp. 1055–1070.
[6] D. K. Chang and M. M. Rao, Bimeasures and nonstationary processes, in Real and Stochastic Analysis, M. M. Rao (ed.), John Wiley & Sons, 1986, pp. 7–118.
[7] A. K. Chilana and K. A. Ross, Spectral synthesis in hypergroups, Pacific J. Math. 76 (1978), 313–328.
[8] A. K. Chilana and A. Kumar, A spectral synthesis in Segal algebras on hypergroups, Pacific J. Math. 80 (1979), 59–76.
[9] D. Dehay and R. Moché, Strongly harmonizable approximations of bounded continuous random fields, Stoch. Proc. and their Appl. 23 (1986), 327–331.
[10] D. Dehay and R. Moché, Strongly harmonizing operators and strongly harmonizable approximations of continuous random fields on LCA groups, Stoch. Proc. and their Appl. 29 (1988), 129–139.
[11] C. F. Dunkl, The measure algebra of a locally compact hypergroup, Trans. Amer. Math. Soc. 179 (1973), 331–348.
[12] H. G. Feichtinger, On a new Segal algebra, Monatsh. Math. 92 (1981), 269–289.
[13] H. G. Feichtinger and W. Hörmann, Harmonic analysis of generalized stochastic processes on locally compact Abelian groups, Manuscript, after 1989.
[14] J. J. Fournier and K. A. Ross, Random Fourier series on compact Abelian hypergroups, J. Austral. Math. Soc. (Series A) 37 (1984), 45–81.
[15] I. M. Gelfand, Generalized random processes, Dokl. Akad. Nauk SSSR 100 (1955), 853–856.
[16] C. C. Graham and B. M. Schreiber, Bimeasure algebras on LCA groups, Pacific J. Math. 115 (1984), 91–127.
[17] W. Hazod, Probability on matrix-cone hypergroups: Limit theorems and structural properties, J. Applied Analysis 15(2) (2009), 205–245.
[18] P. Hermann and M. Voit, Induced representations and duality results for commutative hypergroups, Forum Math. 7 (1995), 543–558.
[19] C. S. Herz, Bessel functions of matrix argument, Ann. of Math. (2) 61 (1955), 474–523.
[20] H. Heyer, The covariance distribution of a generalized random field over a commutative hypergroup, Contemporary Mathematics 261 (2000), 73–82.
[21] H. Heyer, Structural Aspects in the Theory of Probability, Second Enlarged Edition, World Scientific Publishers, Singapore, 2010.
[22] H. Heyer and M. M. Rao, Infinite dimensional stationary random fields over a locally compact Abelian group, Int. J. Math. 23(4) (2012), 1250029 (23 pages).
[23] W. Hörmann, Generalized Stochastic Processes and Wigner Distributions, Dissertation, Universität Wien, 1989.
[24] V. Hösel and R. Lasser, One-step prediction for $P_n$-weakly stationary processes, Mh. Math. 113 (1992), 199–212.
[25] V. Hösel and R. Lasser, Prediction of weakly stationary sequences on polynomial hypergroups, The Annals of Probability 31(1) (2003), 93–114.
[26] K. Ito, Stationary random distributions, Memoirs College of Sci. Univ. Kyoto 28A (1953), 209–223.
[27] R. I. Jewett, Spaces with an abstract convolution of measures, Adv. in Math. 18(1) (1975), 1–101.
[28] Y. Kakihara, Multidimensional Second Order Stochastic Processes, World Scientific Publishers, Singapore, 1997.
[29] I. Kluvánek, Characterization of Fourier-Stieltjes transforms of vector and operator-valued measures, Czech. Math. J. 17(92) (1967), 261–277.
[30] T. H. Koornwinder and A. L. Schwartz, Product formulae and associated hypergroups for orthogonal polynomials on the simplex and on a parabolic triangle, Constructive Approximation 13(4) (1997), 537–567.
[31] R. Lasser, Discrete commutative hypergroups, in Advances in the Theory of Special Functions and Orthogonal Polynomials, W. zu Castell, F. Filbir and B. Forster (eds.), Vol. 2, Nova Science Publishers, 2005, pp. 55–102.
[32] R. Lasser and M. Leitner, Stochastic processes indexed by hypergroups I, J. Theor. Prob. 2(3) (1989), 301–311.
[33] R. Lasser and M. Leitner, On the estimation of the mean of weakly stationary and polynomial locally stationary sequences, J. Multivariate Analysis 35(1) (1990), 31–47.
[34] M. Leitner, Stochastic processes indexed by hypergroups II, J. Theor. Prob. 4(2) (1991), 321–332.
[35] M. Leitner, Regularity and singularity of weakly stationary processes indexed by a commutative hypergroup, in Probability Measures on Groups X, Oberwolfach 1990, H. Heyer (ed.), Plenum Press, New York–London, 1991, pp. 269–278.
[36] M. Leitner, Hyper-weakly harmonizable processes and operator families, Stoch. Analysis and its Appl. 13 (1995), 471–485.
[37] M. Leitner, Character invariant Segal algebras on hypergroups, Preprint, 1995.
[38] A. Malyarenko, Invariant random fields in vector bundles and application to cosmology, Annales de l'Institut Henri Poincaré - Probabilités et Statistiques 47(4) (2011), 1068–1095.
[39] H.-J. Neu, Beiträge zur Theorie klassischer und verallgemeinerter zufälliger Felder auf einer kommutativen Hypergruppe, Dissertation, Universität Tübingen, 1999.
[40] H. Niemi, Stochastic processes as Fourier transforms of stochastic measures, Acad. Sci. Fenn. A I Math. 591 (1975), 1–47.
[41] A. I. Ponomarenko, Harmonic analysis of generalized wide-sense homogeneous random fields on a locally compact group, Theory Probab. Math. Statist. 4 (1974), 119–137.
[42] A. I. Ponomarenko, Generalized second order random fields on locally compact groups, Theory Probab. Math. Statist. 29 (1984), 125–133.
[43] M. M. Rao, Harmonizable processes: Structure theory, L'Enseignement mathématique, T. XXVIII, fasc. 3–4 (1982), 295–351.
[44] M. M. Rao, Bimeasures and harmonizable processes, in Probability Measures on Groups IX, Oberwolfach 1988, H. Heyer (ed.), Lecture Notes in Math. 1379, Springer-Verlag, Berlin, 1989, pp. 254–298.
[45] M. M. Rao, Random and Vector Measures, World Scientific Publishers, Singapore, 2012.
[46] C. Rentzsch, Lévy-Khintchine representation on local Sturm-Liouville hypergroups, Infinite Dimensional Analysis, Quantum Probability and Related Topics 2(1) (1991), 79–104.
[47] M. Rösler, Bessel convolutions on matrix cones, Compos. Math. 143 (2007), 749–779.
[48] R. Spector, Théorie axiomatique des hypergroupes, C. R. Acad. Sci. Paris Sér. A-B 280(25) (1975), A105–A106.
[49] R. Spector, Mesures invariantes sur les hypergroupes, Trans. Amer. Math. Soc. 239 (1978), 147–165.
[50] M. Voit, Positive characters on commutative hypergroups and some applications, Math. Z. 198(3) (1988), 405–421.
[51] M. Voit, Properties of subhypergroups, Semigroup Forum 56(3) (1998), 373–391.
[52] M. Voit, Bessel convolution on matrix cones: Algebraic properties and random walks, J. Theoret. Probab. 22 (2009), 741–771.
[53] A. M. Yaglom, Second order homogeneous random fields, in Proc. Fourth Berkeley Symp. Math. Statist. and Prob. Vol. 2, University of California Press, 2 (1961), pp. 593–622.
[54] A. M. Yaglom, Correlation Theory of Stationary and Related Random Functions I. Basic Results, Springer-Verlag, Berlin, 1987.
October 24, 2013
10:0
9in x 6in
Real and Stochastic Analysis: Current Trends
b1644-ch03
CHAPTER 3 A CONCISE EXPOSITION OF LARGE DEVIATIONS
F. HIAI
0. Introduction
The purpose of this survey article is to give concise expositions of the familiar large deviation theorems of Cramér, of Gärtner-Ellis and of Sanov, as well as of the more recently developed large deviations for random matrices and in quantum spin chains. The general abstract framework of large deviations was proposed by S. R. S. Varadhan [38] in 1966, although the topic can be traced back much earlier. The theorem of Cramér [11] for independent identically distributed (i.i.d.) variables was published in 1938, and the level-2 extension of Sanov [36] was discovered in 1957. On the other hand, the non-i.i.d. extension of the Cramér theorem was considered by Gärtner [18] in 1977 and completed by Ellis [15] in 1984.
To describe heuristically what the large deviation principle is, let $X_1, X_2, \dots$ be a sequence of i.i.d. Gaussian real random variables with standard distribution $N(0,1)$. Then the empirical mean $S_n := n^{-1}\sum_{i=1}^n X_i$ has distribution $N(0,1/n)$, so that, for any $\delta > 0$,
$$P(|S_n| \ge \delta) = 2\sqrt{\frac{n}{2\pi}}\int_\delta^\infty e^{-nx^2/2}\,dx,$$
and we have
$$\lim_{n\to\infty}\frac1n\log P(|S_n|\ge\delta) = -\frac{\delta^2}{2}.$$
This means that
$$P(|S_n|\ge\delta)\approx\exp\Big(-\frac{n\delta^2}{2}\Big),$$
which decreases to 0 exponentially fast. Indeed, the Cramér theorem for this $(X_i)$ tells us more generally that, for any $a<b$,
$$\lim_{n\to\infty}\frac1n\log P(S_n\in(a,b)) = -\inf_{x\in(a,b)}\frac{x^2}{2}.$$
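The Gaussian computation above can be checked numerically: since $S_n\sim N(0,1/n)$, one has exactly $P(|S_n|\ge\delta)=\operatorname{erfc}(\delta\sqrt{n/2})$. The Python sketch below (the helper name `gaussian_rate` is ours, chosen for illustration) evaluates $\frac1n\log P(|S_n|\ge\delta)$ for a few values of $n$. In this example the values approach $-\delta^2/2$ from below, the gap being of order $(\log n)/n$ because of the polynomial prefactor of the Gaussian tail. For much larger $n$ the tail probability underflows in double precision, so the sketch stays at $n\le1000$.

```python
import math


def gaussian_rate(n: int, delta: float) -> float:
    """Return (1/n) log P(|S_n| >= delta) for S_n ~ N(0, 1/n), computed exactly.

    P(|S_n| >= delta) = P(|Z| >= delta * sqrt(n)) = erfc(delta * sqrt(n / 2)), Z ~ N(0, 1).
    """
    tail = math.erfc(delta * math.sqrt(n / 2))
    return math.log(tail) / n


if __name__ == "__main__":
    delta = 1.0
    for n in (10, 100, 1000):
        # compare with the large deviation limit -delta^2 / 2
        print(n, round(gaussian_rate(n, delta), 4), -delta**2 / 2)
```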
On the other hand, let $M_1(\mathbb{R})$ be the set of probability measures on $\mathbb{R}$ and $\mu_0$ the standard Gaussian measure. The Sanov theorem treats the empirical measure (a random probability measure)
$$\frac{\delta_{X_1}+\cdots+\delta_{X_n}}{n},$$
where $\delta_x$ is the Dirac measure at $x\in\mathbb{R}$, and it tells us that if $F$ is a closed subset of $M_1(\mathbb{R})$ (in the weak topology) with $\mu_0\notin F$, then
$$P\Big(\frac{\delta_{X_1}+\cdots+\delta_{X_n}}{n}\in F\Big)\approx\exp\Big(-n\inf_{\nu\in F}S(\nu\|\mu_0)\Big),$$
where $S(\nu\|\mu_0)$ is the relative entropy (or the Kullback-Leibler divergence) of $\nu$ with respect to $\mu_0$. Note that $S(\nu\|\mu_0)$ can also be written as
$$-H(\nu)+\frac12\int_{\mathbb{R}}x^2\,d\nu(x)+\frac12\log 2\pi, \tag{0.1}$$
where $H(\nu)$ is the Boltzmann-Gibbs entropy $-\int_{\mathbb{R}}p(x)\log p(x)\,dx$ with $p(x)$ being the density of $\nu$. As seen from the above special cases of the Cramér and the Sanov theorems, exponentially fast convergence is the essence of large deviations. The convergence is governed by a certain rate function $I$; in the above situation, $I(x):=x^2/2$ on $\mathbb{R}$ for the Cramér case and $I(\nu):=S(\nu\|\mu_0)$ on $M_1(\mathbb{R})$ for the Sanov case. In this way, a large deviation may be considered as a sort of convergence result strengthening the well-known strong law of large numbers: $S_n\to0$ or $(\delta_{X_1}+\cdots+\delta_{X_n})/n\to\mu_0$ as $n\to\infty$ almost surely.
In Section 1 of the present article, the definition and some general basics of large deviations are summarized. Section 2 gives a full account of the Cramér large deviation. The Gärtner-Ellis large deviation is treated in detail in Section 3. In the second half of this section, based on Mosonyi's notes [30], the large deviation lower bound is shown under a weaker assumption, which does not seem to be available elsewhere in the literature. This may be useful since the somewhat strong assumption of the Gärtner-Ellis theorem is sometimes not easy to verify in actual applications. A short but sufficiently detailed account of Varadhan's integral lemma is supplied in Section 4. In the first half of Section 5 a weak large deviation
(considered as an infinite-dimensional extension of the Cramér theorem) in the abstract setting of a locally convex topological vector space is presented, and the Sanov large deviation is established in the second half. The expositions in Sections 1–5 are essentially self-contained with detailed proofs, except for a few facts from convex analysis in Section 4 and some familiar properties of the weak topology on probability measures on a Polish space in Section 5. The interested reader may consult, for instance, [34] for convex analysis in Euclidean spaces and [6, 7] for the weak topology on measures. The main source of Sections 2–4 is Dembo and Zeitouni's monograph [13], while that of Section 5 is Deuschel and Stroock's [14]; both are comprehensive and readable texts on the subject. The book [16] of Ellis is also a good introduction to large deviations combined with statistical mechanics; the terminology of three levels was used there. Section 6 is concerned with the large deviation of level-2 for the empirical eigenvalue distribution of random Hermitian matrices (typically, the GUE or Gaussian unitary ensemble), which was first presented by Ben-Arous and Guionnet [2] in the course of the development of free probability theory. It seems somewhat strange to the author, however, that the large deviation for random matrices did not emerge before Voiculescu's discovery of free probability theory, since random matrix theory has a long history predating free probability. The exposition of Section 6 is taken from [23]. In Section 7 recent developments on quantum large deviations in quantum spin chains are surveyed without proofs. The large deviation of level-1 in one-dimensional quantum spin chains was proved by Ogata [32], after some attempts in [21, 27, 31], when the reference state is the Gibbs state for a finite-range interaction or a $C^*$-finitely correlated state.
However, it seems that the quantum large deviation of level-2 (i.e., the quantum version of the Sanov theorem) is still not complete, in spite of some attempts in [8, 9]. Among the many applications of large deviations, those to Boltzmann-Gibbs entropy/mutual entropy and free entropy/orbital free entropy are exemplified in Section 8, which may serve as a very brief introduction to the microstate approach to both classical and free probability theories. The microstate approach to free entropy was developed in [39, 40], and its orbital version in [19] (see also [44]).
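Formula (0.1) for the relative entropy with respect to $\mu_0$ can be sanity-checked in a case where everything is explicit. Assuming $\nu=N(m,s^2)$ (an illustrative choice, not taken from the text), one has $H(\nu)=\frac12\log(2\pi e s^2)$ and $\int x^2\,d\nu=m^2+s^2$, so (0.1) gives $S(\nu\|\mu_0)=\frac{m^2+s^2-1}{2}-\log s$. The Python sketch below compares this with a direct trapezoidal evaluation of $\int p\log(p/\varphi)\,dx$, where $\varphi$ is the standard Gaussian density; the helper names, the integration range and the step count are ad hoc choices of ours.

```python
import math


def relative_entropy_numeric(m: float, s: float, lo: float = -30.0, hi: float = 30.0,
                             steps: int = 20_000) -> float:
    """Trapezoidal approximation of S(nu || mu0) = int p log(p / phi) dx for nu = N(m, s^2)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        p = math.exp(-(x - m) ** 2 / (2 * s * s)) / (math.sqrt(2 * math.pi) * s)
        # log(p / phi) written out analytically, to avoid dividing very small numbers
        log_ratio = -(x - m) ** 2 / (2 * s * s) - math.log(s) + x * x / 2
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * p * log_ratio
    return total * h


def relative_entropy_formula(m: float, s: float) -> float:
    """Right-hand side of (0.1) for nu = N(m, s^2)."""
    entropy = 0.5 * math.log(2 * math.pi * math.e * s * s)  # H(nu)
    second_moment = m * m + s * s                           # int x^2 dnu(x)
    return -entropy + 0.5 * second_moment + 0.5 * math.log(2 * math.pi)
```

For example, `relative_entropy_formula(1.0, 0.5)` equals $\frac18+\log2$, and the numerical integral agrees with it to many digits.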
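1. Definitions and Generalities
Let $\mathcal{X}$ be a Hausdorff topological space and $\mathcal{B}_{\mathcal{X}}$ the Borel $\sigma$-field on $\mathcal{X}$. Let $M_1(\mathcal{X})$ denote the set of all probability measures on $(\mathcal{X},\mathcal{B}_{\mathcal{X}})$. Let $(\mu_n)$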
be a sequence of $\mu_n\in M_1(\mathcal{X})$, $n\in\mathbb{N}$. Let $(\varepsilon_n)$ be a sequence of numbers with $0<\varepsilon_n\to0$ as $n\to\infty$. In most cases, we take $\varepsilon_n=1/n$ or sometimes $\varepsilon_n=1/n^2$. Moreover, let $I:\mathcal{X}\to[0,\infty]$ be a lower semicontinuous function.
Definition 1.1. The sequence $(\mu_n)$ is said to satisfy the (full) large deviation principle (LDP) in the scale $\varepsilon_n$ with the rate function $I$ if, for every $\Gamma\in\mathcal{B}_{\mathcal{X}}$,
$$-\inf_{x\in\Gamma^\circ}I(x)\le\liminf_{n\to\infty}\varepsilon_n\log\mu_n(\Gamma)\le\limsup_{n\to\infty}\varepsilon_n\log\mu_n(\Gamma)\le-\inf_{x\in\overline{\Gamma}}I(x),$$
where $\Gamma^\circ$ and $\overline{\Gamma}$ denote the interior and the closure of $\Gamma$, respectively. It is straightforward to see that the above definition is equivalent to saying that the following two properties hold:
(a) for every closed $F\subset\mathcal{X}$,
$$\limsup_{n\to\infty}\varepsilon_n\log\mu_n(F)\le-\inf_{x\in F}I(x),$$
(b) for every open $G\subset\mathcal{X}$,
$$\liminf_{n\to\infty}\varepsilon_n\log\mu_n(G)\ge-\inf_{x\in G}I(x).$$
Inequalities (a) and (b) are called the large deviations upper bound and the large deviations lower bound, respectively.
Definition 1.2. The sequence $(\mu_n)$ is said to satisfy the weak LDP in the scale $\varepsilon_n$ with the rate function $I$ if the lower bound in (b) holds for every open $G\subset\mathcal{X}$ and the upper bound in (a) holds for every compact subset of $\mathcal{X}$.
Definition 1.3. The rate function $I$ is said to be good if the level set $\{x\in\mathcal{X}: I(x)\le\alpha\}$ is compact in $\mathcal{X}$ for every $\alpha\in[0,\infty)$.
Definition 1.4. The sequence $(\mu_n)$ of probability measures on $(\mathcal{X},\mathcal{B}_{\mathcal{X}})$ is said to be exponentially tight (with respect to the scale $\varepsilon_n$) if for every $\alpha>0$ there is a compact $K_\alpha\subset\mathcal{X}$ such that
$$\limsup_{n\to\infty}\varepsilon_n\log\mu_n(K_\alpha^c)<-\alpha. \tag{1.1}$$
The next lemma shows that exponential tightness plays a crucial role in proving the LDP.
Lemma 1.5. Assume that $(\mu_n)$ is exponentially tight. Then:
(1) If the upper bound in (a) holds for every compact subset of $\mathcal{X}$, then it also holds for every closed subset of $\mathcal{X}$.
(2) If the lower bound in (b) holds for every open subset of $\mathcal{X}$, then $I$ is good.
Consequently, if $(\mu_n)$ is exponentially tight and satisfies the weak LDP with a rate function $I$, then $I$ is a good rate function and $(\mu_n)$ satisfies the LDP.
Proof. (1) Let $F\subset\mathcal{X}$ be closed. To show the upper bound for $F$, we may assume that $\beta:=\inf_{x\in F}I(x)\in(0,\infty]$. Let $0<\alpha<\beta$ be arbitrary and take a compact $K_\alpha\subset\mathcal{X}$ as in Definition 1.4. Since $F\cap K_\alpha$ is compact,
$$\limsup_{n\to\infty}\varepsilon_n\log\mu_n(F\cap K_\alpha)\le-\inf_{x\in F\cap K_\alpha}I(x)\le-\beta<-\alpha,$$
and we also have (1.1). Hence, for every sufficiently large $n$,
$$\mu_n(F)\le\mu_n(F\cap K_\alpha)+\mu_n(K_\alpha^c)\le e^{-\alpha/\varepsilon_n}+e^{-\alpha/\varepsilon_n}=2e^{-\alpha/\varepsilon_n},$$
so that $\limsup_{n\to\infty}\varepsilon_n\log\mu_n(F)\le-\alpha$. Letting $\alpha\nearrow\beta$ gives the conclusion.
(2) Apply the lower bound to the open set $K_\alpha^c$ given in Definition 1.4 to obtain
$$-\inf_{x\in K_\alpha^c}I(x)\le\liminf_{n\to\infty}\varepsilon_n\log\mu_n(K_\alpha^c)<-\alpha,$$
so that $\inf_{x\in K_\alpha^c}I(x)>\alpha$. This means that $\{x: I(x)\le\alpha\}\subset K_\alpha$. Hence $\{x: I(x)\le\alpha\}$ is compact, since it is closed due to the lower semicontinuity of $I$.
Theorem 1.6. Assume that $\mathcal{X}$ is a regular topological space and that $(\mu_n)\subset M_1(\mathcal{X})$ satisfies the LDP with a rate function $I$. Then $I$ is the unique rate function associated with the LDP of $(\mu_n)$.
Proof. Suppose $\tilde I$ is also a rate function associated with $(\mu_n)$. Since $I$ is lower semicontinuous, note that
$$I(x)=\sup_G\inf_{y\in G}I(y)$$
for every $x\in\mathcal{X}$, where $G$ runs over neighborhoods of $x$. Hence for any $\delta>0$ there is a neighborhood $G_1$ of $x$ such that $\inf_{y\in G_1}I(y)>\min\{I(x)-\delta,1/\delta\}$. Since $\mathcal{X}$ is regular, there is an open neighborhood $G$ of $x$ such that $\overline{G}\subset G_1$. From the LDP of $(\mu_n)$ with both rate functions $I$ and $\tilde I$, we have
$$-\inf_{y\in\overline{G}}I(y)\ge\limsup_{n\to\infty}\varepsilon_n\log\mu_n(\overline{G})\ge\liminf_{n\to\infty}\varepsilon_n\log\mu_n(G)\ge-\inf_{y\in G}\tilde I(y),$$
so that
$$\tilde I(x)\ge\inf_{y\in G}\tilde I(y)\ge\inf_{y\in\overline{G}}I(y)>\min\{I(x)-\delta,1/\delta\}.$$
Letting $\delta\searrow0$ gives $\tilde I(x)\ge I(x)$. Exchanging the roles of $I$ and $\tilde I$, we have $\tilde I=I$.
The next theorem provides a practical way to show the weak LDP.
Theorem 1.7. Let $\mathcal{A}$ be an open base of $\mathcal{X}$, and let $(\mu_n)\subset M_1(\mathcal{X})$ be given. For every $x\in\mathcal{X}$ define
$$I(x):=\sup_{A\in\mathcal{A}:\,x\in A}\Big\{-\liminf_{n\to\infty}\varepsilon_n\log\mu_n(A)\Big\}.$$
Assume that, for every $x\in\mathcal{X}$,
$$I(x)=\sup_{A\in\mathcal{A}:\,x\in A}\Big\{-\limsup_{n\to\infty}\varepsilon_n\log\mu_n(A)\Big\}.$$
Then $(\mu_n)$ satisfies the weak LDP with the rate function $I$.
Proof. From the definition of $I$ it is immediate that $I$ is a nonnegative and lower semicontinuous function on $\mathcal{X}$. For any open $G\subset\mathcal{X}$ and any $x\in G$, there is an $A\in\mathcal{A}$ such that $x\in A\subset G$, and so
$$\liminf_{n\to\infty}\varepsilon_n\log\mu_n(G)\ge\liminf_{n\to\infty}\varepsilon_n\log\mu_n(A)\ge-I(x).$$
Hence the lower bound (b) holds. Next, let $K\subset\mathcal{X}$ be compact. For each $\delta>0$ define
$$I^\delta(x):=\min\{I(x)-\delta,1/\delta\},\qquad x\in\mathcal{X}.$$
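For every $x\in K$ there is an $A_x\in\mathcal{A}$ such that $x\in A_x$ and
$$-\limsup_{n\to\infty}\varepsilon_n\log\mu_n(A_x)\ge I^\delta(x).$$
Choose $x_1,\dots,x_m\in K$ such that $K\subset\bigcup_{i=1}^mA_{x_i}$, so that $\mu_n(K)\le\sum_{i=1}^m\mu_n(A_{x_i})$. Therefore,
$$\begin{aligned}\limsup_{n\to\infty}\varepsilon_n\log\mu_n(K)&\le\limsup_{n\to\infty}\varepsilon_n\log\Big(m\max_{1\le i\le m}\mu_n(A_{x_i})\Big)\\&=\limsup_{n\to\infty}\Big(\varepsilon_n\log m+\max_{1\le i\le m}\varepsilon_n\log\mu_n(A_{x_i})\Big)\\&=\max_{1\le i\le m}\limsup_{n\to\infty}\varepsilon_n\log\mu_n(A_{x_i})\\&\le-\min_{1\le i\le m}I^\delta(x_i)\le-\inf_{x\in K}I^\delta(x).\end{aligned}$$
Letting $\delta\searrow0$ gives the upper bound (a) for compact sets.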
The next theorem is a partial converse of Theorem 1.7, and it gives another proof of Theorem 1.6.
Theorem 1.8. Assume that $\mathcal{X}$ is a regular topological space, and let $\mathcal{A}$ be an open base of $\mathcal{X}$. Assume that $(\mu_n)\subset M_1(\mathcal{X})$ satisfies the LDP with a rate function $I$. Then for every $x\in\mathcal{X}$,
$$I(x)=\sup_{A\in\mathcal{A}:\,x\in A}\Big\{-\liminf_{n\to\infty}\varepsilon_n\log\mu_n(A)\Big\}=\sup_{A\in\mathcal{A}:\,x\in A}\Big\{-\limsup_{n\to\infty}\varepsilon_n\log\mu_n(A)\Big\}.$$
Proof. Let $x\in\mathcal{X}$ be arbitrary. The large deviations lower bound implies that, for every $A\in\mathcal{A}$ with $x\in A$,
$$\liminf_{n\to\infty}\varepsilon_n\log\mu_n(A)\ge-\inf_{y\in A}I(y)\ge-I(x).$$
Hence
$$I(x)\ge\sup_{A\in\mathcal{A}:\,x\in A}\Big\{-\liminf_{n\to\infty}\varepsilon_n\log\mu_n(A)\Big\}.$$
On the other hand, the large deviations upper bound implies that
$$\limsup_{n\to\infty}\varepsilon_n\log\mu_n(A)\le-\inf_{y\in\overline{A}}I(y) \tag{1.2}$$
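so that
$$\sup_{A\in\mathcal{A}:\,x\in A}\Big\{-\limsup_{n\to\infty}\varepsilon_n\log\mu_n(A)\Big\}\ge\sup_{A\in\mathcal{A}:\,x\in A}\inf_{y\in\overline{A}}I(y). \tag{1.3}$$
Assume $I(x)>\alpha$. Note that $\{y\in\mathcal{X}: I(y)>\alpha\}$ is open due to the lower semicontinuity of $I$. Since $\mathcal{X}$ is regular, there is an $A\in\mathcal{A}$ such that $x\in A\subset\overline{A}\subset\{y\in\mathcal{X}: I(y)>\alpha\}$, and so $\inf_{y\in\overline{A}}I(y)\ge\alpha$. Since $\alpha<I(x)$ is arbitrary, we obtain
$$\sup_{A\in\mathcal{A}:\,x\in A}\inf_{y\in\overline{A}}I(y)\ge I(x). \tag{1.4}$$
Combining (1.2)–(1.4) yields the conclusion.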
The next theorem says that the LDP is preserved under continuous maps. The proof is an easy exercise.
Theorem 1.9 (Contraction principle). Let $\mathcal{X}$ and $\mathcal{Y}$ be Hausdorff topological spaces and $T:\mathcal{X}\to\mathcal{Y}$ a continuous map. Assume that $(\mu_n)\subset M_1(\mathcal{X})$ satisfies the LDP with a good rate function $I:\mathcal{X}\to[0,\infty]$. Then $(\mu_n\circ T^{-1})\subset M_1(\mathcal{Y})$ satisfies the LDP with the good rate function
$$I^T(y):=\inf\{I(x): x\in T^{-1}(y)\},\qquad y\in\mathcal{Y},$$
where $\inf\emptyset=\infty$ as usual.
In the Introduction, it was claimed heuristically that the LDP is a stronger version (with exponentially fast convergence) of the law of large numbers. The next theorem may be regarded as a rigorous statement of this.
Theorem 1.10. Let $(Y_n)$ be a sequence of $\mathcal{X}$-valued random variables on a probability space $(\Omega,P)$, and let $\mu_n:=\mu_{Y_n}\in M_1(\mathcal{X})$ be the distribution of $Y_n$. Assume that $(\mu_n)$ satisfies the LDP in the scale $\varepsilon_n$ with a good rate function $I$ having a unique minimizer $x_0$. Moreover, assume that $x_0$ has a countable neighborhood base and that $\sum_{n=1}^\infty r^{1/\varepsilon_n}<\infty$ for all $r\in(0,1)$ (this is the case if $\varepsilon_n=1/n$). Then $(Y_n)$ converges to $x_0$ almost surely.
Proof. First, note that the LDP implies that $\inf_{x\in\mathcal{X}}I(x)=0$ (take $F=\mathcal{X}$ in the upper bound (a)), and hence $I(x_0)=0$. Choose a decreasing sequence $G_1\supset G_2\supset\cdots$ of open neighborhoods of $x_0$ forming a neighborhood base at $x_0$. Since $I$ is good and $x_0$ is its unique minimizer, we have $\delta_k:=\inf_{x\in G_k^c}I(x)\in(0,\infty]$ for every $k\in\mathbb{N}$. Hence
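for each $k$ we have
$$\limsup_{n\to\infty}\varepsilon_n\log\mu_n(G_k^c)\le-\inf_{x\in G_k^c}I(x)=-\delta_k.$$
Choosing any finite $\delta_k'\in(0,\delta_k)$, we obtain $\mu_n(G_k^c)\le e^{-\delta_k'/\varepsilon_n}$ for all sufficiently large $n$, so that $\sum_{n=1}^\infty\mu_n(G_k^c)<\infty$ by the assumption on $(\varepsilon_n)$ with $r=e^{-\delta_k'}$. By the Borel-Cantelli lemma, this implies that
$$P\Big(\limsup_n\{\omega: Y_n(\omega)\in G_k^c\}\Big)=0.$$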
Let $N$ be the union of the above limsup sets over all $k$. Then $P(N)=0$, and it is easy to see that $Y_n(\omega)\to x_0$ as $n\to\infty$ for all $\omega\in N^c$.
2. The Cramér Theorem
Let $X=(X_1,X_2,\dots)$ be a sequence of i.i.d. real random variables on a probability space $(\Omega,P)$ with distribution $\mu:=\mu_{X_1}\in M_1(\mathbb{R})$. For each $n\in\mathbb{N}$ define
$$S_n:=\frac1n\sum_{i=1}^nX_i\quad\text{and}\quad\mu_n=\mu_n^X:=\mu_{S_n}\in M_1(\mathbb{R}).$$
We define
$$\Lambda(\lambda)=\Lambda_X(\lambda):=\log E(e^{\lambda X_1})=\log\int_{\mathbb{R}}e^{\lambda x}\,d\mu(x),\qquad\lambda\in\mathbb{R},$$
where $E(\cdot)$ is the expectation on $(\Omega,P)$ as usual. The above $\Lambda(\lambda)$ is well defined as a function on $\mathbb{R}$ with values in $(-\infty,\infty]$ and is called the logarithmic moment generating function or cumulant generating function. The Fenchel-Legendre transform of $\Lambda$ is defined as
$$\Lambda^*(x)=\Lambda_X^*(x):=\sup_{\lambda\in\mathbb{R}}\{\lambda x-\Lambda(\lambda)\},\qquad x\in\mathbb{R}.$$
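When $\mu$ has finitely many atoms, $\Lambda$ is a finite log-sum-exp and $\Lambda^*$ can be approximated by maximizing $\lambda x-\Lambda(\lambda)$ over a grid of $\lambda$ values. The Python sketch below does this; the helper names, the grid range `lam_max` and the resolution are our own assumptions, and the grid value is only a lower approximation of the supremum (it is close only when the maximizer lies well inside the grid). For a Bernoulli($p$) law the result can be compared with the well-known closed form $\Lambda^*(x)=x\log\frac{x}{p}+(1-x)\log\frac{1-x}{1-p}$ for $0<x<1$.

```python
import math


def log_mgf(atoms, probs, lam):
    """Lambda(lam) = log sum_k p_k exp(lam * a_k), evaluated stably (log-sum-exp)."""
    top = max(lam * a for a in atoms)
    return top + math.log(sum(p * math.exp(lam * a - top) for a, p in zip(atoms, probs)))


def legendre(atoms, probs, x, lam_max=40.0, steps=40_001):
    """Grid lower approximation of Lambda*(x) = sup_lam {lam * x - Lambda(lam)}."""
    best = -math.inf
    for i in range(steps):
        lam = -lam_max + 2 * lam_max * i / (steps - 1)
        best = max(best, lam * x - log_mgf(atoms, probs, lam))
    return best


def bernoulli_rate(x, p):
    """Closed form of Lambda* for Bernoulli(p), valid for 0 < x < 1."""
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))
```

For instance, `legendre([0, 1], [0.7, 0.3], 0.3)` is numerically $0$, in line with Lemma 2.1 (6) below since $\bar x=0.3$; for $x$ outside $[0,1]$, where the true $\Lambda^*(x)$ is $+\infty$, the grid value merely grows with `lam_max`.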
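Lemma 2.1. $\Lambda$ and $\Lambda^*$ have the following properties.
(1) $\Lambda$ is convex and lower semicontinuous with $-\infty<\Lambda(\lambda)\le\infty$ and $\Lambda(0)=0$.
(2) $\Lambda^*$ is convex and lower semicontinuous with $0\le\Lambda^*(x)\le\infty$.
(3) If $\operatorname{dom}\Lambda:=\{\lambda\in\mathbb{R}:\Lambda(\lambda)<\infty\}=\{0\}$, then $\Lambda^*\equiv0$.
(4) If $\operatorname{dom}\Lambda\cap(0,\infty)\ne\emptyset$, then $\bar x:=E(X_1)$ exists in $[-\infty,\infty)$. Also, if $\operatorname{dom}\Lambda\cap(-\infty,0)\ne\emptyset$, then $\bar x$ exists in $(-\infty,\infty]$.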
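(5) When $\bar x\in[-\infty,\infty)$, for every $x\ge\bar x$,
$$\Lambda^*(x)=\sup_{\lambda\ge0}\{\lambda x-\Lambda(\lambda)\}, \tag{2.1}$$
and $\Lambda^*(x)$ is non-decreasing for $x\ge\bar x$. Also, when $\bar x\in(-\infty,\infty]$, for every $x\le\bar x$,
$$\Lambda^*(x)=\sup_{\lambda\le0}\{\lambda x-\Lambda(\lambda)\},$$
and $\Lambda^*(x)$ is non-increasing for $x\le\bar x$.
(6) When $\bar x\in(-\infty,\infty)$, $\Lambda^*(\bar x)=0$.
(7) $\inf_{x\in\mathbb{R}}\Lambda^*(x)=0$.
(8) $\Lambda$ is differentiable in $(\operatorname{dom}\Lambda)^\circ$ and
$$\Lambda'(\eta)=\frac{E(X_1e^{\eta X_1})}{E(e^{\eta X_1})},\qquad\eta\in(\operatorname{dom}\Lambda)^\circ.$$
Proof. (1) By Hölder's inequality, for $0<\alpha<1$,
$$\Lambda(\alpha\lambda+(1-\alpha)\lambda')=\log E\big((e^{\lambda X_1})^\alpha(e^{\lambda'X_1})^{1-\alpha}\big)\le\log\Big[\big(E(e^{\lambda X_1})\big)^\alpha\big(E(e^{\lambda'X_1})\big)^{1-\alpha}\Big]=\alpha\Lambda(\lambda)+(1-\alpha)\Lambda(\lambda').$$
The lower semicontinuity of $\Lambda$ is immediate from Fatou's lemma, and the other properties are obvious.
(2) $\Lambda^*$ is convex and lower semicontinuous by definition (as a supremum of affine functions), and $\Lambda^*(x)\ge0\cdot x-\Lambda(0)=0$.
(3) is obvious.
(4) If $\Lambda(\lambda)<\infty$ for some $\lambda>0$, then
$$\int_{[0,\infty)}x\,d\mu(x)=E(X_11_{\{X_1\ge0\}})\le E(e^{\lambda X_1})/\lambda<\infty,$$
and hence $\bar x<\infty$ exists. The second assertion is similar.
(5) When $\bar x$ exists, for every $\lambda\in\mathbb{R}$, Jensen's inequality gives
$$\Lambda(\lambda)=\log E(e^{\lambda X_1})\ge E(\log e^{\lambda X_1})=\lambda\bar x. \tag{2.2}$$
When $\bar x=-\infty$, $\Lambda(\lambda)=\infty$ for all $\lambda<0$ by (2.2), and hence (2.1) is clear. When $\bar x\in(-\infty,\infty)$, for every $x\ge\bar x$ and $\lambda<0$, by (2.2), $\lambda x-\Lambda(\lambda)\le\lambda\bar x-\Lambda(\lambda)\le0$, and hence (2.1) holds (the value $0$ is attained at $\lambda=0$). Since $\lambda x-\Lambda(\lambda)$ is non-decreasing in $x$ if $\lambda\ge0$, $\Lambda^*(x)$ is also non-decreasing for $x\ge\bar x$ by (2.1). The latter assertion is similar, or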
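we may replace $X_i$ with $-X_i$, so that
$$\Lambda_X(\lambda)=\Lambda_{-X}(-\lambda),\qquad\Lambda_X^*(x)=\sup_{\lambda\in\mathbb{R}}\{(-\lambda)(-x)-\Lambda_{-X}(-\lambda)\}=\Lambda_{-X}^*(-x).$$
(6) When $\bar x\in(-\infty,\infty)$, (2.2) gives
$$\Lambda^*(\bar x)=\sup_{\lambda\in\mathbb{R}}\{\lambda\bar x-\Lambda(\lambda)\}\le0,$$
so that $\Lambda^*(\bar x)=0$ by (2).
(7) By (3), (4) and (6), we may assume that $\operatorname{dom}\Lambda\cap(0,\infty)\ne\emptyset$ and $\bar x=-\infty$, or $\operatorname{dom}\Lambda\cap(-\infty,0)\ne\emptyset$ and $\bar x=\infty$. When $\operatorname{dom}\Lambda\cap(0,\infty)\ne\emptyset$ and $\bar x=-\infty$, for every $x$,
$$\log\mu([x,\infty))\le\inf_{\lambda\ge0}\log E\big(e^{\lambda(X_1-x)}\big)=-\sup_{\lambda\ge0}\{\lambda x-\Lambda(\lambda)\}=-\Lambda^*(x)$$
thanks to (2.1). Hence
$$\lim_{x\to-\infty}\Lambda^*(x)\le-\lim_{x\to-\infty}\log\mu([x,\infty))=0.$$
The other case is similar, or replace $X_i$ with $-X_i$.
(8) Assume $\eta\in(\operatorname{dom}\Lambda)^\circ$ and notice that
$$\frac{E(e^{(\eta+\varepsilon)X_1})-E(e^{\eta X_1})}{\varepsilon}=E\Big(e^{\eta X_1}\frac{e^{\varepsilon X_1}-1}{\varepsilon}\Big)$$
and
$$e^{\eta X_1}\frac{e^{\varepsilon X_1}-1}{\varepsilon}\longrightarrow X_1e^{\eta X_1}\quad\text{(pointwise) as }\varepsilon\to0.$$
Choose a $\delta>0$ with $\eta\pm\delta\in\operatorname{dom}\Lambda$. Then $e^{\eta X_1}e^{\delta|X_1|}$ is integrable since
$$e^{\eta X_1}e^{\delta|X_1|}=e^{(\eta+\delta)X_1}1_{\{X_1\ge0\}}+e^{(\eta-\delta)X_1}1_{\{X_1<0\}}.$$
Moreover, when $\varepsilon\in(-\delta,\delta)$,
$$\Big|e^{\eta X_1}\frac{e^{\varepsilon X_1}-1}{\varepsilon}\Big|\le e^{\eta X_1}\sum_{k=1}^\infty\frac{|\varepsilon|^{k-1}|X_1|^k}{k!}\le e^{\eta X_1}\sum_{k=1}^\infty\frac{\delta^{k-1}|X_1|^k}{k!}\le\frac{e^{\eta X_1}e^{\delta|X_1|}}{\delta}.$$
Hence Lebesgue's dominated convergence theorem shows that $E(e^{\eta X_1})$ is differentiable at $\eta$ with derivative $E(X_1e^{\eta X_1})$, and the conclusion follows by the chain rule.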
Theorem 2.2 (Cramér). With the above definitions and assumptions, $(\mu_n)$ satisfies the LDP with the rate function $\Lambda^*$, namely,
(a) (upper bound) for every closed set $F\subset\mathbb{R}$,
$$\limsup_{n\to\infty}\frac1n\log\mu_n(F)\le-\inf_{x\in F}\Lambda^*(x);$$
(b) (lower bound) for every open set $G\subset\mathbb{R}$,
$$\liminf_{n\to\infty}\frac1n\log\mu_n(G)\ge-\inf_{x\in G}\Lambda^*(x).$$
Proof. (a) For any nonempty closed $F\subset\mathbb{R}$ set $\Lambda_F^*:=\inf_{x\in F}\Lambda^*(x)$. The conclusion being trivial if $\Lambda_F^*=0$, we assume $\Lambda_F^*>0$. Then $\bar x$ exists in $[-\infty,\infty]$ by Lemma 2.1 (3) and (4). For every $x$ and $\lambda\ge0$, notice that
$$\mu_n([x,\infty))\le E(e^{n\lambda(S_n-x)})=e^{-n\lambda x}E(e^{\lambda X_1})^n=e^{-n\{\lambda x-\Lambda(\lambda)\}}.$$
Similarly, for every $x$ and $\lambda\le0$,
$$\mu_n((-\infty,x])\le E(e^{n\lambda(S_n-x)})=e^{-n\{\lambda x-\Lambda(\lambda)\}}.$$
Therefore, by Lemma 2.1 (5) we have
$$\mu_n([x,\infty))\le e^{-n\Lambda^*(x)}\quad\text{for every }x\ge\bar x\text{ if }\bar x<\infty, \tag{2.3}$$
$$\mu_n((-\infty,x])\le e^{-n\Lambda^*(x)}\quad\text{for every }x\le\bar x\text{ if }\bar x>-\infty. \tag{2.4}$$
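First assume $\bar x\in(-\infty,\infty)$. If $\bar x\in F$, then by Lemma 2.1 (6) we have $\Lambda^*(\bar x)=0=\Lambda_F^*$, contradicting the assumption $\Lambda_F^*>0$. Hence $\bar x\notin F$; let $(x_-,x_+)$ be the component of $\mathbb{R}\setminus F$ containing $\bar x$. Then $x_-<x_+$, and $x_->-\infty$ or $x_+<\infty$. If $x_+<\infty$, then $x_+\in F$ and $\Lambda^*(x_+)\ge\Lambda_F^*$, and if $x_->-\infty$, then $x_-\in F$ and $\Lambda^*(x_-)\ge\Lambda_F^*$. When $x_+<\infty$, by (2.3),
$$\mu_n([x_+,\infty))\le e^{-n\Lambda^*(x_+)}\le e^{-n\Lambda_F^*},$$
and when $x_->-\infty$, by (2.4),
$$\mu_n((-\infty,x_-])\le e^{-n\Lambda^*(x_-)}\le e^{-n\Lambda_F^*}.$$
Hence (with the convention that $(-\infty,x_-]=\emptyset$ if $x_-=-\infty$ and $[x_+,\infty)=\emptyset$ if $x_+=\infty$)
$$\mu_n(F)\le\mu_n((-\infty,x_-])+\mu_n([x_+,\infty))\le2e^{-n\Lambda_F^*},$$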
which gives
$$\limsup_{n\to\infty}\frac1n\log\mu_n(F)\le-\Lambda_F^*.$$
Next assume $\bar x=-\infty$. Since $\lim_{x\to-\infty}\Lambda^*(x)=0$ by Lemma 2.1 (5) and (7), we have $x_0:=\inf F>-\infty$ from the assumption $\Lambda_F^*>0$. Hence $x_0\in F$ and $\Lambda^*(x_0)\ge\Lambda_F^*$. Applying (2.3) with $x=x_0$, we obtain
$$\limsup_{n\to\infty}\frac1n\log\mu_n(F)\le\limsup_{n\to\infty}\frac1n\log\mu_n([x_0,\infty))\le-\Lambda^*(x_0)\le-\Lambda_F^*.$$
The case $\bar x=\infty$ is similar.
(b) To prove (b), it suffices to show that, for every $x\in\mathbb{R}$ and $\delta>0$,
$$\liminf_{n\to\infty}\frac1n\log\mu_n((x-\delta,x+\delta))\ge-\Lambda^*(x). \tag{2.5}$$
Indeed, let $G\subset\mathbb{R}$ be any nonempty open set. For every $x\in G$, choose a $\delta>0$ with $(x-\delta,x+\delta)\subset G$; then (2.5) implies that
$$\liminf_{n\to\infty}\frac1n\log\mu_n(G)\ge\liminf_{n\to\infty}\frac1n\log\mu_n((x-\delta,x+\delta))\ge-\Lambda^*(x).$$
Taking the supremum over $x\in G$, we have
$$\liminf_{n\to\infty}\frac1n\log\mu_n(G)\ge-\inf_{x\in G}\Lambda^*(x).$$
Furthermore, it suffices to show (2.5) when $x=0$:
$$\liminf_{n\to\infty}\frac1n\log\mu_n((-\delta,\delta))\ge-\Lambda^*(0)=\inf_{\lambda\in\mathbb{R}}\Lambda(\lambda). \tag{2.6}$$
Indeed, if we take $Y_i:=X_i-x$, then
$$\mu_n^Y((-\delta,\delta))=\mu_n^X((x-\delta,x+\delta)),\qquad\Lambda_Y(\lambda)=\Lambda_X(\lambda)-\lambda x,\qquad\Lambda_Y^*(0)=\Lambda_X^*(x).$$
We divide the proof of (2.6) into three cases.
(I) Assume that $\mu((-\infty,0))>0$, $\mu((0,\infty))>0$ and $\operatorname{supp}\mu$ is bounded. The first two assumptions imply that $\Lambda(\lambda)\to\infty$ as $|\lambda|\to\infty$, and the third
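implies that $\Lambda(\lambda)<\infty$ for all $\lambda\in\mathbb{R}$. Since $\Lambda$ is a $C^1$ convex function on $\mathbb{R}$ thanks to Lemma 2.1 (1) and (8), there is an $\eta\in\mathbb{R}$ such that $\Lambda(\eta)=\inf_{\lambda\in\mathbb{R}}\Lambda(\lambda)$ and $\Lambda'(\eta)=0$. Define $d\tilde\mu(x):=e^{\eta x-\Lambda(\eta)}\,d\mu(x)$, which belongs to $M_1(\mathbb{R})$ since
$$\int_{\mathbb{R}}d\tilde\mu(x)=\frac{1}{e^{\Lambda(\eta)}}\int_{\mathbb{R}}e^{\eta x}\,d\mu(x)=\frac{E(e^{\eta X_1})}{e^{\Lambda(\eta)}}=1.$$
Consider i.i.d. $\tilde X_1,\tilde X_2,\dots$ with $\mu_{\tilde X_1}=\tilde\mu$, and write $\tilde\mu_n:=\mu_n^{\tilde X}$. For any $\varepsilon>0$ we have
$$\begin{aligned}\mu_n((-\varepsilon,\varepsilon))&=\int\cdots\int_{\{|\sum_{i=1}^nx_i|<n\varepsilon\}}d\mu(x_1)\cdots d\mu(x_n)\\&\ge e^{-n\varepsilon|\eta|}\int\cdots\int_{\{|\sum_{i=1}^nx_i|<n\varepsilon\}}e^{\eta\sum_{i=1}^nx_i}\,d\mu(x_1)\cdots d\mu(x_n)\\&=e^{-n\varepsilon|\eta|}e^{n\Lambda(\eta)}\tilde\mu_n((-\varepsilon,\varepsilon))\end{aligned}$$
and
$$E(\tilde X_1)=\int_{\mathbb{R}}x\,d\tilde\mu(x)=e^{-\Lambda(\eta)}\int_{\mathbb{R}}xe^{\eta x}\,d\mu(x)=\Lambda'(\eta)=0$$
thanks to Lemma 2.1 (8). For any $\varepsilon\in(0,\delta)$, since $\tilde\mu_n((-\varepsilon,\varepsilon))\to1$ as $n\to\infty$ by the weak law of large numbers, we obtain
$$\liminf_{n\to\infty}\frac1n\log\mu_n((-\delta,\delta))\ge\liminf_{n\to\infty}\frac1n\log\mu_n((-\varepsilon,\varepsilon))\ge-\varepsilon|\eta|+\Lambda(\eta).$$
Letting $\varepsilon\searrow0$ gives
$$\liminf_{n\to\infty}\frac1n\log\mu_n((-\delta,\delta))\ge\Lambda(\eta)=\inf_{\lambda\in\mathbb{R}}\Lambda(\lambda).$$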
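The change of measure used in case (I) can be made concrete for a law with finitely many atoms on both sides of $0$. The Python sketch below (all helper names are ours) locates the minimizer $\eta$ of $\Lambda$ by bisection on $\Lambda'$, which is increasing here because $\Lambda$ is strictly convex for a non-degenerate law, uses the formula $\Lambda'(\eta)=E(X_1e^{\eta X_1})/E(e^{\eta X_1})$ of Lemma 2.1 (8), and checks that the tilted law $\tilde\mu$ has mean $0$. The bisection bracket $[-50,50]$ is an assumption that is adequate only for atoms of moderate size.

```python
import math


def log_mgf(atoms, probs, lam):
    """Lambda(lam) = log sum_k p_k exp(lam * a_k) for a finitely supported law."""
    return math.log(sum(p * math.exp(lam * a) for a, p in zip(atoms, probs)))


def tilt(atoms, probs, eta):
    """Weights of the tilted law d(mu~) = exp(eta * x - Lambda(eta)) d(mu)."""
    weights = [p * math.exp(eta * a) for a, p in zip(atoms, probs)]
    total = sum(weights)  # equals exp(Lambda(eta))
    return [w / total for w in weights]


def dlog_mgf(atoms, probs, eta):
    """Lambda'(eta) = E(X e^{eta X}) / E(e^{eta X}), i.e. the mean of the tilted law."""
    return sum(a * q for a, q in zip(atoms, tilt(atoms, probs, eta)))


def minimizer(atoms, probs, lo=-50.0, hi=50.0, iters=200):
    """Bisection for eta with Lambda'(eta) = 0; needs atoms on both sides of 0."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if dlog_mgf(atoms, probs, mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    atoms, probs = [-1.0, 2.0], [0.5, 0.5]
    eta = minimizer(atoms, probs)
    print(eta, log_mgf(atoms, probs, eta), tilt(atoms, probs, eta))
```

For atoms $\{-1,2\}$ with equal weights one finds $\eta=-\frac13\log2$, $\inf_\lambda\Lambda(\lambda)=\log\frac32-\frac23\log2\approx-0.0566$ and tilted weights $(2/3,1/3)$, so in this example (2.6) reads $\liminf_n\frac1n\log\mu_n((-\delta,\delta))\ge-0.0566$ for every $\delta>0$.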
(II) Assume that $\mu((-\infty,0))>0$, $\mu((0,\infty))>0$ and $\operatorname{supp}\mu$ is unbounded. Choose an $M>0$ with $\mu([-M,0))>0$ and $\mu((0,M])>0$, and choose i.i.d. $\hat X_1,\hat X_2,\dots$ with $\mu_{\hat X_1}=\hat\mu:=\mu([-M,M])^{-1}\mu|_{[-M,M]}$. Write
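$\hat\mu_n:=\mu_n^{\hat X}$. We have
$$\begin{aligned}\mu_n((-\delta,\delta))&=\int\cdots\int_{\{|\sum_{i=1}^nx_i|<n\delta\}}d\mu(x_1)\cdots d\mu(x_n)\\&\ge\mu([-M,M])^n\int\cdots\int_{\{|\sum_{i=1}^nx_i|<n\delta\}}d\hat\mu(x_1)\cdots d\hat\mu(x_n)\\&=\mu([-M,M])^n\hat\mu_n((-\delta,\delta))\end{aligned}$$
and
$$\hat\Lambda(\lambda):=\log E(e^{\lambda\hat X_1})=\log\mu([-M,M])^{-1}\int_{[-M,M]}e^{\lambda x}\,d\mu(x)=\log\int_{[-M,M]}e^{\lambda x}\,d\mu(x)-\log\mu([-M,M]).$$
Therefore,
$$\begin{aligned}\liminf_{n\to\infty}\frac1n\log\mu_n((-\delta,\delta))&\ge\log\mu([-M,M])+\liminf_{n\to\infty}\frac1n\log\hat\mu_n((-\delta,\delta))\\&\ge\log\mu([-M,M])+\inf_{\lambda\in\mathbb{R}}\hat\Lambda(\lambda)=\inf_{\lambda\in\mathbb{R}}\Lambda_M(\lambda)\end{aligned}\tag{2.7}$$
from case (I) applied to $(\hat\mu_n)$, where $\Lambda_M(\lambda):=\log\int_{[-M,M]}e^{\lambda x}\,d\mu(x)$. Since $\Lambda_M(\lambda)\to\infty$ as $|\lambda|\to\infty$ and $\Lambda_M(\lambda)$ is increasing in $M$, we see that
$$\inf_{\lambda\in\mathbb{R}}\Lambda_M(\lambda)=\min_{\lambda\in\mathbb{R}}\Lambda_M(\lambda)\nearrow\alpha\quad\text{as }M\nearrow\infty$$
for some $\alpha\in(-\infty,0]$. By compactness,
$$\bigcap_{M>0}\{\lambda:\Lambda_M(\lambda)\le\alpha\}\ne\emptyset.$$
Taking a $\lambda_0$ from this set, we obtain
$$\Lambda(\lambda_0)=\lim_{M\to\infty}\log\int_{[-M,M]}e^{\lambda_0x}\,d\mu(x)\le\alpha,$$
so that by (2.7),
$$\liminf_{n\to\infty}\frac1n\log\mu_n((-\delta,\delta))\ge\alpha\ge\Lambda(\lambda_0)\ge\inf_{\lambda\in\mathbb{R}}\Lambda(\lambda).$$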
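(III) Assume that $\mu((-\infty,0))=0$ or $\mu((0,\infty))=0$. When $\mu((-\infty,0))=0$, $\Lambda(\lambda)$ is non-decreasing in $\lambda$ and
$$\inf_{\lambda\in\mathbb{R}}\Lambda(\lambda)=\lim_{\lambda\to-\infty}\Lambda(\lambda)=\lim_{\lambda\to-\infty}\log\int_{[0,\infty)}e^{\lambda x}\,d\mu(x)=\log\mu(\{0\}).$$
Since $\mu_n((-\delta,\delta))\ge\mu_n(\{0\})=\mu(\{0\})^n$, we obtain
$$\liminf_{n\to\infty}\frac1n\log\mu_n((-\delta,\delta))\ge\log\mu(\{0\})=\inf_{\lambda\in\mathbb{R}}\Lambda(\lambda).$$
The case $\mu((0,\infty))=0$ is similar.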
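For a Bernoulli($p$) law, both the bound (2.3) and the limit in Theorem 2.2 can be observed exactly, since $nS_n$ is binomial. The Python sketch below (helper names are ours) computes $\log\mu_n([x,\infty))$ in log space with `math.lgamma` and compares it with $-n\Lambda^*(x)$, using the Bernoulli closed form of $\Lambda^*$; the parameters $p=0.5$, $x=0.7$ are arbitrary illustrative choices with $x>\bar x=p$.

```python
import math


def log_binomial_upper_tail(n, p, x):
    """log P(S_n >= x) where n * S_n ~ Binomial(n, p), computed exactly in log space."""
    k0 = math.ceil(n * x - 1e-9)  # smallest k with k / n >= x (slack guards against rounding)
    terms = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
        for k in range(k0, n + 1)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def bernoulli_rate(x, p):
    """Lambda*(x) for Bernoulli(p), valid for 0 < x < 1."""
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))


if __name__ == "__main__":
    p, x = 0.5, 0.7
    for n in (10, 100, 1000):
        lhs = log_binomial_upper_tail(n, p, x)
        # (2.3) gives lhs <= -n * rate; Theorem 2.2 gives lhs / n -> -rate
        print(n, lhs, -n * bernoulli_rate(x, p), lhs / n)
```

The gap between $\frac1n\log\mu_n([x,\infty))$ and $-\Lambda^*(x)$ shrinks like $(\log n)/n$, which is invisible on the exponential scale of the theorem.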
For example, let $(X_1,X_2,\dots)$ be a sequence of i.i.d. real random variables with normal distribution $N(m,\sigma^2)$. Then it is easy to compute
$$E(e^{\lambda X_1})=\frac{1}{\sqrt{2\pi}\,\sigma}\int_{\mathbb{R}}e^{\lambda x}e^{-\frac{(x-m)^2}{2\sigma^2}}\,dx=\frac{e^{\lambda m+\lambda^2\sigma^2/2}}{\sqrt{2\pi}\,\sigma}\int_{\mathbb{R}}e^{-\frac{(x-m-\lambda\sigma^2)^2}{2\sigma^2}}\,dx=e^{\lambda m+\lambda^2\sigma^2/2}$$
and
$$\Lambda(\lambda)=\lambda m+\frac{\lambda^2\sigma^2}{2},\qquad\Lambda^*(x)=\sup_{\lambda\in\mathbb{R}}\Big\{\lambda x-\lambda m-\frac{\lambda^2\sigma^2}{2}\Big\}=\frac{(x-m)^2}{2\sigma^2}.$$
Hence $(\mu_n)$ satisfies the LDP with the good rate function $\Lambda^*(x)=(x-m)^2/(2\sigma^2)$. In particular, for any $\delta>0$ we have
$$\lim_{n\to\infty}\frac1n\log P(|S_n-m|\ge\delta)=-\frac{\delta^2}{2\sigma^2},$$
which can also be seen directly from the distribution $N(0,\sigma^2/n)$ of $S_n-m$.
A remarkable point in the above Cramér theorem is that there is no restriction on the distribution $\mu$ of $X_i$. In fact, even the existence of the mean $\bar x=E(X_1)$ is not assumed. On the other hand, note that the condition $0\in(\operatorname{dom}\Lambda)^\circ$, an essential assumption in the next section (see Assumption 3.1), is rather strict, because it implies that $E(e^{\eta X_1})<\infty$ if $|\eta|$ is small, so that $E(|X_1|^n)<\infty$ for all $n\in\mathbb{N}$. Furthermore, it is worth noticing the following.
Theorem 2.3. In the same situation as in Theorem 2.2, the following conditions are equivalent:
(i) $\Lambda^*$ is good;
(ii) $0\in(\operatorname{dom}\Lambda)^\circ$;
(iii) $(\mu_n)$ is exponentially tight.
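Proof. (ii) ⇒ (iii) will be proved in Lemma 3.3 in a more general situation. (iii) ⇒ (i) follows from Lemma 1.5, since we have the LDP by Theorem 2.2. Finally, we prove (i) ⇒ (ii). Thanks to Lemma 2.1 (1), Fenchel's duality (see Lemma 4.5) gives
$$\Lambda(\lambda) = \sup_{x\in\mathbb{R}}\{\lambda x - \Lambda^*(x)\},\qquad \lambda\in\mathbb{R}.$$
From (i) and Lemma 2.1 (7) there is an $x_0\in\mathbb{R}$ such that $\Lambda^*(x_0)=0$. Moreover, by convexity of $\Lambda^*$, there are $a,b\in\mathbb{R}$ with $a\le x_0\le b$ such that $\Lambda^*(x)\le1$ for $x\in[a,b]$ and $\Lambda^*(x)\ge1$ for $x\in\mathbb{R}\setminus[a,b]$. When $a<x_0$ and $\alpha := 1/(a-x_0) < 0$, we have
$$\Lambda^*(x)\ \begin{cases} \le \alpha(x-x_0) & \text{if } a\le x\le x_0,\\ \ge \alpha(x-x_0) & \text{if } x<a \text{ or } x>x_0.\end{cases}$$
Therefore,
$$\Lambda(\alpha) = \sup_{a\le x\le x_0}\{\alpha x - \Lambda^*(x)\} < \infty,$$
so that $\alpha\in\mathrm{dom}\,\Lambda$. When $a=x_0$, we must have $\Lambda^*(x)=\infty$ for all $x<x_0$, which implies that $(-\infty,0]\subset\mathrm{dom}\,\Lambda$. Similarly, when $b>x_0$, we have $1/(b-x_0)\in\mathrm{dom}\,\Lambda$; otherwise, $[0,\infty)\subset\mathrm{dom}\,\Lambda$. Since $\mathrm{dom}\,\Lambda$ is convex and contains $0$, (ii) follows.

3. The Gärtner-Ellis Theorem

Let $(Y_n)$ be a sequence of $\mathbb{R}^d$-valued random vectors, and let $\mu_n := \mu_{Y_n}\in M_1(\mathbb{R}^d)$ be the distribution of $Y_n$. For each $n\in\mathbb{N}$, define the logarithmic moment generating function
$$\Lambda_n(\lambda) := \log E(e^{\langle\lambda,Y_n\rangle}) = \log\int_{\mathbb{R}^d} e^{\langle\lambda,x\rangle}\,d\mu_n(x),\qquad \lambda\in\mathbb{R}^d,$$
where $\langle\lambda,x\rangle$ is the usual inner product in $\mathbb{R}^d$. Note that $\Lambda_n$ is well defined as a function on $\mathbb{R}^d$ with values in $(-\infty,\infty]$. We assume:

Assumption 3.1. For every $\lambda\in\mathbb{R}^d$, the limit
$$\Lambda(\lambda) := \lim_{n\to\infty}\frac1n\Lambda_n(n\lambda) \tag{3.1}$$
exists in $[-\infty,\infty]$. This function $\Lambda$ on $\mathbb{R}^d$ is called the limiting logarithmic moment generating function. Moreover, we assume that $0$ is in the interior $(\mathrm{dom}\,\Lambda)^\circ$ of $\mathrm{dom}\,\Lambda := \{\lambda\in\mathbb{R}^d : \Lambda(\lambda)<\infty\}$.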
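For i.i.d. sums, (3.1) reduces to the one-dimensional computation of Section 2. The following Python sketch takes $Y_n = S_n$ from the Gaussian example there. The parameter values and helper names are illustrative choices, not from the text. It checks that $\frac1n\Lambda_n(n\lambda)$ equals $\Lambda(\lambda)=\lambda m+\lambda^2\sigma^2/2$ for every $n$. It then approximates the Fenchel-Legendre transform by maximizing over a finite grid of $\lambda$, which can only underestimate the supremum.

```python
import math

# Illustrative parameters for N(m, sigma^2); any m and sigma > 0 would do.
M_MEAN, SIGMA = 1.0, 2.0

def Lambda_n(mu, n):
    """log E exp(mu * S_n) for S_n ~ N(m, sigma^2 / n), the mean of n i.i.d. N(m, sigma^2)."""
    return mu * M_MEAN + 0.5 * mu * mu * SIGMA**2 / n

def Lambda(lam):
    """The limit in (3.1): lam*m + lam^2 sigma^2 / 2."""
    return lam * M_MEAN + 0.5 * lam * lam * SIGMA**2

def legendre_on_grid(f, x, grid):
    """Lower estimate of f*(x) = sup_lam {lam*x - f(lam)}, the sup restricted to a finite grid."""
    return max(lam * x - f(lam) for lam in grid)

LAM_GRID = [i / 1000.0 for i in range(-5000, 5001)]   # lam in [-5, 5], step 1e-3

def gaussian_rate(x):
    """Closed form Lambda*(x) = (x - m)^2 / (2 sigma^2)."""
    return (x - M_MEAN) ** 2 / (2 * SIGMA**2)

if __name__ == "__main__":
    for n in (1, 10, 100):
        # (1/n) Lambda_n(n*lam) does not depend on n in the Gaussian case
        print(n, Lambda_n(n * 0.3, n) / n, Lambda(0.3))
    for x in (-2.0, 0.0, 1.0, 3.0, 5.0):
        print(x, legendre_on_grid(Lambda, x, LAM_GRID), gaussian_rate(x))
```

The grid only needs to contain the maximizer $\lambda^*=(x-m)/\sigma^2$ for the sample points. For a general $\Lambda$, a finer or wider grid, or a one-dimensional convex optimizer, would be needed.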
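Lemma 3.2. Under Assumption 3.1, $\Lambda(\lambda)>-\infty$ for all $\lambda\in\mathbb{R}^d$ and $\Lambda$ is a convex function. The Fenchel-Legendre transform of $\Lambda$,
$$\Lambda^*(x) := \sup_{\lambda\in\mathbb{R}^d}\{\langle\lambda,x\rangle - \Lambda(\lambda)\},\qquad x\in\mathbb{R}^d, \tag{3.2}$$
is a good convex rate function.

Proof. Since the $\Lambda_n$ are convex on $\mathbb{R}^d$ as in Lemma 2.1 (1), $\Lambda$ is also convex in the extended sense (for the moment) that
$$\Lambda(\alpha\lambda + (1-\alpha)\lambda') \le \alpha\Lambda(\lambda) + (1-\alpha)\Lambda(\lambda'),\qquad \lambda,\lambda'\in\mathbb{R}^d,\ 0<\alpha<1,$$
whenever the right-hand side is well defined in $[-\infty,\infty]$. From $\Lambda_n(0)=0$ we have $\Lambda(0)=0$, and so $\Lambda^*(x)\ge0$. Suppose $\Lambda(\lambda)=-\infty$ for some $\lambda\in\mathbb{R}^d$. Then by convexity, $\Lambda(\alpha\lambda)\le\alpha\Lambda(\lambda)+(1-\alpha)\Lambda(0)=-\infty$. Hence $\Lambda(-\alpha\lambda)=\infty$ for all $0<\alpha<1$; otherwise we would get $0=\Lambda(0)\le\frac12\Lambda(\alpha\lambda)+\frac12\Lambda(-\alpha\lambda)=-\infty$. This contradicts the assumption $0\in(\mathrm{dom}\,\Lambda)^\circ$, so $\Lambda(\lambda)>-\infty$ for all $\lambda\in\mathbb{R}^d$. By definition, $\Lambda^*$ is convex and lower semicontinuous on $\mathbb{R}^d$. So it remains to prove the goodness of $\Lambda^*$. Since $0\in(\mathrm{dom}\,\Lambda)^\circ$, choose a $\delta>0$ such that $\bar B_\delta(0) := \{\lambda : \|\lambda\|\le\delta\}\subset(\mathrm{dom}\,\Lambda)^\circ$, $\|\cdot\|$ being the Euclidean norm on $\mathbb{R}^d$. Note (see [34]) that the convex function $\Lambda$ is automatically continuous on $(\mathrm{dom}\,\Lambda)^\circ$; thus let $c := \sup_{\lambda\in\bar B_\delta(0)}\Lambda(\lambda) < \infty$. For each $x\in\mathbb{R}^d$,
$$\Lambda^*(x) \ge \sup_{\lambda\in\bar B_\delta(0)}\{\langle\lambda,x\rangle - \Lambda(\lambda)\} \ge \sup_{\lambda\in\bar B_\delta(0)}\langle\lambda,x\rangle - c = \delta\|x\| - c.$$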
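Hence for every $\alpha\ge0$, the level set $\{x : \Lambda^*(x)\le\alpha\}$ is a closed set included in $\{x : \|x\|\le(\alpha+c)/\delta\}$, so it is compact.

Lemma 3.3. $(\mu_n)$ is exponentially tight.

Proof. Let $u_1,\dots,u_d$ be the standard basis vectors of $\mathbb{R}^d$. Since $0\in(\mathrm{dom}\,\Lambda)^\circ$, there is a $\theta>0$ such that $\Lambda(\theta u_j),\Lambda(-\theta u_j)<\infty$ for $1\le j\le d$. Let $\mu_n^{(j)}$ be the $j$th marginal distribution of $\mu_n$. For each $\rho>0$ we have
$$\mu_n^{(j)}((\rho,\infty)) = \int_{\mathbb{R}^{j-1}\times(\rho,\infty)\times\mathbb{R}^{d-j}} d\mu_n(x) \le \int_{\mathbb{R}^d} e^{n(\langle\theta u_j,x\rangle-\theta\rho)}\,d\mu_n(x) = \exp(-n\theta\rho + \Lambda_n(n\theta u_j)),$$
$$\mu_n^{(j)}((-\infty,-\rho)) = \int_{\mathbb{R}^{j-1}\times(-\infty,-\rho)\times\mathbb{R}^{d-j}} d\mu_n(x) \le \int_{\mathbb{R}^d} e^{n(-\langle\theta u_j,x\rangle-\theta\rho)}\,d\mu_n(x) = \exp(-n\theta\rho + \Lambda_n(-n\theta u_j)).$$
Therefore, for $1\le j\le d$,
$$\limsup_{n\to\infty}\frac1n\log\mu_n^{(j)}((\rho,\infty)) \le -\theta\rho + \Lambda(\theta u_j) \longrightarrow -\infty, \tag{3.3}$$
$$\limsup_{n\to\infty}\frac1n\log\mu_n^{(j)}((-\infty,-\rho)) \le -\theta\rho + \Lambda(-\theta u_j) \longrightarrow -\infty \tag{3.4}$$
as $\rho\to\infty$. We then have
$$\begin{aligned}
\limsup_{n\to\infty}\frac1n\log\mu_n(\mathbb{R}^d\setminus[-\rho,\rho]^d)
&\le \limsup_{n\to\infty}\frac1n\log\sum_{j=1}^d\mu_n(\mathbb{R}^{j-1}\times(\mathbb{R}\setminus[-\rho,\rho])\times\mathbb{R}^{d-j})\\
&= \limsup_{n\to\infty}\frac1n\log\sum_{j=1}^d\{\mu_n^{(j)}((-\infty,-\rho)) + \mu_n^{(j)}((\rho,\infty))\}\\
&\le \limsup_{n\to\infty}\frac1n\log\Big(2d\max_{1\le j\le d}\max\{\mu_n^{(j)}((-\infty,-\rho)),\ \mu_n^{(j)}((\rho,\infty))\}\Big)\\
&= \limsup_{n\to\infty}\max_{1\le j\le d}\max\Big\{\frac1n\log\mu_n^{(j)}((-\infty,-\rho)),\ \frac1n\log\mu_n^{(j)}((\rho,\infty))\Big\}\\
&\longrightarrow -\infty\quad\text{as }\rho\to\infty
\end{aligned}$$
thanks to (3.3) and (3.4), which implies the exponential tightness of $(\mu_n)$.

Definition 3.4. For $y\in\mathbb{R}^d$, we say that $y$ is an exposed point of $\Lambda^*$, or that $\Lambda^*$ is strictly subdifferentiable at $y$, if there is a $\lambda\in\mathbb{R}^d$ such that
$$\Lambda^*(x) > \langle\lambda,x-y\rangle + \Lambda^*(y)\qquad\text{for all } x\ne y.$$
The above $\lambda$ is called an exposing hyperplane or strict subdifferential of $\Lambda^*$ at $y$. (This is the strict version of the subdifferential condition $\Lambda^*(x)\ge\langle\lambda,x-y\rangle+\Lambda^*(y)$ for all $x$.) We write $\mathcal{F}$ for the set of exposed points of $\Lambda^*$ having an exposing hyperplane in $(\mathrm{dom}\,\Lambda)^\circ$.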
For each nonempty convex set $C\subset\mathbb{R}^d$, the relative interior $\mathrm{ri}\,C$ of $C$ is defined as the set of $y\in C$ such that, for every $x\in C$, $y-\varepsilon(x-y)\in C$ for some $\varepsilon>0$. Then:
1° $\mathrm{ri}\,C\ne\emptyset$.
2° If $x\in C$ and $y\in\mathrm{ri}\,C$, then $(1-\alpha)x+\alpha y\in\mathrm{ri}\,C$ for all $0<\alpha\le1$.
In the following definition we introduce, in addition to Assumption 3.1, a condition on the differentiability of $\Lambda$, which will be essential in Theorem 3.7 below.

Definition 3.5. We say that $\Lambda$ is essentially smooth if
• $\Lambda$ is differentiable on $(\mathrm{dom}\,\Lambda)^\circ$,
• $\lim_{n\to\infty}\|\nabla\Lambda(\lambda_n)\|=\infty$ whenever $\{\lambda_n\}\subset(\mathrm{dom}\,\Lambda)^\circ$ and $\lambda_n\to\lambda\in\partial(\mathrm{dom}\,\Lambda)$, the boundary of $\mathrm{dom}\,\Lambda$,
where $\nabla\Lambda$ is the gradient of $\Lambda$.

Lemma 3.6. If $\Lambda:\mathbb{R}^d\to(-\infty,\infty]$ is an essentially smooth, lower semicontinuous and convex function, then $\mathrm{ri}(\mathrm{dom}\,\Lambda^*)\subset\mathcal{F}$, where $\mathcal{F}$ has been defined in Definition 3.4.

For the proofs of the lemma as well as the above facts 1° and 2° on convex analysis, see [13] (also [34] for more details). Now we are in a position to present the theorem of Gärtner and Ellis.

Theorem 3.7 (Gärtner-Ellis). Under Assumption 3.1 and with the definitions above, the following hold:
(a) (upper bound) For every closed $F\subset\mathbb{R}^d$,
$$\limsup_{n\to\infty}\frac1n\log\mu_n(F) \le -\inf_{x\in F}\Lambda^*(x).$$
(b) (lower bound) For every open $G\subset\mathbb{R}^d$,
$$\liminf_{n\to\infty}\frac1n\log\mu_n(G) \ge -\inf_{x\in G\cap\mathcal{F}}\Lambda^*(x).$$
(c) If $\Lambda$ is lower semicontinuous and essentially smooth, then $(\mu_n)$ satisfies the LDP with the good rate function $\Lambda^*$.
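Proof. (a) By Lemmas 1.5 (1) and 3.3, it suffices to prove this for compact sets $K\subset\mathbb{R}^d$. For any $\delta>0$ define
$$I^\delta(x) := \min\{\Lambda^*(x)-\delta,\ 1/\delta\},\qquad x\in\mathbb{R}^d.$$
For every $q\in K$, choose a $\lambda_q\in\mathbb{R}^d$ such that $\langle\lambda_q,q\rangle-\Lambda(\lambda_q)\ge I^\delta(q)$, and a $\rho_q>0$ such that $\rho_q\|\lambda_q\|\le\delta$. For each $n\in\mathbb{N}$ and $q\in K$, with $B_{\rho_q}(q) := \{x : \|x-q\|<\rho_q\}$, we have
$$\begin{aligned}
\mu_n(B_{\rho_q}(q)) = P(Y_n\in B_{\rho_q}(q))
&\le E\Big[\exp\Big(n\langle\lambda_q,Y_n\rangle - n\inf_{x\in B_{\rho_q}(q)}\langle\lambda_q,x\rangle\Big)\Big]\\
&= E(e^{n\langle\lambda_q,Y_n\rangle})\exp\Big(-n\inf_{x\in B_{\rho_q}(q)}\langle\lambda_q,x\rangle\Big)\\
&\le \exp\big(\Lambda_n(n\lambda_q) + n\delta - n\langle\lambda_q,q\rangle\big),
\end{aligned}$$
because
$$\inf_{x\in B_{\rho_q}(q)}\langle\lambda_q,x\rangle = \langle\lambda_q,q\rangle + \inf_{x\in B_{\rho_q}(q)}\langle\lambda_q,x-q\rangle = \langle\lambda_q,q\rangle + \inf_{x\in B_{\rho_q}(0)}\langle\lambda_q,x\rangle = \langle\lambda_q,q\rangle - \rho_q\|\lambda_q\| \ge \langle\lambda_q,q\rangle - \delta.$$
Therefore,
$$\frac1n\log\mu_n(B_{\rho_q}(q)) \le \delta - \langle\lambda_q,q\rangle + \frac1n\Lambda_n(n\lambda_q).$$
There are $q_1,\dots,q_m\in K$ such that $K\subset\bigcup_{j=1}^m B_{\rho_{q_j}}(q_j)$. Then we have
$$\begin{aligned}
\frac1n\log\mu_n(K) &\le \frac1n\log\sum_{j=1}^m\mu_n(B_{\rho_{q_j}}(q_j)) \le \frac1n\log\Big(m\max_{1\le j\le m}\mu_n(B_{\rho_{q_j}}(q_j))\Big)\\
&= \frac{\log m}{n} + \max_{1\le j\le m}\frac1n\log\mu_n(B_{\rho_{q_j}}(q_j))\\
&\le \frac{\log m}{n} + \delta - \min_{1\le j\le m}\Big\{\langle\lambda_{q_j},q_j\rangle - \frac1n\Lambda_n(n\lambda_{q_j})\Big\}.
\end{aligned}$$
Since $\frac1n\Lambda_n(n\lambda_{q_j})\to\Lambda(\lambda_{q_j})$ by (3.1), it follows that
$$\limsup_{n\to\infty}\frac1n\log\mu_n(K) \le \delta - \min_{1\le j\le m}\{\langle\lambda_{q_j},q_j\rangle - \Lambda(\lambda_{q_j})\} \le \delta - \min_{1\le j\le m}I^\delta(q_j) \le \delta - \inf_{x\in K}I^\delta(x).$$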
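Letting $\delta\searrow0$ yields
$$\limsup_{n\to\infty}\frac1n\log\mu_n(K) \le -\inf_{x\in K}\Lambda^*(x).$$
(b) It suffices to prove that for every $y\in\mathcal{F}$ and $\delta>0$,
$$\liminf_{n\to\infty}\frac1n\log\mu_n(B_\delta(y)) \ge -\Lambda^*(y) - \delta\|\eta\|,$$
where $\eta\in(\mathrm{dom}\,\Lambda)^\circ$ is an exposing hyperplane of $\Lambda^*$ at $y$. Indeed, for any open $G\subset\mathbb{R}^d$ and any $y\in G\cap\mathcal{F}$, choosing a $\delta>0$ with $B_\delta(y)\subset G$, we then have
$$\liminf_{n\to\infty}\frac1n\log\mu_n(G) \ge \liminf_{n\to\infty}\frac1n\log\mu_n(B_\delta(y)) \ge -\Lambda^*(y) - \delta\|\eta\|.$$
Since $\delta>0$ can be arbitrarily small, this yields
$$\liminf_{n\to\infty}\frac1n\log\mu_n(G) \ge -\Lambda^*(y)\qquad\text{for all } y\in G\cap\mathcal{F}.$$
Note that $\Lambda_n(n\eta)<\infty$ for every sufficiently large $n$, since $\Lambda(\eta)<\infty$. Hence for such $n$, one can define $\tilde\mu_n\in M_1(\mathbb{R}^d)$ by
$$d\tilde\mu_n(x) := e^{n\langle\eta,x\rangle-\Lambda_n(n\eta)}\,d\mu_n(x),$$
because
$$\int_{\mathbb{R}^d}d\tilde\mu_n = e^{-\Lambda_n(n\eta)}\int_{\mathbb{R}^d}e^{n\langle\eta,x\rangle}\,d\mu_n(x) = 1.$$
We then obtain
$$\begin{aligned}
\frac1n\log\mu_n(B_\delta(y)) &= \frac1n\log\int_{B_\delta(y)} e^{\Lambda_n(n\eta)-n\langle\eta,y\rangle-n\langle\eta,x-y\rangle}\,d\tilde\mu_n(x)\\
&= \frac1n\Lambda_n(n\eta) - \langle\eta,y\rangle + \frac1n\log\int_{B_\delta(y)}e^{-n\langle\eta,x-y\rangle}\,d\tilde\mu_n(x)\\
&\ge \frac1n\Lambda_n(n\eta) - \langle\eta,y\rangle - \delta\|\eta\| + \frac1n\log\tilde\mu_n(B_\delta(y)),
\end{aligned}$$
because $-n\langle\eta,x-y\rangle\ge-n\delta\|\eta\|$ for $x\in B_\delta(y)$. Therefore,
$$\begin{aligned}
\liminf_{n\to\infty}\frac1n\log\mu_n(B_\delta(y)) &\ge \Lambda(\eta) - \langle\eta,y\rangle - \delta\|\eta\| + \liminf_{n\to\infty}\frac1n\log\tilde\mu_n(B_\delta(y))\\
&\ge -\Lambda^*(y) - \delta\|\eta\| + \liminf_{n\to\infty}\frac1n\log\tilde\mu_n(B_\delta(y)).
\end{aligned}$$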
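To prove that $\liminf_{n\to\infty}\frac1n\log\tilde\mu_n(B_\delta(y))=0$ for all $\delta>0$, let us apply the large deviations upper bound (a) to $(\tilde\mu_n)$. To do this, set
$$\tilde\Lambda_n(\lambda) := \log\int_{\mathbb{R}^d}e^{\langle\lambda,x\rangle}\,d\tilde\mu_n(x),\qquad\lambda\in\mathbb{R}^d.$$
Since
$$\tilde\Lambda_n(n\lambda) = \log\int_{\mathbb{R}^d}e^{n\langle\lambda,x\rangle}\,d\tilde\mu_n(x) = \log\int_{\mathbb{R}^d}e^{n\langle\lambda+\eta,x\rangle-\Lambda_n(n\eta)}\,d\mu_n(x) = \Lambda_n(n(\lambda+\eta)) - \Lambda_n(n\eta)$$
(recall that $\Lambda_n(n\eta)<\infty$ for every sufficiently large $n$), we have
$$\tilde\Lambda(\lambda) := \lim_{n\to\infty}\frac1n\tilde\Lambda_n(n\lambda) = \Lambda(\lambda+\eta) - \Lambda(\eta)$$
and hence
$$\tilde\Lambda^*(x) := \sup_{\lambda\in\mathbb{R}^d}\{\langle\lambda,x\rangle - \tilde\Lambda(\lambda)\} = \sup_{\lambda\in\mathbb{R}^d}\{\langle\lambda+\eta,x\rangle - \Lambda(\lambda+\eta)\} - \langle\eta,x\rangle + \Lambda(\eta) = \Lambda^*(x) - \langle\eta,x\rangle + \Lambda(\eta).$$
Hence Assumption 3.1 is also satisfied for $(\tilde\mu_n)$ (note that $0\in(\mathrm{dom}\,\tilde\Lambda)^\circ$ because $\eta\in(\mathrm{dom}\,\Lambda)^\circ$). So the large deviations upper bound (a) holds for $(\tilde\mu_n)$ as well, with the good rate function $\tilde\Lambda^*$. Apply (a) to the closed set $B_\delta(y)^c$ to obtain
$$\limsup_{n\to\infty}\frac1n\log\tilde\mu_n(B_\delta(y)^c) \le -\inf_{x\in B_\delta(y)^c}\tilde\Lambda^*(x) = -\tilde\Lambda^*(x_0)$$
for some $x_0\in B_\delta(y)^c$, where the existence of $x_0$ is guaranteed by the goodness of $\tilde\Lambda^*$. Since $y$ is an exposed point of $\Lambda^*$ with exposing hyperplane $\eta$, it follows from $x_0\ne y$ that $\Lambda^*(x_0)>\langle\eta,x_0-y\rangle+\Lambda^*(y)$, so that
$$\tilde\Lambda^*(x_0) = \Lambda^*(x_0) - \langle\eta,x_0\rangle + \Lambda(\eta) > -\langle\eta,y\rangle + \Lambda^*(y) + \Lambda(\eta) \ge 0.$$
Therefore, for every $\delta>0$,
$$\limsup_{n\to\infty}\frac1n\log\tilde\mu_n(B_\delta(y)^c) < 0,$$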
which implies that $\tilde\mu_n(B_\delta(y)^c)\to0$, i.e., $\tilde\mu_n(B_\delta(y))\to1$ as $n\to\infty$. In particular,
$$\liminf_{n\to\infty}\frac1n\log\tilde\mu_n(B_\delta(y)) = 0.$$
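(c) By (b) and Lemma 3.6, it suffices to prove that, for every open $G\subset\mathbb{R}^d$,
$$\inf_{x\in G\cap\mathrm{ri}(\mathrm{dom}\,\Lambda^*)}\Lambda^*(x) = \inf_{x\in G}\Lambda^*(x).$$
Only the inequality $\le$ needs proof, and for it we may assume that $G\cap\mathrm{dom}\,\Lambda^*$ is nonempty. Fix $x\in G\cap\mathrm{dom}\,\Lambda^*$; then $\mathrm{dom}\,\Lambda^*\ne\emptyset$, and hence $\mathrm{ri}(\mathrm{dom}\,\Lambda^*)\ne\emptyset$ by fact 1° mentioned above. Choose a $y\in\mathrm{ri}(\mathrm{dom}\,\Lambda^*)$. If $\alpha>0$ is sufficiently small, then fact 2° gives $(1-\alpha)x+\alpha y\in G\cap\mathrm{ri}(\mathrm{dom}\,\Lambda^*)$. Since $\Lambda^*$ is convex, it is immediate to see that
$$\limsup_{\alpha\searrow0}\Lambda^*((1-\alpha)x+\alpha y) \le \lim_{\alpha\searrow0}\{(1-\alpha)\Lambda^*(x)+\alpha\Lambda^*(y)\} = \Lambda^*(x).$$
Hence
$$\inf_{z\in G\cap\mathrm{ri}(\mathrm{dom}\,\Lambda^*)}\Lambda^*(z) \le \Lambda^*(x).$$
This implies the conclusion.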
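For i.i.d. Bernoulli($p$) summands, $\Lambda(\lambda)=\log(1-p+pe^\lambda)$ is finite and differentiable on all of $\mathbb{R}$, so part (c) applies and reduces to Cramér's theorem. The Python sketch below illustrates this case. The values of $p$, $y$, $\eta$, $\delta$ and all helper names are hypothetical choices for illustration. It uses exact binomial weights to check two ingredients numerically. The first is the upper tail rate $\frac1n\log\mu_n([y,1])\to-\Lambda^*(y)$ for $y>p$. The second concerns the tilted measures $\tilde\mu_n$ from the proof of (b): they have mean $\Lambda'(\eta)$ and concentrate near $y=\Lambda'(\eta)$.

```python
import math

def log_binom_pmf(n, k, p):
    """log of C(n, k) p^k (1 - p)^(n - k), computed with lgamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def log_sum_exp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def log_upper_tail(n, y, p):
    """log mu_n([y, 1]) = log P(S_n >= y), S_n the mean of n i.i.d. Bernoulli(p)."""
    k_min = math.ceil(n * y - 1e-9)            # smallest k with k/n >= y, guarding float error
    return log_sum_exp([log_binom_pmf(n, k, p) for k in range(k_min, n + 1)])

def bernoulli_rate(x, p):
    """Lambda*(x) = x log(x/p) + (1 - x) log((1 - x)/(1 - p)), for 0 < x < 1."""
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

def Lambda_prime(eta, p):
    """Lambda'(eta) for Lambda(eta) = log(1 - p + p e^eta)."""
    return p * math.exp(eta) / (1 - p + p * math.exp(eta))

def tilted_weights(n, p, eta):
    """Masses of d(mu~_n)(x) = exp(n eta x - Lambda_n(n eta)) d(mu_n)(x) at the atoms x = k/n."""
    lambda_n = n * math.log(1 - p + p * math.exp(eta))   # Lambda_n(n eta) in the i.i.d. case
    # n * eta * (k / n) = eta * k
    return [math.exp(eta * k - lambda_n + log_binom_pmf(n, k, p)) for k in range(n + 1)]

def tilted_mass_near(n, p, eta, delta):
    """mu~_n((y - delta, y + delta)) with y = Lambda'(eta)."""
    y = Lambda_prime(eta, p)
    return sum(w for k, w in enumerate(tilted_weights(n, p, eta)) if abs(k / n - y) < delta)

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, log_upper_tail(n, 0.7, 0.5) / n, -bernoulli_rate(0.7, 0.5))
    for n in (20, 200, 2000):
        print(n, tilted_mass_near(n, 0.3, 0.8, 0.05))
```

Working in log space (log-sum-exp) keeps the tail computation stable for large $n$. The convergence of $\frac1n\log\mu_n([y,1])$ is slow, with a correction of order $(\log n)/n$.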
Although the Gärtner-Ellis theorem is quite useful, the essential smoothness assumption needed for the full LDP in (c) is rather difficult to verify. Even Assumption 3.1 is not easy to check in many cases where we want to apply the large deviations upper and lower bounds in (a) and (b). In fact, the latter condition $0\in(\mathrm{dom}\,\Lambda)^\circ$ in Assumption 3.1 is rather strict, for the reason mentioned in the previous section (just before Theorem 2.3). So it may be desirable to relax the assumptions of the Gärtner-Ellis theorem, even though the conclusions become weaker. In the rest of this section, following [30], we present a result in this direction, restricted for simplicity to the one-dimensional case. Let $(\mu_n)$ be a sequence in $M_1(\mathbb{R})$ with logarithmic moment generating functions $\Lambda_n(\lambda) := \log\int_{\mathbb{R}}e^{\lambda x}\,d\mu_n(x)$, $\lambda\in\mathbb{R}$. Since it is not assumed that the limit $\lim_{n\to\infty}\frac1n\Lambda_n(n\lambda)$ exists for every $\lambda\in\mathbb{R}$, we instead define
$$\Lambda(\lambda) := \limsup_{n\to\infty}\frac1n\Lambda_n(n\lambda),\qquad\lambda\in\mathbb{R}.$$
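Since the $\Lambda_n$ are convex, so is $\Lambda$ on $\mathbb{R}$ (in the extended sense mentioned in the proof of Lemma 3.2).

Lemma 3.8. For every $y\in\mathbb{R}$,
$$\limsup_{n\to\infty}\frac1n\log\mu_n([y,\infty)) \le -\sup_{\lambda\ge0}\{\lambda y-\Lambda(\lambda)\},$$
$$\limsup_{n\to\infty}\frac1n\log\mu_n((-\infty,y]) \le -\sup_{\lambda\le0}\{\lambda y-\Lambda(\lambda)\}.$$

Proof. For each $\lambda\ge0$,
$$\mu_n([y,\infty)) \le \int_{[y,\infty)}e^{n\lambda(x-y)}\,d\mu_n(x) \le e^{-n\lambda y+\Lambda_n(n\lambda)},$$
and hence
$$\limsup_{n\to\infty}\frac1n\log\mu_n([y,\infty)) \le -\lambda y + \Lambda(\lambda),$$
which gives the first inequality. The proof of the second inequality is similar.

Assumption 3.9. Assume that the finite limit
$$\Lambda(\lambda) := \lim_{n\to\infty}\frac1n\Lambda_n(n\lambda)$$
exists for all $\lambda$ in some open interval $(\alpha,\beta)$ with $-\infty\le\alpha<\beta\le\infty$, and moreover that $\Lambda$ is differentiable on $(\alpha,\beta)$.

Since $\Lambda$ is convex on $(\alpha,\beta)$, this differentiability assumption automatically implies that $\Lambda$ is $C^1$ on $(\alpha,\beta)$. Define the polar function of $\Lambda$ by
$$\Lambda^*_{(\alpha,\beta)}(x) := \sup_{\lambda\in(\alpha,\beta)}\{\lambda x - \Lambda(\lambda)\},\qquad x\in\mathbb{R},$$
which is the Fenchel-Legendre transform of $\Lambda$ extended to $\mathbb{R}$ by $\Lambda(\lambda)=\infty$ outside $(\alpha,\beta)$. Let
$$-\infty\le a := \lim_{\lambda\searrow\alpha}\Lambda'(\lambda) \le b := \lim_{\lambda\nearrow\beta}\Lambda'(\lambda) \le \infty,$$
and assume $a<b$, that is, $\Lambda$ is not linear on the whole of $(\alpha,\beta)$.

Lemma 3.10. For any $\eta\in(\alpha,\beta)$, $y := \Lambda'(\eta)$ is an exposed point of $\Lambda^*_{(\alpha,\beta)}$ with exposing hyperplane $\eta$.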
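Proof. Since $\Lambda$ is convex on $(\alpha,\beta)$, $\Lambda(\lambda)\ge y(\lambda-\eta)+\Lambda(\eta)$, i.e., $\eta y-\Lambda(\eta)\ge\lambda y-\Lambda(\lambda)$ for all $\lambda\in(\alpha,\beta)$. Hence $\Lambda^*_{(\alpha,\beta)}(y)=\eta y-\Lambda(\eta)$. We show that $\Lambda^*_{(\alpha,\beta)}(x)>\Lambda^*_{(\alpha,\beta)}(y)+\eta(x-y)$ for all $x\ne y$. To do this, assume the contrary, i.e., $\Lambda^*_{(\alpha,\beta)}(x)\le\Lambda^*_{(\alpha,\beta)}(y)+\eta(x-y)=\eta x-\Lambda(\eta)$ for some $x\ne y$. Then
$$\lambda x-\Lambda(\lambda)\le\Lambda^*_{(\alpha,\beta)}(x)\le\eta x-\Lambda(\eta),\qquad\lambda\in(\alpha,\beta),$$
and hence
$$\frac{\Lambda(\lambda)-\Lambda(\eta)}{\lambda-\eta}\ \begin{cases}\le x & \text{if }\lambda<\eta,\\ \ge x & \text{if }\lambda>\eta.\end{cases}$$
Letting $\lambda\to\eta$ implies that $y=\Lambda'(\eta)=x$, a contradiction.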
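Lemma 3.10 can be illustrated with an explicit $\Lambda$. The following Python sketch assumes, for illustration only, the Bernoulli log-moment generating function $\Lambda(\lambda)=\log(1-p+pe^\lambda)$. Assumption 3.9 then holds with $(\alpha,\beta)=\mathbb{R}$ and $(a,b)=(0,1)$, and $\Lambda^*_{(\alpha,\beta)}(x)=x\log(x/p)+(1-x)\log((1-x)/(1-p))$ on $(0,1)$. The sketch checks $\Lambda^*(y)=\eta y-\Lambda(\eta)$ at $y=\Lambda'(\eta)$. It also checks the strict inequality of Definition 3.4 at a few sample points.

```python
import math

P = 0.3   # Bernoulli parameter, an illustrative choice

def Lam(lam):
    """Lambda(lam) = log(1 - p + p e^lam), the log-MGF of one Bernoulli(p) variable."""
    return math.log(1 - P + P * math.exp(lam))

def Lam_prime(lam):
    return P * math.exp(lam) / (1 - P + P * math.exp(lam))

def Lam_star(x):
    """Closed form of Lambda* on (0, 1)."""
    return x * math.log(x / P) + (1 - x) * math.log((1 - x) / (1 - P))

def exposed_gap(eta, x):
    """Lambda*(x) - [Lambda*(y) + eta (x - y)] with y = Lambda'(eta);
    Lemma 3.10 predicts a strictly positive value for every x != y."""
    y = Lam_prime(eta)
    lam_star_y = eta * y - Lam(eta)        # Lambda*(y) = eta*y - Lambda(eta), as in the proof
    return Lam_star(x) - (lam_star_y + eta * (x - y))

if __name__ == "__main__":
    for eta in (-1.0, 0.0, 0.8):
        print(eta, Lam_prime(eta), [round(exposed_gap(eta, x), 6) for x in (0.05, 0.2, 0.5, 0.9)])
```

Near $y$ the gap behaves like $(x-y)^2/(2y(1-y))$. It is therefore small but positive for $x$ close to $y$.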
Under Assumption 3.9, one can see by Lemma 3.10 that $(a,b)$ is the set of exposed points of $\Lambda^*_{(\alpha,\beta)}$ having an exposing hyperplane in $(\alpha,\beta)$. So the next theorem is essentially the same as (b) of Theorem 3.7. A merit of the next theorem is that the rate function is the polar function of $\Lambda$ restricted to $(\alpha,\beta)$, so that we do not need to check the existence of $\Lambda(\lambda)$ on the whole of $\mathbb{R}$. Also the assumption $0\in(\mathrm{dom}\,\Lambda)^\circ$ is not necessary. In the current situation, however, the rate function is not necessarily good.

Theorem 3.11. Under Assumption 3.9, for every open $G\subset\mathbb{R}$,
$$\liminf_{n\to\infty}\frac1n\log\mu_n(G)\ge-\inf_{x\in G\cap(a,b)}\Lambda^*_{(\alpha,\beta)}(x).$$

Proof. For each $y\in G\cap(a,b)$, there is an $\eta\in(\alpha,\beta)$ such that $y=\Lambda'(\eta)$, and hence $\Lambda^*_{(\alpha,\beta)}(y)=\eta y-\Lambda(\eta)$. As in the proof of (b) of Theorem 3.7, it suffices to prove that, for every $\delta>0$,
$$\liminf_{n\to\infty}\frac1n\log\mu_n((y-\delta,y+\delta))\ge-\Lambda^*_{(\alpha,\beta)}(y)-\delta|\eta|.$$
Since $\Lambda_n(n\eta)<\infty$ for $n$ large enough, one can define $\tilde\mu_n\in M_1(\mathbb{R})$ by
$$d\tilde\mu_n(x) := e^{n\eta x-\Lambda_n(n\eta)}\,d\mu_n(x).$$
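Notice, as in the proof of (b) of Theorem 3.7, that
$$\frac1n\log\mu_n((y-\delta,y+\delta)) = \frac1n\log\int_{(y-\delta,y+\delta)}e^{\Lambda_n(n\eta)-n\eta y-n\eta(x-y)}\,d\tilde\mu_n(x) \ge \frac1n\Lambda_n(n\eta)-\eta y-\delta|\eta|+\frac1n\log\tilde\mu_n((y-\delta,y+\delta)),$$
so that
$$\begin{aligned}
\liminf_{n\to\infty}\frac1n\log\mu_n((y-\delta,y+\delta)) &\ge \Lambda(\eta)-\eta y-\delta|\eta|+\liminf_{n\to\infty}\frac1n\log\tilde\mu_n((y-\delta,y+\delta))\\
&= -\Lambda^*_{(\alpha,\beta)}(y)-\delta|\eta|+\liminf_{n\to\infty}\frac1n\log\tilde\mu_n((y-\delta,y+\delta)).
\end{aligned}$$
It remains to prove that $\liminf_{n\to\infty}\frac1n\log\tilde\mu_n((y-\delta,y+\delta))=0$. To do this, it is enough to show that
$$\limsup_{n\to\infty}\frac1n\log\tilde\mu_n(\mathbb{R}\setminus(y-\delta,y+\delta)) = \max\Big\{\limsup_{n\to\infty}\frac1n\log\tilde\mu_n((-\infty,y-\delta]),\ \limsup_{n\to\infty}\frac1n\log\tilde\mu_n([y+\delta,\infty))\Big\} < 0. \tag{3.5}$$
The logarithmic moment generating function of $\tilde\mu_n$ is
$$\tilde\Lambda_n(\lambda) := \log\int_{\mathbb{R}}e^{\lambda x}\,d\tilde\mu_n(x),\qquad\lambda\in\mathbb{R}.$$
Since
$$\tilde\Lambda_n(n\lambda) = \log\int_{\mathbb{R}}e^{n(\lambda+\eta)x-\Lambda_n(n\eta)}\,d\mu_n(x) = \Lambda_n(n(\lambda+\eta))-\Lambda_n(n\eta),$$
we have
$$\tilde\Lambda(\lambda) := \limsup_{n\to\infty}\frac1n\tilde\Lambda_n(n\lambda) = \Lambda(\lambda+\eta)-\Lambda(\eta),\qquad\lambda\in\mathbb{R},$$
where, by Assumption 3.9, the limsup is in fact a finite limit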
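whenever $\lambda+\eta\in(\alpha,\beta)$. By Lemma 3.8 applied to $(\tilde\mu_n)$,
$$\limsup_{n\to\infty}\frac1n\log\tilde\mu_n([y+\delta,\infty)) \le -\sup_{\lambda\ge0}\{\lambda(y+\delta)-\tilde\Lambda(\lambda)\},$$
$$\limsup_{n\to\infty}\frac1n\log\tilde\mu_n((-\infty,y-\delta]) \le -\sup_{\lambda\le0}\{\lambda(y-\delta)-\tilde\Lambda(\lambda)\}.$$
Thus, for (3.5), it suffices to prove that
$$\sup_{\lambda\ge0}\{\lambda(y+\delta)-\tilde\Lambda(\lambda)\}>0,\qquad\sup_{\lambda\le0}\{\lambda(y-\delta)-\tilde\Lambda(\lambda)\}>0.$$
Notice that
$$\begin{aligned}
\sup_{\lambda\ge0}\{\lambda(y+\delta)-\tilde\Lambda(\lambda)\} &= \sup_{\lambda\ge0}\{\lambda(y+\delta)-\Lambda(\lambda+\eta)+\Lambda(\eta)\}\\
&= \Lambda(\eta)-\eta(y+\delta)+\sup_{\lambda\ge0}\{(\lambda+\eta)(y+\delta)-\Lambda(\lambda+\eta)\}\\
&= -\Lambda^*_{(\alpha,\beta)}(y)-\eta\delta+\sup_{\lambda\ge\eta}\{\lambda(y+\delta)-\Lambda(\lambda)\}\\
&\ge -\Lambda^*_{(\alpha,\beta)}(y)-\eta\delta+\sup_{\lambda\in[\eta,\beta)}\{\lambda(y+\delta)-\Lambda(\lambda)\}.
\end{aligned}\tag{3.6}$$
When $\lambda\in(\alpha,\eta]$, since $\Lambda(\lambda)\ge y(\lambda-\eta)+\Lambda(\eta)\ge(y+\delta)(\lambda-\eta)+\Lambda(\eta)$, we have $\lambda(y+\delta)-\Lambda(\lambda)\le\eta(y+\delta)-\Lambda(\eta)$. Hence
$$\sup_{\lambda\in[\eta,\beta)}\{\lambda(y+\delta)-\Lambda(\lambda)\} = \sup_{\lambda\in(\alpha,\beta)}\{\lambda(y+\delta)-\Lambda(\lambda)\} = \Lambda^*_{(\alpha,\beta)}(y+\delta).$$
By (3.6) and this, we obtain
$$\sup_{\lambda\ge0}\{\lambda(y+\delta)-\tilde\Lambda(\lambda)\} \ge -\Lambda^*_{(\alpha,\beta)}(y)-\eta\delta+\Lambda^*_{(\alpha,\beta)}(y+\delta) > 0$$
thanks to Lemma 3.10. The proof of $\sup_{\lambda\le0}\{\lambda(y-\delta)-\tilde\Lambda(\lambda)\}>0$ is similar and omitted. Thus, (3.5) is established and the conclusion follows.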
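Lemma 3.8, which drives the proof above, is a Chernoff-type bound. Its proof shows $\mu_n([y,\infty))\le e^{-n\lambda y+\Lambda_n(n\lambda)}$ for every $n$ and every $\lambda\ge0$, not only in the limit. The Python sketch below checks this against exact tail probabilities. The test case is an illustrative choice not taken from the text: $\mu_n$ is the law of $Y_n/n$, where $Y_n$ is Poisson with mean $nc$. For this choice $\frac1n\Lambda_n(n\lambda)=c(e^\lambda-1)$ for every $n$, and $\sup_{\lambda\ge0}\{\lambda y-\Lambda(\lambda)\}=y\log(y/c)-y+c$ when $y\ge c$.

```python
import math

C = 1.0  # illustrative: Y_n ~ Poisson(n*C), and mu_n is the law of Y_n / n

def Lam(lam):
    """(1/n) Lambda_n(n lam) = C (e^lam - 1), the same for every n in this example."""
    return C * math.expm1(lam)

def chernoff_exponent(y, lam_grid):
    """sup_{lam >= 0} {lam*y - Lambda(lam)} over a finite grid (a lower estimate of the sup)."""
    return max(lam * y - Lam(lam) for lam in lam_grid if lam >= 0)

def log_poisson_upper_tail(n, y, extra_terms=400):
    """log P(Y_n >= n*y) for Y_n ~ Poisson(n*C); the tail series is truncated,
    which can only make the computed probability smaller."""
    mean = n * C
    k0 = math.ceil(n * y - 1e-9)
    terms = [-mean + k * math.log(mean) - math.lgamma(k + 1)
             for k in range(k0, k0 + extra_terms)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

LAM_GRID = [i / 1000.0 for i in range(0, 5001)]   # lam in [0, 5]

if __name__ == "__main__":
    y = 2.0
    bound = chernoff_exponent(y, LAM_GRID)
    for n in (1, 10, 100, 1000):
        print(n, log_poisson_upper_tail(n, y) / n, -bound)
```

Since the grid supremum is below the true supremum, $-n$ times it is still a valid exponent for the bound. Truncating the series only lowers the computed probability, so neither approximation can create a spurious violation.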
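4. Varadhan's Integral Lemma

The aim of this section is to show the result called Varadhan's integral lemma. It is a general outgrowth of the LDP with a good rate function and provides quite an important tool in applications of the LDP. In this section we assume that $\mathcal{X}$ is a regular topological space.

Theorem 4.1 (Varadhan). Let $\mathcal{X}$ be a regular topological space. Assume that $(\mu_n)\subset M_1(\mathcal{X})$ satisfies the LDP in the scale $\varepsilon_n$ with a good rate function $I$. Let $\varphi:\mathcal{X}\to\mathbb{R}$ be a continuous function. If $\varphi$ satisfies
$$\lim_{L\to\infty}\limsup_{n\to\infty}\varepsilon_n\log\int_{\{\varphi\ge L\}}e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x) = -\infty, \tag{4.1}$$
then
$$\lim_{n\to\infty}\varepsilon_n\log\int_{\mathcal{X}}e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x) = \sup_{x\in\mathcal{X}}\{\varphi(x)-I(x)\}. \tag{4.2}$$
In particular, this holds if $\varphi$ satisfies
$$\limsup_{n\to\infty}\varepsilon_n\log\int_{\mathcal{X}}e^{\alpha\varphi(x)/\varepsilon_n}\,d\mu_n(x) < \infty \tag{4.3}$$
for some $\alpha>1$.

The theorem follows from the following three lemmas.

Lemma 4.2. Assume that the large deviations upper bound (Definition 1.1 (a)) holds for $(\mu_n)$ with a good rate function $I$. If $\varphi:\mathcal{X}\to\mathbb{R}$ is upper semicontinuous and satisfies condition (4.1), then
$$\limsup_{n\to\infty}\varepsilon_n\log\int_{\mathcal{X}}e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x) \le \sup_{x\in\mathcal{X}}\{\varphi(x)-I(x)\}.$$

Proof. For each $L>0$ let $\varphi_L(x) := \min\{\varphi(x),L\}$. Since
$$\begin{aligned}
\varepsilon_n\log\int_{\mathcal{X}}e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x) &= \varepsilon_n\log\Big(\int_{\{\varphi<L\}}e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x)+\int_{\{\varphi\ge L\}}e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x)\Big)\\
&\le \varepsilon_n\log 2\max\Big\{\int_{\mathcal{X}}e^{\varphi_L(x)/\varepsilon_n}\,d\mu_n(x),\ \int_{\{\varphi\ge L\}}e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x)\Big\},
\end{aligned}$$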
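we have
$$\limsup_{n\to\infty}\varepsilon_n\log\int_{\mathcal{X}}e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x) \le \max\Big\{\limsup_{n\to\infty}\varepsilon_n\log\int_{\mathcal{X}}e^{\varphi_L(x)/\varepsilon_n}\,d\mu_n(x),\ \limsup_{n\to\infty}\varepsilon_n\log\int_{\{\varphi\ge L\}}e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x)\Big\}. \tag{4.4}$$
Suppose the assertion has been proved for each $\varphi_L$ in place of $\varphi$. Since $\varphi_L\le\varphi$, we then have
$$\limsup_{n\to\infty}\varepsilon_n\log\int_{\mathcal{X}}e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x) \le \max\Big\{\sup_{x\in\mathcal{X}}\{\varphi(x)-I(x)\},\ \limsup_{n\to\infty}\varepsilon_n\log\int_{\{\varphi\ge L\}}e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x)\Big\}.$$
Letting $L\to\infty$ gives the assertion for $\varphi$ thanks to (4.1). Therefore, we may assume that $\varphi(x)\le L<\infty$ (then (4.1) trivially holds). Let $\delta>0$ and $\alpha>0$ be arbitrary. Since $\varphi$ and $I$ are respectively upper and lower semicontinuous and $\mathcal{X}$ is regular, one can choose, for any $y\in\mathcal{X}$, an open neighborhood $G_y$ of $y$ such that $\varphi(x)<\varphi(y)+\delta$ and $I(x)>I(y)-\delta$ for all $x\in\overline{G_y}$ (the closure of $G_y$). Since the level set $K_\alpha := \{x : I(x)\le\alpha\}$ is compact, there are $y_1,\dots,y_m\in K_\alpha$ such that $K_\alpha\subset\bigcup_{i=1}^m G_{y_i}$. Set $F_0 := (\bigcup_{i=1}^m G_{y_i})^c$, so that $F_0\subset\{x : I(x)>\alpha\}$. Since
$$\begin{aligned}
\int_{\mathcal{X}}e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x) &\le \int_{F_0}e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x)+\sum_{i=1}^m\int_{G_{y_i}}e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x)\\
&\le e^{L/\varepsilon_n}\mu_n(F_0)+\sum_{i=1}^m e^{(\varphi(y_i)+\delta)/\varepsilon_n}\mu_n(\overline{G_{y_i}})\\
&\le (m+1)\max\{e^{L/\varepsilon_n}\mu_n(F_0),\ e^{(\varphi(y_i)+\delta)/\varepsilon_n}\mu_n(\overline{G_{y_i}}) : 1\le i\le m\},
\end{aligned}$$
we have
$$\begin{aligned}
\limsup_{n\to\infty}\varepsilon_n\log\int_{\mathcal{X}}e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x) &\le \limsup_{n\to\infty}\max\{L+\varepsilon_n\log\mu_n(F_0),\ \varphi(y_i)+\delta+\varepsilon_n\log\mu_n(\overline{G_{y_i}}) : 1\le i\le m\}\\
&= \max\Big\{L+\limsup_{n\to\infty}\varepsilon_n\log\mu_n(F_0),\ \varphi(y_i)+\delta+\limsup_{n\to\infty}\varepsilon_n\log\mu_n(\overline{G_{y_i}}) : 1\le i\le m\Big\}\\
&\le \max\Big\{L-\inf_{x\in F_0}I(x),\ \varphi(y_i)+\delta-\inf_{x\in\overline{G_{y_i}}}I(x) : 1\le i\le m\Big\}\\
&\le \max\{L-\alpha,\ \varphi(y_i)-I(y_i)+2\delta : 1\le i\le m\}\\
&\le \max\Big\{L-\alpha,\ \sup_{x\in\mathcal{X}}\{\varphi(x)-I(x)\}+2\delta\Big\}.
\end{aligned}$$
In the above, we have used the large deviations upper bound for the second inequality. Thus, the desired inequality follows by letting $\alpha\nearrow\infty$ and $\delta\searrow0$.

Lemma 4.3. Assume that the large deviations lower bound (Definition 1.1 (b)) holds for $(\mu_n)$ with a rate function $I$. If $\varphi:\mathcal{X}\to\mathbb{R}$ is lower semicontinuous, then
$$\liminf_{n\to\infty}\varepsilon_n\log\int_{\mathcal{X}}e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x) \ge \sup_{x\in\mathcal{X}}\{\varphi(x)-I(x)\}.$$

Proof. Let $y\in\mathcal{X}$ and $\delta>0$. Since $\varphi$ is lower semicontinuous, one can choose an open neighborhood $G$ of $y$ such that $\varphi(x)>\varphi(y)-\delta$ for all $x\in G$. We then have
$$\varepsilon_n\log\int_{\mathcal{X}}e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x) \ge \varepsilon_n\log\int_G e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x) \ge \inf_{x\in G}\varphi(x)+\varepsilon_n\log\mu_n(G) \ge \varphi(y)-\delta+\varepsilon_n\log\mu_n(G).$$
Hence the large deviations lower bound implies that
$$\liminf_{n\to\infty}\varepsilon_n\log\int_{\mathcal{X}}e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x) \ge \varphi(y)-\delta-\inf_{x\in G}I(x) \ge \varphi(y)-\delta-I(y),$$
which yields the result since $y$ and $\delta$ are arbitrary.

Lemma 4.4. If condition (4.3) holds for some $\alpha>1$, then condition (4.1) is satisfied.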
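Proof. For $\alpha>1$ and $L>0$ we have
$$\begin{aligned}
\int_{\{\varphi\ge L\}}e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x) &= e^{L/\varepsilon_n}\int_{\{\varphi\ge L\}}e^{(\varphi(x)-L)/\varepsilon_n}\,d\mu_n(x)\\
&\le e^{L/\varepsilon_n}\int_{\mathcal{X}}e^{\alpha(\varphi(x)-L)/\varepsilon_n}\,d\mu_n(x) = e^{(1-\alpha)L/\varepsilon_n}\int_{\mathcal{X}}e^{\alpha\varphi(x)/\varepsilon_n}\,d\mu_n(x),
\end{aligned}$$
so that
$$\limsup_{n\to\infty}\varepsilon_n\log\int_{\{\varphi\ge L\}}e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x) \le (1-\alpha)L+\limsup_{n\to\infty}\varepsilon_n\log\int_{\mathcal{X}}e^{\alpha\varphi(x)/\varepsilon_n}\,d\mu_n(x).$$
Hence, by letting $L\nearrow\infty$, (4.1) follows from (4.3) with $\alpha>1$.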
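Theorem 4.1 is an infinite dimensional Laplace method, and in one dimension it can be tested directly. The Python sketch below rests on assumptions not in the text. It takes $\mathcal{X}=\mathbb{R}$ and $\mu_n=N(0,1/n)$, which satisfies the LDP in the scale $\varepsilon_n=1/n$ with good rate function $I(x)=x^2/2$. The potential is $\varphi(x)=x-x^2/4$, which satisfies (4.3) for every $\alpha>1$. Then $\sup\{\varphi-I\}=\sup\{x-\frac34x^2\}=\frac13$. Completing the square gives the exact finite-$n$ value $\frac13+\frac1{2n}\log\frac23$ of the left-hand side of (4.2), against which a log-space Riemann sum can be compared.

```python
import math

def varadhan_lhs(phi, n, lo=-10.0, hi=10.0, num=200001):
    """Approximate eps_n log ∫ exp(phi(x)/eps_n) d mu_n(x) for eps_n = 1/n and
    mu_n = N(0, 1/n), by a Riemann sum on [lo, hi] evaluated in log space."""
    h = (hi - lo) / (num - 1)
    log_norm = 0.5 * math.log(n / (2 * math.pi))      # log of the N(0, 1/n) density constant
    logs = [n * phi(x) + log_norm - 0.5 * n * x * x
            for x in (lo + i * h for i in range(num))]
    top = max(logs)                                   # log-sum-exp avoids overflow of e^{n phi}
    return (top + math.log(h * sum(math.exp(t - top) for t in logs))) / n

def phi_quadratic(x):
    return x - x * x / 4                              # sup_x {phi(x) - x^2/2} = 1/3 at x = 2/3

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, varadhan_lhs(phi_quadratic, n), 1 / 3 + 0.5 * math.log(2 / 3) / n)
```

The same routine applied to the bounded potential $\varphi=\cos$ should approach $\sup\{\cos x-x^2/2\}=1$. The truncation to $[-10,10]$ is harmless here because the integrand is negligible outside a small neighborhood of the maximizer.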
The next lemma is a general duality for the Fenchel-Legendre transform on a locally convex topological space; its proof is based on the Hahn-Banach separation theorem.

Lemma 4.5. Let $\mathcal{X}$ be a locally convex Hausdorff topological real vector space with dual space $\mathcal{X}^*$ and duality $\langle\lambda,x\rangle$ for $x\in\mathcal{X}$ and $\lambda\in\mathcal{X}^*$. Let $\Psi:\mathcal{X}\to(-\infty,\infty]$ be a lower semicontinuous convex function and define the Fenchel-Legendre transform $\Psi^*$ by
$$\Psi^*(\lambda) := \sup_{x\in\mathcal{X}}\{\langle\lambda,x\rangle-\Psi(x)\},\qquad\lambda\in\mathcal{X}^*.$$
Then $\Psi$ is the Fenchel-Legendre transform of $\Psi^*$ as follows:
$$\Psi(x) = \sup_{\lambda\in\mathcal{X}^*}\{\langle\lambda,x\rangle-\Psi^*(\lambda)\},\qquad x\in\mathcal{X}.$$
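Lemma 4.5 can be illustrated on $\mathcal{X}=\mathbb{R}$ by computing both transforms on finite grids. The Python sketch assumes the sample convex function $\Psi(x)=x^2/2+|x|$, which is not from the text. Its transform is $\Psi^*(\lambda)=\frac12(\max\{|\lambda|-1,0\})^2$. Because the grids are truncated, the check is only approximate. The grids are chosen wide enough to contain the relevant maximizers for the test points.

```python
def legendre_on_grid(f, grid):
    """Return s -> max_{t in grid} {s*t - f(t)}, a grid version of the Fenchel-Legendre transform."""
    table = [(t, f(t)) for t in grid]                 # tabulate f once
    return lambda s: max(s * t - ft for t, ft in table)

def psi(x):
    """A sample continuous convex function: x^2/2 + |x|."""
    return 0.5 * x * x + abs(x)

X_GRID = [-4 + 0.01 * i for i in range(801)]          # x in [-4, 4]
LAM_GRID = [-5 + 0.01 * i for i in range(1001)]       # lambda in [-5, 5]

psi_star = legendre_on_grid(psi, X_GRID)              # approximates (max(|lam| - 1, 0))^2 / 2
psi_star_star = legendre_on_grid(psi_star, LAM_GRID)  # should approximate psi again (Lemma 4.5)

if __name__ == "__main__":
    for x0 in (-2.0, -1.0, 0.0, 0.5, 1.5):
        print(x0, psi_star_star(x0), psi(x0))
```

Restricting the supremum to a grid makes $\Psi^*$ slightly too small and hence $\Psi^{**}$ slightly too large. With step $0.01$ both errors are of order $10^{-5}$ here.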
The next theorem is a kind of converse of Theorem 3.7. In an infinite dimensional topological vector space, it derives the existence of the limiting logarithmic moment generating function from the LDP under an additional condition ((4.5) below). The theorem will play a crucial role in the next section.

Theorem 4.6. Let $\mathcal{X}$ be a locally convex Hausdorff topological real vector space. Assume that $(\mu_n)\subset M_1(\mathcal{X})$ satisfies the LDP with a good rate
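function $I$. Moreover, assume that
$$\limsup_{n\to\infty}\varepsilon_n\log\int_{\mathcal{X}}e^{\langle\lambda,x\rangle/\varepsilon_n}\,d\mu_n(x) < \infty \tag{4.5}$$
for every $\lambda\in\mathcal{X}^*$. Then:
(1) For every $\lambda\in\mathcal{X}^*$, the finite limit
$$\Lambda(\lambda) := \lim_{n\to\infty}\varepsilon_n\log\int_{\mathcal{X}}e^{\langle\lambda,x\rangle/\varepsilon_n}\,d\mu_n(x)$$
exists and satisfies $\Lambda(\lambda)=\sup_{x\in\mathcal{X}}\{\langle\lambda,x\rangle-I(x)\}$.
(2) If $I$ is convex (and lower semicontinuous) on $\mathcal{X}$, then $I=\Lambda^*$, i.e.,
$$I(x) = \sup_{\lambda\in\mathcal{X}^*}\{\langle\lambda,x\rangle-\Lambda(\lambda)\},\qquad x\in\mathcal{X}.$$

Proof. (1) For each $\lambda\in\mathcal{X}^*$, assumption (4.5) applied to $2\lambda$ gives
$$\limsup_{n\to\infty}\varepsilon_n\log\int_{\mathcal{X}}e^{2\langle\lambda,x\rangle/\varepsilon_n}\,d\mu_n(x) < \infty,$$
so condition (4.3) is satisfied for $\varphi(x)=\langle\lambda,x\rangle$ with $\alpha=2$. Hence Theorem 4.1 implies that
$$\Lambda(\lambda) := \lim_{n\to\infty}\varepsilon_n\log\int_{\mathcal{X}}e^{\langle\lambda,x\rangle/\varepsilon_n}\,d\mu_n(x) = \sup_{x\in\mathcal{X}}\{\langle\lambda,x\rangle-I(x)\}. \tag{4.6}$$
By (4.5) we have $\Lambda(\lambda)<\infty$ for all $\lambda\in\mathcal{X}^*$. Moreover, $\Lambda(0)=0$ and $\Lambda$ is convex on $\mathcal{X}^*$ by (4.6). Hence $\Lambda(\lambda)>-\infty$ for all $\lambda\in\mathcal{X}^*$ as well, since $0=\Lambda(0)\le\frac12\Lambda(\lambda)+\frac12\Lambda(-\lambda)$ and $\Lambda(-\lambda)<\infty$. (2) follows from (1) and Lemma 4.5.

By Lemmas 4.2 and 4.3 one can in fact prove the following stronger version of Varadhan's integral lemma. It implies Theorem 4.1 by taking $G=F=\mathcal{X}$, and it also contains the original LDP when $\varphi(x)\equiv0$.

Theorem 4.7. Let $\mathcal{X}$ be a regular topological space. Assume that $(\mu_n)\subset M_1(\mathcal{X})$ satisfies the LDP in the scale $\varepsilon_n$ with a good rate function $I$. Let $\varphi:\mathcal{X}\to\mathbb{R}$ be a continuous function and assume that condition (4.1) is satisfied. Then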
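(a) for every closed $F\subset\mathcal{X}$,
$$\limsup_{n\to\infty}\varepsilon_n\log\int_F e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x) \le \sup_{x\in F}\{\varphi(x)-I(x)\},$$
(b) for every open $G\subset\mathcal{X}$,
$$\liminf_{n\to\infty}\varepsilon_n\log\int_G e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x) \ge \sup_{x\in G}\{\varphi(x)-I(x)\}.$$

Proof. (a) For every closed $F\subset\mathcal{X}$ and every $L>0$ define
$$\theta_L(x) := \begin{cases}L & \text{if } x\in F,\\ -L & \text{if } x\in F^c,\end{cases}$$
which is obviously an upper semicontinuous function on $\mathcal{X}$. Let $\varphi_L := \min\{\varphi,\theta_L\}$, which is an upper semicontinuous function on $\mathcal{X}$ satisfying (4.1). Hence one can apply Lemma 4.2 to obtain
$$\limsup_{n\to\infty}\varepsilon_n\log\int_{\mathcal{X}}e^{\varphi_L(x)/\varepsilon_n}\,d\mu_n(x) \le \sup_{x\in\mathcal{X}}\{\varphi_L(x)-I(x)\} \le \max\Big\{\sup_{x\in F}\{\varphi(x)-I(x)\},\ -L\Big\}.$$
Moreover, since
$$\int_F e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x) \le \int_{F\cap\{\varphi<L\}}e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x)+\int_{\{\varphi\ge L\}}e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x) \le \int_{\mathcal{X}}e^{\varphi_L(x)/\varepsilon_n}\,d\mu_n(x)+\int_{\{\varphi\ge L\}}e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x),$$
we have
$$\begin{aligned}
\limsup_{n\to\infty}\varepsilon_n\log\int_F e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x) &\le \max\Big\{\limsup_{n\to\infty}\varepsilon_n\log\int_{\mathcal{X}}e^{\varphi_L(x)/\varepsilon_n}\,d\mu_n(x),\ \limsup_{n\to\infty}\varepsilon_n\log\int_{\{\varphi\ge L\}}e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x)\Big\}\\
&\le \max\Big\{\sup_{x\in F}\{\varphi(x)-I(x)\},\ -L,\ \limsup_{n\to\infty}\varepsilon_n\log\int_{\{\varphi\ge L\}}e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x)\Big\}.
\end{aligned}$$
Letting $L\to\infty$ yields the assertion thanks to (4.1).

(b) For every open $G\subset\mathcal{X}$ and $L>0$ define
$$\theta_L(x) := \begin{cases}L & \text{if } x\in G,\\ -L & \text{if } x\in G^c,\end{cases}$$
which is lower semicontinuous on $\mathcal{X}$. Apply Lemma 4.3 to the lower semicontinuous function $\varphi_L := \min\{\varphi,\theta_L\}$ to obtain
$$\liminf_{n\to\infty}\varepsilon_n\log\int_{\mathcal{X}}e^{\varphi_L(x)/\varepsilon_n}\,d\mu_n(x) \ge \sup_{x\in\mathcal{X}}\{\varphi_L(x)-I(x)\} \ge \sup_{x\in G}\{\min\{\varphi(x),L\}-I(x)\}. \tag{4.7}$$
Since
$$\int_{\mathcal{X}}e^{\varphi_L(x)/\varepsilon_n}\,d\mu_n(x) \le \int_G e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x)+\int_{G^c}e^{-L/\varepsilon_n}\,d\mu_n(x) \le \int_G e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x)+e^{-L/\varepsilon_n},$$
we have
$$\liminf_{n\to\infty}\varepsilon_n\log\int_{\mathcal{X}}e^{\varphi_L(x)/\varepsilon_n}\,d\mu_n(x) \le \max\Big\{\liminf_{n\to\infty}\varepsilon_n\log\int_G e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x),\ -L\Big\}. \tag{4.8}$$
Combining (4.7) and (4.8) and letting $L\to\infty$ gives the assertion.

Remark 4.8. By (4.4), condition (4.1) implies that the limit in (4.2) is finite; we denote it by $B$. Then Theorem 4.7 says that the sequence of probability measures
$$d\hat\mu_n(x) := \Big(\int_{\mathcal{X}}e^{\varphi(y)/\varepsilon_n}\,d\mu_n(y)\Big)^{-1}e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x)$$
satisfies the LDP in the scale $\varepsilon_n$ with the rate function $I(x)-\varphi(x)+B$ on $\mathcal{X}$.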
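Remark 4.8 can be checked by hand in a Gaussian setting used only for illustration. The following choices are assumptions, not from the text: $\mu_n=N(0,1/n)$, $\varepsilon_n=1/n$, $I(x)=x^2/2$, and the sample potential $\varphi(x)=x-x^2/4$. Completing the square gives:

```latex
\begin{aligned}
&B = \sup_{x\in\mathbb{R}}\{\varphi(x) - I(x)\} = \sup_{x\in\mathbb{R}}\{x - \tfrac34 x^2\} = \tfrac13
  \quad(\text{attained at } x = \tfrac23),\\
&d\hat\mu_n(x) \propto e^{n(x - x^2/4)}\, e^{-n x^2/2}\,dx = e^{n(x - \frac34 x^2)}\,dx,
  \quad\text{i.e. } \hat\mu_n = N\big(\tfrac23, \tfrac{2}{3n}\big),\\
&I(x) - \varphi(x) + B = \tfrac34 x^2 - x + \tfrac13 = \tfrac34\big(x - \tfrac23\big)^2 .
\end{aligned}
```

The last expression is exactly the Gaussian rate function $(x-\frac23)^2/(2\cdot\frac23)$ of the family $N(\frac23,\frac{2}{3n})$, as Remark 4.8 predicts.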
Theorem 4.1 (as well as Theorem 4.6) can be thought of as an (infinite dimensional) extension of the so-called Laplace method, so it is also called the Laplace-Varadhan integral lemma. When $\mu_n$ is the distribution of an $\mathcal{X}$-valued random variable $Y_n$, the integrals in assertions (a) and (b) of Theorem 4.7 are
$$\int_B e^{\varphi(x)/\varepsilon_n}\,d\mu_n(x) = E\big(e^{\varphi(Y_n)/\varepsilon_n}1_{\{Y_n\in B\}}\big)$$
for $B\subset\mathcal{X}$. So we may call Theorem 4.7 a "functional LDP".

From the statistical mechanical point of view, limiting logarithmic moment generating functions like (3.1) may be considered as (mean) pressure functions (or free energy densities). Rate functions for LDPs such as the one in the Gärtner-Ellis theorem often arise as entropy-like functions. The Legendre-type expression (3.2) is the so-called variational expression of the entropy-like function $I=\Lambda^*$ in terms of the pressure function $\Lambda$, and the reverse expression is often valid. This kind of variational formulation appears universally in statistical mechanical systems (see [16]). From the same point of view, Varadhan's integral lemma (and also Theorem 4.7) may be considered as a variational expression in the presence of a potential function $\varphi$. The left-hand side of (4.2) may be thought of as the free energy density perturbed by $\varphi$. The quantum version along the same lines, in the setting of quantum spin chains, will be discussed in Section 7.

5. The Sanov Theorem

In this section we will show the celebrated Sanov theorem. The proof is based on the weak LDP in a rather abstract setting (Theorem 5.4) and the exponential tightness of the distributions of the empirical measures (Lemma 5.6). We begin with a rather general assumption and some preliminary lemmas in an abstract locally convex topological space.

Assumption 5.1. $\mathcal{X}$ is a locally convex Hausdorff topological real vector space and $E$ is a closed convex subset of $\mathcal{X}$. There is a metric $\rho$ on $E$, compatible with the topology induced by $\mathcal{X}$, for which the following hold:
1° $(E,\rho)$ is a Polish space.
2° For every $0<\alpha<1$ and $x_1,x_2,y_1,y_2\in E$,
$$\rho(\alpha x_1+(1-\alpha)x_2,\ \alpha y_1+(1-\alpha)y_2) \le \max\{\rho(x_1,y_1),\rho(x_2,y_2)\}.$$
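Lemma 5.2. (1) If $K\subset E$ is compact, then the closed convex hull $\overline{\mathrm{co}}\,K$ of $K$ is compact.
(2) Let $A$ be an open and convex subset of $E$. If $K\subset A$ is compact, then $\overline{\mathrm{co}}\,K\subset A$.

Proof. (1) Let $K\subset E$ be compact and let $\delta>0$ be arbitrary. There are $x_1,\dots,x_m\in K$ such that $K\subset\bigcup_{j=1}^m B_\delta(x_j)$, where $B_\delta(x_j) := \{y\in E : \rho(y,x_j)<\delta\}$. Define
$$\Gamma(\delta) := \Big\{\sum_{j=1}^m\alpha_j y_j : \alpha_j\ge0,\ \sum_{j=1}^m\alpha_j=1,\ y_j\in B_\delta(x_j)\Big\}.$$
Since Assumption 5.1 2° implies that the balls $B_\delta(x_j)$ are convex, $\Gamma(\delta)$ is convex, so the convex hull $\mathrm{co}\,K$ of $K$ is included in $\Gamma(\delta)$. If $\sum_{j=1}^m\alpha_j y_j$ is as in the definition of $\Gamma(\delta)$ above, then
$$\rho\Big(\sum_{j=1}^m\alpha_j y_j,\ \sum_{j=1}^m\alpha_j x_j\Big) \le \max_{1\le j\le m}\rho(y_j,x_j) < \delta$$
thanks to Assumption 5.1 2° again. Hence $\mathrm{co}\,K\subset\Gamma(\delta)\subset(\mathrm{co}\{x_1,\dots,x_m\})_\delta$, so that $\overline{\mathrm{co}}\,K\subset(\mathrm{co}\{x_1,\dots,x_m\})_{2\delta}$, where $A_\delta := \{x\in E : \rho(x,A)<\delta\}$ for $A\subset E$. Since $\mathrm{co}\{x_1,\dots,x_m\}$ is obviously compact, we see that $\overline{\mathrm{co}}\,K$ is totally bounded. It is therefore compact, being closed in the complete space $(E,\rho)$.

(2) Let $A\subset E$ be open and convex, and let $K\subset A$ be compact. Since $\mathcal{X}$ is locally convex, for each $x\in K$ there is a convex neighborhood $U_x$ of $x$ such that $\overline{U_x}\subset A$. Hence, by compactness of $K$, there are finitely many convex $U_1,\dots,U_k$ such that $\overline{U_i}\subset A$ and $K\subset\bigcup_{i=1}^k U_i$. Consider
$$\tilde K := \bigcup_{i=1}^k(\overline{U_i}\cap\overline{\mathrm{co}}\,K) = \Big(\bigcup_{i=1}^k\overline{U_i}\Big)\cap\overline{\mathrm{co}}\,K \supset K.$$
Since $\overline{\mathrm{co}}\,K$ is compact by (1), $\overline{U_i}\cap\overline{\mathrm{co}}\,K$ is compact and convex for each $1\le i\le k$. Hence
$$\mathrm{co}\,\tilde K = \Big\{\sum_{i=1}^k\alpha_i y_i : \alpha_i\ge0,\ \sum_{i=1}^k\alpha_i=1,\ y_i\in\overline{U_i}\cap\overline{\mathrm{co}}\,K\Big\}$$
is compact, as it is a continuous image of the compact space
$$\Big\{(\alpha_1,\dots,\alpha_k) : \alpha_i\ge0,\ \sum_{i=1}^k\alpha_i=1\Big\}\times\prod_{i=1}^k(\overline{U_i}\cap\overline{\mathrm{co}}\,K).$$
Since $A$ is convex, we have $K\subset\mathrm{co}\,\tilde K\subset A$, so that $\overline{\mathrm{co}}\,K\subset A$.

Consider the infinite product space $\Omega := E^{\mathbb{N}} = \prod_1^\infty E$ with the Borel σ-field $\mathcal{B}_\Omega = \bigotimes_1^\infty\mathcal{B}_E$, $\mathcal{B}_E$ being the Borel (= Baire) σ-field on $E$. For each $n\in\mathbb{N}$ let $X_n:\Omega\to E$ be the $n$th coordinate function, i.e., $X_n(\omega) := \omega_n$ for $\omega=(\omega_n)\in\Omega$, so that $X_n:(\Omega,\mathcal{B}_\Omega)\to(E,\mathcal{B}_E)$ is measurable. Moreover, define
$$S_n := \frac1n\sum_{i=1}^n X_i\qquad\text{and}\qquad S_{m,n} := \frac1{n-m}\sum_{i=m+1}^n X_i\quad(m<n).$$
Now let $\hat\mu\in M_1(E)$ be given, where $M_1(E)$ is the set of probability measures on $(E,\mathcal{B}_E)$. Define a probability measure $P := \hat\mu^{\mathbb{N}} = \bigotimes_1^\infty\hat\mu$ on $(\Omega,\mathcal{B}_\Omega)$, so that $X_1,X_2,\dots$ are i.i.d. with $\hat\mu=\mu_{X_1}$. Also define
$$\hat\mu_n := \mu_{S_n}\in M_1(E),\qquad n\in\mathbb{N}. \tag{5.1}$$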
Lemma 5.3. (1) For every convex set C ∈ BE , the sequence {ˆ µn (C)}∞ n=1 is supermultiplicative. (2) For every open set A ⊂ E, either µ ˆn (A) = 0 for all n ∈ N, or there is an N ∈ N such that µ ˆ n (A) > 0 for all n ≥ N. Proof. (1) Since Sm+n = the convexity of C that
m S m+n m
+
n S , m+n m,m+n
it is immediate from
{ω : Sm+n (ω) ∈ C} ⊃ {ω : Sm (ω) ∈ C} ∩ {ω : Sm,m+n (ω) ∈ C}. By independence of Xi ’s and shift invariance of P , P (Sm+n ∈ C) ≥ P (Sm ∈ C)P (Sn ∈ C), that is, µ ˆm+n (C) ≥ µ ˆm (C)ˆ µn (C). (2) Let A ⊂ E be open and assume that µ ˆm (A) > 0 for some m ∈ N. ˆm } ⊂ M1 (E) is tight (see Theorem 5.5 below), Since a finite set {ˆ µ1 , . . . , µ there is a compact K ⊂ E such that µ ˆj (K) > 0 for 1 ≤ j ≤ m. Suppose that every x ∈ A has a neighborhood Ux with µ ˆm (Ux ) = 0. Since E has a countable open base by Assumption 5.1 (1◦ ), there is a countable {x1 , x2 , . . .}
October 24, 2013
10:0
9in x 6in
Real and Stochastic Analysis: Current Trends
A Concise Exposition of Large Deviations
b1644-ch03
221
in $A$ such that $A \subset \bigcup_{i=1}^\infty U_{x_i}$, and hence $\hat\mu_m(A) \le \sum_{i=1}^\infty \hat\mu_m(U_{x_i}) = 0$, a contradiction. Hence there exists an $x_0 \in A$ such that $\hat\mu_m(U) > 0$ for every neighborhood $U$ of $x_0$. For each $y \in E$, looking at the equality $(1-0)x_0 + 0y = x_0$ in the topological vector space $\mathcal{X}$, one can choose an $\varepsilon_y \in (0,1)$ and neighborhoods $U_y$ of $x_0$ and $V_y$ of $y$ such that $(1-\alpha)U_y + \alpha V_y \subset A$ if $0 \le \alpha < \varepsilon_y$. Choose $y_1, \dots, y_k \in E$ such that $K \subset \bigcup_{i=1}^k V_{y_i}$, and let $\varepsilon := \min_{1 \le i \le k}\varepsilon_{y_i} > 0$ and $U := \bigcap_{i=1}^k U_{y_i}$, a neighborhood of $x_0$. Then
$$(1-\alpha)U + \alpha K \subset A \quad \text{if } 0 \le \alpha < \varepsilon. \tag{5.2}$$
Since $\mathcal{X}$ is locally convex, $U$ can be assumed convex. Choose an $N \in \mathbb{N}$ with $N \ge 2m/\varepsilon$. For each $n \ge N$ write $n = lm + j$ with $l \in \mathbb{N}$ and $1 \le j \le m$. Since
$$S_n = \Bigl(1 - \frac{j}{n}\Bigr)S_{lm} + \frac{j}{n}S_{lm,n}$$
and $j/n \le m/N \le \varepsilon/2 < \varepsilon$, applying (5.2) with $\alpha = j/n$ gives
$$\hat\mu_n(A) = P(S_n \in A) \ge P(S_{lm} \in U,\ S_{lm,n} \in K) = \hat\mu_{lm}(U)\,\hat\mu_j(K)$$
by independence and shift invariance of $P$. Since $\hat\mu_j(K) > 0$ for $1 \le j \le m$ and $\hat\mu_{lm}(U) \ge \hat\mu_m(U)^l > 0$ by (1), $\hat\mu_n(A) > 0$ follows.

Hereafter we denote by $\mathcal{C}^\circ$ the set of all open convex subsets of $E$ under Assumption 5.1. For each $A \in \mathcal{C}^\circ$, if we set
$$a_n := -\log\hat\mu_n(A) \in [0,\infty], \qquad n \in \mathbb{N},$$
then the above lemma says that $\{a_n\}$ is subadditive and that either $a_n = \infty$ for all $n \in \mathbb{N}$ or there is an $N$ such that $a_n < \infty$ for all $n \ge N$. Hence it is well known that $\lim_{n\to\infty}a_n/n$ exists and
$$\lim_{n\to\infty}\frac{a_n}{n} = \inf_{n\ge1}\frac{a_n}{n}.$$
So we define
$$L_{\hat\mu}(A) := -\lim_{n\to\infty}\frac1n\log\hat\mu_n(A) = \inf_{n\ge1}\Bigl(-\frac1n\log\hat\mu_n(A)\Bigr) \in [0,\infty]. \tag{5.3}$$
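Lemma 5.3 (1) and definition (5.3) can be checked numerically in the simplest possible case. This is a minimal sketch under assumptions of our own choosing, none of which come from the text: $E = \mathbb{R}$, the $X_i$ are i.i.d. Bernoulli(1/2) variables, and $A = (0.7, \infty)$, an open convex set. The sketch computes $\hat\mu_n(A)$ exactly from binomial sums and checks supermultiplicativity. It also checks that $a_n/n$ never falls below its limit, which for this example is the known Cramér rate, i.e., the relative entropy of Bernoulli(0.7) with respect to Bernoulli(0.5).

```python
import math
from fractions import Fraction

def mu_hat(n, p, a):
    """mu_hat_n(A) = P(S_n in A) for A = (a, oo), S_n the mean of n Bernoulli(p) variables."""
    return sum(math.comb(n, k) * p**k * (1.0 - p)**(n - k)
               for k in range(n + 1) if Fraction(k, n) > a)

def bernoulli_entropy(a, p):
    """Relative entropy of Bernoulli(a) with respect to Bernoulli(p)."""
    a = float(a)
    return a * math.log(a / p) + (1.0 - a) * math.log((1.0 - a) / (1.0 - p))

p, a, N = 0.5, Fraction(7, 10), 400        # hypothetical parameters
mu = {n: mu_hat(n, p, a) for n in range(1, N + 1)}

# Lemma 5.3 (1): mu_hat_{m+n}(A) >= mu_hat_m(A) * mu_hat_n(A) for the convex set A.
supermultiplicative = all(mu[m + n] >= mu[m] * mu[n] * (1 - 1e-12)
                          for m in range(1, 41) for n in range(1, 41))

# a_n / n = -(1/n) log mu_hat_n(A); by subadditivity its limit equals its infimum.
ratios = [-math.log(mu[n]) / n for n in range(1, N + 1)]
limit = bernoulli_entropy(a, p)            # known Cramer rate for this example

print("supermultiplicative:", supermultiplicative)
print(f"a_N/N = {ratios[-1]:.4f}, min a_n/n = {min(ratios):.4f}, limit = {limit:.4f}")
```

Exact binomial sums are used instead of simulation so that the inequalities can be compared directly.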
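Moreover, for each $x \in E$ define
$$I_{\hat\mu}(x) := \lim_{\varepsilon\searrow0}L_{\hat\mu}(B_\varepsilon(x)) = \sup\{L_{\hat\mu}(A) : x \in A \in \mathcal{C}^\circ\}. \tag{5.4}$$
Then we have:

Theorem 5.4. (1) The function $I_{\hat\mu}$ is convex and lower semicontinuous on $E$.
(2) $(\hat\mu_n)$ satisfies the weak LDP with rate function $I_{\hat\mu}$, i.e.,
(a) for every open $G \subset E$,
$$\liminf_{n\to\infty}\frac1n\log\hat\mu_n(G) \ge -\inf_{x\in G}I_{\hat\mu}(x),$$
(b) for every compact $K \subset E$,
$$\limsup_{n\to\infty}\frac1n\log\hat\mu_n(K) \le -\inf_{x\in K}I_{\hat\mu}(x).$$
(3) Moreover, if $G = \bigcup_{i=1}^mA_i$ is a finite union of $A_i \in \mathcal{C}^\circ$, then
$$\lim_{n\to\infty}\frac1n\log\hat\mu_n(G) = -\inf_{x\in G}I_{\hat\mu}(x).$$

Proof. (1) By (5.3) and (5.4),
$$I_{\hat\mu}(x) = \sup\Bigl\{\inf_{n\in\mathbb{N}}\Bigl(-\frac1n\log\hat\mu_n(A)\Bigr) : x \in A \in \mathcal{C}^\circ\Bigr\},$$
so $I_{\hat\mu} \ge L_{\hat\mu}(A)$ on every $A \in \mathcal{C}^\circ$. Hence each set $\{I_{\hat\mu} > c\}$ is open, i.e., $I_{\hat\mu}$ is lower semicontinuous. Since $I_{\hat\mu}$ is lower semicontinuous, to show its convexity it suffices to show that it is midpoint convex. Let $x, x_1, x_2 \in E$ with $x = (x_1 + x_2)/2$. For any $A \in \mathcal{C}^\circ$ containing $x$, choose $A_k \in \mathcal{C}^\circ$ containing $x_k$, $k = 1, 2$, so that $\frac12(A_1 + A_2) \subset A$. Then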
$$\begin{aligned}
L_{\hat\mu}(A) &= \lim_{n\to\infty}\Bigl(-\frac{1}{2n}\log P(S_{2n} \in A)\Bigr)\\
&\le \liminf_{n\to\infty}\Bigl(-\frac{1}{2n}\log P(S_n \in A_1,\ S_{n,2n} \in A_2)\Bigr) \qquad (\text{since } S_{2n} = (S_n + S_{n,2n})/2)\\
&= \liminf_{n\to\infty}\Bigl(-\frac{1}{2n}\log P(S_n \in A_1)P(S_n \in A_2)\Bigr)\\
&= \frac12\Bigl[\lim_{n\to\infty}\Bigl(-\frac1n\log\hat\mu_n(A_1)\Bigr) + \lim_{n\to\infty}\Bigl(-\frac1n\log\hat\mu_n(A_2)\Bigr)\Bigr]\\
&= \frac{L_{\hat\mu}(A_1) + L_{\hat\mu}(A_2)}{2} \le \frac{I_{\hat\mu}(x_1) + I_{\hat\mu}(x_2)}{2},
\end{aligned}$$
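implying $I_{\hat\mu}(x) \le (I_{\hat\mu}(x_1) + I_{\hat\mu}(x_2))/2$.

(2-a) Let $G \subset E$ be open. For every $x \in G$ choose an $A \in \mathcal{C}^\circ$ such that $x \in A \subset G$. Then
$$\liminf_{n\to\infty}\frac1n\log\hat\mu_n(G) \ge \lim_{n\to\infty}\frac1n\log\hat\mu_n(A) = -L_{\hat\mu}(A) \ge -I_{\hat\mu}(x),$$
and taking the supremum over $x \in G$ gives (a).

(2-b) Let $K \subset E$ be compact and let $\beta < \inf_{x\in K}I_{\hat\mu}(x)$ be arbitrary. For each $x \in K$ there is an $A_x \in \mathcal{C}^\circ$ such that $x \in A_x$ and $L_{\hat\mu}(A_x) > \beta$. Hence there are finitely many $A_1, \dots, A_m \in \mathcal{C}^\circ$ such that $K \subset \bigcup_{i=1}^mA_i$ and $L_{\hat\mu}(A_i) > \beta$ for $1 \le i \le m$. Therefore,
$$\begin{aligned}
\limsup_{n\to\infty}\frac1n\log\hat\mu_n(K) &\le \limsup_{n\to\infty}\frac1n\log\sum_{i=1}^m\hat\mu_n(A_i) = \limsup_{n\to\infty}\frac1n\log\max_{1\le i\le m}\hat\mu_n(A_i)\\
&= \lim_{n\to\infty}\max_{1\le i\le m}\frac1n\log\hat\mu_n(A_i) = \max_{1\le i\le m}\bigl(-L_{\hat\mu}(A_i)\bigr) \le -\beta.
\end{aligned}$$

(3) Let $G = \bigcup_{i=1}^mA_i$ with $A_i \in \mathcal{C}^\circ$. Since
$$\max_{1\le i\le m}\frac1n\log\hat\mu_n(A_i) \le \frac1n\log\hat\mu_n(G) \le \frac1n\log\sum_{i=1}^m\hat\mu_n(A_i) \le \frac1n\log\Bigl(m\max_{1\le i\le m}\hat\mu_n(A_i)\Bigr) = \frac1n\log m + \max_{1\le i\le m}\frac1n\log\hat\mu_n(A_i),$$
we see that
$$\lim_{n\to\infty}\frac1n\log\hat\mu_n(G) = \lim_{n\to\infty}\max_{1\le i\le m}\frac1n\log\hat\mu_n(A_i) = \max_{1\le i\le m}\bigl(-L_{\hat\mu}(A_i)\bigr) = -\min_{1\le i\le m}L_{\hat\mu}(A_i).$$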
Hence it suffices to show that
$$L_{\hat\mu}(A) = \inf_{x\in A}I_{\hat\mu}(x) \qquad \text{for every } A \in \mathcal{C}^\circ.$$
Since $L_{\hat\mu}(A) \le \inf_{x\in A}I_{\hat\mu}(x)$ is clear by definition, let us show the reverse inequality when $L_{\hat\mu}(A) < \infty$. For any $\delta > 0$ choose an $N \in \mathbb{N}$ such that $-\frac1n\log\hat\mu_n(A) \le L_{\hat\mu}(A) + \delta$ for $n \ge N$. Note that an open subset $A$ of $E$ is a Polish space in the topology induced by $E$. Hence $\hat\mu_N|_A$ is tight (see Theorem 5.5), so there is a compact $K \subset A$ such that
$$-\frac1N\log\hat\mu_N(K) \le -\frac1N\log\hat\mu_N(A) + \delta \le L_{\hat\mu}(A) + 2\delta.$$
By Lemma 5.2, $K$ can be assumed convex. Then by Lemma 5.3 (1) we have
$$-\limsup_{n\to\infty}\frac1n\log\hat\mu_n(K) \le -\limsup_{n\to\infty}\frac{1}{nN}\log\hat\mu_{nN}(K) \le -\limsup_{n\to\infty}\frac{1}{nN}\log\hat\mu_N(K)^n = -\frac1N\log\hat\mu_N(K) \le L_{\hat\mu}(A) + 2\delta.$$
Furthermore, by (2-b) we have
$$\inf_{x\in A}I_{\hat\mu}(x) \le \inf_{x\in K}I_{\hat\mu}(x) \le -\limsup_{n\to\infty}\frac1n\log\hat\mu_n(K).$$
Therefore, $\inf_{x\in A}I_{\hat\mu}(x) \le L_{\hat\mu}(A) + 2\delta$, and the assertion follows since $\delta > 0$ is arbitrary.
Now we fix the setting in which the Sanov theorem is formulated. Let $\Sigma$ be a Polish space with the Borel (= Baire) $\sigma$-field $\mathcal{B}_\Sigma$. We write $C_b(\Sigma;\mathbb{R})$ for the space of all real bounded continuous functions on $\Sigma$, which is a Banach space with the sup-norm $\|f\| := \sup_{s\in\Sigma}|f(s)|$. Let $M(\Sigma)$ denote the set of all finite signed measures on $(\Sigma, \mathcal{B}_\Sigma)$, which is a Banach space with the total variation norm. Moreover, let $M_1(\Sigma)$ be the set of all probability measures on $(\Sigma, \mathcal{B}_\Sigma)$. The dual pairing between $C_b(\Sigma;\mathbb{R})$ and $M(\Sigma)$ is defined as
$$\langle f, \nu\rangle := \int_\Sigma f\,d\nu, \qquad f \in C_b(\Sigma;\mathbb{R}),\ \nu \in M(\Sigma).$$
The weak topology on $M(\Sigma)$ is precisely $\sigma(M(\Sigma), C_b(\Sigma;\mathbb{R}))$, the topology induced by the dual pair, for which $M(\Sigma)$ becomes a locally
convex Hausdorff topological real vector space whose dual space is $C_b(\Sigma;\mathbb{R})$. Clearly, $M_1(\Sigma)$ is a closed (in the weak topology) convex subset of $M(\Sigma)$. In the following, we state a few basic facts on the weak topology on $M_1(\Sigma)$ (see, e.g., [6, 7] for details).

1° The Lévy metric on $M_1(\Sigma)$ is defined by
$$\rho(\mu,\nu) := \inf\{\delta > 0 : \mu(F) \le \nu(F^\delta) + \delta \text{ and } \nu(F) \le \mu(F^\delta) + \delta \text{ for all closed } F \subset \Sigma\}$$
for $\mu, \nu \in M_1(\Sigma)$, where $F^\delta$ denotes the open $\delta$-neighborhood of $F$ with respect to a fixed complete metric on $\Sigma$ compatible with its topology. The metric $\rho$ is compatible with the weak topology and makes $(M_1(\Sigma), \rho)$ a Polish space (see [7, 14]). Moreover, $\rho$ satisfies Assumption 5.1 (2°) (this is an easy exercise).

2° $M_1(\Sigma)$ is compact in the weak topology if and only if $\Sigma$ is compact. In this case, $M(\Sigma) = C(\Sigma;\mathbb{R})^*$, the dual Banach space, and the weak topology is the weak* topology.

3° For $(\mu_n)$ and $\mu$ in $M_1(\Sigma)$, $\mu_n \to \mu$ weakly as $n \to \infty$ if and only if
$$\limsup_{n\to\infty}\mu_n(F) \le \mu(F) \qquad \text{for all closed } F \subset \Sigma,$$
or equivalently $\liminf_{n\to\infty}\mu_n(G) \ge \mu(G)$ for all open $G \subset \Sigma$.

The next result is important (see [7]).

Theorem 5.5 (Prohorov). A set $\Gamma \subset M_1(\Sigma)$ is relatively compact in the weak topology if and only if $\Gamma$ is tight, namely, for every $\delta > 0$ there is a compact $K \subset \Sigma$ such that $\mu(\Sigma\setminus K) \le \delta$ for all $\mu \in \Gamma$. In particular, a single $\mu \in M_1(\Sigma)$ is tight.

Now the Sanov theorem is formulated as follows. Let $Y = (Y_1, Y_2, \dots)$ be a sequence of i.i.d. $\Sigma$-valued random variables with $\mu := \mu_{Y_1} \in M_1(\Sigma)$. Let $\delta_s$ be the Dirac measure at $s \in \Sigma$. We define the empirical measures
$$L_n^Y := \frac{\delta_{Y_1} + \cdots + \delta_{Y_n}}{n}, \qquad n \in \mathbb{N}, \tag{5.5}$$
which are $M_1(\Sigma)$-valued random variables, i.e., random probability measures on $\Sigma$. Let $\mathcal{X} := M(\Sigma)$, $E := M_1(\Sigma)$, and let $\hat\mu \in M_1(E)$ be the distribution of $\delta_{Y_1}$. Then clearly $\hat\mu_n \in M_1(E)$ given in (5.1) is precisely the distribution of $L_n^Y$, i.e.,
$$\hat\mu_n(\Gamma) = P(L_n^Y \in \Gamma) = \mu^{\otimes n}\Bigl(\Bigl\{(t_1,\dots,t_n) \in \Sigma^n : \frac{\delta_{t_1} + \cdots + \delta_{t_n}}{n} \in \Gamma\Bigr\}\Bigr) = \hat\mu^{\otimes n}\Bigl(\Bigl\{(\nu_1,\dots,\nu_n) \in E^n : \frac{\nu_1 + \cdots + \nu_n}{n} \in \Gamma\Bigr\}\Bigr) \tag{5.6}$$
for Borel sets $\Gamma \subset M_1(\Sigma)$. With fact 1° above, Theorem 5.4 implies that $(\hat\mu_n)$ satisfies the weak LDP with a rate function $I_{\hat\mu}$ that is convex and lower semicontinuous on $M_1(\Sigma)$. In the rest of this section, we prove the Sanov theorem, which says that $(\hat\mu_n)$ in fact satisfies the (full) LDP and that $I_{\hat\mu}$ is a good rate function equal to the relative entropy functional with respect to $\mu$. To prove this, the following is essential.

Lemma 5.6. The sequence of the distributions $\hat\mu_n$ of $L_n^Y$ is exponentially tight, i.e., for every $m \in \mathbb{N}$ there is a compact $K_m \subset M_1(\Sigma)$ such that
$$\limsup_{n\to\infty}\frac1n\log\hat\mu_n(K_m^c) \le -m.$$
Proof. Since $\mu \in M_1(\Sigma)$ is tight, there are compact sets $C_k \subset \Sigma$, $k \in \mathbb{N}$, such that $\mu(C_k^c) \le e^{-2k^2}$. Define
$$K_k := \Bigl\{\nu \in M_1(\Sigma) : \nu(C_k^c) \le \frac1k\Bigr\}, \qquad \hat K_m := \bigcap_{k=m}^\infty K_k.$$
Then the $K_k$'s are closed in $M_1(\Sigma)$ by fact 3° above, so $\hat K_m$ is closed. Since $\nu(C_k^c) \le 1/k$ for all $k \ge m$ and all $\nu \in \hat K_m$, $\hat K_m$ is tight and hence compact by Theorem 5.5. By the exponential Markov inequality,
$$\begin{aligned}
P(L_n^Y \notin K_k) &= P\Bigl(L_n^Y(C_k^c) > \frac1k\Bigr) \le E\Bigl[\exp\Bigl(2nk^2\Bigl(L_n^Y(C_k^c) - \frac1k\Bigr)\Bigr)\Bigr] = e^{-2nk}E\Bigl[\exp\Bigl(2k^2\sum_{i=1}^n1_{\{Y_i\in C_k^c\}}\Bigr)\Bigr]\\
&= e^{-2nk}E\bigl[\exp\bigl(2k^21_{\{Y_1\in C_k^c\}}\bigr)\bigr]^n = e^{-2nk}\bigl[\mu(C_k) + e^{2k^2}\mu(C_k^c)\bigr]^n \le e^{-2nk}2^n \le e^{-nk}.
\end{aligned}$$
Hence
$$P(L_n^Y \notin \hat K_m) \le \sum_{k=m}^\infty P(L_n^Y \notin K_k) \le \sum_{k=m}^\infty e^{-nk} = \frac{e^{-nm}}{1 - e^{-n}} \le 2e^{-nm},$$
so that
$$\limsup_{n\to\infty}\frac1n\log\hat\mu_n(\hat K_m^c) = \limsup_{n\to\infty}\frac1n\log P(L_n^Y \notin \hat K_m) \le -m.$$
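From Lemmas 1.5 and 5.6 we conclude that $(\hat\mu_n)$ satisfies the LDP with the good rate function $I_{\hat\mu}$. Furthermore, notice that, for every $f \in C_b(\Sigma;\mathbb{R})$,
$$\frac1n\log\int_{M_1(\Sigma)}e^{n\langle f,\nu\rangle}\,d\hat\mu_n(\nu) = \frac1n\log E\bigl[\exp\bigl(n\langle f, L_n^Y\rangle\bigr)\bigr] = \frac1n\log E\Bigl[\exp\Bigl(\sum_{i=1}^nf(Y_i)\Bigr)\Bigr] = \frac1n\log E\bigl[e^{f(Y_1)}\bigr]^n = \log\int_\Sigma e^f\,d\mu < \infty.$$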
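On a finite state space the identity above can be confirmed by brute force. The sketch assumes a hypothetical three-point $\Sigma = \{0, 1, 2\}$, together with a law $\mu$ and a function $f$ that we chose ourselves. It enumerates every sample $(Y_1, \dots, Y_n) \in \Sigma^n$, so it only illustrates the computation and does not scale.

```python
import itertools
import math

mu = [0.2, 0.5, 0.3]     # hypothetical law of Y_1 on Sigma = {0, 1, 2}
f = [1.0, -0.5, 2.0]     # hypothetical bounded function f on Sigma

def lambda_mu(f, mu):
    """Lambda_mu(f) = log of the integral of e^f with respect to mu."""
    return math.log(sum(m * math.exp(v) for m, v in zip(mu, f)))

def scaled_log_mgf(n, f, mu):
    """(1/n) log E[exp(n <f, L_n^Y>)], summing over all n-tuples (Y_1, ..., Y_n)."""
    total = 0.0
    for word in itertools.product(range(len(mu)), repeat=n):
        prob = math.prod(mu[s] for s in word)
        pairing = sum(f[s] for s in word) / n      # <f, L_n^Y> for this sample
        total += prob * math.exp(n * pairing)
    return math.log(total) / n

for n in range(1, 7):
    print(n, scaled_log_mgf(n, f, mu), lambda_mu(f, mu))
```

The printed values agree for every $n$ up to rounding, in line with the independence argument above.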
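Hence Theorem 4.6 implies that
$$I_{\hat\mu}(\nu) = \sup_{f\in C_b(\Sigma;\mathbb{R})}\{\langle f,\nu\rangle - \Lambda_\mu(f)\}, \qquad \nu \in M_1(\Sigma),$$
where
$$\Lambda_\mu(f) := \log\int_\Sigma e^f\,d\mu, \qquad f \in C_b(\Sigma;\mathbb{R}).$$
(To be more precise in applying Theorem 4.6, we may extend $I_{\hat\mu}$ to $M(\Sigma)$ by letting $I_{\hat\mu}(\nu) = \infty$ for $\nu \in M(\Sigma)\setminus M_1(\Sigma)$.) Therefore, $I_{\hat\mu}$ is the Fenchel–Legendre transform $\Lambda_\mu^*$ of $\Lambda_\mu$. In physics terminology, $\Lambda_\mu(f)$ is the pressure of a potential $f$ with respect to $\mu$. An essential fact is that $\Lambda_\mu^*(\nu)$ for $\nu \in M_1(\Sigma)$ is the relative entropy of $\nu$ with respect to $\mu$, which we show in the following.

Lemma 5.7. For every $\nu \in M_1(\Sigma)$, $\Lambda_\mu^*(\nu)$ coincides with the relative entropy (or Kullback–Leibler divergence) $S(\nu\|\mu)$ defined by
$$S(\nu\|\mu) := \begin{cases}\displaystyle\int_\Sigma\frac{d\nu}{d\mu}\log\frac{d\nu}{d\mu}\,d\mu = \int_\Sigma\log\frac{d\nu}{d\mu}\,d\nu & \text{if } \nu \ll \mu \text{ (absolutely continuous)},\\ \infty & \text{otherwise}.\end{cases}$$

Proof. Assume $\nu \ll \mu$ and let $\psi := d\nu/d\mu$. For $0 < \theta < 1$ let $\nu_\theta := (1-\theta)\nu + \theta\mu$ and $\psi_\theta := d\nu_\theta/d\mu = (1-\theta)\psi + \theta\ (\ge \theta)$. For every $f \in C_b(\Sigma;\mathbb{R})$, by Jensen's inequality we have
$$\exp[\langle f,\nu_\theta\rangle - S(\nu_\theta\|\mu)] = \exp\int_\Sigma(f - \log\psi_\theta)\,d\nu_\theta \le \int_\Sigma\exp(f - \log\psi_\theta)\,d\nu_\theta = \int_\Sigma e^f\psi_\theta^{-1}\,d\nu_\theta = \int_\Sigma e^f\,d\mu = \exp\Lambda_\mu(f),$$
so that $\langle f,\nu_\theta\rangle - \Lambda_\mu(f) \le S(\nu_\theta\|\mu)$. Obviously, $\langle f,\nu_\theta\rangle \to \langle f,\nu\rangle$ as $\theta \searrow 0$. On the other hand, by convexity of $t\log t$ (and $1\log1 = 0$) we have
$$S(\nu_\theta\|\mu) = \int_\Sigma\psi_\theta\log\psi_\theta\,d\mu \le \int_\Sigma(1-\theta)\psi\log\psi\,d\mu = (1-\theta)S(\nu\|\mu) \longrightarrow S(\nu\|\mu) \qquad \text{as } \theta \searrow 0.$$
Hence $\langle f,\nu\rangle - \Lambda_\mu(f) \le S(\nu\|\mu)$ for every $f \in C_b(\Sigma;\mathbb{R})$. This implies that $\Lambda_\mu^*(\nu) \le S(\nu\|\mu)$, which is trivial unless $\nu \ll \mu$.

To prove the reverse inequality, assume that $\beta := \Lambda_\mu^*(\nu) < \infty$. Then
$$\int_\Sigma f\,d\nu - \log\int_\Sigma e^f\,d\mu \le \beta \tag{5.7}$$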
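for all $f \in C_b(\Sigma;\mathbb{R})$. For any bounded $\mathcal{B}_\Sigma$-measurable function $\varphi$, one can choose a uniformly bounded sequence $\{f_n\}$ in $C_b(\Sigma;\mathbb{R})$ such that $f_n \to \varphi$ $(\mu+\nu)$-a.e. Applying (5.7) to $f_n$ and taking the limit (by bounded convergence), we see that (5.7) holds for $f = \varphi$ as well. When $\Gamma \in \mathcal{B}_\Sigma$ and $\mu(\Gamma) = 0$, letting $f = \alpha1_\Gamma$ in (5.7) we have $\alpha\nu(\Gamma) \le \beta$ for all $\alpha > 0$; hence $\nu(\Gamma) = 0$. Therefore $\nu \ll \mu$, so let $\psi := d\nu/d\mu$ and
$$\psi_n := \frac1n1_{\{\psi<\frac1n\}} + \psi1_{\{\frac1n\le\psi\le n\}} + n1_{\{\psi>n\}}.$$
Applying (5.7) to the bounded function $\log\psi_n$ gives
$$\int_\Sigma\log\psi_n\,d\nu - \log\int_\Sigma\psi_n\,d\mu \le \beta.$$
Notice that
$$\begin{aligned}
\int_\Sigma\log\psi_n\,d\nu &\ge \int_{\{\psi<\frac1n\}}\log\frac1n\,d\nu + \int_{\{\frac1n\le\psi\le n\}}\log\psi\,d\nu = \int_{\{\psi<\frac1n\}}\psi\log\frac1n\,d\mu + \int_{\{\frac1n\le\psi\le n\}}\psi\log\psi\,d\mu\\
&\ge \int_{\{\psi\le n\}}\psi\log\psi\,d\mu \qquad \Bigl(\text{since } \psi\log\frac1n \ge \psi\log\psi \text{ on } \Bigl\{\psi < \frac1n\Bigr\}\Bigr)\\
&\longrightarrow \int_\Sigma\psi\log\psi\,d\mu = S(\nu\|\mu) \qquad \text{as } n \to \infty.
\end{aligned}$$
Moreover, it is straightforward to check that $\int_\Sigma\psi_n\,d\mu \to \int_\Sigma\psi\,d\mu = 1$ as $n \to \infty$. Hence we obtain $S(\nu\|\mu) \le \beta$, and $S(\nu\|\mu) \le \Lambda_\mu^*(\nu)$ follows.

Remark 5.8. Since $(\nu,\mu) \mapsto \langle f,\nu\rangle - \log\int_\Sigma e^f\,d\mu$ is jointly convex (affine in $\nu$ and convex in $\mu$) and weakly continuous on $M_1(\Sigma)\times M_1(\Sigma)$, Lemma 5.7 shows that $S(\nu\|\mu)$ is jointly convex and jointly lower semicontinuous in the weak topology; these are well-known properties of relative entropy.

At the end we obtain:

Theorem 5.9 (Sanov). Let $Y = (Y_1, Y_2, \dots)$ be a sequence of i.i.d. $\Sigma$-valued random variables with $\mu := \mu_{Y_1} \in M_1(\Sigma)$. Then the distributions $(\hat\mu_n)$ of the empirical measures $(L_n^Y)$ satisfy the LDP with the relative entropy functional $S(\cdot\|\mu)$ as good rate function.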
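The variational formula of Lemma 5.7 can be seen concretely on a finite set. The sketch below is a hypothetical example with three-point measures of our own choosing. It checks that $f = \log(d\nu/d\mu)$ attains the supremum $S(\nu\|\mu)$ and that randomly drawn $f$ never exceed it. It also shows that when $\mu(\Gamma) = 0 < \nu(\Gamma)$, the choice $f = \alpha1_\Gamma$ drives the objective to $+\infty$, as in the proof.

```python
import math
import random

def relative_entropy(nu, mu):
    """S(nu || mu) on a finite set; +inf unless nu << mu."""
    total = 0.0
    for p, q in zip(nu, mu):
        if p == 0.0:
            continue
        if q == 0.0:
            return math.inf
        total += p * math.log(p / q)
    return total

def dv_objective(f, nu, mu):
    """<f, nu> - Lambda_mu(f), the quantity whose supremum over f is Lambda_mu^*(nu)."""
    pairing = sum(p * v for p, v in zip(nu, f))
    return pairing - math.log(sum(q * math.exp(v) for q, v in zip(mu, f)))

mu = [0.2, 0.5, 0.3]       # hypothetical reference measure
nu = [0.6, 0.1, 0.3]       # hypothetical nu << mu
S = relative_entropy(nu, mu)

f_star = [math.log(p / q) for p, q in zip(nu, mu)]   # f = log(d nu / d mu)
rng = random.Random(1)
best_random = max(dv_objective([rng.uniform(-5.0, 5.0) for _ in mu], nu, mu)
                  for _ in range(10000))
print(S, dv_objective(f_star, nu, mu), best_random)

# If mu(Gamma) = 0 < nu(Gamma), then f = alpha * 1_Gamma gives alpha * nu(Gamma).
mu2, nu2 = [0.5, 0.5, 0.0], [0.2, 0.3, 0.5]
print([dv_objective([0.0, 0.0, alpha], nu2, mu2) for alpha in (1.0, 10.0, 100.0)])
```

On a general Polish space the maximizer $\log(d\nu/d\mu)$ need not be bounded or continuous, which is why the proof uses the truncations $\psi_n$.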
There is another route to the Sanov theorem. The Gärtner–Ellis theorem can be generalized to the abstract setting of locally convex topological vector spaces; this generalization is known as Baldi's theorem. Theorem 5.9 can be obtained by using this theorem and Lemma 5.6. Moreover, the Sanov theorem holds true in the $\tau$-topology, which is finer than the weak topology. The $\tau$-topology is induced by the dual pairing between the space $B(\Sigma)$ of bounded Borel measurable functions and $M(\Sigma)$. (See [13, 14] for details.)

The Cramér theorem (and also the Gärtner–Ellis theorem) is the LDP for empirical sums in $\mathbb{R}$ (or $\mathbb{R}^d$), while the Sanov theorem is the LDP for empirical measures in $M(\Sigma)$. Since the levels of the underlying spaces are different, they are sometimes called large deviations of level-1 and of level-2, respectively (see [16]).
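The relation between the two levels can be illustrated by the contraction principle, which is not stated in this section. The level-1 rate of the empirical mean at $a$ should equal the minimum of the level-2 rate $S(\nu\|\mu)$ over all $\nu$ with mean $a$. The sketch below assumes a hypothetical three-point $\Sigma = \{0, 1, 2\} \subset \mathbb{R}$ and a law $\mu$ of our own choosing. It computes the Cramér rate as a Legendre transform, using a ternary search on the concave function $\lambda a - \log\int e^{\lambda x}\,d\mu$. It then compares that rate with $S(\nu\|\mu)$ along a line of measures $\nu$ that all have mean $a$.

```python
import math

xs = [0.0, 1.0, 2.0]     # hypothetical state space Sigma, embedded in R
mu = [0.5, 0.3, 0.2]     # hypothetical law of Y_1 (mean 0.7)
a = 1.2                  # level-1 value of the empirical mean

def log_mgf(lam):
    return math.log(sum(m * math.exp(lam * x) for m, x in zip(mu, xs)))

def argmax_concave(g, lo, hi, iters=300):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if g(m1) < g(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0

def kl(nu, ref):
    return sum(p * math.log(p / q) for p, q in zip(nu, ref) if p > 0.0)

# Level-1 (Cramer) rate: Legendre transform of the log moment generating function.
lam_star = argmax_concave(lambda lam: lam * a - log_mgf(lam), -30.0, 30.0)
cramer_rate = lam_star * a - log_mgf(lam_star)

# Level-2 candidate: the exponentially tilted law, which has mean a.
weights = [m * math.exp(lam_star * x) for m, x in zip(mu, xs)]
nu_star = [w / sum(weights) for w in weights]
direction = [1.0, -2.0, 1.0]       # keeps total mass and mean fixed on xs = (0, 1, 2)
others = []
for k in range(-50, 51):
    nu = [p + (k / 500.0) * d for p, d in zip(nu_star, direction)]
    if min(nu) > 0.0:
        others.append(kl(nu, mu))

print(cramer_rate, kl(nu_star, mu), min(others))
```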
6. Large Deviations for Random Matrices

The logarithmic energy $E(\nu)$ of a signed measure $\nu$ on the complex plane $\mathbb{C}$ is given as
$$E(\nu) := \int_{\mathbb{C}^2}\log\frac{1}{|x-y|}\,d\nu(x)\,d\nu(y)$$
whenever
$$\int_{\mathbb{C}^2}\Bigl|\log\frac{1}{|x-y|}\Bigr|\,d|\nu|(x)\,d|\nu|(y) < \infty$$
($|\nu|$ denotes the total variation of $\nu$); otherwise we put $E(\nu) := \infty$. The logarithmic energy plays an essential role in potential theory.

Lemma 6.1. Let $\nu$ be a compactly supported signed measure on $\mathbb{C}$ such that $\nu(1) = 0$ (i.e., $\nu(\mathbb{C}) = 0$). Then $E(\nu) \ge 0$, and $E(\nu) = 0$ if and only if $\nu = 0$.

Proof. Recall that a real symmetric kernel $L(x,y)$ is called negative definite if
$$\sum_{i,j=1}^n\xi_i\xi_jL(x_i,x_j) \le 0 \tag{6.1}$$
whenever real numbers $\xi_1, \dots, \xi_n$ satisfy $\sum_{i=1}^n\xi_i = 0$. It follows by approximation that for a continuous negative definite kernel $L(x,y)$ one has
$$\int_{\mathbb{C}^2}L(x,y)\,d\nu(x)\,d\nu(y) \le 0$$
if $\nu$ is a compactly supported signed measure such that $\nu(1) = 0$. Indeed, approximating $\nu$ by atomic measures, one obtains a double integral that reduces to a double sum of the form (6.1). The logarithmic kernel $K(x,y) := \log|x-y|$ has a singularity at $x = y$; to avoid this we set, for $\varepsilon > 0$,
$$K_\varepsilon(x,y) := \log(\varepsilon + |x-y|) = \int_0^\infty\Bigl(\frac{1}{1+t} - \frac{1}{t+\varepsilon+|x-y|}\Bigr)dt. \tag{6.2}$$
This kernel $K_\varepsilon(x,y)$ is an integral of negative definite kernels ([3, Chapter 3]) and hence is negative definite itself. Therefore,
$$\int_{\mathbb{C}^2}K_\varepsilon(x,y)\,d\nu(x)\,d\nu(y) \le 0,$$
and we let $\varepsilon \searrow 0$ to conclude that $E(\nu) \ge 0$ whenever $K(x,y)$ is integrable with respect to $|\nu|\otimes|\nu|$ (otherwise $E(\nu) = \infty$ by definition).

Now assume $E(\nu) = 0$. For $0 < \varepsilon < R < \infty$ we write
$$\begin{aligned}
-\int_\varepsilon^R\int_{\mathbb{C}^2}\frac{1}{t+|x-y|}\,d\nu(x)\,d\nu(y)\,dt &= \int_\varepsilon^R\int_{\mathbb{C}^2}\Bigl(\frac{1}{1+t} - \frac{1}{t+|x-y|}\Bigr)d\nu(x)\,d\nu(y)\,dt\\
&= \int_{\mathbb{C}^2}\Bigl(\log(\varepsilon+|x-y|) + \log\frac{1+R}{(1+\varepsilon)(R+|x-y|)}\Bigr)d\nu(x)\,d\nu(y)
\end{aligned} \tag{6.3}$$
by the Fubini theorem. Here note ([3]) that $(t+|x-y|)^{-1}$ is a positive definite kernel for any $t > 0$, and hence
$$\int_{\mathbb{C}^2}\frac{1}{t+|x-y|}\,d\nu(x)\,d\nu(y) \ge 0, \qquad t > 0. \tag{6.4}$$
We can take the limit of (6.3) as $\varepsilon \searrow 0$ and $R \nearrow \infty$ to obtain
$$\int_0^\infty\int_{\mathbb{C}^2}\frac{1}{t+|x-y|}\,d\nu(x)\,d\nu(y)\,dt = 0,$$
which, thanks to (6.4), implies that
$$\int_{\mathbb{C}^2}\frac{1}{t+|x-y|}\,d\nu(x)\,d\nu(y) = 0 \qquad \text{for all } t > 0.$$
Taking the expansion
$$\frac{1}{t+|x-y|} = \sum_{n=0}^\infty\frac{(-1)^n|x-y|^n}{t^{n+1}}$$
in a neighborhood of $t = \infty$, we have
$$\int_{\mathbb{C}^2}|x-y|^{2n}\,d\nu(x)\,d\nu(y) = 0$$
for all integers $n \ge 0$. This means that
$$\sum_{i,j=0}^n(-1)^{i+j}\binom{n}{i}\binom{n}{j}\int_{\mathbb{C}}x^i\bar x^j\,d\nu(x)\int_{\mathbb{C}}x^{n-i}\bar x^{n-j}\,d\nu(x) = 0$$
for all $n$. Now an easy induction on $n$ gives $\int_{\mathbb{C}}x^i\bar x^j\,d\nu(x) = 0$ for all integers $i, j \ge 0$. This is enough to conclude that $\nu = 0$, since polynomials in $x$ and $\bar x$ are dense in the continuous functions on $\operatorname{supp}\nu$ by the Stone–Weierstrass theorem.

The free entropy $\Sigma(\nu)$ of $\nu \in M_1(\mathbb{C})$ is defined by
$$\Sigma(\nu) := \int_{\mathbb{C}^2}\log|x-y|\,d\nu(x)\,d\nu(y) = -E(\nu). \tag{6.5}$$
Note that when $\nu$ is compactly supported, the above integral always exists, though it can be $-\infty$, for example, if $\nu$ has an atom.

Lemma 6.2. The free entropy functional $\Sigma(\nu)$ is weakly upper semicontinuous and concave on the set of probability measures supported in any fixed compact subset of $\mathbb{C}$. Moreover, it is strictly concave in the sense that
$$\Sigma(\lambda\nu_1 + (1-\lambda)\nu_2) > \lambda\Sigma(\nu_1) + (1-\lambda)\Sigma(\nu_2)$$
if $0 < \lambda < 1$ and $\nu_1, \nu_2$ are compactly supported probability measures such that $\nu_1 \ne \nu_2$, $\Sigma(\nu_1) > -\infty$ and $\Sigma(\nu_2) > -\infty$.

Proof. Let $K_\varepsilon(x,y)$ be the kernel given in (6.2). The weak upper semicontinuity follows because $\Sigma(\nu)$ can be written as
$$\Sigma(\nu) = \inf_{\varepsilon>0}\int_{\mathbb{C}^2}K_\varepsilon(x,y)\,d\nu(x)\,d\nu(y),$$
and the above double integral is continuous in the weak topology when the support of $\nu$ is restricted to a compact subset.
To prove the strict concavity, let $\nu_1 \ne \nu_2$ be compactly supported probability measures such that $\Sigma(\nu_1) > -\infty$ and $\Sigma(\nu_2) > -\infty$. First we show that
$$E(\nu_1,\nu_2) := \int_{\mathbb{C}^2}\log\frac{1}{|x-y|}\,d\nu_1(x)\,d\nu_2(y)$$
is finite. Since the kernel $K_\varepsilon(x,y)$ is negative definite and $K_\varepsilon(x,y) \ge \log|x-y|$, we have
$$0 \ge \int_{\mathbb{C}^2}K_\varepsilon(x,y)\,d(\nu_1-\nu_2)(x)\,d(\nu_1-\nu_2)(y) \ge \Sigma(\nu_1) + \Sigma(\nu_2) - 2\int_{\mathbb{C}^2}K_\varepsilon(x,y)\,d\nu_1(x)\,d\nu_2(y).$$
Letting $\varepsilon \searrow 0$ yields $\Sigma(\nu_1) + \Sigma(\nu_2) + 2E(\nu_1,\nu_2) \le 0$, so $E(\nu_1,\nu_2) < \infty$ (and $E(\nu_1,\nu_2) > -\infty$ is obvious since $\nu_1, \nu_2$ are compactly supported). Now we are in the situation where $E(\nu_1)$, $E(\nu_2)$ and $E(\nu_1,\nu_2)$ are all finite. Then, for $0 < \lambda < 1$,
$$E(\lambda\nu_1 + (1-\lambda)\nu_2) = E(\nu_2) + 2\lambda E(\nu_2, \nu_1-\nu_2) + \lambda^2E(\nu_1-\nu_2),$$
and by Lemma 6.1
$$\frac{d^2}{d\lambda^2}E(\lambda\nu_1 + (1-\lambda)\nu_2) = 2E(\nu_1-\nu_2) > 0.$$
This implies that $\Sigma(\nu) = -E(\nu)$ is strictly concave (hence also concave).
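The negative definiteness of $K_\varepsilon$ in (6.2), used in the proofs of Lemmas 6.1 and 6.2, can be spot-checked numerically. The sketch draws random points in $\mathbb{C}$, random real weights $\xi_i$ with $\sum_i\xi_i = 0$, and random $\varepsilon$. It records the largest value of the quadratic form (6.1) it encounters. A random search is only a sanity check, not a proof, and all sampling ranges are arbitrary choices of ours.

```python
import math
import random

def k_eps(z, w, eps):
    """Regularized logarithmic kernel K_eps(x, y) = log(eps + |x - y|) of (6.2)."""
    return math.log(eps + abs(z - w))

def quadratic_form(points, weights, eps):
    """sum_{i,j} xi_i xi_j K_eps(x_i, x_j), the left-hand side of (6.1)."""
    return sum(wi * wj * k_eps(zi, zj, eps)
               for zi, wi in zip(points, weights)
               for zj, wj in zip(points, weights))

rng = random.Random(2024)
worst = -math.inf
for _ in range(2000):
    n = rng.randint(2, 8)
    pts = [complex(rng.uniform(-3, 3), rng.uniform(-3, 3)) for _ in range(n)]
    xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mean = sum(xi) / n
    xi = [x - mean for x in xi]              # enforce sum(xi) = 0
    eps = 10 ** rng.uniform(-4, 0)
    worst = max(worst, quadratic_form(pts, xi, eps))

print(f"largest quadratic form observed: {worst:.3e}")
```

The constraint $\sum_i\xi_i = 0$ is essential: without it the constant part of the kernel can make the form positive.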
The (logarithmic) capacity of a compact set $K \subset \mathbb{R}$ is defined as
$$\operatorname{cap}(K) := \exp\sup\{\Sigma(\nu) : \nu \in M_1(K)\},$$
with the convention $\operatorname{cap}(K) = 0$ if $\Sigma(\nu) = -\infty$ for all $\nu \in M_1(K)$. The capacity of a general Borel set $A \subset \mathbb{R}$ is then defined as
$$\operatorname{cap}(A) := \sup\{\operatorname{cap}(K) : K \subset A \text{ compact}\}.$$
A property is said to hold for quasi-every $x \in A$ if it holds for all $x \in A$ except on a set of capacity zero.

Let $S$ be a closed subset of $\mathbb{R}$ (or $\mathbb{C}$). Let $M_1(S)$ denote the set of all probability measures $\nu$ whose support $\operatorname{supp}\nu$ is included in $S$. Moreover, let $w : S \to [0,\infty)$ be a weight function, which is assumed to satisfy the following conditions:
1° $w$ is continuous on $S$.
2° $S_0 := \{x \in S : w(x) > 0\}$ has positive (logarithmic) capacity, that is, $E(\nu) < +\infty$ for some probability measure $\nu$ with $\operatorname{supp}\nu \subset S_0$.
3° $|x|w(x) \to 0$ as $x \in S$, $|x| \to \infty$, when $S$ is unbounded.
Let $Q(x) := -\log w(x)$ and define the weighted energy integral (or weighted potential)
$$E_Q(\nu) := \int_{S^2}\Bigl(\log\frac{1}{|x-y|} + Q(x) + Q(y)\Bigr)d\nu(x)\,d\nu(y) \qquad \text{for } \nu \in M_1(S).$$
One observes that $E_Q(\nu) > -\infty$ is well defined thanks to the above assumptions. See [35, Theorem I.1.3] for the details of the next theorem, due to Mhaskar and Saff, which is fundamental in the theory of weighted potentials; it is proved by an adaptation of the classical Frostman method.

Theorem 6.3 (Mhaskar–Saff). Under the above assumptions, there exists a unique $\nu_Q \in M_1(S)$ such that
$$E_Q(\nu_Q) = \inf\{E_Q(\nu) : \nu \in M_1(S)\}.$$
Then $E_Q(\nu_Q)$ is finite, $\nu_Q$ has finite logarithmic energy, and $\operatorname{supp}\nu_Q$ is compact. Furthermore, the minimizer $\nu_Q$ is characterized as the $\nu_Q \in M_1(S)$ with compact support such that, for some real number $B$,
$$\int_S\log|x-y|\,d\nu_Q(y)\ \begin{cases}\ge Q(x) - B & \text{for all } x \in \operatorname{supp}\nu_Q,\\ \le Q(x) - B & \text{for quasi-every } x \in S.\end{cases}$$
In this case, $B = E_Q(\nu_Q) - \int_SQ\,d\nu_Q$.
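The characterization in Theorem 6.3 can be observed numerically for the semicircle law $w_r$ with $Q(x) = x^2/r^2$. The potential formula used here is the one quoted from [35] in the proof of Corollary 6.11: the log-potential of $w_r$ equals $Q(x) - B$, with $B = \frac12 - \log\frac r2$, on $[-r,r]$ and is strictly smaller outside. This is a minimal midpoint-rule sketch. The radius and evaluation points are arbitrary, and the substitution $y = r\cos\theta$ is our own device for handling the integrable logarithmic singularity.

```python
import math

def semicircle_log_potential(x, r, m=100_000):
    """Midpoint-rule value of the integral of log|x - y| against the semicircle law w_r(dy).

    With y = r*cos(theta), w_r(y) dy becomes (2/pi) * sin(theta)**2 dtheta on [0, pi].
    """
    h = math.pi / m
    total = 0.0
    for k in range(m):
        theta = (k + 0.5) * h
        total += math.sin(theta) ** 2 * math.log(abs(x - r * math.cos(theta)))
    return (2.0 / math.pi) * total * h

def q_minus_b(x, r):
    """Q(x) - B for Q(x) = x**2 / r**2 and B = 1/2 - log(r/2)."""
    return x * x / r**2 + math.log(r / 2.0) - 0.5

r = 1.5                                         # hypothetical radius
inside = [(x, semicircle_log_potential(x, r), q_minus_b(x, r)) for x in (-1.5, -0.7, 0.0, 0.3, 1.5)]
outside = [(x, semicircle_log_potential(x, r), q_minus_b(x, r)) for x in (-2.5, 2.0, 4.0)]
for row in inside + outside:
    print("x = %+.1f  potential = %.5f  Q(x) - B = %.5f" % row)
```

On $\operatorname{supp}w_r = [-r,r]$ the two columns agree up to quadrature error. Outside the support the potential lies strictly below $Q(x) - B$, matching the inequality that Theorem 6.3 requires quasi-everywhere.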
We denote by $M_n^{sa}$ the space of $n\times n$ Hermitian (or self-adjoint) matrices, which can be identified with the Euclidean space $\mathbb{R}^{n^2}$ by taking the real and imaginary parts $A_{ii}$ $(1 \le i \le n)$ and $\operatorname{Re}A_{ij}$, $\operatorname{Im}A_{ij}$ $(1 \le i < j \le n)$ of the entries as the coordinates of $A \in M_n^{sa}$. The standard reference measure on $M_n^{sa}$ is then the Lebesgue measure given as
$$dA = d\Lambda_n(A) := 2^{n(n-1)/2}\prod_{i=1}^ndA_{ii}\prod_{i<j}d(\operatorname{Re}A_{ij})\,d(\operatorname{Im}A_{ij}). \tag{6.6}$$
The constant $2^{n(n-1)/2}$ above is not essential, but it arises when we transform the Lebesgue measure on $\mathbb{R}^{n^2}$ onto $M_n^{sa}$ by the isometry between $M_n^{sa}$ with the Hilbert–Schmidt norm and $\mathbb{R}^{n^2}$ with the Euclidean norm.

For $A \in M_n^{sa}$ let $A = VDV^*$ be the diagonalization, where $V \in U(n)$ is a unitary matrix and $D = \operatorname{Diag}(t_1, t_2, \dots, t_n)$ is a diagonal matrix with $t_1 \le t_2 \le \cdots \le t_n$. Outside a $\Lambda_n$-negligible set, $V$ is unique up to a diagonal unitary factor. So one can change the coordinates as
$$A \in M_n^{sa} \longleftrightarrow (V, D) \in U(n)/\mathbb{T}\times\mathbb{R}^n_\le,$$
where $U(n)/\mathbb{T}$ is the homogeneous space obtained by dividing out the torus $\mathbb{T}$ of diagonal unitaries, and $\mathbb{R}^n_\le \subset \mathbb{R}^n$ is the set of vectors with increasing coordinates. Let $\dot\gamma_{U(n)}$ denote the probability measure on $U(n)/\mathbb{T}$ induced from the Haar probability measure $\gamma_{U(n)}$ on $U(n)$. See [23] for the details on the next lemma and related facts.

Lemma 6.4. The measure $\Lambda_n$ is transformed into the product measure
$$\dot\gamma_{U(n)}\otimes\Bigl(\frac{(2\pi)^{n(n-1)/2}}{\prod_{j=1}^{n-1}j!}\prod_{i<j}(t_i-t_j)^2\prod_{i=1}^ndt_i\Bigr)$$
on $U(n)/\mathbb{T}\times\mathbb{R}^n_\le$ under the above change of coordinates.

Now let $Q(x)$ be a real continuous function on $\mathbb{R}$ such that, for any $\varepsilon > 0$,
$$\lim_{|x|\to\infty}|x|\exp(-\varepsilon Q(x)) = 0. \tag{6.7}$$
For each $n \in \mathbb{N}$ consider an $n\times n$ random Hermitian matrix $H(n)$ whose distribution on $M_n^{sa}$ is given as
$$\frac{1}{Z_n}\exp(-n\operatorname{Tr}(Q(A)))\,dA,$$
where $\operatorname{Tr}$ is the usual trace on matrices, $Q(A)$ is defined via functional calculus, and $Z_n$ is the normalization constant, i.e.,
$$Z_n := \int_{M_n^{sa}}\exp(-n\operatorname{Tr}(Q(A)))\,dA.$$
Consider the (random) eigenvalues $\lambda_1, \dots, \lambda_n$ of the random matrix $H(n)$; we are concerned with the LDP for the empirical measures
$$L_{H(n)} := \frac{\delta_{\lambda_1} + \cdots + \delta_{\lambda_n}}{n}, \qquad n \in \mathbb{N}. \tag{6.8}$$
From Lemma 6.4 one observes that the eigenvalues $(\lambda_1, \dots, \lambda_n)$ have the joint density
$$\frac{1}{\tilde Z_n}\exp\Bigl(-n\sum_{i=1}^nQ(t_i)\Bigr)\prod_{i<j}|t_i-t_j|^2$$
with the normalization constant $\tilde Z_n$. Slightly more generally, we consider the probability measure $\tilde\mu_n$ on $\mathbb{R}^n$ having the density
$$\frac{1}{\tilde Z_n}\exp\Bigl(-n\sum_{i=1}^nQ(t_i)\Bigr)\prod_{i<j}|t_i-t_j|^{2\beta}, \tag{6.9}$$
where $\beta > 0$ is fixed (independent of $n$) and $\tilde Z_n$ is the normalization constant, i.e.,
$$\tilde Z_n := \int_{\mathbb{R}^n}\exp\Bigl(-n\sum_{i=1}^nQ(t_i)\Bigr)\prod_{i<j}|t_i-t_j|^{2\beta}\,dt.$$
Note that assumption (6.7) implies that this integral is finite. (Note also that the cases $\beta = 1/2$, $1$ and $2$ correspond, respectively, to real symmetric, complex Hermitian and symplectic random matrices; see [29].)

When $\lambda \in \mathbb{R}^n$ is distributed according to $\tilde\mu_n$, the distribution of $L_{H(n)}$ in (6.8) is the probability measure $\hat\mu_n$ on $M_1(\mathbb{R})$ defined by
$$\hat\mu_n(\Gamma) := \tilde\mu_n(\{t \in \mathbb{R}^n : \kappa_t \in \Gamma\}) \tag{6.10}$$
for Borel sets $\Gamma \subset M_1(\mathbb{R})$, where
$$\kappa_t := \frac{\delta_{t_1} + \cdots + \delta_{t_n}}{n} \qquad \text{for } t \in \mathbb{R}^n.$$

Theorem 6.5. The finite limit $B := \lim_{n\to\infty}\frac{1}{n^2}\log\tilde Z_n$ exists, and $(\hat\mu_n)$ satisfies the LDP in the scale $1/n^2$ with the good rate function
$$I(\nu) := -\beta\Sigma(\nu) + \int_{\mathbb{R}}Q(x)\,d\nu(x) + B, \qquad \nu \in M_1(\mathbb{R}). \tag{6.11}$$
Furthermore, there exists a unique $\nu_Q \in M_1(\mathbb{R})$ such that $I(\nu_Q) = 0$.

Both the Sanov theorem (Theorem 5.9) and Theorem 6.5 are level-2 LDPs for the distributions of empirical measures of the same form, (5.5) and (6.8). But the scales are different: $1/n$ for the former and $1/n^2$ for the latter. The scale $1/n^2$ is natural because the empirical measure (6.8) comes from the eigenvalues of an $n\times n$ random Hermitian matrix, and the real dimension of $M_n^{sa}$ is $n^2$. A more essential difference between Theorems 5.9 and 6.5 is the reference measure: a product measure $\mu^{\otimes n}$ is taken in (5.6), while $\tilde\mu_n$ in (6.10) has the more complicated density (6.9). Comparing the rate functions of Theorems 5.9 and 6.5, we are tempted to call the rate function (6.11) the relative free entropy of $\nu$ with respect to $\nu_Q$. This is indeed natural, as discussed in [5, 20].
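To prove the theorem, we introduce a kernel
$$F(x,y) := -\beta\log|x-y| + \frac{Q(x)+Q(y)}{2}$$
and its cutoff
$$F_\alpha(x,y) := \min\{F(x,y), \alpha\} \qquad \text{for } \alpha > 0.$$
Since
$$F(x,y) \ge -\beta\Bigl[\log\Bigl(|x|\exp\Bigl(-\frac{Q(x)}{2\beta}\Bigr)\Bigr) + \log\Bigl(|y|\exp\Bigl(-\frac{Q(y)}{2\beta}\Bigr)\Bigr)\Bigr]$$
whenever $|x|, |y| \ge 2$ (because then $|x-y| \le |x|+|y| \le |x||y|$), it follows that $F_\alpha(x,y)$ is bounded and continuous. Therefore,
$$\nu \in M_1(\mathbb{R}) \mapsto \int_{\mathbb{R}^2}F_\alpha(x,y)\,d\nu(x)\,d\nu(y)$$
is continuous and
$$-\beta\Sigma(\nu) + \int_{\mathbb{R}}Q(x)\,d\nu(x) = \int_{\mathbb{R}^2}F(x,y)\,d\nu(x)\,d\nu(y) = \sup_{\alpha>0}\int_{\mathbb{R}^2}F_\alpha(x,y)\,d\nu(x)\,d\nu(y) \tag{6.12}$$
is lower semicontinuous in the weak topology on $M_1(\mathbb{R})$.

Lemma 6.6.
$$\limsup_{n\to\infty}\frac{1}{n^2}\log\tilde Z_n \le -\inf_{\nu\in M_1(\mathbb{R})}\int_{\mathbb{R}^2}F(x,y)\,d\nu(x)\,d\nu(y).$$

Proof. We estimate as follows:
$$\begin{aligned}
\tilde Z_n &= \int_{\mathbb{R}^n}\exp\Bigl(-\sum_{i=1}^nQ(t_i)\Bigr)\exp\Bigl(-\sum_{i<j}(Q(t_i)+Q(t_j))\Bigr)\prod_{i<j}|t_i-t_j|^{2\beta}\,dt\\
&= \int_{\mathbb{R}^n}\exp\Bigl(-\sum_{i=1}^nQ(t_i)\Bigr)\exp\Bigl(-2\sum_{i<j}F(t_i,t_j)\Bigr)dt\\
&= \int_{\mathbb{R}^n}\exp\Bigl(-\sum_{i=1}^nQ(t_i)\Bigr)\exp\Bigl(-n^2\iint_{\{x\ne y\}}F(x,y)\,d\kappa_t(x)\,d\kappa_t(y)\Bigr)dt\\
&\le \exp\Bigl(-n^2\inf_\nu\iint_{\{x\ne y\}}F(x,y)\,d\nu(x)\,d\nu(y)\Bigr)\int_{\mathbb{R}^n}\exp\Bigl(-\sum_{i=1}^nQ(t_i)\Bigr)dt\\
&= \Bigl(\int_{\mathbb{R}}e^{-Q(x)}\,dx\Bigr)^n\exp\Bigl(-n^2\inf_\nu\int_{\mathbb{R}^2}F(x,y)\,d\nu(x)\,d\nu(y)\Bigr),
\end{aligned}$$
implying the lemma.

Lemma 6.7. For every $\nu \in M_1(\mathbb{R})$,
$$\inf_G\limsup_{n\to\infty}\frac{1}{n^2}\log\hat\mu_n(G) \le -\int_{\mathbb{R}^2}F(x,y)\,d\nu(x)\,d\nu(y) - \liminf_{n\to\infty}\frac{1}{n^2}\log\tilde Z_n,$$
where $G$ runs over a neighborhood base of $\nu$ in the weak topology.

Proof. For any neighborhood $G$ of $\nu \in M_1(\mathbb{R})$ put $\tilde G := \{t \in \mathbb{R}^n : \kappa_t \in G\}$. As in the proof of Lemma 6.6 we have
$$\begin{aligned}
\hat\mu_n(G) = \tilde\mu_n(\tilde G) &= \frac{1}{\tilde Z_n}\int_{\tilde G}\exp\Bigl(-\sum_{i=1}^nQ(t_i)\Bigr)\exp\Bigl(-2\sum_{i<j}F(t_i,t_j)\Bigr)dt\\
&\le \frac{1}{\tilde Z_n}\int_{\tilde G}\exp\Bigl(-\sum_{i=1}^nQ(t_i)\Bigr)\exp\Bigl(-n^2\int_{\mathbb{R}^2}F_\alpha(x,y)\,d\kappa_t(x)\,d\kappa_t(y) + n\alpha\Bigr)dt\\
&\le \frac{1}{\tilde Z_n}\Bigl(\int_{\mathbb{R}}e^{-Q(x)}\,dx\Bigr)^n\exp\Bigl(-n^2\inf_{\nu'\in G}\int_{\mathbb{R}^2}F_\alpha(x,y)\,d\nu'(x)\,d\nu'(y) + n\alpha\Bigr),
\end{aligned}$$
where the term $n\alpha$ compensates for the diagonal $x = y$, on which $F_\alpha = \alpha$. Therefore,
$$\limsup_{n\to\infty}\frac{1}{n^2}\log\hat\mu_n(G) \le -\inf_{\nu'\in G}\int_{\mathbb{R}^2}F_\alpha(x,y)\,d\nu'(x)\,d\nu'(y) - \liminf_{n\to\infty}\frac{1}{n^2}\log\tilde Z_n.$$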
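Thanks to the weak continuity of $\nu' \mapsto \int_{\mathbb{R}^2}F_\alpha(x,y)\,d\nu'(x)\,d\nu'(y)$ we obtain
$$\inf_G\limsup_{n\to\infty}\frac{1}{n^2}\log\hat\mu_n(G) \le -\int_{\mathbb{R}^2}F_\alpha(x,y)\,d\nu(x)\,d\nu(y) - \liminf_{n\to\infty}\frac{1}{n^2}\log\tilde Z_n.$$
Letting $\alpha \to \infty$ yields the conclusion by (6.12).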
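Lemma 6.8. For every $\nu \in M_1(\mathbb{R})$,
$$\liminf_{n\to\infty}\frac{1}{n^2}\log\tilde Z_n \ge -\int_{\mathbb{R}^2}F(x,y)\,d\nu(x)\,d\nu(y)$$
and
$$\inf_G\liminf_{n\to\infty}\frac{1}{n^2}\log\hat\mu_n(G) \ge -\int_{\mathbb{R}^2}F(x,y)\,d\nu(x)\,d\nu(y) - \limsup_{n\to\infty}\frac{1}{n^2}\log\tilde Z_n,$$
where $G$ runs over a neighborhood base of $\nu$.

Proof. It is obvious that
$$\nu \in M_1(\mathbb{R}) \mapsto \inf\Bigl\{\liminf_{n\to\infty}\frac{1}{n^2}\log\hat\mu_n(G) : G \text{ a neighborhood of } \nu\Bigr\}$$
is upper semicontinuous. Since $F(x,y)$ is bounded below, we have
$$\int_{\mathbb{R}^2}F(x,y)\,d\nu(x)\,d\nu(y) = \lim_{k\to\infty}\int_{\mathbb{R}^2}F(x,y)\,d\nu_k(x)\,d\nu_k(y)$$
with $\nu_k := \nu([-k,k])^{-1}1_{[-k,k]}\nu$. So it suffices to assume that $\nu$ has compact support. For $\varepsilon > 0$ let $\varphi_\varepsilon$ be a nonnegative $C^\infty$-function supported in $[-\varepsilon,\varepsilon]$ such that $\int\varphi_\varepsilon(x)\,dx = 1$, and let $\varphi_\varepsilon*\nu$ be the convolution of $\nu$ with $\varphi_\varepsilon$. Thanks to the properties of $\Sigma(\nu)$ given in Lemma 6.2, it is easy to see that $\Sigma(\varphi_\varepsilon*\nu) \ge \Sigma(\nu)$. Also
$$\lim_{\varepsilon\searrow0}\int_{\mathbb{R}}Q(x)\,d(\varphi_\varepsilon*\nu)(x) = \int_{\mathbb{R}}Q(x)\,d\nu(x).$$
Hence we may assume that $\nu$ has a continuous density with compact support. Moreover, let $m$ be the uniform distribution on an interval $[a,b]$ containing $\operatorname{supp}\nu$. Then it suffices to show the required inequalities for each $(1-\delta)\nu + \delta m$, $0 < \delta < 1$. Consequently, we may assume that $\nu$ has a continuous density $f > 0$ on $\operatorname{supp}\nu = [a,b]$, so that there is a $\delta > 0$ such that $\delta \le f(x) \le \delta^{-1}$ for all $a \le x \le b$. For each $n \in \mathbb{N}$ let
$$a < a_1^{(n)} < b_1^{(n)} < a_2^{(n)} < \cdots < a_n^{(n)} < b_n^{(n)} = b$$
be such that
$$\int_a^{a_i^{(n)}}f(x)\,dx = \frac{i-\frac12}{n}, \qquad \int_a^{b_i^{(n)}}f(x)\,dx = \frac in, \qquad 1 \le i \le n.$$
Then it immediately follows that
$$\frac{\delta}{2n} \le b_i^{(n)} - a_i^{(n)} \le \frac{1}{2n\delta}, \qquad 1 \le i \le n.$$
Define
$$\Delta_n := \{(t_1,\dots,t_n) \in \mathbb{R}^n : a_i^{(n)} \le t_i \le b_i^{(n)},\ 1 \le i \le n\}.$$
For any neighborhood $G$ of $\nu$, it is clear that $\Delta_n \subset \tilde G := \{t \in \mathbb{R}^n : \kappa_t \in G\}$ for all $n$ large enough. Therefore, for large $n$ we have
$$\begin{aligned}
\hat\mu_n(G) = \tilde\mu_n(\tilde G) \ge \tilde\mu_n(\Delta_n) &= \frac{1}{\tilde Z_n}\int_{\Delta_n}\exp\Bigl(-n\sum_{i=1}^nQ(t_i)\Bigr)\prod_{i<j}|t_i-t_j|^{2\beta}\,dt\\
&\ge \frac{1}{\tilde Z_n}\int_{\Delta_n}\exp\Bigl(-n\sum_{i=1}^n\xi_i^{(n)}\Bigr)\prod_{i<j}\bigl(a_j^{(n)} - b_i^{(n)}\bigr)^{2\beta}\,dt\\
&\ge \frac{1}{\tilde Z_n}\Bigl(\frac{\delta}{2n}\Bigr)^n\exp\Bigl(-n\sum_{i=1}^n\xi_i^{(n)}\Bigr)\prod_{i<j}\bigl(a_j^{(n)} - b_i^{(n)}\bigr)^{2\beta},
\end{aligned}$$
where $\xi_i^{(n)} := \max\{Q(x) : a_i^{(n)} \le x \le b_i^{(n)}\}$. Now let $g : [0,1] \to [a,b]$ be the inverse function of $t \mapsto \int_a^tf(x)\,dx$. Since
$$a_i^{(n)} = g\Bigl(\frac{i-\frac12}{n}\Bigr) \quad \text{and} \quad b_i^{(n)} = g\Bigl(\frac in\Bigr),$$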
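we have
$$\lim_{n\to\infty}\frac1n\sum_{i=1}^n\xi_i^{(n)} = \int_0^1Q(g(t))\,dt = \int_a^bQ(x)f(x)\,dx = \int_{\mathbb{R}}Q(x)\,d\nu(x)$$
and
$$\lim_{n\to\infty}\frac{2}{n^2}\sum_{i<j}\log\bigl(a_j^{(n)} - b_i^{(n)}\bigr) = 2\iint_{0\le s<t\le1}\log(g(t)-g(s))\,ds\,dt = \int_0^1\int_0^1\log|g(s)-g(t)|\,ds\,dt = \int_a^b\int_a^bf(x)f(y)\log|x-y|\,dx\,dy = \Sigma(\nu). \tag{6.13}$$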
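Equality (6.13) can be tested in the simplest admissible case, the uniform density $f \equiv 1$ on $[a,b] = [0,1]$. This choice is ours, made for illustration only. Then $g(t) = t$, $a_j^{(n)} - b_i^{(n)} = (j-i-\frac12)/n$, and $\Sigma(\nu) = \int_0^1\int_0^1\log|x-y|\,dx\,dy = -\frac32$. The sketch groups the pairs $i < j$ by $k = j - i$, of which there are $n - k$. It is a numerical illustration only; the rigorous argument is the one in Appendix A.

```python
import math

def discretized_free_entropy(n):
    """(2/n^2) * sum_{i<j} log(a_j - b_i) for the uniform density on [0, 1].

    Here a_j - b_i = (j - i - 1/2)/n, and exactly n - k pairs have j - i = k.
    """
    s = sum((n - k) * math.log((k - 0.5) / n) for k in range(1, n))
    return 2.0 * s / n**2

exact = -1.5      # Sigma(uniform[0,1]): double integral of log|x - y| over [0,1]^2
for n in (10, 100, 1000, 10000):
    print(n, discretized_free_entropy(n), exact)
```

The discretization error shrinks roughly like $1/n$ here, despite the logarithmic singularity on the diagonal.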
(Since $\log(g(t)-g(s))$ is singular at $s = t$, equality (6.13) is not so obvious; its proof is given in Appendix A.) Therefore, by (6.12),
$$0 \ge \limsup_{n\to\infty}\frac{1}{n^2}\log\hat\mu_n(G) \ge -\int_{\mathbb{R}^2}F(x,y)\,d\nu(x)\,d\nu(y) - \liminf_{n\to\infty}\frac{1}{n^2}\log\tilde Z_n$$
and
$$\liminf_{n\to\infty}\frac{1}{n^2}\log\hat\mu_n(G) \ge -\int_{\mathbb{R}^2}F(x,y)\,d\nu(x)\,d\nu(y) - \limsup_{n\to\infty}\frac{1}{n^2}\log\tilde Z_n,$$
as desired.

Lemma 6.9. The finite limit $B = \lim_{n\to\infty}\frac{1}{n^2}\log\tilde Z_n$ exists.

Proof. By Lemmas 6.6 and 6.8 we have
$$\limsup_{n\to\infty}\frac{1}{n^2}\log\tilde Z_n \le -\inf_\nu\int_{\mathbb{R}^2}F(x,y)\,d\nu(x)\,d\nu(y) \le \liminf_{n\to\infty}\frac{1}{n^2}\log\tilde Z_n.$$
This gives the result because, by Theorem 6.3, $\nu \in M_1(\mathbb{R}) \mapsto \int_{\mathbb{R}^2}F(x,y)\,d\nu(x)\,d\nu(y)$ attains a finite minimum.

Lemma 6.10. The sequence $(\hat\mu_n)$ is exponentially tight.
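Proof. For any $\alpha > 0$ set
$$K_\alpha := \Bigl\{\nu \in M_1(\mathbb{R}) : \int_{\mathbb{R}}Q(x)\,d\nu(x) \le \alpha\Bigr\}.$$
Since $Q(x) \to \infty$ as $|x| \to \infty$ by assumption (6.7), it is easy to see that
$$\sup_{\nu\in K_\alpha}\nu([-R,R]^c) \to 0 \qquad \text{as } R \to \infty,$$
and hence $K_\alpha$ is compact in the weak topology (see Theorem 5.5). We have
$$\begin{aligned}
\hat\mu_n(K_\alpha^c) &= \tilde\mu_n\Bigl(\Bigl\{t \in \mathbb{R}^n : \frac1n\sum_{i=1}^nQ(t_i) > \alpha\Bigr\}\Bigr) = \frac{1}{\tilde Z_n}\int_{\{\frac1n\sum_{i=1}^nQ(t_i)>\alpha\}}\exp\Bigl(-n\sum_{i=1}^nQ(t_i)\Bigr)\prod_{i<j}|t_i-t_j|^{2\beta}\,dt\\
&\le \frac{1}{\tilde Z_n}\exp\Bigl(-\frac{n^2\alpha}{2}\Bigr)\int_{\mathbb{R}^n}\exp\Bigl(-\frac n2\sum_{i=1}^nQ(t_i)\Bigr)\prod_{i<j}|t_i-t_j|^{2\beta}\,dt.
\end{aligned}$$
When $Q(x)$ is replaced by $Q(x)/2$, the finite limit
$$B_2 = \lim_{n\to\infty}\frac{1}{n^2}\log\int_{\mathbb{R}^n}\exp\Bigl(-\frac n2\sum_{i=1}^nQ(t_i)\Bigr)\prod_{i<j}|t_i-t_j|^{2\beta}\,dt$$
exists by Lemma 6.9, as does $B$. Hence the above estimate gives
$$\limsup_{n\to\infty}\frac{1}{n^2}\log\hat\mu_n(K_\alpha^c) \le -B + B_2 - \frac\alpha2.$$
Since $\alpha > 0$ is arbitrary, we have the conclusion.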
Now we are in a position to complete the proof of Theorem 6.5.

End of proof of Theorem 6.5. The proof is contained in the previous lemmas. Indeed, by Theorem 1.7, Lemmas 6.7–6.9 imply that $(\hat\mu_n)$ satisfies the weak LDP with the rate function
$$I(\nu) = \int_{\mathbb{R}^2}F(x,y)\,d\nu(x)\,d\nu(y) + B.$$
Moreover, by Lemma 1.5, Lemma 6.10 implies that $(\hat\mu_n)$ satisfies the LDP with the good rate function $I$. The existence of the unique minimizer is due to Theorem 6.3.
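Theorem 6.5 can be illustrated by simulation when $Q(x) = x^2/2\sigma^2$. In that case the minimizer $\nu_Q$ is the semicircle law $w_{2\sqrt{\beta\sigma^2}}$ of Corollary 6.11 below, so the empirical eigenvalue distribution function should be close to the semicircle distribution function. The sketch rests on an outside result that the text does not use: the tridiagonal $\beta$-Hermite model of Dumitriu and Edelman. That model has eigenvalue density proportional to $\prod_{i<j}|y_i-y_j|^{b}e^{-\sum y_i^2/2}$. Density (6.9) then corresponds to $b = 2\beta$ after the change of variables $t = \sigma y/\sqrt n$. Eigenvalues are counted with a Sturm sequence, so only the standard library is needed. The sizes and seeds are arbitrary choices of ours.

```python
import math
import random

def beta_hermite_tridiagonal(n, beta, sigma, rng):
    """Diagonal and off-diagonal of a random tridiagonal matrix whose eigenvalues
    follow (6.9) with Q(x) = x^2/(2 sigma^2) (assumed Dumitriu-Edelman model, b = 2*beta)."""
    b = 2.0 * beta
    scale = sigma / math.sqrt(n)
    diag = [scale * rng.gauss(0.0, 1.0) for _ in range(n)]
    # off-diagonal entries chi_{b k} / sqrt(2) for k = n-1, ..., 1
    off = [scale * math.sqrt(rng.gammavariate(b * k / 2.0, 2.0) / 2.0)
           for k in range(n - 1, 0, -1)]
    return diag, off

def count_eigenvalues_below(diag, off, x):
    """Number of eigenvalues < x of a symmetric tridiagonal matrix (Sturm sequence)."""
    d = diag[0] - x
    count = 1 if d < 0 else 0
    for a, b in zip(diag[1:], off):
        if d == 0.0:
            d = 1e-300          # avoid division by zero (a probability-zero event)
        d = (a - x) - b * b / d
        if d < 0:
            count += 1
    return count

def semicircle_cdf(x, r):
    """Distribution function of the semicircle law w_r on [-r, r]."""
    if x <= -r:
        return 0.0
    if x >= r:
        return 1.0
    return 0.5 + x * math.sqrt(r * r - x * x) / (math.pi * r * r) + math.asin(x / r) / math.pi

n, beta, sigma = 400, 1.0, 1.0
diag, off = beta_hermite_tridiagonal(n, beta, sigma, random.Random(7))
r = 2.0 * math.sqrt(beta * sigma**2)
for x in (-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5):
    print(f"x={x:+.1f}  empirical={count_eigenvalues_below(diag, off, x) / n:.3f}"
          f"  semicircle={semicircle_cdf(x, r):.3f}")
```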
In particular, when $Q(x) := x^2/2\sigma^2$ is a quadratic function, we have an $n\times n$ self-adjoint Gaussian random matrix $G(n)$ (often called GUE) with distribution
$$\frac{1}{Z_n}\exp\Bigl(-\frac{n}{2\sigma^2}\operatorname{Tr}A^2\Bigr)dA.$$
In fact, this arises as a random Hermitian matrix $[G_{ij}(n)]_{1\le i,j\le n}$ satisfying
• $\operatorname{Re}G_{ij}(n)$ $(1 \le i \le j \le n)$ and $\operatorname{Im}G_{ij}(n)$ $(1 \le i < j \le n)$ are independent Gaussian random variables,
• $\operatorname{Re}G_{ii}(n)$ $(1 \le i \le n)$ are $N(0, \sigma^2/n)$, and $\operatorname{Re}G_{ij}(n)$, $\operatorname{Im}G_{ij}(n)$ $(1 \le i < j \le n)$ are $N(0, \sigma^2/2n)$.
The following is the large deviation theorem due to Ben Arous and Guionnet. One can see a resemblance between the rate function below and that of the LDP for i.i.d. real Gaussians given in (0.1).

Corollary 6.11 (Ben Arous–Guionnet). When $Q(x) = x^2/2\sigma^2$, $(\hat\mu_n)$ satisfies the LDP with the good rate function
$$I(\nu) := -\beta\Sigma(\nu) + \frac{1}{2\sigma^2}\int_{\mathbb{R}}x^2\,d\nu(x) + \frac\beta2\log\beta\sigma^2 - \frac{3\beta}{4}, \qquad \nu \in M_1(\mathbb{R}).$$
Furthermore, the semicircular measure
$$w_{2\sqrt{\beta\sigma^2}} := \frac{1}{2\pi\beta\sigma^2}\sqrt{4\beta\sigma^2 - x^2}\,1_{[-2\sqrt{\beta\sigma^2},\,2\sqrt{\beta\sigma^2}]}(x)\,dx$$
is the unique minimizer of $I$, with $I(w_{2\sqrt{\beta\sigma^2}}) = 0$.

Proof. By Theorem 6.5 it remains to compute the constant $B$ and to identify the minimizer of $I$. By the Selberg integral formula (see [29, Sec. 17.1], [23, pp. 118–119]) we have
$$\tilde Z_n = \int_{\mathbb{R}^n}\exp\Bigl(-\frac{n}{2\sigma^2}\sum_{i=1}^nx_i^2\Bigr)\prod_{i<j}|x_i-x_j|^{2\beta}\,dx = (2\pi)^{n/2}\Bigl(\frac{n}{\sigma^2}\Bigr)^{-n(\beta(n-1)+1)/2}\prod_{j=1}^n\frac{\Gamma(1+j\beta)}{\Gamma(1+\beta)}.$$
Thanks to the Stirling formula, n β β 1 β 1 jβ log jβ − B = lim 2 log Z˜n = lim log σ 2 − log n + 2 n→∞ n n→∞ 2 2 n j=1 2 =
n βj jβ β (log σ 2 − 1) + lim log n→∞ 2 n j=1 n n
β = (log σ 2 − 1) + β 2
1 0
x log βx dx =
Moreover, for the semicircle measure [35, Sec. IV.5]) that r = wr (y) log |x − y| dy = −r <
β 3β log βσ 2 − . 2 4
wr with radius r, it is known (see x2 1 r + log − r2 2 2 r x2 1 + log − 2 r 2 2
if |x| ≤ r, if |x| > r.
Hence / that wr is a unique minimizer of −Σ(ν) + 2Theorem 6.3 implies 2 x dν(x). With r = 2 βσ2 we may apply this to 2 r R 1 1 2 x2 dν(x) = β −Σ(ν) + x dν(x) . −βΣ(ν) + 2 2σ R 2βσ2 R Similar large deviations for other types of random matrices such as the non-self-adjoint Gaussian matrix, the Haar-distributed unitary matrix (called CUE or circular unitary ensemble), and the Wishart matrix are included in [23]. Here is another special case of a random Hermitian matrix restricted on a bounded part of the operator norm A ≤ R and distributed by the normalized Lebesgue measure. For any R > 0 let λn,R be the normalization of the restriction Λn on {A ∈ Msa n : A ≤ R}. Then by Lemma 6.4, it induces the probability ˜ n,R on [−R, R]n (the eigenvalue space) having the joint density measure λ 1 ) (ti − tj )2 , Zn i<j where
$$Z_n=\int_{[-R,R]^n}\prod_{i<j}(t_i-t_j)^2\,dt=R^{n^2}\int_{[-1,1]^n}\prod_{i<j}(t_i-t_j)^2\,dt=(2R)^{n^2}\prod_{j=0}^{n-1}\frac{(j+1)!\,(j!)^2}{(n+j)!}$$
due to the Selberg integral formula (see [23, 29]). By using the Stirling formula one has
$$\lim_{n\to\infty}\frac{1}{n^2}\log Z_n=\log\frac R2.\tag{6.14}$$
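A quick numerical look at (6.14), offered only as a sketch: it evaluates the Selberg closed form of $Z_n$ through `math.lgamma` (using $\log k!=\log\Gamma(k+1)$) and prints the gap $\frac1{n^2}\log Z_n-\log(R/2)$ for increasing $n$. The helper name `log_Z` and the value $R=3$ are illustrative.

```python
import math

def log_Z(n, R):
    """log Z_n from the Selberg closed form
    Z_n = (2R)^{n^2} prod_{j=0}^{n-1} (j+1)! (j!)^2 / (n+j)!,  with log k! = lgamma(k + 1)."""
    total = n * n * math.log(2 * R)
    for j in range(n):
        total += math.lgamma(j + 2) + 2 * math.lgamma(j + 1) - math.lgamma(n + j + 1)
    return total

R = 3.0
for n in (10, 100, 1000, 10000):
    # the gap to log(R/2) shrinks roughly like (log n)/n
    print(n, log_Z(n, R) / n ** 2 - math.log(R / 2))
```

For $n=2$ the closed form gives $Z_2=8R^4/3$, which agrees with integrating $(t_1-t_2)^2$ over $[-R,R]^2$ by hand.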
The empirical measure of $\lambda_{n,R}$ on $M_1([-R,R])$ is
$$\hat\lambda_{n,R}(\Gamma):=\tilde\lambda_{n,R}(\{t\in[-R,R]^n:\kappa_t\in\Gamma\})$$
for Borel sets $\Gamma\subset M_1([-R,R])$. The next theorem is a version of Theorem 6.5 with $\beta=1$ and $Q(x)=0$, where probability measures are restricted to those supported on $[-R,R]$. Indeed, Theorem 6.5 remains true when we take
$$Q(x)=\begin{cases}0&\text{if }|x|\le R,\\ \infty&\text{if }|x|>R,\end{cases}$$
even though this $Q$ is not continuous.
Theorem 6.12. The sequence $(\hat\lambda_{n,R})$ given above satisfies the LDP in the scale $1/n^2$ with the good rate function
$$I(\nu):=-\Sigma(\nu)+\log\frac R2,\qquad\nu\in M_1([-R,R]).$$
7. Quantum Large Deviations in Spin Chains
The quantum version of large deviations has been developed since the appearance of the papers [27, 31]. The standard formulation of quantum large deviations is made in the setting of quantum spin chains. Let $A_0$ be a finite-dimensional $C^*$-algebra, hence isomorphic to a finite direct sum $\bigoplus_{j=1}^k M_{d_j}(\mathbb C)$ of full matrix algebras. The infinite tensor product $C^*$-algebra $A:=\bigotimes_{i\in\mathbb Z}A_i$ of copies $A_i:=A_0$ is a quantum spin chain with one-site algebra $A_0$. For any subset $X\subset\mathbb Z$ we have the $C^*$-subalgebra $A_X:=\bigotimes_{i\in X}A_i$ of $A$, where $A_\emptyset:=\mathbb C1$ by convention, and we write $A_X^{sa}$ for the set of all self-adjoint elements of $A_X$. Let $\omega$ be a state on $A$, i.e., a linear functional on $A$ such that $\omega(a^*a)\ge0$ for all $a\in A$ and $\omega(1)=1$, $1$ being the identity of $A$. For each $X\subset\mathbb Z$ and each observable $a\in A_X^{sa}$, by functional calculus and the Riesz-Markov theorem there is a unique $\mu_X^a\in M_1(\mathbb R)$ such that, for every continuous function $f$ on $\mathbb R$,
$$\int_{\mathbb R}f\,d\mu_X^a=\omega(f(a)).$$
The measure $\mu_X^a$ is called the distribution of $a$ with respect to $\omega$. The shift automorphism $\gamma$ of $A$ is defined by $\gamma:A_{[m,n]}\to A_{[m,n+1]}$, $a\mapsto1_{A_m}\otimes a$, for each $m,n\in\mathbb Z$, $m\le n$. The state $\omega$ is translation invariant if $\omega\circ\gamma=\omega$. The set of all translation-invariant states on $A$ is a weakly* closed convex subset of $A^*$, and its extreme points are called ergodic states. Given a translation-invariant state $\omega$ on $A$ and a sequence $(a_n)$ of observables $a_n\in A_{[1,n]}^{sa}$, $n\in\mathbb N$, the quantum LDP of interest is the LDP for the sequence $(\mu_{[1,n]}^{a_n})$, asserting that, for a certain rate function $I:\mathbb R\to[0,\infty]$,
(a) (upper bound) for every closed $F\subset\mathbb R$,
$$\limsup_{n\to\infty}\frac1n\log\mu_{[1,n]}^{a_n}(F)=\limsup_{n\to\infty}\frac1n\log\omega(1_F(a_n))\le-\inf_{x\in F}I(x),$$
(b) (lower bound) for every open $G\subset\mathbb R$,
$$\liminf_{n\to\infty}\frac1n\log\mu_{[1,n]}^{a_n}(G)=\liminf_{n\to\infty}\frac1n\log\omega(1_G(a_n))\ge-\inf_{x\in G}I(x).$$
In the above, $1_B(a_n)$ denotes the spectral projection of $a_n$ corresponding to a Borel subset $B\subset\mathbb R$. (Note that the spectral projections of $a_n$ are in $A_{[1,n]}$ since $A_{[1,n]}$ is finite dimensional.)
Let us first examine the particular case where $\omega$ is a product (or i.i.d.) state and $(a_n)$ are the ergodic averages of a one-site observable, that is, for a state $\varphi$ on the one-site algebra $A_0$ and $a_0\in A_0^{sa}$,
$$\omega:=\varphi^{\mathbb Z}=\bigotimes_{i\in\mathbb Z}\varphi,\qquad a_n:=\frac1n\sum_{i=1}^n\gamma^i(a_0).$$
Let $\Lambda_n$ be the logarithmic moment generating function of $\mu_{[1,n]}^{a_n}$ and $\Lambda_{a_0}$ that of the distribution $\mu_{a_0}$ of $a_0$ with respect to $\varphi$. Since the $\gamma^i(a_0)$, $1\le i\le n$, commute, we have for every $\lambda\in\mathbb R$
$$\Lambda_n(n\lambda)=\log\int_{\mathbb R}e^{n\lambda x}\,d\mu_{[1,n]}^{a_n}(x)=\log\omega(e^{n\lambda a_n})=\log\omega\Big(\prod_{i=1}^n e^{\lambda\gamma^i(a_0)}\Big)$$
$$=\log\varphi^{\otimes n}\big((e^{\lambda a_0})^{\otimes n}\big)=n\log\varphi(e^{\lambda a_0})=n\log\int_{\mathbb R}e^{\lambda x}\,d\mu_{a_0}(x)=n\Lambda_{a_0}(\lambda).$$
This means that
$$\mu_{[1,n]}^{a_n}(B)=\mu_{a_0}^{\otimes n}\Big(\Big\{x\in\mathbb R^n:\frac{x_1+\cdots+x_n}{n}\in B\Big\}\Big),\qquad B\in\mathcal B_{\mathbb R}.$$
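The identity $\Lambda_n(n\lambda)=n\Lambda_{a_0}(\lambda)$ can be checked with explicit matrices. The sketch below assumes, for illustration only, that $A_0=M_d(\mathbb C)$ and that $\varphi(x)=\mathrm{Tr}(\rho x)$ for a density matrix $\rho$; it builds $a_n$ on $A_{[1,n]}=M_d(\mathbb C)^{\otimes n}$ with Kronecker products and evaluates both sides. Exponentials of Hermitian matrices are computed from `numpy.linalg.eigh`; the helper names are hypothetical.

```python
from functools import reduce
import numpy as np

def expm_herm(h):
    """exp(h) for a Hermitian matrix h, via its spectral decomposition."""
    w, v = np.linalg.eigh(h)
    return (v * np.exp(w)) @ v.conj().T

def embed(a0, site, n):
    """gamma^{site+1}(a0) inside M_d^{(x)n}: a0 at one site, identities elsewhere."""
    ops = [np.eye(a0.shape[0])] * n
    ops[site] = a0
    return reduce(np.kron, ops)

def both_sides(rho, a0, n, lam):
    """Return (Lambda_n(n lam), n Lambda_{a0}(lam)) for the product state rho^{(x)n}."""
    a_n = sum(embed(a0, i, n) for i in range(n)) / n
    rho_n = reduce(np.kron, [rho] * n)
    lhs = np.log(np.trace(rho_n @ expm_herm(n * lam * a_n)).real)
    rhs = n * np.log(np.trace(rho @ expm_herm(lam * a0)).real)
    return lhs, rhs

rng = np.random.default_rng(0)
h = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
a0 = (h + h.conj().T) / 2                      # a one-site observable
m = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho = m @ m.conj().T
rho /= np.trace(rho).real                      # density matrix of phi
print(both_sides(rho, a0, n=3, lam=0.7))       # the two numbers agree up to rounding
```

The agreement does not need $\rho$ to commute with $a_0$; it only uses the tensor-product structure of $\omega$ and the commutativity of the translates $\gamma^i(a_0)$.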
Hence the situation reduces to the classical Cramér theorem in the case where the i.i.d. $X_i$'s take values in a finite set.
Our aim for the quantum LDP is to extend the above particular case to the situation where the reference state $\omega$ is a non-product (i.e., non-i.i.d.) correlated state and $(a_n)$ comes from multi-site observables. For translation-invariant states on spin chains, there are two well-developed classes, Gibbs states and finitely correlated states, which are briefly described below.
A function $\Phi:\{X\subset\mathbb Z:|X|<\infty\}\to A$, $|X|$ being the cardinality of $X$, such that $\Phi(\emptyset)=0$ and $\Phi(X)\in A_X^{sa}$ for all $X$ is called an interaction. Assume that $\Phi$ has finite range, i.e., there is an $N\in\mathbb N$ such that $\Phi(X)=0$ if the diameter of $X$ is greater than $N$, and that $\Phi$ is translation invariant, i.e., $\Phi(X+1)=\gamma(\Phi(X))$ for all $X$. For each $\Lambda\subset\mathbb Z$ with $|\Lambda|<\infty$, the local Hamiltonian $H_\Lambda^\Phi$, the local Gibbs state $\omega_\Lambda^G$ and the local dynamics $\alpha^\Lambda$ are defined by
$$H_\Lambda^\Phi:=\sum_{X\subset\Lambda}\Phi(X),\qquad\omega_\Lambda^G(a):=\frac{\mathrm{Tr}_\Lambda(e^{-H_\Lambda^\Phi}a)}{\mathrm{Tr}_\Lambda(e^{-H_\Lambda^\Phi})},\qquad\alpha_t^\Lambda(a):=e^{itH_\Lambda^\Phi}ae^{-itH_\Lambda^\Phi},\quad t\in\mathbb R,\ a\in A_\Lambda,$$
where $\mathrm{Tr}_\Lambda$ is the trace functional on $A_\Lambda:=\bigotimes_{i\in\Lambda}A_i$ such that $\mathrm{Tr}_\Lambda(e)=1$ for all minimal projections $e$ in $A_\Lambda$. Then the (global) Gibbs state $\omega$ and the (global) dynamics $\alpha^\Phi$ are introduced as
$$\omega:=(\text{weak}^*)\lim_{\Lambda\nearrow\mathbb Z}\omega_\Lambda^G,\qquad\alpha_t^\Phi:=(\text{strong})\lim_{\Lambda\nearrow\mathbb Z}\alpha_t^\Lambda,\quad t\in\mathbb R.$$
It is well known (see [1, 10]) that $\omega$ is the unique $\alpha^\Phi$-KMS state at $-\beta=-1$ (i.e., with inverse temperature $\beta=1$). On the other hand, ($C^*$-)finitely correlated states were introduced in [17] as a certain class of quantum Markov states. Let $B$ be a finite-dimensional $C^*$-algebra, $E:A_0\otimes B\to B$ a completely positive unital map and $\rho$ a state of $B$ such that $\rho(E(1_{A_0}\otimes b))=\rho(b)$ for all $b\in B$. For each $a\in A_0$ define a map $E_a:B\to B$ by $E_a(b):=E(a\otimes b)$, $b\in B$. Then the finitely correlated state $\omega$ determined by the triple $(B,E,\rho)$ is the $\gamma$-invariant
state of $A$ given by
$$\omega(a_0\otimes a_1\otimes\cdots\otimes a_n):=\rho\big(E_{a_0}\circ E_{a_1}\circ\cdots\circ E_{a_n}(1_B)\big),\qquad a_i\in A_i,\ 0\le i\le n.$$
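To make the construction concrete, here is a small numerical sketch of a finitely correlated state. It assumes $A_0=M_d(\mathbb C)$, $B=M_k(\mathbb C)$ and, as one convenient choice among many, $E(a\otimes b):=V^*(a\otimes b)V$ for an isometry $V:\mathbb C^k\to\mathbb C^d\otimes\mathbb C^k$; this $E$ is completely positive and unital because $V^*V=1$. The invariant state $\rho$ (written as a density matrix) is found as a fixed point of the predual map $r\mapsto\mathrm{Tr}_{A_0}(VrV^*)$, assuming, as is generically the case for a random $V$, that this fixed point is unique. All function names are hypothetical.

```python
import numpy as np

def random_isometry(d, k, seed=0):
    """An isometry V: C^k -> C^d (x) C^k (orthonormal columns, so V^* V = 1_k)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(d * k, k)) + 1j * rng.normal(size=(d * k, k))
    q, _ = np.linalg.qr(z)          # reduced QR: q has shape (d*k, k)
    return q

def E(V, a, b):
    """E(a (x) b) := V^* (a (x) b) V, completely positive and unital."""
    return V.conj().T @ np.kron(a, b) @ V

def stationary_state(V, d, k):
    """Density matrix rho with Tr(rho E(1 (x) b)) = Tr(rho b) for all b in M_k."""
    def predual(r):                  # r -> Tr_{A_0}(V r V^*), dual to b -> E(1 (x) b)
        m = (V @ r @ V.conj().T).reshape(d, k, d, k)
        return np.einsum("ijil->jl", m)
    T = np.zeros((k * k, k * k), dtype=complex)
    for p in range(k * k):
        unit = np.zeros(k * k, dtype=complex)
        unit[p] = 1.0
        T[:, p] = predual(unit.reshape(k, k)).ravel()
    w, vecs = np.linalg.eig(T)
    rho = vecs[:, np.argmin(np.abs(w - 1.0))].reshape(k, k)
    rho = rho / np.trace(rho)        # fixes scale and phase
    return (rho + rho.conj().T) / 2

def omega(V, rho, ops):
    """omega(a_0 (x) ... (x) a_n) = rho(E_{a_0} o ... o E_{a_n}(1_B)), ops = [a_0, ..., a_n]."""
    b = np.eye(rho.shape[0], dtype=complex)
    for a in reversed(ops):          # innermost map is E_{a_n}
        b = E(V, a, b)
    return np.trace(rho @ b)

d, k = 2, 3
V = random_isometry(d, k, seed=1)
rho = stationary_state(V, d, k)
sz = np.diag([1.0, -1.0])
# translation invariance: omega(sz) and omega(1 (x) sz) coincide up to rounding
print(omega(V, rho, [sz]), omega(V, rho, [np.eye(d), sz]))
```

The invariance condition $\rho(E(1_{A_0}\otimes b))=\rho(b)$ is exactly what makes $\omega(1\otimes a)=\omega(a)$, which the last line illustrates.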
As for the sequence $(a_n)$, for any $a\in A_\Lambda^{sa}$ with $|\Lambda|<\infty$, define the ergodic averages of the multi-site observable $a$ by
$$a_n:=\frac1n\sum_{\Lambda+k\subset[1,n]}\gamma^k(a)\in A_{[1,n]}^{sa},\qquad n\in\mathbb N.\tag{7.1}$$
More generally, for a translation-invariant finite-range interaction $\Psi$ we also define
$$a_n:=\frac1nH_{[1,n]}^\Psi=\frac1n\sum_{X\subset[1,n]}\Psi(X),\qquad n\in\mathbb N.\tag{7.2}$$
Some known results on the quantum LDP, which can be regarded as quantum versions of the Gärtner-Ellis theorem, are surveyed below. Let $\Phi,\Psi$ be translation-invariant finite-range interactions, $\omega$ the Gibbs state for $\Phi$, and $(a_n)$ as given in (7.2). Then it was shown in [27] that the mean pressure (or free energy density) of $\Psi$ with respect to $\omega$,
$$p_\omega(\Psi):=\lim_{n\to\infty}\frac1n\log\omega\big(e^{-H_{[1,n]}^\Psi}\big),\tag{7.3}$$
exists and is finite. Hence $(\mu_{[1,n]}^{a_n})$ satisfies the large deviations upper bound (see (a) above) with the rate function
$$I(x):=\sup_{\lambda\in\mathbb R}\{\lambda x-p_\omega(-\lambda\Psi)\},\qquad x\in\mathbb R.\tag{7.4}$$
Indeed, since the limiting logarithmic moment generating function of $(\mu_{[1,n]}^{a_n})$ is
$$\lim_{n\to\infty}\frac1n\log\int_{\mathbb R}e^{n\lambda x}\,d\mu_{[1,n]}^{a_n}(x)=\lim_{n\to\infty}\frac1n\log\omega(e^{n\lambda a_n})=p_\omega(-\lambda\Psi),$$
the Gärtner-Ellis upper bound (Theorem 3.7 (a)) can be applied. In [27] the above result was also shown in the setting of arbitrary dimension (i.e., $\mathbb Z^\nu$-lattice spin systems, $\nu\in\mathbb N$), but only in the high temperature regime (i.e., for the KMS state associated with the interaction $\beta\Phi$ with inverse temperature $\beta$ sufficiently small); the details are omitted here. Also in [31], the full LDP for Gibbs states was obtained by a cluster expansion technique in
the setting of arbitrary dimension and the high temperature regime, in the case where $(a_n)$ are the ergodic averages of a one-site observable; these details are also omitted here.
A translation-invariant state $\omega$ on $A$ is said to have the upper factorization property if there exists a constant $\alpha>0$ such that
$$\omega\le\alpha\,(\omega|_{A_{(-\infty,0]}})\otimes(\omega|_{A_{[1,\infty)}}),$$
and to have the lower factorization property if
$$\omega\ge\beta\,(\omega|_{A_{(-\infty,0]}})\otimes(\omega|_{A_{[1,\infty)}})$$
for some constant $\beta>0$. It is shown in [21] that the Gibbs state for a translation-invariant finite-range interaction satisfies both the upper and lower factorization properties, and that any finitely correlated state satisfies the upper factorization property. Hence the next theorem is a generalization of the result in [27] mentioned above.
Theorem 7.1 ([21]). Let $\omega$ be a translation-invariant state on $A$ satisfying the upper factorization property. For any translation-invariant finite-range interaction $\Psi$, the mean pressure $p_\omega(\Psi)$ in (7.3) with respect to $\omega$ exists and is finite. Hence, for $(a_n)$ given in (7.2), $(\mu_{[1,n]}^{a_n})$ satisfies the large deviations upper bound (a) with the rate function $I(x)$ in (7.4).
If we prove the differentiability of the function $\lambda\in\mathbb R\mapsto p_\omega(-\lambda\Psi)$, then we can use the full power of the Gärtner-Ellis theorem to obtain the full LDP for the sequence $(\mu_{[1,n]}^{a_n})$. This differentiability question was finally settled by Ogata [32].
Theorem 7.2 (Ogata). Assume that $\omega$ is either the Gibbs state for a translation-invariant finite-range interaction $\Phi$ or a finitely correlated state on $A$ as described above. Let $\Psi$ be a translation-invariant finite-range interaction. Then the function $\lambda\in\mathbb R\mapsto p_\omega(-\lambda\Psi)$ is differentiable for every $\lambda\in\mathbb R$. Hence, for $(a_n)$ in (7.2), $(\mu_{[1,n]}^{a_n})$ satisfies the LDP with the good rate function $I$ in (7.4).
The theorems mentioned above are quantum LDPs of level 1, for sequences of probability measures on $\mathbb R$ that arise as the distributions of a certain sequence of observables under a suitable quantum state.
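In the product situation treated at the beginning of this section (take $\Psi(\{i\})=\gamma^i(a_0)$ and $\Psi(X)=0$ otherwise, so that (7.2) is the ergodic average of $a_0$), the computation of $\Lambda_n$ there gives $p_\omega(-\lambda\Psi)=\log\varphi(e^{\lambda a_0})$, and (7.4) reduces to Cramér's rate function. The sketch below assumes that $a_0$ has spectrum $\{0,1\}$ and that $\varphi$ gives weight $p$ to the eigenvalue $1$. It computes (7.4) by a ternary search over $\lambda$ (valid because $\lambda\mapsto\lambda x-p_\omega(-\lambda\Psi)$ is concave), compares the result with the closed form $x\log\frac xp+(1-x)\log\frac{1-x}{1-p}$, and evaluates the exact exponent $\frac1n\log\omega(1_{[x,\infty)}(a_n))$, which here is a binomial tail. Function names and search bounds are illustrative.

```python
import math

def log_mgf(lam, values, weights):
    """Lambda(lam) = log sum_k w_k exp(lam x_k), computed stably."""
    m = max(lam * x for x in values)
    return m + math.log(sum(w * math.exp(lam * x - m) for x, w in zip(values, weights)))

def legendre(x, values, weights, lo=-50.0, hi=50.0, iters=200):
    """I(x) = sup_lam [lam x - Lambda(lam)] by ternary search (the objective is concave)."""
    f = lambda lam: lam * x - log_mgf(lam, values, weights)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return f((lo + hi) / 2)

def bernoulli_rate(x, p):
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

def binomial_tail_exponent(n, p, x):
    """(1/n) log P(S_n / n >= x) for S_n ~ Binomial(n, p), via log-sum-exp."""
    logs = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p)
            for k in range(math.ceil(n * x), n + 1)]
    m = max(logs)
    return (m + math.log(sum(math.exp(t - m) for t in logs))) / n

vals, wts = [0.0, 1.0], [0.6, 0.4]          # spectrum of a0 and the weights given by phi
print(legendre(0.7, vals, wts), bernoulli_rate(0.7, 0.4))
print(binomial_tail_exponent(4000, 0.4, 0.7))   # close to -I(0.7)
```

By the Chernoff bound the exact exponent never exceeds $-I(x)$ for $x>p$, and the gap closes like $(\log n)/n$.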
The next step in the quantum LDP should be to develop the quantum LDP of level 2, i.e., the quantum version of the Sanov theorem. Some discussions in this direction were given in [8, 31]. The next theorem is from [8]; it is not
the exact quantum analog of the usual formulation of the Sanov theorem but a slightly modified version of it from the point of view of hypothesis testing.
Theorem 7.3 (Bjelakovic et al.). Let $\varphi\in S(A_0)$ and $\Psi\subset S(A_0)$, where $S(A_0)$ is the set of states on $A_0$. Then there exists a sequence $\{p_n\}$ of orthogonal projections $p_n\in A_{[1,n]}=A_0^{\otimes n}$, $n\in\mathbb N$, such that
$$\lim_{n\to\infty}\psi^{\otimes n}(p_n)=1\quad\text{for every }\psi\in\Psi\qquad\text{(typicality)},\tag{7.5}$$
$$\lim_{n\to\infty}\frac1n\log\varphi^{\otimes n}(p_n)=-\inf_{\psi\in\Psi}S(\psi\|\varphi)\qquad\text{(separation rate)}.$$
Moreover, for each sequence of projections $\{\tilde p_n\}$ satisfying (7.5),
$$\liminf_{n\to\infty}\frac1n\log\varphi^{\otimes n}(\tilde p_n)\ge-\inf_{\psi\in\Psi}S(\psi\|\varphi),$$
so $S(\Psi\|\varphi):=\inf_{\psi\in\Psi}S(\psi\|\varphi)$ is the best achievable separation rate. In the above, $S(\psi\|\varphi)$ is the relative entropy of $\psi$ with respect to $\varphi$. Furthermore, in [9] the same authors substantially generalized the above theorem from the i.i.d. setting to correlated states, in connection with Stein's lemma of hypothesis testing. Among other results, it was shown in [9] that the quantum Sanov theorem in the form of Theorem 7.3 holds true under *-mixing of a correlated state $\varphi$. The *-mixing is a factorization property quite similar to the upper and lower factorization properties mentioned above.
There is a formulation of quantum large deviations different from the one discussed so far in this section. The free energy density (7.3) can be written as
$$\lim_{n\to\infty}\frac1n\log\mathrm{Tr}_{[1,n]}\big(e^{\log D(\omega_n)}e^{-H_{[1,n]}^\Psi}\big),$$
where $D(\omega_n)$ is the density matrix of $\omega_n:=\omega|_{A_{[1,n]}}$. From the point of view of quantum statistical mechanics, the limit
$$\lim_{n\to\infty}\frac1n\log\mathrm{Tr}_{[1,n]}\exp\big(\log D(\omega_n)-H_{[1,n]}^\Psi\big)\quad\text{or}\quad\lim_{n\to\infty}\frac1n\log\mathrm{Tr}_{[1,n]}\exp\big(-H_{[1,n]}^\Phi-H_{[1,n]}^\Psi\big)$$
(called the perturbed free energy density) may be even more meaningful. The next theorem is concerned with the variational expression of the perturbed free energy density.
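For states on the finite-dimensional algebra $A_0$ given by density matrices $D_\psi,D_\varphi$ (notation introduced here), the relative entropy in Theorem 7.3 takes, in the standard Umegaki form, the value $S(\psi\|\varphi)=\mathrm{Tr}\,D_\psi(\log D_\psi-\log D_\varphi)$. The sketch below assumes $A_0=M_d(\mathbb C)$ and faithful (full-rank) states so that both logarithms exist, and also evaluates the separation rate $S(\Psi\|\varphi)$ for a finite family $\Psi$. Its additivity, $S(\psi^{\otimes n}\|\varphi^{\otimes n})=nS(\psi\|\varphi)$, is what makes it the natural exponent in the i.i.d. setting of Theorem 7.3.

```python
import numpy as np

def _logm(rho):
    """Matrix logarithm of a positive definite Hermitian matrix."""
    w, v = np.linalg.eigh(rho)
    return (v * np.log(w)) @ v.conj().T

def relative_entropy(psi, phi):
    """S(psi || phi) = Tr D_psi (log D_psi - log D_phi) for full-rank density matrices."""
    return float(np.trace(psi @ (_logm(psi) - _logm(phi))).real)

def separation_rate(family, phi):
    """S(Psi || phi) = inf over psi in Psi, for a finite family Psi."""
    return min(relative_entropy(psi, phi) for psi in family)

psi = np.diag([0.7, 0.3])
phi = np.diag([0.4, 0.6])
print(relative_entropy(psi, phi))   # commuting case: 0.7 log(0.7/0.4) + 0.3 log(0.3/0.6)
```

For commuting density matrices the formula reduces to the classical Kullback-Leibler divergence of the eigenvalue distributions, as in the printed example.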
Theorem 7.4 ([22]). Let $\omega$ be the Gibbs state as above and let $(a_n)$ be given by (7.1) for any $a\in A_\Lambda^{sa}$ with $|\Lambda|<\infty$. Then for every real continuous function $f$ on $[\lambda_{\min}(a),\lambda_{\max}(a)]$, where $\lambda_{\min}(a)$ and $\lambda_{\max}(a)$ are the minimum and the maximum of the spectrum of $a$, the functional free energy density
$$p_\omega(a,f):=\lim_{n\to\infty}\frac1n\log\mathrm{Tr}_{[1,n]}\exp\big(\log D(\omega_n)-nf(a_n)\big)\tag{7.6}$$
exists and
$$p_\omega(a,f)=\max\{-f(x)-I_a(x):x\in[\lambda_{\min}(a),\lambda_{\max}(a)]\},$$
where $I_a$ is defined by
$$I_a(x):=\sup_{\lambda\in\mathbb R}\{\lambda x-p_\omega(-\lambda a)\}\tag{7.7}$$
with
$$p_\omega(-\lambda a):=\lim_{n\to\infty}\frac1n\log\mathrm{Tr}_{[1,n]}\exp\big(\log D(\omega_n)-n\lambda a_n\big),\qquad\lambda\in\mathbb R.$$
The limit (7.6) has a direct physical meaning in the case when $f(x)=x^2$ and $a=a_0\in A_0$. In this case,
$$-H_{[1,n]}^\Phi-\frac1n\sum_{i,j=1}^na_ia_j$$
is a mean field perturbation of the interaction $\Phi$, where $a_j:=\gamma^j(a_0)$. The limit is the free energy density of the mean field model, and the variational formula has an important physical interpretation. A special case of Theorem 7.4, where $\omega$ is a product state and $a_n$ is an average of a one-site observable, was previously discussed in [33]. A slight generalization of Theorem 7.4 is in [12]. The variational expression in Theorem 7.4 can be considered a quantum analog of Varadhan's integral lemma (see Theorems 4.1 and 4.6), even though the exact form of the LDP is not clearly formulated. In this connection the following comment is worth noting:
Remark 7.5. As remarked in [22], if the so-called BMV (Bessis, Moussa and Villani) conjecture [4] (see also [28]) is true, then for each $n\in\mathbb N$ there is a probability measure $\mu_{a_n}$ supported in $[\lambda_{\min}(a),\lambda_{\max}(a)]$ such that
$$\mathrm{Tr}_{[1,n]}\exp\big(\log D(\omega_n)-\lambda a_n\big)=\int_{\mathbb R}e^{-\lambda x}\,d\mu_{a_n}(x),\qquad\lambda\in\mathbb R.$$
Hence $p_\omega(-\lambda a)$ is nothing but the limiting logarithmic moment generating function of $(\mu_{a_n})$. Since this function is differentiable for every $\lambda\in\mathbb R$ (see [22]), the Gärtner-Ellis theorem shows that $(\mu_{a_n})$ satisfies the LDP with the good rate function $I_a$ in (7.7). Furthermore, a proof of the BMV conjecture has recently been put forward by H. Stahl [37]. However, even in this favorable situation, Theorem 7.4 does not seem to be a direct consequence of Varadhan's integral lemma, since
$$\mathrm{Tr}_{[1,n]}\exp\big(\log D(\omega_n)-nf(a_n)\big)=\int_{\mathbb R}e^{-nf(x)}\,d\mu_{a_n}(x)$$
is not guaranteed.
8. Applications of Large Deviations
The theory of large deviations is powerful and has a great many applications. Applications to the Boltzmann-Gibbs entropy and the free entropy are illustrated in this section.
8.1. Boltzmann-Gibbs entropy and mutual information
Throughout this subsection let $(X_1,\dots,X_n)$ be an $n$-tuple of real random variables on a probability space $(\Omega,P)$, and assume that the $X_i$'s are bounded. The Boltzmann-Gibbs entropy of $(X_1,\dots,X_n)$ is defined to be
$$H(X_1,\dots,X_n):=-\int\!\cdots\!\int_{\mathbb R^n}p(x_1,\dots,x_n)\log p(x_1,\dots,x_n)\,dx_1\cdots dx_n$$
if the joint density $p(x_1,\dots,x_n)$ of $(X_1,\dots,X_n)$ exists; otherwise $H(X_1,\dots,X_n):=-\infty$. Note that the above integral is well defined in $[-\infty,\infty)$ since the density $p$ is compactly supported.
Definition 8.1. The mean value of $x=(x_1,\dots,x_N)$ in $\mathbb R^N$ is given by
$$\kappa_N(x):=\frac1N\sum_{j=1}^Nx_j.$$
For each $N,m\in\mathbb N$ and $\delta>0$ we define $\Delta(X_1,\dots,X_n;N,m,\delta)$ to be the set of all $n$-tuples $(x_1,\dots,x_n)$ of $x_i=(x_{i1},\dots,x_{iN})\in\mathbb R^N$, $1\le i\le n$, such that
$$|\kappa_N(x_{i_1}\cdots x_{i_k})-E(X_{i_1}\cdots X_{i_k})|<\delta$$
for all $1\le i_1,\dots,i_k\le n$ with $1\le k\le m$, where $x_{i_1}\cdots x_{i_k}$ means the pointwise product, i.e.,
$$x_{i_1}\cdots x_{i_k}:=(x_{i_1,1}\cdots x_{i_k,1},\,x_{i_1,2}\cdots x_{i_k,2},\,\dots,\,x_{i_1,N}\cdots x_{i_k,N})\in\mathbb R^N,$$
and $E(\cdot)$ is the expectation on $(\Omega,P)$ as usual. For each $R>0$, define $\Delta_R(X_1,\dots,X_n;N,m,\delta)$ to be the set of all $(x_1,\dots,x_n)\in\Delta(X_1,\dots,X_n;N,m,\delta)$ such that $x_i\in[-R,R]^N$ for all $1\le i\le n$.
Heuristically, $\Delta(X_1,\dots,X_n;N,m,\delta)$ is the set of "microstates" consisting of $n$-tuples of discrete random variables on the $N$-point set with the uniform probability such that all joint moments of order up to $m$ reproduce the corresponding joint moments of $X_1,\dots,X_n$ up to an error $\delta$. Based on the Sanov large deviation theorem we prove the next theorem, which says that the Boltzmann-Gibbs entropy is obtained as an asymptotic limit of the volume of the approximating microstates.
Theorem 8.2. For every $m\in\mathbb N$ and $\delta>0$ and for any choice of $R\ge\max_{1\le i\le n}\|X_i\|_\infty$ ($\|\cdot\|_\infty$ being the $L^\infty$-norm) the limit
$$\lim_{N\to\infty}\frac1N\log\lambda_N^{\otimes n}(\Delta_R(X_1,\dots,X_n;N,m,\delta))$$
exists, where $\lambda_N$ is the Lebesgue measure on $\mathbb R^N$. Furthermore,
$$H(X_1,\dots,X_n)=\lim_{m\to\infty,\,\delta\searrow0}\lim_{N\to\infty}\frac1N\log\lambda_N^{\otimes n}(\Delta_R(X_1,\dots,X_n;N,m,\delta))$$
holds independently of the choice of $R\ge\max_{1\le i\le n}\|X_i\|_\infty$.
Proof. Let $R\ge\max_{1\le i\le n}\|X_i\|_\infty$ be arbitrary and $\Sigma:=[-R,R]^n$. Let $\nu_0\in M_1(\Sigma)$ be the joint distribution of $(X_1,\dots,X_n)$. Regard $x=(x_1,\dots,x_n)$ with $x_i=(x_{i1},\dots,x_{iN})\in[-R,R]^N$ as an element $\mathbf x=(\mathbf x_1,\dots,\mathbf x_N)$ of $\Sigma^N$ via $\mathbf x_j=(x_{1j},\dots,x_{nj})\in\Sigma$, $1\le j\le N$. Note that, for every $i_1,\dots,i_k\in\{1,\dots,n\}$, we have
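$$\Big(\frac{\delta_{\mathbf x_1}+\cdots+\delta_{\mathbf x_N}}{N}\Big)(x_{i_1}\cdots x_{i_k})=\kappa_N(x_{i_1}\cdots x_{i_k}),\tag{8.1}$$
where on the left-hand side $x_{i_1}\cdots x_{i_k}$ means the product of the coordinate functions $x_{i_1},\dots,x_{i_k}$ on $\Sigma=[-R,R]^n$. Let $\mu_R$ be the normalization of the restriction of $\lambda^{\otimes n}$ to $[-R,R]^n$, i.e., $\mu_R:=(2R)^{-n}\lambda^{\otimes n}|_{[-R,R]^n}$. For $m\in\mathbb N$ and $\delta>0$ the set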
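$F(m,\delta)$ of all $\nu\in M_1(\Sigma)$ satisfying $|\nu(x_{i_1}\cdots x_{i_k})-\nu_0(x_{i_1}\cdots x_{i_k})|\le\delta$ for all $i_1,\dots,i_k\in\{1,\dots,n\}$ with $1\le k\le m$ is closed in $M_1(\Sigma)$ in the weak topology. Replacing $\le\delta$ with $<\delta$ we have an open set $G(m,\delta)$. The Sanov theorem (Theorem 5.9) implies that
$$\limsup_{N\to\infty}\frac1N\log\mu_R^{\otimes N}\Big(\Big\{\mathbf x\in\Sigma^N:\frac{\delta_{\mathbf x_1}+\cdots+\delta_{\mathbf x_N}}{N}\in F(m,\delta)\Big\}\Big)\le-\inf_{\nu\in F(m,\delta)}S(\nu\|\mu_R),$$
$$\liminf_{N\to\infty}\frac1N\log\mu_R^{\otimes N}\Big(\Big\{\mathbf x\in\Sigma^N:\frac{\delta_{\mathbf x_1}+\cdots+\delta_{\mathbf x_N}}{N}\in G(m,\delta)\Big\}\Big)\ge-\inf_{\nu\in G(m,\delta)}S(\nu\|\mu_R).$$
Now we prove that the limit
$$\lim_{N\to\infty}\frac1N\log\mu_R^{\otimes N}\Big(\Big\{\mathbf x\in\Sigma^N:\frac{\delta_{\mathbf x_1}+\cdots+\delta_{\mathbf x_N}}{N}\in F(m,\delta)\Big\}\Big)\tag{8.2}$$
exists and that its limit as $m\to\infty$, $\delta\searrow0$ is equal to $-S(\nu_0\|\mu_R)$. If $\inf_{\nu\in F(m,\delta)}S(\nu\|\mu_R)=\infty$ for some $m=m_1$ and $\delta=\delta_1$, then the limit (8.2) exists and is $-\infty$ for every $m\ge m_1$ and $\delta\le\delta_1$; moreover, $S(\nu_0\|\mu_R)=\infty$ in this case. So assume that $\inf_{\nu\in F(m,\delta)}S(\nu\|\mu_R)<\infty$ for all $m\in\mathbb N$ and $\delta>0$. Then, by convexity of relative entropy, it is easy to check that
$$\inf_{\nu\in F(m,\delta)}S(\nu\|\mu_R)=\inf_{\nu\in G(m,\delta)}S(\nu\|\mu_R),$$
so the limit (8.2) exists. Moreover, since the sets $G(m,\delta)$ constitute a neighborhood base of $\nu_0$ in $M_1(\Sigma)$, the above infimum tends to $S(\nu_0\|\mu_R)$ as $m\to\infty$ and $\delta\searrow0$ by lower semicontinuity of relative entropy. Finally, since $\mu_R^{\otimes N}=(2R)^{-nN}\lambda_N^{\otimes n}|_{[-R,R]^{nN}}$ and (8.1) gives
$$\Delta_R(X_1,\dots,X_n;N,m,\delta)=\Big\{\mathbf x\in\Sigma^N:\frac{\delta_{\mathbf x_1}+\cdots+\delta_{\mathbf x_N}}{N}\in F(m,\delta)\Big\},$$
we see that (8.2) is equal to
$$\lim_{N\to\infty}\Big[\frac1N\log\lambda_N^{\otimes n}(\Delta_R(X_1,\dots,X_n;N,m,\delta))-\log(2R)^n\Big].$$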
On the other hand, we have
$$S(\nu_0\|\mu_R)=S(\nu_0\|\lambda^{\otimes n})+\log(2R)^n=-H(X_1,\dots,X_n)+\log(2R)^n.$$
Hence the result follows.
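Membership in the microstate sets of Definition 8.1 is a finite check of joint moments, which the following sketch makes explicit. It assumes the caller supplies the target moments $E(X_{i_1}\cdots X_{i_k})$ as a function of the index tuple; the names `in_microstates` and `moments` are hypothetical. The number of moment conditions grows like $n+n^2+\cdots+n^m$.

```python
import itertools
import numpy as np

def in_microstates(xs, moments, m, delta, R=None):
    """Is (x_1, ..., x_n) in Delta(X_1, ..., X_n; N, m, delta) (or in Delta_R if R is given)?

    xs      : array of shape (n, N); row i is the vector x_i in R^N
    moments : callable mapping an index tuple (i_1, ..., i_k) to E(X_{i_1} ... X_{i_k})
    """
    xs = np.asarray(xs, dtype=float)
    n = xs.shape[0]
    if R is not None and np.any(np.abs(xs) > R):
        return False
    for k in range(1, m + 1):
        for idx in itertools.product(range(n), repeat=k):
            # kappa_N of the pointwise product x_{i_1} ... x_{i_k}
            kappa = np.prod(xs[list(idx)], axis=0).mean()
            if abs(kappa - moments(idx)) >= delta:
                return False
    return True

# X_1 = X_2 = U with U uniform on [-1, 1], so E(X_{i_1} ... X_{i_k}) = E(U^k)
moments = lambda idx: 1.0 / (len(idx) + 1) if len(idx) % 2 == 0 else 0.0
N = 2000
grid = -1 + (np.arange(N) + 0.5) * 2 / N                              # midpoints of [-1, 1]
print(in_microstates([grid, grid], moments, m=4, delta=1e-3))         # True
print(in_microstates([grid, grid[::-1]], moments, m=4, delta=1e-3))   # False
```

Both candidate pairs use the same coordinates and differ only by a permutation of the second vector, yet only the first reproduces $E(X_1X_2)=1/3$; counting such permutations is what the next definition does.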
In the following, a kind of mutual information is introduced in the discretization approach using microstates of permutations. Let $\mathbb R_\le^N$ be the set of vectors in $\mathbb R^N$ with increasing coordinates and $S_N$ the symmetric group of order $N$, i.e., the group of permutations of $\{1,2,\dots,N\}$.
Definition 8.3. The action of $S_N$ on $\mathbb R^N$ is given by
$$\sigma(x):=(x_{\sigma^{-1}(1)},x_{\sigma^{-1}(2)},\dots,x_{\sigma^{-1}(N)})$$
for $\sigma\in S_N$ and $x=(x_1,\dots,x_N)\in\mathbb R^N$. For each $R>0$, $N,m\in\mathbb N$ and $\delta>0$, we denote by $\Delta_{\mathrm{sym},R}(X_1,\dots,X_n;N,m,\delta)$ the set of all $(\sigma_1,\dots,\sigma_n)\in S_N^n$ such that
$$(\sigma_1(x_1),\dots,\sigma_n(x_n))\in\Delta_R(X_1,\dots,X_n;N,m,\delta)$$
for some $(x_1,\dots,x_n)\in(\mathbb R_\le^N)^n$. For each $R>0$ define
$$I_{\mathrm{sym},R}(X_1,\dots,X_n):=-\lim_{m\to\infty,\,\delta\searrow0}\limsup_{N\to\infty}\frac1N\log\gamma_{S_N}^{\otimes n}(\Delta_{\mathrm{sym},R}(X_1,\dots,X_n;N,m,\delta)),$$
where $\gamma_{S_N}$ is the uniform probability measure on $S_N$. Define also $\bar I_{\mathrm{sym},R}(X_1,\dots,X_n)$ by replacing $\limsup$ with $\liminf$. Obviously, $0\le I_{\mathrm{sym},R}(X_1,\dots,X_n)\le\bar I_{\mathrm{sym},R}(X_1,\dots,X_n)$. Moreover, $\Delta_{\mathrm{sym},\infty}(X_1,\dots,X_n;N,m,\delta)$ is defined by replacing $\Delta_R(X_1,\dots,X_n;N,m,\delta)$ in the above with $\Delta(X_1,\dots,X_n;N,m,\delta)$, without the cut-off by the parameter $R$. Then $I_{\mathrm{sym},\infty}(X_1,\dots,X_n)$ and $\bar I_{\mathrm{sym},\infty}(X_1,\dots,X_n)$ are also defined as above.
Lemma 8.4 ([25]). For any choice of $R\ge\max_{1\le i\le n}\|X_i\|_\infty$,
$$I_{\mathrm{sym},\infty}(X_1,\dots,X_n)=I_{\mathrm{sym},R}(X_1,\dots,X_n),\tag{8.3}$$
$$\bar I_{\mathrm{sym},\infty}(X_1,\dots,X_n)=\bar I_{\mathrm{sym},R}(X_1,\dots,X_n).\tag{8.4}$$
So we denote the common value in (8.3) by $I_{\mathrm{sym}}(X_1,\dots,X_n)$ and that in (8.4) by $\bar I_{\mathrm{sym}}(X_1,\dots,X_n)$, and call them the mutual information and the upper mutual information of $(X_1,\dots,X_n)$, respectively.
The next theorem, shown in [25] on the basis of Theorem 8.2, gives the exact relation of $I_{\mathrm{sym}}$ and $\bar I_{\mathrm{sym}}$ with the Boltzmann-Gibbs entropy $H(\cdot)$. It says that $I_{\mathrm{sym}}(X_1,\dots,X_n)$ is formally the sum of the separate entropies $H(X_i)$ minus the compound entropy $H(X_1,\dots,X_n)$. Thus, a naive meaning of $I_{\mathrm{sym}}(X_1,\dots,X_n)$ is the entropy (or information) overlapping among the $X_i$'s, justifying the terminology "mutual information."
Theorem 8.5 ([25]).
$$H(X_1,\dots,X_n)=-I_{\mathrm{sym}}(X_1,\dots,X_n)+\sum_{i=1}^nH(X_i)=-\bar I_{\mathrm{sym}}(X_1,\dots,X_n)+\sum_{i=1}^nH(X_i).$$
Let $\mu_{(X_1,\dots,X_n)}$ be the joint distribution measure of $(X_1,\dots,X_n)$ on $\mathbb R^n$, and $\mu_{X_i}$ that of $X_i$ for $1\le i\le n$. Let $S(\mu_{(X_1,\dots,X_n)},\mu_{X_1}\otimes\cdots\otimes\mu_{X_n})$ denote the relative entropy of $\mu_{(X_1,\dots,X_n)}$ with respect to the product measure $\mu_{X_1}\otimes\cdots\otimes\mu_{X_n}$, i.e.,
$$S(\mu_{(X_1,\dots,X_n)},\mu_{X_1}\otimes\cdots\otimes\mu_{X_n}):=\int\log\frac{d\mu_{(X_1,\dots,X_n)}}{d(\mu_{X_1}\otimes\cdots\otimes\mu_{X_n})}\,d\mu_{(X_1,\dots,X_n)}$$
if $\mu_{(X_1,\dots,X_n)}\ll\mu_{X_1}\otimes\cdots\otimes\mu_{X_n}$; otherwise $S(\mu_{(X_1,\dots,X_n)},\mu_{X_1}\otimes\cdots\otimes\mu_{X_n}):=\infty$. When $H(X_i)>-\infty$ for all $1\le i\le n$, one can easily verify that
$$S(\mu_{(X_1,\dots,X_n)},\mu_{X_1}\otimes\cdots\otimes\mu_{X_n})=-H(X_1,\dots,X_n)+\sum_{i=1}^nH(X_i).$$
Thus, the above theorem yields that if $H(X_i)>-\infty$ for all $1\le i\le n$, then
$$I_{\mathrm{sym}}(X_1,\dots,X_n)=\bar I_{\mathrm{sym}}(X_1,\dots,X_n)=S(\mu_{(X_1,\dots,X_n)},\mu_{X_1}\otimes\cdots\otimes\mu_{X_n}),$$
and $I_{\mathrm{sym}}(X_1,\dots,X_n)=0$ if and only if $X_1,\dots,X_n$ are independent. In particular, the mutual information $I(X;Y)$ of two real random variables $X,Y$ is originally defined as $I(X;Y):=S(\mu_{(X,Y)},\mu_X\otimes\mu_Y)$, which can also be written as
$$I(X;Y)=-H(X,Y)+H(X)+H(Y)\tag{8.5}$$
as long as $H(X)>-\infty$ and $H(Y)>-\infty$. Hence we have $I(X;Y)=I_{\mathrm{sym}}(X,Y)=\bar I_{\mathrm{sym}}(X,Y)$ if $X,Y$ are bounded with $H(X),H(Y)>-\infty$. We do not know whether $I_{\mathrm{sym}}(X_1,\dots,X_n)=\bar I_{\mathrm{sym}}(X_1,\dots,X_n)$ holds without the assumption $H(X_i)>-\infty$ for $1\le i\le n$, or, more strongly, whether the limit
$$\lim_{N\to\infty}\frac1N\log\gamma_{S_N}^{\otimes n}(\Delta_{\mathrm{sym},R}(X_1,\dots,X_n;N,m,\delta))$$
exists as in Theorem 8.2.
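At the level of Riemann sums, the identity $S(\mu_{(X,Y)},\mu_X\otimes\mu_Y)=-H(X,Y)+H(X)+H(Y)$ behind (8.5) holds exactly, because the discretized marginal densities are row and column sums of the joint one. The sketch below illustrates this on a grid. The test density $p(x,y)=1+\theta(2x-1)(2y-1)$ on $[0,1]^2$ (uniform marginals, and independent exactly when $\theta=0$) and the grid size are illustrative assumptions.

```python
import numpy as np

def entropies_on_grid(p, h):
    """Riemann-sum versions of H(X,Y), H(X), H(Y) and S(mu_(X,Y), mu_X (x) mu_Y).

    p : joint density sampled at the midpoints of an (M, M) grid of square cells of side h
    """
    p = p / (p.sum() * h * h)          # renormalize the discretized density
    px = p.sum(axis=1) * h             # marginal density of X (axis 0 indexes x)
    py = p.sum(axis=0) * h             # marginal density of Y

    def ent(q, cell):
        q = q[q > 0]
        return float(-(q * np.log(q)).sum() * cell)

    H_xy, H_x, H_y = ent(p, h * h), ent(px, h), ent(py, h)
    mask = p > 0
    S = float((p[mask] * np.log(p[mask] / np.outer(px, py)[mask])).sum() * h * h)
    return H_xy, H_x, H_y, S

M = 400
xs = (np.arange(M) + 0.5) / M
X, Y = np.meshgrid(xs, xs, indexing="ij")
p = 1 + 0.8 * (2 * X - 1) * (2 * Y - 1)
H_xy, H_x, H_y, S = entropies_on_grid(p, 1.0 / M)
print(S, -H_xy + H_x + H_y)            # equal up to rounding, and positive
```

Since the marginals here are uniform, $H(X)=H(Y)=0$ and the mutual information reduces to $-H(X,Y)$.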
8.2. Free entropy and orbital free entropy
Let $(M,\tau)$ be a tracial $W^*$-probability space, that is, $M$ is a von Neumann algebra with a faithful normal tracial state $\tau$, and let $M^{sa}$ be the set of self-adjoint elements in $M$. Let $M_N$ be the algebra of $N\times N$ complex matrices and $M_N^{sa}$ the set of Hermitian matrices in $M_N$. The normalized trace of $A\in M_N$ is denoted by $\mathrm{tr}_N(A)$ and the operator norm of $A$ by $\|A\|$. In [39] Voiculescu introduced the free entropy of an $n$-tuple $(X_1,\dots,X_n)$ of noncommutative self-adjoint random variables in $(M,\tau)$ as follows.
Definition 8.6. For each $N,m\in\mathbb N$ and $\delta>0$ define
$$\Gamma(X_1,\dots,X_n;N,m,\delta):=\{(A_1,\dots,A_n)\in(M_N^{sa})^n:|\mathrm{tr}_N(A_{i_1}\cdots A_{i_k})-\tau(X_{i_1}\cdots X_{i_k})|\le\delta,\ 1\le i_1,\dots,i_k\le n,\ k\le m\}.$$
Moreover, for each $R>0$, define $\Gamma_R(X_1,\dots,X_n;N,m,\delta)$ to be the set of all $(A_1,\dots,A_n)\in\Gamma(X_1,\dots,X_n;N,m,\delta)$ such that $\|A_i\|\le R$ for all $1\le i\le n$. We define
$$\chi_R(X_1,\dots,X_n):=\lim_{m\to\infty,\,\delta\searrow0}\limsup_{N\to\infty}\Big(\frac1{N^2}\log\Lambda_N^{\otimes n}(\Gamma_R(X_1,\dots,X_n;N,m,\delta))+\frac n2\log N\Big),$$
where $\Lambda_N^{\otimes n}$ denotes the $n$-fold tensor product of the "Lebesgue" measure $\Lambda_N$ on $M_N^{sa}$ given in (6.6) (with $N$ in place of $n$). Then the free entropy
of $(X_1,\dots,X_n)$ is
$$\chi(X_1,\dots,X_n):=\sup_{R>0}\chi_R(X_1,\dots,X_n).$$
Since the definition is based on microstate (or matricial) approximation, $\chi(X_1,\dots,X_n)$ is sometimes called the microstate free entropy, in contrast to Voiculescu's other free entropy $\chi^*(X_1,\dots,X_n)$ in the non-microstate approach of [41].
One can see a strong resemblance of Definition 8.6 to the expression in Theorem 8.2 with Definition 8.1. An essential difference is that $\limsup_{N\to\infty}$ appears in Definition 8.6 while $\lim_{N\to\infty}$ is obtained in Theorem 8.2 by the Sanov theorem. Indeed, one of the most significant open questions in free entropy theory is whether replacing $\limsup$ in the definition of $\chi(X_1,\dots,X_n)$ in Definition 8.6 by $\liminf$ yields the same quantity or not. The $n$-tuple $(X_1,\dots,X_n)$ is said to be regular if this is the case. However, the next theorem shows that the limit as $N\to\infty$ does exist for the free entropy in the single variable case. The proof is based on the large deviation result (Theorem 6.12).
Theorem 8.7. Let $X\in M^{sa}$. For every $R\ge\|X\|$, $m\in\mathbb N$ and $\delta>0$,
$$\lim_{N\to\infty}\Big(\frac1{N^2}\log\Lambda_N(\Gamma_R(X;N,m,\delta))+\frac12\log N\Big)$$
exists, and
$$\chi_R(X)=\lim_{m\to\infty,\,\delta\searrow0}\lim_{N\to\infty}\Big(\frac1{N^2}\log\Lambda_N(\Gamma_R(X;N,m,\delta))+\frac12\log N\Big)=\Sigma(\mu_X)+\frac12\log2\pi+\frac34,$$
where $\Sigma(\mu_X)$ is the free entropy, given in (6.5), of the distribution measure $\mu_X$ of $X$ (with respect to $\tau$). Hence $\chi(X)=\chi_R(X)$ independently of the choice of $R\ge\|X\|$.
Proof. Let $X\in M^{sa}$ and $R\ge\|X\|$. For $m\in\mathbb N$ and $\delta>0$ let $F(m,\delta)$ be the set of $\nu\in M_1([-R,R])$ such that $|\nu(x^k)-\mu_X(x^k)|\le\delta$ for $1\le k\le m$, and let $G(m,\delta)$ be given by replacing $\le\delta$ with $<\delta$, where $\nu(x^k):=\int_{[-R,R]}x^k\,d\nu(x)$ (hence $\mu_X(x^k)=\tau(X^k)$). Since $F(m,\delta)$ is closed and $G(m,\delta)$ is open in $M_1([-R,R])$ in the weak topology, Theorem 6.12
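implies that
$$\limsup_{N\to\infty}\frac1{N^2}\log\tilde\lambda_{N,R}(\{t\in[-R,R]^N:\kappa_t\in F(m,\delta)\})\le\sup_{\nu\in F(m,\delta)}\Sigma(\nu)-\log\frac R2,$$
$$\liminf_{N\to\infty}\frac1{N^2}\log\tilde\lambda_{N,R}(\{t\in[-R,R]^N:\kappa_t\in G(m,\delta)\})\ge\sup_{\nu\in G(m,\delta)}\Sigma(\nu)-\log\frac R2.$$
It is easy to check that $\sup_{\nu\in F(m,\delta)}\Sigma(\nu)=\sup_{\nu\in G(m,\delta)}\Sigma(\nu)$, so we have
$$\lim_{N\to\infty}\frac1{N^2}\log\tilde\lambda_{N,R}(\{t\in[-R,R]^N:\kappa_t\in F(m,\delta)\})=\sup_{\nu\in F(m,\delta)}\Sigma(\nu)-\log\frac R2.\tag{8.6}$$
Here, let $\tilde\Lambda_N$ be the measure on $\mathbb R^N$ (instead of $\mathbb R_\le^N$) induced from $\Lambda_N$ on $M_N^{sa}$. By Lemma 6.4 it is seen that $\tilde\Lambda_N$ has the joint density
$$C_N\prod_{i<j}(t_i-t_j)^2\qquad\text{with}\qquad C_N:=\frac{(2\pi)^{N(N-1)/2}}{\prod_{j=1}^Nj!},$$
so that
$$\tilde\lambda_{N,R}=\frac{1}{C_NZ_N}\,\tilde\Lambda_N\big|_{[-R,R]^N}.$$
The Stirling formula gives
$$\lim_{N\to\infty}\Big(\frac1{N^2}\log C_N+\frac12\log N\Big)=\frac12\log2\pi+\frac34.$$
From this and (6.14) it follows that (8.6) can be rewritten as
$$\lim_{N\to\infty}\Big(\frac1{N^2}\log\tilde\Lambda_N(\{t\in[-R,R]^N:\kappa_t\in F(m,\delta)\})+\frac12\log N\Big)=\sup_{\nu\in F(m,\delta)}\Sigma(\nu)+\frac12\log2\pi+\frac34.$$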
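Finally, we notice that
$$\tilde\Lambda_N(\{t\in[-R,R]^N:\kappa_t\in F(m,\delta)\})=\tilde\Lambda_N(\{t\in[-R,R]^N:|\kappa_t(x^k)-\mu_X(x^k)|\le\delta,\ 1\le k\le m\})$$
$$=\Lambda_N(\{A\in M_N^{sa}:\|A\|\le R,\ |\mathrm{tr}_N(A^k)-\tau(X^k)|\le\delta,\ 1\le k\le m\})=\Lambda_N(\Gamma_R(X;N,m,\delta)),$$
from which, letting $m\to\infty$ and $\delta\searrow0$ (so that $\sup_{\nu\in F(m,\delta)}\Sigma(\nu)\to\Sigma(\mu_X)$ by the upper semicontinuity of $\Sigma$ on $M_1([-R,R])$), we obtain the conclusion.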
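As a worked instance of Theorem 8.7 (a short computation, assuming, as in [23], that $\Sigma(\nu)$ in (6.5) is the logarithmic energy $\iint\log|x-y|\,d\nu(x)\,d\nu(y)$), let $X$ be semicircular with $\mu_X=w_2$, i.e., with mean $0$ and variance $1$. Integrating the logarithmic potential of $w_r$ recalled in the proof of Corollary 6.11 against $w_r$, and using $\int x^2\,dw_r(x)=r^2/4$, gives

$$\Sigma(w_r)=\int_{-r}^{r}\Big(\int_{-r}^{r}w_r(y)\log|x-y|\,dy\Big)w_r(x)\,dx=\int_{-r}^{r}\Big(\frac{x^2}{r^2}+\log\frac r2-\frac12\Big)w_r(x)\,dx=\log\frac r2-\frac14,$$
$$\chi(X)=\Sigma(w_2)+\frac12\log2\pi+\frac34=-\frac14+\frac12\log2\pi+\frac34=\frac12\log(2\pi e).$$

Thus $\chi(X)$ coincides with the Boltzmann-Gibbs entropy $\frac12\log(2\pi e)$ of the standard Gaussian distribution. Consistently, the rate function of Corollary 6.11 with $\beta=\sigma=1$, namely $I(\nu)=-\Sigma(\nu)+\frac12\int x^2\,d\nu(x)-\frac34$, vanishes at its minimizer: $I(w_2)=\frac14+\frac12-\frac34=0$.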
When we write $\chi(\nu):=\Sigma(\nu)+2^{-1}\log2\pi+3/4$ in view of Theorem 8.7, the rate function for the LDP in Corollary 6.11 with $\beta=1$ and $\sigma=1$ is written as
$$I(\nu)=-\chi(\nu)+\frac12\int_{\mathbb R}x^2\,d\nu(x)+\frac12\log2\pi,$$
which completely resembles (0.1) with $\chi(\nu)$ in place of $H(\nu)$.
Let $U(N)$ be the unitary group of order $N$ and $\gamma_{U(N)}$ the Haar probability measure on $U(N)$. We identify $N\times N$ diagonal matrices $D$ with diagonal entries $x_1\le\cdots\le x_N$ with vectors $(x_1,\dots,x_N)\in\mathbb R_\le^N$. The idea of the next definition is the same as that of Definition 8.3: replacing discrete microstates of permutations with continuous microstates of unitary matrices, we consider unitary orbits $UDU^*$, $U\in U(N)$, of diagonal matrices $D\in\mathbb R_\le^N$.
Definition 8.8. For an $n$-tuple $(X_1,\dots,X_n)$ of self-adjoint random variables in $(M,\tau)$, and for each $R>0$, $m,N\in\mathbb N$ and $\delta>0$, we denote by $\Gamma_{\mathrm{orb},R}(X_1,\dots,X_n;N,m,\delta)$ the set of all $(U_1,\dots,U_n)\in U(N)^n$ such that
$$(U_1D_1U_1^*,\dots,U_nD_nU_n^*)\in\Gamma_R(X_1,\dots,X_n;N,m,\delta)$$
for some diagonal matrices $D_1,\dots,D_n\in\mathbb R_\le^N$. We define
$$\chi_{\mathrm{orb},R}(X_1,\dots,X_n):=\lim_{m\to\infty,\,\delta\searrow0}\limsup_{N\to\infty}\frac1{N^2}\log\gamma_{U(N)}^{\otimes n}(\Gamma_{\mathrm{orb},R}(X_1,\dots,X_n;N,m,\delta)),\tag{8.7}$$
and
$$\chi_{\mathrm{orb}}(X_1,\dots,X_n):=\sup_{R>0}\chi_{\mathrm{orb},R}(X_1,\dots,X_n).$$
Moreover, $\Gamma_{\mathrm{orb},\infty}(X_1,\dots,X_n;N,m,\delta)$ is defined by replacing $\Gamma_R(X_1,\dots,X_n;N,m,\delta)$ in the above with $\Gamma(X_1,\dots,X_n;N,m,\delta)$, without the cut-off by the parameter $R$. Then we also define $\chi_{\mathrm{orb},\infty}(X_1,\dots,X_n)$ as above by replacing $\Gamma_{\mathrm{orb},R}$ with $\Gamma_{\mathrm{orb},\infty}$. Similarly to Lemma 8.4 we have:
Lemma 8.9 ([19]). For any choice of $R\ge\max_{1\le i\le n}\|X_i\|$,
$$\chi_{\mathrm{orb},\infty}(X_1,\dots,X_n)=\chi_{\mathrm{orb},R}(X_1,\dots,X_n).$$
So we denote the common value in the above by $\chi_{\mathrm{orb}}(X_1,\dots,X_n)$ and call it the orbital free entropy of $(X_1,\dots,X_n)$. Note that $\chi_{\mathrm{orb}}(X)=0$ for any single variable $X$. As in Definition 8.3 it might be possible to consider a modified $\chi_{\mathrm{orb}}$ with $\liminf$ in place of $\limsup$ in (8.7). However, this is not very meaningful, since, unlike for the Boltzmann-Gibbs entropy (see Theorem 8.2), the regularity question of whether $\limsup$ and $\liminf$ give the same value is quite open even for the free entropy $\chi(X_1,\dots,X_n)$, as mentioned after Definition 8.6.
The next theorem gives the exact relation between $\chi_{\mathrm{orb}}$ and the usual $\chi$. It bears a quite striking resemblance to Theorem 8.5. The proof is based on Theorem 8.7; indeed, an essential point in the proof is that $\limsup$ can be replaced by $\lim$ for each single-variable $\chi(X_i)$.
Theorem 8.10 ([19]).
$$\chi(X_1,\dots,X_n)=\chi_{\mathrm{orb}}(X_1,\dots,X_n)+\sum_{i=1}^n\chi(X_i).$$
The theorem in particular gives
$$-\chi_{\mathrm{orb}}(X,Y)=-\chi(X,Y)+\chi(X)+\chi(Y)$$
for two (non-commutative) self-adjoint random variables $X,Y$ in $(M,\tau)$ with $\chi(X),\chi(Y)>-\infty$. This expression suggests that $-\chi_{\mathrm{orb}}(X,Y)$ is a kind of free probability counterpart of the classical mutual information $I(X;Y)$ of two real random variables $X,Y$ (see (8.5)). It seems that expression (8.5) was one of Voiculescu's motivations for introducing in [42] the mutual free information $i^*(A_1;\dots;A_n)$ for subalgebras $A_1,\dots,A_n$ of $M$. In [26] (see also [24]) we conjectured that, for two projections $p,q$ in $(M,\tau)$, $-\chi_{\mathrm{orb}}(p,q)$ coincides with the mutual free information $i^*(\mathbb Cp+\mathbb C(1-p);\mathbb Cq+\mathbb C(1-q))$, and gave a heuristic computation
supporting it. It could further be conjectured that
$$-\chi_{\mathrm{orb}}(X_1,\dots,X_n)=i^*(W^*(X_1);\dots;W^*(X_n))$$
holds for any $X_1,\dots,X_n$ in $(M,\tau)$, where $W^*(X_i)$ means the von Neumann subalgebra of $M$ generated by $X_i$ (and the unit $1$ of $M$); however, the problem seems quite difficult. In view of these considerations one may be tempted to use the terminology "microstate mutual free information" for $-\chi_{\mathrm{orb}}$.
Free independence of (self-adjoint) random variables (and of subalgebras) in $(M,\tau)$ is the central concept in free probability theory; we omit the details here and refer the reader to [23, 43]. In [39, 40] Voiculescu proved that if $X_1,\dots,X_n$ are freely independent, then the additivity of free entropy $\chi(X_1,\dots,X_n)=\sum_{i=1}^n\chi(X_i)$ holds, and the converse is also true, that is, $X_1,\dots,X_n$ are freely independent if this additivity holds with finite $\chi(X_i)$, $1\le i\le n$. In view of Theorem 8.10, this additivity theorem is slightly strengthened by the next theorem.
Theorem 8.11 ([19]). $\chi_{\mathrm{orb}}(X_1,\dots,X_n)=0$ if and only if $X_1,\dots,X_n$ are freely independent.
We end the section by referring to new papers on $\chi_{\mathrm{orb}}$. The paper [44] contains new approaches to the orbital free entropy, and recent developments on the question $-\chi_{\mathrm{orb}}(p,q)=i^*(\mathbb Cp+\mathbb C(1-p);\mathbb Cq+\mathbb C(1-q))$ are found in [45, 46].
Appendix A. Proof of (6.13)
This proof is not included in [23], so we give it here. Since $g'(t)=1/f(g(t))\in[\delta,\delta^{-1}]$ for all $0\le t\le1$, it follows that
$$\log\delta+\log(t-s)\le\log(g(t)-g(s))\le-\log\delta+\log(t-s),\qquad0\le s<t\le1.$$
This shows that $\log(g(t)-g(s))$ is integrable on $0\le s<t\le 1$. For any $0<\delta<1$, since $\log(g(t)-g(s)+\delta)$ is continuous (hence bounded) on $0\le s\le t\le 1$, it is obvious that
$$\limsup_{n\to\infty}\frac{1}{n^2}\sum_{i<j}\log\bigl(a_j^{(n)}-b_i^{(n)}\bigr) \le \lim_{n\to\infty}\frac{1}{n^2}\sum_{i<j}\log\bigl(a_j^{(n)}-b_i^{(n)}+\delta\bigr) = \iint_{0\le s<t\le 1}\log(g(t)-g(s)+\delta)\,ds\,dt.$$
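Moreover, the monotone convergence theorem gives
$$\lim_{\delta\searrow 0}\iint_{0\le s<t\le 1}\log(g(t)-g(s)+\delta)\,ds\,dt = \iint_{0\le s<t\le 1}\log(g(t)-g(s))\,ds\,dt.$$
Therefore,
$$\limsup_{n\to\infty}\frac{1}{n^2}\sum_{i<j}\log\bigl(a_j^{(n)}-b_i^{(n)}\bigr) \le \iint_{0\le s<t\le 1}\log(g(t)-g(s))\,ds\,dt.$$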
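Next, by the Lebesgue dominated convergence theorem, for every $\varepsilon>0$ there exists $\delta_0>0$ such that if $0<\delta<\delta_0$ then
$$\iint_{0\le s<t\le 1,\ t-s<\delta}\log(g(t)-g(s))\,ds\,dt > -\varepsilon$$
and $g(t)-g(s)<1$ for all $0\le s<t\le 1$ with $t-s<\delta$. Choose $\delta\in(0,\delta_0/2)$, and for each $n\in\mathbb{N}$ let $m = m(n) := [n\delta]$. With $A_{ij} := \bigl(\frac{i-1}{n},\frac{i}{n}\bigr]\times\bigl(\frac{j-1}{n},\frac{j}{n}\bigr]$ we have
$$\frac{1}{n^2}\sum_{i<j,\ j-i\ge m}\log\bigl(a_j^{(n)}-b_i^{(n)}\bigr) = \iint\sum_{i<j,\ j-i\ge m}1_{A_{ij}}(s,t)\log\bigl(a_j^{(n)}-b_i^{(n)}\bigr)\,ds\,dt$$
and
$$\lim_{n\to\infty}\sum_{i<j,\ j-i\ge m}1_{A_{ij}}(s,t)\log\bigl(a_j^{(n)}-b_i^{(n)}\bigr) = \log(g(t)-g(s)) \quad\text{for a.e. } 0\le s<t\le 1,\ t-s\ge\delta.$$
Moreover, since $m/n\to\delta$ as $n\to\infty$, the functions $\sum_{i<j,\ j-i\ge m}1_{A_{ij}}(s,t)\log\bigl(a_j^{(n)}-b_i^{(n)}\bigr)$ are uniformly bounded in $n\in\mathbb{N}$. Hence the Lebesgue bounded convergence theorem gives
$$\lim_{n\to\infty}\frac{1}{n^2}\sum_{i<j,\ j-i\ge m}\log\bigl(a_j^{(n)}-b_i^{(n)}\bigr) = \iint_{0\le s<t\le 1,\ t-s\ge\delta}\log(g(t)-g(s))\,ds\,dt. \tag{A.1}$$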
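Now define
$$B_{ij} := \Bigl\{(s,t) : \frac{i}{n} < s < \frac{i+\frac12}{n},\ \frac{j-1}{n} < t < \frac{j-\frac12}{n},\ t-s > \frac{j-i-1}{n}\Bigr\}$$
(a right triangle with $\bigl(\frac{i}{n},\frac{j-\frac12}{n}\bigr)$ as the vertex of the right angle). Since
$$g(t)-g(s) \le g\Bigl(\frac{j-\frac12}{n}\Bigr) - g\Bigl(\frac{i}{n}\Bigr) = a_j^{(n)}-b_i^{(n)} \quad\text{if } (s,t)\in B_{ij},$$
we have
$$\iint_{B_{ij}}\log(g(t)-g(s))\,ds\,dt \le |B_{ij}|\log\bigl(a_j^{(n)}-b_i^{(n)}\bigr) = \frac{1}{8n^2}\log\bigl(a_j^{(n)}-b_i^{(n)}\bigr),$$
noting that the area of $B_{ij}$ is $|B_{ij}| = |A_{ij}|/8 = 1/(8n^2)$. Hence, since the triangles $B_{ij}$ are mutually disjoint and $(m-\frac12)/n < \delta_0$ for all sufficiently large $n$, we have
$$\frac{1}{n^2}\sum_{i<j,\ j-i<m}\log\bigl(a_j^{(n)}-b_i^{(n)}\bigr) \ge 8\sum_{i<j,\ j-i<m}\iint_{B_{ij}}\log(g(t)-g(s))\,ds\,dt \ge 8\iint_{0\le s<t\le 1,\ t-s<(m-\frac12)/n}\log(g(t)-g(s))\,ds\,dt \ge -8\varepsilon. \tag{A.2}$$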
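From (A.1) and (A.2), and since $\log(g(t)-g(s))<0$ whenever $t-s<\delta$,
$$\liminf_{n\to\infty}\frac{1}{n^2}\sum_{i<j}\log\bigl(a_j^{(n)}-b_i^{(n)}\bigr) \ge \iint_{0\le s<t\le 1}\log(g(t)-g(s))\,ds\,dt - 8\varepsilon.$$
Since $\varepsilon>0$ is arbitrary,
$$\liminf_{n\to\infty}\frac{1}{n^2}\sum_{i<j}\log\bigl(a_j^{(n)}-b_i^{(n)}\bigr) \ge \iint_{0\le s<t\le 1}\log(g(t)-g(s))\,ds\,dt.$$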
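The limit established above can be checked numerically in the simplest case. The sketch below is a minimal illustration under an assumption: as the identity $g((j-\frac12)/n) - g(i/n) = a_j^{(n)} - b_i^{(n)}$ in the proof suggests, it takes $a_j^{(n)} = g((j-\frac12)/n)$ and $b_i^{(n)} = g(i/n)$ with $1\le i<j\le n$ (the precise definitions are those of (6.13)). The helper name `scaled_log_sum` is hypothetical. For $g(t)=t$ the limit is $\iint_{0\le s<t\le 1}\log(t-s)\,ds\,dt = \int_0^1(1-u)\log u\,du = -3/4$, and for $g(t)=2t$ it is shifted by $\frac12\log 2$.

```python
import numpy as np

def scaled_log_sum(g, n):
    """Return (1/n^2) * sum_{1 <= i < j <= n} log(a_j - b_i).

    ASSUMPTION (illustration only): a_j = g((j - 1/2)/n), b_i = g(i/n),
    where g is increasing with g' bounded above and below.
    """
    k = np.arange(1, n + 1)
    a = g((k - 0.5) / n)              # a_j for j = 1, ..., n
    b = g(k / n)                      # b_i for i = 1, ..., n
    diff = a[None, :] - b[:, None]    # diff[i-1, j-1] = a_j - b_i
    upper = np.triu(np.ones((n, n), dtype=bool), k=1)  # keep only i < j
    return np.log(diff[upper]).sum() / n**2

# g(t) = t: the values should approach -3/4 as n grows (error roughly O(1/n)).
for n in (100, 400, 800):
    print(n, scaled_log_sum(lambda t: t, n))
```

Only pairs with $i<j$ are extracted before taking logarithms, so no logarithm of a non-positive number is ever evaluated.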
Acknowledgments
This survey article grew out of the author's intensive graduate courses on large deviations at Chungbuk National University in 2008 and at Nagoya University in 2012. The author thanks
Professor Un Cig Ji and Professor Akihito Hora, who invited him to give those courses. The author is also grateful to Dr. Milán Mosonyi for discussions on the subject and for his permission to use his notes [30].
References
[1] H. Araki, On uniqueness of KMS states of one-dimensional quantum lattice systems, Comm. Math. Phys. 44 (1975), 1–7. (related to §7)
[2] G. Ben Arous and A. Guionnet, Large deviation for Wigner's law and Voiculescu's noncommutative entropy, Probab. Theory Related Fields 108 (1997), 517–542. (§6)
[3] C. Berg, J. P. R. Christensen and P. Ressel, Harmonic Analysis on Semigroups. Theory of Positive Definite and Related Functions, Springer, New York, 1984. (§6)
[4] D. Bessis, P. Moussa and M. Villani, Monotonic converging variational approximations to the functional integrals in quantum statistical mechanics, J. Math. Phys. 16 (1975), 2318–2325. (§7)
[5] Ph. Biane and R. Speicher, Free diffusions, free entropy and free Fisher information, Ann. Inst. H. Poincaré Probab. Statist. 37 (2001), 581–606. (§6)
[6] P. Billingsley, Probability and Measure, John Wiley, New York, 1995. (§5)
[7] P. Billingsley, Convergence of Probability Measures, John Wiley, New York, 1999. (§5)
[8] I. Bjelakovic, J.-D. Deuschel, T. Krüger, R. Seiler, Ra. Siegmund-Schultze and A. Szkola, A quantum version of Sanov's theorem, Comm. Math. Phys. 260 (2005), 659–671. (§7)
[9] I. Bjelakovic, J.-D. Deuschel, T. Krüger, R. Seiler, Ra. Siegmund-Schultze and A. Szkola, Typical support and Sanov large deviations of correlated states, Comm. Math. Phys. 279 (2008), 559–584. (§7)
[10] O. Bratteli and D. W. Robinson, Operator Algebras and Quantum Statistical Mechanics 2, 2nd edition, Springer, 1997. (§7)
[11] H. Cramér, Sur un nouveau théorème-limite de la théorie des probabilités, Actualités Scientifiques et Industrielles 736 (1938), 5–23, in Colloque consacré à la théorie des probabilités, Vol. 3, Hermann, Paris. (§2)
[12] W. De Roeck, C. Maes and K. Netočný, A note on the non-commutative Laplace–Varadhan integral lemma, Rev. Math. Phys. 22 (2010), 839–858. (§7)
[13] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications, 2nd edition, Springer, New York, 1998. (§1–5)
[14] J. D. Deuschel and D. W. Stroock, Large Deviations, Pure and Applied Mathematics, Vol. 137, Academic Press, Boston, MA, 1989. (§1–5)
[15] R. S. Ellis, Large deviations for a general class of random variables, Ann. Probab. 12 (1984), 1–12. (§3)
[16] R. S. Ellis, Entropy, Large Deviations and Statistical Mechanics, Springer, New York–Berlin, 1985. (§1–5)
[17] M. Fannes, B. Nachtergaele and R. F. Werner, Finitely correlated states on quantum spin chains, Comm. Math. Phys. 144 (1992), 443–490. (§7)
[18] J. Gärtner, On large deviations from the invariant measure, Theory Probab. Appl. 22 (1977), 24–39. (§3)
[19] F. Hiai, T. Miyamoto and Y. Ueda, Orbital approach to microstate free entropy, Int. J. Math. 20 (2009), 227–273. (§8)
[20] F. Hiai, M. Mizuo and D. Petz, Free relative entropy for measures and a corresponding perturbation theory, J. Math. Soc. Japan 54 (2002), 679–718. (§6)
[21] F. Hiai, M. Mosonyi and T. Ogawa, Large deviations and Chernoff bound for certain correlated states on a spin chain, J. Math. Phys. 48 (2007), 123301, 1–19. (§7)
[22] F. Hiai, M. Mosonyi, H. Ohno and D. Petz, Free energy density for mean field perturbation of states of a one-dimensional spin chain, Rev. Math. Phys. 20 (2008), 335–365. (§7)
[23] F. Hiai and D. Petz, The Semicircle Law, Free Random Variables and Entropy, Mathematical Surveys and Monographs, Vol. 77, Amer. Math. Soc., Providence, 2000. (§6, 8)
[24] F. Hiai and D. Petz, Large deviations for functions of two random projection matrices, Acta Sci. Math. (Szeged) 72 (2006), 581–609. (§6, 8)
[25] F. Hiai and D. Petz, A new approach to mutual information, in Noncommutative Harmonic Analysis with Applications to Probability, M. Bożejko et al. (eds.), Banach Center Publications, Vol. 78, 2007, pp. 151–164. (§8)
[26] F. Hiai and Y. Ueda, A log-Sobolev type inequality for free entropy of two projections, Ann. Inst. H. Poincaré Probab. Statist. 45 (2009), 239–249. (§8)
[27] M. Lenci and L. Rey-Bellet, Large deviations in quantum lattice systems: one-phase region, J. Stat. Phys. 119 (2005), 715–746. (§7)
[28] E. H. Lieb and R. Seiringer, Equivalent forms of the Bessis–Moussa–Villani conjecture, J. Stat. Phys. 115 (2004), 185–190. (§7)
[29] M. L. Mehta, Random Matrices, 2nd edition, Academic Press, Boston, 1991. (§6)
[30] M. Mosonyi, Large deviations, Notes, 2008. (§3)
[31] K. Netočný and F. Redig, Large deviations for quantum spin systems, J. Stat. Phys. 117 (2004), 521–547. (§7)
[32] Y. Ogata, Large deviations in quantum spin chains, Comm. Math. Phys. 296 (2010), 35–68. (§7)
[33] D. Petz, G. A. Raggio and A. Verbeure, Asymptotics of Varadhan-type and the Gibbs variational principle, Comm. Math. Phys. 121 (1989), 271–282. (§7)
[34] R. T. Rockafellar, Convex Analysis, Princeton Univ. Press, Princeton, 1970. (§3)
[35] E. B. Saff and V. Totik, Logarithmic Potentials with External Fields, Springer, Berlin–Heidelberg–New York, 1997. (§6)
[36] I. N. Sanov, On the probability of large deviations of random magnitudes (Russian), Mat. Sb. N. S. 42 (84) (1957), 11–44; On the probability of large deviations of random variables, in Select. Transl. Math. Statist. and Probability, Vol. 1, Inst. Math. Statist. and Amer. Math. Soc., 1961, pp. 213–244. (§5)
[37] H. R. Stahl, Proof of the BMV conjecture, Preprint, arXiv:1107.4875v3. (§7)
[38] S. R. S. Varadhan, Asymptotic probabilities and differential equations, Comm. Pure Appl. Math. 19 (1966), 261–286. (§1)
[39] D. Voiculescu, The analogues of entropy and of Fisher's information measure in free probability theory, II, Invent. Math. 118 (1994), 411–440. (§8)
[40] D. Voiculescu, The analogues of entropy and of Fisher's information measure in free probability theory, IV: Maximum entropy and freeness, in Free Probability Theory, D. V. Voiculescu (ed.), Fields Inst. Commun. 12, Amer. Math. Soc., 1997, pp. 293–302. (§8)
[41] D. Voiculescu, The analogues of entropy and of Fisher's information measure in free probability theory, V, Noncommutative Hilbert transforms, Invent. Math. 132 (1998), 189–227. (§8)
[42] D. Voiculescu, The analogues of entropy and of Fisher's information measure in free probability theory VI: Liberation and mutual free information, Adv. Math. 146 (1999), 101–166. (§8)
[43] D. V. Voiculescu, K. J. Dykema and A. Nica, Free Random Variables, CRM Monograph Series, Vol. 1, Amer. Math. Soc., Providence, RI, 1992. (§8)
[44] Ph. Biane and Y. Dabrowski, Concavification of free entropy, Adv. Math. 234 (2013), 667–696. (§8)
[45] B. Collins and T. Kemp, Liberation of projections, Preprint, arXiv:1211.6037. (§8)
[46] M. Izumi and Y. Ueda, Remarks on free mutual information and orbital free entropy, Preprint, arXiv:1306.5372. (§8)
CHAPTER 4 QUANTUM WHITE NOISE CALCULUS AND APPLICATIONS
UN CIG JI and NOBUAKI OBATA
1. Introduction
The so-called "white noise calculus" is a well-known area of stochastic analysis. Hida's [30] basic motivation was to construct a suitable framework in which white noise, i.e., the time derivative of Brownian motion, is realized as a stochastic process with an explicit time parameter. Since then a large number of works related to white noise calculus have been published, covering many topics with various backgrounds; see Kuo [66] for a collection of references up to 2002. The diversity of white noise calculus accelerated once it encountered quantum probability: a quantum aspect was explicitly introduced into white noise calculus, and a white noise approach to quantum stochastic analysis was launched by Obata [77–79]. Since then "quantum white noise calculus" has developed along with normal-ordered white noise differential equations as an extension of quantum stochastic differential equations of Itô type [17–19, 79, 80], quantum stochastic integrals of Itô type and the quantum Itô formula [36, 37], their extensions to higher powers of quantum white noise [16], integral representations of quantum martingales [41], quantum stochastic gradients and quantum Hitsuda–Skorohod integrals [51], and quantum white noise derivatives [46, 48] with their applications to quantum martingales [49] and to the implementation problem of the canonical commutation relations [52]. These developments are based on the operator theory on white noise functions [76], established in the early 1990s. A survey of quantum white noise calculus by the same authors [42] covers the results obtained up to 2002. The main purpose of this paper is to provide concise access to "quantum white noise calculus" and to present some of the recent achievements.
The bulk of quantum white noise calculus is based on a particular pair of Gelfand triples. Let $T$ be a topological space with a Borel measure $dt$. For example, $T$ is taken to be a time interval for classical or quantum stochastic analysis, a Euclidean domain or a more general manifold in applications to quantum or random fields, and so forth. We first consider a Gelfand triple:
$$E \subset H = L^2(T, dt; \mathbb{C}) \subset E^*, \tag{1.1}$$
where $E$ is a countable Hilbert nuclear space densely and continuously embedded in $H$. The (Boson) Fock space $\Gamma(H)$ over $H$ is obtained by second quantization. Generalizing this procedure, we construct the second Gelfand triple:
$$\mathcal{W} \subset \Gamma(H) \cong L^2(E_{\mathbb{R}}^*, \mu; \mathbb{C}) \subset \mathcal{W}^*, \tag{1.2}$$
where $(E_{\mathbb{R}}^*, \mu)$ is the Gaussian space and $\Gamma(H) \cong L^2(E_{\mathbb{R}}^*, \mu; \mathbb{C})$ is known as the Wiener–Itô–Segal isomorphism. These Hilbert spaces are fundamental. For example, a Brownian motion $\{B_t\}$ is defined as a family of random variables in $L^2(E_{\mathbb{R}}^*, \mu; \mathbb{C})$, whereas its time derivative, called white noise $\{W_t\}$, lies outside this space. The Gelfand triple (1.2) is designed to realize $\{W_t\}$ in $\mathcal{W}^*$.
The creation and annihilation operators acting in $\Gamma(H)$ are essential in quantum theory. It is widely accepted that the annihilation operator $a_t$ and the creation operator $a_t^*$ at a point $t \in T$ are defined as distributions with values in unbounded operators in the Fock space $\Gamma(H)$. Our approach differs in this respect. Exploiting the Gelfand triple (1.2), we formulate such singular "operators" as usual continuous operators in $\mathcal{L}(\mathcal{W}, \mathcal{W}^*)$. In particular, the quantum white noises $a_t$ and $a_t^*$ become continuous operators in $\mathcal{L}(\mathcal{W}, \mathcal{W})$ and in $\mathcal{L}(\mathcal{W}^*, \mathcal{W}^*)$, respectively. In short, the Hilbert spaces $\Gamma(H) \cong L^2(E_{\mathbb{R}}^*, \mu; \mathbb{C})$ are not large enough to include singular but important objects in stochastic analysis and quantum theory, and the pair of Gelfand triples (1.1), (1.2) provides a useful framework to overcome such difficulties. Our standpoint shares a common spirit with the Schwartz distribution theory, where the delta function is justified by duality. For related approaches based on distribution theory, see also Accardi–Lu–Volovich [1], Berezansky–Kondratiev [7], Bogolubov–Logunov–Todorov [9], Bohm–Gadella [10], Krée–Rączka [57], Kristensen–Mejlbo–Thue Poulsen [58].
How to construct a pair of Gelfand triples is a non-trivial question. In this paper, we adopt a standard CKS-space constructed on the basis of the axiomatization presented by Obata [76] for (1.1) and the CKS-space introduced by Cochran–Kuo–Sengupta [20] for (1.2). Elements of $\mathcal{W}$ and $\mathcal{W}^*$ are called test white noise functions and generalized white noise functions (or white noise distributions), respectively. A continuous operator from $\mathcal{W}$ into
$\mathcal{W}^*$ is generally called a white noise operator. The aim of "quantum white noise calculus" is a systematic study of white noise functions and operators in terms of the quantum white noise $\{a_t, a_t^*;\ t \in T\}$.
The paper is organized as follows. In Section 2 we collect some basic notions and notations common in Gaussian analysis. In particular, we review the Wiener–Itô–Segal isomorphism and clarify some conditions to be satisfied by an underlying space. In Section 3 we construct a Gelfand triple (1.2) following the idea of Cochran–Kuo–Sengupta [20], generalizing the Hida–Kubo–Takenaka space [60] and the Kondratiev–Streit space [55]. We also present another construction developed by Gannoun–Hachaichi–Ouerdiane–Rezgui [23]. In fact, the resulting classes of white noise distributions coincide, as was shown by Asai–Kubo–Kuo [2]. In Section 4 we review the white noise operator theory. The white noise operators $\mathcal{L}(\mathcal{W}, \mathcal{W}^*)$ cover a wide class of Fock space operators and have many applications. We introduce the quantum white noises $\{a_t, a_t^*;\ t \in T\}$ and their "polynomials", called integral kernel operators, whose informal integral expression is given by
$$\Xi_{l,m}(\kappa) = \int_{T^{l+m}} \kappa(s_1, \dots, s_l, t_1, \dots, t_m)\, a_{s_1}^* \cdots a_{s_l}^* a_{t_1} \cdots a_{t_m}\, ds_1 \cdots ds_l\, dt_1 \cdots dt_m.$$
An operator of this form was first brought into white noise calculus by Hida–Obata–Saitô [33] for the study of infinite dimensional Laplacians. It is noteworthy that every white noise operator admits an infinite series expansion in terms of integral kernel operators (Fock expansion theorem). This result was first proved by Obata [75, 76] with the help of operator symbols and has since been simplified and generalized; see e.g., [13, 19, 43, 45, 53]. Related concepts in quantum theory have been extensively studied; see e.g., Berezin [8], Bogolubov–Logunov–Todorov [9], Glimm–Jaffe [25, 26], Haag [29], Krée–Rączka [57].
In Section 5 we discuss quantum extensions of the stochastic gradient $\nabla$, also known as the Malliavin gradient. In fact, using the classical stochastic gradient we define three quantum stochastic gradients: the annihilation gradient $\nabla^-$, the creation gradient $\nabla^+$ and the conservation gradient $\nabla^0$, acting on a suitable space of white noise operators. These correspond to the three basic quantum stochastic processes:
$$A_t = \int_0^t a_s\, ds, \qquad A_t^* = \int_0^t a_s^*\, ds, \qquad \Lambda_t = \int_0^t a_s^* a_s\, ds,$$
which are called the annihilation process, the creation process and the conservation (number) process, respectively. In Section 6 we discuss the quantum Hitsuda–Skorohod integrals. It is well known that the adjoint of the stochastic gradient defines the Hitsuda–Skorohod integral, which generalizes the stochastic integral of Itô type [65, 71, 73]. As a quantum counterpart, we define three quantum stochastic integrals, called the annihilation integral, the creation integral and the conservation integral, by the adjoint actions of the three quantum stochastic gradients. Quantum stochastic integrals for non-adapted integrands were studied by Belavkin [4] and Lindsay [70] with different methods. Their integrals coincide with ours when the integrands are taken from a common domain. The results in Sections 5 and 6 are based on Ji–Obata [51]. Finally, in Section 7 we discuss the new concept of quantum white noise derivatives and their applications. The Fock expansion theorem means that a white noise operator is a function of quantum white noise:
$$\Xi = \sum_{l,m=0}^{\infty} \Xi_{l,m}(\kappa_{l,m}) = \Xi(a_s^*, a_t;\ s, t \in T),$$
where the quantum white noise plays the role of a coordinate system. It is therefore natural to consider derivatives with respect to $a_t$ and $a_t^*$, in the same spirit as variational calculus. However, since quantum white noise is highly singular, we do not define $\delta\Xi/\delta a_t$ and $\delta\Xi/\delta a_t^*$ directly; instead we formulate the annihilation derivative and the creation derivative by means of annihilation and creation operators with smooth kernels. It turns out that these derivatives are also derivations with respect to the Wick product. We apply this property to reduce the implementation problem for the canonical commutation relations to a new type of differential equation for white noise operators. This section is mostly based on the recent results of Ji–Obata [52].
Acknowledgments. The present work was supported by the JSPS-NRF Cooperation Program "Non-Commutative Harmonic Analysis with Applications to Real World Complex Phenomena" (2010–2012).
General Notations. Let $X, Y$ be locally convex spaces over the real or complex field. We denote by $\mathcal{L}(X, Y)$ the space of continuous linear operators from $X$ into $Y$, equipped with the topology of uniform convergence on every bounded subset. If $Y$ is one-dimensional (the scalar field), we write $X^* = \mathcal{L}(X, Y)$, which is called the (strong) dual space of $X$.
In this paper a complex linear space appears as $X = X_{\mathbb{R}} + iX_{\mathbb{R}}$ with a real space $X_{\mathbb{R}}$. In this case the complex conjugate is defined naturally by $\overline{\xi + i\eta} = \xi - i\eta$ for $\xi, \eta \in X_{\mathbb{R}}$. For locally convex spaces $X, Y$ we denote by $X \otimes Y$ the completed $\pi$-tensor product. If $X$ is a nuclear space and $Y$ is a Fréchet space, then by the nuclear kernel theorem [91, 93] we have the topological isomorphisms:
$$X \otimes Y \cong \mathcal{L}(Y^*, X), \qquad Y \otimes X^* \cong \mathcal{L}(X, Y), \qquad Y^* \otimes X^* \cong (Y \otimes X)^* \cong \mathcal{B}(X, Y) \cong \mathcal{L}(X, Y^*),$$
where $\mathcal{B}(X, Y)$ stands for the space of continuous bilinear forms on $X \times Y$. If $H_1, H_2$ are Hilbert spaces, $H_1 \otimes H_2$ denotes the Hilbert space tensor product when there is no danger of confusion. Note that $H_1 \otimes H_2 \cong \mathcal{L}_2(H_2, H_1)$, where $\mathcal{L}_2(H_2, H_1)$ is the space of operators of Hilbert–Schmidt type. For a locally convex space $X$ and an integer $n \ge 1$ let $X^{\otimes n} = X \otimes \cdots \otimes X$ ($n$ times) denote the $n$-fold tensor power of $X$. For $\xi_1, \dots, \xi_n \in X$ we define the symmetrized tensor product by
$$\xi_1 \hat{\otimes} \cdots \hat{\otimes} \xi_n = \frac{1}{n!} \sum_{\sigma \in S(n)} \xi_{\sigma(1)} \otimes \cdots \otimes \xi_{\sigma(n)},$$
where $S(n)$ is the group of permutations of $\{1, 2, \dots, n\}$. The $n$-fold symmetric tensor power $X^{\hat{\otimes} n}$ is the closed subspace of $X^{\otimes n}$ spanned by $\xi^{\otimes n}$, where $\xi$ runs over $X$. Here the polarization identity is useful:
$$\xi_1 \hat{\otimes} \cdots \hat{\otimes} \xi_n = \frac{1}{2^n n!} \sum_{\epsilon_1, \dots, \epsilon_n = \pm 1} \epsilon_1 \cdots \epsilon_n (\epsilon_1 \xi_1 + \cdots + \epsilon_n \xi_n)^{\otimes n},$$
where the sum is taken over all possible combinations of $\epsilon_i \in \{\pm 1\}$.
2. Elements of Gaussian Analysis
2.1. Standard construction of countable Hilbert spaces
Let $H_{\mathbb{R}}$ be a real Hilbert space with an inner product $\langle \cdot, \cdot \rangle$, and $H = H_{\mathbb{R}} + iH_{\mathbb{R}}$ its complexification. Since the inner product of $H_{\mathbb{R}}$ is an $\mathbb{R}$-bilinear form on $H_{\mathbb{R}} \times H_{\mathbb{R}}$, it is naturally extended to a $\mathbb{C}$-bilinear form
on $H \times H$, denoted by the same symbol. Then the inner product of $H$ is defined by
$$\langle \xi | \eta \rangle = \langle \bar{\xi}, \eta \rangle, \qquad \xi, \eta \in H.$$
Obviously, $\langle \xi | \eta \rangle = \langle \xi, \eta \rangle$ for $\xi, \eta \in H_{\mathbb{R}}$. The norm of $H$ is defined by $|\xi|_0 = \langle \xi | \xi \rangle^{1/2}$, $\xi \in H$. The norm of $H_{\mathbb{R}}$ is defined in a similar manner and is denoted by the same symbol.
A chain of Hilbert spaces rigging $H$ is constructed by means of a positive selfadjoint operator in a standard manner. Let $A$ be a selfadjoint operator with dense domain $\mathrm{Dom}(A) \subset H$, which is positive, i.e., $\inf \mathrm{Spec}(A) > 0$, and real, i.e., $A$ maps $\mathrm{Dom}(A) \cap H_{\mathbb{R}}$ into $H_{\mathbb{R}}$. For each $p \ge 0$, the dense subspace $\mathrm{Dom}(A^p) \subset H$ becomes a Hilbert space equipped with the norm
$$|\xi|_p = |A^p \xi|_0, \qquad \xi \in \mathrm{Dom}(A^p).$$
This Hilbert space is denoted by $E_p$. Note that $A^{-1}$ becomes a bounded operator on $H$ and its operator norm is given by $\rho \equiv \|A^{-1}\|_{\mathrm{OP}} = \{\inf \mathrm{Spec}(A)\}^{-1} < \infty$. For $p \ge 0$ we define a norm on $H$ by
$$|\xi|_{-p} = |A^{-p} \xi|_0, \qquad \xi \in H.$$
Let $E_{-p}$ be the completion of $H$ with respect to this norm. Then we have a chain of Hilbert spaces:
$$\cdots \subset E_p \subset \cdots \subset E_0 = H \subset \cdots \subset E_{-p} \subset \cdots, \tag{2.1}$$
where each inclusion is continuous and has a dense image, and
$$|\xi|_p \le \rho^q |\xi|_{p+q}, \qquad \xi \in E_{p+q}, \quad p \in \mathbb{R}, \quad q \ge 0. \tag{2.2}$$
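A finite-dimensional toy model makes the chain (2.1) and the inequality (2.2) concrete. This is a minimal sketch under an illustrative assumption not fixed by the text: $A$ is diagonal in an orthonormal basis $e_0,\dots,e_{N-1}$ with eigenvalues $\lambda_k = 2k+2$, so that $|\xi|_p^2 = \sum_k \lambda_k^{2p}|\xi_k|^2$ and $\rho = 1/\lambda_0 = 1/2$. The helper `chain_norm` is hypothetical. The block also checks that $A^{p-q}$ maps $E_p$ isometrically onto $E_q$.

```python
import numpy as np

# Illustrative assumption: A is diagonal with eigenvalues lambda_k = 2k + 2
# on a truncated basis, so inf Spec(A) = 2 and rho = ||A^{-1}|| = 1/2.
N = 50
lam = 2.0 * np.arange(N) + 2.0
rho = 1.0 / lam.min()

def chain_norm(xi, p):
    """|xi|_p = |A^p xi|_0 for the diagonal A above (p may be negative)."""
    return np.linalg.norm(lam**p * xi)

rng = np.random.default_rng(0)
xi = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# Inequality (2.2): |xi|_p <= rho^q |xi|_{p+q} for every real p and q >= 0.
for p in (-2.0, -0.5, 0.0, 1.0, 3.0):
    for q in (0.0, 0.5, 1.0, 2.0):
        assert chain_norm(xi, p) <= rho**q * chain_norm(xi, p + q) * (1 + 1e-12)

# A^{p-q} maps E_p isometrically onto E_q: |A^{p-q} xi|_q = |xi|_p.
p, q = 1.5, -0.5
assert np.isclose(chain_norm(lam**(p - q) * xi, q), chain_norm(xi, p))
print("chain inequalities hold")
```

The inequality only uses $|A^{-q}\eta|_0 \le \rho^q|\eta|_0$ with $\eta = A^{p+q}\xi$, which is why the larger index gives the stronger norm.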
Moreover, for any $p, q \in \mathbb{R}$ the operator $A^{p-q}$ is naturally extended to an isometry from $E_p$ onto $E_q$.
One of the noteworthy features of the chain of Hilbert spaces (2.1) is found in their duality relations. Let $p \ge 0$ and set
$$f_x(\xi) = \langle A^{-p} x, A^p \xi \rangle, \qquad x \in E_{-p}, \quad \xi \in E_p,$$
where the right-hand side is the canonical $\mathbb{C}$-bilinear form on $H \times H$. We see from
$$|f_x(\xi)| = |\langle A^{-p} x, A^p \xi \rangle| \le |A^{-p} x|_0 |A^p \xi|_0 = |x|_{-p} |\xi|_p$$
that $f_x$ is a continuous linear functional on $E_p$, i.e., $f_x \in (E_p)^*$. Then it is easily verified that the map $x \mapsto f_x$ is an isometric isomorphism from $E_{-p}$ onto $(E_p)^*$. We always identify $E_{-p}$ with $(E_p)^*$ in this manner. Moreover, if $x \in H \subset E_{-p}$, since $A$ is a real operator we have
$$f_x(\xi) = \langle A^{-p} x, A^p \xi \rangle = \langle x, A^{-p} A^p \xi \rangle = \langle x, \xi \rangle, \qquad \xi \in E_p \subset H.$$
Namely, the $\mathbb{C}$-bilinear form $(x, \xi) \mapsto f_x(\xi)$, $(x, \xi) \in E_{-p} \times E_p$, and the canonical one on $H \times H$ coincide on a common domain. Hence without confusion we may write
$$f_x(\xi) = \langle x, \xi \rangle, \qquad x \in E_{-p}, \quad \xi \in E_p.$$
With the above argument in mind, we say that $E_p$ and $E_{-p}$ are mutually dual spaces with respect to $H$. From (2.1) we obtain the projective limit space:
$$E = \operatorname*{proj\,lim}_{p \to \infty} E_p = \bigcap_{p \ge 0} E_p$$
equipped with the Hilbertian norms $|\cdot|_p$. Since the norms are linearly ordered as in (2.2), we may choose a countable set of defining norms, so $E$ becomes a countable Hilbert space. Let $f$ be a continuous linear functional on $E$. Then by definition there exist $p \ge 0$ and $C \ge 0$ such that
$$|f(\xi)| \le C|\xi|_p, \qquad \xi \in E.$$
Hence $f$ extends uniquely to a continuous linear functional on $E_p$, so there exists $x \in E_{-p}$ such that
$$f(\xi) = \langle x, \xi \rangle, \qquad \xi \in E. \tag{2.3}$$
Conversely, every $x \in E_{-p}$ with some $p \ge 0$ gives rise to a continuous linear functional on $E$. Thus, $E^*$ and $\bigcup_{p \ge 0} E_{-p}$ are identified. Moreover, it is known that the strong dual topology of $E^*$ and the inductive limit topology coincide. In this sense we write
$$E^* = \operatorname*{ind\,lim}_{p \to \infty} E_{-p} = \bigcup_{p \ge 0} E_{-p}.$$
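The canonical bilinear form on $E^* \times E$ is denoted by the same symbol $\langle \cdot, \cdot \rangle$. Consequently, the chain of Hilbert spaces (2.1) is equipped with the limit spaces on both sides:
$$E \subset \cdots \subset E_p \subset \cdots \subset E_0 = H \subset \cdots \subset E_{-p} \subset \cdots \subset E^*. \tag{2.4}$$
The real parts satisfy similar inclusion relations:
$$E_{\mathbb{R}} \subset \cdots \subset E_{\mathbb{R},p} \subset \cdots \subset E_{\mathbb{R},0} = H_{\mathbb{R}} \subset \cdots \subset E_{\mathbb{R},-p} \subset \cdots \subset E_{\mathbb{R}}^*. \tag{2.5}$$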
Lemma 2.1. If $A^{-r}$ is of Hilbert–Schmidt type for some $r > 0$, then $E$ becomes a nuclear space.
Proof. By the assumptions on $A$ ($A^{-r}$ being a compact selfadjoint operator) there exists a complete orthonormal basis $\{e_k;\ k = 0, 1, 2, \dots\}$ for $H$ such that
$$A e_k = \lambda_k e_k, \qquad \lambda_k > 0, \qquad \|A^{-r}\|_{HS}^2 = \sum_{k=0}^{\infty} \lambda_k^{-2r} < \infty.$$
It is easy to see that $\{A^{-(p+r)} e_k;\ k = 0, 1, 2, \dots\}$ is a complete orthonormal basis for $E_{p+r}$. Since
$$\sum_{k=0}^{\infty} |A^{-p-r} e_k|_p^2 = \sum_{k=0}^{\infty} |A^{-r} e_k|_0^2 = \sum_{k=0}^{\infty} |\lambda_k^{-r} e_k|_0^2 = \sum_{k=0}^{\infty} \lambda_k^{-2r} = \|A^{-r}\|_{HS}^2 < \infty,$$
the natural injection $E_{p+r} \to E_p$ is of Hilbert–Schmidt type. This is valid for all $p \ge 0$, so $E$ is a nuclear space.
From (2.4) and (2.5) we obtain complex and real Gelfand triples:
$$E \subset H \subset E^*, \qquad E_{\mathbb{R}} \subset H_{\mathbb{R}} \subset E_{\mathbb{R}}^*,$$
where $H$ is a Hilbert space and $E$ is a countable Hilbert nuclear space densely and continuously embedded in $H$. A Gelfand triple is also called a nuclear rigging. For further properties of Gelfand triples, see e.g., Gelfand–Vilenkin [24].
2.2. Gaussian space
The real Gelfand triple:
$$E_{\mathbb{R}} \subset H_{\mathbb{R}} \subset E_{\mathbb{R}}^*,$$
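obtained in the previous subsection is fundamental for Gaussian analysis. Let $\mathcal{F}(E_{\mathbb{R}}^*)$ be the smallest $\sigma$-field such that the function $x \mapsto \langle x, \xi \rangle$, $x \in E_{\mathbb{R}}^*$, is measurable for all $\xi \in E_{\mathbb{R}}$. It follows from the celebrated Bochner–Minlos theorem on measures on an infinite dimensional vector space [94] that there exists a probability measure $\mu$ on $(E_{\mathbb{R}}^*, \mathcal{F}(E_{\mathbb{R}}^*))$ specified uniquely by the characteristic function:
$$e^{-|\xi|_0^2/2} = \int_{E_{\mathbb{R}}^*} e^{i\langle x, \xi \rangle}\, \mu(dx), \qquad \xi \in E_{\mathbb{R}}. \tag{2.6}$$
The above probability measure $\mu$ is called the standard Gaussian measure on $E_{\mathbb{R}}^*$, and the probability space $(E_{\mathbb{R}}^*, \mathcal{F}(E_{\mathbb{R}}^*), \mu)$ is referred to as the (standard) Gaussian space. We often omit explicit reference to the $\sigma$-field. For $\xi \in E_{\mathbb{R}}$ we set
$$X_\xi(x) = \langle x, \xi \rangle, \qquad x \in E_{\mathbb{R}}^*.$$
Clearly, $X_\xi$ is a continuous linear function on $E_{\mathbb{R}}^*$ and is measurable with respect to $\mathcal{F}(E_{\mathbb{R}}^*)$. Thus, $X_\xi$ becomes a random variable defined on the Gaussian space $(E_{\mathbb{R}}^*, \mu)$.
Lemma 2.2. The set of random variables $\{X_\xi;\ \xi \in E_{\mathbb{R}}\}$ is a Gaussian system with mean
$$\mathbf{E}[X_\xi] = \int_{E_{\mathbb{R}}^*} \langle x, \xi \rangle\, \mu(dx) = 0 \tag{2.7}$$
and covariance
$$\mathbf{E}[X_\xi X_\eta] = \int_{E_{\mathbb{R}}^*} \langle x, \xi \rangle \langle x, \eta \rangle\, \mu(dx) = \langle \xi, \eta \rangle. \tag{2.8}$$
Proof. Since $\{X_\xi;\ \xi \in E_{\mathbb{R}}\}$ is closed under linear combination, it is sufficient to show that $X_\xi$ is a Gaussian random variable for all $\xi \in E_{\mathbb{R}}$. In fact, we see from (2.6) that
$$\mathbf{E}[e^{itX_\xi}] = \int_{E_{\mathbb{R}}^*} e^{it\langle x, \xi \rangle}\, \mu(dx) = e^{-|t\xi|_0^2/2} = e^{-t^2|\xi|_0^2/2}, \qquad t \in \mathbb{R},$$
which is the characteristic function of the one-dimensional normal distribution with mean 0 and variance $|\xi|_0^2$. Therefore we have
$$\mathbf{E}[X_\xi] = 0, \qquad \mathbf{E}[X_\xi^2] = |\xi|_0^2.$$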
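Next, for $\xi, \eta \in E_{\mathbb{R}}$ we observe that
$$\mathbf{E}[(X_\xi + X_\eta)^2] = \mathbf{E}[X_\xi^2] + 2\mathbf{E}[X_\xi X_\eta] + \mathbf{E}[X_\eta^2] = |\xi|_0^2 + 2\mathbf{E}[X_\xi X_\eta] + |\eta|_0^2.$$
On the other hand,
$$\mathbf{E}[(X_\xi + X_\eta)^2] = \mathbf{E}[X_{\xi+\eta}^2] = |\xi + \eta|_0^2 = |\xi|_0^2 + 2\langle \xi, \eta \rangle + |\eta|_0^2.$$
Then (2.8) is immediate.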
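Lemma 2.2 can be illustrated by Monte Carlo in a finite-dimensional stand-in for the Gaussian space. This minimal sketch assumes, for illustration only, that $E_{\mathbb{R}}^*$ is replaced by $\mathbb{R}^d$ carrying the standard Gaussian measure; then $X_\xi(x) = \langle x, \xi\rangle$ should have mean 0 as in (2.7), covariance $\langle\xi,\eta\rangle$ as in (2.8), and characteristic function $e^{-t^2|\xi|_0^2/2}$ as in (2.6). The sample size and the vectors are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_samples = 5, 200_000

# Finite-dimensional stand-in for (E_R^*, mu): samples x ~ N(0, I_d) in R^d.
x = rng.standard_normal((n_samples, d))
xi = np.array([1.0, -0.5, 0.0, 2.0, 0.3])
eta = np.array([0.2, 1.0, -1.0, 0.5, 0.0])

X_xi = x @ xi    # samples of X_xi(x) = <x, xi>
X_eta = x @ eta  # samples of X_eta(x) = <x, eta>

mean_est = X_xi.mean()                 # (2.7): should be close to 0
cov_est = (X_xi * X_eta).mean()        # (2.8): should be close to <xi, eta>
t = 0.7
char_est = np.exp(1j * t * X_xi).mean()           # empirical E[exp(i t X_xi)]
char_exact = np.exp(-t**2 * np.dot(xi, xi) / 2)   # exp(-t^2 |xi|_0^2 / 2), cf. (2.6)

print(mean_est, cov_est, np.dot(xi, eta))
print(char_est, char_exact)
```

The Monte Carlo error scales like $1/\sqrt{n_{\rm samples}}$, so the agreement is only approximate.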
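It follows from Lemma 2.2 that $\{X_\xi;\ \xi \in E_{\mathbb{R}}\} \subset L^2(E_{\mathbb{R}}^*, \mu; \mathbb{R})$. Let $\|\cdot\|_0$ denote the norm of $L^2(E_{\mathbb{R}}^*, \mu; \mathbb{R})$. Then we have
$$\|X_\xi\|_0^2 = \int_{E_{\mathbb{R}}^*} \langle x, \xi \rangle^2\, \mu(dx) = |\xi|_0^2, \qquad \xi \in E_{\mathbb{R}}.$$
Since $E_{\mathbb{R}}$ is a dense subspace of $H_{\mathbb{R}}$, the linear map $\xi \mapsto X_\xi \in L^2(E_{\mathbb{R}}^*, \mu; \mathbb{R})$, $\xi \in E_{\mathbb{R}}$, extends uniquely to an isometry from $H_{\mathbb{R}}$ into $L^2(E_{\mathbb{R}}^*, \mu; \mathbb{R})$. The image is again denoted by $X_\xi$. We often write
$$X_\xi(x) = \langle x, \xi \rangle, \qquad x \in E_{\mathbb{R}}^*,$$
although the value is not defined pointwise in general. The following result is now straightforward.
Proposition 2.3. $\{X_\xi;\ \xi \in H_{\mathbb{R}}\}$ is a Gaussian system with mean and covariance given as in (2.7) and (2.8). Moreover, the map $\xi \mapsto X_\xi$ is an isometry from $H_{\mathbb{R}}$ into $L^2(E_{\mathbb{R}}^*, \mu; \mathbb{R})$.
Remark 2.4. For a complex $\xi = \xi_1 + i\xi_2 \in H$, $\xi_1, \xi_2 \in H_{\mathbb{R}}$, we define $X_\xi$ by
$$X_\xi(x) = \langle x, \xi \rangle = \langle x, \xi_1 \rangle + i\langle x, \xi_2 \rangle, \qquad x \in E_{\mathbb{R}}^*.$$
Then the map $\xi \mapsto X_\xi$ becomes an isometry from $H$ into $L^2(E_{\mathbb{R}}^*, \mu; \mathbb{C})$.
Proposition 2.5 (Wick formula). For $\xi_1, \dots, \xi_{2n} \in H$ it holds that
$$\int_{E_{\mathbb{R}}^*} \prod_{k=1}^{2n} \langle x, \xi_k \rangle\, \mu(dx) = \frac{1}{2^n n!} \sum_{\sigma \in S(2n)} \langle \xi_{\sigma(1)}, \xi_{\sigma(2)} \rangle \cdots \langle \xi_{\sigma(2n-1)}, \xi_{\sigma(2n)} \rangle. \tag{2.9}$$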
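Proof. For $\xi \in H_{\mathbb{R}}$, since $X_\xi$ obeys the normal distribution with mean 0 and variance $|\xi|_0^2$, we have
$$\mathbf{E}[X_\xi^{2n}] = \frac{(2n)!}{2^n n!} |\xi|_0^{2n}.$$
In other words,
$$\int_{E_{\mathbb{R}}^*} \langle x, \xi \rangle^{2n}\, \mu(dx) = \frac{(2n)!}{2^n n!} \langle \xi, \xi \rangle^n.$$
Then by polarization and complexification we obtain (2.9).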
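Once the pairings $\langle\xi_i,\xi_j\rangle$ are known, the right-hand side of (2.9) is a finite combinatorial expression, so it can be checked directly. The sketch below is illustrative: real vectors in $\mathbb{R}^3$ stand in for elements of $H_{\mathbb{R}}$, and the function names are hypothetical. It evaluates the right-hand side of (2.9) by brute force over $S(2n)$ and compares it with a sum over pair partitions, using the fact that each pair partition of $\{1,\dots,2n\}$ arises from exactly $2^n n!$ permutations. With all $\xi_k$ equal it reproduces the moment $\frac{(2n)!}{2^n n!}|\xi|_0^{2n}$ from the proof.

```python
import itertools
import math
import numpy as np

def wick_rhs_permutations(vectors):
    """Right-hand side of (2.9): (1 / (2^n n!)) * sum over sigma in S(2n)."""
    m = len(vectors)
    n = m // 2
    total = 0.0
    for sigma in itertools.permutations(range(m)):
        term = 1.0
        for k in range(n):
            term *= np.dot(vectors[sigma[2 * k]], vectors[sigma[2 * k + 1]])
        total += term
    return total / (2**n * math.factorial(n))

def pair_partitions(items):
    """Yield all pair partitions of the list `items` (its length must be even)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for idx, partner in enumerate(rest):
        remaining = rest[:idx] + rest[idx + 1:]
        for tail in pair_partitions(remaining):
            yield [(first, partner)] + tail

def wick_rhs_pairings(vectors):
    """The same quantity, summed over pair partitions instead of permutations."""
    return sum(
        math.prod(np.dot(vectors[i], vectors[j]) for i, j in partition)
        for partition in pair_partitions(list(range(len(vectors))))
    )

rng = np.random.default_rng(2)
vecs = [rng.standard_normal(3) for _ in range(6)]   # 2n = 6
print(wick_rhs_permutations(vecs), wick_rhs_pairings(vecs))

# All xi_k equal: (2n)! / (2^n n!) * |xi|^{2n} = 15 |xi|^6 for n = 3.
xi = rng.standard_normal(3)
print(wick_rhs_pairings([xi] * 6), 15 * np.dot(xi, xi) ** 3)
```

The pair-partition form needs only $(2n)!/(2^n n!)$ terms instead of $(2n)!$, which is why it is the practical way to evaluate Gaussian moments.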
Remark 2.6. The Wick formula (2.9) can also be expressed in the following form:
$$\mathbf{E}[X_{\xi_1} \cdots X_{\xi_{2n}}] = \sum_{\vartheta \in \mathcal{P}_P(2n)} \prod_{\{i,j\} \in \vartheta} \mathbf{E}[X_{\xi_i} X_{\xi_j}],$$
where $\mathcal{P}_P(2n)$ is the set of all pair partitions of $\{1, 2, \dots, 2n\}$.
2.3. Fock spaces and the Wiener–Itô decomposition
We first recall the notion of the (Boson) Fock space. Let $H$ be a Hilbert space with inner product $\langle \cdot | \cdot \rangle$ and norm $|\cdot|_0$. For $n \ge 0$ let $H^{\hat{\otimes} n}$ be the $n$-fold symmetric tensor power of $H$, whose inner product and norm are denoted by the same symbols. By definition $H^{\hat{\otimes} 0} \cong \mathbb{C}$. Let $\Gamma(H)$ denote the space of all sequences
$$\phi = (f_0, f_1, \dots, f_n, \dots), \qquad f_n \in H^{\hat{\otimes} n},$$
satisfying
$$\|\phi\|_0^2 = \sum_{n=0}^{\infty} n!\, |f_n|_0^2 < \infty.$$
Then $\Gamma(H)$ becomes a Hilbert space with the inner product defined by
$$\langle\langle \phi | \psi \rangle\rangle = \sum_{n=0}^{\infty} n!\, \langle f_n | g_n \rangle, \qquad \phi = (f_n), \quad \psi = (g_n).$$
We call $\Gamma(H)$ the (Boson) Fock space over $H$. The factor $n!$ is a matter of convention.
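With each $\xi \in H$ we associate an exponential vector or a coherent vector defined by
$$\phi_\xi = \Bigl(1, \xi, \frac{\xi^{\otimes 2}}{2!}, \dots, \frac{\xi^{\otimes n}}{n!}, \dots\Bigr). \tag{2.10}$$
Note that $\phi_\xi \in \Gamma(H)$ and
$$\|\phi_\xi\|_0^2 = \sum_{n=0}^{\infty} n! \Bigl|\frac{\xi^{\otimes n}}{n!}\Bigr|_0^2 = e^{|\xi|_0^2}.$$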
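The norm identity for exponential vectors, and more generally $\langle\langle\phi_\xi|\phi_\eta\rangle\rangle = \sum_n \langle\xi|\eta\rangle^n/n! = e^{\langle\xi|\eta\rangle}$, can be checked on truncated sequences built from explicit tensor powers. In this minimal sketch $H = \mathbb{C}^2$, the sequence (2.10) is cut off at a finite order, and the function names are hypothetical; the inner product uses the weights $n!$ from the definition of $\Gamma(H)$ and is conjugate-linear in the first slot, matching $\langle\xi|\eta\rangle = \langle\bar\xi,\eta\rangle$.

```python
import math
import numpy as np

def exponential_vector(xi, order):
    """Truncated phi_xi = (1, xi, xi^{(x)2}/2!, ..., xi^{(x)order}/order!)."""
    components = [np.ones(1, dtype=complex)]   # n = 0 component: the scalar 1
    power = None
    for n in range(1, order + 1):
        # xi^{(x)n} stored as an n-index array of shape (d, ..., d)
        power = xi.astype(complex) if n == 1 else np.multiply.outer(power, xi)
        components.append(power / math.factorial(n))
    return components

def fock_inner(phi, psi):
    """<<phi|psi>> = sum_n n! <f_n|g_n>; np.vdot conjugates its first argument."""
    return sum(math.factorial(n) * np.vdot(f, g) for n, (f, g) in enumerate(zip(phi, psi)))

xi = np.array([0.3 + 0.2j, -0.4j])
eta = np.array([0.5, 0.1 - 0.3j])
order = 12

phi_xi = exponential_vector(xi, order)
phi_eta = exponential_vector(eta, order)
print(fock_inner(phi_xi, phi_xi).real, np.exp(np.vdot(xi, xi).real))  # e^{|xi|_0^2}
print(fock_inner(phi_xi, phi_eta), np.exp(np.vdot(xi, eta)))          # e^{<xi|eta>}
```

Because $|\xi|_0$ and $|\eta|_0$ are small here, the truncation error at order 12 is far below floating-point round-off.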
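Lemma 2.7. Let $D \subset H$ be a dense subspace. Then the exponential vectors $\{\phi_\xi;\ \xi \in D\}$ are linearly independent and span a dense subspace of $\Gamma(H)$.
The above fact is well known; for the proof see e.g., [76, Chapter 2]. We now recall the structure of $L^2(E_{\mathbb{R}}^*, \mu; \mathbb{C})$. Some notation is needed. Let $\tau$ be the element of $(E_{\mathbb{R}} \otimes E_{\mathbb{R}})^*$ uniquely specified by
$$\langle \tau, \eta \otimes \xi \rangle = \langle \xi, \eta \rangle, \qquad \xi, \eta \in E_{\mathbb{R}}.$$
In other words, $\tau$ is the integral kernel of the identity operator $I \in \mathcal{L}(E_{\mathbb{R}}, E_{\mathbb{R}})$, so in fact $\tau \in E_{\mathbb{R}} \otimes E_{\mathbb{R}}^*$. For $x \in E_{\mathbb{R}}^*$ and $n = 0, 1, 2, \dots$ we define the Wick tensor $:x^{\otimes n}: \in (E_{\mathbb{R}}^{\otimes n})^*$ inductively by
$$:x^{\otimes 0}: = 1, \qquad :x^{\otimes 1}: = x, \qquad :x^{\otimes n}: = x \hat{\otimes} :x^{\otimes (n-1)}: - (n-1)\, \tau \hat{\otimes} :x^{\otimes (n-2)}:, \quad n \ge 2.$$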
The Wick tensor is closely related to the Hermite polynomials, which are defined by the generating function:
$$e^{2xt - t^2} = \sum_{n=0}^{\infty} \frac{t^n}{n!} H_n(x). \tag{2.11}$$
Lemma 2.8. For $\xi \in E_{\mathbb{R}}$, $\xi \ne 0$, it holds that
$$\langle :x^{\otimes n}:, \xi^{\otimes n} \rangle = \frac{|\xi|_0^n}{2^{n/2}} H_n\Bigl(\frac{\langle x, \xi \rangle}{\sqrt{2}\, |\xi|_0}\Bigr).$$
Proof. By induction on $n$. We need only exploit the three-term recurrence relation
$$H_n(x) = 2xH_{n-1}(x) - 2(n-1)H_{n-2}(x), \qquad n \ge 2,$$
which follows directly from (2.11).
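Pairing the defining recursion of the Wick tensor with $\xi^{\otimes n}$ turns it into a scalar recursion: since $\langle x\hat\otimes T, \xi^{\otimes n}\rangle = \langle x,\xi\rangle\langle T,\xi^{\otimes(n-1)}\rangle$ and $\langle \tau\hat\otimes S, \xi^{\otimes n}\rangle = |\xi|_0^2\langle S,\xi^{\otimes(n-2)}\rangle$, the numbers $w_n = \langle :x^{\otimes n}:, \xi^{\otimes n}\rangle$ satisfy $w_n = y\,w_{n-1} - (n-1)|\xi|_0^2\,w_{n-2}$ with $y = \langle x,\xi\rangle$. The sketch below compares this recursion with the right-hand side of Lemma 2.8, using NumPy's physicists' Hermite polynomials (`numpy.polynomial.hermite.hermval`, which match (2.11)). It then checks the orthogonality in Lemma 2.9 below by Gauss–Hermite quadrature (`numpy.polynomial.hermite_e.hermegauss`, weight $e^{-z^2/2}$), using that $\langle x,\xi\rangle$ is $N(0,|\xi|_0^2)$ under $\mu$. The helper names and the value of $|\xi|_0^2$ are illustrative choices.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermval
from numpy.polynomial.hermite_e import hermegauss

def wick_pairing(y, s, n):
    """w_n = <:x^{(x)n}:, xi^{(x)n}> from the Wick-tensor recursion,
    where y = <x, xi> (array) and s = |xi|_0^2."""
    w_prev, w = np.ones_like(y), y          # w_0 = 1, w_1 = y
    if n == 0:
        return w_prev
    for k in range(2, n + 1):
        w_prev, w = w, y * w - (k - 1) * s * w_prev
    return w

def lemma_2_8_rhs(y, s, n):
    """|xi|_0^n / 2^{n/2} * H_n(y / (sqrt(2) |xi|_0)) with physicists' H_n."""
    coeffs = [0.0] * n + [1.0]
    return s ** (n / 2) / 2 ** (n / 2) * hermval(y / np.sqrt(2 * s), coeffs)

s = 1.7                              # |xi|_0^2 for some nonzero xi (illustrative)
y = np.linspace(-4.0, 4.0, 9)        # sample values of <x, xi>
for n in range(8):
    assert np.allclose(wick_pairing(y, s, n), lemma_2_8_rhs(y, s, n))

# Lemma 2.9: integrate against N(0, s) using Gauss-Hermite(E) nodes.
nodes, weights = hermegauss(40)      # weight exp(-z^2/2); sum(weights) = sqrt(2*pi)
y_nodes = np.sqrt(s) * nodes         # y = |xi|_0 * z with z ~ N(0, 1)
gram = np.array([[np.sum(weights * wick_pairing(y_nodes, s, m) * wick_pairing(y_nodes, s, n))
                  for n in range(6)] for m in range(6)]) / np.sqrt(2 * np.pi)
expected = np.diag([math.factorial(n) * s**n for n in range(6)])   # n! |xi|_0^{2n} delta_mn
print(np.round(gram, 8))
```

The 40-point rule integrates polynomials of degree up to 79 exactly, so for these low orders the Gram matrix agrees with $n!\,|\xi|_0^{2n}\delta_{mn}$ up to round-off.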
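Lemma 2.9. For $\xi \in E_R$ it holds that
$$\int_{E_R^*}\langle :x^{\otimes m}:, \xi^{\otimes m}\rangle\langle :x^{\otimes n}:, \xi^{\otimes n}\rangle\,\mu(dx) = n!\,|\xi|_0^{2n}\,\delta_{mn}.$$
Proof. The orthogonality relation for the Hermite polynomials is well known:
$$\frac{1}{\sqrt{2\pi}}\int_{\mathbb R}H_m\!\left(\frac{x}{\sqrt2}\right)H_n\!\left(\frac{x}{\sqrt2}\right)e^{-x^2/2}\,dx = 2^n n!\,\delta_{mn}.$$
Then the assertion is immediate from Lemma 2.8.

For $n \ge 0$ let $\mathcal P_n$ be the space of all Wick polynomials of the form
$$\phi(x) = \langle :x^{\otimes n}:, f_n\rangle, \qquad f_n \in \text{the } n\text{-fold algebraic tensor power of } E. \tag{2.12}$$
It is shown by the polarization formula that $\mathcal P_n$ coincides with the set of finite linear combinations of the Wick polynomials of the form $\langle :x^{\otimes n}:, \xi^{\otimes n}\rangle$ with $\xi$ running over $E$.

Lemma 2.10. For $\phi_n \in \mathcal P_n$, $n \ge 0$, as in (2.12) we have
$$\langle\!\langle\phi_m, \phi_n\rangle\!\rangle = \int_{E_R^*}\langle :x^{\otimes m}:, f_m\rangle\langle :x^{\otimes n}:, f_n\rangle\,\mu(dx) = n!\,\langle f_m, f_n\rangle\,\delta_{mn}.$$
In particular,
$$\|\phi_n\|_0^2 = n!\,|f_n|_0^2. \tag{2.13}$$
Proof. Follows from Lemma 2.9 with the help of the polarization formula.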
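Lemma 2.9 rests on the Hermite orthogonality relation used in its proof. A minimal numerical check might look as follows. It assumes that plain trapezoidal quadrature on the truncated interval $[-12, 12]$ is accurate for Gaussian-weighted polynomials of small degree. The function names and quadrature parameters are our own.

```python
import math


def hermite(n, x):
    """Physicists' Hermite H_n(x) via the three-term recurrence."""
    if n == 0:
        return 1.0
    h_prev, h = 1.0, 2.0 * x
    for k in range(2, n + 1):
        h_prev, h = h, 2.0 * x * h - 2.0 * (k - 1) * h_prev
    return h


def gaussian_hermite_pairing(m, n, half_width=12.0, steps=2400):
    """Trapezoid rule for (2 pi)^{-1/2} * integral of H_m(x/sqrt2) H_n(x/sqrt2) e^{-x^2/2} dx.

    The integrand is negligible outside [-half_width, half_width] for small m, n.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        u = x / math.sqrt(2.0)
        total += weight * hermite(m, u) * hermite(n, u) * math.exp(-x * x / 2.0)
    return total * h / math.sqrt(2.0 * math.pi)


for m in range(4):
    # Diagonal entries should be close to 2^n n! (1, 2, 8, 48); off-diagonal ones close to 0.
    print([round(gaussian_hermite_pairing(m, n), 6) for n in range(4)])
```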
Using $L^2$-approximation based on the isometry (2.13), we define $\langle :x^{\otimes n}:, f_n\rangle$ for $f_n \in H^{\hat\otimes n}$ as an element of $L^2(E_R^*, \mu; \mathbb C)$. Consequently, the closure $\mathcal H_n = \overline{\mathcal P_n}$ coincides with the space of functions of the form $\langle :x^{\otimes n}:, f_n\rangle$, where $f_n \in H^{\hat\otimes n}$. Since the subspaces $\mathcal P_n$ are mutually orthogonal by Lemma 2.10, so are the $\mathcal H_n$. Moreover, it is shown that the linear space spanned by $\{\mathcal P_n;\ n = 0, 1, 2, \dots\}$ is dense in $L^2(E_R^*, \mu; \mathbb C)$. Consequently, we have the orthogonal sum decomposition:
$$L^2(E_R^*, \mu; \mathbb C) = \bigoplus_{n=0}^\infty\mathcal H_n,$$
which is referred to as the Wiener–Itô decomposition. In other words, for every $\phi \in L^2(E_R^*, \mu; \mathbb C)$ there exists a unique sequence $(f_0, f_1, \dots, f_n, \dots)$
with $f_n \in H^{\hat\otimes n}$ such that
$$\phi(x) = \sum_{n=0}^\infty\langle :x^{\otimes n}:, f_n\rangle, \qquad x \in E_R^*. \tag{2.14}$$
In that case it holds that
$$\|\phi\|_0^2 = \sum_{n=0}^\infty n!\,|f_n|_0^2.$$
In view of the definition of the Fock space we see that the above correspondence $\phi \leftrightarrow (f_0, f_1, \dots, f_n, \dots)$ gives rise to a unitary isomorphism:
$$L^2(E_R^*, \mu; \mathbb C) \cong \Gamma(H), \tag{2.15}$$
which is called the Wiener–Itô–Segal isomorphism. The function $\phi \in L^2(E_R^*, \mu; \mathbb C)$ in (2.14) is called the Gaussian realization of $\phi = (f_n) \in \Gamma(H)$, and we often denote both by the same symbol.

Proposition 2.11. The Gaussian realization of the exponential vector (2.10) is given by
$$\phi_\xi(x) = e^{\langle x, \xi\rangle - \langle\xi, \xi\rangle/2}, \qquad x \in E_R^*, \quad \xi \in E.$$
The proof is straightforward. Since $\{\phi_\xi;\ \xi \in E\}$ spans a dense subspace of $\Gamma(H)$, the unitary isomorphism (2.15) is uniquely determined by the correspondence
$$\phi_\xi(x) = e^{\langle x, \xi\rangle - \langle\xi, \xi\rangle/2} \;\leftrightarrow\; \phi_\xi = \left(1, \xi, \dots, \frac{\xi^{\otimes n}}{n!}, \dots\right).$$

Remark 2.12. The notation $:X_{\xi_1}\cdots X_{\xi_n}:$ is also used in the literature, e.g., [40]. The relation is obvious:
$$:X_{\xi_1}\cdots X_{\xi_n}: \;=\; :\langle x, \xi_1\rangle\cdots\langle x, \xi_n\rangle: \;=\; \langle :x^{\otimes n}:, \xi_1\otimes\cdots\otimes\xi_n\rangle.$$

2.4. Underlying spaces

For wide applications the Gelfand triple $E \subset H \subset E^*$ should be constructed over an underlying topological space $T$. We take $T$ to be a time interval for stochastic processes, or a Euclidean domain or a more general manifold for random fields or quantum fields. For generality, we list the minimal requirements on $T$ and on the positive selfadjoint operator $A$ used for the construction of the Gelfand triple. As a result, $T$ is quite
arbitrary; it may even be a discrete or finite space. This axiomatization appeared first in [76].

We always consider the topological σ-field (Borel σ-field) over $T$ and take a Borel measure denoted simply by $dt$. Define real and complex Hilbert spaces:
$$H_R = L^2(T, dt; \mathbb R), \qquad H = L^2(T, dt; \mathbb C) = H_R + iH_R.$$
Let $A$ be a positive selfadjoint operator densely defined in $H$ satisfying the following conditions:
(A1) $A$ is a real operator, i.e., $A$ maps $\mathrm{Dom}(A)\cap H_R$ into $H_R$;
(A2) $A$ is positive and $\rho \equiv \|A^{-1}\|_{OP} = \{\inf\mathrm{Spec}(A)\}^{-1} < 1$;
(A3) $A^{-1}$ is of Hilbert–Schmidt type.
Applying the standard construction (Section 2.1), we obtain complex and real Gelfand triples:
$$E \subset H = L^2(T, dt; \mathbb C) \subset E^*, \qquad E_R \subset H_R = L^2(T, dt; \mathbb R) \subset E_R^*.$$
An element in $E$ is called a test function on $T$, and one in $E^*$ a generalized function or a distribution on $T$. Since the space $E$ is obtained by taking the intersection of the $E_p$, each $\xi \in E$ stands merely for an equivalence class of measurable functions which coincide almost everywhere on $T$. In this connection we need to assume the following:
(A4) for each $\xi \in E$ there exists a unique continuous function $\tilde\xi$ on $T$ such that $\xi(t) = \tilde\xi(t)$ for almost all $t \in T$;
(A5) for each $t \in T$ the linear functional $\delta_t : \xi \mapsto \tilde\xi(t)$, $\xi \in E$, is continuous, i.e., $\delta_t \in E^*$;
(A6) the map $t \mapsto \delta_t \in E^*$, $t \in T$, is continuous.
The linear functional $\delta_t$ in (A5), also called the delta function, is crucial for the realization of quantum white noise. Under the above assumptions we always regard $E$ as a space of continuous functions on $T$.

Example 2.13. A prototype of our consideration is the Gelfand triple:
$$\mathcal S(\mathbb R) \subset H_R = L^2(\mathbb R, dt; \mathbb R) \subset \mathcal S'(\mathbb R),$$
where $\mathcal S(\mathbb R)$ is the space of rapidly decreasing $\mathbb R$-valued $C^\infty$-functions, and $\mathcal S'(\mathbb R)$ the space of tempered distributions. Let $H = L^2(\mathbb R, dt; \mathbb C)$ and consider the differential operator
$$A = 1 + t^2 - \frac{d^2}{dt^2}$$
acting in $H$. For $k = 0, 1, 2, \dots$ we set
$$e_k(t) = \left(\sqrt\pi\,2^k k!\right)^{-1/2}H_k(t)\,e^{-t^2/2}, \qquad t \in \mathbb R,$$
where $H_k$ is the Hermite polynomial of degree $k$. It is known that $\{e_k\}_{k=0}^\infty$ is an orthonormal basis of $H$ satisfying
$$Ae_k = (2k+2)e_k, \qquad k = 0, 1, 2, \dots.$$
It is then easy to see that $A$ is a positive selfadjoint operator satisfying (A1)–(A3). In fact,
$$\rho = \|A^{-1}\|_{OP} = \frac12, \qquad \|A^{-q}\|_{HS}^2 = \sum_{k=0}^\infty\frac{1}{(2k+2)^{2q}} < \infty, \quad q > \frac12.$$
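The eigenvalue relation $Ae_k = (2k+2)e_k$ and the Hilbert–Schmidt sum can be checked numerically. This Python sketch evaluates $e_k$ through the Hermite recurrence, approximates $e_k''$ by a central difference with an illustrative step `dt`, and sums $\sum_k (2k+2)^{-2q}$. For $q = 1$ the exact value is $\pi^2/24$, since $\sum_{k\ge1}k^{-2} = \pi^2/6$. The function names are our own.

```python
import math


def hermite_function(k, t):
    """e_k(t) = (sqrt(pi) 2^k k!)^{-1/2} H_k(t) e^{-t^2/2} from Example 2.13."""
    h_prev, h = 1.0, 2.0 * t
    if k == 0:
        h = 1.0
    for j in range(2, k + 1):
        h_prev, h = h, 2.0 * t * h - 2.0 * (j - 1) * h_prev
    norm = math.sqrt(math.sqrt(math.pi) * 2 ** k * math.factorial(k))
    return h * math.exp(-t * t / 2.0) / norm


def apply_A(f, t, dt=1e-3):
    """(A f)(t) = (1 + t^2) f(t) - f''(t), with f'' approximated by a central difference."""
    second = (f(t + dt) - 2.0 * f(t) + f(t - dt)) / dt ** 2
    return (1.0 + t * t) * f(t) - second


def hs_norm_sq(q, terms=100_000):
    """Partial sum of ||A^{-q}||_HS^2 = sum_k (2k + 2)^{-2q}; the full sum is finite iff q > 1/2."""
    return sum((2.0 * k + 2.0) ** (-2.0 * q) for k in range(terms))


for k in range(4):
    t = 0.3
    # The ratio (A e_k)(t) / e_k(t) should be close to 2k + 2.
    print(k, apply_A(lambda s: hermite_function(k, s), t) / hermite_function(k, t))
print(hs_norm_sq(1.0), math.pi ** 2 / 24)
```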
Let $E$ be the countable Hilbert space obtained from $A$ in the standard manner. We see from the construction that every $\xi \in E$ admits the expansion
$$\xi = \sum_{k=0}^\infty c_ke_k, \qquad c_k \in \mathbb C,$$
which converges in $E_p$ for all $p \ge 0$. Moreover, it is proved that
$$\tilde\xi(t) = \sum_{k=0}^\infty c_ke_k(t), \qquad t \in \mathbb R,$$
converges and defines a rapidly decreasing $C^\infty$-function, see Reed–Simon [88, Chapter V]. Thus we have $\tilde\xi \in \mathcal S(\mathbb R) + i\,\mathcal S(\mathbb R)$. In this manner we can verify condition (A4) and obtain $E = \mathcal S(\mathbb R) + i\,\mathcal S(\mathbb R)$. Indeed, the canonical topology of $\mathcal S(\mathbb R)$ is often defined by the seminorms:
$$\|\xi\|_{m,n} = \sup_{t\in\mathbb R}|t^m\xi^{(n)}(t)|, \qquad \xi \in \mathcal S(\mathbb R).$$
It is proved that the topology defined by the above seminorms coincides with the one defined by the Hilbertian norms induced by $A$.
Finally, the evaluation $\delta_t : \xi \mapsto \xi(t)$ is nothing other than Dirac's delta function. It is known that $t \mapsto \delta_t \in E_{-p}$ is continuous for $p > 5/12$. Moreover, for any $p > 5/12$ and $0 \le \alpha \le 1$ with $p - 5/12 > \alpha/2$ there exists a constant $C = C(p, \alpha) \ge 0$ such that
$$|\delta_s - \delta_t|_{-p} \le C|s - t|^\alpha, \qquad s, t \in \mathbb R. \tag{2.16}$$
Hence conditions (A5) and (A6) are fulfilled. For a detailed study of the norm estimates, see [77, 78].
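3. White Noise Distributions

3.1. Standard CKS-space

We start with a variant of the Fock space obtained by changing the weights in the direct sum. Let $\alpha = \{\alpha(n)\}$ be a sequence of positive numbers and define
$$\Gamma_\alpha(E_p) = \left\{\phi = (f_n);\ f_n \in E_p^{\hat\otimes n},\ \|\phi\|_{p,+}^2 = \sum_{n=0}^\infty n!\,\alpha(n)\,|f_n|_p^2 < \infty\right\}.$$
This is called the weighted Fock space over $E_p$ with weight sequence $\alpha$. Obviously, the Fock space $\Gamma(E_p)$ is the case $\alpha(n) \equiv 1$. We first need the following condition:
(B1) $\alpha(0) = 1$ and there exists some $\sigma \ge 1$ such that $\inf_{n\ge0}\alpha(n)\sigma^n > 0$.
Then $\Gamma_\alpha(E_p) \subset \Gamma(H)$ for all $p \ge p_0$, where $p_0 \ge 0$ is given by $\rho^{2p_0}\sigma = 1$. In fact, setting $\epsilon = \inf_{n\ge0}\alpha(n)\sigma^n > 0$, we have
$$\|\phi\|_0^2 = \sum_{n=0}^\infty n!\,|f_n|_0^2 \le \epsilon^{-1}\sum_{n=0}^\infty n!\,\alpha(n)\sigma^n|f_n|_0^2 \le \epsilon^{-1}\sum_{n=0}^\infty n!\,\alpha(n)(\rho^{2p}\sigma)^n|f_n|_p^2 \le \epsilon^{-1}\|\phi\|_{p,+}^2.$$
Having obtained a chain of weighted Fock spaces:
$$\cdots \subset \Gamma_\alpha(E_q) \subset \cdots \subset \Gamma_\alpha(E_p) \subset \cdots \subset \Gamma_\alpha(H), \qquad p_0 \le p < q,$$
we define their limit space by
$$\mathcal W = \Gamma_\alpha(E) = \mathop{\mathrm{proj\,lim}}_{p\to\infty}\Gamma_\alpha(E_p) = \bigcap_{p\ge0}\Gamma_\alpha(E_p).$$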
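Under condition (B1), $\Gamma_\alpha(E_p)$ and $\Gamma_{1/\alpha}(E_{-p})$ are mutually dual spaces with respect to $\Gamma(H)$. Hence we have
$$\mathcal W^* = \mathop{\mathrm{ind\,lim}}_{p\to\infty}\Gamma_{1/\alpha}(E_{-p}) = \bigcup_{p\ge0}\Gamma_{1/\alpha}(E_{-p}).$$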
Lemma 3.1. $\mathcal W$ is a countable Hilbert nuclear space.
Proof. For any $p \ge 0$ and $q > 0$ such that $\|A^{-q}\|_{HS} < \infty$, the natural inclusion $E_{p+q} \to E_p$ is of Hilbert–Schmidt type. So is $\Gamma_\alpha(E_{p+q}) \to \Gamma_\alpha(E_p)$. In fact, the squared Hilbert–Schmidt norm of this inclusion is given by
$$\prod_{j=0}^\infty(1 - \lambda_j^{-2q})^{-1} < \infty,$$
where $1 < \lambda_0 \le \lambda_1 \le \cdots$ are the eigenvalues of $A$.
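The product formula can be illustrated with finitely many modes. Restricted to the $n$-particle sector, the squared Hilbert–Schmidt norm of the inclusion is the complete homogeneous symmetric polynomial $h_n(\lambda_0^{-2q}, \lambda_1^{-2q}, \dots)$. The weight $\alpha(n)$ cancels because the same weight appears in both norms. Summing over $n$ gives the product. The sketch below is a hypothetical illustration: it truncates to the first 200 eigenvalues $\lambda_j = 2j+2$ of Example 2.13 and takes $q = 1$. For $q = 1$ the infinite product equals $\pi/2$ by Wallis' formula $\prod_{k\ge1}4k^2/(4k^2-1) = \pi/2$.

```python
import math


def hs_norm_sq_fock_inclusion(lams, q, n_max=40):
    """sum_{n <= n_max} h_n(x_0, x_1, ...), with x_j = lam_j^{-2q} and h_n complete homogeneous.

    h_n(x) is the squared HS norm of the inclusion on the n-particle sector; the
    coefficients of prod_j 1/(1 - x_j s) are built up one factor at a time.
    """
    coeffs = [1.0] + [0.0] * n_max
    for lam in lams:
        x = lam ** (-2.0 * q)
        for n in range(1, n_max + 1):
            coeffs[n] += x * coeffs[n - 1]   # multiply the power series by 1/(1 - x s)
    return sum(coeffs)


lams = [2.0 * j + 2.0 for j in range(200)]   # first 200 eigenvalues of A in Example 2.13
q = 1.0
series = hs_norm_sq_fock_inclusion(lams, q)
product = math.prod(1.0 / (1.0 - lam ** (-2.0 * q)) for lam in lams)
print(series, product, math.pi / 2)   # the product increases to pi/2 as more modes are included
```

The truncation in $n$ is harmless here because $h_n$ decays roughly like $4^{-n}$ (the largest variable is $\lambda_0^{-2} = 1/4$).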
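Thus we come to a Gelfand triple:
$$\mathcal W \subset \Gamma(H) \subset \mathcal W^*, \tag{3.1}$$
where the canonical $\mathbb C$-bilinear form on $\mathcal W^*\times\mathcal W$ is defined by
$$\langle\!\langle\Phi, \phi\rangle\!\rangle = \sum_{n=0}^\infty n!\,\langle F_n, f_n\rangle, \qquad \Phi = (F_n) \in \mathcal W^*,\ \phi = (f_n) \in \mathcal W. \tag{3.2}$$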
We need further assumptions on the weight sequence $\alpha = \{\alpha(n)\}$. The generating function of $\alpha$ is defined by
$$G_\alpha(t) = \sum_{n=0}^\infty\frac{\alpha(n)}{n!}t^n. \tag{3.3}$$
In order that $G_\alpha$ be entire holomorphic on $\mathbb C$, i.e., that the power series on the right-hand side have an infinite radius of convergence, we assume the following:
(B2) $\displaystyle\lim_{n\to\infty}\left(\frac{\alpha(n)}{n!}\right)^{1/n} = 0$.
On the other hand, it follows from (B1) that the generating function of $1/\alpha = \{1/\alpha(n)\}$, defined by
$$G_{1/\alpha}(t) = \sum_{n=0}^\infty\frac{1}{n!\,\alpha(n)}t^n,$$
is also entire holomorphic on $\mathbb C$.
Proposition 3.2. For an exponential vector $\phi_\xi \in \Gamma(H)$, $\xi \in H$, we have $\phi_\xi \in \Gamma_\alpha(E_p)$ if and only if $\xi \in E_p$. In that case,
$$\|\phi_\xi\|_{p,+}^2 = G_\alpha(|\xi|_p^2), \qquad p \ge p_0.$$
In particular, $\phi_\xi \in \mathcal W$ if and only if $\xi \in E$. Moreover, $\{\phi_\xi;\ \xi \in E\}$ is a set of linearly independent vectors and spans a dense subspace of $\mathcal W$.

We need some technical notions. Two sequences $\{\alpha(n)\}$, $\{\gamma(n)\}$ of positive numbers are said to be equivalent if there exist positive constants $C_1, C_2, M_1, M_2 > 0$ such that
$$C_1M_1^n\alpha(n) \le \gamma(n) \le C_2M_2^n\alpha(n), \qquad n = 0, 1, 2, \dots.$$
A sequence of positive numbers $\{\beta(n)\}$ is called log-concave if
$$\beta(n)\beta(n+2) \le \beta(n+1)^2, \qquad n = 0, 1, 2, \dots.$$
We now assume the following conditions:
(B3) $\alpha$ is equivalent to a positive sequence $\gamma = \{\gamma(n)\}$ such that $\{\gamma(n)/n!\}$ is log-concave;
(B4) $\alpha$ is equivalent to another positive sequence $\delta = \{\delta(n)\}$ such that $\{(n!\,\delta(n))^{-1}\}$ is log-concave.
Some important consequences are stated in the following lemma.

Lemma 3.3. Let $\alpha = \{\alpha(n)\}$ be a sequence of positive numbers satisfying (B1)–(B4).
(1) For any $\lambda > 0$ there exist $C_1 > 0$ and $C_2 > 0$ such that
$$e^{\lambda t}G_\alpha(t) \le C_1G_\alpha(C_2t), \qquad t \ge 0.$$
(2) There exists a constant $M \ge 1$ such that
$$G_\alpha(s)G_\alpha(t) \le G_\alpha(M(s+t)), \qquad s, t \ge 0.$$
(3) There exists a constant $M \ge 1$ such that
$$G_\alpha(s+t) \le G_\alpha(Ms)G_\alpha(Mt), \qquad s, t \ge 0.$$
We may take the same constant $M$ as in (2).
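Proof. It is known [2] that there exists $M \ge 1$ such that
$$\alpha(m+n) \le M^{m+n}\alpha(m)\alpha(n), \qquad m, n \ge 0, \tag{3.4}$$
$$\alpha(m)\alpha(n) \le M^{m+n}\alpha(m+n), \qquad m, n \ge 0. \tag{3.5}$$
(For this claim we need only $\alpha(0) = 1$, (B3) and (B4).) Then we see that
$$e^{\lambda t}G_\alpha(t) = \sum_{k,l=0}^\infty\frac{\lambda^k}{k!}\frac{\alpha(l)}{l!}t^{k+l} = \sum_{n=0}^\infty\frac{\alpha(n)}{n!}\left(\sum_{k+l=n}\binom nk\lambda^k\frac{\alpha(l)}{\alpha(n)}\right)t^n. \tag{3.6}$$
By (B1) there exists a constant $\epsilon > 0$ such that $\alpha(n)\sigma^n \ge \epsilon$. Together with (3.5), we obtain
$$\sum_{k+l=n}\binom nk\lambda^k\frac{\alpha(l)}{\alpha(n)} \le \sum_{k+l=n}\binom nk\lambda^k\frac{M^n\alpha(l)}{\alpha(l)\alpha(k)} \le M^n\sum_{k+l=n}\binom nk\lambda^k\epsilon^{-1}\sigma^k = \epsilon^{-1}M^n(1+\sigma\lambda)^n.$$
Then, for $t \ge 0$, (3.6) becomes
$$e^{\lambda t}G_\alpha(t) \le \epsilon^{-1}\sum_{n=0}^\infty\frac{\alpha(n)}{n!}\big((1+\sigma\lambda)Mt\big)^n = \epsilon^{-1}G_\alpha\big((1+\sigma\lambda)Mt\big),$$
which proves (1). The proofs of (2) and (3) are similar.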
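For the Kondratiev–Streit weight $\alpha(n) = (n!)^\beta$ of Example 3.6, the hypotheses and conclusions used here can be checked directly:
- $\{\alpha(n)/n!\}$ and $\{(n!\,\alpha(n))^{-1}\}$ are log-concave.
- (3.4) holds with $M = 2^\beta$, because $\binom{m+n}{m} \le 2^{m+n}$.
- (3.5) holds with $M = 1$, because $m!\,n! \le (m+n)!$. This also gives Lemma 3.3 (2) with $M = 1$.

The Python sketch below tests these on finite ranges with $\beta = 1/2$ (our choice). It is a finite numerical check, not a proof, and the helper names are hypothetical.

```python
import math


def log_alpha(n, beta):
    """log alpha(n) for the Kondratiev-Streit weight alpha(n) = (n!)^beta."""
    return beta * math.lgamma(n + 1)


def is_log_concave(log_seq, n_max, tol=1e-12):
    """Check b(n) b(n+2) <= b(n+1)^2 for 0 <= n < n_max, working with log b."""
    return all(log_seq(n) + log_seq(n + 2) <= 2 * log_seq(n + 1) + tol for n in range(n_max))


def G_alpha(t, beta, n_max=400):
    """G_alpha(t) = sum_n alpha(n) t^n / n! = sum_n t^n / (n!)^(1 - beta), for t >= 0."""
    total, term = 0.0, 1.0
    for n in range(n_max + 1):
        total += term
        term *= t / (n + 1) ** (1 - beta)
    return total


beta, N = 0.5, 60
# (B3) with gamma = alpha: {(n!)^(beta - 1)} is log-concave.
b3 = is_log_concave(lambda n: log_alpha(n, beta) - math.lgamma(n + 1), N)
# (B4) with delta = alpha: {(n!)^(-1 - beta)} is log-concave.
b4 = is_log_concave(lambda n: -math.lgamma(n + 1) - log_alpha(n, beta), N)
# (3.4) with M = 2^beta, and (3.5) with M = 1.
log_M = beta * math.log(2.0)
ok_34 = all(log_alpha(m + n, beta) <= (m + n) * log_M + log_alpha(m, beta) + log_alpha(n, beta) + 1e-9
            for m in range(N) for n in range(N))
ok_35 = all(log_alpha(m, beta) + log_alpha(n, beta) <= log_alpha(m + n, beta) + 1e-9
            for m in range(N) for n in range(N))
# Lemma 3.3 (2) with M = 1 on a small grid of (s, t).
ok_lemma = all(G_alpha(s, beta) * G_alpha(t, beta) <= G_alpha(s + t, beta) * (1 + 1e-12)
               for s in (0.0, 0.5, 1.0, 2.0) for t in (0.0, 0.7, 1.5, 3.0))
print(b3, b4, ok_34, ok_35, ok_lemma)
```

Note that this weight also satisfies (B1) with $\sigma = 1$, since $\alpha(n) \ge 1$.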
Definition 3.4. Let $E \subset H = L^2(T, dt; \mathbb C) \subset E^*$ be a Gelfand triple constructed from a positive selfadjoint operator $A$ satisfying (A1)–(A6). Let $\alpha = \{\alpha(n)\}$ be a sequence of positive numbers satisfying (B1)–(B4). Then the Gelfand triple
$$\mathcal W \subset \Gamma(H) \subset \mathcal W^*, \qquad \mathcal W = \Gamma_\alpha(E), \tag{3.7}$$
is called the standard CKS-space over $H = L^2(T, dt; \mathbb C)$ with weight sequence $\alpha$.

Example 3.5. It is obvious that $\alpha(n) \equiv 1$ satisfies conditions (B1)–(B4). The standard CKS-space corresponding to $\alpha(n) \equiv 1$ is the Hida–Kubo–Takenaka space [60], often denoted by
$$(E) \subset \Gamma(H) \cong L^2(E_R^*, \mu; \mathbb C) \subset (E)^*.$$
In this case the generating function (3.3) is given by
$$G_\alpha(t) = G_{1/\alpha}(t) = \sum_{n=0}^\infty\frac{1}{n!}t^n = e^t.$$

Example 3.6. For $0 \le \beta < 1$ the positive sequence $\alpha(n) = (n!)^\beta$ satisfies (B1)–(B4). The corresponding CKS-space is called the Kondratiev–Streit space [55]. The generating functions
$$G_\alpha(t) = \sum_{n=0}^\infty\frac{(n!)^\beta}{n!}t^n = \sum_{n=0}^\infty\frac{t^n}{(n!)^{1-\beta}}, \qquad G_{1/\alpha}(t) = \sum_{n=0}^\infty\frac{1}{n!\,(n!)^\beta}t^n = \sum_{n=0}^\infty\frac{t^n}{(n!)^{1+\beta}},$$
do not admit concise expressions. Nevertheless, explicit calculation is possible to some extent, see Kuo [65]. More examples of positive sequences satisfying (B1)–(B4) were discussed by Cochran–Kuo–Sengupta [20].

Under the Wiener–Itô–Segal isomorphism $\Gamma(H) \cong L^2(E_R^*, \mu; \mathbb C)$, i.e., by the Gaussian realization, we may regard $\mathcal W$ as a space of functions on $E_R^*$. In this sense an element of $\mathcal W$ is called a test white noise function, and by duality an element of $\mathcal W^*$ is called a generalized white noise function or a white noise distribution. By construction $\phi \in \mathcal W$ is determined only up to $\mu$-null sets on $E_R^*$. Thanks to the continuous version theorem stated below we always assume that $\phi \in \mathcal W$ is a continuous function on $E_R^*$.

Theorem 3.7. For each $\phi \in \mathcal W$ there exists a unique continuous function $\tilde\phi$ on $E_R^*$ such that $\phi(x) = \tilde\phi(x)$ for $\mu$-a.e. $x \in E_R^*$. Moreover, $\tilde\phi(x)$ is given by the absolutely convergent series:
$$\tilde\phi(x) = \sum_{n=0}^\infty\langle :x^{\otimes n}:, f_n\rangle, \qquad x \in E_R^*,$$
where $(f_n) \in \Gamma(H)$ corresponds to the given $\phi$ under the Wiener–Itô–Segal isomorphism. The proof for the Hida–Kubo–Takenaka space is found in [76], and a small modification suffices for a general CKS-space.
For a generalized white noise function $\Phi = (F_n) \in \mathcal W^*$ we adopt its formal Gaussian realization:
$$\Phi(x) = \sum_{n=0}^\infty\langle :x^{\otimes n}:, F_n\rangle, \qquad x \in E_R^*.$$

Definition 3.8. With each $t \in T$ we associate $W_t = (0, \delta_t, 0, \dots) \in \mathcal W^*$. Then $\{W_t;\ t \in T\}$ is called the white noise field on $T$. When $T$ is a time interval, it is called the white noise process. It follows from (A5) and (A6) that $t \mapsto W_t \in \mathcal W^*$ is continuous.

Let $\{X_\xi;\ \xi \in E_R\}$ be the Gaussian system discussed in Section 2.2. From the obvious relation $X_\xi(x) = \langle x, \xi\rangle = \langle :x^{\otimes 1}:, \xi\rangle$ we see that $X_\xi$ is the Gaussian realization of $(0, \xi, 0, \dots) \in \mathcal W$. Let $\phi = (f_0, f_1, \dots) \in \mathcal W$. It follows from (3.2) that
$$\langle\!\langle W_t, \phi\rangle\!\rangle = \langle\delta_t, f_1\rangle = f_1(t), \qquad \langle\!\langle X_\xi, \phi\rangle\!\rangle = \langle\xi, f_1\rangle = \int_T\xi(t)f_1(t)\,dt,$$
hence we have
$$\langle\!\langle X_\xi, \phi\rangle\!\rangle = \int_T\xi(t)\langle\!\langle W_t, \phi\rangle\!\rangle\,dt.$$
In other words,
$$X_\xi = \int_T\xi(t)W_t\,dt, \tag{3.8}$$
where the right-hand side makes sense as an integral in $\mathcal W^*$. Formula (3.8) clarifies the role of the white noise field $\{W_t;\ t \in T\}$: although $W_t$ is not defined pointwise, smearing the parameter $t \in T$ by integration yields a Gaussian random variable. In this line it is convenient to write
$$\mathbb E[W_t] = 0, \qquad \mathbb E[W_sW_t] = \tau(s, t). \tag{3.9}$$
The latter means that the white noise field is pointwise independent. The somewhat informal notation $\tau(s, t) = \delta(s - t)$ is also used in the literature.
3.2. Brownian motion

Let us start with the real Gelfand triple $\mathcal S(\mathbb R) \subset H_R = L^2(\mathbb R, dt; \mathbb R) \subset \mathcal S'(\mathbb R)$ and the standard Gaussian space $(\mathcal S'(\mathbb R), \mu)$, see Example 2.13. We know that $\{X_\xi;\ \xi \in H_R\}$ is a Gaussian system. For each $t \ge 0$ we set
$$B_t(x) = X_{1_{[0,t]}}(x) = \langle x, 1_{[0,t]}\rangle, \qquad x \in E_R^*, \tag{3.10}$$
where $1_{[0,t]}$ stands for the indicator function of the interval $[0, t]$. Then $\{B_t;\ t \ge 0\}$ becomes a Gaussian system with mean $\mathbb E[B_t] = 0$ and covariance
$$\mathbb E[B_sB_t] = \langle 1_{[0,s]}, 1_{[0,t]}\rangle = \min\{s, t\}, \qquad s, t \ge 0.$$
In other words, $\{B_t\}$ is a Brownian motion starting at 0.

Remark 3.9. The above Brownian motion is sometimes called a weak Brownian motion, in the sense that nothing is said about the continuity of sample paths. By Kolmogorov's continuous version theorem, a Brownian motion is constructed from a weak Brownian motion by taking the continuous version.

The Paley–Wiener–Zygmund theorem says that almost all sample paths of Brownian motion are nowhere differentiable. Therefore, the time derivative of Brownian motion, i.e., the velocity of a Brownian particle:
$$W_t = \frac{d}{dt}B_t \tag{3.11}$$
is not defined sample-pathwise. We now employ a standard CKS-space, e.g., the Hida–Kubo–Takenaka space: $\mathcal W \subset \Gamma(H) \cong L^2(E_R^*, \mu; \mathbb C) \subset \mathcal W^*$. Note first the obvious relation:
$$\langle\delta_t, \xi\rangle = \xi(t) = \frac{d}{dt}\int_0^t\xi(s)\,ds = \frac{d}{dt}\langle 1_{[0,t]}, \xi\rangle, \qquad \xi \in \mathcal S(\mathbb R),\ t > 0,$$
which means that
$$\frac{d}{dt}1_{[0,t]} = \delta_t \quad\text{in } \mathcal S'(\mathbb R).$$
Now, in view of
$$B_t = (0, 1_{[0,t]}, 0, \dots) \in \Gamma(H) \subset \mathcal W^*, \qquad W_t = (0, \delta_t, 0, \dots) \in \mathcal W^*,$$
we see that $t \mapsto B_t \in L^2(E_R^*, \mu; \mathbb C) \cong \Gamma(H)$ is continuous but is not differentiable as a $\Gamma(H)$-valued function, whereas $t \mapsto B_t \in \mathcal W^*$ is differentiable (in fact, of $C^\infty$-class) and (3.11) holds. In that case, $\{W_t\}$ is called the white noise process.

A simple model of a particle moving in an environment with friction and random force is given by
$$m\frac{dv}{dt} = -\gamma v + W_t, \tag{3.12}$$
where $v = v(t)$ is the velocity of the particle, $m > 0$ the mass and $\gamma > 0$ a friction coefficient. Equation (3.12) is a Langevin equation in its simplest form. Since $t \mapsto W_t \in \mathcal W^*$ is continuous, there is no difficulty in using elementary calculus to write down the solution:
$$v(t) = \frac1m\int_0^te^{-\frac\gamma m(t-s)}W_s\,ds + v_0e^{-\frac\gamma mt}. \tag{3.13}$$
The right-hand side is justified in $\mathcal W^*$ and is understood as a white noise extension of a standard stochastic integral. It is necessary to verify the "regularity" of $\{v(t)\}$ for further analysis. In view of (3.8) we see that $\{v(t)\} \subset L^2(E_R^*, \mu; \mathbb R)$ is a Gaussian system. Then by direct calculation we have
$$\mathbb E[v(t)] = v_0e^{-\frac\gamma mt}, \qquad \mathrm{Cov}(v(s), v(t)) = \mathbb E[v(s)v(t)] - \mathbb E[v(s)]\,\mathbb E[v(t)] = \frac{1}{2\gamma m}\left(e^{-\frac\gamma m|s-t|} - e^{-\frac\gamma m(s+t)}\right).$$
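The mean and covariance formulas can be compared with a simulation. The sketch below is a plain Euler–Maruyama discretization of the Itô form $m\,dv = -\gamma v\,dt + dB_t$ of (3.12). In line with (3.11), it replaces $W_t\,dt$ by Brownian increments $dB_t$, and it estimates $\mathbb E[v(t)]$ and $\mathrm{Cov}(v(s), v(t))$ by Monte Carlo. The parameter values, step count, path count and seed are illustrative assumptions. Agreement holds only up to sampling and discretization error.

```python
import math
import random


def simulate_langevin(m, gamma, v0, t_end, n_steps, n_paths, seed=0):
    """Euler-Maruyama for m dv = -gamma v dt + dB_t; returns samples of v(t_end/2) and v(t_end)."""
    rng = random.Random(seed)
    dt = t_end / n_steps
    half = n_steps // 2
    mids, ends = [], []
    for _ in range(n_paths):
        v = v_mid = v0
        for k in range(1, n_steps + 1):
            dB = rng.gauss(0.0, math.sqrt(dt))       # Brownian increment with variance dt
            v += (-gamma * v * dt + dB) / m
            if k == half:
                v_mid = v
        mids.append(v_mid)
        ends.append(v)
    return mids, ends


def exact_moments(m, gamma, v0, s, t):
    """E[v(t)] and Cov(v(s), v(t)) from the closed-form expressions in the text."""
    k = gamma / m
    mean_t = v0 * math.exp(-k * t)
    cov = (math.exp(-k * abs(s - t)) - math.exp(-k * (s + t))) / (2.0 * gamma * m)
    return mean_t, cov


m, gamma, v0, T = 1.0, 1.0, 1.0, 2.0
mids, ends = simulate_langevin(m, gamma, v0, T, n_steps=100, n_paths=4000)
mean_mid = sum(mids) / len(mids)
mean_end = sum(ends) / len(ends)
cov_hat = sum((a - mean_mid) * (b - mean_end) for a, b in zip(mids, ends)) / (len(ends) - 1)
print(mean_end, cov_hat, exact_moments(m, gamma, v0, T / 2, T))
```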
3.3. The S-transform

Let us come back to a general CKS-space:
$$\mathcal W \subset \Gamma(H) \subset \mathcal W^*, \qquad \mathcal W = \Gamma_\alpha(E), \quad H = L^2(T, dt; \mathbb C).$$
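Definition 3.10. The S-transform of $\Phi = (F_n) \in \mathcal W^*$ is defined by
$$S\Phi(\xi) = \langle\!\langle\Phi, \phi_\xi\rangle\!\rangle = \sum_{n=0}^\infty\langle F_n, \xi^{\otimes n}\rangle, \qquad \xi \in E. \tag{3.14}$$

Remark 3.11. The Gaussian realization of the exponential vector $\phi_\xi$ is given in Proposition 2.11. Then the S-transform of $\phi \in \Gamma(H) \cong L^2(E_R^*, \mu; \mathbb C)$ takes the integral form:
$$S\phi(\xi) = \langle\!\langle\phi, \phi_\xi\rangle\!\rangle = e^{-\langle\xi,\xi\rangle/2}\int_{E_R^*}\phi(x)e^{\langle x,\xi\rangle}\mu(dx), \qquad \xi \in E, \tag{3.15}$$
where we note that $\phi(x)e^{\langle x,\xi\rangle}$ is an integrable function on $(E_R^*, \mu)$. Moreover, recall that the standard Gaussian measure $\mu$ on $E_R^*$ is quasi-invariant under translation by $\xi \in E_R$, and the Radon–Nikodym derivative is given by
$$\frac{\mu(dx - \xi)}{\mu(dx)} = \phi_\xi(x) = e^{\langle x,\xi\rangle - \langle\xi,\xi\rangle/2},$$
see e.g., Glimm–Jaffe [26], Obata [76], Yamasaki [94]. Then (3.15) becomes
$$S\phi(\xi) = \int_{E_R^*}\phi(x + \xi)\,\mu(dx), \qquad \xi \in E_R.$$
As a result, $S\phi(\xi)$ for a complex $\xi \in E$ is also obtained by analytic continuation of the above integral.

Example 3.12. For the white noise $\{W_t;\ t \in T\}$ we have
$$SW_t(\xi) = \langle\!\langle W_t, \phi_\xi\rangle\!\rangle = \xi(t), \qquad \xi \in E.$$
For $x \in E^*$ we define an exponential vector $\phi_x$ by the same formula as (2.10). Then
$$S\phi_x(\xi) = \langle\!\langle\phi_x, \phi_\xi\rangle\!\rangle = e^{\langle x,\xi\rangle}, \qquad \xi \in E.$$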
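We simplify to one dimension: a single real mode with $\mu = N(0, 1)$. There the Wick power $:x^n:$ is the Gaussian realization of $\Phi = (0, \dots, 0, 1, 0, \dots)$, with $F_n = 1$ in the $n$-th slot, so (3.14) predicts $S(:x^n:)(\xi) = \xi^n$. The sketch below evaluates both integral forms of the S-transform from Remark 3.11 by trapezoidal quadrature: the exponential form (3.15) and the translation form. The quadrature parameters and function names are assumptions.

```python
import math


def wick_power_1d(n, x):
    """:x^n: for the standard Gaussian on R, i.e. He_n(x): w_n = x w_{n-1} - (n-1) w_{n-2}."""
    if n == 0:
        return 1.0
    w_prev, w = 1.0, x
    for k in range(2, n + 1):
        w_prev, w = w, x * w - (k - 1) * w_prev
    return w


def gaussian_expectation(f, half_width=12.0, steps=2400):
    """Trapezoid approximation of the integral of f against mu = N(0, 1) on R."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * f(x) * math.exp(-x * x / 2.0)
    return total * h / math.sqrt(2.0 * math.pi)


def s_transform_translation(phi, xi):
    """S phi(xi) = integral of phi(x + xi) mu(dx), for real xi."""
    return gaussian_expectation(lambda x: phi(x + xi))


def s_transform_exponential(phi, xi):
    """S phi(xi) = e^{-xi^2/2} * integral of phi(x) e^{x xi} mu(dx), as in (3.15)."""
    return math.exp(-xi * xi / 2.0) * gaussian_expectation(lambda x: phi(x) * math.exp(x * xi))


for n in range(4):
    phi = lambda x, n=n: wick_power_1d(n, x)
    print(n, s_transform_translation(phi, 0.8), s_transform_exponential(phi, 0.8), 0.8 ** n)
```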
Theorem 3.13. The S-transform determines a white noise distribution uniquely.
Proof. Immediate from the fact that $\{\phi_\xi;\ \xi \in E\}$ spans a dense subspace of $\mathcal W$.

The S-transform is a function on $E$, which is a complex nuclear space. It is important to characterize the S-transform as a function on $E$. For regularity properties we recall some standard notions. A function $F : E \to \mathbb C$ is called entire if there exists $p \ge 0$ such that $F$ extends to a continuous
function on $E_p$ and the extension admits a power series expansion. A function $F : E \to \mathbb C$ is called Gâteaux-entire if for any $\xi, \eta \in E$ the function $z \mapsto F(\xi + z\eta)$ is an entire holomorphic function on the complex plane $\mathbb C$. If a Gâteaux-entire function $F : E \to \mathbb C$ is locally bounded on $E_p$ for some $p \ge 0$, it is entire [22]. Entire functions are classified by their growth rates. In order to estimate the growth rates we need the following functions:
$$\tilde G_\alpha(t) \equiv \sum_{n=0}^\infty\frac{n^{2n}}{n!\,\alpha(n)}\left(\inf_{s>0}\frac{G_\alpha(s)}{s^n}\right)t^n, \qquad \tilde G_{1/\alpha}(t) \equiv \sum_{n=0}^\infty\frac{n^{2n}\alpha(n)}{n!}\left(\inf_{s>0}\frac{G_{1/\alpha}(s)}{s^n}\right)t^n.$$
In fact, condition (B3) (resp. (B4)) holds if and only if $\tilde G_\alpha$ (resp. $\tilde G_{1/\alpha}$) has a positive radius of convergence $R_\alpha > 0$ (resp. $R_{1/\alpha} > 0$); for the proof see [2]. The following results are known as the characterization theorems for the S-transform and have many applications.

Theorem 3.14 (Characterization theorem for S-transform). A $\mathbb C$-valued function $F$ on $E$ is the S-transform of some $\Phi \in \mathcal W^*$, i.e., $F = S\Phi$, if and only if the following two conditions are satisfied:
(F1) $F$ is a Gâteaux-entire function on $E$;
(F2) there exist constants $C \ge 0$ and $p \in \mathbb R$ such that
$$|F(\xi)|^2 \le C\,G_\alpha(|\xi|_p^2), \qquad \xi \in E. \tag{3.16}$$
In that case, for all $s \ge 0$ satisfying $\|A^{-s}\|_{HS}^2 < R_\alpha$,
$$\|\Phi\|_{-(p+s),-}^2 \le C\,\tilde G_\alpha(\|A^{-s}\|_{HS}^2).$$

Theorem 3.15 (Characterization theorem for S-transform). A $\mathbb C$-valued function $F$ on $E$ is the S-transform of some $\Phi \in \mathcal W$, i.e., $F = S\Phi$, if and only if (F1) and the following condition are satisfied:
(F3) for all $p \ge 0$ there exists a constant $C \ge 0$ such that
$$|F(\xi)|^2 \le C\,G_{1/\alpha}(|\xi|_{-p}^2), \qquad \xi \in E.$$
In that case, for all $s \ge 0$ satisfying $\|A^{-s}\|_{HS}^2 < R_{1/\alpha}$,
$$\|\Phi\|_{p-s,+}^2 \le C\,\tilde G_{1/\alpha}(\|A^{-s}\|_{HS}^2).$$
An outline of the proof of Theorem 3.14 is divided into Lemmas 3.16–3.18 below.

Lemma 3.16. The S-transform $F = S\Phi$ of $\Phi \in \mathcal W^*$ satisfies (F1) and (F2).
Proof. Let $\Phi = (F_n) \in \mathcal W^*$. Then by definition $\Phi \in \Gamma_{1/\alpha}(E_{-p})$ for some $p \ge 0$. With the help of the estimate
$$\sum_{n=0}^\infty|\langle F_n, \xi^{\otimes n}\rangle| \le \sum_{n=0}^\infty|F_n|_{-p}|\xi^{\otimes n}|_p \le \left(\sum_{n=0}^\infty\frac{n!}{\alpha(n)}|F_n|_{-p}^2\right)^{1/2}\left(\sum_{n=0}^\infty\frac{\alpha(n)}{n!}|\xi|_p^{2n}\right)^{1/2} = \|\Phi\|_{-p,-}\,G_\alpha(|\xi|_p^2)^{1/2}, \tag{3.17}$$
we see that the series (3.14) converges and is holomorphic on $E_p$. Therefore $S\Phi$ is a Gâteaux-entire function on $E$ and condition (F1) is fulfilled. The growth estimate (3.16) follows immediately from (3.17).

Lemma 3.17. Let $F : E \to \mathbb C$ be a Gâteaux-entire function. Assume that there exist an entire function $G$ on $\mathbb C$ and $p \in \mathbb R$ such that
$$|F(\xi)|^2 \le G(|\xi|_p^2), \qquad \xi \in E.$$
Then the $n$-th Gâteaux derivative
$$F_n(\xi_1, \dots, \xi_n) = \frac{1}{n!}D_{\xi_1}\cdots D_{\xi_n}F(0) \tag{3.18}$$
becomes a continuous $n$-linear form on $E$ satisfying
$$|F_n|_{-(p+s)}^2 \le \left(\frac{n^n}{n!}\right)^2\left(\inf_{t>0}\frac{G(t)}{t^n}\right)\|A^{-s}\|_{HS}^{2n} \tag{3.19}$$
for all $s \ge 0$ such that $A^{-s}$ is of Hilbert–Schmidt type.
Proof. Note first that $z \mapsto F(z\xi)$ is an entire function on $\mathbb C$. Its Taylor expansion is given by
$$F(z\xi) = \sum_{n=0}^\infty F_n(\xi, \dots, \xi)z^n, \qquad \xi \in E,$$
and the coefficient is estimated by Cauchy's integral formula:
$$|F_n(\xi, \dots, \xi)| \le |\xi|_p^n\left(\inf_{t>0}\frac{G(t)}{t^n}\right)^{1/2}.$$
Then (3.19) follows by the polarization formula and by explicit calculation with the Fourier expansion of $F_n$ in terms of eigenvectors of $A$.

Lemma 3.18. Let $F$ be a $\mathbb C$-valued function on $E$ satisfying (F1) and (F2). For each $n \ge 0$ let $F_n$ be the $n$-th Gâteaux derivative defined by (3.18) and put $\Phi = (F_n)$. Then we have
$$\|\Phi\|_{-(p+s),-}^2 \le C\,\tilde G_\alpha(\|A^{-s}\|_{HS}^2), \tag{3.20}$$
where $s \ge 0$ is taken so as to satisfy $\|A^{-s}\|_{HS}^2 < R_\alpha$. In particular, $\Phi \in \mathcal W^*$.
Proof. From Lemma 3.17 we see that
$$\|\Phi\|_{-(p+s),-}^2 = \sum_{n=0}^\infty\frac{n!}{\alpha(n)}|F_n|_{-(p+s)}^2 \le \sum_{n=0}^\infty\frac{n!}{\alpha(n)}\left(\frac{n^n}{n!}\right)^2\left(\inf_{t>0}\frac{C\,G_\alpha(t)}{t^n}\right)\|A^{-s}\|_{HS}^{2n} = C\,\tilde G_\alpha(\|A^{-s}\|_{HS}^2).$$
Since $\lim_{s\to\infty}\|A^{-s}\|_{HS} = 0$, we may choose a sufficiently large $s \ge 0$ such that $\|A^{-s}\|_{HS}^2 < R_\alpha$. Then we come to (3.20).

Similarly we can prove the following.

Lemma 3.19. Let $F : E \to \mathbb C$ be a Gâteaux-entire function. Assume that there exist constants $C \ge 0$ and $p \in \mathbb R$ such that
$$|F(\xi)|^2 \le C\,G_{1/\alpha}(|\xi|_{-p}^2), \qquad \xi \in E. \tag{3.21}$$
For each $n \ge 0$ let $F_n$ be the $n$-th Gâteaux derivative defined by (3.18) and put $\Phi = (F_n)$. Then we have
$$\|\Phi\|_{p-s,+}^2 \le C\,\tilde G_{1/\alpha}(\|A^{-s}\|_{HS}^2). \tag{3.22}$$

Now we come back to Theorem 3.15. The "only if" part is similar to Lemma 3.16. Suppose that a $\mathbb C$-valued function $F$ on $E$ satisfies (F1) and (F3). Then (3.22) holds for all $p \ge 0$, which implies that $\Phi \in \mathcal W$. This proves Theorem 3.15.
Remark 3.20. For the Hida–Kubo–Takenaka space, i.e., the case $\alpha(n) \equiv 1$, the conditions (F2) and (F3) become simpler since $G_\alpha(t) = G_{1/\alpha}(t) = e^t$. Moreover, although we do not know a concise expression for $\tilde G_\alpha(t) = \tilde G_{1/\alpha}(t)$, we may use the following inequality:
$$\tilde G_\alpha(t) = \tilde G_{1/\alpha}(t) = \sum_{n=0}^\infty\frac{n^n}{n!}(et)^n \le (1 - e^2t)^{-1}, \qquad 0 \le t < e^{-2},$$
which is verified by a sharp form of Stirling's formula due to Robbins:
$$\sqrt{2\pi n}\left(\frac ne\right)^ne^{\frac{1}{12n+1}} < n! < \sqrt{2\pi n}\left(\frac ne\right)^ne^{\frac{1}{12n}}, \qquad n = 1, 2, \dots.$$

The S-transform was first introduced by Kubo–Takenaka [60] as a variant of the T-transform of Hida [30]. It was Potthoff–Streit [87] who first proved the characterization theorem for white noise distributions in terms of the S-transform in the case of the Hida–Kubo–Takenaka space; soon after, Kuo–Potthoff–Streit [67] obtained a similar characterization for test white noise functions. Since then many relevant works have appeared, see e.g., [20, 43, 55, 59]. Among others, motivated by the CKS-space [20], Asai–Kubo–Kuo [2] sought minimal conditions under which a characterization theorem of the same type holds. These conditions are listed in (B1)–(B4).
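Both inequalities in Remark 3.20 are easy to test numerically. The sketch below checks Robbins' bounds in logarithmic form via `math.lgamma` for $1 \le n \le 100$; beyond that range the gaps between the three quantities shrink toward floating-point resolution. It then compares a partial sum of $\tilde G_\alpha(t)$ with $(1 - e^2t)^{-1}$ for one value $t < e^{-2}$. The bound follows from $n^n/n! \le e^n$, so each term is at most $(e^2t)^n$. Function names and the truncation level are our own.

```python
import math


def robbins_bounds_hold(n):
    """Check sqrt(2 pi n)(n/e)^n e^{1/(12n+1)} < n! < sqrt(2 pi n)(n/e)^n e^{1/(12n)} in log form."""
    log_stirling = 0.5 * math.log(2.0 * math.pi * n) + n * (math.log(n) - 1.0)
    log_fact = math.lgamma(n + 1)
    return log_stirling + 1.0 / (12 * n + 1) < log_fact < log_stirling + 1.0 / (12 * n)


def g_tilde_hkt(t, n_max=2000):
    """Partial sum of sum_n n^n (e t)^n / n! for 0 < t < e^{-2}; the n = 0 term is 1."""
    total = 1.0
    for n in range(1, n_max + 1):
        total += math.exp(n * math.log(n) + n * (1.0 + math.log(t)) - math.lgamma(n + 1))
    return total


print(all(robbins_bounds_hold(n) for n in range(1, 101)))
print(g_tilde_hkt(0.1), 1.0 / (1.0 - math.e ** 2 * 0.1))
```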
3.4. Infinite dimensional holomorphic functions

In this subsection we review another construction of a Gelfand triple $\mathcal W \subset \Gamma(H) \subset \mathcal W^*$, introduced by Gannoun–Hachaichi–Ouerdiane–Rezgui [23]. Let $\theta$ be a Young function, i.e., a continuous, convex, and increasing function defined on $[0, \infty)$ such that
$$\theta(0) = 0 \quad\text{and}\quad \lim_{x\to\infty}\frac{\theta(x)}{x} = \infty.$$
For each $p \in \mathbb R$ and $m > 0$ let $\mathrm{Exp}(E_p, \theta, m)$ be the space of entire functions $f$ on $E_p$ satisfying the condition:
$$\|f\|_{\theta,p,m} = \sup_{x\in E_p}|f(x)|e^{-\theta(m|x|_p)} < \infty.$$
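Then $\mathrm{Exp}(E_p, \theta, m)$ becomes a Banach space with norm $\|\cdot\|_{\theta,p,m}$. We define
$$\mathcal F_\theta(E^*) = \mathop{\mathrm{proj\,lim}}_{p\to\infty,\ m\to+0}\mathrm{Exp}(E_{-p}, \theta, m).$$
An element $f \in \mathcal F_\theta(E^*)$ is a holomorphic function on $E^*$ with exponential growth of order $\theta$ and of minimal type. Every $f \in \mathcal F_\theta(E^*)$ admits a Taylor expansion of the form:
$$f(x) = \sum_{n=0}^\infty\langle x^{\otimes n}, f_n\rangle, \qquad x \in E^*.$$
Let $F_\theta(E)$ be the space of all Taylor coefficients $(f_n)$ obtained from $f \in \mathcal F_\theta(E^*)$. It is known that
$$F_\theta(E) = \bigcap_{p\ge0,\ m>0}F_{\theta,m}(E_p),$$
where
$$F_{\theta,m}(E_p) = \left\{\phi = (f_n);\ f_n \in E_p^{\hat\otimes n},\ \sum_{n=0}^\infty\theta_n^{-2}m^{-n}|f_n|_p^2 < \infty\right\}, \qquad \theta_n = \inf_{r>0}\frac{e^{\theta(r)}}{r^n}, \quad n = 0, 1, 2, \dots.$$
Moreover, equipped with the projective limit topology, $F_\theta(E)$ is a countable Hilbert nuclear space and is isomorphic to $\mathcal F_\theta(E^*)$. The correspondence $T : f \mapsto (f_n)$, $f \in \mathcal F_\theta(E^*)$, is called the Taylor map. The dual space of $F_\theta(E)$ is given by
$$G_\theta(E^*) = \bigcup_{p\ge0,\ m>0}G_{\theta,m}(E_{-p}),$$
where
$$G_{\theta,m}(E_{-p}) = \left\{\Phi = (F_n);\ F_n \in E_{-p}^{\hat\otimes n},\ \sum_{n=0}^\infty(n!\,\theta_n)^2m^n|F_n|_{-p}^2 < \infty\right\}.$$
The canonical $\mathbb C$-bilinear form on $G_\theta(E^*)\times F_\theta(E)$ is given by
$$\langle\!\langle\Phi, \phi\rangle\!\rangle = \sum_{n\ge0}n!\,\langle F_n, f_n\rangle, \qquad \Phi = (F_n),\ \phi = (f_n).$$
Under the condition
$$\lim_{x\to+\infty}\frac{\theta(x)}{x^2} < \infty,$$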
we obtain a Gelfand triple:
$$F_\theta(E) \subset \Gamma(H) \subset G_\theta(E^*). \tag{3.23}$$
This construction was developed independently of the CKS-space, but it was shown by Asai–Kubo–Kuo [2] that the resulting classes of Gelfand triples coincide. For each $\Phi = (F_n) \in G_\theta(E^*)$ we define a holomorphic function $F$ on $E$ by
$$F(\xi) = \sum_{n=0}^\infty\langle F_n, \xi^{\otimes n}\rangle, \qquad \xi \in E. \tag{3.24}$$
This is called the holomorphic realization and is denoted by $F = H\Phi$. Then by restriction $H$ maps $F_\theta(E)$ and $G_\theta(E^*)$ to spaces of holomorphic functions on $E$. Define
$$\mathcal G_{\theta^*}(E) = \mathop{\mathrm{ind\,lim}}_{p\to\infty,\ m\to+0}\mathrm{Exp}(E_p, \theta^*, m), \qquad \theta^*(x) = \sup_{t\ge0}\{tx - \theta(t)\}, \quad x \ge 0.$$
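For a concrete Young function the quantities $\theta_n$ and $\theta^*$ can be computed numerically. The sketch below uses $\theta(r) = r^2/2$, an example of our own choosing. It is convex and increasing, with $\theta(0) = 0$, $\theta(x)/x \to \infty$ and $\theta(x)/x^2 \to 1/2$. For it, $\theta_0 = 1$, $\theta_n = (e/n)^{n/2}$ for $n \ge 1$ (attained at $r = \sqrt n$), and $\theta^*(x) = x^2/2$. Both optimizations are convex in the relevant variable, so a simple ternary search suffices. The search bounds and iteration count are assumptions.

```python
import math


def ternary_min(f, lo, hi, iters=200):
    """Minimize a unimodal (e.g. convex) function on [lo, hi] by ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2.0


def theta_n(theta, n, r_max=1e3):
    """theta_n = inf_{r>0} e^{theta(r)} / r^n, found by minimizing theta(r) - n log r."""
    if n == 0:
        return math.exp(theta(0.0))   # theta is increasing, so the infimum is approached as r -> 0
    r = ternary_min(lambda r: theta(r) - n * math.log(r), 1e-12, r_max)
    return math.exp(theta(r) - n * math.log(r))


def legendre(theta, x, t_max=1e3):
    """theta*(x) = sup_{t >= 0} (t x - theta(t)), found by minimizing theta(t) - t x."""
    t = ternary_min(lambda t: theta(t) - t * x, 0.0, t_max)
    return t * x - theta(t)


def theta(r):
    """Example Young function theta(r) = r^2 / 2."""
    return r * r / 2.0


for n in range(5):
    print(n, theta_n(theta, n), (math.e / n) ** (n / 2) if n else 1.0)
print(legendre(theta, 2.0))   # compare with 2.0**2 / 2
```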
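An element of $\mathcal G_{\theta^*}(E)$ is a holomorphic function on $E$ with exponential growth of order $\theta^*$ and of arbitrary type. It is proved that the map $H$ is a topological isomorphism from $G_\theta(E^*)$ onto $\mathcal G_{\theta^*}(E)$, and that $H|_{F_\theta(E)} = T^{-1}$.

From the above discussion we see that the dual space of $\mathcal F_\theta(E^*)$ is isomorphic to $\mathcal G_{\theta^*}(E)$. We examine this isomorphism in more detail. Since $T : \mathcal F_\theta(E^*) \to F_\theta(E)$ is a topological isomorphism, so is its dual map $T^* : F_\theta^*(E) \to \mathcal F_\theta^*(E^*)$. Thus we obtain the composition:
$$\mathcal L : \mathcal F_\theta^*(E^*) \xrightarrow{\;(T^*)^{-1}\;} F_\theta^*(E) \xrightarrow{\;\cong\;} G_\theta(E^*) \xrightarrow{\;H\;} \mathcal G_{\theta^*}(E).$$
Let $F \in \mathcal F_\theta^*(E^*)$ and $(T^*)^{-1}F = \Phi = (F_n) \in G_\theta(E^*)$. Then we have
$$\mathcal LF(\xi) = H\Phi(\xi) = \sum_{n=0}^\infty\langle F_n, \xi^{\otimes n}\rangle, \qquad \xi \in E. \tag{3.25}$$
On the other hand, for an exponential vector $\phi_\xi \in F_\theta(E)$, $\xi \in E$, the holomorphic realization is given by
$$e_\xi(x) = H\phi_\xi(x) = \sum_{n=0}^\infty\left\langle x^{\otimes n}, \frac{\xi^{\otimes n}}{n!}\right\rangle = e^{\langle x,\xi\rangle}, \qquad x \in E.$$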
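The obvious extension to $E^*$ is denoted by the same symbol. Then $e_\xi \in \mathcal F_\theta(E^*)$. We now compute the canonical pairing of $F \in \mathcal F_\theta^*(E^*)$ and $e_\xi$ as follows:
$$\langle F, e_\xi\rangle_{\mathcal F_\theta^*(E^*)\times\mathcal F_\theta(E^*)} = \langle\!\langle\Phi, \phi_\xi\rangle\!\rangle = \sum_{n=0}^\infty n!\left\langle F_n, \frac{\xi^{\otimes n}}{n!}\right\rangle = \sum_{n=0}^\infty\langle F_n, \xi^{\otimes n}\rangle,$$
where $\langle\!\langle\Phi, \phi_\xi\rangle\!\rangle$ is the canonical bilinear form associated with (3.23). In view of (3.25) we come to
$$\mathcal LF(\xi) = \langle F, e_\xi\rangle_{\mathcal F_\theta^*(E^*)\times\mathcal F_\theta(E^*)}.$$
Consequently, the Laplace transform $\mathcal L : \mathcal F_\theta^*(E^*) \to \mathcal G_{\theta^*}(E)$ is a topological isomorphism.

Remark 3.21. Since the holomorphic realization (3.24) coincides with the S-transform by definition, we conclude that
$$\mathcal LF(\xi) = \langle F, e_\xi\rangle_{\mathcal F_\theta^*(E^*)\times\mathcal F_\theta(E^*)} = \langle\!\langle\tilde F, \phi_\xi\rangle\!\rangle = S\tilde F(\xi), \qquad \xi \in E,$$
where $F$ and $\tilde F$ are related under the isomorphism $\mathcal F_\theta^*(E^*) \cong G_\theta(E^*)$. As a result, the topological isomorphism $\mathcal L : \mathcal F_\theta^*(E^*) \to \mathcal G_{\theta^*}(E)$ is nothing other than the characterization theorem for the S-transform. A characterization theorem for the S-transform in terms of infinite dimensional holomorphic functions was also established by Lee [69] in the case of the Hida–Kubo–Takenaka space.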
4. White Noise Operators

4.1. White noise operators and their symbols

A continuous linear operator from $\mathcal W$ into $\mathcal W^*$ is called a white noise operator. The space of white noise operators is denoted by $L(\mathcal W, \mathcal W^*)$ and is equipped with the bounded convergence topology. Since the natural inclusion $\mathcal W \to \mathcal W^*$ is continuous, we have a continuous inclusion $L(\mathcal W, \mathcal W) \subset L(\mathcal W, \mathcal W^*)$. By restriction, we also have $L(\mathcal W^*, \mathcal W^*) \subset L(\mathcal W, \mathcal W^*)$.
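Thus the white noise operators cover a wide class of Fock space operators. In particular, every bounded operator on the Fock space $\Gamma(H)$ is a white noise operator. Since $\mathcal W$ is a nuclear space, applying the kernel theorem we have the canonical isomorphism:
$$L(\mathcal W, \mathcal W^*) \cong \mathcal W^*\otimes\mathcal W^*, \tag{4.1}$$
which is defined by
$$\langle\!\langle\Xi\phi, \psi\rangle\!\rangle = \langle\!\langle\Xi_K, \psi\otimes\phi\rangle\!\rangle, \qquad \phi, \psi \in \mathcal W.$$
Then the map $\Xi \mapsto \Xi_K$ gives rise to isomorphisms:
$$L(\mathcal W^*, \mathcal W) \cong \mathcal W\otimes\mathcal W, \quad L(\mathcal W, \mathcal W) \cong \mathcal W\otimes\mathcal W^*, \quad L(\mathcal W^*, \mathcal W^*) \cong \mathcal W^*\otimes\mathcal W, \quad L(\mathcal W^*, \Gamma(H)) \cong \Gamma(H)\otimes\mathcal W, \quad\text{etc.}$$
Note that the Hilbert space tensor product $\Gamma(H)\otimes\Gamma(H)$ is isomorphic to the space of Hilbert–Schmidt operators $L_2(\Gamma(H), \Gamma(H))$. For $\Xi \in L(\mathcal W, \mathcal W^*)$ the adjoint operator $\Xi^* \in L(\mathcal W, \mathcal W^*)$ is defined by
$$\langle\!\langle\Xi^*\phi, \psi\rangle\!\rangle = \langle\!\langle\Xi\psi, \phi\rangle\!\rangle, \qquad \phi, \psi \in \mathcal W.$$
The right-hand side is sometimes denoted informally by $\langle\!\langle\phi, \Xi\psi\rangle\!\rangle$.

Remark 4.1. Let $\Xi$ be an operator in $\Gamma(H)$ with dense domain. Then the "Hilbert space adjoint operator" $\Xi^\dagger$ is defined by
$$\langle\!\langle\Xi^\dagger\phi|\psi\rangle\!\rangle = \langle\!\langle\phi|\Xi\psi\rangle\!\rangle, \qquad \phi \in \mathrm{Dom}(\Xi^\dagger),\ \psi \in \mathrm{Dom}(\Xi).$$
Then for $\Xi \in L(\mathcal W, \Gamma(H))$ we have
$$\overline{\Xi^\dagger\phi} = \Xi^*\bar\phi, \qquad \phi \in \mathrm{Dom}(\Xi^\dagger)\cap\mathcal W.$$
In this paper we mostly use the adjoint $\Xi^*$ with respect to the canonical $\mathbb C$-bilinear form.

Proposition 4.2. It holds that
$$L(\mathcal W, \mathcal W^*) = \bigcup_{p\ge0}L_2(\Gamma_\alpha(E_p), \Gamma_{1/\alpha}(E_{-p})) = \bigcup_{p\ge0}L(\Gamma_\alpha(E_p), \Gamma_{1/\alpha}(E_{-p})).$$
Proof. Since $L(\mathcal W, \mathcal W^*) \cong (\mathcal W\otimes\mathcal W)^*$ and $\mathcal W = \mathrm{proj\,lim}_{p\to\infty}\Gamma_\alpha(E_p)$, we see that
$$(\mathcal W\otimes\mathcal W)^* \cong \bigcup_{p\ge0}\Gamma_{1/\alpha}(E_{-p})\otimes\Gamma_{1/\alpha}(E_{-p}) \cong \bigcup_{p\ge0}L_2(\Gamma_\alpha(E_p), \Gamma_{1/\alpha}(E_{-p})),$$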
where we used a general fact concerning the π-tensor product, the Hilbert space tensor product and nuclearity. This proves the first relation. For the second, it suffices to show that every $\Xi \in L(\Gamma_\alpha(E_p), \Gamma_\alpha(E_{-p}))$ for some $p \ge 0$ is a white noise operator, which is clear from the definition.

Definition 4.3. The symbol of a white noise operator $\Xi \in L(W, W^*)$ is the function on $E \times E$ defined by
$$\widehat{\Xi}(\xi, \eta) = \langle\!\langle \Xi\phi_\xi, \phi_\eta \rangle\!\rangle, \qquad \xi, \eta \in E.$$
Similarly, the Wick symbol is defined by
$$\widetilde{\Xi}(\xi, \eta) = \langle\!\langle \Xi\phi_\xi, \phi_\eta \rangle\!\rangle e^{-\langle \xi, \eta \rangle}, \qquad \xi, \eta \in E.$$
The above definition is due to Berezin [8] and Krée–Rączka [57]. Since the symbol and the Wick symbol are related simply by $\widehat{\Xi}(\xi, \eta) = \widetilde{\Xi}(\xi, \eta) e^{\langle \xi, \eta \rangle}$, our results will be stated mostly in terms of the symbol.

Theorem 4.4. The symbol determines a white noise operator uniquely.

Proof. Straightforward from the fact that $\{\phi_\xi ;\, \xi \in E\}$ spans a dense subspace of W. See also the proof of Theorem 3.13.

4.2. Quantum white noise

We are now in a position to introduce the most fundamental white noise operators. With each $f \in E^*$ we associate the annihilation operator $a(f)$ defined by
$$a(f) : \phi = (f_n)_{n=0}^\infty \mapsto \big((n+1)\, f \otimes_1 f_{n+1}\big)_{n=0}^\infty,$$
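where $f \otimes_1 f_n$ stands for the right contraction defined by
$$\langle f \otimes_1 f_n, \xi_1 \otimes \cdots \otimes \xi_{n-1} \rangle = \langle f_n, \xi_1 \otimes \cdots \otimes \xi_{n-1} \otimes f \rangle.$$
We prove below that $a(f) \in L(W, W)$. The adjoint operator $a^*(f) \in L(W^*, W^*)$ is defined by
$$\langle\!\langle a^*(f)\Phi, \phi \rangle\!\rangle = \langle\!\langle \Phi, a(f)\phi \rangle\!\rangle, \qquad \Phi \in W^*, \quad \phi \in W.$$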
We call $a^*(f)$ the creation operator. It is straightforward to see that
$$a^*(f) : \phi = (f_n)_{n=0}^\infty \mapsto (f \,\widehat{\otimes}\, f_{n-1})_{n=0}^\infty$$
(understanding that $f_{-1} = 0$).
In particular, for each $t \in T$ we put
$$a_t = a(\delta_t), \qquad a_t^* = a^*(\delta_t).$$
These are called the annihilation operator and the creation operator at the point t, respectively. The pair $\{a_t, a_t^* ;\, t \in T\}$ is called the quantum white noise (field) on T, or a free Boson field on T in the physics literature [25, 26].

Lemma 4.5. For any $p \ge 0$ and $q > q_0$ there exists a constant $C = C(p, q) \ge 0$ such that
$$\|a(f)\phi\|_{p,+} \le C\, |f|_{-(p+q)}\, \|\phi\|_{p+q,+}, \qquad f \in E^*, \quad \phi \in W.$$
Here $q_0 \ge 0$ is a constant determined by W. In particular, $a(f) \in L(W, W)$.

Proof.
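For $f \in E^*$ and $\phi = (f_n) \in W$ we have by definition
$$a(f)\phi = \big((n+1)\, f \otimes_1 f_{n+1}\big)_{n=0}^\infty.$$
Then we obtain
$$\|a(f)\phi\|_{p,+}^2 = \sum_{n=0}^\infty n!\,\alpha(n)\, |(n+1)\, f \otimes_1 f_{n+1}|_p^2 \le \sum_{n=0}^\infty (n+1)(n+1)!\,\alpha(n)\,\rho^{2qn}\, |f|_{-(p+q)}^2\, |f_{n+1}|_{p+q}^2, \tag{4.2}$$
where we used the inequality
$$|f \otimes_1 f_{n+1}|_p \le \rho^{qn}\, |f|_{-(p+q)}\, |f_{n+1}|_{p+q}, \qquad p \ge 0, \quad q \ge 0,$$
which follows from a simple application of the Schwarz inequality. Then (4.2) becomes
$$\begin{aligned} \|a(f)\phi\|_{p,+}^2 &\le |f|_{-(p+q)}^2 \sum_{n=0}^\infty (n+1)\rho^{2qn}\, \frac{\alpha(n)}{\alpha(n+1)}\, (n+1)!\,\alpha(n+1)\, |f_{n+1}|_{p+q}^2 \\ &\le |f|_{-(p+q)}^2\, \|\phi\|_{p+q,+}^2\, \sup_{n \ge 0} (n+1)\rho^{2qn}\, \frac{\alpha(n)}{\alpha(n+1)}. \end{aligned} \tag{4.3}$$
It is known (see the proof of Lemma 3.3) that there exists a constant $M \ge 1$ such that
$$\alpha(1)\alpha(n) \le M^{n+1}\alpha(n+1).$$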
Then the coefficient in (4.3) is estimated as
$$\sup_{n \ge 0} (n+1)\rho^{2qn}\, \frac{\alpha(n)}{\alpha(n+1)} \le \frac{M}{\alpha(1)} \sup_{n \ge 0} (n+1)(\rho^{2q}M)^n,$$
which is finite for all $q > q_0$, where $q_0 \ge 0$ is determined by $\rho^{2q_0}M = 1$. Thus, for $f \in E^*$ we have $a(f) \in L(W, W)$, and hence $a^*(f) \in L(W^*, W^*)$.

If $f = \zeta \in E$, the corresponding annihilation and creation operators are more regular in the following sense.

Lemma 4.6. If $\zeta \in E$, then $a(\zeta)$ extends to a continuous linear operator from $W^*$ into itself, and $a^*(\zeta)$ restricted to W is a continuous linear operator from W into itself.

The proof is similar to that of Lemma 4.5. For simplicity, the extension and the restriction mentioned in Lemma 4.6 are denoted by the same symbols. Thus we have $a^*(\zeta) \in L(W, W)$ and $a(\zeta) \in L(W^*, W^*)$ for $\zeta \in E$.

Theorem 4.7 (Canonical commutation relation). For $\xi, \eta \in E$ it holds that
$$[a(\xi), a(\eta)] = 0, \qquad [a^*(\xi), a^*(\eta)] = 0, \qquad [a(\xi), a^*(\eta)] = \langle \xi, \eta \rangle I, \tag{4.4}$$
where the left-hand sides are well-defined white noise operators in $L(W, W^*)$.

Proof. We only need to note that the composition $[a(\xi), a^*(\eta)] = a(\xi)a^*(\eta) - a^*(\eta)a(\xi)$ is well defined as a white noise operator by Lemma 4.6. The commutators are then easily computed from the definitions.

Proposition 4.8. Both maps $t \mapsto a_t \in L(W, W)$ and $t \mapsto a_t^* \in L(W^*, W^*)$ are continuous. Moreover, the former is an element of $L(W, W) \otimes E$, i.e., an $L(W, W)$-valued test function on T, and the latter belongs to $L(W^*, W^*) \otimes E$, i.e., it is an $L(W^*, W^*)$-valued test function on T.

Proof. It follows from Lemma 4.5 that the correspondence $a : f \mapsto a(f)$ is a continuous linear map from $E_{-p}$ into $L(W, W)$ for all $p > q_0$. Namely, a belongs to the space
$$L(E_{-p}, L(W, W)) \cong L(W, W) \otimes E_p, \qquad p > q_0.$$
Since this holds for every $p > q_0$, a belongs to $L(W, W) \otimes E$. The second statement is proved in a similar manner.
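The relations (4.4) can be tested by hand in a toy model. The sketch below is only an illustration under a simplifying assumption not made in the text: the one-particle space is one-dimensional, so every $f_n$ is a number and the contraction $f \otimes_1 f_{n+1}$ becomes a product. A vector $\phi = (f_n)$ is truncated to a finite list, and the helper names are hypothetical.

```python
from fractions import Fraction

# One-mode toy model (assumption): phi = (f_0, ..., f_N) with scalar f_n.

def annihilate(f, phi):
    """a(f): n-th coefficient is (n + 1) * f * f_{n+1}; the list shrinks by one."""
    return [(n + 1) * f * phi[n + 1] for n in range(len(phi) - 1)]

def create(f, phi):
    """a*(f): n-th coefficient is f * f_{n-1}, with f_{-1} = 0."""
    return [0] + [f * c for c in phi]

def commutator_a_astar(xi, eta, phi):
    """[a(xi), a*(eta)] phi = a(xi) a*(eta) phi - a*(eta) a(xi) phi."""
    left = annihilate(xi, create(eta, phi))
    right = create(eta, annihilate(xi, phi))
    return [u - v for u, v in zip(left, right)]

phi = [Fraction(3), Fraction(-1, 2), Fraction(5), Fraction(7, 3)]
xi, eta = Fraction(2), Fraction(-3, 4)
# By (4.4) this should equal <xi, eta> phi, i.e. xi * eta * phi in one mode.
ccr = commutator_a_astar(xi, eta, phi)
```

Both compositions keep the list length, so the comparison with $\xi\eta\,\phi$ is exact and involves no truncation error.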
Proposition 4.9. It holds that
$$a(f)\phi_\xi = \langle f, \xi \rangle \phi_\xi, \qquad f \in E^*, \quad \xi \in E.$$
In other words, an exponential vector is an eigenvector of every annihilation operator. The proof is straightforward. As for the creation operators, we note the following.

Proposition 4.10. It holds that
$$a^*(f)\phi_\xi = \frac{d}{dt}\Big|_{t=0} \phi_{\xi+tf},$$
where $f, \xi \in E^*$ and the limit is taken in $W^*$. If $f, \xi \in E$, the limit is taken in W.

Proof.
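Let $f, \xi \in E^*$. For $t \in \mathbb{R}$, $t \ne 0$, we define $\Phi_t \in W^*$ by
$$\Phi_t = \frac{\phi_{\xi+tf} - \phi_\xi}{t} - a^*(f)\phi_\xi.$$
Then by definition we have
$$S\Phi_t(\eta) = \Big(\frac{e^{t\langle f, \eta \rangle} - 1}{t} - \langle f, \eta \rangle\Big) \langle\!\langle \phi_\xi, \phi_\eta \rangle\!\rangle.$$
Using the inequality
$$|e^{\alpha t} - 1 - \alpha t| \le \frac{|\alpha|^2}{2}\, t^2 e^{|\alpha t|}, \qquad \alpha \in \mathbb{C}, \quad t \in \mathbb{R},$$
we obtain
$$\begin{aligned} |S\Phi_t(\eta)|^2 &\le \frac14 |\langle f, \eta \rangle|^4\, t^2 e^{2|\langle f, \eta \rangle t|}\, \|\phi_\xi\|_{-p,-}^2\, \|\phi_\eta\|_{p,+}^2 \\ &\le \frac14 |f|_{-p}^4\, |\eta|_p^4\, t^2 e^{(|f|_{-p}^2 + |\eta|_p^2)|t|}\, \|\phi_\xi\|_{-p,-}^2\, G_\alpha(|\eta|_p^2) \\ &\le \frac{t^2}{4}\, |f|_{-p}^4\, e^{|f|_{-p}^2|t|}\, \|\phi_\xi\|_{-p,-}^2 \times |\eta|_p^4\, e^{|\eta|_p^2|t|}\, G_\alpha(|\eta|_p^2), \end{aligned}$$
where $p \ge 0$ is chosen in such a way that $f, \xi \in E_{-p}$. Assume that $|t| \le 1$. By Lemma 3.3 there exist constants $C_1 > 0$ and $C_2 > 0$ such that
$$|\eta|_p^4\, e^{|\eta|_p^2|t|}\, G_\alpha(|\eta|_p^2) \le C_1 G_\alpha(C_2|\eta|_p^2) \le C_1 G_\alpha(|\eta|_{p+q}^2),$$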
where $q > 0$ is taken in such a way that $C_2\rho^{2q} \le 1$. Consequently, we have
$$|S\Phi_t(\eta)|^2 \le C_3\, t^2\, G_\alpha(|\eta|_{p+q}^2), \qquad \eta \in E, \quad |t| \le 1,$$
where $C_3 > 0$ is a constant independent of η and t. Applying the last part of Theorem 3.14, we obtain
$$\|\Phi_t\|_{-(p+q+r),-}^2 \le C_3\, t^2\, \widetilde{G}_\alpha(\|A^{-r}\|_{HS}^2),$$
where $r \ge 0$ is chosen in such a way that $\|A^{-r}\|_{HS}^2 < R_\alpha$. It then follows that $\Phi_t \to 0$ in $\Gamma_{1/\alpha}(E_{-(p+q+r)})$ as $t \to 0$, and hence in $W^*$, as desired. The second half of the statement is proved in a similar manner.

Finally, we record the symbols of the quantum white noise, whose verification is straightforward.

Proposition 4.11. For $f \in E^*$ we have
$$\widehat{a(f)}(\xi, \eta) = \langle f, \xi \rangle e^{\langle \xi, \eta \rangle}, \qquad \widehat{a^*(f)}(\xi, \eta) = \langle f, \eta \rangle e^{\langle \xi, \eta \rangle}, \qquad \xi, \eta \in E.$$
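In the same one-mode toy model (an assumption made only for illustration), the exponential vector is $\phi_\xi = (\xi^n/n!)$ and the canonical bilinear form with $\alpha(n) \equiv 1$ reads $\langle\!\langle \Phi, \phi \rangle\!\rangle = \sum_n n!\,F_n f_n$. The minimal sketch below checks Propositions 4.9 to 4.11 numerically on truncated vectors; the truncation length `N` and all helper names are hypothetical choices.

```python
import math

# One-mode toy model (assumption): phi_xi = (xi**n / n!)_n, truncated at N terms.

def exp_vector(xi, N):
    return [xi ** n / math.factorial(n) for n in range(N)]

def annihilate(f, phi):
    return [(n + 1) * f * phi[n + 1] for n in range(len(phi) - 1)]

def create(f, phi):
    return [0] + [f * c for c in phi]

def pairing(Phi, phi):
    """<<Phi, phi>> = sum_n n! Phi_n phi_n (weight alpha(n) = 1)."""
    return sum(math.factorial(n) * u * v for n, (u, v) in enumerate(zip(Phi, phi)))

N = 40
f, xi, eta = 1.5, 0.3, -0.7
phi_xi = exp_vector(xi, N)

# Proposition 4.9: a(f) phi_xi = f * xi * phi_xi, coefficient by coefficient.
eigen_lhs = annihilate(f, phi_xi)
eigen_rhs = [f * xi * c for c in phi_xi[:-1]]

# Proposition 4.10: the n-th coefficient of d/dt phi_{xi + t f} at t = 0 is n f xi**(n-1) / n!.
derivative = [n * f * xi ** (n - 1) / math.factorial(n) if n > 0 else 0.0 for n in range(N)]
creation = create(f, phi_xi)[:N]

# Proposition 4.11: symbols of a(f) and a*(f), up to a negligible truncation error.
symbol_a = pairing(annihilate(f, phi_xi), exp_vector(eta, N))
symbol_a_star = pairing(create(f, phi_xi), exp_vector(eta, N + 1))
```

The symbols should agree with $f\xi\,e^{\xi\eta}$ and $f\eta\,e^{\xi\eta}$ to within floating-point rounding, since the neglected tail of the exponential series is far below machine precision for these values.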
4.3. Integral kernel operators and Fock expansion

Let $l, m \ge 0$ be integers. For $\kappa \in (E^{\otimes(l+m)})^*$ and $\phi = (f_n) \in W$ we put
$$g_n = 0, \quad 0 \le n < l; \qquad g_{l+n} = \frac{(n+m)!}{n!}\, \kappa \otimes_m f_{n+m}, \quad n \ge 0. \tag{4.5}$$
Then the integral kernel operator with kernel distribution κ, denoted by $\Xi_{l,m}(\kappa)$, is defined by
$$\Xi_{l,m}(\kappa)\phi = (g_n). \tag{4.6}$$
By definition,
$$\Xi_{0,0}(\kappa) = \kappa I, \quad \kappa \in \mathbb{C}; \qquad \Xi_{0,1}(f) = a(f), \quad \Xi_{1,0}(f) = a^*(f), \quad f \in E^*.$$
In fact, the action of $\Xi_{l,m}(\kappa)$ is easily understood through a more descriptive (but formal) integral expression:
$$\Xi_{l,m}(\kappa) = \int_{T^{l+m}} \kappa(s_1, \dots, s_l, t_1, \dots, t_m)\, a_{s_1}^* \cdots a_{s_l}^* a_{t_1} \cdots a_{t_m}\, ds_1 \cdots ds_l\, dt_1 \cdots dt_m.$$
It is necessary to prove that (4.6) defines a white noise operator.
Theorem 4.12. For any $\kappa \in (E^{\otimes(l+m)})^*$ the integral kernel operator $\Xi_{l,m}(\kappa)$ is a white noise operator, i.e., $\Xi_{l,m}(\kappa) \in L(W, W^*)$. Moreover, $\Xi_{l,m}(\kappa)$ belongs to $L(W, W)$ if and only if $\kappa \in (E^{\otimes l}) \otimes (E^{\otimes m})^*$.

The proof is omitted. A direct proof is by a norm estimate similar to that of Lemma 4.5, but much lengthier. An alternative proof uses the characterization theorem for operator symbols (Section 4.4).

The kernel distribution of an integral kernel operator $\Xi_{l,m}(\kappa)$ is uniquely determined whenever it is taken from
$$(E^{\otimes(l+m)})^*_{\mathrm{sym}(l,m)} = \{\kappa \in (E^{\otimes(l+m)})^* ;\ S_{l,m}(\kappa) = \kappa\},$$
where $S_{l,m} = S_l \otimes S_m$ is the symmetrizing operator with respect to the first l and the last m variables independently.

Proposition 4.13. The symbol of an integral kernel operator $\Xi_{l,m}(\kappa)$ is given by
$$\widehat{\Xi_{l,m}(\kappa)}(\xi, \eta) = \langle\!\langle \Xi_{l,m}(\kappa)\phi_\xi, \phi_\eta \rangle\!\rangle = \langle \kappa, \eta^{\otimes l} \otimes \xi^{\otimes m} \rangle e^{\langle \xi, \eta \rangle}, \qquad \xi, \eta \in E.$$

Proof. By a direct calculation from the definition.
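To see that (4.5) and (4.6) implement the formal normal-ordered expression $\int \kappa\, a_{s_1}^* \cdots a_{s_l}^* a_{t_1} \cdots a_{t_m}$, one can again use a one-mode model (an assumption, not the setting of the text), where $\Xi_{l,m}(\kappa)$ should reduce to $\kappa\,(a^*)^l a^m$. The minimal sketch below compares the two on a finite coefficient list and checks the symbol formula of Proposition 4.13 numerically; all helper names are hypothetical.

```python
import math
from fractions import Fraction

# One-mode toy model (assumption): kernels and coefficients are scalars.

def kernel_operator(l, m, kappa, phi):
    """(4.5)-(4.6): g_n = 0 for n < l and g_{l+n} = (n+m)!/n! * kappa * f_{n+m}."""
    g = [0] * l
    for n in range(len(phi) - m):
        g.append(Fraction(math.factorial(n + m), math.factorial(n)) * kappa * phi[n + m])
    return g

def annihilate(f, phi):
    return [(n + 1) * f * phi[n + 1] for n in range(len(phi) - 1)]

def create(f, phi):
    return [0] + [f * c for c in phi]

def normal_ordered(l, m, kappa, phi):
    """kappa * (a*)^l a^m phi with a = a(1): annihilate first, then create."""
    v = list(phi)
    for _ in range(m):
        v = annihilate(1, v)
    for _ in range(l):
        v = create(1, v)
    return [kappa * c for c in v]

def symbol(l, m, kappa, xi, eta, N=40):
    """<<Xi_{l,m}(kappa) phi_xi, phi_eta>> computed on truncated exponential vectors."""
    phi_xi = [xi ** n / math.factorial(n) for n in range(N)]
    phi_eta = [eta ** n / math.factorial(n) for n in range(N)]
    g = kernel_operator(l, m, kappa, phi_xi)
    return sum(math.factorial(n) * u * v for n, (u, v) in enumerate(zip(g, phi_eta)))

kappa = Fraction(5, 2)
phi = [Fraction(k * k - 3, k + 1) for k in range(8)]
```

For example, `symbol(2, 1, 1.7, 0.4, 0.5)` should be close to $1.7 \cdot 0.5^2 \cdot 0.4 \cdot e^{0.2}$, which is $\kappa\eta^l\xi^m e^{\xi\eta}$ in one mode.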
Theorem 4.14 (Fock expansion). For any $\Xi \in L(W, W^*)$ there exists a unique family of distributions $\kappa_{l,m} \in (E^{\otimes(l+m)})^*_{\mathrm{sym}(l,m)}$ such that
$$\Xi = \sum_{l,m=0}^\infty \Xi_{l,m}(\kappa_{l,m}), \tag{4.7}$$
where the right-hand side converges in $L(W, W^*)$.

Proof. We only give an outline. Given $\Xi \in L(W, W^*)$, consider the Wick symbol
$$\widetilde{\Xi}(\xi, \eta) = e^{-\langle \xi, \eta \rangle} \langle\!\langle \Xi\phi_\xi, \phi_\eta \rangle\!\rangle, \qquad \xi, \eta \in E.$$
We see easily by Lemma 3.3 that there exist $C \ge 0$ and $p \ge 0$ such that
$$|\widetilde{\Xi}(\xi, \eta)|^2 \le C\, G_\alpha(|\xi|_p^2)\, G_\alpha(|\eta|_p^2). \tag{4.8}$$
For $l, m \ge 0$ we consider the entire function on $\mathbb{C}^{m+l}$ defined by
$$\psi(z_1, \dots, z_m, w_1, \dots, w_l) = \widetilde{\Xi}(z_1\xi_1 + \cdots + z_m\xi_m,\ w_1\eta_1 + \cdots + w_l\eta_l)$$
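and the $(l+m)$-linear functional $K_{l,m} = K_{l,m}(\xi_1, \dots, \xi_m, \eta_1, \dots, \eta_l)$ on E defined by
$$K_{l,m} = \frac{1}{l!\,m!}\, \frac{\partial^{l+m}\psi}{\partial z_1 \cdots \partial z_m\, \partial w_1 \cdots \partial w_l}\Big|_{z_1 = \cdots = z_m = 0,\ w_1 = \cdots = w_l = 0}.$$
For $l = m = 0$ we set $K_{0,0} = \widetilde{\Xi}(0, 0)$. Then we have the Taylor expansion
$$\widetilde{\Xi}(z\xi, w\eta) = \sum_{l,m=0}^\infty K_{l,m}(\underbrace{\xi, \dots, \xi}_{m\ \text{times}}, \underbrace{\eta, \dots, \eta}_{l\ \text{times}})\, z^m w^l, \qquad z, w \in \mathbb{C}. \tag{4.9}$$
Applying a standard method to (4.8), we can estimate the norm of $K_{l,m}$ as follows:
$$|K_{l,m}(\xi_1, \dots, \xi_m, \eta_1, \dots, \eta_l)| \le C_{l,m}\, |\xi_1|_p \cdots |\xi_m|_p\, |\eta_1|_p \cdots |\eta_l|_p \tag{4.10}$$
for all $\xi_1, \dots, \xi_m, \eta_1, \dots, \eta_l \in E$. Therefore $K_{l,m}$ is a continuous $(l+m)$-linear functional on E, and hence there exists $\kappa_{l,m} \in (E^{\otimes(l+m)})^*$ such that
$$\langle \kappa_{l,m}, \eta_1 \otimes \cdots \otimes \eta_l \otimes \xi_1 \otimes \cdots \otimes \xi_m \rangle = K_{l,m}(\xi_1, \dots, \xi_m, \eta_1, \dots, \eta_l).$$
Then (4.9) becomes
$$\widetilde{\Xi}(z\xi, w\eta) = \sum_{l,m=0}^\infty \langle \kappa_{l,m}, \eta^{\otimes l} \otimes \xi^{\otimes m} \rangle\, z^m w^l, \qquad z, w \in \mathbb{C}. \tag{4.11}$$
Clearly, $\kappa_{l,m} \in (E^{\otimes(l+m)})^*_{\mathrm{sym}(l,m)}$, and using (4.10) we obtain
$$|\kappa_{l,m}|_{-(p+q)} \le C_{l,m}\, \|A^{-q}\|_{HS}^{l+m}, \tag{4.12}$$
where $q > 0$ is chosen in such a way that the right-hand side is finite. Using the known estimate for an integral kernel operator, we may choose $r > 0$ and $s > 0$ such that
$$\|\Xi_{l,m}(\kappa_{l,m})\phi\|_{-(p+q+r),-} \le C'_{l,m}\, |\kappa_{l,m}|_{-(p+q)}\, \|\phi\|_{p+q+r+s,+}$$
and
$$M = \sum_{l,m=0}^\infty C'_{l,m}\, C_{l,m}\, \|A^{-q}\|_{HS}^{l+m} < \infty.$$
Then we have
$$\sum_{l,m=0}^\infty \|\Xi_{l,m}(\kappa_{l,m})\phi\|_{-(p+q+r),-} \le M\, \|\phi\|_{p+q+r+s,+}, \qquad \phi \in W, \tag{4.13}$$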
which implies that the infinite series
$$\Xi' = \sum_{l,m=0}^\infty \Xi_{l,m}(\kappa_{l,m}) \tag{4.14}$$
converges in $L(W, W^*)$. Finally, we show that $\Xi = \Xi'$. Since (4.14) converges in $L(W, W^*)$, we have
$$\widehat{\Xi'}(\xi, \eta) = \langle\!\langle \Xi'\phi_\xi, \phi_\eta \rangle\!\rangle = \sum_{l,m=0}^\infty \langle\!\langle \Xi_{l,m}(\kappa_{l,m})\phi_\xi, \phi_\eta \rangle\!\rangle. \tag{4.15}$$
On the other hand, we see from (4.11) that
$$\widehat{\Xi}(\xi, \eta) = e^{\langle \xi, \eta \rangle}\, \widetilde{\Xi}(\xi, \eta) = \sum_{l,m=0}^\infty \langle \kappa_{l,m}, \eta^{\otimes l} \otimes \xi^{\otimes m} \rangle e^{\langle \xi, \eta \rangle}. \tag{4.16}$$
The expressions (4.15) and (4.16) coincide by Proposition 4.13; hence $\Xi = \Xi'$ by Theorem 4.4.

Theorem 4.15 (Fock expansion). For any $\Xi \in L(W, W)$ there exists a unique family of distributions $\kappa_{l,m} \in (E^{\widehat{\otimes} l}) \otimes (E^{\otimes m})^*_{\mathrm{sym}}$ such that
$$\Xi = \sum_{l,m=0}^\infty \Xi_{l,m}(\kappa_{l,m}), \tag{4.17}$$
where the right-hand side converges in $L(W, W)$.

Proof. Similar to that of Theorem 4.14.

Remark 4.16. It was proved by Berezin [8] that every bounded operator on Γ(H) admits an expression of the form (4.7), where the infinite series converges in a weak sense. Moreover, it has been widely accepted, at various levels of mathematical rigor, that the annihilation and creation operators generate an irreducible algebra of Fock space operators. The expression (4.7) is then naturally understood as the general form of a Fock space operator, tracing back to Haag [29]; see also Glimm–Jaffe [25]. Our emphasis in Theorems 4.14 and 4.15 is that the class of operators is clearly specified as the white noise operators and that the rate of convergence is estimated.
4.4. Characterization of operator symbols

Theorem 4.17 (Characterization of operator symbols). A $\mathbb{C}$-valued function Θ defined on $E \times E$ is the symbol of an operator $\Xi \in L(W, W^*)$, i.e., $\Theta = \widehat{\Xi}$, if and only if
(O1) Θ is a Gâteaux-entire function on $E \times E$;
(O2) there exist constants $C \ge 0$ and $p \ge 0$ such that
$$|\Theta(\xi, \eta)|^2 \le C\, G_\alpha(|\xi|_p^2)\, G_\alpha(|\eta|_p^2), \qquad \xi, \eta \in E.$$
In that case, for all $q \ge 0$ satisfying $\|A^{-q}\|_{HS}^2 < R_\alpha$ we have
$$\|\Xi\phi\|_{-(p+q),-}^2 \le C\, \widetilde{G}_\alpha(\|A^{-q}\|_{HS}^2)^2\, \|\phi\|_{p+q,+}^2, \qquad \phi \in W. \tag{4.18}$$

Proof. We give only an outline. The "only if" part is straightforward. Conversely, assume that (O1) and (O2) are satisfied. These conditions are then also satisfied by $e^{-\langle \xi, \eta \rangle}\Theta(\xi, \eta)$. Using an argument similar to that in the proof of Theorem 4.14, we construct a family of kernel distributions $\{\kappa_{l,m}\}$, show that
$$\Xi = \sum_{l,m=0}^\infty \Xi_{l,m}(\kappa_{l,m}) \tag{4.19}$$
converges in $L(W, W^*)$, and verify that $\widehat{\Xi} = \Theta$. The estimate (4.18) is obtained in the course of proving the convergence of (4.19).

Theorem 4.18 (Characterization of operator symbols). A $\mathbb{C}$-valued function Θ defined on $E \times E$ is the symbol of an operator $\Xi \in L(W, W)$ if and only if
(O1) Θ is a Gâteaux-entire function on $E \times E$;
(O3) for any $p \ge 0$ there exist constants $C \ge 0$ and $q \ge 0$ such that
$$|\Theta(\xi, \eta)|^2 \le C\, G_\alpha(|\xi|_{p+q}^2)\, G_{1/\alpha}(|\eta|_{-p}^2), \qquad \xi, \eta \in E.$$
In this case, for each $r, s \ge 0$ satisfying $\|A^{-r}\|_{HS}^2 < R_\alpha$ and $\|A^{-s}\|_{HS}^2 < R_{1/\alpha}$ we have
$$\|\Xi\phi\|_{p-s,+}^2 \le C\, \widetilde{G}_\alpha(\|A^{-r}\|_{HS}^2)\, \widetilde{G}_{1/\alpha}(\|A^{-s}\|_{HS}^2)\, \|\phi\|_{p+q+r,+}^2, \qquad \phi \in W.$$

Proof. Similar to that of Theorem 4.17, but slightly more technical.
Theorems 4.14 and 4.15 together with Theorems 4.17 and 4.18 were first proved by Obata [75, 76] for the Hida–Kubo–Takenaka space. Further generalizations and refinements were found in [13, 16, 17]. Later, the characterization theorems for the S-transform and for operator symbols were unified by Ji–Obata [43]; see also [53]. Along this line we also obtained a symbol characterization for $\Xi \in L(\Gamma_\alpha(E_1), \Gamma_\beta(E_2))$.

4.5. Wick product and Wick multiplication operators

We start with the Wick product of generalized white noise functions.

Lemma 4.19. For any two $\Phi_1, \Phi_2 \in W^*$ there exists a unique $\Psi \in W^*$ specified by
$$S\Phi_1(\xi)\, S\Phi_2(\xi) = S\Psi(\xi), \qquad \xi \in E.$$

Proof. This follows from the characterization theorem for the S-transform (Theorem 3.14) together with Lemma 3.3.

We write $\Psi = \Phi_1 \diamond \Phi_2$ and call it the Wick product. In other words, the Wick product $\Phi_1 \diamond \Phi_2$ is characterized by
$$S(\Phi_1 \diamond \Phi_2)(\xi) = S\Phi_1(\xi)\, S\Phi_2(\xi), \qquad \xi \in E. \tag{4.20}$$
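Obviously, the Wick product is commutative, so that, equipped with the Wick product, $W^*$ becomes a commutative algebra. Moreover, the map $(\Phi_1, \Phi_2) \mapsto \Phi_1 \diamond \Phi_2$ is a separately continuous bilinear map from $W^* \times W^*$ into $W^*$.

Lemma 4.20. For $\Phi_1 = (F_n)$ and $\Phi_2 = (G_n)$ we have
$$\Phi_1 \diamond \Phi_2 = (H_n), \qquad H_n = \sum_{k=0}^n F_k \,\widehat{\otimes}\, G_{n-k}. \tag{4.21}$$

Proof. By definition we have
$$S\Phi_1(\xi)\, S\Phi_2(\xi) = \sum_{k=0}^\infty \langle F_k, \xi^{\otimes k} \rangle \sum_{l=0}^\infty \langle G_l, \xi^{\otimes l} \rangle = \sum_{k,l=0}^\infty \langle F_k \otimes G_l, \xi^{\otimes(k+l)} \rangle = \sum_{n=0}^\infty \sum_{k=0}^n \langle F_k \otimes G_{n-k}, \xi^{\otimes n} \rangle,$$
from which (4.21) follows.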
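With a one-dimensional one-particle space (a simplifying assumption), $S\Phi(\xi) = \sum_n F_n \xi^n$ is a power series and (4.21) is the Cauchy product of coefficient lists, so (4.20) says that the S-transform turns the Wick product into ordinary multiplication of power series. The minimal sketch below checks this exactly with rational arithmetic, together with $\phi_\xi \diamond \phi_\eta = \phi_{\xi+\eta}$ on truncated exponential vectors; the helper names are hypothetical.

```python
import math
from fractions import Fraction

# One-mode toy model (assumption): F_n, G_n are scalars.

def wick_product(F, G):
    """(4.21): H_n = sum_{k=0}^{n} F_k G_{n-k}, a Cauchy product of coefficient lists."""
    if not F or not G:
        return []
    H = [0] * (len(F) + len(G) - 1)
    for k, Fk in enumerate(F):
        for j, Gj in enumerate(G):
            H[k + j] += Fk * Gj
    return H

def s_transform(F, xi):
    """S Phi(xi) = <<Phi, phi_xi>> = sum_n F_n xi**n."""
    return sum(Fn * xi ** n for n, Fn in enumerate(F))

def exp_vector(xi, N):
    """Truncated exponential vector (xi**n / n!)_{n < N}."""
    return [xi ** n / math.factorial(n) for n in range(N)]

F = [Fraction(1), Fraction(-2), Fraction(1, 3)]
G = [Fraction(4), Fraction(0), Fraction(5), Fraction(-1, 7)]
H = wick_product(F, G)
```

Because coefficient $n$ of the product only involves coefficients of index at most $n$, truncating the exponential vectors at the same length leaves the first $N$ coefficients of $\phi_\xi \diamond \phi_\eta$ exact.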
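With each $\Phi \in W^*$ we associate the Wick multiplication operator $M_\Phi^\diamond$ defined by
$$M_\Phi^\diamond \Psi = \Phi \diamond \Psi, \qquad \Psi \in W^*.$$
We see that $M_\Phi^\diamond \in L(W^*, W^*)$ and
$$M_{\Phi_1 \diamond \Phi_2}^\diamond = M_{\Phi_1}^\diamond M_{\Phi_2}^\diamond = M_{\Phi_2}^\diamond M_{\Phi_1}^\diamond. \tag{4.22}$$

Theorem 4.21. Let $\Phi = (F_n) \in W^*$. Then the Fock expansion of the Wick multiplication operator $M_\Phi^\diamond$ is given by
$$M_\Phi^\diamond = \sum_{l=0}^\infty \Xi_{l,0}(F_l). \tag{4.23}$$

Proof. By the definition of the Wick product we have
$$\widehat{M_\Phi^\diamond}(\xi, \eta) = \langle\!\langle M_\Phi^\diamond\phi_\xi, \phi_\eta \rangle\!\rangle = \langle\!\langle \Phi \diamond \phi_\xi, \phi_\eta \rangle\!\rangle = S(\Phi \diamond \phi_\xi)(\eta) = S\Phi(\eta)\, S\phi_\xi(\eta). \tag{4.24}$$
Here we note that
$$S\Phi(\eta) = \sum_{n=0}^\infty \langle F_n, \eta^{\otimes n} \rangle, \qquad S\phi_\xi(\eta) = \langle\!\langle \phi_\xi, \phi_\eta \rangle\!\rangle = e^{\langle \xi, \eta \rangle}.$$
Then (4.24) becomes
$$\widehat{M_\Phi^\diamond}(\xi, \eta) = \sum_{n=0}^\infty \langle F_n, \eta^{\otimes n} \rangle e^{\langle \xi, \eta \rangle},$$
from which the Fock expansion (4.23) follows by Proposition 4.13.

We next introduce the Wick product of white noise operators.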
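Lemma 4.22. For any two white noise operators $\Xi_1, \Xi_2 \in L(W, W^*)$ there exists a unique white noise operator $\Xi \in L(W, W^*)$ specified by
$$\widehat{\Xi}(\xi, \eta) = \widehat{\Xi_1}(\xi, \eta)\, \widehat{\Xi_2}(\xi, \eta)\, e^{-\langle \xi, \eta \rangle}, \qquad \xi, \eta \in E,$$
or equivalently by
$$\widetilde{\Xi}(\xi, \eta) = \widetilde{\Xi_1}(\xi, \eta)\, \widetilde{\Xi_2}(\xi, \eta), \qquad \xi, \eta \in E.$$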
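Proof. It is not difficult to check that $\Theta(\xi, \eta) = \widehat{\Xi_1}(\xi, \eta)\, \widehat{\Xi_2}(\xi, \eta)\, e^{-\langle \xi, \eta \rangle}$ satisfies (O1) and (O2) in Theorem 4.17.

The white noise operator Ξ specified in Lemma 4.22 is called the Wick product of $\Xi_1$ and $\Xi_2$, and is denoted by $\Xi = \Xi_1 \diamond \Xi_2$. We note some simple properties:
$$I \diamond \Xi = \Xi \diamond I = \Xi, \qquad (\Xi_1 \diamond \Xi_2) \diamond \Xi_3 = \Xi_1 \diamond (\Xi_2 \diamond \Xi_3), \qquad (\Xi_1 \diamond \Xi_2)^* = \Xi_2^* \diamond \Xi_1^*, \qquad \Xi_1 \diamond \Xi_2 = \Xi_2 \diamond \Xi_1.$$
Equipped with the Wick product, $L(W, W^*)$ becomes a commutative *-algebra. It also follows from the characterization theorem for operator symbols that $L(W, W)$ is closed under the Wick product. Note that the map $(\Xi_1, \Xi_2) \mapsto \Xi_1 \diamond \Xi_2$ is a separately continuous bilinear map from $L(W, W^*) \times L(W, W^*)$ into $L(W, W^*)$.

Proposition 4.23. For $s, t \in T$ we have
$$a_s \diamond a_t = a_s a_t, \qquad a_s^* \diamond a_t = a_s^* a_t, \qquad a_s \diamond a_t^* = a_t^* a_s, \qquad a_s^* \diamond a_t^* = a_s^* a_t^*,$$
where the right-hand sides are well-defined compositions of white noise operators. More generally, for $s_1, \dots, s_l, t_1, \dots, t_m \in T$ it holds that
$$a_{s_1}^* \cdots a_{s_l}^*\, \Xi\, a_{t_1} \cdots a_{t_m} = (a_{s_1}^* \cdots a_{s_l}^* a_{t_1} \cdots a_{t_m}) \diamond \Xi, \qquad \Xi \in L(W, W^*). \tag{4.25}$$

Proof. For simplicity we set $P = a_{t_1} \cdots a_{t_m}$ and $Q = a_{s_1} \cdots a_{s_l}$. Then we have
$$\begin{aligned} \langle\!\langle ((Q^*P) \diamond \Xi)\phi_\xi, \phi_\eta \rangle\!\rangle &= \langle\!\langle (Q^*P)\phi_\xi, \phi_\eta \rangle\!\rangle\, \langle\!\langle \Xi\phi_\xi, \phi_\eta \rangle\!\rangle\, e^{-\langle \xi, \eta \rangle} \\ &= \langle\!\langle P\phi_\xi, Q\phi_\eta \rangle\!\rangle\, \langle\!\langle \Xi\phi_\xi, \phi_\eta \rangle\!\rangle\, e^{-\langle \xi, \eta \rangle} \\ &= \xi(t_1) \cdots \xi(t_m)\, \eta(s_1) \cdots \eta(s_l)\, \langle\!\langle \Xi\phi_\xi, \phi_\eta \rangle\!\rangle \\ &= \langle\!\langle \Xi P\phi_\xi, Q\phi_\eta \rangle\!\rangle = \langle\!\langle Q^*\Xi P\phi_\xi, \phi_\eta \rangle\!\rangle, \end{aligned}$$
which proves the assertion.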
4.6. Multiplication operators

Each $\phi \in W$ is regarded as a random variable on the Gaussian space $(E_{\mathbb{R}}^*, \mu)$ through the Wiener–Itô–Segal isomorphism $\Gamma(H) \cong L^2(E_{\mathbb{R}}^*, \mu; \mathbb{C})$. We may then consider the product φψ as a function on $(E_{\mathbb{R}}^*, \mu)$; to avoid confusion we call it the pointwise product.

Lemma 4.24. The pointwise product gives rise to a continuous bilinear map from $W \times W$ into W.

Proof. For $\phi = (f_n)$ and $\psi = (g_n)$ it follows by straightforward computation that
$$\phi\psi = (h_l), \qquad h_l = \sum_{m+n=l} \sum_{k=0}^\infty k! \binom{m+k}{k} \binom{n+k}{k}\, f_{m+k} \otimes_k g_{n+k}.$$
Then a direct estimate of $\|\phi\psi\|_{p,+}$ yields the assertion.
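The formula for $h_l$ can be tested against the Gaussian realization. The minimal sketch below assumes a one-mode model in which $\mu$ is the standard normal distribution on $\mathbb{R}$ and $:x^n: = He_n(x)$ is the probabilists' Hermite polynomial; in that case the formula reduces to the classical linearization $He_m He_n = \sum_k k!\binom{m}{k}\binom{n}{k} He_{m+n-2k}$. Integer coefficients and integer sample points keep the check exact; the helper names are hypothetical.

```python
from math import comb, factorial

# One-mode Gaussian model (assumption): phi(x) = sum_n f_n He_n(x), x standard normal.

def hermite_e(n, x):
    """Probabilists' Hermite polynomial via He_{k+1} = x He_k - k He_{k-1}."""
    prev, cur = 0, 1  # He_{-1} := 0 and He_0 = 1
    for k in range(n):
        prev, cur = cur, x * cur - k * prev
    return cur

def realize(f, x):
    """Gaussian realization phi(x) = sum_n f_n He_n(x)."""
    return sum(fn * hermite_e(n, x) for n, fn in enumerate(f))

def pointwise_product(f, g):
    """h_l = sum_{m+n=l} sum_k k! C(m+k,k) C(n+k,k) f_{m+k} g_{n+k} (one mode)."""
    if not f or not g:
        return []
    h = [0] * (len(f) + len(g) - 1)
    for l in range(len(h)):
        for m in range(l + 1):
            n, k = l - m, 0
            while m + k < len(f) and n + k < len(g):
                h[l] += factorial(k) * comb(m + k, k) * comb(n + k, k) * f[m + k] * g[n + k]
                k += 1
    return h

f = [2, -1, 3]
g = [1, 4, 0, -2]
h = pointwise_product(f, g)
```

For instance, `pointwise_product([0, 1], [0, 1])` gives `[1, 0, 1]`, i.e. $x \cdot x = He_0 + He_2$.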
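For $\Phi \in W^*$ and $\phi \in W$ we define $\Phi\phi = \phi\Phi \in W^*$ by
$$\langle\!\langle \Phi\phi, \psi \rangle\!\rangle = \langle\!\langle \Phi, \phi\psi \rangle\!\rangle, \qquad \psi \in W.$$
Obviously, the map $(\Phi, \phi) \mapsto \Phi\phi$ is a separately continuous bilinear map. In particular, each $\Phi \in W^*$ gives rise to a multiplication operator $M_\Phi \in L(W, W^*)$ defined by $M_\Phi\phi = \Phi\phi$. With this we have a continuous injection $W^* \to L(W, W^*)$. Note also that $(M_\Phi)^* = M_\Phi$.

Theorem 4.25. Regarded as a multiplication operator, the white noise field $\{W_t ;\, t \in T\}$ is decomposed into a sum of quantum white noises:
$$W_t = a_t + a_t^*. \tag{4.26}$$

Proof. We first note that
$$\langle\!\langle M_{W_t}\phi_\xi, \phi_\eta \rangle\!\rangle = \langle\!\langle W_t, \phi_\xi\phi_\eta \rangle\!\rangle = e^{\langle \xi, \eta \rangle} \langle\!\langle W_t, \phi_{\xi+\eta} \rangle\!\rangle = e^{\langle \xi, \eta \rangle}(\xi(t) + \eta(t)).$$
On the other hand, we have
$$\langle\!\langle (a_t + a_t^*)\phi_\xi, \phi_\eta \rangle\!\rangle = \langle\!\langle a_t\phi_\xi, \phi_\eta \rangle\!\rangle + \langle\!\langle \phi_\xi, a_t\phi_\eta \rangle\!\rangle = e^{\langle \xi, \eta \rangle}(\xi(t) + \eta(t)).$$
Hence $M_{W_t} = a_t + a_t^*$ by Theorem 4.4, which shows (4.26).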
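In a one-mode Gaussian model (an assumption: $\mu$ is the standard normal distribution and $:x^n: = He_n(x)$), multiplication by the coordinate $x$ plays the role of a smeared white noise, and (4.26) becomes the three-term recurrence $x\,He_n = He_{n+1} + n\,He_{n-1}$. The minimal sketch below checks that multiplying the realization by $x$ agrees with applying $a + a^*$ (with $f = 1$) to the coefficients; the helper names are hypothetical.

```python
# One-mode Gaussian model (assumption): phi(x) = sum_n f_n He_n(x).

def hermite_e(n, x):
    """Probabilists' Hermite polynomial via He_{k+1} = x He_k - k He_{k-1}."""
    prev, cur = 0, 1
    for k in range(n):
        prev, cur = cur, x * cur - k * prev
    return cur

def realize(f, x):
    return sum(fn * hermite_e(n, x) for n, fn in enumerate(f))

def annihilate(f, phi):
    return [(n + 1) * f * phi[n + 1] for n in range(len(phi) - 1)]

def create(f, phi):
    return [0] + [f * c for c in phi]

def a_plus_a_star(phi):
    """(a + a*) phi with f = 1; the shorter list a(1) phi is padded with zeros."""
    a_phi = annihilate(1, phi) + [0, 0]
    return [u + v for u, v in zip(a_phi, create(1, phi))]

phi = [3, -1, 4, 1, -5]
```

The $n$-th coefficient of $(a + a^*)\phi$ is $(n+1)f_{n+1} + f_{n-1}$, which is exactly the coefficient of $He_n$ in $x\sum_k f_k He_k(x)$.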
Remark 4.26. In the above proof we employed the identity $\phi_\xi\phi_\eta = e^{\langle \xi, \eta \rangle}\phi_{\xi+\eta}$, which is easily verified by the Gaussian realization. On the other hand, we note that $\phi_\xi \diamond \phi_\eta = \phi_{\xi+\eta}$, which follows directly from the definition of the Wick product.

Let X be a bounded random variable on the Gaussian space $(E_{\mathbb{R}}^*, \mu)$. Then X gives rise to a multiplication operator $M_X$ acting on the Hilbert space $\mathcal{H} = L^2(E_{\mathbb{R}}^*, \mu; \mathbb{C})$. In this sense we have an isometric inclusion $\mathcal{A} = L^\infty(E_{\mathbb{R}}^*, \mu; \mathbb{C}) \subset B(\mathcal{H})$. Let $1 \in \mathcal{H}$ be the constant function taking the value one. Defining a positive linear functional on $\mathcal{A}$ by
$$\varphi(A) = \langle 1 | A 1 \rangle, \qquad A \in \mathcal{A},$$
we obtain an algebraic probability space $(\mathcal{A}, \varphi)$. Now note that
$$\varphi(M_X) = \langle 1 | M_X 1 \rangle = \langle 1 | X \rangle = \int_{E_{\mathbb{R}}^*} X(x)\, \mu(dx) = \mathbf{E}[X].$$
The map $X \mapsto M_X$, the canonical classical–quantum correspondence, enables us to discuss classical random variables within the framework of quantum probability. Our embedding $W^* \subset L(W, W^*)$ is an extension along this line. In this context (4.26) is referred to as the quantum decomposition of the white noise.

4.7. Convolution operators

With each $y \in E_{\mathbb{R}}^*$ we associate the translation operator $T_y \in L(W, W)$. Since the Gaussian realization of $\phi = (f_n) \in W$, given by
$$\phi(x) = \sum_{n=0}^\infty \langle :x^{\otimes n}:, f_n \rangle, \qquad x \in E_{\mathbb{R}}^*,$$
is a function on $E_{\mathbb{R}}^*$, the translation is naturally defined by
$$T_y\phi(x) = \phi(x - y), \qquad x, y \in E_{\mathbb{R}}^*.$$
In particular, for an exponential vector we have
$$T_y\phi_\xi(x) = \phi_\xi(x - y) = e^{\langle x - y, \xi \rangle - \langle \xi, \xi \rangle/2} = e^{-\langle y, \xi \rangle}\phi_\xi(x). \tag{4.27}$$
Namely,
$$T_y\phi_\xi = e^{-\langle y, \xi \rangle}\phi_\xi. \tag{4.28}$$
It then follows from the characterization theorem, or by a direct norm estimate, that $T_y \in L(W, W)$. In fact, we see from (4.28) that the "translation" operator $T_y$ is defined for all complex $y \in E^*$ and belongs to $L(W, W)$. The explicit action of $T_y$ is stated in the following.

Proposition 4.27. Let $y \in E^*$. For $\phi = (f_n) \in W$ we have
$$T_y\phi = (g_n), \qquad g_n = \sum_{k=0}^\infty \binom{n+k}{k}\, (-y)^{\otimes k} \otimes_k f_{n+k}.$$
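In a one-mode model (an assumption made for illustration), the holomorphic realization is an ordinary polynomial $\sum_n f_n x^n$, so Proposition 4.27 together with the identity $t_y H = H T_y$ of Remark 4.28 amounts to re-expanding $\sum_n f_n (x-y)^n$ in powers of $x$. The minimal sketch below checks this exactly for a finite list, and checks (4.28) approximately on the low-order coefficients of a truncated exponential vector; helper names and test values are hypothetical.

```python
import math
from math import comb

# One-mode toy model (assumption): f_n are scalars, y is a number.

def translate(y, f):
    """Proposition 4.27: g_n = sum_k C(n+k, k) (-y)**k f_{n+k}."""
    return [sum(comb(n + k, k) * (-y) ** k * f[n + k] for k in range(len(f) - n))
            for n in range(len(f))]

def holomorphic(f, x):
    """Holomorphic realization [H phi](x) = sum_n f_n x**n."""
    return sum(fn * x ** n for n, fn in enumerate(f))

f = [1, -2, 0, 5, 3]
y = 2
g = translate(y, f)

# (4.28) on a truncated exponential vector: compare only low-order coefficients,
# where the neglected tail of exp(-y * xi) is far below rounding error.
N, xi, y_real = 40, 0.5, 0.8
phi_xi = [xi ** n / math.factorial(n) for n in range(N)]
shifted = translate(y_real, phi_xi)
```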
Remark 4.28. Let us recall the holomorphic realization of $\phi = (f_n)$ defined by
$$[H\phi](x) = \sum_{n=0}^\infty \langle x^{\otimes n}, f_n \rangle, \qquad x \in E^*.$$
With each $y \in E^*$ we associate a translation operator $t_y$ defined by
$$(t_y[H\phi])(x) = [H\phi](x - y), \qquad x \in E^*.$$
By definition we have
$$(t_y[H\phi])(x) = \sum_{n=0}^\infty \langle (x-y)^{\otimes n}, f_n \rangle = \sum_{n=0}^\infty \sum_{k=0}^\infty \binom{n+k}{k} (-1)^k \langle x^{\otimes n}, f_{n+k} \otimes_k y^{\otimes k} \rangle.$$
In view of Proposition 4.27 we obtain
$$(t_y[H\phi])(x) = \sum_{n=0}^\infty \langle x^{\otimes n}, g_n \rangle = H[T_y\phi](x).$$
Consequently, $t_y H = H T_y$, or equivalently, $T_y = H^{-1} t_y H$.

Following Ben Chrouda–El Oued–Ouerdiane [6] we introduce the convolution operator $C_\Phi$ associated with $\Phi = (F_n) \in W^*$. First we observe that for $\phi = (f_n) \in W$ and $x \in E^*$,
$$\begin{aligned} \langle\!\langle \Phi, T_{-x}\phi \rangle\!\rangle &= \sum_{n=0}^\infty n! \Big\langle F_n, \sum_{k=0}^\infty \binom{n+k}{k} (-1)^k f_{n+k} \otimes_k (-x)^{\otimes k} \Big\rangle \\ &= \sum_{n=0}^\infty \sum_{k=0}^\infty \frac{(n+k)!}{k!} \langle F_n, f_{n+k} \otimes_k x^{\otimes k} \rangle \\ &= \sum_{n=0}^\infty \sum_{k=0}^\infty \frac{(n+k)!}{k!} \langle x^{\otimes k}, F_n \otimes_n f_{n+k} \rangle. \end{aligned}$$
We set
$$h_k = \sum_{n=0}^\infty \frac{(n+k)!}{k!}\, F_n \otimes_n f_{n+k}. \tag{4.29}$$
Then it can be shown that the right-hand side converges in $E^{\widehat{\otimes} k}$ and that $(h_k) \in W$. We define $C_\Phi\phi = (h_k)$, which is called the convolution operator associated with $\Phi \in W^*$. Moreover, one verifies that $C_\Phi \in L(W, W)$. As a result, the convolution operator is determined by
$$[H(C_\Phi\phi)](x) = \langle\!\langle \Phi, T_{-x}\phi \rangle\!\rangle, \qquad x \in E^*.$$
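Lemma 4.29. Let $\Phi = (F_n) \in W^*$. The Fock expansion of the convolution operator $C_\Phi$ is given by
$$C_\Phi = \sum_{m=0}^\infty \Xi_{0,m}(F_m), \tag{4.30}$$
where the right-hand side converges in $L(W, W)$.

Proof. In view of $C_\Phi\phi = (h_k)$, where $h_k$ is given in (4.29), we easily obtain $C_\Phi\phi_\xi = \langle\!\langle \Phi, \phi_\xi \rangle\!\rangle \phi_\xi$. Hence
$$\widehat{C_\Phi}(\xi, \eta) = \langle\!\langle C_\Phi\phi_\xi, \phi_\eta \rangle\!\rangle = \langle\!\langle \Phi, \phi_\xi \rangle\!\rangle \langle\!\langle \phi_\xi, \phi_\eta \rangle\!\rangle = \sum_{m=0}^\infty \langle F_m, \xi^{\otimes m} \rangle e^{\langle \xi, \eta \rangle},$$
from which the Fock expansion (4.30) follows.

Theorem 4.30. For $\Phi \in W^*$ we have
$$C_\Phi = (M_\Phi^\diamond)^*, \qquad M_\Phi^\diamond = (C_\Phi)^*.$$
In particular, for $\Phi, \Psi \in W^*$ and $\phi \in W$ we have
$$\langle\!\langle \Phi \diamond \Psi, \phi \rangle\!\rangle = \langle\!\langle \Psi, C_\Phi\phi \rangle\!\rangle = \langle\!\langle \Phi, C_\Psi\phi \rangle\!\rangle. \tag{4.31}$$

Proof. Immediate from Theorem 4.21 and Lemma 4.29.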
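Relation (4.31) can be checked exactly in a one-mode model (an assumption, not the general setting), where $C_\Phi$ from (4.29) acts by $h_k = \sum_n \frac{(n+k)!}{k!} F_n f_{n+k}$, the Wick product is the Cauchy product (4.21), and $\langle\!\langle \Phi, \phi \rangle\!\rangle = \sum_n n!\,F_n f_n$. The minimal sketch below uses rational arithmetic and hypothetical helper names.

```python
from fractions import Fraction
from math import factorial

# One-mode toy model (assumption): Phi = (F_n) and phi = (f_n) are finite lists of numbers.

def convolution(F, f):
    """(4.29): h_k = sum_n (n+k)!/k! * F_n * f_{n+k}."""
    return [sum(Fraction(factorial(n + k), factorial(k)) * F[n] * f[n + k]
                for n in range(min(len(F), len(f) - k)))
            for k in range(len(f))]

def wick_product(F, G):
    """(4.21): the Cauchy product of the coefficient lists."""
    H = [0] * (len(F) + len(G) - 1)
    for k, Fk in enumerate(F):
        for j, Gj in enumerate(G):
            H[k + j] += Fk * Gj
    return H

def pairing(Phi, phi):
    """<<Phi, phi>> = sum_n n! Phi_n phi_n (weight alpha(n) = 1)."""
    return sum(factorial(n) * u * v for n, (u, v) in enumerate(zip(Phi, phi)))

F = [Fraction(1), Fraction(-2, 3), Fraction(5)]
G = [Fraction(3), Fraction(1, 2), Fraction(0), Fraction(-4)]
phi = [Fraction(2), Fraction(-1), Fraction(1, 4), Fraction(3), Fraction(6), Fraction(-1, 2)]
```

All three expressions in (4.31) expand to the same finite sum $\sum (n+k)!\,F_n G_k f_{n+k}$ over admissible indices, which is why the comparison is exact here.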
Example 4.31. Let $\tau \in (E \otimes E)^*$ be the trace, i.e., the integral kernel corresponding to the identity operator. For $\tilde{\tau} = (0, 0, \tau, 0, \dots)$ we have
$$\Delta_G = \Xi_{0,2}(\tau) = C_{\tilde{\tau}} = (M_{\tilde{\tau}}^\diamond)^*.$$
Namely, the Gross Laplacian is a convolution operator (see also Section 7.1).

Theorem 4.32. It holds that
$$C_{\Phi_1 \diamond \Phi_2} = C_{\Phi_1} C_{\Phi_2} = C_{\Phi_2} C_{\Phi_1} = C_{\Phi_1} \diamond C_{\Phi_2}, \qquad \Phi_1, \Phi_2 \in W^*.$$
Moreover, the map $\Phi \mapsto C_\Phi$ gives rise to a continuous injective homomorphism from $(W^*, \diamond)$ into $(L(W, W), \diamond)$.

Proof. The first two equalities are immediate from (4.22) and Theorem 4.30. Since $C_\Phi$ contains only annihilation operators (Lemma 4.29), the Wick product and the composition of convolution operators coincide, namely, $C_{\Phi_1} C_{\Phi_2} = C_{\Phi_1} \diamond C_{\Phi_2}$. The rest is a routine verification.

To summarize the characteristic properties of convolution operators, we mention some basic results on the translation operator $T_y$.

Lemma 4.33. For $y \in E^*$ we have
$$T_y = \sum_{m=0}^\infty \frac{(-1)^m}{m!}\, \Xi_{0,m}(y^{\otimes m}).$$

Proof. Recall from (4.28) that $T_y\phi_\xi = e^{-\langle y, \xi \rangle}\phi_\xi$. Then we have
$$\widehat{T_y}(\xi, \eta) = \langle\!\langle T_y\phi_\xi, \phi_\eta \rangle\!\rangle = e^{-\langle y, \xi \rangle} e^{\langle \xi, \eta \rangle} = \sum_{m=0}^\infty \frac{1}{m!} \langle (-y)^{\otimes m}, \xi^{\otimes m} \rangle e^{\langle \xi, \eta \rangle},$$
from which the Fock expansion of $T_y$ follows.
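In a one-mode model (an assumption), Lemma 4.33 says that $T_y = \sum_m (-y)^m a^m/m!$ is the exponential $e^{-ya}$ of the annihilation operator $a = a(1)$. The minimal sketch below sums this series on a finite list, where it terminates because $a^m\phi$ vanishes once $m$ reaches the length of the list, and compares the result with the coefficient formula of Proposition 4.27; helper names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

# One-mode toy model (assumption): phi = (f_n) is a finite list of numbers.

def annihilate(f, phi):
    return [(n + 1) * f * phi[n + 1] for n in range(len(phi) - 1)]

def translate(y, f):
    """Proposition 4.27: g_n = sum_k C(n+k, k) (-y)**k f_{n+k}."""
    return [sum(comb(n + k, k) * (-y) ** k * f[n + k] for k in range(len(f) - n))
            for n in range(len(f))]

def translate_by_fock_expansion(y, phi):
    """Lemma 4.33: T_y phi = sum_m (-y)**m / m! * a^m phi."""
    result = [Fraction(0)] * len(phi)
    power, m = list(phi), 0            # power holds a^m phi
    while power:                       # a^m phi is the empty list once m = len(phi)
        coeff = Fraction((-y) ** m, factorial(m))
        for n, c in enumerate(power):
            result[n] += coeff * c
        power = annihilate(1, power)
        m += 1
    return result

phi = [3, 0, -1, 2, 5, 1]
y = 2
```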
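We need some notation. For $\kappa_{l,m} \in (E^{\otimes(l+m)})^*$ and $\zeta \in E$ let $\zeta * \kappa_{l,m} \in (E^{\otimes(l+m-1)})^*_{\mathrm{sym}(l-1,m)}$ and $\kappa_{l,m} * \zeta \in (E^{\otimes(l+m-1)})^*_{\mathrm{sym}(l,m-1)}$ be defined respectively by
$$\langle \zeta * \kappa_{l,m}, \eta^{\otimes(l-1)} \otimes \xi^{\otimes m} \rangle = \langle \kappa_{l,m}, \zeta \otimes \eta^{\otimes(l-1)} \otimes \xi^{\otimes m} \rangle,$$
$$\langle \kappa_{l,m} * \zeta, \eta^{\otimes l} \otimes \xi^{\otimes(m-1)} \rangle = \langle \kappa_{l,m}, \eta^{\otimes l} \otimes \xi^{\otimes(m-1)} \otimes \zeta \rangle,$$
where $\eta, \xi \in E$.

Lemma 4.34. For $\kappa_{l,m} \in (E^{\otimes(l+m)})^*$ and $\zeta \in E$ we have
$$[a(\zeta), \Xi_{l,m}(\kappa_{l,m})] = l\, \Xi_{l-1,m}(\zeta * \kappa_{l,m}), \tag{4.32}$$
$$[a^*(\zeta), \Xi_{l,m}(\kappa_{l,m})] = -m\, \Xi_{l,m-1}(\kappa_{l,m} * \zeta). \tag{4.33}$$
(The right-hand sides are understood to be zero for $l = 0$ and $m = 0$, respectively.)

Proof. The proofs being parallel, we show (4.33). It follows from Proposition 4.10 that
$$\begin{aligned} \langle\!\langle \Xi_{l,m}(\kappa_{l,m})\, a^*(\zeta)\phi_\xi, \phi_\eta \rangle\!\rangle &= \frac{d}{d\theta}\Big|_{\theta=0} \langle\!\langle \Xi_{l,m}(\kappa_{l,m})\phi_{\xi+\theta\zeta}, \phi_\eta \rangle\!\rangle \\ &= m \langle \kappa_{l,m}, \eta^{\otimes l} \otimes \xi^{\otimes(m-1)} \otimes \zeta \rangle e^{\langle \xi, \eta \rangle} + \langle \zeta, \eta \rangle \langle \kappa_{l,m}, \eta^{\otimes l} \otimes \xi^{\otimes m} \rangle e^{\langle \xi, \eta \rangle}. \end{aligned} \tag{4.34}$$
On the other hand,
$$\langle\!\langle a^*(\zeta)\, \Xi_{l,m}(\kappa_{l,m})\phi_\xi, \phi_\eta \rangle\!\rangle = \langle \zeta, \eta \rangle \langle \kappa_{l,m}, \eta^{\otimes l} \otimes \xi^{\otimes m} \rangle e^{\langle \xi, \eta \rangle}. \tag{4.35}$$
From (4.34) and (4.35) we see that
$$\begin{aligned} &\langle\!\langle a^*(\zeta)\, \Xi_{l,m}(\kappa_{l,m})\phi_\xi, \phi_\eta \rangle\!\rangle - \langle\!\langle \Xi_{l,m}(\kappa_{l,m})\, a^*(\zeta)\phi_\xi, \phi_\eta \rangle\!\rangle \\ &\qquad = -m \langle \kappa_{l,m}, \eta^{\otimes l} \otimes \xi^{\otimes(m-1)} \otimes \zeta \rangle e^{\langle \xi, \eta \rangle} = -m \langle \kappa_{l,m} * \zeta, \eta^{\otimes l} \otimes \xi^{\otimes(m-1)} \rangle e^{\langle \xi, \eta \rangle} \\ &\qquad = -m \langle\!\langle \Xi_{l,m-1}(\kappa_{l,m} * \zeta)\phi_\xi, \phi_\eta \rangle\!\rangle, \end{aligned}$$
which shows (4.33).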
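In a one-mode model (an assumption), $\zeta * \kappa_{l,m}$ and $\kappa_{l,m} * \zeta$ both reduce to the product $\zeta\kappa$, and Lemma 4.34 becomes $[\zeta a, \kappa (a^*)^l a^m] = l\,\zeta\kappa\,(a^*)^{l-1}a^m$ and $[\zeta a^*, \kappa (a^*)^l a^m] = -m\,\zeta\kappa\,(a^*)^l a^{m-1}$. The minimal sketch below verifies both exactly on a finite coefficient list for small $l, m \ge 1$, with $\Xi_{l,m}$ implemented via (4.5); helper names are hypothetical.

```python
import math
from fractions import Fraction

# One-mode toy model (assumption): kernels, zeta and coefficients are scalars.

def annihilate(f, phi):
    return [(n + 1) * f * phi[n + 1] for n in range(len(phi) - 1)]

def create(f, phi):
    return [0] + [f * c for c in phi]

def kernel_operator(l, m, kappa, phi):
    """(4.5): g_n = 0 for n < l and g_{l+n} = (n+m)!/n! * kappa * f_{n+m}."""
    g = [0] * l
    for n in range(len(phi) - m):
        g.append(Fraction(math.factorial(n + m), math.factorial(n)) * kappa * phi[n + m])
    return g

def commutator(A, B, phi):
    """[A, B] phi = A(B phi) - B(A phi) for maps acting on finite coefficient lists."""
    u, v = A(B(phi)), B(A(phi))
    assert len(u) == len(v)
    return [p - q for p, q in zip(u, v)]

phi = [Fraction(n * n - 2, n + 1) for n in range(9)]
zeta, kappa = Fraction(3, 2), Fraction(-2, 5)
```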
Theorem 4.35. For a white noise operator $\Xi \in L(W, W)$ the following conditions are equivalent:
(i) Ξ is a convolution operator, i.e., $\Xi = C_\Phi$ for some $\Phi \in W^*$;
(ii) Ξ commutes with all translations $T_y$, $y \in E^*$;
(iii) Ξ commutes with all annihilation operators $a(y) = \Xi_{0,1}(y)$, $y \in E^*$;
(iv) Ξ is the adjoint operator of a Wick multiplication operator.

Proof. (i) ⇒ (ii). We see from Lemma 4.29 that a convolution operator consists of annihilation operators only. Similarly, by Lemma 4.33, so does a translation operator. Hence convolution and translation operators commute.

(ii) ⇒ (iii). For $y \in E^*$ we have
$$\lim_{t \to 0} \frac{T_{ty} - I}{t} = -a(y)$$
in $L(W, W)$. Then, differentiating $\Xi T_{ty} = T_{ty}\Xi$ at $t = 0$, we obtain $\Xi a(y) = a(y)\Xi$.

(iii) ⇒ (i). Consider the Fock expansion of Ξ and apply Lemma 4.34. We then see that Ξ contains only annihilation operators, which means that it is a convolution operator.

(i) ⇔ (iv) is shown in Theorem 4.30.

Remark 4.36. In the existing literature, products or operations similar to the Wick product have been discussed from various aspects. First of all, recall that the Wick product is also called the normal-ordered product in much of the physics literature. In some works [5] the Wick product of white noise operators is called the convolution product. In others, e.g., [21], the right-hand side of (4.31) is taken as the definition of the "convolution product" of two white noise distributions, denoted by $\Phi * \Psi$ or by similar symbols. Clearly, this convolution product coincides with the Wick product. Also, in some works $C_\Phi\phi$ is denoted by $\Phi * \phi$. As remarked in [21, Remark 2.1], one should keep in mind that $\Phi * \phi$ is not consistent with $\Phi * \Psi$; see also Theorem 4.30. The notion of the convolution product of two white noise distributions was also introduced by Kuo [64], motivated by a direct analogue of the convolution of measures on an abelian group. In fact, Kuo's convolution product, denoted here by $\Phi \circledast \Psi$, is defined by
$$S(\Phi \circledast \Psi)(\xi) = S\Phi(\xi)\, S\Psi(\xi)\, e^{\langle \xi, \xi \rangle/2}.$$
We see from (4.20) that Kuo's convolution product differs from the Wick product, and hence from the convolution product in [5].
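Conditions (ii) and (iii) of Theorem 4.35 can be illustrated in a one-mode model (an assumption), where a convolution operator is a power series $\sum_m F_m a^m$ in the annihilation operator and therefore commutes with $a(y) = y\,a$ and with $T_y$. The minimal sketch below checks both commutations exactly on a finite coefficient list, reusing the coefficient formulas (4.29) and Proposition 4.27; helper names and test values are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

# One-mode toy model (assumption): all coefficients are scalars.

def annihilate(f, phi):
    return [(n + 1) * f * phi[n + 1] for n in range(len(phi) - 1)]

def translate(y, f):
    """Proposition 4.27: g_n = sum_k C(n+k, k) (-y)**k f_{n+k}."""
    return [sum(comb(n + k, k) * (-y) ** k * f[n + k] for k in range(len(f) - n))
            for n in range(len(f))]

def convolution(F, f):
    """(4.29): h_k = sum_n (n+k)!/k! * F_n * f_{n+k}."""
    return [sum(Fraction(factorial(n + k), factorial(k)) * F[n] * f[n + k]
                for n in range(min(len(F), len(f) - k)))
            for k in range(len(f))]

F = [Fraction(2), Fraction(-1, 3), Fraction(4), Fraction(1, 5)]
phi = [Fraction(1), Fraction(3), Fraction(-2), Fraction(0), Fraction(7, 2), Fraction(1)]
y = Fraction(-3, 2)
```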
5. Quantum Stochastic Gradients

Throughout this section we take $T = \mathbb{R}$ and employ the Hida–Kubo–Takenaka space, i.e., the CKS-space
$$W \subset \Gamma(H) \subset W^*, \qquad H = L^2(\mathbb{R}; dt; \mathbb{C}),$$
with weight sequence $\alpha(n) \equiv 1$. In this section we write $L^2(\mathbb{R}) = L^2(\mathbb{R}; dt; \mathbb{C})$. Slightly generalizing the traditional terminology, we call a time-parameterized family of white noise distributions $\{\Phi_t\} \subset W^*$, where t runs over an interval, a classical stochastic process. Similarly, $\{\Xi_t\} \subset L(W, W^*)$ is called a quantum stochastic process. In this paper we always assume that the maps $t \mapsto \Phi_t \in W^*$ and $t \mapsto \Xi_t \in L(W, W^*)$ are continuous. For example, the white noise process $\{W_t\}$ and the Brownian motion $\{B_t\}$ are classical stochastic processes, and the quantum white noise processes $\{a_t\}$ and $\{a_t^*\}$ are quantum stochastic processes.

5.1. Annihilation, creation and conservation processes

For a quantum stochastic process $\{L_t\} \subset L(W, W^*)$ the integral
$$\Xi_t = \int_a^t L_s\, ds \tag{5.1}$$
is defined in the usual manner, e.g., through the canonical bilinear form, and becomes a quantum stochastic process. Moreover, the map $t \mapsto \Xi_t \in L(W, W^*)$ is differentiable and
$$\frac{d}{dt}\Xi_t = \frac{d}{dt}\int_a^t L_s\, ds = L_t \tag{5.2}$$
holds in $L(W, W^*)$. The annihilation process, the creation process and the number (conservation) process, which play a fundamental role in the Hudson–Parthasarathy calculus [38, 72, 84], are quantum stochastic processes also in our sense and are expressed as
$$A_t = \int_0^t a_s\, ds, \qquad A_t^* = \int_0^t a_s^*\, ds, \qquad \Lambda_t = \int_0^t a_s^* a_s\, ds, \tag{5.3}$$
respectively. Note that the Brownian motion is decomposed into a sum of the annihilation and creation processes:
$$B_t = A_t + A_t^*, \tag{5.4}$$
where the left-hand side is regarded as a multiplication operator. Evidently, (5.4) is an integral form of the quantum decomposition (4.26) of the classical white noise. As is easily verified, if $\{L_t\}$ is a quantum stochastic process, so are both $\{L_t a_t\}$ and $\{a_t^* L_t\}$. Therefore, for a quantum stochastic process $\{L_t\}$ the integrals
$$\int_0^t L_s a_s\, ds, \qquad \int_0^t a_s^* L_s\, ds$$
are meaningful in $L(W, W^*)$ and become quantum stochastic processes. These are called the quantum stochastic integrals against the annihilation process and against the creation process, respectively. The latter is also called the quantum Hitsuda–Skorohod integral. The classical Hitsuda–Skorohod integral is defined for $\{\Phi_t\} \subset W^*$ by
$$\Psi_t = \int_0^t a_s^* \Phi_s\, ds, \tag{5.5}$$
where the right-hand side is an integral of a $W^*$-valued continuous function on the interval $[0, t]$; see e.g. [32]. The classical–quantum correspondence is stated in the following.

Proposition 5.1. Let $\{\Phi_t\} \subset W^*$ be a classical stochastic process. Let $\{\Psi_t\}$ be the classical Hitsuda–Skorohod integral defined by (5.5), and $\{\Xi_t\}$ the quantum Hitsuda–Skorohod integral defined by
$$\Xi_t = \int_0^t a_s^* M_{\Phi_s}\, ds.$$
Then we have $\Psi_t = \Xi_t\phi_0$.

5.2. Classical stochastic gradient

It follows from Proposition 4.8 that for $\phi \in W$ the map $t \mapsto a_t\phi$ is a W-valued rapidly decreasing function (note here that $E_{\mathbb{R}} = S(\mathbb{R})$). Then the map ∇ defined by
$$\nabla\phi(t) = a_t\phi, \qquad \phi \in W, \quad t \in \mathbb{R}, \tag{5.6}$$
becomes a continuous linear map from W into $S(\mathbb{R}) \otimes W \cong S(\mathbb{R}; W)$. We need to extend the domain of ∇. We set
$$\|\phi\|_D^2 = \sum_{n=0}^\infty (n+1)\, n!\, |f_n|_0^2, \qquad \phi = (f_n), \quad f_n \in H^{\widehat{\otimes} n}. \tag{5.7}$$
Let D be the space of $\phi = (f_n)$ such that $\|\phi\|_D < \infty$. Then D is a subspace of Γ(H) and becomes a Hilbert space equipped with the norm $\|\cdot\|_D$. For the dual space we define
$$\|\Phi\|_{D^*}^2 = \sum_{n=0}^\infty (n+1)^{-1}\, n!\, |F_n|_0^2, \qquad \Phi = (F_n), \quad F_n \in H^{\widehat{\otimes} n}. \tag{5.8}$$
Let $D^*$ be the space of $\Phi = (F_n)$ satisfying $\|\Phi\|_{D^*} < \infty$. We then have the inclusions
$$W \subset D \subset \Gamma(H) \subset D^* \subset W^*.$$

Lemma 5.2. The map ∇ in (5.6) extends uniquely to a continuous linear map from D into $L^2(\mathbb{R}) \otimes \Gamma(H) \cong L^2(\mathbb{R}; \Gamma(H))$. Denoting the extension by the same symbol, we have
$$\|\nabla\phi\|_{L^2(\mathbb{R};\Gamma(H))} \le \|\phi\|_D, \qquad \phi \in D.$$
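The estimate in Lemma 5.2 rests on the identity $\int_{\mathbb{R}} \|a_t\phi\|_{\Gamma(H)}^2\, dt = \sum_n n\, n!\, |f_n|_0^2$, which is at most $\|\phi\|_D^2$ by (5.7); similarly, (5.8) gives $\|\nabla\phi\|_{L^2(\mathbb{R};D^*)}^2 = \sum_{n \ge 1} n!\, |f_n|_0^2 \le \|\phi\|_{\Gamma(H)}^2$. The minimal sketch below checks both in a one-mode model (an assumption), where the gradient is replaced by the single annihilation operator $a = a(1)$; helper names are hypothetical.

```python
from fractions import Fraction
from math import factorial

# One-mode toy model (assumption): phi = (f_n) with scalar f_n; grad phi ~ a(1) phi.

def annihilate(f, phi):
    return [(n + 1) * f * phi[n + 1] for n in range(len(phi) - 1)]

def fock_norm_sq(phi):
    """||phi||^2 in Gamma(H): sum_n n! |f_n|^2."""
    return sum(factorial(n) * abs(c) ** 2 for n, c in enumerate(phi))

def d_norm_sq(phi):
    """(5.7): ||phi||_D^2 = sum_n (n+1) n! |f_n|^2."""
    return sum((n + 1) * factorial(n) * abs(c) ** 2 for n, c in enumerate(phi))

def d_star_norm_sq(Phi):
    """(5.8): ||Phi||_{D*}^2 = sum_n n! |F_n|^2 / (n+1)."""
    return sum(Fraction(factorial(n), n + 1) * abs(c) ** 2 for n, c in enumerate(Phi))

phi = [Fraction(1), Fraction(-3, 2), Fraction(2), Fraction(1, 5), Fraction(-4)]
grad = annihilate(1, phi)   # one-mode stand-in for the stochastic gradient
```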
Lemma 5.3. The map ∇ in (5.6) extends uniquely to a continuous linear map from Γ(H) into $L^2(\mathbb{R}) \otimes D^* \cong L^2(\mathbb{R}; D^*)$. Denoting the extension by the same symbol, we have
$$\|\nabla\phi\|_{L^2(\mathbb{R};D^*)} \le \|\phi\|_{\Gamma(H)}, \qquad \phi \in \Gamma(H). \tag{5.9}$$
The proofs are straightforward. Thus, extending the domain of ∇ to D and Γ(H), we obtain the following diagram:
$$\begin{array}{ccccc} W & \longrightarrow & D & \longrightarrow & \Gamma(H) \\ \downarrow{\scriptstyle \nabla} & & \downarrow{\scriptstyle \nabla} & & \downarrow{\scriptstyle \nabla} \\ S(\mathbb{R}; W) & \longrightarrow & L^2(\mathbb{R}; \Gamma(H)) & \longrightarrow & L^2(\mathbb{R}; D^*), \end{array} \tag{5.10}$$
where the horizontal arrows are continuous inclusions and the vertical arrows are continuous linear maps, which differ in their domains but are all denoted by the same symbol ∇. We refer to ∇ as the (classical) stochastic gradient. The middle ∇ in (5.10) often appears in the literature; see e.g. Kuo [65], Malliavin [71], Nualart [73].

Proposition 5.4. Let $\zeta \in H = L^2(\mathbb{R})$. Then $a(\zeta) \in L(D, \Gamma(H))$ and we have
$$a(\zeta)\phi = \int_{\mathbb{R}} \zeta(t)\, \nabla\phi(t)\, dt, \qquad \phi \in D. \tag{5.11}$$
Moreover, it holds that
$$\langle\!\langle a(\zeta)\phi, \psi \rangle\!\rangle = \langle\!\langle \nabla\phi, \zeta \otimes \psi \rangle\!\rangle, \qquad \phi \in D, \quad \psi \in \Gamma(H). \tag{5.12}$$
Also, $a(\zeta) \in \mathcal{L}(\Gamma(H), D^*)$, and relations similar to (5.11) and (5.12) hold.

Proof. It is straightforward to check (5.11) for exponential vectors $\phi_\xi$ with $\xi \in E$. Since both sides are continuous in $\phi$ with respect to the topology of $D$, and the exponential vectors span a dense subspace of $D$, we obtain the result. The proof of (5.12) is then obvious.

5.3. Creation gradient

We will define the creation gradient as a continuous linear map acting on white noise operators. Several domains will be introduced. We start with $\mathcal{L}(W, D)$. Consider $\tilde{\nabla}^+$ defined by
$$\tilde{\nabla}^+ : D \otimes W^* \xrightarrow{\ \nabla \otimes I\ } L^2(\mathbb{R}; \Gamma(H)) \otimes W^* \xrightarrow{\ \cong\ } L^2(\mathbb{R}; \Gamma(H) \otimes W^*), \tag{5.13}$$
where the first arrow is defined by the middle version of the stochastic gradient in (5.10) and the second one needs clarification. It is known that
$$L^2(\mathbb{R}; \Gamma(H)) \otimes W^* \cong \operatorname*{ind\,lim}_{p\to\infty} L^2(\mathbb{R}; \Gamma(H)) \otimes \Gamma(E_{-p}). \tag{5.14}$$
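On the other hand, in view of the isomorphisms among Hilbert spaces
$$L^2(\mathbb{R}; \Gamma(H)) \otimes \Gamma(E_{-p}) \cong (L^2(\mathbb{R}) \otimes \Gamma(H)) \otimes \Gamma(E_{-p}) \cong L^2(\mathbb{R}) \otimes (\Gamma(H) \otimes \Gamma(E_{-p})) \cong L^2(\mathbb{R}; \Gamma(H) \otimes \Gamma(E_{-p})),$$
where the tensor products are all in the sense of Hilbert spaces, we define
$$L^2(\mathbb{R}; \Gamma(H) \otimes W^*) = \operatorname*{ind\,lim}_{p\to\infty} L^2(\mathbb{R}; \Gamma(H) \otimes \Gamma(E_{-p})). \tag{5.15}$$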
Then, combining (5.14) and (5.15), we obtain the isomorphism in (5.13). Let $K$ denote the isomorphisms $\mathcal{L}(W, D) \to D \otimes W^*$ and $\mathcal{L}(W, \Gamma(H)) \to \Gamma(H) \otimes W^*$. The latter naturally induces an isomorphism $L^2(\mathbb{R}; \mathcal{L}(W, \Gamma(H))) \to L^2(\mathbb{R}; \Gamma(H) \otimes W^*)$, which is also denoted by $K$, i.e., $K(\Xi)(t) = K(\Xi(t))$. With these notations we define
$$\nabla^+ = K^{-1} \circ \tilde{\nabla}^+ \circ K.$$
In other words, $\nabla^+$ is the composition of continuous linear maps:
$$\nabla^+ : \mathcal{L}(W, D) \xrightarrow{\ \cong\ } D \otimes W^* \xrightarrow{\ \tilde{\nabla}^+\ } L^2(\mathbb{R}; \Gamma(H) \otimes W^*) \xrightarrow{\ \cong\ } L^2(\mathbb{R}; \mathcal{L}(W, \Gamma(H))),$$
which is called the creation gradient. Note that $\mathcal{L}_2(\Gamma(E_p), D) \subset \mathcal{L}(W, D)$ for all $p \in \mathbb{R}$, where $\mathcal{L}_2(\cdot, \cdot)$ denotes the space of Hilbert–Schmidt operators. For the restriction of $\nabla^+$ we have the following result.

Theorem 5.5. Let $p \in \mathbb{R}$. For any $\Xi \in \mathcal{L}_2(\Gamma(E_p), D)$ its creation gradient $\nabla^+\Xi$ belongs to $L^2(\mathbb{R}; \mathcal{L}_2(\Gamma(E_p), \Gamma(H)))$ and
$$\int_{\mathbb{R}} \|\nabla^+\Xi(t)\|^2_{\mathcal{L}_2(\Gamma(E_p),\Gamma(H))}\,dt \le \|\Xi\|^2_{\mathcal{L}_2(\Gamma(E_p),D)}. \tag{5.16}$$
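In particular, $\nabla^+ : \mathcal{L}_2(\Gamma(E_p), D) \to L^2(\mathbb{R}; \mathcal{L}_2(\Gamma(E_p), \Gamma(H)))$ is a contraction.

Proof. We see that $\nabla^+$ restricted to $\mathcal{L}_2(\Gamma(E_p), D)$ is the composition of the maps
$$\mathcal{L}_2(\Gamma(E_p), D) \xrightarrow{\ \cong\ } D \otimes \Gamma(E_{-p}) \xrightarrow{\ \nabla \otimes I\ } L^2(\mathbb{R}; \Gamma(H)) \otimes \Gamma(E_{-p}) \xrightarrow{\ \cong\ } L^2(\mathbb{R}; \Gamma(H) \otimes \Gamma(E_{-p})) \xrightarrow{\ \cong\ } L^2(\mathbb{R}; \mathcal{L}_2(\Gamma(E_p), \Gamma(H))),$$
where the isomorphisms $\cong$ are all unitary isomorphisms between Hilbert spaces. On the other hand, $\nabla : D \to L^2(\mathbb{R}; \Gamma(H))$ is a contraction by Lemma 5.2, and hence so is $\nabla \otimes I$. Then (5.16) follows.

Corollary 5.6. For any $\Xi \in \mathcal{L}_2(\Gamma(H), D)$ we have
$$\int_{\mathbb{R}} \|\nabla^+\Xi(t)\|^2_{\mathcal{L}_2(\Gamma(H),\Gamma(H))}\,dt \le \|\Xi\|^2_{\mathcal{L}_2(\Gamma(H),D)}.$$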
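In particular, $\nabla^+\Xi(t)$ is a Hilbert–Schmidt operator on $\Gamma(H)$ for a.e. $t \in \mathbb{R}$.

Next note that $\mathcal{L}(W^*, D) \subset \mathcal{L}(W, D)$. The creation gradient $\nabla^+$ restricted to $\mathcal{L}(W^*, D)$ is the composition of continuous linear maps:
$$\nabla^+ : \mathcal{L}(W^*, D) \xrightarrow{\ \cong\ } D \otimes W \xrightarrow{\ \nabla \otimes I\ } L^2(\mathbb{R}; \Gamma(H)) \otimes W \xrightarrow{\ \cong\ } L^2(\mathbb{R}; \Gamma(H) \otimes W) \xrightarrow{\ \cong\ } L^2(\mathbb{R}; \mathcal{L}(W^*, \Gamma(H))),$$
where $L^2(\mathbb{R}; \Gamma(H) \otimes W)$ is defined by
$$L^2(\mathbb{R}; \Gamma(H) \otimes W) = \operatorname*{proj\,lim}_{p\to\infty} L^2(\mathbb{R}; \Gamma(H) \otimes \Gamma(E_p)).$$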
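The above discussion of the creation gradient is summarized in the following diagram:
$$\begin{array}{ccccc}
\mathcal{L}(W^*, D) & \longrightarrow & \mathcal{L}_2(\Gamma(E_p), D) & \longrightarrow & \mathcal{L}(W, D) \\
\Big\downarrow{\scriptstyle \nabla^+} & & \Big\downarrow{\scriptstyle \nabla^+} & & \Big\downarrow{\scriptstyle \nabla^+} \\
L^2(\mathbb{R}; \mathcal{L}(W^*, \Gamma(H))) & \longrightarrow & L^2(\mathbb{R}; \mathcal{L}_2(\Gamma(E_p), \Gamma(H))) & \longrightarrow & L^2(\mathbb{R}; \mathcal{L}(W, \Gamma(H)))
\end{array} \tag{5.17}$$
where $p$ runs over $\mathbb{R}$ and the right arrows are continuous injections (inclusions). Note that
$$\mathcal{L}(W, D) \cong D \otimes W^* \cong \operatorname*{ind\,lim}_{p\to\infty} D \otimes \Gamma(E_{-p}) \cong \operatorname*{ind\,lim}_{p\to\infty} \mathcal{L}_2(\Gamma(E_p), D),$$
$$\mathcal{L}(W^*, D) \cong D \otimes W \cong \operatorname*{proj\,lim}_{p\to\infty} D \otimes \Gamma(E_p) \cong \operatorname*{proj\,lim}_{p\to\infty} \mathcal{L}_2(\Gamma(E_{-p}), D).$$
Hence, a white noise operator $\Xi$ belongs to $\mathcal{L}(W, D)$ if and only if there exists $p \ge 0$ such that $\Xi \in \mathcal{L}_2(\Gamma(E_p), D)$. Similarly, $\Xi$ belongs to $\mathcal{L}(W^*, D)$ if and only if $\Xi \in \mathcal{L}_2(\Gamma(E_{-p}), D)$ for all $p \ge 0$. Thus, the norm estimates of $\nabla^+$ in (5.17) are obtained from Theorem 5.5. In a similar manner we arrive at the following diagram:
$$\begin{array}{ccccc}
\mathcal{L}(W^*, \Gamma(H)) & \longrightarrow & \mathcal{L}_2(\Gamma(E_p), \Gamma(H)) & \longrightarrow & \mathcal{L}(W, \Gamma(H)) \\
\Big\downarrow{\scriptstyle \nabla^+} & & \Big\downarrow{\scriptstyle \nabla^+} & & \Big\downarrow{\scriptstyle \nabla^+} \\
L^2(\mathbb{R}; \mathcal{L}(W^*, D^*)) & \longrightarrow & L^2(\mathbb{R}; \mathcal{L}_2(\Gamma(E_p), D^*)) & \longrightarrow & L^2(\mathbb{R}; \mathcal{L}(W, D^*))
\end{array} \tag{5.18}$$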
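where $p$ runs over $\mathbb{R}$ and
$$\mathcal{L}(W, \Gamma(H)) \cong \operatorname*{ind\,lim}_{p\to\infty} \mathcal{L}_2(\Gamma(E_p), \Gamma(H)), \qquad \mathcal{L}(W^*, \Gamma(H)) \cong \operatorname*{proj\,lim}_{p\to\infty} \mathcal{L}_2(\Gamma(E_{-p}), \Gamma(H)).$$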
Here we also have the following result.

Theorem 5.7. Let $p \in \mathbb{R}$. For any $\Xi \in \mathcal{L}_2(\Gamma(E_p), \Gamma(H))$ its creation gradient $\nabla^+\Xi$ belongs to $L^2(\mathbb{R}; \mathcal{L}_2(\Gamma(E_p), D^*))$ and
$$\int_{\mathbb{R}} \|\nabla^+\Xi(t)\|^2_{\mathcal{L}_2(\Gamma(E_p),D^*)}\,dt \le \|\Xi\|^2_{\mathcal{L}_2(\Gamma(E_p),\Gamma(H))}.$$
In particular, $\nabla^+ : \mathcal{L}_2(\Gamma(E_p), \Gamma(H)) \to L^2(\mathbb{R}; \mathcal{L}_2(\Gamma(E_p), D^*))$ is a contraction.

Finally, we mention a role of the creation gradient; compare Proposition 5.4 for the classical case. The proof is straightforward from the definition.

Proposition 5.8. Let $\Xi$ be a member of one of the domains of the creation gradient in (5.17) and (5.18). Then, for $\zeta \in H = L^2(\mathbb{R})$, the composition $a(\zeta)\Xi$ is defined as a continuous operator and admits the integral expression
$$a(\zeta)\Xi = \int_{\mathbb{R}} \zeta(t)\,\nabla^+\Xi(t)\,dt. \tag{5.19}$$
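5.4. Annihilation gradient

Recall that one of the domains of the creation gradient $\nabla^+$ is $\mathcal{L}(W, D)$; see (5.17). We consider the space of adjoint operators, i.e., $\mathcal{L}(D^*, W^*)$, and define $\nabla^-$ by composition as follows:
$$\nabla^- : \mathcal{L}(D^*, W^*) \xrightarrow{\ \cong\ } W^* \otimes D \xrightarrow{\ I \otimes \nabla\ } W^* \otimes L^2(\mathbb{R}; \Gamma(H)) \xrightarrow{\ \cong\ } L^2(\mathbb{R}; W^* \otimes \Gamma(H)) \xrightarrow{\ \cong\ } L^2(\mathbb{R}; \mathcal{L}(\Gamma(H), W^*)),$$
where $L^2(\mathbb{R}; W^* \otimes \Gamma(H))$ is defined by
$$L^2(\mathbb{R}; W^* \otimes \Gamma(H)) = \operatorname*{ind\,lim}_{p\to\infty} L^2(\mathbb{R}; \Gamma(E_{-p}) \otimes \Gamma(H)).$$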
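We call $\nabla^-$ the annihilation gradient. Considerations similar to (5.17) yield
$$\begin{array}{ccccc}
\mathcal{L}(D^*, W) & \longrightarrow & \mathcal{L}_2(D^*, \Gamma(E_p)) & \longrightarrow & \mathcal{L}(D^*, W^*) \\
\Big\downarrow{\scriptstyle \nabla^-} & & \Big\downarrow{\scriptstyle \nabla^-} & & \Big\downarrow{\scriptstyle \nabla^-} \\
L^2(\mathbb{R}; \mathcal{L}(\Gamma(H), W)) & \longrightarrow & L^2(\mathbb{R}; \mathcal{L}_2(\Gamma(H), \Gamma(E_p))) & \longrightarrow & L^2(\mathbb{R}; \mathcal{L}(\Gamma(H), W^*))
\end{array} \tag{5.20}$$
where
$$\mathcal{L}(D^*, W) \cong \operatorname*{proj\,lim}_{p\to\infty} \mathcal{L}_2(D^*, \Gamma(E_p)), \qquad \mathcal{L}(D^*, W^*) \cong \operatorname*{ind\,lim}_{p\to\infty} \mathcal{L}_2(D^*, \Gamma(E_{-p})).$$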
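For the norm estimate of the annihilation gradient we have the following result, whose proof is similar to that of Theorem 5.5.

Theorem 5.9. Let $p \in \mathbb{R}$. For any $\Xi \in \mathcal{L}_2(D^*, \Gamma(E_p))$ its annihilation gradient $\nabla^-\Xi$ belongs to $L^2(\mathbb{R}; \mathcal{L}_2(\Gamma(H), \Gamma(E_p)))$ and
$$\int_{\mathbb{R}} \|\nabla^-\Xi(t)\|^2_{\mathcal{L}_2(\Gamma(H),\Gamma(E_p))}\,dt \le \|\Xi\|^2_{\mathcal{L}_2(D^*,\Gamma(E_p))}. \tag{5.21}$$

Corollary 5.10. For any $\Xi \in \mathcal{L}_2(D^*, \Gamma(H))$, $\nabla^-\Xi$ belongs to $L^2(\mathbb{R}; \mathcal{L}_2(\Gamma(H), \Gamma(H)))$ and
$$\int_{\mathbb{R}} \|\nabla^-\Xi(t)\|^2_{\mathcal{L}_2(\Gamma(H),\Gamma(H))}\,dt \le \|\Xi\|^2_{\mathcal{L}_2(D^*,\Gamma(H))}. \tag{5.22}$$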
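In particular, $\nabla^-\Xi(t)$ is a Hilbert–Schmidt operator on $\Gamma(H)$ for a.e. $t \in \mathbb{R}$. Repeating similar considerations, we obtain the following diagram:
$$\begin{array}{ccccc}
\mathcal{L}(\Gamma(H), W) & \longrightarrow & \mathcal{L}_2(\Gamma(H), \Gamma(E_p)) & \longrightarrow & \mathcal{L}(\Gamma(H), W^*) \\
\Big\downarrow{\scriptstyle \nabla^-} & & \Big\downarrow{\scriptstyle \nabla^-} & & \Big\downarrow{\scriptstyle \nabla^-} \\
L^2(\mathbb{R}; \mathcal{L}(D, W)) & \longrightarrow & L^2(\mathbb{R}; \mathcal{L}_2(D, \Gamma(E_p))) & \longrightarrow & L^2(\mathbb{R}; \mathcal{L}(D, W^*))
\end{array} \tag{5.23}$$
which corresponds to (5.18). The norm estimates are similar to (5.21) and (5.22).

Theorem 5.11. Let $\Xi$ be a member of one of the domains of the annihilation gradient in (5.20) and (5.23). Then, for $\zeta \in H = L^2(\mathbb{R})$, the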
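composition $\Xi a^*(\zeta)$ is defined as a continuous operator and admits the integral expression
$$\Xi a^*(\zeta) = \int_{\mathbb{R}} \zeta(t)\,\nabla^-\Xi(t)\,dt. \tag{5.24}$$
Moreover,
$$\nabla^-\Xi(t) = \big((\nabla^+\Xi^*)(t)\big)^* \quad \text{for a.e. } t \in \mathbb{R}. \tag{5.25}$$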
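Proof. The proof of (5.24) is similar to that of Proposition 5.8, while (5.25) follows from
$$\Xi a^*(\zeta) = (a(\zeta)\Xi^*)^* = \Big(\int_{\mathbb{R}} \zeta(t)\,(\nabla^+\Xi^*)(t)\,dt\Big)^* = \int_{\mathbb{R}} \zeta(t)\,\big((\nabla^+\Xi^*)(t)\big)^*\,dt,$$
which is easily justified.

5.5. Conservation gradient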
We need the “diagonalized” tensor product $\nabla\,\nabla$ of the stochastic gradients. We begin with the following.

Lemma 5.12. For any $p \ge 0$ and $q > 0$ with $p + q > 5/12$ there exists a constant $C(p,q) > 0$ such that
$$\sup_{t\in\mathbb{R}} \|\nabla\psi(t)\|_p^2 \le C(p,q)\,\|\psi\|_{p+q}^2, \qquad \psi \in W. \tag{5.26}$$
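Proof. As mentioned in Example 2.13,
$$\sup_{t\in\mathbb{R}} |\delta_t|_{-r} < \infty, \qquad r > \frac{5}{12}.$$
Then, for any pair $p, q$ satisfying the assumption, we have a finite constant $C(p,q)$ defined by
$$C(p,q) = \sup\big\{\rho^{2qn}(n+1)\,|\delta_t|_{-(p+q)}^2 \,;\, t \in \mathbb{R},\ n = 0, 1, 2, \dots\big\} < \infty. \tag{5.27}$$
Now for $\psi = (g_n) \in W$ we have $\nabla\psi(t) = a_t\psi$, so that
$$\|\nabla\psi(t)\|_p^2 = \sum_{n=0}^{\infty} n!\,\big|(n+1)\,\delta_t \otimes_1 g_{n+1}\big|_p^2 \le \sum_{n=0}^{\infty} \rho^{2qn}(n+1)\,|\delta_t|_{-(p+q)}^2\,(n+1)!\,|g_{n+1}|_{p+q}^2,$$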
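where a basic formula for norm inequalities is used; see [76, Appendix C]. Taking (5.27) into account, we obtain
$$\|\nabla\psi(t)\|_p^2 \le C(p,q)\,\|\psi\|_{p+q}^2, \qquad t \in \mathbb{R},$$
which completes the proof.

For each $\phi \in D$ and $\psi \in W$ we define
$$[(\nabla\,\nabla)\phi \otimes \psi](t) = \nabla\phi(t) \otimes \nabla\psi(t) \quad \text{for a.e. } t \in \mathbb{R}.$$
Then, with the help of Lemmas 5.2 and 5.12, one can easily show that
$$\int_{\mathbb{R}} \big\|[(\nabla\,\nabla)\phi \otimes \psi](t)\big\|^2_{\Gamma(H)\otimes\Gamma(E_p)}\,dt \le C(p,q)\,\|\phi\|_D^2\,\|\psi\|_{p+q}^2.$$
Therefore, $\nabla\,\nabla : D \otimes_\pi \Gamma(E_{p+q}) \to L^2(\mathbb{R}; \Gamma(H) \otimes \Gamma(E_p))$ is continuous, where $\otimes_\pi$ is the $\pi$-tensor product (not the Hilbert space tensor product). Since $D \otimes W = D \otimes_\pi W \to D \otimes_\pi \Gamma(E_{p+q})$ is a continuous injection, we see that $\nabla\,\nabla : D \otimes W \to L^2(\mathbb{R}; \Gamma(H) \otimes W)$ is a continuous linear map. The conservation gradient is now defined by composition of continuous linear maps: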
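$$\nabla^0 : \mathcal{L}(W^*, D) \xrightarrow{\ \cong\ } D \otimes W \xrightarrow{\ \nabla\,\nabla\ } L^2(\mathbb{R}; \Gamma(H) \otimes W) \xrightarrow{\ \cong\ } L^2(\mathbb{R}; \mathcal{L}(W^*, \Gamma(H))),$$
where
$$L^2(\mathbb{R}; \Gamma(H) \otimes W) = \operatorname*{proj\,lim}_{p\to\infty} L^2(\mathbb{R}; \Gamma(H) \otimes \Gamma(E_p)) \cong (L^2(\mathbb{R}) \otimes \Gamma(H)) \otimes W.$$
In a similar manner,
$$\nabla^0 : \mathcal{L}(W^*, \Gamma(H)) \xrightarrow{\ \cong\ } \Gamma(H) \otimes W \xrightarrow{\ \nabla\,\nabla\ } L^2(\mathbb{R}; D^* \otimes W) \xrightarrow{\ \cong\ } L^2(\mathbb{R}; \mathcal{L}(W^*, D^*))$$
is also a continuous linear map. Summing up, we have
$$\begin{array}{ccc}
\mathcal{L}(W^*, D) & \longrightarrow & \mathcal{L}(W^*, \Gamma(H)) \\
\Big\downarrow{\scriptstyle \nabla^0} & & \Big\downarrow{\scriptstyle \nabla^0} \\
L^2(\mathbb{R}; \mathcal{L}(W^*, \Gamma(H))) & \longrightarrow & L^2(\mathbb{R}; \mathcal{L}(W^*, D^*))
\end{array} \tag{5.28}$$
We call $\nabla^0$ the conservation gradient.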
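The finiteness of the constant $C(p,q)$ in (5.27), used in Lemma 5.12 and hence in the construction of $\nabla^0$, rests on two independent suprema: $\sup_t |\delta_t|^2_{-(p+q)}$, finite by Example 2.13, and $\sup_{n \ge 0}(n+1)\rho^{2qn}$, finite because $0 < \rho < 1$ and $q > 0$. The Python sketch below looks only at the second factor. It assumes $0 < \rho < 1$ (the value $\rho = 1/2$ is a hypothetical choice) and compares a brute-force maximum with the maximizer predicted by the continuous function $x \mapsto (x+1)r^x$, $r = \rho^{2q}$, whose critical point is $x^* = -1/\ln r - 1$.

```python
import math

def n_factor_brute(rho, q, n_max=20_000):
    """max over 0 <= n <= n_max of (n + 1) * rho**(2 q n); assumes 0 < rho < 1, q > 0."""
    r = rho ** (2 * q)
    return max((n + 1) * r**n for n in range(n_max + 1))

def n_factor_exact(rho, q):
    """The ratio of consecutive terms, r (n + 2) / (n + 1), decreases in n, so
    (n + 1) r**n is unimodal; its integer maximum sits at floor(x*) or ceil(x*)
    with x* = -1/ln(r) - 1, clipped at n = 0."""
    r = rho ** (2 * q)
    x_star = -1.0 / math.log(r) - 1.0
    candidates = {0, max(0, math.floor(x_star)), max(0, math.ceil(x_star))}
    return max((n + 1) * r**n for n in candidates)

RHO = 0.5  # hypothetical value of rho
for q in (0.01, 0.1, 0.5, 2.0):
    print(f"q={q}: sup_n (n+1) rho^(2qn) = {n_factor_exact(RHO, q):.6f}")
```

The output illustrates why $q > 0$ is needed: as $q \downarrow 0$ the factor grows like $1/(2q\,|\ln\rho|)$, and for $q = 0$ it is infinite.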
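6. Quantum Stochastic Integrals

6.1. The Hitsuda–Skorohod integral

The classical stochastic integral of Hitsuda–Skorohod type is defined by means of the adjoint of the classical stochastic gradient. Taking the adjoint maps in the diagram (5.10), we arrive at the following:
$$\begin{array}{ccccc}
L^2(\mathbb{R}; D) & \longrightarrow & L^2(\mathbb{R}; \Gamma(H)) & \longrightarrow & S'(\mathbb{R}; W^*) \\
\Big\downarrow{\scriptstyle \delta} & & \Big\downarrow{\scriptstyle \delta} & & \Big\downarrow{\scriptstyle \delta} \\
\Gamma(H) & \longrightarrow & D^* & \longrightarrow & W^*,
\end{array}$$
where $\delta = \nabla^*$. We call $\delta(\Psi)$ the Hitsuda–Skorohod integral. By definition it holds that
$$\langle\!\langle \delta(\Psi), \phi \rangle\!\rangle = \int_{\mathbb{R}} \langle\!\langle \Psi(t), \nabla\phi(t) \rangle\!\rangle\,dt \tag{6.1}$$
for a suitable pair $\Psi$ and $\phi$. The quantum stochastic integrals of Hitsuda–Skorohod type are defined in the same spirit, with the quantum stochastic gradients in place of $\nabla$.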
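6.2. Creation integral

The creation integral $\delta^+$ is by definition the adjoint map of the creation gradient $\nabla^+$. Taking the adjoint map of (5.17), we have
$$\begin{array}{ccccc}
L^2(\mathbb{R}; \mathcal{L}(W^*, \Gamma(H))) & \longrightarrow & L^2(\mathbb{R}; \mathcal{L}_2(\Gamma(E_p), \Gamma(H))) & \longrightarrow & L^2(\mathbb{R}; \mathcal{L}(W, \Gamma(H))) \\
\Big\downarrow{\scriptstyle \delta^+} & & \Big\downarrow{\scriptstyle \delta^+} & & \Big\downarrow{\scriptstyle \delta^+} \\
\mathcal{L}(W^*, D^*) & \longrightarrow & \mathcal{L}_2(\Gamma(E_p), D^*) & \longrightarrow & \mathcal{L}(W, D^*)
\end{array} \tag{6.2}$$
where $p$ runs over $\mathbb{R}$. Similarly, from (5.18) we obtain
$$\begin{array}{ccccc}
L^2(\mathbb{R}; \mathcal{L}(W^*, D)) & \longrightarrow & L^2(\mathbb{R}; \mathcal{L}_2(\Gamma(E_p), D)) & \longrightarrow & L^2(\mathbb{R}; \mathcal{L}(W, D)) \\
\Big\downarrow{\scriptstyle \delta^+} & & \Big\downarrow{\scriptstyle \delta^+} & & \Big\downarrow{\scriptstyle \delta^+} \\
\mathcal{L}(W^*, \Gamma(H)) & \longrightarrow & \mathcal{L}_2(\Gamma(E_p), \Gamma(H)) & \longrightarrow & \mathcal{L}(W, \Gamma(H))
\end{array} \tag{6.3}$$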
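The norm estimates of $\delta^+$ follow immediately from those of $\nabla^+$:
$$\|\delta^+(\Xi)\|^2_{\mathcal{L}_2(\Gamma(E_p),D^*)} \le \int_{\mathbb{R}} \|\Xi(t)\|^2_{\mathcal{L}_2(\Gamma(E_p),\Gamma(H))}\,dt, \tag{6.4}$$
$$\|\delta^+(\Xi)\|^2_{\mathcal{L}_2(\Gamma(E_p),\Gamma(H))} \le \int_{\mathbb{R}} \|\Xi(t)\|^2_{\mathcal{L}_2(\Gamma(E_p),D)}\,dt. \tag{6.5}$$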
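In particular, putting $p = 0$ in (6.5), so that $\Gamma(E_0) = \Gamma(H)$, we obtain the following.

Theorem 6.1. For any $\Xi \in L^2(\mathbb{R}; \mathcal{L}_2(\Gamma(H), D))$ the creation integral $\delta^+(\Xi)$ is a Hilbert–Schmidt operator on $\Gamma(H)$.

Theorem 6.2. Let $\Xi$ be a member of one of the domains of the creation integral in (6.2) and (6.3). Then it holds that
$$\langle\!\langle \delta^+(\Xi)\phi, \psi \rangle\!\rangle = \int_{\mathbb{R}} \langle\!\langle \Xi(t)\phi, \nabla\psi(t) \rangle\!\rangle\,dt \tag{6.6}$$
for a suitable pair $\phi, \psi$. Therefore, denoting $(\Xi\phi)(t) = \Xi(t)\phi$, we have
$$\delta^+(\Xi)\phi = \delta(\Xi\phi), \qquad \phi \in W. \tag{6.7}$$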
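Proof. To fix ideas, we assume that $\Xi \in L^2(\mathbb{R}; \mathcal{L}(W, \Gamma(H)))$ and prove (6.6) for $\phi \in W$ and $\psi \in D$. Recall that $\nabla^+ = K^{-1} \circ \tilde{\nabla}^+ \circ K$, where $\tilde{\nabla}^+ : D \otimes W \to L^2(\mathbb{R}; \Gamma(H) \otimes W)$ is defined as in (5.13). Taking the adjoint, we have $\delta^+ = K^{-1} \circ \tilde{\delta}^+ \circ K$, where $\tilde{\delta}^+ : L^2(\mathbb{R}; \Gamma(H) \otimes W^*) \to D^* \otimes W^*$ is the adjoint map of $\tilde{\nabla}^+$. With these notations we have
$$\begin{aligned}
\langle\!\langle \delta^+(\Xi)\phi, \psi \rangle\!\rangle
&= \langle\!\langle K(\delta^+(\Xi)), \psi \otimes \phi \rangle\!\rangle
 = \langle\!\langle \tilde{\delta}^+(K(\Xi)), \psi \otimes \phi \rangle\!\rangle
 = \langle\!\langle K\Xi, \tilde{\nabla}^+(\psi \otimes \phi) \rangle\!\rangle \\
&= \langle\!\langle K\Xi, (\nabla\psi) \otimes \phi \rangle\!\rangle
 = \int_{\mathbb{R}} \langle\!\langle K\Xi(t), \nabla\psi(t) \otimes \phi \rangle\!\rangle\,dt
 = \int_{\mathbb{R}} \langle\!\langle \Xi(t)\phi, \nabla\psi(t) \rangle\!\rangle\,dt.
\end{aligned}$$
The last integral has the form (6.1), i.e., it defines a Hitsuda–Skorohod integral. Thus,
$$\langle\!\langle \delta^+(\Xi)\phi, \psi \rangle\!\rangle = \int_{\mathbb{R}} \langle\!\langle \Xi(t)\phi, \nabla\psi(t) \rangle\!\rangle\,dt = \int_{\mathbb{R}} \langle\!\langle (\Xi\phi)(t), \nabla\psi(t) \rangle\!\rangle\,dt = \langle\!\langle \delta(\Xi\phi), \psi \rangle\!\rangle,$$
which proves (6.7).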
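As for a criterion for $\delta^+(\Xi)$ to be a bounded operator on $\Gamma(H)$, we only show the following.

Theorem 6.3. Let $\Xi$ be a member of one of the domains of the creation integral in (6.2) and (6.3). Assume that there exist a dense subspace $\mathcal{E} \subset \Gamma(H)$ and a constant $C \ge 0$ such that
$$\int_{\mathbb{R}} \|\Xi(t)\phi\|_D^2\,dt \le C\|\phi\|_0^2, \qquad \phi \in \mathcal{E}. \tag{6.8}$$
Then the creation integral $\delta^+(\Xi)$ is a bounded operator from $\Gamma(H)$ into itself.

Proof. For any $\phi \in \mathcal{E}$ and $\psi \in \Gamma(H)$, we see from (6.6) that
$$|\langle\!\langle \delta^+(\Xi)\phi, \psi \rangle\!\rangle| \le \int_{\mathbb{R}} |\langle\!\langle \Xi(t)\phi, \nabla\psi(t) \rangle\!\rangle|\,dt \le \int_{\mathbb{R}} \|\Xi(t)\phi\|_D\,\|\nabla\psi(t)\|_{D^*}\,dt.$$
Then, by the Cauchy–Schwarz inequality, assumption (6.8) and the contraction property of the classical gradient (Lemma 5.3), we obtain
$$|\langle\!\langle \delta^+(\Xi)\phi, \psi \rangle\!\rangle|^2 \le \int_{\mathbb{R}} \|\Xi(t)\phi\|_D^2\,dt \int_{\mathbb{R}} \|\nabla\psi(t)\|_{D^*}^2\,dt \le C\|\phi\|_0^2\,\|\psi\|_0^2.$$
Since $\mathcal{E}$ is dense in $\Gamma(H)$, $\delta^+(\Xi)$ is a bounded operator on $\Gamma(H)$.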
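Corollary 6.4. For any $\Xi \in L^2(\mathbb{R}; \mathcal{L}(\Gamma(H), D))$ the creation integral $\delta^+(\Xi)$ is a bounded operator on $\Gamma(H)$.

Proof. By assumption we have
$$\int_{\mathbb{R}} \|\Xi(t)\|^2_{\mathcal{L}(\Gamma(H),D)}\,dt < \infty,$$
where the integrand is the square of the operator norm. Hence, for $\phi \in \Gamma(H)$ we have
$$\int_{\mathbb{R}} \|\Xi(t)\phi\|_D^2\,dt \le \|\phi\|_0^2 \int_{\mathbb{R}} \|\Xi(t)\|^2_{\mathcal{L}(\Gamma(H),D)}\,dt.$$
Therefore, the condition in Theorem 6.3 is satisfied.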
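6.3. Annihilation integral

The annihilation integral $\delta^-$ is defined to be the adjoint map of the annihilation gradient. From (5.20) we obtain
$$\begin{array}{ccccc}
L^2(\mathbb{R}; \mathcal{L}(\Gamma(H), W)) & \longrightarrow & L^2(\mathbb{R}; \mathcal{L}_2(\Gamma(H), \Gamma(E_p))) & \longrightarrow & L^2(\mathbb{R}; \mathcal{L}(\Gamma(H), W^*)) \\
\Big\downarrow{\scriptstyle \delta^-} & & \Big\downarrow{\scriptstyle \delta^-} & & \Big\downarrow{\scriptstyle \delta^-} \\
\mathcal{L}(D, W) & \longrightarrow & \mathcal{L}_2(D, \Gamma(E_p)) & \longrightarrow & \mathcal{L}(D, W^*)
\end{array} \tag{6.9}$$
and from (5.23) we obtain
$$\begin{array}{ccccc}
L^2(\mathbb{R}; \mathcal{L}(D^*, W)) & \longrightarrow & L^2(\mathbb{R}; \mathcal{L}_2(D^*, \Gamma(E_p))) & \longrightarrow & L^2(\mathbb{R}; \mathcal{L}(D^*, W^*)) \\
\Big\downarrow{\scriptstyle \delta^-} & & \Big\downarrow{\scriptstyle \delta^-} & & \Big\downarrow{\scriptstyle \delta^-} \\
\mathcal{L}(\Gamma(H), W) & \longrightarrow & \mathcal{L}_2(\Gamma(H), \Gamma(E_p)) & \longrightarrow & \mathcal{L}(\Gamma(H), W^*),
\end{array} \tag{6.10}$$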
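where $p$ runs over $\mathbb{R}$. The norm estimates of $\delta^-$ follow immediately from those of $\nabla^-$ and are similar to (6.4) and (6.5):
$$\|\delta^-(\Xi)\|^2_{\mathcal{L}_2(D,\Gamma(E_p))} \le \int_{\mathbb{R}} \|\Xi(t)\|^2_{\mathcal{L}_2(\Gamma(H),\Gamma(E_p))}\,dt, \qquad \|\delta^-(\Xi)\|^2_{\mathcal{L}_2(\Gamma(H),\Gamma(E_p))} \le \int_{\mathbb{R}} \|\Xi(t)\|^2_{\mathcal{L}_2(D^*,\Gamma(E_p))}\,dt. \tag{6.11}$$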
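In particular, putting $p = 0$ in (6.11), we obtain the following.

Theorem 6.5. For any $\Xi \in L^2(\mathbb{R}; \mathcal{L}_2(D^*, \Gamma(H)))$ the annihilation integral $\delta^-(\Xi)$ is a Hilbert–Schmidt operator on $\Gamma(H)$.

In a similar fashion as in Theorem 6.2 we have the following.

Theorem 6.6. Let $\Xi$ be a member of one of the domains of the annihilation integral in (6.9) and (6.10). Then it holds that
$$\langle\!\langle \delta^-(\Xi)\phi, \psi \rangle\!\rangle = \int_{\mathbb{R}} \langle\!\langle \Xi(t)(\nabla\phi(t)), \psi \rangle\!\rangle\,dt \tag{6.12}$$
for a suitable pair $\phi, \psi$. Therefore,
$$\delta^-(\Xi)\phi = \int_{\mathbb{R}} \Xi(t)(\nabla\phi(t))\,dt. \tag{6.13}$$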
Corollary 6.7. Let $\Xi$ be a member of one of the domains of the annihilation integral in (6.9) and (6.10). Then it holds that
$$(\delta^-(\Xi))^* = \delta^+(\Xi^*). \tag{6.14}$$
For the proof it suffices to compare (6.6) and (6.12). By virtue of the relation (6.14), one can derive from Theorem 6.3 and Corollary 6.4 conditions under which $\delta^-(\Xi)$ is a bounded operator on $\Gamma(H)$.

6.4. Conservation integral

We defined the conservation gradient $\nabla^0$ with two different domains in (5.28). The conservation integral is defined to be their adjoint maps, i.e.,
$$\begin{array}{ccc}
L^2(\mathbb{R}; \mathcal{L}(W, D)) & \longrightarrow & L^2(\mathbb{R}; \mathcal{L}(W, \Gamma(H))) \\
\Big\downarrow{\scriptstyle \delta^0} & & \Big\downarrow{\scriptstyle \delta^0} \\
\mathcal{L}(W, \Gamma(H)) & \longrightarrow & \mathcal{L}(W, D^*)
\end{array} \tag{6.15}$$
In a similar fashion as in Theorems 6.2 and 6.6 we have the following.

Theorem 6.8. Let $\Xi$ be a member of one of the domains of the conservation integral in (6.15). Then it holds that
$$\langle\!\langle \delta^0(\Xi)\phi, \psi \rangle\!\rangle = \int_{\mathbb{R}} \langle\!\langle \Xi(t)\nabla\phi(t), \nabla\psi(t) \rangle\!\rangle\,dt$$
for a suitable pair $\phi, \psi$. Therefore,
$$\delta^0(\Xi)\phi = \delta(\Xi\nabla\phi), \tag{6.16}$$
where $\Xi\nabla\phi$ is defined by $(\Xi\nabla\phi)(t) = \Xi(t)(\nabla\phi(t))$.

We only mention the following criterion for a conservation integral to be a bounded operator on $\Gamma(H)$; the proof is similar to that of Theorem 6.3.

Theorem 6.9. Let $\Xi$ be a member of one of the domains of the conservation integral in (6.15). Assume that there exist a dense subspace $\mathcal{E} \subset \Gamma(H)$ and a constant $C \ge 0$ such that
$$\int_{\mathbb{R}} \|\Xi(t)(\nabla\phi(t))\|_D^2\,dt \le C\|\phi\|_0^2, \qquad \phi \in \mathcal{E}.$$
Then the conservation integral $\delta^0(\Xi)$ is a bounded operator on $\Gamma(H)$.
We see from (6.7), (6.13) and (6.16) that our definitions of the Hitsuda–Skorohod quantum stochastic integrals coincide with those introduced by Belavkin [4] and Lindsay [70] for a common integrand. In fact, their definitions start from the right-hand sides of (6.7), (6.13) and (6.16) for suitably chosen $\Xi$ and $\phi$. Our definition is more direct, thanks to the quantum stochastic gradients acting on white noise operators.

7. Quantum White Noise Derivatives

In this section we go back to a standard CKS-space $W \subset \Gamma(H) \subset W^*$, where $H = L^2(T, dt; \mathbb{C})$.

7.1. Quadratic functions of quantum white noise

For each $S \in \mathcal{L}(E, E^*)$, by the kernel theorem there exists a unique $\tau_S \in E^* \otimes E^*$ such that
$$\langle \tau_S, \eta \otimes \xi \rangle = \langle S\xi, \eta \rangle, \qquad \xi, \eta \in E.$$
The above relation is easy to understand from the formal integral form:
$$\int_{T\times T} \tau_S(t,s)\,\eta(t)\,\xi(s)\,ds\,dt = \int_T \Big(\int_T \tau_S(t,s)\,\xi(s)\,ds\Big)\eta(t)\,dt.$$
We call $\tau_S$ the $S$-trace. In particular, for the identity operator $I \in \mathcal{L}(E, E)$ we write $\tau = \tau_I \in E \otimes E^*$, which is used to define the Wick tensor in Section 2.3. With each $S \in \mathcal{L}(E, E^*)$ we associate the integral kernel operators defined by
$$\Delta_G(S) = \Xi_{0,2}(\tau_S) = \int_{T\times T} \tau_S(s,t)\,a_s a_t\,ds\,dt,$$
$$\Delta_G^*(S) = \Xi_{2,0}(\tau_S) = \int_{T\times T} \tau_S(s,t)\,a_s^* a_t^*\,ds\,dt,$$
$$\Lambda(S) = \Xi_{1,1}(\tau_S) = \int_{T\times T} \tau_S(s,t)\,a_s^* a_t\,ds\,dt.$$
For $S = I$ we write
$$\Delta_G \equiv \Delta_G(I) = \int_T a_t^2\,dt, \qquad N \equiv \Lambda(I) = \int_T a_t^* a_t\,dt,$$
which are called the Gross Laplacian and the number operator, respectively. In this sense, $\Delta_G(S)$ and $\Lambda(S)$ are called a generalized Gross Laplacian and a generalized number operator, respectively.
Lemma 7.1. For any $S \in \mathcal{L}(E, E^*)$ we have $\Delta_G(S) \in \mathcal{L}(W, W)$ and $\Delta_G(S)^* \in \mathcal{L}(W^*, W^*)$.

Lemma 7.2. For any $S \in \mathcal{L}(E, E^*)$ we have $\Lambda(S) \in \mathcal{L}(W, W^*)$. In particular, $N \in \mathcal{L}(W, W) \cap \mathcal{L}(W^*, W^*)$.

Lemma 7.3. The generalized number operator $\Lambda(S)$ coincides with the differential second quantization of $S$, i.e.,
$$\Lambda(S) = d\Gamma(S) = \Xi_{1,1}(\tau_S) = \int_{T\times T} \tau_S(s,t)\,a_s^* a_t\,ds\,dt.$$
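Proof. Let $S \in \mathcal{L}(E, E^*)$ and set
$$\gamma_0(S) = 0, \qquad \gamma_n(S) = \sum_{k=0}^{n-1} I^{\otimes k} \otimes S \otimes I^{\otimes(n-k-1)}, \quad n \ge 1.$$
Then the differential second quantization $d\Gamma(S)$ is defined by
$$d\Gamma(S)\phi = (\gamma_n(S)f_n), \qquad \phi = (f_n) \in W.$$
It is easily verified that $d\Gamma(S) \in \mathcal{L}(W, W^*)$. Then we see that
$$\widehat{d\Gamma(S)}(\xi, \eta) = \sum_{n=0}^{\infty} \frac{1}{n!}\langle \gamma_n(S)\xi^{\otimes n}, \eta^{\otimes n} \rangle = \sum_{n=1}^{\infty} \frac{1}{n!}\, n\,\langle S\xi, \eta \rangle \langle \xi, \eta \rangle^{n-1} = \langle S\xi, \eta \rangle\, e^{\langle \xi, \eta \rangle}.$$
On the other hand,
$$\widehat{\Lambda(S)}(\xi, \eta) = \langle \tau_S, \eta \otimes \xi \rangle\, e^{\langle \xi, \eta \rangle} = \langle S\xi, \eta \rangle\, e^{\langle \xi, \eta \rangle}.$$
Since the symbols coincide, we have $d\Gamma(S) = \Lambda(S)$.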
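The key step in the proof above is the identity $\langle \gamma_n(S)\xi^{\otimes n}, \eta^{\otimes n} \rangle = n\,\langle S\xi, \eta \rangle\langle \xi, \eta \rangle^{n-1}$. A minimal Python sketch can confirm it by building $\gamma_n(S)$ as an explicit $d^n \times d^n$ Kronecker-product matrix, under the illustrative assumption that $E$ is replaced by $\mathbb{C}^d$ with $d = 2$. The pairing is the bilinear one used in the text, and integer entries keep the arithmetic exact; the sample $S$, $\xi$, $\eta$ are hypothetical.

```python
from functools import reduce

def kron(A, B):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def identity(d):
    return [[int(i == j) for j in range(d)] for i in range(d)]

def mat_add(A, B):
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def bilinear(u, v):
    """Canonical bilinear form <u, v> = sum_i u_i v_i (no complex conjugation)."""
    return sum(a * b for a, b in zip(u, v))

def tensor_power(v, n):
    """v^{(x) n} as a flat list of length d**n; the 0-th power is [1]."""
    return reduce(lambda acc, _: [a * b for a in acc for b in v], range(n), [1])

def gamma(S, n):
    """gamma_n(S) = sum_{k<n} I^{(x) k} (x) S (x) I^{(x)(n-k-1)}; gamma_0(S) = 0."""
    d = len(S)
    total = [[0] * d**n for _ in range(d**n)]
    for k in range(n):
        factors = [identity(d)] * k + [S] + [identity(d)] * (n - k - 1)
        total = mat_add(total, reduce(kron, factors, [[1]]))
    return total

S = [[1, 2], [0, -1]]       # hypothetical stand-in for S in L(E, E*)
xi, eta = [1, 2], [3, 1]    # hypothetical stand-ins for xi, eta in E
for n in range(6):
    lhs = bilinear(matvec(gamma(S, n), tensor_power(xi, n)), tensor_power(eta, n))
    rhs = n * bilinear(matvec(S, xi), eta) * bilinear(xi, eta) ** (n - 1) if n else 0
    print(n, lhs, rhs)
```

Summing $\mathrm{lhs}/n!$ over $n$ then reproduces $\langle S\xi,\eta\rangle e^{\langle\xi,\eta\rangle}$, which is the symbol computation in the proof.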
7.2. Quantum white noise derivatives

For any white noise operator $\Xi \in \mathcal{L}(W, W^*)$ and $\zeta \in E$, the commutators
$$[a(\zeta), \Xi] = a(\zeta)\Xi - \Xi a(\zeta), \qquad -[a^*(\zeta), \Xi] = \Xi a^*(\zeta) - a^*(\zeta)\Xi,$$
are well defined as compositions of white noise operators by Lemma 4.6, i.e., they belong to $\mathcal{L}(W, W^*)$. We define
$$D_\zeta^+\Xi = [a(\zeta), \Xi], \qquad D_\zeta^-\Xi = -[a^*(\zeta), \Xi].$$
These are called the creation derivative and annihilation derivative of $\Xi$, respectively. Together they are referred to as the quantum white noise derivatives of $\Xi$. Note the obvious relations:
$$(D_\zeta^+\Xi)^* = D_\zeta^-(\Xi^*), \qquad (D_\zeta^-\Xi)^* = D_\zeta^+(\Xi^*).$$
The Fock expansion theorem says that every white noise operator $\Xi$ is a “function” of quantum white noise, say, $\Xi = \Xi(a_s, a_t^*;\, s, t \in T)$. It is then natural to consider the derivatives of $\Xi$ with respect to the coordinate variables $a_t$ and $a_t^*$, say,
$$\frac{\delta\Xi}{\delta a_t}, \qquad \frac{\delta\Xi}{\delta a_t^*}.$$
However, these are not always well defined because of the singularity of quantum white noise; they are justified by the above quantum white noise derivatives. Informally, we may understand that
$$D_\zeta^-\Xi = \int_T \zeta(t)\,\frac{\delta\Xi}{\delta a_t}\,dt, \qquad D_\zeta^+\Xi = \int_T \zeta(t)\,\frac{\delta\Xi}{\delta a_t^*}\,dt.$$

Proposition 7.4. $(\zeta, \Xi) \mapsto D_\zeta^\pm\Xi$ is a continuous bilinear map from $E \times \mathcal{L}(W, W^*)$ into $\mathcal{L}(W, W^*)$.

Proof. By direct norm estimates; see [48] for the details.
Lemma 7.5. For $f \in E^*$ we have
$$D_\zeta^- a(f) = \langle \zeta, f \rangle I, \qquad D_\zeta^+ a(f) = 0, \qquad D_\zeta^- a^*(f) = 0, \qquad D_\zeta^+ a^*(f) = \langle \zeta, f \rangle I.$$
Proof. These are essentially the canonical commutation relations (Theorem 4.7).

Lemma 7.6. For $S \in \mathcal{L}(E, E^*)$ and $\zeta \in E$ we have
Dζ− ∆G (S) = a(Sζ) + a(S ∗ ζ),
Dζ+ ∆∗G (S) = a∗ (Sζ) + a∗ (S ∗ ζ),
Dζ− ∆∗G (S) = 0,
Dζ+ Λ(S) = a(S ∗ ζ),
Dζ− Λ(S) = a∗ (Sζ).
Proof. Verification is straightforward by definition and the canonical commutation relations. Here we give an “informal proof”, which is more instructive. Consider
$$\Delta_G(S) = \int_{T\times T} \tau_S(s,t)\,a_s a_t\,ds\,dt.$$
Since $D_u^-$ is the derivative with respect to $a_u$, we have
$$D_u^-\Delta_G(S) = \int_T \tau_S(s,u)\,a_s\,ds + \int_T \tau_S(u,t)\,a_t\,dt.$$
Hence
$$\begin{aligned}
D_\zeta^-\Delta_G(S) &= \int_T \zeta(u)\,D_u^-\Delta_G(S)\,du
= \int_{T\times T} \tau_S(s,u)\zeta(u)\,a_s\,ds\,du + \int_{T\times T} \tau_S(u,t)\zeta(u)\,a_t\,dt\,du \\
&= \int_T S\zeta(s)\,a_s\,ds + \int_T S^*\zeta(t)\,a_t\,dt = a(S\zeta) + a(S^*\zeta).
\end{aligned}$$
The rest is verified similarly.
More generally, we have the following result, whose proof is obvious by Lemma 4.34.

Proposition 7.7. For $\kappa_{l,m} \in (E^{\otimes(l+m)})^*$ and $\zeta \in E$ we have
$$D_\zeta^+\Xi_{l,m}(\kappa_{l,m}) = l\,\Xi_{l-1,m}(\zeta * \kappa_{l,m}), \qquad D_\zeta^-\Xi_{l,m}(\kappa_{l,m}) = m\,\Xi_{l,m-1}(\kappa_{l,m} * \zeta).$$
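Proposition 7.7, and with it Lemma 7.6, can be tested in a single-mode toy model. If $E$ is replaced by $\mathbb{C}$ and $\zeta = 1$ (an illustrative assumption, not the setting of the text), every kernel is a scalar, $\Xi_{l,m}(1) = a^{*l}a^m$, and the proposition reads $[a, a^{*l}a^m] = l\,a^{*(l-1)}a^m$ and $[a^{*l}a^m, a^*] = m\,a^{*l}a^{m-1}$; the cases $(l,m) = (0,2), (2,0), (1,1)$ are Lemma 7.6 with $S = 1$. The Python sketch below uses $N \times N$ truncations of $a$ and $a^*$. Truncation breaks the CCR only at the top level, so a product of $k$ factors acts exactly on $|n\rangle$ whenever $n + k \le N - 1$, and the comparison is restricted to those columns.

```python
import math

def annihilation(N):
    """Truncated one-mode annihilation operator: a|n> = sqrt(n)|n-1>, 0 <= n < N."""
    return [[math.sqrt(j) if i == j - 1 else 0.0 for j in range(N)] for i in range(N)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    cols = transpose(B)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]

def matpow(A, k):
    result = [[float(i == j) for j in range(len(A))] for i in range(len(A))]
    for _ in range(k):
        result = matmul(result, A)
    return result

def minus(A, B):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def scaled(c, A):
    return [[c * x for x in row] for row in A]

def wick_monomial(a, l, m):
    """Normal-ordered monomial a*^l a^m (all creators to the left)."""
    return matmul(matpow(transpose(a), l), matpow(a, m))

def agree_on_columns(A, B, ncols, tol=1e-9):
    """Compare A and B on the first `ncols` columns, where truncation is harmless."""
    return all(abs(A[i][j] - B[i][j]) <= tol * max(1.0, abs(B[i][j]))
               for i in range(len(A)) for j in range(ncols))

def check_proposition_7_7(l, m, N=14):
    a = annihilation(N)
    a_star = transpose(a)
    X = wick_monomial(a, l, m)
    zero = [[0.0] * N for _ in range(N)]
    d_plus = minus(matmul(a, X), matmul(X, a))             # D^+ X = [a, X]
    d_minus = minus(matmul(X, a_star), matmul(a_star, X))  # D^- X = -[a*, X]
    want_plus = scaled(l, wick_monomial(a, l - 1, m)) if l else zero
    want_minus = scaled(m, wick_monomial(a, l, m - 1)) if m else zero
    safe = N - (l + m + 1)  # columns n with n + (l + m + 1) <= N - 1
    return (agree_on_columns(d_plus, want_plus, safe)
            and agree_on_columns(d_minus, want_minus, safe))

print(all(check_proposition_7_7(l, m) for l in range(4) for m in range(4)))
```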
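Lemma 7.8. Let $\Xi \in \mathcal{L}(W, W^*)$. For $\xi, \eta, \zeta \in E$ we have
$$\widehat{D_\zeta^-\Xi}(\xi, \eta) = \frac{d}{dt}\Big|_{t=0} \widehat{\Xi}(\xi + t\zeta, \eta) - \langle \zeta, \eta \rangle\,\widehat{\Xi}(\xi, \eta),$$
$$\widehat{D_\zeta^+\Xi}(\xi, \eta) = \frac{d}{dt}\Big|_{t=0} \widehat{\Xi}(\xi, \eta + t\zeta) - \langle \zeta, \xi \rangle\,\widehat{\Xi}(\xi, \eta).$$
Proof. By definition we have
$$\langle\!\langle (D_\zeta^-\Xi)\phi_\xi, \phi_\eta \rangle\!\rangle = \langle\!\langle \Xi a^*(\zeta)\phi_\xi, \phi_\eta \rangle\!\rangle - \langle\!\langle a^*(\zeta)\Xi\phi_\xi, \phi_\eta \rangle\!\rangle = \frac{d}{dt}\Big|_{t=0} \langle\!\langle \Xi\phi_{\xi + t\zeta}, \phi_\eta \rangle\!\rangle - \langle \zeta, \eta \rangle\,\langle\!\langle \Xi\phi_\xi, \phi_\eta \rangle\!\rangle,$$
from which the first relation follows. The second one is verified similarly.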
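Lemma 7.8 reduces the quantum white noise derivatives to ordinary derivatives of symbols. In a single-mode toy model ($E$ replaced by $\mathbb{C}$, $\langle \xi, \eta \rangle = \xi\eta$; an illustrative assumption), the symbol of $a^{*l}a^m$ is $\eta^l\xi^m e^{\xi\eta}$, because $a^m\phi_\xi = \xi^m\phi_\xi$ and $\langle\!\langle \phi_\xi, \phi_\eta \rangle\!\rangle = e^{\xi\eta}$. The Python sketch below evaluates the two formulas of the lemma with a central finite difference in $t$ and compares them with the symbols of $m\zeta\,a^{*l}a^{m-1}$ and $l\zeta\,a^{*(l-1)}a^m$ predicted by Proposition 7.7. The step size, tolerance and sample point are arbitrary choices.

```python
import math

def monomial_symbol(l, m):
    """Symbol of the one-mode Wick monomial a*^l a^m: (xi, eta) -> eta^l xi^m e^{xi eta}."""
    return lambda xi, eta: eta**l * xi**m * math.exp(xi * eta)

def d_dt_at_zero(f, h=1e-5):
    """Central-difference approximation of d/dt f(t) at t = 0."""
    return (f(h) - f(-h)) / (2 * h)

def annihilation_derivative(symbol, zeta):
    """Symbol of D^-_zeta Xi by the first formula of Lemma 7.8."""
    return lambda xi, eta: (d_dt_at_zero(lambda t: symbol(xi + t * zeta, eta))
                            - zeta * eta * symbol(xi, eta))

def creation_derivative(symbol, zeta):
    """Symbol of D^+_zeta Xi by the second formula of Lemma 7.8."""
    return lambda xi, eta: (d_dt_at_zero(lambda t: symbol(xi, eta + t * zeta))
                            - zeta * xi * symbol(xi, eta))

def check_lemma_7_8(l, m, xi=0.7, eta=-0.4, zeta=1.3, tol=1e-7):
    got_minus = annihilation_derivative(monomial_symbol(l, m), zeta)(xi, eta)
    got_plus = creation_derivative(monomial_symbol(l, m), zeta)(xi, eta)
    # Proposition 7.7 in one mode: D^- -> m zeta a*^l a^(m-1), D^+ -> l zeta a*^(l-1) a^m
    want_minus = m * zeta * monomial_symbol(l, m - 1)(xi, eta) if m else 0.0
    want_plus = l * zeta * monomial_symbol(l - 1, m)(xi, eta) if l else 0.0
    return abs(got_minus - want_minus) < tol and abs(got_plus - want_plus) < tol

print(all(check_lemma_7_8(l, m) for l in range(4) for m in range(4)))
```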
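Remark 7.9. In the simple context of one degree of freedom, a concept of translation of operators was introduced in [39], and then the creation and annihilation derivatives appeared as commutators [85].

7.3. Wick derivations

Recall that the white noise operators $\mathcal{L}(W, W^*)$ equipped with the Wick product $\diamond$ form a commutative algebra.

Definition 7.10. A continuous linear map $D : \mathcal{L}(W, W^*) \to \mathcal{L}(W, W^*)$ is called a Wick derivation if
$$D(\Xi_1 \diamond \Xi_2) = (D\Xi_1) \diamond \Xi_2 + \Xi_1 \diamond (D\Xi_2), \qquad \Xi_1, \Xi_2 \in \mathcal{L}(W, W^*).$$

Theorem 7.11. The creation and annihilation derivatives $D_\zeta^\pm$ are Wick derivations.

Proof. We show that the creation derivative is a Wick derivation, i.e.,
$$D_\zeta^+(\Xi_1 \diamond \Xi_2) = (D_\zeta^+\Xi_1) \diamond \Xi_2 + \Xi_1 \diamond (D_\zeta^+\Xi_2), \tag{7.1}$$
by means of the operator symbols. Let $\xi, \eta \in E$. By Lemma 7.8 we have
$$\begin{aligned}
\widehat{D_\zeta^+(\Xi_1 \diamond \Xi_2)}(\xi, \eta)
&= \frac{d}{dt}\Big|_{t=0} \widehat{\Xi_1 \diamond \Xi_2}(\xi, \eta + t\zeta) - \langle \zeta, \xi \rangle\,\widehat{\Xi_1 \diamond \Xi_2}(\xi, \eta) \\
&= \frac{d}{dt}\Big|_{t=0} \widehat{\Xi_1 \diamond \Xi_2}(\xi, \eta + t\zeta) - \langle \zeta, \xi \rangle\,\widehat{\Xi}_1(\xi, \eta)\,\widehat{\Xi}_2(\xi, \eta)\,e^{-\langle \xi, \eta \rangle}.
\end{aligned} \tag{7.2}$$
Since
$$\begin{aligned}
\frac{d}{dt}\Big|_{t=0} \widehat{\Xi_1 \diamond \Xi_2}(\xi, \eta + t\zeta)
&= \frac{d}{dt}\Big|_{t=0} \widehat{\Xi}_1(\xi, \eta + t\zeta)\,\widehat{\Xi}_2(\xi, \eta + t\zeta)\,e^{-\langle \xi, \eta + t\zeta \rangle} \\
&= \Big(\frac{d}{dt}\Big|_{t=0} \widehat{\Xi}_1(\xi, \eta + t\zeta)\Big)\widehat{\Xi}_2(\xi, \eta)\,e^{-\langle \xi, \eta \rangle}
 + \widehat{\Xi}_1(\xi, \eta)\Big(\frac{d}{dt}\Big|_{t=0} \widehat{\Xi}_2(\xi, \eta + t\zeta)\Big)e^{-\langle \xi, \eta \rangle} \\
&\quad - \widehat{\Xi}_1(\xi, \eta)\,\widehat{\Xi}_2(\xi, \eta)\,\langle \zeta, \xi \rangle\,e^{-\langle \xi, \eta \rangle},
\end{aligned}$$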
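(7.2) becomes
$$\widehat{D_\zeta^+(\Xi_1 \diamond \Xi_2)}(\xi, \eta) = \Big(\frac{d}{dt}\Big|_{t=0} \widehat{\Xi}_1(\xi, \eta + t\zeta)\Big)\widehat{\Xi}_2(\xi, \eta)\,e^{-\langle \xi, \eta \rangle} + \widehat{\Xi}_1(\xi, \eta)\Big(\frac{d}{dt}\Big|_{t=0} \widehat{\Xi}_2(\xi, \eta + t\zeta)\Big)e^{-\langle \xi, \eta \rangle} - 2\,\widehat{\Xi}_1(\xi, \eta)\,\widehat{\Xi}_2(\xi, \eta)\,\langle \zeta, \xi \rangle\,e^{-\langle \xi, \eta \rangle}. \tag{7.3}$$
On the other hand, by a similar calculation we have
$$\widehat{(D_\zeta^+\Xi_1) \diamond \Xi_2}(\xi, \eta) = \Big(\frac{d}{dt}\Big|_{t=0} \widehat{\Xi}_1(\xi, \eta + t\zeta)\Big)\widehat{\Xi}_2(\xi, \eta)\,e^{-\langle \xi, \eta \rangle} - \widehat{\Xi}_1(\xi, \eta)\,\widehat{\Xi}_2(\xi, \eta)\,\langle \zeta, \xi \rangle\,e^{-\langle \xi, \eta \rangle}, \tag{7.4}$$
$$\widehat{\Xi_1 \diamond (D_\zeta^+\Xi_2)}(\xi, \eta) = \widehat{\Xi}_1(\xi, \eta)\Big(\frac{d}{dt}\Big|_{t=0} \widehat{\Xi}_2(\xi, \eta + t\zeta)\Big)e^{-\langle \xi, \eta \rangle} - \widehat{\Xi}_1(\xi, \eta)\,\widehat{\Xi}_2(\xi, \eta)\,\langle \zeta, \xi \rangle\,e^{-\langle \xi, \eta \rangle}. \tag{7.5}$$
Since (7.3) coincides with the sum of (7.4) and (7.5), we obtain (7.1). Noting that
$$D_\zeta^-(\Xi_1 \diamond \Xi_2) = [D_\zeta^+(\Xi_1^* \diamond \Xi_2^*)]^* = (D_\zeta^-\Xi_1) \diamond \Xi_2 + \Xi_1 \diamond (D_\zeta^-\Xi_2),$$
we see that the annihilation derivative is also a Wick derivation.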
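The computation in the proof of Theorem 7.11 can be replayed numerically in a single-mode toy model ($E$ replaced by $\mathbb{C}$, $\langle\xi,\eta\rangle = \xi\eta$; an illustrative assumption). The Python sketch below combines the Wick-product rule $\widehat{\Xi_1 \diamond \Xi_2} = \widehat{\Xi}_1\widehat{\Xi}_2e^{-\langle\xi,\eta\rangle}$ used in the proof with finite-difference versions of Lemma 7.8, and measures the defect in the Leibniz rule for both $D_\zeta^+$ and $D_\zeta^-$. The test operators $\Xi_1 = 1 + 2a^{*2}a - a^3$ and $\Xi_2 = 3a^* + a^*a^2$ are hypothetical Wick polynomials; their symbols are obtained by substituting $a^* \to \eta$, $a \to \xi$ and multiplying by $e^{\xi\eta}$.

```python
import math

def d_dt_at_zero(f, h=1e-5):
    """Central-difference approximation of d/dt f(t) at t = 0."""
    return (f(h) - f(-h)) / (2 * h)

def creation_derivative(symbol, zeta):
    """Symbol of D^+_zeta Xi, second formula of Lemma 7.8 (one mode)."""
    return lambda xi, eta: (d_dt_at_zero(lambda t: symbol(xi, eta + t * zeta))
                            - zeta * xi * symbol(xi, eta))

def annihilation_derivative(symbol, zeta):
    """Symbol of D^-_zeta Xi, first formula of Lemma 7.8 (one mode)."""
    return lambda xi, eta: (d_dt_at_zero(lambda t: symbol(xi + t * zeta, eta))
                            - zeta * eta * symbol(xi, eta))

def wick_product(s1, s2):
    """Symbol of Xi_1 <> Xi_2: the product of the symbols times e^{-xi eta}."""
    return lambda xi, eta: s1(xi, eta) * s2(xi, eta) * math.exp(-xi * eta)

def op1_symbol(xi, eta):
    """Hypothetical Xi_1 = 1 + 2 a*^2 a - a^3."""
    return (1 + 2 * xi * eta**2 - xi**3) * math.exp(xi * eta)

def op2_symbol(xi, eta):
    """Hypothetical Xi_2 = 3 a* + a* a^2."""
    return (3 * eta + xi**2 * eta) * math.exp(xi * eta)

def leibniz_defect(derivative, zeta, xi, eta):
    """|D(X1 <> X2) - (D X1) <> X2 - X1 <> (D X2)| evaluated at (xi, eta)."""
    lhs = derivative(wick_product(op1_symbol, op2_symbol), zeta)(xi, eta)
    rhs = (wick_product(derivative(op1_symbol, zeta), op2_symbol)(xi, eta)
           + wick_product(op1_symbol, derivative(op2_symbol, zeta))(xi, eta))
    return abs(lhs - rhs)

points = [(0.3, -0.8), (1.1, 0.5), (-0.6, 0.9)]
for derivative in (creation_derivative, annihilation_derivative):
    worst = max(leibniz_defect(derivative, 0.8, xi, eta) for xi, eta in points)
    print(derivative.__name__, worst)
```

The printed defects are of the size of the finite-difference error, consistent with the exact cancellation between (7.3) and the sum of (7.4) and (7.5).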
It is a natural question to characterize Wick derivations in general. For a Wick derivation $D$ and a white noise operator $C \in \mathcal{L}(W, W^*)$ we define $C \diamond D$ by
$$(C \diamond D)\Xi = C \diamond D\Xi, \qquad \Xi \in \mathcal{L}(W, W^*).$$
Then we see easily that $C \diamond D$ is a Wick derivation. In other words, the Wick derivations form an $\mathcal{L}(W, W^*)$-module. The following theorem asserts that every Wick derivation is generated by the annihilation and creation derivatives. The precise argument is found in [52].
Theorem 7.12. For a Wick derivation $D : \mathcal{L}(W, W^*) \to \mathcal{L}(W, W^*)$ there exist unique $F, G \in E \otimes \mathcal{L}(W, W^*)$ such that
$$D = \int_T F(t) \diamond D_t^+\,dt + \int_T G(t) \diamond D_t^-\,dt, \tag{7.6}$$
where the integrals on the right-hand side are understood through the canonical bilinear forms.

7.4. Quantum white noise differential equations of Wick type

Given a Wick derivation $D : \mathcal{L}(W, W^*) \to \mathcal{L}(W, W^*)$ and a white noise operator $G \in \mathcal{L}(W, W^*)$, we consider the linear differential equation
$$D\Xi = G \diamond \Xi. \tag{7.7}$$
The solution is described as in the case of linear ordinary differential equations. For $U \in \mathcal{L}(W, W^*)$ the Wick exponential is defined by
$$\operatorname{wexp} U = \sum_{n=0}^{\infty} \frac{1}{n!}\, U^{\diamond n} \tag{7.8}$$
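whenever the series converges in $\mathcal{L}(W, W^*)$.

Theorem 7.13. Let $G \in \mathcal{L}(W, W^*)$. If there is an operator $U \in \mathcal{L}(W, W^*)$ such that $DU = G$ and $\operatorname{wexp} U \in \mathcal{L}(W, W^*)$, then the general solution to (7.7) is given by
$$\Xi = (\operatorname{wexp} U) \diamond F = F \diamond \operatorname{wexp} U \tag{7.9}$$
with a white noise operator $F \in \mathcal{L}(W, W^*)$ satisfying $DF = 0$.

Proof. Let $F \in \mathcal{L}(W, W^*)$ be such that $DF = 0$. Then
$$D[(\operatorname{wexp} U) \diamond F] = (D \operatorname{wexp} U) \diamond F + (\operatorname{wexp} U) \diamond (DF) = (DU) \diamond (\operatorname{wexp} U) \diamond F.$$
Therefore, $\Xi = (\operatorname{wexp} U) \diamond F$ is a solution to (7.7). Conversely, let $\Xi$ be an arbitrary solution to (7.7). Consider the operator $F = (\operatorname{wexp}(-U)) \diamond \Xi$. We see that $F \in \mathcal{L}(W, W^*)$ and that
$$DF = -G \diamond (\operatorname{wexp}(-U)) \diamond \Xi + (\operatorname{wexp}(-U)) \diamond (G \diamond \Xi) = 0.$$
Therefore, $\Xi = (\operatorname{wexp} U) \diamond F$ with $DF = 0$.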
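A symbol-level illustration of Theorem 7.13 in a single-mode toy model ($E$ replaced by $\mathbb{C}$, an illustrative assumption) with $D = D_\zeta^+$: writing $\widehat{\Xi} = e^{\xi\eta}f$, Lemma 7.8 gives $\widehat{D_\zeta^+\Xi} = e^{\xi\eta}\,\zeta\,\partial_\eta f$, and the Wick product simply multiplies the normalized symbols $f$. For the hypothetical right-hand side $G = a^* + cI$ one may take $U = \zeta^{-1}(\tfrac12 a^{*2} + c\,a^*)$, so that $D_\zeta^+U = G$, and $F = I + a^2$, which contains no creation operators. The Python sketch checks at a few points that $\Xi = (\operatorname{wexp}U) \diamond F$, with symbol $e^{u}(1+\xi^2)e^{\xi\eta}$ and $u = \zeta^{-1}(\eta^2/2 + c\eta)$, satisfies $D_\zeta^+\Xi = G \diamond \Xi$. It is a formal check only and says nothing about convergence of the Wick exponential in $\mathcal{L}(W,W^*)$.

```python
import math

def d_dt_at_zero(f, h=1e-5):
    """Central-difference approximation of d/dt f(t) at t = 0."""
    return (f(h) - f(-h)) / (2 * h)

def creation_derivative(symbol, zeta):
    """Symbol of D^+_zeta Xi, second formula of Lemma 7.8 (one mode)."""
    return lambda xi, eta: (d_dt_at_zero(lambda t: symbol(xi, eta + t * zeta))
                            - zeta * xi * symbol(xi, eta))

def wick_product(s1, s2):
    """Symbol of Xi_1 <> Xi_2: the product of the symbols times e^{-xi eta}."""
    return lambda xi, eta: s1(xi, eta) * s2(xi, eta) * math.exp(-xi * eta)

ZETA, C_SHIFT = 0.7, -0.3  # hypothetical values of zeta and c

def g_symbol(xi, eta):
    """Right-hand side G = a* + c I."""
    return (eta + C_SHIFT) * math.exp(xi * eta)

def f_symbol(xi, eta):
    """F = I + a^2 contains no creation operators, so D^+_zeta F = 0."""
    return (1 + xi**2) * math.exp(xi * eta)

def solution_symbol(xi, eta):
    """Xi = (wexp U) <> F with U = (a*^2 / 2 + c a*) / zeta (normalized symbol u)."""
    u = (eta**2 / 2 + C_SHIFT * eta) / ZETA
    return math.exp(u) * f_symbol(xi, eta)

lhs = creation_derivative(solution_symbol, ZETA)  # D^+_zeta Xi
rhs = wick_product(g_symbol, solution_symbol)     # G <> Xi
for xi, eta in [(0.2, 0.4), (-0.5, 1.0), (1.2, -0.7)]:
    print(f"{xi:+.1f} {eta:+.1f}  {lhs(xi, eta):.10f}  {rhs(xi, eta):.10f}")
```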
Recall that $D_\zeta^\pm$ are Wick derivations. We start with the following simple differential equations:
$$D_\zeta^+\Xi = 0, \qquad D_\zeta^-\Xi = 0.$$

Lemma 7.14. A white noise operator $\Xi \in \mathcal{L}(W, W^*)$ satisfies $D_\zeta^+\Xi = 0$ for all $\zeta \in E$ if and only if $\Xi$ contains no creation operators, i.e., its Fock expansion is of the form
$$\Xi = \sum_{m=0}^{\infty} \Xi_{0,m}(\kappa_{0,m}).$$
Similarly, $\Xi \in \mathcal{L}(W, W^*)$ satisfies $D_\zeta^-\Xi = 0$ for all $\zeta \in E$ if and only if $\Xi$ contains no annihilation operators, i.e., its Fock expansion is of the form
$$\Xi = \sum_{l=0}^{\infty} \Xi_{l,0}(\kappa_{l,0}).$$
Therefore, $\Xi \in \mathcal{L}(W, W^*)$ satisfies $D_\zeta^+\Xi = D_\zeta^-\Xi = 0$ for all $\zeta \in E$ if and only if $\Xi$ is a scalar operator.

Proof. The proof is straightforward by Fock expansion.

Comparing Lemma 7.14 with Lemma 4.29 and Theorem 4.35, we see that a convolution operator is characterized as a white noise operator satisfying $D_\zeta^+\Xi = 0$ for all $\zeta \in E$. Similarly, a Wick multiplication operator is characterized as a white noise operator satisfying $D_\zeta^-\Xi = 0$ for all $\zeta \in E$.

7.5. The implementation problem

Let $S, T \in \mathcal{L}(E, E)$ and consider the transformed annihilation and creation operators
$$b(\zeta) = a(S\zeta) + a^*(T\zeta), \qquad b^*(\zeta) = a^*(S\zeta) + a(T\zeta),$$
where $\zeta \in E$. Our implementation problem is to find a white noise operator $U \in \mathcal{L}(W, W^*)$ satisfying
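$$\begin{array}{ccc}
W & \xrightarrow{\ U\ } & W^* \\
{\scriptstyle a(\zeta)}\Big\downarrow & & \Big\downarrow{\scriptstyle b(\zeta)} \\
W & \xrightarrow{\ U\ } & W^*
\end{array}
\qquad\qquad
\begin{array}{ccc}
W & \xrightarrow{\ U\ } & W^* \\
{\scriptstyle a^*(\zeta)}\Big\downarrow & & \Big\downarrow{\scriptstyle b^*(\zeta)} \\
W & \xrightarrow{\ U\ } & W^*
\end{array} \tag{7.10}$$
We will give a general form of $U$ under certain conditions on $S, T$.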
Theorem 7.15. Assume that $S, T \in \mathcal{L}(E, E)$ fulfill the following conditions: (i) $S$ is invertible; (ii) $T^*S = S^*T$. Then a white noise operator $U \in \mathcal{L}(W, W^*)$ satisfies the intertwining property
$$U a(\zeta) = b(\zeta) U, \qquad \zeta \in E, \tag{7.11}$$
if and only if $U$ is of the form
$$U = \operatorname{wexp}\Big\{-\frac{1}{2}\Delta_G^*(TS^{-1}) + \Lambda((S^{-1})^* - I)\Big\} \diamond F, \tag{7.12}$$
where $F \in \mathcal{L}(W, W^*)$ is an arbitrary white noise operator satisfying $D_\zeta^+F = 0$ for all $\zeta \in E$.

Proof. By definition, we have
$$U a(\zeta) = b(\zeta) U = (a(S\zeta) + a^*(T\zeta)) U = D_{S\zeta}^+ U + U a(S\zeta) + a^*(T\zeta) U,$$
and hence
$$D_{S\zeta}^+ U = U a(\zeta) - U a(S\zeta) - a^*(T\zeta) U.$$
Writing the right-hand side in terms of the Wick product, we arrive at
$$D_{S\zeta}^+ U = [a(\zeta - S\zeta) - a^*(T\zeta)] \diamond U, \tag{7.13}$$
which is equivalent to (7.11). Thus, it is sufficient to solve the differential equation (7.13). By Lemma 7.6 we know that
$$D_{S\zeta}^+ \Lambda((S^{-1})^* - I) = a(\zeta - S\zeta), \qquad D_{S\zeta}^+ \Delta_G^*(TS^{-1}) = 2a^*(T\zeta).$$
Then, with the help of Theorem 7.13, the general form of the solutions to (7.13) is given by (7.12), where $F \in \mathcal{L}(W, W^*)$ is an arbitrary white noise operator satisfying $D_{S\zeta}^+F = 0$ for all $\zeta \in E$. Since $S$ is invertible, the last condition on $F$ is equivalent to $D_\zeta^+F = 0$ for all $\zeta \in E$.

Theorem 7.16. Assume that $S, T \in \mathcal{L}(E, E)$ fulfill the following conditions: (i) $S$ is invertible; (ii) $T^*S = S^*T$;
(iii) $S^*S - T^*T = I$; (iv) $ST^* = TS^*$. Then a white noise operator $U \in \mathcal{L}(W, W^*)$ satisfies the intertwining property
$$U a^*(\zeta) = b^*(\zeta) U, \qquad \zeta \in E, \tag{7.14}$$
if and only if $U$ is of the form
$$U = \operatorname{wexp}\Big\{-\frac{1}{2}\Delta_G^*(TS^{-1}) + \Lambda((S^{-1})^* - I) + \frac{1}{2}\Delta_G(S^{-1}T)\Big\} \diamond G,$$
where $G \in \mathcal{L}(W, W^*)$ is an arbitrary white noise operator satisfying $(D_\zeta^- - D_{T\zeta}^+)G = 0$ for all $\zeta \in E$.

Proof. In a similar fashion as in the proof of Theorem 7.15, we see that (7.14) is equivalent to
$$(D_\zeta^- - D_{T\zeta}^+) U = [a^*(S\zeta - \zeta) + a(T\zeta)] \diamond U. \tag{7.15}$$
Our task is to solve the differential equation (7.15). First we need to find a solution to the differential equation
$$(D_\zeta^- - D_{T\zeta}^+) Y = a^*(S\zeta - \zeta) + a(T\zeta). \tag{7.16}$$
As is easily verified,
$$Y = \Delta_G^*(K) + \Lambda(L) + \Delta_G(M), \qquad K = K^*, \quad M = M^*,$$
satisfies (7.16) if and only if
$$2M - L^*T = T, \qquad L - 2KT = S - I.$$
Thanks to conditions (i)–(iv) we may choose
$$K = -\frac{1}{2}TS^{-1}, \qquad L = (S^{-1})^* - I, \qquad M = \frac{1}{2}S^{-1}T.$$
Then the assertion follows immediately from Theorem 7.13.
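The matrix algebra behind the choice of $K$, $L$, $M$ can be checked numerically. The Python sketch below replaces $E$ by $\mathbb{R}^2$ (an illustrative assumption), so that $^*$ becomes the transpose, and uses the hypothetical family $S = \cosh r\,Q$, $T = \sinh r\,JQ$ with $Q$ a rotation and $J$ a symmetric reflection ($J^2 = I$); these satisfy (i)–(iv). It then confirms that $K = -\frac12 TS^{-1}$ and $M = \frac12 S^{-1}T$ are symmetric and that $2M - L^*T = T$ and $L - 2KT = S - I$ hold. A pair violating (iii) is included as a negative check.

```python
import math

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def inverse(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]

def lin(*terms):
    """Linear combination sum_k c_k A_k of 2x2 matrices, given as (c_k, A_k) pairs."""
    return [[sum(c * A[i][j] for c, A in terms) for j in range(2)] for i in range(2)]

def close(A, B, tol=1e-12):
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))

def bogoliubov_pair(r, theta, phi):
    """Hypothetical S = cosh(r) Q, T = sinh(r) J Q (Q rotation, J symmetric reflection)."""
    Q = [[math.cos(phi), -math.sin(phi)], [math.sin(phi), math.cos(phi)]]
    J = [[math.cos(theta), math.sin(theta)], [math.sin(theta), -math.cos(theta)]]
    return lin((math.cosh(r), Q)), lin((math.sinh(r), mat_mul(J, Q)))

def check_theorem_7_16(S, T):
    """Return (conditions (ii)-(iv) hold, the choice of K, L, M is admissible)."""
    St, Tt, S_inv = transpose(S), transpose(T), inverse(S)
    conditions = (close(mat_mul(Tt, S), mat_mul(St, T)),                               # (ii)
                  close(lin((1, mat_mul(St, S)), (-1, mat_mul(Tt, T))), IDENTITY),     # (iii)
                  close(mat_mul(S, Tt), mat_mul(T, St)))                               # (iv)
    K = lin((-0.5, mat_mul(T, S_inv)))
    L = lin((1, transpose(S_inv)), (-1, IDENTITY))
    M = lin((0.5, mat_mul(S_inv, T)))
    choice_ok = (close(K, transpose(K)) and close(M, transpose(M))
                 and close(lin((2, M), (-1, mat_mul(transpose(L), T))), T)
                 and close(lin((1, L), (-2, mat_mul(K, T))), lin((1, S), (-1, IDENTITY))))
    return all(conditions), choice_ok

print(check_theorem_7_16(*bogoliubov_pair(0.8, 0.4, 1.1)))
print(check_theorem_7_16([[2.0, 0.0], [0.0, 2.0]], IDENTITY))
```

In the negative example, $2M - L^*T = T$ still holds but $L - 2KT = S - I$ fails, reflecting that the second relation is where condition (iii) enters.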
Theorem 7.17. Under the same assumptions as in Theorem 7.16, a white noise operator $U \in \mathcal{L}(W, W^*)$ satisfies the intertwining properties
$$U a(\zeta) = b(\zeta) U, \qquad U a^*(\zeta) = b^*(\zeta) U, \qquad \zeta \in E,$$
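if and only if $U$ is of the form
$$U = C\,\operatorname{wexp}\Big\{-\frac{1}{2}\Delta_G^*(TS^{-1}) + \Lambda((S^{-1})^* - I) + \frac{1}{2}\Delta_G(S^{-1}T)\Big\}, \tag{7.17}$$
or equivalently,
$$U = C\,e^{-\frac{1}{2}\Delta_G^*(TS^{-1})}\,\Gamma((S^{-1})^*)\,e^{\frac{1}{2}\Delta_G(S^{-1}T)}, \tag{7.18}$$
where $C \in \mathbb{C}$.

Proof. It follows from Theorems 7.15 and 7.16 that $U$ is of the form
$$U = \operatorname{wexp}\Big\{-\frac{1}{2}\Delta_G^*(TS^{-1}) + \Lambda((S^{-1})^* - I)\Big\} \diamond F = \operatorname{wexp}\Big\{-\frac{1}{2}\Delta_G^*(TS^{-1}) + \Lambda((S^{-1})^* - I) + \frac{1}{2}\Delta_G(S^{-1}T)\Big\} \diamond G, \tag{7.19}$$
where $F, G \in \mathcal{L}(W, W^*)$ satisfy
$$D_\zeta^+F = 0, \qquad (D_\zeta^- - D_{T\zeta}^+)G = 0$$
for all $\zeta \in E$. We see from (7.19) that
$$G = F \diamond \operatorname{wexp}\Big\{-\frac{1}{2}\Delta_G(S^{-1}T)\Big\}.$$
Since the right-hand side contains no creation operators, we have
$$D_\zeta^+G = 0, \qquad \zeta \in E. \tag{7.20}$$
Then
$$0 = (D_\zeta^- - D_{T\zeta}^+)G = D_\zeta^-G, \qquad \zeta \in E. \tag{7.21}$$
It follows from (7.20) and (7.21) that $G$ is a scalar operator; see Lemma 7.14. Consequently, (7.17) follows from (7.19). For the conventional expression (7.18), we note that
$$\operatorname{wexp}\Big\{-\frac{1}{2}\Delta_G^*(TS^{-1})\Big\} = \exp\Big\{-\frac{1}{2}\Delta_G^*(TS^{-1})\Big\}, \tag{7.22}$$
$$\operatorname{wexp}\{\Lambda((S^{-1})^* - I)\} = \Gamma((S^{-1})^*), \tag{7.23}$$
$$\operatorname{wexp}\Big\{\frac{1}{2}\Delta_G(S^{-1}T)\Big\} = \exp\Big\{\frac{1}{2}\Delta_G(S^{-1}T)\Big\}, \tag{7.24}$$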
where Γ((S −1 )∗ ) is the second quantization of (S −1 )∗ . Obviously, the Wick products of these operators coincides with the usual composition of operators (7.18). As is easily verified, condition (ii) in Theorem 7.16 is necessary and sufficient for [b(ζ), b(η)] = [b∗ (ζ), b∗ (η)] = 0,
ζ, η ∈ E.
While, condition (iii) therein is necessary and sufficient for [b(ζ), b∗ (η)] = ζ, η,
ζ, η ∈ E.
Namely, these two conditions are equivalent for {b(ζ), b∗ (ζ); ζ ∈ E} to satisfy the canonical commutation relation. In general, under these conditions, an operator U satisfying (7.10) is called (an implementor of) the Bogoliubov transformation. Remark 7.18. The original implementation problem is to find a (unitary) operator U satisfying the following commutative diagrams: U
U
Γ(H) −−−−→ Γ(H) b(ζ) a(ζ)$ $
Γ(H) −−−−→ Γ(H) b∗ (ζ) a∗ (ζ)$ $
Γ(H) −−−−→ Γ(H)
Γ(H) −−−−→ Γ(H).
U
(7.25)
U
This problem has a long history, and the solution to the unitary implementation problem is known [8, 34, 89, 90, 92]. Among the operators obtained in Theorem 7.17, one can single out the unitary implementors by applying complex white noise. As this is somewhat beyond the main purpose of this paper, the topic will be discussed elsewhere.

For $B, C \in \mathcal{L}(E, E^*)$ there exists a unique white noise operator $G_{B,C} \in \mathcal{L}(W, W^*)$ satisfying
$$G_{B,C}\,\phi_\xi = e^{\langle B\xi, \xi\rangle}\,\phi_{C\xi}, \qquad \xi \in E.$$
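As a sanity check on this defining relation, here is a minimal sketch for the scalar case $B = bI$, $C = cI$, i.e. the Fourier–Gauss transforms $G_{b,c}$ discussed in this section. It assumes that $\langle\cdot,\cdot\rangle$ is the canonical bilinear pairing, so that $\langle c\xi, c\xi\rangle = c^2\langle\xi,\xi\rangle$. Applying the relation twice then gives $G_{b_1,c_1}G_{b_2,c_2} = G_{b_2 + b_1c_2^2,\ c_1c_2}$ on exponential vectors. The snippet encodes $G_{b,c}$ by the pair $(b, c)$ and checks numerically that Kuo's parameters $b = (ie^{i\theta}\sin\theta)/2$, $c = e^{i\theta}$ satisfy $G_sG_t = G_{s+t}$. The helper names `compose` and `kuo_params` are hypothetical, and the group law is derived here from the relation above rather than quoted from the text.

```python
import cmath
import math

# Scalar Fourier-Gauss transform G_{b,c}, encoded by the pair (b, c) of complex
# numbers via its action on exponential vectors:
#     G_{b,c} phi_xi = exp(b <xi, xi>) phi_{c xi}.

def compose(first, second):
    """Pair (b, c) of the product G_first G_second (apply `second`, then `first`)."""
    b1, c1 = first
    b2, c2 = second
    # G_{b1,c1}(exp(b2 <xi,xi>) phi_{c2 xi})
    #   = exp(b2 <xi,xi> + b1 <c2 xi, c2 xi>) phi_{c1 c2 xi},
    # and <c2 xi, c2 xi> = c2**2 <xi, xi> because the pairing is bilinear.
    return (b2 + b1 * c2 ** 2, c1 * c2)

def kuo_params(theta):
    """Parameters of G_theta: b = i e^{i theta} sin(theta) / 2, c = e^{i theta}."""
    b = 1j * cmath.exp(1j * theta) * math.sin(theta) / 2
    c = cmath.exp(1j * theta)
    return (b, c)

def close(p, q, tol=1e-12):
    """Compare two (b, c) pairs up to floating-point tolerance."""
    return abs(p[0] - q[0]) < tol and abs(p[1] - q[1]) < tol

# Group law G_s G_t = G_{s+t} at a few sample angles.
for s, t in [(0.3, 1.1), (-0.7, 2.0), (math.pi / 2, math.pi / 3)]:
    assert close(compose(kuo_params(s), kuo_params(t)), kuo_params(s + t))

# theta = -pi/2: Kuo's Fourier transform is the adjoint of G_{-pi/2}.
b, c = kuo_params(-math.pi / 2)
print(b, c)  # numerically b = -1/2 and c = -i, up to rounding
```

The closed form $b(\theta) = (e^{2i\theta}-1)/4$, which follows from $e^{i\theta}\sin\theta = (e^{2i\theta}-1)/(2i)$, makes the identity $b(t) + b(s)e^{2it} = b(s+t)$ easy to confirm by hand.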
One also checks that $G_{B,C} \in \mathcal{L}(W, W)$ for $B \in \mathcal{L}(E, E^*)$ and $C \in \mathcal{L}(E, E)$; the verification is a straightforward application of the characterization theorem for operator symbols. The operator $G_{B,C}$ is called a generalized Fourier–Gauss transform, and its adjoint $G_{B,C}^*$ a generalized Fourier–Mehler transform
[14, 43]. Noting that $\Gamma(C) \in \mathcal{L}(W, W^*)$ and $e^{\Delta_G(B)} \in \mathcal{L}(W, W)$ for $B, C \in \mathcal{L}(E, E^*)$, we obtain
$$G_{B,C} = \Gamma(C)\, e^{\Delta_G(B)}.$$
Hence the implementor $U$ in Theorem 7.17 is a composition of a generalized Fourier–Gauss transform and a generalized Fourier–Mehler transform. When $B = bI$ and $C = cI$ are scalar operators with $b, c \in \mathbb{C}$, we write $G_{b,c} = G_{bI,cI}$ for simplicity. This $G_{b,c}$ is called a Fourier–Gauss transform [68]; see also [65]. The Fourier–Gauss transform $G_{b,c}$ with the specific parameters $b = (ie^{i\theta}\sin\theta)/2$ and $c = e^{i\theta}$, where $\theta \in \mathbb{R}$, is denoted by $G_\theta$. Then Kuo's Fourier–Mehler transform [62] is given by $F_\theta = G_\theta^*$, and Kuo's Fourier transform [61] by $F_{-\pi/2}$.

References
[1] L. Accardi, Y.-G. Lu and I. Volovich, Quantum Theory and Its Stochastic Limit, Springer, Berlin, 2002.
[2] N. Asai, I. Kubo and H.-H. Kuo, General characterization theorems and intrinsic topologies in white noise analysis, Hiroshima Math. J. 31 (2001), 299–330.
[3] S. Attal and J. M. Lindsay, Quantum stochastic calculus with maximal operator domains, Ann. Probab. 32 (2004), 488–529.
[4] V. P. Belavkin, A quantum nonadapted Itô formula and stochastic analysis in Fock scale, J. Funct. Anal. 102 (1991), 414–447.
[5] M. Ben Chrouda and H. Ouerdiane, Algebras of operators on holomorphic functions and applications, Math. Phys. Anal. Geom. 5 (2002), 65–76.
[6] M. Ben Chrouda, M. El Oued and H. Ouerdiane, Convolution calculus and applications to stochastic differential equations, Soochow J. Math. 28 (2002), 375–388.
[7] Yu. M. Berezansky and Yu. G. Kondratiev, Spectral Methods in Infinite-Dimensional Analysis, Kluwer Academic, 1995.
[8] F. A. Berezin, The Method of Second Quantization, Academic Press, 1966.
[9] N. N. Bogolubov, A. A. Logunov and I. T. Todorov, Introduction to Axiomatic Quantum Field Theory, Benjamin, Massachusetts, 1975.
[10] A. Bohm and M. Gadella, Dirac Kets, Gamow Vectors and Gel'fand Triples, Lect. Notes in Phys. 348, Springer-Verlag, 1989.
[11] L. Bruneau and J. Dereziński, Bogoliubov Hamiltonians and one-parameter groups of Bogoliubov transformations, J. Math. Phys. 48 (2007), 022101.
[12] D. M. Chung and T. S. Chung, Wick derivations on white noise functionals, J. Korean Math. Soc. 33 (1996), 993–1008.
[13] D. M. Chung, T. S. Chung and U. C. Ji, A simple proof of analytic characterization theorem for operator symbols, Bull. Korean Math. Soc. 34 (1997), 421–436.
[14] D. M. Chung and U. C. Ji, Transformations on white noise functionals with their applications to Cauchy problems, Nagoya Math. J. 147 (1997), 1–23.
[15] D. M. Chung and U. C. Ji, Some Cauchy problems in white noise analysis and associated semigroups of operators, Stochastic Anal. Appl. 17 (1999), 1–22.
[16] D. M. Chung, U. C. Ji and N. Obata, Higher powers of quantum white noises in terms of integral kernel operators, Infin. Dimens. Anal. Quantum Probab. Relat. Top. 1 (1998), 533–559.
[17] D. M. Chung, U. C. Ji and N. Obata, Normal-ordered white noise differential equations I: Existence of solutions as Fock space operators, in Trends in Contemporary Infinite Dimensional Analysis and Quantum Probability, L. Accardi et al. (eds.), Istituto Italiano di Cultura, Kyoto, 2000, pp. 115–135.
[18] D. M. Chung, U. C. Ji and N. Obata, Normal-ordered white noise differential equations II: Regularity properties of solutions, in Prob. Theory and Math. Stat., B. Grigelionis et al. (eds.), VSP/TEV Ltd., 1999, pp. 157–174.
[19] D. M. Chung, U. C. Ji and N. Obata, Quantum stochastic analysis via white noise operators in weighted Fock space, Rev. Math. Phys. 14 (2002), 241–272.
[20] W. G. Cochran, H.-H. Kuo and A. Sengupta, A new class of white noise generalized functions, Infin. Dimens. Anal. Quantum Probab. Relat. Top. 1 (1998), 43–67.
[21] J. L. Da Silva, M. Erraoui and H. Ouerdiane, Convolution equation: Solution and parabolic representation, in Quantum Probability and Infinite Dimensional Analysis: Proceedings of the 29th Conference, H. Ouerdiane and A. Barhoumi (eds.), World Scientific, 2010, pp. 230–244.
[22] S. Dineen, Complex Analysis on Infinite Dimensional Spaces, Springer-Verlag, 1999.
[23] R. Gannoun, R. Hachaichi, H. Ouerdiane and A. Rezgui, Un théorème de dualité entre espaces de fonctions holomorphes à croissance exponentielle, J. Funct. Anal. 171 (2000), 1–14.
[24] I. M. Gelfand and N. Ya. Vilenkin, Generalized Functions, Vol. 4, Academic Press, 1964.
[25] J. Glimm and A. Jaffe, Boson quantum field models, in Collected Papers, Vol. 1, Birkhäuser, 1985, pp. 125–199.
[26] J. Glimm and A. Jaffe, Quantum Physics, 2nd edition, Springer, 1987.
[27] L. Gross, Potential theory on Hilbert space, J. Funct. Anal. 1 (1967), 123–181.
[28] M. Grothaus, Yu. G. Kondratiev and L. Streit, Complex Gaussian analysis and the Bargmann–Segal space, Methods of Funct. Anal. and Topology 3 (1997), 46–64.
[29] R. Haag, On quantum field theories, Dan. Mat. Fys. Medd. 29(12) (1955), 1–37.
[30] T. Hida, Analysis of Brownian Functionals, Carleton Math. Lect. Notes No. 13, Carleton University, Ottawa, 1975.
[31] T. Hida, H.-H. Kuo and N. Obata, Transformations for white noise functionals, J. Funct. Anal. 111 (1993), 259–277.
[32] T. Hida, H.-H. Kuo, J. Potthoff and L. Streit, White Noise: An Infinite Dimensional Calculus, Kluwer Academic Publishers, 1993.
[33] T. Hida, N. Obata and K. Saitô, Infinite dimensional rotations and Laplacians in terms of white noise calculus, Nagoya Math. J. 128 (1992), 65–93.
[34] F. Hiroshima and K. R. Ito, Local exponents and infinitesimal generators of canonical transformations on Boson Fock space, Infin. Dimens. Anal. Quantum Probab. Relat. Top. 7 (2004), 547–571.
[35] H. Holden, B. Øksendal, J. Ubøe and T. Zhang, Stochastic Partial Differential Equations, Birkhäuser, 1996.
[36] Z.-Y. Huang, Quantum white noises — white noise approach to quantum stochastic calculus, Nagoya Math. J. 129 (1993), 23–42.
[37] Z.-Y. Huang and S.-L. Luo, Wick calculus of generalized operators and its applications to quantum stochastic calculus, Infin. Dimens. Anal. Quantum Probab. Relat. Top. 1 (1998), 455–466.
[38] R. L. Hudson and K. R. Parthasarathy, Quantum Itô's formula and stochastic evolutions, Commun. Math. Phys. 93 (1984), 301–323.
[39] R. L. Hudson and S. N. Peck, Canonical Fourier transforms, J. Math. Phys. 20 (1979), 114–119.
[40] S. Janson, Gaussian Hilbert Spaces, Cambridge Tracts in Mathematics, Cambridge University Press, 2008.
[41] U. C. Ji, Stochastic integral representation theorem for quantum semimartingales, J. Funct. Anal. 201 (2003), 1–29.
[42] U. C. Ji and N. Obata, Quantum white noise calculus, in Non-Commutativity, Infinite-Dimensionality and Probability at the Crossroads, N. Obata, T. Matsui and A. Hora (eds.), World Scientific, 2002, pp. 143–191.
[43] U. C. Ji and N. Obata, A unified characterization theorem in white noise theory, Infin. Dimens. Anal. Quantum Probab. Relat. Top. 6 (2003), 167–178.
[44] U. C. Ji and N. Obata, Unitarity of Kuo's Fourier–Mehler transform, Infin. Dimens. Anal. Quantum Probab. Relat. Top. 7 (2004), 147–154.
[45] U. C. Ji and N. Obata, A role of Bargmann–Segal spaces in characterization and expansion of operators on Fock space, J. Math. Soc. Japan 56 (2004), 311–338.
[46] U. C. Ji and N. Obata, Admissible white noise operators and their quantum white noise derivatives, in Infinite Dimensional Harmonic Analysis III, World Sci. Publ., Hackensack, NJ, 2005, pp. 213–232.
[47] U. C. Ji and N. Obata, Unitarity of generalized Fourier–Gauss transforms, Stochastic Anal. Appl. 24 (2006), 733–751.
[48] U. C. Ji and N. Obata, Generalized white noise operator fields and quantum white noise derivatives, Séminaires et Congrès 16 (2008), 17–33.
[49] U. C. Ji and N. Obata, Annihilation-derivative, creation-derivative and representation of quantum martingales, Commun. Math. Phys. 286 (2009), 751–775.
[50] U. C. Ji and N. Obata, Quantum stochastic integral representations of Fock space operators, Stochastics 81 (2009), 367–384.
[51] U. C. Ji and N. Obata, Quantum stochastic gradients, Interdiscip. Inform. Sci. 14 (2009), 345–359.
[52] U. C. Ji and N. Obata, Implementation problem for the canonical commutation relation in terms of quantum white noise derivatives, J. Math. Phys. 51 (2010), 123507.
[53] U. C. Ji, N. Obata and H. Ouerdiane, Analytic characterization of generalized Fock space operators as two-variable entire functions with growth condition, Infin. Dimens. Anal. Quantum Probab. Relat. Top. 5 (2002), 395–407.
[54] Yu. G. Kondratiev, P. Leukert and L. Streit, Wick calculus in Gaussian analysis, Acta Appl. Math. 44 (1996), 269–294.
[55] Yu. G. Kondratiev and L. Streit, Spaces of white noise distributions: Constructions, descriptions, applications I, Rep. Math. Phys. 33 (1993), 341–366.
[56] P. Krée, La théorie des distributions en dimension quelconque et l'intégration stochastique, in Stochastic Analysis and Related Topics, H. Korezlioglu and A. S. Ustunel (eds.), Lect. Notes in Math. Vol. 1316, Springer-Verlag, 1988, pp. 170–233.
[57] P. Krée and R. Rączka, Kernels and symbols of operators in quantum field theory, Ann. Inst. Henri Poincaré Sect. A 28 (1978), 41–73.
[58] P. Kristensen, L. Mejlbo and E. Thue Poulsen, Tempered distributions in infinitely many dimensions I–III, Commun. Math. Phys. 1 (1965), 175–214; Math. Scand. 14 (1964), 129–150; Commun. Math. Phys. 6 (1967), 29–48.
[59] I. Kubo, H.-H. Kuo and A. Sengupta, White noise analysis on a new space of Hida distributions, Infin. Dimens. Anal. Quantum Probab. Relat. Top. 2 (1999), 315–335.
[60] I. Kubo and S. Takenaka, Calculus on Gaussian white noise I–IV, Proc. Japan Acad. 56A (1980), 376–380; 411–416; 57A (1981), 433–437; 58A (1982), 186–189.
[61] H.-H. Kuo, On Fourier transforms of generalized Brownian functionals, J. Multivariate Anal. 12 (1982), 415–431.
[62] H.-H. Kuo, Fourier–Mehler transforms of generalized Brownian functionals, Proc. Japan Acad. 59A (1983), 312–314.
[63] H.-H. Kuo, On Laplacian operators of generalized Brownian functionals, in Stochastic Processes and Applications, K. Ito and T. Hida (eds.), Lect. Notes in Math. Vol. 1203, Springer-Verlag, 1986, pp. 119–128.
[64] H.-H. Kuo, Convolution and Fourier transform of Hida distributions, in Stochastic Partial Differential Equations and their Applications, Lect. Notes in Control and Inform. Sci. Vol. 176, Springer, 1992, pp. 165–176.
[65] H.-H. Kuo, White Noise Distribution Theory, CRC Press, 1996.
[66] H.-H. Kuo, A quarter century of white noise theory, in Quantum Information IV, T. Hida and K. Saitô (eds.), World Scientific, 2002, pp. 1–37.
[67] H.-H. Kuo, J. Potthoff and L. Streit, A characterization of white noise test functionals, Nagoya Math. J. 121 (1991), 185–194.
[68] Y.-J. Lee, Integral transforms of analytic functions on abstract Wiener spaces, J. Funct. Anal. 47 (1983), 153–164.
[69] Y.-J. Lee, Analytic version of test functionals, Fourier transform and a characterization of measures in white noise calculus, J. Funct. Anal. 100 (1991), 359–380.
[70] J. M. Lindsay, Quantum and non-causal stochastic integral, Probab. Theory Related Fields 97 (1993), 65–80.
[71] P. Malliavin, Stochastic Analysis, Springer-Verlag, 1997.
[72] P. A. Meyer, Quantum Probability for Probabilists, Lect. Notes in Math. Vol. 1538, Springer-Verlag, 1993.
[73] D. Nualart, The Malliavin Calculus and Related Topics, Springer-Verlag, New York, 1995.
[74] N. Obata, Rotation-invariant operators on white noise functionals, Math. Z. 210 (1992), 69–89.
[75] N. Obata, An analytic characterization of symbols of operators on white noise functionals, J. Math. Soc. Japan 45 (1993), 421–445.
[76] N. Obata, White Noise Calculus and Fock Space, Lect. Notes in Math. Vol. 1577, Springer-Verlag, 1994.
[77] N. Obata, Generalized quantum stochastic processes on Fock space, Publ. RIMS 31 (1995), 667–702.
[78] N. Obata, Integral kernel operators on Fock space — Generalizations and applications to quantum dynamics, Acta Appl. Math. 47 (1997), 49–77.
[79] N. Obata, Quantum stochastic differential equations in terms of quantum white noise, Nonlinear Analysis, Theory, Methods and Applications 30 (1997), 279–290.
[80] N. Obata, Wick product of white noise operators and quantum stochastic differential equations, J. Math. Soc. Japan 51 (1999), 613–641.
[81] N. Obata, Unitarity criterion in white noise calculus and nonexistence of unitary evolutions driven by higher powers of quantum white noises, in Modelos Estocásticos II, D. Hernández, J. A. López-Mimbela and R. Quezada (eds.), Aportaciones Matemáticas Investigación 16, Mexican Math. Soc., 2001, pp. 251–269.
[82] N. Obata, Coherent state representation and unitarity condition in white noise calculus, J. Korean Math. Soc. 38 (2001), 297–309.
[83] N. Obata and H. Ouerdiane, A note on convolution operators in white noise calculus, Infin. Dimens. Anal. Quantum Probab. Relat. Top. 14 (2011), 661–674.
[84] K. R. Parthasarathy, An Introduction to Quantum Stochastic Calculus, Birkhäuser, 1992.
[85] S. N. Peck, Canonical Schwartz spaces and generalized operators, J. Math. Phys. 20 (1979), 336–343.
[86] M. A. Piech, Parabolic equations associated with the number operator, Trans. Amer. Math. Soc. 194 (1974), 213–222.
[87] J. Potthoff and L. Streit, A characterization of Hida distributions, J. Funct. Anal. 101 (1991), 212–229.
[88] M. Reed and B. Simon, Methods of Modern Mathematical Physics I: Functional Analysis, Academic Press, 1980.
[89] S. N. M. Ruijsenaars, On Bogoliubov transforms for systems of relativistic charged particles, J. Math. Phys. 18 (1976), 517–526.
[90] S. N. M. Ruijsenaars, On Bogoliubov transforms, II. The general case, Ann. Phys. 116 (1978), 105–134.
[91] H. H. Schaefer, Topological Vector Spaces, 4th corrected printing, Springer-Verlag, 1980.
[92] D. Shale, Linear symmetries of free boson fields, Trans. Amer. Math. Soc. 103 (1962), 149–167.
[93] F. Treves, Topological Vector Spaces, Distributions and Kernels, Academic Press, 1967.
[94] Y. Yamasaki, Measures on Infinite Dimensional Spaces, World Scientific, 1985.
CHAPTER 5

WEAK RADON-NIKODÝM DERIVATIVES, DUNFORD-SCHWARTZ TYPE INTEGRATION, AND CRAMÉR AND KARHUNEN PROCESSES

YÛICHIRÔ KAKIHARA
1. Introduction

This article deals with a Dunford-Schwartz type integral with respect to a Hilbert-Schmidt class operator valued measure and its application to processes of Cramér and Karhunen classes. In so doing, we develop a theory of weak Radon-Nikodým derivatives of Hilbert space and Hilbert-Schmidt class operator valued measures. To integrate a scalar function with respect to a Hilbert space valued measure or a Hilbert-Schmidt class operator valued measure, the Dunford-Schwartz integral (Dunford and Schwartz [10]) suffices. However, to integrate an operator valued function with respect to an operator valued measure, the Dunford-Schwartz integral is no longer applicable. If a measure has finite variation and hence a Radon-Nikodým derivative, this deficiency can be overcome by using the Bochner integral. To include measures of unbounded variation we need a weaker notion of Radon-Nikodým derivative, called simply a weak Radon-Nikodým derivative here. When a measure takes values in a separable Hilbert space, such a derivative is easy to construct, and this construction is one of our main concerns in this article.

To be more specific, let $(\Theta, \mathcal{A})$ be a measurable space and $H$ a separable Hilbert space. Consider an $H$-valued countably additive measure $\xi \in ca(\mathcal{A}, H)$. For a scalar measurable function $f$ on $\Theta$, an integral
$$\int_A f\,d\xi, \qquad A \in \mathcal{A},$$
can be defined by the Dunford-Schwartz integral. If $\Phi$ is an operator valued function on $\Theta$, say $B(H)$-valued, where $B(H)$ is the set of all bounded linear operators on $H$, then it is not easy to define an integral
$$\int_A \Phi\,d\xi, \qquad A \in \mathcal{A}.$$
If $\xi$ has a Radon-Nikodým derivative $\xi'$ with respect to some dominating measure $\nu$, then the integral might be defined by
$$\int_A \Phi\,\xi'\,d\nu, \qquad A \in \mathcal{A}.$$
We should mention that our derivative does not rely on the Pettis integral, as considered in Bongiorno, Di Piazza and Musial [3]. We should also mention that the results presented here may be generalized to measures taking values in a Banach space with a basis (cf. Lipecki and Musial [19]). Roughly, the first half of this article is based on Kakihara [15, 16, 17], and the second half is new. The content of each section is as follows.

In Section 2, separable Hilbert space valued measures are studied. For such a measure a weak Radon-Nikodým derivative is defined, and some conditions for its existence and uniqueness are obtained. The integral of a scalar valued function with respect to a measure with a weak Radon-Nikodým derivative is defined, and its relation to the Dunford-Schwartz integral is presented. Building on Section 2, we consider Hilbert-Schmidt class operator valued measures in Section 3. Using a weak Radon-Nikodým derivative of such a measure, a Dunford-Schwartz type integral is defined for an operator valued function. We show, for instance, that a set of integrable operator valued functions satisfying some restrictions is a Banach space under a suitable norm. Finally, in Section 4, some classes of infinite dimensional second order stochastic processes are explored. The Cramér and Karhunen classes of finite dimensional processes have been defined and extensively investigated (cf. Chang and Rao [5] and Rao [32, 35]). Our goal here is to study infinite dimensional Cramér and Karhunen classes. This was partially done in Kakihara [13], and we shall complete that work. We emphasize that our study of Hilbert space valued measures and Hilbert-Schmidt class operator valued measures is motivated by infinite dimensional second order stochastic processes.
2. Hilbert Space Valued Measures

Hilbert space valued measures are considered in this section. To define a Dunford-Schwartz type integral for such a measure of unbounded variation, we introduce a weak Radon-Nikodým derivative with respect to a suitable dominating measure. Basic properties of this type of derivative are obtained. A necessary and sufficient condition for the existence of a weak Radon-Nikodým derivative is shown. Using an orthogonally scattered dilation, we can also give fairly general sufficient conditions. When a weak Radon-Nikodým derivative of a Hilbert space valued measure is used to define a Dunford-Schwartz type integral, we prove that the resulting integral coincides with the Dunford-Schwartz integral.
2.1. Radon-Nikodým property

To understand the need for a weaker notion of Radon-Nikodým derivative, let us consider the ordinary Radon-Nikodým derivative of a Banach space valued measure. Let $(\Theta, \mathcal{A})$ be a measurable space as before and $X$ a Banach space with norm $\|\cdot\|_X$. Let $ca(\mathcal{A}, X)$ denote the set of all $X$-valued countably additive measures on $\mathcal{A}$. For an $X$-valued measure $\xi \in ca(\mathcal{A}, X)$, the variation $|\xi|(\cdot)$ is defined by
$$|\xi|(A) = \sup\Bigl\{\sum_{\Delta \in \pi} \|\xi(\Delta)\|_X : \pi \in \Pi(A)\Bigr\}, \qquad A \in \mathcal{A},$$
where $\Pi(A)$ is the set of all finite measurable partitions of $A$. Let $vca(\mathcal{A}, X)$ be the set of all $X$-valued measures $\xi \in ca(\mathcal{A}, X)$ of bounded variation, i.e., $|\xi|(\Theta) < \infty$. The Banach space $X$ is said to have the Radon-Nikodým property with respect to a finite measure space $(\Theta, \mathcal{A}, \nu)$ if, for each $X$-valued measure $\xi \in vca(\mathcal{A}, X)$ of bounded variation with $\xi \ll \nu$, there exists an $X$-valued function $\varphi \in L^1(\Theta, \nu; X)$ such that
$$\xi(A) = \int_A \varphi\,d\nu, \qquad A \in \mathcal{A}, \tag{2.1.1}$$
where $L^1(\Theta, \nu; X)$ is the Banach space of $X$-valued Bochner integrable functions on $\Theta$ with respect to $\nu$. The function $\varphi$ is called a Radon-Nikodým derivative of $\xi$ with respect to $\nu$ and is denoted by $\varphi = d\xi/d\nu$. The Banach space $X$ is said to have the Radon-Nikodým property if it has the Radon-Nikodým property with respect to every finite measure space. According to Diestel and Uhl [9, p. 82 and p. 79], the following Banach spaces have the
Radon-Nikodým property: (1) reflexive spaces; (2) separable dual spaces. Hence, every Hilbert space $H$ has the Radon-Nikodým property. Moreover, the space $T(H)$ of trace class operators on a separable Hilbert space $H$ has the Radon-Nikodým property, being a separable dual space (the dual of the space of compact operators on $H$). We note that the finiteness of the measure can be replaced by σ-finiteness, as shown below.

Lemma 2.1.1. If a Banach space has the Radon-Nikodým property, then it has the Radon-Nikodým property with respect to any σ-finite measure space.

Proof. Let $X$ be a Banach space with the Radon-Nikodým property and $(\Theta, \mathcal{A}, \nu)$ a σ-finite measure space. Suppose that an $X$-valued measure $\xi \in vca(\mathcal{A}, X)$ is of bounded variation and $\xi \ll \nu$. Then the variation $|\xi|$ is a finite measure such that $\xi \ll |\xi|$. Hence, there exists a function $\varphi \in L^1(\Theta, |\xi|; X)$ such that
$$\xi(A) = \int_A \varphi\,d|\xi|, \qquad A \in \mathcal{A}.$$
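Note that $|\xi| \ll \nu$. In fact, suppose $\nu(A) = 0$. Then
$$\begin{aligned}
\nu(A) = 0 &\implies \nu(A \cap B) = 0, \quad B \in \mathcal{A}\\
&\implies \xi(A \cap B) = 0, \quad B \in \mathcal{A}\\
&\implies |\xi|(A) = 0.
\end{aligned} \tag{2.1.2}$$
By the Radon-Nikodým theorem (cf. Dunford and Schwartz [10, p. 176]), the Radon-Nikodým derivative $f = d|\xi|/d\nu$ exists in $L^1(\Theta, \nu)$. Then we see that
$$\xi(A) = \int_A \varphi\,d|\xi| = \int_A \varphi\,\frac{d|\xi|}{d\nu}\,d\nu = \int_A \varphi f\,d\nu, \qquad A \in \mathcal{A},$$
and $\varphi f \in L^1(\Theta, \nu; X)$ since
$$\infty > |\xi|(\Theta) = \int_\Theta \|\varphi\|_X\,d|\xi| = \int_\Theta \|\varphi\|_X\,|f|\,d\nu.$$
Thus, the lemma is proved.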
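The chain-rule step of this proof, $d\xi/d\nu = \varphi f$ with $\varphi = d\xi/d|\xi|$ and $f = d|\xi|/d\nu$, can be checked on a toy model. The sketch below assumes a hypothetical finite set $\Theta$ with the power-set σ-algebra, a measure $\nu$ with positive atoms, and $X = \mathbb{R}^d$ with the Euclidean norm. For such an atomic measure the supremum defining $|\xi|$ is attained at the partition into singletons (by the triangle inequality), so $|\xi|(\{\theta\}) = \|\xi(\{\theta\})\|$ and every derivative is a ratio of atom masses. The random data are purely illustrative.

```python
import math
import random

def norm(v):
    """Euclidean norm on R^d (stand-in for the Banach norm ||.||_X)."""
    return math.sqrt(sum(x * x for x in v))

random.seed(0)
m, d = 6, 3
nu = [random.uniform(0.5, 2.0) for _ in range(m)]                    # nu({theta})
xi = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]  # xi({theta})

var = [norm(v) for v in xi]                                   # |xi|({theta})
phi = [[x / var[t] for x in xi[t]] for t in range(m)]         # d xi / d|xi|
f = [var[t] / nu[t] for t in range(m)]                        # d|xi| / d nu
direct = [[x / nu[t] for x in xi[t]] for t in range(m)]       # d xi / d nu

# Chain rule: (d xi/d|xi|) * (d|xi|/d nu) = d xi/d nu on every atom.
for t in range(m):
    for j in range(d):
        assert math.isclose(phi[t][j] * f[t], direct[t][j], rel_tol=1e-12, abs_tol=1e-12)

# xi(A) = int_A phi f d nu for a sample set A of atoms.
A = [0, 2, 5]
for j in range(d):
    lhs = sum(xi[t][j] for t in A)
    rhs = sum(phi[t][j] * f[t] * nu[t] for t in A)
    assert math.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=1e-12)

# |xi|(Theta) = int ||phi|| |f| d nu, which is finite here.
total_variation = sum(var)
integral = sum(norm(phi[t]) * abs(f[t]) * nu[t] for t in range(m))
print(total_variation, integral)
```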
Two remarks are in order. Let $X$ be a Banach space. For any $X$-valued measure $\xi \in ca(\mathcal{A}, X)$ there exists a finite measure $\nu$ such that $\xi \ll \nu$ (cf. Dunford and Schwartz [10, p. 321]). If $\xi$ is of bounded variation and has a Radon-Nikodým derivative $d\xi/d\nu$ with respect to a suitable $\nu$, this derivative is not necessarily unique in the a.e. sense (cf. Rao [30] and Whitney [38, p. 319]). However, for a separable Hilbert space we can show the uniqueness of the Radon-Nikodým derivative in the a.e. sense, as follows.

Proposition 2.1.2. If $H$ is a separable Hilbert space, $\xi \in vca(\mathcal{A}, H)$ is an $H$-valued measure of bounded variation and $\nu$ is a σ-finite measure such that $\xi \ll \nu$, then the Radon-Nikodým derivative $d\xi/d\nu$ of $\xi$ with respect to $\nu$ exists in $L^1(\Theta, \nu; H)$ and is unique in the $\nu$-a.e. sense.

Proof. Since the existence of the Radon-Nikodým derivative $\varphi = d\xi/d\nu$ follows from Lemma 2.1.1, we prove the uniqueness of $\varphi$ in the $\nu$-a.e. sense. Let $\{e_k\}_{k=1}^\infty$ be an orthonormal basis of $H$. For each $k \ge 1$, $(\xi(\cdot), e_k)_H$ is a scalar valued measure that is absolutely continuous with respect to $\nu$, where $(\cdot, \cdot)_H$ is the inner product in $H$. Hence the Radon-Nikodým derivative $g_k \equiv d(\xi(\cdot), e_k)_H/d\nu \in L^1(\Theta, \nu)$ exists. On the other hand,
$$\int_A (\varphi, e_k)_H\,d\nu = \Bigl(\int_A \varphi\,d\nu,\ e_k\Bigr)_H = (\xi(A), e_k)_H, \qquad A \in \mathcal{A},$$
which implies that
$$\frac{d}{d\nu}(\xi(\cdot), e_k)_H = (\varphi, e_k)_H \quad \nu\text{-a.e.},$$
since the Radon-Nikodým derivative $g_k$ is unique in the $\nu$-a.e. sense. It now follows that
$$\varphi = \sum_{k=1}^\infty (\varphi, e_k)_H\,e_k = \sum_{k=1}^\infty g_k e_k \quad \nu\text{-a.e.}$$
Let $\varphi_1 \in L^1(\Theta, \nu; H)$ be another Radon-Nikodým derivative of $\xi$ with respect to $\nu$, so that
$$\xi(A) = \int_A \varphi_1\,d\nu, \qquad A \in \mathcal{A}.$$
Then, for each $k \ge 1$,
$$(\xi(A), e_k)_H = \int_A (\varphi_1, e_k)_H\,d\nu, \qquad A \in \mathcal{A},$$
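and hence
$$\frac{d}{d\nu}(\xi(\cdot), e_k)_H = g_k = (\varphi_1, e_k)_H \quad \nu\text{-a.e.}$$
It follows that
$$\varphi = \sum_{k=1}^\infty g_k e_k = \varphi_1 \quad \nu\text{-a.e.}$$
Thus, $\varphi$ is unique in the $\nu$-a.e. sense.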
Let $H$ be a separable Hilbert space and $\{e_k\}_{k=1}^\infty$ an orthonormal basis of $H$. For an $H$-valued measure $\xi \in ca(\mathcal{A}, H)$, let $\nu$ be a σ-finite measure such that $\xi \ll \nu$, and consider the scalar measures
$$\xi_k(\cdot) = (\xi(\cdot), e_k)_H, \qquad k \ge 1.$$
Since the Radon-Nikodým derivative $g_k \equiv d\xi_k/d\nu$ exists in $L^1(\Theta, \nu)$ for $k \ge 1$, we can form an $H$-valued function
$$\varphi(\theta) = \sum_{k=1}^\infty g_k(\theta)\,e_k, \qquad \theta \in \Theta. \tag{2.1.3}$$
This function $\varphi$ is well-defined $\nu$-a.e. if and only if
$$\sum_{k=1}^\infty |g_k(\theta)|^2 < \infty \quad \nu\text{-a.e. } \theta. \tag{2.1.4}$$
It follows from Proposition 2.1.2 that if $\xi$ is of bounded variation, then
$$\frac{d\xi}{d\nu} = \varphi = \sum_{k=1}^\infty g_k e_k \quad \nu\text{-a.e.}$$
and $\varphi \in L^1(\Theta, \nu; H)$. A natural question is whether, when $\xi$ is not of bounded variation but condition (2.1.4) is satisfied, the function $\varphi$ defined by (2.1.3) represents some kind of Radon-Nikodým derivative of $\xi$ with respect to $\nu$. This will be investigated in the next subsection.

2.2. Weak Radon-Nikodým derivatives

First we give two Hilbert space valued measures of unbounded variation, each of which has a well-defined function $\varphi$ given by (2.1.3) that satisfies equation (2.1.1). Then the definition of a weak Radon-Nikodým derivative of a Banach space valued measure is stated, and its well-definedness is shown.
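Example 2.2.1. Let $H$ be a separable Hilbert space with an orthonormal basis $\{e_k\}_{k=1}^\infty$. Let $\Theta = \mathbb{R}$, the set of real numbers, and $\mathcal{A} = \mathcal{B}_{\mathbb{R}}$, the Borel σ-algebra of $\mathbb{R}$. Define a set function $\xi$ on $\mathcal{B}_{\mathbb{R}}$ by
$$\xi(A) = \sum_{k \in A \cap \mathbb{N}} \frac{1}{k}\,e_k, \qquad A \in \mathcal{B}_{\mathbb{R}},$$
where $\mathbb{N} = \{1, 2, 3, \ldots\}$. Then $\xi \in ca(\mathcal{B}_{\mathbb{R}}, H)$ but
$$|\xi|(\mathbb{R}) = \sum_{k=1}^\infty \frac{1}{k} = \infty,$$
i.e., $\xi$ is of unbounded variation. If we define a σ-finite measure $\nu$ by
$$\nu(\{k\}) = 1, \quad k \in \mathbb{N}, \qquad \nu(\mathbb{R} \setminus \mathbb{N}) = 0,$$
then $\xi \ll \nu$, $g_k \equiv d\xi_k/d\nu = (1/k)1_{\{k\}}$ and
$$\sum_{k=1}^\infty |g_k(\theta)|^2 \le \sum_{k=1}^\infty \frac{1}{k^2} < \infty, \qquad \theta \in \mathbb{R},$$
where $1_A$ denotes the indicator function of a set $A$. Hence, condition (2.1.4) is satisfied, and the function $\varphi$ given by
$$\varphi(\theta) = \sum_{k=1}^\infty \frac{1}{k}\,1_{\{k\}}(\theta)\,e_k$$
is well-defined for $\nu$-a.e. $\theta$. Moreover, this $\varphi$ satisfies the defining equation (2.1.1) for a Radon-Nikodým derivative.

Example 2.2.2. Consider $\Theta = [0, 1]$, $\mathcal{A} = \mathcal{B}_\Theta$ (the Borel σ-algebra of $[0, 1]$) and the Hilbert space $H = L^2(\Theta, \lambda)$, where $\lambda$ is the Lebesgue measure. Let the $H$-valued set function $\xi$ be given by
$$\xi(A) = 1_A, \qquad A \in \mathcal{B}_\Theta.$$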
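Clearly $\xi \in ca(\mathcal{B}_\Theta, H)$, and $\xi$ is of unbounded variation. In fact, let $A_k = (1/(k+1), 1/k]$, $k \ge 1$. Then the $A_k$'s are disjoint and
$$|\xi|(\Theta) \ge \sum_{k=1}^n \|\xi(A_k)\|_2 = \sum_{k=1}^n \sqrt{\lambda(A_k)} = \sum_{k=1}^n \frac{1}{\sqrt{k(k+1)}} \ge \sum_{k=1}^n \frac{1}{k+1} \to \infty \quad (n \to \infty),$$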
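where $\|\cdot\|_2$ is the norm in $L^2(\Theta, \lambda)$. If we set $\varphi(\theta) = 1_\Theta$, then $\varphi \notin L^1(\Theta, \nu; H)$ since $\|\varphi\|_1 = |\xi|(\Theta) = \infty$, and
$$\xi(A) = \int_A \varphi\,d\nu = 1_A, \qquad A \in \mathcal{B}_\Theta.$$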
Hence, equation (2.1.1) holds. We now need a weaker notion of Radon-Nikodým derivative, defined below for a Banach space valued measure.

Definition 2.2.3. Let $X$ be a Banach space and $\xi \in ca(\mathcal{A}, X)$ an $X$-valued measure. Assume that $\xi \ll \nu$ with a σ-finite measure $\nu$. We say that $\xi$ has a weak Radon-Nikodým derivative $\xi'$ with respect to $\nu$ if $\xi'$ is an $X$-valued function on $\Theta$ and there exists a sequence $\{y_n\}_{n=1}^\infty \subset L^0(\Theta; X)$ of $X$-valued $\mathcal{A}$-simple functions such that
(1) $\|y_n(\theta) - \xi'(\theta)\|_X \to 0$ for $\nu$-a.e. $\theta$;
(2) $\bigl\|\int_A y_n\,d\nu - \xi(A)\bigr\|_X \to 0$ for $A \in \mathcal{A}$.
The sequence $\{y_n\}$ is called a determining sequence for $\xi'$. We shall write $\xi' = d\xi/d\nu$ and
$$\int_A \xi'\,d\nu = \lim_{n\to\infty} \int_A y_n\,d\nu = \xi(A), \qquad A \in \mathcal{A}.$$
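Definition 2.2.3 can be made concrete with Example 2.2.1. The sketch below assumes that vectors of $H$ are stored as finite sparse dictionaries {k: coefficient} relative to $\{e_k\}$, and takes $y_n = \sum_{k\le n}(1/k)1_{\{k\}}e_k$ as a candidate determining sequence (our illustrative choice, not taken from the text). It checks condition (1) pointwise, checks condition (2) exactly for a finite $A \subset \mathbb{N}$, and evaluates the error for $A = \mathbb{N}$ through $\sum_{k>n}1/k^2 = \pi^2/6 - \sum_{k\le n}1/k^2$. It also prints partial sums bounding the variations in Examples 2.2.1 and 2.2.2 from below, which grow without bound. All function names are hypothetical.

```python
import math

def xi(A):
    """xi(A) = sum_{k in A cap N} (1/k) e_k, for a finite set A of positive integers."""
    return {k: 1.0 / k for k in A}

def phi(theta):
    """Candidate derivative phi(theta) = (1/theta) e_theta on N (zero elsewhere)."""
    return {theta: 1.0 / theta} if isinstance(theta, int) and theta >= 1 else {}

def y(n, theta):
    """Simple function y_n = sum_{k <= n} (1/k) 1_{{k}} e_k evaluated at theta."""
    return phi(theta) if isinstance(theta, int) and 1 <= theta <= n else {}

def integral_y(n, A):
    """int_A y_n d nu, with nu the counting measure on N (A finite)."""
    out = {}
    for theta in A:
        for k, v in y(n, theta).items():
            out[k] = out.get(k, 0.0) + v
    return out

def dist(u, v):
    """Norm distance in H between two sparse coefficient dictionaries."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in keys))

def tail_error(n):
    """|| int_N y_n d nu - xi(N) ||_H = sqrt(sum_{k > n} 1/k^2)."""
    head = sum(1.0 / k ** 2 for k in range(1, n + 1))
    return math.sqrt(max(math.pi ** 2 / 6 - head, 0.0))

# Condition (1): y_n(theta) = phi(theta) as soon as n >= theta.
assert all(dist(y(n, 7), phi(7)) == 0.0 for n in range(7, 20))

# Condition (2) for a finite A: the error is the norm of the omitted terms.
A = {2, 7, 50}
print([dist(integral_y(n, A), xi(A)) for n in (1, 10, 100)])

# Condition (2) for A = N: the error decreases to 0.
print([tail_error(n) for n in (10, 100, 10_000)])

# Lower bounds for the variations (both diverge like log n):
# Example 2.2.1: sum_{k<=n} ||xi({k})|| = sum 1/k.
# Example 2.2.2: sum_{k<=n} ||1_{A_k}||_2 = sum sqrt(1/(k(k+1))).
for n in (10, 1_000, 100_000):
    h = sum(1.0 / k for k in range(1, n + 1))
    s = sum(math.sqrt(1.0 / (k * (k + 1))) for k in range(1, n + 1))
    print(n, round(h, 3), round(s, 3))
```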
That $\xi'$ does not depend on the choice of a determining sequence must still be established; this is done after the following lemma.

Lemma 2.2.4. Let $\nu$ be a σ-finite measure on $(\Theta, \mathcal{A})$ and $\eta_n, \eta$ finite measures on $(\Theta, \mathcal{A})$ such that $\eta_n, \eta \ll \nu$ for $n \ge 1$. Let $f_n = d\eta_n/d\nu$ for $n \ge 1$ and $f = d\eta/d\nu$. If $\eta_n(A) \to \eta(A)$ for $A \in \mathcal{A}$ and $f_n \to g$ $\nu$-a.e., then $f = g$ $\nu$-a.e.

Proof.
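It suffices to show that $f_n \to f$ $\nu$-a.e. Suppose the contrary. Then
$$A_0 = \{\theta \in \Theta : f_n(\theta) \not\to f(\theta)\}$$
has positive $\nu$-measure, i.e., $\nu(A_0) > 0$. Since $f_n \to g$ $\nu$-a.e., we can assume that $B_0 = \{\theta \in \Theta : g(\theta) > f(\theta)\}$ satisfies $\nu(B_0) > 0$ (the case $\nu(\{g < f\}) > 0$ is handled in the same way). Observe that
$$B_0 \subseteq \bigcup_{n=1}^\infty \Bigl\{\theta \in \Theta : g(\theta) \ge f(\theta) + \frac{1}{n}\Bigr\} \equiv C_0.$$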
Since $\nu(C_0) > 0$, there is some $n_0 \ge 1$ such that $\nu(D_0) > 0$, where
$$D_0 = \Bigl\{\theta \in \Theta : g(\theta) \ge f(\theta) + \frac{1}{n_0}\Bigr\}.$$
We can assume that $\nu(D_0) < \infty$: if $\nu(D_0) = \infty$, then the σ-finiteness of $\nu$ yields a subset $D_1 \subset D_0$ with $0 < \nu(D_1) < \infty$, and we can work on $D_1$. Now by Egorov's theorem (cf. Dunford and Schwartz [10, pp. 149–150]), for any $\varepsilon$ with $0 < \varepsilon < \nu(D_0)$ there exists $E_0 \subset D_0$ such that $\nu(E_0) < \varepsilon$ and
$$f_n \to g \quad \text{uniformly on } D_0 \setminus E_0 \equiv F_0;$$
note that $\nu(F_0) \ge \nu(D_0) - \varepsilon > 0$. Hence, there exists $k_0 \ge 1$ such that
$$|f_n - g| < \frac{1}{2n_0} \quad \text{for } n \ge k_0 \text{ on } F_0,$$
which, since $g \ge f + 1/n_0$ on $F_0$, implies that
$$f_n \ge f + \frac{1}{2n_0} \quad \text{for } n \ge k_0 \text{ on } F_0.$$
Thus, for $n \ge k_0$,
$$\eta_n(F_0) = \int_{F_0} f_n\,d\nu \ge \int_{F_0} \Bigl(f + \frac{1}{2n_0}\Bigr)d\nu = \eta(F_0) + \frac{\nu(F_0)}{2n_0}.$$
This contradicts $\eta_n(F_0) \to \eta(F_0)$. Therefore, $f_n \to f$ $\nu$-a.e., and $f = g$ $\nu$-a.e.

Corollary 2.2.5. Let $H$ be a separable Hilbert space and $\xi \in ca(\mathcal{A}, H)$ an $H$-valued measure such that $\xi \ll \nu$ with $\nu$ a σ-finite measure. If $\xi$ has a weak Radon-Nikodým derivative $\xi'$ with respect to $\nu$, then $\xi'$ does not depend on the choice of a determining sequence.

Proof. Let $\{y_n\}_{n=1}^\infty \subset L^0(\Theta; H)$ be a determining sequence for $\xi'$, so that $\|y_n - \xi'\|_H \to 0$ $\nu$-a.e. and
$$\Bigl\|\int_A y_n\,d\nu - \xi(A)\Bigr\|_H \to 0 \quad \text{for every } A \in \mathcal{A}.$$
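Let $\{e_k\}_{k=1}^\infty$ be an orthonormal basis of $H$. For any fixed $k \ge 1$, let
$$\eta(A) = (\xi(A), e_k)_H, \qquad \eta_n(A) = \int_A (y_n, e_k)_H\,d\nu, \qquad n \ge 1,\ A \in \mathcal{A}.$$
Also let
$$f = \frac{d\eta}{d\nu}, \qquad g = (\xi', e_k)_H, \qquad f_n = \frac{d\eta_n}{d\nu} = (y_n, e_k)_H, \quad n \ge 1.$$
Then $\eta_n(A) \to \eta(A)$ for every $A \in \mathcal{A}$ and $f_n \to g$ $\nu$-a.e. Hence it follows from Lemma 2.2.4 that $f = g$ $\nu$-a.e., i.e.,
$$\frac{d}{d\nu}(\xi(\cdot), e_k)_H = (\xi', e_k)_H \quad \nu\text{-a.e.}$$
Thus
$$\xi' = \sum_{k=1}^\infty (\xi', e_k)_H\,e_k = \sum_{k=1}^\infty \frac{d}{d\nu}(\xi(\cdot), e_k)_H\,e_k \quad \nu\text{-a.e.},$$
which is independent of $\{y_n\}_{n=1}^\infty$.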
2.3. Existence and uniqueness

In this subsection we assume that $H$ is a separable Hilbert space and obtain a necessary and sufficient condition for an $H$-valued measure to have a weak Radon-Nikodým derivative with respect to a dominating measure.

Definition 2.3.1. Let $\xi \in ca(\mathcal{A}, H)$ be an $H$-valued measure such that $\xi \ll \nu$ with $\nu$ a σ-finite measure. We say that $\xi$ is $\nu$-bounded if there exists an orthonormal basis $\{e_k\}_{k=1}^\infty$ of $H$ such that
$$\sum_{k=1}^\infty |g_k(\theta)|^2 < \infty \quad \nu\text{-a.e. } \theta, \tag{2.3.1}$$
where $g_k = d(\xi(\cdot), e_k)_H/d\nu \in L^1(\Theta, \nu)$ for $k \ge 1$.

We first show that $\nu$-boundedness does not depend on the particular choice of orthonormal basis.

Lemma 2.3.2. Let $\xi \in ca(\mathcal{A}, H)$ be an $H$-valued measure such that $\xi \ll \nu$ with a σ-finite measure $\nu$. Let $\{e_k\}_{k=1}^\infty$ and $\{e'_k\}_{k=1}^\infty$ be two orthonormal bases of $H$. If $\xi$ is $\nu$-bounded with respect to $\{e_k\}_{k=1}^\infty$, then it is $\nu$-bounded with respect to $\{e'_k\}_{k=1}^\infty$, too.
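Proof. Suppose that $\xi$ is $\nu$-bounded relative to $\{e_k\}_{k=1}^\infty$. Then the $H$-valued function
$$\varphi(\theta) = \sum_{k=1}^\infty g_k(\theta) e_k, \qquad \theta \in \Theta,$$
is well defined $\nu$-a.e., where $g_k \equiv d(\xi(\cdot), e_k)_H / d\nu \in L^1(\Theta, \nu)$ for $k \ge 1$. It follows that for $A \in \mathcal{A}$ and $k \ge 1$
$$\int_A (\varphi, e'_k)_H \, d\nu = \Big( \int_A \varphi \, d\nu, e'_k \Big)_H = (\xi(A), e'_k)_H,$$
which implies that
$$g'_k \equiv \frac{d}{d\nu}(\xi(\cdot), e'_k)_H = (\varphi, e'_k)_H.$$
Thus we get
$$\varphi(\theta) = \sum_{k=1}^\infty (\varphi(\theta), e'_k)_H \, e'_k = \sum_{k=1}^\infty g'_k(\theta) e'_k \quad \nu\text{-a.e. } \theta,$$
and hence
$$\|\varphi(\theta)\|_H^2 = \sum_{k=1}^\infty |g_k(\theta)|^2 = \sum_{k=1}^\infty |g'_k(\theta)|^2 < \infty \quad \nu\text{-a.e. } \theta.$$
Therefore, $\xi$ is $\nu$-bounded relative to $\{e'_k\}_{k=1}^\infty$.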
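A finite-dimensional sanity check of the last display may help. The following is a minimal sketch, not part of the original argument: it assumes $H = \mathbb{C}^3$, fixes a single point $\theta$ with hypothetical coefficient values $g_k(\theta)$, and uses an arbitrarily chosen second orthonormal basis. By Parseval's identity the two coefficient sequences of $\varphi(\theta)$ have the same sum of squared moduli, which is why $\nu$-boundedness transfers from one basis to the other. The helper names `inner` and `coefficients` are illustrative only.

```python
import cmath
import math

def inner(u, v):
    """Inner product on C^n, linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def coefficients(vec, basis):
    """Coordinates (vec, e_k) of vec with respect to an orthonormal basis."""
    return [inner(vec, e) for e in basis]

# Standard basis {e_k} of C^3.
e = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# A second orthonormal basis {e'_k}: a rotation mixing e_1, e_2 and a phase
# on e_3 (chosen only for illustration).
c, s = math.cos(0.7), math.sin(0.7)
e_prime = [[c, s, 0], [-s, c, 0], [0, 0, cmath.exp(0.3j)]]

# Hypothetical values g_k(theta) at one point theta; phi(theta) = sum_k g_k e_k.
g = [1 + 2j, -0.5j, 3.0]
phi = [sum(g[k] * e[k][i] for k in range(3)) for i in range(3)]

g_prime = coefficients(phi, e_prime)
sum_g = sum(abs(z) ** 2 for z in g)              # sum_k |g_k(theta)|^2
sum_g_prime = sum(abs(z) ** 2 for z in g_prime)  # sum_k |g'_k(theta)|^2
print(round(sum_g, 10), round(sum_g_prime, 10))
```

Both sums equal $\|\varphi(\theta)\|_H^2$; any other unitary change of basis gives the same result.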
We considered a discrete measure $\xi$ in Example 2.2.1 and showed that $\xi$ is $\nu$-bounded and has a weak Radon-Nikodým derivative with respect to $\nu$. Hence we can expect that any discrete measure has a weak Radon-Nikodým derivative with respect to a suitable dominating measure. This is justified as follows.

Proposition 2.3.3. Let $\xi \in ca(\mathcal{A}, H)$ be an $H$-valued discrete measure such that $\xi \ll \nu$, where $\nu$ is a $\sigma$-finite measure. Then $\xi$ is $\nu$-bounded. Moreover, $\xi$ has a weak Radon-Nikodým derivative with respect to $\nu$.

Proof. Let $\xi \in ca(\mathcal{A}, H)$ be an $H$-valued discrete measure such that $\xi \ll \nu$, so that there is a countable set $B = \{\theta_1, \theta_2, \ldots\}$ such that $\xi(\{\theta_i\}) \ne 0$ for $i \ge 1$ and $\xi(A) = 0$ if $A \cap B = \emptyset$. Since $\xi \ll \nu$ we can let $\nu(\{\theta_i\}) = \alpha_i > 0$ for $i \ge 1$.

Let $\{e_k\}_{k=1}^\infty$ be an orthonormal basis of $H$ and write
$$\xi(\{\theta_i\}) = \sum_{k=1}^\infty \beta_{ik} e_k, \qquad i \ge 1,$$
so that $\sum_{k=1}^\infty |\beta_{ik}|^2 < \infty$ for $i \ge 1$. Then it is easy to see that for $k \ge 1$
$$g_k \equiv \frac{d}{d\nu}(\xi(\cdot), e_k)_H = \sum_{i=1}^\infty \frac{\beta_{ik}}{\alpha_i} 1_{\{\theta_i\}},$$
since
$$g_k(\theta_i)\,\nu(\{\theta_i\}) = \int_{\{\theta_i\}} g_k \, d\nu = (\xi(\{\theta_i\}), e_k)_H, \qquad i \ge 1.$$
Hence we have that
$$\sum_{k=1}^\infty |g_k(\theta)|^2 = \begin{cases} \dfrac{1}{\alpha_i^2} \displaystyle\sum_{k=1}^\infty |\beta_{ik}|^2, & \theta = \theta_i,\ i \ge 1, \\ 0, & \text{otherwise}, \end{cases}$$
and $\xi$ is $\nu$-bounded.

Moreover, $\varphi(\theta) = \sum_{k=1}^\infty g_k(\theta) e_k$ converges in $H$ for $\nu$-a.e. $\theta$ and is a weak Radon-Nikodým derivative of $\xi$ with respect to $\nu$, which is easily verified. In fact, for each $n \ge 1$ let $B_n = \{\theta_1, \ldots, \theta_n\}$ and define an $H$-valued function $y_n$ by
$$y_n(\theta) = \sum_{k=1}^n 1_{B_n}(\theta) g_k(\theta) e_k, \qquad \theta \in \Theta.$$
Then the sequence $\{y_n\}_{n=1}^\infty \subset L^0(\Theta; H)$ is a determining sequence for $\varphi$.
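The densities in the proof of Proposition 2.3.3 can be checked on a toy example. The sketch below assumes three atoms and $H = \mathbb{C}^2$ (a finite truncation of the setting in the text), with hypothetical values for $\alpha_i$ and $\beta_{ik}$. It verifies that $g_k(\theta_i) = \beta_{ik}/\alpha_i$ integrates back to $(\xi(A), e_k)_H$ and that $\sum_k |g_k(\theta_i)|^2 = \|\xi(\{\theta_i\})\|_H^2/\alpha_i^2$.

```python
# Hypothetical finite data: atoms theta_1, theta_2, theta_3 and H = C^2.
alpha = [0.5, 2.0, 0.25]                      # alpha_i = nu({theta_i}) > 0
beta = [[1.0, 1j], [0.0, -2.0], [3.0, 0.5]]   # beta[i][k] = (xi({theta_i}), e_k)_H

def density(k, i):
    """g_k(theta_i) = beta_ik / alpha_i, the density of (xi(.), e_k)_H at theta_i."""
    return beta[i][k] / alpha[i]

def xi_coord(A, k):
    """(xi(A), e_k)_H for a set A of atom indices."""
    return sum(beta[i][k] for i in A)

def integral(A, k):
    """Integral of g_k over A with respect to the discrete measure nu."""
    return sum(density(k, i) * alpha[i] for i in A)

A = {0, 2}
for k in range(2):
    assert abs(integral(A, k) - xi_coord(A, k)) < 1e-12

# Pointwise sum of |g_k|^2 at each atom versus ||xi({theta_i})||^2 / alpha_i^2.
sq_sums = [sum(abs(density(k, i)) ** 2 for k in range(2)) for i in range(3)]
expected = [sum(abs(b) ** 2 for b in beta[i]) / alpha[i] ** 2 for i in range(3)]
```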
Now we can state and prove the main theorem of this subsection, which says that $\nu$-boundedness is a necessary and sufficient condition for the existence of a weak Radon-Nikodým derivative of an $H$-valued measure.

Theorem 2.3.4. Let $\xi \in ca(\mathcal{A}, H)$ be an $H$-valued measure such that $\xi \ll \nu$ with a $\sigma$-finite measure $\nu$. Then $\xi$ has a weak Radon-Nikodým derivative with respect to $\nu$ if and only if $\xi$ is $\nu$-bounded. In this case, the weak Radon-Nikodým derivative is unique in the $\nu$-a.e. sense.
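Proof. Assume that $\xi$ is $\nu$-bounded, so that for some orthonormal basis $\{e_k\}_{k=1}^\infty$ of $H$ the inequality (2.3.1) holds with $g_k = d(\xi(\cdot), e_k)_H / d\nu \in L^1(\Theta, \nu)$ for $k \ge 1$. Let us define a function $\varphi : \Theta \to H$ by
$$\varphi(\theta) = \begin{cases} \displaystyle\sum_{k=1}^\infty g_k(\theta) e_k, & \text{if the series converges in } H, \\ 0, & \text{otherwise}. \end{cases}$$
Since $\xi$ is $\nu$-bounded, the series converges $\nu$-a.e. and $\varphi$ is defined everywhere. For each $k \ge 1$ let $\{y_{kn}\}_{n=1}^\infty \subset L^0(\Theta)$ be a sequence of $\mathbb{C}$-valued $\mathcal{A}$-simple functions on $\Theta$ such that
$$\|y_{kn} - g_k\|_1 \to 0 \quad \text{as } n \to \infty, \tag{2.3.2}$$
$$y_{kn}(\theta) - g_k(\theta) \to 0 \quad \nu\text{-a.e. } \theta \text{ as } n \to \infty, \tag{2.3.3}$$
$$|y_{kn}(\theta) - g_k(\theta)| \le |g_k(\theta)| \quad \nu\text{-a.e. } \theta,\ n \ge 1, \tag{2.3.4}$$
where $\mathbb{C}$ is the set of complex numbers and $\|\cdot\|_1$ is the norm in $L^1(\Theta, \nu)$. Let
$$A_\varphi = \Big\{\theta \in \Theta : \sum_{k=1}^\infty |g_k(\theta)|^2 < \infty\Big\} \cap \bigcap_{k=1}^\infty \{\theta \in \Theta : y_{kn}(\theta) \to g_k(\theta) \text{ as } n \to \infty\} \cap \bigcap_{n,k=1}^\infty \{\theta \in \Theta : |y_{kn}(\theta) - g_k(\theta)| \le |g_k(\theta)|\}.$$
Then $\nu(A_\varphi^c) = 0$ by the $\nu$-boundedness of $\xi$, (2.3.3) and (2.3.4). Also let
$$y_n(\theta) = \sum_{k=1}^n y_{kn}(\theta) e_k, \qquad \theta \in \Theta,\ n \ge 1.$$
We shall show that $\varphi$ is the weak Radon-Nikodým derivative and that the sequence $\{y_n\}_{n=1}^\infty$ is a determining sequence for $\varphi$.

Let $\theta \in A_\varphi$ and $\varepsilon > 0$ be given. There exists an integer $n_0 \ge 1$ such that
$$\Big\| \varphi(\theta) - \sum_{k=1}^n (\varphi(\theta), e_k)_H e_k \Big\|_H < \varepsilon, \qquad n \ge n_0. \tag{2.3.5}$$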
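Since $(\varphi(\theta), e_k)_H = g_k(\theta)$ for $k \ge 1$, it follows that for $n \ge n_0$
$$\Big\| \sum_{k=n+1}^\infty (\varphi(\theta), e_k)_H e_k \Big\|_H^2 = \sum_{k=n+1}^\infty |(\varphi(\theta), e_k)_H|^2 = \sum_{k=n+1}^\infty |g_k(\theta)|^2 < \varepsilon^2. \tag{2.3.6}$$
By (2.3.3) we can choose an integer $n_1 \ge n_0$ such that
$$|y_{kn}(\theta) - g_k(\theta)| \le \frac{\varepsilon}{\sqrt{n_0}}, \qquad n \ge n_1,\ 1 \le k \le n_0. \tag{2.3.7}$$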
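Thus we have for $n \ge n_1$ that
$$\begin{aligned}
\|\varphi(\theta) - y_n(\theta)\|_H
&\le \Big\| \varphi(\theta) - \sum_{k=1}^n g_k(\theta) e_k \Big\|_H + \Big\| \sum_{k=1}^n (g_k(\theta) - y_{kn}(\theta)) e_k \Big\|_H \\
&< \varepsilon + \Big\| \sum_{k=1}^{n_0} (g_k(\theta) - y_{kn}(\theta)) e_k \Big\|_H + \Big\| \sum_{k=n_0+1}^{n} (g_k(\theta) - y_{kn}(\theta)) e_k \Big\|_H && \text{by (2.3.5)} \\
&= \varepsilon + \Big( \sum_{k=1}^{n_0} |g_k(\theta) - y_{kn}(\theta)|^2 \Big)^{1/2} + \Big( \sum_{k=n_0+1}^{n} |g_k(\theta) - y_{kn}(\theta)|^2 \Big)^{1/2} \\
&\le \varepsilon + \Big( n_0 \Big(\frac{\varepsilon}{\sqrt{n_0}}\Big)^2 \Big)^{1/2} + \Big( \sum_{k=n_0+1}^{n} |g_k(\theta)|^2 \Big)^{1/2} && \text{by (2.3.7) and (2.3.4)} \\
&< \varepsilon + \varepsilon + \varepsilon && \text{by (2.3.6)} \\
&= 3\varepsilon.
\end{aligned}$$
Thus $\|\varphi(\theta) - y_n(\theta)\|_H \to 0$ as $n \to \infty$. Since $\nu(A_\varphi^c) = 0$, this shows that $\|y_n - \varphi\|_H \to 0$ as $n \to \infty$ $\nu$-a.e.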
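The pointwise step can be illustrated numerically. This is a minimal sketch under stated assumptions, not the authors' construction: it fixes one point $\theta \in A_\varphi$ with hypothetical real coefficients $g_k(\theta) = 1/k$, and takes $y_{kn}$ to be the dyadic truncation of $g_k$ toward zero, which is one convenient choice satisfying (2.3.3) and (2.3.4). As in (2.3.5) to (2.3.7), the squared error splits into the first $n$ coordinates and the tail $\sum_{k>n} |g_k(\theta)|^2$.

```python
import math

def g(k):
    """Hypothetical coefficients g_k(theta) = 1/k at a fixed theta (square-summable)."""
    return 1.0 / k

def y(k, n):
    """Dyadic truncation toward zero: |y_kn - g_k| <= |g_k| and y_kn -> g_k as n grows."""
    return math.trunc(g(k) * 2 ** n) / 2 ** n

def error(n):
    """||phi(theta) - y_n(theta)||_H with y_n = sum_{k<=n} y_kn e_k.

    The squared error is the sum over the first n coordinates plus the tail
    sum_{k>n} |g_k|^2, which equals pi^2/6 - sum_{k<=n} 1/k^2 for g_k = 1/k.
    """
    head = sum((g(k) - y(k, n)) ** 2 for k in range(1, n + 1))
    tail = math.pi ** 2 / 6 - sum(g(k) ** 2 for k in range(1, n + 1))
    return math.sqrt(head + max(tail, 0.0))

errors = [error(n) for n in (1, 4, 16, 64, 256)]
```

Here the error is dominated by the tail term, mirroring the role of (2.3.6) in the estimate above.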
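The second condition can be shown in a similar manner. Let $A \in \mathcal{A}$ and $\varepsilon > 0$ be given. There exists an integer $n_2 \ge 1$ such that
$$\Big\| \xi(A) - \sum_{k=1}^n (\xi(A), e_k)_H e_k \Big\|_H < \varepsilon, \qquad n \ge n_2. \tag{2.3.8}$$
Hence it follows that for $n \ge n_2$
$$\Big\| \sum_{k=n+1}^\infty (\xi(A), e_k)_H e_k \Big\|_H^2 = \sum_{k=n+1}^\infty |(\xi(A), e_k)_H|^2 = \sum_{k=n+1}^\infty \Big| \int_A g_k \, d\nu \Big|^2 < \varepsilon^2. \tag{2.3.9}$$
Now we claim that the $y_{kn}$'s can be chosen to satisfy
$$\Big| \int_A (g_k - y_{kn}) \, d\nu \Big| \le \Big| \int_A g_k \, d\nu \Big|, \qquad k, n \ge 1,\ A \in \mathcal{A}. \tag{2.3.10}$$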
To see this claim let us write, for $k, n \ge 1$,
$$g_k = (g_{1k} - g_{2k}) + i(g_{3k} - g_{4k}), \qquad g_{jk} \ge 0,\ j = 1, 2, 3, 4,$$
$$y_{kn} = (y_{1kn} - y_{2kn}) + i(y_{3kn} - y_{4kn}), \qquad y_{jkn} \ge 0,\ j = 1, 2, 3, 4.$$
Then we can select the $y_{jkn}$'s so that
$$0 \le y_{jkn} \le g_{jk} \quad \nu\text{-a.e.}, \qquad y_{jkn} \in L^0(\Theta).$$
Consequently,
$$0 \le \int_A (g_{jk} - y_{jkn}) \, d\nu \le \int_A g_{jk} \, d\nu, \qquad j = 1, 2, 3, 4,\ k, n \ge 1,\ A \in \mathcal{A}.$$
This implies the required inequality (2.3.10). Now by (2.3.2) we can choose an integer $n_3 \ge n_2$ such that
$$\|y_{kn} - g_k\|_1 \le \frac{\varepsilon}{\sqrt{n_2}}, \qquad n \ge n_3,\ 1 \le k \le n_2.$$
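Then, with the help of (2.3.8), (2.3.9) and (2.3.10), we have that for $n \ge n_3$
$$\begin{aligned}
\Big\| \xi(A) - \int_A y_n \, d\nu \Big\|_H
&\le \Big\| \xi(A) - \sum_{k=1}^n (\xi(A), e_k)_H e_k \Big\|_H + \Big\| \sum_{k=1}^n (\xi(A), e_k)_H e_k - \int_A \sum_{k=1}^n y_{kn} e_k \, d\nu \Big\|_H \\
&< \varepsilon + \Big\| \sum_{k=1}^n \Big( \int_A (g_k - y_{kn}) \, d\nu \Big) e_k \Big\|_H \\
&\le \varepsilon + \Big\| \sum_{k=1}^{n_2} \Big( \int_A (g_k - y_{kn}) \, d\nu \Big) e_k \Big\|_H + \Big\| \sum_{k=n_2+1}^{n} \Big( \int_A (g_k - y_{kn}) \, d\nu \Big) e_k \Big\|_H \\
&= \varepsilon + \Big( \sum_{k=1}^{n_2} \Big| \int_A (g_k - y_{kn}) \, d\nu \Big|^2 \Big)^{1/2} + \Big( \sum_{k=n_2+1}^{n} \Big| \int_A (g_k - y_{kn}) \, d\nu \Big|^2 \Big)^{1/2} \\
&\le \varepsilon + \Big( \sum_{k=1}^{n_2} \Big( \int_A |g_k - y_{kn}| \, d\nu \Big)^2 \Big)^{1/2} + \Big( \sum_{k=n_2+1}^{n} \Big| \int_A g_k \, d\nu \Big|^2 \Big)^{1/2} \\
&< \varepsilon + \Big( \sum_{k=1}^{n_2} \|g_k - y_{kn}\|_1^2 \Big)^{1/2} + \varepsilon \\
&\le \varepsilon + \Big( n_2 \Big( \frac{\varepsilon}{\sqrt{n_2}} \Big)^2 \Big)^{1/2} + \varepsilon \\
&= 3\varepsilon.
\end{aligned}$$
Thus the second condition in Definition 2.2.3 is verified. Therefore, $\varphi = d\xi/d\nu$.

To show the converse, suppose that $\xi$ has a weak Radon-Nikodým derivative $\xi'$ with respect to $\nu$. Let $\{e_k\}_{k=1}^\infty$ be an orthonormal basis of $H$ and $g_k = d(\xi(\cdot), e_k)_H / d\nu$ for $k \ge 1$. Then we have $g_k = (\xi', e_k)_H$ $\nu$-a.e. for $k \ge 1$ and
$$\xi' = \sum_{k=1}^\infty g_k e_k \quad \nu\text{-a.e.}, \tag{2.3.11}$$
which implies that
$$\infty > \|\xi'(\theta)\|_H^2 = \sum_{k=1}^\infty |g_k(\theta)|^2 \quad \nu\text{-a.e. } \theta.$$
Thus $\xi$ is $\nu$-bounded.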
The uniqueness of a weak Radon-Nikodým derivative follows from (2.3.11).

There are some immediate consequences of the above theorem that will be used later.

Corollary 2.3.5. Let $\{e_k\}_{k=1}^\infty$ be an orthonormal basis of $H$, $\xi \in ca(\mathcal{A}, H)$ be an $H$-valued measure and $\nu$ be a $\sigma$-finite measure such that $\xi \ll \nu$. Assume that $\xi$ has a weak Radon-Nikodým derivative $\xi'$ with respect to $\nu$.
(1) If we write the Radon-Nikodým derivative of $(\xi(\cdot), e_k)_H$ with respect to $\nu$ as $(\xi(\cdot), e_k)'_H$, then the weak Radon-Nikodým derivative $\xi'$ of $\xi$ is expressed as
$$\xi' = \sum_{k=1}^\infty (\xi(\cdot), e_k)'_H \, e_k \quad \nu\text{-a.e.}$$
(2) For any $\phi \in H$ we have
$$(\xi(\cdot), \phi)'_H = (\xi', \phi)_H \quad \nu\text{-a.e.}$$
(3) If $\{\phi, \phi_n\ (n \ge 1)\} \subset H$ and $\|\phi_n - \phi\|_H \to 0$ $(n \to \infty)$, then
$$(\xi(\cdot), \phi_n)'_H \to (\xi', \phi)_H \quad \nu\text{-a.e. and in } L^1(\Theta, \nu).$$
(4) If $a \in B(H)$, then $a\xi \in ca(\mathcal{A}, H)$ and $a\xi \ll \nu$. Moreover, $a\xi$ has a weak Radon-Nikodým derivative with respect to $\nu$ given by
$$(a\xi)' = a\xi' \quad \nu\text{-a.e.}$$
Proof. (1) follows from Theorem 2.3.4.
(2) Let $\phi \in H$. For $n \ge 1$ write
$$z_n = \sum_{k=1}^n (\xi(\cdot), e_k)'_H \, e_k.$$
Then $\|z_n - \xi'\|_H \to 0$ $(n \to \infty)$ $\nu$-a.e. It follows from the bounded convergence theorem that for $A \in \mathcal{A}$
$$\begin{aligned}
(\xi(A), \phi)_H &= \Big( \int_A \xi' \, d\nu, \phi \Big)_H = \Big( \int_A \lim_{n\to\infty} z_n \, d\nu, \phi \Big)_H = \Big( \lim_{n\to\infty} \int_A z_n \, d\nu, \phi \Big)_H \\
&= \lim_{n\to\infty} \Big( \int_A z_n \, d\nu, \phi \Big)_H = \lim_{n\to\infty} \int_A (z_n, \phi)_H \, d\nu = \int_A \lim_{n\to\infty} (z_n, \phi)_H \, d\nu \\
&= \int_A \Big( \lim_{n\to\infty} z_n, \phi \Big)_H \, d\nu = \int_A (\xi', \phi)_H \, d\nu.
\end{aligned}$$
This is enough to show that $(\xi(\cdot), \phi)'_H = (\xi', \phi)_H$ $\nu$-a.e.
(3) follows from (2).
(4) For any $\phi \in H$ we have by (2) above
$$((a\xi)', \phi)_H = (a\xi(\cdot), \phi)'_H = (\xi(\cdot), a^*\phi)'_H = (\xi', a^*\phi)_H = (a\xi', \phi)_H.$$
Thus $(a\xi)' = a\xi'$ $\nu$-a.e.
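Corollary 2.3.5 (3) can be generalized as follows, where the existence of the weak Radon-Nikodým derivative is not assumed.

Lemma 2.3.6. Let $H$ be any Hilbert space, $\xi \in ca(\mathcal{A}, H)$ be an $H$-valued measure and $\xi \ll \nu$ with a $\sigma$-finite measure $\nu$. If $\{\phi, \phi_n\ (n \ge 1)\} \subset H$ and $\|\phi_n - \phi\|_H \to 0$ $(n \to \infty)$, then
$$(\xi(\cdot), \phi_n)'_H \to (\xi(\cdot), \phi)'_H \quad \text{in } L^1(\Theta, \nu).$$
Hence there exists a subsequence $\{\phi_{n_k}\}_{k=1}^\infty \subseteq \{\phi_n\}_{n=1}^\infty$ such that
$$(\xi(\cdot), \phi_{n_k})'_H \to (\xi(\cdot), \phi)'_H \quad \nu\text{-a.e.}$$

Proof. For $n \ge 1$ let
$$\mu_n(A) = (\xi(A), \phi_n - \phi)_H, \quad A \in \mathcal{A}, \qquad g_n = (\xi(\cdot), \phi_n)'_H, \qquad g = (\xi(\cdot), \phi)'_H \in L^1(\Theta, \nu).$$
Then, by Diestel and Uhl [9, p. 2], the semivariation of $\xi$ can be written as
$$\|\xi\|(\Theta) = \sup\{ |(\xi(\cdot), \psi)_H|(\Theta) : \psi \in H,\ \|\psi\|_H \le 1 \}.$$
Hence it follows that
$$\|g_n - g\|_1 = |\mu_n|(\Theta) = |(\xi(\cdot), \phi_n - \phi)_H|(\Theta) \le \|\xi\|(\Theta)\, \|\phi_n - \phi\|_H \to 0,$$
i.e., $g_n \to g$ in $L^1(\Theta, \nu)$.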
The following proposition shows that if an $H$-valued measure $\xi$ has a weak Radon-Nikodým derivative with respect to a $\sigma$-finite measure $\nu$, then we can find a finite measure $\nu_0$ such that $\nu \approx \nu_0$ (i.e., $\nu \ll \nu_0 \ll \nu$) and $\xi$ has a weak Radon-Nikodým derivative with respect to $\nu_0$.

Proposition 2.3.7. Let $\xi \in ca(\mathcal{A}, H)$ be an $H$-valued measure such that $\xi \ll \nu$, where $\nu$ is a $\sigma$-finite measure. If $\xi$ has a weak Radon-Nikodým derivative $\xi'$ with respect to $\nu$, then there exists a finite measure $\nu_0$ such that $\nu \approx \nu_0$ and $\xi$ has a weak Radon-Nikodým derivative $\xi'_0$ with respect to $\nu_0$ given by
$$\xi'_0 = \xi' \frac{d\nu}{d\nu_0} \quad \nu\text{-a.e.} \tag{2.3.12}$$
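Proof. Since $\nu$ is $\sigma$-finite, there is a countable measurable partition $\{B_1, B_2, \ldots\} \subset \mathcal{A}$ of $\Theta$ such that $0 < \nu(B_n) < \infty$ for $n \ge 1$ (members of measure zero can be absorbed into another member). Define $\nu_0$ by
$$\nu_0(A) = \sum_{n=1}^\infty \frac{\nu(A \cap B_n)}{n^2 \nu(B_n)}, \qquad A \in \mathcal{A}.$$
It is clear that $\nu_0$ is a finite measure such that $\nu \approx \nu_0$.

Let $\{e_k\}_{k=1}^\infty$ be an orthonormal basis of $H$. Since $\xi$ has a weak Radon-Nikodým derivative $\xi'$ with respect to $\nu$, it is $\nu$-bounded by Theorem 2.3.4, i.e.,
$$\sum_{k=1}^\infty |g_k(\theta)|^2 < \infty \quad \nu\text{-a.e. } \theta, \tag{2.3.13}$$
where $g_k \equiv d(\xi(\cdot), e_k)_H / d\nu \in L^1(\Theta, \nu)$ for $k \ge 1$. Since $\xi \ll \nu_0 \approx \nu$ we have
$$h_k \equiv \frac{d}{d\nu_0}(\xi(\cdot), e_k)_H = \frac{d}{d\nu}(\xi(\cdot), e_k)_H \, \frac{d\nu}{d\nu_0} = g_k p \in L^1(\Theta, \nu_0), \qquad k \ge 1,$$
where $p \equiv d\nu/d\nu_0$ is nonnegative and finite $\nu_0$-a.e. Then we see that
$$\sum_{k=1}^\infty |h_k(\theta)|^2 = \sum_{k=1}^\infty |g_k(\theta)|^2 |p(\theta)|^2 < \infty \quad \nu_0\text{-a.e. } \theta$$
by (2.3.13) and $|p| < \infty$ $\nu_0$-a.e. Hence $\xi$ is $\nu_0$-bounded and has a weak Radon-Nikodým derivative $\xi'_0$ with respect to $\nu_0$ that is given, using (2.3.11), by
$$\xi'_0 = \frac{d\xi}{d\nu_0} = \sum_{k=1}^\infty h_k e_k = \sum_{k=1}^\infty g_k p \, e_k = \xi' \frac{d\nu}{d\nu_0} \quad \nu_0\text{-a.e.},$$
which is (2.3.12).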
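The construction of $\nu_0$ in the proof above can be tried out on a truncated example. The sketch below is illustrative only: it assumes $N = 1000$ pieces with hypothetical masses $\nu(B_n) = n$ (all positive, as the division by $\nu(B_n)$ requires), so that $\nu(\Theta)$ grows without bound as $N$ increases while $\nu_0(\Theta) = \sum_n 1/n^2$ stays below $\pi^2/6$. It also checks that the density $p = d\nu/d\nu_0$, equal to $n^2 \nu(B_n)$ on $B_n$, recovers $\nu(A)$ when integrated against $\nu_0$.

```python
import math

# Hypothetical sigma-finite nu: pieces B_1, ..., B_N with 0 < nu(B_n) < infinity.
N = 1000
nu_B = {n: float(n) for n in range(1, N + 1)}   # nu(B_n) = n

def nu0(nu_A_cap_B):
    """nu_0(A) = sum_n nu(A ∩ B_n) / (n^2 nu(B_n)); input maps n -> nu(A ∩ B_n)."""
    return sum(a / (n ** 2 * nu_B[n]) for n, a in nu_A_cap_B.items())

def p(n):
    """The density d(nu)/d(nu_0), constant on B_n and equal to n^2 nu(B_n)."""
    return n ** 2 * nu_B[n]

total_nu0 = nu0(nu_B)                        # nu_0(Theta) = sum_{n<=N} 1/n^2
A = {n: 0.5 * nu_B[n] for n in range(1, 11)} # A takes half of each of B_1..B_10
nu_A = sum(A.values())
recovered = sum(p(n) * nu0({n: A[n]}) for n in A)  # integral of p over A w.r.t. nu_0
```

The weights $1/n^2$ are one convenient choice; any summable positive weights would make $\nu_0$ finite in the same way.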
2.4. Orthogonally scattered measures and dilation

In the previous subsection we saw that $\nu$-boundedness is equivalent to the existence of a weak Radon-Nikodým derivative for an $H$-valued measure. In this subsection we take another view of the existence question. Let $H$ be any Hilbert space (not necessarily separable) and consider $H$-valued measures. A special class of Hilbert space valued measures is that of orthogonally scattered measures; an extensive study of such measures is given by Masani [21]. An important theorem is that any Hilbert space valued measure has an orthogonally scattered dilation. Using this fact we can give fairly simple sufficient conditions for the existence of the weak Radon-Nikodým derivative, which is our object in this subsection.

Definition 2.4.1. An $H$-valued measure $\eta \in ca(\mathcal{A}, H)$ is said to be orthogonally scattered if $(\eta(A), \eta(B))_H = 0$ for every disjoint $A, B \in \mathcal{A}$. Let $ca_{os}(\mathcal{A}, H)$ denote the set of all $H$-valued orthogonally scattered measures on $\mathcal{A}$.

As is easily seen, the measures $\xi$ given in Examples 2.2.1 and 2.2.2 are orthogonally scattered. Now let $\eta \in ca_{os}(\mathcal{A}, H)$ be an $H$-valued orthogonally scattered measure. Define $\nu_\eta$ by
$$\nu_\eta(A) = \|\eta(A)\|_H^2, \qquad A \in \mathcal{A}. \tag{2.4.1}$$
Then $\nu_\eta$ is a finite measure such that $\eta \ll \nu_\eta$. For an $H$-valued measure $\xi \in ca(\mathcal{A}, H)$ let us denote by
$$S_\xi = S\{\xi(A) : A \in \mathcal{A}\} \tag{2.4.2}$$
the closed subspace of $H$ generated by the set $\{\xi(A) : A \in \mathcal{A}\}$. Then the existence of a weak Radon-Nikodým derivative of an orthogonally scattered measure can be phrased as follows.

Theorem 2.4.2. Let $\eta \in ca_{os}(\mathcal{A}, H)$ be an $H$-valued orthogonally scattered measure and $\nu_\eta$ be given by (2.4.1). If the closed subspace $S_\eta$ is separable, then $\eta$ has a weak Radon-Nikodým derivative with respect to $\nu_\eta$.

Proof. Since the space $S_\eta$ defined by (2.4.2) is separable, we can choose a countable family $\mathcal{A}_1 = \{A_1, A_2, \ldots\} \subseteq \mathcal{A}$ such that $\eta(A_i) \ne 0$ for $i \ge 1$ and $S_\eta = S\{\eta(A_i) : i \ge 1\}$.
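Define $\mathcal{A}_2 = \{A \cap B, A \setminus B, B \setminus A : A, B \in \mathcal{A}_1\}$, $\mathcal{A}_3 = \{A \cap B, A \setminus B, B \setminus A : A, B \in \mathcal{A}_1 \cup \mathcal{A}_2\}$ and, in general, for $k \ge 4$ define
$$\mathcal{A}_k = \{A \cap B, A \setminus B, B \setminus A : A, B \in \mathcal{A}_1 \cup \mathcal{A}_2 \cup \cdots \cup \mathcal{A}_{k-1}\}.$$
Let $\mathcal{B} = \bigcup_{i=1}^\infty \mathcal{A}_i$, which is countable. Then choose a subfamily $\mathcal{B}_0 = \{B_1, B_2, \ldots\} \subseteq \mathcal{B}$ such that $B_i \cap B_j = \emptyset$ for $i \ne j$ and each $A_i$ can be written as a union of some $B_j$'s, so that
$$S_\eta = S\{\eta(A_i) : i \ge 1\} \subseteq S\{\eta(B_i) : i \ge 1\} \subseteq S_\eta$$
since $\eta$ is orthogonally scattered, i.e., $S\{\eta(B_i) : i \ge 1\} = S_\eta$. Collect the nonempty $B_i$'s such that $\eta(B_i) \ne 0$ and write them as $\mathcal{C} = \{C_1, C_2, \ldots\}$, so that $C_i \cap C_j = \emptyset$ for $i \ne j$. It follows that $S_\eta = S\{\eta(C_i) : i \ge 1\}$. Moreover, the set $\{\eta(C_k)/\|\eta(C_k)\|_H\}_{k=1}^\infty$ forms an orthonormal basis for $S_\eta$ since $\eta$ is orthogonally scattered and $\eta(C_i) \ne 0$ for $i \ge 1$. Letting
$$e_k = \frac{\eta(C_k)}{\|\eta(C_k)\|_H}, \qquad k \ge 1,$$
we set
$$g_k = \frac{d}{d\nu_\eta}(\eta(\cdot), e_k)_H, \qquad k \ge 1,$$
the Radon-Nikodým derivative of the scalar measure $(\eta(\cdot), e_k)_H$ with respect to $\nu_\eta$. Observe that
$$(\eta(\cdot), e_k)_H = \Big( \eta(\cdot), \frac{\eta(C_k)}{\|\eta(C_k)\|_H} \Big)_H = \frac{1}{\|\eta(C_k)\|_H}\, \nu_\eta(\cdot \cap C_k).$$
Hence we have that
$$g_k = \frac{1_{C_k}}{\|\eta(C_k)\|_H} \quad \nu_\eta\text{-a.e.},$$
where $1_{C_k}$ is the indicator function of $C_k$. Consequently, it follows that
$$\sum_{k=1}^\infty |g_k(\theta)|^2 = \sum_{k=1}^\infty \frac{1_{C_k}(\theta)}{\nu_\eta(C_k)} < \infty \quad \nu_\eta\text{-a.e. } \theta.$$
Therefore, by Theorem 2.3.4, $\eta$ has a weak Radon-Nikodým derivative with respect to $\nu_\eta$.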
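A toy orthogonally scattered measure makes the computation of $g_k$ concrete. The sketch assumes four atoms whose hypothetical values $\eta(\{\theta_i\})$ are pairwise orthogonal vectors in $\mathbb{R}^4$, and takes $C_k = \{\theta_k\}$. It checks that $\nu_\eta(A) = \|\eta(A)\|^2$ is additive on disjoint sets (Pythagoras) and that $(\eta(A), e_k) = \int_A g_k \, d\nu_\eta$ with $g_k = 1_{C_k}/\|\eta(C_k)\|$, as in the proof.

```python
import itertools
import math

# Hypothetical orthogonally scattered eta on four atoms with values in R^4:
# the atom values are pairwise orthogonal, so (eta(A), eta(B)) = 0 for disjoint A, B.
eta_atom = [[2.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 0.5, 0], [0, 0, 0, 3.0]]

def eta(A):
    return [sum(eta_atom[i][j] for i in A) for j in range(4)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def nu_eta(A):
    """nu_eta(A) = ||eta(A)||^2, as in (2.4.1)."""
    v = eta(A)
    return dot(v, v)

atoms = range(4)
subsets = [set(c) for r in range(5) for c in itertools.combinations(atoms, r)]

# Additivity of nu_eta on disjoint pairs.
for A in subsets:
    for B in subsets:
        if not A & B:
            assert abs(nu_eta(A | B) - nu_eta(A) - nu_eta(B)) < 1e-12

# e_k = eta(C_k)/||eta(C_k)|| with C_k = {theta_k}; g_k = 1_{C_k}/||eta(C_k)||.
norms = [math.sqrt(nu_eta({k})) for k in atoms]
e = [[x / norms[k] for x in eta({k})] for k in atoms]

def integral_g(k, A):
    """Integral of g_k over A with respect to nu_eta: nu_eta(A ∩ C_k)/||eta(C_k)||."""
    return nu_eta(A & {k}) / norms[k]

for A in subsets:
    for k in atoms:
        assert abs(dot(eta(A), e[k]) - integral_g(k, A)) < 1e-12
```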
An immediate consequence of the above is the following.

Corollary 2.4.3. Let $\eta \in ca_{os}(\mathcal{A}, H)$ be an $H$-valued orthogonally scattered measure. If $H$ is separable or the $\sigma$-algebra $\mathcal{A}$ has a countable generator, then $\eta$ has a weak Radon-Nikodým derivative with respect to $\nu_\eta$.

As mentioned earlier, any Hilbert space valued measure has an orthogonally scattered dilation, whose definition is given below. This was first noted by Abreu [1] in the study of the stationary dilation of a strongly harmonizable process. Niemi [28] proved it in a more general setting where the measurable space is a topological space. Finally, Chatterji [7] showed it for any measurable space, implying that the existence of an orthogonally scattered dilation is an algebraic property. The definition is as follows.

Definition 2.4.4. An orthogonally scattered dilation of an $H$-valued measure $\xi \in ca(\mathcal{A}, H)$ is a triple $\{\eta, K, J\}$ such that
(1) $K$ is a Hilbert space containing $S_\xi$ as a closed subspace;
(2) $\eta$ is a $K$-valued orthogonally scattered measure on $\mathcal{A}$ such that $\xi = J\eta$, where $J : K \to S_\xi$ is the orthogonal projection.

Note that the Hilbert space $K$ need not contain the whole Hilbert space $H$. We can take $K$ to be $S_\eta$; in this case $\{\eta, S_\eta, J\}$ is called a minimal orthogonally scattered dilation of $\xi$. A measure $\xi \in ca(\mathcal{A}, H)$ is said to have a separable orthogonally scattered dilation $\{\eta, K, J\}$ if $K$ is separable. Using orthogonally scattered dilations we can state some sufficient conditions, which follow from Theorem 2.4.2, for a Hilbert space valued measure to have a weak Radon-Nikodým derivative with respect to a suitable dominating measure.

Corollary 2.4.5. A Hilbert space valued measure has a weak Radon-Nikodým derivative with respect to a suitable dominating measure if it has a separable orthogonally scattered dilation.

Proof. Let $\xi \in ca(\mathcal{A}, H)$ have a separable orthogonally scattered dilation $\{\eta, K, J\}$, where $K$ is a separable Hilbert space and $\eta \in ca_{os}(\mathcal{A}, K)$. Then $\eta$ has a weak Radon-Nikodým derivative $\eta'$ with respect to the finite measure $\nu_\eta$. Since $\xi = J\eta$, it follows from Corollary 2.3.5 (4) that the weak Radon-Nikodým derivative $\xi' = d\xi/d\nu_\eta$ of $\xi$ with respect to $\nu_\eta$ exists and is given by
$$\xi' = J\eta'.$$
Thus the conclusion follows.
Corollary 2.4.6. If the $\sigma$-algebra $\mathcal{A}$ has a countable generator, then every Hilbert space valued measure on $(\Theta, \mathcal{A})$ has a weak Radon-Nikodým derivative with respect to a suitable dominating measure.

Proof. Since $\mathcal{A}$ has a countable generator, every Hilbert space valued measure on $\mathcal{A}$ has a separable range. Hence the conclusion follows from Corollary 2.4.5.

2.5. Dunford-Schwartz type integration

We define an integral of a scalar function with respect to a Hilbert space valued measure with a weak Radon-Nikodým derivative. Then we study the relationship between this type of integral and the Dunford-Schwartz integral. As before, we assume that $H$ is a separable Hilbert space.

Definition 2.5.1. Let $\xi \in ca(\mathcal{A}, H)$ be an $H$-valued measure such that $\xi \ll \nu$, where $\nu$ is a $\sigma$-finite measure on $\mathcal{A}$. Assume that $\xi$ has a weak Radon-Nikodým derivative $\xi'$ with respect to $\nu$. A scalar valued measurable function $f$ on $\Theta$ is said to be $\xi$-integrable if there exists a sequence $\{f_n\}_{n=1}^\infty \subset L^0(\Theta)$ of scalar valued $\mathcal{A}$-simple functions on $\Theta$ such that
(1) $\|f_n \xi' - f \xi'\|_H \to 0$ $\nu$-a.e. as $n \to \infty$;
(2) for every $A \in \mathcal{A}$ the sequence $\{\int_A f_n \, d\xi\}_{n=1}^\infty$ is a Cauchy sequence in $H$, where the integral $\int_A f_n \, d\xi$ is defined in the obvious way.
In this case we write
$$\int_A f \, d\xi = \int_A f \xi' \, d\nu = \lim_{n\to\infty} \int_A f_n \, d\xi, \qquad A \in \mathcal{A}, \tag{2.5.1}$$
which is called the integral of $f$ with respect to $\xi$ over $A$. Let $L^1(\xi)$ denote the set of all scalar valued functions on $\Theta$ that are $\xi$-integrable.
A scalar valued function $f$ is said to be $\xi$-integrable in the DS-sense if there exists a sequence $\{f_n\}_{n=1}^\infty \subset L^0(\Theta)$ of scalar valued $\mathcal{A}$-simple functions such that (1') below and (2) above are true:
(1') $f_n \to f$ $\xi$-a.e., i.e., $|\xi|(\{\theta \in \Theta : f_n(\theta) \not\to f(\theta)\}) = 0$.
In this case, the integral of $f$ with respect to $\xi$ over $A \in \mathcal{A}$ is defined by
$$\int_A f \, d\xi = \lim_{n\to\infty} \int_A f_n \, d\xi. \tag{2.5.2}$$
Let $L^1_{DS}(\xi)$ denote the set of all scalar valued functions on $\Theta$ that are $\xi$-integrable in the DS-sense. In either case, the sequence $\{f_n\}_{n=1}^\infty$ is called a determining sequence for $f$.

Note that, in (1') of the above definition, the variation $|\xi|(\cdot)$ can be replaced by the semivariation $\|\xi\|(\cdot)$. The integral (2.5.2) does not depend on the choice of a determining sequence $\{f_n\}_{n=1}^\infty$, as is shown in Dunford and Schwartz [10, p. 323]. Thus we need to show the same thing for our integral (2.5.1). This will be done after a lemma and its corollary are proved.

Lemma 2.5.2. Let $\xi \in ca(\mathcal{A}, H)$ be an $H$-valued measure such that $\xi \ll \nu_0 \ll \nu$, where $\nu$ and $\nu_0$ are $\sigma$-finite measures. Suppose that $\xi$ has a weak Radon-Nikodým derivative $\xi'$ with respect to $\nu$ and a scalar valued function $f$ is $\xi$-integrable relative to $\xi'$. Then $\xi$ has a weak Radon-Nikodým derivative $\xi'_0$ with respect to $\nu_0$ and $f$ is $\xi$-integrable relative to $\xi'_0$. Moreover, the integrals (2.5.1) of $f$ using $\xi'$ and $\xi'_0$ coincide for each determining sequence.

Proof.
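Let $\{e_k\}_{k=1}^\infty$ be an orthonormal basis of $H$ and let
$$q \equiv \frac{d\nu_0}{d\nu}, \qquad g_k \equiv \frac{d}{d\nu}(\xi(\cdot), e_k)_H, \qquad h_k \equiv \frac{d}{d\nu_0}(\xi(\cdot), e_k)_H, \qquad k \ge 1.$$
Then, since $\xi'$ exists and $\nu_0 \ll \nu$, we see that
$$\infty > \sum_{k=1}^\infty |g_k(\theta)|^2 = \sum_{k=1}^\infty |h_k(\theta)|^2 |q(\theta)|^2 \quad \nu\text{-a.e. } \theta,\ \nu_0\text{-a.e. } \theta.$$
Let $B = \{\theta \in \Theta : q(\theta) = 0\}$. Then $\nu_0(B) = \int_B q \, d\nu = 0$. This implies that $|\xi|(B) = 0$, $|(\xi(\cdot), e_k)_H|(B) = 0$ and $h_k = 0$ on $B$ for $k \ge 1$. Hence
$$\sum_{k=1}^\infty |h_k|^2 < \infty \quad \nu_0\text{-a.e.},$$
since obviously $\sum_{k=1}^\infty |h_k|^2 < \infty$ $\nu_0$-a.e. on $B^c$. Thus the weak Radon-Nikodým derivative $\xi'_0$ of $\xi$ with respect to $\nu_0$ exists.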
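Let $\{f_n\}_{n=1}^\infty \subset L^0(\Theta)$ be a determining sequence for $f$ and let
$$C = \{\theta \in \Theta : \|f_n \xi'_0 - f \xi'_0\|_H \not\to 0\},$$
$$D = \{\theta \in \Theta : \|f_n \xi' - f \xi'\|_H \not\to 0\} = \{\theta \in \Theta : \|f_n \xi'_0 q - f \xi'_0 q\|_H \not\to 0\}.$$
By condition (1) of Definition 2.5.1 and $\nu_0 \ll \nu$ we have $\nu(D) = 0$ and $\nu_0(D) = 0$. If $\theta \in C \cap D^c$, then $q(\theta) = 0$. Hence
$$\nu_0(C \cap D^c) = \int_{C \cap D^c} q \, d\nu = 0.$$
Thus $\nu_0(C) = 0$, i.e., $\|f_n \xi'_0 - f \xi'_0\|_H \to 0$ $\nu_0$-a.e. It is obvious that the integrals (2.5.1) of $f$ using $\xi'$ and $\xi'_0$ coincide, since the integrals $\int_A f_n \, d\xi$, $n \ge 1$, are independent of $\xi'$ and $\xi'_0$ for each $A \in \mathcal{A}$.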
Corollary 2.5.3. Let $\xi \in ca(\mathcal{A}, H)$ be an $H$-valued measure such that $\xi \ll \nu$, where $\nu$ is a $\sigma$-finite measure, and assume that $\xi$ has a weak Radon-Nikodým derivative $\xi'$ with respect to $\nu$. Then there exists a finite measure $\nu_0$ such that
$$\nu_0(A) \le \|\xi\|(A), \qquad A \in \mathcal{A}, \tag{2.5.3}$$
$\xi \ll \nu_0 \ll \nu$, and the weak Radon-Nikodým derivative $\xi'_0$ of $\xi$ with respect to $\nu_0$ exists.

Proof. The existence of a finite measure $\nu_0$ satisfying (2.5.3) and $\xi \ll \nu_0$ follows from Dunford and Schwartz [10, p. 321]. We show that $\nu_0 \ll \nu$. In fact, the argument in (2.1.2) implies that if $\nu(A) = 0$, then $|\xi|(A) = 0$ and $\|\xi\|(A) = 0$, so that $\nu_0(A) = 0$ by (2.5.3). The rest of the assertion follows from Lemma 2.5.2.

For a given $H$-valued measure $\xi \in ca(\mathcal{A}, H)$ there always exists a finite dominating measure $\nu$ for $\xi$. In fact, if we let $\{\eta, K, J\}$ be its orthogonally scattered dilation, then $\nu_\eta$ is finite and $\xi \ll \nu_\eta$. Moreover, it follows from Corollary 2.5.3 that there is some finite measure $\nu$ with the properties $\xi \ll \nu$ and
$$\nu(A) \le \|\xi\|(A), \qquad A \in \mathcal{A}. \tag{2.5.4}$$
With these preparations we can prove the following.

Proposition 2.5.4. Let $\xi \in ca(\mathcal{A}, H)$ be an $H$-valued measure that has a weak Radon-Nikodým derivative $\xi' = d\xi/d\nu$ with respect to a finite measure
$\nu$ such that (2.5.4) holds, and let $f \in L^1(\xi)$ be $\xi$-integrable. Then the integral (2.5.1) is independent of the choice of a determining sequence.

Proof. Let $\{f_n\}_{n=1}^\infty, \{\tilde f_n\}_{n=1}^\infty \subset L^0(\Theta)$ be two determining sequences for $f$. For $n \ge 1$ and $\theta \in \Theta$ define $\varphi_n(\theta)$ by
$$\varphi_n(\theta) = \begin{cases} 0, & \text{if } f_n(\theta)\xi'(\theta) \not\to f(\theta)\xi'(\theta) \text{ or } \tilde f_n(\theta)\xi'(\theta) \not\to f(\theta)\xi'(\theta), \\ f_n(\theta)\xi'(\theta) - \tilde f_n(\theta)\xi'(\theta), & \text{otherwise}. \end{cases}$$
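Then we see that $\|\varphi_n(\theta)\|_H \to 0$ for every $\theta \in \Theta$ and the sequence $\{\int_A \varphi_n \, d\nu\}_{n=1}^\infty$ is convergent in $H$ for every $A \in \mathcal{A}$. Moreover,
$$\lim_{\nu(A)\to 0} \int_A (f_n - \tilde f_n) \, d\xi = 0, \qquad n \ge 1,$$
or
$$\lim_{\nu(A)\to 0} \int_A \varphi_n \, d\nu = 0, \qquad n \ge 1. \tag{2.5.5}$$
Since the sequence $\{\int_A \varphi_n \, d\nu\}_{n=1}^\infty$ converges for every $A \in \mathcal{A}$, by the Vitali-Hahn-Saks theorem (cf. Diestel and Uhl [9, pp. 24–25]) the limit in (2.5.5) is uniform in $n$. Hence for any $\varepsilon > 0$ there exists a $\delta > 0$ such that
$$\nu(A) < \delta \implies \Big\| \int_A \varphi_n \, d\nu \Big\|_H < \varepsilon, \qquad n \ge 1.$$
By Egoroff's theorem there exists an $A_0 \in \mathcal{A}$ such that $\nu(A_0) < \delta$ and $\varphi_n \to 0$ uniformly on $A_0^c$. Once $\varepsilon > 0$ and $\delta > 0$ are chosen, there exists a positive integer $N$ such that for any $n \ge N$, $\|\varphi_n(\theta)\|_H < \varepsilon$ for every $\theta \in A_0^c$ and
$$\Big\| \int_A \varphi_n \, d\nu \Big\|_H \le \Big\| \int_{A \cap A_0^c} \varphi_n \, d\nu \Big\|_H + \Big\| \int_{A \cap A_0} \varphi_n \, d\nu \Big\|_H \le \varepsilon\, \nu(\Theta) + \varepsilon$$
uniformly for $A \in \mathcal{A}$. Thus
$$\lim_{n\to\infty} \Big\| \int_A \varphi_n \, d\nu \Big\|_H = 0$$
for every $A \in \mathcal{A}$. Therefore the integral $\int_A f \, d\xi$ is independent of the choice of a determining sequence, so that it is well defined for every $A \in \mathcal{A}$.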
One final question is: what is the relationship between the two integrabilities given in Definition 2.5.1? Of course, integrability in the DS-sense is more general, but the two coincide in the present case, as shown below.

Proposition 2.5.5. Let $\xi \in ca(\mathcal{A}, H)$ be an $H$-valued measure such that $\xi \ll \nu$, where $\nu$ is a finite measure satisfying (2.5.4). If $\xi$ has a weak Radon-Nikodým derivative $\xi' = d\xi/d\nu$, then $\xi$-integrability as given above and $\xi$-integrability in the DS-sense are equivalent. Hence it follows that $L^1(\xi) = L^1_{DS}(\xi)$.

Proof. By the assumption, $\nu$-a.e. is equivalent to $\xi$-a.e. Thus (1') in Definition 2.5.1 implies (1) in Definition 2.5.1. Suppose that (1) holds and let
$$A = \{\theta \in \Theta : f_n(\theta)\xi'(\theta) \not\to f(\theta)\xi'(\theta)\}, \qquad B = \{\theta \in \Theta : f_n(\theta) \not\to f(\theta)\}.$$
Since $\|f_n \xi' - f \xi'\|_H \to 0$ $\nu$-a.e., we have $\nu(A) = 0$. If $\theta \in B \cap A^c$, then
$$\|f_n(\theta)\xi'(\theta) - f(\theta)\xi'(\theta)\|_H = |f_n(\theta) - f(\theta)|\,\|\xi'(\theta)\|_H \to 0.$$
Hence $\xi' = 0$ on $B \cap A^c$, so that $\|\xi\|(B) \le \|\xi\|(B \cap A) + \|\xi\|(B \cap A^c) = 0$. Thus $\nu(B) = 0$ and $f_n \to f$ $\nu$-a.e. Therefore (1) implies (1') and the two notions of integrability coincide.

Let $\xi \in ca(\mathcal{A}, H)$ be an $H$-valued measure that has a weak Radon-Nikodým derivative $\xi'$ with respect to a suitable dominating measure $\nu$, so that the space $L^1(\xi)$ is formed. For a $\xi$-integrable function $f \in L^1(\xi)$ define an $H$-valued measure $\xi_f$ and the norm of $f$ respectively by
$$\xi_f(A) = \int_A f \, d\xi, \quad A \in \mathcal{A}, \qquad \text{and} \qquad \|f\|_\xi = \|\xi_f\|(\Theta). \tag{2.5.6}$$
It is shown by Abreu and Salehi [2] that the space $L^1_{DS}(\xi)$ is a Banach space for any Banach space valued measure $\xi \in ca(\mathcal{A}, X)$ under the norm $\|\cdot\|_\xi$ corresponding to the one given by (2.5.6). In the present case, the space $L^1(\xi)$ is a Banach space with the norm $\|\cdot\|_\xi$. In Subsection 3.4 we shall show a result of this kind also for Hilbert-Schmidt class operator valued measures.
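For simple functions, Definition 2.5.1 says only that $\int_A f_n \, d\xi$ "is defined in the obvious way". One natural reading, assumed in this sketch, is $\int_A \sum_j c_j 1_{E_j} \, d\xi = \sum_j c_j \, \xi(A \cap E_j)$. The code checks, on a hypothetical discrete measure with values in $\mathbb{C}^2$, that this agrees with the density form $\int_A f \xi' \, d\nu$ of (2.5.1), where $\xi'(\theta_i) = \xi(\{\theta_i\})/\alpha_i$ as in Proposition 2.3.3.

```python
# Hypothetical discrete data: atoms 0, 1, 2 and H = C^2.
alpha = [0.5, 2.0, 0.25]                        # nu({theta_i})
xi_atom = [[1.0, 1j], [0.0, -2.0], [3.0, 0.5]]  # xi({theta_i})

def xi(A):
    return [sum(xi_atom[i][j] for i in A) for j in range(2)]

def simple_integral(pieces, A):
    """Integral over A of f = sum_j c_j 1_{E_j} against xi: sum_j c_j xi(A ∩ E_j)."""
    total = [0j, 0j]
    for c, E in pieces:
        v = xi(A & E)
        total = [t + c * x for t, x in zip(total, v)]
    return total

def density_integral(f, A):
    """Integral over A of f xi' d(nu), with xi'(theta_i) = xi({theta_i}) / alpha_i."""
    total = [0j, 0j]
    for i in A:
        d = [x / alpha[i] for x in xi_atom[i]]
        total = [t + f(i) * x * alpha[i] for t, x in zip(total, d)]
    return total

pieces = [(2.0, {0, 1}), (-1j, {2})]   # f = 2 on {theta_0, theta_1}, f = -i on {theta_2}

def f(i):
    return sum(c for c, E in pieces if i in E)

A = {0, 2}
lhs = simple_integral(pieces, A)
rhs = density_integral(f, A)
```

For a general $f \in L^1(\xi)$ the integral is then the limit over a determining sequence, which Proposition 2.5.4 shows to be well defined.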
2.6. Bimeasure integration

Scalar valued bimeasure integration theory was initiated by Morse and Transue in a series of papers [24, 25, 26, 27]. Subsequently, several authors such as Thomas [37] developed the theory. Chang and Rao [4, 5] and Rao [31, 32, 34] applied bimeasure theory to harmonizable and other second order stochastic processes. In this subsection we briefly give basic definitions and results concerning bimeasure integration theory along the lines of [4, 5].

Definition 2.6.1. Let $\mathcal{A} \times \mathcal{A} = \{A \times B : A, B \in \mathcal{A}\}$ and consider a mapping $m : \mathcal{A} \times \mathcal{A} \to \mathbb{C}$. Then $m$ is said to be a scalar bimeasure if
(1) $m(\cdot, A)$ and $m(A, \cdot)$ are scalar measures for each $A \in \mathcal{A}$;
(2) for each $n \ge 1$, $\alpha_1, \ldots, \alpha_n \in \mathbb{C}$ and $A_1, \ldots, A_n \in \mathcal{A}$ we have that
$$\sum_{i,j=1}^n \alpha_i \overline{\alpha_j}\, m(A_i, A_j) \ge 0.$$
Let $\mathcal{M} = \mathcal{M}(\mathcal{A} \times \mathcal{A})$ be the set of all scalar bimeasures on $\mathcal{A} \times \mathcal{A}$. For a bimeasure $m \in \mathcal{M}$ the variation and semivariation are defined respectively by
$$|m|(A, B) = \sup\Big\{ \sum_{\Delta \in \pi,\, \Delta' \in \pi'} |m(\Delta, \Delta')| : \pi \in \Pi(A),\ \pi' \in \Pi(B) \Big\},$$
$$\|m\|(A, B) = \sup\Big\{ \Big| \sum_{\Delta \in \pi,\, \Delta' \in \pi'} \alpha_\Delta \overline{\alpha_{\Delta'}}\, m(\Delta, \Delta') \Big| : |\alpha_\Delta|, |\alpha_{\Delta'}| \le 1,\ \Delta \in \pi \in \Pi(A),\ \Delta' \in \pi' \in \Pi(B) \Big\},$$
for $A, B \in \mathcal{A}$. Let $\mathcal{M}_v = \mathcal{M}_v(\mathcal{A} \times \mathcal{A})$ denote the set of all scalar bimeasures of bounded variation. We assume the positive definiteness condition (2) in the above definition because a typical example of a bimeasure is the one induced by a Hilbert space valued measure. That is, for an $H$-valued measure $\xi \in ca(\mathcal{A}, H)$ define $m_\xi$ by
$$m_\xi(A, B) = (\xi(A), \xi(B))_H, \qquad A, B \in \mathcal{A}.$$
It is easy to see that $m_\xi$ is a bimeasure. The integration of scalar functions with respect to a bimeasure is defined as follows.

Definition 2.6.2. Let $m \in \mathcal{M}$ be a scalar bimeasure on $\mathcal{A} \times \mathcal{A}$ and $f, g$ be scalar valued $\mathcal{A}$-measurable functions on $\Theta$.
(1) The pair $(f, g)$ is said to be $m$-integrable if the following three conditions hold:
(a) $f$ is $m(\cdot, B)$-integrable for $B \in \mathcal{A}$ and $g$ is $m(A, \cdot)$-integrable for $A \in \mathcal{A}$;
(b) $m_1(\cdot) \equiv \int_\Theta g(\lambda)\, m(\cdot, d\lambda)$ and $m_2(\cdot) \equiv \int_\Theta f(\theta)\, m(d\theta, \cdot)$ belong to $ca(\mathcal{A}, \mathbb{C})$;
(c) $f$ is $m_1$-integrable, $g$ is $m_2$-integrable, and the following equality holds:
$$\int_\Theta f(\theta)\, m_1(d\theta) = \int_\Theta g(\lambda)\, m_2(d\lambda).$$
Let $L^2(m)$ denote the set of all scalar functions $f$ on $\Theta$ such that the pair $(f, f)$ is $m$-integrable.
(2) The pair $(f, g)$ is said to be strictly $m$-integrable if condition (a) above and (b') and (c') below are true:
(b') $m_1^D(\cdot) \equiv \int_D g(\lambda)\, m(\cdot, d\lambda)$ and $m_2^C(\cdot) \equiv \int_C f(\theta)\, m(d\theta, \cdot)$ belong to $ca(\mathcal{A}, \mathbb{C})$ for $C, D \in \mathcal{A}$;
(c') $f$ is $m_1^D$-integrable for $D \in \mathcal{A}$, $g$ is $m_2^C$-integrable for $C \in \mathcal{A}$, and the following equality holds:
$$\int_C f(\theta)\, m_1^D(d\theta) = \int_D g(\lambda)\, m_2^C(d\lambda), \qquad C, D \in \mathcal{A}. \tag{2.6.1}$$
The common value in (2.6.1) is denoted by
$$\int_C \!\! \int_D^* (f, g)\, dm \quad \text{or} \quad \int_C \!\! \int_D^* f(\theta) g(\lambda)\, m(d\theta, d\lambda).$$
Let $L^2_*(m)$ denote the set of all scalar functions $f$ on $\Theta$ such that the pair $(f, f)$ is strictly $m$-integrable.
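The positive definiteness in Definition 2.6.1 (2) holds for $m_\xi$ because, with the inner product linear in its first argument, $\sum_{i,j} \alpha_i \overline{\alpha_j}\, (\xi(A_i), \xi(A_j))_H = \|\sum_i \alpha_i \xi(A_i)\|_H^2 \ge 0$. The following sketch checks this identity numerically; the discrete measure with values in $\mathbb{C}^2$, the chosen sets and the random coefficients (fixed seed) are all hypothetical.

```python
import random

def inner(u, v):
    """(u, v)_H on C^n, linear in u and conjugate-linear in v."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

# Hypothetical discrete xi with values in C^2 on atoms 0, 1, 2.
xi_atom = [[1.0, 1j], [0.0, -2.0], [3.0, 0.5]]

def xi(A):
    return [sum(xi_atom[i][j] for i in A) for j in range(2)]

def m_xi(A, B):
    """The bimeasure m_xi(A, B) = (xi(A), xi(B))_H."""
    return inner(xi(A), xi(B))

rng = random.Random(0)
sets = [{0}, {1, 2}, {0, 2}, set()]
quad_forms = []
for _ in range(100):
    a = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in sets]
    q = sum(a[i] * a[j].conjugate() * m_xi(sets[i], sets[j])
            for i in range(len(sets)) for j in range(len(sets)))
    # q should equal ||sum_i a_i xi(A_i)||^2, hence be real and nonnegative.
    v = [sum(a[i] * xi(sets[i])[k] for i in range(len(sets))) for k in range(2)]
    assert abs(q - inner(v, v)) < 1e-9
    quad_forms.append(q)
```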
Remark 2.6.3. Let $H$ be any Hilbert space. Then in Chang and Rao [4] it is proved that for any $H$-valued measure $\xi \in ca(\mathcal{A}, H)$ we have $L^1_{DS}(\xi) = L^2_*(m_\xi)$, where the equality is as sets. Moreover, $L^2_*(m_\xi)$ becomes a Hilbert space with the inner product $(\cdot, \cdot)_{m_\xi}$ given by
$$(f, g)_{m_\xi} = \int_\Theta \!\! \int_\Theta^* (f, \overline{g})\, dm_\xi, \qquad f, g \in L^2_*(m_\xi).$$
In Section 3.6 we shall generalize these results to trace class operator valued bimeasures.

3. Hilbert-Schmidt Class Operator Valued Measures

Here we study Hilbert-Schmidt class operator valued measures, which are used to describe infinite dimensional second order stochastic processes. We need to generalize the results developed in Section 2 to the present case.

3.1. The space of Hilbert-Schmidt class operators as a normal Hilbert module

Let $H$ and $K$ be two Hilbert spaces. Let $B(H)$ denote the space of all bounded linear operators on $H$ and $T(H)$ the space of all trace class operators in $B(H)$. We consider a normal Hilbert $B(H)$-module as a model for the space $S(K, H)$ of all Hilbert-Schmidt class operators from $K$ to $H$ (cf. Kakihara [13] and Ozawa [29]).

Definition 3.1.1. A linear space $X$ is called a left $B(H)$-module if there is a module action $(a, x) \mapsto a \cdot x \in X$ for $a \in B(H)$ and $x \in X$. $X$ is called a normal pre-Hilbert $B(H)$-module if $X$ is a left $B(H)$-module and has a $T(H)$-valued gramian $[\cdot, \cdot] : X \times X \to T(H)$ such that for $x, y, z \in X$ and $a \in B(H)$
(1) $[x, x] \ge 0$, and $[x, x] = 0$ if and only if $x = 0$;
(2) $[x + y, z] = [x, z] + [y, z]$;
(3) $[a \cdot x, y] = a[x, y]$;
(4) $[x, y]^* = [y, x]$.
If $X$ is a normal pre-Hilbert $B(H)$-module, then an inner product and a norm are defined respectively by
$$(x, y)_X = \operatorname{tr}[x, y], \qquad \|x\|_X = (x, x)_X^{1/2} = \|[x, x]\|_\tau^{1/2}, \qquad x, y \in X,$$
so that $X$ becomes a pre-Hilbert space, where $\operatorname{tr}(\cdot)$ is the trace and $\|\cdot\|_\tau$ is the trace norm. If $X$ is a Hilbert space, then $X$ is called a normal Hilbert $B(H)$-module. A typical example of a normal Hilbert $B(H)$-module is the space $X = S(K, H) \subset B(K, H)$, where the module action and the gramian are given respectively by
$$a \cdot x = ax, \qquad [x, y] = xy^*, \qquad a \in B(H),\ x, y \in X.$$
Moreover, for any normal Hilbert $B(H)$-module $X$ there exists a Hilbert space $K$ such that $X$ is isomorphic to $S(K, H)$ (cf. Ozawa [29] and Kakihara [13]). Since the tensor product space $H \otimes K$ is isomorphic to $S(K, H)$, we can identify $X = S(K, H) = H \otimes K$ and regard these spaces as the same. Here each tensor product $\phi \otimes f \in H \otimes K$ is identified with a rank-one operator from $K$ to $H$:
$$(\phi \otimes f)g = (g, f)_K\, \phi, \qquad g \in K,$$
where $(\cdot, \cdot)_K$ is the inner product in $K$. Moreover, the following equalities will be used later: for $\phi, \psi \in H$, $f, g \in K$ and $a \in B(H)$,
$$a \cdot (\phi \otimes f) = (a\phi) \otimes f, \qquad (\phi \otimes f)^* = f \otimes \phi, \tag{3.1.1}$$
$$[\phi \otimes f, \psi \otimes g] = (g, f)_K\, \phi \otimes \psi. \tag{3.1.2}$$
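These identities can be checked with small matrices. The sketch below assumes $H = \mathbb{R}^2$ and $K = \mathbb{R}^3$ with hypothetical real entries, so the adjoint is the transpose and $(f, g)_K = (g, f)_K$; it represents $\phi \otimes f$ as the matrix $\phi f^{T}$, the gramian as $[x, y] = xy^{T}$, and also verifies the gramian basis expansion $x = \sum_\alpha [x, x_\alpha]\, x_\alpha$ for $x_\alpha = \phi \otimes f_\alpha$ with a unit vector $\phi$. All helper names are illustrative.

```python
def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def outer(phi, f):
    """phi ⊗ f as the rank-one operator g -> (g, f)_K phi, i.e. the matrix phi f^T."""
    return [[p * q for q in f] for p in phi]

def gramian(x, y):
    """[x, y] = x y^*; for real matrices the adjoint is the transpose."""
    return matmul(x, transpose(y))

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def close(a, b, tol=1e-12):
    return all(abs(u - v) < tol for ra, rb in zip(a, b) for u, v in zip(ra, rb))

phi, psi = [0.6, 0.8], [1.0, -2.0]   # phi has norm one
f, g = [1.0, 2.0, 0.0], [0.5, -1.0, 3.0]
fg = sum(a * b for a, b in zip(f, g))

# Gramian of two rank-one operators: (f, g)_K phi ⊗ psi (real case).
assert close(gramian(outer(phi, f), outer(psi, g)),
             [[fg * v for v in row] for row in outer(phi, psi)])

# Gramian basis {phi ⊗ f_alpha} built from the standard basis of K.
basis_K = [[1.0, 0, 0], [0, 1.0, 0], [0, 0, 1.0]]
xs = [outer(phi, fa) for fa in basis_K]
P = outer(phi, phi)                  # phi ⊗ phi is a projection since ||phi|| = 1
assert close(matmul(P, P), P)
assert close(gramian(xs[0], xs[1]), [[0, 0], [0, 0]])

# Expansion x = sum_alpha [x, x_alpha] x_alpha for an arbitrary x in S(K, H).
x = [[1.0, -1.0, 2.0], [0.5, 3.0, 0.0]]
recon = [[0.0] * 3 for _ in range(2)]
for xa in xs:
    term = matmul(gramian(x, xa), xa)
    recon = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(recon, term)]
hs_norm_sq = trace(gramian(x, x))    # ||x||_X^2 = tr[x, x]
```

Here $\|x\|_X^2 = \operatorname{tr}[x, x]$ reduces to the sum of squared matrix entries, i.e., the squared Hilbert-Schmidt norm.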
Thus, let X = S(K, H) be a normal Hilbert B(H)-module and consider its gramian (or modular) bases. Definition 3.1.2. Let X be a normal Hilbert B(H)-module. A subset {xα }α∈I of norm one elements of X is said to be gramian orthonormal if (1) [xα , xβ ] = 0 if α = β; (2) [xα , xα ]2 = [xα , xα ] for α ∈ I. A maximal gramian orthonormal set is called a gramian basis of X. A typical gramian basis is obtained as follows. Let φ ∈ H be of norm one and {fα }α∈I be an orthonormal basis of K. Then, the set {φ ⊗ fα }α∈I forms a gramian basis of X. If {xα }α∈I is a gramian basis, then each x ∈ X
can be written uniquely as
$$x = \sum_{\alpha \in I} [x, x_\alpha]\, x_\alpha,$$
where the series converges in the norm of X. We say that X is a separable normal Hilbert B(H)-module if it has a countable gramian basis, which is the case if and only if the Hilbert space K is separable, where X = K ⊗ H.

3.2. The space $L^1(\xi)$

Let H be a separable Hilbert space and X a normal Hilbert B(H)-module. We may assume that X = S(K, H) = K ⊗ H for some Hilbert space K. We shall consider X-valued measures, so let us give some basic definitions and results here. After that, for an X-valued measure ξ of bounded variation we construct a space $L^1(\xi)$ consisting of operator-valued functions and show that it is a Banach space with a suitable norm. As before, let ca(A, X) denote the set of all X-valued measures on A and vca(A, X) the set of all those measures in ca(A, X) of bounded variation. For an X-valued measure ξ ∈ ca(A, X) we define the operator semivariation $\|\xi\|_o(A)$ at A ∈ A by
$$\|\xi\|_o(A) = \sup\Big\{\Big\|\sum_{\Delta \in \pi} a_\Delta\, \xi(\Delta)\Big\|_X : a_\Delta \in B(H),\ \|a_\Delta\| \le 1,\ \Delta \in \pi \in \Pi(A)\Big\}.$$
Let bca(A, X) denote the set of all X-valued measures of bounded operator semivariation. A useful formula for $\|\xi\|_o(\cdot)$ is
$$\|\xi\|_o(A) = \sup\{|[\xi(\cdot), x]|(A) : x \in X,\ \|x\|_X \le 1\}, \qquad A \in \mathcal{A}, \qquad (3.2.1)$$
where $|[\xi(\cdot), x]|(A)$ is the variation at A ∈ A of the T(H)-valued measure $[\xi(\cdot), x]$. Orthogonally scattered measures are defined in Section 2.4. In our setting, an X-valued measure ξ is said to be gramian orthogonally scattered if $[\xi(A), \xi(B)] = 0$ for disjoint A, B ∈ A. Let cagos(A, X) be the set of all gramian orthogonally scattered measures in ca(A, X). Clearly cagos(A, X) ⊂ caos(A, X), since $[\xi(A), \xi(B)] = 0$ implies $(\xi(A), \xi(B))_X = \operatorname{tr}[\xi(A), \xi(B)] = 0$. If ξ ∈ cagos(A, X) is gramian orthogonally
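scattered, then we have
$$\|\xi\|_o(A) = \|\xi(A)\|_X, \qquad A \in \mathcal{A}, \qquad (3.2.2)$$
so that cagos(A, X) ⊂ bca(A, X). Moreover, the set function $\nu_\xi$ given by
$$\nu_\xi(A) = \|\xi(A)\|_X^2, \qquad A \in \mathcal{A}, \qquad (3.2.3)$$
is a finite measure on A such that $\xi \ll \nu_\xi$.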
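Why $\nu_\xi$ in (3.2.3) is additive can be seen from $\|\xi(A \cup B)\|_X^2 = \|\xi(A)\|_X^2 + \|\xi(B)\|_X^2 + 2\,\mathrm{Re}\operatorname{tr}[\xi(A), \xi(B)]$ for disjoint A, B, where the last term vanishes for a gramian orthogonally scattered ξ. The following is a minimal finite-dimensional sketch, assuming a finite Θ and a hypothetical construction in which each point of Θ owns its own two columns of K = C^{10}. This makes $[\xi(\{s\}), \xi(\{t\})] = 0$ for s ≠ t. The names `atom`, `xi`, `gramian` and `nu` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 2                      # dim H (illustrative)
points = range(5)          # finite Θ = {0, ..., 4}
n = 2 * len(points)        # dim K: each point owns two columns of K

def atom(theta):
    """ξ({θ}): an m×n matrix supported on the two columns owned by θ."""
    x = np.zeros((m, n), dtype=complex)
    block = rng.standard_normal((m, 2)) + 1j * rng.standard_normal((m, 2))
    x[:, 2 * theta:2 * theta + 2] = block
    return x

atoms = {theta: atom(theta) for theta in points}

def xi(A):
    """ξ(A) = sum of the atoms over A (finitely additive by construction)."""
    return sum((atoms[t] for t in A), np.zeros((m, n), dtype=complex))

def gramian(x, y):
    return x @ y.conj().T

def nu(A):
    """ν_ξ(A) = ||ξ(A)||_X^2, using the Hilbert-Schmidt (Frobenius) norm."""
    return np.linalg.norm(xi(A)) ** 2

A, B = {0, 2}, {1, 3, 4}
print(np.allclose(gramian(xi(A), xi(B)), 0))            # gramian orthogonality
print(np.isclose(np.trace(gramian(xi(A), xi(B))), 0))   # hence orthogonality in X
print(np.isclose(nu(A | B), nu(A) + nu(B)))              # additivity of ν_ξ
```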
Corresponding to orthogonally scattered dilations there is a notion of gramian orthogonally scattered dilation. Let ξ ∈ ca(A, X) be an X-valued measure. A gramian orthogonally scattered dilation of ξ is a triple {η, Y, P} such that Y is a normal Hilbert B(H)-module containing the closed submodule $\tilde{S}_\xi$ generated by the set {ξ(A) : A ∈ A}, η ∈ cagos(A, Y) is a Y-valued gramian orthogonally scattered measure, and $P : Y \to \tilde{S}_\xi$ is the orthogonal projection with ξ = Pη. An orthogonal projection onto a closed submodule is called a gramian orthogonal projection. Note that Y need not contain the whole space X. A gramian orthogonally scattered dilation {η, Y, P} is said to be separable if Y is a separable normal Hilbert B(H)-module, and minimal if $Y = \tilde{S}_\eta$. It is known that an X-valued measure has a gramian orthogonally scattered dilation if and only if it is of bounded operator semivariation (cf. [13, p. 104]). Masani [22] studied quasi-isometric measures, which are essentially the same as the gramian orthogonally scattered measures here. Rosenberg [36] obtained the gramian orthogonally scattered dilation in the case where dim K < ∞.
Let $L^0(\Theta; B(H))$ denote the space of all B(H)-valued A-simple functions on Θ. That is, $\Phi \in L^0(\Theta; B(H))$ is of the form
$$\Phi = \sum_{k=1}^n a_k 1_{A_k}$$
for some positive integer n, $a_1, \dots, a_n \in B(H)$ and $\{A_1, \dots, A_n\} \in \Pi(\Theta)$. For such Φ its integral with respect to ξ ∈ ca(A, X) over A ∈ A is defined by
$$\int_A \Phi\, d\xi = \sum_{k=1}^n a_k\, \xi(A \cap A_k).$$
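On a finite Θ, every X-valued measure is determined by its atoms ξ({t}), and the definition of $\int_A \Phi\, d\xi$ for simple Φ reduces to $\sum_{t \in A} \Phi(t)\, \xi(\{t\})$. The sketch below checks this, together with left B(H)-linearity and additivity in A, under the illustrative assumptions H = C^2 and K = C^3 with randomly chosen atoms. No orthogonality is assumed here. The function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 2, 3                      # dim H = 2, dim K = 3 (illustrative)
points = range(6)                # a finite Θ; ξ is determined by its atoms

def cmat(r, c):
    return rng.standard_normal((r, c)) + 1j * rng.standard_normal((r, c))

atoms = {t: cmat(m, n) for t in points}      # ξ({t}) ∈ X, no orthogonality assumed

def xi(A):
    """ξ(A) as the sum of the atoms in A."""
    return sum((atoms[t] for t in A), np.zeros((m, n), dtype=complex))

def integral_simple(A, coeffs, partition):
    """∫_A Φ dξ = Σ_k a_k ξ(A ∩ A_k) for Φ = Σ_k a_k 1_{A_k}."""
    return sum((a @ xi(A & Ak) for a, Ak in zip(coeffs, partition)),
               np.zeros((m, n), dtype=complex))

partition = [{0, 1}, {2, 3, 4}, {5}]         # {A_1, A_2, A_3} ∈ Π(Θ)
coeffs = [cmat(m, m) for _ in partition]     # a_1, a_2, a_3 ∈ B(H)

def Phi(t):
    """Pointwise value Φ(t) of the simple function."""
    return next(a for a, Ak in zip(coeffs, partition) if t in Ak)

A, B = {1, 2, 5}, {0, 3}
b = cmat(m, m)

# On a finite Θ the integral equals Σ_{t ∈ A} Φ(t) ξ({t})
same_as_pointwise = np.allclose(integral_simple(A, coeffs, partition),
                                sum(Phi(t) @ atoms[t] for t in A))
# Left B(H)-linearity: ∫ bΦ dξ = b ∫ Φ dξ
left_linear = np.allclose(integral_simple(A, [b @ a for a in coeffs], partition),
                          b @ integral_simple(A, coeffs, partition))
# Additivity in the set: ∫_{A∪B} = ∫_A + ∫_B for disjoint A, B
additive = np.allclose(integral_simple(A | B, coeffs, partition),
                       integral_simple(A, coeffs, partition)
                       + integral_simple(B, coeffs, partition))
print(same_as_pointwise, left_linear, additive)
```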
In order to define the integral for more general functions we need some measurability notions.
Definition 3.2.1. An H-valued function φ on Θ is said to be strongly measurable if there exists a sequence $\{\varphi_n\}_{n=1}^\infty \subset L^0(\Theta; H)$ of H-valued A-simple
functions such that $\|\varphi_n(\theta) - \varphi(\theta)\|_H \to 0$ for every θ ∈ Θ. The function φ is said to be weakly measurable if the scalar function $(\varphi(\cdot), \psi)_H$ is A-measurable for every ψ ∈ H. [Since we assume that H is separable, φ is strongly measurable if and only if it is weakly measurable.] A B(H)-valued function Φ on Θ is said to be A-measurable if, for every φ ∈ H, the H-valued function Φ(·)φ is strongly measurable in the above sense. [Hence, Φ is A-measurable if and only if the scalar function $(\Phi(\cdot)\varphi, \psi)_H$ is A-measurable for every φ, ψ ∈ H.] Let O(H) denote the set of all linear operators a with domains D(a) ⊆ H and ranges R(a) ⊆ H. An O(H)-valued function Φ on Θ is said to be A-measurable if there exists a sequence $\{\Phi_n\}_{n=1}^\infty$ of B(H)-valued A-measurable functions on Θ such that $\|\Phi_n(\theta)\varphi - \Phi(\theta)\varphi\|_H \to 0$ for every θ ∈ Θ and φ ∈ D(Φ(θ)).
Now we consider an X-valued measure of bounded variation and integrals of O(H)-valued functions with respect to such a measure.
Definition 3.2.2. Let ξ ∈ vca(A, X) be an X-valued measure of bounded variation. Then ν(·) = |ξ|(·) is a finite measure, and, since the Hilbert space X has the Radon-Nikodým property, the Radon-Nikodým derivative $\xi' = d\xi/d\nu \in L^1(\Theta, \nu; X)$ exists, where $L^1(\Theta, \nu; X)$ is the Banach space of all X-valued Bochner integrable functions on Θ with respect to ν. An O(H)-valued A-measurable function Φ on Θ is said to be ξ-integrable if $\Phi\xi'$ is X-valued and $\Phi\xi' \in L^1(\Theta, \nu; X)$. The norm of Φ is defined by
$$\|\Phi\|_{1,\xi} = \int_\Theta \|\Phi\xi'\|_X\, d\nu = \|\Phi\xi'\|_1, \qquad (3.2.4)$$
where $\|\cdot\|_1$ is the norm in $L^1(\Theta, \nu; X)$. Let $L^1(\xi)$ denote the set of all O(H)-valued A-measurable functions on Θ that are ξ-integrable. Obviously, for ξ ∈ vca(A, X), the space $L^1(\xi)$ is a linear space and a left B(H)-module. To show its completeness with respect to the norm (3.2.4) we need the notion of the generalized inverse of a bounded linear operator in B(H).
Definition 3.2.3. If L is a closed subspace of H, then $J_L$ stands for the orthogonal projection of H onto L. The generalized inverse $a^-$ of an
operator a ∈ B(H) is defined by $a^- = J_{N(a)^\perp}\, a^{-1} J_{R(a)}$, where $N(a)^\perp$ is the orthogonal complement of the null space N(a) of a and $a^{-1}$ is the (multivalued) inverse relation of a, i.e., $a^{-1}\varphi = \{\psi \in H : a\psi = \varphi\}$.
Remark 3.2.4. (1) Here are some basic properties of generalized inverses (cf. Hestenes [12] and Mandrekar and Salehi [20]). Let a ∈ B(H).
(a) If $a = a^*$, then $a^- = J_{R(a)}\, a^{-1} J_{N(a)^\perp}$.
(b) $a^- a = J_{N(a)^\perp}$.
(c) The closure of $aa^-$ is $J_{R(a)}$.
(d) $a^- = (a^*a)^- a^*$.
(e) If $a = \sum_{j=1}^\infty \lambda_j E_j \in C(H)^+$, then $a^- = \sum_{j=1}^\infty \lambda_j^- E_j$, where $C(H)^+$ is the nonnegative part of the set C(H) of all compact operators on H, and $\lambda^- = \lambda^{-1}$ if $\lambda \ne 0$ and $\lambda^- = 0$ if $\lambda = 0$.
(2) Let Φ(·) and Ψ(·) be O(H)-valued A-measurable functions on Θ and a(·) a B(H)-valued A-measurable function on Θ. Then Φ + Ψ, aΦ and Φa are O(H)-valued A-measurable functions on Θ, where D(Φ + Ψ) = D(Φ) ∩ D(Ψ), D(aΦ) = D(Φ) and D(Φa) = {φ ∈ H : aφ ∈ D(Φ)}.
(3) If Φ(·) is a $C(H)^+$-valued A-measurable function on Θ, then $\Phi^-(\cdot) = \Phi(\cdot)^-$ is A-measurable.
Using generalized inverses of operators we obtain the following theorem.
Theorem 3.2.5. Let ξ ∈ vca(A, X) be an X-valued measure of bounded variation, and let $L^1(\xi)$ be the set of all O(H)-valued ξ-integrable functions on Θ. Then $L^1(\xi)$ is a left B(H)-module and a Banach space in the norm $\|\cdot\|_{1,\xi}$ given by (3.2.4).
Proof. The proof is similar to that of Theorem 2.5 in Kakihara [14]; we only show the completeness of $L^1(\xi)$. Let $\{\Phi_n\}_{n=1}^\infty$ be a Cauchy sequence in $L^1(\xi)$. Then $\{\Phi_n\xi'\}_{n=1}^\infty$ is a Cauchy sequence in $L^1(\Theta, \nu; X)$, where ν(·) = |ξ|(·). Since the latter space is complete, we can find some $\Psi \in L^1(\Theta, \nu; X)$ such that $\|\Phi_n\xi' - \Psi\|_1 \to 0$. Let $\Phi = \Psi\xi'^-$. Then Φ is an O(H)-valued A-measurable function by Remark 3.2.4 (2) and (3).
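Moreover, by Remark 3.2.4 (1)(b), $\Phi\xi' = \Psi\xi'^-\xi' = \Psi J_{N(\xi')^\perp} \in L^1(\Theta, \nu; X)$, since $\|\Phi\xi'\|_X \le \|\Psi\|_X$. Now, using $\Phi_n\xi' = \Phi_n\xi' J_{N(\xi')^\perp}$, we see that
$$\begin{aligned}
\|\Phi_n - \Phi\|_{1,\xi} &= \|\Phi_n\xi' - \Phi\xi'\|_1 = \int_\Theta \|\Phi_n\xi' - \Psi\xi'^-\xi'\|_X\, d\nu \\
&= \int_\Theta \|\Phi_n\xi' J_{N(\xi')^\perp} - \Psi J_{N(\xi')^\perp}\|_X\, d\nu \\
&\le \int_\Theta \|\Phi_n\xi' - \Psi\|_X\, d\nu = \|\Phi_n\xi' - \Psi\|_1 \to 0
\end{aligned}$$
as n → ∞. Thus $L^1(\xi)$ is complete.

3.3. Weak Radon-Nikodým derivatives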
We considered weak Radon-Nikodým derivatives for Hilbert space valued measures in Section 2.2. Now let X = K ⊗ H = S(K, H). We study weak Radon-Nikodým derivatives for X-valued measures, where H and K are separable Hilbert spaces, and obtain some representations for the weak Radon-Nikodým derivative of an X-valued measure. Gramian ν-boundedness is introduced and shown to be equivalent to the existence of a weak Radon-Nikodým derivative with respect to ν. Let S(H) be the space of all Hilbert-Schmidt class operators in B(H). We have the following proposition.
Proposition 3.3.1. Let ξ ∈ ca(A, X) be an X-valued measure such that $\xi \ll \nu$ with ν a σ-finite measure.
(1) ξ has a weak Radon-Nikodým derivative with respect to ν if and only if ξ is ν-bounded.
(2) Suppose that ξ has a weak Radon-Nikodým derivative ξ′ and $\{x_\ell\}_{\ell=1}^\infty$ is a gramian basis, where $x_\ell = \varphi \otimes f_\ell$, ℓ ≥ 1, with φ ∈ H of norm one and $\{f_\ell\}_{\ell=1}^\infty$ an orthonormal basis of K. Then ξ′ is unique in the ν-a.e. sense and is given by
$$\xi'(\theta) = \sum_{\ell=1}^\infty \Phi_\ell(\theta)\, x_\ell, \qquad \nu\text{-a.e.}, \qquad (3.3.1)$$
where $\Phi_\ell$ is an S(H)-valued function for ℓ ≥ 1.
Proof. (1) Let $\{e_k\}_{k=1}^\infty$ be an orthonormal basis of H and $\{f_\ell\}_{\ell=1}^\infty$ one of K. Observe that the set $\{e_k \otimes f_\ell\}_{k,\ell=1}^\infty$ is an orthonormal basis
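in X and X is separable. Hence, by Theorem 2.3.4, ξ has a weak Radon-Nikodým derivative if and only if it is ν-bounded, completing the proof of part (1).
(2) If $\{e_k\}$ and $\{f_\ell\}$ are orthonormal bases of H and K respectively, we have that
$$\xi'(\cdot) = \sum_{k,\ell=1}^\infty g_{k,\ell}(\cdot)\,(e_k \otimes f_\ell) \qquad \nu\text{-a.e.},$$
where
$$g_{k,\ell} \equiv \frac{d}{d\nu}(\xi(\cdot), e_k \otimes f_\ell)_X \in L^1(\Theta, \nu), \qquad k, \ell \ge 1,$$
and $(\cdot, \cdot)_X$ is the inner product in X. Moreover, ξ′ is unique in the ν-a.e. sense. Note that for each ℓ ≥ 1 the H-valued function $\varphi_\ell$ on Θ given by
$$\varphi_\ell(\cdot) = \sum_{k=1}^\infty g_{k,\ell}(\cdot)\, e_k$$
is well-defined, and ξ′ is expressed as
$$\xi'(\cdot) = \sum_{\ell=1}^\infty \Big(\sum_{k=1}^\infty g_{k,\ell}(\cdot)\, e_k\Big) \otimes f_\ell = \sum_{\ell=1}^\infty \varphi_\ell(\cdot) \otimes f_\ell \qquad \nu\text{-a.e.}$$
Now, for each ℓ ≥ 1, define an operator-valued function $\Phi_\ell$ on Θ by $\Phi_\ell(\theta)\varphi = \varphi_\ell(\theta)$ and $\Phi_\ell(\theta)\psi = 0$ if ψ ⊥ φ. Then it is obvious that $\Phi_\ell$ is an S(H)-valued function for each ℓ ≥ 1 and that
$$\xi'(\cdot) = \sum_{\ell=1}^\infty \Phi_\ell(\cdot)\varphi \otimes f_\ell = \sum_{\ell=1}^\infty \Phi_\ell(\cdot)\, x_\ell \qquad \nu\text{-a.e.}$$
Thus the proof of part (2), and hence of the proposition, is complete.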
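The construction of $\Phi_\ell$ in part (2) can be traced at a single fixed θ in finite dimensions. Take H = C^3 and K = C^4 with their standard bases, an arbitrary unit vector φ, and a coefficient matrix G standing in for $(g_{k,\ell}(\theta))$. All of these are illustrative assumptions, and `tensor` is a hypothetical helper. The sketch sets $\Phi_\ell = \varphi_\ell \otimes \varphi$, which is one rank-one operator satisfying $\Phi_\ell\varphi = \varphi_\ell$ and $\Phi_\ell\psi = 0$ for ψ ⊥ φ. It then checks that $\sum_\ell \Phi_\ell x_\ell$ reproduces ξ′(θ).

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 3, 4                          # dim H, dim K (illustrative)
E, F = np.eye(m), np.eye(n)          # standard orthonormal bases {e_k}, {f_l}

# Coefficients g_{k,l} at one fixed θ: ξ'(θ) = Σ_{k,l} g_{k,l} e_k ⊗ f_l
G = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

def tensor(u, v):
    """Rank-one operator u ⊗ v : w |-> (w, v) u."""
    return np.outer(u, v.conj())

phi = np.ones(m) / np.sqrt(m)        # a unit vector φ in H

xi_prime = sum(G[k, l] * tensor(E[:, k], F[:, l]) for k in range(m) for l in range(n))

total = np.zeros((m, n), dtype=complex)
Phis = []
for l in range(n):
    phi_l = G[:, l]                  # φ_l(θ) = Σ_k g_{k,l}(θ) e_k
    Phi_l = tensor(phi_l, phi)       # Φ_l φ = φ_l and Φ_l ψ = 0 for ψ ⊥ φ
    x_l = tensor(phi, F[:, l])       # gramian basis element x_l = φ ⊗ f_l
    total += Phi_l @ x_l             # module action Φ_l · x_l
    Phis.append(Phi_l)

print(np.allclose(total, xi_prime))
```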
When ξ is of bounded operator semivariation, we can say more about the functions $\Phi_\ell$ in (3.3.1).
Corollary 3.3.2. Let ξ ∈ bca(A, X) be an X-valued measure of bounded operator semivariation such that $\xi \ll \nu$, where ν is a σ-finite measure. If ξ has a weak Radon-Nikodým derivative ξ′ with respect to ν, then
$$\xi'(\cdot) = \sum_{\ell=1}^\infty \Phi_\ell(\cdot)\, x_\ell \qquad \nu\text{-a.e.},$$
where $\{x_\ell = \varphi \otimes f_\ell\}_{\ell=1}^\infty$ is a gramian basis of X as in Proposition 3.3.1 and $\Phi_\ell$ is the T(H)-valued function given by
$$\Phi_\ell = \frac{d}{d\nu}[\xi(\cdot), x_\ell] \in L^1(\Theta, \nu; T(H)), \qquad \ell \ge 1,$$
that is, the Radon-Nikodým derivative of the T(H)-valued measure $[\xi(\cdot), x_\ell]$ with respect to ν.
Proof. Since $\{x_\ell\}_{\ell=1}^\infty$ is a gramian basis of X we have that
$$\xi(A) = \sum_{\ell=1}^\infty [\xi(A), x_\ell]\, x_\ell, \qquad A \in \mathcal{A},$$
where the series converges in the norm $\|\cdot\|_X$. Since ξ is of bounded operator semivariation, it follows from (3.2.1) that $\xi_\ell(\cdot) \equiv [\xi(\cdot), x_\ell] \in \mathrm{vca}(\mathcal{A}, T(H))$ is a T(H)-valued measure of bounded variation for each ℓ ≥ 1. Since T(H) is a separable dual space, and hence has the Radon-Nikodým property, and since $\xi_\ell \ll \nu$ (because $\xi \ll \nu$), the ordinary Radon-Nikodým derivative of $\xi_\ell$ with respect to ν exists:
$$\xi_\ell' \equiv \frac{d\xi_\ell}{d\nu} \in L^1(\Theta, \nu; T(H)), \qquad \ell \ge 1.$$
By Proposition 3.3.1 we have that
$$\xi'(\cdot) = \sum_{\ell=1}^\infty \Phi_\ell(\cdot)\, x_\ell$$
with S(H)-valued functions $\Phi_\ell$, ℓ ≥ 1. Since ξ′ is unique in the ν-a.e. sense, it follows that $\Phi_\ell = \xi_\ell'$ ν-a.e. for each ℓ ≥ 1. Thus the proof is complete.
For an X-valued measure ξ ∈ ca(A, X) let $S_\xi$ denote the closed subspace of X generated by the set {ξ(A) : A ∈ A}. Moreover, let $\tilde{S}_\xi$ be the closed submodule generated by the same set, i.e., the closed subspace generated by
$$\Big\{\sum_{k=1}^n a_k\, \xi(A_k) : a_k \in B(H),\ A_k \in \mathcal{A},\ 1 \le k \le n,\ n = 1, 2, 3, \dots\Big\}.$$
By analogy with ν-boundedness we define gramian ν-boundedness as follows. Definition 3.3.3. Let ξ ∈ bca(A, X) be an X-valued measure of bounded operator semivariation such that $\xi \ll \nu$, where ν is a σ-finite measure, and let
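$\{x_\ell\}_{\ell=1}^\infty$ be a gramian basis of X. Then ξ is said to be gramian ν-bounded if the function
$$\Phi(\theta) = \sum_{\ell=1}^\infty [\xi(\cdot), x_\ell]'(\theta)\, x_\ell, \qquad \theta \in \Theta, \qquad (3.3.2)$$
is well-defined for ν-a.e. θ, where $[\xi(\cdot), x_\ell]' = d[\xi(\cdot), x_\ell]/d\nu$. As in the case of ν-boundedness, we should show that gramian ν-boundedness does not depend on the particular choice of gramian basis; this will be done after the following lemma.
Lemma 3.3.4. Let ξ ∈ bca(A, X) be an X-valued measure of bounded operator semivariation such that $\xi \ll \nu$ with a σ-finite measure ν.
(1) For x, y ∈ X and a, b ∈ B(H) it holds that
$$[\xi(\cdot), x + y]' = [\xi(\cdot), x]' + [\xi(\cdot), y]', \qquad [a\xi(\cdot), bx]' = a[\xi(\cdot), x]'\, b^* \qquad \nu\text{-a.e.}$$
(2) If $x, x_n \in X$ (n ≥ 1) and $\|x_n - x\|_X \to 0$, then
$$[\xi(\cdot), x_n]' - [\xi(\cdot), x]' \to 0 \quad \text{in } L^1(\Theta, \nu; T(H)).$$
Hence, for some subsequence $\{x_{n_k}\} \subseteq \{x_n\}$ it holds that
$$\|[\xi(\cdot), x_{n_k}]' - [\xi(\cdot), x]'\|_\tau \to 0 \qquad \nu\text{-a.e.}$$
Proof. (1) is clear. To see (2), let for n ≥ 1
$$F_n(A) = [\xi(A), x_n - x],\ A \in \mathcal{A}, \qquad \Phi_n = [\xi(\cdot), x_n]', \qquad \Phi = [\xi(\cdot), x]' \in L^1(\Theta, \nu; T(H)).$$
Note that by (3.2.1) $|F_n|(\Theta) \le \|\xi\|_o(\Theta)\, \|x_n - x\|_X \to 0$. Since $\Phi_n - \Phi = dF_n/d\nu$, it follows that
$$\|\Phi_n - \Phi\|_1 = \int_\Theta \|\Phi_n - \Phi\|_\tau\, d\nu = |F_n|(\Theta) \to 0,$$
so that $\Phi_n \to \Phi$ in $L^1(\Theta, \nu; T(H))$. The ν-a.e. convergence along a subsequence then follows, as for any sequence converging in $L^1$.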
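The pointwise algebra behind Lemma 3.3.4 (1) can be checked in the matrix model. The sketch also checks a Hölder-type bound $\|[u, x]\|_\tau \le \|u\|_X \|x\|_X$, which shows that x ↦ [u, x] is continuous from X into T(H). This bound is an illustration added here; the lemma's own proof uses (3.2.1) instead. H = C^3, K = C^5, the random matrices and the helper names are all illustrative assumptions. The trace norm is computed with numpy's nuclear norm.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 3, 5                                   # dim H, dim K (illustrative)

def cmat(r, c):
    return rng.standard_normal((r, c)) + 1j * rng.standard_normal((r, c))

def gramian(x, y):
    return x @ y.conj().T

def trace_norm(t):
    return np.linalg.norm(t, 'nuc')           # sum of singular values

u, x, y, d = cmat(m, n), cmat(m, n), cmat(m, n), cmat(m, n)  # u: a value of ξ
a, b = cmat(m, m), cmat(m, m)

additive = np.allclose(gramian(u, x + y), gramian(u, x) + gramian(u, y))
module = np.allclose(gramian(a @ u, b @ x), a @ gramian(u, x) @ b.conj().T)
bound = trace_norm(gramian(u, x)) <= np.linalg.norm(u) * np.linalg.norm(x) + 1e-12

# Continuity: x_k = x + d/k -> x in X forces [u, x_k] -> [u, x] in trace norm
errs = [trace_norm(gramian(u, x + d / k) - gramian(u, x)) for k in (1, 10, 100)]
print(additive, module, bound, errs)
```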
The next lemma shows that gramian ν-boundedness is independent of the choice of gramian basis.
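Lemma 3.3.5. Let ξ ∈ bca(A, X) be an X-valued measure of bounded operator semivariation such that $\xi \ll \nu$, where ν is a σ-finite measure. Let $\{x_\ell\}_{\ell=1}^\infty$ and $\{y_\ell\}_{\ell=1}^\infty$ be two gramian bases of X. If ξ is gramian ν-bounded with respect to $\{x_\ell\}$, then it is so with respect to $\{y_\ell\}$.
Proof. Since ξ is gramian ν-bounded with respect to $\{x_\ell\}$, the function Φ defined by (3.3.2) is a well-defined X-valued function ν-a.e. on Θ. For ℓ ≥ 1 we have
$$x_\ell = \sum_{m=1}^\infty [x_\ell, y_m]\, y_m. \qquad (3.3.3)$$
Hence it follows from (3.3.2), (3.3.3) and Lemma 3.3.4 that
$$\begin{aligned}
\Phi &= \sum_{\ell=1}^\infty \Big[\xi(\cdot), \sum_{m=1}^\infty [x_\ell, y_m]\, y_m\Big]' \sum_{n=1}^\infty [x_\ell, y_n]\, y_n \\
&= \sum_{\ell=1}^\infty \sum_{m=1}^\infty \sum_{n=1}^\infty [\xi(\cdot), y_m]'\, [x_\ell, y_m]^* [x_\ell, y_n]\, y_n \\
&= \sum_{m,n=1}^\infty [\xi(\cdot), y_m]' \sum_{\ell=1}^\infty [y_m, x_\ell][x_\ell, y_n]\, y_n \\
&= \sum_{m,n=1}^\infty [\xi(\cdot), y_m]'\, [y_m, y_n]\, y_n \\
&= \sum_{m=1}^\infty [\xi(\cdot), y_m]' \sum_{n=1}^\infty [y_m, y_n]\, y_n \\
&= \sum_{m=1}^\infty [\xi(\cdot), y_m]'\, y_m \qquad \nu\text{-a.e.},
\end{aligned}$$
where the fourth equality uses $y_m = \sum_{\ell} [y_m, x_\ell]\, x_\ell$ and the last one the expansion of $y_m$ in the basis $\{y_n\}$. Thus, ξ is gramian ν-bounded with respect to $\{y_\ell\}$.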
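The two facts about gramian bases used above can be illustrated numerically. The first is the expansion $x = \sum_\ell [x, x_\ell]\, x_\ell$. The second is the Parseval-type identity $\sum_\ell [y_m, x_\ell][x_\ell, y_n] = [y_m, y_n]$. The sketch below is a finite-dimensional illustration only, assuming H = C^3 and K = C^4. Each gramian basis has the form $\{\varphi \otimes f_\ell\}$ with a random unit φ and a random orthonormal basis $\{f_\ell\}$ obtained from a QR factorization. The helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 3, 4   # dim H, dim K (illustrative)

def cvec(d):
    return rng.standard_normal(d) + 1j * rng.standard_normal(d)

def tensor(u, v):
    return np.outer(u, v.conj())

def gramian(x, y):
    return x @ y.conj().T

def gramian_basis(n):
    """A gramian basis {φ ⊗ f_l}: φ a unit vector in H, {f_l} orthonormal in K."""
    phi = cvec(m)
    phi /= np.linalg.norm(phi)
    q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    return [tensor(phi, q[:, l]) for l in range(n)]

xs, ys = gramian_basis(n), gramian_basis(n)

# Definition 3.1.2: [x_a, x_b] = 0 for a ≠ b, and [x_a, x_a] is idempotent
orthonormal = all(
    np.allclose(gramian(xs[i], xs[j]), 0) if i != j
    else np.allclose(gramian(xs[i], xs[i]) @ gramian(xs[i], xs[i]), gramian(xs[i], xs[i]))
    for i in range(n) for j in range(n))

# Expansion x = Σ_l [x, x_l] x_l
x = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
expansion = sum(gramian(x, xl) @ xl for xl in xs)

# Identity used in Lemma 3.3.5: Σ_l [y_m, x_l][x_l, y_k] = [y_m, y_k]
parseval = all(
    np.allclose(sum(gramian(ym, xl) @ gramian(xl, yk) for xl in xs), gramian(ym, yk))
    for ym in ys for yk in ys)
print(orthonormal, np.allclose(expansion, x), parseval)
```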
One more lemma is needed to establish the main result of this subsection, which is a Hilbert module version of Corollary 2.3.5 (2).
Lemma 3.3.6. Let ξ ∈ bca(A, X) be an X-valued measure of bounded operator semivariation such that $\xi \ll \nu$ for a σ-finite measure ν. Suppose that ξ has a weak Radon-Nikodým derivative ξ′ with respect to ν. Then for x ∈ X we have
$$[\xi(\cdot), x]' = [\xi', x] \qquad \nu\text{-a.e.},$$
where $[\xi(\cdot), x]' = d[\xi(\cdot), x]/d\nu$.
Proof. Let x ∈ X be nonzero. Observe the following:
$$\begin{aligned}
&[\xi(\cdot), x]' = [\xi', x] \quad \nu\text{-a.e.} \\
\iff\ & \{\operatorname{tr}([\xi(\cdot), x]'\, a) = \operatorname{tr}([\xi', x]\, a),\ a \in B(H)\} \quad \nu\text{-a.e.} \\
\iff\ & \{\operatorname{tr}([\xi(\cdot), a^*x]') = \operatorname{tr}([\xi', a^*x]),\ a \in B(H)\} \quad \nu\text{-a.e.} \\
\iff\ & \Big\{\frac{d}{d\nu}(\xi(\cdot), a^*x)_X = (\xi', a^*x)_X,\ a \in B(H)\Big\} \quad \nu\text{-a.e.} \\
\iff\ & \Big\{\frac{d}{d\nu}(\xi(\cdot), z)_X = (\xi', z)_X,\ z \in X_x\Big\} \quad \nu\text{-a.e.} \\
\iff\ & \Big\{\frac{d}{d\nu}(\xi(\cdot), z_n)_X = (\xi', z_n)_X,\ n \ge 1\Big\} \quad \nu\text{-a.e.},
\end{aligned}$$
where $\{z_n : n \ge 1\}$ is a dense subset of $X_x$, the closed submodule of X generated by x. Since the last statement is true by Corollary 2.3.5 (2), the proof is complete.
Theorem 3.3.7. Let ξ ∈ bca(A, X) be an X-valued measure of bounded operator semivariation such that $\xi \ll \nu$, where ν is a σ-finite measure. Then ξ has a weak Radon-Nikodým derivative with respect to ν if and only if it is gramian ν-bounded.
Proof. Suppose that ξ has a weak Radon-Nikodým derivative ξ′ with respect to ν, and let $\{x_\ell\}_{\ell=1}^\infty$ be a gramian basis of X. Then, by Lemma 3.3.6,
$$\xi' = \sum_{\ell=1}^\infty [\xi', x_\ell]\, x_\ell = \sum_{\ell=1}^\infty [\xi(\cdot), x_\ell]'\, x_\ell,$$
where the series converges ν-a.e. in $\|\cdot\|_X$. Hence ξ is gramian ν-bounded.
Conversely, assume that ξ is gramian ν-bounded. Since gramian ν-boundedness is independent of the choice of gramian basis, we can choose a gramian basis $\{\varphi \otimes f_\ell\}_{\ell=1}^\infty$, where φ ∈ H is of norm one and $\{f_\ell\}_{\ell=1}^\infty$ is an orthonormal basis of K. Then
$$\Phi = \sum_{\ell=1}^\infty \Phi_\ell\, (\varphi \otimes f_\ell) = \sum_{\ell=1}^\infty (\Phi_\ell\varphi) \otimes f_\ell$$
is a well-defined X-valued function ν-a.e., where
$$\Phi_\ell = [\xi(\cdot), \varphi \otimes f_\ell]' \in L^1(\Theta, \nu; T(H)), \qquad \ell \ge 1.$$
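Note that $\Phi_\ell\varphi \in L^1(\Theta, \nu; H)$ for ℓ ≥ 1. Now let $\{e_k\}_{k=1}^\infty$ be an orthonormal basis of H and write
$$\Phi_\ell\varphi = \sum_{k=1}^\infty g_{k,\ell}\, e_k, \qquad \ell \ge 1, \qquad (3.3.4)$$
so that we have
$$\Phi = \sum_{\ell=1}^\infty \Big(\sum_{k=1}^\infty g_{k,\ell}\, e_k\Big) \otimes f_\ell = \sum_{k,\ell=1}^\infty g_{k,\ell}\,(e_k \otimes f_\ell). \qquad (3.3.5)$$
Note that $g_{k,\ell} \in L^1(\Theta, \nu)$ for k, ℓ ≥ 1. Using (3.1.1) and (3.1.2), observe that for ℓ ≥ 1
$$\Phi_\ell\varphi = [\xi(\cdot), \varphi \otimes f_\ell]'\varphi = (\xi(\cdot)(\varphi \otimes f_\ell)^*)'\varphi = (\xi(\cdot)(f_\ell \otimes \varphi)\varphi)' = (\xi(\cdot)(\varphi, \varphi)_H f_\ell)' = (\xi(\cdot) f_\ell)', \qquad (3.3.6)$$
where the prime denotes the derivative with respect to ν, and that for k, ℓ ≥ 1
$$(\xi(\cdot), e_k \otimes f_\ell)_X = \operatorname{tr}[\xi(\cdot), e_k \otimes f_\ell] = \operatorname{tr}[\xi(\cdot)(e_k \otimes f_\ell)^*] = \operatorname{tr}[\xi(\cdot)(f_\ell \otimes e_k)] = \operatorname{tr}[(\xi(\cdot)f_\ell) \otimes e_k] = (\xi(\cdot)f_\ell, e_k)_H.$$
Hence it follows from Corollary 2.3.5 (4), (3.3.4) and (3.3.6) that
$$\frac{d}{d\nu}(\xi(\cdot), e_k \otimes f_\ell)_X = \frac{d}{d\nu}(\xi(\cdot)f_\ell, e_k)_H = ((\xi(\cdot)f_\ell)', e_k)_H = (\Phi_\ell\varphi, e_k)_H = \Big(\sum_{j=1}^\infty g_{j,\ell}\, e_j, e_k\Big)_H = g_{k,\ell}.$$
Thus ξ is ν-bounded, since Φ given by (3.3.5) is well-defined ν-a.e. Therefore, by Theorem 2.3.4, ξ has a weak Radon-Nikodým derivative with respect to ν.

3.4. The spaces $L^1_{DS}(\xi)$ and $L^1_*(\xi)$

In this subsection we consider X-valued measures of bounded operator semivariation, where X = S(K, H) with H and K separable Hilbert spaces. For such a measure ξ we construct the spaces $L^1_{DS}(\xi)$ and $L^1_*(\xi)$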
using the weak Radon-Nikodým derivative ξ′ with respect to a suitable dominating measure, and obtain some properties of these spaces. First we define ξ-integrability of operator-valued functions, in analogy with the scalar-valued case.
Definition 3.4.1. Let ξ ∈ bca(A, X) be an X-valued measure of bounded operator semivariation such that $\xi \ll \nu$, where ν is a σ-finite measure. Suppose that ξ has a weak Radon-Nikodým derivative ξ′ with respect to ν.
(1) An O(H)-valued A-measurable function Φ on Θ is said to be ξ-integrable in the DS-sense if $\Phi(\theta)\xi'(\theta) \in X$ for ν-a.e. θ and if there exists a sequence $\{\Phi_n\}_{n=1}^\infty \subset L^0(\Theta; B(H))$ of B(H)-valued A-simple functions on Θ such that
(a) $\|\Phi_n(\theta)\xi'(\theta) - \Phi(\theta)\xi'(\theta)\|_X \to 0$ (n → ∞) for ν-a.e. θ;
(b) the sequence $\{\int_A \Phi_n\, d\xi\}_{n=1}^\infty$ is a Cauchy sequence in the norm $\|\cdot\|_X$ for every A ∈ A, where for $\Psi = \sum_{k=1}^m a_k 1_{A_k} \in L^0(\Theta; B(H))$ we define
$$\int_A \Psi\, d\xi = \sum_{k=1}^m a_k\, \xi(A \cap A_k).$$
In this case, the integral of Φ with respect to ξ over A ∈ A is defined by
$$\int_A \Phi\, d\xi = \int_A \Phi\xi'\, d\nu = \lim_{n \to \infty} \int_A \Phi_n\, d\xi. \qquad (3.4.1)$$
The sequence $\{\Phi_n\}$ is called a determining sequence for Φ. Let $L^1_{DS}(\xi)$ denote the set of all O(H)-valued A-measurable functions on Θ that are ξ-integrable in the DS-sense.
(2) For a function $\Phi \in L^1_{DS}(\xi)$ we define an X-valued measure $\xi_\Phi$ by
$$\xi_\Phi(A) = \int_A \Phi\, d\xi, \qquad A \in \mathcal{A}. \qquad (3.4.2)$$
Then we define the space $L^1_*(\xi)$ by
$$L^1_*(\xi) = \{\Phi \in L^1_{DS}(\xi) : \xi_\Phi \in \mathrm{bca}(\mathcal{A}, X)\}.$$
The norm of $\Phi \in L^1_*(\xi)$ is defined by the operator semivariation of $\xi_\Phi$:
$$\|\Phi\|_\xi = \|\xi_\Phi\|_o(\Theta). \qquad (3.4.3)$$
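A determining sequence can be seen at work in the simplest possible setting. The sketch below is an illustration, not the general DS construction. It assumes Θ = [0, 1] with Lebesgue measure ν, an X-valued density ξ′(θ) = θX0 + X1 (a measure of bounded variation), and Φ(θ) = θP + Q with fixed random matrices. Here H = C^2, K = C^3, and X0, X1, P, Q are hypothetical data. The step functions $\Phi_N$ take the value Φ(k/N) on [k/N, (k+1)/N). The code compares $\int_\Theta \Phi_N\, d\xi$, computed from the simple-function formula, with the closed form of $\int_\Theta \Phi\xi'\, d\nu$. The printed errors shrink roughly like 1/N.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 2, 3   # dim H, dim K (illustrative)

def cmat(r, c):
    return rng.standard_normal((r, c)) + 1j * rng.standard_normal((r, c))

# Hypothetical data on Θ = [0, 1], ν = Lebesgue measure:
# ξ'(θ) = θ X0 + X1 (values in X = S(K, H)),  Φ(θ) = θ P + Q (values in B(H)).
X0, X1 = cmat(m, n), cmat(m, n)
P, Q = cmat(m, m), cmat(m, m)

def xi(s, t):
    """ξ([s, t]) = ∫_s^t ξ'(θ) dθ, in closed form."""
    return (t**2 - s**2) / 2 * X0 + (t - s) * X1

def Phi(theta):
    return theta * P + Q

def integral_step(N):
    """∫_Θ Φ_N dξ = Σ_k Φ(k/N) ξ([k/N, (k+1)/N]) for the step function Φ_N."""
    grid = np.linspace(0.0, 1.0, N + 1)
    return sum(Phi(s) @ xi(s, t) for s, t in zip(grid[:-1], grid[1:]))

# ∫_Θ Φ ξ' dθ = P X0/3 + (P X1 + Q X0)/2 + Q X1, also in closed form
exact = P @ X0 / 3 + (P @ X1 + Q @ X0) / 2 + Q @ X1

errors = {N: np.linalg.norm(integral_step(N) - exact) for N in (10, 100, 1000)}
print(errors)
```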
Remark 3.4.2. (1) We note that if ξ ∈ cagos(A, X) is an X-valued gramian orthogonally scattered measure, then ξ has a weak Radon-Nikodým derivative with respect to $\nu_\xi$ defined by (3.2.3), and
$L^1_{DS}(\xi) = L^1_*(\xi)$ holds. In fact, let $\Phi \in L^1_{DS}(\xi)$ and let $\{\Phi_n\} \subset L^0(\Theta; B(H))$ be a determining sequence for Φ. We first show that $\xi_\Phi$ is gramian orthogonally scattered. Let A, B ∈ A be disjoint. If $\Phi_n = \sum_{k=1}^{k_n} a_{n,k} 1_{A_{n,k}}$ for n ≥ 1, then
$$\begin{aligned}
[\xi_{\Phi_n}(A), \xi_{\Phi_n}(B)] &= \Big[\int_A \Phi_n\, d\xi, \int_B \Phi_n\, d\xi\Big] = \Big[\sum_{k=1}^{k_n} a_{n,k}\, \xi(A \cap A_{n,k}), \sum_{\ell=1}^{k_n} a_{n,\ell}\, \xi(B \cap A_{n,\ell})\Big] \\
&= \sum_{k,\ell=1}^{k_n} a_{n,k}\, [\xi(A \cap A_{n,k}), \xi(B \cap A_{n,\ell})]\, a_{n,\ell}^* = 0,
\end{aligned}$$
since $[\xi(A \cap A_{n,k}), \xi(B \cap A_{n,\ell})] = 0$ for $1 \le k, \ell \le k_n$. Hence $\xi_{\Phi_n}$ is gramian orthogonally scattered. Thus we have that
$$\begin{aligned}
\|[\xi_\Phi(A), \xi_\Phi(B)]\|_\tau &\le \|[\xi_\Phi(A) - \xi_{\Phi_n}(A), \xi_\Phi(B)]\|_\tau + \|[\xi_{\Phi_n}(A), \xi_{\Phi_n}(B)]\|_\tau + \|[\xi_{\Phi_n}(A), \xi_{\Phi_n}(B) - \xi_\Phi(B)]\|_\tau \\
&\le \|\xi_\Phi(A) - \xi_{\Phi_n}(A)\|_X\, \|\xi_\Phi(B)\|_X + \|\xi_{\Phi_n}(A)\|_X\, \|\xi_{\Phi_n}(B) - \xi_\Phi(B)\|_X \to 0 \quad (n \to \infty),
\end{aligned}$$
since $\|\xi_{\Phi_n}(A) - \xi_\Phi(A)\|_X \to 0$ and $\|\xi_{\Phi_n}(B) - \xi_\Phi(B)\|_X \to 0$ (n → ∞), and hence the sequences $\{\xi_{\Phi_n}(A)\}$ and $\{\xi_{\Phi_n}(B)\}$ are bounded. Thus the measure $\xi_\Phi$ is gramian orthogonally scattered, and hence $\|\Phi\|_\xi = \|\xi_\Phi\|_o(\Theta) = \|\xi_\Phi(\Theta)\|_X < \infty$ by (3.2.2).
(2) It follows from (1) above that if $\{\xi_n\} \subset$ cagos(A, X) is a sequence of gramian orthogonally scattered measures such that $\|\xi_n(A) - \xi(A)\|_X \to 0$ as n → ∞ for every A ∈ A, for some X-valued measure ξ ∈ ca(A, X), then ξ is gramian orthogonally scattered.
(3) We observe that the integral given in (3.4.1) is nonabsolute, since $\Phi\xi'$ need not be in $L^1(\Theta, \nu; X)$. As in the case of a Hilbert space valued measure, we should show that the integral (3.4.1) is well-defined. This will be done in Proposition 3.4.3 below.
(4) For an X-valued measure ξ ∈ bca(A, X) of bounded operator semivariation with weak Radon-Nikodým derivative ξ′ and an O(H)-valued function $\Phi \in L^1_*(\xi)$, the measure $\xi_\Phi$ defined by (3.4.2) is of
bounded operator semivariation and has a weak Radon-Nikodým derivative given by
$$\xi_\Phi' = \Phi\xi'. \qquad (3.4.4)$$
To show this, let $\{y_n\} \subset L^0(\Theta; X)$ and $\{\Phi_n\} \subset L^0(\Theta; B(H))$ be determining sequences for ξ′ and Φ, respectively. Then it is easy to see that $\{\Phi_n y_n\} \subset L^0(\Theta; X)$,
$$\|\Phi_n(\theta)y_n(\theta) - \Phi(\theta)\xi'(\theta)\|_X \to 0 \quad \nu\text{-a.e. } \theta, \qquad \text{and} \qquad \Big\|\int_A \Phi_n y_n\, d\nu - \xi_\Phi(A)\Big\|_X \to 0, \quad A \in \mathcal{A}.$$
Thus (3.4.4) is obtained.
Proposition 3.4.3. Let ξ ∈ bca(A, X) be an X-valued measure of bounded operator semivariation such that $\xi \ll \nu$, where ν is σ-finite, and suppose that ξ has a weak Radon-Nikodým derivative ξ′ with respect to ν. If $\Phi \in L^1_{DS}(\xi)$ is ξ-integrable in the DS-sense, then the integral (3.4.1) is independent of the choice of a determining sequence.
Proof. Let $\{\Phi_n\}, \{\Psi_n\} \subset L^0(\Theta; B(H))$ be two determining sequences for Φ. For each n ≥ 1 and θ ∈ Θ let $z_n(\theta)$ be defined by
$$z_n(\theta) = \begin{cases} 0 & \text{if } \Phi_n(\theta)\xi'(\theta) \not\to \Phi(\theta)\xi'(\theta) \text{ or } \Psi_n(\theta)\xi'(\theta) \not\to \Phi(\theta)\xi'(\theta), \\ \Phi_n(\theta)\xi'(\theta) - \Psi_n(\theta)\xi'(\theta) & \text{otherwise.} \end{cases}$$
Then we see that $\|z_n(\theta)\|_X \to 0$ for every θ ∈ Θ and the sequence $\{\int_A z_n(\theta)\, \nu(d\theta)\}$ is convergent in X for every A ∈ A. Since $\Phi_n - \Psi_n$ is a B(H)-valued A-simple function, we have that
$$\lim_{\nu(A) \to 0} \int_A (\Phi_n - \Psi_n)\, d\xi = 0, \quad n \ge 1, \qquad \text{or} \qquad \lim_{\nu(A) \to 0} \int_A z_n(\theta)\, \nu(d\theta) = 0, \quad n \ge 1. \qquad (3.4.5)$$
Since $\{\int_A z_n\, d\nu\}$ converges for every A ∈ A, by the Vitali-Hahn-Saks theorem the limit in (3.4.5) is uniform in n. Hence, for any ε > 0 there exists a
δ > 0 such that
$$\nu(A) < \delta \implies \Big\|\int_A z_n\, d\nu\Big\|_X < \varepsilon, \qquad n \ge 1.$$
By Egoroff's theorem there exists an $A_0 \in \mathcal{A}$ such that $\nu(A_0) < \delta$ and $z_n(\theta) \to 0$ uniformly on $A_0^c$. Once ε > 0 and δ > 0 are chosen, there exists a positive integer N such that for any n ≥ N, $\|z_n(\theta)\|_X < \varepsilon$ for every $\theta \in A_0^c$ and
$$\Big\|\int_A z_n\, d\nu\Big\|_X \le \Big\|\int_{A \cap A_0^c} z_n\, d\nu\Big\|_X + \Big\|\int_{A \cap A_0} z_n\, d\nu\Big\|_X \le \varepsilon\nu(\Theta) + \varepsilon$$
uniformly for A ∈ A. Since ε > 0 is arbitrary,
$$\lim_{n \to \infty} \Big\|\int_A z_n\, d\nu\Big\|_X = 0$$
for every A ∈ A. Therefore, the integral $\int_A \Phi\, d\xi$ is independent of the choice of a determining sequence and is well-defined for every A ∈ A.
We shall prove that for an X-valued measure ξ of bounded operator semivariation with a weak Radon-Nikodým derivative, the space $L^1_*(\xi)$ is a Banach space with the norm $\|\cdot\|_\xi$ given by (3.4.3).
Theorem 3.4.4. Suppose that A has a countable generator. Then, for any X-valued measure ξ ∈ bca(A, X) of bounded operator semivariation, the space $L^1_*(\xi)$ is a Banach space with the norm $\|\cdot\|_\xi$ given by (3.4.3).
Proof. We shall show the completeness of the space. By the assumption on A, ξ has a weak Radon-Nikodým derivative with respect to a suitable dominating measure ν, i.e., $\xi_\nu' = d\xi/d\nu$. Let $\{\Phi_n\} \subset L^1_*(\xi)$ be a Cauchy sequence. Then $\|\Phi_n - \Phi_m\|_\xi = \|\xi_{\Phi_n} - \xi_{\Phi_m}\|_o(\Theta) \to 0$ as n, m → ∞. Hence $\{\xi_{\Phi_n}\}$ is a Cauchy sequence in bca(A, X). Since the space (bca(A, X), $\|\cdot\|_o(\Theta)$) is a Banach space (cf. [13, p. 60]), there exists an
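X-valued measure ζ ∈ bca(A, X) such that
$$\|\xi_{\Phi_n} - \zeta\|_o(\Theta) \to 0 \qquad (n \to \infty). \qquad (3.4.6)$$
Note that $\xi_{\Phi_n} \ll \nu$ and $\xi_{\Phi_n,\nu}' \equiv d\xi_{\Phi_n}/d\nu = \Phi_n\xi_\nu'$ for n ≥ 1 by (3.4.4). Let ρ be a dominating measure for ζ and set µ = ν + ρ. Then weak Radon-Nikodým derivatives of ξ and ζ with respect to µ are given respectively by $\xi_\mu' = \xi_\nu'(d\nu/d\mu)$ and $\zeta_\mu' = \zeta_\rho'(d\rho/d\mu)$, and hence $\xi_{\Phi_n,\mu}' \equiv d\xi_{\Phi_n}/d\mu = \Phi_n\xi_\mu'$ for n ≥ 1. We now show that $\zeta_\mu' = \zeta_\mu' J_{N(\xi_\mu')^\perp}$ µ-a.e., where both sides are regarded as operators in S(K, H) = X, so that $J_{N(\xi_\mu')^\perp}$ is an orthogonal projection on K. Suppose the contrary, and let $B = \{\theta \in \Theta : \zeta_\mu'(\theta) \ne \zeta_\mu'(\theta) J_{N(\xi_\mu')^\perp}\}$ be such that µ(B) > 0. Hence, for some $f \in N(\xi_\mu')$ and φ ∈ H with $\|f\|_K = 1 = \|\varphi\|_H$ we have $\mu(B_{f,\varphi}) > 0$, where
$$B_{f,\varphi} = \{\theta \in \Theta : (\zeta_\mu'(\theta)f, \varphi)_H \ne (\zeta_\mu'(\theta) J_{N(\xi_\mu')^\perp} f, \varphi)_H = 0\}.$$
This implies that the variation of the scalar measure $(\zeta(\cdot)f, \varphi)_H$ is positive on $B_{f,\varphi}$, i.e.,
$$|(\zeta(\cdot)f, \varphi)_H|(B_{f,\varphi}) = \int_{B_{f,\varphi}} |(\zeta_\mu' f, \varphi)_H|\, d\mu > 0. \qquad (3.4.7)$$
Note that, since $f \in N(\xi_\mu')$,
$$|((\xi_{\Phi_n}(\cdot) - \zeta(\cdot))f, \varphi)_H|(B_{f,\varphi}) = \int_{B_{f,\varphi}} |((\Phi_n\xi_\mu' - \zeta_\mu')f, \varphi)_H|\, d\mu = \int_{B_{f,\varphi}} |(\zeta_\mu' f, \varphi)_H|\, d\mu = |(\zeta(\cdot)f, \varphi)_H|(B_{f,\varphi}).$$
On the other hand, it follows from (3.2.1) and (3.4.6) that
$$\begin{aligned}
|((\xi_{\Phi_n}(\cdot) - \zeta(\cdot))f, \varphi)_H|(B_{f,\varphi}) &= \int_{B_{f,\varphi}} |((\Phi_n\xi_\mu' - \zeta_\mu')f, \varphi)_H|\, d\mu \\
&= \int_{B_{f,\varphi}} |((\Phi_n\xi_\mu' - \zeta_\mu')(f \otimes \varphi)\varphi, \varphi)_H|\, d\mu \\
&= \int_{B_{f,\varphi}} |((\Phi_n\xi_\mu' - \zeta_\mu')(\varphi \otimes f)^*\varphi, \varphi)_H|\, d\mu
\end{aligned}$$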
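$$\begin{aligned}
&= \int_{B_{f,\varphi}} |([\Phi_n\xi_\mu' - \zeta_\mu', \varphi \otimes f]\varphi, \varphi)_H|\, d\mu \le \int_{B_{f,\varphi}} \|[\Phi_n\xi_\mu' - \zeta_\mu', \varphi \otimes f]\|_\tau\, d\mu \\
&= |[\xi_{\Phi_n}(\cdot) - \zeta(\cdot), \varphi \otimes f]|(B_{f,\varphi}) \le \|\xi_{\Phi_n} - \zeta\|_o(B_{f,\varphi}) \to 0 \qquad (n \to \infty),
\end{aligned}$$
which contradicts (3.4.7). Thus $\zeta_\mu' = \zeta_\mu' J_{N(\xi_\mu')^\perp}$ µ-a.e., as claimed.
Finally, let $\Phi = \zeta_\mu' \xi_\mu'^-$. Then Φ is an O(H)-valued A-measurable function on Θ by Remark 3.2.4 (3). Moreover, the following computation is valid for A ∈ A:
$$\begin{aligned}
\zeta(A) &= \int_A \zeta_\mu'\, d\mu = \int_A \zeta_\mu' J_{N(\xi_\mu')^\perp}\, d\mu \\
&= \int_A \zeta_\mu' \xi_\mu'^- \xi_\mu'\, d\mu \qquad \text{by Remark 3.2.4 (1)(b)}, \\
&= \int_A \Phi\xi_\mu'\, d\mu = \int_A \Phi\, d\xi = \xi_\Phi(A).
\end{aligned}$$
This shows that $\zeta = \xi_\Phi$ and $\Phi \in L^1_*(\xi)$. Since $\|\Phi_n - \Phi\|_\xi = \|\xi_{\Phi_n} - \zeta\|_o(\Theta) \to 0$, the space $L^1_*(\xi)$ is complete.
It seems quite natural that the space $L^0(\Theta; B(H))$ is dense in $L^1_*(\xi)$ for ξ ∈ bca(A, X) with a weak Radon-Nikodým derivative. For a gramian orthogonally scattered measure ξ ∈ cagos(A, X) this is easily shown as follows. Let $\nu(\cdot) = \|\xi(\cdot)\|_X^2$. Then, since X is separable and $\xi \ll \nu$, ξ has a weak Radon-Nikodým derivative with respect to ν by Corollary 2.4.3. Take any $\Phi \in L^1_{DS}(\xi)$ and let $\{\Phi_n\} \subseteq L^0(\Theta; B(H))$ be a determining sequence for Φ. It follows from Remark 3.4.2 (1) that $\xi_{\Phi_n - \Phi} \in$ cagos(A, X) for n ≥ 1, and hence
$$\|\Phi_n - \Phi\|_\xi = \|\xi_{\Phi_n - \Phi}\|_o(\Theta) = \|\xi_{\Phi_n - \Phi}(\Theta)\|_X = \Big\|\int_\Theta (\Phi_n - \Phi)\, d\xi\Big\|_X \to 0.$$
The general case is proved below.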
Theorem 3.4.5. For any X-valued measure ξ ∈ bca(A, X) of bounded operator semivariation with a weak Radon-Nikodým derivative, the space $L^0(\Theta; B(H))$ is dense in the Banach space $(L^1_*(\xi), \|\cdot\|_\xi)$.
Proof. Let ν be a finite measure such that $\xi \ll \nu$, and let $\xi' = d\xi/d\nu$ be the weak Radon-Nikodým derivative of ξ with respect to ν. Let $\Phi \in L^1_*(\xi)$ and let $\{\Phi_n\} \subseteq L^0(\Theta; B(H))$ be its determining sequence. Define a sequence $\{x_n\}$ of X-valued functions on Θ by
$$x_n(\theta) = \begin{cases} 0 & \text{if } \Phi_n(\theta)\xi'(\theta) \not\to \Phi(\theta)\xi'(\theta), \\ \Phi_n(\theta)\xi'(\theta) - \Phi(\theta)\xi'(\theta) & \text{otherwise.} \end{cases}$$
Then we see that $\|x_n(\theta)\|_X \to 0$ for all θ ∈ Θ and
$$\lim_{\nu(A) \to 0} \int_A x_n(\theta)\, \nu(d\theta) = 0, \qquad n \ge 1. \qquad (3.4.8)$$
By the Vitali-Hahn-Saks theorem the limit in (3.4.8) is uniform in n. Let ε > 0 be given. Then there exists a δ > 0 such that δ ≤ ε and
$$\nu(A) < \delta \implies \Big\|\int_A x_n(\theta)\, \nu(d\theta)\Big\|_X < \varepsilon, \qquad n \ge 1.$$
By Egoroff's theorem there exists an $A_0 \in \mathcal{A}$ such that $\nu(A_0) < \delta$ and $x_n(\theta) \to 0$ uniformly on $A_0^c$. Once ε > 0 and δ > 0 are chosen, there exists an integer N ≥ 1 such that
$$\|x_n(\theta)\|_X < \varepsilon, \qquad n \ge N,\ \theta \in A_0^c. \qquad (3.4.9)$$
Now let $\{A_1, A_2, \dots, A_m\} \in \Pi(A_0^c)$ and $\{a_1, a_2, \dots, a_m\} \subseteq B(H)$ be such that $\|a_i\| \le 1$ for i = 1, 2, …, m. Then it follows from (3.4.9) that
$$\begin{aligned}
\Big\|\sum_{i=1}^m a_i\, \xi_{\Phi_n - \Phi}(A_i)\Big\|_X &= \Big\|\sum_{i=1}^m a_i \int_{A_i} (\Phi_n - \Phi)\, d\xi\Big\|_X = \Big\|\sum_{i=1}^m a_i \int_{A_i} (\Phi_n - \Phi)\xi'\, d\nu\Big\|_X \\
&= \Big\|\sum_{i=1}^m a_i \int_{A_i} x_n(\theta)\, \nu(d\theta)\Big\|_X \le \sum_{i=1}^m \Big\|\int_{A_i} x_n(\theta)\, \nu(d\theta)\Big\|_X \\
&\le \sum_{i=1}^m \int_{A_i} \|x_n(\theta)\|_X\, \nu(d\theta) \le \sum_{i=1}^m \varepsilon\nu(A_i) = \varepsilon\nu(A_0^c)
\end{aligned}$$
for every n ≥ N. This shows that
$$\|\xi_{\Phi_n - \Phi}\|_o(A_0^c) \le \varepsilon\nu(A_0^c), \qquad n \ge N.$$
On the other hand, we have that
$$\|\xi_{\Phi_n - \Phi}\|_o(A_0) \le \int_{A_0} \|x_n(\theta)\|_X\, \nu(d\theta) \le C\delta \le C\varepsilon, \qquad n \ge 1,$$
where C > 0 satisfies $\|x_n(\theta)\|_X \le C$ for $\theta \in A_0$ and n ≥ 1 by the uniform boundedness principle. Thus we have that
$$\|\xi_{\Phi_n - \Phi}\|_o(\Theta) \le \|\xi_{\Phi_n - \Phi}\|_o(A_0^c) + \|\xi_{\Phi_n - \Phi}\|_o(A_0) \le \varepsilon(\nu(A_0^c) + C)$$
for every n ≥ N. Therefore, $\|\Phi_n - \Phi\|_\xi = \|\xi_{\Phi_n - \Phi}\|_o(\Theta) \to 0$ as n → ∞, completing the proof.
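Two generalized inverses appear in the completeness proofs of this section: Φ = Ψξ′⁻ in Theorem 3.2.5 and Φ = ζ′ξ′⁻ in Theorem 3.4.4. In finite dimensions the generalized inverse $a^- = J_{N(a)^\perp} a^{-1} J_{R(a)}$ is the Moore-Penrose inverse, so its properties can be checked numerically. The sketch below computes $a^-$ from a singular value decomposition with an explicit cutoff. The helper `gen_inverse` and the tolerance 1e-10 are illustrative assumptions. The code checks properties (b) to (e) of Remark 3.2.4 (1) on rank-deficient matrices. It also checks the single-fibre step of Theorem 3.4.4: if ζ′ vanishes on N(ξ′), then Φ = ζ′ξ′⁻ satisfies Φξ′ = ζ′.

```python
import numpy as np

rng = np.random.default_rng(7)

def cmat(r, c):
    return rng.standard_normal((r, c)) + 1j * rng.standard_normal((r, c))

def gen_inverse(a, tol=1e-10):
    """Finite-dimensional a⁻ = J_{N(a)⊥} a^{-1} J_{R(a)} (Moore-Penrose inverse via SVD)."""
    u, s, vh = np.linalg.svd(a, full_matrices=False)
    s_inv = np.array([1.0 / x if x > tol else 0.0 for x in s])
    return (vh.conj().T * s_inv) @ u.conj().T

def is_orth_projection(p):
    return np.allclose(p, p.conj().T) and np.allclose(p @ p, p)

a = cmat(4, 2) @ cmat(2, 4)          # an operator on H = C^4 of rank 2
a_minus = gen_inverse(a)
P = a_minus @ a                      # (b): should equal J_{N(a)⊥}
Q = a @ a_minus                      # (c): should equal J_{R(a)}
checks = {
    "b": is_orth_projection(P) and np.allclose(a @ (np.eye(4) - P), 0),
    "c": is_orth_projection(Q) and np.allclose(Q @ a, a),
    "d": np.allclose(a_minus, gen_inverse(a.conj().T @ a) @ a.conj().T),
}

# (e): a nonnegative operator Σ λ_j E_j with a zero eigenvalue
V, _ = np.linalg.qr(cmat(4, 4))
lam = np.array([3.0, 1.5, 0.5, 0.0])
c = V @ np.diag(lam) @ V.conj().T
lam_minus = np.array([1.0 / l if l != 0 else 0.0 for l in lam])
checks["e"] = np.allclose(gen_inverse(c), V @ np.diag(lam_minus) @ V.conj().T)

# One fibre of Theorem 3.4.4: ξ'(θ) ∈ S(K, H) with K = C^5, H = C^3
xi_p = cmat(3, 2) @ cmat(2, 5)       # rank 2, so N(ξ') ≠ {0}
J = gen_inverse(xi_p) @ xi_p         # J_{N(ξ')⊥}, a projection on K
zeta_p = cmat(3, 5) @ J              # ζ' = ζ' J_{N(ξ')⊥}
Phi = zeta_p @ gen_inverse(xi_p)     # Φ = ζ' ξ'⁻, an operator on H
checks["construction"] = np.allclose(Phi @ xi_p, zeta_p)
print(checks)
```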
3.5. The spaces $L^1_{DS}(\eta)$ and $L^2(F_\eta)$

As before, assume that H and K are separable Hilbert spaces and consider X = S(K, H)-valued gramian orthogonally scattered measures. For such a measure η ∈ cagos(A, X) we shall show the equality $L^1_{DS}(\eta) = L^2(F_\eta)$, where $L^2(F_\eta)$ is the $L^2$-space for the T(H)-valued measure $F_\eta$ induced by η:
$$F_\eta(A) = [\eta(A), \eta(A)], \qquad A \in \mathcal{A}. \qquad (3.5.1)$$
Recall that T(H) has the Radon-Nikodým property since it is a separable dual space. We begin with a definition.
Definition 3.5.1. Let $F \in \mathrm{ca}(\mathcal{A}, T^+(H))$ be a $T^+(H)$-valued measure, where $T^+(H) = \{a \in T(H) : a \ge 0\}$. Then the variation $\nu(\cdot) = |F|(\cdot) = \|F(\cdot)\|_\tau$ of F is a finite positive measure (on nonnegative operators the trace norm coincides with the trace and is therefore additive). Let $F' = dF/d\nu$
be the Radon-Nikodým derivative of F with respect to ν, so that $F' \in L^1(\Theta, \nu; T(H))$ and
$$F(A) = \int_A F'\, d\nu, \qquad A \in \mathcal{A},$$
where the integral is in the sense of Bochner. Let Φ and Ψ be two O(H)-valued A-measurable functions on Θ. Then the pair (Φ, Ψ) is said to be F-integrable if $\Phi F'^{1/2}$ and $\Psi F'^{1/2}$ are S(H)-valued functions and the T(H)-valued function $(\Phi F'^{1/2})(\Psi F'^{1/2})^*$ is Bochner integrable with respect to ν, i.e., $(\Phi F'^{1/2})(\Psi F'^{1/2})^* \in L^1(\Theta, \nu; T(H))$. In this case we write
$$[\Phi, \Psi]_F = \int_\Theta \Phi\, dF\, \Psi^* = \int_\Theta (\Phi F'^{1/2})(\Psi F'^{1/2})^*\, d\nu \in T(H). \qquad (3.5.2)$$
Then the space $L^2(F)$ is defined by
$$L^2(F) = \{\Phi : \Phi \text{ is O(H)-valued, A-measurable, and } (\Phi, \Phi) \text{ is } F\text{-integrable}\}.$$
Two functions $\Phi, \Psi \in L^2(F)$ are identified if $\Phi F'^{1/2} = \Psi F'^{1/2}$ ν-a.e. The norm in $L^2(F)$ is defined by
$$\|\Phi\|_F = \|[\Phi, \Phi]_F\|_\tau^{1/2}, \qquad \Phi \in L^2(F).$$
It is known that for a $T^+(H)$-valued measure $F \in \mathrm{ca}(\mathcal{A}, T^+(H))$ the space $L^2(F)$ is a normal Hilbert B(H)-module with gramian $[\cdot, \cdot]_F$ given by (3.5.2), and that $L^0(\Theta; B(H))$ is dense in it (cf. Mandrekar and Salehi [20]). Let η ∈ cagos(A, X) be an X-valued gramian orthogonally scattered measure and $F_\eta$ the $T^+(H)$-valued measure defined by (3.5.1). Hence the space $L^2(F_\eta)$ is constructed. In [20], the integral $\int_A \Phi\, d\eta$ is defined for every $\Phi \in L^2(F_\eta)$ and A ∈ A. Note that η has a weak Radon-Nikodým derivative with respect to $\nu_\eta(\cdot) = \|\eta(\cdot)\|_X^2 = \|F_\eta(\cdot)\|_\tau = |F_\eta|(\cdot)$, since we assume that X is separable. We shall prove the following theorem.
Theorem 3.5.2. Let η ∈ cagos(A, X) be an X-valued gramian orthogonally scattered measure such that $\eta \ll \nu$, where ν is a σ-finite measure, and let η′ be the weak Radon-Nikodým derivative of η with respect to ν. Then
$$(L^1_{DS}(\eta), \|\cdot\|_\eta) = (L^2(F_\eta), \|\cdot\|_{F_\eta}).$$
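Proof. Both spaces are Banach spaces, and $L^0(\Theta; B(H))$ is dense in each of them. Hence it suffices to show that the two norms $\|\cdot\|_\eta$ and $\|\cdot\|_{F_\eta}$ agree on $L^0(\Theta; B(H))$. Indeed, for $\Phi \in L^0(\Theta; B(H))$ we have
$$\|\Phi\|_\eta^2 = \|\eta_\Phi\|_o(\Theta)^2 = \|\eta_\Phi(\Theta)\|_X^2 = \Big\|\int_\Theta \Phi\, d\eta\Big\|_X^2 = \Big\|\Big[\int_\Theta \Phi\, d\eta, \int_\Theta \Phi\, d\eta\Big]\Big\|_\tau = \Big\|\int_\Theta \Phi\, dF_\eta\, \Phi^*\Big\|_\tau = \|[\Phi, \Phi]_{F_\eta}\|_\tau = \|\Phi\|_{F_\eta}^2,$$
where $\eta_\Phi(\cdot) = \int_{(\cdot)} \Phi\, d\eta$. Thus
$$\|\Phi\|_\eta = \|\Phi\|_{F_\eta}, \qquad \Phi \in L^0(\Theta; B(H)).$$
Therefore, the theorem is proved.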
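The norm identity at the heart of the proof can be checked numerically on simple functions in a finite-dimensional toy model. The sketch below is illustrative only. It assumes $H = \mathbb{C}^2$, $K = \mathbb{C}^4$, $X = S(K, H)$ realized as $2 \times 4$ complex matrices with gramian $[x, y] = xy^*$, and a two-point $\Theta = \{0, 1\}$. The atoms `eta[0]` and `eta[1]` are hypothetical and are supported on disjoint coordinate blocks of $K$, which makes $\eta$ gramian orthogonally scattered. The code compares $\|\int_\Theta \Phi\, d\eta\|_X^2$ with $\|[\Phi, \Phi]_{F_\eta}\|_\tau$, where $F_\eta(\cdot) = [\eta(\cdot), \eta(\cdot)]$.

```python
# Toy check of ||int Phi d eta||_X^2 = ||[Phi, Phi]_{F_eta}||_tau for a simple Phi.
# Assumed model (illustrative): H = C^2, K = C^4, X = S(K, H) = 2x4 complex
# matrices, gramian [x, y] = x y^*, and Theta = {0, 1}.


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def adjoint(A):
    return [[A[i][j].conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def gramian(x, y):
    return matmul(x, adjoint(y))  # [x, y] = x y^* in T(H)


# Hypothetical atoms: eta({0}) lives on columns 0-1 of K and eta({1}) on
# columns 2-3, so [eta({0}), eta({1})] = 0 (gramian orthogonally scattered).
eta = {
    0: [[1, 2j, 0, 0], [0.5, -1, 0, 0]],
    1: [[0, 0, 3, 1], [0, 0, 1j, 2]],
}
F = {theta: gramian(x, x) for theta, x in eta.items()}  # F_eta({theta})

# Simple B(H)-valued function Phi = a_0 1_{0} + a_1 1_{1}.
Phi = {0: [[1, 1j], [0, 2]], 1: [[0.5, 0], [-1, 1]]}

# ||int Phi d eta||_X^2 = trace [y, y] with y = a_0 eta({0}) + a_1 eta({1}).
y = add(matmul(Phi[0], eta[0]), matmul(Phi[1], eta[1]))
lhs = trace(gramian(y, y)).real

# [Phi, Phi]_F = sum_theta a_theta F({theta}) a_theta^* is positive, so its
# trace norm equals its trace.
PhiFPhi = add(matmul(matmul(Phi[0], F[0]), adjoint(Phi[0])),
              matmul(matmul(Phi[1], F[1]), adjoint(Phi[1])))
rhs = trace(PhiFPhi).real

print(lhs, rhs)
```

Any other choice of the two $2 \times 2$ matrices in `Phi` also makes the two numbers agree, because the cross terms vanish by gramian orthogonality. This is the agreement of norms on $L^0(\Theta; B(H))$ used in the proof. The density argument is the part a finite model cannot show.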
Remark 3.5.3. Let $\eta \in \mathrm{cagos}(\mathcal{A}, X)$ be an $X$-valued gramian orthogonally scattered measure and let $F_\eta$ be defined by (3.5.1). Then, for $\Phi \in L^2(F_\eta)$ and $A \in \mathcal{A}$, the integral $\int_A \Phi\, d\eta$ is well-defined without using the weak Radon-Nikodým derivative of $\eta$ (cf., e.g., Mandrekar and Salehi [20] and Kakihara [13, p. 118]).

3.6. The spaces $L^1_*(\xi)$ and $L^2_*(M_\xi)$

Scalar and operator valued bimeasures play an important role in the study of second order stochastic processes. We examined scalar bimeasures in Section 2.6. In this subsection we study $T(H)$-valued bimeasures. Any $X$-valued measure $\xi \in \mathrm{ca}(\mathcal{A}, X)$ induces a $T(H)$-valued bimeasure $M_\xi$. We shall show that if $\xi$ is of bounded operator semivariation and has a weak Radon-Nikodým derivative, then $L^1_*(\xi) \subseteq L^2_*(M_\xi)$, with equality when $\Theta$ is a locally compact abelian group (Theorem 3.6.2). Here the space $L^2_*(M_\xi)$ is defined in Definition 3.6.1.

Now we define $T(H)$-valued bimeasures. Let $\mathcal{A} \times \mathcal{A} = \{A \times B : A, B \in \mathcal{A}\}$. A mapping $M : \mathcal{A} \times \mathcal{A} \to T(H)$ is said to be a $T(H)$-valued bimeasure if
(1) $M(A, \cdot), M(\cdot, A) \in \mathrm{ca}(\mathcal{A}, T(H))$ for $A \in \mathcal{A}$;
(2) For any positive integer $n$, $a_1, \dots, a_n \in B(H)$ and $A_1, \dots, A_n \in \mathcal{A}$ it holds that
$$\sum_{j,k=1}^n a_j M(A_j, A_k) a_k^* \ge 0.$$
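As a sanity check of condition (2), consider the bimeasure $M_\xi(A, B) = [\xi(A), \xi(B)]$ induced by an $X$-valued measure $\xi$, which is introduced just below. Write the gramian on $X = S(K, H)$ as $[x, y] = xy^*$, the form used later in this section (e.g. $\Phi\xi'\xi(A)^* = [\Phi\xi', \xi(A)]$). The following short computation is a sketch and is not quoted from the text. It shows that positive definiteness then reduces to positivity of a single gramian:

```latex
\sum_{j,k=1}^n a_j\, M_\xi(A_j, A_k)\, a_k^*
  = \sum_{j,k=1}^n a_j\, \xi(A_j)\, \xi(A_k)^*\, a_k^*
  = \Big(\sum_{j=1}^n a_j \xi(A_j)\Big)\Big(\sum_{k=1}^n a_k \xi(A_k)\Big)^*
  = \Big[\sum_{j=1}^n a_j \xi(A_j),\ \sum_{k=1}^n a_k \xi(A_k)\Big] \ \ge\ 0 .
```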
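Let $\mathcal{M} = \mathcal{M}(\mathcal{A} \times \mathcal{A}; T(H))$ denote the set of all $T(H)$-valued bimeasures on $\mathcal{A} \times \mathcal{A}$. For a bimeasure $M \in \mathcal{M}$, the variation $|M|(\cdot,\cdot)$ and the operator semivariation $\|M\|_o(\cdot,\cdot)$ are defined for $A, B \in \mathcal{A}$ respectively by
$$|M|(A, B) = \sup\Big\{\sum_{\Delta \in \pi,\, \Delta' \in \pi'} \|M(\Delta, \Delta')\|_\tau : \pi \in \Pi(A),\ \pi' \in \Pi(B)\Big\},$$
$$\|M\|_o(A, B) = \sup\Big\|\sum_{\Delta \in \pi,\, \Delta' \in \pi'} a_\Delta M(\Delta, \Delta') b_{\Delta'}^*\Big\|_\tau,$$
where the second supremum is taken over all partitions $\pi \in \Pi(A)$ and $\pi' \in \Pi(B)$, and over all $a_\Delta, b_{\Delta'} \in B(H)$ with $\|a_\Delta\|, \|b_{\Delta'}\| \le 1$ for $\Delta \in \pi$ and $\Delta' \in \pi'$. Let $\mathcal{M}_v$ and $\mathcal{M}_b$ denote the sets of all bimeasures $M \in \mathcal{M}$ that are of bounded variation or of bounded operator semivariation, respectively. If $H = \mathbb{C}$, the complex number field, then we consider scalar bimeasures and write $\mathfrak{M}$ and $\mathfrak{M}_v$ for the corresponding classes. Typical examples of bimeasures are $M_\xi \in \mathcal{M}$ and $m_\xi \in \mathfrak{M}$, induced by an $X$-valued measure $\xi \in \mathrm{ca}(\mathcal{A}, X)$, where
$$M_\xi(A, B) = [\xi(A), \xi(B)], \qquad m_\xi(A, B) = (\xi(A), \xi(B))_X, \qquad A, B \in \mathcal{A}.$$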
In this case, $\|M_\xi\|_o(\Theta, \Theta) = \|\xi\|_o(\Theta)^2$. Conversely, if a $T(H)$-valued bimeasure $M$ is given, then there exist a reproducing kernel normal Hilbert $B(H)$-module $X_M$ and an $X_M$-valued measure $\xi$ such that $M = M_\xi$ (cf. [13, p. 37]). Now let $\xi \in \mathrm{bca}(\mathcal{A}, X)$ be an $X$-valued measure of bounded operator semivariation. Then $M_\xi(A, \cdot), M_\xi(\cdot, A) \in \mathrm{vca}(\mathcal{A}, T(H))$, i.e., they are $T(H)$-valued measures on $\mathcal{A}$ of bounded variation. This follows from (3.2.1): if we write
$$(\xi \circ x)(\cdot) = [\xi(\cdot), x], \qquad x \in X,$$
then $M_\xi(A, \cdot) = (\xi(A) \circ \xi)(\cdot)$ and $M_\xi(\cdot, A) = (\xi \circ \xi(A))(\cdot)$ belong to $\mathrm{vca}(\mathcal{A}, T(H))$. For a $T(H)$-valued measure $F \in \mathrm{vca}(\mathcal{A}, T(H))$ of bounded variation we can
define an $L^1$-space $L^1(F)$ by
$$L^1(F) = \{\Phi : \Phi \text{ is } O(H)\text{-valued, } \mathcal{A}\text{-measurable, and } \Phi F' \in L^1(\Theta, |F|; T(H))\},$$
where $|F|$ is the variation of $F$ and $F' = dF/d|F|$ is the Radon-Nikodým derivative of $F$ with respect to $|F|$, which exists since $T(H)$ has the Radon-Nikodým property. The space $L^1(F)$ is a Banach space with norm
$$\|\Phi\|_{1,F} = \|\Phi F'\|_{1,|F|} = \int_\Theta \|\Phi F'\|_\tau\, d|F|, \qquad \Phi \in L^1(F),$$
and $L^0(\Theta; B(H))$ is dense in it (cf. [14, Theorems 2.5 and 2.6]).

Definition 3.6.1. Let $M \in \mathcal{M}_b$ be a $T(H)$-valued bimeasure of bounded operator semivariation on $\mathcal{A} \times \mathcal{A}$. Let $\Phi$ and $\Psi$ be $O(H)$-valued $\mathcal{A}$-measurable functions on $\Theta$. The pair $(\Phi, \Psi)$ is said to be $M$-integrable if
(a) $\Phi, \Psi \in L^1(M(\cdot, A))$ for every $A \in \mathcal{A}$;
(b) $M_1(\cdot) \equiv \int_\Theta \Psi(\lambda)\, M(\cdot, d\lambda)^*$ and $M_2(\cdot) \equiv \int_\Theta \Phi(\theta)\, M(d\theta, \cdot)$ belong to $\mathrm{vca}(\mathcal{A}, T(H))$;
(c) $\Phi \in L^1(M_1^*)$, $\Psi \in L^1(M_2^*)$ and
$$\int_\Theta \Phi(\theta)\, M_1^*(d\theta) = \Big(\int_\Theta \Psi(\lambda)\, M_2^*(d\lambda)\Big)^*.$$
Let $L^2(M)$ denote the set of all $O(H)$-valued $\mathcal{A}$-measurable functions $\Phi$ on $\Theta$ such that $(\Phi, \Phi)$ is $M$-integrable. The pair $(\Phi, \Psi)$ is said to be strictly $M$-integrable if (a) above and (b'), (c') below are true:
(b') $M_1^D(\cdot) \equiv \int_D \Psi(\lambda)\, M(\cdot, d\lambda)^*$ and $M_2^C(\cdot) \equiv \int_C \Phi(\theta)\, M(d\theta, \cdot)$ belong to $\mathrm{vca}(\mathcal{A}, T(H))$ for every $C, D \in \mathcal{A}$;
(c') For every $C, D \in \mathcal{A}$ it holds that $\Phi \in L^1((M_1^D)^*)$, $\Psi \in L^1((M_2^C)^*)$ and
$$\int_C \Phi(\theta)\, (M_1^D)^*(d\theta) = \Big(\int_D \Psi(\lambda)\, (M_2^C)^*(d\lambda)\Big)^*.$$
The common value of the above is denoted by $\int_C \int_D^* \Phi\, dM\, \Psi^*$. Let $L^2_*(M)$ denote the set of all $O(H)$-valued $\mathcal{A}$-measurable functions $\Phi$ such that $(\Phi, \Phi)$ is strictly $M$-integrable. [Recall that $(\Theta, \mathcal{A})$ is the basic measurable space.]
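When $\Theta$ is finite, every integral in (b') and (c') is a finite sum over atoms. In that case the strict integral collapses to $\sum_{\theta \in C}\sum_{\lambda \in D} \Phi(\theta) M(\{\theta\}, \{\lambda\}) \Psi(\lambda)^*$. The sketch below uses assumed data: three atoms, $H = \mathbb{C}^2$, $K = \mathbb{C}^3$, hypothetical matrices `xi`, `Phi` and `Psi`, and gramian $[x, y] = xy^*$. For $M = M_\xi$ it evaluates both sides of (c') and compares them with $[\xi_\Phi(C), \xi_\Psi(D)]$. This is the value that appears in the proof of Theorem 3.6.2 when $\Psi = \Phi$.

```python
# Finite-Theta sketch of condition (c') in Definition 3.6.1 for M = M_xi.
# Assumed model: Theta = {0, 1, 2}, H = C^2, K = C^3, X = 2x3 matrices,
# gramian [x, y] = x y^*. All data below are hypothetical.


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def adjoint(A):
    return [[A[i][j].conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def zeros(n, m):
    return [[0j] * m for _ in range(n)]


def gramian(x, y):
    return matmul(x, adjoint(y))


xi = {0: [[1, 0, 1j], [0, 1, 0]],
      1: [[1, 1, 0], [0, 2, 1]],
      2: [[0, 1j, 1], [1, 0, 0]]}
M = {(t, l): gramian(xi[t], xi[l]) for t in xi for l in xi}  # M_xi on atoms

Phi = {0: [[1, 0], [1j, 1]], 1: [[2, 1], [0, 1]], 2: [[0, 1], [1, 0]]}
Psi = {0: [[1, 1], [0, 1]], 1: [[0, 1j], [1, 0]], 2: [[1, 0], [0, -1]]}
C, D = [0, 1], [1, 2]

# Left side: int_C Phi d(M_1^D)^*, with (M_1^D)^*({t}) = sum_{l in D} M({t},{l}) Psi(l)^*.
left = zeros(2, 2)
for t in C:
    m1 = zeros(2, 2)
    for l in D:
        m1 = add(m1, matmul(M[(t, l)], adjoint(Psi[l])))
    left = add(left, matmul(Phi[t], m1))

# Right side: (int_D Psi d(M_2^C)^*)^*, with (M_2^C)^*({l}) = sum_{t in C} M({t},{l})^* Phi(t)^*.
inner = zeros(2, 2)
for l in D:
    m2 = zeros(2, 2)
    for t in C:
        m2 = add(m2, matmul(adjoint(M[(t, l)]), adjoint(Phi[t])))
    inner = add(inner, matmul(Psi[l], m2))
right = adjoint(inner)

# Both sides should equal [xi_Phi(C), xi_Psi(D)].
xi_Phi_C, xi_Psi_D = zeros(2, 3), zeros(2, 3)
for t in C:
    xi_Phi_C = add(xi_Phi_C, matmul(Phi[t], xi[t]))
for l in D:
    xi_Psi_D = add(xi_Psi_D, matmul(Psi[l], xi[l]))
expected = gramian(xi_Phi_C, xi_Psi_D)
```

In this discrete setting the equality of `left` and `right` is automatic. The content of Definition 3.6.1 lies in requiring the intermediate measures $M_1^D$ and $M_2^C$ to be of bounded variation when $\Theta$ is infinite.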
Before we prove the next theorem, let us recall that if $\xi \in \mathrm{bca}(\mathcal{A}, X)$ has a weak Radon-Nikodým derivative $\xi'$ with respect to $\nu$ such that $\xi \ll \nu$, then
$$\|\xi\|_o(\Theta) = \sup\{|\xi \circ x|(\Theta) : x \in X,\ \|x\|_X \le 1\} = \sup\Big\{\int_\Theta \|[\xi', x]\|_\tau\, d\nu : x \in X,\ \|x\|_X \le 1\Big\} \tag{3.6.1}$$
since $(\xi \circ x)' = [\xi', x]$ by Lemma 3.3.6. Recall that $\xi_\Phi \in \mathrm{ca}(\mathcal{A}, X)$ is defined for $\Phi \in L^1_{DS}(\xi)$ by (3.4.2), i.e., $\xi_\Phi(A) = \int_A \Phi\, d\xi$ for $A \in \mathcal{A}$. If $\Phi \in L^1_*(\xi)$, then $\xi_\Phi \in \mathrm{bca}(\mathcal{A}, X)$ is of bounded operator semivariation, and by (3.6.1) we have
$$\|\xi_\Phi\|_o(\Theta) = \sup\{|\xi_\Phi \circ x|(\Theta) : \|x\|_X \le 1\} = \sup\Big\{\int_\Theta \|[\Phi\xi', x]\|_\tau\, d\nu : \|x\|_X \le 1\Big\} < \infty, \tag{3.6.2}$$
where $d(\xi_\Phi \circ x)/d\nu = [(\xi_\Phi)', x] = [\Phi\xi', x] \in L^1(\Theta, \nu; T(H))$ by (3.4.4). Also recall that if the $\sigma$-algebra $\mathcal{A}$ has a countable generator, then any Hilbert space valued measure $\xi$ on $\mathcal{A}$ has a separable range, so that its orthogonally scattered dilation has a weak Radon-Nikodým derivative with respect to any dominating measure. Therefore, $\xi$ has a weak Radon-Nikodým derivative (cf. Corollary 2.4.6). Now we have the following.

Theorem 3.6.2. Assume that the $\sigma$-algebra $\mathcal{A}$ has a countable generator. Then, for an $X$-valued measure $\xi \in \mathrm{bca}(\mathcal{A}, X)$ of bounded operator semivariation, $L^1_*(\xi) \subseteq L^2_*(M_\xi)$. If $\Theta$ is a locally compact abelian group, then equality holds.

Proof. Let $\{\eta, Y, P\}$ be a minimal gramian orthogonally scattered dilation of $\xi$ (cf. Subsection 3.2), let $\nu(\cdot) = \|\eta(\cdot)\|_Y^2$, and let $M = M_\xi$. By assumption, $\xi$ and $\eta$ have weak Radon-Nikodým derivatives $\xi'$ and $\eta'$ with respect to $\nu$, respectively. First we show that $L^1_*(\xi) \subseteq L^2_*(M_\xi)$. Let $\Phi \in L^1_*(\xi)$. Observe that, for $A \in \mathcal{A}$, $M(\cdot, A) = [\xi(\cdot), \xi(A)] = (\xi \circ \xi(A))(\cdot)$ is a $T(H)$-valued measure of bounded variation by (3.6.1). Hence
$$\xi'_A \equiv \frac{dM(\cdot, A)}{d\nu} = \frac{d[\xi(\cdot), \xi(A)]}{d\nu} = [\xi', \xi(A)] \in L^1(\Theta, \nu; T(H))$$
by Lemma 3.3.6. Thus $\Phi\xi'_A = \Phi[\xi', \xi(A)] = \Phi\xi'\xi(A)^* = [\Phi\xi', \xi(A)]$ is a well-defined $T(H)$-valued function on $\Theta$. Moreover, $\Phi\xi'_A \in L^1(\Theta, \nu; T(H))$, since $\xi_\Phi \in \mathrm{bca}(\mathcal{A}, X)$ is of bounded operator semivariation and
$$\int_\Theta \|\Phi\xi'_A\|_\tau\, d\nu = \int_\Theta \|[\Phi\xi', \xi(A)]\|_\tau\, d\nu = |\xi_\Phi \circ \xi(A)|(\Theta) \le \|\xi_\Phi\|_o(\Theta)\, \|\xi(A)\|_X < \infty$$
by (3.6.2). Thus condition (a) of Definition 3.6.1 is satisfied. As to condition (b') of Definition 3.6.1, let $C, D \in \mathcal{A}$ and observe that
$$M_1^D(\cdot) = \int_D \Phi(\lambda)\, M(\cdot, d\lambda)^* = \int_D \Phi(\lambda)\, [\xi(\cdot), \xi(d\lambda)]^* = \Big[\int_D \Phi(\lambda)\, \xi(d\lambda), \xi(\cdot)\Big] = (\xi_\Phi(D) \circ \xi)(\cdot) \in \mathrm{vca}(\mathcal{A}, T(H))$$
by (3.6.2). Similarly, $M_2^C(\cdot) \in \mathrm{vca}(\mathcal{A}, T(H))$. Thus condition (b') of Definition 3.6.1 is verified. Finally, for condition (c') of Definition 3.6.1 we see that
$$\frac{dM_1^D}{d\nu} = [\xi_\Phi(D), \xi'] \in L^1(\Theta, \nu; T(H)),$$
and hence
$$\int_C \Phi(\theta)\, (M_1^D)^*(d\theta) = \int_C \Phi(\theta)\, [\xi'(\theta), \xi_\Phi(D)]\, \nu(d\theta) = \Big[\int_C \Phi(\theta)\xi'(\theta)\, \nu(d\theta), \xi_\Phi(D)\Big] = [\xi_\Phi(C), \xi_\Phi(D)].$$
In the same fashion we get
$$\Big(\int_D \Phi(\lambda)\, (M_2^C)^*(d\lambda)\Big)^* = [\xi_\Phi(C), \xi_\Phi(D)],$$
so that condition (c') of Definition 3.6.1 is satisfied. Therefore $\Phi \in L^2_*(M)$.

Suppose now that $\Theta = G$ is a locally compact abelian group and let $\mathcal{A} = \mathcal{B}_G$ be the Borel $\sigma$-algebra of $G$. Recall that $\{\eta, Y, P\}$ is a minimal gramian orthogonally scattered dilation of $\xi$. By Theorem 3.5.2 we have $L^1_{DS}(\eta) = L^1_*(\eta) = L^2(F_\eta)$. Then, by [14, Theorem 4.2] we have that
$L^2_*(M) = QL^2(F)$, where $F = F_\eta$, $Q = U^{-1}PU$ (a gramian orthogonal projection corresponding to $P$), and $U : L^2(F) \to S_\eta(G)$ is the gramian unitary operator given by $U(\Phi) = \int_G \Phi\, d\eta = \eta_\Phi(G)$ for $\Phi \in L^2(F)$ (see the diagram below):
$$\begin{array}{ccc} L^2(F) & \xrightarrow{\ U\ } & S_\eta(G) = Y \\ {\scriptstyle Q}\big\downarrow & & \big\downarrow{\scriptstyle P} \\ L^2_*(M) & \xleftarrow{\ U^{-1}\ } & S_\xi(G) \subseteq X \end{array} \tag{3.6.3}$$
Here, $U$ is said to be gramian unitary if it is unitary in the ordinary sense and is a module map, i.e., $Ua = aU$ for $a \in B(H)$. In this case $U$ preserves the gramian. To show $L^2_*(M) \subseteq L^1_*(\xi)$, let $\Phi \in L^2_*(M)$. Then $\Phi \in L^2(F) = L^1_*(\eta)$, and hence there exists a determining sequence $\{\Phi_n\} \subseteq L^0(G; B(H))$ for $\Phi$ such that
$$\|\Phi_n\eta' - \Phi\eta'\|_Y \to 0 \quad \nu\text{-a.e.} \tag{3.6.4}$$
as $n \to \infty$, and $\{\int_A \Phi_n\, d\eta\}$ is a Cauchy sequence in $Y$ for every $A \in \mathcal{B}_G$. Note that the weak Radon-Nikodým derivative $\eta' = d\eta/d\nu$ exists since $\mathcal{A}$ has a countable generator. Since $P$ is a gramian orthogonal projection, it satisfies
$$[Px, y]_Y = [x, Py]_Y, \qquad x, y \in Y$$
(see the second paragraph of Section 4.4). Hence, for $y \in Y$,
$$[P\Phi\eta', y]_Y = [\Phi\eta', Py]_Y = \Phi\eta'(Py)^* = \Phi[\eta', Py]_Y = \Phi[P\eta', y]_Y = \Phi[(P\eta)', y]_Y = \Phi[\xi', y]_Y = \Phi\xi' y^* = [\Phi\xi', y]$$
by Corollary 2.3.5(4). Hence $\Phi\xi'$ is well-defined $\nu$-a.e. and $P\Phi\eta' = \Phi P\eta' = \Phi\xi'$. Similarly, $\Phi_n\xi' = \Phi_n P\eta' = P\Phi_n\eta'$ for $n \ge 1$. Thus
$$\|\Phi_n\xi' - \Phi\xi'\|_X = \|(\Phi_n - \Phi)\xi'\|_X = \|(\Phi_n - \Phi)P\eta'\|_Y = \|P(\Phi_n - \Phi)\eta'\|_Y \le \|(\Phi_n - \Phi)\eta'\|_Y \to 0$$
as $n \to \infty$ by (3.6.4). Moreover, let $\Phi_n = \sum_{k=1}^{k_n} a_{nk} 1_{A_{nk}}$ for $n \ge 1$, where $a_{nk} \in B(H)$ for $n, k \ge 1$ and $\{A_{nk} : 1 \le k \le k_n\} \in \Pi(G)$ for $n \ge 1$. Then,
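for $A \in \mathcal{B}_G$ and $n \ge 1$,
$$\begin{aligned} \xi_{\Phi_n}(A) &= \int_A \Phi_n\, d\xi = \sum_{k=1}^{k_n} a_{nk}\, \xi(A \cap A_{nk}) = \sum_{k=1}^{k_n} a_{nk}\, P\eta(A \cap A_{nk}) \\ &= P \sum_{k=1}^{k_n} a_{nk}\, \eta(A \cap A_{nk}) \qquad \text{(since } P \text{ is a module map)} \\ &= P \int_A \Phi_n\, d\eta = P\eta_{\Phi_n}(A). \end{aligned}$$
That is,
$$\xi_{\Phi_n} = P\eta_{\Phi_n}, \qquad n \ge 1. \tag{3.6.5}$$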
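The only properties of $P$ used in deriving (3.6.5) are linearity and commutation with the left action of $B(H)$. A finite-dimensional model makes this visible. On $X = S(K, H)$, one convenient family of gramian orthogonal projections is right multiplication $x \mapsto xQ$ by an orthogonal projection $Q$ on $K$. This realization is our assumption for illustration and is not stated in the text. The sketch uses hypothetical data. It checks $[Px, y] = [x, Py]$, idempotence, and the identity $\xi_{\Phi_n}(A) = P\eta_{\Phi_n}(A)$ for a simple $\Phi_n$ with $\xi = P\eta$.

```python
# Sketch: a gramian orthogonal projection on X = S(K, H) realized (by assumption)
# as P x = x Q, with Q an orthogonal projection on K = C^4 and H = C^2.


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def adjoint(A):
    return [[A[i][j].conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def gramian(x, y):
    return matmul(x, adjoint(y))


# Q projects onto span{(e0 + e1)/sqrt(2), e2}; it is real symmetric and idempotent.
Q = [[0.5, 0.5, 0, 0], [0.5, 0.5, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]


def P(x):
    return matmul(x, Q)


# Hypothetical values eta(A cap A_k) on two cells of a partition, and the
# coefficients a_k of a simple function Phi_n = a_1 1_{A_1} + a_2 1_{A_2}.
eta_cells = [[[1, 2j, 0, 1], [0, 1, 1, -1]], [[1j, 0, 2, 0], [1, 1, 0, 3]]]
a = [[[1, 1j], [0, 2]], [[0.5, 0], [1, -1]]]

# xi = P eta, so xi_{Phi_n}(A) = sum_k a_k xi(A cap A_k) ...
xi_Phi_n = add(matmul(a[0], P(eta_cells[0])), matmul(a[1], P(eta_cells[1])))
# ... while P eta_{Phi_n}(A) = P(sum_k a_k eta(A cap A_k)).
P_eta_Phi_n = P(add(matmul(a[0], eta_cells[0]), matmul(a[1], eta_cells[1])))

x, y = eta_cells  # two sample elements for checking [P x, y] = [x, P y]
```

The equality of `xi_Phi_n` and `P_eta_Phi_n` is nothing more than associativity of matrix multiplication. This mirrors the remark in the derivation that $P$ is a module map.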
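Since the sequence $\{\eta_{\Phi_n}(A)\}$ is Cauchy in $Y$, it follows that $\{\xi_{\Phi_n}(A)\}$ is Cauchy in $X$. Consequently, $\Phi \in L^1_{DS}(\xi)$ with determining sequence $\{\Phi_n\}$. To show $\Phi \in L^1_*(\xi)$ we need to prove that $\xi_\Phi$ is of bounded operator semivariation. We shall show that $\xi_\Phi = P\eta_\Phi$, which implies that $\xi_\Phi \in \mathrm{bca}(\mathcal{B}_G, X)$. Recall that for $A \in \mathcal{B}_G$, $\xi_\Phi(A)$ and $\eta_\Phi(A)$ are defined so that $\|\xi_\Phi(A) - \xi_{\Phi_n}(A)\|_X \to 0$ and $\|\eta_\Phi(A) - \eta_{\Phi_n}(A)\|_Y \to 0$ as $n \to \infty$, respectively. Hence, for a fixed $A \in \mathcal{B}_G$ and any $\varepsilon > 0$ we can choose an integer $N \ge 1$ such that both
$$\|\xi_\Phi(A) - \xi_{\Phi_n}(A)\|_X < \varepsilon, \qquad n \ge N, \tag{3.6.6}$$
$$\|\eta_\Phi(A) - \eta_{\Phi_n}(A)\|_Y < \varepsilon, \qquad n \ge N. \tag{3.6.7}$$
Since the middle term below vanishes by (3.6.5) and $\|P\| \le 1$, it follows from (3.6.5), (3.6.6) and (3.6.7) that for $A \in \mathcal{B}_G$
$$\|\xi_\Phi(A) - P\eta_\Phi(A)\|_Y \le \|\xi_\Phi(A) - \xi_{\Phi_n}(A)\|_Y + \|\xi_{\Phi_n}(A) - P\eta_{\Phi_n}(A)\|_Y + \|P\eta_{\Phi_n}(A) - P\eta_\Phi(A)\|_Y < 2\varepsilon, \qquad n \ge N,$$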
which implies
$$\xi_\Phi(A) = P\eta_\Phi(A), \quad \text{or} \quad \int_A \Phi\, d\xi = P\int_A \Phi\, d\eta.$$
That is,
$$\xi_\Phi = P\eta_\Phi. \tag{3.6.8}$$
Thus $\Phi \in L^1_*(\xi)$. Therefore, $L^2_*(M) \subseteq L^1_*(\xi)$.
Remark 3.6.3. (1) Suppose $\Theta$ is a locally compact abelian group with a countably generated Borel $\sigma$-algebra $\mathcal{A}$, and $\xi \in \mathrm{bca}(\mathcal{A}, X)$ has a minimal gramian orthogonally scattered dilation $\{\eta, Y, P\}$. Then $QL^1_*(\eta) = L^1_*(\xi)$, which is one of the main results of this article. Moreover, the following commutative diagram, which extends (3.6.3), holds:
$$\begin{array}{ccc} L^1_{DS}(\eta) = L^1_*(\eta) = L^2(F_\eta) & \xrightarrow{\ U\ } & S_\eta(\Theta) = Y \\ {\scriptstyle Q}\big\downarrow & & \big\downarrow{\scriptstyle P} \\ L^1_{DS}(\xi) \supseteq L^1_*(\xi) = L^2_*(M_\xi) & \xleftarrow{\ U^{-1}\ } & S_\xi(\Theta) \subseteq X \end{array}$$
where $U\Phi = \int_\Theta \Phi\, d\eta$ for $\Phi \in L^2(F_\eta)$ and $Q = U^{-1}PU$. This diagram extends the one obtained in [14] in the framework of spectral domain analysis for $X$-valued operator harmonizable processes on a locally compact abelian group.
(2) In (1) above, let $\Phi \in L^1_*(\xi)$. Then $\Phi 1_A \in L^1_*(\xi)$, and hence $Q(\Phi 1_A) = \Phi 1_A$ for every $A \in \mathcal{A}$. Since $Q = U^{-1}PU$, it follows that for $A \in \mathcal{A}$
$$U^{-1}PU(\Phi 1_A) = \Phi 1_A, \qquad PU(\Phi 1_A) = U(\Phi 1_A), \qquad P\eta_\Phi(A) = \eta_\Phi(A).$$
By (3.6.8) we have
$$\xi_\Phi = P\eta_\Phi = \eta_\Phi, \qquad \Phi \in L^1_*(\xi) = L^2_*(M_\xi). \tag{3.6.9}$$
Moreover, $L^1_*(\xi)$ is a Banach space with the norm $\|\cdot\|_\xi$ given by (3.4.3). It is also a Banach space with the norm of $L^2_*(M_\xi) \subseteq L^2(F_\eta)$, which is given by
using (3.6.9):
$$\|\Phi\|_{F_\eta} = \Big\|\int_\Theta \Phi\, d\eta\Big\|_Y = \Big\|\int_\Theta \Phi\, d\xi\Big\|_X, \qquad \Phi \in L^2_*(M_\xi) = L^1_*(\xi).$$
(3) In (1) above, let $\Phi \in L^1_*(\xi)$ and let $V$ be a bounded module map, i.e., $V \in B(X)$ and $Va = aV$ for $a \in B(H)$. Hence $V\Psi = \Psi V$ for any $\Psi \in L^0(\Theta; B(H))$. Note that $V\xi \in \mathrm{bca}(\mathcal{A}, X)$. Then, as in the proof of (3.6.8), we can show that for $A \in \mathcal{A}$
$$V\int_A \Phi\, d\xi = \int_A \Phi\, d(V\xi), \quad \text{or} \quad V[\xi_\Phi(A)] = (V\xi)_\Phi(A). \tag{3.6.10}$$
This means that $V(\xi_\Phi) = (V\xi)_\Phi$.
(4) For an $X$-valued measure $\xi \in \mathrm{ca}(\mathcal{A}, X)$, consider the scalar bimeasure $m_\xi \in \mathfrak{M}$. Let $L^1_{DS}(\xi)$ denote the set of all scalar valued functions that are $\xi$-integrable in the Dunford-Schwartz sense. Let $L^2_*(m_\xi)$ denote the set of all scalar valued functions that are strictly $m_\xi$-integrable in the sense of Chang and Rao [4] (cf. Definition 2.6.2). Then $L^1_{DS}(\xi) = L^2_*(m_\xi)$, with different topologies, as shown in Chang and Rao [5]. Theorem 3.6.2 generalizes this to the infinite dimensional case.
4. Cramér and Karhunen Processes

We are interested in infinite dimensional second order stochastic processes. These are described as Hilbert-Schmidt class operator valued processes on a locally compact abelian group, typically the real line $\mathbb{R}$. Among such processes, stationary and harmonizable (in various senses) processes have been studied extensively. Here we focus on processes of the Cramér and Karhunen classes. These processes were originally considered, without names, by Cramér [8] and Karhunen [18]. The terminology is due to Rao [32].
4.1. Infinite dimensional second order stochastic processes

In this subsection we introduce the basic terminology of second order stochastic processes. Let $G$ be a locally compact abelian group. Let $(\Omega, \mathcal{F}, \mu)$ be a
probability measure space and let
$$L^2_0(\Omega) = \Big\{f \in L^2(\Omega, \mathcal{F}, \mu) : \int_\Omega f(\omega)\, \mu(d\omega) = 0\Big\}.$$
A standard second order stochastic process over $G$ is a mapping $x(\cdot) : G \to L^2_0(\Omega)$, denoted by $\{x(t)\}$. The covariance function $\gamma$ of $\{x(t)\}$ is defined by
$$\gamma(s, t) = (x(s), x(t))_2, \qquad s, t \in G,$$
where $(\cdot,\cdot)_2$ is the inner product in $L^2_0(\Omega)$. A finite dimensional second order stochastic process is a mapping $x(\cdot) : G \to [L^2_0(\Omega)]^q$ for some integer $q \ge 2$, i.e.,
$$x(t) = (x_1(t), \dots, x_q(t)), \qquad t \in G,$$
where each $\{x_i(t)\}$ is a standard second order stochastic process for $1 \le i \le q$. In this case the process $\{x(t)\}$ has the scalar covariance function $\gamma(s, t) = (x(s), x(t))_{2,q}$, where $(\cdot,\cdot)_{2,q}$ is the inner product in $[L^2_0(\Omega)]^q$. In addition, it has a matricial covariance function $\Gamma$ given by
$$\Gamma(s, t) = [(x_i(s), x_j(t))_2]_{i,j=1}^q, \qquad s, t \in G, \tag{4.1.1}$$
i.e., $\Gamma(s, t)$ is the (cross) Gram matrix generated by the two vectors $x(s) = (x_1(s), \dots, x_q(s))$ and $x(t) = (x_1(t), \dots, x_q(t))$. If we identify $[L^2_0(\Omega)]^q = L^2_0(\Omega; \mathbb{C}^q) = S(L^2_0(\Omega), \mathbb{C}^q) = X$, then the matricial covariance function $\Gamma(s, t)$ given by (4.1.1) is exactly the gramian $[x(s), x(t)]$ with $H = \mathbb{C}^q$. As a natural generalization, an infinite dimensional second order stochastic process over $G$ is obtained by replacing $\mathbb{C}^q$ with an infinite dimensional Hilbert space $H$. It is defined to be a mapping $x(\cdot) : G \to L^2_0(\Omega; H)$, where $L^2_0(\Omega; H)$ is the Hilbert space of all $H$-valued strong random variables on $\Omega$ that are square integrable with respect to $\mu$ and have zero mean. We can and do identify $L^2_0(\Omega; H)$ with the space $X = S(L^2_0(\Omega), H)$ of all Hilbert-Schmidt class operators from $L^2_0(\Omega)$ to $H$, i.e., $X = S(L^2_0(\Omega), H) = L^2_0(\Omega; H)$. Hence an infinite dimensional second order stochastic process (or field) over $G$ is regarded as an $X$-valued mapping on $G$. The scalar covariance function $\gamma$ and the operator covariance function $\Gamma$ of an $X$-valued process $\{x(t)\}$ on $G$
are defined respectively by
$$\gamma(s, t) = (x(s), x(t))_X, \qquad \Gamma(s, t) = [x(s), x(t)], \qquad s, t \in G.$$
Note that if $\{x(t)\}$ is an $X$-valued process on $G$, then for each $\phi \in H$ we have an $L^2_0(\Omega)$-valued process $\{x_\phi(t)\}$ given by
$$x_\phi(t) = (x(t), \phi)_H, \qquad t \in G.$$
A unified study of various types of stationarity and harmonizability was initiated in Rao [31] (see also [13]).
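Here is a small sketch on a finite probability space that makes (4.1.1) and the identification $X = S(L^2_0(\Omega), \mathbb{C}^q)$ concrete. Everything numeric is assumed for illustration: four sample points with weights `p`, $q = 2$, three parameter values, and arbitrary component values that are centered to have mean zero. We represent $x(t)$ as the $q \times 4$ matrix with entries $\sqrt{p_\omega}\, x_i(t)(\omega)$, which is an isometric identification of $L^2(\Omega)$ with $\mathbb{C}^4$ chosen for this sketch. Under this representation, $\Gamma(s, t)$ becomes the gramian $x(s)x(t)^*$. The sketch also checks that $\gamma = \operatorname{tr}\Gamma$ and that the scalar process $x_\phi$ has covariance $\phi^*\Gamma(s, t)\phi$.

```python
import math

# Assumed finite probability space Omega = {0, 1, 2, 3} with weights p,
# q = 2 components, and parameter values t in {0, 1, 2}.
p = [0.1, 0.2, 0.3, 0.4]
raw = {  # raw[t][i][w]: value of the i-th component of x(t) at sample point w
    0: [[1, 2j, -1, 0.5], [0, 1, 1j, -2]],
    1: [[2, 0, 1, 1j], [1, -1, 0, 3]],
    2: [[0.5j, 1, -1, 2], [1, 1, 1, 0]],
}


def center(v):
    """Subtract the mean so that the random variable lies in L^2_0(Omega)."""
    m = sum(pw * vw for pw, vw in zip(p, v))
    return [vw - m for vw in v]


x = {t: [center(row) for row in comps] for t, comps in raw.items()}


def inner2(f, g):
    """(f, g)_2 = E[f conj(g)] in L^2(Omega, mu)."""
    return sum(pw * fw * complex(gw).conjugate() for pw, fw, gw in zip(p, f, g))


def Gamma(s, t):
    """Matricial covariance (4.1.1): [(x_i(s), x_j(t))_2]_{i,j}."""
    return [[inner2(x[s][i], x[t][j]) for j in range(2)] for i in range(2)]


def gamma(s, t):
    """Scalar covariance (x(s), x(t))_{2,q}."""
    return sum(inner2(x[s][i], x[t][i]) for i in range(2))


def as_operator(t):
    """x(t) as a 2 x 4 matrix with entries sqrt(p_w) * x_i(t)(w)."""
    return [[math.sqrt(pw) * v for pw, v in zip(p, row)] for row in x[t]]


def gramian(a, b):
    """[a, b] = a b^* for 2 x 4 matrices."""
    return [[sum(a[i][k] * complex(b[j][k]).conjugate() for k in range(len(a[0])))
             for j in range(len(b))] for i in range(len(a))]


phi = [1, 1j]


def x_phi(t):
    """x_phi(t) = (x(t), phi)_H, an L^2_0(Omega)-valued process."""
    return [sum(x[t][i][w] * complex(phi[i]).conjugate() for i in range(2))
            for w in range(len(p))]
```

For example, `Gamma(0, 1)` and `gramian(as_operator(0), as_operator(1))` agree up to rounding. The trace of either one is `gamma(0, 1)`.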
4.2. Cramér processes

As in the previous subsection, let $X = S(L^2_0(\Omega), H)$, where $H$ is a separable Hilbert space, and consider $X$-valued processes on a locally compact abelian group $G$. We also assume that $L^2_0(\Omega)$ is separable. We usually employ scalar or operator covariance functions to classify second order processes. Thus some Cramér classes of processes are defined as follows, using terminology compatible with that of [13, pp. 173–174].

Definition 4.2.1. (1) An $L^2_0(\Omega)$-valued process on $G$ is said to be of weak (respectively strong) Cramér class if its covariance function $\gamma$ can be written as
$$\gamma(s, t) = \int\!\!\int^*_{\Theta^2} \varphi(s, \theta)\overline{\varphi(t, \lambda)}\, m(d\theta, d\lambda), \qquad s, t \in G \tag{4.2.1}$$
on some measurable space $(\Theta, \mathcal{A})$, for some scalar valued bimeasure $m \in \mathfrak{M}$ (respectively $m \in \mathfrak{M}_v$) and some family of scalar valued functions $\{\varphi(t, \cdot) : t \in G\} \subseteq L^2_*(m)$. Here the integral is a strict $m$-integral (respectively a Lebesgue-Stieltjes integral); see Section 2.6 for strict integrals.
(2) An $X$-valued process $\{x(t)\}$ on $G$ is said to be of scalarly weak (respectively scalarly strong) Cramér class if, for each $\phi \in H$, the $L^2_0(\Omega)$-valued process $\{(x(t), \phi)_H\}$ is of weak (respectively strong) Cramér class in the sense of (1).
(3) An $X$-valued process on $G$ is said to be of weak (respectively strong) Cramér class if its scalar covariance function $\gamma$ is representable as (4.2.1).
(4) An $X$-valued process on $G$ is said to be of weak (respectively strong) operator Cramér class if its operator covariance function $\Gamma$ has a representation of the form
$$\Gamma(s, t) = \int\!\!\int^*_{\Theta^2} \Phi(s, \theta)\, M(d\theta, d\lambda)\, \Phi(t, \lambda)^*, \qquad s, t \in G$$
for some measurable space $(\Theta, \mathcal{A})$, some operator bimeasure $M \in \mathcal{M}_b$ (respectively $M \in \mathcal{M}_v$) and some family of operator valued functions $\{\Phi(t, \cdot) : t \in G\} \subseteq L^2_*(M)$.

Take $\Theta = \widehat{G}$ (the dual of the locally compact abelian group $G$), $\mathcal{A} = \mathcal{B}_{\widehat{G}}$ (the Borel $\sigma$-algebra of $\widehat{G}$), and $\varphi(t, \chi) = \Phi(t, \chi) = \langle t, \chi\rangle$ (the duality pair of $G$ and $\widehat{G}$) for $t \in G$ and $\chi \in \widehat{G}$. Then the above definitions reduce to the corresponding harmonizabilities (cf. [13, pp. 155–156]). Weak and strong operator Cramér classes are also defined in [13, p. 173], but there they are based on an $X$-valued measure. We now obtain integral representations of Cramér processes.

Proposition 4.2.2. Let $\{x(t)\}$ be an $X$-valued process on $G$.
(1) Suppose the process $\{x(t)\}$ is of weak Cramér class relative to a scalar bimeasure $m \in \mathfrak{M}$ and a family of scalar functions $\{\varphi(t, \cdot) : t \in G\} \subseteq L^2_*(m)$ on some measurable space $(\Theta, \mathcal{A})$. Then there exists an $X$-valued measure $\xi \in \mathrm{ca}(\mathcal{A}, X)$ such that $m = m_\xi$ and
$$x(t) = \int_\Theta \varphi(t, \theta)\, \xi(d\theta), \qquad t \in G, \tag{4.2.2}$$
where the integral is in the Dunford-Schwartz sense. Conversely, suppose there exist a measurable space $(\Theta, \mathcal{A})$, an $X$-valued measure $\xi \in \mathrm{ca}(\mathcal{A}, X)$ and a family of scalar functions $\{\varphi(t, \cdot) : t \in G\} \subseteq L^2_*(m_\xi)$ such that (4.2.2) holds. Then the process $\{x(t)\}$ is of weak Cramér class.
(2) Suppose that the process $\{x(t)\}$ is of weak operator Cramér class relative to an operator bimeasure $M \in \mathcal{M}_b$ of bounded operator semivariation and a family of operator valued functions $\{\Phi(t, \cdot) : t \in G\} \subseteq L^2_*(M)$ on some measurable space $(\Theta, \mathcal{A})$. Here $\Theta$ is a locally compact abelian group and $\mathcal{A} = \mathcal{B}_\Theta$ is its Borel $\sigma$-algebra, which has a countable generator. Then there exists an $X$-valued measure $\xi \in \mathrm{bca}(\mathcal{A}, X)$ of bounded
operator semivariation such that $M = M_\xi$ and
$$x(t) = \int_\Theta \Phi(t, \theta)\, \xi(d\theta), \qquad t \in G. \tag{4.2.3}$$
Conversely, suppose there exist a measurable space $(\Theta, \mathcal{A})$ with $\mathcal{A}$ having a countable generator, an $X$-valued measure $\xi \in \mathrm{bca}(\mathcal{A}, X)$ of bounded operator semivariation and a family of operator valued functions $\{\Phi(t, \cdot) : t \in G\} \subseteq L^1_*(\xi)$ such that (4.2.3) holds. Then the process $\{x(t)\}$ is of weak operator Cramér class.

Proof. (1) The proof of the first part is given in [13, p. 175]. The second part is obvious since $L^1(\xi) = L^2_*(m_\xi)$.
(2) Suppose that $\{x(t)\}$ is an $X$-valued process of weak operator Cramér class relative to a bimeasure $M \in \mathcal{M}_b$ and a family $\{\Phi(t, \cdot) : t \in G\} \subseteq L^2_*(M)$. Since $M$ is a $T(H)$-valued positive definite kernel on $\mathcal{A} \times \mathcal{A}$, there exists a normal Hilbert $B(H)$-module $X_M$ admitting $M$ as the reproducing kernel (cf. [13, p. 37]). Consequently, $X_M$ is a normal Hilbert $B(H)$-module consisting of $T(H)$-valued functions $\zeta$ on $\mathcal{A}$ such that
$$\zeta(A) = [\zeta(\cdot), M(A, \cdot)]_M, \qquad \zeta \in X_M,\ A \in \mathcal{A},$$
where $[\cdot,\cdot]_M$ is the gramian in $X_M$. Let $\eta(A) = M(A, \cdot)$ for $A \in \mathcal{A}$. Then $\eta$ is an $X_M$-valued countably additive measure, i.e., $\eta \in \mathrm{ca}(\mathcal{A}, X_M)$, and $M_\eta = M$ since
$$M_\eta(A, B) = [\eta(A), \eta(B)]_M = [M(A, \cdot), M(B, \cdot)]_M = M(A, B), \qquad A, B \in \mathcal{A}.$$
Moreover, $\eta$ is of bounded operator semivariation since $M$ is. Since $\mathcal{A}$ has a countable generator, the weak Radon-Nikodým derivative $\eta' = d\eta/d\nu_\eta$ exists, where $\nu_\eta$ is a dominating measure for $\eta$. Define an $X_M$-valued process $\{y(t)\}$ by
$$y(t) = \int_\Theta \Phi(t, \theta)\, \eta(d\theta), \qquad t \in G,$$
where the integral is well-defined because Theorem 3.6.2 gives $L^1_*(\eta) = L^2_*(M)$. We see that $\{y(t)\}$ is of weak operator Cramér class. Let $\tilde S(\tilde x)$ be the closed submodule of $X$ generated by the set $\{x(t) : t \in G\}$, and let $\tilde S(\tilde y)$ be the closed submodule of $X_M$ generated by the set $\{y(t) : t \in G\}$.
If we let $U(y(t)) = x(t)$ for $t \in G$, then $U$ can be extended to a bounded linear operator from $\tilde S(\tilde y)$ onto $\tilde S(\tilde x)$, where $U$ satisfies
$$U(a y(t)) = aU(y(t)) = a x(t), \qquad a \in B(H),\ t \in G,$$
i.e., $U$ is a module map. If we define $\xi = U\eta$, then $\xi \in \mathrm{bca}(\mathcal{A}, X)$ is of bounded operator semivariation and has the weak Radon-Nikodým derivative $\xi' = d\xi/d\nu_\eta = U\eta'$. Since $\Phi(t, \cdot) \in L^2_*(M) = L^1_*(\eta)$ and $U$ is a module map, we can apply Remark 3.6.3(3) and (3.6.10) to obtain
$$x(t) = U(y(t)) = U\int_\Theta \Phi(t, \theta)\, \eta(d\theta) = \int_\Theta \Phi(t, \theta)\, (U\eta)(d\theta) = \int_\Theta \Phi(t, \theta)\, \xi(d\theta), \qquad t \in G.$$
Thus the integral representation of $\{x(t)\}$ is obtained. The converse is obvious.
Using the integral representations just obtained, we can examine some interrelations among the Cramér classes.

Remark 4.2.3. Let $\{x(t)\}$ be an $X$-valued process on $G$.
(1) If $\{x(t)\}$ is of weak Cramér class, then it is of scalarly weak Cramér class. In fact, suppose that $\{x(t)\}$ is of weak Cramér class, so that the scalar covariance function $\gamma$ can be written as (4.2.1). This holds for some measurable space $(\Theta, \mathcal{A})$, some scalar valued bimeasure $m \in \mathfrak{M}$, and some family of scalar functions $\{\varphi(t, \cdot) : t \in G\} \subseteq L^2_*(m)$. It follows from Proposition 4.2.2(1) that there exists an $X$-valued measure $\xi \in \mathrm{ca}(\mathcal{A}, X)$ such that
$$x(t) = \int_\Theta \varphi(t, \theta)\, \xi(d\theta), \qquad t \in G.$$
Then, for any $\phi_0 \in H$,
$$x_{\phi_0}(t) = (x(t), \phi_0)_H = \Big(\int_\Theta \varphi(t, \theta)\, \xi(d\theta), \phi_0\Big)_H = \int_\Theta \varphi(t, \theta)\, (\xi(d\theta), \phi_0)_H = \int_\Theta \varphi(t, \theta)\, \xi_{\phi_0}(d\theta)$$
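for $t \in G$, where $\xi_{\phi_0}(\cdot) = (\xi(\cdot), \phi_0)_H \in \mathrm{ca}(\mathcal{A}, L^2_0(\Omega))$. Thus $\{x(t)\}$ is of scalarly weak Cramér class.
(2) An immediate consequence of (1) above is that $\{x(t)\}$ is weakly harmonizable if and only if it is scalarly weakly harmonizable.
(3) It is easy to see that a weak Cramér process is not necessarily a weak operator Cramér process. Indeed, let $\{\phi_k\}_{k=1}^\infty \subset H$ be an orthonormal basis, let $f \in L^2_0(\Omega)$ with $f \neq 0$, and take $(\Theta, \mathcal{A}) = (\mathbb{R}, \mathcal{B}_\mathbb{R})$. Define
$$\xi(A) = \sum_{k \in A \cap \mathbb{N}} \frac{1}{k} f\phi_k, \qquad A \in \mathcal{B}_\mathbb{R}, \tag{4.2.4}$$
where $\mathbb{N} = \{1, 2, \dots\}$. Then $\xi \in \mathrm{ca}(\mathcal{B}_\mathbb{R}, X)$ but $\xi \notin \mathrm{bca}(\mathcal{B}_\mathbb{R}, X)$. Hence the process $\{x(t)\}$ defined by
$$x(t) = \int_\mathbb{R} \varphi(t, u)\, \xi(du), \qquad t \in \mathbb{R} \tag{4.2.5}$$
is of weak Cramér class but not of weak operator Cramér class, where $\{\varphi(t, \cdot) : t \in \mathbb{R}\} \subseteq L^1(\xi)$.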
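The two claims about (4.2.4) follow from elementary arithmetic, which the sketch below carries out. It assumes $\|f\| = 1$ and truncates to $\phi_1, \dots, \phi_n$. Each $\xi(\{k\}) = \frac1k f\phi_k$ is represented by its coefficient vector $\frac1k e_k \in \mathbb{C}^n$, so that $X$-norms become Euclidean norms. Countable additivity is reflected in $\|\xi(\{1, \dots, n\})\|_X^2 = \sum_{k \le n} k^{-2} < \pi^2/6$. For the semivariation we choose $a_k$ to be the partial isometry sending $\phi_k$ to $\phi_1$ (our choice, with each $\|a_k\| = 1$). This gives $\|\sum_{k \le n} a_k \xi(\{k\})\|_X = \sum_{k \le n} 1/k$, so the operator semivariation is infinite. The last lines show that $[\xi(\{1\}), \xi(\{2\})] = \frac12 \phi_1 \phi_2^*$ is nonzero even though its trace $(\xi(\{1\}), \xi(\{2\}))_X$ vanishes. In this sense $\xi$ is orthogonally scattered but not gramian orthogonally scattered (cf. Remark 4.3.4).

```python
import math

# Truncated model of (4.2.4), assuming ||f|| = 1: xi({k}) = (1/k) f phi_k is
# represented by the coefficient vector (1/k) e_k in C^n (k = 1, ..., n).


def xi_atom(k, n):
    v = [0.0] * n
    v[k - 1] = 1.0 / k
    return v


def a_k(k, v):
    """Partial isometry of norm 1 sending phi_k to phi_1 (our choice)."""
    w = [0.0] * len(v)
    w[0] = v[k - 1]
    return w


def norm(v):
    return math.sqrt(sum(abs(c) ** 2 for c in v))


n = 500
plain = [0.0] * n     # xi({1, ..., n}) = sum_k xi({k})
twisted = [0.0] * n   # sum_k a_k xi({k}), one test sum in the operator semivariation
for k in range(1, n + 1):
    atom = xi_atom(k, n)
    plain = [u + v for u, v in zip(plain, atom)]
    twisted = [u + v for u, v in zip(twisted, a_k(k, atom))]

harmonic = sum(1.0 / k for k in range(1, n + 1))
print(norm(plain) ** 2, math.pi ** 2 / 6)   # bounded in n: countably additive
print(norm(twisted), harmonic)              # grows like log n: unbounded


def gram_atoms(j, k, n):
    """[xi({j}), xi({k})] = (1/(j k)) phi_j phi_k^* as an n x n matrix."""
    u, v = xi_atom(j, n), xi_atom(k, n)
    return [[u[r] * v[c] for c in range(n)] for r in range(n)]


G12 = gram_atoms(1, 2, 3)
print(G12[0][1], sum(G12[i][i] for i in range(3)))  # nonzero entry, zero trace
```

Increasing `n` leaves the first printed number below $\pi^2/6$. The second grows without bound, roughly like $\log n$.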
4.3. Karhunen processes

In this subsection, processes of some Karhunen classes are introduced and the Karhunen dilation of a Cramér process is considered. So let $X = S(L^2_0(\Omega), H)$ as before and let $G$ be a locally compact abelian group.

Definition 4.3.1. (1) An $L^2_0(\Omega)$-valued process on $G$ is said to be of Karhunen class if its covariance function $\gamma$ is expressed as
$$\gamma(s, t) = \int_\Theta \varphi(s, \theta)\overline{\varphi(t, \theta)}\, \nu(d\theta), \qquad s, t \in G \tag{4.3.1}$$
for some finite measure space $(\Theta, \mathcal{A}, \nu)$ and some family of scalar functions $\{\varphi(t, \cdot) : t \in G\} \subseteq L^2(\Theta, \nu)$.
(2) An $X$-valued process on $G$ is said to be of scalarly Karhunen class if, for each $\phi \in H$, the $L^2_0(\Omega)$-valued process $\{(x(t), \phi)_H\}$ is of Karhunen class in the sense of (1) above.
(3) An $X$-valued process on $G$ is said to be of Karhunen class if its scalar covariance function $\gamma$ is representable as (4.3.1).
(4) An $X$-valued process on $G$ is said to be of operator Karhunen class if its operator covariance function $\Gamma$ is representable as
$$\Gamma(s, t) = \int_\Theta \Phi(s, \theta)\, F(d\theta)\, \Phi(t, \theta)^*, \qquad s, t \in G \tag{4.3.2}$$
for some measurable space $(\Theta, \mathcal{A})$, some operator measure $F \in \mathrm{ca}(\mathcal{A}, T^+(H))$ and some family of operator valued functions $\{\Phi(t, \cdot) : t \in G\} \subseteq L^2(F)$ (cf. Definition 3.5.1).

Since the measure $\nu$ in (4.3.1) and the measure $F$ in (4.3.2) are both of bounded variation, there is no notion of weak Karhunen class. As Cramér processes have integral representations, so do Karhunen processes.

Proposition 4.3.2. Let $\{x(t)\}$ be an $X$-valued process on $G$.
(1) Suppose $\{x(t)\}$ is of Karhunen class relative to a finite measure space $(\Theta, \mathcal{A}, \nu)$ and a family of scalar functions $\{\varphi(t, \cdot) : t \in G\} \subseteq L^2(\Theta, \nu)$. Then there exists an $X$-valued orthogonally scattered measure $\xi \in \mathrm{caos}(\mathcal{A}, X)$ such that $\nu(\cdot) = \|\xi(\cdot)\|_X^2$ and
$$x(t) = \int_\Theta \varphi(t, \theta)\, \xi(d\theta), \qquad t \in G, \tag{4.3.3}$$
where the integral is in the Dunford-Schwartz sense. Conversely, suppose the process $\{x(t)\}$ is represented as in (4.3.3) by some $X$-valued orthogonally scattered measure $\xi \in \mathrm{caos}(\mathcal{A}, X)$ and some family of scalar functions $\{\varphi(t, \cdot) : t \in G\} \subseteq L^2(\Theta, \nu)$, on some finite measure space $(\Theta, \mathcal{A}, \nu)$ with $\nu(\cdot) = \|\xi(\cdot)\|_X^2$. Then it is of Karhunen class.
(2) Suppose $\{x(t)\}$ is of operator Karhunen class relative to an operator measure $F \in \mathrm{ca}(\mathcal{A}, T^+(H))$ and a family of operator valued functions $\{\Phi(t, \cdot) : t \in G\} \subseteq L^2(F)$ for some measurable space $(\Theta, \mathcal{A})$. Then it has an integral representation
$$x(t) = \int_\Theta \Phi(t, \theta)\, \xi(d\theta), \qquad t \in G \tag{4.3.4}$$
for some $X$-valued gramian orthogonally scattered measure $\xi \in \mathrm{cagos}(\mathcal{A}, X)$ such that $F_\xi = F$, where the integral is well-defined (cf. Remark 3.5.3). Conversely, suppose the process $\{x(t)\}$ is represented as in (4.3.4) by some $X$-valued gramian orthogonally scattered measure $\xi \in \mathrm{cagos}(\mathcal{A}, X)$ and some family of operator valued functions $\{\Phi(t, \cdot) : t \in G\} \subseteq L^2(F_\xi)$ on some measurable space $(\Theta, \mathcal{A})$. Then it is of operator Karhunen class.
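The classical stationary case fits Proposition 4.3.2(1), as a quick orientation shows. Take $G = \mathbb{Z}$, a finite set of frequencies $\theta_k$ with masses $\nu(\{\theta_k\})$, and $\varphi(t, \theta) = e^{it\theta}$. Then the covariance (4.3.1) depends only on $s - t$. The sketch below is illustrative: the frequencies, the masses, and the use of orthonormal coordinate vectors in place of orthonormal random variables are all assumptions. It builds $x(t)$ from an orthogonally scattered $\xi$ with $\|\xi(\{\theta_k\})\|^2 = \nu(\{\theta_k\})$, as in (4.3.3), and checks the result against (4.3.1).

```python
import cmath
import math

# Assumed spectral data: point masses nu_k at frequencies theta_k.
thetas = [0.3, 1.1, 2.5]
nus = [0.5, 0.25, 1.0]


def xi(k):
    """xi({theta_k}) = sqrt(nu_k) e_k: orthogonal atoms with ||xi||^2 = nu_k."""
    v = [0j] * len(thetas)
    v[k] = complex(math.sqrt(nus[k]))
    return v


def phi(t, theta):
    return cmath.exp(1j * t * theta)


def x(t):
    """(4.3.3): x(t) = sum_k phi(t, theta_k) xi({theta_k})."""
    out = [0j] * len(thetas)
    for k, th in enumerate(thetas):
        out = [o + phi(t, th) * c for o, c in zip(out, xi(k))]
    return out


def inner(u, v):
    return sum(a * b.conjugate() for a, b in zip(u, v))


def gamma(s, t):
    """(4.3.1): sum_k phi(s, theta_k) conj(phi(t, theta_k)) nu_k."""
    return sum(phi(s, th) * phi(t, th).conjugate() * nu
               for th, nu in zip(thetas, nus))
```

Here `inner(x(s), x(t))` reproduces `gamma(s, t)`, and shifting both arguments by the same integer leaves `gamma` unchanged.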
Proof.
(1) is well known (cf. [5, p. 55]).
(2) The first part is noted in [13, p. 175] and the second part is obvious.

The weak and weak operator Cramér processes are characterized using Karhunen dilations as follows.

Proposition 4.3.3. Let $\{x(t)\}$ be an $X$-valued process on $G$.
(1) Let $\Theta$ be a locally compact abelian group whose Borel $\sigma$-algebra $\mathcal{A}$ has a countable generator. The process $\{x(t)\}$ is of weak operator Cramér class with an integral representation (4.2.3) if and only if it has an operator Karhunen dilation. That is, there exist a normal Hilbert $B(H)$-module $Y$ containing $X$ as a closed submodule and a $Y$-valued process $\{y(t)\}$ of operator Karhunen class on $G$ such that $x(t) = Py(t)$ for $t \in G$, where $P : Y \to X$ is the gramian orthogonal projection.
(2) The process $\{x(t)\}$ is of weak Cramér class if and only if it has a Karhunen dilation. That is, there exist a Hilbert space $Y$ containing $X$ as a closed subspace and a $Y$-valued process $\{y(t)\}$ of Karhunen class such that $x(t) = Jy(t)$ for $t \in G$, where $J : Y \to X$ is the orthogonal projection.

Proof. (1) To prove the "if" part, suppose that $\{x(t)\}$ has an operator Karhunen dilation $\{y(t)\}$ as above, so that $\{y(t)\}$ is of the form
$$y(t) = \int_\Theta \Phi(t, \theta)\, \eta(d\theta), \qquad t \in G$$
for some $\eta \in \mathrm{cagos}(\mathcal{A}, Y)$ and $\{\Phi(t, \cdot) : t \in G\} \subseteq L^2(F_\eta)$. Then
$$x(t) = Py(t) = \int_\Theta \Phi(t, \theta)\, \xi(d\theta), \qquad t \in G$$
by Remark 3.6.3(3), where $\xi = P\eta \in \mathrm{bca}(\mathcal{A}, X)$. Thus $\{x(t)\}$ is of weak operator Cramér class. Conversely, assume that $\{x(t)\}$ is of weak operator Cramér class with an integral representation given by (4.2.3). Let $\{\eta, Y, P\}$ be a separable gramian orthogonally scattered dilation of $\xi$, which exists by assumption, and
define $\{y(t)\}$ by
$$y(t) = \int_\Theta \Phi(t, \theta)\, \eta(d\theta), \qquad t \in G,$$
where the integral is well-defined since $\{\Phi(t, \cdot) : t \in G\} \subseteq L^1_*(\xi) \subseteq L^2_*(M_\xi) \subseteq L^2(F_\eta) = L^1_*(\eta)$ by Theorems 3.5.2 and 3.6.2. Clearly $\{y(t)\}$ is of operator Karhunen class and $Py(t) = x(t)$ for $t \in G$. Thus $\{x(t)\}$ has an operator Karhunen dilation.
(2) is established in Chang and Rao [5].
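The mechanism behind Proposition 4.3.3(2) can be seen in a finite-dimensional sketch in which all data are hypothetical. Start from an orthogonally scattered $\eta$ in $Y = \mathbb{C}^3$ and form the Karhunen process $y(t) = \sum_k \varphi(t, \theta_k)\eta(\{\theta_k\})$. Then project with an orthogonal projection $J$ onto a two-dimensional subspace playing the role of $X$. The projected measure $\xi = J\eta$ is no longer orthogonally scattered. The covariance of $x(t) = Jy(t)$ is the double sum over the scalar bimeasure $m_\xi(\{\theta\}, \{\lambda\}) = (\xi(\{\theta\}), \xi(\{\lambda\}))$, a discrete instance of (4.2.1).

```python
import cmath
import math

# Hypothetical data: frequencies theta_k, masses nu_k, phi(t, theta) = exp(i t theta).
thetas = [0.3, 1.1, 2.5]
nus = [0.5, 0.25, 1.0]


def phi(t, th):
    return cmath.exp(1j * t * th)


def eta(k):
    """Orthogonally scattered atoms in Y = C^3: eta({theta_k}) = sqrt(nu_k) e_k."""
    v = [0j] * 3
    v[k] = complex(math.sqrt(nus[k]))
    return v


def inner(u, v):
    return sum(a * b.conjugate() for a, b in zip(u, v))


# J: orthogonal projection of Y onto span{u1, u2}, standing in for X.
u1 = [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0]
u2 = [0.0, 0.0, 1.0]


def J(v):
    c1, c2 = inner(v, u1), inner(v, u2)
    return [c1 * a + c2 * b for a, b in zip(u1, u2)]


def y(t):
    """Karhunen process in Y."""
    out = [0j] * 3
    for k, th in enumerate(thetas):
        out = [o + phi(t, th) * c for o, c in zip(out, eta(k))]
    return out


def x(t):
    """The dilated process is projected back: x(t) = J y(t)."""
    return J(y(t))


xi = [J(eta(k)) for k in range(3)]                                # xi = J eta
m = [[inner(xi[a], xi[b]) for b in range(3)] for a in range(3)]  # m_xi on atoms


def gamma_bimeasure(s, t):
    """Discrete (4.2.1): sum phi(s, theta) conj(phi(t, lambda)) m(theta, lambda)."""
    return sum(phi(s, thetas[a]) * phi(t, thetas[b]).conjugate() * m[a][b]
               for a in range(3) for b in range(3))
```

The off-diagonal entry `m[0][1]` is nonzero because $J$ merges the first two coordinate directions. This is why the projected process is only of weak Cramér type, while its dilation `y` is of Karhunen type.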
Remark 4.3.4. As in Remark 4.2.3(1) and (2), it is easy to verify that an $X$-valued process on a locally compact abelian group $G$ is of scalarly Karhunen class if and only if it is of Karhunen class. Moreover, there is a Karhunen process that is not an operator Karhunen process. To see this, let $(\Theta, \mathcal{A}) = (\mathbb{R}, \mathcal{B}_\mathbb{R})$ and let $\xi$ be defined by (4.2.4). Then $\xi$ is orthogonally scattered, i.e., $\xi \in \mathrm{caos}(\mathcal{B}_\mathbb{R}, X)$, but $\xi$ is not gramian orthogonally scattered. Thus, taking a family of functions $\{\varphi(t, \cdot) : t \in \mathbb{R}\} \subseteq L^1(\xi) \subseteq L^1_{DS}(\xi)$, we see that the process $\{x(t)\}$ defined by (4.2.5) is of Karhunen class, but not of operator Karhunen class.

4.4. Operator representation

In this subsection we consider operator representations of operator Karhunen processes, as Rao [35] does for (scalar) Karhunen processes (see also Chang and Rao [5] and Rao [32]). We follow his development. We need some terminology. As before, let $X = S(K, H)$, where $H$ and $K$ are separable Hilbert spaces, and consider the set $B(X)$ of all bounded linear operators on $X$. An operator $S \in B(X)$ is said to have a gramian adjoint if there exists an operator $T \in B(X)$ (necessarily unique) such that
$$[Sx, y] = [x, Ty], \qquad x, y \in X.$$
The operator $T$ is denoted by $S^*$ and is called the gramian adjoint of $S$. It follows that an operator $S \in B(X)$ has a gramian adjoint if and only if $S$ is a module map, i.e., $S(ax) = a(Sx)$ for $a \in B(H)$ and $x \in X$. Let $A(X)$ denote the set of all operators in $B(X)$ that have gramian adjoints, or equivalently, the set of all bounded module maps on $X$. Hence a gramian orthogonal projection is an operator $P \in A(X)$ such that $P^2 = P^* = P$. An operator $N \in A(X)$ is said to be gramian normal if $N^*N = NN^*$. A gramian spectral measure on a measurable space $(\Theta, \mathcal{A})$ is a gramian orthogonal
projection valued function $E$ on $\mathcal{A}$ that is strongly countably additive and satisfies $E(\emptyset) = 0$ and $E(\Theta) = 1$, the identity operator on $X$.

We first show that an abelian family of gramian normal operators induces an operator Karhunen process, where the parameter set $G$ can be any nonempty set.

Proposition 4.4.1. Let $G$ be any nonempty set. Let $\{N(t) : t \in G\} \subseteq A(X)$ be a set of gramian normal operators such that $N(s)N(t) = N(t)N(s)$ and $N(s)N(t)^* = N(t)^*N(s)$ for $s, t \in G$. Define an $X$-valued process $\{x(t)\}$ on $G$ by $x(t) = N(t)x_0$ for $t \in G$, where $x_0 \in X$. Then there exist a compact space $\Theta$, an $X$-valued gramian orthogonally scattered measure $\xi \in \mathrm{cagos}(\mathcal{B}_\Theta, X)$ and a family of continuous functions $\{\varphi(t, \cdot) : t \in G\} \subseteq C(\Theta)$ such that the process has the integral representation
$$x(t) = N(t)x_0 = \int_\Theta \varphi(t, \theta)\, \xi(d\theta), \qquad t \in G,$$
and the operator covariance function is given by
$$\Gamma(s, t) = [x(s), x(t)] = \int_\Theta \varphi(s, \theta)\overline{\varphi(t, \theta)}\, F_\xi(d\theta), \qquad s, t \in G,$$
where $F_\xi(\cdot) = [\xi(\cdot), \xi(\cdot)]$. Thus the process $\{x(t)\}$ is of operator Karhunen class.

Proof. Let $\mathfrak{A}$ be the commutative $B^*$-algebra generated by the set $\{1, N(s), N(t)^* : s, t \in G\}$. Then $\mathfrak{A}$ is isometrically $*$-isomorphic to $C(\Theta)$ for some compact Hausdorff space $\Theta$. Let $U : \mathfrak{A} \to C(\Theta)$ be the isometric $*$-isomorphism and let $\varphi(t, \cdot) = UN(t)$ for $t \in G$. By Dunford and Schwartz [11, p. 895] there exists a unique gramian spectral measure $E : \mathcal{B}_\Theta \to A(X)$ such that
$$N(t) = \int_\Theta \varphi(t, \theta)\, E(d\theta), \qquad t \in G.$$
Then it follows that
x(t) = N (t)x0 = = Θ
Θ
ϕ(t, θ) E(dθ)x0
ϕ(t, θ) ξ(dθ),
t ∈ G,
where ξ(·) = E(·)x0 ∈ cagos(A, X). Hence, the process {x(t)} is of operator Karhunen class.
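As a finite-dimensional sanity check of Proposition 4.4.1, the following Python sketch takes $H = \mathbb{C}$, so that $X$ reduces to the Hilbert space $\mathbb{C}^n$ and the gramian to the ordinary inner product; this simplification, the discrete Fourier basis and the particular eigenvalues are assumptions made only for illustration, and the helper names are hypothetical. Commuting normal matrices share an orthonormal eigenbasis $\{u_k\}$, so $\Theta$ becomes the finite index set, $E(\{k\}) = u_ku_k^*$, $\varphi(t, k)$ is the $k$-th eigenvalue of $N(t)$, and $\xi(\{k\}) = E(\{k\})x_0$. The script checks that $N(t)x_0 = \sum_k \varphi(t, k)\xi(\{k\})$ and that $[x(s), x(t)] = \sum_k \varphi(s, k)\overline{\varphi(t, k)}F_\xi(\{k\})$.

```python
import cmath
import math


def dft_basis(n):
    """Orthonormal basis u_k[j] = exp(2*pi*i*j*k/n) / sqrt(n) of C^n (k = 0, ..., n-1)."""
    return [[cmath.exp(2j * math.pi * j * k / n) / math.sqrt(n) for j in range(n)]
            for k in range(n)]


def inner(x, y):
    """Inner product <x, y> = sum_j x_j * conj(y_j), linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def operator_from_spectrum(eigs, basis):
    """Matrix of N = sum_k eigs[k] u_k u_k^*; all such N are normal and commute."""
    n = len(basis)
    return [[sum(eigs[k] * basis[k][r] * basis[k][c].conjugate() for k in range(n))
             for c in range(n)] for r in range(n)]


def apply(m, v):
    """Matrix-vector product m v."""
    return [sum(m[r][c] * v[c] for c in range(len(v))) for r in range(len(m))]


n = 4
basis = dft_basis(n)
# Hypothetical spectral functions phi(t, k), t in G = {0, 1, 2}, k in Theta = {0, ..., n-1}.
phi = {t: [cmath.exp(1j * (t + 1) * k) * (1 + 0.5 * k) for k in range(n)] for t in range(3)}
x0 = [1.0 + 0j, 2.0 - 1j, 0.5j, -1.0 + 0j]

# xi({k}) = E({k}) x0 = <x0, u_k> u_k: a vector measure on Theta with orthogonal values.
xi = [[inner(x0, basis[k]) * u for u in basis[k]] for k in range(n)]
F = [inner(xi[k], xi[k]).real for k in range(n)]  # F_xi({k}) = ||xi({k})||^2

# x(t) = N(t) x0 computed directly, and via the "integral" sum_k phi(t, k) xi({k}).
x = {t: apply(operator_from_spectrum(phi[t], basis), x0) for t in phi}
x_int = {t: [sum(phi[t][k] * xi[k][j] for k in range(n)) for j in range(n)] for t in phi}


def gamma_spectral(s, t):
    """Covariance from the spectral side: sum_k phi(s, k) * conj(phi(t, k)) * F_xi({k})."""
    return sum(phi[s][k] * phi[t][k].conjugate() * F[k] for k in range(n))


max_rep_err = max(abs(a - b) for t in phi for a, b in zip(x[t], x_int[t]))
max_cov_err = max(abs(inner(x[s], x[t]) - gamma_spectral(s, t)) for s in phi for t in phi)
print(max_rep_err, max_cov_err)  # both are expected to be of rounding-error size
```

Working with the eigenbasis makes the spectral measure explicit; in the proposition itself this role is played by the Gelfand representation and the spectral theorem.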
Before stating the next proposition we make the following remark. For any Hilbert space $K$ there exists a probability measure space $(\Theta, \mathfrak{A}, \nu)$ such that $K$ is isomorphic to $L^2(\Theta, \mathfrak{A}, \nu) = L^2(\nu)$, where $\Theta$ can be taken to be a compact Hausdorff space and $\mathfrak{A}$ its Borel $\sigma$-algebra (cf. Rao [33, p. 515]). Let $(\Theta, \mathfrak{A})$ be an arbitrary measurable space and $F \in \mathrm{ca}(\mathfrak{A}, T^+(H))$. Then, since $L^2(F)$ is a normal Hilbert $B(H)$-module, it is isomorphic to $H \otimes K_1 = S(K_1, H)$ for some Hilbert space $K_1$. Hence it follows from the above remark that there exists a compact Hausdorff space $\tilde\Theta$ such that $K_1$ is isomorphic to $L^2(\tilde\Theta, \tilde{\mathfrak{A}}, \nu)$, where $\nu$ is a probability measure. Thus there exists an operator measure $\tilde F \in \mathrm{ca}(\tilde{\mathfrak{A}}, T^+(H))$ such that $L^2(F) \cong L^2(\tilde F)$. The following proposition shows that every operator Karhunen process has an operator representation in terms of an abelian family of gramian normal operators. Recall that we assume that $H$ and $K$ are separable.

Proposition 4.4.2. Let $X = S(K, H)$ and let $\{x(t)\}$ be an $X$-valued process on a set $G$ of operator Karhunen class whose operator covariance function $\Gamma$ is given by
$$\Gamma(s, t) = [x(s), x(t)] = \int_\Theta \Phi(s, \theta)\, F(d\theta)\, \Phi(t, \theta)^*, \qquad s, t \in G,$$
for an operator measure $F \in \mathrm{ca}(\mathfrak{A}, T^+(H))$ and a family of operator valued functions $\{\Phi(t, \cdot) : t \in G\} \subseteq L^2(F)$ on some measurable space $(\Theta, \mathfrak{A})$. Then there exists an abelian set of operators $\{N(s), N(t)^* : s, t \in G\} \subseteq \mathcal{A}(X)$ such that
$$x(t) = N(t)\xi(\Theta) = \int_\Theta \Phi(t, \theta)\, \xi(d\theta), \qquad t \in G,$$
for a unique $X$-valued gramian orthogonally scattered measure $\xi \in \mathrm{ca}_{gos}(\mathfrak{A}, X)$.

Proof. Without loss of generality we can suppose that $X = \tilde{\mathfrak{S}}\{x(t) : t \in G\}$ and $L^2(F) = \tilde{\mathfrak{S}}\{\Phi(t, \cdot) : t \in G\}$, where $\tilde{\mathfrak{S}}\{\cdots\}$ is the closed submodule generated by the set $\{\cdots\}$. By the paragraph just above the proposition we can assume that $\Theta$ is a compact Hausdorff space and $\mathfrak{A}$ is its Borel $\sigma$-algebra. The measure $F$ can be assumed regular, i.e., for any $A \in \mathfrak{A}$ and $\varepsilon > 0$ there exist a compact set $C$ and an open set $O$ such that $C \subseteq A \subseteq O$ and $\|F\|(O \setminus C) < \varepsilon$, $\|F\|(\cdot)$ being the semivariation. The gramian in $L^2(F)$ is given by
$$[\Phi, \Psi]_F = \int_\Theta \Phi\, dF\, \Psi^*, \qquad \Phi, \Psi \in L^2(F).$$
Then, for $s, t \in G$,
$$\Gamma(s, t) = [x(s), x(t)] = \int_\Theta \Phi(s, \theta)\, F(d\theta)\, \Phi(t, \theta)^* = [\Phi(s, \cdot), \Phi(t, \cdot)]_F,$$
and we see that the operator $U : X \to L^2(F)$ defined by
$$Ux(t) = \Phi(t, \cdot), \qquad t \in G,$$
extends to a gramian unitary operator and gives an isomorphism between $X$ and $L^2(F)$. Let $M(A, B) = F(A \cap B)$ for $A, B \in \mathfrak{A}$. Then $M : \mathfrak{A} \times \mathfrak{A} \to T(H)$ is a positive definite kernel, and there exists a reproducing kernel normal Hilbert $B(H)$-module $X_M$ admitting $M$ as a reproducing kernel. Let $\eta(A) = F(A \cap \cdot)$ for $A \in \mathfrak{A}$, so that $[\eta(A), \eta(B)]_M = M(A, B) = F(A \cap B)$ for $A, B \in \mathfrak{A}$, $F = F_\eta$, $\eta \in \mathrm{ca}_{gos}(\mathfrak{A}, X_M)$, and $\eta$ is regular in the same sense as $F$. By [13, Proposition 13 on p. 118] we have $L^2(F) \cong \tilde{\mathfrak{S}}_\eta(\Theta)$, the closed submodule generated by the set $\{\eta(A) : A \in \mathfrak{A}\}$, where the isomorphism $V : L^2(F) \to \tilde{\mathfrak{S}}_\eta(\Theta)$ is given by
$$V(\Phi) = \int_\Theta \Phi\, d\eta, \qquad \Phi \in L^2(F).$$
Since $\Phi(t, \cdot) \in L^1_{DS}(\eta) = L^2(F)$, we can define an $X_M$-valued process $\{y(t)\}$ by
$$y(t) = \int_\Theta \Phi(t, \theta)\, \eta(d\theta), \qquad t \in G.$$
Then $\{y(t)\}$ is of operator Karhunen class. Now $W = U^{-1}V^{-1} : X_M \to X$ is gramian unitary with $Wy(t) = x(t)$ for $t \in G$, and hence $X_M$ and $X$ are isomorphic. Thus we have by Remark 3.6.3(3) that
$$x(t) = Wy(t) = W\int_\Theta \Phi(t, \theta)\, \eta(d\theta) = \int_\Theta \Phi(t, \theta)\, \xi(d\theta), \qquad t \in G,$$
where $\xi = W\eta \in \mathrm{ca}_{gos}(\mathfrak{A}, X)$ is gramian orthogonally scattered and regular. By Molnár and Kakihara [23, Theorem 2.7] there exists a regular spectral measure $E : \mathfrak{A} \to B(K)$ such that $\xi(\cdot) = E(\cdot)\xi(\Theta)$. Since we can identify
$B(K)$ with $1 \otimes B(K) = \mathcal{A}(X)$ (cf. [13, p. 30]), $E(\cdot)$ becomes a gramian spectral measure. Let
$$N(t) = \int_\Theta \Phi(t, \theta)\, E(d\theta), \qquad t \in G.$$
Then it is easy to see that $N(t)$ is well defined, $N(t) \in \mathcal{A}(X)$, $N(s)N(t) = N(t)N(s)$, and $N(s)N(t)^* = N(t)^*N(s)$ for $s, t \in G$. For instance, $N(s)N(t) = N(t)N(s)$ can be verified as follows. Let $t \in G$ be fixed and let $\{\Phi_n\} \subset L_0(\Theta; B(H))$ be a sequence such that $\|\Phi_n - \Phi(t, \cdot)\|_F \to 0$. Since $N(s)$ and $N(t)$ are module maps and $N(s)E(\cdot) = E(\cdot)N(s)$, it follows that
$$\begin{aligned} N(s)N(t) &= N(s)\int_\Theta \Phi(t, \theta)\, E(d\theta) = \int_\Theta N(s)\Phi(t, \theta)\, E(d\theta) = \lim_{n\to\infty}\int_\Theta N(s)\Phi_n(\theta)\, E(d\theta) \\ &= \lim_{n\to\infty}\int_\Theta \Phi_n(\theta)N(s)\, E(d\theta) = \int_\Theta \Phi(t, \theta)\, E(d\theta)\,N(s) = N(t)N(s), \end{aligned}$$
where the limit is in the strong operator topology and Remark 3.6.3(2) is used. Thus, we have for $t \in G$
$$x(t) = \int_\Theta \Phi(t, \theta)\, \xi(d\theta) = \int_\Theta \Phi(t, \theta)\, E(d\theta)\xi(\Theta) = N(t)\xi(\Theta),$$
completing the proof.
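Conversely, a finite-dimensional reading of Proposition 4.4.2 can be sketched under the same simplifying assumption $H = \mathbb{C}$, with $\Theta$ finite and $F$ given by nonnegative weights $F(\{k\})$; all data and helper names below are hypothetical. Rather than constructing the isomorphism $W$ of the proof, the sketch works directly in the model space $\mathbb{C}^m$, where $E(\{k\})$ is the $k$-th coordinate projection and $\xi(\{k\}) = \sqrt{F(\{k\})}\,e_k$, and it checks that $x(t) = N(t)\xi(\Theta)$ with $N(t) = \sum_k \Phi(t, k)E(\{k\})$ reproduces the prescribed covariance $\Gamma$.

```python
import cmath
import math

# Hypothetical discrete data: Theta = {0, 1, 2} with scalar weights F({k}) = f[k] >= 0,
# and spectral functions Phi(t, k) for t in G = {0, 1, 2, 3}.
f = [0.5, 1.5, 2.0]
Phi = {t: [cmath.exp(1j * t * (k + 1)) * (1.0 + 0.25 * t * k) for k in range(len(f))]
       for t in range(4)}


def gamma(s, t):
    """Gamma(s, t) = sum_k Phi(s, k) F({k}) conj(Phi(t, k)), the given covariance."""
    return sum(Phi[s][k] * f[k] * Phi[t][k].conjugate() for k in range(len(f)))


def inner(u, v):
    """Inner product <u, v> = sum_j u_j * conj(v_j)."""
    return sum(a * b.conjugate() for a, b in zip(u, v))


# Model space C^m: E({k}) = e_k e_k^* and xi({k}) = sqrt(f[k]) e_k, so ||xi({k})||^2 = F({k})
# and xi(Theta) = sum_k xi({k}) = (sqrt(f[0]), ..., sqrt(f[m-1])).
xi_Theta = [math.sqrt(fk) for fk in f]


def N_diag(t):
    """N(t) = sum_k Phi(t, k) E({k}) is diagonal, so all N(s), N(t)^* commute."""
    return Phi[t]


def x(t):
    """x(t) = N(t) xi(Theta), which equals the integral of Phi(t, .) against xi."""
    return [d * v for d, v in zip(N_diag(t), xi_Theta)]


max_err = max(abs(inner(x(s), x(t)) - gamma(s, t)) for s in Phi for t in Phi)
print(max_err)  # expected to be of rounding-error size
```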
Remark 4.4.3. (1) When $L_0^2(\Omega)$ is separable, any $X = L_0^2(\Omega; H)$-valued process is of operator Karhunen class (cf. [13, p. 175, Theorem 5]).
(2) If $G = \mathbb{R}$ or $\mathbb{N}$ and $\{x(t)\}$ is an $X$-valued norm continuous process, then it can be represented as
$$x(t) = A(t)U(t)x_0, \qquad t \in G,$$
for some point $x_0 \in H(\tilde x) = \tilde{\mathfrak{S}}\{x(t) : t \in G\}$, where $A(t)$ is a densely defined closed module map on $H(\tilde x)$ and $\{U(t) : t \in G\}$ is a weakly continuous group of gramian unitary operators on $H(\tilde x)$ such that $A(s)U(t) = U(t)A(s)$ for $s, t \in G$. This follows directly from Chang and Rao [6, Theorem 6, p. 181].
Acknowledgement
The author is grateful to Professor M. M. Rao for his invitation to write this chapter of the book and for his kind advice during its preparation. The invitation gave the author the opportunity to present his results on weak Radon-Nikodým derivatives and their applications in an organized way.
References
[1] J. L. Abreu, A note on harmonizable and stationary sequences, Boletin de la Sociedad Matemática Mexicana 15 (1970), 48–51.
[2] J. L. Abreu and H. Salehi, Schauder basic measures in Banach and Hilbert spaces, Boletin de la Sociedad Matemática Mexicana 29 (1984), 71–84.
[3] B. Bongiorno, L. Di Piazza and K. Musial, A characterization of the weak Radon-Nikodým property by finitely additive interval functions, Bull. Austral. Math. Soc. 80 (2009), 476–485.
[4] D. K. Chang and M. M. Rao, Bimeasures and sampling theorems for weakly harmonizable processes, Stochastic Analysis and Applications 1 (1983), 21–55.
[5] D. K. Chang and M. M. Rao, Bimeasures and nonstationary processes, in Real and Stochastic Analysis, M. M. Rao (ed.), Wiley, New York, 1986, pp. 7–118.
[6] D. K. Chang and M. M. Rao, Special representations of weakly harmonizable processes, Stochastic Analysis and Applications 6 (1988), 169–189.
[7] S. D. Chatterji, Orthogonally scattered dilations of Hilbert space valued set functions, Lecture Notes in Mathematics No. 945, Springer, 1982, pp. 269–281.
[8] H. Cramér, A contribution to the theory of stochastic processes, in Proc. Second Berkeley Symposium on Math. Statist. Probability, J. Neyman (ed.), University of California Press, Berkeley, 1951, pp. 329–339.
[9] J. Diestel and J. J. Uhl, Jr., Vector Measures, American Mathematical Society, Providence, R. I., 1977.
[10] N. Dunford and J. T. Schwartz, Linear Operators, Part I, Interscience, New York, 1958.
[11] N. Dunford and J. T. Schwartz, Linear Operators, Part II, Interscience, New York, 1963.
[12] M. R. Hestenes, Relative self-adjoint operators in Hilbert space, Pacific J. Math. 11 (1961), 1315–1357.
[13] Y. Kakihara, Multidimensional Second Order Stochastic Processes, World Scientific, Singapore, 1997.
[14] Y. Kakihara, Spectral domains of vector harmonizable processes, J. Statistical Planning and Inference 100 (2002), 93–108.
[15] Y. Kakihara, Integration with respect to Hilbert-Schmidt class operator valued measures, Proceedings of 2005 Symposium on Applied Functional Analysis: Information Sciences and Related Fields, T. Murofushi, W. Takahashi and M. Tsukada (eds.), Yokohama Publishers, Yokohama, 2007, pp. 263–278.
[16] Y. Kakihara, Radon-Nikodým derivatives of Hilbert space valued measures, J. Statistical Theory and Practice 5 (2011), 453–473.
[17] Y. Kakihara, Orthogonally scattered measures and weak Radon-Nikodým derivatives, Integration: Mathematical Theory and Applications 3 (2012), 125–136.
[18] K. Karhunen, Über lineare Methoden in der Wahrscheinlichkeitsrechnung, Ann. Acad. Sci. Fenn. Ser. A I Math. 37 (1947), 1–79.
[19] Z. Lipecki and K. Musial, On the Radon-Nikodým derivative of a measure taking values in a Banach space with basis, Rev. Roumaine Math. Pures Appl. 23 (1978), 911–915.
[20] V. Mandrekar and H. Salehi, The square-integrability of operator-valued functions with respect to a nonnegative operator-valued measure and the Kolmogorov isomorphism theorem, Indiana University Mathematics J. 20 (1970), 545–563.
[21] P. Masani, Orthogonally scattered measures, Adv. in Math. 2 (1968), 61–117.
[22] P. Masani, Quasi-isometric measures and their applications, Bull. Amer. Math. Soc. 76 (1970), 427–528.
[23] L. Molnár and Y. Kakihara, A remark on Hilbert-Schmidt operator valued c.a.g.o.s. measures, Research Activities Fac. Sci. Engrg. Tokyo Denki Univ. 12 (1990), 13–19.
[24] M. Morse and W. Transue, Bimeasures and their integral extensions, Annali di Mat. Pura ed Appl. 39 (1955), 345–346.
[25] M. Morse and W. Transue, C-bimeasures Λ and their superior integrals Λ∗, Rend. Circolo Mat. Palermo 4 (1955), 270–300.
[26] M. Morse and W. Transue, The representation of a C-bimeasure on a general rectangle, Proc. Nat. Acad. Sci. U.S.A. 42 (1956), 89–95.
[27] M. Morse and W. Transue, C-bimeasure Λ and their integral extensions, Ann. Math. 64 (1956), 480–504.
[28] H. Niemi, On orthogonally scattered dilations of bounded vector measures, Ann. Acad. Sci. Fenn. Ser. A I Math. 3 (1977), 43–52.
[29] M. Ozawa, Hilbert B(H)-modules and stationary processes, Kodai Mathematical J. 3 (1980), 26–39.
[30] M. M. Rao, Remarks on a Radon-Nikodým theorem for vector measures, Vector and Operator Valued Measures and Applications, D. H. Tucker and H. B. Maynard (eds.), Academic Press, New York, 1973, pp. 303–317.
[31] M. M. Rao, Harmonizable processes: Structure theory, L’Enseign. Math. 28 (1982), 295–351.
[32] M. M. Rao, Harmonizable, Cramér, and Karhunen classes of processes, in Handbook of Statistics, J. Hannan, P. R. Krishnaiah and M. M. Rao (eds.), North-Holland, Amsterdam, Vol. 5, 1985, pp. 279–310.
[33] M. M. Rao, Measure Theory and Integration, John Wiley & Sons, New York, 1987.
[34] M. M. Rao, Bimeasures and harmonizable processes, in Lecture Notes in Mathematics No. 1379, Probability Measures on Groups IX, H. Heyer (ed.), Springer-Verlag, New York, 1989, pp. 254–298.
[35] M. M. Rao, Structure of Karhunen processes, J. Combinatorics, Information & Systems Sciences 31 (2006), 167–187.
[36] M. Rosenberg, Quasi-isometric dilations of operator-valued measures and Grothendieck’s inequality, Pacific J. Math. 103 (1982), 135–161.
[37] E. Thomas, L’intégration par rapport à une mesure de Radon vectorielle, Annales de L’Institut Fourier, Grenoble 20 (1970), 55–191.
[38] H. Whitney, Geometric Integration Theory, Princeton University Press, Princeton, 1956.
CHAPTER 6 ENTROPY, SDE-LDP AND FENCHEL-LEGENDRE-ORLICZ CLASSES
M. M. RAO
1. Introduction

In probability, the limiting distributions of suitable sequences of functions, such as averages of independent random variables, known as 'central limit theorems', justly occupy a premier position. To employ such a result in most applications of interest, one needs to know how large the collection of random variables comprising the averages (or similar functions) should be, depending on the tolerance level for possible errors in its use. This issue in using limiting results was pointed out long ago in a nice exposition by H. Chernoff (1956), explaining some key statistical and other applications. Indeed, the problem arises whenever one deals with questions of inference. The earliest problem is knowing the tolerance levels of errors in such an application. This point was in fact raised by P. L. Tshebyshev at the end of the 19th century, and it was also detailed by A. I. Khintchine (1953) in a beautiful essay presenting the basic acceptable criteria for many (even most) purposes; it is outlined here, as it will be a useful motivation for the ensuing analysis.

Consider a finite set of mutually exclusive and exhaustive events $A_1, \dots, A_n$ occurring with probabilities $p_1, \dots, p_n$, so that $P(A_i) = p_i$, $0 \le p_i \le 1$, and $\sum_{i=1}^n p_i = 1$. To measure the uncertainty of occurrence of the events, one may look for a real valued function $h(p_1, \dots, p_n)$ with the following properties: (a) $h(p_1, \dots, p_n)$ attains its maximum (uncertainty) when $p_1 = \cdots = p_n$; (b) $h(p_1, \dots, p_n, 0) = h(p_1, \dots, p_n)$; and (c) if $A = (A_1, \dots, A_n)$ and $B = (B_1, \dots, B_m)$ are a pair of finite schemes of events which are mutually independent, then $h(AB) = h(A) + h(B)$, and more generally, if $A$ and $B$ are not necessarily independent, then $h(AB) = h(A) + h_A(B)$.
Here the last term denotes the conditional entropy, defined as $h_A(B) = \sum_{j=1}^{n} p_j h_{A_j}(B)$, where $h_{A_j}(B)$ denotes the entropy of the scheme $B$ computed with the conditional probabilities of $B_1, \dots, B_m$ given $A_j$. The characterization of a function $h(\cdot)$ satisfying the above three conditions is somewhat similar to that of the solutions of the classical Cauchy equation $f(x + y) = f(x) + f(y)$, which must be of the form $f(x) = cx$ under the assumption of continuity or boundedness of $f$. Analogously, Khintchine has shown that, under the (natural) condition that $h(\cdot)$ is continuous, there is exactly one solution up to a positive constant factor, given by $h(p_1, \dots, p_n) = -c\sum_{j=1}^n p_j\log p_j$, where $c > 0$ is a constant. The logarithm can be taken to any fixed base; we take the natural base for definiteness and set $x\log x = 0$ for $x = 0$. Normalizing, for convenience, so that $c = 1$, the function $h(\cdot)$, which measures the uncertainty, is called the entropy of the scheme. It is interesting to note that, under the above conditions, the 'log' function shows up, and it and its inverse (the 'exponential' function) play a critical role in much of the following analysis. This will lead to the so-called Legendre transform as well as to some extensions of Laplace's original method in the ensuing work.

An important early effort combining both the central limit problem and the useful speed of convergence, especially the error bound, was made by A. M. Liapounov at the end of the 19th century. An improved error estimate was obtained by A. C. Berry (1941) and C. G. Esseen (1945) by the middle of the century, and it is stated here as motivation. The essential details are in many probability textbooks (cf., e.g., Rao (1984, p. 256), or the 2nd edition by Rao and Swift (2006, p. 296)); for an extended analysis with refinements see particularly Sazonov (1981).

Theorem 1.1. Let $X_1, \dots, X_n$ be a sequence of independent (real) random variables on $(\Omega, \Sigma, P)$ with means $\mu_i = E(X_i)$, variances $\sigma_i^2 = E(X_i - \mu_i)^2$, and third absolute moments $\rho_i^3 = E(|X_i - \mu_i|^3)$. Let $S_n = \sum_{i=1}^n X_i$, let $E(S_n)$, $\sigma^2(S_n)$ be its mean and variance, and let $\rho^3_{S_n} = \sum_{i=1}^n \rho_i^3$ (the sum of the third absolute moments). Then the error in the central limit theorem satisfies
$$\sup_{x\in\mathbb{R}}\left| P\left[\frac{S_n - E(S_n)}{\sigma(S_n)} < x\right] - \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-u^2/2}\,du\right| < C_0\,\frac{\rho^3_{S_n}}{\sigma^3(S_n)}, \qquad (1)$$
and if, further, the $X_i$ are identically distributed, then the right side of (1) can be replaced by $C_1K/\sqrt{n}$, where $K = (\rho_1/\sigma_1)^3$, and $C_0$, $C_1$ are absolute constants satisfying $0 < C_0 < 4.8$ and $0 < C_1 < 1.321$. These constants can perhaps be improved.

By bringing entropy considerations into this analysis, it is found after some further work that the result can be generalized and refined to include several not necessarily independent sequences or processes. This sharper analysis is discussed in later sections. The work is also motivated by some key observations of Kac (1951) on a large class of problems related to certain classical and stochastic differential equations of elliptic and parabolic types, which enrich such applications. This effort leads to the use of certain powerful techniques of function space analysis, which benefit both subjects and enlarge the range of applications in many areas. Sections 2 and 3 consider these questions (extending to $\mathbb{R}^n$, $n > 1$) in detail. Section 4 deals with LDP problems, with particular attention to function spaces under all these considerations. An important aspect here is the treatment of the LDP for processes with values in infinite dimensional vector spaces, as an extension of the finite dimensional results just noted, but now using projective limit theory. Here one has to consider Fenchel's extension of convexity to higher dimensions; that aspect of function space theory is not readily available in books, and so some attention is paid to it in Section 5 with a view to future applications. The last section considers the evaluation problems of conditioning and some methods of evaluation that avoid multiple solutions depending on the 'differentiation bases' used. This general problem has not been satisfactorily treated in the literature.
Finally, we shall also discuss the (usually ignored) fact that the work in this whole area, as in most current mathematical research, deals only with existence problems, while the related, important constructive methods are usually not treated. This aspect lags far behind, and the only mathematician of recent times who spent much of his last years on it was Errett Bishop, with an incisive volume (1967, slightly updated in 1985). Problems awaiting solution in the present context will be considered in the last part of this analysis, as an area for future investigation. This especially concerns some important problems on which a large part of the analysis of conditional expectations rests. As is well known, this aspect (the differentiation of measures) involves numerous questions, and constructive analysis has not really made much of an impact
although its use and application are essential parts of current theory and practice. These results will be discussed in the last section.
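The two quantitative ingredients of this section can be illustrated numerically. The Python sketch below (function names are hypothetical) computes the entropy $h(p_1, \dots, p_n) = -\sum_j p_j\log p_j$ with $c = 1$ and checks properties (a) to (c), including the conditional form $h(AB) = h(A) + h_A(B)$. It then compares the exact Kolmogorov distance for standardized sums of Bernoulli variables with the identically distributed bound $C_1K/\sqrt{n}$ of Theorem 1.1; since the theorem only asserts that some $C_1 < 1.321$ works, using $1.321$ itself is a conservative choice, and the Bernoulli summands are an assumption of the sketch.

```python
import math


def entropy(p):
    """h(p_1, ..., p_n) = -sum_j p_j log p_j (natural log, c = 1, and 0 log 0 = 0)."""
    return -sum(pj * math.log(pj) for pj in p if pj > 0)


def conditional_entropy(p, cond):
    """h_A(B) = sum_j p_j h_{A_j}(B), where cond[j][k] = P(B_k | A_j)."""
    return sum(pj * entropy(row) for pj, row in zip(p, cond))


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def binomial_kolmogorov_distance(n, p):
    """Exact sup_x |P[(S_n - np)/sigma(S_n) < x] - Phi(x)| for S_n ~ Binomial(n, p)."""
    sigma = math.sqrt(n * p * (1.0 - p))
    atoms = [(k - n * p) / sigma for k in range(n + 1)]
    pmf = [math.comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(n + 1)]
    worst = 0.0
    below = 0.0  # value of x -> P[Z < x] on the gap (atoms[k-1], atoms[k]]
    for k in range(n + 1):
        # x -> P[Z < x] is constant on each gap between consecutive atoms, so the supremum
        # of the difference is attained as x tends to an end point of a gap.
        worst = max(worst, abs(below - normal_cdf(atoms[k])))  # right end of the gap
        below += pmf[k]                                        # now P[Z <= atoms[k]]
        worst = max(worst, abs(below - normal_cdf(atoms[k])))  # left end of the next gap
    return worst


def berry_esseen_iid_bound(n, p, c1=1.321):
    """C_1 K / sqrt(n) with K = rho_1^3 / sigma_1^3 for Bernoulli(p) summands."""
    rho3 = p * (1.0 - p) * (p**2 + (1.0 - p) ** 2)  # E|X - p|^3
    sigma3 = (p * (1.0 - p)) ** 1.5
    return c1 * (rho3 / sigma3) / math.sqrt(n)


p = [0.5, 0.3, 0.2]
q = [0.6, 0.4]
cond = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
joint_indep = [pi * qj for pi in p for qj in q]
joint_dep = [p[j] * cond[j][k] for j in range(3) for k in range(2)]
print(entropy(p), entropy(joint_indep) - entropy(p) - entropy(q),
      entropy(joint_dep) - entropy(p) - conditional_entropy(p, cond))
print(binomial_kolmogorov_distance(50, 0.3), berry_esseen_iid_bound(50, 0.3))
```

The binomial case is chosen because its distribution function is available in closed form, so the left side of (1) can be computed exactly rather than estimated.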
2. Error Estimation Problems from Probability Limit Theory

The basic results in probabilistic analysis come from statistical and related applications motivated by large sample analysis, leading to a limit theory for estimating, for instance, an unknown parameter such as the mean of an underlying probability distribution for which only a few moments may be assumed to exist. The discrepancy between the estimated and the true value should be controlled as a function of the sample size. This part of the analysis is usually termed 'central limit theory', and it occupies a large portion of the early part of probability. The problems will be presented here from a general viewpoint. To begin with, note that a special result related to the Vitali convergence theorem of measure theory has an interesting extension from pointwise (or in measure) convergence of sequences to weak (or distributional) convergence, obtained by changing the measure space instead of working on the given one. This point of view has a distinguished place in the present analysis, since the underlying (probability) space can be replaced by another one to serve the purpose at hand, as was noted by Skorohod (1965). [The usefulness of this view was indicated in our basic textbook (1984, 2nd edn. 2006) and emphasized recently in giving a purely probabilistic proof of the classical Stirling approximation.] Also in the mid 1960s, the close relation between continuous square integrable martingales and Brownian motion was discovered simultaneously by Dambis, by Dubins and Schwarz, and by Knight, in addition to Skorohod. A brief account of this will be included, as it strengthens the motivation here. It also shows the key role played by 'stopping time transformations' in this analysis.

Theorem 2.1 (Knight). Let $\{X_t, \mathcal{F}_t, t \ge 0\} \subset L^2(P)$ be a continuous martingale, i.e., $E^{\mathcal{F}_s}(X_t) = X_s$ for $0 \le s \le t$, where $\{\mathcal{F}_t, t \ge 0\}$ is a standard filtration in $(\Omega, \Sigma, P)$, a complete probability triple, namely $\mathcal{F}_t \uparrow$ with $\mathcal{F}_{t+0} = \bigcap_{s>t}\mathcal{F}_s = \mathcal{F}_t$, $t \ge 0$, and $t \mapsto X_t$ is a.e. continuous. Then almost all sample paths $X_{(\cdot)}(\omega) : \mathbb{R}^+ \to \mathbb{R}$ are either of unbounded variation or monotone on each nondegenerate interval. In particular, if $X_t$ is a martingale process, then it is either of unbounded variation on each nondegenerate interval or a degenerate process, i.e., $X_t = X_0$ a.e. for all $t \ge 0$ (a behavior characteristic of Brownian motion).
There is the following sharper result for a subclass of processes, given at once for an $n$-variate process, due to Dambis and, independently, to Dubins and Schwarz, relating it to the $n$-dimensional BM (both discovered in 1965). (The details of this result and the preceding one are also available in several books, cf., e.g., Rao (1995), p. 406.)

Theorem 2.2. Let $(\Omega, \Sigma, P)$ be a complete probability space having $\{\mathcal{F}_t, t \ge 0\}$ as an increasing family of $\sigma$-subalgebras, $\mathcal{F}_t \subset \Sigma$, such that $\mathcal{F}_t = \bigcap_{s>t}\mathcal{F}_s$, $t \ge 0$, and each $\mathcal{F}_t$ is $P$-complete (termed a standard filtration of $\sigma$-algebras of $\Sigma$). Let $\{X_t, \mathcal{F}_t, t \ge 0\}$ be an $\mathbb{R}^n$-valued (separable) process with continuous paths starting at $a_0 \in \mathbb{R}^n$ (i.e., $P(\lim_{s\to t} X_s = X_t) = 1$, $t \ge 0$, and $P(X_0 = a_0) = 1$). Suppose that each $\{X^i_t - X^i_0, \mathcal{F}_t\}$, $1 \le i \le n$, is a local martingale, in the sense that there is a strictly increasing sequence of stopping times $T_m$ of the $\mathcal{F}_t$-family, meaning that (a) $\{T_m \le t\} \in \mathcal{F}_t$, $t \ge 0$; (b) $P(T_m < T_{m+1} \le m + 1) = 1$ and $P(T_m \uparrow \infty) = 1$; and (c) if $\tau^m_t = \min(T_m, t)$ and $Z^m_t = X \circ \tau^m_t$, then $\{Z^m_t, \mathcal{F}_t, t \ge 0\}$ is a uniformly integrable martingale. [Such an $\{X_t, \mathcal{F}_t, t \ge 0\}$ is also called a local martingale.] Suppose additionally that the following three conditions hold:
(i) each of the component processes $\{(X^i - X^i_0)_t, \mathcal{F}_t, t \ge 0\}$, $1 \le i \le n$, is a continuous local martingale;
(ii) each of the second order processes $\{(X^iX^j - X^i_0X^j_0)_t, \mathcal{F}_t, t \ge 0\}$, $1 \le i, j \le n$, is a continuous local martingale;
(iii) $P[\sup_{s,t \in I}\|X_s - X_t\| > 0] = 1$ for each open interval $I \subset \mathbb{R}^+$, where $\|\cdot\|$ is the norm of $\mathbb{R}^n$.
Then the $X_t$-family is a shadow of the BM in the sense that $X_t = Y \circ T_t$, $t \ge 0$, for a time change $\{T_t, t \ge 0\}$, where $\{Y_t, t \ge 0\}$ is the BM in $\mathbb{R}^n$.

A complete proof of this key result is also found in the above reference.
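In the simplest one-dimensional case, the time-change picture behind Theorem 2.2 can be checked numerically. The sketch below assumes a deterministic integrand $\sigma$ (a hypothetical choice), so that $M_t = \int_0^t \sigma(s)\,dW_s$ is a continuous martingale whose clock $\langle M\rangle_t = \int_0^t \sigma(s)^2\,ds$ and time change $T_u = \inf\{t : \langle M\rangle_t \ge u\}$ are deterministic; equivalently to $M = B \circ \langle M\rangle$, the process $B_u = M_{T_u}$ should behave like a BM. The script only compares second moments, $E[B_u^2] = u$ and $E[B_{u_1}B_{u_2}] = \min(u_1, u_2)$, and is not a proof of the theorem.

```python
import math
import random


def sigma(s):
    """Hypothetical deterministic integrand on [0, 1]; M_t = int_0^t sigma(s) dW_s."""
    return 1.0 + math.sin(2.0 * math.pi * s)


STEPS = 400
H = 1.0 / STEPS
# Discrete clock: CLOCK[i] = sum_{j < i} sigma(s_j)^2 H, the quadratic variation <M> at grid point i.
CLOCK = [0.0]
for j in range(STEPS):
    CLOCK.append(CLOCK[-1] + sigma(j * H) ** 2 * H)


def time_change(u):
    """Grid version of T_u = inf{t : <M>_t >= u}; returns a grid index (needs u <= CLOCK[-1])."""
    return next(i for i, c in enumerate(CLOCK) if c >= u)


def sample_time_changed(u_values, n_paths=4000, seed=12345):
    """Simulate Euler sums for M and return samples of (M_{T_u} for u in u_values)."""
    rng = random.Random(seed)
    idx = [time_change(u) for u in u_values]
    samples = []
    for _ in range(n_paths):
        m, path = 0.0, [0.0]
        for j in range(max(idx)):
            m += sigma(j * H) * rng.gauss(0.0, math.sqrt(H))
            path.append(m)
        samples.append([path[i] for i in idx])
    return samples


u1, u2 = 0.5, 1.0
data = sample_time_changed([u1, u2])
second_moment_u2 = sum(b * b for _, b in data) / len(data)
cross_moment = sum(a * b for a, b in data) / len(data)
# For a Brownian motion B_u = M_{T_u}: E[B_{u2}^2] = u2 and E[B_{u1} B_{u2}] = min(u1, u2) = u1.
print(second_moment_u2, cross_moment)
```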
This theorem shows that continuous (even finite dimensional) martingales are certain 'time-change' transformations of a BM, and so one would want details of such generalized processes, closely related to the BM, for many applications. This is obtained from the latter result, using stopping time transformations as a new tool. These two results motivate the work below on error estimation and certain results from abstract analysis often used in probabilistic applications. For most of our work here, it is necessary to have stochastic integrals relative to martingale integrators, as generalizations of Itô's earlier definition with BM integrators and appropriate stochastic integrands. A general and natural extension of the subject was given by Bochner, which will be
recalled here for ease of reference and for useful application. For this, the concepts of stochastic integrator and integrand will be restated after introducing the desired integral. The simplest method of defining the integral of a continuously differentiable function $f : [0, 1] \to \mathbb{R}$ relative to the BM $\{X_t, t \in [0, 1]\}$, suggested by Paley, Wiener, and Zygmund in 1933, is to consider a formal integration by parts of the 'make-believe' integral '$\int_0^1 f(t)\,dX_t$', where $f(0) = 0 = f(1)$ and $f'$ is continuous (i.e., $f \in C_0^1([0, 1])$):
$$Tf = \int_0^1 f(t)\,dX_t = -\int_0^1 X_t f'(t)\,dt, \qquad (1)$$
where the right side is Bochner's well-known vector-valued integral. Now regard $T : C_0^1([0, 1]) \to L^2(P)$ as a mapping; using the fact that $E(X_sX_t) = \min(s, t)$, one has
$$\begin{aligned} \|Tf\|_2^2 &= \int_\Omega \Big(\int_0^1 X_t f'(t)\,dt\Big)^2 dP = \int_\Omega \int_0^1\!\!\int_0^1 X_sX_t f'(s)f'(t)\,ds\,dt\,dP \\ &= \int_0^1\!\!\int_0^1 \Big(\int_\Omega X_sX_t\,dP\Big) f'(s)f'(t)\,ds\,dt = \int_0^1\!\!\int_0^1 \min(s, t)\, f'(s)f'(t)\,ds\,dt \\ &= \int_0^1 f(s)^2\,ds = \|f\|_2^2. \end{aligned} \qquad (2)$$
The next to last equality follows by writing $\min(s, t) = \int_0^1 \chi_{[0,s]}(u)\chi_{[0,t]}(u)\,du$ and using $\int_u^1 f'(s)\,ds = -f(u)$, since $f(1) = 0$.
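The isometry (2) can be checked by simulation. In the following sketch, $f(t) = \sin(\pi t) \in C_0^1([0, 1])$, for which $\|f\|_2^2 = 1/2$, is an illustrative choice, Brownian paths are sampled on a uniform grid, and the function names are hypothetical. The Monte Carlo second moment of the discretized $Tf = -\int_0^1 X_t f'(t)\,dt$ should be close to $1/2$, and, path by path, it should nearly agree with the left-point sums $\sum_i f(t_i)(X_{t_{i+1}} - X_{t_i})$, reflecting the integration by parts behind (1).

```python
import math
import random


def f(t):
    """Test integrand in C_0^1([0, 1]): f(0) = f(1) = 0, with ||f||_2^2 = 1/2."""
    return math.sin(math.pi * t)


def f_prime(t):
    return math.pi * math.cos(math.pi * t)


def brownian_path(rng, steps):
    """Values of a Brownian motion at t_i = i / steps, i = 0, ..., steps (X_0 = 0)."""
    h = 1.0 / steps
    path = [0.0]
    for _ in range(steps):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(h)))
    return path


def pwz_integral(path):
    """Tf = -int_0^1 X_t f'(t) dt via a midpoint Riemann sum (right side of (1))."""
    h = 1.0 / (len(path) - 1)
    return -sum(0.5 * (path[i] + path[i + 1]) * f_prime((i + 0.5) * h) * h
                for i in range(len(path) - 1))


def increment_sum(path):
    """Left-point sum  sum_i f(t_i) (X_{t_{i+1}} - X_{t_i})  for the 'make-believe' integral."""
    h = 1.0 / (len(path) - 1)
    return sum(f(i * h) * (path[i + 1] - path[i]) for i in range(len(path) - 1))


rng = random.Random(2013)
values, gaps = [], []
for _ in range(4000):
    x = brownian_path(rng, 200)
    a = pwz_integral(x)
    values.append(a)
    gaps.append(abs(a - increment_sum(x)))

second_moment = sum(v * v for v in values) / len(values)  # Monte Carlo estimate of ||Tf||_2^2
print(second_moment, max(gaps))  # compare the first number with ||f||_2^2 = 0.5
```

The path-by-path agreement is a discrete summation by parts: because $f(0) = f(1) = 0$, the boundary terms vanish, exactly as in the formal derivation of (1).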
Hence $T$ is an isometric linear operator from $C_0^1([0, 1])$, a dense subspace of $L^2([0, 1])$, into $L^2(P)$, and so it extends uniquely to an isometry of $L^2([0, 1])$ into $L^2(P)$. By the Riesz representation, there then exists a unique vector measure $Z : \mathcal{B}([0, 1]) \to L^2(P)$ (of finite Fréchet [not Vitali] variation) such that
$$Tf = \int_0^1 f(t)\,dZ(t) = \int_0^1 f(t)\,dX_t, \qquad (3)$$
for which $Z([a, b)) = X_b - X_a$. Thus the BM determines a unique vector measure $Z$, and the right side is a well-defined object, called the Wiener integral, obtained originally by N. Wiener in 1923 in a more elaborate manner. A further generalization for a much larger class of processes subsuming the BM was introduced by S. Bochner in 1955, thereby presenting a
powerful method to include numerous stochastic integrals, such as those of (quasi-)martingales and others. This class will now be considered, and an additional (essentially final) extension and some applications will be included. By (2) and (3), the bounded linear operator $T : L^2([0, 1]) \to L^2(P)$ uniquely determines a vector measure $Z(\cdot)$ of finite Fréchet variation. Hence, with $J = [0, 1]$ and $0 = t_0 < t_1 < \cdots < t_n = 1$, for each simple function $f = \sum_{i=0}^{n-1} a_i\chi_{[t_i, t_{i+1})}$, $a_i \in \mathbb{R}$, one has, in cases including the BM,
$$\int_J f(t)\,dZ(t) = \sum_{i=0}^{n-1} a_i Z([t_i, t_{i+1})) = \sum_{i=0}^{n-1} a_i(X_{t_{i+1}} - X_{t_i}),$$
and hence
$$E\Big|\int_J f(t)\,dZ(t)\Big|^2 \le C\int_J |f(t)|^2\,dt$$
(for the BM, with $C = 1$ and equality, by (2)). This seems to have motivated Bochner (1955) to introduce directly a random measure $Z$, termed $L^{2,2}$-bounded, if there is a constant $C > 0$ such that for each bounded measurable $f : J \to \mathbb{R}$ one has
$$E\Big|\int_J f(t)\,dZ(t)\Big|^2 \le C\int_J |f(t)|^2\,dt, \qquad f \in L^2(J, dt). \qquad (4)$$
He also noted that if (4) holds for $f \in L^\rho(\mu)$, $0 < \rho < \infty$, with the left side expectation taken with power $p \ge 1$, one obtains what he termed $L^{\rho,p}$-boundedness. Thus (4) can be replaced by
$$E\Big|\int_J f(t)\,dZ(t)\Big|^p \le C_1\int_J |f(t)|^\rho\,d\mu(t), \qquad (5)$$
for all simple functions $f$ of the above type and for a process $Z : J \to L^p(P)$, where such functions are dense in the metric space $L^\rho(\mu)$, which is an F-space with a translation invariant metric. The following result, which contrasts with the classical Lebesgue case, is worth noting.

Theorem 2.3. Let the process $X = \{X_t, t \in [0, 1]\}$ be $L^{2,2}$-bounded (or just the BM), and let $Z(\cdot)$ be the vector measure which it induces as in (3), so that $Z([a, b)) = X_b - X_a$, $0 \le a < b \le 1$. Then there exists an essentially unique (vector) function $Y : [0, 1] \to L^2(P)$ such that
$$\tau : f \mapsto \int_0^1 f(t)\,dZ(t) = \int_0^1 f(t)Y(t)\,dt, \qquad f \in L^2([0, 1]), \qquad (6)$$
where the right side is a vector (or Bochner) integral and the left side is the stochastic integral defined above. In general $Y$ is not an R-N derivative of $Z$, as the case of the (nondifferentiable) BM shows.

Proof. The mapping $\tau : f \mapsto \int_J f(t)\,dX_t$ is bounded and linear, so that by an extended form (due to Dunford) of the Riesz representation theorem there is a unique vector measure $Z$ such that $\tau(f) = \int_J f(t)\,dZ(t)$, where $Z$ is only of finite Fréchet variation and, by the $L^{2,2}$-boundedness hypothesis, one has (5) with $p = \rho = 2$. Now $J = [0, 1)$ can be regarded as a locally compact abelian group with addition (mod 1) as the group operation. Then by the Plancherel theorem on $L^2(J, dt)$, with $\tilde J$ the dual group of $J$ and its invariant measure, which is a constant multiple of the Lebesgue measure, one has
$$E(|\tau(f)|^2) \le C_1\int_J |f(t)|^2\,dt = C_2\int_{\tilde J} |\hat f(u)|^2\,du. \qquad (7)$$
Hence
$$T(\hat f) = \int_{\tilde J} \hat f(t)\,d\tilde Z(t), \qquad \hat f \in L^2(\tilde J, dt), \qquad (8)$$
where $\tilde Z$ is (like $Z$) a vector measure given by the Riesz representation. Then
$$\begin{aligned} \int_J f(t)\,dZ(t) = \tau(f) = T(\hat f) &= \int_{\tilde J}\Big(\int_J e^{ixt}f(t)\,dt\Big)\,d\tilde Z(x) = \int_J f(t)\Big(\int_{\tilde J} e^{ixt}\,d\tilde Z(x)\Big)\,dt \\ &= \int_J f(t)Y(t)\,dt \quad \text{(say)}, \qquad f \in L^2(J, dt), \end{aligned} \qquad (9)$$
defining $Y$ as the Fourier transform of the vector measure $\tilde Z$; such a (vector) function or process $\{Y(t), t \in J\}$ is also called weakly harmonizable. This proves all the statements for the $L^{2,2}$-bounded case, and the argument can easily be extended to LCA groups.

The $L^{\rho,p}$-boundedness concept can also be given an appealing general form, motivated by the process $\{Y(t), t \in J\}$ obtained above. In fact, by the classical Kolmogorov representation, a (weakly) stationary process on $J$ is the Fourier transform of an orthogonally scattered vector measure with values in $L^2(P)$. Thus the harmonizable case is an extension
of the former, wherein $Z(\cdot)$ is a general vector measure. This will now be presented to explain an aspect of the representation leading to the next stage. Namely, the $L^{2,2}$ as well as the $L^{\rho,p}$-boundedness properties may be extended to processes that induce the known stochastic integrators, keeping in mind the key properties of the BM in this procedure, so as to include all the classes detailed in Theorem 2.2. For this, the general concept of a stochastic integrator (applicable to the BM as well as to the 'quasimartingale' classes) will now be introduced, after recalling the càdlàg condition (the French acronym for right continuity with left limits, namely, continue à droite et limite à gauche) and using the associated filtration of $\sigma$-algebras already employed in Theorem 2.2. First, $L^{\rho,p}$-boundedness may be stated, more generally, as $L^{\varphi_1,\varphi_2}$-boundedness, where $\varphi_i(0) = 0$, $i = 1, 2$, both are increasing, $\varphi_1$ is also convex, and $\varphi_1(x)/x \uparrow \infty$ as $x \uparrow \infty$. Here the convexity and the faster growth of $\varphi_1$ for $x > 0$ come from a classical uniform integrability characterization of a class of integrable random variables $\{X_n, n \ge 1\}$, due to de la Vallée Poussin (1915), which will be invoked here. There is a deep connection between Bochner's $L^{2,2}$-boundedness concept and vector measures $Z : \mathcal{B}(J) \to L^p(P)$, $1 \le p \le 2$, via an important inequality due to Grothendieck, which was extended by Lindenstrauss and Pelczyński (1968) and is detailed in the book (Rao (1995), Prop. VI.2.7). It is stated here to highlight the fact that the de la Vallée Poussin theorem shows up in this work quite naturally. The result alluded to is as follows: if $X_t = Z(0, t]$, $t \in J = [0, 1]$, is the process determined by the measure $Z : \mathcal{B}(J) \to L^2(P)$, then $\{X_t, t \in J\}$ is $L^{2,2}$-bounded, and if $L^2(P)$ is replaced by $L^p(P)$, $1 \le p \le 2$, it becomes $L^{2,p}$-bounded. This result lies somewhat deeper, and the details are included in the above reference.
Now the $L^p(P)$-valued stochastic measure, for $p \ge 1$, has interesting properties. To utilize this fact in a form more useful for applications, the following concept of a stochastic integrator will be needed. Let $\varphi: \mathbb{R} \to \mathbb{R}_+$ be a symmetric function, increasing on $\mathbb{R}_+$ and vanishing only at $x = 0$. Consider the class $L^{\varphi}(P) = \{f: J \times \Omega \to \mathbb{R},\ \mathcal{B}(J) \otimes \Sigma\text{-measurable},\ \|f\|_{\varphi,P} < \infty\}$, where
$$\|f\|_{\varphi,P} = \inf\Big\{k > 0 : \int_{\Omega \times J} \varphi(f/k)\, dP\, dt \le k\Big\}.$$
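Because $\varphi$ need not be convex, $\|f\|_{\varphi,P}$ is defined by an infimum rather than by a homogeneous formula. The sketch below computes that infimum by bisection. It assumes a discretization of $J \times \Omega$ into cells carrying $dt \times dP$ masses, and the hypothetical non-convex choice $\varphi(x) = \log(1 + |x|)$. Bisection applies because $k \mapsto \int \varphi(f/k)\, d\alpha - k$ is strictly decreasing. The names `phi` and `f_norm` are illustrative, not from the text.

```python
import numpy as np

def phi(x):
    # Hypothetical generalized Young function: symmetric, increasing on [0, inf),
    # zero only at 0, and NOT convex, so the functional below is an F-norm, not a norm.
    return np.log1p(np.abs(x))

def f_norm(values, weights, iterations=200):
    """Approximate ||f||_phi = inf{k > 0 : sum(phi(f / k) * weights) <= k}.

    `values` holds samples of f on grid cells of J x Omega and `weights` the
    alpha = dt x dP masses of those cells."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if not np.any(values):
        return 0.0
    # g(k) = integral of phi(f/k) minus k is strictly decreasing in k, so bisection applies.
    def g(k):
        return np.sum(phi(values / k) * weights) - k
    lo, hi = 0.0, 1.0
    while g(hi) > 0:          # enlarge the bracket until the constraint holds
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return hi

# Usage on a hypothetical f(t, omega) = t * omega over J = [0, 1] and 200 sample points.
rng = np.random.default_rng(0)
t = (np.arange(100) + 0.5) / 100            # midpoints of 100 subintervals of J
omega = rng.normal(size=200)                # 200 equally likely outcomes
f = np.outer(t, omega)
w = np.full(f.shape, 1.0 / f.size)          # dt x dP mass of each cell
print(f_norm(f, w))
```

For this functional the triangle inequality $\|f + g\| \le \|f\| + \|g\|$ holds whenever $\varphi$ is nondecreasing in $|x|$. Homogeneity $\|cf\| = |c|\,\|f\|$ generally fails, however, which is why one obtains a metric rather than a norm unless $\varphi$ is convex.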
It can be shown that, letting $d\alpha = dt \times dP$, the space $L^{\varphi}(\alpha)$ introduced above is a complete linear metric space under the metric $\|\cdot\|_{\varphi}$, even when $\alpha$ is a more general measure. It is a Banach space if, further, $\varphi$ is convex, in which case the metric can be replaced by an equivalent norm topology. For more on these points and spaces, one can refer to the monograph by Rao and Ren ((1991), Chapter 10).

Definition 2.4. Let $X = \{X_t, \mathcal{F}_t, t \in J\}$ be a càdlàg process, each $X_t$ being $\mathcal{F}_t$-adapted (i.e., $\mathcal{F}_t$-measurable), where $\{\mathcal{F}_t, t \in J\}$ is a standard family of σ-algebras contained in $\Sigma$ of $(\Omega, \Sigma, P)$. Let $\mathcal{O} \subset \mathcal{B}(J) \otimes \Sigma$ be a σ-algebra and let $f = \sum_{i=1}^n f_i \chi_{A_i}: \Omega' = J \times \Omega \to \mathbb{R}$ be a measurable simple function relative to $\mathcal{O}$. Let $S(\Omega', \mathcal{O})$ be the class of such $\mathcal{O}$-measurable simple functions. Then the process $X$ is called a stochastic integrator on $S(\Omega', \mathcal{O})$ if the integral operator $\tau$, defined on $S(\Omega', \mathcal{O})$ by
$$\tau(f) = \int_{\Omega'} f\, dX_t = \sum_{i=1}^n f_i \big(X_{\sup A_i} - X_{\inf A_i}\big), \qquad (10)$$
satisfies the following two conditions:
(i) $\sup\{\|\tau(f)\|_{\varphi} : f \in S(\Omega', \mathcal{O}),\ \|f\|_\infty \le 1\} < \infty$, and
(ii) $f_n \in S(\Omega', \mathcal{O})$, $|f_n| \downarrow 0$ a.e. $\Rightarrow \lim_n \tau(f_n) = 0$ in probability.

This is indeed an extension of the integrator given in Theorem 2.3 above. The next result shows that it yields a general characterization of stochastic integrals relative to the BM as well as those introduced by Itô, Hitsuda–Skorohod, Stratonovich and several others. Here the function $\varphi$ is termed a generalized Young function, and it plays an important role in the following analysis. In particular, the $L^{2,2}$-, $L^{2,p}$- and $L^{\rho,p}$-boundedness concepts introduced earlier are subsumed, and this seems to be the most general such concept so far known. It also connects stochastic integrals with generalized Orlicz spaces. In fact, it will be seen below that this connection between stochastic analysis and abstract function spaces is quite natural and encompasses the formulation as well as the analysis of several results. Using the noted extension of Grothendieck's result by Lindenstrauss and Pełczyński (1968), it may be verified that every vector measure $Z: \mathcal{B}(J) \to L^p(P)$ for $1 \le p \le 2$ is actually $L^{2,p}$-bounded relative to some finite measure
$\mu: \mathcal{B}(J) \to \mathbb{R}_+$. This allows one to state the general result for any $L^{\varphi_1,\varphi_2}$-bounded vector measure, in which $\varphi_1$ can and will be taken to be the (convex) de la Vallée Poussin function, which always satisfies $\varphi_1(x)/x \uparrow \infty$. It then includes all stochastic integrals considered in the literature. Thus the general case with the Bochner boundedness concept and Definition 2.4 can be connected as follows:

Theorem 2.5. Let $\varphi_1, \varphi_2$ be generalized Young functions and let $X = \{X_t, \mathcal{F}_t, t \in J = [0, 1]\}$ be a càdlàg process on a probability space $(\Omega, \Sigma, P)$ such that $X_t \in L^{\varphi_2}(P)$. Let $S(\Omega', \mathcal{O})$ be the (linear) space of simple functions introduced in Definition 2.4, measurable with respect to a σ-subalgebra $\mathcal{O} \subset \mathcal{B}(J) \otimes \Sigma$. If $X$ is $L^{\varphi_1,\varphi_2}$-bounded relative to a σ-finite measure $\alpha$ on $\mathcal{O}$, then the mapping $\tau$ of (10) extends from $S(\Omega', \mathcal{O})$ to $M^{\varphi_1}(\alpha)$, the closed linear span of $\{S(\Omega', \mathcal{O}), \|\cdot\|_{\varphi_1}\}$, into $L^{\varphi_2}(P)$, and the dominated convergence criterion is valid for this integral. The converse holds in the following form. Let $\varphi_2$ be moderately increasing in the sense that $\varphi_2(2x) \le C_0 \varphi_2(x)$, $x \ge 0$, let $(\Omega, \Sigma, P)$ be a separable measure space, and let $X$ be a stochastic integrator process in the sense of Definition 2.4. Then there exist a de la Vallée Poussin function $\varphi_1$ (i.e., a positive convex function satisfying $\varphi_1(x)/x \uparrow \infty$ as $x \uparrow \infty$) and a σ-finite measure $\alpha: \mathcal{O} \to \mathbb{R}_+$ relative to which $\int_0^t f(s)\, dX_s$, $t \ge 0$, is defined and the dominated convergence assertion holds for this (stochastic) integral.

Proof. If $X$ is $L^{\varphi_1,\varphi_2}$-bounded, then $\tau$ defined by (10) is a continuous linear map on $S(\Omega', \mathcal{O})$, taken with the topology of $L^{\varphi_1}(\alpha)$, into $L^{\varphi_2}(P)$. It therefore has an extension onto the completion, denoted $M^{\varphi_1}(\alpha) \subset L^{\varphi_1}(\alpha)$. Since $f_n \in S(\Omega', \mathcal{O})$, $|f_n| \downarrow 0 \Rightarrow \varphi_1(|f_n|) \downarrow 0$, the $L^{\varphi_1,\varphi_2}$-condition shows that $E(\varphi_2(|\tau(f_n)|)) \to 0$, which implies that $X$ is a stochastic integrator. For the opposite direction, $X$ being a stochastic integrator, the mapping $\tau$ on $S(\Omega', \mathcal{O}) \subset B(\Omega', \mathcal{O})$ is linear and bounded. So, by a known extension of the Riesz representation theorem, there is a unique operator valued measure $M: \mathcal{B}(J) \to B(L^{\varphi_2}(P), B(\Omega', \mathcal{O}))$ such that
$$\tau(f) = \int_J f(t)\, dM(t), \quad f \in B(\Omega', \mathcal{O}), \qquad \|\tau\| = \|M\|(J). \qquad (11)$$
It then follows that $\tau(f_n) \to 0$ as $|f_n| \downarrow 0$, and since $\varphi_2(\cdot)$ is moderately increasing, $L^{\varphi_2}(P)$ does not contain a copy of $c_0$, the space of scalar sequences converging to zero. Then, using the fact that $L^{\varphi_2}(P)$ is separable, one concludes with some standard analysis that on $\mathcal{B}(J)$, $M(\cdot)$ admits a σ-additive extension in the
operator topology, and hence
$$\|\tau(f)\|_{\varphi_2,P} \le K_0 \big\|\, \|f(\cdot,\cdot)\|_{\varphi_2,P} \big\|_{\varphi_2,\alpha}. \qquad (12)$$
After some further calculation with the bound (12), this implies that
$$E\Big[\varphi_2\Big(\frac{|\tau(f)|}{k_0}\Big)\Big] < \infty, \qquad (13)$$
which gives the converse. Here some familiar computations on the metric space $L^{\varphi_2}(P)$ are also used. The standard, but not entirely trivial, details are found in the reference (Rao (1995), pp. 470–471). The point of this work is to show that $L^{\varphi_1,\varphi_2}$-boundedness is essentially optimal for general stochastic integration. It is also of interest to observe that every vector measure $\nu: \Sigma \to \mathcal{X}$, $\mathcal{X}$ a Banach space, has finite φ-variation relative to a suitable finite positive measure $\mu$ and a suitable de la Vallée Poussin function $\varphi$. This fact is also detailed in the above reference.

The multi-parameter extension will now be discussed briefly. In an effort to generalize the (Wiener) BM to a field $\{X_t, t \in \mathbb{R}^n, n \ge 1\}$, P. Lévy considered the real Gaussian random field with $X_0 = 0$, $E(X_t) = 0$ and $E(|X_t - X_s|^2) = \|t - s\|$, $t, s \in T$, where $\|\cdot\|$ is the Euclidean norm, so that for $n = 1$ this is just the classical BM. However, for $n > 1$ it is not obvious that
$$C_X(t, s) = E(X_t X_s) = \tfrac12\big(\|t\| + \|s\| - \|t - s\|\big)$$
is positive definite, which is needed to conclude that $C_X(\cdot,\cdot)$ is a covariance function. With a long-winded and indirect argument, Lévy was able to conclude that $C_X(\cdot,\cdot)$ is indeed a covariance function and asked for a simpler proof. Such a proof was then supplied by L. Schwartz, and later by Cartier (1971), even for the case that $T$ is a Hilbert space. The details of Cartier's argument are included in the author's book (1995). It was shown that the Lévy BM admits an integral representation as
$$X_t = \int_T f_t(u)\, dZ(u), \quad t \in \mathbb{R}^n, \qquad (14)$$
where $Z: \mathcal{B}_0(T) \to L^2(P)$, $E(Z(A)) = 0$, $\mathcal{B}_0(T)$ is a δ-ring of Borel sets of $T$, and $Z$ is dominated by the Lebesgue measure. Here $f_t(u) = C_n^{1/2} h_t(u)$, where $C_n$ is a constant and $h_t(u) = \|u\|^{-(n+1)/2}\big(1 - e^{i(t,u)}\big)$. The point of this complicated expression for the representation of $X_t$ is that $X_t$ depends on the whole history of the measure $Z(\cdot)$. Hence it will not be a martingale random field, and therefore the multidimensional $X_t$-field cannot be given by a martingale type integral. However, as seen below in Discussion 2.12, the
Lévy-BM satisfies the Bochner $L^{2,2}$-boundedness principle and hence qualifies to be a stochastic integrator. The detailed version of the Lévy-BM and the related discussion are given in the author's 1995 book noted above. It may be of interest to observe in this analysis that the Lévy-BM also has stationary increments. Indeed, for $\{s, s', t, t'\} \subset T$, one finds from Cartier's work that
$$E[(X_t - X_{t'})(X_s - X_{s'})] = \tfrac12\big[\|t - s'\| + \|t' - s\| - \|t - s\| - \|t' - s'\|\big],$$
which depends only on differences of the parameters.
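Positive definiteness of $C_X(s,t) = \frac12(\|s\| + \|t\| - \|s - t\|)$ is the nontrivial point in Lévy's construction, and the increment formula above follows from it by expanding four covariances. Below is a quick numerical sanity check, not a proof, assuming random parameter points in $T = \mathbb{R}^3$. The Gram matrix of $C_X$ on these points should have no negative eigenvalues beyond rounding. The increment covariance computed from $C_X$ should also match the closed form. The helper `levy_cov` is hypothetical.

```python
import numpy as np

def levy_cov(P, Q):
    # C(s, t) = (||s|| + ||t|| - ||s - t||) / 2 for every row s of P and row t of Q.
    ns = np.linalg.norm(P, axis=1)[:, None]
    nt = np.linalg.norm(Q, axis=1)[None, :]
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=2)
    return 0.5 * (ns + nt - d)

rng = np.random.default_rng(1)
pts = rng.normal(size=(200, 3))          # 200 random parameter points in T = R^3
C = levy_cov(pts, pts)
print(np.linalg.eigvalsh(C).min())       # nonnegative up to rounding: C is a covariance

# Increment covariance E[(X_t - X_t')(X_s - X_s')] expanded through C ...
t, tp, s, sp = rng.normal(size=(4, 1, 3))
from_C = (levy_cov(t, s) - levy_cov(t, sp) - levy_cov(tp, s) + levy_cov(tp, sp))[0, 0]
# ... agrees with the closed form, which involves only parameter differences.
closed = 0.5 * (np.linalg.norm(t - sp) + np.linalg.norm(tp - s)
                - np.linalg.norm(t - s) - np.linalg.norm(tp - sp))
print(from_C, closed)
```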
Here again $T = \mathbb{R}^n$ for some $n \ge 2$. In contrast to the multi-parameter case (random fields), the integral representation problem for vector (even Banach space valued) processes is somewhat simpler. To indicate this, the following (Lévy-BM) case will be given as an illustration. Some concepts for classes of martingales, including sub- and supermartingales, are recalled for this description.

Definition 2.6. (i) Let $X = \{X_t, \mathcal{F}_t, t \in \mathbb{R}_+\}$ be an adapted process, with $\{\mathcal{F}_t, t \in \mathbb{R}_+\}$ a standard filtration of the complete probability space $(\Omega, \Sigma, P)$, and $X_{t+0}(\omega) = X_t(\omega)$, $\omega \in \Omega$, $t \ge 0$. Then $X$ is termed a semimartingale if $X_t = Y_t + Z_t$, where $\{Y_t, \mathcal{F}_t, t \ge 0\}$ is a martingale and $\{Z_t, \mathcal{F}_t, t \ge 0\}$ is a process of bounded variation expressible as $Z_t = Z_t^1 - Z_t^2$, with each $Z^i = \{Z_t^i, \mathcal{F}_t, t \ge 0\}$ a predictable increasing process, $i = 1, 2$. Here 'predictable' means that
$$E\Big[\int_0^\infty X_{t-}\, dZ_t^i\Big] = E(X_\infty Z_\infty^i) \qquad (15)$$
for any bounded positive right-continuous martingale $\{X_t, \mathcal{F}_t, t \ge 0\}$, where $Z_\infty^i = \lim_{t\to\infty} Z_t^i$, $X_\infty = \lim_{t\to\infty} X_t$ and $X_{t-} = \lim_n X_{t - 1/n}$, all of which exist.
(ii) A process $X = \{X_t, \mathcal{F}_t, t \ge 0\}$ is a quasi-martingale if for $0 \le a < b \le \infty$ there is a constant $K_0^b$ such that, for all partitions $a \le t_1 < \cdots < t_n \le b$,
$$\sum_{i=1}^{n-1} E\big[|X_{t_i} - E(X_{t_{i+1}} \mid \mathcal{F}_{t_i})|\big] \le K_0^b < \infty. \qquad (16)$$
These concepts are needed to extend the (stochastic) calculus by associating standard measure functions with certain σ-subalgebras determined by $\{\mathcal{B}(\mathbb{R}) \otimes \mathcal{F}_t, t \ge 0\}$, thereby bringing classical measure theoretic results into the analysis under consideration.
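To see (16) at work, consider $X_t = |B_t|$ for a standard BM $B$, an assumed example not taken from the text. Since it is a submartingale, each term $E|X_{t_i} - E(X_{t_{i+1}} \mid \mathcal{F}_{t_i})|$ equals $E|B_{t_{i+1}}| - E|B_{t_i}|$. The sum therefore telescopes to $\sqrt{2/\pi}(\sqrt{b} - \sqrt{a})$, whatever the partition. The sketch below estimates the sum by Monte Carlo. It uses the closed form $E|x + \sqrt{h}Z| = \sqrt{2h/\pi}\, e^{-x^2/(2h)} + x\big(1 - 2\Phi(-x/\sqrt{h})\big)$ for the conditional expectation, and the function names are hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def cond_abs(x, h):
    # E(|B_{t+h}| | B_t = x) = E|x + sqrt(h) Z| with Z ~ N(0, 1), in closed form.
    s = math.sqrt(h)
    phi_neg = 0.5 * (1.0 + _erf(-x / (s * math.sqrt(2.0))))   # Phi(-x / s)
    return s * math.sqrt(2.0 / math.pi) * np.exp(-x**2 / (2.0 * h)) + x * (1.0 - 2.0 * phi_neg)

def quasi_martingale_sum(times, n_paths=50_000, seed=0):
    """Monte Carlo estimate of sum_i E|X_{t_i} - E(X_{t_{i+1}} | F_{t_i})| for X_t = |B_t|."""
    rng = np.random.default_rng(seed)
    times = np.asarray(times, dtype=float)
    steps = np.diff(times, prepend=0.0)          # B_{t_1} ~ N(0, t_1), then independent increments
    B = np.cumsum(rng.normal(size=(n_paths, len(times))) * np.sqrt(steps), axis=1)
    total = 0.0
    for i in range(len(times) - 1):
        h = times[i + 1] - times[i]
        total += np.mean(np.abs(np.abs(B[:, i]) - cond_abs(B[:, i], h)))
    return total

times = np.linspace(0.0, 1.0, 11)
estimate = quasi_martingale_sum(times)
# |B| is a submartingale, so each term is E|B_{t_{i+1}}| - E|B_{t_i}| and the sum telescopes.
exact = math.sqrt(2.0 / math.pi) * (math.sqrt(times[-1]) - math.sqrt(times[0]))
print(estimate, exact)
```

Because the sum does not depend on the partition, the supremum in (16) is finite on bounded intervals. This bounded-variation-in-mean behaviour is what the definition requires.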
The two classes above are related. For instance, it can be shown that every quasi-martingale is the difference of two non-negative supermartingales. This is analogous to the classical Jordan decomposition of a real function of bounded variation, since (16) generalizes the concept of bounded variation. Moreover, a quasi-martingale is a semimartingale if it satisfies a local version of the Dirichlet condition of potential theory, noted by J. L. Doob. This can be given in the present context as follows. It depends on employing the stopping times of the standard filtration $\{\mathcal{F}_t, t \ge 0\}$ of $(\Omega, \Sigma, P)$ associated with the supermartingale $\{(X_t, \mathcal{F}_t), t \ge 0\}$. Thus, if $\{T_j, j \in J\}$ is the collection of (finite) stopping times of the filtration going with the process, then the $X_t$-process is of class (D) if the set of random variables $\{X_{T_j}, j \in J\}$ is uniformly integrable. It is of class (DL) if the same holds locally, i.e., on each compact subset of the index set $\mathbb{R}_+$. For instance, one verifies that a continuous nonnegative supermartingale indexed by a compact subset $J$ of $\mathbb{R}_+$ is of class (D) if (and only if) $P[\sup_{t \in J} X_t > n] = o(1/n)$. These processes are of interest for the present work since it can be shown that such a quasi-martingale, as well as a semi-martingale in $L^1(P)$, qualifies to be a stochastic integrator. Then, by Theorem 2.5, if such a process takes values in $L^2(P)$, it satisfies the $L^{2,2}$-boundedness condition, which extends to an $L^{\varphi_1,\varphi_2}$-boundedness under an obvious integrability hypothesis of the result. These martingale classes are the key examples of those admitted in that theorem. Let us now consider the case where one can associate a numerical measure with such a process, for example, a quasi-martingale. Thus, let $X = \{X_t, \mathcal{F}_t, t \ge 0\} \subset L^1(P)$ be a right continuous quasi-martingale, and define a set function for $A \in \mathcal{F}_t$ as:
$$\mu_a^X((t, t'] \times A) = \begin{cases} \displaystyle\int_A (X_t - X_{t'})\, dP, & t < a,\\[6pt] \displaystyle\int_A X_a\, dP, & t = a. \end{cases} \qquad (17)$$
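For a concrete instance of (17), take $X_t = B_t^2$ with $B$ a standard BM, an assumed example not from the text. Since $B_t^2 - t$ is a martingale, $\mu_a^X((t,t'] \times A) = -(t'-t)P(A)$ for $A \in \mathcal{F}_t$. On these rectangles $\mu_a^X$ is thus the negative of $dt \times dP$, so here a σ-additive extension is evident. The Python sketch below estimates the rectangle values by Monte Carlo and checks additivity over $(t, u] \cup (u, t']$. The helper name is hypothetical.

```python
import numpy as np

def rectangle_mass(X_t, X_tp, indicator_A):
    # Monte Carlo estimate of mu((t, t'] x A) = integral over A of (X_t - X_t') dP, for A in F_t.
    return np.mean(indicator_A * (X_t - X_tp))

rng = np.random.default_rng(2)
n_paths = 200_000
t, u, tp = 0.5, 0.8, 1.0                                  # t < u < t'
B_t = rng.normal(scale=np.sqrt(t), size=n_paths)
B_u = B_t + rng.normal(scale=np.sqrt(u - t), size=n_paths)
B_tp = B_u + rng.normal(scale=np.sqrt(tp - u), size=n_paths)
A = (B_t > 0).astype(float)                               # an event in F_t with P(A) = 1/2

whole = rectangle_mass(B_t**2, B_tp**2, A)                # (t, t'] x A
left = rectangle_mass(B_t**2, B_u**2, A)                  # (t, u] x A
right = rectangle_mass(B_u**2, B_tp**2, A)                # (u, t'] x A; A is also in F_u
print(whole, left + right, -(tp - t) * 0.5)
```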
Then it is easily seen that $\mu_a^X$ is finitely additive on the semi-ring (in fact a semi-algebra) determined by the half-closed intervals $(t, t'] \in S_a$ and the sets $A \in P_a$. Here the $P$'s are the σ-subalgebras of $\Sigma$ determined by stopping times $T_1, T_2$ of the filtration satisfying $T_1(\omega) < T_2(\omega) \le a$, $\omega \in \Omega$, such that $T_i^n(\omega) \uparrow T_i(\omega)$ for finitely valued stopping times $T_i^n$ (such $T_i$ are termed predictable stopping times). This class is again needed below. The $\mu_a^X$ are additive but need not be signed
measures on the ring generated by $S_a \times P_a$. The precise result, due to Doléans-Dade (1968), is as follows; its proof is somewhat involved.

Theorem 2.7. The set function $\mu_a^X$ of (17), associated with a right continuous quasi-martingale $X$, is additive on the semi-algebra $S_a \times P_a$ for each $a \ge 0$. It has a σ-additive extension if and only if $X$ is of class (DL), and the extension is unique.

The details are in the previously referenced book (Rao (1995)) and will not be included here. The point of this result is that stochastic integration cannot easily be obtained from its classical (multidimensional) analog. It is inherently an aspect of genuinely infinite dimensional vector measure integration and analysis, as the ensuing work indicates. The concept of a stochastic integrator for vector valued processes can now be given as follows:

Definition 2.8. Let $X = \{X_t, \mathcal{F}_t, t \ge 0\}$ be a vector valued process with $X_t \in L^{\varphi}_{\mathcal{X}}(P)$, where $\mathcal{X}$ is a Banach space, $\varphi$ is a non-negative increasing real function vanishing only at $x = 0$, and the integration is taken in Bochner's sense. Suppose $\tau$ of (10) is defined for $\mathcal{X}$-valued simple functions $f$, with $f(t) \in B(\Omega, \mathcal{F}_t, \mathcal{X})$, the Banach space of bounded $\mathcal{X}$-valued functions, so that $\tau(f) \in L^{\varphi}_{\mathcal{X}}(P)$ is well defined. Then $\tau(f)$ is required to lie in a bounded set of its range, which is a Fréchet space. If, moreover, $f_n \to 0$ in norm at almost all points of $\Omega' = \mathbb{R}_+ \times \Omega$ implies $\tau(f_n) \to 0$ in probability as $n \to \infty$, then $X$ is termed a vector stochastic integrator.

With this vectorial form of the stochastic integrator concept, the result of Theorem 2.5 can be presented for Banach space valued processes as follows:

Theorem 2.9. Let $X = \{X_t, \mathcal{F}_t, t \ge 0\} \subset L^{\varphi_2}_{\mathcal{X}}(P)$ be a càdlàg process, where $\{\mathcal{F}_t, t \ge 0\}$ is a standard filtration of the probability space $(\Omega, \Sigma, P)$ and $\mathcal{X}$ is a separable reflexive Banach space, with $\varphi_2$ a Young function.
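If $X$ is $L^{\varphi_1,\varphi_2}$-bounded relative to a σ-finite measure $\alpha: \mathcal{O} \to \mathbb{R}_+$, where $\mathcal{O} \subset \mathcal{B}(\mathbb{R}) \otimes \Sigma$ is a σ-algebra and $\varphi_i$, $i = 1, 2$, are Young functions, then $X$ is a stochastic integrator on the closed linear span, denoted $M^{\varphi_1}_{\mathcal{X}}(\alpha)$ $(\subset L^{\varphi_1}_{\mathcal{X}}(\alpha))$. This implies that the dominated convergence criterion holds for the mapping $\tau: M^{\varphi_1}_{\mathcal{X}}(\alpha) \to L^{\varphi_2}_{\mathcal{X}}(P)$. Conversely, suppose $\varphi_2$ is a Young function with $\varphi_2(2x) \le C_0 \varphi_2(x)$, $x \ge x_0 \ge 0$, for a constant $C_0 > 0$, and $X$ is a stochastic integrator on simple functions for the integral $\tau(f)$ of $\mathcal{X}$-valued simple $f$. Then there exists a de la Vallée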
Poussin function $\varphi_1$ and a σ-finite measure $\alpha: \mathcal{O} \to \mathbb{R}_+$ relative to which $X$ is $L^{\varphi_1,\varphi_2}$-bounded, and the dominated convergence criterion holds for the resulting stochastic integral $\tau: M^{\varphi_1}_{\mathcal{X}}(\alpha) \to L^{\varphi_2}_{\mathcal{X}}(P)$.

This result may also be presented when $\mathcal{X}$ is an adjoint Banach space, along with some other general statements that involve no probabilistic ideas but still extend Theorem 2.5. The proofs of this and similar extensions are obtained from those discussed for the scalar case. Some of these ideas are also discussed in the preceding reference. Now some genuinely new questions of the vector case appear if the index set $\mathbb{R}_+$ is replaced by a multidimensional set. To clarify these, a version of the Lévy-BM with a Hilbert space as its index set will be recorded here, since there are many possible extensions of the result. Using the well-known fact that a Gaussian field is determined by its mean and covariance functions, one has the following general result due to Cartier (1971). A special case was noted earlier; it will be given here in the general case.

Theorem 2.10. If the index set $T$ is a real Hilbert space, then there exists a probability space $(\Omega, \Sigma, P)$ on which one can find a Lévy-BM $\{X_t, t \in T\}$, which is a Gaussian random field with mean $m: T \to \mathbb{R}$ and covariance $C_X(s, t) = E[(X_s - m(s))(X_t - m(t))]$, as noted before (14).

For instance, with $m = 0$, so that $E(X_t^2) = \|t\|$, one shows that
$$C_X(s, t) = E(X_s X_t) = \tfrac12\big[\|s\| + \|t\| - \|s - t\|\big] \qquad (18)$$
is positive definite, after a nontrivial amount of work. The details will not be included here. An important specialization of these stochastic integrals arises when the process (or field) $\{X_t, t \in J\}$ can be analyzed further because the index set generalizing $\mathbb{R}_+$ also has a differential structure, so that the process (or field) may be 'differentiable' in a suitable sense. This in fact leads to a (stochastic) differential calculus, and a new and rich branch of the subject (stochastic [partial and/or vector] differential equations) emerges. This will be indicated here to round out the analysis, before moving on to the main problems of error analysis and estimation coming from the central limit theory noted earlier in this section. To discuss a differential structure of the process $\{X_t, t \in [a, b]\}$ subject to an $L^{2,2}$-boundedness condition, the usual procedure is to define the derivative as the integrand of a stochastic integral. This procedure
works only for the first order differential calculus. Second and higher order differential analysis, however, requires the derivative to be formulated at least 'in measure'. Once this is done, the classical methods of reducing higher order equations to a first order vector equation may be adapted to the present case with suitable modifications. This is as follows, and will be elaborated further in Section 3. The very first problem of differentiation in the multidimensional case concerns strong (also termed Fréchet or 'F'-) and directional or weak (sometimes called Gâteaux or 'G'-) derivatives, as seen from the classical exposition, for instance, in Hille and Phillips (1957, Chapter III and later). Thus, in brief, for Banach spaces $\mathcal{X}, \mathcal{Y}$, a function $f: \mathcal{X} \to \mathcal{Y}$ is weakly (or G-) differentiable at a point $x$ of an open set $U \subset \mathcal{X}$ if, for each $h \in \mathcal{X}$, the following limit exists:
$$\lim_{t \to 0} \frac{f(x + th) - f(x)}{t} = (Df)(x; h), \qquad (19)$$
and $f$ is strongly (or F-) differentiable at $x$ if $(Df)(x; h)$ is a bounded linear mapping in $h$ and, using (19), the following strong limit exists:
$$\lim_{\|h\| \to 0} \frac{\|f(x + h) - f(x) - (Df)(x; h)\|}{\|h\|} = 0. \qquad (20)$$
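The gap between (19) and (20) is already visible for $\mathcal{X} = \mathbb{R}^2$, $\mathcal{Y} = \mathbb{R}$. Consider the function $f(x,y) = x^3 y/(x^6 + y^2)$, $f(0,0) = 0$, a standard textbook example not from the text. It has $(Df)(0; h) = 0$ in every direction $h$, which is even linear in $h$. Yet $f = 1/2$ along the curve $y = x^3$, so the quotient in (20) blows up and $f$ is not F-differentiable at the origin. The Python sketch below evaluates both quotients numerically.

```python
import numpy as np

def f(x, y):
    # f(x, y) = x^3 y / (x^6 + y^2), with f(0, 0) = 0.
    d = x**6 + y**2
    return 0.0 if d == 0 else x**3 * y / d

def gateaux_quotient(h, t):
    # The difference quotient [f(0 + t h) - f(0)] / t of (19) at the origin.
    return f(t * h[0], t * h[1]) / t

def frechet_ratio(s):
    # The quotient of (20) at the origin with (Df)(0; .) = 0, along h = (s, s^3).
    h = np.array([s, s**3])
    return abs(f(h[0], h[1])) / np.linalg.norm(h)

for h in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.3, -2.0]):
    print(h, [gateaux_quotient(np.array(h), t) for t in (1e-1, 1e-2, 1e-3)])   # tend to 0
print([frechet_ratio(s) for s in (1e-1, 1e-2, 1e-3)])                        # grow like 1/(2s)
```

Every directional quotient tends to zero, while the ratio in (20) grows without bound. This is exactly why the F-calculus needs limits that are uniform over small balls rather than merely along rays.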
A differential calculus can then be obtained first for (G)- and then refined for (F)-differentials. For instance, one can demand that (19) hold for $h$ in 'finitely open' sets. That is, at every point $x$ of the domain the limit should exist for $x + th$ as $t \to 0$, with $x + th$ ranging over 'balls' contained in star-shaped neighborhoods of $x$. (Details can be found in Hille–Phillips, Chapter III.) If these limits are uniform on all open balls of small radius, then the F-differential calculus holds. The results of use here are specializations to $\mathcal{X} = L^1(P)$ and $\mathcal{Y} = \mathbb{R}^k$ or $\mathbb{C}^k$ for some $k < \infty$. The case with $x, h$ vector random variables will be of interest. A brief analysis is included here for comparison in the context of the vector BM as well as of $L^{2,2}$-bounded classes.

Theorem 2.11. Let $\{(\Omega, \Sigma, \mathcal{F}_t, P), t \ge 0\}$ be a standard filtered measure space, and let $X = \{X_t, \mathcal{F}_t, t \ge 0\}$ be a càdlàg $\mathbb{R}^n$-valued process (or field) which is $L^{2,2}$-bounded relative to a σ-finite measure $\alpha: \mathcal{P} \to \mathbb{R}_+$, where $\mathcal{P}$ is the predictable σ-algebra determined by the above filtration. Then $X$ is a semi-martingale. On the other hand, a vector (even infinite dimensional)
càdlàg square integrable semi-martingale is always $L^{2,2}$-bounded relative to a σ-finite measure on $\mathcal{P}$ as above.

This result is detailed in (Rao (1995), pp. 474–475). It includes the vector (Wiener) BM and is more general. The characterization was independently given for the one-dimensional case by Dellacherie and Bichteler around 1980, implying that (quasi- and) semi-martingales are the most general class of stochastic integrators in the $L^2$-context. An extension to the $L^{\varphi_1,\varphi_2}$-bounded case is also of interest. The following consequence of the above analysis will explain some related earlier work by other authors, particularly that of Métivier–Pellaumail (1980), much extended by Dinculeanu (2000). It will be of interest here.

Discussion 2.12. Some consequences of the above analysis, as applied to the Lévy-BM $\{X_t, t \in T = \mathbb{R}^n\}$, can be considered. It is not a martingale for $n \ge 2$, and for $n = 1$ it is the same as the Wiener-BM, which is again a martingale. However, both are Gaussian for any $n$, and they are related by a (nontrivial) change of the 'time' parameter in $T$. Thus let $\{B_t, t \in \mathbb{R}^n\}$ be the standard Wiener-BM. Consider the (unfortunately unmotivated) mapping $u: t \mapsto u_t(\cdot)$, $t \in T = \mathbb{R}^n$, defined for a constant $A_n > 0$ by
$$u_t(x) = A_n |x|^{-\frac{n+1}{2}}\big[\exp(i\langle t, x\rangle) - 1\big], \quad x \in \mathbb{R}^n,\ i = \sqrt{-1}.$$
The complex nonlinear function $u_t \in L^2(\mathbb{R}^n, \mu)$, the Hilbert space of square integrable (complex) functions relative to the Lebesgue measure $\mu$, plays a key role in this analysis. Further, $\{u_t, t \in \mathbb{R}^n\}$ forms a dense subset of this Hilbert space. If one sets $X_t = B_{u_t}$, the latter being the (complex Wiener-) BM with parameter space $\mathbb{R}^n$, it is seen that $X_t$, $t \in \mathbb{R}^n$, is a complex BM with $X_0 = 0$ and is Gaussian with parameter space $\mathbb{R}^n$.
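The identity $\|u_s - u_t\|^2 = |s - t|$ used in this discussion reduces, for $n = 1$, to $\int_{\mathbb{R}} 2(1 - \cos(ax))\,x^{-2}\,dx = 2\pi|a|$ with $a = s - t$. The following is a numerical sketch. It assumes the normalization $A_1 = (2\pi)^{-1/2}$ (the text leaves $A_n$ unspecified) and truncates the integral at a large cutoff with an explicit tail correction. `sq_distance_u` is a hypothetical helper.

```python
import numpy as np

def sq_distance_u(s, t, L=2000.0, dx=1e-3):
    """||u_s - u_t||^2 in L^2(R) for n = 1, assuming A_1 = (2*pi)**-0.5.

    |u_s(x) - u_t(x)|^2 = A_1^2 |exp(i(s - t)x) - 1|^2 / x^2 = A_1^2 * 2(1 - cos((s - t)x)) / x^2."""
    a = s - t
    x = (np.arange(int(L / dx)) + 0.5) * dx            # midpoint grid on (0, L)
    integrand = 2.0 * (1.0 - np.cos(a * x)) / x**2
    # Tail beyond L: integral of 2/x^2 is 2/L; the oscillating cosine part is O(1/(|a| L^2)).
    half_line = np.sum(integrand) * dx + 2.0 / L
    return 2.0 * half_line / (2.0 * np.pi)             # even integrand, times A_1^2

for s, t in ((1.0, 0.5), (0.0, 1.0), (2.0, -0.5)):
    print(s, t, sq_distance_u(s, t), abs(s - t))
```

The tail correction keeps the truncation error small as long as $|s - t|\,L$ is large. For $s = t$ one should simply return 0, which this sketch does not special-case.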
Since one can verify that $\|u_s - u_t\|^2 = |s - t|$ and that $B$ is the $n$-parameter (complex Wiener-) BM, the field $\{X_t, t \in \mathbb{R}^n\}$ is also Gaussian, and after a routine computation one finds that
$$E[|X_s - X_t|^2] = 2|s - t|, \quad s, t \in \mathbb{R}^n,$$
which is the Lévy-BM. Although it arises from a particular (nonlinear) transformation of the parameter, it is still Gaussian, but it is NOT a martingale. This is inferred from an additional computation showing that this random field admits a Dunford–Schwartz type representation as an integral:
$$X_t = \int_T f_t(v)\, dZ(v), \quad t \in T, \qquad (21)$$
where $Z(\cdot)$ is a vector measure with orthogonal values defined on the Borel sets of $T$, and $f_t(\cdot)$ is a function of the complex $u_t$ defined above. This implies that the $X_t$-field depends on the whole history of the $Z$-field of (21). The above representation of the Lévy-BM is due to Neveu (1965). A simplified theory of the original Lévy-BM was also given by Cartier (1971). The needed details can again be found in the author's book referenced above (1995, pp. 482–485). The point is that this $Z(\cdot)$-field satisfies the $L^{2,2}$-boundedness condition. Hence $Z(t)$, derived from the Wiener-BM, is not a martingale but is a stochastic integrator in the sense of Definition 2.4. A process with the last property was termed a 'summable process' in Dinculeanu (2000, p. 164 ff.). Under various conditions on the range (Banach) space, he has extended much of the classical (functional-analytic) integration theory to the vector case. On the other hand, Dellacherie (1980) and Bichteler (1981) have, as already noted, characterized stochastic integrators as semi-martingales when the filtrations are somewhat restricted. But a semi-martingale satisfies an $L^{\varphi_1,\varphi_2}$-boundedness condition, and in particular, if the process is square integrable, it is $L^{2,2}$-bounded. This implies that, under these general conditions, the summable processes are subsumed under the (generalized) Bochner boundedness hypotheses. It is not clear how much larger our class is, but it seems that the work of the latter can be extended to the very general vector (even operator) valued case pioneered by Dinculeanu. It may also be expected that the Lévy-BM, although not a martingale, qualifies to be a semi-martingale under an appropriate integrability condition; further work is desirable here to clarify matters in multi-parameter integration. The following is a vector space form of the classical Itô differential formula in an infinite dimensional setup.
It is included to show how, when the scalar setting of smooth functions $f: \mathbb{R} \to \mathbb{C}$ is replaced by a Hilbert space $H$, several counterparts of the classical formulas reappear. A large part of this was formulated by Métivier and Pellaumail (1980). A sketch of the key Itô formula in this form will be presented to show the flow of ideas that take shape in this extension. Thus, if $f: H \to H$ is a twice continuously F-differentiable function, then, following Schatten (1959), one finds that the first derivative $f'$ at $x \in H$ satisfies $f'(x) \in B(H)$. Let $Y \subset B(H)$ be the Schmidt (Hilbert–Schmidt) class of operators on $H$, i.e., $A \in Y$ means that
$$\|A\|_2^2 = (A, A) = \operatorname{tr}(A^*A) = \sum_{i=1}^{\infty} \|A x_i\|^2 < \infty \qquad (22)$$
for some (hence also for all) complete orthonormal sequences $\{x_n, n \ge 1\} \subset H$. Then $f''(x) \in B(Y, H)$. Now $Y$ is a Hilbert space, and a Hilbert space valued adapted process $\{X_t, \mathcal{G}_t, t \ge 0\} \subset L^2_H(P)$ has a finite quadratic variation process, denoted $[X]$. This holds when $X$ is the $H$-valued BM, and it can be shown that the result also holds more generally for $L^{2,2}$-bounded vector valued processes of the above type. This is a consequence of the fact that for an $L^{2,2}$-bounded càdlàg process $\{X_t, \mathcal{G}_t, t \in [0, b]\}$ the quadratic variation $[X]$ exists, is positive definite, and is finite on the Borel algebra $\mathcal{B}_J$, $J = [0, b]$. With this preparation, the following version of the classical Itô formula, with the same probabilistic basis and the linear analysis formalism sketched above, can be given.

Theorem 2.13. Let $H$ be a separable Hilbert space, $Y \subset B(H)$ the separable Schmidt class (a Hilbert space in its own right under the norm (22)), and $f: H \to H$ a twice continuously F-differentiable mapping whose derivatives $f', f''$ ($f'(x) \in B(H)$, $f''(x) \in B(Y, H)$) are uniformly continuous on bounded sets. Suppose $X = \{X_t, \mathcal{G}_t, t \ge 0\} \subset L^2_H(P)$ is an $L^{2,2}$-bounded càdlàg process relative to the natural filtration $\{\mathcal{G}_t, t \ge 0\}$ and a measure $\alpha: \mathcal{P} \to \mathbb{R}_+$, where $\mathcal{P}$ is the predictable σ-algebra relative to the filtration. Then for $t > 0$ the following vectorial form of Itô's formula obtains:
$$f(X_t) - f(X_0) = \int_{0+}^{t} f'(X_{s-})\, dX_s + \frac12 \int_{0+}^{t} f''(X_{s-})\, d[X]_s + \sum_{0 < s \le t} \big([f(X_s) - f(X_{s-})] - f'(X_{s-})\Delta X_s\big) - \frac12 \sum_{0 < s \le t} f''(X_{s-})(\Delta X_s)^{\otimes 2},$$
the last two sums being strongly convergent a.e., where $\Delta X_s = X_s - X_{s-}$ is the jump of the process at $s > 0$. The real difficulties start when the index set $\mathbb{R}_+$ is replaced by a multidimensional set, as in the Lévy-BM case, and the corresponding random fields are considered. This will be left for now; instead, stochastic differential equations of higher order will be considered to explain some new problems that arise there.

3. Higher Order SDE and Related Classes

A stochastic differential equation (SDE) of first order (even non-linear) can be considered in an equivalent form by recasting it into an integral form
which uses the Itô format, and the behavior of the former is then analyzed through the latter. For second and higher order (nonlinear) cases this procedure does not always work, and a stochastic derivative, either pointwise or in mean, often has to be employed, raising some new problems. Such equations arise in applications, for instance in the linear case, to describe the motion of a simple harmonic oscillator and its natural generalizations. This aspect of the problem and a related nonlinear analog will be discussed here before moving on to the asymptotic analysis of the already noted 'large deviations' problems. The linear case was analyzed in some detail by Dym (1966). A related nonlinear higher order case is more involved, and a reasonably detailed treatment was included in the author's paper, Rao (1997). A stochastic linear differential equation is formally expressed as:
$$\int_J f(t)\, D_n X(t)\, dt = \int_J f(t)\, dB_n(t), \qquad (1)$$
where $D_n = \sum_{i=1}^n a_i D^i + a_0$ is an $n$-th order differential operator with constant coefficients, $B_n(t)$ is an $L^{2,2}$-bounded 'noise' process, and $J$ is an interval. This is an integral form of the symbolic differential equation $D_n X(t) = \dot B_n(t)$. Let $\hat X_t$ be the random vector of $D^i X_t$, $i = 0, 1, \ldots, n-1$, and let $B = (0, \ldots, 0, B_n)$ (the last element being $B_n$). Then the linear SDE becomes
$$d\hat X(t) = A \hat X(t)\, dt + dB(t), \qquad (2)$$
where $A$ is an $n \times n$ matrix with 1's on the diagonal just above the main diagonal, the last row formed from the constants $a_1, \ldots, a_n$, and zeros elsewhere. In this (constant coefficient) case the situation is relatively simple, and an explicit solution can be written down formally as:
$$\hat X(t) = e^{(t - t_0)A} C + \int_{t_0}^{t} e^{(t-u)A}\, dB(u), \qquad (3)$$
where the initial value $\hat X(t_0) = C$ is a constant vector. Suppose the $L^{2,2}$-bounded process $B_n(t)$ has independent values on disjoint sets, so that it has independent increments. Then, for any Borel set $F$ and $h > 0$, the following equation holds with probability one:
$$P[\hat X(t+h) \in F \mid \hat X(s), t_0 \le s \le t] = P[\hat X(t+h) \in F \mid \hat X(t)], \quad t \ge t_0, \qquad (4)$$
so that the vector process $\{\hat X(t), t \ge t_0\}$ is moreover a Markov process. Note that the discontinuities, if any, of the $\hat X$-process are of the moving type. In particular, if the $B_t$-process is BM, then the $\hat X(t)$-process has
almost all sample paths continuous. In this case the solution can be given a simpler description by expressing it as
$$\hat X(t+h) = e^{hA} \hat X(t) + \int_t^{t+h} e^{(t+h-u)A}\, dB(u), \qquad (5)$$
for the constant matrix $A$ of (3). This allows one to analyze the path behavior of the solutions more analytically by introducing an operator family induced by the (transition) probability functions of the $\hat X_t$-process. Naturally, more detailed work is possible when the error process, so far assumed only to be $L^{2,2}$-bounded with independent increments, is further specialized to BM. In this case the solution process (5) satisfies the (conditional) probability equation:
$$P[\hat X(t+h) \in F \mid \hat X(s), t_0 \le s \le t] = P[\hat X(t+h) \in F \mid \hat X(t)], \quad t \ge t_0, \qquad (6)$$
and this is explicitly calculable in the BM case, where the conditional probability function is shown to be regular. In this event the solution is a Gaussian Markov process, and the conditional probability measures have densities explicitly obtainable in terms of a covariance matrix $R = (R_{ij})$ and a mean function $\mu(\cdot)$. It can be verified that one has the mapping $T_t: B(\mathbb{R}^n) \to B(\mathbb{R}^n)$ defined by
$$(T_t f)(x) = \int_{\mathbb{R}^n} f(y)\, p_t(x, y)\, dy, \quad f \in B(\mathbb{R}^n), \qquad (7)$$
where $p_t$, $t \ge t_0$, is the conditional density of $X_t$ given $X_{t_0}$, which is Gaussian. Moreover, the operator family $\{T_t, t \ge t_0\}$ satisfies the Chapman–Kolmogorov equation, forming a contractive semigroup. The adjoint of the preceding semigroup is defined analogously and is given by a similar equation, namely:
$$(T^*_{t - t_0} f)(y) = \int_{\mathbb{R}^n} f(x)\, p_{t - t_0}(x, y)\, dx, \quad f \in B(\mathbb{R}^n), \qquad (8)$$
which is also a contraction. These families have interesting structural properties, as established by Dym (1966), and his result will be given for convenient reference.

Theorem 3.1. The functions $t \mapsto T_t f$ and $t \mapsto T_t^* f$ are real analytic for each $f \in B(\mathbb{R}^n)$, with ranges in $C_b^{(0)}(\mathbb{R}^n)$, the space of bounded continuous functions vanishing at 0. The process $\{\hat X(t), t \ge t_0\}$ is a Feller process and hence is strongly Markov. The infinitesimal generators, denoted $G$ and $G^*$, of
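the associated semigroups are the following degenerate elliptic differential operators acting on the subspaces $C_b^{(2)}(\mathbb R^n)$:
$$G = \frac12\frac{\partial^2}{\partial x_n^2} + \sum_{i=1}^{n-1} x_{i+1}\frac{\partial}{\partial x_i} - (x_1a_n + \cdots + x_na_1)\frac{\partial}{\partial x_n}, \tag{9}$$
and similarly
$$G^* = a_1 + \frac12\frac{\partial^2}{\partial x_n^2} - \sum_{i=1}^{n-1} x_{i+1}\frac{\partial}{\partial x_i} + (x_1a_n + \cdots + x_na_1)\frac{\partial}{\partial x_n}. \tag{9'}$$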
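As a numerical sanity check of the semigroup and generator statements, consider the scalar case $n = 1$, where (9) reduces to $(Gf)(x) = \frac12 f''(x) - a_1 x f'(x)$ and, by (5), $X_t$ given $X_0 = x$ is Gaussian with mean $e^{-a_1 t}x$ and variance $(1 - e^{-2a_1 t})/(2a_1)$. The Python sketch below assumes a hypothetical value $a_1 = 0.7$ and evaluates (7) by Gauss-Hermite quadrature; it checks $T_sT_t = T_{s+t}$ and compares a difference quotient of $T_h f$ with $Gf$ for $f = \cos$. The names `T`, `G` and `mean_factor_and_variance` are illustrative only, and the example covers just the one-dimensional, non-degenerate case, not Dym's degenerate higher order setting.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

A1 = 0.7  # hypothetical coefficient a_1 in the scalar case n = 1
NODES, WEIGHTS = hermegauss(40)  # Gauss rule for the weight exp(-z**2 / 2)


def mean_factor_and_variance(t):
    """X_t | X_0 = x is N(m * x, v) for dX = -a_1 X dt + dB, cf. (5)."""
    m = np.exp(-A1 * t)
    v = (1.0 - np.exp(-2.0 * A1 * t)) / (2.0 * A1)
    return m, v


def T(t, f):
    """Return the function x -> (T_t f)(x) = E[f(X_t) | X_0 = x], as in (7)."""
    m, v = mean_factor_and_variance(t)
    s = np.sqrt(v)

    def Tf(x):
        x = np.asarray(x, dtype=float)
        values = f(m * x[..., None] + s * NODES)  # f at the quadrature points
        return values @ WEIGHTS / np.sqrt(2.0 * np.pi)

    return Tf


def G(df, d2f):
    """Generator (9) for n = 1: (G f)(x) = f''(x) / 2 - a_1 x f'(x)."""
    return lambda x: 0.5 * d2f(x) - A1 * x * df(x)


x0 = 0.4
semigroup_gap = abs(T(0.3, T(0.5, np.cos))(x0) - T(0.8, np.cos)(x0))
h = 1e-5
difference_quotient = (T(h, np.cos)(x0) - np.cos(x0)) / h
generator_value = G(lambda x: -np.sin(x), lambda x: -np.cos(x))(x0)
print(semigroup_gap, difference_quotient, generator_value)
```

Since the Gaussian kernels compose exactly, the semigroup gap is at round-off level, and the difference quotient agrees with $Gf$ up to an $O(h)$ error.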
The detailed computation, with related references, was given by Dym (1966). Since the infinitesimal generators are degenerate operators, most of the classical PDE theory is not applicable, and special methods have to be devised for these higher order cases. Even when the coefficients $a_i$ are non-random functions of the time variable $t$, the analog of Dym's work leads to a newer class of 'evolution' operators $T_{st}$, in place of (7), which are also of interest and will be briefly considered here. The solution now takes the form
$$X_t = M(t)\left[\int_{t_0}^t M(s)^{-1}\,dB(s) + C\right], \tag{10}$$
where $M(t)$ is the invertible fundamental solution of the homogeneous equation $dX_t = A(t)X_t\,dt$. If now the $B_t$-process has independent increments, then the (vector) solution is again a Markov process, although its component processes need not be. If $p(s,x;t,F) = P[X_t\in F\mid X_s](x)$, then the Markov property of the process implies, for each Borel set $F$, the Chapman-Kolmogorov equation
$$p(s,x;t,F) = \int_{\mathbb R^n} p(u,y;t,F)\,p(s,x;u,dy),\quad s\le u\le t. \tag{11}$$
This gives, via Fubini's theorem, the operator $T_{st}: B(\mathbb R^n)\to B(\mathbb R^n)$ defined by
$$(T_{st}f)(x) = \int_{\mathbb R^n} f(y)\,p(s,x;t,dy),\quad f\in B(\mathbb R^n), \tag{12}$$
which satisfies the operator equations
$$T_{su}T_{ut} = T_{st},\quad s<u<t,\qquad T_{tt} = \mathrm{id} = T_{ss},$$
and forms a contractive evolution family. If now the process $B_t$, $t\in J$, has continuous paths, so that the solution $\{X_t, t\in J\}$ has absolutely continuous trajectories, and
the operator family $\{T_{st}, a\le s\le t\le b\}$ is strongly differentiable on the continuous function space $C_c(I)$, then its infinitesimal generator is given by
$$\lim_{h\to0}\frac{T_{t(t+h)}f - T_{tt}f}{h} = G_tf,\quad f\in\mathrm{dom}(G_t)\subset C_c(I), \tag{13}$$
whose domain depends on $t$. If $t\mapsto G_t$ is a uniformly integrable operator function, then the evolution family satisfies the corresponding integral equation (in the sense of Bochner integration, with $I$ the identity operator)
$$T_{st} = I + \int_s^t T_{sr}G_r\,dr. \tag{14}$$
The point to note here is the (nontrivial) change that the infinitesimal generator undergoes in passing from the constant coefficient case to the case where the coefficients are merely time-dependent. In fact an integral representation of $T_{st}$ of (12) in terms of its generators is possible, but it is a "multiplicative integral". This was discussed in Rao (2002) and need not be considered further here. The genuinely nonlinear case, where the coefficients of the SDE depend on the process $X_t$, will be discussed afterwards to complete the analysis. An interesting situation arises if the error process $\{B_t, t\in J\}$ is BM and, for simplicity, $J = [a,b]\subset\mathbb R$, keeping $A(t)$ as above for the coefficient matrix. This matrix is again nonstochastic, but its last row depends on time, as noted above, and the solution is given by (10). Thus the solution process is also Gaussian, since the coefficient matrix $A(t)$ is nonstochastic. In this case it can be seen that $T_{st} = M(t)M(s)^{-1}$. From the theory of ordinary differential equations with variable coefficients, if $\varphi_1,\dots,\varphi_{n+1}$ are linearly independent solutions of the homogeneous equation $[a_0(t)D^n + \cdots + a_{n-1}(t)D + a_n(t)]u(t) = 0$, then the unique solution with the initial value $C = 0$ is classically known to be representable as
$$X_t = \int_a^t\sum_{k=1}^{n+1}\varphi_k(t)\,\frac{W_k(\varphi_1,\dots,\varphi_{n+1})}{W(\varphi_1,\dots,\varphi_{n+1})}(s)\,dB_s, \tag{15}$$
where $W(\varphi_1,\dots,\varphi_{n+1})$ is the Wronskian determinant of the $\varphi_i$; it never vanishes, by the linear independence of the $\varphi_i$, and $W_k$ is $W$ with its $k$th column replaced by $(0,\dots,0,1)$. If the integrand of (15) is denoted by $R(s,t)$, also called the Riemann kernel, which reduces to the Green kernel
when the $a_k$ are constants, then $R(\cdot,\cdot)$ is $n$-times differentiable and $R^*(s,t) = R(t,s)$. The solution process is then representable as
$$X_t = \int_a^t R(s,t)\,dB_s = \int_a^b\chi_{[a,t]}(s)R(s,t)\,dB_s, \tag{16}$$
and the covariance of the solution process $X_t$ is given (using the independence of the Brownian increments) by
$$r(s,t) = \int_a^b\chi_{[a,s\wedge t]}(u)\,R^*(s,u)R(u,t)\,du, \tag{17}$$
which is $n$-times continuously differentiable. This useful result is summarized for reference, with a slight extension of Dym's work, as follows:
Theorem 3.2. The $n$th order linear stochastic differential equation (2), with $\{B_t, t\in J=[a,b]\}$ just an $L^{2,2}$-bounded error process, has a unique solution $X_t$, $t\in J$, satisfying the initial condition $X_a = C$. If the noise process $B_t$ has independent increments, then the solution is a Markov process whose associated class of operators $T_{st}$, $a\le s<t\le b$, forms a strongly continuous evolution family with generators $\{G_s, s\in J\}$ having dense domains, usually depending on $s$, contained in $C(J)$. Moreover, when the error process is Brownian motion, the trajectories of the solution process have an integral representation relative to the Riemann kernel $R(s,t)$. Further, if the coefficients of the stochastic differential equation are constants, then the solution, which is already Markov, also has stationary transition probabilities, and the associated evolution family becomes $T_{st} = T_{t-s}$, $a\le s\le t\le b$, a $(C_0)$-semigroup whose generator is a degenerate elliptic differential operator. The covariance is real analytic and the solution is again a Feller process.
It is possible to reformulate this, using a procedure suggested by Dynkin (1961, p. 106), with a related but different state space, as follows. Let $\tilde R = [a,\infty)\times\mathbb R^2$ with the product topology (so that it is locally compact), and let the transition probabilities be defined for a Borel set $F\subset\tilde R$ by the equation
$$\tilde p(t,(s,x);F) = p(s,x;s+t,F), \tag{18}$$
with $\lim_{h\downarrow0}p(s,x;s+h,\mathbb R^2) = 1$ for all $s, x$. The new family $\tilde p$ is stationary for the vector process $Z_t = (t, X_t)$ (say), now called the space-time process. With
$\tilde q:\tilde R\to\mathbb R$ and $\tilde\sigma:\tilde R\to\mathbb R^+$, the previous second order equation becomes
$$d\tilde Z_t + \tilde q(Z_t,\dot Z_t)\,dt = \tilde\sigma(Z_t,\dot Z_t)\,d\tilde B_t,\qquad \tilde B_t = (0,B_t), \tag{19}$$
the new right side process is again $L^{2,2}$-bounded with stationary independent increments, and the boundedness conditions on $q, \sigma$ are transformed into those on $\tilde q, \tilde\sigma$ on $\tilde R$. The vector solution is Markovian, but the individual components are not necessarily Markov processes in these vector representations, and therefore special methods of analysis are needed. The higher order nonlinear case raises additional problems and has interesting prospects; it is indicated here to show the new potential and the analysis needed in this study. The corresponding equation is expressible as
$$d\dot X(t) = q\big(t, X(t), \dot X(t)\big)\,dt + \sigma\big(t, X(t), \dot X(t)\big)\,dB(t), \tag{20}$$
which may be written in the integrated vector form
$$\mathbf X(t) = A + \int_{t_0}^t Q(s,\mathbf X(s))\,ds + \int_{t_0}^t S(s,\mathbf X(s))\,d\mathbf B(s), \tag{21}$$
where the above vectors (and matrices) are defined as
$$\mathbf X(t) = (X(t),\dot X(t))^*,\quad Q(t,x,y) = (y, q(t,x,y))^*,\quad \mathbf B(t) = (0, B(t))^*,\quad A = (A_1,A_2)^*,\quad S(t,x,y) = \begin{pmatrix}0 & 0\\ 0 & \sigma(t,x,y)\end{pmatrix};$$
the last one is a $2\times2$ matrix with zeros in the top row and $0, \sigma(t,x,y)$ in the bottom row, so that the vector stochastic differential equation (19) or (20) takes the integrated form (21). If $q, \sigma$ satisfy a suitable Lipschitz condition, then existence and uniqueness of the solution can be established, giving an analog of the linear case discussed above. It is also appropriate and useful to indicate here an application of SDEs to some important PDE problems, using an approach known as the 'Feynman-Kac methodology', first heuristically formulated by the well-known physicist Feynman and mathematically substantiated by the probabilist Kac in the early 1950s. Since this method has carried the subject further into current research, it will be discussed here with some applications. Consider the Cauchy problem
$$\frac{\partial u}{\partial t}(t,x) = (Lu)(t,x) + c(x)u(t,x),\quad t>0,\ x\in\mathbb R^k, \tag{22}$$
with initial value $u(0,x) = f(x)$, where the differential operator $L$ is defined by
$$L = \frac12\sum_{i,j=1}^n a_{ij}(x)\frac{\partial^2}{\partial x_i\partial x_j} + \sum_{i=1}^n b_i(x)\frac{\partial}{\partial x_i}, \tag{23}$$
and $A = (a_{ij})$ is an $n\times n$ non-negative definite matrix. This formulation applies to both elliptic and parabolic PDEs. The key idea, which seems almost unmotivated at first, is to associate with (22) a (nonlinear) SDE of the form
$$dX_t = b(X_t)\,dt + \sigma(X_t)\,dB_t,\quad X_0 = x\in\mathbb R^n, \tag{24}$$
where $\{B_t, t\ge0\}$ is a $k$-dimensional BM and $A(x) = \sigma(x)\sigma^*(x)$, with $\sigma(x)$ obeying an appropriate Lipschitz condition. The key new fact is that the solution $u(\cdot,\cdot)$ of the PDE (22) is given by the conditional expectation
$$u(t,x) = E\left[f(X_t)\exp\left(\int_0^t c(X_s)\,ds\right)\,\middle|\,X_0 = x\right]. \tag{25}$$
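The representation (25) suggests estimating $u(t,x)$ by simulating paths of (24). The Python sketch below is an illustration under stated assumptions, not the method of any particular reference: it uses a one-dimensional Euler-Maruyama discretization with a left-point Riemann sum for $\int_0^t c(X_s)\,ds$, and checks the estimate on the hypothetical case $b = 0$, $\sigma = 1$, $c \equiv 0.3$, $f(x) = x^2$, for which $u(t,x) = (x^2 + t)e^{0.3t}$ solves (22) with $L = \frac12\,d^2/dx^2$. The function name `feynman_kac_mc` and its parameters are illustrative only.

```python
import numpy as np


def feynman_kac_mc(f, b, sigma, c, t, x, n_paths=100_000, n_steps=100, seed=0):
    """Monte Carlo estimate of u(t, x) = E[f(X_t) exp(int_0^t c(X_s) ds) | X_0 = x]
    for the scalar SDE dX = b(X) dt + sigma(X) dB (Euler-Maruyama scheme)."""
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    X = np.full(n_paths, float(x))
    integral_c = np.zeros(n_paths)  # left Riemann sum for int_0^t c(X_s) ds
    for _ in range(n_steps):
        integral_c += c(X) * dt
        dB = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        X = X + b(X) * dt + sigma(X) * dB
    return float(np.mean(f(X) * np.exp(integral_c)))


t, x = 1.0, 0.5
estimate = feynman_kac_mc(
    f=lambda y: y**2,
    b=np.zeros_like,
    sigma=np.ones_like,
    c=lambda y: np.full_like(y, 0.3),
    t=t,
    x=x,
)
exact = (x**2 + t) * np.exp(0.3 * t)  # closed form for this test case
print(estimate, exact)
```

The Monte Carlo error decreases like the inverse square root of `n_paths`, while the time step controls the discretization bias for non-trivial $b$, $\sigma$ and $c$.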
The expectation on the right is to be evaluated using the (conditional) probability measure under which the paths start at $x$, all trajectories lying in the space $C_x(\mathbb R^k)$. The advantage of this method is that the solution $u(\cdot,\cdot)$ given by (25) can be obtained by numerical integration (for example, Monte Carlo simulation of the paths) when explicit evaluation is not possible, which is helpful in applications. [See Freidlin (1985) in this connection for applications to several other related problems.] This method has the potential of extending (24) to the case where $b(\cdot), \sigma(\cdot)$ also depend on $t$, giving the general nonlinear SDEs. Then (24) takes the form
$$dX_t = b(t,X_t)\,dt + \sigma(t,X_t)\,dB_t, \tag{26}$$
even when $b(t,X_t)$ and $\sigma(t,X_t)$ are operator valued and $B_t$ is $L^{2,2}$-bounded with stationary independent increments. The following special application, with $B_t$ a Hilbert space valued BM, indicates a motivation for the general consideration. There is also a related problem of considerable importance and interest: there are two classes of processes governed by (26), in which the 'drift' term $b(t,X_t)$ is one of two types $b_i(t,X_t)$, $i = 1,2$, and it is desired to know which is the true drift of the observed output. This problem is addressed by a statistical procedure, known as testing hypotheses, applied to the observed output. The appropriate methodology in this situation is an extension of the classical hypothesis testing procedure
invented by J. Neyman and E. S. Pearson, crucially extended by Grenander (1950) and further advanced by Pitcher (1964), aiming to include general stochastic analyses. A typical problem will be considered here to indicate the nontrivial and essential application of the above work, which actually opens up new areas in abstract stochastic processes. Thus (26) leads to considering the equations
$$dX_t = b_i(t,X_t)\,dt + \sigma_i(t,X_t)\,dB_t,\quad i = 1,2, \tag{27}$$
where the error process $\{B_t, t\ge0\}$ could be an $L^{2,2}$-bounded Hilbert space valued random process with independent increments, satisfying the following conditions. These may be motivated as follows, so that the conditions and the formulation are seen to be natural and not artificial. Let $X$ and $Y = X + a$ be real random variables ($Y$ a translate of $X$) with distributions $F_X, F_Y$ respectively, so that
$$F_X(x) = P[X<x],\qquad F_Y(x) = P[Y<x] = P[X<x-a] = F_X(x-a).$$
Then the hypothesis to be decided is $H_0: a = 0$ vs. $H_1: a\ne0$. The basic idea here, guided by the variational calculus, was formulated by Neyman and Pearson in the case of probability densities and generalized to all probability measures by Grenander: one considers the likelihood ratio
$$L_a(x) = \frac{dP_a}{dP}(x) = \frac{F_Y'(x)}{F_X'(x)} = \frac{p(x-a)}{p(x)},$$
where $p = F_X'$. If $\varphi(x) = p'(x)/p(x)$, then
$$\log L_a(x) = -\int_0^a\varphi(x-u)\,du;\qquad L_a(x) = \exp\left(-\int_0^a\varphi(x-u)\,du\right). \tag{28}$$
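Formula (28) can be checked numerically. The Python sketch below compares $\log L_a(x) = \log p(x-a) - \log p(x)$ with $-\int_0^a\varphi(x-u)\,du$, computed by a midpoint rule, for two densities chosen here only for illustration: the standard normal ($\varphi(x) = -x$) and the standard logistic ($\varphi(x) = -\tanh(x/2)$). All function names are hypothetical helpers, not part of the text.

```python
import math


def log_lr_direct(log_p, x, a):
    """log L_a(x) = log p(x - a) - log p(x)."""
    return log_p(x - a) - log_p(x)


def log_lr_from_phi(phi, x, a, n=2000):
    """log L_a(x) = -int_0^a phi(x - u) du, approximated by the midpoint rule."""
    step = a / n
    return -step * sum(phi(x - (k + 0.5) * step) for k in range(n))


def normal_log_p(x):
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)


def normal_phi(x):
    return -x  # p'(x) / p(x) for the standard normal density


def logistic_log_p(x):
    return -x - 2.0 * math.log1p(math.exp(-x))  # p(x) = e^{-x} / (1 + e^{-x})^2


def logistic_phi(x):
    return -math.tanh(x / 2.0)  # p'(x) / p(x) for the logistic density


x, a = 1.3, 0.8
gaps = [
    abs(log_lr_direct(normal_log_p, x, a) - log_lr_from_phi(normal_phi, x, a)),
    abs(log_lr_direct(logistic_log_p, x, a) - log_lr_from_phi(logistic_phi, x, a)),
]
print(gaps)
```

For the normal density both sides equal $ax - a^2/2$ exactly, since the midpoint rule integrates the linear integrand without error.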
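Now let $(T_af)(x) = f(x-a)$, $f\in L^1(P)$, so that the family $\{T_a, a\in\mathbb R\}$ forms a semigroup on $L^1(P)$, and for any compactly supported smooth $h:\mathbb R\to\mathbb R$ one has an equation for $\varphi(\cdot)$:
$$\int_{\mathbb R}\varphi(x)h(x)p(x)\,dx = -\int_{\mathbb R}p(x)h'(x)\,dx = \frac{\partial}{\partial a}\left[\int_{\mathbb R}(T_ah)(x)p(x)\,dx\right]_{a=0}. \tag{29}$$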
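The identity (29) can likewise be checked numerically. In the Python sketch below the density $p$ is a normal density with mean $0.4$ and unit variance (so $\varphi(x) = -(x - 0.4)$), and $h$ is the standard smooth bump $e^{-1/(1-x^2)}$ on $(-1,1)$; both are hypothetical choices made only for illustration. The $a$-derivative is approximated by a central difference, using $\int(T_ah)(x)p(x)\,dx = \int h(y)p(y+a)\,dy$.

```python
import numpy as np

MU = 0.4  # hypothetical mean of the normal density p


def p(x):
    return np.exp(-0.5 * (x - MU) ** 2) / np.sqrt(2.0 * np.pi)


def phi(x):
    return -(x - MU)  # p'(x) / p(x)


def bump(x):
    """Smooth h with compact support [-1, 1]."""
    out = np.zeros_like(x)
    inside = np.abs(x) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return out


y = np.linspace(-1.0, 1.0, 200_001)  # grid covering the support of h
dy = y[1] - y[0]
h = bump(y)


def integral(values):
    # Riemann sum; accurate here because the integrands vanish at both ends.
    return float(np.sum(values) * dy)


def shifted_mass(a):
    """int (T_a h)(x) p(x) dx = int h(y) p(y + a) dy."""
    return integral(h * p(y + a))


lhs = integral(phi(y) * h * p(y))
middle = -integral(p(y) * np.gradient(h, dy))
eps = 1e-4
rhs = (shifted_mass(eps) - shifted_mass(-eps)) / (2.0 * eps)
print(lhs, middle, rhs)
```

The three printed numbers correspond to the three members of (29).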
This gives a formula for φ in terms of a semi-group of operators, and forms a motivation for extensions to many other processes. Now Pitcher (1971) has
obtained a deep generalization of the above to diffusion processes, which will be indicated to present an application of the subject covering a number of problems. The following is an extension of this methodology to processes which are solutions of SDEs. Consider the first order SDE (27) above; the process is often termed a 'diffusion' when $B(t)$, $t\in J = [0,1]$, is a BM. Suppose the $X_t$-process has a canonical (or function space) representation, in the sense that $\Omega\subset\mathbb R^J$ and $X_t(\omega) = \omega(t)$, $\omega\in\Omega$. This is a convenient representation, and it can be shown that most (real) stochastic processes can be given this type of representation. The index set $J = [0,1]\subset\mathbb R$ is taken only for convenience; the work can be extended without real difficulty to many processes encountered in applications. First suppose that the drift coefficient $b(t,x)$ of the SDE (27) is the Fourier transform of an $L^1(dx)$-function $B(\cdot,t)$ with an integrable derivative (whence it belongs to $L^1(dx)$), that is,
$$b(t,x) = \int_{\mathbb R}e^{ixy}B(y,t)\,dy;\qquad \int_{\mathbb R}(1+|y|)\,|B(y,t)|\,dy\le K<\infty,$$
and that $b_x = \partial b/\partial x$ exists. These are technical conditions used here, and they may be modified in other situations; they are employed to outline a procedure of great importance in stochastic analysis. Thus let $\mathcal G$ be the space of real functions on $\mathbb R^J$ that depend only on a finite number of coordinates (which change from function to function), so that it is an algebra which is dense in $L^p(P)$, $1\le p<\infty$, where $P:\Sigma\to[0,1]$ is a probability measure making $(\Omega,\Sigma,P)$ a probability space, the $\sigma$-algebra $\Sigma$ being one which contains the similar algebra determined by $\mathcal G$. The members $g\in\mathcal G$ are then of the form $g = h\circ\pi_n$ for some $n$, where $h$ is a bounded function on $\mathbb R^n$ and $\pi_n:\Omega\to\mathbb R^n$ is a coordinate projection. One obtains a shift operator $T_\alpha:\mathcal G\to\mathcal G$ by $(T_\alpha g)(x) = h(\pi_n(x+\alpha m))$, $m, x\in\Omega$, $\alpha\in\mathbb R$, for $g = h\circ\pi_n\in\mathcal G$. It is seen that $\{T_\alpha, \alpha\in\mathbb R\}$ is well defined and forms a group of linear operators on $L^1(P)$ for which $T_\alpha1 = 1$. [For more detail see Rao (2000), p. 396.] Each $T_\alpha$, $\alpha\in\mathbb R$, with $T_\alpha1 = 1$, is a bounded (in fact contractive) linear operator on $L^1(P)$, which by a classical Riesz theorem is representable as
$$\int_\Omega(T_\alpha f)(x)\,dP(x) = \int_\Omega f(x)\,dP_\alpha(x),\quad f\in L^1(P), \tag{30}$$
and $P_0 = P$, although the $P_\alpha$ need not all be mutually equivalent. The problem of testing the hypothesis that the $P_\alpha$ are equivalent to $P_0$ is
answered by the following key result, a generalized form of the Neyman-Pearson-Grenander theorem, established by Pitcher (1961):
Theorem 3.3. For the SDE (27), in which the drift coefficient $b(\cdot,\cdot)$ satisfies the above integrability conditions through its Fourier transform, and the diffusion coefficients $\sigma_i$ are bounded and positive, with $b_x$ continuous, suppose the shift function $m: J\to\mathbb R$ has a square integrable derivative and $m(0) = 0$. Then the measures $\{P_\alpha, \alpha\in\mathbb R\}$ are mutually absolutely continuous, and the likelihood ratio is given by
$$\frac{dP_\alpha}{dP}(x) = \exp\left(\int_0^\alpha(T_{-t}\varphi)(x)\,dt\right),\quad \text{a.a. } x\in\Omega, \tag{31}$$
where, for a measurable version of $\varphi$, one has, with the right side finite with probability 1,
$$\varphi = E^{\mathcal C}\left(\int_J\frac{m'(t) - m(t)\,b_x(t,X_t)}{\sigma(t)}\,dB_t\right), \tag{32}$$
$\mathcal C$ being the $\sigma$-algebra generated by the solution $X_t$, $t\in J$, $E^{\mathcal C}$ the conditional expectation operator, and $B_t$ the BM (see also Velman's (1969) important contribution). The result is not simple, but it shows how the theory of inference, started by Neyman and Pearson for finite samples, generalized to processes by Grenander and further deepened by Pitcher, uses special aspects of analysis, with the differentiation of measures as an inherent part of inference theory. One also finds a deeper relation between Markov processes and elliptic-parabolic PDEs, exemplified in the detailed analysis by Freidlin and Wentzell (1998), as well as in the inference theory of processes presented in the author's monograph (Rao, 2000). There is indeed a closely related underlying mathematical theme in both viewpoints, and the main results go to a deeper level, which will be of interest for future research in these areas. All the above work crucially depended on regular versions of the conditional probability measures. This is a nontrivial matter: not only does regularity depend on some unverifiable conditions on the underlying spaces but, more importantly, there can be many such families of conditional measures. In most works on the subject, conditions for the existence of such a measure are presented and the theory is developed. However, the uniqueness of conditional measures is a nontrivial problem; it will be discussed in Section 6, where some crucial difficulties will be highlighted.
It is now time to consider the key problem noted earlier, namely error approximation and, more generally, the large deviation questions for the various processes originating in Theorem 1.1, using the fundamental ideas from entropy also discussed in Section 1. This will now be taken up from the viewpoint of the associated function spaces attached to all these classes.
4. Entropy, Action/Rate Functionals and LDP
If $\{X_n, n\ge1\}$ is a sequence of independent random variables having a common distribution with mean $\mu$ and some higher moments, and $S_n = X_1+\cdots+X_n$, then the sequence $\{S_n/n, n\ge1\}$ converges to $\mu$ in probability or with probability one, by the law of large numbers. For applications of this result it is then desirable to know the rate of convergence. Indeed, Theorem 1.1 above indicates the kind of error estimation that is usually needed in both of these types of limit assertions. The functional measuring the error is referred to in the literature under the three names given in this section's heading. The basic problem involved will now be considered. First recall that there is no universal method to compute limits, except the basic fact that the inferior and superior limits, which always exist, should be equal, defining the limit of the sequence. A refinement of this notion was employed with integrals by Laplace, for positive functions and measures, much earlier than the work leading to Theorem 1.1, and it will be recalled to motivate the desired work. Let $h:[0,1]\to\mathbb R$ be a continuous function, so that its extrema are attained at some points of the compact interval $[0,1]$, and let $\alpha = \min_{0\le x\le1}h(x) = h(x_0)$ for some $x_0\in[0,1]$. If $h$ is twice differentiable, then one has the following approximation. Using the Taylor expansion of $h$ to three terms about $x_0$ (where $h'(x_0) = 0$ at an interior minimum), $h(x) = h(x_0) + 0 + \frac{(x-x_0)^2}{2}\beta + o((x-x_0)^2)$, one gets
$$\int_0^1e^{-nh(x)}\,dx = e^{-n\alpha}\left(\int_0^1e^{-\beta n(x-x_0)^2/2}\,dx\right)(1+o(1)),$$
where $\beta = h''(x_0)>0$. This then implies, using the integral of the normal density with mean $x_0$ and variance $1/(n\beta)$, the following useful limiting relation, with $h(x_0) = \alpha$ the attained minimum:
$$\lim_{n\to\infty}\frac1n\log\int_0^1e^{-nh(x)}\,dx = -\alpha. \tag{1}$$
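Relation (1) is easy to observe numerically. The Python sketch below evaluates $\frac1n\log\int_0^1 e^{-nh(x)}\,dx$ with a midpoint rule, factoring out $e^{-n\min h}$ so that nothing underflows, for two hypothetical choices of $h$ with $\alpha = \min h = 0.5$ attained at $x_0 = 0.3$: a smooth one and a merely continuous one with a kink at $x_0$. The helper name `laplace_exponent` and the grid size are illustrative choices.

```python
import numpy as np


def laplace_exponent(h, n, grid_size=200_001):
    """(1/n) log int_0^1 exp(-n h(x)) dx via the midpoint rule, computed stably."""
    x = (np.arange(grid_size) + 0.5) / grid_size  # midpoints of [0, 1]
    hx = h(x)
    m = hx.min()
    scaled = np.mean(np.exp(-n * (hx - m)))  # equals e^{n m} * integral
    return -m + np.log(scaled) / n


h_smooth = lambda x: 0.5 + (x - 0.3) ** 2  # hypothetical smooth h, alpha = 0.5
h_kink = lambda x: 0.5 + np.abs(x - 0.3)   # hypothetical continuous h, alpha = 0.5

for n in (10, 100, 1_000, 100_000):
    print(n, laplace_exponent(h_smooth, n), laplace_exponent(h_kink, n))
```

Both columns approach $-\alpha = -0.5$, with a correction of order $(\log n)/n$.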
The result (1) is also valid for any continuous function $h$, by approximating it with differentiable, smooth (even polynomial) functions, using the Weierstrass approximation theorem on compact sets. The result indicates how the exponential and its inverse, the 'log' function, play key roles in the present study, opening up large areas of asymptotic analysis by choosing $h$ of (1) suitably. This will be illustrated by an analytic proof, via Laplace's method, of the well-known Stirling approximation of the factorial $n!$, which will also be used below. To proceed, consider Euler's gamma integral for the factorial:
$$n! = \int_0^\infty x^ne^{-x}\,dx = \int_0^\infty e^{-(x-n\log x)}\,dx = n^{n+1}\int_0^\infty e^{-n(u-\log u)}\,du, \tag{2}$$
by the change of variable $x = nu$. Taking $h(u) = u - \log u$ in (1), which has its minimum at $u = 1$, consider an interval $(1-\epsilon, 1+\epsilon)$ for a small $\epsilon>0$, and expand $\log x$ in Taylor's series around 1 to get
$$h(x) = x - \left[(x-1) - \frac{(x-1)^2}{2} + \cdots\right] = 1 + \frac{(x-1)^2}{2} - \cdots. \tag{3}$$
Then from (2), using (3) in the small interval $(1-\epsilon,1+\epsilon)$, one has
$$\int_{1-\epsilon}^{1+\epsilon}e^{-n(u-\log u)}\,du = e^{-n}\left(\int_{1-\epsilon}^{1+\epsilon}e^{-n(u-1)^2/2}\,du\right)(1+o(1)), \tag{4}$$
which for large $n$ gives, using the value of the area integral of the normal density with mean 1 and variance $1/n$ in the right side of (2) or (4), the following after an obvious simplification:
$$n! \sim n^{n+1}e^{-n}\sqrt{\frac{2\pi}{n}} = \sqrt{2\pi n}\,n^ne^{-n}. \tag{5}$$
This is the known Stirling approximation for $n!$, obtained in 1730 by James Stirling, and it is one of the most used asymptotic approximations in applications. The approximation is so close (the relative error decreases roughly like $1/(12n)$) that W. Feller, in his well-known book, exhibits some of the values: for $n = 2$, one has $2!\sim1.919$, a 4% error; for $n = 5$, $5!\sim118.019$, a 2% error; for $n = 10$, $10!\sim3{,}598{,}696$ (against the exact value $3{,}628{,}800$), just a 0.8% error; and for $n = 100$ the error is 0.08%. The point of this illustration is that the
error is seen to be very small, which is a most useful property. This motivates considering approximations whose deviations from their limiting results are exponentially small. Moreover, (1) shows that integrals of the form $\int e^{-nh}$ over a finite measure space decay exponentially in $n$, which clearly motivates formulating the Large Deviation Principle (LDP). It is also useful to give a probabilistic version of this reasoning. If $X$ is a random variable for which $E(e^{tX})<\infty$ for all real $t$ (so that, in particular, all its moments are finite), its Laplace transform, here termed the moment generating function and denoted $M_X(\cdot)$, exists and is defined as
$$M_X(t) = \int_\Omega e^{tX}\,dP. \tag{6}$$
It has the remarkable property that its cumulant function, given by $\Lambda_X(t) = \log M_X(t)$, is convex. Indeed, for any $0\le\alpha\le1$ and $\beta = 1-\alpha$, one has
$$\Lambda_X(\alpha s+\beta t) = \log\int_\Omega e^{(\alpha s+\beta t)X}\,dP \le \alpha\Lambda(s) + \beta\Lambda(t), \tag{7}$$
where Hölder's inequality (with exponents $1/\alpha$ and $1/\beta$) was applied to the product $e^{\alpha sX}e^{\beta tX}$ before taking logarithms. Since $M_X(0) = 1$ for a moment generating function, $\Lambda(0) = 0$; if moreover $X$ is centered and nondegenerate, then $0 = \Lambda(0)\le\Lambda(t)\uparrow\infty$ as $t\uparrow\infty$. This convexity property (7) is basic to the ensuing work. Now by the well-known integral representation of a convex function, one has
$$\Lambda(t) = \Lambda(a) + \int_a^t\Lambda'(u)\,du, \tag{8}$$
where $\Lambda'(\cdot)$ is the left derivative of $\Lambda$, which exists and is nondecreasing. Take $a = 0$, so that $\Lambda(0) = 0$ since $M_X(0) = 1$, and consider the (generalized) inverse $\tilde\Lambda'$ of $\Lambda'$, which is the usual inverse if $\Lambda'$ is strictly increasing and in any case is nondecreasing and left continuous; then set
$$\tilde\Lambda(t) = \int_0^t\tilde\Lambda'(u)\,du. \tag{9}$$
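Convexity (7) and the representation (8) with $a = 0$ can be checked for any distribution with a finite moment generating function. The Python sketch below uses a hypothetical three-point distribution; it computes $\Lambda$ with a log-sum-exp shift and $\Lambda'(t)$ as the mean of the exponentially tilted law, checks (7) at a few points, and compares $\Lambda(t)$ with a midpoint-rule value of $\int_0^t\Lambda'(u)\,du$. The names `VALUES`, `PROBS`, `cumulant` and `cumulant_derivative` are illustrative.

```python
import numpy as np

VALUES = np.array([-1.0, 0.0, 2.0])  # hypothetical support of X
PROBS = np.array([0.3, 0.5, 0.2])    # hypothetical probabilities


def cumulant(t):
    """Lambda(t) = log E[e^{tX}], computed with a log-sum-exp shift."""
    z = t * VALUES
    top = z.max()
    return top + np.log(np.sum(PROBS * np.exp(z - top)))


def cumulant_derivative(t):
    """Lambda'(t): the mean of X under the tilted law proportional to e^{tx} P(dx)."""
    w = PROBS * np.exp(t * VALUES - (t * VALUES).max())
    return np.sum(VALUES * w) / np.sum(w)


# Convexity (7) at a few (s, t, alpha) triples.
convexity_ok = all(
    cumulant(alpha * s + (1 - alpha) * t)
    <= alpha * cumulant(s) + (1 - alpha) * cumulant(t) + 1e-12
    for s, t, alpha in [(-3.0, 2.0, 0.3), (0.5, 4.0, 0.7), (-1.0, 1.0, 0.5)]
)

# Representation (8) with a = 0, using Lambda(0) = 0.
t_end, n = 1.5, 10_000
du = t_end / n
integral = du * sum(cumulant_derivative((k + 0.5) * du) for k in range(n))
print(convexity_ok, cumulant(t_end), integral)
```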
This representation has a fundamental role to play in the error estimation problem of probability theory. For instance, if $X_1, X_2, \dots$ is a sequence of independent random variables with the same distribution on a probability space $(\Omega,\Sigma,P)$, whose Laplace transform exists, then by the Kolmogorov law
of large numbers, the sample mean converges to the population mean with probability one. More precisely, one has
$$P\left[\frac1n\sum_{i=1}^nX_i - E(X_1) > \varepsilon\right] = h_n(\varepsilon),$$
where it was found that $h_n(\varepsilon) = e^{-n\tilde\Lambda(E(X_1)+\varepsilon) + o(n)}$, with $\tilde\Lambda$ the Legendre transform of $\Lambda$, given by
$$\tilde\Lambda(s) = \sup\{st - \Lambda(t) : t\in\mathbb R\}. \tag{10}$$
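The supremum in (10) can be computed numerically by maximizing over a grid of $t$ values. The Python sketch below does this for the fair coin, $P[X = 0] = P[X = 1] = \frac12$, where $\Lambda(t) = \log\frac{1+e^t}{2}$, and compares the result with the closed form (12) given later in this section; the grid range and spacing are arbitrary choices, and the function names are illustrative. For $s$ outside $[0,1]$ the grid supremum grows with the grid range, reflecting $\tilde\Lambda(s) = +\infty$ there.

```python
import numpy as np

T_GRID = np.linspace(-40.0, 40.0, 400_001)  # arbitrary grid for the supremum over t


def cumulant_fair_coin(t):
    """Lambda(t) = log((1 + e^t) / 2) for P[X = 0] = P[X = 1] = 1/2."""
    return np.logaddexp(0.0, t) - np.log(2.0)


def legendre_numeric(s):
    """Grid approximation of sup_t (s t - Lambda(t)), cf. (10)."""
    return float(np.max(s * T_GRID - cumulant_fair_coin(T_GRID)))


def legendre_closed_form(s):
    """s log 2s + (1 - s) log 2(1 - s), valid for 0 < s < 1."""
    return s * np.log(2.0 * s) + (1.0 - s) * np.log(2.0 * (1.0 - s))


for s in (0.1, 0.3, 0.5, 0.9):
    print(s, legendre_numeric(s), legendre_closed_form(s))
print(1.2, legendre_numeric(1.2))  # grows with the grid range: the transform is +inf
```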
The function $\tilde\Lambda$ is defined differently in (9) and (10), but the two definitions can be shown to agree; (9) corresponds to the functional analysis approach, while (10) is the earlier definition due to Legendre. This view of the important Legendre transform enriches the LDP theory enormously, and it will be treated here in some detail to explain and clarify the subject. It should be observed immediately that, when restricted to the positive half line, the (logarithmic) Laplace transform and the Legendre transform of probability measures are precisely a positive increasing convex function and its complementary function, which is necessarily convex, both vanishing at the origin and tending to infinity. Such functions are known as a complementary pair of Young functions, after the detailed analysis by W. H. Young in the early 1900s. In applications motivated by problems of probability theory they are known as the Laplace and Legendre transforms, which are again mutually complementary. For the Young complementary pair, the Young function and hence its complementary cousin are extended to the whole real line by symmetry, whereas in the Laplace-Legendre case they are not so extended, since the work on the former was aimed at linear (or functional) analysis and the latter was not restricted to the linearity condition. Nevertheless this close connection should be noted, as it has not been exploited by these two groups of researchers. This will be remedied here, and a useful cross-fertilization will be exhibited, with applications in LDP studies as well as their functional analytic counterparts. This will be more visible when these (convex) functions are defined not just on the line but in higher dimensions, where many open problems will be seen clearly.
In both types of analysis, Young's complementary function, alternatively called the Legendre transform, is a convex function whose explicit evaluation, however, presents a nontrivial (often nearly impossible) problem, although the existence of such a function is implied by
the associated function theory. An example shows the involved nature of the problem. If $\varphi(t) = e^{t^2}-1$, $t\in\mathbb R$, then it is clearly a Young function, and its complementary function $\psi$ exists uniquely and satisfies the Young inequality, but it cannot be written down in terms of elementary functions. In the other terminology, $\psi$ is the Legendre transform of $\varphi$. The corresponding functions, or complementary pairs, in higher dimensions are more complicated but are needed in our work. They were initiated by W. Fenchel (1949), and their importance in analysis has since been well recognized and exploited under the title 'Convex Analysis'. This is used in 'Optimization Analysis', and a basic account of that subject is in the book by Rockafellar (1970). This extended analysis, in the LDP context and related problems appearing below, will be briefly considered under the name Fenchel-Legendre-Orlicz classes in Section 5. The following simple illustration shows how the Legendre transform plays a crucial role in the present analysis. If $\Lambda(\cdot): t\mapsto\log M_X(t)$ is the cumulant function of the random variable $X$, so that it is representable as $\Lambda(t) = \int_0^t\Lambda'(u)\,du$, where $\Lambda'(t)$, $t\ge0$, is increasing, then its complementary function $\tilde\Lambda$ is similarly increasing, with the integral representation
$$\tilde\Lambda(t) = \int_0^t\tilde\Lambda'(u)\,du, \tag{11}$$
and the integrand is the inverse of $\Lambda'$. To appreciate the nontriviality of this construction of the Legendre transform, consider a random sample $\{X_1,\dots,X_n\}$ from a Bernoulli distribution with $P[X_1 = 0] = p$, $P[X_1 = 1] = q = 1-p$. Then $\Lambda(t) = \log(p + qe^t)$ and, since $\Lambda'(\cdot)$ is strictly increasing, one has
$$\tilde\Lambda(t) = t\log\frac{tp}{q(1-t)} - \log\frac{p}{1-t},\quad 0<t<1,$$
extended by continuity to $t = 0, 1$, and infinity otherwise. In particular, if $p = \frac12$, the Legendre transform becomes
$$\tilde\Lambda(t) = t\log2t + (1-t)\log2(1-t),\quad 0<t<1, \tag{12}$$
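and infinity otherwise (the values at $t = 0, 1$ being obtained by continuity). In this example, for a Borel set $A\subset\mathbb R$, the event that the proportion of heads in $n$ tosses of the fair coin lies in $A$ corresponds to the index set $A_n = \{k : 0\le k\le n,\ k/n\in A\}$; a typical choice is $A = \{x : |x - \frac12|\ge\varepsilon\}$. Its probability is clearly
$$Q_n(A) = P(A_n) = \sum_{k\in A_n}\binom nk\frac1{2^n}. \tag{13}$$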
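Because (13) is a finite sum, the limit relation (14) stated next can be observed directly. The Python sketch below computes $\log Q_n(A)$ with log-gamma and log-sum-exp, for the hypothetical choice $A = \{x : |x - \frac12| \ge \frac1{10}\}$, whose membership test is done in integer arithmetic to avoid rounding at the boundary, and compares $\frac1n\log Q_n(A)$ with $-\inf_{x\in A}\tilde\Lambda(x) = -\tilde\Lambda(0.6)$ from (12). The helper names are illustrative.

```python
import math


def log_binom_pmf_fair(n, k):
    """log( C(n, k) / 2^n ) via log-gamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            - n * math.log(2.0))


def log_Qn(n, in_A):
    """log Q_n(A) = log sum_{k in A_n} C(n, k) / 2^n, computed by log-sum-exp."""
    logs = [log_binom_pmf_fair(n, k) for k in range(n + 1) if in_A(k, n)]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


def rate(x):
    """Closed form (12) of the Legendre transform for the fair coin, 0 < x < 1."""
    return x * math.log(2 * x) + (1 - x) * math.log(2 * (1 - x))


# Hypothetical set A = {x : |x - 1/2| >= 1/10}, i.e. 5 |2k - n| >= n for x = k/n.
in_A = lambda k, n: 5 * abs(2 * k - n) >= n
target = -rate(0.6)  # the infimum over A is attained at x = 0.4 and x = 0.6

for n in (100, 1000, 20_000):
    print(n, log_Qn(n, in_A) / n, target)
```

The gap between the two printed columns shrinks like $(\log n)/n$, in line with the $O(\log n/n)$ terms appearing in (17).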
Now the asymptotic value of this can be obtained by evaluating the binomial coefficients with the Stirling approximation seen above, to get
$$\lim_{n\to\infty}\frac1n\log Q_n(A) = -\lim_{n\to\infty}\min_{k\in A_n}\tilde\Lambda\left(\frac kn\right) = -\inf_{x\in A}\tilde\Lambda(x). \tag{14}$$
This example shows the way that the Legendre transform plays a key role in large deviation probabilities, or in the error estimation problem indicated in Theorem 1.1, as well as in similar problems arising in essentially all probability limit theories. It also amplifies the key position of LDP in the asymptotic analysis of these problems. The first serious application of this set of ideas is due to Cramér (1938), whose work motivated the main developments of the subject from that point on. It is therefore appropriate to recall this classical result, as it forms a stepping stone for advancing the subject. The general LDP concept generalizing (14) above is stated as follows:
Definition 4.1. Let $\mathcal X$ be a (separable) Fréchet space and $Q_n:\mathcal B(\mathcal X)\to[0,1]$ a sequence of probability measures, where $\mathcal B(\mathcal X)$ is the Borel $\sigma$-algebra. Let $I:\mathcal X\to\bar{\mathbb R}_+$ be a mapping such that (i) it is lower semicontinuous, and (ii) $\{x : I(x)\le a\}$ is compact for each $a\in\mathbb R_+$. Then the sequence $\{Q_n, n\ge1\}$ is said to have the large deviation property relative to $I$, called an entropy or rate function, with speed $a_n\uparrow\infty$, if for each $A\in\mathcal B(\mathcal X)$ the following inequalities hold:
$$-\inf_{x\in A^\circ}I(x) \le \liminf_{n\to\infty}\frac1{a_n}\log Q_n(A) \le \limsup_{n\to\infty}\frac1{a_n}\log Q_n(A) \le -\inf_{x\in\bar A}I(x). \tag{15}$$
If $\mathcal X$ is infinite dimensional, $I(\cdot)$ is also termed an action functional. Here $A^\circ$ is the interior and $\bar A$ the closure of $A$. An important relation between the entropy functional $I$ and the (Young) complementary function $\tilde\Lambda$ of the cumulant generating function $\Lambda$ is given by the following result, first sketched by Sanov (1957); complete details can be found in the volume by Dupuis and Ellis (1997, Chapter 2). This result of Sanov will be discussed, and its extensions considered, below. An excellent companion study on large deviations, focusing especially on 'free probability' and including a generalized version of Sanov's theorem, is the article by F. Hiai (cf. Chapter 3 of this volume)
and it will be of particular interest to readers of LDP generally, as well as for the present considerations. To give a concrete beginning of the subject and a motivation, the classical Cramér theorem is presented as follows:
Theorem 4.2. Let $X_1, X_2,\dots$ be a sequence of independent, identically distributed real random variables on a probability space $(\Omega,\Sigma,P)$ whose cumulant function $\Lambda(\cdot)$ exists (is finite) on $\mathbb R$. If $\tilde\Lambda$ is its Legendre transform (or the complementary Young function on the positive line $\mathbb R^+$), then the sequence of means $Y_n = \frac1n\sum_{k=1}^nX_k$ satisfies the LDP condition (15), with $a_n = n$ and $I = \tilde\Lambda$ as its entropy function, on the Borel algebra of $\mathbb R$.
An elementary example of this result is seen from a nontrivial application of (12) and (13), observing how the entropy functional $\tilde\Lambda\ (= I(\cdot))$ makes its appearance naturally as the exponent of the exponential function. For symmetric Bernoulli variables, the complementary function $\tilde\Lambda$ is given by (12), and $Q_n(A) = P(Y_n^{-1}(A))$ is bounded as
$$\max_{k\in A_n}\binom nk\frac1{2^n} \le Q_n(A) \le (n+1)\max_{k\in A_n}\binom nk\frac1{2^n}, \tag{16}$$
in which the binomial coefficient can be given an asymptotic value using the Stirling approximation (5). Taking logarithms throughout (an increasing transformation), one easily obtains
$$\frac1n\log\left[\binom nk\frac1{2^n}\right] = -\frac kn\log\frac kn - \left(1-\frac kn\right)\log\left(1-\frac kn\right) - \log2 + O\left(\frac{\log n}n\right) = -\tilde\Lambda\left(\frac kn\right) + O\left(\frac{\log n}n\right). \tag{17}$$
Setting $A = \{\frac kn\}$, a singleton, one obtains from the above
$$\frac1n\log Q_n\left(\left\{\frac kn\right\}\right) = -\tilde\Lambda\left(\frac kn\right) + O\left(\frac{\log n}n\right).$$
This implies that for each Borel set $A\subset[0,1]$ whose interior and closure give the same infimum of $\tilde\Lambda$ (for example, an interval), one has
$$\lim_{n\to\infty}\frac1n\log Q_n(A) = -\inf_{x\in A}\tilde\Lambda(x), \tag{18}$$
where $\tilde\Lambda$ is given by (12) for $x\in[0,1]$ and equals $+\infty$ otherwise. Here $\tilde\Lambda$ is a convex function with a unique minimum at $x = \frac12$ and is symmetric about $x = \frac12$. This shows that
the large deviation probability tends to zero exponentially, which generally needs a deep analysis. The basic problem with Large Deviations is illustrated by Theorem 4.2 due to H. Cramér, which shows that the lower and upper bounds may have to be calculated separately: being defined in terms of infima and suprema, they often depend on separate methods based on procedures of nonlinear analysis, which are usually different for the upper and the lower estimates. Thus the work is somewhat more involved than is usual in limit distributional analysis. A proof of Cramér's theorem is in most books on the subject, cf., e.g., S. R. S. Varadhan (1984), P. Dupuis and R. S. Ellis (1997) or Rao and Ren (2002), among several others. This is an active and expanding area, since in most applications one desires (exponential) bounds, if possible, on the error probabilities once a limit theorem is established. Different aspects of the subject and some directions will now be discussed in the rest of this section, to indicate some of the advances and problems demanding special treatment, especially for multi-dimensional (vector valued) and multi-parameter (random field) questions. Cramér's theorem opened up the investigation of the speed of convergence in general limit theorems, both for practical applications and for the limiting analysis of classes of problems; it suggests an analogy with game theory, in which one uses minimax arguments and related results, and this brings in the entropy function. Thus, if the random element $X: \Omega \to \mathcal{X}$, a Banach space, is measurable and weakly integrable, so that $x^*(X) \in L^1(P)$ for $x^* \in \mathcal{X}^*$, its cumulant function, which is seen to be convex (cf. (7) above), can be computed as
$$\Lambda(x^*) = \log E\big(e^{x^*(X)}\big), \qquad x^* \in \mathcal{X}^*,$$
and one then considers its complementary function $\tilde\Lambda(\cdot)$ defined as:
$$\tilde\Lambda(x) = \sup\{\langle x^*, x\rangle - \Lambda(x^*) : x^* \in \mathcal{X}^*\}, \qquad (19)$$
and studies the duality of the pair $(\Lambda, \tilde\Lambda)$, looking for the following analog to hold on each compact convex set $C \subset \mathcal{X}$:
$$\inf_{x\in C}\tilde\Lambda(x) = \inf_{x\in C}\sup_{x^*\in\mathcal{X}^*}\{\langle x^*, x\rangle - \Lambda(x^*)\} = \sup_{x^*\in\mathcal{X}^*}\inf_{x\in C}\{\langle x^*, x\rangle - \Lambda(x^*)\}, \qquad (20)$$
which includes the multidimensional case $\mathcal{X} = \mathbb{R}^n$; by analogy, $\tilde\Lambda$ is called the rate or entropy of the (multi-dimensional) problem. Once this is understood, it motivates replacing $\mathcal{X}^*$, the space of continuous linear functionals, by the space of all continuous functions on the metric space $\mathcal{X}$, for which (20) may hold with a suitable interpretation of the right side. The surprising fact about this proposed extension is that it does not demand the linearity of the (continuous) functionals but depends only on their continuity, as the following fundamental result due to S. R. S. Varadhan (1966) shows. Let us recall the notation and concepts to be used. Thus, given $\Lambda(x^*) = \log E(e^{x^*(X)})$ for the random element $X: \Omega \to \mathcal{X}$, let its complementary mapping $\tilde\Lambda: \mathcal{X} \to \bar{\mathbb{R}}^+$ be defined as
$$\tilde\Lambda: x \mapsto \sup\{\langle x^*, x\rangle - \Lambda(x^*) : x^* \in \mathcal{X}^*\},$$
so that for any compact convex set $C \subset \mathcal{X}$ one has the relation:
$$\inf_{x\in C}\tilde\Lambda(x) = \inf_{x\in C}\sup_{x^*\in\mathcal{X}^*}\{x^*(x) - \Lambda(x^*)\} = \sup_{x^*\in\mathcal{X}^*}\inf_{x\in C}\{x^*(x) - \Lambda(x^*)\}. \qquad (21)$$
This formula is restated, using Laplace's method detailed for (1), in the following basic extension, in which $\mathcal{X}$ is a separable (for measurability reasons) complete metric space, also termed a Polish space:

Theorem 4.3. (Varadhan) Let $\{\mathcal{X}, \mathcal{B}\}$ be a Polish measurable space and let $X_n: \Omega \to \mathcal{X}$, $n \ge 1$, be any sequence of random elements on a probability space $(\Omega, \Sigma, P)$ such that the measures $P_n = P \circ X_n^{-1}$ satisfy the LDP relative to the entropy function $I: \mathcal{X} \to \mathbb{R}^+$, so that $I$ is lower semi-continuous and $\{x : I(x) \le k\}$ is compact for each $k \ge 0$. Then for any continuous function $f: \mathcal{X} \to \mathbb{R}$ that is bounded below, one has
$$\lim_{n\to\infty}\frac{1}{n}\log E\big(e^{-nf(X_n)}\big) = -\inf\{f(x) + I(x) : x \in \mathcal{X}\}, \qquad (22)$$
and the entropy function is unique. Conversely, if $I(\cdot)$ is an entropy satisfying (22) for all bounded continuous $f: \mathcal{X} \to \mathbb{R}$, then the sequence of random variables $\{X_n, n \ge 1\}$ obeys the LDP with entropy $I(\cdot)$. A proof of this key result is given in essentially every book that discusses the LDP, starting with the original CBMS lectures by Varadhan (1984), and the reader is referred to it or to the more detailed analysis in Dupuis and Ellis (1997).
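The symmetric Bernoulli example following Theorem 4.2 gives a concrete numerical check of both (18) and Varadhan's formula (22). The sketch below assumes the Bernoulli variables take the values 0 and 1 with probability ½ each, so that $\tilde\Lambda(x) = x\log x + (1-x)\log(1-x) + \log 2$ on $[0,1]$. The function names and the test function $f(x) = 5(x-0.8)^2$ are our own illustrative choices, not taken from the text. Binomial probabilities are summed exactly in log space, so the only approximation on the left sides is the finite $n$.

```python
import math


def rate_bernoulli(x: float) -> float:
    """Legendre transform of Lambda(t) = log((1 + e^t) / 2): the entropy of
    means of symmetric {0, 1}-valued Bernoulli variables (+inf off [0, 1])."""
    if x < 0.0 or x > 1.0:
        return math.inf

    def xlogx(u: float) -> float:
        return 0.0 if u == 0.0 else u * math.log(u)

    return xlogx(x) + xlogx(1.0 - x) + math.log(2.0)


def log_binom_prob(n: int, k: int) -> float:
    """log( C(n, k) / 2^n ), via lgamma to avoid overflow for large n."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1)
            - math.lgamma(n - k + 1) - n * math.log(2.0))


def logsumexp(values) -> float:
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def scaled_log_tail(n: int, a: float) -> float:
    """(1/n) log Q_n([a, 1]) for 0 <= a <= 1, where Q_n is the law of Y_n = S_n / n."""
    terms = [log_binom_prob(n, k) for k in range(n + 1) if k / n >= a]
    return logsumexp(terms) / n


def scaled_log_laplace(n: int, f) -> float:
    """Left side of (22): (1/n) log E exp(-n f(Y_n))."""
    terms = [log_binom_prob(n, k) - n * f(k / n) for k in range(n + 1)]
    return logsumexp(terms) / n


def varadhan_limit(f, grid_size: int = 20001) -> float:
    """Right side of (22): -inf{ f(x) + I(x) } over a uniform grid of [0, 1]."""
    xs = (i / (grid_size - 1) for i in range(grid_size))
    return -min(f(x) + rate_bernoulli(x) for x in xs)
```

For instance, `rate_bernoulli(0.7)` is about 0.0823. Comparing `scaled_log_tail(n, 0.7)` for n = 1000 and n = 10000 with `-rate_bernoulli(0.7)` shows the gap shrinking at the $O(\log n/n)$ rate of (17). Likewise `scaled_log_laplace(10000, f)` agrees closely with `varadhan_limit(f)`.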
An adjunct to this basic result, often used in applications, is of interest when there is a continuous mapping $h: \mathcal{X} \to \mathcal{Y}$ between a pair of Polish spaces. If $I$ is the entropy on $\mathcal{X}$ for a random sequence $\{X_n, n \ge 1\}$, consider the image sequence of random elements $\{h(X_n), n \ge 1\}$ in $\mathcal{Y}$. If $J(y) = \inf\{I(x) : x \in h^{-1}(y)\}$ for $y \in h(\mathcal{X})$, and $J(y) = +\infty$ otherwise, then one can show that $J(\cdot)$ is an entropy and the sequence $\{h(X_n), n \ge 1\}$ again obeys the LDP with $J$ as its entropy, as one would expect. Since $J$ possibly takes fewer values than $I$, it is termed a 'contraction', and the result is a contraction principle. The converse part of the above theorem can be strengthened for a subclass obeying a stricter condition termed exponential tightness (ET), as follows. [Here 'exponential' comes from the inverse of 'logarithm' in the formulation of entropy for these problems, as discussed in Section 1.] Thus a sequence of random variables $\{X_n, n \ge 1\}$ from $(\Omega, \Sigma, P)$ into a complete metric space $\mathcal{X}$ is exponentially tight if for each $M > 0$ there is a compact set $K = K_M \subset \mathcal{X}$ such that, for all large enough $n$, $P[X_n \in K^c]$ is essentially bounded by $e^{-Mn}$. Precisely, this means
$$\limsup_{n\to\infty}\frac{1}{n}\log P[X_n \in K^c] \le -M.$$
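The contraction principle stated in the preceding paragraph can be illustrated in the symmetric Bernoulli setting of the example following Theorem 4.2 (again assuming $\{0,1\}$-valued variables). Take $h(x) = (x - \frac{1}{2})^2$ on $[0,1]$, so that $h(Y_n)$ is the squared deviation of the sample mean from ½. Since $\tilde\Lambda$ is symmetric about ½, the contracted entropy should be $J(y) = \tilde\Lambda(\frac{1}{2} + \sqrt{y})$ for $0 \le y \le \frac{1}{4}$ and $+\infty$ otherwise. The sketch computes $J(y) = \inf\{I(x) : h(x) = y\}$ by brute force over a grid, accepting $h(x) = y$ only up to a tolerance. The name `contracted_rate` and its parameters are hypothetical.

```python
import math


def rate_bernoulli(x: float) -> float:
    """Entropy of means of symmetric {0, 1} Bernoulli variables (+inf off [0, 1])."""
    if x < 0.0 or x > 1.0:
        return math.inf
    xlogx = lambda u: 0.0 if u == 0.0 else u * math.log(u)
    return xlogx(x) + xlogx(1.0 - x) + math.log(2.0)


def contracted_rate(rate, h, y: float, grid_size: int = 100001,
                    tol: float = 1e-5) -> float:
    """Grid approximation of J(y) = inf{ I(x) : h(x) = y } for x in [0, 1].

    A grid point x counts as lying in the fibre h^{-1}(y) when |h(x) - y| <= tol;
    if no grid point qualifies, y is treated as outside h([0, 1]) and J(y) = +inf.
    """
    best = math.inf
    for i in range(grid_size):
        x = i / (grid_size - 1)
        if abs(h(x) - y) <= tol:
            best = min(best, rate(x))
    return best
```

For example, `contracted_rate(rate_bernoulli, lambda x: (x - 0.5) ** 2, 0.04)` agrees with `rate_bernoulli(0.7)` to within about $10^{-4}$. The value $y = 0.3 > \frac{1}{4}$ returns `inf`, matching the "$+\infty$ otherwise" clause in the definition of $J$.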
For such sequences, a converse to (22), found by Bryc and several others (cf. Dupuis and Ellis (1997), p. 30, and Feng and Kurtz (2006), p. 44), may be given as follows:

Theorem 4.4. If $\{X_n, n \ge 1\}$ is an exponentially tight sequence such that for all bounded continuous $h: \mathcal{X} \to \mathbb{R}$ the limit
$$\Lambda(h) = \lim_{n\to\infty}\frac{1}{n}\log E\big(e^{-nh(X_n)}\big)$$
exists, then the mapping $I(\cdot)$ defined by $I(x) = -\inf\{h(x) + \Lambda(h) : h \in C_b(\mathcal{X})\}$ is a rate function for which the right side of (22) obtains. Moreover,
$$\Lambda(h) = -\inf\{h(x) + I(x) : x \in \mathcal{X}\}, \qquad h \in C_b(\mathcal{X}).$$
This implies that $I(\cdot)$ and $\Lambda(\cdot)$ are essentially complementary in the sense of Young, as used in the Orlicz space theory. It is thus of interest to consider further a multivariate analog of the Young complementary analysis, now also termed the Fenchel-Young connection. It will be discussed
in Section 5 to explain its relation with the resulting Fenchel-Legendre (Orlicz) function spaces, enabling a better understanding and identification of the subjects in the multivariate case. This again starts with the multidimensional version of the Cramér theorem, which will be considered shortly below. First some useful adjuncts are recalled. The presentation of results in this work aims less at generality than at unified and economical statements of theorems. In fact, the following classical result from Royden (1968) (cf. also Skorohod (1976)) shows that for any Polish space $\mathcal{X}$ there is a one-to-one Borel measurable mapping $f$ of $\mathcal{X}$ onto a Borel subset of $[0,1]$ such that $f(A)$ is a Borel set for each Borel set $A \subset \mathcal{X}$; this is a special case of a renowned theorem of Kuratowski. The point of this discussion is that it exemplifies the close relation with the Calculus of Variations, which, as is known, branches out into many serious applications of both a theoretical and a practical nature. A recent survey by Varadhan (2008) presents many applications of the LDP and its spread into Theoretical Physics and other areas of special interest. Some other aspects of this spread will be indicated below, starting with the SDE discussed especially in Theorem 3.2 for the higher-order case driven by the Bochner $L^{2,2}$-bounded classes, and specializing to those with independent increments, including the BM as a particular but also important and leading case. If the coefficients $a_i(t)$ appearing there in (23) for Theorem 3.2 are sufficiently differentiable (e.g., $n$ times here), then the process $B_t$ can be an $L^{2,2}$-bounded one, and the representation of $r(s,t)$ given just prior to that theorem holds. The detailed discussion of this point appears in the book (Rao (2000), Section 4.3). Thus consider the process determined by the $n$th order SDE as in Theorem 3.2, whose solution is representable by the (stochastic) integral
$$X_t = \int_a^t R(s,t)\,dB_s = \int_a^b \chi_{[a,t]}(s)\,R(s,t)\,dB_s, \qquad (23)$$
where $R(s,t)$ is the Riemann kernel and $B_s$, $s \ge a$, is an $L^{2,2}$-bounded process, including the BM as an important particular case. However, the $L^{2,2}$-bounded class of processes also includes the Poisson process, which has independent increments but discontinuous paths, so this class does not stipulate sample path continuity. Thus the error processes admitted in the early part of Theorem 3.2 form a larger class than the BM alone. In the last part of that theorem, the $L^{2,2}$-bounded processes are specialized to the BM class, which has a.a. continuous paths, and then the solution becomes a Feller process which is strongly Markovian, again
with a.a. continuous sample paths. In general, however, the more difficult part of the demonstration is obtaining the upper bound in the analysis of (15), where the 'sup' is to be replaced by a single element (usually the last one in the case of the BM), using the symmetry and continuity of the increments as in the case of the BM. In considering the $n$th order SDE, one may allow the error process to be a continuous martingale. Indeed, by Theorems 2.1 and 2.2, one is still conceptually close to the BM in executing similar types of extensions. It may be noted at this point that, as pointed out in Discussion 2.12 above, the Lévy-BM is $L^{2,2}$-bounded and hence qualifies as a stochastic integrator; there is another class of processes, called fractional Brownian Motions (abbreviated fBMs), which can be included in our discussion here, as follows. In Theorem 2.10 it was seen that there exists a Lévy-BM, which is a Gaussian random field $\{X_t, t \in T\}$ with mean $m: T \to \mathbb{R}$ and covariance $C_X(s,t) = \frac{1}{2}[\|s\| + \|t\| - \|s-t\|]$, where $s, t \in (H, \|\cdot\|)$, a Hilbert space; this random field is a BM only when $H$ is one dimensional. It was also noted that it satisfies the $L^{2,2}$-boundedness condition, which implies that it is a stochastic integrator. Some special cases are of interest. A useful adjunct to this result is that the mapping
$$C_\alpha(s,t) = |s|^\alpha + |t|^\alpha - |s-t|^\alpha, \qquad s, t \in \mathbb{R}^n, \qquad (24)$$
is again a covariance for any $n \ge 1$ and $0 < \alpha \le 2$, and hence there exists a Gaussian random field on some probability space with mean zero and this covariance. If $\alpha = 2H$, $0 < H \le 1$, this random field is of special interest in several applications and is called a fractional Brownian Motion (fBM). It was studied by Mandelbrot and Van Ness (1968), and its use in many applications with long range dependence was advocated in several publications by the first author. The fBM reduces to the BM for $H = \frac{1}{2}$ and is non-Markovian for $H \ne \frac{1}{2}$. This area has been explored and detailed with numerous applications by Samorodnitsky and Taqqu (1994), where the fBM class appears as the Gaussian member (stability index 2) of the symmetric stable (S$\alpha$S) family, whose stability index ranges over $(0, 2]$. On the other hand, the kernel given by (24) is a particular case of a large class, defined on certain groups, investigated in great detail by Gangolli (1967), which is eminently suitable for further analysis. The work of the latter was extended in some respects by Masani (1973) to operator valued kernels on a Hilbert space. This general area presents an interesting opening for considerable further analysis in future LDP-related investigations.
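The restriction on $\alpha$ in (24) can be checked numerically. The sketch below builds the Gram matrix of $C_\alpha$ on a few points and confirms that it is positive semidefinite for $\alpha = 2H$ with $H \in \{0.2, 0.5, 0.8\}$. It also exhibits a negative eigenvalue for $\alpha = 3$, and draws one fBM-type path by Cholesky factorization, which is a standard but here merely illustrative sampling method. NumPy and the helper names `c_alpha`, `gram` and `sample_fbm` are our assumptions. Note that (24), as written, omits the factor ½ of the usual fBM normalization, so that $C_1(s,t) = 2\min(s,t)$ for $s, t \ge 0$.

```python
import numpy as np


def c_alpha(s, t, alpha: float) -> float:
    """Kernel (24): |s|^alpha + |t|^alpha - |s - t|^alpha, Euclidean norms on R^n."""
    s = np.asarray(s, dtype=float)
    t = np.asarray(t, dtype=float)
    return float(np.linalg.norm(s) ** alpha + np.linalg.norm(t) ** alpha
                 - np.linalg.norm(s - t) ** alpha)


def gram(points, alpha: float) -> np.ndarray:
    """Matrix [C_alpha(p_i, p_j)] for a list of points in R^n."""
    return np.array([[c_alpha(p, q, alpha) for q in points] for p in points])


def sample_fbm(times, hurst: float, rng: np.random.Generator) -> np.ndarray:
    """One path, at distinct nonzero times, of a centered Gaussian process on R
    with covariance (24) and alpha = 2 * hurst, 0 < hurst < 1 (at hurst = 1 the
    matrix is singular and the Cholesky step fails)."""
    cov = gram([[t] for t in times], 2.0 * hurst)
    chol = np.linalg.cholesky(cov)  # cov = chol @ chol.T
    return chol @ rng.standard_normal(len(times))
```

For example, `gram([[1.0], [2.0]], 3.0)` is the matrix $\begin{pmatrix} 2 & 8 \\ 8 & 16 \end{pmatrix}$, whose determinant is $-32$, so $C_3$ is not a covariance.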
Proceeding to the LDP and adapting the methods of Freidlin and Wentzell (1998), one can consider the following type of result for processes given by (24) when the error class $\{B_s, s \ge 0\}$ is $L^{2,2}$-bounded with continuous paths and the $H$ parameter is one, which already includes the general Ornstein-Uhlenbeck class. This also contains the admissible mean functions of T. S. Pitcher, discussed in the book by Rao and Ren (2002, pp. 287–293), with a more detailed treatment in Rao (1975). Thus (23) implies, for $a \le t \le b$,
$$(R^{1/2}f)(t) = \int_a^b \tilde R(s,t)\,f(s)\,ds, \qquad f \in L^2([a,b], dt),$$
having the positive definite kernel
$$r(s,t) = \int_a^b \tilde R(s,u)\,\tilde R^*(t,u)\,du, \qquad s, t \in [a,b].$$
As the analog of the analysis in the case that $\tilde B_t$ is BM, one should have a representation as
$$\tilde X_t = \int_a^b \tilde R(s,t)\,dB_s, \qquad t \in T = [a,b], \qquad (25)$$
and if $\tilde R^{-1}$ is defined on the range of $\tilde R$, one will have an analogous representation as discussed in the above book, giving the entropy function $I(\cdot)$ for the process as:
$$\tilde I(f) = \begin{cases} \frac{1}{2}\|\tilde R^{-1}f\|^2, & \text{if } f \in \operatorname{range}(\tilde R),\\ +\infty, & \text{otherwise},\end{cases} \qquad (26)$$
so that $I(f) = \frac{1}{2}(\tilde R^{-1}f, f)$ on $\tilde R^{1/2}(L^2([a,b], dt))$. Thus one can tentatively state the following modified version of Freidlin's result, for future work and extension to all continuous $L^{2,2}$-bounded processes $\{\tilde B_t, t \ge a\}$. This is presented formally as follows.

Theorem 4.5. Let $X_t^\varepsilon = \sqrt{\varepsilon}\,X_t$, $\varepsilon > 0$, where $X_t$ is a centered $L^{2,2}$-bounded process given by (23) with a continuous covariance $\tilde R$. Then $\{X_t^\varepsilon, t \in [a,b)\}$ obeys the LDP with entropy $\tilde I: L^2([a,b), dt) \to \bar{\mathbb{R}}^+$ defined by (26).

This result, when completely detailed, will extend the theory beyond the processes representable by the BM to the full $L^{2,2}$-bounded class, and possibly further to the full $L^{\varphi_1,\varphi_2}$ class as the ultimate set covered by these methods, which include many types
of SDEs and others. In particular, it shows that the BM has the LDP with $\tilde R$ replaced by its covariance $R$ in (26), for $I(\cdot)$ defined on the space $C_0[0,1]$ of continuous real functions on the unit interval vanishing at zero. This result was originally obtained by Schilder (1966); a direct and shorter proof was soon given by Varadhan, included also in his monograph referred to above, with related works reviewed in Freidlin and Ventzel (1970). It may be of interest to illustrate the following results on the inclusiveness and potential spread of the LDP in probability limit analysis. The example given after Theorem 4.2 above motivates this extension. Let $\{X_n, n \ge 1\}$ be a sequence of independent random variables with partial sum sequence $\{S_n = \sum_{i=1}^n X_i, n \ge 1\}$. Then the sequence $\{S_n, n \ge 1\}$ forms a Markov sequence (or discrete indexed process). If the $X_i$ have the same distribution function $F$ with a finite first moment $m$, the (strong) law of large numbers implies that the averages $\{\frac{1}{n}S_n, n \ge 1\}$ converge to $m$ with probability 1, so that for any level $\mu > m$,
$$0 \leftarrow P\left[\frac{1}{n}S_n \ge \mu\right] \ge \prod_{i=1}^n P[X_i > \mu] = \exp\{n\ln(1 - F(\mu))\},$$
as $n \to \infty$. This classical fact is related to the LDP property and motivates a further, more general study of the earlier questions for classes of Markov processes. There are, however, some new hurdles to cross, as noted below. The above connection to Markov processes has been studied extensively, primarily by Donsker and Varadhan in a series of fundamental works that extended the theory in several directions. One of the related points is that a Markov process satisfies the Chapman-Kolmogorov equation: if $P([X_t \in A] \mid X_s)(\omega) = Q_{s,t}(A, X_s(\omega))$, then it can be expressed for $r < s < t$ as:
$$Q_{r,t}(A, X_r(\omega)) = \int_{\mathbb{R}} Q_{s,t}(A, u)\,Q_{r,s}(du, X_r(\omega)), \qquad (27)$$
for almost all $\omega \in \Omega$ and Borel sets $A \subset \mathbb{R}$. This equation holds for all Markov processes and serves as a key link for associating a semigroup of operators on Banach spaces with the process, so that one can use the existing abstract theory to advance the work on Markov processes, while also contributing new ideas and results to the former. Although this link is of considerable importance for both areas, it is not one-to-one: there exist non-Markov processes satisfying the Chapman-Kolmogorov equation. This was first noted by the
legendary P. Lévy in 1949, and simpler examples of three-valued processes that satisfy the C-K equation but are not Markovian were given soon after by W. Feller. So one is led to consider the LDP separately for all processes that satisfy the Chapman-Kolmogorov equations but are not necessarily Markovian. This extension does not appear to have been considered; although some analysis in Feng and Kurtz (2006) looks at the subject from the semigroup point of view in this context, the Feller examples do not seem to be covered by that work. There is also another related class, called wide-sense Markov random fields, of processes taking values in a Hilbert space and satisfying the following condition, discussed in Doob (1953, p. 233). If $\{X_t, t \in T\}$ is a square integrable complex random process, let $R(s,t)$ be defined by $E(X_t\bar X_s) = R(s,t)E(|X_s|^2)$. Then it is shown in Doob (1953) that a second order process, as above, is wide sense Markov if and only if what can be called its 'correlation characteristic' $R$ satisfies the equation:
$$R(s,u) = R(s,t)\,R(t,u), \qquad s \le t \le u, \qquad (28)$$
and this class has some properties analogous to those of the classical (or strong, or usual) Markov processes. It was studied by Rozanov (1979) and further analyzed recently by Heyer and Rao (2012) when it is indexed by an LCA group $G$ and $X_t$, $t \in G$, is valued in a separable Hilbert space, extending some of Rozanov's work. It will thus be of interest to study the corresponding LDP analysis of this class, since considerable effort in the direction of prediction theory of these fields is found in the above papers. With the preceding considerations, the stage is set for a systematic analysis of classes of random fields whose parameter is multidimensional. Following the previous work, the first extension is Cramér's theorem, which has the following new features leading to infinite dimensional extensions via projective systems and limits. The result takes a new form when $\mathcal{X} = \mathbb{R}^n$:

Theorem 4.6. Let $\{X_n, n \ge 1\}$ be independent identically distributed $\mathbb{R}^n$-valued random variables having all moments. If their sample means satisfy the LDP, then the rate function $I(\cdot)$ is given by
$$I(y) = \sup\left\{\langle x, y\rangle - \log\int_\Omega e^{\langle x, X_1\rangle}\,dP : x \in \mathbb{R}^n\right\}, \qquad (29)$$
where $I(\cdot)$ is convex, continuous and nonnegative, with its minimum value $0$ attained at $a = E(X_1)$.
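Formula (29) is a Legendre transform of the log moment generating function $\Lambda(x) = \log\int_\Omega e^{\langle x, X_1\rangle}\,dP$, and it can be evaluated numerically when $\Lambda$ and its gradient are available in closed form. The sketch below uses plain fixed-step gradient ascent on the concave function $x \mapsto \langle x, y\rangle - \Lambda(x)$. This optimizer is our choice and is not prescribed by the text; convergence assumes the step is below $2/L$, where $L$ bounds the Hessian of $\Lambda$. The sketch is checked against two cases with known answers:

- a Gaussian $N(a, \Sigma)$ on $\mathbb{R}^2$, where $I(y) = \frac{1}{2}(y-a)^\top\Sigma^{-1}(y-a)$;
- a vector of independent symmetric $\{0,1\}$ Bernoulli coordinates, where $I$ is the sum of the one-dimensional entropies of the example after Theorem 4.2.

The function and variable names are hypothetical.

```python
import numpy as np


def legendre_rate(log_mgf, grad_log_mgf, y, step: float, iters: int = 5000) -> float:
    """Approximate (29): I(y) = sup_x { <x, y> - Lambda(x) } by gradient ascent.

    The objective is concave because Lambda is convex; its gradient is
    y - grad Lambda(x). Starting from x = 0 (where Lambda(0) = 0), the iterates
    increase the objective when step < 2 / L.
    """
    y = np.asarray(y, dtype=float)
    x = np.zeros_like(y)
    for _ in range(iters):
        x = x + step * (y - grad_log_mgf(x))
    return float(x @ y - log_mgf(x))


# Gaussian N(a, S) on R^2: Lambda(x) = <x, a> + x^T S x / 2.
a = np.array([1.0, -1.0])
S = np.array([[2.0, 0.5], [0.5, 1.0]])
gauss_lam = lambda x: float(x @ a + 0.5 * x @ S @ x)
gauss_grad = lambda x: a + S @ x

# Independent symmetric {0, 1} Bernoulli coordinates: Lambda(x) = sum log((1 + e^x_i) / 2).
bern_lam = lambda x: float(np.sum(np.logaddexp(0.0, x) - np.log(2.0)))
bern_grad = lambda x: 1.0 / (1.0 + np.exp(-x))
```

For instance, `legendre_rate(gauss_lam, gauss_grad, [2.0, 0.5], step=0.4)` should return approximately $8/7 \approx 1.142857$, the closed form $\frac{1}{2}(y-a)^\top\Sigma^{-1}(y-a)$. The step 0.4 is below $2/\lambda_{\max}(\Sigma) \approx 0.91$. At $y = a = E(X_1)$ the returned rate is 0, the minimum noted in Theorem 4.6.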
As Varadhan (1984) observes, the lower bound for $I(\cdot)$ in (29) is obtained with essentially the same work as in the one-dimensional case, but the upper bound needs additional arguments, which are also included in that volume. A method of extending this result to infinite dimensional vector or Banach spaces is to consider Cramér's result for all finite dimensional subspaces and piece them together, naturally using projective systems and their limits, with the associated 'patching up' on the overlaps. This is done, with the necessary modifications, as follows. The method was pioneered by Dawson and Gärtner (1987), and it will be explained in the context of a general projective limit theory, e.g., as exposed in my recent publication (Rao (2011)). If $\mathcal{Y}$ is a Banach space and $\mathcal{Y}_n \subset \mathcal{Y}$ is an $n$-dimensional vector subspace, let $\pi_n: \mathcal{Y} \to \mathcal{Y}_n$ be a 'coordinate' projection onto $\mathcal{Y}_n$, and for $m < n$ let $\mathcal{Y}_m \subset \mathcal{Y}_n$ be an $m$-dimensional subspace with $\pi_{mn}: \mathcal{Y}_n \to \mathcal{Y}_m$ a similar projection, so that $\pi_m = \pi_{mn} \circ \pi_n$. Since each finite dimensional subspace of a Banach space can be identified isomorphically with $\mathbb{R}^n$ for some integer $n$, the system $\{\mathcal{Y}_n, \pi_{mn}, 1 \le m \le n < \infty\}$ is termed a projective system of vector (sub)spaces of $\mathcal{Y}$, and the coordinate mappings $\{\pi_{mn}, 1 \le m \le n < \infty\}$ are compatible, satisfying the above composition rules. This may be given abstractly, replacing $\mathcal{Y}_n$ by $\Omega_\alpha$ and its Borel $\sigma$-algebra by a general $\sigma$-algebra, and the coordinate mappings $\pi_{mn}$ by $\{g_{\alpha\beta}, \alpha \le \beta\}$, where $\alpha$ ranges over a directed set $J$ in which inclusion is replaced by a given partial order, and the probability measures are connected as $P_\beta \circ g_{\alpha\beta}^{-1} = P_\alpha$ for $\alpha \le \beta$. The other concepts are extended similarly and will soon be specialized to our case:

Definition 4.7.
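(i) Let $\{(\Omega_\alpha, \Sigma_\alpha, P_\alpha), \alpha \in J\}$ be a family of probability spaces indexed by a directed index set $J$, and let $\{g_{\alpha\beta}: \alpha, \beta \in J, \alpha < \beta\}$ be a compatible family of mappings $g_{\alpha\beta}: \Omega_\beta \to \Omega_\alpha$ satisfying $g_{\alpha\beta}^{-1}(\Sigma_\alpha) \subset \Sigma_\beta$, $g_{\alpha\alpha} = \mathrm{id}$ (the identity), and $g_{\alpha\beta} \circ g_{\beta\gamma} = g_{\alpha\gamma}$ whenever $\alpha < \beta < \gamma$. This collection of mappings, measures and spaces is called a projective system of probability spaces whenever $P_\alpha = P_\beta \circ g_{\alpha\beta}^{-1}$ for $\alpha < \beta$, i.e., the probabilities are compatible on overlaps and agree with the marginals. The set
$$\Omega = \{\omega = (\omega_\alpha)_{\alpha\in J} \in \times_{\alpha\in J}\Omega_\alpha : g_{\alpha\beta}(\omega_\beta) = \omega_\alpha,\ \alpha < \beta\},$$
when it is nonempty, is written as $\Omega = \varprojlim(\Omega_\alpha, g_{\alpha\beta})$ and is called the projective limit space of $\{\Omega_\alpha, \alpha \in J\}$.
(ii) If $\Omega \ne \emptyset$ and $\Sigma = \sigma(\cup_\alpha g_\alpha^{-1}(\Sigma_\alpha))$, suppose there exists a probability measure $P: \Sigma \to [0,1]$ such that $P \circ g_\alpha^{-1} = P_\alpha$ for each $\alpha \in J$, where $g_\alpha: \Omega \to \Omega_\alpha$ satisfies $g_\alpha^{-1}(\Sigma_\alpha) \subset \Sigma$, so that it is measurable for $\Sigma$, and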
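for $\alpha < \beta$, $g_\alpha = g_{\alpha\beta} \circ g_\beta$; then $(\Omega, \Sigma, P)$ is called the projective limit of the system, written as:
$$P = \varprojlim P_\alpha \quad \text{on } \Sigma = \sigma\big(\cup_\alpha g_\alpha^{-1}(\Sigma_\alpha)\big). \qquad (30)$$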
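A concrete instance of Definition 4.7, chosen by us for illustration, is the system of Brownian finite-dimensional distributions:

- the index set $J$ consists of finite sets $\alpha$ of time points, ordered by inclusion, with $\Omega_\alpha = \mathbb{R}^\alpha$;
- $g_{\alpha\beta}$ keeps the coordinates indexed by $\alpha$;
- $P_\alpha$ is the centered Gaussian law with covariance $\min(s,t)$, $s, t \in \alpha$.

A coordinate projection maps a centered Gaussian law to the centered Gaussian law with the corresponding covariance sub-matrix. The compatibility condition $P_\beta \circ g_{\alpha\beta}^{-1} = P_\alpha$ therefore reduces to a sub-matrix identity, and the composition rule $g_{\alpha\beta} \circ g_{\beta\gamma} = g_{\alpha\gamma}$ can be checked directly. The helper names are hypothetical.

```python
import numpy as np


def bm_cov(times) -> np.ndarray:
    """Covariance matrix [min(s, t)] of Brownian motion at the given times."""
    t = np.asarray(times, dtype=float)
    return np.minimum.outer(t, t)


def coordinate_index(beta, alpha):
    """Positions in beta of the points of alpha (alpha must be a subset of beta);
    they define the coordinate projection g_{alpha beta}: R^beta -> R^alpha."""
    position = {t: i for i, t in enumerate(beta)}
    return [position[t] for t in alpha]


def project(vec, beta, alpha) -> np.ndarray:
    """Apply g_{alpha beta} to a point of R^beta."""
    return np.asarray(vec)[coordinate_index(beta, alpha)]


def pushforward_cov(cov_beta: np.ndarray, beta, alpha) -> np.ndarray:
    """Covariance of P_beta o g_{alpha beta}^{-1} for a centered Gaussian P_beta:
    the image is centered Gaussian with the sub-matrix on the kept coordinates."""
    idx = coordinate_index(beta, alpha)
    return cov_beta[np.ix_(idx, idx)]
```

With $\alpha = \{0.5, 2\}$, $\beta = \{0.5, 1, 2\}$ and $\gamma = \{0.25, 0.5, 1, 2, 3\}$, `pushforward_cov(bm_cov(beta), beta, alpha)` equals `bm_cov(alpha)`, as compatibility requires. Projecting a point of $\mathbb{R}^\gamma$ first to $\beta$ and then to $\alpha$ agrees with projecting it directly to $\alpha$.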
There are a few points to be elaborated here. Even when $\Omega_\alpha = \times_{k\in\alpha}\mathbb{R}_k$ and $\Sigma_\alpha$ is the Borel $\sigma$-algebra of $\Omega_\alpha$, generated by the product $\sigma$-algebra of $\mathbb{R}^\alpha$ for subsets $\alpha$ of the positive integers, the projective limit $\Omega$ of the system can be empty, e.g., when $\mathbb{R}$ is replaced by certain subintervals. The following simple example illustrates this point. Let $\Omega_n = (0, \frac{1}{n})$, $n = 1, 2, \ldots$, and let $g_{mn}: \Omega_n \to \Omega_m$ ($m < n$) be the inclusion map. Then $\Omega = \varprojlim(\Omega_n, g_{mn}) = \emptyset$, since a compatible sequence would have all its coordinates equal to a single point of $\cap_n (0, \frac{1}{n}) = \emptyset$; thus one needs some condition on the system to avoid this pathology. A useful and simple condition to avoid such situations, called sequential maximality, was introduced by S. Bochner, as follows. Let $\{\Omega_\alpha, g_{\alpha\beta}, \alpha < \beta, \alpha, \beta \in J\}$ be a projective system. It is said to satisfy the sequential maximality (s.m.) condition if $g_\alpha(\Omega) = \Omega_\alpha$, $\alpha \in J$, and for each sequence $\alpha_1 < \alpha_2 < \cdots$ from the index set $J$ and any points $\omega_{\alpha_n} \in \Omega_{\alpha_n}$ such that $g_{\alpha_n\alpha_{n+1}}(\omega_{\alpha_{n+1}}) = \omega_{\alpha_n}$, $n \ge 1$, there is an $\omega \in \Omega$ satisfying $g_{\alpha_n}(\omega) = \omega_{\alpha_n}$, $n \ge 1$. The usefulness of this condition is that it holds for most of the projective systems commonly used in applications. A general property of this concept is available for the following important class.

Proposition 4.8. For a projective system of compact Hausdorff spaces $\{(\Omega_\alpha, g_{\alpha\beta}); \alpha < \beta; \alpha, \beta \in J\}$ with continuous onto mappings $g_{\alpha\beta}$, the (projective) limit space is also compact, and it is nonempty if each $\Omega_\alpha$ is nonempty, so that the s.m. condition holds.

Because of this result, the s.m. condition is assumed hereafter for the projective systems considered in this work. [See Dugundji (1966), pp. 427–434, for a quick review and details of these topological properties.] The problem now is to find conditions on the projective family of probability measures $\{P_\alpha, \alpha \in J\}$, where each $P_\alpha$ satisfies all the Cramér conditions so that it has the LDP, under which the projective limit of the family, $P = \varprojlim(P_\alpha, g_\alpha)$, also has the LDP. This was answered positively by D. A. Dawson and J.
Gärtner (1987) in the following form, opening up the work for infinite dimensional topological vector spaces as well as an analogous study of the LDP for general functional operators. In this context, it should be noted that in a projective system of topological measure spaces admitting a limit, the regularity of each member probability does not imply the same
property for the limit. This is because regularity of $P_\alpha$ means that most of its probability is concentrated on a compact set, and in a projective system these sets obviously vary with $\alpha$. It is an important consequence of the above noted theorem, due to Prokhorov (1955), that the limit of this system retains the regularity property if and only if for each $\varepsilon > 0$ there exists a compact set $K_\varepsilon \subset \Omega$, the projective limit space (non-trivial when the s.m. condition holds), such that for the limit probability space $(\Omega, \Sigma, P)$, which exists by the above noted result, the following condition obtains:
$$P_\alpha(\Omega_\alpha - g_\alpha(K_\varepsilon)) < \varepsilon, \qquad \alpha \in J, \qquad (31)$$
where $g_\alpha: \Omega \to \Omega_\alpha$, and $\Sigma_0 = \cup_{\alpha\in J}g_\alpha^{-1}(\Sigma_\alpha)$ generates the $\sigma$-algebra $\Sigma$ on which $P$ operates. Then a topological projective system admitting its limit will be regular under condition (31), which is necessary (and also sufficient). [See Rao (2011), Theorem 3.7 and the discussion related to it.] A topological projective system satisfying (31) will be termed regular in what follows. The desired infinite dimensional analog of Theorem 4.6 above can now be given:

Theorem 4.9. Suppose $\{(\Omega_\alpha, \Sigma_\alpha, P_\alpha^\varepsilon, g_{\alpha\beta})_{\alpha<\beta}; \alpha, \beta \in J\}$ is a regular projective system of probability spaces, each depending on a parameter $\varepsilon > 0$ and verifying the s.m. condition for each $\varepsilon$, so that it has a unique projective limit
$$(\Omega, \Sigma, P^\varepsilon) = \varprojlim(\Omega_\alpha, \Sigma_\alpha, P_\alpha^\varepsilon, g_{\alpha\beta})_{\alpha<\beta}.$$
Then, as $\varepsilon \downarrow 0$, the system possesses the LDP if and only if the limit triple does. When this conclusion holds, and if $I_\alpha$, $I$ are the respective entropy functions of the corresponding triples, then
$$I(\omega) = \sup_{\alpha\in J} I_\alpha(g_\alpha(\omega)), \qquad \omega \in \Omega, \qquad (32)$$
where the entropy function $I_\alpha(\cdot)$ for each factor space satisfies:
$$I_\alpha(x) = \inf\{I(\omega) : \omega \in g_\alpha^{-1}(\{x\})\}, \qquad x \in \Omega_\alpha,\ \alpha \in J. \qquad (33)$$
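A toy case of (32) and (33) can be computed explicitly under assumptions that are ours rather than the text's:

- $J$ consists of the sets $\alpha_m = \{1, \ldots, m\}$, with $\Omega_{\alpha_m} = \mathbb{R}^m$;
- $g_{\alpha_m}$ keeps the first $m$ coordinates of a real sequence;
- $P^\varepsilon_{\alpha_m}$ is the law of $\sqrt{\varepsilon}(\xi_1, \ldots, \xi_m)$ with $\xi_k$ i.i.d. standard normal, whose entropy is $I_{\alpha_m}(x) = \frac{1}{2}\|x\|^2$.

Then (32) gives $I(\omega) = \frac{1}{2}\sum_k \omega_k^2$, which is finite exactly on $\ell^2$, and the infimum in (33) is attained by extending $x$ with zeros. The sketch evaluates the increasing partial rates for two sequences. For $\omega_k = 1/k$ their supremum is $\pi^2/12$. For $\omega_k = k^{-1/2}$ they grow like $\frac{1}{2}\log m$, so $I(\omega) = +\infty$.

```python
import math

import numpy as np

LIMIT_SUMMABLE = math.pi ** 2 / 12  # I(omega) for omega_k = 1/k


def partial_rate(omega: np.ndarray, m: int) -> float:
    """I_alpha(g_alpha(omega)) for alpha = {1, ..., m}: half the squared norm
    of the first m coordinates (the Gaussian entropy assumed above)."""
    head = np.asarray(omega[:m], dtype=float)
    return 0.5 * float(head @ head)


def projective_rate(omega: np.ndarray, ms) -> float:
    """(32) restricted to the index sets {1, ..., m}, m in ms: the sup of the
    partial rates, which increase with m, so the sup is a running limit."""
    return max(partial_rate(omega, m) for m in ms)


def fibre_rate(x: np.ndarray, tail: np.ndarray) -> float:
    """I(omega) for omega = (x, tail), one point of the fibre g_alpha^{-1}({x});
    (33) says the infimum over all tails is I_alpha(x)."""
    return 0.5 * float(x @ x + tail @ tail)


k = np.arange(1, 10**6 + 1, dtype=float)
summable = 1.0 / k             # in l^2
divergent = 1.0 / np.sqrt(k)   # not in l^2
ms = [10, 100, 1000, 10**4, 10**5, 10**6]
```

Here `projective_rate(summable, ms)` is within about $10^{-6}$ of `LIMIT_SUMMABLE`. The partial rates of `divergent` keep growing, and no tail lowers `fibre_rate(x, tail)` below its value for the zero tail.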
Proof. The argument will be sketched here to bring out the structure of the problem, following Rao and Ren (2002), which is essentially that of Dawson and Gärtner (1987).
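By hypothesis the mappings $g_\alpha$, $g_{\alpha\beta}$ are onto and continuous. If the projective limit of the system exists and obeys the LDP, then by the contraction principle noted after Theorem 4.3 it may be deduced that (33) holds and that $I_\alpha$ is an entropy function on the component space $(\Omega_\alpha, \Sigma_\alpha, P_\alpha^\varepsilon)$. The converse is more involved, and the main points will be indicated. Suppose $(\Omega_\alpha, \Sigma_\alpha, P_\alpha^\varepsilon)$ verifies the LDP with entropy $I_\alpha$ for each $\alpha$, as $\varepsilon \downarrow 0$. If $I(\cdot)$ is defined by (32), it should be shown that the limit space $(\Omega, \Sigma, P^\varepsilon)$ has the LDP with entropy $I$. Since the projective system is regular, the level sets $A_\alpha^y = \{x : I_\alpha(x) \le y\}$ of $I_\alpha(\cdot)$ are compact for each $y \ge 0$; by the compatibility of the (projective) system, $g_{\alpha\beta}(A_\beta^y) = A_\alpha^y$, and $\{(A_\alpha^y, g_{\alpha\beta}), \alpha < \beta\}$ is a compact system and hence has a limit, $A^y = \varprojlim(A_\alpha^y, g_{\alpha\beta})$. To see that $I(\cdot)$ of (32) is an entropy, it has to be shown that it satisfies the upper and lower limits for the LDP as $\varepsilon \downarrow 0$. The lower bound is simpler than the upper one. For this, let $O \subset \Omega$ be open and $\omega \in O$. Since $\Omega = \varprojlim(\Omega_\alpha, g_{\alpha\beta})$, there are an $\alpha$ and an open set $O_\alpha \subset \Omega_\alpha$ with $g_\alpha(\omega) \in O_\alpha$ and $g_\alpha^{-1}(O_\alpha) \subset O$, so that $P^\varepsilon(O) \ge P^\varepsilon(g_\alpha^{-1}(O_\alpha)) = P_\alpha^\varepsilon(O_\alpha)$ and
$$\liminf_{\varepsilon\downarrow 0}\varepsilon\log P^\varepsilon(O) \ge \liminf_{\varepsilon\downarrow 0}\varepsilon\log P_\alpha^\varepsilon(O_\alpha) \ge -I_\alpha(g_\alpha(\omega)). \qquad (34)$$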
Since $I_\alpha(g_\alpha(\omega)) \le I(\omega)$ by (32), and $\omega \in O$ is arbitrary, the lower bound $\liminf_{\varepsilon\downarrow 0}\varepsilon\log P^\varepsilon(O) \ge -\inf_{\omega\in O}I(\omega)$ can now be inferred. The upper bound usually involves a little more work. Thus let $F \subset \Omega$ be closed, and for $y \ge 0$ let $A_\alpha^y$ be the compact level set of $I_\alpha$ as above, so that $I \le y$ on $F \cap A^y$, where $A^y = \varprojlim(A_\alpha^y, g_{\alpha\beta})$ is compact (and nonempty if each $A_\alpha^y$ is). Now let $F \cap A^y = \emptyset$. In this case one sees, by a standard argument, that the result holds; a further argument, using the hypothesis that the component finite dimensional spaces obey the LDP, shows that the upper bound holds in the remaining case as well, and so $(\Omega, \Sigma, P^\varepsilon)$ satisfies the LDP. The omitted details may be found in the indicated reference (cf. Rao and Ren (2002), pp. 296–297). The result is included to show the type of argument needed in these computations. The above result can be given a more concrete form as a direct extension of Cramér's $n$-dimensional version. This is done as follows, since the underlying structure can then be understood more clearly. In a topological vector space, the collection $J$ of all finite co-dimensional vector subspaces, ordered by inclusion, forms a projective system whose limit, however, is known to be much larger than the original space. Thus $J$
is directed under the ordering $\alpha < \beta$ if and only if $\alpha \supset \beta$, for $\alpha, \beta \in J$. With this ordering, the quotient maps $g_{\alpha\beta}: \mathcal{X}_\beta \to \mathcal{X}_\alpha$ and $g_\alpha: \mathcal{X} \to \mathcal{X}_\alpha$, where $\mathcal{X}_\alpha = \mathcal{X}/\alpha$, $\alpha, \beta \in J$, define a projective system of (finite dimensional) quotient spaces of $\mathcal{X}$. If $\mathcal{X}$ is a locally convex metric space, such as a Banach space, then the projective system constructed above is known to have a limit which contains $\mathcal{X}$ as a proper subset. However, it can be shown that the projective limit is identifiable with $\mathcal{X}$ if the projective system defined above satisfies Prokhorov's $(\varepsilon, K_\varepsilon)$ condition (31), and this condition is essentially the best one. Since by Theorem 4.6 the rate or entropy function is then given by (29), the preceding theorem can be given the following (equivalent) form, which is of interest for many applications; it is given a new number for reference.

Theorem 4.10. Let the vector space $\mathcal{X}$ be locally convex, and consider the projective system $\{(\mathcal{X}_\alpha, g_{\alpha\beta}), \alpha < \beta \text{ in } J\}$, where $\mathcal{X}_\alpha = \mathcal{X}/\alpha$ is the quotient space as before and $\alpha < \beta \Leftrightarrow \alpha \supset \beta$ is the ordering used. For the projective system of probability spaces $\{(\mathcal{X}^*, \Sigma_\alpha, P_\alpha^\varepsilon, g_{\alpha\beta}) : \alpha \le \beta \text{ in } J\}$, suppose that
$$I_\alpha(x) = \lim_{\varepsilon\downarrow 0}\varepsilon\log E_\alpha\big(e^{\langle x^*, x\rangle/\varepsilon}\big), \qquad x \in \mathcal{X}_\alpha, \qquad (35)$$
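exists and is finite, where $E_\alpha$ is computed under $P_\alpha^\varepsilon$. If now $I_\alpha(\cdot)$ is convex and weakly differentiable, in the sense that $t \mapsto I_\alpha(x + ty)$ is differentiable at $t = 0$ for each direction $y \in \mathcal{X}_\alpha$, and
$$\tilde I_\alpha(x^*) = \sup\{\langle x^*, x\rangle - I_\alpha(x) : x \in \mathcal{X}_\alpha\}, \qquad x^* \in \mathcal{X}_\alpha^*, \qquad (36)$$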
then $\tilde I_\alpha(\cdot)$ is the conjugate of the 'Young function' $I_\alpha$ and is the rate or entropy of the system $\{(\mathcal{X}_\alpha^*, \Sigma_\alpha, P_\alpha^\varepsilon), g_{\alpha\beta} : \alpha < \beta \text{ in } J\}$. Moreover, if $\tilde I(x^*) = \sup\{\tilde I_\alpha(g_\alpha(x^*)) : \alpha \in J\}$ for $x^*$ in the algebraic dual $(\mathcal{X}^*)'$ with the $((\mathcal{X}^*)', \mathcal{X})$ topology, then $\{((\mathcal{X}^*)', \Sigma, P^\varepsilon)\}$ satisfies the LDP with the entropy or rate function $\tilde I$, where $P^\varepsilon = \varprojlim(P_\alpha^\varepsilon, g_{\alpha\beta})$.

This result raises the problem of finding the Young complementary or conjugate function for a given convex (Young) function defined on $\mathbb{R}^n$, $n > 1$, extending the classical Orlicz space analysis, which treated just the case $n = 1$. Since, as the above work shows, the complementary functions of Young convex functions in higher dimensions are not simple analogs, Fenchel called for a study of such functions, as they appear in many optimization problems as well as in the entropy analysis seen above. This will be discussed in the next section. But first an application to a problem
in Probability Theory will form a useful motivation, as well as raise an interesting set of new questions in the area itself. Recall that X: Ω → X (a Banach space) is termed a (vector) Gaussian random variable if for each $x^* \in X^*$ (the adjoint space) the scalar random variable $x^*(X) = \langle X, x^*\rangle : \Omega \to \mathbb{R}$ (or $\mathbb{C}$) is a Gaussian random variable. Its Fourier transform is written as
$$\int_\Omega e^{i\langle X, x\rangle}\,dP = e^{-Q_P(x, x)/2}, \qquad x \in X^*, \tag{37}$$
where $Q_P(x, x) = v(x)$ is the variance functional and $Q_P(x, y) = C_{x,y}$, defined on $X^* \times X^*$, is the positive (semi-)definite covariance functional of the vector (or X-valued) process. For simplicity it is assumed here that the mean functional is zero; the resulting process is termed a centered (vector) process. For a centered Gaussian (vector or Banach space valued) process, the CBS-inequality gives
$$\Lambda_P(x^*) = \tfrac12 Q_P(x^*, x^*) = \tfrac12 \int_X \langle x^*, x\rangle^2\,P(dx) \le \tfrac12 \|x^*\|^2 \int_X \|x\|^2\,P(dx).$$
Then one also finds that the functional $x \mapsto \tilde\Lambda_P(x)$ given by
$$\tilde\Lambda_P(x) = \sup\{\langle x^*, x\rangle - \Lambda_P(x^*) : x^* \in X^*\}, \qquad x \in X, \tag{38}$$
is a nonnegative convex functional on X. If $X = \mathbb{R}^n$, then the conjugate of $\tilde\Lambda_P$ is again $\Lambda_P$, that is, $\tilde{\tilde\Lambda}_P = \Lambda_P$ (see Rockafellar (1970), p. 104); this can also be shown to hold if X is a reflexive Banach space, but it is not true in general. [The functional $\tilde\Lambda_P$ is also called a 'normalized action functional' by Freidlin and Wentzell in all their publications; the shorter term entropy will be used here.] The explicit calculation of this entropy (energy) functional, which plays a central role in most of the computations related to the subject, is not easy. The following important result of Donsker and Varadhan in the Gaussian case indicates the difficulties present in its applications, and it is at the same time a key example of the work in the subject.

Theorem 4.11. The family of Gaussian measures $\{P_\varepsilon, \varepsilon > 0\}$ defined above in (37) for vector processes satisfies the LDP as ε ↓ 0. Further, the quantities
$$a = \inf\{\tilde\Lambda_P(x) : \|x\| = 1\}, \qquad b = \sup\{\Lambda_P(x^*) : \|x^*\| = 1\}, \tag{39}$$
satisfy the relation 4ab = 1, and moreover
$$\lim_{r \to \infty} \frac{1}{r^2} \log P(x : \|x\| \ge r) = -a. \tag{40}$$
Also, $\int_X \exp\{\alpha\|x\|^2\}\,dP(x) < \infty$ if 0 < α < a, and = ∞ if a < α < ∞; when α = a, either possibility may occur.

The fact that $\Lambda_P$ is convex and that its conjugate $\tilde\Lambda_P$ has the same property in $\mathbb{R}^n$, n > 1, is not obvious, and in infinite dimensional spaces such as X, a Banach or a general locally convex space, it is a nontrivial problem. When Λ is defined on the real line, the convexity and growth properties of its complementary function have been illuminated by the theory of Orlicz spaces and Young function analysis. The case of infinite dimensional Banach spaces is more difficult, and a brief account will be given, based on particular treatments in which $\mathbb{R}$ is replaced by a Banach space X. Much further investigation is desirable, and the LDP as well as entropy analyses serve as a good motivation for this extended study of the subject.

Remarks on a transformation of BM. As a final item of this section, consider the following. It is a known fact that a nonlinear transformation of BM, originally studied by the physicists Ornstein and Uhlenbeck and now called the O.U. process because of its importance in Physics, is a centered Gaussian process $\{X(t), t \in \mathbb{R}\}$ with covariance r(·, ·) given by
$$r(s, t) = \sigma^2 \exp\{-\beta|s - t|\}, \qquad \beta > 0,\ \sigma > 0,\ s, t \in \mathbb{R}. \tag{41}$$
Later, Doob (1940) found that the process $\{Y(t), t \in \mathbb{R}_+\}$ defined from X(t) above by
$$Y(t) = \sqrt{t}\, X\!\left(\frac{1}{2\beta} \log t\right), \qquad t > 0, \tag{42}$$
is BM on $\mathbb{R}_+$, so that $E(Y(s+t) - Y(t)) = 0$ and $E(|Y(s+t) - Y(t)|^2) = \sigma^2 s$. The increments are independent, and the process {Y(t), t > 0} is BM, which, as has been well known since the early 1900s, has continuous sample paths a.e.; hence the O.U. process has the same property. In several dimensions, however, the situation is less obvious since the transformation is nonlinear; a brief discussion is given here, and a somewhat more detailed and delicate version is postponed to the next section. The representation (42) shows that when the process X (hence Y) takes values in an infinite dimensional vector space, there may be a form of this result in which their
sample continuity properties can be deduced from one another. This possibility of extending the scalar valued process given by (42) to the case that X (hence Y ) is valued in an infinite dimensional vector space is also of interest. In fact this aspect has been investigated by Prokhorov and Gross separately, and their somewhat deeper but analogous properties will be detailed in the next section wherein multidimensional spaces, related to the Fenchel-Legendre-Orlicz class, are also considered.
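As a sanity check on (38)-(40), the one-dimensional case can be worked out numerically. The sketch below assumes X = R with P = N(0, σ²), so that $\Lambda_P(x^*) = \tfrac12\sigma^2 (x^*)^2$ by the computation preceding (38). It approximates the conjugate (38) by a maximum over a finite grid (a discretization chosen only for illustration, valid when the maximizer lies inside the grid), evaluates a and b of (39) on the unit sphere {−1, +1}, and compares the tail ratio in (40) at large r with −a. In this normalization a = 1/(2σ²) and b = σ²/2.

```python
import math


def legendre_conjugate(f, x, lo=-10.0, hi=10.0, n=20001):
    """Approximate sup over lam of (lam * x - f(lam)) by a maximum on a grid.

    Reliable only when the true maximizer lies inside [lo, hi]."""
    step = (hi - lo) / (n - 1)
    return max(x * (lo + i * step) - f(lo + i * step) for i in range(n))


sigma = 1.0  # standard deviation of the assumed law N(0, sigma^2) on X = R


def Lam(lam):
    """Lambda_P(x*) = Q_P(x*, x*) / 2 = sigma^2 (x*)^2 / 2 on X* = R."""
    return 0.5 * sigma**2 * lam**2


# The unit sphere of R is {-1, +1}; both functions are even, so +1 suffices.
a = legendre_conjugate(Lam, 1.0)  # inf of the conjugate (38) over ||x|| = 1
b = Lam(1.0)                      # sup of Lambda_P over ||x*|| = 1


def tail_rate(r):
    """(1 / r^2) log P(|X| >= r) for X ~ N(0, sigma^2), the quantity in (40)."""
    return math.log(math.erfc(r / (sigma * math.sqrt(2.0)))) / r**2


print("a =", a, "b =", b, "4ab =", 4 * a * b)
print("tail ratio at r = 10, 30:", tail_rate(10.0), tail_rate(30.0), "vs -a =", -a)
```

The tail ratio approaches −a only slowly (the correction is of order log r / r²), which is why a large r is used.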
5. Vector Valued Processes and Multiparameter FLO Classes

In this section, two (related) classes of problems will be considered. The first deals with vector valued (multidimensional) random processes, motivated by the problem noted at the end of the preceding section, and the second deals with multiparameter processes (random fields). A very relevant treatment by the author (see Rao (1981)) seems to have been overlooked, and so a larger related part of it will be explained, to place the present problem in relation to that early work. It will be convenient to recall the notion of a weak distribution and the class of cylindrical probabilities, and to compare them for use in studying some sample path properties of the O.U. classes. The following result employs projective limit theory and plays a useful role in the ensuing analysis.

Theorem 5.1. Let $\{(\Omega_i, \Sigma_i, \mu_i, g_{ij})_{i<j} : i, j \in I\}$, where I = (I, <) denotes a directed index set, be a projective system of signed measures satisfying the s.m. condition, whence $\Omega = \varprojlim(\Omega_i, g_{ij})$ is nonvoid and in fact $g_i(\Omega) = \Omega_i$, where the $g_i : \Omega \to \Omega_i$ are the canonical projections, satisfying $g_i = g_{ij} \circ g_j$ for i < j in I, so that there is an additive set function µ on $\Sigma_0 = \bigcup_{i \in I} g_i^{-1}(\Sigma_i)$. Then there is a σ-additive µ such that $\mu_i = \mu \circ g_i^{-1}$, i ∈ I, if and only if there exists a de la Vallée Poussin function φ (i.e., a convex nonnegative function such that φ(0) = 0 and φ(x)/x ↑ ∞ as x ↑ ∞) such that each $\mu_i$, extended to $\sigma(g_i^{-1}(\Sigma_i))$, is of φ-bounded variation, in the sense that there is a constant 0 < $C_0$ < ∞ satisfying, for all (measurable) partitions $\pi = \{A_r\}_{r=1}^n$ of Ω,
$$\sup_\pi \left\{ \sum_{r=1}^n \varphi\!\left(\frac{\mu_i(A_r)}{\lambda(A_r)}\right) \lambda(A_r) : A_r \in \pi \right\} < C_0. \tag{1}$$
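A small worked instance may help in reading (1). The sketch below assumes, purely for illustration, that λ is Lebesgue measure on [0, 1], that $\mu_i$ has density f(x) = 2x with respect to λ, and that φ(u) = u², which is a de la Vallée Poussin function. For the partition of [0, 1] into n equal cells, the ratio $\mu_i(A_r)/\lambda(A_r)$ is the average (2r − 1)/n of f over the r-th cell, and the sum in (1) equals (4n² − 1)/(3n²). By Jensen's inequality no partition can exceed $\int \varphi(f)\,d\lambda = 4/3$, so in this example (1) holds with any $C_0 > 4/3$.

```python
from fractions import Fraction


def phi(u):
    # phi(u) = u^2: convex, phi(0) = 0 and phi(u)/u -> infinity.
    return u * u


def phi_variation(n):
    """Sum over the n equal cells A_r of [0, 1] of phi(mu(A_r)/lam(A_r)) * lam(A_r),
    for the illustrative choice mu(A) = integral over A of 2x dx."""
    total = Fraction(0)
    length = Fraction(1, n)  # lambda(A_r)
    for r in range(1, n + 1):
        mu = Fraction(r * r - (r - 1) ** 2, n * n)  # integral of 2x over the r-th cell
        total += phi(mu / length) * length
    return total


for n in (1, 2, 4, 8, 64):
    print(n, phi_variation(n), float(phi_variation(n)))
```

Exact rational arithmetic is used so that the closed form can be compared without rounding error.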
This characterization of the existence of projective limits of (signed) measure systems has vector measure versions. Some extensions to measures taking values in locally convex vector spaces are discussed in the paper referred to above and in the book (Rao (1995), Chapter 1). Further details and results are in the projective limits papers (Rao (1971) and (2011)), with related references. Here, the above theorem is used only to state some applications and results of Prokhorov and Gross, together with the interrelations leading to the key abstract Wiener space (a.w.s.) and the O.U. classes. Observe that the projective limit space Ω given above is quite large; it turns out to coincide with the algebraic dual $(X^*)'$ of $X^*$, so that under the canonical or 'natural' inclusions one has $X \subset X^{**} \subset (X^*)' = \Omega$, in which the last space is given the weak-star or $\sigma((X^*)', X^*)$ topology, which agrees with the projective limit topology (Schwartz (1973), p. 177). Since the projective system is Radon to begin with, $P = \varprojlim P_\alpha$ exists and is supported by Ω; although X ⊂ Ω under the identification with the second adjoint, X may even have zero measure. As recalled for Theorem 4.9 above, in general only this large Ω is guaranteed to have full measure, and even $X^{**}$, the second adjoint, may not suffice. This difficulty is overcome if the Radon projective system, which is the basis of this work, satisfies the $(\varepsilon, K_\varepsilon)$ condition given as (31) prior to Theorem 4.9, with $K_\varepsilon \subset X$; this result (stated in the form of Prokhorov's theorem for vector spaces, and for locally convex spaces by Bourbaki) is fully discussed by Schwartz in the above book, and it is the form needed here. In the case that the projective system of topological vector spaces is Hilbertian, the condition takes the following form.
Now a given projective system of Radon probability spaces has a Radon limit if and only if for each ε > 0 there is a compact set $K_\varepsilon \subset X$ such that the inequality (31) of the preceding section holds. In the Hilbertian case, under our inclusion ordering of subspaces of finite codimension, this simplifies since $X_\alpha$ can be identified with $\alpha^\perp$. If $P_\alpha$ is taken to be the multivariate normal (Gaussian) probability in the above construction, then the projective limit space is $(X^*)'$ itself. The limit thus obtained is a generalization of the BM process, now termed an abstract Wiener process. Note that in the forward direction the restriction is only on the underlying projective system, while the Gaussian specialization is in the converse direction; it leads to various concrete calculations, and so the limit is justifiably termed an abstract Wiener process. Several concrete evaluations are now possible. Since the necessary set, namely the compact $K_\varepsilon$, is needed
for further work, it is shown to be achievable for a class of functions depending on just a finite number of coordinates, called measurable semi-norms (m.s.n.) (just as a compact set in an infinite dimensional topological vector space can be approximated, within any ε > 0, by a subset of a finite dimensional subspace). This key fact will now be elaborated. It should also be observed, as will be seen below from a construction by Hida and Nomoto (1964) (see also Yamasaki (1973)), that Gaussian processes also arise in infinite dimensional (sequence) spaces as projective limits of certain measures which are not easily related to abstract Wiener spaces. Hence the projective limit constructions include the a.w.s. as a large but proper subclass.

Specializing the projective system to Hilbert spaces, the preceding result can be refined to a more concrete statement. Thus let X be a Hilbert space and $\{X_\alpha, \alpha \in F\}$ a family of subspaces of finite codimension (or deficiency), so that $X_\alpha$ is $\alpha^\perp$, or equivalently the quotient space $X/\alpha$, and each $X_\alpha$ is finite dimensional. Let $\varphi_\alpha : X \to X_\alpha$ be the orthogonal projection onto this subspace, and $\varphi_{\alpha\beta} : X_\beta \to X_\alpha$ for α < β (or α ⊂ β), making $\{X_\alpha, \varphi_{\alpha\beta}, \alpha, \beta \in F\}$ a family of projection operators of the type described above. If $\mathcal{B}_\alpha$ is the Borel σ-algebra of $X_\alpha$ and the probabilities $P_\alpha$ on $X_\alpha$ satisfy $P_\alpha = P_\beta \circ \varphi_{\alpha\beta}^{-1}$ for α < β, then $\{(X_\alpha, \mathcal{B}_\alpha, P_\alpha, \varphi_{\alpha\beta})_{\alpha<\beta} : \alpha, \beta \in F\}$ is a projective system of Radon probability spaces, and $X \subset X^{**} \subset (X^*)'$ (= Ω), where $(X^*)'$ is the algebraic dual of $X^*$, containing $X^{**}$ properly, and is given the weak*-topology $\sigma((X^*)', X^*)$, called the projective limit topology. So $\Omega = \varprojlim(\Omega_\alpha, \varphi_{\alpha\beta})$ with $\Omega_\alpha = X_\alpha = X/\alpha$, and Ω is identifiable with $(X^*)'$ (⊃ $X^{**}$).
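The compatibility requirement $P_\alpha = P_\beta \circ \varphi_{\alpha\beta}^{-1}$ is easy to check for centered Gaussian measures, since the image of N(0, C) under a linear map M is N(0, MCMᵀ). The sketch below uses a hypothetical covariance on R³ and coordinate projections R³ → R² → R; because $\varphi_{\alpha\beta} \circ \varphi_\beta = \varphi_\alpha$, the covariance reached through R² coincides with the one obtained directly, which is exactly the consistency a projective system needs.

```python
def matmul(A, B):
    """Product of matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def push_forward_cov(M, C):
    """Covariance of M X when X is a centered Gaussian vector with covariance C."""
    return matmul(matmul(M, C), transpose(M))


# Hypothetical covariance of a centered Gaussian measure on X = R^3.
C = [[2, 1, 0],
     [1, 3, 1],
     [0, 1, 1]]
phi_beta = [[1, 0, 0], [0, 1, 0]]  # X -> X_beta = R^2
phi_alpha_beta = [[1, 0]]          # X_beta -> X_alpha = R
phi_alpha = [[1, 0, 0]]            # X -> X_alpha

cov_beta = push_forward_cov(phi_beta, C)                        # covariance of P_beta
cov_alpha_via_beta = push_forward_cov(phi_alpha_beta, cov_beta)  # image of P_beta
cov_alpha = push_forward_cov(phi_alpha, C)                      # covariance of P_alpha
print(cov_beta, cov_alpha_via_beta, cov_alpha)
```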
Since $X \subset X^{**} \subset (X^*)' = \Omega$ and $P = \varprojlim(P_i, g_{ij})$ is the projective limit measure (a probability here), it may happen that X has P-measure zero. On the other hand, if X is a reflexive Banach space, so that X is identifiable with $X^{**}$, and if the system satisfies Prokhorov's $(\varepsilon, K_\varepsilon)$-condition, whence Ω can be identified with X, the situation improves. The $(\varepsilon, K_\varepsilon)$-condition can be specialized further if X is a Hilbert space, so that the canonical maps $g_{ij}$ are orthogonal projections, and a further reduction is possible if, moreover, the system is Gaussian. A significant simplification is achieved when X is a Hilbert space and the $\varphi_{\alpha\beta}$ are orthogonal projections with finite dimensional ranges, so that the associated norm functionals on X also depend on a finite number of coordinates. Such a functional is termed a measurable semi-norm; it is a cylindrical function. A simpler useful condition was
found by Gross (1962), which is an analog of the $(\varepsilon, K_\varepsilon)$-condition in the Gaussian case. [This was discussed in detail, with complete proofs, by the author (cf. Rao (1981)), and a general exposition of the subject is to be found in Kuo (1973).] The desired result is the following, which explains the basic role of the m.s.n.; it is due to Gross (1967), in the form discussed in Rao (1981):

Theorem 5.2. Let X be a real Hilbert space carrying a projective system of Radon probability spaces $\mathcal{P} = \{(X_\alpha, \mathcal{B}_\alpha, P_\alpha, \varphi_{\alpha\beta})_{\alpha<\beta} : \alpha, \beta \in F\}$, where F is the partially ordered system of finite range orthogonal projections. If the system admits a Radon projective limit (Ω, Σ, P) with P concentrating on X, then for each ε > 0 there exists a finite rank orthogonal projection $\pi_0$ such that, for any finite rank orthogonal projection $\pi \perp \pi_0$ and each m.s.n. functional q on X for which {x : q(x) > ε} is a Borel set, one has
$$P\{x : q(\pi x) > \varepsilon\} = P_\alpha\{\pi x : q(\pi x) > \varepsilon\} < \varepsilon, \qquad \alpha^\perp = \pi(X). \tag{2}$$
Conversely, if $\mathcal{P}$ is a Gaussian projective system and there is an m.s.n. q satisfying (2) above, then $P = \varprojlim P_\alpha$ exists and is supported by $B = \overline{\mathrm{sp}}\{X, q(\cdot)\}$, the completion of X under q(·), and $(B, \mathcal{B}, P)$ is a Radon probability space, with $\mathcal{B}$ the Borel σ-algebra of B.

Remark: Since the embedding X ⊂ B is continuous, one has the inclusions $B^* \subset X^* \cong X \subset B$ for the adjoint spaces, where X and its (Hilbert) adjoint $X^*$ are identified, and the triple $(B^*, X, B)$ is termed the abstract Wiener space (a.w.s.). The computation of the triple in several problems is detailed in Kuo (1975), and is of interest here since the analysis is often quite involved. Yet it will be seen below that there are projective limits of non-Gaussian systems, even with bounded ranges, whose projective limits are Gaussian, so that the above characterization does not cover all the systems included in the general projective limit analysis, even for Hilbert spaces. The following additional information on Gaussian measures in Hilbert or even Banach spaces is of interest, as it is also related to the a.w.s. class. If $B = C_0([0, 1])$ is the Banach space of continuous functions vanishing at 0 and H is the Hilbert space of absolutely continuous functions in B with square integrable derivatives, then $(B^*, H, B)$ is the classical a.w.s. example of a Wiener triple, worked on extensively by R. H. Cameron and W. T. Martin since the early 1940s; here the Hilbert norm of H is the $L^2$-norm of the derivative and the m.s.n. is the uniform norm of B. The subject is well reviewed in H. H. Kuo (1975).
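The classical triple also shows concretely how the Hilbert space H can have measure zero while B carries the measure. In the sketch below, which uses a random-walk discretization with n steps (an assumption made only for illustration), the uniform norm of a simulated Brownian path stays of order one, whereas the discrete analogue $\sum_k (\Delta W_k)^2/\Delta t$ of the squared H-norm $\int_0^1 |h'(t)|^2\,dt$ grows like n. This reflects the fact that Wiener measure assigns H measure zero.

```python
import math
import random


def brownian_path(n, seed=0):
    """Random-walk approximation of Brownian motion on [0, 1] with n steps."""
    rng = random.Random(seed)
    dt = 1.0 / n
    path = [0.0]
    for _ in range(n):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path


def sup_norm(path):
    """Norm of B = C_0([0, 1]) restricted to the grid points."""
    return max(abs(v) for v in path)


def discrete_energy(path):
    """Sum of (increment)^2 / dt: the squared H-norm of the piecewise linear
    interpolation of the path."""
    n = len(path) - 1
    dt = 1.0 / n
    return sum((path[k + 1] - path[k]) ** 2 / dt for k in range(n))


for n in (100, 1000, 10000):
    w = brownian_path(n)
    print(n, round(sup_norm(w), 3), round(discrete_energy(w), 1))
```

Each term $(\Delta W_k)^2/\Delta t$ has mean 1, so the energy has mean exactly n; the fixed seed only makes the run reproducible.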
Even in the case of two (real) variables, putting the original computations of Yeh (1960) into the a.w.s. form defined by Gross (1967), as discussed above, has been a nontrivial task, as seen in the work of Finlayson (1975). Easily verifiable forms of the a.w.s. hypothesis, and of its predecessor, the Prokhorov compactness condition (the $(\varepsilon, K_\varepsilon)$ relation), are yet to be discovered. In this connection, the following result of Dinculeanu (2000, p. 120) on a related integral representation of a 'Gaussian measure' in a Banach space may be of interest to researchers in the subject. For this, the account given in Neveu (1968) is extremely useful and seems to have inspired Dinculeanu's considerations; it will be discussed later. Note that the sum of a pair of Gaussian random variables is not necessarily Gaussian: by the well-known theorem of Cramér, the sum of a pair of independent random variables is Gaussian if and only if each summand is Gaussian, but no independence condition is assumed here. An elaboration of the concept in an infinite dimensional Hilbert space X may start with A = {x : πx ∈ B}, where $B \subset \mathbb{R}^n$ is a Borel set and $\pi : X \to \mathbb{R}^n$ is the coordinate projection, so that B is the base of the 'cylinder set' A; one can then define (|·| being the Euclidean norm)
$$P(A) = (2\pi)^{-n/2} \int_B e^{-|x|^2/2}\,dx, \tag{3}$$
which gives a finitely additive (Gaussian) set function on the cylinder sets that is not σ-additive. An important result due to L. Gross shows that the measure is σ-additive if and only if the metric here is obtainable from a 'measurable semi-norm'. In fact, a slightly extended and interesting version of the result, due to Kuelbs (1970), shows that a (real) separable Banach space B carrying a probability measure µ which gives a positive value to each nonempty open set necessarily contains a real separable Hilbert space H such that, if i : H → B is the embedding, then (i, H, B) constitutes an abstract Wiener space $(B^*, H, B)$, the triple defined earlier. [A somewhat simplified proof of this basic result is given by Kuo (1975) in his lecture notes volume.] Thus the general assumption in Dinculeanu's theorem, as in Neveu's work used here, makes the setting essentially an a.w.s., since a nontrivial Gaussian subspace in infinite dimensions has to be of some form of an a.w.s. With this clarification, the following general representation of a linear mapping $B: X \to L^2(P)$ can be given an abstract integral version relative to a vector measure $m_w$:
$$B(f) = \int_X f(x)\,dm_w(x). \tag{4}$$
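The failure of σ-additivity in (3) can be made quantitative. Under (3) the cylinder {x : |πx| ≤ r} with an n-dimensional base has probability P(χ²ₙ ≤ r²), which tends to 0 as n grows for fixed r. Every ball {‖x‖ ≤ r} of the Hilbert space lies in all such cylinders, so a σ-additive extension would give each ball, and hence the whole space (a countable union of balls), measure 0 instead of 1. The sketch below evaluates these probabilities for even n through the standard identity P(χ²₂ₘ ≤ x) = P(Poisson(x/2) ≥ m); restricting to even n is only a simplification of this sketch.

```python
import math


def gaussian_cylinder_prob(n, r):
    """P(|pi x| <= r) under (3) with an n-dimensional base, i.e. P(chi^2_n <= r^2).

    Uses P(chi^2_{2m} <= x) = P(Poisson(x/2) >= m), so n must be even and r > 0."""
    if n <= 0 or n % 2:
        raise ValueError("this sketch handles even n > 0 only")
    lam, m = r * r / 2.0, n // 2
    # First Poisson term e^{-lam} lam^m / m!, computed in log space to avoid overflow.
    term = math.exp(-lam + m * math.log(lam) - math.lgamma(m + 1))
    total, k = 0.0, m
    while True:
        total += term
        k += 1
        term *= lam / k
        # Once k exceeds lam the terms decrease geometrically; stop when negligible.
        if k > lam and term <= 1e-17 * total:
            return total


for n in (2, 10, 50, 200):
    print(n, gaussian_cylinder_prob(n, 3.0))
```

Summing the Poisson tail directly, rather than computing one minus the head, avoids the cancellation that would otherwise swamp the tiny probabilities for large n.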
This result was deduced by Dinculeanu from his abstract representation theorem for Banach spaces (cf. Dinculeanu (2000), pp. 118–120). Gross's analysis of a.w.s. theory clarifies the abstract version and puts it in a perspective that is useful for applications. An interesting consequence of the a.w.s. analysis is the vector generalization of the O.U. process discussed briefly at the end of the preceding section, together with the treatment of its sample path properties, such as continuity or even differentiability. This aspect was considered by Millet and Smoleński (1992) and will be discussed here. The motivation for this analysis is as follows. The progression (or specialization) of our work is first to obtain a general characterization of the existence of processes in infinite dimensional spaces whose finite dimensional parts are given and satisfy the natural compatibility conditions. The solution is obtained via projective limit theory, culminating in Prokhorov's $(\varepsilon, K_\varepsilon)$-condition. A refinement of this class of processes is obtained in Gross's a.w.s. analysis, leading to the triple $(B^*, H, B)$. Since a Gaussian process is characterized by its mean and covariance functions, the next natural step is to consider path properties of (Gaussian) processes taking values in either of the components of the triple. Here new problems arise, since the members of the triple have different topological and measurable structures, and the path behaviors can differ in each component space. This is discussed for the O.U. process, which is Gaussian but is a certain (not necessarily linear) transformation of BM; it leads to new areas and problems for research. Here is an interesting illustration of the a.w.s. discussed above, connecting it with some aspects of abstract analysis. A useful property of BM and of its transforms is the sample path behavior, which for the O.U. process has been considered by several authors because of its potential for applications (particularly in Physics). An early characterization in Doob's (1942) work, already noted in the last section, shows that if $\{X_t, t \in \mathbb{R}\}$ is a centered stationary Markov process such that for each pair s, t ∈ R the pair $(X_s, X_t)$ has a nondegenerate bivariate (stationary) Gaussian distribution, then its covariance is given by $\rho(s, t) = \rho(s - t) = Ce^{-D|s-t|}$ for some C > 0, D > 0; this property is crucial for the ensuing analysis. If the process is vector (or Banach space) valued, as is of special interest here, the desired property can be described for a general class of vector valued processes, adapting the format of Millet and Smoleński (1993).
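Two facts about the scalar O.U. covariance used above can be checked numerically. For a stationary Gaussian process with ρ(0) > 0, the Markov property amounts to ρ(s + t)ρ(0) = ρ(s)ρ(t) for s, t ≥ 0, which the exponential covariance $Ce^{-D|\tau|}$ satisfies and a Gaussian-shaped covariance (included here only as a contrast) does not. Also, Doob's transform (42) turns the covariance (41) into σ² min(s, t), the covariance of BM. The parameter values in the sketch are arbitrary.

```python
import math


def ou_cov(tau, C=2.0, D=0.7):
    """Exponential (O.U.) covariance C exp(-D |tau|)."""
    return C * math.exp(-D * abs(tau))


def gaussian_shaped_cov(tau, C=2.0, D=0.7):
    """A stationary covariance that is not exponential, used only for contrast."""
    return C * math.exp(-D * tau * tau)


def markov_defect(rho, s, t):
    """rho(s+t) rho(0) - rho(s) rho(t); a stationary Gaussian process with
    covariance rho is Markov iff this vanishes for all s, t >= 0."""
    return rho(s + t) * rho(0.0) - rho(s) * rho(t)


def doob_transform_cov(s, t, sigma=1.3, beta=0.4):
    """Cov(Y(s), Y(t)) for Y(t) = sqrt(t) X(log(t) / (2 beta)), X with covariance (41)."""
    lag = (math.log(s) - math.log(t)) / (2.0 * beta)
    return math.sqrt(s * t) * sigma**2 * math.exp(-beta * abs(lag))


print(markov_defect(ou_cov, 0.5, 1.5), markov_defect(gaussian_shaped_cov, 0.5, 1.5))
print(doob_transform_cov(0.3, 2.0), 1.3**2 * 0.3)
```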
Since a Gaussian distribution (hence such a process) is uniquely determined by its mean and covariance functions, as already noted (the mean is zero here), it is only necessary to consider the covariance functions as positive definite forms and to use some abstract analysis results. For the O.U. process, by Doob's characterization observed above, the covariance function (or operator in the vector case) must be exponentially decreasing. Therefore a great deal of finer analysis is possible even when the process is Banach or Hilbert space valued, for which the a.w.s. structure can be utilized. This aspect will be sketched. Since the O.U. process $\{X_t, t \in \mathbb{R}_+\}$ studied here is stationary, it can be represented as $X_t = T_t X_0$, where $\{T_t, t \ge 0\}$ is a contraction semigroup of (translation) operators on the range space B (⊃ $L^2_0(P)$ here), which allows the application of several specialized results from the well-known abstract (operator) theory. Some (growth or) decay relations for the translation semigroup from the general theory give simple conditions for the (stationary) Gaussian processes, and especially for the corresponding O.U. class, with Kuelbs's characterization in the background. The following result is slightly more general in that it presents a somewhat more inclusive set of Gaussian processes valued in a Banach space B satisfying the stationarity condition, with some restrictions on the latter; it covers the corresponding O.U. class.

Theorem 5.3. Let {S(t), t ≥ 0} be a strongly measurable semigroup of bounded linear operators on a Banach space B satisfying the following pair of conditions:
$$\|S(t)\|_{L(B)} \le C t^{-\beta}, \quad 0 < t \le T < \infty, \qquad 0 \le \beta < \tfrac14, \quad \beta < \alpha < \tfrac{1 - 2\beta}{2}, \tag{5}$$
and
$$\Gamma_\alpha(f^*, g^*) = \int_0^T t^{-2\alpha} \big(S(t)^* f^*, S(t)^* g^*\big)_H\,dt, \qquad f^*, g^* \in B^*, \tag{6}$$
is the covariance relative to a (σ-finite) measure ν on B, that is, it is also representable as $\Gamma_\alpha(f^*, g^*) = \int_B f^*(x) g^*(x)\,\nu(dx)$. Then there exists a process $X_t : \Omega \to B$ such that the mapping $t \mapsto X_t$ is continuous and its covariance is representable, for each $f^*, g^* \in B^*$, as
$$(s, t) \mapsto E\big(f^*(X_s)\, g^*(X_t)\big) = \int_0^{s \wedge t} \big(S(s - u)^* f^*, S(t - u)^* g^*\big)_H\,du.$$
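In the simplest scalar instance, a hypothetical choice made only for illustration, B = H = R and S(t) = e^{−Dt}, so that ‖S(t)‖ ≤ 1 and (5) holds with C = 1, β = 0 and any α ∈ (0, 1/2), while $\Gamma_\alpha(1, 1) = \int_0^T t^{-2\alpha} e^{-2Dt}\,dt$ is finite. The covariance in the conclusion of Theorem 5.3 is then $(e^{-D|t-s|} - e^{-D(s+t)})/(2D)$, which approaches the stationary O.U. covariance $e^{-D|t-s|}/(2D)$ as s, t → ∞. The sketch compares a midpoint-rule evaluation of the integral with this closed form.

```python
import math


def covariance_integral(s, t, D=1.0, n=20000):
    """Midpoint rule for int_0^{min(s,t)} S(s-u) S(t-u) du with S(t) = exp(-D t)."""
    m = min(s, t)
    h = m / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h
        total += math.exp(-D * (s - u)) * math.exp(-D * (t - u))
    return total * h


def covariance_closed_form(s, t, D=1.0):
    """Exact value of the same integral."""
    return (math.exp(-D * abs(t - s)) - math.exp(-D * (s + t))) / (2.0 * D)


print(covariance_integral(1.0, 2.5), covariance_closed_form(1.0, 2.5))
print(covariance_closed_form(50.0, 51.0), math.exp(-1.0) / 2.0)
```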
A proof of this result needs several details, and the reader is referred to the paper by Millet and Smoleński (1992), where some applications and extensions are also discussed. The result exemplifies the key role played by the a.w.s. It is a familiar fact that continuity properties of sample paths almost always need separate treatment, as the above result illustrates. For further details and related work, the reader may consult the above authors' paper and proceed from it. The second aspect, on multiparameter processes (and/or random fields), presents some new (as yet unresolved) problems. The case where the parameter set T is a subset of $\mathbb{R}^n$, or more generally of an infinite dimensional vector space, raises new problems that are also relevant for the LDP analysis of the earlier sections, as well as for the corresponding studies motivated by the above work. In this analysis the Fenchel-Legendre-Orlicz class analogs have to be considered. This will now be outlined to round out this discussion.

On multiparameter FLO classes: Recall that the functionals Λ and $\tilde\Lambda$ considered in the LDP analysis are convex but not necessarily symmetric. They need not be Young functions, but they satisfy the Young inequality, and the complementary function $\tilde\Lambda$ is important in this analysis. For further study, some knowledge of the associated function spaces $L^\Lambda(\mu, X)$ and their complementary spaces will be useful. So a brief account, analogous to the by now well-known work in Rockafellar's book, will be sketched for future analysis of interest in the LDP and related research generally. These remarks are aimed at stimulating further work in this direction. Even though the functionals Λ and $\tilde\Lambda$ are convex and nonnegative, and even when Λ(x) = Λ(−x), it is not necessarily the case that both satisfy Λ(tx) → ∞ as t → ∞ for x ≠ 0, so that they need not be Young functions. A simple example is $X = \mathbb{R}^2$ with $\Lambda(x, y) = \Phi_1(x) + \Phi_2(y)$, where $\Phi_i : \mathbb{R} \to \bar{\mathbb{R}}_+$, i = 1, 2, are Young functions on R in which $\Phi_1$ is a standard (continuous, finite) Young function and $\Phi_2(0) = 0$, $\Phi_2(y) = \infty$ for y ≠ 0.
Then $\Lambda: \mathbb{R}^2 \to \bar{\mathbb{R}}_+$ is a Young function on $X = \mathbb{R}^2$: it is convex, vanishes at (0, 0), and Λ(tx) ≠ 0 for all t ≠ 0 and $0 \ne x \in \mathbb{R}^2$. But its complementary function is $\tilde\Lambda(x, y) = \Psi_1(x)$, where $\Psi_1$ is complementary to $\Phi_1$, since the complementary function of $\Phi_2$ vanishes identically ($\sup_y\{vy - \Phi_2(y)\} = 0$ for every v). Hence $\tilde\Lambda(0, y) = 0$ for all y, and $\tilde\Lambda$ is not a Young function. Thus some new conditions and ideas are needed for higher dimensional Young or complementary functions and the concerned (function) spaces. Here the following strengthened formulation is considered, with which a norm functional can be introduced to proceed further.

Definition 5.4. Let $\Lambda: X \to \bar{\mathbb{R}}_+$ be a Young function on a Banach space X satisfying {x : Λ(tx) < ∞ for some t > 0} = X and
Λ(−x) = Λ(x). Then the Fenchel-Orlicz space $L^\Lambda(P, X)$ on a probability space (Ω, Σ, P) is the set of (strongly) measurable functions f : Ω → X such that $\int_\Omega \Lambda(kf)\,dP < \infty$ for some k > 0. The gauge functional, which is a norm when equivalent functions are identified, is defined as
$$\|f\|_\Lambda = \inf\left\{k > 0 : \int_\Omega \Lambda\!\left(\frac{f}{k}\right) dP \le 1\right\}. \tag{7}$$
The following result on the structure of this function class is an analog of the scalar function case (the Orlicz space), but it is not a simple copy of the classical result. It is stated here to show what new ideas are needed.

Theorem 5.5. The set $(L^\Lambda(P, X), \|\cdot\|_\Lambda)$ is a normed linear space under the norm functional (7) when the elements of norm zero are identified with zero. If X is finite dimensional, or Λ(x) ≥ α‖x‖ + β for some fixed α > 0, β > 0 and all x ≠ 0, then $L^\Lambda(P, X)$ is a Banach space. The last condition is equivalent to continuity of Λ(·) at the origin.

The growth condition imposed on Λ(·) in the theorem is always satisfied if X is finite dimensional, but it is needed in the general case, as otherwise the space $L^\Lambda(P, X)$ need not be complete. A detailed structural analysis of the $L^\Lambda(P, X)$-spaces was carried out by Turett (1980). The result leads to an analysis of these spaces, of interest in themselves; also, since the complementary function $\tilde\Lambda$ of Λ is essentially the Legendre function satisfying the Young inequality, it is of significance in the LDP analysis and its extension to infinite dimensional problems. It is also used in studies of the sample path continuity of random fields. It may be useful to indicate here the corresponding $L^\Lambda(P)$-spaces of set functions, extending the point function case, which minimize the measurability problems and enhance the potential for applications. This aspect was studied by Uhl (1969) and later in considerable detail. The following result gives an indication of this aspect of the analysis and suggests new possibilities. Let G : Σ → X be an additive vector (here Banach) space valued function on the σ-algebra, of finite Λ-norm, where the latter is defined for a (Fenchel-) Young function Λ on X and a measure $\mu : \Sigma \to \mathbb{R}_+$, so that $\|G\|_\Lambda < \infty$, where
$$\|G\|_\Lambda = \inf\left\{k > 0 : I_\Lambda\!\left(\frac{G}{k}\right) \le 1\right\}. \tag{8}$$
The Λ-variation $I_\Lambda(G)$ giving the norm above is defined by
$$I_\Lambda(G) = \sup\left\{ \sum_{i=1}^n \Lambda\!\left(\frac{G(A_i)}{\mu(A_i)}\right) \mu(A_i) : A_i \in \Sigma,\ A_i \cap A_j = \emptyset,\ i \ne j \right\}. \tag{9}$$
Let $\{M^\Lambda(\mu, X), \|\cdot\|_\Lambda\}$ be the linear space of such set functions with the norm given by (8), as usual. It can be verified that $\{M^\Lambda(\mu, X), \|\cdot\|_\Lambda\}$ is a Banach space of additive set functions G : Σ → X; its structure has been analyzed by Uhl in the above reference. The following is one of his main results, which is of interest in our analysis and its possible extensions.

Theorem 5.6. Let $\Lambda: X \to \mathbb{R}_+$ be a Young function and $\tilde\Lambda$ its complementary function, with $M^\Lambda$, defined above, as the subspace determined by simple functions. Then a continuous linear functional $\ell : M^\Lambda \to \mathbb{R}$ is uniquely representable as
$$\ell(f) = \int_\Omega f\,dG, \qquad f \in M^\Lambda(\mu, X),\ G \in S_{\tilde\Lambda}(\mu, X^*), \tag{10}$$
where the integral is the well-known Bartle bilinear vector integral, and
$$\|\ell\| = \sup\{|\ell(f)| : f \in M^\Lambda(\mu, X),\ \|f\|_\Lambda = 1\}. \tag{11}$$
Here G has a Radon-Nikodým density if $X^*$ is separable or, more generally, if $X^*$ has the so-called Radon-Nikodým property. Consequently, the space $L^\Lambda(\mu, X)$, defined as the (strongly measurable) X-valued functions of finite $\|\cdot\|_\Lambda$-norm, is strictly convex, in the sense that its unit sphere contains no line segments, if Λ is strictly convex and satisfies the growth condition Λ(2x) ≤ KΛ(x) for all x ∈ X and some 0 < K < ∞. For details and related analysis the reader is referred to Turett (1980) and Uhl (1967), where several other associated results and extensions are found. It should be noted that without a growth condition such as the one in the above theorem, often referred to as the $\Delta_2$ condition in the literature, the characterization of the adjoint space is difficult; a solution is in the unpublished part of Uhl's 1965 Carnegie thesis, which needs a little polish and simplification. It is perhaps useful to indicate at this point that the $L^\Lambda(\mu)$ spaces with Λ(·) an exponential function allow analysis of random processes valued in these spaces, which are "intermediate" between $L^\infty(\mu)$ and the $L^p(\mu)$ spaces of polynomial growth. Some of these results were discussed in the book by Rao and Ren (2002), Section 8.3, and further discussion is unnecessary here.
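The gauge (7) can be computed by bisection, since for a convex Λ with Λ(0) = 0 the map $k \mapsto \int \Lambda(f/k)\,dP$ is nonincreasing. The sketch below assumes a finite probability space with hypothetical atoms, X = R², and the quadratic Young function Λ(x) = |x|², for which $\|f\|_\Lambda$ reduces to $(\int |f|^2\,dP)^{1/2}$. An exponential Young function such as Λ(x) = e^{|x|²} − 1, of the kind mentioned above, is passed in the same way.

```python
import math


def luxemburg_norm(values, probs, Lam, hi=1e6, iters=200):
    """Gauge (7) on a finite probability space, by bisection in k.

    values: the vectors f takes on the atoms; probs: the atom probabilities.
    Assumes the modular at k = hi is already <= 1."""
    def modular(k):
        return sum(p * Lam([c / k for c in v]) for v, p in zip(values, probs))

    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if modular(mid) <= 1.0:
            hi = mid  # k = mid is admissible, so the infimum is <= mid
        else:
            lo = mid
    return hi


def quadratic(x):
    return sum(c * c for c in x)


def exponential(x):
    return math.expm1(quadratic(x))  # e^{|x|^2} - 1


f_values = [(3.0, 4.0), (0.0, 0.0), (1.0, 0.0)]
f_probs = [0.5, 0.25, 0.25]
print(luxemburg_norm(f_values, f_probs, quadratic), math.sqrt(12.75))
print(luxemburg_norm(f_values, f_probs, exponential))
```

Since e^u − 1 ≥ u, the exponential gauge dominates the quadratic one, in line with the "intermediate" position of exponential Orlicz spaces noted above.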
6. Evaluations and Representations of Conditional Means

In the earlier discussion in Section 3 and at other places, conditional expectations and probabilities appear in crucial parts of the analysis. Their existence is deduced immediately from the Radon-Nikodým theorem, but there is no good constructive procedure in higher dimensions. Some aspects of the unsatisfactory situation that still exists, and what is available to solve the problem, will be briefly discussed in this final section to highlight some of the questions to be resolved in the ensuing investigations. For convenience, let us recall the problem. If $f \in L^1(\Omega, \Sigma, P) = L^1(P)$ on a probability space, consider, for a σ-subalgebra $\mathcal{B} \subset \Sigma$, $\nu_f(B) = \int_B f\,dP$, $B \in \mathcal{B}$, which is a $P_{\mathcal{B}}$-continuous (signed) measure, where $P_{\mathcal{B}}$ is the restriction of P to $\mathcal{B}$. Hence $g_f = d\nu_f/dP_{\mathcal{B}}$ exists, and the mapping $E^{\mathcal{B}} : f \mapsto g_f$ is the well-known conditional expectation operator; it is a positive linear contraction from $L^1(P)$ into $L^1(P_{\mathcal{B}})$. The properties of this operator, including its calculation for given σ-algebras $\mathcal{B}$, pose a necessary but nontrivial problem for many applications. Some of these questions and properties will be indicated here to show the complications underlying this important and key concept in the theory. More details will be found in the author's Georgian Math. Journal (2001) paper and the earlier Indian J. Math. (1993) paper; neither has been placed in its proper perspective in current research, although they raise some key problems and give some (nontrivial) solutions. They will be briefly considered here. Since the conditional expectation operator $E^{\mathcal{B}}$ is a positivity preserving contractive linear operator on $L^1(\Omega, \Sigma, P) = L^1(P)$, it defines a conditional probability measure $P^{\mathcal{B}} : B \mapsto E^{\mathcal{B}}(\chi_B)$, B ∈ Σ, which is a positive measure valued in $L^1(P_{\mathcal{B}})$ vanishing on P-null sets of Σ. J.L. Kelley and T.P.
Srinivasan found a reasonable condition called ‘locally small mean diameter condition’ that is strong enough and useful to provide a positive solution for differentiating a Banach valued measure ν. This states that for each
> 0 the diameter of the set diam{ Pν(B) (B) : 0 < P (B), B ⊂ A} ≤ , all sets considered being in Σ. This is a hard condition to verify but it is an abstraction of a relative weak compactness condition formulated earlier by R. S. Phillips. The upshot of this complicated condition (satisfied for reflexive spaces) is that for every B ⊂ Σ the conditional probability P B admits a unique integral representation for a g ∈ L1 (PB ) as a Bochner integral: B P (A) = g dPB , A ∈ Σ, (1) A
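When $\mathcal B$ is generated by a finite partition of a finite $\Omega$, the Radon-Nikodým derivative $d\nu_f/dP_{\mathcal B}$ is explicit: on each atom $B$ with $P(B) > 0$ it is the constant $\nu_f(B)/P(B)$. The Python sketch below illustrates only this elementary atomic case (the six-point space, its weights, and the helper name `cond_exp_partition` are hypothetical) and checks the defining identity $\int_B E^{\mathcal B}f\,dP = \int_B f\,dP$ on the generating atoms. The difficulty discussed in this section is precisely that no such formula is available when $\mathcal B$ is not atomic.

```python
from fractions import Fraction

def cond_exp_partition(prob, f, partition):
    """E^B f on a finite space, where B is generated by `partition`
    (disjoint blocks covering Omega). On a block B with P(B) > 0 the
    value is nu_f(B) / P(B), i.e. the Radon-Nikodym derivative."""
    g = {}
    for block in partition:
        p_block = sum(prob[w] for w in block)
        nu_block = sum(f[w] * prob[w] for w in block)   # nu_f(B) = int_B f dP
        # On a P-null block E^B f is arbitrary (defined only a.e.); use 0 there.
        value = nu_block / p_block if p_block > 0 else Fraction(0)
        for w in block:
            g[w] = value
    return g

# Hypothetical six-point space with weights 1/21, 2/21, ..., 6/21.
P = {w: Fraction(w + 1, 21) for w in range(6)}
f = {0: 3, 1: -1, 2: 4, 3: 0, 4: 2, 5: 7}
blocks = [[0, 1], [2, 3, 4], [5]]

g = cond_exp_partition(P, f, blocks)

# Defining property of E^B: int_B g dP = int_B f dP on every generating atom.
for block in blocks:
    assert sum(g[w] * P[w] for w in block) == sum(f[w] * P[w] for w in block)
```

Exact rational arithmetic (`Fraction`) is used so that the defining identity can be checked with `==` rather than a floating-point tolerance.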
The fact that the measure used is a probability is not essential; any finite measure can be employed. For instance, if $Z \ge 0$ is integrable and $\mu_Z(A) = \int_A Z\,dP$, then $\mu_Z$ can be used above after normalization. If $E^{\mathcal B}$ denotes the conditional expectation for the probability $P$ and $E^{\mathcal B}_\mu$ the corresponding operator for the unnormalized (but positive) measure $\mu_Z$, then the resulting conditional operator is easily seen to be
$$E^{\mathcal B}_\mu(X) = \frac{E^{\mathcal B}(XZ)}{E^{\mathcal B}(Z)}, \qquad XZ \in L^1(P), \tag{2}$$
on the set where $E^{\mathcal B}(Z) > 0$.
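Formula (2) is an abstract Bayes rule. On a finite space with $\mathcal B$ generated by a partition, both sides reduce to $\int_B XZ\,dP \big/ \int_B Z\,dP$ on each atom $B$, so the identity can be checked exactly. The sketch below does this with exact rational arithmetic; the data and the helper `cond_exp` are hypothetical, $Z \ge 0$ is assumed, and every atom is assumed to satisfy $E^{\mathcal B}(Z) > 0$ so that the quotient is defined.

```python
from fractions import Fraction

def cond_exp(prob, h, partition):
    """E^B h on a finite space, B generated by `partition` (atoms of positive mass)."""
    out = {}
    for block in partition:
        mass = sum(prob[w] for w in block)
        val = sum(h[w] * prob[w] for w in block) / mass
        for w in block:
            out[w] = val
    return out

P = {w: Fraction(1, 6) for w in range(6)}          # uniform base probability
Z = {0: 2, 1: 0, 2: 1, 3: 3, 4: 1, 5: 5}            # density of mu_Z, Z >= 0
X = {0: 1, 1: 4, 2: -2, 3: 0, 4: 6, 5: 1}
atoms = [[0, 1, 2], [3, 4], [5]]

# Left side: conditional expectation under mu_Z normalized to a probability Q.
total = sum(Z[w] * P[w] for w in P)
Q = {w: Z[w] * P[w] / total for w in P}
lhs = cond_exp(Q, X, atoms)

# Right side of (2): E^B(XZ) / E^B(Z), both computed under P.
XZ = {w: X[w] * Z[w] for w in P}
num, den = cond_exp(P, XZ, atoms), cond_exp(P, Z, atoms)
rhs = {w: num[w] / den[w] for w in P}

assert lhs == rhs
```

The normalizing constant `total` cancels in the ratio, which is why using the unnormalized $\mu_Z$ in (2) is harmless.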
One can also obtain a Lebesgue type decomposition of the vector conditional measure. The real problem, however, is the calculation of these conditional expectations, even for the well-studied Gaussian (or any nonatomic) probability. This is discussed briefly now. Such evaluations are so essential in applied problems that one would like to have algorithms for them; however, no such procedures exist. To illustrate, following Kac and Slepian (1959), consider a pair of Gaussian processes $\{X^i_t, |t| \le a\}$, $a > 0$, $i = 1, 2$, such that the $X^1_t$-process is ergodic and centered (mean zero) and the $X^2_t$-process is its derivative at each point. It is desired to find the distribution of the slope of the tangent at $t = 0$ given that the process was at level $a$ at that time, i.e., to find $P[X^2_0 < y \mid X^1_0 = a]$. Traditional and other known methods do not provide a unique solution when one uses the familiar ratio definition for evaluating a conditional probability; the answer depends on how the conditioning event is approached, somewhat as a directional derivative in classical (elementary) multivariable calculus depends on the chosen direction. In the case of $n$- and $m$-dimensional Gaussian vectors $X$ and $Y$ with means $\mu_x, \mu_y$, cross-covariance matrix $R_{xy}$ and covariance matrix $R_{yy}$, elementary (orthogonalization) manipulations give a matrix $A$ such that $Y - \mu_y$ and $Z = (X - \mu_x) - A(Y - \mu_y)$ are orthogonal, hence independent since they are jointly Gaussian. This yields $A = R_{xy}R_{yy}^{-1}$, the last symbol being a (pseudo or) generalized inverse if $R_{yy}$ is singular, so that
$$E(X \mid Y) = \mu_x + A(Y - \mu_y). \tag{3}$$
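For jointly Gaussian vectors, (3) is a finite linear-algebra computation. The sketch below assumes a scalar $X$ and a two-dimensional $Y$ with made-up moments, takes $R_{yy}$ nonsingular (so an ordinary inverse suffices; a singular $R_{yy}$ would require the generalized inverse mentioned above), computes $A = R_{xy}R_{yy}^{-1}$, and checks the orthogonality $\operatorname{Cov}(Z, Y) = R_{xy} - AR_{yy} = 0$ that underlies (3). The function names are hypothetical.

```python
from fractions import Fraction as F

def regression_coefficients(r_xy, r_yy):
    """A = R_xy R_yy^{-1} for scalar X and 2-dimensional Y (R_yy assumed invertible)."""
    (a, b), (c, d) = r_yy
    det = a * d - b * c
    if det == 0:
        raise ValueError("R_yy is singular: a generalized inverse would be needed")
    inv = [[d / det, -b / det], [-c / det, a / det]]
    # Row vector times matrix: A_j = sum_i R_xy[i] * inv[i][j].
    return [r_xy[0] * inv[0][j] + r_xy[1] * inv[1][j] for j in range(2)]

def conditional_mean(mu_x, mu_y, A, y):
    """E(X | Y = y) = mu_x + A (y - mu_y), formula (3)."""
    return mu_x + sum(A[j] * (y[j] - mu_y[j]) for j in range(2))

# Hypothetical moments of a jointly Gaussian (X, Y1, Y2).
mu_x, mu_y = F(1), [F(0), F(2)]
R_xy = [F(1), F(1)]                        # Cov(X, Y1), Cov(X, Y2)
R_yy = [[F(2), F(1)], [F(1), F(3)]]        # Cov(Y), symmetric positive definite

A = regression_coefficients(R_xy, R_yy)

# Orthogonality: Cov(Z, Y_j) = R_xy[j] - sum_i A[i] R_yy[i][j] must vanish.
assert all(R_xy[j] - sum(A[i] * R_yy[i][j] for i in range(2)) == 0 for j in range(2))
print(A, conditional_mean(mu_x, mu_y, A, [F(1), F(1)]))
```

The conditional mean is affine in $y$ with a nonrandom coefficient $A$; this is the finite-dimensional prototype of the linearity discussed next.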
This is the well-known linear regression equation for Gaussian variables. It can be generalized to the case where $X$ is a finite vector and $Y$ is an infinite vector forming a Gaussian process, but the situation is then not as simple. The precise statement is as follows; for the details the reader is referred to the above paper.

Theorem 6.1. Let $Y = \{Y_t, \mathcal F_t, t \in \mathbb R_+\}$ be a real Gaussian process with a.e. continuous sample paths that is also a martingale, and let $X$ be a random variable
such that each finite vector $(X, Y_{t_1}, \dots, Y_{t_n})$ is Gaussian. Then the regression function of $X$ on the process admits the linear representation
$$E(X \mid Y_s, s \le t) = E(X \mid Y_0) + \int_{0+}^{t} K(t,s)\,dY_s, \tag{4}$$
where the function $K(\cdot,\cdot)$ is jointly measurable and integrable on bounded intervals, and the stochastic integral above is well-defined (simply) in the $L^{2,2}$-bounded sense.

Although this extends the special case noted above, the details are somewhat involved and the reader is referred to the paper. The hypothesis applies to diffusion processes and thus includes Brownian motion (BM). The point here is that the 'kernel' of the stochastic integral above is nonstochastic, which explains the linearity of Gaussian regressions in a considerably general setting. This was detailed in Rao (2000, Section 8.4). The integral representation (4) is a deeper property of Gaussian martingales, and it leads to Kalman filters. The latter subject, with extensions to the multivariate case and related analysis (cf. Section 8.5), was detailed in the same book. Although Rozanov (1971) considered some aspects of infinite dimensional Gaussian distributions (of infinitely many variables), he does not treat the problem of conditioning with the constructive aim discussed here. The key point of this analysis is an exact evaluation of conditional expectations, particularly in the multidimensional context, against the backdrop of the Kac-Slepian multiple solutions, or paradoxes.

The following unique evaluations hold when Fourier analysis can be employed. They were considered by the author (Rao (1993), Theorem 2), and a discussion with a useful application will conclude this article. Recall that if $(\Omega, \Sigma, P)$ is a probability space and $\mathcal B \subset \Sigma$ is a sigma algebra, then for each $P$-integrable random variable $X$ the conditional expectation exists and is expressible as a vector integral
$$E^{\mathcal B}(X) = \int_\Omega X\,dP^{\mathcal B}, \tag{5}$$
and the problem, noted earlier, of evaluating a conditional expectation uniquely (rather than running into a paradox with multiple solutions) can be solved in some cases. For an integrable $X$ and a random vector $Y = (Y_1, \dots, Y_n)$, let $\varphi(t) = E\big(X e^{i(t_1Y_1 + \cdots + t_nY_n)}\big)$, which is well-defined for every real vector $t = (t_1, \dots, t_n)$. Then the next result, analogous to the Lévy inversion formula, holds and solves
the problem in the following way, motivated by the works of Yeh (1975) and Zabell (1979).

Theorem 6.2. If $\varphi$ introduced above is Lebesgue integrable on $\mathbb R^n$ and $F_Y$ has a density $f_Y$ on $\mathbb R^n$ relative to Lebesgue measure, then the desired conditional expectation is given by
$$E(X \mid Y)(a) = [f_Y(a)(2\pi)^n]^{-1}\int_{\mathbb R^n} e^{-i(a,u)}\varphi(u)\,du_1\cdots du_n, \tag{6}$$
on the set $\{a : f_Y(a) > 0\}$, provided $f_Y(\cdot)$ is continuous there.

A proof of the result is detailed in my paper in the Indian J. Math. (1993), where some related results are also given. However, this result needs considerable change if $Y$ is an infinite dimensional vector or a process; that case uses the a.w.s. concept as well as some results related to those considered in the preceding section. Some of these, with a few 'elementary examples' and detailed computations, are included in the second edition of my 'Conditional Measures' book (2005), Sections 5.6–5.7, and so they are omitted here.

Finally, it is desirable to consider how these evaluations of conditioning may be carried out if $\mathbb R^n$ is replaced by a more general group. In a related (harmonic analysis) context, Edwards and Hewitt (1965) presented a detailed analysis of pointwise limits of sequences of convolution operators, and their approach also points the way for the present problem. It is briefly described here. These authors abstracted the differentiation basis to collections of sets in a locally compact group $G$ with a (left) Haar measure $\lambda$. A decreasing sequence $U_k$, $k \ge 1$, of $\lambda$-measurable subsets of $G$ is called a $D$-sequence ('D' for derivation) if there is a constant $0 < K < \infty$ such that $0 < \lambda(U_k U_k^{-1}) \le K\lambda(U_k)$ for all $k \ge 1$. If, moreover, each neighborhood of the identity of $G$ contains some $U_k$, the $D$-sequence is termed a $D'$-sequence. It can be shown that every Lie group admits a $D$- as well as a $D'$-sequence. Indeed, Edwards and Hewitt discussed a great deal of the structure theory of analysis on these groups that is of interest here. The following abstract version for conditional expectations can be established.

Theorem 6.3. Let $(\Omega, \Sigma, P)$ be a probability space, $G$ a $D'$-group, and $\lambda$ its (left) Haar measure. Let $X: \Omega \to \mathbb R$ be an integrable random variable, $Y: \Omega \to G$ an abstract (Borel) measurable mapping, and $\{U_n, n \ge 1\}$ a $D'$-sequence.
Let $f_Y$ be the density of the absolutely continuous part of $P_Y = P \circ Y^{-1}$ relative to $\lambda$. Then the conditional expectation $E(X \mid Y)$ can
be evaluated uniquely on the set $\{r : f_Y(r) > 0\}$ by the formula
$$E(X \mid Y)(r)\,f_Y(r) = \lim_{k\to\infty}[\lambda(U_k)]^{-1}E\big(\chi_{rU_k}(Y)\,X\big), \qquad \text{a.e. } [\lambda], \tag{7}$$
so that $E(X \mid Y)$ is uniquely determined on $\{r : f_Y(r) > 0\}$. The details of the proof may be found in the earlier paper, and a more relaxed version is given in the author's book (Rao (2005), Section 7.6), where not only the evaluation problem but also other extensions and ramifications of conditioning are treated in more detail than in most other publications on the subject, in both finite and infinite dimensions. Conditions for the uniqueness of such a sequence in a given group (even in a Lie group), and consequently for an a.e. unique value of a conditional expectation, still await further work. The general case (of infinite dimensional spaces) may have to use projective limit theory as in Rao (2012).

This concludes the discussion of the subject, which points to many areas and research problems that interested readers can pursue. For this purpose, some additional closely related works, not explicitly referred to in the text, are also included in the references below. Others may be found in Rao (1995) and Rao (2000).

Note. In a monumental restudy and expanded analysis of Wiener's (1938) original view of the homogeneous chaos in Euclidean spaces of infinite dimension, using Lebesgue measure only on the finite dimensional (sub)spaces with suitable identifications on overlaps, Masani (1997) reproved all the results of Wiener's original work, using his own notation and terminology derived from and applicable to Lebesgue measure on finite dimensional Euclidean spaces. All of this work focuses on the Euclidean structure and the properties of the (classical) Lebesgue measure. The modern view, using projective systems of measures with a generalization of the Kolmogorov-Bochner theorem, does not appear, but he reproves the parts needed in this case.
Prokhorov's $(\varepsilon, K)$ condition and its specialization to Brownian motion, in the form of the abstract Wiener space analysis of Gross (and Segal), play no role; nevertheless, the end result was rederived in Masani's own symbolism exclusively for the Wiener measure. Apart from these special methods (using the Pettis integral only when needed), Dinculeanu's abstract extension of F. Riesz's classical representation theorem to the Banach space (operator) context was the only abstract result needed, and it seems hard to extend this procedure to general cylindrical measures. However, focused on Wiener's measure and the Cameron-Martin development of Wiener's
measure using Dinculeanu's representation, the core result, which is essentially the final product of the a.w.s. theory but obtained by expanding Wiener's original ideas, is presented in this voluminous article with most details. Interested readers should consult this paper as a research memoir and try to obtain a 'simpler' form of it within the a.w.s. framework and related, more familiar current abstract versions (as discussed in Section 5 above), with a view to possible extensions and simplifications of this major reproof of Wiener's classical ideas.

References

[1] A. C. Berry, The accuracy of the Gaussian approximation to the sum of independent variables, Trans. Amer. Math. Soc. 49 (1941), 122–136.
[2] E. Bishop, Foundations of Constructive Analysis, McGraw-Hill, New York, 1967.
[3] S. Bochner, Harmonic Analysis and the Theory of Probability, Univ. of Calif. Press, Berkeley, CA, 1955.
[4] V. I. Bogachev, Gaussian Measures, Amer. Math. Soc., Providence, RI, 1998.
[5] V. I. Bogachev, Differentiable Measures and the Malliavin Calculus, Amer. Math. Soc., Providence, RI, 2000.
[6] D. M. Chang and S. J. Kang, Evaluation formulas for conditional abstract Wiener integrals, Stochastic Anal. Appl. 7 (1999), 123–144.
[7] P. Cartier, Introduction à l'étude des mouvements browniens à plusieurs paramètres, LNM 191 (1971), 58–75.
[8] H. Chernoff, Large sample theory: Parametric case, Ann. Math. Statist. 24 (1956), 1–22.
[9] D. A. Dawson, Stochastic evolution equations and related measure processes, J. Multivar. Anal. 5 (1975), 1–52.
[10] D. A. Dawson and J. Gärtner, Large deviations from the McKean-Vlasov limit for weakly interacting diffusions, Stochastics 20 (1987), 247–308.
[11] D. A. Dawson and J. Gärtner, Large deviations, free energy functional and quasi-potential for a mean field model of interacting diffusions, Memoirs Amer. Math. Soc. 398 (1989), 1–94.
[12] N. Dinculeanu, Vector Integration and Stochastic Integration in Banach Spaces, Wiley-Interscience, New York, 2000.
[13] K. E. Dambis, On the decomposition of continuous submartingales, Theor. Prob. Appl. 10 (1965), 401–410.
[14] C. Doléans-Dade, Existence du processus croissant naturel associé à un potentiel de la classe (D), Z. Wahrs. 9 (1968), 309–314.
[15] M. D. Donsker, An invariance principle for certain probability limit theorems, Memoirs Amer. Math. Soc. 6 (1951), 12–19.
[16] J. L. Doob, The Brownian movement and stochastic equations, Ann. Math. 43 (1942), 331–339.
[17] J. L. Doob, Stochastic Processes, Wiley, New York, 1953.
[18] L. E. Dubins and G. Schwarz, On continuous martingales, Proc. Nat. Acad. Sci. 53 (1965), 913–916.
[19] J. Dugundji, Topology, Allyn and Bacon, Boston, 1966.
[20] N. Dunford and J. T. Schwartz, Linear Operators, Part I: General Theory, Interscience, New York, 1958.
[21] P. Dupuis and R. S. Ellis, A Weak Convergence Approach to the Theory of Large Deviations, Wiley-Interscience, New York, 1997.
[22] R. E. Edwards and E. Hewitt, Pointwise limits for sequences of convolution operators, Acta Math. 113 (1965), 181–218.
[23] C.-G. Esseen, Fourier analysis of distribution functions: A mathematical study of the Laplace-Gaussian law, Acta Math. 77 (1945), 1–125.
[24] W. Feller, Non-Markovian processes with the semi-group property, Ann. Math. Statist. 30 (1959), 1252–1253.
[25] J. Feng and T. G. Kurtz, Large Deviations for Stochastic Processes, Amer. Math. Soc. Surveys, Providence, RI, 2006.
[26] X. Fernique, Régularité de processus gaussiens, Invent. Math. 12 (1971), 304–320.
[27] H. C. Finlayson, Two classical examples of Gross' abstract Wiener measures, Proc. Amer. Math. Soc. 53 (1975), 337–340.
[28] S. V. Fomin, Differentiable measures in linear spaces, Proc. ICM, Moscow, 1966, pp. 78–79.
[29] M. I. Freidlin, Functional Integration and Partial Differential Equations, Princeton Univ. Press, Princeton, NJ, 1985.
[30] M. I. Freidlin and A. D. Wentzell, Random Perturbations of Dynamical Systems, 2nd edition, Springer, New York, 1998.
[31] R. Gangolli, Positive definite kernels on homogeneous spaces and certain stochastic processes related to Lévy's Brownian motion of several variables, Ann. Inst. H. Poincaré 3 (1967), 121–225.
[32] L. Gross, Measurable functions on Hilbert space, Trans. Amer. Math. Soc. 105 (1962), 372–390.
[33] L. Gross, Abstract Wiener spaces, Proc. 5th Berkeley Symp. Math. Statist. and Prob. 2 (1965), 31–42.
[34] H. Heyer and M. M. Rao, Infinite dimensional stationary random fields over a locally compact abelian group, International J. Math. 23 (2012), 12050029 (23 pages).
[35] T. Hida and H. Nomoto, Gaussian measure on the projective limit space of spheres, Proc. Japan Acad. 40 (1964), 301–304.
[36] I. Iscoe, M. B. Marcus, D. McDonald, M. Talagrand and J. Zinn, Continuity of L²-valued Ornstein-Uhlenbeck processes, Ann. Prob. 18 (1993), 68–84.
[37] M. Kac, On some connections between probability theory and differential and integral equations, Proc. Second Berkeley Symp. on Math. Statist. and Prob., 1951, pp. 189–215.
[38] M. Kac and D. Slepian, Large excursions of Gaussian processes, Ann. Math. Statist. 30 (1959), 1215–1228.
[39] A. I. Khintchine, The entropy concept in probability theory, Uspekhi Mat. Nauk VII(3) (1953), 103–120.
[40] F. B. Knight, A reduction of continuous square integrable martingales, Springer LNM 190 (1971), 19–31.
[41] J. Kuelbs, Gaussian measures on a Banach space, J. Functional Anal. 5 (1970), 351–367.
[42] H. H. Kuo, Gaussian Measures in Banach Spaces, Springer LNM 403, Berlin, 1975.
[43] J. Lindenstrauss and A. Pełczyński, Absolutely summing operators in L_p-spaces and applications, Studia Math. 29 (1968), 273–326.
[44] B. B. Mandelbrot and J. W. van Ness, Fractional Brownian motion, fractional noises and applications, SIAM Rev. 10 (1968), 432–437.
[45] P. R. Masani, On infinitely decomposable probability distributions and helical varieties in Hilbert space, Proc. Multivariate Analysis, Academic Press, 1973, pp. 309–323.
[46] P. R. Masani, The homogeneous chaos from the standpoint of vector measures, Phil. Trans. R. Soc. Lond. A 355 (1997), 1109–1258.
[47] A. Millet and W. Smoleński, On the continuity of Ornstein-Uhlenbeck processes in infinite dimensions, Prob. Th. and Related Fields 92, 529–547.
[48] J. Neveu, Processus Aléatoires Gaussiens, Les Presses de Montréal, Canada, 1968.
[49] R. E. A. C. Paley, N. Wiener and A. Zygmund, Notes on random functions, Math. Zeit. 37 (1933), 647–668.
[50] T. S. Pitcher, Likelihood ratios for diffusion processes with shifted mean values, Trans. Amer. Math. Soc. 101 (1961), 165–176.
[51] T. S. Pitcher, Parameter estimation for stochastic processes, Acta Math. 112 (1964), 1–40.
[52] M. M. Rao, Paradoxes in conditional probability, J. Multivar. Anal. 27 (1980), 434–446.
[53] M. M. Rao, Stochastic processes and cylindrical probabilities, Sankhya, Ser. A 43 (1981), 149–169.
[54] M. M. Rao, Projective limits of probability spaces, J. Multivar. Anal. 1 (1971), 28–57.
[55] M. M. Rao, Exact evaluation of conditional expectations in the Kolmogorov model, Indian J. Math. 35 (1993), 57–69.
[56] M. M. Rao, An approach to stochastic integration (a generalized and unified treatment), Multivariate Analysis: Future Directions, C. R. Rao (ed.), Elsevier, New York, pp. 347–374.
[57] M. M. Rao, Stochastic Processes: General Theory, Kluwer Academic, Dordrecht, The Netherlands, 1995.
[58] M. M. Rao, Higher order stochastic differential equations, Real and Stochastic Analysis: Recent Advances, CRC Press, New York, 1997, pp. 225–302.
[59] M. M. Rao, Conditional Measures and Applications, 2nd edition, Chapman and Hall, London and New York, 2005.
[60] M. M. Rao, Stochastic Processes: Inference Theory, Kluwer Academic, Dordrecht, The Netherlands, 2000.
[61] M. M. Rao, Representations of conditional measures, Georgian Math. J. 8 (2001), 363–376.
[62] M. M. Rao, Evolution operators in stochastic processes and inference, Lect. Notes in Math. 234 (2003), 353–372, Marcel Dekker, New York.
[63] M. M. Rao, Characterization and duality of projective and direct limits of measures and applications, International J. Math. 22 (2011), 1089–1119.
[64] M. M. Rao, Random and Vector Measures, World Scientific, Singapore, 2012.
[65] M. M. Rao and R. J. Swift, Probability Theory with Applications, 2nd edition (first edition 1984), Springer, New York, 2006.
[66] M. M. Rao and Z. D. Ren, Theory of Orlicz Spaces, Marcel Dekker, New York, 1991.
[67] M. M. Rao and Z. D. Ren, Applications of Orlicz Spaces, Marcel Dekker, New York, 2002.
[68] R. T. Rockafellar, Convex Analysis, Princeton Univ. Press, Princeton, NJ, 1970.
[69] H. L. Royden, Real Analysis, 2nd edition, Macmillan, New York, 1968.
[70] Yu. A. Rozanov, Infinite-Dimensional Gaussian Distributions, AMS Translations, Providence, RI, 1971.
[71] G. Samorodnitsky and M. S. Taqqu, Stable Non-Gaussian Random Processes, Chapman and Hall, New York, 1994.
[72] H. Sato, Gaussian measures on a Banach space and abstract Wiener measure, Nagoya Math. J. 36 (1969), 65–81.
[73] V. V. Sazonov, Normal Approximation: Some Recent Results, Springer LNM, Vol. 829, 1981.
[74] V. V. Sazonov and B. A. Zaleski, On the Central Limit Theorem in Hilbert Space, Technical Report, Moscow, 1983.
[75] L. Schwartz, Radon Measures on Arbitrary Topological Spaces and Cylindrical Measures, Tata Institute of Fundamental Research, Bombay, 1973.
[76] M. Schilder, Some asymptotic formulas for Wiener integrals, Trans. Amer. Math. Soc. 125 (1966), 63–85.
[77] A. V. Skorokhod, Studies in the Theory of Random Processes, Addison-Wesley, Reading, MA, 1965.
[78] A. V. Skorokhod, On a representation of random variables, Teor. Veroyat. 21 (1976), 628–631.
[79] J. B. Turett, Fenchel-Orlicz spaces, Dissertationes Math. 181 (1980), 1–85.
[80] J. J. Uhl, Jr., Orlicz spaces of finitely additive set functions, Studia Math. 29 (1967), 19–58.
[81] S. R. S. Varadhan, Large Deviations and Applications, SIAM Publications, Philadelphia, PA, 1984.
[82] S. R. S. Varadhan, Large deviations, Ann. Prob. 36 (2008), 397–419.
[83] J. R. Velman, Likelihood Ratios Determined by Differentiable Families of Measures, Hughes Aircraft Co. Report, Los Angeles, CA, 1969.
[84] N. Wiener, The homogeneous chaos, Amer. J. Math. 60 (1938), 997–1030.
[85] J. Yeh, Wiener measure in a space of functions of two variables, Trans. Amer. Math. Soc. 95 (1960), 432–450.
[86] J. Yeh, Inversion of conditional expectations, Pacific J. Math. 52 (1975), 631–640.
[87] S. Zabell, Continuous versions of regular conditional distributions, Ann. Prob. 7 (1979), 157–165.
CHAPTER 7 BISPECTRAL DENSITY ESTIMATION IN HARMONIZABLE PROCESSES
H. SOEDJAK
1. Introduction

The class of harmonizable processes extends the class of weakly stationary processes. The estimation problems for weakly stationary processes have been studied extensively, but similar studies for harmonizable processes have not been carried out. The purpose of this paper is to contribute to the area by establishing that the limit distribution of a bispectral density estimator for strongly harmonizable processes is normal. The consistency of such an estimator was shown in Soedjak [9].

In the stationary case the estimator can be formed from a single sample path. However, in the strongly harmonizable case a single sample path does not seem to be sufficient to construct a consistent estimator. The procedure used in this paper is to resample the sample paths of the process, thus forming many sample paths. In practice, the sample paths formed by such resampling are taken to be independent; our method allows dependence among the sample paths, in particular α-mixing dependence. The study here is conducted for discrete strongly harmonizable processes; the extension to the continuous parameter case can be obtained without significant modifications.

Let $L^2_0(P)$ be the Hilbert space of square integrable (real or complex) zero mean random variables on the underlying probability space $(\Omega, \Sigma, P)$, with the usual inner product $(X, Y) = E(X\bar Y)$ and norm $\|X\| = (X, X)^{1/2}$. We consider $\{X_t, t \in \mathbb Z\} \subset L^2_0(P)$ to be a strongly harmonizable random sequence, i.e., the covariance function $r(s,t) = E(X_s\bar X_t)$ has the representation
$$r(s,t) = \int_{-\pi}^{\pi}\int_{-\pi}^{\pi} e^{i\lambda s - i\lambda' t}\,dF(\lambda, \lambda'), \tag{1.1}$$
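where $F$, the bispectral distribution, determines a bounded complex measure on the σ-algebra generated by the open sets of the square $(-\pi, \pi]^2$. In this case, the bispectral distribution $F$ can be expressed in terms of the covariance function $r(\cdot,\cdot)$. The inversion formula given in Rao [7] is as follows:
$$F(A, A') = \lim_{m\to\infty}\frac{1}{(2\pi)^2}\sum_{s=-m}^{m}\sum_{t=-m}^{m}\frac{e^{-i\lambda_2 s} - e^{-i\lambda_1 s}}{-is}\cdot\frac{e^{i\lambda'_2 t} - e^{i\lambda'_1 t}}{it}\,r(s,t), \tag{1.2}$$
where $A$ and $A'$ are continuity intervals, that is, $A = (\lambda_1, \lambda_2)$, $A' = (\lambda'_1, \lambda'_2)$ and $F(\lambda_1\pm, \lambda_2\pm) = F(\lambda_1, \lambda_2)$, $F(\lambda'_1\pm, \lambda'_2\pm) = F(\lambda'_1, \lambda'_2)$; for $s = 0$ (resp. $t = 0$) the corresponding quotient is read as $\lambda_2 - \lambda_1$ (resp. $\lambda'_2 - \lambda'_1$). Since each quotient is the integral of an exponential, (1.2) can be rewritten as
$$\begin{aligned} F(A, A') &= \lim_{m\to\infty}\frac{1}{(2\pi)^2}\sum_{s=-m}^{m}\sum_{t=-m}^{m}\int_{\lambda_1}^{\lambda_2} e^{-i\lambda s}\,d\lambda\int_{\lambda'_1}^{\lambda'_2} e^{i\lambda' t}\,d\lambda'\;r(s,t)\\ &= \lim_{m\to\infty}\int_{\lambda_1}^{\lambda_2}\int_{\lambda'_1}^{\lambda'_2}\frac{1}{(2\pi)^2}\sum_{s=-m}^{m}\sum_{t=-m}^{m} e^{-i\lambda s}e^{i\lambda' t}\,r(s,t)\,d\lambda\,d\lambda'. \end{aligned}$$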
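It is now assumed that
$$\sum_{s=-\infty}^{\infty}\sum_{t=-\infty}^{\infty}|r(s,t)| < \infty; \tag{1.3}$$
then, by the Lebesgue dominated convergence theorem,
$$F(A, A') = \int_{\lambda_1}^{\lambda_2}\int_{\lambda'_1}^{\lambda'_2} f(\lambda, \lambda')\,d\lambda\,d\lambda'$$
with
$$f(\lambda, \lambda') = \frac{1}{(2\pi)^2}\sum_{s=-\infty}^{\infty}\sum_{t=-\infty}^{\infty} e^{-i\lambda s}e^{i\lambda' t}\,r(s,t). \tag{1.4}$$
It follows that the representation (1.1) of $r(s,t)$ becomes
$$r(s,t) = \int_{-\pi}^{\pi}\int_{-\pi}^{\pi} e^{i\lambda s - i\lambda' t} f(\lambda, \lambda')\,d\lambda\,d\lambda'. \tag{1.5}$$
The bispectral density $f$ is continuous on $(-\pi, \pi]^2$, since under (1.3) the series (1.4) converges absolutely and uniformly. Condition (1.3) requires the covariance to vanish as $|s| + |t| \to \infty$. It therefore excludes the class of periodically correlated stochastic processes, whose covariance does not vanish as $s, t \to \pm\infty$; see Hurd [1].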
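To make (1.4) and (1.5) concrete, the sketch below uses a deliberately simple toy covariance that is an assumption for illustration, not an example from the text: $r(s,t) = g(s)g(t)$ with $g(s) = 2^{-|s|}$, the covariance of $X_t = g(t)\xi$ for a single standard normal $\xi$. It satisfies (1.3), and (1.4) then factorizes as $f(\lambda, \lambda') = \hat g(\lambda)\hat g(\lambda')/(2\pi)^2$ with $\hat g(\lambda) = \sum_s 2^{-|s|}e^{-i\lambda s} = 0.75/(1.25 - \cos\lambda)$. The code truncates the double series at $|s|, |t| \le m$, compares with the closed form, and then recovers $r(s,t)$ from (1.5) by the midpoint rule on an $N \times N$ grid, which is exact for trigonometric polynomials of degree below $N$. The names `m`, `N`, `r_from_f` are hypothetical.

```python
import cmath
import math

m = 20          # truncation |s|, |t| <= m of the series (1.4); hypothetical choice
N = 48          # grid points per axis for the quadrature of (1.5)

def r(s, t):
    """Toy strongly harmonizable covariance r(s,t) = g(s) g(t), g(s) = 2^{-|s|}."""
    return 2.0 ** (-abs(s)) * 2.0 ** (-abs(t))

h = 2 * math.pi / N
grid = [-math.pi + (k + 0.5) * h for k in range(N)]   # midpoints in (-pi, pi]
idx = range(-m, m + 1)

# (1.4) in two passes: inner[s][l] = sum_t e^{i lam'_l t} r(s, t), then sum over s.
inner = {s: [sum(cmath.exp(1j * lp * t) * r(s, t) for t in idx) for lp in grid] for s in idx}
f = [[sum(cmath.exp(-1j * lam * s) * inner[s][l] for s in idx) / (2 * math.pi) ** 2
      for l in range(N)] for lam in grid]

def g_hat(lam):
    """Closed form of sum_s 2^{-|s|} e^{-i lam s}."""
    return 0.75 / (1.25 - math.cos(lam))

def r_from_f(s, t):
    """(1.5): double integral of e^{i lam s - i lam' t} f(lam, lam') by the midpoint rule."""
    total = sum(cmath.exp(1j * (grid[k] * s - grid[l] * t)) * f[k][l]
                for k in range(N) for l in range(N))
    return total * h * h

k0, l0 = 5, 30
closed = g_hat(grid[k0]) * g_hat(grid[l0]) / (2 * math.pi) ** 2
print(abs(f[k0][l0] - closed), r_from_f(1, -2))
```

Splitting the double series into two single sums keeps the cost at a few hundred thousand exponentials; this is only an efficiency device and computes the same truncated sum as (1.4).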
The random sequence itself is representable by the integral
$$X_t = \int_{-\pi}^{\pi} e^{it\lambda}\,dZ(\lambda), \tag{1.6}$$
where $Z: \mathcal B((-\pi,\pi]) \to L^2_0(P)$ is a σ-additive set function on the Borel σ-algebra $\mathcal B((-\pi,\pi])$ of $(-\pi,\pi]$, related to $F$ as follows:
$$E\big(Z(A)\overline{Z(B)}\big) = \int_A\int_B dF(\lambda, \lambda'), \qquad A, B \in \mathcal B((-\pi,\pi]).$$
Section 2 presents the assumptions and a resampling procedure. The asymptotic distribution is derived in Section 3.

2. Assumptions and a Resampling Procedure

In the stationary case, the covariance function $r(s,t)$ is translation invariant, i.e., $r(s,t) = \tilde r(s - t)$, and the sampling can be based on a single sample path of the process. In the strongly harmonizable case, however, the covariance function depends on two variables that are not so related, and a single sample path does not seem adequate to construct a consistent estimator. The estimator will therefore be formed from a sequence of sample paths of the process, each of which is strongly harmonizably correlated. Here one allows a general form of dependence among the sample paths: they are taken to be dependent in such a way that the dependence decreases as the sample paths are farther apart.

The procedure in detail is as follows. The sampling consists of a sequence of $n$ sample paths of the process, each starting at time $-m$ and ending at time $m$:
$$\begin{aligned} \mathbf X^1_m &= \{X^1_{-m}, \dots, X^1_m\},\\ &\;\;\vdots\\ \mathbf X^n_m &= \{X^n_{-m}, \dots, X^n_m\}. \end{aligned} \tag{2.1}$$
The usual way of obtaining these sequences is by resampling, which in practice means that each row is independent of the preceding ones and has the same distribution. Here the resampling is allowed to have some dependence, and the distributions need not be the same. Thus the methods presented apply to a somewhat more general situation, which may be termed a 'generalized resampling' procedure. More precisely, from the point of view of mathematical analysis, this sequence of (varying) random vectors is assumed to satisfy the following conditions.
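(A) As one takes more sample paths, the length of the sample paths also increases, that is, $m = m(n) \to \infty$ as $n \to \infty$;

(B) For each $j$, the sample path $\{X^j_s, s = -m, \dots, m\}$ is strongly harmonizably correlated, i.e.,
$$E\big[X^j_s\overline{X^j_t}\big] = \int_{-\pi}^{\pi}\int_{-\pi}^{\pi} e^{i\lambda s - i\lambda' t} f(\lambda, \lambda')\,d\lambda\,d\lambda',$$
where $f(\lambda, \lambda')$ does not depend on $j$. Hence $r(s,t) = E[X^j_s\overline{X^j_t}]$ for all $j = 1, \dots, n$. Moreover, the covariance satisfies the condition
$$\sum_{s=-\infty}^{\infty}\sum_{t=-\infty}^{\infty}|r(s,t)| < \infty;$$

(C) The sample paths $\mathbf X^j_m$, $j = 1, \dots, n$, are α-mixing. That is, letting
$$\mathcal F^u_{1,m} = \sigma\big(\{\mathbf X^j_m\}_{j=1}^{u}\big) \quad\text{and}\quad \mathcal F^\infty_{u+k,m} = \sigma\big(\{\mathbf X^j_m\}_{j=u+k}^{\infty}\big),$$
define the mixing coefficient
$$\alpha(k) = \sup|P(A \cap B) - P(A)P(B)|,$$
where the sup is taken over all $u$, $A \in \mathcal F^u_{1,m}$ and $B \in \mathcal F^\infty_{u+k,m}$. Then $\alpha(k) \to 0$ as $k \to \infty$.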
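A simple way to produce rows that satisfy (B) and (C) with genuine dependence (a construction chosen here for illustration, not taken from the text) is to modulate a fixed deterministic profile $g(s)$ by a 1-dependent sequence: $\eta_j = (\xi_j + \xi_{j+1})/\sqrt2$ with $\xi_j$ i.i.d. $N(0,1)$, and $X^j_s = g(s)\eta_j$. Every row then has covariance $E[X^j_sX^j_t] = g(s)g(t)$, independent of $j$, while rows $j$ and $j+k$ are independent for $k \ge 2$, so $\alpha(k) = 0$ for $k \ge 2$. The sketch checks the lag-1 and lag-2 correlations of $\eta$, whose theoretical values are $1/2$ and $0$; the profile `g` and the function names are hypothetical.

```python
import math
import random

def one_dependent_rows(n, m, g, seed=0):
    """Return (rows, eta): n real-valued sample paths X^j = (X^j_{-m}, ..., X^j_m).

    X^j_s = g(s) * eta_j with eta_j = (xi_j + xi_{j+1}) / sqrt(2), xi_j iid N(0, 1).
    eta is 1-dependent with unit variance, so every row has covariance g(s) g(t)
    and rows two or more apart are independent (alpha(k) = 0 for k >= 2).
    """
    rng = random.Random(seed)
    xi = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    eta = [(xi[j] + xi[j + 1]) / math.sqrt(2) for j in range(n)]
    rows = [[g(s) * eta[j] for s in range(-m, m + 1)] for j in range(n)]
    return rows, eta

def lag_corr(x, k):
    """Empirical lag-k correlation of a centred sequence (mean assumed zero)."""
    num = sum(x[j] * x[j + k] for j in range(len(x) - k)) / (len(x) - k)
    den = sum(v * v for v in x) / len(x)
    return num / den

def g(s):
    """Hypothetical deterministic path profile with summable g(s) g(t)."""
    return 2.0 ** (-abs(s))

rows, eta = one_dependent_rows(n=20000, m=5, g=g, seed=1)
print(lag_corr(eta, 1), lag_corr(eta, 2))
```

Any finite moving average of i.i.d. variables gives an $M$-dependent sequence in the same way; the 1-dependent case is simply the smallest nontrivial example.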
α-mixing allows some dependence among the sample paths, and this dependence becomes weaker as the sample paths are farther apart. Notice that the special case of $M$-dependence holds if $\alpha(k) = 0$ for all $k > M$, where $M$ is some fixed integer. This automatically includes the case in which the $\mathbf X^j_m$ are independent in the $j$ variable, that is, $\alpha(k) = 0$ for $k = 1, 2, \dots$. A sequence of α-mixing random vectors can be constructed from independent random vectors; see Pham and Tran [4] for such a construction. A comprehensive account of the work for the stationary case can be found in the recent work by Yaglom [10]. See also the paper by Politis and Romano [5] for a procedure using 'large' and 'small' blocks in estimation problems for stationary processes. The estimation problem for a nonstationary class has also been carried out for (almost) periodically correlated processes (see Hurd and Leskow [2]). However, the resampling method used there is not
adequate for the generalized resampling required in the strongly harmonizable case. Consider the statistic
$$r_n(s,t) = \frac1n\sum_{j=1}^{n} X^j_s\overline{X^j_t}. \tag{2.2}$$
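The statistic (2.2) is an average over rows. The sketch below reuses a toy 1-dependent scheme, assumed for illustration only: $X^j_s = g(s)\eta_j$ with $g(s) = 2^{-|s|}$ and $\eta$ a 1-dependent unit-variance sequence, so that $r(s,t) = g(s)g(t)$. It compares $r_n$ with $r$ at a few lags. The paths are real-valued, so the complex conjugate in (2.2) plays no role; `toy_rows` and `r_n` are hypothetical names.

```python
import math
import random

def toy_rows(n, m, seed=0):
    """n real rows X^j_s = g(s) eta_j, s = -m..m, with a 1-dependent unit-variance eta."""
    rng = random.Random(seed)
    xi = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    eta = [(xi[j] + xi[j + 1]) / math.sqrt(2) for j in range(n)]
    return [{s: 2.0 ** (-abs(s)) * eta[j] for s in range(-m, m + 1)} for j in range(n)]

def r_n(rows, s, t):
    """Estimator (2.2) in the real-valued case: average of X^j_s X^j_t over the n rows."""
    return sum(row[s] * row[t] for row in rows) / len(rows)

rows = toy_rows(n=20000, m=4, seed=7)
for s, t in [(0, 0), (1, -2), (3, 3)]:
    true = 2.0 ** (-abs(s)) * 2.0 ** (-abs(t))
    print((s, t), round(r_n(rows, s, t), 4), true)
```

Although neighbouring rows are correlated, the averages still settle near $r(s,t)$, which is the behaviour that conditions (A)-(C) are designed to guarantee.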
In Soedjak [9] it was shown that, under the above resampling procedure satisfying conditions (A), (B) and (C), this is a consistent estimator of $r(s,t)$. The consistency of $r_n(\cdot,\cdot)$ and the inversion formula (1.4) suggest considering the statistic
$$\hat f_{n,m}(\lambda, \lambda') = \frac{1}{(2\pi)^2}\sum_{s=-m}^{m}\sum_{t=-m}^{m} e^{-i\lambda s}e^{i\lambda' t}\,r_n(s,t) \tag{2.3}$$
as an estimator of the bispectral density $f$. In general, the consistency of $r_n(\cdot,\cdot)$ does not by itself imply the consistency of the estimator $\hat f_{n,m}(\lambda, \lambda')$, since the number of estimated covariances in (2.3) grows with $m$. However, when the growth of $m = m(n)$ relative to that of $n$ is controlled, the estimator (2.3) is consistent, as was established in Soedjak [9].

So far, the $X^j_s$ have been complex-valued in general. For the study of the limit distribution of the bispectral density estimator, the $X^j_s$ are taken to be real-valued. When the $X_s$ are real-valued, the following changes are in effect. The σ-additive set function in the representation (1.6) of $X_s$ can be expressed as
$$Z(A) = Z_1(A) + iZ_2(A), \qquad A \in \mathcal B((-\pi,\pi]), \tag{2.4}$$
where $Z_j$, $j = 1, 2$, are real-valued and satisfy the following conditions:
(i) $Z_1(\lambda) = Z_1(-\lambda)$ and $Z_2(\lambda) = -Z_2(-\lambda)$, with $-\lambda$ denoting the interval symmetric to $\lambda$ with respect to the point $\lambda = 0$;
(ii) $E\,Z_1(A)Z_2(B) = 0$, $A, B \in \mathcal B((-\pi,\pi])$;
(iii) $E[Z_1(A)]^2 = E[Z_2(A)]^2$, $A \in \mathcal B((-\pi,\pi])$.
This implies that the bispectral density $f$ reduces to a real-valued, symmetric function
$$f(\lambda, \lambda') = \frac{1}{(2\pi)^2}\sum_{s,t=-\infty}^{\infty}(\cos s\lambda\cos t\lambda' + \sin s\lambda\sin t\lambda')\,r(s,t). \tag{2.5}$$
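Now, expanding the representation (1.5) of the covariance $r(s,t)$ and taking into account that $f$ is real, one has
$$r(s,t) = \int_{-\pi}^{\pi}\int_{-\pi}^{\pi}\big[(\cos s\lambda\cos t\lambda' + \sin s\lambda\sin t\lambda') + i(\sin s\lambda\cos t\lambda' - \cos s\lambda\sin t\lambda')\big]f(\lambda, \lambda')\,d\lambda\,d\lambda'.$$
Since $f(\lambda, \lambda') = f(-\lambda, -\lambda')$, the functions $\sin s\lambda\cos t\lambda'\,f(\lambda, \lambda')$ and $\cos s\lambda\sin t\lambda'\,f(\lambda, \lambda')$ change sign under the reflection $(\lambda, \lambda') \mapsto (-\lambda, -\lambda')$, which maps the square $(-\pi,\pi]^2$ onto itself up to a null set. Hence the integral of the imaginary part is zero, and the representation of the covariance becomes
$$r(s,t) = \int_{-\pi}^{\pi}\int_{-\pi}^{\pi}(\cos s\lambda\cos t\lambda' + \sin s\lambda\sin t\lambda')\,f(\lambda, \lambda')\,d\lambda\,d\lambda'. \tag{2.6}$$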
The formula (2.5) for the bispectral density $f$ suggests considering the statistic
$$\hat f_{n,m}(\lambda, \lambda') = \frac{1}{(2\pi)^2}\sum_{s,t=-m}^{m}(\cos s\lambda\cos t\lambda' + \sin s\lambda\sin t\lambda')\,r_n(s,t), \tag{2.7}$$
where $r_n(s,t) = \frac1n\sum_{j=1}^{n} X^j_sX^j_t$, as an estimator of the (real-valued) bispectral density $f$ given in (2.5). The consistency of the complex-valued and of the real-valued bispectral density estimators $\hat f_{n,m}(\lambda, \lambda')$ of the forms (2.3) and (2.7), respectively, was shown in Soedjak [9]. The consistency of the latter allows further investigation of its limit distribution. It is established that, under further conditions, the sequence of random variables
$$\frac{\hat f_{n,m}(\lambda, \lambda') - A_{n,m}(\lambda, \lambda')}{B_{n,m}(\lambda, \lambda')}$$
converges weakly to the standard normal distribution for suitable real constants $A_{n,m}(\lambda, \lambda')$ and positive constants $B_{n,m}(\lambda, \lambda')$. The result is given as a corollary of Theorem 10. For any fixed $\lambda, \lambda' \in (-\pi, \pi]$, consider
$$Z^j_m(\lambda, \lambda') = \frac{1}{(2\pi)^2}\sum_{s=-m}^{m}\sum_{t=-m}^{m}(\cos s\lambda\cos t\lambda' + \sin s\lambda\sin t\lambda')\,X^j_sX^j_t, \tag{2.8}$$
Bispectral Density Estimation in Harmonizable Processes
and then the partial sum
$$S_{n,m}(\lambda,\lambda') = \sum_{j=1}^{n} Z_m^j(\lambda,\lambda'). \tag{2.9}$$
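As an illustration only, the following Python sketch (standard library only) evaluates the real-valued estimator (2.7) and the partial sum (2.9) built from the $Z_m^j$ of (2.8), and checks numerically that $\hat f_{n,m} = \frac{1}{n}S_{n,m}$. The data layout (one dict per replication, mapping $s \in \{-m,\dots,m\}$ to $X_s^j$) and the function names are assumptions made for the example; the i.i.d. Gaussian inputs are hypothetical test data, not a harmonizable model. The comment on `kernel` notes that the weight equals $\cos(s\lambda - t\lambda')$.

```python
import math
import random


def kernel(s, t, lam, lam_p):
    """cos(s*lam)cos(t*lam') + sin(s*lam)sin(t*lam'), which equals cos(s*lam - t*lam')."""
    return math.cos(s * lam) * math.cos(t * lam_p) + math.sin(s * lam) * math.sin(t * lam_p)


def z_m(x, m, lam, lam_p):
    """Z_m^j(lam, lam') of (2.8) for one replication x (dict: s -> X_s^j)."""
    total = 0.0
    for s in range(-m, m + 1):
        for t in range(-m, m + 1):
            total += kernel(s, t, lam, lam_p) * x[s] * x[t]
    return total / (2 * math.pi) ** 2


def s_nm(samples, m, lam, lam_p):
    """Partial sum S_{n,m}(lam, lam') of (2.9)."""
    return sum(z_m(x, m, lam, lam_p) for x in samples)


def f_hat(samples, m, lam, lam_p):
    """Estimator (2.7) with r_n(s, t) = (1/n) * sum_j X_s^j X_t^j."""
    n = len(samples)
    total = 0.0
    for s in range(-m, m + 1):
        for t in range(-m, m + 1):
            r_n = sum(x[s] * x[t] for x in samples) / n
            total += kernel(s, t, lam, lam_p) * r_n
    return total / (2 * math.pi) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    m, n = 3, 50
    # Hypothetical test data: n replications of (X_{-m}, ..., X_m).
    samples = [{s: rng.gauss(0.0, 1.0) for s in range(-m, m + 1)} for _ in range(n)]
    print(f_hat(samples, m, 0.7, -1.2), s_nm(samples, m, 0.7, -1.2) / n)
```

The two printed numbers agree up to rounding; computing through $S_{n,m}$ is the form used in the block decomposition that follows.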
Note that the bispectral density estimator $\hat f_{n,m}(\lambda,\lambda')$ is equal to $\frac{1}{n}S_{n,m}(\lambda,\lambda')$. The procedure to be used here is a generalization of that found in Ibragimov and Linnik ([3], Ch. 18), given there for stationary sequences. The harmonizable case is considerably more involved. The partial sum (2.9) is partitioned into alternating blocks of lengths $p$ and $q$, starting with a block of length $p$ (hereafter the dependence of the r.v.'s on $\lambda,\lambda'$ will be suppressed):
$$\begin{aligned}\sum_{j=1}^{n} Z_m^j ={}& \sum_{j=1}^{p} Z_m^j + \sum_{j=p+1}^{p+q} Z_m^j + \sum_{j=p+q+1}^{2p+q} Z_m^j + \sum_{j=2p+q+1}^{2p+2q} Z_m^j + \cdots\\ &+ \sum_{j=lp+lq+1}^{(l+1)p+lq} Z_m^j + \sum_{j=(l+1)p+lq+1}^{(l+1)p+(l+1)q} Z_m^j + \cdots\\ &+ \sum_{j=(k-1)p+(k-1)q+1}^{kp+(k-1)q} Z_m^j + \sum_{j=kp+(k-1)q+1}^{kp+kq} Z_m^j + \sum_{j=kp+kq+1}^{n} Z_m^j,\end{aligned}$$
where $k = \left[\frac{n}{p+q}\right]$. Consequently, the last block has length $q'$ given by
$$q' = n - (p+q)k = n - (p+q)\left[\frac{n}{p+q}\right] \le n - (p+q)\left(\frac{n}{p+q} - 1\right) = p + q.$$
The $p$-blocks are represented by the partial sums
$$\sum_{j=lp+lq+1}^{(l+1)p+lq} Z_m^j \quad \text{for } 0 \le l \le k-1,$$
the $q$-blocks by
$$\sum_{j=(l+1)p+lq+1}^{(l+1)p+(l+1)q} Z_m^j \quad \text{for } 0 \le l \le k-1,$$
and the $q'$-block by
$$\sum_{j=kp+kq+1}^{n} Z_m^j.$$
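To simplify the notation, denote by $U_{l,n,m}$ a $p$-block and by $W_{l,n,m}$ a $q$-block, as follows:
$$U_{l,n,m} = \sum_{j=lp+lq+1}^{(l+1)p+lq} Z_m^j, \quad 0 \le l \le k-1,$$
$$W_{l,n,m} = \begin{cases} \displaystyle\sum_{j=(l+1)p+lq+1}^{(l+1)p+(l+1)q} Z_m^j, & 0 \le l \le k-1,\\[2mm] \displaystyle\sum_{j=kp+kq+1}^{n} Z_m^j, & l = k.\end{cases}$$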
Note that the $q'$-block $\sum_{j=kp+kq+1}^{n} Z_m^j$ is attached to the $W_{l,n,m}$'s, so that $W_{k,n,m} = \sum_{j=kp+kq+1}^{n} Z_m^j$. Equation (2.9) can now be written in the above notation as
$$S_{n,m}(\lambda,\lambda') = \sum_{j=1}^{n} Z_m^j(\lambda,\lambda') = \sum_{l=0}^{k-1} U_{l,n,m} + \sum_{l=0}^{k} W_{l,n,m}.$$
To simplify further, let $S'_{n,m}$ denote the sum of all $p$-blocks and $S''_{n,m}$ the sum of all $q$-blocks including the $q'$-block, that is,
$$S'_{n,m} = \sum_{l=0}^{k-1} U_{l,n,m}, \qquad S''_{n,m} = \sum_{l=0}^{k} W_{l,n,m}.$$
Thus one has
$$S_{n,m} = S'_{n,m} + S''_{n,m}. \tag{2.10}$$
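The index bookkeeping behind the decomposition (2.10) can be checked with a short Python sketch. It assumes 1-based indices $j = 1,\dots,n$ and returns inclusive index ranges; `block_ranges` is a hypothetical helper introduced only for illustration. The usage checks that the $p$-blocks and the $q$-blocks (with the final $q'$-block attached to the $W$'s) partition $\{1,\dots,n\}$ and that $q' \le p+q$.

```python
def block_ranges(n, p, q):
    """Return k and the inclusive 1-based index ranges of the blocks U_l and W_l.

    U_l : j = lp+lq+1, ..., (l+1)p+lq            for 0 <= l <= k-1
    W_l : j = (l+1)p+lq+1, ..., (l+1)p+(l+1)q    for 0 <= l <= k-1
    W_k : j = kp+kq+1, ..., n                    (the q'-block, possibly empty)
    """
    k = n // (p + q)  # k = [n / (p + q)]
    u_blocks = [(l * (p + q) + 1, (l + 1) * p + l * q) for l in range(k)]
    w_blocks = [((l + 1) * p + l * q + 1, (l + 1) * (p + q)) for l in range(k)]
    w_blocks.append((k * (p + q) + 1, n))
    return k, u_blocks, w_blocks


if __name__ == "__main__":
    k, U, W = block_ranges(103, 10, 3)
    covered = sorted(j for a, b in U + W for j in range(a, b + 1))
    assert covered == list(range(1, 104))  # every index is used exactly once
    q_prime = W[-1][1] - W[-1][0] + 1
    print(k, q_prime)  # 7 12, and q' = 12 <= p + q = 13
```

When $p+q$ divides $n$ the last range is empty, which corresponds to $q' = 0$.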
The parameters $p$, $q$, $k$, $n$ and $m$ are assumed to satisfy:
(A) $p = p(n) \to \infty$ as $n \to \infty$, such that $p = o(n)$ as $n \to \infty$; $q = \left[\frac{n}{p}\right]$, such that $q = o(p)$ as $n \to \infty$; and $k = \left[\frac{n}{p+q}\right]$.
(B) $m = m(n) \to \infty$ as $n \to \infty$, such that $m^4 \sim n^{\mu}$ for some $0 < \mu < 1$ as $n \to \infty$.
Note that since $q = o(p)$ as $n \to \infty$, $k$ and $q$ have the same rate of growth. It also follows that $n = o(p^2)$ and $kq = o(n)$ as $n \to \infty$. This implies that there exists $0 < \nu < 1$ such that $p^{1+\nu} \sim n$ as $n \to \infty$. An example that satisfies (A) is $p = n^{1/2+\varepsilon}$ for $0 < \varepsilon < 1/2$; then, as $n \to \infty$, $p = o(n)$ and $q = [n^{1/2-\varepsilon}] = o(p)$. In this example $\nu$ is the solution of the equation $(1/2+\varepsilon)(1+\nu) = 1$, namely $\nu = (1-2\varepsilon)/(1+2\varepsilon)$. Condition (B) implies that $m^4 = o(n)$ as $n \to \infty$.
The strategy of the argument is as follows. Consider, from (2.10), the normalized centered sum
$$\frac{S_{n,m} - ES_{n,m}}{\sigma_{n,m}} = \frac{S'_{n,m} - ES'_{n,m}}{\sigma_{n,m}} + \frac{S''_{n,m} - ES''_{n,m}}{\sigma_{n,m}}, \tag{2.11}$$
where $\sigma_{n,m}^2 = \operatorname{Var} S_{n,m}$, which also depends on $\lambda$ and $\lambda'$. The idea of the proof is to show that $\frac{S''_{n,m} - ES''_{n,m}}{\sigma_{n,m}}$ converges to $0$ in probability as $n \to \infty$; in other words, the contribution of the $q$-blocks is asymptotically 'negligible' (Lemma 6). It then follows from Slutsky's theorem that the r.v.'s $\frac{S_{n,m} - ES_{n,m}}{\sigma_{n,m}}$ and $\frac{S'_{n,m} - ES'_{n,m}}{\sigma_{n,m}}$ have the same limit distribution. One is then left to establish that the r.v. $\frac{S'_{n,m} - ES'_{n,m}}{\sigma_{n,m}}$ has a standard normal limit distribution. It will be shown in Lemma 7 that this is possible because, when $n$ and $m$ are large, the negligibility of the $q$-blocks and the α-mixing condition cause the $p$-blocks to behave as if they were independent. However, further specialization is needed before the general case above is considered: one first looks at the case $|X_s^j| \le c_0 < \infty$ for all $j, s$ (Theorem 9), and then at the case where the r.v.'s $X_s^j$ are not necessarily bounded but their
$4(1+\delta)$th moments satisfy $E|X_s^j|^{4(1+\delta)} = K < \infty$ for all $j, s$ (Theorem 10). The latter case includes the Gaussian process. To this end the following conditions are assumed.
(1) The $X_s^j$ are real, with corresponding σ-additive set function $Z$ as given in (2.4).
(2) For $j = 1,\dots,n$, $\{X_s^j, s = -m,\dots,m\}$ is strongly harmonizably correlated,
$$E[X_s^jX_t^j] = \int_{-\pi}^{\pi}\int_{-\pi}^{\pi} e^{is\lambda}e^{-it\lambda'} f(\lambda,\lambda')\,d\lambda\,d\lambda',$$
where $f(\lambda,\lambda')$ is independent of $j$, so that $E[X_s^jX_t^j]$ is the same for all vector components, that is, $r(s,t) = E[X_s^jX_t^j]$ for all $j = 1,\dots,n$.
(3) For $j \ne j'$ and $j, j' = 1,\dots,n$,
$$E[X_s^jX_{s'}^{j'}] = \begin{cases} r(|j-j'|), & s = s',\\ 0, & s \ne s',\end{cases}$$
where $\sum_{k=1}^{\infty} r^2(k) < \infty$.
(4) The $X_s^j$ are α-mixing in the $j$ variable, with mixing coefficient $\alpha(\cdot)$ satisfying the growth condition
$$\alpha(k) \le \frac{c}{k^{1+\beta}}, \quad c > 0, \quad \beta \ge \frac{\mu}{2}(1+\nu), \quad k = 1,2,\dots,$$
where $\nu$ and $\mu$ are defined in Condition (B) and the remarks following it.
(5) $$\sum_{j,j'=1}^{n}\ \sum_{s,t,s',t'=-m}^{m} |\operatorname{cum}(X_s^j, X_t^j, X_{s'}^{j'}, X_{t'}^{j'})| = o(nm^2) \quad \text{as } n, m \to \infty,$$
where the cumulants $\operatorname{cum}(\cdot,\cdot,\cdot,\cdot)$ are defined as follows. Let $\varphi(t) = \varphi(t_1,\dots,t_k)$ be the characteristic function of the random vector $X = (X_1,\dots,X_k)^{T}$, $\varphi(t) = Ee^{it\cdot X}$.
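By using the Taylor expansion of $e^{it\cdot X}$ about $0$, truncated at order $n$, we have, with $|t| = |t_1| + \cdots + |t_k|$,
$$\begin{aligned}\varphi(t) &= E\sum_{\nu_1+\cdots+\nu_k \le n}\frac{i^{\nu_1+\cdots+\nu_k}}{\nu_1!\cdots\nu_k!}X_1^{\nu_1}\cdots X_k^{\nu_k}\,t_1^{\nu_1}\cdots t_k^{\nu_k} + o(|t|^n)\\ &= \sum_{\nu_1+\cdots+\nu_k \le n}\frac{i^{\nu_1+\cdots+\nu_k}}{\nu_1!\cdots\nu_k!}E[X_1^{\nu_1}\cdots X_k^{\nu_k}]\,t_1^{\nu_1}\cdots t_k^{\nu_k} + o(|t|^n),\end{aligned}$$
that is, the moments $E[X_1^{\nu_1}\cdots X_k^{\nu_k}]$ are the coefficients in the Taylor expansion. The cumulants $\operatorname{cum}(X_1^{\nu_1},\dots,X_k^{\nu_k})$ are defined to be the coefficients of the Taylor expansion of $\log\varphi(t)$ about $0$,
$$\log\varphi(t) = \sum_{\nu_1+\cdots+\nu_k \le n}\frac{i^{\nu_1+\cdots+\nu_k}}{\nu_1!\cdots\nu_k!}\operatorname{cum}(X_1^{\nu_1},\dots,X_k^{\nu_k})\,t_1^{\nu_1}\cdots t_k^{\nu_k} + o(|t|^n).$$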
Here $\log\varphi(t)$ is taken as the principal value of the logarithm.
Remarks (a) $\varphi(t)$ can be taken as the moment generating function if the $X_j$'s are bounded, and hence have all moments finite; this is why the Taylor expansions of $\varphi(t)$ and $\log\varphi(t)$ can be taken without any problem. The cumulants and the moments in Assumption (5) are related; for details and proofs see M. Rosenblatt [8]. For our purposes the following relation suffices:
$$\operatorname{cum}(X_1,\dots,X_k) = \sum (-1)^{p-1}(p-1)!\, E\Big[\prod_{j\in\beta_1}X_j\Big]\cdots E\Big[\prod_{j\in\beta_p}X_j\Big],$$
where $(\beta_1,\dots,\beta_p)$ is a partition of $\{1,2,\dots,k\}$ and the sum is over all such partitions. In particular, when $k = 4$ all possible partitions of $\{1,2,3,4\}$ are as follows:
$p = 1$, $\beta_1$: $\{1,2,3,4\}$;
$p = 2$, $\beta_1,\beta_2$: $\{1\}\{2,3,4\}$; $\{2\}\{1,3,4\}$; $\{3\}\{1,2,4\}$; $\{4\}\{1,2,3\}$; $\{1,2\}\{3,4\}$; $\{1,3\}\{2,4\}$; $\{1,4\}\{2,3\}$;
$p = 3$, $\beta_1,\beta_2,\beta_3$: $\{1\}\{2\}\{3,4\}$; $\{1\}\{3\}\{2,4\}$; $\{1\}\{4\}\{2,3\}$; $\{2\}\{3\}\{1,4\}$; $\{2\}\{4\}\{1,3\}$; $\{3\}\{4\}\{1,2\}$;
$p = 4$, $\beta_1,\beta_2,\beta_3,\beta_4$: $\{1\}\{2\}\{3\}\{4\}$.
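The moment-cumulant relation above can be evaluated mechanically by enumerating set partitions. The Python sketch below is illustrative only: it assumes a discrete joint distribution given as a list of equally likely outcome vectors (a hypothetical input format), computes $\operatorname{cum}(X_1,\dots,X_k) = \sum(-1)^{p-1}(p-1)!\prod_i E[\prod_{j\in\beta_i}X_j]$, and confirms that $\{1,2,3,4\}$ has the 15 partitions listed. For a single symmetric $\pm1$ variable repeated four times the fourth cumulant is $1 - 3 = -2$.

```python
from math import factorial, prod


def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part  # `first` in a block of its own
        for i in range(len(part)):  # or joined to one of the existing blocks
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def moment(outcomes, idx):
    """E[prod_{j in idx} X_j] for a list of equally likely outcome vectors."""
    return sum(prod(x[j] for j in idx) for x in outcomes) / len(outcomes)


def joint_cumulant(outcomes, k):
    """cum(X_1, ..., X_k) as the sum over all partitions (beta_1, ..., beta_p)."""
    total = 0.0
    for part in set_partitions(list(range(k))):
        p = len(part)
        total += (-1) ** (p - 1) * factorial(p - 1) * prod(moment(outcomes, b) for b in part)
    return total


if __name__ == "__main__":
    print(len(list(set_partitions([1, 2, 3, 4]))))  # 15
    rademacher = [(1, 1, 1, 1), (-1, -1, -1, -1)]  # X_1 = X_2 = X_3 = X_4 = X
    print(joint_cumulant(rademacher, 4))  # -2.0
```

For mean-zero components every partition containing a singleton contributes zero, which is how the four-term form used below arises.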
Consequently, the above relation becomes
$$\begin{aligned}\operatorname{cum}(X_1,X_2,X_3,X_4) ={}& E[X_1X_2X_3X_4] - \{EX_1E[X_2X_3X_4] + EX_2E[X_1X_3X_4] + EX_3E[X_1X_2X_4] + EX_4E[X_1X_2X_3]\}\\ &- \{E[X_1X_2]E[X_3X_4] + E[X_1X_3]E[X_2X_4] + E[X_1X_4]E[X_2X_3]\}\\ &+ 2\{EX_1EX_2E[X_3X_4] + EX_1EX_3E[X_2X_4] + EX_1EX_4E[X_2X_3]\\ &\qquad + EX_2EX_3E[X_1X_4] + EX_2EX_4E[X_1X_3] + EX_3EX_4E[X_1X_2]\}\\ &- 6\,EX_1EX_2EX_3EX_4.\end{aligned}$$
Moreover, when $EX_j = 0$, $j = 1,2,3,4$, and the $X_j$ are real, it further simplifies to
$$\begin{aligned}\operatorname{cum}(X_1,X_2,X_3,X_4) &= E[X_1X_2X_3X_4] - E[X_1X_2]E[X_3X_4] - E[X_1X_3]E[X_2X_4] - E[X_1X_4]E[X_2X_3]\\ &= \operatorname{Cov}(X_1X_2, X_3X_4) - E[X_1X_3]E[X_2X_4] - E[X_1X_4]E[X_2X_3].\end{aligned} \tag{2.12}$$
(b) The requirement $\alpha(k) \le \frac{1}{k^{1+\beta}}$, $\beta > 0$, on the mixing coefficient in Assumption (4) implies that
$$\sum_{k=1}^{\infty}\alpha^{\delta/(2+\delta)}(k) < \infty \quad \text{for } \delta > (2+\beta)/\beta. \tag{2.13}$$
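For,
$$\sum_{k=1}^{\infty}\alpha^{\delta/(2+\delta)}(k) \le \sum_{k=1}^{\infty}k^{-(1+\beta)\delta/(2+\delta)} < \sum_{k=1}^{\infty}k^{-(1+\beta)(2+\beta)/(2+3\beta)},$$
since $x/(2+x)$ is an increasing function of $x > 0$, $\delta > (2+\beta)/\beta$, and $\frac{(2+\beta)/\beta}{2+(2+\beta)/\beta} = \frac{2+\beta}{2+3\beta}$. Thus
$$\sum_{k=1}^{\infty}\alpha^{\delta/(2+\delta)}(k) \le \sum_{k=1}^{\infty}k^{-\left(1+\frac{\beta^2}{2+3\beta}\right)} < \infty.$$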
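The exponent arithmetic in Remark (b) can be confirmed with exact rational arithmetic. This small Python check is illustrative; the sample values of $\beta$ are arbitrary. It evaluates $(1+\beta)\delta/(2+\delta)$ at the threshold $\delta = (2+\beta)/\beta$ and compares it with $1 + \beta^2/(2+3\beta)$, which exceeds 1, so the comparison series converges.

```python
from fractions import Fraction


def threshold_exponent(beta):
    """(1 + beta) * delta / (2 + delta) evaluated at delta = (2 + beta) / beta."""
    delta = (2 + beta) / beta
    return (1 + beta) * delta / (2 + delta)


def simplified_exponent(beta):
    """1 + beta^2 / (2 + 3 beta), the form used in Remark (b)."""
    return 1 + beta ** 2 / (2 + 3 * beta)


if __name__ == "__main__":
    for beta in (Fraction(1, 10), Fraction(1, 2), Fraction(1), Fraction(3)):
        e = threshold_exponent(beta)
        assert e == simplified_exponent(beta) and e > 1  # sum k^(-e) converges
        print(beta, e)
```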
(c) Except for the moment condition, all conditions for the consistency of the estimator $\hat f_{n,m}(\lambda,\lambda')$ are included in Assumptions (B), (2) and (4). These conditions and a moment condition will be used in Corollary 11.
3. The Limit Distribution of the Estimator
The proof of Theorem 9 below depends on the following seven lemmas, which are established first.
Lemma 1. If $|X_s^j| \le c_0 < \infty$ a.e. for all $j, s$, then as $m \to \infty$
(i) $|Z_m^j|^r \le (2m+1)^{2r}\,2^r\left(\frac{c_0}{2\pi}\right)^{2r}$ a.e., $r \ge 1$, $j = 1,\dots,n$;
(ii) $|U_{l,n,m}| \le p(2m+1)^2\,2\left(\frac{c_0}{2\pi}\right)^2$ a.e., $0 \le l \le k-1$;
(iii) $|W_{l,n,m}| \le \begin{cases} q(2m+1)^2\,2\left(\frac{c_0}{2\pi}\right)^2 \text{ a.e.}, & 0 \le l \le k-1,\\ (p+q)(2m+1)^2\,2\left(\frac{c_0}{2\pi}\right)^2 \text{ a.e.}, & l = k.\end{cases}$
Proof.
(i) Clearly, by Definition (2.8) of $Z_m^j$, one has for $j = 1,2,\dots,n$
$$|Z_m^j|^r \le (2m+1)^{2r}\,2^r\left(\frac{c_0}{2\pi}\right)^{2r} \text{ a.e.}, \quad r \ge 1.$$
(ii) Since $U_{l,n,m}$, being a $p$-block, has $p$ terms $Z_m^j$ for $0 \le l \le k-1$, one has by (i)
$$|U_{l,n,m}| \le \sum_{j=lp+lq+1}^{(l+1)p+lq}|Z_m^j| \le p\max_j|Z_m^j| \le p(2m+1)^2\,2\left(\frac{c_0}{2\pi}\right)^2 \text{ a.e.}$$
(iii) Similarly, since $W_{l,n,m}$ is a $q$-block for $0 \le l \le k-1$, which has $q$ terms $Z_m^j$, and is the $q'$-block for $l = k$, which has fewer than $p+q$ terms $Z_m^j$, one has
$$|W_{l,n,m}| \le \begin{cases} q(2m+1)^2\,2\left(\frac{c_0}{2\pi}\right)^2 \text{ a.e.}, & 0 \le l \le k-1,\\ (p+q)(2m+1)^2\,2\left(\frac{c_0}{2\pi}\right)^2 \text{ a.e.}, & l = k.\end{cases}$$
Lemma 2. Let Assumptions (1) and (2) be satisfied. Then
$$\operatorname{Var} Z_m^j(\lambda,\lambda') = I_{1,m}^j(\lambda,\lambda') + I_{2,m}(\lambda,\lambda') + I_{3,m}(\lambda,\lambda'),$$
where
$$I_{1,m}^j(\lambda,\lambda') = \frac{1}{(2\pi)^4}\sum_{s,t,s',t'=-m}^{m}(\cos s\lambda\cos t\lambda' + \sin s\lambda\sin t\lambda')(\cos s'\lambda\cos t'\lambda' + \sin s'\lambda\sin t'\lambda')\operatorname{cum}(X_s^j, X_t^j, X_{s'}^j, X_{t'}^j),$$
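$$I_{2,m}(\lambda,\lambda') \to \frac{1}{2}[f(\lambda,\lambda)f(\lambda',\lambda') + f(\lambda,-\lambda)f(\lambda',-\lambda')] = h_2(\lambda,\lambda') \ \text{(say)}, \quad \text{as } m \to \infty,$$
$$I_{3,m}(\lambda,\lambda') \to \frac{1}{2}[f^2(\lambda,\lambda') + f^2(\lambda,-\lambda')] = h_3(\lambda,\lambda') \ \text{(say)}, \quad \text{as } m \to \infty.$$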
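Proof. Definition (2.8) of $Z_m^j$ gives
$$\operatorname{Var} Z_m^j(\lambda,\lambda') = \frac{1}{(2\pi)^4}\sum_{s,t,s',t'=-m}^{m}(\cos s\lambda\cos t\lambda' + \sin s\lambda\sin t\lambda')(\cos s'\lambda\cos t'\lambda' + \sin s'\lambda\sin t'\lambda')\operatorname{Cov}(X_s^jX_t^j, X_{s'}^jX_{t'}^j).$$
The covariance $\operatorname{Cov}(X_s^jX_t^j, X_{s'}^jX_{t'}^j)$ can be expanded using (2.12) and Assumption (2) as follows:
$$\begin{aligned}\operatorname{Cov}(X_s^jX_t^j, X_{s'}^jX_{t'}^j) &= \operatorname{cum}(X_s^j, X_t^j, X_{s'}^j, X_{t'}^j) + E[X_s^jX_{s'}^j]E[X_t^jX_{t'}^j] + E[X_s^jX_{t'}^j]E[X_t^jX_{s'}^j]\\ &= \operatorname{cum}(X_s^j, X_t^j, X_{s'}^j, X_{t'}^j) + r(s,s')r(t,t') + r(s,t')r(t,s').\end{aligned}$$
Then
$$\operatorname{Var} Z_m^j(\lambda,\lambda') = I_{1,m}^j(\lambda,\lambda') + I_{2,m}(\lambda,\lambda') + I_{3,m}(\lambda,\lambda'), \tag{3.1}$$
where
$$I_{1,m}^j(\lambda,\lambda') = \frac{1}{(2\pi)^4}\sum_{s,t,s',t'=-m}^{m}(\cos s\lambda\cos t\lambda' + \sin s\lambda\sin t\lambda')(\cos s'\lambda\cos t'\lambda' + \sin s'\lambda\sin t'\lambda')\operatorname{cum}(X_s^j, X_t^j, X_{s'}^j, X_{t'}^j),$$
$$\begin{aligned}(2\pi)^4 I_{2,m}(\lambda,\lambda') ={}& \sum_{s,s'=-m}^{m}\cos s\lambda\cos s'\lambda\, r(s,s')\sum_{t,t'=-m}^{m}\cos t\lambda'\cos t'\lambda'\, r(t,t') + \sum_{s,s'=-m}^{m}\sin s\lambda\sin s'\lambda\, r(s,s')\sum_{t,t'=-m}^{m}\sin t\lambda'\sin t'\lambda'\, r(t,t')\\ &+ \sum_{s,s'=-m}^{m}\cos s\lambda\sin s'\lambda\, r(s,s')\sum_{t,t'=-m}^{m}\cos t\lambda'\sin t'\lambda'\, r(t,t') + \sum_{s,s'=-m}^{m}\sin s\lambda\cos s'\lambda\, r(s,s')\sum_{t,t'=-m}^{m}\sin t\lambda'\cos t'\lambda'\, r(t,t'),\end{aligned}$$
$$\begin{aligned}(2\pi)^4 I_{3,m}(\lambda,\lambda') ={}& \sum_{s,t'=-m}^{m}\cos s\lambda\cos t'\lambda'\, r(s,t')\sum_{t,s'=-m}^{m}\cos t\lambda'\cos s'\lambda\, r(t,s') + \sum_{s,t'=-m}^{m}\sin s\lambda\sin t'\lambda'\, r(s,t')\sum_{t,s'=-m}^{m}\sin t\lambda'\sin s'\lambda\, r(t,s')\\ &+ \sum_{s,t'=-m}^{m}\cos s\lambda\sin t'\lambda'\, r(s,t')\sum_{t,s'=-m}^{m}\cos t\lambda'\sin s'\lambda\, r(t,s') + \sum_{s,t'=-m}^{m}\sin s\lambda\cos t'\lambda'\, r(s,t')\sum_{t,s'=-m}^{m}\sin t\lambda'\cos s'\lambda\, r(t,s').\end{aligned}$$
To compute $I_{2,m}(\lambda,\lambda')$ and $I_{3,m}(\lambda,\lambda')$ in (3.1), note that the last two terms in each of them are equal to zero: since $r(s,t) = r(-s,-t)$, their summands change sign when both summation indices change sign. Let
$$I_{2,1,m}(\lambda) = \frac{1}{(2\pi)^2}\sum_{s,t=-m}^{m}\cos s\lambda\cos t\lambda\, r(s,t), \qquad I_{2,2,m}(\lambda) = \frac{1}{(2\pi)^2}\sum_{s,t=-m}^{m}\sin s\lambda\sin t\lambda\, r(s,t),$$
$$I_{3,1,m}(\lambda,\lambda') = \frac{1}{(2\pi)^2}\sum_{s,t=-m}^{m}\cos s\lambda\cos t\lambda'\, r(s,t), \qquad I_{3,2,m}(\lambda,\lambda') = \frac{1}{(2\pi)^2}\sum_{s,t=-m}^{m}\sin s\lambda\sin t\lambda'\, r(s,t).$$
Then $I_{2,m}(\lambda,\lambda') = I_{2,1,m}(\lambda)I_{2,1,m}(\lambda') + I_{2,2,m}(\lambda)I_{2,2,m}(\lambda')$, and $I_{3,m}(\lambda,\lambda') = (I_{3,1,m}(\lambda,\lambda'))^2 + (I_{3,2,m}(\lambda,\lambda'))^2$.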
By using the representation (2.6) for $r(s,t)$, one has
$$\begin{aligned} I_{2,1,m}(\lambda) &= \int_{-\pi}^{\pi}\int_{-\pi}^{\pi}\frac{1}{(2\pi)^2}\sum_{s,t=-m}^{m}\cos s\lambda\cos t\lambda\,(\cos s\alpha\cos t\alpha' + \sin s\alpha\sin t\alpha')\, f(\alpha,\alpha')\,d\alpha\,d\alpha'\\ &= \int_{-\pi}^{\pi}\int_{-\pi}^{\pi}\frac{1}{2\pi}\sum_{s=-m}^{m}\cos s\lambda\cos s\alpha\;\frac{1}{2\pi}\sum_{t=-m}^{m}\cos t\lambda\cos t\alpha'\; f(\alpha,\alpha')\,d\alpha\,d\alpha', \end{aligned}$$
since $\sum_{s,t=-m}^{m}\cos s\lambda\cos t\lambda\sin s\alpha\sin t\alpha' = 0$ (the summand is odd in $s$). Hence
$$\begin{aligned} I_{2,1,m}(\lambda) &= \int_{-\pi}^{\pi}\int_{-\pi}^{\pi}\frac{1}{2\pi}\sum_{s=-m}^{m}\frac{\cos s(\lambda-\alpha) + \cos s(\lambda+\alpha)}{2}\;\frac{1}{2\pi}\sum_{t=-m}^{m}\frac{\cos t(\lambda-\alpha') + \cos t(\lambda+\alpha')}{2}\; f(\alpha,\alpha')\,d\alpha\,d\alpha'\\ &= \frac{1}{4}\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}[D_m(\lambda-\alpha) + D_m(\lambda+\alpha)][D_m(\lambda-\alpha') + D_m(\lambda+\alpha')]\, f(\alpha,\alpha')\,d\alpha\,d\alpha', \end{aligned}$$
by using the identity $\sum_{s=-m}^{m}\cos s\lambda = \frac{\sin(2m+1)\lambda/2}{\sin\lambda/2}$, where
$$D_m(\lambda) = \frac{1}{2\pi}\frac{\sin(2m+1)\lambda/2}{\sin\lambda/2}$$
is the Dirichlet kernel. Since $f$ is continuous,
$$I_{2,1,m}(\lambda) \to \frac{1}{4}[f(\lambda,\lambda) + f(\lambda,-\lambda) + f(-\lambda,\lambda) + f(-\lambda,-\lambda)] = \frac{1}{2}[f(\lambda,\lambda) + f(\lambda,-\lambda)] \quad \text{as } m \to \infty,$$
due to the fact that $f(\lambda,\lambda') = f(-\lambda,-\lambda')$. Similar computations can be used to show the following:
$$\begin{aligned} I_{2,2,m}(\lambda) &= \frac{1}{4}\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}[D_m(\lambda-\alpha) - D_m(\lambda+\alpha)][D_m(\lambda-\alpha') - D_m(\lambda+\alpha')]\, f(\alpha,\alpha')\,d\alpha\,d\alpha'\\ &\to \frac{1}{4}[f(\lambda,\lambda) - f(\lambda,-\lambda) - f(-\lambda,\lambda) + f(-\lambda,-\lambda)] = \frac{1}{2}[f(\lambda,\lambda) - f(\lambda,-\lambda)] \quad \text{as } m \to \infty,\\ I_{3,1,m}(\lambda,\lambda') &= \frac{1}{4}\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}[D_m(\lambda-\alpha) + D_m(\lambda+\alpha)][D_m(\lambda'-\alpha') + D_m(\lambda'+\alpha')]\, f(\alpha,\alpha')\,d\alpha\,d\alpha'\\ &\to \frac{1}{4}[f(\lambda,\lambda') + f(\lambda,-\lambda') + f(-\lambda,\lambda') + f(-\lambda,-\lambda')] = \frac{1}{2}[f(\lambda,\lambda') + f(\lambda,-\lambda')] \quad \text{as } m \to \infty,\\ I_{3,2,m}(\lambda,\lambda') &= \frac{1}{4}\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}[D_m(\lambda-\alpha) - D_m(\lambda+\alpha)][D_m(\lambda'-\alpha') - D_m(\lambda'+\alpha')]\, f(\alpha,\alpha')\,d\alpha\,d\alpha'\\ &\to \frac{1}{4}[f(\lambda,\lambda') - f(\lambda,-\lambda') - f(-\lambda,\lambda') + f(-\lambda,-\lambda')] = \frac{1}{2}[f(\lambda,\lambda') - f(\lambda,-\lambda')] \quad \text{as } m \to \infty. \end{aligned}$$
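It now follows that, as $m \to \infty$,
$$\begin{aligned} I_{2,m}(\lambda,\lambda') &\to \frac{1}{4}[f(\lambda,\lambda) + f(\lambda,-\lambda)][f(\lambda',\lambda') + f(\lambda',-\lambda')] + \frac{1}{4}[f(\lambda,\lambda) - f(\lambda,-\lambda)][f(\lambda',\lambda') - f(\lambda',-\lambda')]\\ &= \frac{1}{2}[f(\lambda,\lambda)f(\lambda',\lambda') + f(\lambda,-\lambda)f(\lambda',-\lambda')], \end{aligned}$$
and
$$I_{3,m}(\lambda,\lambda') \to \frac{1}{4}[f(\lambda,\lambda') + f(\lambda,-\lambda')]^2 + \frac{1}{4}[f(\lambda,\lambda') - f(\lambda,-\lambda')]^2 = \frac{1}{2}[f^2(\lambda,\lambda') + f^2(\lambda,-\lambda')].$$
Hence the desired result.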
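Two ingredients of the proof of Lemma 2 are easy to check numerically: the identity $\sum_{s=-m}^{m}\cos s\lambda = \sin((2m+1)\lambda/2)/\sin(\lambda/2)$ (with value $2m+1$ where $\sin(\lambda/2) = 0$) and the algebra $\frac14(a+b)(c+d) + \frac14(a-b)(c-d) = \frac12(ac+bd)$ used for the limit of $I_{2,m}$. The following Python sketch is illustrative only; the test points and the function names are arbitrary choices for the example.

```python
import math


def dirichlet_sum(lam, m):
    """Left side: sum_{s=-m}^{m} cos(s * lam)."""
    return sum(math.cos(s * lam) for s in range(-m, m + 1))


def dirichlet_closed(lam, m):
    """Right side: sin((2m+1) lam / 2) / sin(lam / 2); equals 2m+1 where sin(lam/2) = 0."""
    if abs(math.sin(lam / 2)) < 1e-12:
        return float(2 * m + 1)
    return math.sin((2 * m + 1) * lam / 2) / math.sin(lam / 2)


def i2_limit(a, b, c, d):
    """(1/4)(a+b)(c+d) + (1/4)(a-b)(c-d) with a = f(l,l), b = f(l,-l), c = f(l',l'), d = f(l',-l')."""
    return 0.25 * (a + b) * (c + d) + 0.25 * (a - b) * (c - d)


if __name__ == "__main__":
    for lam in (0.0, 0.3, -1.7, 2.9):
        for m in (0, 1, 5, 40):
            assert abs(dirichlet_sum(lam, m) - dirichlet_closed(lam, m)) < 1e-9
    assert abs(i2_limit(1.2, -0.4, 0.7, 2.5) - 0.5 * (1.2 * 0.7 + (-0.4) * 2.5)) < 1e-12
```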
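Lemma 3. Let Assumptions (1) and (3) be satisfied. Then for $j, j' = 1,\dots,n$ and $j \ne j'$,
$$\begin{aligned}\operatorname{Cov}(Z_m^j, Z_m^{j'}) ={}& I_{1,m}^{j,j'}(\lambda,\lambda') + \frac{r^2(|j-j'|)}{2(2\pi)^4}\Bigg[(2m+1)^2 + \frac{\sin(2m+1)\lambda\,\sin(2m+1)\lambda'}{\sin\lambda\,\sin\lambda'}\\ &+ \left(\frac{\sin(2m+1)(\lambda-\lambda')/2}{\sin(\lambda-\lambda')/2}\right)^2 + \left(\frac{\sin(2m+1)(\lambda+\lambda')/2}{\sin(\lambda+\lambda')/2}\right)^2\Bigg],\end{aligned}$$
where
$$I_{1,m}^{j,j'}(\lambda,\lambda') = \frac{1}{(2\pi)^4}\sum_{s,t,s',t'=-m}^{m}(\cos s\lambda\cos t\lambda' + \sin s\lambda\sin t\lambda')(\cos s'\lambda\cos t'\lambda' + \sin s'\lambda\sin t'\lambda')\operatorname{cum}(X_s^j, X_t^j, X_{s'}^{j'}, X_{t'}^{j'}).$$
Proof. Definition (2.8) of $Z_m^j$ gives
$$\operatorname{Cov}(Z_m^j, Z_m^{j'}) = \frac{1}{(2\pi)^4}\sum_{s,t,s',t'=-m}^{m}(\cos s\lambda\cos t\lambda' + \sin s\lambda\sin t\lambda')(\cos s'\lambda\cos t'\lambda' + \sin s'\lambda\sin t'\lambda')\operatorname{Cov}(X_s^jX_t^j, X_{s'}^{j'}X_{t'}^{j'}).$$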
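By using (2.12), namely
$$\operatorname{Cov}(X_s^jX_t^j, X_{s'}^{j'}X_{t'}^{j'}) = \operatorname{cum}(X_s^j, X_t^j, X_{s'}^{j'}, X_{t'}^{j'}) + E[X_s^jX_{s'}^{j'}]E[X_t^jX_{t'}^{j'}] + E[X_t^jX_{s'}^{j'}]E[X_s^jX_{t'}^{j'}],$$
one can further write
$$\operatorname{Cov}(Z_m^j, Z_m^{j'}) = I_{1,m}^{j,j'} + I_{2,m}^{j,j'} + I_{3,m}^{j,j'}, \tag{3.2}$$
where
$$I_{1,m}^{j,j'}(\lambda,\lambda') = \frac{1}{(2\pi)^4}\sum_{s,t,s',t'=-m}^{m}(\cos s\lambda\cos t\lambda' + \sin s\lambda\sin t\lambda')(\cos s'\lambda\cos t'\lambda' + \sin s'\lambda\sin t'\lambda')\operatorname{cum}(X_s^j, X_t^j, X_{s'}^{j'}, X_{t'}^{j'}),$$
$$\begin{aligned}(2\pi)^4 I_{2,m}^{j,j'}(\lambda,\lambda') ={}& \sum_{s,s'=-m}^{m}\cos s\lambda\cos s'\lambda\, EX_s^jX_{s'}^{j'}\sum_{t,t'=-m}^{m}\cos t\lambda'\cos t'\lambda'\, EX_t^jX_{t'}^{j'} + \sum_{s,s'=-m}^{m}\sin s\lambda\sin s'\lambda\, EX_s^jX_{s'}^{j'}\sum_{t,t'=-m}^{m}\sin t\lambda'\sin t'\lambda'\, EX_t^jX_{t'}^{j'}\\ &+ \sum_{s,s'=-m}^{m}\cos s\lambda\sin s'\lambda\, EX_s^jX_{s'}^{j'}\sum_{t,t'=-m}^{m}\cos t\lambda'\sin t'\lambda'\, EX_t^jX_{t'}^{j'} + \sum_{s,s'=-m}^{m}\sin s\lambda\cos s'\lambda\, EX_s^jX_{s'}^{j'}\sum_{t,t'=-m}^{m}\sin t\lambda'\cos t'\lambda'\, EX_t^jX_{t'}^{j'},\end{aligned}$$
and
$$\begin{aligned}(2\pi)^4 I_{3,m}^{j,j'}(\lambda,\lambda') ={}& \sum_{s,t'=-m}^{m}\cos s\lambda\cos t'\lambda'\, EX_s^jX_{t'}^{j'}\sum_{t,s'=-m}^{m}\cos t\lambda'\cos s'\lambda\, EX_t^jX_{s'}^{j'} + \sum_{s,t'=-m}^{m}\sin s\lambda\sin t'\lambda'\, EX_s^jX_{t'}^{j'}\sum_{t,s'=-m}^{m}\sin t\lambda'\sin s'\lambda\, EX_t^jX_{s'}^{j'}\\ &+ \sum_{s,t'=-m}^{m}\cos s\lambda\sin t'\lambda'\, EX_s^jX_{t'}^{j'}\sum_{t,s'=-m}^{m}\cos t\lambda'\sin s'\lambda\, EX_t^jX_{s'}^{j'} + \sum_{s,t'=-m}^{m}\sin s\lambda\cos t'\lambda'\, EX_s^jX_{t'}^{j'}\sum_{t,s'=-m}^{m}\sin t\lambda'\cos s'\lambda\, EX_t^jX_{s'}^{j'}.\end{aligned}$$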
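The terms $I_{2,m}^{j,j'}(\lambda,\lambda')$ and $I_{3,m}^{j,j'}(\lambda,\lambda')$ can be simplified. By Assumption (3), only the terms with $s = s'$ and $t = t'$ (respectively $t' = s$ and $s' = t$) survive, each expectation being equal to $r(|j-j'|)$, and the mixed sums such as $\sum_{s}\cos s\lambda\sin s\lambda'$ vanish because their summands are odd in $s$. Hence
$$I_{2,m}^{j,j'}(\lambda,\lambda') = \frac{r^2(|j-j'|)}{(2\pi)^4}\left[\sum_{s=-m}^{m}\cos^2 s\lambda\sum_{t=-m}^{m}\cos^2 t\lambda' + \sum_{s=-m}^{m}\sin^2 s\lambda\sum_{t=-m}^{m}\sin^2 t\lambda'\right],$$
and
$$I_{3,m}^{j,j'}(\lambda,\lambda') = \frac{r^2(|j-j'|)}{(2\pi)^4}\left[\left(\sum_{s=-m}^{m}\cos s\lambda\cos s\lambda'\right)^2 + \left(\sum_{s=-m}^{m}\sin s\lambda\sin s\lambda'\right)^2\right].$$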
But observe that, by using appropriate trigonometric identities, one has
$$\sum_{s=-m}^{m}\cos^2 s\lambda = \sum_{s=-m}^{m}\frac{1+\cos 2s\lambda}{2} = \frac{1}{2}\left[(2m+1) + \sum_{s=-m}^{m}\cos 2s\lambda\right] = \frac{1}{2}\left[(2m+1) + \frac{\sin(2m+1)\lambda}{\sin\lambda}\right],$$
and similarly,
$$\sum_{s=-m}^{m}\sin^2 s\lambda = \frac{1}{2}\left[(2m+1) - \frac{\sin(2m+1)\lambda}{\sin\lambda}\right].$$
Also
$$\sum_{s=-m}^{m}\cos s\lambda\cos s\lambda' = \frac{1}{2}\sum_{s=-m}^{m}\big(\cos s(\lambda-\lambda') + \cos s(\lambda+\lambda')\big) = \frac{1}{2}\left[\frac{\sin(2m+1)(\lambda-\lambda')/2}{\sin(\lambda-\lambda')/2} + \frac{\sin(2m+1)(\lambda+\lambda')/2}{\sin(\lambda+\lambda')/2}\right],$$
and
$$\sum_{s=-m}^{m}\sin s\lambda\sin s\lambda' = \frac{1}{2}\sum_{s=-m}^{m}\big(\cos s(\lambda-\lambda') - \cos s(\lambda+\lambda')\big) = \frac{1}{2}\left[\frac{\sin(2m+1)(\lambda-\lambda')/2}{\sin(\lambda-\lambda')/2} - \frac{\sin(2m+1)(\lambda+\lambda')/2}{\sin(\lambda+\lambda')/2}\right].$$
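The four finite trigonometric sums above can be verified numerically. The Python sketch below is illustrative only; the test points and helper names are arbitrary choices. It compares direct summation with the closed forms, writing $R_m(x) = \sin((2m+1)x/2)/\sin(x/2) = \sum_{s=-m}^{m}\cos sx$, so that $\sin(2m+1)\lambda/\sin\lambda = R_m(2\lambda)$, and it also checks that the bracket in $I_{2,m}^{j,j'}$ collapses to $\frac12[(2m+1)^2 + R_m(2\lambda)R_m(2\lambda')]$.

```python
import math


def R(x, m):
    """sin((2m+1)x/2) / sin(x/2) = sum_{s=-m}^{m} cos(s x); equals 2m+1 where sin(x/2) = 0."""
    if abs(math.sin(x / 2)) < 1e-12:
        return float(2 * m + 1)
    return math.sin((2 * m + 1) * x / 2) / math.sin(x / 2)


def direct_sums(lam, lam_p, m):
    s_range = range(-m, m + 1)
    return (sum(math.cos(s * lam) ** 2 for s in s_range),
            sum(math.sin(s * lam) ** 2 for s in s_range),
            sum(math.cos(s * lam) * math.cos(s * lam_p) for s in s_range),
            sum(math.sin(s * lam) * math.sin(s * lam_p) for s in s_range))


def closed_forms(lam, lam_p, m):
    N = 2 * m + 1
    return (0.5 * (N + R(2 * lam, m)),                      # sum cos^2(s lam)
            0.5 * (N - R(2 * lam, m)),                      # sum sin^2(s lam)
            0.5 * (R(lam - lam_p, m) + R(lam + lam_p, m)),  # sum cos(s lam) cos(s lam')
            0.5 * (R(lam - lam_p, m) - R(lam + lam_p, m)))  # sum sin(s lam) sin(s lam')


def bracket_I2(lam, lam_p, m):
    """sum cos^2(s lam) * sum cos^2(t lam') + sum sin^2(s lam) * sum sin^2(t lam')."""
    c, s, _, _ = closed_forms(lam, lam_p, m)
    cp, sp, _, _ = closed_forms(lam_p, lam, m)
    return c * cp + s * sp


if __name__ == "__main__":
    lam, lam_p, m = 0.4, 1.3, 7
    for d, c in zip(direct_sums(lam, lam_p, m), closed_forms(lam, lam_p, m)):
        assert abs(d - c) < 1e-9
    N = 2 * m + 1
    assert abs(bracket_I2(lam, lam_p, m) - 0.5 * (N ** 2 + R(2 * lam, m) * R(2 * lam_p, m))) < 1e-9
```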
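This implies that
$$\begin{aligned} I_{2,m}^{j,j'}(\lambda,\lambda') &= \frac{r^2(|j-j'|)}{4(2\pi)^4}\left[\left((2m+1) + \frac{\sin(2m+1)\lambda}{\sin\lambda}\right)\left((2m+1) + \frac{\sin(2m+1)\lambda'}{\sin\lambda'}\right) + \left((2m+1) - \frac{\sin(2m+1)\lambda}{\sin\lambda}\right)\left((2m+1) - \frac{\sin(2m+1)\lambda'}{\sin\lambda'}\right)\right]\\ &= \frac{r^2(|j-j'|)}{2(2\pi)^4}\left[(2m+1)^2 + \frac{\sin(2m+1)\lambda\,\sin(2m+1)\lambda'}{\sin\lambda\,\sin\lambda'}\right], \end{aligned}$$
and
$$\begin{aligned} I_{3,m}^{j,j'}(\lambda,\lambda') &= \frac{r^2(|j-j'|)}{4(2\pi)^4}\left[\left(\frac{\sin(2m+1)(\lambda-\lambda')/2}{\sin(\lambda-\lambda')/2} + \frac{\sin(2m+1)(\lambda+\lambda')/2}{\sin(\lambda+\lambda')/2}\right)^2 + \left(\frac{\sin(2m+1)(\lambda-\lambda')/2}{\sin(\lambda-\lambda')/2} - \frac{\sin(2m+1)(\lambda+\lambda')/2}{\sin(\lambda+\lambda')/2}\right)^2\right]\\ &= \frac{r^2(|j-j'|)}{2(2\pi)^4}\left[\left(\frac{\sin(2m+1)(\lambda-\lambda')/2}{\sin(\lambda-\lambda')/2}\right)^2 + \left(\frac{\sin(2m+1)(\lambda+\lambda')/2}{\sin(\lambda+\lambda')/2}\right)^2\right]. \end{aligned}$$
The desired result follows from (3.2).
Lemma 4. Let Assumptions (1), (2), (3) and (5) be satisfied. Then as $n, m \to \infty$
$$\operatorname{Var} S_{n,m} = Cnm^2\sum_{j=1}^{\infty}r^2(j)\,(1+o(1)),$$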
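where
$$C = \begin{cases} \dfrac{1}{\pi^4} & \text{if } \lambda = 0,\ \lambda' = 0,\\[2mm] \dfrac{1}{4\pi^4} & \text{if } \lambda = 0,\ \lambda' \ne 0, \text{ or } \lambda \ne 0,\ \lambda' = 0, \text{ or } \lambda, \lambda', \lambda-\lambda', \lambda+\lambda' \ne 0,\\[2mm] \dfrac{1}{2\pi^4} & \text{if } \lambda, \lambda' \ne 0,\ \lambda - \lambda' = 0, \text{ or } \lambda, \lambda' \ne 0,\ \lambda+\lambda' = 0.\end{cases}$$
And similarly,
$$\operatorname{Var} U_{l,n,m} = Cpm^2\sum_{j=1}^{\infty}r^2(j)(1+o(1)), \quad 0 \le l \le k-1,$$
$$\operatorname{Var} W_{l,n,m} = \begin{cases} Cqm^2\sum_{j=1}^{\infty}r^2(j)(1+o(1)), & 0 \le l \le k-1,\\ C(p+q)m^2\sum_{j=1}^{\infty}r^2(j)(1+o(1)), & l = k.\end{cases}$$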
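Proof. Let
$$g_m(\lambda,\lambda') = \frac{1}{(2\pi)^4}\left[(2m+1)^2 + \frac{\sin(2m+1)\lambda\,\sin(2m+1)\lambda'}{\sin\lambda\,\sin\lambda'} + \left(\frac{\sin(2m+1)(\lambda-\lambda')/2}{\sin(\lambda-\lambda')/2}\right)^2 + \left(\frac{\sin(2m+1)(\lambda+\lambda')/2}{\sin(\lambda+\lambda')/2}\right)^2\right].$$
Since $\frac{1}{2\pi}\frac{\sin(2m+1)\lambda/2}{\sin\lambda/2}$ is the Dirichlet kernel, the ratio $\frac{\sin(2m+1)\lambda/2}{\sin\lambda/2}$ stays bounded by $1/|\sin(\lambda/2)|$, and hence is $o(m)$, as $m \to \infty$ if $\lambda \ne 0$, while it equals $2m+1$ if $\lambda = 0$. This implies that $g_m(\lambda,\lambda') = Cm^2(1+o(1))$ as $m \to \infty$, where
$$C = \begin{cases} \dfrac{1}{\pi^4} & \text{if } \lambda = 0,\ \lambda' = 0,\\[2mm] \dfrac{1}{4\pi^4} & \text{if } \lambda = 0,\ \lambda' \ne 0, \text{ or } \lambda \ne 0,\ \lambda' = 0, \text{ or } \lambda, \lambda', \lambda-\lambda', \lambda+\lambda' \ne 0,\\[2mm] \dfrac{1}{2\pi^4} & \text{if } \lambda, \lambda' \ne 0,\ \lambda - \lambda' = 0, \text{ or } \lambda, \lambda' \ne 0,\ \lambda+\lambda' = 0.\end{cases}$$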
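The case analysis for $C$ can be checked by evaluating $g_m(\lambda,\lambda')/m^2$ for a large $m$. In the Python sketch below, which is illustrative only, `R` is a small helper for $\sin((2m+1)x/2)/\sin(x/2)$ (with value $2m+1$ where $\sin(x/2) = 0$), and `C_const` encodes the three cases under the assumption that $\lambda, \lambda'$ lie in the open interval $(-\pi,\pi)$, where those cases are exhaustive. The relative error of the ratio is of order $1/m$.

```python
import math


def R(x, m):
    """sin((2m+1)x/2) / sin(x/2); the limiting value 2m+1 is used where sin(x/2) = 0."""
    if abs(math.sin(x / 2)) < 1e-12:
        return float(2 * m + 1)
    return math.sin((2 * m + 1) * x / 2) / math.sin(x / 2)


def g_m(lam, lam_p, m):
    """g_m(lam, lam') from the proof of Lemma 4."""
    N = 2 * m + 1
    return (N ** 2
            + R(2 * lam, m) * R(2 * lam_p, m)  # sin(2m+1)lam sin(2m+1)lam' / (sin lam sin lam')
            + R(lam - lam_p, m) ** 2
            + R(lam + lam_p, m) ** 2) / (2 * math.pi) ** 4


def C_const(lam, lam_p):
    """Constant C of Lemma 4, assuming lam, lam' lie in (-pi, pi)."""
    if lam == 0 and lam_p == 0:
        return 1 / math.pi ** 4
    if lam != 0 and lam_p != 0 and (lam == lam_p or lam == -lam_p):
        return 1 / (2 * math.pi ** 4)
    return 1 / (4 * math.pi ** 4)


if __name__ == "__main__":
    m = 20000
    for lam, lam_p in [(0.0, 0.0), (0.0, 1.0), (0.7, -1.1), (0.5, 0.5), (0.8, -0.8)]:
        ratio = g_m(lam, lam_p, m) / (m ** 2 * C_const(lam, lam_p))
        print(lam, lam_p, round(ratio, 4))
```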
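From Lemma 3 one can write
$$\operatorname{Cov}(Z_m^j, Z_m^{j'}) = I_{1,m}^{j,j'}(\lambda,\lambda') + \frac{1}{2}g_m(\lambda,\lambda')\, r^2(|j'-j|),$$
and from Definition (2.9) of $S_{n,m}$ one has
$$\begin{aligned}\operatorname{Var} S_{n,m} &= \sum_{j=1}^{n}\sum_{j'=1}^{n}\operatorname{Cov}(Z_m^j, Z_m^{j'}) = \sum_{j=1}^{n}\operatorname{Var} Z_m^j + 2\sum_{j<j'}\operatorname{Cov}(Z_m^j, Z_m^{j'})\\ &= \sum_{j=1}^{n}\operatorname{Var} Z_m^j + g_m(\lambda,\lambda')\sum_{j<j'}r^2(|j'-j|) + 2\sum_{j<j'}I_{1,m}^{j,j'}(\lambda,\lambda')\\ &= \sum_{j=1}^{n}\operatorname{Var} Z_m^j(\lambda,\lambda') + g_m(\lambda,\lambda')\sum_{j=1}^{n-1}(n-j)\,r^2(j) + 2\sum_{j<j'}I_{1,m}^{j,j'}(\lambda,\lambda').\end{aligned}$$
Now consider
$$\frac{\operatorname{Var} S_{n,m}(\lambda,\lambda')}{Cnm^2\sum_{j=1}^{\infty}r^2(j)} = \frac{\sum_{j=1}^{n}\operatorname{Var} Z_m^j(\lambda,\lambda')}{Cnm^2\sum_{j=1}^{\infty}r^2(j)} + \frac{g_m(\lambda,\lambda')\sum_{j=1}^{n-1}\frac{n-j}{n}r^2(j)}{Cm^2\sum_{j=1}^{\infty}r^2(j)} + \frac{2\sum_{j<j'}I_{1,m}^{j,j'}(\lambda,\lambda')}{Cnm^2\sum_{j=1}^{\infty}r^2(j)}. \tag{3.3}$$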
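By Lemma 2, writing $h_{1,m}^j = I_{1,m}^j$, $h_{2,m} = I_{2,m}$ and $h_{3,m} = I_{3,m}$, one has
$$\frac{\sum_{j=1}^{n}\operatorname{Var} Z_m^j(\lambda,\lambda')}{nm^2} = \frac{\sum_{j=1}^{n}h_{1,m}^j(\lambda,\lambda')}{nm^2} + \frac{1}{m^2}\big(h_{2,m}(\lambda,\lambda') + h_{3,m}(\lambda,\lambda')\big). \tag{3.4}$$
But
$$\left|\sum_{j=1}^{n}h_{1,m}^j(\lambda,\lambda')\right| = \frac{1}{(2\pi)^4}\left|\sum_{j=1}^{n}\sum_{s,t,s',t'=-m}^{m}(\cos s\lambda\cos t\lambda' + \sin s\lambda\sin t\lambda')(\cos s'\lambda\cos t'\lambda' + \sin s'\lambda\sin t'\lambda')\operatorname{cum}(X_s^j, X_t^j, X_{s'}^j, X_{t'}^j)\right|$$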
$$\le \frac{4}{(2\pi)^4}\sum_{j=1}^{n}\sum_{s,t,s',t'=-m}^{m}|\operatorname{cum}(X_s^j, X_t^j, X_{s'}^j, X_{t'}^j)| = o(nm^2) \quad \text{as } n, m \to \infty,$$
by Assumption (5). And by Lemma 2 one has, as $m \to \infty$, $h_{2,m}(\lambda,\lambda') \to h_2(\lambda,\lambda')$ and $h_{3,m}(\lambda,\lambda') \to h_3(\lambda,\lambda')$. Thus the right side of (3.4) converges to zero as $n, m \to \infty$; therefore the first term on the right side of (3.3) also converges to zero as $n, m \to \infty$. The second term on the right side of (3.3) tends to 1 as $n, m \to \infty$ by virtue of the property $g_m(\lambda,\lambda') = Cm^2(1+o(1))$ as $m \to \infty$, and the fact that $\sum_{j=1}^{n-1}\frac{n-j}{n}r^2(j) \to \sum_{j=1}^{\infty}r^2(j)$; the latter holds by dominated convergence, since $\frac{n-j}{n}r^2(j) \le r^2(j)$, $\frac{n-j}{n}r^2(j) \to r^2(j)$ for each fixed $j$, and $\sum_{j=1}^{\infty}r^2(j) < \infty$. On the other hand, the third term on the right side of (3.3) tends to 0 as $n, m \to \infty$, because by Assumption (5), i.e., $\sum_{j,j'=1}^{n}\sum_{s,t,s',t'=-m}^{m}|\operatorname{cum}(X_s^j, X_t^j, X_{s'}^{j'}, X_{t'}^{j'})| = o(nm^2)$, one has
$$\frac{\left|\sum_{j<j'}I_{1,m}^{j,j'}(\lambda,\lambda')\right|}{nm^2} \le \frac{4}{(2\pi)^4 nm^2}\sum_{j<j'}\sum_{s,t,s',t'=-m}^{m}|\operatorname{cum}(X_s^j, X_t^j, X_{s'}^{j'}, X_{t'}^{j'})| \to 0$$
as $n, m \to \infty$. Hence $\operatorname{Var} S_{n,m} = Cnm^2\sum_{j=1}^{\infty}r^2(j)(1+o(1))$ as $n, m \to \infty$. The other statements can be verified in a similar way.
Lemma 5. Let $X_s^j$ be α-mixing in the $j$ variable. If $|X_s^j| \le c_0 < \infty$ a.e. for all $j, s$, then, as $m \to \infty$ and for $0 \le l, l' \le k-1$, $l \ne l'$,
(i) $|\operatorname{Cov}(W_{l,n,m}, W_{l',n,m})| = O(q^2m^4\alpha(p|l-l'|))$;
(ii) $|\operatorname{Cov}(W_{l,n,m}, W_{k,n,m})| = O(q(p+q)m^4\alpha(p(k-l)))$.
Proof. Since $W_{l,n,m} = \sum_{j=(l+1)p+lq+1}^{(l+1)p+(l+1)q} Z_m^j$, one observes the following:
(a) By Lemma 1,
$$|W_{l,n,m}| \le q\max_j|Z_m^j| \le q(2m+1)^2\,2\left(\frac{c_0}{2\pi}\right)^2 \text{ a.e.}$$
(b) The index ranges of $W_{l,n,m}$ and $W_{l',n,m}$ are separated by at least $p|l-l'|$ terms $Z_m^j$ (the intervening $p$-blocks). It then follows from Theorem 1 in Soedjak [9] that
$$|\operatorname{Cov}(W_{l,n,m}, W_{l',n,m})| \le q^2(2m+1)^4\,4\left(\frac{c_0}{2\pi}\right)^4\alpha(p|l-l'|).$$
Hence result (i) follows. Similarly, since $|W_{k,n,m}| \le (p+q)(2m+1)^2\,2\left(\frac{c_0}{2\pi}\right)^2$ a.e. (by Lemma 1),
$$|\operatorname{Cov}(W_{l,n,m}, W_{k,n,m})| \le q(p+q)(2m+1)^4\,4\left(\frac{c_0}{2\pi}\right)^4\alpha(p(k-l)).$$
Hence Assertion (ii).
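One is now in a position to show that the contribution of the $q$-blocks $W_{l,n,m}$ is asymptotically 'negligible', so that the normalized partial sum of the $W_{l,n,m}$ converges to 0 in probability.
Lemma 6. Let Conditions (A), (B), (1)–(5) be satisfied. If $|X_s^j| \le c_0 < \infty$ a.e. for all $j, s$, then
$$\operatorname{Var}\left(\frac{1}{\sigma_{n,m}}\sum_{l=0}^{k}W_{l,n,m}\right) \to 0 \quad \text{as } n \to \infty,$$
and hence, for every $\varepsilon > 0$,
$$P\left(\frac{|S''_{n,m} - ES''_{n,m}|}{\sigma_{n,m}} > \varepsilon\right) \to 0 \quad \text{as } n \to \infty.$$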
Proof. The variance can be computed directly. Since the $W_{l,n,m}$ are real-valued,
$$\begin{aligned}\operatorname{Var}\sum_{l=0}^{k}W_{l,n,m} &= \operatorname{Var}\left(\sum_{l=0}^{k-1}W_{l,n,m} + W_{k,n,m}\right) = E\left(\sum_{l=0}^{k-1}W_{l,n,m} + W_{k,n,m}\right)^2 - \left(E\left[\sum_{l=0}^{k-1}W_{l,n,m} + W_{k,n,m}\right]\right)^2\\ &= E\left(\sum_{l=0}^{k-1}W_{l,n,m}\right)^2 + 2E\left[W_{k,n,m}\sum_{l=0}^{k-1}W_{l,n,m}\right] + EW_{k,n,m}^2 - \left(E\sum_{l=0}^{k-1}W_{l,n,m}\right)^2 - 2EW_{k,n,m}\,E\sum_{l=0}^{k-1}W_{l,n,m} - (EW_{k,n,m})^2\\ &= \sum_{l=0}^{k-1}EW_{l,n,m}^2 + 2\sum_{l<l'}EW_{l,n,m}W_{l',n,m} + 2\sum_{l=0}^{k-1}EW_{l,n,m}W_{k,n,m} + EW_{k,n,m}^2\\ &\quad - \sum_{l=0}^{k-1}(EW_{l,n,m})^2 - 2\sum_{l<l'}EW_{l,n,m}\,EW_{l',n,m} - 2\sum_{l=0}^{k-1}EW_{l,n,m}\,EW_{k,n,m} - (EW_{k,n,m})^2,\end{aligned}$$
where $\sum_{l<l'}$ runs over $0 \le l < l' \le k-1$. After combining appropriate terms one obtains
$$\operatorname{Var}\sum_{l=0}^{k}W_{l,n,m} = \sum_{l=0}^{k-1}\operatorname{Var}W_{l,n,m} + 2\sum_{l<l'}\operatorname{Cov}(W_{l,n,m}, W_{l',n,m}) + 2\sum_{l=0}^{k-1}\operatorname{Cov}(W_{l,n,m}, W_{k,n,m}) + \operatorname{Var}W_{k,n,m},$$
so that
$$\operatorname{Var}\left(\frac{1}{\sigma_{n,m}}\sum_{l=0}^{k}W_{l,n,m}\right) = \frac{1}{\sigma_{n,m}^2}\left[\sum_{l=0}^{k-1}\operatorname{Var}W_{l,n,m} + 2\sum_{l<l'}\operatorname{Cov}(W_{l,n,m}, W_{l',n,m}) + 2\sum_{l=0}^{k-1}\operatorname{Cov}(W_{l,n,m}, W_{k,n,m}) + \operatorname{Var}W_{k,n,m}\right]. \tag{3.5}$$
Each term on the r.h.s. of (3.5) can now be estimated.
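The identity inside (3.5) is the usual expansion of the variance of a sum, with the final block $W_{k,n,m}$ singled out. As a sanity check only, the Python sketch below treats each $W$ as a list of values on a common finite sample space of equally likely points (a hypothetical stand-in for the actual block sums, with arbitrary simulated values) and compares the variance of the total with the four groups of terms in the bracket of (3.5).

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Covariance of two random variables given on the same equally likely sample points."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def rhs_of_3_5(blocks):
    """Bracket in (3.5): blocks[:-1] play W_0, ..., W_{k-1} and blocks[-1] plays W_k."""
    *inner, last = blocks
    k = len(inner)
    first = sum(cov(w, w) for w in inner)
    second = 2 * sum(cov(inner[l], inner[lp]) for l in range(k) for lp in range(l + 1, k))
    third = 2 * sum(cov(w, last) for w in inner)
    return first + second + third + cov(last, last)


if __name__ == "__main__":
    rng = random.Random(0)
    base = [rng.gauss(0, 1) for _ in range(200)]
    # k = 4 correlated "q-blocks" plus a final block, on 200 common sample points
    blocks = [[b + rng.gauss(0, 1) for b in base] for _ in range(5)]
    total = [sum(vals) for vals in zip(*blocks)]
    assert abs(cov(total, total) - rhs_of_3_5(blocks)) < 1e-8
```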
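1st term:
$$\frac{1}{\sigma_{n,m}^2}\sum_{l=0}^{k-1}\operatorname{Var}W_{l,n,m} = \frac{kq}{n}\cdot\frac{Cnm^2\sum_{j=1}^{\infty}r^2(j)}{\sigma_{n,m}^2}\cdot\frac{1}{k}\sum_{l=0}^{k-1}\frac{\operatorname{Var}W_{l,n,m}}{Cqm^2\sum_{j=1}^{\infty}r^2(j)}.$$
By Lemma 4, given any $\varepsilon > 0$ there exists $n_0$ such that for $n > n_0$
$$\left|\frac{\operatorname{Var}W_{l,n,m}}{Cqm^2\sum_{j=1}^{\infty}r^2(j)} - 1\right| < \varepsilon \quad \text{for all } l = 0,\dots,k-1.$$
This implies that for $n > n_0$
$$\left|\frac{1}{k}\sum_{l=0}^{k-1}\frac{\operatorname{Var}W_{l,n,m}}{Cqm^2\sum_{j=1}^{\infty}r^2(j)} - 1\right| \le \frac{1}{k}\sum_{l=0}^{k-1}\left|\frac{\operatorname{Var}W_{l,n,m}}{Cqm^2\sum_{j=1}^{\infty}r^2(j)} - 1\right| \le \frac{1}{k}\sum_{l=0}^{k-1}\varepsilon = \varepsilon.$$
Since $\varepsilon$ can be made arbitrarily small, one gets
$$\frac{1}{k}\sum_{l=0}^{k-1}\frac{\operatorname{Var}W_{l,n,m}}{Cqm^2\sum_{j=1}^{\infty}r^2(j)} \to 1 \quad \text{as } n \to \infty.$$
Also by Lemma 4,
$$\sigma_{n,m}^2\ (= \operatorname{Var}S_{n,m}) = Cnm^2\sum_{j=1}^{\infty}r^2(j)(1+o(1)) \quad \text{as } n \to \infty,$$
so that
$$\frac{n}{kq}\cdot\frac{1}{\sigma_{n,m}^2}\sum_{l=0}^{k-1}\operatorname{Var}W_{l,n,m} \to 1\cdot 1 \quad \text{as } n \to \infty.$$
It follows that, as $n \to \infty$,
$$\frac{1}{\sigma_{n,m}^2}\sum_{l=0}^{k-1}\operatorname{Var}W_{l,n,m} = O\left(\frac{kq}{n}\right) = o(1),$$
since $\frac{kq}{n} \to 0$ as $n \to \infty$, as implied by Assumption (A).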
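2nd term:
$$\begin{aligned}
\frac{n^{\frac{2-\mu}{2}}p^{1+\beta}}{kq^2}\left|\frac{2}{\sigma_{n,m}^2}\sum_{l<l'}\operatorname{Cov}(W_{l,n,m},W_{l',n,m})\right|
&\le \frac{n^{\frac{2-\mu}{2}}p^{1+\beta}}{kq^2}\cdot\frac{2}{\sigma_{n,m}^2}\sum_{l<l'}|\operatorname{Cov}(W_{l,n,m},W_{l',n,m})|\\
&= \frac{2}{C}\cdot\frac{Cnm^2\sum_{j=1}^{\infty}r^2(j)}{\sigma_{n,m}^2}\cdot\frac{1}{\sum_{j=1}^{\infty}r^2(j)}\cdot\frac{p^{1+\beta}}{k}\sum_{l<l'}\frac{1}{n^{\mu/2}}\cdot\frac{|\operatorname{Cov}(W_{l,n,m},W_{l',n,m})|}{q^2m^2\alpha(p(l'-l))}\,\alpha(p(l'-l))\\
&\le \frac{8}{C}\cdot\frac{Cnm^2\sum_{j=1}^{\infty}r^2(j)}{\sigma_{n,m}^2}\left(\frac{c_0}{2\pi}\right)^4\frac{p^{1+\beta}}{k\sum_{j=1}^{\infty}r^2(j)}\sum_{l<l'}\frac{(2m+1)^4}{n^{\mu/2}m^2}\,\alpha(p(l'-l)),
\end{aligned}$$
since, as noted in the proof of Lemma 5,
$$\frac{|\operatorname{Cov}(W_{l,n,m},W_{l',n,m})|}{q^2m^2\alpha(p(l'-l))} \le 4\left(\frac{c_0}{2\pi}\right)^4\frac{(2m+1)^4}{m^2} \quad \text{for } 0 \le l, l' \le k-1,\ l \ne l'.$$
Grouping the pairs $l < l'$ according to the difference $l' - l$, the last expression equals
$$\frac{8}{C}\cdot\frac{Cnm^2\sum_{j=1}^{\infty}r^2(j)}{\sigma_{n,m}^2}\cdot\frac{1}{\sum_{j=1}^{\infty}r^2(j)}\left(\frac{c_0}{2\pi}\right)^4 p^{1+\beta}\sum_{l=1}^{k-1}\frac{(2m+1)^4}{n^{\mu/2}m^2}\cdot\frac{k-l}{k}\,\alpha(pl)$$
$$\le \frac{8}{C}\cdot\frac{Cnm^2\sum_{j=1}^{\infty}r^2(j)}{\sigma_{n,m}^2}\cdot\frac{1}{\sum_{j=1}^{\infty}r^2(j)}\left(\frac{c_0}{2\pi}\right)^4 p^{1+\beta}\sum_{l=1}^{k-1}\frac{(2m+1)^4}{n^{\mu/2}m^2}\,\alpha(pl),$$
since $\frac{k-l}{k}\alpha(pl) \le \alpha(pl)$ for $l = 1,\dots,k-1$,
$$\le K\,\frac{Cnm^2\sum_{j=1}^{\infty}r^2(j)}{\sigma_{n,m}^2}\cdot\frac{(2m+1)^4}{n^{\mu/2}m^2}\sum_{l=1}^{k-1}\frac{c}{l^{1+\beta}}, \qquad K = \frac{8}{C}\left(\frac{c_0}{2\pi}\right)^4\frac{1}{\sum_{j=1}^{\infty}r^2(j)},$$
by Assumption (4), which gives $p^{1+\beta}\alpha(pl) \le c/l^{1+\beta}$,
$$\to K\cdot 1\cdot 2^4\cdot\sum_{l=1}^{\infty}\frac{c}{l^{1+\beta}} \quad \text{as } n \to \infty,$$
since $\sigma_{n,m}^2 = Cnm^2\sum_{j=1}^{\infty}r^2(j)(1+o(1))$ as $n \to \infty$ by Lemma 4, and $\frac{(2m+1)^4}{n^{\mu/2}m^2} \to 2^4$ as $n \to \infty$ by Assumption (B). Now since $\sum_{l=1}^{\infty}\frac{c}{l^{1+\beta}} < \infty$, it follows that, as $n \to \infty$,
$$\left|\frac{2}{\sigma_{n,m}^2}\sum_{l<l'}\operatorname{Cov}(W_{l,n,m},W_{l',n,m})\right| = O\left(\frac{kq^2}{n^{\frac{2-\mu}{2}}p^{1+\beta}}\right) = O\left(\frac{n}{p^2}\cdot\frac{n^{\mu/2}}{p^{\beta}}\right).$$
Assumption (4) determines the admissible values of $\beta$, namely $\beta \ge \frac{\mu}{2}(1+\nu)$. If $\beta = \frac{\mu}{2}(1+\nu)$, then it follows from Assumption (A) that $\frac{n^{\mu/2}}{p^{\beta}} = \left(\frac{n}{p^{1+\nu}}\right)^{\mu/2} \sim 1$ as $n \to \infty$. This implies that if $\beta > \frac{\mu}{2}(1+\nu)$, then $\frac{n^{\mu/2}}{p^{\beta}} \to 0$ as $n \to \infty$. Also, as a consequence of Assumption (A), $n = o(p^2)$ as $n \to \infty$. Thus $\frac{n}{p^2}\cdot\frac{n^{\mu/2}}{p^{\beta}} \to 0$. Hence
$$\left|\frac{2}{\sigma_{n,m}^2}\sum_{l<l'}\operatorname{Cov}(W_{l,n,m},W_{l',n,m})\right| = o(1) \quad \text{as } n \to \infty.$$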
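For the example $p = n^{1/2+\varepsilon}$ given after Condition (B), the rate $\frac{n}{p^2}\cdot\frac{n^{\mu/2}}{p^\beta}$ obtained above can be made explicit. At the boundary value $\beta = \frac{\mu}{2}(1+\nu)$ with $\nu = (1-2\varepsilon)/(1+2\varepsilon)$, one has $p^{1+\nu} = n$, so $n^{\mu/2}/p^\beta = 1$ and the rate reduces to $n/p^2 = n^{-2\varepsilon}$, which still tends to 0. The Python sketch below is illustrative; the values $\varepsilon = 0.1$, $\mu = 0.5$ and the function names are arbitrary choices for the example.

```python
def nu_for(eps):
    """nu solving (1/2 + eps)(1 + nu) = 1."""
    return (1 - 2 * eps) / (1 + 2 * eps)


def second_term_rate(n, eps, mu, beta):
    """(n / p^2) * n^(mu/2) / p^beta for p = n^(1/2 + eps)."""
    p = n ** (0.5 + eps)
    return (n / p ** 2) * n ** (mu / 2) / p ** beta


if __name__ == "__main__":
    eps, mu = 0.1, 0.5
    beta = (mu / 2) * (1 + nu_for(eps))  # boundary case of Assumption (4)
    for n in (1e4, 1e6, 1e8):
        p = n ** (0.5 + eps)
        assert abs(p ** (1 + nu_for(eps)) / n - 1) < 1e-9  # p^(1+nu) = n
        assert abs(second_term_rate(n, eps, mu, beta) / n ** (-2 * eps) - 1) < 1e-9
```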
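3rd term: Up to minor adjustments, the computation is similar to that for the 2nd term. Thus
$$\begin{aligned}
\frac{n^{\frac{2-\mu}{2}}p^{1+\beta}}{kq(p+q)}\left|\frac{2}{\sigma_{n,m}^2}\sum_{l=0}^{k-1}\operatorname{Cov}(W_{l,n,m},W_{k,n,m})\right|
&\le \frac{n^{\frac{2-\mu}{2}}p^{1+\beta}}{kq(p+q)}\cdot\frac{2}{\sigma_{n,m}^2}\sum_{l=0}^{k-1}|\operatorname{Cov}(W_{l,n,m},W_{k,n,m})|\\
&= \frac{2}{C}\cdot\frac{Cnm^2\sum_{j=1}^{\infty}r^2(j)}{\sigma_{n,m}^2}\cdot\frac{1}{\sum_{j=1}^{\infty}r^2(j)}\cdot\frac{p^{1+\beta}}{k}\sum_{l=0}^{k-1}\frac{1}{n^{\mu/2}}\cdot\frac{|\operatorname{Cov}(W_{l,n,m},W_{k,n,m})|}{q(p+q)m^2\alpha(p(k-l))}\,\alpha(p(k-l))\\
&\le \frac{8}{C}\cdot\frac{Cnm^2\sum_{j=1}^{\infty}r^2(j)}{\sigma_{n,m}^2}\left(\frac{c_0}{2\pi}\right)^4\frac{p^{1+\beta}}{k\sum_{j=1}^{\infty}r^2(j)}\sum_{l=0}^{k-1}\frac{(2m+1)^4}{n^{\mu/2}m^2}\,\alpha(p(k-l)),
\end{aligned}$$
since, as noted in the proof of Lemma 5,
$$\frac{|\operatorname{Cov}(W_{l,n,m},W_{k,n,m})|}{q(p+q)m^2\alpha(p(k-l))} \le 4\left(\frac{c_0}{2\pi}\right)^4\frac{(2m+1)^4}{m^2} \quad \text{for } 0 \le l \le k-1.$$
Since $1/k \le 1$, re-indexing by $k - l$ bounds the last expression by
$$\frac{8}{C}\cdot\frac{Cnm^2\sum_{j=1}^{\infty}r^2(j)}{\sigma_{n,m}^2}\cdot\frac{1}{\sum_{j=1}^{\infty}r^2(j)}\left(\frac{c_0}{2\pi}\right)^4 p^{1+\beta}\sum_{l=1}^{k}\frac{(2m+1)^4}{n^{\mu/2}m^2}\,\alpha(pl) \le K\,\frac{Cnm^2\sum_{j=1}^{\infty}r^2(j)}{\sigma_{n,m}^2}\cdot\frac{(2m+1)^4}{n^{\mu/2}m^2}\sum_{l=1}^{k}\frac{c}{l^{1+\beta}},$$
with $K = \frac{8}{C}\left(\frac{c_0}{2\pi}\right)^4\frac{1}{\sum_{j=1}^{\infty}r^2(j)}$, by Assumption (4), and
$$\to K\cdot 1\cdot 2^4\cdot\sum_{l=1}^{\infty}\frac{c}{l^{1+\beta}} \quad \text{as } n \to \infty,$$
since $\sigma_{n,m}^2 = Cnm^2\sum_{j=1}^{\infty}r^2(j)(1+o(1))$ as $n \to \infty$ by Lemma 4, and $\frac{(2m+1)^4}{n^{\mu/2}m^2} \to 2^4$ as $n \to \infty$ by Assumption (B). Now since $\sum_{l=1}^{\infty}\frac{c}{l^{1+\beta}} < \infty$, it follows that, as $n \to \infty$,
$$\left|\frac{2}{\sigma_{n,m}^2}\sum_{l=0}^{k-1}\operatorname{Cov}(W_{l,n,m},W_{k,n,m})\right| = O\left(\frac{kq(p+q)}{n^{\frac{2-\mu}{2}}p^{1+\beta}}\right) = O\left(\frac{n}{p^2}\cdot\frac{n^{\mu/2}}{p^{\beta}}\right)$$
by Assumption (A). Using the same reasoning as for the 2nd term one finally has
$$\left|\frac{2}{\sigma_{n,m}^2}\sum_{l=0}^{k-1}\operatorname{Cov}(W_{l,n,m},W_{k,n,m})\right| = o(1) \quad \text{as } n \to \infty.$$
Hence the 3rd term $\to 0$ as $n \to \infty$.
4th term:
$$\frac{1}{\sigma_{n,m}^2}\operatorname{Var}W_{k,n,m} = \frac{p+q}{n}\cdot\frac{Cnm^2\sum_{j=1}^{\infty}r^2(j)}{\sigma_{n,m}^2}\cdot\frac{\operatorname{Var}W_{k,n,m}}{C(p+q)m^2\sum_{j=1}^{\infty}r^2(j)},$$
where the product of the last two factors tends to $1\cdot 1$ as $n \to \infty$ by Lemma 4. It follows that
$$\frac{1}{\sigma_{n,m}^2}\operatorname{Var}W_{k,n,m} = O\left(\frac{p+q}{n}\right) = o(1) \quad \text{as } n, m \to \infty,$$
since $p = o(n)$ and $q = o(p)$ as $n \to \infty$, as given in Assumption (A). Hence the 4th term $\to 0$ as $n \to \infty$. All four terms on the r.h.s. of Equation (3.5) converge to 0 as $n \to \infty$. Thus
$$\operatorname{Var}\left(\frac{1}{\sigma_{n,m}}S''_{n,m}\right) \to 0 \quad \text{as } n \to \infty.$$
By Chebyshev's inequality,
$$P\left(\frac{|S''_{n,m} - ES''_{n,m}|}{\sigma_{n,m}} > \varepsilon\right) \le \frac{1}{\varepsilon^2}\cdot\frac{1}{\sigma_{n,m}^2}\operatorname{Var}S''_{n,m} \to 0 \quad \text{as } n \to \infty.$$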
Lemma 7. Let Conditions (A), (B), (1)–(5) be satisfied. If $|X_s^j| \le c_0 < \infty$ a.e. for all $j, s$, then
$$\frac{S'_{n,m} - ES'_{n,m}}{\sigma_{n,m}} \xrightarrow{D} N(0,1) \quad \text{as } n \to \infty.$$
Proof. The proof is based on characteristic function techniques and the uniqueness theorem. Thus one wishes to show that, as $n \to \infty$, the characteristic function of the r.v. $\frac{S'_{n,m} - ES'_{n,m}}{\sigma_{n,m}}$ converges to $e^{-\frac{1}{2}t^2}$, the characteristic
function of the standard normal r.v. If this is shown, then by the uniqueness theorem for characteristic functions one can conclude the desired result. Let the characteristic function of $\frac{U_{j,n,m} - EU_{j,n,m}}{\sigma_{n,m}}$ be $\psi_{j,n,m}(t)$. Then $\prod_{j=0}^{k-1}\psi_{j,n,m}(t)$ is also a characteristic function, where $k$ is the positive integer defined in Assumption (A), that is, $k = \left[\frac{n}{p+q}\right]$. Consider
$$\left|Ee^{it\frac{S'_{n,m}-ES'_{n,m}}{\sigma_{n,m}}} - e^{-\frac{1}{2}t^2}\right| \le \left|Ee^{it\frac{S'_{n,m}-ES'_{n,m}}{\sigma_{n,m}}} - \prod_{j=0}^{k-1}\psi_{j,n,m}(t)\right| + \left|\prod_{j=0}^{k-1}\psi_{j,n,m}(t) - e^{-\frac{1}{2}t^2}\right|. \tag{3.6}$$
Consequently, it suffices to show that for all $t \in \mathbb{R}$
(i) $\left|Ee^{it\frac{S'_{n,m}-ES'_{n,m}}{\sigma_{n,m}}} - \prod_{j=0}^{k-1}\psi_{j,n,m}(t)\right| \to 0$ as $n \to \infty$, and
(ii) $\left|\prod_{j=0}^{k-1}\psi_{j,n,m}(t) - e^{-\frac{1}{2}t^2}\right| \to 0$ as $n \to \infty$.
(i) First one observes that the r.v.
$$e^{it\frac{1}{\sigma_{n,m}}[(U_{0,n,m}+\cdots+U_{k-2,n,m}) - (EU_{0,n,m}+\cdots+EU_{k-2,n,m})]}$$
is measurable w.r.t. $\mathcal{F}_{1,m}^{(k-1)p+(k-2)q}$, and the r.v.
$$e^{it\frac{1}{\sigma_{n,m}}[U_{k-1,n,m} - EU_{k-1,n,m}]}$$
is measurable w.r.t. $\mathcal{F}_{(k-1)p+(k-1)q+1,m}^{\infty}$. Let $U_{j,n,m}^0 = U_{j,n,m} - EU_{j,n,m}$; then by Theorem 1(i) in Soedjak [9] one obtains
$$\left|Ee^{it\frac{1}{\sigma_{n,m}}\sum_{j=0}^{k-1}U_{j,n,m}^0} - Ee^{it\frac{1}{\sigma_{n,m}}\sum_{j=0}^{k-2}U_{j,n,m}^0}\,Ee^{it\frac{1}{\sigma_{n,m}}U_{k-1,n,m}^0}\right| \le 16\alpha(q).$$
Similarly, for $1 \le l \le k-2$,
$$\left|Ee^{it\frac{1}{\sigma_{n,m}}\sum_{j=0}^{l}U_{j,n,m}^0} - Ee^{it\frac{1}{\sigma_{n,m}}\sum_{j=0}^{l-1}U_{j,n,m}^0}\,Ee^{it\frac{1}{\sigma_{n,m}}U_{l,n,m}^0}\right| \le 16\alpha(q).$$
In particular, for $l = 1$,
$$\left|Ee^{it\frac{1}{\sigma_{n,m}}\sum_{j=0}^{1}U_{j,n,m}^0} - \prod_{j=0}^{1}Ee^{it\frac{1}{\sigma_{n,m}}U_{j,n,m}^0}\right| \le 16\alpha(q).$$
Now suppose that for some $1 \le l \le k-2$ the following inequality holds:
$$\left|Ee^{it\frac{1}{\sigma_{n,m}}\sum_{j=0}^{l}U_{j,n,m}^0} - \prod_{j=0}^{l}Ee^{it\frac{1}{\sigma_{n,m}}U_{j,n,m}^0}\right| \le 16l\,\alpha(q).$$
Then, adding and subtracting $Ee^{it\frac{1}{\sigma_{n,m}}\sum_{j=0}^{l}U_{j,n,m}^0}\,Ee^{it\frac{1}{\sigma_{n,m}}U_{l+1,n,m}^0}$ and using $|Ee^{iY}| \le 1$ for real $Y$,
$$\begin{aligned}
\left|Ee^{it\frac{1}{\sigma_{n,m}}\sum_{j=0}^{l+1}U_{j,n,m}^0} - \prod_{j=0}^{l+1}Ee^{it\frac{1}{\sigma_{n,m}}U_{j,n,m}^0}\right|
&\le \left|Ee^{it\frac{1}{\sigma_{n,m}}\sum_{j=0}^{l+1}U_{j,n,m}^0} - Ee^{it\frac{1}{\sigma_{n,m}}\sum_{j=0}^{l}U_{j,n,m}^0}\,Ee^{it\frac{1}{\sigma_{n,m}}U_{l+1,n,m}^0}\right|\\
&\quad + \left|Ee^{it\frac{1}{\sigma_{n,m}}\sum_{j=0}^{l}U_{j,n,m}^0} - \prod_{j=0}^{l}Ee^{it\frac{1}{\sigma_{n,m}}U_{j,n,m}^0}\right|\cdot\left|Ee^{it\frac{1}{\sigma_{n,m}}U_{l+1,n,m}^0}\right|\\
&\le 16\alpha(q) + 16l\,\alpha(q)\cdot 1 = 16(l+1)\alpha(q).
\end{aligned}$$
Thus it has been shown by induction that
$$\left|Ee^{it\frac{1}{\sigma_{n,m}}\sum_{j=0}^{k-1}U_{j,n,m}^0} - \prod_{j=0}^{k-1}Ee^{it\frac{1}{\sigma_{n,m}}U_{j,n,m}^0}\right| \le 16(k-1)\alpha(q).$$
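Since $k=[\frac{n}{p+q}]$ and $\alpha(q)\le\frac{C}{q^{1+\beta}}$, it follows that, as $n\to\infty$ and for each $t\in\mathbb R$,
$$\begin{aligned}
\Big|E e^{\frac{it}{\sigma_{n,m}}\sum_{j=0}^{k-1}U'_{j,n,m}}-\prod_{j=0}^{k-1}E e^{\frac{it}{\sigma_{n,m}}U'_{j,n,m}}\Big|
&=O\Big(\frac{n}{(p+q)q^{1+\beta}}\Big)\\
&=O\Big(\frac{n}{pq^{1+\beta}}\Big)\quad\text{since }q=o(p)\\
&=O\Big(\frac{1}{q^{\beta}}\Big)\quad\text{since }q=\frac{n}{p}\\
&=o(1).
\end{aligned}\qquad(3.7)$$
Since $S'_{n,m}=\sum_{j=0}^{k-1}U_{j,n,m}$, Relation (3.7) can now be written as
$$\Big|E e^{it\frac{S'_{n,m}-ES'_{n,m}}{\sigma_{n,m}}}-\prod_{j=0}^{k-1}\psi_{j,n,m}(t)\Big|\to 0\quad\text{as }n\to\infty.$$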
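As a rough numerical illustration of the rate in (3.7), not part of the original argument, the sketch below evaluates the bound $16(k-1)\alpha(q)$ with $\alpha(q)=C/q^{1+\beta}$. It uses hypothetical choices $n=\text{scale}^3$, $p=\text{scale}^2$ and $q=\text{scale}$, so that $q=o(p)$ and $q=n/p$, together with $C=1$ and $\beta=0.5$. The variable `scale` exists only for this sketch and is unrelated to the index $m$ of the paper.

```python
def mixing_bound(scale, beta=0.5, c=1.0):
    """Evaluate 16*(k-1)*alpha(q) with alpha(q) = c / q**(1 + beta).

    Hypothetical block sizes: n = scale**3, p = scale**2, q = scale,
    so that q = o(p) and q = n / p, as in the O(.) chain of (3.7).
    """
    n = scale ** 3
    p, q = scale ** 2, scale
    k = n // (p + q)  # number of big blocks, k = [n / (p + q)]
    return 16 * (k - 1) * c / q ** (1 + beta)


if __name__ == "__main__":
    for scale in (10, 100, 1000):
        print(scale ** 3, mixing_bound(scale))
```

The printed values shrink roughly like $q^{-\beta}$, which matches the $O(1/q^{\beta})$ step.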
This proves assertion (i).

(ii) Consider a sequence of rowwise independent sequences of r.v.'s $Y_{n,j}$,
$$n=0,1,\ldots;\qquad j=0,1,\ldots,k-1,$$
where $Y_{n,j}$ has the same distribution as $\frac{U_{j,n,m}-EU_{j,n,m}}{\sigma_{n,m}}$. Then for each $n$, $Y_{n,j}$ has mean 0 and variance $\frac{\mathrm{Var}\,U_{j,n,m}}{\sigma_{n,m}^2}$. Here $k=k_n$ is defined as in Assumption (A), i.e., $k=[\frac{n}{p+q}]$. Consider the partial sum
$$P_n=\sum_{j=0}^{k-1}Y_{n,j}.$$
Since $EY_{n,j}=0$ for all $n,j$, one directly gets $EP_n=0$. Now
$$\mathrm{Var}\,P_n=E(P_n)^2-(EP_n)^2=E\Big(\sum_{j=0}^{k-1}Y_{n,j}\Big)^2=\sum_{j=0}^{k-1}E(Y_{n,j})^2+2\sum_{j<j'}EY_{n,j}Y_{n,j'}.$$
Since the $Y_{n,j}$ are independent and have mean 0, $EY_{n,j}Y_{n,j'}=EY_{n,j}\,EY_{n,j'}=0$ for $j\ne j'$. Thus
$$\mathrm{Var}\,P_n=\sum_{j=0}^{k-1}E(Y_{n,j})^2=\frac{1}{\sigma_{n,m}^2}\sum_{j=0}^{k-1}\mathrm{Var}\,U_{j,n,m}=\frac{Cnm^2\sum_{l=1}^\infty r^2(l)}{\sigma_{n,m}^2}\cdot\frac{kp}{n}\cdot\frac1k\sum_{j=0}^{k-1}\frac{\mathrm{Var}\,U_{j,n,m}}{Cpm^2\sum_{l=1}^\infty r^2(l)}.$$
By Lemma 4, $\frac{\mathrm{Var}\,U_{j,n,m}}{Cpm^2\sum_{l=1}^\infty r^2(l)}\to1$ as $n\to\infty$ for all $j=0,1,\ldots,k-1$. This implies that $\frac1k\sum_{j=0}^{k-1}\frac{\mathrm{Var}\,U_{j,n,m}}{Cpm^2\sum_{l=1}^\infty r^2(l)}\to1$ as $n\to\infty$. It also follows from Lemma 4 that $\frac{Cnm^2\sum_{l=1}^\infty r^2(l)}{\sigma_{n,m}^2}\to1$ as $n\to\infty$, and Assumption (A) implies that $\frac{kp}{n}\to1$ as $n\to\infty$. Thus $\mathrm{Var}\,P_n\to1$ as $n\to\infty$.

Note also that the characteristic function of $P_n$ is $\prod_{j=0}^{k-1}\psi_{j,n,m}(t)$. The goal is to show that $\big|\prod_{j=0}^{k-1}\psi_{j,n,m}(t)-e^{-\frac12t^2}\big|\to0$ as $n\to\infty$ for each $t\in\mathbb R$, i.e., that the limit distribution of $P_n$ is $N(0,1)$. This requires a classical central limit theorem, stated below for reference. For details and a proof see, for example, M. M. Rao [6], Thm. 5.3.5.

Theorem 8. Let $\{Y_{n,j},\ 1\le j\le k=k(n)\}$ be a sequence of rowwise independent sequences of r.v.'s with finite second moments and mean 0. Set $P_n=\sum_{j=1}^{k}Y_{n,j}$. Then $P_n\xrightarrow{D}N(0,1)$, $\mathrm{Var}\,P_n\to1$, and the $Y_{n,j}$ are infinitesimal, iff for each $\varepsilon>0$ the following are satisfied:
a) $\lim_{n\to\infty}\sum_{j=1}^{k}\int_{|x|\ge\varepsilon}x^2\,dF_j(x)=0$,
b) $\lim_{n\to\infty}\sum_{j=1}^{k}\int_{|x|<\varepsilon}x^2\,dF_j(x)=1$,
where $F_j(x)=P[Y_{n,j}<x]$.
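From the above computation of $\mathrm{Var}\,P_n$ one has
$$\mathrm{Var}\,P_n=\sum_{j=1}^{k}E(Y_{n,j})^2=\sum_{j=1}^{k}\int_{|x|\ge\varepsilon}x^2\,dF_j(x)+\sum_{j=1}^{k}\int_{|x|<\varepsilon}x^2\,dF_j(x)$$
by the fundamental theorem of probability. Since $\mathrm{Var}\,P_n\to1$ as $n\to\infty$, it follows that, as $n\to\infty$ (hence $k\to\infty$),
$$1=\lim_{n\to\infty}\sum_{j=1}^{k}\int_{|x|\ge\varepsilon}x^2\,dF_j(x)+\lim_{n\to\infty}\sum_{j=1}^{k}\int_{|x|<\varepsilon}x^2\,dF_j(x).\qquad(3.8)$$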
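Condition (a) together with Relation (3.8) implies Condition (b). Therefore, by the limit theorem recalled above, it suffices to establish Condition (a) in order to show that $P_n\xrightarrow{D}N(0,1)$. One has
$$\begin{aligned}
\sum_{j=1}^{k}\int_{|x|\ge\varepsilon}x^2\,dF_j(x)
&=\sum_{j=1}^{k-1}\int_{|x|\ge\varepsilon}x^2\,dP\Big[\frac{U_{j,n,m}-EU_{j,n,m}}{\sigma_{n,m}}<x\Big]\\
&=\frac{1}{\sigma_{n,m}^2}\sum_{j=1}^{k-1}\int_{|x|\ge\varepsilon\sigma_{n,m}}x^2\,dP[U_{j,n,m}-EU_{j,n,m}<x]\quad\text{by change of variables,}\\
&=\frac{1}{\sigma_{n,m}^2}\sum_{j=1}^{k-1}\int_{[|U_{j,n,m}-EU_{j,n,m}|\ge\varepsilon\sigma_{n,m}]}(U_{j,n,m}-EU_{j,n,m})^2\,dP\quad\text{by the fundamental theorem of probability,}\\
&=\frac{1}{\sigma_{n,m}^2}\sum_{j=1}^{k-1}\int_{[|U_{j,n,m}-EU_{j,n,m}|\ge\varepsilon\sigma_{n,m}]}\frac{\varepsilon^2\sigma_{n,m}^2}{\varepsilon^2\sigma_{n,m}^2}(U_{j,n,m}-EU_{j,n,m})^2\,dP\\
&\le\frac{1}{\varepsilon^2\sigma_{n,m}^4}\sum_{j=1}^{k-1}\int_{[|U_{j,n,m}-EU_{j,n,m}|\ge\varepsilon\sigma_{n,m}]}(U_{j,n,m}-EU_{j,n,m})^4\,dP\\
&\le\frac{1}{\varepsilon^2\sigma_{n,m}^4}\sum_{j=1}^{k-1}\int_\Omega\Big(\sum_{l=jp+jq+1}^{(j+1)p+jq}(Z_m^l-EZ_m^l)\Big)^4dP.
\end{aligned}\qquad(3.9)$$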
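The fourth moment is used here because the corresponding estimate for the second moment is not sufficiently refined for our purposes. It will be shown that for all $j=0,\ldots,k-1$
$$\int_\Omega\Big(\sum_{l=jp+jq+1}^{(j+1)p+jq}(Z_m^l-EZ_m^l)\Big)^4dP=o(p^3m^8)\quad\text{as }n\to\infty.\qquad(3.10)$$
Let $\tilde Z_m^l=Z_m^l-EZ_m^l$. Then
$$\begin{aligned}
E\Big(\sum_{l=jp+jq+1}^{(j+1)p+jq}\tilde Z_m^l\Big)^4
&=\sum_{l=jp+jq+1}^{(j+1)p+jq}E(\tilde Z_m^l)^4+3\sum_{l_1\ne l_2}E(\tilde Z_m^{l_1})^2(\tilde Z_m^{l_2})^2+4\sum_{l_1\ne l_2}E(\tilde Z_m^{l_1})^3\tilde Z_m^{l_2}\\
&\quad+6\sum_{l_1\ne l_2\ne l_3}E(\tilde Z_m^{l_1})^2\tilde Z_m^{l_2}\tilde Z_m^{l_3}+\sum_{l_1\ne l_2\ne l_3\ne l_4}E\tilde Z_m^{l_1}\tilde Z_m^{l_2}\tilde Z_m^{l_3}\tilde Z_m^{l_4},
\end{aligned}\qquad(3.11)$$
where each sum runs over ordered tuples of pairwise distinct indices in $\{jp+jq+1,\ldots,(j+1)p+jq\}$. The multinomial coefficients do not affect the order estimates that follow.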
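The multinomial coefficients in an expansion of the form (3.11) can be double-checked by brute force. The sketch below is an illustration, not part of the original proof. It classifies all $p^4$ index sequences $(l_1,\ldots,l_4)$ by multiplicity pattern and compares each tally with the coefficient times the number of ordered tuples of distinct indices. The coefficients checked are 1, 3, 4, 6, 1 for the patterns $l^4$, $l_1^2l_2^2$, $l_1^3l_2$, $l_1^2l_2l_3$ and $l_1l_2l_3l_4$. Indices are relabelled $0,\ldots,p-1$.

```python
from collections import Counter
from itertools import product


def pattern_tally(p):
    """Count index sequences in {0,...,p-1}^4 by multiplicity pattern,
    e.g. (2, 1, 1) means one index appears twice and two others once."""
    tally = Counter()
    for seq in product(range(p), repeat=4):
        pattern = tuple(sorted(Counter(seq).values(), reverse=True))
        tally[pattern] += 1
    return tally


def predicted_tally(p):
    """Coefficient times the number of ordered tuples of distinct indices."""
    return {
        (4,): 1 * p,
        (2, 2): 3 * p * (p - 1),
        (3, 1): 4 * p * (p - 1),
        (2, 1, 1): 6 * p * (p - 1) * (p - 2),
        (1, 1, 1, 1): 1 * p * (p - 1) * (p - 2) * (p - 3),
    }


if __name__ == "__main__":
    p = 5
    tally = pattern_tally(p)
    print(all(tally[pat] == cnt for pat, cnt in predicted_tally(p).items()))
```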
The terms on the r.h.s. can now be estimated.

1st term: By Lemma 1, and since $P$ is a probability measure, one has for all $j=0,\ldots,k-1$
$$\sum_{l=jp+jq+1}^{(j+1)p+jq}E(\tilde Z_m^l)^4\le p\max_lE(\tilde Z_m^l)^4\le p\Big(\frac{2c_0}{2\pi}\Big)^8(2m+1)^8,$$
so that
$$\sum_{l=jp+jq+1}^{(j+1)p+jq}E(\tilde Z_m^l)^4=O(pm^8)\quad\text{as }n\to\infty.$$
2nd term: Also by Lemma 1, and since $\sum_{1\le l_1\ne l_2\le p}1=O(p^2)$,
$$\sum_{l_1\ne l_2}E(\tilde Z_m^{l_1})^2(\tilde Z_m^{l_2})^2=O(p^2m^8)\quad\text{as }n\to\infty\text{ for all }j=0,\ldots,k-1.$$
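3rd term: Similarly, for all $j=0,\ldots,k-1$,
$$\sum_{l_1\ne l_2}E(\tilde Z_m^{l_1})^3\tilde Z_m^{l_2}=O(p^2m^8)\quad\text{as }n\to\infty.$$
4th term: As $n\to\infty$,
$$\begin{aligned}
\sum_{l_1\ne l_2\ne l_3}E(\tilde Z_m^{l_1})^2\tilde Z_m^{l_2}\tilde Z_m^{l_3}
&=O\Big(\sum_{jp+jq+1\le l_1<l_2<l_3\le(j+1)p+jq}|E(\tilde Z_m^{l_1})^2\tilde Z_m^{l_2}\tilde Z_m^{l_3}|\Big)\\
&=O\Big(\sum_{jp+jq+1\le l_1<l_2<l_3\le(j+1)p+jq}|\mathrm{Cov}((\tilde Z_m^{l_1})^2\tilde Z_m^{l_2},\tilde Z_m^{l_3})|\Big)\\
&=O\Big(m^8\sum_{jp+jq+1\le l_1<l_2<l_3\le(j+1)p+jq}\alpha(l_3-l_2)\Big),
\end{aligned}$$
but for all $j=0,\ldots,k-1$
$$\begin{aligned}
\frac1{p^2}\sum_{jp+jq+1\le l_1<l_2<l_3\le(j+1)p+jq}\alpha(l_3-l_2)
&=\frac1{p^2}\sum_{l=1}^{p-2}\alpha(l)\sum_{i=1}^{p-(l+1)}i
=\frac1{p^2}\sum_{l=1}^{p-2}\frac{(p-l-1)(p-l)}{2}\alpha(l)\\
&\le\frac12\,\frac{(p-2)(p-1)}{p^2}\sum_{l=1}^{p-2}\frac{c}{l^{1+\beta}}
\;\to\;\frac12\sum_{l=1}^{\infty}\frac{c}{l^{1+\beta}}<\infty,
\end{aligned}$$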
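so that for all $j=0,\ldots,k-1$,
$$\sum_{l_1\ne l_2\ne l_3}E(\tilde Z_m^{l_1})^2\tilde Z_m^{l_2}\tilde Z_m^{l_3}=O(m^8p^2)\quad\text{as }n\to\infty.$$
5th term: Let $L_j$ denote the set of ordered 4-tuples $(l_1,l_2,l_3,l_4)$ of distinct indices in $\{jp+jq+1,\ldots,(j+1)p+jq\}$, and let $T_j$ denote the subset with $l_1<l_2<l_3<l_4$. As $n\to\infty$,
$$\sum_{L_j}E\tilde Z_m^{l_1}\tilde Z_m^{l_2}\tilde Z_m^{l_3}\tilde Z_m^{l_4}=O\Big(\sum_{T_j}|E\tilde Z_m^{l_1}\tilde Z_m^{l_2}\tilde Z_m^{l_3}\tilde Z_m^{l_4}|\Big)=O\Big(m^8\sum_{T_j}\min\{\alpha(l_2-l_1),\alpha(l_4-l_3)\}\Big).\qquad(3.12)$$
This holds because $E\tilde Z_m^l=0$, so that $E\tilde Z_m^{l_1}\tilde Z_m^{l_2}\tilde Z_m^{l_3}\tilde Z_m^{l_4}$ equals both $\mathrm{Cov}(\tilde Z_m^{l_1}\tilde Z_m^{l_2}\tilde Z_m^{l_3},\tilde Z_m^{l_4})$ and $\mathrm{Cov}(\tilde Z_m^{l_1},\tilde Z_m^{l_2}\tilde Z_m^{l_3}\tilde Z_m^{l_4})$. Hence, by Theorem 3.1 and Lemma 4.1, for $jp+jq+1\le l_1<l_2<l_3<l_4\le(j+1)p+jq$ and as $n\to\infty$,
$$|E\tilde Z_m^{l_1}\tilde Z_m^{l_2}\tilde Z_m^{l_3}\tilde Z_m^{l_4}|=O(m^8\min\{\alpha(l_2-l_1),\alpha(l_4-l_3)\}).$$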
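The counting identity used for the 4th term can be checked numerically: in a block of length $p$, the number of triples $l_1<l_2<l_3$ with $l_3-l_2=l$ is $\frac{(p-l-1)(p-l)}{2}$. The sketch below also checks the resulting bound. It assumes the hypothetical mixing rate $\alpha(l)=l^{-(1+\beta)}$ (i.e. $c=1$), and positions in the block are relabelled $1,\ldots,p$.

```python
from itertools import combinations


def gap_count(p, lag):
    """Number of triples 1 <= l1 < l2 < l3 <= p with l3 - l2 == lag (brute force)."""
    return sum(1 for _, l2, l3 in combinations(range(1, p + 1), 3) if l3 - l2 == lag)


def normalized_alpha_sum(p, beta):
    """(1/p^2) * sum over l1 < l2 < l3 of alpha(l3 - l2), with alpha(l) = l**-(1+beta)."""
    return sum((p - l - 1) * (p - l) / 2 * l ** -(1 + beta) for l in range(1, p - 1)) / p ** 2


def stated_bound(p, beta):
    """(1/2) * (p-2)(p-1)/p^2 * sum_{l=1}^{p-2} l**-(1+beta), as in the text."""
    return 0.5 * (p - 2) * (p - 1) / p ** 2 * sum(l ** -(1 + beta) for l in range(1, p - 1))


if __name__ == "__main__":
    p = 12
    print(all(gap_count(p, lag) == (p - lag - 1) * (p - lag) // 2 for lag in range(1, p - 1)))
    print(normalized_alpha_sum(p, 0.5), stated_bound(p, 0.5))
```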
Now,
$$\sum_{T_j}\min\{\alpha(l_2-l_1),\alpha(l_4-l_3)\}\le c\sum_{T_j}\min\Big\{\frac{1}{(l_2-l_1)^{1+\beta}},\frac{1}{(l_4-l_3)^{1+\beta}}\Big\},$$
since $\alpha(l)\le\frac{c}{l^{1+\beta}}$ by Assumption (4). To estimate $\sum_{T_j}\min\{\frac1{(l_2-l_1)^{1+\beta}},\frac1{(l_4-l_3)^{1+\beta}}\}$, one counts all combinations with $\max\{(l_2-l_1),(l_4-l_3)\}=x$ for $jp+jq+1\le l_1<l_2<l_3<l_4\le(j+1)p+jq$. Let $N_x$ be the number of such combinations, where $x=1,2,\ldots,p-3$. The formula for $N_x$ is obtained by examining all arrangements of $(l_2-l_1)$ and $(l_4-l_3)$ with $\max\{(l_2-l_1),(l_4-l_3)\}=x$. If $l_4-l_3=x$, then $l_2-l_1=1,2,\ldots,x$, and if $l_2-l_1=x$, then $l_4-l_3=1,2,\ldots,x$. There are two cases: Case 1 is $x=1,2,\ldots,[\frac{p-1}{2}]$ and Case 2 is $x=[\frac{p-1}{2}]+1,\ldots,p-3$.

Case 1: $x=1,2,\ldots,[\frac{p-1}{2}]$. Fix $(l_4-l_3)=x$, $(l_2-l_1)=i$ with $i\in\{1,2,\ldots,x-1\}$, and $l_1=l+(jp+jq)$ with $l\in\{1,2,\ldots,p-x-1-i\}$. The number of such combinations is $p-x-i-l$. Thus
the number of combinations as $l$ ranges from 1 to $p-x-1-i$ is
$$\sum_{l=1}^{p-x-1-i}(p-x-i-l)=\frac{(p-x-i-1)(p-x-i)}{2}.$$
By symmetry, the number of combinations with $(l_2-l_1)=x$ and $(l_4-l_3)=i$, $i=1,2,\ldots,x-1$, is also $\frac{(p-x-i-1)(p-x-i)}{2}$. When $(l_2-l_1)=x$ and $(l_4-l_3)=x$, the number of combinations is $\frac{(p-2x-1)(p-2x)}{2}$. Thus for $x=1,2,\ldots,[\frac{p-1}{2}]$,
$$N_x=\sum_{i=1}^{x-1}(p-x-i-1)(p-x-i)+\frac{(p-2x-1)(p-2x)}{2}.$$
Case 2: $x=[\frac{p-1}{2}]+1,\ldots,p-3$. Let $\max\{(l_2-l_1),(l_4-l_3)\}=(p-3)-y$ for $y=0,1,\ldots,(p-3)-[\frac{p-2}{2}]-1$. Then one obtains the recursion
$$N_{(p-3)-(y+1)}=2\big(1+2+\cdots+(y+2)\big)+N_{(p-3)-y}=(y+2)(y+3)+N_{(p-3)-y},\qquad N_{p-3}=2.$$
It follows that
$$N_{(p-3)-y}=(y+1)(y+2)+y(y+1)+\cdots+2\cdot3+1\cdot2=\sum_{i=1}^{y+1}i(i+1).$$
With the change of variable $x=(p-3)-y$ one has
$$N_x=\sum_{i=1}^{p-2-x}(p-x-i-1)(p-x-i).$$
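The Case 1 and Case 2 formulas for $N_x$ can be verified by brute-force enumeration for small block lengths. The sketch below is an illustration only, with positions relabelled $1,\ldots,p$. It counts 4-tuples $l_1<l_2<l_3<l_4$ by $x=\max\{l_2-l_1,\,l_4-l_3\}$ and compares the counts with the formulas, using the split at $[\frac{p-1}{2}]$ from the text.

```python
from itertools import combinations
from math import comb


def brute_n(p):
    """N_x by enumeration: tally 4-tuples by x = max(l2 - l1, l4 - l3)."""
    counts = {}
    for l1, l2, l3, l4 in combinations(range(1, p + 1), 4):
        x = max(l2 - l1, l4 - l3)
        counts[x] = counts.get(x, 0) + 1
    return counts


def formula_n(p, x):
    """N_x from the Case 1 / Case 2 formulas."""
    if x <= (p - 1) // 2:  # Case 1
        mixed = sum((p - x - i - 1) * (p - x - i) for i in range(1, x))
        return mixed + (p - 2 * x - 1) * (p - 2 * x) // 2
    # Case 2: x = [(p-1)/2] + 1, ..., p - 3
    return sum((p - x - i - 1) * (p - x - i) for i in range(1, p - 1 - x))


if __name__ == "__main__":
    p = 9
    counts = brute_n(p)
    print(all(counts.get(x, 0) == formula_n(p, x) for x in range(1, p - 2)))
    print(sum(counts.values()) == comb(p, 4))
```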
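Now,
$$\begin{aligned}
\sum_{T_j}\min\Big\{\frac{1}{(l_2-l_1)^{1+\beta}},\frac{1}{(l_4-l_3)^{1+\beta}}\Big\}
&=\sum_{x=1}^{[\frac{p-2}{2}]}\frac{N_x}{x^{1+\beta}}+\sum_{x=[\frac{p-1}{2}]+1}^{p-3}\frac{N_x}{x^{1+\beta}}\\
&=\sum_{x=1}^{[\frac{p-2}{2}]}\frac{1}{x^{1+\beta}}\Big[\sum_{i=1}^{x-1}(p-x-i-1)(p-x-i)+\frac{(p-2x)(p-2x-1)}{2}\Big]\\
&\quad+\sum_{x=[\frac{p-1}{2}]+1}^{p-3}\frac{1}{x^{1+\beta}}\sum_{i=1}^{p-x-2}(p-x-i-1)(p-x-i)\\
&=T_1+T_2\ \text{(say)},
\end{aligned}$$
where
$$T_1=\sum_{x=1}^{[\frac{p-2}{2}]}\frac{1}{x^{1+\beta}}\Big[\sum_{i=1}^{x-1}(p-x-i-1)(p-x-i)+\frac{(p-2x)(p-2x-1)}{2}\Big],$$
and
$$T_2=\sum_{x=[\frac{p-1}{2}]+1}^{p-3}\frac{1}{x^{1+\beta}}\sum_{i=1}^{p-x-2}(p-x-i-1)(p-x-i).$$
Note that $T_1$ and $T_2$ do not depend on the parameter $j$ in $T_j$, i.e.,
$$\sum_{T_j}\min\Big\{\frac{1}{(l_2-l_1)^{1+\beta}},\frac{1}{(l_4-l_3)^{1+\beta}}\Big\}=\sum_{T_0}\min\Big\{\frac{1}{(l_2-l_1)^{1+\beta}},\frac{1}{(l_4-l_3)^{1+\beta}}\Big\}.$$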
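The estimate for $T_1$ is as follows:
$$\begin{aligned}
T_1&=\sum_{x=1}^{[\frac{p-2}{2}]}\frac{1}{x^{1+\beta}}\Big[\sum_{i=1}^{x-1}(p-x-i-1)(p-x-i)+\frac{(p-2x)(p-2x-1)}{2}\Big]\\
&\le\sum_{x=1}^{[\frac{p-2}{2}]}\frac{1}{x^{1+\beta}}\Big[(p-x-2)(p-x-1)(x-1)+\frac{(p-2x)(p-2x-1)}{2}\Big]
\end{aligned}$$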
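$$\begin{aligned}
&\le\sum_{x=1}^{[\frac{p-2}{2}]}\frac{x}{x^{1+\beta}}(p-x-2)(p-x-1)+\frac12\sum_{x=1}^{[\frac{p-2}{2}]}\frac{1}{x^{1+\beta}}(p-2x)(p-2x-1)\\
&\le(p-3)(p-2)\sum_{x=1}^{[\frac{p-2}{2}]}\frac{1}{x^{\beta}}+\frac{(p-3)(p-2)}{2}\sum_{x=1}^{[\frac{p-2}{2}]}\frac{x}{x^{1+\beta}}\\
&\le p^2\sum_{x=1}^{[\frac{p-2}{2}]}\frac{1}{x^{\beta}}+\frac{p^2}{2}\sum_{x=1}^{[\frac{p-2}{2}]}\frac{1}{x^{\beta}}
=\frac32p^2\sum_{x=1}^{[\frac{p-2}{2}]}\frac{1}{x^{\beta}}
\le\frac32p^2\sum_{x=1}^{p}\frac{1}{x^{\beta}}.
\end{aligned}$$
But one observes that
$$\sum_{l=1}^{p}\frac{1}{l^{\beta}}=\sum_{l=1}^{[p^{1/2}]}\frac{l}{l^{1+\beta}}+\sum_{l=[p^{1/2}]+1}^{p}\frac{l}{l^{1+\beta}}\le p^{1/2}\sum_{l\le p^{1/2}}\frac{1}{l^{1+\beta}}+p\sum_{l>p^{1/2}}\frac{1}{l^{1+\beta}},$$
and
$$p^{-1}\sum_{l=1}^{p}\frac{1}{l^{\beta}}\le p^{-1/2}\sum_{l\le p^{1/2}}\frac{1}{l^{1+\beta}}+\sum_{l>p^{1/2}}\frac{1}{l^{1+\beta}}\to0\quad\text{as }n\to\infty,$$
since $\sum_{l=1}^{\infty}\frac{1}{l^{1+\beta}}<\infty$. It follows that
$$p^{-3}T_1\le\frac32p^{-1}\sum_{x=1}^{p}\frac{1}{x^{\beta}}\to0\quad\text{as }n\to\infty.$$
Thus $T_1=o(p^3)$ as $n\to\infty$.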
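Here is a quick numerical check of the $T_1$ estimate; it is an illustration, not part of the proof. The sketch computes $T_1$ exactly from its definition, with the summation range $x=1,\ldots,[\frac{p-2}{2}]$ used in the text, and compares it with the bound $\frac32p^2\sum_{x=1}^{p}x^{-\beta}$. It also prints the ratio of the bound to $p^3$, which decreases in $p$. The values of $\beta$ are arbitrary choices.

```python
def t1(p, beta):
    """T_1 computed directly from its definition."""
    total = 0.0
    for x in range(1, (p - 2) // 2 + 1):
        inner = sum((p - x - i - 1) * (p - x - i) for i in range(1, x))
        inner += (p - 2 * x) * (p - 2 * x - 1) / 2
        total += inner / x ** (1 + beta)
    return total


def t1_bound(p, beta):
    """The bound (3/2) p^2 sum_{x=1}^{p} x^(-beta) obtained in the text."""
    return 1.5 * p ** 2 * sum(x ** -beta for x in range(1, p + 1))


if __name__ == "__main__":
    for p in (10, 100, 1000):
        print(p, t1(p, 0.5) / p ** 3, t1_bound(p, 0.5) / p ** 3)
```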
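The estimate for $T_2$ is as follows. The inner sum decreases in $x$, and $[\frac{p-1}{2}]\ge\frac{p-3}{2}$; hence
$$\begin{aligned}
T_2&=\sum_{x=[\frac{p-1}{2}]+1}^{p-3}\frac{1}{x^{1+\beta}}\sum_{i=1}^{p-x-2}(p-x-i-1)(p-x-i)\\
&\le\sum_{x=[\frac{p-1}{2}]+1}^{p-3}\frac{1}{x^{1+\beta}}\sum_{i=1}^{p-[\frac{p-1}{2}]-3}\Big(p-\Big[\frac{p-1}{2}\Big]-i-2\Big)\Big(p-\Big[\frac{p-1}{2}\Big]-i-1\Big)\\
&\le\Big(p-\Big[\frac{p-1}{2}\Big]-3\Big)^2\Big(p-\Big[\frac{p-1}{2}\Big]-2\Big)\sum_{x=[\frac{p-1}{2}]+1}^{p-3}\frac{1}{x^{1+\beta}}\\
&\le\Big(\frac{p-3}{2}\Big)^2\frac{p-1}{2}\sum_{x=[\frac{p-1}{2}]+1}^{p-3}\frac{1}{x^{1+\beta}}
\le\frac{p^3}{8}\sum_{x=[\frac{p-1}{2}]+1}^{p-3}\frac{1}{x^{1+\beta}}.
\end{aligned}$$
It follows that
$$p^{-3}T_2\le\frac18\sum_{x=[\frac{p-1}{2}]+1}^{p-3}\frac{1}{x^{1+\beta}}\to0\quad\text{as }n\to\infty,$$
since $\sum_{x=1}^{\infty}\frac{1}{x^{1+\beta}}<\infty$. Thus $T_2=o(p^3)$ as $n\to\infty$.

The estimates for $T_1$ and $T_2$ then give, for all $j=0,\ldots,k-1$,
$$\sum_{T_j}\min\Big\{\frac{1}{(l_2-l_1)^{1+\beta}},\frac{1}{(l_4-l_3)^{1+\beta}}\Big\}=o(p^3)\quad\text{as }n\to\infty,$$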
and this in turn implies that (3.12) becomes, for all $j=0,\ldots,k-1$,
$$\sum_{L_j}E\tilde Z_m^{l_1}\tilde Z_m^{l_2}\tilde Z_m^{l_3}\tilde Z_m^{l_4}=o(m^8p^3)\quad\text{as }n\to\infty.$$
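The decomposition $\sum_{T_0}\min\{\cdot\}=T_1+T_2$ and the bound $T_2\le\frac{p^3}{8}\sum x^{-(1+\beta)}$ can also be checked numerically. This is an illustration only, and $\beta=1$ is an arbitrary choice. The direct sum uses $\min\{a^{-(1+\beta)},b^{-(1+\beta)}\}=\max\{a,b\}^{-(1+\beta)}$. The comparison is made for even $p$, for which the two summation ranges $1,\ldots,[\frac{p-2}{2}]$ and $[\frac{p-1}{2}]+1,\ldots,p-3$ together cover every $x=1,\ldots,p-3$.

```python
from itertools import combinations


def direct_sum(p, beta):
    """Sum over l1<l2<l3<l4 in {1..p} of min{(l2-l1)^-(1+b), (l4-l3)^-(1+b)}."""
    return sum(max(l2 - l1, l4 - l3) ** -(1 + beta)
               for l1, l2, l3, l4 in combinations(range(1, p + 1), 4))


def t1(p, beta):
    """T_1 from its definition (x = 1, ..., [(p-2)/2])."""
    total = 0.0
    for x in range(1, (p - 2) // 2 + 1):
        inner = sum((p - x - i - 1) * (p - x - i) for i in range(1, x))
        inner += (p - 2 * x) * (p - 2 * x - 1) / 2
        total += inner / x ** (1 + beta)
    return total


def t2(p, beta):
    """T_2 from its definition (x = [(p-1)/2]+1, ..., p-3)."""
    total = 0.0
    for x in range((p - 1) // 2 + 1, p - 2):
        inner = sum((p - x - i - 1) * (p - x - i) for i in range(1, p - x - 1))
        total += inner / x ** (1 + beta)
    return total


def t2_bound(p, beta):
    """(p^3/8) * sum_{x=[(p-1)/2]+1}^{p-3} x^-(1+beta), as in the text."""
    return p ** 3 / 8 * sum(x ** -(1 + beta) for x in range((p - 1) // 2 + 1, p - 2))


if __name__ == "__main__":
    for p in (6, 10, 20):
        print(p, direct_sum(p, 1.0), t1(p, 1.0) + t2(p, 1.0), t2(p, 1.0), t2_bound(p, 1.0))
```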
It has just been shown that all five terms on the r.h.s. of Equation (3.11) are $o(m^8p^3)$ as $n\to\infty$ for all $j=0,\ldots,k-1$. This establishes Assertion (3.10),
$$E\Big[\Big(\sum_{l=jp+jq+1}^{(j+1)p+jq}\tilde Z_m^l\Big)^4\Big]=o(m^8p^3)\quad\text{as }n\to\infty.$$
Thus there exists a nonnegative function $\theta(\cdot)$ with $\theta(p)\to0$ as $p\to\infty$ such that for all $j=0,\ldots,k-1$
$$E\Big(\sum_{l=jp+jq+1}^{(j+1)p+jq}\tilde Z_m^l\Big)^4=m^8p^3\theta(p)\le m^8p^3\hat\theta(p),\qquad(3.13)$$
where $\hat\theta(p)=\sup_{j\ge p}\theta(j)\downarrow0$ is a nonnegative decreasing function of $p$. It follows from the CBS inequality and Relation (3.13) that
$$m^8p^3\hat\theta(p)\ge E\Big(\sum_{l=jp+jq+1}^{(j+1)p+jq}\tilde Z_m^l\Big)^4\ge\Big[E\Big(\sum_{l=jp+jq+1}^{(j+1)p+jq}\tilde Z_m^l\Big)^2\Big]^2=\{\mathrm{Var}\,U_{j,n,m}\}^2.$$
Dividing both sides by $\big(Cpm^2\sum_{j=1}^\infty r^2(j)\big)^2$ one has
$$\frac{m^4p\,\hat\theta(p)}{C^2\big(\sum_{j=1}^\infty r^2(j)\big)^2}\ge\Big(\frac{\mathrm{Var}\,U_{j,n,m}}{Cpm^2\sum_{j=1}^\infty r^2(j)}\Big)^2.$$
By Lemma 4 this implies
$$\liminf_{n\to\infty}m^4p\,\hat\theta(p)=K\ge C^2\Big(\sum_{j=1}^\infty r^2(j)\Big)^2>0,\qquad K=\text{constant}>0.$$
But since the limit of $\hat\theta(p)$ as $p\to\infty$ exists, i.e., $\hat\theta(p)\downarrow0$, one has
$$\lim_{n\to\infty}m^4p\,\hat\theta(p)=K>0.$$
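Returning to (3.9) and using (3.13), one has
$$\begin{aligned}
\sum_{j=1}^{k}\int_{|x|\ge\varepsilon}x^2\,dF_j(x)
&\le\frac{km^8p^3\hat\theta(p)}{\varepsilon^2n^2m^4\big(C\sum_{j=1}^\infty r^2(j)\big)^2}\cdot\frac{\big(Cnm^2\sum_{j=1}^\infty r^2(j)\big)^2}{\sigma_{n,m}^4}\\
&=\frac{kp^2}{n^2}\cdot\frac{m^4p\,\hat\theta(p)}{\varepsilon^2\big(C\sum_{j=1}^\infty r^2(j)\big)^2}\cdot\frac{\big(Cnm^2\sum_{j=1}^\infty r^2(j)\big)^2}{\sigma_{n,m}^4}.
\end{aligned}$$
But $\frac{kp^2}{n^2}=O(\frac pn)=o(1)$, $m^4p\,\hat\theta(p)\to K$ as $n\to\infty$, and by Lemma 4, $\sigma_{n,m}^2=Cnm^2\sum_{j=1}^\infty r^2(j)\,(1+o(1))$ as $n\to\infty$. Therefore,
$$\sum_{j=1}^{k}\int_{|x|\ge\varepsilon}x^2\,dF_j(x)\to0\quad\text{as }n\to\infty.$$
This establishes Condition (a) in Theorem 8. As mentioned before, Condition (b) then follows from Equation (3.8). By Theorem 8, $P_n\xrightarrow{D}N(0,1)$, i.e., $\big|\prod_{j=0}^{k-1}\psi_{j,n,m}(t)-e^{-\frac12t^2}\big|\to0$ as $n\to\infty$ for each $t\in\mathbb R$. This proves Assertion (ii). With Assertions (i) and (ii), the desired result follows immediately from (3.6).

The preceding seven lemmas essentially give the proof of the following result.

Theorem 9. If $|X_s^j|\le c_0<\infty$ a.e. for all $j,s$, and Conditions (A), (B), (1)–(5) are satisfied, then
$$\frac{S_{n,m}-ES_{n,m}}{\sigma_{n,m}}\xrightarrow{D}N(0,1)\quad\text{as }n\to\infty.$$
Proof. Lemma 6 shows that
$$\frac{S''_{n,m}-ES''_{n,m}}{\sigma_{n,m}}\xrightarrow{P}0\quad\text{as }n\to\infty.$$
By virtue of Equation (2.11),
$$\frac{S_{n,m}-ES_{n,m}}{\sigma_{n,m}}=\frac{S'_{n,m}-ES'_{n,m}}{\sigma_{n,m}}+\frac{S''_{n,m}-ES''_{n,m}}{\sigma_{n,m}}$$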
and hence, by Slutsky's theorem, the r.v.'s $\frac{S_{n,m}-ES_{n,m}}{\sigma_{n,m}}$ and $\frac{S'_{n,m}-ES'_{n,m}}{\sigma_{n,m}}$ have the same limit distribution. Lemma 7 shows that the r.v. $\frac{S'_{n,m}-ES'_{n,m}}{\sigma_{n,m}}$ has a standard normal limit distribution, that is,
$$\frac{S'_{n,m}-ES'_{n,m}}{\sigma_{n,m}}\xrightarrow{D}N(0,1)\quad\text{as }n\to\infty.$$
Hence the assertion of the theorem.
The result in Theorem 9 is restricted to the case where the $X_s^j$ are bounded. It will now be extended to a more general case, where the $X_s^j$ are not necessarily bounded but satisfy some moment conditions. Let
$$E|X_s^j|^{4(1+\delta)}<K<\infty\ \ \forall j,s\qquad\text{and}\qquad\delta>(2+\beta)/\beta,$$
where $\beta>0$ is defined in Assumption (5). Note that for such $\delta$
$$\sum_{k=1}^{\infty}\alpha^{\delta/(2+\delta)}(k)<\infty$$
by (2.13). For each $s$, consider the truncation of $X_s^j$,
$$X_s^{j,N}=\begin{cases}X_s^j,&|X_s^j|\le N,\\0,&|X_s^j|>N,\end{cases}$$
and set $\tilde X_s^{j,N}=X_s^j-X_s^{j,N}$. Then
$$\begin{aligned}
Z_m^j&=\frac{1}{(2\pi)^2}\sum_{s,t=-m}^{m}(\cos s\lambda\cos t\lambda'+\sin s\lambda\sin t\lambda')(X_s^{j,N}+\tilde X_s^{j,N})(X_t^{j,N}+\tilde X_t^{j,N})\\
&=\frac{1}{(2\pi)^2}\sum_{s,t=-m}^{m}(\cos s\lambda\cos t\lambda'+\sin s\lambda\sin t\lambda')\big[X_s^{j,N}X_t^{j,N}+\tilde X_s^{j,N}X_t^{j,N}+X_s^{j,N}\tilde X_t^{j,N}+\tilde X_s^{j,N}\tilde X_t^{j,N}\big]\\
&=Z_m^{j,NN}+Z_m^{j,\tilde NN}+Z_m^{j,N\tilde N}+Z_m^{j,\tilde N\tilde N}
\end{aligned}\qquad(3.14)$$
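where
$$\begin{aligned}
Z_m^{j,NN}&=\frac{1}{(2\pi)^2}\sum_{s,t=-m}^{m}(\cos s\lambda\cos t\lambda'+\sin s\lambda\sin t\lambda')X_s^{j,N}X_t^{j,N},\\
Z_m^{j,\tilde NN}&=\frac{1}{(2\pi)^2}\sum_{s,t=-m}^{m}(\cos s\lambda\cos t\lambda'+\sin s\lambda\sin t\lambda')\tilde X_s^{j,N}X_t^{j,N},\\
Z_m^{j,N\tilde N}&=\frac{1}{(2\pi)^2}\sum_{s,t=-m}^{m}(\cos s\lambda\cos t\lambda'+\sin s\lambda\sin t\lambda')X_s^{j,N}\tilde X_t^{j,N},\\
Z_m^{j,\tilde N\tilde N}&=\frac{1}{(2\pi)^2}\sum_{s,t=-m}^{m}(\cos s\lambda\cos t\lambda'+\sin s\lambda\sin t\lambda')\tilde X_s^{j,N}\tilde X_t^{j,N}.
\end{aligned}$$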
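Since $Z_m^j$ is bilinear in $(X_s^j)$, the decomposition (3.14) holds pathwise for any truncation level $N$. The sketch below computes $Z_m^j$ and its four pieces for one path and checks that they add up. It is written in plain Python, and the sample path and the values of $m$, $\lambda$, $\lambda'$ and $N$ are hypothetical. The weight $\cos s\lambda\cos t\lambda'+\sin s\lambda\sin t\lambda'$ is used exactly as in the definitions.

```python
import math


def kernel(s, t, lam, lam2):
    """cos(s*lam)cos(t*lam2) + sin(s*lam)sin(t*lam2)."""
    return math.cos(s * lam) * math.cos(t * lam2) + math.sin(s * lam) * math.sin(t * lam2)


def z_stat(xs, ys, m, lam, lam2):
    """(2 pi)^-2 * sum_{s,t=-m}^{m} kernel(s, t) * xs[s] * ys[t]."""
    total = sum(kernel(s, t, lam, lam2) * xs[s] * ys[t]
                for s in range(-m, m + 1) for t in range(-m, m + 1))
    return total / (2 * math.pi) ** 2


def truncate(xs, level):
    """Split X into X^N (values with |X| <= N, else 0) and the remainder X - X^N."""
    kept = {s: (v if abs(v) <= level else 0.0) for s, v in xs.items()}
    rest = {s: xs[s] - kept[s] for s in xs}
    return kept, rest


if __name__ == "__main__":
    m, lam, lam2, level = 4, 0.7, 1.3, 2.0
    path = {s: 0.5 * s + (3.0 if s % 3 == 0 else -0.4) for s in range(-m, m + 1)}
    kept, rest = truncate(path, level)
    whole = z_stat(path, path, m, lam, lam2)
    pieces = (z_stat(kept, kept, m, lam, lam2) + z_stat(rest, kept, m, lam, lam2)
              + z_stat(kept, rest, m, lam, lam2) + z_stat(rest, rest, m, lam, lam2))
    print(whole, pieces)
```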
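Let
$$S_{n,m}^{NN}=\sum_{j=1}^{n}Z_m^{j,NN},\quad S_{n,m}^{\tilde NN}=\sum_{j=1}^{n}Z_m^{j,\tilde NN},\quad S_{n,m}^{N\tilde N}=\sum_{j=1}^{n}Z_m^{j,N\tilde N},\quad S_{n,m}^{\tilde N\tilde N}=\sum_{j=1}^{n}Z_m^{j,\tilde N\tilde N}.$$
Then
$$\frac{S_{n,m}-ES_{n,m}}{\sigma_{n,m}}=\frac{S_{n,m}^{NN}-ES_{n,m}^{NN}}{\sigma_{n,m}}+\frac{S_{n,m}^{\tilde NN}-ES_{n,m}^{\tilde NN}}{\sigma_{n,m}}+\frac{S_{n,m}^{N\tilde N}-ES_{n,m}^{N\tilde N}}{\sigma_{n,m}}+\frac{S_{n,m}^{\tilde N\tilde N}-ES_{n,m}^{\tilde N\tilde N}}{\sigma_{n,m}}.\qquad(3.15)$$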
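The last three terms will be shown to have the following property. For any $\varepsilon>0$ there exists a positive integer $n_0$ such that, when $N_0$ is chosen to satisfy
$$N_0^{\delta/2(1+\delta)}>\frac{(K_1+K_2)\,n_0^{\mu/2}}{\varepsilon^3},$$
one has
a) $\mathrm{Var}\Big(\frac{S_{n_0,m}^{\tilde N_0N_0}}{\sigma_{n_0,m}}\Big)<\varepsilon^3$,
b) $\mathrm{Var}\Big(\frac{S_{n_0,m}^{N_0\tilde N_0}}{\sigma_{n_0,m}}\Big)<\varepsilon^3$,
c) $\mathrm{Var}\Big(\frac{S_{n_0,m}^{\tilde N_0\tilde N_0}}{\sigma_{n_0,m}}\Big)<\varepsilon^3$.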
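a)
$$\mathrm{Var}\Big(\frac{S_{n,m}^{\tilde NN}}{\sigma_{n,m}}\Big)=\frac{1}{\sigma_{n,m}^2}\mathrm{Var}\,S_{n,m}^{\tilde NN}=\frac{1}{\sigma_{n,m}^2}\Big[\sum_{j=1}^{n}\mathrm{Var}\,Z_m^{j,\tilde NN}+2\sum_{j<j'}\mathrm{Cov}(Z_m^{j,\tilde NN},Z_m^{j',\tilde NN})\Big].\qquad(3.16)$$
The 1st and 2nd terms are estimated next. First observe that for $j=1,\ldots,n$
$$\begin{aligned}
E|Z_m^{j,\tilde NN}|^2&\le\frac{4}{(2\pi)^4}N^2(2m+1)^4\sup_{s,s',j}E|\tilde X_s^{j,N}\tilde X_{s'}^{j,N}|\\
&\le\frac{4}{(2\pi)^4}N^2(2m+1)^4\sup_{s,j}E|\tilde X_s^{j,N}|^2\quad\text{by the CBS inequality}\\
&=\frac{4}{(2\pi)^4}N^2(2m+1)^4\sup_{s,j}\int_{[|X_s^j|>N]}|X_s^j|^2\,dP\\
&\le\frac{4}{(2\pi)^4}(2m+1)^4\sup_{s,j}\int_{[|X_s^j|>N]}|X_s^j|^4\,dP.
\end{aligned}$$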
Since $\mathrm{Var}\,Z_m^{j,\tilde NN}\le2E|Z_m^{j,\tilde NN}|^2$, one has
$$\sum_{j=1}^{n}\mathrm{Var}\,Z_m^{j,\tilde NN}\le2\sum_{j=1}^{n}E|Z_m^{j,\tilde NN}|^2\le\frac{8}{(2\pi)^4}n(2m+1)^4\sup_{s,j}\int_{[|X_s^j|>N]}|X_s^j|^4\,dP,$$
but
$$\begin{aligned}
\int_{[|X_s^j|>N]}|X_s^j|^4\,dP&\le\int_\Omega\chi_{[|X_s^j|>N]}|X_s^j|^4\,dP\\
&\le\{E|X_s^j|^{4(1+\delta)}\}^{\frac{1}{1+\delta}}\{P[|X_s^j|>N]\}^{\delta/(1+\delta)}\quad\text{by Hölder's inequality}\\
&\le K^{1/(1+\delta)}\frac{1}{N^{\delta/(1+\delta)}}\{E|X_s^j|\}^{\delta/(1+\delta)}\quad\text{by Markov's inequality}\\
&\le K^{1/(1+\delta)}\frac{1}{N^{\delta/(1+\delta)}}\{E|X_s^j|^{4(1+\delta)}\}^{\delta/4(1+\delta)^2}\quad\text{by Liapounov's inequality}\\
&\le K^{1/(1+\delta)}K^{\delta/4(1+\delta)^2}\frac{1}{N^{\delta/(1+\delta)}}\quad\text{for all }s,j,N,
\end{aligned}$$
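so that
$$\frac{N^{\delta/(1+\delta)}}{n^{\mu/2}}\cdot\text{1st term}=\frac{N^{\delta/(1+\delta)}}{n^{\mu/2}}\cdot\frac{1}{\sigma_{n,m}^2}\sum_{j=1}^{n}\mathrm{Var}\,Z_m^{j,\tilde NN}\le\frac{8}{C(2\pi)^4\sum_{j=1}^\infty r^2(j)}K^{\frac{4(1+\delta)+\delta}{4(1+\delta)^2}}\,\frac{(2m+1)^4}{n^{\mu/2}m^2}\times\frac{Cnm^2\sum_{j=1}^\infty r^2(j)}{\sigma_{n,m}^2}.$$
By Assumption (B), $\frac{(2m+1)^4}{n^{\mu/2}m^2}\to16$, and by Lemma 4, $\sigma_{n,m}^2=Cnm^2\sum_{j=1}^\infty r^2(j)\,(1+o(1))$ as $n\to\infty$. Hence
$$\text{1st term}=O\Big(\frac{n^{\mu/2}}{N^{\delta/(1+\delta)}}\Big),\qquad(3.17)$$
so that there exists a positive integer $n_1$ such that for all $n\ge n_1$
$$\text{1st term}\le K_1\frac{n^{\mu/2}}{N^{\delta/(1+\delta)}},\quad\text{where }K_1=\frac{8\cdot16}{C(2\pi)^4\sum_{j=1}^\infty r^2(j)}K^{\frac{4(1+\delta)+\delta}{4(1+\delta)^2}}.$$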
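The truncation estimate used for the 1st term, $\int_{[|X|>N]}|X|^4dP\le K^{1/(1+\delta)}K^{\delta/4(1+\delta)^2}N^{-\delta/(1+\delta)}$, can be checked numerically. The sketch below does this for a hypothetical discrete distribution. It takes $K$ equal to $E|X|^{4(1+\delta)}$, the limiting case of the requirement $E|X|^{4(1+\delta)}<K$. The values, the probabilities and $\delta=4$ are arbitrary choices.

```python
def tail_fourth_moment(values, probs, level):
    """E[|X|^4 ; |X| > N] for a discrete random variable X."""
    return sum(pr * abs(v) ** 4 for v, pr in zip(values, probs) if abs(v) > level)


def truncation_bound(values, probs, level, delta):
    """K^(1/(1+d)) * K^(d/(4(1+d)^2)) / N^(d/(1+d)) with K = E|X|^(4(1+d))."""
    k = sum(pr * abs(v) ** (4 * (1 + delta)) for v, pr in zip(values, probs))
    exponent = 1 / (1 + delta) + delta / (4 * (1 + delta) ** 2)
    return k ** exponent / level ** (delta / (1 + delta))


if __name__ == "__main__":
    values = [0.5, 1.0, 2.0, 5.0, 10.0]
    probs = [0.3, 0.3, 0.2, 0.15, 0.05]
    for level in (0.1, 1.0, 3.0, 7.0, 20.0):
        print(level, tail_fourth_moment(values, probs, level),
              truncation_bound(values, probs, level, delta=4.0))
```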
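Also observe that for $j<j'$
$$\begin{aligned}
\mathrm{Cov}(Z_m^{j,\tilde NN},Z_m^{j',\tilde NN})=\frac{1}{(2\pi)^4}\sum_{s,t,s',t'=-m}^{m}&(\cos s\lambda\cos t\lambda'+\sin s\lambda\sin t\lambda')(\cos s'\lambda\cos t'\lambda'+\sin s'\lambda\sin t'\lambda')\\
&\times\mathrm{Cov}(\tilde X_s^{j,N}X_t^{j,N},\tilde X_{s'}^{j',N}X_{t'}^{j',N}),
\end{aligned}$$
so that
$$|\mathrm{Cov}(Z_m^{j,\tilde NN},Z_m^{j',\tilde NN})|\le\frac{4}{(2\pi)^4}(2m+1)^4\max_{s,t,s',t'}|\mathrm{Cov}(\tilde X_s^{j,N}X_t^{j,N},\tilde X_{s'}^{j',N}X_{t'}^{j',N})|.$$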
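But
$$\begin{aligned}
E|\tilde X_s^{j,N}X_t^{j,N}|^{2+\delta}&\le N^{2+\delta}E|\tilde X_s^{j,N}|^{2+\delta}=N^{2+\delta}\int_{[|X_s^j|>N]}|X_s^j|^{2+\delta}\,dP\\
&\le\int_{[|X_s^j|>N]}|X_s^j|^{2(2+\delta)}\,dP\le\sup_{s,j}\int_{[|X_s^j|>N]}|X_s^j|^{2(2+\delta)}\,dP\le\sup_{s,j}\int_\Omega|X_s^j|^{2(2+\delta)}\,dP\\
&\le\sup_{s,j}\Big(\int_\Omega|X_s^j|^{4(1+\delta)}\,dP\Big)^{\frac{2+\delta}{2(1+\delta)}}\quad\text{by Liapounov's inequality}\\
&<K^{\frac{2+\delta}{2(1+\delta)}}<\infty\quad\text{by the moment requirement }E|X_s^j|^{4(1+\delta)}<K,
\end{aligned}$$
so that by Theorem 1(ii) in Soedjak [9], for $j<j'$,
$$|\mathrm{Cov}(Z_m^{j,\tilde NN},Z_m^{j',\tilde NN})|\le\frac{4}{(2\pi)^4}(2m+1)^4\Big[\sup_{s,j}\int_{[|X_s^j|>N]}|X_s^j|^{2(2+\delta)}\,dP\Big]^{\frac{2}{2+\delta}}\alpha^{\delta/(2+\delta)}(j'-j).$$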
Now
$$\begin{aligned}
\int_{[|X_s^j|>N]}|X_s^j|^{2(2+\delta)}\,dP&\le\int_\Omega\chi_{[|X_s^j|>N]}|X_s^j|^{2(2+\delta)}\,dP\\
&\le\{E|X_s^j|^{4(1+\delta)}\}^{(2+\delta)/2(1+\delta)}\{P[|X_s^j|>N]\}^{\delta/2(1+\delta)}\quad\text{by Hölder's inequality}\\
&\le K^{(2+\delta)/2(1+\delta)}\frac{1}{N^{\delta/2(1+\delta)}}[E|X_s^j|]^{\delta/2(1+\delta)}\quad\text{by Markov's inequality}\\
&\le K^{(2+\delta)/2(1+\delta)}\frac{1}{N^{\delta/2(1+\delta)}}[E|X_s^j|^{4(1+\delta)}]^{\delta/8(1+\delta)^2}\quad\text{by Liapounov's inequality}\\
&\le K^{(2+\delta)/2(1+\delta)}K^{\delta/8(1+\delta)^2}\frac{1}{N^{\delta/2(1+\delta)}}\quad\text{for all }j,s;
\end{aligned}$$
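one has
$$\frac{N^{\delta/2(1+\delta)}}{n^{\mu/2}}|\text{2nd term}|\le\frac{N^{\delta/2(1+\delta)}}{n^{\mu/2}}\cdot\frac{2}{\sigma_{n,m}^2}\sum_{j<j'}|\mathrm{Cov}(Z_m^{j,\tilde NN},Z_m^{j',\tilde NN})|\le K_2\frac{(2m+1)^4}{n^{\mu/2}m^2}\cdot\frac{Cnm^2\sum_{j=1}^\infty r^2(j)}{\sigma_{n,m}^2}\cdot\frac1n\sum_{j<j'}\alpha^{\delta/(2+\delta)}(j'-j),$$
where
$$K_2=\frac{8}{C(2\pi)^4\sum_{j=1}^\infty r^2(j)}K^{\frac{4(2+\delta)(1+\delta)+\delta}{8(1+\delta)^2}}.$$
Now
$$\frac1n\sum_{j<j'}\alpha^{\delta/(2+\delta)}(j'-j)=\frac1n\sum_{j=1}^{n-1}(n-j)\alpha^{\delta/(2+\delta)}(j)\to\sum_{j=1}^{\infty}\alpha^{\delta/(2+\delta)}(j)<\infty\quad\text{as }n\to\infty.$$
Together with Lemma 4 and Assumption (B), as before, this gives
$$|\text{2nd term}|=O\Big(\frac{n^{\mu/2}}{N^{\delta/2(1+\delta)}}\Big).\qquad(3.18)$$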
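The regrouping $\frac1n\sum_{j<j'}a(j'-j)=\frac1n\sum_{l=1}^{n-1}(n-l)a(l)$ used above, and the convergence of this average to $\sum_l a(l)$, can be illustrated numerically. The sketch assumes the hypothetical rate $a(l)=\alpha^{\delta/(2+\delta)}(l)$ with $\alpha(l)=l^{-(1+\beta)}$, $\beta=1$ and $\delta=4$; with these values $\sum_l a(l)<\infty$.

```python
def alpha_power(lag, beta=1.0, delta=4.0):
    """Hypothetical alpha(l)^(delta/(2+delta)) with alpha(l) = l^-(1+beta)."""
    return (lag ** -(1 + beta)) ** (delta / (2 + delta))


def pair_average(n, a):
    """(1/n) * sum over 1 <= j < j' <= n of a(j' - j), by direct double sum."""
    return sum(a(jp - j) for j in range(1, n + 1) for jp in range(j + 1, n + 1)) / n


def lag_average(n, a):
    """(1/n) * sum_{l=1}^{n-1} (n - l) * a(l): the same sum grouped by lag."""
    return sum((n - lag) * a(lag) for lag in range(1, n)) / n


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, lag_average(n, alpha_power), sum(alpha_power(l) for l in range(1, n)))
```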
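Thus there exists a positive integer $n_2$ such that for all $n\ge n_2$
$$|\text{2nd term}|\le K_2\frac{n^{\mu/2}}{N^{\delta/2(1+\delta)}}.$$
Then from (3.17) and (3.18) one has, for all $n\ge\max\{n_1,n_2\}$,
$$\mathrm{Var}\Big(\frac{S_{n,m}^{\tilde NN}}{\sigma_{n,m}}\Big)\le K_1\frac{n^{\mu/2}}{N^{\delta/(1+\delta)}}+K_2\frac{n^{\mu/2}}{N^{\delta/2(1+\delta)}}.$$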
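Now for any $\varepsilon>0$ fix $n_0>\max\{n_1,n_2\}$ and choose $N_0$ such that $N_0^{\delta/2(1+\delta)}>\frac{(K_1+K_2)n_0^{\mu/2}}{\varepsilon^3}$. Then
$$\mathrm{Var}\Big(\frac{S_{n_0,m}^{\tilde N_0N_0}}{\sigma_{n_0,m}}\Big)\le K_1\frac{n_0^{\mu/2}}{N_0^{\delta/(1+\delta)}}+K_2\frac{n_0^{\mu/2}}{N_0^{\delta/2(1+\delta)}}<\frac{(K_1+K_2)\,n_0^{\mu/2}}{N_0^{\delta/2(1+\delta)}}<\varepsilon^3.\qquad(3.19)$$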
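b) By symmetry with part (a), for any $\varepsilon>0$, for fixed $n_0>\max\{n_1,n_2\}$ and fixed $N_0$ such that $N_0^{\delta/2(1+\delta)}>\frac{(K_1+K_2)n_0^{\mu/2}}{\varepsilon^3}$, one has
$$\mathrm{Var}\Big(\frac{S_{n_0,m}^{N_0\tilde N_0}}{\sigma_{n_0,m}}\Big)<\varepsilon^3.\qquad(3.20)$$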
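c)
$$\mathrm{Var}\Big(\frac{S_{n,m}^{\tilde N\tilde N}}{\sigma_{n,m}}\Big)=\frac{1}{\sigma_{n,m}^2}\Big[\sum_{j=1}^{n}\mathrm{Var}\,Z_m^{j,\tilde N\tilde N}+2\sum_{j<j'}\mathrm{Cov}(Z_m^{j,\tilde N\tilde N},Z_m^{j',\tilde N\tilde N})\Big].\qquad(3.21)$$
The 1st and 2nd terms can now be estimated. First observe that for $j=1,2,\ldots$
$$\begin{aligned}
E|Z_m^{j,\tilde N\tilde N}|^2&\le\frac{4}{(2\pi)^4}(2m+1)^4\sup_{s,t,s',t',j}E|\tilde X_s^{j,N}\tilde X_t^{j,N}\tilde X_{s'}^{j,N}\tilde X_{t'}^{j,N}|\\
&\le\frac{4}{(2\pi)^4}(2m+1)^4\sup_{s,t,j}E|\tilde X_s^{j,N}\tilde X_t^{j,N}|^2\quad\text{by the CBS inequality}\\
&\le\frac{4}{(2\pi)^4}(2m+1)^4\sup_{s,j}E|\tilde X_s^{j,N}|^4\quad\text{by the CBS inequality}\\
&=\frac{4}{(2\pi)^4}(2m+1)^4\sup_{s,j}\int_{[|X_s^j|>N]}|X_s^j|^4\,dP.
\end{aligned}$$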
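Since $\mathrm{Var}\,Z_m^{j,\tilde N\tilde N}\le2E|Z_m^{j,\tilde N\tilde N}|^2$, one has
$$\sum_{j=1}^{n}\mathrm{Var}\,Z_m^{j,\tilde N\tilde N}\le2\sum_{j=1}^{n}E|Z_m^{j,\tilde N\tilde N}|^2\le\frac{8}{(2\pi)^4}n(2m+1)^4\sup_{s,j}\int_{[|X_s^j|>N]}|X_s^j|^4\,dP.$$
Thus, as in part (a), as $n\to\infty$
$$\text{1st term}=O\Big(\frac{n^{\mu/2}}{N^{\delta/(1+\delta)}}\Big),\qquad(3.22)$$
so that there exists $n_1$ such that for all $n\ge n_1$, 1st term $\le K_1\frac{n^{\mu/2}}{N^{\delta/(1+\delta)}}$, where $K_1$ is defined as in part (a). Also observe that for $j<j'$
$$\begin{aligned}
\mathrm{Cov}(Z_m^{j,\tilde N\tilde N},Z_m^{j',\tilde N\tilde N})=\frac{1}{(2\pi)^4}\sum_{s,t,s',t'=-m}^{m}&(\cos s\lambda\cos t\lambda'+\sin s\lambda\sin t\lambda')(\cos s'\lambda\cos t'\lambda'+\sin s'\lambda\sin t'\lambda')\\
&\times\mathrm{Cov}(\tilde X_s^{j,N}\tilde X_t^{j,N},\tilde X_{s'}^{j',N}\tilde X_{t'}^{j',N}),
\end{aligned}$$
so that
$$|\mathrm{Cov}(Z_m^{j,\tilde N\tilde N},Z_m^{j',\tilde N\tilde N})|\le\frac{4}{(2\pi)^4}(2m+1)^4\max_{s,t,s',t'}|\mathrm{Cov}(\tilde X_s^{j,N}\tilde X_t^{j,N},\tilde X_{s'}^{j',N}\tilde X_{t'}^{j',N})|.$$
But
$$\begin{aligned}
E|\tilde X_s^{j,N}\tilde X_t^{j,N}|^{2+\delta}&\le[E|\tilde X_s^{j,N}|^{2(2+\delta)}E|\tilde X_t^{j,N}|^{2(2+\delta)}]^{1/2}\quad\text{by the CBS inequality}\\
&\le\Big[\sup_{s,j}E|\tilde X_s^{j,N}|^{2(2+\delta)}\sup_{t,j}E|\tilde X_t^{j,N}|^{2(2+\delta)}\Big]^{1/2}=\sup_{s,j}E|\tilde X_s^{j,N}|^{2(2+\delta)}\\
&\le\sup_{s,j}E|X_s^j|^{2(2+\delta)}\le\sup_{s,j}[E|X_s^j|^{4(1+\delta)}]^{\frac{2+\delta}{2(1+\delta)}}\quad\text{by Hölder's inequality}
\end{aligned}$$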
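$$\le K^{\frac{2+\delta}{2(1+\delta)}}<\infty\quad\text{by the moment requirement }E|X_s^j|^{4(1+\delta)}<K,$$
so that by Theorem 1(ii) in Soedjak [9], for $j<j'$,
$$|\mathrm{Cov}(Z_m^{j,\tilde N\tilde N},Z_m^{j',\tilde N\tilde N})|\le\frac{4}{(2\pi)^4}(2m+1)^4\Big[\sup_{s,j}\int_{[|X_s^j|>N]}|X_s^j|^{2(2+\delta)}\,dP\Big]^{\frac{2}{2+\delta}}\alpha^{\delta/(2+\delta)}(j'-j).$$
Following the same computations as in part (a), one has as $n\to\infty$
$$|\text{2nd term}|=O\Big(\frac{n^{\mu/2}}{N^{\delta/2(1+\delta)}}\Big),\qquad(3.23)$$
whence there exists a positive integer $n_2$ such that for all $n\ge n_2$, $|\text{2nd term}|\le K_2\frac{n^{\mu/2}}{N^{\delta/2(1+\delta)}}$, where $K_2$ is defined as in part (a). Consequently, for any $\varepsilon>0$ fix $n_0>\max\{n_1,n_2\}$ and choose $N_0$ such that $N_0^{\delta/2(1+\delta)}>\frac{(K_1+K_2)n_0^{\mu/2}}{\varepsilon^3}$. Then
$$\mathrm{Var}\Big(\frac{S_{n_0,m}^{\tilde N_0\tilde N_0}}{\sigma_{n_0,m}}\Big)<\varepsilon^3.\qquad(3.24)$$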
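This establishes part (c). Returning to Equation (3.15), let
$$\tilde S_{n,m}=\frac{S_{n,m}-ES_{n,m}}{\sigma_{n,m}},\quad\tilde S_{n,m}^{NN}=\frac{S_{n,m}^{NN}-ES_{n,m}^{NN}}{\sigma_{n,m}},\quad\tilde S_{n,m}^{\tilde NN}=\frac{S_{n,m}^{\tilde NN}-ES_{n,m}^{\tilde NN}}{\sigma_{n,m}},$$
$$\tilde S_{n,m}^{N\tilde N}=\frac{S_{n,m}^{N\tilde N}-ES_{n,m}^{N\tilde N}}{\sigma_{n,m}},\quad\tilde S_{n,m}^{\tilde N\tilde N}=\frac{S_{n,m}^{\tilde N\tilde N}-ES_{n,m}^{\tilde N\tilde N}}{\sigma_{n,m}}.$$
Then (3.15) becomes
$$\tilde S_{n,m}=\tilde S_{n,m}^{NN}+\tilde S_{n,m}^{\tilde NN}+\tilde S_{n,m}^{N\tilde N}+\tilde S_{n,m}^{\tilde N\tilde N}.$$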
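Using the above equation and $|e^{ia}|=1$ for real $a$, one has
$$\begin{aligned}
\big|Ee^{it\tilde S_{n,m}}-e^{-\frac12t^2}\big|&\le\big|Ee^{it\tilde S_{n,m}^{NN}}\big(e^{it\tilde S_{n,m}^{\tilde NN}}e^{it\tilde S_{n,m}^{N\tilde N}}e^{it\tilde S_{n,m}^{\tilde N\tilde N}}-1\big)\big|+\big|Ee^{it\tilde S_{n,m}^{NN}}-e^{-\frac12t^2}\big|\\
&\le E\big|e^{it\tilde S_{n,m}^{\tilde NN}}e^{it\tilde S_{n,m}^{N\tilde N}}-1\big|+E\big|e^{it\tilde S_{n,m}^{\tilde N\tilde N}}-1\big|+\big|Ee^{it\tilde S_{n,m}^{NN}}-e^{-\frac12t^2}\big|\\
&\le E\big|e^{it\tilde S_{n,m}^{\tilde NN}}-1\big|+E\big|e^{it\tilde S_{n,m}^{N\tilde N}}-1\big|+E\big|e^{it\tilde S_{n,m}^{\tilde N\tilde N}}-1\big|+\big|Ee^{it\tilde S_{n,m}^{NN}}-e^{-\frac12t^2}\big|.
\end{aligned}\qquad(3.25)$$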
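The first three terms on the right can be estimated further. For all $\varepsilon>0$,
$$\begin{aligned}
\big|Ee^{it\tilde S_{n,m}^{\tilde N\tilde N}}-1\big|\le E\big|e^{it\tilde S_{n,m}^{\tilde N\tilde N}}-1\big|&=\int_{[|\tilde S_{n,m}^{\tilde N\tilde N}|\le\varepsilon]}\big|e^{it\tilde S_{n,m}^{\tilde N\tilde N}}-1\big|\,dP+\int_{[|\tilde S_{n,m}^{\tilde N\tilde N}|>\varepsilon]}\big|e^{it\tilde S_{n,m}^{\tilde N\tilde N}}-1\big|\,dP\\
&\le|t|\int_{[|\tilde S_{n,m}^{\tilde N\tilde N}|\le\varepsilon]}|\tilde S_{n,m}^{\tilde N\tilde N}|\,dP+2P[|\tilde S_{n,m}^{\tilde N\tilde N}|>\varepsilon]\\
&\le|t|\varepsilon+\frac{2}{\varepsilon^2}\mathrm{Var}\,\tilde S_{n,m}^{\tilde N\tilde N}
\end{aligned}$$
by Chebyshev's inequality. Similarly,
$$E\big|e^{it\tilde S_{n,m}^{\tilde NN}}-1\big|\le|t|\varepsilon+\frac{2}{\varepsilon^2}\mathrm{Var}\,\tilde S_{n,m}^{\tilde NN}\quad\text{and}\quad E\big|e^{it\tilde S_{n,m}^{N\tilde N}}-1\big|\le|t|\varepsilon+\frac{2}{\varepsilon^2}\mathrm{Var}\,\tilde S_{n,m}^{N\tilde N}.$$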
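The estimate $E|e^{itB}-1|\le|t|\varepsilon+\frac{2}{\varepsilon^2}\mathrm{Var}\,B$ used above holds for any centred random variable $B$. This follows from $|e^{ia}-1|\le\min\{|a|,2\}$ together with Chebyshev's inequality. The sketch below checks the estimate for a hypothetical centred discrete $B$ over a few values of $t$ and $\varepsilon$.

```python
import cmath


def char_gap(values, probs, t):
    """E|exp(itB) - 1| for a discrete random variable B."""
    return sum(pr * abs(cmath.exp(1j * t * v) - 1) for v, pr in zip(values, probs))


def split_bound(values, probs, t, eps):
    """|t|*eps + 2*Var(B)/eps^2, from splitting the integral at |B| = eps."""
    mean = sum(pr * v for v, pr in zip(values, probs))
    var = sum(pr * (v - mean) ** 2 for v, pr in zip(values, probs))
    return abs(t) * eps + 2 * var / eps ** 2


if __name__ == "__main__":
    values = [-0.2, -0.05, 0.05, 0.2]  # centred: the mean is 0
    probs = [0.1, 0.4, 0.4, 0.1]
    for t in (0.5, 2.0, 10.0):
        for eps in (0.01, 0.1, 0.3):
            print(t, eps, char_gap(values, probs, t), split_bound(values, probs, t, eps))
```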
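From the results in parts (a), (b) and (c), one can fix $n_0>\max\{n_1,n_2\}$ and choose $N_0$ such that $N_0^{\delta/2(1+\delta)}>\frac{(K_1+K_2)n_0^{\mu/2}}{\varepsilon^3}$ to get
$$\mathrm{Var}\,\tilde S_{n_0,m}^{\tilde N_0N_0},\ \mathrm{Var}\,\tilde S_{n_0,m}^{N_0\tilde N_0},\ \mathrm{Var}\,\tilde S_{n_0,m}^{\tilde N_0\tilde N_0}\le\varepsilon^3.$$
By Theorem 9, for such fixed $N=N_0$ there exists a positive integer $n_3$ such that for all $n>n_3$ and each $t\in\mathbb R$,
$$\big|Ee^{it\tilde S_{n,m}^{N_0N_0}}-e^{-\frac12t^2}\big|<\varepsilon.$$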
Thus for any $\varepsilon>0$ and $n_0$, $n_3$ and $N_0$ as above, one has for all $n>\max\{n_0,n_3\}$
$$\big|Ee^{it\tilde S_{n,m}}-e^{-\frac12t^2}\big|\le\varepsilon+3|t|\varepsilon+6\varepsilon\quad\text{for each }t\in\mathbb R.$$
Since $\varepsilon$ can be made arbitrarily small, this proves the following theorem.

Theorem 10. Let Conditions (A), (B), (1)–(5) be satisfied. If, moreover, $\sup_{s,j}E|X_s^j|^{4(1+\delta)}<\infty$ for some $\delta>(2+\beta)/\beta$, then
$$\frac{S_{n,m}-ES_{n,m}}{\sigma_{n,m}}\xrightarrow{D}N(0,1)\quad\text{as }n\to\infty.$$
Since the estimator $\hat f_{n,m}(\lambda,\lambda')$ of $f(\lambda,\lambda')$ is of the form $\hat f_{n,m}(\lambda,\lambda')=\frac1nS_{n,m}$, the following corollary of Theorem 10 holds.

Corollary 11. Under the same conditions as in Theorem 10,
$$\frac{n[\hat f_{n,m}(\lambda,\lambda')-E\hat f_{n,m}(\lambda,\lambda')]}{\sigma_{n,m}}\xrightarrow{D}N(0,1)\quad\text{as }n\to\infty.$$
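For concreteness, here is a minimal sketch of the discrete-parameter estimator $\hat f_{n,m}(\lambda,\lambda')=\frac1nS_{n,m}=\frac1n\sum_{j=1}^nZ_m^j$. Here $Z_m^j=\frac{1}{(2\pi)^2}\sum_{s,t=-m}^m(\cos s\lambda\cos t\lambda'+\sin s\lambda\sin t\lambda')X_s^jX_t^j$, as in the truncation argument. The sketch is written in plain Python for readability. The function names and the data layout (one dictionary per observed segment) are hypothetical, and the normalizing constant $\sigma_{n,m}$ of Corollary 11 is not computed.

```python
import math


def z_m(segment, m, lam, lam2):
    """Z_m^j for one observed segment {s: X_s^j, s = -m..m}."""
    total = 0.0
    for s in range(-m, m + 1):
        for t in range(-m, m + 1):
            weight = (math.cos(s * lam) * math.cos(t * lam2)
                      + math.sin(s * lam) * math.sin(t * lam2))
            total += weight * segment[s] * segment[t]
    return total / (2 * math.pi) ** 2


def f_hat(segments, m, lam, lam2):
    """Real-valued estimator (1/n) * S_{n,m} = (1/n) * sum_j Z_m^j."""
    return sum(z_m(seg, m, lam, lam2) for seg in segments) / len(segments)


if __name__ == "__main__":
    segments = [{s: 1.0 for s in range(-2, 3)}, {s: float(s) for s in range(-2, 3)}]
    # At lam = lam2 = 0 every weight is 1, so Z_m^j = (sum_s X_s^j)^2 / (2 pi)^2.
    print(f_hat(segments, 2, 0.0, 0.0), 12.5 / (2 * math.pi) ** 2)
```

The weight equals $\cos(s\lambda-t\lambda')$, so a faster version could precompute the $(2m+1)\times(2m+1)$ weight matrix once per pair $(\lambda,\lambda')$. The double loop is kept here to mirror the formula.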
4. Final Remarks and Suggestions

The results can be extended to the case of continuous-parameter strongly harmonizable processes as follows. The resampling procedure consists of $n$ repeated observations, each of which is a segment of a strongly harmonizable process from time $s=-T_n$ to $s=T_n$, i.e.,
$$\begin{aligned}
X_{T_n}^1&=\{X_s^1,\ -T_n\le s\le T_n\}\\
&\ \,\vdots\\
X_{T_n}^j&=\{X_s^j,\ -T_n\le s\le T_n\}\\
&\ \,\vdots\\
X_{T_n}^n&=\{X_s^n,\ -T_n\le s\le T_n\}.
\end{aligned}\qquad(4.1)$$
The covariance does not depend on the realization $j$, so that
$$r(s,t)=EX_s^j\overline{X_t^j}=\int_{\mathbb R^2}e^{is\lambda-it\lambda'}f(\lambda,\lambda')\,d\lambda\,d\lambda'$$
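where $f$ is independent of $j$. For real-valued processes the covariance is
$$r(s,t)=\int_{\mathbb R^2}(\cos s\lambda\cos t\lambda'+\sin s\lambda\sin t\lambda')f(\lambda,\lambda')\,d\lambda\,d\lambda'.$$
The complex-valued bispectral density estimator $\hat f_n$ of $f$ is
$$\hat f_n(\lambda,\lambda')=\frac1n\sum_{j=1}^{n}\frac{1}{(2\pi)^2}\int_{-T_n}^{T_n}\int_{-T_n}^{T_n}e^{-is\lambda}e^{it\lambda'}X_s^j\overline{X_t^j}\,ds\,dt,$$
and the real-valued bispectral density estimator is
$$\hat f_n(\lambda,\lambda')=\frac1n\sum_{j=1}^{n}\frac{1}{(2\pi)^2}\int_{-T_n}^{T_n}\int_{-T_n}^{T_n}(\cos s\lambda\cos t\lambda'+\sin s\lambda\sin t\lambda')X_s^jX_t^j\,ds\,dt.$$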
With these estimators one obtains consistency and the normal limit distribution without serious difficulty. The reason is that the function $\frac{1}{2\pi}\frac{\sin\lambda T}{\lambda}$ behaves like a Dirac δ-function, i.e., its mass concentrates more and more in the immediate neighborhood of $\lambda=0$ as $T\to\infty$. Mathematically, it may be of interest to consider the sampling (4.1) when it is continuous in the $j$ direction as well, i.e., $\{X_{s_1}^{s_2},\ -T_1\le s_1\le T_1,\ 0\le s_2\le T_2\}$. This extension seems possible using a discretizing technique that Ibragimov and Linnik [3], Ch. 18, Sec. 7, used for the case when the process $X_t$, $t\in\mathbb R$, is stationary. The details have yet to be worked out. Using similar techniques, it is possible to consider the limit distributions of the covariance estimator $r_n(s,t)$ and the bispectral distribution estimator $\hat F_n(\lambda,\lambda')$. These involve separate computations (and cannot be deduced from the work presented here), and they can be subjects of future studies. The condition of independent observations can be relaxed to alternatives such as weak dependence ($m$-dependence) and weak stationarity. The latter was given serious analysis in Ibragimov and Linnik [3], and this work was an important step toward widening the applications. The next natural step is to extend the estimation problems to the harmonizable class. This class is quite large and includes the strong and weak classes. This paper gives a substantial analysis for the strong class. This opens up the study of the corresponding problems for the periodically correlated class, a class that is being studied by Hurd et al. The applications of the periodically correlated
class can be carried out before turning to the applications of the weakly harmonizable case, whose general structure has by now been established (see Rao [7]). We present these ideas for follow-up and future analysis.

References

[1] H. L. Hurd, Representation of strongly harmonizable periodically correlated processes and their covariances, Journal of Multivariate Analysis 29 (1989).
[2] H. L. Hurd and J. Leskow, Estimation of the Fourier coefficient functions and their spectral densities for α-mixing almost periodically correlated processes, Statistics & Probability Letters 14.
[3] I. A. Ibragimov and Yu. V. Linnik, Independent and Stationary Sequences of Random Variables, Wolters-Noordhoff Publishing, Groningen, 1971.
[4] T. D. Pham and L. T. Tran, Some mixing properties of time series models, Stochastic Processes and their Applications 19 (1985), 297–303.
[5] D. N. Politis and J. P. Romano, A general resampling scheme for triangular arrays of α-mixing random variables with application to the problem of spectral density estimation, The Annals of Statistics 20(4) (1992), 1985–2007.
[6] M. M. Rao, Probability Theory with Applications, Academic Press, Inc., New York, 1984.
[7] M. M. Rao, Harmonizable processes: Structure theory, L'Enseignement Mathématique 28 (1985), 295–351.
[8] M. Rosenblatt, Stationary Sequences and Random Fields, Birkhäuser, Boston, 1985.
[9] H. Soedjak, Consistent estimation of the bispectral density function of a harmonizable process, Journal of Statistical Planning and Inference 100(2) (2002), 159–170.
[10] A. M. Yaglom, Correlation Theory of Stationary and Related Random Functions I & II, Springer-Verlag, New York, 1987.
October 24, 2013
10:2
9in x 6in
Real and Stochastic Analysis: Current Trends
b1644-cont
CONTRIBUTORS
V. I. Bogachev, Department of Mechanics and Mathematics, Moscow State University, 119991 Moscow, Russia
H. Heyer, Universität Tübingen, Mat. Institut der Morgenstelle 10, Germany
F. Hiai, Tohoku University, 3-8-16-303, Hakusan, Abiko 270-1134, Japan
U. C. Ji, Department of Mathematics, College of Natural Science, Chungbuk National University, Cheongju, 360-763, Korea
N. Obata, Graduate School of Information Science, Tohoku University, Sendai, 980-85t9, Japan
Y. Kakihara, Department of Mathematics, San Bernardino State University, San Bernardino, CA 92407
H. Soedjak, 3212 Know It All Ln, Rogersville, MO 65742