This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
)). +E';
(3.50) D
4
Nonconstant Diffusion Coefficient and the Complex Measure Conditions
In this section we explore the complex measure condition on the lower order coefficients when the highest order diffusion coefficient is not necessarily constant. Since the results are somewhat exploratory, we restrict to one dimension and consider the Cauchy problem
au at = Lu + g,
u(O, x) = uo(x)
(4.51)
for
Lf(x) = (a(x)f(x))xx Letting A(~) = E + a~) in the mild form
+ (b(x)f(x))x + c(x)f(x) -
Ef(x).
(4.52)
e, the Fourier transformed equation may be expressed
where a is the complex measure defined by Q:=
~(a( {O} )60 -
v 21r
a).
(4.54)
For later notational convenience let
a({O}) ry =
v'21T'
(4.55)
On Ito's Complex Measure Condition
76
To make the probabilistic construction in Fourier frequency space under the complex measure condition on the lower order coefficients we will require a condition of the following form on the leading order coefficient a( x). CONDITION A: Assume that a is a complex measure and
a( {O}) > lal (R\ {O}),
(4.56)
where lal denotes the corresponding total variation measure. One may note that in the case of constant coefficient a(x) = a, Condition A is merely the condition that a> O. The stochastic jump Markov process {~( t) : t ;::: O} and multiplicative functional X in this setting are defined as follows. First let q, Q denote the measure and probability distribution defined by the coefficients b, c exactly as in (3.18)(3.19) with dimension n = l. Similarly, one defines
de ro = dQ'
rl =
db dQ
(4.57)
precisely as in (3.20)-(3.21). In addition, let ao = IQlt(~) be the probability defined by normalizing the total variation measure of the complex measure a defined above in (4.55). Now define (4.58) Next let {Ji : i ;::: I} and {Ki : i ;::: I} be mutually independent sequences of i.i.d. symmetric Bernoulli 0-1 random variables as defined earlier for (3.25) with n = l. Additionally, let {ai : i ;::: I} be a sequence of independent Bernoulli 0-1 random variables, independent of {Ji : i ;::: I} and {Ki : i ;::: I}, and distributed according to the law
P(ai = 1) = p
lal(R\{O}) a( {O}) E (0,1).
=
(4.59)
For future reference, one should also note that p = ,-llal(R). Now the increments {TJi : i ;::: I} of the jump Markov process are i.i.d. and independent of the above coin tossing sequences {Ji }, {Kd with (4.60) Accordingly the skeletal jump process starting at
~o = ~
is given by
k
~k = ~ -
L TJi,
k;::: l.
(4.61 )
i=l
Conditionally on the spatial random walk {~d the holding times {Sk : k ;::: I} for the jump Markov process may be defined by specifying infinitesimal rates ..\(~k)' (e.g. see Blumenthal and Getoor (1968), Bhattacharya and Waymire (1990)), where (4.62)
Chen, Dobson, Guenther, Orum, Ossiander, Thomann, Waymire
°
Recall that E > is a parameter of (4.52). Finally, the multiplicative times functional factors
mj(~) =
Xis recursively defined with scale
(l-P)~>'(~)' if j = 0, (l-P)A:rr>.(~)' if j = 1
{
e
rp(t,~)
(4.63)
'f J. -- 2 ,
p>.W
and rescaled forcing term stochastic recursion
77
1
2((1 - p)A(~))-lg(t,~), by the following if if if if
Se Se Se Se
2: t < t, "'e = 0, < t, "'e = 1, ae = 0, Je = j < t,"'e = 1,ae = 1,Je =j
E {O, I} E {0,1}
(4.64) We are now ready to state the theorem in this setting. Theorem 4.1. Assume that the diffusion coefficient a satisfies Condition A. If band c are complex measures and if there is a number B such that luo(~)1 :::; B, and Ig(t,~)1 :::; BA(~)/2,~ E Rn,t 2: 0. Then a mild solution u(t,~) for the Fourier transformed equation is given by the stochastic representation
Proof. The focus is on the implied convergence of the expectation. We leave it to the reader to use the strong Markov property of the underlying jump process to verify the equation from the stochastic recursion defined by the times functional X. To establish integrability first note that
Ird1]) I :::;
lal(R) = lal(R\{O})
and
Thus (4.65) Now with counting random variables K, Nt, and K t defined exactly as in (3.27), where Nt is again the number of jumps by time t, one has upon iteration of the stochastic recursion that Kt
u(t,~)
= E~lFd(t, 0) = E~IF~
II
r J i - 1 (1]i)mJi-l (~i_l))l-Ui-l i=l . (r2(1]i)m2(~i-l)ti-l "'i-l {UO(~Nt )l[Nt :::; K] + rp(t - So - ... - SKtl ~Kt)l[Nt > K]} (4.66)
where an empty product is assigned value one. Therefore, letting q = q(R), one has k
lu(t,~) I :::; B
L E~e=~ II IqmJi_ (~i_lW-Ui-l1[K 1\ Nt = k]. 1
k;::O
0
On Ito's Complex Measure Condition
78
For each k it is helpful to introduce the mutually dependent pair of binomial distributed random variables k-l
k-l
= ~(1- O"i)l[Ji = 1],
Xk
Yk = ~(1- O"i)l[Ji = 0].
i=O
(4.67)
i=O
Also in the case that 0"0 = ... = O"k-l = 0 set hk = 1, else let hk denote the density of 2.:~-1 O"iSi conditional on the O"i'S, J/s, and ~i'S. Then, proceeding similarly as in the proof of Theorem 3.1, consider k-l
Ak :=Et,e=t,
II IqmJi(~i)ll-ai1[K
1\ Nt
= k]
o k-l
<
Et,e=t,
II IqmJi (~iW-ai 1[Nt 2: k]P(K 2: k) o
< Elf
+qmo(O)(-) 2e,
Setting (3 =
(1=:p)
1/2
l[Xk
max{(l_~)er'
= 1, Y k = 0] + pk }
vk-}, we have
x
2- k ((3t)---.k+Yk X A < -E _ 2 1[~ kt t,e-t, f( ~k + Yk) 2
k 2~ 2
+ y; > 1] + _q_(P.)k-l + (P.)k k-
2·
Chen, Dobson, Guenther, Orum, Ossiander, Thomann, Waymire
79
Thus we obtain
< B{~ 2- k (t /\ 1) E _ ({3t)~+Yk l[Xk ~ k?O
1;0-1; r( X k
t
2
+ Yik)
2
+
Yi
> 1]
k_
2 + y0Y(2 - p)2 +-2- P 2q
BC1(t /\ 1)
L L
2-k({3t)~
r(n/2) P(Xk
+ 2Yk = n)
k?12::;n::;2k
2B q + 2 - p (y0Y(2 _ p)
< 2B(t /\ 1)C 1
+ 1)
({3t/2) ~ 2B q L + -( n?2 r(n/2) 2 - p y0Y(2 -
< B(t /\ 1){3(1 + J2{3t/7r)e~t + 2B (
p)
t )+
2-p y0Y2-p
This establishes the desired convergence.
5
+ 1) 1). D
Acknowledgments
The authors are grateful to Professor V. N. Kolokoltsov for providing additional references and comments on a draft of this paper. This work was partially supported by a Focussed Research Group grant DMS-0073865 from the National Science Foundation.
Bibliography [1]
Albeverio, S., R. H¢egh-Krohn (1976): Mathematical theory of Feynman path integrals, Lecture Notes in Mathematics 523, Springer-Verlag, NY
[2]
Bhattacharya, R. and E. Waymire (1990): Stochastic Processes with Applications, Wiley, NY.
[3]
Bhattacharya, R., L Chen, S. Dobson, R. Guenther, C. Orum, M. Ossiander, E. Thomann, E. Waymire (2002): Majorizing Kernels & Stochastic Cascades With Applications To Incompressible Navier-Stokes Equations, Trans. Amer. Math. Soc. (in press).
[4]
Blumenthal, R.M. and R.K. Getoor (1968): Markov Processes and Potential Theory, Academic Press, NY
[5]
Feller, W. (1971): An Introduction to Probability Theory and its Applications, Vol II, 2nd ed., Wiley, NY
[6]
Folland, Gerald B. (1992) Fourier Analysis and its Applications Brooks/Cole Publishing Company, Pacific Grove California
[7]
Ito, K.(1965): Generalized uniform complex measures in the Hilbertian metric space with the application to the Feynman integral, Proc. Fifth Berkeley Symp. Math. Stat. Probab. II, 145-161.
80
On Ito's Complex Measure Condition
[8]
Kolokoltsov, V.N. (2000): Semiclassical analysis for diffusions and stochastic processes, Springer Lecture Notes in Mathematics, v. 1724, SpringerVerlag, NY.
[9]
Kolokoltsov, V.N. (2002): A new path integral respresentation for the solutions of the Schrodinger equations, Math.Proc. Camb.Phil.Soc 132 353-375
[10] LeJan, Y. and A.S. Sznitman (1997). Stochastic cascades and 3-dimensional Navier-Stokes equations, Prob. Theory and Rel. Fields 109 343-366. [11] Podlubny, I. (1999): Fractional Differential Equations, Academic Press, San Diego, CA.
Variational formulas and explicit bounds of Poincare-type inequalities for one-dimensional processes Mu-Fa Chen l Beijing Normal University
Abstract This paper serves as a quick and elementary overview of the recent progress on a large class of Poincare-type inequalities in dimension one. The explicit criteria for the inequalities, the variational formulas and explicit bounds of the corresponding constants in the inequalities are presented. As typical applications, the Nash inequalities and logarithmic Sobolev inequalities are examined.
AMS 2000 Subject classification: 49R50, 34L15, 26DIO, 60J27 Keywords: Variational formula, Poincare inequality, Nash inequality, logarithmic Sobolev inequality, Orlicz space, one-dimensional diffusion, birthdeath process.
1
Introduction
The one-dimensional processes in this paper mean either one-dimensional diffusions or birth-death Markov processes. Let us begin with diffusions. Let L = a(x)d 2 /dx 2 + b(x)d/dx be an elliptic operator on an interval (0, D) (D :S (0) with Dirichlet boundary at a and Neumann boundary at D when D < 00, where a and b are Borel measurable functions and a is positive everywhere. Set C(x) = bfa, here and in what follows, the Lebesgue measure dx is often omitted. Throughout the paper, assume that
J;
Z:=
lD
eC / a <
00.
(1.0)
Hence, dJ-l := a-1ecdx is a finite measure, which is crucial in the paper. We are interested in the first Poincare inequality
where Cd is the set of all continuous functions, differentiable almost everywhere and having compact supports. When D = 00, one should replace [0, D] by [0, D) but we will not mention again in what follows. Next, we are also interested in the second Poincare inequality (1.2) where 7r(f) = J-l(f)/Z = J JdJ-l/Z. To save the notations, we use the same A (resp., A) to denote the optimal constant in (1.1) (resp., (1.2)). The aim of the study on these inequalities is looking for a criterion under which (1.1) (resp., (1.2)) holds, i.e., the optimal constant A < 00 (resp., A < (0), 1 Research
supported in part by NSFC (No. 10121101), RFDP and 973 Project.
81
Poincare- type Inequalities
82
and for the estimations of A (resp., A). The reason why we are restricted in dimension one is looking for some explicit criteria and explicit estimates. Actually, we have dual variational formulas for the upper and lower bounds of these constants. Such explicit story does not exist in higher dimensional situation. Next, replacing the L2-norm on the right-hand sides of (1.1) and (1.2) with a general norm 11·lllffi in a suitable Banach space (the details are delayed to §3), respectively, we obtain the following Poincare-type inequalities IIf211lffi :S; AlffiD(f),
f
II (f - 7f(f) )211 :S; AlffiD(f),
f
0.
(1.3)
CCd[O, DJ.
(1.4)
f(O)
E CCd[O, D], E
=
For which, it is natural to study the same problems as above. The main purpose of this paper is to answer these problems. By using this general setup, we are able to handle with the following Nash inequalities[23j Ilf - 7f(f)112+ 4 / v :S; AND(f)llflli/ v
(1.5)
in the case of v > 2, and the logarithmic Sobolev inequality[18j: (1.6) To see the importance of these inequalities, define the first Dirichlet eigenvalue AO and the first Neumann eigenvalue A1, respectively, as follows.
AO = inf{D(f): f E C 1 (0,D) nC[O,D], f(O) = 0, 7f(12) = I}, A1 = inf{D(f) : f E C 1(0, D) n C[O, D], 7f(f) = 0, 7f(12) = I}. Then, it is clear that AO
(1.7)
= 1/A and A1 = 1/ A. Furthermore, it is known that
The second Poincare inequality
¢==::}
Var(Ptf) :S; Var(f) e- 2A1t .
Logarithmic Sobolev inequality
¢==::}
Ent(Ptf) :S; Ent(f) e-2t/ALS,
Nash inequality
¢==::}
Var(Ptf) :S; Cllfllr c
(1.8)
v,
where IlfilT is the LT(ft)-norm (cf., [8], [13], [18J and references within). It is clear now that the convergence in the first line is also equivalent to the exponential ergodicity for any reversible Markov processes with density (cf. [10]), and C(x), where i.e., IIPt(x,·) - 7fIIVar :S; C(x)e- ct for some constants E > Pt (x, .) is the transition probability. The study on the existence of the equilibrium 7f and on the speed of convergence to equilibrium, by Bhattacharya and his cooperators, consists a fundamental contribution in the field. See for instance [2J-[6J and references within. The second line in (1.8) is correct for diffusions but incorrect in the discrete situation. In general, one has to replace "¢==::}" by "====?". Here are three examples which distinguish the different inequalities.
°
° °
b(x) = a(x) = x, b(x) = a=x 2 Iog' x a(x) = 1 b(x) = -b
Ergodicity
2nd Poincare
LogS
L--r- exp .
Nash
,>1
,2::2
,>2
,>2
,>2
J
,2::0
,2::1
,>1
x
J
J
x
x
x
Table 1.1, Examples: Diffusions on [0, (0)
Mu-Fa Chen
83
Here in the first line, "LogS" means the logarithmic Sobolev inequality, "L1_ exp." means the L1-exponential convergence which will not be discussed in this paper. "J" means always true and "x" means never true, with respect to the parameters. Once known the criteria presented in this paper, it is easy to check Table 1.1 except the L1-exponential convergence. The remainder of the paper is organized as follows. In the next section, we review the criteria for (1.1) and (1.2), the dual variational formulas and explicit estimates of A and A. Then, we extend partially these results to Banach spaces first for the Dirichlet case and then for the Neumann one. For a very general setup of Banach spaces, the resulting conclusions are still rather satisfactory. Next, we specify the results to Orlicz spaces and finally apply to the Nash inequalities and logarithmic Sobolev inequality. Since each topic discussed subsequently has a long history and contains a large number of publications, it is impossible to collect in the present paper a complete list of references. We emphasize on recent progress and related references only. For the applications to the higher dimensional case and much more results, the readers are urged to refer to the original papers listed in References, and the informal book [13], in particular.
2
Ordinary Poincare inequalities
In this section, we introduce the criteria for (1.1) and (1.2), the dual variational formulas and explicit estimates of A and A.
To state the main results, we need some notations. Write x /\ y = min {x, y} and similarly, x V y = max { x, y}. Define
F = {f
j
=
O[O,D]
E
n C 1(O,D):
{f E 0[0, D] : f(O) f = f(·/\xo), f
= {f j' = {f
F'
f
= E
f(O) = 0,
f'1(O,D)
> O},
0, there existsxo E (0, DJso that
C1(O,xo)andf'l(o,xo) >
O},
(2.1)
E 0[0,
D] : f(O) = 0, fl(o,D) > O},
E 0[0,
D] : f(O) = 0, there existsxo E (0, D]so that
=
f(·/\ xo)andfl(o,x o) >
O}.
Here the sets F and F' are essential, they are used, respectively, to define below the operators of single and double integrals, and are used for the upper bounds. The sets j and j' are less essential, simply the modifications of F and F', respectively, to avoid the integrability problem, and are used for the lower bounds. Define
I(f)(x)
=
e-G(x) lD f'(x) x [feG la] (u)du,
fEF,
(2.2)
II(f)(x)
=
(X lD f(x) Jo dye-G(y) y [feG la] (u)du, 1
f
E
F'.
The next result is taken from [12; Theorems 1.1 and 1.2]. The word "dual" below means that the upper and lower bounds are interchangeable if one exchanges the orders of "sup" and "inf" with a slight modification of the set F (resp., F') of test functions.
Poincare-type Inequalities
84
= sup
Theorem 2.1. Let (1.0) hold. Define
xE(O,D)
e: .
Then, we have the following assertions.
(1) Explicit criterion: A <
00
iff B <
00.
(2) Dual variational formulas:
A::; inf
sup II(f)(x) = inf
JEP XE(O,D)
A
~ sup
inf
sup I(f)(x),
JeF xE(O,D)
II(f)(x) = sup
JEFf xE(O,D)
inf
I(f)(x).
(2.3)
JEF XE(O,D)
The two inequalities all become equalities whenever both a and b are continuous on [0, D].
(3) Approximating procedure and explicit bounds: (a) Define h = yTP, fn = fn-1II(fn-l) and Dn = SUPxE(O,D) II(fn) (x). Then Dn is decreasing in n and A ::; Dn ::; 4B for all n ~ 1. (b) Fix Xo E (0, D). Define fi xo ) =
= f~~{(· Axo)II(J~~{(.
Axo))
and en = sUPxoE(O,D) infxE(o,D) II(J~xo)(. A xo))(x). Then en is increasing in n and A ~ en ~ B for all n ~ 1. We mention that the explicit estimates "B ::; A ::; 4B" were obtained previously in the study on the weighted Hardy's inequality by [22]. We now turn to study A, for which it is natural to assume that
1D
e-C(s)ds
1 s
a(u)-leC(u)du
Theorem 2.2. Let (1.0) and (2.4) hold and set f the following assertions.
(1) Explicit criterion: A <
00
iff B <
00,
=
=
00.
(2.4)
f - 7r(f). Then, we have
where B is given by Theorem 1.l.
(2) Dual variational formulas: sup
inf
JEF xE(O,D)
IU)(x)::; A::; inf
sup I(f)(x).
(2.5)
JEF XE(O,D)
The two inequalities all become equalities whenever both a and b are continuous on [0, D].
(3) Approximating procedure and explicit bounds: (a) Define h = yTP, fn = fn-1II(fn-l) and Dn Then A ::; D n ::; 4B for all n ~ l.
= SUPXE(O,D) IIUn) (x).
(b) Fixxo E (O,D). Define fi xo ) =
Mu-Fa Chen
85
Part (1) of the theorem is taken from [11; Theorem 3.7J. The upper bound in (2.5) is due to [16J. The other parts are taken from [12; Theorems 1.3 and l.4J. Finally, we consider inequality (1.2) on a general interval (p, q) (-00 S; p < q S; (0). When p (resp., q) is finite, at which the Neumann boundary condition is endowed. We adopt a splitting technique. The intuitive idea goes as follows: Since the eigenfunction corresponding to A, if exists, must change signs, it should vanish somewhere in the present continuous situation, say B for instance. Thus, it is natural to divide the interval (p, q) into two parts: (p, B) and (B, q). Then, one compares A with the optimal constants in the inequality (1.1), denoted by Ale and A 2e , respectively, on (B, q) and (p, B) having the common Dirichlet boundary at B. Actually, we do not care about the existence of the vanishing point B. Such B is unknown, even if it exists. In practice, we regard B as a reference point and then apply an optimization procedure with respect to B. We now redefine C (x) = b/ a. Again, since it is in the ergodic situation, we assume the following (non-explosive) conditions:
J:
l' fe
q
e-C('lds e-C(s)ds
l'
e Cfa
~ 00
fes e C /a =
00
if P ~
-00
if q =
00
and
(2.6)
for some (equivalently, all) B E (p, q). Corresponding to the intervals (B, q) and (p, B), respectively, we have constants B le and B 2 (h given by Theorem 1.1. Theorem 2.3. Let (2.6) hold. Then, we have
(2) Let B be the medium of j.-l, then (Ale V A 2e )/2 S; A S; Ale V A 2e .
In particular, A <
00
iff Bl() V B 2e <
00.
Comparing the variational formulas (2.3) and (2.5) with the classical variational formulas given in (1.7), one sees that there are no common points. This explains why the new formulas (2.3) and (2.5) have not appeared before. The key here is the discover of the formulas rather than their proofs, which are usually simple due to the advantage of dimension one. As an illustration, here we present parts of the proofs. Proof of the upper bound in (2.5). Originally, the assertion was proved in [16J by using the coupling methods. Here we adopt the analytic proof given in [9J. Let 9 E C[a, DJ n Cl(a, D), 7["(g) = a and 7["(g2) = 1. Then, for every f E :F
Poincare-type Inequalities
86
with 7r(f)
~
0, we have
~
r 7r(dx)7r(dy) [g(y) - g(X)]2 2 Jo = r 7r(dX)7r(dy )(l g'(~dU)2 J{x~y} x f (u)
1=
D
Y
~
r 7r(dx)7r(dy) lx J{x~y}
Y
gf'~(U))2 du u
l
f'(~)d~
Y
x
(by Cauchy-Schwarz inequality)
= =
r 7r(dX)7r(dy )lx g'(u)2eC(U)e;,~(~) du[J(y) J{x~y} Y
lD
f(x)]
U
l u lD l lD
a(u)g'(u)27r(du) z;~(:;u)
~ D(g)
Ze-C(u)
f'()
sup uE(O,D)
~ D(g)
U
7r(dx)
u
7r(dy) [J(y) - f(x)]
7r(dx)
u
0
(since7r(f) ~
sup I(f)(x)
7r(dy) [f(y) - f(x)]
0).
XE(O,D) 1
Thus, D(g)- ~
-
SUPxE(O,D)
I(f)(x), and so
A=
sup
D(g)-l
~
sup I(J)(x). xE(O,D)
g: 7r(g)=O, 7r(g2)=1
This gives us the required assertion:
A ~ inf
sup I(J)(x).
JEF XE(O,D)
The proof of the sign of the equality holds for continuous a and b needs more work, since it requires some more precise properties of the corresponding eigenfunctions. D
Proof of the explicit upper bound "A
~
4B".
As mentioned before, this result is due to [22]. Here we adopt the proof given in [11], as an illustration of the power of our variational formulas. Recall that B = SUPxE(O,D) e- c e C la. By using the integration by parts formula, it follows that
J;
J:
(2.1)
Hence
lD
e-C(x) J'Pe c I(ViP) (x) = (J'P)'(x) x -aas required.
D
~
e-C(x)v'P(x) (1/2)e- C(x) .
2B
~ = 4B
Mu-Fa Chen
3
87
Extension; Banach spaces
Starting from this section, we introduce the recent results obtained in [14] and [15], but we will not point out time by time subsequently. In this section, we study the Poincare-type inequality (1.3). Clearly, the Banach spaces used here can not be completely arbitrary since we are dealing with a topic of hard mathematics. l,From now on, let (lB, I . IIIB' p,) be a Banach space of functions f: [0, D] --+ IR satisfying the following conditions: (1)
1ElB;
(2)
lBis ideal: Ifh E lBandlfl ~ Ihl, thenf E lB;
(3)
IlfilIB = sup gEQ
(4)
(3.1)
D r io Iflgdp"
Q:3 gowithinfgo
> 0,
where Q is a fixed set, to be specified case by case later, of non-negative functions on [0, D]. The first two conditions mean that lB is rich enough and the last one means that Q is not trivial, it contains at least one strictly positive function. The third condition is essential in this paper, which means that the norm I . IIIB has a "dual" representation. A typical example of the Banach space is lB = L' (p,), then Q = the unit ball in L~ (p,), l/r + l/r' = 1. The optimal constant A in (1.3) can be expressed as a variational formula as follows.
AIB -_ sup
{ IID(f)' f211IB . f
E Cd[O, D], f(O) -_ 0, 0< D(f) <
00
}.
(3.2)
Clearly, this formula is powerful mainly for the lower bounds of A. However, the upper bounds are more useful in practice but much harder to handle. Fortunately, for which we have quite complete results. Define
BIB DIB
= sup
=
sup xE(O,D)
Ilfo
1\
CIB =
sup xE(O,D)
11'P(x 1\ ·?IIIB 'P(X)
YII IB
(3.3)
J
Theorem 3.1. Let (1.0) and (3.1) hold. Then we have the following assertions.
(1) Explicit criterion: AIB <
00
iff BIB <
00.
(2) Variational formulas for the upper bounds: AIB~ inf, sup f(x)-lllf'P(x 1\ ')IIIB JEF xE(O,D) e-C(x) ~ inf sup f'() IlfI(x,D)IIIB' JEF xE(O,D) X
(3.4)
Poincare-type Inequalities
88
(3) Approximating procedure and explicit bounds: Let BlB < 00. Define fo = y"P, fn(x) = [[fn-l
(3.5) for all n 2: 1. We are now going to sketch the proof of the second variational formula in (3.4), from which the explicit upper bound AlB ::; 4BlB follows immediately, as we did at the end of the last section. The explicit estimates "BlB ::; AlB ::; 4BlB " were previously obtained in [7] in terms of the weighted Hardy's inequality [22]. The lower bounds follows easily from (3.2).
Sketch of the proof of the second variational formula in (3.4). The starting point is the variational formula for A (cf. (2.3)):
e-C(x) A < inf sup - IEF xE(O,D) f'(x) Fix g
lD
fec -x a
= inf sup
e-C(x)
IEF xE(O,D) f'(x)
lD x
fdp.
> 0 and introduce a transform as follows. b ---+ b/ g,
a
---+
a/g > O.
(3.6)
Under which, C(x) is transformed into
l°
x
Cg(x) =
big
- / = C(x). a g
This means that the function C is invariant of the transform, and so is the Dirichlet form D (f). The left-hand side of (1.1) is changed into
faD f2ge C/a
= faD f2gdp.
At the same time, the constant A is changed into
lD
e-C(x) Ag ::; inf sup f' ( ) f gdp. x x IEF XE(O,D) Making supremum with respect to g E Q, the left-hand side becomes
and the constant becomes
e-C(x) AlB = sup Ag ::; sup inf sup f' ( ) 9 glx x e-C(x)
= inf sup l' ( ) sup 1
x
= i~f s~p
x 9 e-C(x) f'(x)
lD °
lD x
f gdp ::; inf sup sup Ig x
f I(x,DWdp.
IlfI(x,D) IllB'
Mu-Fa Chen
89
We are done! Of course, more details are required for completing the proof. For instance, one may use 9 + lin instead of 9 to avoid the condition" 9 > 0" and then pass limit. 0 The lucky point in the proof is that "sup inf ::::; inf sup", which goes to the correct direction. However, we do not know at the moment how to generalize the dual variational formula for lower bounds, given in the second line of (2.3), to the general Banach spaces, since the same procedure goes to the opposite direction.
4
Neumann Case; Orlicz Spaces
In the Neumann case, the boundary condition becomes J'(O) = 0, rather than J(O) = O. Then Ao = 0 is trivial. Hence, we study Al (called spectral gap of L), that is the inequality (1.2). We now consider its generalization (1.4). Naturally, one may play the same game as in the last section extending (2.5) to the Banach spaces. However, it does not work this time. Note that on the left-hand side of (1.4), the term 7r(f) is not invariant under the transform (3.6). Moreover, since 7r(J) = 0, it is easy to check that for each fixed J E F, I(J)(x) is positive for all x E (0, D). But this property is no longer true when dJ-.l is replaced by gdJ-.l. Our goal is to adopt the splitting technique explained in Section 2. Let () E (p, q) be a reference point and let A;o, B~o, C~o, D;o (k = 1,2) be the constants defined in (3.2) and (3.3) corresponding to the intervals ((), q) and (p, ()), respectively. By Theorem 3.1, we have
k = 1,2. Theorem 4.1. Let (2.6) and (3.1) hold. Then, we have the Jo llo wing assertions.
(1) Explicit criterion: AB <
00
iff B~o V B't/ <
00.
(2) Estimates:
10 20 - B -
= 1,
bo ... bn -
J-.ln =
Consider a Banach space (IB, satisfying (3.1). Define i
1
I . liB, J-.l)
of functions E .-
{O, 1, 2, ... }
--+
lR
1
L-t J-.l'a.' i>l' - ,
'P'-'\;'"'t -
j=1
J
J
Clearly, the inequalities (1.3) and (1.4) are meaningful with a slight modification.
Poincare-type Inequalities
90
Theorem 4.2. Consider birth-death processes with state space E. Assume that
Z <
00.
(1) Explicit criterion for (1.3): Alffi
< 00 iff Blffi <
00.
(2) Explicit bounds for A lffi : Blffi ::::; Alffi ::::; 4Blffi· (3) Explicit criterion for (1.4): Let the birth-death process be non- explosive: 1
<Xl
i
"""" L....J -/I·b """" L....J /-L j =
(4.1)
00.
i=O ,...,~ ~ j=O
Then Alffi
< 00 iff Blffi <
00.
(4) Estimates for A lffi : Let E1 = {I, 2, ... } and let C1 and C2 be two constants such that 11r(f) I ::::; c111flllffi and 11r(f IEI)I ::::; c211fIEI ll lffi for all f E Iffi. Then, max {11111;I, (1::::; Alffi ::::; (1
V 2(1- 1r0)1111Ilffi )2}Alffi C
(4.2)
+ VcIil 1 Illffi ) 2Alffi.
Similarly, one can handle the birth-death processes on Z. An interesting point here is that the first lower bound in (4.2) is meaningful only in the discrete situation.
Orlicz spaces. The results obtained so far can be specialized to Orlicz spaces. The idea also goes back to [7]. A function : JR --+ JR is called an N - function if it is non-negative, continuous, convex, even (i.e., <1>( -x) = (x )) and satisfies the following conditions:
(x) =0 iff x=O,
lim (x) / x = 0,
lim (x) / x =
In what follows, we assume the following growth condition (or for <1>: sup (2x) /<1> (x ) x»l
<
00
00.
x-><Xl
x->O
~2-condition)
(¢::::::? sup x
where ~ is the left derivative of <1>. Corresponding to each N-function, we have a complementary N -function:
Y E JR. Alternatively, let 'Pc be the inverse function of ~, then
[24]). Given an N-function and a finite measure /-L on E := (p, q) Orlicz space as follows:
Ilfll1>
= sup gEt;}
c JR, define an
JEr Iflgd/-L,
(4.3)
Mu-Fa Chen
91
where 9 = {} 2:: , : JE EBJ (} )d/-l :::;
00 },
which is the set of non-negative functions
in the unit ball of L
(resp., (2.6)) holds, then Theorem 3.1 ( resp., 4.1) is available for the Orlicz space (L
5
N ash inequality and Sobolev-type inequality
It is known that when v
> 2, the Nash inequality (1.5):
is equivalent to the Sobolev-type inequality: Ilf - 7f(J)II~/(v-2) :::; AsD(J), where II· Ilr is the Lr(/-l)-norm. Refer to [1], [8] and [26]. This leads to the use of the Orlicz space L
(5.1) The results in this section were obtained in [19], based on the weighted Hardy's inequalities. Define C(x) = bfa, /-l(m, n) = eC /a and
J:
J;
l i
x
e- c
B~fJ
= sup
fJ
e- c
B~fJ
= sup
Here B~fJ (k = 1,2) is specified from BJB given in (3.3) with IBl = L
> 2.
(1) Explicit criterion: Nash inequality (equivalently, (5.1)) holds on (p,q) iff B~fJ V B~fJ
<
00.
(2) Explicit bounds:
~ (BlfJ 1\ B2fJ)
max { 2
v
v'
[1 _(ZlfJ V Z2fJ ) 1/2+I/V] 2 (BlfJ V B2fJ) } ZlfJ + Z2fJ v v
:::; Av :::; 4(B~fJ V B~fJ). In particular, if () is the medium of /-l, then
(5.2)
92
Poincare-type Inequalities
We now consider birth-death processes with state space {O, 1,2" .. }. Define 1
i
00
1!1'="\;"""'t't ~ ,i>l' _,
Bv
j=l/J-jaj
= sup 'Pi
(
t>l -
)
(v-2)/v
L/J-j ..
J=t
Theorem 5.2. For birth-death processes, let (4.1) hold and assume that Z < Then, we have max {(
2 )2/V , 1[ - (Z_1)1/2+1/V]2} _ vzv/2-1 -----zBv :s; Av :s; 16Bv.
Hence, when v > 2, the Nash inequality holds iff Bv <
6
00.
(5.3)
00.
Logarithmic Sobolev inequality
The starting point of the study is the following observation.
~II(J -
7r(J))211
:s; £(J) :s;
~~ II(J -
7r(J))211
(6.1)
where
(6.2)
Again, here B~o (k
= 1,2) is specified from BM, given in (3.3).
Theorem 6.1. Let (2.6) hold.
(1) Explicit criterion: The logarithmic Sobolev inequality on (p, q) c lR holds iff sup /J-(x, q) log XE(O,q)
sup /J-(p,x)log xE(p,O)
(1 ) /J- X, q (1 ) /J- p, x
l 1 x
e- c
<
00
and
0
0
x
hold for some (equivalently, all) () E (p, q).
(6.3) e- c
< 00
Mu-Fa Chen
93
(2) Explicit bounds: Let (j be the root of B~(}
= B~(}, () E [p, q]. Then, we have (6.4)
By a translation if necessary, assume that () = 0 is the medium of J-L. Then, we have
We now consider birth-death processes with state space {O, 1,2,· .. }. Define
Bq, = sUP'PiM(J-L[i,oo)), i~l
where J-L[i,oo)
=
Lj~iJ-Lj and
M(x) is defined in (6.2).
Theorem 6.2. For birth-death processes, let (4.1) hold and assume that Z < Then, we have 2 {J4Z+1-1 - max 5 2' ~
A LS
~
00.
( 1 - ZlW-1(Zll))2} Bq, ZW- 1(Z-l)
551( 1 + w- 1( z- 1))2 Bq"
where Zl = Z - 1 and w- I is the inverse function of w: w(x) In particular, A LS < 00 iff
1 ) sup 'Pi J-L[i, 00) log [. i~1 J-L'l,00
= x 2 log(1 + x 2).
< 00.
Acknowledgement. This paper is based on the talks given at "Stochastic analysis on large scale interacting systems", Shonan Village Center, Hayama, Japan (July 17-26, 2002) and "Stochastic analysis and statistical mechanics", Yukawa Institute, Kyoto University, Japan (July 29-30, 2002). The author is grateful for the kind invitation, financial support and the warm hospitality made by the organization committee: Profs. T. Funaki, H. Osada, N. Yosida, T. Kumagai and their colleagues and students.
Mu-Fa Chen Department of Mathematics, Beijing Normal University Beijing 100875, The People's Republic of China E-mail: [email protected] Home page: http) /www.bnu.edu.cn;-chenmf/main_eng.htm
94
Poincare-type Inequalities
Bibliography [1] Bakry, D., Coulhon, T., Ledoux, M., Saloff-Coste, L., Sobolev inequalities in disguise, Indiana Univ. Math. J. 44(4), 1033-1074 (1995) [2] Bhattacharya, R N., Criteria for recurrence and existence of invariant measures for multidimensional diffusions, Ann. Probab. 3, 541-553 (1978). Correction, ibid. 8, 1194-1195 (1980) [3] Bhattacharya, R N., Multiscale diffusion processes with periodic coefficients and an application to solute transport in porous media, Ann. Appl. Probab. 9(4), 951-1020 (1999) [4] Bhattacharya, R N., Denker, M. and Goswami, A., Speed of convergence to equilibrium and to normality for diffusions with multiple periodic scales, Stoch. Proc. Appl. 80, 55-86 (1999) [5] Bhattacharya, R N. and G6tze, F. Time-scales for Gaussian approximation and its break down under a hierarchy of periodic spatial heterogeneities, Bernoulli 1, 81-123 (1995) [6] Bhattacharya, R N. and Waymire, C. Iterated random maps and some classes of Markov processes, in "Handbook of Statistics", Vol. 19, pp.145170, Eds. Shanbhag, D. N. and Rao, C. R, Elsevier Sci. B. V., 200l. [7] Bobkov, S. G., G6tze, F., Exponential integrability and transportation cost related to logarithmic Sobolev inequalities, J. Funct. Anal. 163, 1-28 (1999) [8] Carlen, E. A., Kusuoka, S., Stroock, D. W., Upper bounds for symmetric Markov transition functions, Ann. Inst. Henri Poincare 2, 245-287 (1987) [9] Chen, M. F., Analytic proof of dual variational formula for the first eigenvalue in dimension one, Sci. Chin. (A) 42(8), 805-815 (1999) [10] Chen, M. F., Equivalence of exponential ergodicity and L2-exponential convergence for Markov chains, Stoch. Proc. Appl. 87, 281-297 (2000) [11] Chen, M. F., Explicit bounds of the first eigenvalue, Sci. Chin. (A) 43(10), 1051-1059 (2000) [12] Chen, M. F., Variational formulas and approximation theorems for the first eigenvalue in dimension one, Sci. Chin. (A) 44(4), 409-418 (2001) [13] Chen, M. F., Ergodic Convergence Rates of Markov Processes - Eigenvalues, Inequalities and Ergodic Theory, Collection of papers, 1993-200l. http://www.bnu.edu.cnrchenmf/main_eng.htm [14] Chen, M. F., Variational formulas of Poincare-type inequalities in Banach spaces of functions on the line, Acta Math. Sin. Eng. Ser. 18(3), 417-436 (2002) [15] Chen, M. F., Variational formulas of Poincare-type inequalities for birthdeath processes, preprint (2002), submitted to Acta Math. Sin. Eng. Ser. [16] Chen, M. F. and Wang, F. Y., Estimation of spectral gap for elliptic operators,r Trans. Amer. Math. Soc. 349(3), 1239-1267 (1997)
Mu-Fa Chen
95
[17] Deuschel, J. D. and Stroock, D. W., Large Deviations, Academic Press, New York, 1989 [18] Gross, L., Logarithmic Sobolev inequalities, Amer. J. Math. 97, 1061-1083 (1976) [19] Mao, Y. H., Nash inequalities for Markov processes in dimension one, Acta Math. Sin. Eng. Ser. 18(1), 147-156 (2002) [20] Mao, Y. H., The logarithmic Sobolev inequalities for birth-death process and diffusion process on the line, Chin. J. Appl. Prob. Statis. 18(1), 94-100 (2002) [21] Miclo, L., An example of application of discrete Hardy's inequalities, Markov Processes Relat. Fields 5, 319-330, (1999) [22] Muckenhoupt B., Hardy's inequality with weights, Studia Math. XLIV, 31-38 (1972) [23] Nash, J., Continuity of solutions of parabolic and elliptic equations, Amer. J. Math. 80, 931-954 (1958) [24] Rao, M. M. and Ren, Z, D., Theory of Orlicz Spaces, Marcel Dekker, Inc. New York, 1991 [25] Rothaus, O. S., Analytic inequalities, isoperimetric inequalities and logarithmic Sobolev inequalities, J. Funct. Anal. 64, 296-313 (1985) [26] Varopoulos, N., Hardy-Littlewood theory for semigroups, J. Funct. Anal. 63, 240-260 (1985)
96
Poincare-type Inequalities
Brownian Motion and the Classical Groups Anthony D' Aristotile
Persi Diaconis
SUNY at Plattsburgh
Stanford University
Charles M. Newman Courant Inst. of Math. Sciences Abstract Let r be chosen from the orthogonal group On according to Haar measure, and let A be an n x n real matrix with non-random entries satisfying Tr AA t = n. We show that Tr Ar converges in distribution to a standard normal random variable as n ---t 00 uniformly in A. This extends a theorem of E. Borel. The result is applied to show that if entries {31, . . . , {3k n are selected from r where k n ---t 00 as n ---t 00, then
/if 'L,1~ltl {3j, 0 :s:
t
:s:
1 converges to Brownian motion. Partial results
in this direction are obtained for the unitary and symplectic groups.
Keywords: Brownian motion; sign-symmetry; classical groups; random matrix; Haar measure
1
Introduction
Let On be the group of n x n orthogonal matrices, and let r be chosen from the uniform distribution (Haar measure) on On. There are various senses in which the elements of for behave like independent standard Gaussian random variables to good approximation when n is large. To begin with, a classical theorem of Borel [6] shows that P{ for 11 ::; x} ---t
vk I-oo eX
t2
dt. Theorems 2.1 and 2.2 below refine this, showing that an arbitrary linear combination of the elements of r is approximately normal: as n ---t 00, =
sup A#O
-oo<x
2
IP{ Tr(Ar) ::; x} -
I ---t o.
(1.1)
Here A ranges over all non-zero n x n matrices and IIAII = Tr(AAt); thus the normal approximation result is uniform in A. Borel's theorem follows by taking A to have a one in the one-one position and zeros elsewhere. When A above is the identity matrix, Diaconis and Mallows (see [11]) proved that Trr is approximately normal; this follows by taking A as the identity. As A varies, it follows that linear combinations of elements of r are also approximately normal. Interpolating between these facts and Borel's result, we prove that linking appropriately normalized entries from r yields in the limit standard Brownian motion. This is stated precisely in Theorem 3 below. We give a little history. Borel's result is usually stated thus: Let X be the first entry of a point randomly chosen from the n-dimensional unit sphere. Then P{ foX ::; x} ---t
97
Brownian Motion and the Classical Groups
98
micro canonical ensemble (uniform on the sphere) are captured by the canonical ensemble (product Gauss measure). These results are often mistakenly attributed to Poincare. See [15] for a careful history, rates of convergence, and applications to de Finetti type theorems for orthogonally invariant processes. The present project may be seen in the same light: the conditional distribution of an n x n matrix M with independent standard Gaussian coordinates, conditioned on M MT = I is Haar measure on the orthogonal group. Borel also studied the joint distribution of several coordinates of fo f. His work was extended by Levy [24, 25, 26]' Olshanski and Vershik [33] and 1 1 Diaconis-Eaton-Lauritzen [13]. These last authors show that any n a x n a block of fof converges to product Gauss measure in total variation. They also give applications to versions of deFinettti's theorem suitable for regression and the analysis of variance. Extensions by McKean to infinite dimensions are in [30]; he writes that "It is fruitful to think of Wiener space as an infinite-dimensional Our Theorem 3 gives one rigorous version of this fantastic sphere ofradius statement. These ideas were developed by Hida [21]; see Kuo [23] for a recent account. The study of global functionals such as the trace is carried out in [14, 16, 33]. In particular, the joint limiting distribution of Tr(f), Tr(f2), . .. , Tr(fk) is determined as that of independent normal variables. This turns out to be equivalent to a celebrated theorem of Szego and allows further study of the eigenvalues of f; see [7]. The eigenvalues of such random matrix models arise in dozens of situations and are currently being intensely studied. Mehta [32] gives a book length treatment. The area is in active development; see [12] for a recent survey. Interestingly, the eigenvalues of a Gaussian matrix have very different behavior from the eigenvalues of a random orthogonal matrix. In the first case they fill out the inside of the unit circle with order fo of them on the real axis [2, 17]; in the second case the eigenvalues lie on the unit circle. Brownian limits for partial traces are established in [10] and by Rains [34]. This last paper does much more, establishing results for partial traces of random matrices with law invariant under conjugation by On. This includes powers of Haar distributed matrices. One recent global result of Jiang [22] shows that the maximum entry of fo f has the same limiting distribution as the maximum of n 2 standard normal variables. His method of proof gives an approximate coupling between the first J columns of rand J columns of standard normals for J of order n/ (log n)2. The uniform Gaussian limit for linear combinations of the entries of a random orthogonal matrix is proved in Section 2. This is used to prove Brownian motion limits in Section 3. The unitary and symplectic groups are treated in Sections 4 and 5. While we cannot prove completely parallel results, we can show that the sequences of partial sums along the diagonal, suitably normalized, converge to complex Brownian motions.
roo."
2
A Refinement of a Theorem of Borel
Our main tool will be obtained by extending a theorem of Borel [6]. A key to the analysis is that a Haar distributed element of the orthogonal group has entries that are invariant under the sign-change group. If r is an n x n orthogonal
Anthony D 'Aristotile, Charles M. Newman and Persi Diaconis
99
matrix and M is a random diagonal matrix with ±1 chosen uniformly down the diagonal, then the diagonal entries of Mf are ±fii . Under mild conditions on f ii, sums of such entries are close to Gaussian by classical theory. If f is uniform on On, then Mf has the same law as f. The following result both makes this precise and more general. Theorem 2.1. For each positive integer n, choose any n x n real nonrandom matrix A with IIAII = n (Here IIAII = TrAAt j, and let f be a random Haar distributed n x n orthogonal matrix. Then Tr Af converges in distribution to N(O,l) as n ~ 00. Remark. The matrix A above depends on n. We have suppressed this in the notation. See Mallows [27] for further discussion of this method quantifying joint convergence of a growing vector to a vector of independent normals.
Proof. By singular value decomposition [20], there are orthogonal n x n matrices U and V such that U AV = W where W = Diag(al' ... , an) and al 2;> a2 2;> ••• 2;> an 2;> O. Now
TrAf
TrAVV-If
TrU(AVV-If)U- 1 Tr(U AV)(V- I fU- I ). =
(2.2)
However, U AV is diagonal with non-negative, non-increasing entries and V-I fU- 1 is random orthogonal by the invariance of Haar measure. We thus assume for the rest of the proof that A is diagonal with nonincreasing entries aj and I All = n. If we write Xj for fjj, then we have TrAf = I: 1a j X j which we may also write as Sn. We will show that IE(e irSn ) - e- r; I converges to O. To do this, it is enough to demonstrate that for each real r there is a constant L 2;> 0 such that, for each E in (0,1), lim sup IE(e irsn - e-,~2)1 ::; LE.
(2.3)
n-+oo
This last assertion will hold if, given any subsequence nz of the positive integers, there is a further subsequence nzu such that IE(eirs"lu) - e-r:)1 is eventually less than or equal to LE. 2
Given E > 0 , choose a positive integer m 2;> }2 so that ~ ::; E2 for j > m. This is possible since by induction one can show that for all j and all n (recall that ai is non-increasing in i). Since aj ::; Vn, it is possible to choose nlu which satisfies
a; : ; y
~ ~ cy. as u ~ ~J v'vlu
Here 0 ::;
CYj ::;
00
for j = 1, ... , m.
(2.4)
1. We must consider E( eirS"lu ) but shall henceforth replace ne u .
7,2
by n to simplify notation. Now IE(e~rSn) - e- 2 sum of the following 3 terms:
1
is less than or equal to the
(2.5) and (2.7)
Brownian Motion and the Classical Groups
100
To bound (2.5) , first of all note that n
II
E( eirSn ) = E( eir 2:.7'=1 ajX j
eirajXj)
(2.8)
j=m+l n
=
II
E(e ir 2:.7'=l ajXj
(cos(rajX j ) + isin(rajXj )))
(2.9)
j=m+l n
=
II
E(e ir 2:.7'=l ajXj
cos(rajX j )).
(2.10)
j=m+l
To pass from (2.9) to (2.10), one should keep in mind the sign-symmetry of the X j . In addition, n
II
cos(rajXj ) - e- 2:..f=rn+1
T22
a;E(X]l1
(2.11)
j=m+1
j=m+l
le-2 L...j=rn+1 a r2 " , n
2 x2
j
2
n
L
~ r4
ajXf
j
_
r2 " , n e-2 L...j=rn+1 a2j E(X2) j I
(2.12)
n
L
+ r2 I
a;(X] - E(X])) I.
(2.13)
j=m+l
j=m+1
To see that (2.12) is bounded above by (2.13), first take notice that for complex numbers ZI,' . " Zn, WI,' . " Wn of modulus less than or equal to 1, we have n
n
n
j=1
j=1
j=1
This is easily proved by induction. Also, it is not hard to show that
for all real numbers t. Finally, one observes that le- a - e-bl < la non-negative a, b. In view of (2.8)-(2.10), (2.5) is equal to
J
I (e ir 2:.~1 ajX j
n
II
cos(rajX j )
j=m+1
-e- T22 2:..f==+1 a;E(X;) e ir 2:.7'=1 ajX j ) dPI
bl
for
Anthony D 'Aristotile, Charles M. Newman and Persi Diaconis
(as we saw earlier that E(XJ)
= r4 ~
L
Using (11)-(13), this is bounded by
n
+ r2
ajE(XJ)
j=m+l n
r4
1;).
2/ I 2:
n
L
=
2
n
j=m+l
L
a;X] - E(
j=m+l n
2:
+ r2 (Var(
ajE(Xj)
101
a;XJ) I dP
j=m+l
(2.14)
a;X]))!.
j=m+l
To obtain (2.14), which is our initial bound for (2.5), keep in mind that / IY - EYI dP
~ (/ IY -
EYI 2 dP)!
= (VarY)!
by Holder's inequality. We will return to (14) but first we claim that (2.7) converges to zero. Since
which converges to 1 -
L.";=1
0:;, our assertion is clear. It is also the case that
r2 1 2:n a2 2.6 ) converges to zero. To see this, first note that since e- Tn j==+l j is 2 a X ) bounded, it is enough to verify that E ( eir 2:= j=l j j converges to e- Tr2 2: j=l a j . (
111
But this immediately follows from the fact [13J that the entries of the block matrix [y'nrijhsi,jSm are in the limit independent, each with the standard normal distribution. From (2.14), and the previous paragraph, we have
~
n
r4
L
2
ajE(XJ)
+ r2 (Var(
j=m+l
n
L
a;X]))!
+ Bn
j=m+l
where Bn ---+ 0 as n ---+ 00. Since X/ has a beta distribution with parameters 1, 1 and thus E(Xj) = (n)(~+2) ~ and ~ L.:+l ~ 1, we have
n-
:2
a;
(2.15) Therefore
Brownian Motion and the Classical Groups
102
Furthermore, 11
11
= 2:: ajVar(X])
Var( 2:: a;X]) j=m+l
m+l 11
2:: a;a~ Cov(X], X~).
+
j,k=rn+l
#k
Now
=
11 2:: a·J4
11 4 4 1 - -n2 2:: aE(X.) J J
j=m+l
3
j=m+l
11
< - _ n2 ""' L-
a4 J
_
1 n2
11
4 L- aJ
_
""'
j=m+l
2
=
j=m+l
11
n2
aj
2:: j=m+l
(2.16)
:::; 2E2.
To obtain (2.16) we can appeal to (2.15). By expanding and taking expectations of both sides of 11
11
1 = (2::rL)(Lr~j)' j=l
j=l
it follows that Thus
Therefore, for j
i=
k,
Cov(X], X~) = 1
< - n(n - 1)
E(X]X~) - ~ n
1 n 2'
One then easily verifies that for n 2 2 2
2
Cov(Xj' X k )
:::;
2 3' n
Thus 11
L j,k=rn+l
j#k
a;a~ Cov(X], X~)
Anthony D'Aristotile, Charles M. Newman and Persi Diaconis
103
which converges to zero. We have
where En ---+ 0 as n ---+ 00. This yields (2.3) for some L depending on r, as desired, and we are done.
D
Our next result shows that the convergence in Theorem 2.2 is uniform in A. We only work with diagonal matrices A here but singular value decomposition says that this suffices. We then find it convenient to think of A as a point of a sphere of radius fo. Theorem 2.2. Let
r, Xj
=
rjj be as in Theorem 2.1 , and let An be the
surface of the sphere of radius fo in lRn. For v = (aI, ... , an) E An, write Sn(v) for 2.:.]=1 ajXj . Then Sn converges in distribution to N(O, 1) uniformly on An, z.e., as n ---+ 00, sup IP(Sn(v)::::; x) - (x) I ---+ O. xElft,vEAn
Proof. We first verify that the family F = {Sn(v): v E An, n = 1,2, ... } is tight. Corresponding to any sequence S of F, either there is a positive integer Y such that S is contained in the family {Sj (Vj): Vj E A j , 1 ::::; j ::::; Y} or S has a sub-sequence Snl (v n1 ) where nl ---+ 00. In the first case, S has a subsequence of the form Sk(Pku) where k is a fixed positive integer, 1 ::::; k ::::; Y, and Pku = (al u , ... , aku) E Ak for u = 1,2, .... Choose a sub-sequence Ul of the positive integers such that a ru1 ---+ br for 1 ::::; r ::::; k. Plainly Sk(PkuJ => Sk(W) where w = (h, ... , bk ). In the second case, the argument of Theorem 1 shows that SnJ vnJ => N(O, 1). Thus F is tight. It is easy to see that because of tightness, it suffices to show, as we now do, that for any interval [a, bJ S;;; lR lim n--=
sup
IP(Sn(v) ::::; x) - (x) I = O.
xE[a,b], vEAn
If false, there exists an EO > 0, a sub-sequence nz elements vn1 E Anl such that
---+ 00,
points x nl E [a, b], and
Now X n1 has a non-increasing or non-decreasing sub-sequence xn1u which converges to x E [a, bJ. We assume without loss of generality that xnzu is nondecreasing. We henceforth work with n1u but suppress the subsequence notation. Note that
Since Sn(v n ) => N(O,l), it is clear that P(x n < Sn(v n ) ::::; x) ---+ 0 and hence that P(Sn(v n ) ::::; xn) ---+ (x). Since (xn) ---+ (x), we obtain a contradiction which proves our claim. D
Brownian Motion and the Classical Groups
104
3
Orthogonal Matrices
We use the results of Section 2 to prove the main theorem of this section. This shows that if any growing selection of entries of a random orthogonal matrix are linked together in the classical way, a limiting standard Brownian motion results. To set up our notation, let r = (r ij )i,j=l be an n x n orthogonal matrix distributed by Haar measure. Choose a subset of size k n from among the entries of r. Suppose the entries are f31, f32, ... ,f3kn with f3j corresponding to e.g. lexicographic order ofr Ts : (r,s) < (x,y) ifr < x or ifr = x and s < y. To denote this ordering we write f31 r ll , f32 r 12 , ... , f3n+l r 21 , etc. ('oJ
('oJ
('oJ
Theorem 3.1. Let f31, f32, ... ,f3kn be entries of a Haar distributed random matrix in On, as above. Assume that k n / 00. If for £ in {I, ... ,kn } and t in
[0, 1],
then Xn ==> W, a standard Brownian motion, as n
-+ 00.
Proof. We first prove that the finite-dimensional distributions of Xn converge to the corresponding distributions of W. For a single time point t, we must prove that Xn(t) ==> N(O, t) = W t as n -+ 00. However, this is equivalent to
For each n, let A = (aij) i,j = 1 be the n x n real matrix defined as follows : if f3i
('oJ
r
ST,
for some i, 1 if f3[kntJ+l otherwise Note that
('oJ
r
:s: i :s: [knt] ST
IIAII = nand
which converges to N(O, 1) in distribution by Theorem 2.1.
However,
n - [kk':ln ::; k:t and so, by [5], it suffices to show that ji5,f3[k ntJ+l in probability, which folows from k n
-+ 00
° :s: -+
°
and the fact [13] that
We now consider two time points sand t with s < t. By the Cramer-Wold device [5], it is enough to show that
Anthony D'Aristotile, Charles M. Newman and Persi Diaconis for any (a, b) E
]R2.
105
However, this is equivalent to showing that
where This can be shown by choosing an appropriate sequence of matrices A, as follows, and again applying Theorem 2.1. First note that
[k n s]a 2 + ([knt] - [k n s])b2 ~ (kn s - 1) a2 + (( k n t - 1) - k n s) b2 = k n sa2 - a2 + k n tb 2 - b2 - k n sb 2 = k nC 2(s, t) - (a 2 + b2)
Also observe that
[k n s]a 2 + ([knt] - [k n s])b 2 :::; k n sa 2 + k ntb 2 - (kns - 1)b 2 = kn sa 2 + k ntb 2 - k n sb2 + b2 = k n C 2 (s,
t)
+ b2
Combining these facts, we have
k n C2(S, t)
< (n _ [kns]na 2 _ ([knt] - [kns])nb 2 ) -
k nC2(s, t)
k nC2(S, t)
n(a 2 + b2 ) :::; k n C2(s, t) With these preliminaries, we define the matrix A in two cases. If n ([k n t]-[k n s])nb k n C2(s,t)
2
~~C;J(s~:) -
> 0 let A = (a·~,].)T}. be defined as follows: z,]=l
-,
if f3i
rv
r vu,
for some i, 1 :::; i :::; [kns] if f3i
rv
i, [kns]
r vu,
for some
+ 1 :::; i :::;
if f3[k n t]+l otherwise
rv
r vu
[knt]
Brownian Motion and the Classical Groups
106 s]na 2 O n the 0 ther h an d ,1'f n - k[kn nC2(s,t) we define A = (ai,j)i,j=l by:
if
/3i
r-v
r vu,
for some i, 1 ::s; i ::s; [knsJ if
/3i
r-v
r vu,
for some i, [knsJ if /3[k n t] r-v otherwise
+ 1 ::s; i ::s;
r vu
Note that in either case IIAII = n and so Tr(Ar)) => N(O, 1) by Theorem 2.1. However, it is plain that C(~,t) S[kns] + C(~,t) (S[knt] - S[kns]) differs from
ji[:,
Tr(Ar) by a quantity in absolute value bounded by 1f[!]; where 1 is an entry of the random n x n random r. Thus, as before, what remains is to show that ~
ji[:, converges to zero in probability. Thus, given
P( I via2 + b2 C (s, t)
I V~ kn 1 <
) = P( Iyin I n1 <
E
E
>0
c (s, t) ~) nE
via2 + b2
which converges to 1 as n ---+ 00. A similar argument shows that the higher order finite dimensional distributions behave properly. We next show that Xn is tight. According to Theorem 15.6 of [5], it is enough to show that for sufficiently large n
for K independent of n, h, t, and t 2 . The left member of the above expression is
where [kntlJ < i, j ::s; [kntJ and [kntJ < k, I ::s; [knt2J. Put [knt]- [knhJ = ml and [knt2J - [kntJ = m2· The left member of (3.17) is bounded from above by 2
~2 (aE(ri1 r i2) + bE(rilr~2)) n
where a and b are both less than or equal to ml m2. Here we have used the fact that for distinct entries 6, /3, Q and (J of a random orthogonal matrix, E(6 2 /3Q) = 0 and E(6/3Q(J) is non-positive. The first assertion uses the fact that for any (nonrandom) diagonal sign matrix M, the random matrices r M and Mr are equidistributed with r; the second assertion uses that and also the fact that Ebll 112121/22) = - (n-l)~(n+2) [36]. However, both n 2E(q1 q2) and n 2 E(ri1 r~2) converge to 1 , and so for all n, both expectations are less than for some positive constant L. Combining all of this information, we
:2
([kntJ - 1)
Anthony D'Aristotile, Charles M. Newman and Persi Diaconis
107
have that
If k~ -s; t2 - t l
,
then
ml::,m2
-s;
2(t2 -
td while if k~ > t2 - t l , then either D
Xn(t) - Xn(h) or X n (t2) - Xn(t) is zero. Thus our claim is established.
4
Unitary Matrices
We first thought that obtaining unitary analogues of Theorems 1, 2, 3 would be straightforward but then encountered difficulties in translating to the complex case because of the lack of a singular value decomposition. This led us to carefully redo the preliminaries. Our main results are Theorems 5 and 6 below. For the proof of Theorem 6, it will be necessary to first establish Theorem 4, which is the analogue of Theorem 15.1 of [5]. To this end, let nk, V and £ denote respectively the Borel sets of IRk, D and D x D, where D is the Skorokhod space of right-continuous real-valued functions on [0,1] with left limits. For t l ," ., tk in [0,1], define by 7rtl, .. ,tk (x) = (x(td,"" X(tk)) for xED. Following Billingsley [5], sets of the form 7r~~... ,tk (H) where H E nk are subsets of D and called finite-dimensional sets. If To is a subset of [0,1], let :Fro be the collection of sets 7r~~... ,tk (H) where k 2:: 1, ti E To, and H E nk. Then :Fro is an algebra of sets, i.e., :FTo is closed under finite unions and finite intersections and the empty set 0 E :FTo ' See Royden [35] for more details. Obviously, :F[0,1] is the class of finite-dimensional sets. Billingsley has shown (Theorem 14.5 of [5]) that if To contains 1 and is dense in [0,1], then :FTo generates V. Extending these ideas, for Sl,' .. , Sk and tl,' .. , tz in [0,1]' define
by sending (x, y) to (x(st},· .. , X(Sk); y(td,' .. , y(tl))' Subsets of D x D of the form where HEn k, KEn I are called finite-dimensional sets (of D x D). If To and T1 are subsets of [0,1], let :FTo,Tl be the class of sets
where Si E To, tj E T 1, k 2:: I, l 2:: I, H E n k, and K E nl. One can easily verify that :FTo,Tl is a semi-algebra of sets, i.e., the intersection of any two members of FTo,Tl is again in :FTo,TJ and the complement of any set in :FTo,Tl is a finite disjoint union of elements of :FTo,TJ . If we let A be all finite disjoint unions
Brownian Motion and the Classical Groups
108
of members of FTo,T1 , then A is an algebra of sets in D x D (any semialgebra generates an algebra in this way [35]). Suppose To and Tl are both dense subsets of [0,1] and that 1 E To n T 1 . Let L be the a-algebra of subsets of D x D generated by FTo,Tl' Sets of the form
n
where H is in k and 81,' . " 8k E To are in FTo,T1 and may be identified with FTo ' Since FTo generates V, it is clear that G x DEL for all open sets G of D. Similarly D x L E L for all L open in D and so L contains all sets G x L where G, L are open in D. It is now plain that £ <::: L. On the other hand, Billingsley has shown that
is a measurable mapping. In a completely analogous way, it can be shown that
is also measurable (here n k x nl is the a-algebra of subsets of]Rk x]Rl generated by "measurable rectangles" of the form H x K where H E n k , K E nl). This a-algebra is precisely the a-algebra of Borel sets of]Rk x ]Rl (see [4]). It follows that the finite-dimensional subsets of D x D lie in £ by definition of measurable mapping. Thus L <::: £ and so we have L = £. Suppose P and Q are two probability measures on (D x D, £) which agree on FTo,T1 • Then they clearly agree on the a-algebra A generated by FTo,T1 • Since A generates £, it follows that P = Q on £ by Theorem 3. 2 of [4]. In the language of Billingsley [5], for To,T1 dense in [0,1] with 1 E TonTI, FTo,T1 is a "determining class." If P is a probability measure on (D, V), let Tp be the set of all points t E [0,1] such that 1ft is continuous except on a subset of D which has P':'measure O. Billingsley [5] has shown that Tp contains 0 and 1 and its complement in [0,1] is at most countable. Now let P be a probability measure on (D x D,£) with marginals Rl and R 2 . If 81,' .. , 8k E TRI and t 1 ,' . " tl E T R2 , then 7rs1 ,"',Sk is continuous except on a subset A of D of R1-measure zero. Similarly 1ftl, ... ,tl is continuous except on a subset B of D of R 2 -measure zero. Now (A x D)U(D x B) has P-measure 0 and off this set 7rs1 , ... ,Sk;tl, ... ,tl is continuous. We will need the following: Theorem 4.1. Let P n , n = 1,2, "', and P be probability measure8 on (DxD, E). Suppose Rl and R2 are the marginal probability measures of P. If {Pn } is tight and if Pn1f~: ... ,Sk;tl, ... ,tl =} P7r~: ... ,Sk;tl, ... ,tl holds whenever all the 8i are in TRl and all the tj are in T R2 , then Pn =} P. Proof. Since {Pn } is tight, each subsequence {Pn '} contains a further subsequence {Pn "} converging weakly to some limit Q. By Theorem 2 of [5], it suffices to show that each such Q is equal to P. Suppose Q1 and Q2 are the marginals of Q. If 81,' . " 8k all lie in TRI n TQ! and t 1 , •. " tz all lie in TR2 n TQ2 , then
Anthony D'Aristotile, Charles M. Newman and Persi Diaconis
109
by hypothesis. Also 1[' s l, .. ,Sk;h, .. ,tl is continuous except on a subset of D x D of Q-measure zero by comments preceding the statement of the theorem. Since Pn" =? Q, it follows by Theorem 5.1 of [5] that
p n 111['-1 Sl,···,Sk;t1, .. ,t1
=?
Q1['-l
Sl,···,Sk;t1,··,tI·
Thus whenever each Si E TR1 n TQ1 and each tj E TR2 n TQ2' Let T1 = TR1 n TQ1 and T2 = TR2 nTQ2 . Each ofT1 and T2 is dense in [0,1] and 1 E Tl nT2 and so as we have seen above, FT1 ,T2 is a determining class. The above equality says that P and Q agree on F T1 ,T2 and we are done. D We are now in a position to establish the complex analogue of Theorem 2.1 (for diagonal A). Theorem 4.2. Let A = Diag(al, ... ,an ) and B = Diag(b1, ... ,bn ) where al 2: a2 2: ... 2: an and b1 2: b2 2: ... 2: bn and IIAII = IIBII = n, and let ~ = f + iA be an n x n unitary matrix distributed by Haar measure. Then (TrAf, TrBA) =? ~(Zl' Z2) as n --+ 00, where Zl and Z2 are i.i.d. standard normal (i.e., Tr Af + iTr Bf converges in distribution to a complex standard normal distribution).
Proof. By the Cramer-Wold device [5], it suffices to prove that xTrAf
1
1
+ yTrBA =? x J2Z1 + y J2Z2
for arbitrary (x, y) E ~2. Write of f and A. We will show that
Xj
for
Ijj
and Yj for
Ajj
with lij,
Aij
the entries (4.18)
converges to zero. We follow the proof of Theorem 2.1 and show that there is a constant L > such that, for each E > 0, the lim sup of (4.18) is less or equal to LE. Given E > 0, choose a positive integer m 2: so that ~ ::; E2 and ~ ::; E2 for j > m and all n. Given any subsequence nz of the positive integers, choose a subsequence nzs which satisfies
°
;2
aj
- - --+ a'l,
yInl;
bj
- - --+
yInl;
(3
as /1
j,
--+ 00
for j = 1,2, ... m.
(4.19)
As before, we will suppress the subsequence notation. The quantity (4.18) is less than or equal to the sum of the following three terms
IE( eir(x 2::=j=l ajXj+y 2::=j=l bj Y j )
_
e-
2~ (x 2 2::=j=Tn+1 a;+y2 2::=j==+l b;)) (4.20)
2 7"2
.E(eir(x2::=';=l a j X j +y2::=';=l bjYj))I,
le- r; 2~ (x 2 2::=j==+l a;+y2 2::=;'==+1 b;) E( eir(x 2::=';=1 ajXj+y 2::=';=1 bj Y j )) r2 1 2 "n 2 ,..2 2 "Tn _e-22nx L..,j==+l a j e-Tx L..,j=l
Oi
2
,..2
1
2 "n
j e-22nY L..,j==+l
b2 j
,.2
(4.21) 2
,,= jl,
e-TY L..,j=l
{32
(4.22)
Brownian Motion and the Classical Groups
110
Since n
1 n
~n
~ ~
m
a2 -----t J
1-
j=m+1 n
L
~ 0: 2 ~ J
and
j=1
m
bJ -----t 1 -
j=m+1
L (3], j=l
the term (4.22) converges to zero. By a known result (see, e.g., Lemma 5.3 of [33]), 1
(foX I , ... , fox m , foYI , ... , foYm ) =* y'2(ZI' Z2, ... , Z2m) where the Zi are i.i.d. N(O,l). Thus
and so
and hence (4.21) converges to zero. To bound (4.20), we first claim that n
n
j=m+1
j=m+l
n
n
To see this, let and note that eir(xL'j=lajXj+YL'j=lbjYj)
= G(
II j=m+1
cos(rxajXj )
II
cos(rybjYy))
j=m+l
plus a sum of products of the form GJ where J is a product of sines and cosines involving at least one sine term. To establish our claim, it is enough to verify that the expectation of any such GJ is zero. First suppose J contains the factor sin(rxajXj ) but not the factor sin(rybjYy). Then E(GJ) = 0 by the sign-symmetry of the diagonal elements of~. Next consider a product GJ containing a factor sin(rxajX j ) sin(rybjYy). The diagonal elements of ~ are also exchangeable, and so we can assume j = m + 1. Write
Anthony D 'Aristotile, Charles M. Newman and Persi Diaconis
111
where Un is the unitary group and P,n is Haar measure. For 0 E [0,27fJ, let D(O) be the n x n diagonal matrix Diag(l, 1, ... , 1, e iO , 1, ... , 1) where eiO is in position m + 1. By the invariance of Haar measure, D(O)Ll has the same distribution as Ll, and so
1
Hsin(rxam+l(SCOS(r+O))) sin(rybm +1 (ssin(r+O))) dP,n =f.
Un
Thus
f27r 10
1 Un
H sin(rxa m+l (scos(r + 0))) sin(rybm +1(ssin(r + 0))) d/1n dO = 27ff.
By Fubini's Theorem [35], we have
1 Un
H
f27r sin(rxam+l(SCOS(r + 0))) 10
sin(rybm + 1 (ssin(r + 0))) dO dP,n
= 27ff.
Next let l(O) = sin(rxam+1(scosO)) sin(rybm +1(ssinO)). Now, l is periodic with period 27f and shifting l by '"Y units yields a functions whose integral over [0,27f] coincides with the integral of lover that same interval. Thus
1
H
Un
f27r l( 0) 10
dO dP,n = 27f f.
However, l is an odd function and so
10f27r l(O) dO = J7r -7r l(O)
dO
=
o.
It follows that f = 0 and our claim is established. Using this fact and arguing as we did in the proof of Theorem 2.1 , we have that the expression in (4.20) does not exceed the value
1 I IT Un
-e
IT
cos( rxaj X j )
j=m+l
"n
_ r2x2 2 E(X2) 2 L..j=m+l a j j
n
e
_ r2y2
L:nj=m+l b2j E(y2) j I dP,n n
L
a;E(Xf) + r ; (Var(
j=m+l n
+r4y4
2
2 2
L
:::; r 4 x 4
cos( rybj lj)
j=m+l
L j=m+l
a;XJ))~
j=m+l 2 2
b;E(Y/) + r ~ (Var(
n
L
b;Yl))~.
j=m+l
We can bound this last expression as in the proof of Theorem 1, which leads us to a proper choice of L and completes the proof of Theorem 4.2. 0 It is natural to ask if Theorem 3.1 has complex and symplectic analogues. We believe this is the case but thus far, like in the case of Theorem 2.1, we are able to prove a result of this type only for elements of the diagonals of these classes of matrices. In doing so, we obviously lean heavily on the preceding theorem.
Brownian Motion and the Classical Groups
112
Theorem 4.3. Let On = Un be the unitary group of n x n complex matrices, and let ~ = r + iA be an element of On distributed according to Haar measure. Let dj = "ijj + iAjj and let SJ:: = L~=1 dj . If Zn(t,w)
= S[ntJ(w), t
E
[0,1]'
then Zn =} TV converges to TV where TV is standard complex-valued Brownian motion (TV = WP) + iWP) where W(I) and W(2) are independent onedimensional Brownian motions with drift 0 and diffusion coefficient ~). Proof. We appeal to Theorem 5. One can easily adapt the argument for tightness given in Theorem 3.1 to show that ReZn is tight. Here Ebfl) = 2~ and Ebrr"issAuuAvv) = 0 for distinct r, s, u, and v. Similarly, ImZn is tight and hence Pn is tight where Pn is the law of (ReZn , ImZn ). By Theorem 4.1, it remains to show that (4.23) where P is the law of (W(I), W(2)). We consider time points 81, 82, it, and t2 where 81 < 82 and tl < t2, and one may easily verify that the general case can be handled analogously. Letting Xn = ReZn and Yn = ImZn, we wish to prove that
However, this statement would follow if
converges in distribution to
Appealing as before to the Cramer-Wold device [5], it suffices to show that
converges in distribution to
for any (a, b, c, d) E ]R4. The remainder of the proof follows by applying Theorem 4.2 in essentially the same way as Theorem 2.1 is applied in the proof of Theorem 3.1. D
5
Symplectic matrices
Recall (see [8] ) that the group of symplectic matrices Sp(n) may be identified with the subgroup of U(2n) of the form
[~
-1]
E
U (2n),
(5.24)
Anthony D'Aristotile, Charles M. Newman and Persi Diaconis
113
where A, B are complex n x n matrices. The trace of random matrices from this group is studied in [14, 16]. As shown there, if 8 is chosen according to Haar measure in Sp(n), then Tr(8) , Tr(8 2 ) , ... , Tr(8 k ) are asymptotically independent normal random variables. We now study the extent to which the diagonal entries of a random symplectic matrix generate Brownian motion. Random matrices in Sp( n) can be generated in the following way. Fill the real and imaginary entries of A and B in with real, standard normal i.i.d. random variables. Apply the Gram-Schmidt process to the n complex column vectors of dimension 2n which result. We now have a new A and B and we complete the right half of our matrix by following the pattern of (5.24). The matrix obtained in this way is distributed according to Haar measure in Sp(n). To see this, one can adapt the argument given for the construction of a random orthogonal matrix. See for example Proposition 7.2 of [17]. We now have
Theorem 5.1. Let Sp(n) be the symplectic group of 2n x 2n complex matrices of the form (5.24) , and let 8 be an element of Sp(n) chosen according to Haar measure /-Ln. Let A = (aij)r,j=l be the upper left n x n block of 8, and let d i = aii, 1 :::; i :::; n , and let SI: =
then Zn ::::} ~ TV where
TV
2:7=1 di ·
If
is standard complex-valued Brownian motion.
Proof. We are working with complex matrices and so we can follow the arguments of Theorems 4.2 and 4.3. We first need the symplectic analogue of Theorem 4.2. To accomplish this, only one change in the proof of Theorem 4.2 is required. In place of the diagonal matrix D(O), we use instead the 2n x 2n diagonal matrix D1 (0) = Diag(l, ... , 1, ei(J, 1, ... , 1, e-i(J, 1, ... , 1) where ei(J and e-i(J occur in positions number m + 1 and n + m + 1 respectively. The rest of the arguments for the analogues of Theorems 4.2 and 4.3 are clear. D
It should be noted that we cannot link all 2n diagonal entries to obtain Brownian motion. If we were to try, note that Zn(~) and Zn(l) - Zn(~) would tend to limits which are complex conjugates of one another and hence dependent.
Acknowledgement. The authors thank Harry Kesten for explaining how sign-symmetry could be used to show that the trace of a random orthogonal matrix converges to a standard normal distribution at the Bowdoin Conference on random matrices in 1985. They also thank Francis Comets for comments on earlier drafts of this paper. The first author thanks the Department of Statistics of Stanford University for warm hospitality extended to him during the summers of 1994-1997. He also thanks Jeff Rosenthal and Patrick Billingsley for some useful conversations. In addition, he acknowledges support from the Research Foundation of the State University of New York in the form of a PDQ Fellowship. The second and third authors acknowledge research support from the Division of Mathematical Sciences of the National Science Foundation.
Brownian Motion and the Classical Groups
114 Anthony D' Aristotile Dept. of Mathematics SUNY at Plattsburgh Plattsburgh, NY 12901
Persi Diaconis Depts. of Mathematics and Statistics Stanford University Stanford, CA 94305
Charles M. Newman Courant Inst. of Math. Sciences New York University 251 Mercer Street New York, NY 10012
Bibliography [1] Arratia, R., Goldstein, L., and Gordon, L., Poisson Approximation and the Chen-Stein Method, Stat. Science 5, 403-434, 1990. [2] Bai, Z.D., Methodologies in Special Analysis of Large Dimensional Random Matrices. A review, Statist. Sinica 9, 611-677, 1994. [3] Bhattacharya, R. and Waymire, E., Stochastic Processes with Applications, John Wiley and Sons, 1990. [4] Billingsley, P. J., Probability and Measure, Second Edition, John Wiley and Sons, 1986. [5] Billingsley, P. J., Convergence of Probability Measures, John Wiley and Sons, 1968. [6] Borel, E., Sur les principes de la theorie cinetique des gaz, Annales de l'ecole normale sup. 23, 9-32, 1906. [7] Bump, D. and Diaconis, P., Toeplitz minors, Jour. Combin. Th. A. 97, 252-271, 2001. [8] Brackner, T. and tom Dieck, J., Representation of Compact Lie Groups, Springer Verlag, 1985. [9] Daffer, P., Patterson, R., and Taylor, R., Limit Theorems for Sums of Exchangeable Random Variables, Rowman and Allanhold, 1985. [10] D'Aristotile, A., An Invariance Principle for Triangular Arrays, Jour. Theoret. Probab. 13, 327-342, 2000. [11] Diaconis, P., Application of the method of moments in probability and statistics. In H.J. Landau, ed., Moments in Mathematics, 125-142, Amer. Math. Soc., Providence, 1987. [12] Diaconis, P., Patterns in eigenvalues, To appear Bull. Amer. Math. Soc., 2002. [13] Diaconis, P., Eaton, M., and Lauritzen, 1., Finite de Finetti theorems in linear models and multivariate analysis, Scand. J. Stat. 19, 289-315, 1992.
Anthony D'Aristotile, Charles M. Newman and Persi Diaconis
115
[14] Diaconis, P. and Evans, S., Linear functions of eigenvalues of random matrices, Trans. Amer. Math. Soc. 353, 2615-2633, 2001. [15] Diaconis, P. and Freedman, D., A dozen de Finetti-style results in search of a theory, Ann. Inst. Henri Poincare Sup au n. 2 23, 397-423, 1987. [16] Diaconis, P. and Shahshahani, M., On the eigenvalues of random matrices, J. Appl. Prob. 31A, 49-62, 1994. [17] Eaton, M., Multivariate Statistics, John Wiley and Sons, 1983. [18] Edelman, A., Kostlan, E., and Shub, M., How many eigenvalues of a random matrix are real?, Jour. Amer. Math. Soc. 7, 247-267, 1999. [19] Feller, W., An Introduction to Probability Theory and Its Applications, Vol. II, John Wiley and Sons, 1971. [20] Golub, R. and Van Loan, C., Matrix Computations, 2nd Ed., Johns Hopkins Press, 1993. [21] Hida, T., A role of Fourier transform in the theory of infinite dimensional unitary group, J. Math. Kyoto Univ. 13, 203-212, 1973. [22] Jiang, T.F., Maxima of entries of Haar distributed matrices, Technical Report, Dept. of Statistics, Univ. of Minnesota., 2002. [23] Kuo, H.H., White Noise Distribution Theory, CRC Press, Boca Raton, 1996. [24] Levy, P., Le£;ons d'Analyse Fonctionnelle, Gauthiers-Villars, Paris, 1922. [25] Levy, P., Analyse Fonctionnelle, Memorial des Sciences Mathematiques, Vol. 5, Gauthier-Villars, Paris, 1925. [26] Levy, P., Problemes Concrets d'Analyse Fonctionnelle, Gauthier-Villars, Paris, 1931. [27] Mallows, C., A Note on asymptotic joint normality, Ann. Math. Statist. 43, 508-515, 1972. [28] Maxwell, J.C., Theory of Heat, 4th ed., Longmans, London, 1875. [29] Maxwell, J.C., On Boltzmann's theorem on the average distribution of energy in a system of material points, Cambridge. Phil. Soc. Trans. 12, 547-575, 1878. [30] McKean, H.P., Geometry of Differential Space, Ann. Prob. 1, 197-206, 1973. [31] Mehler, F.G., Ueber die Entwicklung einer Function von beliebig vielen. Variablen nach Laplaschen Functionen hoherer Ordnung, Grelle's Journal 66, 161-176, 1866. [32] Mehta, M., Random Matrices, Academic Press, 1991.
116
Brownian Motion and the Classical Groups
[33] Olshanski, G., Unitary representations of infinite-dimensional pairs (G, K) and the formalism of R. Howe. In A.M. Vershik and D.P. Zhelobenko, eds. Representation of Lie Grops and Related Topics, Adv. Studies in Contemp. Math. 7, 269-463, Gordon and Breach, New York, 1990. [34] Rains, E., Normal limit theorems for asymmetric random matrices, Probab. Th. Related Fields 112, 411-423, 1998. [35] Royden, H. L., Real Analysis, 2nd Edition, The Macmillan Company, 1968. [36] Stein, C., The accuracy of the normal approximation to the distribution of the traces of powers of random orthogonal matrices, Technical Report No. 470, Stanford University, 1995.
Transition Density of a Reflected Symmetric Stable Levy Process in an Orthant Amites Dasgupta and S. Ramasubramanian Indian Statistical Institute Abstract Let {Z(s,x)(t) : t 2': s} denote the reflected symmetric a-stable Levy process in an orthant D (with nonconstant reflection field), starting at (s, x). For 1 < a < 2,0 :::; s < t, x E D it is shown that Z(s,x) (t) has a probability density function which is continuous away from the boundary, and a representation given.
1
Introduction
Due to their applications in diverse fields, symmetric stable Levy processes have been studied recently by several authors; see [4], [5] and the references therein. In the meantime reflected Levy processes have been advocated as heavy traffic models for certain queueing/stochastic networks; see [14]. The natural way of defining a reflected/regulated Levy process is via the Skorokhod problem as in [9], [3], [11], [1]. In this article we consider reflected/regulated symmetric a-stable Levy process in an orthant, show that transition probability density function exists when 1 < a < 2 and is continuous away from the boundary; the reflection field can have fairly general time-space dependencies as in [11]. It may be emphasized that unlike the case of reflected diffusions (see [10]) powerful tools/methods of PDE theory are not available to us. To achieve our purpose we use an analogue of a representation for transition density (of a reflected diffusion) given in [2]. Section 2 concerns preliminary results on symmetric a-stable Levy process in JRd, its transition probability density function and the potential operator. In Section 3, corresponding reflected process with time-space dependent reflection field at the boundary is studied. A major effort goes into proving that the distribution of the reflected process at any given time t > 0 gives zero probability to the boundary.
2
Symmetric stable Levy process
Let (O,F,{Ft},P) be a filtered probability space, d 2 2,0 < a < 2. Let {B(t) : t 2 O} be an Fradapted d-dimensional symmetric a-stable Levy process. That is, {B(t)} is an JRd-valued homogeneous Levy process (with independent increments) with r.c.I.I. sample paths; it is roation invariant and
E[exp{ i(u, B(t) - x) }IB(O) = x] = exp{ -tlul a } 117
(2.1 )
Reflected Levy Process
118
for t 2:: 0, U E IR d, x E IRd. It is a pure jump strong Markov process. Using LevyIto theorem and Ito's formula, it can be shown that the (weak) infinitesimal generator of B(·) is given by the fractional Laplacian
J
~a/2 f(x) = ~ft}C(d, ex)
f(x
~~fl+~ f(x) d~
(2.2)
1~I>r
whenever the right side makes sense, where C(d,ex) = r(dta)/[2-a7fd/2Ir(~)I]; the measure v(d~) = C(d, ex) I~I}+Q d~ is called the Levy measure of B(·). Also, for any t > 0,
P(B(t)
i= B(t-)) = o.
(2.3)
See [4], [5], [7], [8] for more information.
= 8g(X)/8xi,gij(X) = 82g(X)/8xi8xj, 1::;
For afunctiong onIRd,gi(X)
i,j::; d.
Lemma 2.1. If f E C~(IRd) then ~a/2 f E Cb(IR d). Proof: For 0
< r < s, ~~,~2 is defined by (2.4) r
Let f E C~(IRd). For any x E IRd observe that
If(x
+~)
- f(x)1
1~Id+a
1
l(1,CXl)(I~1)
::; 21IfIICXlI~Id+a l(1,CXl)(I~I)
(2.5)
and that as ex > 0
J
CXl
1 de - C 1~Id+a <" -
J
r
-(a+l)d
r<
(2.6)
00.
1
1~1>1
So continuity of f and dominated convergence theorem imply that ~~~ ' well defined, bounded and continuous. Next, Taylor expansion gives
f(x +~) - f(x)
dId
=
L Ii(x)~i + "2 L i=l
fij(Y)~i~j
f
is
(2.7)
i,j=l
where Y is point on the line segment joining x and x + ~. Since function for each i
J ~i 1~1;+a d~ = o.
~
1-+
~i
is an odd
(2.8)
r
d
Note that
L
i,j=l
lij (Y)~i~j
= O(I~12)
and 1
J 1~121~1;+a d~ J =
O
r-a+1dr <
C
0
00
(2.9)
Amites Dasgupta and S. Ramasubramanian
119
as a > 2. Since fij (-) E Cb(IR d) it is now easily seen that lim .6.~{2 f is well rIO
'
defined, bounded and continuous. Since
.6.0./2 f(x) = .6.~~f(x) ,
+ lim.6.~{2 f(x) rIO'
(2.10)
the lemma now follows.
D
It is known that the process B(·) has a transition density function; we now give a representation for it.
Theorem 2.2. The transition probability density function of B(·) is given by
p(s, x; t, z)
J
00
=(47f)-d/2(t - s)-d/a
o
g(r) exp {1 ~Iz rd 4(t - s)2/a r2
-
x12} dr (2.11)
°
for :s: s < t < 00, x, z E IR d, where g(.) is the density function of the square root of an ~ -stable positive random variable. Proof: By homogeneity enough to consider s = 0, x = 0. Let t > 0. By (2.1) and Proposition 2.5.5 (on pp. 79-80) of [13] it follows that B(t) = (BI (t), ... ,Bd(t)) is sub-gaussian and that there exist independent one-dimensional random variables 8, U1 , ... ,Ud such that Ui rv N(O, 2t 2 / a ), 1 :s: i :s: d,8 is ~stable positive random variable and (Bl (t), ... ,Bd(t)) rv (8~ U I , 8~ U2, ... ,8~ Ud). Denoting by g(-) the density of 8 1 / 2 , the joint density of (UI , ... , Ud, 8 1 / 2 ) is given by
h(6, ... ,~d,r)=
( 1) (1)t 47f
d/2
d/ a
g(r)exp
{1 -4t2/a8~; d
}
.
Using the invertible transformation (6, ... ,~d,r) f---t (r6, ... ,r~d,r) on IR d x (0,00) the joint density of (Bl (t), ... ,Bd(t), 8 1 / 2 ) is given by 1d h r
(~Yb ... , ~Yd' r) r r
(4~) d/2
m
d/a :d9(r) exp {
~ 41;/" :2
t
yf } .
Now integrating w.r.t. r we get (2.11).
D 00
J
rlkg(r)dr < 00 o for k = 2, 3, ... Indeed note that g(.) depends only on a; so if we consider kdimensional symmetric a-stable Levy process then the transition density will be given by (2.11) with d replaced by k; and as the density is well defined at x = z the claim follows. Remark 2.3. From the preceding theorem it follows that
Proposition 2.4. Denote Po(s, x; t, z) = f)p(s, x; t, z)/f)s, Pi(S, x; t, z) = f)p(s, x; t, Z)/f)xi, Pij(S, x; t, z) = f)2p(s, x; t, Z)/f)xif)Xj, 1 :s: i,j :s: d.
Reflected Levy Process
120
(i) Fix t > O,Z E JRd. Let to < t; then P,Po,Pi,pij,l ::; i,j ::; d are bounded contin uous functions of (s, x) on [0, to] X JRd. (ii) For any t > 0,8 > 0 sup{I'V xp(s, x; t, z)1 : 0::; s < t, Iz -
xl
~
8} ::; K(d, 8)
(2.12)
where K (d, 8) is a constant depending only on d, 8 and 'V x denotes gradient w. r. t. x-variables. 2
2
2
Proof: (i) Since ye- Y ,y e- Y are bounded, using Remark 2.3 and dominated convergence theorem, the assertion can be proved by differentiating w.r.t. s, x under the integral in (2.11). (ii) Since yd+2 e- y2 is bounded, differentiating under the integral in (2.11) we get for all 0 ::; s < t, Iz - xl ~ 8
l'Vxp(s,x;t,z)1
< K(d)
J (2) oo
g(r)
Iz _ xl
I ) d+2
2r(: ~ s~l/a
d+l ( I
exp
{
4r2~t -=- :)2/a I
-
12}
dr
o
< K(d)
(~)
d+l
00
j g(r)dr
=
K(d, 8).
o
o The following result indicates a connection between the transition density and the generator; though it is not unexpected, a proof is given for the sake of completeness.
Theorem 2.5. For fixed t > 0, z E JRd the function (s, x) the Kolmogorov backward equation
Po(s, x; t, z)
+ 11~/2p(s, x; t, z) = 0, s < t, x
E
1-+
p( s, x; t, z) satisfies
JRd
(2.13)
where Po is as in the preceding proposition and x in 11~/2 signifies that 11 a/2 is applied to p as a function of x. Proof: By the preceding proposition and Lemma 2.1 11~/2p(s, x; t, z) is a bounded continuous function. Put u(s, x) = p(s, x; t, z), s < t, x E JRd. Using Ito's formula (see [7]) for 0 ::; s < c < t, x E JRd c
E{u(c, B(c)) - u(s, B(s)) - j[uo(r, B(r))
+ 11a/2u(r, B(r))]drIB(s)
s
That is
j p(c, y; t, z)p(s, x; c, y)dy - p(s, x; t, z) IRd c
j j [po(r,y;t,z) S
IRd
+ 11~/2p(r,y;t,z)]p(s,x;r,y)dy
dr.
=
x} = O.
Amites Dasgupta and S. Ramasubramanian
121
By Chapman-Kolmogorov equation, l.h.s. of the above is zero. As the above holds for all c > s and the quantity within double brackets is bounded continuous in (r, y), by Feller continuity one can obtain (2.13) from the above letting c 1 s. D
We next look at the O-resolvent (or potential operator) associated with the process B (.). For a measurable function rp on JRd, x E JRd define
J J 00
Grp(x) =
p(O,x;t,z)dt dz =
rp(z)
IRd
JJ 00
0
rp(z)p(O,x;t,z)dz dt
0
whenever the r.h.s. makes sense. Since difficult to see that
(2.14)
IRd
°< a < 2 ::::;; d, using (2.11) it is not
00
Jp(O,x;t,z)=Clz_~ld_<x,zi-x
(2.15)
o
which is the so called Riesz kernel.
Theorem 2.6. Let rp E C;(JRd) and rp,r.pi,r.pij,l::::;; i,j::::;; d be integrable w.r.t.
the d-dimensional Lebesgue measure. Then (aj Gr.p E C;(JRd), (bj (Gr.p)i(X) = Gr.pi(X), (Gr.p)ij(X) Gr.pij(X), x E JR d,l < Z,] < d (c) f:..<X/2Gr.p(x) = -rp(x),x E JRd. D We need a lemma
Lemma 2.7. If f E L 1 (JRd) n LaO (JRd) then Gf is well defined, bounded and
continuous. Proof: Let {Tt} be the contraction semi group associated with B(·). Observe that
J
JJ
o
1
1
Gf(x) =
Td(x)dt+
00
f(z)p(O,x;t,z)dz dt.
(2.16)
IRd
°
Since Td is continuous for each t > and ITdUI ::::;; Ilflloo it is clear that the first term on r.h.s. is bounded and continuous. By (2.11)
r
If(z)p(O,x;t,z)l(1,oo)(t)1 ::::;; K
°
d/ a lf(z)11(1,00)(t)
which is integrable as < a < 2 ::::;; d. So continuity of p in x now implies that the second term on r.h.s. of (2.16) is bounded and continuous. D
Proof of Theorem 2.6: By Lemma 2.6 we get Gr.p, Gr.pi, Gr.pij are bounded continuous. A simple change of variables yields
JJ 00
r.p(z + he~) - r.p(z) p(O, x; t, z)dz dt
o IRd
JJ 00
rpi(Z)P(O, x; t, z)dz dt
o
IRd
Reflected Levy Process
122
by dominated convergence theroem; thus (G
A
u
cx/2G
lfft?
1·1m TtG
tlO
t [J= J "'( o
lfft?
=
z )p(O, X; t
+ s, z )dz ds -
IR d
z
0
t [-- / J
",(z )p(O, x; s, z )dz
o
J= J "'( )p(0;
dS]
IR d
dS] = -
IR d
for each x E IR d , completing the proof.
3
X; s, z )dz
D
Reflected process
Let D = {x E IR d : Xi > 0, 1 ::::: i ::::: d} be the d-dimensional positive orthant. The reflection field is a function R : [0,00) X IR d X IR d ~ M d(IR) where M d(IR) is the space of (d x d) matrices with real entries. We write R(t,y,z) = (rij(t,y,z)). We assume the following Assumptions (AI) The function (y, z) uniformly in t, for 1 ::::: i, j ::::: d.
I---t
rij(t, y, z) is Lipschitz continuous,
(A2) For i =I- j, there exist Vij such that Irij(t,y,z)1 ::::: Vij for all t,y,z. Set V = (( Vij)) with Vii = 0. We assume spectral radius of V = o-(V) < 1. (A3) Take rii(·,·,·) == 1,1 ::::: i ::::: d. (A2) is a uniform Harrison-Reiman condition that has proved useful in queueing networks; (A3) is just a suitable normalization. Let s;::: O,x E D. The Skorokhod problem in D corresponding to {B(t) : t 2:: s} and R consists in finding Ft-adapted r.e.l.l. processes y(s,x)(t), Z(s,x)(t), t 2:: s such that (i) Z(s,x) (t) E D for all t ;::: s; (ii) ~(s,x)(s) = 0, ~(s,x)(.) is nondeereasing, 1 ::::: i ::::: d; (iii) ~(s,x) (-) can increase only when Z;S,x) (.)
J
= 0;
that is, for 1 ::::: i ::::: d, t 2:: s,
t
~(s,x) (t)
=
l{o} (Zi(s,x)
s
(r))d~(s,x) (r), a.s.
(3.1)
Amites Dasgupta and S. Ramasubramanian
123
(iv) Skorokhod equation holds, viz. for 1 ~ i ~ d, t :2: s
Xi
+ Bi(t) -
+L j#i
Bi(S)
+ ~(s,x)(t)
J t
rij(u, y(s,x)(u_), Z(s,x)(u_ ))dlj(s,x)(u)
(3.2)
s
or in vector notation
J t
Z(s,x)(t) = X + B(t) - B(s)
+
R(u, y(s,x)(u_), Z(s,x)(u_ ))dY(s,x)(u). (3.3)
s
Solving the deterministic Skorokhod problem path by path one can solve the above stochastic problem. Indeed the following result is given in [11]. Proposition 3.1. Assume (Ai) - (A3). For each s :2: 0, xED there is a unique
pair Z(s,x)(.), y(s,x)(.) solving the above problem; also ~(s,x)(t) ~ ((1 - V)-l L(s,x»)i(t), a.s.
(3.4)
for t ~ s where L(s,x) (-) is given by Lis,x)(t)
=
sup max{O, -[Xi
+ Bi(t) -
Bi(S)]}.
s~u~t
Moreover {(Z(s,x)(t), y(s,x)(t)) : t :2: s} is an Ft-adapted 15 x 15-valued Feller continuous strong Markov process. Any discontinuity ofY(s,x) (., w) or Z(s,x) (-, w) has to be a discontinuity of B(·,w). If R is a function only oft, z then {Z(s,x)(t) : t :2: s} is a 15-valued Feller continuous strong Markov process. 0 The z-part of the above viz. {Z(s,x) (t) : t ~ s} may be called the reflected (or
regulated) symmetric a-stable Levy process. Proposition 3.2. Assume (Ai) - (A3) and let 1
< a < 2.
Then E[var (y(s,x)(.); [s, t])] < 00 for all t > s :2: 0, xED, where var (g(.); [a, b]) denotes the total variation of 9 over [a, b]. Proof: As ~(s,x) (-) is nondecreasing for each i it is enough to show that EI~(s,x)t)1
<
also we may take s = 0, x = 0. Since a > 1 note that EIBi(t)Ia:' < 00 for all 1 ~ a' < a. As B(-) is symmetric note that it is a martingale. (3.4) of the preceding proposition implies 00;
EI~(O,O)(t)la' ~ C E [sup
O~r~t
IBi(r)l] a'
~ 6 EIBi(t)Ia:' <
00
by Doob's maximal inequality for any 1 < a' < a. The required conclusion now ~~.
0
Note: In the context of reflected processes, the reflection terms are usually specified only for z on the boundary. However, no matter how the reflection
ReBected Levy Process
124
field is extended to D or JRd, only the values on the boundary determine the process; Theorem 4.5 of [12] and its proof can be easily adapted to our situation. The next result concerns expected occupation time at the boundary.
Theorem 3.3. Assume (AJ) - (A3); let 1 < s
< 2. Thenfors 2': O,X
0:
E D,t
>
(3.5)
Proof: We consider only s=O. Note that aD = {x E JRd : Xi =0 for some i}. Let H = {x E JRd : min IXil :::; I}. Let
aD =
~
{
For 0 < E :::; 1 define CPt on JRd by CPt(z) = cp(zIE). Note that d d CPEl CPt,i, CPt,ij ECb(JR ) n Ll(JR ); also they are supported on EH ~ H. Clearly
(3.6) Next define gt on JRd by CXl
gt(x)
j -
=
E: CPt(x) j 0
IRd
By Theorem 2.6, !:1 0'./2 g €
=
(3.7)
p(O, x; t, z)dt dz.
t~
< E :::;
1. We now claim that
supEO'.lg€(X)I-----t 0 as
E
1 o.
(3.8)
x
Putting S = tlEO'. in (3.7) and as l
EO'.lg€(x)1
:::;
EO'.j jP(O,X;EO'.S,Z)dzds o IR d CXl
+EO'. j Icp€(z)1 j IR d
h (x; E)
p(O,X; EO'.S, z)ds dz
1
+ h(x; E).
As p(O,X;EO'.S,·) is a probability density suplh(x;E)1 :::; EO'. -----t O. As cP is intex
grable, by (2.11) sup Ih(x; E)I x
<
fa
J1
IR d
a
ds dz
1
C EO'.-d j cp( ~z)dz = C EO'. j cp(z)dz IR d
CEO'. -----t 0
IRd
Amites Dasgupta and S. Ramasubramanian
125
whence (3.8) follows. We next show that supEaIVg",(x)l-t 0 as E 1 o.
(3.9)
x
By Theorem 2.6, and putting s = t/Ea gives
J J
J (~) ~ J 00
p(O,x;Eas,z)ds dz
-
IRd
0
00
-
p(O,x;Eas,z)ds dz.
IRd
Since
:s: i :s:
supEaIVgE(x)1 x
because
()i
>
0
d, an argument similar to the derivation of
:s: C
Ea - l -t 0 as E 1 0
1; this proves (3.9).
Now applying Ito's formula to Eag",(Z(O,x) (.)), denoting Z(O,x) (.) by Z(.), y(O,x) (.) by Y (.) and taking expectations we get t
EkagE(Z(t)) - Eag",(X)]
=
E
°
J t
+E
J
(R(u, Y(u-), Z(u- ))EaVg",(Z(u)), dY(u)).
(3.10)
°
By (3.8) l.h.s. of (3.10) tends to zero as E -t o. As R is bounded, Proposition 3.2 and (3.9) imply that the last term in (3.10) goes to zero as E -t O. Finally, D as l
Remark 3.4. A function
D
Using Theorem 3.3 we now improve on it!
Theorem 3.5. Assume (AJ) - (A 3), 1 <
()i
P(Z(s,x) (t) E aD)
< 2. Thenfors;::: O,X =
O.
E D,t
>s
(3.11)
Reflected Levy Process
126
Proof: Let ((z)
(;J + ... + zl~)}, where K
= K exp {h(r)
>2
e exp { -1!r2} , Irl:S: 1 o , Irl ~ 1
= {
f > 0 define fE(Z) = h(((Z/f)), Z E JRd. Clearly fE E C~(JRd) and afE(Z)/aZi = 0 for any Z E aD, 1 :s: i :s: d. It is not difficult to see that
For
lim fE(Z) dO
= 1cw(z), Z E JRd
(3.12)
(for Z ~ aD note that Zi > c for all i for some c > 0; hence ((Z/f) > 1 for all small f). Next, an argument as in Lemma 2.1 gives for f > 0 (3.13) for suitable constants C 1, C2 . Now we claim that for
Z
E
D\aD,
t1 Ct / 2 fE(Z)
---+
0 as flO.
(3.14)
Indeed let Z rt aD; there exist ro > 0, c > 0 such that (Zi + ~i) > c, 1 :s: i :s: d for I~I < ro· Choose fa > 0 so that for all f < fa, (((Z+~)/f) > K exp{ -df2 /c 2} > 1 for I~I < ro· Therefore fE(Z +~) = 0 = fE(Z) for all I~I < ro, f < fa and hence
t1 Ct / 2fE(Z)
fE(Z+~)I~I~+Ctd~.
j
=
(3.15)
1~I>ro
Since 1~ll+Q l(ro,oo)(I~I) is integrable and Ad(aD) = 0, by (3.12), (3.15) now the claim (3.14) follows. To prove the theorem we consider only the case s = O. Denote by Z(·), Y(·). We want to prove that for xED, t > 0,
Z(O,x) (.), y(O,x) (-)
t
limEjt1 Ct / 2fE(Z(r))dr dO
=
(3.16)
O.
a
By Theorem 3.3 and (3.13) for each
f
> 0,
t
E j 1aD(Z(r))t1Ct/2 fE(Z(r))dr a For c> 0, put Dc prove that
= (2c,oo)d.
=
o.
(3.17)
In view of (3.17), to prove (3.16) it is enough to
t
l!reE j 1DcCZ(u))t1Ct/2 fE(Z(u))du = 0 a
(3.18)
Amites Dasgupta and S. Ramasubramanian
127
for any fixed c > O. If Z E Dc, I~I < c note that Zi + ~i > c, 1 ~ i ~ d. So one can choose EO > 0 such that fE(Z +~) = 0 for all I~I < c, Z E Dc, E < EO. Hence for any E < EO 11DJZ(u))~a /2 fE(Z(u))1 ~
J
1 1~Id+ad~ ~ C ac1a '
1~I>e
The required assertion (3.18) and hence (3.16) now follows by (3.14) and dominated convergence theorem. Now to prove (3.11) (with s = 0), first consider the case x tJ- an. Since afE(-)/aZi = 0 on an, and y(.) can increase only when Z(·) E an, by Ito's formula
J~a/2 t
E[fE(Z(t))]- fE(X)
=
E
fE(Z(r))dr.
a By (3.12), (3.16) letting
lOin the above we get (3.11).
E
Next let x E an; for c > 0 let TJ - TJ~x) = inf{r ~ 0 : Z(r) E Dc}. By strong Markov property and the preceding case
E[l[o,tj(TJ)laD(Z(t))]
= O.
Note that {TJ~x) ~ t} i 0 (modulo null set) as c 1 0; otherwise we will get a contradiction to Theorem 3.3. Letting c lOin the above we get the required conclusion. This completes the proof. 0
Note: It may be interesting to compare the proofs of Theorems 3.3, 3.5 with those of their analogues for reflected Brownian motion given in [6]. In the following \1 2p(r, y; t, z) = \12P(r,'; t, z), ~~/2p(r, y; t, z) = ~~/2p(r,.; t, z) denote respectively the operators \1, ~ a/2 applied as function of y-variables. Our main result is
Theorem 3.6. Assume (AJ) - (A3); let 1 < a < 2. For 0 ~ s < t < oo,x E fJ, zED define
p(s,x;t,z)
J t
+E
(R(u, Y(u-), Z(u- ))\1 2p(u, Z(u); t, z), dY(u)) (3.19)
s
where Y(-) = y(s,x)(-),Z(-) = Z(s,x)(.). ForO ~ s < t,x E fJ,z E an take pR(s, x; t, z) = O. Then (i) pR is continuous on {O ~ s < t < 00, x E fJ, ZED}, it is also differntiable in (t, z); (ii) for any Borel set A S;;; fJ, s < t, x E fJ P(Z(s,x) (t) E A)
=
J
pR(s, x; t, z)dz.
(3.20)
A
In case R is independent of y-variables, pR is the transition probability density function of the Markov process Z(·). 0
Reflected Levy Process
128
We need a lemma Lemma 3.7. Hypotheses and notation as in the Proposition 3.2.
If
(sn,x n ) - t (s,x) then for a.a. w, forT> s var (Y(Sn,Xn)(.,w) - y(s,x)(.,w); [s,T]) sup IZ(Sn,Xn)(t,w) - Z(s,x)(t,w)1
-t
0
-t
o.
s~t~T
Proof: Denote z(n)(.) = Z(Sn,X n )(.), y(n)(.) = Y(Sn,X n )(.), Z(.) = Z(s,x)(.), Y(·) = y(s,x)(-). We first consider the case Sn < s for all n. Clearly z(n)(t,w), yen) (t, w), t :2 s is the solution to the Skorokhod problem corresponding to z(n)(s,w) + B(·,w) - B(s,w). For any T > s note that var ([Be, w) - B(s, w)
=
Iz(n)(s,w) -
+ Zen) (s, w)]
- [B(·, w) - B(s, w)
+ xl; [s, T])
xl.
For any w such that B(·, w) is continuous at s we have Xn +B(s, w) - B(sn' w) x. Boundedness of Rand (3.4) imply
J
-t
S
R(u, y(n)(u-),z(n)(u-))dy(n)(u,w)
Thus Iz(n)(s,w) [11].
xl
-t
-t
0 as n
-t
00.
0, and hence the result follows by Proposition 3.9 of D
Next let Sn > S for all n. For any n,Z(t,w),Y(t,w),t:2 Sn is the solution to the Skorokhod problem corresponding to Z(sn,w) + B(·,w) - B(sn,w). Clearly var ([xn
+ B(·, w) -
B(sn, w)]- [Z(sn, w)
+ B(·, w) -
B(sn, w)]); [sn, T])
IZ(sn,w) - xnl· So by the arguments as in [11] var (y(n)(.,w) - Y(·,w); [sn' T]) sup
Iz(n)(t,w)-Z(t,w)1
< CIZ(sn'w) - xnl < CIZ(sn,w)-xnl.
sn~t~T
Note that for s ::;: t ::;: Sn we may take z(n)(t,w) = Xn , y(n)(t,w) = O. Clearly var (Y(·, w); [s, sn]), sup IX n - Z(t, w)l, IZ(sn, w) - xnl all tend to 0 as Sn - t S s~t~sn
by right continuity. The required conclusion is now immediate. Proof of Theorem 3.6: Since dY(s,x)(.) can charge only when Z(s,x)(.) E aD and d(z, aD) > 0 for z tI- aD, well definedness of (3.19) follows from (2.12) and Proposition 3.2. Assertion (i) now follows from properties of p (viz. (2.11), (2.12), Proposition 2.4), boundedness and continuity of R and Lemma 3.7. To prove assertion (ii), in view of Theorem 3.5, it is enough to establish (3.20) when A c D.
Amites Dasgupta and S. Ramasubramanian
129
Fix t > s; let E > o. Apply Ito's formula to p(r, Z(s,x)(r); t, z), s :::; r :::; (t - E) corresponding to the semimartingale Z(s,x)(.) and use Theorem 2.5 to get
= p(s, x; t, z)
p(t - E, Z(t - E); t, z)
t-E
+
J
(R(r, Y(r-), Z(r- ))\l2p(r, Z(r); t, z), dY(r))
s
+ a stochastic integral. Let any
(3.21)
J be a continuous function with compact support KeD. By (3.21) for E > 0
J J J E
J(z)p(t - E, Z(t - E); t, z)dz
D
=
J
J(z)p(s, x; t, z)dz
D
t-E
+E
J(z)
D
(R(r, Y(r-), Z(r- ))\l2p(r, Z(r); t, z), dY(r))dz
(3.22)
s
For any w, note that p(t - E, Z(t - E, w); t, z)dz =? 6Z(t-,w)(dz) as since P(Z(t) =I- Z(t-)) = 0 it now follows that lim[l.h.s. of (3.22)] = E[J(Z(s,x)(t))]. dO
E
1 o.
And (3.23)
As d(K, aD) > 0, by (2.12), Proposition 3.2 and boundedness of J(-), R(·) lim[r.h.s. of (3.22)] = dO
J
J(z)pR(s, x; t, z)dz.
(3.24)
E[J(Z(s,x)(t))]
(3.25)
D
Thus
J
J(z)pR(s, x; t, z)dz
=
D
for any continuous function
J with compact support in D.
Next for any open set FeD, let {In} be a sequence of continuous functions with compact support in D such that In rlF pointwise. Clearly lim E[Jn(Z(s,x)(t))] = E[I F(Z(s,x)(t))].
(3.26)
n--+CXl
Taking expectation in (3.21) and letting
E
lOwe get
pR(s, x; t, z) = lim E[P(t - E, Z(t - E); t, z)] 2:: O. dO
Therefore by monotone convergence theorem
nl~~
J
J
D
D
In(Z)pR(s, x; t, z)dz =
IF(Z)pR(s, x; t, z)dz.
(3.27)
Now (3.25), (3.26), (3.27) imply that (3.20) holds for any open FeD, and hence for any Borel set A cD.
Reflected Levy Process
130
Finally, the last assertion is immediate from (ii); this completes the proof.
0
We conclude with the following questions. 1. Can (x, z)
r--t
pR(s, x; t, z) given by (3.19) be extended continuously to
2. Is pR(s,x;t,z)
> 0 for s < t,x,z
D x D?
E D?
3. When is pR symmetric in x, z?
Acknowledgement: The authors thank B. Rajeev and S. Thangavelu for some useful discussions; and Siva Athreya for bringing [1] to their notice while the work was in progress. Amites Dasgupta Stat.-Math. Unit Indian Statistical Institute 203 B.T. Road Kolkata - 700 108
S. Ramasubramanian Stat.-Math. Unit Indian Statistical Institute 8th Mile Mysore Road Bangalore - 560 059
Bibliography [1]
R. Atar and A. Budhiraja: Stability properties of constrained jumpdiffusion processes. Preprint, 2001.
[2]
S. Balaji and S. Ramasubramanian : Asymptotics of reflecting diffusions in an orthant. Proc. Internat. Conf. Stochastic Processes, December'96, pp. 57-81. Cochin University of Science and Technology, Kochi, 1998.
[3]
I. Bardhan:
[4]
K. Bogdan: The boundary Harnack principle for the fractional Laplacian. Studia Math. 123 (1997) 43-80.
[5]
K. Bogdan and T. Byczkowski Potential theory for the a-stable Schrodinger operator on bounded Lipschitz domains. Studia Math. 133 (1999) 53-92.
[6]
J. M. Harrison and R. J. Williams: Brownian models of open queueing networks with homogeneous customer populations. Stochastics 22 (1987) 77-115.
[7]
N. Ikeda and S. Watanabe: Stochastic differential equations and diffusion processes. North-Holland, Amsterdam, 1981.
[8]
K. Ito : Lectures on Stochastic Processes. Tata Institute of Fundamental Research, Bombay, 1961.
Further applications of a general rate conservation law. Stochastic Process. Appl. 60 (1995) 113-130.
Amites Dasgupta and S. Ramasubramanian
[9]
131
O. Kella: Concavity and reflected Levy process. J. Appl. Probab. 29 (1992) 209-215.
[10] S. Ramasubramanian: Transition densities of reflecting diffusions. Sankhya Ser. A 58 (1996) 347-381.
[11] S. Ramasubramanian : A subsidy-surplus model and the Skorokhod problem in an orthant. Math. Oper. Res. 25 (2000) 509-538. [12] S. Ramasubramanian: Reflected backward stochastic differential equations in an orthant. Proc. Indian Acad. Sci. (Math. Sci') 112 (2002) 347-360. [13] G. Samorodnitsky and M. S. Taqqu: Stable non-gaussian random processes : stochastic models with infinite variance. Chapman and Hall, New York, 1994. [14] W. Whitt: An overview of Brownian and non-Brownian FCLT's for the single-server queue. Queueing Systems Theory Appl. 36 (2000) 39-70.
132
Reflected Levy Process
On Conditional Central Limit Theorems For Stationary Processes Manfred Denkerl Universitiit Gottingen and Mikhail Gordin V.A. Steklov Institute of Mathematics Abstract The central limit theorem for stationary processes arising from measure preserving dynamical systems has been reduced in [6] and [7] to the central limit theorem of martingale difference sequences. In the present note we discuss the same problem for conditional central limit theorems, in particular for Markov chains and immersed filtrations.
1
Introduction
Let ((khcl = ((~k' 'r/k)hE'l, be a two-component strictly stationary random process. Every measurable real-valued function f on the state space of the process defines another stationary sequence (f ((k) ) kE'Z. Various questions in stochastic control theory, modeling of random environment among many other applications lead to the study of conditional distributions of the sums l:~:~ f ((k) given 'r/O, ... ,'r/n-l. In particular, the asymptotic behaviour of these conditional distributions is of interest, including the case when the limit distribution is normal. We shall prove conditional central limit theorems in the slightly more abstract situation of measure preserving dynamical systems (X, F, P, T), where (X, F, P) is a probability space and T : X --t X is P-preserving. Let f be a measurable function and Ji be a sub-O"-algebra. f is said to satisfy the conditional central limit theorem with respect to Ji (CCLT(Ji)), if P a.s. the conditional distributions of n-l
1 ""' Vn L..;foT k ,
k=O
given Ji, converge weakly to a normal distribution with some non-random variance 0"2 2': o. This leads to the identification problem for L2(P)-subspaces consisting of functions satisfying a CCLT. Following [6], an elegant way to describe such subclasses IThis paper is partially supported by the DFG-RFBR grant 99-01-04027. The second named author was also supported by the RFBR grants 00-15-960l9 and 02-0l-00265.
133
134
On Conditional Central Limit Theorems For Stationary Processes
uses T-filtrations, i.e. increasing sequences of o--fields Fn = T- 1 Fn+l' n E Z. Here we need to consider a pair of T-filtrations (Fn)nEZ and (Qn)nEZ satisfying 9n C Fn for every n E Z. For example, in case of a strictly stationary random process (~khEZ as above the o--field Fn (or 9n) is generated by ((k)k~n (or (T/kh~n' respectively). First of all, the conditional distributions in CCLT(1i) are determined by
1i
=
V 9k V V Fk·
kEZ
k~O
Secondly, a general condition describing the class of functions f for which the CCLT(1i) holds is given by the coboundary equation f = h + g - goT with a (Fn)nEz-martingale difference sequence h 0 Tk (i.e. h is UT 1i-measurable and EH f := EUI1i) = 0). The coboundary equation is implicit ely also used in [10] and [9]. In [10], sufficient conditions for CCLT(1i) are obtained, when 1i is replaced by 1i = VkEZ 9k, and our Proposition 3.1 contains this result as a special case. This proposition also specializes in case of skew products T(x,y) = (T(x),Tx(Y)) as in [9], where 9n is a T-filtration, and where 1i is also replaced by 1i.
It is hardly possible to verify this coboundary condition using properties of the o--fields (Fn)nEZ and (Qn)nEZ without making assumptions about their interaction. It has been noticed in [5] that conditional independence plays a fundamental role when studying conditional measures and their properties in connection with thermodynamic formalism. This additional property of conditional independence has been called immersion in [1], and we shall adopt this terminology. It means that for every n E Z the o--fields Fn and 9n+l are conditionally independent given 9n. The property of immersion is an essential simplification, although it seems to be rather strong. However, it looks quite natural in several situations (see e.g. [5]), in particular, when both ((khEZ and (T/khEZ are Markovian. Indeed, if the sequence (T/k)kEZ models the time evolution of a random environment influencing the process (~k)kEZ' the condition just means that there is no interaction between the process (~k)kEZ and the environment (T/k)kEZ, The same picture arises when (~khEZ models the outcome of non-anticipating observations over the process (T/khEZ, mixed with noise. If the sequence ((khEZ is a Markov chain, there is a natural assumption in terms of transition probabilities to guarantee that the corresponding filtrations are immersed (see Section 4). The notion of immersed filtrations was first recognized as an important concept in connection with the classification problem of filtrations (see [1] and references therein). A closely related notion, regular factors, was introduced in [5]. The latter paper also contains some examples of regular factors originating in twodimensional complex dynamics. In more general situations (like in control theory) some form of the feed-back between the two processes may be present, and we cannot expect that the corresponding filtrations are immersed. In this case more general concepts and results (like Theorem 3.7 of the present paper) have to be developed. In particular, we study the CCLT-problem for functions of Markov chains. We
Manfred Denker and Mikhail Gordin
135
follow the ideas in [7] closely where a rather general and natural condition in terms of the transition operator was introduced for the CLT-problem. This condition means that the Poisson equation is solvable, and it avoids mixing assumptions and similar concepts (e.g. [9] contains results in this direction). There is a natural construction embedding the original Markov chain into another one, for which the Poisson equation has to be solved. We give some comments how this verification can be done, in particular, in the context of fibred dynamical systems [5]. However, we do not go into much of details. As a consequence we obtain the functional form of the CLT for fluctuations of a random sequence around the conditional mean. Finally, we consider the case of immersed Markov chains. This property together with a solution of the Poisson equations for the original and extended Markov chains establishes an analogous result for conditional mean values of the original sequence, in addition to the CLT for fluctuations. The present paper arose from an attempt to understand Bezhaeva's paper [2] from the viewpoint of martingales. Bezhaeva's article studies the same problem as in the present note in the special case of finite state Markov chains. We do not reproduce these results in detail and formulate the conclusions of our theorems in a way different from the viewpoint taken in [2]. However, we would like to sketch the differences in both approaches. There are two results on the CLT in [2]: Theorem 3 and Theorem 5 (the latter theorem seems to be the most important result of [2]). Our corresponding results are Theorem 3.7 and Theorem 4.4. Though, we do not verify here that the conditions of our Theorem 4.4 are satisfied for a class of Markov chains considered in [2] and arbitrary centered functions: this would be just a reproduction of a part of [2]. Its proof and the content of our Section 4 clearly show that even for finite state Markov chain we really deal with continuous state space when considering a conditional setup. In fact much more general chains than in Theorem 5 in [2] (for example, geometrically ergodic) can be considered on the basis of our Theorem 4.4. Our method of proving the CLT is quite different from that of [2] and, as was remarked above, is based on approximation by martingales. We assume in this paper that all probability spaces and (j-fields satisfy the requirements of Rokhlin's theory of Lebesgue spaces and measurable partitions. This does not imply any restriction to the joint distributions of random sequences we are considering; hence we may freely use conditional probability distributions given a (j-field. An alternative approach would be to reformulate the results avoiding conditional distributions. However, we do not think that the advantages given by such an approach justifies the complexity of such a description.
2
Immersed Filtrations
Throughout this paper, let (X, F, P) and T : X ---+ X be, respectively, a probability space and an automorphism of (X, F, P) (that is an invertible Ppreserving measurable transformation). An increasing sequence of (j-subfields
136
On Conditional Central Limit Theorems For Stationary Processes
(Fn) nEZ of F will be called a filtration and a T -filtration if, in addition, T- 1(Fn) = Fn+l for every n E Z. Any a-field E ~ F defines a natural T-filtration (En)nEZ = (T-nE)nEZ, whenever T-1E :2 E. A filtration (Qn)nEZ is said to be subordinated to a filtration (Fn)nEZ, if for every n E Z (2.1 ) and it is called immersed into the filtration (Fn)nEZ, if (Qn)nEZ is subordinated to (Fn)nEZ and for every n E Z the a-fields Fn and Qn+l are conditionally independent given Qn. We shall always assume that
F=
V Fn
(2.2)
nEZ
(V sES Es
denotes the smallest a-field containing all a-fields Es , s E S). Setting Q = VnEZ Qn it follows from the definition of a T-filtration that Q is completely invariant with respect to T (that is T-1(Q) = Q). Finally, define F- = nkEZ F k , and similarly Q- = nkEZ Qk· Throughout this paper (Qn)nEZ always denotes a T-filtration which is subordinated to the T-filtration (Fn)nEZ. We then set
'lin
=
Q V Fn·
The transformation T defines a unitary operator UT on L2 = L 2(X, F, P) by UT f = f 0 T, f E L 2 . Given a sub-a-field 'li c F, we denote its conditional expectation operator (on L 2 ) by EH and its conditional probability by P( ·I'li). Let II . 112 denote the L 2 -norm. As mentioned above, the notion of immersed filtrations arises naturally in the context of Gibbs measures in the thermodynamic formalism (see [5]) and of Markov chains (see e.g. [2]). In order to simplify our conditions in the CCLT for these applications we need the following lemma for immersed filtrations. Lemma 2.1. The T -filtration (QkhEZ is immersed into the T -filtration (Fk)kEZ, if for every n E Z (2.3)
or, equivalently,
(2.4) Conversely, if UhhEZ is immersed into (FkhEZ, then the following equalities hold for every n E Z and m ~ 1:
(2.5) Proof. We first show that (Qk)kEZ is immersed into (Fk)kEZ, if (2.3 ) holds. Let n E Z be fixed and let ~ and ry be bounded functions measurable with respect to Fn and Qn+l, respectively. It follows from (2.3 ) that EFnry = EYnry. Therefore we have
EYn EFn (~ry) = EYn (~EFnry) EYn (~EYnry) = EYn (OEYn (ry),
Manfred Denker and Mikhail Gordin
137
which implies the conditional independence of Fn and 9n+l given 9n. In a similar way (replacing Fn by 9n+l) one shows conditional independence assuming
(2). Conversely, we first show that conditional independence of Fn and 9n+l given 9n for some n E Z implies (2.3). Indeed, it suffices to verify (2.3 ) for all bounded Fn V 9n+l-measurable functions ofthe form ~TJ, where ~ and TJ are Fnand 9n+l-measurable, respectively. By conditional independence, for a 9n+lmeasurable, bounded function h,
whence EYn+l~ that
=
EYn~. Similarly one shows that gFnTJ
EFn (TJEYn+10 (EYn~)(EFnTJ)
= EYnTJ.
It follows
= EFn (TJEYn~) = (EYn~)(EYnTJ)
EYn (~TJ)· Since the equation (2.4 ) can be proved similarly, we obtain the equivalence of (2.3 ) and (2.4 ). Moreover, by induction one easily verifies (2.5 ). 0
3
A Conditional Central Limit Theorem
Let (Vk)k2:1 be a sequence of real-valued random variables. For every n E Z+ define a random function with values in the Skorokhod space D([O, 1]) ([3], [8]) in the standard way: it is piecewise constant, right continuous, equals in the interval [0, lin) and equals n- 1 / 2 Ll::;m::;[ntlVm for a point t E [lin, 1]. This random function will be denoted by Rn(Vl, ... , vn ) and has a distribution on D([O, 1]), denoted by Pn(Vl, ... , v n ). We write Wa for the Brownian motion on [0,1] with variance (J2 of w a (1) (we need not exclude (J2 = since Wo is the process which identically vanishes). The distribution of Wa in C([O, 1]) will be denoted by Wa.
°
°
Remark 3.1. In the sequel we deal with convergence in probability of a sequence of random probability distributions in D([O, 1]) to a non-random probability distribution. It is assumed here that the set of all probability distributions in D( [0, 1]) is endowed with the weak topology. It is well known that the piecewise constant random functions (in D([O, 1])) can be replaced by piecewise linear functions (in C([O, 1])) without changing the essence of the results formulated below.
3.1
A general CCLT
As mentioned in the introduction the conditional central limit theorems in [9] and [10] are proved using some martingale approximation. There are different
138
On Conditional Central Limit Theorems For Stationary Processes
versions of a martingale central limit theorem which may be used in the present context. They all are versions and extensions of Brown's martingale central limit theorem. It has been used in [10] directly, and is used in [9] and here in a modified form. We apply a corollary of Theorem 8.3.33 in [8] to obtain the following CLT for arrays of martingale difference sequences. Lemma 3.1. For n E Z+ let (nn, F n , (Fk,n) k?:.O, pn) be a probability space with filtration Fk,n C Fn (k ~ O), and let (vk,nh?:.l be a square integrable martingale difference sequence with respect to ((Fk,nk:::o, pn). If for every E > 0 and t ~ 0 we have (3.6) grk- 1 •n (v~,n1{lvk,nl>E}) -----+ 0 l:S;k:S;nt
L
and
(3.7) l:S;k:S;nt in probability as n
-----+
(Xl
then {Pn (Vl' ... , v n ) : n ~ I}, converges weakly to
Wa 2 • The following proposition is the key result in the martingale approximation method for the CCLT. Implicitly it also appears in [10], and its proof is analogous to that for the central limit theorem in [6] or [7]. Proposition 3.1. Let T be an ergodic automorphism and (Hn)nEZ be a Tfiltration. Assume that g, h E L2 and (3.8) If f is defined by f = h+ g - UTg,
(3.9)
then, with probability 1, the conditional distributions Pn(J, UT f, ... ,U:;,-l flHo) given Ho of the random functions Rn (J, UT f, ... ,U:;'-l 1) converge weakly to the (non-random) probability distribution W a , where (J = IIhl12 ~ O.
Remark 3.2. The equations in (3.8 ) say that the sequence (U¥h)nEz is a stationary martingale difference sequence with respect to the filtration (Hn)nEZ, Remark 3.3. The conclusion of Proposition 3.1 remains true if the (J-field Jio in the statement is changed to any coarser one. This follows easily from the definition of weak convergence and the non-randomness of the limit distribution. Proof of Proposition 3.1. By remark 3.2 the sequence of finite series Vk,n = n- l / 2U!;-l h, (1 ~ k ~ n), form a martingale difference sequence with respect to the filtrations (Hk)o:S;k:S;n' Assume first that (J > O. We show that the sequence {vk,nI1 ~ k ~ n, n E Z} with probability 1 satisfies the conditions 3.6 and 3.7 of Lemma 3.1 with respect to the conditional distribution given Ji o· Relative to this conditional distribution with probability 1 the sequence (U¥h)nEz is a (non-stationary) sequence of martingale differences with finite second moments. The ergodic theorem implies that with P-probability 1
Manfred Denker and Mikhail Gordin
139
as n - 00. It follows that with probability 1 the same relation holds almost surely with respect to the conditional probability given H a, establishing (3.7 ). We need to check (3.6). By the ergodic theorem again, for every E > 0 and A > 0 we have with P-probability 1 E ri k-l( Vk,n 2 1 {llIk,nl>E} ) lim sup n->oo l~k~nt
L
lim sup n- 1 Erik ((U~h)21{IU~hl>ml/2}) n->oo a~k~(n-l)t
L
< lim sup
n- 1
n->oo
L
Erik ((U~h)21{IU~hl>A})
a~k~(n-l)t
lim sup n- 1 Erik (U~h2U~1{lhl>A}) n->oo a~k~(n-l)t
L L
lim sup n- 1 U~(Erio (h 21{lhl>A})) n->oo a~k~(n-l)t EErio (h 21{lhl>A})
= E(h 21{lhl>A}),
and, choosing A large enough, the latter expression can be made arbitrarily small. Thus for every E > 0 with P-probability 1
L
Erik-l(v~,nl{llIk,nl>E}) - 0
l~k~nt
as n - 00. This implies that with probability 1 the same expression tends to zero with respect to the conditional probability given H a, proving (3.6 ). It follows from Lemma 3.1 that Pn(h, ... , U:;,-lhIH a ) converges weakly to Wa P-a.s. The same conclusion also holds if a = 0 (h = 0 in this case). Finally we need to show that the sequences (U¥h)nE7l, and (U¥!)nE7l, are stochastically equivalent. We have R n (n- 1 / 2!, ... , n- 1 / 2U:;'-1 J) - R n (n- 1/ 2h, ... , n- 1/ 2U:;'-lh) =
n - un-I)) 2 - U) R n (n -1/2(UTg - g ) ,n -1/2(UTg Tg,···, n -1/2(UTg T g . It is easy to see that the maximum (over the interval [0, 1]) of the modulus of the latter random function equals n- 1 / 2 maxl~k~n IU~g - gl and does not exceed n- 1 / 2 (lgl + maxl~k~n IU~gl). Since by the ergodic theorem n- 1 U¥g2 0, this expression tends to zero P-a.s. Thus we see that P-a.s. the distance in D([O, 1]) between Rn(h, ... , U:;,-lh) and Rn(J, ... , U:;,-l J) tends to zero as n - 00. This implies that, with probability 1, the conditional distributions Pn(J, ... ,U:;,-l J) IHa) in D([O, 1]) have the same weak limit as
D
140
3.2
On Conditional Central Limit Theorems For Stationary Processes
On Rubshtein's CCLT
Proposition 3.1 is in fact a general result which can be seen when compared to other theorems in the literature. We begin recalling Rubshtein's result in [10].
Theorem 3.4. Let (~n, TJn)nE'Z be an ergodic stationary process with ~ E L2 and E9~o = O. If
(3.10)
then, with probability 1, the conditional distributions P n (6, 6, ... , ~n I Q) of Rn (6,6, ... , ~n) converge weakly to the non-random probability W a , where
. -1 E (6 hm
n---->oo
n
+ ... + ~n )2
=
(J
2•
The proof of this result can be reduced to Proposition 3.1 observing that (3.10 ) implies a representation as in (3.9). The result in [9], Theorem 2.3 is of the same nature, but in the special situation of a skew product. Another special case of Proposition 3.1 is the following theorem, which is also a generalization of Theorem 2 in [6], when p = 2.
Theorem 3.5. Let T be an ergodic automorphism and (fin)nEZ be aT-filtration. If f E L2 is a real-valued function satisfying 00
2:(llf - Erik fl12
+ IIEri-k f112) < 00,
(3.11)
k=O
then Proposition 3.1 applies to f. In particular, there exists (J ~ 0 such that with probability 1 the conditional distributions Pn(f, UT f, ... ,U:;'-l fl fio) converge weakly to the probability distribution Wa.
Proof. The following explicit formula defines a function g which permits a representation as in (3.9 ), where we set h = f - goT + g: 00
00
k=l
k=O
(here and below the series are L 2 -norm convergent due to the assumption
Manfred Denker and Mikhail Gordin
141
(3.11 )). It follows that h
f - UTg+ g
I: Ui + (f -
f -
k
1
EHk 1)
+L
k=l
U;+l(EH_k 1)
k=O
k=l
k=O
k=O
k=l
k=l
k=O
k=l
I:
k=l
U!;(E H-k+l
-
EH-k)f
kEZ n
k=-n
This representation clearly shows that h satisfies (3.8 ) and the theorem follows 0 from Proposition 3.1.
3.3
The CCLT for subordinated filtrations
Let (Qn)nEZ and (Fn)nEZ be two subordinated T-filtrations as explained in section 2 on filtrations. We shall use Proposition 3.1 to obtain sufficient conditions that the CCLT holds together with the CLT for the conditional mean. We begin with the following reformulation of Proposition 3.1.
Proposition 3.2. Let T be an ergodic automorphism, (Qn)nEZ and (Fn)nEZ be a pair ofT-filtrations such that (Qn)nEZ is subordinated to (Fn)nE'Z,' For f E L2 define = EY f and = f Assume that and admit representations
J
1
J.
J
1
(3.12) and
f = h+ g - UTg, where
then
g, g E L 2 ,
(3.13)
On Conditional Central Limit Theorems For Stationary Processes
142
(1,
1, ...
1)
i) the distributions Pn UT ,U:;'-l of the random functions Rn(1, UT ,U:;'-l 1) converge weakly to the probability distribution Wa,
1, ...
where
a = IIhl12 ~ o.
ii) with probability 1, the conditional distributions Pn(i, UT i, . .. ,U:;.-l ilHo) given Ho of the random functions Rn(i, UT i, . .. ,U:;.-l 1) converge weakly to the (non-random) probability distribution Wo:, where (j = IIhl12 ~ o. Remark 3.6. The same proof as for Proposition 3.2 shows that the joint distribution of the partial sums of (1, converge to aGaussian law with covariance matrix (O"ij), where O"I = Ilhll§, O"~ = Ilhll§ and 0"1,2 = 0"2,1 = Jhhd/L. One easily deduces from this that also f is asymptotically normal with variance Ilh + hll§.
i)
Proof. The assertion ii) is a direct consequence of Proposition 3.1. The assertion i) also follows from the Proposition 3.1 (applied to the filtration (Qn)nEZ) and Remark 3.3. 0 Corollary 3.1. Under the assumptions of Proposition 3.2, with probability 1, the conditional distributions p;:(i, UT i, ... ,U:;.-l ill, UT u:;.-11) converge weakly to Wo:, where (j = IIhl12 ~ o.
1, ... ,
Proof. This follows from Remark 3.3, because the functions are Q-measurable and Q ~ Ho.
1, UT 1, ... , u:;.-11 0
Theorem 3.7. LetT be an ergodic automorphism, and let (Qn)nEZ and (Fn)nEZ be a pair of T -filtrations such that (Qn)nEZ is subordinated to (Fn)nEZ. Let f E L2 be a real-valued function satisfying (Xl
L
Ilf - EFk fl12 < 00,
(3.14)
k=O (Xl
LilEY f - E'H-k fl12 < 00
(3.15)
k=O (Xl
LilEY f - EYk fl12 < 00
(3.16)
k=O
and (Xl
L IIEY-k fl12 < 00. k=O
Setting ~
~
then f and f admit, respectively, the representations
and
(3.17)
Manfred Denker and Mikhail Gordin
143
where
Moreover,
i) the distributions Pn ([, UT [, ... ,U:;'-l [) of the random functions Rn( 1, UT [, ... ,U:;,-l [) converge weakly to the probability distribution W&, where (j = IIhl12 ~ O. ii) with probability 1, the conditional distributions Pn (1, UT 1, ... ,U:;,-l 1lfio) given fio of the random functions Rn (1, UT ,u:;,-11) converge weakly to the (non-random) probability distribution Wo=, where (j = IIhl12 ~ O.
1, ...
Remark 3.8. (1) Instead of (3.17 ) it is sometimes more convenient to verify the stronger condition CXJ
I: IIEF-k fl12 <
00.
k=O
(2) If then the class of functions satisfying the assumptions of Theorem 3.7 is dense in the subspace of the functions f E L2 satisfying EQ- f = O. A sufficient condition for this can be found in subsection 4.4. Proof of Theorem 3. '1. We apply Theorem 3.5 twice. Let us show first that f and (fin)nEZ satisfy the assumptions of Theorem 3.5. We have by (3.14 ) and (3.15 ) CXJ
CXJ
k=O
k=O
I: 111- EHk 1112
CXJ
k=O CXJ
<
I: Ilf - EFk fl12 <
00
k=O
and CXJ
I: IIEH-k 1112
CXJ
k=O CXJ
I: IIEQ f - EH-k fl12 <
00.
k=O
By (3.16 ) and (3.17 ) we can also apply Theorem 3.5 to [ and (Qn)nEZ (instead of f and (fin)nEZ), since CXJ
CXJ
CXJ
k=O
k=O
k=O
I: IIEQ-k [112 = I: IIEQ-k EQ fl12 = I: IIEQ-k fl12 <
00
144
and
On Conditional Central Limit Theorems For Stationary Processes
00
L
00
111 -
L
Egk 1112 =
k=O
IIEg f -
Egk fl12
<
00.
k=O
o Corollary 3.2. Let T be an ergodic automorphism, (Yn)nEZ and (:F'n)nEZ be a pair of T -filtrations such that (Yn)nEZ is immersed into (:F'n)nEZ, Assume that
(3.18) and that f E L2 is a real-valued function satisfying 00
L
Ilf -
EFk fl12
<
(3.19)
00,
k=O 00
L
IIEg f -
EH-k fl12
<
00
k=O
and
00
L
IIEg-k fl12 < 00.
k=O
Set
1=
Eg f
and
1=
f - Eg f,
0en (3;,.14 )-(3.17) of Theorem 3.7 are satisfied and its conclusion applies to f and f· Moreover, the class of functions satisfying the assumptions (3.14 )(3.17 ) is dense in {f E L2 : Eg- f = a}. Remark 3.9. In many applications we have :F'_ a-subfield. This obviously implies (3.18 ).
= N where N is the trivial
Proof of Corollary 3.2. We only need to verify (3.16). This can be deduced from (2.5 ) in the statement of Lemma 2.1 as follows:
IIEg f -
Egk fl12
IIEg f - Eg(Eh f)lb IIEgU - Eh f)112 < Ilf - EFk fib
and (3.16 ) follows from (3.19 ). By (3.18 ) Remark 3.8 (2) applies and the set of functions satisfying (3.14 )-(3.17 ) is dense in {f E L2 : Eg- f = a}. 0
4 4.1
Markov chains A general result
Let (Xn)nEZ be a stationary Markov chain with state space (Sx, Ax) (where Sx is a non-empty set and Ax a a-field in Sx), transition probability Qx : Sx x
Manfred Denker and Mikhail Gordin
145
Ax -----.. [0,1] and stationary probability measure /-Lx on Ax. We assume that the random sequence (Xk)kE71. is defined on some fixed sample space (X, F, P) where the probability measure P is the distribution of the Markov chain with initial distribution /-Lx, the stationary distribution. Then every Xk maps (X, F, P) onto (Sx, Ax, /-Lx) in a measurable and measure preserving way. For every n E Z denote by Kn the cr-field in X generated by Xn and by fin the cr-field generated by {Xk : k ::::: n}, i.e. fin = Vk
Proposition 4.1. Let (Xn )nE71. be an ergodic stationary Markov chain with stationary probability measure /-Lx and transition operator Qx. If F E L 2(/-Lx) has the representation (4.20) for some G E £2(/-Lx), then, with probability 1, the conditional distributions Pn (F 0 xo, F 0 Xl, ... , F 0 xn-llfio) given fio of the random functions Rn (F 0 xo, F 0 Xl, ... , F 0 Xn-l) converge weakly to the (non-random) probability distribution W u , where cr 2 = IIGII§ - IIQxGII§ 2:: o. Proof. We apply Proposition 3.1 to F has now the form
Fo Xo
0
Xo. Indeed, the representation (3.9 )
(G 0 Xl - (QxG) 0 xo) - Go Xl + G0 H + G 0 Xo - UT (G 0 xo),
where H = Go Xl - (QxG) sufficient to notice that
0
Xo
Xo satisfies (3.8). To complete the proof it is
IIHII~ EplG 0 xl1 2 - 2Ep(((G 0 xd . (QxG)
IIGII~
4.2
-
IIQxGII~·
0
xo))
+ Epl(QxG) 0
xOl2 0
Markov chains fibred over invertible transformations
We keep the notation as in the previous subsection. In addition, let (S-rr, A-rr) be a measurable space and 'ljJ : Sx -----.. S-rr a measurable map. 'ljJ defines a stationary sequence tr n = 'ljJ 0 Xn (n E Z) with one-dimensional marginal /-L-rr = /-Lx 0 'ljJ-I, the image of /-Lx under 'ljJ. We assume that there exists an invertible measurable transformation V of S-rr onto itself such that (4.21)
146
On Conditional Central Limit Theorems For Stationary Processes
Since (Xn)nEZ is a stationary sequence with one-dimensional distribution /-ix, it follows from (4.21 ) that V preserves /-i'Tr' Next, consider the following identity for the transition operator Qx, for all bounded, Ax-measurable functions F on Sx and all bounded, An-measurable functions G on S'Tr :
(Qx((G
0
1j;)F))(·) = G(V(1j;(·)))(QxF)(-)'
(4.22)
If Sx,z = 1j;-l(Z) denotes the fibre over Z E S'Tr) then property (4.22 ) means that the transition probability for an initial point x E Sx is concentrated on the fibre Sx,V('I/J(x))' In this case the transition operator Qx is fibred over the transformation V, and (S'Tr) A'Tr' /-L'Tr) and V are called the base probability space and the base transformation, respectively.
Fix some xESx- We are interested in the distribution of (xn)n;::::O conditioned by the constraints Xo = x, 1j; (xn) = vn (1j; (x) ) n E Z. In order to describe this behaviour let C be a a-field generated by some fixed random variables 'Trl. The following observation follows from Propsition 4.1 by passing from the a-field Ji o to the coarser a-field C.
Proposition 4.2. Let Qx be an ergodic transition probability with stationary probability measure /-LX) and assume that Qx is fibred over a transformation V with base probability space (S'Tr' A'Tr' /-i'Tr)'
If FE L 2 (/-ix) has a representation (4.20 )
F= G-QxG for some G E L2 (/-L x), then, with probability 1, the conditional distributions Pn(F 0 xo,F 0 x1, ... ,F 0 xn-1IC) of the random functions Rn(F 0 xo,F 0 Xl, ... , F 0 Xn-1) converge weakly to the (non-random) probability distribution W a , where a 2 = IIGII~ - IIQxGII~ ~ o. The same conclusion holds for
F=F
- EA~F, where A~
= 1j;-l(An).
Proof. First note that the first claim follows from Proposition 4.1. assumptions we have the identity
By the
~
which implies that both functions F and F defined by
F = EA~ F , F = F also satisfy (4.20 ), because EA~(G - Qx G) ~
=
-
F
EA~G - QxEA~G.
(4.23)
o
~
Remark 4.1. Only F defines a stationary process fox n (n ~ 0) with a possibly non-generate CLT, while F has the form F = Go V - G, hence is a coboundary and defines a stationary process with a degenerate limit in the CLT. For a function F with decomposition (4.23 ), we can always assume that the function G in a representation (4.20 ) has a decomposition of the form (4.23 ) as well, I.e.
Manfred Denker and Mikhail Gordin
147
and Under this condition (4.20 ) admits at most one solution. Functions satisfying (4.20 ) form a dense subset in LdjJx). This follows from the fact that their orthogonal complement in L 2 (jJx) is the space of Q:-invariant functions, whence are constant by ergodicity of Qx' They are also dense in the subspace of functions F, satisfying (4.24)
Remark 4.2. There are different strategies to obtain (4.20 ) for a given function. If Qx is a normal operator (in the sense that it commutes with its conjugate), very precise conditions for (4.20 ) to hold can be given in terms of the spectral decomposition of F relative to Qx ([4]). For a function F a solution G to the equation (4.20 ) can be written down as a formal power series: 00
G= LQ~F. n=O
(4.25)
In some cases this series converges with respect to an appropriate norm.
Remark 4.3. Fibration over the base space is of particular interest for fibred dynamical systems (see [5]). The fibres are given by Sx,z = 'l/J-l(z) and the measure jJx has a disintegration into probability measures jJx,z which are supported on the fibres Sx,z' Under (4.22 ) fibrewise transition probabilities are defined by Q~,n) : Sx,z x A(vn(z)) -+ [0,1], Q~,n)(x,A) = Q~(x,A),
x E Sx,z, A E A(vn(z)),
where z E S1f and A(z) is the restriction of the a-field Ax to the fibre Sx,z' The family (Q~,n»)zES",n::::O is measurable in z and satisfies the cocycle identity in n, l.e.
1
Q~k(z),l) (u, A)Q~,k) (x, du) = Q~,k+l)(x, A),
(4.26)
S",,vk(z)
for z E S1f' X E Sx,z, A E A(Vk+l(z)), k, I 2': O. The transition probability Q~,n) transports the conditional measure jJ(x,z) on the fibre Sx,z to the conditional measure jJ(x,vn(z» on Sx,vn(z)' The condition (4.24 ) means that F has vanishing integrals with respect to each fibre probability measure jJx,z, thus defining the family of function spaces on fibres Sx,z given by functions of vanishing integral with respect to jJ(x,z)' The family Q~,n) also defines a family of operators between these function spaces with the cocycle property (4.26 ) (the operator Q~,n) maps functions on the fibre Sx,Vn(z) to those on Sx,z). They also preserve integrals with respect to the conditional measures, in particular, the set of function with integral 0 is invariant with respect to these operators. Various conditions are known in the literature ensuring that this family of operators, restricted to spaces of functions with vanishing integrals over all fibres, are contractions with respect to an appropriate norm (provided
148
On Conditional Central Limit Theorems For Stationary Processes
n is sufficiently large). For example, in the case of immersed finite state Markov chains considered in Theorem 5 of [2] (we shall treat the immersed case in the next section avoiding such considerations) there are only finitely many types of finite fibres with finitely many types of transition probabilities between them. Under some additional assumptions the contraction property is ensured in the uniform norm. Alternatively, assuming that Sx is a metric space, we can use Holder norms to achieve the contraction property. This technique is often used in connection with thermodynamic formalism and its relativized version (see [5] and references therein). The transfer operator considered there is a generalization of the transition operator, because it does not need to preserve the space of constant functions; however, it is a specialization at the same time, because the "reversed process" is deterministic. Notice, that there is no need to apply the Hilbert projective norm technique because we assume the existence of a stationary probability measure (though this technique is very helpful in proving the existence of these measures).
4.3
Reduction of conditional Markov chains to chains with deterministic base
In this section we sketch the application of subsection 4.2 to the general problem mentioned in the introduction. Recall that we are interested in the asymptotic distribution of L~:~ !((k) given 'TJo, ... , 'TJn-l, where (k = (~k' 'TJk) is a two component strictly stationary homogeneous Markov chain. Let ((khEZ be a stationary homogeneous Markov chain. Its state space is denoted by (S(, Ad (where S( is a set and A( is a O"-field in Sd, its transition probability by Q( : S( x A( -----+ [0,1] and its stationary probability measure by /1< on A(, i.e. E(F((n+l)l(k, k ~ n) = (Q(F)((n). We assume that the random sequence ((k)kEZ is defined on some fixed probability space (X,F, P) where the probability measure P is derived from the stationary distribution f-t( (as in subsection 4.2). Then every (k maps (X, F, P) onto (5(, A(, f-td in a measurable and measure preserving way. For every n E Z denote by An the O"-field in X generated by (n and by Fn the O"-field generated by {(k : k ~ n}, i.e. Fn = Vk
... ) -: (... ,~o, ZI, Z2"")' The~et Sx consists of those ~airs (~, z)
S; ~ whlch satlsfy cp(xo) E
Srr WIth x - ( ... , X-I, xo), Z - ( ... , Z-I, ZO, ZI,"') Then we set 'ljJ((x, z)) = z. The random sequence (Xn)nEZ can be de-
Zoo
Manfred Denker and Mikhail Gordin
149
fined on the same probability space (X, F, P) as ((n)nEZ by setting Un = (( ... , (n-1, (n), ( ... , 'T/n-1, 'T/n, 'T/n+1,··· )), n E IZ, where 'T/n in the second coordinate marks the position 0 in the infinite string. It is obvious that Un oT = Un+1 and that (Un)nEZ generates F. Therefore (X, F, P) can be also considered as the path space of (Un)nEZ. Note that (Un)nEZ is a Markov chain, because (Un)nEZ is a random sequence for which the past can be reconstructed from the present. Now we see that we are essentially in the situation of subsection 4.2. The operator Qx can be defined correctly at least as an operator on £2(Sx, Ax, /1x), and Proposition 4.2 applies. Given a function F on S" the problem remains to check (4.20 ) for the function F' defined on S x by
F'(( ... , X-I, xo), (... , Z-l, ZO, Zl, ... )) = F(xo). First we need to subtract from F' the function Z f--+ J F'(u)/1x,z(du), the conditional expectation with respect to the base. Then we may prove, for example, convergence of the series (4.25 ) for the function F' - J F'(u)/1x,z(du). As to the behavior of the random sequence (E(F 0 ((n)I{'T/dkEZ)nEZ related to the function Z f--+ J F'(u)/1x,z(du), it requires some estimates showing that /1x,z is mainly determined by the finite part of the sequence z. In Bezhaeva [2] this is assured by condition (A).
4.4
Immersed Markov chains
We keep the notation of the previous subsection. Let Q be a transition probability on S x A and A' be a a-subfield of A. Then Q is said to be A'-compatible if the transition probability Q(., A) is a A'-measurable function for every A E A'. Let ((n)nEZ be a stationary Markov chain and ('T/n)nEZ be a random sequence defined by 'T/n = 'P((n), n E IZ. We say that ('T/n)nEZ is immersed into ((n)nEZ, if Q, is 'P- 1 (A'T/ )-compatible. Under this condition a straight forward calculation shows that the sequence ('T/n)nEZ is a Markov chain, and that the filtration 9n = Vk
Zo,
Q(z,l)
depends on Zo only where z =
Zl' ... );
ii) the conditional measure /1z is a function of Zo, Z-l, ... (here again z
=
( ... ,Z-l,ZO,Zl, ... )).
Recall that 9 is the a-field generated by ('T/n)nEZ and A~ = 'I/'-l(A1l") is the a-field on the state space of the Markov chain (Un)nEZ generated by the map '1/'. In other words it is generated by the map (x, z) f--+ z, where x = ( ... , X -1, xo) and z = ( ... , Z-l, Zo, Zl' ... ). Let A be the map sending (x, z) to Xo. In the following theorem we use the notations introduced above.
150
On Conditional Central Limit Theorems For Stationary Processes
Theorem 4.4. Let ((n)nEZ be a Markov chain and 7]n = tp((n) , n E Z. Assume, that (7]n)nEZ is immersed into ((n)nEZ. Let (Xn)nEZ denote the Markov chain associated to ((n)nEZ as in subsection 4.3. For a function F' = F 0 A on Sx, define p' = EA~ F', and p' = F' - P'.
If the functions F and
P'
admit representations
and
P' = G' -
QxG',
where G E L 2 (/-Lc,) and G' E L 2 (/-Lx), then [ the assumptions of Proposition 3.2. Thus,
= P' 0 Xo and 1 = P' 0 Xo satisfy
1, ...
i) the distributions Pn(1, UT ,u:;.-11) of the random functions Rn(1, UT ,u:;.-11) converge weakly to the probability distribution Wo:, where &2 = IIGII~ -IIQd~ ~ 0;
1,· ..
ii) with probability 1, the conditional distributions Pn ([, UT [, ... ,U:;'-I lIHo) given 110 of the random functions Rn(J, UT [ , ... ,U:;'-I[) converge weakly
to the (non-random) probability distribution W 0:, where ;:;2 =
II G' I ~ -
IIQxG'II~ ~ o. Proof. We apply Proposition 3.2 to the functions f and f· It is clear from the proof of Proposition 4.1 that [ satisfies the condition (3.13 ) of Proposition 3.2 with ?i = G' 0 Xo.
Setting pn = (... , 7]n-l, 7]n) we introduce a stationary Markov chain (Pn)nEZ with state space Sp, transition operator Qp and stationary measure /-Lp. Let X : Sx ----t Sp be the map sending (( ... , X-I, Xo), ( ... , Z-I, ZO, ZI' ... )) to ( ... , Z-I, zo) and by A" the a-field in Sx generated by x. Then by immersion it follows that EA~ (G 0 A) is A"-measurable. Therefore, it can be written in the form GoA with an appropriate function G on Sp, and, applying the immersion property again, we obtain This implies the representation
P'(xn) whence (3.12 ) holds for
=
G(Pn) - (QpG)(Pn),
1 with 9 =
G(Pn).
It follows that Proposition 3.2 applies to the function f
= f + f·
o
Bibliography [1] Beghdadi-Sakrani M., Emery M.: On certain probabilities equivalent to coin-tossing, d'apres Schachermayer. Sem. de Probab. XXXIII, Lect. Notes in Math. 1709 (1999), 240-256.
Manfred Denker and Mikhail Gordin
151
[2] Bezhaeva Z.l.: Limit theorems for conditional Markov chains. Theory of Probability and its Applications (translated from the russian original) 163 (1971), 428-437. [3] Billingsley P.: Convergence of Probability Measures. John Wiley & Sons, Inc. New York-London-Sydney-Toronto, 1968. [4] Borodin A.N., Ibragimov LA.: Limit Theorems for Functionals of Random Walks. Proc. of Steklov Inst. of Math. 195, Providence RI, 1995. [5] Denker M., Gordin M., Heinemann S.-M.: On the relative variational principle for fibre expanding maps. Ergodic Theory and Dynamical Systems 22 (2002), 757-782. [6] Gordin M.l.: The central limit theorem for stationary processes. Russian. Dokl. Akad. Nauk SSSR 1889 (1969), 739-741. [7] Gordin M. 1., LifSic B. A.: Central limit theorem for stationary Markov processes. Russian. Dokl. Akad. Nauk SSSR 2394 (1978), 766-767. [8] Jacod J., Shiryaev A. N.: Limit Theorems for Stochastic Processes. Grundlehren der math. Wiss. 288 Springer Verlag Berlin etc. 1987. [9] Kifer, Y.: Limit theorems for random transformations and processes in random environments. Trans. Amer. Math. Soc. 350 (1998), 1481-1518.
[10] Rubshtein, B.-Z.: A central limit theorem for conditional distributions. Convergence in Ergodic Theory and Probability (Bergelson, March, Rosenblatt eds.), Walter de Gruyter, Berlin 1996, 373-380.
152
On Conditional Central Limit Theorems For Stationary Processes
Polynomially Harmonizable Processes and Finitely Polynomially Determined Levy Processes A. Goswami Indiana University
and A. Sengupta Indian Statistical Institute
Abstract The sequence {Pk (t, x)} of two-variable Hermite polynomials are known to have the property that, if {Mt, t 2': O} denotes the standard Brownian motion, then Pk(t, M t ) is a martingale for each k 2': 1. This property of standard Brownian motion vis-a-vis Hermite polynomials motivated the general notion of "polynomially harmonizable processes". These are processes that admit sequences of time-space harmonic polynomials, that is, two-variable polynomials which become martingales when evaluated along the trajectory of the process. For Levy processes, this property is connected to certain properties of the associated Levy /Kolmogorov measures. Moreover, stochastic properties of the under lying processes (like independence, stationarity of increments) turn out to be equivalent to certain algebraic/analytic properties of the corresponding sequence of polynomials. We first present a brief survey of these recently obtained general results and then describe necessary and sufficient conditions for certain classes of Levy processes to be uniquely determined by a finite number of time-space harmonic polynomials.
AMS (1980) Subject Classification: Primary 60F05, Secondary 60J05 Keywords: Time-Space Harmonic Polynomials, p-Harmonizability, Levy Processes, Hermite Polynomials, Charlier Polynomials, Finitely Polynomially Determined Processes, Semi-Stable Markov Processes, Intertwinning Semigroups
1
Introduction: General Definitions
The sequence of two-variable Hermite polynomials {Pk , k ~ I} on [0, (0) x lR are defined via the classical one-variable Hermite polynomials {Pk, k ~ I} as follows:
where
Pk(X) = (_1)ke X2 /2 ::k (e- x2 / 2 ). Some of the well-known properties of the sequence {Pd are:
•
Pk(t, x) is a polynomial in the two variables t and x, for each k. 153
Polynomially Harmonizable Processes
154
•
Pk(t, .) has degree k in x, with the leading term having coefficient 1.
• •
& -Pk(t,x) =&t
(k)2 Pk- 2(t,X) = --12 &22Pk(t,x), for each ~
uX
k
> 2. -
For the last two properties, we take Po(t, x) _ 1. The first two properties simply tell us that we can write k
Pk(t,x)
=
LP;k)(t)x j , j=O
where the p;k) (t) are polynomials in t and pik)(t)
==
1.
The sequence {Pk } of Hermite polynomials as defined above is known to have some deep connections with the standard Brownian motion. One of these is the well-known fact that if {Mt, t ~ O} denotes the standard Brownian motion, then for each k, {Pk(t, M t ), t ~ O} is a martingale (for the natural filtration of {Md) and standard Brownian motion is the only process with this property. Moreover, if P(t, x) is any two-variable polynomial such that {P(t, M t )} IS a martingale, then P belongs to the linear span of the sequence {Pk }. A natural question that arises is: which stochastic processes admit such a sequence of 2-variable polynomials which when evaluated along the trajectory of the process are martingales and, if so, to what extent do these polynomials determine the process? Also, is it possible to get the sequences of polynomials so as to satisfy properties similar to those of the Hermite polynomials mentioned above? These questions were investigated in detail in Goswami and Sengupta [2] and Sengupta [6]. Following are some notations and definitions that were introduced in these works. Here we restrict ourselves only to continuous-time processes. Let M = {Mt, t ~ O} be a stochastic process on some probability space. The time-space harmonic polynomials for the process M are defined to be all those two-variable polynomials P(·,·) such that {P(t, M t )} is a martingale (always for the natural filtration of M). The two variables will be referred to as repectively the 'time' and the 'space' variables. The collection of all time-space harmonic polynomials for a process M will be denoted P(M). In other words,
P(M):= {P : P is a 2-variable polynomial and {P(t,Mt )} is a martingale} k
Any two-variable polynomial P can be written as P(t,x)
= L Pj(t)x j , for j=O
some k, where each Pj(t) is a polynomial in t. If in the above representation, Pk (t) =j:. 0, we say that P is of degree k in the 'space' variable x. For a stochastic process M = {Mt }, we define Pk (M) to be the collection of those time-space harmonic polynomials which are of degree k in the space variable, that is,
Pk(M)
:=
{P E P(M) : P is of degree k in the space variable x}.
A. Goswami and A. Sengupta
155
Clearly,
P(M)
=
UPk(M). k
Definition: A stochastic process M is said to be polynomially harmonizable (p-harmonizable, in short) if Pk (M) =1= 0, for all k :2: 1. In this terminology, standard Brownian motion is a p-harmonizable process. Indeed, Brownian motion is p-harmonizable in a somewhat stricter sense, to be understood below. For a process M, let us denote P k (M) to be the set of those time-space harmonic polynomials of degree k in x, for which the leading term in x is 'free' of t, that is, the coefficient of xk is a non-zero constant. In other words, k
PdM)
:=
{P E Pk(M) : P(t,x)
LPj(t)x j with Pk(-) a non-zero constant},
=
j=O
and we let,
P(M)
:=
UPk(M). k
Clearly, Pk(M)
c Pk(M)
'II k and so, P(M)
c P(M). Also, if Pk(M)
=1=
0,
k
then there is P(t, x) =
2:= Pj(t)x j
E Pk(M) with Pk(')
== 1.
j=O
Definition: A stochastic process M is said to be p-harmonizable in the strict sense if Pk(M) =1= 0, for all k :2: 1. The second property ofthe two-variable Hermite polynomials listed earlier shows that standard Brownian motion is actually p-harmonizable in the strict sense. The other classical example of a strict sense p-harmonizable process is the Poisson process. For a Poisson process, with intensity 1 for example, a sequence of time-space harmonic polynomials is given by the so-called two-variable Charlier polynomials
where {
~
} denote the Stirling numbers of the second kind. The Gamma
process is another example of a strict sense p-harmonizable process. In keeping with the special properties of the sequence of Hermite polynomials mentioned earlier, we introduce here a list of properties for a sequence of twovariable polynomials. Let {Pk, k :2: I} be a sequence oftwo-variable polynomials with Pk being of degree k in x. We define Po == 1. Let us write Pdt,x) = k
2:= p(k)(t)x j , where . 0 J
the pY)(t) are polynomials in t. We are going to refer to
J=
the following properties in the sequel.
(i) Strict sense property: For each k :2: 1, p~k)(.) == 1.
Polynomially Harmonizable Processes
156
aPk
(ii) The Appell property: For each k ?: 1, ax
=
kPk- 1 , that
.
. (k)
IS,
JPj (t) =
kPJ-l (k-l)(t) , 1< ·
(iii) The pseudo-type-zero property: There exists a real sequence {hk} such that aPk for each k ?: 1, - a t
=
z=k (k) hiPki
i=l
i,
that
.
IS,
d (k) -d Pj (t)
t
=
k-j (k)
z=
i
(k-i) hiPj (t),
i=l
1 :::; j :::; k.
(iv) Uniqueness property: For each k > 1, PdO,x) 0, 0:::; j :::; k - 1.
=
xk, that is, pr)(O)
The sequence of Hermite polynomials satisfies all the properties (i) - (iv); property (iii) holds here with h2 = -1 and hk = 0 for k =J- 2. It is easy to verify that the two variable Charlier polynomials satisfy these properties as well. Theorems 2.3 and 2.4 in the next section will establish that these are reflections of the fact that both Brownian motion and Poisson process are homogeneous Levy processes. Let us make some basic observations about the properties listed above. First of all, with the convention that Po - 1, property (i) will always imply property (ii). Secondly, in our applications, the sequence {Pk } will be arising as time-space harmonic polynomials of a process M. Now if, the process itself happens to be a martingale, we can always take PI = x, in which case property (ii) will actually imply a slightly stronger property than (i), namely, (i/) for each k ?: 1, Pdt, x) - xk has degree at most k - 2 in x, that is, p~k) == 1 (k) and Pk-I = O. Properties (ii) and (iii) for a sequence of polynomials were studied analytically in an entirely different context in Sheffer [7], which is the source of our terminolgy for these properties in this context. It turns out that for a stochastic process M, the properties (ii), (iii) and some other algebraic/analytic properties the corresponding sequence of time-space harmonic polynomials are intimately connected to some stochastic properties of M.
2
Levy Processes and p-Harmonizability
In this section, we describe some of the results on p-harmonizability of Levy processes. Details of these can be found in [6]. Discrete-time versions of many of these results were proved earlier in [2]. For us, a Levy process will mean a process M = {Mt, t ?: O} with independent increments and having no fixed times of discontinuity. A homogeneous Levy process is one which is homogeneous as a Markov process, that is, whose increments are stationary besides being independent. In the results that follow, we will often need to impose two conditions on the process M, to be referred to as the moment condition and support condition. They are as follows: • Condition (Mo)
For all t, M t has finite moments of all orders.
A. Goswami and A. Sengupta
157
i 00,
• Condition (Su) : There is a sequence tn Isupport (MtJ I > k for infinitely many tn.
such that, for all k 2:: 1,
The moment condition (Mo) is clearly necessary for the process to be p-harmonizable. The role of the condition (Su) is more technical in nature. However, it may be noted that any homogeneous Levy process always satisfies this condition (unless, of course, it is deterministic). For a general Levy process, a simpler condition that gurantees (Su) is that M t - Ms be non-degenerate for all 0 S; s < t, that is, the increments are all non-degenerate. We now state some of the main results from [6].
Theorem 2.1. Any homogeneous Levy process M = {Mt, t 2:: O} with Mo == 0 and satisfying the conditions (Mo) and (Su) is p-harmonizable in the strict sense. Moreover, there exists a unique sequence P k E P k (M), k 2:: 1 satisfying properties (i) - (iv) and such that P(M) is just the linear span of {Pk , k 2:: I}. Further, the process M is uniquely determined by the sequence {Pd upto all the moments of its finite-dimensional distributions. Remark: (i) The fact that P(M) equals the linear span of {Pk, k 2:: I} implies, in particular, that P(M) = P(M). This is actually a special case of a more general fact proved by Goswami and Sengupta in [2], namely, that for any process M satisfying (Su), if Pk(M) -=/=- 0 V k, then Pk(M) = Pk(M) V k. (ii) The property of M being determined by the sequence {Pd can be strengthened as follows. If we assume, for example, that for some t > 0 and E > 0, E (exp{ aMd) < 00 V Ia I < E, then the sequence {Pd completely determines the distribution of the process M. Theorem 2.2. Let M = {Mt, t 2:: O} be a Levy process with Mo == 0 and satisfying the conditions (Mo) and (Su). Then M is p-harmonizable if and only if for each k 2:: 1, E(Mtk ) is a polynomial in t. In this case, there exists a unique sequence P k E Pk(M), k 2:: 1 satisfying properties (i), (ii) and (iv) and such that P(M) is just the linear span of {Pk , k 2:: I}. Further, the process M is uniquely determined by the sequence {Pk } upto all the moments of its finite-dimensional distributions. Remark: Note the absence of the pseudo-type-zero property (iii) in this case. In fact, property (iii) would not hold unless the process is homogeneous. [see Theorem 2.4]. We now describe a characterization of p-harmonizability of a Levy process M in terms of the underlying Levy measure, or, equivalently the Kolmorov measure. Associated to any Levy process, there is a a-finite measure m on [0,00) x (JR \ {O}), called its Levy measure, such that, E{exp(iaMt)} exp [iafL(t) - !a 2 a2 (t)
+
J
(e iCW
-
1- 1
i~:2
)
m([O, t]
@
dU)] ,
where fL(') and a 2 (.) are the mean and variance functions of the 'gaussian part' of M. It can be shown that p-harmonizability of M is equivalent to requiring
Polynomially Harmonizable Processes
158
that all the following functions be polynomials in t:
and for k
> 2, hk(t)
=
J
ukm([O, t] ® du).
The above characterization takes on a slightly simpler form when expressed in terms of what is known as the Kolmogorov measure associated with the process. It is the unique Borel mesure L on [0, (0) x lR such that log E{ exp(io:Mt)} = io:v(t)
+
J(
eiaU -
1u2
iO:U) L([O, t] ® du),
where v(t) = EMt is the mean function of the process M. We refer to Ito [3] for the definition and the transformation that connects the Kolmogorov measure and the Levy measure. A necessary and sufficient condition for pharmonizability of the process M is that : v( t) as well as the functions hk (t) = Ju k - 2 L([0,t] ®du), k 2: 2 are all polynomials in t. We have seen that for any Levy process M satisfying the conditions (Mo) and (Su), we can get a sequence P k E Pk(M), k 2: 1, such that Appel property (ii) holds. Moreover, if M is homogeneous, then the sequence {Pk} can be chosen so as to satisfy the pseudo-type-zero property (iii). The next two results show that, under some conditions, the converse is also true. In both the following theorems, M = {Mt, t 2: O} will denote a continuous-time stochastic process with r.c.I.I. paths starting at Mo == and satisfying conditions (Mo) and (Su) and {Ft, t 2: O} will denote the natural filtration of M.
°
Theorem 2.3. If there exists a sequence Pk E Pk(M), k 2: 1, satisfying the Appel property (ii), then for each 0 ::; s < t, the conditional moments E((Mt-Ms)kIFs) are degenerate for all k. If moreover, for each t, the momentgenerating function of M t is finite on some open interval containing 0, then M is a Levy process. Theorem 2.4. If there exists a sequence P k E Pk(M), k 2: 1, satisfying both the Appel property (ii) and the pseudo-type-zero property (iii) and if for each t, the moment-generating function of M t is finite on some open interval containing 0, then M is a homogeneous Levy process. Remark: Under the hypothesis of either of the above theorems, it can further be shown that the sequence {Pd satisfies the properties (i) and (iv) as well and is the unique sequence to do so. Moreover, the sequence {Pd span all of P(M) and also determines the distribution of M.
Next, we briefly mention some connections between the time-space harmonic polynomials of a process and what is known as semi-stability property, as developed in Lamperti [5]. Recall that a process M with Mo _ 0 is called semi-stable of index (3 > 0 if for every c > 0, the processes {Met, t 2: O} and {c!3 M t , t 2: O} have the same distribution. It can be easily shown that if {Pk E Pk(M)} is
A. Goswami and A. Sengupta
159
a sequence of time-space harmonic polynomials of a semi-stable process M, of index (3, then each P k satisfies the following homogeneity property:
where Pk(·) is the one-variable polynomial Pk(l, .). In other words, each P k is homogeneous in t f3 and x. It can be shown that, under mild technical conditions, the converse is also true, that is, the existence of a sequence {Pk E Pk (M)} such that each P k is homogeneous in t f3 and x, for some (3 > 0, implies that the process M is semi-stable of index (3. It is also worthwhile to point out here that if a process M admits a sequence {Pk} of time-space harmonic polynomials which are homogeneous in t f3 and x, then 2(3 must be an integer and that in case 2(3 is odd, the finite dimensional distributions of M are all symmetric about o. Finally, let us mention how an intertwining relationship between two markov processes, as developed in Carmona et al [1] relates the time-space harmonic polynomials of the two processes. If M and N are two markov processes with semigroups (Pt ) and (Qt) respectively, one says that the two processes (or, the two semi groups ) are intertwined if there exists an operator A such that APt = QtA V t. In many cases, the operator A is given by the "multiplicative kernel" for a random variable Z, that is, AJ(x) = EJ(xZ). In such a case, it is k
easy to show that, if P(t, x) =
L
Pj(t)x j is a time-space harmonic polynomial
j=O k
for the process M, then P(t, x)
= AP(t, x) = L pj(t)E(Zjx j ) is time-space j=O
harmonic for N. This has proved to be very useful in that if one knows the time-space harmonic polynomials of a process M, then one can get those for other processes which are intertwined with M. This is illustrated with examples in Section 4.
3
Finitely Polynomially Determined Levy Processes
In this section, we address the main question of this article, which involves obtaining a characterization of Levy processes whose laws are determined by finitely many of its time-space polynomials. In a sense, this is an extension of Levy's characterization of standard Brownian motion, which says that, under the additional assumption of continuity of paths, standard Brownian motion is characterized by two of its time-space harmonic polynomials, namely, the first two 2-variable Hermite polynomials PI (t, x) = x and P 2 (t, x) = x 2 - t. One knows that the continuity of paths is a crucial assumption here, without which the characterization does not hold. In the results that follow, the only path property we will assume is the standard assumption of r.c.I.I. paths for Levy processes. Let us start with some general definitions. Let C be a given class of processes.
Polynomially Harmonizable Processes
160
Definition A process M E C will be called k-polynomially determined in C (in short, k-p.d. in C), if Pj(M) =J 0, V j ::; k, and, for any N E C, Pj(N) = Pj (M) V j ::; k =? N d M. (Here:1:: means equality in distribution.) Processes which are k-p.d. in C for some k ;::: 1 are called finitely polynomially determined in C (in short, f.p.d. in C).
Let us remark here that an f.p.d. process need not be p-harmonizable. A general question that we may address is: for what classes of processes C, can one get a complete characterization of the f.p.d. members of C? For two important classes of processes, such a complete characterzation has been obtained and are presented below. The first result characterizes the f.p.d. processes in the class of all homogeneous Levy processes. As mentioned in the previous section, for any Levy process M, one has the representaion log(E( eiaMt ))
io:v(t)
+J
io:f.-t(t) -
(e
iaU
-u~ - iO:U) L([O, t] ® du)
~o:2(J2(t) + J
(e iaU - 1 - 1
i::
2)
m([O, t] ® du),
where Land m are called respectively the Kolmogorov measure and the Levy measure associated to the process M. In case M is homogeneous, the measures Land m turn out to be the product measures
L(dt ® dx) = dt ® l(dx), m(dt ® dx) = dt ® 77(dx) , where I and 77 are (J-finite measures on lR and lR \ {O} respectively and the above representations take on the following special forms
io:vt + t
J(
eiaU - 1 -
u2
io:f.-tt - .lo:2(J2t + tJ 2
iO:U) l(du)
(e iau - 1 -
io:u ) 77(du).
1 +u 2
It may be pointed out in this connection that the relation between the measures land 77 is simply given by
An important property of I that will be used subsequently is that for all k ;::: 2, the k-th cumulant of Ml equals J u k - 21(du). the following theorem now gives a characterization of f.p.d processes in the class of all homogeneous Levy processes. Theorem 3.1. A process M is finitely polynomially determined in the class of all homogeneous Levy processes if and only if the associated measure l, or equivalently the measure 77, has finite support. Proof. It is immediate from the above relation between the measures land 77 that whenever one of them has finite support, so does the other. In the proof, we will work with l.
A. Goswami and A. Sengupta
161 n
Suppose first that
l
has finite support, say,
l
= L
(;Ii6{rd,
where (;Ii
>
0, ~
=
i=1
1, ... ,n and r/s are distinct real numbers. Here 6{r} denotes the 'dirac' mass at r. We show that M is k-p.d. among homogeneous Levy processes with k = 2n+2. Let N be any homogeneous Levy process with Pj(N) = Pj(M) \:j j :;
2n+2. We will show that VN = VM and IN = IM which will imply that N!!:... M. It is easy to see that P j (N) = Pj (M) \:j j :; 2n+ 2 implies the equality of the first 2n + 2 moments of NI and M I , which in turn implies the equality of their first 2n + 2 cumulants. This entails, first of all, that VN = VM and also, in view of the above mentioned property of l, that ujIN(du) = ujIM(du) \:j j = 0,1, ... , 2n. From these, one can easily deduce that for any choice of distinct real numbers
J
In particular, taking ai
=
J
ri, \:j i, one obtains that
JiDI
(u - ai)2IN(du)
= 0,
n
implying that IN is supported on {rl,'" ,rn}, that is, IN
= L
(;I~6{r;}' for
i=1
non-negative (;I~, 1 :; i :; n. Using the facts VN = VM and J u j l M (du) \:j j = 0, 1, ... ,2n, it is now easy to conclude that (;I~ is, IN = IM.
JujIN(du) = =
(;Ii \:j
i, that
To prove the converse, suppose that M is a homogeneous Levy process for which the associated measure I is not finitely supported. We show that M is not f.p.d. by exhibiting, for any k, a homogeneous Levy process N, different from M, such that P j (N) = P j (M) \:j j :; k. This is done as follows. Fix any k ~ 1. Since 1 is not finitely supported, we can get disjoint borel sets Ai C JR, i = 1, ... ,k such that l(Ai) > 0, \:j i. Consider the real vector space of signed measures on JR defined as V
=
{,u : ,u(.) =
linear map A : V
-+
it
cile
n Ai),
ci
E JR, 1 :; i :; k} and consider the
JRk-1 defined by
A being a linear map form a space of dimension k into a space of dimension k - 1, the nullity of A must be at least 1. Choose a non-zero ,u in the null-space of A. Further, we can and do choose ,u so that 1,u(Ai)1 < l(Ai), \:j i. If we now define i = l + ,u, then i is a positive measure with i f- l but J u j i( du) = J ujl(du), \:j j = 0"" ,k - 2. It is now easy to see that if N is the homogeneous Levy process with VN = VM and Kolmogorov measure L(dt ® dx) = dt ® i(dx) , d
then Pj(N) = Pj(M)
\:j
j :; k but N
f-
M. 0
Remarks: (i) A simple interpretation of the above therorem is that a homogeneous Levy process is f.p.d. if and only if its jumps, if and when they occur, are of sizes in a fixed finite set.
(ii) The proof of 'if' part of the theorem shows that if the measure l is supported on precisely k many points, then the process is determined by its first 2k + 2
Polynomially Harmonizable Processes
162
many time-space harmonic polynomials. A natural question is whether 2k + 2 is the minimum number of polynomials necessary. As we shall see in Section 4, that is indeed the case for the most common examples of homogeneous Levy processes. We conjecture that it is perhaps true in general. Our next reult will give a similar characterization of the f. p.d. property in a more general class of Levy processes than the homogeneous ones. To be specific, we consider the class of those Levy processes for which the Kolmogorov measure admits a 'disintegration' w.r.t. the Lebesgue measure on [0,(0). Formally, let us say that the Kolmogorov measure L of a Levy process M admits a 'derivative measure' I if L(dt,dx)
=
l(t,dx)dt,
where l(t, A), t E [0,(0), A E B is a transition measure on [0,(0) x B. Here B denotes the Borel a-field on R We denote C to be the class of all those Levy processes whose Kolmogorov measure admits such a derivative measure. Clearly, all homogeneous Levy processes belong to this class, since in that case l(t,·) == l(·). The class C is fairly large. For example, Gaussian Levy processes as well as non-homogeneous compound Poisson processes belong to this class. Since C is clearly a vector space, any Levy process that arises as the sum of independent Levy processes of class C also belong to this class. As expected, our characterization of f.p.d. processes among the class C will be in terms of the derivative measure l(t,·) defined above and the general idea of the proof runs along the same lines as in the case of homogeneous Levy processes. However, the actual argument becomes a little more technical. For example, we would show that a process M in the class C cannot be k-p.d. unless for almost all t, the derivative measure l(t,·) is supported on at most k points. This is the content of the following Lemma 3.1. The idea of the proof is analogous to that of the 'only if' part of Theorem 3.1 for homogeneous Levy processes. That is, assuming the contrary is true, we will have to define a new process N in class d
C such that Pj(N) = Pj(M) V j :s; k but N =I- M. However, getting hold of this process N or equivalently its derivative measure l( t, .) involves using an appropriate variant of a result of Descriptive Set Theory, known as Novikov's Selection Theorem, stated below as Lemma 3.2. We refer to Kechris [4] for details. Lemma 3.1. Suppose the process M is k-polynomially determined in class = {t >
C. Then for any version of l, the set T c [0, (0) defined by T Isupp(l (t, .)) I > k} is Borel and has lebesgue measure zero.
°:
We omit the proof of this lemma here. As mentioned above, the proof uses the following selection theorem (see [6] for details). Lemma 3.2. Suppose U is a standard Borel space and V is a a-compact subset of a Polish space. Let B c U x V be a Borel set whose projection to U is the whole of u. Suppose further that, for each x E U, the x-section of B is closed in V. Then there is a Borel measurable function 9 : U -+ V whose graph is contained in B.
A. Goswami and A. Sengupta
163
We now state and prove the characterization result for f.p.d.-processes in the class C.
Theorem 3.2. Let M be a Levy process of the class C. (a) If there exists an integer k :2: 1 and a measurable function (XI,··· ,Xk,PI,··· ,Pk) : [0,(0) --t]Rk X [O,oo)k such that (i) for each j k
0,1, ... ,2k,
2: Pi(t)(Xi(t))j
is a polynomial in t almost everywhere, and, (ii)
i=l k
l(t,·)
=
2: Pi(t)O{x;(t)}(-)
is a version of the derivative measure for M, then M
i=l
is finitely polynomially determined (indeed, (2k + 2)-polynomially determined) in C. (b) Conversely, if M is finitely polynomially determined in C, then there exists an integer k :2: 1 and a measurable function (Xl, ... ,Xb PI, ... ,Pk) : [0,(0) --t ]Rk X [0, oo)k such that a version of the derivak
tive measure associated with M is given by l(t,·)
=
2: Pi(t)O{x;(t)}(-). i=l
As mentioned above, the idea of the proof is similar to the homogeneous case except that it is a little more technical. One of the key observations used in the proof is that for a process M in the class C, Pj(M) =I- 0, 1 :s: j :s: k if and only if the first cumulant CI (t) of the process M is a polynomial in t and for all 2 :s: j :s: k, the functions t f-----+ J u j - 2l(t, du) are polynomials in t almost everywhere, where l is a version of the derivative measure associated to M. Using this, here is a brief sketch of the proof of the theorem.
Proof. (a) In view of the above observation, the conditions (i) and (ii) clearly imply that Pj(M) =I- 0, 1 :s: j :s: 2k + 2. If now N is another process of class C with Pj(N) = Pj(M) V 1 :s: j :s: 2k + 2, then it will follow that N has the same mean function as M and also for all O:S: j:S: 2k, JujlN(t,du) = JUjlM(t,du) for almost all t E [0, (0), where IN and lM denote (versions of) the derivative measures associated with Nand M respectively. Consequently, one will have k
J
IT (U- Xi(t))2lN(t,du) = i=l
k
J
IT (u- xi(t))2lM(t,du).
By the same argument as
i=l
in the proof of the 'if' part of Theorem 3.1, we get IN(t,·)
= lM(t,·)
for almost
d
every t, and hence N = M. (b) Suppose that Mis k-polynomially determined in C. Using Lemma 3.1, one can get a version l(t,·) of the derivative measure associated to M such that Isupp(l(t, ·))1 :s: k for all t E [0, (0). For each 1 :s: j :s: k, let T j = {t E [0, (0) : Isupp(l(t, ·))1 = j}. It can be shown that each T j and hence UjTj is a Borel set. For t E T j , order the elements of supp(l(t, .)) as XI(t) < ... < Xj(t) and denote the 1(t, .)- measures of these points by PI (t), ... ,Pj (t) respectively. Also, for j < i :s: k, set Xi(t) = xj(t)+l andpi(t) = 0. Finally, for t 1: UjTj , set Xi(t) == Yi and Pi (t) == for all 1 :s: i :s: k, where Yl, ... ,Yk is any arbitrarily chosen set of k points. With these notations, it is clear that l (t, .) has the form asserted. One can now show that the mapping t f-----+ (XI(t),··· ,Xk(t),PI(t),··· ,Pk(t)) as 0 defined above is measurable and that completes the proof.
°
Remark: In the next section, we will see some examples of possible forms of the functions Xi(t) and Pi(t). Let us remark here that it is possible to formulate
Polynomially Harmonizable Processes
164
the definition of the class C in terms of the Levy measures and then to give a characterization involving the 'derivative measure' arising out of the Levy measure. However, it is not clear how to go beyond the class C and to even formulate a condition that will, for example, characterize the f.p.d. processes among all Levy processes.
4
Some Examples
The most commonly known examples of polynomially harmonizable processes are the standard Brownian motion and the standard Poisson process. One can easily see that for a Brownian motion with fL and a 2 as its drift and diffusion coefficients respectively, a canonical sequence of time-space harmonic polynomials is given by Pk(t, x) = (at)k/2pk(x
;!t), at
where the Pk are the usual one-variable
Hermite polynomials as defined in Section 1. Similarly, for the Poisson process with intensity A, a sequence of of time-space harmonic polynomials is given by Pk(t,x)
=
jto
(~)xj:~
{\-j}
(At)i, where
{ ~} denote the Stirling numbers of the second kind. If M is a non-homogeneous compound Poisson process with intensity function A(') and jump-size distribution F, then it is not difficult to see, using the results described in Section 2, that M is polynomially harmonizable if and only if AU is a polynomial function and F has finite moments of all orders. It is possible, though cumbersome, to get an explicit sequence of time-space harmonic polynomials.
A not so well-known example of a p-harmonizable process is the process M = BES 2 (1), the square of the 1-dimensional Bessel process. It is well-known that
. a seml-sta . bl e markov process wh ose generator IS . gIven . by dx d + 2X dx d2 ' · IS t h IS 2 Using this, one can show that M is polynomially harmonizable and that a k
sequence of its time-space harmonic polynomials is given by P k (t, x) k! tk-j J (2j)!(k-j)! X
= L (- 2)j j=O
U sing the technique mentioned at the end of Section 2, we can now get other examples of p-harmonizable processes that arise as markov processes whose semigroups are intertwined with that of the process BES2(1). Some examples of random variables which lead to interesting semigroups intertwined with that of BES 2 (1), in the sense described in Section 2, are (i) Z = Z 1. b, having the beta distribution with parameters -21 and b, 2 ' and, (ii) Z = 2Zb+~' where Zb+~ has gamma distribution with parameter b+~. The first one leads to the process BES2 (2b ), the square of the Bessel process of
A. Goswami and A. Sengupta
165
dimension 2b, while the latter leads to a certain process detailed in Yor [8] with "increasing saw-teeth" paths. Another interesting example of a process intertwined with BES2 (1) in the same way is what is called Azema's martingale (see Yor [9]) defined as M t = sgn(Bt) ·-jt - gt, t 2.: 0, where B is the standard Brownian motion and gt denotes the last zero of B before time t. The multiplicative kernel here is given by the random variable ml, ther terminal value of " Brownian meander". In [9], Yor uses Chaotic Representation Property to give an alternative proof of pharmonizability of Azema's martingale as well as each member of the class of "Emery's martingales'. As an illustration of our method, we use the time-space harmonic polynomials of BES2(1) as obtained above and the intertwinning to describe time-space harmonic polynomials for two of the cases mentioned above. In the case of BES 2(2b), a sequence of time-space harmonic polynomials are
k
(-2)j(1)j t k- j
.
i) ( .) xJ, where (Y)k stands for the prodj=O (2J)!(b+ 2 j k-J!
given by Pk(t,x) = ~. k-l
uct
TI (y + i).
i=O
For the Azema's martingale, one uses the fact the ml has a Rayleigh distribution to obtain a sequence of time-space harmonic polynomials given by Pdt, x) = k.
(k)
+ 1)hj
(t)x j , where r(.) denotes the gamma function
EHk(t, mIX)
= ~ 2~r(~
and Hk(t,x)
= ~ hjk) (t)x j are the 2-variable Hermite polynomials.
j=O k
j=O
We now discuss some examples of f.p.d. processes. First of all, it is not difficult to see that the only 2-p.d. Levy processes are those that are deterministic, that is, M t is identically equal to a polynomial pet). Our first example of a non-trivial f.p.d. process is the standard Brownian motion, which is a homogeneous Levy process with l(du) = 6{0}(du). Thus, by our Theorem 3.1, standard Brownian motion is uniquely determined among homogeneous Levy processes by its first four time-space harmonic polynomials, for example, by the first four 2-variable Hrermite polynomials. This result should be contrasted with the well-known characterization due to Levy, which says that the first two Hermite polynomials suffice if one assumes continuity of paths in addition. In contrast, our result says that among all homogeneous Levy processes, standard Brownian motion is the only one for which the first four hermite polynomials are time-space harmonic. A natural question is whether we can do with less than four. The answer is an emphatic 'no'. An example of another homogeneous Levy process for which the first three Hermite polynomials are time-space harmonic is the mean zero process determined by the Kolmogorov measure L(dt, du) = dt o l(du) , where l(du) = ~ [6{ _l}(du) + 6{1}(du)]. It is not difficult to see that any gaussian Levy process, with mean and variance functions being polynomials, is also 4-p.d.
For the homogeneous Poisson process with intensity A, one has l(du) = A6{1} , so that once again it is 4-p.d. among all homogeneous Levy processes. Here
166
Polynomially Harmonizable Processes
also, four is the minimum number needed, since one can easily construct an example of a different homogeneous Levy process for which the first three Charlier polynomials are time-space harmonic. For the non-homogeneous compound Poisson process, it can easily be seen that it is f.p.d. if and only if the jump-size distribution is finitely supported and the intensity function is a polynomial function and that in this case, it is actually (2k + 2)-p.d. where k is the cardinality of the support of the jump-size distribution. We conclude with some examples of f.p.d. processes in the class C. We have a characterization of such processes in Theorem 3.2. Here are some examples of possible forms of the functions Xi(t) and Pi(t), that appear in that Thoerem. We consider only the case k = 2. The simplest possible case is that Xl (t), X2 (t) and PI(t) 2: 0,P2(t) 2: 0 are themselves polynomials. Another possibility is that
Xl(t) = a(t) + Jb(ij,X2(t) = a(t) - Jb(ij,PI(t) = c(t) + d(t) Jb(ij,P2(t) = c(t) - d(t)Jb(ij, where a, b, c, d are polynomials so chosen that c + dVb, c - dVb are both non-negative on [0, 00). One can similarly construct other examples. From Theorem 3.2, it follows that all these would lead to processes that are f.p.d (in fact, 6-p.d.) in the class C. A. Goswami Stat-Math Unit Indian Statistical Institute 203 B.T. Road Kolkata 700 108, India ago swami @indiana.edu
A. Sengupta Division of Theoretical Statistics and Mathematics Indian Statistical Institute 203 B.T. Road Kolkata 700 108, India
Bibliography [1] Carmona, P., Petit, F. and Yor, M. (1994). Sur les fonctionelles exponentielles de certain processus de levy. Stochastics and Stochastic Reports, 47, p. 71-101. [2] Goswami, A. and Sengupta, A. (1995). Time-Space Polynomial Martingales Generated by a Discrete-Parameter Martingale. Journal of Theoretical Probability, 8, no. 2, p. 417-431. [3] Ito, K. (1984). Lectures on Stochastic Processes. TIFR Lecture Notes, Tata Institute of Fundamental Research, Narosa, New Delhi. [4] Kechris, A.S. (1995). Classical Descriptive Set Theory, v. 156, Graduate Texts in Mathematics, Springer-Verlag. [5] Lamperti, J. (1972). Semi-Stable Markov Processes. Wahrscheinlichkeitstheorie Verw. Gebiete. 22, p. 205-225.
Zentrablatt
[6] Sengupta, A. (1998). Time-Space Harmonic Polynomials for Stochastic Processes, Ph. D. Thesis, Indian Statistical Institute, Calcutta, India.
A. Goswami and A. Sengupta
167
[7] Sheffer, I. M. (1939). Some Properties of Polynomial Sets of Type Zero. Duke Math. Jour., 5, p. 590-622. [8] Yor, M. (1989). Vne Extension Markovienne de l'algebre des Lois Betagamma. G.R.A.S. Paris, Serie I, 303, p. 257-260. [9] Yor, M. (1994). Some Aspects of Brownian Motion; Part II : Some Recent Martingale Problems. Lectures in Mathematics, ETH Zurich. Laboratoire de Probabilites, Vniversite Paris VI.
Effects of Smoothing on Distribution Approximations Peter Hall Australian National University
and Xiao-Hua Zhou Indiana University School of Medicine
Abstract We show that a number of apparently disparate problems, involving distribution approximations in the presence of discontinuities, are actually closely related. One class of such problems involves developing bootstrap approximations to the distribution of a sample mean when the sample includes both ordinal and continuous data. Another class involves smoothing a lattice distribution so as to overcome rounding errors in the normal approximation. A third includes kernel methods for smoothing distribution estimates when constructing confidence bands. Each problem in these classes may be modelled in terms of sampling from a mixture of a continuous and a lattice distribution. We quantify the proportion of the continuous component that is sufficient to "smooth away" difficulties caused by the lattice part. The proportion is surprisingly small - it is only a little larger than n-1logn, where n denotes sample size. Therefore, very few continuous variables are required in order to render a continuity correction unnecessary. The implications of this result in the problem of sampling both ordinal and continuous data are discussed, and numerical aspects are described through a simulation study. The result is also used to characterise bandwidths that are appropriate for smoothing distribution estimators in the confidence band problem. In this setting an empirical method for bandwidth choice is suggested, and a particularly simple derivation of Edgeworth expansions is given.
Keywords: Bandwidth, bootstrap, confidence band, confidence interval, continuity correction, coverage error, Edgeworth expansion, kernel methods, mixture distribution.
1 Introduction
1.1 Smoothing in distribution approximations
Rabi Bhattacharya has made very substantial contributions to our understanding of normal approximations in statistics and probability. None has been more important and influential than his exploration and application of smoothing as it is related to distribution approximations. For example, his development of ways of smoothing multivariate characteristic functions lies at the heart of his pathbreaking work on Berry-Esseen bounds and other measures of rates of
convergence in the multivariate central limit theorem (e.g. Bhattacharya 1967, 1968, 1970; Bhattacharya and Rao, 1976). His introduction of what has become known as the "smooth function model" (Bhattacharya and Ghosh, 1978), for describing properties of Edgeworth expansions of statistics that can expressed as smooth functions of means, has allowed wide-ranging asymptotic studies of statistical methods such as those based on the bootstrap. The present paper is a very small contribution, but nevertheless in a related vein - a small token of our appreciation of the considerable contribution that Rabi has made to distribution approximations in mathematical statistics. A key assumption in many distribution approximations in statistics is that the distribution being approximated is continuous. Without this property, not only are approximation errors likely to be large, but special features that the approximations are often assumed to enjoy can be violated. These include the property that the coverage error of a two-sided confidence interval is an order of magnitude less than that for its one-sided counterpart. In a range of practical problems the assumption of smoothness can be invalid, however. In such cases there may sometimes be enough "residual" smoothing present in other aspects of the problem for it to be unnecessary to smooth in an artificial way. Nevertheless, even in these circumstances it is important to know how much residual smoothing is required, so that the adequacy of the residual smoothing can be assessed. In other problems there is simply not enough smoothing to overcome the most serious discretisation errors; there, artificial smoothing, for example using kernel methods, can be efficacious. In the present paper, motivated by particular problems of both these types, we derive a general theoretical benchmark for the level of smoothing that is adequate in each case. In the first class of problem, encountered in several practical settings, we suggest an empirical method for assessing whether the benchmark has been attained. In the second class, related to smoothed distribution estimation, we introduce an empirical technique for determining how much smoothing should be provided. Both types of problem have a common basis, in that they represent mixture-type sampling schemes where a portion of the data are drawn from a smooth distribution and the remainder from a lattice distribution. It is shown that the sampling fraction of the smooth component can be surprisingly small before difficulties arise through the roughness of the other component. The threshold is approximately n-1logn, where n denotes sample size. In the case of the second problem this result may be interpreted as a prescription for bandwidth choice, which can be implemented in practice using a smoothed bootstrap method. For the first problem the result may be interpreted as defining a safeguard: only when the smooth component is present in a particularly small proportion will the unsmooth component cause difficulties. Next we introduce the two classes of problem.
1.2 First problem: bootstrap inference for distributions with both ordinal and continuous components
In some applications it is common to encounter a data distribution that is a mixture of an atom at the origin and a continuous component on the positive
half-line. Examples include the cost of health care (e.g. Zhou, Melfi and Hui, 1997) and the proportion of an account that an audit determines to be in error (e.g. Cox and Snell, 1979; Azzalini and Hall, 2000). In the second example, both 0 and 1 can be atoms of the sampled distribution. In both examples the mean of the mixture, rather than the mean of just the continuous component, is of interest. If all the data are ordinal and lie within a relatively narrow range, for example if the costs or proportions in the respective examples are distributed among only a half-dozen equally-spaced bins, then the lattice nature of the data needs careful attention if bootstrap methods are to be used to construct confidence intervals for the mean. Indeed, particular difficulties associated with this case were addressed in the first detailed theoretical treatment of bootstrap methods for distribution approximation; see Singh (1981). One way of alleviating these difficulties is to use smoothed bootstrap methods; see for example Hall (1987a). On the other hand, no special treatment is required if just the positive part of the sampled distribution is addressed, provided this portion of the distribution is smooth.
This begs the question of what should be done in the mixture case. Does the implicit smoothing provided by the continuous component overcome potential difficulties caused by the ordinal component? How does the answer to this question depend on the proportion of the ordinal component in the mixture? Our results on the effects of smoothing on distribution approximation allow us to answer both these questions; see sections 3.1 and 4.1. A related problem is that of smoothing a discrete distribution so as to construct a confidence interval for its mean. One approach is to blur each lattice-valued observation over an interval on either side of its actual value; see for example Clark et al. (1997, p. 12). For example, if a random variable Y with this distribution takes only integer values, we might replace an observed value Y = i by i + εZ, where ε > 0 and Z is symmetric on the interval [-1, 1]. How large does ε have to be in order to effectively eliminate rounding errors from an approximation to the distribution of the mean of n values of Y? In particular, can we allow ε to decrease with sample size, and if so, how fast? Answers will be given in sections 3.1 and 4.1. Of course, in this second aspect of the first problem it is the mean of Y, not the mean of X = Y + εZ, about which we wish to make inference. However, the mean of εZ is known, and so it is a trivial matter to progress from a confidence interval for E(X) to one for E(Y).
1.3 Second problem: confidence bands for distribution functions
Let U = {U_1, ..., U_n} denote a random sample drawn from a distribution F, and write F̂ for the empirical distribution function based on U. Then, with z_{α/2} denoting the upper ½α-level point of the standard normal distribution,
F̂ ± {n^{-1}F̂(1 − F̂)}^{1/2} z_{α/2} is a conventional confidence band for F founded on normal approximation, with nominal pointwise coverage 1 − α. In more standard problems, involving a mean of smoothly distributed random variables, the coverage accuracy of such a band would equal O(n^{-1}). In the present setting, however, owing to asymmetric rounding errors that arise in approximating the discrete Binomial distribution by a smooth normal one, coverage error of even a two-sided symmetric confidence band is in general no better than O(n^{-1/2}). A particularly simple way of smoothing in this setting, and potentially overcoming difficulties caused by rounding errors, is to use kernel methods. Let K, the kernel, be a bounded, symmetric, compactly supported probability density, write L for the corresponding distribution function, and let h be a bandwidth. Then
F̂_h(u) = n^{-1} Σ_{i=1}^{n} L{(u − U_i)/h}   (1.1)
is a smoothed kernel estimator of F. We may interpret E{Fh(u)} in at least two different ways: firstly, as the mean of a sample drawn from a mixture of two distributions, one taking only the values 0 and 1 (the latter with probability F(u - hc), where [-c, c] denotes the support of K), and the other having a smooth distribution (equal to that of L{ (u - Ui ) / h}, conditional on (u - Ui ) / h lying within the support of K); and secondly, as the distribution function of X = Y + hZ, where Y and Z have distribution functions F and L, respectively. Hence, this problem and those described in section 1.2 have identical roots. The bias of Fh(u), as an estimator of F(u), equals O(h2) provided F is sufficiently smooth. In relative terms its variance differs from that of F(u) by only O(h). See Azzalini (1981), Reiss (1981) and Falk (1983) for discussion of these and related properties. Together these results suggest that taking h as small as possible is desirable, since then h would have least effect on moment properties. Indeed, the moment properties suggest that h = O(n-1) might give the O(n-l) coverage error seen in conventional problems. However, it may be shown that this size of bandwidth is not adequate for removing difficulties caused by lack of smoothness of the distribution of F. With h = O(n- 1 ), rounding errors still contribute terms of order n- 1/ 2 to coverage error of two-sided confidence bands. Can we choose h large enough to overcome these problems, and yet small enough to give an order of coverage accuracy close to the "ideal" O(n- 1 )? And even if this problem has a theoretical solution, can good coverage accuracy be achieved empirically? These questions will be answered in sections 3.2 and 4.2, where we shall propose and describe an empirical bandwidth-choice method in the confidence band problem. Additionally we shall show that our approach to the problem of smoothed distribution estimation, via sampling from a mixture distribution, leads to particularly simple derivations of Edgeworth expansions. There is of course an extensive literature of the problem of bandwidth choice for kernel estimation of distribution functions. It includes both plug-in and cross-validation methods; see Mielniczuk, Sarda and Vieu (1989), Sarda (1993), Altman and Leger (1995), and Bowman, Hall and Prvan (1998). However, in all
these cases the bandwidths that are proposed are of asymptotic size n^{-1/3}, much larger than n^{-1}. They are appropriate only for estimation of the distribution function curve, not for confidence interval or band construction, and produce relatively high levels of coverage error if used for the latter purpose. The class of distribution and density estimation problems is characterised by an interesting hierarchy of bandwidth sizes: n^{-1/5} for estimating a density curve, n^{-1/3} for distribution curve estimation, and a still smaller size, approximately n^{-1} log n (as we shall show in section 3.2), for constructing two-sided confidence bands for a distribution function.
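To make the estimator at (1.1) concrete, here is a minimal numerical sketch (ours, not the authors'): it evaluates F̂_h on an illustrative sample, using the distribution function of the Epanechnikov kernel that the authors employ in section 4.2; the sample, bandwidth constant and evaluation points are arbitrary choices.

```python
import numpy as np

def epanechnikov_cdf(t):
    """Distribution function L of the Epanechnikov kernel, supported on [-1, 1]."""
    t = np.asarray(t, dtype=float)
    return np.where(t < -1.0, 0.0,
                    np.where(t > 1.0, 1.0, 0.75 * t - 0.25 * t**3 + 0.5))

def smoothed_ecdf(points, sample, h):
    """Kernel-smoothed distribution estimator (1.1): F_h(u) = n^{-1} sum_i L((u - U_i)/h)."""
    return np.array([epanechnikov_cdf((u - sample) / h).mean() for u in np.atleast_1d(points)])

rng = np.random.default_rng(0)
U = rng.normal(size=50)            # illustrative sample
n = len(U)
h = np.log(n) ** 2 / n             # a bandwidth of the order n^{-1}(log n)^{1+eps}, as in section 3.2
print(smoothed_ecdf([-1.0, 0.0, 1.0], U, h))
```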
2 Distribution-Approximation Difficulties Caused by Lack of Smoothness
Let X_1, ..., X_n be independent and identically distributed random variables with the distribution of X, and let X̄ = n^{-1} Σ_i X_i denote the sample mean. Many explanations for the small-sample performance of bootstrap approximations to the distribution of X̄ are based on properties of its Edgeworth expansion. A formal expansion exists under moment conditions alone. In particular, provided only that

E|X|^{k+2} < ∞,   (2.1)

the formal Edgeworth expansion up to terms in n^{-k/2} is well defined; it is

Q_k(x) = Φ(x) + Σ_{j=1}^{k} n^{-j/2} π_j(x) φ(x),   (2.2)

where Φ and φ denote the standard normal distribution function and density, respectively, and each π_j is a polynomial whose coefficients are determined by moments of X; in particular, π_1(x) = ⅙γ(1 − x²), with γ the standardised skewness of X.
If, in addition to the moment assumption (2.1), the distribution of X is smooth (for example if it is absolutely continuous), Q_k can provide an accurate approximation to the standardised distribution of X̄. For example, if the distribution of X has a bounded density, and if we define μ = E(X) and σ² = var(X), then

P{n^{1/2}(X̄ − μ)/σ ≤ x} = Q_k(x) + o(n^{-k/2}),   (2.3)

uniformly in x, as n → ∞. The performance of bootstrap methods rests heavily on this result, through the property that the bootstrap provides a particularly accurate estimate of the term Q_k on the right-hand side of (2.3). However, (2.3) fails if the sampled distribution is lattice. For example, if nX̄ has the Binomial Bi(n, q) distribution, where 0 < q < 1, then (2.3) holds only if we add, to the right-hand side, a continuity-correction term for each order from n^{-1/2} to n^{-k/2} inclusive. Such terms compensate for errors introduced by approximating the relatively rough Binomial distribution by a smooth function.
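The failure of (2.3) in the lattice case is easy to visualise numerically. The sketch below (our illustration, with an arbitrary choice of n and q) compares the exact distribution function of the standardised Binomial mean with the normal approximation and with the normal approximation plus the smooth first Edgeworth term; the residual error in the latter comparison is the rounding effect.

```python
import numpy as np
from scipy.stats import binom, norm

n, q = 40, 0.3                          # illustrative sample size and Bernoulli parameter
mu, sigma = q, np.sqrt(q * (1 - q))
gamma = (1 - 2 * q) / sigma             # standardised skewness of a single observation

x = np.linspace(-2.5, 2.5, 501)
# Exact distribution of n^{1/2}(Xbar - mu)/sigma, where n*Xbar ~ Binomial(n, q):
exact = binom.cdf(np.floor(n * mu + np.sqrt(n) * sigma * x), n, q)
normal = norm.cdf(x)
edgeworth = normal + gamma * (1 - x**2) * norm.pdf(x) / (6 * np.sqrt(n))

print("max |exact - normal|    :", np.abs(exact - normal).max())
print("max |exact - Edgeworth| :", np.abs(exact - edgeworth).max())
print("n^(-1/2)                :", n ** -0.5)
# Even after the smooth n^{-1/2} skewness correction, an error of order n^{-1/2}
# remains: this is the rounding (continuity-correction) effect of the lattice.
```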
In particular, if the sampling distribution is supported on the set of integers and has lattice span 1, and if we define

D_k(x) = Q_k(x) − (n^{1/2}σ)^{-1} Σ_{j: (j − nμ)/(n^{1/2}σ) ≤ x} Q_k′{(j − nμ)/(n^{1/2}σ)},

then the "corrected" form of (2.3) holds:

P{n^{1/2}(X̄ − μ)/σ ≤ x} = Q_k(x) − D_k(x) + o(n^{-k/2}),   (2.4)

uniformly in x. See for example pp. 237-241 of Bhattacharya and Rao (1976). Of course, (2.4) has analogues in the case of other lattice distributions. In these general cases we may express D_k(x) as an expansion with terms of size n^{-j/2}, for 1 ≤ j ≤ k. The term of size n^{-1/2}, D_{k1}(x) say, involves the function S(u) = ⟨u⟩ − u + ½, where ⟨u⟩ denotes the integer part of u. The well-known continuity correction, applied for example to normal approximations to the Binomial distribution, adjusts for D_{k1}(x). We shall show in section 5, however, that if the distribution of X is smoothed through being a mixture of only a small proportion of a continuous distribution, then all aspects of the continuity correction D_k(x) may be dispensed with. That is, D_k(x) may be dropped from (2.4), and (2.3) holds for all k ≥ 1. The implications of this result for coverage accuracy of confidence regions can be considerable. To appreciate this point, note that since π_1(x) at (2.2) is symmetric in x then, in the case of a smooth sampled distribution, potential coverage errors of size n^{-1/2} cancel from the formula for coverage of the two-sided confidence interval X̄ ± n^{-1/2}σz_{α/2}. As a result this interval has coverage error O(n^{-1}). However, since the correction term D_k(x) is not symmetric in x then this property fails when the sampled distribution is unsmooth, and there the order of coverage error is only O(n^{-1/2}), even for symmetric, two-sided confidence intervals. Moreover, a conventional continuity correction does not remove all the error of this size; taking that approach, the best that can generally be achieved is to produce a conservative confidence interval where the coverage error is dominated by, rather than equal to, the nominal level plus O(n^{-1}). See Hall (1982, 1987a) for discussion of this issue. Of course, these results have direct analogues in the Studentised case; in the discussion above we have treated the non-Studentised case, where σ is assumed known, only for convenience.
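The claim that two-sided coverage error remains of order n^{-1/2} for lattice data can be checked directly, since for Bernoulli observations the coverage of the interval X̄ ± 1.96 n^{-1/2}σ is an exact Binomial probability. The following short sketch (ours; q is an arbitrary illustrative value) rescales the coverage error by n^{1/2} to show that it does not settle down to zero.

```python
import numpy as np
from scipy.stats import binom

q = 0.3
for n in (25, 100, 400, 1600):
    mu, sigma = q, np.sqrt(q * (1 - q))
    lo = n * (mu - 1.96 * sigma / np.sqrt(n))   # the interval covers mu
    hi = n * (mu + 1.96 * sigma / np.sqrt(n))   # if and only if lo <= n*Xbar <= hi
    cover = binom.cdf(np.floor(hi), n, q) - binom.cdf(np.ceil(lo) - 1, n, q)
    print(n, round(cover, 4), round(abs(cover - 0.95) * np.sqrt(n), 3))
# The rescaled error |coverage - 0.95| * n^{1/2} typically stays bounded away
# from zero, consistent with a residual term of size n^{-1/2}.
```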
3 Overcoming Difficulties Caused by Lack of Smoothness
3.1 Solution to first problem
Suppose the distribution of X is obtained by mixing a smoothly distributed random variable Y (for example, one having a bounded probability density) with an arbitrary but nondegenerate random variable Z, in proportions p and 1 − p respectively, where p may depend on n. We wish to know the effect that any smoothing conferred by the distribution of Y has on the distribution of a mean X̄ of n independent random variables distributed as X. It will be shown in section 5 that if n^{-1} log n = o(p) then the discretisation-error term D_k is negligible, and in fact sup_x |D_k(x)| = o(n^{-k/2}). As a result, the distribution of X̄ is accurately approximated by its formal Edgeworth expansion, to any order that is permitted by the number of moments enjoyed by the distribution of X. This property applies equally to the distributions of Studentised and non-Studentised means; in both cases, the comparatively small amount of smoothing obtained when n^{-1} log n = o(p) is nevertheless sufficient to compensate for highly unsmooth features of the other component of the sampling distribution.
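A small simulation sketch of this mixture scheme (ours; the component distributions, N(0,1) for Y and Bernoulli(1/2) for Z, are arbitrary illustrative choices) shows how the quality of the normal approximation to the standardised mean improves as the mixing proportion p grows past the order n^{-1} log n.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

def mixture_means(n, p, reps):
    """Sample means of n i.i.d. draws from the mixture:
    Y ~ N(0,1) with probability p, Z ~ Bernoulli(1/2) with probability 1 - p."""
    use_y = rng.random((reps, n)) < p
    y = rng.normal(size=(reps, n))
    z = rng.binomial(1, 0.5, size=(reps, n)).astype(float)
    return np.where(use_y, y, z).mean(axis=1)

n, reps = 200, 20000
for c in (0.0, 2.0, 10.0):                 # p = c log(n)/n; c = 0 is the pure lattice case
    p = min(c * np.log(n) / n, 1.0)
    mu = (1 - p) * 0.5                     # mixture mean
    var = p * 1.0 + (1 - p) * 0.25 + p * (1 - p) * 0.25   # mixture variance
    t = np.sort((mixture_means(n, p, reps) - mu) / np.sqrt(var / n))
    ks = np.max(np.abs(norm.cdf(t) - np.arange(1, reps + 1) / reps))
    print(f"c = {c:4.1f}  p = {p:.3f}  Kolmogorov-type distance = {ks:.4f}")
```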
We shall also note in section 5 that these results extend to applications of the bootstrap. Indeed, all those properties of the bootstrap that are valid whenever a fixed sampled distribution is accurately approximated by its formal Edgeworth expansion (see e.g. Hall, 1992, Chapter 3), continue to hold for our mixture distribution, provided n^{-1} log n = o(p). Of course, these results are somewhat asymptotic in character, although the particularly small lower bound to the effective value of p suggests that in most cases the results will be available in practice. Numerical work in section 4.1 will bear this out. In a specific, practical problem an empirical method for determining whether p is sufficiently large is to explore the problem by Monte Carlo means: model the distribution of the smooth component of the sampled distribution, and, taking the mixing proportion equal to its naive estimate, simulate to ascertain the effect of discretisation error in the context of the model. In the case of specific component distributions (e.g. a normal smooth component and a Bernoulli lattice component) it can be shown that the constraint n^{-1} log n = o(p) is necessary as well as sufficient for formal Edgeworth approximation to be valid at all orders. In more general cases it is readily proved that the less stringent constraint n^{-1} = O(p) is not sufficient. Very similar results may be derived in the related problem of smoothing the distribution of an integer-valued random variable Y by adding to it, rather than mixing it with, a continuous component. That is, we replace Y by Y + εZ, where ε > 0 and Z has a continuous distribution. As long as ε = ε(n) decreases to 0 more slowly than n^{-1} log n, this modification allows us to approximate the distribution of the mean of Y + εZ by its formal Edgeworth expansion to any order;
see section 5.3. If the distribution of Z is symmetric then the distributions of both Y and Y + εZ have the same mean and skewness, and their variances differ only to order ε². Moreover, the "converse" results described in the previous paragraphs have direct analogues in the setting of additive smoothing of a discrete distribution.
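For the additive variant, a minimal illustration (ours; the Poisson data, the uniform jitter and the constant in ε are arbitrary) shows that jittering integer-valued observations by εZ, with ε decreasing only slightly more slowly than n^{-1}, leaves the first two sample moments essentially unchanged.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
Y = rng.poisson(3.0, size=n)                 # illustrative integer-valued data
eps = 2.0 * np.log(n) / n                    # decreases more slowly than n^{-1} log n requires
Z = rng.uniform(-1.0, 1.0, size=n)           # symmetric on [-1, 1], so E(eps * Z) = 0
X = Y + eps * Z                              # smoothed observations

print(Y.mean(), X.mean())                    # the two sample means agree to order eps
print(np.var(Y, ddof=1), np.var(X, ddof=1))  # the variances differ only at order eps^2
```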
3.2 Solution to second problem
Recall from section 1.3 that we seek a pointwise, (1 − α)-level confidence band for the distribution function F. We noted there that the standard normal-approximation band, F̂ ± {n^{-1}F̂(1 − F̂)}^{1/2} z_{α/2}, has only O(n^{-1/2}) coverage accuracy, owing to uncorrected discretisation errors. We suggest instead the smoothed band,

F̂_h ± {n^{-1}F̂_h(1 − F̂_h)}^{1/2} z_{α/2},   (3.1)

where F̂_h is as defined at (1.1). We shall show at the end of this section that by taking h = n^{-1}(log n)^{1+ε}, for any ε > 0, coverage error of this band is reduced to O(h). That is only a little worse than the O(n^{-1}) level encountered in related problems, where the sampled distribution is smooth. These properties are highly asymptotic in character, however. To achieve a good level of performance in practice we suggest the following approach. Using standard kernel methods, compute an estimator of the density f = F′ based on the sample U. For example, if employing the same kernel K as before, the estimator would be

f̂_{h_1}(u) = (nh_1)^{-1} Σ_{i=1}^{n} K{(u − U_i)/h_1},
where h_1 is a bandwidth the size of which is appropriate to density estimation. (In particular, h_1 would generally be computed using either cross-validation or a plug-in rule; it would be of size n^{-1/5}, in asymptotic terms.) Let F̂_{h_1}(u) = ∫_{-∞}^{u} f̂_{h_1}(v) dv denote the corresponding distribution estimator, and let F̂*_h denote the version of F̂_h computed from a resample drawn from the density f̂_{h_1}. Put

β_α(u, h) = P(F̂*_h(u) − [n^{-1}F̂*_h(u){1 − F̂*_h(u)}]^{1/2} z_{α/2} ≤ F̂_{h_1}(u) ≤ F̂*_h(u) + [n^{-1}F̂*_h(u){1 − F̂*_h(u)}]^{1/2} z_{α/2} | U).

Choose h = ĥ_α to render β_α(u, h) as close as possible to its nominal level over the interval I where we wish to construct the final confidence band. For example, we might select ĥ_α to minimise A_α(h), the integrated squared deviation of β_α(u, h) from the nominal level over I.
Our confidence band is that defined at (3.1), but with h = ĥ_α. If desired, an additional level of calibration can be incorporated by choosing (γ, h) = (γ̂, ĥ) simultaneously, to minimise A_γ(h), and taking the band to be that at (3.1) but with bandwidth ĥ and critical point z_{γ̂/2} (instead of z_{α/2}). Finally we outline a derivation of the theoretical properties claimed of the confidence band at (3.1). It will be shown in section 5.4 that if h decreases to 0 at a slower rate than n^{-1} log n, i.e. if
n h(n)/(log n) → ∞,   (3.2)
then the smoothed empirical distribution function estimator F̂_h, defined at (1.1), admits a formal Edgeworth expansion of any order k ≥ 1. That is, if Q_k = Q_{h,k} at (2.2) denotes the formal Edgeworth expansion of F̂_h(u) then the analogue of (2.3) holds for each k ≥ 1:

P(n^{1/2}[F̂_h(u) − F_h(u)]/σ_h(u) ≤ x) = Q_{h,k}(x) + o(n^{-k/2})   (3.3)

uniformly in x, where
F_h(u) = E{F̂_h(u)} = ∫ K(v) F(u − hv) dv,

σ_h(u)² = n var{F̂_h(u)} = ∫ L{(u − v)/h}² f(v) dv − F_h(u)².
If F″ exists and is bounded in a neighbourhood of u then F_h(u) = F(u) + O(h²) and σ_h(u)² = F(u){1 − F(u)} + O(h). Therefore, provided

n^{-1} log n ≪ h = O(n^{-1/2}),   (3.4)

(3.3) for k ≥ 2 implies that

P(n^{1/2}[F̂_h(u) − F(u)]/[F(u){1 − F(u)}]^{1/2} ≤ x) = Q_{h,k}(x) + O(h)

uniformly in x. It may be shown by Taylor expanding the argument of the probability that this implies

P(n^{1/2}[F̂_h(u) − F(u)]/[F_h(u){1 − F_h(u)}]^{1/2} ≤ x) = Q_{h,k}(x) + O(h).   (3.5)
Since the bandwidth h = n^{-1}(log n)^{1+ε} satisfies (3.4) then the claims made immediately below (3.1) follow from (3.5). Another advantage of our approach is that it leads to particularly simple derivations of detailed Edgeworth expansions. Indeed, once one appreciates that the problem can be posed in terms of sampling from a mixture, (3.3) immediately gives a simple form of the expansion, to arbitrarily high order. Deriving the expansion in more traditional form, with terms of orders n^{-i/2}h^j for i, j ≥ 0 (rather than simply n^{-i/2}), is only a matter of Taylor expanding the quantities σ_h and Q_{h,k} at (3.3). A different argument, based on intrinsic properties of the smoothed distribution function, was given by García-Soidán, González-Manteiga and Prada-Sánchez (1997). In addition to the complexity of that technique, it requires more severe conditions on the smoothness of K.
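The bandwidth-selection recipe described above can be prototyped roughly as follows. This is a sketch under simplifying assumptions of ours (a normal-scale rule and a Gaussian pilot kernel for f̂_{h_1}, a coarse grid of candidate bandwidths, a small bootstrap size B), not the authors' implementation.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

def L(t):
    """Epanechnikov kernel distribution function, as used in section 4.2."""
    t = np.asarray(t, dtype=float)
    return np.where(t < -1, 0.0, np.where(t > 1, 1.0, 0.75 * t - 0.25 * t**3 + 0.5))

def F_h(grid, sample, h):
    """Smoothed distribution estimator (1.1) evaluated on a grid of points."""
    return L((grid[:, None] - sample[None, :]) / h).mean(axis=1)

n, alpha, B = 50, 0.05, 200
z = norm.ppf(1 - alpha / 2)
U = rng.normal(size=n)                              # observed data (illustrative)

h1 = 1.06 * U.std(ddof=1) * n ** (-0.2)             # pilot bandwidth, normal-scale rule
grid = np.linspace(-1.5, 1.5, 31)                   # the interval I
F_h1 = norm.cdf((grid[:, None] - U[None, :]) / h1).mean(axis=1)   # integral of the pilot density

def coverage(h):
    hits = np.zeros(grid.size)
    for _ in range(B):
        # smoothed bootstrap: resample the data and add pilot-kernel noise
        star = rng.choice(U, size=n, replace=True) + h1 * rng.normal(size=n)
        Fs = F_h(grid, star, h)
        half = z * np.sqrt(Fs * (1 - Fs) / n)
        hits += (Fs - half <= F_h1) & (F_h1 <= Fs + half)
    return hits / B

candidates = (np.log(n) ** 2 / n) * np.array([0.5, 1.0, 2.0, 4.0])
A = [np.mean((coverage(h) - (1 - alpha)) ** 2) for h in candidates]
h_alpha = candidates[int(np.argmin(A))]
print("selected bandwidth:", h_alpha)
```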
4 Numerical Properties
4.1 Effects of different mixing proportions in the first problem
We conducted a simulation study to assess the effects of mixing proportions on coverage accuracy of two-sided confidence intervals based on either Studentised or non-Studentised means. We generated 1000 samples of sizes n = 10 and 20 from a mixture of a discrete Bernoulli distribution with probability of success 0.1 and different continuous distributions: chi-squared distributions with two, four and six degrees of freedom, and a standard normal distribution. Figure 1 graphs coverage probabilities for two-sided 95% confidence intervals in both Studentised and non-Studentised cases, where the endpoints of the intervals are taken to be X ± 1.96n- 1 / 2 (T and X ± 1.96n- 1 / 2 a, respectively, and a is the bootstrap standard deviation. Coverage accuracy in the non-Studentised case is high for even small proportions of continuous data, as argued in section 3.1. More difficulties are experienced in the Studentised case, however. There, increasing the proportion of continuous data has a more marked influence on coverage accuracy. Analogous results are obtained for one-sided confidence intervals, except that there the effect of the proportion of continuous data is confounded with the influence of skewness which now has a significant effect on coverage accuracy for different sample sizes.
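A condensed version of this experiment can be run as below. This is our own sketch: only the non-Studentised interval is simulated (so the true mixture variance replaces the bootstrap standard deviation), and the replication count is reduced.

```python
import numpy as np

rng = np.random.default_rng(4)

def coverage(n, prop, df, reps=2000):
    """Coverage of Xbar +/- 1.96*sigma/sqrt(n) for a Bernoulli(0.1)/chi-squared(df) mixture,
    where prop is the proportion of continuous (chi-squared) observations."""
    m = prop * df + (1 - prop) * 0.1                                   # mixture mean
    v = prop * 2 * df + (1 - prop) * 0.09 + prop * (1 - prop) * (df - 0.1) ** 2   # mixture variance
    cont = rng.random((reps, n)) < prop
    x = np.where(cont, rng.chisquare(df, (reps, n)), rng.binomial(1, 0.1, (reps, n)))
    xbar = x.mean(axis=1)
    return np.mean(np.abs(xbar - m) <= 1.96 * np.sqrt(v / n))

for prop in (0.1, 0.3, 0.5):
    print(prop, coverage(20, prop, df=2))
```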
4.2 Effect of different mixing proportions in the second problem
Numerical studies which are not detailed here show that for small bandwidths, before bias becomes a significant problem, coverages of smoothed confidence intervals for distribution functions increase monotonically with increasing bandwidth. This is a consequence of the variability of smoothed distribution estimators decreasing with increasing bandwidth. Confidence intervals usually, although not always, undercover when h = 0 and overcover when the bandwidth is taken to equal the value, h_MSE say, that gives least mean squared error for a given argument u of the distribution function. As the bandwidth is increased from h = 0 to h_MSE it typically passes through a value that, when used to construct a smoothed α-level confidence interval for F(u), gives zero coverage error. The bootstrap method suggested in section 3.2 produces an empirical approximation ĥ_α to this interval-optimal bandwidth. Table 1 gives numerical examples of the performance of ĥ_α. There we took F to be the standard normal distribution function, although results are similar in other cases; only u = 0, where the normal density has zero gradient and, consequently, the bias of a distribution estimator equals O(h⁴) rather than O(h²), is atypical. Columns of Table 1 give approximations to the true coverage of confidence intervals (obtained by averaging over 1000 samples, using B = 1500 bootstrap simulations) for different values of n. Rows express (a) the confidence interval using the bandwidth h = h_MSE that produces optimal pointwise
[Figure 1 comprises eight panels, for n = 10 and n = 20: chi-square continuous components with df = 2, 4 and 6, and an N(0,1) continuous component. Each panel plots coverage probability against the proportion of continuous data, from 0.1 to 0.5.]

Figure 1: Coverage probabilities of two-sided 95% confidence intervals. Solid and dotted lines show coverages of non-Studentised and Studentised intervals, respectively, for the mean of a mixture of a discrete Bernoulli distribution and a chi-squared or a normal distribution.
accuracy (PTWS); (b) the interval calculated using our bootstrap method (BOOT); and (c) the unsmoothed interval (UNSM). Except when u = 0 the coverage for the interval BOOT lies between its counterparts for PTWS and UNSM. In almost every case it is substantially closer to 0.95 than the coverages of either of the other two intervals. In our calculations we employed the distribution version of the Epanechnikov kernel, defined by L(t) = (3/4)t − (1/4)t³ + 1/2 for |t| ≤ 1, L(t) = 0 if t < −1 and L(t) = 1 if t > 1.
Method    u = 0.0            u = 0.75           u = 1.5
          n=20     n=50      n=20     n=50      n=20     n=50
BOOT      0.955    0.942     0.941    0.948     0.933    0.954
PTWS      0.990    0.983     0.987    0.986     0.965    0.980
UNSM      0.971    0.918     0.945    0.925     0.766    0.858

N(0,1): the standard normal distribution. Methods: BOOT, the interval using our bootstrap method; PTWS, the confidence interval using the bandwidth h = h_MSE that produces optimal pointwise accuracy; UNSM, the unsmoothed interval.

Table 1: Coverages of different confidence intervals for F(u). The distribution is standard normal, u denotes the argument at which F is estimated, and rows headed PTWS, BOOT and UNSM represent intervals using the pointwise-optimal bandwidth, the bandwidth ĥ_α suggested in section 3.2, and h = 0, respectively.
5 Technical Details
5.1 Mixture of discrete and continuous distribution
Let Y be a random variable with the property that its characteristic function ψ(t) = E(e^{itY}) satisfies Cramér's condition:

lim sup_{|t|→∞} |ψ(t)| < 1.   (5.1)
In particular, (5.1) holds if the distribution of Y is absolutely continuous. Let Z denote a random variable independent of Y and having any nondegenerate distribution, and let the distribution of X be a mixture of those of Y and Z in proportions p : 1 - p. We shall take p to be a function of sample size, since this allows us to explore the case where X = Z with very high probability. Thus,
X = Y with probability p = p(n), and X = Z with probability 1 − p.   (5.2)
Given this distribution of X, define the formal Edgeworth expansion Q_k as at (2.2), and put μ = E(X) and σ² = var(X). Note that all moments of X depend on n, through p(n).
Theorem 5.1. Assume the distribution of Y satisfies (5.1), and that the distribution of X is given by (5.2). Suppose too that the distribution of Z is nondegenerate, that

E(|Y| + |Z|)^{k+2} < ∞,   (5.3)

where k ≥ 1, and that

p(n) → 0 and lim_{n→∞} n p(n)/(log n) = ∞   (5.4)

as n → ∞. Then

P{n^{1/2}(X̄ − μ)/σ ≤ x} = Q_k(x) + o(n^{-k/2})   (5.5)

uniformly in x. Note particularly that (5.4) requires only a very small proportion, not much larger than O(n^{-1} log n), of the X_i's to be equal to the smoothly distributed Y_i's. Furthermore, the Edgeworth expansion at (5.5) involves no continuity-correction term. Therefore, "a small amount of smoothness goes a long way" in removing any effects of discreteness of the distribution of the sample mean.
5.2 Bootstrap form of Theorem 5.1
Let X* = {X_1*, ..., X_n*} denote a resample drawn by sampling randomly, with replacement, from X = {X_1, ..., X_n}. Let S² be the variance of X (defined using divisor n rather than n − 1), let X̄* denote the mean of X*, and let Q̂_k be the empirical form of Q_k, in which each population moment is replaced by its sample counterpart.
Theorem 5.2. Assume the conditions of Theorem 5.1. Then

P{n^{1/2}(X̄* − X̄)/S ≤ x | X} = Q̂_k(x) + o_p(n^{-k/2}),   (5.6)

uniformly in x. The first term in Q_k, of size n^{-1/2}, depends on only the first three moments of the distribution of X. Provided E(|X| + |Y|)^6 < ∞, these three moments differ from their sample counterparts only by order n^{-1/2}. Therefore, taking k ≥ 2 and subtracting (5.5) and (5.6), we deduce that

P{n^{1/2}(X̄* − X̄)/S ≤ x | X} − P{n^{1/2}(X̄ − μ)/σ ≤ x} = O_p(n^{-1}),

uniformly in x. This is the analogue of second-order correctness in the present setting: the bootstrap approximation to the distribution of the sample mean is accurate to order n^{-1}, not simply n^{-1/2} (as in a conventional normal approximation). Note particularly that this has been achieved through only a small amount of smoothing, by mixing a virtually arbitrary Z distribution with only a little more than proportion O(n^{-1} log n) of the relatively smooth Y distribution.
5.3 Variant of Theorem 5.1 for distribution smoothing
Let Y and Z be independent variables, as discussed in section 5.1, and in place of (5.2) put X = Y + εZ, where ε = ε(n) is nonrandom. For this definition of X let Q_k be the formal Edgeworth expansion as at (2.2).

Theorem 5.3. Assume the distributions of Y and Z satisfy (5.3), that X = Y + ε(n)Z, and that (5.4) holds with p(n) there replaced by ε(n). Then (5.5) holds.
5.4 Application to first and second problems
Application to the first problem is straightforward, provided the distribution of Z is nondegenerate. If the distribution is degenerate and the condition
p(n) is bounded away from 0   (5.7)

fails, then σ = σ(n) is not bounded away from 0, and this causes difficulties even in interpreting (5.5). In particular, if (5.7) fails then a formal Edgeworth expansion in powers of n^{-1/2} is no longer appropriate; it should instead be in powers of {np(n)}^{-1/2}. However, it is straightforward to show that if (5.7) holds then Theorems 5.1 and 5.2 remain valid when the condition that Z has a nondegenerate distribution is removed. Claims made in section 3.1, about properties of confidence intervals and bootstrap methods in the case of the "first problem" (see section 1.2), now follow directly from Theorems 5.1 and 5.2 and their counterparts for the Studentised mean, discussed in section 5.5. Next we consider allowing the distributions of Y and Z, and hence X, to vary with n. Theorems 5.1 and 5.2 continue to hold in this case, provided (a) the moment condition (5.3) is strengthened to

lim sup_{n→∞} E{|Y(n)| + |Z(n)|}^{k+2+ε} < ∞  for some ε > 0,   (5.8)

(b) the variance of Z is bounded away from 0 in the limit, i.e.

lim inf_{n→∞} var{Z(n)} > 0,   (5.9)

and (c) the smoothness condition (5.1) holds in a uniform sense, i.e.

lim sup_{|t|→∞} sup_{n≥1} |E[exp{itY(n)}]| < 1.   (5.10)
(The analogue of (5.9) for Y follows from (5.10).) Claims made in section 3.2, about performance of bootstrap methods in the case of the "second problem" (see sections 1.3 and 3.2), follow from Theorems 5.1 and 5.2 under these more general conditions. To appreciate why, note that if the kernel K whose integral equals L is compactly supported, and if the
Peter Hall and Xiao-Hua Zhou
183
distribution of the random variable U has a continuous density, then we may interpret X = L{(x - U)/h} as being of the form (5.2). In that representation, Y has the distribution of L{ (x - U) / h} conditional on x - h < U < x + h, and Z has a Bernoulli distribution with
P(Z = 1) = P{U < x − h | U ∉ (x − h, x + h)},  P(Z = 0) = 1 − P(Z = 1).
(Here we have assumed, without loss of generality, that the support of K equals [-1,1].) If in addition K is bounded and the distribution of U has a bounded density then (5.8)-(5.10) hold, and (5.4) is equivalent to (3.2).
5.5 Further generalisations and extensions
The theorems also apply to the case of the Studentised mean. There we should: (a) alter (5.5) to

P{n^{1/2}(X̄ − μ)/S ≤ x} = R_k(x) + o(n^{-k/2}),

where R_k is the formal Edgeworth expansion corresponding to the Studentised mean; (b) strengthen the moment condition (5.3) to

E(|Y| + |Z|)^{2k+4} < ∞;   (5.11)

and (c) change the smoothness assumption (5.1) to

lim sup_{|t|+|s|→∞} |E{exp(itY + isY²)}| < 1.   (5.12)
Alternatively, the original moment condition can be retained but a more restrictive smoothness assumption imposed; compare Hall (1987b). To clarify the differences between the formal Edgeworth expansions Q_k and R_k we note that R_k also admits a formula like (2.2), but with different polynomials π_j. In particular the polynomial π_1 now equals ⅙γ(2x² + 1), instead of ⅙γ(1 − x²). See Hall (1992, Chapter 2) for discussion of these issues.
Likewise, Theorems 5.1 and 5.2 can be extended to the so-called "smooth function model", where X is replaced by a smooth function of an r-vector of means. In this case the r-variate versions of (5.11) and (5.12) are sufficient. In each generalisation, condition (5.4) on the mixing proportion may be retained. Theorems 5.1 and 5.2 also continue to hold if, instead of defining X by (5.2), we take Xi = Yi for 1 ::; i ::; (np) , and Xi = Zi for (np) < i ::; n, where Y1 , Y 2, . .. and Z 1, Z2, . .. denote independent sequences of independent copies of Y and Z, respectively, where (np) denotes the integer part of np. None of the other assumptions needs to be altered; in particular, condition (5.4) on p = p(n) may be retained. However, these variants of the theorems appear to have relatively few statistical applications.
5.6 Outline proof of Theorem 5.1
The derivation is based on characteristic functions and Fourier inversion. It is similar to that in traditional cases (e.g. Petrov, 1975, Chapter 5), with
the exception of the method for bounding the difference, δ(t) say, between the characteristic functions of the left-hand side of (5.5) and of the term Q_k on the right-hand side. Using standard arguments one may obtain the bound |δ(t)| ≤ ξ n^{-k/2} exp(−ηt²) for |t| ≤ ζn^{1/2}, where ξ > 0 can be arbitrarily small and η, ζ > 0 depend on ξ but not on n. For |t| > ζn^{1/2} one may establish the bound C_2(1 − C_3)^{np(n)}, where C_2 > 0 and C_3 ∈ (0, 1) depend on ζ but not on n. Assuming p satisfies (5.4) we may deduce from these bounds, by taking ξ arbitrarily small, that the integral of |δ(t)| over the interval (−n^{C_4}, n^{C_4}), for any C_4 > 0, equals o(n^{-k/2}), as has to be shown in order to complete the proof.
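To see where a bound of the form C_2(1 − C_3)^{np(n)} comes from, the following elementary estimate (our sketch, using only the mixture structure (5.2) and Cramér's condition (5.1)) may help:
\[
  \bigl|E e^{iuX}\bigr| = \bigl|p\,\psi(u) + (1-p)\,E e^{iuZ}\bigr|
  \le 1 - p\bigl(1 - |\psi(u)|\bigr) \le 1 - p(1-\rho_\delta), \qquad |u| \ge \delta,
\]
where $\rho_\delta = \sup_{|u|\ge\delta}|\psi(u)| < 1$ by (5.1), together with the fact that $|\psi(u)| < 1$ for $u \ne 0$. The modulus of the characteristic function of $n^{1/2}(\bar X - \mu)/\sigma$ at the point $t$ equals $|E e^{iuX}|^{n}$ with $u = t/(n^{1/2}\sigma)$, so for $|t| > \zeta n^{1/2}$ it is at most
\[
  \{1 - p(1-\rho_{\zeta/\sigma})\}^{n} \le \exp\{-(1-\rho_{\zeta/\sigma})\,n\,p(n)\},
\]
which is $o(n^{-k/2})$ for every fixed $k$ whenever $n\,p(n)/\log n \to \infty$, as required in (5.4).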
The proof of Theorem 5.2 is similar, and may be based on arguments of Hall (1992, section 5.2). □

Peter Hall
Centre for Mathematics and its Applications
Australian National University
Canberra, ACT 0200, Australia

Xiao-Hua Zhou
Division of Biostatistics, Department of Medicine
Indiana University School of Medicine
RG/4th Floor Regenstrief Health Center
1050 Wishard Boulevard
Indianapolis, IN 46202, USA
Bibliography [1] Altman, N. and Leger, C. (1995). Bandwidth selection for kernel distribution function estimation. J. Statist. Plan. Infer. 46, 195-214. [2] Azzalini, A. (1981). A note on the estimation of a distribution function and quantiles by a kernel method. Biometrika 68, 326-328. [3] Azzalini, A. and Hall, P. (2000). Reducing variability using bootstrap methods with qualitative constraints. Biometrika, to appear. [4] Bhattacharya, R.N. (1967). Berry-Esseen bounds for the multi-dimensional central limit theorem. PhD Dissertation, University of Chicago. [5] Bhattacharya, R.N. (1968). Berry-Esseen bounds for the multi-dimensional central limit theorem. Bull. Amer. Math. Soc. 74, 285-287. [6] Bhattacharya, R.N. (1970). Rates of weak convergence for the multidimensional central limit theorem. Teor. Verojatnost. i Primenen 15, 69-85. [7] Bhattacharya, R.N. and Ghosh, J. K. (1978). On the validity of the formal Edgeworth expansion. Ann. Statist. 6, 434-451. [8] Bhattacharya, R.N. and Rao, R. Ranga (1976). Normal Approximation and Asymptotic Expansions. Wiley, New York. [9] Bowman, A.W., Hall, P. and Prvan, T. (1998). Cross-validation for the smoothing of distribution functions. Biometrika 85, 799-808.
[10] Clark, L.A., Cleveland, W.S., Denby, L. and Liu, C. (1997). Modeling customer survey data. Manuscript. [11] Cox, D.R. and Snell, E.J. (1979). On sampling and the estimation of rare errors. Biometrika 66, 125-132. Correction ibid 69 (1982), 491. [12] Falk, M. (1983). Relative efficiency and deficiency of kernel type estimators of smooth distribution functions. Statist. Neerl. 37, 73-83. [13] García-Soidán, P.H., González-Manteiga, W. and Prada-Sánchez, J.M. (1997). Edgeworth expansions for nonparametric distribution estimation with applications. J. Statist. Plann. Inf. 65, 213-231. [14] Hall, P. (1982). Improving the normal approximation when constructing one-sided confidence intervals for binomial or Poisson parameters. Biometrika 69, 647-652. [15] Hall, P. (1987a). On the bootstrap and continuity correction. J. Roy. Statist. Soc. Ser. B 49, 82-89. [16] Hall, P. (1987b). Edgeworth expansion for Student's t-statistic under minimal moment conditions. Ann. Probab. 15, 920-931. [17] Hall, P. (1992). The Bootstrap and Edgeworth Expansion. Springer, New York. [18] Mielniczuk, J., Sarda, P. and Vieu, P. (1989). Local data-driven bandwidth choice for density estimation. J. Statist. Plan. Infer. 23, 53-69. [19] Petrov, V.V. (1975). Sums of Independent Random Variables. Springer, Berlin. [20] Reiss, R.-D. (1981). Nonparametric estimation of smooth distribution functions. Scand. J. Statist. 8, 116-119. [21] Sarda, P. (1993). Smoothing parameter selection for smooth distribution functions. J. Statist. Plan. Infer. 35, 65-75. [22] Singh, K. (1981). On the asymptotic accuracy of Efron's bootstrap. Ann. Statist. 9, 1187-1195. [23] Zhou, X.H., Melfi, A. and Hui, S.L. (1997). Methods for comparison of cost data. Ann. Internal Med. 127, 752-756.
Survival Under Uncertainty in an Exchange Economy1 Nigar Hashimzade and Mukul Majumdar Cornell University
Abstract The paper explores a number of issues related to economic survival in market economies. An individual agent may fail to survive (may be ruined) if it faces a collapse of endowment or unfavorable terms of trade. The role of "intrinsic" and "extrinsic" uncertainty in triggering unfavorable terms of trade is examined in detail. In the presence of intrinsic uncertainty affecting the endowments, an important issue is the nature of stochastic dependence among the agents, particularly in a large economy.
1 Introduction
The last twenty years have witnessed a significant growth of the literature on the "survival problem" ([25], p.436), primarily in the context of the causes and remedies of famines. Once a subject essentially of empirical development economics, economic survival became an issue of analytical economics and, most recently, of general equilibrium theory. Considerable progress has been achieved in the theoretical analysis and empirical investigations of the causes of famines and policy measures to combat famines (see the collection edited by Dreze [10] and the detailed list of references). There has been a recognition that a partial equilibrium model, focusing on the food market, is unable to capture the complexity of events that result in famines, and may indeed render misleading policy prescriptions. It is better to turn to general equilibrium models with an explicit treatment of survival, for a better understanding of the relevant issues. Cast in a market economy framework, a formal analysis clearly indicates that an agent may fail to survive due to an "endowment failure" and/or "an adverse movement of the terms of trade" As Sen puts it in [25]' "... starvation is a matter of some people not having enough food to eat, and not a matter of there being not enough food to eat. While the latter can be a cause of the former, it is clearly one of many possible influences.,,2 The Ethiopian famine in 1972-74 and the famine in Bangladesh in 1974 provide striking examples of the "terms of trade" effect, examples in which a particular group of agents got "decimated by the market mechanism." (Sen [26]) The famine victims often belonged to the groups of non-food producers. These individuals had to acquire food in the market in exchange for their output (or labor), and, thus, were more vulnerable IThe paper is dedicated with affection and respect to Professor Rabi Bhattacharya. Thanks are due to Kaushik Basu, Steve Coate, David Easley, James Mirrlees, and Karl Shell for discussion and comments. All remaining errors are ours. 2 As a matter of fact, "Some of the worst famines have taken place with no significant decline in food availability per head." ([26], p.17)
to the shifts in the terms of trade affecting their food purchasing power (see also [20], p.14). Sen's entitlement approach elaborated in [24]- [26], as well as the model of Coles and Hammond [7] are examples of static, deterministic analysis of the survival problem in a general equilibrium framework. Uncertainty was formally introduced, and the survival probability was precisely defined in a static Walrasian model in Bhattacharya and Majumdar [4]. Here again, an agent may fail to survive ("is ruined") in a particular state of the environment for two reasons: a meager endowment in this state (a direct effect on the individual) and/or an "unfavorable" equilibrium price system at which the wealth of the agent falls short of the minimum expenditure (computed at the equilibrium price system) needed for survival (an indirect terms of trade effect involving the preferences and the endowments of all the agents). The main results of Bhattacharya and Majumdar [4] and Hashimzade [13] (reviewed briefly in Section 2) characterize the probability of survival in a "large" Walrasian economy, under alternative assumptions on the nature of dependence among economic agents, when the endowments depend on the state of the environment. In both these studies mentioned above the uncertainty is "intrinsic" , i. e. affects one of the "fundamental characteristics" (endowments) of an economy. But in a dynamic world, adverse term-of-trade effects may emerge from "extrinsic" uncertainty, which may influence current prices through self-fulfilling beliefs or expectations. Static models are obviously inadequate to deal with such a role of expectations. Risk-averse agents tend to smooth consumption over time, and their intertemporal consumption decisions depend on their expectations about future endowments and prices. These decisions, in their turn, typically affect current equilibrium prices, as well as the probability of survival. In Section 3 we explore the connection between survival and extrinsic uncertainty more formally by using the overlapping generations ("OLG") model (see [11] and [22]. A typical overlapping generations economy is an infinite horizon discrete-time economy with an infinite sequence of consumers, each living two periods. In every time period t there are "young" agents, born at t, and "old" agents, born at t -1. If young agents are endowed by consumption good(s), and old agents are endowed by nominal asset (fiat money), there is an opportunity for an inter-generational trade. We give an example of an overlapping generations economy in which an agent may be ruined even when the fundamentals (endowments and preferences) of the economy are not affected by uncertainty. Self-fulfilling beliefs of the agents based on "sunspots" may generate an adverse terms of trade, i. e. may lead to an equilibrium price system at which the consumption of old agents is below the minimum subsistence level. We note that there is already a vast literature on OLG models, following the seminal paper by Samuelson [22], and, in particular, on the role of extrinsic uncertainty (following the paper by Cass and Shell [5]), but neither this literature, nor the literature on the Arrow-Debreu model of complete markets treats the question of economic survival. In Section 4, we turn to the question of insurance against risk, and we explore the role of markets for securities in the survival problem. Lack of insurance and
financial markets and the very limited access to such markets for a vast number of agents characterize many developing countries. However, even the presence of complete markets for securities does not necessarily improve the chance of survival of an agent. Trade in securities allows us to achieve optimal allocation, when the set of securities is complete (an example is a complete set of Arrow securities [1]: suppose, there are two possible states of environment. Then a complete set of Arrow securities would be a set of two securities, each paying one monetary unit in one state and nothing in another). Even so, the optimal allocation can be such that the consumption of some agents falls below the survival threshold. We consider an economy where endowments of the agents are random, and the agents can trade a complete set of securities (in our example securities yield payoff denominated in a numeraire commodity, see [12]) to insure themselves against this type of intrinsic uncertainty. We show that trade in securities can, in fact, worsen survival prospects of the agents 3.
2 Equilibrium
In what follows, R_{++} is the set of positive real numbers, x = (x_k) ∈ R^l is nonnegative (written x ≥ 0) if x_k ≥ 0 for all k, and x is strictly positive (written x ≫ 0) if x ∈ R^l_{++}.
Consider, first, a deterministic Walrasian exchange economy with two goods. Assume that an agent i has an initial endowment e_i = (e_{i1}, e_{i2}) ≫ 0, and a Cobb-Douglas utility function

u_i(x_{i1}, x_{i2}) = x_{i1}^γ x_{i2}^{1−γ},   (2.1)

where 0 < γ < 1 and the pair (x_{i1}, x_{i2}) denotes the quantities of goods 1 and 2 consumed by agent i. Thus an agent i is described by a pair α_i = (γ, e_i). Let p be the price of the first good. In a Walrasian model with two goods, we can normalize prices so that (p, 1 − p) is the vector of prices accepted by all the agents. The typical agent solves the following maximization problem (P):

maximize u_i(x_{i1}, x_{i2})   (2.2)

subject to the "budget constraint" defined as p x_{i1} + (1 − p) x_{i2} ≤ W_i, where the income or wealth W_i of the i-th agent is defined as the value of its endowment computed at (p, 1 − p):

W_i = p e_{i1} + (1 − p) e_{i2}.   (2.3)

Solving the problem (P) one obtains the excess demand for the first good as:

ζ_{i1}(p, 1 − p) = γ W_i / p − e_{i1}.   (2.4)

³We are not addressing the issue of practical implementation of securities or insurance policies. The point of this exercise is to demonstrate that the traditional approach to the equilibrium in market economies fails to tackle the survival problem precisely because the usual concept of Pareto optimality ignores the notion of survival.
One can verify that (2.5) The total excess demand for the first good at the prices (p, 1 − p) in a Walrasian exchange economy with n agents is given by:

ζ_1(p, 1 − p) = Σ_{i=1}^{n} ζ_{i1}(p, 1 − p).   (2.6)
In view of (2.5) it also follows that (2.7) The "market clearing" Walrasian equilibrium price system is defined by

ζ_1(p_n*, 1 − p_n*) = 0,   (2.8)

and direct computation gives us the equilibrium price p_n* (we emphasize the dependence of the equilibrium price on the number of agents by writing p_n*) as:

p_n* = Σ_{i=1}^{n} X_i / [Σ_{i=1}^{n} X_i + Σ_{i=1}^{n} Y_i],   (2.9)

where

X_i ≡ γ e_{i2},  Y_i ≡ (1 − γ) e_{i1}.   (2.10)

To be sure, one can verify directly that demand equals supply in the market for the second good when the excess demand for the first good is zero. Finally, let us stress that a Walrasian economy is "informationally decentralized" in the sense that agent i has no information about (e_j) for i ≠ j. Thus it is not possible for agent i to compute the equilibrium price p_n*.
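The equilibrium price in this Cobb-Douglas economy can be computed and checked numerically. The sketch below is ours: the endowments and the value of γ are arbitrary illustrative choices, and the script simply verifies that total excess demand for the first good vanishes at the price implied by (2.9)-(2.10).

```python
import numpy as np

rng = np.random.default_rng(5)
gamma = 0.4
e = rng.uniform(0.5, 2.0, size=(100, 2))     # endowments (e_i1, e_i2) of 100 agents (illustrative)

# Equilibrium price of good 1, from aggregating the Cobb-Douglas excess demands (2.4):
A = gamma * e[:, 1].sum()
B = (1 - gamma) * e[:, 0].sum()
p_star = A / (A + B)

# Market clearing check: total excess demand for good 1 at (p_star, 1 - p_star) is ~ 0.
W = p_star * e[:, 0] + (1 - p_star) * e[:, 1]
excess_good1 = (gamma * W / p_star - e[:, 0]).sum()
print(p_star, excess_good1)
```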
2.1 Survival
In order to provide the motivation for our formal approach, we recall the basic elements of Amartya Sen's analysis ([26], Appendix A) in our notation. Let F_i be a (nonempty) closed subset of R²_{++}. We interpret F_i as the set of all combinations of the two goods that enable the i-th agent to survive. Now, given a price system (p, 1 − p), one can define a function m_i(p) as

m_i(p) = inf{p x_1 + (1 − p) x_2 : (x_1, x_2) ∈ F_i}.   (2.11)

Thus, m_i(p) is readily interpreted as the minimum expenditure needed for survival at prices (p, 1 − p). Example: Let (a_{i1}, a_{i2}) ≫ 0 be a fixed element of R²_{++}. Let

F_i = {(x_{i1}, x_{i2}) ∈ R²_{++} : x_{i1} ≥ a_{i1}, x_{i2} ≥ a_{i2}}.   (2.12)
In our approach we do not deal with the set F_i explicitly. Instead, let us suppose that, in addition to its utility function and endowment vector, each agent i is characterized by a continuous function m_i(p) : [0, 1] → R_{++}, and say that for an agent to survive at prices (p, 1 − p), its wealth W_i(p) (see (2.3)) must exceed m_i(p). Hence, the i-th agent fails to survive (or, is ruined) at the Walrasian equilibrium (p_n*, 1 − p_n*) if

W_i(p_n*) < m_i(p_n*),   (2.13)

or, using the definition (2.3),

p_n* e_{i1} + (1 − p_n*) e_{i2} < m_i(p_n*).   (2.14)

From (2.13) and (2.14) one can see that an agent may face ruin due to (a) a possible endowment failure or (b) the equilibrium price system adversely affecting its wealth relative to the minimum expenditure. This issue is linked to the literature on the "price" and "welfare" effects of a change in the endowment on a deterministic Walrasian equilibrium (see the review of the transfer problem by Majumdar and Mitra [17]). Observe that in our economy even with exact information on the total endowment (Σ_{i=1}^{n} e_{i1}) of the first good ("food"), it is not possible to figure out how many agents may starve in equilibrium, in the absence of detailed information on the pattern of (e_i, m_i) (and the formula (2.9)).
2.2 Intrinsic uncertainty: computing the probability of ruin
Let us introduce uncertainty. Suppose that the endowments e_i of the agents (i = 1, 2, ..., n) are random variables. In other words, each e_i is a (measurable) mapping from a probability space (Ω, F, P) into the non-negative orthant of R². One interprets Ω as the set of all possible states of environment, and e_i(ω) is the endowment of agent i in the particular state ω. The distribution of e_i(·) is denoted by μ_i [formally each μ_i is a probability measure on the Borel σ-field of R², its support being a nonempty subset of the strictly positive orthant of R²]. From the expression (2.9), the "market clearing" equilibrium price p_n*(ω) is random, i.e., depends on ω:

p_n*(ω) = Σ_{i=1}^{n} X_i(ω) / [Σ_{i=1}^{n} X_i(ω) + Σ_{i=1}^{n} Y_i(ω)].   (2.15)

The wealth W_i(p_n*(ω)) of agent i at p_n*(ω) is simply p_n*(ω)e_{i1}(ω) + [1 − p_n*(ω)]e_{i2}(ω). The event

R_n^i = {ω ∈ Ω : W_i(p_n*(ω)) < m_i(p_n*(ω))}

is the set of all states of the environment in which agent i does not survive. Again, from the definition of the event R_n^i it is clear that an agent may be ruined due to a meager endowment vector in a particular state of environment.
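The probability of the ruin event just defined is straightforward to approximate by simulation. The sketch below is purely illustrative (lognormal endowments, a constant survival expenditure m_i, and a particular γ are our assumptions, not part of the model above).

```python
import numpy as np

rng = np.random.default_rng(6)
gamma, n, reps = 0.4, 200, 5000

def ruin_probability(m_i):
    """Monte Carlo estimate of P(R_n^i) for agent 1: its wealth at the random
    equilibrium price falls below a survival expenditure m_i (taken constant here)."""
    e = rng.lognormal(mean=0.0, sigma=0.5, size=(reps, n, 2))   # random endowments, state by state
    A = gamma * e[:, :, 1].sum(axis=1)
    B = (1 - gamma) * e[:, :, 0].sum(axis=1)
    p = A / (A + B)                                             # equilibrium price (2.15)
    wealth_1 = p * e[:, 0, 0] + (1 - p) * e[:, 0, 1]            # agent 1's wealth
    return np.mean(wealth_1 < m_i)

print(ruin_probability(m_i=1.0))
```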
In what follows, we shall refer to this situation as a "direct" effect of endowment uncertainty or as an "individual" risk of ruin. But it is also possible for ruin to occur through an unfavorable movement of the equilibrium prices (terms of trade) even when there is no change (or perhaps an increase!) in the endowment vector. A Walrasian equilibrium price system reflects the entire pattern of endowment that emerges in a particular state of the environment. Given the role of the price system in determining the wealth of an agent and the minimum expenditure needed for survival, this possibility of ruin through adverse terms of trade can be viewed as an "indirect" ("terms of trade") effect of endowment uncertainty. To begin with let us make the following assumptions:
A2. {Xi} are uncorrelated, {Yi} are uncorrelated.
A3. [(1/n) Σ_i EX_i] converges to some π_1 > 0, and [(1/n) Σ_i EY_i] converges to some π_2 > 0, as n tends to infinity.
In the special case when the distributions of e_i are the same for all i (so that (1/n) Σ_i EX_i = π_1, where π_1 is the common expectation of all X_i; similarly for π_2), A3 is satisfied. Under A1-A3, if the number n of agents increases to infinity, as a consequence of the weak law of large numbers (see Lamperti [15], p.22) we have the following property of equilibrium prices p_n*:
Proposition 1. Under A1-A3, as n tends to infinity, p_n*(ω) converges in probability to the constant

p_0 = π_1 / (π_1 + π_2).   (2.16)
Proposition 2. function,
If POeil(w)
+ (1
- po)ei2(w) has a continuous distribution
(2.17)
Remark: The probability on the right side of (2.17) does not depend on n, and is determined by J-li, a characteristic of agent i, and PO. Our first task is to characterize P(R~) when n is large (so that the assumption that an individual agent accepts market prices as given is realistic). One is tempted to conjecture that the convergence property of Proposition 1 will continue to hold if correlation among agents becomes 'negligible' as the size of the economy increases. We shall indicate a 'typical' result that captures such intuition. Proposition 3. Let the assumptions (Ai) and (AS) hold. Moreover, assume
(A.2') There exist two non-negative sequences (£kh2:o, (£~)k2:0 both converging to zero such that for all i, k
ICOV(Xi,Xi+k)1 ICov(Yi, Yi+k)1 Then, as n tends to infinity,
p~ (w)
< £k <
£~
converges in probability to the constant
Po = 7r1/[ 7r 1 + 7r2]
2.3
Some comments on Walrasian equilibria
The analysis so far is deceptively simple for one primary reason. Once one dispenses with the Cobb-Douglas functional form, one loses the formula (2.15) in which a unique equilibrium in every w is conveniently computed. A more general treatment - unavoidably more technical - is in [4] which contains the proofs of Propositions 1-3 above, and Proposition 4 below. In a more general framework with 1 2 2 goods (see [2] and [9] for a classical exposition of the deterministic Walrasian equilibrium theory), we begin with the price simplex S
= {p = (Pk)
E Rl : p
> 0, tPk = I}. An agent i accepts k=1
the price system pES as given. It is described by a pair (Ii, ei), where the endowment vector ei E Rl, ei »0. The wealth of the agent i at P is I
P . ei == L Pkeik· The demand function fi is a continuous function from k=1 S x R++ to R~ such that for every (p, Wi) E S x R++, P . fi(P, Wi) = Wi
Wi
=
I
(where p. fi(P, Wi) - LPkfidp, Wi)). Usually the demand functions are de-
k=1
rived from a utility maximization problem of type (P) indicated above. For our analysis, the key concepts are the excess demand function of agent i, defined as (i(p) - Ii(p, Wi) - ei (compare to (2.4)). The excess demand function I
for the economy is ((p) =
L (i(p), i=1
a continuous function on S.
Note that
Survival under Uncertainty in an Exchange Economy
194 I
LPk(ik(P)
= 0;
hence, the excess demand function for the economy satisfies
k=l
the "Walras Law":
I
p. ((p)
==
LPk(k(P)
= O.
(2.18)
k=l
An equilibrium price system p* E S satisfies ((p*) = O. By Walras Law (2.18), iffor any fJ E S (k(fJ) = 0 for k = 1,··· ,l-1 then necessarily (dfJ) = 0 for k = l. The Walras Law (2.18) can be verified directly from (2.3) and (2.4) in our example, and when the equilibrium price (2.9) is derived for the first market, there is also equilibrium in the second market which can be directly checked. A detailed exposition of this model with l 2: 2 commodities is in Debreu [9]. In [3] the Debreu model was extended to introduce random preferences and endowments, and the implications of the law of large numbers and the central limit theorem were first systematically explored. Throughout this section we shall consider l = 2 to see the main results in the simplest form.
2.4
Dependence: Exchangeability
We shall now see that if dependence among agents does not "disappear" even when the economy is large, the risk of ruin due to the "indirect" terms of trade effect of uncertainty may remain significant. To capture this in a simple manner, let us say that /1 and v are two possible probability laws of {ei (.) h::O-l. Think of Nature conducting an experiment with two outcomes "H" and "T" with probabilities (e, 1 - e), 0 < e < 1. Conditionally, given that "H" shows up, the sequence {ei (.) h>l is independent and identically distributed with common distribution /1. On the other hand, conditionally given that "T" shows up, the sequence {ei (-) h::o- I is independent and identically distributed with comon distribution v. Let 7rlp, and 7rlv be the expected values of Xl under /1 and v respectively. Similarly, let 7r2p, and 7r2v be the expected values of YI under /1 and v. It follows that Pn (-) converges to Po (-) almost surely, where Po (-) = 7flp,/[7flp, + 7f2p,] = Pop, with probability and Po(-) = 7rlv/[7flv +7r2v] = POv with probability 1 - e. We now have a precise characterization of the probabilities of ruin as n tends to infinity. To state it, write
e
J ri (/1)
{(UI,U2) E R! : POp,UI
l
+
(1- POp,)U2 :::; mi(POp,)};
(2.19)
lL (dU I, dU2).
Similarly, define ri(v) obtained on replacing /1 by v in (2.19). Proposition 4. Assume that POeil (w) + (1- PO)ei2(w) had a continuous distribution function under each distribution /1 and v of ei = (eil' ei2).
(a) Then, as the number of agents n goes to infinity, the probability of ruin of the i-th agent converges to ri (/1), with probability to ri(v), with probability 1 when "T" occurs.
e
e,
when "H" occurs and
Nigar Hashimzade and Muku] Majumdar
195
(b) The overall, or unconditional, probability of ruin converges to
Here, the precise limit distribution is slightly more complicated, but the important distinction from the case of independence (or, "near independence") is that the limit depends not just on the individual uncertainties captured by the distributions j1 and v of an agent's endowments, but also on () that retains an influence on the distribution of prices even with large n.
2.5
Dependency neighborhoods
Dependency neighborhoods were introduced by Stein [28] and are defined in the following way. Consider a set of n random agents. A subset Si of the set of integers {I, 2, ... , n} containing an agent i is a dependency neighborhood of i if i is independent of all agents not in Si. The sets Si need not constitute a partition. Further, consider a dependency neighborhood of Si - a set Ni such that Si N i , and the collection of agents in Si is independent of the collection of agents not in N i . The latter can be viewed as the second-order dependency neighborhood of the agent i. In general,
c:
Ni =
U Sj
(2.20)
{jEsd need not be the case (this is related to the fact that pairwise independence does not imply mutual independence), although one might expect this relation to hold in non-exotic situations (see, for example, [21]). Consider now an economy En with dependency neighborhoods sin), ... ,S~n) for each of n agents. As above, the i-th agent is characterized by Qi = (/, ei), where ei = (eil' ei2). The Walrasian equilibrium price p~ is given by (2.9)-(2.10). The convergence property, similar to Proposition 3, holds under modified assumptions on the distribution of random endowments and an additional assumption on the size of the dependency neighborhood. Proposition 5. Let the assumptions (A 1) and (A 3) hold. Moreover, assume
max 1COV(Zi' Zj) 1< B < 00 , Z E {X, Y}, for every i = i#jESt) 1, ... ,n uniformly in n for some sufficienly large positive B. (A.2") Bni ==
(A.4) Sn - . max US;n):::; n 1 - c uniformly in n for some ~==1, ..
E
E (0,1).
. ,n
Then, as n tends to infinity, p~ (w) converges in probability to p lim p~ (w) =
7f1 7f1
+ 7f2
Using the results of Majumdar and Rotar [19], we can construct approximate distribution of equilibrium price in a large Walrasian economy.
.
Survival under Uncertainty in an Exchange Economy
196
Proposition 6. Let the assumptions (A.1), (A.2"), (A.3) and (A.4) hold. Let us also assume that (2.20) holds for the dependency neighborhoods structure. Then the distribution of P~ (w) can be approximated by normal distribution with mean Po and variance Vn defined as
Po
(2.21 ) (2.22)
(See [13] for proofs.)
3
Extrinsic uncertainty with overlapping generations: an example
In the previous section we assumed that endowments of the agents are different in different states of environment. This type of uncertainty, that affects the socalled fundamentals of the economy (endowments, preferences, and technology), is called the intrinsic uncertainty. When the uncertainty affects the beliefs of the agents (for example, the agents believe that market prices depend on some "sunspots") whereas the fundamentals are the same in all states, this type of uncertainty is called extrinsic uncertainty. Clearly, with respect to the probability of survival, the extrinsic uncertainty has no direct effect, because it does not affect the endowments. However, it may have an indirect effect: self-fulfilling beliefs of the agents regarding market prices affect their wealth, and some agents may be ruined in one state of environment and survive in some other state, even though the fundamentals of the economy are the same in all states. To study the indirect, or the adverse term-of-trade effect of extrinsic uncertainty on survival we turn to a dynamic economy. Consider a discrete time, infinite horizon OLG economy with constant population. We use Gale's terminology [11] wherever appropriate. For expository simplicity, and without loss of generality we assume that at the beginning of every time period t = 1,2, ... there are two agents: one "young" born in t, and one "old" born in t - 1. In period t = 1 there is one old agent of generation O. There is one (perishable) consumption good in every period. The agent born in t (generation t) receives an endowment vector et = (e¥, en and consumes a We consider the Samuelson case 4 and assume, without loss vector Ct = (c¥, of generality, et = (1,0). We assume that the preferences of the agent of generation t can be represented by expected utility function Ut (-) = E rut (Ct)] with Bernoulli utility ut (Ct), continuously differentiable and almost everywhere twice continuously differentiable, strictly concave and strictly monotone onn D, compact, convex subset of R~+. The old agent of generation 0 is endowed with one
cn.
4If a population grows geometrically at the rate ,,(, so that "(t agents is born in period t, and there is only one good in each period, the Samuelson case corresponds to marginal rate of intertemporal substitution of consumption under autarky, Ul (e Y , eO)/U2 (e Y , eO), being less than 'Y- In our case "( = 1.
Nigar Hashimzade and Mukul Majumdar
197
unit of fiat money, the only nominal asset in the economy. In every period the market for the perishable consumption good is open and accessible to all agents. Denote the nominal price of the consumption good at time t by Pt. Define a price system to be a sequence of positive numbers, p = {Pt} ~o, a consumption program to be a sequence of pairs of positive numbers c = {Ct} ~o, a feasible program to be a consumption program that satisfies cf + Cf-l ~ ef + ef-I = 1. The agent of generation t maximizes his lifetime expected utility in the beginning of period t. In period 1, the young agent gives its saving (sf) of the consumption good, to the old agent in exchange for one unit of money (the exchange rate is determined by PI)' Thus, PI Sl = 1. This unit of money is carried into period 2 (the old age of agent born in period 1) and is exchanged (at the rate determined by P2) for the consumption food saved by the young agent born in period 2 (s~). The process is repeated.
3.1
Perfect Foresight Equilibrium
If there is no uncertainty, with perfect foresight the price-taking young agent's optimization problem is the following:
maxU(cf,
cn
subject to
cf cf (0
~
sf
~
1-
sf
ptsf /PHI
1, t = 1,2, ... ).
Here, sf - ef - cf is savings of the young agent (this is the Samuelson case, in Gale's definitions [11]). A perfect foresight competitive equilibrium is defined as a feasible program and a price system such that (i) the consumption program c = {cd solves optimization problem of each agent given p = {j:Jt} : (cr, cf) E V, cr = 1 - St and cf = '[JtSt/PHl with St
= arg
max U
O~s¥~1
((1-
sf)
,sf _Pt
)
Pt+1
and (ii) the market for consumption good clears in every period:
c¥ + Cf-l PtSt
for t
=
1,2,···.
1 1
(demand (demand
supply for the consumption good) supply for money)
Survival under Uncertainty in an Exchange Economy
198
By strict concavity of the utility function U (c¥, cn, the young agent's optimization problem has a unique solution. Hence, we can express St as a single-valued function of pt! Pt+ 1, i. e. we write St = St (Pt! PH d. This function (called savings function) generates an offer curve in the space of net trades, as price ratios vary. In the perfect foresight equilibrium (3.1)
The stationary perfect foresight monetary equilibrium is a sequence of constant prices P and constant consumption programs (1 - s, s), where s = s(1).5
3.2
Sunspot equilibrium
Now consider an extrinsic uncertainty in this economy. There is no uncertainty in fundamentals, such as endowments and preferences, but the agents believe that market prices depend on realization of an extrinsic random variable (sunspot). We assume that there is one-to-one mapping from the sunspot variable to price of the consumption good. Because the agents cannot observe future sunspots, they maximize expected utility over all possible future realization of the states of nature. We examine the situation with two states of nature, (J" E {a, ,6}, that follow a first-order Markov process with stationary transition probabilities, (3.2)
where 7ffJfJl > 0 is the probability of being in state (J"' in the next period given that current state is (J", and 7ffJOi + 7f fJ (3 = 1. A young agent born in t observes price pf and solves the following optimization problem: max [7f fJOi U(c¥,fJ, c~'Oi)
+ 7f fJ (3U(c¥,fJ, c~,(3)]
subject to 1- sf
(0 ~
sf
~ 1, s(::::: 0,
(J",(J"'
E
{a,,6}).
We restrict our attention to stationary equilibria, in which prices depend on the current realization of the state of nature (J", and do not depend on the calendar time nor the history of (J". A stationary sunspot equilibrium, SSE, is a pair of feasible programs and nominal prices, such that for every (J" E {a,,6} (i) the consumption programs solve the agents' optimization problem: sfJ (pfJ / pfJ l ) =
arg max [7ffJOiU ((1 - sfJ), sfJpfJ /pOi) 0<80-<1
+ (3.3)
5Given our assumptions on preferences and endowments, the stationary perfect foresight monetary equilibrium exists and is optimal (see, for example, [16], Ch. 8).
Nigar Hashimzade and Mukul Majumdar
199
and (ii) markets clear in every period, in every state.
It is easy to see that a stationary sunspot equilibrium exists when the equation pO: sO: (po:) _ pf3 pf3
s(3
(pf3) = 0 pO:
(3.4)
has positive solutions for pO: /pf3 other than 1 . Solution pO: /Pf3 = 1 corresponds to the equilibrium in which uncertainty does not matter. It can be shown that, if sunspot equlibria exist in this economy, there is at least two of them, with pO: /pf3 > 1 and pO: /pf3 < 1 (see, for example, [6], [27]). This means that in the sunspot equilibrium consumption of old agents is above the certainty equilibrium consumption of olds in one state of nature and below in the other. Suppose, we introduce an exogenous minimal subsistence level of consumption (independent of a E {oo,,B}). It may be the case that in one of the states of nature consumption of old agents falls short of minimal subsistence level: old agents are ruined. Note that the endowments are not affected by the uncertainty, and, therefore, there is no direct effect of uncertainty on ruin. The event of ruin is caused purely by an indirect, or term-of-trade effect: the equilibrium price system is such that the wealth of old agents does not allow them to survive. The following numerical example illustrates this possibility for the case of quadratic utility.
3.3
Ruin In equilibrium
Let the preferences of the agents be represented by expected utility function with
U(c)
u(eY,e O )
-
v(eO )
1 1 2aveYeo + q eY + reo - -b( eY)2 - -d( CO? 2 2 2 ~ (A - eO) , 0 < CO ::::; A { 0, eO> A where a, b, e, q, r, 0, A are positive constants such that the utility function is increasing and jointly concave in its arguments in V. v(·) is the disutility of consuming less than A, the minimal subsistence level. 6 As above, agents in each generation receive identical positive endowments e = 1 of consumption good when young and zero endowments when old; the initial olds are endowed with one unit of money. 6It may seem odd that the disutility from starvation is finite, but this can be justified by the willingness of the agents to take a risk. Consider the following. In the continuous time, if the consumption of an old agent is above A, he lives to the end of the second period. If his consumption is below A, perhaps, he does not die immediately. Albeit low, the amount consumed allows him to live some time in the second period, and his lifespan in the second period is the longer, the closer is his consumption to A. In the discrete time this translates into probability of survival in the second period as a function of consumption. Thus, the
200
3.3.1
Survival under Uncertainty in an Exchange Economy
Benchmark case: perfect foresight
For the above preferences, savings function St (pt/ Pt+!) is implicitly defined by (3.5) where Pt == Pt/Pt+l. The offer curve is described by
In the stationary (deterministic) perfect foresight monetary equilibrium consumption plan of an agent is (x, 1 - x), where x solves
a(
3.3.2
J~ t : 1
x -
+ x (b+ d) + v'(I- x) + q -
x)
r - b= 0
(:1.7)
Stationary sunspot equilibria
Two states of nature, 0: and f3 evolve according to a stationary first-order Markov process. The states of nature do not affect the endowments. Agents can trade their real and nominal assets. In a stationary sunspot equilibrium with trade sa, sf3 solve the following system of equations: 7f Oo
7f oo
a(Jl-;~a +
aJ l~:a + (1 - 7f00) aJ l~:a +
r - ds o
-
V'(SO))
q - b (1 - SO) =
+ (1_7f 00 ) (aJl~r + r -
(3.8)
d s f3 - V'(Sf3)) ::
and
It is easy to see that one solution is sa = sf3 = 1- x, where x solves the equation for the perfect foresight above. This solution does not depend on the transition probabilities, prices and consumption are not affected by the uncertainty: sunspots do not matter in this equilibrium. However, there may be more solutions. For example, for a = 2, b = 0.5, d = 7, q = 0.02, r = 0.6 (J = 0.05 A = 0.3 and n CW = n f3f3 = 0.15 there are three stationary monetary equilibria in the economy: one coinciding with the perfect foresight equilibrium and two sunspot equilibria. Prices and consumption programs for these equilibria are given in the following table. old agent survives with probability 1 if CO ;:;:. A and with probability less than 1 if CO < A. Suppose, the objective of the agent is to maximize the probability of survival (or maximize his expected lifespan). Then it can be presented equivalently as the objective to minimize the disutility from consumption at the level below A. Clearly, this disutility can be finite, at least in the vicinity of A, if the agent is willing to take a risk. The authors are indebted to David Easley for this argument.
Nigar Hashimzade and Mukul Majumdar
201
State
PFE
1st SSE
2nd SSE
0:
(0.6670; 0.3330; 3.00)
(0.5973; 0.4027; 2.48)
(0.7518; 0.2482; 4.03)
(3
(0.6670; 0.3330; 3.00)
(0.7518; 0.2482; 4.03)
(0.5973; 0.4027; 2.48)
(In every entry, the first number is consumption of young, the second is consumption of old, and the third is nominal price of consumption good.) The consumption programs in sunspot equilibria are Pareto inferior to the program in the perfect foresight equilibrium. Furthermore, in two sunspot equilibria old agents survive in one state of nature and fail to survive in another with the same amount of resources, because equilibrium price is too high. (We intentionally considered the case where agents survive in the certainty equilibrium to demonstrate that survival is always feasible. Also, in this model young agents always survive, - otherwise, the overlapping generations structure collapses.)
4
Insurance and survival
The purpose of the following examples is to demonstrate that trade in securities does not guarantee survival of all agents. Furthermore, trade in securities can even deteriorate the survival chances of some agents. For expositionary simplicity, we consider a static Cobb-Douglas-Sen economy, similar to the one described in Section 2.
4.1
Static economy with two states: definitions
Let us first restate the definitions of a stochastic general equilibrium concept in a Cobb-Douglas-Sen economy with logarithmic preferences for a particular case of two possible states of environment. Consider a pure exchange economy with two goods, l E {1,2}, with good 1 being a numeraire. There are two states of nature, sEn = {o:, (3}, with 7r = P[s = 0:] = 1 - P[s = (3]. Two consumers, i E {1,2}, receive endowments ei(s) = (eil(s),ei2(s)) E Each consumer is characterized by the Cobb-Douglas logarithmic utility function:
Ri·
(4.1) In addition, each consumer is characterized by the minimum expenditure function, mi (P* (.)), the level of wealth at and below which consumer i fails to survive in the equlibrium with (random, normalized) equilibrium price vector (l,p*(·)). Consumers maximize utility in every state, taking price as given. A random equilibrium is defined as a set of vectors of allocations, {Xi (s ) }, and prices, p* (s) for each state of nature, such that
Survival under Uncertainty in an Exchange Economy
202
1. Given normalized price vector (l,p*(s)) in state s, consumption vector Xi(S) = (Xi1(S),Xi2(S)) maximizes utility of consumer i in state s subject to his budget constraint, xi1(s) + P*(S)Xi2(S) :::; eil(s) + p*(s)ei2(s), for every i and s; 2. Markets for consumtpion goods clear in every state. If we allow "( (the parameter in Cobb-Douglas preferences) vary across the consumers, the equilibrium price in state s will be
Hence, wealth of consumer i in state s is
Assume, for simplicity, that the minimum expenditure function is the same for all agents and has linear form:
m(p*(s))
=
ao
+ p*(s)al
for some positive constants ao and al. Then, consumer i is ruined in state s if
If this inequality holds for consumer i for s = Q' only, then consumer i is ruined with probability 7r. If it holds for s = f3 only, then i is ruined with probability (1 ~ 7f). If it holds for consumer i in both states, then i is ruined with probability 1. Suppose, consumers know 7f. The question is, if consumers could trade securities before s is realized, would this improve their chances to survive?
4.2
Arrow-type securities in a two-period economy
Assume now, that in the economy described in Section 4.1 there are two time periods, t = 0,1. Let the preferences of the consumers be described by von Neumann-Morgenstern expected utility function, with Bernoulli utility in the log Cobb-Douglas form (4.1), with "( varying across consumers. At t = 0 consumers can issue and trade contracts in real Arrow-type securities. At t = 1 consumers receive their endowments, execute the contracts and trade consumption goods. Markets for securities are complete: for every state of nature there is a security that promises to deliver at t = lone unit of numernire good if this particular state occurs, and nothing in other states (see [23] and [12] for a more general exposition). Denote the holdings of security that pays in state s by for consumer i; E R. Consumers know probability distribution
yt
yt
Nigar Hashimzade and Mukul Majumdar
203
of the states of nature. In time period t = 0 they choose holdings of securities, or portfolios, (yt, yf) to maximize expected utility of consumption in time period t = 1. We normalize price of the asset that pays in state a to unity and denote price of the asset that pays in state f3 by q. A random equilibrium with complete asset markets is a set of vectors of portfolios {(yt,yf)}, allocations {Xi(S)}, security prices (l,q) and consumption good prices (l,p(s)) for each state of nature, such that 1. Given asset prices (1, q) and normalized consumption good price vector
(1, p( S )) in state s, portfolio (yt, yf) and consumption vector Xi (s) = (Xil(S),Xi2(S)) maximize expected utility of consumer i at t = 0 subject to his budget constraints at t = 0, yt + qyf s: 0, at t = 1, XiI (s) + P*(S)Xi2(S) s: ei1(s) + p*(s)ei2(S) + Yi, for every i and s; 2. Asset markets clear at t = 0; 3. Markets for consumption goods clear at t
= 1 in every state.
Routine calculations give the following expressions for equilibrium prices: q
p(f3) and
_()
p a
=
1- 7r El(a) 7r EI (f3)
----
p(a) E2(a)El (f3) EI (a)E2(f3)
Li(l-7rl'i)eil(a) - (1-7r) Lil'ieil(f3)E1(a)/E1(f3) 7r Li l'i ei2(a) + (1 - 7r) Li l'i ei2(f3)E2 (a)/ E 2 (f3) .
Here, EI(S) == Li eli(s) is aggregate endowment of good I in state s. Wealth (in terms of the numeraire) of consumer i at t = 1 is then
-
El (f3) -
Note, that Wi (f3) = -(-)Wi(a), which means that if there is no aggregate E1a uncertainty in the endowment of numeraire, wealth is equalized across states. If there is no aggregate uncertainty in the endowments of both goods, relative price of consumption goods is also equalized across states. Then p = p( a) = p(f3) will be between p* (a) and p* (f3) and Wi = Wi (a) = Wi (f3) will be between Wi ( a) and Wi (f3). For the minimum expenditure function in the above form, we will also have that mi (p) = mi (p( a)) = mi (p(f3)) will be between mi (p* (a) ) and mi (P* (f3)). Could it happen that wealth of a consumer in a particular state falls below the minimum subsistence level in an economy with securities, whereas without securities his wealth in the same state is above the minimum subsistence level?
Survival under Uncertainty in an Exchange Economy
204
The following simple numerical examples demonstrate this possibility for the case with no aggregate uncertainty and for the case with aggregate uncertainty in endowments.
4.2.1
Example A: No Aggregate Uncertainty
Consider an economy with two consumers, i E {I,2}. Let the preferences of these two consumers and their endowments in two states be the following:
Let P[s = 0:] securities
Consumer i
ri
ei (0:)
ei ((3)
i = 1
1/2
(1,0)
(0,2)
i=2
1/3
(1,4)
(2,2)
1 - P[s = (3] =
1f
= 1/4. Then in the equilibrium without
7 8
p* (0:)
-
p* ((3)
-
4
5
and in the equilibrium with securities
p( 0:)
=
p((3)
=
31 38·
Suppose, both consumers have minimal expenditure function in he linear form, with the same parameters ao = 3/4 and al = 1. Then the survival threshold in the economy without securities is 1.625 in state 0: and 1.55 in state (3. It is easy to see that agent i = 1 is ruined in state s = 0: and survives in state s = (3; agent i = 2 survives in both states. With securities, the survival threshold in both states is ~ 1.5658, and agent i = 2 still survives in both states, but agent i = 1 is now ruined in both states.
4.2.2
Example B: Aggregate Uncertainty
Consider the same economy, now with aggregate uncertainty in the endowments:
ri
ei ( 0:)
ei ((3)
i=I
1/2
(1,0)
(0,2)
i = 2
1/3
(0,2)
(2,2)
Consumer i
Nigar Hashimzade and Mukul Majumdar
With
7r
205
= 1/4 the equilibrium price without securities is p*(a) p*({3)
3 4 4 5
-
and with securities
p(a) p({3)
15 19 30 19
Let the minimal expenditure function for both consumers be linear, with ao = 1/5 and al = 1. The survival threshold in an economy without securities is, then, 0.95 in state a and 1 in state {3. Both agents survive in both states. With securities, the survival threshold is ~ 0.990 in state a and ~ 1.779 in state {3. In that case, agent 2 still survives in both states, but agent 1 survives only in {3 and is ruined in a. These two examples demonstrate how trade in securities may worsen survival prospects of the agents with random endowments even when markets for securities are complete.
5
Concluding remarks
In this paper we introduced a formal general equilibrium approach to the problem of survival under uncertainty. The question of obvious practical importance is "how does one improve the chance of survival of an agent"? Clearly, when ruin is caused by market forces, the intervention of the government is desirable. The choice of the optimal policy is determined by the policy tools available to the government, and the sensitivity of the survival probability to the changes in policy variables. For the case of static economy with intrinsic uncertainty this problem was touched upon in [4]. In particular, under certain assumptions on the joint distribution of the endowments and linearity of the minimum expenditure function, the probability of survival of an agent increases as the limiting averages of the endowments increase. For the OLG economy with extrinsic uncertainty we showed elsewhere [14] that a lump-sum tax and transfer policy, with the amounts of taxes and transfers depending on equilibrium market price, can stabilize consumption at certainty equilibrium level (without affecting prices), thus eliminating the possiblity of ruin of the agents. In any case, the general equlibrium framework has to be used in order to accurately predict the outcomes of various policy measures. Another issue should be mentioned. Throughout this paper we assumed that the objective of an agent is to maximize his expected utility (as the traditional economic theory postulates). In a model with a single agent Majumdar and Radner [18] explored the implications for maximization of the probability of
206
Survival under Uncertainty in an Exchange Economy
survival. A systematic extension of this analysis to a framework with many interacting agents remains an important direction of research. Nigar Hashimzade and Mukul Majumdar Department of Economics, Cornell University, Ithaca, New York 14853
Bibliography [1] K. Arrow, The Role of Securites in the Optimal Allocation of Risk-bearing, Rev. Econ. Studies 31 (1964),pp. 91-96. [2] Y. Balasko and K. Shell, The Overlapping-Generations Model. II. The Case of Pure Exchange with Money, J. Econ. Theory 24 (1981), pp. 112-142. [3] R. N. Bhattacharya and M. Majumdar, Random Exchange Economies, J. Econ. Theory 6 (1973), pp. 37-67. [4] R. N. Bhattacharya and M. Majumdar, On Characterizing the Probability of Survival in a Large Competitive Economy, Review of Economic Design 6 (2001), pp. 133-153. [5] D. Cass and K. Shell, Do Sunspots Matter? J. Polito Economy 91 (1983), pp. 193-227. [6] S. Chattopadhyay and T.J. Muench, Sunspots and Cycles Reconsidered, Economic Letters 63 (1999), pp. 67-75. [7] J .L. Coles and P.J. Hammond, Walrasian Equilibrium Without Survival: Existence, Efficiency and Remedial Policy. In: Basu et al (eds.), "Choice, Welfare and Development." Oxford: Clarendon Press (1995), pp. 32-64. [8] G. Debreu, "Theory of Value; An Axiomatic Analysis of Economic Equilibrium", New Haven: Yale University Press (1959). [9] G. Debreu, Economies with a Finite Set of Equilibria, Econometrica 38 (1970), pp. 387-392. [10] Jean Dreze (ed.), "The Economics of Famine" , Cheltenham, Northampton, MA, USA: An Elgar Reference Collection (1999).
UK,
[11] D. Gale, Pure Exchange Equilibrium of Dynamic Economic Models, J. Econ. Theory 6 (1973), pp. 12-36. [12] J. D. Geanakoplos and H. M. Polemarchakis, Existence, Regularity, and Constrained Suboptimality of Competitive Allocations when the Asset Market is Incomplete. In: W. P. Heller, R. M. Starr, and D. A. Starrett (Eds. ), "Uncertainty, Information, and Communication: Essays in Honor of Kenneth J. Arrow." Cambridge, New York and Melbourne: Cambridge University Press (1986), Vol. III, pp. 65-95. [13] N. Hashimzade, Probability of Survival in a Random Exchange Economy with Dependent Agents, forthcoming in Economic Theory (2002).
Nigar Hashimzade and Mukul Majumdar
207
[14] N. Hashimzade, "Survival with Extrinsic Uncertainty: Some Policy Issues", Working Paper (2002), Cornell University. [15] J. Lamperti, "Probability." New York: Benjamin (1966). [16] L. Ljungqvist and T. Sargent, "Recursive Macroeconomic Theory". Cambridge, Massachusetts; London, England: The MIT Press (2000). [17] M. Majumdar and T. Mitra, Some Results on the Transfer Problem in an Exchange Economy. In: Dutta, B. et al (eds.), "Theoretial Issues in Development Economics." New Delhi: Oxford University Press (1983), pp. 221-244. [18] M. Majumdar and R. Radner, Linear Models of Economic Survival Under Production Uncertainty, Economic Theory 1 (1991), pp.13-30. [19] M. Majumdar and V. Rotar, Equilibrium Prices in a Random Exchange Economy with Dependent Agents, Economic Theory 15 (2000), pp. 531550. [20] M. Ravallion, "Markets and Famines," Oxford: Clarendon Press (1987). [21] Y. Rinott and V. Rotar, A Multivariate CLT for Local Dependence with n- 1 / 2 Iog n rate, and Applications to Multivariate Graph Related Statistics, J. Multivariate Analysis 56 (1996), pp. 333-350. [22] P. Samuelson, An Exact Consumption-Loan Model ofInterest with or without the Social Contrivance of Money, JPE 66 (1958), pp. 467-482. [23] W. Shafer, Equilibrium with Incomplete Markets in a Sequence Economy. In: M. Majumdar (ed.), "Organizations with Incomplete Information. Essays in Economic Analysis: A tribute to Roy Radner." Cambridge, New York and Melbourne: Cambridge University Press (1998), pp. 20-41. [24] A. Sen, Starvation and Exchange Entitlements: A General Approach and its Application to the Great Bengal Famine, Cambridge J. Econ. 1 (1977), pp. 33-60. [25] A. Sen, Ingredients of Famine Analysis: Availability and Entitlements, Quarterly Journal of Economics 96 (1981), pp. 433-464. [26] A. Sen, "Poverty and Famines: An Essay on Entitlement and Deprivation," Oxford: Oxford University Press (1981). [27] S. Spear, Sufficient Conditions for the Existence of Sunspot Equilibria, J. Econ. Theory 34 (1984), pp. 360-370. [28] C. Stein, Approximate Computation of Expectations. Harvard, CA: IMS (1986).
208
Survival under Uncertainty in an Exchange Economy
Singular Stochastic Control in Optimal Investment and Hedging in the Presence of Transaction Costs Tze Leung Lai Stanford University
and Tiong Wee Lim National University of Singapore Abstract In an idealized model without transaction costs, an investor would optimally maintain a proportion of wealth in stock or hold a number of shares of stock to hedge a contingent claim by trading continuously. Such continuous strategies are no longer admissible once proportional transaction costs are introduced. The investor must then determine when the stock position is sufficiently "out of line" to make trading worthwhile. Thus, the problems of optimal investment and hedging become, in the presence of transaction costs, singular stochastic control problems, characterized by instantaneous trading at the boundaries of a "no transactions" region whenever the stock position falls on these boundaries. In this paper, we review various formulations of the optimal investment and hedging problems and their solutions, with particular emphasis on the derivation and analysis of Hamilton-Jacobi-Bellman (HJB) equations using the dynamic programming principle. A particular numerical scheme, based on weak convergence of probability measures, is provided for the computation of optimal strategies in the problems we consider.
1
Introduction
The problems of optimal investment and consumption and of option pricing and hedging were initially studied in an idealized setting whereby an investor incurs no transaction costs from trading in a market consisting of a risk-free asset ("bond") with constant rate of return and a risky asset ("stock") whose price is a geometric Brownian motion with constant rate of return and volatility. For example, Merton (1969, 1971) showed that, for an investor acting as a pricetaker and seeking to maximize expected utility of consumption, the optimal strategy is to invest a constant proportion (the "Merton proportion") of wealth in the stock and to consume at a rate proportional to wealth. In the related problem of option pricing and hedging, arbitrage considerations of Black and Scholes (1973) demonstrated that, by setting up a portfolio of stock and option that is risk-free, the value of an option must equal the amount of initial capital required for this hedging. However, both the Merton strategy and the Black-Scholes hedging portfolio require continuous trading and result in an infinite turnover of stock in any finite
209
Singular Stochastic Control
210
time interval. In the presence of transaction costs proportional to the amount of trading, such continuous strategies are prohibitively expensive. Thus, there must be some "no transactions" region inside which the portfolio is insufficiently "out of line" to make trading worthwhile. In such a case, the problems of optimal investment and consumption and of option pricing and hedging involve singular stochastic control. As we shall see, Bellman's principle of dynamic programming can often be used to derive (at least formally) the nonlinear partial differential equation (PDE) satisfied by the value function of interest. The derived PDE will then suggest methods (analytic or numerical) to solve for the optimal policies. One such numerical scheme, based on weak convergence of probability measures, will be particularly useful to the problems described in this paper. It turns out that some of the resulting free boundary problems can be reduced to optimal stopping problems in ways suggested by Karatzas and Shreve (1984, 1985), thereby simplifying the solutions of the original optimal control problems. We will focus on the two-asset (one bond and one stock) setting which many authors consider. Besides simplifying the exposition, such a setting can be justified by the so-called "mutual fund theorems" whenever lognormality of prices is assumed; see, for example Merton (1971) in the absence of transaction costs and Magill (1976) in the presence of transaction costs. Specifically, the market consists of two investment instruments: a bond paying a fixed risk-free rate r > 0 and a stock whose price is a geometric Brownian motion with mean rate of return Q > 0 and volatility a > o. Thus, the prices of the bond and stock at time t ~ 0 are given respectively by
dE t
= rEt dt
and
(1.1 )
where {Wt : t ~ O} is a standard Brownian motion on a filtered probability space (0, F, {Fdt2::o, JP) with Wo = 0 a.s. The investor's position will be denoted by (Xt, yt) (in Section 2) or (Xt, Yt) (in Section 3), where
X t = dollar value of investment in bond,
= dollar value of investment in stock, Yt = number of shares held in stock.
yt
(1.2)
In particular, we note the relation yt = Yt St.
The rest of the paper is organized as follows. In Section 2, we consider optimal investment and consumption, beginning with a treatment of the "Merton problem" (no transaction costs) over a finite horizon, and then proceeding to the transaction costs problem considered by Magill and Constantinides (1976) and, more recently, by ourselves. We also consider the infinite-horizon case, drawing on results from Davis and Norman (1990) and Shreve and Soner (1994), and review the work of Taksar, Klass and Assaf (1988) on the related problem of maximizing the long-run growth rate of the investor's asset value. The problem of option pricing and hedging in the presence of transaction costs is considered in Section 3. Some concluding remarks are given in Section 4.
Tze Leung Lai and Tiong Wee Lim
2
211
Optimal Consumption and Investment with Transaction Costs
The investment and consumption decisions of an investor comprise three nonnegative {Ft}t>o-adapted processes C, L, and M, such that C is integrable on each finite time interval, and Land Mare nondecreasing and right-continuous with left-hand limits. Specifically, the investor consumes at rate Ct from the bond and L t (resp. M t ) represents the cumulative dollar value of stock bought (resp. sold) within the time interval [0, t], 0 ::; t ::; T. In the presence of proportional transaction costs, the investor pays fractions 0 ::; A < 1 and 0 ::; fJ, < 1 of the dollar value transacted on purchase and sale of stock, respectively. Thus, the investor's position (Xt, yt) satisfies dXt dyt
= (r X t - Ct) dt - (1 + A) dL t + (1 = ayt dt + ayt dWt + dL t - dMt .
fJ,) dMt ,
(2.1a) (2.1b)
The factor 1 + A (resp. 1 - fJ,) in (2.1a) reflects the fact that a transaction fee in the amount of A dL (resp. fJ, dM) needs to be paid from the bond when purchasing dL (resp. selling dM) dollar value of stock. We define the investor's wealth (or net worth) as Zt = X t
+ (1 -
fJ,)yt
if yt ~ 0;
By requiring that the investor remains solvent (i.e., has nonnegative net worth) at all times, the investor's position is constrained to lie in the solvency region D which is a closed convex set bounded by the line segments
= a~D = [h,D
{(x, y) : x
> 0,
y
{(x,y) : x::; 0, y
+ (1 + A)Y = O}, ~ 0 and x + (1- fJ,)y = O}.
< 0 and
x
We denote by A(t, x, y) the class of admissible policies, for the position (Xt, yt) = (x, y), satisfying (Xs, Y s ) E D for t ::; s ::; T, or equivalently, Zs ~ 0 for t ::; s ::; T. At time t, the investor's objective is to maximize over A(t, x, y) the expected utility
J(t, x, y)
~ IE [iT e-~('-') U (C,) ds + e~(T-') U2(ZT) Ix, ~ x, Y" ~ y1' j
where f3 > 0 is a discount factor and U1 and U2 are concave utility functions of consumption and terminal wealth. We assume that U1 is differentiable and that the inverse function (U{)-l exists. Often U1 and U2 are chosen from the so-called HARA (hyperbolic absolute risk aversion) class:
U (c)
= CI
/,y if r < 1,
r =I 0;
U(c) = loge
if r = 0,
which has constant relative risk aversion -cU" (c) jU' (c) = 1 value function by
V(t, x, y)
=
sup (C,L,M)EA(t,x,y)
J(t, x, y).
r.
(2.2)
We define the
(2.3)
Singular Stochastic Control
212
2.1
The Merton Problem (No Transaction Costs)
Before presenting the solution to the general transaction costs problem (2.3), we consider the case A = /1 = 0 (no transaction costs) analyzed by Merton (1969). In this case, by adding (2.1a) and (2.1b), the total wealth Zt = X t + yt can be represented as (2.4) where ()t = yt/(Xt + yt) is the proportion of the investment held in stock. Using the reparameterization z = x + y, the value function can be expressed as
V(t,z) =
sup
(C,L,M)EA(t,z)
lE [rT e-!3(s-t)U1(C s )ds + e-!3(T-t)U2(ZT) I Zt
it
where A(t, z) denotes all admissible policies (C, ()) for which Zs
= z] ,
> 0 for all
t :::;: s :::;: T. The Bellman equation for the value function is max{(8/8t C,o
+ £:)V(t, z) + U(C)
subject to the terminal condition VeT, z) generator of (2.4): £:=
- (3V(t, z)} = 0,
(2.5)
= U2 (z), where £: is the infinitesimal
a 2()2 z 2 8 2 8 2 8z2 +[rz+(0:-r)Oz- C18z '
Formal maximization with respect to C and () yields C = (Un- 1 (Vz ) and () -(Vz/Vzz) (0: - r)/a 2z (in which subscript denotes partial derivative, e.g., Vz 8V/8z). Substituting for C and () in (2.5) leads to the PDE
8V _ (0: - r)2 (8V/8z)2 8t 2a 2 8 2V/8z 2 where C*
+
( _ C*)8V rz 8z
+
U (C*) _ (3V = 0 1
,
= =
(2.6)
= C*(t, z) = (Un-l (Vz(t, z)). Let p=
0: - r
c
(1 -1)a2'
=
_1_
[(3 _ Ir _
1- I
Ci(t) = c/{l - q\ec(t-T)}
(i = 1,2),
1(0: - r) 2 ] 2(1 -1)a 2 '
(PI =
1,
(h =
(2.7)
1- c.
If U1 takes the form (2.2), then C* = (Vz)l/CY-l) and solving the PDE yields the optimal policy: (); == p and C; = C1(t)Zt when U2 == 0, or C; = C 2(t)Zt when U2 takes the form (2.2). Note that c = when I = O. Thus, in the Merton problem, the optimal strategy is to devote a constant proportion (the Merton proportion p) of the investment to the stock and to consume at a rate proportional to wealth. Furthermore, for i = 1 or 2 (corresponding to U2 == 0 or to (2.2)), the value function is
(3
z"Y
Vet, z) = - [Ci(t)P-l I
Vet, z)
=
ai(t)
1
+ Ci(t)
if I < 1, 1-=1=0;
log[Ci(t)z]
if I = 0,
Tze Leung Lai and Tiong Wee Lim
213
where ai(t) = ,6-2[r - ,6 + (a - r)2/2o- 2]{1 - e t3 (t-T) [1 +
2.2
Transaction Costs and Singular Stochastic Control
In the presence of transaction costs, analytic solutions are generally unavailable, even for HARA utility functions. One approach to the problem is to apply a discrete time dynamic programming algorithm on a suitable approximating Markov chain for the controlled process. This approach is based on weak convergence of probability measures, which will ensure that the discrete-time value function converges to its continuous-time counterpart as the discretization scheme becomes infinitely fine. Note that the optimal investment and consumption problem involves both singular control (portfolio adjustments) and continuous control (consumption decisions).
We begin with an analysis of the Bellman equation, which will subsequently suggest an appropriate Markov chain approximation for our problem. We can obtain key insights into the nature of the optimal policies by temporarily restricting Land M to be absolutely continuous with derivatives bounded by"" l.e., Lt
=
lt
Rsds
and
Mt
=
lt
(2.8)
msds,
Proceeding as before, the Bellman equation for the value function (2.3) is max {(a/at
C,C,m
+ £:)V(t, x, y) + U1 (C) - ,6V(t, x, y)} = 0,
(2.9)
subject to V(T, x, y) = U2(X + (1 - fJ,)y) if y ?:: 0; V(T, x, y) = U2(x + (1 + >.)y) if y < 0, where £: is the infinitesimal generator of (2.1a)-(2.1b):
(J"2y2 a 2
a
a
[a
a ]
[
a
a]
£: = -2- ay2 +(rx-C) ax +ay ay + ay - (1 + >.) ax R+ (1 - fJ,) ax - ay m.
(2.10) The maximum in (2.9) is attained by C = (Un- 1 (Vx ), R = "'TI{Vy~(1+A)Vx}' and m = ",TI{Vy ::;(l-p,)Vx}. Thus, it can be conjectured that buying or selling either takes place at maximum rate or not at all, and the solvency region 'D can be partitioned into three regions corresponding to "buy stock" (B), "sell stock" (S), and "no transactions" (N). Instantaneous transition from B to the buy boundary aB or from S to the sell boundary as takes place by letting '" ---+ 00 and moving the portfolio parallel to aA'D or ap,'D (i.e., in the direction of
Singular Stochastic Control
214
V
(-1, (1 + A)-l or (1, -(1- JL)-l)T, where T denotes transpose). This suggests that Vet, x, y) = Vet, x + (1 - JL)8y, y - by) for (t, x, y) E Sand Vet, x, y) = Vet, x - (1 + A)8y, y + t5y) for (t, x, y) E B. In the limit as t5y -----f 0, we have
Vy(t, x, y) = (1 - JL)Vx(t, x, y), Vy(t, x, y) = (1 + A)Vx(t, x, y), In
N
the value function satisfies (2.9) with R= m
(t,x,y) E S, (t,x,y) E B.
(2.11a) (2.11b)
= 0, leading to the PDE
av (J2y2 a 2v * av av * - + - - + ( r x - C )-+ay-+U1(C )-;3V=O, at 2 ay2 ax ay
(t,x,y) EN, (2.11c)
where C* = C*(t,x,y) = (Un- 1 (Vx (t,x,y)) as in (2.6). To solve (2.11a)-(2.11c), the first step is to find an approximating Markov chain which is locally consistent with the controlled diffusion (2.1a)-(2.1b). Following Kushner and Dupuis (1992), we will use the "finite difference" method to obtain the transition probabilities of the approximating Markov chain. Specifically, for a candidate consumption decision (i.e., continuous control) C, we make the following (standard) approximations to the derivatives in equation (2.11c):
Vi(t, x, y)
-----f
[Vet + 8, x, y) - Vet, x, y)]/8,
V( x t,x,y )
-----f
{
-----f
{[V(t+t5,X,Y+E) - V(t+t5,X,Y)]/E ify ~ 0, [Vet + t5,x,y) - Vet + 8,x,y - E)]/E if Y < 0,
-----f
[Vet + 8,x,y + E)
T7 (
Vy t,x,y
)
Vyy(t,x,y)
[V(t+8'X+E,Y)-V(t+t5,X,Y)l/E ifrx-C~O, [Vet + 8,x,y) - Vet + 8,x - E,y)l/E if rx - C < 0,
+ Vet + 8,x,y -
E) - 2V(t + 8,X,Y)]/E 2.
(2.12) Collecting terms and noting that C* in (2.11c) is the optimal control, we obtain the following backward induction equation for the "consumption step":
VO(t, x, y)
= e-!3 omsx
{~p(X' f) x,y
1
x, y)V(t + t5, X, f))
+ 8Ul(C)}
,
(2.13)
where only the following five transition probabilities are nonzero:
p(x ±
E,
Y 1 x, y) = (rx - C)±8/E,
p(x, y ± E1 x, y) = ay±b/E + ((J2y2 /2)t5/E 2 , p(x, y 1x, y) = 1 - (Irx -
CI + alyl)8/E -
((J2y2)8/E2.
Equation (2.13) is to be evaluated for t E 1[' = {O, 8, 2t5, ... , N 8} with 8 = T /N and (x, y) belonging to some grid X x 1{ made up of multiples of ±E. Given 8, the choice of E must ensure that p(x, y 1 x, y) ~ 0. Let Al = maxxEX,C Irx - CI and A2 = max y E1{ Iyl. Then one could set
Tze Leung Lai and Tiong Wee Lim
215
A similar treatment of equations (2.11a)-(2.11b) yields respective relations for the "sell step" and the "buy step" (singular controls):
VS(t, x, y)
=
pV(t, x, Y - E)
vh(t, x, y) = (1
+ (1 -
+ >.)-1 [>' V(t, x -
p)V(t, x + E, y - E),
E, y)
+ V(t, x -
E, Y + E)].
Since only one of buy, sell or no transactions can happen at each step, the dynamic programming equation for the (discrete-time) finite horizon value function is therefore
V(t, x, y)
=
max{VO(t, x, y), VS(t, x, y), Vh(t, x, y)},
with terminal condition V(T, x, y) = U2(x + (1 - p)y) if Y 2': 0; V(T, x, y) = U2 (x + (1 + >.)y) if Y < O. For a sufficiently fine grid 1l' x x: x Y, this gives good approximations to the value function (2.3) and the transaction regions: (t, x, y) E S if V(t, x, y) = VS(t, x, y) and (t, x, y) E S if V(t, x, y) = Vh(t, x, y). When U1 and U2 take the form (2.2), we find that V is concave and homothetic in (x, y): for 'r/ > 0,
V(t, 'r/X, 'r/y)
=
'r/'V(t, x, y)
V(t, 'r/X, 'r/y) = {;3-1 [1-
if, < 1, ,
e(J(t-T)]
-I 0;
+ e(J(t-T)} log'r/ + V(t, x, y)
if,
= O.
Homotheticity of V suggests that if equations (2.11a) and (2.11b) are satisfied for some (t, x, y) E as and as, respectively, then the same is true for any (t, 'r/X, 'r/y) with 'r/ > o. Thus, it can further be conjectured that the boundaries between the transaction and no transactions regions are straight lines (rays) through the origin for each t E [0, T]. Moreover, since C* = (Vx )l/ b -l), equation (2.11c) becomes
av O' 2y2 a 2v av av 1 - , (av)'/h-l) -+----+rx-+ay-+-- --;3V = 0 at 2 ay2 ax ay , ax '
(t,x,y) EN,
with the fifth term on the l.h.s. of (2.14) replaced by -(1 + log Vx )
(2.14) when, = O.
We can further exploit homotheticity of V to reduce the nonlinear PDE (2.14) to an equation in one state variable. Indeed, let 'I/J(x) = V(t,x,l) so that V(t, x, y) = Y''I/J(t, x/y). Then, for some functions A*(t), A*(t), and -(1- p) < x*(t) < x*(t) < 00, equations (2.11a)-(2.11b) and (2.14) are equivalent to the following when, < 1 and, -I 0:
'I/J(t,x)
=
'I/J(t, x)
=
,-I A*(t)(x + ,-I A*(t)(x +
1- p)',
x ::; x*(t),
(2.15a)
1 + >')"
x 2': x*(t),
(2.15b)
x
E
[x*(t),x*(t)], (2.15c)
where
b3
=
0'2/2. (2.16)
Singular Stochastic Control
216
A similar set of equations can also be obtained for 'Y = o. A simplified version of the numerical scheme described earlier in this section can be implemented to solve for 'lj;(t, x) as well as the boundaries x*(t) and x*(t). For details and numerical examples, see Lai and Lim (2002a). Hence, for HARA utility functions, the optimal policy for the transaction costs problem (2.3) is given by the triple (C*,L*,M*), where
and L; =
!at lI{xs/ys=x*(s)} dL:,
t E [0, T].
The introduction of transaction costs into Merton's problem in Section 2.1 has the following consequence. The investor should optimally maintain the proportion of investment in stock between B*(t) := [1 + x*(t)]-1 > 0 and B*(t) := [1 + x*(t)]-1 < J-L-I, i.e., B*(t) :::; B; :::; B*(t) in our earlier notation. Thus, the no transactions region N is a "wedge" in the solvency region D. Such an observation can be traced back to Magill and Constantinides (1976), who found that "the investor trades in securities when the variation in the underlying security prices forces his portfolio proportions outside a certain region about the optimal proportions in the absence of transaction costs." The foregoing analysis and solution of problem (2.3) can be extended to the case of more than one stock. While a straightforward application of the principle of dynamic programming would suffice to derive the Bellman equation, computational aspects of the problem become much more involved. As pointed out by Magill and Constantinides (1976), m stocks imply 3m possible partitions of the solvency region so even for moderately large m (e.g., 35 ~ 250, 3 10 ~ 60000) it is unclear how to systematically solve for the transaction regions. When the stock prices are geometric Brownian motions, Magill (1976) established a mutual fund theorem on the reduction of the optimal investment and consumption problem to the case consisting of a bond and only one stock.
2.3
Stationary Policies for Infinite-Horizon Problems
We can view the infinite-horizon optimal investment and consumption problem as the limiting case of the finite-horizon problem in Section 2.2. By setting t = 0 and letting T --+ 00, the finite-horizon value function (2.3) approaches the following infinite-horizon value function (dropping the subscript on U1 ): V(x, y)
=
sup (C,L,M)EA(x,y)
IE
roo e-(3tU(Ct ) dt,
Jo
(x,y) ED,
(2.17)
where A(x,y) denotes the set of all admissible policies (C,L,M) for an initial position (x, y) E D such that (Xt, yt) ED for all t ~ 0 a.s. Because the problem no longer depends on time t, the regions 5, 5, and N are stationary over time. The Bellman equation is given by (2.9) without a/at. The analysis of Section
Tze Leung Lai and Tiong Wee Lim
217
2.2 carries over, leading to analogs of equations (2.11a)-(2.11c) (i.e., without t and av/at). For a general utility function U, the numerical procedure described in Section 2.2 can be modified to give a solution of the infinite-horizon investment and consumption problem. With the finite difference approximations given by (2.12) but without t or t + <5, we obtain, after normalization, the following analog of (2.13):
VO(x,y) = mgxe-(j3+a 2y2 /c 2 ),5
{~P(X'YIX'Y)V(X'Y) + <5U1(C)} ,
(2.18)
x,y
where <5
= E/'L., 'L. = Irx - CI + alyl, and p(x ± E, Y I x, y) = (rx - C)±<5/E,
p(x, y ±
E
I x, y) = ay±<5/E.
Thus, proceeding as in Section 2.2, the dynamic programming equation is
V(x, y) = max{Vo(x, y), VS(x, y), Vb(x, y)}, where VS(x, y)
(2.19)
= p, V(x, y - E) + (1 - p,)V(x + E, y - E) and Vb(x, y) = (1 +
.\)-1 [.\ V(x - E, y) + (1- '\)V(x - E, Y + E). According to which value on the r.h.s. of (2.19) V(x, y) takes, the position (x, y) is classified as belong to N, S, or B.
We next specialize U to take the form (2.2) to simplify the dynamic programming equation. For future reference, we begin with some results for the case of no transaction costs (.\ = p, = 0). An analysis of the infinite-horizon analog of (2.5) (i.e., without a/at) yields (); -= p and C; = cZt for all t ;?: 0, where p and c are given by (2.7). The value function is
V(z) 1 [
z'Y = _c'Y- 1
'Y
V(z) = {32 r - (3 +
if'Y < 1, 'Y =I- 0;
(a-r)2] 20- 2
1 + fjlog({3z)
if'Y
= O.
These results can also be derived from those in Section 2.1 on the Merton problem by letting T ----7 00, since then Ci (0) ----7 C (i = 1, 2). In the presence of transaction costs, the control problem has been independently considered by Davis and Norman (1990) using the principle of smooth fit and by Shreve and Soner (1994) using the concept of viscosity solutions to second-order PDEs. Earlier Constantinides (1986) obtained an approximate solution of the problem under the restriction that the investor consumes at a rate proportion to his holding in bond. A general numerical procedure when there are m > 1 stocks has been developed by Akian, Menaldi and Sulem (1996). Because V is concave and homothetic, it is possible to reduce the problem to solving ordinary differential equations (ODEs). Indeed, the control problem can be solved by finding a C 2 function 'ljJ and constants 00 > x* > x* > -(1 - p,) and A*, A* satisfying equations (2.15a)-(2.15c) without time dependence. It can be shown that ()* ::; p ::; ()*, with ()* = (1 + X*)-1, ()* = (1 + x*)-1. Two sufficient conditions for finiteness of the value function
Singular Stochastic Control
218
V are f3 > 1r + 1(0: - r)2 /{2(1-1)a 2} and (f3 - 0:1)(1 + A) > (f3 - r1)(1- J1); see Shreve, Saner and Xu (1991). Interestingly, if lump-sum transaction costs proportional to portfolio value (e.g., portfolio management fees) are imposed in addition to proportional transaction costs, then portfolio selection and withdrawal for consumption are made optimally at regular intervals (as opposed to trading at randomly spaced instants of time), with the investor consuming deterministically between transactions, as shown by Duffie and Sun (1990). To find the constants x*, x*, A*, A*, and the function 'ljJ, the principle of smooth fit can be first applied to 'ljJ" at x* and x* to solve for A* and A * (which depend on x* and x* respectively). Next, the second order ODE (2.15c) (without t and 8'ljJ / 8t) can be written as a pair of first-order equations after a change of variables. Specifically, for 1 # 0 (so U(c) = c'Y /1), let Q(f) = -bI/1- b2f + (1-1)b 3f 2 and R(f) = -bI/1 + (b3 - b2)f -1b3f2, where b1 , b2, and b3 are defined in (2.16). Then there exist functions f(x) and h(x) satisfying the system of differential equations
l' =
1 [R(f) - h], -b
f(x*) =
3X
1*
:=
x*::
+ A'
(2.20a) hi = _1_ _h_[h - Q(f)], 1 -1 b3 xf
h(x*) = Q(f*),
h(x*) = Q(f*),
(2.20b)
such that 'ljJ(x) =
~
[1 h (X)]'Y1 1-1
1
[~]'Y f(x)
satisfies (2.15c) (without t and 8'ljJ/8t). In this case, the optimal consumption policy is Ct = C*(Xt, Yt), where C*(x, y) = 1(1 - 1)-lxh(x/y)/ f(x/y). The case 1 = 0 can be treated similarly. Davis and Norman (1990) suggested the following algorithm for the numerical solution of (2.20a)-(2.20b) (in which f, h, x*' x* need to be determined). The iterative procedure starts with an arbitrary value x* of x* > 1 - p, and the corresponding values j* = x* /(x* + 1 + A) and h* = Q(}*). It uses numerical integration to evaluate j(x)
= j*
-l x
x*
R(j(u~) -
h(u) du,
3U
h(x) = h* - _1_1 x* h(u)[h(u) -=- Q(j(u))] du 1-1 x b3uf(u) for a sequence of decreasing x values until the first value x* of x for which h(x*) ~ Q(}(x*)). At this point, we have a solution of (2.20a)-(2.20b) with J1 replaced by x* + 1 - x*/ f(x*). The iterative procedure continues by adjusting the initial guess x* and computing the resulting x*' terminating when x* + 1 x*/ f(x*) differs from J1 by no more than some prescribed error bound.
Tze Leung Lai and Tiong Wee Lim
2.4
219
Maximization of Long-Run Growth Rate
An alternative optimality criterion was considered by Taksar, Klass and Assaf (1988). Instead of maximizing expected utility of consumption as in (2.17), suppose the objective is to maximize, in the model (2.1a)-(2.1b) without consumption (i.e., C t == 0), the expected rate of growth of investor assets (equivalently the long-run growth rate). This optimality criterion can be reformulated in terms of R t = Yt/ X t alone so that the problem is to minimize the following limiting expected "cost" per unit time: (2.21 ) where .\
f-Lx f(x)=x+l'
a2x 2 2(x + 1)2
(
2
( ) o:-r+- X- . 2 x +1 (2.22) In (2.21), L t (resp. NIt) can be interpreted as the cumulative percentage of stock bought (resp. sold) within the time interval [0, t], and is related to L t (resp. M t ) via dL t = (1/ Xt) dL t (resp. dMt = Y;;-1 dMd. If'\ = f-L = 0 (no transaction costs), the second and third terms in (2.21) vanish and the optimal policy is to keep Rt equal to the optimal proportion obtained as the minimizer of h(x). This is tantamount to setting Ot (= Yt/(X t + Yt)) equal to p* := (0: - r)/a 2 + 1/2, which resembles the Merton proportion pin (2.7).
g(x)=x+l'
h(x)
=
We study the general problem of minimizing (2.21) under the condition Io:-rl < (]"2/2. (If this condition is violated, the optimal policy is to transfer all the investment to bond or stock at time 0 and to do no more transfer thereafter.) Since
an analysis of the value function V using the Bellman equation shows (in a manner similar to the previous section) that there exist constants x *, x*, A (optimal value) such that
(a 2 /2)x 2 V"(x)
+ (0: -
r
+ a 2 /2)xV'(x) + h(x)
- A
= 0,
x E [x*, x*], (2.23a)
V'(x)
=
F(x),
x ~ x*'
V'(x) = G(x),
x> x* , -
(2.23b)
where F(x) = -'\(I+x)-l(I+(I+.\)x)-l and G(x) = f-L(1+x)-l(I+(I-f-L)X)-l. Using the principle of smooth fit at x* and x*, we find that A = h((1 + A)X*) = h((1 - f-L)x*), from which it follows that either
or
x* = (_1_) (p* - 1/2)(1 + A)X* + p* 1 - f-L (1 - p*)(1 + A)X* - p*
(2.24) Hence, even though an alternative criterion (of maximizing long-run growth rate) is used to assess the optimality of investment policies, the above analysis shows that like Section 2.3 the investor should again optimally maintain the
Singular Stochastic Control
220
proportion of investment in stock between ()* := x*/(1 + x*) and ()* := x* /(1 + x*). The constants x* and x* can be computed by solving the second-order nonhomogeneous ODE V'( ) = 2 x a 2 x 2p*
l
x
[h(
x*
x*
) _ h( )] 2(p*-1) d + F* + x*F~ (x* Y Y Y 1 - 2P* x
)2 P* (2.25)
with initial conditions V'(x*) = F* := F(x*) and V"(x*) = F~ := F'(x*) at x*' which is obtained by differentiating (2.23a)-(2.23b). A search procedure can then be employed to find that value of x* for which x* given by (2.24) satisfies V'(x*) = G(x*) in view of (2.23b).
3
Option Pricing and Hedging
This section considers the problem of constructing hedging strategies which best replicate the outcomes from options (and other contingent claims) in the presence of transaction costs, which can be formulated as the minimization of some loss function defined on the replication error. In our recent work, we directly minimize the (expected) cumulative variance of the replicating portfolio in the presence of additional rebalancing costs due to transaction costs. As shown in Section 3.3, this leads to substantial simplification as the optimal hedging strategy can be obtained by solving an optimal stopping (instead of control) problem. In Sections 3.1 and 3.2 we review an alternative approach, developed by Hodges and Neuberger (1989), Davis, Panas and Zariphopoulou (1993) and Clewlow and Hodges (1997), which is based on the maximization of the expected utility of terminal wealth and which generally results in a free boundary problem in four-dimensional space. Instead of solving the free boundary problem, Constantinides and Zariphopoulou (1999) derived analytic bounds on option pnces.
3.1
Formulation via Utility Maximization
The utility-based approach adopts a paradigm the investor trades only in the underlying stock and proportional transaction costs are imposed Following the notation in (1.2), his holding of (number of shares) is given by dXt
similar to Section 2. Suppose on which the option is written on purchase and sale of stock. bond (dollar value) and stock
= r X t dt - (1 + ),,)St dL t + (1 -
(3.1a)
fL)St dMt ,
dYt = dL t - dMt ,
(3.1b)
where L t (resp. M t ) represents the cumulative number of shares bought (resp. sold) within the time interval [0, t]. Define the cash value of y shares of stock when the stock price is S by
Y(y, S) = (1
+ )..)yS
if y < 0;
Y(y, S)
= (1 - fL)yS if y
~
o.
Tze Leung Lai and Tiong Wee Lim
221
For technical reasons, the investor's position is constrained to lie in the region
v = ((x,y,S)
E lR 2 x lR+: x
+ Y(y,S) > -a}
(3.2)
for some prescribed positive constant a. We denote by A(t, x, y, S) the class of admissible trading strategies (L, M) for the position (x, y, S) E V at time t such that (Xs, Ys, Ss) E V for all s E [t, T]. The objective is to maximize the expected utility of terminal wealth, giving rise to the value functions Vi(t,x,y,S)
=
lE [U(Z~)],
sup
i
= 0, s, b,
(3.3)
(L,M)EA(t,x,y,S)
where U : lR -+ lR is a concave increasing function (so it is a risk-averse utility function). The terminal wealth of the investor (with or without an option position) is given by z~
=
Z~ =
Z~ =
+ Y(YT, ST) X T + Y(YT, ST) II{STSK} + [Y(YT - 1, ST) + K] II{ST>K} X T + Y(YT, ST) II{STSK} + [Y(YT + 1, ST) - Kj II{ST>K} XT
(no call), (sell a call), (buy a call),
in which we have assumed that the option is asset settled so that the writer delivers one share of stock in return for a payment of K when the holder chooses to exercise the option at maturity T. In the case of cash settled options, the writer delivers (ST - K)+ in cash, so Z~ = X T + Y(YT, ST) - (ST - K)+ and Z~ = X T + Y(YT, ST) + (ST - K)+. From the definition of the value functions (3.3), it is evident that an application of the principle of dynamic programming will yield the same PDE for each value function (i = 0, s, b), with the terminal condition governed by utility of the respective terminal wealth. By temporarily restricting Land M as in (2.8) (and then letting r;, -+ (0), the Bellman equation for Vi is max£,m(8/8t + £)Vi(t, x, y, S) = 0, where £ is the infinitesimal generator of (3.1a)-(3.1b) and dSt = St(adt + adWt ):
£
8 = rx-
8x
2
2
8 +a S2 8 + as- -2 + [ -8 8S
2
8S
8y
(1
8 ]f + + A)8x
[ (1 - JL)8 - -8 ] m. 8x
8y
Thus, once again, the state space can be partitioned into regions in which it is optimal to buy stock at the maximum rate, or to sell stock at the maximum rate, or not to do any transaction. Arguments similar to those in Section 2 show that there exist functions y*(t, x, S) (buy boundary) and y*(t, x, S) (sell boundary) for each i = 0, s, b such that V~(t, x, y, S) = (1
+ A)SV:(t, x, y, S),
V~(t,x,y,S) = (1- JL)SV:(t,x,y,S),
lI;;i + rxV:
+ aSV~ + (a 2 S2 /2)V~s
=
0,
y
s:;
y*(t, x, S),
(3.4a)
y::::: y*(t,x,S),
(3.4b)
y E [y* (t, x, S), y* (t, x, S)], (3.4c)
The optimal hedging strategy associated with (3.3) is given by the pair (L *, M*), where for each i = 0, s, b, L*t --
l
0
t
II {ys=y.(s,Xs,Ss)} dL*s'
t E
[O,Tj.
Singular Stochastic Control
222
Two different definitions of option prices have been proposed. In Hodges and Neuberger (1989) and subsequently in Clewlow and Hodges (1997), the reservation selling (resp. buying) price is defined as the amount of cash ps (resp. pb) required initially to provide the same expected utility as not selling (resp. buying) the option. Thus, ps and pb satisfy the following equations:
(3.5) An alternative definition is used by Davis, Panas and Zariphopoulou (1993). Assuming that U(O) = 0, define Xi
=
inf{x : Vi(O,
X,
0, S)
~
O},
°
i
= O,s, b,
°
so in particular, X O:s; because VO(O, 0, 0, S) ~ (investing in neither bond nor stock is admissible). Thus, an investor pays an "entry fee" -x o to trade in the market strictly on his own account. The selling price ps and buying price pb of the option are then constructed such that the investor is indifferent between going into the market with and without an option position: ps = X S - x O and pb = -(x b -x O ). Although they advocate this definition for the option writer's price, Davis, Panas and Zariphopoulou (1993, pp. 492-493) express reservations of using it to define the buyer's price.
3.2
Solution for Exponential Utility Functions
A reduction in dimensionality (from four to three) can be achieved by specializing to the negative exponential utility function U(z) = 1 - e--Yz (with constant index of risk aversion -U"(z)jU'(z) = ,). Using this utility function, the bond position can be managed through time independently of the stock holding and
Vi(t, X, y, S)
=
1 - exp { _,xer(T-t)} Hi(t, y, S),
i = 0, s, b,
where Hi(t, y, S) := 1 - Vi(t, 0, y, S). As a consequence, the free boundary problem (3.4a)-(3.4c) for each i = 0, s, b is transformed into the following problem:
H;(t, y, S)
= _,er(T-t) (1
H;(t,y,S) = H:
_,e
+ )")SHi(t, y, S),
y:S; y*(t, S),
(3.6a)
p)SHi(t,y,S),
y ~ y*(t,S),
(3.6b)
[y*(t, S), y*(t, S)].
(3.6c)
r (T-t)(1_
+ aSH1 + (a 2 S 2 /2)H1s =
0,
y
E
It is also straightforward to observe that the price definitions are equivalent to pS _ -1 -rTl [HS(O,O,S)] -, e og HO(O,O,S) ,
pb __ -
-1 -rT
,
e
1
[Hb(O, 0, S)]
og HO(O, 0, S)
. (3.7)
The solution of the free boundary problem (3.6a)-(3.6c) can be obtained by approximating dYt = dL t - dMt and dSt = St(a dt + a dWt ) with Markov chains and applying a that discrete-time dynamic programming algorithm as in
Tze Leung Lai and Tiong Wee Lim
223
Section 2.2. To this end, it is useful to note from (3.6a)-(3.6b) that
Hi(t, Yl, S) = Hi(t, Y2, S) exp {~l'er(T-t)(l
+ ),,)S(YI ~ Y2)},
Yl:::; Y2 :::; y*(t, S),
Hi(t, Yl, S) = Hi(t, Y2, S) exp { ~l'er(T-t)(l ~ P,)S(YI ~ Y2)}, Yl 2': Y2 2': y*(t, S). We discretize time t so that it takes values in 1f = {O, 15, 215, ... , N 15}, where 15 = T / N. The number of shares is also discretized so that Y is a multiple of E. Then we can approximate the stock price process using the following random walk: with probability p, with probability 1 ~ p, where u = Ja 215 + (a ~ a 2/2)215 2 and p = [1 + (a ~ a 2/2)15/u]/2. Let Y = {kE : k is an integer} and § = {e ku So : k is an integer} This discretization scheme leads to the following algorithm for (t, y, S) E 1f X Y X §:
Hi(t, y, S) = min {Hi(t, Y + E, S) exp Hi(t, Y ~
E,
ber(T-t) (1
+ ),,)SE] ,
S) exp [ ~ "(er(T-t) (1 ~ p,)SE] ,
pHi(t + 15, y, eUS)
+ (1 ~ p)Hi(t + 15, y, e-US) };
(3.8)
see Davis, Panas and Zariphopoulou (1993) and Clewlow and Hodges (1997) for details. Depending on which term on the r.h.s. of (3.8) is the smallest, the point (t, y, S) is classified as belonging to B, S, or N, respectively. We set Y* ( t, S) (resp. y* (t, S)) to be the largest (resp. smallest) value of Y for which (t, y, S) E B (resp. S).
3.3
A New Approach
The previous analysis shows that, in the presence of transaction costs, perfect hedging of an option is not possible and trading in options involves an element of risk. Indeed, if the region D defined in (3.2) is replaced by the solvency region of Section 2, Soner, Shreve and Cvitanic (1995) showed that "the least costly way of hedging the call option in a market with proportional transaction costs is the trivial one-to buy a share of the stock and hold it." By relaxing the requirement of perfect hedging, Leland (1985) and Boyle and Vorst (1992) demonstrated that discrete-time hedging strategies, for which trading takes place at regular intervals, can nearly replicate the option payoff at maturity. The option price is essentially the Black-Scholes value with an adjusted volatility. While hedging error can be reduced to zero as the time between trades approaches zero, the adjusted volatility approaches infinity and the option value approaches the value of one share of stock. A new approach has been recently proposed in Lai and Lim (2002b). The formulation is motivated by the original analysis of Black and Scholes (1973) in the following way: form a hedging portfolio that minimizes hedging error and price the option by the (expected) initial capital require to set up the hedge.
Singular Stochastic Control
224
For the hedging portfolio, the objective is to minimize the expected cumulative instantaneous variance and additional rebalancing costs due to transaction fees, given by
J(t, S, y) = IE
[iT iT
F(s,
+",
s" y,) ds +
(S,/K) dM,
>'iT
Is,
(S'/ K) dT",
=
S,y,
=
yj.
= a 2(S/ K)2[y - fl.(t, S)]2
for the option writer and F(t, S, y) = + ~(t,S)F for the option buyer. Here, fl.(t,S) = N(d1(t,S)) is the Black-Scholes delta (i.e., the number of shares in the option's perfectly replicating portfolio) with where F(t, S, y)
a 2(S/K)2[y
d1(t, S) = {log(S/K)
+ r(T - t)}/aJT - t + aJT - t/2.
Taking 0: = r, analysis of the Bellman equation for the value function V (t, S, y) minL,M J(t, S, y) leads to the following free boundary problem:
Vy(t, S, y) = -AS/ K Vy(t, S, y) = /-LS/ K
=
NC n {y < ~(t, S)}, in NC n {y > ~(t, S)}, in
inN. By working with Vy instead of directly with V, we deduce from the previous set of equations that Vy(t, S, y) satisfies another free boundary problem associated with an optimal stopping problem. It is this reduction to optimal stopping that greatly simplifies the hedging problem. Applying the transformations s = a 2(t - T) and z = log(S/ K) - (p - 1/2)s, where p = r / a 2, it suffices to work with v(s, z, y) = Vy(t(s), S(s, z), y). For each y, we obtain the following discrete-time dynamic programming equation for the option writer, utilizing a symmetric Bernoulli walk approximation to Brownian motion:
v(s, z, y) = min{/-Le z+!3s, v(s, z, y)}lI{y>D(s,z)} + max{ _Ae z+!3s, v(s, z, y) }1I{y
(3.9)
= [/-LII{y>D(o,z)} - AII{y
![v(s+r5,z+05,y)+v(s+r5,z-05,y)], g(s,z,y) = 2e 2(z+(p-l/2)s)[y-D(s,z)], D(s, z) = eO: Ps (z/ J=S + J=S), and s = -15, -215, .... Each point (s, z, y) E (-00,0] x lR x [0,1] can be classified as belonging to the sell region, buy region, or no transactions region, according to whether v(s,z,y) = /-Le z+f3s , v(s,z,y) = _Ae z+!3s, or -Ae z+!3s < v(s, z, y) < /-Le z+f3s , respectively. Since v(s, z, y) is nondecreasing in y, there exist sell and buy boundaries, denoted respectively by yS(s, z) and yb(s, z), such that if y > yS(s, z) (resp. y < yb(s, z)), the option writer must immediately sell y-yS(s, z) (resp. buy yb(s, z) -y) shares of stock to form an optimal hedge. The optimal hedging portfolio for the option buyer can also be obtained from (3.9) by symmetry: the optimal sell and buy boundaries for the option buyer with sell rate /-L and buy rate A are _yb(s, z) and -yS(s, z)
Tze Leung Lai and Tiong Wee Lim
225
respectively, where yS(s, z) and yb(s, z) are the optimal sell and buy boundaries for the option writer with sell rate A and buy rate J-L. Simulation studies have shown the approach to be efficient in the sense that it results in the smallest standard error of hedging error for any specified mean hedging error, where hedging error is defined to the difference between the Black-Scholes value and the initial capital needed to replicate the option payoff at maturity. For details and refinements, see Lai and Lim (2002b).
4
Conclusion
Optimal investment portfolios and hedging strategies derived in the absence of transaction costs involve continuous trading to maintain the optimal positions. Such continuous policies are at best approximations to what can be achieved in the real world, and a frequent practice is to execute the policies discretely so that transactions take place at regular (or predetermined) intervals. With appropriate adjustments, these policies can also be implemented in the presence of transaction costs since they do not lead to an infinite turnover of asset. However, in the absence of a clearly defined objective, it is difficult to argue that a discrete policy is optimal in any sense. This difficulty can be overcome in investment and consumption problems through utility maximization, and in option pricing and hedging problems through the minimization of hedging error. Many formulations of these problems lead naturally to singular stochastic control problems, in which transactions either occur at maximum rate ("bang-bang") or not at all. In the analysis of these singular control problems, the principle of dynamic programming is used to derive the Bellman equations, which are nonlinear PDEs whose solutions in the classical sense have posed formidable existence and uniqueness problems. The development of viscosity solutions to these PDEs in the 1980s is a major breakthrough that circumvents these difficulties; see Crandall, Ishii and Lions (1992). In contrast to discrete policies, singular control policies require trading to take place at random instants of time, when asset holdings fall too "out of line" from a "target." Besides being naturally intuitive, singular control policies lend further insight into optimal investor behavior when faced with investment decisions (with or without consumption). Efficient numerical procedures can be developed to solve for the singular control policies based on Markov chain approximations of the controlled diffusion process. In some instances, a reduction to optimal stopping reduces the computational effort considerably.
Tiong Wee Lim Dept. of Statistics and Appl. Prob. National University of Singapore Singapore 117546
Tze Leung Lai Department of Statistics Stanford University Stanford, CA 94305
226
Singular Stochastic Control
Bibliography [1] Akain, M., Menaldi, J. L. and Sulem, A. (1996). On an investmentconsumption model with transaction costs. SIAM J. Control Optim. 34 329-364. [2] Black, F. and Scholes, M. (1973). The pricing of options and corporate liabilities. J. Political Economy 81 637-654. [3] Boyle, P. P., and Vorst, T. (1992). Option replication in discrete time with transaction costs. J. Finance 47 271-293. [4] Clewlow, L. and Hudges, S. D. (1997). Optimal delta-hedging under transaction costs. J. Econom. Dynamics Control 21 1353-1376. [5] Constantinides, G. M. (1986). Capital market equilibrium with transaction costs. J. Political Economy 94 842-862. [6] Constantinides, G. M. and Zariphopoulou, T. (1999). Bounds on prices of contingent claims in an intertemporal economy with proportional transaction costs and general preferences. Finance Stoch. 3 345-369. [7] Cox, J. C. and Huang, C. F. (1989). Optimal consumption and portfolio policies when asset prices follow a diffusion process. J. Econom. Theory 49 33-83. [8] Crandall, M. G., Ishii, H. and Lions, P. L. (1992). User's guide to viscosity solutions of second order partial differential equations. Bull. Amer. Math. Soc. 27 1-67. [9] Davis, M. H. A. and Norman, A. R. (1990). Portfolio selection with transaction costs. Math. Oper. Res. 15 676-713. [10] Davis, M. H. A., Panas, V. G. and Zariphopoulou,T. (1993). European option pricing with transaction costs. SIAM J. Control Optim. 31 470493. [11] Duffie, D. and Sun, T. (1990). Transaction costs and portfolio choice in discrete-continuous-time setting. J. Econom. Dynamics Control 14 35-51. [12] Hodges, S. D. and Neuberger, A. (1989). Optimal replication of contingent claims under transactions costs. Rev. Futures Markets 8 222-239. [13] Karatzas, I., Lehoczky, J. P. and Shreve, S. E. (1987). Optimal portfolio and consumption decisions for a "small investor" on a finite horizon. SIAM J. Control Optim. 25 1557-1586. [14] Karatzas, I. and Shreve, S. E. (1984). Connections between optimal stopping and singular stochastic control I. Monotone follower problems. SIAM J. Control Optim. 22 856-877. [15] Karatzas, I. and Shreve, S. E. (1985). Connections between optimal stopping and singular stochastic control II. Reflected follower problems. SIAM J. Control Optim. 23 433-451.
Tze Leung Lai and Tiong Wee Lim
227
[16] Kushner, H. J. and Dupuis, P. G. (1992). Numerical Methods for Stochastic Control Problems in Continuous Time. Springer-Verlag, New York. [17] Lai, T. L. and Lim, T. W. (2002a). Optimal investment and consumption on a finite horizon with transaction costs. Technical Report, Department of Statistics and Applied Probability, National University of Singapore. [18] Lai, T. L. and Lim, T. W. (2002b). A new approach to pricing and hedging options with transaction costs. Technical Report, Department of Statistics, Stanford University. [19] Leland, H. E. (1985). Option pricing and replication with transactions costs. J. Finance 40 1283-1301. [20] Magill, M. J. P. (1976). The preferability of investment through a mutual fund. J. Econom. Theory 13 264-271. [21] Magill, M. J. P. and Constantinides, G. M. (1976). Portfolio selection with transaction costs. J. Econom. Theory 13 245-263. [22] Merton, R. C. (1969). Lifetime portfolio selection under uncertainty: The continuous-time case. Rev. Econom. Statist. 51 247-257. [23] Merton, R. C. (1971). Optimum consumption and portfolio rules in a continuous-time sense. J. Econom. Theory 3 373-413 [Erratum 6 (1973) 213-214]. [24] Shreve, S. E. and Soner, H. M. (1994). Optimal investment and consumption with transaction costs. Ann. Appl. Probab. 4 609-692. [25] Shreve, S. E., Soner, H. M. and Xu, G.-L. (1991). Optimal investment and consumption with two bonds and transaction costs. Math. Finance 153-84. [26] Soner, H. M., Shreve, S. E. and Cvitanic, J. (1995). There is no nontrivial hedging portfolio for option pricing with transaction costs. Ann. Appl. Probab. 5 327-355. [27] Taksar, M., Klass, M. J. and Assaf, D. (1988). A diffusion model for optimal portfolio selection in the presence of brokerage fees. Math. Oper. Res. 13 277-294.
228
Singular Stochastic Control
Parametric Empirical Bayes Model Selection Some Theory, Methods and Simulation Nitai Mukhopadhyay Eli Lilly and Company
and
J ayanta Ghosh Purdue University Abstract For nested models within the PEB framework of george and Foster (Biometrika,2000), we study the performance of AIC, BIC and several relatively new PEB rules under 0-1 and prediction loss, through asymptoties and simulation. By way of optimality we introduce a new notion of consistency for 0-1 loss and an oracle or lower bound for prediction loss. The BIC does badly, AIC does well for the prediction problem with least squares estimates. The structure and performance of PEB rules depend on the loss function. Properly chosen they rend to outperform other rules.
1
Introduction
Our starting point is a paper by George and Foster (2000), abbreviated henceforth as [6]. [6] propose a number of new methods using PEB (Parametric Empirical Bayes) ideas on model selection as a tool for selecting variables in a linear model. An attractive property of the new methods is that they use penalized likelihood rules with the penalty coefficient depending on data, unlike the classical AIC, due to Akaike (1973), and BIC, due to Schwartz (1978), which use constant penalty coefficients. The penalty for a model dimension q is usually Aq, where A is a penalty coefficient. [6] compare different methods through simulation. Our major contribution is to supplement this with some theoretical work for both prediction loss and 0-1 loss. The former is supposed to be relevant in soft science, where one only wants to make good prediction, and the latter is relevant in hard science, where one wants to know the truth. It is known in model selection literature that these different goals lead to different notions of optimality. Our theory is based on the assumption that we have nested , orthogonal models - a situation that would arise if one tries to fit an orthogonal polynomial of unknown degree. This special case receives special attention in [6]. Our paper is based on Chapter 4 of Mukhopadhyay (2000), subsequently referred to as [9]. A related paper is Berger, Ghosh and Mukhopadhyay, (2003), which shows the inadequacy of BIC in high dimensional problems. 229
230
Parametric Empirical Bayes Model Selection
The BIC was essentially developed as an approximation to the Bayesian integrated likelihood when all parameters in the likelihood have been integrated out. The model that maximizes this is the posterior mode, it minimizes the Bayes risk for 0-1 loss. It is shown in Berger, Ghosh and Mukhopadhyay, (2003) that BIC is a poor approximation to this in high dimensional problems. The optimality of AIC in high dimensional prediction problems has been proved in a series of papers, e.g., Shibata (1981), Li (1987) and Shao (1997). Both the BIC and AIC are often used in problems for which they were not developed. We examine the penalties of [6] in Section 2 and make some alternative recommendations. All the model selection rules are studied in Sections 3 and 4 from the point of view of consistency under 0-1 loss. In section 5 we follow the predictive approach, using the consistency results proved earlier. For the situation where least squares estimates are used for prediction after selection of a model, we define an oracle, a sort of lower bound, in the spirit of Shibata. In the PEB framework it is easy to calculate the limit of the oracle, namely, the function B(·) and show that the Bayes prediction rule and the AIC attain this lower bound asymptotically. This is not always the case for the PEB rules, which are Bayes rules for 0-1 loss. Section 5 ends with a study of the case where Bayes (shrinkage) estimates are used instead of least squares estimates. Then the PEB rules are asymptotically optimal and can do substantially better than AIC. However, the benefit comes from the better estimates rather than more parsimonious model selection. Simulations in Section 6, for both 0-1 and squared error prediction loss, bear out the validity of asymptotic results in finite samples, they also provide useful supplementary information. Results similar to those outlined above are studied, in the Frequentist setting of Shao (1997), in Mukhopadhyay and Ghosh (2002) and for Shibata's Frequentist setting of nonparametric regression in Berger, Ghosh, and Mukhopadhyay, (2003). The assumptions, priors, results and proofs differ in the three cases. The PEB formulation of [6] provides a PEB background for the simplest as well as cleanest results of this type.
2
PEB Model Section Rules for 0-1 Loss
The problem of variable selection in nested orthogonal models can be put in the following canonical form in terms of the regression coefficients. The data consist of independent r. v's Yij, i = 1, 2, ... ,p, j = 1, 2, ... ,r. There are p models M q , 1 ::; q ::; p. Hardly any change occurs if q = 0 is also allowed. Under M q , Yij = (3i + tij, 1::; i ::; q, j = 1,2, ... ,r
N. Mukhopadhyay and J.K. Ghosh
231
= tij, q+ 1:::; i :::;p, j = 1,2,·· ·r, with til'S i.i.d. N(0,0- 2 ). For simplicity we assume 0- 2 is known. If 0- 2 is unknown the same theory applies if 0- 2 is replaced by a consistent estimate of 0- 2 . If r > 1 and p is large, then a consistent estimate of 0- 2 is available from the residuals Yij - Yi. In our asymptotics r is held fixed and p -+ 00. The sample size is n = pro Clearly, the model Mq of dimension q specifies that {3q+l,··· ,{3p are all zero. In the PEB formulation, see e.g. Morris (1983), the dimension of parameter space is reduced by assigning the parameters a prior distribution with a few unspecified (hyper-)parameters which are estimated from data and integrating out original parameters. [6] assume, as in Morris (1983), that {3I,·· . ,{3q are i.i.d. N (0, C 0- 2 / r). In our work we have used C 0- 2 , both choices have validity see our discussion in Berger and Pericchi (2001). In any case in the simulations r = 1, so that our prior is the same as that of [6]. As indicated in Morris (1983), a PEB formulation is a compromise between a classical Frequentist approach and a full Bayesian approach. In many decision theoretic examples based on real or simulated data, Efron and Morris (1973), Morris (1983) and others have shown that the PEB formulation permits borrowing of strength from estimates of similar parameters, leading to estimates that substantially improve classical estimates even in a Frequentist sense. However, this does not follow from PEB theory. The PEB theory works well, i.e. provides better estimates than classical ones in the sense of cross-validation or being closer to a known true value, when the normality (or other prior) distributional assumption is checked by comparing the expected and empirical distribution of Yi's. If Mq is true, then YI , Y2 , ... ,Yq are i.i.d. N(O, co- 2 + 0- 2 / r ). In the PEB formulation here there are two unknown remaining parameters, namely c and the true q denoted as qo. The PEB solution adopted by us is to estimate c from data and put a prior 7r(q) on q. We make one final assumption that 0- 2 = 1 which can be ensured by a suitable scale transformation. Suppose c is known and 7r(q) is a prior on q. The Bayes solution is to maximize with respect to q. The likelihood with {3I,· .. ,(3q integrated out namely,
L(q, c)
=
A 7r(q)(l
+ rc)-q / 2 exp{
rc SSq} ... 1 +rc
(1)
q
where SSq =
r2....:Y? and A doesn't depend on q or c.
Since c is not known, one
1
choice - referred to as a conditional maximum likelihood estimate of c - is to maximize the expression in (1) with respect to c, giving ~
SS
rC q = max{--q - 1, O} q
(2)
We now take 7r(q) uniform on 1 :::; q :::; p. Then the PEB Bayes rule will choose Mq if q maximizes the expression in (1) after replacing c by cq. This amounts to maximizing with respect to q,
Parametric Empirical Bayes Model Selection
232
A(q)
= A(q, cq ) = 2 log L(q, c) =
rC q ~ SSq - q log(1 1 + rCq
= SSq - q(1 + log + SSq)
+ rc q ) (3)
q
If instead of estimating c, we put a prior on tion we should maximize
C
and then use Laplace approxima-
(4) Details are given in [9]. Later we provide some evidence that a single estimate of a C across all models is preferable. A natural PEB estimate is obtained by taking 7r(q) = lip, and summing the expressions of the likelihood in (1) over 1 ::; q ::; p and then maximizing with respect to c. This estimate C1[ is referred to as the Marginal Maximum Likelihood estimate in [6]. One then gets a third penalized (log) likelihood ~ )} A1[ ( q) = S S q - q { I +~rC1[ log+ (1 + rC1[ rC1[ In this paper C1[ will also stand for any estimate which converges a.s. to c as true qo --> 00. George and Foster [6] discuss the relative advantages and disadvantages of each estimate of C and refer to unpublished work of Johnstone and Silverman (2000). The new model selection rules are to be compared with AIC which maximizes SSq - 2q/r and BIC which maximizes SSq - q{log(pr)}/r. As indicated before both these classical rules are inappropriate for high dimensional problems with 0-1 loss. The rule based on A(q) is essentially due to [6] except that, instead of our uniform prior, they choose the "binomial" prior.
(4a) where, according to [6], w is to be estimated also by maximizing (1). For a given q, it is clear that w appears only in the prior 7r(q) and not on the likelihood of the data given M q . The maximizing w, namely,
Wq
=
q/p
(5)
can hardly be called a PEB estimate in the same spirit as cq . Also for q/p bounded away from zero an one, the penalty in (log) integrated likelihood due to this 7r(q) is O(q) whereas this part of the penalty vanishes at the end-points. In other words, irrespective of the data, the models in the middle range of q are being unduly penalized. The binomial prior seems more appropriate in the all 2P subsets model selection which is much problem, where the models in the middle have cardinality (P) q bigger than the cardinality of, say q = 1 or p.
N. Mukhopadhyay and J.K. Ghosh
233
Even for all subsets model selection, there is some confounding between wand c in the following sense. The Bayesian "non-centrality" parameter is p
E(~ j3i) = pwc
(6)
1
An estimate of this can only help determine the product wc. Separate estimation of wand c will require the use of the normal likelihood in a way that is not robust. We will return to this problem elsewhere.
3
Consistency
We first consider the case where c is known, so in the PEB criteria estimates Cq , c7r are to be replaced by c. It is clear that if MqO remains fixed (as p -----> 00), then the likelihood ratio of MqO with respect to any other fixed M q, remains bounded away from zero and infinity. Hence it would be impossible to discriminate one of them from the other with error probabilities tending to zero as p -----> 00. That can happen only when Iqo - qll -----> 00 as p -----> 00. The following definition is motivated by this fact.
Definition Let qo -----> 00 as p -----> 00. A penalized likelihood criterion A(q, ¥,p) for model selection is consistent at qo if given E > 0 and for sufficiently large p and qo, there exists a k, (depending on E, p, qo, such that
Pqo{A(qo, ¥,p) > A(q, ¥,p), Vlq - qol 2 k} > 1 -
E
(7)
Of course we could take fixed qo and examine consistency from the right only. The treatment is exactly similar. Let
A(q, ¥,p) for some).. >
o.
=
SSq - q)..
Then for ql > qo and ql - qo
(8)
-----> 00
q
A(qo, ¥,p) - A(q, ¥,p) = -r ~ 9i2
+ (ql
- qo) .. = (ql - qo)().. - 1 + op) (9)
qo+l
Similarly for q < ql and qo - ql
-----> 00,
A(qo, ¥,p) - A(q, ¥,p)
=
(ql - qo)(1
+ rc -).. + op(I))
(10)
We thus have Proposition 3.1. The penalized likelihood criterion A(q, ¥,p) with constant penalty coefficient ).. is consistent at all qo -----> 00 iff 1 < ).. < 1 + rc.
Parametric Empirical Bayes Model Selection
234
For AIC, >.. show that
=
2, so one would have consistency if rc > 1. If rc < 1, one can
A(1, Y-,p) - A(q, Y-,p)
- t 00
(11)
a.s.
if q - t 00, i.e., AIC chooses MI or models not far from MI' It is shown in section 5 that this is a good thing to do, if one wants to make predictions and least squares estimates are used. The usual BIC with>" = log n is inconsistent, this extremely high penalty also leads to poor performance in prediction. A modified version due to several people, see [9] or Mukhopadhyay, Berger and Ghosh (2002) for references, has log p instead of log n. That also is not consistent in general. For consistency one requires r :2: 3 and 1 + rc-log r > O. We now turn to the three PEB rules with estimates cq or c7r . It is easy to check that the rule based on A7r (q) is consistent if c7r is a consistent estimate for c. To prove this we need to show
1 +rc 1 < --log(1 +rc) < 1 +rc rc
(12)
The right hand inequality follows from
log(1
+ rc) < rc
which is proved by the fact that the second derivative of log(1 The left hand inequality follows from
(1
+ rc)log(1 + rc) > r
(13)
+ x)
is negative.
(14)
which is proved by the fact that the second derivative of (1 + x)log(1 + x) is positive. The other two PEB criteria differ from each other by a quantity which is op(q), hence they are either both consistent or both inconsistent. Since cq has undesirable properties as an estimate of c (vide Section 4) neither of these rules is consistent in our sense. This does have some effect on their performance in prediction problems. All one can show for these two cases is that A(qo, Y-,p) - A(q, Y-,p) - t 00 if \q - qo\ - t 00 and (qO/ql) is bounded away from zero. To prove this, one has to use the behavior of cq for q > qo which is studied in the next section.
4
Estimation of c.
By the law of large numbers, for large q,
N. Mukhopadhyay and J.K. Ghosh
235
q
rL:Y? Cq
=
1
q
qac
_
1 = c (approximately), for q ::; qo
.
= - (approxImately), q > qo
(15)
q
Clearly, for large incorrect models, cq decreases the penalty for each additional parameter, namely, l+log (1 + cq ). This is counterintuitive. Plots of cq for simulated data in [9] shows that cq tends to die out for large incorrect values of q. This is the main reason why consistency became a problem for A(q, cq ). If the true qo is fixed and not large, one cannot have a consistent estimate of c. If qo
-----t 00
at a rate faster than some known
q,
then a consistent estimate is
(16) However such knowledge of q is unlikely. A plot of information about both c and true qo.
cq
provides good visual
An estimate of c, which is easy to calculate and has a nice Bayesian interpretation is the model average
(17)
where
(18)
Asymptotic behavior of c7r is difficult to study. It is unlikely to be consistent in general for the following reason. For values of q much larger than qo, cq will be much smaller than c but such q's will have large weights itq inappropriately. The net effect of this will be to pull down the average c7r away from c. Some evidence of this based on simulation is provided in [9]. We now make two rather strong assumptions which ensure consistency of a slightly modified version of c7r • AI) As p
-----t 00,
qo/p is bounded away from zero
A2) There is a known positive number k such that c::; k.
Parametric Empirical Bayes Model Selection
236
The modified version, also denoted by the same symbol, is p
Cn =
I)·q min(cq, k)
(19)
1
Under our assumptions cn ---+ c a.s. We sketch a proof. For slight simplicity, we take r = I. For q :::; qo
(20) This can be used to show for all q < qo (1 - E), 0 < E < 1, b > 0 and sufficiently small,
A(q, cq )
-
A(qo, cqo ) < (qo - ql {log(1 +
cqo ) -
c + b} + ql {log(1 +
cqo ) -log(1 + cq )} (4.1) (21)
(where '"'I > 0) with probability> I-E. We have used the fact that log (l+c) < c. We can now show as in the proof of Proposition 5.1 that
L
exp{A(q, cq )
-
A(qo, cqo )} ---+ 0
(22)
q~qo(1-t)
with probability tending to one as p ---+
00.
For q 2:: qo (23)
where by the strong law, sup Irql---+ 0 in probability. So, by concavity oflog(x), q?qo
there exists b > 0, such that for p 2:: q 2:: qo (1 + E) (24) where rq is a generic term such that sup Irql is op (1). Then for p 2:: q > qo(I+E), q
where
sup
qollrql = op(l)
q>qo(1+t)
The expression in (25) is, by (24),
b
< -q2
Once again an analogue of (22) for q > qo(1 + E) is true. So the contribution to cn from q > qo(1 + E) and q < qo(1- E) is negligible. But for Iq - qol < E, cq can be made as close to c by choice of E. This proves the consistency of cn .
N. Mukhopadhyay and J.K. Ghosh
5
237
Bayes Rule for Prediction Loss and Asymptotic Performance
It is well-known (see, e.g., Shao (1997)) that the loss in predicting unobserved Y's, for an exact replicate of the given design, on the basis of given data is the q
sum of a term not depending on the model and the squared error loss 2: (Yi -;Ji? . 1
So in evaluating performance of a model selection rule it is customary to ignore the term not involving the model and focus on the squared error loss. We do so below. For a fixed c the Bayes rule is described in the following theorem. We need to first define a quantile model. A model Mq is a posterior a-quantile model if n(i + 1 ~ qlY) ~ a < n(i ~ qlY) or equivalently. Theorem 5.1. The Bayes rule selects the smallest dimensional model ifrc and the posterior r~~cl quantile model if rc > 1
~
1
Proof Let Mq stand for the true (random) model with prior n(q) The posterior distribution of ;Ji given Mq is
rc n(;Jilq, Y) = N(--Yj, c/(1 1 + rc =
+ rc)),
i ~q
point mass at zero, i > q
Hence -
Y
2
2
C
E{(Yj-;Ji) Iq'Y}={1+rc} +1+rc' i~q -2
= Yi, Similarly, E{(;Ji - a)2Iq, Y} = {;~~~P =
i>q
+ l~rc'
i
~q
a,i > q
Suppose we ignore the fact that we have to select from among nested models (i.e., we have to include all j < i if we include i in our model) and just try to decide whether to set ;Ji non zero or zero. The posterior risks of these two decisions are
W(i excluded IY) -
=
c {n(q 1 + rc
Hence inclusion of i is preferred iff
~ ilY)} + Y?{( ~ )2n(q ~ iIY)}. 1 + rc
Parametric Empirical Bayes Model Selection
238
which implies
1 + rc 2 rc
< n(i
::; qlY)
Suppose rc > 1. Then we choose all i such that n(i ::; qlY) > ~~~~. Given the obvious monotonicity of n(i ::; qIY), this means we choose the ~~~~ posterior quantile model. Clearly this is the Bayes rule. More formally if d( q1) is the decision to choose model M q , corresponding posterior risk b
ql
w(q1IY) =
L
w(i includedlY)
L
+
w(i excludedIY)·
i=l
rc-1 : : : L Min{w(i includedlY), W(i excludedlY)} = W(-2 -quantile modellY) rc p
1
Similarly if rc ::; 1, it is easy to see that the simplest model minimizes the posterior risk among all models. This completes the proof. To define asymptotic Empirical Bayes optimality, we define an oracle, I.e., a lower bound to the performance of any selection rule. Let MqO be the true (unknown) model and d(q1) the decision to select Given y, the PEB risk of d(qd under M qO ' after division by qo, is 1
M q1 .
P
ql
A(q1) = -lLE{(Yi qo t= . 1
f3d Iqo, Y} + 2
L
E{f3ilqo, Y}]
. +1 t=ql
for q1 ::; qo 1
c
=--+ 1 + rc for q1
(1
1
-
L yqo
+ rc)2 qo.t=l
2
+
t
q1 - qo 1 qo q1 - qo .
L yq
t=qo+1
2
t
> qo·
Using the strong law of large numbers we obtain a heuristic approximation to A(q1) namely
c + q ~ + qo - q c2r2 l+rc qo(1+rc)r qo (l+rc)r' c 1 1 q - qo 1 =--+ -+---,qo
f3(qd =
which reduces to c
c 2r2
q (1 - rc)
-- + +1 + rc r(l + rc) qo
r
q ::; qo
N. Mukhopadhyay and J.K. Ghosh
239
and e 1 + re
q - qo 1
1
+ r(l + re) + ----;;;- -:;
q > qo
Clearly qo(3(·) is a non-random approximation to the posterior risk under Mqo. Note that (3(.) is minimum at qo if re > 1 and at q = 1 if re = 1, then (3(.) does not depend on q.
Let A (.) and (3(.) be defined as above. Then
Theorem 5.2.
inf A(q) lim supIA(q)-(3(q)I=O and lim .qf(3() =1. qo-+= q qo-+= In q q
Proof We consider the case q
~
qo. The other case follows similarly.
2
A(q) - (3(q) =
qo
(1:
re
1
qo - q
q
L fi2 - (1 + re)} q
) {-
2 2
c r
1
1
qo
~
-2
+ -qo- (1 + re )2 {qo-- -q L..,. Yi -
(1
+ re)}
q+l
=
Tl (q)
We show that sup ITl (q) I
--+
+ T2(q) 0 a.s. One can show the other part --+ 0 in a
q
similar way. q
By SLLN, given
E
-
> 0, we choose a A such that for q > A, I L:: Yi 2 /q- (1 +e)1 <
E.
1
Since qo > q and (1+e)2 > 1, ITl(q)1 for q ~ A can be made smaller than
~ ql. The remaining ITl(qo)1 if we choose q sufficiently large.
< E for A < q E
By repeated application of this kind of elementary argument one proves the first part of the theorem. The first part implies lim Iinf A (q) - inf (3 (q) I = 0 qo-+= q q Since, inf (3 (q) q
=e =1
if e
<1
if e
~
1
is positive, the second part of the theorem follows.
Theorem 5.3. For known e, the optimal model Mqc is asymptotically equivalent to the oracle q minimizing A(q) in terms of posterior predictive loss, i.e.,
Parametric Empirical Bayes Model Selection
240
posterior predictive loss of Mqc under qo (qo inf A(q) )
---+
1a.s.
q
as qo
---+ 00
To prove this we need the following result, which has some independent interest.
Proposition 5.1 Let qo be the true model,. As qo for any b ---+ 00 such that b = o(qo).
---+ 00,
7r(lql -qol
> bl¥)
---+
0
This is in the spirit of posterior consistency at qo except that b is not fixed but goes to infinity at a relatively slow rate. Proof of Theorem 5.4 Without loss of generality take r = 1. If c < 1, the model Mqc always chooses the simplest model. Hence its posterior risk (under qo) is qoA(qc). Since f3(q) is minimized at q = qc in this case, we are done. For c > 1, inf f3(q)
= 1.
Also by Prop 5.1,
qc qo
q
a.s.
---+
1
We consider the cases where qc ::;: qo The other case is similar. The posterior risk of Mqc for qc < qo is
which
---+
1 a.s. since
qc ---+ qo
1 a.s.
Proof of Prop. 5.1. We take r = 1 as before and let >.(c) = l~C log(l + c). It has been proved before that 1 < >.(c) < 1 + c. Using the strong law, given E > 0, there exists k > 0 such that for q > qo + k, with probability tending to one I(A(q) - A(qo))I(q - qo) - (1 - >,(c))1 < E i.e. A(q) - A(qo)
< -(q - qo)'y, for some I > o.
Hence
7r(q> qo
+ kl¥)::;:
L
t(q-qo)
where t =
e-"(
q>qo+k
=
t k / (1 - t)
One can similarly show 7r(q < qo - kl¥)
---+
---+
0
0, using >.(c) < 1 + c
Remark 5.1. Theorem 5.1. holds for unknown c if c is a consistent estimate and we use qc of the Empirica Bayes model selection rules but replacing cq by c. The same result holds for AlC also, which is interesting since AlC does not need to estimate c consistently. We prove this below. One simply notes that in Section 3 we prove that for rc > 1, AlC is consistent for qo, if qo ---+ 00. Also for rc < 1, AlC (q)- AlC (1) ---+ -00, if q ---+ 00. Using
N. Mukhopadhyay and J.K. Ghosh
241
these facts one shows, as in the proof of Theorem 5.4., Ale attains the same risk as the oracle. So far we have been looking at several Bayesian model selection rules from the point of view of prediction or squared error loss in a situation where after selection of model least squares estimates are used. Results differ in a major way if least squares estimated are replaced by the Bayes estimates E((3ilq, y) = 1~~c Yi if Mq is chosen and i -::; q. Since the proofs are similar we merely state the main facts. For a known c, the Bayes rule becomes the posterior median rule. This is a special case of a general result of Barbieri and Berger (2000) but can also be derived like Theorem 5. To define a Bayesian oracle, we redefine
A(q)
=
1
~
1"C
-
~
2
2
-[L,..,E{((3i- -I-Ii) Iqo,y}+ L,..,{((3i -0 Iqo,y}] qo i=1 + 1"C q+l q C -qo 1 + 1"C
+
(qo - q) C {-qo 1 + 1"C
1"C
+
1 + 1"C
+
~qO
-
2
.
Ii} (qio - ql) If ql -::; qo
q+l
and C
-- + 1 + 1"C
(q - qo) C -qo 1 + 1"C
+
~q
1
-
2
( - - I i ) /(ql - q) if ql > qo
qo
1 + 1"C
The heuristic nonrandom approximation is
(3(q) = !l_C_ qo 1 + 1"C and C
1 + 1"C
+ (ql
+ qo -
q{
qo
C
1 + 1"C
- qo) { C qo 1 + 1"C
+
2 2 1" C
1 + 1"C
+ _1_} 1 + 1"C
}
q -::; qo
q > qo
inf(3(qd = l~rc' attained at qo, for all c. The posterior median Bayes rule as well as the PEB model selection rules followed by Bayes estimation attains the risk of the Bayesian oracle, namely q minimizing A(q), provided C is known or a consistent estimate of C is used. The advantage of using the (shrinkage) Bayes estimates can be seen comparing the inf (3(q) for the two cases, namely l~rc for Bayes estimates and ~ for least squares estimates. For all fully Bayes rules reduce the posterior risk per component in the model by It~c which can be very large if both 1" and care small.
Parametric Empirical Bayes Model Selection
242
c=0.5,qo=50
c=0.5,qo=20 4 3 2
0.8 0.6 0.4 0.2
1
50 100150200250 c=3,qo=50
c=3,qo=20
1
50 100150200250
50 100150200250
Figure 1: Behavior of cq in a nested sequence of models.
6
Simulations and Discussion
A plot of cq against q is a good Bayesian data analytic tool that provides information about both c and the true dimension qo. This is true of all the four graphs in Figure 1 but it is specially noticeable when c if not too small. The second set of simulations describe the performance of different model selection rules for 0-1 loss. We have taken r = 1 In addition to AIC, BIC and the three PEB rules defined in section 2, we consider the Conditional Maximum Likelihood rule (CML) of [6], in which both cq and Wq are used as indicated in Section 2, even though the binomial prior seems unintuitive in the nested case. In simulation c = 0.5 or 3. Higher values of c are considered in [9], the results are very similar to those for c = 3. It is clear from Tables 1 and 2 that the BIC and CML are disastrous, as expected. AIC does well for c = 3 but badly for c =0.5, again as expected from Section 3. However, inconsistency is preferable to consistency in the prediction problem, vide the proof of Theorem 5.1 and Proposition 5.2. This is borne out by the third set of simulations.
The third set of simulations (Tables 3 and 4) describes performance of these cri teria under prediction loss. Once again, A* ( q) seems to do substantially better than A(q) and An is somewhat worse than the other two. AIC is competitive for c > 1 and dramatically better than c < 1 This is because with least squares estimates neither of the three PEB rules are asymptotically optimal if c < 1. Of course the Bayes rule qc for prediction loss would have done much better and be comparable to AIC.
N. Mukhopadhyay and J.K. Ghosh
41 5 A(q)
5
1
BIG AIG GML
8 44 270
38 310 1 3
A*(q)
A7r(q)
10 I
7 12 136 475 1 1 1 1 1 3 1 1 999
28 199 2
3 10 14 102 444 1 1 1 1 1 3 1 1 999
4 10 16 80 384 1 1 1 1 2 5 1 1 999
243
20 I 15 26 104 2 8 20 20 48 242 1 1 1 1 2 8 1 1 999
500 I
40 I 26 40 78 3 21 40 34 50 150 1 1 1 1 3
476 498 515 475 497 510 478 498 516 1 1 1 1 3 10 999 999 999
11
1 999 999
800 774 796 810 771 795 809 775 796 812 1 1 1 1 4 12 999 999 999
I
900 II 873 895 908 873 895 908 874 895 908 1 1 1
1 3 9 999 999 999
Table 1: Quartiles of the dimensions selected by different criteria for c = 0.5, r
=
II
qo
1.
10 I 4
3 A(q)
5 12
5 22 2
2 A*(q)
8
6
4 4
4
1
8 10 14 1
1 1
2
1 2
2 3
3 6
4
4 4
9 5
1
1
GML
10
5 38
1
AIG
9 5
8 64
BIG
11
4
3
A7r(q)
10
1 999
10 1
1 999
2 999
20 I 17 20 21 16 19 20 17 20 21 1 1 3 16 19 20 1 999 999
40 37 39 41 36 39 40 37 39 41 1 1 3 36 39 40 999 999 999
I
500 497 500 500 497 500 500 497 500 500 1 1 3 497 499 500 999 999 999
I
800 797 799 800 797 799 800 797 799 800 1 1 3 797 799 800 999 999 999
I
900 897 899 900 897 899 900 897 899 900 1 1 3 896 899 900 999 999 999
II
Table 2: Quartiles of the dimensions selected by different criteria for c = 3, r = 1.
Parametric Empirical Bayes Model Selection
244
II
qo A(q)
A*(q) An (q)
BIG AIG GML
227.94
5 211.53
10 205.26
20 178.71
40 138.14
500 522.19
800 818.44
900 909.33
35.77 293.53 2.63 5.18 425.15
20.66 297.17 3.05 4.91 412.92
37.06 297.28 5.5 7.86 466.09
42.47 235.44 10.54 13.68 499.04
54.76 180.23 20.69 25.82 574.25
518.25 522.89 250.74 258.63 998.05
816.34 818.57 401.06 409.8 1000.51
908.1 909.44 450.62 457.95 998.57
4
Table 3: Prediction loss of the models selected by different criteria for c r = 1.
II
qo A(q) A*(q) An(q)
BIG AIG GML
4
94.92 14.36 146.83 6.85 6.56 331.95
5 113.44 19.09 145.46 9.51 7.09 371.34
10 39.42 15.6 53.61 21.74 13.13 446.24
20 31.23 24.45 31.21 50.36 23.23 635.56
I
40 44.6 44.22 44.6 108.76 43.62 847.6
500 503.71 503.65 503.71 1489.84 503.66 998.85
II
= 0.5,
800 804.03 804.08 804.03 2392.47 804.32 998.6
Table 4: Prediction loss of the models selected by different criteria for c r = 1.
I
900 904.4 904.36 904.35 2693.92 904.23 999.55
= 3,
We have not done any simulations on the posterior median Bayes rule, which uses PEB shrinkage Bayes estimates. It is expected to outperform AIC as seen from the comparison of j3(.), s for model selection followed by least squares and model selection followed by Bayes estimates. The three PEB criteria of Section 2, followed by Bayes estimates, are expected to do much better than evident in Tables 3 and 4 but not as well as the posterior median rule. It may be worth pointing out that there is a basic difference between the median Bayes rule and AIC. Whether c > 1 or < 1, the median Bayes rule is consistent at qo- a proof can be constructed using Proposition 5.1 But it then shrinks the estimates towards zero appropriately, depending on values of c. AIC doesn't have this option, it uses least squares estimates. So for critically small values of c, namely c < 1, it has to choose a much lower dimensional model to have some sort of shrinkage.
Bibliography [1]
Akaike, H. (1973) Information Theory and an Extension of the Maximum Likelihood Principle. In B. N. Petrov and F. Czaki, editors, Proceedings of the Second International Symposium on Information Theory, 267-271 Budapesk: Akad. Kiado.
[2]
Barbieri, M. and Berger, J. (2000) Optimal Predictive Model Selection, ISDS Discussion Paper, Duke University.
II
N. Mukhopadhyay and J.K. Ghosh
245
[3J
Berger, J.O., Ghosh J. K., and Mukhopadhyay, N. (2003) Approximations and consistency of Bayes factors as model dimension grows, Journal of Statistical Planning and inference, [112], 241-258.
[4J
Berger, J. O. and Pericchi, L. R. (2001) Objective Bayesian Methods for Model Selection: Introduction and Comparison, IMS Lecture Notes, (P. Lahiri editor) 38, 135-203.
[5J
Efron, B. and Morris, C. (1973) Stein's Estimation Rule and its Competitors an Empirical Bayes Approach, Journal of the American Statistical Association, 68, 117-130.
[6J
George, E. I. and Foster, D. F. (2000) Calibration and Empirical Bayes Variable Selection, Biometrika, 87, 731-747.
[7J
Li, K-C (1987) Asymptotic Optimality of cP ' Cl, cross Validation and Generalized cross Validation: Discrete Index Set, Annals of Statistics, 15, 958975.
[8J
Morris, C. (1983) Parametric Empirical Bayes Inference, Journal of the American Statistical Association, 78, 47-55.
[9J
Mukhopadhyay, N. (2000) Bayesian Model Selection for High Dimensional Models with Prediction Loss and 0-1 loss, thesis submitted to Purdue University.
[10J Mukhopdhyay, N. and Ghosh, J.K. (2002) Bayes Rules for Prediction Loss
and AIC, (submitted). [l1J Rissanen, J. (1983), A Universal Prior for Integers and Estimation by Minimum Description Length, Annals of Statistics, 11, 416-431. [12J Schwartz, G. (1978) Estimating the Dimension of a Model, The Annals of Statistics, 6, 461-464. [13J Shao, J. (1997) An Asymptotic Theory for Linear Model Selection, Statistica Sinica, 7, 221-264. [14J Shibata, R. (1981) An Optimal Selection of Regression Variables, Biometrika, 68, 45-54. [15J Shibata, R. (1983) Asymptotic Mean Efficiency of a Selection of Regression Variables, Annals of the Institute of Statistical Mathematics, 35, 415-423.
246
Parametric Empirical Bayes Model Selection
A Theorem of Large Deviations for the Equilibrium Prices in Random Exchange Economies Esa Nummel in University of Helsinki
Abstract We formulate and prove a theorem concerning the large deviations of equilibrium prices in large random exchange economies.
1
Introduction
We consider an economic system (shortly, economy) E, where certain commodities j = 1, ... , l are traded. Let R~ =def {p = (pI, ... ,pl) E Rl; pJ ~ 0 for all j = 1, ... , l}. The elements p of R~ are interpreted as price vectors (shortly, prices). (We will follow a convention, according to which superscripts always refer to the commodities whereas subscripts refer to the economic agents.) The total excess demand function Z (p) = (Zl (p), ... , Zl (p)) E Rl comprises the total excess demands on the l commodities in the economy at the prices p E R~. Its zeros p* are called the equilibrium prices: Z(p*) = O.
(In fact, according to Walras' law, we may regard money as an l+ 1 'st commodity [the numeraire] having price pl+l = 1 and total excess demand Zl+l(p) = -p. Z(p).) In the classical equilibrium theory the economic variables and quantities are supposed to be deterministic, see [2]. It is, however, realistic to allow uncertainty in an economic model. We assume throughout this paper that the total excess demand Z (p) is a random variable (for each fixed price p). In particular, it then follows that the equilibrium prices p* form a random set.
The seminal works concerning equilibria of random economies are due to Hildenbrand [5], Bhattacharya and Majumdar [lJ and Follmer [4J. The equilibrium prices in large random economic systems obey (under appropriate regularity conditions) classical statistical limit laws. The law of large numbers [lJ states that, as the number n of economic agents increases, the random equilibrium prices (r.e.p.'s) p~ become asymptotically equal to deterministic "expected" equilibrium prices: · Pn* 11m
n-+CXJ
= Pe'*
247
Random Exchange Economies
248
(The subscript n refers to the number of economic agents.) The central limit theorem (CLT) for the r.e.p. 's [lJ characterizes the "small deviations" of the r.e.p.'s from their expected values as asymptotically normal:
n ~ (p~ - p:)
----+
N in distribution,
where N denotes a multinormal random vector having mean zero. We argue in this article for the relevance of the theory of large deviations to random equilibrium theory. To this end, suppose that, an aposteriori observation of the equilibrium price is made, and let p denote the value of this observation. If the modeler is concerned with the estimation of the apriori probability of an aposteriori observation p of the equilibrium price in a large economy, the use of the CLT requires the apriori model to be "good" in the sense that the observation p ought to fall within a narrow range (having the asymptotically negligible order n - ~ = o( n)) from its expected value p;.
However, due to the fact that economics is concerned with the (economic) behaviour of human beings, any (predictive) economic model is always to some extent defective. It follows, in particular, that in a large economy an observed equilibrium price p may well represent a "large deviation" from its apriori predicted value p; (viz. fall outside the region of validity of the CLT). The main result of this paper is a theorem of large deviations (LD's) for the random equilibrium prices. It yields an exponential estimate for the (apriori small) probabilities of observations of r.e.p.'s "far away" from their expected values. Namely, we prove that, under appropriate regularity conditions, for an arbitrary fixed price p, there exists a constant i(p) 2: 0 such that
(1.1 )
In accordance with standard LD terminology (see [3]), we refer to the price depending constant i(p) as the entropy. In what follows we shall formulate and prove (1.1) as an exact mathematical theorem. LD theorems for random equilibrium prices were earlier presented in [7],[8J. The version here is of "local type" in that we are concerned with probabilities of observations of r.e.p.'s in small neighborhoods of a given fixed price. Because of this it turns out that the hypotheses of [7],[8J can be somewhat relaxed. Also it becomes possible to give a self-contained proof which does not lean on the general abstract LD theory. Therefore the proof ought to be accessible also to a reader who is not an LD specialist. The basic idea in the proof is to use a centering argument of a type which is commonly used in LD theory.
Esa N ummelin
2
249
Formulation of the LD theorem
We describe now the basic set-up and formulate the large deviation theorem in exact terms. We will be concerned with a sequence En, n = 1,2, ... , of economies. We assume that in the economy En there are N n economic agents labeled as i = 1, ... , N n . We assume that N n is of the order O(n); namely,
Nn
::;
An for some constant A <
00.
(2.1)
Let (0, P, F) be a probability space. We consider a double sequence of R1-valued maps (in: X R~ -----t Rl, n = 1,2, ... , i = 1, ... , N n , such that, for each fixed n, i and p, the function
°
is a random variable (viz. F-measurable). (in(P) is interpreted as the (random) individual excess demand by the i'th agent in En at the price p. Example 2.1. In a Cobb-Douglas exchange economy the individual excess demand by an agent i E En on commodity j is given by the formula I ;-j (
) _
"in p -
(,,-J)-1 jJ aj
in
"" k k ~p ein
- e jin ,
k=l
where the parameters a{n 2: 0 satisfy I
L a{n = 1 for all i
and n,
j=l
and e{n denotes the agent's initial endowment on the commodity j, see e.g. [1 OJ. In a random Cobb-Douglas exchange economy the parameters a;n and e{n are supposed to be random variables. The random total excess demand in the economy En is obtained as the sum of the random individual excess demands: Nn
Zn(P) =
L (in(P)· i=l
(In order to indicate its dependence on the size parameter n, we equip henceforth the total excess demand with the subscript n.) For a fixed economy En and for a fixed realization wE 0, a price p~(w) at which the total excess demand function vanishes, i.e., such that Zn(w;p~(w)) = 0, is called an equilibrium price for the realization w in the economy En. We denote by 7r~ (w) the set of equilibrium prices p~ for the realization w in the economy En·
Random Exchange Economies
250
Let
Cn(o:;p)
log Eea,zn(p), 0: E Rl,
=
denote the cumulant generating function (c.g.f.) of the random total excess demand Zn (p), P E R~, and let
e(o:;p) = limsupn-1Cn(0:;p). n--+oo
We denote
i(p)
=
-
inf e(o:;p) aERI
and call it the entropy (associated with the price p). Note that, due to the fact that c(O;p) == 0 it follows that i(p) ;::: 0 always. Recall that a c.g.f. is always a convex function. Consequently, Cn(o:;p) as well as the limit e( 0:; p) are convex functions (of the variable 0:). Thus in particular, if
Be
80: (o:(p); p) = 0 for some o:(p) E Rl, cf. the hypothesis (HI),
(2.2)
then it follows that
(2.3)
i(p) = -e(o:(p);p). The zeros pnces:
P:
of the entropy function i(p) will be called expected equilibrium
Under appropriate regularity conditions these are the same as the zeros of the mean excess demand function J-L(p) , defined by
Proposition 2.1. Suppose that
(2.4) there is a unique o:(p) such that ~~ (o:(p);p) = 0, and (2.5) e( 0:; p), 0: E Rl is differentiable at 0: = O.
Then (2.6) J-L(p)
=
~~ (O;p), and
(2.7) i(p) = 0 if and only if J-L(p) = O.
Proof of Proposition 2.1 That (2.5) implies (2.6) is a standard fact in LD theory (see e.g. [3]). In order to prove (2.7) assume first that i(p)
= 0,
i.e.,
c(o:(p);p) = min C(O:iP) = O. aERI
Esa Nummelin
Since c(O;p)
251
= 0, it follows from the uniqueness of a(p) that a(p) = O. p,(p)
Bc Ba(O;P)
=
O.
Bc p,(p) = Ba(O;P) =
o.
=
Therefore
Suppose conversely that
Again, due to uniqueness, a(p)
= 0 so
that
i(p) = -c(a(p);p) = -c(O;p) = 0,
o
indeed.
Example 2.2. Suppose that N n == nand (in(P) = (i(P) for i = 1, ... , n, where (i(P), i = 1,2, ... , is a sequence of i.i.d. random variables (fOT each fixed price p). In this case Cn(a;p) - nc(a;p), (2.8)
and therefore c(a;p)
=
log EeO:-(l (p)
is equal to the c.g.f. of the individual excess demand (l(P). Moreover, due to the classical LLN for i. i. d. random variables, the mean excess demand is equal to the expectation of the individual excess demand:
Let us now fix a price p E R~. We formulate the following set of hypotheses. (The abbreviation "w.p.l" means the same as "with probability 1", and the phrase" eventually" means "for all sufficiently big n" .) (HI) ::Ja = a(p) E Rl: g~ (a(p);p) = 0;
(H2) c(a(p);p) = lim n-1Cn(a(p);p); n---+CX)
(H3) ::JA1(p) <
00,
cl(P) > 0 : I(In(q)1 < Al(P) w.p.l, for all i and n, for
Iq - pi < cl(P);
< 00, c2(P) > 0 : Iq - pi < c2(P);
(H4) ::JA2(P)
(H5) ::JA-1(p)
< 00:
1(:~(q)1
::; A2(P) w.p.l, for all i and n, for
l(n-lZ~(p))-ll::; A-l(P) w.p.1., for all n.
Remarks. (i) Condition (H4) implies condition (H3).
Random Exchange Economies
252
(ii) Suppose that (in(P) = (i (p) , where (i(P), i = 1,2, ... , are i.i.d. as before. Now, due to (2.8), the hypothesis (H2) is trivially true. Also it turns out that in this case hypothesis (H5) can be replaced by the simpler hypothesis (H5 ') ,i (p) is non-singular, see [9].
(i) Suppose that the hypotheses (H1-3) hold true. Then there exists a constant Mo (p) < 00 such that
Theorem 2.1.
P(7r~
n U(p, c)
eventually, for all 0 < c <
0) < e-n(i(p)-Mo(p)E)
-=1=
Cl (p).
(ii) Suppose that the hypotheses (H1-2,4-5) hold true. constant Ml (p) < 00 such that P(7r~
eventually, for all c >
n U(p, c)
-=1=
Then there exists a
0) > e- n (i(p)+M 1 (p)E)
o.
Let us call a price p E R~ non-expected, if the entropy i(p) > o. Under the conditions (2.4-5) this is equivalent to p not being a zero of the mean excess demand fl(P):
fl(P)
-=1=
o.
By using Borel-Cantelli lemma we obtain the following corollary of part (i) of the LD theorem: Corollary 2.1. Suppose that the hypotheses (H1-3) hold true. Let p E R~ be a
non-expected price. Then 7r~
3
n U(p, c) = 0 eventually,
w.p.1, for all 0 < c < cl(P)·
Proof of the LD theorem
For the proof of the upper bound (i) we need two lemmas. standard type in LD theory.
The first is of
We define the following sequence of probability measures:
Pn;p(dw) = ea(p).zn(w;p)-Cn(a(p);p) P(dw), n = 1,2, .... Lemma 3.1. Suppose that hypotheses (H1-2) hold true. Then for each <5 > 0, there exists a constant 'T] = 'T]( <5; p) > 0 such that
Esa Nummelin
253
Proof of Lemma 3.1. Let t > 0 be arbitrary. By Chebyshev's inequality we have for the j'th component of the total excess demand:
p n,p . (zjn (p) > . etZ~(p) _ n&) < _ e- tn8 E n,p
where
ej
denotes the j'th unit vector in RI. Due to (HI) and (H2),
n-tCX)
= &(t)t where &(t)
---+
0 as t
---+
M
O. By choosing t small enough we thus see that
limsupn-llogPn;p(Z~(p) ~ n&)
< O.
n-tCX)
By symmetry, we have also limsupn-llogPn;p(Z~(p) ::::; -n&)
< 0,
n-tcx)
o
which completes the proof of Lemma 1.
Lemma 3.2. Suppose that the hypotheses (Hl-2) hold true. Then, for all &> 0, we have:
e- n (i(p)+2I a (p)18) < P(IZn(p)1 < n&) < e- n (i(p)-2I a (p)18) eventually.
Proof of Lemma 3.2. Recalling (2.3) we see that it suffices to prove that lim sup In- 1 logP(IZn(P)1
< n&) -c(o:(p);p)l::::; lo:(p)W
(3.1)
n-tCX)
Due to Lemma 1,
~ < 1- e- n'T)(8;p) < Pn;p(IZn(P) I < n&)
::::; 1 eventually,
and hence, in view of the definition of the probability measure Pn;p(}
Now clearly,
whence -log2 -lo:(p)ln& < logP(IZn(p)1 < n&) - Cn(o:(p);p)::::; lo:(p)ln& eventually, from which the claim (3.1) follows by letting n
---+ 00.
o
Random Exchange Economies
254
Now we are able to prove the upper bound inequality (i). To this end, note first that, due to the hypotheses (2.1), (H3) and the mean value theorem, we can conclude that the event 7r~
n U(p, E) =I- 0
implies the event
IZn(P) I ~ AA1(p)nE w.p.l, for all n;:::: 1, 0 < E < E1(P)· Thus, in view of Lemma 2, P(7r~
n U(p, E) =I- 0)
~ P(IZn(p)1
< AA1(p)nE) < e-n(i(p)-Mo(p)e:) eventually,
where the constant Mo(p) = 2AA1(p)la(p)l. For the lower bound we need the following lemma which is a straightforward corollary of Theorem XIV in [6].
Lemma 3.3. Suppose that f : R~ E-neighborhood of the price p:
---t
If"(q)1 ~ M <
Rl has bounded second derivative in an
00
for Iq - pi < E.
Moreover, suppose that the derivative f' (p)
E
Rl x I is non-singular, and
. E I} If '( p) -1 1< mm{2If(p)I' 4ME .
Then f(q) = 0 for some Iq - pi < E.
Proof of Lemma 3.3Let g(h) = j'(p)-l(J(p + h) - f(p)),
Ihl < E.
Then g(O) = 0, g'(O) = I (= the identity), and
It follows that
Let
z
~
- j'(p)-l f(p).
Then Izl = If'(p)-lllf(p)1 < ~ and hence by setting s = ~ in [L: Lemma XIV.1.3] we can conclude that there exists a unique Ihl < E satisfying g(h) = z, viz. f(p + h) = O.
Esa N ummelin
255
Now we are able to prove the lower bound inequality (ii). To this end, let
in Lemma 3. Due to (H4) and (H5), we have
and
Note that, by monotonicity, it suffices to prove the assertion for small only. Thus we may assume that
E < min {E2(P),
4A (P;A 2
-l(p)
E
> 0
},
where E2(P) is as in (H4). Now, in view of Lemma 3 it follows that, if
then n- 1 Zn(q) = 0 for some
Iq - pi < E,
VIZ.
1f~
n U(p, E)
=1=
0.
Finally, by Lemma 2
P(1f~ n U(p, E)
=1=
0) 2': P(ln- 1 Zn(P) I < A
> e-n(i(p)+Mt(p)c:)
E ( ))
P eventually, 2
-1
where the constant
This completes the proof of the theorem.
Acknowledgements I would like to thank Professor Krishna B. Athreya for the invitation to take part in this Festschrift in Honor of Professor Rabi Bhattacharya. I am indebted to Professor Mukul Majumdar for useful comments on the text.
Random Exchange Economies
256
Bibliography [1]
Bhattacharya, R.N. and Majumdar, M.: Random exchange economies. J. Economic Theory 6, 37-67 (1973).
[2]
Debreu, G.: Theory of Value. Wiley, 1959.
[3]
Dembo, A. and Zeitouni, 0.: Large Deviations and Applications. Jones & Bartlett, Boston, 1993.
[4]
Follmer, H.: Random economies with many interacting agents. J. Math. Economics 1, 52-62 (1974).
[5]
Hildenbrand, W.: Random preferences and equilibrium analysis. J. Economic Theory 3, 414-429 (1971).
[6]
Lang, S.: Real and Functional Analysis. Springer, New York, 1993.
[7]
Nummelin, E.: On the existence and convergence of price equilibria for random economies. The Annals of Applied Probability 10, 268-282 (2000).
[8]
Nummelin, E.: Large deviations of random vector fields with applications to economics. Advances in Applied Math. 24, 222-259 (2000).
[9]
Nummelin, E.: Manuscript, under preparation, 2003.
[10] Varian, H.: Microeconomic Analysis. Norton, New York, 1992.
Asymptotic estimation theory of change-point problems for time series regression models and its applications Takayuki Shiohama, Masanobu Taniguchi Osaka University, Japan
and Madan L. Puri Indiana University, USA Abstract It is important to detect the structural change in the trend of time series model. This paper addresses the problem of estimating change point in the trend of time series regression models with circular ARMA residuals. First we show the asymptotics of the likelihood ratio between contiguous hypotheses. Next we construct the maximum likelihood estimator (MLE) and Bayes estimator (BE) for unknown parameters including change point. Then it is shown that the proposed BE is asymptotically efficient, and that MLE is not so generally. Numerical studies and the applications are also given.
AMS subject classifications: 62MlO, 62M15, 62N99 Keywords: Change point, time series regression, asymptotic efficiency, Bayes estimator, maximum likelihood estimator.
1
Introduction
The change point problem for serially correlated data has been extensively studied in the literature. References on various time series models with change-point can be found in the book of Csorgo and Horvath (1997) and the review paper of Kokoszka and Leipus (2000). Focusing on a change point in the mean of linear process, Bai (1994) derived the limiting distribution of a consistent change-point estimator by least squares method. Later Kokoszka and Leipus (1998) studied the consistency of CUSUM type estimators of mean shift for dependent observations. Their results include long-memory processes. For a spectral parameter change in Gaussian stationary process, Picard (1985) addressed the problem of testing and estimation. Giraitis and Leipus (1990,1992) generalized Picard's results to the case when the process concerned is possibly non-Gaussian. For a structural change in regression model, a number of authors studied the testing and estimation of change point. It is important to detect the structural change in economic time series because parameter instability is common in this field. For testing structural changes in regression models with longmemory errors, Hidalgo and Robinson (1996) explored a testing procedure with 257
Asymptotic estimation theory
258
nonstochastic and stochastic regressors. Asymptotic properties of change-point estimator in linear regression models were obtained by Bai(1998), where the error process may include dependent and heteroskedastic observations. Despite the large body of literature on estimating unknown change-point in time series models, the asymptotic efficiency has been rarely discussed. For the case of independent and identically distributed observations, Ritov (1990) obtained an asymptotically efficient estimator of change point in distribution by a Bayesian approach. Also the asymptotic efficiency of Bayes estimator for change-point was studied by Kutoyants (1994) for diffusion-type process. Dabye and Kutoyants (2001) showed consistency for change-point in a Poisson process when the model was misspecified. The present paper develops the asymptotic theory of estimating unknown parameters in time series regression models with circular ARMA residuals. The model and the assumptions imposed are explained in Section 2. Also Section 2 discusses the fundamental asymptotics for the likelihood ratio process between contiguous hypotheses. Section 3 provides the asymptotics of the maximum likelihood estimator (MLE) and Bayes estimator (BE) for unknown parameters including change-point. Then it is shown that the BE is asymptotically efficient, and that the MLE is not so generally. Some numerical examples by simulations are given in Section 4. Section 5 is devoted to the investigation of some real time series data. All the proofs are collected in Section 6. Throughout this paper we use the following notations. A' denotes the transpose of a vector or matrix A and X(,) is the indicator function.
2
Asymptotics of likelihood ratio and some lemmas
Consider the following linear regression model Yt
where
= {o'x(t/n ::; T) + (3'x(t/n > T)}Zt + Ut, = Tt(O, (3, T) + Ut, (say), t = 1, ... , n
Zt = (Ztl, ... , Ztq)'
are observable regressors,
0
=
(Q1, ... ,
(2.1)
Qq)'
and (3 =
(/31, ... , /3q)' are unknown parameter vectors, and {ud is a Gaussian circular ARMA process with spectral density f()..) and E(ut) = O. Here T is an unknown change-point satisfying 0 < T < 1 and (0', (3', T) E E> c IRq x IRq x lR. Letting n-h
2:=
Zt+h,jZtk,
h
= 0,1, ...
t=l
n
2:=
ZHh,jZtk,
h = 0, -1, ... ,
t=l-h
we will make the following assumptions on the regressors {zd, which are a sort
Takayuki Shiohama, Masanobu Taniguchi, Madan L. Puri
259
of Grenander's conditions.
Assumption 2.1. l+p
(G.1) aii(O) = O(n),
i
= 1, ... , q,
and
L ZZi = O(p) for any (1 ~ l ~ n). t=l
(G.2) limn----;oo z~+l,daii(O)
= 0,
i = 1, ... , q.
(G.3) The limit lim
an (h)
_~_J_
n----;oo
n
= Pi ·(h) J
exists for every i, j = 1, ... ,q and h = 0, ±1, .... Let R(h) = {pij(h); i, j = 1, ... ,q}. (G.4) R(O) is nonsingular. From (G.3) there exists a Hermitian matrix function M(A) = {Mij(A); i,j = 1, ... ,q} with positive semidefinite increments such that (2.2)
Suppose that the stretch of series from model (1) Y n = (Yl,··· ,Yn)' is available. Denote the covariance matrix of Un = (Ul,···, un)' by 2: n , and let tn = (rl,··· ,rn)' with r t = r t ( a,{3, T). Then the likelihood function based on Y n is given by
Since we assume that {Ut} is a circular ARMA process, it is seen that the following representation
2: n =
U~diag{21T f(Ad,·
where Un = {n- 1 / 2 exp(21Tits/n); t, s son (1977)). Write
~n
has
.. ,21T f(An)} Un
= 1, ... ,n} and Ak = 21Tk/n (see Ander-
Then the likelihood function (2.3) is rewritten as
Define the local sequence for the parameters:
a n =a+n- 1 / 2 a,
f3n=f3+n- 1 / 2 b,
Tn=T+n-1p
(2.5)
Asymptotic estimation theory
260
where a, bE IRq and pER Under the local sequence (2.5) the likelihood ratio process is represented as
where dn(>'k)
= (27fn)-1/2 L~=1 UteitAk and A(>'k) = A1 + A2 + A3 with [Tn+p] Al
= (27f!(Ak))-1/2
L
((3 - o:)'zse-iSAk,
s=[Tn]+1 [m+p] A2 = -(27fn!(Ak))-1/2 a'zse-iSAk
L
8=1
and
s=[Tn+p]+1 Here note that dn(Ak), k = 1,2, ... are i.i.d. complex normal random variables with mean 0 and variance !(Ak) (c.f. Anderson (1977)). Henceforth we write the spectral representation of Ut by
Ut
=
i:
eitAdZu(A).
(2.7)
The asymptotic distribution of Zn (a, b, p) is given as follows.
Theorem 2.1. Suppose that Assumption 2.1 holds. Then for all (0:', (3', T) E the log-likelihood ratio has the asymptotic representation log Zn(a, b, p)
= ((3 - o:)'Wl 1 - 87f2
+ yTa'W2 + ~b'W3 [Tn+p]
<Xl
L
f(j)
j=-<Xl
=
log Z(a, b, p)
where
L
((3 - 0:)' zs+jZ~((3 - 0:)
s=[Tn]+1
+ op(l),
(say),
e,
Takayuki Shiohama, Masanobu Taniguchi, Madan L. Puri
261
and
Here WI, W 2 and W3 are asymptotically normal with mean 0 and covariance matrix VI, V2 and V3 , respectively, where
Next we present some fundamental lemmas which are useful in the estimation of change point.
Lemma 2.1. Suppose that Assumption 2.1 holds. Then for any compact set C C 8, we have sup Ea,(3,TZ~/2(a, b, p) ::; exp{ -g(a, b, pH a,{3,TEC
where g(a, b, p)
= (ai, b')K
(~) + cipi
with some positive definite matrix K and c >
o.
Lemma 2.2. Suppose that Assumption 2.1 holds. Then for any compact set C C 8, there exist Ii(C) = Ii, B(C) = B such that sup [iial - a211 2 + Ilb l (a,{3,T)EC!ai I
3
E a ,{3,T
[Z~/4( a2, b2, P2) - Z~/4(al' bl, PI)
-
r: ;
r
b2112 + IPI - P21 2 B(l
I
+ H"").
Estimation theory
We are interested in the behavior of maximum likelihood estimator (MLE) and Bayes estimator (BE). To introduce these estimators, we need a loss function w(y), y E ]Rd which is 1. nonnegative, continuous at point 0 and w(O)
= 0, but is not identically 0;
2. symmetric: w(y) = w( -y); 3. the sets {y : w(y) < c} are convex for all c > O.
Asymptotic estimation theory
262
We denote by W p the class of loss functions satisfying 1-3 with polynomial majorants. The example of such function is w(y) = lylP,p > O.
A'
=
The MLE 0ML
,A'
(Q ML , (3ML,TMd of 0
" = (0.', (3 ,T)
=
L(QML' i3ML,TML)
is defined by (3.1 )
L(o, (3, T)
max (a,{J,T)Ee
-,
-,
The Bayes estimator 0 E = (a~, (3 E, PE) with respect to the quadratic loss function l(x) = IIxl1 2 and a prior density 7rC) is of the form
OE
=
r Op(OIYn)dO
(3.2)
Je
where
=
p(OIYn )
7r(O)Ln(O)
Je 7r(v)Ln(v)dv
.
We suppose that the prior density is a bounded, positive and continuous function possessing a polynomial majorant on e. For Z (u), u = (a', b' , p)', in Theorem 1, define two random vectors
u'
A'
(ii', b , p) and iL'
=
Z(u) =
_,
= (ii', b
,p) by relations
Z(u),
sup
(3.3)
UElF!.2q+l
_
U=
flF!.2d 1
uZ(u)du
(3.4)
~----------
flF!.2Q+l
Z (v )dv
Theorem 3.1. Let the parameter set e be an open subset of jR2q+l. Then the MLE is uniformly on (0., (3, T) E e, consistent P -
lim OM L
n--+CXl
=0
and converges in distribution
£e( diag{ vITi, ...
,vITi, n})( 9M L
For any continuous loss function w E W
lim EOw((diag{ vITi,,,,
p,
-
O)} -----t £( u). d
we have
,vITi, n} )(9 ML -
n--+CXl
0)) = Ew(u).
A similar theorem for Bayes estimators can be stated as follows. Theorem 3.2. The Bayes estimator OE, uniformly on 0 E
e,
is consistent
Pe - lim OE = 0 n--+CXl
and converges in distribution
£e( diag{ vITi, ...
,vITi, n})(O E
-
O)}
-----t
d
£( iL).
For any continuous loss function w E W P' we have
lim EOw((diag{ vITi,,,,
n--+CXl
,vITi, n} )(OE -
0)) = Ew(iL).
Remark. From Theorem 3 and Theorem 1.9.1 oflbragimov and Has'minski(1981), we can see that the BE is asymptotically efficient such that
Ellul1 22: ElliLl12.
Takayuki Shiohama, Masanobu Taniguchi, Madan L. Puri
4
263
Numerical examples
In this section we report some Monte Carlo results for the MLE and BE of an unknown change point. We consider the following time series regression model:
_{a'{3
Yt -
,
+ Ut, t = 1, ... , [Tn] Zt + Ut, t = [Tn] + 1 ... ,n, Zt
(4.1)
where {ud is a Gaussian AR(I) process generated by Ut
=
~Ut-l
+ Ct,
Ct "-'
i.i.d.N(O, (}2).
To verify the theoretical results and for comparative purposes, we deal with the following regressors; Model (I) : Zt = 1 (scalar-valued), Model (II): Zt = cos(vt) (scalar-valued), Model (III): Zt = (1, cos(vt))'. For simplicity, we assume that the parameters a, (3, ~ and () are known and focus on the estimation of unknown change point T. The error term Ct'S are same across different combinations of parameters and models. The coefficients (a,{3) are taken to be (0,2), (1,3) and ((0,1),,(2,3),) for the corresponding Models (I), (II) and (III), respectively, and v = 7r/6. The MLE and BE with uniform prior of T are given by
k = inf{k:
max {Ln(i/n)}
l:::=;t
= Ln(k/n)},
and Ti
= i/n,
i
= 1, ... ,n - 1
respectively. Then we compute the mean and the square root of the mean square error (RMSE) for TM Land TB based on 100 replications. Table 4.1 summarizes the simulation results for ~ = 0.7,0.9 and n = 100,300. The change point T is fixed to be 0.5. A closer inspection of Table 1 reveals some interesting characteristics. First, we notice that, in each case, the RMSE of BE is smaller than that of MLE, however mean estimates are almost same for all cases. A change in a cosine trend function seems to increase the bias of a change point estimators, while for n = 300, the mean estimates lie in the vicinity of 0.5. The effect of large value of ~ (near unit root) for MLE is particularly significant for Model (I) in view of RMSE. The histogram of these results are plotted in Figures 4.1 and 4.2 for ~ = 0.7 and ~ = 0.9, respectively, when n = 100. A study of these figures facilitates understanding of the simulation results in Table 4.1. It is obvious that the shape of distributions for MLE and BE is different when ~ = 0.9. The former has a fatter tail in general, while the latter has high frequencies around 0.5. For Models (II) and (III), the distributions of MLE and BE are skewed to the right,
264
Asymptotic estimation theory
which causes an increase in bias of an estimator. These facts are verified by comparing the sample coefficient of skewness and the sample kurtosis which are listed in Figures 4.2 and 4.3 together. It is questioned how large the RMES becomes for different values of ~ and the cases when the change point locates the edge of samples. A perspective view of the result given in Table 1 for the RMSE of Model (I) is shown in Figures 4.3 over a grid of points T = 0.1, ... ,0.9 and ~ = 0.1, ... ,0.9. According to this figure, as it is expected, we observe that the RMSE increases as ~ increases. However it seems that the RMSE's are stable and unaffected by T even though T is close to 0.1 and 0.9 when ~ is from 0.1 to 0.7. The discrepancy of RMSE between MLE and BE is significantly large for ~ = 0.9 and T = 0.5. As it can be seen from this figure that the BE works better than MLE in terms of RMSE in all cases. Next, we investigate the effect of the selection of frequency v in Model (II). The autoregressive parameter ~ is fixed at 0.7. Table 4.2 presents the results . We observe that the precision of the change point estimates deteriorates when v is close to 0 when n = 100. While the consistency is convincing for large n, the RSEM of MLE and BE becomes large as the frequency v tends to O. We summarize the simulation results as follows. First, the performance of BE is better than MLE in terms of RMSE, which is consistent with the theoretical result given in the previous section. Even though we assumed that the parameters except for change point are known, it is expected that similar characteristics will be observed for the cases of unknown parameters. To see these, we will report some real data analysis in next section.
Takayuki Shiohama, Masanobu Taniguchi, Madan L. Puri
265
Model (I) 1il
iil
_.1..__
~
!il
0,0
0,2
0,4
0,6
0,8
0
'" g 1,0
0,0
0.2
--L 0.4
0,6
0,8
1.0
0,8
1.0
0,8
1.0
(b) BE
(a) MLE
Model (II) iil
iil
(')
0
g
g
g 0
0,0
0,2
0,4
0,6
0,8
1,0
0.0
0,2
(c) MLE
Model
0,4
0,6
(d) BE
(III)
iil
iil
0
g
g
g
'"
0
0
0.0
0,2
0,4
0,6
(e) MLE
0,8
1,0
0,0
0,2
0.4
0,6
(f) BE
Figure 4.1. Histograms for the results of Table 1 for ~ = 0.7 and n = 100. The sample coefficient of skewness PI and the sample kurtosis P2 are: (a) PI = 0.70, P2 = 7.12; (b) PI = -0.01, P2 = 4.68; (c) PI = 0.12, P2 = 4.74; (d) PI = 0.42,P2 = 3.56; (e) PI = 0.18,p2 = 5.55; (f) PI = 0.77,P2 = 5.10.
Asymptotic estimation theory
266 Table 4.1
Average estimates and RMSE of
T
when
T
0.5 RMSE
Mean
n = 100
n = 300
n = 100
n = 300
MLE
BE
MLE
BE
MLE
BE
MLE
BE
0.4955
0.4893
0.5032
0.4983
0.1121
0.0858
0.0497
0.0422
0.4726
0.4924
0.4998
0.5144
0.1981
0.1121
0.1840
0.1220
0.5197
0.5207
0.5000
0.4978
0.1187
0.0854
0.0394
0.0336
0.5081
0.5091
0.4984
0.4975
0.1348
0.1058
0.0425
0.0350
0.5311
0.5313
0.4932
0.4940
0.1100
0.0916
0.0337
0.0282
0.5314
0.5361
0.4900
0.4885
0.1597
0.1315
0.0538
0.0438
Model (I)
= 0.7 ~ = 0.9 ~
Model (II) ~ ~
= 0.7 = 0.9
Model (III)
= 0.7 ~ = 0.9 ~
5
Real data applications
This section is devoted to the application of change point estimation to three data sets (Nile data, U. S. quarterly unemployment rate and international airline ticket sales data) where a visible change point can be observed. Based on these data, we fit (4.1). The estimation procedure is as follows. First, we estimate the unknown parameters by a maximum likelihood method. For fixed k, q -s: k -s: n - q, the MLE of Q and f3 is given by
Then we can estimate the spectral density of the residual process {Ut = Yt -
{O:~X(t -s: k) + 13~X(t > k)}zd using the following nonparametric estimator
where M = n 2/ 5, w(·) is a weight function and rk,n(l) = n- 1 L~:; UtUt+z. Hence the likelihood function is calculated using this spectral estimates. The MLE's of unknown parameters are
k = inf{k: TML = kin,
max
q5Y5,n-q O:ML
L(oi,13i,iln) = L(Ok,13k' kin)}
= Ok and 13ML = 13 k .
(5.1)
Takayuki Shiohama, Masanobu Taniguchi, Madan L. Puri
267
Model (I) !i
g ~ ~
__L 0.0
02
0.4
0.6
g 0
OJ
S! 0.8
0.0
1.0
02
0.4
0.6
011
1.0
(b) BE
(a) MLE
Model (II) ~
til ~
0.0
.__L 02
0.4
0.6
:?l
_.L..~
til ~
0.8
0.0
1.0
0.2
(e) MLE
0.4
0.6
0.8
1.0
0.8
1.0
(d) BE
Model (III) g
g
til
~
~
g 0
0
0.0
0.2
0.4
0.6
(e) MLE
0.8
1.0
0.0
0.2
0.4
0.6
(f) BE
Figure 4.2. Histograms for the results of Table 1 for ~ = 0.9 and n = 100. The sample coefficient of skewness {11 and the sample kurtosis {12 are: (a) {11 = 0.11, {12 = 3.25; (b) {1l = 0.34, {12 = 2.71; (c) {11 = 0.93, {12 = 5.50; (d) JLl = 0.63, {12 = 4.00; (e) {1l = 0.54, {12 = 3.51; (f) {1l = 0.34, {12 = 2.95.
Asymptotic estimation theory
268
A
./'/ ,,11./ ~
0
m :::E
'
.
........p: .......o••_.--0'"
•••• '4,
t"!
0
0:::
.-: 0
~
Figure 4.3. RMSE of Model (I) when n
= 100.
Q---1J
MlE
"' •••••••• -1>.
BE
Takayuki Shiohama, Masanobu Taniguchi, Madan L. Puri
269
Table 4.2. Average and RMSE of MLE and BE for
T
when
T
= 0.5 for Model (II).
Mean
RMSE
n
= 100
n
= 300
n
= 100
n
= 300
v
MLE
BE
MLE
BE
MLE
BE
MLE
BE
7r /2
0.5028
0.5017
0.5005
0.5004
0.0250
0.0201
0.0074
0.0065
7r /4
0.4848
0.4849
0.4944
0.4947
0.0584
0.0496
0.0266
0.0211
7r /8
0.4840
0.4969
0.4857
0.4895
0.1361
0.1217
0.0551
0.0418
7r/16
0.5847
0.5710
0.5183
0.5161
0.2283
0.1629
0.0833
0.0697
7r/32
0.5434
0.5381
0.4613
0.4675
0.2141
0.1715
0.1285
0.1021
Next we compute the Bayes estimator. For simplicity of calculation, we postulate the result that the asymptotic distribution of aM Land i3 B are same as CxB and (c.f. Kutoyants (1994)). Therefore the Bayes change point estimator TB becomes
i3B _
TB =
L~::qq TiLn(&ML, i3ML' Ti) n-q Li=q Ln(D'.ML, J3ML' Ti) A
A
,
Ti
= i/n,
1,
= q, ... , n -
q.
Nile data
These data have been investigated by an i.i.d. framework, for details see e.g., Cobb (1978) and Hinkley and Schechtmann (1987). The data consist ofreadings of the annual flows of the Nile River at Aswan from 1871 to 1970. There was a shift in the flow levels in 1899, which was attributed partly to the weather changes and partly to the start of construction work for a new dam at Aswan. We apply a mean shift model for this data with Zt = 1. The MLE gives aML = 1097.75, ~ML = 849.97 and T = 0.28 (k = 28). On the other hand, the BE is TB = 0.2790(k = [TBn] = 27). The original series together with ML trend estimator are plotted in Figure 5.1. Figure 5.2 shows the posterior distribution of T, which shows strong evidence that the shift occurred in 1898. These results agree with those of the other authors. U. S. quarterly unemployment rates
This data set, (n = 184), is analyzed in Tsay (2002) by use of threshold AR model for first differenced series. Here we explain a seasonal trend by employing regression models with trigonometric functions and change point. The regression function is chosen to be Zt = (1, cos(vt))'. A Fisher's test for added deterministic periodic component rejects the Gaussian white noise at level .01. We have taken v = 47r /184 which gives the peak in the periodogram. The MLE detected the possible change point TML = 0.49(k = 90) and corresponding regression coefficients &ML = (4.65, -0.85)' and i3ML = (6.81, -0.94)'. The BE is TB = 0.49 which corresponds to k = [TBn] = 90. The estimated trend
Asymptotic estimation theory
270
~~----------------------------------------------------------~
o
20
40
60
100
80
Figure 5.1. Nile data with estimated mean and change point
k = 28
(MLE).
d-
~-
;;d-
~0
20
40
Figure 5.2. Posterior distribution of T.
60
80
100
Takayuki Shiohama, Masanobu Taniguchi, Madan L. Puri
271
function together with original data is shown in Figure 5.3. The posterior distribution for T is plotted in Figure 5.4. This analysis reveals that the mean level of an unemployment rate increased to about 2% in 3rd quarter of 1970, while the amplitude of long term cyclical trend stayed the same level throughout the period.
International airline ticket sales data This data have been investigated by fitting a seasonal ARIMA model (Box et. al. (1994) ). An alternative modeling is deterministic cyclical trend function modeling with a change point for once-differentiated data. The regression function given by z~ = (COS(VIt),COS(V2t),COS(V3t)) is selected by examining the periodogram. There are three frequencies which have comparably large spectrum, namely VI = 267r/143, V2 = 507r/143 and V3 = 747r/143. The ML estimators give the &ML = (-7.54,14.14, 1.43)',.6ML = (-35.76,37.01, -19.66)' and TML = 0.6319(k = 91). While Bayes estimator is TB = 0.6216(k = 89). As shown in the posterior probability of T, the change might have occurred from t = 80 to 100, which implies the possibility of multiple changes.
6
Proofs
Proof of Theorem 1. From (2.7), we have log Zn(a,,6, T) = -
2~
t
f(Ak)-1/2 {dn(Ak)A(Ak)
+ dn(Ak)
A(Ak)} -
k=1
2~
t
(6.1)
IA(Ak)12
k=1
First we evaluate the first term in (6.2). From (2.7) we have
-
2~
t
f(Ak)-1/2 {dn(Ak)A(Ak)
+ dn(Ak)
A(Ak)}
k=1
= __1_
2yfii
f(Ak)-1/2
k=l
+ dn (Ak)A2 + dn (Ak)A3 + dn(Ak)Al + dn(Ak)A2 + dn(Ak)A3} El + E2 + E3 + E4 + E5 + E6 (say). X
=
t
{dn(Ak)A I
Write the spectral density f(A) in the form
where Rf(j)'s satisfy 2.:;:-00 IjlmIRf(j)1 < 00 for any given mEN. Then, from Theorem 3.8.3 of Brillinger (1975) we may write
Asymptotic estimation theory
272
o
50
100
150
Figure 5.3. U. S. quarterly unemployment rates (1948-1993) with estimated trend and change point k = 90(MLE).
C>
~U">
~C>
~U">
tiC>
ciU">
ci-
:30
50
Figure 5.4. Posterior distribution of T.
100
150
Takayuki Shiohama, Masanobu Taniguchi, Madan L. Puri
o
20
40
60
80
100
273
120
140
Figure 5.5. The international airline ticket sales, once -differentiated data (dotted line) with estimated trend and change point k = 91 (black line).
<:0
ci~v
~-
gj ci
<:>
ci
0
20
40
60
Figure 5.6. Posterior probabilities of T.
80
100
120
140
Asymptotic estimation theory
274
where r(j) 's satisfy for any given mEN 00
~ Ijlmlr(j)1 <
00.
j=-oo
Then E1 can be written as El = -
1 ~ 1/2 2..jii ~ f()..k)dn()..k)Al k=l n
= -
~n+~
n
4~7r ~ f()..k)-l L k=l
1
1
n
= -4n7r - '" ~ 27r k=l
1
L
({3 - 0)' ZsUtei(t-S)Ak
t=l s=[Tn]+l
1
n
00
~ r(j)e- ijAk ' " ~
~
[Tn+p] '"
~
({3 - 0)' ZsUtei(t-S)Ak
t=l s=[Tn]+l
j=-oo
noon
[Tn+p]
.,
= -4n7r27r~ - - '" '" r(j) ' " ' " ({3 - 0)' ZsUtet(t-S-J)Ak ~ ~ ~ t=l s=[Tn]+l
k=l j=-oo
It is well known that
~ ei(t-s-j)Ak = ~
k=l
{n0 ifotherwise. t - s - j = 0 (mod n)
Since -[Tn + p] ~ t - s ~ [(1 - T)n] and r(j) satisfies any given m, we have
2: j
(6.2)
Ijlmlr(j)1 <
00
for
Hence we have only to evaluate E1 for 1 = 0 of t - s - j = In. Thus E1 is
1 1 E1
n
00
= ---
47r27r ~
~
({3 - 0)' ZsUt-
L
L
r(j)
j=-oo
n ' " ei(t-s-j)Ak
n~
t=l s=[Tn]+l
k=l
[Tn+p]
00
~ - 87r 2
'"
~
j=-oo
1
1
[m+p]
~ r(j) ' "
_
({3 - 0)' zs{ us+j} ==
El
(say).
s=[Tn]+l
Then _
1
El = -
1 47r ({3 - a)'
J 7T
7T
-
=
~({3 2
O)'WI
~
J7T
Zs
-7T
eijAeisAdZu()")
s=[Tn]+l
J=-OO
=-
[Tn+p]
00
87r 2 . ~ r(j)({3 - 0)' [m+p]
~
s=[Tn]+l
(say),
zse iSA f()..)-ldZ u ()")
(6.3)
Takayuki Shiohama, Masanobu Taniguchi, Madan L. Puri
275
where Zu(A) is the spectral measure of Ut defined by (2.7). Let L~::Y~~~l+l zse is ).. = A(A; h, p). we observe
Recalling that {Ut} is Gaussian, we have
WI
(0, 4:
r; N
2
i:
A(A; h, p)A *(A; h, P)f(A)-ldA))
(6.4)
Similarly we obtain
(6.5)
Next we calculate the second term E2 that is
E2 = -
t
2~
f(Ak)-1/2d n (Ak)A 2
k=l
n
n
~n+~
= _1_ ' " f(Ak)-1_1- ' " ' " Uta' zsei(t-S»)..k 4mf~
fo~ ~ t=l
k=l
n
=_1_~", 4mf 27f
'"
~j~<Xl <Xl
s=l
n
<Xl
[Tn+p]
r(j)e-ij)..k_1_", ' " a'Utzsei(t-S»)..k
fo f:-t ~
n
[Tn+p]
n
= ~~ '" f(j)_l_ ' " ' " a'Utzs.!. ' " ei)..dt-s-j). 47f27f
j~<Xl
fo8 ~
n~
Here note that n - 1 ;::::: t - S ;: : : -[Tn]. Because of (6.2) we have only to evaluate E2 for l = 0, 1 of t - s - j = In. Then
Asymptotic estimation theory
276
Similarly as in E 1 ,
.;Ta' = -2-W2
(say),
where
W 2 ---+ N Dt
(0, ~ j7r-7r 2f ()...)-ldM()"')) , 2n
(6.7)
which follows from the Riemann-Lebesgue theorem and Grenander's conditions (G.1) - (G.4). Similarly we obtain. (6.8) Next
Since [(1 - T)n] 2': t - s 2': 1 - n, we have only to evaluate E3 for I = 0, -1 of t - s - j = In. Hence
Takayuki Shiohama, Masanobu Taniguchi, Madan L. Puri
Similarly as in
E2
277
we have
(6.9)
- .;r=Tb' W 2 3, where (6.10) Similarly we obtain.
E 6 = .;r=Tb' W3 2
(6.11)
Hence from (6.4), (6.5), (6.7), (6.8), (6.10) and (6.11), we have
2~
t
jP..k)-1/2 { dn(Ak)A(Ak) + dn(Ak) A(Ak) } k=l c:::: (f3 - a)'Wl + JTa'W2 + .;r=Tb'W3 . -
(6.12)
Next we evaluate the second term in (6.2), which is 1 - 2n
n
L
JA(Ak)J2
k=l 1
n
= - - ~(Al 2n~
+ A2 + A3)(Al + A2 + A 3)
k=l
=
-~ ~(JAlJ2 + JA 2J2 + JA3J2 + AlA2 + AlA3 + A2A3 + +A2 A l + A3 A l + A3 A 2). 2n~ k=l
Asymptotic estimation theory
278 We have
(6.13)
n
00
= __1_ ~ ~ ~ r(j)e-~j>'k 4mf L..t 27r L..t k=1
1 1
00
47r 27r L..t
r(j)
L
L..t
~
({3 _ a)' Zt z :({3 _
a)ei(t-S»,k
[Tn+p] ~
1
[Tn+p] ~
L..t
L..t
({3 - a)' ZtZ' ({3 - a)-
n. . ~ e~(t-S-J»,k
n L..t
S
k=1
[Tn+p]
00
= - 47r 27r
L..t
t=[Tn]+1 s=[Tn]+1
j=-oo
1 1
[Tn+p]
~
t=[Tn]+1 s=[Tn]+1
j=-oo
~
= ---
[Tn+p]
L
r(j)
j=-oo
({3 - a)' zs+jz:({3 - a).
s=[Tn]+1
Next we have
= __1_~ 4n7r 27r
L 00
j=-oo
[Tn+p] [Tn+p]
r(j)a' ~ L..t t=1
~
L..t s=1
Zt Z ' S
a
{n
~ ~ ei(t-s-j»,k
}
n L..t
.
k=1
Note that [Tn] 2': t - s 2': -[Tn]. Similarly we have
(6.14)
1 ~ 7r 7r J=-OO .
T
= -4-2 L..t r(j)a' = -
T
47r a'
j7r e~J.. )' dM(A)a = --a' j7r -1 T
-7r
j7r f(A)-ldM(A)a -7r
47r
-7r
~ .. )' L..t r(j)e~J dM(A)a
27r J=-OO .
Takayuki Shiohama, Masanobu Taniguchi, Madan L. Puri
279
Also we obtain
(6.15) 1
n
A312 --2:I 2n k=l
=
~
1 1 --~-4mf k=l f(>"k)
(1-
..;n
1
~
~ t=[Tn+p]+l
loon = --~ r(j)a' ~
4mf 21r ~
=-
1-
T
= _1-
~
b' /71"
T
n
ZtZ' a
{I
f
n.
b' ze -is>''k ) S
.}
- ~ et(t-S-J)>"k n ~
8
t=[Tn+p]+18=[Tn+p]+1
-71" 21r J=-OO .
41r
..;n s=[Tn+p]+l
~ ~
~
j=-oo
In ~ , it>") bzte k ( -~
k=l
r(j)eij>"dM(>..)b
b' /71" f(>..)-ldM(>..)b.
-71"
41r
The fourth term becomes
1 -2n
n
2: AIA2 k=l
=-
4~1r t k=l
f
;1r
r(j)e-ij>"k ( [Tf] ((3 - 0)' Zt eit >..)
j=-oo 00
j=-oo
In
t=[Tn]+l [Tn+p] [Tn+p]
= ~~_1_ ~ r(j) ~ 41r 21r..;n ~
(-
~
t=h+1
From 1 - p ::; t - S ::; [Tn]
+p-
8=1 n
~ ((3 - 0)' ztz~a.!. ~ ei(t-s-j)>"k ~
8=1
1, t - s - j
ITfl a' z,e- i ' ' ' , )
n~ k=l
= 0, it is seen that
Asymptotic estimation theory
280
Similarly we observe
(6.17)
Now we evaluate
n = -~~~ '" r(j) '~ " '" a'zt z ' b~ 47r 27r n ~ ~ s n j=-oo t=l s=[Tn+p]+l ~n+~
00
n ' " ei(t-s-j)Ak. ~ k=l
Since -n + 1 :::; t - s :::; -1, we have only to evaluate for t - s - j
= 0, -no (6.18)
1 n -2n LA2 A 3 k=l
1 1
~ - - - J r ( l - r)
47r 27r
~
1 [Tn+p] 1 n 1 n. . r(j)L a'ztz:b- Le~(t-S-J)Ak j=-oo VITi t=l y'(1 - r)n s=[Tn+p]+l n k=l
L
L
00
f=
_ Jr(l - r) r(j)a' j7r eijAdM(A)b 47r . 27r J=-OO . -7r
= _ y'' ' --'r('' ' ---l--r----,-) a' j7r f(A)-ldM(A)b. 47r
-7r
Similarly we have
_~
t
2n k=l
A3A2 '" - Jr(l - r) a' j7r f(A)-ldM(A)b. 47r_7r
(6.19)
From the equations from (6.14) to (6.19) together with (6.4), (6.7), (6.10) and (6.13) complete the proof of Theorem l. Proof of Lemma 1. From Hannan (1970) and Anderson (1977) the joint density of dn(Al), ... ,dn(An) is given by k
p(dn(Al),··· ,dn(An)) = en
II exp( -dn(Ak)f(Ak)-ldn(Ak)) k=l
(6.20)
Takayuki Shiohama, Masanobu Taniguchi, Madan L. Puri
281
EZ~/2(a, b, p)
= Eexp
In [ - 4J7i ( ;
JI
Cn exp ( -
=
=
x exp ( -
4~
x exp ( -
4~
t.
t,
J... J
1
exp [ 16n
t,
d n(Ak)!(Ak)-l d n(Akl)
/(Ak)-1/2 { d,,(Ak)A(Ak)
t,
(f(,\k)-1/2 dn(Ak) 1
I: IA(Ak)12 n
4n
k=l
=
exp (
-l~n
t.
n
exp - 4n {; IA(Ak)12
+ dn(A.)
A(Ak) } )
IA(Ak)!') d(d,,(A1)" . dn(An))
C n exp [-
X
[1
] f(Ak)~1/2 {dn(Ak)A(Ak) + dn(Ad _ A(Ak)}
+ ~j;;))
I: IA(Ak)12 n
]
(mk)
1/2dn(Ak)
+ ~j;;l)
d(dn(AI)··· dn(An)
k=l
IA (Ak)1 2 )
.
Recalling the definition of likelihood process in (2.7), we have
From the proof of Theorem 1 and Assumption (G.1), the first term in (6.21) is bounded by 1
n
-16n
L (AlAI) k=l
3 1 [m+p] ~ -1681[2
(6.22)
[Tn+p]
I:
I:
({3 - a)' Ztr(t - s)zs({3 - a) t=[Tn]+1 s=[Tn]+1 3 1 [Tn+p] < - - --2 " " ' { ({3 - a)' zt} 2 x min f (A) ~ I 16 81[ ~ A t=[Tn]+1 = - [O(p)] for p> O. We have already shown in (6.17) and (6.18) that
1~n
t
{AI(A2
+ A 3)} = O(n~I/2)
k=l
and _1_
~ {AI (A2 + A3)} = O(n~I/2).
16n~
k=l
(6.23)
]
]
Asymptotic estimation theory
282
Furthermore, from the proof of Theorem 1 we can find a positive definite matrix K so that (6.24) Hence (6.23)-(6.24) implies the required result. Proof of Lemma 2. Let ()~ = (a~,,6~, 71)' and (); = (a;,,6; 72)' are some given values in 8, and are the forms of a1 = a + n- 1/ 2a1,,61 = ,6 + n- 1/ 2b2, 71 = 7+n- 1pl,a2 = a+n- 1/ 2a2,,62 = ,6+n- 1/ 2b1 and 72 = 7+n- 1p2. Denoting A(..\'k) under (}i as A(ai, bi , Pi; Ak) we set ~ln
= A(al,b1,Pl;Ak) - A(a2,b2,P2;Ak)
~2n
= IA(a1,b1,Pl;Ak)1 2 -IA(a2,b2,P2;Ak)1 2
and
The process Y n is written as (6.25) Then we observe
E a ,{3,T
\Z~/4(al' b 1 , PI) - Z~/4(a2' b2, P2)\1/4
= E a1 ,{31,Tl (1 - Yn)4
= E (1 - 4Yn + 6Y; - 4Y; + Y;) We have
Similarly, we obtain
6EY;
=
6exp(41] + 2,),
and
EY;
=
exp(161] + 4,).
Takayuki Shiohama, Masanobu Taniguchi, Madan L. Puri
283
Hence
(6.26) Using the following expansion for small y
we have E[l - Yn]4
= 1 - 4(1 + 1] + ,) + 6(1 + 41] + 2,) - 4(1 + 91] + 3,) + (1 + 161] + 4,) =
+0(1]2) + 0(/2) + 0(1],) 0 + 0(1]2) + 0(/2) + 0(1],)
which implies that the Taylor expansion of (6.26) starts with the linear combinations of second order terms of 1]2, ,2 and 1],. Here we need to evaluate the asymptotics of 1] and, in (6.26). Assume that without loss of generality PI 2 P2, then
Using the similar argument in proof of Lemma 1, we observe
which 'is written as
Analogously we have
which completes the proof.
Proof of Theorem 2. The proof follows from Theorem 1, Lemmas 1 and 2 of this paper and Theorem 1.10.1 of Ibragimov and Has'minski (1981). Proof of Theorem 3. The properties of the likelihood ratio Zn (a, b, p) established in Theorem 1, Lemmas 1 and 2 allow us to refer to Theorem 1.10.2 of Ibragimov and Has'minski (1981).
Bibliography [1] Anderson, T. W. (1977). Estimation for autoregressive moving average models in time and frequency domains. Ann. Statist. 5 842-865.
284
Asymptotic estimation theory
[2] Bai, J. (1994). Least squares estimation of a shift in linear processes. J. Time Ser. Anal. 15453-472. [3] Bai, J. (1997). Estimation of change point in multiple regression models. The Review of Economics and Statistics. 79 551-563. [4] Box, G. E. P., Jenkins, G. M., and Reinsel, G. C. (1994). Time Series Analysis Forcasting and Control, 3rd. ed. Prentice Hall, New Jersey. [5] Brillinger, D. R. (1981). Time Series: Data Analysis and Theory, expanded ed. San Francisco: Holden-day. [6] Cobb, G. W. (1978). The probrem of the Nile: Conditional solution to a change-point problem, Biometrika. 65 243-251. [7] Csorgo, M. and Horvath, L. (1997). Limit Theorems in Change-Point Analysis. Wiley, New York. [8] Dabye, Ali S. and Kutoyants, Yu. A. (2001). Misspecified change-point estimation problem for a Poisson process. J. Appl. Prob. 38A 701-709. [9] Giraitis, L. and Leipus, R. (1990). A functional CLT for nonparametric estimates of spectra and change-point problem for spectral function. Lietunos Mathematikos Rinkinys. 30674-697. [10] Giraitis, L. and Leipus, R. (1992). Testing and estimating in the change-point problem of the spectral function. Lietunos Mathematikos Rinkinys. 32 20-38. [11] Hidalgo, J. and Robinson, P. M. (1996). Testing for structural change in a long-memory environment. J. Econometrics. 70 159-174. [12] Hinkley, D. V. and Schechtman, E. (1987). Conditional bootstrap methods in the meanshift model. Biometrika. 74 85-93. [13] Hannan, E. J. (1970). Multiple Time Series. Wiley, New York. [14] Ibragimov, I. A. and Has'minski, R. Z. (1981). Statistical Estimation. New York: Springer-Verlag [15] Kokoszka, P. and Leipus, R. (1998). Change-point in the mean of dependent observations. Statist. and Probab. Letters. 40 385-393. [16] Kokoszka, P. and Leipus, R. (2000). Detection and estimation of changes in regime. Preprint. [17] Kutoyants, Yu. A. (1994). Identification of Dynamical System with Small Noise. Dordrecht: Kluwer Academic Publishers. [18] Picard, D. (1985). Testing and estimating change points in time series. Adv. in Appl. Probab. 17 841-867. [19] Ritov, y. (1990). Asymptotic efficient estimation of the change point with unknown distributions. Ann. Statist. 18 1829-1839. [20] Tsay, R. S. (2002). Analysis of Financial Time Series. Wiley, New York.
Fractional Brownian motion as a differentiable generalized Gaussian process Victoria Zinde-Walsh 1 McGill University
fj
CIREQ
and Peter C.B. Phillips 2 Cowles Foundation, Yale University University of Auckland fj University of York Abstract Brownian motion can be characterized as a generalized random process and, as such, has a generalized derivative whose covariance functional is the delta function. In a similar fashion, fractional Brownian motion can be interpreted as a generalized random process and shown to possess a generalized derivative. The resulting process is a generalized Gaussian process with mean functional zero and covariance functional that can be interpreted as a fractional integral or fractional derivative of the deltafunction.
Keywords: Brownian motion, fractional Brownian motion, fractional derivative, covariance functional, delta function, generalized derivative, generalized Gaussian process JEL Classification Number: C32, Time Series Models
1
Introd uction
Fractional Brownian motion, like ordinary Brownian motion, has almost everywhere continuous sample paths of unbounded variation and ordinary derivatives of the process do not exist. Gel'fand and Vilenkin (1964) provided an alternative characterization of Brownian motion as a generalized Gaussian process defined as a random functional on a space of well behaved functions. Interpreted as a generalized random process, Brownian motion is differentiable. A generalized Gaussian process is uniquely determined by its mean functional and the bivariate covariance functional. Correspondingly, the generalized derivative of a Gaussian process with zero mean functional is a generalized Gaussian process with zero mean functional and covariance functional that can be computed from the covariance functional of the original process. Gel'fand and Vilenkin provide a description of the generalized Gaussian process which represents the derivative of Brownian motion. This process has a covariance functional that can be interpreted in terms of the delta-function. 1 Zinde-Walsh thanks the Fonds Quebecois de la recherche sur la societ e et la culture (FQRSC) and the Social Sciences and Humanities Research Council of Canada (SSHRC) for support of this research. 2Phillips thanks the NSF for support under Grant No. SES 0092509.
285
Fractional Brownian Motion
286
The present paper considers fractional Brownian motion from the same perspective as a generalized process and shows how to characterize its generalized derivative. The resulting process is a generalized Gaussian process with mean functional zero and covariance functional that can be interpreted as a fractional integral or fractional derivative of the delta-function. Higher order derivatives can be similarly described.
2
Fractional Brownian motion as a generalized random process
The form of the fractional Brownian motion process considered here was introduced by Mandelbrot and Van Ness (1968). In Marinucci and Robinson(1999) it is called Type I fractional Brownian motion. This form of (standard) fractional Brownian motion for 0 < H < 1 is represented in integral form as
BH(r) = A(H)-l
[1:00 (r - s)H-~ dB(s) - [°00 (_s)H-~ dB(S)] ,
r 2: 0 (2.1)
with A(H)
=
[2k + fo
oo
{(I
+ s) H-~
1
- sH-!} ds]"2 and where B is standard
Brownian motion and H is the self similarity index. For H = ~ the process coincides with Brownian motion. Samorodnitsky and Taqqu (1994, ch.7.2) give the 'moving average' representation (2.1) as well as an alternative harmonizable representation of the fractional Brownian motion process. Bhattacharya and Waymire (1990) provide some background discussion of the Hurst phenomenon and subsequent theoretical developments that led to the consideration of stochastic processes of this type. The mean functional of (2.1) is E B H(r) is (Samorodnitsky and Taqqu, 1994)
V(rl' r2) = EB H(rl)B H(r2) = Note that BH(O)
= 0 and for V(rl,r2)
=
rl, r2
~
= 0 and the covariance kernel V (rl , r2 ) [h1 2H
+ Ir212H -lr2 -
rll2H] .
> 0 the covariance kernel becomes
~ [r~H +r~H -lr2 _rlI2H].
(2.2)
The usual covariance kernel of Brownian motion follows when H = ~. Following Gel'fand and Vilenkin (1964), define the space K of 'test functions' as follows. K is the space of infinitely continuously differentiable functions
(2.3) 30ther spaces of test functions can be chosen.
For example, the space S of infinitely
Victoria Zinde- Walsh and Peter G.B.Phillips
287
Integrals in linear functionals such as (2.3) are taken from 0 to 00 and they are convergent due to the fact that all q; E K have finite support. Test functions could differ at negative values of r without affecting the value of the functional (BH, q;) . Thus we can restrict ourselves to the subspace K+ of K of functions q;(r) with non-negative support. The representation (2.3) provides an interpretation of BH as a linear functional on the space K+. It is easily seen that this functional is continuous in the topology on K +. Since E (B H) = 0, the mean functional is zero. Next we derive the covariance functional of B H . This functional, which we denote by VH[q;, '¢] is given in terms of the covariance kernel V(rl' r2) of the process BH. For q;, '¢ E K+ we have
VH[q;, '¢] := (V, (q;(t), ,¢(s))) =
JJ
V(t, s)q;(t),¢(s)dtds.
Substituting the expression for V(t, s) from (2.2) we have 2VH
=
11 + 1 1 -1 [it -1 [1 00
00
00
=
[q;, '¢]
[t 2H
q;(t)dt
00
00
q;(t)
,¢(s)
00
S2H
-It -
s2H '¢(s)ds
s12H] q;(t)'¢(s)dtds
+
1
00
'¢(s)ds
1
00
t 2H q;(t)'¢dt
(t - S)(2H+l)-1,¢(S)dS] dt
8
(2.4)
(t - S)(2H+l)-1q;(t)dt] ds.
Denote the integral r(a) J~ (t - x )a-l f(x )dx by (fa f)(t) for a > O. This integral is the fractional integral (in the Liouville sense) of the function f· If g(t) = (fa f)(t) where a > 0, then f is the fractional derivative of 9 and we shall write f(t) = (f-ag)(t). We use these expressions to simplify (2.4) in what follows. Start by noting that since [t 2H
!too '¢(s)ds];;o = 0
which equals
(2H)
= (2H)
1 1 1
00
00
t 2H - 1
00
t 2H - 1 [(f'¢) (00) - (f'¢) (t)] dt.
'¢(s)dsdt
differentiable functions that go to zero at infinity faster than any power, or spaces of functions that are not infinitely differentiable. The number of continuous derivatives that the test functions possess will determine the number of generalized derivatives of the process that can be defined on that space.
Fractional Brownian Motion
288 Use this expression in (2.4) to get
2VH [<,b,1);]
[1 t 2H - [(I1);) (00) - (11);) (t)] dt] + (2H) (I1);)(00) [1 t 2H - 1 [(1<,b) (00) - (I<,b) (t)] dt] 00
= (2H) (1<,b) (00)
1
00
-f(2H + 1)
[1
00
<,b(t) (I2H+11);) (t)dt +
1
00
1);(t) (I2H+l<,b) (t)dt] . (2.5)
Now
1
00
=
<,b(t) (1 2H +1 1);) (t)dt
[(I<,b) (t) (I2H+11);)(t)];;o
= (I<,b) (00) (1 2H+1 1);) (00) =
1
00
-1 -1
00
00
(I<,b) (t) (I2H1);) (t)dt
(1<,b) (t)(12H1);) (t)dt
[(I<,b) (00) - (I<,b) (t)] (I2H1);)(t)dt,
(2.6)
and
1
00
t 2H - 1 [(11);) (00) - (11);) (t)] dt
1e 1 00
=
H- 1
00
1);(s)dsdt.
(2.7)
Using (2.6) and (2.7) in (2.5) gives the following expression for 2VH[<,b, 1);],
[1 t 2H - [(I1);) (00) - (I1);) (t)] dt] + (2H) (I1);)(00) [1 t 2H - [(I<,b) (00) - (I<,b) (t)] dt] 00
(2H) (I<,b)(oo)
1
00
-f(2H + 1) -f(2H + 1)
=
1 +1 00
1 1
00
00
1
[(1<,b) (00) - (I<,b) (t)] (I2H1);)(t)dt [(I1);) (00) - (11);) (t)] (I2H<,b)(t)dt
[(1<,b) (00) - (1<,b) (t)] [t 2H - 1 (2H) (I1);)(00) - f(2H
00
+ 1)(I2H1);)(t)] dt
[(11);) (00) - (I1);) (t)] [t 2H - 1 (2H) (1<,b)(oo) - f(2H
+ 1)(I2H<,b)(t)] dt,
so that
VH [<,b,1);]
=
1
roo [(I<,b) (00) -
(I<,b) (t)] [t 2H - 1 (2H) (I1);)(00) - f(2H
+ 1)(12H1);)(t)] dt +
1
roo [(I1);) (00) -
(I1);) (t)] [t 2H - 1 (2H) (1<,b) (00) - f(2H
+ 1) (12H<,b) (t)]
2 Jo 2 Jo
dt.(2.8)
Victoria Zinde- Walsh and Peter G.B.Phillips
289
Setting H = ~ in this expression, we find that (2.8) specializes to VI [¢, 1jJ] = 2
Joroo [(I¢) (00) -
(I¢) (t)] [(I1jJ)(oo) - (I1jJ) (t)] dt,
which is the covariance functional of Brownian motion as a generalized process, a formula given in Gel'fand and Vilenkin (1964, p. 259). Thus, as a generalized random process, fractional Brownian motion is a generalized Gaussian process with mean functional zero and covariance functional given by (2.8). Observe that (2.8) is a bilinear functional involving fractional integrals of the test functions 1jJ and ¢. This alternative approach provides a new description of fractional Brownian motion. In the conventional manner, fractional Brownian motion can be described by its randomly selected sample paths, so that one can think about this process as being indexed by a random element in the probability space where the process lives. In contrast, the new description of fractional Brownian motion as a generalized process indexes the process by deterministic functions belonging to the class K +. Its covariance properties are similarly indexed by these deterministic functions through the covariance functional VH[¢, 1jJ].
3
The generalized derivative of the fractional Brownian motion process
One advantage of this new description of fractional Brownian motion is that it is differentiable, and the process representing the derivative is also a generalized Gaussian process. The mean functional is zero for the derivative process and, according to Gel'fand and Vilenkin (1964, p. 257), its covariance functional Vk [¢, 1jJ] satisfies Vk[¢, VJ] = VH [¢', Vi]. Substituting ¢', 1jJ' for ¢ and 1jJ, respectively in (2.8), we get the expression
VH[¢' , 1jJ']
=~
2
roo [¢ (00) _ ¢(t)] [t 2H -1 (2H) 1jJ( (0) -
Jo
+~
2
=
roo [1jJ (00) -1jJ(t)] [t 2H -
Jo
r(2~ + 1)
{1°°
r(2H + 1)(I2H 1jJ')(t)] dt
(2H) ¢(oo) - r(2H + 1)(I2H¢')(t)] dt
1
¢(t) (I2H-11jJ)(t)dt
since (Ia+1 f')(t) = (Ia f)(t) and ¢(oo) of the test functions.
+
1
00
1jJ(t)(I 2H - 1¢)(t)dt}
(3.1)
= 1jJ(00) = 0, in view of the finite support
Next we interpret the bilinear functional ViI- First, for ordinary Brownian motion (H = ~) the functional Vk [¢, 1/J] has the simple form V{ [¢, 1/J] = 2
roo ¢(t)1jJ(t)dt,
Jo
Fractional Brownian Motion
290
which can be interpreted in terms of the delta-function 8(w), i.e.,
V{ [¢, 1j;] = (XJ ¢(t)1j;(t)dt. 2 Jo
1 I: 00
=
1 I: 11 00
=
00
=
00
8(w)¢(t)1j;(t + w)dtdw 8(s - t)¢(t)1j;(s)dtds (3.2)
8(s - t)¢(t)1j;(s)dtds.
Thus, the covariance kernel of the derivative of standard Brownian motion is the delta function, as shown in Gel'fand and Vilenkin (1964, p. 260). Similarly in the fractional case we can interpret ViI in terms of a generalized fractional integral/derivative of the delta-function. Treating w(t) = (Ia J)(t) as a generalized function on K, the functional (w, ¢) = J w(t)¢(t)dt is differentiable as a generalized function with derivative (w', ¢) = J w'(t)¢(t)dt = - J w(t)¢'(t)dt by definition of a generalized derivative (Gel'fand and Shilov, 1964). Using this relation in the expression for ViI [¢, 1j;] gives
ViI[¢,1j;]
=
r(2~ + 1)
{1°O ¢(t)(J2H - 1j;)(t)dt + 1 1
00
1j;(t)(I2H-l¢)(t)dt}.
As we see in what follows, this expression can be written in the form
Vk[¢,1j;] = r(2H + 1)
11 00
00
(J 2H - 1 8) (s - t)¢(t)1j;(s)dtds.
(3.3)
extending the representation (3.2) for the covariance functional of the first derivative of Brownian motion. So the covariance kernel of the derivative of fractional Brownian motion (treated as a generalized process) is the fractional derivative/integral (J 2H -18) of the delta function. For H > ~ this is a fractional integral, while for H < ~ it is a fractional derivative. We examine the two cases separately. In the case of a fractional integral with a = 2H - 1
> 0 and t > 0 we have
t
1
ta-
1
(r8) (t) = r(a) Jo (t - xt- 1 8(x)dx = r(a)· Then
1 1 [r~a) 1 1 [I 1 [I 1 1 o) 00
00
=
=
00
00
=
00
=
¢(t) (Ia1j;) (t)dt
t
¢(t) ¢(t) ¢(t)
00
(Ia
(t-x)a-l1j;(x)dx] dt
t
(I a8) (t - X)1j;(X)dX] dt
t
(I a8) (W)1j;(t - W)dW] dt (t - s)¢(t)1j;(s)dsdt,
(3.4)
291
Victoria Zinde- Walsh and Peter G.B.Phillips
and similarly
1=
'ljJ(t) (JU¢) (t)dt
so that
Vk[¢, 'ljJ]
=
r(2~ + 1)
=
r(2H
+ 1)
=
1= 1=
{1= 1= 1=
(JUb) (t - s)¢(t)'ljJ(s)dsdt,
¢(t) (I2H-1'ljJ) (t)dt
+
1=
'ljJ(t)(J 2H - 1¢) (t)dt }
(I2H- 1b) (t - s)¢(t)'ljJ(s)dtds,
giving the result (3.3). In the case of a fractional derivative with a = 2H - 1 < 0 (0 < H < ~) we write J2H -1 I = J2H I' and then
1= 1= [rta) 1 1= [1 1= [1 1= [1 1= 1= ¢(t) (Ia'ljJ') (t)dt
=
=
with a similar result for
r(2~ + 1)
= r(2~ + 1) r(2H
t
¢(t)
t
¢(t)
=
=
t
¢(t)
=
Vk[¢, 'ljJ] =
t
¢(t)
=
+ 1)
(t-x)a-1'ljJ'(x)dX] dt
(rb)(t - x)'ljJ'(X)dX] dt (rb)(W)'ljJ'(t-W)dW] dt (Ia- 1b) (w)'ljJ(t - W)dW] dt
(JU- 1b) (t - s)¢(t)'ljJ(s)dsdt,
Jo= ¢(t)(Ja¢')(t)dt. It follows that
{1= {1= 1= 1=
1= + 1=
¢(t) (I2H-1'ljJ) (t)dt ¢(t) (I2H'ljJ')(t)dt
+
'ljJ(t)(I2H-1¢) (t)dt }
'ljJ(t)(I2H ¢')(t)dt}
(J 2H - 1b) (t - s)¢(t)'ljJ(s)dsdt,
as required for (3.3). Clearly, one can proceed with further differentiation of the fractional process. Subsequent m-th order derivatives will provide gen,eralized Gaussian processes with mean functional zero and covariance functional expressed in terms of the generalized function (J 2H - m b) (t - s). Victoria Zinde Walsh Department of Economics McGill University & CIREQ
Peter C.B. Phillips Cowles Foundation, Yale University University of Auckland & University of York
292
Fractional Brownian Motion
Bibliography [1] Bhattacharya, R. N. and E. C. Waymire (1990). Stochastic Processes with Applications. New York: John Wiley. [2] Gel'fand I M. and G. E. Shilov (1964). Generalized Functions, Vol.4. New York: Academic Press. [3] Gel'fand I M. and N. Ya. Vilenkin (1964). Generalized Functions, Vol. 1. New York: Academic Press. [4] Mandelbrot, B.B. and J. W. Van Ness (1968). "Fractional Brownian Motions, Fractional Noises and Applications". SIAM Review, 10, 422-437. [5] Marinucci, D. and P. M. Robinson (1999). "Alternative Forms of Fractional Brownian Motion". Journal of Statistical Planning and Inference, 80, 111122. [6] Samorodnitsky, G. and M. S. Taqqu (1994). Stable Non-Gaussian Random Processes. London: Chapman & Hall.